Title: | Data Representation: Bayesian Approach That's Sparse |
---|---|
Description: | Feed longitudinal data into a Bayesian Latent Factor Model to obtain a low-rank representation. Parameters are estimated using a Hamiltonian Monte Carlo algorithm with STAN. See G. Weinrott, B. Fontez, N. Hilgert and S. Holmes, "Bayesian Latent Factor Model for Functional Data Analysis", Actes des JdS 2016. |
Authors: | Gabrielle Weinrott [aut], Brigitte Charnomordic [ctr], Benedicte Fontez [cre, aut], Nadine Hilgert [ctr], Susan Holmes [ctr], Isabelle Sanchez [ctr] |
Maintainer: | Benedicte Fontez <[email protected]> |
License: | GPL-3 |
Version: | 0.1.6 |
Built: | 2024-10-30 04:01:59 UTC |
Source: | https://github.com/cran/DrBats |
Convert a STAN objet to MCMC list
coda.obj(stanfit)
coda.obj(stanfit)
stanfit |
a STAN object |
codafit an mcmc.list
Gabrielle Weinrott
data(stanfit) # output of modelFit or main.modelFit coda.fit <- coda.obj(stanfit) head(coda.fit)
data(stanfit) # output of modelFit or main.modelFit coda.fit <- coda.obj(stanfit) head(coda.fit)
Perform Coinertia Analysis on the PCA of the Weighted PCA and Deville's PCA
coinertia.drbats( X.histo = NULL, Qp = NULL, X = NULL, t = NULL, t.range = c(0, 1000), breaks )
coinertia.drbats( X.histo = NULL, Qp = NULL, X = NULL, t = NULL, t.range = c(0, 1000), breaks )
X.histo |
the data matrix projected onto the histogram basis |
Qp |
a matrix of weights, if Qp = NULL the function specifies a diagonal weight matrix |
X |
a data matrix, if X.histo is NULL and needs to be built |
t |
a matrix of observation times, if X.histo is NULL and needs to be built |
t.range |
the range of observation times in vector form, if X.histo is NULL and needs to be built (default: t.range = c(0, 1000)) |
breaks |
integer number of histogram windows |
co_weight the co-inertia object
Gabrielle Weinrott
res <- drbats.simul(N = 5, P = 100, t.range = c(5, 100), breaks = 8) res.coinertia <- coinertia.drbats(X = res$X, t = res$t.simul, t.range = c(5, 100), breaks = 8) res.coinertia
res <- drbats.simul(N = 5, P = 100, t.range = c(5, 100), breaks = 8) res.coinertia <- coinertia.drbats(X = res$X, t = res$t.simul, t.range = c(5, 100), breaks = 8) res.coinertia
Main simulation function
drbats.simul( N = 10, P = 150, t.range = c(0, 1000), b.range = c(0.2, 0.4), c.range = c(0.6, 0.8), b.sd = 2, c.sd = 2, a.range = c(-0.4, 0.4), y.range = c(0, 10), amp = 10, per = 12, data.type = "sparse", breaks = 15, sigma2 = 0.2, seed = NULL )
drbats.simul( N = 10, P = 150, t.range = c(0, 1000), b.range = c(0.2, 0.4), c.range = c(0.6, 0.8), b.sd = 2, c.sd = 2, a.range = c(-0.4, 0.4), y.range = c(0, 10), amp = 10, per = 12, data.type = "sparse", breaks = 15, sigma2 = 0.2, seed = NULL )
N |
integer number of functions to simulate (default = 10) |
P |
a number of observation times (default = 150) |
t.range |
a range of times in which to place the P observations (default = c(1, 1000)) |
b.range |
a vector giving the range of values for the mean of the first mode (default b.range = c(0.2, 0.4)) |
c.range |
a vector giving the range of values for the mean of the second mode (default c.range = c(0.6, 0.8)) |
b.sd |
the standard deviation for the first mode (default b.sd = 2) |
c.sd |
the standard deviation for the second mode (default c.sd = 2) |
a.range |
a vector giving the range of values for the slope (default a.range = c(-0.4, 0.4)) |
y.range |
a vector giving the range of values for the intercept (default y.range = c(0, 10)) |
amp |
the amplitude of the cosine function (default = 10) |
per |
the periodicity of the cosine function (default = 12) |
data.type |
string indicating type of functions (options :sparse, sparse.tend, sparse.tend.cos) |
breaks |
number of breaks in the histogram basis |
sigma2 |
the precision of the error terms (default = 0.2) |
seed |
integer specification of a seed (default = NULL) |
Y.simul a list containing a matrix Y, a matrix beta, and a matrix epsilon
t.simul a matrix of simulated observation times
X the underlying signal to build the data, see DataSimulationandProjection vignette
proj.pca the outputs of the function pca.proj.Xt
wlu the outputs of the function W.QR
Gabrielle Weinrott
res <- drbats.simul(N = 5, P = 100, t.range = c(5, 100), breaks = 8) X <- res$X t <- res$t.simul # To plot the observations, ie the rows matplot(t(t), t(X), type = 'l', xlab = "Time", ylab = "X")
res <- drbats.simul(N = 5, P = 100, t.range = c(5, 100), breaks = 8) X <- res$X t <- res$t.simul # To plot the observations, ie the rows matplot(t(t), t(X), type = 'l', xlab = "Time", ylab = "X")
Project a set of curves onto a histogram basis
histoProj(X, t, t.range, breaks)
histoProj(X, t, t.range, breaks)
X |
a matrix |
t |
a matrix of observation times |
t.range |
a range of times in which to place the P projections (default = c(0, 1000)) |
breaks |
the number of intervals in the histogram basis |
X.proj the matrix X after projection
X.count a matrix containing the number of observations used to build the projection onto the histogram basis
windows a vector containing the first time of each window of the histogram intervals
X.max the matrix of minimum values in each window
X.min the matrix of maximum values in each window
Gabrielle Weinrott
res <- drbats.simul(N = 5, P = 100, t.range = c(5, 100), breaks = 8) res.proj <- histoProj(res$X, res$t.simul, t.range = c(5, 100), breaks = 8) res.proj
res <- drbats.simul(N = 5, P = 100, t.range = c(5, 100), breaks = 8) res.proj <- histoProj(res$X, res$t.simul, t.range = c(5, 100), breaks = 8) res.proj
Fit a Bayesian Latent Factor to a data set using STAN
modelFit( model = "PLT", var.prior = "IG", prog = "stan", parallel = TRUE, Xhisto = NULL, nchains = 4, nthin = 10, niter = 10000, R = NULL )
modelFit( model = "PLT", var.prior = "IG", prog = "stan", parallel = TRUE, Xhisto = NULL, nchains = 4, nthin = 10, niter = 10000, R = NULL )
model |
a string indicating the type of model ("PLT", or sparse", default = "PLT") |
var.prior |
the family of priors to use for the variance parameters ("IG" for inverse gamma, or "cauchy") |
prog |
a string indicating the MCMC program to use (default = "stan") |
parallel |
true or false, whether or not to parelleize (done using the package "parallel") |
Xhisto |
matrix of simulated data (projected onto the histogram basis) |
nchains |
number of chains (default = 2) |
nthin |
the number of thinned interations (default = 1) |
niter |
number of iterations (default = 1e4) |
R |
rotation matrix of the same dimension as the number of desired latent factors |
stanfit, a STAN object
Gabrielle Weinrott
The Stan Development Team Stan Modeling Language User's Guide and Reference Manual. http://mc-stan.org/
Perform a PCA using Deville's method
pca.Deville(X, t, t.range, breaks)
pca.Deville(X, t, t.range, breaks)
X |
a data matrix |
t |
a matrix of observation times corresponding to X |
t.range |
the range of observation times in vector form (ex. t.range = c(0, 1000)) |
breaks |
integer number of histogram windows |
X.histo the matrix projected onto the histogram basis
U.histo a matrix of eigenvectors in the histogram basis
Cp a matrix of principal components
lambda a vector of eigenvalues
perc.lambda a vector of the percentage of total inertia explained by each principal component
Gabrielle Weinrott
JC Deville, "Methodes statisiques et numeriques de l'analyse harmonique", Annales de l'INSEE, 1974.
res <- drbats.simul(N = 5, P = 100, t.range = c(5, 100), breaks = 8) res.pca <- pca.Deville(res$X, res$t.simul, t.range = c(5, 100), breaks = 8) res.pca
res <- drbats.simul(N = 5, P = 100, t.range = c(5, 100), breaks = 8) res.pca <- pca.Deville(res$X, res$t.simul, t.range = c(5, 100), breaks = 8) res.pca
PCA data projected onto a histogram basis
pca.proj.Xt(X, t, t.range = c(0, 1000), breaks = 15)
pca.proj.Xt(X, t, t.range = c(0, 1000), breaks = 15)
X |
the data matrix |
t |
the matrix of observation times |
t.range |
a vector specifying the observation time range (default : c(0, 1000)) |
breaks |
the number of breaks in the histogram basis (default : breaks = 15) |
Xt.proj a matrix of projected observations
U a matrix of eigenvectors
lambda a vector of eigenvalues
lambda.perc the percentage of inertia captured by each axis
Gabrielle Weinrott
res <- drbats.simul(N = 5, P = 100, t.range = c(5, 100), breaks = 8) pca.proj.Xt(res$X, res$t.simul, t.range = c(0, 100), breaks = 8)
res <- drbats.simul(N = 5, P = 100, t.range = c(5, 100), breaks = 8) pca.proj.Xt(res$X, res$t.simul, t.range = c(0, 100), breaks = 8)
Calculate the unnormalized posterior density of the model
postdens(mcmc.output, Y, D, chain = 1)
postdens(mcmc.output, Y, D, chain = 1)
mcmc.output |
an mcmc list as produced by clean.mcmc |
Y |
the data matrix |
D |
the number of latent factors |
chain |
the chain to plot (default = 1) |
post a vector containing the posterior density at each iteration##' @examples
Gabrielle Weinrott
data("toydata") data("stanfit") dens <- postdens(coda.obj(stanfit), Y = toydata$Y.simul$Y, D = 2, chain = 1) hist(dens)
data("toydata") data("stanfit") dens <- postdens(coda.obj(stanfit), Y = toydata$Y.simul$Y, D = 2, chain = 1) hist(dens)
A stanfit object fitted to the toydata
stanfit
stanfit
A large stanfit object
A toy longitudinal data set
toydata
toydata
A list with 5 elements :
a list of simulated data with 3 elements
a matrix with 5 rows and 150 columns giving the observation times of the original data
the original data matrix with 5 rows and 150 columns
a list with 4 elements : results of the function histoProj(X, t, t.range = c(0, 1000), breaks = 8)
a list with 4 elements : results of the function W.QR(U, lambda) where U and lambda are the results of the PCA of X
Format scores output for visualization
visbeta(mcmc.output, Y, D, chain = 1, axes = c(1, 2), quant = NULL)
visbeta(mcmc.output, Y, D, chain = 1, axes = c(1, 2), quant = NULL)
mcmc.output |
an mcmc list as produced by clean.mcmc |
Y |
the matrix of data |
D |
the number of latent factors |
chain |
the chain to use (default = 1) |
axes |
the axes to use (default = c(1, 2)) |
quant |
a vector of quantiles to retain (default = NULL) |
mean.df are the MCMC estimates for the parmeters
points.df contains all of the estimates of the chain
contour.df contains the exterior points of the convex hull of the cloud of estimates
Gabrielle Weinrott
data("toydata") data("stanfit") codafit <- coda.obj(stanfit) ## convert to mcmc.list beta.res <- visbeta(codafit, Y = toydata$Y.simul$Y, D = toydata$wlu$D, chain = 1, axes = c(1, 2), quant = c(0.05, 0.95)) ggplot2::ggplot() + ggplot2::geom_path(data = beta.res$contour.df, ggplot2::aes(x = x, y = y, colour = ind)) + ggplot2::geom_point(data = beta.res$mean.df, ggplot2::aes(x = x, y = y, colour = ind))
data("toydata") data("stanfit") codafit <- coda.obj(stanfit) ## convert to mcmc.list beta.res <- visbeta(codafit, Y = toydata$Y.simul$Y, D = toydata$wlu$D, chain = 1, axes = c(1, 2), quant = c(0.05, 0.95)) ggplot2::ggplot() + ggplot2::geom_path(data = beta.res$contour.df, ggplot2::aes(x = x, y = y, colour = ind)) + ggplot2::geom_point(data = beta.res$mean.df, ggplot2::aes(x = x, y = y, colour = ind))
Plot the estimates for the latent factors
visW(mcmc.output, Y, D, chain = 1, factors = c(1, 2))
visW(mcmc.output, Y, D, chain = 1, factors = c(1, 2))
mcmc.output |
an mcmc list as produced by clean.mcmc |
Y |
the matrix of data |
D |
the number of latent factors |
chain |
the chain to plot (default = 1) |
factors |
a vector indicating the factors to plot (default = c(1, 2)) |
res.W a data frame containing the estimates for the factors, and their lower and upper bounds
Inertia the percentage of total inertia captured by each of the factors
Gabrielle Weinrott
data("toydata") data("stanfit") codafit <- coda.obj(stanfit) ## convert to mcmc.list W.res <- visW(codafit, Y = toydata$Y.simul$Y, D = toydata$wlu$D, chain = 1, factors = c(1, 2)) ## plot the results data <- data.frame(time = rep(1:9, 2), W.res$res.W) ggplot2::ggplot() + ggplot2::geom_step(data = data, ggplot2::aes(x = time, y = Estimation, colour = Factor)) + ggplot2::geom_step(data = data, ggplot2::aes(x = time, y = Lower.est, colour = Factor), linetype = "longdash") + ggplot2::geom_step(data = data, ggplot2::aes(x = time, y = Upper.est, colour = Factor), linetype = "longdash")
data("toydata") data("stanfit") codafit <- coda.obj(stanfit) ## convert to mcmc.list W.res <- visW(codafit, Y = toydata$Y.simul$Y, D = toydata$wlu$D, chain = 1, factors = c(1, 2)) ## plot the results data <- data.frame(time = rep(1:9, 2), W.res$res.W) ggplot2::ggplot() + ggplot2::geom_step(data = data, ggplot2::aes(x = time, y = Estimation, colour = Factor)) + ggplot2::geom_step(data = data, ggplot2::aes(x = time, y = Lower.est, colour = Factor), linetype = "longdash") + ggplot2::geom_step(data = data, ggplot2::aes(x = time, y = Upper.est, colour = Factor), linetype = "longdash")
Build and decompose a low-rank matrix from a matrix of eigenvectors and eigenvalues from principal component analysis
W.QR(U, lambda)
W.QR(U, lambda)
U |
a matrix of eigenvectors |
lambda |
a vector of corresponding eigenvalues |
W a low-rank matrix
D the number of latent factors
Q the orthogonal matrix of the W = QR matrix decomposition
R the upper triangular matrix of the W = QR matrix decomposition
Gabrielle Weinrott
res <- drbats.simul(N = 5, P = 100, t.range = c(5, 100), breaks = 8) res.pca <- pca.Deville(res$X, res$t.simul, t.range = c(5, 100), breaks = 8) Wres.pca <- W.QR(res.pca$U, res.pca$lambda) Wres.pca
res <- drbats.simul(N = 5, P = 100, t.range = c(5, 100), breaks = 8) res.pca <- pca.Deville(res$X, res$t.simul, t.range = c(5, 100), breaks = 8) Wres.pca <- W.QR(res.pca$U, res.pca$lambda) Wres.pca
Perform a weighted PCA using Deville's method on a data matrix X that we project onto a histogram basis and weighted
weighted.Deville(X, t, t.range, breaks, Qp = NULL)
weighted.Deville(X, t, t.range, breaks, Qp = NULL)
X |
a data matrix |
t |
a matrix of observation times corresponding to X |
t.range |
the range of observation times in vector form (ex. t.range = c(a, b)) |
breaks |
integer number of histogram windows |
Qp |
a matrix of weights, if Qp = NULL the function specifies a diagonal weight matrix |
X.histo the matrix projected onto the histogram basis
U.histo a matrix of eigenvectors in the histogram basis
Cp a matrix of principal components
lambda a vector of eigenvalues
perc.lambda a vector of the percentage of total inertia explained by each principal component
Gabrielle Weinrott
res <- drbats.simul(N = 5, P = 100, t.range = c(5, 100), breaks = 8) res.weighted <- weighted.Deville(res$X, res$t.simul, t.range = c(5, 100), breaks = 8, Qp = NULL) res.weighted
res <- drbats.simul(N = 5, P = 100, t.range = c(5, 100), breaks = 8) res.weighted <- weighted.Deville(res$X, res$t.simul, t.range = c(5, 100), breaks = 8, Qp = NULL) res.weighted