# Examples of including covariates with bayesdfa

#### 2019-05-22

Here we will walk through how to use the bayesdfa package to fit dynamic factor analysis (DFA) models with covariates.

library(bayesdfa)
library(ggplot2)
library(dplyr)
library(rstan)

## Notation review for DFA models

Covariates in dynamic factor analysis are generally included in the observation model, rather than the process model. Without covariates, the model can be expressed as

${x}_{t}={x}_{t-1}+{e}_t\\ { e }_{ t }\sim MVN(0,\textbf{Q})\\ { y }_{ t }=\textbf{Z}{ x }_{ t }+{ v }_{ t }\\ { v }_{ t }\sim MVN(0,\textbf{R})$

where the matrix $$\textbf{Z}$$ is dimensioned as the number of time series by number of trends, and maps the observed data $$y_{t}$$ to the latent trends $$x_{t}$$.

### Observation covariates

Observation covariates can be ${x}_{t}={x}_{t-1}+{e}_t\\ { e }_{ t }\sim MVN(0,\textbf{Q})\\ { y }_{ t }=\textbf{Z}{ x }_{ t }+\textbf{D}{ d }_{ t }+{ v }_{ t }\\ { v }_{ t }\sim MVN(0,\textbf{R})$ where the matrix $$\textbf{D}$$ represents time series by number of covariates at time $$t$$. For a single covariate, such as temperature, this would mean estimating $$P$$ parameters, where $$P$$ is the number of time series. For a model including 2 covariates, the number of estimated coefficients would be $$2P$$ and so forth.

### Process covariates

Process covariates on the trends are less common but can be written as ${x}_{t}={x}_{t-1}+\textbf{C}{ c }_{ t }+{e}_t\\ { e }_{ t }\sim MVN(0,\textbf{Q})\\ { y }_{ t }=\textbf{Z}{ x }_{ t }+{ v }_{ t }\\ { v }_{ t }\sim MVN(0,\textbf{R})$ where the matrix $$\textbf{C}$$ represents the number of trends by number of covariates at time $$t$$. For a single trend, this would mean estimating $$K$$ parameters, where $$K$$ is the number of trends. For a model including 2 covariates, the number of estimated coefficients would be $$2K$$ and so forth.

## Examples – observation covariates

We’ll start by simulating some random trends using the sim_dat function,

set.seed(1)
sim_dat <- sim_dfa(
num_trends = 1,
num_years = 20,
num_ts = 4
)

Next, we can add a covariate effect to the trend estimate, x. For example,

cov = expand.grid("time"=1:20, "timeseries"=1:4, "covariate"=1)
cov$value = rnorm(nrow(cov),0,0.1) for(i in 1:nrow(cov)) { sim_dat$y[cov$timeseries[i],cov$time[i]] = sim_dat$pred[cov$timeseries[i],cov$time[i]] + c(0.1,0.2,0.3,0.4)[cov$timeseries[i]]*cov$value[i] } And now fit the model with fit_dfa mod = fit_dfa(y = sim_dat$y, obs_covar = cov, num_trends = 1,
chains=1, iter=300)
##
## SAMPLING FOR MODEL 'dfa' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 6e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.6 seconds.
## Chain 1:
## Chain 1:
## Chain 1: Iteration:   1 / 300 [  0%]  (Warmup)
## Chain 1: Iteration:  30 / 300 [ 10%]  (Warmup)
## Chain 1: Iteration:  60 / 300 [ 20%]  (Warmup)
## Chain 1: Iteration:  90 / 300 [ 30%]  (Warmup)
## Chain 1: Iteration: 120 / 300 [ 40%]  (Warmup)
## Chain 1: Iteration: 150 / 300 [ 50%]  (Warmup)
## Chain 1: Iteration: 151 / 300 [ 50%]  (Sampling)
## Chain 1: Iteration: 180 / 300 [ 60%]  (Sampling)
## Chain 1: Iteration: 210 / 300 [ 70%]  (Sampling)
## Chain 1: Iteration: 240 / 300 [ 80%]  (Sampling)
## Chain 1: Iteration: 270 / 300 [ 90%]  (Sampling)
## Chain 1: Iteration: 300 / 300 [100%]  (Sampling)
## Chain 1:
## Chain 1:  Elapsed Time: 8.41563 seconds (Warm-up)
## Chain 1:                10.5218 seconds (Sampling)
## Chain 1:                18.9374 seconds (Total)
## Chain 1:
## Inference for the input samples (1 chains: each with iter=150; warmup=75):
##
##              mean se_mean  sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
## x[1,1]        1.0     0.0 0.3   0.4   0.7   1.0   1.2   1.7    56  1.0
## x[1,2]        3.1     0.1 1.1   1.4   2.3   3.2   3.8   5.5    56  1.0
## x[1,3]        4.3     0.2 1.5   2.0   3.1   4.3   5.3   7.6    56  1.0
## x[1,4]        2.5     0.1 0.9   1.2   1.9   2.6   3.2   4.5    56  1.0
## x[1,5]        0.9     0.0 0.3   0.4   0.6   0.9   1.1   1.6    56  1.0
## x[1,6]        1.1     0.1 0.4   0.5   0.8   1.1   1.4   2.0    57  1.0
## x[1,7]       -0.1     0.0 0.0  -0.2  -0.1  -0.1  -0.1  -0.1    53  1.0
## x[1,8]        1.5     0.1 0.5   0.7   1.1   1.5   1.9   2.7    57  1.0
## x[1,9]       -0.3     0.0 0.1  -0.5  -0.3  -0.3  -0.2  -0.1    55  1.0
## x[1,10]      -1.4     0.1 0.5  -2.5  -1.8  -1.4  -1.0  -0.7    56  1.0
## x[1,11]      -0.1     0.0 0.0  -0.1  -0.1  -0.1  -0.1   0.0    66  1.0
## x[1,12]       0.6     0.0 0.2   0.3   0.4   0.6   0.7   1.0    57  1.0
## x[1,13]      -0.7     0.0 0.2  -1.3  -0.9  -0.7  -0.5  -0.3    58  1.0
## x[1,14]      -0.6     0.0 0.2  -1.1  -0.8  -0.6  -0.5  -0.3    59  1.0
## x[1,15]      -1.4     0.1 0.5  -2.5  -1.8  -1.4  -1.0  -0.7    56  1.0
## x[1,16]      -0.5     0.0 0.2  -0.9  -0.6  -0.5  -0.3  -0.2    57  1.0
## x[1,17]      -0.7     0.0 0.2  -1.3  -0.9  -0.7  -0.5  -0.3    56  1.0
## x[1,18]      -2.9     0.1 1.0  -5.2  -3.7  -3.0  -2.2  -1.4    56  1.0
## x[1,19]      -3.1     0.1 1.1  -5.5  -3.9  -3.2  -2.3  -1.4    57  1.0
## x[1,20]      -3.3     0.2 1.1  -5.8  -4.1  -3.3  -2.4  -1.5    56  1.0
## Z[1,1]        0.6     0.0 0.2   0.3   0.4   0.5   0.7   1.1    49  1.0
## Z[2,1]       -0.6     0.0 0.2  -1.1  -0.7  -0.5  -0.4  -0.3    49  1.0
## Z[3,1]        0.6     0.0 0.2   0.3   0.4   0.5   0.7   1.1    49  1.0
## Z[4,1]       -0.6     0.0 0.2  -1.1  -0.7  -0.5  -0.4  -0.3    49  1.0
## sigma[1]      0.0     0.0 0.0   0.0   0.0   0.0   0.0   0.0    47  1.0
## log_lik[1]    3.4     0.0 0.3   2.8   3.3   3.5   3.6   3.8    79  1.0
## log_lik[2]    2.6     0.1 0.7   0.9   2.3   2.6   3.1   3.5    49  1.0
## log_lik[3]    3.5     0.0 0.2   3.1   3.5   3.6   3.7   3.8    69  1.0
## log_lik[4]    3.3     0.1 0.6   1.4   3.1   3.5   3.6   3.8    42  1.0
## log_lik[5]    3.4     0.0 0.3   2.3   3.3   3.5   3.7   3.8    55  1.0
## log_lik[6]    2.7     0.1 0.7   1.0   2.4   2.8   3.2   3.6    54  1.0
## log_lik[7]    3.5     0.0 0.2   2.9   3.4   3.6   3.7   3.8    39  1.1
## log_lik[8]    3.2     0.1 0.4   2.1   3.1   3.4   3.5   3.7    51  1.0
## log_lik[9]    3.4     0.1 0.5   2.5   3.2   3.5   3.6   3.8    73  1.0
## log_lik[10]   2.4     0.1 1.0  -0.1   2.0   2.5   3.0   3.7   126  1.0
## log_lik[11]   3.4     0.0 0.3   2.6   3.4   3.5   3.6   3.8    78  1.0
## log_lik[12]   3.1     0.1 0.6   1.3   2.7   3.3   3.5   3.7    85  1.0
## log_lik[13]   3.3     0.1 0.4   2.0   3.2   3.4   3.6   3.8    44  1.0
## log_lik[14]   2.8     0.1 0.8   1.3   2.5   3.0   3.4   3.7   104  1.0
## log_lik[15]   3.5     0.1 0.3   2.8   3.4   3.6   3.7   3.8    26  1.1
## log_lik[16]   3.2     0.1 0.6   1.4   3.1   3.4   3.5   3.7    36  1.0
## log_lik[17]   3.3     0.1 0.4   2.3   3.1   3.4   3.6   3.8    72  1.0
## log_lik[18]   2.5     0.1 0.9   0.6   1.9   2.8   3.1   3.6    88  1.0
## log_lik[19]   3.5     0.0 0.3   2.9   3.4   3.6   3.6   3.7    65  1.0
## log_lik[20]   3.2     0.0 0.5   2.0   3.0   3.3   3.5   3.7   128  1.0
## log_lik[21]   3.4     0.0 0.3   2.7   3.3   3.5   3.6   3.7    59  1.0
## log_lik[22]   2.2     0.1 0.8   0.4   1.8   2.3   2.8   3.3    88  1.0
## log_lik[23]   3.6     0.0 0.2   3.2   3.5   3.6   3.7   3.8    51  1.0
## log_lik[24]   3.2     0.1 0.4   2.4   3.1   3.3   3.5   3.7    44  1.1
## log_lik[25]   3.4     0.0 0.3   2.9   3.3   3.5   3.6   3.8    62  1.0
## log_lik[26]   2.6     0.1 0.8   0.7   2.1   2.8   3.1   3.5    89  1.0
## log_lik[27]   3.5     0.0 0.2   3.2   3.4   3.6   3.6   3.8    61  1.0
## log_lik[28]   3.3     0.0 0.4   2.1   3.1   3.4   3.6   3.8    99  1.0
## log_lik[29]   3.4     0.0 0.4   2.4   3.2   3.5   3.6   3.7    68  1.0
## log_lik[30]   2.8     0.1 0.8   0.3   2.6   3.0   3.4   3.6    84  1.0
## log_lik[31]   3.5     0.0 0.3   2.7   3.4   3.6   3.6   3.8    58  1.0
## log_lik[32]   3.2     0.1 0.5   2.0   3.0   3.3   3.5   3.7    66  1.0
## log_lik[33]   3.4     0.1 0.4   2.2   3.3   3.5   3.6   3.8    39  1.0
## log_lik[34]   2.6     0.1 0.7   0.8   2.3   2.7   3.1   3.5    89  1.0
## log_lik[35]   3.5     0.0 0.2   3.0   3.5   3.6   3.7   3.8    51  1.0
## log_lik[36]   3.1     0.1 0.6   1.8   2.7   3.3   3.5   3.7    70  1.0
## log_lik[37]   3.3     0.1 0.4   2.3   3.0   3.4   3.6   3.7    72  1.0
## log_lik[38]   2.5     0.1 0.9  -0.1   2.0   2.6   3.3   3.5    60  1.0
## log_lik[39]   3.5     0.0 0.2   2.9   3.4   3.5   3.6   3.8    62  1.0
## log_lik[40]   3.2     0.1 0.5   2.1   2.9   3.3   3.5   3.7    81  1.0
## log_lik[41]   3.4     0.0 0.3   2.8   3.3   3.5   3.6   3.8    63  1.0
## log_lik[42]   2.8     0.1 0.7   1.3   2.5   2.9   3.2   3.6    78  1.0
## log_lik[43]   3.6     0.0 0.2   3.1   3.5   3.6   3.7   3.8    51  1.0
## log_lik[44]   3.2     0.1 0.6   1.4   3.0   3.3   3.5   3.7    56  1.0
## log_lik[45]   3.4     0.0 0.3   2.8   3.2   3.5   3.6   3.8    65  1.1
## log_lik[46]   2.2     0.1 0.8   0.5   1.6   2.4   2.9   3.5    72  1.0
## log_lik[47]   3.5     0.0 0.2   3.1   3.5   3.6   3.7   3.8    61  1.0
## log_lik[48]   3.2     0.1 0.5   2.0   3.0   3.4   3.5   3.7    82  1.1
## log_lik[49]   3.4     0.0 0.3   2.8   3.2   3.5   3.6   3.8    76  1.0
## log_lik[50]   2.6     0.1 0.7   1.1   2.3   2.8   3.2   3.6    86  1.0
## log_lik[51]   3.5     0.0 0.2   3.0   3.4   3.6   3.6   3.8    75  1.0
## log_lik[52]   3.2     0.0 0.4   2.3   3.0   3.3   3.5   3.7    93  1.0
## log_lik[53]   3.4     0.0 0.3   2.5   3.2   3.5   3.6   3.8    60  1.0
## log_lik[54]   2.9     0.1 0.7   1.3   2.6   3.2   3.4   3.7   189  1.0
## log_lik[55]   3.5     0.0 0.2   2.7   3.3   3.5   3.6   3.8    49  1.0
## log_lik[56]   3.4     0.0 0.4   2.4   3.3   3.5   3.6   3.7    71  1.0
## log_lik[57]   3.4     0.0 0.3   2.6   3.3   3.5   3.6   3.8    64  1.0
## log_lik[58]   2.7     0.1 0.8   1.0   2.3   2.9   3.3   3.6   115  1.0
## log_lik[59]   3.4     0.0 0.4   2.4   3.3   3.5   3.6   3.8    65  1.0
## log_lik[60]   3.3     0.1 0.5   2.1   3.1   3.4   3.6   3.7    92  1.0
## log_lik[61]   3.4     0.0 0.4   2.5   3.1   3.5   3.6   3.7    59  1.0
## log_lik[62]   2.3     0.1 0.8   0.9   1.7   2.4   3.1   3.5    89  1.0
## log_lik[63]   3.5     0.0 0.2   3.0   3.4   3.5   3.7   3.8    38  1.0
## log_lik[64]   3.1     0.1 0.5   1.9   2.9   3.2   3.5   3.7    61  1.0
## log_lik[65]   3.5     0.0 0.3   2.7   3.4   3.5   3.7   3.8    49  1.0
## log_lik[66]   2.8     0.1 0.6   1.4   2.4   2.9   3.2   3.6    68  1.0
## log_lik[67]   3.6     0.0 0.2   3.1   3.5   3.6   3.7   3.8    54  1.0
## log_lik[68]   3.1     0.1 0.6   1.8   2.9   3.3   3.6   3.7    61  1.0
## log_lik[69]   3.3     0.0 0.3   2.5   3.1   3.3   3.5   3.7    75  1.0
## log_lik[70]   2.7     0.1 0.7   1.0   2.3   2.9   3.3   3.6    95  1.0
## log_lik[71]   3.5     0.0 0.3   2.6   3.4   3.5   3.6   3.8    55  1.0
## log_lik[72]   3.1     0.1 0.5   2.2   2.8   3.2   3.5   3.6    69  1.0
## log_lik[73]   3.2     0.1 0.7   1.6   3.0   3.4   3.5   3.7    72  1.0
## log_lik[74]   2.5     0.1 0.9   0.7   2.0   2.7   3.2   3.6    81  1.0
## log_lik[75]   3.5     0.1 0.3   2.5   3.4   3.6   3.7   3.8    37  1.0
## log_lik[76]   3.1     0.1 0.6   1.6   2.8   3.3   3.5   3.7    70  1.0
## log_lik[77]   3.3     0.0 0.4   2.3   3.2   3.4   3.6   3.7    97  1.0
## log_lik[78]   2.8     0.1 0.9   0.5   2.4   3.2   3.4   3.6   113  1.0
## log_lik[79]   3.5     0.0 0.3   2.7   3.4   3.5   3.6   3.8    63  1.0
## log_lik[80]   3.2     0.0 0.5   1.9   3.0   3.4   3.5   3.7   133  1.0
## psi[1]        2.1     0.1 1.3   0.5   1.2   1.8   2.7   6.1   153  1.0
## b_obs[1,1]    0.2     0.0 0.0   0.1   0.2   0.2   0.2   0.2    37  1.1
## b_obs[1,2]    1.5     0.0 0.0   1.5   1.5   1.5   1.5   1.5    75  1.0
## b_obs[1,3]    0.2     0.0 0.0   0.2   0.2   0.2   0.3   0.3    64  1.0
## b_obs[1,4]    0.3     0.0 0.0   0.2   0.2   0.3   0.3   0.3    59  1.0
## lp__        237.5     0.8 4.3 227.7 234.6 238.2 240.6 244.9    26  1.0
##
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).

We can then make plots of the true and estimated trend,

plot(c(sim_dat$x), xlab="Time", ylab="True trend") plot_trends(rotate_trends(mod)) + ylab("Estimated trend") This approach could be modified to have covariates not affecting some time series. For example, if we didn’t want the covariate to affect the last time series, we could say cov = cov[which(cov$timeseries!=4),]

And then again fit the model

mod = fit_dfa(y = sim_dat$y, obs_covar = cov, num_trends = 1, chains=1) ## Examples – process covariates As a cautionary note, there’s some identifiability issues with including covariates in the process model. Covariates need to be standardized or centered prior to being included. Future versions of this vignette will include more clear examples and recommendations. We’ll start by simulating some random trends using the sim_dat function, set.seed(1) sim_dat <- sim_dfa( num_trends = 2, num_years = 20, num_ts = 3 ) Next, we can add a covariate effect to the trend estimate, x. For example, cov = rnorm(20, 0, 1) b_pro = c(0.1,0.3) x = matrix(0,2,20) for(i in 1:2) { x[i,1] = cov[1]*b_pro[i] } for(i in 2:length(cov)) { x[1,i] = x[1,i-1] + cov[i]*b_pro[1] + rnorm(1,0,1) x[2,i] = x[2,i-1] + cov[i]*b_pro[2] + rnorm(1,0,1) } y = sim_dat$Z %*% x

And now fit the model with fit_dfa

pro_cov = expand.grid("trend"=1:2, "time"=1:20, "covariate"=1)
pro_cov$value = cov[pro_cov$time]

mod = fit_dfa(y = sim_dat\$y, pro_covar = pro_cov, num_trends = 2,
chains=1, iter=1000)