title: “Stochastic Process Model for Analysis of Longitudinal and Time-to-Event Outcomes” author: “Ilya Y. Zhbannikov” date: “2020-11-28” output: rmarkdown::html_document bibliography: “references.bib” vignette: >

%\VignetteIndexEntry{stpm} %\VignetteEngine{knitr::rmarkdown} usepackage[utf8]{inputenc}

%\VignetteEncoding{UTF-8}

Overview

The Stochastic Process Model (SPM) was developed several decades ago [@Woodbury_1977, @Yashin_2007], and applied for analyses of clinical, demographic, epidemiologic longitudinal data as well as in many other studies that relate stochastic dynamics of repeated measures to the probability of end-points (outcomes). SPM links the dynamic of stochastical variables with a hazard rate as a quadratic function of the state variables [@Yashin_2007]. The R-package, “stpm”, is a set of utilities to estimate parameters of stochastic process and modeling survival trajectories and time-to-event outcomes observed from longitudinal studies. It is a general framework for studying and modeling survival (censored) traits depending on random trajectories (stochastic paths) of variables.

Installation

Stable version from CRAN

install.packages("stpm")

Most-recent version from GitHub

require(devtools)
devtools::install_github("izhbannikov/stpm")

Data description

Data represents a typical longitudinal data in form of two datasets: longitudinal dataset (follow-up studies), in which one record represents a single observation, and vital (survival) statistics, where one record represents all information about the subject. Longitudinal dataset cat contain a subject ID (identification number), status (event(1)/censored(0)), time and measurements across the variables.

Below there is an example of clinical data that can be used in stpm and we will discuss the fields later.

Longitudinal table:

##   ID IndicatorDeath Age      DBP      BMI
## 1  1              0  30 80.00000 25.00000
## 2  1              0  32 80.51659 26.61245
## 3  1              0  34 77.78412 29.16790
## 4  1              0  36 77.86665 32.40359
## 5  1              0  38 96.55673 31.92014
## 6  1              0  40 94.48616 32.89139

Description of data fields

“Short” and “Long” longitudinal data formats

The packate accepts longitudinal data in two formats: “short” and “long”.

“Short” format
##   id xi  t          y1
## 1  1  0 30   36.133394
## 2  1  0 31   -5.539881
## 3  1  0 32  -45.129493
## 4  1  0 33  -82.739624
## 5  1  0 34 -118.469249
## 6  1  0 35 -152.412392
“Long” format
##   id xi t1 t2          y1     y1.next
## 1  1  0 30 31   36.133394   -5.539881
## 2  1  0 31 32   -5.539881  -45.129493
## 3  1  0 32 33  -45.129493  -82.739624
## 4  1  0 33 34  -82.739624 -118.469249
## 5  1  0 34 35 -118.469249 -152.412392
## 6  1  0 35 36 -152.412392 -184.658379

Discrete- and continuous-time models

There are two main SPM types in the package: discrete-time model [@Akushevich_2005] and continuous-time model [@Yashin_2007]. Discrete model assumes equal intervals between follow-up observations. The example of discrete dataset is given below.

library(stpm)
data <- simdata_discr(N=10) # simulate data for 10 individuals, "long" format (default)
head(data)
##   id xi t1 t2          y1     y1.next
## 1  1  0 30 31   36.133394   -5.539881
## 2  1  0 31 32   -5.539881  -45.129493
## 3  1  0 32 33  -45.129493  -82.739624
## 4  1  0 33 34  -82.739624 -118.469249
## 5  1  0 34 35 -118.469249 -152.412392
## 6  1  0 35 36 -152.412392 -184.658379

In this case there are equal intervals between \(t_1\) and \(t_2\).

In the continuous-time SPM, in which intervals between observations are not equal (arbitrary or random). The example of such dataset is shown below:

library(stpm)
data <- simdata_cont(N=5, format="short") # simulate data for 5 individuals, "short" format
head(data)
##   id xi  t         y1
## 1  0  0 30   71.22668
## 2  0  0 31   28.86212
## 3  0  0 32  -11.43627
## 4  0  0 33  -49.76927
## 5  0  0 34  -86.23273
## 6  0  0 35 -120.91783

Discrete-time model

The discrete model assumes fixed time intervals between consecutive observations. In this model, \(\mathbf{Y}(t)\) (a \(k \times 1\) matrix of the values of covariates, where \(k\) is the number of considered covariates) and \(\mu(t, \mathbf{Y}(t))\) (the hazard rate) have the following form:

\(\mathbf{Y}(t+1) = \mathbf{u} + \mathbf{R} \mathbf{Y}(t) + \mathbf{\epsilon}\)

\(\mu (t, \mathbf{Y}(t)) = [\mu_0 + \mathbf{b} \mathbf{Y}(t) + \mathbf{Y}(t)^* \mathbf{Q} \mathbf{Y}(t)] e^{\theta t}\)

Coefficients \(\mathbf{u}\) (a \(k \times 1\) matrix, where \(k\) is a number of covariates), \(\mathbf{R}\) (a \(k \times k\) matrix), \(\mu_0\), \(\mathbf{b}\) (a \(1 \times k\) matrix), \(\mathbf{Q}\) (a \(k \times k\) matrix) are assumed to be constant in the particular implementation of this model in the R-package stpm. \(\mathbf{\epsilon}\) are normally-distributed random residuals, \(k \times 1\) matrix. A symbol '*' denotes transpose operation. \(\theta\) is a parameter to be estimated along with other parameters (\(\mathbf{u}\), \(\mathbf{R}\), \(\mathbf{\mu_0}\), \(\mathbf{b}\), \(\mathbf{Q}\)).

Example

library(stpm)
#Data simulation (200 individuals)
data <- simdata_discr(N=100)
#Estimation of parameters
pars <- spm_discrete(data)
pars
## $dmodel
## $dmodel$theta
## [1] 0.2
## 
## $dmodel$mu0
## [1] 9.136222991e-06
## 
## $dmodel$b
## [1] 2.477520974e-08
## 
## $dmodel$Q
##                 [,1]
## [1,] 1.678638383e-11
## 
## $dmodel$u
## [1] -39.86660584
## 
## $dmodel$u.std.err
##     (Intercept) 
## 1.072800413e-13 
## 
## $dmodel$R
##      [,1]
## [1,] 0.95
## 
## $dmodel$R.std.err
##                 y1_1
## [1,] 1.830749633e-16
## 
## $dmodel$Sigma
## [1] 3.194195589e-12
## 
## 
## $cmodel
## $cmodel$a
##       [,1]
## [1,] -0.05
## 
## $cmodel$f1
##              [,1]
## [1,] -797.3321169
## 
## $cmodel$Q
##                 [,1]
## [1,] 1.678638383e-11
## 
## $cmodel$f
##              [,1]
## [1,] -737.9555354
## 
## $cmodel$b
##                 [,1]
## [1,] 3.194195589e-12
## 
## $cmodel$mu0
##                  [,1]
## [1,] -5.278592871e-09
## 
## $cmodel$theta
## [1] 0.2
## 
## 
## attr(,"class")
## [1] "spm.discrete"

Continuous-time model

In the specification of the SPM described in 2007 paper by Yashin and collegaues [@Yashin_2007] the stochastic differential equation describing the age dynamics of a covariate is:

\(d\mathbf{Y}(t)= \mathbf{a}(t)(\mathbf{Y}(t) -\mathbf{f}_1(t))dt + \mathbf{b}(t)d\mathbf{W}(t), \mathbf{Y}(t=t_0)\)

In this equation, \(\mathbf{Y}(t)\) (a \(k \times 1\) matrix) is the value of a particular covariate at a time (age) \(t\). \(\mathbf{f}_1(t)\) (a \(k \times 1\) matrix) corresponds to the long-term mean value of the stochastic process \(\mathbf{Y}(t)\), which describes a trajectory of individual covariate influenced by different factors represented by a random Wiener process \(\mathbf{W}(t)\). Coefficient \(\mathbf{a}(t)\) (a \(k \times k\) matrix) is a negative feedback coefficient, which characterizes the rate at which the process reverts to its mean. In the area of research on aging, \(\mathbf{f}_1(t)\) represents the mean allostatic trajectory and \(\mathbf{a}(t)\) represents the adaptive capacity of the organism. Coefficient \(\mathbf{b}(t)\) (a \(k \times 1\) matrix) characterizes a strength of the random disturbances from Wiener process \(\mathbf{W}(t)\).

The following function \(\mu(t, \mathbf{Y}(t))\) represents a hazard rate:

\(\mu(t, \mathbf{Y}(t)) = \mu_0(t) + (\mathbf{Y}(t) - \mathbf{f}(t))^* \mathbf{Q}(t) (\mathbf{Y}(t) - \mathbf{f}(t))\)

here \(\mu_0(t)\) is the baseline hazard, which represents a risk when \(\mathbf{Y}(t)\) follows its optimal trajectory; \(\mathbf{f}(t)\) (a \(k \times 1\) matrix) represents the optimal trajectory that minimizes the risk and \(\mathbf{Q}(t)\) (\(k \times k\) matrix) represents a sensitivity of risk function to deviation from the norm.

Example

library(stpm)
#Simulate some data for 50 individuals
data <- simdata_cont(N=50)
head(data)
##   id xi t1 t2            y1       y1.next
## 1  0  0 30 31   71.22667883   28.86212066
## 2  0  0 31 32   28.86212066  -11.43627413
## 3  0  0 32 33  -11.43627413  -49.76927447
## 4  0  0 33 34  -49.76927447  -86.23273469
## 5  0  0 34 35  -86.23273469 -120.91783418
## 6  0  0 35 36 -120.91783418 -153.91130546
#Estimate parameters
# a=-0.05, f1=80, Q=2e-8, f=80, b=5, mu0=2e-5, theta=0.08 are starting values for estimation procedure
pars <- spm_continuous(dat=data,a=-0.05, f1=80, Q=2e-8, f=80, b=5, mu0=2e-5, theta=0.08)
## Parameter a achieved lower/upper bound.
## 0 
## Parameter Q achieved lower/upper bound.
## 0 
## Parameter b achieved lower/upper bound.
## 5.5 
## Parameter mu0 achieved lower/upper bound.
## 1.8e-05
pars
## $a
##      [,1]
## [1,]    0
## 
## $f1
##             [,1]
## [1,] 87.20242295
## 
## $Q
##      [,1]
## [1,]    0
## 
## $f
##             [,1]
## [1,] 74.69160644
## 
## $b
##      [,1]
## [1,]  5.5
## 
## $mu0
## [1] 1.8e-05
## 
## $theta
## [1] 0.08763157956
## 
## $status
## [1] 5
## 
## $LogLik
## [1] -25284.7227
## 
## $objective
## [1] 25284.7227
## 
## $message
## [1] "NLOPT_MAXEVAL_REACHED: Optimization stopped because maxeval (above) was reached."
## 
## $limit
## [1] TRUE
## 
## attr(,"class")
## [1] "spm.continuous"

Coefficient conversion between continuous- and discrete-time models

The coefficient conversion between continuous- and discrete-time models is as follows ('c' and 'd' denote continuous- and discrete-time models respectively; note: these equations can be used if intervals between consecutive observations of discrete- and continuous-time models are equal; it also required that matrices \(\mathbf{a}_c\) and \(\mathbf{Q}_{c,d}\) must be full-rank matrices):

\(\mathbf{Q}_c = \mathbf{Q}_d\)

\(\mathbf{a}_c = \mathbf{R}_d - I(k)\)

\(\mathbf{b}_c = \mathbf{\Sigma}\)

\({\mathbf{f}_1}_c = -\mathbf{a}_c^{-1} \times \mathbf{u}_d\)

\(\mathbf{f}_c = -0.5 \mathbf{b}_d \times \mathbf{Q}^{-1}_d\)

\({\mu_0}_c = {\mu _0}_d - \mathbf{f}_c \times \mathbf{Q_c} \times \mathbf{f}_c^*\)

\(\theta_c = \theta_d\)

where \(k\) is a number of covariates, which is equal to model's dimension and '*' denotes transpose operation; \(\mathbf{\Sigma}\) is a \(k \times 1\) matrix which contains s.d.s of corresponding residuals (residuals of a linear regression \(\mathbf{Y}(t+1) = \mathbf{u} + \mathbf{R}\mathbf{Y}(t) + \mathbf{\epsilon}\); s.d. is a standard deviation), \(I(k)\) is an identity \(k \times k\) matrix.

Model with time-dependent coefficients

In previous models, we assumed that coefficients is sort of time-dependant: we multiplied them on to \(e^{\theta t}\). In general, this may not be the case [@Yashin_2007a]. We extend this to a general case, i.e. (we consider one-dimensional case):

\(\mathbf{a(t)} = \mathbf{par}_1 t + \mathbf{par}_2\) - linear function.

The corresponding equations will be equivalent to one-dimensional continuous case described above.

Example

library(stpm)
#Data preparation:
n <- 10
data <- simdata_time_dep(N=n)
# Estimation:
opt.par <- spm_time_dep(data, 
                        start = list(a = -0.05, f1 = 80, Q = 2e-08, f = 80, b = 5, mu0 = 0.001), 
                        frm = list(at = "a", f1t = "f1", Qt = "Q", ft = "f", bt = "b", mu0t= "mu0"))
opt.par
## $a
## [1] -0.05
## 
## $f1
## [1] 80
## 
## $Q
## [1] 2e-08
## 
## $f
## [1] 80
## 
## $b
## [1] 5
## 
## $mu0
## [1] 0.001

Setting lower and upper boundaries of the model parameters

Lower and upper boundaries can be set up with parameters \(lb\) and \(ub\), which represents simple numeric vectors. Note: lengths of \(lb\) and \(ub\) must be the same as the total length of the parameters. Lower and upper boundaries can be set for continuous-time and time-dependent models only.

Setting lb and ub for continuous-time model

One covariate

Below we show the example of setting up \(lb\) and \(ub\) when we have a single covariate:

library(stpm)
data <- simdata_cont(N=10, ystart = 80, a = -0.1, Q = 1e-06, mu0 = 1e-5, theta = 0.08, f1 = 80, f=80, b=1, dt=1, sd0=5)
ans <- spm_continuous(dat=data,
                      a = -0.1,
                      f1 = 82, 
                      Q = 1.4e-6,
                      f = 77,
                      b = 1,
                      mu0 = 1.6e-5,
                      theta = 0.1,
                      stopifbound = FALSE,
                      lb=c(-0.2, 60, 0.1e-6, 60, 0.1, 0.1e-5, 0.01), 
                      ub=c(0, 140, 5e-06, 140, 3, 5e-5, 0.20))
ans
## $a
##                [,1]
## [1,] -0.09031734508
## 
## $f1
##             [,1]
## [1,] 79.45710275
## 
## $Q
##                 [,1]
## [1,] 1.612486204e-07
## 
## $f
##            [,1]
## [1,] 99.7029661
## 
## $b
##              [,1]
## [1,] 0.9906096625
## 
## $mu0
## [1] 1.136897362e-05
## 
## $theta
## [1] 0.1114342063
## 
## $status
## [1] 5
## 
## $LogLik
## [1] -529.0602187
## 
## $objective
## [1] 529.0103672
## 
## $message
## [1] "NLOPT_MAXEVAL_REACHED: Optimization stopped because maxeval (above) was reached."
## 
## $limit
## [1] FALSE
## 
## attr(,"class")
## [1] "spm.continuous"

Two covariates

This is an example for two physiological variables (covariates).

library(stpm)

data <- simdata_cont(N=10, 
                     a=matrix(c(-0.1,  0.001, 0.001, -0.1), nrow = 2, ncol = 2, byrow = T),
                     f1=t(matrix(c(100, 200), nrow = 2, ncol = 1, byrow = F)),
                     Q=matrix(c(1e-06, 1e-7, 1e-7,  1e-06), nrow = 2, ncol = 2, byrow = T),
                     f=t(matrix(c(100, 200), nrow = 2, ncol = 1, byrow = F)),
                     b=matrix(c(1, 2), nrow = 2, ncol = 1, byrow = F),
                     mu0=1e-4,
                     theta=0.08,
                     ystart = c(100,200), sd0=c(5, 10), dt=1)

a.d <- matrix(c(-0.15,  0.002, 0.002, -0.15), nrow = 2, ncol = 2, byrow = T)
f1.d <- t(matrix(c(95, 195), nrow = 2, ncol = 1, byrow = F))
Q.d <- matrix(c(1.2e-06, 1.2e-7, 1.2e-7,  1.2e-06), nrow = 2, ncol = 2, byrow = T)
f.d <- t(matrix(c(105, 205), nrow = 2, ncol = 1, byrow = F))
b.d <- matrix(c(1, 2), nrow = 2, ncol = 1, byrow = F)
mu0.d <- 1.1e-4
theta.d <- 0.07

ans <- spm_continuous(dat=data,
                      a = a.d, 
                      f1 = f1.d,
                      Q = Q.d,
                      f = f.d,
                      b = b.d,
                      mu0 = mu0.d,
                      theta = theta.d,
                      lb=c(-0.5, ifelse(a.d[2,1] > 0, a.d[2,1]-0.5*a.d[2,1], a.d[2,1]+0.5*a.d[2,1]), ifelse(a.d[1,2] > 0, a.d[1,2]-0.5*a.d[1,2], a.d[1,2]+0.5*a.d[1,2]), -0.5,  
                           80, 100, 
                           Q.d[1,1]-0.5*Q.d[1,1], ifelse(Q.d[2,1] > 0, Q.d[2,1]-0.5*Q.d[2,1], Q.d[2,1]+0.5*Q.d[2,1]), ifelse(Q.d[1,2] > 0, Q.d[1,2]-0.5*Q.d[1,2], Q.d[1,2]+0.5*Q.d[1,2]), Q.d[2,2]-0.5*Q.d[2,2],
                           80, 100,
                           0.1, 0.5,
                           0.1e-4,
                           0.01),
                      ub=c(-0.08,  0.002,  0.002, -0.08,  
                           110, 220, 
                           Q.d[1,1]+0.1*Q.d[1,1], ifelse(Q.d[2,1] > 0, Q.d[2,1]+0.1*Q.d[2,1], Q.d[2,1]-0.1*Q.d[2,1]), ifelse(Q.d[1,2] > 0, Q.d[1,2]+0.1*Q.d[1,2], Q.d[1,2]-0.1*Q.d[1,2]), Q.d[2,2]+0.1*Q.d[2,2],
                           110, 220,
                           1.5, 2.5,
                           1.2e-4,
                           0.10))
ans
## $a
##                 [,1]            [,2]
## [1,] -0.150678872024  0.001695644868
## [2,]  0.001981464291 -0.149129598829
## 
## $f1
##             [,1]
## [1,] 105.1278134
## [2,] 194.5480463
## 
## $Q
##                 [,1]            [,2]
## [1,] 1.216268259e-06 1.217400485e-07
## [2,] 1.216825424e-07 1.211856991e-06
## 
## $f
##             [,1]
## [1,] 105.0953201
## [2,] 204.8977588
## 
## $b
##             [,1]
## [1,] 1.122142611
## [2,] 1.934691907
## 
## $mu0
## [1] 0.0001110265608
## 
## $theta
## [1] 0.07362559726
## 
## $status
## [1] 5
## 
## $LogLik
## [1] 1597.030875
## 
## $objective
## [1] -1986.370923
## 
## $message
## [1] "NLOPT_MAXEVAL_REACHED: Optimization stopped because maxeval (above) was reached."
## 
## $limit
## [1] FALSE
## 
## attr(,"class")
## [1] "spm.continuous"

Setting lb and ub for model with time-dependent coefficients

This model uses only one covariate, therefore setting-up model parameters is easy:

n <- 10
data <- simdata_time_dep(N=n)
# Estimation:
opt.par <- spm_time_dep(data, start=list(a=-0.05, f1=80, Q=2e-08, f=80, b=5, mu0=0.001), 
                        lb=c(-1, 30, 1e-8, 30, 1, 1e-6), ub=c(0, 120, 5e-8, 130, 10, 1e-2))
opt.par
## $a
## [1] -0.05
## 
## $f1
## [1] 80
## 
## $Q
## [1] 2e-08
## 
## $f
## [1] 80
## 
## $b
## [1] 5
## 
## $mu0
## [1] 0.001

Special case when some model parameter functions are equal to zero

Imagine a situation when one parameter function you want to be equal to zero: \(f=0\). Let's emulate this case:

library(stpm)
n <- 10
data <- simdata_time_dep(N=n)
# Estimation:
opt.par <- spm_time_dep(data, frm = list(at="a", f1t="f1", Qt="Q", ft="0", bt="b", mu0t="mu0"))
opt.par
## $a
## [1] -0.05
## 
## $f1
## [1] 80
## 
## $Q
## [1] 2e-08
## 
## $b
## [1] 80
## 
## $mu0
## [1] 5
## 
## $<NA>
## <NA> 
##   NA

As you can see, there is no parameter \(f\) in \(opt.par\). This is because we set \(f=0\) in \(frm\)!

Then, is you want to set the constraints, you must not specify the starting value (parameter \(start\)) and \(lb\)/\(ub\) for the parameter \(f\) (otherwise, the function raises an error):

n <- 10
data <- simdata_time_dep(N=n)
# Temporarily commented below
# Estimation:
opt.par <- spm_time_dep(data, frm = list(at="a", f1t="f1", Qt="Q", ft="0", bt="b", mu0t="mu0"), 
                        start=list(a=-0.05, f1=80, Q=2e-08, b=5, mu0=0.001), 
                        lb=c(-1, 30, 1e-8, 1, 1e-6), ub=c(0, 120, 5e-8, 10, 1e-2))
opt.par
## $a
## [1] -0.05
## 
## $f1
## [1] 80
## 
## $Q
## [1] 2e-08
## 
## $b
## [1] 5
## 
## $mu0
## [1] 0.001

You can do the same manner if you want two or more parameters to be equal to zero.

Fast parameter estimating for one-dimentional model using spm_con_1d

Function spm_con_1d(...) allows for very fast parameter estimating for one-dimensional model. This function implements a analytical solution to estimate the parameters in the continuous SPM model by assuming all the parameters are constants. Below there is an example.

library(stpm) 
dat <- simdata_cont(N=500)
colnames(dat) <- c("id", "xi", "t1", "t2", "y", "y.next")
res <- spm_con_1d(as.data.frame(dat), a=-0.05, b=2, q=1e-8, f=80, f1=90, mu0=1e-3, theta=0.08)
## [1] "Initial values:"
## [1] -5e-02  2e+00  1e-08  8e+01  9e+01  1e-03  8e-02
## [1] "Lower bounds:"
## [1] -2.5e-01  2.0e-01  1.0e-10  4.0e+01  4.5e+01  1.0e-04  8.0e-02
## [1] "Upper bounds:"
## [1] -2.5e-03  1.0e+01  1.0e-07  1.6e+02  1.8e+02  1.0e-02  8.0e-02
res
## $est
##               Coeff. Std. Err.   z p value
## a     -0.05154002558       NaN NaN     NaN
## b      4.95735487285       NaN NaN     NaN
## q      0.00000000010       NaN NaN     NaN
## f     79.99999757449       NaN NaN     NaN
## f1    80.21370253549       NaN NaN     NaN
## mu0    0.00010000000       NaN NaN     NaN
## theta  0.08000000000       NaN NaN     NaN
## 
## $hessian
##                      a                b   q                f               f1
## a      2.640661511e+05  6.276954008e+03 NaN  8.040033859e-07 -1.232944694e+01
## b      6.276954008e+03  1.738407871e+03 NaN  5.123800236e-14 -2.089436783e-08
## q                  NaN              NaN NaN              NaN              NaN
## f      8.040033859e-07  5.123800236e-14 NaN  5.716158494e-06  4.840379724e-08
## f1    -1.232944694e+01 -2.089436783e-08 NaN  4.840379724e-08  3.453682533e+00
## mu0    9.123466430e-01  2.367570941e-02 NaN -5.205070487e-03  2.535442120e-04
## theta  1.342515458e-02  3.099587825e-04 NaN -6.248100674e-05  2.922396894e-06
##                    mu0            theta
## a      9.123466430e-01  1.342515458e-02
## b      2.367570941e-02  3.099587825e-04
## q                  NaN              NaN
## f     -5.205070487e-03 -6.248100674e-05
## f1     2.535442120e-04  2.922396894e-06
## mu0   -1.909090923e+08  1.873273175e+09
## theta  1.873273175e+09  1.729401196e+07
## 
## $lik
## [1] 50612.58895
## 
## $con
## [1] 4
## 
## $message
## [1] "NLOPT_XTOL_REACHED: Optimization stopped because xtol_rel or xtol_abs (above) was reached."

Simulation (individual trajectory projection, also known as microsimulations)

We added one- and multi- dimensional simulation to be able to generate test data for hyphotesis testing. Data, which can be simulated can be discrete (equal intervals between observations) and continuous (with arbitrary intervals).

Discrete-time simulation

The corresponding function is (k - a number of variables(covariates), equal to model's dimension):

simdata_discr(N=100, a=-0.05, f1=80, Q=2e-8, f=80, b=5, mu0=1e-5, theta=0.08, ystart=80, tstart=30, tend=105, dt=1)

Here:

N - Number of individuals

a - A matrix of kxk, which characterize the rate of the adaptive response

f1 - A particular state, which if a deviation from the normal (or optimal). This is a vector with length of k

Q - A matrix of k by k, which is a non-negative-definite symmetric matrix

f - A vector-function (with length k) of the normal (or optimal) state

b - A diffusion coefficient, k by k matrix

mu0 - mortality at start period of time (baseline hazard)

theta - A displacement coefficient of the Gompertz function

ystart - A vector with length equal to number of dimensions used, defines starting values of covariates

tstart - A number that defines a start time (30 by default). Can be a number (30 by default) or a vector of two numbers: c(a, b) - in this case, starting value of time is simulated via uniform(a,b) distribution.

tend - A number, defines a final time (105 by default)

dt - A time interval between observations.

This function returns a table with simulated data, as shown in example below:

library(stpm)
data <- simdata_discr(N=10)
head(data)
##   id xi t1 t2          y1     y1.next
## 1  1  0 30 31 82.58098816 85.53705702
## 2  1  0 31 32 85.53705702 87.96047382
## 3  1  0 32 33 87.96047382 91.42769481
## 4  1  0 33 34 91.42769481 86.92396439
## 5  1  0 34 35 86.92396439 87.68313777
## 6  1  0 35 36 87.68313777 87.10164423

Continuous-time simulation

The corresponding function is (k - a number of variables(covariates), equal to model's dimension):

simdata_cont(N=100, a=-0.05, f1=80, Q=2e-07, f=80, b=5, mu0=2e-05, theta=0.08, ystart=80, tstart=c(30,50), tend=105)

Here:

N - Number of individuals

a - A matrix of kxk, which characterize the rate of the adaptive response

f1 - A particular state, which if a deviation from the normal (or optimal). This is a vector with length of k

Q - A matrix of k by k, which is a non-negative-definite symmetric matrix

f - A vector-function (with length k) of the normal (or optimal) state

b - A diffusion coefficient, k by k matrix

mu0 - mortality at start period of time (baseline hazard)

theta - A displacement coefficient of the Gompertz function

ystart - A vector with length equal to number of dimensions used, defines starting values of covariates

tstart - A number that defines a start time (30 by default). Can be a number (30 by default) or a vector of two numbers: c(a, b) - in this case, starting value of time is simulated via uniform(a,b) distribution.

tend - A number, defines a final time (105 by default)

This function returns a table with simulated data, as shown in example below:

library(stpm)
data <- simdata_cont(N=10)
head(data)
##   id xi          t1          t2           y1      y1.next
## 1  0  0 38.59519502 40.28167144  78.67833186  79.58517161
## 2  0  0 40.28167144 41.40018081  79.58517161  86.73389787
## 3  0  0 41.40018081 43.26282328  86.73389787  92.79329971
## 4  0  0 43.26282328 44.77508025  92.79329971 107.15008114
## 5  0  0 44.77508025 46.17216013 107.15008114  98.73339760
## 6  0  0 46.17216013 47.99891699  98.73339760  97.74309482

SPM with partially observed covariates

Stochastic Process Model has many applications in analysis of longitudinal biodemographic data. Such data contain various physiological variables (known as covariates). Data can also potentially contain genetic information available for all or a part of participants. Taking advantage from both genetic and non-genetic information can provide future insights into a broad range of processes describing aging-related changes in the organism.

Method

In this package, SPM with partially observed covariates is implemented in form of GenSPM (Genetic SPM), presented in [@Arbeev_2009] and further advanced in [@Arbeev_2014], further elaborates the basic stochastic process model conception by introducing a categorical variable, \(Z\), which may be a specific value of a genetic marker or, in general, any categorical variable. Currently, \(Z\) has two gradations: 0 or 1 in a genetic group of interest, assuming that \(P(Z=1) = p\), \(p \in [0, 1]\), were \(p\) is the proportion of carriers and non-carriers of an allele in a population. Example of longitudinal data with genetic component \(Z\) is provided below.

library(stpm)
data <- sim_pobs(N=10)
head(data)
##   id xi          t1          t2 Z          y1     y1.next
## 1  0  0 87.21089230 88.17811744 0 79.98233206 81.60908832
## 2  0  0 88.17811744 89.17214398 0 81.60908832 73.14241126
## 3  0  0 89.17214398 90.12955040 0 73.14241126 72.00132985
## 4  0  0 90.12955040 91.08371950 0 72.00132985 66.89753164
## 5  0  0 91.08371950 91.99514342 0 66.89753164 64.42219281
## 6  0  0 91.99514342 93.03106766 0 64.42219281 58.44734165

In the specification of the SPM described in 2007 paper by Yashin and colleagues [@Yashin_2007] the stochastic differential equation describing the age dynamics of a physiological variable (a dynamic component of the model) is:

\(dY(t) = a(Z, t)(Y(t) - f1(Z, t))dt + b(Z, t)dW(t), Y(t = t_0)\)

Here in this equation, \(Y(t)\) is a \(k \times 1\) matrix, where \(k\) is a number of covariates, which is a model dimension) describing the value of a physiological variable at a time (e.g. age) t. \(f_1(Z,t)\) is a \(k \times 1\) matrix that corresponds to the long-term average value of the stochastic process \(Y(t)\), which describes a trajectory of individual variable influenced by different factors represented by a random Wiener process \(W(t)\). The negative feedback coefficient \(a(Z,t)\) (\(k \times k\) matrix) characterizes the rate at which the stochastic process goes to its mean. In research on aging and well-being, \(f_1(Z,t)\) represents the average allostatic trajectory and \(a(t)\) in this case represents the adaptive capacity of the organism. Coefficient \(b(Z,t)\) (\(k \times 1\) matrix) characterizes a strength of the random disturbances from Wiener process \(W(t)\). All of these parameters depend on \(Z\) (a genetic marker having values 1 or 0). The following function \(\mu(t,Y(t))\) represents a hazard rate:

\(\mu(t,Y(t)) = \mu_0(t) + (Y(t) - f(Z, t))^*Q(Z, t)(Y(t) - f(Z, t))\)

In this equation: \(\mu_0(t)\) is the baseline hazard, which represents a risk when \(Y(t)\) follows its optimal trajectory; f(t) (\(k \times 1\) matrix) represents the optimal trajectory that minimizes the risk and \(Q(Z, t)\) (\(k \times k\) matrix) represents a sensitivity of risk function to deviation from the norm. In general, model coefficients \(a(Z, t)\), \(f1(Z, t)\), \(Q(Z, t)\), \(f(Z, t)\), \(b(Z, t)\) and \(\mu_0(t)\) are time(age)-dependent. Once we have data, we then can run analysis, i.e. estimate coefficients (they are assumed to be time-independent and data here is simulated):

library(stpm)
#Generating data:
data <- sim_pobs(N=10)
head(data)
##   id xi          t1          t2 Z          y1     y1.next
## 1  0  0 63.11667825 64.02130508 0 77.56719074 74.46960087
## 2  0  0 64.02130508 65.02631509 0 74.46960087 71.59804672
## 3  0  0 65.02631509 66.04423365 0 71.59804672 68.89249589
## 4  0  0 66.04423365 67.04513816 0 68.89249589 76.81077774
## 5  0  0 67.04513816 68.13836281 0 76.81077774 74.96693614
## 6  0  0 68.13836281 69.14715014 0 74.96693614 64.42327653
#Parameters estimation:
pars <- spm_pobs(x=data)
pars
## $aH
##                [,1]
## [1,] -0.05406488234
## 
## $aL
##                 [,1]
## [1,] -0.009931526099
## 
## $f1H
##             [,1]
## [1,] 54.48500195
## 
## $f1L
##             [,1]
## [1,] 87.21420888
## 
## $QH
##                 [,1]
## [1,] 1.878640675e-08
## 
## $QL
##                 [,1]
## [1,] 2.616943089e-08
## 
## $fH
##            [,1]
## [1,] 57.2508126
## 
## $fL
##             [,1]
## [1,] 83.07859361
## 
## $bH
##             [,1]
## [1,] 3.611982031
## 
## $bL
##             [,1]
## [1,] 5.169067084
## 
## $mu0H
## [1] 8.330474754e-06
## 
## $mu0L
## [1] 9.007042292e-06
## 
## $thetaH
## [1] 0.07321248004
## 
## $thetaL
## [1] 0.09005961411
## 
## $p
## [1] 0.2384110394
## 
## $limit
## [1] FALSE
## 
## attr(,"class")
## [1] "pobs.spm"

Here \textbf{H} and \textbf{L} represents parameters when \(Z\) = 1 (H) and 0 (L).

###Joint analysis of two datasets: first dataset with genetic and second dataset with non-genetic component

library(stpm)
data.genetic <- sim_pobs(N=5, mode='observed')
head(data.genetic)
##   id xi          t1          t2 Z          y1     y1.next
## 1  0  0 36.57300264 37.54328884 1 81.71260624 82.56430012
## 2  0  0 37.54328884 38.52377532 1 82.56430012 84.41819331
## 3  0  0 38.52377532 39.61619584 1 84.41819331 89.48995318
## 4  0  0 39.61619584 40.55648777 1 89.48995318 76.61111401
## 5  0  0 40.55648777 41.49180407 1 76.61111401 79.11805016
## 6  0  0 41.49180407 42.47815248 1 79.11805016 74.40481637
data.nongenetic <- sim_pobs(N=10, mode='unobserved')
head(data.nongenetic)
##   id xi          t1          t2          y1     y1.next
## 1  0  0 45.40504819 46.47892624 80.46564725 77.39762958
## 2  0  0 46.47892624 47.40445410 77.39762958 74.10118173
## 3  0  0 47.40445410 48.47629261 74.10118173 68.69041857
## 4  0  0 48.47629261 49.51821072 68.69041857 71.55355641
## 5  0  0 49.51821072 50.50122545 71.55355641 73.27115267
## 6  0  0 50.50122545 51.55911484 73.27115267 74.61010574
#Parameters estimation:
pars <- spm_pobs(x=data.genetic, y = data.nongenetic, mode='combined')
## Parameter thetaH achieved lower/upper bound.
## 0.072
pars
## $aH
##                [,1]
## [1,] -0.03669878625
## 
## $aL
##                [,1]
## [1,] -0.01080678315
## 
## $f1H
##             [,1]
## [1,] 55.83314604
## 
## $f1L
##             [,1]
## [1,] 83.56409523
## 
## $QH
##                [,1]
## [1,] 2.14296873e-08
## 
## $QL
##                 [,1]
## [1,] 1.702318677e-08
## 
## $fH
##             [,1]
## [1,] 56.09645823
## 
## $fL
##             [,1]
## [1,] 80.03943036
## 
## $bH
##             [,1]
## [1,] 4.226919935
## 
## $bL
##             [,1]
## [1,] 4.902493514
## 
## $mu0H
## [1] 7.202059299e-06
## 
## $mu0L
## [1] 9.036337798e-06
## 
## $thetaH
## [1] 0.072
## 
## $thetaL
## [1] 0.09007933941
## 
## $p
## [1] 0.2749075696
## 
## $limit
## [1] TRUE
## 
## attr(,"class")
## [1] "pobs.spm"

Here mode 'observed' is used for simlation of data with genetic component \(Z\) and 'unobserved' - without genetic component.

Genetic SPM 'GSPM'

This type of SPM also uses genetic component by analogy from the previous chapters but uses explicit gradient function which speeds up computations significantly. See [@He_2017] for details. Below we provide examples of usage:

library(stpm) 
data(ex_spmcon1dg)
head(ex_data$spm_data)
##   id xi t1 t2           y      y.next
## 1  1  0 30 31 2.000000000 2.024328135
## 2  1  0 31 32 2.024328135 1.927486318
## 3  1  0 32 33 1.927486318 1.899083801
## 4  1  0 33 34 1.899083801 2.061574385
## 5  1  0 34 35 2.061574385 2.034558435
## 6  1  0 35 36 2.034558435 2.114382051
head(ex_data$gene_data)
##   id geno
## 1  1    1
## 2  2    1
## 3  3    0
## 4  4    0
## 5  5    1
## 6  6    0
res <- spm_con_1d_g(spm_data=ex_data$spm_data, 
                    gene_data=ex_data$gene_data, 
                    a = -0.02, b=0.2, q=0.01, f=3, f1=3, mu0=0.01, theta=1e-05, 
                    upper=c(-0.01,3,0.1,10,10,0.1,1e-05), lower=c(-1,0.01,0.00001,1,1,0.001,1e-07), 
                    effect=c('q'), method = "tnewton")
## [1] "Initial values:"
##  [1] -2e-02 -2e-02  2e-01  2e-01  1e-02  1e-02  3e+00  3e+00  3e+00  3e+00
## [11]  1e-02  1e-02  1e-05
## [1] "Lower bounds:"
##  [1] -1e+00 -1e+00  1e-02  1e-02  1e-05  1e-05  1e+00  1e+00  1e+00  1e+00
## [11]  1e-03  1e-03  1e-07
## [1] "Upper bounds:"
##  [1] -1e-02 -1e-02  3e+00  3e+00  1e-01  1e-01  1e+01  1e+01  1e+01  1e+01
## [11]  1e-01  1e-01  1e-05
res
## $est
##                Coeff.       Std. Err.                z      p value
## a     -0.031238526396 9.063948538e-04 -3.446458932e+01 0.0000000000
## b      0.101329776535 2.785757812e-04  3.637422323e+02 0.0000000000
## q_0    0.000010000000 3.124724202e-03  3.200282442e-03 0.9974465484
## q_2    0.003608789876 5.278940497e-03  6.836201087e-01 0.4942150834
## f      2.999731527110 3.052250225e-02  9.827934494e+01 0.0000000000
## f1     3.000056947322 1.716252597e-02  1.748027622e+02 0.0000000000
## mu0    0.001000000000 1.552500478e+04  6.441221851e-08 0.9999999486
## theta  0.000000100000 8.987112435e+01  1.112704450e-09 0.9999999991
## 
## $lik
## [1] -121546.402
## 
## $con
## [1] -1
## 
## $message
## [1] "NLOPT_FAILURE: Generic failure code."
## 
## $hessian
##                      a                b             q_0              q_2
## a      2.354652039e+06  6.786921795e+05   158.058062204    82.3853495314
## b      6.786921795e+05  1.325446527e+07  1517.123161715   775.1749357615
## q_0    1.580580622e+02  1.517123162e+03   -39.193641214   -13.8346645758
## q_2    8.238534953e+01  7.751749358e+02   -13.834664576   -13.8346698668
## f      3.489487410e-01 -4.196232148e-03 36528.531980456 19000.8175924782
## f1    -8.446522178e+04  5.065699701e+01    -2.840122726    -1.4581382745
## mu0    2.228043741e-03  8.307251846e-04     1.053761684     0.6444883984
## theta  4.230830818e-01  1.744572073e-01   194.433867000   117.7827687934
##                      f               f1              mu0            theta
## a      3.489487410e-01 -8.446522178e+04  2.228043741e-03   0.423083081841
## b     -4.196232148e-03  5.065699701e+01  8.307251846e-04   0.174457207322
## q_0    3.652853198e+04 -2.840122726e+00  1.053761684e+00 194.433867000043
## q_2    1.900081759e+04 -1.458138275e+00  6.444883984e-01 117.782768793404
## f      1.670661873e+02 -1.202720644e-02  3.587745596e-03   0.465787947178
## f1    -1.202720644e-02  6.470562259e+03 -5.237961886e-05  -0.006925314665
## mu0    3.587745596e-03 -5.237961886e-05  0.000000000e+00   0.000000000000
## theta  4.657879472e-01 -6.925314665e-03  0.000000000e+00   0.000000000000
## 
## $beta
##                  Coeff.      Std. Err.         Chi. Sq      p value
## beta_a               NA             NA              NA           NA
## beta_b               NA             NA              NA           NA
## beta_q   0.001799394938 0.004201832138 0.0007705738917 0.9778541969
## beta_f               NA             NA              NA           NA
## beta_f1              NA             NA              NA           NA
## beta_mu0             NA             NA              NA           NA

Here: spm_data - A dataset for the SPM model. See the STPM package for more details about the format.

gene_data - A two column dataset containing the genotypes for the individuals in spm_data. The first column id is the ID of the individuals in dataset spm_data, and the second column geno is the genotype.

a - The initial value for the paramter \eqn{a}. The initial value will be predicted if not specified.

b - The initial value for the paramter \eqn{b}. The initial value will be predicted if not specified.

q - The initial value for the paramter \eqn{q}. The initial value will be predicted if not specified.

f - The initial value for the paramter \eqn{f}. The initial value will be predicted if not specified.

f1 - The initial value for the paramter \eqn{f_1}. The initial value will be predicted if not specified.

mu0 - The initial value for the paramter \eqn{\mu_0} in the baseline hazard. The initial value will be predicted if not specified.

theta - The initial value for the paramter \eqn{\theta} in the baseline hazard. The initial value will be predicted if not specified.

lower - A vector of the lower bound of the parameters.

upper - A vector of the upper bound of the parameters.

effect - A character vector of the parameters that are linked to genotypes. The vector can contain any combination of \code{a}, \code{b}, \code{q}, \code{f}, \code{mu0}.

control - A list of the control parameters for the optimization paramters.

global - A logical variable indicating whether the MLSL (TRUE) or the L-BFGS (FALSE) algorithm is used for the optimization.

verbose - A logical variable indicating whether initial information is printed.

ahessian - A logical variable indicating whether the approximate (FALSE) or analytical (TRUE) Hessian is returned.

est - The estimates of the parameters.

hessian - The Hessian matrix of the estimates.

lik - The minus log-likelihood.

con - A number indicating the convergence. See the 'nloptr' package for more details.

message - Extra message about the convergence. See the 'nloptr' package for more details.

beta - The coefficients of the genetic effect on the parameters to be linked to genotypes.

Multiple imputation with spm.impute(…)

The SPM offers longitudinal data imputation with results that are better than from other imputation tools since it preserves data structure, i.e. relation between Y(t) and mu(Y(t),t). Below there are two examples of multiple data imputation with function spm.impute(…).

library(stpm)

#######################################################################
############## One dimensional case (one covariate) ###################
#######################################################################

## Data preparation (short format)#
data <- simdata_discr(N=1000, dt = 2, format="short")

miss.id <- sample(x=dim(data)[1], size=round(dim(data)[1]/4)) # ~25% missing data
incomplete.data <- data
incomplete.data[miss.id,4] <- NA
# End of data preparation #

##### Multiple imputation with SPM #####
imp.data <- spm.impute(x=incomplete.data, id=1, case="xi", t1=3, covariates="y1", minp=1, theta_range=seq(0.075, 0.09, by=0.001))$imputed

##### Look at the incomplete data with missings #####
head(incomplete.data)
##   id xi  t          y1
## 1  1  0 30 82.21397004
## 2  1  0 32          NA
## 3  1  0 34 85.03517916
## 4  1  0 36 85.42723336
## 5  1  0 38 82.62521178
## 6  1  0 40          NA
##### Look at the imputed data #####
head(imp.data)
##   id xi  t          y1
## 1  1  0 30 82.21397004
## 2  1  0 32 81.93599225
## 3  1  0 34 85.03517916
## 4  1  0 36 85.42723336
## 5  1  0 38 82.62521178
## 6  1  0 40 82.32537105
#########################################################
################ Two-dimensional case ###################
#########################################################

## Parameters for data simulation #
a <- matrix(c(-0.05, 0.01, 0.01, -0.05), nrow=2)
f1 <- matrix(c(90, 30), nrow=1, byrow=FALSE)
Q <- matrix(c(1e-7, 1e-8, 1e-8, 1e-7), nrow=2)
f0 <- matrix(c(80, 25), nrow=1, byrow=FALSE)
b <- matrix(c(5, 3), nrow=2, byrow=TRUE)
mu0 <- 1e-04
theta <- 0.07
ystart <- matrix(c(80, 25), nrow=2, byrow=TRUE)

## Data preparation #
data <- simdata_discr(N=1000, a=a, f1=f1, Q=Q, f=f0, b=b, ystart=ystart, mu0 = mu0, theta=theta, dt=2, format="short")

## Delete some observations in order to have approx. 25% missing data
incomplete.data <- data
miss.id <- sample(x=dim(data)[1], size=round(dim(data)[1]/4)) 
incomplete.data <- data
incomplete.data[miss.id,4] <- NA
miss.id <- sample(x=dim(data)[1], size=round(dim(data)[1]/4)) 
incomplete.data[miss.id,5] <- NA
## End of data preparation #

###### Multiple imputation with SPM #####
imp.data <- spm.impute(x=incomplete.data, id=1, case="xi", t1=3, covariates=c("y1", "y2"), minp=1, theta_range=seq(0.060, 0.07, by=0.001))$imputed

###### Look at the incomplete data with missings #####
head(incomplete.data)
##   id xi  t          y1          y2
## 1  1  0 30 73.58167132 17.53079936
## 2  1  0 32 83.53203831 20.71095517
## 3  1  0 34 81.32282321          NA
## 4  1  0 36 84.73096312          NA
## 5  1  0 38 84.78126323 23.52301468
## 6  1  0 40 79.48244838 26.76178054
###### Look at the imputed data #####
head(imp.data)
##   id xi  t          y1          y2
## 1  1  0 30 73.58167132 17.53079936
## 2  1  0 32 83.53203831 20.71095517
## 3  1  0 34 81.32282321 20.60595418
## 4  1  0 36 84.73096312 20.48709461
## 5  1  0 38 84.78126323 23.52301468
## 6  1  0 40 79.48244838 26.76178054

Prediction

We provide a simple function to predict the next value of \code{Y}. Refer to the example below:

#library(stpm)
#data <- simdata_discr(N=100, format="long")
#res <- spm_discrete(data)
#splitted <- split(data, data$id)
#df <- data.frame()
#lapply(1:100, function(i) {df<<-rbind(df,splitted[[i]][dim(splitted[[i]])[1],c("id", "xi", "t1", "y1")])})
#names(df) <- c("id", "xi", "t", "y")
#predicted <- predict(object=res, data=df, dt=3)
#head(predicted)

Hypothesis testing

The package offers following five hypotheses to test for \code{spm_time_dep(…)} function [@Arbeev_2016]:

H01: \(Q(t)=0\) (i.e., \(a_Q = 0\) and \(b_Q = 0\),so that there is no quadratic term in the hazard rate and mortality is described by the baseline Gompertz rate \(μ_0(t)\)).

H02: \(Q(t) = a_Q\) (i.e., \(b_Q = 0\)).

H03: \(f_1(t) = 0\) (i.e., \(a_{f1} = 0\) and \(b_{f1} = 0\)).

H04: \(f_1(t) = a_{f_1}\) (i.e., \(b_{f_1} = 0\)).

H05: \(a(t) = a_Y\) (i.e., \(b_Y = 0\)).

To perform hypothesis testing you should put the variable lrtest to TRUE (this is "H01" by default) or to any of the following: "H01", "H02", "H03", "H04", "H05".

Examples of hypothesis testing

library(stpm)
n <- 1000

# Data simulation:
data <- simdata_time_dep(N=n, format="long")
head(data)

# Hypotheses testing

## H01
res <- spm_time_dep(data, verbose=F, 
                    frm = list(at="a", f1t="f1", Qt="Q", ft="f", bt="b", mu0t="mu0"),
                    start=list(a=-0.05, f1=80, Q=1e-8, f=90, b=5, mu0=0.001), 
                    lb=c(a=-1, f1=30, Q=1e-9, f=10, b=1, mu0=1e-6), 
                    ub=c(a=0, f1=120, Q=1e-7, f=150, b=10, mu0=1e-2),
                    opts = list(algorithm = "NLOPT_LN_NELDERMEAD", 
                    maxeval = 200, ftol_rel = 1e-12), lrtest="H01")

res$alternative$lr.test.pval

## H02
res <- spm_time_dep(data, verbose=F, 
                    frm = list(at="a", f1t="f1", Qt="1e-6", ft="f", bt="b", mu0t="mu0"),
                    start=list(a=-0.05, f1=80, Q=1e-8, f=90, b=5, mu0=0.001), 
                    lb=c(a=-1, f1=30, Q=1e-9, f=10, b=1, mu0=1e-6), 
                    ub=c(a=0, f1=120, Q=1e-7, f=150, b=10, mu0=1e-2),
                    opts = list(algorithm = "NLOPT_LN_NELDERMEAD", 
                    maxeval = 200, ftol_rel = 1e-12), lrtest="H02")

res$alternative$lr.test.pval

## H03
res <- spm_time_dep(data, verbose=F, 
                    frm = list(at="a", f1t="f1", Qt="Q", ft="f", bt="b", mu0t="mu0"),
                    start=list(a=-0.05, f1=80, Q=1e-8, f=90, b=5, mu0=0.001), 
                    ub=c(a=0, f1=120, Q=1e-7, f=150, b=10, mu0=1e-2),
                    opts = list(algorithm = "NLOPT_LN_NELDERMEAD", 
                    maxeval = 200, ftol_rel = 1e-12), lrtest="H03")

res$alternative$lr.test.pval

## H04
res <- spm_time_dep(data, verbose=F, 
                    frm = list(at="a", f1t="120", Qt="Q", ft="f", bt="b", mu0t="mu0"),
                    start=list(a=-0.05, f1=80, Q=1e-8, f=90, b=5, mu0=0.001), 
                    lb=list(a=-1, f1=30, Q=1e-9, f=10, b=1, mu0=1e-6), 
                    opts = list(algorithm = "NLOPT_LN_NELDERMEAD", 
                    maxeval = 200, ftol_rel = 1e-12), lrtest="H04")

res$alternative$lr.test.pval

## H05
res <- spm_time_dep(data, verbose=F, 
                    frm = list(at="-0.1", f1t="f1", Qt="Q", ft="f", bt="b", mu0t="mu0"),
                    start=list(a=-0.05, f1=80, Q=1e-8, f=90, b=5, mu0=0.001), 
                    opts = list(algorithm = "NLOPT_LN_NELDERMEAD", 
                    maxeval = 200, ftol_rel = 1e-12), lrtest="H05")

res$alternative$lr.test.pval

##References