In biostatistics and medical research, longitudinal data often consist of repeated assessments of a variable together with dichotomous indicators that mark an event of interest. Consequently, joint modeling of longitudinal and time-to-event data has generated much interest in these disciplines over the past decade. In the behavioral sciences, too, we are often interested in relating individual trajectories to discrete events. Yet joint modeling is rarely applied in the behavioral sciences. This tutorial presents an overview and general framework for joint modeling of longitudinal and time-to-event data, and fully illustrates its application in the context of a behavioral study with the JMbayes R package. In particular, the tutorial discusses practical topics such as model selection and comparison, choice of joint modeling parameterization, and interpretation of model parameters. Overall, this tutorial aims to introduce the theory of joint modeling in a didactic manner and to introduce novice analysts to the use of the JMbayes package.

In many research settings, it is very common to record longitudinal observations that correspond to responses on continuous measures alongside dichotomous indicators to mark the occurrence of an event of interest. A prototypical example in clinical research is the repeated assessment of biological measures (e.g., blood pressure, antibody affinity, cholesterol level) that may relate to an event such as death, recovery from a disease, or disease diagnosis. In psychological research, much interest revolves around the association between long-term individual trajectories of cognitive performance and imminent death (e.g., terminal decline or terminal drop hypotheses;

Methods for the separate analysis of such outcomes are well established in the literature, and these frequently include the use of mixed-effects models for the longitudinal portion of the data and the Cox proportional hazards (PH) model for the time-to-event data (

Joint modeling is increasingly used in medical research (see for example

This tutorial is therefore aimed at explaining basic features of the joint modeling framework for longitudinal and time-to-event data for users familiar with mixed-effects and time-to-event (a.k.a. survival) models. The overall aim is to guide readers step by step, such that at the end of the tutorial they acquire the basic concepts of joint modeling and are able to apply this methodology to their own research data. To do so, we initially reviewed the seven currently available joint-modeling packages in the open-source

We note that the aim of this tutorial is not to replace the

Below we describe the major features of the statistical theory behind joint modeling. For the sake of completeness, we show the most important equations of both the mixed-effects and the survival model, which lead to the joint model. Readers interested mainly in a conceptual understanding and in an empirical application of joint modeling may read the next paragraph, skip the remaining portions of this section (except the first paragraph of each subsection), and proceed to the next section.

The joint model, first described by

Longitudinal data here consist of repeated measurements of the same outcome for each study participant over a given period of time. Repeated measurements for the same individual are typically correlated, and therefore each individual in the population is expected to have his or her own subject-specific response pattern over time. The mixed-effects model (

Let

The second line of

Time-to-event, or survival, analysis refers to statistical methods for analyzing time-to-event data. An event time is the time elapsed up to the occurrence of an event of interest (such as death or disease diagnosis), given that it has not yet occurred. Time-to-event data are characterized by the fact that the event of interest may not have occurred for every participant during the study. This type of missing data is treated in the same way as data from a participant who drops out before the end of the study, and both scenarios constitute
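As a small illustration of how such incomplete event times are coded in practice, the sketch below builds a right-censored outcome with `Surv()` from the survival package. The data frame `toy` and all its values are hypothetical and are not taken from the MLSC data.

```r
library(survival)

# Hypothetical toy data: follow-up time (years) and event indicator
# (1 = died during follow-up, 0 = still alive when last observed)
toy <- data.frame(
  id   = 1:4,
  time = c(5, 8, 12, 12),
  dead = c(1, 0, 1, 0)
)

# Surv() pairs each time with its status; censored times print with a trailing "+"
surv_obj <- with(toy, Surv(time, dead))
print(surv_obj)

# Number of observed events and of censored observations
n_events   <- sum(toy$dead == 1)
n_censored <- sum(toy$dead == 0)
```

The same two-column coding (a time and a 0/1 status) is what `coxph()` expects on the left-hand side of its formula.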

There are different models for time-to-event data (

The Cox PH submodel can be formulated as
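In the standard notation of the survival literature (the symbols below are an assumption and may differ from those used elsewhere in this tutorial), the hazard of individual $i$ at time $t$ under a Cox PH model with baseline covariates is commonly written as:

```latex
h_i(t) = h_0(t)\,\exp\!\left(\gamma^{\top} w_i\right)
```

Here $h_0(t)$ is the baseline hazard, $w_i$ is the vector of baseline covariates for individual $i$, and $\gamma$ is the vector of log hazard ratios. The model is called proportional because the ratio of the hazards of two individuals, $\exp\{\gamma^{\top}(w_i - w_j)\}$, does not depend on $t$.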

Joint models formally associate the longitudinal and time-to-event processes through

The first association structure between the longitudinal and the time-to-event submodels we discuss is known as the “current value”, and assumes that for individual

This equation associates

The “current value” parameterization, however, does not distinguish between individuals who have, at a specific time point, an equal longitudinal score (an equal

The corresponding submodel of this association structure has the form
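A common way of writing the "current value plus slope" structure in the joint modeling literature is sketched below. The notation is an assumption: $m_i(t)$ stands for the error-free longitudinal value of individual $i$ at time $t$, $m_i'(t)$ for its time derivative, and $w_i$ for the baseline covariates.

```latex
h_i(t) = h_0(t)\,\exp\!\left\{\gamma^{\top} w_i + \alpha_1\, m_i(t) + \alpha_2\, m_i'(t)\right\},
\qquad m_i'(t) = \frac{\mathrm{d}\, m_i(t)}{\mathrm{d}t}
```

Under this form, $\alpha_1$ quantifies the association between the current level of the longitudinal outcome and the hazard, and $\alpha_2$ quantifies the association between its current rate of change and the hazard. Setting $\alpha_2 = 0$ gives back the "current value" structure.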

The third association structure between the time-to-event and the longitudinal submodels is a “shared random-effects” parameterization (

This parameterization is computationally simpler than the “current value” and the “current value plus slope” parameterizations, because the associative part

The

Both the

We illustrate the joint modeling approach by applying

The data come from the Manchester Longitudinal Study of Cognition (MLSC;

PS was assessed up to four times at approximately 3-year intervals. Relations between cognition and survival have been the focus of several previous MLSC analyses (see

Mixed-effects models require repeated data measured across time on each individual, and in our data set we have up to four individual measurements of the PS variable. There is no well-agreed-upon rule of thumb concerning the minimal amount of data required for fitting a longitudinal model properly, but the complexity of the fit (e.g., quadratic, cubic or non-parametric effects of the longitudinal variable) depends on the available amount of repeated data (e.g.,

For the Cox model, the “1 in 10 rule” states that for every 10 occurrences of the event of interest, 1 predictor can be added to the model (
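The arithmetic behind this rule is simple enough to sketch. The snippet below assumes the event counts of 1562 and 296 deaths that appear in the `survfit` output reported later in this tutorial for the two Smoker groups; the variable names are illustrative only.

```r
# Number of deaths in the two Smoker groups (taken from the survfit output)
events <- c(1562, 296)

# "1 in 10 rule": at most one predictor per 10 observed events
total_events   <- sum(events)
max_predictors <- floor(total_events / 10)

total_events
max_predictors
```

With this many events, the rule is far from binding for a submodel with only two predictors.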

We hypothesize that mortality may explicitly depend on previous PS trajectories, whereby individuals with lower scores and/or steeper trajectories should have a greater hazard of dying than those performing better. We also posit that, given the PS-mortality association, adequate description of cognitive change in adult individuals should statistically take into account their (future) mortality information. A previous survival analysis of these data showed that the PH assumption was not met when comparing mortality risk for male and female participants (

We first select a submodel for the longitudinal part of the data. Based on a previous literature search on the use of the mixed-effects model in cognitive aging research (

We then, separately, select the optimal time-to-event submodel. To do so we rely on the Cox PH model that relates smoking status and initial age to the timing and occurrence of death. For each covariate we test the PH assumption and then retain the submodel that best describes the mortality process based on the selected covariates. Finally, we combine both submodels in a joint model and compare four association structures between the two submodels.

The choice of whether to include given predictors in the mixed-effects and Cox submodels should be theoretically driven. For the longitudinal model, the objective is to describe each individual's trajectory of the variable of interest across time while accounting for measurement noise. Besides the obvious choice of including time, at both the overall and subject-specific levels, any other predictor that might improve the prediction accuracy of the outcome should be added. Likewise, for the time-to-event submodel, any predictor that might help forecast the hazard of death should be considered for inclusion. Again, time is a natural predictor here, too. Finally, if theoretically meaningful, given predictors can be included in both the mixed-effects

As required by the

The spline approach has gained much popularity in recent years because of its flexibility in describing subject-specific longitudinal trajectories that follow highly nonlinear functions and that, individually, deviate from the average sample function. A spline is a piecewise function composed of polynomials adjusted to data within adjacent intervals, separated by equidistant points called nodes. The degree of the spline is defined by the polynomial of the highest degree used. For instance, if the function fitted to the data within an interval is a straight line, the spline is said to be of degree

The user must select the optimal number of nodes, which is a rather arbitrary component of splines. However, this value should be less than or equal to the number of repeated measures. To this end, there is no automatic selection procedure implemented in the
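To make the notions of degree and nodes more concrete, the following sketch builds two spline bases with the splines package. The `age` grid and the hand-picked node positions are hypothetical. Note also that `ns()` places its interior nodes (called knots in the package) at quantiles of the variable by default, which is an implementation detail of that function rather than a requirement of splines in general.

```r
library(splines)

# Hypothetical grid of centered ages
age <- seq(0, 40, by = 0.5)

# Degree-1 (piecewise linear) B-spline basis with three interior knots chosen by hand
basis_linear <- bs(age, degree = 1, knots = c(10, 20, 30))

# Natural cubic spline basis with 3 degrees of freedom, as in ns(Age_50, 3)
basis_cubic <- ns(age, df = 3)

# Each column is one basis function; a fitted spline is a weighted sum of the columns
dim(basis_linear)
dim(basis_cubic)

# Interior knots that ns() selected automatically
attr(basis_cubic, "knots")
```

Each additional node adds one column to the basis, and therefore one fixed-effect coefficient to estimate, which is why the number of nodes is limited by the number of repeated measures.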

As explained in

For this illustration, we modeled change in cognitive performance as a function of age in years centered at age 50 (

We estimated the following specifications of the longitudinal submodel:

```
library(nlme)     # lme(), lmeControl()
library(splines)  # ns()

# Optimizer settings; pass `control = ctrl` to lme() if the default optimizer fails to converge
ctrl <- lmeControl(opt = "optim")

# Polynomial fixed effects of age (degree 1 to 3), random intercept and slope per participant
lmeFit.mlsc1 <- lme(FS_SPD ~ Age_50 + AgeStart,
                    data = MLSC_fem2, random = ~ Age_50 | ID)
lmeFit.mlsc2 <- lme(FS_SPD ~ Age_50 + I(Age_50^2) + AgeStart,
                    data = MLSC_fem2, random = ~ Age_50 | ID)
lmeFit.mlsc3 <- lme(FS_SPD ~ Age_50 + I(Age_50^2) + I(Age_50^3) + AgeStart,
                    data = MLSC_fem2, random = ~ Age_50 | ID)

# Natural cubic spline fixed effects of age with 2 to 4 degrees of freedom
lmeFit.mlsc4 <- lme(FS_SPD ~ ns(Age_50, 2) + AgeStart,
                    data = MLSC_fem2, random = ~ Age_50 | ID)
lmeFit.mlsc5 <- lme(FS_SPD ~ ns(Age_50, 3) + AgeStart,
                    data = MLSC_fem2, random = ~ Age_50 | ID)
lmeFit.mlsc6 <- lme(FS_SPD ~ ns(Age_50, 4) + AgeStart,
                    data = MLSC_fem2, random = ~ Age_50 | ID)

# Refit with maximum likelihood so that models with different fixed effects can be compared
m1 <- update(lmeFit.mlsc1, method = "ML")
m2 <- update(lmeFit.mlsc2, method = "ML")
m3 <- update(lmeFit.mlsc3, method = "ML")
m4 <- update(lmeFit.mlsc4, method = "ML")
m5 <- update(lmeFit.mlsc5, method = "ML")
m6 <- update(lmeFit.mlsc6, method = "ML")

# Extensions of model 2 with interactions between age and AgeStart
lmeFit.mlsc2.1 <- lme(FS_SPD ~ Age_50 * AgeStart + I(Age_50^2),
                      data = MLSC_fem2, random = ~ Age_50 | ID)
lmeFit.mlsc2.2 <- lme(FS_SPD ~ Age_50 * AgeStart + I(Age_50^2) * AgeStart,
                      data = MLSC_fem2, random = ~ Age_50 | ID)
```

Note that

We evaluated the adjustments of the six submodels in terms of their respective BIC values. The

`BIC(m1,m2,m3,m4,m5,m6)`

and the results are displayed in

Model | df | BIC |
---|---|---|
m1 | 7 | 29414.87 |
m2 | 8 | 29158.73 |
m3 | 9 | 29167.24 |
m4 | 8 | 29162.86 |
m5 | 9 | 29170.97 |
m6 | 10 | 29178.68 |
m2.1 | 9 | 29155.24 |
m2.2 | 10 | 29163.72 |
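Because the MLSC data are not distributed with this tutorial, the comparison strategy can be tried on simulated data. The sketch below is a minimal example under stated assumptions: the data set `sim`, its variable names and all parameter values are invented for illustration, and only two candidate models are compared. Both models are fitted with `method = "ML"`, since likelihood-based criteria from REML fits are not comparable across models with different fixed effects.

```r
library(nlme)

set.seed(123)

# Hypothetical design: 100 participants, 4 occasions spaced 3 years apart
n_id  <- 100
times <- c(0, 3, 6, 9)
sim   <- expand.grid(time = times, id = seq_len(n_id))

# Subject-specific intercepts and slopes around a quadratic average trajectory
b0 <- rnorm(n_id, sd = 2)
b1 <- rnorm(n_id, sd = 0.3)
sim$y <- 30 + b0[sim$id] + (0.5 + b1[sim$id]) * sim$time -
  0.1 * sim$time^2 + rnorm(nrow(sim), sd = 1)

ctrl <- lmeControl(opt = "optim")

# Linear versus quadratic fixed effect of time, same random-effects structure
fit_lin  <- lme(y ~ time, random = ~ time | id, data = sim,
                method = "ML", control = ctrl)
fit_quad <- lme(y ~ time + I(time^2), random = ~ time | id, data = sim,
                method = "ML", control = ctrl)

# Lower BIC indicates the preferred specification
bic_tab <- BIC(fit_lin, fit_quad)
print(bic_tab)
```

The `df` column counts every estimated parameter: the fixed effects, the three variance and covariance parameters of the random intercept and slope, and the residual standard deviation.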

From this we concluded that the submodel with a degree-2 polynomial for the fixed effects of

We subsequently expanded this submodel by adding interaction effects between the degree-2

```
lmeFit.mlsc2.1 <- lme(FS_SPD ~ Age_50*AgeStart+I(Age_50^2),
data = MLSC_fem2, random = ~ Age_50 | ID)
lmeFit.mlsc2.2 <- lme(FS_SPD ~ Age_50*AgeStart+I(Age_50^2)*AgeStart,
data = MLSC_fem2, random = ~ Age_50 | ID)
m2.1<-update(lmeFit.mlsc2.1, method = "ML" )
m2.2<-update(lmeFit.mlsc2.2, method = "ML" )
BIC(m2,m2.1,m2.2)
```

To ensure that our model fits the data properly, we need to calculate the R^{2} criterion for our selected longitudinal model. Note that the R^{2} criterion for linear mixed-effects models can be obtained following

```
library(MuMIn)
r.squaredGLMM(lmeFit.mlsc2.1)
```

The first column of the output contains the marginal R^{2}, the proportion of variance explained by the fixed effects only, whereas the second column shows the conditional R^{2}, the proportion of variance explained by both the fixed and random effects (the quantity of interest for us). In our case, the conditional R^{2} equals .983, which means that the proportion of variance explained in our final longitudinal model is 98.3%. We therefore have a model with an excellent fit with which to pursue our joint model estimation procedure.
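To show what the two quantities measure, the sketch below computes them by hand from the variance components of a simple random-intercept model fitted to simulated data. The data set `toy` and its parameter values are hypothetical. The hand computation is valid for a random-intercept model only; with random slopes, as in our longitudinal submodel, the random-effects variance depends on the design, which is why a dedicated function such as `r.squaredGLMM()` is used.

```r
library(nlme)

set.seed(42)

# Hypothetical data: 60 participants measured on 4 occasions
n_id <- 60
toy  <- expand.grid(time = 0:3, id = seq_len(n_id))
u    <- rnorm(n_id, sd = 2)  # random intercepts
toy$y <- 10 - 0.8 * toy$time + u[toy$id] + rnorm(nrow(toy), sd = 1)

fit <- lme(y ~ time, random = ~ 1 | id, data = toy)

# Variance of the fixed-effects predictions, of the random intercepts, and of the residuals
var_fixed  <- var(as.numeric(model.matrix(~ time, toy) %*% fixef(fit)))
var_random <- as.numeric(getVarCov(fit)[1, 1])
var_resid  <- fit$sigma^2
var_total  <- var_fixed + var_random + var_resid

# Marginal: fixed effects only; conditional: fixed plus random effects
r2_marginal    <- var_fixed / var_total
r2_conditional <- (var_fixed + var_random) / var_total

c(marginal = r2_marginal, conditional = r2_conditional)
```

The gap between the two values reflects how much of the outcome variance is due to stable between-person differences rather than to the predictors.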

To conclude this section on the longitudinal submodel of the MLSC PS score, we retain submodel

As discussed previously, the Cox PH submodel, defined in

In this illustration we specified a nonparametric baseline hazard function within the Cox PH submodel, with factor Smoker (whether or not an individual was a smoker at study entry) as baseline variable, conditioned on age in years at study entry (

```
library(survival)  # coxph(), Surv()
# x = TRUE stores the design matrix in the fitted object
Coxfit_fem <- coxph(Surv(AgeLastObserved_2012_50, Dead_By2012) ~ Smoker + AgeStart,
                    data = MLSC_ID_fem2, x = TRUE)
```

Note that in the joint modeling context we need to set

Results are displayed in

Predictor | coef | exp(coef) | se(coef) | z | p |
---|---|---|---|---|---|
Smoker | 0.519 | 1.680 | 0.064 | 8.092 | < .001 |
AgeStart | 0.099 | 1.104 | 0.004 | 9.815 | < .001 |

An important (and often overlooked) assumption of the Cox PH model is that baseline hazard functions for model predictors are proportional (

The PH assumption can be checked statistically using the Schoenfeld Residuals Test (SRT;

The SRT can be applied for each predictor individually and also for the entire model with all predictors, using the function

`print(cox.zph(Coxfit_fem))`

Predictor | rho | chisq | p |
---|---|---|---|
Smoker | 0.068 | 8.38 | .004 |
AgeStart | 0.061 | 6.34 | .012 |
GLOBAL | N/A | 15.81 | < .001 |
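Readers without access to the MLSC data can reproduce the mechanics of this test on the `lung` data set that ships with the survival package. This is a sketch for illustration only: the predictors `age` and `sex` have nothing to do with the present study. Depending on the installed version of survival, the output reports `chisq`, `df` and `p` rather than a `rho` column.

```r
library(survival)

# `lung` ships with the survival package; status is coded 1 = censored, 2 = dead
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Schoenfeld residuals test: one row per predictor plus a GLOBAL row
zph <- cox.zph(fit)
print(zph)

# Predictors (or the global test) for which PH is rejected at the 5% level
p_values <- zph$table[, "p"]
names(p_values)[p_values < 0.05]
```

Calling `plot(zph)` additionally displays the scaled Schoenfeld residuals against time, which helps to judge whether a statistically significant departure from proportionality is also practically relevant.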

The results of the SRT indicate that the PH assumption does not hold for the factor

```
fit<-survfit(Surv(AgeLastObserved_2012_50, Dead_By2012) ~ Smoker,
MLSC_ID_fem2, id=ID)
print(fit)
```

Factor | n | Events | Median centered age at death | 0.95 LCL | 0.95 UCL |
---|---|---|---|---|---|
Smoker = FALSE | 2384 | 1562 | 39 | 38 | 39 |
Smoker = TRUE | 396 | 296 | 34 | 33 | 35 |

We are now ready to explore the association between longitudinal trajectories in processing speed and risk of death. We consider the three association structures as discussed above and, again, use a model comparison approach to select the final model.

We start with the “current value” parameterization defined in

```
library(JMbayes)  # jointModelBayes()
jointFit.mlsc1 <- jointModelBayes(lmeFit.mlsc2.1, Coxfit_fem,
                                  timeVar = "Age_50", n.iter = 30000)
```

with

Additionally, in a Bayesian paradigm, prior distributions must be specified for model parameters. A basic approach is to simply select non-informative priors, which is the default for

As usual, before interpreting the results, we should worry about the estimation quality of the model. Various diagnostic plots are available within

MCMC samples are dependent by construction. This dependence does not affect the validity of the estimated posterior distribution, provided the algorithm runs for enough iterations to explore that distribution. However, to reach the same Monte Carlo error, correlated MCMC samples must be more numerous than independent samples. In the trace plots, we look for the so-called “lazy caterpillar” pattern, which means that across iterations the sampled parameter values stay within a reasonably restricted range and do not suddenly jump outside it. For the autocorrelation plots, which display the correlation between a sampled value and the value sampled a given number of iterations (the lag) later, we look for autocorrelations that drop close to zero as quickly as possible (i.e., at small lags), meaning that the simulated samples quickly become independent. The kernel density plots are smoothed histograms of the sampled values. We expect these to be unimodal and with small tails. It is also possible to specify MCMC estimation with multiple shorter chains, which can also be diagnosed, but this requires additional
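The cost of dependence between MCMC samples can be illustrated without fitting any model. The sketch below simulates a strongly autocorrelated sequence (a first-order autoregressive process standing in for an MCMC chain, which is an assumption made purely for illustration) and approximates its effective sample size from the empirical autocorrelations. The truncation rule used for the sum is one simple choice among several.

```r
set.seed(1)

# Stand-in for an MCMC chain: AR(1) sequence with lag-1 autocorrelation 0.9
n     <- 5000
rho   <- 0.9
chain <- as.numeric(arima.sim(model = list(ar = rho), n = n))

# Empirical autocorrelations at lags 1 to 50 (lag 0 dropped)
ac <- acf(chain, lag.max = 50, plot = FALSE)$acf[-1]

# Sum the autocorrelations up to the first non-positive value
first_nonpos <- which(ac <= 0)[1]
if (is.na(first_nonpos)) first_nonpos <- length(ac) + 1
ac_sum <- sum(ac[seq_len(first_nonpos - 1)])

# Effective sample size: number of independent draws carrying the same information
ess <- n / (1 + 2 * ac_sum)

round(ac[1:5], 2)
ess
```

An effective sample size far below the number of iterations is the quantitative counterpart of an autocorrelation plot that decays slowly, and it suggests running the chain longer or thinning it.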

In our case, the diagnostic plot in

The estimated parameter values can be obtained with the standard

| Predictor | Value | 2.5% | 97.5% |
|---|---|---|---|
| | 0.445 | 0.328 | 0.555 |
| | -0.048 | -0.055 | -0.038 |
| | -0.026 | -0.032 | -0.020 |
| | 12.186 | 4.643 | 25.541 |
| | 8.806 | 6.506 | 11.060 |
| | 0.192 | 0.079 | 0.302 |
| | -0.054 | -0.090 | -0.018 |
| ^{2}) | -0.007 | -0.007 | -0.006 |
| | -0.005 | -0.006 | -0.003 |
| σ | 0.946 | 0.916 | 0.977 |
| | 31.550 | 28.805 | 34.265 |
| | -0.320 | -0.402 | -0.229 |
| | 0.078 | 0.073 | 0.084 |

The coefficient for smoker status (0.445) indicates a strong association with mortality, with an exp(0.445) = 1.560-fold increase in risk of death in smokers compared to nonsmokers. The intercept value tells us that when the covariates
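The conversion from a log hazard ratio to a hazard ratio applies equally to the bounds of the credible interval. The short sketch below uses the posterior mean and the 2.5% and 97.5% limits reported for smoker status in the table above; the object names are illustrative.

```r
# Posterior mean and 95% credible limits of the smoker coefficient (log hazard ratio scale)
log_hr <- c(estimate = 0.445, lower = 0.328, upper = 0.555)

# Exponentiate to obtain hazard ratios
hr <- exp(log_hr)
round(hr, 3)

# Percentage increase in the hazard of death for smokers relative to nonsmokers
pct_increase <- 100 * (hr - 1)
round(pct_increase, 1)
```

Because the exponential function is monotone, the exponentiated limits form a valid 95% credible interval for the hazard ratio itself.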

This second association structure takes the form

Recalling

The

```
dForm <- list(fixed= ~1 + I(2*Age_50) +AgeStart ,
random = ~ 1, indFixed = c(2,4,5), indRandom = 2)
```

This

Arguments
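The indices in `dForm` can be checked against the order of the fixed-effects coefficients. The sketch below is a minimal verification under stated assumptions: the data frame `d` and the coefficient values in `beta` are invented for illustration, and only the fixed-effects part of the derivative is examined. It compares the slope implied by `fixed = ~ 1 + I(2*Age_50) + AgeStart` with `indFixed = c(2, 4, 5)` to a numerical derivative of the trajectory.

```r
# Hypothetical covariate values
d <- data.frame(Age_50 = c(0, 5, 10), AgeStart = c(55, 60, 70))

# Coefficient order of the selected longitudinal submodel
X <- model.matrix(~ Age_50 * AgeStart + I(Age_50^2), data = d)
colnames(X)

# Illustrative fixed-effect values, in the column order of X
beta <- c(8.5, 0.19, -0.05, -0.007, -0.005)

# Derivative with respect to Age_50: beta[2] + 2 * beta[4] * Age_50 + beta[5] * AgeStart
dX <- model.matrix(~ 1 + I(2 * Age_50) + AgeStart, data = d)
slope_analytic <- as.numeric(dX %*% beta[c(2, 4, 5)])

# Numerical check with a central finite difference of the mean trajectory
traj <- function(a, s) {
  beta[1] + beta[2] * a + beta[3] * s + beta[4] * a^2 + beta[5] * a * s
}
h <- 1e-4
slope_numeric <- (traj(d$Age_50 + h, d$AgeStart) - traj(d$Age_50 - h, d$AgeStart)) / (2 * h)

all.equal(slope_analytic, slope_numeric, tolerance = 1e-6)
```

The same reasoning applies to the random part: differentiating the random intercept and random slope of `Age_50` with respect to age leaves only the random slope, which is the second random effect.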

We show the results of the “current value plus slope” parametrization in

`jointFit.mlsc2 <- update(jointFit.mlsc1, param = "td-both", extraForm = dForm)`

| Predictor | Value | 2.5% | 97.5% |
|---|---|---|---|
| | 0.424 | 0.315 | 0.573 |
| | -0.056 | -0.066 | -0.048 |
| | 0.001 | -0.008 | 0.010 |
| | -1.600 | -1.984 | -1.243 |
| | 14.250 | 5.747 | 28.615 |
| | 8.502 | 6.232 | 10.765 |
| | 0.185 | 0.070 | 0.295 |
| | -0.049 | -0.086 | -0.012 |
| _{2}) | -0.007 | -0.007 | -0.006 |
| | -0.005 | -0.006 | -0.003 |
| σ | 0.946 | 0.910 | 0.981 |
| | 31.093 | 28.992 | 33.549 |
| | -0.301 | -0.396 | -0.216 |
| | 0.079 | 0.073 | 0.085 |

α_{1} = 0.001 and α_{2} = -1.600, with the 95% credible interval of

To interpret the effect of

The third joint model can be written as

```
jointFit.mlsc3<- jointModelBayes(lmeFit.mlsc2.1, Coxfit_fem,
timeVar = "Age_50",param = "shared-RE",
n.iter = 30000)
```

The results are displayed in

| Predictor | Value | 2.5% | 97.5% |
|---|---|---|---|
| | 0.410 | 0.291 | 0.536 |
| | -0.048 | -0.056 | -0.041 |
| | 0.004 | -0.006 | 0.014 |
| | -1.641 | -1.884 | -1.397 |
| | 7.426 | 2.792 | 16.424 |
| | 8.330 | 6.050 | 10.515 |
| | 0.192 | 0.079 | 0.308 |
| | -0.046 | -0.081 | -0.009 |
| _{2}) | -0.007 | -0.007 | -0.007 |
| | -0.005 | -0.007 | -0.003 |
| σ | 0.957 | 0.924 | 0.993 |
| | 31.438 | 29.167 | 33.827 |
| | -0.322 | -0.411 | -0.224 |
| | 0.080 | 0.074 | 0.085 |

We have illustrated the three association structures discussed previously in the joint models. Before reaching a final substantive conclusion about our data set, we would like to illustrate an additional extension of the association function

We illustrate this by extending the model

The interaction is specified within the

```
tf1<-function (x, data) cbind(x, "Smoker" = x * (data$Smoker=='TRUE'))
jointFit.mlsc4 <- update(jointFit.mlsc1, transFun = tf1)
```

In this code,

`source("tf1.R")`
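To see what such a transformation function returns, it can be called directly on a few values. In the sketch below, `tf1` is copied from the code above, whereas the data frame `toy` and the longitudinal values `m_t` are hypothetical.

```r
# Transformation function from the tutorial: current value and its interaction with Smoker
tf1 <- function(x, data) cbind(x, "Smoker" = x * (data$Smoker == "TRUE"))

# Hypothetical covariate data and current longitudinal values for three individuals
toy <- data.frame(Smoker = c(TRUE, FALSE, TRUE))
m_t <- c(10, 12, 8)

out <- tf1(m_t, toy)
out
```

The first column carries the current value for everyone, and the second column repeats it for smokers and is zero for nonsmokers. Each column receives its own association parameter, so the second parameter captures how much the association differs in smokers.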

The results are presented in

| Predictor | Value | 2.5% | 97.5% |
|---|---|---|---|
| | 0.498 | 0.306 | 0.667 |
| | -0.049 | -0.057 | -0.043 |
| | -0.028 | -0.034 | -0.021 |
| | 0.007 | -0.007 | 0.020 |
| | 6.989 | 2.420 | 18.851 |
| | 8.746 | 6.453 | 11.048 |
| | 0.191 | 0.079 | 0.300 |
| | -0.054 | -0.090 | -0.016 |
| ^{2}) | -0.007 | -0.007 | -0.006 |
| | -0.005 | -0.006 | -0.003 |
| σ | 0.938 | 0.906 | 0.968 |
| | 32.501 | 30.020 | 35.273 |
| | -0.346 | -0.453 | -0.241 |
| | 0.079 | 0.073 | 0.084 |

We can finally proceed to compare statistically the various association structures of the estimated joint models. To do so, we rely on the Deviance Information Criterion (DIC;

`anova(jointFit.mlsc1,jointFit.mlsc2,jointFit.mlsc3,jointFit.mlsc4)`

and the results of these comparisons can be found in

Model | df | LPML | DIC | pD |
---|---|---|---|---|
jointFit.mlsc1 | 5591 | -26425.21 | 51793.14 | 5199.108 |
jointFit.mlsc2 | 5592 | -26333.20 | 51606.23 | 5186.702 |
jointFit.mlsc3 | 5592 | -26350.51 | 51791.15 | 5229.485 |
jointFit.mlsc4 | 5592 | -26357.66 | 51773.84 | 5205.650 |
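The comparison can be summarized as differences from the best model. The sketch below re-enters the DIC values from the table above by hand, assuming that the rows follow the order of the models in the `anova()` call; lower DIC values indicate a better trade-off between fit and complexity.

```r
# DIC values copied from the comparison table, in the order of the anova() call
dic <- c(jointFit.mlsc1 = 51793.14,
         jointFit.mlsc2 = 51606.23,
         jointFit.mlsc3 = 51791.15,
         jointFit.mlsc4 = 51773.84)

# Difference of each model from the smallest DIC
delta_dic <- dic - min(dic)
round(sort(delta_dic), 2)

# Name of the model with the smallest DIC
best_model <- names(which.min(dic))
best_model
```

Differences of this size are far larger than the few DIC units usually taken as meaningful, so the ranking does not hinge on Monte Carlo noise in the criterion.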

Based on the DIC values, the evidence is strong to prefer the

In this tutorial, we presented a didactic overview of the joint longitudinal and time-to-event modeling framework, presented a comparison of various

Due to space constraints and to limit complexity, we have not addressed other advanced features of the

Recently,

The primary objective of this tutorial was to present joint modeling of longitudinal and time-to-event data using the

Property | JM ( |
---|---|

Estimation method | Frequentist approach with maximum likelihood estimation using an EM algorithm |

Association structure | Current value, current value and slope, shared random effects, lagged effects, cumulative effects and free-form association structures implemented |

Longitudinal submodel | Longitudinal submodel fitted with the |

Time-to-event submodel | Time-to-event submodel fitted with the |

Model comparison | Joint models comparison can be performed with a Likelihood Ratio Test (LRT) implemented with the |

Post-fit analysis | Various post-fit functions including goodness-of-fit analyses, plots, predicted trajectories, individual dynamic prediction of the event and predictive accuracy assessment are available |

Property | JMbayes ( |
---|---|

Estimation method | Bayesian approach using an MCMC algorithm. A long chain methodology is implemented. Multiple chains are also feasible with a little programming |

Association structure | Current value, current value and slope, shared random effects, lagged effects, cumulative effects and free-form association structure are implemented |

Longitudinal submodel | Continuous and categorical longitudinal outcomes are allowed. Longitudinal submodel fitted with the |

Time-to-event submodel | Time-to-event submodel fitted with the |

Models comparison | Joint models comparison can be performed with the Deviance Information Criterion (DIC) available with the |

Post-fit analysis | Various post-fit tools for model diagnostic checks are available as diagnostic plots. Functionalities are available for computing dynamic predictions for both the longitudinal and time-to-event outcomes and assessment of model accuracy in terms of discrimination and calibration |

Extensions | Joint modeling for multivariate longitudinal outcomes and for time-varying association structures using P-splines are two very recent extensions implemented within the |

Property | joineR ( |
---|---|

Estimation method | Frequentist approach with maximum likelihood estimation using an EM algorithm |

Association structure | The implemented associations structures are based on an extended version of the “shared random effects” model proposed by Wulfsohn and Tsiatis ( |

Longitudinal submodel | Longitudinal submodel is univariate and fitted with a linear mixed-effects model (splines and non normal responses are not feasible) |

Time-to-event submodel | Time-to-event submodel is a Cox PH model with log-Gaussian frailty |

Models comparison | Models comparison can be performed with Likelihood methods |

Post-fit analysis | Standard errors and confidence intervals can be obtained with the implemented bootstrap methodology |

Extensions | The |

Property | lcmm ( |
---|---|

Estimation method | Frequentist approach based on maximum likelihood estimation using a modified Marquardt algorithm |

Association structure | The association structures available are based on joint latent class model assumptions (joint models that assume homogeneous latent subgroups of individuals sharing the same longitudinal trajectory and risk of event) |

Longitudinal submodel | The linear model can include a univariate Gaussian outcome, a univariate curvilinear outcome, a univariate ordinal outcome and curvilinear multivariate outcomes |

Models comparison | Models comparison can be performed with likelihood methods |

Post-fit analysis | Various post-fit functions with goodness-of-fit analyses, classification, plots, predicted trajectories, individual dynamic prediction of the event and predictive accuracy assessment are available |

Property | frailtypack ( |
---|---|

Estimation method | Frequentist maximum likelihood approach using either a parametric or semiparametric approach on the penalized likelihood for estimation of the hazard functions |

Longitudinal submodel and time-to-event submodel | Several types of joint models are implemented. In particular, joint models for recurrent events and a terminal event, for two time-to-event outcomes for clustered data, for two types of recurrent events and a terminal event, for a longitudinal biomarker and a terminal event, and for a longitudinal biomarker, recurrent events and a terminal event |

Models comparison | Two criteria for assessing a model’s predictive accuracy are implemented and can be used for model comparison |

Post-fit analysis | Each model function supports goodness-of-fit analyses and provides plots of baseline hazard functions. Individual dynamic predictions of the terminal event and evaluation of predictive accuracy are also implemented |

Property | rstanarm ( |
---|---|

Estimation method | Bayesian approach using an MCMC algorithm with the function |

Association structure | Current value, current value and slope, shared random effects, lagged effects, cumulative effects and interaction effect associations structures are implemented |

Longitudinal submodel | Generalized linear mixed-effects model (the response has to belong to an exponential family distribution). The longitudinal part can be multivariate, continuous and/or categorical. Linear slope, cubic splines and polynomial terms are allowed |

Time-to-event submodel | The baseline hazard can be specified parametrically or non parametrically. A stratification procedure is not allowed |

Models comparison | There is currently no models comparison procedure implemented |

Post-fit analysis | Several post-fit functions for dynamic predictions of the terminal event and longitudinal trajectories, as well as visualisation functions, are available |

Property | bamlss ( |
---|---|

Estimation method | Bayesian method |

Association structure | Flexible additive joint models are implemented |

Longitudinal submodel | Univariate continuous longitudinal outcome is allowed an |

Time-to-event submodel | A single time-to-event outcome is allowed |

The authors have no funding to report.

The authors have declared that no competing interests exist.

The authors gratefully acknowledge