A Tutorial for Joint Modeling of Longitudinal and Time-to-Event Data in R

In biostatistics and medical research, longitudinal data are often composed of repeated assessments of a variable (e.g., blood pressure or other biomarkers) and dichotomous indicators to mark an event of interest (e.g., recovery from disease, or death). Consequently, joint modeling of longitudinal and time-to-event data has generated much interest in these disciplines over the previous decade. In psychology, too, often we are interested in relating individual trajectories (e.g., cognitive performance or well-being across many years) and discrete events (e.g., death, diagnosis of dementia, or of depression). Yet, joint modeling are rarely applied in psychology and social sciences more generally. This tutorial presents an overview and general framework for joint modeling of longitudinal and time-to-event data, and fully illustrates its application in the context of a behavioral (cognitive aging) study. We discuss practical topics, such as model selection and comparison for both longitudinal and time-to-event data, choice of joint modeling parameterization, and interpretation of model parameters. To do so, we examined seven frequently used packages for joint modeling in the R language and environment. We concluded that of these, JMbayes is especially attractive due to its flexibility, its various parameterizations of the association structure, and for its powerful and fully Bayesian implementation. We make available the R syntax to apply the JMbayes package within our example.


Introduction
In many research settings, it is very common to record longitudinal observations that correspond to responses on continuous measures alongside dichotomous indicators to mark the occurrence of an event of interest. A prototypical example in clinical research is the repeated assessment of biological measures (e.g., blood pressure, antibody affinity, cholesterol level) that may relate to an event such as death, recovery from a disease, or disease diagnosis. In psychological research, much interest evolves around the association between long-term individual trajectories of cognitive performance and imminent death (e.g., terminal decline or terminal drop hypotheses; Kleemeier, 1962;Riegel & Riegel, 1972).
Methods for the separate analysis of such outcomes are well established in the literature, and these frequently include the use of mixed-effects models for the longitudinal portion of the data and the Cox proportional hazards model for the time-to-event data (Cox, 1972;Laird & Ware, 1982). However, the repeated measurements of the same individual are most likely related to the event, making them endogenous. This means that the measurement value at a given time point is informative about the future occurrence (or non-occurrence) of the event of interest. Likewise, the longitudinal trajectory is related to, and shaped by, the occurrence of the event. Separate analyses of such data cannot account for this endogeneity and cannot inform about the associative strength between the two. A tentative solution to this problem is a two-step approach, whereby estimated parameters from the mixed-effects models are used as covariates in the time-to-event model. Yet, this approach has been shown to suffer from increased bias in estimation and loss of efficiency (Prentice, 1982;Rizopoulos, 2012). Joint models simultaneously estimate both longitudinal and time-to-event components and are better suited for analyzing such data because they estimate jointly the relative risk of the time-to-event outcome contingent upon the longitudinal outcome (Gould et al., 2015;Henderson, Diggle, & Dobson, 2000;Hogan & tutorial. Note that a range of tools allowing one to perform joint modeling are available for other standard statistical software such as SAS/STAT, Stata, WinBUGS, and JAGS (see Papageorgiou et al., 2019, and references therein for more details).
JMbayes provides a framework wherein one or more characteristics from the longitudinal submodel can be flexibly parameterized as linear predictors in the time-to-event submodel. This flexibility is important because the relation between the continuous variable and the event of interest may take several functional forms, each of which may imply a different theoretical interpretation. For example, the longitudinal process can be specified in relatively simple terms (e.g., individual random effects about the intercept and the linear slope within a mixed-effects submodel), or in more complex non-linear spline effects, each of which can be linked to the time-to-event process with any of several association structures. JMbayes also allows for inclusion of multivariate longitudinal processes and for stratification criteria in the time-to-event submodel. Also, estimation of joint models, which are inherently complex, can be computationally intensive.
Per its name, the JMbayes package relies on a Bayesian analytical approach, which is very attractive for applications where classical maximum likelihood estimation often encounters nonidentifiability issues yielding unreliable inferential conclusions. Indeed, Bayesian analysis using Markov chain Monte Carlo (MCMC) estimation can yield very precise point estimates without reliance on asymptotic assumptions (Robert & Casella, 2013). JMbayes also provides useful diagnostic measures for evaluating MCMC convergence and for evaluating relative model fit.
We note that the aim of this tutorial is not to replace the JMbayes documentation, which provides in-depth methodological detail and supplementary R code (Rizopoulos, 2016(Rizopoulos, , 2019. Here, our goal is to complement this information with a more user-oriented, applied perspective, so that non-expert users may successfully carry out analyses using joint models. For the sake of brevity and clarity, we do not provide an exhaustive treatment of all features of joint modeling from the applications, but we nevertheless briefly discuss some of them at the end of the tutorial. We begin by describing the statistical framework for joint modeling. We then discuss required software packages, describe the data set used for this tutorial, and finally present example analyses to illustrate the use of joint modeling for behavioral data analysis. We provide the data and the full syntax of our application as Supplementary material.

Statistical Framework
The joint model, first described by Wulfsohn and Tsiatis (1997), consists of two submodels: a mixed-effects submodel for the longitudinal data and a time-to-event submodel for the time-to-event data. The two submodels are joined by allowing the time-to-event submodel to depend on some characteristics of the longitudinal submodel potentially in various ways. The defining feature of the joint model is that longitudinal and time-to-event data are modeled simultaneously with respect to a conditional joint density, rather than separately with two marginal and independent densities (see e.g., Rizopoulos, Hatfield, Carlin, & Takkenberg, 2014, for further technical details). Consequently, the effects of the longitudinal submodel on the time-to-event submodel are explicitly specified, and, at the same time, the parameter estimates of the former are also conditioned on the latter submodel (i.e., the longitudinal submodel is more accurately estimated by including the time-to-event data than without them, if endogeneity occurs). We discuss first the longitudinal submodel, then the time-to-event submodel, and finally several parametrization options to associate the two within the joint model.

The Mixed-Effects Submodel
Longitudinal data here consist of repeated measurements of the same outcome for each study participant over a given period of time. Repeated measurements for the same individual are typically correlated, and therefore each individual in the population is expected to have his or her own subject-specific response pattern over time. The mixed-effects model (Laird & Ware, 1982) accounts for this between person variability by estimating person specific random effects around the parameters that are invariant across individuals (fixed effects).
Let y i (t) denote the measurement on y for individual i (i = 1, . . . , N ) at time t (t = 1, . . . , T * i ), where T * i is the time-to-event for individual i. The model assumes that y i (t) is the observed measured value, which deviates from the true unobserved value µ i (t) by the amount of error i (t), where i (t) ∼ N (0, σ 2 ), normally distributed around 0 with variance σ 2 . The longitudinal submodel can then be written as where b i denotes the vector of random effects for individual i with b i ∼ N (0, D), and D is the variance-covariance matrix of the random effects (typically a 2 × 2 unstructured matrix with random intercept and random slope effects). X i (t) and Z i (t) represent the design matrix containing the predictors for the fixed-effects regression coefficients β and for the random-effects regression coefficients b i , respectively (typically b 0i and b 1i for random intercept and slope effects).
The second line of Equation (1) represents the notation when µ i (t) is modeled non-parametrically via a spline approach (illustrated below). f (·) and f i (·) represent the spline function for the fixed and, respectively, random components.

The Time-to-Event Submodel
Time-to-event, or survival, analysis refers to statistical methods for analyzing time-to-event data. An event time is the time elapsed up to the occurrence of an event of interest (such as death or disease diagnostic), given that it has not previously occurred yet.
Time-to-event data are characterized by the fact that the event of interest may not have occurred for every participant during the duration of the study. This particular type of missing data is analogously treated as data of a participant who drops out before the end of the study, and both scenarios constitute right censorship. Time-to-event analysis is particularly appropriate to analyze right-censored data.
There are different models for time-to-event data (Kaplan & Meier, 1958). Probably, the most popular is the semi-parametric Cox proportional hazard (PH) model (Cox, 1972), which is a regression model wherein covariates are assumed to have a multiplicative effect on the hazard for an event. The Cox PH model has the particularity of estimating the baseline hazard function non-parametrically. If there is a theoretical reason to suppose that the baseline hazard function has a particular parametric form (typically Weibull, Gamma or Exponential), a parametric model can be specified. Most often, however, an analyst ignores the exact form of a hazard function. The Cox PH model has been shown to approximate well unknown parametric hazard functions (Fox, 2002).
The Cox PH submodel can be formulated as where h 0 (t) is the baseline hazard function at time t and w i is a vector of baseline predictors with corresponding regression coefficients γ. The submodel has two distinct parts. First, h 0 (t) describes the hazard (e.g., risk of the event occurring, given it has not yet occurred) when all covariates w i have value zero. This baseline hazard function is assumed invariant across all individuals (h 0 (t) does not depend on i). Second, the effect parameters γ T w i describe how the hazard varies as a function of the explanatory covariates w i .

The Joint Model
Joint models formally associate the longitudinal and time-to-event processes through shared parameters (Henderson et al., 2000;Ibrahim, Chu, & Chen, 2010;Rizopoulos, 2012;Wulfsohn & Tsiatis, 1997). This framework therefore models the hazard of experiencing the focal event as dependent on a subject-specific characteristic of its longitudinal trajectory.
More specifically, we have where M i (t) = {µ i (s), 0 ≤ s < t} denotes the history of the true unobserved longitudinal process µ i (s) up to time point t, where s is a time point prior to t. w i is a vector of (possibly time-varying) covariates with corresponding regression coefficients γ, and b i is the vector of random effects for individual i defined in Equation (1).
Parameter vector α quantifies the association between a priori selected features of the longitudinal process and the hazard for the event at time t. Various options for the function f (·) are possible and lead to different forms of the time-to-event submodel. We discuss now the three most frequently used association functions in the joint modeling literature (see for example Gould et al., 2015;Guo & Carlin, 2004;He & Luo, 2016;Henderson et al., 2000;Papageorgiou et al., 2019;Sène et al., 2014;Tsiatis & Davidian, 2004;Wulfsohn & Tsiatis, 1997).
The "current value" association. The first association structure between the longitudinal and the time-to-event submodels we discuss is known as the "current value", and assumes that for individual i, the true value µ i (t) of the longitudinal measure at time t is predictive of the risk h i (t) of experiencing the event at that same time t. The Cox PH submodel with this association structure is written as This equation associates µ i (t), as defined in Equation (1), with h i (t) at each time point t .
The reliable estimate of µ i (t) is of chief importance, and is obtained by correctly specifying the design matrices X i (t) and Z i (t) in Equation (1). This means that we need to specify a longitudinal submodel that accurately describes the repeatedly observed measurements The coefficient α 1 provides a measure of the strength of the association between µ i at time t and the time-to-event process at the same time. More precisely, each one-unit increase on the current value of µ i (t) is associated with an exp(α 1 )-fold increase in a participant's risk of experiencing the event at that same time t, given the event has not occurred prior to t. Note that this parameterization supposes that the associative parameter α 1 between the longitudinal value and the event risk is the same across all individuals. However, the overall hazard h i (t) may vary across individuals as a function of the other subject-specific covariates w i .
The "current value plus slope " association. The "current value" parameterization, however, does not distinguish between individuals who have, at a specific time point, an equal longitudinal score (an equal µ i (t)), but who differ in their rate of change of this score (e.g., one subject may have an increasing trajectory, whereas another a decreasing trajectory). The second association structure between the time-to-event and the longitudinal submodels is known as the "current value plus slope" parameterization. This extends the previous structure by adding the rate of change of the measurement at time t, estimated as the derivative of µ i (t) w.r.t. the time metric, as follows: where µ i (t) is the derivative of µ i (t) with respect to the time variable t. Practically, the quantity µ i (t) has to be calculated "by hand" by the user. The reader is referred to any basic mathematical literature for calculus derivative rules. For worked examples, see Equation (13).
The corresponding submodel of this association structure has the form where the hazard of experiencing the event at time t is now assumed to be associated with both the value of µ i (t) and its rate of change (i.e., slope) µ i (t), at time t (see Equation (5)).
The coefficients α 1 and α 2 express the strength of the association between the value and rate of change of the (true) subject trajectory at time t and the time-to-event process at the same time. For individuals having the same level of µ i (t), the hazard ratio for a one-unit increase of the current rate of change µ i (t) (i.e., the trajectory) is exp(α 2 ).
Identically, for individuals having the same rate of change µ i (t), the hazard ratio for a one-unit increase of the current value µ i (t) is exp(α 1 ).
The "shared random-effects" association. The third association structure between the time-to-event and the longitudinal submodels is a "shared random-effects" parameterization (Wulfsohn & Tsiatis, 1997). This parameterization includes only the random effects from the longitudinal submodel in Equation (1) as linear predictors of the relative risk submodel. For a simple random-effects structure (i.e., random intercept and slope effects), the random effects represent the individual deviations from the sample average intercept and slope values, and the association parameters therefore reflect the change in the log hazard for a one-unit change in these deviations. For more elaborate structures, such as nonlinear parametrization, associative parameters should be interpreted with caution (Gould et al., 2015;Rizopoulos, 2012;. The "shared random-effects" parameterization postulates that, compared to the general tendency, individuals who have a lower (or higher) measure at baseline (i.e., random intercept) or who show a steeper decrease (or increase) in their longitudinal trajectories (i.e., random slope) are more or less likely to experience the event. The "shared random-effects" submodel can be written as This parameterization is computationally simpler than the "current value" and the "current value plus slope" parameterizations, because the associative part α T b i in Equation (7) is time-independent. Indeed, when a simple linear random-intercept and/or random-slope structure is assumed for the longitudinal submodel in (1), individual specificities are characterized by random effects b i witch do not depend on time t. This means that for a specific individual i, random effects b i are unique, regardless of the time t, contrary to the current value µ i (t) and the current slope µ i (t) (see e.g., , for further details).

Software
The JMbayes package in R has a basic model fitting function called jointModelBayes(·), which accepts as its main arguments a mixed-effects submodel object fitted by the function lme(·) from package nlme (Pinheiro, Bates, DebRoy, Sarkar, & Team, 2017) and a time-to-event submodel object fitted by a Cox PH model with the function coxph(·) from the package survival (Therneau, 2015). Thus, both these packages need to be installed to use JMbayes. JMbayes also requires that JAGS (Just Another Gibbs Sampler; Plummer, 2004) be installed in order to perform MCMC estimation. We show the R syntax to properly install the JMbayes package in Section Code 1 lines 1-4 in the Computer Code Section.
Both the nlme and survival packages are commonly used for fitting linear mixed-effects and, respectively, Cox PH submodels within the R environment. We hence do not discuss them at length here. We refer the interested readers to the original articles (Pinheiro et al., 2017;Therneau, 2015).

Empirical Example
We illustrate the joint modeling approach by applying JMbayes to data from a longitudinal study of cognitive change in adults (Rabbitt et al., 2004).

The MLSC Dataset
The data come from the Manchester Longitudinal Study of Cognition (MLSC) (Rabbitt et al., 2004). The MLSC study is a 20-plus-year study consisting in over 6200 adult individuals, with initial ages from 42 to 97. Five domains of cognitive ability were examined, but here, for simplicity, we focus only on processing speed (PS), defined as the speed at which we can compute simple but abstract tasks. This ability has been shown to decrease markedly during the lifespan, to induce general decline in more elaborate cognitive abilities (such as memory or verbal capacities), and to relate to physical health in older adults (Salthouse, 1996). Here, we investigate whether an individual's level of and change in PS performance in adulthood predicts their risk of death.
PS was assessed up to four times at approximately 3-year intervals. Relations between cognition and survival have been the focus of several previous MLSC analyses (see Aichele, Rabbitt, & Ghisletta, 2015, and references therein for further details). For the current illustration, we use data from female participants aged between 50 and 90 years only. This gave an effective N = 2780 participants, of whom 1858 died during the study, who were tested on average twice. Other MLSC variables included in the current analyses were chronological age at each of the longitudinal assessments and at study entry for the longitudinal submodel, and smoking status and chronological age at study entry, time in the study, and survival status for the time-to event submodel.

MLSC Analysis Approach
We hypothesize that mortality may explicitly depend on previous PS trajectories, whereby individuals with lower scores and/or steeper trajectories should have a greater hazard of dying than those performing better. We also posit that, given the PS-mortality association, adequate description of cognitive change in adult individuals should statistically take into account their (future) mortality information. A previous survival analysis of these data showed that the proportional hazards assumption was not met when comparing mortality risk for male and female participants (Aichele et al., 2015). Therefore, sex-specific analyses are warranted. For simplicity, we limit the current analyses to data from female MLSC participants. Before estimating the joint model, we perform initial model selection steps independently for the longitudinal and for the time-to-event submodels.
We first select a submodel for the longitudinal part of the data. Based on a previous literature search on the use of the mixed-effects model in cognitive aging research (Ghisletta et al., in press), we tested six submodels: the first with fixed and random-effects for the intercept and the linear slope (to describe change over time), the second adding to the first a quadratic slope (change trajectories with a single curvature), the third adding to the second a cubic slope (change trajectories with two curvatures), and the fourth to the sixth with spline effects of time and with different numbers of nodes. Of the six submodels, based on statistical criteria, we choose the one that provides the most satisfying description of the PS trajectories.
We then, separately, select the optimal time-to-event submodel. To do so we rely on the Cox PH model that relates smoking status and initial age to the timing and occurrence of death. For each covariate we test the PH assumption and then retain the submodel that best describes the mortality process based on the selected covariates. Finally, we combine both submodels in joint modeling, and compare four association structures between the two submodels.

Longitudinal model selection.
As required by the JMbayes package, we fit the longitudinal submodel with the lme(·) function from the nlme package (Pinheiro et al., 2017). This longitudinal submodel must be specified in terms of its fixed and random components. For both components, linear and/or higher-degree polynomial terms are allowed (e.g., linear, quadratic or cubic change). More complex parameterizations of nonlinear effects are also feasible, such as achieved through the flexible spline methodology (see the second line in Equation (1) and cf. Rizopoulos, 2012).
The spline approach has gain much popularity in recent years because of its flexibility in describing subject-specific longitudinal trajectories that follow highly nonlinear functions and that, individually, deviate from the average sample function. A spline is a piecewise function composed of polynomials adjusted to data within adjacent intervals, separated by equidistant points called nodes. The degree of the spline is defined by the polynomial of the highest degree used. For instance, if the function fitted to the data within an interval is a straight lines, the spline is said to be of degree 1. Cubic splines are a popular specification for splines that entails polynomials of degree 3 (Hastie & Tibshirani, 1987).
Within the lme(·) function, nonlinear effects can be specified with the subfunction ns(·) from the splines package. The ns(·) function uses natural cubic splines that estimate a smooth functions of the predictors (functions f (·) and f i (·) in the second line of Equation (1); Chambers, 1991). With natural cubic splines, the data within each interval are fitted by a cubic spline, and splines from neighboring intervals must respect two conditions: at each node, the first and the second derivatives of the splines are continuous, meaning that, for any two adjacent intervals, the finishing portions of the left spline smoothly joins the starting portion of the right spline, thereby giving the visual impression of a single, continuos line passing through both intervals.
The user must select the optimal number of nodes, which is a rather arbitrary component of splines. However, this value should be less than or equal to the number of repeated measures. To this end, there is no automatic selection procedure implemented in the JMbayes package (in contrast to the mgcv package, for example; Wood, 2006). We suggest to first investigate the number of nodes for the fixed part of the longitudinal submodel, before proceeding to explore that of random part (when both components are modeled using splines, as proposed in Rizopoulos, 2016). To do so we suggest testing multiple submodels with different number of nodes and comparing them on the basis of their Bayesian Information Criterion, where smaller values are preferable (BIC; Schwarz, 1978). The BIC is available in the lme(·) output and is known to penalize complex models more strongly than the Akaike Information Criterion (AIC; Akaike, 1974). Conveniently, the BIC allows a fully Bayesian interpretation (Raftery, 1995). The difference in BIC between two models is to be interpreted as "not worth mentioning," "positive," "strong," and "very strong" evidence against the model with highest value when the difference is between 0 and 2, 2 and 6, 6 and 10, and greater than 10, respectively. (Kass & Raftery, 1995, p. 777). Applications of the spline-based approach within the joint modeling framework can be founded, for example, in Rizopoulos, Verbeke, and Lesaffre (2009), Brown, Ibrahim, and DeGruttola (2005), Ding and Wang (2008), and Rizopoulos and Ghosh (2011).
As explained in Oehlert (2012) and Pinheiro and Bates (2000), likelihood ratio tests to compare models with different fixed effects and same random effects should be performed using maximum likelihhod estimation (ML). However, the default estimation method implemented within the lme(·) function uses Restricted Maximum Likelihood (REML), because REML, unlike ML, produces unbiased estimates of variance and covariance parameters (the difference in bias vanishes with high number of observations).
To meaningfully compare longitudinal submodels in terms of their BIC they should thus be re-estimated with ML.
For this illustration, we modeled change in cognitive performance as a function of age in years centered at age 50 (Age), conditioned on age in years at study entry (AgeStart).
Thus, we add AgeStart and Age as fixed parameters to the longitudinal submodel. We also modeled Age as a random effect, because visually we might infer differences in rates of decline. Individual observed trajectories of PS are depicted in Figure 1.
We estimated the following specifications of the longitudinal submodel: The first three (lmeFit.mlsc1, lmeFit.mlsc2, lmeFit.mlsc3) estimate, respectively, a polynomial of first, second, and third degree of Age, respectively, with random effects for the intercept and Age. Models 4, 5, 6 (lmeFit.mlsc4, lmeFit.mlsc5 lmeFit.mlsc6) estimate cubic spline effects of Age with, respectively, 2, 3 and 4 nodes. Here, too, we included random effects for the intercept and Age. To do so, we used the R code in Section Code 2 lines 3-16 in the Computer Code Section.
We evaluated the adjustments of the six submodels in terms of their respective BIC values. The R code to this end can be founded in Section Code 2 lines 19-25 in the Computer Code Section. The results are displayed in Table 1.
For this we concluded that the submodel with a degree-2 polynomial for the fixed effects of Age and with a random intercept and a random Age slope fit the data best (m2, corresponding to lmeFit.mlsc2).
We subsequently expanded this submodel by adding interaction effects between the degree-2 Age polynomial and AgeStart, as shown in Equation (9). We again evaluated statistical fit with the BIC. The corresponding R code can be founded in Section Code 3 lines 1-4 in the Computer Code Section.
The comparison between these two submodels and the previous m2 submodel is also displayed in Table 1 and indicates that the submodel m2.1 (corresponding to lmeFit.mlsc2.1) best fit the data (the R code can be founded in Section Code 3 lines 7-10 in the Computer Code Section).
To conclude this section on the longitudinal submodel of the MLSC PS score, we retain submodel lmeFit.mlsc2.1, which will be joined to the upcoming time-to-event submodel in the final joint model. The function for the true value of PS µ i (t) (where the Time-to-event model selection. As discussed previously, the Cox PH submodel, defined in Equation (2), is often the preferred choice to analyze time-to-event data, given it does not require the user to specify a parametric hazard function. The nonparametric specification of the baseline hazard function within the coxph(·) function in R uses a B-spline approach, but if desired, regression splines can instead be invoked by setting the baseHaz function argument accordingly. The number and the position of the nodes are automatically selected, but they can also be fixed via the lng.in.kn and the knots function arguments.
In this illustration we specified a nonparametric baseline hazard function within the Cox PH submodel, with factor Smoker (whether or not an individual was a smoker at study entry) as baseline variable, conditioned on age in years at study entry (AgeStart).
The corresponding R code can be founded in Section Code 4 lines 1-2 in the Computer Code Section and results are displayed in Table 2. The results show that factor Smoker is significant (p-value < 0.001). The estimated coefficient means that being smoker at study entry is associated with a hazard ratio of exp(0.519) = 1.680. That is, being smoker at study entry increases the relative risk of death by 68%, compared to a non-smoker at entry.
An important (and often overlooked) assumption of the Cox PH model is that baseline hazard functions for model predictors are proportional (Cox, 1972 SRT can be applied for each predictor individually and also globally (for the model with all predictors) using the function cox.zph(·) from the survival package. Table 3 displays the results of the SRT and the corresponding R code can be founded in Section Code 4 line 3 in the Computer Code Section.
The results of the SRT indicate that the PH assumption does not hold for the factor Smoker and for AgeStart as a predictor (p-values < .05), nor for the whole submodel (p-value < .05). When the PH assumption is not met for a predictor of theoretical interest, a stratification procedure for that predictor is required (Fox, 2002). This stratification procedure is supported within the coxph(·) function inside the JMbayes package. In our particular case, we should thus stratify the time-to-event analysis by smoker status, with two strata, smoker vs non smoker, and also initial age, with an arbitrary number of strata Joint model. We are now ready to explore the association between longitudinal trajectories in processing speed and risk of death. We consider the three association structures as discussed above and, again, use a model comparison approach to select the final model.
The "current value" association. We start with the "current value" parameterization defined in Equation (4). For our data, this joint model si written as with h 0 (t) denoting the baseline hazard function. We estimated this submodel with the R command displayed in Section Code 5 lines 1-2 in the Computer Code Section. Note that the previously estimated longitudinal and time-to-event submodels are the main arguments of the jointModelBayes(·) function, and that the "current value" is the default association structure. The jointModelBayes(·) function uses a Metropolis-based Markov chain Monte Carlo (MCMC) algorithm, for which options must be chosen. We discuss these options in the Section Code 5 in the Computer Code Section.
Additionally, in a Bayesian paradigm, prior distributions must be specified for model parameters. A basic approach is to simply select non-informative priors, which is the default for JMbayes. If desired, informative prior distributions can also be specified. For example, in Andrinopoulou and Rizopoulos (2016), where the goal was to estimate elaborate models with many predictors in the time-to-event submodel, shrinkage priors were used in order to force small effects to be equal to zero while maintaining true large effects (van Erp, Oberski, & Mulder, 2019). Of importance, JMbayes uses parameter estimates from the individual longitudinal and time-to-event submodels, as illustrated above, as starting values for the MCMC estimation of the joint model. Also, the package relies on a single, long chain, rather than multiple parallel, short chains.
As usual, before interpreting the results, we should worry about the estimation quality of the model. Various diagnostic plots are available within JMbayes. The plot(·) function can be used to visually examine convergence of the jointModelBayes(·) MCMC estimation. The output includes trace plots, autocorrelation plots, and kernel density estimation plots. An example of these plots for the association parameter fitted within the joint model with current value parametrization can be founded in Figure 3.
In the trace plots, we seek the so-called "lazy caterpillar" pattern, which means that across the various iterations the estimated values of the parameters are within reasonably restricted ranges, and are not suddenly outside such a range. For the autocorrelation plots, which represent the correlations between a solution and that a given lag from it, we seek autocorrelations that become small and as close to zero as quickly as possible (i.e., with small lags), meaning that the solutions of the simulated samples become quickly independent. The kernel density plots are smoothed histograms of the estimated solutions across the simulated samples. We expect these to be unimodal and with small tails. It is also possible to specify MCMC estimation with multiple shorter chains, which can also be diagnosed, but this requires additional R coding: Users must select different initial values within the priors space and then extract the corresponding MCMC output for analysis using the coda MCMC diagnostic package (Plummer, Best, Cowles, & Vines, 2006;Rizopoulos, 2016).

The estimated parameter values can be obtained with the standard summary(·)
function (line 4 of the Section Code 5 in the Computer Code Section). We show the results of this first association structure in Table 4. We report the posterior means of all estimated parameters, with their 95% credible intervals. The Assoct coefficient in the output represents α 1 in Equation (11) and estimates the association between the true processing speed score at time t and the risk of death at that same time. We can interpret this parameter as follows: The percent reductions in mortality risk associated with the current true value of processing speed for women is calculated as 1 − HR × 100, where HR is the hazard ratio corresponding to a one-unit difference in the current true value of processing speed µ i (t). Thus, HR is calculated as exp(−0.026) = 0.974, which gives 1 − 0.974 = 0.026, and translates to 2.6% lower risk of death per one unit increase in the current true value of processing speed. The coefficient for smoker statuts (0.445) indicates a strong association with mortality, with a exp(0.445) = 1.560-fold increase in risk of death in smokers, compared to non smokers.
The "current value plus slope" association. This second association structure takes the form and specifies that the hazard of experiencing the event at time t is now assumed to be associated both with the true value of processing speed at time t, µ i (t), and its current rate of change (i.e., slope) at time t, noted µ i (t).
Recalling Equation (10), the temporal variable is Age. The derivative of Equation (10) with respect to the temporal variable Age can be written as The R code for specifying the derivative form of this linear submodel and for running the corresponding joint model can be found in the Computer Code Subsections 6 and 7.
We show the results of the "current value plus slope" parametrization in Table 5.
Assoct represents α 1 and AssoctE represents α 2 in Equation (6), and quantify, respectively, the association between the current true value and the current rate of change of processing speed and the relative risk of death. We obtain α 1 = 0.001 and α 2 = −1.600, with the 95% credible interval of Assoct that contains 0, while that of AssoctE does not.
Clearly, then, the rate of change in perceptual speed at a given time point is a much more important predictor of than the actual true value of speed at that same time point. The lower the rate of change, the greater the odds of dying.
To interpret the effect of AssoctE on the risk of death, we could proceed as usual, by relating it to a one-unit change. However, whereas a one-unit value is a meaningful comparison for the current true value of speed, it is excessively large for the rate of change.
To show this, we plotted in Figure 4 the current true value µ(t) estimated from the previous joint model with the "current value" association (continuous line) and the current rate of change µ (t) (dashed line) from the joint model "current value plus slope", as a function of the Age variable. We can immediately see that whereas the current value changes over a large range (from almost 6 to -10), the rate of change varies much less (from about 0 to less than -1). Hence, whereas a one-unit change could be useful in interpreting the former effect, it is less so for the latter effect (it would result in a (1 − exp(−1.600)) × 100 = 79.81% reduction in mortality risk, which is simply huge).
Coming back to Figure 4, we note that over an interval of 40 years, the current true value decreases by 14.722 points, whereas the rate of change decreases by only 0.535 points. This translates to an average decrease over a 10-year period of 3.684 for the current true value and 0.134 for the rate of change, with associated percent reductions in mortality risk of 9.13% (calculated as (1 − exp(−0.026 × 3.684)) × 100) and 19.3% (calculated as (1 − exp(−1.6 × 0.133)) × 100), respectively. We believe this exercice to be most useful to properly interpret the effect sizes stemming from this model, because, typically, the range of the current true value is much larger than that of its rate of change.
The "shared random effects" association. The third joint model can be written as where α 3 and α 4 are the parameters of association of b 0i and b 1i , which are defined in Equation (8) as the individual deviations from the sample average intercept and slope values, respectively. α 3 and α 4 define the association between the random intercept and the random slope effects, respectively, and the hazard of death. More precisely, for individuals having the same deviation from the average slope, the hazard ratio for a one-unit increase in individual deviation from the average intercept is exp(α 3 ). Conversely, for individuals having the same deviation from the average intercept, the hazard ratio for a one unit increase in individual deviation from the average slope is exp(α 4 ). The code for estimating the joint model with the "shared random effects" parameterization can be founded in Section Code 8 lines 1-3 in the Computer Code Section.
The results are displayed in Table 6. Assoct:(Intercept) represents α 3 and Assoct:Age represents α 4 in Equation (14). Subject-specific deviations from the sample average intercept are not predictive of the risk of death, whereas subject-specific deviations from the sample average slope are. To interpret this latter effect properly, rather than relating the reduction in mortality risk to a one-unit change, we relate it to a one-SD change, where D[2, 2]=0.080. This risk reduction is calculated as 1 − exp(α 4 × std(b 1 )) = 1 − exp(−1.641 × √ 0.080) = 0.371. Thus, for a 1SD decrease in slope, the percent increase in mortality risk is 37.1%. Participants with greater cognitive decline have greater odds of dying than those declining less.

Joint model with interaction.
We have illustrated the three association structures discussed previously in the joint models. Before reaching a final substantive conclusion about our data set, we would like to illustrate an additional extension of the association function f (·) in Equation (3). We illustrate interactions of predictors with the current true value µ i (t) and/or current rate of change µ i (t) terms.
We illustrate this by extending the model jointFit.mslc1 to include an interaction term between the current true value of processing speed µ i (t) and the smoker status. The corresponding time-to-event submodel can be written as where the interaction coefficient α 5 expresses the difference in mortality prediction of µ i (t) between smokers (Smoker = 1) compared to non-smokers (Smoker = 0).
The interaction is specified within the JMbayes package using the transFun argument, which takes as its value a generic function tf1 that, in turns, takes arguments x for the longitudinal parameter of interest and data for the predictor with which the longitudinal parameter interacts. This tf1 function has to be saved under its generic name (tf1) and defined in the workspace (see Rizopoulos, 2016 (15)) represents the association between the current true value of processing speed and the mortality risk and Assoct:Smoker (α 5 in equation (15)) represents the interaction effect between the current true value of processing speed and the factor Smoker on the mortality risk. These results indicate that the current value of processing speed µ i (t) is negatively associated with the risk of death (Assoct = −0.028).
Going back to the previous interpretation of the Assoct effect of 3.684 units over a 10-year period, rather than a one-unit change (cf. Figure 4), we obtain a 9.8% (calculated as Section. The results of joint models comparison can be founded in Table 8.
LPML is the log-pseudo-marginal-likelihood value, DIC is the deviance information criterion and pD refers to the effective number of model parameters. Based on the DIC values, the evidence is strong to prefer the jointFit.mlsc2 model, which includes the current true value and the current rate of change of processing speed as predictors of the risk of death. Recall that within this model, the effect of the current true value in speed on the hazard of dying was not different from zero, whereas the association between the rate of change in speed and survival was strong. More precisely, individuals experiencing a greater rate of change, equivalent to what is expected over a 10-year interval, have a 19.3% greater mortality risk than those with shallower rates of change. Thus, we can conclude that cognitive performance per se, assessed via perceptual speed, does not relate strongly to mortality. Rather, it is the rate of change in cognitive performance that is predictive of survival, as hypothesized by the terminal decline and terminal drop hypotheses.

Discussion
In this tutorial, we presented an overview of the joint longitudinal and time-to-event modeling framework, presented a comparison of various R packages allowing this type of analysis, and illustrated how to use the most comprehensive of these R packages, JMbayes, to estimate a series of joint models. We illustrated joint models with real data from a psychological study on adult cognitive development and provided guidelines to choose among a series of theoretically motivated alternative specifications of joint models.
Due to space constraints and to limit complexity, we have not addressed other That is, rather than relating the value or slope on the true trajectory to the event, the whole surface under the trajectory constitutes the predictive information. For example, we might want to relate how learning characteristics during a critical period relate to mortality. In this case, performance during a given learning trial might be less informative or mortality than the entire learning history, as represented by the area under the learning curve.
Recently, JMbayes was extended to support simultaneous modeling of multiple longitudinal outcomes (using the function mvJointModelBayes(·)). This multiple parametrization may be useful when several longitudinal measures are recorded and may jointly impact the risk of death. For instance if several measure of cognitive development are recorded (as processing speed, fluid intelligence and crystallized intelligence in Rabbitt et al., 2004), they may jointly impact the risk of death. We also encourage interested readers to explore the developments of other R packages and other software allowing for joint model estimation.
The primary objective of this tutorial was to illustrate joint modeling of longitudinal and time-to-event data using the JMbayes package. Rather than reviewing the technical details of the joint modeling methodology, we aimed instead at providing researchers in behavioral sciences a practical understanding of what this powerful methodology can offer in substantive terms and how to apply it to their data. We hope this information will encourage adoption of the joint longitudinal-time-to-event modeling framework in research domains where it is currently underutilized and that should benefit from it.

Code 1
install.packages("JMbayes")  lmeFit.mlsc2.1 is the longitudinal submodel, Coxfit_fem is the survival Cox submodel and timeVar = "Age_50" means that the temporal variable is Age_50. Note that no specification of the association structure is necessary, because the "current value" parameterization is the default implementation. Here we specified that, after the burn-in period, the MCMC algorithm should run for 30000 iterations.
Arguments indFixed=c(2,4,5) and indRandom=2 refer to indices of the fixed and random model parameters corresponding to the retained (non-zero) arguments in the derivative of Equation (13 This tf1 function has to be saved under its generic name (tf1) in a separate R file and defined in the workspace. In this code, jointFit.mlsc4 calls jointFit.mlsc1 (the joint model with the current value parametrization) and the function tf1. It means that jointFit.mlsc4 will look for arguments x and data in the previously estimated model jointFit.mlsc1. x is the association structure used in jointFit.mlsc1 (here the current value µ i (t)) and data are the data used in the estimation of jointFit.mlsc1. Then, the function {cbind(x, "Smoker" = x * (data$Smoker=='TRUE'))} defines the columns of the design matrix X i (t) relative to the association structure (see Equation (1)). Here {cbind(x, "Smoker" = x * (data$Smoker=='TRUE'))} means that we will estimate first an simple effect of µ i (t) (x), and then an interaction between the current value µ i (t) and the factor Smoker ("Smoker" = x * (data$Smoker=='TRUE'))).

Code 10
anova ( Table 4 Parameter estimates and 95% credible intervals for the "current value" association. D [i, j] denotes the ij-element of the covariance matrix of the random effects. tauBs is a hyperparameter relative to the distribution of the baseline hazard function h 0 (t) and is typically not interpreted.  Table 5 Parameter estimates and 95% credible intervals for the "current value plus slope" association. D [i, j] denotes the ij-element of the covariance matrix of the random effects.
tauBs is a hyperparameter relative to the distribution of the baseline hazard function h 0 (t) and is typically not interpreted.  Table 6 Parameter estimates and 95% credible intervals for the "shared random effects" association. D [i, j] denotes the ij-element of the covariance matrix of the random effects. tauBs is a hyperparameter relative to the distribution of the baseline hazard function h 0 (t) and is typically not interpreted.  Table 7 Parameter estimates and 95% credible intervals for the joint model with interaction. D [i, j] denotes the ij-element of the covariance matrix of the random effects. tauBs is a hyperparameter relative to the distribution of the baseline hazard function h 0 (t) and is typically not interpreted.  Table 8 Join models comparison.
JM (Rizopoulos, 2010(Rizopoulos, , 2012 Estimation method Frequentist approach with maximum likelihood estimation using an em algorithm Association structure Current value, current value and slope, shared random effects, lagged effects, cumulative effects and free-form association structures  Table 9 Main functionalities of R package JM. JMbayes (Rizopoulos, 2016) Estimation method Bayesian approach using a MCMC algorithm. A long chain methodology is implemented. Multiple chains are also feasible with a little programming Association structure Current value, current value and slope, shared random effects, lagged effects, cumulative effects and free-form association structure are implemented  Table 10 Main functionalities of R package JMbayes.
joineR Williamson, Kolamunnage-Dona, Philipson, & Marson, 2008) Estimation method Frequentist approach with maximum likelihood estimation using an EM algorithm Association structure The implemented associations structures are based on an extended version of the "shared random effects" model proposed by Wulfsohn and Tsiatis (Wulfsohn & Tsiatis, 1997)  joineRML also includes functions to derive dynamic predictions.

Table 11
Main functionalities of R packages joineR and joineRML.  available   Table 12 Main functionalities of R package lcmm.
frailtypack (Król et al., 2017;Rondeau, Mazroui, & Gonzalez, 2012) Estimation method Frequentist maximum likelihood approach using either a parametric or semiparametric approach on the penalized likelihood for estimation of the hazard functions   Table 15 Main functionalities of R package bamlss.