<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article
  PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD with MathML3 v1.2 20190208//EN" "JATS-journalpublishing1-2-mathml3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ali="http://www.niso.org/schemas/ali/1.0/" article-type="research-article" dtd-version="1.2" xml:lang="en">
<front>
<journal-meta><journal-id journal-id-type="publisher-id">QCMB</journal-id><journal-id journal-id-type="nlm-ta">Quant Comput Methods Behav Sci</journal-id>
<journal-title-group>
<journal-title>Quantitative and Computational Methods in Behavioral Sciences</journal-title><abbrev-journal-title abbrev-type="pubmed">Quant. Comput. Methods Behav. Sci.</abbrev-journal-title>
</journal-title-group>
<issn pub-type="epub">2699-8432</issn>
<publisher><publisher-name>PsychOpen</publisher-name></publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">qcmb.14891</article-id>
<article-id pub-id-type="doi">10.5964/qcmb.14891</article-id>
<article-categories>
<subj-group subj-group-type="heading"><subject>Method Dissemination Article</subject></subj-group>

<subj-group subj-group-type="badge">
<subject>Data</subject>
<subject>Code</subject>
<subject>Materials</subject>
</subj-group>

</article-categories>
<title-group>
	<article-title>Linear Classification Methods for Multivariate Repeated Measures Data — A Simulation Study</article-title>
	<alt-title alt-title-type="right-running">Multivariate Repeated Measures Data Classification</alt-title>
	<alt-title specific-use="APA-reference-style" xml:lang="en">Linear classification methods for multivariate repeated measures data — A simulation study</alt-title>
</title-group>
<contrib-group content-type="authors">
	<contrib id="author-1" contrib-type="author" corresp="yes"><contrib-id contrib-id-type="orcid" authenticated="false">https://orcid.org/0000-0002-0149-479X</contrib-id><name name-style="western"><surname>Graf</surname><given-names>Ricarda</given-names></name><xref ref-type="corresp" rid="cor1">*</xref><xref ref-type="aff" rid="aff1">1</xref></contrib>
	<contrib id="author-2" contrib-type="author"><contrib-id contrib-id-type="orcid" authenticated="false">https://orcid.org/0000-0003-0172-9904</contrib-id><name name-style="western"><surname>Zeldovich</surname><given-names>Marina</given-names></name><xref ref-type="aff" rid="aff2">2</xref><xref ref-type="aff" rid="aff3">3</xref></contrib>
	<contrib id="author-3" contrib-type="author"><contrib-id contrib-id-type="orcid" authenticated="false">https://orcid.org/0000-0003-0291-4378</contrib-id><name name-style="western"><surname>Friedrich</surname><given-names>Sarah</given-names></name><xref ref-type="aff" rid="aff1">1</xref><xref ref-type="aff" rid="aff4">4</xref></contrib>
<contrib contrib-type="editor">
<name>
	<surname>Karch</surname>
	<given-names>Julian</given-names>
</name>
<xref ref-type="aff" rid="aff5"/>
</contrib>
<aff id="aff1"><label>1</label><institution>Department of Mathematics, University of Augsburg</institution>, <addr-line>Augsburg</addr-line>, <country country="DE">Germany</country></aff>
	<aff id="aff2"><label>2</label><institution>Institute of Psychology, University of Innsbruck</institution>, <addr-line>Innsbruck</addr-line>, <country country="AT">Austria</country></aff>
<aff id="aff3"><label>3</label><institution>Faculty of Psychotherapy Science, Sigmund Freud University Vienna</institution>, <addr-line>Vienna</addr-line>, <country country="AT">Austria</country></aff>
<aff id="aff4"><label>4</label><institution>Centre for Advanced Analytics and Predictive Sciences (CAAPS), University of Augsburg</institution>, <addr-line>Augsburg</addr-line>, <country country="DE">Germany</country></aff>
	<aff id="aff5"><institution>Leiden University</institution>, <addr-line>Leiden</addr-line>, <country country="NL">the Netherlands</country></aff>
</contrib-group>
	
	<author-notes>
		<corresp id="cor1"><label>*</label>Department of Mathematics, University of Augsburg, Universitätsstraße 2, 86159 Augsburg, Germany. <email xlink:href="ricarda.graf@math.uni-augsburg.de">ricarda.graf@math.uni-augsburg.de</email></corresp>
	</author-notes>
	
	
<pub-date pub-type="epub"><day>10</day><month>07</month><year>2025</year></pub-date>
<pub-date pub-type="collection" publication-format="electronic"><year>2025</year></pub-date>
<volume>5</volume>
	<elocation-id>e14891</elocation-id>
<history>
<date date-type="received">
<day>19</day>
<month>06</month>
<year>2024</year>
</date>
<date date-type="accepted">
<day>19</day>
<month>05</month>
<year>2025</year>
</date>
<date date-type="corrected">
<day>23</day>
<month>07</month>
<year>2025</year>
</date>
</history>
<permissions><copyright-year>2025</copyright-year><copyright-holder>Graf, Zeldovich, &amp; Friedrich</copyright-holder><license license-type="open-access" specific-use="CC BY 4.0" xlink:href="https://creativecommons.org/licenses/by/4.0/"><ali:license_ref>https://creativecommons.org/licenses/by/4.0/</ali:license_ref><license-p>This is an open-access article distributed under the terms of the Creative Commons Attribution (CC BY) 4.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p></license></permissions>
<abstract>
<p>Researchers in the behavioral and social sciences use linear discriminant analysis (LDA) to predict group membership (classification) and to identify the variables most relevant to group separation among a set of continuous correlated variables (description). In these and other disciplines, longitudinal data are often collected, which provide additional temporal information. Linear classification methods for repeated measures data can be more sensitive to actual group differences because they take the complex correlations between time points and variables into account, yet they are rarely discussed in the literature. Moreover, psychometric data rarely fulfill the multivariate normality assumption.</p>
<p>In this paper, we compare existing linear classification algorithms for nonnormally distributed multivariate repeated measures data in a simulation study based on psychological questionnaire data comprising Likert scales. The results show that for data without any specific assumed structure and with larger sample sizes, the robust alternatives to standard repeated measures LDA may not be needed. To our knowledge, this is one of the few studies discussing repeated measures classification techniques, and the first one comparing multiple alternatives among each other.</p>
</abstract>
<kwd-group kwd-group-type="author"><kwd>Likert-type data</kwd><kwd>linear classification</kwd><kwd>multivariate repeated measures data</kwd><kwd>nonnormality</kwd><kwd>robustness</kwd></kwd-group>

</article-meta>
</front>
<body>
	<sec sec-type="intro" id="intro"><title/>	
		<p id="S1.p1">In psychology and the social sciences, discriminant analysis (DA) has traditionally been applied to classification tasks in data with continuous variables since its introduction by <xref ref-type="bibr" rid="bib25">Fisher (1936)</xref>. Based on estimates of the group means and the pooled covariance matrix, a classification rule can be derived (predictive DA) or relative variable weights can be computed (descriptive DA). Its importance for the behavioral sciences has often been emphasized in reviews, tutorials, and textbooks (<xref ref-type="bibr" rid="bib7">Betz, 1987</xref>; <xref ref-type="bibr" rid="bib8">Boedeker &amp; Kearns, 2019</xref>; <xref ref-type="bibr" rid="bib24">Field, 2017</xref>; <xref ref-type="bibr" rid="bib26">Fletcher et al., 1978</xref>; <xref ref-type="bibr" rid="bib29">Garrett, 1943</xref>; <xref ref-type="bibr" rid="bib36">Huberty &amp; Olejnik, 2006</xref>; <xref ref-type="bibr" rid="bib67">Sherry, 2006</xref>). It has been applied to a large number of problems in experimental and applied psychology, for class prediction as well as description (<xref ref-type="bibr" rid="bib1">Aggarwala et al., 2022</xref>; <xref ref-type="bibr" rid="bib45">Kumpulainen et al., 2021</xref>; <xref ref-type="bibr" rid="bib46">Langlois et al., 2000</xref>; <xref ref-type="bibr" rid="bib55">O’Brien et al., 2009</xref>; <xref ref-type="bibr" rid="bib60">Rogge &amp; Bradbury, 1999</xref>; <xref ref-type="bibr" rid="bib68">Shinba et al., 2021</xref>; <xref ref-type="bibr" rid="bib70">Stoyanov et al., 2022</xref>).</p>
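The classification rule built from group means and a pooled covariance matrix can be sketched in a few lines. The following Python snippet is only an illustrative two-group Fisher LDA under equal priors; it is not the implementation used in this study, and all function names are our own.

```python
import numpy as np

def fisher_lda_fit(X0, X1):
    """Fit a two-group Fisher linear discriminant.

    X0, X1: (n_i, p) arrays of continuous predictors per group.
    Returns the weight vector w and the midpoint threshold c,
    assuming equal priors and a pooled covariance estimate.
    """
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    n0, n1 = len(X0), len(X1)
    # Pooled (within-group) covariance matrix
    S = ((n0 - 1) * np.cov(X0, rowvar=False)
         + (n1 - 1) * np.cov(X1, rowvar=False)) / (n0 + n1 - 2)
    w = np.linalg.solve(S, m1 - m0)   # discriminant weights (descriptive DA)
    c = w @ (m0 + m1) / 2             # decision threshold at the midpoint
    return w, c

def fisher_lda_predict(X, w, c):
    # Assign to Group 1 when the discriminant score exceeds the threshold
    return (X @ w > c).astype(int)
```

The weight vector doubles as the set of relative variable weights used in descriptive DA, which is why the same fit serves both purposes mentioned above.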
		<p id="S1.p2">In contrast to multivariate data measured at a single time point, longitudinal data provide additional information about temporal changes, which is why they are collected in various disciplines, including psychology and the social sciences (<xref ref-type="bibr" rid="bib3">Banks et al., 2021</xref>; <xref ref-type="bibr" rid="bib40">Jensen et al., 2021</xref>; <xref ref-type="bibr" rid="bib50">McLanahan et al., 2019</xref>). Despite these potential applications for repeated measures DA or alternative linear classification techniques, textbooks discussing DA do not mention corresponding repeated measures approaches (<xref ref-type="bibr" rid="bib48">Lix &amp; Sajobi, 2010</xref>).</p>
		<p id="S1.p3">To complicate matters further, many classification approaches for continuous multivariate repeated measures data assume multivariate normality (<xref ref-type="bibr" rid="bib31">Gupta, 1986</xref>; <xref ref-type="bibr" rid="bib63">Roy &amp; Khattree, 2005a</xref>, <xref ref-type="bibr" rid="bib64">2005b</xref>; <xref ref-type="bibr" rid="bib75">Tomasko et al., 2010</xref>), but this assumption is rarely fulfilled by psychological datasets and hard to verify for small sample sizes (<xref ref-type="bibr" rid="bib6">Beaumont et al., 2006</xref>; <xref ref-type="bibr" rid="bib20">Delacre et al., 2017</xref>; <xref ref-type="bibr" rid="bib53">Neto et al., 2016</xref>; <xref ref-type="bibr" rid="bib56">Rausch &amp; Kelley, 2009</xref>). Psychological data, especially those obtained using patient-reported instruments, are often characterized by skewness.</p>
		<p id="S1.p4">There are only a few alternative repeated measures approaches which relax or overcome the multivariate normality assumption while taking the complex correlation structure between time points and variables into account. The aim of this manuscript is to compare these approaches in an extensive simulation study. In particular, we consider two modifications of repeated measures LDA by <xref ref-type="bibr" rid="bib11">Brobbey (2021)</xref> and <xref ref-type="bibr" rid="bib12">Brobbey et al. (2022)</xref> that are more robust to deviations from multivariate normality, and the generalization of the support vector machine classifier by <xref ref-type="bibr" rid="bib17">Chen and Bowman (2011)</xref> to longitudinal data, which is a nonparametric linear classifier when used with a linear kernel. We compare these methods’ performance among each other and choose more general, realistic simulation settings, including unequal sample sizes, unstructured covariance matrices, and correlations varying over time instead of assuming any specific pattern.</p>
		<p id="S1.p5"><xref ref-type="bibr" rid="bib11">Brobbey (2021)</xref> compares the standard repeated measures LDA (assuming multivariate normality and homoscedasticity) to its performance after preceding multivariate outlier removal based on two trimming algorithms (<xref ref-type="bibr" rid="bib61">Rousseeuw, 1985</xref>). <xref ref-type="bibr" rid="bib12">Brobbey et al. (2022)</xref> compare the performance of the standard approach to that of the same approach based on parsimonious Kronecker product structure covariance matrix estimates from the Generalized Estimating Equations (GEE) model (<xref ref-type="bibr" rid="bib38">Inan, 2015</xref>). The longitudinal support vector machine classifier by <xref ref-type="bibr" rid="bib17">Chen and Bowman (2011)</xref> uses a weighted combination of the multivariate measurements taken at several time points as input in order to represent the data structure more realistically.</p>
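The input construction of the longitudinal support vector machine can be illustrated with a short Python sketch. Note that this only forms the weighted combination of time-point measurements with fixed, arbitrary weights; in the actual method of Chen and Bowman (2011), the weights are optimized jointly with the classifier, and the data here are synthetic.

```python
import numpy as np

# Hypothetical longitudinal data: (n subjects, p variables, T time points)
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4, 3))

# Illustrative time-point weights (fixed here, summing to one);
# each subject's trajectory is collapsed into one weighted profile
beta = np.array([0.2, 0.3, 0.5])
X_combined = np.tensordot(X, beta, axes=([2], [0]))  # shape (n, p)
```

The collapsed array `X_combined` then serves as the input to an ordinary linear classifier, which is what makes the approach linear in the original measurements.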
<p id="S1.p6">Thus, this paper provides a neutral comparison study which evaluates the performance of the standard repeated measures LDA, its robust and nonparametric alternatives, as well as all possible combinations thereof, in linear classification problems of multivariate repeated measures data, and investigates their robustness when data deviate from multivariate normality. In order to mimic realistic datasets, we base the simulations on unstructured means and covariance matrices estimated from psychometric reference datasets which differ in sample size, sample size ratios, class overlap, temporal variation, and number of measurement occasions. In addition to method comparisons using data simulations, we evaluate the algorithms’ performance in the reference data using a nonparametric bootstrap approach which provides confidence intervals for the performance measures (<xref ref-type="bibr" rid="bib79">Wahl et al., 2016</xref>).</p>
<p id="S1.p7">The paper is organized as follows. In the <xref ref-type="sec" rid="s2">Data</xref> section, we explain the general structure of Likert-type data and its analysis, and describe the characteristics of the five reference datasets, which are based on Likert-type data. In the <xref ref-type="sec" rid="s3">Method</xref> section, we introduce the classification algorithms whose performance we compare, as well as the two comparison approaches based on the reference data and on data simulations, respectively. In the <xref ref-type="sec" rid="s4">Results and Discussion</xref> section, we present and discuss the results and provide recommendations based on the findings. <xref ref-type="sec" rid="s5">Conclusions</xref> are drawn in the final section.</p></sec>
<sec id="s2" sec-type="Data"><title>Data</title>
	<p id="S2.p1">Questionnaires using Likert-type response data are a typical example of psychological data to which LDA is applied. In the <xref ref-type="sec" rid="s2_1">Psychological Questionnaires Using Likert-Type Scales</xref> section, we describe the general data structure and how LDA is used for validating the importance of a particular subset of variables with the aim of distinguishing two groups; some sources explicitly call for longitudinal techniques, underlining the importance of discussing the available methods. In <xref ref-type="sec" rid="s2_2">Reference Datasets</xref>, we present the two reference datasets for which individuals completed standardized questionnaires using Likert-type responses. In order to examine the methods’ performance in further relevant scenarios, we additionally considered multiple modifications of these datasets, which are also described.</p>
<sec id="s2_1"><title>Psychological Questionnaires Using Likert-Type Scales</title>
	<p id="S2_1.p1S2.SS1.Px1.p1">In psychological and social science research, behaviour is most often assessed by self-report questionnaires using Likert scales (<xref ref-type="bibr" rid="bib5">Baumeister et al., 2007</xref>; <xref ref-type="bibr" rid="bib18">Clark &amp; Watson, 2019</xref>; <xref ref-type="bibr" rid="bib71">Sullivan &amp; Artino, 2013</xref>). It is common practice to create pools of Likert items to form subscales which each represent an aspect of the overall construct that the questionnaire is intended to investigate. Single Likert items (i.e., questions) are not considered to sufficiently capture these aspects (<xref ref-type="bibr" rid="bib18">Clark &amp; Watson, 2019</xref>; <xref ref-type="bibr" rid="bib58">Rickards et al., 2012</xref>) and are therefore summarized into subscales by taking either the sum or the average of subgroups of Likert items. The development and best practices of constructing questionnaires using Likert-type responses are discussed in the methodological psychology literature (<xref ref-type="bibr" rid="bib39">Jebb et al., 2021</xref>). <xref ref-type="bibr" rid="bib47">Likert (1932)</xref> developed the typical 5- or 7-point ordinal scale on which single items are measured, e.g., ranging from “Strongly approve” to “Strongly disapprove”. He suggests assigning numerical values to the answer choices in the same order as they are ranked. However, he does not suggest that these ordinal values must necessarily be translated into an equidistant scale, and states that the same results will be obtained as long as the rank order is preserved. This translation of an ordinal scale into a numerical scale conditional on rank preservation is considered legitimate elsewhere (<xref ref-type="bibr" rid="bib69">Silan, 2020</xref>). 
In conclusion, the distances between the numerical values are irrelevant to the analysis (<xref ref-type="bibr" rid="bib28">Gaito, 1980</xref>), which complies with the ordinal measurement scale of the Likert items, where distances between answer choices cannot be measured. <xref ref-type="bibr" rid="bib47">Likert (1932)</xref> suggests subsequently taking the sum or mean of the transformed values, which he assumes to be normally distributed. There is a long-standing debate about how Likert-type scales should appropriately be analysed, but the prevailing opinion, based on vast empirical evidence (<xref ref-type="bibr" rid="bib14">Carifio &amp; Perla, 2007</xref>; <xref ref-type="bibr" rid="bib54">Norman, 2010</xref>), is that survey scales, as opposed to single Likert items, may be treated as interval data such that means and standard deviations can be computed and parametric methods applied (<xref ref-type="bibr" rid="bib15">Carifio &amp; Perla, 2008</xref>; <xref ref-type="bibr" rid="bib58">Rickards et al., 2012</xref>; <xref ref-type="bibr" rid="bib71">Sullivan &amp; Artino, 2013</xref>). Specific examples of the application of LDA to questionnaire data based on Likert-type scales are <xref ref-type="bibr" rid="bib42">Knowles et al. (2000)</xref>, <xref ref-type="bibr" rid="bib43">Kristjansdottir et al. (2018)</xref>, <xref ref-type="bibr" rid="bib78">Veronese and Pepe (2017)</xref>, and <xref ref-type="bibr" rid="bib80">Wang et al. (2016)</xref>. In all of these studies, the authors computed Fisher discriminant function coefficients (descriptive DA) for the subscales of the considered psychological questionnaires using Likert-type responses and showed the validity of these coefficients, i.e., their discriminative ability, by subsequent linear classification (predictive DA). In particular, <xref ref-type="bibr" rid="bib80">Wang et al. (2016)</xref> examine a longitudinal data set but restrict their analysis to time point one when applying LDA. 
<xref ref-type="bibr" rid="bib78">Veronese and Pepe (2017)</xref> emphasize the need to explore the dynamic relations between their chosen subscales over time and point out their restriction to cross-sectional data in their LDA as a considerable limitation.</p></sec>
<sec id="s2_2"><title>Reference Datasets</title>
<p id="S2.SS2.Px1.p1">Two datasets differing in the number of repeated measurement occasions, as well as two modifications thereof, are used as reference datasets. Each original dataset comprises measurements of four continuous predictor variables, measured at two time points (CORE-OM dataset) and four time points (CASP-19 dataset), respectively. The binary outcome variable represents the group (<inline-formula><mml:math id="math-1" display="inline"><mml:mi>y</mml:mi><mml:mo>∈</mml:mo><mml:mo stretchy="false">{</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">}</mml:mo></mml:math></inline-formula>). Both of these standardized psychological questionnaires consist of Likert-type questions measured on a 5-point and 4-point Likert scale, respectively. Following the developers’ scoring guidelines, we used the mean score of multiple Likert items in case of the CORE-OM dataset, and the sum score in case of the CASP-19 dataset, as the basis for parameter estimation and subsequent data simulation.</p>
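The two scoring rules can be illustrated in a minimal Python sketch; the item responses below are hypothetical, coded as on a 0–4 Likert scale.

```python
import numpy as np

# Hypothetical responses of 3 participants to a 4-item subscale (coded 0-4)
items = np.array([
    [0, 1, 2, 1],
    [3, 4, 3, 2],
    [1, 1, 0, 0],
])

mean_score = items.mean(axis=1)  # CORE-OM-style mean score per person
sum_score = items.sum(axis=1)    # CASP-19-style sum score per person
```

Both scores preserve the same ordering of participants within a subscale; they differ only in scale, which matters when comparing subscales with different numbers of items.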
<p id="S2.SS1.Px1.p2">We created reference datasets from these data in order to compare the methods’ performance in different, near-realistic settings, not in order to draw any substantive conclusions about the data themselves. The datasets differ, among other things, in sample sizes, sample size ratios, class overlap, temporal variation, and number of measurement occasions.</p>
<p id="S2.SS1.Px1.p3">The first dataset (<xref ref-type="bibr" rid="bib85">Zeldovich, 2018</xref>) is based on the CORE-OM (Clinical Outcomes in Routine Evaluation-Outcome Measure; <xref ref-type="bibr" rid="bib4">Barkham et al., 1998</xref>), a self-report questionnaire of psychological distress. It assesses the progress of psychological or psychotherapeutic treatment using four domains (subjective well-being, problems/symptoms, life functioning, risk/harm) measured on a 5-point Likert scale (0: not at all, 1: only occasionally, 2: sometimes, 3: often, 4: most or all the time). Our dataset uses the binary variable hospitalisation as the group variable and is denoted as “Dataset 1” in the following. Non-hospitalised participants represent Group 0 (<inline-formula><mml:math id="math-2" display="inline"><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>42</mml:mn></mml:math></inline-formula>) and hospitalised ones Group 1 (<inline-formula><mml:math id="math-3" display="inline"><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>142</mml:mn></mml:math></inline-formula>).</p>
<p id="S2.SS1.Px1.p4">The second dataset is a self-report questionnaire of quality of life developed for adults aged 60 and older abbreviated to CASP-19 (<xref ref-type="bibr" rid="bib37">Hyde et al., 2003</xref>). The dataset on CASP-19 is derived from Waves 2, 3, 4, and 5 of the English Longitudinal Study of Ageing (ELSA) (<xref ref-type="bibr" rid="bib3">Banks et al., 2021</xref>). The CASP-19 questionnaire comprises four subdomains (Control, Autonomy, Self-realization, Pleasure) measured on a 4-point Likert scale (0: <italic>Often</italic>, 1: <italic>Sometimes</italic>, 2: <italic>Not often</italic>, 3: <italic>Never</italic>; reversed scale for some items). Loneliness as one of the factors affecting quality of life (<xref ref-type="bibr" rid="bib72">Talarska et al., 2018</xref>) is chosen as the group variable. For this purpose, the sample was dichotomized at a score value of three determined from two questions related to loneliness (“Old age is a time of loneliness”, “As I get older, I expect to become more lonely”), answered on a 5-point Likert scale (1: <italic>Strongly agree</italic>, 5: <italic>Strongly disagree</italic>) by the participants during Wave 2. Persons who feel less lonely represent Group 0 (<inline-formula><mml:math id="math-4" display="inline"><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>948</mml:mn></mml:math></inline-formula>) and those who feel more lonely represent Group 1 (<inline-formula><mml:math id="math-5" display="inline"><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1682</mml:mn></mml:math></inline-formula>). Since the group differences in these original data were only marginal, we modified them: all individuals of Group 1 were included in our reference dataset, but Group 0 was restricted to individuals whose scores on the variables “control” and “self-realization” lay above the respective 0.2 quantiles. The dataset is referred to as “Dataset 2” in the following.</p>
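The dichotomization and filtering steps can be sketched as follows. All data and variable names are illustrative assumptions, as are the direction and exact form of the loneliness cutoff; the sketch is not based on the ELSA data themselves.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
# Hypothetical Wave-2 data: two loneliness items (1-5) and two subscale scores
lonely1 = rng.integers(1, 6, n)
lonely2 = rng.integers(1, 6, n)
control = rng.normal(9, 2, n)
self_real = rng.normal(8, 2, n)

# Dichotomize at a combined loneliness score of three
# (lower item values = stronger agreement = feeling more lonely)
group = (lonely1 + lonely2 <= 3).astype(int)  # 1 = feels more lonely

# Keep all of Group 1; restrict Group 0 to individuals above the
# 0.2 quantiles of "control" and "self-realization"
is_g0 = group == 0
q_control = np.quantile(control[is_g0], 0.2)
q_self = np.quantile(self_real[is_g0], 0.2)
keep = ~is_g0 | ((control > q_control) & (self_real > q_self))
```

The boolean mask `keep` then selects the rows of the modified reference dataset.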
<p id="S2.SS1.Px1.p5">Answers to the questions of each subdomain in these questionnaires are summarized in a score, where a higher mean score corresponds to a higher level of distress (Dataset 1) and a higher sum score indicates a better quality of life (Dataset 2). Data simulations are based on these scores. Boxplots in <xref ref-type="fig" rid="fig-1">Figure 1a</xref> and <xref ref-type="fig" rid="fig-1">1b</xref> show the scores’ distributions in Reference Data 1 and 2, respectively. They indicate that, on average, individuals in one group obtain higher (or lower) scores than those in the other group irrespective of time point and variable, presumably facilitating classification in these datasets. Also, temporal variation in Dataset 2 is rather modest.</p>
	<p id="S2.SS1.Px1.p6">Therefore, we considered further scenarios beyond these two original datasets (Dataset 1: CORE-OM, Dataset 2: CASP-19). To test the methods under different conditions, we created three modified versions of these datasets (Dataset 3: modified CORE-OM with equal group means collapsed over time points and opposite temporal trends of the group means; Dataset 4: modified CASP-19, Time Points 1 and 2 only, with identical means but heterogeneous covariance matrices; Dataset 5: modified CASP-19, Time Points 1 and 2 only, with balanced class sizes obtained by random undersampling of Group 1). Dataset 3 was created by adding a constant specific to each variable to the data of Group 0 such that the collapsed means of both groups became equal, while maintaining the original boundaries of the measurement intervals. Then we swapped the data of the two time points for Variables 1, 2, and 3 in Group 0, such that the means of Group 0 follow an upward temporal trend compared to the downward temporal trend of the measurements in Group 1. For Dataset 4, only Time Points 1 and 2 are considered. We adjusted the means of Group 0 per time point such that they equal those of Group 1. For Dataset 5, also only Time Points 1 and 2 are considered, and a random subset of the larger Group 1 equalling the sample size of Group 0 was chosen in order to obtain a balanced scenario. The corresponding <monospace>R</monospace> code can be found in <xref ref-type="bibr" rid="bib31.5">Graf et al. (2025</xref>, code “availability”).</p>
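The mean-shift, time-point-swap, and undersampling modifications can be sketched in Python on synthetic data; the published <monospace>R</monospace> code remains authoritative, and the interval-boundary adjustment described above is omitted here for brevity.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical scores: (subjects, variables, time points)
X0 = rng.normal(1.5, 0.5, size=(42, 4, 2))   # Group 0
X1 = rng.normal(2.5, 0.5, size=(142, 4, 2))  # Group 1

# Dataset-3-style step 1: shift Group 0 by a per-variable constant so that
# the group means collapsed over time points coincide
shift = X1.mean(axis=(0, 2)) - X0.mean(axis=(0, 2))
X0_mod = X0 + shift[None, :, None]

# Step 2: swap the two time points for Variables 1-3 in Group 0
# to induce an opposite temporal trend (collapsed means are unchanged)
X0_mod[:, :3, :] = X0_mod[:, :3, ::-1].copy()

# Dataset-5-style step: random undersampling of the larger group
idx = rng.choice(len(X1), size=len(X0), replace=False)
X1_bal = X1[idx]
```

Swapping the time points changes only the temporal trend of Group 0, not its collapsed means, which is what keeps the Dataset 3 scenario focused on temporal structure.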
	<p id="S2.SS1.Px1.p7">With Dataset 4, the aim was to create data whose groups differ only in their covariance matrices. Homogeneity of covariance matrices can be tested using the well-known Box’s M test (<xref ref-type="bibr" rid="bib9">Box, 1949</xref>), but its reliability suffers when the multivariate normality assumption is even slightly violated (<xref ref-type="bibr" rid="bib73">Tiku &amp; Balakrishnan, 1984</xref>). For Dataset 4, the <inline-formula><mml:math id="math-6" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>-value of the approximate <inline-formula><mml:math id="math-7" display="inline"><mml:msup><mml:mrow><mml:mi>χ</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> test statistic of the Box’s M test is <inline-formula><mml:math id="math-8" display="inline"><mml:mo>&lt;</mml:mo><mml:mo>.</mml:mo><mml:mn>001</mml:mn></mml:math></inline-formula> (<inline-formula><mml:math id="math-9" display="inline"><mml:msup><mml:mrow><mml:mi>χ</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>(36) = 1789.9), but this significant test result may indicate a violation of normality rather than inequality of covariance matrices. 
Therefore, since the data significantly differ from multivariate normality (Table S4 in <xref ref-type="bibr" rid="bib31.5">Graf et al., 2025</xref>), we visually assessed the covariance matrices’ heterogeneity based on the components used for Box’s M test, i.e., the log determinants of the pooled and group covariance matrices (<inline-formula><mml:math id="math-10" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>pooled</mml:mtext></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mspace width="0.3em"/><mml:mtext>and</mml:mtext><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>), which equal the sum of the respective log eigenvalues. We use plots of log determinants with 95% confidence intervals and plots of log eigenvalues of the covariance matrices as suggested by <xref ref-type="bibr" rid="bib27">Friendly and Sigal (2020)</xref>. From <xref ref-type="fig" rid="fig-2">Figure 2</xref> we conclude substantial heterogeneity of the group covariance matrices <inline-formula><mml:math id="math-11" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math id="math-12" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> in Dataset 4. 
For Dataset 4, simulations are based on the estimates of <inline-formula><mml:math id="math-13" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math id="math-14" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>, whereas for Datasets 1–3 and 5 they are based on the estimate of <inline-formula><mml:math id="math-15" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>pooled</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula> such that the LDA assumption of homogeneous covariance matrices holds. Figure S1 in <xref ref-type="bibr" rid="bib31.5">Graf et al. (2025)</xref> shows the plots for inspecting heterogeneity of covariance matrices for the other reference datasets as well. The homogeneity assumption is not fulfilled in any of the reference datasets. Boxplots in <xref ref-type="fig" rid="fig-1">Figure 1c–1e</xref> show the scores’ distributions in Reference Data 3–5.</p>
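The quantities underlying this visual check can be computed directly. The following Python sketch (our own illustration, assuming two groups' raw score matrices) obtains the log determinants of the group and pooled covariance matrices via the sum of log eigenvalues.

```python
import numpy as np

def log_det_cov(X):
    """Log determinant of the sample covariance matrix of X (n x p),
    computed as the sum of the log eigenvalues, which is numerically
    safer than log(det(S)) for large p."""
    S = np.cov(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(S)   # symmetric matrix -> real eigenvalues
    return np.sum(np.log(eigvals))

def pooled_cov(X0, X1):
    """Pooled covariance matrix of two groups."""
    n0, n1 = len(X0), len(X1)
    return ((n0 - 1) * np.cov(X0, rowvar=False)
            + (n1 - 1) * np.cov(X1, rowvar=False)) / (n0 + n1 - 2)
```

Comparing `log_det_cov` of each group with that of the pooled matrix gives the same components that Box's M test aggregates, without relying on the test's normality-sensitive p-value.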
	<p id="S2.SS1.Px1.p8">We chose reference datasets with moderate temporal and cross-sectional correlations. Correlation matrices are shown in Table S1a–S1e of <xref ref-type="bibr" rid="bib31.5">Graf et al. (2025)</xref>. In this case, analyzing the data separately per time point or focusing on measurements of single variables over multiple time points, respectively, would ignore these correlations and yield less reliable results if, in fact, affiliation to one of the groups is affected by multiple correlated variables and/or time points (e.g., <xref ref-type="bibr" rid="bib30">Gnanadesikan &amp; Kettenring, 1984)</xref>.</p>
<table-wrap id="tab_1" position="anchor" orientation="portrait">
<label>Table 1</label><caption><title>Some Properties of the Reference Datasets and the Corresponding Simulation Scenarios Considered in the Simulation Study</title>
</caption>
	<table frame="hsides" rules="groups" style="compact-1; striped-#f3f3f3"><colgroup span="1">
<col width="" align="left" valign="bottom"/>
<col width=""/>
<col width=""/>
<col width="" align="left"/>
<col width=""/>
<col width="" align="left"/></colgroup>
<thead>
<tr>
<th valign="bottom">Dataset</th>
<th valign="bottom"># Variables</th>
<th valign="bottom"># Time Points</th>
<th valign="bottom">Sample Size</th>
<th valign="bottom">Simulation Covariance Matrix</th>
<th valign="bottom">Simulation Scenario Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><inline-formula><mml:math id="math-19" display="inline"><mml:mtext mathvariant="italic">Dataset 1</mml:mtext></mml:math></inline-formula></td>
<td>4</td>
<td>2</td>
<td><inline-formula><mml:math id="math-20" display="inline"><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>42</mml:mn><mml:mo>,</mml:mo><mml:mspace width="0.3em"/><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>142</mml:mn></mml:math></inline-formula></td>
<td><inline-formula><mml:math id="math-21" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>pooled</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>unbalanced sample sizes, homogeneous covariance matrices, same temporal trends of group means</td>
</tr>
<tr>
<td><inline-formula><mml:math id="math-22" display="inline"><mml:mtext mathvariant="italic">Dataset 2</mml:mtext></mml:math></inline-formula></td>
<td>4</td>
<td>4</td>
<td><inline-formula><mml:math id="math-23" display="inline"><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>948</mml:mn><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1682</mml:mn></mml:math></inline-formula></td>
<td><inline-formula><mml:math id="math-24" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>pooled</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>unbalanced sample sizes, homogeneous covariance matrices, same temporal trends of group means</td>
</tr>
<tr>
<td><inline-formula><mml:math id="math-25" display="inline"><mml:mtext mathvariant="italic">Dataset 3</mml:mtext></mml:math></inline-formula></td>
<td>4</td>
<td>2</td>
<td><inline-formula><mml:math id="math-26" display="inline"><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>42</mml:mn><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>142</mml:mn></mml:math></inline-formula></td>
<td><inline-formula><mml:math id="math-27" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>pooled</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>unbalanced sample sizes, homogeneous covariance matrices, same group means collapsed over time, opposite temporal trends</td>
</tr>
<tr>
<td><inline-formula><mml:math id="math-28" display="inline"><mml:mtext mathvariant="italic">Dataset 4</mml:mtext></mml:math></inline-formula></td>
<td>4</td>
<td>2</td>
<td><inline-formula><mml:math id="math-29" display="inline"><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>948</mml:mn><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1682</mml:mn></mml:math></inline-formula></td>
<td><inline-formula><mml:math id="math-30" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>unbalanced sample sizes, heterogeneous covariance matrices, same group means</td>
</tr>
<tr>
<td><inline-formula><mml:math id="math-31" display="inline"><mml:mtext mathvariant="italic">Dataset 5</mml:mtext></mml:math></inline-formula></td>
<td>4</td>
<td>2</td>
<td><inline-formula><mml:math id="math-32" display="inline"><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>948</mml:mn><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>948</mml:mn></mml:math></inline-formula></td>
<td><inline-formula><mml:math id="math-33" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>pooled</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>balanced sample sizes, homogeneous covariance matrices, same temporal trends of group means</td>
</tr>
</tbody>
	
</table>
	<table-wrap-foot>
		<p><italic>Note.</italic> <inline-formula><mml:math id="math-16" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>pooled</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula> = pooled covariance matrix, <inline-formula><mml:math id="math-17" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> = covariance matrix Group 0, <inline-formula><mml:math id="math-18" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> = covariance matrix Group 1.</p>
	</table-wrap-foot>
</table-wrap>
	
	<fig id="fig-1" position="anchor" orientation="portrait"><label>Figure 1</label><caption><title>Boxplots Showing the Variables’ Distribution in the Reference Datasets</title>
<p><italic>Note. </italic>(a) Dataset 1: CORE-OM dataset, group variable <italic>hospitalisation</italic> (<inline-formula><mml:math id="math-34" display="inline"><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>42</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="math-35" display="inline"><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>142</mml:mn></mml:math></inline-formula>, non-hospitalised individuals represent Group 0 and hospitalised individuals represent Group 1).</p>
<p>(b) Dataset 2: CASP-19 dataset, group variable <italic>loneliness</italic> (<inline-formula><mml:math id="math-36" display="inline"><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>948</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="math-37" display="inline"><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1682</mml:mn></mml:math></inline-formula>, participants who feel less lonely represent Group 0 and participants who feel more lonely represent Group 1).</p>
<p>(c) Dataset 3 (modified Dataset 1): same collapsed means, group means with opposite temporal trends.</p>
<p>(d) Dataset 4 (modified Dataset 2, Time Points 1 &amp; 2): same means, group covariance matrices differ.</p>
<p>(e) Dataset 5 (modified Dataset 2, Time Points 1 &amp; 2): balanced class sizes by random undersampling of Group 1.</p></caption><graphic mimetype="image" mime-subtype="png" xlink:href="qcmb.14891-f1.png" position="anchor" orientation="portrait"/></fig><fig id="fig-2" position="anchor" orientation="portrait"><label>Figure 2</label><caption><title>Plots of the Components of Box’s M Test for Dataset 4</title><p><italic>Note. </italic>Left: log determinants of covariance matrices with asymptotic 95% confidence intervals (CI). Right: scree plots of log eigenvalues of the covariance matrices. Less overlap of CIs and higher differences between log eigenvalues, respectively, correspond to a higher degree of heterogeneity of the (group) covariance matrices. The figures indicate (significant) heterogeneity of covariance matrices.</p> </caption><graphic mimetype="image" mime-subtype="png" xlink:href="qcmb.14891-f2.png" position="anchor" orientation="portrait"/></fig></sec></sec>
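The two components of Box's M test shown in Figure 2 can be illustrated with standard numerical tools. The following sketch is our own illustration, not the authors' code; the simulated two-group data and all variable names are assumptions. It computes the quantities plotted in the two panels: the log determinants and the (sorted) log eigenvalues of the group covariance matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated stand-in for two groups measured on d*t = 8 features;
# Group 1's scatter is inflated so the covariance matrices are heterogeneous.
X0 = rng.normal(size=(200, 8))
X1 = 1.5 * rng.normal(size=(300, 8))

S0 = np.cov(X0, rowvar=False)
S1 = np.cov(X1, rowvar=False)

# Left panel of Figure 2: log determinants of the group covariances.
logdet0 = np.linalg.slogdet(S0)[1]
logdet1 = np.linalg.slogdet(S1)[1]

# Right panel: scree plots of the log eigenvalues (eigvalsh returns them
# in ascending order for symmetric matrices).
log_eig0 = np.log(np.linalg.eigvalsh(S0))
log_eig1 = np.log(np.linalg.eigvalsh(S1))
```

Clearly separated log determinants and diverging log-eigenvalue profiles correspond to heterogeneous group covariances, as in Dataset 4.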
<sec id="s3" sec-type="Method"><title>Method</title>
<p id="S3.p1">In the following section, we will describe the traditional repeated measures LDA, which relies on the multivariate normality assumption, its robust versions, and the nonparametric longitudinal SVM for the classification of nonnormally distributed repeated measures data. We will compare the performance of these methods in a neutral comparison study with respect to multiple performance measures. An overview of the considered methods is given in <xref ref-type="table" rid="tab_2">Table 2</xref>. Each classification method is considered both with and without preceding outlier removal by trimming algorithms. An overview of the steps in the simulation study is shown in <xref ref-type="table" rid="tab_3">Table 3</xref>. Further details are included in the <xref ref-type="sec" rid="s3_4">Simulation Study</xref> section.</p>
<p id="S3.p2">We consider a situation with a categorical outcome variable <inline-formula><mml:math id="math-38" display="inline"><mml:mi>y</mml:mi><mml:mo>∈</mml:mo><mml:mo stretchy="false">{</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">}</mml:mo></mml:math></inline-formula>, where measurements of <inline-formula><mml:math id="math-39" display="inline"><mml:mi>d</mml:mi></mml:math></inline-formula> variables are taken at <inline-formula><mml:math id="math-40" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula> consecutive time points instead of only a single time point in <inline-formula><mml:math id="math-41" display="inline"><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> individuals. We consider complete data, i.e., for each individual <inline-formula><mml:math id="math-42" display="inline"><mml:mi>j</mml:mi><mml:mo>∈</mml:mo><mml:mo stretchy="false">{</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mo>…</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">}</mml:mo></mml:math></inline-formula>, each measurement <inline-formula><mml:math id="math-43" display="inline"><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mo>…</mml:mo><mml:mo>,</mml:mo><mml:mi>d</mml:mi></mml:math></inline-formula> is taken at each time point <inline-formula><mml:math id="math-44" display="inline"><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mo>…</mml:mo><mml:mo>,</mml:mo><mml:mi>t</mml:mi></mml:math></inline-formula>. 
The aim is to estimate a classification rule from the (training) data that can classify new observations (from separate independent test data) into one of two groups.</p>
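As a concrete illustration of this setup (the array layout, names, and use of Dataset 1's group sizes are our own choices for the example, not the authors' code), each individual's <italic>d</italic> measurements at <italic>t</italic> time points can be stacked into a single vector of length <italic>dt</italic>:

```python
import numpy as np

rng = np.random.default_rng(1)
d, t = 4, 2            # variables and time points (as in Dataset 1)
n0, n1 = 42, 142       # group sizes, n = n0 + n1
n = n0 + n1

# Each row stacks one individual's d x t measurements into a dt-vector;
# y in {0, 1} holds the group label to be predicted for new observations.
X = rng.normal(size=(n, d * t))
y = np.concatenate([np.zeros(n0, dtype=int), np.ones(n1, dtype=int)])
```

A classification rule estimated on such training data is then evaluated on a separate, independently generated test array of the same shape.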
<table-wrap id="tab_2" position="float" orientation="portrait">
<label>Table 2</label><caption><title>Overview of the Considered Linear Classification Methods for Nonnormally Distributed Multivariate Repeated Measures Data</title></caption>
	<table frame="hsides" rules="groups" style="striped-#f3f3f3"><colgroup span="1">
<col width="" align="left"/>
<col width="" align="left"/>
<col width="" align="left"/></colgroup>
<thead>
<tr>
<th valign="bottom">Linear Classification Method</th>
<th valign="bottom">Description</th>
<th valign="bottom">Abbreviation</th>
</tr>
</thead>
<tbody>
<tr>
	<td>Repeated measures linear discriminant analysis (LDA)<sup>a</sup></td>
<td>Parametric method depending on estimates of the group means and common covariance matrix</td>
<td/>
</tr>
<tr>
<td>1) standard/traditional</td>
<td>(unstructured) pooled covariance matrix, requires multivariate normality (<xref ref-type="bibr" rid="bib48">Lix &amp; Sajobi, 2010</xref>)</td>
<td>LDA(<inline-formula><mml:math id="math-45" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="normal">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>pooled</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>)</td>
</tr>
<tr>
<td>2) robust</td>
<td>a) (parsimonious) Kronecker product covariance estimated by flip-flop algorithm (<xref ref-type="bibr" rid="bib11">Brobbey, 2021</xref>)</td>
<td>LDA(<inline-formula><mml:math id="math-46" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="normal">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>KP</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>)</td>
</tr>
<tr>
<td/>
<td>b) (unstructured) covariance matrix estimated using the joint Generalized Estimating Equations model (<xref ref-type="bibr" rid="bib12">Brobbey et al., 2022</xref>)</td>
<td>LDA(GEE)</td>
</tr>
<tr>
	<td>Longitudinal Support Vector Machine (SVM) using a linear kernel<sup>b</sup></td>
	<td>Nonparametric method independent of distributional assumptions (<xref ref-type="bibr" rid="bib17">Chen &amp; Bowman, 2011</xref>)</td>
<td>SVM</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
	<p><italic>Note.</italic> The performance of each classification method is estimated both with and without preceding multivariate outlier removal (using either the Minimum Volume Ellipsoid (MVE) or the Minimum Covariance Determinant (MCD) algorithm).</p><p><sup>a</sup>see the <xref ref-type="sec" rid="s3_1">Multivariate Repeated Measures LDA</xref> sub-section.</p><p><sup>b</sup>see the <xref ref-type="sec" rid="s3_2">Longitudinal Support Vector Machine</xref> sub-section.</p>
	</table-wrap-foot>
</table-wrap>
<sec id="s3_1"><title>Multivariate Repeated Measures LDA</title>
<p id="S3.SS1.Px1.p1">For LDA, the unknown parameters <inline-formula><mml:math id="math-47" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold-italic">μ</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>∈</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, i.e., the group-specific mean vectors, and <inline-formula><mml:math id="math-48" display="inline"><mml:mi mathvariant="bold">Σ</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi><mml:mo>×</mml:mo><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, i.e., the pooled covariance matrix, need to be estimated from the data</p> 

	<p><disp-formula><mml:math id="math-49"><mml:msub><mml:mfenced open="{" close="}"><mml:mrow><mml:msubsup><mml:mi mathvariant="bold">X</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi><mml:mn>1</mml:mn></mml:mrow><mml:mi>T</mml:mi></mml:msubsup><mml:mo>,</mml:mo><mml:mo>⋯</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mi mathvariant="bold">X</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mi>T</mml:mi></mml:msubsup></mml:mrow></mml:mfenced><mml:munder accentunder="false"><mml:mrow><mml:mi>i</mml:mi><mml:mo>∈</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mo>⋯</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>n</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:munder></mml:msub><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>×</mml:mo><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msup></mml:math>,</disp-formula></p>

<p>where <inline-formula><mml:math id="math-50" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">X</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>∈</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> are continuous measurements. Here, <inline-formula><mml:math id="math-51" display="inline"><mml:mi>i</mml:mi><mml:mo>∈</mml:mo><mml:mo stretchy="false">{</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">}</mml:mo></mml:math></inline-formula> represents the group label, <inline-formula><mml:math id="math-52" display="inline"><mml:mi>j</mml:mi><mml:mo>∈</mml:mo><mml:mo stretchy="false">{</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mo>…</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">}</mml:mo></mml:math></inline-formula> the patient, <inline-formula><mml:math id="math-53" display="inline"><mml:mi>k</mml:mi><mml:mo>∈</mml:mo><mml:mo stretchy="false">{</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mo>…</mml:mo><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">}</mml:mo></mml:math></inline-formula> the time point, and <inline-formula><mml:math id="math-54" display="inline"><mml:mi>d</mml:mi></mml:math></inline-formula> the number of variables. The total sample size is denoted by <inline-formula><mml:math id="math-55" display="inline"><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>. 
The covariance matrix <inline-formula><mml:math id="math-56" display="inline"><mml:mi mathvariant="bold">Σ</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi><mml:mo>×</mml:mo><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is assumed to be positive definite. The traditional LDA assumes multivariate normality of the data,</p> 
	
	<p><disp-formula><mml:math id="math-57"><mml:msub><mml:mrow><mml:mi mathvariant="bold">X</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mover><mml:mrow><mml:mo>∼</mml:mo></mml:mrow><mml:mrow><mml:mrow><mml:mtext>iid</mml:mtext></mml:mrow></mml:mrow></mml:mover><mml:msub><mml:mrow><mml:mi mathvariant="script">N</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="bold-italic">μ</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="bold">Σ</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math>,</disp-formula></p>
	
	<p>as well as equality of group covariance matrices (homoscedasticity), <inline-formula><mml:math id="math-58" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi mathvariant="bold">Σ</mml:mi></mml:math></inline-formula>. <xref ref-type="bibr" rid="bib11">Brobbey (2021)</xref> and <xref ref-type="bibr" rid="bib12">Brobbey et al. (2022)</xref> developed two approaches for robust LDA (when data deviate from multivariate normality) based on the Kronecker product estimate of the covariance matrix <inline-formula><mml:math id="math-59" display="inline"><mml:mi mathvariant="bold">Σ</mml:mi></mml:math></inline-formula> that will be described in the <xref ref-type="sec" rid="s3_1_1">Robust Trimmed Likelihood LDA for Multivariate Repeated Measures Data</xref> section and in the <xref ref-type="sec" rid="s3_1_2">Generalized Estimation Equations (GEE) Discriminant Analysis for Repeated Measures Data</xref> section. Here, we will briefly explain the rationale behind these modified LDA approaches and introduce the general LDA classification rule.</p>
	<p id="S3.SS1.Px1.p2">Assuming that <inline-formula><mml:math id="math-60" display="inline"><mml:mi mathvariant="bold">Σ</mml:mi></mml:math></inline-formula> is unstructured, all distinct correlations between each pair of the <inline-formula><mml:math id="math-61" display="inline"><mml:mi>d</mml:mi></mml:math></inline-formula> variables and each combination of the <inline-formula><mml:math id="math-62" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula> time points must be estimated. If the dataset is small, i.e., if <inline-formula><mml:math id="math-64" display="inline"><mml:mi>n</mml:mi><mml:mo>≤</mml:mo><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:math></inline-formula>, the estimate <inline-formula><mml:math id="math-63" display="inline"><mml:mover accent="false"><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mo>̂</mml:mo></mml:mover></mml:math></inline-formula> becomes singular. In order to reduce the complexity of <inline-formula><mml:math id="math-65" display="inline"><mml:mi mathvariant="bold">Σ</mml:mi></mml:math></inline-formula> or to estimate <inline-formula><mml:math id="math-66" display="inline"><mml:mi mathvariant="bold">Σ</mml:mi></mml:math></inline-formula> more efficiently, a reduced number of parameters can be considered by assuming, for example, a Kronecker product structure <inline-formula><mml:math id="math-67" display="inline"><mml:mi mathvariant="bold">Σ</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>×</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>⊗</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mo>×</mml:mo><mml:mi>d</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. 
Here, <inline-formula><mml:math id="math-68" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>×</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>∈</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>×</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> comprises the correlations between the <inline-formula><mml:math id="math-69" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula> time points and <inline-formula><mml:math id="math-70" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mo>×</mml:mo><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mo>∈</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mo>×</mml:mo><mml:mi>d</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> comprises the correlations between the <inline-formula><mml:math id="math-71" display="inline"><mml:mi>d</mml:mi></mml:math></inline-formula> variables. 
The number of unknown parameters reduces from <inline-formula><mml:math id="math-72" display="inline"><mml:mo stretchy="false">(</mml:mo><mml:mi>d</mml:mi><mml:mi>t</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>d</mml:mi><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo>/</mml:mo><mml:mn>2</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> for an unstructured covariance matrix to <inline-formula><mml:math id="math-73" display="inline"><mml:mi>d</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>d</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo>/</mml:mo><mml:mn>2</mml:mn><mml:mo>+</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo>/</mml:mo><mml:mn>2</mml:mn></mml:math></inline-formula> for a Kronecker product covariance matrix (<xref ref-type="bibr" rid="bib52">Naik &amp; Rao, 2001</xref>). It can be estimated by the flip-flop algorithm, which gives maximum likelihood estimates of <inline-formula><mml:math id="math-74" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>×</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math id="math-75" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mo>×</mml:mo><mml:mi>d</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> (<xref ref-type="bibr" rid="bib49">Lu &amp; Zimmerman, 2005</xref>). The flip-flop algorithm is suitable when each observation can be separated with respect to two factors, such as the time points and variables in the case of multivariate longitudinal data.</p>
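The alternating structure of the flip-flop estimator can be sketched as follows. This is our own simplified illustration (centered data, a fixed iteration count, no convergence check or scale-identifiability handling), not the implementation used in the study; the full estimator is described by Lu and Zimmerman (2005).

```python
import numpy as np

def flip_flop(Xs, n_iter=100):
    """Alternate closed-form updates for Sigma_{t x t} and Sigma_{d x d}
    in the Kronecker product model Sigma = Sigma_{t x t} (x) Sigma_{d x d}.
    Xs: centered observation matrices of shape (n, d, t)."""
    n, d, t = Xs.shape
    sigma_t = np.eye(t)                       # initial value for Sigma_{t x t}
    for _ in range(n_iter):
        inv_t = np.linalg.inv(sigma_t)
        # Update Sigma_{d x d} holding Sigma_{t x t} fixed ...
        sigma_d = sum(x @ inv_t @ x.T for x in Xs) / (n * t)
        inv_d = np.linalg.inv(sigma_d)
        # ... then Sigma_{t x t} holding Sigma_{d x d} fixed.
        sigma_t = sum(x.T @ inv_d @ x for x in Xs) / (n * d)
    return sigma_t, sigma_d

# Parameter counts for d = 4, t = 2: unstructured dt(dt+1)/2 = 36
# versus d(d+1)/2 + t(t+1)/2 = 13 under the Kronecker product structure.
```

Each update is the maximum likelihood estimate of one factor given the other, which is why iterating the two steps converges to the joint MLE under the Kronecker product model.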
<p id="S3.SS1.Px1.p3">The LDA classification rule states that a new observation <inline-formula><mml:math id="math-76" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">X</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>∈</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is assigned to class 0 if
<disp-formula id="e_1">
<mml:math id="x4" display="block"><mml:mtable columnalign="left"><mml:mtr><mml:mtd columnalign="right"><mml:msup><mml:mrow><mml:mfenced separators="" open="(" close=")"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="bold">X</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>−</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="bold-italic">μ</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="bold-italic">μ</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac></mml:mrow></mml:mfenced></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:msup><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mo>−</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="bold-italic">μ</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>−</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="bold-italic">μ</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>&gt;</mml:mo><mml:mo>log</mml:mo><mml:mfenced separators="" open="(" close=")"><mml:mrow><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi>π</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>π</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mrow></mml:mfenced></mml:mtd></mml:mtr></mml:mtable></mml:math>
</disp-formula></p>
	<p id="S3.SS1.Px1.p4">where <inline-formula><mml:math id="math-77" display="inline"><mml:msub><mml:mrow><mml:mi>π</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>i</mml:mi><mml:mo>∈</mml:mo><mml:mo stretchy="false">{</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">}</mml:mo><mml:mo>,</mml:mo></mml:math></inline-formula> is the prior probability of class <inline-formula><mml:math id="math-78" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="math-79" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold-italic">μ</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math id="math-80" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold-italic">μ</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> the respective group means, and <inline-formula><mml:math id="math-81" display="inline"><mml:msup><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mo>−</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> is the inverse covariance matrix (<xref ref-type="bibr" rid="bib48">Lix &amp; Sajobi, 2010</xref>). In the methods by <xref ref-type="bibr" rid="bib11">Brobbey (2021)</xref> and <xref ref-type="bibr" rid="bib12">Brobbey et al. 
(2022)</xref>, <inline-formula><mml:math id="math-82" display="inline"><mml:msup><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mo>−</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> is replaced by <inline-formula><mml:math id="math-83" display="inline"><mml:msubsup><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>×</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mo>−</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mo>⊗</mml:mo><mml:msubsup><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mo>×</mml:mo><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mo>−</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula>.</p>
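The classification rule translates directly into code. The helper below is our own sketch, not library code or the authors' implementation; for the robust Kronecker variant, <monospace>sigma_inv</monospace> would be built as <monospace>np.kron(inv(sigma_t), inv(sigma_d))</monospace>.

```python
import numpy as np

def lda_classify(x, mu0, mu1, sigma_inv, pi0=0.5, pi1=0.5):
    """Assign x to Group 0 iff
    (x - (mu0 + mu1)/2)^T Sigma^{-1} (mu0 - mu1) > log(pi1 / pi0)."""
    score = (x - (mu0 + mu1) / 2) @ sigma_inv @ (mu0 - mu1)
    return 0 if score > np.log(pi1 / pi0) else 1

# With identity covariance and equal priors, a point at mu0 is assigned
# to Group 0 and a point at mu1 to Group 1.
mu0, mu1 = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
assert lda_classify(mu0, mu0, mu1, np.eye(2)) == 0
assert lda_classify(mu1, mu0, mu1, np.eye(2)) == 1
```

Unequal priors shift the threshold log(π₁/π₀) away from zero, so the boundary moves toward the less prevalent group.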
<sec id="s3_1_1"><title>Robust Trimmed Likelihood LDA for Multivariate Repeated Measures Data</title>
<p id="S3.SS1.SSS1.Px1.p1">The rationale behind robust trimmed likelihood LDA for multivariate repeated measures data (<xref ref-type="bibr" rid="bib11">Brobbey, 2021</xref>) is to use more robust estimators of the sample mean and covariance matrix in order to increase the accuracy of LDA predictions in new data. It can also be used as a supporting analysis alongside the traditional LDA to show that the results are not severely affected by outliers.</p>
	<p id="S3.SS1.SSS1.Px1.p2">Many estimators of these sample statistics are particularly prone to outliers, which are hard to detect in multivariate data with <inline-formula><mml:math id="math-84" display="inline"><mml:mi>d</mml:mi><mml:mo>&gt;</mml:mo><mml:mn>2</mml:mn></mml:math></inline-formula> variables. A popular measure of robustness, the finite sample breakdown point by <xref ref-type="bibr" rid="bib21">Donoho (1982)</xref> and <xref ref-type="bibr" rid="bib22">Donoho and Huber (1983)</xref>, is the smallest number or fraction of extremely small or large values that must be added to the original sample to drive the statistic to an arbitrarily large value. While many estimators of multivariate location and scatter break down when adding <inline-formula><mml:math id="math-85" display="inline"><mml:mi>n</mml:mi><mml:mo>/</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mi>d</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> outliers (<xref ref-type="bibr" rid="bib21">Donoho, 1982</xref>), estimators based on the Minimum Volume Ellipsoid (MVE) and Minimum Covariance Determinant (MCD) algorithms (<xref ref-type="bibr" rid="bib61">Rousseeuw, 1985</xref>) have a substantially higher breakdown point of <inline-formula><mml:math id="math-86" display="inline"><mml:mo stretchy="false">(</mml:mo><mml:mo>⌊</mml:mo><mml:mi>n</mml:mi><mml:mo>/</mml:mo><mml:mn>2</mml:mn><mml:mo>⌋</mml:mo><mml:mo>−</mml:mo><mml:mi>d</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo>/</mml:mo><mml:mi>n</mml:mi></mml:math></inline-formula> (<xref ref-type="bibr" rid="bib62">Rousseeuw &amp; Driessen, 1999</xref>; <xref ref-type="bibr" rid="bib83">Woodruff &amp; Rocke, 1993</xref>). 
The high-breakdown linear discriminant analysis (<xref ref-type="bibr" rid="bib34">Hawkins &amp; McLachlan, 1997</xref>) for cross-sectional data, for example, is based on the MCD algorithm and has already been implemented in the <monospace>R</monospace> package <monospace>rrcov</monospace> (<xref ref-type="bibr" rid="bib74">Todorov, 2022</xref>).</p>
	<p id="S3.SS1.SSS1.Px1.p3">The MCD is statistically more efficient than the MVE algorithm because it is asymptotically normal (<xref ref-type="bibr" rid="bib13">Butler et al., 1993</xref>), and its distances are more precise, i.e., it is better at detecting outliers (<xref ref-type="bibr" rid="bib62">Rousseeuw &amp; Driessen, 1999</xref>). The MCD algorithm takes subsets of size <inline-formula><mml:math id="math-87" display="inline"><mml:mo stretchy="false">(</mml:mo><mml:mi>n</mml:mi><mml:mo>+</mml:mo><mml:mi>d</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo>/</mml:mo><mml:mn>2</mml:mn><mml:mo>≤</mml:mo><mml:mi>h</mml:mi><mml:mo>≤</mml:mo><mml:mi>n</mml:mi></mml:math></inline-formula> of the dataset (for <inline-formula><mml:math id="math-88" display="inline"><mml:mi>h</mml:mi><mml:mo>&gt;</mml:mo><mml:mi>d</mml:mi></mml:math></inline-formula>) and determines the particular subset of <inline-formula><mml:math id="math-89" display="inline"><mml:mi>h</mml:mi></mml:math></inline-formula> observations out of the <inline-formula><mml:math id="math-90" display="inline"><mml:mfenced separators="" open="(" close=")"><mml:mfrac linethickness="0.0pt"><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:mfrac></mml:mfenced></mml:math></inline-formula> possible subsets for which the determinant of the sample covariance <inline-formula><mml:math id="math-91" display="inline"><mml:mover accent="false"><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mo>̂</mml:mo></mml:mover></mml:math></inline-formula> becomes minimal. The MVE algorithm chooses the subset of <inline-formula><mml:math id="math-92" display="inline"><mml:mi>h</mml:mi></mml:math></inline-formula> observations for which the volume of the ellipsoid containing all <inline-formula><mml:math id="math-93" display="inline"><mml:mi>h</mml:mi></mml:math></inline-formula> data points is minimal.</p>
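To make the subset-search idea concrete, here is a deliberately naive exhaustive MCD. This is our own illustration of the criterion only: practical implementations, such as FAST-MCD (Rousseeuw &amp; Driessen, 1999) as used in the R package rrcov, avoid enumerating all subsets and use iterative concentration steps instead.

```python
from itertools import combinations
import numpy as np

def mcd_exhaustive(X, h):
    """Return the indices of the h-subset of rows of X whose sample
    covariance has the smallest determinant (feasible only for tiny n)."""
    best_idx, best_det = None, np.inf
    for idx in combinations(range(len(X)), h):
        det = np.linalg.det(np.cov(X[list(idx)], rowvar=False))
        if det < best_det:
            best_idx, best_det = idx, det
    return list(best_idx)

# A gross outlier inflates the covariance determinant of any subset that
# contains it, so it is excluded from the minimizing subset.
rng = np.random.default_rng(2)
X = rng.normal(size=(10, 2))
X[3] = [100.0, 100.0]          # plant one outlier at index 3
assert 3 not in mcd_exhaustive(X, h=9)
```

Location and scatter are then estimated from the retained h observations, which is what makes the resulting LDA inputs robust.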
<p id="S3.SS1.SSS1.Px1.p4"><xref ref-type="bibr" rid="bib11">Brobbey (2021)</xref> suggests estimating the class means <inline-formula><mml:math id="math-94" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold-italic">μ</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math id="math-95" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold-italic">μ</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> as well as the common covariance matrix <inline-formula><mml:math id="math-96" display="inline"><mml:mi mathvariant="bold">Σ</mml:mi></mml:math></inline-formula> from the reduced dataset obtained after applying the MCD or MVE algorithm, respectively. She furthermore suggests estimating a Kronecker product structure for the covariance matrix, since it is more parsimonious than the unstructured equivalent, which may not be estimable for small sample sizes. We apply both versions: once we estimate the unstructured pooled covariance matrix
<disp-formula id="e_2">
	<mml:math id="x1" display="block"><mml:mtable columnalign="left"><mml:mtr><mml:mtd columnalign="right"><mml:mstyle mathvariant="bold-italic"><mml:mover accent="false"><mml:mrow><mml:mi mathvariant="normal">Σ</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mstyle><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>−</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mover accent="false"><mml:mrow><mml:mi mathvariant="normal">Σ</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mstyle></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>−</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mover accent="false"><mml:mrow><mml:mi mathvariant="normal">Σ</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mstyle></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>−</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo>+</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>−</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math>
</disp-formula>
</p>
	<p id="S3.SS1.SSS1.Px1.p5">and once the Kronecker product covariance <inline-formula><mml:math id="math-97" display="inline"><mml:mstyle mathvariant="bold-italic"><mml:mover accent="false"><mml:mrow><mml:mi mathvariant="normal">Σ</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mstyle><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mover accent="false"><mml:mrow><mml:mi mathvariant="normal">Σ</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>×</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>⊗</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mover accent="false"><mml:mrow><mml:mi mathvariant="normal">Σ</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mstyle></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mo>×</mml:mo><mml:mi>d</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, where <inline-formula><mml:math id="math-98" display="inline"><mml:msub><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mover accent="false"><mml:mrow><mml:mi mathvariant="normal">Σ</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>×</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math id="math-99" display="inline"><mml:msub><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mover accent="false"><mml:mrow><mml:mi mathvariant="normal">Σ</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mstyle></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mo>×</mml:mo><mml:mi>d</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> are the pooled covariances between the <inline-formula><mml:math id="math-100" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula> time points and <inline-formula><mml:math id="math-101" display="inline"><mml:mi>d</mml:mi></mml:math></inline-formula> variables, respectively. 
The flip-flop algorithm (<xref ref-type="bibr" rid="bib49">Lu &amp; Zimmerman, 2005</xref>) is used to estimate <inline-formula><mml:math id="math-102" display="inline"><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mover accent="false"><mml:mrow><mml:mi mathvariant="normal">Σ</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>×</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula>
		<mml:math id="math-103" display="inline"><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold-italic"><mml:mover accent="false"><mml:mrow><mml:mi mathvariant="normal">Σ</mml:mi></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mstyle></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mo>×</mml:mo><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mi>i</mml:mi><mml:mo>∈</mml:mo><mml:mo stretchy="false">{</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">}</mml:mo></mml:math></inline-formula> from the data.</p></sec>
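A minimal numpy sketch of the flip-flop iteration for a Kronecker product covariance, assuming complete data arranged as n matrices of dimension t × d (the function name and initialization are our illustrative choices; note that the two Kronecker factors are only identified up to a scalar):

```python
import numpy as np

def flip_flop(X, tol=1e-4, max_iter=100):
    """Alternately update the t x t (time) and d x d (variable) factors
    of a Kronecker product covariance until the Frobenius norm of the
    change in kron(S_t, S_d) falls below tol.
    X has shape (n, t, d): n subjects, t time points, d variables."""
    n, t, d = X.shape
    Xc = X - X.mean(axis=0)             # center each time/variable cell
    S_t, S_d = np.eye(t), np.eye(d)     # simple starting values
    kron_old = np.kron(S_t, S_d)
    for _ in range(max_iter):
        S_d_inv = np.linalg.inv(S_d)
        S_t = sum(Xi @ S_d_inv @ Xi.T for Xi in Xc) / (n * d)
        S_t_inv = np.linalg.inv(S_t)
        S_d = sum(Xi.T @ S_t_inv @ Xi for Xi in Xc) / (n * t)
        kron_new = np.kron(S_t, S_d)
        if np.linalg.norm(kron_new - kron_old, "fro") <= tol:
            break
        kron_old = kron_new
    return S_t, S_d

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3, 2))         # 40 subjects, t = 3, d = 2
S_t, S_d = flip_flop(X)
Sigma = np.kron(S_t, S_d)               # (t*d) x (t*d) covariance
```

The stopping rule mirrors the Frobenius-norm criterion described later for the LDA(Σ_KP) algorithm.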
<sec id="s3_1_2"><title>Generalized Estimating Equations (GEE) Discriminant Analysis for Repeated Measures Data</title>
	<p id="S1.SS2.Px1.p1">Joint Generalized Estimating Equations (GEEs) are another way to derive more robust estimates of the sample means and covariance matrix from multivariate longitudinal data (<xref ref-type="bibr" rid="bib12">Brobbey et al., 2022</xref>; <xref ref-type="bibr" rid="bib38">Inan, 2015</xref>). GEEs provide population-level parameter estimates, which are consistent and asymptotically normally distributed even if the working correlation structure of the outcome variables is misspecified. The covariance matrix is estimated by a robust sandwich estimator (<xref ref-type="bibr" rid="bib32">Hardin &amp; Hilbe, 2013</xref>). <xref ref-type="bibr" rid="bib12">Brobbey et al. (2022)</xref> proposed the use of GEEs for multivariate repeated measures data in the context of repeated measures LDA as implemented by <xref ref-type="bibr" rid="bib38">Inan (2015)</xref>. The population-level estimates (<inline-formula><mml:math id="math-104" display="inline"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="bold-italic">μ</mml:mi></mml:mrow><mml:mo>̂</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi mathvariant="bold-italic">μ</mml:mi></mml:mrow><mml:mo>̂</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mover accent="false"><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mo>̂</mml:mo></mml:mover></mml:math></inline-formula>) of the GEE model are plugged into the repeated measures LDA classification rule. 
For parsimony, the joint GEE model by <xref ref-type="bibr" rid="bib38">Inan (2015)</xref> uses a decomposition of the working correlation matrix into a <inline-formula><mml:math id="math-105" display="inline"><mml:mi>t</mml:mi><mml:mo>×</mml:mo><mml:mi>t</mml:mi></mml:math></inline-formula> within- and a <inline-formula><mml:math id="math-106" display="inline"><mml:mi>d</mml:mi><mml:mo>×</mml:mo><mml:mi>d</mml:mi></mml:math></inline-formula> between-multivariate response correlation matrix through the Kronecker product. We fitted the joint GEE model by <xref ref-type="bibr" rid="bib38">Inan (2015)</xref> to the data of each group <inline-formula><mml:math id="math-107" display="inline"><mml:mi>i</mml:mi><mml:mo>∈</mml:mo><mml:mo stretchy="false">{</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">}</mml:mo></mml:math></inline-formula> to obtain the class-specific means and covariance matrix estimates, which we subsequently pooled to obtain the common covariance matrix of the entire dataset. Further details on the approach are given in Supplementary Material S.1 (see <xref ref-type="bibr" rid="bib31.5">Graf et al., 2025)</xref>.</p></sec></sec>
<sec id="s3_2"><title>Longitudinal Support Vector Machine</title>
	<p id="S3.SS2.Px1.p1">The original linear SVM for cross-sectional data and linearly separable classes (<xref ref-type="bibr" rid="bib77">Vapnik, 1982</xref>) has been modified to allow some overlap between the samples of the two classes (<xref ref-type="bibr" rid="bib19">Cortes &amp; Vapnik, 1995</xref>). <xref ref-type="bibr" rid="bib17">Chen and Bowman (2011)</xref> further generalized this SVM classifier so that it is applicable to longitudinal data. In their longitudinal SVM algorithm, temporal changes are modeled by considering a linear combination of the observations <inline-formula><mml:math id="math-108" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">X</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="bold">x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>∈</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> and a parameter vector <inline-formula><mml:math id="math-109" display="inline"><mml:mi mathvariant="bold-italic">β</mml:mi><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>β</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>…</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>β</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>−</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>, which represents the coefficients for each time point <inline-formula><mml:math id="math-110" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>. 
Then the combined observations <inline-formula><mml:math id="math-111" display="inline"><mml:msub><mml:mrow><mml:mover accent="false"><mml:mrow><mml:mi mathvariant="bold">x</mml:mi></mml:mrow><mml:mo>~</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="bold">x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mi>β</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi mathvariant="bold">x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mo>⋯</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mi>β</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>−</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi mathvariant="bold">x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> are provided as input to the traditional SVM. Combining the <inline-formula><mml:math id="math-112" display="inline"><mml:mi>d</mml:mi></mml:math></inline-formula> observations from all <inline-formula><mml:math id="math-113" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula> time points in a single vector assumes equally spaced time points. The approach also assumes a fixed number of <inline-formula><mml:math id="math-114" display="inline"><mml:mi>d</mml:mi></mml:math></inline-formula> observations per time point <inline-formula><mml:math id="math-115" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula> (complete data), just as LDA does. 
Although this SVM classifier can also estimate nonlinear decision boundaries, depending on the type of kernel matrix that is used, we apply a linear kernel: this allows a direct comparison with the other linear classifiers, and for a linear kernel the absolute values of the weight vector can be interpreted as variable importances.</p>
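The temporal weighting step can be illustrated as follows (an illustrative sketch, not Chen and Bowman's implementation); the collapsed vectors can then be fed to any standard linear soft-margin SVM:

```python
import numpy as np

def collapse_time(X, beta_free):
    """Collapse repeated measures into one vector per subject:
    x_tilde_j = x_j1 + beta_1 * x_j2 + ... + beta_{t-1} * x_jt.
    X has shape (n, t, d); beta_free holds the t - 1 free
    coefficients (the first time point's coefficient is fixed at 1)."""
    beta = np.concatenate(([1.0], np.asarray(beta_free, dtype=float)))
    # weighted sum over the time axis -> shape (n, d)
    return np.einsum("t,ntd->nd", beta, X)

# two subjects, t = 3 time points, d = 2 variables
X = np.arange(12, dtype=float).reshape(2, 3, 2)
x_tilde = collapse_time(X, beta_free=[0.5, 0.25])
# x_tilde has shape (2, 2): one d-dimensional vector per subject
```

In the full algorithm, the SVM parameters and the temporal coefficients β are optimized alternately rather than β being fixed in advance.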
<p id="S2.SS2.Px1.p2">Although the SVM algorithm does not make any distributional assumptions, the regularization parameter <italic>C</italic> needs to be optimized. We use the SSVMP algorithm (<xref ref-type="bibr" rid="bib66">Sentelle et al., 2016</xref>), a modification of the SVMpath algorithm (<xref ref-type="bibr" rid="bib33">Hastie et al., 2004</xref>), to find the optimal value of <italic>C</italic>. In contrast to the original version by <xref ref-type="bibr" rid="bib33">Hastie et al. (2004)</xref>, the SSVMP algorithm is applicable to unequal class sizes and semidefinite kernel matrices. The path algorithm finds the optimal value <inline-formula><mml:math id="math-116" display="inline"><mml:msup><mml:mrow><mml:mi>λ</mml:mi></mml:mrow><mml:mrow><mml:mtext>SVM</mml:mtext></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mi>C</mml:mi></mml:math></inline-formula> with high accuracy, since it considers all possible values of <italic>C</italic>, while remaining computationally efficient compared to the generally recommended grid search. It has been shown that the choice of <italic>C</italic> can be critical for the generalizability of the SVM model (<xref ref-type="bibr" rid="bib33">Hastie et al., 2004</xref>).</p>
<p id="S2.SS2.Px1.p3">The SSVMP algorithm (<xref ref-type="bibr" rid="bib65">Sentelle, 2015</xref>; <xref ref-type="bibr" rid="bib66">Sentelle et al., 2016</xref>) optimizes the inverse of the regularization parameter, <inline-formula><mml:math id="math-117" display="inline"><mml:msup><mml:mrow><mml:mi>λ</mml:mi></mml:mrow><mml:mrow><mml:mtext>SVM</mml:mtext></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>/</mml:mo><mml:mi>C</mml:mi></mml:math></inline-formula>. Starting with a high value of <inline-formula><mml:math id="math-118" display="inline"><mml:msup><mml:mrow><mml:mi>λ</mml:mi></mml:mrow><mml:mrow><mml:mtext>SVM</mml:mtext></mml:mrow></mml:msup></mml:math></inline-formula> such that all samples lie within the margin of the SVM, it successively determines a strictly decreasing sequence of <inline-formula><mml:math id="math-119" display="inline"><mml:msup><mml:mrow><mml:mi>λ</mml:mi></mml:mrow><mml:mrow><mml:mtext>SVM</mml:mtext></mml:mrow></mml:msup></mml:math></inline-formula> values for which the set of support vectors changes for each <inline-formula><mml:math id="math-120" display="inline"><mml:msup><mml:mrow><mml:mi>λ</mml:mi></mml:mrow><mml:mrow><mml:mtext>SVM</mml:mtext></mml:mrow></mml:msup></mml:math></inline-formula> value, and it stops if no more observations are left inside of the margin (linearly separable case) or if the next <inline-formula><mml:math id="math-121" display="inline"><mml:msup><mml:mrow><mml:mi>λ</mml:mi></mml:mrow><mml:mrow><mml:mtext>SVM</mml:mtext></mml:mrow></mml:msup></mml:math></inline-formula> value would be zero.</p>
	<p id="S2.SS2.Px1.p4">The longitudinal SVM algorithm by <xref ref-type="bibr" rid="bib17">Chen and Bowman (2011)</xref> requires specifying a maximum number of iterations used for finding the optimal separating hyperplane parameters. In our case, the iterative algorithm for optimization of the Lagrange multipliers <inline-formula><mml:math id="math-122" display="inline"><mml:mi mathvariant="bold-italic">α</mml:mi></mml:math></inline-formula> and temporal change parameters <inline-formula><mml:math id="math-123" display="inline"><mml:mi mathvariant="bold-italic">β</mml:mi></mml:math></inline-formula> in the longitudinal SVM is repeated until the Euclidean distance between two consecutive estimates of <inline-formula><mml:math id="math-124" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold-italic">α</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> becomes less than <inline-formula><mml:math id="math-125" display="inline"><mml:mn>1</mml:mn><mml:mi>E</mml:mi><mml:mo>−</mml:mo><mml:mn>08</mml:mn></mml:math></inline-formula> or the maximum number of 100 iterative steps is reached. A summary of the longitudinal SVM algorithm using the linear soft-margin approach can be found in Supplementary Material S.2 (see <xref ref-type="bibr" rid="bib31.5">Graf et al., 2025</xref>).</p>
<sec id="s3_2_1"><title>Nonparametric Bootstrap Approach</title>
	<p id="S2.SS2.SSS1.Px1.p1">The nonparametric bootstrap approach for point estimates by <xref ref-type="bibr" rid="bib79">Wahl et al. (2016)</xref> is an extension of the algorithm by <xref ref-type="bibr" rid="bib41">Jiang et al. (2008)</xref> and is based on the .632+ bootstrap method (<xref ref-type="bibr" rid="bib23">Efron &amp; Tibshirani, 1997</xref>), and thus assumes independence of observations. It yields the <inline-formula><mml:math id="math-126" display="inline"><mml:mo>.</mml:mo><mml:mn>632</mml:mn><mml:mo>+</mml:mo></mml:math></inline-formula> bootstrap estimate (<inline-formula><mml:math id="math-127" display="inline"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>𝜃</mml:mi></mml:mrow><mml:mo>̂</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>.</mml:mo><mml:mn>632</mml:mn><mml:mo>+</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula>) of the respective performance measure together with a 95% confidence interval.</p>
<p id="S2.SS2.SSS1.Px1.p2">The .632+ bootstrap estimate is computed as a weighted average of the apparent performance <inline-formula><mml:math id="math-128" display="inline"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>𝜃</mml:mi></mml:mrow><mml:mo>̂</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi><mml:mi>g</mml:mi><mml:mo>,</mml:mo><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> (training and test data given by the original dataset) and the average “out-of-bag” (OOB) performance <inline-formula><mml:math id="math-129" display="inline"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>𝜃</mml:mi></mml:mrow><mml:mo>̂</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>b</mml:mi><mml:mi>o</mml:mi><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>p</mml:mi><mml:mo>,</mml:mo><mml:mi>O</mml:mi><mml:mi>O</mml:mi><mml:mi>B</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:mfrac><mml:munderover><mml:mrow><mml:mo>∑</mml:mo></mml:mrow><mml:mrow><mml:mi>b</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:munderover><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>𝜃</mml:mi></mml:mrow><mml:mo>̂</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mi>b</mml:mi><mml:mi>o</mml:mi><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>p</mml:mi><mml:mo>,</mml:mo><mml:mi>O</mml:mi><mml:mi>O</mml:mi><mml:mi>B</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> computed from <italic>B</italic> bootstrap datasets (training data given by the bootstrap dataset, and test data given by the samples not present in the bootstrap dataset). The formula is:
<disp-formula id="e_3">
<mml:math id="x2" display="block"><mml:mtable columnalign="left"><mml:mtr><mml:mtd columnalign="right"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>𝜃</mml:mi></mml:mrow><mml:mo>̂</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>.</mml:mo><mml:mn>632</mml:mn><mml:mo>+</mml:mo></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>−</mml:mo><mml:mi>w</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>⋅</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>𝜃</mml:mi></mml:mrow><mml:mo>̂</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi><mml:mi>g</mml:mi><mml:mo>,</mml:mo><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:mi>w</mml:mi><mml:mo>⋅</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>𝜃</mml:mi></mml:mrow><mml:mo>̂</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>b</mml:mi><mml:mi>o</mml:mi><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>p</mml:mi><mml:mo>,</mml:mo><mml:mi>O</mml:mi><mml:mi>O</mml:mi><mml:mi>B</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
</disp-formula>
</p>
<p id="S2.SS2.SSS1.Px1.p3">where <inline-formula><mml:math id="math-130" display="inline"><mml:mi>w</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>0.632</mml:mn></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>−</mml:mo><mml:mn>0.368</mml:mn><mml:mo>⋅</mml:mo><mml:mtext>r</mml:mtext></mml:mrow></mml:mfrac></mml:math></inline-formula> and <inline-formula><mml:math id="math-131" display="inline"><mml:mi>r</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>𝜃</mml:mi></mml:mrow><mml:mo>̂</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>b</mml:mi><mml:mi>o</mml:mi><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>p</mml:mi><mml:mo>,</mml:mo><mml:mi>O</mml:mi><mml:mi>O</mml:mi><mml:mi>B</mml:mi></mml:mrow></mml:msup><mml:mo>−</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>𝜃</mml:mi></mml:mrow><mml:mo>̂</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi><mml:mi>g</mml:mi><mml:mo>,</mml:mo><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mi>𝜃</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mi>o</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>f</mml:mi><mml:mi>o</mml:mi></mml:mrow></mml:msup><mml:mo>−</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>𝜃</mml:mi></mml:mrow><mml:mo>̂</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi><mml:mi>g</mml:mi><mml:mo>,</mml:mo><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:math></inline-formula>. 
The value of <inline-formula><mml:math id="math-132" display="inline"><mml:msup><mml:mrow><mml:mi>𝜃</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mi>o</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>f</mml:mi><mml:mi>o</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is 0.5 for predictive accuracy, sensitivity, and specificity. For the Youden index, this value is 0.</p>
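Given the apparent performance, the average OOB performance, and the no-information value, the point estimate follows directly. A minimal sketch (the function name is ours; clipping r to [0, 1], which is common practice, keeps the weight w within [0.632, 1]):

```python
def bootstrap_632_plus(theta_orig, theta_oob, theta_noinfo=0.5):
    """Weighted average of apparent and out-of-bag performance
    following the .632+ rule of Efron & Tibshirani (1997)."""
    # relative overfitting rate; clipped to [0, 1] so that the
    # weight w stays between 0.632 and 1
    r = (theta_oob - theta_orig) / (theta_noinfo - theta_orig)
    r = min(max(r, 0.0), 1.0)
    w = 0.632 / (1.0 - 0.368 * r)
    return (1.0 - w) * theta_orig + w * theta_oob

# apparent accuracy 0.90, mean OOB accuracy 0.80, no-info value 0.5
est = bootstrap_632_plus(0.90, 0.80)
```

The stronger the apparent overfitting (larger r), the more weight is shifted from the apparent performance toward the OOB performance.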
<p id="S2.SS2.SSS1.Px1.p4">Then each bootstrap dataset is assigned a weight <inline-formula><mml:math id="math-133" display="inline"><mml:msub><mml:mrow><mml:mi>w</mml:mi></mml:mrow><mml:mrow><mml:mi>b</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>𝜃</mml:mi></mml:mrow><mml:mo>̂</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mi>b</mml:mi><mml:mi>o</mml:mi><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>p</mml:mi><mml:mo>,</mml:mo><mml:mi>b</mml:mi><mml:mi>o</mml:mi><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msubsup><mml:mo>−</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>𝜃</mml:mi></mml:mrow><mml:mo>̂</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi><mml:mi>g</mml:mi><mml:mo>,</mml:mo><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, where <inline-formula><mml:math id="math-134" display="inline"><mml:msubsup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>𝜃</mml:mi></mml:mrow><mml:mo>̂</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>b</mml:mi></mml:mrow><mml:mrow><mml:mi>b</mml:mi><mml:mi>o</mml:mi><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>p</mml:mi><mml:mo>,</mml:mo><mml:mi>b</mml:mi><mml:mi>o</mml:mi><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> is the value of the performance measure, when the bootstrap dataset <inline-formula><mml:math id="math-135" display="inline"><mml:mi>b</mml:mi><mml:mo>∈</mml:mo><mml:mo 
stretchy="false">{</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mo>⋯</mml:mo><mml:mspace width="0.3em"/><mml:mo>,</mml:mo><mml:mi>B</mml:mi><mml:mo stretchy="false">}</mml:mo></mml:math></inline-formula> is used as training as well as test dataset. The <inline-formula><mml:math id="math-136" display="inline"><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:mi>α</mml:mi></mml:mrow><mml:mrow><mml:mo>∗</mml:mo></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac></mml:math></inline-formula> and <inline-formula><mml:math id="math-137" display="inline"><mml:mn>1</mml:mn><mml:mo>−</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:mi>α</mml:mi></mml:mrow><mml:mrow><mml:mo>∗</mml:mo></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac></mml:math></inline-formula> percentiles of the empirical distribution of these weights, <inline-formula><mml:math id="math-138" display="inline"><mml:msub><mml:mrow><mml:mi>ξ</mml:mi></mml:mrow><mml:mrow><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:mi>α</mml:mi></mml:mrow><mml:mrow><mml:mo>∗</mml:mo></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math id="math-139" display="inline"><mml:msub><mml:mrow><mml:mi>ξ</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>−</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:mi>α</mml:mi></mml:mrow><mml:mrow><mml:mo>∗</mml:mo></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac></mml:mrow></mml:msub></mml:math></inline-formula>, give the confidence interval of <inline-formula><mml:math id="math-140" display="inline"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>𝜃</mml:mi></mml:mrow><mml:mo>̂</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>.</mml:mo><mml:mn>632</mml:mn><mml:mo>+</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula>:
<disp-formula id="e_4">
<mml:math id="x3" display="block"><mml:mtable columnalign="left"><mml:mtr><mml:mtd columnalign="right"><mml:mo stretchy="false">[</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>𝜃</mml:mi></mml:mrow><mml:mo>̂</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>.</mml:mo><mml:mn>632</mml:mn><mml:mo>+</mml:mo></mml:mrow></mml:msup><mml:mo>−</mml:mo><mml:msub><mml:mrow><mml:mi>ξ</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>−</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:mi>α</mml:mi></mml:mrow><mml:mrow><mml:mo>∗</mml:mo></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>𝜃</mml:mi></mml:mrow><mml:mo>̂</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>.</mml:mo><mml:mn>632</mml:mn><mml:mo>+</mml:mo></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mi>ξ</mml:mi></mml:mrow><mml:mrow><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:mi>α</mml:mi></mml:mrow><mml:mrow><mml:mo>∗</mml:mo></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:mfrac></mml:mrow></mml:msub><mml:mo stretchy="false">]</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
</disp-formula>
</p></sec>
<sec id="s3_3"><title>Performance Measures</title>
<p id="S3.SS3.Px1.p1">In order to compare class prediction of the classification algorithms in the independent test data, we used predictive accuracy, the Youden index, sensitivity, and specificity as measures of discrimination. Predictive accuracy is the number of correctly classified samples divided by the total number of samples. Sensitivity, or true positive rate, is the proportion of individuals truly belonging to class 1 that are correctly predicted to belong to class 1. Specificity, or true negative rate, is the proportion of individuals truly belonging to class 0 that are correctly predicted to belong to class 0. The Youden index (<xref ref-type="bibr" rid="bib84">Youden, 1950</xref>) combines sensitivity and specificity of the classification model into a single measure (Youden index = |Sensitivity + Specificity - 1|).</p>
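These four measures can be computed from true and predicted binary labels as follows (a plain illustration; the function and variable names are ours):

```python
def discrimination_measures(y_true, y_pred):
    """Accuracy, sensitivity (TPR), specificity (TNR), and Youden
    index for binary class labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    n1 = sum(1 for t in y_true if t == 1)   # truly class 1
    n0 = len(y_true) - n1                   # truly class 0
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / n1
    specificity = tn / n0
    youden = abs(sensitivity + specificity - 1)
    return accuracy, sensitivity, specificity, youden

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]   # e.g., output of a classifier
acc, sens, spec, youden = discrimination_measures(y_true, y_pred)
```

For a degenerate classifier that always predicts the larger class, sensitivity is 0 and specificity is 1 (or vice versa), so the Youden index takes its minimum of zero even though accuracy may be high.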
<p id="S3.SS3.Px1.p2">Conclusions based on these measures can differ considerably. The predictive accuracy of an algorithm may be high in data with highly unbalanced classes if the label of the larger class is predicted for all samples; in this case, the Youden index takes its minimum value of zero. It is therefore reasonable to consider both measures, predictive accuracy and the Youden index.</p></sec></sec>
<sec id="s3_4"><title>Simulation Study Approach and Software</title>
	<p id="S3.SS4.Px1.p1">Our simulation study aims to mimic reference datasets from psychological applications. See the <xref ref-type="sec" rid="s2_2">Reference Datasets</xref> sub-section for a detailed description of these datasets. A brief overview of the steps in the simulation study is given in <xref ref-type="fig" rid="fig-3">Figure 3</xref>. For each scenario, 2000 datasets are simulated. Sample sizes for the training data are chosen identical to the sample sizes of the reference datasets. Sample sizes for the test data are, for each group, 10 times the respective original group sample size in order to maintain the group size ratio. A larger test sample size can be chosen in simulations because the test data do not rely on actual observations; this decreases the variance of the performance estimates. Data are simulated from the multivariate normal distribution (as a reference), from the multivariate truncated normal distribution, which only takes on values within specified boundaries, similar to the sum or mean scores in the reference data, and from the multivariate lognormal distribution in order to include an extremely skewed distribution (overview in <xref ref-type="table" rid="tab_3">Table 3</xref>). 
Parameters needed for data simulations are estimated from the reference datasets (i.e., the pooled covariance matrix <inline-formula><mml:math id="math-141" display="inline"><mml:mi mathvariant="bold">Σ</mml:mi></mml:math></inline-formula>, or the group covariance matrices <inline-formula><mml:math id="math-142" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula><mml:math id="math-143" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>, respectively, group means <inline-formula><mml:math id="math-144" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold-italic">μ</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>, and <inline-formula><mml:math id="math-145" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold-italic">μ</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>, and the lower and upper boundaries, <bold>a</bold> and <bold>b</bold>, of the sum or mean score, respectively). Training data are either not trimmed or trimmed using the MCD and the MVE algorithm, respectively, keeping 90% of the samples, before applying the classification algorithms. In contrast to <xref ref-type="bibr" rid="bib11">Brobbey (2021)</xref> and <xref ref-type="bibr" rid="bib12">Brobbey et al. (2022)</xref>, we did not use the restrictive assumption of a Kronecker product covariance structure for simulating the data. In contrast to <xref ref-type="bibr" rid="bib17">Chen and Bowman (2011)</xref>, the datasets to which we applied the method are not balanced in sample size. We would like to examine the methods’ performance in more general simulation settings.</p>
<p id="S3.SS4.Px1.p2">Since the SVM algorithm relies on the Euclidean distance to determine the optimal decision boundary, standardization is required as a data-preprocessing step. We standardized the data variable-wise (across time points) before applying the method. Centering and scaling are done using the <monospace>preProcess</monospace> function in the <monospace>R</monospace> package <monospace>caret</monospace> (<xref ref-type="bibr" rid="bib44">Kuhn et al., 2024</xref>). More specifically, each training dataset is centered and scaled to unit variance, and the same parameters are then used to standardize the corresponding test dataset in the same way (<xref ref-type="bibr" rid="bib35">Hsu et al., 2003</xref>). Machine-learning algorithms generally require the optimization of hyperparameters. Applying the linear SVM algorithm requires finding the optimal value of the hyperparameter <italic>C</italic>, which determines the maximum amount of overlap allowed between samples of both classes. We applied the simple SVM path (SSVMP) algorithm by <xref ref-type="bibr" rid="bib66">Sentelle et al. (2016)</xref>, as suggested by <xref ref-type="bibr" rid="bib17">Chen and Bowman (2011)</xref>, to determine the optimal regularization parameter <italic>C</italic>. It is available as <monospace>MATLAB</monospace> code (<xref ref-type="bibr" rid="bib65">Sentelle, 2015</xref>), which we rewrote in <monospace>R</monospace>. The longitudinal SVM results, including the optimal <italic>C</italic>, could only be computed for the two smaller datasets (Dataset 1 and Dataset 3) due to the method’s computational complexity.</p>
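<p>The train-then-apply standardization step can be illustrated in a few lines. This is a minimal sketch of the centering and scaling logic (the behavior of <monospace>caret</monospace>’s <monospace>preProcess</monospace> followed by <monospace>predict</monospace>), not the actual implementation used; the data are toy placeholders.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 8))  # toy training data
X_test = rng.normal(loc=5.0, scale=2.0, size=(50, 8))    # toy test data

# Center and scale each column using *training* statistics only,
# then reuse those same parameters for the test data.
mu = X_train.mean(axis=0)
sd = X_train.std(axis=0, ddof=1)
X_train_std = (X_train - mu) / sd
X_test_std = (X_test - mu) / sd
```

Reusing the training means and standard deviations for the test set avoids information leaking from the test data into the preprocessing step.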
	<p id="S3.SS4.Px1.p3">The flip-flop algorithm (<xref ref-type="bibr" rid="bib49">Lu &amp; Zimmerman, 2005</xref>) used by <xref ref-type="bibr" rid="bib11">Brobbey (2021)</xref> for estimating the Kronecker product structure of the covariance matrix from the training data (for the LDA(<inline-formula><mml:math id="math-146" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>KP</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>) algorithm) was iterated until the Frobenius norm of the difference between two consecutive Kronecker product covariance matrices became less than or equal to <inline-formula><mml:math id="math-147" display="inline"><mml:mn>1</mml:mn><mml:mi>E</mml:mi><mml:mo>−</mml:mo><mml:mn>04</mml:mn></mml:math></inline-formula>, a stopping criterion proposed by <xref ref-type="bibr" rid="bib16">Castañeda Garcia and Nossek (2014)</xref>.</p>
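<p>A compact sketch of the flip-flop iteration with this stopping criterion might look as follows. The function below is an illustrative reimplementation assuming centered <italic>t</italic>-by-<italic>d</italic> data matrices, not the exact code used in the study; note that the Kronecker factors are only identified up to a scale factor.</p>

```python
import numpy as np

def flip_flop(samples, tol=1e-4, max_iter=100):
    """Estimate the factors of a Kronecker product covariance from a list of
    centered t-by-d data matrices by alternating between the two factors
    (a sketch of the algorithm described by Lu & Zimmerman, 2005)."""
    n = len(samples)
    t, d = samples[0].shape
    Sigma_t, Sigma_d = np.eye(t), np.eye(d)
    prev = np.kron(Sigma_t, Sigma_d)
    for _ in range(max_iter):
        # Update each factor given the current estimate of the other.
        inv_t = np.linalg.inv(Sigma_t)
        Sigma_d = sum(X.T @ inv_t @ X for X in samples) / (n * t)
        inv_d = np.linalg.inv(Sigma_d)
        Sigma_t = sum(X @ inv_d @ X.T for X in samples) / (n * d)
        cur = np.kron(Sigma_t, Sigma_d)
        # Stop once consecutive Kronecker products differ by at most tol
        # in Frobenius norm.
        if np.linalg.norm(cur - prev, "fro") <= tol:
            break
        prev = cur
    return Sigma_t, Sigma_d
```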
	<p id="S3.SS4.Px1.p4">We used the following software for the data simulations. We implemented the longitudinal SVM in <monospace>R</monospace> using the <monospace>R</monospace> package <monospace>Rcplex</monospace> (<xref ref-type="bibr" rid="bib10">Bravo et al., 2021</xref>). We used the implementations of the MVE and MCD algorithms from the <monospace>R</monospace> package <monospace>MASS</monospace> (<xref ref-type="bibr" rid="bib59">Ripley et al., 2022</xref>), the joint GEE model as implemented in the <monospace>R</monospace> package <monospace>JGEE</monospace> (<xref ref-type="bibr" rid="bib38">Inan, 2015</xref>), and implemented in <monospace>R</monospace> the version of the flip-flop algorithm described in <xref ref-type="bibr" rid="bib49">Lu and Zimmerman (2005)</xref>. For simulating multivariate normally, lognormally, and truncated normally distributed data, we used the respective functions from the <monospace>R</monospace> packages <monospace>MASS</monospace> (<xref ref-type="bibr" rid="bib59">Ripley et al., 2022</xref>), <monospace>compositions</monospace> (<xref ref-type="bibr" rid="bib76">van den Boogaart et al., 2022</xref>), and <monospace>tmvtnorm</monospace> (<xref ref-type="bibr" rid="bib82">Wilhelm &amp; Manjunath, 2022</xref>). For the truncated normal distribution, the rejection method (the default) was used.</p>
<table-wrap id="tab_3" position="anchor" orientation="portrait">
<label>Table 3</label><caption><title>Parameterizations of the Multivariate Distributions for Group <inline-formula><mml:math id="math-148" display="inline"><mml:mi>i</mml:mi><mml:mo>∈</mml:mo><mml:mo stretchy="false">{</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">}</mml:mo></mml:math></inline-formula></title></caption>
	<table frame="hsides" rules="groups" style="striped-#f3f3f3"><colgroup span="1">
<col width="" align="left"/>
<col width="" align="left"/></colgroup>
<thead>
<tr>
<th>Distribution</th>
<th>Parameterization</th>
</tr>
</thead>
<tbody>
<tr>
<td>Multivariate normal</td>
<td><inline-formula><mml:math id="math-153" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="script">N</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="bold-italic">μ</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="bold">Σ</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula></td>
</tr>
<tr>
<td>Multivariate lognormal</td>
<td><inline-formula><mml:math id="math-154" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="script">L</mml:mi><mml:mi mathvariant="script">N</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="bold-italic">μ</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="bold">Σ</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula></td>
</tr>
<tr>
<td>Multivariate truncated normal</td>
<td><inline-formula><mml:math id="math-155" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="script">T</mml:mi><mml:mi mathvariant="script">N</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="bold-italic">μ</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="bold">Σ</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="bold">a</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="bold">b</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula></td>
</tr>
</tbody>
</table>
	<table-wrap-foot>
		<p><italic>Note.</italic> 	The multivariate truncated normal distribution is defined by lower and upper boundaries, <inline-formula><mml:math id="math-149" display="inline"><mml:mi mathvariant="bold">a</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> and <inline-formula><mml:math id="math-150" display="inline"><mml:mi mathvariant="bold">b</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, respectively, in addition to the mean (<inline-formula><mml:math id="math-151" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold-italic">μ</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>) and covariance (<inline-formula><mml:math id="math-152" display="inline"><mml:mi mathvariant="bold">Σ</mml:mi></mml:math></inline-formula>) parameters.</p>
	</table-wrap-foot>	
</table-wrap><fig id="fig-3" position="float" orientation="portrait"><label>Figure 3</label><caption><title>Overview of the Steps in the Simulation Study for a Particular Reference Dataset</title><p><italic>Note.</italic> <inline-formula><mml:math id="math-156" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="script">N</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> — multivariate normal distribution <inline-formula><mml:math id="math-157" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="script">L</mml:mi><mml:mi mathvariant="script">N</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> — multivariate lognormal distribution, <inline-formula><mml:math id="math-158" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="script">T</mml:mi><mml:mi mathvariant="script">N</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> — multivariate truncated normal distribution, <inline-formula><mml:math id="math-159" display="inline"><mml:mi>d</mml:mi></mml:math></inline-formula> — # variables, <inline-formula><mml:math id="math-160" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula> — # time points, LDA(<inline-formula><mml:math id="math-161" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="normal">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>pooled</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>) — Linear Discriminant Analysis (pooled covariance matrix), LDA(<inline-formula><mml:math id="math-162" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="normal">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>KP</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>) — Linear Discriminant Analysis (Kronecker product covariance matrix), LDA(GEE) — Linear Discriminant Analysis (covariance matrix based on generalized estimating 
equations estimates), SVM — longitudinal Support Vector Machine, MVE — Minimum Volume Ellipsoid algorithm, MCD — Minimum Covariance Determinant algorithm.</p></caption><graphic mimetype="image" mime-subtype="png" xlink:href="qcmb.14891-f3.png" position="float" orientation="portrait"/></fig></sec></sec>
<sec id="s4" sec-type="Results|discussion"><title>Results and Discussion</title>
<sec id="s4_1"><title>Performance in the Reference Data</title>
	<p id="S4.SS1.Px1.p1">For computing point estimates of the performance measures, including confidence intervals, in the reference data, we used the bootstrap approach described in the <xref ref-type="sec" rid="s3_2_1">Nonparametric Bootstrap Approach</xref> sub-section. Estimates of predictive performance and the Youden index are shown in <xref ref-type="fig" rid="fig-4">Figure 4</xref>; those of sensitivity and specificity can be found in Figure S2 (see <xref ref-type="bibr" rid="bib31.5">Graf et al., 2025</xref>). The bootstrap estimates and their respective confidence intervals are also shown in Table S3 (see <xref ref-type="bibr" rid="bib31.5">Graf et al., 2025</xref>).</p>
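<p>For readers unfamiliar with the .632+ estimator, the combination rule underlying such bootstrap estimates can be sketched as follows. This is the standard Efron–Tibshirani form stated in terms of error rates, given as an illustrative sketch under those conventions; it is not the implementation by Wahl et al. (2016).</p>

```python
def err632plus(err_app, err_oob, err_noinfo):
    """Combine the apparent (resubstitution) error and the out-of-bag
    bootstrap error into the .632+ estimate (Efron-Tibshirani form).
    err_noinfo is the no-information error rate."""
    # Relative overfitting rate R, clipped to [0, 1] by construction.
    R = 0.0
    if err_oob > err_app and err_noinfo > err_app:
        R = (err_oob - err_app) / (err_noinfo - err_app)
    # Weight w grows from 0.632 (no overfitting) toward 1 (full overfitting).
    w = 0.632 / (1.0 - 0.368 * R)
    return (1.0 - w) * err_app + w * min(err_oob, err_noinfo)
```

With no overfitting (`err_oob == err_app`) the rule reduces to the plain .632 estimator; with maximal overfitting it returns the out-of-bag error alone.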
	<p id="S4.SS1.Px1.p2"><xref ref-type="fig" rid="fig-4">Figure 4</xref> shows that the two methods LDA(<inline-formula><mml:math id="math-163" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>pooled</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>) and LDA(<inline-formula><mml:math id="math-164" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>KP</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>) perform very similarly in all scenarios and generally perform best. As Figure S2 in <xref ref-type="bibr" rid="bib31.5">Graf et al. (2025)</xref> also shows, these two methods tend to retain more moderate values of sensitivity and specificity even for highly imbalanced datasets (Datasets 1, 2, and 3). However, similar to LDA(GEE), they are (almost) incapable of accurately predicting the correct class for individuals from the minority class when group means are identical and only group covariance matrices differ (Dataset 4). All three methods, LDA(<inline-formula><mml:math id="math-165" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>pooled</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>), LDA(<inline-formula><mml:math id="math-166" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>KP</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>), and LDA(GEE), predominantly assign individuals to the majority class in this scenario, probably because the majority class’s covariance matrix receives greater weight when the pooled covariance matrix for the classification rule is computed and inverted (<xref ref-type="sec" rid="s3_1">Multivariate Repeated Measures LDA</xref> sub-section). 
In comparison, LDA(GEE) and SVM perform worse for unequal class sizes, with SVM performing worse than LDA(GEE), particularly because its specificity (prediction of the minority class) is very low. Comparing the performance for Dataset 1 (same temporal trends of group means) and Dataset 3 (opposite temporal trends of group means), all performance measures considerably improve for LDA(<inline-formula><mml:math id="math-167" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>pooled</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>) and LDA(<inline-formula><mml:math id="math-168" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>KP</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>). For LDA(GEE) there is almost no change (a very slight improvement), and overall no difference for the SVM. Results for Dataset 5 show that using balanced instead of imbalanced data (Dataset 2) increases the specificity of all LDA methods, but particularly of LDA(GEE), resulting in a higher Youden index. Trimming the training data improves the performance in the test data only in some cases. 
A simultaneous slight improvement of predictive performance and Youden index can only be observed in some cases: for LDA(<inline-formula><mml:math id="math-169" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>pooled</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>) when applied to Dataset 1 (after MVE trimming), Dataset 3 (after MVE and MCD trimming), and Dataset 5 (after MCD trimming); for LDA(<inline-formula><mml:math id="math-170" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>KP</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>) when applied to Dataset 1 (MVE trimming) and Dataset 3 (MVE and MCD trimming); and for LDA(GEE) when applied to Dataset 1 (MCD trimming).</p><fig id="fig-4" position="anchor" orientation="portrait"><label>Figure 4</label><caption><title>Performance in the Reference Data</title>
		<p><italic>Note.</italic> The bootstrap approach by <xref ref-type="bibr" rid="bib79">Wahl et al. (2016)</xref> with 2000 bootstrap datasets was used to compute <inline-formula><mml:math id="math-171" display="inline"><mml:msup><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>𝜃</mml:mi></mml:mrow><mml:mo>̂</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>.</mml:mo><mml:mn>632</mml:mn><mml:mo>+</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula> estimates and respective 95% confidence intervals for the performance measures predictive accuracy and Youden index.</p><p>(a) Dataset 1: CORE-OM dataset, group variable <italic>hospitalisation</italic> (<inline-formula><mml:math id="math-172" display="inline"><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>42</mml:mn><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>142</mml:mn></mml:math></inline-formula>).</p>
		<p>(b) Dataset 2: CASP-19 dataset, group variable <italic>loneliness</italic> (<inline-formula><mml:math id="math-173" display="inline"><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>948</mml:mn><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1682</mml:mn></mml:math></inline-formula>).</p>
		<p>(c) Dataset 3: modified Dataset 1, such that group means collapsed over time points are equal, and group means have opposite temporal trends.</p>
		<p>(d) Dataset 4: modified Dataset 2 (Time Points 1 &amp; 2), such that group means are equal, and group covariance matrices differ.</p>
		<p>(e) Dataset 5: modified Dataset 2 (Time Points 1 &amp; 2), balanced class sizes by random undersampling of Group 1.</p></caption><graphic mimetype="image" mime-subtype="png" xlink:href="qcmb.14891-f4.png" position="anchor" orientation="portrait"/></fig></sec>
<sec id="s4_2"><title>Performance in the Simulated Data</title>
	<p id="S4.SS2.Px1.p1">For the data simulations we assumed homogeneity of covariance matrices (an LDA assumption) for data generation based on Datasets 1, 2, 3, and 5, despite heterogeneity in the reference datasets. Figure S1 in <xref ref-type="bibr" rid="bib31.5">Graf et al. (2025)</xref> shows plots comparing the components of Box’s M test, which is known to be very sensitive to violations of the normality assumption, so its results may not be reliable. The log determinants and log eigenvalues of the covariance matrices differ from each other, suggesting heterogeneity of covariances in the reference data. Only for Dataset 4 did we assume heterogeneous covariance matrices for data generation, in order to compare the methods’ performance when this assumption is violated while group means are identical at the same time.</p>
	<p id="S4.SS2.Px1.p2">The second LDA assumption is multivariate normality of the data. Table S4 in <xref ref-type="bibr" rid="bib31.5">Graf et al. (2025)</xref> shows that the lognormally distributed multivariate data differ most from multivariate normality according to the Mardia measure of multivariate skewness (highest number of significant test results). Truncated normally distributed data deviate more from multivariate normality for larger sample sizes and/or a higher number of measurement occasions. Especially for Datasets 2 and 4, trimming the data with the MCD algorithm notably decreases the deviation from multivariate normality in the truncated normally distributed data; the same holds for Datasets 1 and 3 when the MCD algorithm is applied to the lognormally distributed data. This effect is weaker for the MVE algorithm. This suggests at least that the MCD algorithm, which has been found to be more suitable for outlier detection than the MVE algorithm (<xref ref-type="bibr" rid="bib62">Rousseeuw &amp; Driessen, 1999</xref>), may be useful when outliers or non-normality are assumed to bias parameter estimates. On the other hand, the trimming proportion has to be chosen so that valuable observations are not removed from the data; there currently are no general guidelines.</p>
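<p>The Mardia measure of multivariate skewness referenced here can be computed directly from the sample. The following is a minimal illustrative sketch, not the implementation used for Table S4; under multivariate normality the test statistic is asymptotically chi-squared distributed.</p>

```python
import numpy as np

def mardia_skewness(X):
    """Mardia's multivariate skewness b_{1,d} for an n-by-d sample,
    plus the test statistic n * b_{1,d} / 6, which under normality is
    asymptotically chi^2 with d(d+1)(d+2)/6 degrees of freedom."""
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                  # ML estimate of the covariance matrix
    G = Xc @ np.linalg.inv(S) @ Xc.T   # matrix of scaled inner products g_ij
    b1 = (G ** 3).sum() / n ** 2       # b_{1,d} = (1/n^2) sum_ij g_ij^3
    stat = n * b1 / 6.0
    return b1, stat
```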
	<p id="S4.SS2.Px1.p3"><xref ref-type="table" rid="tab_4">Table 4</xref> shows the computational times per algorithm, averaged over scenarios with different data distributions and trimming approaches. The method LDA(<inline-formula><mml:math id="math-174" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="normal">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>pooled</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>) has the advantage of low computational times. For LDA(<inline-formula><mml:math id="math-175" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="normal">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>KP</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>) especially, computational time increases sharply with larger sample size and/or a higher number of measurement occasions. In comparison, the computational time of LDA(GEE) seems to be less affected by larger sample sizes than by higher dimensionality (number of time points and variables). Computation of the SVM results is the most time-consuming, and the algorithm does not always converge within 100 iterations (Table S6 in <xref ref-type="bibr" rid="bib31.5">Graf et al., 2025</xref>).</p>
<table-wrap id="tab_4" position="anchor" orientation="portrait">
<label>Table 4</label><caption><title>Computational Times (Hours) per Algorithm Averaged Over the Simulated Datasets per Reference Dataset</title>
</caption>
<table frame="hsides" rules="groups"><colgroup span="1">
<col width="" align="left"/>
<col width=""/>
<col width=""/>
<col width=""/>
<col width=""/></colgroup>
<thead>
<tr>
<th>Dataset</th>
<th>LDA(<inline-formula><mml:math id="math-180" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>pooled</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>)</th>
<th>LDA(<inline-formula><mml:math id="math-181" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="bold">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>KP</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>)</th>
<th>LDA(GEE)</th>
<th>SVM</th>
</tr>
</thead>
<tbody>
<tr>
<td><inline-formula><mml:math id="math-182" display="inline"><mml:mtext mathvariant="italic">Dataset 1</mml:mtext></mml:math></inline-formula></td>
<td align="char" char=".">0.08</td>
<td align="char" char=".">1.05</td>
<td align="char" char=".">0.34</td>
<td align="char" char=".">64.29</td>
</tr>
<tr>
<td><inline-formula><mml:math id="math-183" display="inline"><mml:mtext mathvariant="italic">Dataset 2</mml:mtext></mml:math></inline-formula></td>
<td align="char" char=".">1.40</td>
<td align="char" char=".">29.62</td>
<td align="char" char=".">26.71</td>
	<td>—</td>
</tr>
<tr>
<td><inline-formula><mml:math id="math-184" display="inline"><mml:mtext mathvariant="italic">Dataset 3</mml:mtext></mml:math></inline-formula></td>
<td align="char" char=".">0.11</td>
<td align="char" char=".">1.29</td>
<td align="char" char=".">0.39</td>
<td align="char" char=".">61.63</td>
</tr>
<tr>
<td><inline-formula><mml:math id="math-185" display="inline"><mml:mtext mathvariant="italic">Dataset 4</mml:mtext></mml:math></inline-formula></td>
<td align="char" char=".">0.93</td>
<td align="char" char=".">16.99</td>
<td align="char" char=".">6.90</td>
	<td>—</td>
</tr>
<tr>
<td><inline-formula><mml:math id="math-186" display="inline"><mml:mtext mathvariant="italic">Dataset 5</mml:mtext></mml:math></inline-formula></td>
<td align="char" char=".">0.79</td>
<td align="char" char=".">10.57</td>
<td align="char" char=".">4.12</td>
	<td>—</td>
</tr>
</tbody>
</table>
	<table-wrap-foot><p><italic>Note.</italic> Irrespective of the data distribution and irrespective whether trimming has been done before application of the classification algorithm.</p><p>Dataset 1: CORE-OM dataset, group variable <italic>hospitalisation</italic> (<inline-formula><mml:math id="math-176" display="inline"><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>42</mml:mn><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>142</mml:mn></mml:math></inline-formula>)</p>
		<p>Dataset 2: CASP-19 dataset, group variable <italic>loneliness</italic> (<inline-formula><mml:math id="math-177" display="inline"><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>948</mml:mn><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1682</mml:mn></mml:math></inline-formula>).</p>
		<p>Dataset 3: modified Dataset 1, such that group means collapsed over time points are equal, and group means have opposite temporal trends.</p>
		<p>Dataset 4: modified Dataset 2 (Time Points 1 &amp; 2), such that group means are equal, and group covariance matrices differ.</p>
		<p>Dataset 5: modified Dataset 2 (Time Points 1 &amp; 2), balanced class sizes by random undersampling of Group 1.</p>
		<p>LDA(<inline-formula><mml:math id="math-178" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="normal">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>pooled</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>) — Linear discriminant analysis (pooled covariance matrix), LDA(<inline-formula><mml:math id="math-179" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="normal">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>KP</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>) — Linear discriminant analysis (Kronecker product covariance matrix), LDA(GEE) — Linear discriminant analysis (covariance matrix based on generalized estimating equations estimates), SVM — Support vector machine.</p></table-wrap-foot>
</table-wrap>
	<p id="S4.SS2.Px1.p4"><xref ref-type="fig" rid="fig-5">Figures 5</xref> and <xref ref-type="fig" rid="fig-6">6</xref> show the distribution of the estimates of predictive accuracy and the Youden index, respectively, in the simulated data. Plots for sensitivity and specificity are shown in Figures S3 and S4, respectively. Means (standard errors) of the performance measures are also shown in Tables S5a–S5e (see <xref ref-type="bibr" rid="bib31.5">Graf et al., 2025</xref>) for Datasets 1–5. A first finding from <xref ref-type="fig" rid="fig-5">Figures 5</xref> and <xref ref-type="fig" rid="fig-6">6</xref> is that deviation from normality (in the multivariate lognormally distributed data) in some cases increases (Dataset 1) or decreases (Datasets 2 and 5) the algorithms’ predictive performance and Youden index, and in some cases does not have a considerable effect (Datasets 3 and 4). For the scenarios with smaller sample sizes (<inline-formula><mml:math id="math-187" display="inline"><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>42</mml:mn><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>142</mml:mn></mml:math></inline-formula>), no negative effect could be determined, whereas for the scenarios with much larger sample sizes (<inline-formula><mml:math id="math-188" display="inline"><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>948</mml:mn><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1682</mml:mn></mml:math></inline-formula> and <inline-formula><mml:math id="math-189" display="inline"><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>948</mml:mn></mml:math></inline-formula>) there is a clear decrease in predictive accuracy and Youden index. The effect is approximately the same for all three repeated-measures LDA methods. A second finding is that predictive accuracy and Youden index for the SVM are visibly worse than for the LDA methods at these imbalanced sample sizes. The SVM has a sensitivity close to 1 but a specificity close to 0, and thus mostly predicts the majority class.</p>
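<p>The link between a majority-class predictor and a vanishing Youden index can be made concrete. The helper below is an illustrative sketch with toy labels, coding the majority class as 1 and the minority class as 0 as in the example above.</p>

```python
import numpy as np

def youden_index(y_true, y_pred):
    """Youden index J = sensitivity + specificity - 1 for binary labels,
    with class 1 treated as the positive class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    sens = np.mean(y_pred[y_true == 1] == 1)  # true positive rate
    spec = np.mean(y_pred[y_true == 0] == 0)  # true negative rate
    return sens + spec - 1.0
```

A classifier that always predicts the majority class attains a sensitivity of 1 but a specificity of 0, so its Youden index is exactly 0 regardless of how high its raw accuracy looks.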
<p id="S4.SS2.Px1.p5">With respect to predictive accuracy, LDA(<inline-formula><mml:math id="math-190" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="normal">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>pooled</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>) without prior trimming usually performs best. Only for Dataset 5 (balanced class sizes) does LDA(GEE) with prior trimming (MCD algorithm) have a marginally better predictive performance in the lognormally distributed data. LDA(GEE) matches the other two LDA methods on both measures, predictive performance and Youden index, only for Dataset 5 (equal sample sizes), Dataset 4 (where all methods perform poorly), and the lognormally distributed data simulated based on Dataset 2 (where all methods also perform poorly). For Dataset 1 (unbalanced classes, same temporal trends of group means), the Youden index of LDA(GEE) is higher than the values for LDA(<inline-formula><mml:math id="math-191" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="normal">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>pooled</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>) and LDA(<inline-formula><mml:math id="math-192" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="normal">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>KP</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>) for multivariate normally and truncated normally distributed data, especially when no trimming is applied to the training data; the boxes overlap only slightly or not at all. The reason is its higher specificity (prediction of the minority class), although its sensitivity is comparably lower. 
For the lognormally distributed data generated based on Dataset 1, the Youden index of LDA(<inline-formula><mml:math id="math-193" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="normal">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>KP</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>), especially without prior trimming, is higher compared to the other methods, which is also due to higher specificity.</p>
<p id="S4.SS2.Px1.p6">It is not clear in which of the presented simulation scenarios trimming for outlier removal may help, not least because there is no scenario in which we explicitly simulated outliers. Both predictive accuracy and the Youden index increase somewhat (from a rather low performance level) for all three LDA methods in the lognormally distributed data for Dataset 5 when the training data are trimmed.</p><fig id="fig-5" position="anchor" orientation="portrait"><label>Figure 5</label><caption><title>Boxplots Showing the Distribution of Predictive Accuracy Estimated in the 2000 Simulated Datasets</title>
<p><italic>Note.</italic> Distribution for the multivariate normal (left), for the multivariate lognormal (center), and for the multivariate truncated normal distribution (right). Results with the highest median value are highlighted in darker colours.</p><p>(a) Dataset 1: CORE-OM dataset, group variable <italic>hospitalisation</italic> (<inline-formula><mml:math id="math-194" display="inline"><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>42</mml:mn></mml:math></inline-formula>, <inline-formula><mml:math id="math-195" display="inline"><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>142</mml:mn></mml:math></inline-formula>).</p>
<p>(b) Dataset 2: CASP-19 dataset, group variable <italic>loneliness</italic> (<inline-formula><mml:math id="math-196" display="inline"><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>948</mml:mn><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1682</mml:mn></mml:math></inline-formula>).</p>
<p>(c) Dataset 3: modified Dataset 1, such that group means collapsed over time points are equal, and group means have opposite temporal trends.</p>
<p>(d) Dataset 4: modified Dataset 2 (Time Points 1 &amp; 2), such that group means are equal, and group covariance matrices differ.</p>
<p>(e) Dataset 5: modified Dataset 2 (Time Points 1 &amp; 2), balanced class sizes by random undersampling of Group 1.</p>
	<p>Abbreviations: LDA(<inline-formula><mml:math id="math-197" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="normal">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>pooled</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>) — Linear Discriminant Analysis (pooled covariance matrix), LDA(<inline-formula><mml:math id="math-198" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="normal">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>KP</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>) — Linear Discriminant Analysis (Kronecker product covariance matrix), LDA(GEE) — Linear Discriminant Analysis (covariance matrix based on Generalized Estimating Equations estimates), SVM — Support vector machine, MVE — Minimum Volume Ellipsoid algorithm, MCD — Minimum Covariance Determinant algorithm.</p></caption><graphic mimetype="image" mime-subtype="png" xlink:href="qcmb.14891-f5.png" position="anchor" orientation="portrait"/></fig></sec>
<sec id="s4_3"><title>Recommendations</title>
<p id="S4.SS3.Px1.p1">Generally, in these simulations the traditional LDA(<inline-formula><mml:math id="math-199" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="normal">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>pooled</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>) performs best or reasonably well with respect to predictive performance and Youden index, irrespective of smaller or larger sample size, differing group size ratios, number of measurement occasions, similar or opposite temporal trends in group means. None of the LDA methods works well for identical group means but heterogeneous covariance matrices, where they predominantly assign new observations to the majority class, and the Youden index is close to zero. The same is the case for multivariate lognormally distributed data when sample sizes are large, i.e., for an extremely evident violation of multivariate normality corresponding to extremely high values of the Mardia measure of multivariate skewness test statistic (approximately above 100).</p>
<p id="S4.SS3.Px1.p2">We did not explicitly generate outliers from a different distribution than the actual data, but there may have been some random outliers. In this case, trimming for outlier removal had no effect except a minor effect on the Youden index for all LDA methods in the scenario with balanced group sizes and same temporal trends per group when data were generated from lognormally distributed data. In this case, the LDA methods still did not perform reasonably well. Multivariate trimming in the training data can be tried as a sensitivity analysis if the presence of outliers is suspected. Especially the MCD algorithm has already been recommended in the literature.</p>
<p id="S4.SS3.Px1.p3">In our simulations no Kronecker product covariance matrices and group means are assumed in the reference data. We used unstructured estimates of the pooled covariance matrix and group means. In our simulations, there is only an advantage of the alternative LDA(<inline-formula><mml:math id="math-200" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="normal">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>KP</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>) and LDA(GEE) with respect to the Youden index for data with imbalanced class sizes and comparably smaller (but not small) sample sizes. The advantage of these methods, even if no underlying Kronecker product structure of the parameters can be assumed, may become more evident for smaller sample sizes. They may provide more exact estimates due to their parsimonious number of values that have to be estimated.</p>
	<p id="S4.SS3.Px1.p4">Application of repeated-measures techniques should be preferred in order to incorporate the additional information about temporal trends and in order to obtain more reliable results by including data of multiple time points in the analysis provided that moderate correlations between data of different variables and times points exist. Multicollinearity among time points and/or variables would require removal of respective time points or variables, respectively. In case of independence between time points/variables, univariate techniques can be used. According to the psychometric literature, multivariate data are very common. An example are the widely applied questionnaires using Likert-type responses where multiple correlated aspects related to an overall topic are measured. In order to assess the usefulness of different sets of variables for distinguishing two classes of individuals, LDA can be applied for class prediction and its performance for different sets of variables can subsequently be compared to determine the most relevant variables. Usually, for LDA applied to cross-sectional data, Fisher discriminant function coeﬃcients (<xref ref-type="bibr" rid="bib25">Fisher, 1936</xref>) are computed in order to assess relative variable importance within a particular set. The method can in principle also be applied to repeated measures data. It does not assume multivariate normality although it requires homogeneity of covariance matrices.</p><fig id="fig-6" position="anchor" orientation="portrait"><label>Figure 6</label><caption><title>Boxplots Showing the Distribution of Youden Index Estimated in the 2000 Simulated Datasets</title><p><italic>Note.</italic> Distribution for the multivariate normal (left), for the multivariate lognormal (center), and for the multivariate truncated normal distribution (right). Results with the highest median value are highlighted in darker colours.</p>
<p>(a) Dataset 1: CORE-OM dataset, group variable <italic>hospitalisation</italic> (<inline-formula><mml:math id="math-201" display="inline"><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>42</mml:mn><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>142</mml:mn></mml:math></inline-formula>).</p>
<p>(b) Dataset 2: CASP-19 dataset, group variable <italic>loneliness</italic> (<inline-formula><mml:math id="math-202" display="inline"><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>948</mml:mn><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1682</mml:mn></mml:math></inline-formula>).</p>
<p>(c) Dataset 3: modified Dataset 1, such that group means collapsed over time points are equal, and group means have opposite temporal trends.</p>
<p>(d) Dataset 4: modified Dataset 2 (Time Points 1 &amp; 2), such that group means are equal, and group covariance matrices differ.</p>
<p>(e) Dataset 5: modified Dataset 2 (Time Points 1 &amp; 2), balanced class sizes by random undersampling of Group 1.</p>
		<p>Abbreviations: LDA(<inline-formula><mml:math id="math-203" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="normal">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>pooled</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>) — Linear Discriminant Analysis (pooled covariance matrix), LDA(<inline-formula><mml:math id="math-204" display="inline"><mml:msub><mml:mrow><mml:mi mathvariant="normal">Σ</mml:mi></mml:mrow><mml:mrow><mml:mtext>KP</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>) — Linear Discriminant Analysis (Kronecker product covariance matrix), LDA(GEE) — Linear Discriminant Analysis (covariance matrix based on Generalized Estimating Equations estimates), SVM — Support Vector Machine, MVE — Minimum Volume Ellipsoid algorithm, MCD — Minimum Covariance Determinant algorithm.</p></caption><graphic mimetype="image" mime-subtype="png" xlink:href="qcmb.14891-f6.png" position="float" orientation="portrait"/></fig></sec></sec>
<sec id="s5" sec-type="Conclusion"><title>Conclusion</title>
<p id="S5.p1">Longitudinal studies are conducted in psychology and other disciplines. Data in psychology and the social sciences are often characterized by nonnormal distributions, especially skewness. LDA is widely applied as a standard technique in these fields, e.g., to questionnaire data where answers are measured on Likert scales that are summarized in subscales based on means or sums of multiple Likert items (i.e., single questions), either for classification tasks or for identifying variables most relevant to group separation. Repeated measures techniques are preferable for the analysis of data that are collected repeatedly over time compared to conducting several independent analyses for each time point in case temporal correlations exist.</p>
	<p id="S5.p2">We compared the performance of robust repeated measures DA techniques proposed by <xref ref-type="bibr" rid="bib11">Brobbey (2021)</xref> and <xref ref-type="bibr" rid="bib12">Brobbey et al. (2022)</xref> and the longitudinal SVM by <xref ref-type="bibr" rid="bib17">Chen and Bowman (2011)</xref> using multiple performance measures. We based these comparisons on real psychometric datasets which differ with respect to sample size, sample size ratio, class overlap, temporal variation, number of repeated measurement occasions, and properties of group means and covariance matrices. We thus considered additional scenarios to those in <xref ref-type="bibr" rid="bib11">Brobbey (2021)</xref> and <xref ref-type="bibr" rid="bib12">Brobbey et al. (2022)</xref>, where Kronecker product structures of means and covariances and thus constant correlations and means of the variables over time were assumed. We also compared several alternative methods among each other in contrast to comparing a particular alternative to the standard method at a time. We included the longitudinal SVM because it is similar to repeated measures LDA in that they are both linear classifiers for which variable weights can additionally be computed and temporal correlations are considered in the analysis. We did not consider extensions of other supervised machine learning algorithms for classification since they usually assume independence between time points (<xref ref-type="bibr" rid="bib57">Ribeiro &amp; Freitas, 2019</xref>) and do not have a comparably intuitive interpretation of variable weights as the linear SVM.</p>
<p id="S5.p3">We followed the guidelines for neutral comparison studies by <xref ref-type="bibr" rid="bib81">Weber et al. (2019)</xref> and the general design of simulation studies by <xref ref-type="bibr" rid="bib51">Morris et al. (2019)</xref>. We found that the alternative robust methods may not be required for suﬃciently large sample sizes and absence of outliers. Limitations of our simulation study are that only a limited number of scenarios and datasets are considered. Further examination in data with smaller sample sizes and in data containing outliers from a different distribution would be helpful. In this context, the influence of different choices for the trimming parameter when applying one of the trimming algorithms for outlier removal may also be examined. To date, no recommendations on the choice of the trimming parameter for multivariate data exist. Therefore, for an actual dataset, multiple values should be tried. Moreover, due to availability of suitable datasets in particular given data protection policies, and limited number of scenarios considered in every simulation study in general, further conclusions may be possible when applying the methods to other datasets. As with any simulation study, our results can therefore not be generalized beyond the considered scenarios. We found that none of the LDA methods did work well for extreme deviations from normality, and heterogeneity of covariance matrices when group means were identical, respectively. Conclusions based on the performance in the reference datasets and based on data simulations, respectively, are similar.</p></sec>
</body>
<back>
<ref-list><title>References</title>
	<ref id="bib1"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Aggarwala</surname>, <given-names>J.</given-names></string-name>, <string-name name-style="western"><surname>Garg</surname>, <given-names>R.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Chatterjee</surname>, <given-names>S.</given-names></string-name> (<year>2022</year>). <article-title>Linear discriminant analysis of various physiological and psychological parameters among Indian elite male athletes of different types of sports</article-title>. <source><italic>Sport Mont</italic></source>, <volume>20</volume>(<issue>3</issue>), <fpage>53</fpage>–<lpage>60</lpage>. <pub-id pub-id-type="doi">10.26773/smj.221009</pub-id></mixed-citation></ref>
	<ref id="bib3"><mixed-citation publication-type="data"><string-name name-style="western"><surname>Banks</surname>, <given-names>J.</given-names></string-name>, <string-name name-style="western"><surname>Batty</surname>, <given-names>G.</given-names></string-name>, <string-name name-style="western"><surname>Breedvelt</surname>, <given-names>J.</given-names></string-name>, <string-name name-style="western"><surname>Coughlin</surname>, <given-names>K.</given-names></string-name>, <string-name name-style="western"><surname>Crawford</surname>, <given-names> R.</given-names></string-name>, <string-name name-style="western"><surname>Marmot</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Nazroo</surname>, <given-names>J.</given-names></string-name>, <string-name name-style="western"><surname>Oldfield</surname>, <given-names>Z.</given-names></string-name>, <string-name name-style="western"><surname>Steel</surname>, <given-names>N.</given-names></string-name>, <string-name name-style="western"><surname>Steptoe</surname>, <given-names>A.</given-names></string-name>, <string-name name-style="western"><surname>Wood</surname>, <given-names>M.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Zaninotto</surname>, <given-names>P.</given-names></string-name> (<year>2021</year>). <italic>English longitudinal study of ageing: Waves 0–9, 1998–2019</italic> [36<sup>th</sup> edition] <comment>SN: 5050</comment>. <publisher-name>UK Data Service</publisher-name>. <pub-id pub-id-type="doi">10.5255/ukda-sn-5050-24</pub-id></mixed-citation></ref>
	<ref id="bib4"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Barkham</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Evans</surname>, <given-names>C.</given-names></string-name>, <string-name name-style="western"><surname>Margison</surname>, <given-names>F.</given-names></string-name>, <string-name name-style="western"><surname>Mcgrath</surname>, <given-names>G.</given-names></string-name>, <string-name name-style="western"><surname>Mellor-Clark</surname>, <given-names>J.</given-names></string-name>, <string-name name-style="western"><surname>Milne</surname>, <given-names>D.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Connell</surname>, <given-names>J.</given-names></string-name> (<year>1998</year>). <article-title>The rationale for developing and implementing core batteries in service settings and psychotherapy outcome research</article-title>. <source><italic>Journal of Mental Health</italic></source>, <volume>7</volume>(<issue>1</issue>), <fpage>35</fpage>–<lpage>47</lpage>. <pub-id pub-id-type="doi">10.1080/09638239818328</pub-id></mixed-citation></ref>
	<ref id="bib5"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Baumeister</surname>, <given-names>R.</given-names></string-name>, <string-name name-style="western"><surname>Vohs</surname>, <given-names>K.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Funder</surname>, <given-names>D.</given-names></string-name> (<year>2007</year>). <article-title>Psychology as the science of self-reports and finger movements whatever happened to actual behavior?</article-title> <source><italic>Perspectives on Psychological Science</italic></source>, <volume>2</volume>(<issue>4</issue>), <fpage>396</fpage>–<lpage>403</lpage>. <pub-id pub-id-type="doi">10.1111/j.1745-6916.2007.0005</pub-id></mixed-citation></ref>
	<ref id="bib6"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Beaumont</surname>, <given-names>J.</given-names></string-name>, <string-name name-style="western"><surname>Lix</surname>, <given-names>L.</given-names></string-name>, <string-name name-style="western"><surname>Yost</surname>, <given-names>K.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Hahn</surname>, <given-names>E.</given-names></string-name> (<year>2006</year>). <article-title>Application of robust statistical methods for sensitivity analysis of health-related quality of life outcomes</article-title>. <source><italic>Quality of Life Research</italic></source>, <volume>15</volume>(<issue>3</issue>), <fpage>349</fpage>–<lpage>356</lpage>. <pub-id pub-id-type="doi">10.1007/s11136-005-2293-1</pub-id></mixed-citation></ref>
	<ref id="bib7"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Betz</surname>, <given-names>N. E.</given-names></string-name> (<year>1987</year>). <article-title>Use of discriminant analysis in counseling psychology research</article-title>. <source><italic>Journal of Counseling Psychology</italic></source>, <volume>34</volume>(<issue>4</issue>), <fpage>393</fpage>–<lpage>403</lpage>. <pub-id pub-id-type="doi">10.1037/0022-0167.34.4.393</pub-id></mixed-citation></ref>
	<ref id="bib8"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Boedeker</surname>, <given-names>P.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Kearns</surname>, <given-names>N.</given-names></string-name> (<year>2019</year>). <article-title>Linear discriminant analysis for prediction of group membership: A user-friendly primer</article-title>. <source><italic>Advances in Methods and Practices in Psychological Science</italic></source>, <volume>2</volume>(<issue>3</issue>), <fpage>250</fpage>–<lpage>263</lpage>. <pub-id pub-id-type="doi">10.1177/2515245919849378</pub-id></mixed-citation></ref>
	<ref id="bib9"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Box</surname>, <given-names>G. E. P.</given-names></string-name> (<year>1949</year>). <article-title>A general distribution theory for a class of likelihood criteria</article-title>. <source><italic>Biometrika</italic></source>, <volume>36</volume>(<issue>3–4</issue>), <fpage>317</fpage>–<lpage>346</lpage>. <pub-id pub-id-type="doi">10.1093/biomet/36.3-4.317</pub-id></mixed-citation></ref>
	<ref id="bib10"><mixed-citation publication-type="book"><string-name name-style="western"><surname>Bravo</surname>, <given-names>H. C.</given-names></string-name>, <string-name name-style="western"><surname>Hornik</surname>, <given-names>K.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Theussl</surname>, <given-names>S.</given-names></string-name> (<year>2021</year>). <source><italic>Rcplex: R interface to CPLEX</italic></source>. <publisher-name>Comprehensive R Archive Network</publisher-name>. <pub-id pub-id-type="doi">10.32614/CRAN.package.Rcplex</pub-id></mixed-citation></ref>
	<ref id="bib11"><mixed-citation publication-type="other"><string-name name-style="western"><surname>Brobbey</surname>, <given-names>A.</given-names></string-name> (<year>2021</year>). <source><italic>Classification models for multivariate non-normal repeated measures data</italic></source>. [Doctoral thesis, University of Calgary]. University of Calgary Repository. <ext-link ext-link-type="uri" xlink:href="http://hdl.handle.net/1880/112972">http://hdl.handle.net/1880/112972</ext-link></mixed-citation></ref>
	<ref id="bib12"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Brobbey</surname>, <given-names>A.</given-names></string-name>, <string-name name-style="western"><surname>Wiebe</surname>, <given-names>S.</given-names></string-name>, <string-name name-style="western"><surname>Nettel-Aguirre</surname>, <given-names>A.</given-names></string-name>, <string-name name-style="western"><surname>Josephson</surname>, <given-names>C.</given-names></string-name>, <string-name name-style="western"><surname>Williamson</surname>, <given-names>T.</given-names></string-name>, <string-name name-style="western"><surname>Lix</surname>, <given-names>L.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Sajobi</surname>, <given-names>T.</given-names></string-name> (<year>2022</year>). <article-title>Repeated measures discriminant analysis using multivariate generalized estimation equations</article-title>. <source><italic>Statistical Methods in Medical Research</italic></source>, <volume>31</volume>(<issue>4</issue>), <fpage>646</fpage>–<lpage>657</lpage>. <pub-id pub-id-type="doi">10.1177/09622802211032705</pub-id></mixed-citation></ref>
	<ref id="bib13"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Butler</surname>, <given-names>R. W.</given-names></string-name>, <string-name name-style="western"><surname>Davies</surname>, <given-names>P. L.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Jhun</surname>, <given-names>M.</given-names></string-name> (<year>1993</year>). <article-title>Asymptotics for the minimum covariance determinant estimator</article-title>. <source><italic>Annals of Statistics</italic></source>, <volume>21</volume>(<issue>3</issue>), <fpage>1385</fpage>–<lpage>1400</lpage>. <pub-id pub-id-type="doi">10.1214/aos/1176349264</pub-id></mixed-citation></ref>
	<ref id="bib14"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Carifio</surname>, <given-names>J.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Perla</surname>, <given-names>R. J.</given-names></string-name> (<year>2007</year>). <article-title>Ten common misunderstandings, misconceptions, persistent myths and urban legends about Likert scales and Likert response formats and their antidotes</article-title>. <source><italic>Journal of Social Sciences</italic></source>, <volume>3</volume>(<issue>3</issue>), <fpage>106</fpage>–<lpage>116</lpage>. <pub-id pub-id-type="doi">10.3844/jssp.2007.106.116</pub-id></mixed-citation></ref>
	<ref id="bib15"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Carifio</surname>, <given-names>J.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Perla</surname>, <given-names>R. J.</given-names></string-name> (<year>2008</year>). <article-title>Resolving the 50-year debate around using and misusing Likert scales</article-title>. <source><italic>Medical Education</italic></source>, <volume>42</volume>(<issue>12</issue>), <fpage>1150</fpage>–<lpage>1152</lpage>. <pub-id pub-id-type="doi">10.1111/j.1365-2923.2008.03172.x</pub-id></mixed-citation></ref>
	<ref id="bib16"><mixed-citation publication-type="other"><string-name name-style="western"><surname>Castañeda Garcia</surname>, <given-names>M.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Nossek</surname>, <given-names>J.</given-names></string-name> (<year>2014</year>). <article-title>Estimation of rank deficient covariance matrices with Kronecker structure</article-title>. <source><italic>ICASSP — IEEE International Conference on Acoustics, Speech and Signal Processing — Proceedings</italic></source>, (pp. 394–398). Curran Associates.</mixed-citation></ref>
	<ref id="bib17"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Chen</surname>, <given-names>S.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Bowman</surname>, <given-names>F. D.</given-names></string-name> (<year>2011</year>). <article-title>A novel support vector classifier for longitudinal high-dimensional data and its application to neuroimaging data</article-title>. <source><italic>Statistical Analysis and Data Mining</italic></source>, <volume>4</volume>(<issue>6</issue>), <fpage>604</fpage>–<lpage>611</lpage>. <pub-id pub-id-type="doi">10.1002/sam.10141</pub-id></mixed-citation></ref>
	<ref id="bib18"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Clark</surname>, <given-names>L. A.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Watson</surname>, <given-names>D.</given-names></string-name> (<year>2019</year>). <article-title>Constructing validity: New developments in creating objective measuring instruments</article-title>. <source><italic>Psychological Assessment</italic></source>, <volume>31</volume>(<issue>12</issue>), <fpage>1412</fpage>–<lpage>1427</lpage>. <pub-id pub-id-type="doi">10.1037/pas0000626</pub-id></mixed-citation></ref>
	<ref id="bib19"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Cortes</surname>, <given-names>C.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Vapnik</surname>, <given-names>V. N.</given-names></string-name> (<year>1995</year>). <article-title>Support-Vector Networks</article-title>. <source><italic>Machine Learning</italic></source>, <volume>20</volume>, <fpage>273</fpage>–<lpage>297</lpage>. <pub-id pub-id-type="doi">10.1007/BF00994018</pub-id></mixed-citation></ref>
	<ref id="bib20"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Delacre</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Lakens</surname>, <given-names>D.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Leys</surname>, <given-names>C.</given-names></string-name> (<year>2017</year>). <article-title>Why psychologists should by default use Welch’s <italic>t</italic>-test instead of Student’s <italic>t</italic>-test</article-title>. <source><italic>International Review of Social Psychology</italic></source>, <volume>30</volume>(<issue>1</issue>), <fpage>92</fpage>–<lpage>101</lpage>. <pub-id pub-id-type="doi">10.5334/irsp.82</pub-id></mixed-citation></ref>
<ref id="bib21"><mixed-citation publication-type="other"><string-name name-style="western"><surname>Donoho</surname>, <given-names>D.</given-names></string-name> (<year>1982</year>). <source><italic>Breakdown properties of multivariate location estimators</italic></source> [Unpublished doctoral dissertation]. Harvard University.</mixed-citation></ref>
	<ref id="bib22"><mixed-citation publication-type="book"><string-name name-style="western"><surname>Donoho</surname>, <given-names>D. L.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Huber</surname>, <given-names>P. J.</given-names></string-name> (<year>1983</year>). <chapter-title>The notion of breakdown point</chapter-title>. In P. J. Bickel, K. A. Doksum, &amp; J. L. Hodges, Jr. (Eds.), <source><italic>A Festschrift for Erich Lehmann</italic></source> (pp. 157–184). <publisher-name>Wadsworth</publisher-name>.</mixed-citation></ref>
	<ref id="bib23"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Efron</surname>, <given-names>B.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Tibshirani</surname>, <given-names>R.</given-names></string-name> (<year>1997</year>). <article-title>Improvements on cross-validation: The .632+ bootstrap method</article-title>. <source><italic>Journal of the American Statistical Association</italic></source>, <volume>92</volume>(<issue>438</issue>), <fpage>548</fpage>–<lpage>560</lpage>. <pub-id pub-id-type="doi">10.1080/01621459.1997.10474007</pub-id></mixed-citation></ref>
<ref id="bib24"><mixed-citation publication-type="book"><string-name name-style="western"><surname>Field</surname>, <given-names>A.</given-names></string-name> (<year>2017</year>). <source><italic>Discovering statistics using IBM SPSS Statistics</italic></source>. <publisher-name>SAGE Publications</publisher-name>.</mixed-citation></ref>
	<ref id="bib25"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Fisher</surname>, <given-names>R. A.</given-names></string-name> (<year>1936</year>). <article-title>The use of multiple measurements in taxonomic problems</article-title>. <source><italic>Annals of Human Genetics</italic></source>, <volume>7</volume>(<issue>2</issue>), <fpage>179</fpage>–<lpage>188</lpage>. <pub-id pub-id-type="doi">10.1111/j.1469-1809.1936.tb02137.x</pub-id></mixed-citation></ref>
	<ref id="bib26"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Fletcher</surname>, <given-names>J. M.</given-names></string-name>, <string-name name-style="western"><surname>Rice</surname>, <given-names>W. J.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Ray</surname>, <given-names>R. M.</given-names></string-name> (<year>1978</year>). <article-title>Linear discriminant function analysis in neuropsychological research: Some uses and abuses</article-title>. <source>Cortex: A Journal Devoted to the Study of the Nervous System and Behavior</source>, <volume>14</volume>(<issue>4</issue>), <fpage>564</fpage>–<lpage>577</lpage>. <pub-id pub-id-type="doi">10.1016/S0010-9452(78)80031-8</pub-id></mixed-citation></ref>	
	<ref id="bib27"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Friendly</surname>, <given-names>M.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Sigal</surname>, <given-names>M.</given-names></string-name> (<year>2020</year>). <article-title>Visualizing tests for equality of covariance matrices</article-title>. <source><italic> American Statistician</italic></source>, <volume>74</volume>(<issue>2</issue>), <fpage>144</fpage>–<lpage>155</lpage>. <pub-id pub-id-type="doi">10.1080/00031305.2018.1497537</pub-id></mixed-citation></ref>
	<ref id="bib28"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Gaito</surname>, <given-names>J.</given-names></string-name> (<year>1980</year>). <article-title>Measurement scales and statistics: Resurgence of an old misconception</article-title>. <source><italic>Psychological Bulletin</italic></source>, <volume>87</volume>(<issue>3</issue>, <fpage>564</fpage>–<lpage>567</lpage>. <pub-id pub-id-type="doi">10.1037/0033-2909.87.3.564</pub-id></mixed-citation></ref>
	<ref id="bib29"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Garrett</surname>, <given-names>H. E.</given-names></string-name> (<year>1943</year>). <article-title>The discriminant function and its use in psychology</article-title>. <source><italic>Psychometrika</italic></source>, <volume>8</volume>(<issue>2</issue>), <fpage>65</fpage>–<lpage>79</lpage>. <pub-id pub-id-type="doi">10.1007/BF02288691</pub-id></mixed-citation></ref>
	<ref id="bib30"><mixed-citation publication-type="book"><string-name name-style="western"><surname>Gnanadesikan</surname>, <given-names>R.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Kettenring</surname>, <given-names>J. R.</given-names></string-name> (<year>1984</year>). <chapter-title>A pragmatic review of multivariate methods in applications</chapter-title>. In W. A. David &amp; H. T. David (Eds.), <italic>Statistics: An appraisal</italic> (pp. 309–337). <publisher-name>Iowa State University Press</publisher-name>.</mixed-citation></ref>
	<ref id="bib31.5"><mixed-citation publication-type="data"><string-name name-style="western"><surname>Graf</surname>, <given-names>R.</given-names></string-name>, <string-name name-style="western"><surname>Zeldovich</surname>, <given-names>M.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Friedrich</surname>, <given-names>S.</given-names></string-name> (<year>2025</year>). <italic>Linear classification methods for multivariate repeated measures data — A simulation study</italic> [Code, Data, Supplementary Materials]. Figshare. <ext-link ext-link-type="uri" xlink:href="https://figshare.com/s/104aeb2a870a810f80bd">https://figshare.com/s/104aeb2a870a810f80bd</ext-link></mixed-citation></ref>
	<ref id="bib31"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Gupta</surname>, <given-names>A. K.</given-names></string-name> (<year>1986</year>). <article-title>On a classification rule for multiple measurements</article-title>. <source><italic>Computers &amp; Mathematics with Applications</italic></source>, <volume>12</volume>(<issue>2A</issue>), <fpage>301</fpage>–<lpage>308</lpage>. <pub-id pub-id-type="doi">10.1016/0898-1221(86)90082-9</pub-id></mixed-citation></ref>
	<ref id="bib32"><mixed-citation publication-type="book"><string-name name-style="western"><surname>Hardin</surname>, <given-names>J. W.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Hilbe</surname>, <given-names>J. M.</given-names></string-name> (<year>2013</year>). <source><italic>Generalized estimating equations</italic></source>. <publisher-name>CRC Press</publisher-name>.</mixed-citation></ref>
	<ref id="bib33"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Hastie</surname>, <given-names>T. J.</given-names></string-name>, <string-name name-style="western"><surname>Rosset</surname>, <given-names>S.</given-names></string-name>, <string-name name-style="western"><surname>Tibshirani</surname>, <given-names>R.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Zhu</surname>, <given-names>J.</given-names></string-name> (<year>2004</year>). <article-title>The entire regularization path for the support vector machine</article-title>. <source><italic>Journal of Machine Learning Research</italic></source>, <volume>5</volume>, <fpage>1391</fpage>–<lpage>1415</lpage>.</mixed-citation></ref>
	<ref id="bib34"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Hawkins</surname>, <given-names>D. M.</given-names></string-name>, &amp; <string-name name-style="western"><surname>McLachlan</surname>, <given-names>G. J.</given-names></string-name> (<year>1997</year>). <article-title>High-breakdown linear discriminant analysis</article-title>. <source><italic>Journal of the American Statistical Association</italic></source>, <volume>92</volume>(<issue>437</issue>), <fpage>136</fpage>–<lpage>143</lpage>. <pub-id pub-id-type="doi">10.1080/01621459.1997.10473610</pub-id></mixed-citation></ref>
	<ref id="bib35"><mixed-citation publication-type="report"><string-name name-style="western"><surname>Hsu</surname>, <given-names>C.-W.</given-names></string-name>, <string-name name-style="western"><surname>Chang</surname>, <given-names>C.-C.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Lin</surname>, <given-names>C.-J.</given-names></string-name> (<year>2003</year>). <italic>A practical guide to support vector classification</italic> [Technical Report] (pp. 1–12). Department of Computer Science and Information Engineering, National Taiwan University.</mixed-citation></ref> 
	<ref id="bib36"><mixed-citation publication-type="book"><string-name name-style="western"><surname>Huberty</surname>, <given-names>C.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Olejnik</surname>, <given-names>S.</given-names></string-name> (<year>2006</year>). <source><italic>Applied MANOVA and discriminant analysis</italic></source>. <publisher-name>John Wiley &amp; Sons</publisher-name>.</mixed-citation></ref>
	<ref id="bib37"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Hyde</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Wiggins</surname>, <given-names>R.</given-names></string-name>, <string-name name-style="western"><surname>Higgs</surname>, <given-names>P.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Blane</surname>, <given-names>D.</given-names></string-name> (<year>2003</year>). <article-title>A measure of quality of life in early old age: The theory, development and properties of a needs satisfaction model (CASP-19)</article-title>. <source><italic>Aging &amp; Mental Health</italic></source>, <volume>7</volume>(<issue>3</issue>), <fpage>186</fpage>–<lpage>194</lpage>. <pub-id pub-id-type="doi">10.1080/1360786031000101157</pub-id></mixed-citation></ref>
	<ref id="bib38"><mixed-citation publication-type="other"><string-name name-style="western"><surname>Inan</surname>, <given-names>G.</given-names></string-name> (<year>2015</year>). <source>JGEE: Joint Generalized Estimating Equation solver</source>. Comprehensive R Archive Network. <ext-link ext-link-type="uri" xlink:href="https://cran.r-project.org/src/contrib/Archive/JGEE/">https://cran.r-project.org/src/contrib/Archive/JGEE/</ext-link></mixed-citation></ref>
	<ref id="bib39"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Jebb</surname>, <given-names>A.</given-names></string-name>, <string-name name-style="western"><surname>Ng</surname>, <given-names>V.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Tay</surname>, <given-names>L.</given-names></string-name> (<year>2021</year>). <article-title>A review of key Likert scale development advances: 1995–2019</article-title>. <source><italic>Frontiers in Psychology</italic></source>, <volume>12</volume>, <elocation-id>637547</elocation-id>. <pub-id pub-id-type="doi">10.3389/fpsyg.2021.637547</pub-id></mixed-citation></ref>
	<ref id="bib40"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Jensen</surname>, <given-names>E.</given-names></string-name>, <string-name name-style="western"><surname>Pfleger</surname>, <given-names>A.</given-names></string-name>, <string-name name-style="western"><surname>Lorenz</surname>, <given-names>L.</given-names></string-name>, <string-name name-style="western"><surname>Jensen</surname>, <given-names>A.</given-names></string-name>, <string-name name-style="western"><surname>Wagoner</surname>, <given-names>B.</given-names></string-name>, <string-name name-style="western"><surname>Watzlawik</surname>, <given-names>M.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Herbig</surname>, <given-names>L.</given-names></string-name> (<year>2021</year>). <article-title>A repeated measures dataset on public responses to the COVID-19 pandemic: Social norms, attitudes, behaviors, conspiracy thinking, and (mis)information</article-title>. <source><italic>Frontiers in Communication</italic></source>, <volume>6</volume>, <elocation-id>678335</elocation-id>. <pub-id pub-id-type="doi">10.3389/fcomm.2021.678335</pub-id></mixed-citation></ref>
	<ref id="bib41"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Jiang</surname>, <given-names>B.</given-names></string-name>, <string-name name-style="western"><surname>Zhang</surname>, <given-names>X.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Cai</surname>, <given-names>T.</given-names></string-name> (<year>2008</year>). <article-title>Estimating the confidence interval for prediction errors of support vector machine classifiers</article-title>. <source><italic>Journal of Machine Learning Research</italic></source>, <volume>9</volume>(<issue>17</issue>), <fpage>521</fpage>–<lpage>540</lpage>.</mixed-citation></ref>
	<ref id="bib42"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Knowles</surname>, <given-names>C.</given-names></string-name>, <string-name name-style="western"><surname>Eccersley</surname>, <given-names>A.</given-names></string-name>, <string-name name-style="western"><surname>Scott</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Walker</surname>, <given-names>S.</given-names></string-name>, <string-name name-style="western"><surname>Reeves</surname>, <given-names>B.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Lunniss</surname>, <given-names>P.</given-names></string-name> (<year>2000</year>). <article-title>Linear discriminant analysis of symptoms in patients with chronic constipation</article-title>. <source><italic>Diseases of the Colon &amp; Rectum</italic></source>, <volume>43</volume>, <fpage>1419</fpage>–<lpage>1426</lpage>. <pub-id pub-id-type="doi">10.1007/BF02236639</pub-id></mixed-citation></ref>
	<ref id="bib43"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Kristjansdottir</surname>, <given-names>H.</given-names></string-name>, <string-name name-style="western"><surname>Erlingsdóttir</surname>, <given-names>A.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Saavedra</surname>, <given-names>J.</given-names></string-name> (<year>2018</year>). <article-title>Psychological skills, mental toughness and anxiety in elite handball players</article-title>. <source><italic>Personality and Individual Differences</italic></source>, <volume>134</volume>, <fpage>125</fpage>–<lpage>130</lpage>. <pub-id pub-id-type="doi">10.1016/j.paid.2018.06.011</pub-id></mixed-citation></ref>
	<ref id="bib44"><mixed-citation publication-type="other"><string-name name-style="western"><surname>Kuhn</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Wing</surname>, <given-names>J.</given-names></string-name>, <string-name name-style="western"><surname>Weston</surname>, <given-names>S.</given-names></string-name>, <string-name name-style="western"><surname>Williams</surname>, <given-names>A.</given-names></string-name>, <string-name name-style="western"><surname>Keefer</surname>, <given-names>C.</given-names></string-name>, <string-name name-style="western"><surname>Engelhardt</surname>, <given-names>A.</given-names></string-name>, <string-name name-style="western"><surname>Cooper</surname>, <given-names>T.</given-names></string-name>, <string-name name-style="western"><surname>Mayer</surname>, <given-names>Z.</given-names></string-name>, <string-name name-style="western"><surname>Kenkel</surname>, <given-names>B.</given-names></string-name>, <collab>R Core Team</collab>, <string-name name-style="western"><surname>Benesty</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Lescarbeau</surname>, <given-names>R.</given-names></string-name>, <string-name name-style="western"><surname>Ziem</surname>, <given-names>A.</given-names></string-name>, <string-name name-style="western"><surname>Scrucca</surname>, <given-names>L.</given-names></string-name>, <string-name name-style="western"><surname>Tang</surname>, <given-names>Y.</given-names></string-name>, <string-name name-style="western"><surname>Candan</surname>, <given-names>C.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Hunt</surname>, <given-names>T.</given-names></string-name> (<year>2024</year>). <source><italic>caret: Classification and Regression Training</italic></source>. <comment>Comprehensive R Archive Network</comment>. <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=caret">https://CRAN.R-project.org/package=caret</ext-link></mixed-citation></ref>
	<ref id="bib45"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Kumpulainen</surname>, <given-names>P.</given-names></string-name>, <string-name name-style="western"><surname>Cardó</surname>, <given-names>A. V.</given-names></string-name>, <string-name name-style="western"><surname>Somppi</surname>, <given-names>S.</given-names></string-name>, <string-name name-style="western"><surname>Törnqvist</surname>, <given-names>H.</given-names></string-name>, <string-name name-style="western"><surname>Väätäjä</surname>, <given-names>H.</given-names></string-name>, <string-name name-style="western"><surname>Majaranta</surname>, <given-names>P.</given-names></string-name>, <string-name name-style="western"><surname>Surakka</surname>, <given-names>V.</given-names></string-name>, <string-name name-style="western"><surname>Vainio</surname>, <given-names>O.</given-names></string-name>, <string-name name-style="western"><surname>Kujala</surname>, <given-names>M. V.</given-names></string-name>, <string-name name-style="western"><surname>Gizatdinova</surname>, <given-names>Y.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Vehkaoja</surname>, <given-names>A.</given-names></string-name> (<year>2021</year>). <article-title>Dog behaviour classification with movement sensors placed on the harness and the collar</article-title>. <source><italic>Applied Animal Behaviour Science</italic></source>, <volume>241</volume>, <elocation-id>105393</elocation-id>. <pub-id pub-id-type="doi">10.1016/j.applanim.2021.105393</pub-id></mixed-citation></ref>
	<ref id="bib46"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Langlois</surname>, <given-names>F.</given-names></string-name>, <string-name name-style="western"><surname>Freeston</surname>, <given-names>M. H.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Ladouceur</surname>, <given-names>R.</given-names></string-name> (<year>2000</year>). <article-title>Differences and similarities between obsessive intrusive thoughts and worry in a non-clinical population: Study 2</article-title>. <source><italic>Behaviour Research and Therapy</italic></source>, <volume>38</volume>(<issue>2</issue>), <fpage>175</fpage>–<lpage>189</lpage>. <pub-id pub-id-type="doi">10.1016/s0005-7967(99)00028-5</pub-id></mixed-citation></ref>
	<ref id="bib47"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Likert</surname>, <given-names>R.</given-names></string-name> (<year>1932</year>). <article-title>A technique for the measurement of attitudes</article-title>. <source><italic>Archives of Psychology</italic></source>, <volume>140</volume>, <fpage>1</fpage>–<lpage>55</lpage>.</mixed-citation></ref>
	<ref id="bib48"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Lix</surname>, <given-names>L.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Sajobi</surname>, <given-names>T.</given-names></string-name> (<year>2010</year>). <article-title>Discriminant analysis for repeated measures data: A review</article-title>. <source><italic>Frontiers in Psychology</italic></source>, <volume>1</volume>, <elocation-id>146</elocation-id>. <pub-id pub-id-type="doi">10.3389/fpsyg.2010.00146</pub-id></mixed-citation></ref>
	<ref id="bib49"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Lu</surname>, <given-names>N.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Zimmerman</surname>, <given-names>D.</given-names></string-name> (<year>2005</year>). <article-title>The likelihood ratio test for a separable covariance matrix</article-title>. <source><italic>Statistics &amp; Probability Letters</italic></source>, <volume>73</volume>(<issue>4</issue>), <fpage>449</fpage>–<lpage>457</lpage>. <pub-id pub-id-type="doi">10.1016/j.spl.2005.04.020</pub-id></mixed-citation></ref>
	<ref id="bib50"><mixed-citation publication-type="data"><string-name name-style="western"><surname>McLanahan</surname>, <given-names>S.</given-names></string-name>, <string-name name-style="western"><surname>Garfinkel</surname>, <given-names>I.</given-names></string-name>, <string-name name-style="western"><surname>Edin</surname>, <given-names>K.</given-names></string-name>, <string-name name-style="western"><surname>Waldfogel</surname>, <given-names>J.</given-names></string-name>, <string-name name-style="western"><surname>Hale</surname>, <given-names>L.</given-names></string-name>, <string-name name-style="western"><surname>Buxton</surname>, <given-names>O. M.</given-names></string-name>, <string-name name-style="western"><surname>Mitchell</surname>, <given-names>C.</given-names></string-name>, <string-name name-style="western"><surname>Hyde</surname>, <given-names>L. W.</given-names></string-name>, <string-name name-style="western"><surname>Notterman</surname>, <given-names>D. A.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Monk</surname>, <given-names>C. S.</given-names></string-name> (<year>2019</year>). <italic>Fragile families and child wellbeing study, public use, United States, 1998–2017.</italic> Inter-University Consortium for Political and Social Research. <pub-id pub-id-type="doi">10.3886/ICPSR31622.v4</pub-id></mixed-citation></ref>
	<ref id="bib51"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Morris</surname>, <given-names>T. P.</given-names></string-name>, <string-name name-style="western"><surname>White</surname>, <given-names>I. R.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Crowther</surname>, <given-names>M. J.</given-names></string-name> (<year>2019</year>). <article-title>Using simulation studies to evaluate statistical methods</article-title>. <source><italic>Statistics in Medicine</italic></source>, <volume>38</volume>(<issue>11</issue>), <fpage>2074</fpage>–<lpage>2102</lpage>. <pub-id pub-id-type="doi">10.1002/sim.8086</pub-id></mixed-citation></ref>
	<ref id="bib52"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Naik</surname>, <given-names>D. N.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Rao</surname>, <given-names>S. S.</given-names></string-name> (<year>2001</year>). <article-title>Analysis of multivariate repeated measures data with a Kronecker product structured covariance matrix</article-title>. <source><italic>Journal of Applied Statistics</italic></source>, <volume>28</volume>(<issue>1</issue>), <fpage>91</fpage>–<lpage>105</lpage>. <pub-id pub-id-type="doi">10.1080/02664760120011626</pub-id></mixed-citation></ref>
	<ref id="bib53"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Neto</surname>, <given-names>E.</given-names></string-name>, <string-name name-style="western"><surname>Biessmann</surname>, <given-names>F.</given-names></string-name>, <string-name name-style="western"><surname>Aurlien</surname>, <given-names>H.</given-names></string-name>, <string-name name-style="western"><surname>Nordby</surname>, <given-names>H.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Eichele</surname>, <given-names>T.</given-names></string-name> (<year>2016</year>). <article-title>Regularized linear discriminant analysis of EEG features in dementia patients</article-title>. <source><italic>Frontiers in Aging Neuroscience</italic></source>, <volume>8</volume>, <elocation-id>273</elocation-id>. <pub-id pub-id-type="doi">10.3389/fnagi.2016.00273</pub-id></mixed-citation></ref>
	<ref id="bib54"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Norman</surname>, <given-names>G.</given-names></string-name> (<year>2010</year>). <article-title>Likert scales, levels of measurement and the “laws” of statistics</article-title>. <source><italic>Advances in Health Sciences Education</italic></source>, <volume>15</volume>(<issue>5</issue>), <fpage>625</fpage>–<lpage>632</lpage>. <pub-id pub-id-type="doi">10.1007/s10459-010-9222-y</pub-id></mixed-citation></ref>
	<ref id="bib55"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>O’Brien</surname>, <given-names>J.</given-names></string-name>, <string-name name-style="western"><surname>Tsermentseli</surname>, <given-names>S.</given-names></string-name>, <string-name name-style="western"><surname>Cummins</surname>, <given-names>O.</given-names></string-name>, <string-name name-style="western"><surname>Happé</surname>, <given-names>F.</given-names></string-name>, <string-name name-style="western"><surname>Heaton</surname>, <given-names>P.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Spencer</surname>, <given-names>J. V.</given-names></string-name> (<year>2009</year>). <article-title>Discriminating children with autism from children with learning difficulties with an adaptation of the short sensory profile</article-title>. <source><italic>Early Child Development and Care</italic></source>, <volume>179</volume>(<issue>4</issue>), <fpage>383</fpage>–<lpage>394</lpage>.</mixed-citation></ref>
	<ref id="bib56"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Rausch</surname>, <given-names>J.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Kelley</surname>, <given-names>K.</given-names></string-name> (<year>2009</year>). <article-title>A comparison of linear and mixture models for discriminant analysis under nonnormality</article-title>. <source><italic>Behavior Research Methods</italic></source>, <volume>41</volume>, <fpage>85</fpage>–<lpage>98</lpage>. <pub-id pub-id-type="doi">10.3758/BRM.41.1.85</pub-id></mixed-citation></ref>
	<ref id="bib57"><mixed-citation publication-type="other"><string-name name-style="western"><surname>Ribeiro</surname>, <given-names>C. E.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Freitas</surname>, <given-names>A.</given-names></string-name> (<year>2019</year>). <italic>A mini-survey of supervised machine learning approaches for coping with ageing-related longitudinal datasets</italic>. Third Workshop on AI for Aging, Rehabilitation and Independent Assisted Living (ARIAL) — IJCAI-2019.</mixed-citation></ref>
	<ref id="bib58"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Rickards</surname>, <given-names>G.</given-names></string-name>, <string-name name-style="western"><surname>Magee</surname>, <given-names>C.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Artino</surname>, <given-names>A.</given-names></string-name> (<year>2012</year>). <article-title>You can’t fix by analysis what you’ve spoiled by design: Developing survey instruments and collecting validity evidence</article-title>. <source><italic>Journal of Graduate Medical Education</italic></source>, <volume>4</volume>(<issue>4</issue>), <fpage>407</fpage>–<lpage>410</lpage>. <pub-id pub-id-type="doi">10.4300/JGME-D-12-00239.1</pub-id></mixed-citation></ref>
	<ref id="bib59"><mixed-citation publication-type="other"><string-name name-style="western"><surname>Ripley</surname>, <given-names>B.</given-names></string-name>, <string-name name-style="western"><surname>Venables</surname>, <given-names>B.</given-names></string-name>, <string-name name-style="western"><surname>Bates</surname>, <given-names>D. M.</given-names></string-name>, <string-name name-style="western"><surname>Hornik</surname>, <given-names>K.</given-names></string-name>, <string-name name-style="western"><surname>Gebhardt</surname>, <given-names>A.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Firth</surname>, <given-names>D.</given-names></string-name> (<year>2022</year>). <source><italic>MASS: Support functions and datasets for Venables and Ripley’s MASS</italic></source>. <comment>Comprehensive R Archive Network</comment>. <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=MASS">https://CRAN.R-project.org/package=MASS</ext-link></mixed-citation></ref>
	<ref id="bib60"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Rogge</surname>, <given-names>R. D.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Bradbury</surname>, <given-names>T. N.</given-names></string-name> (<year>1999</year>). <article-title>Till violence does us part: The differing roles of communication and aggression in predicting adverse marital outcomes</article-title>. <source><italic>Journal of Consulting and Clinical Psychology</italic></source>, <volume>67</volume>(<issue>3</issue>), <fpage>340</fpage>–<lpage>351</lpage>. <pub-id pub-id-type="doi">10.1037/0022-006X.67.3.340</pub-id></mixed-citation></ref>
	<ref id="bib61"><mixed-citation publication-type="book"><string-name name-style="western"><surname>Rousseeuw</surname>, <given-names>P.</given-names></string-name> (<year>1985</year>). <chapter-title>Multivariate estimation with high breakdown point</chapter-title>. In W. Grossmann, G. Pflug, I. Vincze, &amp; W. Wertz (Eds.), <italic>Mathematical statistics and applications</italic> (pp. 283–297). <publisher-name>Reidel Publishing Company</publisher-name>.</mixed-citation></ref>
	<ref id="bib62"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Rousseeuw</surname>, <given-names>P.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Van Driessen</surname>, <given-names>K.</given-names></string-name> (<year>1999</year>). <article-title>A fast algorithm for the minimum covariance determinant estimator</article-title>. <source><italic>Technometrics</italic></source>, <volume>41</volume>(<issue>3</issue>), <fpage>212</fpage>–<lpage>223</lpage>. <pub-id pub-id-type="doi">10.1080/00401706.1999.10485670</pub-id></mixed-citation></ref>
	<ref id="bib63"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Roy</surname>, <given-names>A.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Khattree</surname>, <given-names>R.</given-names></string-name> (<year>2005a</year>). <article-title>Discrimination and classification with repeated measures data under different covariance structures</article-title>. <source><italic>Communications in Statistics – Simulation and Computation</italic></source>, <volume>34</volume>(<issue>1</issue>), <fpage>167</fpage>–<lpage>178</lpage>. <pub-id pub-id-type="doi">10.1081/SAC-200047072</pub-id></mixed-citation></ref>
	<ref id="bib64"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Roy</surname>, <given-names>A.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Khattree</surname>, <given-names>R.</given-names></string-name> (<year>2005b</year>). <article-title>On discrimination and classification with multivariate repeated measures data</article-title>. <source><italic>Journal of Statistical Planning and Inference</italic></source>, <volume>134</volume>(<issue>2</issue>), <fpage>462</fpage>–<lpage>485</lpage>. <pub-id pub-id-type="doi">10.1016/j.jspi.2004.04.012</pub-id></mixed-citation></ref>
	<ref id="bib65"><mixed-citation publication-type="other"><string-name name-style="western"><surname>Sentelle</surname>, <given-names>C.</given-names></string-name> (<year>2015</year>). <italic>simplesvmpath.</italic> GitHub. <ext-link ext-link-type="uri" xlink:href="https://github.com/csentelle/simplesvmpath.git">https://github.com/csentelle/simplesvmpath.git</ext-link></mixed-citation></ref>
	<ref id="bib66"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Sentelle</surname>, <given-names>C.</given-names></string-name>, <string-name name-style="western"><surname>Anagnostopoulos</surname>, <given-names>G. C.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Georgiopoulos</surname>, <given-names>M.</given-names></string-name> (<year>2016</year>). <article-title>A simple method for solving the SVM regularization path for semidefinite kernels</article-title>. <source><italic>IEEE Transactions on Neural Networks and Learning Systems</italic></source>, <volume>27</volume>(<issue>4</issue>), <fpage>709</fpage>–<lpage>722</lpage>. <pub-id pub-id-type="doi">10.1109/TNNLS.2015.2427333</pub-id></mixed-citation></ref>
	<ref id="bib67"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Sherry</surname>, <given-names>A.</given-names></string-name> (<year>2006</year>). <article-title>Discriminant analysis in counseling psychology research</article-title>. <source><italic>The Counseling Psychologist</italic></source>, <volume>34</volume>(<issue>5</issue>), <fpage>661</fpage>–<lpage>683</lpage>. <pub-id pub-id-type="doi">10.1177/0011000006287103</pub-id></mixed-citation></ref>
	<ref id="bib68"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Shinba</surname>, <given-names>T.</given-names></string-name>, <string-name name-style="western"><surname>Murotsu</surname>, <given-names>K.</given-names></string-name>, <string-name name-style="western"><surname>Usui</surname>, <given-names>Y.</given-names></string-name>, <string-name name-style="western"><surname>Andow</surname>, <given-names>Y.</given-names></string-name>, <string-name name-style="western"><surname>Terada</surname>, <given-names>H.</given-names></string-name>, <string-name name-style="western"><surname>Kariya</surname>, <given-names>N.</given-names></string-name>, <string-name name-style="western"><surname>Tatebayashi</surname>, <given-names>Y.</given-names></string-name>, <string-name name-style="western"><surname>Matsuda</surname>, <given-names>Y.</given-names></string-name>, <string-name name-style="western"><surname>Mugishima</surname>, <given-names>G.</given-names></string-name>, <string-name name-style="western"><surname>Shinba</surname>, <given-names>Y.</given-names></string-name>, <string-name name-style="western"><surname>Sun</surname>, <given-names>G.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Matsui</surname>, <given-names>T.</given-names></string-name> (<year>2021</year>). <article-title>Return-to-work screening by linear discriminant analysis of heart rate variability indices in depressed subjects</article-title>. <source><italic>Sensors</italic></source>, <volume>21</volume>(<issue>15</issue>), <elocation-id>5177</elocation-id>. <pub-id pub-id-type="doi">10.3390/s21155177</pub-id></mixed-citation></ref>
	<ref id="bib69"><mixed-citation publication-type="other"><string-name name-style="western"><surname>Silan</surname>, <given-names>M. A. A.</given-names></string-name> (<year>2020</year>). <italic>When can we treat Likert type data as interval?</italic> PsyArXiv. <ext-link ext-link-type="uri" xlink:href="https://osf.io/preprints/psyarxiv/wvkyu_v1">https://osf.io/preprints/psyarxiv/wvkyu_v1</ext-link></mixed-citation></ref>
	<ref id="bib70"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Stoyanov</surname>, <given-names>D. S.</given-names></string-name>, <string-name name-style="western"><surname>Khorev</surname>, <given-names>V. S.</given-names></string-name>, <string-name name-style="western"><surname>Paunova</surname>, <given-names>R.</given-names></string-name>, <string-name name-style="western"><surname>Kandilarova</surname>, <given-names>S.</given-names></string-name>, <string-name name-style="western"><surname>Simeonova</surname>, <given-names>D.</given-names></string-name>, <string-name name-style="western"><surname>Badarin</surname>, <given-names>A. A.</given-names></string-name>, <string-name name-style="western"><surname>Hramov</surname>, <given-names>A. E.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Kurkin</surname>, <given-names>S. A.</given-names></string-name> (<year>2022</year>). <article-title>Resting-state functional connectivity impairment in patients with major depressive episode</article-title>. <source><italic>International Journal of Environmental Research and Public Health</italic></source>, <volume>19</volume>(<issue>21</issue>), <elocation-id>14045</elocation-id>. <pub-id pub-id-type="doi">10.3390/ijerph192114045</pub-id></mixed-citation></ref>
	<ref id="bib71"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Sullivan</surname>, <given-names>G.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Artino</surname>, <given-names>A.</given-names></string-name> (<year>2013</year>). <article-title>Analyzing and interpreting data from Likert-type scales</article-title>. <source><italic>Journal of Graduate Medical Education</italic></source>, <volume>5</volume>(<issue>4</issue>), <fpage>541</fpage>–<lpage>542</lpage>. <pub-id pub-id-type="doi">10.4300/JGME-5-4-18</pub-id></mixed-citation></ref>
	<ref id="bib72"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Talarska</surname>, <given-names>D.</given-names></string-name>, <string-name name-style="western"><surname>Tobis</surname>, <given-names>S.</given-names></string-name>, <string-name name-style="western"><surname>Kotkowiak</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Strugała</surname>, <given-names>M.</given-names></string-name>, <string-name name-style="western"><surname>Stanisławska</surname>, <given-names>J.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Wieczorowska-Tobis</surname>, <given-names>K.</given-names></string-name> (<year>2018</year>). <article-title>Determinants of quality of life and the need for support for the elderly with good physical and mental functioning</article-title>. <source><italic>Medical Science Monitor</italic></source>, <volume>24</volume>, <fpage>1604</fpage>–<lpage>1613</lpage>. <pub-id pub-id-type="doi">10.12659/msm.907032</pub-id></mixed-citation></ref>
	<ref id="bib73"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Tiku</surname>, <given-names>M.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Balakrishnan</surname>, <given-names>N.</given-names></string-name> (<year>1984</year>). <article-title>Testing equality of population variances the robust way</article-title>. <source><italic>Communications in Statistics – Theory and Methods</italic></source>, <volume>13</volume>(<issue>17</issue>), <fpage>2143</fpage>–<lpage>2159</lpage>. <pub-id pub-id-type="doi">10.1080/03610928408828818</pub-id></mixed-citation></ref>
	<ref id="bib74"><mixed-citation publication-type="other"><string-name name-style="western"><surname>Todorov</surname>, <given-names>V.</given-names></string-name> (<year>2022</year>). <source><italic>rrcov: Scalable robust estimators with high breakdown point</italic></source>. <comment>Comprehensive R Archive Network</comment>. <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=rrcov">https://CRAN.R-project.org/package=rrcov</ext-link></mixed-citation></ref>
	<ref id="bib75"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Tomasko</surname>, <given-names>L.</given-names></string-name>, <string-name name-style="western"><surname>Helms</surname>, <given-names>R. W.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Snapinn</surname>, <given-names>S. M.</given-names></string-name> (<year>1999</year>). <article-title>A discriminant analysis extension to mixed models</article-title>. <source><italic>Statistics in Medicine</italic></source>, <volume>18</volume>(<issue>10</issue>), <fpage>1249</fpage>–<lpage>1260</lpage>.</mixed-citation></ref>
	<ref id="bib76"><mixed-citation publication-type="other"><string-name name-style="western"><surname>van den Boogaart</surname>, <given-names>K. G.</given-names></string-name>, <string-name name-style="western"><surname>Tolosana-Delgado</surname>, <given-names>R.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Bren</surname>, <given-names>M.</given-names></string-name> (<year>2022</year>). <source><italic>compositions: Compositional data analysis</italic></source>. <comment>Comprehensive R Archive Network</comment>. <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=compositions">https://CRAN.R-project.org/package=compositions</ext-link></mixed-citation></ref>
<ref id="bib77"><mixed-citation publication-type="book"><string-name name-style="western"><surname>Vapnik</surname>, <given-names>V.</given-names></string-name> (<year>1982</year>). <source><italic>Estimation of dependences based on empirical data: Empirical inference science</italic></source>. <publisher-name>Springer</publisher-name>.</mixed-citation></ref>
	<ref id="bib78"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Veronese</surname>, <given-names>G.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Pepe</surname>, <given-names>A.</given-names></string-name> (<year>2017</year>). <article-title>Life satisfaction and trauma in clinical and non-clinical children living in a war-torn environment: A discriminant analysis</article-title>. <source><italic>Journal of Health Psychology</italic></source>, <volume>25</volume>(<issue>4</issue>), <fpage>459</fpage>–<lpage>471</lpage>. <pub-id pub-id-type="doi">10.1177/1359105317720004</pub-id></mixed-citation></ref>
	<ref id="bib79"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Wahl</surname>, <given-names>S.</given-names></string-name>, <string-name name-style="western"><surname>Boulesteix</surname>, <given-names>A.-L.</given-names></string-name>, <string-name name-style="western"><surname>Zierer</surname>, <given-names>A.</given-names></string-name>, <string-name name-style="western"><surname>Thorand</surname>, <given-names>B.</given-names></string-name>, &amp; <string-name name-style="western"><surname>van de Wiel</surname>, <given-names>M. A.</given-names></string-name> (<year>2016</year>). <article-title>Assessment of predictive performance in incomplete data by combining internal validation and multiple imputation</article-title>. <source><italic>BMC Medical Research Methodology</italic></source>, <volume>16</volume>, <elocation-id>144</elocation-id>. <pub-id pub-id-type="doi">10.1186/s12874-016-0239-7</pub-id></mixed-citation></ref>
	<ref id="bib80"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Wang</surname>, <given-names>K.</given-names></string-name>, <string-name name-style="western"><surname>Shi</surname>, <given-names>H.-S.</given-names></string-name>, <string-name name-style="western"><surname>Geng</surname>, <given-names>F.-L.</given-names></string-name>, <string-name name-style="western"><surname>Zou</surname>, <given-names>L.-Q.</given-names></string-name>, <string-name name-style="western"><surname>Tan</surname>, <given-names>S.-P.</given-names></string-name>, <string-name name-style="western"><surname>Wang</surname>, <given-names>Y.</given-names></string-name>, <string-name name-style="western"><surname>Neumann</surname>, <given-names>D. L.</given-names></string-name>, <string-name name-style="western"><surname>Shum</surname>, <given-names>D. H. K.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Chan</surname>, <given-names>R. C. K.</given-names></string-name> (<year>2016</year>). <article-title>Cross-cultural validation of the Depression Anxiety Stress Scale-21 in China</article-title>. <source><italic>Psychological Assessment</italic></source>, <volume>28</volume>(<issue>5</issue>), <fpage>e88</fpage>–<lpage>e100</lpage>. <pub-id pub-id-type="doi">10.1037/pas0000207</pub-id></mixed-citation></ref>
	<ref id="bib81"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Weber</surname>, <given-names>L.</given-names></string-name>, <string-name name-style="western"><surname>Saelens</surname>, <given-names>W.</given-names></string-name>, <string-name name-style="western"><surname>Cannoodt</surname>, <given-names>R.</given-names></string-name>, <string-name name-style="western"><surname>Soneson</surname>, <given-names>C.</given-names></string-name>, <string-name name-style="western"><surname>Hapfelmeier</surname>, <given-names>A.</given-names></string-name>, <string-name name-style="western"><surname>Gardner</surname>, <given-names>P.</given-names></string-name>, <string-name name-style="western"><surname>Boulesteix</surname>, <given-names>A.-L.</given-names></string-name>, <string-name name-style="western"><surname>Saeys</surname>, <given-names>Y.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Robinson</surname>, <given-names>M.</given-names></string-name> (<year>2019</year>). <article-title>Essential guidelines for computational method benchmarking</article-title>. <source><italic>Genome Biology</italic></source>, <volume>20</volume>, <elocation-id>125</elocation-id>. <pub-id pub-id-type="doi">10.1186/s13059-019-1738-8</pub-id></mixed-citation></ref>
	<ref id="bib82"><mixed-citation publication-type="other"><string-name name-style="western"><surname>Wilhelm</surname>, <given-names>S.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Manjunath</surname>, <given-names>B.</given-names></string-name> (<year>2022</year>). <source><italic>tmvtnorm: Truncated multivariate normal and Student t distribution</italic></source>. <comment>Comprehensive R Archive Network</comment>. <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=tmvtnorm">https://CRAN.R-project.org/package=tmvtnorm</ext-link></mixed-citation></ref>
	<ref id="bib83"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Woodruff</surname>, <given-names>D. L.</given-names></string-name>, &amp; <string-name name-style="western"><surname>Rocke</surname>, <given-names>D. M.</given-names></string-name> (<year>1993</year>). <article-title>Heuristic search algorithms for the minimum volume ellipsoid</article-title>. <source><italic>Journal of Computational and Graphical Statistics</italic></source>, <volume>2</volume>(<issue>1</issue>), <fpage>69</fpage>–<lpage>95</lpage>. <pub-id pub-id-type="doi">10.1080/10618600.1993.10474600</pub-id></mixed-citation></ref>
<ref id="bib84"><mixed-citation publication-type="journal"><string-name name-style="western"><surname>Youden</surname>, <given-names>W. J.</given-names></string-name> (<year>1950</year>). <article-title>Index for rating diagnostic tests</article-title>. <source><italic>Cancer</italic></source>, <volume>3</volume>(<issue>1</issue>), <fpage>32</fpage>–<lpage>35</lpage>. <pub-id pub-id-type="doi">10.1002/1097-0142(1950)3:1&lt;32::AID-CNCR2820030106&gt;3.0.CO;2-3</pub-id></mixed-citation></ref>
	<ref id="bib85"><mixed-citation publication-type="other"><string-name name-style="western"><surname>Zeldovich</surname>, <given-names>M.</given-names></string-name> (<year>1982</year>). <source><italic>Outcome measurement in Russian clinical praxis: Clinical outcome in routine evaluation – outcome measure (CORE-OM)</italic></source> [Doctoral dissertation, Universität Klagenfurt]. AAU Open-Access publications. <ext-link ext-link-type="uri" xlink:href="https://netlibrary.aau.at/obvuklhs/content/titleinfo/5370233">https://netlibrary.aau.at/obvuklhs/content/titleinfo/5370233</ext-link></mixed-citation></ref>
	</ref-list>
	<sec sec-type="data-availability" id="das"><title>Data Availability</title>
		<p>The code, data, and supplementary materials are available at <xref ref-type="bibr" rid="bib31.5">Graf et al. (2025)</xref>.</p>
	</sec>	
	
	
	<sec sec-type="supplementary-material" id="sp1"><title>Supplementary Materials</title>
		<p>For this article, the following Supplementary Materials are available:
			<list list-type="bullet">
				<list-item><p>R Code. (<xref ref-type="bibr" rid="bib31.5">Graf et al., 2025</xref>)</p></list-item>
				<list-item><p>Data. (<xref ref-type="bibr" rid="bib31.5">Graf et al., 2025</xref>)</p></list-item>
				
				<list-item><p>Study materials. (<xref ref-type="bibr" rid="bib31.5">Graf et al., 2025</xref>)</p></list-item>
			</list></p>
	</sec>
			

<fn-group>
<fn fn-type="financial-disclosure"><p>The authors have no funding to report.</p></fn>
</fn-group>
<fn-group>
<fn fn-type="conflict"><p>The authors have declared that no competing interests exist.</p></fn>
</fn-group>
<ack>
<p>The authors gratefully acknowledge the computational resources provided by the LiCCA HPC cluster of the University of Augsburg, co-funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 499211671.</p>
</ack>
	<notes>
		<title>Publisher Note</title>
		<p>This Corrected Version of Record (CVoR) differs from the original Version of Record (VoR), published on July 10, 2025, by correcting an error within the affiliations section. This correction was made on July 23, 2025.</p>
	</notes>
</back>
</article>
