This note sketches two computational shortcuts for estimating multidimensional item response theory (IRT) models (van der Linden & Hambleton, 1997). We focus on the estimation problem using the expectation-maximization (EM) algorithm with numerical integration. In particular, we consider a more efficient implementation of the expectation step (E-step) of the EM algorithm. The computational burden is quantified by counting the number of required operations. Additionally, computational gains observed in a pilot study are reported. Our primary goal is to propose two algorithmic shortcuts that avoid redundant operations. Similar approaches can be found for structural equation models (Pritikin, Hunter, von Oertzen, Brick, & Boker, 2017; von Oertzen & Brick, 2014).
This paper is structured as follows. First, the EM algorithm in IRT models is introduced, and its computational burden is analyzed. Second, we show how computational gains are realized by using so-called pseudo-items instead of the original items. Third, we suggest a computational shortcut for multidimensional IRT models whose items possess a simple loading structure (i.e., between-item dimensionality). Finally, we conclude with a discussion.
EM Algorithm in IRT Models
For simplicity, we focus on dichotomous item responses $x_{ni}$ in this paper, where $n = 1, \dots, N$ indexes persons and $i = 1, \dots, I$ indexes items. Let $x_n = (x_{n1}, \dots, x_{nI})$ denote the vector of observed data for person $n$. In the IRT model, it is assumed that the item response function (IRF) of item $i$ depends on item parameters $\gamma_i$, and the multidimensional distribution of the latent variable $\theta$ depends on some parameter $\sigma$. Let $p_i(x \mid \theta; \gamma_i)$ denote the item response probability $P(X_i = x \mid \theta)$ of item $i$ ($i = 1, \dots, I$). The total log-likelihood is then given as
1
$$l(\gamma, \sigma) = \sum_{n=1}^{N} \log \left( \int \left[ \prod_{i=1}^{I} p_i(x_{ni} \mid \theta; \gamma_i) \right] f_{\theta}(\theta; \sigma) \, d\theta \right),$$where $f_{\theta}$ denotes the density of the latent variable $\theta$. In the following, we approximate the integral in Equation 1 by a finite summation over a set of fixed quadrature points $\theta_t$ $(t = 1, \dots, T)$. The numerically approximated log-likelihood is then given as
2
$$l(\gamma, \sigma) = \sum_{n=1}^{N} \log \left( \sum_{t=1}^{T} \left[ \prod_{i=1}^{I} p_i(x_{ni} \mid \theta_t; \gamma_i) \right] w_{\theta}(\theta_t; \sigma) \right),$$where $w_{\theta}$ is a discrete density for $\theta$. The latent variable $\theta$ cannot be directly observed. For this reason, the EM algorithm (see Aitkin, 2016; Fu, 2019; Glas, 2016, for overviews) is often used for maximizing $l$ in Equation 2. The EM algorithm alternates between two steps (see Baker & Kim, 2004, for details). In the E-step, the expected value of $l$ in Equation 2 is computed by integrating $\theta$ out using individual posterior distributions. In the M-step, the expected log-likelihood is maximized based on the expected counts computed in the E-step. Note that the update of the item parameters $\gamma_i$ in the M-step amounts to a logistic regression model with case weights.
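As a concrete illustration, the quadrature-approximated log-likelihood in Equation 2 can be sketched in a few lines of Python. The two-parameter logistic IRF, the item parameter values, the toy responses, and the normal-shaped grid weights below are hypothetical choices for illustration only, not part of the original derivation.

```python
import math

def irf_2pl(theta, a, b):
    # Item response probability P(X = 1 | theta) of a two-parameter logistic item
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def approx_loglik(X, a, b, grid):
    # Equation 2: sum_n log( sum_t [ prod_i p_i(x_ni | theta_t) ] w(theta_t) )
    w = [math.exp(-0.5 * t * t) for t in grid]   # standard normal shape ...
    s = sum(w)
    w = [v / s for v in w]                       # ... renormalized to a discrete density
    ll = 0.0
    for x_n in X:
        marg = 0.0
        for t, wt in zip(grid, w):
            lik = 1.0
            for x, ai, bi in zip(x_n, a, b):
                p = irf_2pl(t, ai, bi)
                lik *= p if x == 1 else 1.0 - p
            marg += lik * wt
        ll += math.log(marg)
    return ll

grid = [-4.0 + 0.4 * t for t in range(21)]       # T = 21 fixed quadrature points
a, b = [1.0, 1.2, 0.8], [-0.5, 0.0, 0.5]         # hypothetical item parameters
X = [[1, 0, 1], [0, 0, 1]]                       # toy responses of N = 2 persons
print(approx_loglik(X, a, b, grid))
```

The triple loop over persons, grid points, and items makes the $NTI$ cost structure discussed below directly visible.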
In the E-step, individual posterior distributions $h_n(\theta_t \mid x_n)$ are computed as
3
$$h_n(\theta_t \mid x_n) = \frac{\left[ \prod_{i=1}^{I} p_i(x_{ni} \mid \theta_t; \gamma_i) \right] w_{\theta}(\theta_t; \sigma)}{\sum_{u=1}^{T} \left[ \prod_{i=1}^{I} p_i(x_{ni} \mid \theta_u; \gamma_i) \right] w_{\theta}(\theta_u; \sigma)}.$$For all items, expected counts are computed as
4
$$e_i(x \mid \theta_t) = \sum_{n=1}^{N} h_n(\theta_t \mid x_n) \, 1_{\{x_{ni} = x\}}, \quad x = 0, 1,$$where $1_{\{\cdot\}}$ denotes the indicator function.
We can now compute the total number of operations (multiplications or additions) needed in the E-step. The multiplications of probabilities in the numerator of the fraction in Equation 3 are typically replaced by additions of log-transformed probabilities. The following identity is used:
5
$$\left[ \prod_{i=1}^{I} p_i(x_{ni} \mid \theta_t; \gamma_i) \right] w_{\theta}(\theta_t; \sigma) = \exp \left\{ \left[ \sum_{i=1}^{I} \log p_i(x_{ni} \mid \theta_t; \gamma_i) \right] + \log w_{\theta}(\theta_t; \sigma) \right\}.$$Notably, additions are typically faster than multiplications. Moreover, calculations on the logarithmic metric prevent numerical underflow (see, e.g., Vermunt, 2003, for an application in multilevel latent class models). First, the computation of individual posterior distributions (see Equation 3) requires $NTI$ operations. In addition, $NT$ operations are needed for normalizing the individual posterior distributions. However, we ignore these operations in our calculations because we are particularly interested in situations in which $I$ is large, say larger than 50.
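The identity in Equation 5 can be sketched in Python as follows. The accumulation of log probabilities and the subsequent exponentiation and normalization to posteriors (Equation 3) are shown; the max-shift before exponentiation is an additional standard safeguard against underflow, and all numbers below are hypothetical.

```python
import math

def estep_posteriors(X, logP, logw):
    # logP[i][x][t] = log p_i(x | theta_t); logw[t] = log w(theta_t)
    H = []
    for x_n in X:
        # N*T*I additions: accumulate log item probabilities per grid point (Equation 5)
        u = list(logw)
        for i, x in enumerate(x_n):
            for t in range(len(u)):
                u[t] += logP[i][x][t]
        m = max(u)                        # max-shift guards against numerical underflow
        e = [math.exp(v - m) for v in u]
        s = sum(e)
        H.append([v / s for v in e])      # normalized posterior h_n(theta_t | x_n)
    return H

# tiny illustration: 2 items, T = 3 grid points, uniform prior (hypothetical numbers)
p1 = [[0.2, 0.5, 0.8], [0.3, 0.5, 0.7]]           # P(X_i = 1 | theta_t)
logP = [[[math.log(1.0 - v) for v in row],
         [math.log(v) for v in row]] for row in p1]
logw = [math.log(1.0 / 3.0)] * 3
H = estep_posteriors([[1, 1], [0, 0]], logP, logw)
print(H[0])   # posterior mass shifts toward the highest grid point for x = (1, 1)
```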
Second, $NTI$ operations are required for computing the expected counts in Equation 4. Note that in software implementations, the multiplication with the indicator function in Equation 4 does not have to be carried out explicitly because one only has to add posterior probabilities for subjects $n$ with $x_{ni} = x$. In practice, an index vector is used for storing the necessary information. In total, the number of operations $O_1$ needed in the E-step is
6
$$O_1 = 2NTI.$$
Computationally Efficient EM Algorithms
One strategy for improving computational efficiency is to use a parallelized version of the EM algorithm (López-de-Teruel, García, & Acacio, 1999). This approach distributes the computational work in the E-step and the M-step among multiple cores. There are several applications of this strategy in mixture modeling (Chen, Ostrouchov, Pugmire, Prabhat, & Wehner, 2013; Kwedlo, 2014; Lee, Leemaqz, & McLachlan, 2016; Lee, McLachlan, & Leemaqz, 2020; Yin, Zhang, & Gao, 2018). von Davier (2016) proposed a parallel EM algorithm for multidimensional IRT models and found considerable performance gains, in particular for large datasets (see also Khorramdel, Shin, & von Davier, 2019).
A different strand of literature relies on simplifying the model structure of multidimensional IRT models (see Rijmen, Jeon, von Davier, & Rabe-Hesketh, 2014, for an overview). In this case, the computation of posterior distributions can be simplified if some trait distributions are (conditionally) independent (Rijmen, 2009). Examples include the bifactor model (a.k.a. the testlet model; Gibbons & Hedeker, 1992) and the two-tier model (Cai, 2010).
Usage of Pseudo-Items
Pseudo-Items
In the following, the concept of pseudo-items is introduced. The main idea is to perform the computationally intensive likelihood computation on pseudo-items instead of the original items in order to reduce computation time. A pseudo-item is composed of $k$ original polytomous items. Assume that $I = I_k k$ for integers $I_k$ and $k$, so that there are $I_k$ pseudo-items. The number $k$ is referred to as the size of the pseudo-items. More formally, let $(x_1, \dots, x_k)$ denote a vector of $k$ polytomous items, each possessing $H$ categories. A pseudo-item $y_j$ is then defined as
7
$$y_j = \sum_{i=1}^{k} H^{i-1} x_{i + k(j-1)}, \quad j = 1, \dots, I_k.$$The pseudo-item $y_j$ takes the $H^k$ integer values $0, 1, \dots, H^k - 1$.
In the sequel, we only consider the case of dichotomous items (i.e., $H = 2$). For example, $k = 3$ original dichotomous items are recoded into a pseudo-item $y$ that takes integer values $0, 1, \dots, 7$. The original data consisting of $I$ dichotomous item responses $x_n = (x_{n1}, \dots, x_{nI})$ for subjects $n = 1, \dots, N$ are recoded into data $y_n = (y_{n1}, \dots, y_{nI_k})$ consisting of polytomous pseudo-items.
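A minimal Python sketch of the recoding in Equation 7 for dichotomous items ($H = 2$); grouping $k$ consecutive items into one pseudo-item is one possible convention, assumed here for illustration:

```python
def recode_pseudo_items(x_n, k):
    # Equation 7 with H = 2: bit-pack k consecutive dichotomous responses
    # into one integer y_j in 0, ..., 2^k - 1 (assumes len(x_n) % k == 0)
    I = len(x_n)
    assert I % k == 0
    y = []
    for j in range(I // k):
        block = x_n[j * k:(j + 1) * k]
        y.append(sum(x << i for i, x in enumerate(block)))  # sum_i 2^(i-1) x_i
    return y

print(recode_pseudo_items([1, 0, 1, 0, 1, 1], 3))  # → [5, 6]
```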
Using Pseudo-Items in the EM Algorithm
A computationally more efficient EM algorithm is obtained by evaluating the posterior distributions based on these pseudo-items. Instead of the $I$ original items, only $I_k$ pseudo-items enter the posterior, and the number of multiplications needed in the posterior is reduced. However, $I_k 2^k k T = 2^k I T$ additional operations are needed for precomputing the pseudo-item probabilities. Due to local independence, the item response probabilities of pseudo-item $y_j$ are given as
8
$$\tilde{p}_j(y \mid \theta; \gamma) = \prod_{i=1}^{k} p_{i + k(j-1)}(x_i \mid \theta) \quad \text{with} \quad y = \sum_{i=1}^{k} 2^{i-1} x_i.$$The computation of individual posterior distributions is now given as (insert Equation 8 into Equation 3)
9
$$h_n(\theta_t \mid x_n) = h_n(\theta_t \mid y_n) = \frac{\left[ \prod_{j=1}^{I_k} \tilde{p}_j(y_{nj} \mid \theta_t; \gamma) \right] w_{\theta}(\theta_t; \sigma)}{\sum_{u=1}^{T} \left[ \prod_{j=1}^{I_k} \tilde{p}_j(y_{nj} \mid \theta_u; \gamma) \right] w_{\theta}(\theta_u; \sigma)}.$$The number of operations needed for the posterior computation in Equation 9 is $NTI_k = NTI/k$.
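The precomputation of the pseudo-item probabilities in Equation 8 can be sketched as follows; the nested-list layout `P[i][t]` and the toy probabilities are illustrative assumptions:

```python
def pseudo_item_probs(P, k):
    # Equation 8: tilde_p_j(y | theta_t) as a product of k item probabilities,
    # precomputed for all y = 0, ..., 2^k - 1 (2^k * I * T operations in total).
    # P[i][t] = P(X_i = 1 | theta_t) for the original dichotomous items.
    T = len(P[0])
    Ptilde = []
    for j in range(len(P) // k):
        block = P[j * k:(j + 1) * k]
        probs_y = []
        for y in range(2 ** k):
            bits = [(y >> i) & 1 for i in range(k)]  # x_i = i-th bit of y
            row = []
            for t in range(T):
                p = 1.0
                for x, item in zip(bits, block):
                    p *= item[t] if x == 1 else 1.0 - item[t]
                row.append(p)
            probs_y.append(row)
        Ptilde.append(probs_y)
    return Ptilde

P = [[0.2, 0.6], [0.4, 0.5], [0.7, 0.3], [0.5, 0.5]]   # 4 items, T = 2 grid points
Pt = pseudo_item_probs(P, 2)                           # 2 pseudo-items, 4 categories each
print(sum(Pt[0][y][0] for y in range(4)))              # category probabilities sum to 1
```

These tables are computed once per EM iteration; the posterior loop then touches only $I_k$ pseudo-items per person and grid point.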
Expected counts can now be computed for the pseudo-items $y_j$:
10
$$\tilde{e}_j(y \mid \theta_t) = \sum_{n=1}^{N} h_n(\theta_t \mid y_n) \, 1_{\{y_{nj} = y\}}, \quad y = 0, 1, \dots, 2^k - 1.$$Therefore, for each pseudo-item $j$ and each trait grid point $\theta_t$, $2^k$ expected counts are computed. Based on these counts $\tilde{e}_j$, the expected counts for item $i$ can be obtained by summing over the appropriate counts of the pseudo-items. The total number of operations needed for computing the expected counts is then $NTI/k + 2^k IT$. Hence, the total number of operations $O_2(k)$ for computing individual posterior distributions and expected counts based on the $I_k$ pseudo-items is given as
11
$$O_2(k) = 2NTI/k + 2^k IT = IT(2N/k + 2^k).$$We can now define the computational gain $G(N, k)$ that describes the factor by which computation time decreases when pseudo-items are used:
12
$$G(N, k) = \frac{O_1}{O_2(k)} = \frac{N}{N/k + 2^{k-1}} = k \, \frac{1}{1 + k \, 2^{k-1}/N}.$$Hence, for a fixed pseudo-item size $k$, the maximum gain $G(N, k)$ is $k$. It should be emphasized that the gain in the E-step depends neither on the number of items $I$ nor on the number of integration points $T$.
In Figure 1, the computational gain is shown as a function of the pseudo-item size $k$ for a sample size of $N = 1000$. It can be seen that there is a maximum at $k = 6$ with a corresponding gain of about 5. This means that pseudo-items defined by grouping 6 original items result in the highest efficiency gain. Notably, choosing $k$ too large results in a very inefficient computation.
Figure 1
Due to $\lim_{k \to \infty} (N/k + 2^{k-1}) = \infty$, it holds that $\lim_{k \to \infty} G(N, k) = 0$. Hence, the size of the pseudo-items has to be chosen carefully, and too large a value of $k$ can lead to an inefficient algorithm. Also, note that for fixed $k$, it holds that $\lim_{N \to \infty} G(N, k) = k$.
Computation of an Optimal Pseudo-Item Size
We now determine the optimal pseudo-item size $k_0$ that maximizes the gain $G(N, k)$ defined in Equation 12. For simplicity, the pseudo-item size $k$ is now regarded as a continuous variable. Define $g(k) = N k^{-1} + 2^{k-1}$. For a fixed sample size $N$, a maximum of $G(N, k)$ corresponds to a minimum of $g(k)$. The derivative of $g$ is given by $g'(k) = -N k^{-2} + \log(2) \, 2^{k-1}$. Hence, the optimal value $k_0$ fulfills
13
$$k^2 2^{k-1} = \frac{N}{\log(2)} \quad \text{or} \quad k \, 2^{k-1} = \frac{N}{\log(2) \, k}.$$Inserting Equation 13 into Equation 12 provides the maximum gain $G_0$ that can be achieved:
14
$$G_0 = G(N, k_0) = k_0 \frac{N}{N + k_0 2^{k_0 - 1}} = k_0 \frac{N}{N + \frac{N}{\log(2) k_0}} = k_0 \frac{1}{1 + 1/(\log(2) k_0)}.$$By taking the logarithm in Equation 13, it can be seen that the optimal $k_0$ fulfills
15
$$2 \log k + \log(2) \, k = \log N + \log(2) - \log(\log(2)).$$Because the term $k$ increases faster than the term $\log k$, the optimal $k_0$ is primarily a function of $\log N$; that is, $k_0 = O(\log(N))$.
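Because Equation 13 has no closed-form solution, the continuous optimum $k_0$ can be found numerically, for example by bisection; the search interval below is an arbitrary assumption that covers all sample sizes considered in this note:

```python
import math

def optimal_k(N):
    # Solve k^2 * 2^(k - 1) = N / log(2) (Equation 13) by bisection;
    # the left-hand side is increasing in k, so the root is unique
    target = N / math.log(2)
    lo, hi = 1.0, 64.0                 # assumed search interval
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if mid ** 2 * 2.0 ** (mid - 1.0) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def max_gain(N):
    # Maximum gain G_0 from Equation 14
    k0 = optimal_k(N)
    return k0 / (1.0 + 1.0 / (math.log(2) * k0))

print(optimal_k(1000), max_gain(1000))   # continuous optimum near k = 6, gain near 5
```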
Analytical Illustration
We now illustrate the computational gains of the more efficient E-step implementation for sample sizes ranging between $N = 100$ and $N = 10\,000\,000$ subjects. In Table 1, the computational gain of using pseudo-items in the E-step is shown as a function of the sample size $N$ using Equation 12. It can be seen that the maximum computational gain $G_0$ increases with increasing sample size. The increases in computational efficiency are nonnegligible, at least for international large-scale assessment studies like PISA (OECD, 2017) that involve student samples of about half a million.
Table 1
Sample size     k_max    G_0
100                 4    3.0
1 000               6    5.0
10 000              9    7.3
100 000            11    9.9
1 000 000          14   12.6
10 000 000         17   15.3
Note. $k_{max}$ = pseudo-item size that results in the maximum computational gain; $G_0$ = maximum computational gain.
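The entries of Table 1 can be reproduced by evaluating Equation 12 on a grid of integer pseudo-item sizes; a minimal sketch:

```python
def gain(N, k):
    # Equation 12: computational gain of the pseudo-item E-step
    return k / (1.0 + k * 2 ** (k - 1) / N)

def best_integer_k(N, k_range=range(1, 31)):
    # integer pseudo-item size maximizing the gain, as reported in Table 1
    return max(k_range, key=lambda k: gain(N, k))

for N in (100, 1000, 10000, 100000):
    k = best_integer_k(N)
    print(N, k, round(gain(N, k), 1))   # reproduces the first rows of Table 1
```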
Computational Illustration
We now demonstrate how the proposed shortcut using pseudo-items performs in an experimental implementation using the Rcpp package (Eddelbuettel, 2013; Eddelbuettel & François, 2011) in the R software (R Core Team, 2020). The E-step consists of three computational parts. First, the logarithms of the item response probabilities are added to the logarithm of the prior distribution, which yields the unnormalized likelihood on the logarithmic scale. Second, the $\exp$ operation is applied, and the likelihood contributions are normalized for each subject to obtain the individual posterior distributions. Third, the expected counts are computed based on the individual posterior distributions.
The computational gains were investigated using sample sizes $N = 100$, 1000, 10000, and 100000. The number of items was fixed at 50, 100, or 200. In the EM algorithm, the number of quadrature points was fixed at 21. The R package microbenchmark (Mersmann, Beleites, Hurling, Friedman, & Ulrich, 2019) was used for timing comparisons. In each condition, 500 replications were used.
In Table 2, computational gains are displayed for different pseudo-item sizes. It can be seen that the efficiency gains were only about 2 or 3 when the complete E-step consisting of all three parts entered the comparison. Notably, the exponentiation step and the normalization of the likelihood were not quantified in our analytically derived computational gains. Interestingly, it turned out that the predicted computational gains were realized for the computation of the unnormalized likelihood (without taking $\exp$; first part) and the computation of expected counts (third part). The second part (exponentiation and normalization of the likelihood terms) is unrelated to the use of pseudo-items. When only the first part of the E-step is considered, computational gains range between 3 and 10, meaning that this part is three to ten times faster than a conventional implementation. For the third part, similar gains were obtained. Very likely, this finding demonstrates that applying the exponential function is much more computationally demanding than an addition or multiplication.
By comparing the computational gains for the computation of the unnormalized likelihood with the prediction of Equation 12 in Table 2, we observe a moderate agreement of the maximum computational gains for sample sizes of at least 1 000. Differences between the timings obtained from implementations and the analytical predictions could also emerge from the computation time required for memory allocation or for accessing appropriate matrix entries.
Table 2

                              Pseudo-item size k
Sample size: No. of items     1    2    3    4    5    6    7    8    9   10

Complete E-step
100:      50                  1  0.5  0.3  0.3  0.4  0.3  0.2  0.2  0.1  0.1
100:     100                  1  0.7  0.5  0.6  0.5  0.3  0.2  0.2  0.1  0.1
100:     200                  1  0.9  0.7  0.8  0.6  0.4  0.3  0.2  0.1  0.0
1 000:    50                  1  1.3  1.2  1.2  1.4  1.1  0.9  0.8  0.6  0.5
1 000:   100                  1  1.6  1.8  1.9  2.0  1.6  1.4  1.1  0.7  0.4
1 000:   200                  1  1.6  1.9  2.3  1.9  2.0  1.7  1.2  0.8  0.4
10 000:   50                  1  1.6  1.9  2.0  2.1  2.0  1.6  1.6  1.5  1.4
10 000:  100                  1  1.8  2.4  2.7  2.8  2.8  2.7  2.3  1.8  1.6
10 000:  200                  1  1.7  2.3  2.8  2.5  3.0  3.1  2.6  2.1  1.4
100 000:  50                  1  1.6  1.9  2.1  2.2  2.2  1.7  1.9  1.7  1.8
100 000: 100                  1  1.8  2.3  2.7  2.8  2.8  2.8  2.5  2.0  2.1
100 000: 200                  1  1.7  2.4  2.8  2.5  3.2  3.3  2.9  2.5  2.0

Unnormalized likelihood without $\exp$ operation
100:      50                  1  0.6  0.4  0.4  0.5  0.3  0.3  0.2  0.2  0.1
100:     100                  1  0.9  0.6  0.8  0.7  0.4  0.3  0.2  0.2  0.1
100:     200                  1  1.3  0.9  1.1  1.0  0.5  0.4  0.3  0.2  0.1
1 000:    50                  1  2.7  2.3  2.0  3.0  2.1  1.9  1.6  1.2  0.9
1 000:   100                  1  3.2  3.5  4.3  4.3  3.0  2.5  2.1  1.5  0.8
1 000:   200                  1  3.0  3.7  4.4  4.9  3.8  3.0  2.7  1.6  0.7
10 000:   50                  1  4.3  5.5  6.5  7.3  7.1  6.3  6.2  5.3  5.2
10 000:  100                  1  3.9  6.1  7.6  9.1  8.0  8.1  7.4  6.6  4.9
10 000:  200                  1  3.6  5.3  6.9  8.4  7.9  8.1  8.1  6.9  3.7
100 000:  50                  1  4.3  5.9  7.1  8.3  8.8  8.0  7.8  8.1  8.4
100 000: 100                  1  3.8  5.9  7.8  9.1  9.5  8.7  9.2  8.7  9.2
100 000: 200                  1  3.6  5.2  7.0  8.4  8.6  9.0  9.8  9.5  6.0

Computational gain predicted from Equation 12
100                           1  1.9  2.7  3.0  2.8  2.1  1.3  0.7  0.4  0.2
1 000                         1  2.0  3.0  3.9  4.6  5.0  4.8  4.0  2.7  1.6
10 000                        1  2.0  3.0  4.0  5.0  5.9  6.7  7.3  7.3  6.6
100 000                       1  2.0  3.0  4.0  5.0  6.0  7.0  7.9  8.8  9.5

Note. The largest entry in each row corresponds to the optimal computational gain. Computational gains computed by Equation 12 do not involve the number of items.
We also investigated computational gains using 61 quadrature points, which is the default for unidimensional IRT models in the popular R package mirt (Chalmers, 2012). The findings for computational gains were similar, albeit not identical (i.e., gains varied by at most 1 in each condition).
It has to be noted that the optimal pseudo-item size from Equation 12 does not coincide with the empirically obtained optimal pseudo-item size. While the computational gains in the computation of the unnormalized likelihood approximately match the analytical predictions (see Table 2), the optimal pseudo-item size turned out to be smaller in the practical implementation. Depending on the specific software implementation, it is therefore recommended to check potential computational gains as a function of the sample size, the number of items, and the number of quadrature points in order to determine the optimal pseudo-item size.
Overall, it has to be admitted that the proposed shortcut might not result in considerable gains for the complete E-step of the EM algorithm. However, gains of a similar magnitude would require three or four cores in a parallel computing approach.
Dimension-Wise Likelihood Computation in Multidimensional IRT Models With Between-Item Dimensionality
Assume now that in a multidimensional IRT model, each item loads on exactly one $\theta$ dimension. This property is labeled between-item dimensionality (Adams, Wilson, & Wang, 1997). It is assumed that for each dimension $d$ $(d = 1, \dots, D)$, there are $I_d$ items. In the case of between-item dimensionality, the item response probability depends on only one dimension. Hence, the total likelihood contribution of subject $n$ can be factored as
16
$$\prod_{d=1}^{D} \prod_{i=1}^{I_d} p_i(x_{ndi} \mid \theta; \gamma_{di}) = \prod_{d=1}^{D} \prod_{i=1}^{I_d} p_i(x_{ndi} \mid \theta_d; \gamma_{di}) = \prod_{d=1}^{D} L_{nd}(\theta_d; x_{nd}),$$where $x_{ndi}$ denotes the item response of subject $n$ for the $i$th item on dimension $d$. From Equation 16, it is evident that the likelihood part of the individual posterior distributions can be computed independently for all dimensions. In a multidimensional IRT model, the same $\theta$ grid values are often used across dimensions. For example, let $\mathcal{Q} = \{-4.0, -3.6, \dots, 4.0\}$ be the grid of fixed quadrature points for each dimension; the multidimensional $\theta$ grid is then given by $\mathcal{Q}^D$. With $Q = |\mathcal{Q}|$, $T = Q^D$ grid values $\theta_t$ are defined, and each component of these vectors takes values in $\mathcal{Q}$. Using Equation 6,
17
$$O_1 = 2NQ^D I$$operations are needed for computing the posterior distributions. It turns out that in the computation of the posterior $h_n(\theta_t \mid x_n) \propto L_n(\theta_t; x_n)$, the product term does not have to be calculated across all dimensions. Instead, one can rely on the factorization
18
$$L_n(\theta_t; x_n) = \prod_{d=1}^{D} L_{nd}(\theta_{td}; x_{nd}).$$Due to between-item dimensionality, the expected counts for an item $i$ only have to be evaluated on one dimension $d$. Hence, one can integrate over all dimensions except the $d$th dimension. Using the notation $\theta_t = (\theta_{td}, \theta_{t(-d)})$, one can compute dimension-wise individual marginal posterior distributions $h_{nd}^{*}$:
19
$$h_{nd}^{*}(\theta_{td} \mid x_n) = \sum_{(-d)} h_n(\theta_{td}, \theta_{t(-d)} \mid x_n),$$where $\sum_{(-d)}$ denotes a summation over all dimensions except $d$. Based on these marginal distributions, the computational demand of the expected counts is substantially reduced because, for each item, one only has to sum over individual probabilities related to exactly one dimension. By employing these computational shortcuts, the number of required operations can be reduced to
20
$$O_2 = 2NQ \sum_{d=1}^{D} I_d + (D+1) N Q^D = 2NQI + (D+1) N Q^D.$$This less demanding computation provides a computational gain of
21
$$G = \frac{O_1}{O_2} = \frac{2NIQ^D}{2NIQ + (D+1) N Q^D} = Q^{D-1} \frac{1}{1 + (D+1) Q^{D-1} / (2I)}.$$For a large number of items $I$, the computational gain approaches $Q^{D-1}$. Note that the computational gain is independent of the sample size $N$.
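For $D = 2$ dimensions, the factorization in Equation 18 and the marginalization in Equation 19 can be sketched as follows; the list-of-lists layout and the toy numbers are illustrative assumptions:

```python
def dimensionwise_posterior(L1, L2, w):
    # D = 2: build the posterior on the Q x Q product grid from the
    # dimension-wise likelihoods L1[t], L2[u] (Equation 18), then marginalize
    # to the dimension-wise posteriors h*_n1, h*_n2 (Equation 19)
    Q = len(L1)
    post = [[L1[t] * L2[u] * w[t][u] for u in range(Q)] for t in range(Q)]
    s = sum(sum(row) for row in post)
    post = [[v / s for v in row] for row in post]
    h1 = [sum(post[t]) for t in range(Q)]
    h2 = [sum(post[t][u] for t in range(Q)) for u in range(Q)]
    return post, h1, h2

def gain_dw(Q, D, I):
    # Equation 21: gain of the dimension-wise shortcut
    return Q ** (D - 1) / (1.0 + (D + 1) * Q ** (D - 1) / (2.0 * I))

# toy example with Q = 2 grid points per dimension and a uniform prior weight
post, h1, h2 = dimensionwise_posterior([0.2, 0.8], [0.5, 0.5],
                                       [[0.25, 0.25], [0.25, 0.25]])
print(h1)                            # marginal posterior of the first dimension
print(round(gain_dw(15, 2, 50), 1))  # → 10.3, the DW gain for Q = 15, D = 2, I = 50
```

The key point is that `L1` and `L2` are each computed on only $Q$ grid points; the full $Q \times Q$ grid is touched only when combining them.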
Combination of Pseudo-Items and Dimension-Wise Likelihood Computation
Next, we combine the usage of pseudo-items with the dimension-wise likelihood computation. For simplicity, we assume an equal number of items per dimension, that is, $I_d = I/D$. Then, using the same computations as in Equation 11, we obtain
22
$$O_2(k) = 2NQI/k + (D+1) N Q^D + 2^k I Q.$$The computational gain can be computed as
23
$$G(N, k) = \frac{O_1}{O_2(k)} = \frac{2NQ^D I}{2NQI/k + (D+1) N Q^D + 2^k I Q} = k \, Q^{D-1} \frac{1}{1 + k(D+1) Q^{D-1}/(2I) + k \, 2^{k-1}/N}.$$In the minimization of $O_2$ with respect to $k$, the same optimal value $k_0$ is obtained as in Equation 13. Inserting Equation 13 into Equation 23 leads to an optimal gain of
24
$$G_0 = G(N, k_0) = k_0 Q^{D-1} \frac{1}{1 + k_0 (D+1) Q^{D-1}/(2I) + 1/(\log(2) k_0)}.$$
Analytical Illustration
We now illustrate the computational gains obtained by applying our proposed computational shortcuts. We consider a multidimensional IRT model with between-item dimensionality with $I = 50$ items and $Q = 15$ fixed quadrature points per dimension. Sample sizes are varied between $N = 100$ and $N = 10\,000\,000$, and the number of dimensions $D$ is chosen as 2, 3, or 4. Two computational shortcuts are compared. First, we use a strategy that relies only on the dimension-wise computation of the likelihood (Method DW). Second, we combine the dimension-wise computation with the pseudo-item approach, selecting the optimal pseudo-item size (Method DP).
In Table 3, the computational gains of the two methods based on Equation 23 are shown as a function of the sample size and the number of dimensions. It can be seen that the gain of the dimension-wise method (Method DW) is independent of the sample size but is especially substantial for larger numbers of dimensions. For $D = 2$ dimensions, the additional use of pseudo-items (Method DP) provides further relevant computational gains, while using pseudo-items for more dimensions ($D = 3$ or $D = 4$) is not as effective. Given that the E-step with numerical integration is particularly computationally demanding for multidimensional IRT models, we think that our proposed computational shortcuts could be beneficial for multidimensional models with between-item dimensionality. However, it remains to be shown that the proposed shortcuts for multidimensional IRT models also yield comparable computational gains in software implementations.
Table 3

                 D = 2          D = 3          D = 4
Sample size     DW    DP       DW    DP       DW    DP
100            10.3  19.2     22.5  24.1     19.9  20.0
1 000          10.3  23.1     22.5  24.5     19.9  20.0
10 000         10.3  25.6     22.5  24.6     19.9  20.0
100 000        10.3  27.2     22.5  24.7     19.9  20.0
1 000 000      10.3  28.3     22.5  24.8     19.9  20.0
10 000 000     10.3  29.1     22.5  24.8     19.9  20.0
Note. $D$ = number of dimensions; DW = computational gain by using the dimension-wise computation; DP = computational gain by using the dimension-wise computation together with the optimal pseudo-item size.
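The DP columns of Table 3 can be reproduced from Equation 23 by maximizing over integer pseudo-item sizes; a minimal sketch using the setting of this illustration ($Q = 15$, $I = 50$):

```python
def gain_combined(N, k, Q, D, I):
    # Equation 23: gain of pseudo-items combined with dimension-wise computation
    return (k * Q ** (D - 1)
            / (1.0 + k * (D + 1) * Q ** (D - 1) / (2.0 * I) + k * 2 ** (k - 1) / N))

def best_dp_gain(N, Q=15, D=2, I=50):
    # maximum over integer pseudo-item sizes, as in the DP columns of Table 3
    return max(gain_combined(N, k, Q, D, I) for k in range(1, 31))

for N in (100, 1000, 10000):
    print(N, round(best_dp_gain(N), 1))   # → 19.2, 23.1, 25.6, as in Table 3 (D = 2)
```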
Discussion
This note shows that, by using two computational shortcuts, the number of operations required in the E-step for unidimensional or multidimensional models can be substantially reduced. First, the evaluation of individual posterior distributions can be implemented more efficiently by using pseudo-items that are composed of a set of original items. When pursuing this strategy, additional item response probabilities, evaluated on the $\theta$ grid, have to be computed for these pseudo-items. However, as a consequence, fewer multiplications have to be conducted in the computation of individual posterior distributions. It was shown that the optimal number of original items that are grouped into pseudo-items is a function of the sample size. Second, for multidimensional IRT models with between-item dimensionality, EM algorithms that rely on numerical integration can be implemented more efficiently if the joint individual likelihood function is factored as a product of dimension-wise likelihood functions. This strategy avoids multiplying likelihood contributions on the full multidimensional grid of trait values. The use of pseudo-items can additionally be applied to multidimensional IRT models, which results in further computational gains.
In our evaluation, we only considered the case of dichotomous item responses. In principle, the same strategy can be applied to datasets containing polytomous item responses. However, using pseudo-items for polytomous items with many categories will typically not result in computational gains as substantial as those for dichotomous items because a larger number of probabilities has to be precomputed. The handling of ignorable missing item responses is relatively simple: one can treat them as a special case of polytomous item responses and define the response probability of the corresponding missing category as one. In this strategy, missing item responses are essentially omitted from the computation. For planned missingness designs (Frey, Hartig, & Rupp, 2009; Rhemtulla & Little, 2012), more thoughtful approaches that skip non-administered items in the computations seem reasonable. Moreover, the computational gains obtained by using pseudo-items directly transfer to latent class and mixture models.
Our proposed computational shortcuts can be combined with parallel computing (von Davier, 2016) or with acceleration techniques for the EM algorithm (Beisemann, Wartlick, & Doebler, 2020). This would lead to additional computational gains, which are particularly important for applications involving large numbers of items. This contribution is primarily meant as a theoretical analysis of the number of required computational operations. The preliminary evaluation of the performance gains of implementations in statistical software resulted in slightly smaller computational gains in practice. However, these gains are nevertheless noticeable.