
Reduction of graphs is a class of procedures used to decrease the dimensionality of a given graph, in which the properties of the reduced graph are to be induced from the properties of the larger original graph. This paper introduces a new method for reducing chain graphs to simpler directed acyclic graphs (DAGs), which we call power chain graphs (PCGs), as well as a procedure for learning the structure of this new type of graph from the correlational data of a Gaussian graphical model. A definition for PCGs is given, directly followed by the reduction method. The structure learning procedure is a two-step approach: first, the correlation matrix is used to cluster the variables; then, the averaged correlation matrix is used to discover the DAGs using the PC-stable algorithm. Simulation results are provided to illustrate the theoretical proposal, and they provide initial evidence for the validity of our procedure in recovering the structure of power chain graphs. The paper ends with a discussion of suggestions for future studies, as well as some practical implications.

Probabilistic graphical models (

If

A similar approach is also used for contraction of directed graphs (i.e., graphs where all edges in

A special type of graph known as chain graph (

where

The DAGs and the DG on the bottom are commonly used to interpret, or statistically model, the undirected edge on the CG. However, none of these graphs results in the same conditional distribution, as the joint densities of DAG1, DAG2, and the DG are, respectively, represented by

where

In the literature on CGs, when the structure is high dimensional, chain graphs can be divided into a path of blocks of variables, known as dependence chains (

The first two, AMP and LWF CGs, are characterized by having directed and undirected edges, but differ on how they establish independence relations between parents of vertices that are linked by an undirected edge. MVR CGs, on the other hand, use directed edges together with bidirectional edges. MVR CGs have more in common with AMP CGs, as each node depends only on its own parents according to both interpretations. Both interpretations can also be modelled with linear equations, with the main difference between MVR CGs and AMP CGs being the modelling of noise (

Regardless of the specific CG interpretation, if all nodes in a subgraph (i.e., a group of variables) have directed relations with all nodes of another subgraph, the nodes within each subgraph can be considered part of the same block, even if the block represents a disconnected subgraph (i.e., a graph in which not all nodes are directly connected). On the other hand, if the blocks are defined by clusters of variables, and if it is possible to test or assume the disjointness condition (i.e., that it is possible to define the direction of the relations between blocks), then we have a special type of CG which can be reduced to a PCG, as illustrated in

The undirected part of the chain graphs can be reduced by graph minor operations, while the directed part can be reduced by the contraction of an undirected equivalent of the strongly connected component to achieve a power chain graph (PCG). The PCG can be defined as the graph

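As a concrete sketch of the contraction just described (this is illustrative code, not the authors' implementation; the node names and `cluster_of` mapping are toy examples):

```python
def contract_to_pcg(directed_edges, cluster_of):
    """Contract a chain graph into a power chain graph (PCG):
    each cluster of nodes becomes a single power node, undirected
    within-cluster edges vanish, and directed edges between clusters
    become directed power edges."""
    power_edges = set()
    for u, v in directed_edges:                 # each directed edge u -> v
        cu, cv = cluster_of[u], cluster_of[v]
        if cu != cv:                            # within-cluster edges drop out
            power_edges.add((cu, cv))
    return power_edges

# toy example: {a, b} form power node 0, {c, d} form power node 1
cluster_of = {"a": 0, "b": 0, "c": 1, "d": 1}
directed = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")]
pcg = contract_to_pcg(directed, cluster_of)     # single power edge 0 -> 1
```

All four directed edges between the two clusters collapse into one power edge, which is the dimensionality reduction the PCG aims for.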

In the case of CGs, it has been shown that the structure of CGs, just like DAGs, can be, under the correct circumstances and assumptions, recovered from observational data (

If the CG is given or assumed, then reducing it to a PCG requires only three steps. First, one must define or choose a measurement function for assessing the conditional dependencies between the nodes. Then, the strongly connected undirected subgraphs are theoretically defined or empirically estimated from data. In the final step, a method for ensembling the weights of the directed edges of similar nodes in

We propose a structure learning procedure for PCGs where the dependencies between variables are assumed to be linear and, therefore, the variables are assumed to follow a multivariate normal distribution. Gaussian graphical models (GGMs;

The first step of our procedure for learning the structure of PCGs is the identification of the groups of similar nodes by some data-driven procedure. In a chain graph that can be represented by a GGM, any node v_a is more similar to any node v_b than to any node v_c if and only if the vectors of dependencies ρ_a and ρ_b are closer to each other than ρ_a and ρ_c. The vector of dependencies ρ_a has as its elements the dependencies involving the v_a node. The vectors ρ_b and ρ_c are interpreted similarly to ρ_a, with respect to v_b and v_c. With GGMs, the Pearson correlation coefficient and the regularized partial correlation are the most commonly used measures of dependency between nodes (
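Under this definition, similarity can be checked directly from a correlation matrix. In the sketch below, `R` is a correlation matrix, nodes are row indices, and Euclidean distance is one illustrative choice for comparing dependency vectors (the paper does not fix a specific distance here):

```python
import numpy as np

def more_similar(R, a, b, c):
    """True if node a is more similar to node b than to node c:
    the dependency vectors of a and b (their correlations with all
    other nodes) are closer than the dependency vectors of a and c."""
    # keep only columns for nodes other than a, b, c so vectors align
    keep = [i for i in range(R.shape[0]) if i not in (a, b, c)]
    ra, rb, rc = R[a, keep], R[b, keep], R[c, keep]
    return np.linalg.norm(ra - rb) < np.linalg.norm(ra - rc)

# nodes 0, 1, 3 share strong dependencies; node 2 is weakly connected
R = np.array([[1.0, 0.8, 0.1, 0.7],
              [0.8, 1.0, 0.1, 0.7],
              [0.1, 0.1, 1.0, 0.1],
              [0.7, 0.7, 0.1, 1.0]])
```

Here `more_similar(R, 0, 1, 2)` holds, since nodes 0 and 1 relate to the rest of the graph in the same way while node 2 does not.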

In the second step of our structure learning procedure, after finding the clusters of variables, the directions of the relations between power nodes can be estimated using causal discovery algorithms (
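PC-stable decides the presence of edges through conditional independence tests; for Gaussian data these typically use partial correlations together with the Fisher z transform. A minimal sketch of that building block (not the pcalg/bnlearn implementation; argument names are illustrative):

```python
from math import erf, log, sqrt

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y given z, from the
    three pairwise correlations (standard recursion formula)."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def fisher_z_pvalue(r, n, cond_size=0):
    """Two-sided p-value for H0: (partial) correlation is zero, using
    the Fisher z transform; n is the sample size, cond_size the size
    of the conditioning set."""
    z = 0.5 * log((1 + r) / (1 - r))
    stat = sqrt(n - cond_size - 3) * abs(z)
    return 2 * (1 - 0.5 * (1 + erf(stat / sqrt(2))))

# chain x -> z -> y with link correlations 0.6 and 0.6: corr(x, y) is
# their product (0.36), so the partial correlation of x and y given z
# vanishes, which is exactly what a CI test should detect
r_xy_given_z = partial_corr(0.36, 0.6, 0.6)
```

The PC-style algorithms repeat this test over growing conditioning sets to delete edges and then orient the remaining ones.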

| Step | Assumption | Description |
|---|---|---|
| Clustering and Contraction | Multivariate Gaussianity of variables | Variables in the dataset can be correctly described by a multivariate normal distribution. This assumption is necessary so the data can be modelled as a GGM. |
| | Multivariate Gaussianity of dependencies | Correlations estimated by maximum likelihood from the same sample asymptotically follow a multivariate normal distribution. This assumption is necessary so the weights of the edges estimated to be directed can be contracted by taking the arithmetic mean of the correlations between clusters. |
| Causal Discovery | Causal Markov condition | A variable is conditionally independent of its non-descendants given its parents. |
| | Faithfulness | A graph is faithful to the data generating process if no conditional independence relations other than the ones entailed by the Markov condition are present. This assumption means that correlations are not spurious and that the lack of correlations is not due to suppression effects, but rather to true independence. |
| | Causal sufficiency | The dataset includes all of the common causes of pairs of dependent variables in the dataset. This assumption means that there are no confounders (i.e., latent causes) that explain away the correlations between pairs of variables. |
| | Multivariate Gaussianity of variables | Same as above. But in this case, this assumption makes it possible to test the significance of the conditional dependencies of triplets of variables. |

The clustering procedure that we propose builds on three main elements: (i) our definition of similar nodes; (ii) the fact proved by

where each mixture component corresponds to the k^{th} cluster. The parameters of the model can be estimated by maximum likelihood using the Expectation-Maximization algorithm (
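As a didactic stand-in for this step, the sketch below runs a tiny EM algorithm for a spherical Gaussian mixture over rows of dependency vectors; it is not the CGMM estimator, and the deterministic initialization indices are supplied by hand for reproducibility:

```python
import numpy as np

def em_cluster(X, k, init_idx, n_iter=100):
    """Tiny EM for a spherical Gaussian mixture over the rows of X
    (each row: one node's vector of dependencies). Returns hard
    cluster assignments."""
    n, d = X.shape
    mu = X[init_idx].astype(float)              # initial component means
    var = np.full(k, X.var() + 1e-6)
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities under spherical Gaussians
        d2 = ((X[:, None, :] - mu[None]) ** 2).sum(-1)
        log_r = np.log(pi) - 0.5 * d * np.log(var) - d2 / (2 * var)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and variances
        nk = r.sum(axis=0) + 1e-12
        pi = nk / n
        mu = (r.T @ X) / nk[:, None]
        d2 = ((X[:, None, :] - mu[None]) ** 2).sum(-1)
        var = (r * d2).sum(axis=0) / (d * nk) + 1e-6
    return r.argmax(axis=1)

# two well-separated groups of dependency vectors
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.05, (5, 3)),
               rng.normal(1.0, 0.05, (5, 3))])
labels = em_cluster(X, 2, init_idx=[0, 5])
```

With well-separated groups the hard assignments recover the two clusters exactly; real estimators would also select the number of components, which this sketch takes as given.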

The clusters identified by our procedure are the strongly connected undirected subgraphs of the CG. By our definition of a PCG, nodes from different clusters can be connected only by directed edges and, if a node from a cluster is connected to a node in another cluster, all nodes in the first cluster are connected to all nodes in the second cluster, with the edges directed in the same direction. Based on the normality assumption, contraction of edges, resulting in the weighted power edges
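The condition stated here, that a directed relation between clusters must hold for every pair of nodes across the two clusters and in the same direction, can be checked mechanically. A small illustrative sketch:

```python
def valid_power_edge(directed_edges, cluster_a, cluster_b):
    """Check the PCG condition between two clusters: the set of
    directed edges from cluster_a to cluster_b is either empty or
    equals the complete set {(u, v): u in cluster_a, v in cluster_b}."""
    cross = {(u, v) for (u, v) in directed_edges
             if u in cluster_a and v in cluster_b}
    complete = {(u, v) for u in cluster_a for v in cluster_b}
    return cross == set() or cross == complete

A, B = {"a", "b"}, {"c", "d"}
full = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")]   # contractible
partial = [("a", "c"), ("b", "d")]                        # not contractible
```

Only the complete bipartite pattern (or no cross edges at all) satisfies the definition, so `partial` could not be contracted into a single power edge.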

where

where z’ is equal to
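Following the contraction assumption stated earlier (the arithmetic mean of the between-cluster correlations), a power edge weight can be pooled directly or on the Fisher z scale; the z′ mentioned in the text is consistent with a Fisher transform, shown below as an option. All names and the matrix are illustrative:

```python
import numpy as np

def power_edge_weight(R, cluster_a, cluster_b, via_fisher_z=False):
    """Weight of the power edge between two clusters, pooling all
    between-cluster correlations: either a plain arithmetic mean or
    a mean taken on the Fisher z scale and back-transformed."""
    rs = np.array([R[i, j] for i in cluster_a for j in cluster_b])
    if not via_fisher_z:
        return float(rs.mean())
    return float(np.tanh(np.arctanh(rs).mean()))   # z' = mean of atanh(r)

R = np.array([[1.0, 0.8, 0.5, 0.5],
              [0.8, 1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.8],
              [0.5, 0.5, 0.8, 1.0]])
w = power_edge_weight(R, [0, 1], [2, 3])
```

When all between-cluster correlations are equal, both pooling routes return that common value; they differ slightly when the correlations are heterogeneous.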

After calculating the weights of the directed power edges (i.e., after contracting what was estimated to be directed relations), one can use causal discovery methods to estimate causal relations from correlational data (

divergent connection

p(X_i, X_j, X_k) = p(X_j) p(X_i | X_j) p(X_k | X_j),

and v-structure

p(X_i, X_j, X_k) = p(X_i) p(X_k) p(X_j | X_i, X_k),

it being easy to see that the serial and divergent connections entail the same conditional independence relation (X_i independent of X_k given X_j), while the v-structure entails a different one (X_i independent of X_k marginally, but not conditionally on X_j),

where X_i, X_j, and X_k represent any three nodes, or power nodes, and the factorizations represent the data generating processes in terms of causal relations.

The fact that v-structures have a different factorization from the other types of connections implies that, assuming no confounders and no cycles (i.e., mutual causation), it is possible to identify causal effects even if the data is correlational, as long as v-structures, as expressed in
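This asymmetry can be checked numerically under simple linear-Gaussian assumptions (the coefficients and sample size below are illustrative): in a serial chain the endpoints are marginally dependent, while in a v-structure the parents are marginally independent but become dependent once the collider is conditioned on.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

# serial chain: xi -> xj -> xk (endpoints marginally dependent)
xi = rng.normal(size=n)
xj = 0.7 * xi + rng.normal(size=n)
xk = 0.7 * xj + rng.normal(size=n)
chain_marginal = np.corrcoef(xi, xk)[0, 1]

# v-structure (collider): xi2 -> xj2 <- xk2 (parents marginally independent)
xi2 = rng.normal(size=n)
xk2 = rng.normal(size=n)
xj2 = 0.7 * xi2 + 0.7 * xk2 + rng.normal(size=n)
collider_marginal = np.corrcoef(xi2, xk2)[0, 1]

# conditioning on the collider induces dependence between its parents:
# regress each parent on xj2 and correlate the residuals
resid_i = xi2 - np.polyfit(xj2, xi2, 1)[0] * xj2
resid_k = xk2 - np.polyfit(xj2, xk2, 1)[0] * xj2
collider_partial = np.corrcoef(resid_i, resid_k)[0, 1]
```

The chain shows a clearly positive marginal correlation, the collider a near-zero one, and conditioning on the collider produces a clearly negative partial correlation, which is the signature that lets constraint-based algorithms orient v-structures.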

There exist three classes of approaches for learning causal graphs (i.e., structure learning) from data (

Score-based algorithms, on the other hand, apply heuristic optimization techniques to the problem of structure learning (

Hybrid learning algorithms, as the name may suggest, combine both constraint-based and score-based algorithms to try to overcome the limitations of single-class algorithms (

Although PC-stable is the most adequate algorithm for learning the directions of the edges in a PCG, as it can be applied using the averaged correlation matrix estimated in the first step of our structure learning procedure, the other algorithms can be used for “fine-tuning” the CG implied by the PCG. This means that other algorithms, or even PC-stable itself, can be used, after learning the structure of the PCG and generating the implied CG, to remove directed edges that may not hold true causal relations between nodes from different power nodes. Therefore, this “fine-tuning” step, although not essential for learning the structure of a PCG, can shed light on how to improve structure learning of CGs (

Finally, it should be noted that current procedures used for learning the structure of CGs, as well as the structure of DAGs, can only recover an equivalence class of directed, or partially directed, graphs. This fact will hold true even when the underlying distribution (i.e., the data generating process) is known, as any two CGs (or DAGs) that belong to the same equivalence class are causally indistinguishable (

The DGP needed to satisfy three characteristics to properly address our aim: variables should be normally distributed in groups of similar nodes; different groups should be causally related; and the simulated PCG should be the only element in its equivalence class (so it is causally distinguishable). The first two characteristics are simply characteristics of PCGs. The third characteristic guarantees that, in the absence of both unmeasured confounders and measurement error, there is only one best causal model to explain the dependencies in the data (

The DGPs were based on three different GGM chain graphs, with their corresponding PCGs represented in

For each l^{th} independent power node (V’1, V’2, V’4, and V’6), a vector of latent scores of size equal to the sample size was drawn, and the variables x_{ij} for each power node were created using a linear factorial model:

where _{j}
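A sketch of such a linear factorial DGP for a pair of power nodes follows; the loading of 0.8, the latent regression of 0.6 between power nodes, and all names are illustrative choices, not the values used in the paper:

```python
import numpy as np

def simulate_power_node(n, n_vars, eta=None, loading=0.8, rng=None):
    """Draw n observations of n_vars indicators of one power node from
    a linear one-factor model: x_ij = loading * eta_i + noise_ij, with
    unit total variance. If eta (the latent score) is None, the power
    node is independent and eta is drawn as a standard normal."""
    rng = np.random.default_rng() if rng is None else rng
    if eta is None:
        eta = rng.normal(size=n)
    noise = rng.normal(scale=np.sqrt(1 - loading ** 2), size=(n, n_vars))
    return eta, loading * eta[:, None] + noise

rng = np.random.default_rng(1)
n = 50_000
eta1, X1 = simulate_power_node(n, 5, rng=rng)               # independent power node
eta2 = 0.6 * eta1 + np.sqrt(1 - 0.36) * rng.normal(size=n)  # child latent score
_, X2 = simulate_power_node(n, 5, eta=eta2, rng=rng)        # dependent power node
```

By construction, variables within a power node correlate at about loading squared (0.64 here), while cross-power-node correlations are attenuated by the latent regression (about 0.8 × 0.6 × 0.8 = 0.384), giving the block structure a PCG assumes.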

DGP1 was chosen because it represents a v-structure, the simplest possible model to be unique in its equivalence class. Both DGP2 and DGP3 were built upon DGP1, increasing the number of nodes, but keeping the same nodes and directed edges, as well as the characteristic of being unique in its equivalence class. For instance,

Our proposed clustering procedure was compared to two other procedures: Exploratory Factor Analysis (EFA;

For assessing the quality of the clustering procedures, we used cluster performance indices (

| Index | Formula | Range | Interpretation |
|---|---|---|---|
| Acc | | [0, 1] | Values closer to 1 are preferred, as they mean that all edges and all lack of edges were correctly identified |
| PPV | | [0, 1] | Values closer to 1 are preferred, as they mean that the proportion of correctly identified edges is larger than the proportion of incorrectly identified edges |
| TPR | | [0, 1] | Values closer to 1 are preferred, as they mean that the proportion of correctly identified edges is larger than the proportion of incorrectly identified lack of edges |
| TNR | | [0, 1] | Values closer to 1 are preferred, as they mean that the proportion of correctly identified lack of edges is larger than the proportion of incorrectly identified edges |
| NMI | | [0, 1] | Values closer to 1 are preferred, as they mean that the estimated cluster is dependent on the true cluster |
| HitND | | [0, 1] | Can be interpreted as the power of the clustering procedures, with values closer to 1 preferred, as they mean that the procedure has always found the correct number of clusters |
| GARI | | (–∞, 1] | Negative values mean that the clusters being compared have completely different structures, and the value of 1 means that they have exactly the same structure |
| VI | | VI ≤ 2log( | Smaller values are preferred, as they mean all variables that should be clustered together were properly clustered together |
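The accuracy-type indices in the table compare an estimated adjacency matrix with the true one; a sketch of their computation from binary adjacency matrices (illustrative code, not the simulation scripts):

```python
import numpy as np

def edge_recovery_indices(true_adj, est_adj):
    """Acc, PPV, TPR, and TNR for edge recovery, computed over the
    off-diagonal cells of two binary adjacency matrices."""
    mask = ~np.eye(true_adj.shape[0], dtype=bool)   # ignore self-loops
    t = true_adj[mask].astype(bool)
    e = est_adj[mask].astype(bool)
    tp = int(np.sum(t & e)); tn = int(np.sum(~t & ~e))
    fp = int(np.sum(~t & e)); fn = int(np.sum(t & ~e))
    return {"Acc": (tp + tn) / t.size,
            "PPV": tp / (tp + fp),
            "TPR": tp / (tp + fn),
            "TNR": tn / (tn + fp)}

# true graph: 0 -> 1 -> 2; estimated graph: 0 -> 1, 0 -> 2
true_adj = np.array([[0, 1, 0],
                     [0, 0, 1],
                     [0, 0, 0]])
est_adj = np.array([[0, 1, 1],
                    [0, 0, 0],
                    [0, 0, 0]])
idx = edge_recovery_indices(true_adj, est_adj)
```

In the toy example one edge is recovered, one is missed, and one is spuriously added, so Acc = 4/6, PPV = TPR = 0.5, and TNR = 0.75.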

where the element for the i^{th} simulation indicates if the number of identified clusters was correct.

All simulations and data analyses were conducted in R version 4.2.0 (

For the cluster detection part of the simulation, the PA used was the one implemented in the paran package version 1.5.2 (

The Results section is divided into three subsections. In the first subsection, we investigate the performance of clustering algorithms for finding clusters (i.e., the subgroups of undirected edges) in a PCG setting. In the second subsection, we test the performance of algorithms in finding the directed edges of the PCG. We start with the assumption that the clusters are all known, so the PC-stable performance is evaluated independently from the performance of the clustering procedure. The last table in the second subsection presents the results when we also consider an experimental setting in which the clusters are not assumed to be known and are first estimated with the clustering algorithms. Lastly, the application of fine-tuning is illustrated to evaluate the average change of density (i.e., the quantity of arrows kept) after the application of each of the alternative algorithms.

| DGP | Cluster Size | Sample Size | Index | PA-EFA | EGA | CGMM |
|---|---|---|---|---|---|---|
| 1 | 5 | 100 | GARI | 0.425 (0.0001) | 0.818 (0.0002) | |
| | | | VI | 0.701 (0.0002) | 0.238 (0.0003) | |
| | | | NMI | 0.716 (0.0001) | 0.918 (0.0002) | |
| | | | HitND | 0.000 (0) | 0.589 (0.0004) | |
| | | 250 | GARI | 0.442 (0.0001) | 0.938 (0.0001) | |
| | | | VI | 0.585 (0.0002) | 0.124 (0.0002) | |
| | | | NMI | 0.667 (0.0001) | 0.953 (0.0001) | |
| | | | HitND | 0.000 (0) | 0.750 (0.0004) | |
| | | 500 | GARI | 0.444 (0.0001) | 0.950 (0.0001) | |
| | | | VI | 0.581 (0.0002) | 0.101 (0.0002) | |
| | | | NMI | 0.669 (0.0001) | 0.963 (0.0001) | |
| | | | HitND | 0.000 (0) | 0.777 (0.0004) | |
| | 10 | 100 | GARI | 0.436 (0.0001) | 0.860 (0.0002) | |
| | | | VI | 0.649 (0.0002) | 0.287 (0.0003) | |
| | | | NMI | 0.632 (0.0001) | 0.870 (0.0001) | |
| | | | HitND | 0.000 (0) | 0.539 (0.0005) | |
| | | 250 | GARI | 0.439 (0.0001) | 0.906 (0.0001) | |
| | | | VI | 0.644 (0.0002) | 0.193 (0.0002) | |
| | | | NMI | 0.635 (0.0001) | 0.923 (0.0001) | |
| | | | HitND | 0.000 (0) | 0.463 (0.0005) | |
| | | 500 | GARI | 0.444 (0.0001) | 0.916 (0.0001) | |
| | | | VI | 0.624 (0.0002) | 0.166 (0.0002) | |
| | | | NMI | 0.645 (0.0001) | 0.933 (0.0001) | |
| | | | HitND | 0.000 (0) | 0.427 (0.0005) | |
| 2 | 5 | 100 | GARI | 0.414 (0.0001) | 0.568 (0.0002) | |
| | | | VI | 0.751 (0.0002) | 0.550 (0.0002) | |
| | | | NMI | 0.718 (0.0001) | 0.803 (0.0001) | |
| | | | HitND | 0.000 (0) | 0.124 (0.0003) | |
| | | 250 | GARI | 0.446 (0.0001) | 0.704 (0.0002) | |
| | | | VI | 0.687 (0.0002) | 0.334 (0.0002) | |
| | | | NMI | 0.743 (0.0001) | 0.879 (0.0001) | |
| | | | HitND | 0.000 (0) | 0.256 (0.0004) | |
| | | 500 | GARI | 0.458 (0.0001) | 0.685 (0.0002) | |
| | | | VI | 0.656 (0.0001) | 0.346 (0.0002) | |
| | | | NMI | 0.754 (0.0001) | 0.873 (0.0001) | |
| | | | HitND | 0.000 (0) | 0.220 (0.0004) | |
| | 10 | 100 | GARI | 0.424 (0.0001) | 0.801 (0.0002) | |
| | | | VI | 0.769 (0.0002) | 0.328 (0.0002) | |
| | | | NMI | 0.713 (0.0001) | 0.891 (0.0001) | |
| | | | HitND | 0.000 (0) | 0.577 (0.0005) | |
| | | 250 | GARI | 0.454 (0.0001) | 0.969 (0.0001) | |
| | | | VI | 0.697 (0.0001) | 0.069 (0.0001) | |
| | | | NMI | 0.740 (0.0001) | 0.979 (0) | |
| | | | HitND | 0.000 (0) | 0.866 (0.0003) | |
| | | 500 | GARI | 0.464 (0) | 0.991 (0) | |
| | | | VI | 0.660 (0.0001) | 0.021 (0.0001) | |
| | | | NMI | 0.753 (0) | 0.994 (0) | |
| | | | HitND | 0.000 (0) | 0.875 (0.0003) | |
| 3 | 5 | 100 | GARI | 0.348 (0.0001) | 0.508 (0.0002) | |
| | | | VI | 0.848 (0.0001) | 0.611 (0.0002) | |
| | | | NMI | 0.741 (0) | 0.819 (0.0001) | |
| | | | HitND | 0.000 (0) | 0.019 (0.0001) | |
| | | 250 | GARI | 0.392 (0.0001) | 0.655 (0.0002) | |
| | | | VI | 0.769 (0.0001) | 0.401 (0.0002) | |
| | | | NMI | 0.766 (0) | 0.884 (0) | |
| | | | HitND | 0.000 (0) | 0.045 (0.0002) | |
| | | 500 | GARI | 0.409 (0.0001) | 0.668 (0.0001) | |
| | | | VI | 0.724 (0.0001) | 0.386 (0.0001) | |
| | | | NMI | 0.779 (0) | 0.889 (0) | |
| | | | HitND | 0.000 (0) | 0.027 (0.0002) | |
| | 10 | 100 | GARI | 0.360 (0.0001) | 0.718 (0.0001) | |
| | | | VI | 0.868 (0.0001) | 0.417 (0.0002) | |
| | | | NMI | 0.735 (0) | 0.883 (0.0001) | |
| | | | HitND | 0.000 (0) | 0.217 (0.0004) | |
| | | 250 | GARI | 0.409 (0.0001) | 0.907 (0.0001) | |
| | | | VI | 0.776 (0.0001) | 0.120 (0.0002) | |
| | | | NMI | 0.764 (0) | 0.967 (0) | |
| | | | HitND | 0.000 (0) | 0.656 (0.0005) | |
| | | 500 | GARI | 0.433 (0.0001) | 0.981 (0.0001) | |
| | | | VI | 0.725 (0.0001) | 0.023 (0.0001) | |
| | | | NMI | 0.780 (0) | 0.994 (0) | |
| | | | HitND | 0.000 (0) | 0.916 (0.0003) | |

| Index | PA-EFA (DGP1) | EGA (DGP1) | CGMM (DGP1) | PA-EFA (DGP2) | EGA (DGP2) | CGMM (DGP2) | PA-EFA (DGP3) | EGA (DGP3) | CGMM (DGP3) |
|---|---|---|---|---|---|---|---|---|---|
| GARI | 0.441 | 0.899 | | 0.446 | 0.782 | | 0.392 | 0.735 | |
| VI | 0.614 | 0.211 | | 0.699 | 0.274 | | 0.777 | 0.329 | |
| NMI | 0.651 | 0.918 | | 0.739 | 0.903 | | 0.763 | 0.906 | |
| HitND | 0.000 | 0.570 | | 0.000 | 0.485 | | 0.000 | 0.303 | |

| Index | PA-EFA (N = 100) | EGA (N = 100) | CGMM (N = 100) | PA-EFA (N = 250) | EGA (N = 250) | CGMM (N = 250) | PA-EFA (N = 500) | EGA (N = 500) | CGMM (N = 500) |
|---|---|---|---|---|---|---|---|---|---|
| GARI | 0.404 | 0.696 | | 0.431 | 0.862 | | 0.443 | 0.881 | |
| VI | 0.745 | 0.428 | | 0.684 | 0.168 | | 0.662 | 0.137 | |
| NMI | 0.701 | 0.850 | | 0.722 | 0.945 | | 0.730 | 0.955 | |
| HitND | 0.000 | 0.398 | | 0.000 | 0.630 | | 0.000 | 0.678 | |

For evaluating the influence of the clusters’ sizes (

| Index | PA-EFA (v = 5) | EGA (v = 5) | CGMM (v = 5) | PA-EFA (v = 10) | EGA (v = 10) | CGMM (v = 10) |
|---|---|---|---|---|---|---|
| GARI | 0.420 | 0.715 | | 0.432 | 0.910 | |
| VI | 0.688 | 0.346 | | 0.706 | 0.142 | |
| NMI | 0.722 | 0.881 | | 0.713 | 0.951 | |
| HitND | 0.000 | 0.356 | | 0.000 | 0.724 | |

Finally, analyses of variance were conducted in order to assess how each condition affects the performance of the methods. We report only the partial eta squared effect size in
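For reference, partial eta squared relates each effect's sum of squares to that effect plus its error term; a one-line sketch:

```python
def partial_eta_squared(ss_effect, ss_error):
    """Partial eta squared: SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)

eta_p2 = partial_eta_squared(2.0, 8.0)   # effect explains 20% of effect + error
```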

| Index | Factor | PA-EFA | EGA | CGMM |
|---|---|---|---|---|
| GARI | SS | 0.0000 | 0.0003 | 0.0001 |
| | CS | 0.0000 | 0.0004 | 0.0000 |
| | DGP | 0.0001 | 0.0014 | 0.0044 |
| | SS:CS | 0.0000 | 0.0001 | 0.0000 |
| | SS:DGP | 0.0000 | 0.0002 | 0.0006 |
| | CS:DGP | 0.0000 | 0.0002 | 0.0005 |
| | SS:CS:DGP | 0.0000 | 0.0002 | 0.0003 |
| HitND | SS | 0.0000 | 0.0005 | 0.0000 |
| | CS | 0.0000 | 0.0007 | 0.0001 |
| | DGP | 0.0002 | 0.0656 | 0.0282 |
| | SS:CS | 0.0000 | 0.0004 | 0.0001 |
| | SS:DGP | 0.0000 | 0.0011 | 0.0003 |
| | CS:DGP | 0.0000 | 0.0009 | 0.0006 |
| | SS:CS:DGP | 0.0000 | 0.0006 | 0.0005 |
| NMI | SS | 0.0000 | 0.0010 | 0.0004 |
| | CS | 0.0000 | 0.0023 | 0.0007 |
| | DGP | 0.0001 | 0.0473 | 0.0053 |
| | SS:CS | 0.0000 | 0.0004 | 0.0001 |
| | SS:DGP | 0.0000 | 0.0013 | 0.0003 |
| | CS:DGP | 0.0000 | 0.0017 | 0.0005 |
| | SS:CS:DGP | 0.0000 | 0.0013 | 0.0004 |
| VI | SS | 0.0000 | 0.0000 | 0.0000 |
| | CS | 0.0000 | 0.0000 | 0.0000 |
| | DGP | 0.0040 | 0.0666 | 0.0170 |
| | SS:CS | 0.0000 | 0.0001 | 0.0001 |
| | SS:DGP | 0.0000 | 0.0002 | 0.0002 |
| | CS:DGP | 0.0000 | 0.0002 | 0.0002 |
| | SS:CS:DGP | 0.0000 | 0.0001 | 0.0002 |

Using the PC-stable algorithm with the averaged correlation matrix of the clusters, and assuming the true clusters are known, we generated the results shown in

| DGP | Cluster Size | Sample Size | Acc | PPV | TPR | TNR |
|---|---|---|---|---|---|---|
| 1 | 5 | 100 | 0.982 (0.0001) | 0.988 (0) | 0.972 (0.0001) | 0.993 (0) |
| | | 250 | 1.000 (0) | 1.000 (0) | 1.000 (0) | 1.000 (0) |
| | | 500 | 1.000 (0) | 1.000 (0) | 1.000 (0) | 1.000 (0) |
| | 10 | 100 | 0.994 (0) | 0.996 (0) | 0.991 (0.0001) | 0.997 (0) |
| | | 250 | 1.000 (0) | 1.000 (0) | 1.000 (0) | 1.000 (0) |
| | | 500 | 1.000 (0) | 1.000 (0) | 1.000 (0) | 1.000 (0) |
| 2 | 5 | 100 | 0.934 (0.0001) | 0.964 (0) | 0.899 (0.0001) | 0.970 (0) |
| | | 250 | 0.982 (0) | 0.966 (0) | 1.000 (0) | 0.964 (0) |
| | | 500 | 0.980 (0) | 0.962 (0) | 1.000 (0) | 0.960 (0) |
| | 10 | 100 | 0.957 (0.0001) | 0.973 (0) | 0.938 (0.0001) | 0.976 (0) |
| | | 250 | 0.984 (0) | 0.970 (0) | 1.000 (0) | 0.968 (0) |
| | | 500 | 0.983 (0) | 0.968 (0) | 1.000 (0) | 0.967 (0) |
| 3 | 5 | 100 | 0.882 (0.0001) | 0.965 (0) | 0.791 (0.0002) | 0.974 (0) |
| | | 250 | 0.980 (0) | 0.978 (0) | 0.982 (0.0001) | 0.978 (0) |
| | | 500 | 0.984 (0) | 0.978 (0) | 0.990 (0) | 0.978 (0) |
| | 10 | 100 | 0.902 (0.0001) | 0.972 (0) | 0.825 (0.0001) | 0.978 (0) |
| | | 250 | 0.981 (0) | 0.985 (0) | 0.977 (0.0001) | 0.985 (0) |
| | | 500 | 0.983 (0) | 0.987 (0) | 0.979 (0.0001) | 0.987 (0) |

| Index | Overall | DGP1 | DGP2 | DGP3 | N = 100 | N = 250 | N = 500 | v = 5 | v = 10 |
|---|---|---|---|---|---|---|---|---|---|
| | 0.962 | 0.945 | 0.943 | 0.937 | 0.973 | 0.956 | | | |
| | 0.965 | 0.928 | 0.967 | 0.959 | 0.960 | 0.959 | | | |
| | 0.961 | 0.970 | 0.919 | 0.898 | 0.992 | 0.955 | | | |
| | 0.962 | 0.921 | 0.967 | 0.955 | 0.955 | 0.957 | | | |

Regarding the effects of the sample size, the results are somewhat ambiguous: the condition with the largest sample and the condition with the smallest sample each outperform the other in two indices. We also found that increasing the cluster size improves the average performance of the algorithm.

Analyses of variance were conducted in order to assess how each condition affects the performance of the PC-stable algorithm when the true clusters are known. Once again, we report only the partial eta squared effect size in

| Factor | Acc | PPV | TPR | TNR |
|---|---|---|---|---|
| SS | 0.1896 | 0.0155 | 0.2315 | 0.0006 |
| CS | 0.0057 | 0.0133 | 0.0031 | 0.0150 |
| DGP | 0.1395 | 0.2114 | 0.1222 | 0.2948 |
| SS:CS | 0.0078 | 0.0016 | 0.0099 | 0.0005 |
| SS:DGP | 0.1051 | 0.0158 | 0.1287 | 0.0250 |
| CS:DGP | 0.0007 | 0.0021 | 0.0004 | 0.0035 |
| SS:CS:DGP | 0.0004 | 0.0011 | 0.0010 | 0.0021 |

We performed the same simulation again, but this time without assuming the true clusters are known. After applying the PC-stable algorithm, we recovered the adjacency matrix for the induced chain graphs. From those, we calculated the accuracy for all the conditions, as shown in

| DGP | Cluster Size | Sample Size | Cluster Procedure | Acc | PPV | TPR | TNR |
|---|---|---|---|---|---|---|---|
| 1 | 5 | 100 | | | | | |
| | | | EGA | 0.844 (0.0002) | 0.925 (0.0001) | 0.734 (0.0003) | 0.953 (0.0001) |
| | | 250 | CGMM | 0.954 (0.0001) | 0.978 (0) | 0.925 (0.0002) | 0.982 (0) |
| | | 500 | CGMM | 0.969 (0.0001) | 0.985 (0) | 0.951 (0.0002) | 0.987 (0) |
| | 10 | 100 | CGMM | 0.899 (0.0001) | 0.961 (0.0001) | 0.825 (0.0002) | 0.974 (0) |
| | | 250 | CGMM | 0.920 (0.0001) | 0.946 (0.0001) | 0.883 (0.0002) | 0.957 (0) |
| | | 500 | CGMM | 0.954 (0.0001) | 0.954 (0.0001) | 0.952 (0.0001) | 0.957 (0) |
| 2 | 5 | 100 | | | | | |
| | | | EGA | 0.605 (0.0001) | 0.793 (0.0002) | 0.269 (0.0003) | 0.941 (0.0001) |
| | | 250 | | | | | |
| | | | EGA | 0.681 (0.0002) | 0.678 (0.0003) | 0.495 (0.0003) | 0.867 (0.0001) |
| | | 500 | | | | | |
| | | | EGA | 0.639 (0.0002) | 0.518 (0.0004) | 0.446 (0.0004) | 0.832 (0.0001) |
| | 10 | 100 | | | | | |
| | | | EGA | 0.759 (0.0001) | 0.887 (0.0001) | 0.586 (0.0003) | 0.931 (0.0001) |
| | | 250 | | | | | |
| | | | EGA | 0.909 (0.0001) | 0.909 (0.0001) | 0.890 (0.0002) | 0.927 (0.0001) |
| | | 500 | | | | | |
| | | | EGA | 0.928 (0.0001) | 0.919 (0.0001) | 0.927 (0.0002) | 0.929 (0.0001) |
| 3 | 5 | 100 | | | | | |
| | | | EGA | 0.598 (0.0001) | 0.767 (0.0002) | 0.272 (0.0002) | 0.925 (0.0001) |
| | | 250 | | | | | |
| | | | EGA | 0.666 (0.0001) | 0.746 (0.0002) | 0.460 (0.0002) | 0.872 (0.0001) |
| | | 500 | | | | | |
| | | | EGA | 0.666 (0.0001) | 0.738 (0.0001) | 0.486 (0.0002) | 0.845 (0) |
| | 10 | 100 | | | | | |
| | | | EGA | 0.683 (0.0001) | 0.859 (0.0001) | 0.431 (0.0002) | 0.936 (0) |
| | | 250 | | | | | |
| | | | EGA | 0.812 (0.0001) | 0.879 (0.0001) | 0.707 (0.0002) | 0.916 (0) |
| | | 500 | | | | | |
| | | | EGA | 0.848 (0.0001) | 0.900 (0.0001) | 0.772 (0.0002) | 0.924 (0) |

The results shown in

| Sample | Index | EGA | CGMM |
|---|---|---|---|
| Full Sample | Acc | 0.816 | |
| | PPV | 0.865 | |
| | TPR | 0.699 | |
| | TNR | 0.933 | |
| N = 100 | Acc | 0.734 | |
| | PPV | 0.864 | |
| | TPR | 0.527 | |
| | TNR | 0.942 | |
| N = 250 | Acc | 0.840 | |
| | PPV | 0.866 | |
| | TPR | 0.752 | |
| | TNR | 0.927 | |
| N = 500 | Acc | 0.845 | |
| | PPV | 0.844 | |
| | TPR | 0.770 | |
| | TNR | 0.921 | |
| V = 5 | Acc | 0.740 | |
| | PPV | 0.793 | |
| | TPR | 0.568 | |
| | TNR | 0.912 | |
| V = 10 | Acc | 0.872 | |
| | PPV | 0.923 | |
| | TPR | 0.797 | |
| | TNR | 0.947 | |
| DGP = 1 | Acc | 0.932 | |
| | PPV | 0.966 | |
| | TPR | 0.891 | |
| | TNR | 0.973 | |
| DGP = 2 | Acc | 0.753 | |
| | PPV | 0.784 | |
| | TPR | 0.602 | |
| | TNR | 0.905 | |
| DGP = 3 | Acc | 0.712 | |
| | PPV | 0.815 | |
| | TPR | 0.521 | |
| | TNR | 0.903 | |

We conducted the final analyses of variance in order to assess how each condition affects the performance of the PC-stable algorithm when the true clusters are unknown. As usual, we report only the partial eta squared effect size in

| Method | Factor | Acc | PPV | TPR | TNR |
|---|---|---|---|---|---|
| CGMM | SS | 0.2995 | 0.0706 | 0.3311 | 0.0189 |
| | CS | 0.0270 | 0.0172 | 0.0265 | 0.0090 |
| | DGP | 0.0393 | 0.0105 | 0.0517 | 0.0188 |
| | SS:CS | 0.0308 | 0.0250 | 0.0310 | 0.0090 |
| | SS:DGP | 0.0457 | 0.0282 | 0.0444 | 0.0187 |
| | CS:DGP | 0.0324 | 0.0634 | 0.0192 | 0.0814 |
| | SS:CS:DGP | 0.0037 | 0.0052 | 0.0043 | 0.0094 |
| EGA | SS | 0.1408 | 0.0059 | 0.1938 | 0.0232 |
| | CS | 0.2252 | 0.1534 | 0.2131 | 0.0863 |
| | DGP | 0.4215 | 0.2356 | 0.3810 | 0.2937 |
| | SS:CS | 0.0086 | 0.0221 | 0.0028 | 0.0385 |
| | SS:DGP | 0.0017 | 0.0382 | 0.0069 | 0.0846 |
| | CS:DGP | 0.0907 | 0.0886 | 0.0905 | 0.0223 |
| | SS:CS:DGP | 0.0206 | 0.0303 | 0.0143 | 0.0271 |

The final analysis was used to evaluate how applying “fine-tuning” causal discovery algorithms would change the total number of arrows in the CG implied by the PCG.

| Condition | CG | PC | HC | MMHC |
|---|---|---|---|---|
| Overall | 0.039 | 0.818 | 0.928 | |
| DGP | | | | |
| DGP1 | 0.007 | 0.794 | 0.920 | |
| DGP2 | 0.030 | 0.817 | 0.926 | |
| DGP3 | 0.081 | 0.842 | 0.938 | |
| Sample size | | | | |
| 100 | 0.103 | 0.913 | 0.959 | |
| 250 | 0.008 | 0.820 | 0.928 | |
| 500 | 0.006 | 0.721 | 0.897 | |
| Cluster size | | | | |
| 5 | 0.045 | 0.754 | 0.893 | |
| 10 | 0.033 | 0.882 | 0.963 | |

For illustration purposes, the present empirical example applies PCGs to derive causal hypotheses related to the empathy construct.

As usual in the psychological literature, the psychometric properties of the questionnaire were evaluated by

The dataset is composed of 1,973 French-speaking students from different colleges, from a diverse set of courses. The age of the participants ranged from 17 to 25 years (

| Item | Factor | Item description |
|---|---|---|
| 1 | Fantasy | I daydream and fantasize, with some regularity, about things that might happen to me. |
| 2 | Empathic concern | I often have tender, concerned feelings for people less fortunate than me. |
| 3R | Perspective-taking | I sometimes find it difficult to see things from the "other guy's" point of view. |
| 4R | Empathic concern | Sometimes I don't feel very sorry for other people when they are having problems. |
| 5 | Fantasy | I really get involved with the feelings of the characters in a novel. |
| 6 | Personal distress | In emergency situations, I feel apprehensive and ill-at-ease. |
| 7R | Fantasy | I am usually objective when I watch a movie or play, and I don't often get completely caught up in it. |
| 8 | Perspective-taking | I try to look at everybody's side of a disagreement before I make a decision. |
| 9 | Empathic concern | When I see someone being taken advantage of, I feel kind of protective towards them. |
| 10 | Personal distress | I sometimes feel helpless when I am in the middle of a very emotional situation. |
| 11 | Perspective-taking | I sometimes try to understand my friends better by imagining how things look from their perspective. |
| 12R | Fantasy | Becoming extremely involved in a good book or movie is somewhat rare for me. |
| 13R | Personal distress | When I see someone get hurt, I tend to remain calm. |
| 14R | Empathic concern | Other people's misfortunes do not usually disturb me a great deal. |
| 15R | Perspective-taking | If I'm sure I'm right about something, I don't waste much time listening to other people's arguments. |
| 16 | Fantasy | After seeing a play or movie, I have felt as though I were one of the characters. |
| 17 | Personal distress | Being in a tense emotional situation scares me. |
| 18R | Empathic concern | When I see someone being treated unfairly, I sometimes don't feel very much pity for them. |
| 19R | Personal distress | I am usually pretty effective in dealing with emergencies. |
| 20 | Fantasy | I am often quite touched by things that I see happen. |
| 21 | Perspective-taking | I believe that there are two sides to every question and try to look at them both. |
| 22 | Empathic concern | I would describe myself as a pretty soft-hearted person. |
| 23 | Fantasy | When I watch a good movie, I can very easily put myself in the place of a leading character. |
| 24 | Personal distress | I tend to lose control during emergencies. |
| 25 | Perspective-taking | When I'm upset at someone, I usually try to "put myself in his shoes" for a while. |
| 26 | Fantasy | When I am reading an interesting story or novel, I imagine how I would feel if the events in the story were happening to me. |
| 27 | Personal distress | When I see someone who badly needs help in an emergency, I go to pieces. |
| 28 | Perspective-taking | Before criticizing somebody, I try to imagine how I would feel if I were in their place. |

Because the structure of the instrument was validated extensively in the literature, we proceeded by assuming that the true clusters are known. In the next step, the averaged correlation matrix was calculated and then the PC-stable algorithm was applied for discovering the directions of the relations in the PCG. Finally, the tuning algorithms were applied to decrease the density of the CG implied by the PCG, with the results displayed both graphically and with the percentage of removed relations textually described. The R code for doing these analyses is also included in the

We calculated the averaged correlation matrix based on the clusters identified by

For the final analysis, we proceeded to the fine-tuning procedure with the CG implied by the PCG. To do that, we first fixed the direction of the edges according to the estimated PCG. We also fixed the absence of relations between power nodes (e.g., no direct relations between the variables in V’3 and V’2). After that, the PC-stable, the HC, and the MMHC algorithms were applied. Fine-tuning with the PC-stable algorithm resulted in 74.49% of the edges between clusters being removed. With the HC algorithm, 65.31% of the edges between clusters were removed. With the MMHC algorithm, 86.22% of the edges between clusters were removed. Each corresponding unweighted CG is displayed in

The main aim of the present study was to propose a procedure of structure learning of PCGs. The secondary aim of the study was to compare clustering algorithms and causal discovery algorithms that can be used for learning the structure of the PCGs. To achieve these aims we used simulated data. To evaluate the practical value of our procedure, we did a full analysis of an empirical example. Our simulations showed that our procedure can properly recover both the clusters of variables, by means of the CGMM procedure, and the correct direction of the underlying causal relations, by means of averaged correlation matrix and the PC-stable algorithm. Our simulations have also illustrated how PCGs can be used to further investigate the structure of CGs, through tuning of the CG implied by the PCG using three different classes of causal discovery algorithms.

To assess the performance of the CGMM procedure in finding the correct number of clusters, we used a number of accuracy-related indices and cluster comparison indices. We also compared the CGMM performance with two other procedures that are more traditional in statistical analysis: PA-EFA and EGA. Our results show that, in general, CGMM outperformed both procedures across different conditions of sample size, number of variables per cluster, and total number of clusters. These results are somewhat unexpected. Previous studies (e.g.,

The PC-stable algorithm performed well in a number of different conditions, even when the clustering was not known a priori. Changing the sample size, the number of variables per cluster, and the total number of clusters had little to no clear effect on the accuracy-related indices. The second part of the causal discovery analysis, which compared different tuning algorithms, has some implications for structure learning procedures of CGs (e.g.,

Regarding the empirical example, although it was used more as an illustration of our procedure than a true investigation of the causal structures of empathy as proposed by

There are two main limitations of the present study. The first is the CGMM procedure. Despite its performance in the simulation study, the results may be limited to the particular setting of the data generating process. Theoretical studies are necessary to better understand in which conditions CGMM performs best, so the present results regarding CGMM can be properly generalized. However, this has little impact on the PCG procedure as a whole, as any clustering procedure can be used in place of the CGMM, as long as it respects the “similarity” condition for clustering the variables. The second main limitation is related to the averaging of the correlation matrix. Despite its relation to the definitions of a power edge and of similarity between nodes, there is neither a logical implication nor a model constraint that requires the power edges to be estimated in this way. Therefore, it is also necessary for future studies to evaluate which procedures for estimating or calculating power edges can improve the performance of the PCG procedure.

Future studies on PCG can focus on both theoretical aspects of CGMM and how causal discovery algorithms can be extended for learning the causal structure from the averaged correlation matrix. For instance, what would happen if instead of correlations, measures of dependence (

Data are freely available, see

The supplementary materials provided are the data that support the findings of this study and a manuscript preprint from the first author's PhD dissertation (for access see

Víthor Rosa Franco received a research grant from the Coordination for the Improvement of Higher Education Personnel, grant number 88881.190469/2018-01.

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

The authors would like to thank the reviewers for their valuable comments and suggestions. We also would like to thank the editor for his decision to publish the manuscript.

Due to a technical issue in article production, all figures in the original version of this article, published December 22, 2022, were produced without graphical content. Therefore, the faulty Version of Record was replaced by the current Corrected Version of Record (CVoR) on March 8, 2023.