This is an older version of this preprint and has not been peer reviewed. A newer version will be uploaded when completed.
NetworkToolbox: Methods and Measures for Brain, Cognitive, and Psychometric Network Analysis in R

Alexander P. Christensen
Department of Psychology, University of North Carolina at Greensboro
P.O. Box 26170, Greensboro, NC 27402-6170, USA
E-mail:
[email protected]

March 27, 2018

Abstract

This article introduces the NetworkToolbox package for R. Network analysis offers an intuitive perspective on complex phenomena via models depicting nodes (variables) and edges (correlations). The ability of networks to model complexity has made them the standard approach for modeling the intricate interactions in the brain. Similarly, networks have become an increasingly attractive model for studying the complexity of psychological and psychopathological phenomena. NetworkToolbox aims to provide psychological researchers with state-of-the-art methods and measures for estimating and analyzing brain, cognitive, and psychometric data. I introduce NetworkToolbox and provide a manual for applying the package's functions to common measures of psychometric (personality) and brain (resting-state) data.

Keywords: psychology, network analysis, graph theory, R
1 Introduction
Open science is ushering in a new era of psychology where multi-site collaborations are common and big data is readily available. Often, in these datasets, noise is mixed in with relevant information. Thus, researchers are faced with a challenge: deciphering information from the noise and maintaining the dataset's integrity while reducing its complexity. Other areas of science have tackled this challenge with the use of network science and graph theory methods. Psychology is beginning to follow suit, changing the way we conceptualize psychological
(Cramer et al., 2012; Schmittmann et al., 2013) and psychopathological phenomena (Borsboom and Cramer, 2013; Borsboom, 2017). Psychological and psychopathological constructs are intrinsically complex. Over the years, researchers have focused on reducing phenomena to only the most significant variables. This reductionist approach has given us a greater understanding of what variables matter and what variables don't. Nonetheless, the more we reduce, the less we know about the phenomena as a whole (Barabási, 2011). With network science, we can begin to put psychological phenomena back together again, embracing the complexity.

Here, I introduce NetworkToolbox, a package for R (R Core Team, 2017), available on the Comprehensive R Archive Network (https://cran.r-project.org/web/packages/NetworkToolbox/index.html). NetworkToolbox offers psychologists the tools to apply various network science and graph theory methods and measures to their brain, cognitive, and psychometric data. Networks are made up of nodes (e.g., circles) that are connected by edges (e.g., lines) to other nodes. Nodes represent variables (e.g., brain regions, words, questionnaire items) and edges represent a relation between two nodes (e.g., correlations, partial correlations), which can be undirected (i.e., bidirectional) or directed, and weighted or unweighted (all edge weights treated as equivalent). Thus, networks are conceptually simple yet capable of considerable complexity. In addition, their graphical depictions offer representations that have intuitive interpretations (Epskamp et al., 2012).

There are several network analysis packages already implemented in R, such as statnet (Goodreau et al., 2008), igraph (Csardi and Nepusz, 2006), and qgraph (Epskamp et al., 2012). Indeed, there are R packages that specifically focus on brain (e.g., brainGraph; Watson, 2017) and psychometric (e.g., qgraph, Epskamp et al., 2012; IsingFit, van Borkulo et al., 2014) networks. NetworkToolbox differentiates itself by including state-of-the-art network methods and measures that are not currently available in these other packages. In addition, NetworkToolbox is specifically focused on network methods and measures for psychological researchers. Thus, the aim of this R package is to provide psychological researchers with current and future state-of-the-art network analysis techniques. Notably, network visualization is not considered in this package. Where necessary, I'll refer readers to visualization sources that are compatible with NetworkToolbox's outputs. In general, the R package qgraph (Epskamp et al., 2012) is recommended for psychometric network visualizations.
2 Network Analysis
In this section, I will provide explanations for most methods and measures in NetworkToolbox and include associated code implemented in R. The main body of NetworkToolbox is devoted to psychometric network analysis; however, several functions for cognitive and brain networks are also provided. For the purpose of this manual, I will only discuss functions related to psychometric
and brain network analysis. All analyses conducted in this article were performed using R version 3.4.4 (2018-03-15) and NetworkToolbox version 1.1.1. Before digging out the tools, I’ll introduce the main datasets that will be used throughout the paper.
2.1 Datasets
These datasets are centered around the personality trait Openness to Experience and, more specifically, the NEO personality inventories (NEO-PI-3, McCrae et al., 2005; NEO-FFI-3, McCrae and Costa Jr, 2007). People high in Openness to Experience are described as creative, curious, imaginative, unconventional, and intellectual. The NEO Openness to Experience inventories have 6 facets: actions, aesthetics, fantasy, feelings, ideas, and values, representing distinct characterizations of the global trait. NetworkToolbox includes the following psychometric, cognitive, and brain datasets, which are openly available for researchers to use (brain data can be found here: restOpen). Thus, researchers are free to experiment and explore the data in any manner they like. If any data are used for publication purposes, then please cite the appropriate sources.

2.1.1 Psychometric Data
For the psychometric section, I use a dataset that includes four Openness to Experience inventories (Big Five Aspects Scale, DeYoung et al., 2007; HEXACO-100, Lee and Ashton, 2016; NEO-PI-3, McCrae et al., 2005; Woo et al.'s inventory, Woo et al., 2014), which was previously used in Christensen et al. (2018a). In this manual, the analyses of this dataset will be focused on the NEO-PI-3 (48 items), but the full dataset can be found on the Open Science Framework (https://osf.io/954a7/). This dataset contains a relatively large sample (N = 802), which provides a solid foundation for some of the more nuanced analyses in NetworkToolbox. The NEO-PI-3 data are included in NetworkToolbox as the neoOpen dataset and are loaded below (the brain, brain, and behavioral, behav, data used later in this manual can be obtained from the sources above):

R> library("NetworkToolbox")
R> psy <- neoOpen

Before constructing a network, missing data should be considered. The network construction functions include an na.data argument with three options: "listwise", "pairwise", and "fiml". "listwise" deletion will remove any case that has missing data and will print a warning reporting which rows were removed:

R> TMFG(psy, na.data = "listwise")
Warning message:
In TMFG(psy, na.data = "listwise") :
  10 rows were removed for missing data row(s):
  234, 497, 71, 639, 99, 677, 652, 255, 150, 28

"pairwise" deletion will use the data that are available for each variable when constructing the network. Note that relations from pairwise deletion could potentially be based on different subsets of cases, which could be problematic. By far, the best and recommended option is "fiml", or Full Information Maximum Likelihood (FIML). FIML uses all available information to produce a deterministic result (i.e., the same result every time) in the construction of the network. For this reason, FIML has been the standard in much structural equation modeling (SEM) software. Notably, network construction does take a few seconds longer when using the "fiml" option. If there are missing data in the dataset but no argument is set, then a warning will appear, instructing the researcher to select an option for the na.data argument. Both "pairwise" and "listwise" are implemented via the psych and qgraph packages.
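To illustrate, the same network could be constructed under each of the three missing data treatments. This is a usage sketch, assuming psy contains missing values (TMFG() is described under Network Construction below):

R> net.listwise <- TMFG(psy, na.data = "listwise")
R> net.pairwise <- TMFG(psy, na.data = "pairwise")
R> net.fiml <- TMFG(psy, na.data = "fiml")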
The next issue concerning the data is whether the data are multivariate normal. There are multiple R packages that offer multivariate normality testing, such as MVN (Korkmaz et al., 2014), so such functions are not included in NetworkToolbox. If the data are not multivariate normal, should the data be transformed to be multivariate normal? This is of particular importance in network and graphical modeling because, with a multivariate normal distribution, there is a direct relationship between the inverse covariance matrix (also known as the precision matrix) and the partial correlation (conditional independence) structure. Thus, conditional independence networks fundamentally rely on a multivariate normal distribution. The necessity of this assumption and the influence of deviating from multivariate normality in psychometric network estimation are worthy of future investigation (Loh and Wainwright, 2012). In addition, the extent to which some psychological phenomena, specifically psychopathological disorders, are truly normally distributed is a philosophical issue also worth discussing. The current psychometric network standard, however, emphasizes conditional independence, so non-normal distributions should be considered by the researcher (Epskamp and Fried, in press).

An argument called normal handles whether correlations that transform the relations toward a normal distribution should be used. When set to TRUE, the data are correlated using qgraph's cor_auto(). This function detects ordinal variables and uses polyserial, polychoric, and Pearson's correlations. Other correlations (e.g., tetrachoric) and association measures can also be used, but they must be computed using some other function (e.g., psych's tetrachoric(); Revelle, 2017) before they are used as input into any of the network construction functions. When normal is set to FALSE, Pearson's correlations are computed. In all network construction functions, normal defaults to FALSE. It is important to note that, despite the pervasive notion that Pearson's correlation assumes normality, it does not; however, the test statistic, on which significance is based, may have a different distribution when the data are non-normal. This may be more important for some network construction methods than others. In addition, Pearson's correlation is sensitive to outliers and highly skewed variables, so data transformations or alternative correlation measures (e.g., Spearman's rho) should be considered in these instances.

The last argument, depend, is currently unique to the TMFG() network construction method. The depend argument is used to construct a dependency network from data that have been run through depend(). The dependency network approach was introduced by Kenett et al. (2010) and produces edges that show the direction of influence from one node to another (i.e., a directional network), similar to relative importance networks (Grömping et al., 2006; Johnson and LeBreton, 2004; Robinaugh et al., 2014). This approach makes use of first-order partial correlations, which are the partial (or residual) effect of a given node j on the correlation between another pair of nodes i and k. The resulting partial correlation between nodes i and k is their correlation after the subtraction of the correlations between nodes i and j and between nodes k and j. In this way, the partial correlation provides a measure of node j's influence on the correlation between nodes i and k. Therefore, the influence of node j on node i can be determined by taking the average (or the sum) of node j's influence on the correlations of node i with all other nodes. Mathematically, this can be expressed as:

d(i, k|j) \equiv C(i, k) - PC(i, k|j)    (1)
where C(i, k) is the correlation between nodes i and k and PC(i, k|j) is the partial correlation of nodes i and k given node j. By subtracting the partial correlation of nodes i and k given node j from the correlation of nodes i and k, the infrequent case where node j appears to strongly affect the correlation C(i, k) even though C(i, k), C(i, j), and C(k, j) all have small values is avoided. In order to determine the influence of node j on node i, I define the average influence of node j across all relative effects of the correlations where node k does not equal j:

D(i, j) = \frac{1}{N - 1} \sum_{k \neq j}^{N - 1} d(i, k|j)    (2)
As defined, the dependency matrix D has (i, j) elements, which are the dependency of node i on node j. Put differently, this measure specifies the average influence of node j on node i (j → i). In a dependency network, the arrow pointing from node j to node i signifies that node j influences node i. Although the correlation matrix is symmetric, the dependency matrix is asymmetric, which means the influence of node j on node i, D(j, i), is not equal to the influence of node i on node j, D(i, j). It's important to note that these directional influences are not interpreted as causal relations but instead as predictive relationships. These networks are, however, a step towards inferring causal relations between two nodes and may function as a "pre-test" for data and methods that can infer causal relations (Jacob et al., 2016). Therefore, the advantage of the dependency network approach is that the mutual influence between two nodes is not presumed to be equal; rather, one node is assumed to have more influence in the relationship than the other. A minimal sketch of this computation is provided below.
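To make Equations 1 and 2 concrete, the following is a minimal sketch applied to a correlation matrix. It illustrates the logic only (the function name dependSketch is hypothetical); depend() is the package's implementation:

R> dependSketch <- function(data) {
+    R <- cor(data, use = "pairwise")
+    n <- ncol(R)
+    D <- matrix(0, n, n, dimnames = dimnames(R))
+    for (j in 1:n) {
+      for (i in 1:n) {
+        if (i == j) next
+        ks <- setdiff(1:n, c(i, j))
+        # first-order partial correlation PC(i, k | j) for all k
+        pc <- (R[i, ks] - R[i, j] * R[j, ks]) /
+          sqrt((1 - R[i, j]^2) * (1 - R[j, ks]^2))
+        # Equation 2: average influence of node j on node i
+        D[i, j] <- mean(R[i, ks] - pc)
+      }
+    }
+    D
+  }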
2.2.2 Network Construction
The current standard in psychometric network analysis is to use the least absolute shrinkage and selection operator (lasso; Tibshirani, 1996). In short, the lasso approach penalizes the inverse covariance matrix to produce sparse networks whose connections reflect conditional independence (i.e., nodes that are connected are uniquely associated, given all other nodes in the network). Because there are already a number of other R packages that compute lasso networks (see bootnet, Epskamp et al., 2018; IsingFit, van Borkulo et al., 2014; qgraph, Epskamp et al., 2012), they are not included in NetworkToolbox.

NetworkToolbox does include several other options for constructing a network. The main approach within the package is the Information Filtering Networks approach (IFN; Barfuss et al., 2016). This approach constructs networks based on the zero-order correlations between variables and induces parsimony via a topological (structural) constraint. Currently, this family of networks has three main methods: the Triangulated Maximally Filtered Graph (TMFG; Massara et al., 2016), the Planar Maximally Filtered Graph (PMFG; Tumminello et al., 2005), and the Maximum Spanning Tree (MaST; Chu and Liu, 1965; Edmonds, 1967; Mantegna, 1999). Additionally, the TMFG method can be associated with the inverse covariance matrix via the Local/Global method (LoGo; Aste and Di Matteo, 2017; Barfuss et al., 2016) for probabilistic graphical modeling.

The TMFG() method constructs the network by first sorting the edge weights (i.e., correlations) in descending order. Next, the TMFG algorithm finds the four nodes that have the largest sum of edge weights with all other nodes in the network and connects them, forming a tetrahedron. Then, the algorithm iteratively identifies and adds the node that maximizes the sum of its connections to three of the nodes already included in the network. This process is completed once the network reaches 3n - 6 edges (n = number of nodes). Inherent in this construction, a planar network (a network that can be drawn on a sphere without any edges crossing) is generated. Thus, the topological constraint of the TMFG network is planarity. A simplified sketch of this algorithm is provided below.
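For readers who want to see the greedy construction spelled out, here is a compact sketch of the steps just described. It is a simplified illustration (the function name tmfgSketch is hypothetical, and the seed step simply uses row sums of the weight matrix); TMFG() is the package's full implementation:

R> tmfgSketch <- function(W) {
+    diag(W) <- 0
+    n <- ncol(W)
+    A <- matrix(0, n, n)
+    # seed tetrahedron: the four nodes with the largest total edge weight
+    seed <- order(rowSums(W), decreasing = TRUE)[1:4]
+    A[seed, seed] <- W[seed, seed]
+    faces <- combn(seed, 3, simplify = FALSE)   # four triangular faces
+    rem <- setdiff(1:n, seed)
+    while (length(rem) > 0) {
+      # gain of inserting each remaining node into each triangular face
+      gain <- sapply(faces, function(f)
+        vapply(rem, function(v) sum(W[v, f]), numeric(1)))
+      gain <- matrix(gain, nrow = length(rem))
+      best <- which(gain == max(gain), arr.ind = TRUE)[1, ]
+      v <- rem[best[1]]
+      f <- faces[[best[2]]]
+      A[v, f] <- W[v, f]
+      A[f, v] <- W[f, v]
+      # the used face is replaced by the three new faces containing v
+      faces <- c(faces[-best[2]],
+                 list(c(f[1], f[2], v), c(f[1], f[3], v), c(f[2], f[3], v)))
+      rem <- setdiff(rem, v)
+    }
+    A   # 3n - 6 edges; a planar, chordal network
+  }

Each insertion adds exactly three edges, which is what produces the planarity and chordal structure described next.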
One unique property of networks constructed by the TMFG method is that a chordal network is automatically generated. This means that each 4-clique (i.e., a set of 4 connected nodes) in the network contains a chord, which connects two of the nodes not already connected, forming two triangles. Chordal networks are thus composed of separators, which are the 3-node cliques (i.e., sets of 3 connected nodes; in this case, triangles) that separate one 4-node clique from another. This clique structure can be simplified to a junction tree, where the nodes represent the 4-node cliques and the edges represent the nodes common to two 4-node cliques (i.e., the separators). In the case of discrete variables (e.g., binary data), the covariance matrix can be augmented by including additional terms for the nodes included in the separators, which produces a graphical structure in the inverse covariance matrix (Loh and Wainwright, 2012). Chordal networks are a special class of networks that can represent the independence assumptions of Bayesian (directed) and Markovian (undirected) networks (Koller and Friedman, 2009) and have a simple association with the joint probability distribution (Barfuss et al., 2016; Massara et al., 2016). This association is expressed as:
Q(X) = \frac{\prod_{C \in Cliques} \phi_C(X_C)}{\prod_{S \in Separators} \phi_S(X_S)}    (3)
where Q(X) is the estimated joint probability distribution, and \phi_C(X_C) and \phi_S(X_S) are the marginal probabilities within the subsets of variables of the 4-node cliques (X_C) and 3-node separators (X_S), respectively. This association with the joint probability distribution can be easily implemented with the Local/Global inversion method (LoGo; Barfuss et al., 2016). LoGo() requires a dataset so that the covariance matrix can be estimated to obtain the inverse covariance matrix. The inverse covariance matrix is then associated with the TMFG network structure, using the separators and cliques from the TMFG function. Thus, the only difference between the TMFG and LoGo networks is the edge weights (zero-order correlations and partial correlations given all other nodes, respectively). In this way, the TMFG network embeds conditional independence within its structure. Below are the functions and arguments for the TMFG() and LoGo() functions:

R> TMFG(data, normal = FALSE, weighted = TRUE,
+    depend = FALSE, na.data = c("pairwise", "listwise", "fiml"))
R> LoGo(data, partial = FALSE, normal = FALSE,
+    na.data = c("pairwise", "listwise", "fiml"))

The na.data, normal, and depend arguments have already been discussed. The other arguments include data, which can be either a dataset or a correlation matrix for TMFG(). If a dataset, then it should only include the variables of interest, without any identification or demographic variables (unless, of course, these are of interest). The weighted argument determines whether the network should be weighted (edge weights are equal to correlation strength) or unweighted (edge weights are all equal to 1). For LoGo(), there is a partial argument, which can be used to convert the filtered inverse covariance matrix to partial correlations between nodes given all other nodes in the network. If partial is set to FALSE, then the filtered inverse covariance matrix will be produced.
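For reference, converting a precision (inverse covariance) matrix K into partial correlations follows the standard transformation \rho_{ij} = -K_{ij} / \sqrt{K_{ii} K_{jj}}. A generic sketch is below (the function name prec2pcor is hypothetical); I assume this is the kind of transformation that partial = TRUE applies:

R> prec2pcor <- function(K) {
+    # standardize the negated off-diagonal precision entries
+    P <- -K / sqrt(diag(K) %o% diag(K))
+    diag(P) <- 0
+    P
+  }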
TMFG() outputs a list containing three objects: the TMFG-filtered adjacency matrix (A), separators (separators), and cliques (cliques). LoGo() also outputs a list but contains two objects: the LoGo-filtered adjacency matrix (logo) and the TMFG-filtered adjacency matrix (tmfg). To obtain only the TMFG- or LoGo-filtered networks, the following code can be used:

R> psyTMFG <- TMFG(psy)$A
R> psyLoGo <- LoGo(psy)$logo

NetworkToolbox also includes a more traditional approach to network construction: thresholding the correlation matrix with threshold(). Its arguments are below:

R> threshold(data, a, thresh = c("alpha", "adaptive",
+    "bonferroni", "FDR", "proportional"), n = nrow(data),
+    normal = FALSE, na.data = c("pairwise", "listwise", "fiml"))
There are a few more arguments in threshold() than in the other network construction methods. For example, n is used to set the number of participants in the data, which defaults to the full dataset. This argument is used for setting the critical value when the input for the data argument is a correlation matrix. The a argument is generally used to set α; however, it can be used to set other parameters for the different types of thresholds. Below, I'll elaborate on each threshold method and note how a is used in each.

The type of threshold is set using the thresh argument. "alpha" is the standard α that psychologists know and love; it is also the default. Thus, a directly sets the α. Another basic option is also available: simply select a correlation value (e.g., .15) and threshold() will set all edges below that value to zero; negative correlations greater than this value (e.g., r > -.15) will also be set to zero. Note that a does not need to be set and the desired correlation value can be input directly into thresh. The next most familiar option is probably the Bonferroni correction. Setting thresh to "bonferroni" will adjust the α value set by a based on the number of comparisons made in the network (e.g., for 30 nodes, α = .05 / ((30² - 30)/2) ≈ .0001). The false discovery rate (FDR) can also be applied, which controls the rate of Type I errors and is less stringent than the Bonferroni correction (Benjamini and Hochberg, 1995). When thresh is set to "FDR", the a argument is used to determine the q-value, or the adjusted p-value. For "FDR", a defaults to .10.

Finally, NetworkToolbox offers two unique options: "proportional" and "adaptive". For "proportional", the values are thresholded based on the desired network density (the number of retained edges divided by the number of possible edges). As an example, say there are 30 nodes in the network (435 possible edges); then a network density of .15 would be equal to 65.25 included edges (435 × .15). Because it's impossible to have a quarter of an edge, the value is rounded to zero digits, leaving 65 edges. The edge weights are then sorted in descending order and the 65 largest edges are retained while the others are set to zero. Thus, the number of edges retained is proportional to the number of possible edges. When thresh is set to "proportional", a will set the density (defaults to .15). Therefore, if a certain number of edges is sought (e.g., 100 of 435 edges), then the density should be computed a priori (i.e., 100/435 = .23). In most cases, this proportional value is arbitrary, but better comparability between networks is gained (i.e., the networks have the same number of nodes and the same number of edges; Christensen et al., 2018b).

The "adaptive" option will change α based on sample size according to the formula found in Pérez and Pericchi (2014). This approach changes α so that a nominal α = .05, for example, carries constant meaning across sample sizes. This formula is presented below:
\alpha_{adapt} = \frac{\alpha \times \sqrt{n_0 \times (\log(n_0) + \chi^2_\alpha(1))}}{\sqrt{n_1 \times (\log(n_1) + \chi^2_\alpha(1))}}    (4)
where α in the equation is set by a and manipulated into α_adapt. n₀ is a reference sample for which the researcher can specify the Type I error (α) and Type II error (β). β is set to a default of α × 5, suggesting that Type I errors are 5 times worse than Type II errors (Rouder et al., 2016). The R package pwr (Champely, 2017) is then used to determine the sample size necessary to detect a medium effect size (i.e., r = .3). Thus, in NetworkToolbox, the default n₀ is 74.59823, which is based on α = .05, β = .25 (i.e., power = .75), and r = .3. Similarly, χ²_α(1) defaults to 3.841459. n₁ is the sample size of the data, which in most psychological samples is greater than n₀, so α_adapt will adjust to values lower than .05. If the sample size is less than n₀, then α_adapt will be adjusted to values greater than .05.
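As a quick check on Equation 4, the adaptive α can be computed directly (the function name adaptAlpha is hypothetical; the defaults mirror those just described):

R> adaptAlpha <- function(alpha = .05, n1, n0 = 74.59823) {
+    chi <- qchisq(1 - alpha, df = 1)   # 3.841459 for alpha = .05
+    alpha * sqrt(n0 * (log(n0) + chi)) / sqrt(n1 * (log(n1) + chi))
+  }
R> adaptAlpha(n1 = 802)   # smaller than .05 because n1 > n0

For the psychometric dataset used here (N = 802), the adjusted α is approximately .013, illustrating how the criterion becomes stricter as the sample grows.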
In sum, there are several different threshold types, and each has a particular value below which edges are removed. Thus, it is important for the researcher to obtain not just the thresholded adjacency matrix but also the value used to threshold it. Accordingly, threshold() produces a list that contains the adjacency matrix (A) and the critical value of r that was used to threshold the matrix (r.cv). An example of retrieving both is below:

R> threshNet <- threshold(psy, a = .05)
R> network <- threshNet$A
R> critical.value <- threshNet$r.cv

To visualize a network, qgraph can be used; Figure 1 displays the TMFG network. Here, the facets object assigns each NEO-PI-3 item to its facet for coloring (the rep() call below assumes the items are ordered by facet, eight items each):

R> network <- TMFG(psy)$A
R> facets <- rep(c("actions", "aesthetics", "fantasy",
+    "feelings", "ideas", "values"), each = 8)
R> qgraph(network, layout = "spring", vsize = 4, label.prop = 1,
+    groups = facets, palette = "ggplot2", title = "NEO-PI-3")
Figure 1: Plot of NEO-PI-3's items using the TMFG network construction method.

There are an enormous number of network measures available, and their use depends on the researcher's questions. I'll start with local network characteristics, which have received the most attention in the psychometric network literature. These characteristics measure how each node contributes to the overall network.
2.2.4 Local Network Characteristics
The standard measures used in psychometric network analysis are centrality measures. Centrality measures quantify each node's influence in the network, and each does so differently. The standard measures used in the literature are betweenness centrality (BC), closeness centrality (LC), and node strength (NS). betweenness() measures how often a node is on the shortest path (the smallest number of edges needed to get from one node to another) between any pair of nodes. closeness() measures the inverse of the average distance from one node to all other nodes in the network; nodes higher in LC have the shortest paths to all other nodes. Finally, strength() measures the sum of the edge weights connected to a single node. There is one general argument for these measures: weighted. This argument defaults to TRUE but can be set to FALSE to treat all edge weights equally (i.e., a weight of 1). Both BC and LC output a vector of values for each node in the network. Because these measures are frequently used in the psychometric network literature, I will assume the reader has encountered them before.

There are, of course, a number of other centrality (and local network characteristic) measures. degree() (k), for example, measures how many connections each node has (regardless of strength). In this way, k is just the node's strength if all weights were 1 or the network were unweighted. Notably, both k and NS output a vector of values for each node when the network is undirected. When the network is directed, however, they output a list containing in-degree/strength, out-degree/strength, and relative influence. The in-degree of a node is how many edges point to that node, or how many nodes influence that node. In contrast, the out-degree of a node is how many edges point away from that node, or how many nodes are influenced by that node. Finally, the relative influence (relInf) of a node is the proportion of in-degree and out-degree connections a node has. This equation is written below:

relInf = \frac{outDegree - inDegree}{outDegree + inDegree}    (5)

Nodes with higher relative influence values influence other nodes more than they are influenced by other nodes. Conversely, nodes with lower relative influence values are influenced by other nodes more than they influence other nodes.

One measure that closely resembles centrality measures already used in the literature is the randomized shortest paths betweenness centrality (RSPBC; Kivimäki et al., 2016). This measure differs from the standard BC in that it measures how often a node is on the random shortest path from one node to another, rather than how often a node is on the shortest path (the most direct route) between nodes. Consider this interpretation in a psychopathological network: Do symptoms take the most direct route to other symptoms in the network? Probably not. Do symptoms take random routes to other symptoms in the network? Probably not. The truth is probably somewhere in between. Therefore, a researcher can take the average of the two and likely be closer to the truth than with either measure on its own. Notably, if a node is higher in the standard BC, then it is also likely to be higher in the RSPBC; however, unlike the standard BC, each node will have a value greater than zero. rspbc() computes the RSPBC and has one argument: beta. beta is used to adjust the probability distribution of the Boltzmann distribution; put simply, it adjusts how strongly the random paths influence the measurement of the betweenness centrality (Kivimäki et al., 2016).
Higher beta values will decrease the randomness of the paths, with beta = 10 being closer to the weighted betweenness centrality; lower
beta values will increase the randomness of the paths, with beta = .0001 being closer to degree. The beta argument defaults to .01, which is based on Kivimäki et al.'s (2016) recommendation. Below is how to compute the RSPBC and the average of the standard BC and the RSPBC:

R> bc <- betweenness(psyTMFG)
R> rbc <- rspbc(psyTMFG)
R> averageBC <- (bc + rbc) / 2

NetworkToolbox also provides tools for examining the reliability of the edges in a network. Below are the arguments for the bootstrapped generalization function, bootgen(), and its associated plotting function, bootgenPlot():

R> bootgen(data,
+    method = c("MaST", "PMFG", "TMFG", "LoGo", "threshold"),
+    alpha = .05, n = nrow(data), iter = 100, normal = FALSE,
+    na.data = c("pairwise", "listwise", "fiml"), seeds = NULL, ...)
R> bootgenPlot(object, bootmat = FALSE)
Figure 2: Reliabilities for the orignet matrix. Lower triangle represents unfiltered reliabilities; upper triangle represents the filtered matrix, orignet (left). Edge weight strength regressed on their respective reliabilities for orignet (right).

Beginning with bootgen(), there are some arguments that should be familiar and are used in identical ways as previously discussed: data, normal, and na.data. Other arguments might be familiar but are used in a different way. First, the method argument is used to select the desired network construction method. If no method
is selected, then the argument defaults to "TMFG". Next, the n argument sets the number of cases to be used in the bootstrap, which defaults to the full sample; the iter argument sets the number of iterations of the bootstrap, which defaults to 100. Finally, the seeds argument is used to replicate previous analyses. Without inputting previous seeds, different results will be obtained on each run, because a random number generator changes the seeds each time bootgen() is run. To replicate a previous run of bootgen(), the following code can be used:

R> networkrel <- bootgen(psy)
R> repnetworkrel <- bootgen(psy, seeds = networkrel$Seeds)
R> bootgenPlot(repnetworkrel)

Seeds is an object in the list that bootgen() returns. In these lines of code, the replicated bootgen() analysis was used to plot Figure 2. Notably, I've not yet discussed two arguments: alpha and .... This is because these two arguments belong to a novel bootstrapping technique that is used to overcome the topological constraints of the IFN network construction methods. I'll describe the novel method next and then discuss its arguments.
Bootstrapped Network Generalization

Building on Tumminello et al. (2007), the method uses the same bootstrap technique, which samples with replacement and applies the selected network construction method to each new sample. Then, the correlations that are retained in these networks are transformed into Fisher's z so that the mean of each edge's sampling distribution (e.g., across 1,000 bootstrapped networks) can be derived. The result is a matrix in which each element is the edge's mean weight across all bootstrapped networks. In this way, each element in the matrix provides an estimate of each edge weight that is penalized based on its reliability in the network. Therefore, edges that appear frequently across the bootstrapped networks have a small penalty (and none at all if they appear in every network), whereas edges that appear infrequently shrink proportionally to their reliability. A conceptual sketch of this penalizing step is provided after the example below.

The final step is removing extremely small edges from the network; it's likely that nearly every edge will have some weight, but that doesn't mean it should be retained in the network. This is where the argument alpha comes in. alpha is used to set the threshold to remove edges from the network. This is done via the adaptive α described earlier with the threshold() function. As before, "adaptive" changes the statistical significance value based on the size of the sample. alpha defaults to .05. The last argument, ..., is used for additional arguments to be passed to the network construction methods. The reader might notice that many arguments overlap between the threshold() network construction method and bootgen(). Below is an example demonstrating how to perform this analysis using the threshold network construction method:

R> threshBoot <- bootgen(psy, method = "threshold")

The bootgen plots in Figure 3 can then be produced with:

R> bootgenPlot(networkrel, bootmat = TRUE)
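To make the penalizing step concrete, here is a conceptual sketch of how averaging Fisher's z edge weights across bootstrapped networks shrinks unreliable edges (bootMean is a hypothetical name; bootgen() handles this internally, along with the adaptive α removal):

R> bootMean <- function(nets) {   # nets: list of bootstrapped adjacency matrices
+    # absent edges enter the mean as z = 0, which shrinks unreliable edges
+    zs <- lapply(nets, function(A) atanh(A))
+    tanh(Reduce(`+`, zs) / length(zs))
+  }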
Figure 3: Reliabilities for the bootgen matrix. Lower triangle represents unfiltered reliabilities; upper triangle represents the filtered matrix, bootgen (left). Edge weight strength regressed on their respective reliabilities for bootgen (right).

The plots' outputs are labeled with reference to whether they were for the "original" network, which refers to the orignet adjacency matrix, or for the "bootgen" network, which refers to the bootmat adjacency matrix. Notably, the plot on the right in Figure 3 shows that the regression of bootgen's edge weight strength on reliability had a much larger correlation than orignet's (plot on the right in Figure 2). This is because the edge weights (i.e., correlations) that were less reliable were penalized for their lack of reliability. Therefore, edge weights were essentially shrunk toward their linear relation with reliability. In addition, the bootstrapped network generalization method also removed some edge weights that were less reliable and added other weights that were reliable. As depicted in Figure 4, the bootgen network retained zero edges with a reliability of less than .25, while the original network retained several.
Figure 4: The reliability frequency of edges retained in the original and bootgen networks.
The aim of the bootstrapped network generalization method is two-fold: to increase the specificity of the edges retained in the network (i.e., reduce false positives) and to relax the topological constraint of the IFN approach. Although future simulation studies are needed to fully determine how well these aims are met, preliminary results are satisfactory. As for the latter aim, the bootgen network retains many of the same connections as the original network (i.e., the TMFG) but also includes other connections that violate the planarity constraint (Figure 5).
Figure 5: Plot of bootgen network.
Edge Replication and k-fold Validation

NetworkToolbox includes other network methods that can be applied to assess the reliability of psychological network estimation. The first is edgerep(), which estimates how many edges replicate between two cross-sectional samples (or a single sample split in half). edgerep() has
four arguments: A, B, corr, and plot. Both A and B take filtered network adjacency matrices. If one or both matrices are asymmetric (i.e., directed networks), then edgerep() will automatically make the asymmetric matrix symmetric and produce a warning letting the researcher know that this has been done. The corr argument allows the researcher to specify the type of correlation used to assess the associations between replicated edge weights. Options include "pearson", "spearman", and "kendall"; the default is "pearson". plot defaults to FALSE but can be set to TRUE to produce a matrix corresponding to the edges that replicated and their respective edge weights (network A on the lower triangle and network B on the upper triangle).

edgerep() outputs a list containing several objects: replicatedEdges, replicated, totalEdges, percentage, density, meanDifference, sdDifference, and correlation. replicatedEdges is a matrix of all the edges that replicated and their corresponding edge weights in their respective networks. replicated is the number of edges that replicated between the two samples, meanDifference is the average difference of the replicated edge weights, sdDifference is the standard deviation of the differences of the replicated edge weights, and correlation is the specified correlation between the replicated edge weights. For totalEdges, percentage, and density, a value is produced for each network, denoted with the associated network input (i.e., A and B).

Instead of collecting two different samples to examine edge replicability, a researcher could perform a split-half reliability (2-fold) or k-fold analysis to determine how many edges replicate in a smaller sample. To perform such analyses, splitsamp() can automatically split the sample into the desired number of different samples. The arguments for splitsamp() are below:

R> splitsamp(data, samples = 10, splits = 2)

The splits argument is the number of splits (or folds) to introduce into the sample (e.g., 2 = in half, 3 = in thirds, 4 = in fourths, etc.); the default is 2 (split in half). The samples argument is the number of samples desired for the splits; the default is 10. splitsamp() outputs a list that contains the training samples (trainSample), the testing samples (testSample), and their respective sample sizes (trainSize and testSize). Some a priori consideration should be given to how to split the sample. Although splitsamp() will handle uneven sample splits, the samples may not be split in the way the researcher desires. In most cases, split-half sampling is the optimal option.

There are two functions that can be used to perform the split-half reliability analysis. The first constructs networks from the training and testing datasets:

R> splitsampNet(object,
+    method = c("MaST", "PMFG", "TMFG", "LoGo", "threshold"),
+    bootmat = FALSE, seeds = NULL, pB = TRUE, ...)

splitsampNet() has a few arguments. The first is object, which is the sample list output from splitsamp(). The method argument allows the researcher to select the network construction method to be performed on the samples (defaults to "TMFG"). bootmat is used to apply the bootstrapped network generalization method when constructing the networks (defaults to FALSE). Like before, seeds can be used to supply the same seeds that were used in the original bootgen()
network (defaults to NULL). pB is used to set the progress bar and defaults to TRUE. Finally, ... allows additional arguments to be passed to the network construction methods. splitsampNet() outputs a list containing two objects (train and test). Each object contains the networks associated with the selected network construction method and sample assignment (10 networks each, by default). Next, statistics can be computed between the corresponding samples:

R> splitsampStats(object, stats = c("edges", "centralities"),
+    edges = c("lower", "upper"))

splitsampStats() will perform statistics on the edges and centralities of each network. The first argument is object, which should be the output from splitsampNet(). The next argument, stats, can be set to "edges", "centralities", or both (c("edges", "centralities")). The last argument, edges, can be used for networks that contain different numbers of edges: "lower" will output the lower percentage of the edges that replicated between the two samples, and "upper" will output the higher percentage. When stats = "edges", edgerep() is used to perform the edge replication analyses, and its output is contained in a list called Edges. When stats = "centralities", the BC, LC, and strength network measures are computed and correlated between the two samples, and the correlations are output in a list called Centralities. A start-to-finish example is provided below.
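Putting the three functions together, a split-half edge replication analysis of the psychometric data might look like the following (a usage sketch based on the defaults described above):

R> samps <- splitsamp(psy, samples = 10, splits = 2)
R> nets <- splitsampNet(samps, method = "TMFG")
R> stats <- splitsampStats(nets, stats = c("edges", "centralities"))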
2.3 Brain Network Analysis
There are a few R packages that will perform network analysis on brain connectivity matrices. brainGraph (Watson, 2017), for example, can run numerous analyses that neuroscience researchers are already familiar with, such as general linear models (GLM), permutation testing for group differences, computation of global and local network characteristics, and the network-based statistic (NBS; Zalesky et al., 2010). Similarly, brainwaver (Achard, 2012) offers other unique analyses, such as wavelet decomposition and attacks on networks (similar to percolation analyses; Kenett et al., 2018). Therefore, I will only highlight the unique features that NetworkToolbox offers.
2.3.1 Importing Matlab Data
First, preprocessed brain data must be read into R. There are generally two options a researcher can use to accomplish this. If the Matlab file is output from Matlab's CONN Connectivity Toolbox (Rubinov and Sporns, 2010), then the researcher can use convertConnBrainMat() in NetworkToolbox, which will import the correlation (corrmat) and Fisher's z (zmat) brain connectivity arrays. If the researcher simply has a Matlab file that contains the correlation (or Fisher's z) matrices for each participant, then the researcher can use readMat() in R.matlab (Bengtsson, 2016) to input the array. Note that the arrays should be formatted as a node-by-node matrix (n × n) for each participant (m), that is, n × n × m. Examples of these two options are provided below (the file name is illustrative):

R> neuralarray <- convertConnBrainMat()
R> neuralarray <- readMat("connectivityMatrices.mat")

Imported networks can be converted into igraph's format, either a single network (neural = FALSE) or an entire array of networks (neural = TRUE), which returns a list:

R> convert2igraph(A, neural = FALSE)
R> brainNetList <- convert2igraph(neuralarray, neural = TRUE)

Each participant's network can also be filtered with neuralnetfilter():

R> neuralnetfilter(neuralarray, method = c("TMFG", "MaST",
+    "ECOplusMaST", "ECO", "threshold"), progBar = TRUE, ...)

The arguments used in neuralnetfilter() are mostly familiar, with neuralarray referring to a brain connectivity array and ... specifying additional arguments to be passed to the network construction methods (e.g., a = .05 for threshold()). The filtered brain network array can also be converted into igraph's format for convenient use in brainGraph.
2.3.3 Connectome-based Predictive Modeling
The most notable feature of the brain network analyses provided in NetworkToolbox is Connectome-based Predictive Modeling (CPM; Finn et al., 2015; Rosenberg et al., 2016; Shen et al., 2017). In general, CPM is a cross-validation approach for discovering significant connections between regions of interest (ROIs) in the whole brain that are related to a behavioral variable. The approach can be applied to both resting-state and task-based fMRI data and has been successful in predicting functional connectivity profiles related to fluid intelligence (Finn et al., 2015), sustained attention (Rosenberg et al., 2016, 2018), personality (Hsu et al., 2018), and creativity (Beaty et al., 2018b). There are several methods in the CPM approach: internal validation, fingerprinting, and external validation. Within each method there are options to include permutation testing and covariates; however, not all of these options have been programmed into NetworkToolbox (yet). In addition, more options are likely to be included in these methods in the future. For the purposes of this manual, I will only focus on the internal validation method. Before diving in, I refer the interested reader to Finn et al. (2015), Rosenberg et al. (2016), and Shen et al. (2017) for a more elaborate explanation of the approach and the methods included.

The method that I'll discuss is the internal validation method (cpmIV(); for a more detailed explanation, see Shen et al., 2017). The internal validation method performs a leave-one-out regression analysis to predict the score of a behavioral variable, which is then correlated with the observed score of the behavioral variable. More specifically, the process begins by iteratively removing one participant (the test participant) from the brain and behavioral datasets. The leftover participants are used to train the regression model (the training participants). Then, the behavioral variable is correlated with each edge in the brain connectivity matrix (i.e., each edge across the training participants, one value per participant). From this, a matrix is obtained that corresponds to the correlation strength of the behavioral variable with each edge in the connectivity matrix. A threshold is then applied to remove non-significant edges (usually α = .01), and the remaining positive (edges positively associated with the behavioral variable) and negative (edges negatively associated with the behavioral variable) edges are separated into positive and negative matrices and binarized (1 = included, 0 = not included).
Using the positive and negative matrices, the included edges are each multiplied by the weights of the training participants and summed (or averaged). These summarized edge weight statistics are then individually (i.e., positive and negative separately) used to fit linear regression models. Then, using the slope and intercept coefficients from the linear models fit to the training participants, the test participant's connectivity matrix is input into the regression equation to produce a predicted behavior score for that participant. This process is repeated so that each participant is used as a test participant, yielding a brain-connectivity-predicted behavior score for each participant. The predicted scores are then regressed on (correlated with) the observed scores.

A positive correlation with the positive connectivity profile is interpreted as people higher in the behavior having increased connectivity in these specific connections. Conversely, a positive correlation with the negative connectivity profile is interpreted as people higher in the behavior having decreased connectivity in these specific connections. Negative correlations are a bit more challenging to interpret. A negative correlation with the positive connectivity profile suggests that as the observed score increases, the connectivity profile predicts that the behavior actually decreases, meaning increased connectivity in these specific connections predicts the opposite of the observed score. In contrast, a negative correlation with the negative connectivity profile suggests that as the observed score increases, the connectivity profile predicts that the behavior actually decreases, meaning decreased connectivity in these specific connections predicts the opposite of the observed score. A compressed sketch of this procedure is provided after the example below. Below are the arguments for cpmIV():

R> cpmIV(neuralarray, bstat, covar, thresh = 0.01,
+    method = c("mean", "sum"),
+    model = c("linear", "quadratic", "cubic"),
+    corr = c("pearson", "spearman"), shen = FALSE, progBar = TRUE)

The arguments of cpmIV() refer to the brain connectivity array (neuralarray) that is associated with a behavioral measure (bstat). If a researcher wishes to include a covariate (or many), then the covar argument can be set up as a list (e.g., covar = list(age, sex)). The thresh argument specifies the threshold to be used when selecting edges that are significantly associated with the behavior (defaults to .01). The method argument can be set to either "mean" or "sum" for the summarized statistic used in the regression model (defaults to "mean"). model is used to specify whether the regression equation should be "linear", "quadratic", or "cubic" (defaults to "linear"). Notably, the regression model does not affect which edges are selected but rather the fit of the behavioral prediction; this can lead to increases in the correlation between the observed and predicted scores. The corr argument allows the researcher to select the type of correlation applied for edge selection (defaults to "pearson"). Below is an example using the resting-state brain data mentioned earlier (brain) and each participant's averaged NEO-PI-3 Openness to Experience score (behav):

R> openCPM <- cpmIV(brain, bstat = behav)

The resulting positive and negative connectivity masks can then be exported (e.g., for visualization in the BioImage Suite Connectivity Viewer):

R> write.table(openCPM$negMask, "OpenNegMask.txt",
+    row.names = FALSE, col.names = FALSE)
R> write.table(openCPM$posMask, "OpenPosMask.txt",
+    row.names = FALSE, col.names = FALSE)
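To spell out the internal validation steps described above, here is a compressed leave-one-out sketch for the positive network only (cpmSketch is a hypothetical name, and the p-values use the standard t approximation for Pearson's r); cpmIV() is the package's full implementation:

R> cpmSketch <- function(neuralarray, bstat, thresh = .01) {
+    m <- length(bstat)
+    pred <- numeric(m)
+    for (p in seq_len(m)) {
+      train <- setdiff(seq_len(m), p)
+      b <- bstat[train]
+      # correlate the behavior with each edge across training participants
+      r <- apply(neuralarray[, , train], c(1, 2), function(e) cor(e, b))
+      r[!is.finite(r)] <- 0
+      df <- length(train) - 2
+      pval <- 2 * pt(abs(r) * sqrt(df / (1 - r^2)), df, lower.tail = FALSE)
+      mask <- (r > 0) & (pval < thresh)   # binarized positive mask
+      # summary statistic: mean of the selected edges for every participant
+      sumstat <- apply(neuralarray, 3, function(A) mean(A[mask]))
+      fit <- lm(b ~ sumstat[train])
+      pred[p] <- coef(fit)[1] + coef(fit)[2] * sumstat[p]
+    }
+    cor(pred, bstat)   # predicted vs. observed behavior
+  }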
Figure 9: Negative connectivity profile map of within and between macroscale lobe connections.
Figure 10: The top and left views of the negative connectivity profile's glass brain.
One useful output from the BioImage Suite Connectivity Viewer is a list of the top 10 degree nodes (the ROIs with the most connections). Below, I've provided a table with the top 10 degree ROIs, their associated macroscale region (lobe), their Montreal Neurological Institute (MNI) coordinates, and their number of connections (k) for the negative connectivity profile:
ROI                            Lobe          MNI                   k
L Orbitofrontal Cortex         Prefrontal    -18.2, 19, -21        10
R Ventral Striatum             Subcortical   -12.6, 20.2, -0.7     8
L Caudate Nucleus              Subcortical   -14.6, -3.5, 21       7
R Cerebellum                   Cerebellum    -6.1, -50.7, -12.3    6
R Hippocampus                  Limbic        28.8, -36.9, 0        5
L Medial Prefrontal Cortex     Prefrontal    -5.4, 29.1, -10.1     5
L Amygdala                     Limbic        -21.4, -4.1, -29.4    5
L Anterior Cingulate Cortex    Limbic        -6, 34.1, 26.3        4
L Cerebellum                   Cerebellum    -6.5, -50.2, -11.4    4
R Cerebellum                   Cerebellum    21.1, -54.8, -23.8    4
Based on the plots and the top 10 degree nodes, I see that most of the connections were between the prefrontal, subcortical, cerebellum, and limbic macroscale regions of the brain. Observed scores were positively correlated with the predicted scores in the negative prediction profile, suggesting that people higher in Openness to Experience had decreased connectivity in these regions.
3 Summary
Network science is a rapidly developing statistical approach in psychology. The appeal of the approach is the intuitive modeling of the complexity that is intrinsic to many psychological phenomena. NetworkToolbox is designed to equip the psychological researcher with network tools that have been developed over decades of work in graph theory and the complex systems domains. In this article, many functions of NetworkToolbox were introduced with relevant details for appropriate use. The code examples used throughout this article can serve as a blueprint for any researcher interested in running their own psychological network analysis. Notably, not all of the functions available in NetworkToolbox were discussed; however, documentation with examples is provided within the package, and many of the arguments discussed in this article are used across the package. Moreover, NetworkToolbox is not exhaustive; there are many other methods and measures for constructing, analyzing, and quantifying networks. Because of this, NetworkToolbox will continue to expand to equip psychological researchers with the newest advancements in the domain.
4 Computational Details
The results of this paper were obtained using R 3.4.4 with the NetworkToolbox 1.1.1 package, and the networks were visualized using the qgraph 1.4.4 package. Importing Matlab files into R can be achieved using the R package R.matlab 3.6.1. R itself and
all packages used are available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/.
Acknowledgements

I would like to thank Paul Silvia and Yoed Kenett for their insightful contributions throughout the development of NetworkToolbox. I'd also like to thank Roger Beaty for his involvement in the collection and analysis of the brain data provided in the package. Finally, I'd like to thank Monica Rosenberg for her fruitful feedback on the implementation of CPM.
References

Achard, S. 2012. brainwaver: Basic Wavelet Analysis of Multivariate Time Series with a Visualisation and Parametrisation using Graph Theory. R package version 1.6.

Achard, S. and E. Bullmore 2007. Efficiency and cost of economical brain functional networks. PLoS Computational Biology, 3(2):e17.

Aste, T. and T. Di Matteo 2017. Causality network retrieval from short time series. arXiv preprint arXiv:1706.01954.
Aste, T., T. Di Matteo, and S. Hyde 2005. Complex networks on hyperbolic surfaces. Physica A: Statistical Mechanics and its Applications, 346(1-2):20–26.

Barabási, A.-L. 2011. The network takeover. Nature Physics, 8(1):14–16.

Barfuss, W., G. P. Massara, T. Di Matteo, and T. Aste 2016. Parsimonious modeling with information filtering networks. Physical Review E, 94(6):062306.

Beaty, R. E., Q. Chen, A. P. Christensen, J. Qiu, P. J. Silvia, and D. L. Schacter 2018a. Brain networks of the imaginative mind: Dynamic functional connectivity of default and cognitive control networks relates to openness to experience. Human Brain Mapping, 39(2):811–821.

Beaty, R. E., Y. N. Kenett, A. P. Christensen, M. D. Rosenberg, M. Benedek, Q. Chen, A. Fink, J. Qiu, T. R. Kwapil, M. J. Kane, et al. 2018b. Robust prediction of individual creative ability from brain functional connectivity. Proceedings of the National Academy of Sciences, 201713532:1–6.

Bengtsson, H. 2016. R.matlab: Read and Write MAT Files and Call MATLAB from within R. R package version 3.6.1.
Benjamini, Y. and Y. Hochberg 1995. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B (Methodological), Pp. 289–300.

Borsboom, D. 2017. A network theory of mental disorders. World Psychiatry, 16(1):5–13.

Borsboom, D. and A. O. Cramer 2013. Network analysis: An integrative approach to the structure of psychopathology. Annual Review of Clinical Psychology, 9:91–121.

Bullmore, E. and O. Sporns 2012. The economy of brain network organization. Nature Reviews Neuroscience, 13(5):336–349.

Champely, S. 2017. pwr: Basic Functions for Power Analysis. R package version 1.2-1.

Christensen, A. P. 2017. Remotely close associations: Openness to experience and semantic memory structure. Master's thesis, The University of North Carolina at Greensboro.

Christensen, A. P., K. N. Cotter, and P. J. Silvia 2018a. Reopening openness to experience: A network analysis of four openness to experience inventories. Journal of Personality Assessment.

Christensen, A. P., Y. N. Kenett, T. Aste, P. J. Silvia, and T. R. Kwapil 2018b. Network structure of the Wisconsin Schizotypy Scales-Short Forms: Examining psychometric network filtering approaches. Behavior Research Methods, Online First:1–20.

Chu, Y. J. and T. H. Liu 1965. On the shortest arborescence of a directed graph. Science Sinica, 14:1396–1400.

Open Science Collaboration 2015. Estimating the reproducibility of psychological science. Science, 349(6251):aac4716.
Cotter, K. N., A. P. Christensen, and P. J. Silvia 2018. Understanding inner music: A dimensional approach to musical imagery. Under review.

Cramer, A. O., S. Sluis, A. Noordhof, M. Wichers, N. Geschwind, S. H. Aggen, K. S. Kendler, and D. Borsboom 2012. Dimensions of normal personality as networks in search of equilibrium: You can't like parties if you don't like people. European Journal of Personality, 26(4):414–431.

Csardi, G. and T. Nepusz 2006. The igraph software package for complex network research. InterJournal, Complex Systems, 1695(5):1–9.
DeYoung, C. G., L. C. Quilty, and J. B. Peterson 2007. Between facets and domains: 10 aspects of the Big Five. Journal of Personality and Social Psychology, 93(5):880–896.

Dimitriadis, S. I., C. Salis, I. Tarnanas, and D. E. Linden 2017. Topological filtering of dynamic functional brain networks unfolds informative chronnectomics: A novel data-driven thresholding scheme based on orthogonal minimal spanning trees (OMSTs). Frontiers in Neuroinformatics, 11:1–28.

Edmonds, J. 1967. Optimum branchings. Journal of Research of the National Bureau of Standards, 71:233–240.

Epskamp, S., D. Borsboom, and E. I. Fried 2018. Estimating psychological networks and their accuracy: A tutorial paper. Behavior Research Methods, 50(1):195–212.

Epskamp, S., A. O. Cramer, L. J. Waldorp, V. D. Schmittmann, D. Borsboom, et al. 2012. qgraph: Network visualizations of relationships in psychometric data. Journal of Statistical Software, 48(4):1–18.

Epskamp, S. and E. I. Fried in press. A tutorial on regularized partial correlation networks. Psychological Methods.

Fallani, F. D. V., V. Latora, and M. Chavez 2017. A topological criterion for filtering information in complex brain networks. PLoS Computational Biology, 13(1):e1005305.

Finn, E. S., X. Shen, D. Scheinost, M. D. Rosenberg, J. Huang, M. M. Chun, X. Papademetris, and R. T. Constable 2015. Functional connectome fingerprinting: Identifying individuals using patterns of brain connectivity. Nature Neuroscience, 18(11):1664–1671.

Forbes, M. K., A. G. C. Wright, K. E. Markon, and R. F. Krueger 2017. Evidence that psychopathology symptom networks have limited replicability. Journal of Abnormal Psychology, 126(7):969–988.

Forbes, M. K., A. G. C. Wright, K. E. Markon, and R. F. Krueger 2018. On the reliability and replicability of psychopathology network characteristics. Under review.

Fried, E. I. and A. O. Cramer 2017. Moving forward: Challenges and directions for psychopathological network theory and methodology. Perspectives on Psychological Science, 12(6):999–1020.

Goodreau, S. M., M. S. Handcock, D. R. Hunter, C. T. Butts, and M. Morris 2008. A statnet tutorial. Journal of Statistical Software, 24(9):1–27.

Grömping, U. et al. 2006. Relative importance for linear regression in R: The package relaimpo. Journal of Statistical Software, 17(1):1–27.
Hsu, W.-T., M. D. Rosenberg, D. Scheinost, R. T. Constable, and M. M. Chun 2018. Resting-state functional connectivity predicts neuroticism and extraversion in novel individuals. Social Cognitive and Affective Neuroscience.

Jacob, Y., Y. Winetraub, G. Raz, E. Ben-Simon, H. Okon-Singer, K. Rosenberg-Katz, T. Hendler, and E. Ben-Jacob 2016. Dependency network analysis (DEPNA) reveals context related influence of brain network nodes. Scientific Reports, 6:27444.

Johnson, J. W. and J. M. LeBreton 2004. History and use of relative importance indices in organizational research. Organizational Research Methods, 7(3):238–257.

Jones, P. 2018. networktools: Tools for Identifying Important Nodes in Networks. R package version 1.1.1.

Joyce, K. E., P. J. Laurienti, J. H. Burdette, and S. Hayasaka 2010. A new measure of centrality for brain networks. PLoS One, 5(8):e12200.

Kenett, D. Y., M. Tumminello, A. Madi, G. Gur-Gershgoren, R. N. Mantegna, and E. Ben-Jacob 2010. Dominating clasp of the financial sector revealed by partial correlation analysis of the stock market. PLoS One, 5(12):e15032.

Kenett, Y. N., D. Y. Kenett, E. Ben-Jacob, and M. Faust 2011. Global and local features of semantic networks: Evidence from the Hebrew mental lexicon. PLoS One, 6(8):e23912.

Kenett, Y. N., O. Levy, D. Y. Kenett, H. E. Stanley, M. Faust, and S. Havlin 2018. Flexibility of thought in high creative individuals represented by percolation analysis. Proceedings of the National Academy of Sciences, 201717362.

Kenett, Y. N., D. Wechsler-Kashi, D. Y. Kenett, R. G. Schwartz, E. Ben Jacob, and M. Faust 2013. Semantic organization in children with cochlear implants: Computational analysis of verbal fluency. Frontiers in Psychology, 4:543.

Kivimäki, I., B. Lebichot, J. Saramäki, and M. Saerens 2016. Two betweenness centrality measures based on randomized shortest paths. Scientific Reports, 6:19668.

Koller, D. and N. Friedman 2009. Probabilistic Graphical Models: Principles and Techniques. MIT Press.

Korkmaz, S., D. Goksuluk, and G. Zararsiz 2014. MVN: An R package for assessing multivariate normality. The R Journal, 6(2):151–162.

Kotov, R., R. F. Krueger, D. Watson, T. M. Achenbach, R. R. Althoff, R. M. Bagby, T. A. Brown, W. T. Carpenter, A. Caspi, L. A. Clark, et al. 2017. The Hierarchical Taxonomy of Psychopathology (HiTOP): A dimensional alternative to traditional nosologies. Journal of Abnormal Psychology, 126(4):454–477.
Kwapil, T. R., N. Barrantes-Vidal, and P. J. Silvia 2008. The dimensional structure of the Wisconsin Schizotypy Scales: Factor identification and construct validity. Schizophrenia Bulletin, 34(3):444–457.

Lee, K. and M. C. Ashton 2016. Psychometric properties of the HEXACO-100. Assessment, 1073191116659134:1–15.

Lefort-Besnard, J., D. S. Bassett, J. Smallwood, D. S. Margulies, B. Derntl, O. Gruber, A. Aleman, R. Jardri, G. Varoquaux, B. Thirion, et al. 2018. Different shades of default mode disturbance in schizophrenia: Subnodal covariance estimation in structure and function. Human Brain Mapping.

Loh, P.-L. and M. J. Wainwright 2012. Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses. In Advances in Neural Information Processing Systems, Pp. 2087–2095.

Ly, A., J. Verhagen, and E.-J. Wagenmakers 2016. Harold Jeffreys's default Bayes factor hypothesis tests: Explanation, extension, and application in psychology. Journal of Mathematical Psychology, 72:19–32.

Mantegna, R. N. 1999. Hierarchical structure in financial markets. The European Physical Journal B: Condensed Matter and Complex Systems, 11(1):193–197.

Markon, K. E., R. F. Krueger, and D. Watson 2005. Delineating the structure of normal and abnormal personality: An integrative hierarchical approach. Journal of Personality and Social Psychology, 88(1):139–157.

Massara, G. P., T. Di Matteo, and T. Aste 2016. Network filtering for big data: Triangulated maximally filtered graph. Journal of Complex Networks, 5(2):161–178.

McCrae, R. R. 2015. A more nuanced view of reliability: Specificity in the trait hierarchy. Personality and Social Psychology Review, 19(2):97–112.

McCrae, R. R., P. T. Costa, Jr, and T. A. Martin 2005. The NEO-PI-3: A more readable revised NEO Personality Inventory. Journal of Personality Assessment, 84(3):261–270.

McCrae, R. R. and P. T. Costa, Jr 2007. Brief versions of the NEO-PI-3. Journal of Individual Differences, 28(3):116–128.

Pérez, M.-E. and L. R. Pericchi 2014. Changing statistical significance with the amount of information: The adaptive α significance level. Statistics & Probability Letters, 85:20–24.

Pozzi, F., T. Di Matteo, and T. Aste 2013. Spread of risk across financial markets: Better to invest in the peripheries. Scientific Reports, 3:1665.
R Core Team 2017. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Revelle, W. 2017. psych: Procedures for Psychological, Psychometric, and Personality Research. Northwestern University, Evanston, Illinois. R package version 1.7.8.

Robinaugh, D. J., N. J. LeBlanc, H. A. Vuletich, and R. J. McNally 2014. Network analysis of persistent complex bereavement disorder in conjugally bereaved adults. Journal of Abnormal Psychology, 123(3):510–522.

Rosenberg, M. D., E. S. Finn, D. Scheinost, X. Papademetris, X. Shen, R. T. Constable, and M. M. Chun 2016. A neuromarker of sustained attention from whole-brain functional connectivity. Nature Neuroscience, 19(1):165–171.

Rosenberg, M. D., W.-T. Hsu, D. Scheinost, R. T. Constable, and M. M. Chun 2018. Connectome-based models predict separable components of attention in novel individuals. Journal of Cognitive Neuroscience, (Early Access):1–14.

Rouder, J. N., R. D. Morey, J. Verhagen, J. M. Province, and E.-J. Wagenmakers 2016. Is there a free lunch in inference? Topics in Cognitive Science, 8(3):520–547.

Rubinov, M. and O. Sporns 2010. Complex network measures of brain connectivity: Uses and interpretations. Neuroimage, 52(3):1059–1069.

Schmittmann, V. D., A. O. Cramer, L. J. Waldorp, S. Epskamp, R. A. Kievit, and D. Borsboom 2013. Deconstructing the construct: A network perspective on psychological phenomena. New Ideas in Psychology, 31(1):43–53.

Shen, X., E. S. Finn, D. Scheinost, M. D. Rosenberg, M. M. Chun, X. Papademetris, and R. T. Constable 2017. Using connectome-based predictive modeling to predict individual behavior from brain connectivity. Nature Protocols, 12(3):506–518.

Shen, X., F. Tokoglu, X. Papademetris, and R. T. Constable 2013. Groupwise whole-brain parcellation from resting-state fMRI data for network node identification. Neuroimage, 82:403–415.

Song, W.-M., T. Di Matteo, and T. Aste 2011. Nested hierarchies in planar graphs. Discrete Applied Mathematics, 159(17):2135–2146.
Song, W.-M., T. Di Matteo, and T. Aste 2012. Hierarchical information clustering by means of topologically embedded graphs. PLoS One, 7(3):e31929.

Stam, C., P. Tewarie, E. van Dellen, E. van Straaten, A. Hillebrand, and P. van Mieghem 2014. The trees and the forest: Characterization of complex brain networks with minimum spanning trees. International Journal of Psychophysiology, 92(3):129–138.
Tibshirani, R. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B (Methodological), Pp. 267–288.

Tumminello, M., T. Aste, T. Di Matteo, and R. N. Mantegna 2005. A tool for filtering information in complex systems. Proceedings of the National Academy of Sciences, 102(30):10421–10426.

Tumminello, M., C. Coronnello, F. Lillo, S. Micciche, and R. N. Mantegna 2007. Spanning trees and bootstrap reliability estimation in correlation-based networks. International Journal of Bifurcation and Chaos, 17(7):2319–2329.

van Borkulo, C., L. Boschloo, D. Borsboom, B. W. Penninx, L. J. Waldorp, and R. A. Schoevers 2015. Association of symptom network structure with the course of depression. JAMA Psychiatry, 72(12):1219–1226.

van Borkulo, C. D., D. Borsboom, S. Epskamp, T. F. Blanken, L. Boschloo, R. A. Schoevers, and L. J. Waldorp 2014. A new method for constructing networks from binary data. Scientific Reports, 4:5918.

van Borkulo, C. D., L. Boschloo, J. J. Kossakowski, P. Tio, R. A. Schoevers, D. Borsboom, and L. J. Waldorp 2018. Comparing network structures on three aspects: A permutation test. Under review.

van den Heuvel, M. P., S. C. de Lange, A. Zalesky, C. Seguin, B. T. Yeo, and R. Schmidt 2017. Proportional thresholding in resting-state fMRI functional connectivity networks and consequences for patient-control connectome studies: Issues and recommendations. Neuroimage, 152:437–449.

van Wijk, B. C., C. J. Stam, and A. Daffertshofer 2010. Comparing brain networks of different size and connectivity density using graph theory. PLoS One, 5(10):e13701.

Wagenmakers, E.-J., J. Verhagen, and A. Ly 2016. How to quantify the evidence for the absence of a correlation. Behavior Research Methods, 48(2):413–426.

Watson, C. G. 2017. brainGraph: Graph Theory Analysis of Brain MRI Data. R package version 1.0.0.

Winterstein, B. P., P. J. Silvia, T. R. Kwapil, J. C. Kaufman, R. Reiter-Palmon, and B. Wigert 2011. Brief assessment of schizotypy: Developing short forms of the Wisconsin Schizotypy Scales. Personality and Individual Differences, 51(8):920–924.

Woo, S. E., O. S. Chernyshenko, A. Longley, Z.-X. Zhang, C.-Y. Chiu, and S. E. Stark 2014. Openness to experience: Its lower level structure, measurement, and cross-cultural equivalence. Journal of Personality Assessment, 96(1):29–45.
Zalesky, A., A. Fornito, and E. T. Bullmore 2010. Network-based statistic: Identifying differences in brain networks. Neuroimage, 53(4):1197–1207.