
Strother SC, Sarraf S, Grady C. A Hierarchy of Cognitive Brain Networks Revealed by Multivariate Performance Metrics. Proc. 48th ASILOMAR Conference on Signals, Systems and Computers, pp. 603-607, IEEE, Pacific Grove, USA, Nov. 2014

A Hierarchy of Cognitive Brain Networks Revealed by Multivariate Performance Metrics

Stephen C. Strother

Saman Sarraf

Cheryl Grady

Rotman Research Institute, Baycrest, 3560 Bathurst Street, Toronto, ON M6A 2E1, Canada



Abstract: To evaluate discriminant models in fMRI data, we introduce the pseudo-Receiver Operating Characteristic plot defined by subsampled, split-half measures of prediction (P) versus spatial pattern reproducibility (R). We illustrate (P, R) plots using fMRI data denoised by retaining 10%-100% of the components from a first-level principal component analysis (PCA). A linear discriminant (LD) model is then regularized in split-half subsamples with second-level PCAs that retain the first Q PCs, ordered from largest to smaller variance. We show that the resulting Z-scored LD spatial maps, with monotonically increasing P and Q, reflect regionally dependent hierarchies of underlying brain networks adapted to meet particular task demands.

I. INTRODUCTION

One of the central goals of fMRI research on the human brain is the extraction of reliable, predictive patterns of task-dependent spatial regions that may reflect the brain networks underlying human behavior. Such regional patterns are reflected in the meta-analytic summaries of regional locations found to be active across a broad range of cognitive tasks, as collected from thousands of published fMRI experiments in the BrainMap database [1]. Alternatively, bivariate spatio-temporal decompositions using independent component analysis (ICA) have been widely applied to so-called resting-state data sets acquired without overt experimental tasks [2]. The goal of such resting-state experiments is to characterize brain networks through the extent to which they are reflected in the extracted spatio-temporal ICA components. The validity of these two approaches has been significantly strengthened by recent work linking ICA spatial components derived from both resting-state fMRI experiments and meta-analytic spatial activation summaries from BrainMap [3, 4]. In addition, we have recently shown that in resting-state experiments essentially the same spatial subspace is spanned by ICA and by subsampled techniques based on an initial bivariate, spatio-temporal principal component analysis (PCA) followed by either agnostic canonical variates analysis (aCVA, a form of multi-class linear discriminant) or generalized canonical correlation analysis (gCCA) [5]. Therefore, the choice of initial bivariate decomposition (ICA or PCA) is likely to be unimportant when it is followed by a linear discriminant (LD) or gCCA. Here we demonstrate an approach closely related to aCVA, using initial cascaded PCAs followed by LD in a task setting. We show that our approach provides a task-dependent component decomposition with two unique advantages compared to ICA or aCVA: 1) the resulting components are not constrained to be spatially or temporally independent or uncorrelated, and 2) they are obtained as a hierarchy ordered according to their importance in predicting task performance.

II. METHODS

A. Prediction and Reproducibility Performance Metrics

Although cross-validated prediction accuracy alone can be an effective metric for general machine-learning problems, neuroimaging also demands that the discriminant spatial pattern obtained from a predictive model be reproducible, or generalizable, between different groups of subjects or across different scans of the same subject. In high-dimensional brain mapping problems, the extracted statistical parametric maps (SPMs) of brain activation, their reliability, and the voxel locations that significantly influence prediction performance are often the critical outputs of the modeling process that allow interpretation of the underlying brain processes. Together with prediction accuracy, subsampled reproducibility estimation has proven to be an important metric and an effective data-driven substitute for Receiver Operating Characteristic (ROC) analysis of signal detection in simulated fMRI experiments [6, 7]. To obtain combined prediction and reproducibility SNR values, Strother and colleagues used a novel split-half subsampling framework dubbed NPAIRS and applied it to PET (e.g., [8, 9]) and fMRI (e.g., [10-12]). Using NPAIRS for discriminant analysis of fMRI data, we have explored replacing the true positive (TP) vertical axis of an ROC plot with a cross-validated prediction metric (P) based on the experimental task design in the temporal domain. Prediction estimates are based on split-half subsamples, which provide training and test sets. Furthermore, we replaced the false positive (FP) horizontal axis of an ROC plot with an SPM reproducibility metric (R). R reflects the estimated pattern SNR obtained by comparing the independent SPMs from split-half subsamples, as outlined below. This substitution of R for FP is at best an approximation, because any measure of similarity between independent patterns extracted from subsampled data sets will contain an unknown model bias, since the true brain pattern is unknown.
The joint estimation of R together with P, which is derived from the experimental design labels, represents an attempt to control such discriminant pattern bias using the structure of the experimental task design. A (P, R) plot provides a single optimal performance point, with perfect prediction and perfect similarity/reproducibility, that is related to the optimal ROC point with TP = 1 and FP = 0. Here the experimental truth is reflected by prediction, and the similarity SNR is an inverse measure of false positives: for an unbiased SPM, R → 1 implies SNR → ∞, which implies FP → 0.

B. PCA-Regularized LD with Split-half Subsampling

Consider an fMRI data set $S = \{\mathbf{v}_i, c_i\}_{i=1}^{N}$, where each $\mathbf{v}_i$ is a V-dimensional feature vector of spatial voxel locations, and the N = JT scans come from J independent data-set objects (e.g., subjects) with T scans each; S is N × V with V >> N, and the brain scan class labels are $c_i \in \{-1, 1\}$. The independent observations of the J objects are split into two independent halves, S = [S1, S2], creating training and test sets; this is a form of cross-validation subsampling that is repeated many times. As we have observed in our own work over the last decade, such splitting is an important way of stabilizing parameter estimates in ill-posed classification models, as recently shown in the statistical literature [13]. Typically in neuroimaging V ≈ 10k-100k voxels, J ranges from 2 to tens of subjects, and T ranges from 25 to hundreds of scans per subject. Consequently S is large and ill-posed: its covariance cannot be directly inverted, and regularization is used to compute and stabilize discriminant model estimates. Here we describe a two-class LD regularized with two levels of cascaded PCA feature reduction, with a hard subspace threshold applied at each level. PCA feature reduction is used to concentrate model estimation on subspaces that are likely to capture at least the linear voxel interactions that reflect functional connectivity of the underlying brain networks. The steps of the NPAIRS procedure on the cascaded PCA basis estimates are as follows.

Step 1. Given a first PCA or singular value decomposition (SVD1), $S = U L V^T$, we compute the eigendecomposition $S S^T = U L^2 U^T$ and proceed with a reduced basis set $X^* = S V^* = U^* L^*$, where we retain a fraction, d, of the N possible PCA components, so that $X^*$ has size $N \times dN$. Assuming V >> N, this achieves a considerable computational speedup, as well as an initial data denoising that depends on the size of d, which is treated as our first regularization parameter defining a hard subspace threshold.

Step 2. Randomly partition $X^*$ into two independent split-half groups across independent observation units (e.g., subjects) to obtain $X^* = [X_1; X_2] = [S_1; S_2] V^*$, where $X_i$ has size $N_i \times dN$ with $N_i = J_i T$, and $J_i = J/2$ for J even or $J_i = J/2 \pm 0.5$ for J odd. These $X_i$ matrices are our basic modeling data units, achieving a further computational speedup.

Step 3. Given a second PCA or SVD2, $X_i = Y_i L_i R_i^T$, we compute separate second-level eigendecompositions on $X_1$ and $X_2$, $X_i^* = X_i R_i^* = Y_i^* L_i^*$, and retain Q components, so that $X_i^*$ has size $T_i \times Q$, where $T_i = J_i T$. Q is varied from 1 to Q_max ≤ min(100-500, dN), achieving a further large dimensionality and feature reduction, with corresponding computational efficiencies.
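Steps 1-3 can be sketched in NumPy as follows. This is a minimal illustration on random toy data, not the authors' implementation: the toy sizes, the hypothetical function names (`pca1_reduce`, `split_half`, `pca2_reduce`) and the column-centring in Step 3 are assumptions. Step 1 follows the text in working with the small N × N matrix $S S^T$ instead of the N × V data matrix; Step 3 uses `numpy.linalg.svd` for brevity, which the text allows as an alternative to an eigendecomposition.

```python
import numpy as np


def pca1_reduce(S, d):
    """Step 1 sketch: first-level PCA of the N x V matrix S computed from the
    small N x N matrix S S^T, keeping a fraction d of the N components."""
    N = S.shape[0]
    evals, U = np.linalg.eigh(S @ S.T)           # S S^T = U L^2 U^T, ascending order
    order = np.argsort(evals)[::-1]              # reorder by decreasing variance
    evals, U = evals[order], U[:, order]
    k = max(1, int(round(d * N)))                # dN retained components
    L = np.sqrt(np.clip(evals[:k], 0.0, None))   # singular values L*
    X_star = U[:, :k] * L                        # X* = U* L*, size N x dN
    V_star = (S.T @ U[:, :k]) / L                # V* for back-projection to voxels (needs L > 0)
    return X_star, V_star


def split_half(subject_ids, rng):
    """Step 2 sketch: randomly assign whole subjects (never single scans) to two halves."""
    subjects = rng.permutation(np.unique(subject_ids))
    in_first = np.isin(subject_ids, subjects[: len(subjects) // 2])
    return in_first, ~in_first


def pca2_reduce(X_i, Q):
    """Step 3 sketch: second-level PCA inside one split half, keeping Q PCs.
    Column-centring the split half is an assumption of this sketch."""
    X_c = X_i - X_i.mean(axis=0)
    _, _, Rt = np.linalg.svd(X_c, full_matrices=False)  # X_c = Y L R^T
    R_star = Rt[:Q].T                                   # dN x Q basis R_i*
    return X_c @ R_star, R_star                         # X_i* = X_i R_i*, size T_i x Q


rng = np.random.default_rng(0)
J, T, V = 6, 10, 500                      # hypothetical toy sizes
subject_ids = np.repeat(np.arange(J), T)  # subject index of every scan
S = rng.standard_normal((J * T, V))       # N x V toy data with N = JT = 60

X_star, V_star = pca1_reduce(S, d=0.5)    # keep dN = 30 components
in1, in2 = split_half(subject_ids, rng)
X1_star, R1_star = pca2_reduce(X_star[in1], Q=5)
print(X_star.shape, X1_star.shape)        # (60, 30) (30, 5)
```

Splitting by subject rather than by scan keeps the two halves independent, which is what makes the test-set P and the between-half R meaningful; `S @ V_star` reproduces `X_star`, illustrating the back-projection into voxel space mentioned later in this section.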

Step 4. Next, train LD separately on $X_1^*$ and $X_2^*$, using the other independent split-half as a test set, to obtain the posterior probability of true-class membership, P. This is performed as a function of the regularizers d and Q, with a single pair of (P, R) values obtained from each pair of split-half data sets for 400 randomly drawn subsamples. We recognize that the NPAIRS P values are biased: upwards, as a result of optimizing model parameters (i.e., Q) using only training and validation sets without a final test set, and downwards, relative to leave-one-out cross-validation, as a result of using split-half resampling. However, this is not critical, as our primary interest is relative rather than absolute prediction model performance as a function of regularization.

Step 5. For each trained prediction model we calculate a discriminant feature vector $D_i$, so that for each pair of split-half subsamples we obtain an independent pair $[D_1, D_2]$. The reproducibility of $D_1$ and $D_2$ is defined as the correlation (R) between all pairs of their spatially aligned voxels. R is directly related to the available SNR in each pair of $D_i$.

Step 6. Record the average, or median, of the resulting P and R distributions across the 400 splits for each choice of Q. We then repeat this procedure to obtain a (P, R) curve as a function of Q.

Step 7. Last, we obtain a single, robust Z-scored discriminant feature vector from each pair of split-half maps $[D_1, D_2]$ for a given value of Q, i.e., rSPM_Q(z). R is calculated from a scatter plot of the independent $D_i$, and the projection of all pairs of voxel values onto its principal axis defines a consensus rSPM. These projected rSPM values are then scaled by the pooled noise estimate, $\sqrt{1 - R}$, from the minor axis [9]. This noise estimate is uncorrelated with the principal-axis signal by construction, and the resulting rSPM_Q(z) values will be approximately normally distributed; in practice this is found to be a good approximation [10].
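Steps 4-7 for a single value of Q can be sketched as below, again as a hedged toy illustration rather than the authors' code. Assumptions: a Gaussian two-class LD with equal class priors and a pooled covariance (so the posterior is a logistic function of the discriminant score), discriminant maps $D_i$ formed by projecting the LD weights back through the PC basis, 20 rather than 400 splits, and a simulated data set with one task-coupled region. For brevity the sketch skips Step 1 and applies the second-level PCA directly to the toy S; `train_ld` and `one_split` are hypothetical names. The last line uses the global SNR relation $\sqrt{2R/(1-R)}$ from the NPAIRS literature, included here as an assumption: for two standardized maps the scatter plot's major and minor axes have variances 1 + R and 1 - R, leaving 2R for the signal.

```python
import numpy as np


def train_ld(Z, c):
    """Two-class linear discriminant on PC scores Z (n x Q), labels c in {-1, +1}.
    Equal class priors and a pooled within-class covariance are assumed."""
    m_pos, m_neg = Z[c == 1].mean(axis=0), Z[c == -1].mean(axis=0)
    resid = Z - np.where((c == 1)[:, None], m_pos, m_neg)  # within-class residuals
    W = resid.T @ resid / (len(Z) - 2)                      # pooled covariance
    w = np.linalg.solve(W, m_pos - m_neg)                   # discriminant direction
    b = -0.5 * w @ (m_pos + m_neg)                          # log-odds offset (equal priors)
    return w, b


def one_split(S, c, subj, Q, rng):
    """Steps 4, 5 and 7 for one random split of the subjects into halves."""
    ids = rng.permutation(np.unique(subj))
    h1 = np.isin(subj, ids[: len(ids) // 2])
    h2 = ~h1
    P, D = [], []
    for train, test in ((h1, h2), (h2, h1)):
        mu = S[train].mean(axis=0)
        _, _, Rt = np.linalg.svd(S[train] - mu, full_matrices=False)
        basis = Rt[:Q].T                                    # V x Q PCs of the training half
        w, b = train_ld((S[train] - mu) @ basis, c[train])
        logit = (S[test] - mu) @ basis @ w + b              # test-set log-odds of class +1
        # Step 4: posterior of the true class, 1 / (1 + exp(-c * logit)), computed stably
        P.append(np.mean(np.exp(-np.logaddexp(0.0, -c[test] * logit))))
        D.append(basis @ w)                                 # Step 5: map D_i in voxel space
    z1, z2 = [(d - d.mean()) / d.std() for d in D]          # standardize both maps
    R = np.corrcoef(z1, z2)[0, 1]                           # Step 5: reproducibility
    rspm_z = (z1 + z2) / np.sqrt(2.0) / np.sqrt(1.0 - R)    # Step 7: major axis / minor-axis noise
    return np.mean(P), R, rspm_z


rng = np.random.default_rng(1)
J, T, V, Q = 8, 20, 300, 5                   # hypothetical toy sizes
subj = np.repeat(np.arange(J), T)
c = np.tile(np.repeat([1, -1], T // 2), J)   # two task blocks per subject
pattern = np.zeros(V)
pattern[:30] = 1.0                           # simulated task-coupled voxels
S = c[:, None] * pattern + rng.standard_normal((J * T, V))

runs = [one_split(S, c, subj, Q, rng) for _ in range(20)]  # the paper uses 400 splits
P_med = np.median([r[0] for r in runs])                    # Step 6: median P
R_med = np.median([r[1] for r in runs])                    # Step 6: median R
rspm_mean = np.mean([r[2] for r in runs], axis=0)          # consensus rSPM_Q(z) over splits
snr = np.sqrt(2.0 * R_med / (1.0 - R_med))                 # assumed NPAIRS-style global SNR
print(round(P_med, 3), round(R_med, 3), round(snr, 2))
```

Repeating the loop over a grid of Q values would trace out one (P, R) curve of the kind discussed in Section III; heterogeneous splits lower R, inflate $\sqrt{1-R}$ and so shrink their contribution to the averaged rSPM_Q(z).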
This procedure is robust to heterogeneity across the split objects (e.g., subjects), as more heterogeneous split-half pairs will produce smaller R values and larger $\sqrt{1 - R}$ pooled noise estimates, and therefore lower rSPM_Q(z) values than more homogeneous splits. Therefore, if we average the rSPM_Q(z) maps across all splits for a given Q, we obtain a consensus technique for Z-scoring any prediction model, which produces robust feature parameter estimates. Note that we could obtain the estimates of the principal components (PCs) needed using either an SVD or a PCA eigendecomposition of the smaller N × N covariance matrices. When needed, the resulting eigenvectors and their linear combinations can be projected back into the voxel data space using matrix multiplication. We chose to use an eigendecomposition of the covariance, as this is considerably faster to compute than an SVD (see the Appendix of [14] for further details).

C. fMRI Data Set

These multi-task data were acquired from 19 young adults (mean age 25 ± 3 years, range 20-30, 10 women) using a 3.0 T Siemens Trio MRI scanner, as described in [15]. For each subject, a high-resolution structural T1-weighted image (SPGR: TE = 2.6 ms, TR = 2000 ms, FOV = 256 mm, slice thickness = 1 mm) was acquired, and blood oxygen level-dependent (BOLD) fMRI scans (TE = 30 ms, TR = 2000 ms, flip angle = 70°, FOV = 200 mm) were obtained using echo planar imaging. Each functional sequence consisted of 28 axial slices, 5 mm thick, with 3.125 × 3.125 mm in-plane voxels, positioned to image the whole brain. During scanning, subjects responded to visual stimuli by pressing buttons. The visual stimuli were band-pass filtered white noise patches with different center frequencies. Blocks of four task conditions alternated with blocks of fixation (FIX): 1) simple reaction time (RT), 2) perceptual matching (PMT), 3) attentional cueing (ATT), and 4) delayed match-to-sample (DMS). In this study we used only the RT and PMT task data. In the RT task, a single stimulus appeared for 1000 ms in one of three locations at the bottom of the display (left, central, or right), and participants pressed one of three buttons with their right hand to indicate where the stimulus had appeared; each RT block contained 12 trials. In PMT, a sample stimulus appeared centrally in the upper portion of the screen along with three choice stimuli in the lower part of the screen (for 4000 ms), and the task was to indicate which of the three choice stimuli matched the sample. Six such trials occurred in each PMT block. In all tasks, the inter-trial interval was 2000 ms. One fMRI run was acquired for each subject using a block design with eight task blocks alternating with fixation (FIX) blocks (20-25 scans per task period alternating with 10 scans per fixation period, TR = 2 s), covering four tasks (ATT, DMS, PMT, RT) with two repetitions of each task.
Preprocessing of the fMRI time series data included the following steps for each subject [16]: (1) slice-timing correction; (2) rigid-body within-subject realignment; (3) spatial smoothing with a 7 mm FWHM Gaussian kernel; (4) qualitative identification and removal of artifact-carrying independent components using the MELODIC package [17]; (5) between-subject alignment of the fMRI scans, based on spatial normalization of the structural scan to a study-specific template (details are described in [15]); (6) regression of the mean white-matter and CSF signals, obtained with standard masks, from the time course of each voxel; (7) regression of temporal linear trends from each voxel; and (8) masking of the scans with an approximate whole-brain mask retaining 21401 voxels. For the classification analysis we discarded the two transition scans at the start of each block, which left 18-23 scans per task block and an average of 83.3 scans per subject, giving a data matrix S of size 1584 × 21401 for J = 19 subjects.

III. RESULTS AND DISCUSSION

Fig. 1 shows the (P, R) curves obtained for a linear discriminant analysis of the PMT vs. RT task blocks from the 19 subjects' fMRI runs. Three curves are plotted, for 1584 (d = 100%), 475 (d = 30%) and 158 (d = 10%) principal components from PCA1, as a function of 33 split-half PCA2 subspace sizes, Q = [1(+1)10, 14(+4)66, 75(+25)300]. All of the following results reflect the (P, R) curve with d = 100%.

Fig. 1. Plot of prediction (P) versus reproducibility (R) values for PMT vs. RT task blocks, with split-half subsampling of 19 subjects and a linear discriminant analysis regularized with a two-stage, cascaded PCA. The three (P, R) curves are plotted as a function of the split-half PCA2 subspace size Q for different fractions, d, of the PCA1 components kept.

The distribution of P values across the 19 subjects in the split-half test sets is significantly different from random guessing (P = 0.5) for Q = 2 (P = 0.516, CI95% = 0.507-0.526) and Q = 10 (P = 0.587, CI95% = 0.561-0.613). P increases monotonically with Q, although very slowly for Q = 4-10, indicating that the associated discriminant rSPM_Q(z) maps form a hierarchy of spatial activation patterns. These patterns are progressively more strongly coupled to task performance, as measured by separating the PMT from the RT fMRI scans (i.e., they yield larger P values). Furthermore, the P values for Q = (250, 50, 10) are significantly larger than for Q = (50, 10, 2), respectively (250 vs. 50: ΔP = 0.033, CI95% = 0.015-0.050; 50 vs. 10: ΔP = 0.069, CI95% = 0.052-0.085; 10 vs. 2: ΔP = 0.071, CI95% = 0.048-0.094). These significantly different levels of task coupling indicate that we can expect a range of different discriminant rSPM_Q(z) patterns along the (P, R) curve. This is seen in Fig. 2, which shows the rSPM_Q(z) spatial regions for Z > +2 and Z < -2 in separate panels for Q = 2, 50 and 250. In Fig. 2, panels A and B reflect the very weakly task-coupled pattern for Q = 2 (P = 0.516). They show regions consistent with the well-known antagonistic patterns of (A) task-positive networks (particularly dorsal attention, fronto-parietal control and visual areas) and (B) the default network [18]. The discriminant scores for PMT and RT scans, which modulate the expression of rSPM_2(z), overlap heavily but are slightly positive for PMT (mean = 0.143) and slightly negative for RT (mean = -0.118). This indicates that the task-positive regions are somewhat more strongly expressed during PMT, while the default regions are somewhat more strongly expressed during RT. This slight difference produces the small but significant departure from random guessing reported above. In Fig. 2, panels C and D reflect the much more strongly task-coupled pattern for Q = 50 (P = 0.656).
The positive pattern's regions are consistent with the cognitive demands of PMT, including the left Inferior Frontal Gyrus and the right Fusiform Gyrus (slice 2), the right Superior Parietal lobule (slice 7), and the left Mid-Frontal Gyrus (slice 5). These are all more strongly expressed during PMT. The negative pattern's regions include the left Post-Central Primary Motor region (slices 7, 8), the Medial Frontal Gyrus (slice 7), the right Mid-Temporal-Occipital Gyrus (slices 3, 4), the right Cerebellar Culmen (slice 2), and the right Inferior Cerebellum (slice 1). All of these regions are more strongly expressed in RT, including the expected combination of left primary motor and right cerebellar activations characteristic of a right-handed motor task [12]. In Fig. 2, panels E and F reflect the maximally task-coupled spatial pattern for Q = 250 (P = 0.689). This pattern primarily reflects weaker and more focal activations that are also seen in the Q = 50 discriminant pattern (e.g., slices 7, 8). The one apparent exception is the negative right Inferior Cerebellar region in slice 1. Such smaller, focal patterns with sparse representations of known networks are characteristic of discriminant patterns obtained by maximizing P [12].

Fig. 2. Spatial regions in the discriminant rSPM_Q(z) (white blobs) for Z > +2 (upper panel) and Z < -2 (lower panel), with d = 100% and Q = 2 (A, B), Q = 50 (C, D) and Q = 250 (E, F). Thresholded rSPM_Q(z) patterns are overlaid on the structural reference brain of the MNI-152 Talairach atlas. Slices are referred to in the text from far left = 1 to far right = 8, and the brain images are oriented with left = left.

Figs. 1 and 2 reflect multiple distinct spatial discriminant patterns in a task-coupled hierarchy, but it is unclear how many distinct spatial patterns exist along the (P, R) curves as a function of Q. Fig. 3 provides an initial answer to this question in the form of the number of distinct clusters seen in a correlation similarity matrix between all pairs of discriminant rSPM_Q(z). Four primary clusters are visible, comprising the rSPM_Q(z) for Q = 2, 3-10, 14-75 and 100-300. An agglomerative hierarchical clustering applied to the similarity matrix shown in Fig. 3 also produces 4 primary clusters at a similarity truncation level of 0.64. In future work we will examine the finer substructure of the resulting dendrogram, which contains 5 clusters above a similarity level of 0.66, 6 above a level of 0.86 and 11 above a level of 0.95.

Fig. 3. A similarity matrix of the spatial correlations between all possible pairs of the 33 discriminant rSPM_Q(z) underlying the (P, R) curve that retains all first-level PCA1 components (i.e., d = 100%) in Fig. 1.

Finally, we examined the differential expression of individual discriminant regions across the (P, R) curve. From the linear discriminant pattern for Q = 50 (Fig. 2, C, D), we converted the 9 regions described in the text into volumes of interest (VOIs) by extracting separate masks defined by all locally connected voxels with Z > 4 or Z < -4. These VOIs were then applied to all 33 rSPM_Q(z) to extract the mean VOI Z value as a function of Q, as plotted in Fig. 4. If a VOI is positive for Q = 50, and therefore linked to stronger expression in PMT, it tends to remain positive but with variable Z-score levels, particularly for the right Fusiform Gyrus and the right Superior Parietal lobe for Q = 34. An exception to this pattern is the left Inferior Frontal Gyrus, which for Q < 6 crosses from being positive, and linked to stronger expression in PMT, to being negative and linked to stronger expression in RT. The VOIs that are negative at Q = 50, and therefore linked to stronger expression in RT, remain negative for Q ≥ 3. However, with Q = 2 three of these VOIs cross to small positive values, and the Medial Frontal Gyrus crosses from being strongly negative to being strongly positively expressed with PMT. Such differential regional expression as a function of P and Q suggests that the cognitive hierarchy reflects multiple underlying brain networks in which the same brain regions may play significantly different roles.

Fig. 4. Plots of the average Z values in volumes of interest (VOIs) extracted from rSPM_50(z) for Z > +4 and Z < -4, as a function of Q. At Q = 50 the positive profiles are most strongly expressed for PMT and the negative profiles for RT. See the text for a description of the VOIs, which are part of Fig. 2 (C, D).

IV. CONCLUSIONS

We have demonstrated a two-stage, cascaded PCA for denoising, feature selection and regularization of a two-class LD in a task setting. We show that our approach provides a task-dependent component decomposition with two unique advantages compared to other available bivariate component decompositions in the literature: 1) the resulting components are not constrained to be spatially or temporally independent or uncorrelated, and 2) they are obtained as a hierarchy ordered according to their importance in predicting task performance. Furthermore, within the task-coupled network hierarchy a single region may express both significant positive and negative discriminant weights, reflecting its linkage to multiple tasks and networks at different levels of the hierarchy.

REFERENCES

[1] S. B. Eickhoff, D. Bzdok, A. R. Laird, F. Kurth, and P. T. Fox, "Activation likelihood estimation meta-analysis revisited," Neuroimage, vol. 59, pp. 2349-61, Feb 1 2012.

[2] S. M. Smith, D. Vidaurre, C. F. Beckmann, M. F. Glasser, M. Jenkinson, K. L. Miller, T. E. Nichols, E. C. Robinson, G. Salimi-Khorshidi, M. W. Woolrich, D. M. Barch, K. Ugurbil, and D. C. Van Essen, "Functional connectomics from resting-state fMRI," Trends in Cognitive Sciences, vol. 17, pp. 666-82, Dec 2013.

[3] S. M. Smith, P. T. Fox, K. L. Miller, D. C. Glahn, P. M. Fox, C. E. Mackay, N. Filippini, K. E. Watkins, R. Toro, A. R. Laird, and C. F. Beckmann, "Correspondence of the brain's functional architecture during activation and rest," Proc Natl Acad Sci U S A, vol. 106, pp. 13040-5, Aug 4 2009.

[4] A. R. Laird, P. M. Fox, S. B. Eickhoff, J. A. Turner, K. L. Ray, D. R. McKay, D. C. Glahn, C. F. Beckmann, S. M. Smith, and P. T. Fox, "Behavioral interpretations of intrinsic connectivity networks," Journal of Cognitive Neuroscience, vol. 23, pp. 4022-37, Dec 2011.

[5] B. Afshin-Pour, C. Grady, and S. Strother, "Evaluation of spatio-temporal decomposition techniques for group analysis of fMRI resting state data sets," Neuroimage, vol. 87, pp. 363-82, Feb 15 2014.

[6] G. Yourganov, T. Schmah, N. W. Churchill, M. G. Berman, C. L. Grady, and S. C. Strother, "Pattern classification of fMRI data: Applications for analysis of spatially distributed cortical networks," Neuroimage, vol. 96, pp. 117-32, Aug 1 2014.

[7] G. Yourganov, X. Chen, A. S. Lukic, C. L. Grady, S. L. Small, M. N. Wernick, and S. C. Strother, "Dimensionality estimation for optimal detection of functional networks in BOLD fMRI data," Neuroimage, vol. 56, pp. 531-43, May 15 2011.

[8] S. C. Strother, N. Lange, J. R. Anderson, K. A. Schaper, K. Rehm, L. K. Hansen, and D. A. Rottenberg, "Activation pattern reproducibility: Measuring the effects of group size and data analysis models," Hum Brain Mapp, vol. 5, pp. 312-316, 1997.

[9] S. C. Strother, J. Anderson, L. K. Hansen, U. Kjems, R. Kustra, J. Sidtis, S. Frutiger, S. Muley, S. LaConte, and D. Rottenberg, "The quantitative evaluation of functional neuroimaging experiments: the NPAIRS data analysis framework," Neuroimage, vol. 15, pp. 747-71, Apr 2002.

[10] S. LaConte, J. Anderson, S. Muley, J. Ashe, S. Frutiger, K. Rehm, L. K. Hansen, E. Yacoub, X. Hu, D. Rottenberg, and S. Strother, "The evaluation of preprocessing choices in single-subject BOLD fMRI using NPAIRS performance metrics," Neuroimage, vol. 18, pp. 10-27, Jan 2003.

[11] S. C. Strother, P. M. Rasmussen, N. W. Churchill, and L. K. Hansen, "Stability and Reproducibility in fMRI Analysis," in Practical Applications of Sparse Modeling, I. Rish, G. A. Cecchi, A. Lozano, and A. Niculescu-Mizil, Eds. Boston: MIT Press, 2014.

[12] P. M. Rasmussen, L. K. Hansen, K. H. Madsen, N. W. Churchill, and S. C. Strother, "Pattern reproducibility, interpretability, and sparsity in classification models in neuroimaging," Pattern Recognition, vol. 45, pp. 2085-2100, 2012.

[13] N. Meinshausen and P. Bühlmann, "Stability selection," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 72, pp. 417-473, 2010.

[14] U. Kjems, L. K. Hansen, J. Anderson, S. Frutiger, S. Muley, J. Sidtis, D. Rottenberg, and S. C. Strother, "The quantitative evaluation of functional neuroimaging experiments: mutual information learning curves," Neuroimage, vol. 15, pp. 772-86, Apr 2002.

[15] C. L. Grady, A. B. Protzner, N. Kovacevic, S. C. Strother, B. Afshin-Pour, M. Wojtowicz, J. A. Anderson, N. Churchill, and A. R. McIntosh, "A multivariate analysis of age-related differences in default mode and task-positive networks across multiple cognitive domains," Cereb Cortex, vol. 20, pp. 1432-47, Jun 2010.

[16] S. C. Strother, "Evaluating fMRI preprocessing pipelines," IEEE Eng Med Biol Mag, vol. 25, pp. 27-41, Mar-Apr 2006.

[17] C. F. Beckmann and S. M. Smith, "Probabilistic independent component analysis for functional magnetic resonance imaging," IEEE Trans Med Imaging, vol. 23, pp. 137-52, Feb 2004.

[18] R. N. Spreng, J. Sepulcre, G. R. Turner, W. D. Stevens, and D. L. Schacter, "Intrinsic architecture underlying the relations among the default, dorsal attention, and frontoparietal control networks of the human brain," Journal of Cognitive Neuroscience, vol. 25, pp. 74-86, Jan 2013.