
2014 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, SEPT. 21–24, 2014, REIMS, FRANCE

THE TENTH ANNUAL MLSP COMPETITION: SCHIZOPHRENIA CLASSIFICATION CHALLENGE

Rogers F. Silva¹,², Eduardo Castro¹, Cota Navin Gupta¹, Mustafa Cetin¹,², Mohammad Arbabshirani¹,², Vamsi K. Potluru¹,³, Sergey M. Plis¹, Vince D. Calhoun¹,²,³

¹ The Mind Research Network, 1101 Yale Blvd., Albuquerque, New Mexico 87106
² Dept. of ECE, MSC01 1100, 1 University of New Mexico, Albuquerque, New Mexico 87131
³ Dept. of CS, MSC01 1130, 1 University of New Mexico, Albuquerque, New Mexico 87131

ABSTRACT

For the 24th Machine Learning for Signal Processing competition, participants were asked to automatically diagnose schizophrenia using multimodal features derived from MRI scans. The objective of the classification task was to achieve the best possible schizophrenia diagnosis prediction based only on these features. A total of 2087 entries from 291 participants with active Kaggle.com accounts were made. Each participant developed a classifier, with optional feature selection, that combined functional and structural magnetic resonance imaging features. Here we review details about the competition setup and the winning strategies, and provide basic analyses of the submitted entries. We conclude with a discussion of the advances made to the neuroimaging and machine learning fields.

Index Terms— Competition, MRI, Schizophrenia, FNC, SBM

978-1-4799-3694-6/14/$31.00 ©2014 IEEE

1. INTRODUCTION

Schizophrenia is a severe and disabling mental illness with no well-established, non-invasive diagnostic biomarker. Currently, due to its symptom overlap with other mental illnesses (such as bipolar disorder), it is diagnosed subjectively, by process of elimination. In order to broadly explore the current limits of automatic diagnosis of schizophrenia, this year's machine learning competition, The MLSP 2014 Schizophrenia Classification Challenge, ran from June 5 through July 20, 2014. For the second consecutive year, the competition was hosted on Kaggle.com. The response from the neuroscience and machine learning communities was outstanding: 291 participants in 245 teams and an impressive 2087 valid entries. The competition was sponsored by Kaggle.com, which provided its hosting services, and Springer, which provided monetary rewards to the winning participants. In total, 611 entries from 151 participants outperformed the provided competition benchmark in all performance measures

considered here. The three winning strategies used, in ranking order, Gaussian process (GP) [1], Gaussian kernel support vector machine (SVM) [2], and distance weighted discrimination (DWD) [3] classifiers.

2. MOTIVATION

Magnetic resonance imaging (MRI) is an imaging modality that can capture diverse markers of the physiology and anatomy of the human brain by means of various acquisition protocols [4, 5]. MRI can be categorized as either functional or structural. For example, the information contained in blood oxygenation level-dependent (BOLD) functional MRI (fMRI) scans provides a proxy measurement of neuronal activity over both space and time. Functional network connectivity (FNC) [6] is an active area of research that explores the properties of neural interactions between brain networks based on the temporal information contained in BOLD fMRI scans. Disrupted FNC patterns have been found in schizophrenia, and there is evidence that this information is useful for schizophrenia classification [7].

On the other hand, structural MRI (sMRI) scans can be used to characterize the density of different types of brain tissue such as gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF). Schizophrenia studies have shown evidence of GM density deficits in certain brain regions [8-10], which have been used for classification [11-13].

The relationship between functional and structural data is believed to be key to a better characterization of brain organization, which would in turn provide a better understanding of brain pathology in mental illnesses and, particularly, in schizophrenia. This provided the motivation for this competition. Thus, one of this competition's goals was to evaluate how informative combined functional and structural data could be for schizophrenia classification.
In order to engage individuals with diverse backgrounds, the data provided in this competition were preprocessed and only the derived features were made available to the participants. In addition, to enable interpretability of the provided features, additional anatomical information was disclosed.

3. DATASETS

Data from 144 subjects were acquired using a 3T Siemens Trio MRI scanner with a 12-channel head coil. The dataset was composed of 75 healthy controls (52 males, mean age = 36.27 years, SD = 11.78 years, range: 18-65) and 69 schizophrenia patients (58 males, mean age = 37.32 years, SD = 13.80 years, range: 18-64). Data collection was conducted at The Mind Research Network (MRN, www.mrn.org) with approval and oversight from the University of New Mexico (UNM) human subjects research review committee.

The datasets were preprocessed using standard pipelines as described below. Independent components were extracted from the preprocessed data utilizing the independent component analysis (ICA) results from a separate large-cohort (N = 603) healthy control study [14, 15], using a technique called spatiotemporal regression (STR) [16] to avoid any potential bias or overfitting. Structural and functional data were mapped into the space of this healthy control baseline dataset. Features were then derived from the fMRI and sMRI independent components.

3.1. Modality 1: Functional MRI

Resting-state fMRI data were acquired from subjects who were instructed to relax and stay awake, comprising 149 volumes of T2*-weighted gradient-echo EPI images (TE = 29 ms, TR = 2 s, flip angle = 75°, slice thickness = 3.5 mm, slice gap = 1.05 mm, FOV = 240 mm, matrix size = 64×64, voxel size = 3.75×3.75×4.55 mm). Preprocessing was done in SPM5 (www.fil.ion.ucl.ac.uk/spm/software/spm5) and included removal of the first four volumes to remove T1-equilibration effects, image realignment using INRIalign [17], and slice-timing correction. Data were spatially normalized to Montreal Neurological Institute (MNI) space [18] using a nonlinear (affine + low-frequency direct cosine transform basis functions) registration, resampled to 3×3×3 mm voxels, and smoothed with a 10 mm full-width at half-maximum (FWHM) Gaussian kernel.
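To make the STR projection concrete, the following is a minimal dual-regression-style sketch in NumPy with synthetic stand-in data; the dimensions, variable names, and random inputs are illustrative assumptions, not the GIFT implementation described in [16].

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 75 group-level spatial maps over V voxels (from the
# N = 603 baseline study) and one subject's preprocessed fMRI run (T x V).
V, T, C = 5000, 145, 75
group_maps = rng.standard_normal((C, V))    # components x voxels
subject_data = rng.standard_normal((T, V))  # time x voxels

# Step 1: regress the group spatial maps against the subject's data to
# obtain subject-specific timecourses (solve maps.T @ tc.T = data.T).
tc_T, *_ = np.linalg.lstsq(group_maps.T, subject_data.T, rcond=None)
timecourses = tc_T.T                        # T x C

# Step 2: regress the timecourses against the subject's data to obtain
# subject-specific spatial maps (solve tc @ maps = data).
subject_maps, *_ = np.linalg.lstsq(timecourses, subject_data, rcond=None)

print(timecourses.shape, subject_maps.shape)  # (145, 75) (75, 5000)
```

The two least-squares steps keep the baseline components fixed, which is what avoids re-estimating ICA on the small competition cohort.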
The time series data were rescaled to a mean of 100 [19] and analyzed with STR group independent component analysis (GICA) [20]. A two-step process was used to identify non-artifactual components [21]. First, two criteria were applied to the power spectra: dynamic range (the difference between the peak power and the minimum power at frequencies to the right of the peak) and the low-frequency/high-frequency ratio (the ratio of the integral of spectral power below 0.10 Hz to the integral of power between 0.15 and 0.25 Hz [19]). In the second step, three expert reviewers evaluated the components for functional relevance. The components were separated into two broad classes: artifactual and non-artifactual. Out of 75 components, 28 were identified as non-artifactual (rs-ICs 7, 17, 20, 21, 23, 24, 25, 29, 34, 38, 39, 42, 46, 47, 48, 49, 50, 52, 53, 55, 56, 59, 60, 64, 67, 68, 71, 72; [14]). Details are also available at www.kaggle.com/c/mlsp-2014-mri. The STR algorithm was then used to estimate 75 independent

components (spatial maps) and corresponding timecourses for the competition dataset. These components were used to obtain 378 FNC correlation values. The FNC features describe the connectivity pattern over time between independent networks (or brain maps): they consist of the Pearson correlation values between all pairs of non-artifactual component timecourses, for each subject. FNC indicates a subject's overall level of "synchronicity" between brain areas. These correlation values were provided to the participants to train their classifiers.

3.2. Modality 2: Structural MRI

High-resolution T1-weighted structural images were acquired using an MPRAGE sequence with a voxel size of 1×1×1 mm. Using the standard preprocessing pipeline presented in [8, 9, 23, 24], images were normalized using a 12-parameter affine model to the MNI template, resliced to 2×2×2 mm, and segmented into gray matter, white matter, and CSF using the SPM5 unified segmentation algorithm [18]. Outlier detection was performed based on correlations with an average gray matter map. Identified outliers were visually inspected, corrected and re-segmented where possible, and removed in cases where correction was not possible. Age and gender are known to affect gray matter volume and were thus regressed out voxelwise. Finally, a 10 mm FWHM Gaussian kernel was used to smooth the images.

Gray matter comprises the outer sheet of the brain as well as subcortical regions, and roughly indicates the location of the "computational units" of the brain; its density (or concentration, GMC) is the quantity characterized here. In source-based morphometry (SBM) [23], each subject's GMC data are decomposed via ICA into a set of spatially independent component maps and corresponding normalized loading coefficients. These loadings indicate the presence or contribution of each component in each subject's data. However, the interpretation of a loading coefficient difference depends upon the component.
The convention we adopted here was that if the component was predominantly positive and the loading coefficients were greater in healthy controls than in patients with schizophrenia, then we would infer that GMC is greater in controls versus patients for that component [10]. An overview of ICA for GMC data, its strengths and limitations, is available in [23]. SBM was first performed on a baseline dataset from 603 healthy controls to obtain 75 independent component maps [15]. The structural data from the 144 subjects used in this competition were then projected into this 75-component SBM space using STR [16]. The SBM module of the GIFT Toolbox (http://mialab.mrn.org/software/gift/) was used to perform the STR projections. The obtained components were separated into two broad classes, artifactual and non-artifactual, by expert reviewers, and 32 component maps were identified as non-artifactual (s-ICs 1, 2, 3, 4, 5, 6, 7, 8, 10, 13, 17, 22, 26, 28, 32, 36, 40, 43, 45, 48, 51, 52, 55, 61, 64, 67, 69, 71, 72, 73, 74, 75; [15]). Details are also available at

http://www.kaggle.com/c/mlsp-2014-mri. The loading coefficients (a 144×32 matrix) were provided to the participants to train their classifiers.

3.3. Additional Data

In order to enable more principled multimodal feature selection and combination strategies, we provided files containing the component spatial maps for the FNC and SBM features. This additional information indicated the spatial extent of the components, which could be helpful in combining functional and structural features.

4. PROBLEM STATEMENT AND EVALUATION

The problem proposed this year was a binary classification task. Formally, let x_i^FNC and x_i^SBM be the vectorial representations of the FNC correlation coefficients and SBM loadings of subject i, respectively. Furthermore, let y_i ∈ {0, 1} be the class associated with subject i (i.e., healthy control or schizophrenia patient, respectively). Given N tuples (x_i^FNC, x_i^SBM, y_i), i = 1, ..., N, where N is the size of the training set, the goal was to estimate the correct class for a new subject when only its feature representation (x^FNC, x^SBM) was provided. Performance was estimated using the area under the receiver operating characteristic curve (AUC) [25]. The dataset of 144 subjects was therefore split into training (60%, 86 subjects) and test (40%, 58 subjects) class-balanced sets. In addition, the test set was further subdivided into public (~20%, 30 subjects) and private (~20%, 28 subjects) stratified sets. The entire dataset (144 subjects) was made available to the participants, except the test set labels, which were not provided. Participants had to submit soft classification scores for both the public and private test sets and were given feedback about their AUC on the public test set only. Winners were defined based on their AUC on the private test set, which was not publicly visible until after the competition had ended.
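As a concrete sketch of this setup, the snippet below assembles a (x^FNC, x^SBM) feature matrix, performs a class-balanced 60/40 split, and scores soft predictions by AUC. The feature dimensions (378 FNC + 32 SBM) and class counts (75/69) match the competition, but the data are random stand-ins and the logistic-regression classifier is an illustrative assumption, not any team's method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-ins for the competition features: 144 subjects, 378 FNC correlations
# plus 32 SBM loadings, and labels (0 = healthy control, 1 = patient).
x_fnc = rng.uniform(-1, 1, (144, 378))
x_sbm = rng.standard_normal((144, 32))
X = np.hstack([x_fnc, x_sbm])
y = np.repeat([0, 1], [75, 69])

# Class-balanced 60/40 train/test split, as in the competition.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)

# Any classifier that outputs soft scores can be evaluated with AUC [25].
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]
print("test subjects:", len(y_te), "AUC:", round(roc_auc_score(y_te, scores), 3))
```

With random features the AUC hovers near chance (0.5), which is exactly what the competition benchmark was designed to beat.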
Given the small number of examples (subjects) in the test set, there was a chance that participants would use their entries to infer the example labels of the public test set and use that to their advantage. To avoid such a scenario, we permitted only one submission per day, capped the number of submissions per team at 45, and inflated the test data with approximately 120,000 dummy subjects. Waived MLSP 2014 Workshop registration was offered to the top 3 teams, in addition to monetary prizes of US$350, US$250, and US$150 sponsored by Springer. Each team was also invited to submit a paper to the Proceedings of the IEEE MLSP 2014 International Workshop. These awards were subject to verification of the submitted code for hand-labeling and/or other violations of the competition rules (www.kaggle.com/c/mlsp-2014-mri/rules). The winners uploaded their model code under an open source license to the competition website.

5. COMPETITION RESULTS AND ANALYSES

5.1. Basic Results and Overall AUC

This year, 2087 entries from 291 participants in 245 teams were made (the official standings were 2243, 348, and 313, respectively, including submissions from missing/deleted accounts, which are not included here). Participants submitted "soft" prediction scores for test subjects and, thus, no training set AUCs are reported here.

Fig. 1 shows the distribution of private versus public AUCs for all 2087 entries we considered. The public AUCs were computed on the public test set and reported to the participants. Private AUCs were not reported to participants until after the end of the competition. For the most part, private and public AUCs were consistent (notice the high concentration of entries around the "x = y" line (not shown), just above the dashed red lines). The correlation between private and public AUCs was 0.71 (R² ≈ 0.5). Nonetheless, there was marked inconsistency between private and public AUCs for some entries; if the two were fully consistent, the distribution of values would be much more tightly packed around the "x = y" line. This had a direct impact on the "model quality" perceived by the participants, since quality was mostly inferred from the reported public AUCs. For example, consider the range of public AUC values between 0.7 and 0.85: private AUCs could be lower than 0.5 in a few cases (and the converse is also true). This could be interpreted as "overfitting" to the public (conversely, private) datasets. While the public/private AUC strategy fits competition settings well, the small number of subjects in each set (30/28, respectively) makes the AUCs highly variable, and the ensuing inconsistencies may be misleading about the overall quality of a model. For that reason, we have opted for the overall AUC on the entire test set (public + private sets) as a more reliable and stable measure of model performance for the remainder of this work.

Fig. 2 shows the distribution of overall AUCs for all 2087 entries we considered. No entry was able to attain an overall AUC of 0.9 or higher. Generalizing from these results is difficult given the small number of examples.
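The sensitivity of AUC to such small test sets can be illustrated with a quick simulation. Everything here is an assumption for illustration (the effect-size range, noise level, and number of simulated entries); only the 30/28 public/private split mirrors the competition.

```python
import numpy as np

rng = np.random.default_rng(0)

def auc(scores, labels):
    # AUC = probability that a random positive outranks a random negative.
    pos, neg = scores[labels == 1], scores[labels == 0]
    return (pos[:, None] > neg[None, :]).mean()

n_pub, n_priv = 30, 28
pub_aucs, priv_aucs = [], []
for _ in range(2000):
    d = rng.uniform(0.0, 2.0)                       # this entry's true quality
    labels = rng.permutation([1] * 29 + [0] * 29)   # 58 test subjects
    scores = d * labels + rng.normal(0.0, 1.0, 58)  # soft prediction scores
    pub_aucs.append(auc(scores[:n_pub], labels[:n_pub]))
    priv_aucs.append(auc(scores[n_pub:], labels[n_pub:]))

# Even though each simulated entry has one fixed "true" quality, its public
# and private AUCs scatter widely, and their correlation stays well below 1.
spread = np.std(pub_aucs)
consistency = np.corrcoef(pub_aucs, priv_aucs)[0, 1]
print(round(spread, 3), round(consistency, 2))
```

The pattern (a clearly positive but far-from-perfect public/private correlation) qualitatively matches what was observed in the competition entries.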

Fig. 2. Overall AUC of all 2087 entries. The dashed red line indicates the median. The orange/yellow triangle indicates the competition benchmark (just above the median). No entry was able to attain an overall AUC of 0.9 or higher.

Fig. 1. Private vs public AUCs for all 2087 entries. The dashed red lines indicate the median values of the private and public AUCs. The blue contours indicate the concentration of entries. (a) High variability and inconsistencies between private and public AUCs are evident. The three winning entries are highlighted with green squares; notice how the first- and second-place entries attain lower public AUCs, because the official ranking optimized for extremes along the x-axis only. (b) Zoom-in on entries with private and public AUCs above their medians.

5.2. Mixture of Experts

Different classifiers trained on the same data, with similar overall training performances, may still perform differently on the same test data. Combining the output of such classifiers may reduce risk and increase overall performance compared to the best single classifier in the ensemble [26]. The key to a successful ensemble model is diversity in the output of the classifiers; in other words, different classifiers should commit errors on different instances of the test data. We hypothesized that there is diversity in the predictive models built by the teams that participated in the competition. Even for identical models, hyperparameters may have been chosen differently, resulting in different generalization performance. We therefore decided to form a mixture-of-experts system by combining the best output score (based on overall AUC) of each team, and to investigate whether we could improve upon the performance of the top team (also based on overall AUC). We selected two simple combining strategies: averaging and weighted-averaging. In the averaging strategy, the "soft" prediction scores of the teams in the ensemble were simply averaged to create a new set of scores; it should be emphasized that we only considered the best entry of each team (see Table 1). The overall AUC of the averaged scores was then computed. Weighted-averaging was conducted similarly; the only difference was that the scores of each team were weighted by that team's overall AUC prior to averaging. Note that there are other weighting methods that might be more effective, but here we opted for the simplest ones [27].
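The two combining strategies can be sketched as follows. The team scores here are synthetic stand-ins (the effect size, noise level, and number of teams are assumptions), not the actual competition submissions.

```python
import numpy as np

rng = np.random.default_rng(0)

def auc(scores, labels):
    # AUC = probability that a random positive outranks a random negative.
    pos, neg = scores[labels == 1], scores[labels == 0]
    return (pos[:, None] > neg[None, :]).mean()

# Stand-ins for the best "soft" scores of 25 teams on 58 test subjects.
labels = rng.permutation([1] * 29 + [0] * 29)
team_scores = np.array([1.2 * labels + rng.normal(0, 1, 58) for _ in range(25)])
team_aucs = np.array([auc(s, labels) for s in team_scores])

# Grow the ensemble over the top-ranked teams; combine by plain averaging
# and by weighting each team's scores by its overall AUC.
order = np.argsort(team_aucs)[::-1]
avg_aucs, wavg_aucs = [], []
for k in (1, 3, 10, 25):
    top = order[:k]
    avg_aucs.append(auc(team_scores[top].mean(axis=0), labels))
    wavg_aucs.append(auc(np.average(team_scores[top], axis=0,
                                    weights=team_aucs[top]), labels))

print(np.round(avg_aucs, 3), np.round(wavg_aucs, 3))
```

Because the stand-in teams make independent errors, averaging a handful of them tends to outperform any single one, echoing the behavior in Fig. 3; the two weighting schemes differ little, as observed in the real entries.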

Fig. 3. Performance (measured by overall AUC) of growing ensembles of participating teams (best entry of each team) as a function of the number of top teams combined, using two strategies: averaging and weighted-averaging. Inset: performance of the ensemble up to the top 25 teams. The dashed red line indicates the performance of the top team alone.

Fig. 3 shows the performance (in terms of overall AUC) as a function of the number of top-ranked teams included in the ensemble (according to their rank in Table 1). It is evident that performance was significantly improved by combining the outputs of the top 3 teams. The performance of the ensemble declined after the top 7 teams were included, but continued to outperform the best team until the top 65 teams were included. From that point onward, the ensemble performance became slightly lower than that of the best team alone (dashed line); with all teams included, performance was about 1% lower than that of the best team. We hypothesize that the declining performance of the ensemble after its peak reflects a lack of diversity in the output of the newly added teams: after a point, each new team mostly adds noise to the current ensemble. The other observation is that the two combining strategies performed similarly, with nearly identical values for the top teams.

5.3. Top 25 Teams

Table 1 shows the top 25 teams based on their overall AUC.

5.4. Winning Approaches

5.4.1. First Place

The winning team (Solution Draft) estimated class probabilities for test examples by means of a Gaussian process (GP) classifier with a prior distribution scaled by a probit transformation. The covariance function that defines the GP prior is a linear combination of constant, linear, and Matérn kernels; this composite kernel was defined by 4 hyperparameters. More information about this approach can be found in [1].

5.4.2. Second Place

The methodology proposed by this team (Alex Lebedev) had two stages. The first performed feature selection by appending a noisy feature with normally distributed values to the training data and then training a random forest.

Those features that were assigned relevance below the noisy one (according to the Gini coefficient) were considered uninformative and were discarded. In the second stage, classification per se was performed by training a Gaussian kernel SVM on the surviving features. The reader is referred to [2] for more details on this approach.

5.4.3. Third Place

The approach adopted in the third-place entry (NukaCola) involved a method from the "high-dimensional low sample size" (HDLSS) class. Methods in this class attempt to directly address the combination of a small sample size (here, the number of subjects) with a much higher-dimensional feature space. Specifically, this team used distance weighted discrimination (DWD), a method akin to support vector machines (SVMs). However, DWD explicitly accounts for the distance from each sample to the margin and penalizes solutions with many samples clustered too close to the margin. This penalty is designed to prevent the model from overfitting. DWD performs well without feature selection; thus, none was performed in this case. A detailed description and discussion of this solution can be found in [3].

Rank | Team Name | Overall AUC | Private AUC | Public AUC | Date Submitted (UTC) | Entry Id | Entry Rank
1 | Ayush | 0.889417360 | 0.841025641 | 0.941964286 | 7/6/2014 4:01 | 839083 | 1
2 | David Thaler | 0.885850178 | 0.892307692 | 0.933035714 | 7/2/2014 0:01 | 832840 | 4
3 | Jason Noriega | 0.881093936 | 0.848717949 | 0.950892857 | 6/24/2014 2:21 | 819126 | 9
4 | antonio anobii | 0.878715815 | 0.876923077 | 0.906250000 | 6/29/2014 7:40 | 828494 | 10
5 | Black Ice | 0.878715815 | 0.825641026 | 0.946428571 | 7/19/2014 3:43 | 863568 | 13
6 | Zhao Yilong | 0.876932224 | 0.882051282 | 0.897321429 | 7/6/2014 16:04 | 839747 | 15
7 | aptperson | 0.872770511 | 0.871794872 | 0.883928571 | 6/26/2014 9:19 | 823712 | 24
8 | BartekP | 0.871581451 | 0.876923077 | 0.875000000 | 6/16/2014 11:52 | 805874 | 25
9 | Konstantin Togoi | 0.870392390 | 0.910256410 | 0.857142857 | 7/7/2014 23:45 | 842079 | 27
10 | Maccios | 0.870392390 | 0.841025641 | 0.897321429 | 6/30/2014 17:23 | 830640 | 29
11 | jlef | 0.869203329 | 0.846153846 | 0.897321429 | 7/15/2014 7:42 | 856855 | 30
12 | Jun Pilao | 0.869203329 | 0.846153846 | 0.888392857 | 6/30/2014 17:26 | 830644 | 31
13 | disgon | 0.868014269 | 0.897435897 | 0.857142857 | 7/16/2014 5:41 | 859696 | 32
14 | han Zhou | 0.868014269 | 0.866666667 | 0.875000000 | 7/1/2014 19:11 | 832505 | 34
15 | lucas milcov | 0.868014269 | 0.856410256 | 0.879464286 | 6/30/2014 19:08 | 830741 | 35
16 | AdamH | 0.865636147 | 0.892307692 | 0.816964286 | 7/2/2014 17:16 | 834087 | 37
17 | Shahab | 0.864447087 | 0.912820513 | 0.821428571 | 6/17/2014 0:35 | 806849 | 39
18 | Alfredo Kalaitzis | 0.864447087 | 0.876923077 | 0.883928571 | 7/15/2014 14:55 | 857468 | 40
19 | Zhuz | 0.864447087 | 0.851282051 | 0.897321429 | 7/15/2014 3:54 | 856585 | 42
20 | Silvia Longoni | 0.863258026 | 0.856410256 | 0.870535714 | 7/1/2014 10:03 | 831798 | 47
21 | NukaCola | 0.862068966 | 0.902564103 | 0.843750000 | 6/7/2014 20:08 | 792258 | 48
22 | Vivant Shen & kaza | 0.862068966 | 0.871794872 | 0.857142857 | 6/23/2014 19:18 | 818623 | 49
23 | royal mayer | 0.862068966 | 0.846153846 | 0.879464286 | 6/30/2014 17:29 | 830650 | 50
24 | gabe | 0.860879905 | 0.882051282 | 0.861607143 | 6/18/2014 4:59 | 808820 | 51
25 | Phil Culliton | 0.859690844 | 0.882051282 | 0.875000000 | 7/16/2014 22:49 | 860720 | 52

Table 1. Top 25 teams based on overall AUC. Ties were broken by private AUC, then public AUC, then submission timestamp.

6. DISCUSSION AND CONCLUSIONS

Participation was outstanding in this year's competition. Here, we have presented the competition setup as well as the performance of participants on test data. While some participants achieved highly consistent results on the private and public test sets, there were several cases in which the performance gap between the two was significant. These inconsistencies arose because the public and private test sets were too small to be representative of the entire data. We found that, in such scenarios, performance evaluation using the overall AUC led to more reasonable assessments of model quality than the private AUC alone. A new set of top-ranked participants was therefore evaluated using overall AUCs. It was reassuring to see that some participants attained overall AUCs very close to 0.90, reinforcing the idea that good classification of schizophrenia can be obtained despite the relatively small sample sizes of the training and test sets.

A remarkable highlight of this year's competition was that these elevated overall AUCs were obtained with features derived from independent components using just the STR technique. The successful use of STR to recover independent components for the MRI datasets used in this competition suggests that reference components identified from large-cohort healthy control studies are an effective means of retrieving useful discriminative information from smaller, unrelated datasets. The takeaway is that there is substantial predictive utility in imaging data, but identifying the most predictive features is still a challenge. Competitions represent an efficient way to search the space of features.

While the neuroimaging field would clearly benefit from datasets with more examples (subjects), data collection is still a bottleneck in spite of the advances in multi-site collaborations currently underway. Therefore, future competitions should also emphasize whole-brain feature extraction from raw or lightly processed data as a way to cope with the lack of examples. Innovative feature extraction approaches can be highly effective for classification in neuroimaging. Other challenges include the prediction of clinical symptom scores and the classification of multiple clinical groups.

7.
ACKNOWLEDGEMENTS

The authors thank Nicholas Lemke for help with data collection. This work was supported by Centers of Biomedical Research Excellence (COBRE) grants 5P20RR021938/P20GM103472 to Dr. Vince D. Calhoun.

8. REFERENCES

[1] A. Solin and S. Särkkä, "The 10th Annual MLSP Competition: First Place," in IEEE MLSP, Reims, France, 2014, in press.
[2] A. V. Lebedev, "The 10th Annual MLSP Competition: Second Place," in IEEE MLSP, Reims, France, 2014, in press.
[3] K. Koncevičius, "The 10th Annual MLSP Competition: Third Place," in IEEE MLSP, Reims, France, 2014, in press.
[4] C. Bois, H. Whalley, A. McIntosh et al., "Structural magnetic resonance imaging markers of susceptibility and transition to schizophrenia: A review of familial and clinical high risk population studies," Journal of Psychopharmacology, 2014.
[5] O. Demirci and V. D. Calhoun, "Functional Magnetic Resonance Imaging – Implications for Detection of Schizophrenia," Eur Neurol Rev, vol. 4, no. 2, pp. 103-106, 2009.
[6] M. J. Jafri, G. D. Pearlson, M. Stevens et al., "A method for functional network connectivity among spatially independent resting-state components in schizophrenia," NeuroImage, vol. 39, no. 4, pp. 1666-1681, 2008.

[7] M. R. Arbabshirani, K. Kiehl, G. Pearlson et al., "Classification of schizophrenia patients based on resting-state functional network connectivity," Frontiers in Neuroscience, vol. 7, 2013.
[8] J. M. Segall, J. A. Turner, T. G. van Erp et al., "Voxel-based morphometric multisite collaborative study on schizophrenia," Schizophr Bull, vol. 35, no. 1, pp. 82-95, 2009.
[9] J. A. Turner, V. D. Calhoun, A. Michael et al., "Heritability of Multivariate Gray Matter Measures in Schizophrenia," Twin Res Hum Genet, vol. 15, Special Issue 03, pp. 324-335, 2012.
[10] C. N. Gupta, V. D. Calhoun, S. Rachakonda et al., "Patterns of Gray Matter Loss in Schizophrenia from a Large-Scale Aggregated Dataset," Schizophrenia Bulletin, submitted.
[11] E. Castro, C. N. Gupta, M. Martinez-Ramon et al., "Identification of Patterns of Gray Matter Abnormalities in Schizophrenia Using Source-Based Morphometry and Bagging," in 36th EMBC, Chicago, IL, 2014, in press.
[12] S. M. Plis, D. Hjelm, R. Salakhutdinov et al., "Deep learning for neuroimaging: a validation study," Front Neurosci, vol. 8, 2014.
[13] M. Nieuwenhuis, N. E. M. van Haren, H. E. Hulshoff Pol et al., "Classification of schizophrenia patients and healthy controls from structural MRI scans in two large independent samples," NeuroImage, vol. 61, no. 3, pp. 606-612, 2012.
[14] E. A. Allen, E. B. Erhardt, E. Damaraju et al., "A Baseline for the Multivariate Comparison of Resting-State Networks," Front Syst Neurosci, vol. 5, no. 2, 2011.
[15] J. M. Segall, E. A. Allen, R. E. Jung et al., "Correspondence between Structure and Function in the Human Brain at Rest," Frontiers in Neuroinformatics, vol. 6, 2012.
[16] E. B. Erhardt, S. Rachakonda, E. J. Bedrick et al., "Comparison of Multi-Subject ICA Methods for Analysis of fMRI Data," Human Brain Mapping, vol. 32, no. 12, pp. 2075-2095, 2011.
[17] L. Freire, A. Roche, and J. F. Mangin, "What is the best similarity measure for motion correction in fMRI time series?," IEEE Trans Med Imaging, vol. 21, no. 5, pp. 470-484, 2002.
[18] J. Ashburner and K. J. Friston, "Voxel-based morphometry - the methods," NeuroImage, vol. 11, no. 6 Pt 1, pp. 805-821, 2000.
[19] E. Allen, E. Erhardt et al., "Capturing inter-subject variability with group independent component analysis of fMRI data: a simulation study," NeuroImage, vol. 59, no. 4, pp. 4141-4159, 2012.
[20] V. D. Calhoun and T. Adali, "Multisubject Independent Component Analysis of fMRI: A Decade of Intrinsic Networks, Default Mode, and Neurodiagnostic Discovery," IEEE Reviews in Biomedical Engineering, vol. 5, pp. 60-73, 2012.
[21] S. Robinson, G. Basso, N. Soldati et al., "A resting state network in the motor control circuit of the basal ganglia," BMC Neuroscience, vol. 10, no. 1, p. 137, 2009.
[22] D. Cordes, V. M. Haughton, K. Arfanakis et al., "Mapping functionally related regions of brain with functional connectivity MR imaging," Am J Neurorad, vol. 21, no. 9, pp. 1636-1644, 2000.
[23] L. Xu, K. M. Groth, G. Pearlson et al., "Source-based morphometry: The use of independent component analysis to identify gray matter differences with application to schizophrenia," Human Brain Mapping, vol. 30, no. 3, pp. 711-724, 2009.
[24] C. N. Gupta, V. D. Calhoun, S. Rachakonda et al., "Application of Source Based Morphometry for an Aggregated Gray Matter Dataset," in 14th ICSR, Grande Lakes, FL, 2013.
[25] J. Eng, "Receiver Operating Characteristic Analysis: A Primer," Academic Radiology, vol. 12, no. 7, pp. 909-916, 2005.
[26] R. Polikar, "Ensemble based systems in decision making," IEEE Circuits and Systems Magazine, vol. 6, no. 3, pp. 21-45, 2006.
[27] L. I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms, 2nd ed., Hoboken, NJ: Wiley, 2014.