
Journal of Alzheimer’s Disease 39 (2014) 211–217 DOI 10.3233/JAD-131424 IOS Press

Beyond Statistics: A New Combinatorial Approach to Identifying Biomarker Panels for the Early Detection and Diagnosis of Alzheimer’s Disease

Elizabeth A. Milward a,b,∗, Pablo Moscato b, Carlos Riveros b and Daniel M. Johnstone c

a School of Biomedical Sciences and Pharmacy, The University of Newcastle, Callaghan, NSW, Australia
b Centre for Bioinformatics, Biomarker Discovery and Information-Based Medicine, The University of Newcastle, Callaghan, NSW, Australia
c Bosch Institute and Discipline of Physiology, University of Sydney, NSW, Australia

Accepted 29 August 2013

Abstract. Interventions to delay or slow Alzheimer’s disease (AD) progression are most effective when implemented at preclinical disease stages, making early diagnosis essential. For this reason, there is an increasing focus on discovery of predictive biomarkers for AD. Currently, the most reliable predictive biomarkers require either expensive (brain imaging) or invasive (cerebrospinal fluid collection) procedures, leading researchers to strive toward identifying robust biomarkers in blood. Yet promising early results from candidate blood biomarker studies are being refuted by subsequent findings in other cohorts or using different assay technologies. Recent evidence suggests that univariate blood biomarkers are not sufficiently sensitive or specific for the diagnosis of disorders as complex, multifactorial, and heterogeneous as AD. To overcome these present limitations, more consideration must be given to the development of ‘biomarker panels’ assessing multiple molecular entities. The selection of such panels should draw not only on traditional statistical approaches, whether parametric or non-parametric, but also on newer non-statistical approaches that have the capacity to retain and utilize information about all individual study participants rather than collapsing individual data into group summary values (e.g., mean, variance). These new approaches, facilitated by advances in computing, have the potential to preserve the context of interrelationships between different molecular entities, making them amenable to the development of panels that, as a multivariate collective, can overcome the challenge of individual variability and disease heterogeneity to accurately predict and classify AD. We argue that the AD research community should take fuller advantage of these approaches to accelerate discovery.

Keywords: Biomarker, blood, combinatorial optimization, multivariate, non-statistical

∗ Correspondence to: Dr. Liz Milward, School of Biomedical Sciences and Pharmacy MSB, University of Newcastle, Callaghan, NSW 2308, Australia. Tel.: +61 2 4921 5167; Fax: +61 2 4921 7903; E-mail: [email protected].

Considerable research effort is currently being directed toward the development of biological measures or ‘biomarkers’ that can be used to monitor health and disease states for early diagnosis and prevention or to track disease progression and treatment responses. Ideal biomarkers would be measurable using cost-effective and minimally invasive techniques [1, 2], but at present the most consistent and reliable biomarkers of Alzheimer’s disease (AD) still require expensive imaging procedures or invasive collection of cerebrospinal fluid (CSF) [3, 4]. Consequently, over the last five years there has been a flood of research activity aimed at discovering reliable blood-based biomarkers

ISSN 1387-2877/14/$27.50 © 2014 – IOS Press and the authors. All rights reserved


E.A. Milward et al. / Beyond Statistics: A New Combinatorial Approach to Identifying Biomarker Panels

for AD. Various strong candidate biomarkers are emerging, some of which are now being replicated in independent studies, as discussed further below. Yet as the area has evolved, the limitations of early pioneering studies are also becoming more apparent. One shortcoming has been that much of the early research has sought to identify biomarkers that discriminate patients with clinical AD from cognitively normal controls [5, 6]. It is questionable whether this adds anything substantial to what can already be achieved clinically with cognitive testing, although it will of course be important to try to find biomarkers which distinguish AD from other clinically similar neurological conditions or which have the capacity to differentiate different subtypes of AD [7, 8]. Finding biomarkers that discriminate patients with AD from controls may also provide insights into the molecular mechanisms underlying disease pathogenesis and might even improve diagnostic confidence in some cases. Yet, ultimately, more benefits are likely to derive from focusing on the initial, pre-clinical stages of disease—at the onset of mild cognitive impairment (MCI) or before—so that interventions can be commenced as early as possible, before further irreversible brain damage occurs [7, 9]. The challenge here is that people with MCI are a very heterogeneous group—some may go on to develop AD, some may develop other neurodegenerative disorders, some may die before progressing to full-blown disease, and a few may even revert to normal cognitive status [5, 7]. At present, there is no reliable way to decide in advance to which of these categories a person belongs. 
However, large collaborative studies such as the Alzheimer’s Disease Neuroimaging Initiative (ADNI) [10], the European Medical Information Framework (EMIF), and the Australian Imaging, Biomarker & Lifestyle Flagship Study of Ageing (AIBL) [11] are helping to overcome this problem by collecting longitudinal data on cognition, blood proteomics, and more traditional markers of brain imaging and CSF proteins. A second important limitation is that most early studies have investigated individual candidate biomarkers rather than biomarker ‘panels’. The idea that there is just one ‘magic marker’ that will enable early diagnosis of disease may hold true for various single-gene disorders, but for complex, more heterogeneous, multifactorial diseases, single markers are much less likely to be sufficiently sensitive or specific for reliable disease diagnosis [9, 12–15]. Consequently, multivariate approaches will probably need to be used to identify biomarker signatures of pre-clinical AD.

The need for multivariate biomarker panels is already recognized by leading research groups in the field [9, 12–15]. What appears to be less well understood is that optimal selection of multivariate panels entails more than just doing a routine ‘single variate’ style analysis, taking the ‘top ten’, or whatever number is deemed appropriate, and using these as the multivariate panel. Although this can work better than using a single variable on its own, strategies of this kind usually fall well short of the outcomes that can be achieved using approaches based on combinatorial optimization of particular sets of entities or features, which also take into account interrelationships between different entities. Such non-statistical approaches are already proving useful in related areas, such as investigations of interactions between multiple genetic risk factors for AD, depression, and other conditions [16–18]. One of the reasons for this is already well recognized in the AD field, namely that single variate measures are generally outperformed by multimodal approaches that combine genomics, proteomics, metabolomics, or other molecular investigations with information from different sources, for example cognitive test scores, imaging, or demographic factors such as gender, age, and education level [9, 13, 15]. In addition to this, there is another fundamental consideration that does not yet appear to be widely appreciated in the field. This is that measures that do not vary significantly between groups of control and test samples can still contribute to distinguishing control and test groups, for example through the contrast between their behavior and that of other measures in each individual [19, 20]. This concept, which is illustrated further by Fig. 1 below, can be hard for many biologists to accept at first sight, as it runs counter to the prevailing dogma that the only genuinely meaningful differences between data groups are those which are statistically significant. 
In order to understand this better, consider the issue of heterogeneity touched on above and the question of how best to identify biomarker sets that effectively distinguish different states of health or disease. Irrespective of how disease states are defined, in most scenarios there will not be completely separate, distinct groups of biological features that definitively distinguish different states of health or disease in an identical fashion across all individuals. This is because of individual variation as a result of genetic or environmental factors. This variation does not relate solely to disease heterogeneity reflecting different disease etiologies. As just one example, it is now widely recognized that, due to individual differences in genetic or other factors, two


Fig. 1. The value of considering the relationship between entities, rather than just entities in isolation, for biomarker discovery. The diagram illustrates the principle underlying the concept of ‘metafeatures’. It shows two measures (X and Y) that, when considered individually, do not vary significantly between groups of control and disease samples, but whose relationship to one another is useful for distinguishing between these groups, i.e., considering the difference between the log concentration of X and Y (equivalent to considering the ratio of the two raw concentrations), gives rise to a clearer distinction of the control and disease groups.

individuals can respond very differently to identical pathogen exposures, resulting in very different disease risks, progressions, and treatment responses [21, 22]. For this reason, a useful biomarker panel may sometimes need to combine different sets of biomarkers generated over groups of individuals. For example, some patients with a condition may have perturbations in systems A and B while other patients with the same condition have perturbations in systems A and C, where B and C, although possibly related, are not identical. While system A may provide the most useful biomarkers for diagnosis, all three systems may need to be considered for prognosis or to monitor or treat all patients effectively. Furthermore, the importance of any particular system component may depend on the state of other system components or other systems. For example, a gene variant that causes dysfunction of a particular redundant gene product may be inconsequential in many individuals. Yet it may be pathogenic in a subset of people who, because of variation in other genetic factors, are unable to compensate adequately by alternative mechanisms. Because of such considerations, the statistical approaches that combine all individual data into a single representative value, and that have traditionally been used to determine whether or not a particular molecule is important in any situation, only give part of the full picture [17, 23]. Developing a more complete perspective will require novel mathematical approaches that preserve


and capitalize on the information from individual patients. Classical experimental biology has most commonly used only statistical approaches to identify molecular differences between different biological states. While statistics provide a useful view of biological states, this is achieved at the cost of information. Whenever data are amalgamated or ‘smeared’ into a distribution or waveform or lumped into an average or median value, information about the individual components or data bits is lost. One way to minimize information loss is to combine the impressionism of statistics with alternative, more ‘pointillist’ approaches that retain information about large numbers of individual components. This is analogous to the way modern physics draws on both wave and quantum mechanics. Biologists need to develop approaches which exploit perceptual switches between alternative data interpretations to see both the wood and the trees. The use of non-statistical approaches, and their application in generating sets of biomarkers which effectively discriminate between different biological states in studies of brain diseases and cancer, are described in detail elsewhere [24–29]. As already described, for each biological entity or ‘feature’, non-statistical approaches involving combinatorial optimization take into account comparisons for all possible pairs of individuals across all the biological states being considered [24], rather than simply lumping information for groups of individuals into a single collective mean or median. Furthermore, whereas statistics identifies individual entities meeting an arbitrary, user-defined cut-off, such as a p value, the sets of entities identified by these kinds of non-statistical approaches represent optimal mathematical solutions fulfilling particular requirements. 
For example, optimal solutions may aim to effectively distinguish all possible pairs of individuals in different states and, conversely, to class as equivalent all possible pairs of individuals in the same state. Optimal sets of biological entities or ‘features’ determined in this way will therefore exhibit not only strong differences between different states but will also show strong resemblances across similar states [20, 24]. One approach based on this principle is termed the Max Cover (α,β)-k Feature Set approach (also referred to as Max Cover (α,β)-FS). This method utilizes a non-statistical strategy consisting of a two-stage filter process [24]. The first stage uses an algorithm developed by Fayyad and Irani [30] to transform a continuous numeric dataset into a discrete dataset and to then discard entities that do not provide sufficient information to discriminate between control and test groups.


Fig. 2. Selection of thresholds by Fayyad and Irani’s algorithm [30]. For each individual entity being investigated, the algorithm orders the full set of samples under consideration by signal size, irrespective of sample class, e.g., control C or test T. Possible thresholds are then set at every interface that occurs between members of different classes. The algorithm then selects the threshold which separates the two classes most effectively (here Threshold 3, i.e., a cut-off of 16.5).
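The threshold search summarized in the Fig. 2 caption can be sketched in a few lines. The code below is a minimal illustration only, assuming two classes labelled "C" and "T", no tied signal values, and made-up signal values that are not taken from Fig. 2. The function names `entropy` and `best_threshold` are hypothetical. The acceptance test follows the commonly stated form of the minimum description length criterion for this kind of entropy-based discretization; it is not a reproduction of the implementation used in [24] or [30].

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon class entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (cut, accepted) for one entity.

    cut      -- midpoint with the lowest weighted class entropy, searched
                only at interfaces between samples of different classes
    accepted -- True if the split passes the MDL-style acceptance test
    """
    pairs = sorted(zip(values, labels))          # order samples by signal
    ys = [y for _, y in pairs]
    n = len(pairs)
    base = entropy(ys)
    best = None                                  # (weighted entropy, cut, index)
    for i in range(1, n):
        (v0, y0), (v1, y1) = pairs[i - 1], pairs[i]
        if y0 == y1 or v0 == v1:
            continue                             # not a class interface (ties skipped)
        w = (i * entropy(ys[:i]) + (n - i) * entropy(ys[i:])) / n
        if best is None or w < best[0]:
            best = (w, (v0 + v1) / 2, i)
    if best is None:
        return None, False                       # single class: nothing to split
    w, cut, i = best
    gain = base - w
    k, k1, k2 = len(set(ys)), len(set(ys[:i])), len(set(ys[i:]))
    delta = math.log2(3 ** k - 2) - (
        k * base - k1 * entropy(ys[:i]) - k2 * entropy(ys[i:])
    )
    accepted = gain > (math.log2(n - 1) + delta) / n
    return cut, accepted

# Hypothetical signals: one informative entity, one uninformative entity.
informative = best_threshold(
    [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0], list("CCCCTTTT"))
uninformative = best_threshold(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], list("CTCTCTCT"))
```

In this toy case the first entity yields a cut that cleanly separates the classes and is retained, while the second yields no split worth its description cost and would be discarded in the filtering step.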

Briefly, for each entity (e.g., protein), the algorithm orders the samples by signal level and identifies the threshold that best separates the test (e.g., AD) and control groups (see Fig. 2). Sample values falling below this threshold are assigned a value of 0, while those above are assigned a value of 1. Entities that do not return a useful threshold, according to the Minimum Description Length Principle [24], are discarded from further analysis in an initial filtering step. When applied to a large dataset (e.g., a blood proteomics dataset), this approach generates a matrix of discrete data, thus effectively reducing the dimensionality of the original dataset. A more detailed explanation of this method is given elsewhere [24]. The second stage of this method involves finding a solution to the Max Cover (α,β)-k Feature Set problem [26]. This is achieved by taking into consideration, for each entity in the matrix, all possible pairs of samples, whether controls or tests, in order to search for an optimal set (‘solution’) of entities with both strong inter-class differentiation and strong intra-class similarity [24, 25, 31]. This differs greatly from statistical approaches in that it selects entities that distinguish the maximum possible number of pairs of samples in different groups (control versus test) but not pairs of samples in the same group. We have previously demonstrated the usefulness of this novel multivariate feature selection approach for finding blood biomarker signatures [20]. This approach distinguished pre-clinical AD from healthy controls more successfully than a collection of the ‘best’ markers determined by statistical univariate analysis. The advantage of this type of approach over other more conventional analyses of putative blood biomarkers is that it draws on detailed cross-comparisons among all the individual participants rather than just assessing univariate measures of class central tendency and variance in the usual way. For example, suppose Marker A is useful for discriminating patients 1, 2, 5, and 6 from controls. When constructing an optimal panel that includes Marker A, Marker B, which distinguishes patients 3 and 4 from controls, might be considered a better complement to Marker A than Marker C, which distinguishes patients 1, 2, and 5 from controls, even though Marker C distinguishes more individual patients than Marker B and Marker B may not show any significant difference when considered over the entire group. This illustrates one way a signature set can contain analytes that do not vary significantly between groups of control and test samples yet still contribute to distinguishing these groups through the contrast between their behavior and that of other analytes.

As with any emerging field that uses new methods which are still being refined, inconsistencies in sample collection, processing, and measurement methods are likely to be common, but independent groups working with diverse patient samples are now starting to generate consistent outcomes, especially when the same technology platform is used [9]. This is encouraging but not entirely unexpected, since things that very strongly differentiate different conditions should be picked up easily by most sound methods, especially when standard statistical analysis approaches are being used. Less predictably, but perhaps even more encouragingly, our analysis of the ADNI dataset using the


Max Cover (α,β)-FS method [20] identified biomarkers such as brain or B-type natriuretic peptide (BNP), C-reactive protein (CRP), and, of course, apolipoprotein E (APOE) that have subsequently been validated in other datasets using standard statistical approaches [9, 32]. Biomarkers that are jointly identified by such very different methods are more likely to represent robust findings because the Max Cover (α,β)-FS approach has such a different mathematical basis from standard statistical methods. While we have focused on the Max Cover (α,β)-FS method here, other useful non-statistical approaches are now also starting to be employed in studies of AD and related fields. One example is recursive partitioning, a classification tree strategy that has been used to identify candidate gene-gene interactions in depression [17, 18]. However, the real gold in using mathematically diverse approaches lies in the discoveries that are not being turned up by the conventional approaches. The unavoidable limitation of conventional statistical approaches is that any method that tests group summary measures is only going to find features that have a sufficiently large difference between group means relative to the group variance. In contrast, the Max Cover (α,β)-FS approach is not bound by this limitation and so has the potential to generate novel markers not found by more restrictive approaches [20, 33]. Using combinatorial optimization approaches to inform our biomarker panel selection and extend the candidate pool beyond the limits of traditional statistical approaches has allowed us to achieve greater sensitivity and specificity in classifying individuals into clinical groups [20]. While these findings remain to be independently validated in different patient groups, the improvements are unlikely to be attributable solely to over-fitting, since the number of biomarkers in our panels is the same as or less than the number in many panels selected using more traditional approaches.
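The pair-based selection principle, and the Marker A, B, and C example given earlier, can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the solver used in the cited work: it applies a simple greedy heuristic rather than an exact or metaheuristic optimization, the binary matrix is hypothetical (six patients followed by three controls), and the names `covered_pairs` and `greedy_panel` are invented for this example.

```python
from itertools import combinations

def covered_pairs(column, classes):
    """Sample pairs that one binary feature 'explains': pairs from different
    classes on which it differs, and pairs from the same class on which it agrees."""
    out = set()
    for i, j in combinations(range(len(column)), 2):
        same_class = classes[i] == classes[j]
        same_value = column[i] == column[j]
        if same_class == same_value:
            out.add((i, j))
    return out

def greedy_panel(features, classes, k):
    """Greedy heuristic (an assumption, not the published solver): add up to k
    features, each time choosing the one that covers the most uncovered pairs."""
    cover = {name: covered_pairs(col, classes) for name, col in features.items()}
    chosen, covered = [], set()
    for _ in range(k):
        candidates = [name for name in cover if name not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda name: len(cover[name] - covered))
        if not cover[best] - covered:
            break                      # no remaining feature adds coverage
        chosen.append(best)
        covered |= cover[best]
    return chosen

# Hypothetical discretized data: patients 1-6 (class T), then three controls (C).
classes = ["T"] * 6 + ["C"] * 3
features = {
    "A": [1, 1, 0, 0, 1, 1, 0, 0, 0],  # flags patients 1, 2, 5, 6
    "B": [0, 0, 1, 1, 0, 0, 0, 0, 0],  # flags patients 3, 4
    "C": [1, 1, 0, 0, 1, 0, 0, 0, 0],  # flags patients 1, 2, 5
}
panel = greedy_panel(features, classes, k=2)
print(panel)
```

Although Marker C separates more patients from controls than Marker B when each is ranked alone, every patient-control pair it separates is already separated by Marker A, so the second slot in the panel goes to Marker B.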
Another example of these new kinds of approaches to identifying biomarkers is the use of ‘meta-features’, functions involving two or more variables (e.g., the relative abundance ratio of two proteins). An example of this is illustrated in Fig. 1. For instance, using two variables determined in one experiment within each individual sample can help minimize any confounding effects due to inter-sample biological or technical variability. This also identifies features or entities that are mathematically or biologically dependent in a supra-additive (synergistic) way with regard to disease prediction capacity, i.e., potentially inter-related (whether directly or not).
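A small numerical sketch may help make the meta-feature idea tangible. The values below are hypothetical and chosen only so that neither measure separates the groups on its own while the difference of their log concentrations does; the helper names `log_ratio` and `ranges_overlap` are invented for this illustration and do not come from the cited studies.

```python
import math

def log_ratio(x, y):
    """Meta-feature: difference of log concentrations, i.e. log2(x / y)."""
    return math.log2(x) - math.log2(y)

def ranges_overlap(a, b):
    """True if the value ranges of two groups overlap, so that no single
    cut-off separates them cleanly."""
    return max(min(a), min(b)) <= min(max(a), max(b))

# Hypothetical (X, Y) concentrations measured in the same sample.
control = [(10, 10), (20, 20), (40, 40), (80, 80)]
disease = [(14, 7), (28, 14), (56, 28), (112, 56)]

x_c, y_c = zip(*control)
x_d, y_d = zip(*disease)
m_c = [log_ratio(x, y) for x, y in control]
m_d = [log_ratio(x, y) for x, y in disease]

print("X overlaps:", ranges_overlap(x_c, x_d))
print("Y overlaps:", ranges_overlap(y_c, y_d))
print("log-ratio overlaps:", ranges_overlap(m_c, m_d))
```

Because both measures come from the same sample, overall differences in signal level between samples cancel in the ratio, which is the within-sample normalization effect described above.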


This draws on a common practice in classification. For example, the concentration ratio of two different analytes has been used previously to establish CSF biomarkers of AD from univariate analysis (e.g., Aβ42/tau) [34, 35]. We have previously demonstrated the benefit of considering meta-features for enhancing the accuracy of blood biomarker panels for AD [19, 20]. Again this is completely counter to conventional biomedical thinking, which assumes, usually without question, that only measures that show real and statistically significant differences between disease and control groups will be of importance and discards all other measures from consideration.

In conclusion, the AD field stands to benefit considerably by making much fuller use of non-statistical methods such as combinatorial optimization to supplement standard statistical approaches for biomarker discovery. These more sophisticated mathematical methodologies, which take advantage of recent advances in computing power, also have a range of other applications. For example, identifying molecular features associated with pathogenesis can reveal mechanisms by which biological systems contribute to disease. Similar approaches may also help refine disease subtype classifications through correlations with distinct sets of specific biomarkers. Appreciation of these new methods should continue to grow as more large and longitudinal proteomics and other biological datasets are made available for analysis in the public domain.

ACKNOWLEDGMENTS

DMJ is supported by a National Health and Medical Research Council of Australia (NHMRC) Early Career Fellowship. PM is supported by an Australian Research Council (ARC) Future Fellowship.

Authors’ disclosures available online (http://www.jalz.com/disclosures/view.php?id=1940).

REFERENCES

[1] Laske C, Leyhe T, Stransky E, Hoffmann N, Fallgatter AJ, Dietzsch J (2011) Identification of a blood-based biomarker panel for classification of Alzheimer’s disease. Int J Neuropsychopharmacol 14, 1147-1155.
[2] Williams R (2011) Biomarkers: Warning signs. Nature 475, S5-S7.
[3] Blennow K, Hampel H, Weiner M, Zetterberg H (2010) Cerebrospinal fluid and plasma biomarkers in Alzheimer disease. Nat Rev Neurol 6, 131-144.
[4] Hampel H, Frank R, Broich K, Teipel SJ, Katz RG, Hardy J, Herholz K, Bokde AL, Jessen F, Hoessler YC, Sanhai WR, Zetterberg H, Woodcock J, Blennow K (2010) Biomarkers for Alzheimer’s disease: Academic, industry and regulatory perspectives. Nat Rev Drug Discov 9, 560-574.
[5] Tarawneh R, Holtzman DM (2009) Critical issues for successful immunotherapy in Alzheimer’s disease: Development of biomarkers and methods for early detection and intervention. CNS Neurol Disord Drug Targets 8, 144-159.
[6] Fiala M, Veerhuis R (2010) Biomarkers of inflammation and amyloid-beta phagocytosis in patients at risk of Alzheimer disease. Exp Gerontol 45, 57-63.
[7] Cedazo-Minguez A, Winblad B (2010) Biomarkers for Alzheimer’s disease and other forms of dementia: Clinical needs, limitations and future aspects. Exp Gerontol 45, 5-14.
[8] Hampel H, Broich K, Hoessler Y, Pantel J (2009) Biological markers for early detection and pharmacological treatment of Alzheimer’s disease. Dialogues Clin Neurosci 11, 141-157.
[9] Bazenet C, Lovestone S (2012) Plasma biomarkers for Alzheimer’s disease: Much needed but tough to find. Biomark Med 6, 441-454.
[10] Mueller SG, Weiner MW, Thal LJ, Petersen RC, Jack CR, Jagust W, Trojanowski JQ, Toga AW, Beckett L (2005) Ways toward an early diagnosis in Alzheimer’s disease: The Alzheimer’s Disease Neuroimaging Initiative (ADNI). Alzheimers Dement 1, 55-66.
[11] Ellis KA, Bush AI, Darby D, De Fazio D, Foster J, Hudson P, Lautenschlager NT, Lenzo N, Martins RN, Maruff P, Masters C, Milner A, Pike K, Rowe C, Savage G, Szoeke C, Taddei K, Villemagne V, Woodward M, Ames D (2009) The Australian Imaging, Biomarkers and Lifestyle (AIBL) study of aging: Methodology and baseline characteristics of 1112 individuals recruited for a longitudinal study of Alzheimer’s disease. Int Psychogeriatr 21, 672-687.
[12] Hu WT, Chen-Plotkin A, Arnold SE, Grossman M, Clark CM, Shaw LM, McCluskey L, Elman L, Karlawish J, Hurtig HI, Siderowf A, Lee VM, Soares H, Trojanowski JQ (2010) Biomarker discovery for Alzheimer’s disease, frontotemporal lobar degeneration, and Parkinson’s disease. Acta Neuropathol 120, 385-399.
[13] Burnham SC, Faux NG, Wilson W, Laws SM, Ames D, Bedo J, Bush AI, Doecke JD, Ellis KA, Head R, Jones G, Kiiveri H, Martins RN, Rembach A, Rowe CC, Salvado O, Macaulay SL, Masters CL, Villemagne VL (2013) A blood-based predictor for neocortical Abeta burden in Alzheimer’s disease: Results from the AIBL study. Mol Psychiatry, doi: 10.1038/mp.2013.40
[14] Soares HD, Chen Y, Sabbagh M, Roher A, Schrijvers E, Breteler M (2009) Identifying early markers of Alzheimer’s disease using quantitative multiplex proteomic immunoassay panels. Ann N Y Acad Sci 1180, 56-67.
[15] Lane RF, Dacks PA, Shineman DW, Fillit HM (2013) Diverse therapeutic targets and biomarkers for Alzheimer’s disease and related dementias: Report on the Alzheimer’s Drug Discovery Foundation 2012 International Conference on Alzheimer’s Drug Discovery. Alzheimers Res Ther 5, 5.
[16] Haasl RJ, Ahmadi MR, Meethal SV, Gleason CE, Johnson SC, Asthana S, Bowen RL, Atwood CS (2008) A luteinizing hormone receptor intronic variant is significantly associated with decreased risk of Alzheimer’s disease in males carrying an apolipoprotein E epsilon4 allele. BMC Med Genet 9, 37.
[17] Roetker NS, Yonker JA, Lee C, Chang V, Basson JJ, Roan CL, Hauser TS, Hauser RM, Atwood CS (2012) Multigene interactions and the prediction of depression in the Wisconsin Longitudinal Study. BMJ Open 2, pii: e000944.
[18] Roetker NS, Page CD, Yonker JA, Chang V, Roan CL, Herd P, Hauser TS, Hauser RM, Atwood CS (2013) Assessment of genetic and nongenetic interactions for the prediction of depressive symptomatology: An analysis of the Wisconsin Longitudinal Study using machine learning algorithms. Am J Public Health 103(Suppl 1), S136-S144.
[19] Rocha de Paula M, Gomez Ravetti M, Berretta R, Moscato P (2011) Differences in abundances of cell-signalling proteins in blood reveal novel biomarkers for early detection of clinical Alzheimer’s disease. PLoS One 6, e17481.
[20] Johnstone D, Milward EA, Berretta R, Moscato P (2012) Multivariate protein signatures of pre-clinical Alzheimer’s disease in the Alzheimer’s disease neuroimaging initiative (ADNI) plasma proteome dataset. PLoS One 7, e34341.
[21] Vodovotz Y, Constantine G, Faeder J, Mi Q, Rubin J, Bartels J, Sarkar J, Squires RH, Jr., Okonkwo DO, Gerlach J, Zamora R, Luckhart S, Ermentrout B, An G (2010) Translational systems approaches to the biology of inflammation and healing. Immunopharmacol Immunotoxicol 32, 181-195.
[22] Aderem A, Smith KD (2004) A systems approach to dissecting immunity and inflammation. Semin Immunol 16, 55-67.
[23] Hu T, Sinnott-Armstrong NA, Kiralis JW, Andrew AS, Karagas MR, Moore JH (2011) Characterizing genetic interactions in human disease association studies using statistical epistasis networks. BMC Bioinformatics 12, 364.
[24] Berretta R, Costa W, Moscato P (2008) Combinatorial optimization methods for finding genetic signatures from gene expression datasets. Methods Mol Biol 453, 363-377.
[25] Cotta C, Langston MA, Moscato P (2007) Combinatorial and algorithmic issues for microarray analysis. In Handbook of Approximation Algorithms and Metaheuristics, Gonzalez TF, ed. Chapman & Hall/CRC, Boca Raton, FL, pp. 74.71-74.14.
[26] Cotta C, Sloper C, Moscato P (2004) Evolutionary search of thresholds for robust feature set selection: Application to the analysis of microarray data. In Applications of Evolutionary Computing, Raidl GR, Cagnoni S, Branke J, Corne DW, Drechsler R, Jin Y, Johnson CG, Machado P, Marchiori E, Rothlauf F, Smith GD, Squillero G, eds. Springer-Verlag Berlin/Heidelberg, pp. 21-30.
[27] Mendes A, Scott RJ, Moscato P (2008) Microarrays–identifying molecular portraits for prostate tumors with different Gleason patterns. Methods Mol Med 141, 131-151.
[28] Hourani M, Berretta R, Mendes A, Moscato P (2008) Genetic signatures for a rodent model of Parkinson’s disease using combinatorial optimization methods. Methods Mol Biol 453, 379-392.
[29] Gomez Ravetti M, Rosso OA, Berretta R, Moscato P (2010) Uncovering molecular biomarkers that correlate cognitive decline with the changes of hippocampus’ gene expression profiles in Alzheimer’s disease. PLoS One 5, e10153.
[30] Fayyad UM, Irani KB (1993) Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning. Morgan Kaufmann Publishers, San Francisco, pp. 1022-1029.
[31] Gomez Ravetti M, Moscato P (2008) Identification of a 5-protein biomarker molecular signature for predicting Alzheimer’s disease. PLoS One 3, e3111.
[32] Hu WT, Holtzman DM, Fagan AM, Shaw LM, Perrin R, Arnold SE, Grossman M, Xiong C, Craig-Schapiro R, Clark CM, Pickering E, Kuhn M, Chen Y, Van Deerlin VM, McCluskey L, Elman L, Karlawish J, Chen-Plotkin A, Hurtig HI, Siderowf A, Swenson F, Lee VM, Morris JC, Trojanowski JQ, Soares H (2012) Plasma multianalyte profiling in mild cognitive impairment and Alzheimer disease. Neurology 79, 897-905.
[33] Johnstone DM, Riveros C, Heidari M, Graham RM, Trinder D, Berretta R, Olynyk JK, Scott RJ, Moscato P, Milward EA (2013) Evaluation of different normalization and analysis procedures for Illumina gene expression microarray data involving small changes. Microarrays 2, 131-152.
[34] Mattsson N, Zetterberg H, Hansson O, Andreasen N, Parnetti L, Jonsson M, Herukka SK, van der Flier WM, Blankenstein MA, Ewers M, Rich K, Kaiser E, Verbeek M, Tsolaki M, Mulugeta E, Rosen E, Aarsland D, Visser PJ, Schroder J, Marcusson J, de Leon M, Hampel H, Scheltens P, Pirttila T, Wallin A, Jonhagen ME, Minthon L, Winblad B, Blennow K (2009) CSF biomarkers and incipient Alzheimer disease in patients with mild cognitive impairment. JAMA 302, 385-393.
[35] Shaw LM, Vanderstichele H, Knapik-Czajka M, Clark CM, Aisen PS, Petersen RC, Blennow K, Soares H, Simon A, Lewczuk P, Dean R, Siemers E, Potter W, Lee VM, Trojanowski JQ (2009) Cerebrospinal fluid biomarker signature in Alzheimer’s disease neuroimaging initiative subjects. Ann Neurol 65, 403-413.
