Journal of Theoretical Biology 376 (2015) 32–38
A strategy focused on MAPT, APP, NCSTN and BACE1 to build blood classifiers for Alzheimer's disease
Marisol Herrera-Rivero a,1, María Elena Hernández-Aguilar b, Gonzalo Emiliano Aranda-Abreu b,*
a Centro de Investigaciones Biomédicas, Universidad Veracruzana, Xalapa, Veracruz, Mexico
b Centro de Investigaciones Cerebrales, Cuerpo Académico de Neuroquímica, Universidad Veracruzana, Xalapa, Veracruz, Mexico
Highlights
We used a combined model to build disease classifiers, using measures of blood pressure and serum glucose, cholesterol and triglyceride levels as well as RT-PCR expression levels of APP, NCSTN and BACE1 in peripheral blood mononuclear cells.
A set of genes was carefully selected by molecular interactions with MAPT, APP, NCSTN and BACE1 to test an expression-based classifier in a public microarray dataset.
The method of variable selection proves that even elements showing no significant differences between controls and AD, but that have somehow been linked to AD or AD-related elements, still hold a potential to be used in its diagnosis.
Article info
Article history: Received 28 October 2014; Received in revised form 21 February 2015; Accepted 31 March 2015; Available online 8 April 2015
Abstract
Background: Although Alzheimer's disease (AD) is a brain disorder, a number of peripheral alterations have been found in these patients, including differences in leukocyte gene expression; however, the key genes involved in plaque and tangle formation have shown a relatively small potential as diagnostic markers. We focused on MAPT, APP, NCSTN and BACE1 as the basis to build and compare blood classifiers for AD. Methods: We used a combined model to build disease classifiers, using measures of blood pressure and serum glucose, cholesterol and triglyceride levels as well as RT-PCR expression levels of APP, NCSTN and BACE1 in peripheral blood mononuclear cells (PBMCs) from an independent cohort of 36 individuals of cognitively-normal controls, AD and other neuropathologies. Also, a set of genes was carefully selected by molecular interactions with MAPT, APP, NCSTN and BACE1 to test an expression-based classifier in a public microarray dataset of 40 samples (AD and controls). A series of discriminant analyses and classification and regression trees (C&RTs) were used to perform classification tasks. Results: Using C&RTs, the combined model showed potential to differentially diagnose AD with up to 94.4% accuracy and 100% specificity for our independent sample. Furthermore, a subset of 16 genes showed the best diagnostic potential using a minimum number of expression variables, correctly classifying up to 100% of samples in the public dataset. Conclusions: Our unique method of variable selection proves that even elements showing no significant differences between controls and AD, but that have somehow been linked to AD or AD-related elements, still hold a potential to be used in its diagnosis. Sample size and inherent methodological limitations of this study need to be kept in mind. Our classifiers require careful further testing in larger cohorts. 
Nonetheless, we believe these results provide evidence for the utility of our innovative method, which contributes a different approach to generate promising diagnostic tools for neuropsychiatric disorders. © 2015 Elsevier Ltd. All rights reserved.
Keywords: Alzheimer's disease; Gene expression; Diagnosis; Classifiers
Abbreviations: AD, Alzheimer's disease; APP, amyloid precursor protein; Aβ, amyloid-beta peptides; BACE1, β-site APP cleaving enzyme; C&RTs, classification and regression trees; MAPT, microtubule-associated protein tau; NCSTN, γ-secretase subunit nicastrin; NP, non-AD neuropathologies; PBMCs, peripheral blood mononuclear cells; RT-PCR, retro-transcription polymerase chain reaction
* Correspondence to: Luis Castelazo Ayala S/N, Col. Industrial Animas, C.P. 91190 Xalapa, Veracruz, Mexico. Tel.: +52 228 8418900x13616; fax: +52 228 8418920.
E-mail addresses: [email protected], [email protected] (M. Herrera-Rivero), [email protected] (M. Elena Hernández-Aguilar), [email protected] (G. Emiliano Aranda-Abreu).
1 Present address: Clinical Neuroscience Unit, Department of Neurology, University Hospital Bonn, Bonn, Germany.
http://dx.doi.org/10.1016/j.jtbi.2015.03.039
0022-5193/© 2015 Elsevier Ltd. All rights reserved.
1. Background
Alzheimer's disease (AD) is the most common form of dementia, representing 50–60% of all dementia cases (De Leon et al., 2007). The underlying pathological events in AD are the amyloid and tau pathologies with the formation of plaques and tangles, respectively. Main roles in plaque formation are played by the amyloid precursor protein (APP), the only protein containing the β-amyloid (Aβ) sequence; and the β- (β-site APP cleaving enzyme, BACE1) and γ-secretases, which cleave APP to release Aβ peptides (Aβ40, Aβ42) (Vassar and Kandalepas, 2011). Nicastrin (NCSTN, also known as APH2) is the largest component of the γ-secretase complex and plays a critical role in its activity (Yang et al., 2009). Neuropathological alterations in AD are the result of various cellular processes, such as tangle formation by hyperphosphorylation and intracellular aggregation of the microtubule-associated protein tau (MAPT) (Ballatore et al., 2007), oxidative stress (Jiménez-Jiménez et al., 2006), mitochondrial dysfunction (Gu et al., 2012), abnormal molecular transport (Suzuki et al., 2006), neuroinflammation (Tuppo and Arias, 2005) and lipid dysregulation (Di Paolo and Kim, 2011). Peripheral alterations in AD affect components of the immune system, blood plasma and cells (Irizarry, 2004; Ray et al., 2007; Britschgi and Wyss-Coray, 2007), among which lymphocytes show a series of abnormalities (Kálmán et al., 2005) and similarities with neurons that suggest them as useful tools to detect the disease (Gladkevich et al., 2004). A growing list of candidate genes and common medical and lifestyle conditions have been proposed to contribute, to varying degrees, to the risk for sporadic AD, influencing disease presentation, progression and age of onset. Associations of AD with various common medical conditions (Irizarry, 2004), such as type 2 diabetes mellitus (Cheng et al., 2011), hypertension, hypercholesterolemia and hypertriglyceridemia (Frisardi et al., 2010), have been reported by different studies.
In the absence of a cure and disease-modifying treatments for AD, and in light of its rising worldwide prevalence, the need for accessible and cost-effective diagnostic tools has led scientists to undertake different approaches in their search for molecular markers that discriminate individuals with AD from non-demented aged controls. We hypothesized that the expression levels of the AD-related genes MAPT, APP, NCSTN and BACE1 in peripheral blood mononuclear cells (PBMCs) are useful elements to generate diagnostic tools, regardless of their controversial significance reported by different studies. The aim of our study was to identify and compare classifiers for AD using an approach centered on this hypothesis, selecting two sets of variables. The first set of variables combines data on APP, NCSTN and BACE1 expression in PBMCs and the blood pressure and serum glucose, cholesterol and triglyceride levels of 36 individuals (12 controls, 12 AD and 12 other neuropathologies), previously collected from an independent cohort (Herrera-Rivero et al., 2013). The second set of variables includes the expression of 32 genes selected by their interactions with MAPT, APP, NCSTN and BACE1, tested in a public dataset of PBMC microarray profiles from 40 individuals (22 controls and 18 AD) (ArrayExpress).
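The interaction-based selection of the second variable set can be pictured as a simple overlap filter. Section 2.2 states the rule: a gene is kept when it appears in more than one result list returned for the primary genes. The Python sketch below illustrates that rule only. The function name `select_shared_genes`, the dictionary layout and the placeholder gene names are assumptions made for illustration; they are not the authors' code and not real STRING or GeneDecks output.

```python
from collections import Counter


def select_shared_genes(result_lists, min_lists=2):
    """Keep genes that occur in at least `min_lists` of the result lists."""
    counts = Counter()
    for genes in result_lists.values():
        counts.update(set(genes))  # count each gene once per list
    return sorted(gene for gene, n in counts.items() if n >= min_lists)


# Hypothetical result lists keyed by (tool, primary gene).
# The placeholder names are invented and carry no biological meaning.
result_lists = {
    ("STRING", "APP"): ["GENE_A", "GENE_B", "GENE_C"],
    ("GeneDecks", "APP"): ["GENE_B", "GENE_D"],
    ("STRING", "BACE1"): ["GENE_C", "GENE_E"],
    ("GeneDecks", "MAPT"): ["GENE_F"],
}

selected = select_shared_genes(result_lists)
print(selected)
```

In this toy input, GENE_B is kept because it both interacts and shares attributes with one primary gene, and GENE_C is kept because it interacts with two primary genes; these are the two cases the selection rule covers.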
2. Methods
2.1. Clinical variables and PBMC expression of APP, NCSTN and BACE1
We selected available data on 36 individuals from our previous study (Herrera-Rivero et al., 2013), including: cognitively-healthy aged controls (group C–A, n = 12, mean age = 80 ± 2), patients clinically diagnosed with late-onset AD (group AD-A, n = 12, mean age = 80 ± 2) and a heterogeneous non-AD neuropathology control group including some of the most common brain disorders occurring in the elderly (group NP, n = 12, mean age = 78 ± 2), namely vascular dementia (VaD), Parkinson's disease (PD), schizophrenia, psychotic disorder, mild cognitive impairment (MCI), cognitive impairment due to cerebrovascular event (CVE) and traumatic brain injury (TBI). Herein, we used our previously collected data on blood pressure and serum glucose, total cholesterol and triglyceride levels, as well as APP, NCSTN and BACE1 expression in PBMCs, measured by end-point RT-PCR, from different normal and disease groups (Herrera-Rivero et al., 2013). Cognitively-healthy aged individuals were selected as normal controls and the NP group was selected as a non-AD neuropathology group to compare against AD. All individuals from our previous study signed an informed consent either by themselves (healthy volunteers and non-demented cases) or through a family member (for demented cases) prior to sample collection. The study was reviewed and approved by the Ethics Committee of the Centro de Investigaciones Cerebrales at Universidad Veracruzana and was in accordance with the requirements established by the Norma Oficial Mexicana NOM-166-SSA1-1997 for the organization and functioning of clinical laboratories. Statistical analyses were performed using the statistical software STATISTICA 7 (StatSoft) to compare all variables between the three groups by one-way ANOVA and to investigate correlations between APP, NCSTN and BACE1 expression and interactions between expression of these genes and blood pressure (comprising systolic pressure, diastolic pressure and pulse), glucose, total cholesterol and triglyceride levels. Correlations were assessed by Spearman's test. All differences and correlations were considered significant at p < 0.05.
2.2. MAPT, APP, NCSTN and BACE1-related genes
The public E-GEOD-4229 dataset was downloaded from the ArrayExpress database (ArrayExpress). This dataset consists of PBMC expression data from 22 control (group C–B) and 18 AD (group AD-B) individuals, generated using the NIA Human MGC cDNA microarray by Maes et al. (2009). Using this dataset, we wanted to test the diagnostic potential of a set of genes related to MAPT, APP, NCSTN and BACE1 (primary genes). To this end, we queried the primary genes in online tools to find proteins interacting with them, using STRING 9.05 (STRING), and genes sharing attributes with them, using the Partner Hunter from GeneDecks V3 (GeneCards). We then selected the genes that appeared in more than one result list, meaning that the gene either interacts and shares features with one of our primary genes or interacts/shares features with more than one of our primary genes. In this way, we selected a set of 32 genes for the analysis. Comparisons between controls and AD, between males and females within groups and group-gender interactions were analyzed by factorial ANOVA. We also performed a Best Subsets regression to investigate whether a smaller set from the 32 genes was capable of correctly classifying most or all samples. We used the statistical software STATISTICA 7 (StatSoft) and SigmaStat 3.5 (Systat). Differences were considered significant at p < 0.05.
2.3. AD classifiers
We used the expression and clinical data of our independent dataset as a first model to build combined AD classifiers. We also tested in the public dataset the complete selected gene set and various models obtained with the best subsets regression, by discriminant analysis, to identify the model with the minimum number of genes that satisfactorily classifies all samples and generate additional models to build expression-based AD classifiers.
For both datasets, we used a series of discriminant analyses and classification and regression trees (C&RTs) to perform the classification tasks. Finally, for each model we selected the classifiers that, in our judgment, showed the best diagnostic potential with the minimum number of variables, and compared their performance in terms of accuracy, sensitivity, specificity, positive predictive value and negative predictive value, in order to identify the best predictors of AD.
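The five performance measures named above can be derived from the counts of a two-class classification matrix. The following Python sketch uses the standard textbook definitions, which is an assumption because the paper does not spell out its formulas. The function name `diagnostic_metrics` is hypothetical. The example counts are those reported in Table 3 for tree #2 of Model 3, with AD taken as the positive class.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Return standard diagnostic measures as percentages.

    tp/fn: AD samples predicted as AD / as control.
    fp/tn: control samples predicted as AD / as control.
    Assumes every denominator is non-zero.
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "ppv": 100.0 * tp / (tp + fp),
        "npv": 100.0 * tn / (tn + fn),
    }


# Tree #2 of Model 3 (Table 3): 17 of 18 AD samples and
# 21 of 22 control samples were assigned to the correct class.
model3_tree2 = diagnostic_metrics(tp=17, fn=1, fp=1, tn=21)
for name, value in model3_tree2.items():
    print(f"{name}: {value:.1f}")
```

With these counts the sketch gives the values listed for Model 3 in Table 5 (95.0, 94.4, 95.5, 94.4 and 95.5). For the three-class Model 1, the value obtained depends on how the control and NP groups are pooled into a single negative class, so a two-class formula of this kind may not reproduce every figure in that column.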
3. Results
3.1. Gene expression/clinical variables-combined classifiers
3.1.1. Clinical variables
Previously, we analyzed the prevalence of hypertension, type 2 diabetes mellitus, hypercholesterolemia and hypertriglyceridemia, proposed risk factors for AD, in our small cohort. For this, we used measures of blood pressure and serum glucose, total cholesterol and triglyceride levels and initially classified samples as being "normal" or "elevated", according to the common clinical criteria. Overall, the most prevalent condition was hypercholesterolemia, present in 47.2% of individuals (Herrera-Rivero et al., 2013). From the 36 individuals selected for the present study, 14 (38.9%) presented more than one risk condition (5 controls, 4 AD and 5 NP), but none of them presented all four. AD individuals, compared to controls, presented only a higher prevalence of hypertriglyceridemia. Amongst controls and NP the most prevalent condition was hypercholesterolemia. Raw values were used to perform the statistical analyses. No significant differences were found between groups. We noticed that glucose levels were significantly higher in males than in females in the control group (p = 0.037). However, in AD and NP, glucose levels were higher in females than in males, although this difference did not show statistical significance.
3.1.2. APP, NCSTN and BACE1 expression in PBMCs
We used APP, NCSTN and BACE1 expression in PBMCs from control, AD and NP individuals, previously analyzed by RT-PCR (Herrera-Rivero et al., 2013). We tested for differences between groups and between genders within groups. APP and BACE1 did not significantly differ between groups, but BACE1 expression was significantly higher in males than in females in NP (p = 0.023). Because NCSTN expression was not detected in any AD individual under the conditions of the experiment, it was the only variable significantly differing in AD from both controls (p = 0.037) and NP (p = 0.001) (Herrera-Rivero et al., 2013). There was no significant difference in NCSTN expression between males and females in any group.
3.1.3. Correlations
To investigate whether interactions between expression and clinical variables exist, we performed correlation analyses for the complete dataset and then within each group. Overall, we found positive significant correlations for cholesterol with APP and NCSTN, besides the expected correlations between cholesterol and triglycerides and between systolic and diastolic pressures. In the by-group analyses, we found positive significant correlations as shown in Table 1.

Table 1 Significant correlations for clinical variables and PBMC expression of APP, NCSTN and BACE1.
Groups, in table order: All; C–A; AD-A; NP
Correlation                 r        p
Systolic–Diastolic          0.4858   0.003
Cholesterol–Triglycerides   0.3524   0.035
Cholesterol–APP             0.4208   0.011
Cholesterol–NCSTN           0.4113   0.013
Cholesterol–Triglycerides   0.6060   0.037
NCSTN–APP                   0.5995   0.039
Systolic–Diastolic          0.8162   0.001
Triglycerides–Glucose       0.6778   0.015
Systolic–APP                0.5830   0.047
Cholesterol–NCSTN           0.6270   0.029
Triglycerides–APP           0.6265   0.029
C–A: Control group in the independent cohort, AD-A: AD group in the independent cohort, NP: non-AD neuropathology control group.

3.1.4. Combined classifiers
We then used APP, NCSTN and BACE1 expression levels together with the raw values of systolic/diastolic pressures, pulse, glucose, cholesterol and triglycerides to test their combined ability to discriminate AD individuals from those who are not. We termed this Model 1. Because we were interested in the model being able to distinguish AD even when individuals who suffer from other neuropathologies are present in the dataset, we included the NP group in the analyses. By discriminant analysis, 75% of all 36 samples were correctly classified. Best performance was observed for AD, with 10 of 12 samples correctly classified (83.3% sensitivity) (Table 2). Using classification trees we obtained a maximum accuracy of 94.4%, with all built trees performing better for AD class prediction (91.7–100% sensitivity). NCSTN was identified as the most important variable for decision making. Classification matrices for all trees can be seen in Table 3 for comparative purposes. When we performed pairwise analyses, we obtained 79.2% accuracy for AD vs. controls by discriminant analysis, but 87.5% for AD vs. NP. AD condition was also best predicted (91.7% sensitivity) in both analyses (Table 2). By classification trees, sensitivity for AD vs. controls was good in all built trees (minimum of 91.7%); nonetheless, specificity decreased markedly in the last trees (from 100% to 75%). The most important element for decision making in discriminating between controls and AD was cholesterol, with a split constant of 163.5. For discrimination between AD and NP, we obtained a minimum accuracy of 91.7%, with 2 NP samples predicted as AD (100% sensitivity, 83.3% specificity). Two trees were built for this classification task, using NCSTN and BACE1 as split variables (Table 3). As shown in Table 3, trees number 1 and 2 can predict the correct class for over 88% of samples within our heterogeneous
Table 2 Classification matrices for the discriminant analyses.
Model     Observed   Predicted Control   Predicted AD   Predicted NP   Total   %Correct
Model 1   C–A        8                   3              1              12      66.7
Model 1   AD-A       2                   10             0              12      83.3
Model 1   NP         1                   2              9              12      75
Model 1   Total      11                  15             10             36      75

Model 1   AD-A       1                   11             -              12      91.7
Model 1   C–A        8                   4              -              12      66.7
Model 1   Total      9                   15             -              24      79.1

Model 1   AD-A       -                   11             1              12      91.7
Model 1   NP         -                   2              10             12      83.3
Model 1   Total      -                   13             11             24      87.5

Model 2   C–B        22                  0              -              22      100
Model 2   AD-B       0                   18             -              18      100
Model 2   Total      22                  18             -              40      100

Model 3   C–B        22                  0              -              22      100
Model 3   AD-B       0                   18             -              18      100
Model 3   Total      22                  18             -              40      100
C–A: Control group in the independent cohort, AD-A: AD group in the independent cohort, NP: non-AD neuropathology control group, C–B: control group in the public dataset, AD-B: AD group in the public dataset.
Table 3 Classification matrices for all trees built per model.

MODEL 1 (AD, control and NP)
Tree#   Group     Predicted Control   Predicted AD   Predicted NP   %Correct
1       Control   12                  0              0              100
1       AD        0                   12             0              100
1       NP        0                   2              10             83.3
1       Total     12                  14             10             94.4
2       Control   11                  0              1              91.7
2       AD        1                   11             0              91.7
2       NP        1                   1              10             83.3
2       Total     13                  12             11             88.9
3       Control   5                   4              3              41.7
3       AD        1                   11             0              91.7
3       NP        1                   1              10             83.3
3       Total     7                   16             13             72.2
4       Control   0                   9              3              0
4       AD        0                   12             0              100
4       NP        0                   2              10             83.3
4       Total     0                   23             13             61.1
Tree#   Split var (split const)
1       NCSTN (0.0), Cholesterol (163.5), Systolic (121.5), Pulse (73.5), Triglycerides (188.0), BACE1 (0.1), APP (0.6)
2       NCSTN (0.0), Cholesterol (163.5), Pulse (73.5), Triglycerides (188.0), BACE1 (0.1)
3       NCSTN (0.0), Cholesterol (163.5)
4       NCSTN (0.0)

MODEL 1 (AD vs. control)
Tree#   Group     Predicted Control   Predicted AD   %Correct
1       Control   12                  0              100
1       AD        0                   12             100
1       Total     12                  12             100
2       Control   12                  0              100
2       AD        1                   11             91.7
2       Total     13                  11             95.85
3       Control   9                   3              75
3       AD        1                   11             91.7
3       Total     10                  14             83.35
4       Control   6                   6              50
4       AD        1                   11             91.7
4       Total     7                   17             70.85
Tree#   Split var (split const)
1       Cholesterol (163.5), APP (0.4), NCSTN (0.0), Pulse (73.5), Triglycerides (188.0)
2       Cholesterol (163.5), NCSTN (0.0), Pulse (73.5), Triglycerides (188.0)
3       Cholesterol (163.5), NCSTN (0.0)
4       Cholesterol (163.5)

MODEL 1 (AD vs. NP)
Tree#   Group     Predicted AD   Predicted NP   %Correct
1       AD        12             0              100
1       NP        0              12             100
1       Total     12             12             100
2       AD        12             0              100
2       NP        2              10             83.3
2       Total     14             10             91.65
Tree#   Split var (split const)
1       NCSTN (0.0), BACE1 (0.0), BACE1 (0.6)
2       NCSTN (0.0)

MODEL 2
Tree#   Group     Predicted Control   Predicted AD   %Correct
1       Control   22                  0              100
1       AD        2                   16             88.9
1       Total     24                  16             94.5
2       Control   21                  1              95.5
2       AD        4                   14             77.8
2       Total     25                  15             86.7
3       Control   21                  1              95.5
3       AD        6                   12             66.7
3       Total     27                  13             81.1
4       Control   21                  1              95.5
4       AD        11                  7              38.9
4       Total     32                  8              67.2
Tree#   Split var (split const)
1       MAP4 (0.5), TGFB1 (19.5), DDIT3 (14.3), APOE (0.8), PSEN2 (0.0), APH2 (2.4)
2       MAP4 (0.5), DDIT3 (14.3), APOE (0.8)
3       MAP4 (0.5), DDIT3 (14.3)
4       MAP4 (0.5)

MODEL 3
Tree#   Group     Predicted Control   Predicted AD   %Correct
1       Control   22                  0              100
1       AD        0                   18             100
1       Total     22                  18             100.0
2       Control   21                  1              95.5
2       AD        1                   17             94.4
2       Total     22                  18             95.0
3       Control   21                  1              95.5
3       AD        6                   12             66.7
3       Total     27                  13             81.1
Tree#   Split var (split const)
1       MAP4 (0.5), ITM2C (0.5), DDIT3 (14.3), CTNNA1 (2.6), PRNP (4.1), RTN3 (0.6), GGA2 (17.7)
2       MAP4 (0.5), DDIT3 (14.3), CTNNA1 (2.6), RTN3 (0.6), GGA2 (17.7)
3       MAP4 (0.5), DDIT3 (14.3)
cohort using 5–7 split variables, highlighting the importance of our selected variables in the classification task. These tree classifiers using Model 1 appear to have potential for the differential diagnosis of AD.
3.2. PBMC expression-based classifiers
3.2.1. Gene expression
With the purpose of generating an expression-based classifier for AD using genes related to our primary and previously analyzed genes, we used online tools to identify genes linked to MAPT, APP, NCSTN and BACE1, based on protein interactions obtained by experimental evidence and genes sharing a number of attributes, as specified in Section 2. A final set of 32 genes, which we termed Model 2 (Table 4), was chosen for further analysis using the E-GEOD-4229 dataset. Statistical analysis showed no significant differences between AD and controls, but did show differences in a number of genes between males and females overall and within groups.
3.2.2. Best subsets
We then performed best subsets regression to look for a combination of the minimum number of these genes capable of correctly classifying the maximum number of samples. The subsets obtained were tested using discriminant analysis, identifying a subset of 16 genes that correctly predicted class for all 40 samples (Table 2). We termed this subset Model 3 (Table 4).
3.2.3. Expression-based classifiers
Using Models 2 and 3 we performed classification tasks by C&RTs. For Model 2, trees showed a maximum accuracy of 94.5% using six split variables. For Model 3, the minimum classification accuracy was 81.1%, but this tree poorly predicted AD class. For both models, the most important element for decision making was MAP4 expression, with a split constant of 0.5. As we can see in Table 3, Model 3 shows a very good potential to identify AD in this small dataset.
3.3. Comparison between classifiers
We selected one of the trees built for each model, as C&RTs provide more information about the class prediction criteria used and show a series of different options to achieve classification. To select the tree we considered best for each model, we looked for the tree showing both accuracy and sensitivity above 80% using the minimum number of variables to predict class. For Model 1, we considered the best classifier to be tree #2 built when AD, control and NP groups were included in the analysis, because it shows a potential for discriminating AD from other neuropathologies as well as controls. Based on our criteria, for Model 2 we selected tree #1 and for Model 3 we selected tree #2 as well. When comparing the selected trees for each model on their diagnostic potential, we observed that Model 3 appears to be better than Model 2 at detecting AD in this small dataset (Table 5). Model 1 showed better potential than Model 2, but not than Model 3; nevertheless, we have to keep in mind that this classifier using Model 1 discriminated AD not only from controls but from other neuropathologies as well, giving it an advantage over Model 3.

Table 4 Genes included in the expression-based classifiers.
AATF (a); AKT1; APBA2 (b); APBB1; APH2 (a,b); APLP2 (a,b); APOA1; APOE
APP; CDK5; CTNNA1 (a); DDIT3 (a,b); FYN (a); GGA2 (a,b); GSK3B (b); ITM2B (b)
ITM2C (a,b); MAP4 (a,b); MAPT; PPP2R4 (a); PRNP (a,b); PSEN1 (b); PSEN2 (b); RTN3 (a,b)
SNCA (a); STUB1 (b); TGFB1 (b,c); TNFRSF1A (a,b); TOMM40 (a); TUBB; UBC (b,c); YWHAZ (a,b,c)
(a) Genes in the Best Subset conforming Model 3. (b) Differed between males and females with p < 0.05. (c) Differed in Group*Gender analysis with p < 0.05.

Table 5 Comparison of tree classifiers showing the best diagnostic potential, using the minimum number of variables.
                   Model 1           Model 2       Model 3
Classes            AD, control, NP   AD, control   AD, control
#Total variables   9                 32            16
#Used variables    5                 6             5
Tree number        2                 1             2
Accuracy (%)       88.9              94.5          95.0
Sensitivity (%)    91.7              88.9          94.4
Specificity (%)    87.5              100           95.5
PPV (%)            91.7              88.9          94.4
NPV (%)            95.8              91.7          95.5
PPV: positive predictive value, NPV: negative predictive value.

4. Discussion
In recent years, biomarker discovery research has explored blood expression (Kálmán et al., 2005) and plasma proteins (Irizarry, 2004), cerebrospinal fluid (CSF) (Mattsson et al., 2012), molecular and brain imaging (Kohannim et al., 2010), and neuropsychological assessment (Gomar et al., 2011), amongst other elements, to find ways to satisfactorily distinguish AD individuals from those with normal cognition, making array technologies, nuclear magnetic resonance imaging (MRI) and machine learning methods very popular tools. We present here an exploratory study aiming to test an innovative approach to build and compare different classifiers for AD, using as a basis genes playing primary roles in plaque and tangle formation: MAPT, APP, NCSTN and BACE1. For one of our models, we wanted to include clinical variables that serve as measures to identify medical conditions common in adulthood and that have been linked with increased risk for AD. Besides comparing AD with cognitively-normal aged controls, we were interested in comparing AD against a group of individuals with other neuropsychological conditions. For this model, we focused on using the most easily available and cost-effective technologies to measure the selected variables, for which the serum colorimetric methods and end-point RT-PCR from PBMCs were chosen. We consider results obtained using our combined model (Model 1) to be satisfactory, particularly because of the potential it showed to discriminate between AD and NP, which suggests this model might be a good tool to be used in the differential diagnosis of AD. A tree classifier using Model 1 may represent a cost-effective and accessible method for every research or clinical laboratory to increase accuracy in identifying AD. The combination of different types of data has been shown to be a good approach for AD classification (Kohannim et al., 2010; Gomar et al., 2011). Nevertheless, we were also interested in testing a gene expression-only model although, unlike most studies, following a non-significance-based approach for variable selection. This time using microarray technology through the public E-GEOD-4229 dataset, which contains expression profiles from the same cell type we previously studied (PBMCs), two models were chosen for further
analysis, one being a subset (Model 3) of the original larger set of genes (Model 2). Classifiers built with these models showed higher accuracy than those built with Model 1, using discriminant analysis. However, using C&RTs and comparing all trees generated for each model, we observed that Model 1 achieved better class predictions for AD compared to controls, while Models 2 and 3 better predicted control class over AD class. NCSTN and cholesterol were the most important split variables in the combined model, and MAP4 was the most important in both expression-based models. Of particular importance for us is the classifier built using Model 1 because it represents a potential cost-effective tool that may be easily implemented to help in differentiating AD from other neuropathologies, which is an important and sometimes difficult task. Misdiagnosis of dementia in the elderly has important implications for both the patients' and their families' quality of life. An accurate diagnosis allows the best management of the disease, influencing treatment and planning for the future. Model 3 showed good diagnostic potential here for AD, although further work would require testing performance when other neuropathologies are included in the dataset and the advantage of incorporating clinical/demographic variables to the classifier. There is little information available regarding PBMC expression of MAPT, APP, NCSTN and BACE1 in AD and, to our knowledge, no correlation studies between expression of these genes and blood pressure, glucose and lipid levels had been made prior to our reports. NCSTN expression did not show alterations in AD in a study reported by Marques et al. (2012), while BACE1 expression increased. When we studied our independent cohort, we found NCSTN differed from controls and NP, but BACE1 did not (Herrera-Rivero et al., 2013). This might be due to a variety of factors including the primer sequences used for our PCR, the housekeeping gene selected, sample sizes and populations, and even the statistical method used for data analysis. Nevertheless, NCSTN is a crucial molecule in AD pathology and showed importance for AD classification under the conditions of our studies. Hypercholesterolemia, which is a common medical condition in the adult population, has been linked with an increased risk for AD, apparently associated with Aβ generation and aggregation through the cholesterol interactions with ApoE (Puglielli et al., 2003). Here we not only identified serum cholesterol level as an important element for AD classification but also found positive significant correlations with NCSTN and APP expression in PBMCs; this contributes to the growing evidence supporting a fundamental role for cholesterol in amyloid pathology in AD. The MAP4 gene codes for the major non-neuronal microtubule-associated protein, which shares many characteristics with the neuronal MAPT; as expected, in PBMCs MAP4 expression showed greater importance for AD classification than MAPT. Even though there were a number of differences in expression between males and females, the truly important differences between genders for our purpose would be those that remain significant within groups. As TGFB1, UBC and YWHAZ showed statistical difference in the Group*Gender analysis, we believe it may be advantageous to include gender as a demographic variable for this type of expression-based classifier.
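The split variables and split constants discussed above (for example MAP4 at 0.5 or cholesterol at 163.5) are the kind of output a classification tree produces when it searches each variable for the threshold that best separates the classes. The Python sketch below shows one such search for a single variable. It assumes Gini impurity as the splitting criterion, which the paper does not state, and the expression values are invented for illustration; the function names `gini` and `best_split` are hypothetical and this is not the STATISTICA implementation.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best binary split."""
    pairs = list(zip(values, labels))
    candidates = sorted(set(values))
    best = (None, float("inf"))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2.0
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical expression values, invented for illustration only.
expression = [0.2, 0.3, 0.4, 0.6, 0.7, 0.9]
diagnosis = ["AD", "AD", "AD", "control", "control", "control"]

threshold, impurity = best_split(expression, diagnosis)
print(threshold, impurity)
```

A full tree would repeat this search over all variables at every node and then prune, which is one way to read the nested sets of split variables listed for successive trees in Table 3.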
5. Conclusions
Overall, results from this study suggest: (a) a link between hypercholesterolemia and Aβ generation through an increase in NCSTN and APP expression, although molecular mechanisms should be further investigated, (b) utility of demographic variables such as gender in expression-based AD classifiers, and (c) a potential for PBMC disease-related gene expression and clinical measures of risk conditions to build accessible and cost-effective classifiers for AD, capable of discriminating the latter from other neuropathologies. Furthermore, we demonstrate here that discrimination between AD and cognitively-healthy individuals can be achieved using non-significant PBMC gene expression by selecting sets of genes interacting or sharing attributes with the primary disease-related genes, an innovative approach that we propose may be implemented to build disease classifiers not only for AD but also for other similarly complex disorders. Of note, we have previously tested this approach in a larger cohort consisting of AD, MCI and control individuals, obtaining similarly satisfactory results (unpublished work), suggesting our unique approach may represent an important contribution for disease classification. Although we bear in mind the present sample size and methodological limitations, the positive results obtained in this exploratory study encourage further work to validate these AD classifiers in a larger cohort.
Competing Interests
The authors declare that they have no competing interests.
Authors' Contributions
MHR: study conception and design, data acquisition, analysis and interpretation, manuscript preparation. MEHA and GEAA: approval of the study design and the final version of the manuscript.
Acknowledgements
Funding for this work was part of the doctoral scholarship (#223277) in Biomedical Sciences granted to MHR by Consejo Nacional de Ciencia y Tecnología (CONACYT, Mexico). The authors would like to thank Abraham Soto-Cid, PhD, for his contributions to the study from which the present was derived.
References
ArrayExpress, functional genomics data – EMBL-EBI. 〈www.ebi.ac.uk/arrayexpress/〉.
Ballatore, C, Lee, VM, Trojanowski, JQ, 2007. Tau-mediated neurodegeneration in Alzheimer's disease and related disorders. Nat. Rev. Neurosci. 8, 663–672.
Britschgi, M, Wyss-Coray, T, 2007. Systemic and acquired immune responses in Alzheimer's disease. Int. Rev. Neurobiol. 82, 205–233.
Cheng, D, Noble, J, Tang, MX, et al., 2011. Type 2 diabetes and late-onset Alzheimer's disease. Dement. Geriatric Cogn. Disord. 31, 424–430.
De Leon, MJ, Mosconi, L, Blennow, K, et al., 2007. Imaging and CSF studies in the preclinical diagnosis of Alzheimer's disease. Ann. N. Y. Acad. Sci. 1097, 115–145.
Di Paolo, G, Kim, TW, 2011. Linking lipids to Alzheimer's disease: cholesterol and beyond. Nat. Rev. Neurosci. 12, 284–296.
Frisardi, V, Solfrizzi, V, Seripa, D, et al., 2010. Metabolic-cognitive syndrome: a cross-talk between metabolic syndrome and Alzheimer's disease. Ageing Res. Rev. 9, 399–417.
GeneCards, the human gene compendium – Weizmann Institute of Science. 〈http://www.genecards.org/index.php?path=/GeneDecks#〉.
Gladkevich, A, Kauffman, HF, Korf, J, 2004. Lymphocytes as a neural probe: potential for studying psychiatric disorders. Prog. Neuropsychopharmacol. Biol. Psychiatry 28, 559–576.
Gomar, JJ, Bobes-Bascaran, MT, Conejero-Goldberg, G, et al., 2011. Utility of combinations of biomarkers, cognitive markers, and risk factors to predict conversion from mild cognitive impairment to Alzheimer's disease in patients in the Alzheimer's disease neuroimaging initiative. Arch. Gen. Psychiatry 68, 961–969.
Gu, XM, Huang, HC, Jiang, ZF, 2012. Mitochondrial dysfunction and cellular metabolic deficiency in Alzheimer's disease. Neurosci. Bull. 28, 631–640.
Herrera-Rivero, M, Soto-Cid, A, Hernández, ME, Aranda-Abreu, GE, 2013. Tau, APP, NCT and BACE1 in lymphocytes through cognitively normal ageing and neuropathology. An. Acad. Bras. Cienc. 85 (4), 1489–1496. 10.1590/0001-376520130013.
Irizarry, MC, 2004. Biomarkers of Alzheimer disease in plasma. J. Am. Soc. Exp. Neurother. 1, 226–234.
Jiménez-Jiménez, FJ, Alonso-Navarro, H, Ayuso-Peralta, L, Jabbour-Wadih, T, 2006. Oxidative stress and Alzheimer's disease (article in Spanish). Rev. Neurol. 42, 419–427.
Kálmán, J, Kitajka, K, Pákáski, M, et al., 2005. Gene expression profile analysis of lymphocytes from Alzheimer's patients. Psychiatr. Genet. 15, 1–6.
Kohannim, O, Hua, X, Hibar, DP, et al., 2010. Boosting power for clinical trials using classifiers based on multiple biomarkers. Neurobiol. Aging 31, 1429–1442.
Maes, OC, Schipper, HM, Chertkow, HM, Wang, E, 2009. Methodology for discovery of Alzheimer's disease blood-based biomarkers. J. Gerontol. A Biol. Sci. Med. Sci. 64, 636–645.
Marques, SC, Lemos, R, Ferreiro, E, et al., 2012. Epigenetic regulation of BACE1 in Alzheimer's disease patients and in transgenic mice. Neuroscience 18, 256–266.
Mattsson, N, Andreasson, U, Carrillo, MC, et al., 2012. Proficiency testing programs for Alzheimer's disease cerebrospinal fluid biomarkers. Biomark. Med. 6, 401–407.
Puglielli, L, Tanzi, RE, Kovacs, DM, 2003. Alzheimer's disease: the cholesterol connection. Nat. Neurosci. 6, 345–351.
Ray, S, Britschgi, M, Herbert, C, et al., 2007. Classification and prediction of clinical Alzheimer's diagnosis based on plasma signaling proteins. Nat. Med. 13, 1359–1362.
STRING, functional protein association networks. 〈http://www.string-db.org〉.
Suzuki, T, Araki, Y, Yamamoto, T, Nakaya, T, 2006. Trafficking of Alzheimer's disease-related membrane proteins and its participation in disease pathogenesis. J. Biochem. 139, 949–955.
Tuppo, EE, Arias, HR, 2005. The role of inflammation in Alzheimer's disease. Int. J. Biochem. Cell Biol. 37, 289–305.
Vassar, R, Kandalepas, PC, 2011. The β-secretase enzyme BACE1 as a therapeutic target for Alzheimer's disease. Alzheimers Res. Ther. 3, 20.
Yang, M, Cai, F, Pan, Q, et al., 2009. Transcriptional regulation of the Alzheimer's disease-related gene, nicastrin. Prog. Biochem. Biophys. 36, 994–1002.