Metabolomics (2012) 8:86–98 DOI 10.1007/s11306-011-0292-5

ORIGINAL ARTICLE

Chemometric approaches to improve PLSDA model outcome for predicting human non-alcoholic fatty liver disease using UPLC-MS as a metabolic profiling tool

Guillermo Quintás • Nuria Portillo • Juan Carlos García-Cañaveras • José Vicente Castell • Alberto Ferrer • Agustín Lahoz



Received: 17 December 2010 / Accepted: 14 February 2011 / Published online: 25 February 2011 © Springer Science+Business Media, LLC 2011

Abstract An MS-based metabolomics strategy including variable selection and PLSDA analysis has been assessed as a tool to discriminate between non-steatotic and steatotic human liver profiles. Different chemometric approaches for uninformative variable elimination were performed by using two of the most common software packages employed in the field of metabolomics (i.e., MATLAB and SIMCA-P). The first approach was performed with MATLAB, where the PLS regression vector coefficient values were used to classify variables as informative or not. The second approach was run under SIMCA-P, where variable selection was performed according to both the PLS regression vector coefficients and VIP scores. PLSDA model performance features, such as model validation, variable selection criteria, and potential biomarker output, were assessed for comparison purposes. One interesting finding is that variable selection improved the classification predictiveness of all the models while facilitating metabolite identification and providing enhanced insight into the metabolic information acquired by the UPLC-MS method. The results show that the proposed strategy is a potentially straightforward approach to improve model performance. Among others, GSH, lysophospholipids and bile acids were found to be the most important altered metabolites in the metabolomic profiles studied. However, further research and more in-depth biochemical interpretations are needed to unambiguously propose them as disease biomarkers.

Keywords Metabolomics · Mass spectrometry · PLSDA and steatosis

Electronic supplementary material The online version of this article (doi:10.1007/s11306-011-0292-5) contains supplementary material, which is available to authorized users.

G. Quintás
Unidad Analítica Mixta, Instituto de Investigación Sanitaria-Fundación Hospital La Fe, Valencia, Spain

N. Portillo · A. Ferrer
Department of Applied Statistics, Operations Research and Quality, Universidad Politécnica of Valencia, Valencia, Spain

J. C. García-Cañaveras · J. V. Castell
Departamento de Bioquímica y Biología Molecular, Facultad de Medicina, Universidad de Valencia, Valencia, Spain

J. C. García-Cañaveras · J. V. Castell · A. Lahoz (✉)
Unidad de Hepatología Experimental, Instituto de Investigación Sanitaria Fundación Hospital La Fe, Av. Campanar, 21, 46009 Valencia, Spain
e-mail: [email protected]

1 Introduction


Metabolomics measures the "downstream" products of proteins and genes. Metabolic profiles are particularly good reporters of an organism's phenotype or tissue physiology (Nicholson et al. 1999), and provide information which is not accessible through alternative "omics" approaches such as genomics or proteomics. Recent studies have shown that mass spectrometry-based (MS-based) metabolomics is a reliable tool for detecting biomarker patterns that differentiate samples. This information has proven effective for disease diagnosis (Cortes et al. 2010; Li et al. 2010; Pasikanti et al. 2010). MS untargeted metabolic profiling produces huge amounts of data. Extraction of useful and reliable (i.e., statistically significant) information remains one of the most important challenges faced by analysts to date. Among other methods, partial least squares-discriminant

MS-based metabolomic combined with chemometrics for predicting NAFLD

analysis (PLSDA) is a well-established chemometric approach for supervised analyses based on a PLS model in which the dependent variable (Y block) represents class membership (Wold et al. 2001; Matthew and William 2003). PLSDA is preferred to principal component analysis (PCA) for sample discrimination because the dimension reduction provided by PLS is guided explicitly by among-groups variability, whereas PCA is only capable of identifying gross variability directions and cannot distinguish "among-groups" from "within-groups" variability (Barker and Rayens 2003; Brereton 2009). Metabolomic approaches usually deal with a low number of samples (i.e., objects), but up to several hundred predictor variables. PLSDA as a dimension reduction approach is especially well suited to handle this type of complex dataset. However, a potential pitfall of PLSDA and other multivariate approaches is the possibility of overfitting, leading to apparent class separation in situations in which there is no real class difference (Bijlsma et al. 2006; Westerhuis et al. 2008a). For this reason, appropriate variable selection in metabolomics is an active field of research, as it provides significant improvements in both model performance and model interpretability by reducing complexity and the likelihood of overfitting. Variable selection is also especially relevant for the further development of 'target' MS approaches for the quantitative determination of selected biomarkers.

The term fatty liver identifies a liver in which lipids account for more than 5% of the liver's wet weight. Non-alcoholic fatty liver disease (NAFLD) is a worldwide phenomenon with increasing prevalence which is commonly associated with obesity, dyslipidaemia, insulin resistance and diabetes. For a long time, fatty liver, or hepatic steatosis, was considered a benign manifestation. However, recent data indicate that subjects with NAFLD develop a wide spectrum of clinical and pathological manifestations. Besides, dietary and genetic factors also determine susceptibility to NAFLD and its progression to cirrhosis, hepatocellular carcinoma and liver failure; such complications are further indications for liver transplantation (den Boer et al. 2004; Cheung and Sanyal 2009; Dowman et al. 2010).

The main goal of this paper was to assess the potential of MS as a metabolomic profiling tool to investigate steatotic and non-steatotic human liver profiles for discriminative purposes. Consequently, human liver tissues including steatotic and non-steatotic samples from discarded liver grafts were analysed by ultra performance liquid chromatography quadrupole-time-of-flight (UPLC-Q-ToF) mass spectrometry. Sample preparation and data acquisition were conducted in compliance with good laboratory practices (GLP). To minimise undesirable sources of variation, instrumental stability was monitored throughout the study by using different quality controls (QC) and internal standards (IS). Metabolomic data were processed using two different chemometric approaches involving two of the most used software packages in metabolomics (MATLAB and SIMCA-P) via the original non-linear iterative partial least squares (NIPALS) algorithm. Both methods included data pre-processing, PLSDA analysis, variable selection and model validation (Fig. 1), topics which are discussed in depth throughout this paper.

2 Materials and methods

2.1 Samples and reagents

A total of 60 liver tissue samples obtained from the Liver Bank of the University Hospital La Fe in Valencia (eastern Spain) (UHE-LAFE/CIBERehd) were analysed. Liver tissue samples were classified as steatotic or non-steatotic (control) in accordance with their lipid and triglyceride content (Gomez-Lechon et al. 2007) and with available histological information. Table 1 summarises the study cohort characteristics and hepatic lipid content. The study was approved by the Ethics Committee of the University Hospital La Fe.

All the solvents and additives used were of mass spectrometry grade: water was purchased from Romil (Cambridge, UK); methanol and acetonitrile were acquired from Fisher Scientific (Loughborough, UK); formic acid, acetaminophen, caffeine, sulphadimethoxine, verapamil, reserpine, leucine enkephalin, erythromycin and chloroform were purchased from Sigma-Aldrich Quimica SA (Madrid, Spain). The kits used for the determination of total lipids and triglycerides were purchased from SpinReact (Girona, Spain).

2.2 Sample pre-treatment

Liver extraction was performed by slightly modifying a previously reported procedure (Wu et al. 2008). A portion of accurately weighed frozen hepatic tissue (~100 mg) was placed in a 2 ml polypropylene tube containing CK14 ceramic beads (Precellys, France). Then 4 ml/g of methanol and 1.25 ml/g of an aqueous solution of 2.4 µM leucine enkephalin and 1.2 µM erythromycin (IS) were added sequentially to the tube. Liver tissues were homogenised twice for 25 s at 6000 rpm in a Precellys 24 Dual system (Precellys, France). Tissue homogenates were transferred to a 2 ml Eppendorf tube. Then, 2 ml/g of the aqueous solution were used to improve homogenate recovery. Metabolites were extracted from tissue homogenates using 4 ml/g chloroform stabilised with ethanol. Subsequently, samples were vortexed three times for 15 s for


G. Quintás et al.


extraction and mixed for 20 min at 4°C in a tube rotator. After mixing, samples were left on ice for 15 min and centrifuged for 15 min at 10000 × g and 4°C. The liquid phase was transferred to a clean Eppendorf tube and centrifuged for 7 min. Aqueous and organic phases were carefully withdrawn into two separate clean Eppendorf tubes and stored at -80°C until analysis. Before analysis, frozen aqueous extract samples were allowed to thaw on ice. Then, 100 µl of aqueous extracts were placed in UPLC glass vials, and 6 µl of an aqueous solution containing 1.16% (v/v) formic acid and 0.83 µg/ml reserpine were added. Currently, research involving both the hydrophilic interaction chromatography analysis of the aqueous phase and a reversed phase analysis of the organic layer is being carried out to achieve more comprehensive metabolomic tissue profiling.

Fig. 1 Schematic work flow of the chemometric approaches

Table 1 Summary of the clinical characteristics of steatotic and non-steatotic subjects

Characteristics            Non-steatotic    Steatotic
No. of subjects            29               31
Age (years)                52 ± 19          48 ± 16
Sex
  Male                     12               21
  Female                   17               10
Body mass index (kg/m²)    26 ± 4           29 ± 4
µg lipids/mg prot          200 ± 80         1300 ± 700
µg TGs/mg prot             70 ± 60          800 ± 400

Values are given as mean ± standard deviation. TG triglycerides

2.3 UPLC-Q-ToF mass spectrometry

A reversed phase analysis was performed in an Acquity UPLC™ chromatograph using an HSS T3 (100 × 2.1 mm, 1.8 µm) C18 column with a HSS T3 VanGuard
precolumn (5 × 2.1 mm, 1.8 µm) from Waters (Milford, MA, USA). A 15-min linear gradient elution was performed at a flow of 440 µl/min as follows: initial conditions of 100% of solvent A (0.1% of formic acid in water) were kept for 0.5 min, followed by a linear gradient from 0 to 95% of mobile phase B (0.1% of formic acid in methanol) over 7.5 min; isocratic conditions of 95% B were held for 3.5 min; finally, a 0.5 min gradient was used to return to the initial conditions, which were maintained for 3 min. Column and autosampler temperatures were set at 40°C and 4°C, respectively. The eluting analytes were detected using a Q-ToF SYNAPT HDMS spectrometer (Waters, Milford, MA, USA). The electrospray conditions were as follows: capillary and cone voltages were set at 3.5 kV and 35 V in the positive mode; desolvation and source temperatures were set at 340°C and 120°C, respectively; flow rates of the cone and nebulisation gases were set at 60 and 800 l/h, respectively. Full scan data were collected in the TOF MS mode over the mass-to-charge (m/z) range from 50 to 950 with a scan time of 0.08 s. A Lock Spray interface was used to maintain mass accuracy during the analysis. To this end, a 50 pg/ml solution of leucine enkephalin (m/z 556.2771) in acetonitrile:water (1:1) containing 0.1% of formic acid was infused post-column using an isocratic pump at a flow rate of 40 µl/min and acquired with a scan time of 0.2 s every 10 s. The data station operating software used was MassLynx 4.1 (Waters, Milford, MA, USA).

2.4 Sample acquisition

Sample acquisition was designed to monitor and minimise undesired sources of variation and to ensure the quality of the results with regard to the reproducibility and stability of the instrumental system. Three types of quality controls (QC) were used throughout this study: (i) a test mixture of standard compounds, (ii) a pool of liver aqueous extracts and (iii) blank samples. The test mixture comprised a standard solution of five selected compounds, covering a wide range of both molecular weights and retention times (rt): acetaminophen (m/z 152.0712, rt 3.00 min), caffeine (m/z 195.0882, rt 3.90 min), sulphadimethoxine (m/z 311.0814, rt 4.92 min), verapamil (m/z 455.2910, rt 5.68 min) and reserpine (m/z 609.2812, rt 6.07 min). A pooled QC sample was prepared by pooling liver aqueous extracts from all the samples comprising the study. Blank QC samples were analysed to check for possible carryover effects during the sample analysis. QC samples were alternated throughout the analysis batch sequence as follows: 1 test mixture for every 15 biological samples, 1 blank for every 10 biological samples and 1 pool QC for every 6 samples. Reserpine (m/z 609.2812, rt 6.07 min) was used as IS to normalise the instrument response (injection volume, detector sensitivity changes) and to correct for possible rt shifts during the analysis. Instrument performance was checked for each injected sample by monitoring mass accuracy, rt and the peak area of reserpine and leucine enkephalin as IS. Finally, other important considerations were also followed to ensure method robustness: (i) the total batch analysis duration was limited to 40 h to minimise the effect of a potential drift in detector sensitivity (Zelena et al. 2009); (ii) the UPLC-MS system was conditioned by 12 injections of a pool of QC samples before analysing biologically relevant samples; (iii) samples were analysed randomly and in triplicate to avoid bias related to the order of injection, and to either minimise or average out the sources of variation other than the inherent biochemical composition of the samples (Want et al. 2010).

2.5 Data analysis and biomarker selection

Raw spectrometric data were processed using the MarkerLynx XS application software (Waters, Milford, MA, USA). The parameters set were a peak baseline noise of 20, a peak width of 6 s (at 5% height), and a noise elimination threshold of 6 with activated de-isotope filtering. Features (rt and m/z values) were aligned and matched across all the samples. Under these conditions, an X_raw (180 × 1394) data matrix was obtained, in which each row represents an object (i.e., UPLC injection) and each column is a metabolite peak at a specific rt. A multivariate data analysis was carried out using two chemometric approaches. The first approach consisted of a computationally intensive method using MATLAB 2008b (Mathworks Inc. Natick, MA, USA), the PLS Toolbox 5.8.1 (Eigenvector Research Inc., Wenatchee, WA, USA) and in-house written MATLAB functions. A supervised PLSDA model (called the PLSDA-bM model) was calculated from a subset of variables selected from the coefficients of the PLSDA regression vector. Double cross validation (2CV) and permutation testing were used to statistically validate the selection of the number of latent variables and also for variable selection (Bijlsma et al. 2006; Westerhuis et al. 2008a). The second approach consisted of a less computationally intensive method using SIMCA-P+ v.12 (Umetrics, Windsor, UK). Here two models were calculated employing the subsets of variables selected by using both the variable importance in the projection (VIP) scores (called the PLSDA-V model) and the coefficients of the PLSDA regression vector (called the PLSDA-bS model). Cross validation (CV) and the response permutation test were used for model validation (Lindgren et al. 1996; Eriksson et al. 2008). An external validation set of twelve samples (six non-steatotic and six steatotic liver samples) was used to assess the predictive potential of all the calculated models. A detailed work flow of the chemometric approaches is depicted in Fig. 1.
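As an illustration of how a class-balanced external validation set of this size could be set aside, the following minimal Python sketch draws six samples per class from the 60 livers. The random draw, the seed and the function name `stratified_holdout` are assumptions made for illustration only; the text does not state how the twelve samples were chosen.

```python
import random


def stratified_holdout(labels, n_per_class, seed=0):
    """Return (calibration indices, validation indices) with a fixed
    number of validation samples drawn at random from each class."""
    rng = random.Random(seed)
    validation = []
    for cls in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == cls]
        validation.extend(rng.sample(members, n_per_class))
    held_out = set(validation)
    calibration = [i for i in range(len(labels)) if i not in held_out]
    return calibration, sorted(validation)


# Class sizes taken from Table 1 (29 non-steatotic, 31 steatotic)
labels = ["non-steatotic"] * 29 + ["steatotic"] * 31
train_idx, test_idx = stratified_holdout(labels, n_per_class=6)
```

With these class sizes the sketch leaves 48 samples for model calculation and 12 for external validation.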


Metabolite identification was performed by comparing the accurate mass of each discriminatory marker (±5 ppm) with the masses of the metabolites annotated in the Human Metabolome Database (HMDB) (Wishart et al. 2009). The MS/MS spectra of the candidate compounds were searched in the MassBank database (Horai et al. 2010) and in the HMDB, and compared with the MS/MS spectra of the discriminatory markers. Finally, standard solutions of commercially available metabolites were analysed to confirm the identity of a reduced set of compounds.
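The ±5 ppm accurate-mass criterion can be illustrated with a short Python sketch. The reference values are the m/z values quoted in Sect. 2.4 for three test-mixture standards; the measured value of 609.2829 and the function names are hypothetical and serve only to show the arithmetic, since a real search would run against the HMDB entries.

```python
def ppm_error(measured_mz, reference_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - reference_mz) / reference_mz * 1e6


def match_candidates(measured_mz, candidates, tol_ppm=5.0):
    """Return (name, ppm error) for every candidate within +/- tol_ppm,
    ordered from the smallest to the largest absolute error."""
    hits = []
    for name, reference_mz in candidates.items():
        error = ppm_error(measured_mz, reference_mz)
        if abs(error) <= tol_ppm:
            hits.append((name, error))
    return sorted(hits, key=lambda hit: abs(hit[1]))


# m/z values quoted in Sect. 2.4 for three of the test-mixture standards
reference = {
    "acetaminophen": 152.0712,
    "caffeine": 195.0882,
    "reserpine": 609.2812,
}
# Hypothetical measured m/z, used only to illustrate the tolerance check
hits = match_candidates(609.2829, reference)
```

Here the only candidate inside the window is reserpine, with an error of about 2.8 ppm.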

3 Results and discussion

3.1 Data acquisition and assessment of instrument performance

MS-based metabolomics discrimination approaches aim to find any biologically relevant differences between metabolic profiles (i.e., non-steatotic versus steatotic). Such differences should relate to the underlying biochemical differences among subjects and not to other sources of variation (e.g., changes of injection volume, adduct formation, ion suppression effects, contamination peaks). Unfortunately, in this type of study such undesired variation is often unavoidable. In this study, a rigorous experimental design to minimise and/or detect undesired sources of variation was employed. The mass accuracy, rt, and peak area values of reserpine and leucine enkephalin were monitored and used as indicators of instrument performance. The acceptable repeatability criteria in metabolic profiling experiments are still being discussed. Here, threshold values of 1% and 15% relative standard deviation (% RSD) were established for rt and peak area, respectively, following the FDA recommendations for bioanalytical methods (FDA 2001). As regards mass accuracy, a value of 5 ppm was considered an acceptable cut-off. The % RSD values obtained for the rt and peak area of all the IS used in the analysis were below the maximum accepted thresholds, ranging from 0.07 to 0.23% and from 5.14 to 12.48% for rt and peak area, respectively. Mass accuracy showed good stability, with mean values of 2.75 and 1.14 ppm for reserpine and leucine enkephalin, respectively (Table 1 of supplementary material). These results evidence excellent chromatographic reproducibility and appropriate mass accuracy to guarantee the consistency of the metabolic liver profiles obtained by the MS method.

3.2 Data normalisation and pre-processing

To improve comparability among sample replicates, three data normalisation approaches were considered and compared to the use of the raw data: normalisation to the sum of each sample's monoisotopic peak heights, normalisation to the sum of its monoisotopic peak areas, and normalisation to the peak area of reserpine (IS). The distribution of the % RSD of the variable intensities within replicates (n = 3) was calculated and used to visualise normalisation performance (Fig. 1 of Supplementary material). Besides, the median % RSD was used as the overall measure of variability. The use of a constant sum of the monoisotopic peak areas within each object for normalisation provided both the narrowest distribution and the lowest median % RSD value. This type of normalisation may introduce closure to the data (Johansson et al. 1984); however, PCA of the raw and the normalised data showed only a minor closure contribution. After normalisation, triplicate measurements were averaged to improve the signal-to-noise ratio of the data matrix (Peters et al. 2009). Average values within triplicates were calculated for each variable after slightly modifying a previous procedure (Bijlsma et al. 2006): (i) if the three values were zero, the combined value was zero, (ii) if the three values were non-zero, the combined value was equal to the average of the three measured values, (iii) if two values were non-zero, the combined value was equal to the average of the two non-zero values, and (iv) if two values were zero, the combined value was zero. Finally, only those variables present in at least ten samples were included in the final dataset, thus obtaining a data matrix X60 (60 × 540). Before the analysis, the X60 data matrix was log-transformed to make the variable distributions fairly symmetrical. One important issue of MS-based metabolomics and projection methods such as PLSDA is data scaling. With appropriate scaling, the model can be guided to focus more on relevant variables (Sysi-Aho et al. 2007).
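The triplicate-combination rules (i) to (iv) and the presence filter described above can be sketched in a few lines of Python. This is an illustrative reading rather than the in-house implementation: the function names are hypothetical, and treating "present" as "non-zero after combination" is an assumption.

```python
def combine_triplicate(values):
    """Combine three replicate intensities into one value (rules i-iv)."""
    nonzero = [v for v in values if v != 0]
    if len(nonzero) >= 2:
        # rules (ii) and (iii): average only the non-zero measurements
        return sum(nonzero) / len(nonzero)
    # rules (i) and (iv): two or three zeros give a combined value of zero
    return 0.0


def frequent_variables(matrix, min_samples=10):
    """Column indices that are non-zero in at least `min_samples` rows
    (assumption: a variable is 'present' when its value is non-zero)."""
    n_vars = len(matrix[0])
    return [j for j in range(n_vars)
            if sum(1 for row in matrix if row[j] != 0) >= min_samples]


examples = [[0, 0, 0], [3.0, 6.0, 9.0], [4.0, 0, 8.0], [0, 5.0, 0]]
combined = [combine_triplicate(v) for v in examples]
```

For the four example triplicates the combined values are 0, 6, 6 and 0, one for each of the rules (i), (ii), (iii) and (iv).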
Consequently, data were mean-centred and divided by the square root of the standard deviation (i.e., Pareto scaling) to lessen the importance of highly intense variables (van den Berg et al. 2006). Finally, 48 of the 60 liver samples included in the study were used as the sample set (X48) for PLSDA model calculation, while the 12 remaining samples, including 6 steatotic and 6 non-steatotic liver samples, were used as an external validation set (X12).

3.3 Multivariate data analysis

Unsupervised PCA was performed to reveal any interrelationship among samples, including grouping, clustering or outliers. PCA of the QC samples revealed three tight clusters corresponding to the test mix, the pool and the blank samples, which were randomly injected during the analytical run (Fig. 2 of Supplementary material). Therefore, it can be assumed that the analytical strategy offers appropriate sample and instrumental stability during the entire batch analysis. A further multivariate analysis was performed by two chemometric approaches involving PLSDA, including model cross-validation, variable selection and external validation.

3.3.1 PLSDA model calculation using double cross-validation and permutation testing

The X48 (48 × 540) dataset was exported into MATLAB for further multivariate analysis. Double cross-validation (2CV) was used to assess the lack of overfitting in the PLSDA models (Westerhuis et al. 2008b). Although detailed descriptions of the 2CV method can be found elsewhere (Filzmoser et al. 2009), it is worth briefly summarising its main steps. In this validation procedure, dataset X48 (48 × 540) was randomly split into four subsets (i.e., fourfold CV). Three of these subsets formed a calibration dataset X_CAL (36 × 540), upon which a PLSDA model was built. Using a leave-one-out CV, the number of latent variables (LV), with a maximum of 5, providing the minimum root mean squared error of cross validation (RMSECV) was selected for each 'inner-model'. The remaining 12 samples were used as a test set X_TEST (12 × 540) to assess the quality of the model. This step was repeated until all the samples were included once in the X_TEST dataset. As a result, a mean regression coefficients vector b was calculated from the four independent PLSDA models generated during the outer loop of the 2CV process. However, a random four-fold division of a dataset for CV is arbitrary and the results based on this selection might not be representative. For example, the presence of outliers or a low number of samples of one of the two classes included in the data subsets may have a strong effect on the PLS model (Brereton 2009).
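The structure of one 2CV run, as summarised above, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the MATLAB/PLS Toolbox code used in the study: the PLS1 NIPALS routine is a bare-bones stand-in, classes are coded 0/1 with a 0.5 decision cut-off, the data are synthetic, and the sketch only counts misclassifications (it does not return the mean b vector). All function names are hypothetical.

```python
import random


def pls1_fit(X, y, n_lv):
    """Fit PLS1 by NIPALS on plain lists; X and y are mean-centred here."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    f = [v - y_mean for v in y]
    comps = []
    for _ in range(n_lv):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # deflate X and y before extracting the next latent variable
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, load, q))
    return x_mean, y_mean, comps


def pls1_predict(model, x):
    """Predict y for one sample by replaying the deflation steps."""
    x_mean, y_mean, comps = model
    e = [x[j] - x_mean[j] for j in range(len(x))]
    y_hat = y_mean
    for w, load, q in comps:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * lj for ej, lj in zip(e, load)]
    return y_hat


def loo_rmsecv(X, y, n_lv):
    """Leave-one-out RMSECV for a model with n_lv latent variables."""
    sq = 0.0
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_lv)
        sq += (pls1_predict(model, X[i]) - y[i]) ** 2
    return (sq / len(X)) ** 0.5


def double_cv(X, y, n_folds=4, max_lv=5, seed=0):
    """Misclassifications over the outer test folds of one 2CV run."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    errors = 0
    for k in range(n_folds):
        test = idx[k::n_folds]
        held_out = set(test)
        cal = [i for i in idx if i not in held_out]
        X_cal, y_cal = [X[i] for i in cal], [y[i] for i in cal]
        # inner loop: choose the LV count with the lowest LOO RMSECV
        best_lv = min(range(1, max_lv + 1),
                      key=lambda a: loo_rmsecv(X_cal, y_cal, a))
        model = pls1_fit(X_cal, y_cal, best_lv)
        # outer loop: classify the held-out samples with a 0.5 cut-off
        for i in test:
            errors += int((pls1_predict(model, X[i]) > 0.5) != (y[i] == 1))
    return errors


# Synthetic two-class data: only the first two of six variables differ
rng = random.Random(1)
y = [0] * 12 + [1] * 12
X = [[rng.gauss(4.0 * c if j < 2 else 0.0, 1.0) for j in range(6)] for c in y]
n_errors = double_cv(X, y)
```

Calling `double_cv` with different `seed` values would give the repeated random divisions discussed next; a permutation test would call it after shuffling `y`.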
To minimise the effect of such a potential pitfall, the random dataset division was repeated 2000 times, and a mean PLS regression coefficients vector (b_real) was calculated from the obtained set of b vectors. Statistical validation of the model performance was carried out by means of a permutation test in which the response variable y (i.e., the class labels) was randomly permuted 2000 times while the X48 matrix kept its original ordering (Westerhuis et al. 2008b). For each permutation, a PLSDA model was calculated using 2CV, as described above. Two PLSDA model performance parameters (i.e., number of misclassifications and the area under the receiver operating characteristic curve (AUROC)) obtained using real class labels were compared to a reference distribution of the same parameters corresponding to random class assignments. The reference distribution was built according to the H0 hypothesis of there being no difference between the two classes. As Fig. 2a illustrates, when class membership was permuted, the mean value of the number of misclassifications was 23.4. As expected, this number was very close to 50% of the total number of samples used in the model calculation (48), which is the outcome expected when there is no relationship between class and data. Conversely, the distributions of both the number of misclassifications and AUROC values obtained using real class labels were indicative of a real class difference. However, a high spread in the number of misclassifications was observed, thus indicating poor consistency of the PLSDA models, as prediction was highly dependent on the selection of those samples used for calibration. Such inconsistency could be at least partially due to model overfitting caused, in turn, by noise modelling. Therefore, following the idea of Pierna et al. (2009), an attempt was made to remove those variables that did not correlate with the biological difference between classes (hereafter denoted uninformative variables) while retaining as much information as possible with a minimum error in prediction.

Fig. 2 Histogram of the number of misclassifications (left) and AUROC (right) distributions after 2000 permutations using the original dataset (a) and a subset of variables selected as informative using real and permuted class labels (b). Vertical axes: frequency. Series: random class label, real class label, and mean value (real class label)

3.3.1.1 Uninformative variable elimination

Feature selection or variable elimination is critical in the development of PLS classification methods since, when noisy variables are present, the signal is averaged with noise over a large number of variables, with a loss of discernible signal amplitude (Isabelle et al. 2003; Lavine and Workman 2010). Apart from improved predictive performance and robustness, variable selection facilitates the understanding and interpretation of both the model and the biochemical pathways involved in the pathology. A number of chemometric methods for variable selection in multivariate datasets have already been employed in metabolomic studies (Hoskuldsson 2001; Cavill et al. 2009; Wongravee et al. 2009). Bijlsma et al. (2006) proposed a strategy for data processing, analysis and validation in metabolomics studies in which a PLSDA analysis of the dataset was done to initially select a reduced set of variables as 'potential biomarkers'. In this work, we adapted this approach with minor modifications: as previously indicated, a total of 2000 PLSDA models, and therefore the same number of b_random vectors, were obtained using random class assignments. Using this information, a variable was considered 'potentially informative' if its value in the b_real vector did not belong to the distribution of the b_random values of that variable. The use of very restrictive criteria (i.e., a high confidence level) for the distribution comparison would lower the number of variables considered 'informative'; thus, the likelihood of retaining noise variables would decrease but, on the other hand, the probability of eliminating informative variables would increase, so a compromise must be reached. Figure 3 shows the effect of the confidence level on the number of variables considered as informative. For example, decreasing the confidence level from 99 to 97% leads to an increase in the number of informative variables from 40 to 86. The choice of the confidence level is a critical factor, which is frequently selected according to previous experience or to established values (e.g. 99 or 99.5%). In this work, an empirical evaluation of the effect of this value on the predictive capabilities of the PLSDA models was made to support the selection of the confidence level. As mentioned, those variables with b_real values within the confidence limits defined by the cut-off levels were classified as 'potentially uninformative' and so, a number of sets of 'potentially informative' (X_info) and 'uninformative' (X_uninfo) matrices were defined as a function of the confidence level. For each of the X_info and X_uninfo datasets a PLSDA model was calculated. The evolution of the number of misclassifications as a function of the confidence level is depicted in Fig. 3. As the confidence level decreases, the number of misclassifications provided by PLSDA models of the X_info datasets initially falls quickly, attains a minimum, then rises again and, for confidence levels lower than 91%, increases rapidly. The PLSDA model providing the lowest misclassification value, hereafter denoted as the PLSDA-bM model, was obtained using a total of 73 variables corresponding to a confidence level of 97.66%. The inclusion of additional variables in the corresponding PLSDA models slightly degraded the predictive performance because of the addition of redundant noise to the models. Note that the variables not included in the X_info datasets were included in the corresponding X_uninfo datasets for each confidence level. The fact that the number of misclassifications obtained using the X_uninfo datasets remained stable for confidence levels higher than 97.66% suggested that the discarded variables provided no discriminant information.

3.3.2 PLSDA model calculation using simple cross-validation (CV) and permutation testing

The original X48 (48 × 540) dataset was exported to the SIMCA-P software for the PLSDA multivariate analysis.

Fig. 3 Error rates as the mean number of misclassifications obtained with PLSDA on 'informative' (a) and 'uninformative' (b) datasets as a function of the confidence level selected by the uninformative variable elimination procedure. Axes: confidence level (bottom) and number of variables retained in the model (top) against the mean number of misclassifications
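The selection rule behind Fig. 3 can be sketched as follows. The sketch assumes that "belonging to the distribution" is judged with a two-sided percentile interval of the permutation coefficients at the chosen confidence level; that interpretation, the linear-interpolation percentile and the function names are assumptions for illustration, and the coefficients in the example are made up.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def select_informative(b_real, b_random, confidence=97.66):
    """Indices of variables whose b_real value lies outside the central
    `confidence` % interval of the permutation coefficients b_random
    (one row of b_random per permuted model)."""
    tail = (100.0 - confidence) / 2.0
    informative = []
    for j, b in enumerate(b_real):
        null = sorted(row[j] for row in b_random)
        low = percentile(null, tail)
        high = percentile(null, 100.0 - tail)
        if b < low or b > high:
            informative.append(j)
    return informative


# Made-up example: 201 'permuted' coefficient vectors spanning -1 to 1
b_random = [[k / 100.0] * 3 for k in range(-100, 101)]
b_real = [0.1, 1.5, -0.995]
loose = select_informative(b_real, b_random, confidence=99.0)
strict = select_informative(b_real, b_random, confidence=99.9)
```

In this toy case the 99% level retains two variables and the 99.9% level only one, mirroring the trend described in the text: a more restrictive confidence level yields fewer 'informative' variables.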


Two model validation procedures were applied to assess model performance. Firstly, sevenfold CV was used to select the number of LV according to both the total explained y variance (i.e., R2) and the predictable y variation (i.e., Q2) of the PLSDA model. The performance of a PLSDA model calculated on X48 (48 × 540) using two LV was assessed by the R2 and Q2 statistics, with values of 87.8 and 58.5%, respectively. After CV, a y class random permutation test was applied to investigate model consistency. The permutation test compares the goodness of fit of the original model with the values obtained after class randomisation (Lindgren et al. 1996). The results obtained from a y class permutation test, which are usually presented as a distribution, can also be visualised as a two-dimensional plot (see Fig. 4a). This plot displays the R2 and Q2 values obtained from each permuted model on the vertical axis, and the correlation coefficients between the real y and the permuted y on the horizontal axis. If we focus on the original R2 and Q2 values (at the top right of the plot), the model could be considered statistically significant. However, two aspects are indicative of potential model overfitting or poor model consistency: (i) the Q2 intercept on the left vertical axis, which should be close to, or even below, zero when the y class is randomly permuted, and (ii) the fact that not all the R2 values obtained from the 800 permuted models are lower than the original R2 value, which gives a high R2 intercept and a near-zero slope of the regression line. Such aspects call for caution when using the ‘real class’ model. To overcome possible model inconsistency due to the contribution of those variables unrelated to the class of the objects, a variable selection procedure was applied.

Fig. 4 Validation plot obtained from the permutation test. The vertical axis gives the R2 (goodness of fit) and Q2 (goodness of prediction) values of each model. The horizontal axis represents the correlation between the ‘real’ y and the permuted y. a PLSDA model of the original dataset, b PLSDA-V model after variable selection using VIP scores and c PLSDA-bS model after variable selection using regression coefficients

3.3.2.1 Uninformative variable elimination

To eliminate those variables unrelated to the physiological status of the liver, two variable selection criteria were considered, based on the use of: (i) the variable influence in the projection (VIP) scores, and (ii) the coefficients of the PLSDA regression vector. The VIP score of a predictor summarises its importance for the projections used to find the h LV. Since the average of the squared VIP scores equals 1, the ‘greater than one’ rule is generally used as a variable selection criterion (Chong and Jun 2005). VIP scores were calculated for the 540 variables, and those showing a VIP value greater than 1 were included in a further PLSDA analysis (the PLSDA-V model). As a result, a reduced matrix Xinfo (48 × 169) was obtained. The R2 and Q2 values for the new PLSDA-V model, which was calculated using two LV, were 84.1 and 74.6%, respectively. Thus, a considerable increase in the Q2 value was achieved by applying the VIP variable selection criterion when compared with the initial PLSDA model (PLSDA-raw). This improvement indicates that the dataset is now more homogeneous and can be properly modelled. To further validate the model, a y class random permutation test was conducted; the results confirm a slight model improvement, as reflected by the lower vertical-axis intercept of the R2 regression line. Moreover, the R2 and Q2 values found using the real class labels are clearly outside the distributions of those statistics found when employing random class labels (Fig. 4b), which indicates a clear distinction between the permuted classifications and the original classification. From these results, we can conclude that the PLSDA-V model is statistically significant (P < 0.001).
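
The ‘greater than one’ VIP rule can be illustrated with a minimal, dependency-free Python sketch. It uses a plain NIPALS PLS1 fit and a commonly used VIP definition (squared unit-length weights, weighted by the y variance explained by each LV). The toy data and all function names are hypothetical, and the sketch is not meant to reproduce the SIMCA-P implementation.

```python
import math


def centre_columns(X):
    """Subtract the column means from a matrix stored as a list of rows."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in X]


def pls1_weights(X, y, n_lv):
    """NIPALS PLS1 on centred X and y; returns unit weights and explained SSY per LV."""
    E = [row[:] for row in X]
    f = y[:]
    n, m = len(E), len(E[0])
    weights, ssy = [], []
    for _ in range(n_lv):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next latent variable.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        weights.append(w)
        ssy.append(q * q * tt)
    return weights, ssy


def vip_scores(weights, ssy):
    """VIP_j = sqrt(m * sum_a(SSY_a * w_ja^2) / sum_a(SSY_a))."""
    m = len(weights[0])
    total = sum(ssy)
    return [math.sqrt(m * sum(s * w[j] ** 2 for w, s in zip(weights, ssy)) / total)
            for j in range(m)]


# Hypothetical toy data: 6 samples x 4 variables, classes coded as +1 / -1.
X = [[2.0, 1.0, 0.3, -0.1], [1.8, 0.5, -0.4, 0.2], [2.2, 1.2, 0.1, -0.3],
     [-2.1, -0.8, 0.2, 0.1], [-1.9, -1.1, -0.3, 0.3], [-2.0, -0.8, 0.1, -0.2]]
y = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]

y_mean = sum(y) / len(y)
weights, ssy = pls1_weights(centre_columns(X), [v - y_mean for v in y], n_lv=2)
vip = vip_scores(weights, ssy)
selected = [j for j, v in enumerate(vip) if v > 1.0]
print(vip, selected)
```

Because each weight vector has unit length, the squared VIP scores average exactly 1, which is what makes 1 a natural threshold. In the toy data the first variable carries most of the class information and therefore obtains the largest score.
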
The second variable selection method, as in the previous PLSDA-bM model, considered the coefficients of the PLSDA regression vector. Thus, those variables showing higher absolute regression coefficient values with acceptable confidence intervals based on jack-knifing (95% confidence) were selected as informative. Jack-knifing is a method for finding the precision of an estimate by iteratively keeping out parts of the underlying data, making estimates from the data subsets and comparing these estimates. In PLS, the set of multiple models resulting from the cross-validation is used to calculate jack-knifing uncertainty measures (standard errors and confidence intervals) of the predicted y values (Efron and Gong 1983). A total of 88 variables were retained in an Xinfo dataset, and a PLSDA model (PLSDA-bS) was calculated using two LV, resulting in R2 and Q2 values of 85.6 and 76.4%, respectively. Once again, uninformative variable elimination provided a model improvement when compared with the PLSDA-raw model. Yet when compared with the PLSDA-V model, no significant improvement was achieved, as indicated by the similar R2 and Q2 values obtained in both models. The permutation test for the PLSDA-bS model showed a significant improvement with respect to the PLSDA-raw model and a slight improvement in relation to the PLSDA-V model, as reflected by a drop in the Q2 intercept on the left vertical axis (Fig. 4c).

3.4 Models outcome comparison

Different variable selection procedures were implemented to obtain a small set of variables offering a generalisation ability better than, or at least equivalent to, that of the original set of variables. All the variable selection approaches improved the predictive capabilities of the original models, as indicated by the Q2 and R2 values summarised in Table 2. To compare the different variable selection procedures, it is important to consider the number of variables that each model reports. The Venn diagram in Fig. 5 summarises the number of variables selected for each model, along with the variables they have in common. PLSDA-V retained a total of 169 variables, while PLSDA-bM and PLSDA-bS respectively retained 73

Table 2 Summary of model features, before and after the variable selection procedure

Software   Model       LVs   Variable selection   R2 (%)   Q2 (%)
MATLAB     PLSDA-raw   2     -                    86.8     52.5
MATLAB     PLSDA-bM    2     b                    88.1     78.1
SIMCA      PLSDA-raw   2     -                    85.3     58.5
SIMCA      PLSDA-V     2     VIP                  84.1     74.6
SIMCA      PLSDA-bS    2     b                    85.6     76.4

LV Latent variables
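
The R2 and Q2 statistics of Table 2 and the y class permutation test can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: a NIPALS PLS1 model, a simple interleaved k-fold split, a toy dataset and only 20 permutations (the study used sevenfold CV and 800 permutations in SIMCA-P). All names and numbers in the code are hypothetical.

```python
import math
import random


def fit_pls1(X, y, n_lv):
    """NIPALS PLS1; returns column means, y mean and one (w, p, q) per LV."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    f = [v - y_mean for v in y]
    lvs = []
    for _ in range(n_lv):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0.0:  # nothing left to model
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        lvs.append((w, p, q))
    return x_mean, y_mean, lvs


def predict(model, x):
    """Predict y for one sample by deflating it LV by LV."""
    x_mean, y_mean, lvs = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in lvs:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat


def r2_q2(X, y, n_lv=2, k=3):
    """R2 from the full fit; Q2 from k-fold CV (sample i is left out in fold i % k)."""
    n = len(y)
    y_bar = sum(y) / n
    ss_tot = sum((v - y_bar) ** 2 for v in y)
    full = fit_pls1(X, y, n_lv)
    rss = sum((y[i] - predict(full, X[i])) ** 2 for i in range(n))
    press = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        model = fit_pls1([X[i] for i in train], [y[i] for i in train], n_lv)
        press += sum((y[i] - predict(model, X[i])) ** 2
                     for i in range(n) if i % k == fold)
    return 1.0 - rss / ss_tot, 1.0 - press / ss_tot


def correlation(a, b):
    """Pearson correlation between two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))


def permutation_test(X, y, n_perm=20, seed=0):
    """Refit the model on shuffled labels; returns (correlation, R2, Q2) per permutation."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)
        results.append((correlation(y, y_perm),) + r2_q2(X, y_perm))
    return results


# Hypothetical toy data: 6 samples x 4 variables, classes coded as +1 / -1.
X = [[2.0, 1.0, 0.3, -0.1], [1.8, 0.5, -0.4, 0.2], [2.2, 1.2, 0.1, -0.3],
     [-2.1, -0.8, 0.2, 0.1], [-1.9, -1.1, -0.3, 0.3], [-2.0, -0.8, 0.1, -0.2]]
y = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]

r2, q2 = r2_q2(X, y)
perms = permutation_test(X, y)
print(r2, q2, perms[:3])
```

Plotting the permuted R2 and Q2 values against their correlation with the real y, and fitting a regression line through each cloud, gives a validation plot of the kind shown in Fig. 4.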

Fig. 5 Venn diagram showing the relations between the features reported for each model after variable selection


Fig. 6 PLSDA scores plots of the first two components (scores on LV 1 versus scores on LV 2) indicating the separation between the groups (top), and predicted y values (STEATOTIC, y = 1) versus sample number for both the calibration (samples 1–48) and validation (samples 49–60) sets (bottom). Symbols distinguish CONTROL and STEATOTIC samples, their confidence ellipses and the external validation samples of each class. From left to right: PLSDA-bM, PLSDA-V and PLSDA-bS models

and 88, with 54 common variables. A total of 47 variables were retained by the three models, and these features were considered for further identification as potential biomarkers. Any model needs to be validated before it is used for prediction or interpretation purposes. CV and permutation testing are reasonable forms of model validation, especially when external validation is not feasible. Such approaches present both advantages and disadvantages, which are discussed elsewhere (Westerhuis et al. 2008b; Esbensen and Geladi 2010). The best validation of a model is achieved by using an independent and representative external validation set; however, this is not always possible. In our experimental design, an external subset of samples X12 (six non-steatotic and six steatotic) was used to evaluate the prediction abilities of the PLSDA models developed after variable selection. Similar results were obtained for all the models when comparing the average predicted y class value: five of the six non-steatotic samples and all six steatotic samples were correctly classified (Fig. 6). When considering the jack-knifing uncertainty bars (which measure the precision of the estimates, Fig. 7), however, those PLSDA models fitted after variable selection (i.e., PLSDA-bM, PLSDA-V and PLSDA-bS) yielded statistically narrower prediction uncertainty intervals (P-value < 0.05) than those obtained from the PLSDA model calculated with the whole set of 540 variables (i.e., PLSDA-raw). Moreover, no statistical difference in the prediction uncertainty among the three variable selection methods (P-value > 0.05) was found.

3.5 Tentative biomarker identification

A list of 47 marker ions (metabolites) that discriminate non-steatotic and steatotic liver profiles, and which were present in all the models, was selected as potential biomarkers. Metabolite identification was performed by comparing the exact mass of each feature with the exact mass of the putative biomarker in the HMDB, accepting a mass difference lower than 5 ppm (Wishart et al. 2009). The fragmentation spectra of plausible biomarkers were searched on both the MassBank web site (Horai et al. 2010) and the HMDB, and were compared with the MS/MS spectra produced by the markers. Some putative biomarkers were identified and are summarised in Table 3. It is interesting to note that steatotic liver tissue showed significantly decreased levels of reducing agents such as GSH, increased levels of bile acids such as taurodeoxycholic acid (Kalhan et al. 2010; Vinaixa et al. 2010), and increased levels of different lysophosphatidylethanolamines (Puri et al. 2009). GSH is one of the main antioxidant species in hepatocytes; thus, the reduced levels in the steatotic group could be caused by oxidative stress damage, which is assumed to occur during the development from NAFLD to NASH and other pathologies (Dowman et al. 2010). Lysophospholipids are products deriving from the partial hydrolysis of phospholipids which

Fig. 7 Predicted y values (CONTROL, y = 1) obtained for the samples included in the external validation set (X12) with the PLSDA-raw, PLSDA-V, PLSDA-bS and PLSDA-bM models. Non-steatotic (C1–C6) and steatotic (S1–S6) samples. Error bars correspond to jack-knifing uncertainty at a 95% confidence level
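
Error bars of the kind shown in Fig. 7 can be sketched with the generic grouped jack-knife formula applied to the predictions of the CV sub-models. The function name, the toy predictions and the choice of centring the interval on the mean of the sub-model predictions are assumptions made for illustration; the exact computation performed by the software used in this study may differ.

```python
import math


def jackknife_interval(estimates, t_crit):
    """Grouped jack-knife standard error and confidence interval.

    estimates: one estimate per CV sub-model (each fitted with one group left out).
    t_crit:    Student's t critical value for len(estimates) - 1 degrees of freedom.
    """
    k = len(estimates)
    mean = sum(estimates) / k
    se = math.sqrt((k - 1) / k * sum((e - mean) ** 2 for e in estimates))
    return mean, se, (mean - t_crit * se, mean + t_crit * se)


# Hypothetical predicted y values for one sample from seven CV sub-models.
sub_model_predictions = [0.90, 1.00, 1.10, 1.00, 0.95, 1.05, 1.00]
T_CRIT_95_DF6 = 2.447  # tabulated two-sided 95% Student's t value, 6 degrees of freedom

mean, se, (low, high) = jackknife_interval(sub_model_predictions, T_CRIT_95_DF6)
print(mean, se, low, high)
```

A narrower interval indicates that the sub-models agree more closely on the prediction for that sample, which is the sense in which the models fitted after variable selection were found to be more precise.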

Table 3 Statistically significant marker metabolites differentiating steatotic from non-steatotic human liver tissue

m/z        Fold of change   Proposed identity                     Formula                         Pathway
522.2830   0.21**           Taurochenodeoxycholic acid            C26H45NO6S (M + Na)             Bile acid metabolism
538.2514   0.21**           Taurochenodeoxycholic acid            C26H45NO6S (M + K)              Bile acid metabolism
544.2690   0.20**           Taurochenodeoxycholic acid            C26H45NO6S (M + 2Na-H)          Bile acid metabolism
580.2939   0.52**           Taurochenodeoxycholic acid sulphate   C26H45NO9S2 (M + H)             Bile acid metabolism
144.0127   2.47*            Glutathione                           C5H6NO2S (M + H-C5H12N2O4)      Glutathione metabolism
291.0673   2.84*            Glutathione                           C10H15N2O6S (M + H-NH3)         Glutathione metabolism
653.1238   4.7*             Oxidized glutathione                  C20H32N6O12S2 (M + Na)          Glutathione metabolism
504.3074   0.16**           LysoPE (20:3/0:0)                     C25H46NO7P (M + H)              Lipid metabolism
268.1047   5.77**           Adenosine                             C10H13N5O4 (M + H)              Purine metabolism
365.2778   16.80**          Uk                                    Uk
350.2684   0.24**           Uk                                    Uk
580.2939   0.52**           Uk                                    Uk
602.2780   0.24**           Uk                                    Uk
640.5889   26.95**          Uk                                    Uk

* P < 0.05; ** P < 0.01, non-steatotic versus steatotic. Statistical P-values were calculated using the unpaired Student’s t-test. Fold of change is expressed as the ratio of the mean of the non-steatotic group to the mean of the steatotic group. Uk unknown
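
The two quantities defined in the footnote of Table 3 can be computed as in the following minimal Python sketch. The intensity values are hypothetical and the function names are illustrative. Only the t statistic and its degrees of freedom are returned; the P-value would be read from a Student's t distribution.

```python
import math
import statistics


def fold_change(non_steatotic, steatotic):
    """Ratio of the non-steatotic group mean to the steatotic group mean."""
    return statistics.mean(non_steatotic) / statistics.mean(steatotic)


def student_t(a, b):
    """Unpaired Student's t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical peak intensities for one marker ion in each group.
non_steatotic = [2.0, 4.0, 6.0]
steatotic = [1.0, 2.0, 3.0]

fc = fold_change(non_steatotic, steatotic)
t_stat, dof = student_t(non_steatotic, steatotic)
print(fc, t_stat, dof)
```

With this definition, a fold of change above 1 means that the metabolite is less abundant in the steatotic group, and a value below 1 means that it is more abundant.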

could lead to lipotoxicity phenomena (Han et al. 2008; Barr et al. 2010). Finally, bile acids are reported to be sensitive biomarkers of liver injury (Yang et al. 2008). Although the putative markers proposed herein are in agreement with previously reported altered metabolites in steatotic livers, further research is needed to understand the role of these metabolites in NAFLD development and its progression to other pathologies.
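
The 5 ppm mass-accuracy criterion used for tentative identification can be checked as in the short Python sketch below, here for the adenosine (M + H) ion listed in Table 3. The monoisotopic element masses and the proton mass are standard tabulated values entered by hand, and the function names are illustrative; in practice the theoretical masses would be taken from the HMDB.

```python
# Monoisotopic masses (u) of the elements needed for this example.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
PROTON = 1.00727646  # mass added for an (M + H) ion


def neutral_mass(composition):
    """Monoisotopic mass of a neutral molecule given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())


def ppm_error(observed, theoretical):
    """Relative mass difference in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


adenosine = {"C": 10, "H": 13, "N": 5, "O": 4}   # C10H13N5O4
theoretical = neutral_mass(adenosine) + PROTON    # m/z of the (M + H) ion
error = ppm_error(268.1047, theoretical)          # observed m/z from Table 3
within_tolerance = abs(error) < 5.0
print(theoretical, error, within_tolerance)
```

The same check would be repeated for every candidate identity and adduct of a feature, keeping only those candidates whose error falls within the tolerance before comparing MS/MS spectra.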

4 Conclusions

An MS-based metabolomic strategy, including variable selection and PLSDA analysis, is presented as a classification tool for human liver profiles (steatotic and non-steatotic). The strategy includes different uninformative variable elimination techniques implemented with two common software packages (MATLAB and SIMCA-P). Chemometric variable selection improves the models’ classification performance, mainly by reducing the uncertainty of the estimations. However, no statistical difference in the prediction uncertainty among the different variable selection techniques was found (P-value > 0.05). Furthermore, a lower number of variables facilitates metabolite identification and provides in-depth insight into the metabolomic data provided by the UPLC-MS method. Variable elimination approaches achieve model simplification, a relevant issue for the further development of ‘target’ MS approaches to quantitatively determine biomarkers. The use of an external validation set confirms the predictive abilities of those models developed after the variable selection procedure. The results reveal the potential of the proposed approach as a straightforward strategy to improve model prediction outcome and to simplify biochemical MS data interpretation. The reliability of the results is sufficient to suggest some metabolites as potential markers; however, further investigation is needed to unambiguously propose them as biomarkers. These findings encourage ongoing research into more comprehensive human liver tissue metabolomic analyses to provide new insights into NAFLD mechanisms and development.

Acknowledgments This work has been supported by Conselleria de Sanitat (Regional Valencian Ministry of Health) contract (AP-193/10). A. L. is grateful for a Miguel Servet contract (CP08/00125) from the Spanish Ministry of Science and Innovation/Instituto de Salud Carlos III. J. C. G.-C. is grateful for a pre-doctoral contract from the VALi+d program of the Conselleria d’Educació (Regional Valencian Ministry of Education).

References

Barker, M., & Rayens, W. (2003). Partial least squares for discrimination. Journal of Chemometrics, 17, 166–173.
Barr, J., Vazquez-Chantada, M., Alonso, C., Perez-Cormenzana, M., Mayo, R., Galan, A., et al. (2010). Liquid chromatography-mass spectrometry-based parallel metabolic profiling of human and mouse model serum reveals putative biomarkers associated with the progression of nonalcoholic fatty liver disease. Journal of Proteome Research, 9, 4501–4512.
Bijlsma, S., Bobeldijk, I., Verheij, E. R., Ramaker, R., Kochhar, S., Macdonald, I. A., et al. (2006). Large-scale human metabolomics studies: a strategy for data (pre-) processing and validation. Analytical Chemistry, 78, 567–574.
Brereton, R. G. (2009). Chemometrics for pattern recognition. Chichester: John Wiley & Sons.
Cavill, R., Keun, H. C., Holmes, E., Lindon, J. C., Nicholson, J. K., & Ebbels, T. M. D. (2009). Genetic algorithms for simultaneous variable and sample selection in metabonomics. Bioinformatics, 25, 112–118.
Cheung, O., & Sanyal, A. J. (2009). Recent advances in nonalcoholic fatty liver disease. Current Opinion in Gastroenterology, 25, 230–237.
Chong, I.-G., & Jun, C.-H. (2005). Performance of some variable selection methods when multicollinearity is present. Chemometrics and Intelligent Laboratory Systems, 78, 103–112.
Cortes, M., Pareja, E., Castell, J. V., Moya, A., Mir, J., & Lahoz, A. (2010). Exploring mass spectrometry suitability to examine human liver graft metabonomic profiles. Transplantation Proceedings, 42, 2953–2958.
den Boer, M., Voshol, P. J., Kuipers, F., Havekes, L. M., & Romijn, J. A. (2004). Hepatic steatosis: a mediator of the metabolic syndrome. Lessons from animal models. Arteriosclerosis, Thrombosis, and Vascular Biology, 24, 644–649.
Dowman, J. K., Tomlinson, J. W., & Newsome, P. N. (2010). Pathogenesis of non-alcoholic fatty liver disease. Quarterly Journal of Medicine, 103, 71–83.
Efron, B., & Gong, G. (1983). A leisurely look at the bootstrap, the jackknife, and the cross-validation. American Statistician, 37, 36–48.
Eriksson, L., Trygg, J., & Wold, S. (2008). CV-ANOVA for significance testing of PLS and OPLS (R) models. Journal of Chemometrics, 22, 594–600.
Esbensen, K. H., & Geladi, P. (2010). Principles of proper validation: use and abuse of re-sampling for validation. Journal of Chemometrics, 24, 168–187.
FDA (2001). Guidance for industry: bioanalytical method validation. In: US Department of Health and Human Services, Food and Drug Administration, Bethesda.
Filzmoser, P., Liebmann, B., & Varmuza, K. (2009). Repeated double cross validation. Journal of Chemometrics, 23, 160–171.
Gomez-Lechon, M. J., Donato, M. T., Martinez-Romero, A., Jimenez, N., Castell, J. V., & O’Connor, J. E. (2007). A human hepatocellular in vitro model to investigate steatosis. Chemico-Biological Interactions, 165, 106–116.
Han, M. S., Park, S. Y., Shinzawa, K., Kim, S., Chung, K. W., Lee, J. H., et al. (2008). Lysophosphatidylcholine as a death effector in the lipoapoptosis of hepatocytes. Journal of Lipid Research, 49, 84–97.
Horai, H., Arita, M., Kanaya, S., Nihei, Y., Ikeda, T., Suwa, K., et al. (2010). MassBank: a public repository for sharing mass spectral data for life sciences. Journal of Mass Spectrometry and Ion Physics, 45, 703–714.
Hoskuldsson, A. (2001). Variable and subset selection in PLS regression. Chemometrics and Intelligent Laboratory Systems, 55, 23–38.
Isabelle, G., & Andre, E. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157–1182.
Johansson, E., Svante, W., & Sjoedin, K. (1984). Minimizing effects of closure on analytical data. Analytical Chemistry, 56, 1685–1688.
Kalhan, S. C., Guo, L., Edmison, J., Dasarathy, S., McCullough, A. J., Hanson, R. W., & Milburn, M. (2010). Plasma metabolomic profile in nonalcoholic fatty liver disease. Metabolism. doi:10.1016/j.metabol.2010.03.006
Lavine, B., & Workman, J. (2010). Chemometrics. Analytical Chemistry, 82, 4699–4711.
Li, X., Yang, S. B., Qiu, Y. P., Zhao, T., Chen, T. L., Su, M. M., et al. (2010). Urinary metabolomics as a potentially novel diagnostic and stratification tool for knee osteoarthritis. Metabolomics, 6, 109–118.
Lindgren, F., Hansen, B., Karcher, W., Sjöström, M., & Eriksson, L. (1996). Model validation by permutation tests: applications to variable selection. Journal of Chemometrics, 10, 521–532.
Matthew, B., & William, R. (2003). Partial least squares for discrimination. Journal of Chemometrics, 17, 166–173.
Nicholson, J. K., Lindon, J. C., & Holmes, E. (1999). ‘Metabonomics’: understanding the metabolic responses of living systems to pathophysiological stimuli via multivariate statistical analysis of biological NMR spectroscopic data. Xenobiotica, 29, 1181–1189.
Pasikanti, K. K., Esuvaranathan, K., Ho, P. C., Mahendran, R., Kamaraj, R., Wu, Q. H., et al. (2010). Noninvasive urinary metabonomic diagnosis of human bladder cancer. Journal of Proteome Research, 9, 2988–2995.
Peters, S., van Velzen, E., & Janssen, H. G. (2009). Parameter selection for peak alignment in chromatographic sample profiling: objective quality indicators and use of control samples. Analytical and Bioanalytical Chemistry, 394, 1273–1281.
Pierna, J. A. F., Abbas, O., Baeten, V., & Dardenne, P. (2009). A backward variable selection method for pls regression (BVSPLS). Analytica Chimica Acta, 642, 89–93.
Puri, P., Wiest, M. M., Cheung, O., Mirshahi, F., Sargeant, C., Min, H. K., et al. (2009). The plasma lipidomic signature of nonalcoholic steatohepatitis. Hepatology, 50, 1827–1838.
Sysi-Aho, M., Vehtari, A., Velagapudi, V. R., Westerbacka, J., Yetukuri, L., Bergholm, R., et al. (2007). Exploring the lipoprotein composition using Bayesian regression on serum lipidomic profiles. Bioinformatics, 23, I519–I528.
van den Berg, R. A., Hoefsloot, H. C., Westerhuis, J. A., Smilde, A. K., & van der Werf, M. J. (2006). Centering, scaling, and transformations: improving the biological information content of metabolomics data. BMC Genomics, 7, 142.
Vinaixa, M., Rodriguez, M. A., Rull, A., Beltran, R., Blade, C., Brezmes, J., et al. (2010). Metabolomic assessment of the effect of dietary cholesterol in the progressive development of fatty liver disease. Journal of Proteome Research, 9, 2527–2538.
Want, E. J., Wilson, I. D., Gika, H., Theodoridis, G., Plumb, R. S., Shockcor, J., et al. (2010). Global metabolic profiling procedures for urine using UPLC-MS. Nature Protocols, 5, 1005–1018.
Westerhuis, J. A., Hoefsloot, H. C. J., Smit, S., Vis, D. J., Smilde, A. K., van Velzen, E. J. J., et al. (2008a). Assessment of PLSDA cross validation. Metabolomics, 4, 81–89.
Westerhuis, J. A., van Velzen, E. J. J., Hoefsloot, H. C. J., & Smilde, A. K. (2008b). Discriminant Q(2) (DQ(2)) for improved discrimination in PLSDA models. Metabolomics, 4, 293–296.
Wishart, D. S., Knox, C., Guo, A. C., Eisner, R., Young, N., Gautam, B., et al. (2009). HMDB: a knowledgebase for the human metabolome. Nucleic Acids Research, 37, D603–D610.
Wold, S., Sjöström, M., & Eriksson, L. (2001). PLS-regression: a basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems, 58, 109–130.
Wongravee, K., Heinrich, N., Holmboe, M., Schaefer, M. L., Reed, R. R., Trevejo, J., et al. (2009). Variable selection using iterative reformulation of training set models for discrimination of samples: application to gas chromatography/mass spectrometry of mouse urinary metabolites. Analytical Chemistry, 81, 5204–5217.
Wu, H., Southam, A. D., Hines, A., & Viant, M. R. (2008). High-throughput tissue extraction protocol for NMR- and MS-based metabolomics. Analytical Biochemistry, 372, 204–212.
Yang, L., Xiong, A., He, Y., Wang, Z., Wang, C., Li, W., et al. (2008). Bile acids metabolomic study on the CCl4- and alpha-naphthylisothiocyanate-induced animal models: quantitative analysis of 22 bile acids by ultraperformance liquid chromatography-mass spectrometry. Chemical Research in Toxicology, 21, 2280–2288.
Zelena, E., Dunn, W. B., Broadhurst, D., Francis-McIntyre, S., Carroll, K. M., Begley, P., et al. (2009). Development of a robust and repeatable UPLC-MS method for the long-term metabolomic study of human serum. Analytical Chemistry, 81, 1357–1364.