
Anal. Chem. 2006, 78, 4334-4341

MET-IDEA: Data Extraction Tool for Mass Spectrometry-Based Metabolomics

Corey D. Broeckling,† Indira R. Reddy, Anthony L. Duran,‡ Xuechun Zhao, and Lloyd W. Sumner*

Plant Biology Division, Samuel Roberts Noble Foundation, P.O. Box 2180, Ardmore, Oklahoma 73401

A current and significant limitation to metabolomics is the large-scale, high-throughput conversion of raw chromatographically coupled mass spectrometry datasets into organized data matrices necessary for further statistical processing and data visualization. This article describes a new data extraction tool, MET-IDEA (Metabolomics Ion-based Data Extraction Algorithm), which fills this void. MET-IDEA is compatible with a diversity of chromatographically coupled mass spectrometry systems, generates an output similar to traditional quantification methods, utilizes the sensitivity and selectivity associated with selected ion quantification, and reduces the time and effort necessary to obtain large-scale organized datasets by several orders of magnitude. The functionality of MET-IDEA is illustrated using metabolomics data obtained for elicited cell culture exudates from the model legume, Medicago truncatula. The results indicate that MET-IDEA is capable of rapidly extracting semiquantitative data from raw data files, which allows for more rapid biological insight. MET-IDEA is freely available to academic users upon request.

Metabolomics is a rapidly maturing field and is expanding the scope of functional genomics beyond mRNA and proteins to the level of metabolism.1-6 Perturbations of a homeostatic biological state result in altered levels of individual metabolites, and metabolomics attempts to capture the altered levels of those metabolites on a large scale.7-14 To date, the most commonly used analytical methods in metabolomics involve hyphenated chromatographic techniques coupled to mass spectrometry (MS) or nuclear magnetic resonance (NMR). MS-based techniques are known for their sensitivity, selectivity, and dynamic range.4 In addition, coupling of a chromatographic system to MS increases the resolution of the analysis by separating compounds with similar or identical masses and by minimizing matrix effects. The most commonly used separation techniques include gas chromatography (GC), liquid chromatography (LC), and capillary electrophoresis (CE), with each system having properties both unique and complementary to the others. Hyphenated chromatography-mass spectrometric systems will be generalized as CHROM-MS hereafter.

Whereas traditional metabolomic approaches capture steady-state metabolite levels, metabolic flux analyses attempt to measure the rate of conversion between metabolites in metabolic networks. Mass spectrometry and NMR have been commonly used for flux analysis through addition of isotopically labeled metabolic substrates, with CHROM-MS data collection particularly suited to flux studies attempting to simultaneously examine multiple overlapping pathways.15-18

Standard methods for extracting quantitative information from CHROM-MS datasets and assembling it into data matrices amenable to statistical interrogation are quite cumbersome and ineffective for large, complex sample sets. Several chromatography-directed software packages have previously been developed to target specific areas of data processing, including chromatogram alignment and deconvolution of overlapping peaks.19,20 Unfortunately, these packages do not provide quantification capabilities, thus reducing their utility for complex chromatograms. Additional software programs specifically targeted to metabolomic studies have been developed previously to address this void.14,21-23 MSFACTs14 was designed to align previously integrated peak values on the basis of retention time, but it is limited in scope in that mass spectral selectivity is not utilized. Previously, the importance of selected ion extraction for data processing was shown using proprietary software; however, this still required manual alignment.6 More recently, several automated tools have been introduced. MetAlign22 is a powerful tool for nontargeted data analysis incorporating peak detection of nominal mass chromatograms coupled to multivariate clustering tools to reassemble component mass spectra; however, this tool is available only at considerable cost and without published algorithm descriptions. MZmine21 also performs peak detection and alignment of multiple complex chromatograms, but it does not perform spectral reconstruction. Although this is not a serious limitation using liquid chromatography-electrospray ionization mass spectrometry, which generates minimal fragmentation, it severely complicates the resulting dataset from GC/electron impact MS data due to considerable fragment generation. Recently, another metabolite-profiling data processing tool entitled XCMS was reported and has been made publicly available.23 This software divides the mass data into 0.1 m/z bins, which are used to generate extracted ion chromatograms. The extracted ion base-peak chromatograms are then processed individually. This process includes filtering, peak detection, peak matching, and nonlinear retention time alignment. The peak lists are then combined and postprocessed using vicinity elimination. Unfortunately, this package produces highly redundant datasets, does not correct for m/z shifts, and has limited metabolite identification capabilities.

This article reports the development of a new tool, MET-IDEA (Metabolomics Ion-based Data Extraction Algorithm), which streamlines the problematic step of proceeding from raw data files to a complete data matrix. MET-IDEA implements simple algebraic algorithms in a user-directed fashion to extract ion abundance data associated with separated or coeluting metabolite peaks in complex CHROM-MS datasets. The software is engineered to be compatible with multiple CHROM-MS platforms and is capable of utilizing the output from freely available deconvolution software (AMDIS19) to direct the selected ion extraction process (see below). To demonstrate the utility of this software for complex data, deconvolution and MET-IDEA were used to extract GC/MS metabolic profiling data collected for the model legume Medicago truncatula suspension cell culture media. Evaluation of the culture media for extracellular metabolites yields “metabolic footprints”24 that are helpful in understanding processes such as metabolite secretion, transport, or membrane permeability. Extracellular pools of metabolites are also unique because they are less regulated by feedback mechanisms that control accumulation within the cell. MET-IDEA was used to rapidly generate an organized dataset that was further statistically interrogated. The performance of MET-IDEA was validated by comparison with more traditional but time-consuming peak integration methods. MET-IDEA yielded increased automation, reliability, and efficiency. Elicitation with either methyl jasmonate, an elicitor of triterpene saponin synthesis,25 or yeast elicitor, an inducer of isoflavonoid phytoalexins,26 dramatically altered the accumulation of various primary and secondary metabolites in the cell culture media.

* To whom correspondence should be addressed. E-mail: [email protected].
† Current affiliation: Department of Horticulture and Landscape Architecture, Colorado State University, 111 Shepardson, Fort Collins, CO 80523-1173.
‡ Current affiliation: Cargill, Inc. Health & Food Technologies, 1 Cargill Drive, Eddyville, IA 52553.

Analytical Chemistry, Vol. 78, No. 13, July 1, 2006
10.1021/ac0521596 CCC: $33.50 © 2006 American Chemical Society. Published on Web 06/06/2006

(1) Nielsen, J.; Oliver, S. Trends Biotechnol. 2005, 23, 544-546.
(2) Bino, R.; Hall, R.; Fiehn, O.; Kopka, J.; Saito, K.; Draper, J.; Nikolau, B.; Mendes, P.; Roessner-Tunali, U.; Beale, M.; Trethewey, R.; Lange, B.; Wurtele, E.; Sumner, L. Trends Plant Sci. 2004, 9, 418-425.
(3) Fernie, A. R. Funct. Plant Biol. 2003, 30, 111-120.
(4) Sumner, L.; Mendes, P.; Dixon, R. Phytochemistry 2003, 62, 817-836.
(5) Trethewey, R. N.; Krotzky, A. J.; Willmitzer, L. Curr. Opin. Plant Biol. 1999, 2, 83-85.
(6) Fiehn, O.; Kopka, J.; Dormann, P.; Altmann, T.; Trethewey, R. N.; Willmitzer, L. Nat. Biotechnol. 2000, 18, 1157-1161.
(7) Roessner, U.; Luedemann, A.; Brust, D.; Fiehn, O.; Linke, T.; Willmitzer, L.; Fernie, A. R. Plant Cell 2001, 13, 11-29.
(8) Roessner, U.; Wagner, C.; Kopka, J.; Trethewey, R.; Willmitzer, L. Plant J. 2000, 23, 131-142.
(9) Tohge, T.; Nishiyama, Y.; Hirai, M. Y.; Yano, M.; Nakajima, J.-i.; Awazuhara, M.; Inoue, E.; Takahashi, H.; Goodenowe, D. B.; Kitayama, M.; Noji, M.; Yamazaki, M.; Saito, K. Plant J. 2005, 42, 218-235.
(10) Hirai, M. Y.; Yano, M.; Goodenowe, D. B.; Kanaya, S.; Kimura, T.; Awazuhara, M.; Arita, M.; Fujiwara, T.; Saito, K. Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 10205-10210.
(11) von Roepenack-Lahaye, E.; Degenkolb, T.; Zerjeski, M.; Franz, M.; Roth, U.; Wessjohann, L.; Schmidt, J.; Scheel, D.; Clemens, S. Plant Physiol. 2004, 134, 548-559.
(12) Jeong, M. L.; Jiang, H.; Chen, H.-S.; Tsai, C.-J.; Harding, S. A. Plant Physiol. 2004, 136, 3364-3375.
(13) Kant, M. R.; Ament, K.; Sabelis, M. W.; Haring, M. A.; Schuurink, R. C. Plant Physiol. 2004, 135, 483-495.
(14) Duran, A. L.; Yang, J.; Wang, L.; Sumner, L. W. Bioinformatics 2003, 19, 2283-2293.
(15) Roessner-Tunali, U.; Liu, J.; Leisse, A.; Balbo, I.; Perez-Melis, A.; Willmitzer, L.; Fernie, A. Plant J. 2004, 39, 668-679.
(16) Zamboni, N.; Fischer, E.; Sauer, U. BMC Bioinf. 2005, 6, -.
(17) Dauner, M.; Sauer, U. Biotechnol. Prog. 2000, 16, 642-649.
(18) Christensen, B.; Nielsen, J. Biotechnol. Bioeng. 2000, 68, 652-659.
(19) Halket, J.; Przyborowska, A.; Stein, S.; Mallard, W.; Down, S.; Chalmers, R. Rapid Commun. Mass Spectrom. 1999, 13, 279-284.
(20) Nielsen, N. P. V.; Carstensen, J. M.; Smedsgaard, J. J. Chromatogr., A 1998, 805, 17-35.
(21) Katajamaa, M.; Oresic, M. BMC Bioinf. 2005, 6, 179.
(22) Tikunov, Y.; Lommen, A.; de Vos, C. H. R.; Verhoeven, H. A.; Bino, R. J.; Hall, R. D.; Bovy, A. G. Plant Physiol. 2005, 139, 1125-1137.
(23) Smith, C. A.; Want, E. J.; O’Maille, G.; Abagyan, R.; Siuzdak, G. Anal. Chem. 2006, 78, 779-787.
(24) Allen, J.; Davey, H. M.; Broadhurst, D.; Heald, J. K.; Rowland, J. J.; Oliver, S. G.; Kell, D. B. Nat. Biotechnol. 2003, 21, 692-696.

SYSTEMS AND METHODS

Biological Material. Liquid suspension cell cultures were elicited with methyl jasmonate (MeJa), yeast cell wall (YE), and UV irradiation as previously described.26 Cell culture media was harvested using vacuum filtration to remove suspended cells and collected at 0, 6, 12, 24, and 48 h following elicitation. Media was subsequently transferred to centrifuge tubes and flash-frozen in liquid nitrogen. Cell culture media samples were maintained at -80 °C until further processing and analysis. Three biological replicates from separate flasks were collected for each time point for both the control and elicited groups, yielding a total of 90 biological samples.

Analytical Methods. Frozen media samples were allowed to equilibrate at room temperature in a water bath. A 5-mL portion of media was transferred to a 10.0-mL glass tube. A 20-µL portion of a 1.0 µg/µL solution of 5-methoxysalicylic acid in water was added to each 5.0-mL sample as an internal standard. A 2.5-mL portion of ethyl acetate was added to the samples, which were then acidified with 20 µL of concentrated hydrochloric acid. The biphasic system was vortexed and centrifuged at 1000g for 20 min. A 1.5-mL portion of the ethyl acetate phase was collected and transferred to an autosampler vial and dried under nitrogen to remove solvent and trace HCl. The residue was methoximated by resuspending in 70 µL of 15 µg/µL methoxyamine-HCl in pyridine and incubated for 30 min at 50 °C.
The sample was then trimethylsilylated by adding 30 µL of MSTFA + 1% TMCS (Pierce) and incubating for 30 min at 50 °C. A 1.0 µL aliquot of the derivatization solution was injected onto an Agilent 6890 GC coupled to an Agilent 5973 MS using a 1:1 split ratio. The GC was held at 80 °C for 2.0 min, ramped at 5.0 °C/min to 315 °C, and held at 315 °C for 12 min. The injector and auxiliary arm were maintained at 280 °C. Separation was performed using a 60-m DB5-MS column (J&W Scientific, 0.25-mm i.d., 0.25-µm film thickness) at 1.0 mL/min helium. Duplicate injections and analyses were performed for each biological sample, thus yielding a dataset composed of 180 GC/MS analyses.

Data Analysis. MET-IDEA settings were determined on the basis of empirical observations of GC/MS performance characteristics using data from our lab and external labs. Typical settings have been entered as default settings in the program. These included an average peak width of 0.15 min; a minimum peak width of 0.2 times the average peak width (i.e., 0.03 min); a maximum peak width of 3 times the average, or 0.45 min; peak slope >1.5; an adjusted retention time accuracy of 0.2 times the average peak width, or 0.03 min; a peak overload factor of 0.75; a mass accuracy of 0.1; and a mass window of ±0.2. Raw data extracted using MET-IDEA were compared to quantification results generated using Agilent Chemstation software (version D.01.02.16) for both total ion and single ion chromatographic integration. The same ions were used for single ion integration comparison between Chemstation and MET-IDEA extraction. For statistical analysis of the experimental time series, analyses of variance (ANOVAs) were performed on each component to analyze the effect of treatment, time point, the interaction term treatment × time point, and analytical replication using JMP (SAS, Cary, NC). Two-dimensional hierarchical clustering was subsequently performed on all metabolites that demonstrated ANOVA models with p < 0.01. All statistical analyses were performed on data adjusted for internal standard recovery (by dividing all component/metabolite values by the internal standard peak area for each GC/MS analysis) and subsequently scaled to mean = 0 and standard deviation = 1. This scaling procedure was used to prevent bias in the hierarchical cluster analysis (HCA) due to high variation of some components.

(25) Suzuki, H.; Achnine, L.; Xu, R.; Matsuda, S.; Dixon, R. Plant J. 2002, 32, 1033-1048.
(26) Broeckling, C. D.; Huhman, D. V.; Farag, M. A.; Smith, J. T.; May, G. D.; Mendes, P.; Dixon, R. A.; Sumner, L. W. J. Exp. Bot. 2005, 56, 323-336.
(27) Kopka, J.; Schauer, N.; Krueger, S.; Birkemeyer, C.; Usadel, B.; Bergmuller, E.; Dormann, P.; Gibon, Y.; Stitt, M.; Willmitzer, L.; Fernie, A. R.; Steinhauser, D. Bioinformatics 2004, bti236.

ALGORITHM

MET-IDEA Description and Algorithms. MET-IDEA is a data processing and extraction tool designed to extract quantitative ion abundance data from large series of coupled chromatographic-mass spectrometry datasets. Extensive details concerning the algorithms and operational parameters are provided in a 32-page user help file accessible through the software (see Supporting Information); however, a few basic details are provided below. It is highly recommended that users review the help file prior to use and process the representative data as a training set for a better understanding of the software.

MET-IDEA is programmed and implemented in Microsoft .NET and first requires installation of the .NET framework (www.microsoft.com/net/). MET-IDEA imports raw data in net.cdf format, which is currently recognized as a universal industry format (http://my.unidata.ucar.edu/content/software/netcdf/index.html) and is an export option of most recent commercial mass spectrometry data acquisition software. Alternatively, net.cdf formatted files can be generated using commercial file conversion software such as MASSTransit (Palisade Corp., Newfield, NY). The MET-IDEA output is formatted as a tab-delimited text file readily viewable by a large number of word processors, spreadsheets, and statistical software packages.

MET-IDEA requires an input list composed of a series of ion/retention time pairs (IRt), which guide the selected ion extraction process (Figure 1). Each IRt is unique and characteristic of a specific compound, and these are commonly referred to as mass spectral tags.27 The IRt list can be generated in one of three ways: (1) manually created and edited within MET-IDEA, (2) imported in tab-delimited text format from metabolite databases or compiled lists, or (3) extracted from AMDIS (Automated Mass spectral Deconvolution and Identification Software; http://www.amdis.net/) output files.

Figure 1. Flowchart diagramming the logical and computational path during a standard analysis.

AMDIS was developed and is maintained by the United States National Institute of Standards and Technology (NIST) and is freely available software designed for analysis of GC/MS data.19 It is capable of differentiating chromatographically unresolved and baseline-masked peaks and subsequently identifying those components by comparison with spectral libraries generated from authentic compounds. AMDIS can also be used for analysis of LC/MS and CE/MS data, albeit somewhat less effectively due to the limitations in parameter settings and lack of commercially available spectral libraries. AMDIS does not possess quantification capabilities. By analyzing a representative sample using AMDIS, the vast majority of components in a dataset can be rapidly and efficiently converted to an IRt list.

The output from AMDIS peak detection and deconvolution analysis can be used by MET-IDEA to collect values for ions which are selective for a given metabolite. The selection process utilized by MET-IDEA accesses the “model” ions (as recognized by AMDIS and listed in the *.elu output file), when available, and otherwise selects abundant ions that meet a certain set of criteria. These criteria, which can be modified by the user, include a low mass cutoff and an exclusion list for common background ions originating from contamination, column materials, derivatization reagents, and solvent clusters. When AMDIS reports multiple model ions, MET-IDEA selects the most abundant of these ions for quantification. When no models are reported, MET-IDEA selects a nonmodel ion from the raw mass spectrum. When no ions meet the criteria, a value of “-1” is reported, indicating to the user that manual supervision is necessary for that compound. The exclusion list and low mass cutoff apply to both the model ion selection and selection of component spectral ions when models are not available. When the IRt list is prepared from an AMDIS output *.elu file, MET-IDEA references the corresponding *.fin file, which contains spectral matches to a custom or commercially available mass spectral library, and returns the identifications reported therein.
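The ion selection rules described above can be illustrated with a minimal Python sketch. This is not MET-IDEA's code (the tool itself is a .NET application); the `Component` type, the `select_quant_ion` function, the cutoff of m/z 50, and the example exclusion list are all hypothetical values chosen for illustration. The fallback to nonmodel ions when every model ion is excluded is likewise an assumption, since the text does not describe that case.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Component:
    """One deconvoluted component (hypothetical stand-in for an *.elu entry)."""
    retention_time: float                 # minutes
    spectrum: Dict[int, float]            # m/z -> abundance
    model_ions: List[int] = field(default_factory=list)


def select_quant_ion(component, low_mass_cutoff=50,
                     excluded=frozenset({73, 147, 207, 281})):
    """Pick one quantification ion, or -1 when manual review is needed."""
    def allowed(mz):
        # The cutoff and exclusion list apply to model and nonmodel ions alike.
        return mz >= low_mass_cutoff and mz not in excluded

    candidates = [mz for mz in component.model_ions if allowed(mz)]
    if not candidates:
        # Assumption: fall back to the raw spectrum when no model ion qualifies.
        candidates = [mz for mz in component.spectrum if allowed(mz)]
    if not candidates:
        return -1
    # Among the remaining candidates, take the most abundant ion.
    return max(candidates, key=lambda mz: component.spectrum.get(mz, 0.0))
```

For a component with model ions 204 and 217, where 204 is the more abundant, the sketch returns 204; a spectrum containing only excluded or low-mass ions returns -1.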
Each IRt is then representative of a mass spectral tag.27 By choosing a representative sample, or by generating one by pooling several samples from the dataset and analyzing that pooled sample in the same manner as the individual samples, the vast majority of components in a dataset can be rapidly and efficiently converted to an IRt list. Alternatively, the user can process multiple samples using AMDIS and concatenate the IRt lists. This will generate a dataset that can be cleaned using the “redundancy analysis” function of MET-IDEA (see below).

MET-IDEA allows optional user calibration of mass values, retention time values, neither, or both data types on the basis of user-selected “marker” values. Once the markers are selected, MET-IDEA locates the IRt pairs in a given file, compares the actual retention time values to those in the first reference file, and corrects the retention time (Rt) on a file-by-file basis. MET-IDEA applies either a calculated fixed-value correction based on the average deviation of experimental retention values from the expected values (average ∆Rt) or a linear correction28,29 in which each ∆Rt is regressed against the compound Rt. A fixed-value retention time correction is sufficient for several hundred consecutive gas chromatographic injections (Figure 2a), whereas a linear adjustment is often more appropriate for longer time frames (weeks to months) or with LC or CE data (Figure 2b,c). It is noted that the need for rigorous alignment is reduced when data is extracted using both chromatographic retention and mass domain criteria, as opposed to other algorithms that peak-pick

(28) Schauer, N.; Steinhauser, D.; Strelkov, S.; Schomburg, D.; Allison, G.; Moritz, T.; Lundgren, K.; Roessner-Tunali, U.; Forbes, M. G.; Willmitzer, L.; Fernie, A. R.; Kopka, J. FEBS Lett. 2005, 579, 1332-1337.
(29) Wu, X.; Gu, L.; Prior, R. L.; McKay, S. J. Agric. Food Chem. 2004, 52, 7846-7856.
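The two retention time corrections described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than MET-IDEA's implementation: the function names and the marker values are hypothetical, and ordinary least squares is assumed for the regression of ∆Rt against Rt.

```python
from statistics import mean


def fixed_rt_correction(expected, observed):
    """Build a predictor that shifts every Rt by the average marker deviation.

    expected: marker retention times in the reference file (min)
    observed: the same markers' retention times in the file being corrected
    """
    shift = mean(o - e for e, o in zip(expected, observed))  # average dRt
    return lambda rt: rt + shift


def linear_rt_correction(expected, observed):
    """Build a predictor from a least-squares fit of dRt against marker Rt."""
    deltas = [o - e for e, o in zip(expected, observed)]
    mean_rt, mean_delta = mean(expected), mean(deltas)
    sxx = sum((e - mean_rt) ** 2 for e in expected)
    if sxx == 0:
        raise ValueError("markers must span more than one retention time")
    slope = sum((e - mean_rt) * (d - mean_delta)
                for e, d in zip(expected, deltas)) / sxx
    intercept = mean_delta - slope * mean_rt
    return lambda rt: rt + intercept + slope * rt


# Hypothetical marker retention times (min) for one reference/sample pair.
reference = [10.0, 20.0, 30.0, 40.0, 50.0]
drifted = [10.12, 20.22, 30.32, 40.42, 50.52]   # deviation grows with Rt

predict_fixed = fixed_rt_correction(reference, drifted)
predict_linear = linear_rt_correction(reference, drifted)
```

With these made-up markers the deviation grows with retention time, so the fixed correction applies the same average shift everywhere, while the linear correction adjusts the shift according to where a peak elutes. That is the situation in which the text recommends the linear form.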

Figure 2. Retention time drift for GC/MS data is predictable. Data collected in a previous study26 were used to demonstrate the applicability of the retention time shift for accurately predicting peak locations. All retention time units are minutes. (a) Retention time variation in GC/MS datasets with up to 300 sample analyses can be accurately predicted by shifting the predicted retention time by a constant value, calculated for each file/analysis. The Rt correction constant was calculated using five identifiable peaks from various locations within the chromatogram, and an average deviation was determined. The overlap in the scatterplots for each of the five peaks demonstrates that the retention time drift is essentially independent of peak elution time over 300 injections. (b) Similar samples were analyzed approximately four months apart, with nearly continuous use of the instrument in the interim. Over these longer time frames, a linear correction more accurately predicts the retention time changes due to column wear. The retention times of several marker metabolites were plotted against each other in scatter-plot format, and the regression line was determined to be highly linear. The linear equation displayed was calculated in MET-IDEA using the five marker peaks labeled with a “+” and used to predict the retention times for all other points (solid diamonds). (c) The linear regression displayed in part b can accurately predict the retention times for other manually matched peaks. The retention time for each peak (x axis) is plotted against the residuals from the regression line in part b. In every case, including those peaks that elute before the earliest marker peaks, the deviation is [...]

[...] 10 samples) composed of multiple files because correlation parameters cannot be reliably estimated from small sample sets. This is not likely to be a limitation when applied in a metabolomics experimental setting. The results of redundancy analysis are written to a separate file.
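As an illustration of how a correlation-based redundancy check of this kind might work, the Python sketch below flags pairs of peaks that elute close together and whose abundances co-vary across files. The details of MET-IDEA's own procedure are not fully given here, so everything in the sketch is an assumption: the function names, the retention time window, the r² threshold, and the minimum file count are hypothetical. Flagged pairs are only reported, never removed.

```python
from itertools import combinations
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length abundance vectors."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no correlation information
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def flag_possible_redundancy(peaks, rt_window=0.03, r2_threshold=0.9,
                             min_files=10):
    """Report peak pairs that elute close together and co-vary across files.

    peaks maps a peak name to (retention_time_min, abundances_per_file).
    Nothing is deleted; the caller decides what to do with each flagged pair.
    """
    flagged = []
    for (name_a, (rt_a, ab_a)), (name_b, (rt_b, ab_b)) in combinations(
            peaks.items(), 2):
        if len(ab_a) < min_files or len(ab_b) < min_files:
            raise ValueError("too few files to estimate a correlation reliably")
        if abs(rt_a - rt_b) <= rt_window:
            r2 = r_squared(ab_a, ab_b)
            if r2 >= r2_threshold:
                flagged.append((name_a, name_b, r2))
    return flagged
```

Setting `rt_window` to a very large value and `r2_threshold` to 0 turns the same function into a correlation report over every pair in the dataset.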
MET-IDEA does not delete data because instances may occur in which two adjacent peaks are truly different metabolites but elute with similar retention times and correlate in abundance. In addition to reporting redundancy events, MET-IDEA can perform correlation analysis on the entire dataset, reporting only those r² values above a user-selected threshold (all results can be collected if the user selects “0” as the threshold). This tool can be used to extract biologically relevant correlation relationships.7,26,30-32 It is the responsibility of the user to determine whether peak pairs identified by the redundancy analysis are, in fact, AMDIS artifacts.

IMPLEMENTATION AND DISCUSSION

MET-IDEA Performance Validation. MET-IDEA ion extraction was systematically compared with more traditional but time-consuming quantification methods based on peak abundance, in particular, integration results generated using Agilent Chemstation software (see Systems and Methods). Three moderately abundant and chromatographically resolved peaks, succinic acid, fumaric acid, and 5-methoxysalicylic acid (internal standard), were used to demonstrate the validity of MET-IDEA quantification. Scatter plots of both single ion chromatogram (SIC) values and MET-IDEA values vs total ion chromatogram (TIC) values were generated. In all cases tested, MET-IDEA ion values are highly correlated and strongly linear (r² ≈ 0.99) with results generated using traditional methods (Figure 3), indicating that the algorithms provide data equivalent to more traditional methods, but in a more selective, automated, and high-throughput fashion.

Figure 3. Comparison of total ion chromatogram, single ion chromatogram, and MET-IDEA-based data integration. TIC and SIC peak integrations were performed using Agilent Chemstation software. MET-IDEA algorithms were then applied using the same ions as SIC, and the data was plotted against the TIC (right column). MET-IDEA performed as well as SIC, and comparison of the two scatterplots for each metabolite reveals that SIC and MET-IDEA often vary in identical fashion, indicating that the minor deviation is not due to the algorithms but instead is induced by use of a single ion rather than the total ion. This indicates that MET-IDEA integration algorithms reliably replicate results generated using more traditional methods. All peaks were within the dynamic range of the instrument.

(30) Weckwerth, W.; Loureiro, M. E.; Wenzel, K.; Fiehn, O. Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 7809-7814.
(31) Roessner-Tunali, U.; Urbanczyk-Wochniak, E.; Czechowski, T.; Kolbe, A.; Willmitzer, L.; Fernie, A. R. Plant Physiol. 2003, 133, 683-692.
(32) de la Fuente, A.; Bing, N.; Hoeschele, I.; Mendes, P. Bioinformatics 2004, 20, 3565-3574.
(33) Dixon, R. A.; Sumner, L. W. Plant Physiol. 2003, 131, 878-885.
(34) Young, N.; Mudge, J.; Ellis, T. Curr. Opin. Plant Biol. 2003, 6, 199-204.

Effects of Methyl Jasmonate (MeJa) and Yeast Elicitor (YE) on M. truncatula Cell Suspension Liquid Media. M. truncatula is a model plant for studying legumes, possesses unique secondary metabolism, and establishes symbiotic associations with nitrogen-fixing bacteria and fungi.33,34 Secondary metabolite biosynthesis can be elicited through the application of certain compounds. In this study, methyl jasmonate (MeJa, a phytohormone involved in wound signaling) and a yeast cell wall preparation (YE, a fungal pathogen mimic) were each applied independently to M. truncatula root cell suspension cultures. MeJa application was previously demonstrated to elicit the accumulation of triterpene saponins,25,26 and application of YE induced the accumulation of both free and glycosylated isoflavonoids, a prominent class of phenylpropanoid compounds found in legumes.26,35

A representative MeJa-elicited sample was analyzed using AMDIS, and an IRt list was generated. The list was then used by MET-IDEA to extract quantitative peak areas for each IRt from each sample. Compounds present in YE-elicited or unelicited samples that were absent from the MeJa-elicited sample were manually added to the IRt list in MET-IDEA, and data extraction was performed with default values for gas chromatography coupled to a quadrupole mass spectrometry instrument. The resultant data matrix was then exported as a tab-delimited text file that could then be imported into common spreadsheet or statistical programs.

ANOVA was performed on all 233 chromatographic peaks extracted from the MeJa-elicited dataset. Of these, 167 (71.7%) peaks resulted in a significant model fit (p < 0.01) (Supp_Table_3_Anova; see Supporting Information). Within those ANOVAs resulting in significant models, 78.4% demonstrated a significant treatment effect, and 35.3%, a significant treatment × time point interaction; however, only 0.6% (1/167) demonstrated a significant injection effect, approximating the false-positive error rate of ∼1% expected at the 99% confidence level. This demonstrates both injection reproducibility and consistency of MET-IDEA ion extraction algorithms. Similar statistical results were obtained following analysis of the YE dataset, although dramatically fewer metabolites demonstrated elicitation effects (see Supp_Table_3_Anova). A large number of small organic acids, including lactic, glycolic, oxalic, succinic, fumaric, malic, α-ketoglutaric, and citric acids, and several other compounds (many currently unidentified) were altered by MeJa application. Compounds demonstrating significant ANOVA models were further analyzed by two-dimensional hierarchical cluster analysis.
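The preprocessing applied before these statistical analyses (adjustment for internal standard recovery followed by scaling to mean 0 and standard deviation 1, as stated under Data Analysis) can be sketched as follows. The function names and the toy matrix are hypothetical, and the use of the sample standard deviation is an assumption; the text does not say which estimator was used.

```python
from statistics import mean, stdev


def normalize_to_internal_standard(matrix, istd_name):
    """Divide every metabolite value by the internal standard area.

    matrix maps a metabolite name to one value per GC/MS analysis; the
    internal standard row itself is dropped from the result.
    """
    istd = matrix[istd_name]
    return {name: [v / s for v, s in zip(values, istd)]
            for name, values in matrix.items() if name != istd_name}


def autoscale(values):
    """Scale one metabolite to mean 0 and (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    if s == 0:
        return [0.0 for _ in values]   # constant metabolite: nothing to scale
    return [(v - m) / s for v in values]


# Toy data: three analyses, one metabolite plus the internal standard.
raw = {"5-methoxysalicylic acid": [2.0, 4.0, 2.0],
       "succinic acid": [2.0, 8.0, 6.0]}
adjusted = normalize_to_internal_standard(raw, "5-methoxysalicylic acid")
scaled = {name: autoscale(values) for name, values in adjusted.items()}
```

Scaling each metabolite separately gives every metabolite equal weight in the clustering regardless of its absolute abundance, which is the stated reason for the procedure.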
Using this approach, MeJa-elicited samples were clearly differentiated from the nonelicited samples in nearly every case (Figure 4, high-resolution images available online as Figure S1 (MeJa) and Figure S2 (YE); see Supporting Information). Several of the organic acids, such as fumaric and citric acids, tended to decrease in abundance in the media following MeJa elicitation (Figure 4a, cluster IV). This trend is inversely related to the intracellular levels of the same compounds for the cells grown in this media,26 suggesting that reduced transport to (or increased uptake from) the media by the cells may explain the observed changes following elicitation, although nonspecific extracellular enzymatic degradation cannot be ruled out. In addition, several phenolic compounds were induced following elicitation with either MeJa or YE. YE elicited an accumulation of the isoflavonoid, afrormosin, which demonstrated a sharp peak in abundance at 6-12 h (Figure 4b, group IV). MeJa elicitation also induced accumulation of this isoflavonoid, although the response was more prolonged, peaking at 24 h while still remaining elevated at 48 h (Figure 4a, group V). This trend was mirrored by other compounds, including the tentatively identified hydroquinone and several compounds with prominent m/z 267.

Figure 4. Two-dimensional hierarchical clustering of cell culture media samples based on metabolite responses to (a) MeJa and (b) YE elicitation. Only metabolites demonstrating a significant ANOVA model were clustered. (a) The metabolic response of cell culture media was very rapid and prolonged following MeJa elicitation, particularly those in cluster I. Clusters II and III show trends that are influenced primarily by time. Cluster IV contains metabolites that demonstrate decreasing concentrations with MeJa application following a variable time delay. Cluster V metabolites tend to be found with increased concentration following MeJa application. Included in the list of cluster V metabolites are MeJa, jasmonic acid (JA), and hydroxyl-jasmonic acid (JA-OH), among others.26 The sample cluster with the asterisk (0 hr cont) also contains one of the 0-h MeJa samples. (b) YE resulted in very different temporal dynamics from those of MeJa, most apparent in the cluster IV metabolites, which help separate the 6- and 12-h YE-elicited samples from all other samples. Metabolites in cluster IV included primarily phenolics, such as daidzein, afrormosin, and unidentified compounds with mass spectral properties suggestive of a phenolic structure. Cluster III metabolites tended to remain elevated for a longer duration following YE elicitation than those in cluster IV. Clusters I and II contain metabolites that show primarily temporal trends. High-resolution images are available online as supplemental Figure S1 (panel a) and S2 (panel b) that allow visualization of individual metabolite names listed on the x-axis.

(35) Kessmann, H.; Edwards, R.; Geno, P. W.; Dixon, R. A. Plant Physiol. 1990, 94, 227-232.

CONCLUSIONS

MET-IDEA is a novel data extraction tool for mass spectrometry-based metabolomics that rapidly generates reliable and comprehensive datasets utilizing the improved sensitivity and selectivity of single ion quantification. MET-IDEA can utilize output from the freely available AMDIS chromatographic deconvolution software or from user-generated lists of ion-retention time pairs to perform high-throughput data extraction. This software fills a previous void and accelerates the metabolomic process by streamlining the critical step of transforming raw CHROM-MS files into an organized data matrix amenable to statistical interrogation for biological insight or integration into functional genomics databases. MET-IDEA-based analyses revealed the temporal dynamics of cell culture exudates following exposure to elicitors of secondary metabolic pathways. The biological results obtained using MET-IDEA were validated using traditional data extraction and quantification algorithms. Efficient informatic tools allow biologists to spend less time processing data and more time interpreting the biological significance and implications. The improved efficiency of MET-IDEA-based quantification also enables biologists to analyze more biological conditions or replicates, which will ultimately provide greater confidence in the results and deeper biological insight.

ACKNOWLEDGMENT

The authors thank the National Science Foundation Plant Genome Research Program Award (DBI-0109732) for support of Corey Broeckling and The Samuel Roberts Noble Foundation for additional financial support. We thank Matt Howry for the programming of the net.cdf API wrapper for MET-IDEA.

SUPPORTING INFORMATION AVAILABLE

Supporting Information as noted in text. This material is available free of charge via the Internet at http://pubs.acs.org.

Received for review December 7, 2005. Accepted April 18, 2006. AC0521596

