
Mapping the Proteome of Barrel Medic (Medicago truncatula)1[w]

Bonnie S. Watson, Victor S. Asirvatham, Liangjiang Wang, and Lloyd W. Sumner*

Plant Biology Division, The Samuel Roberts Noble Foundation, P.O. Box 2180, Ardmore, Oklahoma 73402

A survey of six organ-/tissue-specific proteomes of the model legume barrel medic (Medicago truncatula) was performed. Two-dimensional polyacrylamide gel electrophoresis reference maps of protein extracts from leaves, stems, roots, flowers, seed pods, and cell suspension cultures were obtained. Five hundred fifty-one proteins were excised and 304 proteins identified using peptide mass fingerprinting and matrix-assisted laser desorption ionization time-of-flight mass spectrometry. Nanoscale high-performance liquid chromatography coupled with tandem quadrupole time-of-flight mass spectrometry was used to validate marginal matrix-assisted laser desorption ionization time-of-flight mass spectrometry protein identifications. This dataset represents one of the most comprehensive plant proteome projects to date and provides a basis for future proteome comparison of genetic mutants and of biotically, abiotically, and/or environmentally challenged plants. Technical details concerning peptide mass fingerprinting, database queries, and protein identification success rates in the absence of a sequenced genome are reported and discussed. A summary of the identified proteins and their putative functions is presented. The tissue-specific expression of proteins and the levels of identified proteins are compared with their related transcript abundance as quantified through EST counting. Approximately 50% of the proteins appear to be correlated with their corresponding mRNA levels.

1 This work was supported by the Samuel Roberts Noble Foundation and by the National Science Foundation (Plant Genome Research Project no. 0109732). [w] The online version of this article contains Web-only data. The supplemental material is available at www.plantphysiol.org. * Corresponding author; e-mail [email protected]; fax 580–224–6692. Article, publication date, and citation information can be found at www.plantphysiol.org/cgi/doi/10.1104/pp.102.019034.

Plant Physiology, March 2003, Vol. 131, pp. 1104–1123, www.plantphysiol.org © 2003 American Society of Plant Biologists

Legumes are valuable agricultural and commercial crops that serve as important nutrient sources for both humans and animals. For example, alfalfa (Medicago sativa) is an important forage crop with over 24 million acres planted annually and an annual U.S. value approaching 6 billion dollars (U.S. Department of Agriculture-National Agricultural Statistics Service, 2002). Legumes are characterized by symbiotic relationships with both nitrogen-fixing bacteria and arbuscular mycorrhizal fungi (Barker et al., 1990). These host-symbiont interactions result in the ability to fix atmospheric nitrogen and effect mutualistic and defense-related biosynthetic pathways such as the isoflavones, which have been reported to possess antimicrobial, anticarcinogenic, and other health-promoting properties (Dixon, 1999). Other secondary metabolites in legumes such as the triterpenes have been associated with defense and are of particular interest as novel pharmaceuticals (Small, 1996; Haridas et al., 2001). The study of legume biology using many of the agriculturally important legumes such as soybean (Glycine max) and alfalfa is complicated by the large genome size and complex ploidy of these species. Fortunately, barrel medic (Medicago truncatula) has a smaller diploid genome that yields more manageable genetics. These traits, along with its autogamous nature, short generation time, and prolific seed production, have made barrel medic a useful model legume (Barker et al., 1990; Cook et al., 1997; Cook, 1999; Bell et al., 2000; Trieu et al., 2000).

The impressive achievements in genome and expressed sequence tag (EST) sequencing have yielded a wealth of information for many model organisms, including the plants Arabidopsis and barrel medic. Unfortunately, sequence information alone is insufficient to answer questions concerning gene function, developmental/regulatory biology, and the biochemical kinetics of life. To address these questions, more comprehensive approaches that include quantitative and qualitative analyses of gene expression products are necessary at the transcriptome, proteome, and metabolome levels. Transcriptome approaches using microarray and serial analysis of gene expression technologies are powerful tools; however, mRNA abundances may only represent putative function because there is still a questionable correlation between mRNA and protein levels (Futcher et al., 1999; Gygi et al., 1999). In contrast, proteomics provides a more direct assessment of biochemical processes by monitoring the actual proteins performing the enzymatic, regulatory, and structural functions encoded by the genome and transcriptome. Recent improvements in high-resolution two-dimensional PAGE (2-DE; Klose and Kobalz, 1995; Görg et al., 1999), increased content of protein and nucleotide databases, and increased capabilities for protein identification utilizing modern mass spectrometry methods such as matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOFMS; Pappin et al., 1993; Yates, 1998a, 1998b; Corthals et al., 2000) have made the large-scale profiling and identification of proteins a dynamic new area of research in plant biology.

Although there is a substantial amount of work in the literature on bacterial (Guerreiro et al., 1999; Morris and Djordevic, 2001), yeast (Futcher et al., 1999), and human proteomes (Anderson et al., 2001; Stensballe and Jensen, 2001), there is relatively little information on plant proteomes (van Wijk, 2001). Costa and coworkers have identified proteins from xylem and needles of maritime pine (Pinus pinaster; Costa et al., 1998, 1999), and Tsugita and coworkers have worked on the rice (Oryza sativa) proteome with some success (Tsugita et al., 1994). Both of these groups have relied heavily on Edman sequencing, which is limited by its inability to sequence proteins blocked at the N terminus. More recently, researchers have reported on subcellular proteomes such as the chloroplast membrane (Peltier et al., 2000, 2002), whereas others have focused on single tissues including Arabidopsis seeds (Gallardo et al., 2001), Arabidopsis mitochondria (Kruft et al., 2001; Millar et al., 2001), maize (Zea mays) root tips (Chang et al., 2000), and barrel medic roots (Mathesius et al., 2001, 2002). To date, there has been no large-scale project to identify proteins from multiple tissues of the same plant species. The objective of the present work was to survey the organ-/tissue-specific proteomes of the model legume barrel medic, to provide an overview of the barrel medic proteome, and to serve as a basis for future proteome comparisons of genetic mutants and of biotically, abiotically, and/or environmentally challenged plants. The survey was accomplished using 2-DE to produce reference maps of protein extracts from leaves, stems, roots, flowers, seed pods, and cell suspension cultures.
MALDI-TOFMS peptide mass fingerprinting was used to identify 304 proteins. HPLC coupled with quadrupole time-of-flight tandem mass spectrometry (LC/MS/MS) was used to validate marginal MALDI-TOFMS protein identifications. The identified proteins are discussed and classified based on putative functions determined through similarity (Bevan et al., 1998). Database search results are quantified and strategies discussed. The expression levels quantified by 2-DE are compared with mRNA levels quantified by EST counting.

RESULTS AND DISCUSSION

2-DE Reference Maps and Protein Identifications of Barrel Medic Tissues

2-DE reference maps were obtained for barrel medic leaves, stems, roots, flowers, seed pods, and cell suspension cultures and are provided in Figure 1. To qualitatively survey the proteins visualized by 2-DE, a total of 551 protein spots (i.e. approximately 96 arbitrary protein spots per gel, including positive molecular mass marker controls and negative gel blank controls) were excised from the organ-/tissue-specific Coomassie-stained 2-DE gels and analyzed by mass spectrometry. Typically, high-quality MALDI-TOFMS peptide mass maps were obtained, and representative spectra are provided in Figure 2. Of the 551 protein spots processed, 304 proteins were successfully identified and are listed in Table I. Supplemental Table I (see www.plantphysiol.org) contains extensive data that document the analytical rigor of the protein identifications. These data include an assigned protein spot number (see Fig. 1), an arbitrary peptide mass fingerprint data quality (PMFQ) score of 1 to 5 (with 5 being best, see “Materials and Methods”) to allow assessment of data quality, the number of peptides matched, m/z accuracy and sd of peptides matched, percent protein coverage, theoretical molecular mass and pI, experimental molecular mass and pI, the database accession number of the best match and the databases that yielded concurrent identifications, LC/MS/MS data for select proteins, and the organism to which the matching protein was identified through similarity. For protein identifications determined using the SwissProt and National Center for Biotechnology Information (NCBI) databases, the organism reported in Supplemental Table I is that from which the protein or gene was directly sequenced. In the case of most ESTs, protein identifications were first made to barrel medic ESTs that were not annotated. These ESTs were annotated by comparison with The Institute for Genomic Research (TIGR) gene indices or through similarity to other organisms via BLAST. The organism yielding the highest similarity score is the organism reported for EST database identifications in Supplemental Table I. Protein function is also classified and recorded in Supplemental Table I.
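The m/z accuracy of matched peptides is conventionally expressed in parts per million (ppm). The following minimal sketch is not part of the original analysis. It illustrates how the number of peptides matched, the mean and sd of the ppm error, and the percent protein coverage could be summarized for one protein spot. The function names and the example m/z values are hypothetical.

```python
from statistics import mean, stdev


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error of one peptide in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def summarize_match(matched, protein_length, covered_residues):
    """Summarize one peptide mass fingerprint match.

    matched: list of (observed_mz, theoretical_mz) pairs for matched peptides.
    protein_length / covered_residues: residue counts used for percent coverage.
    """
    errors = [ppm_error(obs, theo) for obs, theo in matched]
    return {
        "peptides_matched": len(errors),
        "mean_abs_ppm": mean(abs(e) for e in errors),
        # The sample sd needs at least two peptides.
        "sd_ppm": stdev(errors) if len(errors) > 1 else 0.0,
        "percent_coverage": 100.0 * covered_residues / protein_length,
    }


# Hypothetical (observed, theoretical) monoisotopic m/z pairs for one spot.
pairs = [
    (1045.564, 1045.554),
    (1287.690, 1287.701),
    (1479.801, 1479.796),
    (2211.115, 2211.104),
]
summary = summarize_match(pairs, protein_length=200, covered_residues=50)
```

A summary of this kind could then be compared against whatever acceptance criteria a laboratory adopts for peptide count and mass accuracy.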
A minimum of four peptides is statistically necessary to qualify as a confident match (Pappin et al., 1993). The use of additional criteria such as those listed above is advised and increases the confidence in the protein identification. Most proteins identified in Table I had high confidence identifications; however, a small number (23) of the original proteins were identified using only four peptides that had poor m/z accuracies (i.e. above 30 ppm). These protein identifications were considered marginal and were further interrogated using LC/MS/MS. LC/MS/MS data were queried against the same three databases (NCBI, SwissProt, and dbESTothers) used to query MALDI-TOFMS data. The majority of identifications were found to be valid, but four MALDI-TOFMS proteins were revealed as misidentified. The correct LC/MS/MS identifications for these four are reported in Table I. Tandem data were also used to confirm a specific MALDI-TOFMS identified protein in leaves (spot no. 51), questioned by a reviewer, that had a minimum of four matching peptides and low sequence coverage. This identification was confirmed using LC/MS/MS. These results are provided in Figure 3 and include a search score from dbESTothers (12 peptides matched and Mascot score of 513), representative TOFMS data, and tandem TOF/MS/MS data. Nine proteins from the original list of 23 marginal identifications could not be validated by LC/MS/MS because of limited sample and were therefore omitted from Table I.

Figure 1. 2-DE proteome reference maps were obtained for A, leaf; B, stem; C, root; D, flowers; E, seed pods; and F, cell suspension cultures. Proteins that were identified in this study are marked with arrows and numbers. The numbers correlate with protein identifications listed in Table I. 2-DE was performed using 0.75 to 1.0 mg of protein, linear 11-cm IPG strips (pH 3–10), and a 12% (w/v) total acrylamide SDS second dimension. Gels were stained overnight with Coomassie Brilliant Blue R-250, destained the next day, and images recorded.

Figure 2. Representative peptide mass maps obtained using MALDI-TOFMS illustrating good data quality but differences in protein identification success dependent upon the database queried. Mass spectral peaks are labeled with monoisotopic mass-to-charge ratio (m/z) values used for database searching. A, Stromal 70-kD heat shock-related protein (HSP70, accession no. Q02028) was successfully identified in seed pods (pds#7) using the NCBI databases. B, Isoflavone reductase (accession no. BE325778) from seed pods (pds#39) was identifiable only through use of the EST databases.

Database Query Strategies and Success Rates

To maximize our protein identification success rate for barrel medic proteins, we used protein (SwissProt), nucleotide (NCBI), and EST databases (dbESTothers, and barrel medic-only ESTs from NCBI) for queries of experimental peptide mass maps (Mann and Wilm, 1994; Pappin et al., 1993; Yates, 1998a, 1998b; Choudhary et al., 2001). The specific databases used to successfully identify each individual protein are reported in Table I, and a summary of the protein identification success rates is provided in Table II. In most cases, the resulting peptide mass maps were of high quality; however, this did not always translate to successful protein identification. The average protein identification success rate for all tissues using only the protein databases (SwissProt and NCBInr) was 25%, whereas the average protein identification success rate for all tissues using the EST database was 46% (see Table II). Interestingly, the average overlap in the number of proteins identified in both databases was only 15%; thus, searching both databases was complementary and not necessarily redundant. For example, the peptide maps provided in Figure 2 are of similar high quality; however, the spectrum in Figure 2B could not be identified successfully in the SwissProt or NCBI databases and could only be identified successfully through EST database queries. This complementary searching strategy yielded a final protein identification success rate of 55% for our representative protein set. Strategies using multiple database queries have enhanced our ability to identify proteins even in the absence of a genomic sequence. Our overall success


Table I. Proteins identified in barrel medic tissues

Table I contains a list of identified proteins from specific tissues of Medicago truncatula. The data are separated by tissue and include an assigned protein spot no. (see Fig. 1), the database accession no. of the best match, the databases that yielded concurrent identifications, and the number of MALDI-TOFMS peptides matched. LC/MS/MS was performed on select proteins, and Mascot scores for these proteins are provided in parentheses. Not applicable (NA) denotes that no MALDI data were used in the identification. Significantly more detailed data supporting the protein identifications can be found in Supplemental Table I. Accession no. is the GenBank no. Databases have the following notations: N, NCBI; S, SwissProt; E, dbESTothers; (E), MtESTonly. Species are noted as Hv, Hordeum vulgare; Sb, Sorghum bicolor; Le, Lycopersicon esculentum; and Mt, Medicago truncatula.

Tissue

Spot #

lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs

35 39 47 52 51 63 78 82 84 92 98 105 108 111 113b 113b 124 126 123 128 136 138 139 141 144b 144b 149b 149b 155 158 196b 187 191 188 189 196b 206b 206b 205 219 222 223 238 241 237 239 251 250 258b 263 265 261 258b

Identification

er ATPase (CDC48-like protein)a DNA mismatch repair proteina Rubisco Rubisco F23N19.10, TPR repeat protein Cell division prt. FTSK homologa Rubisco Rubisco Rubisco Rubisco Transcription factora S-adenosyl-Met synthetase S-adenosyl-Met synthetase Rubisco activase ATP synthase beta chain Rubisco activase Rubisco activase Rubisco activase Aminomethyl transferase, mito. Precursora Aminomethyl transferase, (T protein)a Fru biphosphate aldolase Spermine synthase Putative Arabidopsis thaliana proteina Ankyrin repeat protein Glyceraldehyde-3-phosphate dehydrogenase Possible tartrate dehydrogenase Glyceraldehyde-3-phosphate dehydrogenase Tartrate dehydrogenase Leu2 (3-isopropyl malate dehydrogenase)a Malate dehydrogenase Ascorbate peroxidase Oxygen evolving enhancer protein Oxygen evolving enhancer protein Remorina Remorina Rubisco Mitotic cyclin B1-1a ATP synthasea Oxygen-evolving enhancer protein 1 Cystathione-B-lyasea Chloro membrane-associated 30-kD protein/transit pepta RNA-binding protein L-ascorbate peroxidase Ascorbate peroxidase Acid phosphatase Acid phosphatase Triose phosphate isomerase, cytosolic ABC transportera Pyrimidine-nucleoside phosphorylasea Chaperonin 21 precursora Chaperonin 21 precursora Patatin-like proteina Transcription factor VSE-1a

Accession Number

NP190891 O66652 BE420942 BE420942 AW694998 P45264 BAA20039 AAF97663 AAF15326 X69528 BF004459 BG581653 P50303 AF251264 NP077960 Q42450 AAG61120 AAG61120 BF521422 P49364 BI309468 BE204391 AW685607 AL388433 BF003409 P70792 BG453922 P70792 P18120 T09286 BG587041 P14226 P14226 BG588209 BG588209 AAC35045 AAC24244 BG582863 BG449793 P53780 AW776774 BF641320 P48534 AAL15164 BG588612 BG588612 BF642390 NP488322 P39N9 AW775755 AW776607 AAF98369 CAA05898

Databases

N S E/Hv, (E) E/Hv E/Mt, (E) S, N N, S, E/Hv N, S, E/Hv N, S, E/Hv N (E) E/Mt, N, S S N N S, E/Sb N, S N, S E/Mt, (E) N, S, E/Mt (E), N, S, E/Mt E/Mt E/Mt E/Mt E/Mt, (E) S, N (E) N, S N, S N, E/Mt, (E) (E) S, N, E/Mt, (E) S, N, E/Mt, (E) (E) (E) N, S N (E) (E) S E/Mt, (E) E/Mt, (E) N, S, E/Mt, (E) N, S, (E) (E) (E) E/Mt, (E) N S E/Mt E/Mt, (E) N N

# Peptides/ LC/MS/MS

11 8 7 5 4/(513) 8 11 9 9 NA/(389) 5 12 8 NA/(372) 8 6 9 8 12 11 9 5 5 5 10 6 7 6 7 14 5 7 9 7 8 8 8 9 5 6 9 8 6 6 8 11 10 7 8 6 5 5 6


Table I. Continued from previous page. Tissue

Spot #

Identification

Accession Number

lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs lvs stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm stm rts rts

270 280 281 287 284 338 363 388 387 397 422 5 7 10 9 17 16 18b 18b 19 20 22 23 24 26 29 36 37 39 40 42 43 44 45 48 46 49 51 52 57 58 55 54 60 61 62 64 66 70 72 81 85 86 88 90 89 95 2 5

Oxygen-evolving enhancer protein Oxygen-evolving enhancer protein Plastid specific ribosomal proteina Gly-rich cell wall structural protein 2a Oxygen-evolving enhancer protein Hypothetical proteina Aspartate 1-decarboxylase precursora Rubisco small subunit Rubisco small subunit Plastocyanine precursor Photosystem I iron-sulfur proteina Cell division (valosin-containing) protein Heat shock protein 70 TPR repeat protein Heat shock protein 70 Rubisco Rubisco ATP synthasea Rubisco Rubisco Rubisco Tubulin alpha chain 26S proteasome AAA-ATPase subunita 26S proteasome (TAT binding)a SAM synthetase Actin ATPase or P loop kinasea Fru 1,6 biphosphate aldolase Adenosine kinasea Malate dehydrogenase Annexina Fructokinase Ribose-phosphate pyrophosphokinasea IFR-like oxidoreductase Atran bp1a (Ran-binding protein 1 domain)a Cinnamoyl-CoA reductasea G protein beta subunita Oxygen-evolving enhancer protein I RNA-binding protein-like Ascorbate peroxidase Proteasome subunit alpha type 7 (20S) SAM:trans-caffeoyl CoA 3-O methyl transf.a RNA-binding protein-like Ascorbate peroxidase Acid phosphatase Triosphosphate isomerase Expressed proteina Uridylate monophosphate kinase 23-kD O2-evolving pht. sys. II precursora ATP synthase, delta chaina vcCYP 40S ribosomal protein S12a Gly-rich RNA binding protein 60S ribosomal protein Nucleoside diphosphate kinase I Gly-rich RNA binding protein Hypotheticala Heat shock 70 Phosphoglyceromutase

P16059 BF521386 BE318731 AL366848 P16059 NP180029 P52999 BF520627 BF519126 AW776926 NP039445 P54774 1909352A AW694998 P37900 CAA93074 P28400 CAB85681 P30401 P04991 AAF97641 Q43473 BE325937 NP187204 P46611 Q96483 NP347611 O65735 BF004017 BI310064 T09552 AW584645 P47304 BF644624 AW686211 BF635045 Q39836 P14226 NP196048 BG648814 Q9SXU1 T09399 BF641320 BG648814 BG588612 BF642390 AI774799 AW981222 P16059 Q41000 AW775250 AL375805 AL379229 AW776748 P47922 AA660717 NP174644 Q02028 BG585916

Databases

S, E/Mt, (E) E/Mt, S, (E) (E) E/Mt, (E) S, E/Mt, (E) N S E/Mt, (E) E/Mt E/M. t., (E) N, S N, S N, S (E), E/Mt N, S, E/Mt N, S N, S N S, N S, N, E/Mt N, S, E/Mt N, S, E/Mt E/Mt N, S, E/Mt N, S, E/Mt N, S, E/Mt N N, S, E/Mt (E) (E) N, E/Mt E/Mt, (E) S E/Mt, (E) (E) (E) N, S, E/Mt, (E) S, N N (E) S, N, E/Mt, (E) N, E/Mt E/Mt (E), E/Mt (E), E/Mt E/Mt, (E) E/Le E/Mt, (E) N, S, E/Mt, (E) S E/Mt, (E), N E/Mt E/Mt, (E) E/Mt, (E) S E/Mt, (E), N N N, S, E/Mt (E)

# Peptides/ LC/MS/MS

7 12 4 8 5 5 4 9 5 7 8 8 10 7 8 8 7 7 7 13 6 13 5 7 7 7 9 8 10 6 6 7 7 5 7 7 4/(265) 10 6 4 5 10 5 5 6 10 5 8 7 4 8 4 4 4 5 9 6 22 5


Table I. Continued from previous page. Tissue

Spot #

Identification

Accession Number

rts rts rts rts rts rts rts rts rts rts rts rts rts rts rts rts rts rts rts rts rts rts rts rts rts rts rts rts rts rts rts rts rts rts rts rts rts rts flw flw flw flw flw flw flw flw flw flw flw flw flw flw flw flw flw flw flw flw flw

6 9 12 19 20 22 28 32 33 36 39 40 41 42 43 44 45 46 47 48 51 52 53 54 56 60 61 63 65 66 69 75 77 79 80 81 82 92 3 6 7 8 11 13 14 17 18 21 23 27 26 28 31 32 30 33 34 35 39

Protein disulfide isomerase Putative methyl binding domain ATPase beta subunit Actin isoform B Peroxidase precursor Ankyrin repeat protein HBP1 Glyceraldehyde-3-phosphate dehydrogenase Cationic peroxidase precursor Isoflavone reductase Isoflavone reductase homolog Acidic glucanasea Cytochrome c oxidase subunit 6b-1 Gluco endo-1,3-beta-d-glucosidase Hydroxyacyl glutathione hydrolasea Chitinasea Chitinasea Chitinasea Cys proteinase Cys proteinase precursor Ascorbate peroxidase Ascorbate peroxidase Triose phosphate isomerase In2-1 proteina Uridylate kinase (UDP kinase) Chalcone-flavone isomerase Unknown proteina Alpha fucosidasea Putative protein T25B15.70a Seed protein precursor Vc Cyp (peptidyl isomerase) Profucosidasea Putative proteina Glyceraldehyde-3-phosphate dehydrogenase Cu/Zn superoxide dismutasea aba-Responsive protein ABR17 Unknown proteina Putative ripening-related proteina Thioredoxin Valosin-containing cell division protein NADH ubiquinone oxidoreductase Heat shock 70 Poly(A+)-binding proteina Phosphoglyceromutase Putative methyl-binding domain Calreticulin ATPase beta subunit Enolase S-adenosyl Met synthetase Rubisco activase Ankyrin repeat protein HBP1 Fru-1,6-biphosphate aldolase Aspartate aminotransferase 1-Aminocyclopro. carboxylic acid oxidasea Pyruvate dehydrogenase beta unita Glyceraldehyde-3-phosphate dehydrogenase Malate dehydrogenase Malate dehydrogenase Ripening-induced proteina Cytochrome c oxidase subunit 6b

BI309490 AL378817 CAA75477 T51183 AL369822 BI311773 BG453922 BG584470 BG645198 BI312226 BF650084 BI310278 BE239884 BG584417 CAA71402 CAA71402 CAA71402 BI269594 BG645760 BG648814 P48534 BG584164 BF635446 AW981222 AW559891 AW686250 BE942130 BF520168 AL371551 BE316900 AW126318 BF005271 BF635050 AL387737 BF648027 AL365549 BE943167 BE997543 P54774 AW587332 Q02028 BG584083 BG585916 AL378817 AW773889 CAA75477 CAB75428 AAL16064 AAK25798 BI311773 O65735 P46643 AY062251 BF645846 P34922 O48905 O48905 BI308422 BI310278

Databases

E/Mt, (E), N, S E/Mt N, S, E/Mt, (E) N, S, E/Mt, (E) (E), E/Mt (E), E/Mt (E), S (E) (E) E/Mt, (E) (E), E/Mt, N (E), E/Mt E/Mt (E) N, S, E/Mt, (E) N, S, E/Mt, (E) N, S, E/Mt, (E) (E), E/Mt (E), E/Mt (E) S, N, E/Mt, (E) (E), E/Mt E/Mt, (E) E/Mt, (E) (E), E/Mt, N, S (E), E/Mt (E) E/Mt, (E) E/Mt, (E) (E), E/Mt (E) (E), E/Mt E/Mt, (E), N E/Mt E/Mt, (E) E/Mt, (E) E/Mt, (E) (E), E/Mt S, N E/Mt, (E) N, S (E) (E), E/Mt, N, S (E), E/Mt E/Mt N, S, E/Mt, (E) N, S, E/Mt, (E) N, S, E/Mt, (E) N, S, E/Mt, (E) (E) N, S, E/Mt, (E) S, N, E/Mt N, (E) (E), E/Mt N, S, E/Mt, (E) S, N N, S, (E) (E) (E), E/Mt

# Peptides/ LC/MS/MS

10 4 13 13 8 12 10 7 13 5 4 8 4 8 9 12 10 4 6 4 4 11 7 4 6 9 6 8 4 4 5 8 9 4 8 6 4 4 16 6 10 5 10 5 4/(348) 8 7 8 9 9 7 4/(115) 6 6 8 4 8 8 5


Table I. Continued from previous page. Tissue

Spot #

Identification

Accession Number

flw flw flw flw flw flw flw flw flw flw flw flw flw flw flw flw flw flw flw flw flw flw pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds

40 50 51 53 55 56 57 60 71 73 74 75 77 78 79 81 82 88 91 92 94 96 5 6 7 8 9 10 13 12b 12b 14 15 16 19 22 20 21 23 24 28 27 31 30 29 32 34 38 37 39 41 43b 43b 44 47 46 51 52 53

Stromal ascorbate peroxidase Acid phosphatase Acid phosphatase Triose phosphate isomerase Osmotin-like protein Chalcone isomerase Ascorbate peroxidase Oxygen-evolving enhancer protein 2 Peptidyl prolyl isomerase Acid phosphatase Peptidyl prolyl isomerase Gly-rich RNA binding protein Peroxiredoxin (peroxidase) Ubiquitin-like SMT3 protein Ubiquitin-like SMT3 protein 60S acidic ribosomal protein p3 Gly cleavage system h precursora Acid ribosomal protein P2a2 Immunophilin Rubisco small chain Profilin 1a NADH plastoquinone oxidoreductase 4a Convicilina Convicilina Heat shock 70 Legumin a2 precursora Protein disulfide isomerase Glycinina Legumin a2 precursora Rubisco NAp1p (plasma membrane intrinsic protein)a Vicilin 47kD precursora Provicilin precursora Vicilin 47kD precursora Vicilin 47kD precursora Glycinina Glycinina Glycinina Glycinina Legumin a2 precursora Legumin a2 precursora Legumin a2 precursora Fru 1,6-biphosphate aldolase Legumin a2 precursora Legumin a2 precursora Legumin a2a Cytosolic malate dehydrogenase Malate dehydrogenase precursor Glycinina IFR-like NADH-dependent oxidoreductase Peroxidase 2 Rubisco Enolase Vicilin 47-kD precursora Acid phosphatase Acid phosphatase Acid phosphatase Proteosome 20S subunit Ascorbate peroxidase

Z67113 BG588612 BF004054 BG584164 BI270608 BI310352 AAL15164 BF636854 BE999037 AW584917 BE997455 BF637655 AW585033 AL376595 P55852 BF003585 BF518986 AW329482 AW574158 BI268542 AL373653 BF631701 BI312063 BI310979 Q02028 BI312252 P29828 BI308459 BI311943 AAK70985 AW774263 BI310576 BI312400 BI311712 BI311712 BI311592 BI311729 BI308883 BI308883 BI309500 BI311943 BI311943 O65735 BI311943 BI311943 BI307938 BG583001 AW688679 BI311164 BE325778 CAC38106 CAA62888 BG362941 BI310576 BF004054 BF004054 BG588612 BE922062 BG648703

Databases

(E), E/Mt (E), E/Mt (E), E/Mt (E) (E) (E), E/Mt N, S, E/Mt, (E) E/Mt, (E) (E), E/Mt (E), E/Mt (E), E/Mt E/Mt (E), E/Mt (E), E/Mt, S S, N, E/Mt mt (E), E/Mt (E), E/Mt (E) (E), E/Mt (E), E/Mt (E) (E) (E) E/Mt N, S E/Mt N, S, (E), E/Mt (E) (E) N, S E/Mt, (E) (E) (E) (E) (E) (E) (E) (E) (E) (E) (E) (E) N, S, (E), E/Mt (E) (E) (E) (E), N, S E/Mt, N (E) E/Mt N, E/Mt N, S E/Gm (E) E/Mt E/Mt (E) E/St (E), E/Mt

# Peptides/ LC/MS/MS

7 7 6 11 8 6 5 5 5 4/(131) 6 4 12 8 7 4/(123) 5 5 6 5 9 6 5 10 9 9 7 7 9 8 10 9 6 4 7 13 4 12 6 11 14 11 7 11 8 5 6 7 10 7 16 10 7 9 5 7 9 7 10


Table I. Continued from previous page. Tissue

Spot #

Identification

pds pds

54 55

pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds pds cls cls cls cls cls cls cls cls cls cls cls cls cls cls cls cls cls cls cls cls cls cls cls cls cls cls cls cls cls cls cls cls cls cls cls

56 57 58 59 62 65 66 68 70 71 75 73 72 77 78 80 82 84 85 91 95 94 2 5 6 7 8 10 9 20 15 16 14b 14b 18 19 21 22 23 24b 24b 28 29 30 31 27 33 32 35b 35b 37 39 46 49 53 60 62

Osmotin like protein precursor Putative GSH-dependent dehydroascorbate reductasea Legumin a2 precursora Legumin a2 precursora Oxygen-evolving enhancer protein 2 Legumin a2 precursora Legumin b (minor small)a Legumin b (minor small)a Legumin b (minor small)a Legumin-related high-Mr polypeptidea Hypothetical proteina Vicilin 4-kD precursora LEA proteina Legumin a2 precursora Eukaryotic initiation factor 5aa VcCyP peptidylprolyl isomerase VcCyP peptidylprolyl isomerase Ubiquitin-like protein aba-Responsive protein abr 17 Gly-rich RNA-binding protein Acidic ribosomal protein Legumin (minor small)a Rubisco small subunit Plastocyanin precursor Cell division cycle prt 48 (valosin contain. prt) Heat shock protein 70-kD (Bip A) Luminal-binding protein Psst 70 Putative-luminal binding protein Leucyl aminopeptidasea 70-kD heat shock protein Catalasea Selenium-binding proteina Selenium-binding proteina Calreticulin Nucleosome assembly protein 1a ATP synthase beta subunit Inosine-5′-monophosphate dehydrogenasea Hydroxymethyltransferasea Enolase SAM synthetase Glc-6-phosphate 1 dehydrogenase SAM synthetase 2 Aspartate aminotransferase Putative heat shock protein 12-Oxophytodienoic acid 10,11-reductasea 12-Oxophytodienoate reductase (OPR2)a RAD23 (ubiquitin-like protein)a Probable mannitol dehydrogenase Alcohol dehydrogenasea Catalasea Fru-1,6-biphosphate aldolase 2-Nitropropane dioxygenase-like proteina Fructokinase Beta-1,3-glucanase Stromal L-ascorbate peroxidase precursor Cytochrome b5 reductasea Glyceraldehyde-3-phosphate dehydrogenase Rubisco, small subunit

Accession Number

Databases

BG582096 BF636747

(E), E/Mt E/Mt, (E)

BI307938 BI309895 AW775879 BI309155 BI311720 BI311720 BI311437 BI310430 AW685677 BI312335 BG454568 BI309500 AL389124 BE997455 BE997455 AL376595 BF648027 BI309824 AL383563 BI311437 BF519894 BF005687 P54774 T06598 CAC14168 Q02028 CAC14168 S57811 Q01899 P49315 CAC67501 CAC67501 Q40401 S60893 CAA75478 AAL18815 AW980652 CAB75428 AAG17666 Q42919 Q96552 P28011 AAK63929 BG648922 AW776305 AW586882 AW981164 P12886 P45739 P46257 BF518520 AW584645 BF650622 BE941206 NP 568391 P54270 PS6577

(E) (E) E/Mt, (E), S (E) (E) (E) (E) (E) E/Mt, (E) (E) (E) (E) E/Mt E/Mt, (E) E/Mt, (E) E/Mt (E) (E), E/Mt E/Mt (E) E/Mt E/Mt S E/Hv, N, S N, S, E/Hv N, S, E/Hv N, S, (E) N, S N, S, E/Mt, (E) N, S, (E) N, (E) N, (E) N, E/Mt E/Mt N, S, E/Mt, (E) N, E/Mt, (E) E/Mt, (E) N, E/Mt N, E/Mt, (E) S N, E/Mt, E/Gm N, S, E/Mt N E/Mt, (E) E/Mt (E) (E) S, (E) S N, S, E/Mt E/Mt, (E) E/Mt, N, S E/Mt E/Mt, (E) E/Mt N, S S

# Peptides/ LC/MS/MS

8 6 7 6 12 9 9 12 6 8 9 10 4 4 4 10 13 5 5 11 5 7 8 6 10 10 8 9 9 4 8 6 12 7 4/(135) NA/(208) 6 5 4 6 5 5 5 17 5 6 6 4 6 5 5 5 7 8 4/(378) 5 NA/(406) 5 4


Table I. Continued from previous page. Tissue

Spot #

Identification

Accession Number

cls cls cls cls cls cls cls cls cls cls cls cls cls cls cls

64 67 74 78 81 83 82b 82b 87 85 88 86 90 92 96

Proteosome subunit α-type 5 (20S subunit) NADH ubiquinone oxidoreductase vcCyp (peptidylprolyl isomerase) Peroxiredoxin TPx1 (thioredoxin peroxidase) Peptidylprolyl isomerase (immunophilin) Disease resistance response proteina aba-Responsive protein Leghemoglobin 2 (Pprg2)a Class 10 PR proteina Cytochrome C-555a Nucleoside diphosphate kinase Gly-rich RNA binding protein Immunophilin Acidic ribosomal protein (60S) 10-kD chaperonina

Q9M4T8 BG448277 AW775250 AW559683 BF635887 BE942549 BF648027 P27993 Q43560 P00124 P47922 AAF06329 AL377066 AL378424 AL377948

a

Putative unique protein identified only in one tissue.

b

E/Mt, N, S (E) E/Mt (E), E/Mt E/Mt, (E) E/Mt, S E/Mt, (E) S N, S, E/Mt S S E/Mt, N E/Mt E/Mt E/Mt

# Peptides/ LC/MS/MS

NA/(449) 4 9 11 5 9 10 4/(680) 5 5 7 9 4 5 4/(121)

Multiple proteins identified in this 2-DE spot.

rate of 55% is good when compared with other reports focused on organisms without sequenced genomes. For example, a recent publication concerning pea (Pisum sativum) chloroplast proteins reported a success rate of 15% using mass spectrometry and Edman sequencing (Peltier et al., 2000), whereas a barrel medic root proteome article reported a success rate of 37% (Mathesius et al., 2001). Our protein identification success rates are approaching those for organisms with sequenced genomes. For example, identification success rates of 54% using MS only (Kruft et al., 2001) and 69% using MS, immunoblotting, and Edman sequencing (Millar et al., 2001) were reported for Arabidopsis mitochondrial proteomes. Further, protein identification success rates in human proteome projects are approximately 60% (Stensballe and Jensen, 2001). We expect protein identification success rates to continue to increase as the population of unique ESTs grows, as full-length EST sequences are generated, and as genomic sequence of barrel medic becomes available (Comment, 2002).

The average length of barrel medic ESTs used to successfully identify proteins in all organs/tissues was 597 ± 177 nucleotides (or 199 ± 59 amino acids). For proteins in the 30-kD range or less, this represents complete or almost complete sequence coverage by the EST; thus, our confidence in these identifications is very high. For larger proteins, this represents only partial protein sequence; however, our data demonstrate that the current EST information is sufficient to allow confident identifications. Additional experimental data such as number of peptides matched, m/z accuracy, molecular mass, and pI provide further confirmation of identification.

It is logical that a strategy including both protein and nucleotide databases would yield greater protein identification rates because some mRNAs, such as mitochondrial and chloroplast-encoded mRNAs (e.g. Rubisco large subunit), do not contain poly(A+) tails (Sugiura and Takeda, 2000). These poly(A+) tails are used in the initial stages of affinity purification of mRNAs in the cDNA/EST library generation process (Sambrook et al., 1989). Messenger RNAs without poly(A+) tails are not retained during affinity purification and are unlikely to be sequenced. The corresponding proteins are poorly represented in the EST libraries but are present in many of the protein databases. Therefore, querying both provides greater identification success rates.
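As a rough check on the EST coverage argument in this section, the following Python sketch converts an EST length into the maximum number of encoded residues and an approximate polypeptide mass. The average residue mass of 110 Da is a common rule of thumb and is an assumption here, not a value taken from this study; the function names are illustrative only.

```python
# Rough estimate of how much of a protein an average EST can cover.
# ASSUMPTION: ~110 Da average residue mass (rule of thumb, not from the paper).
AVG_RESIDUE_MASS_DA = 110.0


def est_to_residues(nucleotides: float) -> float:
    """Upper bound on amino acids encoded by an EST (three nucleotides per codon)."""
    return nucleotides / 3.0


def residues_to_kd(residues: float) -> float:
    """Approximate polypeptide mass in kD for a given number of residues."""
    return residues * AVG_RESIDUE_MASS_DA / 1000.0


def coverage_fraction(est_nucleotides: float, protein_kd: float) -> float:
    """Fraction of a protein of the given mass that one EST could span (capped at 1)."""
    protein_residues = protein_kd * 1000.0 / AVG_RESIDUE_MASS_DA
    return min(1.0, est_to_residues(est_nucleotides) / protein_residues)


mean_nt, sd_nt = 597, 177  # average EST length and SD reported in the text
for label, nt in (("mean", mean_nt), ("mean + 1 SD", mean_nt + sd_nt)):
    residues = est_to_residues(nt)
    print(
        f"{label}: {nt} nt -> {residues:.0f} residues "
        f"(~{residues_to_kd(residues):.1f} kD), "
        f"covers {coverage_fraction(nt, 30.0):.0%} of a 30-kD protein"
    )
```

Under this assumption, an EST of average length encodes roughly 22 kD of polypeptide, so the estimate treats the whole EST as coding sequence and should be read as an upper bound.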

Protein Identifications and Functional Classifications

Putative protein functional classifications were assigned based on similarity to better understand the biological processes encompassed by the proteins identified using a 2-DE proteomics approach. Summaries of protein functions observed in the barrel medic proteome are provided in Figure 4. Protein functions were assigned using the protein function database Pfam (http://www.sanger.ac.uk/Software/Pfam/; Bateman et al., 2002) or InterPro (http://www.ebi.ac.uk/interpro/; Apweiler et al., 2001). Protein function was categorized into 13 classes as previously described for Arabidopsis (Bevan et al., 1998). The “unclear” protein class included proteins that were successfully matched to putative proteins from such sources as the Arabidopsis genomic sequence but do not yet have a known function. Most proteins could be unambiguously classified; however, a small number of proteins were associated with multiple functions. Classifications for these proteins were based on their predominant function. A portion of the proteins observed and their functional roles are discussed below in relation to the tissue in which they were observed.

Watson et al.

Figure 3. Representative LC/MS/MS data obtained on an ABI Qstar Pulsar for leaves (spot no. 51) confirming the identification of this protein as a TPR repeat protein (accession no. AW694998) as suggested by MALDI-TOFMS peptide mass fingerprinting. The data include: A, database search score and peptides successfully identified; B, example TOF/MS; and C, tandem TOF/MS/MS mass spectra for the peptide observed at m/z 677.62.


Table II. Summary of protein identification success rates
Success rates are reported as total no. of proteins identified and as a percentage of those identified relative to those processed in parentheses.

Tissue | Protein Databases | EST Databases | Overlap | Total
Leaves | 37 (44%) | 42 (50%) | 15 (18%) | 64/84 (76%)
Stems | 28 (30%) | 33 (38%) | 15 (16%) | 46/94 (49%)
Roots | 12 (13%) | 40 (43%) | 12 (13%) | 40/94 (43%)
Flowers | 16 (17%) | 40 (43%) | 13 (14%) | 43/94 (46%)
Pods | 9 (10%) | 59 (65%) | 7 (8%) | 61/91 (67%)
Cells | 33 (35%) | 40 (40%) | 23 (24%) | 50/94 (53%)
Total | 135/551 (25%) | 254/551 (46%) | 85/551 (15%) | 304/551 (55%)
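The totals in Table II appear to follow from simple inclusion-exclusion: proteins found in the protein databases plus proteins found in the EST databases, minus those found in both. The short Python sketch below checks this against the counts transcribed from the table; the variable and function names are illustrative and not part of the original analysis.

```python
# Counts transcribed from Table II:
# tissue -> (protein-database hits, EST-database hits, overlap, reported total, spots processed)
TABLE_II = {
    "Leaves": (37, 42, 15, 64, 84),
    "Stems": (28, 33, 15, 46, 94),
    "Roots": (12, 40, 12, 40, 94),
    "Flowers": (16, 40, 13, 43, 94),
    "Pods": (9, 59, 7, 61, 91),
    "Cells": (33, 40, 23, 50, 94),
}


def combined_identifications(protein_hits: int, est_hits: int, overlap: int) -> int:
    """Number of distinct proteins identified by either database type."""
    return protein_hits + est_hits - overlap


grand_total = 0
spots_processed = 0
for tissue, (protein_hits, est_hits, overlap, reported, processed) in TABLE_II.items():
    combined = combined_identifications(protein_hits, est_hits, overlap)
    grand_total += combined
    spots_processed += processed
    print(
        f"{tissue}: {protein_hits} + {est_hits} - {overlap} = {combined} "
        f"(reported {reported}), {combined / processed:.0%} of {processed} spots"
    )

print(f"All tissues: {grand_total}/{spots_processed} = {grand_total / spots_processed:.0%}")
```

The per-tissue sums reproduce the reported totals, which illustrates how much of the combined success rate comes from identifications made through only one of the two database types.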

Leaves

Photosynthetic enzymes dominated the 2-DE profiles of leaf tissue. Approximately 40% of the leaf protein mass visualized with Coomassie staining can be attributed to a small number of enzymes including the large subunit of Rubisco (26.1%), Rubisco small subunit (2.8%), Rubisco activase (3.2%), and oxygen-evolving protein (6.4%). Most of these proteins appear as multiple spots, and the reported percentages are estimates including all identified spots. The relatively high concentrations of the abundant photosynthetic enzymes demonstrate the importance of these enzymes; however, the prominence of these proteins, specifically Rubisco, in specific regions of the gel generally contributes to lower quality 2-DE gels and prevents the observation of moderate or lower abundance proteins due to their relatively lower concentrations and the limited dynamic range of common 2-DE staining techniques including Coomassie. Other proteins involved in photosynthesis and carbon fixation were observed in leaf, including PSI iron-sulfur protein, ATP synthase, glyceraldehyde 3-phosphate dehydrogenase, malate dehydrogenase, triose phosphate isomerase, tartrate dehydrogenase, and Fru bisphosphate aldolase. Many of these photosynthetic enzymes were also observed at lower levels in other green tissues such as stems and immature seed pods.

Figure 4. Summary of the distribution of tissue-specific identified protein classes as determined using the protein function database Pfam (http://www.sanger.ac.uk/Software/Pfam/) and classification schema previously reported for Arabidopsis (Bevan et al., 1998).

Several signal transduction proteins were observed in leaves, including the multiple domain protein remorin. Remorin binds simple and complex galacturonides, and its C-terminal region has functional similarities to viral intercellular communication proteins (Reymond et al., 1996). Other proteins involved in protein destination or transport included chaperonin 21 precursor, an ankyrin repeat protein, and an ATP-binding cassette transporter. Ankyrin repeat proteins have been associated with protein-protein interaction (Gorina and Pavletich, 1996), transcriptional regulation (Batchelor et al., 1998), and transcription inhibition (Jacobs and Harrison, 1998). ATP-binding cassette transporters are membrane-localized proteins that transport small hydrophilic molecules across membranes and include an ATP-binding domain (Higgins, 1992; Jasiński et al., 2001). Interestingly, other membrane-localized proteins were identified and included a chloroplast membrane-associated 30-kD protein (Li et al., 1994) and ATP synthase. The identifications of membrane proteins are important because these proteins are generally underrepresented in 2-DE proteomic studies due to low solubility (Molloy et al., 1998). The observation of plant proteins in 2-DE relative to their general average hydropathicity score has been discussed recently (Millar et al., 2001). Additional proteins identified in leaf tissues included: two cell division proteins, filamentous temperature-sensitive protein K homolog cell division protein, mitotic cyclin B1-1, DNA mismatch repair protein, RNA-binding protein, transcription factor, and a Gly-rich cell wall structural protein.

Stems

The 2-DE reference map of barrel medic stem proteins was of better quality than that of leaves, primarily due to a lower abundance of Rubisco. Many of the same photosynthetic and carbon metabolism enzymes reported above for leaf were also identified in stems. In addition, several members of the ATP complex associated with energy metabolism were observed. Proteins involved in protein destination and storage were also identified and included the 26S proteasome AAA-ATPase subunit and a 20S proteasome subunit alpha type 7 protein. The 26S proteasome is responsible for the degradation of endogenous proteins. Proteins involved in secondary metabolism are of specific interest to our functional genomics project focused on natural products (National Science Foundation Plant Genome Research Project no. 0109732). Several secondary metabolic enzymes were identified in stems and included cinnamoyl-CoA reductase, which plays a role in lignin biosynthesis, and isoflavone reductase-like oxidoreductase, an enzyme involved in phytoalexin production. Stems also revealed several kinases including adenosine kinase, fructokinase, Rib-phosphate pyrophosphokinase, uridylate monophosphate kinase, and nucleoside diphosphate kinase 1. A number of RNA-binding proteins thought to be important in transcription were also observed. Multiple ribosomal proteins that function in protein synthesis, including 40S and 60S ribosomal proteins, were identified.

Roots

The roots of legumes are of special interest because of their role in the characteristic symbiotic relationships formed with microorganisms. Although recent articles have been published on the proteomes of barrel medic nodulated root (Bestel-Corre et al., 2002) and uninoculated root (Mathesius et al., 2001), we have included roots as part of our survey for completeness and comparison. Approximately 24% of root proteins identified in this report were associated with plant disease/defense and included peroxidases, superoxide dismutases, ripening related protein, abscisic acid (ABA)-responsive protein, and chitinase. Peroxidases are generally involved in hydrogen peroxide detoxification and are induced by bacterial infection (Cook et al., 1995; Peng et al., 1996). Peroxidases also play a major role in lignin biosynthesis (Lewis and Yamamoto, 1990; Davin and Lewis, 1992). Several glucanases were also identified. These normally constitutively expressed proteins are induced in response to fungal and viral elicitation (Meins et al., 1992). Proteins involved in secondary metabolism of the flavonoid/isoflavonoid pathway made up another 8% of the identified root proteins. Similar to leaves, several membrane-localized proteins such as ATPase and cytochrome C oxidase were also observed in roots. Relative to other tissues, a larger percentage (i.e. 15%) of the barrel medic root proteins were identified as putative proteins or unannotated proteins. These proteins could be confidently linked to specific ESTs or predicted open reading frames whose functions are still unknown. The observation of unannotated proteins provides experimental evidence of putative/predicted proteins that offer exceptional opportunities in gene annotation (Mann and Pandey, 2001). Because roots appear to have the largest percentage of proteins of unknown function, it is possible that many of these proteins may be specific to legumes and may be involved in microbial interactions characteristic of legumes. 
The root 2-DE reference map and protein identifications reported here are consistent with the previous studies by Mathesius et al. (2001) in young, uninoculated barrel medic roots, and by Bestel-Corre et al. (2002) using roots inoculated with Glomus mosseae or Sinorhizobium meliloti. Similar to our results listed above, Mathesius and coworkers reported 5% of their identified root proteins to be associated with flavonoid metabolism and 18% with defense and stress response, yielding a total of 23% defense-related proteins. Further, the total overlap in identified root proteins between the current study and the detailed report by Mathesius and coworkers was over 50%. These included heat shock 70 protein, protein disulfide isomerase, glyceraldehyde-3-phosphate dehydrogenase, isoflavone reductase and chalcone isomerase, a glucosidase and a Cys proteinase, ascorbate peroxidase, alpha-fucosidase, and a ripening-related protein. Many of these proteins had very similar molecular mass and pI values in both studies. For example, cytochrome c oxidase was reported to have a gel molecular mass/pI of 37 kD/4.2 by Mathesius and coworkers, whereas it was observed at a molecular mass/pI of 36 kD/4.9 in the present study. Similarly, ripening-related protein had an experimental molecular mass/pI of 16 kD/5.8 in this study and 18 kD/5.5 or 17 kD/6.2 (isoforms) in the Mathesius et al. work. Interestingly, the values for some proteins varied slightly between studies. For example, VcCyp was observed at a gel molecular mass/pI of 22 kD/6.3 in the current study as opposed to a molecular mass/pI of 20 kD/8.8 in Mathesius and coworkers. These slight inconsistencies may represent real differences in posttranslational modifications of the proteins or may be the result of experimental variability. Proteins identified in all three investigations include a peroxidase precursor, cytochrome c oxidase subunit 6, VcCyp (cyclophilin), a superoxide dismutase, and ABA-responsive protein. Only the ABA-responsive protein and VcCyp were reported to be constitutively expressed by Bestel-Corre et al. (2002), whereas the other proteins common to all three investigations were identified by them as symbiosis-related proteins. Bestel-Corre et al. also identified and reported profucosidase as a symbiosis-related protein. This protein was identified in the current report using uninoculated roots.
Interestingly, two proteins identified in this investigation were not found in either of the other two studies. Acidic glucanase was observed as a relatively abundant protein in the present report (rts#39), but due to its pI of 8.4 and the fact that Mathesius and coworkers’ first dimension immobilized pH gradient (IPG) pH range was 4 to 7, it was not present on their gels. We also identified three isoforms of chitinase, all with a pI above 7, that are missing in the Mathesius et al. work. Bestel-Corre et al. (2002) used a pH 3 to 10 first dimension IPG; thus, these proteins should be visible in their gels. Unfortunately, the total number of identified proteins in the Bestel-Corre report was limited, and these proteins were not identified by them. Overall, these three reports (this report; Mathesius et al., 2001; Bestel-Corre et al., 2002) provide a wealth of information on the barrel medic root proteome. There are significant similarities between the reference maps that serve as landmarks and can be used for navigation through the root proteome. For example, ABA-responsive protein is one of the most abundant root proteins in each of these investigations. Its relative position can be used to locate PR10 (a highly abundant low-molecular mass protein reported by Mathesius and coworkers next to ABA-responsive protein, rts#80, that was not identified in the present study) in the present and other studies based on similarity. Unfortunately, absolute comparisons of the proteome reference maps are not always straightforward, as demonstrated by the differences in molecular mass and pI values shown above for VcCyp.

Flowers

The proteome of flowers contained proteins from almost every functional category. The major portion (38%) of the identified proteins was associated with energy production including glycolysis, pyruvate metabolism, and the tricarboxylic acid (TCA) cycle. Another 21% of the identified proteins were involved with protein synthesis or protein destination. For example, peptidyl prolyl isomerase accelerates protein folding by catalyzing cis-trans isomerization in oligopeptides. Several proteins identified were related to disease/defense or involved in secondary metabolism, such as chalcone isomerase. These enzymes are commonly associated with flower pigmentation or UV protection and serve as important defense proteins in developing seeds. One of the proteins identified specifically in the flower proteome was profilin. Profilin normally binds to monomeric actin to prevent polymerization, although under certain conditions it can promote the polymerization of actin. It occurs in all organs, but is most abundant in mature pollen, making it more likely to be identified in flowers. Many proteins associated with oxidative responses were also identified in flowers. Low levels of a few photosynthetic enzymes were observed due to collection of green sepals with the flowers.

Seed Pods

The intact seed pod proteome was generated from tissue containing both seed and pod tissue. The proteins visualized and identified in the barrel medic seed pod proteome consisted primarily of globulins or seed storage proteins that serve as a nitrogen/nutritional source for developing plants. Several members of the superfamily of “cupins” were identified in barrel medic seed and included 7S and 11S globulins (Dunwell, 1998). The 11S globulins are nonglycosylated proteins and include glycinin and legumin (Hayashi et al., 1988; Duranti et al., 1995). The 7S proteins are a series of similar but progressively larger variations of the same subunit and include vicilin, convicilin, and legumin. It is also interesting to note that 85% of the proteins in this group have been matched to other legumes, suggesting a high level of sequence similarity in legume storage proteins. All of the barrel medic seed storage proteins were observed at multiple molecular masses and pIs. These may represent various stages of protein synthesis and degradation, posttranslational processing not observable at the genome or transcriptome level, or may be the products of multigene families. Similar variations in observed isoforms have been reported for Arabidopsis 12S seed storage proteins in mature and developing seeds (Gallardo et al., 2001). A significant number of disease-/defense-related proteins were observed in seed pods including peroxidases, osmotin, and ABA-responsive protein. These proteins help defend the plant in early stages of development. Other proteins associated with carbon metabolism, nutrient acquisition, and protein synthesis were also observed. These proteins supply necessary nutrients to the developing plant. Several photosynthetic proteins were observed and are attributed to the collection of immature green seed pods.

Cell Suspension Cultures

Cell suspension cultures were initiated from barrel medic root calli (Dixon, 1980) and their proteome surveyed. Cell culture proteins were extracted with a Tris buffer and, thus, consisted primarily of cytosolic proteins. Most of the identified proteins from cell cultures could be classified in four categories: energy (24%), protein destination and storage (24%), metabolism (22%), and disease/defense (18%). The defense proteins were primarily composed of pathogenesis-related proteins. The most abundant proteins identified were an ABA-responsive protein and a class 10 PR protein. Other disease/defense proteins identified included selenium-binding protein, catalase, and peroxiredoxin. Several of the metabolic enzymes identified in cells were not identified in any other tissue. One of these, 12-oxophytodienoate reductase, is associated with the conversion of 12-oxophytodienoic acid to jasmonic acid. In some instances, more than one protein was identified with high confidence in a single protein spot. For example, spot cls#82 contained peptides that could be associated with both ABA-responsive protein and leghemoglobin. Interestingly, leghemoglobin was identified as a root nodule-specific isoform (Gallusci et al., 1991). This protein is root specific and is induced during nodulation; however, it is generally not observed at appreciable levels in uninoculated roots. Thus, the observation of leghemoglobin is unique here, and this protein may be induced by the cell culturing process. Further, it may also suggest a “memory” effect or root-specific expression pattern observed in the cell cultures that were originally generated from root material (Dixon, 1980). Although many flavonoid-related proteins were observed in other tissues such as root and stem, none were identified in the limited set of proteins surveyed in unchallenged cell cultures. The proteome of suspension cell cultures is of special interest because the tissue is relatively homogeneous and, therefore, provides a good model tissue system for experiments directed toward integrated functional genomic studies of natural products (https://www.fastlane.nsf.gov/servlet/showaward?award=0109732). Future work will focus on generation of an extensive 2-DE proteome reference map of suspension cell cultures and the changes in the proteome after biotic and abiotic elicitation.

Tissue-/Organ-Specific Expression of Proteins

Many of the proteins identified were redundant, as an average of 61% were identified in more than one tissue of barrel medic. The remaining 39% were identified in only one tissue and have the potential of being uniquely expressed in specific tissues/organs based on our limited dataset. The quantities of redundant and potentially unique proteins identified in each specific tissue are summarized in Figure 5. Many of the putative unique proteins are related to the primary function of the specific tissue. For example, photosynthetic enzymes such as PSI iron-sulfur protein and plastid-specific ribosomal proteins were only identified in leaves. Other proteins identified only in a specific tissue include the seed storage proteins glycinin, convicilin, and legumin in seed pods. Profilin, a known pollen allergen, was identified only in flowers. These are limited examples illustrating the unique nature of the proteome, but we are hopeful that continued evaluation of the tissue- and organelle-specific proteomes of barrel medic will yield further insight into the specialized functionality of these tissues.

Figure 5. Bar graph summarizing the number of redundant proteins identified in more than one tissue (A) and the number of putative tissue-specific proteins identified in a single tissue only (B). The graph is segregated by tissue. A total of 61% of the proteins were found to be redundant and 39% were found to be putatively tissue specific. Guarantee of specificity at this stage is difficult due to the limited size of the reported protein dataset relative to the total proteome.

Comparison of Barrel Medic Proteome and Transcriptome

A better understanding of the relationship between mRNA and protein abundances is needed to elucidate the processes and regulation of transcription and translation. Several recent publications present conflicting views concerning the correlation of mRNA and protein levels. Gygi et al. (1999) suggested that there is a poor correlation between most yeast mRNAs and protein levels with the exception of only the most abundant proteins. In contrast, Futcher et al. (1999) reported a good correlation between yeast mRNA abundances, measured by both SAGE and microarray chips, and protein abundances. Given the large abundance of EST information for barrel medic (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi/), a simple comparison of identified protein levels with their corresponding mRNA levels was performed. Currently, over 145,000 EST sequences from approximately 20 different non-subtractive, non-normalized (J. White, TIGR, personal communication) cDNA libraries are available (Covitz et al., 1998; Cook, 1999; Bell et al., 2000; Gyorgyey et al., 2000). It is possible that a select few sequences from these libraries are being held back by the contributors, but these are few and specialized, and should have a minimal effect on the following comparisons. The cDNA libraries were used to estimate or “count” the relative expression level of a particular barrel medic transcript based on the repetitive occurrence of sequences from the same mRNA (Audic and Claverie, 1997; Ewing et al., 1999). The relative abundances of the top 200 ESTs for barrel medic leaves, stems, uninoculated roots, flowers, seed pods, and elicited cell cultures were quantified in this manner and are provided in Supplemental Table II (see www.plantphysiol.org). The relative abundances for the ESTs were generated using cDNA libraries originating from similar tissues; however, these tissues were from multiple and separate origins. Comparisons were based on functional annotation and not necessarily on specific protein or GenBank numbers, i.e. oxygen-evolving protein as opposed to P14226. Although this comparison is not of high analytical rigor, it does provide insight into correlation of protein and mRNA levels. Although the proteins were arbitrarily chosen across pI and molecular mass ranges, most represent relatively abundant proteins typical of 2-DE and CBB 250 staining. Based on the 2-DE protein quantification results presented here, 67% of the identified proteins were in the top 100 most abundant proteins visualized with Coomassie, whereas 97% of the proteins identified were in the top 200 most abundant proteins. Thus, identified proteins were compared with the top 200 most abundant tissue-specific ESTs in related cDNA libraries. The percentages of the identified proteins observed by 2-DE that were also observed in the top tissue-specific ESTs are summarized in Table III. This summary reveals that an average of 50% of the identified proteins were observed in the top 200 tissue-specific ESTs.
An evaluation of the top 100 tissue-specific ESTs shows that 40% of proteins identified in 2-DE experiments were also observed in the 100 most abundant tissue-specific ESTs. These results suggest a moderate level of correlation between mRNA and protein.

Table III. Summary of the correlated protein and EST libraries
Ninety-seven percent of all identified proteins were quantified as being in the top 200 most abundant proteins observed in Coomassie-stained 2-DE gels. The occurrence of these identified proteins in the top 100 and 200 ESTs is reported. The no. of EST sequences used for EST counting is listed in parentheses after each tissue identifier.

Tissue | No. of Proteins Matched in Top 100 ESTs/No. of Identified Proteins | No. of Proteins Matched in Top 200 ESTs/No. of Identified Proteins
Leaves (7,831 ESTs) | 21/64 (33%) | 30/64 (47%)
Stems (10,314 ESTs) | 12/46 (26%) | 16/46 (35%)
Roots (6,593 ESTs) | 16/40 (40%) | 19/40 (48%)
Flowers (3,404 ESTs) | 16/43 (37%) | 19/43 (44%)
Pods (4,587 ESTs) | 45/61 (74%) | 48/61 (79%)
Suspension cells (8,926 ESTs) | 12/50 (24%) | 19/50 (38%)
Total (41,655 ESTs) | 122/304 (40%) | 151/304 (50%)

For example, leaf proteins such as the photosynthetic enzymes Rubisco small subunit and oxygen-evolving protein appear to be highly correlated with their respective mRNA levels. Interestingly, some highly expressed proteins such as Rubisco large subunit were not observed in the EST libraries. As mentioned earlier, we believe that this is due to the chloroplast-encoded nature of certain mRNAs, such as Rubisco large subunit, which do not contain poly(A+) tails necessary for purification and cDNA library preparation (Sambrook et al., 1989). Highly abundant leaf ESTs not represented in the protein data to date included aquaporins, chlorophyll-binding proteins, and cytochrome B6. This apparent lack of correlation can be explained by the integral thylakoid membrane nature of these proteins. It is commonly accepted that integral membrane proteins are underrepresented in 2-DE due to poor solubilization. Lipoxygenase also appeared in the top 100 clones of five tissue-specific EST libraries; however, it was never identified in the protein dataset. Plants express both cytosolic and chloroplast isoforms of lipoxygenase, most of which have a molecular mass of approximately 100 kD. A possible explanation for the absence of this protein from the protein data could be the inherent discrimination against high-molecular mass proteins encountered during isoelectric focusing using IPG strips of fixed gel composition (Candiano et al., 2002). The lack of correlation between mRNA and protein could not always be explained. For example, identified stem proteins included acid phosphatase, actin, and osmotin; however, these proteins were absent or of very low abundance in the stem-specific EST library. Other proteins identified but not represented in the EST libraries included: RNA-binding protein and ankyrin repeat protein in flowers and hydroxyacyl glutathione hydrolase in roots.
Interestingly, elongation factor 1-alpha was observed as a highly expressed EST (top 50) in all tissues but was not observed in the protein set. The lack of correlation may be due to the relative turnover rates of both transcripts and proteins, or translational controls such as codon bias (Gygi et al., 1999), mRNA secondary structure (Wang and Wessler, 2001), or upstream open reading frame repression (Wang and Wessler, 1998). Based on the limited comparison above, we estimate a moderate 50% correlation between protein and mRNA levels. This value suggests a correlation that is higher than that reported by Gygi et al. (1999) but lower than that reported by Futcher et al. (1999). If the limitations imposed by the chloroplast-encoded proteins, poor representation of membrane proteins in 2-DE, and our limited protein dataset are taken into account, a higher correlation than that reported may be possible. Although a significant level of correlation is perceived, there are still many specific examples that show poor correlation.
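The EST-counting comparison described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used in the study: each EST is assumed to carry a single functional annotation string, matching is done on that annotation (as the text describes), and the toy library, protein set, and function names below are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set


def top_annotations(est_annotations: Iterable[str], n: int) -> List[str]:
    """Rank annotations by how many ESTs carry them and return the n most frequent."""
    counts = Counter(est_annotations)
    return [annotation for annotation, _ in counts.most_common(n)]


def fraction_in_top(identified_proteins: Set[str], est_annotations: Iterable[str], n: int) -> float:
    """Fraction of identified proteins whose annotation is among the top-n EST annotations."""
    if not identified_proteins:
        return 0.0
    top = set(top_annotations(est_annotations, n))
    return len(identified_proteins & top) / len(identified_proteins)


# HYPOTHETICAL toy library: one functional annotation per sequenced EST.
leaf_ests = (
    ["Rubisco small subunit"] * 40
    + ["chlorophyll-binding protein"] * 30
    + ["oxygen-evolving protein"] * 20
    + ["aquaporin"] * 8
    + ["remorin"] * 2
)

# HYPOTHETICAL set of proteins identified on the matching 2-DE gel.
leaf_proteins = {
    "Rubisco small subunit",
    "oxygen-evolving protein",
    "Rubisco large subunit",  # chloroplast-encoded, so absent from the toy EST library
    "remorin",
}

print(top_annotations(leaf_ests, 3))
print(fraction_in_top(leaf_proteins, leaf_ests, 3))
```

In this toy case, Rubisco large subunit never matches because it has no ESTs, and the membrane proteins rank highly among ESTs without appearing in the protein set, mirroring the two sources of disagreement discussed above.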

CONCLUSIONS

To date, we have identified over 300 proteins in specific tissues of barrel medic. Protein identifications using only protein databases were 25% successful even with good peptide mass fingerprints. Significant increases in protein identification success rates were achieved by using EST sequence databases. Using complementary protein, nucleotide, and EST sequence libraries, we were able to achieve a protein identification success rate of 55% for our representative protein dataset. We consider this a relatively high success rate in the absence of a genomic sequence and in comparison with other plant proteomic projects. Tentative consensus searches currently are being performed and confirm many of the proposed identifications in this study (Asirvatham et al., 2002b); however, this topic will be discussed in a separate publication. The 2-DE profiles of various barrel medic tissues provide reference maps for future proteomic comparisons of genetic mutants, biotically and abiotically challenged plants, and/or environmentally challenged plants. The identified proteins provide a survey of those proteins observable using current technology and also serve to define the limitations of the reported proteomics approach. For example, it will be difficult to study other physiological processes besides photosynthesis and carbon metabolism in leaves using current proteomic technologies due to the very high level of these proteins in leaves. Further, the proteins identified serve as physiological markers of tissue-specific protein expression. Based on the limited dataset, 39% of all the identified proteins were only identified in a single tissue. These putative unique proteins provide valuable insight into the specialized physiological function of each of the tissues. 
For example, a comparison of roots and root-derived cell cultures can yield insights into the physiological phenomena associated with the dedifferentiation of root tissue during establishment of a suspension cell culture. A comparison between the levels of the identified proteins and mRNA levels quantified through EST counting was performed. It is estimated that on average 50% of the proteins appear to be correlated with their corresponding mRNA levels; conversely, 50% are not. Information on both transcript and protein levels can be utilized for targeting potential regulatory genes that are characterized by high transcript but low protein levels. The proteins identified in this study as unclear or putative represent unique opportunities to probe molecular function. Systematic perturbations and monitoring of these proteins would be expected to yield insight into function. These abundant but unclassified proteins have been linked to specific ESTs and, thus, establish the feasibility of experimentally monitoring both the protein and its mRNA. The relatively high abundance of these proteins further stresses their biological, albeit still unknown, importance in barrel medic. This report provides a comprehensive overview of the barrel medic proteome and provides a good foundation for future comparative proteomic efforts associated with this important model plant. The importance of barrel medic is further emphasized by the recent recommendation from the National Academy of Sciences that the goals of the National Plant Genome Initiative for 2003 through 2008 should focus on a small number of key species including barrel medic (http://books.nap.edu/books/0309085292/html/index.html). This work serves as a major step in this direction for a key plant species. As we seek to better understand gene function and to study the holistic biology of systems, it is inevitable that we study the proteome.

MATERIALS AND METHODS

Plant Material and Protein Extraction

Differentiated plant tissues were collected from barrel medic (Medicago truncatula cv Jemalong A17) grown in an environmentally controlled growth chamber and maintained under standard conditions (Asirvatham et al., 2002a). Eight-week-old plants were used for leaf and stem tissue. The top two apical unfolded trifoliates were sampled for leaf tissue, and stem tissue was restricted to the first two apical internodes. Flowers included all stages from buds until petal browning and all parts except the peduncles. Green seed pods were collected from a variety of developmental stages (including very young pods to those with maturing seeds) of 3-month-old plants. Roots were collected from seedlings grown in perlite 2 weeks after planting. Total protein from these tissues was extracted according to a reported method (Tsugita et al., 1994).
In brief, tissues (0.4–1.0 g) were ground in liquid N2 and proteins precipitated at −20°C with 10% (w/v) TCA in acetone containing 0.07% (w/v) 2-mercaptoethanol for at least 45 min. The mixture was centrifuged at 35,000g at 4°C for 15 min, and the precipitates were washed with acetone containing 0.07% (w/v) 2-mercaptoethanol, 1 mM phenylmethylsulfonyl fluoride, and 2 mM EDTA. Pellets were dried by vacuum centrifugation and solubilized in 8 M urea, 4% (w/v) CHAPS, 20 mM DTT, 0.1% (v/v) Biolytes (pH 3–10; Bio-Rad Laboratories, Hercules, CA; Molloy et al., 1998). Cell cultures derived from barrel medic cv Jemalong A17 roots were grown in the dark in shaker flasks and suspended in Schenk and Hildebrandt (SH) medium with transfer to fresh medium every 2 weeks. Cells were harvested 4 d after transfer, washed once with fresh SH medium and once with SH:water (1:1 [v/v]), ground in liquid N2, and extracted with 40 mM Tris (pH 9.5), 50 mM MgCl2, 2% (w/v) polyvinylpolypyrrolidone, 1 mM phenylmethylsulfonyl fluoride, and 120 units mL⁻¹ endonuclease (catalogue no. E8263, Sigma, St. Louis) by sonication (Molloy et al., 1998). After centrifuging at 12,000g, 4°C, for 10 min, proteins in the supernatant were precipitated on ice with 12% (w/v) TCA, centrifuged, and washed with cold acetone. The pellet was air dried and resuspended in solubilization buffer.

Protein Quantification and Electrophoresis

Protein concentrations of all tissue extracts were quantified using the Bradford method (Bradford, 1976) and a commercial dye reagent (Bio-Rad) with bovine serum albumin as a standard. Eleven-centimeter immobilized pH gradient (IPG) strips (linear, pH 3–10) from Bio-Rad were rehydrated at 20°C with 0.75 to 1.0 mg of protein in 300 μL for 15 to 16 h. Focusing was carried out in a Bio-Rad Protean IEF Cell for a total of 35,000 volt hours. After focusing, strips were equilibrated with reduction and then with alkylation buffers, loaded onto a 12% (w/v) acrylamide gel, and run at 25 mA gel⁻¹ (Asirvatham et al., 2002a). Gels were stained overnight with Coomassie Brilliant Blue R-250 and destained the next day. Gel images were digitized with a Bio-Rad FluorS equipped with a 12-bit camera. Experimental molecular mass and pI were calculated from digitized 2-DE images using standard molecular mass marker proteins and the linear calibration option of Genomic Solutions HT Analyzer software (Genomic Solutions, Ann Arbor, MI).

Digestions and MALDI-TOFMS

Protein spots were excised from the gel, washed twice with water for 15 min, and destained with a 1:1 (v/v) solution of acetonitrile and 50 mM ammonium bicarbonate while changing solutions every 30 min until the blue color of Coomassie was removed. 2-DE gel spots were then dehydrated by washing twice with 100% acetonitrile and dried by vacuum centrifugation. Gel plugs were rehydrated with a solution of 10 ng μL⁻¹ bovine trypsin (Roche) in 25 mM ammonium bicarbonate and digested for 4 to 6 h at 37°C. The enzymatic digestions were stopped with the addition of 10% (v/v) formic acid, and the supernatant was saved. Gel plugs were extracted once with 25 μL acetonitrile:water (1:1 [v/v]) and once with 25 μL of 100% (w/v) acetonitrile. Supernatants were combined and taken to dryness. Peptides were resuspended in 2% (w/v) formic acid:acetonitrile (1:1 [w/v]), mixed 1:1 with matrix (10 mg mL⁻¹ α-cyano-4-hydroxycinnamic acid in same solvent), and spotted for MALDI-TOFMS. Mass spectra were obtained with a PerSeptive Biosystems DE-STR at an instrument resolution exceeding 10,000 and internally mass calibrated by matching to at least one and often more autolytic trypsin peaks (906.5049, 1153.5741, 2163.0570, and 2273.1602). Database search results were reprocessed with a reiterative search algorithm (Intellical, XXXX, XX) at 20 ppm that recalibrates m/z based on the best hit. Intellical software is part of the ABI Proteomics Solutions 1 software. If the best match is a real match, the identification confidence score will increase after reiterative calibration. If the best match is a false positive, the score will generally decline. The process was especially useful when trypsin autolytic peaks were of low abundance or absent. Resultant peptide mass fingerprints were assigned an arbitrary quality score (PMFQ) to quantify the quality of the peptide fingerprint and are reported in Supplemental Table I.
The PMFQ scores were assigned based on the relative number of analyte peptides observed and their relative intensities as compared with the most abundant trypsin autolytic peptide peaks (m/z 2,163 and 2,273). If no peptides were observed or if analyte peptides were less than 10% of the trypsin autolytic peaks, a PMFQ value of 0 was assigned. If fewer than five peptides with relative intensities less than the trypsin peaks were observed, then a PMFQ of 1 was assigned. If five or more analyte peptides with intensities approximately equal to the trypsin autolytic peaks were observed, then a PMFQ value of 3 was assigned. If significantly more peptides were observed with a relative intensity greater than the trypsin autolytic peaks (but with trypsin peaks still >10% for internal m/z calibration), then a PMFQ value of 4 (approximately 10 peptides) or 5 (>10 peptides) was assigned. Both MALDI-TOFMS peptide fingerprints illustrated in Figure 2 have a PMFQ of 5.
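The PMFQ rubric described above can be written as a small decision function. The following Python sketch is illustrative only and is not the procedure used to generate Supplemental Table I. The function name, the tolerance used for "approximately equal" intensities (EQUAL_TOL), and the lower bound used for "approximately 10 peptides" (NEAR_TEN_MIN) are assumptions introduced here. Because the text does not define a PMFQ of 2, combinations not covered by the stated rules return None rather than a guessed value.

```python
from typing import Optional

# Assumed tolerance for "approximately equal" to the trypsin peaks (hypothetical).
EQUAL_TOL = 0.25
# Assumed lower bound for "approximately 10 peptides" (hypothetical).
NEAR_TEN_MIN = 8


def pmfq_score(n_peptides: int, rel_intensity: float) -> Optional[int]:
    """Return a PMFQ value following the rules stated in the text.

    rel_intensity is the analyte peptide intensity divided by the intensity
    of the most abundant trypsin autolytic peak (1.0 means equal).
    Returns None when the stated rules do not cover the combination.
    """
    # PMFQ 0: no peptides, or analyte peaks below 10% of the trypsin peaks.
    if n_peptides == 0 or rel_intensity < 0.10:
        return 0
    # PMFQ 1: fewer than five peptides, all weaker than the trypsin peaks.
    if n_peptides < 5 and rel_intensity < 1.0:
        return 1
    # PMFQ 4 or 5: analyte stronger than trypsin, while the trypsin peaks
    # stay above 10% of the analyte (rel_intensity < 10) for calibration.
    if 1.0 + EQUAL_TOL < rel_intensity < 10.0:
        if n_peptides > 10:
            return 5
        if n_peptides >= NEAR_TEN_MIN:
            return 4
    # PMFQ 3: five or more peptides with intensities roughly equal to trypsin.
    if n_peptides >= 5 and abs(rel_intensity - 1.0) <= EQUAL_TOL:
        return 3
    return None


if __name__ == "__main__":
    for n, rel in [(0, 0.0), (3, 0.5), (6, 1.0), (9, 2.0), (15, 2.0)]:
        print(n, rel, pmfq_score(n, rel))
```

Returning None for uncovered cases keeps the sketch from assigning scores that the scoring description itself does not support.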

Database Queries and Protein Identifications

The peptide mass fingerprints were compared with sequences in (a) the NCBInr database (release January 1, 2002), (b) the SwissProt database (release January 1, 2002), (c) dbESTothers (NCBI; release January 1, 2002), and/or (d) a subset of dbESTothers (NCBI) consisting of approximately 145,000 barrel medic EST sequences, dated November 15, 2001. The databases were queried using MS-Fit (http://prospector.ucsf.edu) in an automated mode using Proteomic Solutions 1 software from Applied Biosystems (Foster City, CA). Mass spectra were de-isotoped, baseline corrected, and threshold adjusted before database searching. Database searches were performed using a 100-ppm mass accuracy with a minimum requirement of four peptide matches from a submission list of typically 30 peptides. The maximum number of missed cleavages was set at one. The only user-defined modification specified was carbamidomethylation of Cys; however, the software default considered possible modifications of N-terminal Gln to pyro-Glu, oxidation of Met, and protein N terminus acetylation. When peptide mass fingerprints were matched to sequences in the EST databases, functional information was obtained by BLASTX (NCBI; http://www.ncbi.nlm.nih.gov/BLAST/) of the sequence or reference of the clone identifier to the barrel medic gene index (MtGI; http://www.tigr.org/tdb/mtgi/). The theoretical molecular mass and pI of the identified protein were then calculated using GPMAW (Lighthouse data) and compared with the experimental molecular mass calculated from the digitized 2-DE images. Protein identifications were evaluated on the basis of multiple variables including the number of peptides matched, mass error (m/z accuracy), percent coverage of the matched protein with 10% of the full-length protein set as the minimum value, quality of the peptide maps, intensity of the matched peaks (18%–20% minimum), similarity of experimental and theoretical protein molecular masses and pIs, and species from which the sequence was matched. For EST matches, the percent coverage was calculated by dividing the number of matched amino acids by the total number of amino acids in the protein sequence returned from the BLASTX or MtGI searches.
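Three of the numerical criteria above (the 100-ppm mass tolerance, the minimum of four matched peptides, and the 10% minimum sequence coverage) lend themselves to a short worked example. The Python sketch below is a simplified illustration and does not reproduce MS-Fit or the Proteomic Solutions 1 software. All function names are hypothetical, the observed m/z values are invented for illustration, and the "theoretical" list simply reuses the four trypsin autolytic masses quoted in the previous section as convenient reference values.

```python
from typing import Sequence


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def count_matches(observed: Sequence[float],
                  theoretical: Sequence[float],
                  tol_ppm: float = 100.0) -> int:
    """Count observed peaks that fall within tol_ppm of any theoretical mass."""
    return sum(
        any(abs(ppm_error(obs, theo)) <= tol_ppm for theo in theoretical)
        for obs in observed
    )


def percent_coverage(matched_residues: int, total_residues: int) -> float:
    """Matched amino acids divided by total amino acids, as a percentage."""
    return 100.0 * matched_residues / total_residues


def passes_screen(n_matched: int, coverage_pct: float,
                  min_peptides: int = 4, min_coverage: float = 10.0) -> bool:
    """Apply the minimum peptide-count and coverage criteria from the text."""
    return n_matched >= min_peptides and coverage_pct >= min_coverage


if __name__ == "__main__":
    # Reference masses taken from the trypsin autolytic peaks listed earlier.
    reference = [906.5049, 1153.5741, 2163.0570, 2273.1602]
    # Hypothetical observed peaks (illustrative values only).
    observed = [906.5500, 1153.8000, 2163.0600, 2273.1700, 1500.0000]
    n = count_matches(observed, reference)
    print("matched peaks:", n)
    print("coverage (45 of 300 residues):", percent_coverage(45, 300))
    print("passes screen:", passes_screen(n, percent_coverage(45, 300)))
```

These thresholds were only part of the evaluation; peak intensity, agreement of molecular mass and pI, and the species of the matched sequence were also considered, and they are not modeled here.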

LC/MS/MS

Select digest mixtures were analyzed by nanoscale HPLC coupled with LC/MS/MS. Data were obtained using an ABI QSTAR Pulsar (Applied Biosystems) hybrid quadrupole time-of-flight mass spectrometer. The instrument m/z was calibrated with standards supplied by the manufacturer. Separated peptides were introduced into the mass spectrometer from an HPLC system equipped with an autosampler (LC Packings, San Francisco). Separations were achieved using an LC Packings nanoscale pepmap column (15 cm × 75 μm i.d., 3 μm, 100 Å, C18) and a linear binary gradient (solvent A was 1% [v/v] formic acid in 95%:5% [v/v] water:acetonitrile, whereas solvent B was 0.8% [v/v] formic acid in 5%:95% [v/v] water:acetonitrile). The linear gradient was 95% (w/v) A:5% (w/v) B (0 min) to 60% (w/v) A:40% (w/v) B over 33 min, then ramped to 5% (w/v) A:95% (w/v) B at 37 min and held at 5% (w/v) A:95% (w/v) B until 42 min, where it was returned to 95% (w/v) A:5% (w/v) B at 48 min and allowed to reequilibrate at 95% (w/v) A:5% (w/v) B until 60 min. Nanoscale-ESI was performed using a Protona interface and nanoelectrospray needles (silver-coated glass capillary, New Objective, Woburn, MA). Mass spectra datasets were searched against NCBInr, SwissProt, dbESTothers, and mtEST databases using Mascot (http://www.matrixscience.com). The search results were validated as described for the peptide mass fingerprint results.
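The gradient program above can be summarized as a table of time points and solvent B percentages. The Python sketch below interpolates between those points and is intended only to make the program easier to read or replot. It assumes that each segment, including the return from 95% B at 42 min to 5% B at 48 min, is linear; the names GRADIENT, percent_b, and percent_a are introduced here for illustration.

```python
# (time in min, percent solvent B) breakpoints taken from the gradient description.
GRADIENT = [
    (0.0, 5.0),    # start: 95% A, 5% B
    (33.0, 40.0),  # 60% A, 40% B
    (37.0, 95.0),  # ramp to 5% A, 95% B
    (42.0, 95.0),  # hold at 95% B
    (48.0, 5.0),   # return to starting conditions (assumed linear)
    (60.0, 5.0),   # reequilibration
]


def percent_b(t_min: float) -> float:
    """Percent solvent B at time t_min, by linear interpolation."""
    if t_min < GRADIENT[0][0] or t_min > GRADIENT[-1][0]:
        raise ValueError("time outside the 0 to 60 min program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable for a contiguous gradient table")


def percent_a(t_min: float) -> float:
    """Percent solvent A is the remainder of the binary mixture."""
    return 100.0 - percent_b(t_min)


if __name__ == "__main__":
    for t in (0, 16.5, 33, 37, 40, 45, 60):
        print(f"{t:5.1f} min  A = {percent_a(t):5.1f}%  B = {percent_b(t):5.1f}%")
```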

EST Counting and Protein Relative Abundance Estimates

Barrel medic ESTs were extracted from dbEST (http://www.ncbi.nlm.nih.gov/, accessed November 4, 2001). ESTs were assembled into tentative consensus sequences by TIGR to generate the barrel medic gene index (MtGI, http://www.tigr.org/tdb/tgi.shtml). The MtGI release of September 7, 2001 was used to count the occurrence of barrel medic genes in six different EST datasets including leaf (one cDNA library of developing leaf, 7,831 ESTs), stem (one library of developing stem, 10,314 ESTs), root (three libraries of uninoculated root, 6,593 ESTs), flower (one library of developing flower, 3,404 ESTs), seed pod (one library of developing seed and one library of developing pod, 4,587 ESTs), and cell suspensions (one library of elicited cell suspensions, 8,926 ESTs). The barrel medic genes were then sorted in descending order by their EST counts for each dataset and used in the comparison with proteomic data. Protein abundances were calculated using the normalized spot volume of each protein determined with HT Analyzer software (Genomic Solutions) as previously reported (Asirvatham et al., 2002a).
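The EST counting step amounts to ranking genes by their EST counts within each tissue dataset. The Python sketch below illustrates that ranking using the dataset sizes listed above. The gene identifiers and counts are hypothetical placeholders, not values from MtGI. The per-10,000-EST normalization is an added convenience for comparing datasets of different size and is not described in the text.

```python
from typing import Dict, List, Tuple

# Total ESTs per dataset, as listed in the text.
LIBRARY_SIZES: Dict[str, int] = {
    "leaf": 7831,
    "stem": 10314,
    "root": 6593,
    "flower": 3404,
    "seed pod": 4587,
    "cell suspension": 8926,
}


def rank_by_est_count(counts: Dict[str, int]) -> List[Tuple[str, int]]:
    """Sort genes in descending order of EST count (ties broken by name)."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def per_10k(count: int, tissue: str) -> float:
    """EST count scaled to a dataset of 10,000 ESTs (assumed normalization)."""
    return 10000.0 * count / LIBRARY_SIZES[tissue]


if __name__ == "__main__":
    # Hypothetical counts for three placeholder tentative consensus sequences.
    flower_counts = {"TC_A": 12, "TC_B": 40, "TC_C": 12}
    for gene, count in rank_by_est_count(flower_counts):
        print(gene, count, round(per_10k(count, "flower"), 1))
```

A ranked list of this kind can then be compared with the proteins identified in the corresponding tissue, for example by checking whether an identified protein corresponds to a gene among the most frequently counted ESTs.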

ACKNOWLEDGMENTS

We thank Dr. Richard Dixon for scientific discussion and editorial comments. We thank Drs. Zhentian Lei and Aaron Elmer for their assistance in performing LC/MS/MS analyses.

Received December 11, 2002; returned for revision December 24, 2002; accepted January 3, 2003.

LITERATURE CITED

Anderson NG, Matheson A, Anderson NL (2001) Back to the future: the human protein index and the agenda for post-proteomic biology. Proteomics 1: 3–12
Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher P, Cerutti L, Corpet F, Croning MDR et al. (2001) The InterPro database, an integrated documentation resource for protein families, domains, and functional sites. Nucleic Acids Res 29: 37–40
Asirvatham VS, Watson BS, Sumner LW (2002a) Analytical and biological variances associated with proteomic studies of Medicago truncatula by 2-DE. Proteomics 2: 960–968
Asirvatham VS, Watson BS, Wang L, Sumner LW (2002b) Protein identification success rates in proteomics studies of Medicago truncatula using peptide mass fingerprints to search protein, nucleotide and EST databases in a species without sequenced genomes. Proceedings of the 50th ASMS Conference on Mass Spectrometry and Allied Topics, Orlando, FL, June 2–6. American Society for Mass Spectrometry, Santa Fe, NM, pp xxx–xxx
Audic S, Claverie J-M (1997) The significance of digital gene expression profiles. Genome Res 7: 986–995
Barker DG, Bianchi S, Blondon F, Dattée Y, Duc G, Essad S, Flament P, Gallusci P, Génier G, Pierre G et al. (1990) Medicago truncatula, a model plant for studying the molecular genetics of the Rhizobium-legume symbiosis. Plant Mol Biol Rep 8: 40–49
Batchelor AH, Piper DE, de la Brousse FC, McKnight SL, Wolberger C (1998) The structure of GABPalpha/beta: an ETS domain-ankyrin repeat heterodimer bound to DNA. Science 279: 1037–1041
Bateman A, Birney E, Cerutti L, Durbin R, Etwiller L, Eddy SR, Griffiths-Jones S, Howe KL, Marshall M, Sonnhammer EL (2002) The Pfam protein families database. Nucleic Acids Res 30: 276–280
Bell CA, Dixon RA, Farmer AD, Flores R, Inman J, Gonzales RA, Harrison MJ, Paiva NL, Scott AD, Weller JW et al. (2000) The Medicago genome initiative: a model legume database. Nucleic Acids Res 29: 1–4
Bestel-Corre G, Dumas-Gaudot E, Poinsot V, Dieu M, Dierick J-F, van Tuinen D, Remacle J, Gianinazzi-Pearson V, Gianinazzi S (2002) Proteome analysis and identification of symbiosis-related proteins from Medicago truncatula Gaertn. by two-dimensional electrophoresis and mass spectrometry. Electrophoresis 23: 122–137
Bevan M, Bancroft I, Bent E, Love K, Goodman H, Dean C, Bergkamp R, Dirkse W, Van Staveren M, Stiekema W et al. (1998) Analysis of 1.9 Mb of contiguous sequence from chromosome 4 of Arabidopsis thaliana. Nature 391: 485–488
Bradford MM (1976) A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal Biochem 72: 248–254
Candiano G, Musante L, Bruschi M, Ghiggeri GM, Herbert B, Antonucci F, Righetti PG (2002) Two-dimensional maps in soft immobilized pH gradient gels: a new approach to the proteome of the third millennium. Electrophoresis 23: 292–297
Chang WWP, Huang L, Shen M, Webster C, Burlingame AL, Roberts JKM (2000) Patterns of protein synthesis and tolerance of anoxia in root tips of maize seedlings acclimated to a low-oxygen environment, and identification of proteins by mass spectrometry. Plant Physiol 122: 295–317
Choudhary JS, Blackstock WP, Creasy DM, Cottrell JS (2001) Matching peptide mass spectra to EST and genomic DNA databases. Trends Biotechnol 19: S17–S22
Comment (2002) World’s first complete legume genome sequencing project. Trends Plant Sci 7: 101
Cook D, Dreyer D, Bonnet D, Howell M, Nony E, VandenBosch K (1995) Transient induction of a peroxidase gene in Medicago truncatula precedes infection by Rhizobium meliloti. Plant Cell 7: 43–55
Cook DR (1999) Medicago truncatula: a model in the making! Curr Opin Plant Biol 2: 301–304
Cook DR, VandenBosch K, de Bruijn FJ, Huguet T (1997) Model legumes get the nod. Plant Cell 9: 275–281
Corthals G, Gygi S, Aebersold R, Patterson SD (2000) Identification of proteins by mass spectrometry. In T Rabilloud, ed, Proteome Research: Two-Dimensional Gel Electrophoresis and Identification Methods. Springer-Verlag, Berlin, pp 197–231
Costa P, Bahrman N, Frigerio J-M, Kremer A, Plomion C (1998) Water-deficit-responsive proteins in maritime pine. Plant Mol Biol 38: 587–596
Costa P, Pionneau C, Bauw G, Dubos C, Bahrmann N, Kremer A, Frigerio J-M, Plomion C (1999) Separation and characterization of needle and xylem maritime pine proteins. Electrophoresis 20: 1098–1108
Covitz PA, Smith LS, Long SR (1998) Expressed sequence tags from a root-hair-enriched Medicago truncatula cDNA library. Plant Physiol 117: 1325–1332
Davin LB, Lewis NG (1992) Phenylpropanoid metabolism: biosynthesis of monolignols, lignans and neolignans, lignins and suberins. In HA Stafford, RK Ibrahim, eds, Recent Advances in Phytochemistry, Phenolic Metabolism in Plants. Plenum Press, New York, pp 325–376
Dixon RA (1980) Plant tissue culture methods in the study of phytoalexin induction. In DS Ingram, JP Helgeson, eds, Tissue Culture Methods for Plant Pathologists. Blackwell Scientific Publications, Oxford, pp 185–186
Dixon RA (1999) Isoflavonoids: biochemistry, molecular biology, and biological function. In D Barton, K Nakanishi, O Meth-Cohn, eds, Comprehensive Natural Product Chemistry. Elsevier, New York, pp 774–821
Dunwell JM (1998) Cupins: a new superfamily of functionally diverse proteins that include germins and plant storage proteins. Biotechnol Genet Eng Rev 15: 1–32
Duranti M, Horstmann C, Gilroy J, Croy RR (1995) The molecular basis for N-glycosylation in the 11S globulin (legumin) of lupin seed. J Protein Chem 14: 107–110
Ewing RM, Kahla AB, Poirot O, Lopez F, Audic S, Claverie J-M (1999) Large-scale statistical analyses of rice ESTs reveal correlated patterns of gene expression. Genome Res 9: 950–959
Futcher B, Latter GI, Monardo P, McLaughlin CS, Garrels JI (1999) A sampling of the yeast proteome. Mol Cell Biol 19: 7357–7368
Gallardo K, Job C, Groot SPC, Puype M, Demol H, Vandekerckhove J, Job D (2001) Proteomic analysis of Arabidopsis seed germination and priming. Plant Physiol 126: 835–848
Gallusci P, Dedieu A, Journet EP, Huguet T, Barker DG (1991) Synchronous expression of leghaemoglobin in Medicago truncatula during nitrogen-fixing root nodule development and response to exogenously supplied nitrate. Plant Mol Biol 17: 335–349
Görg A, Obermaier C, Boguth G, Weiss W (1999) Recent developments in two-dimensional gel electrophoresis with immobilized pH gradients: wide pH gradients up to pH 12, longer separation distances and simplified procedures. Electrophoresis 20: 712–717
Gorina S, Pavletich NP (1996) Structure of the p53 tumor suppressor bound to the ankyrin and SH3 domains of 53BP2. Science 274: 1001–1005
Guerreiro N, Djordjevic MA, Rolfe BG (1999) Proteome analysis of the model microsymbiont Sinorhizobium meliloti: isolation and characterisation of novel proteins. Electrophoresis 20: 818–825
Gygi SP, Rochon Y, Franza BR, Aebersold R (1999) Correlation between protein and mRNA abundance in yeast. Mol Cell Biol 19: 1720–1730
Gyorgyey J, Vaubert D, Jimenez-Zurdo JI, Charon C, Troussard L, Kondorosi A, Kondorosi E (2000) Analysis of Medicago truncatula nodule expressed sequence tags. Mol Plant-Microbe Interact 13: 62–71
Haridas V, Higuchi M, Jayatilake GS, Bailey D, Mujoo K, Blake ME, Arntzen CJ, Gutterman JU (2001) Avicins: triterpenoid saponins from Acacia victoriae (Bentham) induce apoptosis by mitochondrial perturbation. Proc Natl Acad Sci USA 98: 5821–5826
Hayashi M, Mori H, Nishimura M, Akazawa T, Hara-Nishimura I (1988) Nucleotide sequence of cloned cDNA coding for pumpkin 11-S globulin beta subunit. Eur J Biochem 172: 627–632
Higgins CF (1992) ABC transporters: from microorganisms to man. Annu Rev Cell Biol 8: 67–113
Jacobs MD, Harrison SC (1998) Structure of an IkappaBalpha/NF-kappaB complex. Cell 95: 749–758
Jasiński M, Stukkens Y, Degand H, Purnelle B, Marchand-Brynaert J, Boutry M (2001) A plant plasma membrane ATP binding cassette-type transporter is involved in antifungal terpenoid secretion. Plant Cell 13: 1095–1107
Klose J, Kobalz U (1995) Two-dimensional electrophoresis of proteins: an updated protocol and implications for a functional analysis of the genome. Electrophoresis 16: 1034–1059
Kruft V, Holger E, Jänsch L, Wolf W, Braun H-P (2001) Proteomic approach to identify novel mitochondrial proteins in Arabidopsis. Plant Physiol 127: 1694–1710
Lewis NG, Yamamoto E (1990) Lignins: occurrence, biosynthesis and biodegradation. Annu Rev Plant Physiol Plant Mol Biol 41: 455–496
Li HM, Kaneko Y, Keegstra K (1994) Molecular cloning of a chloroplastic protein associated with both the envelope and thylakoid membranes. Plant Mol Biol 25: 619–632
Mann M, Pandey A (2001) Use of mass spectrometry-derived data to annotate nucleotide and protein sequence databases. Trends Biochem Sci 26: 54–60
Mann M, Wilm M (1994) Error-tolerant identification of peptide sequence tags. Anal Chem 66: 4390–4399
Mathesius U, Imin N, Chen H, Djordjevic MA, Weinman JJ, Natera SHA, Morris AC, Kerim T, Paul S, Menzel C et al. (2002) Evaluation of proteome reference maps for cross-species identification of proteins by peptide mass fingerprinting. Proteomics 2: 1288–1303
Mathesius U, Keijzers G, Natera SHA, Weinman JJ, Djordjevic MA, Rolfe BG (2001) Establishment of a root proteome reference map for the model legume Medicago truncatula using the expressed sequence tag database for peptide mass fingerprinting. Proteomics 1: 1424–1440
Meins FJ, Neuhaus J-M, Sperisen C, Ryals J (1992) Plant Gene Research: The primary structure of plant pathogenesis-related glucanohydrolases and their genes. In T Boller, F Meins, eds, Genes Involved in Plant Defense. Springer-Verlag Wien, New York, pp 245–282
Millar AH, Sweetlove LJ, Giegé P, Leaver CJ (2001) Analysis of the Arabidopsis mitochondrial proteome. Plant Physiol 127: 1711–1727
Molloy MP, Herbert BR, Walsh BJ, Tyler MI, Traini M, Sanchez J-C, Hochstrasser DF, Williams KL, Gooley AA (1998) Extraction of membrane proteins by differential solubilization for separation using two-dimensional gel electrophoresis. Electrophoresis 19: 837–844
Morris AC, Djordjevic MA (2001) Proteome analysis of cultivar-specific interactions between Rhizobium leguminosarum biovar trifolii and subterranean clover cultivar woogenenellup. Electrophoresis 22: 586–598
Pappin DJC, Hojrup P, Bleasby AJ (1993) Rapid identification of proteins by peptide-mass fingerprinting. Curr Biol 3: 327–332
Peltier J-B, Friso G, Kalume DE, Roepstorff P, Nilsson F, Adamska I, van Wijk KJ (2000) Proteomics of the chloroplast: systematic identification and targeting analysis of lumenal and peripheral thylakoid proteins. Plant Cell 12: 319–341
Peltier J-B, Emanuelsson O, Kalume DE, Ytterberg J, Friso G, Rudella A, Liberles DA, Söderberg L, Roepstorff P, von Heijne G et al. (2002) Central functions of the lumenal and peripheral thylakoid proteome of Arabidopsis determined by experimentation and genome-wide prediction. Plant Cell 14: 211–236
Peng HM, Dreyer DA, VandenBosch KA, Cook D (1996) Gene structure and differential regulation of the Rhizobium-induced peroxidase gene rip1. Plant Physiol 112: 1437–1446
Reymond P, Kunz B, Paul-Pletzer K, Grimm R, Eckerskorn C, Farmer EE (1996) Cloning of a cDNA encoding a plasma membrane-associated, uronide binding phosphoprotein with physical properties similar to viral movement proteins. Plant Cell 8: 2265–2276
Sambrook J, Fritsch EF, Maniatis T (1989) Construction and analysis of cDNA libraries. In N Ford, C Nolan, M Ferguson, eds, Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, pp 8.1–8.86
Small E (1996) Adaptations to herbivory in alfalfa (Medicago sativa). Can J Bot 74: 807–822
Stensballe A, Jensen ON (2001) Simplified sample preparation method for protein identification by matrix-assisted laser desorption/ionization mass spectrometry: in-gel digestion on the probe surface. Proteomics 1: 955–966
Sugiura M, Takeda Y (2000) Nucleic acids. In B Buchanan, W Gruissem, R Jones, eds, Biochemistry and Molecular Biology of Plants. American Society for Plant Physiologists, Waldorf, MA, pp 298–300
Trieu AT, Burleigh SH, Kardailsky IV, Maldonado-Mendoza IE, Versaw WK, Blaylock LA, Shin H, Chiou T-J, Katagi H, Dewbre GR et al. (2000) Transformation of Medicago truncatula via infiltration of seedlings or flowering plants with Agrobacterium. Plant J 22: 531–541
Tsugita A, Kawakami T, Uchiyama Y, Kamo M, Miyatake N, Nozu Y (1994) Separation and characterization of rice proteins. Electrophoresis 15: 708–720
U.S. Department of Agriculture-National Agricultural Statistics Service (2002) Agricultural statistics 2002. 2000 Agricultural Statistics. http://www.usda.gov/nass/pubs/agr02/acro02.htm
van Wijk K (2001) Challenges and prospects of plant proteomics. Plant Physiol 126: 501–508
Wang L, Wessler SR (1998) Inefficient reinitiation is responsible for upstream open reading frame-mediated translation repression of the maize R gene. Plant Cell 10: 1733–1745
Wang L, Wessler SR (2001) Role of mRNA secondary structure in translational repression of the maize transcriptional activator Lc1,2. Plant Physiol 125: 1380–1387
Yates JR III (1998a) Mass spectrometry and the age of the proteome. J Mass Spectrom 33: 1–19
Yates JR III (1998b) Database searching using mass spectrometry data. Electrophoresis 19: 893–900