Research
Hydrophilic Interaction Chromatography Reduces the Complexity of the Phosphoproteome and Improves Global Phosphopeptide Isolation and Detection*□ S
Dean E. McNulty and Roland S. Annan‡ The diversity and complexity of proteins and peptides in biological systems requires powerful liquid chromatography-based separations to optimize resolution and detection of components. Proteomics strategies often combine two orthogonal separation modes to meet this challenge. In nearly all cases, the second dimension is a reverse phase separation interfaced directly to a mass spectrometer. Here we report on the use of hydrophilic interaction chromatography (HILIC) as part of a multidimensional chromatography strategy for proteomics. Tryptic peptides are separated on TSKgel Amide-80 columns using a shallow inverse organic gradient. Under these conditions, peptide retention is based on overall hydrophilicity, and a separation truly orthogonal to reverse phase is produced. Analysis of tryptic digests from HeLa cells yielded numbers of protein identifications comparable to that obtained using strong cation exchange. We also demonstrate that HILIC represents a significant advance in phosphoproteomics analysis. We exploited the strong hydrophilicity of the phosphate group to selectively enrich and fractionate phosphopeptides based on their increased retention under HILIC conditions. Subsequent IMAC enrichment of phosphopeptides from HILIC fractions showed better than 99% selectivity. This was achieved without the use of derivatization or chemical modifiers. In a 300-g equivalent of HeLa cell lysate we identified over 1000 unique phosphorylation sites. More than 700 novel sites were added to the HeLa phosphoproteome. Molecular & Cellular Proteomics 7:971–980, 2008.
Global proteome analysis by mass spectrometry is hampered by the enormous complexity of the proteome, the large dynamic range of protein expression, and the data acquisition rate of current mass spectrometers (1). Significant research has been carried out to develop separation methods (2–5) to enable the resolution, detection, and identification of the entire complement of proteins in a biological system. Large From the Proteomics and Biological Mass Spectrometry Laboratory, GlaxoSmithKline, King of Prussia, Pennsylvania 19406 Received, November 9, 2007, and in revised form, January 22, 2008 Published, MCP Papers in Press, January 22, 2008, DOI 10.1074/ mcp.M700543-MCP200
© 2008 by The American Society for Biochemistry and Molecular Biology, Inc. This paper is available on line at http://www.mcponline.org
scale proteomics studies typically use a two-dimensional separation strategy coupling an orthogonal mode such as strong cation exchange (SCX)1 to RPLC-MS (4, 6). Analysis of the phosphoproteome represents a significant increase in difficulty beyond the general proteome. Although it has been suggested that about 30% of all proteins are phosphorylated, many of these will be less abundant proteins involved in signaling and regulation of cellular function. Most phosphoproteins will contain between 1–20 phosphorylation sites, and these sites will largely be occupied at substoichiometric levels (7–9). Many phosphorylation sites will occur in combination with other modifications, and multiple phosphorylation sites can exist in a single peptide. All of these factors suggest that the phosphoproteome is likely to be every bit as complex as the general proteome but with a dynamic range spanning an additional 2 orders of magnitude and that the phosphopeptides will exist in the presence of a very large excess of non-phosphorylated peptides. The most common strategy for enriching phosphopeptides in mixtures is the use of IMAC (10, 11) or titanium dioxide (TiO2) microcolumns (12, 13). Although phosphopeptides have a very high affinity for these resins, acidic peptides (containing aspartic acid and glutamic acid) and peptides containing histidine also have a high binding affinity. In the face of an overwhelming amount of the latter peptides, as would be the case for a cell lysate, the specificity of IMAC and TiO2 suffers tremendously. The conversion of peptide carboxylate groups to methyl ester derivatives has been shown to restore the selectivity of both IMAC (14) and TiO2 (12). Recently it was demonstrated that including dihydroxybenzoic acid in the loading buffer can improve the selectivity of phosphopeptide binding to TiO2 (13). Despite the large enrichment factor that IMAC and TiO2 provide, analysis of unfractionated proteomes has been limited to the identification of only a few hundred phosphorylation sites (14 –16). Recently SCX has been suggested as an alternative to metal chelating resins as a means of enriching phosphopeptides (7). 1
The abbreviations used are: SCX, strong cation exchange; HILIC, hydrophilic interaction chromatography; RP, reverse phase; 2D, twodimensional; MAPK, mitogen-activated protein kinase; IMAC, immobilized metal affinity chromatography.
Molecular & Cellular Proteomics 7.5
971
HILIC Reduces the Complexity of the Phosphoproteome
Under typical SCX conditions, most tryptic phosphopeptides will elute in the 1⫹ fractions, whereas the bulk of non-phosphorylated tryptic peptides will elute in the 2⫹ and higher fractions. Although this appears to be highly reproducible, it does not provide a very high level of enrichment, and not all phosphopeptides elute in the 1⫹ fractions. The tryptic peptide containing the doubly phosphorylated p38 activation site for instance elutes with the 2⫹ fraction. Furthermore even the 1⫹ fractions contained significant numbers of non-phosphorylated peptides. SDS-PAGE has been used to reduce the complexity of the samples applied to SCX. Although this makes the final fractions less complex, it does not improve the enrichment capacity of SCX. The large increase in the number of phosphosites identified in recent phosphoproteomics studies (7–9) can be mainly attributed to extensive and repetitive analysis of fractions using new highly sensitive, fast scanning, accurate mass instruments. To provide a more comprehensive view of the phosphoproteome, a further reduction in sample complexity beyond affinity enrichment or partitioning by charge state is necessary. An additional high resolution chromatographic separation at the peptide level, orthogonal to those currently in use, would be highly desirable when coupled with an enrichment strategy like IMAC or TiO2. Hydrophilic interaction chromatography (HILIC) (17) is a high resolution separation technique where the primary interaction between a peptide and the neutral, hydrophilic stationary phase is hydrogen bonding. In HILIC, retention increases with increasing polarity (hydrophilicity) of the peptide, opposite to the trends observed in RPLC (17, 18). Samples are loaded at high organic solvent concentration and eluted by increasing the polarity of the mobile phase (e.g. an inverse gradient of ACN in water). Intuitively it would seem that this mode of chromatography would simply be the “reverse” of RPLC. However, Gilar et al. (19) have shown that HILIC has the highest degree of orthogonality to RPLC of all commonly used peptide separation modes. This and the suitability of HILIC for the highly efficient separation of polar molecules prompted us to investigate the potential of HILIC as part of a multidimensional separation strategy for phosphoproteomics. In this current study we demonstrate that HILIC as a first dimension separation for 2D LC proteomics provided an equivalent number of peptide and protein identifications to SCX for cell lysates. Our findings also show that phosphopeptides exhibited increased retention relative to non-phosphorylated peptides. Using HILIC we developed gradient conditions that provide significant enrichment and fractionation of phosphopeptides. We further demonstrate that the volatile, salt-free TFA/acetonitrile buffer system used with HILIC was fully compatible with direct IMAC capture without any intermediate desalting steps. Finally we report that HILIC fractionation dramatically improved the selectivity of IMAC. The hydrophilicity-based separation yielded fractions where all of the peptides in each fraction have a similar polarity. This ensures fair competition
972
Molecular & Cellular Proteomics 7.5
between the phosphopeptides and the non-phosphopeptides in each fraction and provides highly enriched phosphopeptide pools without the need for derivatization or additives. EXPERIMENTAL PROCEDURES
Sample Preparation—HeLa cells were grown to near confluency in 10-cm dishes; lysed directly in 1 ml of ice-cold 8 M urea, 0.4 M NH4HCO3; scraped; and sonicated prior to clarification by centrifugation. Where indicated, cells were serum-starved overnight and stimulated with 100 nM calyculin A (Cell Signaling Technologies) prior to lysis. Clarified lysates were reduced with 4.5 mM DTT for 30 min at 37 °C and alkylated with 10 mM iodoacetamide for 30 min at room temperature in the dark. Lysates were diluted 1:4 with water, and 250 l of immobilized trypsin slurry was added (Pierce). The samples were digested at room temperature with rocking overnight. Digests were acidified to 1% TFA, desalted on a 1-g Sep-Pak cartridge (Waters), and lyophilized to dryness. Hydrophilic Interaction Chromatography—Preparative chromatographic separations were performed on a Beckman System Gold 126 HPLC system (Beckman-Coulter) using a 4.6 ⫻ 250-mm TSKgel Amide-80 5-m particle column (Tosoh Biosciences). 1 mg of desalted HeLa tryptic digest peptides was loaded in 90% solvent B (98% acetonitrile with 0.1% TFA). Solvent A consisted of 98% water with 0.1% TFA. For global 2D LC-MS analysis, peptides were eluted with an inverse gradient of 90% B to 85% B in 5 min followed by 85% B to 70% B in 40 min and finally a steep gradient to 0% B in 5 min at 0.5 ml/min. Twenty 1-ml fractions were collected throughout the gradient. For global phosphopeptide fractionation, samples were loaded in 80% B, and a modified gradient consisting of 80% B held for 5 min followed by 80% B to 60% B in 40 min and finally 0% B in 5 min was used. Fractions were collected as shown in supplemental Fig. S1 and were lyophilized for 2D LC-MS analyses or used directly for IMAC as described below. Strong Cation Exchange Chromatography—Preparative chromatographic separations were performed on a Beckman System Gold 126 HPLC system (Beckman-Coulter) using a 4.6 ⫻ 200-mm PolySULFOETHYL A 5-m particle column (PolyLC Inc.). 1 mg of desalted HeLa tryptic digest peptides was loaded in solvent A (10 mM ammonium formate, pH 3.0, in 25% acetonitrile). Solvent B consisted of 500 mM ammonium formate, pH 6.8, in 25% acetonitrile. Twenty 1-ml fractions were collected throughout the gradient. For global 2D LC-MS analysis, peptides were eluted with a gradient of 0% B to 25% B in 40 min followed by 25% B to 100% B in 10 min. IMAC—20 l of a 50% slurry (10 l of gel) of PHOS-Select iron affinity gel (Sigma) was added directly to HILIC preparative fraction pools from the optimized phosphopeptide gradient run and vortexed rapidly for 30 min. The fractions were placed in a 0.22-m nylon Spin-X centrifuge tube filter (Corning) and centrifuged at 8200 ⫻ g for 30 s. The unbound material was discarded, and the gel was washed with 500 l of 250 mM acetic acid with 30% acetonitrile followed by 500 l of water. The gel was eluted using 100 l of 400 mM ammonium hydroxide with vortexing for 10 min. The eluant was collected and lyophilized to dryness. LC-MS Analysis—For global proteomics analysis, lyophilized samples were resuspended in 100 l of 0.1% formic acid, 0.02% TFA for analysis. 2 l (2%) of each fraction was injected in duplicate onto a C18 PepMap100 300-m ⫻ 5-mm, 5-m enrichment cartridge (LC Packings) via a 1100 Series binary pump (Agilent) at 20 l/min in 0.1% formic acid, 0.05% TFA. The samples were backflushed in line with the C18 PepMap100 75-m ⫻ 15-mm, 5-m analytical column (LC Packings) and gradient eluted at 300 nl/min using a 1100 Series nanopump (Agilent) coupled to an 1100 Series LC/MSD Trap XCT Ultra ion trap mass spectrometer (Agilent). Solvent A was 0.1% formic acid in water, and solvent B was 0.1% formic acid in acetonitrile. The
HILIC Reduces the Complexity of the Phosphoproteome
gradient used was 4 min at 10% B and 10% B to 30% B in 50 min followed by a steep gradient to 95% B in 5 min. The total run time including column reconditioning was 70 min. MS survey scans and MS/MS product ion scans were recorded in a data-dependent manner with one survey scan being followed by two product ion scans. Phosphopeptide fractions from IMAC were analyzed similarly with the exception that a single injection of 1⁄3 of the sample was performed. In addition, a neutral loss-triggered MS3 scan was performed on ions detected in the MS2 experiment that corresponded to the loss of 49 Da from the precursor. This was performed in the fragmentation only mode in which no isolation is done between the MS2 fragmentation and the “pseudo” MS3 fragmentation. The two spectra were merged into a single spectrum for database searching. Database Search—Peak lists from replicate injections of the global proteomics experiment were created using the Agilent Data Analysis software version 3.4 and concatenated into single files using MasCat (Agilent). A non-redundant, in-house curated protein database containing 2,922,633 sequences was appended to its reverse complement and searched with restriction to Homo sapiens (110,110 entries) using Mascot version 2.1 (Matrix Science). Searches were performed for fully tryptic peptides containing up to one missed cleavage, fixed carboxamidomethylated cysteine modification, and no variable modifications with a mass tolerance of ⫾0.3 Da for the precursor ions and a tolerance of ⫾0.5 Da for the product ions. Using the parameters described above, Mascot individual ion scores ⬎32 indicate identity or extensive homology (p ⬍ 0.05) and yield a 3.82% false positive rate. The false positive rate is given as (false hits above threshold/true hits above threshold) ⫻ 100. Positive protein assignments to unique accession numbers were made on the basis of two unique peptides with both peptides having an ion score ⬎32 and a p ⬍ 0.05 or a single peptide having an ion score ⬎32 and a p ⬍ 0.015. For the phosphoproteome analysis, data from each LC-MS run in an experiment were processed and merged as described as above and searched against the human International Protein Index database (version 3.21; September 6, 2006; 60,397 sequences) appended to its reverse complement (13). Search parameters were as above except that phosphorylation on serine, threonine, and tyrosine were set as variable modifications. Search results were filtered to yield false positive rates that were less than 1.3% in all cases. Sequence assignments with Mascot ion scores below a 1.3% false positive rate but greater than 25 were manually validated as described in the text. Phosphosite assignments were manually verified when the difference in the score of the next most probable site was less than 10. RESULTS AND DISCUSSION
HILIC as a Tool for 2D Proteomics—The overall separation efficiency of any 2D LC system is determined by the peak capacity of each individual dimension and the orthogonality of the two modes of separation. For the highest performance, the ideal first dimension separation would have a peak capacity that matches the second dimension separation. SCX is the current standard for first dimension separations in proteomics experiments where RPLC-MS is the second dimension. Although sufficiently orthogonal to RP, SCX has a significantly lower peak capacity (19) and thus would not represent the ideal first dimension. HILIC on the other hand has a peak capacity that is better than 50% greater than SCX and second only to RP among the common separation modes. Furthermore HILIC has been shown to have the highest degree of orthogonality to RP of all common separation modes (19). To demonstrate the utility of HILIC as a first dimen-
sion separation in the 2D LC analysis of highly complex proteomics samples, we analyzed tryptic digests of HeLa cell lysate and compared these results with those acquired using SCX. Through multiple injections of 1 mg of reduced, alkylated, trypsin-digested HeLa whole cell lysate we developed an optimal gradient that achieved relatively equal peptide distribution throughout (as monitored by A280). It was found that adequate separation could be achieved using a shallow inverse organic gradient of acetonitrile-water containing 0.1% TFA (Fig. 1). Loading the tryptic digest in 90% B was not found to present any significant solubility problems. Twenty 2-min fractions (1 ml each) were collected over the 40-min gradient. To test for orthogonality, a 2-min fraction taken both early and late in the gradient was analyzed by RPLC-MS. Peptides from both HILIC fractions eluted over the entire gradient profile of the RP separation, demonstrating a high degree of orthogonality in the 2D scheme (Fig. 1). An identical 1-mg sample was fractionated using an optimized SCX gradient (data not shown), and 20 equivalent fractions were collected. For each set of HILIC and SCX fractions, 2% was analyzed in duplicate by RPLC-MS/MS. Peptides from each sample were identified using Mascot, and the results were analyzed using identical criteria as described under “Experimental Procedures.” Unique peptides identified in each fraction were uniformly distributed throughout the HILIC gradient, confirming good orthogonality to RPLC (Fig. 2A). As expected, early fractions were largely comprised of short, hydrophobic peptides, whereas peptides from late fractions were larger, hydrophilic, and highly acidic often with missed tryptic cleavages. This distribution is entirely consistent with the HILIC mode of separation. In total, 5613 peptide assignments were made throughout the 20 fractions. This represents the cumulative sum total of unique peptides identified in each fraction but does not take into account peptides identified in multiple fractions. The distribution of peptides from the SCX separation (Fig. 2B) was, as expected, based on solution charge. The early fractions contained almost exclusively peptides derived from the protein C terminus, containing neither terminal arginine or lysine residues. After a quiet region in the chromatogram (fractions 3 and 4) and an acid-rich region (fractions 5 and 6), the bulk of the peptides eluted in fractions 7–10. These peptides tended to be typical tryptic peptides with a net charge of 2⫹. This region was followed by a broad distribution of longer, higher charge state peptides. Using SCX we identified 5398 unique peptides (as described above). Although the number of peptide assignments was similar, comparison of Fig. 2, A and B, would suggest that as a first dimension separation HILIC provides a more balanced distribution of peptide mass across the gradient compared with SCX. Using the criteria for protein identification described under “Experimental Procedures,” we identified 1244 unique accession numbers in the HILIC experiment (Fig. 2A). Of these, 454 were single peptide hits (individual ion score ⬎32
Molecular & Cellular Proteomics 7.5
973
HILIC Reduces the Complexity of the Phosphoproteome
FIG. 1. Orthogonality of HILIC and RP chromatography. A, HILIC separation of 1 mg of trypsinized HeLa whole cell lysate using a gradient optimized for global 2D proteome analysis. Tryptic peptides were preparatively fractionated into 20 2-min pools (1 ml each). Shaded regions indicate early and late eluting fractions analyzed by RPLC-MS. B, base peak RPLC-MS chromatograms for the indicated HILIC fractions. Peptides elute over the entire range of the reverse phase gradient demonstrating a high degree of orthogonality between the two dimensions. mAU, milliabsorbance units.
FIG. 2. Peptide and protein identifications using HILIC and SCX chromatography in the first dimension of a 2D LC proteome analysis. Tryptic peptides from 1 mg of HeLa whole cell lysate digest were fractionated off line using either HILIC (A) or SCX (B). Fractions were analyzed by data-dependent RPLC-MS/ MS. Peptides were identified using Mascot. Gray bars indicate the number of unique peptide identifications obtained in each of the 20 fractions. Black lines indicates cumulative unique accession numbers obtained across the gradient.
and p ⬍ 0.015). Likewise SCX produced 1283 unique accession numbers of which 518 were single peptide hits (Fig. 2B). Based on the greater peak capacity of HILIC and the more uniform distribution of unique peptides across the HILIC gradient, we would have expected that the number of unique peptides and possibly protein assignments would be greater for HILIC as compared with SCX. Rather we found that the practical peak capacity of the two 2D systems is equivalent. In a 2D system, the overall peak capacity is dependent on the number of fractions taken in the first dimension. In this case, by taking 2-min fractions, we may have badly undersampled the HILIC separation and degraded the overall peak capacity to a point where it equaled that of the SCX 2D system. In designing an efficient HILIC-RP 2D chromatography system for proteomics, this parameter will need to be explored more carefully. None the less, for our purposes here, it can be seen that HILIC is at least equivalent to SCX as a first dimension separation.
974
Molecular & Cellular Proteomics 7.5
HILIC as a Tool for Phosphoproteomics—The suitability of HILIC for the separation of highly polar molecules prompted us to explore the potential of HILIC coupled with RP as part of an enrichment and chromatography strategy for phosphoproteomics. The phosphate group is strongly hydrophilic (17), and so we speculated that phosphopeptides might bind more tightly than non-phosphorylated peptides, thus providing a well defined, late eluting pool that is significantly enriched over non-phosphorylated peptides. Using the HILIC gradient developed for the 2D LC experiments described above, we found that purified -casein phosphopeptides FQpSEEQQQTEDELQDK and RELEELNVPGEIVEpSLpSpSpSEESITR (where pS is phosphoserine) eluted in a single fraction at the end of the gradient (Fig. 3A.) However, both of these peptides contain multiple hydrophilic amino acids (Arg, His ⬎ Asn ⬎ Gln, Glu, Lys ⬎ Asp, Ser, Thr) in addition to the phosphoserine(s). When we analyzed peptides from an IMAC enriched
HILIC Reduces the Complexity of the Phosphoproteome
FIG. 3. Hydrophilicity of the phosphate group promotes strong retention of phosphopeptides by HILIC. A, UV chromatograms of monophosphorylated and tetraphosphorylated -casein peptides show late elution during HILIC consistent with the large hydrophilic contribution of the phosphate group and glutamic acid residues. B, UV chromatogram of unfractionated tryptic digest from HeLa cells after IMAC enrichment for phosphopeptides. The majority of the UV absorbance elutes late in the chromatogram (shaded region). C, analysis of this sample by LC-MS/MS confirmed the strong acidic and hydrophilic composition of both phosphorylated (*) and nonphosphorylated peptides. mAU, milliabsorbance units.-
tryptic digest of a HeLa cell lysate we found that the majority of the UV absorbance eluted in the same region as the casein phosphopeptides (Fig. 3B). Data-dependent LC-MS/MS revealed that most of the peptides in this sample (both phosphorylated and non-phosphorylated) contained multiple acidic residues (Fig. 3B). In addition, there were significant numbers of multiply phosphorylated peptides and peptides with missed cleavages. These data are entirely consistent with the reported bias of IMAC (20) and reveal that this peptide pool in addition to being highly acidic is also extremely hydrophilic. To further explore the chromatography of phosphopeptides using HILIC we examined the retention of seven tryptic phosphoepitopes from five highly important signaling proteins, p38 mitogen-activated protein kinase, c-Jun, MK2, Erk1, and Hsp27. All of these phosphoepitopes were efficiently bound and eluted from IMAC medium even in the presence of a complex background (data not shown); however, they were widely distributed throughout the second half of the HILIC gradient (Fig. 4A) in order of their estimated hydrophilicity (Fig. 4B). The wide distribution of these peptides convinced us that HILIC alone could not be used as an efficient enrichment strategy for phosphoproteomics. Nevertheless none of the phosphopeptides tested eluted before the midpoint of the gradient. To investigate this as a generality, we analyzed IMAC enriched fractions from across the HILIC gradient by data-dependent LC-MS/MS. Consistent with our preliminary observations, the first phosphopeptides were detected at approximately the midpoint of the gradient, and subsequent phosphopeptides were fairly evenly distributed in fractions
collected throughout the remainder of the gradient (Fig. 4A). Based on this, we reoptimized the gradient with the intent to segregate those fractions that did not contain phosphopeptides into the flow-through and earliest fractions while providing optimal resolution of the phosphopeptide-containing fractions during the gradient elution (supplemental Fig. S1). At this point we considered whether it was more advantageous to perform HILIC fractionation before or after performing the IMAC enrichment. In trial experiments, we determined that fractionating before IMAC allowed us to eliminate a desalting and concentration step. The tryptic digest can be loaded directly onto the HILIC column (after adding acetonitrile), and the IMAC beads can be added directly to the HILIC fractions. A disadvantage of this approach is that it means performing 20⫹ IMAC captures. The advantage of fractionating after IMAC is that only a single capture step is needed. We prepared tryptic digests from HeLa cell lysates and fractionated one by HILIC, collecting 20 fractions (Fig. S1) and subjecting each to IMAC. A second identical aliquot of lysate was enriched by IMAC first, and the eluted peptides were fractionated by HILIC, again collecting 20 fractions (supplemental Fig. S1). One-third of each fraction (representing a 300-g equivalent of starting material) was analyzed by RPLC-MS/MS. The data from each set of fractions were merged and searched against a concatenated target/decoy database using forward and reverse versions of the human International Protein Index database. The results presented here for all three samples are based on a 1.3% false positive rate. Approximately the same total number of peptides (using 1.3% false positive rate) were identified in each experiment.
Molecular & Cellular Proteomics 7.5
975
HILIC Reduces the Complexity of the Phosphoproteome
FIG. 4. Overall elution behavior of phosphopeptides on HILIC. A, UV chromatogram showing the HILIC separation of a tryptic digest from HeLa whole cell lysate. Ten fractions from this sample were collected as shown and analyzed for phosphopeptide content after IMAC enrichment. Superimposed on the chromatogram is the elution position (dark gray bands) of seven synthetic phosphopeptides derived from the MAPK signaling pathway. B, sequences and properties of the tryptic phosphoepitopes from the MAPK signaling proteins. These peptides demonstrate broad compositional diversity, containing singly and multiply phosphorylated Ser(P), Thr(P), and Tyr(P) residues within a wide context of amino acid length, theoretical hydrophobic/hydrophilic character, and pI range. Yoshida (21) showed that hydrophilicity retention coefficients are roughly inverse to hydrophobicity retention coefficients. Thus the Bull and Breeze hydrophobicity index (B⫹B) may be used as an approximation of elution order in HILIC. The elution position of the MAPK pathway peptides (dark shading) and the distribution of phosphopeptides from the HeLa phosphoproteome (light shading) confirm that increased retention of phosphopeptides is a general phenomenon under HILIC conditions but demonstrate that phosphopeptides can be fractionated according to their hydrophilic character. pT, phosphothreonine; pS, phosphoserine; pY, phosphotyrosine; mAU, milliabsorbance units.
However, the composition of the peptides in the two experiments was quite different (Fig. 5, A and B). In the IMAC-HILIC experiment we identified 899 peptides of which 60% were phosphorylated (545). Among the phosphopeptides, 7.5% were identified as being multiply phosphorylated. As was expected, the non-phosphorylated peptides eluted in the early fractions and were followed by singly and then multiply phosphorylated peptides with the percentage of phosphopeptides in each fraction increasing with each subsequent fraction (Fig. 5A). From the HILIC-IMAC experiment we identified 820 total peptides of which greater than 99% were phosphorylated (814). Of the total pool of phosphopeptides, 9.4% were multiply phosphorylated. Phosphopeptides were more or less evenly distributed throughout the HILIC gradient with the multiply phosphorylated peptides eluting later (Fig. 5B). To help us understand the distribution of phosphopeptides in the two experiments we calculated the pI of each peptide2 and plotted the percentage of peptides that fell in single pI ranges (Fig. 5, C and D). This showed that, as expected, the IMAC-HILIC system strongly favored more acidic peptides with 61% of the phosphopeptides having an isoelectric point between 3.0 and 4.9. In the reverse system only 38% of all peptides had an isoelectric point in the same range. In absolute terms, however, the number of phosphopeptides in this 2 ProMoST: Protein Modification Screening Tool. We used a pKa value of 2.12 and 7.12 for the first and second ionizations, respectively, of phosphoserine and threonine residues.
976
Molecular & Cellular Proteomics 7.5
range was roughly similar for the two data sets with 290 for IMAC-HILIC and 314 for HILIC-IMAC. Because the total number of peptides and the total number of highly acidic phosphopeptides were more or less the same in both experiments, the difference in the results obtained from the two approaches would seem to be a function of the IMAC selectivity in each case. In the IMAC-HILIC experiment non-phosphorylated peptides accounted for 43% of the total, whereas in the reverse approach they accounted for less than 1%. When we looked at the distribution of isoelectric points for the IMAC-HILIC non-phosphorylated peptides, it was not surprising that 67% had a pI between 3.0 and 4.9. Again this is this consistent with the reported bias of IMAC for peptides containing multiple acidic amino acids. To counter this bias the use of methyl esterification to neutralize acid side chains has been promoted (14). What is surprising is the nearly absolute selectivity of the IMAC enrichment in the HILIC-IMAC protocol. However, this may be explained by the solvent system and retention mechanism of HILIC. First, the elution profile used in this work sends ⬃50% of the peptide mass (which contains little or no phosphopeptides) into the solvent front. Second, the acetonitrile/water solvent system for the TSKgel Amide-80 column contains 0.1% TFA, and the majority of phosphopeptides elute between 70 and 50% acetonitrile. Oda and co-workers (22) have shown that the best recovery and greatest selectivity for IMAC occurs when the loading buffer contains between 0.1 and 1.0% TFA and between 45 and 65% acetonitrile. Thus
HILIC Reduces the Complexity of the Phosphoproteome
FIG. 5. Comparison of phosphopeptides identified using IMAC-HILIC and HILIC-IMAC. 1 mg of HeLa cell tryptic digest was enriched by IMAC followed by fractionation of the eluate using HILIC (A) or fractionated by HILIC with IMAC enrichment of each individual pool (B). Non-phosphorylated (yellow), singly phosphorylated (red), and multiply phosphorylated (blue) peptides are shown. The distribution of theoretical pI values for peptides identified by IMAC-HILIC (C) and HILIC-IMAC (D) work flows is shown.
the majority of phosphopeptides were eluted in the ideal loading buffer for IMAC recovery and selectivity. Finally the hydrophilic separation of peptides balances the competition for IMAC binding sites. Generally speaking those amino acids with the greatest affinity for IMAC (Ser(P), Thr(P), Tyr(P), Asp, Glu, and His) are also among the most hydrophilic (Ser(P), Thr(P), Tyr(P) ⬎ His ⬎ Glu ⬎ Asp). Thus multiply phosphorylated peptides will be more hydrophilic than singly phosphorylated peptides and compete more effectively for IMAC binding sites. Furthermore peptides containing multiple acid residues will be more hydrophilic than those with fewer, and they will compete more effectively for IMAC binding sites. Finally histidine is a very hydrophilic amino acid residue, and histidine-containing peptides bind IMAC resins very efficiently. By fractionating the peptides according to their hydrophilic character we effectively created pools where the competition for IMAC binding sites among the population is fair. In this system for instance, a singly phosphorylated peptide with pI ⫽ 5.0 does not have to complete with a highly acidic non-phosphorylated peptide or a multiply phosphorylated peptide for IMAC binding sites. These later peptides will have been partitioned into separate HILIC pools based on their likely greater hydrophilicity. This is in marked contrast to the situation when performing IMAC on unfractionated mixtures. The bias of IMAC toward multiply phosphorylated peptides is well documented (14, 23, 24, 26). Even when using methyl esterification to neutralize acidic peptide binding, in an unfractionated sample, multiply phosphorylated peptides should out-complete equimolar amounts of singly phosphorylated
peptides for IMAC binding sites. However, in several large scale phosphoproteomics studies where samples have been partitioned using SCX and enriched with either IMAC (25) or TiO2 (9), multiply phosphorylated peptides still represented between 30 and 40% of the total. In our own work, we found ⬃10% multiply phosphorylated peptides. The distribution of multiply phosphorylated peptides in the HILIC-IMAC experiments shows that they tend to elute late in the HILIC gradient and make up between 20 and 50% of the late eluting fractions (Fig. 5B), whereas they are nearly absent in early fractions. These late eluting fractions have a solvent composition lower in acetonitrile (typically 5–30%) than those fractions that contain primarily singly phosphorylated peptides. It is possible that this has adversely affected the binding of multiply phosphorylated peptides to the IMAC resin (22). On the other hand, multiply phosphorylated peptides are likely to be extremely hydrophilic; thus it is also possible that the gradient needs to be extended and that additional fractions should be collected. Clearly additional work needs to be done to understand the distribution of highly clustered phosphorylation sites in the cellular proteome (see further discussion below). The lower percentage of multiply phosphorylated peptides in our work may also be a function of the low mass accuracy of the instrument used in these studies coupled with the strict false positive rate utilized in the data analysis. Multiply phosphorylated peptides are generally larger peptides with higher charge states and are prone to consecutive neutral loss of H3PO4. The combination of these three features results in much fewer analytically useful fragment ion spectra particularly when the experiment is performed in an ion trap. Com-
Molecular & Cellular Proteomics 7.5
977
HILIC Reduces the Complexity of the Phosphoproteome
TABLE I Summary of phosphopeptides identified Peptidesa
Phosphopeptidesa
Selectivity
820 899
814 545
99 60
Phosphopeptides
Sites NR
Unique peptidesd
914 607 144
1004 677 168
788 438 109
1665
1849
1335
Validatedb
NRc
1047 684 147 1878
Unique sites
%
HILIC-IMAC IMAC-HILIC IMAC Totals
1410
Based on a false positive rate of ⬍1.3%. b Includes peptides manually validated as described in the text. c NR, non-redundant peptides. We eliminated duplicate sequences including alternative charge states and sequence extensions. d Peptides not found in the other two data sets. a
pounding this situation is the fact that protein database search engines are much less efficient at matching phosphopeptide sequences. Indeed in a recently published large scale analysis of the HeLa cell phosphoproteome more than 50% of all multiply phosphorylated peptides had a Mascot ion score of less than 25 (9). In our own data set, multiply phosphorylated peptides constituted more than 40% of all peptides with Mascot ion scores between 15 and 25. However, in the absence of any other validating criteria, the false positive rates at these score thresholds are unacceptably high. An accurate mass assignment on the precursor is critical to providing confident identification of multiply phosphorylated sequences and other peptides with poor fragment ion spectra. The recent introduction of electron transfer dissociation (27) is also likely to play an important role in analyzing multiply phosphorylated sequences. Given the unique selectivity of HILIC for these types of peptides it will be interesting to evaluate the frequency of phosphorylation site clustering on short sequences when this mode of chromatography can be interfaced to an accurate mass instrument capable of performing electron transfer dissociation. Given the problems with assigning phosphopeptide spectra to correct sequences described above and the absence of an accurate mass measurement on the precursor in our experiments, we manually validated a subset of phosphopeptide identifications. Peptides with Mascot scores between the threshold for a 1.3% false positive rate and 25 were examined for the following features: neutral loss of H3PO4 from the precursor, contiguous series of three or more y or b ions, prominent y ion cleavage N-terminal to a proline, and no unexplained high mass fragment ions. Using these criteria to validate sequence matches with scores below the false positive rate threshold, we added an additional 233 phosphopeptides to the HILIC-IMAC result and 139 to the IMAC-HILIC result for a total of 1047 and 684, respectively. After removing redundancy we identified 1004 sites on 914 peptides in the HILIC-IMAC experiment and 677 sites on 607 peptides in the IMAC-HILIC experiment (Table I). For the sake of comparison, we enriched a third aliquot of HeLa digest with IMAC and analyzed the eluted peptides using a longer RPLC-MS gradient but without any fractionation by HILIC either before or
978
Molecular & Cellular Proteomics 7.5
after IMAC. Using the same database search and validation criteria we identified 144 unique phosphopeptides containing 168 sites. Clearly the advantages of simplifying the phosphopeptide pool are significant. The overlap among the three data sets was relatively low, ranging from 14 to 27%. Combining the peptides from all three experiments and removing the redundancy, our analysis identified 1335 unique phosphopeptides containing 1410 sites. A list of all nonredundant phosphopeptides and phosphorylation sites is found in the supplemental material in supplemental Table S1. We compared this data set with three large scale HeLa phosphoproteomics studies published recently by Beausoleil et al. (7, 8) and Olsen et al. (9) and found that only 16 and 41%, respectively, of the sites in our data were contained in the three literature sets. This result was somewhat surprising considering that the combined three published data sets contain almost 10,000 sites (excluding redundancy between the two groups). However, considering the different cellular perturbations used in the three studies and the very different selectivities in both the chromatography and the enrichment techniques (28) used in our work and the three published studies, it is very likely (and the data would seem to support it) that a unique subset of phosphopeptides was isolated in this study. Thus this work added over 700 novel phosphorylated sequences to the already extensive HeLa phosphoproteome. Conclusions—Since it was introduced in 1990 (17), the use of HILIC for the separation of peptides has gone largely ignored. With the exception of carbohydrates, glycoproteins, and histones (29 –31), HILIC has been used primarily for the analysis of polar metabolites. Recently Boersema et al. (32) described a zwitterionic HILIC system and used it in a 2D LC scheme for proteomics applications. The separations achievable on this HILIC material are highly dependent on buffer pH. In this work we used a TSKgel Amide-80 HILIC column and eluted peptides with a shallow inverse organic gradient of acetonitrile-water-TFA. We demonstrated that this is an appropriate chromatography system for 2D proteomics studies. Furthermore we found that the retention mechanism of HILIC makes it ideal for fractionating digested cell lysates for phosphoproteomics. Because phosphoamino acids are the most
HILIC Reduces the Complexity of the Phosphoproteome
hydrophilic of all amino acids, phosphopeptides are more strongly retained on HILIC columns than non-phosphorylated peptides. We took advantage of this by optimizing the HILIC gradient to send ⬃50% of the non-phosphorylated peptides to the solvent front. The remaining peptides were then fractionated into pools with each successive fraction being ever more hydrophilic. Because the acidic amino acids Glu and Asp, which bind IMAC efficiently, are also hydrophilic, this chromatography essentially fractionates peptides according to their ability to bind IMAC. In fractions where the components all have similar hydrophilic properties, phosphopeptides within these pools compete very effectively for IMAC binding sites. The end result of this is that for each fraction we found IMAC enrichment to be better than 99% selective. This was achieved without derivatization or using chemical modifiers to enhance selectivity. Furthermore because the HILIC mobile phase contains only acetonitrile, water, and TFA the fractions do not require desalting. The IMAC resin can be added directly to the HILIC fractions. It is very likely that TiO2 can be used just as effectively in this protocol with all the same advantages being observed. The highly selective enrichment provided by the HILICIMAC protocol also improved the efficiency of data-dependent MS-based sequencing in that nearly all of the events were triggered by phosphopeptides. In a 300-g equivalent of cell lysate we identified over 1000 unique phosphorylation sites in 20 50-min LC-MS/MS experiments. A number of recently published phosphoproteomics studies have combined SCX with either IMAC (25) or TiO2 (9, 33). It is clear from these studies and ours that having an efficient fractionation scheme in front of the enrichment step makes the mass spectrometer more efficient and increases the number of phosphorylation sites identified. However, the inherently superior resolution of HILIC and the unique characteristics that the HILIC separation mode brings to the combination of HILIC and the enrichment step, as demonstrated in this work, suggest that HILIC may be the ideal first dimension in a multidimensional chromatography system for phosphoproteomics. In these and most proteomics experiments, trypsin is used to enzymatically digest cell lysates. Given that many protein kinases have known basophilic tendencies, for phosphoproteomics it may be advantageous to use other enzymes. Electron transfer dissociation, which has been shown to be highly effective for sequencing phosphopeptides (27, 34), also benefits from the use of enzymes other than trypsin to produce longer peptides and peptides with higher charge states. The use of enzymes other than trypsin will produce many peptides with internal lysines and arginines. This will be especially important for sequences containing highly clustered phosphorylation sites that tend to reduce the overall charge. Enrichment and fractionation of these larger peptides by SCX is likely to be problematic because the clustering effect of tryptic phosphopeptides is eliminated. HILIC on the other hand may be ideally suited to the use of alternative enzymes because
lysine and arginine are very hydrophilic. Thus peptides containing multiple copies of these amino acids will be strongly retained by HILIC and can be partitioned according to their degree of hydrophilicity. Finally the simple enrichment provided by the optimized HILIC gradient described in this study may be ideal for the analysis of purified phosphoproteins. By eliminating ⬃50% of the non-phosphorylated peptides in the solvent front, datadependent LC-MS/MS analysis of the remaining gradient may identify phosphopeptides more efficiently without any other form of enrichment. We are currently working on interfacing nanobore HILIC columns to our existing mass spectrometers. We anticipate that HILIC will make a significant contribution to phosphoproteomics and the study of phosphorylation-dependent biological systems. * The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact. □ S The on-line version of this article (available at http://www. mcponline.org) contains supplemental material. ‡ To whom correspondence should be addressed. E-mail: roland_s_
[email protected]. REFERENCES 1. de Godoy, L. M., Olsen, J. V., de Souza, G. A., Li, G., Mortensen, P., and Mann, M. (2006) Status of complete proteome analysis by mass spectrometry: SILAC labeled yeast as a model system. Genome Biol. 6, R50 2. Horth, P., Miller, C. A., Preckel, T., and Wenz, C. (2006) Efficient fractionation and improved protein identification by peptide OFFGEL electrophoresis. Mol. Cell. Proteomics 5, 1968 –1974 3. Giorgianni, F., Desiderio, D. M., and Beranova-Giorgianni, S. (2003) Proteome analysis using isoelectric focusing in immobilized pH gradient gels followed by mass spectrometry. Electrophoresis 24, 253–259 4. Washburn, M. P., Wolters, D., and Yates, J. R., III (2001) Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat. Biotechnol. 19, 242–247 5. Lecchi, P., Gupte, A. R., Perez, R. E., Stockert, L. V., and Abramson, F. P. (2006) Size-exclusion chromatography in multidimensional separation schemes for proteome analysis. J. Biochem. Biophys. Methods 56, 141–152 6. Blagoev, B., Kratchmarova, I., Ong, S. E., Nielsen, M., Foster, L. J., and Mann, M. (2003) A proteomics strategy to elucidate functional proteinprotein interactions applied to EGF signaling. Nat. Biotechnol. 21, 315–318 7. Beausoleil, S. A., Jedrychowski, M., Schwartz, D., Elias, J. E., Villen, J., Li, J., Cohn, M. A., Cantley, L. C., and Gygi, S. P. (2004) Large-scale characterization of HeLa cell nuclear phosphoproteins. Proc. Natl. Acad. Sci. U. S. A. 101, 12130 –12135 8. Beausoleil, S. A., Villen, J., Gerber, S. A., Rush, J., and Gygi, S. P. (2006) A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nat. Biotechnol. 24, 1285–1292 9. Olsen, J. V., Blagoev, B., Gnad, F., Macek, B., Kumar, C., Mortensen, P., and Mann, M. (2006) Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell 127, 635– 648 10. Nuwaysir, L. M., and Stults, J. T. (1993) Electrospray ionization mass spectrometry of phosphopeptides isolated by on-line immobilized metalion affinity chromatography. J. Am. Soc. Mass Spectrom. 4, 662– 667 11. Posewitz, M. C., and Tempst, P. (1999) Immobilized gallium(III) affinity chromatography of phosphopeptides. Anal. Chem. 71, 2883–2892 12. Pinkse, M. W., Uitto, P. M., Hilhorst, M. J., Ooms, B., and Heck, A. J. (2006) Selective isolation at the femtomole level of phosphopeptides from proteolytic digests using 2D-NanoLC-ESI-MS/MS and titanium oxide precolumns. Anal. Chem. 76, 3935–3943
Molecular & Cellular Proteomics 7.5
979
HILIC Reduces the Complexity of the Phosphoproteome
13. Larsen, M. R., Thingholm, T. E., Jensen, O. N., Roepstorff, P., and Jorgensen, T. J. (2005) Highly selective enrichment of phosphorylated peptides from peptide mixtures using titanium dioxide microcolumns. Mol. Cell. Proteomics 4, 873– 886 14. Ficarro, S. B., McCleland, M. L., Stukenberg, P. T., Burke, D. J., Ross, M. M., Shabanowitz, J., Hunt, D. F., and White, F. M. (2002) Phosphoproteome analysis by mass spectrometry and its application to Saccharomyces cerevisiae. Nat. Biotechnol. 20, 301–305 15. Kim, J. E., Tannenbaum, S. R., and White, F. M. (2005) Global phosphoproteome of HT-29 human colon adenocarcinoma cells. J. Proteome Res. 4, 1339 –1346 16. Yang, F., Stenoien, D. L., Strittmatter, E. F., Wang, J., Ding, L., Lipton, M. S., Monroe, M. E., Nicora, C. D., Gristenko, M. A., Tang, K., Fang, R., Adkins, J. N., Camp, D. G., II, Chen, D. J., and Smith, R. D. (2006) Phosphoproteome profiling of human skin fibroblast cells in response to low and high-dose irradiation. J. Proteome Res. 5, 1252–1260 17. Alpert, A. J. (1990) Hydrophilic-interaction chromatography for the separation of peptides, nucleic acids and other polar compounds. J. Chromatogr. 499, 177–196 18. Yoshida, T., and Okada, T. (1999) Peptide separation in normal-phase liquid chromatography: study of selectivity and mobile phase effects on various columns. J. Chromatogr. A 840, 1–9 19. Gilar, M., Olivova, P., Daly, A. E., and Gebler, J. C. (2005) Orthogonality of separation in two-dimensional liquid chromatography. Anal. Chem. 77, 6426 – 6434 20. Muszynska, G., Dobrowolska, G., Medin, A., Ekman, P., and Porath, J. O. (1992) Interaction of immobilized iron(III) ions with phosphorylated amino acids, peptides and proteins. J. Chromatogr. 604, 19 –28 21. Yoshida, T. (1998) Calculation of peptide retention coefficients in normalphase liquid chromatography. J. Chromatogr. A 808, 105–112 22. Kokubu, M., Ishihama, Y., Sato, T., Nagasu, T., and Oda, Y. (2006) Specificity of immobilized metal affinity-based IMAC/C18 tip enrichment of phosphopeptides for protein phosphorylation analysis. Anal. Chem. 77, 5144 –5154 23. Ndassa, Y. M., Orsi, C., Marto, J. A., Chen, S., and Ross, M. M. (2006) Improved immobilized metal affinity chromatography for large-scale phosphoproteomics applications. J. Proteome Res. 5, 2789 –2799 24. Nuhse, T. S., Stensballe, A., Jensen, O. N., and Peck, S. C. (2006) Largescale analysis of in vivo phosphorylated membrane proteins by immo-
980
Molecular & Cellular Proteomics 7.5
25.
26.
27.
28.
29.
30.
31.
32.
33.
34.
bilized metal ion affinity chromatography and mass spectrometry. Mol. Cell. Proteomics 2, 1234 –1243 Villen, J., Beausoleil, S. A., and Gerber, S. A. (2006) Large-scale phosphorylation analysis of mouse liver. Proc. Natl. Acad. Sci. U. S. A. 104, 1488 –1493 Thingholm, T. E., Jensen, O. N., Robinson, P. J., and Larsen, M. R. (November 26, 2007) SIMAC, a phosphoproteomics strategy for the rapid separation of monophosphorylated from multiply phosphorylated peptides. Mol. Cell. Proteomics 10.1074/mcp.M700362-MCP200 Syka, J. E., Coon, J. J., Schroeder, M. J., Shabanowitz, J., and Hunt, D. F. (2004) Peptide and protein sequence analysis by electron transfer dissociation mass spectrometry. Proc. Natl. Acad. Sci. U. S. A. 101, 9528 –9533 Bodenmiller, B., Mueller, L. N., Mueller, M., Domon, B., and Aebersold, R. (2006) Reproducible isolation of distinct, overlapping segments of the phosphoproteome. Nat. Methods 4, 231–237 Karlsson, G., Winge, S., and Sandberg, H. (2005) Separation of monosaccharides by hydrophilic interaction chromatography with evaporative light scattering detection. J. Chromatogr. A 1092, 246 –249 Takegawa, Y., Deguchi, K., Ito, H., Keira, T., Nakagawa, H., and Nishimura, S. (2006) Simple separation of isomeric sialylated N-glycopeptides by a zwitterionic type of hydrophilic interaction chromatography. J. Sep. Sci. 29, 2533–2540 Garcia, B. A., Pesavento, J. J., Mizzen, C. A., and Kelleher, N. L. (2007) Pervasive combinatorial modification of histone H3 in human cells. Nat. Methods 4, 487– 489 Boersema, P. J., Divecha, N., Heck, A. J., and Mohammed, S. (2007) Evaluation and optimization of ZIC-HILIC-RP as an alternative MudPIT strategy. J. Proteome Res. 6, 937–946 Pinkse, M. W. H., Mohammed, S., Gouw, J. W., van Breukelen, B., Vos, H. R., and Heck, A. J. R. (2008) Highly robust, automated, and sensitive, online TiO2-based phosphoproteomics applied to study endogenous phosphorylation in Drosophila melanogaster. J. Proteome Sci. 7, 687– 697 Chi, A., Huttenhower, C., Geer, L. Y., Coon, J. J., Syka, J. E., Bai, D. L., Shabanowitz, J., Burke, D. J., Troyanskaya, O. G., and Hunt, D. F. (2007) Analysis of phosphorylation sites on proteins from Saccharomyces cerevisiae by electron transfer dissociation (ETD) mass spectrometry. Proc. Natl. Acad. Sci. U. S. A. 104, 2193–2198