combining quantitative proteomics data processing workflows ... - Nature

brief communications

© 2011 Nature America, Inc. All rights reserved.

Combining quantitative proteomics data processing workflows for greater sensitivity

Niklaas Colaert1,2, Christophe Van Huele3, Sven Degroeve1,2, An Staes1,2, Joël Vandekerckhove1,2, Kris Gevaert1,2 & Lennart Martens1,2

Here we describe a normalization method to combine quantitative proteomics data. By merging the output of two popular quantification software packages, we obtained a 20% increase (on average) in the number of quantified human proteins without suffering from a loss of quality. Our integrative workflow is freely available through our user-friendly, open-source Rover software (http://compomics-rover.googlecode.com/).

Improvements in instrumentation1, quantitative methodologies2 and software2,3 have led to the routine application of quantitative proteomics analyses. Yet analysis of a dataset using different quantification algorithms (for example, MaxQuant4, Mascot Distiller (Matrix Science; referred to as Distiller), MsQuant5 and Census6) yields only partial overlap7, resembling the imperfect overlap for identifications from different search engines8–10. Here we show that these discrepancies can be used to increase the sensitivity of the quantification without suffering from a corresponding loss in specificity.

We analyzed two sets of experiments using MaxQuant and Distiller, both algorithms chosen for their popularity and overall performance7. Both datasets used cells prepared by stable-isotope labeling with amino acids in cell culture (SILAC)11. We prepared dataset 1 in-house using methionine combined fractional diagonal chromatography12, and it consists of two label-swapped experiments (designated experiments A and B; Online Methods). We obtained dataset 2 from the public domain, from a study analyzing the effect of DNA and histone methylation on nucleosome-interacting proteins13 (Online Methods). We performed identification in an identical way for each quantification algorithm such that differences in quantification results were only influenced by differences in peak-list generation and peptide quantification between the workflows (Online Methods).

The impact of these differences is shown in the many peptide identifications and quantifications unique to a single workflow (Fig. 1a,b for experiment A of dataset 1; Supplementary Fig. 1 for experiment B of dataset 1; and Supplementary Data 1 for the entire dataset 2). Using set theory nomenclature, we refer to workflow-specific identifications and quantifications as ‘complement’, and to shared identifications and quantifications as ‘intersection’ identifications and quantifications. We used two parameters to ascertain the quality of complement results: light and heavy isotope intensities, and peptide Mascot search engine (referred to as Mascot) ion score. The intensities of complement peptides were only slightly smaller than those of shared (intersection) peptides (Fig. 1c,d). Whereas we observed no difference in ion score for Distiller, the MaxQuant complement departed from the intersection distribution (Fig. 1e,f), which was likely due to MaxQuant’s use of a posterior error probability4. MaxQuant complement peptides also had worse posterior error probability values than intersection peptides (Fig. 1g), implying slightly lower overall quality for complement identifications in MaxQuant.

Despite the better correlation between MaxQuant and Distiller for protein analysis compared to peptide analysis (Fig. 1h,i), each algorithm retained a substantial set of complement proteins. Combining the results from both tools thus has three potential benefits: (i) intersection peptides are quantified with increased confidence through corroboration; (ii) complement peptides for intersection proteins are quantified with strengthened confidence; and (iii) complement peptides yield complement proteins, thus extending the number of proteins quantified. We therefore combined the quantification results by each algorithm for each dataset separately, after correcting for heterogeneity of variance and scale normalization14 (Fig. 2, Supplementary Fig. 2, Supplementary Data 1 and Online Methods).

The merged results yielded more quantified proteins (20% on average; Supplementary Table 1) based on the complement peptide quantifications contributed by each workflow. These comprised three distinct groups of proteins: proteins supported exclusively by intersection peptides (intersection proteins), proteins supported exclusively by complement peptides (complement proteins), and proteins found as a result of the combination of workflows (combination proteins). We analyzed the quantification quality of these three groups by comparing the distributions of s.d. of peptide ratios per protein (Fig. 2i), showing that the combined dataset does not suffer from a loss of quality. Furthermore, protein coverage increased for the merged results owing to complement peptides (Fig. 2j). With more data points for protein quantification, the analysis was more robust, yielding greater statistical power and more sensitive quality control metrics.
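The set nomenclature used above can be illustrated with a minimal Python sketch. It is not the Rover implementation: the function names and the input layout (peptide identifiers per workflow, plus a peptide-to-protein map) are hypothetical, and the reading of a ‘combination protein’ as one supported by both intersection and complement peptides is an assumption made for illustration.

```python
from collections import defaultdict


def classify_peptides(workflow_a, workflow_b):
    """Split peptide identifiers into intersection and complement sets."""
    a, b = set(workflow_a), set(workflow_b)
    return {
        "intersection": a & b,   # quantified by both workflows
        "complement_a": a - b,   # quantified only by workflow A
        "complement_b": b - a,   # quantified only by workflow B
    }


def classify_proteins(peptide_to_protein, groups):
    """Label each protein by the kind of peptide evidence supporting it.

    Assumption: a protein with mixed evidence (intersection and
    complement peptides) is labelled a 'combination' protein.
    """
    evidence = defaultdict(set)
    for peptide, protein in peptide_to_protein.items():
        kind = "intersection" if peptide in groups["intersection"] else "complement"
        evidence[protein].add(kind)

    labels = {}
    for protein, kinds in evidence.items():
        if kinds == {"intersection"}:
            labels[protein] = "intersection"
        elif kinds == {"complement"}:
            labels[protein] = "complement"
        else:
            labels[protein] = "combination"
    return labels
```

With two workflows reporting peptides `p1..p3` and `p2..p4`, the peptides `p2` and `p3` fall in the intersection, while `p1` and `p4` are the respective complements.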

1Department of Medical Protein Research, Vlaams Instituut voor Biotechnologie (VIB), Ghent, Belgium. 2Department of Biochemistry, Ghent University, Ghent, Belgium. 3Department of Management Information and Operations Management, Ghent University, Ghent, Belgium. Correspondence should be addressed to L.M. ([email protected]).

Received 9 August 2010; accepted 13 April 2011; published online 8 MAY 2011; doi:10.1038/nmeth.1604

nature methods  |  VOL.8  NO.6  |  JUNE 2011  |  481

We also used this method to merge results from the same workflow across replicate analyses, yielding high-quality quantifications of (on average) 19.2% more proteins by simultaneous analysis of label-swapped replicates as performed in both datasets (Supplementary Table 2).

In conclusion, our algorithm provides a straightforward method to improve the number of quantifiable proteins by an average of 20% without loss of quality, by combining the results obtained by different quantitative software packages (Rover currently reads results from MaxQuant, Distiller, MsQuant and Census) or by combining forward and reverse replicates in a single analysis. Currently Rover supports only mass spectrometry-based experiments using stable-isotope peptide-level labels, but we plan to add support for other quantitative proteomics platforms. We fully implemented the algorithm in a user-friendly graphical interface in our freely available, open-source Rover application15. Rover is available at http://compomics-rover.googlecode.com/.

Figure 1 | Analysis of different quantification workflows for dataset 1, experiment A. (a,b) Number of complement and intersection peptide and protein identifications found in both MaxQuant and Distiller workflows. (c,d) Scatter plot of the light (L) and heavy (H) log-transformed peptide intensities for Distiller (c) and MaxQuant (d); overlapping regions are shown in brown; number of data points was 10,197 (c) and 7,721 (d). (e,f) Distribution of the Mascot ion scores for complement and intersection peptides for Distiller (e) and MaxQuant (f). (g) MaxQuant-specific log-transformed posterior error probability (PEP) scores versus Mascot ion score. (h,i) Correlation between the average base two logarithms of the peptide ratios for the intersection peptide (h) and protein (i) ratios; number of data points was 5,895 (h) and 1,304 (i).

Figure 2 | Quality control of the combined results from multiple data-processing workflows for dataset 1, experiment A. (a–f) Median absolute deviation (MAD) values calculated for different intensity windows and across iterations for MaxQuant (a) and Distiller (d). Distributions of the differences in s.d. between the protein-specific peptide ratios for MaxQuant (b) and Distiller (e) after and before normalization. Distributions for the s.d. of protein-specific peptide ratios for complement and intersection proteins for MaxQuant (c) and Distiller (f). (g) Overlay of the normalization data from a and d on a single plot. (h) Distribution of differences in s.d. between the protein-specific peptide ratios after and before normalization. (i) Distributions for s.d. of protein-specific peptide ratios for complement, combined and intersection proteins. (j) Difference in protein coverage between merged results and separate workflows for intersection proteins.

Methods
Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturemethods/.

Note: Supplementary information is available on the Nature Methods website.


Acknowledgments
C.V.H. is supported by a grant of the Research Foundation–Flanders (project 3G003908). We thank B. Ghesquière, F. Impens and E. Timmerman for providing prepublication access during algorithm development to their now published data. We acknowledge the support of Ghent University (Multidisciplinary Research Partnership “Bioinformatics: from nucleotides to networks”) and EU 7th Framework Programme (contract 262067-PRIME-XS). K.G. and J.V. acknowledge funding from the Fund for Scientific Research–Flanders (Belgium) (project G.0042.07), the Concerted Research Actions (project BOF07/GOA/012) from Ghent University and the Interuniversity Attraction Poles (IUAP06).

AUTHOR CONTRIBUTIONS
N.C. developed the combination algorithm and wrote the first draft of the manuscript. C.V.H. contributed to algorithm development and to manuscript writing. S.D. contributed to algorithm development and manuscript writing. A.S. assisted with data processing and manuscript writing. J.V. and K.G. supervised part of the work and contributed to manuscript writing. L.M. supervised the work, contributed to algorithm development and wrote the manuscript.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Published online at http://www.nature.com/naturemethods/. Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

1. Domon, B. & Aebersold, R. Science 312, 212–217 (2006).
2. Vaudel, M., Sickmann, A. & Martens, L. Proteomics 10, 650–670 (2010).
3. Mueller, L.N., Brusniak, M., Mani, D.R. & Aebersold, R. J. Proteome Res. 7, 51–61 (2008).
4. Cox, J. & Mann, M. Nat. Biotechnol. 26, 1367–1372 (2008).
5. Mortensen, P. et al. J. Proteome Res. 9, 393–403 (2010).
6. Park, S.K., Venable, J.D., Xu, T. & Yates, J.R. III. Nat. Methods 5, 319–322 (2008).
7. Colaert, N., Vandekerckhove, J., Martens, L. & Gevaert, K. in Methods in Molecular Biology: Gel-Free Proteomics: Methods and Protocols (eds. Gevaert, K. and Vandekerckhove, J.) (Humana Press; in the press).
8. Yu, W. et al. Proteomics 10, 1172–1189 (2010).
9. Jones, A.R., Siepen, J.A., Hubbard, S.J. & Paton, N.W. Proteomics 9, 1220–1229 (2009).
10. Searle, B.C., Turner, M. & Nesvizhskii, A.I. J. Proteome Res. 7, 245–253 (2008).
11. Ong, S.-E. et al. Mol. Cell. Proteomics 1, 376–386 (2002).
12. Gevaert, K. et al. Mol. Cell. Proteomics 1, 896–903 (2002).
13. Bartke, T. et al. Cell 143, 470–484 (2010).
14. Yang, Y.H. et al. Nucleic Acids Res. 30, E15 (2002).
15. Colaert, N., Helsens, K., Impens, F., Vandekerckhove, J. & Gevaert, K. Proteomics 10, 1226–1229 (2010).


ONLINE METHODS

Samples. For experiments A and B (dataset 1), human neuroblastoma (SHEP) cells were grown in DMEM with either [13C6]lysine and [13C6]arginine (hereafter referred to as the ‘C13 sample’) or light, natural [12C6]lysine and [12C6]arginine (hereafter referred to as the ‘C12 sample’). To reduce arginine-to-proline conversion, the arginine concentration was lowered to 30% of the DMEM concentration of arginine (25 mg l−1). Cells were collected using Versene-EDTA and washed with PBS (pH 7.4). Cells were lysed in 250 µl lysis buffer (0.8% CHAPS in 50 mM HEPES (pH 7.4), 100 mM NaCl and 0.5 mM EDTA supplemented with a complete protease inhibitor cocktail (Roche, 1/100 ml)) for 20 min on ice. Lysates were cleared by centrifugation for 15 min at 16,000g at 4 °C. Pellets were discarded, and protein concentration was measured in the supernatant using the Bradford method. In experiment A, a component that introduces differences in protein abundances (unpublished data) was added to the C13 sample, after which equal protein amounts of the C12 sample and C13 sample were mixed and analyzed by methionine combined fractional diagonal chromatography (COFRADIC) as described previously12. In experiment B, another component that introduces differences in protein abundances (unpublished data) was added to the C12 sample, after which equal protein amounts of the C13 sample and C12 sample were mixed and analyzed by methionine COFRADIC.

Experiment C–L data (dataset 2) had been analyzed in the paper studying nucleosome-interacting proteins13. The raw data files were downloaded via Tranche (Tranche hash: 1eQxbPKQic2S BitXEPSTIDXIjvVxvox1fQFDT00ALGHDSIKgjxbn5hi2XIspd KP5aDKb4dsaomPrhayZBYt3kg/SBhcAAAAAAACAPA = = ). Experimental details are available in reference 13.

Liquid chromatography–tandem mass spectrometry analysis. For experiments A and B, liquid chromatography–tandem mass spectrometry (LC-MS/MS) analysis was performed using an Ultimate 3000 HPLC system (Dionex) in-line connected to an LTQ-Orbitrap XL mass spectrometer (Thermo Fisher Scientific). Peptides were first trapped on a trapping column (PepMap C18 column, 0.3 mm internal diameter × 5 mm (Dionex)) and, after back-flushing from this trapping column, the sample was loaded on a 75 µm internal diameter × 150 mm reverse-phase column (PepMap C18, Dionex). Peptides were eluted with a linear gradient increasing by 1.8% solvent B (0.05% formic acid in 2:8 water:acetonitrile (v/v)) per minute at a constant flow rate of 300 nl min−1. The mass spectrometer was operated in data-dependent mode, automatically switching between MS and MS/MS acquisition for the six most abundant ion peaks per MS spectrum. Full-scan MS spectra were acquired at a target intensity value of 10^5 ion counts with a resolution of 30,000. The six most intense ions were then isolated for fragmentation in the linear ion trap. In the LTQ, MS/MS scans were recorded in profile mode at a target value of 5,000. Peptides were fragmented after filling the ion trap with a maximum ion time of 10 ms and a maximum intensity of 10^4 ion counts.

Tandem mass spectrometry data generation and identification. Both MaxQuant (version 1.0.13.13)4 and Mascot Distiller Quantitation toolbox version 2.3 (Matrix Science) were used to extract MS/MS spectra for the experiments with default parameters. Peptide identification was performed in the same way for both workflows. MS/MS spectra were identified using a locally installed version of the Mascot database search engine version 2.2.06 (Matrix Science)16.

Peptide identification parameters for experiments A and B. Peptide mass tolerance was set at 10 p.p.m., and peptide fragment mass tolerance was set at 0.5 Da. Trypsin/P was set as the protease, allowing for one missed cleavage. Variable modifications were pyroglutamate formation of N-terminal glutamine, pyrocarbamidomethyl cysteine formation of N-terminal S-alkylated cysteine and acetylation of the alpha N terminus. Fixed modifications were S-carbamidomethyl cysteine and methionine oxidation (sulfoxide form). Quantification was set to “SILAC Arg Lys +6” only for the Mascot Distiller quantification workflow; this allows Mascot to identify arginine- and lysine-containing peptides in their heavy form.

Peptide identification parameters for experiments C–L. Peptide mass tolerance was set at 10 p.p.m., and peptide fragment mass tolerance was set at 0.5 Da. Trypsin/P was set as the protease, allowing for one missed cleavage. Variable modifications were methionine oxidation (sulfoxide form) and acetylation of the alpha N terminus. The fixed modification was S-carbamidomethyl cysteine. Quantification was set to “SILAC Lys +8 Arg +10” only for the Mascot Distiller quantification workflow; this allows Mascot to identify arginine- and lysine-containing peptides in their heavy form.

In the MaxQuant workflow, the parameters for identifying MS/MS spectra are slightly different from those in the Mascot Distiller workflow. This is because the MaxQuant algorithm divides all MS/MS spectra into three categories (for double SILAC mode) and thus into three separate files.
A first file contained MS/MS spectra that could not be linked to a SILAC event (a SILAC event is defined as two ion envelopes in an MS spectrum with a specific mass difference corresponding to that of the essential amino acids used for SILAC; here this difference was 6 Da for experiments A and B and 8 Da or 10 Da for experiments C–L (not including missed cleavage events)). The second file contained MS/MS spectra that were linked to the ‘light-ion envelope’ in a SILAC event, and the third file contained spectra linked to the ‘heavy-ion envelope’. MaxQuant automatically generates three parameter files that incorporate the general parameters described earlier and that can be used for every corresponding type of file containing MS/MS spectra. The incorporation of the heavy form of an arginine or lysine was set (by MaxQuant) as a variable modification for the MS/MS spectra that were not linked to a SILAC event, and as a fixed modification for the MS/MS spectra that were matched to the heavy peptide in a SILAC event. No extra modifications were added to the identification process of MS/MS spectra linked to the light peptide in a SILAC event.

MaxQuant requires a species-specific protein database for identification of MS/MS spectra and also requires the addition of reversed protein sequences to calculate a false discovery rate. A reversed database concatenated with the human Swiss-Prot database (version 15.14 (experiments A and B) and 2010.11 (experiments C–L) of UniProtKB/Swiss-Prot protein database) was created by the SequenceReverser program, which is part of the MaxQuant installation. This database was also used in the Mascot Distiller peptide identification process.

Protein identifications in both workflows were only accepted if matched by two or more distinct peptides, of which at least one was unique for the protein.
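The protein acceptance rule stated above (two or more distinct peptides, at least one of them unique to the protein) can be sketched as follows. This is a minimal illustration, not code from MaxQuant, Mascot Distiller or Rover; the function name and the input layout (a map from peptide sequence to the set of proteins it matches) are hypothetical.

```python
from collections import defaultdict


def accepted_proteins(peptide_to_proteins):
    """Return proteins matched by >= 2 distinct peptides, >= 1 of them unique.

    peptide_to_proteins maps each distinct peptide sequence to the set of
    protein accessions it matches (hypothetical input layout).
    """
    by_protein = defaultdict(set)
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            by_protein[protein].add(peptide)

    accepted = set()
    for protein, peptides in by_protein.items():
        # A peptide is unique when it maps to exactly one protein.
        has_unique = any(len(peptide_to_proteins[p]) == 1 for p in peptides)
        if len(peptides) >= 2 and has_unique:
            accepted.add(protein)
    return accepted
```

A protein supported by a single peptide, or only by peptides shared with other proteins, is rejected under this rule.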


Peptide quantification. The peptide quantification of MaxQuant was performed with the ‘quantify’ program of the MaxQuant software suite with default parameters. Mascot Distiller quantification was performed with the following parameters. The extracted ion chromatogram (XIC) threshold was set from 0.1 to 0.3, XIC smooth was set from 3 to 1 and maximum XIC width was set to 250. The correlation score, a quantification quality parameter, was set from 0.7 to 0.9.

Scale normalization. As both tools use different algorithms, parameters and settings, the results show distinct distributions of peptide ratios (Fig. 1h) that cannot be directly combined. All normalization steps were performed on log2(ratio) values. First, the location of the distributions was centered on a common value, typically zero (implying one-to-one mixtures). This was achieved by shifting all peptide ratios ($X_i$) by the distribution median:

$$X_i^{\mathrm{normalized}} = X_i - \operatorname{median}_j(X_j)$$

Second, the distribution scales were normalized. Here we used the median absolute deviation (MAD) as a nonparametric scale estimator:

$$\mathrm{MAD} = \operatorname{median}_i\left(\left|X_i^{\mathrm{normalized}} - \operatorname{median}_j\left(X_j^{\mathrm{normalized}}\right)\right|\right)$$

Rather than calculate a global MAD, peptide ratios were first sorted into distinct groups because their distribution suffers from heterogeneity of variance, derived from the dependence of the variance on the MS intensity of the peptide (Fig. 2a,d, Supplementary Fig. 2a,d and Supplementary Data 1). Distinct group-specific MAD ($\mathrm{MAD}_g$) values were therefore determined across defined ranges of MS intensities by sorting the ratios by MS intensity and then iteratively using a sliding window to calculate an MAD for the windowed range. A sliding window or frame consisted of two parts. One part, the sliding frame center, consisted of the ratios that were corrected once the MAD and rescaling factor had been calculated. The other part was the sliding frame border. These borders consisted of the ratio values found before and after the ratios from the sliding frame center in the intensity-sorted ratio list. The sliding frame border ratios were added to the sliding frame center ratios for the calculation of the MAD. This created a partial overlap in the sliding windows that served to smooth out differences in consecutive MADs.

By iteratively repeating this scale normalization process, the s.d. of the different MADs becomes progressively smaller, halting when a preset convergence criterion is reached. This criterion was set using the absolute difference of the coefficient of variation (CV), defined as

$$\mathrm{CV} = \frac{\sigma_{\mathrm{MAD\ values}}}{\mu_{\mathrm{MAD\ values}}},$$

in which σ is the s.d. and µ is the mean of the MAD values. Iteration halted when the absolute difference between the CV before and after a scale normalization cycle became equal to or smaller than 0.0005. To minimize the effect of the sliding frame, the size of the frame border was multiplied by a specific value after each normalization cycle.

Different sizes for the sliding frame center (150, 100, 50, 25, 15 and 10) and for the sliding frame border (10, 20, 30 and so on to 390) were tested to determine optimal settings. Different values for the border size multiplication factor were examined (0.1, 0.2, 0.3 and so on to 2.5). All these different values were tested for both experiments in the MaxQuant workflow. The final result for a combination of these values was the average difference between the protein-specific ratio s.d. after and before normalization. If this value is smaller than zero, the s.d. after normalization was on average smaller than before. Better combinations of parameters would thus have smaller average differences. The calculated averages are plotted as a function of the frame center and border size, with values for the same multiplication factor connected by a line (Supplementary Fig. 3). The different lines for the cycle multiplication factor converge between sliding frame border size 100 and 200. Nevertheless, multiplication factors smaller than one generated a better result than multiplication factors equal to or larger than one (Supplementary Fig. 3). From sliding frame border size 200 onwards, the averages seemed to stabilize. Not much difference was seen in the size of the sliding frame center. The final values used were: sliding frame center of 50, sliding frame border of 200 and cycle multiplication factor of 0.6. The size of the set from which an MAD value was calculated was thus 50 + 2 × 200 = 450 in the first cycle and 50 + 2 × (200 × (1/0.6)) ≈ 716 in the second cycle.

Each $\mathrm{MAD}_g$ value was then normalized using the geometric mean of all $G$ calculated MADs, yielding the scale normalization factor $\hat{a}_g$:

$$\hat{a}_g = \frac{\mathrm{MAD}_g}{\sqrt[G]{\prod_{g'=1}^{G} \mathrm{MAD}_{g'}}}$$
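The centering, windowed MAD, geometric-mean scaling and CV-based stopping rule described above can be sketched in Python with NumPy. This is a hedged illustration under stated assumptions, not the Rover implementation: the function names and `max_cycles` are hypothetical; frame centers are taken as consecutive non-overlapping blocks of the intensity-sorted ratios; the border is resized per cycle as `border / factor`, following the worked arithmetic 200 × (1/0.6); the CV is compared between consecutive cycles; and all MADs are assumed to be positive.

```python
import numpy as np


def window_mads(sorted_ratios, center, border):
    """MAD per sliding frame; input must already be sorted by MS intensity."""
    n = len(sorted_ratios)
    mads = []
    for start in range(0, n, center):
        lo = max(0, start - border)               # border before the center
        hi = min(n, start + center + border)      # border after the center
        window = sorted_ratios[lo:hi]
        mads.append(np.median(np.abs(window - np.median(window))))
    return np.array(mads)


def scale_normalize(log_ratios, intensities, center=50, border=200,
                    factor=0.6, tol=0.0005, max_cycles=20):
    """Center log2 ratios on zero, then rescale intensity-dependent spread."""
    order = np.argsort(intensities)
    x = np.asarray(log_ratios, dtype=float)[order]
    x = x - np.median(x)                          # location normalization
    prev_cv = None
    for _ in range(max_cycles):                   # max_cycles: safety cap (assumption)
        mads = window_mads(x, center, int(round(border)))
        cv = mads.std() / mads.mean()             # CV of the MAD values
        if prev_cv is not None and abs(prev_cv - cv) <= tol:
            break                                 # convergence criterion met
        geo_mean = np.exp(np.mean(np.log(mads)))  # assumes every MAD > 0
        scale = mads / geo_mean                   # one factor per frame
        x = x / np.repeat(scale, center)[: len(x)]
        prev_cv = cv
        border = border / factor                  # assumption: 200 * (1/0.6)
    out = np.empty_like(x)
    out[order] = x                                # restore the input order
    return out
```

With the reported settings, the first cycle uses frames of 50 + 2 × 200 = 450 ratios and the second, after rounding the border to 333, uses 50 + 2 × 333 = 716 ratios, in line with the sizes quoted in the text.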

By dividing all peptide ratios in a group by its scale normalization factor, each group was effectively scale-normalized.

To evaluate the performance and impact of this normalization algorithm, the s.d. of the peptide ratios linked to an individual protein was used as a quality-control parameter. If this s.d. increased after normalization, the normalization has introduced unwanted variation amongst the peptides. Conversely, a decreasing s.d. implies more consistent results across the peptides after normalization (Fig. 2b,e, Supplementary Fig. 2b,e and Supplementary Data 1). Both distributions were centered on zero, showing that the normalization process did not affect the overall quality of the data. Furthermore, plotting these s.d. separately for complement proteins and intersection proteins showed that the distributions were similar (Fig. 2c,f, Supplementary Fig. 2c,f and Supplementary Data 1). Complement peptides were therefore of the same overall quality as intersection peptides when considered in the context of the protein to which they map.

16. Perkins, D.N., Pappin, D.J., Creasy, D.M. & Cottrell, J.S. Electrophoresis 20, 3551–3567 (1999).
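The quality-control parameter used in the scale normalization section, the s.d. of peptide ratios per protein compared after and before normalization, can be sketched as follows. The function names and input layout are hypothetical, and the use of the sample s.d. (`ddof=1`) restricted to proteins with at least two peptide ratios is an assumption; the source does not specify the estimator.

```python
from collections import defaultdict

import numpy as np


def protein_sd(log_ratios, proteins):
    """Sample s.d. of peptide log2 ratios per protein (>= 2 ratios needed)."""
    grouped = defaultdict(list)
    for ratio, protein in zip(log_ratios, proteins):
        grouped[protein].append(ratio)
    return {p: float(np.std(v, ddof=1)) for p, v in grouped.items() if len(v) >= 2}


def sd_difference(before, after, proteins):
    """Per-protein s.d. after minus before; negative values mean tighter ratios."""
    sd_before = protein_sd(before, proteins)
    sd_after = protein_sd(after, proteins)
    return {p: sd_after[p] - sd_before[p] for p in sd_before}
```

A distribution of these differences centered on zero corresponds to the behavior reported for Fig. 2b,e.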
