DOI 10.1002/pmic.201400323
Proteomics 2015, 15, 964–980
REVIEW
Processing strategies and software solutions for data-independent acquisition in mass spectrometry
Aivett Bilbao1,2, Emmanuel Varesio2, Jeremy Luban3, Caterina Strambio-De-Castillia3, Gérard Hopfgartner2∗, Markus Müller1,4 and Frédérique Lisacek1,4
1 Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland
2 Life Sciences Mass Spectrometry, School of Pharmaceutical Sciences, University of Geneva, University of Lausanne, Geneva, Switzerland
3 Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA, USA
4 Faculty of Sciences, University of Geneva, Geneva, Switzerland
Data-independent acquisition (DIA) offers several advantages over data-dependent acquisition (DDA) schemes for characterizing complex protein digests analyzed by LC-MS/MS. In contrast to the sequential detection, selection, and analysis of individual ions during DDA, DIA systematically parallelizes the fragmentation of all detectable ions within a wide m/z range regardless of intensity, thereby providing a broader dynamic range of detected signals, improved reproducibility for identification, better sensitivity and accuracy for quantification, and, potentially, enhanced proteome coverage. To fully exploit these advantages, the composite or multiplexed fragment ion spectra generated by DIA require more elaborate processing algorithms than DDA spectra. This review examines different DIA schemes and, in particular, discusses the concepts applied to data processing. Available software implementations for identification and quantification are presented as comprehensively as possible, and examples of software usage are cited. Processing workflows, including complete proprietary frameworks and combinations of modules from different open-source data processing packages, are described and compared in terms of software availability and usability, programming language, operating system support, input/output data formats, and the main principles employed in the algorithms used for identification and quantification. This comparative study concludes with a discussion of current limitations and of the improvements expected in the short- and midterm future.
Received: July 14, 2014 Revised: October 8, 2014 Accepted: November 24, 2014
Keywords: Bottom-up proteomics / Data-independent acquisition / Data processing and analysis / Label-free quantification / Mass spectrometry-LC-MS/MS
Correspondence: Dr. Frédérique Lisacek, Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, CMU, 1, rue Michel Servet, 1211 Geneva, Switzerland
E-mail: [email protected]
Fax: +41-22-379-58-58
∗ Additional corresponding author: Dr. Gérard Hopfgartner, E-mail: [email protected]
Abbreviations: AIF, all-ion fragmentation; CE, collision energy; DDA, data-dependent acquisition; DIA, data-independent acquisition; ETD, electron transfer dissociation; FDR, false discovery rate; FT-ARM, Fourier transform-all reaction monitoring; HCD, higher energy collisional dissociation; HDMSE, high-definition MSE; IMS, ion mobility spectrometry; LIT, linear ion trap; MSPLIT, mixture spectrum partitioning using library of identified tandem mass spectra; PAcIFIC, precursor acquisition independent from ion count; QqTOF, quadrupole collision-quadrupole time-of-flight; SVM, support vector machine; UDMSE, ultra-definition MSE; WiSIM-DIA, wide selected-ion monitoring DIA; XDIA, extended data-independent acquisition; XIC, extracted ion chromatogram
Colour Online: See the article online to view Figs. 1 and 2 in colour.
© 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
www.proteomics-journal.com
1 Introduction
LC combined with MS has become the main analytical technique to characterize proteins and small molecules in complex samples. In proteomics, identification is typically performed with the widely used shotgun, or bottom-up, approach. In shotgun proteomics, proteins are digested into peptides.
Complex peptide mixtures are separated by LC and ionized by ESI, and the resulting ions are analyzed by the mass spectrometer. The mass spectrometer is typically operated in data-dependent acquisition (DDA) mode, in which the top n most intense precursor ions detected in a survey scan (MS1) are selected for subsequent isolation and fragmentation in a serial manner. The acquired fragment ion or tandem mass spectra (MS/MS or MS2) are matched against theoretical spectra (generated from a sequence database) by a search engine, which then assigns peptide sequences and infers the corresponding proteins [1]. DDA allows the identification of an extensive number of proteins and is a milestone for high-throughput proteomics. Chromatographic peak capacity and MS parameters such as cycle time, number of precursor ions selected per cycle, and dynamic exclusion (which minimizes repeated selection of the same precursor ions) can be optimized and balanced to provide the highest quality spectra for as many precursor ions as possible. However, peptides in biologically relevant samples still co-elute, and the analysis of such samples by DDA has significant limitations. For instance, it is unlikely that a peptide will be selected for fragmentation at the apex of its chromatographic peak, which may affect the interpretation of the MS2 spectrum [2, 3], although a dynamic exclusion time window corresponding to half the LC-peak width might increase the chance of acquiring better MS2 spectra near the LC-peak apex. Overall, DDA performance declines as sample complexity increases because the semistochastic selection of precursor ions aggravates certain limitations for both identification and quantification: limited reproducibility and dynamic range, bias toward high-abundance peptides, and under-sampling [4, 5]. Aiming to overcome these limitations, alternative MS acquisition modes, generally termed data-independent acquisition (DIA), have emerged.
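The top-n selection with dynamic exclusion described above can be illustrated with a minimal Python sketch. It is not taken from any instrument control software: the function name `select_top_n`, the parameters `exclusion_s` and `mz_tol`, and the representation of a survey scan as a list of (m/z, intensity) pairs are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity); hypothetical representation


def select_top_n(
    survey_scan: List[Peak],
    scan_time: float,
    excluded_until: Dict[float, float],
    top_n: int = 3,
    exclusion_s: float = 30.0,
    mz_tol: float = 0.01,
) -> List[float]:
    """Pick up to top_n precursor m/z values from one MS1 survey scan.

    Peaks are visited from most to least intense. A peak is skipped while
    an earlier selection within mz_tol is still dynamically excluded.
    excluded_until maps a selected m/z to the time its exclusion ends and
    is updated in place.
    """
    selected: List[float] = []
    for mz, _intensity in sorted(survey_scan, key=lambda p: p[1], reverse=True):
        if len(selected) == top_n:
            break
        is_excluded = any(
            abs(mz - ex_mz) <= mz_tol and scan_time < until
            for ex_mz, until in excluded_until.items()
        )
        if is_excluded:
            continue
        selected.append(mz)
        excluded_until[mz] = scan_time + exclusion_s
    return selected


# Illustrative values only: the same four co-eluting precursors in two cycles.
scan = [(500.3, 1e6), (620.8, 5e5), (433.2, 2e5), (710.4, 1e4)]
excluded: Dict[float, float] = {}
print(select_top_n(scan, 0.0, excluded, top_n=2))  # the two most intense ions
print(select_top_n(scan, 2.0, excluded, top_n=2))  # the next two ions
```

In this toy run, the least intense precursor is fragmented only after the more intense ones have been excluded, which mirrors the bias toward high-abundance peptides and the under-sampling noted in the text.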
In contrast to DDA, DIA relies on coselection and cofragmentation and thereby avoids the detection and selection of individual precursor ions during LC-MS analysis. Convoluted or multiplexed MS2 spectra are generated without explicit association between each single precursor and its corresponding fragments. As a result, DIA requires more sophisticated post-acquisition data analysis than DDA. The latter observation motivates this review, which is organized in two main sections. Since end users and software developers often overlook essential details related to instrumentation and data acquisition, the first part provides a comparison of different DIA schemes and a brief survey of related developments in MS hardware, from early proofs of concept to recent advances in instrumentation. While defining the settings for an acquisition method may help to better exploit the capabilities of a specific instrument in accordance with the goal of a study, understanding the process of data generation may turn out to be crucial for successful data analysis, from both software usage and development perspectives. The second part describes the computational methods and available software implementations to exploit multiplexed data for peptide/protein identification and quantification. Due to the variability in the collection of sample types, LC conditions, mass analyzers, and instrument settings utilized across the references, a direct comparison of the performance of different DIA methods and processing strategies is beyond the scope of this review.
2 Data-independent acquisition methods
DIA methods have evolved alongside hardware platforms. Hybrid MS instruments with faster MS and MS/MS acquisition capabilities, high resolution, and reproducible LC conditions have turned DIA into a robust concept that has been applied successfully in proteomics studies. In the following subsections, methods are grouped into categories according to the way acquisition cycles are defined for instrument operation:
(1) Alternation between low and high collision energy (CE) scans (Fig. 1A).
(2) Iteration through a selected m/z range of precursor ions partitioned into a number of predefined isolation windows of fixed width in m/z units (hereafter referred to as u), in either (a) stepwise (Fig. 1B) or (b) random order (Fig. 1C).
These approaches are detailed in the following paragraphs and summarized in Fig. 1, which highlights the impact of choosing one or the other. To facilitate understanding, generic instrument names are preferred and some manufacturer names are included. Table 1 summarizes the methods, including examples of specific settings taken from the respective publications.
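As an illustration of category (2), the following Python sketch partitions a precursor m/z range into fixed-width isolation windows and returns them in stepwise or random order. The function names, the `overlap` parameter, and the use of a seeded shuffle are assumptions made for this example; actual acquisition methods are defined in vendor software.

```python
import math
import random
from typing import List, Optional, Tuple

Window = Tuple[float, float]  # (lower m/z, upper m/z)


def fixed_width_windows(
    mz_start: float, mz_end: float, width: float, overlap: float = 0.0
) -> List[Window]:
    """Partition [mz_start, mz_end] into isolation windows of equal width.

    Consecutive windows share `overlap` m/z units, so the lower edge
    advances by (width - overlap) from one window to the next.
    """
    step = width - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than the window width")
    n_windows = math.ceil((mz_end - mz_start) / step)
    return [
        (mz_start + i * step, mz_start + i * step + width)
        for i in range(n_windows)
    ]


def acquisition_order(
    windows: List[Window], mode: str = "stepwise", seed: Optional[int] = None
) -> List[Window]:
    """Return the windows of one cycle in stepwise or random order."""
    if mode == "stepwise":
        return list(windows)
    if mode == "random":
        shuffled = list(windows)
        random.Random(seed).shuffle(shuffled)
        return shuffled
    raise ValueError("mode must be 'stepwise' or 'random'")


# Settings in the style of Table 1: 10u windows over m/z 400-1400, and
# 26u windows with 1u overlap over m/z 400-1200.
print(len(fixed_width_windows(400.0, 1400.0, 10.0)))      # number of windows
print(len(fixed_width_windows(400.0, 1200.0, 26.0, 1.0)))  # number of windows
```

The number of windows per cycle, multiplied by the time spent on each MS2 scan, gives a rough estimate of the cycle time, which is why narrow windows over a wide m/z range lead to long cycles or to multiple injections.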
2.1 Alternation of low/high collision energy
An important early DIA proof-of-principle for the analysis of proteomics samples was reported in 2003 by Purvine and coworkers [6]. A peptide mixture was analyzed twice by LC-ESI coupled to an orthogonal time-of-flight (TOF) mass spectrometer, using two different nozzle-skimmer voltages. Sample analysis using low voltage generated mainly precursor ion spectra, whereas high voltage generated product ion spectra by in-source fragmentation. Since peptides were subjected to CID in parallel and the resulting spectra contained fragments of multiple precursor ions, they termed the method shotgun CID. Later, in 2005, acquisition with a hybrid quadrupole collision-quadrupole time-of-flight (QqTOF) instrument (Q-TOF Ultima API) comprising a quadrupole filter, a collision cell, and a TOF analyzer was proposed by Waters as an alternative to in-source fragmentation. The refined approach, which uses this device with a faster, LC-compatible alternation rate between low and high voltage combined with a high-performance LC system, was commercialized under the name MSE. In MSE platforms, precursor and fragment ion spectra are acquired during a single chromatographic run by
Table 1. Data-independent acquisition methods

| Method name | Reference | Fragmentation / MS instrumentation | Cycling mode / cycle time | Precursor ion isolation: window width (m/z) | Precursor ion isolation: total m/z range covered | Precursor ion isolation: overlap | Time-continuous spectra: precursor ion data a) | Time-continuous spectra: number of fragment ion maps | No. of runs b) | Comments |
|---|---|---|---|---|---|---|---|---|---|---|
| Shotgun CID | Purvine et al. 2003 [6] | In-source CID / TOF | Fixed | None | | | Yes | 1 | 2 | Less controlled fragmentation |
| DIA | Venable et al. 2004 [11] | In-cell CID / LIT | Stepwise windows / 35 s | 10 | 400–1400 | 0 | No | 100 | 1 | Limited-resolution MS/MS spectra, very long cycle time |
| MSE | Silva et al. 2005 [7] | In-cell CID / QqTOF | Alternation low/high CE / 2 s | 1700 | 300–2000 | | Yes (TOF) | 1 (TOF) | 1 | Short cycle time c), wider precursor m/z range. Increased likelihood of interferences |
| AIF | Geiger et al. 2010 [10] | In-cell CID / HCD-Orbitrap | Alternation low/high CE / 2 s | None | | | Yes | 1 | 1 | Short cycle time, higher resolution compared to MSE. Increased likelihood of interferences |
| XDIA | Carvalho et al. 2010 [14] | In-cell ETD and CID / LIT-Orbitrap | Stepwise windows | 20 | | 1 | Yes (Orbitrap) | Several (LIT) | 1 | High-resolution precursor ion data. Limited-resolution MS/MS spectra |
| PAcIFIC | Panchaud et al. 2011 [13] | In-cell CID / LIT | Stepwise windows / 2.4 s | 2.5 | 400–1400 | 1 | No | 15 (per injection) | 45 | Isolation windows comparable to DDA. Many runs per sample required |
| FT-ARM | Weisbrod et al. 2012 [15] | In-cell CID / LIT-FT | Stepwise windows / 5.45 s | 100 | 500–1000 | 0 | No | 5 (FT) | 1 | High-resolution MS/MS spectra. Longer cycle time, reduced precursor m/z range |
| SWATH | Gillet et al. 2012 [16] | In-cell CID / QqTOF | Stepwise windows / 3.2 s | 26 | 400–1200 | 1 | Optional (TOF) | 32 (TOF) | 1 | Optimized and ramped CE per isolation window. Longer cycle time compared to MSE |
| MSX | Egertson et al. 2013 [17] | In-cell CID / Q-HCD-Orbitrap | Random windows / 3.5 s f) | 20 d) | 500–900 | 0 | Yes (Orbitrap) | 20 e) (Orbitrap) | 1 | Precursor m/z tolerance reducible to 4u. Nonconstant sampling frequency per window, reduced precursor m/z range |

a) Although some original publications did not require MS1 data, precursor ion scans can be included in any method.
b) Number of sample injections required.
c) Cycle time < 2 s is used in more recent applications (e.g., 1.3 s in [20]).
d) Combination of 5 nonconsecutive subwindows of 4u each.
e) 100 after demultiplexing. Each multiplexed spectrum was demultiplexed into five spectra, yielding one MS2 spectrum per precursor isolation window of 4u.
f) 20 MS2 scans with an MS1 scan every 10 scans. On average, each 4u window was fragmented every 3.5 s.
Figure 1. Data-independent acquisition schemes in bottom-up proteomics. (1) LC separation and ESI: after protein digestion, complex peptide mixtures are separated by LC, ionized by electrospray (ESI) and resulting ions are analyzed by the mass spectrometer. During the analysis, the mass spectrometer iterates in cycles specifically defined for each particular method. The red arrow indicates a particular cycle in the run (blue trace). (2) Precursor ion detection: unfragmented peptides may be detected in an MS1 scan at the beginning of each cycle. (3) Precursor ion isolation: fragment ion detection systematically covers a defined precursor m/z range (e.g., 350–1250, gray bar) with one (A) or several (B and C) isolation windows which filter precursor ions. (4) Fragmentation: selected precursor ions are fragmented in parallel. (5) Fragment ion detection: each corresponding fragment ion scan is recorded. (B) Green boxes show examples of the frequency distribution of all detected precursor ions in a run for methods with stepwise windows, given a particular sample containing more precursor ions toward the center of the m/z range. A “mountain-like” shape would be observed using different values for fixed width methods (e.g., left green box). Using MS1 data for the proteome of interest, window widths can be optimized such that each width is inversely proportional to the precursor ion density, equalizing the number of precursor ions selected within each window and hence reducing the total number of precursor ions concurrently fragmented (right green box). (C) Each MSX fragment ion scan combines five nonconsecutive and randomly selected subwindows of 4u each.
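The two window layouts mentioned in panels (B) and (C) of Figure 1 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the optimization used in any cited publication: `equal_count_windows` places window edges at quantiles of observed MS1 precursor m/z values, and `msx_cycle` randomly groups 4u subwindows into multiplexed scans without enforcing that the subwindows of a scan are nonconsecutive. Both function names and all parameters are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Window = Tuple[float, float]  # (lower m/z, upper m/z)


def equal_count_windows(precursor_mzs: List[float], n_windows: int) -> List[Window]:
    """Variable-width windows holding roughly equal numbers of precursors.

    Edges are placed midway between neighbouring sorted m/z values at the
    quantile positions, so windows are narrow where precursors are dense.
    """
    mzs = sorted(precursor_mzs)
    if n_windows < 1 or len(mzs) < n_windows:
        raise ValueError("need at least one precursor per window")
    edges = [mzs[0]]
    for i in range(1, n_windows):
        idx = i * len(mzs) // n_windows
        edges.append((mzs[idx - 1] + mzs[idx]) / 2.0)
    edges.append(mzs[-1])
    return list(zip(edges[:-1], edges[1:]))


def msx_cycle(
    mz_start: float,
    mz_end: float,
    sub_width: float = 4.0,
    per_scan: int = 5,
    seed: Optional[int] = None,
) -> List[List[Window]]:
    """Randomly group subwindows into multiplexed scans for one cycle.

    Every subwindow is used exactly once per cycle (an assumption of this
    sketch); each inner list is the set of subwindows isolated together.
    """
    n_sub = int(round((mz_end - mz_start) / sub_width))
    subwindows = [
        (mz_start + i * sub_width, mz_start + (i + 1) * sub_width)
        for i in range(n_sub)
    ]
    random.Random(seed).shuffle(subwindows)
    return [sorted(subwindows[i:i + per_scan]) for i in range(0, n_sub, per_scan)]


# Illustrative precursor list that is denser around m/z 600-640.
mzs = [400.0, 500.0, 590.0, 600.0, 610.0, 620.0,
       630.0, 640.0, 700.0, 800.0, 900.0, 1000.0]
for lower, upper in equal_count_windows(mzs, 3):
    print(lower, upper, upper - lower)
```

With 4u subwindows over m/z 500–900 and five subwindows per scan, `msx_cycle(500.0, 900.0)` yields 20 multiplexed scans covering 100 subwindows, in line with footnotes e) and f) of Table 1.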
alternating low and high CE scans. Additionally, the radio frequency applied to the first quadrupole can be adjusted such that ions from the preferred range, for example, m/z 300 to 2000, are efficiently transmitted, ensuring that any peak observed below m/z 300 in high CE scans is produced by dissociation in the collision cell. Silva and coworkers described early applications of MSE-based methods for relative [7, 8] and absolute [9] label-free quantification of proteins. In 2010, Geiger et al. demonstrated a similar acquisition strategy using a benchtop device (Exactive™) composed of a higher energy collisional dissociation (HCD) cell, a C-trap, and a Kingdon trap mass spectrometer (better known by the trademarked term Orbitrap) [10]. Precursor ion selection was not possible with this instrument, which was developed for small-molecule applications, and operation in DIA mode was called All-Ion Fragmentation by Thermo Scientific. In All-Ion Fragmentation scans, peptides are injected into the HCD cell for fragmentation; fragment ions are then moved back into the C-trap, from where they are injected into the Orbitrap for analysis.
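The alternation of low and high CE scans, and the transmission cut-off argument given for MSE, can be expressed as a short Python sketch. It assumes, for illustration only, that scans strictly alternate starting with a low CE scan and that each scan is a list of (m/z, intensity) pairs; the function names and the default cut-off of m/z 300 (the example value from the text) are hypothetical and not part of any vendor API.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)
Scan = List[Peak]


def split_alternating_scans(scans: List[Scan]) -> Tuple[List[Scan], List[Scan]]:
    """Separate an alternating run into low CE and high CE scan series.

    Assumes acquisition order low, high, low, high, ...
    """
    low_ce = scans[0::2]   # precursor ion spectra
    high_ce = scans[1::2]  # multiplexed fragment ion spectra
    return low_ce, high_ce


def fragments_below_cutoff(high_ce_scan: Scan, transmission_min_mz: float = 300.0) -> Scan:
    """Peaks of a high CE scan below the quadrupole transmission cut-off.

    If precursors below the cut-off are not transmitted, such peaks can
    only arise from dissociation in the collision cell.
    """
    return [(mz, inten) for mz, inten in high_ce_scan if mz < transmission_min_mz]


# Illustrative values only.
run = [
    [(450.2, 100.0)],                # low CE
    [(175.1, 40.0), (450.2, 10.0)],  # high CE
    [(451.0, 90.0)],                 # low CE
    [(262.1, 30.0)],                 # high CE
]
low, high = split_alternating_scans(run)
print(fragments_below_cutoff(high[0]))
```

Peaks above the cut-off in a high CE scan remain ambiguous, since they may be fragments or residual unfragmented precursors, which is one reason why processing such data requires more than this simple filter.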
2.2 Stepwise isolation windows
Rather than alternating low and high CE scans, in 2004 Venable and co-workers defined data acquisition based on the sequential isolation and fragmentation of several precursor ion windows (10u each) over a defined mass range using an LTQ, a linear ion trap (LIT), thereby introducing the term data-independent acquisition [11]. Owing to the selectivity of this method, and despite a longer cycle time, extracted ion chromatograms (XICs) of fragments from MS2 spectra contained less background noise, providing a dramatic signal-to-noise improvement compared to XICs of precursors from MS1 spectra. Then, in 2009, Panchaud et al. described a variation of Venable’s concept that uses smaller precursor isolation windows (2.5u) on an improved version of the LIT [12]. Isolation windows comparable to those of DDA reduce the complexity of the MS2 spectra and increase the number of protein identifications, yet multiple injections of the same sample are required to cover the same mass range (approximately 67 injections over five days). This method was termed precursor acquisition independent from ion count (PAcIFIC). Later, in 2011, Panchaud and co-workers presented an optimized PAcIFIC acquisition on a faster ion trap (LTQ Velos™), which decreases the total time needed to perform the analysis from five days to less than two, approximately [13]. A variation on this theme was introduced by Carvalho and co-workers in 2010 [14], who described a method called extended data-independent acquisition (XDIA). An XDIA cycle also consists of a series of MS2 events using stepwise precursor isolation windows (e.g., windows of 20u acquired with the LIT), but one high-resolution MS1 scan (e.g., acquired with the Orbitrap) is included at the beginning of each cycle. In addition, fragmentation involves two scan events: ETD (electron transfer dissociation) without supplemental activation followed by CID. In an attempt to skip precursor mass measurement, in 2012 Weisbrod et al. proposed an approach named Fourier transform-all reaction monitoring (FT-ARM) using precursor isolation windows of either 12 or 100u [15]; all ions within a 100u precursor isolation window are selected and fragmented within a LIT and analyzed in an FT-based mass spectrometer: an FT-ICR or an Orbitrap. In the same year, Gillet et al. [16] presented a similar acquisition concept but using a QqTOF system (TripleTOF 5600™), achieving precursor isolation windows of 26u in the first quadrupole and recording high-resolution (mass accuracy