Article pubs.acs.org/jpr
Biomarker Development for Intraductal Papillary Mucinous Neoplasms Using Multiple Reaction Monitoring Mass Spectrometry
Yikwon Kim,†,# MeeJoo Kang,‡,# Dohyun Han,† Hyunsoo Kim,† KyoungBun Lee,§ Sun-Whe Kim,‡ Yongkang Kim,∥ Taesung Park,∥,⊥ Jin-Young Jang,*,‡ and Youngsoo Kim*,†
†Department of Biomedical Engineering, ‡Surgery and Cancer Research Institute, and §Department of Pathology, Seoul National University College of Medicine, 28 Yongon-Dong, Seoul 110-799 Korea
∥Department of Statistics and ⊥Interdisciplinary Program in Bioinformatics, Seoul National University, Daehak-dong, Seoul 151-742, Korea
Supporting Information
ABSTRACT: Intraductal papillary mucinous neoplasm (IPMN) is a common precursor of pancreatic cancer (PC). Much clinical attention has been directed toward IPMNs due to the increase in the prevalence of PC. The diagnosis of IPMN depends primarily on a radiological examination, but the diagnostic accuracy of this tool is not satisfactory, necessitating the development of accurate diagnostic biomarkers for IPMN to prevent PC. Recently, high-throughput targeted proteomic quantification methods have accelerated the discovery of biomarkers, rendering them powerful platforms for the evolution of IPMN diagnostic biomarkers. In this study, a robust multiple reaction monitoring (MRM) pipeline was applied to discover and verify IPMN biomarker candidates in a large cohort of plasma samples. Through highly reproducible MRM assays and a stringent statistical analysis, 11 proteins were selected as IPMN marker candidates with high confidence in 184 plasma samples, comprising a training set (n = 84) and a test set (n = 100). To improve the discriminatory power, we constructed a six-protein panel by combining marker candidates. The multimarker panel had high discriminatory power in distinguishing between IPMN and controls, including other benign diseases. Consequently, the diagnostic accuracy of IPMN can be improved dramatically with this novel plasma-based panel in combination with a radiological examination.
KEYWORDS: IPMN, plasma, biomarker development, LC−MRM, targeted proteomics
■
INTRODUCTION
Intraductal papillary mucinous neoplasm (IPMN) is a frequent precursor of pancreatic cancer (PC) that progresses from low-grade dysplasia to invasive IPMN and ductal adenocarcinoma. PC is one of the most lethal cancers, with a 5 year survival rate of approximately 7%. There are no specific symptoms or signs that suggest PC, and a late diagnosis of advanced-stage disease is one of the reasons for the poor treatment outcomes. However, when diagnosed in the early stages, the 5 year survival rate of PC exceeds 58%.1 Thus, it is critical to detect PC in the early stages, and much effort has been made in examining its precursor lesions, especially IPMN. Most IPMNs are benign, but the prognosis of invasive IPMN is poorer than that of stomach and colon cancer, with a 5 year survival rate of 49%.2 If IPMN progresses to pancreatic ductal adenocarcinoma, this rate decreases to 29%,3 necessitating the early detection of patients with IPMN before it advances to invasive IPMN or PC. IPMN is diagnosed primarily by radiological examination, and its incidence and rate of detection have risen by more than 14-fold in the past 2 decades.
However, IPMN has no specific symptoms or signs, and those who do not undergo radiologic surveillance cannot be diagnosed. Thus, accurate and effective screening tools, such as protein biomarkers, are needed to identify patients with IPMN. There are few studies on biomarkers with regard to the diagnosis of IPMN. Serum tumor markers, such as CEA and CA19-9, have been introduced, but their accuracy is only approximately 70%. Recently, GNAS and KRAS mutations in pancreatic tissue4 and cyst fluid5 have been implicated as promising biomarkers for the detection of IPMN, but these biomarkers are evaluated in pancreatic tissue that can only be obtained after surgery and cyst fluid that is acquired invasively. The ideal biomarker must be specific, sensitive, and predictive. Importantly, the samples that are used to test for it should also be taken noninvasively through simple, rapid, accurate, and inexpensive methods. As a result, we need biomarkers for IPMN that can be detected in samples of blood, which is more accessible than cyst fluid.
The current limitations of diagnostic biomarkers are that they entail labor-intensive procedures and require significant amounts of time for analysis, necessitating sensitive and robust analysis tools to mitigate these bottlenecks. For example, multiple reaction monitoring (MRM) is a multiplexed method that uses a triple-quadrupole (QQQ) mass spectrometer to quantify several proteins in a single run. MRM performs highly specific detection of numerous target proteins without background interference,6−8 which increases throughput and limits the loss of samples. Furthermore, the sensitivity and accuracy of MRM have been improved through advances in mass spectrometers and the use of stable isotope standard (SIS) peptides as references. The sensitivity of MRM is similar to that of ELISA, which is the traditional diagnostic method for protein markers (ng/mL). Also, the quantitative results of MRM show high correlation (R² > 0.9) with ELISA.9 Thus, we adopted MRM to perform a precise and robust quantification of the IPMN proteome.
On the basis of the clinical significance of IPMN, the easy accessibility of blood samples for biomarkers, and the high-throughput nature of MRM, we performed MRM quantification to identify plasma-based biomarkers of IPMN. We discovered several preliminary candidates by comparing their abundance using strict statistical methods and verified them in two independent plasma sample sets. To improve the discriminatory power, we constructed a multiprotein panel by combining several marker candidates. Our panel showed high discriminatory power in differentiating IPMN from the heterogeneous control group, demonstrating its value as a diagnostic marker of IPMN and potential in preventing PC.
© 2015 American Chemical Society
Received: June 15, 2015 Published: November 12, 2015
DOI: 10.1021/acs.jproteome.5b00553 J. Proteome Res. 2016, 15, 100−113
Journal of Proteome Research
Table 1. Cohort Characteristics^a
                                 training set (N = 84)    test set (N = 100)
chronic cholecystitis            N = 25                   -
  age (mean ± SD)                49.4 ± 8.5               -
  sex (M:F)                      8:17                     -
  CA19-9 (U/mL, mean ± SD)       9.2 ± 8.1                -
  CEA (μg/L, mean ± SD)          1.5 ± 0.6                -
healthy control                  N = 25                   N = 50
  age (mean ± SD)                55.6 ± 11.1              54.2 ± 7.0
  sex (M:F)                      11:14                    25:25
  CA19-9 (U/mL, mean ± SD)       22.8 ± 55.2              9.0 ± 6.0
  CEA (μg/L, mean ± SD)          1.4 ± 0.7                1.3 ± 0.6
IPMN                             N = 34                   N = 50
  age (mean ± SD)                64.4 ± 7.7               67.1 ± 8.5
  sex (M:F)                      23:11                    31:19
  CA19-9 (U/mL, mean ± SD)       21.6 ± 43.6              48.7 ± 120.6
  CEA (μg/L, mean ± SD)          1.5 ± 0.6                2.2 ± 2.9
^a Abbreviations: M, male; F, female; SD, standard deviation; IPMN, intraductal papillary mucinous neoplasms; CA19-9, carbohydrate antigen 19-9.
■
MATERIALS AND METHODS
Materials
Sequencing-grade modified trypsin was purchased from Promega (Madison, WI). Multiple Affinity Removal System (MARS-6, 5185−5984) columns, buffer A (5185−5987), and buffer B (5185−5988) were obtained from Agilent (Santa Clara, CA). Amicon Ultra 3K units were purchased from Millipore (Bedford, MA). The SIS peptides (isotope-labeled with 13C and 15N) were synthesized at crude purity by JPT (Berlin, Germany). The OASIS HLB 1 cc (30 mg) extraction cartridge was acquired from Waters (Milford, MA).
Sample Collection
All plasma samples were collected by the Department of Surgery, Seoul National University College of Medicine, and the study was approved by the institutional review board of Seoul National University Hospital (no. 1412-051-632). A total of 184 plasma samples were used for this study. All sample characteristics are presented in Table 1. Plasma samples were centrifuged for 10 min at 3000g and 4 °C, and the supernatant fractions were stored at −80 °C before analysis.
Global Data Mining
The data mining was performed using previous reports and a public database to select PC-related proteins. First, the public database Oncomine10 was searched using the following criteria: pancreatic cancer and p-value less than 0.05. Consequently, 235 proteins were selected. A total of four mutant forms of KRAS, GNAS, and AGER were added on the basis of their significance in PC.11−13 Next, 18 studies that were related to the pancreatic cancer proteome were examined by manual inspection14−31 and filtered using the Plasma Proteome Database (PPD, http://www.plasmaproteomedatabase.org)32 to improve the detection rate by mass spectrometry; 161 proteins were selected as a result. Thus, a total of 260 proteins were selected as initial MRM target proteins through these steps (Figure 1A) and are listed in Supplementary Table 1.
Sample Preparation
The six most abundant human plasma proteins were depleted on a high-performance liquid chromatography (HPLC) instrument (Shimadzu Co., Kyoto, Japan) that was equipped with a MARS-6 liquid chromatography (LC) column as described.33 The crude human plasma samples were diluted 5× with buffer A and then passed through 0.22 μm centrifugal filter units. After this, 200 μL of each crude plasma sample was loaded onto the column, and flow-through fractions were collected. The depleted plasma samples were concentrated on Amicon Ultra 3K units. Next, 100 μg of each protein sample was denatured in 8 M urea, 20 mM DTT, and 100 mM Tris pH 8.0 for 60 min at 37 °C and alkylated with 50 mM IAA for 40 min at room temperature in the dark. The samples were diluted 10-fold with 100 mM Tris (pH 8.0), and trypsin was added at a 1:50 (w/w) enzyme-to-protein ratio for overnight digestion at 37 °C. The digested samples were acidified with 100% formic acid and desalted using OASIS HLB 1 cc (30 mg) extraction cartridges.34 Briefly, the cartridges were activated with methanol and acetonitrile and equilibrated with 0.1% formic acid. The digested samples were loaded onto the cartridge and washed with 0.1% formic acid. Finally, peptides were eluted sequentially with 40% and 60% acetonitrile in 0.1% formic acid. The eluted samples were lyophilized and stored at −80 °C before MRM analysis. For the MRM analysis, the samples were reconstituted in 50 μL of 0.1% formic acid (2 μg/μL), and 50 fmol of β-galactosidase peptide (GDFQFNISR) was spiked into each sample as an external standard for preliminary MRM quantification measurements.
Figure 1. Venn diagram of data mining and overall flowchart. (A) A Venn diagram of the data mining. A total of 235 proteins were found in the public database Oncomine, and four proteins with mutant forms were added to the Oncomine-based proteins: KRAS_G12D, AGER_G82S, GNAS_R201C, and GNAS_R201H. The sum of these proteins is displayed in the "ONCOMINE + Mutant" category of the Venn diagram. A total of 161 proteins were then found in previous papers and PPD. These proteins are displayed in the "previous papers and PPD" category of the Venn diagram. A total of 260 proteins were selected as the initial target proteins and processed further. (B) Overall flowchart of the quantitative analysis of IPMN. The IPMN samples were divided into two groups: a training set and a test set. The training set consisted of 34 IPMN and 50 heterogeneous control plasma samples, including benign controls. The test set was composed of 50 IPMN and 50 healthy control plasma samples after sample randomization and blind testing. MRM quantification for the discovery of marker candidates was performed in the training set, and 22 proteins were discovered as IPMN candidate markers for further verification; 11 of the 22 proteins were verified in an independent sample set (test set) and were selected as IPMN candidate markers. For the fixed marker candidates, multivariate analysis was performed by combining candidates by logistic regression. Consequently, a six-protein panel was constructed in the training set and verified in the test set. This panel had strong discriminatory power against benign controls and healthy controls. In a further verification step by cross-validation, the classifiers performed consistently and reliably, regardless of sample composition.
Liquid Chromatography−MRM/MS
MRM analysis was performed on a 6490 triple-quadrupole mass spectrometer that was coupled to an Agilent 1260 Infinity HPLC system (Agilent Technologies, Santa Clara, CA).35 A total of 5 μL of each sample, dissolved in 100 μL of solution A (0.1% formic acid in water), was loaded onto an analytical column (Agilent, Zorbax SB-C18, 150 mm × 0.5 mm i.d., 3.5 μm particle size). The peptides were separated at a flow rate of 10 μL/min using a 70 min gradient, consisting of 3% solution B (0.1% formic acid in acetonitrile) for 5 min, 3% to 35% solution B for 45 min, 35% to 70% solution B for 1 min, 70% solution B for 10 min, and 70% to 3% solution B for 9 min. The instrument settings were: 2500 V ion-spray capillary voltage, 2000 V nozzle voltage, 250 °C drying-gas temperature at a gas-flow rate of 15 L/min, and 350 °C sheath-gas temperature at a gas-flow rate of 12 L/min. Additional parameters were set as follows: 30 psi nebulizer, 380 V fragmentor voltage, 5 V cell accelerator voltage, 200 V delta EMV, 0.7 FWHM quadrupole resolution, and approximately 2 ms cycle time.
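The gradient program described under Liquid Chromatography−MRM/MS can be written down as a small table and checked for internal consistency. The following is only a minimal sketch: the segment durations and solvent percentages are taken from the text, whereas the names `GRADIENT`, `total_minutes`, and `percent_b`, and the assumption that each ramp is linear in time, are illustrative and not part of the original method description.

```python
# Each segment: (duration in min, %B at segment start, %B at segment end).
# Values are copied from the gradient description; linear ramps are an assumption.
GRADIENT = [
    (5.0, 3.0, 3.0),     # hold at 3% B
    (45.0, 3.0, 35.0),   # ramp 3% -> 35% B
    (1.0, 35.0, 70.0),   # ramp 35% -> 70% B
    (10.0, 70.0, 70.0),  # hold at 70% B
    (9.0, 70.0, 3.0),    # return 70% -> 3% B
]


def total_minutes(segments=GRADIENT):
    """Sum of all segment durations (should match the stated 70 min)."""
    return sum(duration for duration, _, _ in segments)


def percent_b(t, segments=GRADIENT):
    """Return %B at time t (min), interpolating linearly within a segment."""
    if t < 0 or t > total_minutes(segments):
        raise ValueError("time is outside the gradient program")
    elapsed = 0.0
    for duration, start, end in segments:
        if t <= elapsed + duration:
            fraction = (t - elapsed) / duration
            return start + fraction * (end - start)
        elapsed += duration
    return segments[-1][2]
```

Summing the segments reproduces the stated 70 min run length, and `percent_b` gives the nominal solvent composition at any time point, for example when relating a peptide's retention time to its elution conditions.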
Collision energies (CEs) for each transition were optimized in ramping collision-energy mode. The CE was ramped around the predicted value in steps of 2 V, with five increments in each direction. Of the resulting 11 CE values, the best value was selected on the basis of the peak intensity of each transition.
Generation of Target Peptides by in Silico Digestion
The FASTA file of target proteins was imported into Skyline (MacCoss Lab Software, ver 1.4.0), and in silico digestion was performed based on the following criteria: 2+ charge-state for fragment ion; MS spectrum range, 300−1400 m/z; and peptide length, 7−24 amino acids. The N-terminal signal peptides and peptides that contained Met, His, an NXT/NXS motif, or RP/KP were excluded. Approximately 5−6 transitions per peptide were generated, and those that were detected with at least 3 transitions were chosen as signature peptides. Furthermore, transition patterns of the detected peptides were compared with the NIST library (National Institute of Standards and Technology, http://peptide.nist.gov/), which includes over 200 000 mass spectra from large proteomic data sets. The comparison between the query and library spectra is represented as a dot-product score based on the scale of peak intensity. The threshold of the dot-product score was set to 0.5, on the basis of an earlier study,36 and most target peptides in this study had a dot-product score of >0.6. Interfering signals and poor chromatograms were removed using the Protein Blast database (http://blast.ncbi.nlm.nih.gov/),37 AUDIT,38,39 and SRM Collider.40
Data Analysis and Statistics
The raw MRM analysis data were analyzed in Skyline. Savitzky−Golay smoothing was applied to increase the quality of the chromatograms. The peak integration was checked by manual inspection to correct false assignments. In the preliminary quantification, endogenous peptides were normalized to β-galactosidase peptide (GDFQFNISR) to correct for experimental variation. In a subsequent, more precise quantification, the ratios of the areas of endogenous to corresponding SIS peptides (L/H) were calculated to measure protein abundance. On the basis of intensity and consistency, the best transition was selected and used for the quantification. The statistical analyses were performed using SPSS 21 (IBM)41,42 and MSstats43,44 to determine the significance of target proteins. Independent t-test, Mann−Whitney U test, logistic regression, and classification analysis were performed using SPSS 21. Linear mixed model (LMM) analysis and sample-size calculations were performed in MSstats. ROC (receiver operating characteristic) analysis was conducted using MedCalc (MedCalc, Mariakerke, Belgium, v10.0.1.0).45,46 The scatter plots were drawn by PCA (principal component analysis) using R version 3.2.0. The heatmaps were constructed with the “gplots” package in R, version 3.2.0, using Euclidean distance and complete linkage to perform clustering.
controls (healthy control n = 25; chronic cholecystitis n = 25) were used to identify significantly changed proteins in IPMN. To demonstrate the specificity of the candidates for IPMN, we included chronic cholecystitis samples in the control group as benign controls. In the test set, plasma samples from 50 IPMN patients (LGD and IGD, n = 25; HGD and invasive IPMN, n = 25) and 50 healthy controls were used to verify and determine IPMN markers. Overall, the IPMN plasma samples were distributed uniformly with regard to clinicopathologic characteristics (LGD, n = 21; IGD, n = 21; HGD, n = 21; invasive IPMN, n = 21). The clinical characteristics of the plasma samples, including age, sex, CEA, and CA19-9, are listed in Table 1.
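Two of the computations described in the Methods, the dot-product score used to compare measured transition patterns with library spectra and the endogenous-to-SIS (L/H) area ratio, can be illustrated with a short sketch. It assumes that the dot-product score is a plain cosine similarity over matched transition intensities and that a score at or above the 0.5 threshold counts as a match; Skyline's own scoring may be defined differently, and all function names here are hypothetical.

```python
import math

# Threshold stated in the text; treating "at or above" as passing is an assumption.
DOT_PRODUCT_THRESHOLD = 0.5


def dot_product_score(query, library):
    """Cosine similarity between two intensity vectors over the same transitions.

    Assumption: this stands in for the library dot-product score; the exact
    scoring used by the software is not specified in the text.
    """
    if len(query) != len(library) or not query:
        raise ValueError("intensity vectors must be non-empty and of equal length")
    dot = sum(q * l for q, l in zip(query, library))
    norm = math.sqrt(sum(q * q for q in query)) * math.sqrt(sum(l * l for l in library))
    if norm == 0:
        return 0.0
    return dot / norm


def passes_library_match(query, library, threshold=DOT_PRODUCT_THRESHOLD):
    """True when the measured transition pattern matches the library spectrum."""
    return dot_product_score(query, library) >= threshold


def light_to_heavy_ratio(endogenous_area, sis_area):
    """Peak-area ratio of the endogenous (light) peptide to its SIS (heavy) peptide."""
    if sis_area <= 0:
        raise ValueError("SIS peak area must be positive")
    return endogenous_area / sis_area
```

In this sketch, a peptide whose measured transition intensities are proportional to the library intensities scores close to 1, and the L/H ratio is computed per transition from integrated peak areas.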
Overall Workflow of MRM Analysis
A total of 260 PC-related proteins were selected by data-mining with previous reports and a public database (Figure 1A). Of these proteins, 104 unique candidates were detected in the pooled plasma sample by MRM analysis (40% detection rate). Next, MRM analysis of individual plasma samples (training set, n = 84) was performed in two steps. In the first step, a preliminary quantification was conducted to generate a list of synthetic peptides as reference peptides. By pairwise comparison, the abundance of 29 proteins changed significantly (pvalue 20%). Ultimately, 22 proteins were differentially expressed in IPMN on the basis of LMM (adjusted p-value of