Article pubs.acs.org/jpr

Screening and Validation of Novel Biomarkers in Osteoarticular Pathologies by Comprehensive Combination of Protein Array Technologies

Álvaro Sierra-Sánchez,†,‡,∥ Diego Garrido-Martín,†,‡,∥ Lucía Lourido,⊥ María González-González,†,‡ Paula Díez,†,‡ Cristina Ruiz-Romero,⊥ Ronald Sjöber,¶ Conrad Droste,§ Javier De Las Rivas,§ Peter Nilsson,¶ Francisco Blanco,*,⊥ and Manuel Fuentes*,†,‡

†Department of Medicine and General Cytometry Service-Nucleus, ‡Proteomics Unit, and §Bioinformatics and Functional Genomics Research Group, Cancer Research Centre (IBMCC/CSIC/USAL/IBSAL), 37007 Salamanca, Spain

⊥Proteomics Group-PBR2-ProteoRed/ISCIII, Rheumatology Division, Instituto de Investigación Biomédica de A Coruña (INIBIC/CHUAC/Sergas/UDC), 15001 A Coruña, Spain

¶Affinity Proteomics, Science for Life Laboratory, School of Biotechnology, Royal Institute of Technology (KTH), SE-17165 Stockholm, Sweden

*S Supporting Information

ABSTRACT: Osteoarthritis (OA) is one of the most prevalent articular diseases. The identification of proteins closely associated with the diagnosis, progression, prognosis, and treatment response is urgently needed for this pathology. In this work, differential serum protein profiles have been identified in OA and rheumatoid arthritis (RA) by antibody arrays containing 151 antibodies against 121 antigens in a cohort of 36 samples. The identified differential serum protein profiles have then been validated in a larger cohort of 282 samples. The overall immunoreactivity is higher in the pathological situations in comparison with the controls. Several proteins have been identified as biomarker candidates for OA and RA. Most of these biomarker candidates are proteins related to inflammatory response, lipid metabolism, or bone and extracellular matrix formation, degradation, or remodeling.

KEYWORDS: antibody arrays, serum protein profiles, osteoarthritis, rheumatoid arthritis, contact printing, noncontact printing, biomarkers

1. INTRODUCTION

Osteoarthritis (OA) is one of the most prevalent articular diseases. It is characterized by a gradual loss of cartilaginous matrix that often extends over years and decades. Indeed, age is the main risk factor for OA. Thus, as longevity increases, OA has become a leading cause of disability for older adults in developed countries. Worldwide estimates indicate that approximately 10% of men and 18% of women aged over 60 years have symptomatic osteoarthritis. Besides, about 80% of them present substantial limitations of movement, and 25% cannot perform their major daily activities.1 Currently, OA diagnosis is essentially symptomatic and relies on the description of pain symptoms, stiffness of the affected joints, and radiography, which is still the reference technique to determine the degree of joint destruction. However, radiography provides only indirect information about the tissue and lacks sensitivity to detect small changes in the joint structures. Furthermore, most OA treatments are also symptomatic, and the only effective therapy in advanced stages is joint replacement.2 Important advances have been made during the past decade in the understanding of the pathogenesis of OA and other diseases affecting the joint tissues such as rheumatoid arthritis (RA, where autoimmune response leads to the joint destruction). However, we are still far from having a clear picture of the molecular network that predisposes an individual to develop the disease, to worsen symptoms, or to successfully respond to a specific treatment. In this regard, the identification of proteins closely associated with the diagnosis, disease progression, prognosis, and treatment response in these pathologies is urgently needed. Over the last years, multiple biological markers have been proposed that may reflect the synthesis or degradation of

© 2017 American Chemical Society

Received: November 14, 2016 Published: April 5, 2017

DOI: 10.1021/acs.jproteome.6b00980 J. Proteome Res. 2017, 16, 1890−1899

Article

Journal of Proteome Research

the three main joint tissues (cartilage, synovial membrane, and bone).3 However, none of them is sufficiently validated and qualified for its systematic use. Proteomic profiling technologies are powerful tools for biomarker discovery and validation. The basic strategy implies the fractionation of proteins contained in the samples followed by peptide identification using mass spectrometry (MS). This allows an indirect and highly specific identification of the proteins.4,5 Indeed, several studies have recently reported the MS characterization of synovial fluid, cartilage, or subchondral bone to identify proteins as potential biomarker candidates.6−9 Nonetheless, MS lacks sensitivity to analyze complex biological samples. In the case of human serum, the high dynamic range of protein concentrations10 precludes the direct detection of medium/low abundant biomarkers by MS.11 In contrast, the use of high-throughput protein microarrays offers a direct approach to simultaneously screen thousands of antigens in an unbiased manner using a minimal amount of sample. Moreover, it has proven to be a suitable tool for antigen and autoantibody profiling in complex samples across multiple diseases.2,12 Among other aspects, one of the key steps in the construction of protein microarrays is the methodology for spotting/printing the proteins or antibodies onto the functionalized surface of the slide.13 Currently, multiple technologies are commercially available with different printing procedures, most of them imported from DNA arrays. These procedures range from simple deposition by pins or needles, followed by adsorption of the biomolecule onto the functionalized surface (commonly called contact printing), to more complex nanoinjection fluidic approaches where nanodrops are dispensed by piezo-electric systems onto the functionalized surface (commonly referred to as noncontact printing).14 Both approaches have advantages and drawbacks regarding intra- and interarray reproducibility, surface format (planar slides, microtiter wells, color-coded beads, etc.), real-time detection systems, or compatibility with complex biological samples, among others.15

In this study, we pursued the characterization of differential serum protein profiles in OA patients, RA patients, and healthy controls (C). With this aim, in a preliminary discovery phase, a small subset of samples (n = 36) was employed, and a comprehensive evaluation of the suitability of the clinical samples, antibodies, reagents, and experimental procedures for biomarker discovery was performed. This cohort was screened against a panel of 151 antibodies (151 antibodies against 121 proteins, 151 × 121) using antibody arrays developed by contact printing. Then a validation phase was performed in a larger cohort (n = 282 samples) of the different patient groups (OA, RA, C) by screening the sera employing 151 × 121 antibody arrays developed by noncontact printing.

2. MATERIALS AND METHODS

2.1. Materials

Acetone >98% (Panreac, Barcelona, Spain), 3-(2-aminoethylamino) propyl-methyl dimethoxysilane (MANAE) (Fluka, Steinheim, Germany), dimethyl sulfoxide (DMSO) (Merck Millipore, Billerica, USA), bis(sulfosuccinimidyl)suberate (BS3), Nunc 384 clear flat well plates, SuperBlock Blocking Buffer, Microtiter Plate 96 Well/V Bottom, Lifterslip coverslips (Thermo Scientific, Portsmouth, USA); Bovine Serum Albumin (BSA) > 98%, NHS-PEG4-Biotin, Tween20 viscous liquid (polyoxyethylenesorbitan monolaurate), ampicillin sodium, Corning hybridization chambers (Sigma-Aldrich, St Louis, USA); peroxidase-AffiniPure Goat Anti-Human IgG, Fcγ Fragment Specific, Peroxidase-AffiniPure F(ab’)2 Fragment Goat Anti-Mouse IgG (H+L) (Jackson ImmunoResearch Laboratories, Baltimore, USA); TSA individual cyanine 3 Tyramide Reagent Pack (TSA) (PerkinElmer, Waltham, USA); slides Ground Edges 76 × 26 mm (LíneaLab, Badalona, Spain); Amersham Cy5-Streptavidin, 16-Array Chamber Covers (GE Healthcare, Buckinghamshire, UK); Goat Anti-Rabbit IgG (H+L) HRP Conjugate (BIO-RAD, California, USA), powdered concentrated skimmed milk (Central Lechera Asturiana, Granda-Siero, Spain). Purified rabbit polyclonal antihuman antibodies against the antigens of interest (Supporting Table 1) were kindly provided by the Human Protein Atlas (http://www.proteinatlas.org/). These antigens had been previously described to be altered in osteoarthritic patients at different levels in cartilage (extracellular matrix, chondrocytes) and blood by genomic and mass spectrometry (MS) assays.16−20

2.2. Patients

Human serum samples corresponding to patients belonging to three different groups, osteoarthritis (OA), rheumatoid arthritis (RA), and healthy controls (C), were provided by the Biobank at the Institute for Biomedical Research of A Coruña (INIBIC). The samples were extracted and processed after written informed consent was signed by each donor according to the guidelines of the local Ethics Committee (Comité Ético de Galicia, Galicia, Spain). The OA group consisted of 108 patients diagnosed with OA according to the American College of Rheumatology (ACR) criteria.21 The RA group comprised 108 patients diagnosed with RA following the ACR/European League Against Rheumatism (EULAR) criteria.22 The 102 healthy controls selected for the analyses were donors without a history of joint disease. In the first discovery phase, a small subset of 36 samples (OA, n = 12; RA, n = 12 and C, n = 12) was selected. In a second validation stage, a total of 282 samples were studied (OA, n = 96; RA, n = 96 and C, n = 90).

2.3. Methods

2.3.1. Surface Functionalization. The glass slide surfaces were activated8 by treatment with 2% (v/v) MANAE in acetone for 30 min with shaking at room temperature (RT). Slides were subsequently washed with acetone and Milli-Q water and dried with compressed filtered air.

2.3.2. Microarray Preparation. 2.3.2.1. Contact Printing Technology (CT). Antihuman polyclonal antibodies (0.25 mg/mL) were supplemented with 2 mM BS3 as cross-linker. Spotting buffers with and without cross-linker, in the absence of antibodies, as well as BSA (0.038 × 10−3 mg/mL to 0.625 mg/mL range), were used as negative controls. In turn, positive controls included NHS-PEG4-biotin (0.78 mg/mL) and goat antihuman IgG (0.25 mg/mL). Antibodies and controls were spotted onto MANAE-functionalized slides using the MicroGrid II printer with a 4 × 4 384 split spot tool (BioRobotics, Cambridge, UK). Each sample was spotted in quadruplicate in each of the three identical subarrays contained in each slide. Spot diameter was set at 150 μm, with a distance between spots of 585 μm (see Supporting Figure 1A). Finally, printed slides were packed and stored protected from light in a dry atmosphere at RT, until assayed.

2.3.2.2. Noncontact Printing Technology (NC). In this case, antihuman polyclonal antibodies were resuspended (1:1) in a 47% (v/v) glycerol solution, according to the ArrayJet Printer

Marathon v1.4 specifications (ArrayJet, Roslin, UK). NHS-PEG4-biotin (0.39 mg/mL) was prepared as positive control. Spotting buffers with and without cross-linker, in the absence of antibodies, and BSA (0.6 mg/L to 3.66 mg/mL range) were prepared as negative controls. Slides printed using the ArrayJet Printer Marathon contained 12 identical subarrays, each one including all antibodies and controls to be analyzed. The spot diameter was set at 100 μm and the separation distance among spots at 200 μm (see Supporting Figure 1B). A total of six serum samples were analyzed per array, in duplicate. Finally, printed arrays were packed and stored protected from light in a dry atmosphere at RT until assayed.

2.3.3. Evaluation of Array Performance. All the following steps were performed at RT. Antibody arrays were blocked with a SuperBlock-PBS solution for 1 h on a rocking platform. Then they were washed (three times) with PBS for 5 min. After that, the arrays were incubated with HRP-conjugated antirabbit secondary IgG (1:200 (v/v) in SuperBlock-PBS) for 1 h in a humidified chamber. Then they were individually washed with (i) PBS (5 min, three times) and (ii) distilled water (5 min, once). Subsequently, arrays were incubated with 1:50 (v/v) TSA solution for 10 min in a humidified chamber. Arrays were then washed as described above and dried with filtered compressed air. Finally, they were scanned using the GenePix 4000B Scanner (Axon Instruments, Union City, USA) and the SensoSpot Fluorescence Scanner (Sensovation AG; Radolfzell, Germany) for CT and NC technologies, respectively, and analyzed.

2.3.4. Sera Biotinylation. Following the protocol described by Häggmark A. et al.,23 proteins present in OA, RA, and C sera samples were biotinylated by incubation with 0.78 mg/mL of NHS-PEG4-biotin for 2 h at 4 °C. Biotinylation reactions were stopped with 0.5 M Tris-HCl (pH 8).

2.3.5. Detection of Protein Serum Profiles. All steps were performed at RT unless otherwise specified.

2.3.5.1. Discovery Phase. CT antibody arrays were blocked with a blocking solution (1% PBS, 0.2% Tween20 and 5% (w/v) powdered skimmed milk) for 1 h on a rocking platform followed by washing with distilled water (5 min, once). Each array was incubated with 1:600 (v/v) serum (diluted in blocking solution) with slight shaking at 4 °C, overnight (O/N). Subsequently, slides were incubated with 1:100 (v/v) Cy3-Streptavidin for 1 h in darkness, in a humidified chamber. Prior to scanning, arrays were washed as described above and dried with filtered compressed air.

2.3.5.2. Validation Phase. NC arrays were blocked with SuperBlock for 1 h with shaking and, subsequently, washed with distilled water (5 min, three times). Then 40 μL of 1:1000 (v/v) biotinylated serum was added to each well of the 16-array chamber. Chambers were covered and incubated O/N at 4 °C with slight shaking. After that, the arrays were individually washed with distilled water and revealed using 1:50 (v/v) Cy5-Streptavidin for 20 min. Finally, arrays were washed, dried with compressed filtered air, and scanned.

2.3.6. Image Analysis. The TIFF images generated by array scanning were analyzed using GenePix Pro 4.0 software. Parameters were set to quantify light intensity values at Cy3 (λ = 532 nm) and Cy5 (λ = 635 nm) emission wavelengths, respectively, for CT and NC technologies.

2.3.7. Signal Standardization. Signal intensity values were normalized following eq 1, where $S_i^N$ refers to the intra- and interarray normalized signal, $S_i$ to the raw signal in antibody-containing spots, $S_{bg}$ to the background signal, and $S'$ to the raw signal in companion buffers’ spots. Background signal subtraction within each array was followed by fold change calculation with respect to a blank across arrays:24

$$S_i^N = \frac{S_i - S_{bg}}{\operatorname{median}\left(S' - S_{bg} > 0\right)} \qquad (1)$$
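As an illustration only, the normalization in eq 1 and the hit criterion used in this section (normalized signal greater than 1) might be sketched as follows. This is a simplified, single-array reading of the procedure, not the authors' code: the function names `normalize_signals` and `call_hits` and the example intensities are hypothetical, and the blank is computed here from one set of buffer spots, whereas the text estimates it across all arrays.

```python
from statistics import median


def normalize_signals(raw, background, buffer_raw):
    """Apply eq 1 to a list of raw antibody-spot intensities.

    raw        -- raw signals of antibody-containing spots (S_i)
    background -- background estimate (S_bg), assumed already computed
    buffer_raw -- raw signals of companion-buffer spots (S')
    """
    # Blank: median of the buffer signals that remain positive after
    # background subtraction, i.e. median(S' - S_bg > 0).
    positive = [s - background for s in buffer_raw if s - background > 0]
    if not positive:
        raise ValueError("no buffer spot exceeds the background")
    blank = median(positive)
    # Fold change of each background-subtracted signal over the blank.
    return [(s - background) / blank for s in raw]


def call_hits(normalized, threshold=1.0):
    """Flag spots whose normalized signal exceeds the threshold."""
    return [value > threshold for value in normalized]


# Hypothetical intensities, for illustration only.
signals = normalize_signals([500, 150, 1300], 100, [200, 300, 90, 400])
hits = call_hits(signals)
print(signals)  # [2.0, 0.25, 6.0]
print(hits)     # [True, False, True]
```

In this toy example the buffer spot at 90 falls below the background of 100 and is excluded, so the blank is the median of 100, 200, and 300.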

In the discovery phase (CT arrays), background signal was estimated for each array using the DEPC water spot signal values corresponding to the first quartile. In the case of the validation phase (NC arrays), as several sera were assayed per array (in duplicate), the background signal was estimated as the mean of DEPC water spot intensity values present in each pair of subarrays corresponding to the same serum. For both printing technologies, the blank signal was estimated as the median of intra-array normalized intensity values of negative control spots containing the companion buffers (i.e., PBS+H2O+BS3 in CT and glycerol+PBS+BS3 in NC) across all the arrays. The median signal of the replicates per antibody was computed as an estimate of reactivity. Antibodies with normalized intensity values greater than 1 were considered as hits (signal due to antibody−antigen binding), and the corresponding samples were considered as positive for the antigen specifically bound by the antibody.

2.3.8. Statistical Analysis. To remove the effect of outlier samples, the sera in which the number of hits detected was below 1% or above 99% were not considered for further analyses. After this filtering step, 34 CT samples (12 RA, 12 OA, and 10 C) in the discovery phase and 246 NC samples (80 RA, 84 OA, and 82 C) in the validation phase were analyzed. The significance level selected was 0.05. The comparison of the overall serum protein profiles among patient groups (RA, OA, and C) was performed using multivariate analysis of variance (MANOVA). The Canonical Biplot (CB) method25 was also employed to represent both patients and protein levels (MultBiplotR R package version 0.2, http://biplot.usal.es/classicalbiplot/multbiplot-in-r).26 The CB method provides a simultaneous representation of n individuals belonging to K groups and p variables measured on them in a space of reduced dimension, maximizing the ratio of “between-group” to “pooled within-group” variance. It allows researchers not only to observe differences among groups, but also to identify the variables responsible for them. In a CB representation, individuals are displayed as points and variables as vectors. The group means and their confidence intervals are also shown. The length of the vectors shows the relevance of the variables to explain the differences among groups. The angle between variables (vectors) can be interpreted as an approximation of their correlation (small angles indicate high correlations). The projection of individual points over a vector provides an estimation of the levels of the corresponding variable in these individuals. It is possible to determine the magnitude of the differences between groups and their significance by checking the nonoverlap of confidence circles over the variables. These confidence circles correspond to the 95% confidence intervals for the location of the centroid of each group, representing the average individual. The small sample size (n = 34) in the discovery phase did not allow MANOVA analysis (higher number of variables than observations); however, a CB representation of the data provided an approximation for the differences in the sera protein profiles. Subsequent ANOVA and t-test analyses were performed for both discovery and validation phases to explore


differences in the levels of individual proteins among groups. To control for multiple hypothesis testing, the Benjamini−Hochberg method for FDR was employed. FDR level was set at 0.01. Additionally, for the validation phase, a hierarchical clustering analysis using Euclidean distances was carried out on log2 values of normalized signals to address the unsupervised classification of patients based on significant protein levels (FDR < 0.01). Besides, different supervised machine learning approaches were employed to classify the samples based on the levels of the significant proteins (FDR < 0.01). The validation data set was split into a training set (172 samples, 70%) and a test set (74 samples, 30%) with balanced classes. We selected three different, commonly used classifiers, (i) multinomial logistic regression (MLR), (ii) support vector machine with linear kernel (SVM), and (iii) artificial neural network with one hidden layer of three units (NN), and trained them using the train() function in the caret R package version 6.0-70 (https://CRAN.R-project.org/package=caret).27 We used a 10-fold cross-validation procedure, repeated 10 times, with parameter tuning to select the optimal models based on their ROC values. We employed the selected models to make predictions on the test data set, obtaining overall and by class (“one vs all”) statistics (i.e., accuracy, sensitivity, specificity). ROC curves were built for the OA and RA patient groups using the average “one vs all” sensitivity and 1-specificity values across classifiers.

Figure 1. General overview of the experimental procedures. The steps performed to process the antibody arrays were the following. (1) Glass slides’ selection and cleaning. (2) Surface activation by the addition of highly reactive amine groups. (3) Antibody selection and preparation of the antibody and control solutions for printing. (4) Antibody spotting on the slides using two different printing techniques: (a) contact (CT, discovery phase) and (b) noncontact (NC, validation phase). (5) Quality control (QC) of the printing procedure: addition of HRP-conjugated secondary antibodies and revealing by TSA. (6) Sera samples from RA and OA patients and healthy controls (C). (7) Biotinylation of the proteins present in the sera. (8) Incubation of printed arrays with biotinylated sera. Specific antibody−antigen bindings were revealed using Cy3- and Cy5-streptavidin for CT and NC technologies, respectively. (9) Array scanning and image processing.
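For readers unfamiliar with the Benjamini−Hochberg correction named in the statistical analysis, the step-up procedure can be sketched as below. This is a generic textbook illustration in Python, not the code used in the study (the analyses were run in R); the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.01):
    """Return one boolean per p-value: True if the test is declared significant.

    Step-up rule: sort the m p-values, find the largest rank k such that
    p_(k) <= (k / m) * fdr, and reject the k smallest p-values.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical per-protein p-values, for illustration only.
p = [0.0001, 0.004, 0.03, 0.002, 0.5]
print(benjamini_hochberg(p, fdr=0.01))  # [True, True, False, True, False]
```

With five tests at an FDR level of 0.01, the rank thresholds are 0.002, 0.004, 0.006, 0.008, and 0.01, so the three smallest p-values are retained in this example.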

3. RESULTS AND DISCUSSION

In this work, a total of 318 serum samples (36 in a preliminary discovery phase and 282 in a validation phase) from patients with OA, RA, as well as healthy controls, were screened using a panel of 151 antibodies, immobilized on planar microarrays, against 121 preselected antigens (Supporting Table 1). The main goals of this study included the evaluation of the suitability of the experimental pipeline for biomarker discovery and the characterization of differential serum protein profiles of patients suffering from OA and RA as potential diagnostic tools along with the identification of candidate biomarkers.

3.1. Evaluation of Antibody Array Performance

This study was made up of two consecutive phases, discovery and validation, which involved multiple experimental steps (Figure 1). For each phase, a different array printing technology was employed: contact (CT) and noncontact (NC), respectively. To be able to assess functionality and detect differences in performance, both phases followed almost identical processing steps. This experimental design avoided introducing additional biases to the intrinsic variability of the printing procedures. An identical quality control (QC) was designed to evaluate spot features and to detect undesirable effects such as cross-talking or cross-contamination between spots for both CT and NC arrays. Antibody arrays were incubated with HRP-conjugated antirabbit IgG, and the signals were matched with the expected content of each spot (i.e., antibody spot, empty spot, blank spot). In addition, the signal variation could be related to the amount of immobilized antibodies, and saturation concentration was determined. According to the results of the QC, both printing technologies showed high reliability for their usage in biomarker discovery. However, CT arrays displayed irregular shape and larger size of spots, with increased probability of cross-contamination. This limited the number of subarrays that could be printed per slide and therefore the number of replicates and samples that could be assayed per array. CT printing also required considerably larger amounts of clinical sample and reagents (with respect to NC) and consequently was not suitable for the analysis of a large number of clinical samples. Taking into account the limitations of CT printing, we decided to use a small number of samples for the discovery phase (n = 36). This sample size would be sufficient to assess the experimental procedures and characterize differential protein profiles and also for the identification of individual biomarker candidates, assuming enough statistical power even if limited. This decision was key to save important sample and reagent amounts, increasing the performance in the subsequent stage, in which we aimed to validate the results observed in the initial phase and to identify new candidate antigens. Regarding the validation phase, where a larger number of antibody arrays and clinical samples were analyzed, NC technology was selected due to the higher precision printing and lower amount of reagents and samples required.

Figure 2. Immunoreactivity profiles of the patient groups. For discovery and validation phases: (A) signal intensity of hits in rheumatoid arthritis (RA) patients, osteoarthritis (OA) patients, and healthy controls (C); (B) correlation of the percentage ([%]) of positive samples per antibody across patient groups.

Additionally, to evaluate the similarity in performance of both array platforms, the median normalized signal per antigen and patient group (OA, RA, and C) was computed for each array technology. The correlation (r, Pearson) of these values between CT and NC arrays was obtained for each antigen. The distribution of the correlation coefficients is shown in Supporting Figure 2. Note that we did not aim to exhaustively compare both array formats but to assess the degree of similarity in terms of the global trends. Despite the multiple sources of variability (experimental procedures, different samples assayed in each case, etc.), correlations were positive for the majority of antigens (71%), showing similar behaviors in both array platforms. In fact, 31% of the antigens showed correlations larger than 0.8, which implied highly concordant measurements across technologies.

3.2. Identification of Differential Protein Profiles in Serum

3.2.1. Immunoreactivity. Overall, immunoreactivity estimates obtained using both CT and NC technologies displayed a wide range of values across individual samples. Despite the high variability, different profiles were clearly detected, with proteins showing low/moderate levels in certain patient groups and higher in others. In the case of the discovery phase, the total number of hits per sample was similar between the two disease groups (median values of 67.5 and 68.5 in OA and RA, respectively) and higher than for healthy controls (median value of 49.0). The median normalized intensities of hits per sample, estimates of the antigen levels, displayed an analogous behavior (Figure 2A). The percentage of positive samples per antigen was plotted and correlated to illustrate differences in immunoreactivity between the patient groups (Figure 2B, see also Supporting Figure 3). 1894

DOI: 10.1021/acs.jproteome.6b00980 J. Proteome Res. 2017, 16, 1890−1899

Article

Journal of Proteome Research

Figure 3. Differential serum protein profiles across patient groups. (A) For both discovery (left panel) and validation (right panel) phases, canonical biplot (CB) representations of patients (points) and proteins (vectors) are displayed. The protein vectors have been scaled by a factor of 0.5 (discovery phase) and 0.125 (validation phase) to fit the plotting areas. For simplicity, only the proteins with major differences in their levels estimates among groups (rheumatoid arthritis, RA; osteoarthritis, OA and healthy controls, C) are shown (top 5 proteins for discovery phase, significant proteins in t-test, FDR < 0.01 for validation phase). The name of the patient groups represents the position of the group centroid (average individual). The percentage of variability explained by each canonical axis is shown between parentheses. (B) For the validation phase, hierarchical clusters based on the proteins with significant differences (t-test, FDR < 1%) in RA versus C, OA versus C, and RA versus OA are shown. Heatmaps are also displayed, which represent the antigen levels’ estimates (log2).

although clearly higher for pathological sera when compared to the C group (median values of 93.7 for OA, 67.6 for RA, and 21.2 for C). However, the estimates of antigen levels (Figure 2A) displayed similar median levels for RA and C and higher for OA. This may be a result of the antibody selection against antigens that have been previously reported in OA pathology. This pattern was not seen in the discovery phase. The variability of intensity values was higher for the C group, which

The fraction of reactive samples for particular antigens was observed higher in OA and RA when compared with C. Antigens present in a high number of pathological samples and absent in most of the controls would be potential biomarker candidates. We found antigens with disease-specific immunoreactivity profiles for both RA and OA. In the validation stage, the total number of hits per sample was substantially different among the three groups of samples, 1895

DOI: 10.1021/acs.jproteome.6b00980 J. Proteome Res. 2017, 16, 1890−1899

Article

Journal of Proteome Research

Figure 4. Normalized signal for significant (t-test FDR < 0.01) differentially present proteins among patient groups in the validation phase. Box-plots show the distributions of the noncontact (NC) signal (estimate of the protein levels) for the proteins IL1RAP, PLTP, SLC11A1, ANXA6, FBN1, COL1A1, and VASN in rheumatoid arthritis (RA) patients, osteoarthritis (OA) patients, and healthy controls (C).

circles for the group centroids (average individuals) over the antigen vectors, together with the subsequent statistical analyses based on ANOVA and t-tests, showed these differences in antigen levels to be nonsignificant after multiple testing corrections (FDR > 0.01). In the case of the validation stage, different protein profiles were significantly detected (MANOVA Pillai, p-value < 0.001) among the studied groups. CB representation of the serum samples (Figure 3A) shows a good separation of the OA and RA patients and healthy controls upon the canonical axes. The antigens IL1RAP, PLTP, SLC11A1, ANXA6, FBN1, COL1A1, and VASN are displayed as vectors (again, note that this could be done for all the assayed antigens, but we have restricted the representation to the most relevant ones for visualization purposes). These antigens showed significant differences in their levels between groups in ANOVA and t-test analyses after multiple testing corrections (FDR < 0.01) and also large contributions to the differences between groups (Figure 4). IL1RAP showed higher levels in the C group, SLC11A1 and PLTP in the OA group, and COL1A1, ANXA6, FBN1, and VASN in the RA group. The higher correlation (small angles between the corresponding vectors) between the antigens ANXA6, FBN1, and VASN in the CB representation reflects their similar behavior across sera. It is also relevant that vectors corresponding to the same antigens have similar directions in both screening phases, relative to the patient groups, meaning that the antigen levels are high/low in the same groups of samples for both discovery and validation phases (data not shown). This also supports the similarity and comparability between both printing technologies (see section 3.1). The highlighted antigens in both screening phases belong to three major protein groups: (i) Proteins involved in proinflammatory and inflammatory processes such as IL1RAP (interleukin 1 receptor accessory protein) and VASN (vasorin). 
Indeed, IL1RAP is a necessary component of the interleukin 1 receptor complex, which initiates the signaling cascade that results in the activation of IL1-responsive genes. (ii) Lipid metabolism related proteins such as PLTP (phospholipid

suggested that some of the patients in this group might have been erroneously included (note that the inclusion criterion for healthy controls was just the absence of a history of joint disease) and would eventually develop the pathology. As in the initial phase, the percentage of positive samples per antigen was plotted and correlated (Figure 2B, see also Supporting Figure 3). The range of percentages was narrower for this second stage; however, the cloud of points was greatly displaced to the pathology side, meaning that for the vast majority of tested antigens, the fraction of samples showing high levels was remarkably larger in the disease groups than in the healthy controls. We also found antigens with strongly disease-biased immunoreactivity profiles when comparing RA versus OA. 3.2.2. Deciphering Differential Protein Serum Profiles. A CB representation of the samples (points) and antigens (vectors) analyzed in the discovery phase is shown in Figure 3A. This representation of the data reflects the differences in protein profiles across the patient groups. A relatively good separation of the patients is achieved based on the antigen levels, especially in the case of OA samples. The top five antigens with major contribution to the differences among groups (TNF, ITGAM, SPARCL1, TGFBI, VASN) were displayed as vectors, their length representing the size of their contribution (note that this could be done for all the assayed antigens, but we have restricted the representation to the most relevant ones for visualization purposes). Given the properties of the CB representation and the direction of the antigen vectors, it is shown that most of the differences correspond to changes in antigen levels between the C group (lower levels) and both disease groups (higher levels). An exception was the VASN protein, with differences in its levels between the RA group (higher levels) and the other two groups (OA and C). 
We can also make statements about the correlation of the antigen levels across samples. The smaller the angle between their corresponding vectors, the higher their correlation. For instance, here we can see a higher correlation and thus similar behavior across sera for the antigens TNF, SPARCL1, and TGFBI. The overlap of the Bonferroni-corrected confidence

DOI: 10.1021/acs.jproteome.6b00980 J. Proteome Res. 2017, 16, 1890−1899


transfer protein) and ANXA6 (annexin A6). For example, PLTP is one of the lipid transfer proteins that interact with apolipoproteins A1 and A2 (APOA1, APOA2). (iii) Proteins related to ECM formation, degradation, or repair processes such as FBN1 (fibrillin 1) and COL1A1 (collagen type I alpha 1 chain). COL1A1 is one of the components of collagen I, a member of the family of proteins that strengthen and support many tissues in the body (cartilage, bone, tendons, skin, etc.).

Some of the identified proteins had already been reported as relevant for rheumatic pathologies in previous studies,28,29 which can be considered a strength of this work. Others have been related to OA and RA for the first time at the protein level in this study. Note also that VASN was found at higher levels in RA patients than in the OA and C groups in both the discovery and validation phases. This protein has been described as a modulator of the vascular response to injury through the attenuation of TGF-β signaling30 and might have a similar role during joint destruction.

Different unsupervised and supervised methods were employed to classify patients and controls according to their serum protein profiles. We restricted most of these analyses to the validation phase, given the difficulty of applying them to the entire data set, mainly because of the differences in signal distribution between the two array technologies. Median raw intensities were larger in NC than in CT arrays. Signal variability was also platform-dependent, being smaller in NC arrays because of the higher printing precision. Although standardization substantially corrected these issues, hierarchical clustering on the entire study still pointed to the array technology as an important clustering factor (see Supporting Figure 4).
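As an illustration of how signals from two platforms with different intensity ranges and variability can be brought onto a comparable scale, the following minimal sketch standardizes log2 intensities within each platform. It assumes a simple per-platform z-score, which may differ from the standardization actually applied in this study; the function name and all intensity values are hypothetical.

```python
import math
import statistics


def standardize_log2(intensities):
    """Return z-scores of the log2-transformed intensities of one platform.

    Assumption: a plain z-score per platform; not necessarily the
    standardization procedure used in the study.
    """
    logs = [math.log2(x) for x in intensities]
    mean = statistics.fmean(logs)
    sd = statistics.stdev(logs)
    return [(value - mean) / sd for value in logs]


# Hypothetical raw median intensities (made up for illustration only).
ct_raw = [200.0, 400.0, 800.0, 1600.0]      # contact (CT) arrays
nc_raw = [3000.0, 3500.0, 4200.0, 5000.0]   # noncontact (NC) arrays

ct_z = standardize_log2(ct_raw)
nc_z = standardize_log2(nc_raw)

# After standardization both platforms have mean 0 and unit standard deviation.
print([round(z, 2) for z in ct_z])
print([round(z, 2) for z in nc_z])
```

Even after such a transformation, residual platform effects can remain, which is consistent with the clustering by array technology described above.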
In addition, the proteins employed for classification (those identified in the validation stage) were not significant for distinguishing between patient groups in the initial discovery phase (except for VASN). Consequently, including the discovery samples would hinder the classification. Note also that they represent only a small fraction of the total number of samples (36/(282 + 36) ≈ 11%) and therefore would not provide much additional information.

Hierarchical clustering of all antigens and samples analyzed for all patient groups and both array technologies is shown in Supporting Figure 4. Heat maps of the log2 values of the normalized intensities are also displayed. As can be seen, there is no clear separation of the patient groups based on the levels of all the antigens. Additionally, in the validation phase, for each pair of groups, the samples were also clustered based on the antigens whose levels were identified as significantly different (FDR < 0.01) between patient groups (Figure 3B). A substantially better classification was achieved in this case.

We further addressed the supervised classification of the validation samples based on the levels of the significantly differential proteins (FDR < 0.01) using different classifiers (multinomial logistic regression, MLR; support vector machine with linear kernel, SVM; and neural network with one hidden layer of three units, NN). The average classification accuracy achieved in the test set was 0.717 (very similar across classifiers; MLR: 0.712, SVM: 0.726, NN: 0.712). Other statistics estimated by class (“one vs all” classification) were also obtained and are summarized in Supporting Table 2. As they were similar across classifiers, we focused on their average values. ROC curves built upon these values for OA and RA are shown in Figure 5.
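The two summary figures quoted above (the average test-set accuracy and the share of discovery samples) follow directly from the reported numbers. The short check below reproduces them; the variable names are illustrative and only the numeric inputs come from the text.

```python
# Test-set accuracies reported in the text for each classifier.
accuracy = {"MLR": 0.712, "SVM": 0.726, "NN": 0.712}
mean_accuracy = sum(accuracy.values()) / len(accuracy)

# Cohort sizes reported in the text.
discovery_n = 36
validation_n = 282
discovery_fraction = discovery_n / (discovery_n + validation_n)

print(f"mean accuracy: {mean_accuracy:.3f}")
print(f"discovery fraction: {discovery_fraction:.1%}")
```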

Figure 5. ROC curves. Representation in ROC space, for each classifier (multinomial logistic regression, MLR; support vector machine with linear kernel, SVM; and neural network with one hidden layer of three units, NN) and disease group (osteoarthritis, OA, and rheumatoid arthritis, RA), of the sensitivity and 1-specificity values corresponding to the “one vs all” classification in the test set. The curves are drawn through the group centroids.
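For readers less familiar with “one vs all” statistics, the sketch below shows how sensitivity and specificity for one class can be derived from a multiclass confusion matrix, giving one point (1-specificity, sensitivity) in ROC space per class. The function name and the counts are hypothetical and chosen only for illustration; they are not the study's data.

```python
def one_vs_all(confusion, labels, positive):
    """Return (sensitivity, specificity) treating `positive` as the positive class.

    confusion[i][j] is the number of samples whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    k = labels.index(positive)
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                   # positives predicted as another class
    fp = sum(row[k] for row in confusion) - tp    # other classes predicted as positive
    tn = total - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)


labels = ["C", "OA", "RA"]
# Hypothetical test-set counts, for illustration only.
confusion = [
    [10, 2, 0],   # true C
    [3, 16, 1],   # true OA
    [1, 7, 8],    # true RA
]

for name in labels:
    sens, spec = one_vs_all(confusion, labels, name)
    print(f"{name}: sensitivity={sens:.3f}, 1-specificity={1 - spec:.3f}")
```

In this made-up example the RA class combines low sensitivity with high specificity, the same qualitative pattern the text reports for RA.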

The highest sensitivity (proportion of positives that were correctly classified as such) was achieved for the control group (0.833). Sensitivity for the OA group was also high (0.800). However, it was low for RA classification (0.514). Specificity (proportion of negatives that were correctly identified as such) estimates were especially high for the RA group (0.966), followed by C (0.857) and OA (0.750). This means that it is possible to discriminate with high confidence between diseased and healthy individuals. Additionally, it is feasible to determine with moderate confidence whether an individual is affected by OA. However, in the case of RA, individuals without the disease are identified as such with high confidence (high specificity), whereas many RA cases are missed (low sensitivity). This is likely due to the antibody selection, which targeted antigens previously reported in OA pathology that may not be related to RA.

The present study provides interesting insights into the differential protein serum profiles of OA, RA, and healthy individuals, leading to candidate antigens worthy of further evaluation. However, additional steps will be required to verify the current findings. These could also be immunoassay-based, employing alternative formats to the CT/NC antibody arrays (e.g., beads or ELISA), but would preferably rely on more orthogonal techniques such as mass spectrometry (MS) combined with liquid chromatography (HPLC) and different labeling techniques (e.g., isobaric tags for relative and absolute quantitation, iTRAQ; or tandem mass tags, TMT).

Finally, although studying the clinical interest of individual antigens is beyond the goals of the present study, it is worth mentioning that the profiles identified, together with information on other clinical variables (bone densitometry, current treatment, environmental factors, etc.), constitute a valuable resource for early diagnosis, progression evaluation, and treatment response assessment in OA and RA. In addition, antibody arrays provide a simple, fast, and miniaturized technology whose immunoassay format allows direct translation into the clinic (e.g., ELISA).


4. CONCLUSIONS

Our results showed differential protein profiles between patient groups. Immunoreactivity of pathological samples (OA and RA) was substantially higher than that of healthy control samples. A set of antigens showing significantly different levels between groups was identified, mostly related to inflammatory response, lipid metabolism, and bone and ECM degradation or remodeling. Unsupervised and supervised machine learning approaches allowed the accurate classification of the patients based on these antigens, which constitute candidate biomarkers. NC printing appears to be the best strategy to process a higher number of samples in a reproducible manner, reducing the amount of antibodies, reagents, and sample required. Nevertheless, further studies will be necessary to accurately quantify differences in protein levels, with a larger number of samples, including other rheumatic/inflammatory pathologies and exploring the correlation with clinical information (e.g., treatment and disease status).



ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jproteome.6b00980.

TIFF images and design of contact (CT) and noncontact (NC) microarrays; distribution of correlation of median antigen levels per patient group between array platforms; percentage of hits in serum samples of each patient group; clustering analysis of samples based on all antigen levels; antibodies employed in the study; classification statistics (PDF)

AUTHOR INFORMATION

Corresponding Authors

*E-mail: [email protected]. Phone: +34 981 17 82 72. Fax: +34 981 17 82 73.
*E-mail: [email protected]. Phone: +34 923294811. Fax: +34 923294743.

ORCID

Álvaro Sierra-Sánchez: 0000-0003-4511-408X
Diego Garrido-Martín: 0000-0002-4131-4458
María González-González: 0000-0001-8115-1086
Paula Díez: 0000-0003-2150-6898
Cristina Ruiz-Romero: 0000-0001-7649-9803
Ronald Sjöber: 0000-0003-1363-5796
Conrad Droste: 0000-0001-6027-5805
Javier De Las Rivas: 0000-0002-0984-9946
Manuel Fuentes: 0000-0002-7305-3766

Author Contributions

∥These authors contributed equally to this work.

Notes

The authors declare no competing financial interest.

ACKNOWLEDGMENTS

We gratefully acknowledge financial support from the Carlos III Health Institute of Spain (ISCIII, FIS PI14/01538, PI12/00624, PI12/00329, PI14/01707, CIBER-CB06/01/0040, and RETIC-RIER-RD12/0009/0018), Fondos FEDER (EU), Junta Castilla y León (BIO/SA07/15), and Fundación Solórzano (FS-23-2015). The Proteomics Unit belongs to ProteoRed, PRB2-ISCIII, supported by Grant No. PT13/0001 (ISCIII-Fondos FEDER). P.D. and C.D. are supported by a JCYL-EDU/346/2013 Ph.D. scholarship. The work of D.G.M. was supported by a collaboration scholarship (BOE-A-2014-3844) from the Spanish Ministry of Education, Culture, and Sports (MECD) and awarded in the XIII Certamen Universitario Arquímedes (BOE-A-2014-12688, MECD). We also thank Javier Martín Vallejo and Manuel Muñoz Aguirre for useful discussions and statistical support.

REFERENCES

(1) Cross, M.; Smith, E.; Hoy, D.; Nolte, S.; Ackerman, I.; Fransen, M.; Bridgett, L.; Williams, S.; Guillemin, F.; Hill, C. L.; et al. The global burden of hip and knee osteoarthritis: estimates from the global burden of disease 2010 study. Ann. Rheum. Dis. 2014, 73 (7), 1323−1330.
(2) Henjes, F.; Lourido, L.; Ruiz-Romero, C.; Fernández-Tajes, J.; Schwenk, J. M.; Gonzalez-Gonzalez, M.; Blanco, F. J.; Nilsson, P.; Fuentes, M. Analysis of autoantibody profiles in osteoarthritis using comprehensive protein array concepts. J. Proteome Res. 2014, 13 (11), 5218−5229.
(3) Rousseau, J.-C.; Delmas, P. D. Biological markers in osteoarthritis. Nat. Clin. Pract. Rheumatol. 2007, 3 (6), 346−356.
(4) Ruiz-Romero, C.; Blanco, F. J. Proteomics role in the search for improved diagnosis, prognosis and treatment of osteoarthritis. YJOCA 2010, 18, 500−509.
(5) Nedelkov, D.; Kiernan, U. A.; Niederkofler, E. E.; Tubbs, K. A.; Nelson, R. W. Investigating diversity in human plasma proteins. Proc. Natl. Acad. Sci. U. S. A. 2005, 102 (31), 10852−10857.
(6) Balakrishnan, L.; Nirujogi, R.; Ahmad, S.; Bhattacharjee, M.; Manda, S. S.; Renuse, S.; Kelkar, D. S.; Subbannayya, Y.; Raju, R.; Goel, R.; et al. Proteomic analysis of human osteoarthritis synovial fluid. Clin. Proteomics 2014, 11 (1), 6.
(7) Kharaz, Y. A.; Tew, S. R.; Peffers, M.; Canty-Laird, E. G.; Comerford, E. Proteomic differences between native and tissue-engineered tendon and ligament. Proteomics 2016, 16 (10), 1547−1556.
(8) Hsueh, M.-F.; Khabut, A.; Kjellström, S.; Önnerfjord, P.; Kraus, V. B. Elucidating the Molecular Composition of Cartilage by Proteomics. J. Proteome Res. 2016, 15 (2), 374−388.
(9) Briggs, M. T.; Kuliwaba, J. S.; Muratovic, D.; Everest-Dass, A. V.; Packer, N. H.; Findlay, D. M.; Hoffmann, P. MALDI mass spectrometry imaging of N-glycans on tibial cartilage and subchondral bone proteins in knee osteoarthritis. Proteomics 2016, 16 (11−12), 1736−1741.
(10) Anderson, N. L.; Polanski, M.; Pieper, R.; Gatlin, T.; Tirumalai, R. S.; Conrads, T. P.; Veenstra, T. D.; Adkins, J. N.; Pounds, J. G.; Fagan, R.; et al. The human plasma proteome: a nonredundant list developed by combination of four separate sources. Mol. Cell. Proteomics 2004, 3 (4), 311−326.
(11) Gillette, M. A.; Mani, D. R.; Carr, S. A. Place of Pattern in Proteomic Biomarker Discovery. J. Proteome Res. 2005, 4 (4), 1143−1154.
(12) Ayoglu, B.; Häggmark, A.; Khademi, M.; Olsson, T.; Uhlén, M.; Schwenk, J. M.; Nilsson, P. Autoantibody profiling in multiple sclerosis using arrays of human protein fragments. Mol. Cell. Proteomics 2013, 12 (9), 2657−2672.
(13) González-González, M.; Bartolome, R.; Jara-Acevedo, R.; Casado-Vela, J.; Dasilva, N.; Matarraz, S.; García, J.; Alcazar, J. A.; Sayagues, J. M.; Orfao, A.; et al. Evaluation of homo- and heterofunctionally activated glass surfaces for optimized antibody arrays. Anal. Biochem. 2014, 450, 37−45.
(14) McWilliam, I.; Kwan, M. C.; Hall, D. Methods Mol. Biol. 2011, 785, 345−361.
(15) Gonzalez-Gonzalez, M.; Jara-Acevedo, R.; Matarraz, S.; Jara-Acevedo, M.; Paradinas, S.; Sayagües, J. M.; Orfao, A.; Fuentes, M. Nanotechniques in proteomics: protein microarrays and novel detection platforms. Eur. J. Pharm. Sci. 2012, 45 (4), 499−506.
(16) Lourido, L.; Calamia, V.; Mateos, J.; Fernández-Puente, P.; Fernández-Tajes, J.; Blanco, F. J.; Ruiz-Romero, C. Quantitative Proteomic Profiling of Human Articular Cartilage Degradation in Osteoarthritis. J. Proteome Res. 2014, 13 (12), 6096−6106.
(17) Fernández-Puente, P.; Mateos, J.; Fernández-Costa, C.; Oreiro, N.; Fernández-López, C.; Ruiz-Romero, C.; Blanco, F. J. Identification of a Panel of Novel Serum Osteoarthritis Biomarkers. J. Proteome Res. 2011, 10 (11), 5095−5101.
(18) Mateos, J.; Lourido, L.; Fernández-Puente, P.; Calamia, V.; Fernández-López, C.; Oreiro, N.; Ruiz-Romero, C.; Blanco, F. J. Differential protein profiling of synovial fluid from rheumatoid arthritis and osteoarthritis patients using LC−MALDI TOF/TOF. J. Proteomics 2012, 75 (10), 2869−2878.
(19) Díaz-Prado, S.; Cicione, C.; Muiños-López, E.; Hermida-Gómez, T.; Oreiro, N.; Fernández-López, C.; Blanco, F. J. Characterization of microRNA expression profiles in normal and osteoarthritic human chondrocytes. BMC Musculoskeletal Disord. 2012, 13 (1), 144.
(20) Evangelou, E.; Kerkhof, H. J.; Styrkarsdottir, U.; Ntzani, E. E.; Bos, S. D.; Esko, T.; Evans, D. S.; Metrustry, S.; Panoutsopoulou, K.; Ramos, Y. F. M.; et al. A meta-analysis of genome-wide association studies identifies novel variants associated with osteoarthritis of the hip. Ann. Rheum. Dis. 2014, 73 (12), 2130−2136.
(21) Altman, R.; Asch, E.; Bloch, D.; Bole, G.; Borenstein, D.; Brandt, K.; Christy, W.; Cooke, T. D.; Greenwald, R.; Hochberg, M. Development of criteria for the classification and reporting of osteoarthritis. Classification of osteoarthritis of the knee. Diagnostic and Therapeutic Criteria Committee of the American Rheumatism Association. Arthritis Rheum. 1986, 29 (8), 1039−1049.
(22) Aletaha, D.; Neogi, T.; Silman, A. J.; Funovits, J.; Felson, D. T.; Bingham, C. O.; Birnbaum, N. S.; Burmester, G. R.; Bykerk, V. P.; Cohen, M. D.; et al. 2010 rheumatoid arthritis classification criteria: an American College of Rheumatology/European League Against Rheumatism collaborative initiative. Ann. Rheum. Dis. 2010, 69 (9), 1580−1588.
(23) Häggmark, A.; Byström, S.; Ayoglu, B.; Qundos, U.; Uhlén, M.; Khademi, M.; Olsson, T.; Schwenk, J. M.; Nilsson, P. Antibody-based profiling of cerebrospinal fluid within multiple sclerosis. Proteomics 2013, 13 (15), 2256−2267.
(24) Díez, P.; Dasilva, N.; González-González, M.; Matarraz, S.; Casado-Vela, J.; Orfao, A.; Fuentes, M. Data Analysis Strategies for Protein Microarrays. Microarrays 2012, 1 (2), 64−83.
(25) Varas, M. J.; Vicente-Tavera, S.; Molina, E.; Vicente-Villardón, J. L. Role of canonical biplot method in the study of building stones: an example from Spanish monumental heritage. Environmetrics 2005, 16 (4), 405−419.
(26) Vicente-Villardón, J. L. MultBiplotR: Multivariate Analysis Using Biplots. R package version 0.2, 2015. http://biplot.usal.es/classicalbiplot/multbiplot-in-r (accessed July 27, 2016).
(27) Kuhn, M.; Wing, J.; Weston, S.; Williams, A.; Keefer, C.; Engelhardt, A.; Cooper, T.; Mayer, Z.; Kenkel, B.; the R Core Team; Benesty, M.; Lescarbeau, R.; Ziem, A.; Scrucca, L.; Tang, Y.; Candan, C. caret: Classification and Regression Training. R package version 6.0-70, 2016. https://CRAN.R-project.org/package=caret (accessed February 6, 2016).
(28) Balakrishnan, L.; Nirujogi, R.; Ahmad, S.; Bhattacharjee, M.; Manda, S. S.; Renuse, S.; Kelkar, D. S.; Subbannayya, Y.; Raju, R.; Goel, R.; et al. Proteomic analysis of human osteoarthritis synovial fluid. Clin. Proteomics 2014, 11 (1), 6.
(29) Campbell, K. A.; Minashima, T.; Zhang, Y.; Hadley, S.; Lee, Y. J.; Giovinazzo, J.; Quirno, M.; Kirsch, T. Annexin A6 interacts with p65 and stimulates NF-κB activity and catabolic events in articular chondrocytes. Arthritis Rheum. 2013, 65 (12), 3120−3129.
(30) Ikeda, Y.; Imai, Y.; Kumagai, H.; Nosaka, T.; Morikawa, Y.; Hisaoka, T.; Manabe, I.; Maemura, K.; Nakaoka, T.; Imamura, T.; et al. Vasorin, a transforming growth factor β-binding protein expressed in vascular smooth muscle cells, modulates the arterial response to injury in vivo. Proc. Natl. Acad. Sci. U. S. A. 2004, 101 (29), 10732−10737.