J Electrophoresis 2016;60:1 doi:10.2198/jelectroph.60.1
[Short Communication]
ModProt: a database for integrating laboratory and literature data about protein post-translational modifications
Yayoi Kimura*, Tosifusa Toda and Hisashi Hirano
Advanced Medical Research Center, Yokohama City University
(Received December 24, 2015; Accepted January 29, 2016)
SUMMARY
Protein post-translational modifications (PTMs) play crucial roles in the regulation of protein function and cell signaling, and abnormalities in protein PTMs are both causes and consequences of disease. Mass spectrometry (MS) is widely used to analyze protein PTMs. In this study, we developed an original database, ModProt (Post-Translational Modification Map of Proteome), to integrate our laboratory data and literature information regarding PTM sites. To develop the ModProt database, we constructed a web-based laboratory information management system (LIMS). This system allows us to administer the ModProt database and to view PTM site maps and the corresponding protein information, including amino acid sequences, official gene symbols, UniProt accessions/IDs, chromosome numbers/positions, and additional descriptions. The ultimate goal of the ModProt database is to enable PTM-based diagnosis and personalized medicine by detecting abnormal PTMs through comparison of PTM site maps in healthy and disease states.
Key words: mass spectrometry, post-translational modification, proteome, LIMS, ModProt
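As a rough illustration of the kind of record and comparison described in the summary, the following Python sketch models a protein entry with the listed fields and reports the PTM sites that differ between two site maps. The names `PtmSite`, `ProteinEntry` and `diff_site_maps`, the placeholder values, and the set-based comparison are assumptions made for illustration only; they are not taken from the ModProt implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PtmSite:
    """One modified residue (hypothetical structure, not ModProt's schema)."""
    position: int      # 1-based residue position in the protein sequence
    residue: str       # one-letter amino acid code, e.g. "S"
    modification: str  # e.g. "phosphorylation"


@dataclass
class ProteinEntry:
    """Protein information fields named in the summary."""
    sequence: str
    gene_symbol: str
    uniprot_accession: str
    chromosome: str
    description: str = ""
    sites: set = field(default_factory=set)


def diff_site_maps(healthy, disease):
    """Return the PTM sites that appear in only one of two site maps."""
    return {
        "only_in_healthy": healthy - disease,
        "only_in_disease": disease - healthy,
    }


# Invented placeholder data, for illustration only.
healthy = {PtmSite(5, "S", "phosphorylation"), PtmSite(12, "K", "acetylation")}
disease = {PtmSite(5, "S", "phosphorylation"), PtmSite(20, "K", "ubiquitination")}

entry = ProteinEntry(
    sequence="MAAASAAAAAAKAAAAAAAKA",  # placeholder sequence
    gene_symbol="GENE1",               # placeholder gene symbol
    uniprot_accession="P00000",        # placeholder accession
    chromosome="1",
    sites=healthy,
)

result = diff_site_maps(healthy, disease)
```

In this sketch a site is identified by its position, residue and modification type together, so two different modifications at the same residue are treated as distinct sites.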
INTRODUCTION
Most proteins undergo various post-translational modifications (PTMs), which may affect protein function by changing conformational state, stability, or interactions with other proteins or small molecules. Consequently, protein PTMs play key roles in regulating many biological processes, including gene expression and cellular differentiation, and also contribute significantly to the structural and functional diversity of proteins1–3). Because of the functional importance of these modifications, abnormalities in protein PTMs are associated with the causes and consequences of diseases4, 5). Several protein PTMs are the molecular targets of drug therapies. Therefore, analysis of protein PTMs is important not only for understanding cellular functions, but also for developing drug therapies for disorders such as cancer and neurodegeneration. It is essential to investigate the influence of PTMs on protein functions by comprehensively collecting information about them. Therefore, in this study we sought to develop an original database, ModProt (Post-Translational Modification Map of Proteome), to integrate PTM data obtained in the laboratory and information in the literature. Several differences in PTM status between healthy and disease states can be detected by comparing PTM information in this database.
Mass spectrometry (MS) is a powerful tool for identifying and mapping protein PTMs6). To achieve MS-based large-scale PTM analysis, advances in mass spectrometer performance and techniques for enrichment of modified peptides are critical. Recently, mass spectrometers have evolved dramatically, and multiple enrichment techniques have been developed1, 6). As a result, the number of PTM sites that can be identified in a sample has dramatically increased. For example, large-scale analysis of phosphorylated peptides can be performed by MS-based proteomics combined with enrichment techniques using titanium dioxide or Phos-tag agarose beads7). Peptides containing ubiquitinated, methylated and acetylated lysine residues can be analyzed by MS-based proteomics with enrichment using PTM-specific antibodies1, 6). Consequently, we have obtained a great deal of information regarding the PTM sites of various proteins. However, the results of comprehensive proteomic analysis of PTMs have not been satisfactorily and efficiently utilized to understand the functional regulation of proteins. This is
* Corresponding author: Yayoi Kimura; Advanced Medical Research Center, Yokohama City University, 3-9 Fukuura, Kanazawa-ku, Yokohama, Kanagawa, 236-0004, Japan. E-mail: [email protected]; Fax: +81-45-787-2787
mainly because previously, no system was available for integrating and managing PTM information collected by comprehensive MS-based proteomic approaches in the laboratory. To address this need, in this study we constructed a web-based laboratory information management system (LIMS). We anticipate that the ModProt database will serve as a powerful tool for basic biomedical research and advanced clinical investigations based on protein PTM data in the context of personalized medicine.
MATERIALS AND METHODS
Data collection
Experimental information about protein PTMs was systematically assembled from our MS analysis data. For MS-based proteomics, enrichment methods for modified peptides were performed as described previously with slight modifications1, 6). Each enriched peptide sample was analyzed using an LTQ Orbitrap Velos hybrid mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) coupled to an UltiMate® 3000 LC system (Dionex, LC Packings, Sunnyvale, CA, USA) or a TripleTOF 5600 system (AB Sciex, Foster City, CA, USA) coupled to a DiNa-AP (KYA Technologies, Tokyo, Japan). To identify peptides, peak lists were created using the Proteome Discoverer software (Thermo Fisher Scientific) for an LTQ Orbitrap Velos or the ProteinPilot software (AB Sciex) for a TripleTOF 5600 system, and were then searched against human protein sequences in the UniProt Knowledgebase (UniProtKB/Swiss-Prot, version Jan 2013; 538,849 entries) using MASCOT (v2.4.1, Matrix Science, London, UK). The basic search parameters were
as follows: trypsin digestion with two missed cleavages permitted; peptide charge (2+, 3+ and 4+); usual variable modifications, protein N-terminal acetylation, N-terminal carbamylation, oxidation of methionine, and carbamidomethylation of cysteine; peptide mass tolerance and fragment mass tolerance, ±5 ppm and ±0.5 Da for an LTQ Orbitrap Velos or ±0.05 Da and ±0.1 Da for a TripleTOF 5600 system. In addition, the PTM target of each MS analysis was added to the variable modification parameters. We used a significance threshold of p