Figure S1 - Springer Static Content Server

3 downloads 16 Views 8MB Size Report
Add to Database a. dbSNP. SNP. Indel. Frameshifts/Stop losses. COSMIC .... Figure S7: Variants identified for genes in the COSMIC cancer gene census tend to ...
Figure S1

a

Germline or Somatic Variant Calls

Appropriate Genome Build Variant Effect Predictor

Coding Variants and Consequences Mutate reference proteins Generate Tryptic Peptides Add to Database

Augmented peptide database

b

* 4 databases derived from RNA-Seq SNV Fusion Frameshifts Stop-loss

Sample specific RNA

x41 cell-lines with RNA-seq data

* 4 databases derived combination from Exome-Seq with dbSNP and COSMIC SNV Indel Frameshifts Stop-loss

Sample specific Exome

x59 cell-lines with exome-seq data

164 databases added

236 databases added

5 databases added COSMIC Gene Census

dbSNP

*

SNP Indel Frameshifts/Stop losses

x59 cell-lines with exome-seq data

59 databases added

5 databases added COSMIC Gene Census

COSMIC

*

SNV Indel Fusions/Frameshifts and Stop losses

*

4 databases added

Variants in UniProt

all RNA-Seq Data

all Exome-Seq Data

combined

473 databases added Databases referred to Databases referred to as community-based

Figure S1: Generation of proteogenomic databases (a) Scheme for the generation of databases suitable for MS-based detection of protein variants. (b) Overview of databases generated.

1

* *

Figure S2

a

Variant peptide database

Uniprot database

Reverse variants (augmented decoy database)

Reverse Uniprot (decoy database) Merge databases

b All identified proteogenomic peptides

Identified PTMs from MaxQuant

Spectra identified by both Same peptide Altered

Same site mutated

Search combined database Variant peptides

reject no reject no

Final set of peptides

Reference peptides

Comet Tandem MSGF+ Comet Tandem MSGF+ 55,440 Proteogenomic Search Combinations Filter against reference proteomes

Union

Comet Tandem

Union

MSGF+

Filter for post-translational modifications

all identified peptides Protein expression Filter

Final set of variant peptides

2

Figure S2: Proteogenomic search and filtering strategy (a) Search schema used to identify variant peptides within proteomic datasets. Databases were searched using a split target-decoy strategy with both sequences to the augmented and reference proteins reversed. Three search algorithms (Tandem, COMET, and MS-GF+) were used and results were combined. (b) Schematic representation of PTM filtering strategy. All MS2 spectra identified by both our proteogenomics pipeline and identified as having mass-shifts to canonical peptides by MaxQuant were collected. If there was disagreement between the reference peptide altered by either pipeline, the PSM was rejected. Conservatively, we also rejected peptides if there was further disagreement regarding the site of modification.

Figure S3

UniProt+isoforms (SwissProt+Trembl) UniProt (SwissProt)

0

Ensembl Proteome GRCh37.75

339,321 27,987

31,248 229,053

14,585

0

2,224

102,368

109,424

2,064,452

18,838 12,943

0 0

RefSeq Proteome v. 68

Figure S3: Comparison of reference proteomes. Venn diagram comparing the four reference databases used in this study. Proteins in each reference proteome were in silico digested and filtered in the length range 6-35 amino-acids. The Venn diagram portrays the overlaps of unique peptides originating from each reference database.

3

Figure S4

843 reported variant peptides 839 831 817 476 474

Present in our Exome-Seq databases Detected before 1% FDR cutoff Detected after 1% FDR Peptides not in reference proteomes Cannot be explained by PTMs

459 variant peptides

remain having evidence for protein expression

Figure S4: Comparison to other studies. To illustrate the importance of peptide filters, we compared our results to a previous study [41]. In their study, both a cell-line specific database search and a database combining all exome sequencing data were used. The figure reports the number of peptides identified by [41] that remain after the various filtration approaches in our pipeline.

4

Figure S5

Hydrophobicity

Biophysical properties at Tier 2 150 100

p < 1.0x10-16 effect = 0.02 size

p = 1.0x10-16 effect = 0.01 size

150 100

150 100

50

50

0

0

0

-50

-50

-50

p-value

Suggest Documents