MAPPI-DAT: data management and analysis for protein-protein interaction data from the high-throughput MAPPIT cell microarray platform
Surya Gupta1,2,3, Veronic De Puysseleyr1,2, José Van Der Heyden1,2, Davy Maddelein1,2,3, Irma Lemmens1,2, Sam Lievens1,2, Lennart Martens1,2,3, Jan Tavernier1,2
1 Medical Biotechnology Center, VIB, Ghent, Belgium; 2 Department of Biochemistry, Ghent University, Ghent, Belgium; 3 Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
Introduction
Proteins carry out a wide range of cellular and molecular functions. Identifying and quantifying proteins, together with the proteins, nucleic acids and other molecules they interact with, can provide insight into development and disease mechanisms at the systems level. Yet studying these interactions is not trivial: existing in vivo methods for detecting them suffer from several drawbacks [4]. To overcome these problems, the Cytokine Receptor Lab established MAPPIT (Mammalian Protein-Protein Interaction Trap) [2], an approach for identifying the interaction partners of proteins in mammalian cells. To allow thousands of interactors to be screened simultaneously, MAPPIT was parallelized in the array MAPPIT system [3]. However, no effective pipeline existed to process the high-throughput data generated by array MAPPIT. We therefore built MAPPI-DAT (MAPPIT Array Protein-Protein Interaction Database & Analysis Tool), an automated high-throughput data analysis system that can process many thousands of data points per experiment and includes a storage system that keeps the experimental data in a structured form for meta-analysis.
Approach and Methodology
1. Concept and Raw Data
In MAPPIT, the bait protein is fused to a mutant receptor that lacks the STAT recruitment sites, while the prey is fused to a receptor fragment containing these STAT recruitment sites (Figure 1). Bait-prey interaction leads to functional complementation of the receptor and to STAT activation. In array MAPPIT, the activated STATs migrate to the nucleus and induce expression of fluorescent reporter genes, so that fluorescence intensity can be measured as a proxy for interaction. Each experiment contains two conditions: stimulated and non-stimulated (the control). The measured intensities follow a heavy-tailed distribution (Figure 2).
Figure 1: MAPPIT schema
Figure 2: Histogram of stimulated and non-stimulated intensities (x-axis: intensity; y-axis: counts)
2. Analysis
Fluorescence intensities are used to calculate a fold-change-based ranking of each data point across the different interactors. The rank product of each bait-prey pair is then converted into a p-value with an existing method designed specifically to compute accurate p-values for replicated microarray experiments [1]. To correct for multiple hypothesis testing, we used the R package "qvalue" [5].
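The ranking and correction steps can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the MAPPI-DAT implementation (which uses R): fold change is assumed to be the stimulated over the non-stimulated intensity of a data point within one replicate, rank 1 goes to the largest fold change in each replicate, and the rank product is the geometric mean of a data point's ranks across replicates, as in the RankProd approach. The p-values themselves would come from the Heskes et al. [1] algorithm, which is not reproduced here. `storey_qvalues` is a simplified stand-in for the qvalue package: it estimates the null proportion π0 at a single, assumed λ = 0.5 instead of the package's default smoothing over many λ values. All function names are hypothetical.

```python
import math
from typing import Sequence


def fold_changes(stimulated: Sequence[float], unstimulated: Sequence[float]) -> list[float]:
    # Assumption: fold change = stimulated / non-stimulated intensity per data point.
    return [s / u for s, u in zip(stimulated, unstimulated)]


def ranks_descending(values: Sequence[float]) -> list[int]:
    # Rank 1 goes to the largest value; ties are broken by original order.
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def rank_products(replicates: Sequence[Sequence[float]]) -> list[float]:
    """Geometric mean of each data point's rank across replicate fold-change vectors."""
    per_replicate = [ranks_descending(fc) for fc in replicates]
    k = len(per_replicate)
    n = len(per_replicate[0])
    return [
        math.exp(sum(math.log(r[i]) for r in per_replicate) / k)
        for i in range(n)
    ]


def storey_qvalues(pvalues: Sequence[float], lam: float = 0.5) -> list[float]:
    """Simplified Storey (2002) q-values, estimating pi0 at one fixed lambda."""
    m = len(pvalues)
    # pi0: estimated fraction of true nulls, from the p-values above lambda.
    pi0 = min(1.0, sum(p > lam for p in pvalues) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone in p.
    for pos in range(m - 1, -1, -1):
        i = order[pos]
        running_min = min(running_min, pi0 * m * pvalues[i] / (pos + 1))
        q[i] = running_min
    return q
```

Because ranks are combined by a geometric mean, a data point only obtains a small rank product if it ranks near the top in every replicate; a single strong replicate is not enough.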
3. Post-filtration
Figure 3 values for the example hit: pfp = 0.0457, p-value = 6.00E-04, fold change = 1.255493
To minimize false positive hits in the RankProd output (Figure 3), quartile-based filtration was applied. This classical approach removes outliers robustly, using the threshold in Equation 1:
$$\text{Threshold} = Q_3 + 1.5 \times \mathrm{IQR}, \qquad \mathrm{IQR} = Q_3 - Q_1$$
where $Q_1$ and $Q_3$ are the first and third quartiles.
Equation 1: threshold used to filter false positives
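Equation 1 is Tukey's upper fence, which can be sketched as below. This is a minimal Python example, assuming quartiles computed by linear interpolation (the default of R's `quantile` and of `statistics.quantiles(..., method="inclusive")`). The poster does not state which quartile definition MAPPI-DAT uses, nor which intensities (for example, the non-stimulated controls) the threshold is computed from, so both are assumptions, and the function names are hypothetical.

```python
import statistics
from typing import Sequence


def upper_fence(values: Sequence[float], k: float = 1.5) -> float:
    """Q3 + k * IQR, with quartiles by linear interpolation (assumed definition)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def flag_outliers(values: Sequence[float], k: float = 1.5) -> list[bool]:
    # True marks a value lying above the Equation 1 threshold.
    threshold = upper_fence(values, k)
    return [v > threshold for v in values]


intensities = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(upper_fence(intensities))    # 14.5
print(flag_outliers(intensities))  # only the last value is flagged
```

Keeping `k` as a parameter preserves the 1.5 multiplier of Equation 1 as the default while making it easy to see how a stricter or looser fence changes which hits are removed.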
Figure 3 histogram (x-axis: intensity; y-axis: counts): intensity of the example hit = 9.4E+08; threshold intensity = 6.91E+08
Figure 3: Example of a false positive hit from RankProd
MAPPI-DAT visual interface
Figure 4: Interface for analysis using RankProd
Figure 5: Interface for loading data into the database
Figure 6: Interface for retrieving and visualizing the data in the database for each project and experiment
Figure 7: MAPPI-DAT database schema
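The structured storage described above (projects containing experiments, loaded and retrieved through the interfaces in Figures 5 and 6) can be illustrated with a small relational sketch. Everything here is hypothetical: the table names, columns and functions are assumptions made for illustration, SQLite stands in for whatever database MAPPI-DAT actually uses, and the real schema is the one shown in Figure 7.

```python
import sqlite3
from typing import Iterable

# Hypothetical schema for illustration only; Figure 7 shows the real MAPPI-DAT schema.
SCHEMA = """
CREATE TABLE project (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE experiment (
    id         INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES project(id),
    name       TEXT NOT NULL
);
CREATE TABLE measurement (
    id            INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiment(id),
    bait          TEXT NOT NULL,
    prey          TEXT NOT NULL,
    replicate     INTEGER NOT NULL,
    stimulated    INTEGER NOT NULL CHECK (stimulated IN (0, 1)),
    intensity     REAL NOT NULL
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def load_experiment(conn: sqlite3.Connection, project: str, experiment: str,
                    rows: Iterable[tuple]) -> int:
    """Store (bait, prey, replicate, stimulated, intensity) rows under a project/experiment."""
    conn.execute("INSERT OR IGNORE INTO project (name) VALUES (?)", (project,))
    project_id = conn.execute(
        "SELECT id FROM project WHERE name = ?", (project,)).fetchone()[0]
    cur = conn.execute(
        "INSERT INTO experiment (project_id, name) VALUES (?, ?)", (project_id, experiment))
    experiment_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO measurement (experiment_id, bait, prey, replicate, stimulated, intensity) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        [(experiment_id, *row) for row in rows],
    )
    conn.commit()
    return experiment_id


def intensities_for_project(conn: sqlite3.Connection, project: str) -> list[tuple]:
    # Cross-experiment retrieval of the kind a meta-analysis needs.
    return conn.execute(
        """SELECT e.name, m.prey, m.replicate, m.stimulated, m.intensity
           FROM measurement m
           JOIN experiment e ON e.id = m.experiment_id
           JOIN project p ON p.id = e.project_id
           WHERE p.name = ?
           ORDER BY e.name, m.prey, m.replicate, m.stimulated""",
        (project,),
    ).fetchall()
```

Storing stimulated and non-stimulated measurements as rows keyed by experiment, rather than as one file per run, is what makes queries across experiments and projects straightforward.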
References
[1] Heskes, T., Eisinga, R., & Breitling, R. (2014). A fast algorithm for determining bounds and accurate approximate p-values of the rank product statistic for replicate experiments. BMC Bioinformatics, 15(1).
[2] Lievens, S., Peelman, F., De Bosscher, K., Lemmens, I., & Tavernier, J. (2011). MAPPIT: A protein interaction toolbox built on insights in cytokine receptor signaling. Cytokine and Growth Factor Reviews, 22(5-6), 321–329.
[3] Lievens, S., Vanderroost, N., Van Der Heyden, J., Gesellchen, V., Vidal, M., & Tavernier, J. (2009). Array MAPPIT: High-throughput interactome analysis in mammalian cells, 877–886.
[4] Gopichandran, S., & Ranganathan, S. (2013). Protein-protein interactions and prediction: A comprehensive overview. Protein and Peptide Letters, 779–789.
[5] Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(3), 479–498.
http://www.compomics.com