Tutorial for Radiobiology Worker Risk Assessment Using Stress Indicators and Proteomics
Karim Mezhoud (1,2), Amina Sakly (3), Hassen Ben Cheikh (3), Mouldi Saïdi (1), and Marc Edery (2)
(1) UR04CNSTN01 Medical and Agricultural Applications of Nuclear Techniques, National Center for Nuclear Sciences and Technology (CNSTN), Sidi Thabet Biotechpole, 2020 Sidi Thabet, Tunisia.
(2) UMR 7245 CNRS-MNHN Molécules de Communication et Adaptation des Micro-organismes, Eq. Cyanobactéries, Cyanotoxines et Environnement. Muséum National d'Histoire Naturelle, 12 rue Buffon, F-75231 Paris cedex 05, France.
(3) 02UR08-03 Histology and Cytogenetics Laboratory, Faculty of Medicine, 5019 Monastir, Tunisia.
This supplementary document is a guideline for statistical analysis using R. The approach applies not only to proteomics but also to other comparative data in experimental biology, and all steps and command lines can easily be adapted to your own experiment. We describe, step by step, our statistical comparison and bioinformatic analysis of chromosome aberration, oxidative stress, mass spectrometry and ELISA data. Finally, the liquid chromatography and mass spectrometry analyses are described in detail.
Lymphocyte chromosome aberration
Chromosome aberration types were saved in a text file named Abbwil.txt:
Fig. 1: Abbwil.txt file
The following R commands were then run:
cd ~path/Abbwil.txt R Abbwil quantile(C2:C5,probs=15.87/100)
• Z score
If Log2(Median of protein) < M : Z = [M – Log2(Median of protein)]/SDlower
If Log2(Median of protein) > M : Z = [Log2(Median of protein) – M]/SDupper
where M is the median over all proteins.
• Significance = 1 – LOI.NORMALE.STANDARD(Z-score), where LOI.NORMALE.STANDARD is the French-locale spreadsheet name of the standard normal cumulative distribution function (NORMSDIST in English-locale spreadsheets).
• The Benjamini-Hochberg correction is a way of controlling the false discovery rate (FDR) (Choi and Nesvizhskii, 2008).
◦ Rank the significance values (p-values) so that the lowest p-value has rank 1 and the highest has rank n, where n is the number of p-values
◦ Do not correct the highest p-value
◦ For the others: corrected p-value = p-value * n/rank
• Biological significance
• We select proteins whose normalized ratio variance falls outside the red dashed line (Arntzen et al., 2010).
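The Z score, significance and Benjamini-Hochberg steps listed above can be sketched in R as follows. This is a minimal illustration on simulated log2 ratios, not the study data. It assumes that SDlower and SDupper are the distances from the median to the 15.87th and 84.13th percentiles (only the 15.87% quantile appears in the text; the 84.13% counterpart is an assumption), and the variable names and the 0.05 threshold are hypothetical.

```r
# Simulated log2 protein ratios (hypothetical values, not data from the study):
# 200 unchanged proteins plus two clearly regulated ones at positions 201 and 202.
set.seed(1)
log2_ratio <- c(rnorm(200, mean = 0, sd = 0.3), 1.8, -1.6)

M <- median(log2_ratio)  # median over all proteins

# Assumption: SDlower / SDupper are the distances from the median to the
# 15.87th and 84.13th percentiles (one standard deviation under a normal law).
sd_lower <- M - unname(quantile(log2_ratio, probs = 15.87 / 100))
sd_upper <- unname(quantile(log2_ratio, probs = 84.13 / 100)) - M

# Asymmetric Z score: lower tail scaled by SDlower, upper tail by SDupper.
z <- ifelse(log2_ratio < M,
            (M - log2_ratio) / sd_lower,
            (log2_ratio - M) / sd_upper)

# Significance = 1 - standard normal CDF of Z; pnorm() is the R counterpart
# of the spreadsheet function LOI.NORMALE.STANDARD.
p_value <- 1 - pnorm(z)

# Benjamini-Hochberg as written in the list: rank 1 = smallest p-value,
# corrected p = p * n / rank (the largest p-value, rank n, is left unchanged).
n <- length(p_value)
rank_asc <- rank(p_value, ties.method = "first")
p_manual <- pmin(1, p_value * n / rank_asc)

# R's built-in version additionally enforces monotonicity of the corrected values.
p_bh <- p.adjust(p_value, method = "BH")

alpha <- 0.05  # illustrative threshold only
selected <- which(p_bh < alpha)
```

The hand-written correction and `p.adjust()` agree for the largest p-value and can differ slightly elsewhere, because `p.adjust()` never lets a corrected value exceed the corrected value of a larger p-value.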
We can then ask two questions: What is the probability to obtain a ratio higher than SDupper or lower than the SDlower with FDR=p-value< 0.02 (ORBIT), p-valuedata data[ ]
#view table
>names(data)   # verify the column names
>par(mfrow=c(1,3))   # 1 row, 3 columns: three graphs side by side
>boxplot(ACTBdata~ACTBgroup, data=data, main='ACTB', ylab='1/10000 mg/mL')
>boxplot(Gcdata~Gcgroup, data=data, main='GC', ylab='1/40000 mg/mL')
>boxplot(X1433data~X1433group, data=data, main='YWHAZ', ylab='mg/mL')
Fig. 2: Box plots of the ACTB, GC and YWHAZ proteins measured by ELISA assays. The middle line in each box is the median, and the bottom and top edges of the box correspond to the 25th and 75th percentiles of the distribution, respectively. The vertical lines (whiskers) and dots indicate data points at the extremes of the distribution.
To add the Wilcoxon test results as a legend in each boxplot, save the ELISA data as follows and name the file elisatest.txt:
and use these R commands to obtain the Wilcoxon test results:
>elisa names(elisa) # verify the columns
>wilcox.test(elisa$Cactb, elisa$Eactb)   # standard (unpaired) Wilcoxon test
output :
Wilcoxon rank sum test with continuity correction
data: elisa$Cactb and elisa$Eactb
W = 441, p-value = 0.001657
>wilcox.test(elisa$Cgc, elisa$Egc)
output :
Wilcoxon rank sum test with continuity correction
data: elisa$Cgc and elisa$Egc
W = 517, p-value = 2.446e-06
>wilcox.test(elisa$C1433, elisa$E1433)
output :
Wilcoxon rank sum test with continuity correction
data: elisa$C1433 and elisa$E1433
W = 428.5, p-value = 0.003842
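Rather than copying W and the p-value by hand from the printed output, they can be read from the object returned by wilcox.test(). The sketch below uses simulated values, not the study data; the data frame `elisa_demo` is hypothetical, and reading the column prefixes C and E as control and exposed is an assumption.

```r
# Simulated ELISA concentrations (hypothetical values, not the study data).
# Assumption: columns prefixed "C" hold controls, columns prefixed "E" exposed workers.
set.seed(42)
elisa_demo <- data.frame(
  Cactb = rnorm(25, mean = 10, sd = 2),
  Eactb = rnorm(25, mean = 13, sd = 2)
)

# Unpaired Wilcoxon rank sum test, as in the commands above.
res <- wilcox.test(elisa_demo$Cactb, elisa_demo$Eactb)

# The result is a list of class "htest"; extract the statistic and the p-value.
W <- unname(res$statistic)
p <- res$p.value
cat(sprintf("W = %g, p-value = %.4g\n", W, p))
```

Keeping `W` and `p` as variables makes it straightforward to reuse them later, for instance as legend text on a plot.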
You can display the Wilcoxon test results directly on the boxplots, as follows:
>library("RSvgDevice")
>devSVG("boxplot.svg", width=6.5, height=4.8)   # open the SVG output file
>par(mfrow=c(1,3))
>boxplot(ACTBdata~ACTBgroup, data=data, boxwex=0.3, main='ACTB', ylab='1/10000 ng/mL')
>legend("topright", cex=0.9, title="wilcoxon test", legend="W=441, pvalue=0.0016")
>boxplot(Gcdata~Gcgroup, data=data, boxwex=0.3, main='GC', ylab='1/40000 ng/mL')
>legend("topright", cex=0.9, title="wilcoxon test", legend="W=517, pvalue=2.44e-06")
>boxplot(X1433data~X1433group, data=data, boxwex=0.3, main='YWHAZ', ylab='ng/mL')
>legend("topright", cex=0.9, title="wilcoxon test", legend="W=428.5, pvalue=0.0038")
>dev.off()   # close the device so that boxplot.svg is written
Fig. 3: The box plots of Fig. 2 with legends.
You can adjust the size and the resolution of the output file (boxplot.svg) using the Inkscape vector graphics program (http://inkscape.org/).
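As a variation on the commands above, the legend text can be built from the test result itself so that the figure always matches the data. This is only a sketch on simulated values. The helper `wilcox_label()` and the data frame `demo` are hypothetical, and base R's svg() device is used here in place of RSvgDevice as an assumption (it needs an R build with cairo support, so the code falls back to pdf() otherwise).

```r
# Simulated long-format data (hypothetical values; column names mirror the tutorial).
set.seed(7)
demo <- data.frame(
  ACTBdata  = c(rnorm(20, mean = 10, sd = 2), rnorm(20, mean = 13, sd = 2)),
  ACTBgroup = factor(rep(c("control", "exposed"), each = 20))
)

# Hypothetical helper: run the unpaired Wilcoxon test and format the legend text.
wilcox_label <- function(values, groups) {
  res <- wilcox.test(values ~ groups)
  sprintf("W=%g, pvalue=%.3g", unname(res$statistic), res$p.value)
}
label <- wilcox_label(demo$ACTBdata, demo$ACTBgroup)

# Open an SVG device when cairo is available, otherwise fall back to PDF.
use_svg <- isTRUE(capabilities("cairo"))
out_file <- file.path(tempdir(),
                      if (use_svg) "boxplot_demo.svg" else "boxplot_demo.pdf")
if (use_svg) {
  svg(out_file, width = 6.5, height = 4.8)
} else {
  pdf(out_file, width = 6.5, height = 4.8)
}
boxplot(ACTBdata ~ ACTBgroup, data = demo, boxwex = 0.3, main = "ACTB")
legend("topright", legend = label, title = "wilcoxon test", cex = 0.9)
dev.off()  # close the device so the file is written
```

The same helper can be called once per protein before each boxplot() call when several panels are drawn with par(mfrow=c(1,3)).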
Liquid chromatography and Mass spectrometry
Buffer Exchange and protein content estimation
Using 2000 MWCO Hydrosart Vivaspin 2 spin concentrators (Sartorius Stedim Biotech, Germany), the prefractionated plasma samples were concentrated and buffer-exchanged, by subjecting them to repeated (three times) centrifugation with an appropriate 0.5 M triethylammonium bicarbonate (TEAB) pH 8.5 buffer (Sigma-Aldrich® Corporation, Saint Louis, MO, USA) for downstream analysis. The protein concentrations of the whole and prefractionated plasma samples were estimated using a Qubit kit (Invitrogen®).
Protein Digestion and Peptide labeling
100 µg of protein from the six samples were reduced for 1 hour at 60°C with 5 mM tris-(2-carboxyethyl) phosphine (TCEP), and were then cysteine-blocked with 10 mM methyl methanethiosulfonate (MMTS) at room temperature for 10 minutes. The proteins were then digested for 36 hours at 37°C by TPCK-treated trypsin with CaCl2 (Applied Biosystems, Foster City, CA, USA). Each peptide solution was labeled for 2 hours, according to the iTRAQ Reagents Multiplex Kit protocol (Applied Biosystems®). The peptides from the radiology workers were labeled with the 114, 115 and 116 mass-tagged iTRAQ reagents, and the peptides from unexposed staff were labeled with the 117, 118 and 119 mass-tagged iTRAQ reagents. Labeled samples were then pooled and dried in a vacuum concentrator (Eppendorf®). All steps from Iso-Electro-Focalization off-gel fractionation to protein identification and Peak Area Ratio normalization were done by Dr. François Guillonneau and Dr. Marjorie Leduc at the 3P5 proteomics facility, Université Paris Descartes, Sorbonne Paris Cité, 22 rue Méchain - 75014 Paris – France.
Iso Electro Focalization (IEF) Off-gel fractionation
Any excess hydrolyzed iTRAQ reagent was eliminated using an SCX column (ABSciex). Briefly: the SCX column was prewashed with 2 mL of cleaning buffer (25% Acetonitrile (Carlo Erba), 10 mM KH2PO4 (Carlo Erba), 1 M KCl (SDS), pH 3), and equilibrated with 2 mL of loading buffer (25% acetonitrile, 10 mM KH2PO4, pH 3). The sample was re-suspended in 1 mL of loading buffer acidified to pH 3 with 20 µL of KH2PO4 10%, percolated on the column, and then washed with 2 mL of loading buffer. The retained peptides were eluted with 500 µL of elution buffer (25% Acetonitrile, 10 mM KH2PO4, 350 mM KCl, pH 3). Eluted peptides were desalted using a Sep-Pak C18 column (Waters®). Briefly, the C18 column was activated with 3 mL of 90% Acetonitrile, 0.1% TFA (Trifluoroacetic acid, Fluka) and equilibrated with 3 mL of 0.1% TFA. Peptides were percolated on the column, the retained fraction was washed with 2 mL of 0.1% TFA, and eluted with 1 mL of 70% Acetonitrile, 0.1% TFA then dried in a vacuum concentrator (Eppendorf®). Peptides were prepared for an Off-gel isoelectrofocalization on a 12-cm strip pH 3-10, as indicated in the Agilent 3100 Off-Gel fractionator kit-quick start guide. After focusing, each fraction was collected. To extract the peptides trapped in a gel strip, 200 µL of 50% methanol 1% formic acid were added to each tank of the frame, and incubated for 30 min. Methanol-extracted peptides were pooled with their respective fractions, and then dried in a vacuum concentrator.
Reverse-phase, nano liquid chromatography fractionation
Each dried IEF off-gel peptide fraction was re-dissolved in 0.1% TFA (Fluka®) and 10% acetonitrile (Carlo Erba®), and a measured 2-µg portion was injected into an Ultimate 3000 nano-HPLC (Dionex®). Peptides were purified and first concentrated on a C18 PepMap pre-column from Dionex (0.3 mm I.D. × 5 mm, 100 Å pore size, 5 µm particle size) at a flow rate of 30 µL/min in 0.1% TFA and 2% acetonitrile. Subsequently, the peptides were separated on a C18 PepMap100 analytical reverse phase column from Dionex (75 µm I.D. × 150 mm, 100 Å pore size, 3 µm particle size) at a 300 nL/min solvent flow rate (solution A: 0.1% TFA, 2% acetonitrile; solution B: 20% solution A mixed vol/vol with 80% acetonitrile). After equilibrating for 6 min in 7% B, a multi-slope gradient was applied as follows: 16% B at 10 min post injection, 32% at 55 min, 50% at 78 min, and a 95% plateau for 81 min before equilibrating for 10 min. Fractionation was done for each fraction using the Probot automated fraction collector (Dionex). Fraction collection started after a 17 min dead volume delay following the injection signal, and was performed directly on a MALDI target (blank plate, ABSciex®). Fractions were collected every 10 s for a total of 384 spots per fraction. Eluent and matrix solutions were mixed on-target. An α-cyano-4-hydroxycinnamic acid (CHCA, Laserbio Labs, Sophia, France) matrix was dissolved at 2.5 mg/mL in 70% acetonitrile containing 0.1% TFA, and 1 µM glu-fibrinopeptide-B (Sigma) for internal calibration (m/z=1570.677).
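The spotting schedule stated above implies a collection window that can be checked with a few lines of arithmetic. The sketch assumes that spots are deposited back to back with no pause between them; the variable names are hypothetical.

```r
# Values taken from the text above.
spots_per_fraction <- 384   # spots deposited per IEF fraction
seconds_per_spot   <- 10    # one spot every 10 s
delay_min          <- 17    # dead volume delay after the injection signal

# Assumption: spots follow each other without interruption.
collection_min <- spots_per_fraction * seconds_per_spot / 60  # duration of spotting
end_min        <- delay_min + collection_min                  # end time after injection

cat(sprintf("Spotting lasts %g min and ends %g min after injection\n",
            collection_min, end_min))
```

Under this assumption spotting takes 64 min and ends 81 min after the injection signal.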
Offline mass spectrometry analysis
MS spectrum acquisition
Mass spectra were measured with a 4800 MALDI-TOF-TOF mass spectrometer (ABSciex®) equipped with an Nd:YAG pulsed laser (355 nm wavelength with a 95%, False Discovery Rate) unique peptides had been assigned, the protein identification had to have a 95% confidence interval (unused protein score>1.3).
Peak Area Ratio normalization
iTRAQ reporter ion abundances in MS/MS scans were evaluated using Protein Pilot v4.0 software (ABI) for the relative quantitation of proteins. For each peptide used for protein identification, the areas under the reporter ion peaks at m/z 114-119 were calculated. To correct for experimental errors in the amount of protein included in the different sample groups, a bias correction was performed on the Pro Group Algorithm results. This was done by calculating the median peptide ratio for all peptides reported, adjusting it to unity, and then applying the same bias factor to all ratios (performed by Protein Pilot®). This normalizing factor is based on the assumption that the levels of most of the proteins in the plasma from exposed individuals should be similar to those from unexposed individuals, with the exception of those that are influenced by working conditions that constitute a radiation risk.
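The idea behind the bias correction can be illustrated with a few lines of R. This is a sketch of the principle only (divide every ratio by the median ratio so that the median becomes 1), not the Protein Pilot implementation, and the peptide ratios below are invented for the example.

```r
# Hypothetical peptide ratios for one reporter pair (invented values).
ratios <- c(0.8, 1.1, 1.3, 1.2, 2.6, 1.25, 1.15)

# Bias factor = median ratio over all peptides.
bias_factor <- median(ratios)

# Apply the same factor to every ratio so that the median is adjusted to unity.
corrected <- ratios / bias_factor

median(corrected)
```

After the correction the median ratio equals 1, so peptides whose corrected ratio is far from 1 stand out against the unchanged majority.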
References for supplemental tutorial
Arntzen, M. O., Koehler, C. J., Barsnes, H., Berven, F. S., Treumann, A. and Thiede, B. (2010) 'IsobariQ: software for isobaric quantitative proteomics using IPTL, iTRAQ, and TMT' Journal of Proteome Research, Vol. 10 No. 2, PP 913-920.
Benjamini, Y., Drai, D., Elmer, G., Kafkafi, N. and Golani, I. (2001) 'Controlling the false discovery rate in behavior genetics research' Behav Brain Res, Vol. 125 No. 1-2, PP 279-284.
Cerami, E., Demir, E., Schultz, N., Taylor, B. S. and Sander, C. (2010) 'Automated network analysis identifies core pathways in glioblastoma' PLoS One, Vol. 5 No. 2, PP e8918.
Cerami, E., Gao, J., Dogrusoz, U., Gross, B. E., Sumer, S. O., Aksoy, B. A., Jacobsen, A., Byrne, C. J., Heuer, M. L., Larsson, E., Antipin, Y., Reva, B., Goldberg, A. P., Sander, C. and Schultz, N. (2012) 'The cBio Cancer Genomics Portal: An Open Platform for Exploring Multidimensional Cancer Genomics Data' Cancer Discov, Vol. 2 No. 5, PP 401-404.
Cerami, E. G., Gross, B. E., Demir, E., Rodchenkov, I., Babur, O., Anwar, N., Schultz, N., Bader, G. D. and Sander, C. (2010) 'Pathway Commons, a web resource for biological pathway data' Nucleic Acids Research, Vol. 39 No. Database issue, PP D685-690.
Choi, H. and Nesvizhskii, A. I. (2008) 'False discovery rates and related statistical concepts in mass spectrometry-based proteomics' Journal of Proteome Research, Vol. 7 No. 1, PP 47-50.
Gentleman, R. C., Carey, V. J., Bates, D. M., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B., Gautier, L., Ge, Y., Gentry, J., Hornik, K., Hothorn, T., Huber, W., Iacus, S., Irizarry, R., Leisch, F., Li, C., Maechler, M., Rossini, A. J., Sawitzki, G., Smith, C., Smyth, G., Tierney, L., Yang, J. Y. and Zhang, J. (2004) 'Bioconductor: open software development for computational biology and bioinformatics' Genome Biology, Vol. 5 No. 10, PP R80.
Graves, S., Piepho, H.-P., Selzer, L. and Dorai-Raj, S. (2011) 'MultcompView: Visualizations of Paired Comparisons.' [online] http://cran.r-project.org/web/packages/multcompView/index.html, Rproject, cran
Huber, W., Von Heydebreck, A., Sultmann, H., Poustka, A. and Vingron, M. (2002) 'Variance stabilization applied to microarray data calibration and to the quantification of differential expression' Bioinformatics, Vol. 18 Suppl 1, PP S96-104.
Joshi-Tope, G., Gillespie, M., Vastrik, I., D'eustachio, P., Schmidt, E., De Bono, B., Jassal, B., Gopinath, G. R., Wu, G. R., Matthews, L., Lewis, S., Birney, E. and Stein, L. (2005) 'Reactome: a knowledgebase of biological pathways' Nucleic Acids Research, Vol. 33 No. Database issue, PP D428-432.
Karp, N. A., Huber, W., Sadowski, P. G., Charles, P. D., Hester, S. V. and Lilley, K. S. (2010) 'Addressing accuracy and precision issues in iTRAQ quantitation' Molecular & Cellular Proteomics, Vol. 9 No. 9, PP 1885-1897.
Kelder, T., Pico, A. R., Hanspers, K., Van Iersel, M. P., Evelo, C. and Conklin, B. R. (2009) 'Mining biological pathways using WikiPathways web services' PLoS One, Vol. 4 No. 7, PP e6447.
Matthews, L., Gopinath, G., Gillespie, M., Caudy, M., Croft, D., De Bono, B., Garapati, P., Hemish, J., Hermjakob, H., Jassal, B., Kanapin, A., Lewis, S., Mahajan, S., May, B., Schmidt, E., Vastrik, I., Wu, G., Birney, E., Stein, L. and D'eustachio, P. (2009) 'Reactome knowledgebase of human biological pathways and processes' Nucleic Acids Research, Vol. 37 No. Database issue, PP D619-622.
Ogata, H., Goto, S., Sato, K., Fujibuchi, W., Bono, H. and Kanehisa, M. (1999) 'KEGG: Kyoto Encyclopedia of Genes and Genomes' Nucleic Acids Research, Vol. 27 No. 1, PP 29-34.
Orchard, S., Kerrien, S., Jones, P., Ceol, A., Chatr-Aryamontri, A., Salwinski, L., Nerothin, J. and Hermjakob, H. (2007) 'Submit your interaction data the IMEx way: a step by step guide to trouble-free deposition' Proteomics, Vol. 7 Suppl 1, PP 28-34.
Peri, S., Navarro, J. D., Amanchy, R., Kristiansen, T. Z., Jonnalagadda, C. K., Surendranath, V., Niranjan, V., Muthusamy, B., Gandhi, T. K., Gronborg, M., Ibarrola, N., Deshpande, N., Shanker, K., Shivashankar, H. N., Rashmi, B. P., Ramya, M. A., Zhao, Z., Chandrika, K. N., Padma, N., Harsha, H. C., Yatish, A. J., Kavitha, M. P., Menezes, M., Choudhury, D. R., Suresh, S., Ghosh, N., Saravana, R., Chandran, S., Krishna, S., Joy, M., Anand, S. K., Madavan, V., Joseph, A., Wong, G. W., Schiemann, W. P., Constantinescu, S. N., Huang, L., Khosravi-Far, R., Steen, H., Tewari, M., Ghaffari, S., Blobe, G. C., Dang, C. V., Garcia, J. G., Pevsner, J., Jensen, O. N., Roepstorff, P., Deshpande, K. S., Chinnaiyan, A. M., Hamosh, A., Chakravarti, A. and Pandey, A. (2003) 'Development of human protein reference database as an initial platform for approaching systems biology in humans' Genome Research, Vol. 13 No. 10, PP 2363-2371.
Pico, A. R., Kelder, T., Van Iersel, M. P., Hanspers, K., Conklin, B. R. and Evelo, C. (2008) 'WikiPathways: pathway editing for the people' PLoS Biol, Vol. 6 No. 7, PP e184.
Prasad, T. S., Kandasamy, K. and Pandey, A. (2009) 'Human Protein Reference Database and Human Proteinpedia as discovery tools for systems biology' Methods in Molecular Biology, Vol. 577, PP 67-79.
Schaefer, C. F., Anthony, K., Krupa, S., Buchoff, J., Day, M., Hannay, T. and Buetow, K. H. (2009) 'PID: the Pathway Interaction Database' Nucleic Acids Research, Vol. 37 No. Database issue, PP D674-679.
Shannon, P., Markiel, A., Ozier, O., Baliga, N. S., Wang, J. T., Ramage, D., Amin, N., Schwikowski, B. and Ideker, T. (2003) 'Cytoscape: a software environment for integrated models of biomolecular interaction networks' Genome Research, Vol. 13 No. 11, PP 2498-2504.
Tcgc (2009) The Cancer genome Atlas. http://cancergenome.nih.gov/.
Team, R. C. (2011) 'R: A Language and Environment for Statistical Computing.' [online] http://www.Rproject.org
Wu, G., Feng, X. and Stein, L. (2010) 'A human functional protein interaction network and its application to cancer data analysis' Genome Biology, Vol. 11 No. 5, PP R53.