Statistical evaluation of normalization methods for NanoString ...
Statistical evaluation of normalization methods for NanoString nCounter data
Elika Garg (1), Ivan Topisirovic (1, 2), and Robert Nadon (1, 3)
(1) McGill University, (2) Lady Davis Institute for Medical Research, (3) McGill University and Genome Quebec Innovation Centre
INTRODUCTION

NanoString is a novel medium-throughput technology that is becoming widely accepted in the biomedical community for measuring gene expression. The count data generated by its nCounter machine are sensitive to pre-processing methods, however, and there is as yet no consensus on how best to normalize them.

The variance must be stabilized, and the candidate normalization methods evaluated, before any dataset is analyzed. A statistical evaluation of popular normalization methods is discussed and applied to four different datasets. As the statistical tests require, the variance within each dataset is stabilized before evaluation.

We illustrate how competing normalization methods can be evaluated for NanoString nCounter mRNA datasets.

Two evaluation strategies, each with a first elimination step and a second selection step, are presented. The underlying logic of both strategies is that agreement between measurements within the same condition should be high, under the null hypothesis of no differences, and should be low when measurements from different conditions are compared.

METHODS

Strategy-I
• Samples required: ≥ 2 replicates/condition (in pairs)
• Examined entity: agreement, based on linear regression of pairs (can be coupled with regression diagnostics)
• Assessment tools: scatter plots, concordance correlation coefficients (CCCs)

Strategy-II
• Samples required: ≥ 4 replicates/condition (in groups)
• Examined entity: p-value distribution, generated from statistical tests of mean difference between groups (e.g. t-test)
• Assessment tools: histograms, QQ-plots

Step-1 compares samples from the same condition, expects similar values, and eliminates poor performers. Step-2 compares samples from opposite conditions, expects more differences, and selects the best performers.

[Figure: example scatter plots of Replicate 1 against Replicate 2 (concordance correlation coefficients of 0.99 and 0.82), with p-value histograms (frequency against p-value) and QQ-plots (sample quantiles against the uniform distribution).]
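The agreement measure used in Strategy-I can be sketched in a few lines. The poster does not say which definition of the concordance correlation coefficient it uses, so this sketch assumes Lin's CCC with population (1/n) moments; the function name `concordance_cc` and the replicate values are hypothetical and serve only as an illustration.

```python
from statistics import fmean


def concordance_cc(x, y):
    """Lin's concordance correlation coefficient for two equal-length sequences.

    Assumption: this is the CCC meant by the poster; it is not stated there.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = fmean(x), fmean(y)
    # Population (1/n) variances and covariance, as in Lin's definition.
    vx = fmean((a - mx) ** 2 for a in x)
    vy = fmean((b - my) ** 2 for b in y)
    cov = fmean((a - mx) * (b - my) for a, b in zip(x, y))
    # Penalizes both scatter around the line and any shift away from y = x.
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)


# Hypothetical normalized, variance-stabilized values for six genes.
rep1 = [5.1, 6.3, 7.8, 9.2, 10.5, 12.0]
rep2 = [5.0, 6.5, 7.6, 9.4, 10.4, 12.1]   # replicate from the same condition
other = [6.9, 6.0, 9.9, 9.0, 13.1, 12.2]  # sample from the opposite condition

ccc_same = concordance_cc(rep1, rep2)
ccc_opposite = concordance_cc(rep1, other)
print(f"same condition: {ccc_same:.3f}, opposite condition: {ccc_opposite:.3f}")
```

Under the logic of the two steps, a normalization method would be expected to give a high CCC for the same-condition pair and a lower CCC for the opposite-condition pair.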
DATA

• The primary dataset was generated as part of a study examining the role of estrogen receptor alpha (ER) in prostate cancer. Mouse cells with and without ER were grown in culture, then polysomal and total mRNA were extracted from each through polysomal fractionation.
• The other datasets were obtained from peer-reviewed journals.

Table 1: Description of datasets with respect to number of replicates, controls, and genes.
Columns: Datasets; Number of conditions used; Number of replicates (Biological, Technical); Number of controls (Positive, Negative); Number of genes (Reference, Endogenous).
Cell values as extracted (column alignment not recoverable): 11 189 9 500 2 65 8 558 | 1 4 3 6 8 2 2 3 6 2 3 2 3 2 6 8 4 2 6 6 8

RESULTS

Strategy-I
• Datasets: Data-1, Data-2, Data-3, Data-4
• Plots: boxplots of CCCs

Strategy-II
• Datasets: Data-4
• Plots: histograms and QQ-plots

Step-1 (elimination) criteria: low CCCs for Strategy-I; deviation from uniformity for Strategy-II.

[Figure: Step-1 results by normalization method: boxplots of CCCs, p-value histograms (frequency against p-value), and QQ-plots (observed quantiles against the uniform distribution).]
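Deviation from uniformity in Strategy-II is judged on the poster from histograms and QQ-plots. As a rough numeric companion, the sketch below computes the Kolmogorov-Smirnov distance between a set of p-values and the Uniform(0, 1) distribution, plus the point pairs for a QQ-plot. The choice of the KS distance, the function names, and the p-values are assumptions made for illustration; the p-values are taken as already computed by per-gene tests such as t-tests.

```python
def ks_uniform_statistic(p_values):
    """Kolmogorov-Smirnov distance between p-values and the Uniform(0, 1) CDF."""
    p = sorted(p_values)
    n = len(p)
    if n == 0:
        raise ValueError("need at least one p-value")
    # Largest gap between the empirical CDF and F(p) = p, checked just after
    # and just before each observed p-value.
    return max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(p))


def qq_points(p_values):
    """(expected uniform quantile, observed quantile) pairs for a QQ-plot."""
    p = sorted(p_values)
    n = len(p)
    return [((i + 0.5) / n, v) for i, v in enumerate(p)]


# Hypothetical p-values from per-gene tests between two groups.
same_condition = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
opposite_condition = [0.001, 0.004, 0.01, 0.02, 0.03, 0.05, 0.2, 0.4, 0.6, 0.9]

d_null = ks_uniform_statistic(same_condition)        # close to uniform
d_signal = ks_uniform_statistic(opposite_condition)  # piled up near zero
print(d_null, d_signal)
```

Read against the two steps: a method whose same-condition p-values drift far from uniform would be a candidate for elimination, while among the survivors a larger deviation for opposite-condition comparisons would count in a method's favor.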
Step-2 (selection) criteria: lowest CCCs for Strategy-I; highest deviation from uniformity for Strategy-II.

PRE-PROCESSING

• Normalization methods were pre-selected based on their popularity in high-impact journals.
• They were applied to all datasets through an R package called NanoStringNorm.

Method           Positive control    Negative control    Reference genes
1 M-PosRef       Sum                 -                   Geometric mean
2 M-Pos          Geometric mean      -                   -
3 M-Ref          -                   -                   Geometric mean
4 M-PosNegRef    Sum                 Mean + 2sd          Geometric mean
5 M-NegRef       -                   Mean + 2sd          Geometric mean

Table 2: Description of five pre-selected normalization methods.
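The summaries named in Table 2 (sum, geometric mean, mean + 2sd) can be illustrated with a small sketch. It is not a reproduction of NanoStringNorm: the function names, the scaling scheme (each sample rescaled so that its control summary matches the cross-sample average), the use of the sample standard deviation, the order of operations, and all counts are assumptions made for illustration.

```python
from statistics import fmean, geometric_mean, stdev


def scaling_factors(control_counts, summary):
    """Per-sample factors that bring each sample's control summary to the cross-sample mean.

    `control_counts` holds one list of control counts per sample; `summary` is
    a function such as `sum` or `geometric_mean`.
    """
    per_sample = [summary(counts) for counts in control_counts]
    target = fmean(per_sample)
    return [target / s for s in per_sample]


def background_level(negative_counts):
    """Mean + 2 SD of one sample's negative-control counts (sample SD assumed)."""
    return fmean(negative_counts) + 2 * stdev(negative_counts)


def normalize_sample(endogenous, pos_factor=1.0, background=0.0, ref_factor=1.0):
    """Scale by positive controls, subtract background (floored at 0), scale by reference genes."""
    scaled = [c * pos_factor for c in endogenous]
    corrected = [max(c - background, 0.0) for c in scaled]
    return [c * ref_factor for c in corrected]


# Hypothetical positive-control counts for two samples.
positive = [[100, 400, 1600], [120, 480, 1900]]
pos_sum_factors = scaling_factors(positive, sum)            # "Sum" summary
pos_geo_factors = scaling_factors(positive, geometric_mean)  # "Geometric mean" summary

# Hypothetical negative-control counts for one sample.
background = background_level([4, 6, 8])

normalized = normalize_sample([50, 5, 200], pos_factor=2.0, background=20.0, ref_factor=0.5)
print(pos_sum_factors, pos_geo_factors, background, normalized)
```

A method such as M-PosNegRef would combine all three adjustments, whereas M-Pos would use only the positive-control factor and leave the other two arguments at their defaults.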
• Variance stabilization methods were selected based on their ability to achieve mean-SD (standard deviation) independence.
• VSN transformation was applied to all datasets.

[Figure panels: No transformation, Log transformation, VSN transformation]
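Mean-SD independence can be checked by comparing per-gene standard deviations before and after a transformation. The sketch below does this for raw counts and for a log2(count + 1) transform, which stands in for the log transformation; VSN itself is not reproduced here. The function names, the spread measure (largest over smallest per-gene SD), and the counts, which carry purely multiplicative noise, are assumptions made for illustration.

```python
from math import log2
from statistics import fmean, stdev


def mean_sd_pairs(counts_by_gene, transform=lambda c: c):
    """Per-gene (mean, SD) across replicates after applying `transform` to each count."""
    pairs = []
    for replicates in counts_by_gene:
        values = [transform(c) for c in replicates]
        pairs.append((fmean(values), stdev(values)))
    return pairs


def sd_spread(pairs):
    """Ratio of the largest to the smallest per-gene SD; near 1 when SD does not track the mean."""
    sds = [sd for _, sd in pairs]
    return max(sds) / min(sds)


# Hypothetical counts: three genes, three replicates each, noise proportional to the mean.
counts = [
    [45, 50, 55],
    [450, 500, 550],
    [4500, 5000, 5500],
]

raw_spread = sd_spread(mean_sd_pairs(counts))
log_spread = sd_spread(mean_sd_pairs(counts, transform=lambda c: log2(c + 1)))
print(f"raw SD spread: {raw_spread:.1f}, log2 SD spread: {log_spread:.2f}")
```

On the raw scale the SD grows in step with the mean, while on the log scale it is nearly constant across genes, which is the behavior a variance-stabilizing transformation is meant to produce.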
CONCLUSION

• Variance stabilization of data is a necessary preprocessing step.
• Normalization by negative controls can be disadvantageous for some NanoString nCounter mRNA data.
• These evaluation strategies can help identify the best normalization method for any given dataset.
[Figure: Step-2 results for normalization methods 1 to 5 (M-PosRef, M-Pos, M-Ref, M-PosNegRef, M-NegRef): p-value histograms (frequency against p-value) and QQ-plots (observed quantiles against the uniform distribution).]
Correspondence | Affiliations | Acknowledgements
elika.garg@mail.mcgill.ca, robert.nadon@mcgill.ca
McGill Integrated Cancer Research Training Program