Parameter Estimation for the Exponential-Normal Convolution Model
Monnie McGee & Zhongxue Chen
Department of Statistical Science Southern Methodist University
Biostatistics, Bioinformatics and Biomathematics, Georgetown University, January 12, 2007 – p. 1/3
Important Messages
“Begin at the beginning. It’s a very fine place to start.” – Sound of Music
The Affymetrix Chip
Some Definitions
  Probes = 25 nt sequences
  Probe sets = 11 to 20 probes corresponding to a particular gene or EST
  Chip contains 54K probe sets
Human Genome U133 Plus 2.0 Array (image courtesy of Affymetrix)
Perfect Match vs. Mismatch
  PM probe = 25 nt probe perfectly complementary to a specific region of a gene
  MM probe = 25 nt probe identical to its PM probe except at the middle (13th) base, which is replaced by its complement (A ⇔ T, C ⇔ G)
Image Courtesy of Affymetrix
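The PM/MM relationship can be illustrated with a few lines of Python. This is only a sketch: the function name `mismatch_probe` and the example sequence are made up for illustration, and the only rule encoded is the middle-base swap (A ⇔ T, C ⇔ G) stated on the slide.

```python
# Base swap applied to the middle position, as listed on the slide.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(pm: str) -> str:
    """Return the MM probe for a 25 nt PM probe by swapping the 13th base."""
    if len(pm) != 25:
        raise ValueError("expected a 25 nt probe")
    mid = len(pm) // 2  # index 12, i.e. the 13th base
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]


pm = "ACGTACGTACGTACGTACGTACGTA"  # hypothetical sequence, not a real probe
mm = mismatch_probe(pm)
```

The two probes differ at exactly one position, which is what lets the MM intensity serve as a probe-specific measure of non-specific binding.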
Central Dogma of MA Analysis
Computing expression values for each probe set requires three steps:
  Background correction
  Normalization
  Summarization
Approaches:
  Microarray Analysis Suite 5.0 (MAS 5.0; Affymetrix, 2001, 2003)
  Model Based Expression Index (MBEI; Li and Wong, 2001a,b)
  Robust Multichip Analysis (RMA; Irizarry et al., 2003)
  GeneChip-RMA (Wu et al., 2004)
  Probe Logarithmic Intensity Error Estimation (PLIER; Affymetrix, 2004)
The RMA Approach
  Background correction under the exponential-normal convolution model.
  Normalization via quantile normalization.
  Summarization with median polish (Tukey, 1977).
Bioconductor allows the user to interchange methods at any step.
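The normalization and summarization steps named above can be sketched in plain Python. This is a minimal illustration of the two generic techniques, not the Bioconductor implementation: the function names, the tie handling (ties broken by position), and the fixed iteration count are assumptions of the sketch.

```python
from statistics import median


def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (one column per chip)."""
    n = len(columns[0])
    sorted_cols = [sorted(c) for c in columns]
    # Reference distribution: mean of the k-th smallest value across chips.
    ref = [sum(sc[k] for sc in sorted_cols) / len(columns) for k in range(n)]
    out = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # ties broken by position
        new = [0.0] * n
        for rank, i in enumerate(order):
            new[i] = ref[rank]
        out.append(new)
    return out


def median_polish(table, n_iter=10):
    """Fit table[i][j] ~ overall + row[i] + col[j] by sweeping out medians."""
    resid = [list(r) for r in table]
    nr, nc = len(resid), len(resid[0])
    overall, row, col = 0.0, [0.0] * nr, [0.0] * nc
    for _ in range(n_iter):
        # Sweep row medians out of the residuals.
        for i in range(nr):
            m = median(resid[i])
            resid[i] = [v - m for v in resid[i]]
            row[i] += m
        m = median(col)
        col = [c - m for c in col]
        overall += m
        # Sweep column medians out of the residuals.
        for j in range(nc):
            m = median(resid[i][j] for i in range(nr))
            for i in range(nr):
                resid[i][j] -= m
            col[j] += m
        m = median(row)
        row = [r - m for r in row]
        overall += m
    return overall, row, col, resid


normalized = quantile_normalize([[5, 2, 3], [4, 1, 6]])
overall, row, col, resid = median_polish([[2, 4, 6], [3, 5, 7], [4, 6, 8]])
```

If rows are taken to be the probes of one probe set and columns the chips, with intensities on a log scale, then `overall + col[j]` would play the role of the summarized expression value for chip `j`.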
Exp-Norm Convolution Model
The convolution model is given by
  X = S + Y,
where
  X = observed probe-level intensity
  S ∼ Exp(α) = true signal
  Y ∼ TN(µ, σ²) = background noise
The true signal can be estimated by
  $$E(S \mid X = x) = a + b \, \frac{\phi\left(\frac{a}{b}\right) - \phi\left(\frac{x-a}{b}\right)}{\Phi\left(\frac{a}{b}\right) + \Phi\left(\frac{x-a}{b}\right) - 1},$$
where a = x − µ − σ²α and b = σ.
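As a numerical check on the conditional expectation, the formula can be evaluated directly with the Python standard library. The function name `bg_adjust` and the parameter values below are illustrative assumptions, not estimates from real chips; α is treated as the rate of the exponential, consistent with a = x − µ − σ²α.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bg_adjust(x, mu, sigma, alpha):
    """E(S | X = x) under the exponential-normal convolution model."""
    a = x - mu - sigma ** 2 * alpha
    b = sigma
    num = phi(a / b) - phi((x - a) / b)
    den = Phi(a / b) + Phi((x - a) / b) - 1.0
    return a + b * num / den


# Hypothetical parameter values chosen only for illustration.
mu, sigma, alpha = 100.0, 20.0, 0.01
low = bg_adjust(50.0, mu, sigma, alpha)
mid = bg_adjust(200.0, mu, sigma, alpha)
high = bg_adjust(1000.0, mu, sigma, alpha)
```

For intensities far above the background the correction is close to a = x − µ − σ²α, while for intensities below µ the ratio term keeps the corrected value positive.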
CM for the Right–Brained ...
Parameter Estimation
Background-corrected intensity is E_ij = E(S_ij | X_ij), where i = 1, ..., G and j = 1, ..., J.
We need to estimate µ, σ, and α.
How does BioC estimate the parameters?
  µ = mode of the observations to the left of the overall mode
  σ = sample standard deviation of the observations to the left of the overall mode
  α = mode of the observations to the right of the overall mode
Shown to perform better than most other approaches (Hein et al., 2005).
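The three ad hoc estimates listed above can be sketched in Python as follows. This follows the slide's verbal description only and is not the Bioconductor code. Several choices are assumptions of the sketch: the Epanechnikov kernel with a rule-of-thumb bandwidth, the grid size, the names `density_mode` and `adhoc_parameters`, and the conversion of the right-hand mode into a rate by measuring it from µ and taking the reciprocal. The data are simulated, not real probe intensities.

```python
import random
import statistics


def density_mode(xs, n_grid=256):
    """Grid location where an Epanechnikov kernel density estimate peaks."""
    n = len(xs)
    # Rule-of-thumb bandwidth: an assumption of this sketch.
    h = 1.06 * statistics.stdev(xs) * n ** (-0.2)
    lo, hi = min(xs), max(xs)
    best_x, best_d = lo, -1.0
    for k in range(n_grid):
        g = lo + (hi - lo) * k / (n_grid - 1)
        # Unnormalized kernel sum; the constant does not move the peak.
        d = sum(1.0 - ((g - x) / h) ** 2 for x in xs if abs(g - x) < h)
        if d > best_d:
            best_x, best_d = g, d
    return best_x


def adhoc_parameters(pm):
    """Ad hoc estimates of mu, sigma, and alpha from PM intensities."""
    overall = density_mode(pm)
    left = [x for x in pm if x < overall]
    right = [x for x in pm if x > overall]
    mu = density_mode(left)
    sigma = statistics.stdev(left)
    # Assumption: measure the right-hand mode from mu and invert it to get a rate.
    alpha = 1.0 / (density_mode(right) - mu)
    return {"mu": mu, "sigma": sigma, "alpha": alpha}


# Simulated intensities: exponential signal plus normal background.
random.seed(1)
pm = [random.expovariate(0.01) + random.gauss(100.0, 20.0) for _ in range(1000)]
est = adhoc_parameters(pm)
```

Comparing `est` with the simulation settings (µ = 100, σ = 20, α = 0.01) gives a quick feel for how far mode-based estimates can drift from the true parameters.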
Code for Parameter Estimates
> bg.parameters
function (pm, n.pts = 2^14) {
    max.density