Simultaneous Confidence Intervals and Multiple Contrast Tests

Simultaneous Confidence Intervals and Multiple Contrast Tests

Edgar Brunner Abteilung Medizinische Statistik ¨ Gottingen ¨ Universitat

1

Contents •

•

•

Parametric Methods ⊲ Motivating Example ⊲ SCI Method ⊲ Analysis of the Example Nonparametric Methods ⊲ Motivating Example ⊲ SCI Method ⊲ Analysis of the Example ⊲ Paricular Difficulties References

2

I Parametric Methods • •

Motivating Example O2 -Consumption of Leucocytes ⊲ bars show min ⊢ · · · ⊣ max

O2-Consumption of Leucocytes

D2

n3=7

D1

n2=8

PL

n1=8 3,0

•

3,5 4,0 O2-Consumption [µl]

Question ⊲ Which dose is different from control? 3

Motivating Example Classical Analysis (1) ANOVA / H0 : µP = µ1 = µ2 (2) H0 rejected → multiple comparisons (FWEs = 0.05) (3) confidence intervals for µ1 − µP and µ2 − µP • must be compatible to the decisions of the MCP • i.e. confidence interval (CI) for µi − µP may not contain 0 ⇐⇒ H0 : µi − µP = 0 is rejected, i = 1, 2 • Statistical Methods / Procedures ⊲ ANOVA (F-test) ⊲ multiple comparisons using closure principle (CTP) ⊲ Bonferroni confidence intervals (1 − α = 0.975) • Results ⊲ global hypothesis: F = 2.53 p-value 0.1056 - (n.s.) ⊲ MCP • PL - D1: p = 0.1424 - (n.s.) / PL - D2: p = 0.0488 - (n.s.)

•

4

Motivating Example •

Shift of the D1 Data O2-Consumption of Leucocytes D2

n3=7

D1

n2=8

PL

n1=8 3,0

•


Results ⊲ global hypothesis: F = 4.06 p-value 0.0355 - (*) ⊲ MCP (CTP) • PL - D1: p = 0.0256 - (*) / PL - D2: p = 0.0488 - (*) 5

Conclusions from the Motivating Example •

•

Confidence Intervals (Bonferroni) ⊲ PL - D1: [−0.024, 0.557] - contains 0 / not compatible to CTP ⊲ PL - D2: [−0.063, 0.538] - contains 0 / not compatible to CTP Conclusions (undesirable properties ⊲ decision on effect PL - D2 depends on effect PL - D1 ⊲ confidence intervals are not compatible ⊲ dependency of the statistics X 1 − X P and X 2 − X P not used (wasting information) ⊲ different method needed

6

Different Method •

Idea ⊲ statistical model is • adapted and reduced • to the particular questions of the experimenter ⊲ take dependence of the statistics into account • statistics completely dependent → no α-adjusting necessary • independence is the ’worst case’ ⊲ example of O2 -consumption −1 1 0 .. • C= = (−12 .I2 ) and X· = (X P· , X 1· , X 2· )′ −1 0 1 µ1 − µP X 1· − X P· • desired contrasts CX· = , µδ = µ2 − µP X 2· − X P· • consider the distribution of CX· ∼ N(µ µδ , Σ ) −1 n 0 −1 2 1 • Σ=σ + n −1 P J2 = (si j )i, j=1,2 0 n2 7


Derivation of the Statistic ⊲ sii = σ2 · (ni + nP )/(ni · nP ), i = 1, 2 - diagonal elements of Σ ⊲ s bii : LS-estimator of sii replacing σ2 with the pooled estimator ni 1 2 • σ b2N = (X − X ) , N = n1 + n2 + nP ik i· ∑ ∑ N − 3 i=P,1,2 k=1 √ X − X 1· P· ⊲ studentize each component of CX· = with sbii X 2· − X P· ⊲ under H0 : µ δ = 0, consider the statistics (i = 1, 2) r ni nP . • Ti = bN ∼ N(0, 1), N → ∞, (X i· − X P· ) σ . ni + nP N/ni < N0 < ∞ ⊲ multivariate statistic . • T = (T1 , T2 )′ ∼ N(0, R), R: correlation matrix . 8


Derivation of the (1 − α )-Quantiles ⊲ same quantile z1−α,2,R for all components, such that Z

z1−α,2,R

Z

z1−α,2,R

−z1−α,2,R −z1−α,2,R

dN(0, R) = 1 − α

better approximation: mulitvariate t-distribution: t1−α,2,ν,Rb N b N : LS-estimator of R ⊲ R • replacing σ2 with σ b2N • diagonal elements = 1 • off-diagonal elements depend only on sample sizes and σ b2N • T ∼ multivariate t-distribution ⊲ references • original paper: Bretz, Genz and Hothorn (2001) • multivariate integration: Genz and Bretz (2009) • heteroscedastic case: Hasler and Hothorn (2008) in general: C may be any appropriate contrast matrix ⊲

•

9

SCI-Method / Quantiles

−4

• • • • •

−2

0

2

4

2 −4

−2

0

2 0 −2 −4

−4

−2

0

2

4

Korrelation = 0.99, Quantil= 2.0133

4

Korrelation = 0.5, Quantil= 2.2121

4

Korrelation = 0, Quantil= 2.2365

−4

−2

0

2

4

−4

−2

0

2

4

equi-coordinate quantiles of different bivariate normal distributions squares containing mass 1 − α of the bivariate normal distributions computation by means of R-package „mvtnorm” SAS-macro: to be developed or input of R-code in SAS/IML Studio 3.2 10

SCI-Method / Procedure •

Multiple Comparisons ⊲

(i)

reject H0 : δi = µi − µP = 0 if •

•

•

-

or |Ti | ≥ t1−α,2,ν,Rb N

Global Hypothesis ⊲ reject H0 : Cµ µ = µ δ = 0 if •

•

|Ti | ≥ z1−α,2,R

max{T1 , T2 } ≥ z1−α,2,R

-

or max{T1 , T2 } ≥ t1−α,2,ν,Rb N

Simultaneous Confidence Intervals r ! \ z1−α,2,Rb N ni + nP . ⊲ P δi ∈ X i· − X P· ± = 1−α . bN σ ni nP i∈I

Error Control? ⊲ FWEs (by Gabriel’s Theorem, 1969)

11

Example: Analysis by SCI-Method •

Original Data Set (O2 -Consumption of Leucocytes) O2-Consumption of Leucocytes D2

n3=7

D1

n2=8

PL

n1=8 3,0


SCI Classical PL - D1 t = 2.10 p-value 0.0965 - n.s. n.s. PL - D2 t = 2.18 p-value 0.0864 - n.s. n.s. 12


Shift of the D1 Data O2-Consumption of Leucocytes D2

n3=7

D1

n2=8

PL

n1=8 3,0


SCI Classical PL - D1 t = 2.53 p-value 0.0460 - (∗) (∗) PL - D2 t = 2.18 p-value 0.0864 - n.s. (∗) 13

Conclusions from the Analysis •

•

Confidence Intervals (D1 Shifted) ⊲ PL - D1: [0.0049, 0.5276] - does not contain 0 / compatible ⊲ PL - D2: [−0.0324, 0.5074] - contains 0 / compatible Conclusions ⊲ decision on effect PL - D2 does not depend on effect PL - D1 ⊲ confidence intervals are compatible ⊲ dependency of the statistics X 1 − X P and X 2 − X P is used

14

Extensions / Generalizations •

• •

• •

Factorial Designs ⊲ Biesheuvel and Hothorn (2002) / stratified samples ⊲ general case under research: diploma thesis Large Number of Dimensions bN may become singular (breakdown?) ⊲ Σ Repeated Measures ⊲ n ≥ d and n < d (breakdown?) ⊲ high-dimensional data / Froemke, Hothorn and Kropf (2008) ⊲ is there a limit distribution? Binomial Data ⊲ Schaarschmidt, Sill and Hothorn (2008) Nonparametric effects ⊲ non-normal data (Konietschke, 2009) ⊲ ordinal data: ordinal effect size measure (Ryu and Agresti, 2008) 15

II Nonparametric Methods • •

Motivating Example Toxicity Trial (60 Wistar Rats) ⊲ damage by an inhalable substance on the mucosa of the nose ⊲ 3 concentrations ( 2[ppm], 5[ppm], 10[ppm]) ⊲ score (0= b „no damage”,. . ., 3= b „severe damage”) Concentration

Number of Rats with Score 0 1 2 3 2 [ppm] 18 2 0 0 5 [ppm] 12 6 2 0 10 [ppm] 3 7 6 4

⊲

ordinal data

16

Motivating Example •

Classical Analysis Strategy ⊲ statistical model Xik ∼ Fi (x), i = 1, 2, 3; k = 1, . . . , 20 ⊲ hypotheses (1) • H 0 : F1 = F2 = F3 R (2) • H relative effect: p12 = F1 dF2 0 : F1 = F2 R (3) • H relative effect: p13 = F1 dF3 0 : F1 = F3 R (4) • H relative effect: p23 = F2 dF3 0 : F2 = F3 ⊲ relative effect pi j - interpretation R • pi j = Fi dFj = P(Xi1 < X j1 ) + 21 P(Xi1 = X j1 )

probability that the observations in group i tend to smaller values than in group j • ordinal data: effect size measure (Ryu and Agresti, 2008) R needed: confidence intervals for pi j = Fi dFj , i 6= j = 1, 2, 3 error control: FWEs •

⊲ ⊲

17

SCI-Method •

Hypotheses of Interest ⊲

•

•

(1)

H0 : p12 = 21 ,

(2)

H0 : p13 =

1 2

Estimators of the Relative Effects pi j R pb12 (i j) n j +1 1 b b ⊲ p b = bi j = Fi d Fj = ni R j· − 2 → p pb13 √ ⊲ asymptotic distribution of b − p) ∼ N(0, VN ) N(p ⊲ depends on unknown parameters (elements of VN ) ⊲ no pivotal quantity Statistics p ⊲ studentize each component (i, j) of p b by vb(i j) ⊲ v b(i j) : estimated variance of pbi j (diagonal elements of VN ) ⊲

(i j)

(i j)

(i)

( j)

estimation by means of ranks Rik , R jk , Rik , and R jk Reference: Brunner, Munzel und Puri (2002)

18

SCI-Method •

Asymptotic Distribution of the Statistics

• • •

(i j)

asymptotic distribution under H0 : pi j = 21 of q √ . vbi j ∼ Ti j = N · pbi j − 12 . N(0, 1) . ⊲ T = (T12 , T13 )′ ∼ N(0, R), R: correlation matrix . use the same procedure as in the parametric case error control: FWEs problem: confidence intervals may exceed the [0, 1]-interval ⊲

19

SCI-Method / Properties •

Problem ⊲ intervals are not range preserving

lower and upper bound of a 95% confidence interval (n = 10) Solution multivariate δ-method ⊲

•

20

Range Preserving Intervals •

• •

•

Procedure ⊲ continuous transformation of G( p bi j ) → (−∞, ∞) ⊲ G : (G1 , . . . , Gq ) : (0, 1)q → Rq • strictly monotone, i.e. G′ (pi j ) 6= 0 ℓ • differentiable, bijective, Gℓ ( 1 ) = 0, ℓ = 1, . . . , q 2 in the example: q = 2 asymptotic distribution of G: Cramer’s δ-Theorem ⊲ transformed estimators are also multivariat normal ⊲ elements v∗ of the covariance matrix of G ij • multivariate δ-Theorem: v∗ = [G′ (pi j )]2 · vi j ij

back transformation of the limits → [0, 1] - range preserving

21


Toxicity Trial (60 Wistar Rats) Concentration

Number of Rats with Score 0 1 2 3 2 [ppm] 18 2 0 0 5 [ppm] 12 6 2 0 10 [ppm] 3 7 6 4

•

Results (Probit) Comparison Effect Interval p-Value 2 vs. 5 0.66 0.5 ∈ / [0.501; 0.787] 0.049 2 vs. 10 0.90 0.5 ∈ / [0.753; 0.970] < 0.0001

22

Nonparametric Methods / Difficulties •

•

•

Non-Transitivity ⊲ pairwise relative effects are not transitive ⊲ e.g.: p1 < p2 < p3 < p1 ⊲ counter-example: Efron’s paradox dice (Rump, 2001) • Brown and Hettmansperger (2002) - one-way layout • Thangavelu and Brunner (2007) - stratified Wilcoxon tests New Definition of Relative Effects for a > 2 R ⊲ e.g. pi = HdFi , H = mean of the Fi ⊲ all distributions are compared to H ⊲ or all distributions are compared to the same reference ⊲ to be worked out √ ⊲ covariance matrix of N( pb1 , . . . , pbd )′ is quite involved ⊲ first results: Konietschke (2009) Factorial Designs ⊲ consider each factor separately? or combine all comparisons in one vector? → to be worked out

23

Discussion aund Outlook •

•

• •

SCI-Method unifies 3 steps of the classical analysis strategy ⊲ ANOVA ⊲ multiple comparisons (controlling FWEs ) ⊲ confidence intervals for the effects - compatible to the multiple comparisons in one procedure further research ⊲ detailed results regarding power ⊲ extension to factorial designs ⊲ extension to repeated measures designs for parametric as well as nonparametric models Software ⊲ so far only for independent samples (one-factorial design) ⊲ parametric models: R-package: SimComp in CRAN ⊲ nonparametric models: R-package: nparcomp in CRAN 24

Cooperation / Credits •

•

Ludwig Hothorn and assistants (Biostatistik, LU Hannover) Frank Konietschke (Medizinische Statistik, University of Göttingen)

25

References B IESHEUVEL , E. and H OTHORN , L.A. (2002). Many-to-one comparisons in stratified designs. B IOMETRICAL J OURNAL 44, 101-116. B RETZ , F., G ENZ , A., and H OTHORN , L.A. (2001). On the numerically availibilty of multiple comparison procedures, Biometrical Journal 43, 645-656. B ROWN , B. M. and H ETTMANSPERGER , T. P. (2002). Kruskal-Wallis, Multiple Comparisons and Efron Dice. Australian and New Zealand Journal of Statistics 44, 427-438. B RUNNER ,E., M UNZEL , U., and P URI , M., (2002). The multivariate nonparametric Behrens-Fisher problem. Journal of Statistical Planning and Inference 108, 37-53. F ROEMKE C., H OTHORN L.A. and K ROPF S. (2008). Nonparametric relevance-shifted multiple testing procedures for the analysis of high-dimensional multivariate data with small sample sizes. BMC Bioinformatics, 9:54 doi: 10.1186/1471-2105-9-54. 26

References G ABRIEL , K.R. (1969). Simultaneous Test Procedures - Some Theory of Multiple Comparisons. The Annals of Mathematical Statistics 40, 224 - 250. G ENZ , A. and B RETZ F. (2009). Computation of Multivariate Normal and t Probabilities. Lecture Notes in Statistics 195. Springer, Heidelberg, New York. H ASLER M. and H OTHORN L.A. (2008). Multiple Contrast Tests in the Presence of Heteroscedasticity. Biometrical Journal 50, 793-800. KONIETSCHKE , F. (2009). Simultane Konfidenzintervalle für nichtparametrische relative Kontrasteffekte. Dissertation, ¨ Gottingen ¨ Georg-August-Universitat RUMP, C. M. (2001). Strategies for Rolling the Efron dice. Mathematics Magazine 74, 212-216.

27

References RYU, E. and AGRESTI , A. (2008). Modeling and inference for an ordinal effect size measure. Statistics in Medicine 27, 1703-1717. S CHAARSCHMIDT, F., S ILL , M. and H OTHORN , L.A. (2008). Approximate Simultaneous Confidence Intervals for Multiple Contrasts of Binomial Proportions. Biometrical Journal 50, 782-792. T HANGAVELU, K. and B RUNNER , E. (2007). Wilcoxon Mann-Whitney Test for Stratified Samples and Efron’s Paradox Dice. Journal of Statistical Planning and Inference 137, 720-737.

28

Simultaneous Confidence Intervals and Multiple Contrast Tests

Simultaneous Confidence Intervals and Multiple Contrast Tests

Suggest Documents

simultaneous confidence intervals for

Simultaneous Confidence Intervals and Sample

Simultaneous Confidence Intervals and Sample Size Determination ...

On confidence intervals and tests for autocorrelations

Binomial confidence intervals and contingency tests - UCL

Binomial confidence intervals and contingency tests - UCL

Binomial confidence intervals and contingency tests ... - UCL

Simultaneous Fiducial Generalized Confidence Intervals ... - CiteSeerX

Simultaneous Fiducial Generalized Confidence Intervals ... - CiteSeerX

Approximate Simultaneous Nonparametric Confidence Intervals for All

eAppendix Confidence Intervals Confidence intervals for the ...

Confidence intervals

CONFIDENCE INTERVALS VS BAYESIAN INTERVALS

Statistical Tests, P-values, Confidence Intervals, and Power: A Guide ...

Exact confidence intervals and hypothesis tests for ...

Simultaneous imaging of multiple contrast agents

Confidence Intervals Our Goal in Inference Confidence Intervals (CI)

Simultaneous Confidence Intervals with more Power to Determine Signs

Confidence and Prediction Intervals for

Bootstrap Confidence Intervals

Construction of Confidence Intervals

PROBABILISTIC INTERVALS OF CONFIDENCE

Confidence Intervals Based on Corrected Likelihood Ratio Tests for ...

Confidence Intervals: From tests of statistical significance to ...