Calculating the Confidence Intervals Using Bootstrap

Longhai Li
Department of Statistics, University of Toronto
October 28, 2004 (revised on Nov 12, 2004)

Looking at the Data

[Figure: Histogram of the Monte Carlo Data. x-axis: d, from 0 to 1e-03; y-axis: Frequency.]

[Figure: Histogram of the Monte Carlo Data truncated by 1.0e-6. x-axis: d[d < 1e-06], from 0 to 1e-06; y-axis: Frequency.]

Estimates of the Quantities of Interest
• Mean: 1.557816e-07
• 95% Percentile: 4.510736e-07
• 99% Percentile: 1.663400e-06
• Probability that the rate exceeds 1.4e-05: 0.0007828664
• Probability that the rate exceeds 3.0e-4: 9.319838e-06
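The five point estimates above are plain sample statistics. The following is a minimal Python sketch of how such estimates could be computed; the function name `point_estimates` and the synthetic log-normal sample are illustrative assumptions, since the original Monte Carlo data and the author's code are not part of these slides.

```python
import numpy as np

def point_estimates(d, thresholds=(1.4e-5, 3.0e-4)):
    """Sample mean, 95%/99% sample quantiles and exceedance proportions."""
    d = np.asarray(d, dtype=float)
    est = {
        "mean": d.mean(),
        "q95": np.quantile(d, 0.95),
        "q99": np.quantile(d, 0.99),
    }
    for t in thresholds:
        # Proportion of observations strictly above the threshold t.
        est[f"p_exceed_{t:g}"] = np.mean(d > t)
    return est

# Synthetic stand-in for the Monte Carlo data (an assumption, not the real data).
rng = np.random.default_rng(0)
d = rng.lognormal(mean=-17.0, sigma=1.5, size=100_000)

for name, value in point_estimates(d).items():
    print(f"{name}: {value:.6e}")
```

The numbers printed here come from the synthetic sample and are not expected to match the estimates on the slide.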


Confidence Intervals
• To construct a confidence interval for a quantity of interest using a statistic T, we need the sampling distribution of T.
• In most circumstances a normal distribution is used to approximate the sampling distribution, as justified by the Central Limit Theorem (CLT).
• For our problems, a normal approximation may not be good enough, since the sampling distributions are asymmetric and may have two modes (see the plots given later).
• The bootstrap is another general class of methods for constructing confidence intervals.


Introduction to Bootstrap

The bootstrap draws samples from the empirical distribution of the data {x1, x2, ..., xn} in order to replicate the statistic T and so approximate its sampling distribution. The empirical distribution is just the uniform distribution over {x1, x2, ..., xn}. The bootstrap therefore amounts to drawing n i.i.d. samples, with replacement, from {x1, x2, ..., xn}. The procedure is illustrated by the following graph.

Graphical Illustration of Bootstrap

Original data:  x1, x2, x3, ..., xn

    Resampling: drawing n points with replacement

Resample 1:  x1(1), x2(1), x3(1), ..., xn(1)   ->  T1
Resample 2:  x1(2), x2(2), x3(2), ..., xn(2)   ->  T2
Resample 3:  x1(3), x2(3), x3(3), ..., xn(3)   ->  T3
  .....
Resample B:  x1(B), x2(B), x3(B), ..., xn(B)   ->  TB

Bootstrap Statistic: T1, T2, T3, ..., TB
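The resampling scheme illustrated above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the helper name `bootstrap_replicates`, its parameters and the synthetic data are hypothetical and do not come from the slides.

```python
import numpy as np

def bootstrap_replicates(x, statistic, B=10_000, seed=None):
    """Return B bootstrap replicates T1, ..., TB of statistic(x)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n = x.size
    reps = np.empty(B)
    for b in range(B):
        # Draw n indices uniformly with replacement: one resample of size n.
        resample = x[rng.integers(0, n, size=n)]
        reps[b] = statistic(resample)
    return reps

# Usage on a synthetic skewed sample (an assumption, not the original data).
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=300)
mboot = bootstrap_replicates(x, np.mean, B=2000, seed=1)
print(mboot.mean(), mboot.std())
```

Each resample has the same size n as the original data, which is what makes the spread of the replicates an approximation to the sampling variability of T.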

Bootstrap Confidence Intervals
• Simple Method: To obtain the 95% confidence interval, the simple method takes the 2.5% and 97.5% quantiles of the B replicates T1, T2, ..., TB as the lower and upper bounds respectively.
• More Sophisticated Method: When the distributions are skewed, some adjustment is needed. One method that has proved reliable is the BCa method (BCa stands for bias-corrected and accelerated). For details, see DiCiccio, T.J. and Efron, B. (1996) [2].
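The simple method reduces to reading two quantiles off the bootstrap replicates. A minimal Python sketch follows; the function name `percentile_ci` and the synthetic exponential sample are assumptions made for illustration only.

```python
import numpy as np

def percentile_ci(replicates, level=0.95):
    """Simple (percentile) bootstrap interval from replicates T1, ..., TB."""
    alpha = 1.0 - level
    lo, hi = np.quantile(replicates, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Synthetic skewed sample and its bootstrap means (assumed data).
rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=500)
idx = rng.integers(0, x.size, size=(2000, x.size))  # 2000 resamples of size n
mboot = x[idx].mean(axis=1)

lo_b, hi_b = percentile_ci(mboot)
print(lo_b, hi_b)
```

With `level=0.95` the function uses the 2.5% and 97.5% quantiles, as in the bullet above; the BCa method keeps the same replicates and changes only which two quantile levels are read off.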


Bootstrap Confidence Intervals Given by BCa

When the distribution of T is skewed, we instead use the q.low and q.up percentiles of the bootstrap replicates of T as the lower and upper bounds of the confidence interval. Formally, for confidence level 95%,

\[ \mathrm{q.low} = \Phi\left( z_0 + \frac{z_0 + z^{0.025}}{1 - a\,(z_0 + z^{0.025})} \right) \tag{1} \]

\[ \mathrm{q.up} = \Phi\left( z_0 + \frac{z_0 + z^{0.975}}{1 - a\,(z_0 + z^{0.975})} \right) \tag{2} \]

where \(\Phi\) is the standard normal distribution function, \(z^{\alpha}\) is the \(\alpha\) quantile of the standard normal distribution, and \(z_0\) and \(a\), namely the bias correction and the acceleration, are two parameters to be estimated by (2.8) and (6.6) in DiCiccio and Efron [2].
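As a hedged illustration of (1) and (2), the sketch below computes the adjusted levels q.low and q.up in Python. It makes several assumptions that are not in the slides: the bias correction is estimated as the normal quantile of the proportion of replicates below the original estimate, the acceleration uses the common jackknife estimate (which may differ from the estimator in (6.6) of [2]), and the name `bca_levels`, the clipping guard and the synthetic data are hypothetical.

```python
import numpy as np
from statistics import NormalDist

_norm = NormalDist()  # standard normal: cdf is Phi, inv_cdf gives quantiles

def bca_levels(x, statistic, replicates, level=0.95):
    """Return (q_low, q_up), the BCa-adjusted quantile levels."""
    x = np.asarray(x, dtype=float)
    replicates = np.asarray(replicates, dtype=float)
    B = replicates.size

    # Bias correction z0 from the share of replicates below the estimate.
    # The clipping is a guard added here so inv_cdf never sees 0 or 1.
    prop = np.clip(np.mean(replicates < statistic(x)), 1 / (B + 1), B / (B + 1))
    z0 = _norm.inv_cdf(float(prop))

    # Acceleration a from leave-one-out (jackknife) values: an assumed,
    # commonly used estimate. It is undefined if all values are identical.
    jack = np.array([statistic(np.delete(x, i)) for i in range(x.size)])
    diff = jack.mean() - jack
    a = np.sum(diff**3) / (6.0 * np.sum(diff**2) ** 1.5)

    def adjusted(p):
        z = _norm.inv_cdf(p)
        return _norm.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    alpha = 1.0 - level
    return adjusted(alpha / 2), adjusted(1 - alpha / 2)

# Usage on a synthetic skewed sample (assumed data, not the original).
rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=200)
idx = rng.integers(0, x.size, size=(2000, x.size))
mboot = x[idx].mean(axis=1)

q_low, q_up = bca_levels(x, np.mean, mboot)
print(q_low, q_up, np.quantile(mboot, [q_low, q_up]))
```

When z0 = 0 and a = 0 the adjusted levels reduce to 2.5% and 97.5%, so the BCa interval coincides with the simple percentile interval.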


Confidence Interval of the Average

The plot of 10,000 bootstrap replicates of the sample average:

[Figure: Histogram of mboot. x-axis: mboot, from 1.4e-07 to 2.0e-07; y-axis: Frequency.]

The 95% confidence interval of the average calculated by BCa is:

    3.196408%       99.32855%
    1.399666e-07    1.871455e-07

where 3.196408% and 99.32855% are the q.low and q.up given by (1) and (2); the other confidence intervals given below are presented in the same way.

Confidence Interval of 95% Percentile

The plot of 10,000 bootstrap replicates of the 95% sample quantile:

[Figure: Histogram of qboot95. x-axis: qboot95, from 4.4e-07 to 4.7e-07; y-axis: Frequency.]

95% Confidence Interval:

    2.381881%       97.41008%
    4.400038e-07    4.626755e-07

Confidence Interval of 99% Percentile

The plot of 10,000 bootstrap replicates of the 99% sample quantile:

[Figure: Histogram of qboot99. x-axis: qboot99, from 1.55e-06 to 1.80e-06; y-axis: Frequency.]

95% Confidence Interval:

    2.514489%       97.59096%
    1.589317e-06    1.751419e-06

Confidence Interval of the probability that the rate exceeds 1.4E-5

The plot:

[Figure: Histogram of alfa1boot. x-axis: alfa1boot, from 0.0005 to 0.0011; y-axis: Frequency.]

95% Confidence Interval:

    1.99185%        97.19283%
    0.0006151093    0.0009506235

Confidence Interval of the probability that the rate exceeds 3E-4

The plot:

[Figure: Histogram of alfa2boot. x-axis: alfa2boot, from 0e+00 to 7e-05; y-axis: Frequency.]

95% Confidence Interval:

    0.3888835%      93.23166%
    0.000000e+00    2.795951e-05

Summary of Results

Par.         Estimation      95% Confidence Interval
Average      1.557816e-07    (1.399666e-07, 1.871455e-07)
95% Perc.    4.510736e-07    (4.400038e-07, 4.626755e-07)
99% Perc.    1.663400e-06    (1.589317e-06, 1.751419e-06)
p1           0.0007828664    (0.000615109, 0.000950623)
p2           9.319838e-06    (0.00000e+00, 2.79595e-05)

p1: Probability that the rate exceeds 1.4e-05
p2: Probability that the rate exceeds 3.0e-4

References

[1] Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and Their Application, Chapter 5. Cambridge University Press.
[2] DiCiccio, T.J. and Efron, B. (1996) Bootstrap confidence intervals (with Discussion). Statistical Science, 11, 189-228.
[3] Efron, B. (1987) Better bootstrap confidence intervals (with Discussion). Journal of the American Statistical Association, 82, 171-200.
