Introduction to resampling methods

Introduction We want to assess the accuracy (bias, standard error, etc.) of an arbitrary estimate θˆ knowing only one sample x = (x1 , · · · , xn ) drawn from an unknown population density function f .

Definitions and Problems

We propose here one way, called Bootstrap, to do it using computer intensive techniques for resampling.

Non-Parametric Bootstrap Parametric Bootstrap

Bootstrap is a data based simulation method for statistical inference. The basic idea of bootstrap is to use the sample data to compute a statistic and to estimate its sampling distribution, without any model assumption.

Jackknife Permutation tests Cross-validation

No theoretical calculations of standard errors needed so we don’t care how mathematically complex the estimator θˆ can be!

25 / 133

Introduction

26 / 133

Bootstrap samples and replications

The (non-parametric) bootstrap method is an application of the plug-in principle. By non-parametric, we mean that only x is known (observed) and no prior knowledge on the population density function f is available. Originally, the Bootstrap was introduced to compute standard error of an arbitrary estimator by Efron (1979) and to-date the basic idea remains the same. The term bootstrap derives from the phrase to pull oneself up by one’s bootstrap (Adventures of Baron Munchausen, by Rudolph Erich Raspe). The Baron had fallen to the bottom of a deep lake. Just when it looked like all was lost, he thought to pick himself up by his own bootstraps.

27 / 133

Definition A bootstrap sample x∗ = (x1∗ , x2∗ , · · · , xn∗ ) is obtained by randomly sampling n times, with replacement, from the original data points x = (x1 , x2 , · · · , xn ). Considering a sample x = (x1 , x2 , x3 , x4 , x5 ), some bootstrap samples can be: x∗(1) = (x2 , x3 , x5 , x4 , x5 ) x∗(2) = (x1 , x3 , x1 , x4 , x5 ) etc.

Definition With each bootstrap sample x∗(1) to x∗(B) , we can compute a bootstrap replication θˆ∗ (b) = s(x∗(b) ) using the plug-in principle. 28 / 133

How to compute Bootstrap samples

How many values are left out of a bootstrap resample ?

Repeat B times:

Given a sample x = (x1 , x2 , · · · , xn ) and assuming that all xi are different, the probability that a particular value xi is left out of a resample x∗ = (x1∗ , x2∗ , · · · , xn∗ ) is: 1 n ∗ P(xj 6= xi , 1 6 j 6 n) = 1 − n n since P(xj∗ = xi ) = n1 . When n is large, the probability 1 − n1 converges to e −1 ≈ 0.37.

1

A random number device selects integers i1 , · · · , in each of which equals any value between 1 and n with probability n1 .

2

Then compute x∗ = (xi1 , · · · , xin ).

Some matlab code available on the web See BOOTSTRAP MATLAB TOOLBOX, by Abdelhak M. Zoubir and D. Robert Iskander, http://www.csp.curtin.edu.au/downloads/bootstrap toolbox.html

29 / 133

30 / 133

The Bootstrap algorithm for Estimating standard errors

Bootstrap estimate of the standard Error Example A

1

Select B independent bootstrap samples from x

2

Evaluate the bootstrap replications: θˆ∗ (b) = s(x∗(b) ),

3

x∗(1) , x∗(2) , · · ·

, x∗(B)

drawn

From the distribution f : f (x) = 0.2 N(µ=1,σ=2) + 0.8 N(µ=6,σ=1). We draw the sample x = (x1 , · · · , x100 ) : 7.0411 5.2546 7.4199 4.1230 3.6790 −3.8635 −0.1864 −1.0138 6.9523 6.5975 x= 6.1559 4.5010 5.5741 6.6439 6.0919 7.3199 5.3602 7.0912 4.9585 4.7654

∀b ∈ {1, · · · , B}

ˆ by the standard deviation of the B Estimate the standard error sef (θ) replications: "P #1 B ˆ∗ ˆ∗ 2 2 b=1 [θ (b) − θ (·)] se ˆB = B −1 where θˆ∗ (·) =

PB

ˆ ∗ (b) θ B

b=1

4.8397 7.3937 5.3677 3.8914 0.3509 2.5731 2.7004 4.9794 5.3073 6.3495 5.8950 4.7860 5.5139 4.5224 7.1912 5.1305 6.4120 7.0766 5.9042 6.4668

5.3156 4.3376 6.7028 5.2323 1.4197 −0.7367 2.1487 0.1518 4.7191 7.2762 5.7591 5.4382 5.8869 5.5028 6.4181 6.8719 6.0721 5.9750 5.9273 6.1983

6.7719 4.4010 6.2003 5.5942 1.7585 0.5627 2.3513 2.8683 5.4374 5.9453 5.2173 4.8893 7.2756 4.5672 7.2248 5.2686 5.2740 6.6091 6.5762 4.3450

7.0616 5.1724 7.5707 7.1479 2.4476 1.6379 1.4833 1.6269 4.6108 4.6993 4.9980 7.2940 5.8449 5.8718 8.4153 5.8055 7.2329 7.2135 5.3702 5.3261

We have µf = 5 and x = 4.9970. 31 / 133

Bootstrap estimate of the standard Error

32 / 133

Distribution of θˆ

1

B = 1000 bootstrap samples {x∗(b) }

When enough bootstrap resamples have been generated, not only the standard error but any aspect of the distribution of the estimator θˆ = t(fˆ) could be estimated. One can draw a histogram of the distribution of θˆ by using the observed θˆ∗ (b), b = 1, · · · , B.

2

B = 1000 replications {x ∗ (b)}

Example A

3

Bootstrap estimate of the standard error:

Example A

se b B=1000 =

"P

1000 ∗ b=1 [x (b)

x ∗ (·)]2

− 1000 − 1

350

# 12

300

250

= 0.2212

200

150

where x ∗ (·) = 5.0007. This is to compare with se(x) ˆ =

ˆ σ √ n

= 0.22.

100

50

0

0

2

4

5

6

8

10

Figure: Histogram of the replications {x ∗ (b)}b=1···B . 33 / 133

Bootstrap estimate of the standard error

34 / 133

Bootstrap estimate of the standard Error

How many B in practice ? you may want to limit the computation time. In practice, you get a good estimation of the standard error for B in between 50 and 200.

Definition The ideal bootstrap estimate sefˆ (θˆ∗ ) is defined as:

Example A

lim se ˆ B = sefˆ (θˆ∗ )

B→∞

sefˆ

(θˆ∗ )

is called a non-parametric bootstrap estimate of the standard error.

B se bB

10 0.1386

20 0.2188

50 0.2245

100 0.2142

500 0.2248

1000 0.2212

10000 0.2187

Table: Bootstrap standard error w.r.t. the number B of bootstrap samples.

35 / 133

36 / 133

Bootstrap estimate of bias

Bootstrap estimate of bias

Definition

1

B independent bootstrap samples x∗(1) , x∗(2) , · · · , x∗(B) drawn from x

2

Evaluate the bootstrap replications:

The bootstrap estimate of bias is defined to be the estimate: ˆ = Eˆ [s(x∗ )] − t(fˆ) Biasfˆ (θ) f

θˆ∗ (b) = s(x∗(b) ),

= θˆ∗ (·) − θˆ 3

Approximate the bootstrap expectation :

Example A B Efˆ (x ∗ ) d Bias

10 5.0587 0.0617

20 4.9551 -0.0419

50 5.0244 0.0274

100 4.9883 -0.0087

500 4.9945 -0.0025

1000 5.0035 0.0064

B B 1 X ˆ∗ 1 X θˆ∗ (·) = θ (b) = s(x∗(b) ) B B

10000 4.9996 0.0025

b=1

4

d of x ∗ (x = 4.997 and µf = 5). Table: Bias

∀b ∈ {1, · · · , B}

b=1

the bootstrap estimate of bias based on B replications is: d B = θˆ∗ (·) − θˆ Bias

37 / 133

Confidence interval

38 / 133

Can the bootstrap answer other questions?

The mouse data Definition Using the bootstrap estimation of the standard error, the 100(1 − 2α)% confidence interval is: θ = θˆ ± z (1−α) · se bB

Data (Treatment group)

94; 197; 16; 38; 99; 141; 23

Data (Control group)

52; 104; 146; 10; 51; 30; 40; 27; 46

Definition

If the bias in not null, the bias corrected confidence interval is defined by: d B ) ± z (1−α) · se θ = (θˆ − Bias bB

Table: The mouse data [Efron]. 16 mice divided assigned to a treatment group (7) or a control group (9). Survival in days following a test surgery. Did the treatment prolong survival ?

39 / 133

Can the bootstrap answer other questions?

The Law school example

The mouse data Remember in the first lecture, we compute d = x Treat − x Cont = 30.63 with a standard error se(d) ˆ = 28.93. The ratio was d/se(d) ˆ = 1.05 (an insignificant result as measuring d = 0 is likely possible). Using bootstrap method 1

2 3

4

∗(b)

∗(b)

40 / 133

∗(b)

B bootstrap samples xTreat = (xTreat 1 , · · · , xTreat 7 ) and ∗(b) ∗(b) ∗(b) xCont = (xCont 1 , · · · , xCont 9 ), ∀1 6 b 6 B ∗(b) ∗(b) B bootstrap replications are computed: d ∗ (b) = x Treat − x Cont The bootstrap standard error is computed for B = 1400: se ˆ B=1400 = 26.85. The ratio is d/se ˆ 1400 (d) = 1.14.

School LSAT (X) GPA (Y)

1 576 3.39

2 635 3.30

3 558 2.81

4 578 3.03

5 666 3.44

6 580 3.07

7 555 3.00

School LSAT (X) GPA (Y)

9 651 3.36

10 605 3.13

11 653 3.12

12 575 2.74

13 545 2.76

14 572 2.88

15 594 2.96

8 661 3.43

Table: Results of law schools admission practice for the LSAT and GPA tests. It is believed that these scores are highly correlated. Compute the correlation and its standard error.

This is still not a significant result.

41 / 133

42 / 133

Correlation

The Law school example

The estimated correlation is corr(x, d y) = .7764 between LSAT and GPA.

The correlation is defined : corr(X , Y ) =

E[(X − E(X )) · (Y − E(Y ))]

(E[(X − E(X ))2 ] · E[(Y − E(Y ))2 ])1/2

Non-parametric Bootstrap estimate of the standard error B se ˆB

Its typical estimator is: Pn −n x y i=1 xi yi P corr(x, d y) = Pn 2 2 1/2 [ i=1 xi − nx ] · [ ni=1 yi2 − ny 2 ]1/2

25 .140

50 .142

100 .151

200 .143

400 .141

800 .137

1600 .133

3200 .132

Table: Bootstrap estimate of standard error for corr(x, d y) = .776.

d ≈ .132. The standard error stabilizes to sefˆ (corr)

43 / 133

The Law school example: Conclusion

44 / 133

Summary

Re-sampling of x to compute bootstrap samples x∗

The textbook formula for the correlation coefficient is: √ se( ˆ corr) d = (1 − corr d 2 )/ n − 3

With corr d = 0.7764, the standard error is se( ˆ corr) d = 0.1147.

The estimated non-parametric bootstrap standard error seB=3200 is 0.132.

45 / 133

Computation of bootstrap replication of the estimator θˆ∗ (b) for b = 1, · · · , B d B and the From replications, standard error se b B , the bias Bias confidence interval. Non-parametric bootstrap estimations (no prior on f ).

46 / 133

Introduction We want to assess the accuracy (bias, standard error, etc.) of an arbitrary estimate θˆ knowing only one sample x = (x1 , · · · , xn ) drawn from an unknown population density function f .

Definitions and Problems

We propose here one way, called Bootstrap, to do it using computer intensive techniques for resampling.

Non-Parametric Bootstrap Parametric Bootstrap

Bootstrap is a data based simulation method for statistical inference. The basic idea of bootstrap is to use the sample data to compute a statistic and to estimate its sampling distribution, without any model assumption.

Jackknife Permutation tests Cross-validation

No theoretical calculations of standard errors needed so we don’t care how mathematically complex the estimator θˆ can be!

25 / 133

Introduction

26 / 133

Bootstrap samples and replications

The (non-parametric) bootstrap method is an application of the plug-in principle. By non-parametric, we mean that only x is known (observed) and no prior knowledge on the population density function f is available. Originally, the Bootstrap was introduced to compute standard error of an arbitrary estimator by Efron (1979) and to-date the basic idea remains the same. The term bootstrap derives from the phrase to pull oneself up by one’s bootstrap (Adventures of Baron Munchausen, by Rudolph Erich Raspe). The Baron had fallen to the bottom of a deep lake. Just when it looked like all was lost, he thought to pick himself up by his own bootstraps.

27 / 133

Definition A bootstrap sample x∗ = (x1∗ , x2∗ , · · · , xn∗ ) is obtained by randomly sampling n times, with replacement, from the original data points x = (x1 , x2 , · · · , xn ). Considering a sample x = (x1 , x2 , x3 , x4 , x5 ), some bootstrap samples can be: x∗(1) = (x2 , x3 , x5 , x4 , x5 ) x∗(2) = (x1 , x3 , x1 , x4 , x5 ) etc.

Definition With each bootstrap sample x∗(1) to x∗(B) , we can compute a bootstrap replication θˆ∗ (b) = s(x∗(b) ) using the plug-in principle. 28 / 133

How to compute Bootstrap samples

How many values are left out of a bootstrap resample ?

Repeat B times:

Given a sample x = (x1 , x2 , · · · , xn ) and assuming that all xi are different, the probability that a particular value xi is left out of a resample x∗ = (x1∗ , x2∗ , · · · , xn∗ ) is: 1 n ∗ P(xj 6= xi , 1 6 j 6 n) = 1 − n n since P(xj∗ = xi ) = n1 . When n is large, the probability 1 − n1 converges to e −1 ≈ 0.37.

1

A random number device selects integers i1 , · · · , in each of which equals any value between 1 and n with probability n1 .

2

Then compute x∗ = (xi1 , · · · , xin ).

Some matlab code available on the web See BOOTSTRAP MATLAB TOOLBOX, by Abdelhak M. Zoubir and D. Robert Iskander, http://www.csp.curtin.edu.au/downloads/bootstrap toolbox.html

29 / 133

30 / 133

The Bootstrap algorithm for Estimating standard errors

Bootstrap estimate of the standard Error Example A

1

Select B independent bootstrap samples from x

2

Evaluate the bootstrap replications: θˆ∗ (b) = s(x∗(b) ),

3

x∗(1) , x∗(2) , · · ·

, x∗(B)

drawn

From the distribution f : f (x) = 0.2 N(µ=1,σ=2) + 0.8 N(µ=6,σ=1). We draw the sample x = (x1 , · · · , x100 ) : 7.0411 5.2546 7.4199 4.1230 3.6790 −3.8635 −0.1864 −1.0138 6.9523 6.5975 x= 6.1559 4.5010 5.5741 6.6439 6.0919 7.3199 5.3602 7.0912 4.9585 4.7654

∀b ∈ {1, · · · , B}

ˆ by the standard deviation of the B Estimate the standard error sef (θ) replications: "P #1 B ˆ∗ ˆ∗ 2 2 b=1 [θ (b) − θ (·)] se ˆB = B −1 where θˆ∗ (·) =

PB

ˆ ∗ (b) θ B

b=1

4.8397 7.3937 5.3677 3.8914 0.3509 2.5731 2.7004 4.9794 5.3073 6.3495 5.8950 4.7860 5.5139 4.5224 7.1912 5.1305 6.4120 7.0766 5.9042 6.4668

5.3156 4.3376 6.7028 5.2323 1.4197 −0.7367 2.1487 0.1518 4.7191 7.2762 5.7591 5.4382 5.8869 5.5028 6.4181 6.8719 6.0721 5.9750 5.9273 6.1983

6.7719 4.4010 6.2003 5.5942 1.7585 0.5627 2.3513 2.8683 5.4374 5.9453 5.2173 4.8893 7.2756 4.5672 7.2248 5.2686 5.2740 6.6091 6.5762 4.3450

7.0616 5.1724 7.5707 7.1479 2.4476 1.6379 1.4833 1.6269 4.6108 4.6993 4.9980 7.2940 5.8449 5.8718 8.4153 5.8055 7.2329 7.2135 5.3702 5.3261

We have µf = 5 and x = 4.9970. 31 / 133

Bootstrap estimate of the standard Error

32 / 133

Distribution of θˆ

1

B = 1000 bootstrap samples {x∗(b) }

When enough bootstrap resamples have been generated, not only the standard error but any aspect of the distribution of the estimator θˆ = t(fˆ) could be estimated. One can draw a histogram of the distribution of θˆ by using the observed θˆ∗ (b), b = 1, · · · , B.

2

B = 1000 replications {x ∗ (b)}

Example A

3

Bootstrap estimate of the standard error:

Example A

se b B=1000 =

"P

1000 ∗ b=1 [x (b)

x ∗ (·)]2

− 1000 − 1

350

# 12

300

250

= 0.2212

200

150

where x ∗ (·) = 5.0007. This is to compare with se(x) ˆ =

ˆ σ √ n

= 0.22.

100

50

0

0

2

4

5

6

8

10

Figure: Histogram of the replications {x ∗ (b)}b=1···B . 33 / 133

Bootstrap estimate of the standard error

34 / 133

Bootstrap estimate of the standard Error

How many B in practice ? you may want to limit the computation time. In practice, you get a good estimation of the standard error for B in between 50 and 200.

Definition The ideal bootstrap estimate sefˆ (θˆ∗ ) is defined as:

Example A

lim se ˆ B = sefˆ (θˆ∗ )

B→∞

sefˆ

(θˆ∗ )

is called a non-parametric bootstrap estimate of the standard error.

B se bB

10 0.1386

20 0.2188

50 0.2245

100 0.2142

500 0.2248

1000 0.2212

10000 0.2187

Table: Bootstrap standard error w.r.t. the number B of bootstrap samples.

35 / 133

36 / 133

Bootstrap estimate of bias

Bootstrap estimate of bias

Definition

1

B independent bootstrap samples x∗(1) , x∗(2) , · · · , x∗(B) drawn from x

2

Evaluate the bootstrap replications:

The bootstrap estimate of bias is defined to be the estimate: ˆ = Eˆ [s(x∗ )] − t(fˆ) Biasfˆ (θ) f

θˆ∗ (b) = s(x∗(b) ),

= θˆ∗ (·) − θˆ 3

Approximate the bootstrap expectation :

Example A B Efˆ (x ∗ ) d Bias

10 5.0587 0.0617

20 4.9551 -0.0419

50 5.0244 0.0274

100 4.9883 -0.0087

500 4.9945 -0.0025

1000 5.0035 0.0064

B B 1 X ˆ∗ 1 X θˆ∗ (·) = θ (b) = s(x∗(b) ) B B

10000 4.9996 0.0025

b=1

4

d of x ∗ (x = 4.997 and µf = 5). Table: Bias

∀b ∈ {1, · · · , B}

b=1

the bootstrap estimate of bias based on B replications is: d B = θˆ∗ (·) − θˆ Bias

37 / 133

Confidence interval

38 / 133

Can the bootstrap answer other questions?

The mouse data Definition Using the bootstrap estimation of the standard error, the 100(1 − 2α)% confidence interval is: θ = θˆ ± z (1−α) · se bB

Data (Treatment group)

94; 197; 16; 38; 99; 141; 23

Data (Control group)

52; 104; 146; 10; 51; 30; 40; 27; 46

Definition

If the bias in not null, the bias corrected confidence interval is defined by: d B ) ± z (1−α) · se θ = (θˆ − Bias bB

Table: The mouse data [Efron]. 16 mice divided assigned to a treatment group (7) or a control group (9). Survival in days following a test surgery. Did the treatment prolong survival ?

39 / 133

Can the bootstrap answer other questions?

The Law school example

The mouse data Remember in the first lecture, we compute d = x Treat − x Cont = 30.63 with a standard error se(d) ˆ = 28.93. The ratio was d/se(d) ˆ = 1.05 (an insignificant result as measuring d = 0 is likely possible). Using bootstrap method 1

2 3

4

∗(b)

∗(b)

40 / 133

∗(b)

B bootstrap samples xTreat = (xTreat 1 , · · · , xTreat 7 ) and ∗(b) ∗(b) ∗(b) xCont = (xCont 1 , · · · , xCont 9 ), ∀1 6 b 6 B ∗(b) ∗(b) B bootstrap replications are computed: d ∗ (b) = x Treat − x Cont The bootstrap standard error is computed for B = 1400: se ˆ B=1400 = 26.85. The ratio is d/se ˆ 1400 (d) = 1.14.

School LSAT (X) GPA (Y)

1 576 3.39

2 635 3.30

3 558 2.81

4 578 3.03

5 666 3.44

6 580 3.07

7 555 3.00

School LSAT (X) GPA (Y)

9 651 3.36

10 605 3.13

11 653 3.12

12 575 2.74

13 545 2.76

14 572 2.88

15 594 2.96

8 661 3.43

Table: Results of law schools admission practice for the LSAT and GPA tests. It is believed that these scores are highly correlated. Compute the correlation and its standard error.

This is still not a significant result.

41 / 133

42 / 133

Correlation

The Law school example

The estimated correlation is corr(x, d y) = .7764 between LSAT and GPA.

The correlation is defined : corr(X , Y ) =

E[(X − E(X )) · (Y − E(Y ))]

(E[(X − E(X ))2 ] · E[(Y − E(Y ))2 ])1/2

Non-parametric Bootstrap estimate of the standard error B se ˆB

Its typical estimator is: Pn −n x y i=1 xi yi P corr(x, d y) = Pn 2 2 1/2 [ i=1 xi − nx ] · [ ni=1 yi2 − ny 2 ]1/2

25 .140

50 .142

100 .151

200 .143

400 .141

800 .137

1600 .133

3200 .132

Table: Bootstrap estimate of standard error for corr(x, d y) = .776.

d ≈ .132. The standard error stabilizes to sefˆ (corr)

43 / 133

The Law school example: Conclusion

44 / 133

Summary

Re-sampling of x to compute bootstrap samples x∗

The textbook formula for the correlation coefficient is: √ se( ˆ corr) d = (1 − corr d 2 )/ n − 3

With corr d = 0.7764, the standard error is se( ˆ corr) d = 0.1147.

The estimated non-parametric bootstrap standard error seB=3200 is 0.132.

45 / 133

Computation of bootstrap replication of the estimator θˆ∗ (b) for b = 1, · · · , B d B and the From replications, standard error se b B , the bias Bias confidence interval. Non-parametric bootstrap estimations (no prior on f ).

46 / 133