Journal of Statistical Planning and Inference 27 (1991) 157-169
North-Holland

Bootstrap algorithms for small samples

Nicholas I. Fisher
CSIRO, Division of Mathematics & Statistics, P.O. Box 218, Lindfield, Sydney, NSW 2070, Australia

Peter Hall
Department of Statistics, Australian National University, G.P.O. Box 4, Canberra, ACT 2601, Australia

Received 11 May 1989
Recommended by T.L. Lai

Abstract: We describe algorithms for exact computation of nonparametric bootstrap estimators, and show that they are practicable for small samples. For example, if the sample size is n = 6 then the entire bootstrap distribution has only 462 atoms. It is argued that, in this setting, exact calculation is competitive with simulation involving several hundred replications. We also describe the role of exact computation in the iterated bootstrap, and discuss a method of importance resampling appropriate to small samples.

AMS Subject Classification: Primary 62G05, 62G15; secondary 62E15.

Key words and phrases: Bootstrap; exact computation; enumeration; importance resampling; Monte Carlo; simulation.
1. Introduction

The nonparametric bootstrap may be used to estimate a wide variety of statistical features, including bias, variance, distribution function and quantile. Exact calculation of bootstrap estimates requires enumeration of all resamples which may be drawn with replacement from the original sample, and also computation of the likelihood and statistic value associated with each resample. This is impractical in large samples, since the number of possible resamples increases exponentially quickly with sample size. However, it is quite feasible for small samples - certainly for samples of size 7 or less, and sometimes for samples of size 8 or 9, depending on the available computing resources. Some authors suggest that between 1000 and 2000 simulations are needed to compute bootstrap estimators by Monte Carlo means. Since there are only 1716 atoms in the bootstrap distribution when sample size is n = 7, and fewer than 500 when sample size is n = 6, exact calculation is competitive with simulation in these cases. We describe algorithms for exact computation of nonparametric bootstrap
estimates in small samples. Section 2 reviews basic properties of bootstrap estimators, provides formulae for exact calculation, and discusses the issue of ties.
Section 3 describes the role of exact calculations in iterated bootstrap estimation, using exact calculation in the first bootstrap operation and Monte Carlo simulation in the second. Section 4 discusses a small sample version of Johns’ (1988) importance resampling method, and Section 5 briefly describes generalisations of our technique.
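The atom counts driving the comparison with simulation can be tabulated directly. A small Python sketch (ours, not from the paper), using the count N(n) = C(2n-1, n) of unordered resamples derived in Section 2:

```python
from math import comb

# Number of distinct unordered resamples (atoms of the bootstrap
# distribution) that can be drawn from n distinct sample values:
# N(n) = C(2n - 1, n).
def n_atoms(n: int) -> int:
    return comb(2 * n - 1, n)

for n in range(2, 11):
    print(n, n_atoms(n))
# n = 6 gives 462 atoms and n = 7 gives 1716, both below the 1000-2000
# Monte Carlo replications often recommended, which is why exact
# enumeration is competitive in these cases.
```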
2. Exact computation of bootstrap estimators

2.1. Introduction and summary

In this section we argue that exact computation of bootstrap estimates, such as bias and quantile estimates, is an attractive proposition for samples of size n ≤ 6. Exact calculation is sometimes feasible for n = 7, 8 and 9, but not really for n ≥ 10 unless extensive computational resources are available. Subsection 2.2 gives notation, Subsection 2.3 describes basic properties of bootstrap estimates, and Subsection 2.4 discusses algorithms for exact computation. In Subsection 2.3 we describe the effect of ties in resamples; Subsection 2.5 suggests a way of allowing for ties in the original sample.

2.2. Notation
Let 𝒳 = {X_1, ..., X_n} denote a random sample from the distribution of a d-dimensional vector X, put X̄ = n^{-1} Σ_i X_i, and suppose θ is a real-valued function of d variables. We focus attention on properties of θ(X̄). For example, we may wish to estimate E{θ(X̄)}, or the distribution of θ(X̄), perhaps after a standardization for scale. Cases which may be studied in this context include that where θ(X̄) is a univariate mean, when d = 1; or a variance, when d = 2; or a correlation coefficient, when d = 5; or a function such as a ratio of any number of these quantities.

We begin by noting the asymptotic distribution of θ(X̄). If μ = E(X) denotes the mean of X then n^{1/2}{θ(X̄) − θ(μ)} is asymptotically normal N(0, σ²), where

σ² = Σ_{i=1}^d Σ_{j=1}^d θ_i(μ) θ_j(μ) σ_{ij},   θ_i(μ) = (∂/∂μ^i) θ(μ),   σ_{ij} = E(X^i X^j) − μ^i μ^j,

and elements of vectors are indicated by superscripts. To avoid trivialities, assume that σ² > 0; this avoids pathological examples such as θ(x) = x². Our empirical estimator of σ² is the asymptotic estimator

σ̂²_asymp = Σ_i Σ_j θ_i(X̄) θ_j(X̄) σ̂_{ij},
where

σ̂_{ij} = n^{-1} Σ_{k=1}^n X_k^i X_k^j − X̄^i X̄^j.

In small samples this statistic is not always compellingly attractive as an estimator of n var{θ(X̄)}. Alternatives are the jackknife estimator σ̂²_jack and the bootstrap estimator σ̂²_boot, the latter defined by

σ̂²_boot = n( E[{θ(X̄*)}² | 𝒳] − [E{θ(X̄*) | 𝒳}]² ),

where X̄_i denotes the mean of the (n − 1)-sample 𝒳 \ {X_i}, used in forming σ̂²_jack, and X̄* equals the mean of a resample 𝒳* of size n drawn randomly, with replacement, from 𝒳. Each of σ̂²_asymp, σ̂²_jack and σ̂²_boot converges to σ² with probability one as n → ∞.

The bootstrap estimator of ω = E{θ(X̄)} is

ω̂ = E{θ(X̄*) | 𝒳}.
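Both ω̂ and σ̂²_boot are conditional moments of θ(X̄*), so for very small n they can be evaluated exactly by brute force. A sketch (ours, not from the paper; θ taken as the identity with d = 1 purely for illustration):

```python
from itertools import product
from statistics import fmean

def boot_moments(xs):
    """Exact omega-hat = E{theta(Xbar*) | X} and sigma2_boot =
    n * Var{theta(Xbar*) | X}, here with theta the identity (d = 1),
    by enumerating all n^n equally likely ordered resamples."""
    n = len(xs)
    means = [fmean(r) for r in product(xs, repeat=n)]
    omega = fmean(means)
    second = fmean(m * m for m in means)
    return omega, n * (second - omega * omega)

omega, s2 = boot_moments([1.0, 2.0, 4.0])
# For theta = identity, omega-hat is the sample mean and sigma2_boot
# reduces to the 'population' variance (1/n) * sum (x_i - xbar)^2.
```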
Hence the bootstrap estimate of the bias E{θ(X̄)} − θ(μ) is ω̂ − θ(X̄).

Let σ̂² denote any one of σ̂²_asymp, σ̂²_jack and σ̂²_boot, and write σ̂*² for the version of σ̂² computed from the resample 𝒳*. Bootstrap estimators of the distribution functions

F(x) = P{θ(X̄) − θ(μ) ≤ x},   G(x) = P[{θ(X̄) − θ(μ)}/σ̂ ≤ x]

are

F̂(x) = P{θ(X̄*) − θ(X̄) ≤ x | 𝒳},   Ĝ(x) = P[{θ(X̄*) − θ(X̄)}/σ̂* ≤ x | 𝒳],

respectively. Quantiles of the bootstrap distributions are often needed to construct confidence intervals. They are

t̂_p = inf{x: F̂(x) ≥ p},   q̂_p = inf{x: Ĝ(x) ≥ p}.

Use of Ĝ, or the quantile q̂_p, to construct confidence regions for θ(μ), is known as the percentile-t method, to distinguish it from the percentile method based on F̂ and t̂_p. While percentile-t has advantages over percentile in large samples (Hinkley and Wei, 1984; Hall, 1988), it can suffer difficulties in small samples owing to ties in the resample 𝒳*. These problems will be discussed in Section 3.

2.3. Basic properties of bootstrap estimates
Assume that the distribution of X has no atoms. Then with probability one, all values in the sample 𝒳 are distinct, and the number of different unordered resamples 𝒳* that can be drawn from 𝒳 with replacement equals the number of ways of choosing nonnegative integers k_1, ..., k_n satisfying k_1 + ⋯ + k_n = n. This is given by the binomial coefficient

N(n) = C(2n − 1, n)
(Hall, 1987, Appendix 1). Table 1 lists values of N(n). The qualification 'with probability one' here and below refers to realizations of 𝒳: it means that if ℰ is the collection of realizations for which the qualified statement is valid, then P(𝒳 ∈ ℰ) = 1.

With probability one, each of the N(n) different resamples produces a distinct value of θ(X̄*) and of T* = {θ(X̄*) − θ(X̄)}/σ̂*, provided that in the latter we ignore resamples which have σ̂* = 0. Now, with probability one the event σ̂* = 0 occurs for precisely those n resamples 𝒳* which consist of n identical elements. Any given one of these special resamples has conditional probability n^{-n} of arising. Hence,

p_0(n) = P(σ̂* = 0 | 𝒳) = n^{-(n-1)},   (2.1)

with probability one.

In the event that σ̂* = 0, the studentized ratio T* is not well defined. However, if we interpret Ĝ as

Ĝ(x) = P{θ(X̄*) − θ(X̄) ≤ σ̂*x | 𝒳}   (2.2)

then Ĝ is always well defined, as is the quantile q̂_p. This convention can be important in small samples, where the probability at (2.1) may be non-negligible. Formula (2.2) has an obvious analogue in multivariate problems, where θ is a vector of length r and σ̂* is an r × r matrix. Despite these convenient interpretations, there are obvious and serious problems in effectively using resamples which have zero variance, when the percentile-t method is employed.
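The count behind (2.1) is easy to verify by brute force for small n; a sketch (ours, illustrative):

```python
from itertools import product

# (2.1): among the n^n equally likely ordered resamples of n distinct
# values, exactly n are constant, so P(sigma-hat* = 0 | X) = n^(-(n-1)).
def p0_by_enumeration(n: int) -> float:
    constant = sum(1 for r in product(range(n), repeat=n) if len(set(r)) == 1)
    return constant / n ** n

for n in (2, 3, 4, 5, 6):
    assert abs(p0_by_enumeration(n) - n ** -(n - 1)) < 1e-15
# For n = 3 the probability is 1/9, which is far from negligible.
```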
Table 1
Number of atoms, N(n) = C(2n − 1, n), of the bootstrap distribution, together with probabilities of the most likely atom (n! n^{-n}) and least likely atom (n^{-n}) under uniform resampling. Probabilities designated * are exact; all others are rounded to four significant figures.

sample size, n   number of atoms, N(n)   probability of most likely atom   probability of least likely atom
2                3                       0.5*                              0.25*
3                10                      0.2222                            3.704 × 10⁻²
4                35                      9.375 × 10⁻²*                     3.906 × 10⁻³
5                126                     3.84 × 10⁻²*                      3.2 × 10⁻⁴*
6                462                     1.543 × 10⁻²                      2.143 × 10⁻⁵
7                1716                    6.120 × 10⁻³                      1.214 × 10⁻⁶
8                6435                    2.403 × 10⁻³                      5.960 × 10⁻⁸
9                24310                   9.367 × 10⁻⁴                      2.581 × 10⁻⁹
10               92378                   3.629 × 10⁻⁴                      1 × 10⁻¹⁰*
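The entries of Table 1 can be reproduced by enumerating the atoms directly; a Python sketch (ours, illustrative):

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import comb, factorial, prod

def atom_probs(n: int):
    """Probability of each unordered resample (atom) from n distinct
    sample values, via the multinomial formula n!/(k_1! ... k_n! n^n)."""
    probs = []
    for resample in combinations_with_replacement(range(n), n):
        counts = Counter(resample).values()
        probs.append(factorial(n) / (prod(factorial(k) for k in counts) * n ** n))
    return probs

probs = atom_probs(6)
assert len(probs) == comb(11, 6) == 462                  # N(6), as in Table 1
assert abs(sum(probs) - 1.0) < 1e-12                     # probabilities sum to one
assert abs(max(probs) - factorial(6) / 6 ** 6) < 1e-12   # most likely: n! n^-n
assert abs(min(probs) - 6 ** -6) < 1e-12                 # least likely: n^-n
```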
Similar problems are often experienced with resamples in which a sample value is repeated n − 1 or n − 2 times. The corresponding value of T*, although well defined, can be very large. Now, the probability that some sample value is repeated precisely n − 1 times in 𝒳* equals

p_1(n) = n(n − 1) · n n^{-n} = (1 − n^{-1}) n^{-(n-3)},   n ≥ 3.

The chance that 𝒳* = {X_i, X_j, X_k, X_k, ..., X_k} for some set of three distinct values is

p_2(n) = ½n(n − 1)(n − 2) · n(n − 1) n^{-n} = ½(1 − n^{-1})²(1 − 2n^{-1}) n^{-(n-5)},   n ≥ 5,

and the chance that 𝒳* = {X_i, X_i, X_k, X_k, ..., X_k} for some X_i ≠ X_k is

p_3(n) = n(n − 1) · ½n(n − 1) n^{-n} = ½(1 − n^{-1})² n^{-(n-4)},   n ≥ 4.

The sum

P(n) = p_0(n) + p_1(n) + p_2(n) + p_3(n)
     = n^{-n+1} − (3/2) n^{-n+2} + (5/2) n^{-n+3} − (3/2) n^{-n+4} + (1/2) n^{-n+5}

equals the probability that some observation in the resample is repeated at least n − 2 times, and is given in Table 2. Only for n ≥ 7 do these values not exceed 5%. Therefore we may expect to experience problems with tied resample values when using the percentile-t method in samples of size n ≤ 6.

The most likely resample 𝒳* is 𝒳, the original sample; the least likely is any one of the n resamples of n identical elements. Table 1 lists values of N(n) and of the two extreme probabilities, for 2 ≤ n ≤ 10. We may deduce from those data that for n = 3 to n = 6, and perhaps also for n = 7 to n = 9, exact enumeration of bootstrap atoms is computationally feasible. Simulation is really the only alternative for n ≥ 10.

Table 2
Probability p(n) that some sample value is repeated at least n − 2 times in the resample

sample size, n   probability of large T*, p(n)
5                0.290
6                0.052
7                6.8 × 10⁻³
8                6.8 × 10⁻⁴
9                5.5 × 10⁻⁵
10               3.7 × 10⁻⁶
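The closed form for P(n) can be checked against brute-force enumeration for the smallest cases in which it applies; a sketch (ours, illustrative):

```python
from collections import Counter
from itertools import product

# Closed form for P(n), the probability that some observation is
# repeated at least n - 2 times in the resample (valid for n >= 5).
def p_closed(n: int) -> float:
    return (n ** (1 - n) - 1.5 * n ** (2 - n) + 2.5 * n ** (3 - n)
            - 1.5 * n ** (4 - n) + 0.5 * n ** (5 - n))

# Brute-force check over all n^n ordered resamples of n distinct values.
def p_brute(n: int) -> float:
    hits = sum(1 for r in product(range(n), repeat=n)
               if max(Counter(r).values()) >= n - 2)
    return hits / n ** n

for n in (5, 6):
    assert abs(p_closed(n) - p_brute(n)) < 1e-12
# p_closed(5) = 0.2896 and p_closed(6) = 0.0522..., matching Table 2.
```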
2.4. Exact computation

Suppose we wish to calculate the exact value of

ψ̂ = E{θ(X̄*) | 𝒳}.
Let 𝒫(n) denote the set of all distinct n-tuples l = (l_1, ..., l_n) having 0 ≤ l_1 ≤ l_2 ≤ ⋯ ≤ l_n and Σ_i l_i = n. Given (l_1, ..., l_n) ∈ 𝒫(n), let 𝒜(l_1, ..., l_n) be the set of all ordered n-tuples (k_1, ..., k_n) which are simply permutations of (l_1, ..., l_n). Define

ℋ(n) = ⋃_{l ∈ 𝒫(n)} 𝒜(l_1, ..., l_n).

Then ℋ(n) contains precisely N(n) elements, and

ψ̂ = Σ_{k ∈ ℋ(n)} n!(k_1! ⋯ k_n! n^n)^{-1} θ(n^{-1} Σ_i k_i X_i),   (2.3)

or equivalently, grouping the permutations within each 𝒜(l),

ψ̂ = n! n^{-n} Σ_{l ∈ 𝒫(n)} (l_1! ⋯ l_n!)^{-1} Σ_{k ∈ 𝒜(l)} θ(n^{-1} Σ_i k_i X_i).   (2.4)

See Table 3 for a clarifying example. These formulae follow from the fact that the resample in which X_i is repeated just k_i times for 1 ≤ i ≤ n arises with probability n!(k_1! ⋯ k_n! n^n)^{-1}, conditional on 𝒳.

Table 3
Example of the sets 𝒫(n) and ℋ(n), and subsequent calculation of ψ̂ from equation (2.4), with n = 4

l_1 l_2 l_3 l_4   members (k_1, k_2, k_3, k_4) of 𝒜(l_1, ..., l_4)
1 1 1 1           1111 (1 permutation)
0 1 1 2           0112, 0121, 0211, 2110, 2011, 2101, 1102, 1120, 1201, ... (12 distinct permutations)
0 0 1 3           0013, ... (12 distinct permutations)
0 0 2 2           0022, ... (6 distinct permutations)
0 0 0 4           0004, 0040, 0400, 4000 (4 distinct permutations)

(In all, 1 + 12 + 12 + 6 + 4 = 35 = N(4) elements of ℋ(4).)

ψ̂ = 4! 4^{-4} [ (1!1!1!1!)^{-1} θ{(X_1 + X_2 + X_3 + X_4)/4}
              + (1!1!2!)^{-1} Σ θ{(X_i + X_j + 2X_k)/4}
              + (1!3!)^{-1} Σ θ{(X_i + 3X_j)/4}
              + (2!2!)^{-1} Σ θ{(2X_i + 2X_j)/4}
              + (4!)^{-1} {θ(X_1) + θ(X_2) + θ(X_3) + θ(X_4)} ]

[In each case, summation is over distinct values of i, j, k.]

Much of the labour in computing ψ̂ comes from evaluating the ratio n!(k_1! ⋯ k_n! n^n)^{-1}. This work is reduced if ψ̂ is computed in the form (2.4) rather than (2.3). Similarly, the bootstrap distribution functions F̂ and Ĝ may be calculated exactly; for example, with I{·} the indicator function,

F̂(x) = Σ_{k ∈ ℋ(n)} n!(k_1! ⋯ k_n! n^n)^{-1} I{θ(k) − θ(X̄) ≤ x},
where θ(k) denotes the value of θ(X̄*) computed for the resample in which X_i appears exactly k_i times for 1 ≤ i ≤ n. The p-th quantile t̂_p of F̂ may be computed by first obtaining all N(n) pairs

(ω, π) = [θ(k), n!(k_1! ⋯ k_n! n^n)^{-1}],

then ordering these pairs with respect to the first element, as (ω_(i), π_(i)) where ω_(1) < ⋯
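The enumeration in (2.3) and the ordering-and-accumulating procedure just described can be sketched end to end. A Python sketch (ours, not from the paper; θ(x̄) = x̄² is chosen purely for illustration):

```python
from collections import Counter
from itertools import combinations_with_replacement, product
from math import factorial, prod
from statistics import fmean

def atoms(xs):
    """All N(n) unordered resamples with multinomial probabilities
    n!/(k_1! ... k_n! n^n); returns a list of (resample, probability)."""
    n = len(xs)
    out = []
    for r in combinations_with_replacement(xs, n):
        pi = factorial(n) / (prod(factorial(k) for k in Counter(r).values()) * n ** n)
        out.append((r, pi))
    return out

def psi_hat(xs, theta):
    """psi-hat = E{theta(Xbar*) | X}, as in equation (2.3)."""
    return sum(pi * theta(fmean(r)) for r, pi in atoms(xs))

def t_hat(xs, theta, p):
    """p-th quantile of F-hat: sort the atom pairs (omega, pi), with
    omega = theta(Xbar*) - theta(Xbar), then accumulate probabilities
    until they first reach p."""
    tb = theta(fmean(xs))
    pairs = sorted((theta(fmean(r)) - tb, pi) for r, pi in atoms(xs))
    acc = 0.0
    for omega, pi in pairs:
        acc += pi
        if acc >= p:
            return omega
    return pairs[-1][0]

xs = [1.0, 2.0, 4.0, 8.0]
theta = lambda m: m * m

# Cross-check (2.3) against brute-force averaging over all n^n ordered resamples.
brute = fmean(theta(fmean(r)) for r in product(xs, repeat=len(xs)))
assert abs(psi_hat(xs, theta) - brute) < 1e-9
```

Grouping the atoms by their sorted count pattern, as in (2.4), would let the multinomial ratio be evaluated once per pattern rather than once per atom.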