Confidence bands in generalized linear models

Jiayang Sun (Case Western Reserve University), Clive Loader (Bell Laboratories), William P. McCormick (University of Georgia)

Abstract

Generalized linear models (GLM) include many useful models. This paper studies simultaneous confidence regions for the mean response function in these models. The coverage probabilities of these regions are related, asymptotically, to tail probabilities of maxima of Gaussian random fields, and hence the so-called "tube formula" is applicable without any modification. However, in generalized linear models the errors are often non-additive and non-Gaussian, and may be discrete. This poses a challenge to the accuracy of the approximation by the tube formula in the moderate sample situation. Here two alternative approaches are considered. These approaches are based on an Edgeworth expansion for the distribution of a maximum likelihood estimator and a version of Skorohod's representation theorem, which are used to convert an error term (which is of order $n^{-1/2}$ in one-sided confidence regions and of order $n^{-1}$ in two-sided confidence regions) from the Edgeworth expansion to a "bias" term. The bias is then estimated and corrected in two ways to adjust the approximation formula. Examples and simulations show that our methods are viable and complementary to existing methods. An application to insect data is provided. Code for implementing our procedures is available via the software parfit.

AMS 1991 subject classifications. Primary 62F25, 62J12; secondary 60G70, 60G15, 62G07, 62E20.

Key words and phrases. Tube formula, Edgeworth expansion, maximum of Gaussian random fields, regression, simultaneous confidence bands.

Abbreviated title: Simultaneous confidence bands.
Address and phone no of the corresponding author:
J. Sun, Department of Statistics, Case Western Reserve University, Cleveland, OH 44108-7054.
[email protected]. 216-368-0630(o), 216-3680252 (fax).
Software or code: Instructions for using our procedure via the software parfit (in
conjunction with locfit, a software package developed by Clive Loader) will be made public after the paper is accepted.
1 Introduction

Generalized linear models (GLM) include many useful models, for example, linear regression models for normal response data, linear logistic models for binary response data and log-linear models for categorical response data. It is important to quantify the shape, the presence of special features, and the uncertainty of the estimates of the mean response curve or regression function. For this purpose, this article studies informative simultaneous confidence regions (SCR hereafter) for the mean response curve under generalized linear models.

The Scheffé method may be used to construct an SCR for a linear parametric regression function over the whole predictor space when errors are i.i.d. normal and additive. In more practical cases, when the domain of interest is a subset of the predictor space, Naiman (1987, 1990) and Sun and Loader (1994) provided SCRs for a regression function over the subset using the "tube method" (see also Knowles and Siegmund, 1989, and Johansen and Johnstone, 1990). These regions are robust for additive models with contaminated normal errors and errors from spherically symmetric distributions (cf. Loader and Sun, 1997). The "tube formula" in principle also works without modification for models with any non-normal additive independent errors when the sample size is large, as the central limit theorem applies. In other cases, the tube formula without any modification (called the naive SCR hereafter) may fail, as shown in Loader and Sun (1997). It is encouraging that the tube formula can be modified and combined with other techniques to extend to cases with independent heteroscedastic or correlated additive errors, after an adjustment to account for the extra variation introduced by estimation of unknown weights in a weighted least squares regression or of the structured covariance in the correlated case (cf. Faraway and Sun, 1995, and Sun, Raz and Faraway, 1998).
However, in the generalized linear model, the errors are often non-additive (unless they are normal) and the response variables are often discrete. This poses a challenge to the tube method used in the previous work when the sample size is moderate. Here two remedies are proposed. They apply the Edgeworth expansion, rather than the normal approximation, in the middle of the approximations for a normalized process, and then use the idea of the Skorohod construction to convert the first order error term from the Edgeworth expansion (in terms of distribution) to a bias term (in terms of the difference between the process and its limit, in a suitable sample space). The converted expansion may be called an inverse Edgeworth expansion. Our corrections then proceed in two ways in "correcting" the bias term from the inverse Edgeworth expansion.

More details are as follows. In Section 2, the GLM and some basic SCRs are described. In Section 3, an inverse Edgeworth expansion, called the basic inverse Edgeworth expansion in this paper, is presented. The inverse Edgeworth expansion is of interest in itself and can be considered as a generalization of the Cornish-Fisher expansion for a partial sum of i.i.d. random variables. In Section 4, a first-level approximation to the coverage probabilities of the basic SCRs is given, several inverse Edgeworth expansions generalizing the basic one are developed, and corrections (in two ways) of the basic SCRs are then derived. In the case of one-sided SCRs, the error term in the Edgeworth expansion is at the rate of $n^{-1/2}$ and depends on the bias, skewness and variance of a normalized process; in the case of two-sided SCRs, the error term is of order $n^{-1}$ and depends on the kurtosis and a secondary effect of the skewness of the process. Thus, the corrections for one-sided and two-sided SCRs are different. Our first way of correction for one-sided and two-sided SCRs is to apply the tube formula to some modified processes $W_n^{\sharp}$ and $W_n^{(1)}$, $W_n^{(2)}$ (see §4). Our second way of correction uses the method of bias correction in Sun and Loader (1994), which is equivalent to shrinking and expanding the width of the basic SCRs, based on a bound of the bias term. Examples and simulations are shown in Section 5. Concluding remarks and comparisons with alternative methods (e.g. Knafl, Sacks and Ylvisaker, 1985, Hall and Titterington, 1988, and Härdle and Marron, 1991) are summarized in Section 6. Proofs and more examples can be found in the Appendix.

Those who wish to use our SCRs right away may read Table 3 in §5 directly, though reading the description of the GLM in §2 and the concluding remarks in §6 is always helpful. Those who wish to check the meaning of a notation may consult the one-page index attached at the end of the paper. The index is a summary of notations arranged roughly in the order of their appearance in the paper.
2 Model and Basic Simultaneous Confidence Regions

Model. Consider the usual generalized linear model (GLM) specified by a link function and a random structure. Here the (anti) link function $g$ links $n$ independent observable pairs of predictor $x_i$ and response variable $Y_i$ by

$$E(Y_i \mid x_i) = g(\eta(x_i)) < \infty, \quad \text{for } i = 1, \dots, n,$$

where $E(Y \mid x)$ is the conditional expectation of a response $Y$ given a predictor $x$, $\eta(x) = z(x)^T \beta$ is a linear predictor, $\beta$ is a $q$-dimensional unknown parameter and $z$ is a $q$-dimensional known covariate. The random structure assumes that the density $f$ of $Y_i$ belongs to an exponential family:

$$f(y; \theta_i) = \exp\{ y\theta_i - b(\theta_i) + a(y) \} \tag{1}$$

with respect to some $\sigma$-finite measure, for some known functions $b$ and $a$ and a natural parameter $\theta_i = \theta(x_i)$. The natural parameter $\theta$ is a function of the linear predictor $\eta(x_i)$.

Our problem of interest is to find, under the GLM, an informative simultaneous confidence region for the mean response function $E(Y \mid x)$ as a function of $x$ over a domain of interest $\mathcal{X} \subset \mathbb{R}^d$ for some $d > 0$. In this paper, for simplicity, we only study the case $d = 1$. Throughout this paper, a subscripted function denotes the function evaluated at $x_i$, e.g. $z_i = z(x_i)$ and $\theta_i = \theta(x_i)$; and $b'$ denotes the first derivative of $b$, $b''$ the second derivative, $b^{(3)}$ the third derivative, etc.

Basic Simultaneous Confidence Regions. In most cases, the link function $g$ is chosen to be the canonical link. The canonical link requires that the natural parameter $\theta = \theta(x) = \eta(x)$. Thus $g(\eta) = g(\theta) = E(Y \mid \theta) = b'(\theta)$ is a known monotone function of $\theta$, and hence finding an SCR for the mean response function $E(Y \mid x)$ is equivalent to finding an SCR for the linear predictor $\eta(x)$, for $x \in \mathcal{X}$. A natural basic two-sided SCR for $\eta(x)$ is one centered around its estimator $\hat\eta(x) = z(x)^T \hat\beta$:
$$l(x) \equiv l_c(x) = \big(\hat\eta(x) - c\,\hat\sigma(x),\; \hat\eta(x) + c\,\hat\sigma(x)\big), \quad \forall x \in \mathcal{X}, \tag{2}$$

for some $c > 0$, where $\hat\beta = \hat\beta_n$ is the maximum likelihood estimator of $\beta$, and $\hat\sigma^2(x)$ is the estimated asymptotic variance of $\hat\eta(x)$. See (6) and (7) below. Here $c$ is such that

$$1 - \alpha = \mathrm{pr}\{\eta(x) \in l_c(x),\; \forall x \in \mathcal{X}\} \tag{3}$$

for a prescribed level $0 < 1 - \alpha < 1$. Once $c$ is found, we have an SCR $l_c(x)$ for $\eta(x)$ and hence an SCR $g(l_c(x))$ for the mean response function $E(Y \mid x)$ for $x \in \mathcal{X}$. Note that for an approximate pointwise confidence interval at a fixed $x$, we can simply take $c = \Phi^{-1}(1 - \alpha/2)$, the upper $\alpha/2$ quantile of the standard normal distribution.

Sometimes, one-sided confidence bounds are of interest. For example, in quality control applications, one might want upper bounds on the failure rates of components under various operating conditions. Our one-sided basic upper and lower SCRs for $\eta(x)$ are of the form

$$l_u(x) = \big(-\infty,\; \hat\eta(x) + c_u\hat\sigma(x)\big), \quad \forall x \in \mathcal{X}; \qquad l_l(x) = \big(\hat\eta(x) - c_l\hat\sigma(x),\; \infty\big), \quad \forall x \in \mathcal{X}, \tag{4}$$

respectively, where $c_u$ and $c_l$ are such that

$$\mathrm{pr}\{\eta(x) \in l_u(x),\; \forall x \in \mathcal{X}\} = 1 - \alpha = \mathrm{pr}\{\eta(x) \in l_l(x),\; \forall x \in \mathcal{X}\}. \tag{5}$$
We shall see in Section 4 that $c_l = c_u$, using a first-level approximation to (5). Clearly, the estimator $\hat\beta$ is a solution of the MLE equation

$$\sum_{i=1}^{n} z_i \,[Y_i - b'(\theta_i)] = 0. \tag{6}$$

The asymptotic variance $\sigma^2(x)$ of $\hat\eta(x)$ and its estimator are

$$\sigma^2(x) = z(x)^T I_n^{-1}(\beta)\, z(x), \qquad \hat\sigma^2(x) = z(x)^T I_n^{-1}(\hat\beta)\, z(x), \tag{7}$$

where $I_n(\beta)$ is the $q \times q$ Fisher information matrix about $\beta$ in the sample $(x_i, Y_i)$, $i = 1, \dots, n$. This Fisher information can be expressed as

$$I_n(\beta) = \sum_{i=1}^{n} b''(\theta_i)\, z_i z_i^T = X^T V X, \tag{8}$$

where $z_i = z(x_i)$, $X$ with $X^T = (z_1, \dots, z_n)$ is the $n \times q$ design matrix and

$$V = \mathrm{diag}\big(\mathrm{var}(Y_1), \dots, \mathrm{var}(Y_n)\big) = \mathrm{diag}\big(b''(\theta_1), \dots, b''(\theta_n)\big)$$

is the variance matrix of the responses.

Of course, $\hat\eta$ may be a biased estimator of $\eta$. We will not merely re-center or shift the region in (2) based on an estimator of the bias, as this may increase the variance of the adjusted estimator and is sometimes worse than no correction (cf. Loader, 1993, and Sun and Loader, 1994). Instead, to achieve a prescribed level, we shall use the two methods of bias correction discussed at the end of Section 1.

Discussion. The following development and corrections of basic SCRs are for the case that the link is canonical. However, it is apparent that our method also applies to the case of a known non-canonical link. A discussion for the case of an unknown link function is given in Section 6.
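The quantities in (2) and (6)-(8) can be sketched numerically. The following is only an illustration under stated assumptions: a Poisson model with the canonical log link, a two-dimensional covariate $z(x) = (1, x)^T$, and Newton-Raphson iterations for the MLE equation (6). The function names (`fisher_information`, `fit_poisson_loglinear`, `basic_band`) are hypothetical and are not part of parfit or locfit; the critical value `c` is taken as given.

```python
import math

def fisher_information(x, beta):
    """Entries of I_n(beta) = sum_i b''(theta_i) z_i z_i^T for z_i = (1, x_i).
    For the Poisson family b(theta) = exp(theta), so b'' = exp(theta)."""
    b0, b1 = beta
    w = [math.exp(b0 + b1 * xi) for xi in x]
    i00 = sum(w)
    i01 = sum(wi * xi for wi, xi in zip(w, x))
    i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    return i00, i01, i11

def fit_poisson_loglinear(x, y, iters=100, tol=1e-12):
    """Newton-Raphson solution of the score equation (6) (assumed setup)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]          # b'(theta_i)
        s0 = sum(yi - mi for yi, mi in zip(y, mu))          # score, first component
        s1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        i00, i01, i11 = fisher_information(x, (b0, b1))
        det = i00 * i11 - i01 * i01
        d0 = (i11 * s0 - i01 * s1) / det                    # I_n^{-1} * score
        d1 = (i00 * s1 - i01 * s0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

def basic_band(x0, x, beta, c):
    """Basic two-sided region (2) for eta(x0): eta_hat -/+ c * sigma_hat, using (7)."""
    b0, b1 = beta
    i00, i01, i11 = fisher_information(x, beta)
    det = i00 * i11 - i01 * i01
    sigma = math.sqrt((i11 - 2.0 * i01 * x0 + i00 * x0 * x0) / det)
    eta = b0 + b1 * x0
    return eta - c * sigma, eta + c * sigma
```

Applying `math.exp` to both endpoints of `basic_band` gives the corresponding region $g(l_c(x))$ for the mean response under the log link.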
3 Inverse Edgeworth Expansion

In this section, inverse Edgeworth expansions are developed using the Skorohod construction. Examples of the relevant coefficients in the inverse Edgeworth expansions, such as the bias, skewness and lattice term, are presented in the Appendix.

For two sequences of distributions, $F_n$ and $G_n$, it will be convenient to have a notation to denote equality of distributions up to order $o(n^{-1/2})$, or $o(n^{-1})$. We write $F_n \stackrel{d}{=} G_n$ to indicate that $F_n(x) - G_n(x) = o(n^{-1/2})$, or $o(n^{-1})$, uniformly in $x$. Whether it is $o(n^{-1/2})$ or $o(n^{-1})$ should be clear from the last term in $F_n$ or $G_n$. As is customary, we also write $X_n \stackrel{d}{=} Y_n$ to indicate that the corresponding sequences of distributions stand in the relation of equality up to order $o(n^{-1/2})$ or $o(n^{-1})$.

The heuristics for developing the inverse Edgeworth expansions are as follows. When $\beta$ is one dimensional, for a normalizing constant $a_n$, following Skovgaard (1981a, b) we have an Edgeworth expansion for the random sequence $a_n(\hat\beta_n - \beta)$ given by

$$\mathrm{pr}\{a_n(\hat\beta_n - \beta) \le t\} = \Phi(t) + R_n(t) \equiv F_n(t),$$

where $R_n(t) = O(n^{-1/2})$ depends on the bias and skewness of $a_n(\hat\beta_n - \beta)$ and $\Phi(t)$ is the standard normal cumulative distribution function. [When $\beta$ is $q > 1$ dimensional, a similar expansion exists for $\langle v, a_n(\hat\beta_n - \beta) \rangle$ for any $v \in \mathbb{R}^q$.] Thus, for some $Q_n(t) = O(n^{-1/2})$,

$$a_n(\hat\beta - \beta) \stackrel{d}{=} F_n^{-1}(U) = \Phi^{-1}(U) + Q_n(U) \stackrel{d}{=} Z + Q_n(\Phi(Z)), \tag{9}$$

where $U$ is a uniform random variable on $(0, 1)$ and $Z$ is a standard normal random variable. In this way, the $O(n^{-1/2})$ error term $R_n$ in the convergence of the distribution is passed on to an $O_p(n^{-1/2})$ "bias" term $Q_n(\Phi(Z))$ between the random sequence and its limit, in a suitable sample space. This is similar to the Skorohod construction. A detailed expression of $Q_n$ is given in (10) of the following Proposition.
Proposition 3.1 [Inverse Edgeworth expansion.] Let $\underline{\lambda}_n \le \overline{\lambda}_n$ denote the smallest and largest eigenvalues of $A_n$, where $A_n(\beta) \equiv I_n(\beta)/n$ is a re-scaled Fisher information; and let $B_n = B_n(\beta)$ be the upper Cholesky triangle in the decomposition $B_n^T B_n = A_n$. Suppose that $E|Y_i|^3 < \infty$ for all $i$ and $0 < \liminf \underline{\lambda}_n \le \limsup \overline{\lambda}_n < \infty$. Then for any $v \in S^{q-1}$ (the $q$-dimensional unit sphere) and a random variable $Z \sim N(0, 1)$, we have the following inverse Edgeworth expansion, called the basic inverse Edgeworth expansion in this paper:

$$\big\langle v, \sqrt{n}\, B_n(\hat\beta_n - \beta) \big\rangle \stackrel{d}{=} Z + \tfrac{1}{6}\gamma_n(v)\,[Z^2 - 1] + \langle v, \mu_n \rangle + T_n(v) \tag{10}$$

as $n \to \infty$. Here $\langle v, \mu_n \rangle$, $\gamma_n(v)$ and $T_n(v)$ are the bias, skewness and lattice term of $\langle v, \sqrt{n}\, B_n(\hat\beta_n - \beta) \rangle$, given in (13), (14) and (15) below.
The key formula for deriving the bias, skewness and lattice term in (10) is the following asymptotic expression of $B_n\sqrt{n}(\hat\beta_n - \beta)$:

$$B_n\sqrt{n}(\hat\beta_n - \beta) = \xi - \frac{1}{2n^{3/2}} \sum_{i=1}^{n} b^{(3)}(\theta_i)\, \langle u_i, \xi \rangle^2\, u_i + o_p\!\left(\frac{1}{n^{1/2}}\right), \tag{11}$$

which comes from a second order expansion of the MLE equation (6). Here $b^{(3)}$ is the third derivative of $b$, $u_i = (B_n^T)^{-1} z_i$ is a normalized covariate of $z_i$, and

$$\xi = \xi(n) = \frac{1}{n^{1/2}} \sum_{i=1}^{n} u_i\,[Y_i - b'(\theta_i)] \tag{12}$$

is the normalized random variable such that $E(\xi) = 0$ and $E(\xi\xi^T) = I_q$, the $q$-dimensional identity matrix. The moment relationship about $\xi$ here follows easily from (8). See the Appendix for its detailed derivation and the proof of Proposition 3.1.

Based on the asymptotic expansion (11), one can simply compute the first and third moments of $B_n\sqrt{n}(\hat\beta_n - \beta)$ to get the bias $\langle v, \mu_n \rangle$ and skewness $\gamma_n(v)$:

$$\mu_n = E\big[B_n\sqrt{n}(\hat\beta_n - \beta)\big] = -\frac{1}{2n^{3/2}} \sum_{i=1}^{n} b^{(3)}(\theta_i)\, \|u_i\|^2\, u_i, \tag{13}$$

$$\gamma_n(v) = E\big\langle v,\; B_n\sqrt{n}(\hat\beta_n - \beta) - \mu_n \big\rangle^3 = -\frac{2}{n^{3/2}} \sum_{i=1}^{n} b^{(3)}(\theta_i)\, \langle u_i, v \rangle^3, \tag{14}$$

up to $o(n^{-1/2})$, for any $v \in S^{q-1}$, cf. (A.3). The bias and skewness terms are of $O(n^{-1/2})$. If the distribution of $\langle v, \xi \rangle$ is lattice, clearly there is an additional $n^{-1/2}$ order (lattice) term $T_n(v)$ in (10):

$$T_n(v) = \begin{cases} \dfrac{h(v)}{n^{1/2}}\, H\!\left(\dfrac{n^{1/2}}{h(v)}\, Z\right), & \text{if } \langle \xi, v \rangle \text{ has a lattice distribution,} \\[2ex] 0, & \text{if } \langle \xi, v \rangle \text{ does not have a lattice distribution,} \end{cases} \tag{15}$$

where $h(v)$ is the span of $\langle v, \xi \rangle$, and $H(x)$ is a periodic function with period 1 such that $H(x) = x - 1/2$ for $0 \le x < 1$.

It is easy to check when the lattice term is zero. A necessary condition for $\langle v, \xi \rangle$ to have a lattice distribution is that at least one component of $\xi^T = (\xi_1, \dots, \xi_q)$ has a lattice distribution. In other words, there are constants $a_i$ and $h_i > 0$ for some $i$ such that

$$\mathrm{pr}(\xi_i = a_i + h_i k, \text{ for some } k = \dots, -2, -1, 0, 1, 2, \dots) = 1.$$

The largest real number $h_i$ such that the above statement holds is called the span. Based on this necessary condition, it is rare to need a nonzero lattice term even if the response variable is discrete. See the examples in the Appendix.

In contrast to the one-sided case, in the case of a two-sided SCR the main effect of skewness is of $O(n^{-1})$ instead of $O(n^{-1/2})$, because the $O(n^{-1/2})$ terms cancel at the two boundaries. Indeed, as in Hall (1992, p. 49),

$$\mathrm{pr}\{|a_n(\hat\beta_n - \beta)| \le t\} = 2\Phi(t) - 1 + R'_n(t) \equiv F_n(t),$$

where $R'_n(t) = O(n^{-1})$ depends on the kurtosis and a secondary effect of the skewness and variance of $a_n(\hat\beta_n - \beta)$. Then, similar to the idea for obtaining (9), one obtains

$$|a_n(\hat\beta_n - \beta)| \stackrel{d}{=} |Z| + Q'_n(\Phi(Z)). \tag{16}$$

A detailed expression of $Q'_n$, in the context of the two-sided SCR, will be given later; see (28) in §4.
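To make the bias and skewness formulas (13) and (14) concrete, here is a minimal sketch for the scalar case $q = 1$ with $v = 1$, where $B_n = \sqrt{A_n}$ and $u_i = z_i / B_n$. The helper names and the choice of passing $b''$ and $b^{(3)}$ as callables are our own assumptions for illustration; the lattice term is ignored.

```python
import math

def bias_and_skewness_q1(z, theta, b2, b3):
    """Bias (13) and skewness (14) for scalar beta (q = 1, v = 1).

    z     : covariate values z_i
    theta : natural parameters theta_i
    b2,b3 : callables returning b''(theta) and b'''(theta)
    """
    n = len(z)
    a_n = sum(b2(t) * zi * zi for zi, t in zip(z, theta)) / n   # A_n = I_n / n
    b_n = math.sqrt(a_n)                                         # B_n^T B_n = A_n
    u = [zi / b_n for zi in z]                                   # u_i = z_i / B_n
    s = sum(b3(t) * ui ** 3 for ui, t in zip(u, theta))
    bias = -s / (2.0 * n ** 1.5)
    skew = -2.0 * s / n ** 1.5
    return bias, skew

def inverse_edgeworth(zval, bias, skew):
    """Right-hand side of (10) without the lattice term, at Z = zval."""
    return zval + skew / 6.0 * (zval * zval - 1.0) + bias

# Example (assumed setting): Poisson responses with common mean 4, z_i = 1, n = 25.
n, lam = 25, 4.0
bias, skew = bias_and_skewness_q1([1.0] * n, [math.log(lam)] * n, math.exp, math.exp)
```

In this example the closed forms are $-1/(2\sqrt{n\lambda})$ for the bias and $-2/\sqrt{n\lambda}$ for the skewness, which agree with a direct delta-method calculation for $\sqrt{n\lambda}\,(\log \bar{Y} - \log \lambda)$.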
4 Corrections and Approximations

In this section, a first-level approximation is developed for the coverage probabilities of the basic SCRs in (2) and (4). The inverse Edgeworth expansion in Proposition 3.1 is then generalized to derive corrected SCRs, in two different ways.

Approximations. Let $W_n(x)$ denote the normalized process

$$W_n(x) := \frac{\hat\eta(x) - \eta(x)}{\hat\sigma(x)}. \tag{17}$$

Recall from (3) and (5) that our goal is to choose $c > 0$, $c_u > 0$ and $c_l > 0$ such that $\alpha = P(c)$ and $P^u(c_u) = \alpha = P^l(c_l)$, where

$$P(c) = \mathrm{pr}\Big\{\sup_{x \in \mathcal{X}} |W_n(x)| > c\Big\}, \quad P^u(c_u) = \mathrm{pr}\Big\{\inf_{x \in \mathcal{X}} W_n(x) < -c_u\Big\}, \quad P^l(c_l) = \mathrm{pr}\Big\{\sup_{x \in \mathcal{X}} W_n(x) > c_l\Big\}. \tag{18}$$

It is straightforward to show that under some regularity conditions (e.g. the exponential family specified by (1) is minimal and steep in the sense of Brown, 1986, and the natural parameter space contains at least one interior point), the family $\{W_n(x)\}$ is tight and, as $n \to \infty$,

$$W_n(x) \stackrel{d}{\longrightarrow} W(x). \tag{19}$$

Here $W(x)$ is a standard Gaussian random field with mean zero and covariance function $\rho(x, x') = \langle s(x'), s(x) \rangle$, where $s(x) = \lim_{n \to \infty} s_n(x)$ and

$$s_n(x) = \frac{(B_n^T)^{-1} z(x)}{\sqrt{n}\,\sigma(x)} = \frac{(B_n^T)^{-1} z(x)}{\sqrt{z^T A_n^{-1} z}}. \tag{20}$$

In the typical case that $s(x)$ does not self-overlap (in most cases it does not, cf. Sun and Loader, 1994), it is straightforward to apply the tube formula in Sun (1993) and a boundary correction (Sun and Loader, 1994) to get

$$P^l(c) \approx \mathrm{pr}\Big(\sup_{x \in \mathcal{X}} W(x) > c\Big) \approx \frac{\kappa_0}{2\pi}\, e^{-c^2/2} + \chi\,(1 - \Phi(c)), \tag{21}$$

where

$$\kappa_0 = \int_{x \in \mathcal{X}} \|s'(x)\|\, dx$$

is the length of $\{s(x) : x \in \mathcal{X}\}$, $s'$ is the first derivative of $s$, and $\chi$ is the Euler-Poincaré characteristic of $\{s(x) : x \in \mathcal{X}\}$ such that i) $\chi = 1$ if $\mathcal{X} = [a, b]$ and $s(a) \ne s(b)$, and ii) $\chi = 0$ when $s(a) = s(b)$. Consult Kreyszig (1968) for more details on $\chi$. Similarly, the non-coverage probability of the two-sided SCR is

$$P(c) \approx \frac{\kappa_0}{\pi}\, e^{-c^2/2} + 2\chi\,(1 - \Phi(c)).$$

By the symmetry of $W(x)$, it is easy to see that $c_l = c_u$. So, if $c$ and $c_u = c_l$ are the solutions of

$$\alpha = \frac{\kappa_0}{\pi}\, e^{-c^2/2} + 2\chi\,(1 - \Phi(c)), \tag{22}$$

$$\alpha = \frac{\kappa_0}{2\pi}\, e^{-c_l^2/2} + \chi\,(1 - \Phi(c_l)), \tag{23}$$

respectively, the basic SCRs $l_c(x)$ in (2), and $l_u(x)$ and $l_l(x)$ in (4), are $1 - \alpha$ approximate two-sided, one-sided upper and one-sided lower SCRs of $\eta(x)$ for all $x \in \mathcal{X}$.

Remark 1. Note that a basic SCR for $\eta$, or an SCR $g(l_c)$ for $E(Y \mid x)$, based on $c$ or $c_u$ from the tube formulas (22) and (23), is different from the naive SCR unless the responses in the GLM are normally distributed. Using the naive SCR in discrete response cases could fail badly, as discussed in §1. Our simulation results in the next section show that these basic SCRs based on the first-level approximation in (22) and (23) are good if the sample size is large, for many discrete models including logistic (for $n = 200$) and Poisson models (for $n = 50$). When $n$ is small and the data are discrete, it is desirable to have a better approximation or correction which takes into account the bias from the Gaussian approximation in (19).
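Solving the tube formulas (22) and (23) for the critical value is a one-dimensional root-finding problem, since the right-hand sides decrease in $c$ for $c \ge 0$. The sketch below uses plain bisection and assumes the constants written as $\kappa_0$ (curve length) and $\chi$ (Euler-Poincaré characteristic) are already available; the function names are hypothetical and this is not the parfit implementation.

```python
import math
from statistics import NormalDist

def normal_cdf(x):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def tube_two_sided(c, kappa0, chi=1.0):
    """Right-hand side of (22)."""
    return kappa0 / math.pi * math.exp(-c * c / 2.0) + 2.0 * chi * (1.0 - normal_cdf(c))

def tube_one_sided(c, kappa0, chi=1.0):
    """Right-hand side of (23)."""
    return kappa0 / (2.0 * math.pi) * math.exp(-c * c / 2.0) + chi * (1.0 - normal_cdf(c))

def solve_critical_value(tail, alpha, lo=0.0, hi=10.0, tol=1e-10):
    """Bisection for tail(c) = alpha, assuming tail is decreasing on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tail(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# With kappa0 = 0 the two-sided formula reduces to the pointwise normal quantile.
c_pointwise = solve_critical_value(lambda c: tube_two_sided(c, 0.0), 0.05)
# A positive curve length (here an arbitrary illustrative value) widens the band.
c_band = solve_critical_value(lambda c: tube_two_sided(c, 3.0), 0.05)
```

The same solver can be reused for the shifted formulas of the second correction by passing a tail function evaluated at $c - r$ or $c + r$.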
Generalization of Inverse Edgeworth Expansion. Let $W_n^*(x)$ denote another normalized process (compare it with $W_n(x)$ in (17)):

$$W_n^*(x) := \frac{\hat\eta(x) - \eta(x)}{\sigma(x)}.$$

Then $W_n^*(x) = \big\langle s_n(x), B_n\sqrt{n}(\hat\beta - \beta) \big\rangle$, since $\hat\eta(x) = z(x)^T\hat\beta$ and $\eta(x) = z(x)^T\beta$. It follows from (10) and steps similar to those establishing (19) that an inverse Edgeworth expansion for $W_n^*(x)$ is

$$W_n^*(x) \stackrel{d}{=} W(x) + \tfrac{1}{6}\gamma_n(s_n(x))\,[W^2(x) - 1] + \langle s_n(x), \mu_n \rangle + T_n(s_n(x)), \tag{24}$$

under the same regularity conditions as in Proposition 3.1. However, the inverse Edgeworth expansion for $W_n(x)$, the process that we are interested in, differs from (24), since $W_n(x) = W_n^*(x)\,\sigma(x)/\hat\sigma(x)$ and $\sigma(x)/\hat\sigma(x) = 1 + O_p(n^{-1/2})$. Indeed, multiplying $W_n^*$ in (11) by an expansion of $\sigma(x)/\hat\sigma(x)$ leads to

$$W_n(x) = \langle s_n(x), \xi \rangle + \frac{1}{2n^{3/2}} \sum_{i=1}^{n} b^{(3)}(\theta_i) \Big[ \langle s_n(x), u_i \rangle^2 \langle u_i, \xi \rangle \langle s_n(x), \xi \rangle - \langle u_i, s_n(x) \rangle \langle u_i, \xi \rangle^2 \Big] + o_p\!\left(\frac{1}{n^{1/2}}\right). \tag{25}$$

It then follows by straightforward calculations that the mean, covariance and skewness of $W_n(x)$, up to the order of $o(n^{-1/2})$, are

$$\mu'_n(x) := E(W_n(x)) = \frac{1}{2n^{3/2}} \sum_{i=1}^{n} b^{(3)}(\theta_i) \Big[ \langle s(x), u_i \rangle^3 - \langle s(x), u_i \rangle \|u_i\|^2 \Big],$$

$$\mathrm{cov}(W_n(x), W_n(x')) = \langle s(x), s(x') \rangle,$$

$$\gamma'_n(x) := \mathrm{skew}(W_n(x)) = \frac{1}{n^{3/2}} \sum_{i=1}^{n} b^{(3)}(\theta_i)\, \langle u_i, s(x) \rangle^3. \tag{26}$$

See the Appendix for a more detailed derivation of (25) and (26). Consequently, a generalized inverse Edgeworth expansion of $W_n(x)$ is

$$W_n(x) \stackrel{d}{=} W(x) + \tfrac{1}{6}\gamma'_n(x)\,[W^2(x) - 1] + \mu'_n(x) + T_n(s(x)). \tag{27}$$

This expansion will be used to build modified one-sided SCRs in Corrections 1 and 1' below.

Remark 2. Clearly, $W_n$ has a mean and a skewness much closer to zero than those of $W_n^*$; and the skewnesses of $W_n$ and $W_n^*$ have different signs. Thus, an application of (24) for correcting $c_u$ in $l_u$ would result in over-correction.
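As a quick sanity check of the mean and skewness formulas and of the sign change noted in Remark 2, consider the simplest case. This worked example is our own illustration under assumed conditions, not one of the paper's examples: $q = 1$, $z(x) \equiv 1$, and Poisson responses with common mean $\lambda$, so that $b(\theta) = e^{\theta}$ and $b'' = b^{(3)} = \lambda$. We write $\mu_n$, $\gamma_n$ for the bias and skewness in (13) and (14), and $\mu'_n$, $\gamma'_n$ for the mean and skewness of $W_n$ in (26).

$$A_n = \lambda, \qquad B_n = \sqrt{\lambda}, \qquad u_i = \lambda^{-1/2}, \qquad s(x) = 1,$$

$$\mu_n = -\frac{1}{2n^{3/2}} \cdot n\,\lambda \cdot \lambda^{-3/2} = -\frac{1}{2\sqrt{n\lambda}}, \qquad \gamma_n(1) = -\frac{2}{n^{3/2}} \cdot n\,\lambda \cdot \lambda^{-3/2} = -\frac{2}{\sqrt{n\lambda}},$$

$$\mu'_n(x) = \frac{1}{2n^{3/2}} \cdot n\,\lambda \left[ \lambda^{-3/2} - \lambda^{-1/2}\lambda^{-1} \right] = 0, \qquad \gamma'_n(x) = \frac{1}{n^{3/2}} \cdot n\,\lambda \cdot \lambda^{-3/2} = \frac{1}{\sqrt{n\lambda}}.$$

In this setting the studentized process has no first-order bias and a skewness of half the magnitude and opposite sign, which is consistent with a direct delta-method calculation for $\sqrt{n\bar{Y}}\,\log(\bar{Y}/\lambda)$.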
Tracing the same arguments as for (27) (details are omitted), we have a generalized two-sided inverse Edgeworth expansion relating to (16):

$$|W_n(x)| \stackrel{d}{=} |W(x)| - p_2(x, W_n(x)), \tag{28}$$

where the meaning of $p_2$ is similar to that in Hall (1992, pp. 48-50), so that

$$p_2(x, z) = -z \Big\{ \tfrac{1}{2}\big[k_2(x) - 1 + k_1^2(x)\big] + \tfrac{1}{24}\big[k_4(x) + 4k_1(x)k_3(x)\big](z^2 - 3) \tag{29}$$

$$\qquad\qquad + \tfrac{1}{72}\,k_3^2(x)\,(z^4 - 10z^2 + 15) \Big\} = O\!\left(\frac{1}{n}\right). \tag{30}$$
Here i 's are the moments of Wn(x): n X (x) = EWn(x) = 2n1 = bi < s(x); ui > [< s(x); ui > ? < ui; ui >]; (x) = E [Wn(x) ? EWn(x)] = 1 + 12 C ? 21 C ? 3C + 21 C + C ? 12 C + 47 C ; X (x) = E [Wn(x) ? EWn(x)] = n1= bi < s(x); ui > ; i (x) = E [Wn(x) ? EWn(x)] ? 3 (x) = ?9C + 3C + 6C + 3C (31) (3)
1
3 2
2
1
2
2
1
3
4
(3)
3
3
2
6
7
8
3
3 2
4
4
2 2
3
6
8
9
with ui = (BnT )? zi and XX bi bj < s(x); ui >< s(x); uj >< ui; uj > = C ; C = n1 i j X C = n1 bi < s(x); ui > < ui; ui >; i XX 1 bi bj < s(x); ui > < s(x); uj > < ui; uj >; C = n i j XX C = n1 bi bj < s(x); ui > < ui; uj >< uj ; uj >; i j X C = n1 bi < s(x); ui > ; i XX 1 bi bj < s(x); ui >< s(x); uj > < ui; ui >; C = n i j XX C = n1 bi bj < s(x); ui > < s(x); uj > = [ (x)] ; i j X C = n1 [bi ] < s(x); ui > : i 1
(3) (3)
1
3
2
2
3
3
4
3
6
2
7
3
8
3
9
2
2
(4)
5
2
(3) (3)
2
(3) (3)
2
(4)
2
4
(3) (3)
3
(3) (3)
3
(2) 2
3
3
2
4
(32)
From these expressions, it is clear that the computation involved in the $n^{-1}$ correction term increases drastically from that of the $n^{-1/2}$ term. Nevertheless, they can be used in two ways, as in the one-sided case, to modify the basic SCR $l_c$; see Corrections 2 and 2'.

Corrections. There are two ways of modifying the basic SCRs, based on (27) for one-sided bands and on (28) for two-sided bands. We shall ignore the lattice term.
Correction 1: one-sided bands. Define a new process

$$W_n^{\sharp}(x) = W_n(x) - \mu'_n(x) - \frac{\gamma'_n(x)}{6} \Big\{ [W_n(x) - \mu'_n(x)]^2 - 1 \Big\}, \tag{33}$$

where $W_n(x) = (\hat\eta(x) - \eta(x))/\hat\sigma(x)$. It follows easily that $W_n^{\sharp}(x) \stackrel{d}{\to} W(x)$, by (19) and "$\mu'_n = O(n^{-1/2}) = \gamma'_n$". Thus, we can define corrected upper and lower SCRs based on $W_n^{\sharp}$:

$$l_{u,c}(x) = \{\eta(x) : W_n^{\sharp}(x) \ge -c_u\}, \quad \forall x \in \mathcal{X}, \tag{34}$$

$$l_{l,c}(x) = \{\eta(x) : W_n^{\sharp}(x) \le c_l\}, \quad \forall x \in \mathcal{X}, \tag{35}$$

where $c_u = c_l$ is a solution of (23).

Correction 1': one-sided bands. Our second way of correcting a basic SCR is to bound the bias term in the inverse Edgeworth expansion (27), as in the following proposition.

Proposition 4.1 Define $R = \sup_{x \in \mathcal{X}} R(x)$, where

$$R(x) := \tfrac{1}{6}\gamma'_n(x)\,[W^2(x) - 1] + T_n(s(x)) + \mu'_n(x).$$

Suppose that there are positive constants $r = r_n$ such that $\mathrm{pr}(R > r) = o(\alpha)$ as $n \to \infty$ and $\alpha \to 0$. Then if $\mathcal{X}$ is one dimensional, a conservative approximation and a liberal approximation to (4) are

$$\frac{\kappa_0}{2\pi}\, e^{-(c_u - r)^2/2} + \chi\,(1 - \Phi(c_u - r)), \tag{36}$$

$$\frac{\kappa_0}{2\pi}\, e^{-(c_u + r)^2/2} + \chi\,(1 - \Phi(c_u + r)), \tag{37}$$

respectively, where $\chi$ is the same Euler-Poincaré characteristic as that in (23). Thus, our second corrected upper and lower one-sided SCRs are $l_u$ and $l_l$ in (4) with $c_u = c_l$ being the solutions from (36) or (37).

Of course, the solution $c_u$ depends on the constants $\kappa_0$, $\chi$ and $r$, which can be estimated. See (45)-(47) below. The resulting SCR $l_u$ from (36) may have a slightly conservative coverage probability while the one from (37) may have a slightly liberal coverage probability, but both coverage probabilities approach $1 - \alpha$ as $n \to \infty$ and $\alpha \to 0$. See the next section for suggestions on choosing $c_u$ from either (36) or (37) in practice. The proof is given in the Appendix. The idea of the proof is to apply the inverse Edgeworth expansion (27) to approximate (18) (similarly to the Skorohod representation) and to consider the $n^{-1/2}$ term, e.g. $R$, as a bias term. Then the method of bias correction from Sun and Loader (1994) is generalized to our problem to improve the accuracy of our approximation formula.
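Returning to Correction 1, the region (34) is defined implicitly through the corrected process, so obtaining an explicit upper limit for the linear predictor at a given point requires solving a quadratic. The sketch below is one way to do this under stated assumptions: the mean and skewness terms of (26) are treated as known plug-in numbers `mu` and `gamma` that do not depend on the candidate value of the linear predictor, only the root that reduces to the basic limit as `gamma` tends to zero is kept, and the function name is hypothetical.

```python
import math

def corrected_upper_limit(eta_hat, sigma_hat, c_u, mu, gamma):
    """Upper confidence limit for eta(x) from the region (34).

    With v = (eta_hat - eta) / sigma_hat - mu, the corrected process (33) is
        W_sharp = v - (gamma / 6) * (v**2 - 1),
    and the boundary W_sharp = -c_u is the quadratic
        a v^2 - v - (a + c_u) = 0,  a = gamma / 6.
    """
    a = gamma / 6.0
    if abs(a) < 1e-12:
        v = -c_u                                  # no skewness: basic limit shifted by mu
    else:
        disc = 1.0 + 4.0 * a * (a + c_u)
        if disc < 0.0:
            raise ValueError("no real boundary for these inputs")
        v = (1.0 - math.sqrt(disc)) / (2.0 * a)   # root continuous in a at a = 0
    return eta_hat - sigma_hat * (v + mu)

def w_sharp(eta, eta_hat, sigma_hat, mu, gamma):
    """Corrected process (33) evaluated at a candidate value eta."""
    v = (eta_hat - eta) / sigma_hat - mu
    return v - gamma / 6.0 * (v * v - 1.0)

# Illustrative numbers only (not from the paper).
upper = corrected_upper_limit(eta_hat=1.0, sigma_hat=0.2, c_u=2.0, mu=0.0, gamma=0.3)
```

With `mu = 0` and `gamma = 0` the function returns the basic limit `eta_hat + c_u * sigma_hat` from (4); the lower limit in (35) can be handled in the same way with the sign of the boundary reversed.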
Correction 2: two-sided bands. Define new processes

$$W_n^{(1)}(x) = |W_n(x)| + p_2(x, W_n(x)), \qquad W_n^{(2)}(x) = |W_n^{(0)}(x)| + q_2(x, W_n^{(0)}(x)), \tag{38}$$

where $W_n^{(0)}(x)$ is a normalized process of $W_n(x)$ and $q_2$ is the counterpart of $p_2$ for $W_n^{(0)}(x)$:
q (39) Wn (x) = (Wn(x) ? ^ (x))= ^ (x)) 1 1 s (x)(z ? 3) + 1 [s (x)] (z ? 10z + 15) : (40) q (x; z) = ?z 2 s; (x) + 24 ; 72 ; Here clearly q (x; z) = O(n? ) and s; (x) = C ? 23 C ? C ? C + 21 C + C ? C ; s; (x) = (x); s; (x) = ?3C ? 6C ? 6C + 3C + 3C ? 3C + 3C : (41) (0)
2
1
2
22
2
41
31
2
4
2
1
2
22
3
31
3
8
3
41
5
4
4
5
7
6
6
7
2
8
9
Then, similarly to establishing (28), we have a generalized inverse Edgeworth expansion

$$W_n^{(1)}(x) \stackrel{d}{=} |W(x)| \stackrel{d}{=} W_n^{(2)}(x),$$

up to the order of $o_p(n^{-1})$. Thus, letting $c$ be the constant such that

$$P\Big\{\max_{\mathcal{X}} |W(x)| \le c\Big\} = 1 - \alpha,$$

i.e., letting $c$ be the solution of the approximation in (22), two corrected regions are

$$l_c^{(1)}(x) = \{\eta(x) : W_n^{(1)}(x) \le c\}, \quad \forall x \in \mathcal{X}; \qquad l_c^{(2)}(x) = \{\eta(x) : W_n^{(2)}(x) \le c\}, \quad \forall x \in \mathcal{X}. \tag{42}$$

Here $l^{(1)}$ is simpler than $l^{(2)}$, but $W_n^{(0)}(x)$ in $l^{(2)}$ is more symmetric than $W_n(x)$ in $l^{(1)}$, and $l^{(2)}$ does better than $l^{(1)}$ in practice. See Section 5.

Correction 2': two-sided bands. It is $l_c(x)$ in (2) with $c$ being the solution of (43) or (44) in Proposition 4.2 below.
Proposition 4.2 Define $R'_p = \sup_{x \in \mathcal{X}} p_2(x, W(x)) = O_p(1/n)$. Suppose that there are positive constants $r'_p = r'_p(n)$ such that $\mathrm{pr}(R'_p \le r'_p) = 1 - o(\alpha)$ as $n \to \infty$ and $n^{-k} \to 0$ for some $1 \ge k > 0$. Then if $\mathcal{X}$ is one dimensional, a conservative approximation and a liberal approximation to

$$1 - \alpha = \mathrm{pr}\{\eta(x) \in l_c(x),\; \forall x \in \mathcal{X}\}$$

are

$$\frac{\kappa_0 - \kappa_1}{\pi}\, e^{-(c - r'_p)^2/2} + 2\chi\,(1 - \Phi(c - r'_p)), \tag{43}$$

$$\frac{\kappa_0 - \kappa_1}{\pi}\, e^{-(c + r'_p)^2/2} + 2\chi\,(1 - \Phi(c + r'_p)), \tag{44}$$

respectively, where $\chi$ is the Euler-Poincaré characteristic of $\{s(x) : x \in \mathcal{X}\}$ such that i) $\chi = 1$ if $\mathcal{X} = [a, b]$ and $s(a) \ne s(b)$, and ii) $\chi = 0$ when $s(a) = s(b)$; and $\kappa_1 \approx 0.5 \int_{x \in \mathcal{X}_1} \|s'(x)\|\, dx$, with $\mathcal{X}_1$ being the overlapping area of the tubular neighborhoods of $\{s(x) : x \in \mathcal{X}\}$ and $\{-s(x) : x \in \mathcal{X}\}$. (Consult Kreyszig (1968) for more details on $\chi$.) Then, a corrected two-sided SCR is $l_c(x)$ in (2) with $c$ being the solution from (44).

Remark 3. The corrected SCR 2' from (43) is

$$l_c^{(3)}(x) = \{\eta(x) : |W_n(x)| \le c - r'_p\}, \quad \forall x \in \mathcal{X}.$$

Similarly to (42), the counterpart based on $W_n^{(0)}(x)$ is

$$l_c^{(4)}(x) = \{\eta(x) : |W_n^{(0)}(x)| \le c - r'_q\}, \quad \forall x \in \mathcal{X},$$

where $\mathrm{pr}(R'_q \le r'_q) = 1 - o(\alpha)$ as $n \to \infty$ and $R'_q = \sup_{x \in \mathcal{X}} q_2(x, W(x)) = O_p(1/n)$.
Estimation of constants $\kappa_0$, $\kappa_1$, $r$, $r'_p$, $r'_q$.

1. A simple approximation to $\kappa_0$ when $\mathcal{X} = [a, b]$ can be obtained by partitioning $[a, b]$ into $a = t_0 < \dots < t_m = b$, and computing

$$\hat\kappa_0 = \sum_{i=1}^{m} \int_{t_{i-1}}^{t_i} \left\| \nabla \frac{(\hat{B}_n^T)^{-1} z(x)}{\sqrt{n}\,\hat\sigma(x)} \right\| dx \approx \sum_{i=1}^{m} \left\| \frac{(\hat{B}_n^T)^{-1} z(t_i)}{\sqrt{n}\,\hat\sigma(t_i)} - \frac{(\hat{B}_n^T)^{-1} z(t_{i-1})}{\sqrt{n}\,\hat\sigma(t_{i-1})} \right\|, \tag{45}$$

where $\hat{B}_n = B_n(\hat\beta)$.

2. The calculation of $\kappa_1$ depends on the set $\mathcal{X}_1$, which can be approximated roughly by plotting $s(x)$ and $-s(x)$. As shown in Sun and Loader (1994), if $s : \mathcal{X} \to s(\mathcal{X})$ is one-to-one and three times differentiable, and there exists a vector $\nu$ with $\langle \nu, s(x) \rangle > 0$ for all $x \in \mathcal{X}$, then the tubular neighborhoods of $\{s(x)\}$ and $\{-s(x)\}$ do not intersect for sufficiently small radii. The vector $\nu = (1, \dots, 1)^T$ suffices in most regression models. In our simulations we simply set $\kappa_1 = 0$, which worked fine. A more rigorous and complicated expression for $\kappa_1$ may be derived using the inclusion and exclusion formulae due to Naiman and Wynn (1994). See the proof in the Appendix.

3. The estimate of $r$ is more complicated than those of $\kappa_0$ and $\kappa_1$. The more sharply it bounds $R$, the better the correction will be. Analogous to (A.5) in the Appendix, a sharp estimate of $r$ is

$$\hat{r} = \hat{r}(c) = \sup_{x \in \mathcal{X}} \Big\{ \tfrac{1}{6}\hat\gamma'_n(x)\,[c^2 - 1] + \hat{T}_n(s_n(x)) + \hat\mu'_n(x) \Big\}, \tag{46}$$

where $c$ is a solution of (36) or (37) with $r$ replaced by $\hat{r}(c)$. Thus, given a data set, to build a corrected conservative SCR by Correction 1', we compute the required estimates ($\hat\kappa_0$, $\hat\mu'_n$, $\hat\gamma'_n$, $\hat{T}_n$, ...) and then solve (36) to obtain $c$, where $r = \hat{r}(c)$. A "corrected" liberal SCR can be obtained similarly.

The next estimate of $r$ does not depend on $c$, and hence it is easier in practice to solve for $c$ based on (36). Let $\mu'_0 = \sup_{x \in \mathcal{X}} |\mu'_n(x)|$ and $\tau = E \sup_{x \in \mathcal{X}} |W^2(x) - 1|$. Then a simple estimate (though not a sharp bound of $R$) of $r$ is

$$\hat{r}_0 = \frac{\tau}{6n^{3/2}} \sum_{i=1}^{n} |b^{(3)}(\hat\theta_i)|\, \|u_i\|^3 + \hat\mu'_0 \tag{47}$$

if $\langle s(x), \xi \rangle$ is non-lattice, and is $\hat{r}_0 + 0.5\, n^{-1/2}\, \hat{h}$ if $\langle s(x), \xi \rangle$ is lattice. Here $h = \sup_{x \in \mathcal{X}} h(x)$ and $h(x)$ is the largest span of the lattice distribution; and an approximate bound for $\tau$ is

$$\tau = E \sup_{x \in \mathcal{X}} |W^2(x) - 1| = \int_0^{\infty} P\Big(\sup_{x \in \mathcal{X}} |W^2(x) - 1| > t\Big)\, dt \le 1 + \int_1^{\infty} P\Big(\sup_{x \in \mathcal{X}} W^2(x) > t + 1\Big)\, dt$$

$$\qquad = 1 + \int_1^{\infty} \Big[ \frac{\kappa_0}{\pi}\, e^{-(1+t)/2} + 2\big(1 - \Phi(\sqrt{1 + t})\big) \Big]\, dt \le 1 + \frac{2\kappa_0}{\pi e} + \frac{2}{\sqrt{\pi}\, e}, \quad \text{where } e = \exp\{1\}.$$

Similarly, sharp estimates of $r'_p$ and $r'_q$ are

$$\hat{r}'_p = \sup_{x} p_2(x, c) \quad \text{and} \quad \hat{r}'_q = \sup_{x} q_2(x, c). \tag{48}$$

Here, again, the idea of replacing $W(x)$ by $c$ is similar to that in proving the stochastic representation in (A.5).
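The partition-based approximation in item 1 above amounts to measuring the length of a polyline through points on the unit-sphere curve traced by the normalized covariate. A minimal sketch follows; it assumes the caller supplies a function `s` returning that unit vector at a point (how it is built from the fitted model is left out), and the helper names are our own.

```python
import math

def curve_length(s, a, b, m=1000):
    """Polyline approximation to the length of {s(x) : a <= x <= b},
    using the equally spaced partition a = t_0 < ... < t_m = b."""
    pts = [s(a + (b - a) * i / m) for i in range(m + 1)]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

def unit(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(vi * vi for vi in v))
    return tuple(vi / norm for vi in v)

# Illustrative case (an assumption, not from the paper): z(x) = (1, x) with the
# Cholesky factor proportional to the identity, so s(x) = (1, x) / sqrt(1 + x^2).
# On [-1, 1] this traces a quarter of the unit circle, of length pi / 2.
kappa0_hat = curve_length(lambda x: unit((1.0, x)), -1.0, 1.0)
```

A finer partition (larger `m`) tightens the approximation; since chords are never longer than the arcs they span, the polyline slightly underestimates the true length.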
5 Examples and Simulations

To perform a simulation experiment we just need to compute the maxima (say, equal to $b$) of the processes $W_n(x)$, $W_n^{\sharp}(x)$, $W_n^{(0)}(x)$, $W_n^{(1)}(x)$ and $W_n^{(2)}(x)$ using the known true $\eta(x)$, and estimate the quantile of a tube approximation $t(b)$ by

$$\hat\alpha = \sum_{i=1}^{10000} 1\{t(b_i) < \alpha\} / 10000, \tag{49}$$

where $b_i$ is the $i$th independent repetition of $b$ in the experiment, 10,000 is the simulation size of the experiment (we use a simulation size of 10,000 for all experiments) and $t(\cdot)$ is a tube approximation to $P(\cdot)$ or $P^u(\cdot)$ or $P^l(\cdot)$ in (18). In other words, $t(b_i)$ is the right-hand side of (22) or (23) (for basic SCRs and corrected SCRs in Corrections 1 and 2), or the right-hand side of (36) or (37) or (43) or (44) (for corrected SCRs in Corrections 1' and 2'), at $c = b_i$ or $c_u = b_i$. If the SCR works well, $\hat\alpha$ should be close to the nominal non-coverage probability $\alpha$. We will discuss our simulation results for one-sided and two-sided SCRs with $\alpha = 0.005$, $0.01$, $0.05$ and $0.1$. The canonical links for the Poisson and Bernoulli/Binomial regressions are the log and logit links.

To actually compute SCRs for a data set, one places the critical values $c$ or $c_u$ obtained from the tube approximations on these processes to obtain equations that can be solved to find confidence limits for $\eta(x)$. This will be illustrated by an application to a real data set.

Simulations. Twenty $x_i$'s were generated from $U[0, 1]$ ($n = 20$). For a given mean response function $E(Y_i \mid x_i) = m$, Poisson responses were generated, and a log-quadratic model was fitted to the data. The basic two-sided SCRs $l_c(x)$ with $c$ from the first-level approximation (22) were computed over the interval $[0, 1]$, and the coverage probabilities were estimated based on 10,000 simulations by (49).
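The estimator (49) is simply the fraction of replications whose tube approximation, evaluated at the simulated maximum, falls below the nominal level. The following sketch shows the bookkeeping only; the stand-in experiment (a single point, so that the maximum is the absolute value of one standard normal draw and the tube formula reduces to the two-sided normal tail) is an assumption made to keep the example self-contained, and is not one of the paper's simulation settings.

```python
import math
import random

def normal_cdf(x):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def estimate_noncoverage(maxima, tube, alpha):
    """Estimator (49): proportion of replications with t(b_i) < alpha."""
    return sum(1 for b in maxima if tube(b) < alpha) / len(maxima)

# Stand-in experiment (assumed): zero curve length, so t(b) = 2 * (1 - Phi(b)).
random.seed(0)
maxima = [abs(random.gauss(0.0, 1.0)) for _ in range(10000)]
alpha_hat = estimate_noncoverage(maxima, lambda b: 2.0 * (1.0 - normal_cdf(b)), 0.05)
```

In an actual experiment, `maxima` would hold the simulated suprema of the relevant process over the domain, and `tube` would be the corresponding right-hand side among (22), (23), (36), (37), (43) or (44).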
Figure 1: Simulated coverage probabilities for a Poisson log-quadratic regression. For each mean, we display the actual coverage obtained at nominal 90%, 95% and 99% levels. (Horizontal axis: Poisson mean; vertical axis: coverage probability.)

Figure 1 displays the results. The coverage probabilities are generally quite good and comparable to what would be expected from the "naive" SCR for the mean response function from a Gaussian model. The "naive" SCR works reasonably well under a Gaussian model (Sun and Loader, 1994) but may fail under a non-normal GLM (especially with discrete responses); recall Remark 1 in Section 4. There is some evidence that the true distribution has light tails at smaller means, with the results being conservative at the nominal 90% coverage, and slightly liberal at the nominal 99% coverage. However, the notable point is that the main effect (at the rate of $n^{-1/2}$) of skewness does not have a significant impact on the coverage probability of two-sided SCRs, as shown in (16). As we have seen from (26) and shall see shortly, the distribution of $W_n(x)$ is quite heavily skewed. However, the main effects of skewness cancel at the two boundaries; the excess coverage gained at one boundary is lost at the other boundary.

To show the effect of skewness more dramatically, we consider one-sided confidence bounds. This may be appropriate in some applications; for example, in quality control applications, one might want upper bounds on the failure rates of components under various operating conditions. We compute basic SCRs $l_u$ and $l_l$ based on $W_n(x)$ and corrected SCRs $l_{u,c}$ and $l_{l,c}$ based on $W_n^{\sharp}(x)$ as in Correction 1. Results are shown in Figure 2. Clearly there is a dramatic skewness effect; the basic lower SCRs are liberal, and the basic upper SCRs are very conservative. The corrected SCRs improve the situation in both cases, although in the case of the upper bound, the correction does not go far enough. (This is also true for the lower bound, since the tube formula should be slightly conservative, especially at the 90% level.) Part of the problem may be the use of estimated parameters in $\mu'_n(x)$ and $\gamma'_n(x)$; however, the main problem appears to be a low kurtosis: $E\big((W_n(x) - \mu_n(x))^4 - (W_n(x) - \mu_n(x))^2\big) < 2$.

A second simulation is carried out in a Bernoulli setting, using quadratic logistic regression with true mean function 0.5, based on basic two-sided SCRs. Again, the $x_i$'s were generated from a $U[0, 1]$ distribution, with sample sizes ranging from 20 to 200. Results are shown in Figure 3. The results in this case are poor, except for the largest sample size $n = 200$.

Moreover, the results here cannot be attributed to bias and skewness; since our responses are Bernoulli with mean 0.5, symmetry implies the estimate is unbiased and has skewness 0. Therefore, Corrections 1 and 1' will not help with one-sided confidence bands in this case. The results with Corrections 2 and 2' for two-sided SCRs are given below.

Some justifications for overfitting the data from a Poisson (or Bernoulli) regression model with a constant mean (deg=0, hereafter) by a log-quadratic (or logistic quadratic) regression model (deg=2, hereafter) are as follows. In practice, we often do not know how a true mean response function depends on predictors. An underfit is much more serious than an overfit, since an underfitted model is wrong while an overfitted model still gives an unbiased estimator of the log (or logit) of the mean response function, though the variance of the estimator tends to be larger than that from an "exact" fit (i.e. a fit by a deg=0 model). The difference in the variances of an overfitted and an "exactly" fitted estimator decreases as the sample size increases. An effect of this variance increase by overfitting is that the resulting SCR tends to be more conservative. Compare the non-coverage probabilities from the exact fits (deg=0) and overfits (deg=1, 2) in Figure 4. If there must be a deviation it
[Figure 2 plots: two panels of coverage probability (vertical axis, 0.88 to 1.00) against the Poisson mean (horizontal axis, 2 to 10).]

Figure 2: Simulated coverage probabilities for a Poisson log-quadratic regression and one-sided upper confidence bands (top) and lower confidence bands (bottom). Coverages from the basic SCRs are shown with an `o', and coverages with a bias-skewness correction (Correction 1) with a `+'; again at nominal 90%, 95% and 99% levels.

[Figure 3 plot: coverage probability (vertical axis, 0.88 to 1.00) against the Bernoulli sample size n (horizontal axis, 50 to 200).]

Figure 3: Simulated coverage probabilities for a quadratic logistic regression.
is better to have a conservative SCR than a liberal SCR, as long as the conservative SCR is informative, i.e. not too wide. In addition, Figure 4 shows, interestingly, that the coverage probabilities (not variances) under an overfitted model are less varied than those under an "exact" fit. Thus, Corrections 1' and 2' should work well in practice for the basic SCRs from an overfitted model. Another benefit of using an overfitted model is that it shows the dramatic effect of the skewness and motivates our corrections for one-sided and two-sided SCRs. Of course, it is beneficial to perform an exact fit if possible (by performing diagnostics on several possible fits) before computing an SCR. This is confirmed in our simulations below for two-sided basic and corrected SCRs. This two-stage process, obtaining an exact fit to a given data set and then computing a good SCR, will be illustrated in the data application below.

Implementing Corrections 2 and 2' for two-sided SCRs is as straightforward as for Corrections 1 and 1', though the computation of the $n^{-1}$ correction term is much more complicated. We have built the software parfit, which makes all computations automatic.

Our design of experiments for comparing two-sided SCRs is a complete or partial crossing of the cases in Table 1 for the following four batches of experiments. In these experiments, SCRs are computed for each simulated data set drawn from the models presented in Table 1, under an exact and/or overfitting scheme.
Table 1: Models and Notations

Model      Notation  Mean                                            Exact fit
Poisson    pc        mean = 1, 2, ..., 10                            deg=0
           pl        mean = 5 exp(x), a log linear function          deg=1
           pq        mean = 10 exp(-1 + 6x - 9x^2), log quadratic    deg=2
Bernoulli  bc        mean = 0.5, 0.7                                 deg=0
           bl        mean = expit(x), logistic linear                deg=1
           bq        mean = expit(x - 2x^2), logistic quadratic      deg=2
Binomial   bn        mean = 3 expit(x)*                              deg=1
n                    Sample size = 20, 50, 100, 150, 200
where expit(x) = exp(x)/(1 + exp(x)). In each experiment the simulation size is 10,000, and each of the 10,000 independent samples comes from a specified distribution. For example, pl.2 indicates an experiment in which each sample of size 20 is drawn from the Poisson distribution with the log linear mean in Table 1; bc5.5 indicates an experiment in which each sample of size 50 is drawn from the Bernoulli distribution with a constant mean 0.5; and bq.10 indicates an experiment in which each sample of size 100 is drawn from the
[Figure 4 plot: non-coverage probability (vertical axis) against the true mean, 1 to 10 (bottom axis), and the sample size n = 20, 50, 100 (top axis), one panel per nominal level. Legend: o: deg=0, *: deg=1, +: deg=2.]

Figure 4: Simulated non-coverage probability of the basic SCR for the mean response function from the Poisson regression model with mean = 1, 2, ..., 10 and sample size = 20, 50, 100, using exact fits (deg=0) and overfits (deg=1, 2), respectively. The solid horizontal line is the nominal non-coverage probability (0.01, 0.05, 0.1); the top axis is the sample size n and the bottom axis is the true mean in the Poisson model.
Bernoulli distribution with the logistic quadratic mean. Here the pc1.n-pc10.n cases are run only for n = 20, 50 and 100, and the Binomial case bn is carried out only for n = 50 in the first batch of experiments. The four batches of experiments are:

Batch 1. This is the null case where the basic SCRs are computed from $W_n(x)$: for the pc1-pc10 models with deg=0,1,2 fits and sample sizes n = 20, 50, 100; for bc5 and bc7 with deg=0,1 fits and for bl, bq, pl and pq with the "exact" fits and n = 20, 50, 100, 150, 200; and for bn. This leads to a total of 90 + (4 + 4) × 5 + 1 = 131 experiments. Most basic SCRs under the 90 pc models are conservative, though some are (fortunately!) only a little liberal. See the percentages of points below and above the target lines in Figure 4. Almost all SCRs under the remaining 41 models bc5-bn are conservative; the SCRs under bc5 with deg=1 overfits are the most conservative and should be corrected. The SCRs under bl have fairly good coverage probabilities in terms of their closeness to the target values, while those of pl and pq provide very good coverage. This is consistent with our earlier finding in Figure 1. Since the Binomial model bn is equivalent to a Bernoulli model bl with a larger sample size, this is the only batch in which the bn case is run. It is worth pointing out that in two of these 131 experiments, our algorithm returned NA in one (or 4) out of the 10000 samples for bc7.2 with a deg=0 (or deg=1) fit. This indicates that our algorithm occasionally has difficulties when the sample size is small, e.g. n = 20.

Batch 2. This batch compares two types of SCRs, from $W_n^{(1)}(x)$ and $W_n^{(2)}(x)$ in Correction 2, with the basic two-sided SCR as a baseline, under the exact and overfitting schemes. For pc, we ran deg=2 overfits for the mean = 1, ..., 10 cases, and deg=0 exact fits and deg=1 overfits for some of these constant mean cases, with sample sizes n = 50 and 100 respectively. For pl, we tried deg=2 overfits and deg=1 exact fits with n = 50; for bl, deg=1 exact fits; and for bc, deg=0 exact fits and deg=1 overfits with mean = 0.5. It turns out that the SCRs computed from $W_n^{(1)}(x)$ do not provide a good correction at all, while the SCRs from $W_n^{(2)}(x)$ result in an overall significant improvement over the basic SCRs, especially when the models are fitted "exactly." This suggests performing exact fits when possible and using a more centered process when possible in Correction 2, since $W_n^{(0)}(x)$ in $W_n^{(2)}(x)$ is more centered than $W_n(x)$ in $W_n^{(1)}(x)$. In the rest of the experiments, we hence drop $W_n^{(1)}(x)$.

Batch 3. This batch, motivated by the Batch 2 results, compares corrected SCRs computed from $W_n^{(0)}(x)$ and $W_n^{(2)}(x)$, using again the basic SCR as a baseline. An SCR from $W_n^{(0)}(x)$ is much easier to compute in data applications than that from $W_n^{(2)}(x)$, and it in effect drops the $1/n$ correction term from the inverse Edgeworth expansion. The comparison is first done for the data from the pc1-10 models with deg=0,1,2 fits,
which totals 10 × 3 × 3 × 3 = 270 experiments. The estimated coverage probabilities from the SCRs computed from $W_n^{(0)}(x)$ and $W_n^{(2)}(x)$ are not too different, whether the data are "exactly" fitted or overfitted. Thus, for the pc1-10 models we suggest using the easier-to-compute SCRs from $W_n^{(0)}(x)$.

The comparison is then done for the bc5 and bc7 cases with deg=0,1 fits, which totals 2 × 2 × 3 × 5 = 60 experiments. Figure 5 displays the results. In the case of deg=0 fits, the "centered" SCRs from $W_n^{(0)}(x)$ provided some improvement, though not much, while the corrected SCRs from $W_n^{(2)}(x)$ are the overall winner except in some cases when n = 20. That $W_n^{(0)}(x)$ did not improve much is consistent with our earlier observation that the "poor" results of the basic SCRs in bc5 "cannot be attributed to the main effects of skewness." The difficulty with $W_n^{(2)}(x)$ at n = 20 is not surprising, since a $1/n$ correction term (though it helps in the $n \ge 50$ cases) may have a hard time taking effect when the sample size is small. In fact, our parfit can fail to converge in some n = 20 cases. This is expected: although the variance term is positive in theory, its estimate can be negative in a small-sample case. Fortunately the failure happened only when n = 20, and in the worst case in only 6 out of 10000 repetitions. We see similar results in the case of deg=1 fits. The simulation also shows that the exact fit is beneficial for the constant mean cases.

The final comparison in this batch is done for the bl, bq, pl and pq models with the exact fits, which also totals 60 experiments. The Poisson cases are excellent: all SCRs are reasonable, while the centered SCRs from $W_n^{(0)}(x)$ are uniformly better, though they sometimes do not correct as much as $W_n^{(2)}(x)$. The SCRs from $W_n^{(2)}(x)$ are generally better for larger n and are excellent for pq when $\alpha = 0.05$. The story for the Bernoulli case is different: for bl, the corrected SCRs from $W_n^{(0)}(x)$ are the winners; for bq, the corrected SCRs certainly improve over the basic SCRs, but sometimes not enough. This is the case in which Correction 2' helped. See Figure 6.

Batch 4. This final batch of experiments compares the basic SCR (as a baseline), the SCRs from Corrections 2 and 2', and two new SCRs. The sampling distributions are pl, bl and bq for n = 20, 50, 100, 150, 200 and pq for n = 20, 50, 100 (all with exact fits), a total of 18 experiments. In each experiment, 9 SCRs are computed based on the processes in Table 2. The results are as follows. The baseline SCRs by w have reasonable coverage probabilities for Poisson data, but not for Bernoulli data unless the sample size is 200. The centered SCRs by w0 are uniformly better than the basic SCRs, though the improvements are not as big as those of q2 in some cases for $\alpha = 0.1$ and $0.05$. The corrected SCRs by q2 are excellent for the pq models, very good for bl except at n = 20, and fairly good for pl. The method p3 is no good at all, while q3 improves at $\alpha = 0.1, 0.05$ for larger n, though it has trouble for bl when n = 20 and 50.
[Figure 5 plots: non-coverage probability (vertical axis) against the sample size n = 20, 50, 100, 150, 200 (horizontal axis); panels by fit and mean (deg=0 mean=0.5, deg=0 mean=0.7, deg=1 mean=0.5, ...) and by level ($\alpha$ = 0.1, 0.05, 0.01, 0.005). Legend: o: w(x), *: w^(2)(x), +: w^(0)(x).]

Figure 5: Simulated non-coverage probability of the basic and corrected SCRs, computed from $w_n(x)$, $w_n^{(2)}(x)$, $w_n^{(0)}(x)$, for the mean response function from a Bernoulli model with mean = 0.5 and 0.7 and sample size = 20, 50, 100, 150, 200, using exact fits (deg=0) and overfits (deg=1), respectively. The dotted horizontal line is the nominal non-coverage probability $\alpha$.
[Figure 6 plots: non-coverage probability (vertical axis) against the sample size n (horizontal axis); panels by model (bl, bq, pl, pq) and by level ($\alpha$ = 0.1, 0.05, 0.01, 0.005). Legend: o: w(x), *: w^(2)(x), +: w^(0)(x).]

Figure 6: Simulated non-coverage probability of three SCRs for the mean response function from the bl, bq, pl and pq models in Table 1, with exact fits and sample size = 20, 50, 100, 150, 200. For the pq model, results from the 3 SCRs for n = 150 and 200 are very close and hence are omitted.
Table 2: Corrections 2 and 2' and Their Simplifications

Name  Process                                   Comments
w     $\max\{|W_n(x)|\}$                        basic SCR
w0    $\max\{|W_n^{(0)}(x)|\}$                  centered SCR
q2    $\max\{W_n^{(2)}(x)\}$                    corrected SCR 2
q3    $\max\{|W_n^{(0)}(x)| + q_2(x, c)\}$      simplified [note 1] corrected SCR 2
p3    $\max\{|W_n(x)| + p_2(x, c)\}$            simplified [note 1] corrected SCR 2
wp+   $\max\{|W_n(x)|\} - \hat r_p'$            corrected SCR 2'
wp-   $\max\{|W_n(x)|\} + \hat r_p'$            corrected SCR 2'
wq+   $\max\{|W_n^{(0)}(x)|\} - \hat r_q'$      corrected SCR 2'
wq-   $\max\{|W_n^{(0)}(x)|\} + \hat r_q'$      corrected SCR 2'
Note 1. The new corrected SCRs q3 and p3 were simplified from $W_n^{(2)}(x) = |W_n^{(0)}(x)| + q_2(x, W_n^{(0)}(x))$, the method q2. The method q2 is easy to implement in simulations, but actually building an SCR for a given data set requires solving an equation that involves a 4th order polynomial for each value of x. The justification for replacing $W_n^{(0)}(x)$ or $W_n(x)$ by c is the same as that for (A.5). Correction 2' is straightforward to implement both in simulations and in building an SCR.

The methods wq+, wq-, wp+ and wp- are from Correction 2'. Correction 2' is equivalent to enlarging the basic SCR bands by $\hat r_p$ or $\hat r_q$ to make them wider, or shrinking the bands by $\hat r_p$ or $\hat r_q$ to make them narrower, while the center of the bands is unchanged. wq+ and wq- are fairly good for pl but are otherwise unstable, while an appropriate use of wp+ and wp- resulted in an overall improvement (which is also better than q2 in the difficult Binomial cases bl and bq). The guideline for deciding whether to use wp+ or wp- is to use the one that results in more liberal SCRs than the basic SCR, i.e.
\[ \{\eta(x) : |W_n(x)| \le c - |\hat r_p'|,\ x \in \mathcal{X}\} \qquad (50) \]
where c is the first level approximation in (22) and $\hat r_p'$ is in (48).

In summary, the centered SCR by $w_n^{(0)}(x)$ (w0) is uniformly better than the basic SCR by $w_n(x)$ (w), while the corrected SCR 2 by $w_n^{(2)}(x)$ (q2) and the corrected SCR 2' by $w_n(x) + \hat r_p$ (wp) in (50) are recommended generally. The corresponding SCRs have the expressions:
Table 3: Final SCRs

Name              Notation  SCR bands for $\eta(x)$
Basic SCR         w         $\hat\eta(x) \pm c\,\hat\sigma(x)$
Centered SCR      w0        $\hat\eta(x) - \hat\mu_1(x)\hat\sigma(x) \pm c\,\hat\sigma(x)\sqrt{\hat\mu_2(x)}$
Corrected SCR 2   q2        $\hat\eta(x) \pm u(x)\,\hat\sigma(x)$
Corrected SCR 2'  wp        $\hat\eta(x) \pm (c - |\hat r_p|)\,\hat\sigma(x)$

where c is the first level approximation in (22), $\hat r_p'$ is in (48), $\hat\mu_1(x)$ and $\hat\mu_2(x)$ are the estimated mean and variance of $W_n(x)$ in (31), and $u(x)$ is the solution of
\[ |u| + q_2(x, u) = c \]
with $q_2$ defined in (40).

Examples. The following is the insect data from Bliss (1935):

log2(concentration)   0   1   2   3   4
No. of Deaths         2   8  15  23  27
No. of Insects       30  30  30  30  30

An exploratory data analysis suggests that the log2(concentration) of an insecticide might be a good predictor for modeling the probability of death of an insect. The analysis of deviance table for a linear logistic fit and a quadratic logistic fit to the data is

Analysis of Deviance (test="Chi")
   Terms          Res.Df  Res.Dev    Test        Df  Deviance   P-value(Chi)
0  NULL           4       64.76327
1  conc           3       0.3787483  +I(conc)    1   64.38452   9.992007e-16
2  conc + conc^2  2       0.1954940  +I(conc^2)  1   0.1832542  0.6685914
where conc is the log2(concentration). This implies that the linear logistic fit is good enough, which is also confirmed by other simple diagnostic plots. The estimates of the linear predictor $\eta(x)$ and its variance $\sigma^2(x)$ are:
\[ \hat\eta(x) = -2.32379 + 1.161895\,x, \qquad \hat\sigma^2(x) = \begin{pmatrix} 1 & x \end{pmatrix} \begin{pmatrix} 0.17462 & -0.06582 \\ -0.06582 & 0.03291 \end{pmatrix} \begin{pmatrix} 1 \\ x \end{pmatrix} \]
where x = conc. Based on 41 equally spaced points in (0, 4) and $\alpha = 0.05$, we obtain $c = 2.4304$ and $\hat r_p = 0.043414$. The summary statistics of $\hat\mu_1(x)$, $\hat\mu_2(x)$ and $u(x)$ are

                 Min     1st Qu  Median    Mean      3rd Qu  Max    st.dev
$\hat\mu_1(x)$   -0.036  -0.012  -0.00055  -0.00011  0.012   0.036  0.00279
$\hat\mu_2(x)$   0.985   0.986   0.98660   0.98670   0.988   0.988  0.00015
$u(x)$           2.185   2.203   2.23700   2.23800   2.279   2.284  0.00517
Figure 7 displays an approximate pointwise confidence region (CR), $\hat\eta(x) \pm 1.96\,\hat\sigma(x)$, together with the basic, centered and corrected simultaneous CRs in Table 3, for $E(Y|x) = p(x)$, the probability of death as a function of log2(concentration). The pointwise confidence region is the narrowest of all the CRs. This is expected, since it cannot be interpreted simultaneously for making inferences about the entire curve, as the SCRs can. All SCRs are fairly close since the sample size is 150. However, the basic SCR is the widest (as expected) and the centered SCR is indistinguishable from the basic SCR. Both corrected SCRs 2 and 2' are narrower than the basic SCR, with the corrected SCR 2 being the best.
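The linear logistic fit and the two simplest bands can be reproduced with a few lines of Fisher scoring. The following is a minimal sketch and not the parfit implementation: the function names are ours (hypothetical), the critical value c = 2.4304 is copied from the text rather than computed, and mapping a band for $\eta(x)$ to a band for $p(x)$ through expit is our assumption about how the regions for $E(Y|x)$ are obtained.

```python
import math

# Insect data from the text: log2(concentration), deaths, insects per dose.
X = [0.0, 1.0, 2.0, 3.0, 4.0]
DEATHS = [2, 8, 15, 23, 27]
INSECTS = [30, 30, 30, 30, 30]


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def score_and_cov(b0, b1, x, y, m):
    """Score vector and inverse Fisher information at (b0, b1)."""
    s0 = s1 = i00 = i01 = i11 = 0.0
    for xi, yi, mi in zip(x, y, m):
        p = expit(b0 + b1 * xi)
        w = mi * p * (1.0 - p)  # binomial variance, b''(theta)
        s0 += yi - mi * p
        s1 += (yi - mi * p) * xi
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
    det = i00 * i11 - i01 * i01
    cov = [[i11 / det, -i01 / det], [-i01 / det, i00 / det]]
    return (s0, s1), cov


def fit_linear_logistic(x, y, m, iters=25):
    """Fisher scoring for logit p(x) = b0 + b1 * x, started at (0, 0)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        (s0, s1), cov = score_and_cov(b0, b1, x, y, m)
        b0 += cov[0][0] * s0 + cov[0][1] * s1
        b1 += cov[1][0] * s0 + cov[1][1] * s1
    _, cov = score_and_cov(b0, b1, x, y, m)  # covariance at the final estimate
    return (b0, b1), cov


def band_for_p(xv, beta, cov, c):
    """expit of eta_hat(x) -/+ c * sigma_hat(x); the expit step is an assumption."""
    eta = beta[0] + beta[1] * xv
    se = math.sqrt(cov[0][0] + 2.0 * xv * cov[0][1] + xv * xv * cov[1][1])
    return expit(eta - c * se), expit(eta + c * se)


beta, cov = fit_linear_logistic(X, DEATHS, INSECTS)
for xv in X:
    # pointwise band (c = 1.96) and basic SCR (c = 2.4304, taken from the text)
    print(xv, band_for_p(xv, beta, cov, 1.96), band_for_p(xv, beta, cov, 2.4304))
```

The centered and corrected bands of Table 3 need the additional quantities from (31), (40) and (48), which this sketch does not attempt to compute.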
[Figure 7 plot: p = probability of death (vertical axis, 0.0 to 1.0) against x = log2(concentration) (horizontal axis, 0 to 4). Title: p = expit(-2.32379+1.161895*x). Legend: point, basic, w0, w2, c2'.]

Figure 7: Approximate pointwise CI and basic, centered and corrected simultaneous CIs for $E(Y|x) = p(x)$. The points are the observed proportions of deaths at each log2(concentration). The middle black solid curve is the estimate $\hat p(x) = \mathrm{expit}(-2.32379 + 1.161895\,x)$. The total sample size (number of insects) is 150.

Figure 8 presents the basic SCR for a simulated data set where we know the true mean response function. Other SCRs are omitted to avoid cluttering the picture and to illustrate that most basic SCRs (21) are still informative (not too wide), although they can be a little conservative. Recall the results in Batch 1 of the experiments. See Sun, Raz and Faraway (1998) for an extension of Scheffé's SCR and its comparison with naive SCRs. The naive SCRs are the basic SCRs if the responses are normally distributed. Figures 3 and 4 in Sun, Raz and Faraway (1998) showed that the naive SCRs beat Scheffé's SCR and Bonferroni SCRs. Clearly, the width of Bonferroni SCRs increases to infinity as the number of points in the domain of interest increases to infinity. From our extensive simulations, we already know that the corrected SCRs improve over the basic SCRs, especially when the sample size is moderate and the basic SCRs are too conservative.
[Figure 8 plot: response y (vertical axis, 0 to 15) against x (horizontal axis, 0.0 to 1.0).]

Figure 8: The design is 50 points uniformly distributed on [0, 1] and the response is Poisson with mean $10\exp(-(3x - 1)^2)$. The data points are shown by "o". The log-quadratic fit is shown by the solid curve, the true mean by the long dashes, and the 95% rough simultaneous confidence region by short dashes.
6 Discussions and Concluding Remarks

We constructed basic SCRs and corrected SCRs for the mean response function in a GLM. The central idea in our approach is similar to the Skorohod construction, so that we could explicitly study the "bias" from approximating $W_n(x)$ by $W(x)$. When this bias is small, we can estimate it and use the estimate to adjust the approximating formula. Basic SCRs are reasonable for continuous data, for Poisson data when $n \ge 50$, and for Binomial data when $n \ge 200$. In other cases, centered or corrected SCRs must be used. Our centered SCRs uniformly improve over the basic SCRs, and the corrected SCRs in most cases offer improvement over the basic SCRs. Corrections 1 and 1' are generally helpful for one-sided bands. Corrections 2 and 2' can be used for two-sided bands. In the difficult case of quadratic logistic regression, Correction 2' is extremely helpful. See Table 3 for a summary of recommended SCRs. Implementing these SCRs via our software parfit is fairly straightforward.

Our methods require that the link function is known. When the link function is unknown, a user may use residual plots to guess a reasonable link function and then proceed as above as if the link were known. In this case, further justification may be needed for the use of the plug-in link function.

Alternative methods for constructing an SCR can be found in Knafl, Sacks and Ylvisaker (1985), Hall and Titterington (1988), Härdle and Marron (1991) and others. Knafl, Sacks and Ylvisaker (1985) assume that the errors are normal and additive. They build simultaneous confidence bands on discrete points using discrete upcrossings and then interpolate between these points, so in the limit their method is equivalent to the "tube method" in the one-dimensional case (d = 1) when errors are normal. The bands of Härdle and Marron (1991) are based on kernel estimates of the regression function and a bootstrap procedure for building simultaneous error bars, i.e. their bands are simultaneous on a finite set of discrete points in the predictor space. Some comparisons with the tube formula approach are presented in Loader (1993). Hall and Titterington (1988) present an alternative approach for constructing SCRs for density functions and regression functions. They divide the data into non-overlapping bins and then use the data in each bin to find confidence intervals at the center of each bin. (This essentially amounts to using a rectangular kernel estimate.) Since the bins are non-overlapping, the intervals for separate bins are independent, so they can easily be made simultaneous. Then, to construct an SCR for a regression problem, say, they assume a bound on a low-order derivative of the regression function. This bound therefore plays an important role in the actual size and coverage probability of their SCR. In any case, the tube formula approach is complementary to previous bands (cf. Sun and Loader, 1994 and Loader, 1993).
Acknowledgment

The authors thank two referees and an associate editor for making this paper much more reader friendly, and a referee (thanks much!) for pointing out that the correction term for two-sided bands should be of the order $n^{-1}$ rather than $n^{-1/2}$. Jiayang Sun acknowledges the helpful discussions that she had with Bob Keener and Dan Rabinowitz at the initial stage of this research. Sun's research is supported in part by NSF grants DMS-95-04515, DMS-9626108 and DMS-9870544.
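Appendix: Derivation and Examples

Exemplar $\mu_n$, $\kappa_n$ and $T_n$'s.

Example 1. [Gamma]. Consider the case where the response variables have a Gamma density with known shape parameter $r > 0$ and unknown scale parameter $\omega > 0$:
\[ f(y; r, \omega) = \frac{\omega^r}{\Gamma(r)}\, y^{r-1} e^{-\omega y}, \quad \text{for } y > 0. \]
Then we have the form of (1) with $\theta = -\omega$, $b(\theta) = -r\log(-\theta)$. Clearly, $g(\theta) = b'(\theta) = -r/\theta$ is monotone in $\theta$, $-\omega_i = \theta_i = \langle z(x_i), \beta\rangle$, and
\[ b'(\theta) = -\frac{r}{\theta} = \frac{r}{\omega}, \qquad b''(\theta) = \frac{r}{\theta^2} = \frac{r}{\omega^2}, \qquad b^{(3)}(\theta) = -\frac{2r}{\theta^3} = \frac{2r}{\omega^3}. \]
Substituting them into (13) and (14) etc. gives
\[ A_n = \frac{1}{n}\sum_{i=1}^n \frac{r}{\theta_i^2}\, z_i z_i^T, \qquad \mu_n = \frac{1}{n^{3/2}}\sum_{i=1}^n \frac{r}{\theta_i^3}\, \|u_i\|^2 u_i, \qquad \kappa_n(v) = \frac{4}{n^{3/2}}\sum_{i=1}^n \frac{r}{\theta_i^3}\, \langle u_i, v\rangle^3. \]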
In this case $\xi$ cannot be lattice, as $Y_i$ is a continuous random variable, and hence $T_n$ in Proposition 3.1 is zero. Note that a commonly used non-canonical link for the gamma case is the log-link, i.e. $\log(\omega) = \langle z(x), \beta\rangle$. In this case, a similar Edgeworth expansion exists, though the maximum likelihood estimates of $\beta$, $\mu_n$ and $\kappa_n$ may be different from the ones under the canonical link.

Example 2. [Poisson]. Consider the case where the response variables have a Poisson density with intensity parameter $\omega > 0$:
\[ f(y; \omega) = \frac{\omega^y}{y!}\, e^{-\omega}, \quad \text{for } y = 0, 1, \ldots \]
We have the form of (1) with $\theta = \log(\omega)$, $b(\theta) = e^\theta$. Again, $g(\theta) = b'(\theta) = e^\theta$ is monotone in $\theta$, $\log(\omega_i) = \theta_i$, and $b'(\theta) = b''(\theta) = b^{(3)}(\theta) = e^\theta = \omega$. Substituting them into (13) and (14) etc. yields similar expressions for $A_n$, $\mu_n$ and $\kappa_n$.

Note that $\xi_j$ has a lattice distribution iff the $j$th component of $\sum_{i=1}^n (y_i - \omega_i) z_i$ does, or if the $j$th components of $z_i$ for all $i$ only take lattice values, i.e. they belong to $a_j + h_j \mathbb{Z}$ for some $a_j, h_j$, for $j = 1, \ldots, q$, where $\mathbb{Z}$ is the set of integers. Hence, $T_n = 0$ unless we have a very special design, e.g. $z_i^T = (1, \ldots, 1)$, which is rare.
Example 3. [Logistic]. Consider the case where the response variables have a Binomial density with success parameter $0 < p < 1$:
\[ f(y; p) = \binom{n}{y} p^y (1 - p)^{n-y}, \quad \text{for } y = 0, 1, \ldots, n. \]
The GLM with such marginal densities is called the Logistic model. We also have the form of (1) with $\theta = \log(p/(1 - p))$, $b(\theta) = n\log(1 + e^\theta)$. So, $g(\theta) = b'(\theta)$ is monotone in $\theta$, $\log(p_i/(1 - p_i)) = \theta_i$, and for $q = 1 - p$,
\[ b'(\theta) = \frac{n e^\theta}{1 + e^\theta} = np, \qquad b''(\theta) = \frac{n e^\theta}{(1 + e^\theta)^2} = npq, \qquad b^{(3)}(\theta) = \frac{n e^\theta}{(1 + e^\theta)^2} - \frac{2n e^{2\theta}}{(1 + e^\theta)^3} = npq - 2np^2 q. \]
Note that $\xi$ might have a lattice distribution if we use the same special design described in Example 2: $z_i^T = (1, \ldots, 1)$. In most cases, $\langle v, \xi\rangle$ does not have a lattice distribution, and it is approximately normally distributed.
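The derivatives of the cumulant function $b(\theta)$ quoted in Examples 1 to 3 can be checked numerically. The sketch below is ours, not part of the original derivation: it compares central finite differences of $b$ with the closed forms $r/\omega$, $r/\omega^2$, $2r/\omega^3$ (Gamma), $\omega$ (Poisson) and $np$, $npq$, $npq - 2np^2q$ (Binomial). The parameter values are arbitrary illustrations.

```python
import math


def num_derivs(b, theta, h=1e-2):
    """Central finite differences for b', b'' and b''' at theta."""
    f = lambda k: b(theta + k * h)
    d1 = (f(1) - f(-1)) / (2.0 * h)
    d2 = (f(1) - 2.0 * f(0) + f(-1)) / h ** 2
    d3 = (f(2) - 2.0 * f(1) + 2.0 * f(-1) - f(-2)) / (2.0 * h ** 3)
    return d1, d2, d3


def gamma_case(r, omega):
    """theta = -omega, b(theta) = -r log(-theta)."""
    numeric = num_derivs(lambda t: -r * math.log(-t), -omega)
    exact = (r / omega, r / omega ** 2, 2.0 * r / omega ** 3)
    return numeric, exact


def poisson_case(omega):
    """theta = log(omega), b(theta) = exp(theta)."""
    numeric = num_derivs(math.exp, math.log(omega))
    return numeric, (omega, omega, omega)


def binomial_case(n, p):
    """theta = logit(p), b(theta) = n log(1 + exp(theta))."""
    q = 1.0 - p
    numeric = num_derivs(lambda t: n * math.log1p(math.exp(t)), math.log(p / q))
    exact = (n * p, n * p * q, n * p * q - 2.0 * n * p * p * q)
    return numeric, exact


for name, (numeric, exact) in [
    ("gamma", gamma_case(2.0, 1.5)),
    ("poisson", poisson_case(3.0)),
    ("binomial", binomial_case(30, 0.3)),
]:
    print(name, [round(v, 4) for v in numeric], [round(v, 4) for v in exact])
```

The third derivative is the quantity that drives the bias and skewness terms; it vanishes in the Binomial case at $p = 0.5$, which is consistent with the remark in Section 5 that skewness cannot explain the poor coverage there.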
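Proof of Proposition 3.1 and Equation (11). Let $\beta_0$ be the true value of $\beta$, set
\[ \theta_{i,0} = z(x_i)^T \beta_0, \]
and denote
\[ \zeta_n := \frac{1}{n}\sum_{i=1}^n \left(Y_i - b'(\theta_{i,0})\right) z_i \quad \text{and} \quad g_n(\beta) := \frac{1}{n}\sum_{i=1}^n \left(b'(\theta_i) - b'(\theta_{i,0})\right) z_i. \]
Then from (6) we see that $\hat\beta$ is a solution to
\[ \zeta_n = g_n(\beta). \qquad (A.1) \]
Note that (A.1) has as a first-order approximation the equation
\[ \zeta_n = \frac{1}{n}\sum_{i=1}^n b^{(2)}(\theta_{i,0}) \langle z_i, \beta - \beta_0\rangle z_i = A_n(\beta - \beta_0) \]
with solution $\hat\beta_{n,1}$ given by $\hat\beta_{n,1} - \beta_0 = A_n^{-1}\zeta_n$. A second-order approximation to (A.1) yields the equation
\[ \zeta_n = \frac{1}{n}\sum_{i=1}^n b^{(2)}(\theta_{i,0}) \langle z_i, \beta - \beta_0\rangle z_i + \frac{1}{2n}\sum_{i=1}^n b^{(3)}(\theta_{i,0}) \langle z_i, \beta - \beta_0\rangle^2 z_i. \]
Replacing $\beta - \beta_0$ with the first order solution $A_n^{-1}\zeta_n$ in the second term on the right hand side above yields a linear equation with solution $\hat\beta_{n,2}$. Continuing in this way one can obtain an expansion for $\hat\beta_n$ defined in (A.1) in terms of $\zeta_n$. Taking the branch that sets $\hat\beta_n = \beta_0$ when $\zeta_n = 0$ yields
\[ \hat\beta_n - \beta_0 = A_n^{-1}\zeta_n - \frac{1}{2n}\sum_{i=1}^n b^{(3)}(\theta_{i,0}) \left[z(x_i)^T A_n^{-1}\zeta_n\right]^2 A_n^{-1} z(x_i) + o_p(|\zeta_n|^2), \]
i.e. $\hat\beta_n = \hat\beta_{n,2}$ if the $o_p(|\zeta_n|^2)$ term above is zero. This is equivalent to
\[ B_n\sqrt{n}\,(\hat\beta_n - \beta_0) = \xi - \frac{1}{2n^{3/2}}\sum_{i=1}^n b^{(3)}(\theta_{i,0}) \langle u_i, \xi\rangle^2 u_i + o_p(n^{1/2}|\zeta_n|^2). \qquad (A.2) \]
Note that $\zeta_n = n^{-1/2} B_n^T \xi = O_p(n^{-1/2})$ with $\xi$ defined in (12), so (A.2) implies (11). The moments of $\xi$ can be calculated easily as:
\[ \begin{aligned} E\langle a_0, \xi\rangle &= 0, \\ E\langle a_0, \xi\rangle\langle a_1, \xi\rangle &= \langle a_0, a_1\rangle, \\ E\langle a_0, \xi\rangle\langle a_1, \xi\rangle\langle a_2, \xi\rangle &= \frac{1}{n^{3/2}}\sum_{l=1}^n b^{(3)}(\theta_l) \langle a_0, u_l\rangle\langle a_1, u_l\rangle\langle a_2, u_l\rangle, \\ E\langle a_0, \xi\rangle\langle a_1, \xi\rangle\langle a_2, \xi\rangle\langle a_3, \xi\rangle &= \langle a_0, a_1\rangle\langle a_2, a_3\rangle + \langle a_0, a_2\rangle\langle a_1, a_3\rangle + \langle a_0, a_3\rangle\langle a_1, a_2\rangle \\ &\quad + \frac{1}{n^2}\sum_{l=1}^n \left[b^{(4)}(\theta_l) + 3b''(\theta_l)\right] \langle a_0, u_l\rangle\langle a_1, u_l\rangle\langle a_2, u_l\rangle\langle a_3, u_l\rangle \end{aligned} \qquad (A.3) \]
for any non-random $a_0, \ldots, a_3$. It then follows from (A.2) and (A.3) that the bias and skewness of $\langle v, B_n\sqrt{n}(\hat\beta_n - \beta_0)\rangle$ are $\langle v, \mu_n\rangle$ and $\kappa_n(v)$ in (13) and (14), respectively. In the non-lattice case it follows again from (A.2) that, for $\Phi$ and $\phi$ denoting the standard normal distribution and density, respectively,
\[ \mathrm{pr}\left\{\left\langle v, B_n\sqrt{n}(\hat\beta_n - \beta_0) - \mu_n\right\rangle \le x\right\} - \Phi(x) - \frac{1}{6}\kappa_n(v)(1 - x^2)\phi(x) = o\left(\frac{1}{\sqrt{n}}\right). \qquad (A.4) \]
Taking note of
\[ \mathrm{pr}\left\{Z + \frac{1}{6}\kappa_n(v)\left[Z^2 - 1\right] \le x\right\} - \mathrm{pr}\left\{Z + \frac{1}{6}\kappa_n(v)\left[x^2 - 1\right] \le x\right\} = o\left(\frac{1}{\sqrt{n}}\right) \qquad (A.5) \]
where $\kappa_n(v) = O(1/\sqrt{n})$, and
\[ \mathrm{pr}\left\{Z \le x + \frac{1}{6}\kappa_n(v)(1 - x^2)\right\} - \Phi(x) - \frac{1}{6}\kappa_n(v)(1 - x^2)\phi(x) = o\left(\frac{1}{\sqrt{n}}\right), \]
we obtain
\[ \mathrm{pr}\left\{Z + \frac{1}{6}\kappa_n(v)\left[Z^2 - 1\right] \le x\right\} - \Phi(x) - \frac{1}{6}\kappa_n(v)(1 - x^2)\phi(x) = o\left(\frac{1}{\sqrt{n}}\right); \qquad (A.6) \]
thus we see from (A.4) and (A.6) that, up to order $o_p(1/\sqrt{n})$,
\[ \left\langle v, B_n\sqrt{n}(\hat\beta_n - \beta_0) - \mu_n\right\rangle =_d Z + \frac{1}{6}\kappa_n(v)\left[Z^2 - 1\right]. \]
The inverse Edgeworth expansion now follows. The lattice case proceeds similarly, but using the notation in Hall (1992, p. 46). $\Box$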
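The approximation (A.6) can be checked without simulation, because the event $\{Z + a(Z^2 - 1) \le x\}$ with $a = \kappa/6$ is a quadratic inequality in $Z$. The sketch below is our illustration, not part of the proof: it computes the exact probability from the two roots and compares it with $\Phi(x) + \frac{1}{6}\kappa(1 - x^2)\phi(x)$. The skewness values used are arbitrary.

```python
import math


def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def exact_cdf(x, kappa):
    """pr{Z + (kappa/6)(Z^2 - 1) <= x} for Z ~ N(0, 1) and kappa != 0."""
    a = kappa / 6.0
    disc = 1.0 + 4.0 * a * (a + x)  # discriminant of a z^2 + z - (a + x) = 0
    if disc <= 0.0:
        # no real roots: the quadratic keeps the sign of a everywhere
        return 0.0 if a > 0 else 1.0
    root = math.sqrt(disc)
    r1, r2 = sorted(((-1.0 - root) / (2.0 * a), (-1.0 + root) / (2.0 * a)))
    between = Phi(r2) - Phi(r1)
    # a > 0: inequality holds between the roots; a < 0: outside them
    return between if a > 0 else 1.0 - between


def edgeworth_cdf(x, kappa):
    """First-order approximation as in (A.6)."""
    return Phi(x) + kappa / 6.0 * (1.0 - x * x) * phi(x)


def max_error(kappa, grid):
    return max(abs(exact_cdf(x, kappa) - edgeworth_cdf(x, kappa)) for x in grid)


grid = [-3.0 + 0.5 * i for i in range(13)]
for kappa in (0.12, 0.06, -0.12):
    print(kappa, max_error(kappa, grid))
```

Halving $\kappa$ should cut the maximum error by roughly a factor of four, in line with a remainder that is of second order in $\kappa_n(v) = O(n^{-1/2})$.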
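Proof of (25) and (26). For $\hat A_n = A_n(\hat\beta)$, since
\[ \begin{aligned} B_n \hat A_n^{-1} B_n^T &= B_n (A_n + \hat A_n - A_n)^{-1} B_n^T = \left[I + (B_n^T)^{-1}(\hat A_n - A_n) B_n^{-1}\right]^{-1} \\ &= I - (B_n^T)^{-1}(\hat A_n - A_n) B_n^{-1} + o_p(\hat A_n - A_n) \\ &= I - \frac{1}{n}\sum_{i=1}^n u_i u_i^T \left(b''(\hat\theta_i) - b''(\theta_i)\right) + o_p(\hat A_n - A_n) \qquad \text{(by (8))} \\ &= I - \frac{1}{n^{3/2}}\sum_{i=1}^n b^{(3)}(\theta_i)\, u_i u_i^T \langle u_i, \xi\rangle (1 + o_p(1)) \qquad \text{(by (A.2))}, \end{aligned} \]
we have for $v = (B_n^T)^{-1} z(x)$ that $s_n(x) = v/\|v\|$ and (omitting the subscripts of s)
\[ \begin{aligned} \frac{\sigma(x)}{\hat\sigma(x)} &= \left[\frac{z(x)^T A_n^{-1} z(x)}{z(x)^T \hat A_n^{-1} z(x)}\right]^{1/2} = \frac{\|v\|}{\left[v^T B_n \hat A_n^{-1} B_n^T v\right]^{1/2}} \\ &= \left[1 - \frac{1}{n^{3/2}}\sum_{i=1}^n b^{(3)}(\theta_i) \left\langle \frac{v}{\|v\|}, u_i\right\rangle^2 \langle u_i, \xi\rangle (1 + o_p(1))\right]^{-1/2} \\ &= 1 + \frac{1}{2n^{3/2}}\sum_{i=1}^n b^{(3)}(\theta_i) \langle s(x), u_i\rangle^2 \langle u_i, \xi\rangle + o_p(n^{-1/2}). \end{aligned} \]
Hence, from (A.2) again,
\[ \begin{aligned} W_n(x) &= \left\langle s(x), B_n\sqrt{n}(\hat\beta - \beta_0)\right\rangle \sigma(x)/\hat\sigma(x) \\ &= \langle s(x), \xi\rangle + \frac{1}{2n^{3/2}}\sum_{i=1}^n b^{(3)}(\theta_i) \left[\langle s(x), u_i\rangle^2 \langle u_i, \xi\rangle \langle s(x), \xi\rangle - \langle u_i, s(x)\rangle \langle u_i, \xi\rangle^2\right] + o_p(n^{-1/2}), \end{aligned} \]
which is (25). The equations in (26) then easily follow from (A.3). $\Box$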
Proof of Proposition 4.1: By the inverse Edgeworth expansion (27) and the notation in (18),
\[ P_l(c) \approx \mathrm{pr}\left(\sup_{x \in \mathcal{X}} [W(x) + R(x)] > c\right) \le \mathrm{pr}\left(\sup_{x \in \mathcal{X}} W(x) + R > c\right) \]
where $R = O_p(n^{-1/2})$. Let $r_n$ be positive constants for which $P\{R_n \ge r_n\} = o(\alpha)$ as $n \to \infty$. Then, similar to establishing the first level approximation (23) to the basic SCR, we have
\[ P_l(c) \le \mathrm{pr}\left\{\sup_{x \in \mathcal{X}} W(x) > c - r\right\} + o(\alpha) = \frac{\kappa_0}{2\pi}\, e^{-(c - r)^2/2} + (1 - \Phi(c - r)) + o(\alpha). \]
Setting this last formula to $\alpha$ gives (36). This is a conservative confidence band, but the nominal level will be closely approximated if r is close to R and $\alpha$ is small. An approximate lower bound to $P_l(c)$ can be obtained similarly, which leads to (37). $\Box$
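The last display can be inverted numerically for the critical value. The following is a rough sketch under stated assumptions, not the parfit code: the function names are ours; the two-sided tail is taken to be twice the one-sided expression, which is our reading of the bound $2\,\mathrm{pr}\{\sup W(x) > c - r\}$ used for Proposition 4.2; and $\kappa_0$ is approximated, for a two-parameter model, by the chord length of $s(x)$ over a grid, using the covariance matrix reported for the insect data. The grid with endpoints 0 and 4 is also an assumption.

```python
import math


def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tube_tail(c, kappa0, sides=1):
    """kappa0/(2 pi) exp(-c^2/2) + (1 - Phi(c)), multiplied by `sides`."""
    one_sided = kappa0 / (2.0 * math.pi) * math.exp(-0.5 * c * c) + (1.0 - Phi(c))
    return sides * one_sided


def critical_value(alpha, kappa0, r=0.0, sides=1):
    """Solve tube_tail(c - r) = alpha by bisection on t = c - r in [0, 10]."""
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if tube_tail(mid, kappa0, sides) > alpha:
            lo = mid  # tail still too large: move right
        else:
            hi = mid
    return r + 0.5 * (lo + hi)


def kappa0_2d(cov, grid):
    """Length of s(x) = L^T z(x) / ||L^T z(x)||, z(x) = (1, x), cov = L L^T."""
    a = math.sqrt(cov[0][0])
    b = cov[0][1] / a
    c = math.sqrt(cov[1][1] - b * b)
    pts = []
    for xv in grid:
        v0, v1 = a + b * xv, c * xv
        nrm = math.hypot(v0, v1)
        pts.append((v0 / nrm, v1 / nrm))
    # sum of chord lengths between consecutive points on the unit circle
    return sum(math.hypot(p[0] - q[0], p[1] - q[1]) for p, q in zip(pts, pts[1:]))


cov = [[0.17462, -0.06582], [-0.06582, 0.03291]]  # reported in Section 5
grid = [4.0 * i / 40 for i in range(41)]            # 41 points on [0, 4]
k0 = kappa0_2d(cov, grid)
print(k0, critical_value(0.05, k0, sides=2))
```

With these inputs the two-sided critical value comes out very close to the c = 2.4304 quoted for the insect data, which suggests that the sketch matches the first level approximation used there; a positive r simply shifts the critical value up by r.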
Proof of Proposition 4.2 is similar to that of Proposition 4.1. The main dierence are as follows. For the upper bound approximation, we note that
!
P (c) = pr supfjW (x)j ? p (x; W (x))g > c + o() x2X ! pr sup jW (x)j + Rp0 > c + o() (x2X ) 0 2pr sup W (x) > c ? rp + o(): 2
x2X
So, we have (43) by setting the last formula equal to . For the lower bound approximation, note that for c0 = c ? rp0 ,
P (c)
=
!
pr sup jW (x)j ? Rp0 > c + o() ) (x2X 0 pr sup jW (x)j > c + o() x2X ) Z1 ( c0 c0
pr sup j hs(x); U i j y g(y; n)dy + o() x2X
where g(y; n) is the density of a random variable with n degrees of freedom and U is a uniform random variable on the unit sphere S n? . Let T (r jM) be a tube of M = fs(x); x 2 Xg with radius r (by abusing the notation r used earlier to indicate another constant { radius): 1
T (r jM) = fu : u 2 S n? ; xinf ku ? s(x)k rg: 2X 1
Then, in the one-sided case,
0) c P sup hs(x); U i y = P fU 2 T (r jM)g x2X
(
34
q
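by the simple equality $\|a - b\|^2 = 2 - 2\langle a, b\rangle$, where $a, b \in S^{n-1}$ and $r = \sqrt{2(1 - c'/y)}$. In the two-sided case,
$$P\Bigl\{\sup_{x\in\mathcal{X}} |\langle s(x), U\rangle| \ge \frac{c'}{y}\Bigr\} = P\bigl\{U \in T(r \mid \mathcal{M}) \cup T(r \mid -\mathcal{M})\bigr\}.$$
The probability on the right-hand side is equal to the ratio of the volume of the tubular neighborhood $T(r \mid \mathcal{M}) \cup T(r \mid -\mathcal{M})$ to the volume of the unit sphere $S^{n-1}$. Applying Weyl's formula (1939), as in Sun (1993), to the volume of the tubular neighborhood, and ignoring any possible boundary effect, we have
$$
\begin{aligned}
\int_{c'}^{\infty} \mathrm{pr}\Bigl\{\sup_{x\in\mathcal{X}} |\langle s(x), U\rangle| \ge \frac{c'}{y}\Bigr\}\, g(y; n)\,dy
&\approx \int_{c'}^{\infty} \bigl(2\kappa_0 - \kappa_1(y)\bigr)\, J\!\left(\frac{c'}{y}\right) g(y; n)\,dy \\
&= \frac{2\kappa_0 - \kappa_1}{2\pi}\, e^{-c'^2/2},
\end{aligned}
$$
where $\kappa_0$ is in (21) and
$$J(w) = \frac{\Gamma(n/2)}{\pi\,\Gamma((n-2)/2)} \int_w^1 (1 - t^2)^{(n-4)/2}\, t\,dt,$$
$$\kappa_1 = 2\pi\, e^{c'^2/2} \int_{c'}^{\infty} \kappa_1(y)\, J\!\left(\frac{c'}{y}\right) g(y; n)\,dy,$$
$$\kappa_1(y) = \int_{\mathcal{X}_1(y)} \|s'(x)\|\,dx, \qquad \mathcal{X}_1(y) = \bigl\{x \in \mathcal{X} : s(x) \in T(r \mid \mathcal{M}) \cap T(r \mid -\mathcal{M})\bigr\}, \qquad r = \sqrt{2(1 - c'/y)}.$$
If $\kappa_1(y)$ is independent of $y$ for $y \ge C$, some constant, then
$$\kappa_1 = \int_{\mathcal{X}_1} \|s'(x)\|\,dx, \qquad \mathcal{X}_1 = \{s(x), x \in \mathcal{X}\} \cap \{-s(x), x \in \mathcal{X}\}.$$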
This is true for most regression problems. Also recall the discussion on computing $\kappa_1$ in §4. In summary, the term $\kappa_1$ is included to make sure that the intersection of $\mathcal{M}$ and $-\mathcal{M}$ is counted only once in computing the volume. Finally, adding the boundary correction $2(1 - \Phi(c))$, as in Sun and Loader (1994), leads to the liberal formula (44). $\Box$
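The step from the integral over $y$ to a multiple of $e^{-c'^2/2}$ rests on a $\chi$-mixture identity. The sketch below checks it numerically under one stated assumption: that the tube cross-section factor is proportional to $(1 - w^2)^{(n-2)/2}$, as in the standard one-dimensional tube formula, in which case $\int_{c}^{\infty} (1 - c^2/y^2)^{(n-2)/2}\, g(y; n)\,dy = e^{-c^2/2}$. The helper names `chi_pdf`, `simpson` and `mixture_integral`, the truncation point of the integral, and the step count are illustrative choices, not part of the paper.

```python
import math


def chi_pdf(y, n):
    """Density g(y; n) of a chi random variable with n degrees of freedom."""
    if y <= 0.0:
        return 0.0
    log_pdf = ((n - 1) * math.log(y) - 0.5 * y * y
               - (0.5 * n - 1.0) * math.log(2.0) - math.lgamma(0.5 * n))
    return math.exp(log_pdf)


def simpson(f, a, b, steps):
    """Composite Simpson rule on [a, b] with an even number of steps."""
    if steps % 2:
        steps += 1
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def mixture_integral(c, n, steps=20000):
    """Approximate int_c^inf (1 - c^2/y^2)^((n-2)/2) g(y; n) dy.

    The upper limit is truncated where the chi density is negligible;
    the truncation point is a heuristic, not a result from the paper.
    """
    upper = c + math.sqrt(n) + 12.0

    def integrand(y):
        base = max(0.0, 1.0 - (c * c) / (y * y))
        return base ** (0.5 * (n - 2)) * chi_pdf(y, n)

    return simpson(integrand, c, upper, steps)


if __name__ == "__main__":
    c, n = 2.5, 10
    print(mixture_integral(c, n), math.exp(-0.5 * c * c))
```

The two printed numbers should agree closely for moderate $n$; the check says nothing about the boundary or overlap corrections, which enter separately.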
References
[1] Bliss, C. I. (1935). The calculation of the dosage-mortality curve. Annals of Applied Biology, 22, 134-167.
[2] Brown, L. D. (1986). Fundamentals of Statistical Exponential Families: with Applications in Statistical Decision Theory. IMS, Hayward, Calif.
[3] Faraway, J. and Sun, J. (1995). Simultaneous confidence bands for linear regression with heteroscedastic errors. Journal of American Statistical Association, 90, 1094-1098.
[4] Hall, P. and Titterington, D. M. (1988). On confidence bands in nonparametric density estimation and regression. J. Multivariate Anal. 27, 228-254.
[5] Hall, P. (1992). The Bootstrap and Edgeworth Expansion. Springer.
[6] Härdle, W. and Marron, J. S. (1991). Bootstrap simultaneous error bars for nonparametric regression. Ann. Statist. 19, 778-796.
[7] Johansen, S. and Johnstone, I. (1990). Hotelling's Theorem on the volume of tubes: some illustrations in simultaneous inference and data analysis. Ann. Statist. 18, 652-684.
[8] Knafl, G., Sacks, J. and Ylvisaker, D. (1985). Confidence bands for regression functions. J. Amer. Statist. Assoc. 80, 683-691.
[9] Knowles, M. and Siegmund, D. (1989). On Hotelling's geometric approach to testing for a nonlinear parameter in regression. Internat. Statist. Rev. 57, 205-220.
[10] Kreyszig, Erwin (1968). Introduction to Differential Geometry and Riemannian Geometry. University of Toronto Press.
[11] Loader, C. (1993). Nonparametric regression, confidence bands and bias correction. Proceedings of the 25th Symposium on the Interface Between Computer Science and Statistics, 131-136.
[12] Loader, C. and Sun, J. (1997). Robustness of tube formula. J. Comp. Graph. Statist. 6, No. 2, 242-250.
[13] Naiman, D. Q. (1987). Simultaneous confidence bounds in multiple regression using predictor variable constraints. J. Amer. Statist. Assoc. 82, 214-219.
[14] Naiman, D. Q. (1990). On volumes of tubular neighborhoods of spherical polyhedra and statistical inference. Ann. Statist. 18, 685-716.
[15] Naiman, D. Q. and Wynn, H. P. (1997). Abstract tubes, improved inclusion-exclusion identities and inequalities and importance sampling. Ann. Statist. 25, No. 5, 1954-1983.
[16] Scheffé, H. (1959). The Analysis of Variance. John Wiley & Sons, New York.
[17] Skovgaard, Ib M. (1981a). Transformation of an Edgeworth expansion by a sequence of smooth functions. Scand. J. Statist. 8, 207-217.
[18] Skovgaard, Ib M. (1981b). Edgeworth expansions of the distribution of maximum likelihood estimators in the general (non i.i.d.) case. Scand. J. Statist. 8, 227-236.
[19] Sun, Jiayang (1993). Tail probabilities of the maxima of Gaussian random fields. Annals of Probability, 21, 34-71.
[20] Sun, Jiayang and Loader, Clive (1994). Simultaneous confidence bands for linear regression and smoothing. Annals of Statistics, 22, No. 3, 1328-1345.
[21] Sun, J., Raz, J. and Faraway, J. (1998). Simultaneous confidence bands for growth and response curves. Statistica Sinica. In press.
[22] Weyl, H. (1939). On the volume of tubes. Amer. J. Math. 61, 461-472.
Index

$n$: sample size
$x_i$: $i$th predictor
$\mathcal{X}$: domain of interest about $x_i$
$y_i$ or $Y_i$: $i$th response variable
$g$: link function, $g(\eta(x_i)) = E(Y_i \mid x_i)$
$f$: density of $Y_i$, see (1)
$a(y)$: component of $f$, see (1)
$b(\theta)$: component of $f$, see (1)
$\theta$: natural parameter, see (1)
$\eta_i, z_i, \ldots$: subscripted functions, denoting the function evaluated at $x_i$, e.g. $z_i = z(x_i)$, $\eta_i = \eta(x_i)$
$\eta(x)$: linear predictor, $\eta(x) = z(x)^T \beta$
$\beta$: unknown $q$-dimensional parameter
$z(x)$: known $q$-dimensional covariate
$\hat{\ }$: denotes an estimate or estimator, e.g. $\hat\beta$ is an estimator of $\beta$
$l$ or $l_c(x)$: two-sided basic SCR in (2)
$l^u(x)$: one-sided basic upper SCR in (4)
$l_l(x)$: one-sided basic lower SCR in (4)
$\hat\sigma^2(x)$: estimated variance of $\hat\eta$, see (7)
$I_n(\beta)$: $q \times q$ Fisher information matrix, see (8)
$X$: $n \times q$ design matrix, $X^T = (z_1, \ldots, z_n)$, see (8)
$\cdot$: $n \times n$ covariance matrix, see (8)
$B_n$: upper Cholesky triangle, $B_n^T B_n = A_n$
$A_n$: $A_n = A_n(\beta) = I_n(\beta)/n$
$w_i$: $w_i = A_n^{-1} z_i$
$u_i$: $u_i = (B_n^T)^{-1} z_i$
$S^{d-1}$: unit sphere, $S^{d-1} = \{v \in \mathbb{R}^d : \|v\| = 1\}$
$\|\cdot\|$: Euclidean norm, $\|v\|^2 = \sum_{i=1}^{d} v_i^2$
$\xrightarrow{d}$: convergence in distribution, see the 2nd paragraph in §3
$\stackrel{d}{=}$: equality of distributions, see the 2nd paragraph in §3
$\cdot$: normalized vector, see (12)
$\cdot$ or $\cdot_n$: bias, see (13)
$\cdot_n$: skewness, see (14)
$T_n(\cdot)$: lattice term, see (15)
$\cdot_n$: eigenvalues of $A_n$, see Proposition 3.1
$h$ or $H$: periodic function, see (15)
$W_n(x)$: $W_n(x) = (\hat\eta(x) - \eta(x))/\hat\sigma(x)$, see (17)
$P(c)$: $P(c) = \mathrm{pr}\{\sup_{x\in\mathcal{X}} |W_n(x)| > c\}$, see (18)
$P^l(c)$: $P^l(c) = \mathrm{pr}\{\sup_{x\in\mathcal{X}} W_n(x) < c\}$, see (18)
$P^u(c)$: $P^u(c) = \mathrm{pr}\{\inf_{x\in\mathcal{X}} W_n(x) < -c\}$, see (18)
$W(x)$: limit process of $W_n(x)$, see (19)
$s(x)$: $s(x) = \lim_{n\to\infty} s_n(x)$
$s_n(x)$: $s_n(x) = (B_n^T)^{-1} z(x) / (\sqrt{n}\,\sigma(x))$, see (20)
$\Phi(\cdot)$: standard normal cumulative distribution
$\kappa_0$: volume of $\{s(x) : x \in \mathcal{X}\}$, see (21) and (45)
$W_n(x)$: $W_n(x) = (\hat\eta(x) - \eta(x))/\sigma(x)$, see (24)
$\cdot_n$: bias, see (26)
$\cdot_n$: skewness of $W_n$, see (26)
$W_n^{[\cdot]}(x)$: see (33)
$l^{u,c}(x)$: one-sided corrected upper SCR, see (34)
$l_{l,c}(x)$: one-sided corrected lower SCR, see (35)
$R$ or $R(x)$: see Proposition 4.1
$r, \hat r, \hat r_1$: see Proposition 4.1 and (47)
$\cdot_1$: see Proposition 4.1 and (47)
$p_2(x, z)$: see (30)
$\cdot_1(x)$: see (31)
$\cdot_i(x)$: $i = 1, \ldots, 4$, see (31)
$C_1, \ldots, C_9$: see (32)
$W_n^{(1)}(x)$: $W_n^{(1)}(x) = |W_n(x)| + p_2(x, W_n(x))$, see (38)
$W_n^{(2)}(x)$: $W_n^{(2)}(x) = |W_n^{(0)}(x)| + q_2(x, W_n^{(0)}(x))$, see (38)
$W_n^{(0)}(x)$: $W_n^{(0)}(x) = (W_n(x) - \hat{\cdot}_1(x)) / \sqrt{\hat{\cdot}_2(x)}$, see (39)
$q_2(x, z)$: see (40)
$\cdot_{2,2}, \cdot_{3,1}$: see (41)
$\cdot_{4,1}$: see (41)
$l_c^{(1)}(x)$: see (42)
$l_c^{(2)}(x)$: see (42)
$l_c^{(3)}(x)$: see Remark 3
$R', r', \hat r'$: see Proposition 4.2 and (48)
$r'', \hat r''$: see Remark 3