Appl. Math. J. Chinese Univ. 2010, 25(3): 307-317

Robust designs for models with possible bias and correlated errors

ZHOU Xiao-dong1,2    YUE Rong-xian1

Abstract. This paper studies the model-robust design problem for general models with an unknown bias or contamination and correlated errors. The true response function is assumed to belong to a reproducing kernel Hilbert space, and the errors are fitted by a qth-order moving average process MA(q), in particular MA(1) and MA(2) errors. In both situations, design criteria are derived in terms of the average expected quadratic loss of the least squares estimator by using a minimax method. A case is studied and the orthogonality of the criteria is proved for this special response. The robustness of the design criteria is discussed through several numerical examples.

Received: 2007-11-14. MR Subject Classification: 62K05, 62F35, 62C20. Keywords: Robust design, reproducing kernel Hilbert space, moving average process, Hermite polynomial. Digital Object Identifier (DOI): 10.1007/s11766-010-1922-9. Supported by NSFC grant (10671129), the Special Funds for Doctoral Authorities of Education Ministry (20060270002), E-Institutes of Shanghai Municipal Education Commission (E03004), Shanghai Leading Academic Discipline Project (S30405).

§1 Introduction

In this paper, we study designs for regression models, with an eye to attaining robustness against two violations of the classical assumptions:

I. The response is taken to be only approximately linear in the regressors, with

y(x_i) = E[y|x_i] + ε_i,  i = 1, ..., n,
E[y|x_i] = g'(x_i)θ + h(x_i),  (1)

for a p-dimensional regressor vector g depending on a q-dimensional vector x. The function h represents uncertainty about the exact nature of the regression response. One estimates θ but not h, leading to the biased estimation of E[y|x].

II. The random errors, with mean 0, are correlated, with

Var[ε] = σ^2 P,  (2)

where ε = (ε(x_1), ..., ε(x_n))'.

Violation I is commonly dealt with at the design stage. The model-robust design problem has been studied by many authors, whose investigations differ in the specification of the class H of bias functions h. In [1], Box and Draper restrict their attention to a finite-dimensional H. In [4,5,9], the authors deal with an infinite-dimensional H. Some of them take H = {h : |h(x)| ≤ φ(x), x ∈ X} under various assumptions on φ(x). The designs constructed appear to be quite sensitive to the assumed form of φ. The others take H = {h : ∫_X [h(x)]^2 dx ≤ η, ∫_X g_j(x)h(x)dx = 0, j = 1, ..., p}. Here η is assumed to be known, and the second condition ensures the identifiability of the θ_j. But this specification has been criticized as being applicable only to designs that are absolutely continuous on the design region X and have finite loss. The review by Chang and Notz [3] provides a good summary of the previous work on this subject. To avoid this limitation, Yue and Hickernell [13] make a more reasonable assumption on H, supposing H to be a reproducing kernel Hilbert space admitting a reproducing kernel K(x, w) and an inner product ⟨·,·⟩. They use the reproducing kernel space method to obtain robust designs. In this paper, we continue to study this assumption.

When the errors are correlated as in (2), the model-robust design problem becomes complicated. The most elegant results in this domain have been obtained in a series of papers started by Sacks et al. [7]. Another suggestion goes back to Birmkulov et al. [2]. For a recent survey and some alternative methods, see [6]. For approximate regression models, Wiens and Zhou [10,11] propose a minimax method to obtain robust designs when the errors are autocorrelated. Zhou [14,15] investigates robust designs for models with MA processes. In this paper, we take another approach to obtain robust designs for the approximate regression model with correlated errors. We mainly discuss errors that are fitted by MA processes.

We organize our paper as follows. In Section 2, we give the precise definition of the regression model. In Section 3, we derive the optimal design criteria for models with MA(1) and MA(2) errors. The efficiencies of our designs are also defined in this section. In Section 4, a case is studied and the orthogonality of our design criteria is proved in this special situation. Some numerical results are also presented in that section.

§2 Preliminaries

Let the true response be given by (1), and let the class H be a reproducing kernel Hilbert space admitting a reproducing kernel K(x, w) and an inner product ⟨·,·⟩. For the details of reproducing kernel Hilbert spaces, see [8]. We also assume that

∫_X g_j(x)h(x) dx = 0,  j = 1, ..., p,  h ∈ H.  (3)

The random errors have zero means and variance-covariance matrix σ^2 P. In this paper, we confine ourselves to the least squares estimators, both because these estimators do not depend on the type of deviation from the model and because the variance-covariance matrix is unknown. Let ξ_n be a sequence of n design points x_i in X and let y be the vector of n observations y_i. Let g be the vector of p regressors g_j and let X be the design matrix, i.e., X = (g(x_1), ..., g(x_n))'. Our consideration here is limited to designs ξ_n with nonsingular information matrix M = X'X. Then the least squares estimator of θ = (θ_1, ..., θ_p)' in the assumed model is θ̂ = M^{-1}X'y. Define h = (h(x_1), ..., h(x_n))'. We obtain the mean squared error of θ̂:

MSE(ξ_n, h, P) = E(θ̂ − θ)'(θ̂ − θ) = IV(ξ_n, P) + ISB(ξ_n, h),  (4)

where

IV(ξ_n, P) = σ^2 tr[M^{-1}X'PXM^{-1}],  (5)
ISB(ξ_n, h) = h'XM^{-2}X'h.  (6)

In the following lemma, we give a sharp bound on ISB(ξ_n, h) for h ∈ H.

Lemma 2.1. Let H be defined as above. For any design ξ_n, let K be the n × n matrix whose (i,j)-th entry is K(x_i, x_j). Then ISB(ξ_n, h) defined in (6) has the following upper bound:

ISB(ξ_n, h) ≤ ||h||^2 λ_max(X'KXM^{-2}),  (7)


where ||·|| is the norm in H and λ_max(·) is the maximum eigenvalue of a matrix. Equality in (7) holds when h(x) is a constant multiple of the function

W(x) = v'M^{-1}X'k(x),  (8)

in which v is the eigenvector of X'KXM^{-2} corresponding to λ_max and k is the vector of the n functions K(·, x_i), i = 1, ..., n. Lemma 2.1 can be proved in a way similar to that in [13]. From this lemma, we have

MSE(ξ_n, h, P) ≤ σ^2 tr[M^{-1}X'PXM^{-1}] + ||h||^2 λ_max(X'KXM^{-2}).  (9)

In particular, when the observations are homogeneous, i.e., Cov[ε] = σ^2 I_n, as considered by Yue and Hickernell [13], we have

MSE(ξ_n, h, I_n) ≤ σ^2 tr[M^{-1}] + ||h||^2 λ_max(X'KXM^{-2}).  (10)
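The bound (9) is straightforward to evaluate numerically. The following is a minimal sketch (our code, not from the paper), assuming the user supplies the design matrix X, the kernel matrix K = [K(x_i, x_j)], the correlation matrix P, the error variance σ^2 and the squared norm ||h||^2. It uses the fact that λ_max(X'KXM^{-2}) = λ_max(M^{-1}X'KXM^{-1}), since the two matrices are similar and the latter is symmetric.

```python
import numpy as np

def mse_upper_bound(X, K, P, sigma2, h_norm2):
    """Right-hand side of (9): integrated variance (5) plus the bias bound (7)."""
    M_inv = np.linalg.inv(X.T @ X)                        # M = X'X, assumed nonsingular
    iv = sigma2 * np.trace(M_inv @ X.T @ P @ X @ M_inv)   # IV of eq. (5)
    A = M_inv @ X.T @ K @ X @ M_inv                       # similar to X'KXM^{-2}
    lam_max = np.linalg.eigvalsh((A + A.T) / 2).max()     # symmetrize for stability
    return iv + h_norm2 * lam_max
```

Setting P = I_n recovers the homogeneous bound (10), since tr[M^{-1}X'XM^{-1}] = tr[M^{-1}].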

§3 Design criteria for models with correlated errors

In most practical situations, the observations are not homogeneous: successive observations can be correlated while observations farther apart are not. Thus moving average processes are reasonable models for the observation errors. The qth-order MA process MA(q) can be represented by

ε_i = w_i − φ_1 w_{i−1} − ... − φ_q w_{i−q},  i = 1, ..., n,

where φ_1, ..., φ_q are the MA(q) parameters and the w_i are white noise with mean 0 and variance σ_0^2. For this error model, we have

E[ε_i] = 0,  Var[ε_i] = σ_ε^2 = σ_0^2(1 + φ_1^2 + ... + φ_q^2),

Cov[ε] = Cov[(ε_1, ..., ε_n)'] = σ_ε^2 P,

where P is the autocorrelation matrix. The following theorems give the supremum of MSE(ξ_n, h, P) over H and P. These results can be easily proved after some algebraic transformations and by Lemma 2.1; see also [14]. Let the matrix B(s) = Σ_{i=1}^{n−s} g(x_i)g'(x_{i+s}) for s = 0, 1, ..., n − 1.

Theorem 3.1. Suppose that the errors in model (1) are from an MA(1) process satisfying the invertibility condition (i.e., the MA(1) parameter satisfies |φ_1| < 1) and that H is a reproducing kernel Hilbert space defined as above satisfying (3). Then

MSE(ξ_n, h, P_1) ≤ ||h||^2 λ_max(X'KXM^{-2}) + 2σ_0^2 [tr(B^{-1}(0)) + |tr(B^{-2}(0)B(1))|],  (11)

where P_1 = I − 2φ_1/(1 + φ_1^2) G_1 and G_1 is an n × n matrix with (G_1)_{ij} = 0.5 δ_{|i−j|=1}.

Theorem 3.2. Suppose that the errors in model (1) are from an MA(2) process satisfying the invertibility conditions (i.e., the MA(2) parameters satisfy φ_1 + φ_2 < 1, φ_2 − φ_1 < 1, and |φ_2| < 1) and that H is a reproducing kernel Hilbert space defined as above satisfying (3). Then

MSE(ξ_n, h, P_2) ≤ ||h||^2 λ_max(X'KXM^{-2}) + σ_0^2 max{a_1, a_2},  (12)

where

P_2 = I − 2φ_1(1 − φ_2)σ_0^2/σ_ε^2 G_1 − 2φ_2 σ_0^2/σ_ε^2 G_2,  (G_2)_{ij} = 0.5 δ_{|i−j|=2},  i, j = 1, ..., n,
a_1 = 2 tr[B^{-1}(0)] − 2 tr[B^{-2}(0)B(1)],
a_2 = 6 tr[B^{-1}(0)] + 8 |tr[B^{-2}(0)B(1)]| + 2 tr[B^{-2}(0)B(2)].
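To make the structure of P_1 and P_2 concrete, here is a small sketch (ours; the function names are illustrative) that builds the banded matrices G_1, G_2 and the autocorrelation matrix of Theorem 3.2, then checks it against simulated MA(2) noise. Setting φ_2 = 0 recovers P_1 of Theorem 3.1.

```python
import numpy as np

def banded(n, k):
    """The paper's G_k: an n x n matrix with 0.5 on the k-th off-diagonals."""
    G = np.zeros((n, n))
    i = np.arange(n - k)
    G[i, i + k] = G[i + k, i] = 0.5
    return G

def P2(n, phi1, phi2):
    """Autocorrelation matrix of MA(2) errors as in Theorem 3.2."""
    s2 = 1.0 + phi1**2 + phi2**2          # = sigma_eps^2 / sigma_0^2
    return (np.eye(n)
            - 2 * phi1 * (1 - phi2) / s2 * banded(n, 1)
            - 2 * phi2 / s2 * banded(n, 2))

# Empirical check: correlation matrix of simulated MA(2) errors.
rng = np.random.default_rng(0)
n, phi1, phi2 = 5, 0.4, 0.3                               # invertible: 0.7 < 1, -0.1 < 1, |0.3| < 1
w = rng.standard_normal((200_000, n + 2))                 # white noise w_i
eps = w[:, 2:] - phi1 * w[:, 1:-1] - phi2 * w[:, :-2]     # eps_i = w_i - phi1 w_{i-1} - phi2 w_{i-2}
print(np.abs(np.corrcoef(eps, rowvar=False) - P2(n, phi1, phi2)).max())  # close to zero
```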

Our criteria for choosing designs or comparing the efficiencies of different designs come from the upper bounds on MSE(ξ_n, h, P_i), i = 1, 2, given in (11) and (12), respectively. Let

J_{v1}(ξ_n) = 2[tr(B^{-1}(0)) + |tr(B^{-2}(0)B(1))|],
J_{v2}(ξ_n) = max{a_1, a_2},
J_b(ξ_n) = λ_max(X'KXM^{-2}),  (13)

and

J^1(ξ_n; η) = η J_{v1}(ξ_n) + (1 − η) J_b(ξ_n),  (14)
J^2(ξ_n; η) = η J_{v2}(ξ_n) + (1 − η) J_b(ξ_n),  (15)

where

η = σ_0^2 / (||h||^2 + σ_0^2).  (16)

We call J_{vi}(ξ_n) a variance discrepancy and J_b(ξ_n) a bias discrepancy, while J^i(ξ_n; η) is their weighted average. The upper bound for MSE(·) is (||h||^2 + σ_0^2) J^i(ξ_n; η). It is clear that η in (16) is independent of the design ξ_n and reflects the relative proportion of the variance discrepancy to the bias discrepancy. Values of η near 0 mean small variance error or serious bias, while values of η near 1 mean large variance error or small bias. Thus η can be understood as the experimenter's prior belief about the nature of the true response function. For a given value of η, J^i(ξ_n; η) depends only on the design points, not on the bias function h or the specific value of P_i. The smaller the value of J^i(ξ_n; η), the better the design ξ_n. Therefore, for fitting the linear model with bias and correlated errors by least squares, we should choose a design for which J^i(ξ_n; η) is as small as possible. A design that minimizes J^i(ξ_n; η) for a given η ∈ (0, 1) is called a compound optimal design, denoted by ξ_n^{i,η}. When η = 0, ξ_n^{i,η} is called an all-bias design; when η = 1, ξ_n^{i,η} is called an all-variance design.
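The criteria (13) to (15) depend on the design only through the matrices B(s) and X'KXM^{-2}, so a naive design search is easy to set up. Here is a hedged sketch (our code; note that B(0) = M = X'X):

```python
import numpy as np

def B(G, s):
    """B(s) = sum_{i=1}^{n-s} g(x_i) g(x_{i+s})', with G = (g(x_1),...,g(x_n))'."""
    return G[: len(G) - s].T @ G[s:]

def compound_J(G, K, eta, order=1):
    """J^i(xi_n; eta) of (14)/(15) for a candidate design with design matrix G
    (n x p) and kernel matrix K (n x n)."""
    B0_inv = np.linalg.inv(B(G, 0))            # B(0) = M
    t1 = np.trace(B0_inv)                      # tr[B^{-1}(0)]
    t2 = np.trace(B0_inv @ B0_inv @ B(G, 1))   # tr[B^{-2}(0)B(1)]
    if order == 1:                             # J_{v1} of (13)
        Jv = 2 * (t1 + abs(t2))
    else:                                      # J_{v2} = max{a_1, a_2}
        t3 = np.trace(B0_inv @ B0_inv @ B(G, 2))
        Jv = max(2 * t1 - 2 * t2, 6 * t1 + 8 * abs(t2) + 2 * t3)
    A = B0_inv @ G.T @ K @ G @ B0_inv          # similar to X'KXM^{-2}
    Jb = np.linalg.eigvalsh((A + A.T) / 2).max()
    return eta * Jv + (1 - eta) * Jb
```

A compound optimal design ξ_n^{i,η} can then be approximated by minimizing compound_J over candidate point sets, for instance by random search or a coordinate-exchange scheme.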

For model (1) with i.i.d. errors, the designs that minimize the bias discrepancy alone perform very well when bias is seriously present, and not badly even if there is no bias at all; see [1,13] for details. We expect the same conclusion to hold for model (1) with correlated errors. In order to compare the behavior of different designs, such as the all-variance and all-bias designs described above, we define the efficiency of a design ξ_n^* as

Eff_i(ξ_n^*, η) = min_{ξ_n} J^i(ξ_n; η) / J^i(ξ_n^*; η)  (17)

corresponding to the MA(i) process, i = 1, 2. In the following section, we simultaneously investigate the behavior of the classical design ξ_n^c obtained through the upper bound (10) on MSE(ξ_n, h, I_n), where the errors are assumed to be homogeneous, i.e., by minimizing

J^c(ξ_n; η) = η J_v^c(ξ_n) + (1 − η) J_b(ξ_n),  (18)

where J_v^c(ξ_n) = tr[M^{-1}]. With respect to the true variance-covariance matrix P_i^*, the efficiency of the classical design ξ_n^c relative to the J^i-optimal design ξ_n^i, which minimizes J^i(ξ_n; η), is defined by

Eff_c^i(η) = J(ξ_n^c, P_i^*; η) / J(ξ_n^i, P_i^*; η),  i = 1, 2,  (19)

where

J(ξ_n, P_i^*; η) = η J_v(ξ_n, P_i^*) + (1 − η) J_b(ξ_n),  J_v(ξ_n; P_i^*) = tr[X'P_i^* XM^{-2}].
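Given a posited true correlation matrix P_i^*, the comparison in (19) is again a few lines of linear algebra. A sketch (ours, with illustrative names); the inputs are the design and kernel matrices of the two designs being compared:

```python
import numpy as np

def J_true(G, K, P_star, eta):
    """J(xi_n, P*; eta): true variance discrepancy tr[X'P*XM^{-2}] plus bias term."""
    M_inv = np.linalg.inv(G.T @ G)
    Jv = np.trace(G.T @ P_star @ G @ M_inv @ M_inv)
    A = M_inv @ G.T @ K @ G @ M_inv
    Jb = np.linalg.eigvalsh((A + A.T) / 2).max()
    return eta * Jv + (1 - eta) * Jb

def eff_classical(G_c, K_c, G_i, K_i, P_star, eta):
    """Eff_c^i(eta) of (19): J at the classical design over J at the J^i-optimal design."""
    return J_true(G_c, K_c, P_star, eta) / J_true(G_i, K_i, P_star, eta)
```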


§4 Case study

Yue [12] studies the model-robust designs when the regressor vectors in model (1) are specified at the constant, first-degree, and second-degree polynomial stages, and the function h includes the effects of bias due to higher-degree terms, fitted by tensor products of Hermite polynomials. There, the observations are uncorrelated and homogeneous. In order to gain some insight into the influence of the error assumptions, we continue to study this case. We first prove that designs obtained through our criteria in (11) or (12) have the orthogonality property. Then we give the designs, calculated using Matlab (MathWorks), in diagrammatic representation. Under the true variance-covariance matrix P_i^*, we compare our designs with the classical ones, and the efficiencies calculated with (19) are given in tables.

Let F be a Hilbert space with reproducing kernel

K_0(x, t) = (1 − γ^2)^{-q/2} exp{ −[γ^2/(2(1 − γ^2))](x'x + t't) + [γ/(1 − γ^2)] x't },

and let H be the subspace of F satisfying the conditions

F = span{g_1, ..., g_p} ⊕ H,  ∫_{R^q} g_j(x)h(x)φ(x) dx = 0,

where

φ(x) = (2π)^{-q/2} exp(−x'x/2)

is the standard multivariate normal density. So H is a reproducing kernel Hilbert space. The regressors we consider in model (1) are the three vectors

g(x) = 1,  (20)
g(x) = (1, x_1, ..., x_q)',  (21)
g(x) = (1, x_1, ..., x_q, g_{12}, ..., g_{q(q−1)}, g_{11}, ..., g_{qq})',  (22)

where

g_{kl}(x) = x_k x_l for 1 ≤ k < l ≤ q,  g_{kl}(x) = (x_l^2 − 1)/√2 for 1 ≤ k = l ≤ q.

According to Lemma 1 in [12], the reproducing kernel K for H, a subspace of F, is given by

K(x, t) = K_0(x, t) − 1  (23)

corresponding to the constant regressor (20);

K(x, t) = K_0(x, t) − 1 − γ x't  (24)

corresponding to the first-degree regressor (21); and

K(x, t) = K_0(x, t) − 1 − γ x't − (γ^2/2)[(x't)^2 − x'x − t't + q]  (25)

corresponding to the second-degree regressor (22). We will prove that the designs derived from the J^i(ξ_n; η) have orthogonality.
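For concreteness, here is a sketch (ours, based on our reading of the garbled kernel display above) of K_0 and the first-degree kernel (24), with a numerical check of the rotation invariance asserted in Lemma 4.1 below; γ ∈ (0, 1) is assumed.

```python
import numpy as np

def K0(x, t, gamma):
    """The Gaussian-type reproducing kernel K_0 of F."""
    q, c = len(x), 1.0 - gamma**2
    return c**(-q / 2) * np.exp(-(gamma**2 / (2 * c)) * (x @ x + t @ t)
                                + (gamma / c) * (x @ t))

def K_deg1(x, t, gamma):
    """Kernel (24) of H for the first-degree regressor (21)."""
    return K0(x, t, gamma) - 1.0 - gamma * (x @ t)

# Lemma 4.1 numerically: K(Tx, Tt) = K(x, t) for an orthogonal T.
rng = np.random.default_rng(1)
q, gamma = 3, 0.5
T, _ = np.linalg.qr(rng.standard_normal((q, q)))      # a random orthogonal matrix
x, t = rng.standard_normal(q), rng.standard_normal(q)
print(np.isclose(K_deg1(T @ x, T @ t, gamma), K_deg1(x, t, gamma)))   # True
```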

Lemma 4.1. [12] Let T be a q × q orthogonal matrix. Then for each K(x, t) given in (23) to (25) and for x, t ∈ R^q, we have K(Tx, Tt) = K(x, t).

Lemma 4.2. [12] Let T = (t_ik)_{q×q} be an orthogonal matrix. Then for the first-degree regressor (21),

g(Tx) = T_1 g(x),

where T_1 = diag{1, T} is a (q + 1) × (q + 1) orthogonal matrix. For the second-degree regressor (22),

g(Tx) = T_2 g(x),

where T_2 = diag{1, T, D} is a (1 + 2q + q(q − 1)/2) × (1 + 2q + q(q − 1)/2) orthogonal matrix in which

D = ( D_1  D_2
      D_3  D_4 ).

Here D_1 is a q(q − 1)/2 × q(q − 1)/2 matrix, D_2 is a q(q − 1)/2 × q matrix, D_3 is a q × q(q − 1)/2 matrix and D_4 is a q × q matrix, with

D_1 = (t_ik t_jl + t_il t_jk)_{1≤i<j≤q, 1≤k<l≤q}
