Biometrika (2004), 91, 1, pp. 165–176 © 2004 Biometrika Trust Printed in Great Britain
A comparison of sequential and non-sequential designs for discrimination between nested regression models
BY HOLGER DETTE
Ruhr-Universität Bochum, Fakultät für Mathematik, 44780 Bochum, Germany
[email protected]
ROBERT KWIECIEN
RWTH Aachen, Institut für Medizinische Statistik, Pauwelstrasse 30, 52074 Aachen, Germany
[email protected]
SUMMARY
Classical regression analysis is usually performed in two steps. In a first step an appropriate model is identified to describe the data-generating process and in a second step statistical inference is performed in the identified model. In this paper we investigate a sequential and a non-sequential design strategy, which take into account these different goals of the analysis for a class of nested models. It is demonstrated that non-sequential designs usually identify the ‘correct’ model with a higher probability than sequential methods. Although non-sequential designs can never be guaranteed to achieve the best possible efficiency in the ‘correct’ model, it is demonstrated by means of a simulation study that for realistic sample sizes the efficiencies of the non-sequential designs for the estimation of the parameters in the ‘correct’ model are at least as high as the corresponding efficiencies of the sequential methods.

Some key words: Discrimination design; F-test; Optimal design; Polynomial regression; Robust design; Sequential design.
1. INTRODUCTION
Optimal design theory usually assumes precise knowledge of the underlying model of the data-generating process; see for example Silvey (1980). However, in many applications such knowledge is not available and the analysis of the data is usually performed in two steps. The data are first used to identify an appropriate model from a given class of competing models and the second step consists of statistical analysis in the identified model, such as parameter estimation or prediction. An optimal design for one task may be highly inefficient for the other. Consider for example the regression model

$Y = f(X) + \varepsilon$, (1·1)

where $\varepsilon$ has a standard normal distribution. The real-valued function $f$ is assumed to belong to a given class of linear nested models,

$\mathcal{F} = \{g_1, \dots, g_k\}$, (1·2)
say, where

$g_j(x) = \sum_{i=1}^{l_j} \beta_{j,i} f_i(x) \quad (j = 1, \dots, k)$ (1·3)

are the competing nested models, $1 \le l$ …

… $F_{1,n-k_0,1-\alpha}\}$, (3·10)

where $F(1, n-k_0, \delta^2)$ denotes a random variable with an F-distribution with $(1, n-k_0)$ degrees of freedom and noncentrality parameter

$\delta^2 = \beta_{k_0,k_0}^2 \, \frac{n}{\sigma^2} \left\{ e_{k_0}^T \left( \frac{1}{n} X_{k_0}^T X_{k_0} \right)^{-1} e_{k_0} \right\}^{-1}$, (3·11)
$F_{1,n-k_0,1-\alpha}$ is the corresponding $(1-\alpha)$-quantile of the central F-distribution, $e_{k_0} \in \mathbb{R}^{k_0}$ denotes the $k_0$th unit vector and $n$ is the sample size. Similarly, the rejection probability for the sequential procedure of Biswas & Chaudhuri (2002) in the final step is

$p(s, \tilde\delta^2) = \mathrm{pr}\{F(s, n-sk_0, \tilde\delta^2) > F_{s,n-sk_0,1-\alpha}\}$, (3·12)

where

$\tilde\delta^2 = \beta_{k_0,k_0}^2 \, \frac{n}{\sigma^2} \sum_{i=0}^{s-1} \frac{m_i}{n} \left\{ e_{k_0}^T \left( \frac{1}{m_i} X_{k_0}^{(i)T} X_{k_0}^{(i)} \right)^{-1} e_{k_0} \right\}^{-1}$ (3·13)
and $X_{k_0}^{(i)}$ is the design matrix from the sample of the $i$th step. It is easy to see, at least numerically, that for fixed $\delta^2$ the probabilities $(p(s, \delta^2))_{s=1,2,\dots}$ are decreasing with $s$. In Table 1 we show the smallest value $\tilde\delta^2$ such that $p(s, \tilde\delta^2) = p(1, \delta^2)$, for various values of $s$, $n$ and $k_0$, where we put $\delta^2 = 1$.

Table 1. Minimal value $\tilde\delta^2$ such that $p(s, \tilde\delta^2) = p(1, 1)$ for various values of $n$, $s$ and $k_0$

            k_0 = 2                    k_0 = 5                    k_0 = 9
   n    s=2   s=3   s=5   s=9     s=2   s=3   s=5   s=9     s=2   s=3   s=5
  50    1·42  1·76  2·35  3·42    1·43  1·79  2·49  7·18    1·45  1·87  4·33
 100    1·41  1·72  2·24  3·07    1·41  1·73  2·25  3·19    1·41  1·74  2·30
 200    1·40  1·71  2·19  2·95    1·41  1·71  2·20  2·98    1·40  1·71  2·20
   ∞    1·39  1·69  2·15  2·85    1·39  1·69  2·15  2·85    1·39  1·69  2·15
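The decrease of $p(s, \delta^2)$ in $s$ for fixed $\delta^2$ can be checked numerically. The following is a minimal Monte Carlo sketch, not the computation used for Table 1. It assumes a level $\alpha = 0·05$ (the level is not stated in this section), simulates the noncentral F variable directly from its definition as a ratio of chi-square variables, and replaces the exact quantile $F_{s,n-sk_0,1-\alpha}$ by an empirical one. The function names `noncentral_f_draw` and `rejection_probability` are hypothetical and introduced only for illustration.

```python
import random


def noncentral_f_draw(rng, s, denom_df, delta2):
    """Draw once from F(s, denom_df, delta2), built from its definition."""
    # Noncentral chi-square with s degrees of freedom and noncentrality delta2:
    # one shifted squared normal plus a central chi-square with s - 1 df.
    numerator = (rng.gauss(0.0, 1.0) + delta2 ** 0.5) ** 2
    if s > 1:
        numerator += rng.gammavariate((s - 1) / 2.0, 2.0)
    # Central chi-square with denom_df degrees of freedom.
    denominator = rng.gammavariate(denom_df / 2.0, 2.0)
    return (numerator / s) / (denominator / denom_df)


def rejection_probability(s, n, k0, delta2, alpha=0.05, draws=20000, seed=1):
    """Monte Carlo estimate of p(s, delta2) = pr{F(s, n - s*k0, delta2) > quantile}.

    alpha = 0.05 is an assumption; the section does not state the level used.
    """
    denom_df = n - s * k0
    if denom_df <= 0:
        raise ValueError("need n > s * k0 for the F-test to be defined")
    rng = random.Random(seed)
    # Empirical (1 - alpha)-quantile of the central F-distribution (delta2 = 0).
    central = sorted(noncentral_f_draw(rng, s, denom_df, 0.0) for _ in range(draws))
    critical = central[min(draws - 1, int((1.0 - alpha) * draws))]
    # Fraction of noncentral draws that exceed the critical value.
    hits = sum(noncentral_f_draw(rng, s, denom_df, delta2) > critical
               for _ in range(draws))
    return hits / draws


if __name__ == "__main__":
    # Same noncentrality, increasing number of steps s.
    for s in (1, 2, 3, 5):
        print(s, rejection_probability(s, n=100, k0=2, delta2=1.0))
```

Up to simulation error, the estimates should fall as $s$ grows with $\delta^2$ held fixed. Calling `rejection_probability(5, 100, 2, 2.24)` and comparing the result with `rejection_probability(1, 100, 2, 1.0)` gives a rough check of the corresponding table entry. Agreement depends on the level $\alpha$ assumed here matching the one used for the table.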
The values in Table 1 can be used as follows. Suppose that the sample size is $n = 100$, the number of unknown parameters in the true model is $k_0 + 1 = 3$ and the sequential procedure uses $s = 5$ steps. If $X_{k_0}$ is a non-sequential design with probability $p(1, \delta^2)$ of rejecting $H_{0k_0}$, then the noncentrality parameter of the test obtained from a sequential design $X_{k_0}^{(0)}, \dots, X_{k_0}^{(s)}$ has to satisfy $\tilde\delta^2 \ge 2·24\,\delta^2$ in order to obtain the same probability of rejection for the sequential procedure. Note that the values in the table are increasing with the number of steps in the sequential procedure. This means that an increasing number of steps in the sequential procedure yields a loss in the rate of correct specification, which we have also observed empirically; see our technical report. Therefore, in our numerical comparisons in §§3·2 and 3·3 we actually used a small number of stages for the sequential design. Sequential procedures with more steps will yield lower rates of correct specification.

In general, define for a non-sequential design $\xi$ with design matrix $X_{k_0}$ the information matrix $M_{k_0}(\xi) = n^{-1} X_{k_0}^T X_{k_0}$ and

$\mathrm{eff}_{k_0}(\xi) = \frac{\{e_{k_0}^T M_{k_0}^{-1}(\xi) e_{k_0}\}^{-1}}{\sup_{\eta} \{e_{k_0}^T M_{k_0}^{-1}(\eta) e_{k_0}\}^{-1}} \in [0, 1]$

as its $D_1$-efficiency for testing the hypothesis $H_{0k_0}$, where the supremum is taken over the class of all designs. Denote by $\xi_s = \sum_{i=0}^{s-1} m_i n^{-1} \xi^{(i)}$ the sequential design, where $\xi^{(i)}$ corresponds to the design matrix $X_{k_0}^{(i)}$ of the $i$th step, $i = 0, \dots, s-1$. If $c$ is the value obtained from Table 1 and the non-sequential design has $D_1$-efficiency

$\mathrm{eff}_{k_0}(\xi) > 1/c$, (3·14)

then we have that

$\frac{\tilde\delta^2}{\delta^2} = \frac{\sum_{i=0}^{s-1} m_i n^{-1} \, \mathrm{eff}_{k_0}(\xi^{(i)})}{\mathrm{eff}_{k_0}(\xi)}$ … (3·15)

… $\mathrm{eff}_{k_0}(\xi) > 0·44$ will yield a higher rate of correct specification than any sequential design of Biswas & Chaudhuri (2002). Note that the efficiencies $\mathrm{eff}_1(\xi), \dots, \mathrm{eff}_k(\xi)$ enter into the optimality criteria (2·1) and (2·3). Therefore the model-robust and optimal discrimination designs often have efficiencies larger than $1/c$ if the number of steps in the sequential procedure is large. Moreover, the estimate in (3·15) corresponds to an over-optimistic assessment of the sequential design, because we replaced all efficiencies $\mathrm{eff}_{k_0}(\xi^{(i)})$ by 1 to obtain the inequality. Since $\xi^{(i)}$ is a convex combination of the D-optimal designs for the models $g_1, \dots, g_k$, the efficiencies $\mathrm{eff}_{k_0}(\xi^{(i)})$ for testing the hypothesis $H_{0k_0}$ will usually be much smaller than 1 and the estimate in (3·15) can be improved substantially, if these efficiencies are known.

Consider for example the one-dimensional polynomial regression model of §3·2, the sequential design with $s = 2$ steps and assume that the quadratic model is correct, so that $k_0 = 2$. In the best case the quadratic model is correctly identified in the first step of the sequential procedure. In this optimal case the sequential design is adapted to the correct quadratic model in the second step, which yields for the $D_1$-efficiencies $\mathrm{eff}_{k_0}(\xi^{(1)}) \approx 63·2\%$ and $\mathrm{eff}_{k_0}(\xi^{(2)}) \approx 73·2\%$. From (3·15) we therefore obtain the improved estimate

$\frac{\tilde\delta^2}{\cdots} = \frac{0·5(0·632 + 0·732)}{\cdots} = \frac{0·667}{\cdots}$ …