GOODNESS OF FIT STATISTICS, DISCREPANCIES AND ROBUST DESIGNS

FRED J. HICKERNELL

Abstract. The Cramér–von Mises goodness-of-fit statistic, also known as the $L_2$-star discrepancy, is the optimality criterion for an experimental design for the location model with misspecification. This connection between goodness-of-fit statistics, discrepancies and experimental designs is shown to hold in greater generality.

Key words and phrases. Cramér–von Mises, low discrepancy, reproducing kernel Hilbert spaces.

This research was supported by Hong Kong RGC grant 94-95/38 and HKBU FRG grant 95-96/II-01.
1. Introduction

Experimental designs tell the experimenter how to set the levels of various controllable factors in an experiment so as to efficiently determine how these factors affect the response. Mathematically speaking, let $s$ denote the number of factors, and let $\mathcal{X}$ be a measurable subset of $\mathbf{R}^s$ corresponding to the experimental domain, or space of feasible levels of the factors. An experimental design is a finite set of points $\mathcal{P} \subset \mathcal{X}$. Several criteria have been proposed for choosing good designs. Orthogonal designs choose the columns of the design matrix to be orthogonal, so that the information matrix is diagonal. Optimal designs minimize or maximize some function of the information matrix. The optimality criterion depends on the assumed model for the response in terms of the explanatory variables. Since the true model may not be known in advance, robust designs are of interest. For example, consider the location model with misspecification:
$$y = g(x) + \epsilon, \qquad g(x) = \mu + h(x),$$

where $y$ denotes the response, $x$ is the vector of factors, $g(x)$ is some function of the factors, and $\epsilon$ denotes random noise with zero mean and variance $\sigma^2$. The function $g(x)$ is decomposed into its mean value with respect to some given distribution $F(x)$,

(1) $$\mu = \int_{\mathcal{X}} g(x) \, dF(x),$$
and the misspecification, $h(x)$. For a given design $\mathcal{P} = \{z^{(1)}, \ldots, z^{(N)}\} \subset \mathcal{X}$, one may observe the corresponding response values, $y^{(i)} = g(z^{(i)}) + \epsilon^{(i)}$, and then estimate the unknown constant $\mu$ by the sample mean of the $y^{(i)}$:
(2) $$\hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} y^{(i)}.$$
The mean square error of this estimate may be written as a sum of two parts: a bias term depending on the size of the misspecification, and a variance term depending on the size of the noise:

(3) $$E\{(\mu - \hat{\mu})^2\} = [\mathrm{Err}(h)]^2 + \frac{\sigma^2}{N},$$

where the square root of the bias term is

$$\mathrm{Err}(h) \equiv \int_{\mathcal{X}} h(x) \, dF(x) - \frac{1}{N} \sum_{i=1}^{N} h(z^{(i)}) = -\frac{1}{N} \sum_{i=1}^{N} h(z^{(i)}) = \mathrm{Err}(g).$$
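As a quick numerical illustration (not part of the original analysis), the decomposition (3) can be checked by simulation; the design, the misspecification $h$, and all names below are hypothetical examples.

    import numpy as np

    # Hypothetical check of the decomposition (3): the mean square error of the
    # sample mean equals [Err(h)]^2 + sigma^2 / N for the location model.
    rng = np.random.default_rng(0)

    mu, sigma, N = 2.0, 0.5, 16
    h = lambda x: x**2 - 1.0 / 3.0       # example misspecification, zero mean under F = U[0,1]
    P = (np.arange(N) + 0.5) / N         # an example design on [0, 1]

    # Bias term: Err(h) is the integral of h dF minus the design average;
    # here the integral is 0 by construction.
    err_h = -h(P).mean()

    # Empirical mean square error of mu_hat over many replications of the noise.
    reps = 200_000
    y = mu + h(P) + sigma * rng.standard_normal((reps, N))
    mse = ((mu - y.mean(axis=1)) ** 2).mean()

    print(mse, err_h**2 + sigma**2 / N)  # the two values should nearly agree

The two printed values agree up to Monte Carlo error, separating the bias contribution, which the design controls, from the noise contribution, which only $N$ controls.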
For the location model, the positions of the design points affect only the bias term, and not the variance term. The bias term can be further analyzed using either a worst-case or an average-case (Bayesian) approach. It can be shown that either approach produces an optimality criterion that is equivalent to a goodness-of-fit statistic measuring how closely the empirical distribution of the design, $\mathcal{P}$,

$$F_{\mathcal{P}}(x) = \frac{1}{N} \sum_{z \in \mathcal{P}} 1_{\{z \le x\}},$$

approximates $F(x)$. Here $1_A$ is the indicator function of $A$, and $z = (z_1, \ldots, z_s) \le x = (x_1, \ldots, x_s)$ means that $z_j \le x_j$ for all $j$. The purpose of this article is to show this equivalence explicitly. Note that the optimality criteria or goodness-of-fit statistics appearing here are called discrepancies in the numerical quadrature literature.

2. The Optimality Criterion Is a Discrepancy

The first step is to show that an optimal, robust design for the location model is one that minimizes the discrepancy. This is done by determining how large (3) can be for the average-case or worst-case $h(x)$. For the average-case analysis the function of the factors, $g(x)$, is assumed to lie in a space of random functions $\mathcal{A}$ with zero mean and covariance kernel $K(x,t)$:

$$E_{g \in \mathcal{A}}[g(x)] = 0, \qquad E_{g \in \mathcal{A}}[g(x)g(t)] = K(x,t) \qquad \forall x, t \in \mathcal{X}.$$

The mean value of the bias term in (3) is:
$$E_{g \in \mathcal{A}}[\mathrm{Err}(g)]^2 = E_{g \in \mathcal{A}} \left[ \int_{\mathcal{X}} g(x) \, dF(x) - \frac{1}{N} \sum_{z \in \mathcal{P}} g(z) \right]^2$$
$$= \int_{\mathcal{X}} \int_{\mathcal{X}} K(x,t) \, dF(x) \, dF(t) - \frac{2}{N} \sum_{z \in \mathcal{P}} \int_{\mathcal{X}} K(z,t) \, dF(t) + \frac{1}{N^2} \sum_{z, z' \in \mathcal{P}} K(z, z')$$
$$= \int_{\mathcal{X}} \int_{\mathcal{X}} K(x,t) \, d[F(x) - F_{\mathcal{P}}(x)] \, d[F(t) - F_{\mathcal{P}}(t)].$$
The right-hand side of this equation is called the square discrepancy, and depends on the design as well as the covariance kernel:

(4) $$[D(\mathcal{P}; K)]^2 = \int_{\mathcal{X}} \int_{\mathcal{X}} K(x,t) \, d[F(x) - F_{\mathcal{P}}(x)] \, d[F(t) - F_{\mathcal{P}}(t)].$$
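The derivation above expands the square discrepancy into a constant, a single sum, and a double sum. A minimal sketch of evaluating (4) this way for a user-supplied kernel follows; the function and argument names are illustrative, and the caller must supply the two kernel integrals in closed form.

    # Hypothetical sketch of evaluating the square discrepancy (4) via its
    # three-term expansion derived above.
    def square_discrepancy(P, K, KFF, KF):
        """P: list of design points; K(z, zp): kernel values;
        KFF: the double integral of K under F; KF(z): integral of K(z, .) dF."""
        N = len(P)
        cross = sum(KF(z) for z in P)                     # O(N) single-sum term
        double = sum(K(z, zp) for z in P for zp in P)     # O(N^2) double sum
        return KFF - 2.0 * cross / N + double / N**2

    # Example in s = 1 with the kernel K(z, t) = 1 - max(z, t), which also
    # appears later in this section, under the uniform distribution on [0, 1]:
    P = [0.125, 0.375, 0.625, 0.875]
    d2 = square_discrepancy(P, lambda z, t: 1 - max(z, t), 1/3, lambda z: (1 - z**2) / 2)

For the uniform distribution on $[0,1]^s$ and that kernel, the two integrals are $3^{-s}$ and $\prod_j (1 - z_j^2)/2$, which is how the closed-form $L_2$-star discrepancy below arises.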
Therefore, an optimal robust design for the location model is one that minimizes the discrepancy. Note from the formulas above that the computational work required to evaluate the discrepancy is at worst $O(N^2)$, the work required to compute a double sum.

The worst-case analysis assumes that the function of the factors, $g(x)$, lies in a Hilbert space, $\mathcal{W}$, with inner product $\langle \cdot, \cdot \rangle_{\mathcal{W}}$ and reproducing kernel $K(x,t)$. The reproducing kernel is a function on $\mathcal{X} \times \mathcal{X}$ such that

(5) $$g(t) = \langle g, K(\cdot, t) \rangle_{\mathcal{W}} \qquad \forall t \in \mathcal{X}, \; g \in \mathcal{W}.$$

The reproducing kernel is always positive definite and symmetric in its arguments, that is,

(6a) $$K(x,t) = K(t,x) \qquad \forall x, t \in \mathcal{X},$$
(6b) $$\sum_{i,k=1}^{N} a^{(i)} a^{(k)} K(x^{(i)}, x^{(k)}) \ge 0 \qquad \forall a^{(i)} \in \mathbf{R}, \; x^{(i)} \in \mathcal{X}.$$
The bias term in (3) is the square of $\mathrm{Err}(g)$, which is a bounded linear functional on $\mathcal{W}$. By the Riesz Representation Theorem, this linear functional has a representer, $\xi$, such that $\mathrm{Err}(g) = \langle \xi, g \rangle_{\mathcal{W}}$ for all $g \in \mathcal{W}$. Because $\mathcal{W}$ has a reproducing kernel, it is easy to find this representer:

$$\xi(x) = \langle \xi, K(\cdot, x) \rangle_{\mathcal{W}} = \mathrm{Err}(K(\cdot, x)).$$

The Cauchy–Schwarz inequality then implies that

(7) $$[\mathrm{Err}(g)]^2 = [\mathrm{Err}(h)]^2 = \langle \xi, h \rangle_{\mathcal{W}}^2 \le \|\xi\|_{\mathcal{W}}^2 \, \|h\|_{\mathcal{W}}^2.$$

This upper bound is attained when the misspecification, $h$, is proportional to $\xi$. The first term in the bound (7) is just the square discrepancy using (4), since

$$\|\xi\|_{\mathcal{W}}^2 = \langle \xi, \xi \rangle_{\mathcal{W}} = \mathrm{Err}(\xi) = [D(\mathcal{P}; K)]^2.$$

Therefore, the worst-case analysis gives

(8) $$\sup_{g \in \mathcal{W}, \, \|h\|=1} E\{(\mu - \hat{\mu})^2\} = [D(\mathcal{P}; K)]^2 + \frac{\sigma^2}{N}.$$
A robust design which is optimal against the worst misspecification minimizes the discrepancy.

Theorem 1. Consider the location model with misspecification (1). Then

$$E_{g \in \mathcal{A}} E\{(\mu - \hat{\mu})^2\} = \sup_{g \in \mathcal{W}, \, \|h\|=1} E\{(\mu - \hat{\mu})^2\} = [D(\mathcal{P}; K)]^2 + \frac{\sigma^2}{N},$$

where $\mathcal{A}$ is a space of random functions with covariance kernel $K$, $\mathcal{W}$ is a Hilbert space of functions with reproducing kernel $K$, and $D(\mathcal{P}; K)$ is the discrepancy defined in (4). Thus, the optimal design for this model is the one which minimizes the discrepancy. Moreover, given any continuous kernel, $K$, that satisfies conditions (6), it is possible to construct a set of functions, $\mathcal{A}$, for which $K$ is the covariance
kernel, and there exists a unique Hilbert space, $\mathcal{W}$, for which $K$ is the reproducing kernel.

The proof of the first part of this theorem is given in the paragraphs above, while the proof of the latter part is given in [6]. See [6] for a more detailed discussion of covariance and reproducing kernels.

Suppose that the experimental domain is the $s$-dimensional unit cube, $[0,1]^s$ (perhaps after suitable scaling), and that the underlying distribution, $F$, is the uniform distribution,

$$F_{\mathrm{unif}}(x) \equiv x_1 \cdots x_s.$$

Furthermore, suppose that the covariance/reproducing kernel is chosen to be

$$K(x,t) = \prod_{j=1}^{s} [1 - \max(x_j, t_j)].$$

This means that the space $\mathcal{A}$ consists of Brownian sheets (multidimensional generalizations of Brownian motions), and the space $\mathcal{W}$ consists of all functions $g$ satisfying the conditions:

$$\frac{\partial^s g}{\partial x_1 \cdots \partial x_s} \in L_2([0,1]^s) \quad \text{and} \quad g|_{x_j = 1} = 0 \quad \forall j.$$
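As an illustrative aside (not in the original text), a draw from this Brownian-sheet space $\mathcal{A}$ can be simulated on a finite grid directly from the covariance kernel; the grid size and all names below are hypothetical.

    import numpy as np

    # Hypothetical sketch: sample a Brownian sheet with covariance
    # K(x, t) = prod_j [1 - max(x_j, t_j)] on a grid in [0, 1]^2.
    rng = np.random.default_rng(1)

    m = 20                                               # grid points per axis
    grid = (np.arange(m) + 0.5) / m
    X = np.array([(a, b) for a in grid for b in grid])   # (m*m, 2) grid of sites

    # Covariance matrix: product over coordinates of 1 - max(x_j, t_j).
    C = np.prod(1.0 - np.maximum(X[:, None, :], X[None, :, :]), axis=2)

    # One realization of the zero-mean random function g on the grid.
    g = rng.multivariate_normal(np.zeros(m * m), C).reshape(m, m)
    print(g.shape)  # (20, 20); g vanishes in law as either coordinate tends to 1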
Under these assumptions the discrepancy defined in (4) is:

$$[D_2(\mathcal{P})]^2 = \int_{[0,1]^s} [F_{\mathrm{unif}}(x) - F_{\mathcal{P}}(x)]^2 \, dx$$
$$= \frac{1}{3^s} - \frac{2}{N} \sum_{z \in \mathcal{P}} \prod_{j=1}^{s} \frac{1 - z_j^2}{2} + \frac{1}{N^2} \sum_{z, z' \in \mathcal{P}} \prod_{j=1}^{s} [1 - \max(z_j, z_j')].$$

This is known as the $L_2$-star discrepancy in the numerical quadrature literature, and also as the Cramér–von Mises goodness-of-fit statistic [1]. The average-case analysis for this particular kernel was done in [8] and the worst-case analysis in [9]. In the next section the connection between discrepancies and goodness-of-fit statistics is generalized.
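As a quick computational illustration (again, not part of the original text), the closed form above can be evaluated in $O(N^2)$ operations; the function name and the two example designs are hypothetical.

    import numpy as np

    # Illustrative sketch: the L2-star discrepancy formula above, term by term.
    def l2_star_discrepancy_sq(P):
        """P: (N, s) array of points in [0, 1]^s."""
        N, s = P.shape
        term1 = 3.0 ** (-s)
        term2 = np.prod((1.0 - P**2) / 2.0, axis=1).sum() * 2.0 / N
        # max(z_j, z'_j) over all pairs, multiplied across coordinates
        pairmax = np.maximum(P[:, None, :], P[None, :, :])
        term3 = np.prod(1.0 - pairmax, axis=2).sum() / N**2
        return term1 - term2 + term3

    # Example: a centered regular grid versus a random design in [0, 1]^2.
    rng = np.random.default_rng(2)
    grid = np.array([(i/4 + 0.125, j/4 + 0.125) for i in range(4) for j in range(4)])
    rand = rng.random((16, 2))
    print(l2_star_discrepancy_sq(grid), l2_star_discrepancy_sq(rand))

One would typically expect the regular grid to give the smaller value, which is the sense in which a low-discrepancy design spreads its points evenly.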
3. The Discrepancy Is a Goodness-of-Fit Statistic

From the definition in (4) it is clear that the discrepancy measures how far away the empirical distribution function $F_{\mathcal{P}}$ is from $F$. Since $K(x,t)$ is positive definite, the discrepancy becomes small if and only if $F_{\mathcal{P}}$ approximates $F$ well. In fact, one may define an inner product on a space of signed measures, $\mathcal{M}$, in terms of the reproducing kernel. Let $\mathcal{M}$ be the following space of signed measures:

$$\mathcal{M} = \left\{ \text{signed measures } F(x) : \int_{\mathcal{X}} \int_{\mathcal{X}} K(x,t) \, dF(x) \, dF(t) < +\infty \right\},$$

and define an inner product on $\mathcal{M}$ as follows:

(9) $$\langle F, G \rangle_{\mathcal{M}} = \int_{\mathcal{X}} \int_{\mathcal{X}} K(x,t) \, dF(x) \, dG(t) \qquad \forall F, G \in \mathcal{M}.$$

In this case the discrepancy is just the norm of $F - F_{\mathcal{P}}$:

$$D(\mathcal{P}; K) = \|F - F_{\mathcal{P}}\|_{\mathcal{M}}.$$
That is, any discrepancy is a goodness-of-fit statistic that takes the form of the norm of $F - F_{\mathcal{P}}$, where the norm is induced by a certain inner product defined in terms of the kernel $K$.

The converse is also true. Suppose $\mathcal{M}$ is some Hilbert space of signed measures containing all point mass measures $F_{\{z\}}(x) = 1_{\{z \le x\}}$. One may then define a kernel in terms of the inner product on $\mathcal{M}$,

(10) $$K(z, z') \equiv \langle F_{\{z\}}, F_{\{z'\}} \rangle_{\mathcal{M}} \qquad \forall z, z' \in \mathcal{X}.$$

This kernel satisfies conditions (6) by the definition of an inner product. Also, equation (9) holds automatically when $F$ and $G$ are measures assigning unit mass to a single point. Furthermore, equality (9) holds for any signed measures $F, G \in \mathcal{M}$ by writing $F$ and $G$ as limits of linear combinations of point mass measures.

The results of the preceding two paragraphs give the following theorem:
Theorem 2. The discrepancy as defined in (4) takes the form $\|F - F_{\mathcal{P}}\|_{\mathcal{M}}$, where the norm is induced by the inner product, $\langle \cdot, \cdot \rangle_{\mathcal{M}}$, defined in terms of the kernel in (9). Conversely, if $\mathcal{M}$ is a Hilbert space of signed measures containing all measures of the form $F_{\{z\}}(x) = 1_{\{z \le x\}}$, then $\|F - F_{\mathcal{P}}\|_{\mathcal{M}} = D(\mathcal{P}; K)$ for the kernel defined in (10).

4. Conclusion

The previous two sections show the strong relationship between robust optimal design criteria for location models, discrepancies, and goodness-of-fit statistics. There are several practical consequences of these results:

i. The uniform design [2, 7] spreads design points evenly over the experimental domain by minimizing the discrepancy. Theorem 1 makes the connection between optimal design for the misspecified location model and the uniform design.

ii. Any goodness-of-fit statistic arising from an inner product is equivalent to the discrepancy defined in (4) by Theorem 2. The formula for the discrepancy provides a way of computing the goodness-of-fit statistic in only $O(N^2)$ operations. For example, the multidimensional Cramér–von Mises goodness-of-fit statistic ($L_2$-star discrepancy) compares the proportion of experimental points in the box $[0, x)$ to its volume, and may be written as:

$$[D_2(\mathcal{P})]^2 = \int_{[0,1]^s} \left\{ \int_{[0,x)} d[F_{\mathrm{unif}} - F_{\mathcal{P}}] \right\}^2 dx.$$
Because the box is anchored at the origin, this statistic changes when the set of experimental points, $\mathcal{P}$, is reflected through the plane $x_j = 1/2$ passing through the center of the cube for any $j$. Therefore, one might wish to modify this goodness-of-fit statistic to consider all boxes $[t, x) \subseteq [0,1)^s$, that is,

$$[D(\mathcal{P})]^2 = \int_{[0,1]^{2s}, \, t \le x} \left\{ \int_{[t,x)} d[F_{\mathrm{unif}} - F_{\mathcal{P}}] \right\}^2 dt \, dx.$$
This $L_2$-unanchored discrepancy is not easy to compute in the above form, but using Theorem 2, one can obtain the relevant kernel as

$$K(z, z') = \int_{[0,1]^{2s}, \, t \le x} \left\{ \int_{[t,x)} dF_{\{z\}} \right\} \left\{ \int_{[t,x)} dF_{\{z'\}} \right\} dt \, dx = \prod_{j=1}^{s} [\min(z_j, z_j') - z_j z_j'],$$

since in each coordinate the integrand is $1$ exactly when $t_j \le \min(z_j, z_j')$ and $x_j > \max(z_j, z_j')$, giving $\min(z_j, z_j')[1 - \max(z_j, z_j')] = \min(z_j, z_j') - z_j z_j'$. The formula for the corresponding discrepancy then follows from (4) as

$$[D(\mathcal{P})]^2 = \frac{1}{12^s} - \frac{2}{N} \sum_{z \in \mathcal{P}} \prod_{j=1}^{s} \frac{z_j(1 - z_j)}{2} + \frac{1}{N^2} \sum_{z, z' \in \mathcal{P}} \prod_{j=1}^{s} [\min(z_j, z_j') - z_j z_j'].$$

This formula was obtained earlier by [5]; a small computational sketch of it follows this list.

iii. By Theorem 2, goodness-of-fit statistics may be used as discrepancies, which indicate the error of numerical quadrature. Conversely, discrepancies in the numerical quadrature literature (see, for example, [3, 4]) may be used as goodness-of-fit statistics.
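As a final illustrative sketch (not from the paper), the closed form for the unanchored discrepancy in item ii can be evaluated directly, and its invariance under the reflection $z_j \mapsto 1 - z_j$, the motivation given above, can be checked numerically; all names are hypothetical.

    import numpy as np

    # Illustrative sketch: the unanchored L2 discrepancy via its closed form,
    # and a numerical check that it is unchanged when the design is reflected
    # through x_j = 1/2 (the star discrepancy generally is not).
    def l2_unanchored_discrepancy_sq(P):
        """P: (N, s) array of points in [0, 1]^s."""
        N, s = P.shape
        term1 = 12.0 ** (-s)
        term2 = np.prod(P * (1.0 - P) / 2.0, axis=1).sum() * 2.0 / N
        pairmin = np.minimum(P[:, None, :], P[None, :, :])
        prods = P[:, None, :] * P[None, :, :]
        term3 = np.prod(pairmin - prods, axis=2).sum() / N**2
        return term1 - term2 + term3

    rng = np.random.default_rng(3)
    P = rng.random((32, 2))
    print(l2_unanchored_discrepancy_sq(P), l2_unanchored_discrepancy_sq(1.0 - P))

The two printed values coincide because $\min(z_j, z_j') - z_j z_j'$ is unchanged when both arguments are reflected, whereas the anchored kernel $1 - \max(z_j, z_j')$ is not.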
Acknowledgments
The author would like to thank Sung-Nok Chiu, Gang Wei, Min-Yu Xie, and Rong-Xian Yue for their valuable comments and suggestions.
References

[1] R. B. D'Agostino and M. A. Stephens (eds.), Goodness-of-fit techniques, Marcel Dekker, New York, 1986.
[2] K. T. Fang and F. J. Hickernell, The uniform design and its applications, Bull. Inst. Internat. Statist., 50th Session, Book 1 (1995), 333–349.
[3] F. J. Hickernell, A generalized discrepancy and quadrature error bound, Math. Comp. 67 (1998), 299–322.
[4] F. J. Hickernell, Lattice rules: How well do they measure up?, Pseudo- and Quasi-Random Point Sets (P. Hellekalek and G. Larcher, eds.), Lecture Notes in Statistics, Springer-Verlag, New York, 1998, to appear.
[5] W. J. Morokoff and R. E. Caflisch, Quasi-random sequences and their discrepancies, SIAM J. Sci. Comput. 15 (1994), 1251–1279.
[6] G. Wahba, Spline models for observational data, SIAM, Philadelphia, 1990.
[7] Y. Wang and K. T. Fang, A note on uniform distribution and experimental design, Kexue Tongbao (Chinese Sci. Bull.) 26 (1981), 485–489.
[8] H. Woźniakowski, Average case complexity of multivariate integration, Bull. Amer. Math. Soc. 24 (1991), 185–194.
[9] S. K. Zaremba, Some applications of multidimensional integration by parts, Ann. Polon. Math. 21 (1968), 85–96.

E-mail address:
[email protected], http://www.math.hkbu.edu.hk/~fred
Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong