Some Global Optimization Algorithms in Statistics

Kai-Tai Fang^{1,2}, Fred J. Hickernell^1 and Peter Winker^3

1 Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong
2 Institute of Applied Mathematics, Chinese Academy of Sciences, Beijing 100080, China
3 Fakultät für Wirtschaftswissenschaften und Statistik, Universität Konstanz, Postfach 5560, 78434 Konstanz, Germany
Abstract.
There are many problems in statistics that need powerful global optimization methods. This paper reviews two efficient methods: SNTO (sequential number-theoretic methods for optimization) and TA (the threshold accepting algorithm). A discussion is given of the applications of these methods to various statistical problems: maximum likelihood estimation, regression analysis, model selection, experimental design, projection pursuit, etc.
Key Words and Phrases: Experimental design, global optimization, number-theoretic methods, nonlinear regression model, projection pursuit, simulated annealing, threshold accepting.
Mathematics Subject Classification (1991): 65K10.
1 Introduction

There are many problems in statistics that need powerful algorithms for optimization, for example, maximum likelihood estimation, nonlinear regression, projection pursuit and design of experiments. Let f be a function over a domain G, a subset of R^s. We are required to find the global maximum (minimum) M of f over G, and also a point x^* ∈ G, such that

M = f(x^*) = max_{x ∈ G} f(x)  or  min_{x ∈ G} f(x).   (1)

M is called the global maximum (minimum) of the objective function f over G, and x^* a maximizing (minimizing) point on G. There are a number of classical optimization methods, for instance, the downhill simplex method, quasi-Newton methods and conjugate gradient methods (see Nash and Walker-Smith, 1987, and Fletcher, 1987). In those methods, it is typically required that f be unimodal and once or twice differentiable over G, otherwise convergence may not be guaranteed or only a local maximum may be reached. However, many
statistical problems need global optimization algorithms for possibly nonsmooth objective functions. Below are some examples.

Example 1 (Maximum likelihood estimation). Let x_1, ..., x_n be a sample from a population with density p(x; θ), where θ = (θ_1, ..., θ_s) is the parameter vector defining the density that needs to be estimated. The likelihood function is

L(θ) = ∏_{i=1}^{n} p(x_i; θ).
The maximum likelihood estimator (MLE) of θ is the maximizing point of the likelihood function. This function is often multiextremal, for example, when the data come from a Weibull or beta distribution and/or when there are missing data. Many papers discuss how to find the MLE of θ (see Fang and Yuan, 1990). The problem is even harder when the population distribution is multivariate.

Example 2 (Nonlinear regression). Consider a regression model

Y = g(x; θ) + ε,   (2)

where x = (X_1, ..., X_p) are the independent variables, Y is the response, and θ is the s-dimensional regression parameter vector. When the function g is nonlinear in θ, model (2) is a nonlinear regression model. An estimate of θ minimizes

Q(θ) = ∑_{i=1}^{n} h(Y_i, g(x_i; θ)),   (3)
where {(x_i, Y_i), i = 1, ..., n} are the observations. The choice of h(a, b) has been discussed by many authors. For example, h(a, b) = (a − b)^2 gives least squares estimation and h(a, b) = |a − b| corresponds to ℓ_1-norm estimation. The domain of optimization in model (3) is very large, and may be R^s. The objective function Q(θ) may not be differentiable everywhere since h need not be differentiable.

Example 3 (Model selection). In model (2) there may be many possible forms of g to consider, and one may not know a priori which predictor variables X_j to include in the model. The choice of an appropriate model, called model selection, can be formulated as an optimization problem. The search domain is large due to the very large number of possible models. Some recent work in this area has been done by Winker (1995), Delgado et al. (1996) and Mangeas and Muller (1996).

Example 4 (Evaluation of the discrepancy of a set of points). Let F(x) be a given distribution on R^s, P = {x_1, ..., x_n} a set of points in R^s and F_n(x) its empirical distribution, that is,

F_n(x) = (1/n) ∑_{i=1}^{n} I{x_i ≤ x},
where I{A} is the indicator function of A, and all inequalities are understood with respect to the componentwise order. Then
D_F(P) = sup_{x ∈ R^s} |F_n(x) − F(x)|

is called the F-discrepancy of P with respect to F(x). When F(x) is the uniform distribution on C^s = [0, 1]^s and P ⊂ C^s, we denote D_F(P) by D(P) and call it the discrepancy. The concept of the discrepancy was suggested by Weyl (1916). The F-discrepancy is just the Kolmogorov-Smirnov statistic for goodness of fit. It is one of the most important concepts in number-theoretic (NT) methods, also called quasi-Monte Carlo (qMC) methods. NT methods have been successfully applied to multidimensional numerical integration (quadrature) and the numerical solution of integral equations (see Hua and Wang, 1981, and Niederreiter, 1992) and in various fields of statistics (Fang and Wang, 1994).

The function |F_n(x) − F(x)| is piecewise continuous and attains its supremum as the limit of x approaching some point in the finite set G defined as follows:

G = C_1 × ⋯ × C_s,   C_j = {1, x_{1j}, ..., x_{nj}},

where x_{ij} is the j-th component of the i-th point of P. It can be shown that

D(P) = max_{x ∈ G} { |F_n(x) − F(x)|, |F_n^0(x) − F(x)| },

where F(x) is the uniform distribution over C^s and

F_n^0(x) = (1/n) ∑_{i=1}^{n} I{x_i < x}.

Therefore, calculating the discrepancy of a set of points is an optimization problem (1) where the domain is a finite set with (n + 1)^s points.
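The finite-set characterization above can be turned directly into a brute-force computation. Below is a minimal Python sketch for the uniform case on C^s; the function name and the use of NumPy are illustrative choices, not taken from the paper, and the (n + 1)^s grid makes this feasible only for small n and s, which is exactly why Section 3 applies TA to this problem.

```python
import numpy as np
from itertools import product

def star_discrepancy(points):
    """Brute-force D(P) for points in [0, 1]^s, assuming F is the uniform
    distribution. Both the closed count F_n and the open count F_n^0 are
    compared with the uniform volume at each point of the finite grid G."""
    points = np.asarray(points, dtype=float)
    n, s = points.shape
    # Candidate coordinates per dimension: the sample coordinates and 1.
    grids = [np.unique(np.concatenate([points[:, j], [1.0]])) for j in range(s)]
    best = 0.0
    for x in product(*grids):
        x = np.array(x)
        volume = np.prod(x)                              # F(x) for the uniform case
        closed = np.mean(np.all(points <= x, axis=1))    # F_n(x)
        open_ = np.mean(np.all(points < x, axis=1))      # F_n^0(x)
        best = max(best, abs(closed - volume), abs(open_ - volume))
    return best

# Example: discrepancy of a small random point set in dimension 2
rng = np.random.default_rng(0)
print(star_discrepancy(rng.random((8, 2))))
```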
Example 5 (Experimental design). Given a regression model (2), suppose that

g(x; θ) = ∑_{i=1}^{s} θ_i g_i(x),

where the g_i are some specified functions, and let n be the number of data points. Denote the experimental design, or values of the observed predictor variables, by P = {x_1, ..., x_n}. The information matrix of this design is M = X'X, where

X = [ g_1(x_1)  ⋯  g_s(x_1)
         ⋮               ⋮
      g_1(x_n)  ⋯  g_s(x_n) ].

Let Φ(M) be a measure of quality of the design. A Φ-optimal design minimizes (or maximizes) the function Φ(M) with respect to P (cf. Atkinson and Donev, 1992). The objective function in this case is a multiextremal function on R^{ns} and is not necessarily differentiable everywhere. Another approach to experimental design, called uniform design, minimizes the discrepancy with respect to P. Fang and Wang (1994) assume P to be a subset of an n × ⋯ × n rectangular grid, which makes it easier to find low-discrepancy designs for small n and s. In this case the optimization problem is over a finite set.
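As one concrete (hypothetical) choice of Φ(M), the sketch below evaluates the D-criterion −log det(X'X) for a candidate design; the basis functions and the example design are illustrative only, not taken from the paper.

```python
import numpy as np

def d_criterion(design, basis_funcs):
    """D-criterion -log det(X'X) of a design for the linear model above.
    Minimising this over candidate designs P gives a D-optimal design."""
    X = np.array([[g(x) for g in basis_funcs] for x in design])
    M = X.T @ X                        # information matrix M = X'X
    sign, logdet = np.linalg.slogdet(M)
    return np.inf if sign <= 0 else -logdet

# Example: quadratic regression on [0, 1] with g_1(x) = 1, g_2(x) = x, g_3(x) = x^2
basis = [lambda x: 1.0, lambda x: x, lambda x: x**2]
print(d_criterion(np.linspace(0.0, 1.0, 7), basis))
```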
Example 6 (Projection pursuit). Many problems in multivariate statistical analysis involve a projection of the data into a lower-dimensional space. Principal component analysis and canonical correlation analysis are well-known examples. Let X be an n × p matrix of n observations of p variables. For any point a on the p-dimensional sphere S^p = {x : x ∈ R^p, x'x = 1}, the vector Xa is the projection of the data in the direction a. Let I(a) = H(Xa) be the projection index, where H is some measure of the quality of the one-dimensional data. The projection pursuit method finds the worst (or best) direction a^*, that is,

I(a^*) = max_{a ∈ S^p} I(a).

Here the optimization is over the sphere S^p. In the past many statisticians have had difficulty in computing a^* (see, e.g., Malkovich and Afifi, 1973, and Rousseeuw and van Zomeren, 1990).
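For illustration, the following sketch uses the absolute excess kurtosis of the projected data as a hypothetical projection index H and a naive random search over directions; a global method such as SNTO on the sphere (Section 2, modification iii) would replace the naive search.

```python
import numpy as np

def projection_index(X, a):
    """I(a) = H(Xa), with H chosen here as the absolute excess kurtosis."""
    z = X @ a
    z = (z - z.mean()) / z.std()
    return abs(np.mean(z**4) - 3.0)

def crude_projection_pursuit(X, n_directions=2000, seed=0):
    """Naive random search over the unit sphere for the direction maximizing I(a)."""
    rng = np.random.default_rng(seed)
    best_a, best_val = None, -np.inf
    for _ in range(n_directions):
        a = rng.standard_normal(X.shape[1])
        a /= np.linalg.norm(a)                 # a point on the sphere S^p
        val = projection_index(X, a)
        if val > best_val:
            best_a, best_val = a, val
    return best_a, best_val

# Example: 200 observations of 5 variables
rng = np.random.default_rng(1)
print(crude_projection_pursuit(rng.standard_normal((200, 5)))[1])
```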
The above examples share some or all of the following difficulties in solving the optimization problem (1): i) the objective function f is multiextremal; ii) the objective function f is not differentiable or even continuous everywhere in G; iii) the dimension of the domain G is high; iv) the domain G is large in extent, for example G = R^s; v) the domain G is the surface of a sphere or some other geometric object; vi) the domain G is a finite set with a large number of elements. Handling these complexities requires powerful global optimization algorithms. Global algorithms are discussed in Horst and Tuy (1990) and Rinnooy Kan and Timmer (1989). Many of these are probabilistic algorithms. Among them simulated annealing (SA) is very popular. SA was proposed by Kirkpatrick, Gelatt and Vecchi (1983) and Cerny (1985). The statistical applications of SA are discussed by Goffe, Ferrier and Rogers (1994). Another popular class of algorithms is genetic algorithms (see Dorsey and Mayer, 1995). The choice of a suitable algorithm for a specific problem is not an easy task, and it is difficult to objectively compare different methods. This paper recommends two global methods for optimization based on our experience: SNTO (sequential number-theoretic methods for optimization) and
TA (the threshold accepting algorithm). Sections 2 and 3 introduce SNTO and TA, respectively. Their applications are discussed in Section 4. Some discussion is given in the last section. We hope that the exchange of views between experts in statistics and optimization will lead to improved algorithms in the future.
2 Sequential Number-theoretic Method for Optimization

One probabilistic method for solving the optimization problem (1) is to draw a simple random sample, P, of n points from the domain G. If n is large enough, then the optimum of f on P will be close to the global optimum M. If the points in P are statistically independent, they will not be evenly distributed over the domain. (The second point is as likely to be close to the first point as it is to be far away from it.) Therefore, the convergence of a random search is slow. A better choice is a set of deterministic quasi-random (i.e., low-discrepancy) points, sometimes called an NT-net. Even with this improvement convergence may not be sufficiently fast. Niederreiter and Peart (1986) and Fang and Wang (1990) independently proposed quasi-random searches over contracting domains. The specific algorithm presented here is called a sequential number-theoretic method for optimization (SNTO). Suppose that the domain G is the rectangle [a, b]. SNTO for minimization involves the following steps:

Step 0 (Initialization). Set the iteration index t = 0 and the initial search domain G^{(0)} = G, that is, G^{(0)} = [a^{(0)}, b^{(0)}], where a^{(0)} = a and b^{(0)} = b. Let x^{(-1)} be the empty set.

Step 1 (Quasi-random search). Generate an NT-net of n_t points, P^{(t)}, on the domain G^{(t)} by some NT method.

Step 2 (Update of approximation). Find the point x^{(t)} ∈ P^{(t)} ∪ {x^{(t-1)}} that minimizes f, that is,

M^{(t)} = f(x^{(t)}) ≤ f(y)  for all y ∈ P^{(t)} ∪ {x^{(t-1)}}.

Then x^{(t)} and M^{(t)} are the current best approximations to x^* and M.

Step 3 (Termination criterion). Let c^{(t)} = (b^{(t)} − a^{(t)})/2. If max c^{(t)} = max(c_1^{(t)}, ..., c_s^{(t)}) < δ, a preassigned tolerance, then G^{(t)} is small enough; consider x^{(t)} and M^{(t)} to be acceptable and terminate the algorithm. Otherwise, proceed to the next step.

Step 4 (Contraction of search domain). Define the new search domain G^{(t+1)} as follows:

G^{(t+1)} = [a^{(t+1)}, b^{(t+1)}],   a_j^{(t+1)} = max(x_j^{(t)} − γ c_j^{(t)}, a_j),   b_j^{(t+1)} = min(x_j^{(t)} + γ c_j^{(t)}, b_j),
for j = 1, ..., s, where γ is a predefined contraction rate. Set t = t + 1 and go to Step 1.

Our experience suggests taking n_1 >> n_2 = n_3 = ... and γ = 0.5. Some comparisons between SNTO and the BFGS method (a quasi-Newton method) are given by Fang, Wang and Bentler (1994). Although in most cases BFGS converges more quickly to a solution than SNTO, it also fails to reach the global optimum much more frequently. To remedy this one must run BFGS a number of times with different initial points. Therefore, the overall computing time for SNTO is less than that for BFGS.

The choices of n_1 and n_2 are very important in the use of SNTO. Recently, Zhang, Liang and Yu (1996) proposed an algorithm for choosing n_1. They take N_1 < N_2 < ... and generate NT-nets of size N_1, N_2, ..., respectively. Denote the optimum values and optimizing points on these NT-nets by M_1, M_2, ... and x(1), x(2), ..., respectively. If both |M_t − M_{t+1}| and ||x(t) − x(t+1)|| are small enough for some t, then choose n_1 = N_t to start SNTO. This method has been applied to problems in chemical quantitative analysis with good results.

The accuracy and/or convergence rate of SNTO may suffer when G is large in extent. Methods for reducing the size of the search domain are discussed in Sections 3.3-3.4 of Fang and Wang (1994). Various modifications of SNTO are possible, for example:

i) Repeated SNTO. When the domain G is very large, SNTO may not converge to a global optimum. One may wish to use the result from one run of SNTO as the center of a new, smaller initial search domain and then run SNTO again. This process may be repeated as necessary.

ii) Mixtures of SNTO and gradient methods. One may use SNTO to find an initial approximation to the global optimizer, and then use a classical gradient method, such as quasi-Newton or conjugate gradient, to refine the approximation and obtain x^*. Assuming that the approximation obtained by SNTO is sufficiently close to the global solution and the objective function is unimodal and twice differentiable nearby, the gradient method will give much faster convergence than additional iterations of SNTO. Mixtures of SNTO and gradient methods have been studied by Hickernell and Yuan (1994).

iii) SNTO on a special domain. When the domain G is not a rectangular solid, e.g. the surface of a sphere, the generation of the NT-net is more complicated. One proceeds by mapping the points of an NT-net on a cube of equal or lower dimension into G. The key property of such a mapping is that it must preserve the uniformity of the points. A more difficult problem is how to contract the domain when it is not a rectangular solid. For more details see Section 3.6 of Fang and Wang (1994).
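The following is a minimal Python sketch of the basic Steps 0-4 on a rectangular domain. The Kronecker-sequence point set used as a stand-in for an NT-net, the parameter defaults and the Rastrigin test function are all illustrative assumptions, not the constructions used in the papers cited above.

```python
import numpy as np

def nt_net(n, s, shift=0.0):
    """A simple Kronecker (Weyl) sequence on [0, 1)^s based on square roots of
    primes; it stands in for the NT-nets (e.g. good-lattice-point sets) that a
    full implementation would use. Works for s <= 10 here."""
    primes = np.array([2, 3, 5, 7, 11, 13, 17, 19, 23, 29][:s], dtype=float)
    k = np.arange(1, n + 1).reshape(-1, 1)
    return np.mod(k * np.sqrt(primes) + shift, 1.0)

def snto_minimize(f, a, b, n_points=500, tol=1e-6, gamma=0.5, max_iter=100):
    """Minimal sketch of SNTO (Steps 0-4 above) on the rectangle [a, b]."""
    a0 = np.asarray(a, dtype=float)
    b0 = np.asarray(b, dtype=float)
    a, b = a0.copy(), b0.copy()
    x_best, f_best = None, np.inf
    for t in range(max_iter):
        # Step 1: point set on the current subdomain G^(t) = [a, b]
        P = a + (b - a) * nt_net(n_points, len(a0), shift=0.3 * t)
        # Step 2: keep the best point found so far (the union with x^(t-1))
        values = np.array([f(p) for p in P])
        i = int(np.argmin(values))
        if values[i] < f_best:
            x_best, f_best = P[i].copy(), values[i]
        # Step 3: termination criterion on the half-widths c^(t)
        c = (b - a) / 2.0
        if c.max() < tol:
            break
        # Step 4: contract the domain around x_best, staying inside [a0, b0]
        a = np.maximum(x_best - gamma * c, a0)
        b = np.minimum(x_best + gamma * c, b0)
    return x_best, f_best

# Example: a multiextremal test function (Rastrigin) on [-5, 5]^2
rastrigin = lambda x: 10 * len(x) + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))
print(snto_minimize(rastrigin, [-5, -5], [5, 5]))
```

In the spirit of modification ii), the point returned by such a sketch could then be handed to a quasi-Newton routine for fast local refinement.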
3 Threshold Accepting

When the domain in (1) is a finite set, SNTO cannot be applied, and another approach is needed. This section introduces an algorithm that is suitable when G is a finite set. In most statistics problems the domain G is constructed by combinatorial theory. Thus, it is natural to consider combinatorial optimization algorithms. Many probabilistic methods for optimization have been developed in the past 15 years. The most popular is simulated annealing (SA), which is especially suited to finding the minimum of an objective function that may possess several local minima. It works by emulating the physical process whereby a solid is slowly cooled so that its structure is eventually "frozen" in a configuration with minimum energy (see Bertsimas and Tsitsiklis, 1993). "One of the great charms of SA is its extraordinary generality. Almost any optimization problem can be approached by SA, and often the coding is quite easy" (Steel, 1993). However, the convergence rate of SA is slow (Ferrari, Frigessi and Schonmann, 1993).

The threshold accepting algorithm (TA) was introduced by Dueck and Scheuer (1990). TA is similar to SA and can be considered as a limiting case of SA, but in many applications TA is more efficient than SA. TA, like SA, has been successfully implemented for various problems including the NP-hard traveling salesman problem (Dueck and Scheuer, 1990), the NP-complete problem of optimal aggregation (Chipman and Winker, 1995) and portfolio optimization (Dueck and Winker, 1992).

TA starts with an initial guess, which might be randomly chosen. In each iteration the algorithm tries to replace its current solution x^{(c)} with a new one, chosen (randomly) as a small perturbation of the current solution. This means that for every x ∈ G one must define a neighborhood N(x) of nearby elements. If the value of f at the new point is no worse than a certain threshold above f(x^{(c)}), then the current solution is replaced by the new one. However, as the iterations increase this threshold decreases, guaranteeing convergence. An outline of TA for minimization is given below.

Step 0 (Initialization). Set t = 0, and choose a sequence of positive threshold values T_t (t = 0, 1, ...). Choose (randomly) the current solution x^{(c)} ∈ G.

Step 1 (Generate a new candidate solution). Choose x^{(t)} ∈ N(x^{(c)}) by a deterministic or stochastic selection rule.

Step 2 (Decision). If f(x^{(t)}) − f(x^{(c)}) < T_t, then set x^{(c)} = x^{(t)}.

Step 3 (Change threshold). Set t = t + 1. If t = t_max, terminate the algorithm with x^{(c)} as the approximation to x^* and f(x^{(c)}) as the approximation to M. Otherwise, go to Step 1.

The global convergence of TA has been shown by Althofer and Koschnick (1991). The performance of TA depends on the definition of the local structure as given by the neighborhoods N(x) and on the sequence of threshold values T_t. Successful implementations of TA indicate that the neighborhoods can be rather small. Furthermore, given a specific form for the threshold sequence, such as T_t = α(t_max − t), tuning experiments indicate that too small a value of α is much more harmful than one that is chosen too high.
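As a concrete illustration of the outline above, here is a minimal Python sketch of TA for minimization over a finite domain. The toy objective, the neighborhood rule and the linear threshold schedule are illustrative assumptions only.

```python
import random

def threshold_accepting(f, x0, neighbor, thresholds, seed=0):
    """Minimal TA sketch for minimization over a finite domain.
    `neighbor(x, rng)` returns a random element of the neighborhood N(x);
    `thresholds` is the decreasing sequence T_0, T_1, ..., T_{t_max - 1}."""
    rng = random.Random(seed)
    x_c, f_c = x0, f(x0)
    for T in thresholds:
        x_t = neighbor(x_c, rng)         # Step 1: candidate from N(x_c)
        f_t = f(x_t)
        if f_t - f_c < T:                # Step 2: accept if not too much worse
            x_c, f_c = x_t, f_t
    return x_c, f_c                      # thresholds exhausted: return current solution

# Toy usage: minimize a multiextremal function on the finite grid {0, ..., 99}
f = lambda x: (x - 63) ** 2 + 20 * (x % 7)
neighbor = lambda x, rng: max(0, min(99, x + rng.choice([-3, -2, -1, 1, 2, 3])))
thresholds = [50 * (1 - t / 200) for t in range(200)]
print(threshold_accepting(f, x0=0, neighbor=neighbor, thresholds=thresholds))
```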
4 Applications of SNTO and TA to Statistics

SNTO has been successfully applied to solve many statistical problems, such as maximum likelihood estimation (Fang and Yuan, 1990), parameter estimation for nonlinear regression models (Fang and Zhang, 1993), numerical solution of a system of nonlinear equations (Fang and Wang, 1991, and Hickernell and Fang, 1995) and multivariate calibration (Liang and Fang, 1996). During 1989-90 the first author met a consultancy problem that could be reduced to an optimization problem. The objective function involved the noncentral chi-square distribution, and had as many as 1000 parameters and 100 local minima. SNTO provided an acceptable solution for this problem. Recently, Zhang, Liang and Yu (1996) gave some comparisons between SNTO and variable step size simulated annealing. They conclude, "The clarity and simplicity of the idea of SNTO together with its convenience for implementation show that SNTO can be a promising tool in chemometrics."

From our viewpoint SNTO has some advantages:
i. SNTO is a global optimization algorithm.
ii. SNTO is easy to implement and code.
iii. SNTO avoids the calculation of the derivatives of the objective function.
iv. SNTO can handle general optimization problems. For example, in fitting a regression curve a statistician might want to compare a least squares fit of a linear model, a least squares fit of a nonlinear model, and one or more robust estimates of a linear or nonlinear model. The standard codes for each of these methods are different, even though each involves the minimization of some objective function of the residuals. In contrast, SNTO could be used to solve all of these problems with only minor changes.

TA has been successfully applied to evaluate the discrepancy of a set of points. In Example 4 we explained why this is a difficult problem. Although the discrepancy is the basis of NT methods, there have been few publications on how to evaluate it in practice: Niederreiter (1973) for s = 1, Clerk (1986) for s = 2, and Bundschuh and Zhu (1993) for small s. Winker and Fang (1997) reported that the value of the approximate discrepancy obtained by TA is equal or nearly equal to the true one when the number of points and the dimension are not large. For a large set of points TA can give a good approximation in a few seconds, whereas the algorithm of Bundschuh and Zhu might take years.

Winker and Fang (1996) have applied TA to find uniform designs. For a given number of experiments, n, and number of factors, s, a U-type design, denoted by U, is an n × s matrix of full rank with each column being a permutation of {1, 2, ..., n}. Without loss of generality it can be assumed that the first column
of any U-type design is (1, 2, ..., n)'. Therefore, for the second column there are n! − 1 possible choices, n! − 2 for the third column, and so on. Let 𝒰 be the set of all U-type designs of size n × s, and let D be a measure of nonuniformity of a U-type design, such as the discrepancy mentioned above. The uniform design, denoted by U^*, minimizes D(U) over the set of U-type designs:

D(U^*) = min_{U ∈ 𝒰} D(U).   (4)
The trivial algorithm for solving (4) by calculating the discrepancy of every design in 𝒰 is not feasible even for quite modest n and s. In fact, one might be tempted to conjecture that (4) is NP-complete or even NP-hard. Winker and Fang (1996) applied TA to obtain approximate solutions to this problem for n ≤ 30 and s ≤ 5. Their solutions match the exact solutions of Li and Fang (1995) for s = 2 and n ≤ 23. For s ≥ 3, their results improve on all of the designs mentioned by Fang and Hickernell (1995). The CPU time for TA ranges from 0.53 seconds (for s = 2 and n = 5) to 4,000 seconds (for s = 5 and n = 30) on an IBM RS 6000/3AT workstation. There are also potential applications of TA to optimal design. Winker (1995) has shown how TA can be successfully used for model selection. Specifically, he used it to choose the lag order in multivariate auto-regressive time series models.
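To make the uniform-design application concrete, the sketch below shows a U-type design representation and one plausible neighborhood move (swapping two entries within a non-fixed column); the representation is standard, but the particular move and parameters are illustrative assumptions and may differ from those used by Winker and Fang (1996).

```python
import numpy as np

def random_u_type(n, s, rng):
    """A random U-type design: each column is a permutation of 1..n,
    with the first column fixed to (1, ..., n)'."""
    cols = [np.arange(1, n + 1)]
    cols += [rng.permutation(np.arange(1, n + 1)) for _ in range(s - 1)]
    return np.column_stack(cols)

def neighbor(U, rng):
    """A small perturbation of U: swap two entries within one non-fixed column,
    which keeps every column a permutation of 1..n."""
    V = U.copy()
    j = int(rng.integers(1, V.shape[1]))                  # never touch the fixed first column
    r1, r2 = rng.choice(V.shape[0], size=2, replace=False)
    V[[r1, r2], j] = V[[r2, r1], j]
    return V

# Combining this neighborhood with a nonuniformity measure D(U), e.g. the
# discrepancy of the scaled points (U - 0.5) / n, and the TA sketch of
# Section 3 gives an approximate solver for problem (4).
rng = np.random.default_rng(2)
print(neighbor(random_u_type(8, 3, rng), rng))
```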
5 Conclusion

We have shown the need for global optimization in statistics. Two global methods, SNTO and TA, have been applied to many statistical problems. There may well be other global optimization algorithms, beyond those we are aware of, that can be applied in statistics. Most statisticians are not very familiar with optimization theory and methods, especially recent developments. One purpose of this paper is to invite optimization experts to consider the needs of statisticians in designing and analyzing new optimization algorithms.
Acknowledgments

This work was partially supported by Hong Kong RGC grant #94-95/38.
References

[1] Atkinson, A. C. and Donev, A. N. (1992), Optimum Experimental Designs, Oxford Science Publications, Oxford.
[2] Althofer, I. and Koschnick, K.-U. (1991), On the convergence of "Threshold Accepting", Applied Mathematics and Optimization, 24, 183-195.
[3] Bertsimas, D. and Tsitsiklis, J. (1993), Simulated annealing, Statist. Science, 8, 10-15.
[4] Bundschuh, P. and Zhu, Y. C. (1993), A method for exact calculation of the discrepancy of low-dimensional finite point sets (I), Abhandlungen aus dem Math. Seminar der Univ. Hamburg, Bd. 63.
[5] Cerny, V. (1985), A thermodynamic approach to the traveling salesman problem: An efficient simulation, J. Optim. Theory Appl., 45, 41-51.
[6] Chipman, J. S. and Winker, P. (1995), Optimal industrial classification by threshold accepting, Control and Cybernet., 24 (4), 477-494.
[7] Clerk, L. D. (1986), A method for exact calculation of the star-discrepancy of plane sets applied to the sequences of Hammersley, Mh. Math., 101, 261-278.
[8] Delgado, A., Puigjaner, L., Sanjeevan, K. and Sole, I. (1996), Hybrid system: neural networks and genetic algorithms applied in nonlinear regression and time series forecasting, in: COMPSTAT, Proceedings in Computational Statistics, 12th Symposium, A. Prat ed., Physica, Heidelberg, pp. 217-222.
[9] Dorsey, R. E. and Mayer, W. J. (1995), Genetic algorithms for estimation problems with multiple optima, nondifferentiability, and other irregular features, J. Bus. Econom. Statist., 13 (1), 53-66.
[10] Dueck, G. and Scheuer, T. (1990), Threshold accepting: a general purpose algorithm appearing superior to simulated annealing, J. Comput. Phys., 90, 161-175.
[11] Dueck, G. and Winker, P. (1992), New concepts and algorithms for portfolio choice, Appl. Stochastic Models Data Anal., 8, 159-178.
[12] Fang, K. T. and Hickernell, F. J. (1995), The uniform design and its applications, in Bulletin of The International Statistical Institute, 50th Session, Book 1, Beijing, pp. 339-348.
[13] Fang, K. T. and Wang, Y. (1990), A sequential algorithm for optimization and its applications to regression analysis, in Lecture Notes in Contemporary Mathematics, L. Yang and Y. Wang eds., Science Press, Beijing, pp. 17-28.
[14] Fang, K. T. and Wang, Y. (1991), A sequential algorithm for solving a system of nonlinear equations, J. Comput. Math., 9, 9-16.
[15] Fang, K. T. and Wang, Y. (1994), Number-Theoretic Methods in Statistics, Chapman & Hall, London.
[16] Fang, K. T., Wang, Y. and Bentler, P. M. (1994), Some applications of number-theoretic methods in statistics, Statist. Sci., 9, 416-428.
[17] Fang, K. T. and Yuan, K. H. (1990), A unified approach to maximum likelihood estimation, Chinese J. Appl. Probab. Statist., 6, 412-418.
[18] Fang, K. T. and Zhang, J. T. (1993), A new algorithm for calculation of estimates of parameters of nonlinear regression modelings, Acta Math. Appl. Sinica, 16, 366-377.
[19] Ferrari, P. A., Frigessi, A. and Schonmann, R. H. (1993), Convergence of some partially parallel Gibbs samplers with annealing, Ann. Appl. Probab., 3, 137-153.
[20] Fletcher, R. (1987), Practical Methods of Optimization, John Wiley & Sons, New York.
[21] Goffe, W. L., Ferrier, G. D. and Rogers, J. (1994), Global optimization of statistical functions with simulated annealing, J. Econometrics, 60, 65-99.
[22] Hickernell, F. J. and Fang, K. T. (1993), Combining quasirandom search and Newton-like methods for nonlinear equations, Technical Report MATH-037, Hong Kong Baptist University.
[23] Hickernell, F. J. and Yuan, Y. X. (1994), A simple generalized multistart algorithm for global optimization, Technical Report MATH-053, Hong Kong Baptist University.
[24] Horst, R. and Tuy, H. (1990), Global Optimization, Springer, Berlin.
[25] Hua, L. K. and Wang, Y. (1981), Applications of Number Theory to Numerical Analysis, Springer-Verlag and Science Press, Berlin and Beijing.
[26] Kirkpatrick, S., Gelatt, C. D. and Vecchi, M. P. (1983), Optimization by simulated annealing, Science, 220, 621-630.
[27] Li, W. and Fang, K. T. (1995), A global optimum algorithm on two factor uniform design, Technical Report MATH-095, Hong Kong Baptist University.
[28] Liang, Y. Z. and Fang, K. T. (1996), Some robust multivariate calibration algorithm based on least median squares and sequential number-theoretic optimization method, Analyst Chem., in press.
[29] Malkovich, J. F. and Afifi, A. A. (1973), On tests for multivariate normality, J. Amer. Statist. Assoc., 68, 176-179.
[30] Mangeas, M. and Muller, C. (1996), How to find suitable parametric models using genetic algorithms: application to feedforward neural networks, in: COMPSTAT, Proceedings in Computational Statistics, 12th Symposium, A. Prat ed., Physica, Heidelberg, pp. 355-360.
[31] Nash, J. C. and Walker-Smith, M. (1987), Nonlinear Parameter Estimation: An Integrated System in BASIC, Dekker, New York.
[32] Niederreiter, H. (1973), Application of diophantine approximations to numerical integration, in Diophantine Approximation and Its Applications, C. F. Osgood ed., Academic Press, New York, pp. 129-199.
[33] Niederreiter, H. (1992), Random Number Generation and Quasi-Monte Carlo Methods, CBMS-NSF, SIAM, Philadelphia.
[34] Niederreiter, H. and Peart, P. (1986), Localization of search in quasi-Monte Carlo for global optimization, SIAM J. Sci. Statist. Comput., 7, 660-664.
[35] Rinnooy Kan, A. H. G. and Timmer, G. T. (1989), Global optimization, in Handbooks in Operations Research and Management Science, Vol. 1: Optimization, G. L. Nemhauser, A. H. G. Rinnooy Kan and M. J. Todd eds., North-Holland, Amsterdam, pp. 631-662.
[36] Rousseeuw, P. J. and van Zomeren, B. C. (1990), Unmasking multivariate outliers and leverage points (with discussion), J. Amer. Statist. Assoc., 85, 633-651.
[37] Steel, J. M. (1993), In this issue, Statist. Sci., 8, 1-2.
[38] Weyl, H. (1916), Über die Gleichverteilung von Zahlen mod. Eins, Math. Ann., 77, 313-352.
[39] Winker, P. (1995), Identification of multivariate AR-models by threshold accepting, Comput. Statist. Data Anal., 20, 295-307.
[40] Winker, P. and Fang, K. T. (1996), Optimal U-type designs, Technical Report MATH-117, Hong Kong Baptist University.
[41] Winker, P. and Fang, K. T. (1997), Application of threshold accepting to the evaluation of the discrepancy of a set of points, SIAM J. Num. Anal., in press.
[42] Zhang, L., Liang, Y. Z. and Yu, R. Q. (1996), Sequential number-theoretic optimization method (SNTO) applied to chemical quantitative analysis, preprint.