Proceedings of the 2005 Winter Simulation Conference M. E. Kuhl, N. M. Steiger, F. B. Armstrong, and J. A. Joines, eds.

KRIGING METAMODELING IN DISCRETE-EVENT SIMULATION: AN OVERVIEW

Wim C.M. Van Beers
Department of Information Systems and Management
Tilburg University
Postbox 90153, 5000 LE Tilburg, THE NETHERLANDS

ABSTRACT

Many simulation experiments require considerable computer time, so interpolation is needed for sensitivity analysis and optimization. The interpolating functions are ‘metamodels’ (or ‘response surfaces’) of the underlying simulation models. For sensitivity analysis and optimization, simulationists use various interpolation techniques (e.g., low-order polynomial regression or neural nets). This paper, however, focuses on Kriging interpolation. In the 1950s, D.G. Krige developed this technique for the mining industry. Currently, Kriging interpolation is frequently applied in Computer Aided Engineering. In discrete-event simulation, however, Kriging has only just started to be applied. This paper discusses Kriging for sensitivity analysis in simulation, including methods to select an experimental design for Kriging interpolation.

1 INTRODUCTION

A primary goal of simulation is ‘what-if’ or sensitivity analysis: What happens to the outputs if inputs of the simulation model change? Therefore simulationists run a given simulation program—or computer code—for (say) n different combinations of the k simulation inputs and observe the outputs. (Most simulation models have multiple outputs, but in practice these outputs are analyzed per output type.) To analyze these input/output (I/O) data, classic analysis uses low-order regression metamodels; see Kleijnen (1998). A metamodel is an approximation of the I/O transformation implied by the underlying simulation program. (In certain disciplines, metamodels are also called response surfaces, compact models, emulators, etc.) Such a metamodel treats the simulation model as a black box: the simulation model's I/O is observed, and the parameters of the metamodel are estimated. This black-box approach has the following advantages and disadvantages. An advantage is that the metamodel can be applied to the output of all types of simulation models—deterministic or random, in steady state or in a transient state. A disadvantage is that it cannot benefit from the specific structure of the simulation model, so it may take more computer time compared with techniques such as perturbation analysis and score functions. Metamodeling can also help in optimization and validation of a simulation model; this paper, however, does not discuss these two topics. Further, if the simulation model has hundreds of inputs, then special ‘screening’ designs are needed, discussed in Campolongo, Kleijnen, and Andres (2000). The examples in this paper, however, limit the number of inputs to one or two.

Whereas polynomial-regression metamodels have been applied extensively in discrete-event simulation (such as queueing simulation), Kriging has hardly been applied to random simulation. However, in deterministic simulation (applied in many engineering disciplines; see for example De Geest et al. 1999), Kriging has been applied frequently since the pioneering article by Sacks et al. (1989). In such simulation, Kriging is attractive because it can ensure that the metamodel’s prediction has exactly the same value as the observed simulation output. In random simulation, however, this Kriging property may not be so desirable, since the observed (average) value is only an estimate of the true, expected simulation output. Note that several types of random simulation may be distinguished:

1. Deterministic simulation with randomly sampled inputs. For example, in investment analysis the cash flow development over time can be computed through a spreadsheet such as Excel. Next, the random values of inputs—such as the cash flow growth rate—are sampled by means of either Monte Carlo or Latin Hypercube Sampling (LHS) through an add-on such as @Risk or Crystal Ball; see Van Groenendaal and Kleijnen (1997).

2. Discrete-event simulation. For example, classic queueing simulation is applied in logistics and telecommunications; see Van Beers and Kleijnen (2003).

3. Combined continuous/discrete-event simulation. For example, simulation of nuclear waste disposal represents the physical and chemical processes through deterministic non-linear difference equations and models the human interventions as discrete events; see Kleijnen and Helton (1999).

The remainder of this paper is organized as follows. Subsection 2.1 sketches the history of Kriging and its application in geology and in simulation. Subsection 2.2 describes the basics of Kriging and gives the formal Kriging model. Section 3 discusses classic designs for Kriging and mentions criteria for measuring their performance. Subsection 3.1 treats customized designs for Kriging in deterministic simulation, whereas Subsection 3.2 treats customized designs for random simulation. Both subsections demonstrate the performance of the customized designs through two academic simulation models. Section 4 presents conclusions and topics for future research.

2 KRIGING

2.1 History of Kriging

In the 1950s, the South African mining engineer D.G. Krige (born in 1919) devised an interpolation method to determine true ore bodies, based on samples. The basic idea is that these predictions are weighted averages of the observed outputs, where the weights depend on the distances between the input location to be predicted and the input locations already observed. The weights are chosen so as to minimize the prediction variance; that is, the weights should provide a Best Linear Unbiased Estimator (BLUE) of the output value for a given input. Therefore, Kriging is also called Optimal Interpolation.

The dependence of the interpolation weights on the distances between the inputs was mathematically formalized by the French mathematician Georges Matheron (1930-2000) in his monumental ‘Traité de géostatistique appliquée’ (1962). He introduced a function, which he called a variogram, to describe the variance of the difference between two observations. The variogram is the cornerstone of Kriging. Hence, accurate estimation of the variogram, based on the observed data, is essential. Journel and Huijbregts (1978, pp. 161-195) present various parametric variogram models. The values of their parameters are obtained by either Weighted Least Squares (WLS) or Maximum Likelihood Estimation (MLE); see Cressie (1993).

So Kriging originated in geostatistics to answer concrete questions in the gold mining industry: drilling for ore—deep under the ground—is expensive, so efficient prediction methods are necessary. Later on, Kriging was successfully introduced into deterministic simulation by Sacks et al. (1989). For example, Kriging is nowadays often applied in CAE. Van Beers and Kleijnen (2003) introduce Kriging interpolation into the area of random simulation.

2.2 Formal Model for Kriging

A random process Z(·) can be described by {Z(s) : s ∈ D}, where D is a fixed subset of R^d and Z(s) is a random function at location s ∈ D; see Cressie (1993, p. 52). There are several types of Kriging, but this paper restricts itself to Ordinary Kriging, which makes the following two assumptions:

1. The model assumption is that the random process consists of a constant μ and an error term δ(s): Z(s) = μ + δ(s), with s ∈ D and μ ∈ R.

2. The predictor assumption is that the predictor for the point s_0—denoted by p(Z(s_0))—is a weighted linear function of all the observed output data:

\[
p(Z(\mathbf{s}_0)) = \sum_{i=1}^{n} \lambda_i Z(\mathbf{s}_i)
\quad \text{with} \quad
\sum_{i=1}^{n} \lambda_i = 1 .
\tag{1}
\]

To select the weights λ_i in (1), the criterion is the minimal mean-squared prediction error (MSE), defined as

\[
\sigma_e^2 = E\!\left[ \big( Z(\mathbf{s}_0) - p(Z(\mathbf{s}_0)) \big)^2 \right].
\tag{2}
\]

Substituting the variogram, defined as

\[
2\gamma(\mathbf{h}) = \operatorname{var}\!\left[ Z(\mathbf{s} + \mathbf{h}) - Z(\mathbf{s}) \right],
\]

into (2) gives the optimal weights λ_1, …, λ_n:

\[
\boldsymbol{\lambda}' = \left( \boldsymbol{\gamma} + \mathbf{1}\,
\frac{1 - \mathbf{1}'\,\Gamma^{-1}\boldsymbol{\gamma}}{\mathbf{1}'\,\Gamma^{-1}\mathbf{1}} \right)' \Gamma^{-1},
\tag{3}
\]

where γ denotes the vector of (co)variances (γ(s_0 − s_1), …, γ(s_0 − s_n))′, Γ denotes the n × n matrix whose (i, j)th element is γ(s_i − s_j), and 1 = (1, …, 1)′ is the vector of ones; see also Cressie (1993, p. 122). Note that these optimal Kriging weights λ_i depend on the specific point s_0 that is to be predicted, whereas linear-regression metamodels use the same estimated parameters (say) β̂ for every point s_0 to be predicted.
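To make the predictor (1) and the weight formula (3) concrete, the following minimal Python/NumPy sketch may help. It is not from the paper: the function names are mine, and the linear variogram γ(h) = h used in the illustration at the bottom is an arbitrary choice, not a recommendation.

```python
import numpy as np

def ordinary_kriging_weights(s, s0, gamma):
    """Weights lambda per equation (3), given a variogram function gamma(h).

    s     : (n, d) array of observed input locations
    s0    : (d,) input location to be predicted
    gamma : semivariogram, a function of the distance h = ||s_i - s_j||
    """
    n = len(s)
    # Gamma: n x n matrix whose (i, j)th element is gamma(s_i - s_j)
    G = np.array([[gamma(np.linalg.norm(s[i] - s[j])) for j in range(n)]
                  for i in range(n)])
    # g: vector (gamma(s0 - s_1), ..., gamma(s0 - s_n))
    g = np.array([gamma(np.linalg.norm(s0 - s[i])) for i in range(n)])
    ones = np.ones(n)
    G_inv = np.linalg.pinv(G)  # pseudo-inverse, for numerical robustness
    # Lagrange-multiplier term enforcing the constraint sum(lambda) = 1
    m = (1.0 - ones @ G_inv @ g) / (ones @ G_inv @ ones)
    return G_inv @ (g + m * ones)

def ordinary_kriging_predict(s, z, s0, gamma):
    """Predictor (1): a weighted linear function of the observed outputs z."""
    s, z, s0 = np.asarray(s, float), np.asarray(z, float), np.asarray(s0, float)
    return ordinary_kriging_weights(s, s0, gamma) @ z

# Illustration with the arbitrary linear variogram gamma(h) = h:
s = np.array([[0.1], [0.5], [0.9]])  # three observed input locations
z = np.array([1.0, 2.0, 5.0])        # corresponding simulation outputs
print(ordinary_kriging_predict(s, z, [0.3], lambda h: h))
```

When s_0 coincides with an observed location, the weights put all mass on that observation, which yields the exactness property mentioned in the introduction.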


However, in (3) γ(h) is unknown. The usual estimator is

\[
2\hat{\gamma}(\mathbf{h}) = \frac{1}{|N(\mathbf{h})|} \sum_{N(\mathbf{h})} \big( Z(\mathbf{s}_i) - Z(\mathbf{s}_j) \big)^2 ,
\]

where N(h) = {(s_i, s_j) : s_i − s_j = h; i, j = 1, …, n} and |N(h)| denotes the number of distinct pairs in N(h); see Matheron (1962).
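A rough sketch of this classical estimator follows; the function name is mine, and the distance binning is a common practical device (exact repeated lags rarely occur with real-valued inputs) rather than something the text prescribes.

```python
import numpy as np

def empirical_variogram(s, z, bins):
    """Matheron's classical estimator, per lag bin.

    s    : (n, d) array of input locations
    z    : (n,) observed outputs
    bins : increasing sequence of lag-bin edges
    Returns (lag midpoint, gamma_hat, number of pairs) per non-empty bin.
    Note: the text defines 2*gamma_hat; dividing by 2*|N(h)| gives gamma_hat.
    """
    s, z = np.asarray(s, float), np.asarray(z, float)
    n = len(z)
    dist, sqdiff = [], []
    for i in range(n):                 # all distinct pairs (i < j)
        for j in range(i + 1, n):
            dist.append(np.linalg.norm(s[i] - s[j]))
            sqdiff.append((z[i] - z[j]) ** 2)
    dist, sqdiff = np.array(dist), np.array(sqdiff)
    result = []
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (dist >= lo) & (dist < hi)
        if mask.any():
            result.append(((lo + hi) / 2,
                           sqdiff[mask].sum() / (2 * mask.sum()),
                           int(mask.sum())))
    return result
```

A parametric variogram model (Journel and Huijbregts 1978) would then be fitted to these binned estimates by WLS or MLE, as noted above.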

3 DESIGNS FOR KRIGING

An experimental design is a set of n combinations of k factor values. These combinations are usually bounded by ‘box’ constraints a_j ≤ x_j ≤ b_j, with a_j, b_j ∈ R and j = 1, …, k. The set of all feasible combinations is called the experimental region (say) H. We suppose that H is a k-dimensional unit cube, after rescaling the original rectangular area. Our goal is to find the ‘best’ design for Kriging predictions within H; the Kriging literature proposes several criteria (see Sacks et al. 1989, p. 414). Most of these criteria are based on the predictor’s MSE (2). Most progress has been made for the IMSE (see Bates et al. 1996):

\[
\mathrm{IMSE} = \int_{H} \mathrm{MSE}\big( \hat{Y}(\mathbf{x}) \big)\, \phi(\mathbf{x})\, d\mathbf{x},
\tag{4}
\]

where MSE follows from minimizing (2), and φ(x) is a given weight function—usually assumed to be a constant. To evaluate a design, Sacks et al. (1989, p. 416) compare the predictions with the known output values of a test set consisting of (say) N inputs. Assuming a constant φ(x) in (4), the IMSE can then be estimated by the Empirical IMSE (EIMSE):

\[
\mathrm{EIMSE} = \frac{1}{N} \sum_{i=1}^{N} \big( \hat{y}(\mathbf{x}_i) - y(\mathbf{x}_i) \big)^2 .
\tag{5}
\]

Besides this EIMSE, we will also study the maximum MSE; that is, we also consider risk-averse users (see also Van Groenigen 2000). So the IMSE defined in (4) is replaced by

\[
\mathrm{MaxMSE} = \max_{\mathbf{x} \in H} \mathrm{MSE}\big( \hat{Y}(\mathbf{x}) \big),
\tag{6}
\]

and the EIMSE in (5) by

\[
\mathrm{EMaxIMSE} = \max_{i \in \{1, \ldots, m\}} \big( \hat{y}(\mathbf{x}_i) - y(\mathbf{x}_i) \big)^2 .
\]

The most popular design type for Kriging is Latin Hypercube Sampling (LHS). This type of design was introduced by McKay, Beckman, and Conover (1979) for deterministic simulation models. Those authors did not analyze the I/O data by Kriging (but they did assume I/O functions more complicated than the polynomial models in classic DOE). LHS offers flexible design sizes n (the number of input combinations actually simulated) for any k (the number of simulation inputs). LHS proceeds as follows; see also the example for k = 2 factors in Figure 1 and the code sketch after it.

1. LHS divides each input range into n intervals of equal length, numbered from 1 to n (so the number of values per input can be much larger than in designs for low-order polynomials).

2. Next, LHS places these integers 1, …, n such that each integer appears exactly once in each row and each column of the design matrix.

3. Within each cell of the design matrix, the exact input value may be sampled uniformly. (Alternatively, these values may be placed systematically in the middle of each cell. In risk analysis, this uniform sampling may be replaced by sampling from some other distribution for the input values.)

[Figure 1: A LHS Design for Two Factors & Four Scenarios — figure omitted; axes x1 and x2 each run from −1 to +1, and the label (i) marks scenario i, for i = 1, …, 4.]
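Here is a minimal sketch of the three LHS steps above, in Python/NumPy; the function name and the midpoints option are mine. Each column gets its own random permutation of the n interval labels, which guarantees that every interval is used exactly once per input.

```python
import numpy as np

def latin_hypercube(n, k, rng=None, midpoints=False):
    """Latin Hypercube Sample of n points in the k-dimensional unit cube.

    Step 1: divide each input range into n equal intervals.
    Step 2: permute the interval labels per input (once per row/column).
    Step 3: sample uniformly within each selected cell (or take midpoints).
    """
    rng = np.random.default_rng(rng)
    design = np.empty((n, k))
    for j in range(k):
        perm = rng.permutation(n)                   # each interval used exactly once
        offset = 0.5 if midpoints else rng.uniform(size=n)
        design[:, j] = (perm + offset) / n          # scale cells into [0, 1)
    return design

# Example: a design like Figure 1 (k = 2 factors, n = 4 scenarios),
# rescaled from [0, 1] to [-1, +1].
X = 2 * latin_hypercube(4, 2, rng=0) - 1
print(X)
```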

Because LHS implies randomness, its result may happen to be an outlier. For example, it might happen—with small probability—that two input factors have a correlation coefficient of −1 (all their values lie on the main diagonal of the design matrix). Therefore the LHS may be adjusted to become (nearly) orthogonal; see Ye (1998).

Classic designs simulate extreme scenarios—namely the corners of a k-dimensional square—whereas LHS has better space-filling properties; again see Figure 1. This space-filling property has inspired many statisticians to develop related designs. One type maximizes the minimum Euclidean distance between any two points in the k-dimensional experimental area; other designs minimize the maximum distance. See Koehler and Owen (1996), Santner, Williams, and Notz (2003), and also Kleijnen et al. (2004).
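As an illustration of the maximin idea just mentioned, one simple brute-force approach is to generate many random LHS designs and keep the one with the largest minimum pairwise distance. This sketch is my own, reusing latin_hypercube from the earlier sketch; it is not an algorithm taken from the cited references, which construct such designs more cleverly.

```python
import numpy as np
from itertools import combinations

def maximin_lhs(n, k, tries=1000, rng=None):
    """Among `tries` random LHS designs, return the one maximizing the
    minimum Euclidean distance between any two of its points."""
    rng = np.random.default_rng(rng)
    best, best_dist = None, -np.inf
    for _ in range(tries):
        X = latin_hypercube(n, k, rng=rng)
        d = min(np.linalg.norm(a - b) for a, b in combinations(X, 2))
        if d > best_dist:
            best, best_dist = X, d
    return best
```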

3.1 Customized Sequential Designs for Deterministic Simulation

Kleijnen and Van Beers (2004) derive designs that are customized; that is, they are not generic designs (such as 2^(k−p) designs or LHS). More precisely, these customized designs account for the specific input/output function of the particular simulation model at hand. This customization is achieved through cross-validation and jackknifing. Furthermore, these designs are sequential, because sequential procedures are known to be more ‘efficient’; see, for example, Ghosh and Sen (1991) and Park et al. (2002). The procedure starts with a ‘small’ pilot design of size (say) n_0. To avoid extrapolation, the procedure first selects […]. The stopping criterion is the Successive Relative Improvement (SRI) after n observations:

\[
\mathrm{SRI}_n = \frac{ \max_j \{ \tilde{s}_j^2 \}_n \; - \; \max_j \{ \tilde{s}_j^2 \}_{n-1} }{ \max_j \{ \tilde{s}_j^2 \}_{n-1} } ,
\]

where max_j { s̃_j² }_n denotes the maximum jackknife variance over the j = 1, …, c candidates after n evaluations.
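To make the SRI statistic concrete, here is a rough sketch, assuming the ordinary_kriging_predict function from the Section 2.2 sketch. It uses a plain leave-one-out variance as a stand-in for the jackknife variance; the actual pseudo-value construction and candidate selection of Kleijnen and Van Beers (2004) are more refined.

```python
import numpy as np

def max_jackknife_variance(s, z, candidates, gamma):
    """Leave-one-out proxy for the maximum jackknife variance: for each
    candidate input x, recompute the Kriging prediction n times, each time
    deleting one observation, and take the variance of those re-predictions.
    The candidate with the largest variance is where the metamodel is least
    trustworthy."""
    s, z = np.asarray(s, float), np.asarray(z, float)
    n = len(z)
    max_var = -np.inf
    for x in candidates:
        preds = [ordinary_kriging_predict(np.delete(s, i, axis=0),
                                          np.delete(z, i), x, gamma)
                 for i in range(n)]
        max_var = max(max_var, np.var(preds, ddof=1))
    return max_var

def sri(max_var_n, max_var_prev):
    """Successive Relative Improvement after n observations (formula above)."""
    return (max_var_n - max_var_prev) / max_var_prev
```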

Note that there are several stopping rules. For example, Sasena et al. (2002) use the Generalized Expected Improvement function, which selects inputs that have high model inaccuracy; they stop their tests—rather arbitrarily—after 100 calls of this function. Schonlau (1997), by contrast, proposes stopping once the ratio of the expected improvement becomes sufficiently small, e.g., 0.01.

Kleijnen and Van Beers (2004) test their Customized Sequential Designs (CSD) through two academic applications: 1. the hyperbolic I/O function y = x/(1 − x) with 0 < x