New Myopic Sequential Sampling Procedures

Submitted to INFORMS Journal on Computing manuscript (Please, provide the manuscript number!)

Stephen E. Chick
INSEAD; Technology and Operations Management Area; Boulevard de Constance; F-77305 Fontainebleau, France, [email protected], http://faculty.insead.edu/chick/

Jürgen Branke
Institute AIFB; University of Karlsruhe (TH); D-76128 Karlsruhe, Germany, [email protected], http://www.aifb.uni-karlsruhe.de/~jbr

Christian Schmidt
Locom Software GmbH; Stumpfstr. 1, D-76131 Karlsruhe, Germany, [email protected], http://www.locom.de/

Statistical selection procedures are used to select the best of a finite set of alternatives, when stochastic simulation is used to infer the value of each alternative, and "best" is defined in terms of each alternative's unknown expected value. One effective approach, which is based upon a Bayesian probability model for the unknown mean performance of each system, allocates samples based on maximizing an approximation to the expected value of information (EVI) from those samples. The approximations include asymptotic and probabilistic approximations. This paper derives sampling allocations that avoid most of those approximations to the EVI, but that entail sequential myopic sampling from a single alternative per stage of sampling. We then demonstrate empirically that the benefits of reducing the number of approximations in the previous algorithms are outweighed by the deleterious effects of a sequential one-step myopic allocation, at least when more than a few dozen samples are allocated. The tests show, however, that there is a benefit from switching from the prior EVI approach to the new small-sample procedures to allocate the last few samples. The theory here also clarifies the derivation of selection procedures that are based upon the EVI. We also report on computational techniques that improve the numerical stability of the procedures when they are pushed to extreme levels of correct selection.
Key words: 902 Simulation: Statistical analysis; 055 Decision analysis: Sequential; 793 Statistics, Bayesian
History: Submitted December 9, 2007

Selection procedures are used to select the best of a finite set of alternatives, where best is determined with respect to the largest mean, and the mean must be inferred via statistical sampling (Bechhofer et al. 1995, Kim and Nelson 2006). Selection procedures are useful in simulation, and have attracted interest in combination with tools like evolutionary algorithms (Branke and Schmidt 2004, Schmidt et al. 2006), and discrete optimization via simulation (Boesel et al. 2003). Since inference about the unknown mean performance of each system is estimated with stochastic simulation output, it is not possible to guarantee that the best alternative is selected with probability 1 in finite time. An ability to minimize the expected penalty for incorrect selections with a limited number of samples is desired. The expected penalty is typically measured by the

probability of incorrect selection (PICS), or the expected opportunity cost (EOC) associated with potentially selecting a system that is not the best. Several frequentist and Bayesian formulations have been used to motivate and derive selection procedures. Branke et al. (2007) compare several of those formulations for fully sequential sampling procedures. A fully sequential procedure allocates one sample at a time to one of the simulated alternatives, runs a replication of the simulation for that alternative, updates the statistics for that system, and iterates until the selection procedure stops and identifies a system to select. They indicate that specific instances of two of those approaches, the expected value of information approach (VIP) (Chick and Inoue 2001) and the optimal computing budget allocation approach (OCBA) (Chen et al. 2000, He et al. 2007) perform quite well, especially when used with particular adaptive stopping rules. The Online Companion to (Branke et al. 2007) also provides a theoretical explanation for why the VIP and OCBA approaches, when used to minimize the EOC objective function for sampling, perform similarly in numerical experiments. The articles that derived the original VIP and OCBA procedures (Chick and Inoue 2001, Chen et al. 2000, He et al. 2007) make use of asymptotic approximations (in the number of samples), approximations to the distribution of the difference of two variables with t distributions (e.g. Welch’s approximation for the so-called Behrens-Fisher problem), and approximations to bound the probability of certain events (e.g. Bonferroni’s or Slepian’s inequality). Those approximations make it feasible to quickly compute an effective (albeit somewhat suboptimal) sequential allocation when the variance of each alternative is unknown but inferrable from simulation output. If the sampling variance for each alternative is known, Gupta and Miescke (1994, 1996) derived optimal allocations when there are two alternatives, and effective suboptimal allocations when there are multiple alternatives. In simulation experiments, however, the sample variance is typically unknown. This paper derives small-sample allocations with the VIP approach that allocate all samples to a single system at each stage of a sequential procedure when the sample variances are unknown. The derivation of the procedures below avoids an asymptotic approximation, the Behrens-Fisher problem (no Welch approximation), and the need to use Bonferroni’s or Slepian’s inequality. Because those approximations are averted, one might expect that the new small-sample VIP procedures would perform better than the original large-sample counterparts. We show that empirically, this is typically not the case when a medium to large number of samples is observed. The explanation is that the small-sample VIP procedures allocate samples in order to optimize the expected value of information (EVI) of those samples, assuming that a system is selected immediately after a given stage of sampling from a single system, rather than after iterating.


The penalty of myopically applying an optimal small-sample algorithm sequentially therefore outweighs the penalty associated with making more approximations that model large-sample behavior well, at least when more than a few replications are observed, as is common in simulation practice. The experiments show, however, that the original EVI algorithms can be somewhat enhanced by allocating the last few samples with a new small-sample procedure.
This paper focuses on independent samples. We did not analyze common random numbers (a variance reduction tool) with this approach, since the new approach did not present a marked improvement in performance when independent simulations are used. The paper contributes, however, beyond showing that the new small-sample approach is useful for the final few samples of a sequential procedure, by illustrating the influence of several approximations that are commonly used in ranking and selection. The theory below further clarifies the derivation of selection procedures that are based upon the EVI. We also report on computational techniques that improve the numerical stability of the procedures when they are pushed to extreme levels of correct selection. Preliminary numerical results, and claims without proofs, appeared as Chick et al. (2007).

1. Problem Formulation and Selection Procedures
We first formalize the problem and describe the assumptions. Section 1.2 describes measures of the evidence of correct selection and stopping rules that influence the efficiency of the procedures. Section 1.3 recalls the original VIP procedures, in order to provide context for the new procedures in Section 1.4. Those procedures are formally justified in Section 2.

1.1. Setup, Assumptions and Notation

The best of $k$ simulated systems is to be identified, where 'best' means the largest output mean. Let $X_{ij}$ be a random variable whose realization $x_{ij}$ is the output of the $j$th simulation replication of system $i$, for $i = 1, \ldots, k$ and $j = 1, 2, \ldots$. Let $w_i$ and $\sigma_i^2$ be the unknown mean and variance of simulated system $i$, and let $w_{[1]} \le w_{[2]} \le \ldots \le w_{[k]}$ be the ordered means. In practice, the ordering $[\cdot]$ is unknown, and the best system, system $[k]$, is to be identified with simulation. The procedures considered below are derived from the assumption that simulation output is independent and normally distributed, conditional on $w_i$ and $\sigma_i^2$:
$$\{X_{ij} : j = 1, 2, \ldots\} \stackrel{\mathrm{iid}}{\sim} \mathrm{Normal}\bigl(w_i, \sigma_i^2\bigr), \quad \text{for } i = 1, \ldots, k.$$

Although the normality assumption is not always valid, it is often possible to batch a number of outputs so that normality is approximately satisfied. Vectors are written in boldface, such as $\mathbf{w} = (w_1, \ldots, w_k)$ and $\boldsymbol{\sigma}^2 = (\sigma_1^2, \ldots, \sigma_k^2)$. A problem instance (configuration) is denoted by $\chi = (\mathbf{w}, \boldsymbol{\sigma}^2)$.


Let $n_i$ be the number of replications for system $i$ run so far. Let $\bar{x}_i = \sum_{j=1}^{n_i} x_{ij}/n_i$ be the sample mean and $\hat{\sigma}_i^2 = \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2/(n_i - 1)$ be the sample variance. Let $\bar{x}_{(1)} \le \bar{x}_{(2)} \le \ldots \le \bar{x}_{(k)}$ be the ordering of the sample means based on all replications seen so far. Equality occurs with probability 0 in contexts of interest here. The quantities $n_i$, $\bar{x}_i$, $\hat{\sigma}_i^2$ and $(i)$ are updated as more replications are observed. Each selection procedure generates estimates $\hat{w}_i$ of $w_i$, for $i = 1, \ldots, k$. For the procedures studied here, $\hat{w}_i = \bar{x}_i$, and a correct selection occurs when the selected system, system $D$, is the best system, $[k]$. Here, system $D = (k)$ is selected as best.
If $T_\nu$ is a random variable with standard Student $t$ distribution with $\nu$ degrees of freedom, we denote (as do Bernardo and Smith 1994) the distribution of $\mu + \kappa^{-1/2} T_\nu$ by $\mathrm{St}(\mu, \kappa, \nu)$. If $\nu > 2$, then the variance is $\kappa^{-1}\nu/(\nu - 2)$. If $\kappa = \infty$ or $1/0$, then $\mathrm{St}(\mu, \kappa, \nu)$ is a point mass at $\mu$. Denote the cumulative distribution function (cdf) of the standard $t$ distribution ($\mu = 0$, $\kappa = 1$) by $\Phi_\nu(\cdot)$ and the probability density function (pdf) by $\phi_\nu(\cdot)$. The standardized EOC function is denoted
$$\Psi_\nu[s] \;\stackrel{\mathrm{def}}{=}\; \int_{u=s}^{\infty} (u - s)\,\phi_\nu(u)\,du \;=\; \frac{\nu + s^2}{\nu - 1}\,\phi_\nu(s) - s\,\Phi_\nu(-s). \qquad (1)$$
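The standardized EOC function in Equation (1) is simple to evaluate with a statistics library. The following Python sketch (illustrative only; the implementation reported in Appendix A is in C++) checks the closed form against the defining integral.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

def eoc_psi(s, nu):
    """Standardized EOC function Psi_nu[s] from Equation (1), using the pdf and
    cdf of the standard t distribution with nu degrees of freedom."""
    return (nu + s**2) / (nu - 1.0) * stats.t.pdf(s, nu) - s * stats.t.cdf(-s, nu)

# Sanity check against the defining integral int_s^inf (u - s) phi_nu(u) du.
s, nu = 1.3, 5
integral, _ = quad(lambda u: (u - s) * stats.t.pdf(u, nu), s, np.inf)
print(eoc_psi(s, nu), integral)   # the two values agree to numerical precision
```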

1.2. Evidence for Correct Selection

The measures of effectiveness and some stopping rules for the procedures are defined in terms of loss functions. The zero-one loss function is $L_{0\text{-}1}(D, \mathbf{w}) = \mathbb{1}\{w_D \ne w_{[k]}\}$, where the indicator function $\mathbb{1}\{\cdot\}$ equals 1 if its argument is true (here, the best system is not correctly selected), and is 0 otherwise. The opportunity cost $L_{\mathrm{oc}}(D, \mathbf{w}) = w_{[k]} - w_D$ is 0 if the best system is correctly selected, and is otherwise the difference between the best and selected system.
There are several frequentist measures that describe a procedure's ability to identify the actual best system. The frequentist probability of correct selection ($\mathrm{PCS}_{\mathrm{IZ}}$) is the probability that the mean of the system selected as best, system $D$, equals the mean of the system with the highest mean, system $[k]$, conditional on the problem instance. (The subscript IZ denotes indifference zone, a frequentist approach to selection procedures.) The probability is defined with respect to the simulation output $X_{ij}$ generated by the procedure (the realizations $x_{ij}$ determine $D$):
$$\mathrm{PCS}_{\mathrm{IZ}}(\chi) \stackrel{\mathrm{def}}{=} 1 - \mathrm{E}[L_{0\text{-}1}(D, \mathbf{w}) \mid \chi] = \Pr\bigl(w_D = w_{[k]} \mid \chi\bigr).$$
We denote the corresponding probability of incorrect selection as $\mathrm{PICS}_{\mathrm{IZ}}(\chi) = 1 - \mathrm{PCS}_{\mathrm{IZ}}(\chi)$. The probability of a good selection, the probability of selecting a system that is within a given amount $\delta^* > 0$ of the best, is
$$\mathrm{PGS}_{\mathrm{IZ},\delta^*}(\chi) \stackrel{\mathrm{def}}{=} \Pr\bigl(w_D > w_{[k]} - \delta^* \mid \chi\bigr).$$


Another measure of selection quality is the frequentist opportunity cost of a potentially incorrect selection,
$$\mathrm{EOC}_{\mathrm{IZ}}(\chi) \stackrel{\mathrm{def}}{=} \mathrm{E}[L_{\mathrm{oc}}(D, \mathbf{w}) \mid \chi] = \mathrm{E}\bigl[w_{[k]} - w_D \mid \chi\bigr].$$

At times, the configuration itself is sampled randomly prior to the application of a selection procedure. We then average over the sampling distribution of the configurations, $\mathrm{EOC}_{\mathrm{IZ}} = \mathrm{E}[\mathrm{EOC}_{\mathrm{IZ}}(\chi)]$ or $\mathrm{PCS}_{\mathrm{IZ}} = \mathrm{E}[\mathrm{PCS}_{\mathrm{IZ}}(\chi)]$.
When the mean performance of each system is unknown, a different set of measures is required to describe the evidence for correct selection. Bayesian procedures assume that parameters whose values are unknown are random variables (such as the unknown means $\mathbf{W}$), and use the posterior distributions of the unknown parameters to measure the quality of a selection. Given the data $\mathcal{E}$ seen so far, two measures of selection quality are
$$\mathrm{PCS}_{\mathrm{Bayes}} \stackrel{\mathrm{def}}{=} 1 - \mathrm{E}[L_{0\text{-}1}(D, \mathbf{W}) \mid \mathcal{E}] = \Pr\bigl(W_D \ge \max_{i \ne D} W_i \mid \mathcal{E}\bigr)$$
$$\mathrm{EOC}_{\mathrm{Bayes}} \stackrel{\mathrm{def}}{=} \mathrm{E}[L_{\mathrm{oc}}(D, \mathbf{W}) \mid \mathcal{E}] = \mathrm{E}\bigl[\max_{i=1,2,\ldots,k} W_i - W_D \mid \mathcal{E}\bigr],$$

the expectation taken over $D$ (a function of the random $X_{ij}$) and the posterior distribution of $\mathbf{W}$ given $\mathcal{E}$.
Approximations in the form of bounds on the above losses are useful to derive sampling allocations and to define stopping rules. Slepian's inequality and the Bonferroni inequality (e.g., Kim and Nelson 2006) imply that the posterior evidence that system $(k)$ is best satisfies
$$\mathrm{PCS}_{\mathrm{Bayes}} \ge \prod_{j:(j)\ne(k)} \Pr\bigl(W_{(k)} > W_{(j)} \mid \mathcal{E}\bigr) \approx \prod_{j:(j)\ne(k)} \Phi_{\nu_{(j)(k)}}\bigl(d^*_{jk}\bigr) \stackrel{\mathrm{def}}{=} \mathrm{PCS}_{\mathrm{Slep}} \qquad (2)$$
$$\mathrm{PCS}_{\mathrm{Bayes}} \ge 1 - \sum_{j:(j)\ne(k)} \Pr\bigl(W_{(k)} < W_{(j)} \mid \mathcal{E}\bigr) \approx 1 - \sum_{j:(j)\ne(k)} \Phi_{\nu_{(j)(k)}}\bigl(-d^*_{jk}\bigr) \stackrel{\mathrm{def}}{=} \mathrm{PCS}_{\mathrm{Bonf}},$$
where $d^*_{jk} = \lambda_{jk}^{1/2}\,d_{(j)(k)}$, $d_{(j)(k)} = \bar{x}_{(k)} - \bar{x}_{(j)}$, $\lambda_{jk}^{-1} = \hat{\sigma}_{(j)}^2/n_{(j)} + \hat{\sigma}_{(k)}^2/n_{(k)}$, and $\nu_{(j)(k)}$ is Welch's approximation to the degrees of freedom of the difference. A Bonferroni-like bound for the expected opportunity cost is
$$\mathrm{EOC}_{\mathrm{Bayes}} \le \sum_{j:(j)\ne(k)} \mathrm{E}\bigl[\max(W_{(j)} - W_{(k)}, 0) \mid \mathcal{E}\bigr] \approx \sum_{j:(j)\ne(k)} \lambda_{jk}^{-1/2}\,\Psi_{\nu_{(j)(k)}}\bigl[d^*_{jk}\bigr] \stackrel{\mathrm{def}}{=} \mathrm{EOC}_{\mathrm{Bonf}}. \qquad (3)$$
These quantities define stopping rules for sequential procedures: the $\mathcal{S}$ rule stops when a fixed total budget of samples has been allocated; the $\mathrm{PCS}_{\mathrm{Slep}}$ (or $\mathrm{PGS}_{\mathrm{Slep},\delta^*}$) rule stops when the estimated evidence of correct (good) selection reaches a specified target $1 - \alpha^*$; and the $\mathrm{EOC}_{\mathrm{Bonf}}$ rule stops when $\mathrm{EOC}_{\mathrm{Bonf}} < \beta^*$, for a specified EOC target $\beta^* > 0$.
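The bounds and stopping quantities above can be computed directly from the current sample statistics. The sketch below is a minimal Python illustration (the function names are ours, not from the paper's C++ implementation), using Welch's approximation for the degrees of freedom.

```python
import numpy as np
from scipy import stats

def welch_df(s2_a, n_a, s2_b, n_b):
    """Welch's approximate degrees of freedom for a difference of sample means."""
    va, vb = s2_a / n_a, s2_b / n_b
    return (va + vb) ** 2 / (va**2 / (n_a - 1) + vb**2 / (n_b - 1))

def eoc_psi(s, nu):
    """Standardized EOC function Psi_nu[s] from Equation (1)."""
    return (nu + s**2) / (nu - 1.0) * stats.t.pdf(s, nu) - s * stats.t.cdf(-s, nu)

def selection_evidence(xbar, s2, n):
    """Return (PCS_Slep, PCS_Bonf, EOC_Bonf) for the current sample statistics."""
    xbar, s2, n = map(np.asarray, (xbar, s2, n))
    best = int(np.argmax(xbar))                        # system (k): best sample mean
    pcs_slep, pics_bonf, eoc_bonf = 1.0, 0.0, 0.0
    for j in range(len(xbar)):
        if j == best:
            continue
        lam_inv = s2[j] / n[j] + s2[best] / n[best]    # 1 / lambda_jk
        d_star = (xbar[best] - xbar[j]) / np.sqrt(lam_inv)
        nu = welch_df(s2[j], n[j], s2[best], n[best])
        pcs_slep *= stats.t.cdf(d_star, nu)
        pics_bonf += stats.t.cdf(-d_star, nu)
        eoc_bonf += np.sqrt(lam_inv) * eoc_psi(d_star, nu)
    return pcs_slep, 1.0 - pics_bonf, eoc_bonf

# Example: three systems after a first stage of n0 = 6 replications each.
print(selection_evidence(xbar=[0.1, 0.5, 0.9], s2=[1.0, 1.2, 0.8], n=[6, 6, 6]))
```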

1.3. Value of Information Procedure (VIP)

VIPs allocate samples to each alternative in order to maximize the EVI of those samples. Some balance the cost of sampling with the EVI, and some maximize EVI subject to a sampling budget constraint (Chick and Inoue 2001). Procedures 0-1($\mathcal{S}$) and LL($\mathcal{S}$) are sequential variations of those procedures that are designed to iteratively improve $\mathrm{PCS}_{\mathrm{Bonf}}$ and $\mathrm{EOC}_{\mathrm{Bonf}}$, respectively. Those procedures allocate $\tau$ replications per stage until a total of $B$ replications are run. We recall those procedures in order to serve as a basis for comparison with the new procedures in Section 1.4.

Procedure 0-1.
1. Specify a first-stage sample size $n_0 > 2$, and a total number of samples $\tau > 0$ to allocate per subsequent stage. Specify stopping rule parameters.
2. Run independent replications $X_{i1}, \ldots, X_{i n_0}$, and initialize the number of replications $n_i \leftarrow n_0$ run so far for each system, $i = 1, \ldots, k$.
3. Determine the sample statistics $\bar{x}_i$ and $\hat{\sigma}_i^2$, and the order statistics, so that $\bar{x}_{(1)} \le \ldots \le \bar{x}_{(k)}$.
4. WHILE stopping rule not satisfied DO another stage:
(a) Initialize the set of systems considered for additional replications, $S \leftarrow \{1, \ldots, k\}$.
(b) For each $(i)$ in $S \setminus \{(k)\}$: If $(k) \in S$ then set $\lambda_{ik}^{-1} \leftarrow \hat{\sigma}_{(i)}^2/n_{(i)} + \hat{\sigma}_{(k)}^2/n_{(k)}$, and set $\nu_{(i)(k)}$ with Welch's approximation. If $(k) \notin S$ then set $\lambda_{ik} \leftarrow n_{(i)}/\hat{\sigma}_{(i)}^2$ and $\nu_{(i)(k)} \leftarrow n_{(i)} - 1$.
(c) Tentatively allocate a total of $\tau$ replications to systems $(i) \in S$ (set $\tau_{(j)} \leftarrow 0$ for $(j) \notin S$):
$$\tau_{(i)} \leftarrow \frac{\bigl(\tau + \sum_{j\in S} n_j\bigr)\,\bigl(\hat{\sigma}_{(i)}^2 \gamma_{(i)}\bigr)^{1/2}}{\sum_{j\in S} \bigl(\hat{\sigma}_j^2 \gamma_j\bigr)^{1/2}} - n_{(i)}, \quad \text{where } \gamma_{(i)} \leftarrow \begin{cases} \lambda_{ik}\, d^*_{ik}\, \phi_{\nu_{(i)(k)}}(d^*_{ik}) & \text{for } (i) \ne (k) \\ \sum_{(j)\in S\setminus\{(k)\}} \gamma_{(j)} & \text{for } (i) = (k). \end{cases}$$


(d) If any $\tau_i < 0$ then fix the nonnegativity constraint violation: remove $(i)$ from $S$ for each $(i)$ such that $\tau_{(i)} \le 0$, and go to Step 4b. Otherwise, round the $\tau_i$ so that $\sum_{i=1}^{k} \tau_i = \tau$ and go to Step 4e.
(e) Run $\tau_i$ additional replications for system $i$, for $i = 1, \ldots, k$. Update the sample statistics $n_i \leftarrow n_i + \tau_i$; $\bar{x}_i$; $\hat{\sigma}_i^2$, and the order statistics, so $\bar{x}_{(1)} \le \ldots \le \bar{x}_{(k)}$.
5. Select the system with the best estimated mean, $D = (k)$.

The formulas in Steps 4b-4c are derived from optimality conditions to improve a Bonferroni-like bound on the EVI for asymptotically large $\tau$ (Chick and Inoue 2001). Depending on the stopping rule used, the resulting procedures are named 0-1($\mathcal{S}$), 0-1($\mathrm{PGS}_{\mathrm{Slep},\delta^*}$), 0-1($\mathrm{EOC}_{\mathrm{Bonf}}$), with the stopping rule in parentheses. Procedure LL is a variant of 0-1 where sampling allocations seek to minimize $\mathrm{EOC}_{\mathrm{Bonf}}$.

Procedure LL. Same as Procedure 0-1, except set $\gamma_{(i)}$ in Step 4c to
$$\gamma_{(i)} \leftarrow \begin{cases} \lambda_{ik}^{1/2}\,\dfrac{\nu_{(i)(k)} + (d^*_{ik})^2}{\nu_{(i)(k)} - 1}\,\phi_{\nu_{(i)(k)}}(d^*_{ik}) & \text{for } (i) \ne (k) \\[1ex] \sum_{(j)\in S\setminus\{(k)\}} \gamma_{(j)} & \text{for } (i) = (k). \end{cases} \qquad (4)$$
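To make Steps 4a-4d concrete, the following Python sketch implements one stage of the LL allocation (Procedure 0-1 differs only in the weight from Step 4c, as noted in a comment). The function name and the simplified rounding at the end are illustrative; exact rounding so that the allocations sum to τ is omitted.

```python
import numpy as np
from scipy import stats

def ll_stage_allocation(xbar, s2, n, tau):
    """One stage of Procedure LL (Steps 4a-4d): tentatively allocate tau further
    replications over the systems in S, dropping any system whose tentative
    allocation is non-positive and re-solving, as in Step 4d."""
    xbar, s2, n = map(np.asarray, (xbar, s2, n))
    k = len(xbar)
    best = int(np.argmax(xbar))                      # system (k)
    S = set(range(k))
    while True:
        gamma = np.zeros(k)
        for i in S - {best}:
            if best in S:                            # Step 4b with (k) in S
                lam_inv = s2[i] / n[i] + s2[best] / n[best]
                va, vb = s2[i] / n[i], s2[best] / n[best]
                nu = (va + vb) ** 2 / (va**2 / (n[i] - 1) + vb**2 / (n[best] - 1))
            else:                                    # Step 4b with (k) not in S
                lam_inv = s2[i] / n[i]
                nu = n[i] - 1
            d_star = (xbar[best] - xbar[i]) / np.sqrt(lam_inv)
            # LL weight, Equation (4); for Procedure 0-1 use instead:
            # gamma[i] = d_star * stats.t.pdf(d_star, nu) / lam_inv
            gamma[i] = (nu + d_star**2) / (nu - 1) * stats.t.pdf(d_star, nu) / np.sqrt(lam_inv)
        if best in S:
            gamma[best] = gamma.sum()
        idx = np.array(sorted(S))
        w = np.sqrt(s2[idx] * gamma[idx])
        tau_alloc = np.zeros(k)
        tau_alloc[idx] = (tau + n[idx].sum()) * w / w.sum() - n[idx]
        negative = {i for i in S if tau_alloc[i] <= 0}
        if not negative:
            return np.maximum(np.rint(tau_alloc), 0)  # simplified rounding
        S -= negative                                 # Step 4d: drop and re-solve
        if len(S) == 1:                               # everything left goes to one system
            tau_alloc = np.zeros(k)
            tau_alloc[next(iter(S))] = tau
            return tau_alloc

print(ll_stage_allocation(xbar=[0.1, 0.4, 0.9], s2=[1.0, 1.5, 0.7], n=[6, 6, 6], tau=10))
```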

1.4. New Small-Sample Procedures

Procedures 0-1 and LL allocate additional replications using an EVI approximation based on an asymptotically large number of replications ($\tau$) per stage. A performance improvement in the procedures might be obtained by better approximating the EVI when there is a small number of replications per stage that are all run for one system.
The derivation of Procedure LL1 in Section 2 avoids the asymptotic approximation in the derivation of LL, as well as the Bonferroni and Welch approximations. The procedure's name is distinguished from its large-sample counterpart by the subscript 1 (one system gets all replications in a given stage). The procedure uses the following standardized statistics that describe the difference in sample means of two different alternatives, given that additional samples will be, but have not yet been, taken. A justification of those statistics follows in Section 2.1.
$$d^*_{\{jk\}} = \lambda_{\{jk\}}^{1/2}\, d_{(j)(k)}, \qquad \lambda_{\{jk\}}^{-1} = \frac{\tau_{(k)}\hat{\sigma}_{(k)}^2}{n_{(k)}(n_{(k)} + \tau_{(k)})} + \frac{\tau_{(j)}\hat{\sigma}_{(j)}^2}{n_{(j)}(n_{(j)} + \tau_{(j)})}. \qquad (5)$$

Procedure LL1. Same as Procedure 0-1, except replace Steps 4a-4d by:
• For each $i \in \{1, 2, \ldots, k\}$, see if allocating to $(i)$ is best:
— Tentatively set $\tau_{(i)} \leftarrow \tau$ and $\tau_\ell \leftarrow 0$ for all $\ell \ne (i)$; set $\lambda_{\{jk\}}^{-1}$, $d^*_{\{jk\}}$ with Equation (5) for all $j$.


— Compute the EVI,
$$\mathrm{EVI}_{\mathrm{LL},(i)} = \begin{cases} \lambda_{\{ik\}}^{-1/2}\,\Psi_{n_{(i)}-1}\bigl[d^*_{\{ik\}}\bigr] & \text{if } (i) \ne (k) \\[0.5ex] \lambda_{\{k-1,k\}}^{-1/2}\,\Psi_{n_{(k)}-1}\bigl[d^*_{\{k-1,k\}}\bigr] & \text{if } (i) = (k). \end{cases} \qquad (6)$$

• Set $\tau_{(i)} \leftarrow \tau$ for the system that maximizes $\mathrm{EVI}_{\mathrm{LL},(i)}$, and $\tau_\ell \leftarrow 0$ for the others.

Procedure 0-11. Same as Procedure LL1, except the EVI is approximated with respect to the expected 0-1 loss,
$$\mathrm{EVI}_{0\text{-}1,(i)} = \begin{cases} \Phi_{n_{(i)}-1}\bigl(-d^*_{\{ik\}}\bigr) & \text{if } (i) \ne (k) \\[0.5ex] \Phi_{n_{(k)}-1}\bigl(-d^*_{\{k-1,k\}}\bigr) & \text{if } (i) = (k). \end{cases} \qquad (7)$$
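A minimal Python sketch of the small-sample choice in Equations (5)-(7): for each candidate system, tentatively assign all τ replications to it, compute the EVI, and sample the maximizer. The helper names are illustrative, and a comment notes the one-line change for Procedure 0-11.

```python
import numpy as np
from scipy import stats

def eoc_psi(s, nu):
    """Standardized EOC function Psi_nu[s] from Equation (1)."""
    return (nu + s**2) / (nu - 1.0) * stats.t.pdf(s, nu) - s * stats.t.cdf(-s, nu)

def ll1_choose_system(xbar, s2, n, tau):
    """Procedure LL_1: give all tau replications of this stage to the single
    system with the largest EVI from Equation (6)."""
    xbar, s2, n = map(np.asarray, (xbar, s2, n))
    order = np.argsort(xbar)
    best, runner_up = order[-1], order[-2]             # systems (k) and (k-1)
    evi = np.empty(len(xbar))
    for i in range(len(xbar)):
        lam_inv = tau * s2[i] / (n[i] * (n[i] + tau))  # Equation (5) with tau_l = 0 for l != i
        d = xbar[best] - (xbar[i] if i != best else xbar[runner_up])
        d_star = d / np.sqrt(lam_inv)
        evi[i] = np.sqrt(lam_inv) * eoc_psi(d_star, n[i] - 1)
        # Procedure 0-1_1 instead uses evi[i] = stats.t.cdf(-d_star, n[i] - 1).
    return int(np.argmax(evi))

print(ll1_choose_system(xbar=[0.1, 0.4, 0.9], s2=[1.0, 1.5, 0.7], n=[6, 6, 6], tau=1))
```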

Before proceeding to their derivation, we make several observations about these procedures.
First, straightforward algebra that follows the logic of the derivation in Section 2 below shows that the analog of Equation (6) for the case of known variances $\sigma_i$ for $i = 1, 2, \ldots, k$ is
$$\mathrm{EVI}^{\mathrm{known}\ \sigma_i}_{\mathrm{LL},(i)} = \begin{cases} \lambda_{\{ik\}}^{-1/2}\,\Psi\bigl[d^*_{\{ik\}}\bigr] & \text{if } (i) \ne (k) \\[0.5ex] \lambda_{\{k-1,k\}}^{-1/2}\,\Psi\bigl[d^*_{\{k-1,k\}}\bigr] & \text{if } (i) = (k), \end{cases} \qquad (8)$$
if $\sigma_{(j)}^2$ replaces $\hat{\sigma}_{(j)}^2$ in the definition of $\lambda_{\{i,k\}}$ for $j \in \{i, k\}$, and where $\phi(s)$, $\Phi(s)$ and $\Psi[s] = \phi(s) - s(1 - \Phi(s))$ are the pdf, cdf and standard newsvendor loss functions, respectively, for the standard normal distribution. Similarly, if $\sigma_{(j)}^2$ replaces $\hat{\sigma}_{(j)}^2$ in the definition of $\lambda_{\{i,k\}}$ for $j \in \{i, k\}$ then
$$\mathrm{EVI}^{\mathrm{known}\ \sigma_i}_{0\text{-}1,(i)} = \begin{cases} \Phi\bigl(-d^*_{\{ik\}}\bigr) & \text{if } (i) \ne (k) \\[0.5ex] \Phi\bigl(-d^*_{\{k-1,k\}}\bigr) & \text{if } (i) = (k) \end{cases} \qquad (9)$$
is the expected value of information approximation for the 0-1 loss function when the variances are all known.
Second, one can readily verify that $\Psi[-s] = s + \Psi[s]$ and $\Psi_\nu[-s] = s + \Psi_\nu[s]$. The expected reward if replications are allocated only to system $(i)$ and to no other system, for the known and unknown variance cases, is therefore
$$\mathrm{E}[\max_j W_j \mid \text{known } \sigma_\ell] = \bar{x}_{(k)} + \mathrm{EVI}^{\mathrm{known}\ \sigma_i}_{\mathrm{LL},(i)}, \qquad \mathrm{E}[\max_j W_j \mid \text{unknown } \sigma_\ell] = \bar{x}_{(k)} + \mathrm{EVI}_{\mathrm{LL},(i)}. \qquad (10)$$
Intuitively, Equation (10) means that the expected reward of a one-step lookahead procedure equals the expected reward if one were to stop now plus the expected value of information in the one step of sampling, as measured by the expected opportunity cost. This generalizes the observation of the greedy one-step look-ahead algorithm of Gupta and Miescke (1996, Lemma 4), which assumes a known common variance for each system. Their $\sigma_{(i)}$ is our $\dfrac{\tau_{(i)}\hat{\sigma}_{(i)}^2}{n_{(i)}(n_{(i)} + \tau_{(i)})}$.


Third, we can assess the optimal one-step allocation of $\tau$ replications to one system when $k = 2$ and the variances are known. Straightforward algebra shows that the maximizer of $\mathrm{EVI}^{\mathrm{known}\ \sigma_i}_{\mathrm{LL},(i)}$ is the maximizer of
$$\frac{\sigma_{(i)}^2\,\tau}{n_{(i)}(n_{(i)} + \tau)}. \qquad (11)$$

If we formally let τ ↓ 0, this allocation becomes the allocation that adds replications to minimize the difference between (n(1) + τ(1) )/σ(1) and (n(2) + τ(2) )/σ(2) . To see this, take the derivative of Equation (11) with respect to τ , send τ to 0, and select the alternative that maximizes the result. The implication is that replications should be added so that the σi /ni for the two systems stay as equal as possible (subject to the integer sampling constraint). The same result, interestingly, is obtained for the 0-1 loss function for k = 2 systems with known variances as well. In both cases, with k = 2 and equal variances, the optimal allocation samples equally often from both systems, for both loss functions that are considered above. That third observation, for the case of k = 2 systems with known variances, matches the optimal allocation for this special case as formulated in Gupta and Miescke (1994, comments after Theorems 1 and 2) for the VIP approach, and in Chen et al. (2000, Remark 1, page 259, which handles the case of unknown variances by assuming a known variance approximation) for the OCBA approach. This greedy one-step procedure is optimal for a one-stage procedure but may be suboptimal when applied sequentially. Gupta and Miescke (1996) proposed using such a one-stage lookahead, but left the case of unknown variances for further research. This paper contributes by allowing the variances to be unknown, and then empirically assesses whether the suboptimality that is due to a sequential greedy one-step lookahead is better or worse in simulation type applications, as compared with existing large-sample asymptotic procedures that make other approximations.
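As a small numerical check of the third observation, the following sketch (assuming k = 2 and known variances; names are ours) confirms that the system maximizing Equation (11) is also the system maximizing the known-variance EVI of Equation (8).

```python
import numpy as np
from scipy.stats import norm

def evi_known_sigma(i, xbar, sigma2, n, tau):
    """Known-variance EVI of giving all tau samples to system i when k = 2
    (Equation (8)); Psi is the standard-normal newsvendor loss."""
    s = np.sqrt(sigma2[i] * tau / (n[i] * (n[i] + tau)))
    d = abs(xbar[0] - xbar[1]) / s
    return s * (norm.pdf(d) - d * (1.0 - norm.cdf(d)))

xbar, sigma2, n, tau = [0.0, 0.3], [1.0, 2.0], [12, 5], 1
evi = [evi_known_sigma(i, xbar, sigma2, n, tau) for i in (0, 1)]
ratio = [sigma2[i] * tau / (n[i] * (n[i] + tau)) for i in (0, 1)]
print(np.argmax(evi), np.argmax(ratio))   # both select the same system
```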

2. Removal of Approximations in Fully Sequential VIPs: LL1 and 0-11

Informally, the VIP approach allocates a finite number of samples to each system at each stage, in order to minimize the expected loss once those samples are observed, or to balance the expected reduction in loss from those samples with the expected cost of sampling. The minimization of the expected loss once those samples are observed is the focus here. That problem, which is formalized below, does not have an easily-computed closed form solution. To address this, Procedure 0-1 was derived using a Bonferroni approximation for the loss when k > 2, Welch’s approximation for the degrees of freedom, and two asymptotic approximations (Chick and Inoue 2001). Procedure LL used one less asymptotic approximation.


This section recalls the EVI approximations that were used to derive LL and 0-1, then derives the new small-sample procedures, LL1 and 0-11. We show how Procedure 0-11 avoids one of the two asymptotic approximations, along with the Bonferroni and Welch approximations, in the derivation of 0-1. Procedure LL1 removes each such approximation. The argument requires the imposition of an added constraint: all samples in a given stage are allocated to one system. This section also fills in details of the derivation for 0-1 that was outlined in Chick and Inoue (2001).

2.1. Setup for Optimizing the EVI

The VIP approach models uncertainty about the unknown means, given all of the information so far, $\mathcal{E}$, in order to predict the distribution of output to be run in a future stage of sampling. That distribution is used to assess the EVI in those samples, relative to a loss function, such as the zero-one loss and opportunity cost functions in Section 1.2.
Assuming a noninformative prior distribution for the unknown mean and variance, the posterior marginal distribution for the unknown mean $W_i$, given $n_i > 2$ samples, is $\mathrm{St}(\bar{x}_i, n_i/\hat{\sigma}_i^2, \nu_i)$, where $\nu_i = n_i - 1$ (e.g., de Groot 1970). We use that prior distribution here to predict the distribution of the samples to be observed. Branke et al. (2007) describe how to adapt this discussion if certain other prior distributions are used.
In order to describe the samples to be collected, let $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_k)$ denote the data to be collected in a single additional stage of sampling, with $\tau_i \ge 0$ replications to be run for systems $i = 1, 2, \ldots, k$. Let $\mathbf{y} = (y_1, y_2, \ldots, y_k)$ be the realization of $\mathbf{Y}$, and let $\bar{y}_i$ be the sample average of the output of system $i$ in that additional stage if $\tau_i > 0$. Define $\bar{y}_i = 0$ if $\tau_i = 0$. The realized overall sample average is
$$z_i = \frac{n_i \bar{x}_i + \tau_i \bar{y}_i}{n_i + \tau_i}.$$
Before $\mathbf{Y}$ is observed, the sample average that will arise after sampling, $Z_i = (n_i \bar{x}_i + \tau_i \bar{Y}_i)/(n_i + \tau_i)$, is random. One can show (de Groot 1970) that
$$Z_i \sim \mathrm{St}\bigl(\bar{x}_i,\; n_i(n_i + \tau_i)/(\tau_i \hat{\sigma}_i^2),\; n_i - 1\bigr). \qquad (12)$$

Let $D(\mathbf{Y})$ be the decision to pick the system with the highest overall sample mean, $D(\mathbf{Y}) = \arg\max_{i=1,\ldots,k} \{Z_i\}$.

That decision is optimal for an EOC objective, and is usually good for a PCS objective (Chick and Inoue 2001).
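As a quick illustration of Equation (12), the following Monte Carlo sketch (ours, for illustration only) draws the unknown mean and variance from the noninformative posterior, simulates a further stage of output, and compares the resulting overall sample mean with the stated St distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, tau, xbar, s2 = 6, 4, 1.2, 0.8       # current statistics and a tentative allocation

# Draw (sigma^2, W) from the noninformative posterior, simulate tau further
# outputs, and form the overall sample mean Z of Equation (12).
m = 200_000
sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=m)     # posterior of sigma^2
w = rng.normal(xbar, np.sqrt(sigma2 / n))                # posterior of the mean W
ybar = rng.normal(w, np.sqrt(sigma2 / tau))              # average of the tau new outputs
z = (n * xbar + tau * ybar) / (n + tau)

# Equation (12): Z ~ St(xbar, n(n+tau)/(tau*s2), n-1); compare a few quantiles.
scale = np.sqrt(tau * s2 / (n * (n + tau)))
for q in (0.1, 0.5, 0.9):
    print(np.quantile(z, q), xbar + scale * stats.t.ppf(q, n - 1))
```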


We are now in a position to conclude the formal specification of the optimal allocation problem for the VIP approach. The goal is to select $\tau_1, \tau_2, \ldots, \tau_k \ge 0$ in order to minimize the expected loss after those samples are observed. For the expected zero-one loss function, the expected loss is
$$\mathrm{E}[L_{0\text{-}1}(D(\mathbf{Y}), \mathbf{W})] = \mathrm{E}\bigl[\mathrm{E}[L_{0\text{-}1}(D(\mathbf{Y}), \mathbf{W}) \mid \mathbf{Y}]\bigr]. \qquad (13)$$

The opportunity cost can be handled similarly. Minimizing the expected loss after sampling is equivalent to maximizing the EVI with respect to the given loss function. Because Equation (13) takes a minimum of 0 when all the $\tau_i \to \infty$ together (there is no loss, given the perfect information that is provided from an infinite number of samples), a constraint is added. For the $\mathcal{S}$ stopping rule, the constraint is $\sum_{i=1}^{k} \tau_i = \tau$, for some finite $\tau$.

2.2. Analysis for Optimizing the EVI

In order to maximize the EVI, it will be convenient to work with a modified loss function that is more tractable. To analyze the expected zero-one loss function in Equation (13), we add $-L_{0\text{-}1}((k), \mathbf{W})$ to obtain an auxiliary loss function along the lines of Chick and Inoue (2001, Theorem 2),
$$L^*_{0\text{-}1}((i), \mathbf{w}) = L_{0\text{-}1}((i), \mathbf{w}) - L_{0\text{-}1}((k), \mathbf{w}) = \begin{cases} 0 & \text{if } (i) = (k) \text{ or neither } (i) \text{ nor } (k) \text{ is best} \\ -1 & \text{if } (i) \ne (k) \text{ and } (i) \text{ is best} \\ 1 & \text{if } (i) \ne (k) \text{ and } (k) \text{ is best} \end{cases} \qquad (14)$$
$$\mathrm{E}[L^*_{0\text{-}1}((k), \mathbf{W}) \mid \mathbf{y}] = 0, \qquad \mathrm{E}[L^*_{0\text{-}1}((i), \mathbf{W}) \mid \mathbf{y}] = \Pr\bigl(W_{(i)} \le W_{(k)} \mid \mathbf{y}\bigr) - \Pr\bigl(W_{(i)} > W_{(k)} \mid \mathbf{y}\bigr). \qquad (15)$$

Adding $-L_{0\text{-}1}((k), \mathbf{W})$ does not change the optimal decision (de Groot 1970, Sec. 11.8). The posterior distribution of $W_{(i)}$ given $\mathcal{E}$ and $y_{(i)}$ is $\mathrm{St}\bigl(z_{(i)},\; (n_{(i)} + \tau_{(i)})/\hat{\sigma}_{(i)}^2,\; n_{(i)} + \tau_{(i)} - 1\bigr)$. Set $V_{(i)} = W_{(i)} - W_{(k)}$. Conditional on all information, $\mathcal{E}$ and $\mathbf{y}$, we have
$$V_{(i)} \sim \mathrm{St}\bigl(z_{(i)} - z_{(k)},\; \tilde{\lambda}_{ik},\; \tilde{\nu}_{(i)}\bigr),$$
where the degrees of freedom $\tilde{\nu}_{(i)}$ is estimated with Welch's approximation if needed, and
$$\tilde{\lambda}_{jk}^{-1} = \frac{\hat{\sigma}_{(k)}^2}{n_{(k)} + \tau\,\mathbb{1}\{(k) = i\}} + \frac{\hat{\sigma}_{(j)}^2}{n_{(j)} + \tau\,\mathbb{1}\{(j) = i\}}. \qquad (16)$$
Note that $\Pr\bigl(V_{(i)} > 0 \mid z_{(i)} - z_{(k)}\bigr) \approx \Phi_{\tilde{\nu}_{(i)}}\bigl(\tilde{\lambda}_{ik}^{1/2}(z_{(i)} - z_{(k)})\bigr)$.

Set $U_{(i)} = Z_{(i)} - Z_{(k)}$, recall Equation (12), and use Welch's approximation if necessary, so that
$$U_{(i)} \sim \mathrm{St}\bigl(-d_{(i)(k)},\; \lambda_{\{ik\}},\; \nu_{(i)(k)}\bigr).$$


The standardized statistics for this difference give rise to Equation (5). When $\tau_{(i)} \to 0$, the distribution of $Z_{(i)}$ degenerates to a point mass at $\bar{x}_{(i)}$, meaning that the overall sample mean is known with probability 1 if there are no new replications. If replications are run for system $(k)$ but not for system $(i)$, then the correct degrees of freedom is $\nu_{(i)(k)} = n_{(k)} - 1$. If replications are run for system $(i)$ but not for system $(k)$, then $\nu_{(i)(k)} = n_{(i)} - 1$. The Behrens-Fisher problem for $U_{(i)}$, and therefore Welch's approximation, is averted if only one system gets additional replications.
For the moment, continue to let all $\tau_j \ge 0$, and let the order statistics reflect the order of the $\mathrm{E}[W_i]$ before sampling (with the noninformative prior adopted here, this implies $\bar{x}_{(1)} \le \bar{x}_{(2)} \le \ldots \le \bar{x}_{(k)}$). System $(i)$ is selected after sampling when $Z_{(i)} > Z_j$ for all $j \ne (i)$. Further, the conditional distribution of $\mathbf{W}$ given $\mathbf{Y}$ only depends upon $\mathbf{Y}$ through $\mathbf{Z}$. Equation (15) therefore implies
$$\mathrm{E}\bigl[L^*_{0\text{-}1}(D(\mathbf{Y}), \mathbf{W})\bigr] = \Pr\bigl(W_\ell \le W_{(k)} \text{ for all } \ell \ne (k),\; Z_{(i)} > Z_j \text{ for all } j \ne (i)\bigr) - \Pr\bigl(W_{(i)} > W_\ell \text{ for all } \ell \ne (i),\; Z_{(i)} > Z_j \text{ for all } j \ne (i)\bigr). \qquad (17)$$

Procedure 0-1 allocates samples to minimize the expected value of $\mathrm{PCS}_{\mathrm{Bonf}}$, given that another stage of sampling will, but has not yet, occurred. We therefore analyze Equation (17) for the special case of $k = 2$ in order to solve for that case and to understand the role of the Bonferroni bound. If $k = 2$, then Equation (17) simplifies:
$$\mathrm{E}\bigl[L^*_{0\text{-}1}(D(\mathbf{Y}), \mathbf{W})\bigr] = \Pr\bigl(Z_{(i)} > Z_{(k)}\bigr)\,\bigl(1 - 2\Pr\bigl(W_{(i)} > W_{(k)} \mid Z_{(i)} > Z_{(k)}\bigr)\bigr) \qquad (18)$$
$$\approx \mathrm{E}\Bigl[\mathbb{1}\bigl\{U_{(i)} > 0\bigr\}\bigl(1 - 2\Phi_{\tilde{\nu}_{(i)}}(\tilde{\lambda}_{ik}^{1/2} U_{(i)})\bigr)\Bigr]. \qquad (19)$$
Add the $L_{0\text{-}1}((k), \mathbf{W})$ subtracted earlier and take the expectation to obtain the expected 0-1 loss. When $k = 2$,
$$\mathrm{E}[L_{0\text{-}1}(D(\mathbf{Y}), \mathbf{W})] = \mathrm{E}[L_{0\text{-}1}((k), \mathbf{W})] + \mathrm{E}[L^*_{0\text{-}1}(D(\mathbf{Y}), \mathbf{W})] \approx \Phi_{\nu_{(i)(k)}}\bigl(-\lambda_{ik}^{1/2} d_{(i)(k)}\bigr) + \mathrm{E}\Bigl[\mathbb{1}\{U > 0\}\bigl(1 - 2\Phi_{\tilde{\nu}}(\tilde{\lambda}_{ik}^{1/2} U)\bigr)\Bigr]. \qquad (20)$$

If the number of additional replications grows without bound ($\tau_{(i)}, \tau_{(k)} \to \infty$) then $\lambda_{\{ik\}} \to \lambda_{ik}$ and $\tilde{\lambda}_{ik} \to \infty$, so that $\mathrm{E}[\mathbb{1}\{U > 0\}] \to \Phi_{\nu_{(i)(k)}}(-\lambda_{ik}^{1/2} d_{(i)(k)})$, and $-2\,\mathrm{E}\bigl[\mathbb{1}\{U > 0\}\,\Phi_{\tilde{\nu}}(\tilde{\lambda}_{ik}^{1/2} U)\bigr]$ converges to $-2\,\Phi_{\nu_{(i)(k)}}(-\lambda_{ik}^{1/2} d_{(i)(k)})$. That makes the expected loss in Equation (20) converge to zero in the limit (no loss with perfect information). With no additional replications ($\tau_{(i)}, \tau_{(k)} \to 0$), the expectation in Equation (20) tends to zero ($\Pr(U > 0) \to 0$), and the first term is the expected loss with no information. This matches intuition: the improvement in the expected loss, or EVI, is the negative of Equation (18) for $k = 2$.


The opportunity cost allocation of LL is handled similarly. Allowing all $\tau_j \ge 0$ and $k \ge 2$ again,
$$L^*_{\mathrm{LL}}((i), \mathbf{w}) = L_{\mathrm{LL}}((i), \mathbf{w}) - L_{\mathrm{LL}}((k), \mathbf{w}) = \begin{cases} w_{(k)} - w_{(i)} & \text{if } (i) \ne (k) \text{ and } (i) \text{ is best} \\ 0 & \text{if } (i) = (k) \text{ and } (k) \text{ is best.} \end{cases} \qquad (21)$$

System $(i)$ is selected only if we observe $z_{(i)} \ge z_j$ for all $j \ne (i)$. Given that system $(i)$ is selected, the expected opportunity cost is $\mathrm{E}[W_{(k)} - W_{(i)} \mid z_{(i)} \ge z_j \text{ for all } j \ne i] = z_{(k)} - z_{(i)}$, since the posterior mean of $W_{(i)}$ after all data is observed is by definition $z_{(i)}$. The optimal decision (for minimizing expected loss) is therefore $D = \arg\max_\ell Z_\ell$, i.e., to select the system with the best posterior mean.
For either loss function, the distribution of each $Z_{(i)}$ may be nondegenerate when all $\tau_{(i)} \ge 0$. That makes the distribution of which system will be chosen as best, $D = \arg\max_\ell Z_\ell$, and therefore the EVI, challenging to assess in general.

2.3. Specializing the Analysis to the New Small-Sample Procedures

We now specialize this development to the new small-sample procedures, which require that all replications in a given stage be allocated to one system. When $\tau_{(j)} > 0$ for one system but all of the other $\tau_{(i)} = 0$, it is straightforward to assess the distribution of $D = \arg\max_\ell Z_\ell$. In particular, all but one of the $Z_{(i)}$ are distributed as a point mass at $\bar{x}_{(i)}$ (allow $\tau_{(i)} \to 0$ in Equation (12) for all systems with $\tau_{(i)} = 0$). The one system with $\tau_{(j)} > 0$ has a $Z_{(j)} \sim \mathrm{St}\bigl(\bar{x}_{(j)},\; n_{(j)}(n_{(j)} + \tau_{(j)})/(\tau_{(j)}\hat{\sigma}_{(j)}^2),\; n_{(j)} - 1\bigr)$ distribution,
$$\mathrm{E}[L^*_{\mathrm{LL}}((j), \mathbf{W}) \mid \mathbf{Y}] = \begin{cases} \bar{x}_{(k)} - Z_{(j)} & \text{if } (j) \ne (k),\; Z_{(j)} > \bar{x}_{(k)} \text{ (system } (j) \text{ is new best)} \\ 0 & \text{if } (j) \ne (k),\; Z_{(j)} \le \bar{x}_{(k)} \text{ (system } (k) \text{ remains best)} \\ 0 & \text{if } (j) = (k),\; Z_{(j)} \ge \bar{x}_{(k-1)} \text{ (old best remains best)} \\ Z_{(k)} - \bar{x}_{(k-1)} & \text{if } (j) = (k),\; Z_{(j)} < \bar{x}_{(k-1)} \text{ (old best no longer best),} \end{cases} \qquad (22)$$
and the probability of these outcomes is determined with the cdf of a $t$ distribution with $n_{(j)} - 1$ degrees of freedom (no Welch approximation is needed). The EVI of $\tau_{(j)} > 0$ replications for system $(j)$ and none for the others is the negative of the expectation of the loss in Equation (22), averaged over possible outcomes $\mathbf{Y}$:
$$\mathrm{EVI}_{\mathrm{LL},(j)} = -\mathrm{E}[L^*_{\mathrm{LL}}(D(\mathbf{Y}), \mathbf{W})] = \begin{cases} \lambda_{\{j\cdot\}}^{-1/2}\,\Psi_{n_{(j)}-1}\bigl[\lambda_{\{j\cdot\}}^{1/2}(\bar{x}_{(k)} - \bar{x}_{(j)})\bigr] & \text{if } (j) \ne (k) \\[0.5ex] \lambda_{\{j\cdot\}}^{-1/2}\,\Psi_{n_{(j)}-1}\bigl[\lambda_{\{j\cdot\}}^{1/2}(\bar{x}_{(k)} - \bar{x}_{(k-1)})\bigr] & \text{if } (j) = (k), \end{cases} \qquad (23)$$

where $\lambda_{\{j\cdot\}} = n_{(j)}(n_{(j)} + \tau_{(j)})/(\tau_{(j)}\hat{\sigma}_{(j)}^2)$ is the predictive precision of $Z_{(j)}$ (if $\tau_{(j)} > 0$ and $\tau_\ell = 0$ for all $\ell \ne j$). We observe that $\bar{x}_{(k)} + \mathrm{EVI}_{\mathrm{LL},(j)}$ is the expected reward when only running $\tau_{(j)}$ replications for system $(j)$, and then selecting the system with the largest sample mean.


Equation (23) justifies the sampling allocation of Procedure LL1, and the EVI is exact for a finite (nonasymptotic) number of additional samples. The Bonferroni bound is eliminated because only one of two systems can be selected as best if only one system is simulated: either the system simulated becomes the best, or the best of the other $k - 1$ systems becomes the new best if the output is sufficiently low. The other $k - 2$ systems cannot become the new 'best' after a single stage of sampling.
To obtain the sampling allocation in Equation (7) of Procedure 0-11, we retain the idea that only one of two systems can be selected, but we make one asymptotic approximation to avoid the computational overhead of determining the EVI exactly with Equation (18). Suppose for the moment that $\tau_{(i)} > 0$ for some $(i) \ne (k)$ and that $\tau_{(j)} = 0$ for all $j \ne i$. If, in addition, $k = 2$ and $U_{(i)} > 0$, then the term $\Phi_{\tilde{\nu}_{(i)}}(\tilde{\lambda}_{ik}^{1/2} U_{(i)})$ in Equation (19) converges to 1 as the number of replications of each system increases to infinity (one asymptotic approximation). The analogous asymptotic limit ($\lim \tau_i \to \infty$ for all $i$ together) for Equation (17) implies:
$$\Pr\bigl(W_\ell \le W_{(k)} \text{ for all } \ell \ne (k),\; Z_{(i)} > Z_j \text{ for all } j \ne (i)\bigr) \to 0$$
$$\Pr\bigl(W_{(i)} > W_\ell \text{ for all } \ell \ne (i),\; Z_{(i)} > Z_j \text{ for all } j \ne (i)\bigr) \to \Pr\bigl(Z_{(i)} > Z_j \text{ for all } j \ne (i)\bigr).$$
Intuitively, this means that an infinite amount of data results in no uncertainty about $\mathbf{W}$. Substitute these limits to obtain the claimed $\mathrm{EVI}_{0\text{-}1,(i)}$ approximation, $\Phi_{\nu_{(i)(k)}}\bigl(-\lambda_{\{ik\}}^{1/2} d_{(i)(k)}\bigr)$, in Procedure 0-11.

3. Empirical Results

A detailed comparison of different selection procedures, including KN ++ (Goldsman et al. 2002), OCBA variations, and the original VIP procedures, can be found in Branke et al. (2007). One

main result of that paper was that OCBALL and LL behave rather similarly, and are among the best procedures over a large variety of different test cases. We therefore do not report explicitly on OCBALL here, given the strong similarity to the LL results that are displayed below.

In this follow-up study, which extends the initial results in Chick et al. (2007), we compare the small-sample and asymptotic VIP procedures. The test setup is similar to the one used in Branke et al. (2007), which assessed different problem configurations, like monotone decreasing means, slippage configurations, and random problem instances. Many different parameter settings have been tested. We report here on only a typical subset of these runs due to space limitations, and focus on the main qualitative conclusions. We report primarily on the efficiency of the procedures, which is defined in terms of the average number of samples, E[N ], needed to reach a given frequentist probability of correct selection (or


expected opportunity cost, depending on the objective). All results reported are averages of 100,000 applications of the selection procedure. To sharpen the contrasts between different procedures, common random numbers (CRN) were used to generate common random problem instances, and to synchronize the samples observed across procedures. Samples were independent within each procedure. The initial number of evaluations for each system has been set to n0 = 6, as this yielded good results in Branke et al. (2007). Additional details regarding the implementation are included in Appendix A.
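The evaluation loop behind these efficiency estimates can be summarized as follows; `run_procedure` and `sample_instance` are assumed hooks standing in for a selection procedure and a problem-instance generator, and are not part of the paper's implementation.

```python
import numpy as np

def estimate_pcs_iz(run_procedure, sample_instance, reps=1000, rng=None):
    """Frequentist evaluation loop: draw (or fix) a problem instance, run the
    selection procedure on simulated output, and record whether the truly best
    system was selected and how many samples were used."""
    rng = np.random.default_rng() if rng is None else rng
    correct, n_used = 0, 0
    for _ in range(reps):
        w, sigma2 = sample_instance(rng)          # true means/variances of the instance
        selected, total_samples = run_procedure(w, sigma2, rng)
        correct += int(selected == int(np.argmax(w)))
        n_used += total_samples
    return correct / reps, n_used / reps          # estimates of PCS_IZ and E[N]
```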

3.1. Monotone decreasing means

In a monotone decreasing means (MDM) configuration the means are equally spaced with distance $\delta$. We assume independent outputs for systems $i = 1, \ldots, k$ that are normally distributed,
$$X_{ij} \sim \mathrm{Normal}\bigl(-(i-1)\delta,\; 2\rho^{2-i}/(1+\rho)\bigr).$$
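For reference, outputs from the MDM configuration can be generated as in the following sketch (illustrative Python; the experiments themselves used the C++ implementation described in Appendix A).

```python
import numpy as np

def sample_mdm_outputs(k, delta, rho, n, rng=None):
    """Draw n outputs per system from the MDM configuration: system i has mean
    -(i-1)*delta and variance 2*rho**(2-i)/(1+rho), for i = 1, ..., k."""
    rng = np.random.default_rng() if rng is None else rng
    i = np.arange(1, k + 1)
    means = -(i - 1) * delta
    variances = 2.0 * rho ** (2 - i) / (1.0 + rho)
    return rng.normal(means, np.sqrt(variances), size=(n, k))

x = sample_mdm_outputs(k=10, delta=0.5, rho=1.0, n=6)   # a first stage with n0 = 6
print(x.mean(axis=0))
```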

The parameter ρ controls the variances. Values of ρ < 1 mean that better systems have a smaller variance. Figure 1 compares the asymptotic VIP procedures LL and 0-1 to the small-sample counterparts LL1 and 0-11 , together with a very efficient adaptive stopping rule, EOCBonf . The small-sample

procedures slightly outperform the original versions if only a few samples are taken (which corresponds to relatively high levels of empirical PICSIZ ). This is the scenario that the small-sample procedures have been designed for, and the removal of the Bonferroni and Welch approximations seems to actually pay off. The situation changes completely with more stringent levels of PICSIZ , as shown in Figure 2. In that case, the asymptotic procedures outperform the small-sample procedures by a wide margin, and the difference grows with increasing E[N ]. Still, the small-sample procedures are much better than the na¨ıve Equal allocation. Figure 3 shows the output for the same configuration, but with a fixed sampling budget (S stopping rule) as opposed to an adaptive stopping rule. All allocations become less efficient with the fixed sampling budget (compare Figures 2 and 3). That observation is consistent with the fact that several other allocations also perform less effectively with a fixed sampling budget (S ) than with the (EOCBonf ) stopping rule (Branke et al. 2007). The asymptotic allocations suffer more than the small-sample variants. In particular, 0-1(S ) is the worst procedure for this experiment. The two small-sample procedures perform similarly and somewhere between LL(S ) and 0-1(S ), even for large E[N ] (large E[N ] results in low levels of PICSIZ ).


Figure 1. PCSIZ efficiency for the original and new allocations with the EOCBonf stopping rule, and a small mean number of additional samples (MDM, k = 10, δ = 0.5, ρ = 1).
Figure 2. PCSIZ efficiency for the original and new allocations with the EOCBonf stopping rule, and a large mean number of additional samples (MDM, k = 10, δ = 0.5, ρ = 1).
Figure 3. PCSIZ efficiency for the original and new small-sample VIPs with the S stopping rule (MDM, k = 10, δ = 0.5, ρ = 1).
Figure 4. Comparison of different stopping rules for LL and LL1 (MDM, k = 10, δ = 0.5, ρ = 1).

The influence of the stopping rule is even more apparent in Figure 4, which compares three different stopping rules for LL and LL1 . For the LL stopping rule, EOCBonf is clearly the most efficient, followed by PCSSlep and S . The influence is rather large: For example, to reach a PCSIZ of 0.005, LL requires approximately 115, 125, or 172 samples, depending on stopping rule – the S stopping rule is much worse than the other two. For LL1 , the ranking is the same for higher

acceptable PICSIZ , but the influence is smaller. For the above example of a PCSIZ of 0.005, the required numbers of samples are approximately 162, 190, and 195 for the three stopping rules. For very low values of PICSIZ , the stopping rule with a fixed budget becomes even better than the PCSSlep stopping rule (rightmost two lines); the corresponding efficiency line shows less curvature. Figure 5 shows the influence of the number of systems, k, on the relative performance of


LL(EOCBonf ) and LL1 (EOCBonf ). For larger k (e.g., k = 20 in the figure), the gap between LL(EOCBonf ) and LL1 (EOCBonf ) is larger when a low level of PICSIZ is desired (e.g., LL1 (EOCBonf )

requires approximately 67% more samples than LL(EOCBonf) to reach a PICSIZ of 0.003). The relative performance of the small-sample procedure improves with decreasing k, until it is basically equivalent for k = 2. Figure 5 focuses on the EOCBonf stopping rule. For the S stopping rule (not shown), all procedures become less efficient, as shown previously. For k = 2, LL(S) and LL1(S) perform very similarly. For k = 20, LL1(S) requires 32% more samples than LL(S) to reach a PICSIZ level of 0.003, compared to 67% in combination with EOCBonf. The LL1 allocation suffers more, relative to LL, with increasing k when the EOCBonf stopping rule is used, as compared to the S stopping rule. We varied ρ to see if the qualitative observations above would change if the sample variance of the systems were different. We did not observe any perceptible trends that deviated from the qualitative observations above.

3.2. Slippage configuration

In a slippage configuration (SC) the means of all systems except the best are tied for second best. We use the parameters $\delta$ and $\rho$ to describe the configurations of the independent outputs with $\mathrm{Normal}(w_i, \sigma_i^2)$ distribution,
$$X_{1j} \sim \mathrm{Normal}\bigl(0, \sigma_1^2\bigr), \qquad X_{ij} \sim \mathrm{Normal}\bigl(-\delta, \sigma_1^2/\rho\bigr) \quad \text{for } i = 2, \ldots, k.$$

All systems have the same variance if ρ = 1. The best system has the largest variance if ρ > 1. We set σ1² = 2ρ/(1 + ρ) so that Var[X1j − Xij] is constant for all ρ > 0. The above MDM example with k = 2 is actually also a SC. More SC results with k = 5 are shown in Figure 6. Again, in combination with the EOCBonf stopping rule, LL is more efficient than LL1 (when k > 2). However, in contrast to the MDM setting, LL1(S) outperforms LL(S) for k > 2 in each of our tests (up to k = 20). As for the MDM experiments, changing ρ did not have any discernible effect on the qualitative observations above.

3.3. Random problem instances

Random problem instances (RPI) are more realistic in the sense that problems faced in practice typically are not the SC or MDM configuration. The RPI experiment here samples configurations χ from the normal-inverse gamma family. If S ∼ InvGamma (α, β), then E[S] = β/(α − 1) and S −1 ∼


Figure 5. Influence of the number of systems, k, on LL(EOCBonf) and LL1(EOCBonf) (MDM, δ = 0.5, ρ = 1).
Figure 6. Influence of the stopping rule on LL(EOCBonf) and LL1(EOCBonf) (SC, k = 5, δ = 0.5, ρ = 1).

$\mathrm{Gamma}(\alpha, \beta)$, with $\mathrm{E}[S^{-1}] = \alpha\beta^{-1}$ and $\mathrm{Var}[S^{-1}] = \alpha\beta^{-2}$. A random $\chi$ is generated by sampling the $\sigma_i^2$ independently, then sampling the $W_i$ conditionally independent, given $\sigma_i^2$,
$$p(\sigma_i^2) \sim \mathrm{InvGamma}(\alpha, \beta), \qquad p(W_i \mid \sigma_i^2) \sim \mathrm{Normal}\bigl(\mu_0, \sigma_i^2/\eta\bigr). \qquad (24)$$

Increasing η makes the means more similar. We set β = α − 1 > 0 to standardize the mean of the variances to be 1. Increasing α reduces the variability in the variances. The noninformative prior distributions that can be used in the derivation of VIP procedures correspond to η → 0, so there is a mismatch in the sampling distribution of χ and the prior distributions assumed by the VIP. A typical result for the RPI problem instances is shown in Figure 7. It is consistent with the previous observations – the asymptotic variants are better in combination with the EOCBonf stopping rule, while the small-sample procedures are at least competitive in combination with the S stopping rule or for a small number of additional samples. For flexible stopping rules, like EOCBonf and PCSSlep , that stop sampling after a certain estimated target performance has been reached, it is interesting to compare the true frequentist performance to the desired target performance. Because the flexible stopping criterion depends on the samples drawn, the obtained (frequentist) performance is also influenced by the sampling rule. A typical plot is shown in Figure 8. The small sample procedure leads to a slightly more conservative behavior than the asymptotic procedure, meaning that it over-delivers slightly more. Note that none of the expected value of information procedures yet have a provable frequentist performance guarantee. On the other hand, the EOCBonf and PCSSlep stopping rules provide Bayesian guarantees for the posterior expectation of the opportunity costs and probability of nongood selections, to the extent that the Welch approximates those probabilities well.


Figure 7. Efficiency of different allocation procedures for RPI (RPI, k = 5, η = 0.5, b = 100).
Figure 8. Frequentist EOCIZ depending on the target EOC β* (RPI, k = 5, η = 0.5, b = 100).

3.4. Mixed procedures

The preceding results show that the small sample procedures perform better when only a small number of samples is allocated, while the asymptotic procedures perform better for a larger number of samples that are sequentially allocated one after another. These results are consistent with intuition, given the derivation of the procedures. The asymptotic procedures allocate in order to solve a large-sample asymptotic approximation to the expected value of information, while the small sample procedures greedily maximize the outcome assuming only one additional sample. A natural question is whether it is possible to combine the procedures. In particular, if a large number of samples is allocated, is it better to first allocate samples with the asymptotic procedure, then switch to the small sample procedure to allocate the last few remaining samples. Figure 9 shows the effect of switching from one allocation to another. The procedures in that picture allocate one replication at a time. The notation LL denotes the usual LL(S ) procedure. The notation LL1 (S < 200)LL denotes the use of LL1 to allocate until 200 samples have been observed, and LL thereafter. In our example, after 200 samples (Figure 9, left panel), the results obtained with LL are significantly better than the results obtained with LL1 . Switching from LL to LL1 after 200 samples (lower pair of curves) is beneficial for at least several additional samples (approximately 20 in this case). Switching from LL1 to LL after 200 samples (upper pair of curves) temporarily decreases performance. Continuing with LL for a longer term (not shown here) returns LL to a state that dominates LL1 . The plot in the left hand panel of Figure 9 is somewhat more

jagged than the other plots, due to the scaling of the ordinate and the abscissa. Similar observations hold when switching at the beginning of the run, at a time when LL1 is still better than LL, as demonstrated in the right panel of Figure 9.
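A sketch of the switching rule just described is given below; `sample_system`, `ll_allocate`, and `ll1_allocate` are assumed hooks (for example, the allocation sketches in Sections 1.3 and 1.4) and are not functions from the paper's implementation.

```python
import numpy as np

def run_mixed_procedure(sample_system, xbar, s2, n, budget, switch_point,
                        ll_allocate, ll1_allocate):
    """One-replication-at-a-time driver for the mixed rule: allocate with the
    asymptotic LL rule until `switch_point` total samples have been observed,
    then with LL_1 until the budget is spent.  `sample_system(i)` runs one more
    replication of system i; the two `*_allocate` hooks return the index of the
    system to sample next from the current statistics."""
    xbar = np.array(xbar, dtype=float)
    s2 = np.array(s2, dtype=float)
    n = np.array(n, dtype=int)
    while n.sum() < budget:
        rule = ll_allocate if n.sum() < switch_point else ll1_allocate
        i = rule(xbar, s2, n)
        y = sample_system(i)                          # one new replication of system i
        delta = y - xbar[i]
        n[i] += 1
        xbar[i] += delta / n[i]                       # running sample mean
        s2[i] += (delta * (y - xbar[i]) - s2[i]) / (n[i] - 1)   # running sample variance
    return int(np.argmax(xbar))                       # select the best estimated mean
```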

Figure 9. Effect of switching from LL to LL1 or vice versa, after 200 samples (left panel) and 50 samples (right panel) (RPI, k = 5, η = 0.5, b = 100).

Overall, this suggests that it is indeed a good idea to start with LL, and switch to the small sample procedure LL1 for the last few samples to be allocated. It does not appear to be effective to start with LL1 then later switch to LL, hoping that the initially better performance could be sustained by LL in the long run.

4. Discussion and Conclusion

The choice of the selection procedure and its parameters can have a tremendous effect on the effort spent to select the best system, and the probabilistic evidence for making a correct selection. Branke et al. (2007) showed that the VIP procedures (like LL) and the OCBA procedures are highly effective, especially with flexible stopping rules. The new small-sample VIP procedures avoid the Bonferroni and Welch approximations in their derivation, along with an asymptotic approximation. That provides a potential net benefit relative to previous VIP derivations, which assumed a large-sample asymptotic and other approximations. The small-sample procedures suffer, however, from a myopic allocation that presumes that a selection will be made after one allocation. Repeatedly applying that allocation greedily, in order to obtain a sequential procedure, results in a sub-optimality. Naturally, this sub-optimality is more significant if the number of samples allocated before selection is large (i.e., if low numbers of PICSIZ are sought). Overall, the empirical results that are presented above show that the small-sample procedures are particularly competitive if either the number of additional samples allocated is small, a fixed budget is used as stopping rule (as opposed to a flexible stopping rule such as EOCBonf ), or the number of systems is small. The PCS-based procedures 0-1 and 0-11 were almost always inferior to the EOC-based allocations. Here, the small-sample variant 0-11 yielded a greater improvement


relative to 0-1, as compared with the relative difference in performance between LL and LL1. One explanation for the efficient performance of the asymptotic procedures may be that the approximations for the original VIP allocations are not significant, given that the allocations are rounded to integers in order to run an integer number of replications at each stage. The asymptotic allocations are designed to sample for long-run behavior, and therefore perform well when more stringent levels of evidence are required. That said, even if the asymptotic LL allocation is used for the bulk of the sampling, there is a small efficiency improvement that is associated with switching to LL1 to allocate the last few samples.

Appendix A: Implementation Issues

The implementation that generated the analysis and graphs in this paper was written in C++. It used the GNU Scientific Library (GSL) for calculating cdfs, the Mersenne twister random number generator (Matsumoto and Nishimura 1998, with 2002 revised seeding), and FILIB++ (Lerch et al. 2001) for interval arithmetic. Calculations were run on a mixed cluster of up to 120 nodes. The nodes were running Linux 2.4 and Windows XP with Intel P4 and AMD Athlon processors ranging from 2 to 3 GHz. Jobs were distributed with the JOSCHKA-System (Bonn et al. 2005).
In order to calculate PCSSlep with high numerical accuracy, we did not calculate it directly with the product formula in Section 1.2. We calculated PCSSlep indirectly via $1 - \mathrm{PCS}_{\mathrm{Slep}} = -\bigl(\exp\bigl(\sum_j \log(1 - \Phi_{\nu_{(j)(k)}}(-d^*_{jk}))\bigr) - 1\bigr)$. This transformation is useful because $\log(1 - x)$ = log1p(-x) and $\exp(x) - 1$ = expm1(x) from the C runtime library have increased accuracy for $x$ near 0, and $\Phi_\nu(t)$ is more stable for $t \to -\infty$.
Numerical stability problems may arise in implementations of the LL1 and 0-11 allocations, even with double-precision floating point arithmetic, as the total number of replications gets quite large (i.e., for low values of $\alpha^*$ or EOC bounds). To better distinguish (i) which system should receive samples in a given stage and (ii) when to stop sampling, we improved numerical stability in several ways for LL1 and 0-11. We evaluate the system that maximizes log EVI. If the numerical stability does not suffice to calculate $\log \Phi_\nu(t)$ and $\log \Psi_\nu[t]$ (underflow error), we considered two general approaches. In the first approach, we use interval arithmetic based upon bounds for $\log \Phi_\nu(t)$ and $\log \Psi_\nu[t]$ that are based on the following property of the cdf of a $t$-distribution (Evans et al. 1993),
$$\Phi_\nu(t) = \begin{cases} \tfrac{1}{2}\,\beta^{\mathrm{inc}}_{\mathrm{reg}}\bigl(\tfrac{\nu}{2}, \tfrac{1}{2}, \tfrac{\nu}{\nu + t^2}\bigr) & \text{if } t \le 0 \\[0.5ex] 1 - \tfrac{1}{2}\,\beta^{\mathrm{inc}}_{\mathrm{reg}}\bigl(\tfrac{\nu}{2}, \tfrac{1}{2}, \tfrac{\nu}{\nu + t^2}\bigr) & \text{if } t > 0, \end{cases} \qquad (25)$$
where $\beta^{\mathrm{inc}}_{\mathrm{reg}}(a, b, x) = \beta(a, b)^{-1}\int_0^x u^{a-1}(1 - u)^{b-1}\,du$ is the regularized incomplete beta function, and $\beta(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a + b)$ denotes the beta function. A lower bound for $\log \Phi_\nu(-t)$ for $t > 0$ can be derived as follows. If $f(u) = u^{a-1}(1 - u)^{b-1}$, then $f(0) = 0$, $f'(u) \ge 0$ and $f''(u) > 0$ for $a =$

$\nu/2 > 1$, $b = 1/2$, and all $u \in [0, 1]$. So the area below $f(u)$ over $[0, x]$ is always larger than the area below the tangent at $(x, f(x))$, which yields
$$\log \Phi_\nu(-t) \;\ge\; \frac{\nu}{2}\log\frac{\nu}{\nu + t^2} + \frac{1}{2}\log\Bigl(1 - \frac{\nu}{\nu + t^2}\Bigr) - \log\Bigl(\bigl(\tfrac{\nu}{2} - 1\bigr)\bigl(1 - \tfrac{\nu}{\nu + t^2}\bigr) + \tfrac{1}{2}\,\tfrac{\nu}{\nu + t^2}\Bigr) - \log 2. \qquad (26)$$

22

For the upper bound, recall Equation (1). As Ψν (t) > 0 for all t > 0 we obtain Φν (−t)
2, or with Equation (19) for k = 2. We implemented both naive Monte Carlo and importance sampling methods to estimate the EVI in Equation (19) but found the estimators to be too inefficient and noisy to be useful. An alternative with k = 2 is one-dimensional quadrature for h i ˜ 1ik/2 U ) − 1) EVI0−1,(i) = E 11 {U > 0}(2Φν˜ (λ Z ∞   /2 ˜ 1ik/2 λ−1/2 y) − 1}dy. = φν(i)(k) y + λ1{ik} d(i)(k) {2Φν˜ (λ {ik}

(30)

y =0

Quadrature behaved ‘better’ when transformed to the unit interval,

R∞ 0

f (y)dy =

R1 0

f ((1 − t)/t)/t2 dt, but

the CPU time to compute the allocation was an order of magnitude slower than for 0-11 , and accuracy was poor if the differences in means was large.



In summary, we tested a number of techniques in order to improve the numerical stability of each procedure. Some, but not all, of those numerical techniques required additional computational overhead in order to compute the allocation. Monte Carlo estimates and complicated quadrature, the techniques that required the most additional overhead, did not necessarily help more. Interval arithmetic worked fine, but again the minimal improvements did not seem to justify the computational overhead. Definitely useful were the transformations that allowed us to use functions from the C runtime library with higher precision in the critical value space. The LL and 0-1 sampling allocations did not experience numerical stability problems with collisions. Those allocations did not require the above numerical stability techniques. Any procedure that uses the PCSSlep stopping rule can benefit from the trick to improve the accuracy of estimating PCSSlep with high numerical accuracy.

References Bechhofer, R. E., T. J. Santner, D. M. Goldsman. 1995. Design and Analysis for Statistical Selection, Screening, and Multiple Comparisons. John Wiley & Sons, Inc., New York. Bernardo, J. M., A. F. M. Smith. 1994. Bayesian Theory. Wiley, Chichester, UK. Boesel, J., B. L. Nelson, S.-H. Kim. 2003. Using ranking and selection to ‘clean up’ after simulation optimization. Operations Research 51 814–825. Bonn, M., F. Toussaint, H. Schmeck. 2005. JOSCHKA: Job-Scheduling in heterogenen Systemen mit Hilfe von Webservices. Erik Maehle, ed., PARS Workshop L¨ ubeck . Gesellschaft f¨ ur Informatik, 99–106. Branke, J., S.E. Chick, C. Schmidt. 2007. Selecting a selection procedure. Management Science 53 to appear. Branke, J., C. Schmidt. 2004. Sequential sampling in noisy environments. X. Yao et al., ed., Parallel Problem Solving from Nature, LNCS , vol. 3242. Springer, 202–211. Chen, C.-H., J. Lin, E. Y¨ ucesan, S. E. Chick. 2000. Simulation budget allocation for further enhancing the efficiency of ordinal optimization. Discrete Event Dynamic Systems: Theory and Applications 10 251–270. Chick, S. E., J. Branke, C. Schmidt. 2007. New greedy myopic and existing asymptotic sequential selection procedures: Preliminary empirical results. S.G. Henderson, B. Biller, M.-H. Hsieh, J. Shortle, J.D. Tew, R.R. Barton, eds., Winter Simulation Conference. IEEE, Inc., Piscataway, NJ, to appear. Chick, Stephen E., Koichiro Inoue. 2001. New two-stage and sequential procedures for selecting the best simulated system. Operations Research 49 732–743. de Groot, Morris H. 1970. Optimal Statistical Decisions. McGraw-Hill, New York. Evans, M., N. Hastings, B. Peacock. 1993. Statistical Distributions. 2nd ed. Wiley, New York. Goldsman, D., S.-H. Kim, W. S. Marshall, B. L. Nelson. 2002. Ranking and selection for steady-state simulation: Procedures and perspectives. INFORMS Journal on Computing 14 2–19.



Gupta, S. S., K. J. Miescke. 1994. Bayesian look ahead one stage sampling allocations for selecting the largest normal mean. Statistical Papers 35 169–177. Gupta, S. S., K. J. Miescke. 1996. Bayesian look ahead one-stage sampling allocations for selecting the best population. Journal of Statistical Planning and Inference 54 229–244. He, D., S. E. Chick, C.-H. Chen. 2007. The opportunity cost and OCBA selection procedures in ordinal optimization for a fixed number of alternative systems. IEEE Trans. Systems, Machines, Cybernetics C: Applications and Reviews 37 951–961. Kim, S.-H., B. L. Nelson. 2006. Selecting the best system. S. G. Henderson, B. L. Nelson, eds., Handbook in Operations Research and Management Science: Simulation. Elsevier, 501–534. Law, Averill M., W. David Kelton. 2000. Simulation Modeling & Analysis. 3rd ed. McGraw-Hill, New York. Lerch, M., G. Tischler, J. Wolff von Gudenberg, W. Hofschuster, W. Kraemer. 2001. The interval library filib++ 2.0 - design, features and sample programs. Preprint 2001/4, University of Wuppertal. Matsumoto, M., T. Nishimura. 1998. Mersenne twister: A 623-dimensionally equidistributed uniform pseudorandom number generator. ACM TOMACS 8 3–30. Schmidt, C., J. Branke, S. Chick. 2006. Integrating techniques from statistical ranking into evolutionary algorithms. Applications of Evolutionary Computation, LNCS , vol. 3907. Springer, 753–762.
