Comparisons with a Standard in Simulation Experiments

Barry L. Nelson
Department of Industrial Engineering & Management Sciences
Northwestern University

David Goldsman
School of Industrial and Systems Engineering
Georgia Institute of Technology

May 13, 1998

Abstract

We consider the problem of comparing a finite number of stochastic systems with respect to a single system (designated as the "standard") via simulation experiments. The comparison is based on expected performance, and the goal is to determine if any system has larger expected performance than the standard, and if so to identify the best of the alternatives. In this paper we provide two-stage experiment design and analysis procedures to solve the problem for a variety of scenarios, including when we encounter unequal variances across systems, and when we use the variance reduction technique of common random numbers *and it is appropriate to do so*. The emphasis is added because in some cases common random numbers can be counterproductive when performing comparisons with a standard. We also provide methods for estimating the critical constants required by our procedures, present a portion of an extensive empirical study, and demonstrate one of the procedures via a numerical example.

1 Introduction

We consider an important case of the general class of problems that require comparing a finite, and relatively small, number of simulated systems in terms of their expected performance. By "small" we mean fewer than, say, 20 systems.

There has been recent interest in the specific problem of determining the best of these systems, where "best" means maximum or minimum expected performance of a single, common performance measure. See, for instance, Goldsman, Nelson and Schmeiser (1991), Matejcik and Nelson (1995), Nakayama (1995), Nelson and Matejcik (1995), Yang and Nelson (1991), and the references therein. This work has yielded a rich collection of procedures, including two-stage procedures that guarantee that the best system is chosen and that confidence intervals for the difference between each alternative's performance and the best system's performance can be formed.

In this paper we derive procedures with very similar characteristics that apply to problems in which one of the systems is singled out as the standard or benchmark, and the others are first evaluated with respect to the standard, and then evaluated with respect to each other if any are superior to the standard.

A basic rule of thumb when comparing systems is that sharper inference is obtained by focusing only on the specific comparisons that are relevant to the application at hand. For instance, when it is important to select the best, and the number of observations is fixed, the procedures cited above are more statistically efficient (meaning more likely to detect actual differences in expected performance) than procedures that provide all-pairwise comparisons among the alternatives. Similarly, when we desire comparisons with respect to a standard, we are more likely to obtain conclusive results if we derive procedures that specifically deliver those comparisons.

In many applications the expected performance of the standard, as well as the expected performance of the alternatives, is unknown. For example, the standard might be an existing system that is being considered for replacement, but it is nevertheless simulated in order to provide a fair comparison with the alternatives. A second example occurs when the standard is the least-cost system design, the design that will be implemented unless more expensive alternatives can significantly better its performance. In the statistics literature this type of problem is known as "comparison with a control."

In other applications the performance of the standard may be considered known or certain. An example is an existing system that has been in place long enough that its long-run average performance is well documented; the simulation might be undertaken to evaluate various upgrade strategies. A second example of a known standard is a goal or requirement, such as responding to customer calls within 30 minutes, when the purpose of the simulation study is to determine which system designs can meet or beat this standard. Clearly, a known standard is a limiting case of an unknown standard as the performance of the unknown standard becomes more and more certain.

The procedures that we develop in this paper all share the following characteristics:

1. They require that the simulation experiment be performed in two stages: a first stage to assess the variability of the simulation output, and a second stage designed to achieve the desired precision of the comparison.

2. They exploit the concept of an indifference zone, which is an experimenter-specified difference in expected performance that is deemed practically significant, and therefore worth detecting.

3. They yield a decision, either that no alternative is better than the standard, or that one or more of them is better. When at least one of the alternatives is better, the procedures indicate which alternative is the best. These decisions are guaranteed to be correct with an experimenter-specified probability.

4. They provide the following bounds: bounds on the difference between each alternative and the standard, when none of the alternatives betters it; and bounds on the difference between each alternative and the best of the others, when one or more of the alternatives is better than the standard. These bounds are also guaranteed to be correct with at least an experimenter-specified probability.

Our procedures extend the work of Bechhofer and Turnbull (1978) and Paulson (1952) by adapting them to handle unequal variances and common random numbers, conditions frequently encountered in simulation experiments, and by adding the bounds on the differences. Like this earlier work, the specific procedures we derive depend on the simulation output data being approximately normally distributed. A characteristic of our procedures (and those of Bechhofer and Turnbull 1978, and Paulson 1952) is that they require one or more critical constants, constants that are neither easily calculated nor readily tabled. To remedy this problem we also exhibit methods to estimate upper bounds on these constants, bounds that hold with an experimenter-specified probability. Since the probability that our procedures achieve their objectives is an increasing function of the critical constants, employing upper bounds ensures that the objectives are achieved with at least the prespecified probability.

The paper is organized as follows: The next section presents the generic comparison-with-a-standard procedure and establishes the key probability statements that guarantee its success. These key statements do not depend upon the output data being normally distributed. Section 3 customizes the generic procedure for different properties of the simulation output data; all of these customizations depend on normality. Section 4 shows how to estimate the critical values required by some versions of the comparison procedure. Section 5 presents a portion of an empirical evaluation of these procedures, while Section 6 explicitly illustrates one of them via a numerical example. Finally, Section 7 describes some direct extensions of the research. The longer proofs are contained in the Appendix.

2 Framework

In this section we precisely define the comparison problem of interest and put in place the framework that we use to derive procedures.

Let π_i denote the ith system, for i = 0, 1, ..., k, where π_0 is the designated benchmark or standard. Let X_ij represent the jth output (typically a sample average from within a replication or batch) from system i. In this paper X_i1, X_i2, ... are taken to be independent and identically distributed (i.i.d.), a condition that is always true for replications, and is approximately true when batching within a single long replication if the underlying stochastic process is stationary and the batches are sufficiently large. System i has expected performance μ_i = E[X_ij], and we denote the ordered but unknown means for alternative systems 1, 2, ..., k as

    μ_[1] ≤ μ_[2] ≤ ... ≤ μ_[k].

The system associated with μ_[i] is unknown, but is denoted π_[i]. The expected performance of the standard is denoted μ_0. All of our procedures are based on estimating the true mean, μ_i, by a sample mean, so we let X̄_i denote the sample mean of the observations from system i, and let X̄_[i] denote the sample mean associated with the system having mean μ_[i]. The X̄_[i] need not be ordered, in contrast to X̄_(i), which denotes the ith smallest sample mean; that is,

    X̄_(1) ≤ X̄_(2) ≤ ... ≤ X̄_(k).

We desire procedures that provide the following simultaneous guarantees:

    Pr{select π_0 | μ_0 ≥ μ_[k]} ≥ 1 − α                                        (1)

    Pr{select π_[k] | μ_[k] ≥ μ_[k−1] + δ, μ_[k] ≥ μ_0 + δ} ≥ 1 − α             (2)

where notation such as "| μ_0 ≥ μ_[k]" indicates configurations of the true means under which we require the probability statement to hold. In words, we wish to retain the standard when none of the alternatives is better, and we wish to select the best of the alternatives when it is at least a practically significant amount δ better than everything else. The parameter δ is called the indifference zone. If the best alternative is less than δ better than the other alternatives or the standard, then we are indifferent to which of the good systems we select. We will show that our procedures guarantee, with probability at least 1 − α, that the selected system is within δ of the best regardless of the configuration of the true means.

Our procedures provide a constant, c, and an algorithm to determine the number of outputs, N_i, to be obtained from π_i, such that (1)-(2) hold when we apply the following rule: Choose the standard if X̄_(k) ≤ X̄_0 + c; otherwise choose the alternative associated with X̄_(k). When the expected performance of the standard is known, then we replace X̄_0 by μ_0 in the rule.

A generic version of our procedure is as follows:
Generic Comparison-with-a-Standard Procedure Step 0: Given k alternative systems and a standard, specify an initial sample size, n , an indi erence zone , and a con dence level 1 ? . Determine appropriate constants g 0

and h, and let c = h=g.

Step 1: Generate a sample Xi ; Xi ; : : :; Xin from system i, for i = 0; 1; 2; : : : ; k. 1

2

0

Step 2: Compute the appropriate variance estimator Si to be associated with system i. 2

Step 3: Determine the required total sample size from system i as

8 2 ! 39 ( &  ') = < = max :n ; 66 hSc i 77; ; Ni = max n ; gS i 7 6 2

2

0

0

(3)

where dxe denotes the least integer that is greater than or equal to x.

Step 4: Obtain additional outputs Xi;n ; Xi;n ; : : :; Xi;Ni from system i if needed, and 0 +1

0 +2

compute the overall sample mean Ni X Xi = N1 Xij i j =1

for i = 0; 1; 2; : : : ; k. 6

Step 5: With con dence level greater than or equal to 1 ? , apply the following rule: Step 5a: If X k  X + c, then choose the standard and form the one-sided con dence ( )

0

intervals

 ? i  X ? Xi + c 0

(4)

0

for i = 1; 2; : : : ; k.

Step 5b: Otherwise, choose the alternative associated with the largest sample mean X k , and form the con dence intervals ( )

1

" 

?  # i ? max  2 ? Xi ? max X ?  ; Xi ? max X +  `6 i ` `6 i ` `6 i ` +

=

=

=

(5)

for i = 0; 1; 2; : : : ; k, where ?x? = minf0; xg, and x = maxf0; xg. +

Comment: If  is known, then we do not need to sample from system 0, and we replace 0

X by  in Step 5. 0

0

Comment: If smaller expected performance is better, then Steps 5a{b become Step 5a: If X  X ? c, then choose the standard and form the one-sided con dence (1)

0

intervals

 ? i  X ? Xi ? c 0

0

for i = 1; 2; : : : ; k.

Step 5b: Otherwise, choose the alternative associated with the smallest sample mean X , and form the con dence intervals "  ?  # i ? min  2 ? Xi ? min X ?  ; Xi ? min X +  `6 i ` `6 i ` `6 i ` (1)

+

=

=

=

for i = 0; 1; 2; : : : ; k, where ?x? = minf0; xg, and x = maxf0; xg. +

Con dence intervals of this form are known as multiple comparisons with the best (MCB), since the di erence between each alternative and the best of the others is bounded. 1

7

We now derive the fundamental probability statements that guarantee a correct selection. In this section our only requirement on Xij is that the distribution of Xij ? i is independent of i ; a sucient condition is that the Xij are normally distributed We will use the notation \8ia:::b" to mean i = a; a + 1; : : : ; b. The generic algorithm will insure that (1) holds if PrfXi  X + c; 8i :::k j  i; 8i :::k g  1 ? : 0

1

0

(6)

1

Notice that (6) depends on the unknown means i. However, PrfXi  X + c; 8i :::k j  i; 8i :::k g = Prf(Xi ? X ) ? (i ?  )  c ? (i ?  ); 8i :::k j  i; 8i :::k g  Prf(Xi ? X ) ? (i ?  )  c; 8i :::k g 0

1

0

1

0

0

0

0

0

1

0

1

(7)

1

and (7) does not depend on the unknown means. Similarly, (2) will hold if PrfX k > X + c; X k > X i ; 8i :::k? j k   k? + ;  k   + g  1 ? : 0

[ ]

[ ]

[ ]

1

1

[ ]

[

1]

0

[ ]

(8)

Again, (8) depends on the unknown means i. However, PrfX k > X + c; X k > X i ; 8i :::k? j k   k? + ;  k   + g 0

[ ]

[ ]

1

[ ]

1

[ ]

[

1]

0

[ ]

= Prf(X k ? X ) ? ( k ?  ) > c ? ( k ?  ); (X k ? X i ) ? ( k ?  i ) > ?( k ?  i ); 8i :::k? j k   k? + ;  k   + g 0

[ ]

[ ]

[ ]

0

[ ]

[ ]

[ ]

0

[ ]

[ ]

[ ]

1

1

[ ]

[

1]

0

[ ]

 Prf(X k ? X ) ? ( k ?  ) > c ? ; (X k ? X i ) ? ( k ?  i ) > ?; 8i :::k? g (9) [ ]

0

0

[ ]

[ ]

[ ]

[ ]

[ ]

1

1

and (9) is independent of the unknown means. Therefore, we can attain (1){(2) if we can derive a procedure that simultaneously guarantees that Prf(Xi ? X ) ? (i ?  )  c; 8i :::k g  1 ?

(10)

Prf(X k ? X ) ? ( k ?  ) > c ? ; (X k ? X i ) ? ( k ?  i ) > ?; 8i :::k? g  1 ? :

(11)

0

[ ]

[ ]

0

0

1

[ ]

[ ]

0

[ ]

[ ]

8

1

1

When  is known, then we replace X by  in (10){(11). We derive procedures for a variety of cases in Section 3, where we assume that the simulation output data are approximately normally distributed. However, (10){(11) depend only on the weaker condition that the distribution of Xij ? i is independent of i . Therefore, procedures for other types of data could be based on satisfying (10){(11) provided appropriate constants g and h can be determined. We have shown that (10){(11) guarantee a correct selection with probability  1 ? . The following theorem shows that (10){(11) are also sucient to establish the con dence intervals in Step 5 of the generic procedure: 0

0

0

Theorem 1 If (10) and (11) hold and the distribution of Xij ? i is independent of i, then the events (4) and (5) occur with probability greater than or equal to 1 ? . Proof: That (4) holds with probability greater than or equal to 1 ? follows immediately from (10). To show that (5) holds, notice that since c > 0, PrfX k > X ; X k > X i ; 8i :::k? j k   k? + ;  k   + g  PrfX k > X + c; X k > X i ; 8i :::k? j k   k? + ;  k   + g 0

[ ]

[ ]

0

[ ]

1

[ ]

[ ]

1

[ ]

[

1

[ ]

1

1]

[ ]

0

[ ]

[

1]

[ ]

0

 1? where the last inequality follows because (11) implies (8). Therefore, (5) holds with probability greater than or equal to 1 ? by Theorem 1 of Nelson and Matejcik (1995). As consequence of Theorem 1, we are guaranteed that the true mean of the system with the largest second-stage sample mean (be it the standard, or one of the alternatives) is within  of the largest true mean with probability greater than or equal to 1 ? under all possible con gurations of  ;  ; : : :; k . In order to state the following corollary, let B denote the index of the system with the largest second-stage sample mean. 0

1

9

Corollary 1 If (10) and (11) hold and the distribution of Xij ? i is independent of i , then PrfB ? max   ?g  1 ? : i6 B i =

Proof: From Theorem 1, we know that

"  ?  # B ? max  2 ? XB ? max X ?  ; XB ? max X +  `6 B ` `6 B ` `6 B ` +

=

=

=

with probability greater than or equal to 1 ? . But since XB  X`; 8` :::k , the smallest possible value of the lower bound is ?. Notice that we may still select the standard even if X is not the largest sample mean, because our requirements (1){(2) favor the standard, seeking to retain it even if it is tied with the best. Thus, we require substantial evidence before giving up the standard. Corollary 1 guarantees that we get a \good" system, with high probability, if we select the one with the largest sample mean. 0

0

3 Procedures We have derived speci c instances of the Generic Comparison-with-a-Standard Procedure to handle the types of data encountered in simulation. Speci cally, we consider the following cases:

Status of the standard: We consider the case in which  is known and when it is un0

known and must be estimated along with the expected performance of the alternatives. When  is unknown, it may be estimated via a simulation experiment or by collecting data on the real system. 0

Unequal variances across systems: All of our procedures permit unequal variances across systems, although Case C below is an approximation when the variances are not all the same. This is one of the contributions of our work beyond that of Bechhofer and Turnbull (1978) and Paulson (1952). 10

Dependence across systems: We develop procedures in which all systems are simulated (or sampled, if  is a real system) independently. For the case of unknown  , we also develop procedures in which the simulations of all systems are driven by common random numbers (CRN), inducing dependence across systems. One way we account for the e ect of CRN is to assume that the induced variance-covariance matrix across systems satis es a condition known as sphericity. In brief, sphericity approximates the variance of the di erence between observations from any two of the systems by the average variance of the di erence between observations from all pairs of systems. See Nelson (1993) and Nelson and Matejcik (1995) for further discussion of sphericity. 0

0

We also show that it may be counterproductive to use CRN when  is known, or  is unknown and estimated independently of the alternatives. Therefore, we do not derive CRN-compatible procedures for these cases. 0

0

For readers only interested in applications, Section 3.1 gives the essential information required to customize the Generic Procedure for various cases. The proofs are in the Appendix.

3.1 Customizing the Generic Procedure All of our procedures require critical values (g; h) that satisfy PrfH  hg = 1 ? PrfG  gg = 1 ? where H and G are random variables whose distributions depend on the speci cs of the problem. In addition, all of the procedures require a variance estimator Si associated with system i from the rst-stage sample (Step 2). In this section we collect the de nitions of G, H and Si for each of the cases considered below. To do so, let Z ; Z ; : : :; Zk be i.i.d. N(0; 1) random variables; let Y ; Y ; : : : ; Yk be i.i.d.  random variables, each with n ? 1 degrees of 2

2

1

0

1

2

11

2

0

freedom; let T; T ; T ; : : :; Tk be i.i.d. t random variables, each with n ? 1 degrees of freedom; and let (T ; T ; : : :; Tk ) be a multivariate-t random vector with common correlation 1=2 and k(n ? 1) degrees of freedom. In the formulas below a  subscript indicates averaging with respect to that subscript for the n observations from the rst stage of sampling. 1

1

2

0

2

0

0

Case A:  known, alternative systems simulated independently, the Xij are normally dis0

tributed, and the variances across systems may be unequal.

H = maxfT ; T ; : : : ; Tkg n o G = max Zk [(n ? 1)=Yk ] = + h; Zi [(n ? 1)(1=Yi + 1=Yk )] = ; 8i :::k? n0 X 1 Si = n ? 1 (Xij ? Xi ) j 1

2

1 2

0

1 2

0

1

1

2

2

0

=1

Case B:  unknown, all systems simulated independently, the Xij are normally distributed, 0

and the variances across systems may be unequal.

H = maxfZi [(n ? 1)(1=Yi + 1=Y )] = ; 8i :::k g o n G = max Zk [(n ? 1)(1=Y + 1=Yk )] = + h; Zi [(n ? 1)(1=Yi + 1=Yk )] = ; 8i :::k? n0 X Si = n 1? 1 (Xij ? Xi) j 0

1 2

0

0

1

1 2

0

1 2

0

1

1

2

2

0

=1

Case C:  unknown, all systems simulated using CRN, and (X j ; X j ; : : :; Xkj ) have a 0

0

1

multivariate normal distribution whose variance-covariance matrix satis es sphericity.

H = maxfT ; T ; : : : ; Tk g 1

2

G = maxfT + h; T ; : : :; Tk g n0 k X X Si = S = k(n 2? 1) (X`j ? X` ? Xj + X) ` j 1

2

2

2

0

=0 =1

12

2

Case D:  unknown, all systems simulated using CRN, and (X j ; X j ; : : :; Xkj ) have a mul0

0

1

tivariate normal distribution whose variance-covariance matrix is unknown. For Case D, whose proof exploits the Bonferroni inequality, it is easier to provide a somewhat di erent de nition of g and h; speci cally, (g; h) simultaneously solve 1 ? k PrfT > hg = 1 ? 1 ? PrfT + h > gg ? (k ? 1) PrfT > gg = 1 ? : The variance estimator for Case D is 9 8 n 0 h = < i X 1 i ? X` ) ( X ? X ) ? ( X Si = Sb = max ij `j ; i6 ` : n ? 1 j 2

2

2

=

0

=1

the largest variance of the di erence. Notice that Cases C and D use a common sample size N for all systems. In Section 4 we provide a procedure for estimating the quantiles (g; h) for Cases A, B and C. For case D, a simple numerical search suces.

3.2 The Issue of Normality Cases A{D above all assume that the simulation output data are approximately normally distributed, either marginally or jointly. Normality of the rst-stage observations is important because the joint distribution of Xi and Si is central to the derivations of the procedures. Therefore, it makes sense to consider whether or when this is a resonable expectation for simulation output data. In many simulation studies the basic output data Xi ; Xi ; : : : are themselves averages of large numbers of other more basic outputs. For example, if Xi ; Xi ; : : : correspond to di erent replications, and the performance measure is expected cycle time for a product, then Xij would typically be the average of the cycle times for a large number (perhaps hundreds) of individual products that were completed during the j th replication. In this 2

1

2

1

13

2

case, the central limit theorem suggests that approximate normality of the Xij may be anticipated. Clearly situations do arise in which the output from each replication is not even approximately normally distributed. For instance, if each replication produces a single observation of time to failure for a system, then there is no a priori reason to expect normality. However, if a large number of replications can be obtained, say m, then Goldsman, Nelson and Schmeiser (1991) suggest that they be partitioned into n \macroreplications," each consisting of m=n \microreplications." The average value within each macroreplication is then treated as the basic output data value. If m is large enough, then the microreplication averages will be approximately normally distributed. Even when only a single replication is obtained in order to estimate long-run performance in a steady-state simulation, we may anticipate that the Xi ; Xi ; : : : will be approximately normally distributed if they correspond to nonoverlapping batch means of many individual observations. Batch-size analysis in Matejcik and Nelson (1995) then suggests that the number of batches be kept to roughly 40, so that each batch mean is based on a very large number of observations. This same guideline applies to the number of macroreplications that should be formed when outputs are obtained across replications. 0

0

1

2

3.3 CRN May Be Counterproductive We have not presented procedures that incorporate CRN for the simulation of the alternatives  ;  ; : : :; k when  is simulated or sampled independently, or when  is known so that  is not simulated at all. In this section we present a brief analysis that shows why CRN may be counterproductive in these cases, and it is therefore safer to simulate the alternatives independently. Suppose that we have k = 2 alternatives,  is known, and we take a exactly one observation from each of the alternative systems, yielding data (X ; X ). Suppose further that 1

2

0

0

0

0

1

14

2

(X ; X ) has a bivariate normal distribution with parameters ( ;  ;  ;  ; ). We assume  > 0, representing the e ect of CRN. For this simple example, (11) becomes 1

2

1

2

2 1

2 2

PrfX ?  > c ? ; (X ? X ) ? ( ?  ) > ?g [2]

[2]

[2]

[1]

[2]

[1]

= PrfX ?  <  ? c; (X ? X ) ? ( ?  ) < g: [2]

[2]

[2]

[1]

[2]

[1]

(12)

Basic mathematical statistics shows that

= Cov[(X ?  ); (X ? X ) ? ( ?  )] < 0 [2]

[2]

[2]

[1]

[2]

[1]

if  >  = (where  i is the variance of system  i ); this is certainly possible when the variances are unequal. However, by Slepian's inequality (see, for instance, Tong 1980), we know that (12) is an increasing function of . Therefore, when  >  = the probability of correct selection is larger if the alternatives are simulated independently (implying that  = 0 and > 0) rather than with CRN. Intuitively, when the standard is xed and all of the alternatives hang together (due to CRN), then if one of of the alternatives is dicult to distinguish from the standard, they all are. A similar argument holds when  is unknown,  is simulated or sampled independently of the alternatives, and the alternatives are simulated using CRN. [2]

[1]

2 [ ]

[ ]

[2]

0

[1]

0

4 Estimating Critical Values Traditionally, critical values for statistical inference have been computed, often via intensive numerical integration, and then tabled for later use. This becomes impractical when the desired critical values depend on a large number of problem parameters. In the present setting, the critical values (g; h) depend on the con dence level, 1 ? , the number of systems, k, the rst-stage sample size, n , and whether or not CRN is employed. When the required numerical integration is of low dimension (say 2 to 3), then real-time numerical calculation 0

15

of the critical values may be possible. However, each problem type may then require a nely tuned numerical procedure that works well over the feasible range of problem parameters. As computation speed increases, another approach becomes viable: use a separate simulation experiment to estimate the critical values as needed for the problem at hand. As we will show, the present context is ideal for this approach. Cases A, B and C presented in Section 3 require a pair of quantiles (g; h) that satisfy PrfH  hg = 1 ? PrfG  gg = 1 ? where G = maxfM + h; M ; : : : ; Mk g, and H; M ; M ; : : : ; Mk are random variables that are easily sampled. Further, for all of our procedures the total sample size from system i satis es  gSi  Ni   1

2

1

2

2

an increasing function of g, implying that we would prefer an estimate of g that is a bit too large, rather than too small (if Ni is too small then we may not achieve the desired probability of correct selection and con dence interval coverage, while Ni too large makes the procedure conservative). We cannot guarantee that our simulation estimate of g is greater than or equal to g, but we can nd an estimator gb with the property that Prfg > gbg  g for some small, prespeci ed value of g . In this section we show that the following procedure yields such an estimator:

16

Procedure (g; h) Bound Step 1: Select positive integers mh and mg , and con dence levels h and g. Step 2: Find the smallest integers uh and ug such that

mh m ! X h (1 ? )` mh ?`  h ` ` uh ! mg X mg (1 ? )` mg ?`  (1 ? ): g h ` ug ` Step 3: Generate H ; H ; : : : ; Hmh , i.i.d. copies of H , and set hb = H uh . =

=

1

2

(

)

b M ; : : :; Mk g. Step 4: Generate Gb ; Gb ; : : : ; Gb mg , i.i.d. copies of Gb , where Gb = maxfM + h; 1

2

1

2

Step 5: Return hb and gb = H ug . (

)

To prove that the procedure works, we address a somewhat more general case: For continuous cdfs FH and FG, de ne h = FH? (1 ? h) and g = FG? (1 ? g jh), where \jh" indicates that h is a parameter of FG. Suppose that FG? is a nondecreasing function of the parameter h. Suppose also that we have a procedure Qh that provides an estimator hb with the property that Prfh > hb g  h. Further, we have a procedure Qg (hb ), that takes as input hb , and returns as output an estimator gb with the property that 1

1

1

Prfg > gbjh  hb g  g (1 ? h) and gb is nondecreasing in hb .

Theorem 2 If gb is de ned by procedures Qh and Qg (hb ), then Prfg > gbg  g . Proof: Prfg > gbg = Prfg > gbjh  hb g Prfh  hb g + Prfg > gbjh > hb g Prfh > hb g  g (1 ? h)  1 + Prfg > gbjh > hb g h

 g (1 ? h) + Prfg > gbg h 17

(13) (14)

where (13) follows from properties of Qh and Qg (hb ), and (14) from the fact that gb is nondecreasing in hb . Solving for Prfg > gbg gives g (1 ? h ) Prfg > gbg  (1 ? ) = g : h

It is straightforward to verify that Procedure (g; h) Bound satis es the conditions of Theorem 2, since PrfH uh < hg = Prf#fHi  hg  uhg mh m ! X h (1 ? )` mh ?` = ` ` uh  h (

)

=

where the second line follows from the de nition of h and the last line follows from our choice of uh. A similar argument holds for PrfG ug < gjh  hb g, and it is clear that G ug is nondecreasing in hb . (

)

(

)

Comment: Many ranking, selection and multiple comparison procedures, and in particular the procedures of Nelson and Matejcik (1995) and Matejcik and Nelson (1995) for selecting the best, require a single critical value similar to our h. Thus, an upper bound on h that holds with probability greater than or equal to 1 ? h can be achieved by stopping at Step 3 in Procedure (g; h) Bound.

Example: Suppose that we want an overall con dence level of 1 ? = 0:9 and a 95% upper con dence bound on the critical value, implying that Prfg > gbg  g = 0:05. If we set h = 0:05 also, then we require the conditional con dence bound Prfg > gbjh  hb g  g (1 ? h) = 0:0475. When mh = mg = 1000 observations, then to three decimal places X 1000! ` ` (0:9) (0:1) ` 1000

?`

100

=916

18

= 0:0485  0:05

X 1000! ` (0:9) (0:1) ` ` 1000

?`

100

=917

= 0:0383  0:0475

so that uh = 916 and ug = 917. This example shows that if we require the bound on h to hold with very high probability, then the fact that we estimated h has little or no e ect on the estimation of g. In fact, we had to work hard to nd a reasonable example in which uh 6= ug when h = g. Tables 1 and 2 give (h; g) values when the number of rst-stage observations is 10 and the desired overall con dence level is 0:90 or 0:95, respectively, for various numbers of alternative systems k. Notice that even with as few as 20,000 observations the 99% upper con dence bounds are quite close to the point estimates. S-Plus code (MathSoft, Inc.) to obtain critical values for all four cases can be obtained from http://www.iems.nwu.edu/~nelsonb/NSF/.

5 Empirical Evaluation

To evaluate the robustness of the Case C version of the procedure to departures from sphericity, and to evaluate the conservatism of the Case D version of the procedure in general, we performed an empirical study. Since it is not possible to control the extent to which system-simulation examples depart from sphericity, we focused instead on the space of normally distributed output vectors with nonnegative covariances (the assumed effect of CRN). We estimated the probability of correct selection (PCS) over this space, but did not estimate confidence-interval coverage separately since it is implied by the correct-selection guarantee. We considered only the equal-means configuration (EMC), μ0 = μ1 = ⋯ = μk, and the slippage configuration (SC), μk − δ = μk−1 = μk−2 = ⋯ = μ0, because the minimum PCS should occur at these configurations. In the EMC, a "correct selection" means retaining the standard, while in the SC it means selecting system k. We fixed δ = 1/2, 1 and 2 in units of the average standard error of the first-stage sample

Table 1: Critical values h (upper entry) and g (lower entry) for n0 = 10 and 1 − α = 0.90. When the critical value is estimated via simulation, a 99% upper confidence bound is given next to the estimate in parentheses. Estimates are based on 20,000 observations.

     k      Case A          Case B          Case C          Case D
     2  h   1.817           2.588 (2.631)   1.650 (1.679)   1.833
        g   3.345 (3.381)   4.646 (4.687)   3.024 (3.057)   3.251
     3  h   2.064           2.922 (2.962)   1.786 (1.811)   2.086
        g   3.622 (3.660)   5.019 (5.062)   3.153 (3.188)   3.514
     4  h   2.238           3.159 (3.194)   1.882 (1.902)   2.262
        g   3.860 (3.894)   5.234 (5.281)   3.251 (3.280)   3.697
     5  h   2.373           3.299 (3.347)   1.959 (1.988)   2.398
        g   4.010 (4.047)   5.368 (5.413)   3.301 (3.331)   3.837

Table 2: Critical values h (upper entry) and g (lower entry) for n0 = 10 and 1 − α = 0.95. When the critical value is estimated via simulation, a 99% upper confidence bound is given next to the estimate in parentheses. Estimates are based on 20,000 observations.

     k      Case A          Case B          Case C          Case D
     2  h   2.254           3.182 (3.242)   2.054 (2.097)   2.262
        g   4.174 (4.222)   5.842 (5.902)   3.850 (3.893)   4.112
     3  h   2.499           3.506 (3.578)   2.147 (2.185)   2.510
        g   4.437 (4.489)   6.208 (6.276)   3.920 (3.960)   4.366
     4  h   2.673           3.740 (3.790)   2.228 (2.262)   2.685
        g   4.648 (4.702)   6.412 (6.484)   3.965 (4.000)   4.545
     5  h   2.809           3.852 (3.916)   2.234 (2.355)   2.821
        g   4.829 (4.868)   6.558 (6.628)   4.044 (4.081)   4.684

means; specifically, δ = 1/(2√n0), 1/√n0 or 2/√n0, where the average variance of an observation across all k + 1 systems was always fixed to be 1. When δ = 1/(2√n0) there will be a large second-stage sample; δ = 1/√n0 implies that there will usually be a modest second-stage sample; while δ = 2/√n0 implies that second-stage sampling is rarely required. In addition to varying δ, we considered different configurations of the systems' variances, σ0², σ1², …, σk², where σi² is the variance of an observation from system i. Specifically, we considered equal variances across all systems; the best system having 20% larger variance than all other systems; and the best system having 20% smaller variance than all other systems (in the EMC the best system is the standard, while in the SC the best system is system k). We chose not to investigate drastically unequal variances because comparisons based on mean performance only make sense when differences in variances do not dominate differences in means. In all cases (k + 1)⁻¹ Σ_{i=0}^{k} σi² = 1.

The experiments were conducted as follows:

1. Fix the number of systems, k, the number of first-stage replications from each system, n0, and the confidence level 1 − α. We considered k = 3 and 5 systems (which implies k + 1 = 4 or 6 systems, including the standard), n0 = 10 replications, and 1 − α = 0.95. Fix D, a (k + 1) × (k + 1) matrix with off-diagonal elements 0 and diagonal (σ0, σ1, …, σk).

2. Generate a random (k + 1)-dimensional correlation matrix R using the method of Marsaglia and Olkin (1984). This method transforms a randomly generated point on the unit sphere into a correlation matrix. We modified the method to generate a point on the unit sphere with all nonnegative coordinates, which leads to a correlation matrix with all nonnegative elements. Set Σ = DRD to obtain a covariance matrix with variances (σ0², σ1², …, σk²) and implied correlation matrix R.

3. Generate n0 i.i.d. random vectors Xj ∼ N(0, Σ), for j = 1, 2, …, n0.
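Step 2 can be sketched in a few lines. Marsaglia and Olkin (1984) give several constructions; the version below is our reading of the rows-on-the-unit-sphere construction (the function name and details are ours, not from the paper): each row of T is a point on the unit sphere folded into the nonnegative orthant, so R = TTᵀ has unit diagonal, is positive semidefinite, and has all nonnegative entries.

```python
import math
import random

def random_nonneg_corr(dim, rng=random):
    # Each row of T is a random point on the unit sphere with nonnegative
    # coordinates; the Gram matrix R = T T' is then a correlation matrix
    # (unit diagonal, positive semidefinite) with nonnegative entries.
    T = []
    for _ in range(dim):
        v = [abs(rng.gauss(0.0, 1.0)) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        T.append([x / norm for x in v])
    return [[sum(T[i][m] * T[j][m] for m in range(dim)) for j in range(dim)]
            for i in range(dim)]

# Scale by the standard deviations to get Sigma = D R D, as in Step 2.
sigmas = [1.0, 1.0, 1.1, 0.9]   # illustrative diagonal of D
R = random_nonneg_corr(len(sigmas))
Sigma = [[sigmas[i] * R[i][j] * sigmas[j] for j in range(len(sigmas))]
         for i in range(len(sigmas))]
```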

4. Compute the total sample size Ni = max{n0, ⌈(gSi/δ)²⌉}, where g and Si depend on the procedure being evaluated. (For Cases C and D, N0 = N1 = ⋯ = Nk = N.)

5. Generate N − n0 i.i.d. random vectors Xj ∼ N(0, Σ), for j = n0 + 1, n0 + 2, …, N.

6. (a) In the EMC, score a correct selection if {X̄0 + c > X̄i, ∀ i ≠ 0}.
   (b) In the SC, score a correct selection if {X̄k + δ > X̄i, ∀ i = 1, …, k − 1, and X̄k + δ > X̄0 + c}. (Since the generated data all have mean 0, the slippage configuration is imposed by crediting system k with its δ advantage at scoring time.)

7. Repeat Steps 3–6 a total of 2000 times to obtain an estimate of PCS for the covariance matrix Σ (2000 replications gives two significant digits of precision).

8. Repeat Steps 2–7 a total of 5000 times to estimate the distribution of PCS over the space of covariance matrices, Σ.

The experiments bypass one problem that affects all parametric multiple-comparison procedures, nonnormal data, and instead focus on the effect of positive correlation and unequal variances. The results are therefore optimistic in the same way that any parametric multiple-comparison procedure is optimistic with regard to this assumption. The results are pessimistic in the sense that we seldom encounter the EMC or SC in practice. Before presenting some illustrative results, we summarize our conclusions from the complete set of experiments:

• Case C, the procedure based on assuming sphericity, achieved an average PCS of approximately 0.95 across the 5000 randomly generated covariance matrices for each combination of k, δ and (σ0², σ1², …, σk²) considered. However, probabilities of correct selection as low as 0.83, while rare, were observed in the SC configuration due to the standard being selected when the best is in fact system k. This is less robust performance than that observed by Nelson and Matejcik (1995) for a similar two-stage procedure designed only to select the best.

Therefore, the Case C procedure performs as desired on average, but has a higher than advertised risk of retaining the standard when one of the alternatives is exactly δ better than the standard.

• Case D, the procedure based on the Bonferroni inequality, achieved a PCS of at least 0.95 for each randomly generated covariance matrix over all combinations of k, δ and (σ0², σ1², …, σk²) considered. This was expected, since the procedure has been proven to achieve the PCS for normally distributed data. However, the average PCS can be significantly higher than 0.95, particularly as k is increased from 3 to 5 systems.

Therefore, the Case D procedure assures that the desired PCS is attained, at the risk of delivering a higher than requested PCS by taking a larger total sample than is actually required.

• Neither Σ nor (σ0², σ1², …, σk²) had a noticeable effect on the results.

Therefore, neither procedure is significantly affected by mild differences in systems' variances.

Figures 1 and 2 present illustrative results for the Case C and D procedures, respectively. Each histogram summarizes the estimated PCS for 5000 randomly generated covariance matrices, and there is one histogram for each value of δ and each configuration of the means. Common random numbers were employed in each figure. In Figure 1, consider the SC with δ = 2. Although the average PCS is 0.95, the minimum observed is 0.84, and there is a 7% chance of PCS less than 90%. This illustrates the risk in using Case C, with the reward being greatly reduced sample sizes. In Figure 2, consider the EMC with δ = 1/2. Notice that the average PCS is 0.99, and the minimum observed PCS is 0.98, even though the nominal PCS is 0.95. This illustrates the conservatism inherent in using a procedure based on the Bonferroni inequality.

[Figure 1: six histograms of estimated PCS over the 5000 randomly generated covariance matrices, one panel per configuration; images omitted. EMC panels: Ave PCS 0.95, Min PCS 0.89 (δ = 2); Ave 0.94, Min 0.89 (δ = 1); Ave 0.94, Min 0.88 (δ = 1/2). SC panels: Ave 0.95, Min 0.84 (δ = 2); Ave 0.95, Min 0.83 (δ = 1); Ave 0.95, Min 0.83 (δ = 1/2).]

Figure 1: Estimated probability of correct selection for Case C with k = 5 systems and n0 = 10 replications over the space of randomly generated Σ with equal variances.

[Figure 2: six histograms of estimated PCS over the 5000 randomly generated covariance matrices, one panel per configuration; images omitted. EMC panels: Ave PCS 0.99, Min PCS 0.98 (δ = 2); Ave 0.99, Min 0.97 (δ = 1); Ave 0.99, Min 0.98 (δ = 1/2). SC panels: Ave 0.99, Min 0.95 (δ = 2); Ave 0.98, Min 0.95 (δ = 1); Ave 0.98, Min 0.95 (δ = 1/2).]

Figure 2: Estimated probability of correct selection for Case D with k = 3 systems and n0 = 10 replications over the space of randomly generated Σ with equal variances.

6 Example

Systems Modeling Corporation provided a simulation model of a printed circuit board (PCB) manufacturing line that they developed for a large industrial client. In this serial line seven types of parts are processed, machine operators follow one of four schedules, and machines experience both scheduled preventative maintenance and random failures. Parts moving through the system may experience rework at any of three inspection steps. The key performance measure is expected part cycle time (in minutes). Three alternatives to the standard design arose by adding resources to shift 1 (alternative 1); adding resources to the swing shift (alternative 2); and increasing the capacity of some ovens (alternative 3).

An indifference zone of δ = 100 minutes in cycle time was chosen, and a first-stage experiment consisting of n0 = 10 replications of 4000 minutes was performed for the standard and each of the k = 3 alternatives. All of the systems were simulated using CRN, so we obtained critical values h = 2.086 and g = 3.514 for Case D from Table 1 to ensure an overall confidence level 1 − α = 0.90. Since smaller expected cycle time is better, an alternative must have c = δh/g ≈ 59 minutes smaller sample average cycle time than the standard in order to replace it.

If S²_{i,ℓ} denotes the sample variance of the difference between systems i and ℓ, then the first-stage experiment gave S²_{0,1} = 76550, S²_{0,2} = 67777, S²_{0,3} = 49039, S²_{1,2} = 83968, S²_{1,3} = 111218 and S²_{2,3} = 164749. Therefore, Ŝ² = max_{i≠ℓ} S²_{i,ℓ} = 164749, and the second-stage sample size is

$$N = \max\left\{10,\ \left\lceil \frac{(3.514)^2 (164749)}{(100)^2} \right\rceil\right\} = 204.$$

The total run time for all 4 × 204 = 816 replications was about 3.5 hours on a Pentium 200 computer. Table 3 displays the second-stage sample means. Since X̄1 = 1040 < X̄0 − 59 = 1051, we conclude that alternative 1 is better (has smaller expected cycle time) than the standard. The table also displays the MCB confidence intervals comparing the alternatives to each other.
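The second-stage sample-size rule N = max{n0, ⌈(gS/δ)²⌉} is a one-liner; the sketch below (the function name is ours) reproduces the computation for the PCB example:

```python
import math

def second_stage_size(n0, g, s2, delta):
    # N = max{ n0, ceil(g^2 * S^2 / delta^2) }; in Case D, S^2 is the largest
    # first-stage sample variance among the pairwise differences.
    return max(n0, math.ceil(g * g * s2 / delta ** 2))

# PCB example: n0 = 10, g = 3.514 (Case D, k = 3), S^2 = 164749, delta = 100.
print(second_stage_size(10, 3.514, 164749, 100))  # → 204
```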

Table 3: Comparison among the alternatives for the PCB line example.

    System i    X̄i     X̄i − min_{j≠i} X̄j
        0      1110         70 ± 100
        1      1040        −50 ± 100
        2      1090         50 ± 100
        3      1120         80 ± 100

Although alternative 1 appears to be the best, the confidence intervals for μi − min_{ℓ≠i} μℓ all contain 0, so that it is not possible to distinguish among the alternatives with an indifference zone of 100 minutes. In other words, any of systems 1, 2 or 3 could be the true best. However, the upper confidence bound for μ1 − min_{ℓ≠1} μℓ indicates that even if system 1 is not the true best, μ1 is no more than 50 minutes greater than the mean of the true best. If it is important to choose among the alternatives then we could reduce δ and continue the experiment.
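The screening decision and the MCB-style columns of Table 3 follow directly from the sample means; a small sketch (numbers from the example above, intervals shown in the X̄i − min_{j≠i} X̄j ± δ form the table uses):

```python
# Second-stage sample means (Table 3) and constants from the example.
means = {0: 1110, 1: 1040, 2: 1090, 3: 1120}
delta, c = 100, 59  # c = delta * h / g, rounded as in the text

# Comparison with the standard: alternative 1 replaces system 0 because
# its sample mean is more than c below the standard's.
print(means[1] < means[0] - c)  # → True

# MCB-style columns: each system's difference from the best of the others.
for i, xbar in sorted(means.items()):
    diff = xbar - min(x for j, x in means.items() if j != i)
    print(f"system {i}: {diff:+d}, interval ({diff - delta}, {diff + delta})")
```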

7 Extensions

We have presented procedures for comparisons with a standard that generalize procedures due to Bechhofer and Turnbull (1978) and Paulson (1952) so that they are more useful for the types of data encountered in simulation experiments. Our procedures also add confidence bounds on certain differences, bounds that were not present in the earlier procedures.

While we do generalize the procedures of Bechhofer and Turnbull (1978) with respect to the types of data to which they apply, Bechhofer and Turnbull consider a more general decision problem than ours. Specifically, they derive procedures that guarantee that

$$\Pr\{\text{select } \mu_0 \mid \mu_0 \ge \mu_{[k]} + \delta_0\} \ge 1 - \alpha_0$$

$$\Pr\{\text{select } \mu_{[k]} \mid \mu_{[k]} \ge \mu_{[k-1]} + \delta_1,\ \mu_{[k]} \ge \mu_0 + \delta_2\} \ge 1 - \alpha_1.$$

Notice that this formulation allows for three indifference zones and a different probability requirement for retaining the standard versus selecting the best alternative. We believe that this formulation requires the specification of too many constants to be practically useful; nevertheless, our procedures can all be extended to cover this more general case.

Appendix

In this section we prove that the Generic Procedure delivers the desired probability guarantees for Cases A-D. In each case, we will prove that (10)-(11) hold. The ensuing discussion uses certain notation common to a number of the cases. Suppose that $Z_1, Z_2, \ldots, Z_k$ are i.i.d. standard normal random variables; $Y_0, Y_1, \ldots, Y_k$ are i.i.d. $\chi^2_{n_0-1}$; and $T, T_1, T_2, \ldots, T_k$ are i.i.d. $t_{n_0-1}$; and let $(\tilde{T}_1, \tilde{T}_2, \ldots, \tilde{T}_k)$ be a multivariate-$t$ random vector with common correlation $1/2$ and $k(n_0-1)$ degrees of freedom. We denote the standard normal cumulative distribution function by $\Phi(\cdot)$ and the $\chi^2_{n_0-1}$ probability density function by $f(\cdot)$.

0

0

PrfXi ? i  c; 8i :::k g 9 8 = < Xi ? i c ; 8i :::k ; q = Pr : q Si =Ni Si =Ni 8 9 Yk < Xi ? i cpNi = Pr : q =  S ; (by independence of the systems) i Si =Ni i pN ) Yk ( c Pr Ti  S i = (by Stein 1945) i i Yk  Pr fTi  hg (by (3)) 1

2

2

1

2

=1

=1

i=1

= PrfT  h; T  h; : : : ; Tk  hg 1

2

28

(by independence of the Ti's)

2

1

= 1 ? ; the last step following from the way we chose h. This gives (10). Before we show that (11) holds, we de ne   Zei = (X k ?rXi2) ? (2k ?  i ) ; 8i :::k? [i] [k ] N[k] + N[i] [ ]

[ ]

and

[ ]

1

1

 ?k Zek = X k q : k = Nk [ ]

[ ]

Further, let

[ ]

[ ]

[ ]

g Qi = s 2 g 2  r  ; 8i :::k?  ] ] (n ? 1) Yi + Yk S[2k] + S[2i] 1

1

1

0

1

and

q Qk = g ?=Sh  pgn??h 1 Yk : k k By symmetry of the normal distribution, the left-hand side of (11) is equivalent to [ ]

[ ]

0

PrfX k ?  k <  ? c; (X k ? X i ) ? ( k ?  i ) < ; 8i :::k? g 8 9 > > = Pr Zk < > [i] [k ] k = Nk : ; N[k] + N[i] 8 9 > > > > < e ( ? c)g= e = g  Pr >Zk <  =S ; Zi < s 2 2 ; 8i :::k? > k k > > [i] [k ] : ; S2 + S2 [ ]

[ ]

[ ]

[ ]

[ ]

1

[ ]

1

[ ]

1

1

[ ]

[ ]

1

[ ]

[k ]

1

[i]

= PrfZei < Qi; 8i :::k g; 1

the inequality following from (3). Conditional on S ; S ; : : :; Sk , the multivariate normal random vector (Ze ; Ze ; : : :; Zek ) 2 1

2 2

2

1

29

2

has mean (0; 0; : : : ; 0) and covariance matrix  with components 8 1 if i = j > > > 2 =N[k] > s 2 2[k] > if i 6= j , i 6= k, j 6= k > < ] ] ] ]  ij = > N[k] N[i] N[k] N[j] > ] =N[k] s >   2  if i 6= j , j = k: > ] ] > [k ] : N[k] N[i] N[k] +

+

+

Since all of the ij 's are positive, Slepian's inequality (Tong 1980) gives us PrfZei < Qi; 8i :::k g h i = E PrfZei < Qi; 8i :::k jS ; S ; : : :; Sk g # "Y k e  E PrfZi < QijS ; S ; : : :; Sk g i "Y # k = E (Qi) 1

2 1

1

2 1

2 2

2

2 2

2

=1

i=1

= = = =

0 2 3k? 1 ! Z1 Br 6 7 C g  (g ? h) n y?k 1 64  B   CA f (y) dy75 f (yk ) dyk @ (n ? 1) y + yk 9 8 s > > = < Y g k Pr >Zk < (g ? h) n ? 1 ; Zi < r  ; 8i :::k? >  ; : (n ? 1) Yi + Yk s ( s ) 1 1 n ? 1 Pr Zk Y + h < g; Zi (n ? 1) Y + Y < g; 8i :::k? k i k 1? s

Z1 0

1

0

0

0

0

0

0

0

by the way we chose g. This proves that (11) holds.

30

1

1

1

1

1

1

1

1

Case B In order to obtain (10), rst de ne 8i :::k ,  X ) ? (i ?  ) Zei = (Xi ? r i2 02 + Ni N0 1

0

and

0

h Qi = r 2 h 2  r :  i + 0 + ( n ? 1) Yi Y0 Si2 S02

Then

1

1

0

Prf(Xi ? X ) ? (i ?  )  c; 8i :::k g 9 8 > > = Zi  r 2 2 ; 8i :::k > 0 i ; : Ni + N0 8 9 > > Zi  r 2 2 ; 8i :::k > 0 i : ; Si2 + S02 0

0

1

1

1

= PrfZei < Qi; 8i :::k g 1

where the inequality follows from (3). Conditional on S ; S ; : : :; Sk , the multivariate normal random vector (Ze ; Ze ; : : :; Zek ) has mean (0; 0; : : : ; 0) and covariance matrix  with components 8 > 1 if i = j > < 2 ij = > s 2 20 =N 0 2 2  if i 6= j:   > : Nii N00 Njj N00 2 0

2 1

2

1

+

+

Since all of the ij 's are positive, Slepian's inequality gives us PrfZei < Qi; 8i :::k g h i = E PrfZei < Qi; 8i :::k jS ; S ; : : : ; Sk g 1

1

2 0

31

2 1

2

2

#

"Yk

PrfZei < QijS ; S ; : : : ; Sk g i # "Yk = E (Qi)

 E

2 0

2 1

2

=1

i=1

0 2 Z 1 6Z 1 B 64  B@ r =

h  (n ? 1) y + y0

0

0

1

1

0

1 3k C 7  CA f (y) dy75 f (y ) dy 0

0

9 8 > > = < h ; 8 i = Pr >Zi < r :::k >   ; : (n ? 1) Yi + Y0 ( s ) 1 1 = Pr Zi (n ? 1) Y + Y < h; 8i :::k 0

1

1

1

0

i

1

0

= 1? by the way we chose h. This proves that (10) holds. Before we show that (11) holds, we de ne   Zi = (X k ?rXi2) ? (2k ?  i ) ; 8i :::k? [i] [k ] N[k] + N[i] and   Zk = (X k ?rX2) ? (2k ?  ) : 0 [k ] N[k] + N0 Further, de ne g Q i = s 2 g 2  r  ; 8i :::k?  ] ] (n ? 1) Yi + Yk S[2k] + S[2i] [ ]

[ ]

[ ]

[ ]

0

[ ]

1

1

0

[ ]

0

and

1

1

1

1

g?h Q k = s g2 ? h 2  r :  ] 0 + (n ? 1) Y0 Yk S[2k] + S02 0

1

1

By symmetry of the normal distribution, the left-hand side of (11) is equivalent to Prf(X k ? X ) ? ( k ?  ) <  ? c; (X k ? X i ) ? ( k ?  i ) < ; 8i :::k? g [ ]

0

[ ]

0

[ ]

32

[ ]

[ ]

[ ]

1

1

8 9 > > < =  ? c  i < r 2 = Pr >Zk < r 2 ; Z ; 8 i :::k? > ] ] 02 [k ] : ; + + N[k] N0 N[k] N[i] 9 8 > > > > = < g g ? h  Pr >Zk < s 2 2 ; Zi < s 2 2 ; 8i :::k? > 0 > [i] [k ] [k ] > ; : S[2k] + S02 S[2k] + S[2i] 1

1

1

1

= PrfZi < Q i; 8i :::k g; 1

the inequality following from (3). Conditional on S ; S ; : : :; Sk , the multivariate normal random vector (Z ; Z ; : : :; Zk ) has mean (0; 0; : : : ; 0) and covariance matrix  with components 8 1 if i = j > > > 2 =N[k] > s 2 2[k] > if i 6= j , i 6= k, j 6= k > < ] ] ] ]  ij = > N[k] N[i] N[k] N[j] 2 > > s 2 ]=N[k] 2  if i 6= j , j = k: > ] 2 ] ] > 0 > : N[k] N[i] N[k] N0 2 0

2 1

2

1

+

+

+

+

2

Since all of the ij 's are positive, Slepian's inequality gives us PrfZi < Q i; 8i :::k g h i = E PrfZi < Q i; 8i :::k jS ; S ; : : : ; Sk g # "Yk    E PrfZi < QijS ; S ; : : :; Sk g i # "Yk = E (Q i) 1

2 0

1

2 0

2 1

2

2 1

2

=1

i=1

0 0 12 1 3k? Z Z 1Z 1 B CC 66 1 BB r CC 77 g g?h r  B f ( y ) dy =     @ @ A4 A 5 f (y )f (yk ) dy dyk (n ? 1) y0 + yk (n ? 1) y + yk 1

0

0

0

1

1

0

0

33

1

1

0

0

8 9 > > < = g ? h g r = Pr >Zk < r ; Z < ; 8 i i :::k? >     : ; (n ? 1) Y0 + Yk (n ? 1) Yi + Yk s ) ( s 1 1 1 1 = Pr Zk (n ? 1) Y + Y + h < g; Zi (n ? 1) Y + Y < g; 8i :::k? 0

1

1

0

0

0

k

1

1

1

0

i

k

1

1

1

= 1? by the way we chose g. This completes the proof.

Case C The key to Case C is the assumption that (X j ; X j ; : : :; Xkj ) have a multivariate normal distribution whose variance-covariance matrix satis es sphericity. A consequence of sphericity is that Var[Xij ? X`j ] = 2 for all i 6= `; i.e., all pairwise di erences across systems within the same replication have common variance. To show that (10) holds, notice that 0

1

2

Prf(Xi ? X ) ? (i ?  )  c; 8i :::k g 9 8 = < (Xi ? X ) ? (i ?  ) c q q ; 8i :::k ; = Pr : S =N S =N 9 8 = < (Xi ? X ) ? (i ?  ) q  h; 8i :::k ;  Pr : S =N 0

0

1

0

0

2

1

2

0

0

1

2

provided that the (common) total number of observations from each alternative is N  (hS=c) = (gS=) , where we use the fact that c = h=g. But 8 9 < (Xi ? X ) ? (i ?  ) = q Pr :  h; 8i :::k ; S =N = PrfT  h; T  h; : : :; Tk  hg (15) 2

2

0

0

2

1

1

2

= 1?

(16) 34

where (15) follows from Theorem 2 of Nelson and Matejcik (1995), and (16) follows from the way we chose h. Therefore, (10) holds. To show that (11) holds, notice that by symmetry of the normal distribution, and the fact that  ? c  0 (since c = h=g and h=g  1), probability statement (11) is equivalent to Prf(X k ? X ) ? ( k ?  ) <  ? c; (X k ? X i ) ? ( k ?  i ) < ; 8i :::k? g 9 8 = < (X k ? X ) ? ( k ?  )  ? c (X k ? X i ) ? ( k ?  i )  q q g ? hg ? (k ? 1) PrfT > gg;

[ ]

[ ]

[ ]

[ ]

[ ][ ]

(22)

the last expression following from Stein (1945). Setting both of the lower bounds from (21) and (22) equal to 1 ? allows us to simultaneously attain (10) and (11).

Acknowledgments This research was partially supported by National Science Foundation Grant numbers DMI9622065 and DMI-9622269, and by Systems Modeling Corporation and Pritsker Corporation. The authors gratefully acknowledge the assistance provided by the Associate Editor and three referees.

References Bechhofer, R. E., and B. W. Turnbull. 1978. Two (k + 1)-decision selection procedures for comparing k normal means with a speci ed standard. Journal of the American Statistical Association 73:385{392. Goldsman, D., B. L. Nelson and B. Schmeiser. 1991. Methods for selecting the best system. In Proceedings of the 1991 Winter Simulation Conference (ed. B. L. Nelson, W. D. Kelton and G. M. Clark), 177{186. Piscataway, New Jersey: Institute of Electrical and Electronics Engineers. Marsaglia, G., and Olkin, I. 1984. Generating correlation matrices. SIAM Journal of Scienti c and Statistical Computing 5:470{475. 37

Matejcik, F. J., and B. L. Nelson. 1995. Two-stage multiple comparisons with the best for computer simulation. Operations Research 43:633{640. Nakayama, M. K. 1995. Selecting the best system in steady-state simulations using batch means. In Proceedings of the 1995 Winter Simulation Conference (ed. C. Alexopoulos, K. Kang, W. R. Lilegdon, and D. M. Goldsman), 362{366. Piscataway, New Jersey: Institute of Electrical and Electronics Engineers. Nelson, B. L. 1993. Robust multiple comparisons under common random numbers. ACM Transactions on Modeling and Computer Simulation 3:225{243. Nelson, B. L., and F. J. Matejcik. 1995. Using common random numbers for indi erencezone selection and multiple comparisons in simulation. Management Science 41:1935{ 1945. Paulson, E. 1952. On the comparison of several experimental categories with a control. Annals of Mathematical Statistics 23:239{246. Stein, C. 1945. A two-sample test for a linear hypothesis whose power is independent of the variance. Annals of Mathematical Statistics 24:669-673. Tong, Y. L. 1980. Probability inequalities in multivariate distributions. New York: Academic Press. Yang, W. and B. L. Nelson. 1991. Using common random numbers and control variates in multiple-comparison procedures. Operations Research 39:583{591.

38

Suggest Documents