Designing a computer experiment that involves switches

Peter Challenor
National Oceanography Centre, Southampton SO14 3ZH, UK
October 30, 2010

Abstract

The use of Gaussian process emulators is now widespread in the analysis of computer experiments. These methods generally assume that all the simulator inputs are continuous. In this paper we consider the design problem for the case where one or more simulator inputs is a switch, a factor that can take the values on or off. We propose two possible designs: one based on Sobol sequences and one on Latin Hypercubes. In both cases a small, but space filling, subset of simulator runs is carried out at both switch settings. This design is then nested within larger space filling designs, one for each of the switch settings. If the switch is found not to affect the results, these two designs can be combined into a much larger design that is also space filling.

1 Introduction: Computer Experiments

Increasingly, science is dependent on experiments that run on computers rather than on the bench or in the field. Examples can be found in engineering, climate science and biology. The codes used in these experiments are often both large and complex. They can be very expensive to run, requiring large amounts of computer time on large computer clusters. The increasing use of such codes has led to the development of statistical methods that analyse such computer experiments. In these methods, we usually treat the computer code (or simulator) as a ‘black box’. It is a function that links a set of inputs, x, to a set of outputs, y:

$$y = f(x)$$

Although in principle f(.) is known everywhere, in practice we only know its value at the inputs where the simulator has been run; everywhere else it is an unknown function. Because the function is unknown over the rest of the input space, we can model it statistically as a random function. The basic statistical tool used for this purpose is the Gaussian process emulator. The Bayesian interpretation of an emulator is that it is an encapsulation of our knowledge about the underlying simulator. A good introduction to emulators and computer experiments is given in O’Hagan (2006). Although we prefer this Bayesian interpretation, a frequentist interpretation is also possible; see for example Santner et al. (2003).

In essence, the simulator is run at a limited number of positions in its input space. A Gaussian process, the basis for the emulator, is then used to link the inputs to the output. The emulator, $\eta(x)$, is composed of two parts:

$$y \simeq \eta(x) = h(x)^T \beta + \sigma$$

The first term is a deterministic regression term. It is modelled as a sum of a number of basis functions (h(.)) with coefficients β. The second term, σ, is a zero mean Gaussian process. As defined here, the emulator will fit the output exactly at the points where the simulator has been run. Sometimes a ‘nugget’, a zero mean independent random variable, is added at every point. Such a term means that the emulator does not pass exactly through the output points.

An important part of any computer experiment is the design of the experiment that is used to produce the training set for the emulator. In practice we often use ad hoc sequential methods to build an emulator, designing each ‘wave’ of simulator runs in the light of our experience with the previous runs. However such adaptive methods almost always start with a large, single, stand-alone design.

Often the simulators involve switches. These are variables that are used to turn on processes within the simulator. In this paper we consider the problem of how to design for a large simulator whose inputs include switches. By switches we mean inputs that are qualitative rather than quantitative but, unlike general factors, we restrict switches to only two levels, {0, 1} or {on, off}. Switches are common in computer simulators, where they are usually used to turn on, or switch between, different processes within the simulator. For example in ocean models, we may have two parameterisations of sub-grid scale mixing that can be selected when we run the model. First we give a brief overview of designing computer experiments in general, and then show how such methods can be extended to include switches.

2 Space Filling Designs

The standard design for a computer experiment is intended to fill space with a minimum number of points (runs) such that the complete parameter space of a simulation is sampled. Such a design is intended to give us a picture of what the simulator is doing across the entire space of input parameters, so we need, if possible, to span the full range of inputs with our ‘training’ set of runs. The traditional design is the Latin hypercube (McKay et al., 1979). In a Latin hypercube design we first decide how many evaluations of the simulator we can afford in total; let this be n. We then divide the range for each input variable into n equal sections. The simulator is evaluated once, and only once, in each of these sections. This means that we have good marginal coverage of each of the variables. The Latin hypercube design is now produced by permuting the combinations of the variables in a random manner. Note that the randomisation is used to produce a design rather than, as in field experiments, to randomise for external factors. While all Latin hypercubes have good marginal properties, this is not necessarily the case for space filling. No one has yet produced an algorithm for an ‘optimal’ Latin hypercube, so in practice we aim to produce ‘good’ designs that optimise some criterion over a large number of candidates. A common choice is a maxi-min criterion, where we choose the design that maximises the minimum Euclidean distance between points in the design (Johnson et al., 1990). Although maxi-min criteria are popular, another possibility is to use Latin hypercubes based on orthogonal arrays (Owen, 1992). Joseph and Hung (2008) propose an algorithm that uses a combination of these two criteria to produce what they call orthogonal-maximin Latin Hypercube designs.
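The construction described above (one run per section of each input, random pairing of sections, then picking the best of many candidates under the maxi-min criterion) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the number of candidates and the random placement of each point inside its section are assumptions made here, not details taken from the paper.

```python
import math
import random


def latin_hypercube(n, d, rng):
    """Random Latin hypercube: n points in [0, 1)^d, one per section per input."""
    columns = []
    for _ in range(d):
        sections = list(range(n))
        rng.shuffle(sections)  # random pairing of sections across inputs
        # assumption: each point is placed uniformly at random inside its section
        columns.append([(k + rng.random()) / n for k in sections])
    return [tuple(col[i] for col in columns) for i in range(n)]


def min_distance(design):
    """Smallest Euclidean distance between any two design points."""
    return min(math.dist(a, b)
               for i, a in enumerate(design)
               for b in design[i + 1:])


def maximin_latin_hypercube(n, d, candidates=200, seed=0):
    """Best of `candidates` random Latin hypercubes under the maxi-min criterion."""
    rng = random.Random(seed)
    pool = [latin_hypercube(n, d, rng) for _ in range(candidates)]
    return max(pool, key=min_distance)


design = maximin_latin_hypercube(n=10, d=2)
```

A random search over candidates is the simplest way to apply the criterion; it does not claim to find an optimal design, only a good one among those generated.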


An alternative to the Latin hypercube is to use a low discrepancy sequence to define the design points. Such sequences were originally devised to efficiently compute multidimensional integrals (Niederreiter, 1992). Discrepancy is a measure of the difference between a uniform distribution over the hypercube of interest and the design. Assume that our design (D) is defined over $[0, 1]^d$. Then one popular discrepancy is

$$\delta^*(D) = \sup_B \left| \frac{I_B(D)}{N} - |B| \right|$$

with

$$B = \left\{ \prod_{i=1}^{d} [0, u_i) : u_i \in (0, 1] \right\}$$

$I_B(D)$ is the number of points of D in B and $|B| = \int_B dx$,

i.e. B is the set of all cubic intervals with one corner at the origin, and thus the discrepancy is the classic Kolmogorov-Smirnov statistic for goodness of fit against the uniform distribution. It is believed that the best possible discrepancy for a sequence of n points is of the form

$$\delta^*(D) \le C_d \frac{(\log n)^d}{n} + O\left(\frac{(\log n)^{d-1}}{n}\right)$$

Low discrepancy sequences are designed to have small $C_d$. Examples include the Halton sequences (Halton, 1960) and Niederreiter nets. For the design of computer experiments the most commonly used low discrepancy sequence is the Sobol sequence (Sobol, 1967; Bratley and Fox, 1988).

To define the Sobol sequence, first assume we are working in only one dimension. Our aim is to generate a sequence $\{x_i\}$, $0 < x_i < 1$, that is space filling and has low discrepancy. First we define a set of direction numbers $v_1, v_2, \ldots$ Each $v_i$ is a binary fraction written as

$$v_i = \frac{m_i}{2^i}$$

where $m_i$ is an odd integer, $0 < m_i < 2^i$. To generate the $v_i$ we use what are known as primitive polynomials with coefficients either 0 or 1. A primitive polynomial has coefficients in $\mathbb{Z}_2$ and, in particular, cannot be factorised as a product of polynomials of lower degree:

$$P = x^d + \alpha_1 x^{d-1} + \alpha_2 x^{d-2} + \ldots + \alpha_{d-1} x + 1$$

The choice of this polynomial is arbitrary as long as it is primitive. Once we have chosen a polynomial, we can define a recurrence relationship for calculating the $m_i$, and hence the $v_i$ (Bratley and Fox, 1988):

$$m_i = 2\alpha_1 m_{i-1} \oplus 2^2 \alpha_2 m_{i-2} \oplus \ldots \oplus 2^{d-1} \alpha_{d-1} m_{i-d+1} \oplus 2^d m_{i-d} \oplus m_{i-d}, \quad i > d$$

where ⊕ is the bit-wise exclusive ‘or’ function. The 1-d Sobol sequence is then defined by

$$x_n = b_1 v_1 \oplus b_2 v_2 \oplus \ldots$$

where $b_1, b_2, \ldots$ are the bits in the binary expansion of n and ⊕ is the bit-wise exclusive-or (xor) operator. This is the classical definition of a Sobol sequence. For multidimensional sequences we use a different primitive polynomial for each dimension and generate each component of the sequence separately. For further details on the generation of Sobol sequences see Bratley and Fox (1988). One consequence of the definition is that (regardless of dimension) (i) the first $2^n - 1$ points are a Latin Hypercube and (ii) intermediate sequences $2^n, \ldots, 2^m - 1$ are also Latin Hypercubes. This is an immediate consequence of the properties of the bits $b_1, \ldots$ (Maruri, 2010).
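As a hedged illustration of the one-dimensional construction, the sketch below generates direction numbers and points in Python. It follows the recurrence in the form given by Bratley and Fox (1988), which includes a $2^d m_{i-d}$ term, and it works with integers scaled by $2^{\text{bits}}$ so that the xor can be applied directly. The function names, the `bits` parameter and the choice of the polynomial $x^3 + x + 1$ with initial values $m_1, m_2, m_3 = 1, 3, 7$ are illustrative assumptions, not choices made in the paper.

```python
def direction_integers(alphas, m_init, count):
    """Return m_1..m_count for the polynomial x^d + a_1 x^(d-1) + ... + a_(d-1) x + 1.

    alphas = [a_1, ..., a_(d-1)]; m_init = [m_1, ..., m_d] (odd, m_i < 2^i).
    """
    d = len(m_init)
    if len(alphas) != d - 1:
        raise ValueError("need d - 1 coefficients for d initial values")
    m = list(m_init)  # m[j] holds m_(j+1)
    for i in range(d, count):
        new = m[i - d] ^ (m[i - d] << d)   # m_(i-d) xor 2^d m_(i-d)
        for k in range(1, d):
            if alphas[k - 1]:
                new ^= m[i - k] << k       # 2^k a_k m_(i-k)
        m.append(new)
    return m[:count]


def sobol_1d(n_points, alphas, m_init, bits=16):
    """First n_points of the 1-d sequence x_n = b_1 v_1 xor b_2 v_2 xor ..."""
    if n_points > 2 ** bits:
        raise ValueError("increase bits to generate this many points")
    m = direction_integers(alphas, m_init, bits)
    # v_i = m_i / 2^i, stored as the integer v_i * 2^bits
    v = [m[i] << (bits - 1 - i) for i in range(bits)]
    points = []
    for n in range(n_points):      # n = 0 gives the point 0, often skipped
        acc, i, k = 0, 0, n
        while k:
            if k & 1:              # bit b_(i+1) of n is set
                acc ^= v[i]
            k >>= 1
            i += 1
        points.append(acc / 2 ** bits)
    return points


points = sobol_1d(8, alphas=[0, 1], m_init=[1, 3, 7])
```

With these assumed inputs the first eight values visit each of the eight equal sections of the unit interval exactly once, which is the one-dimensional version of the Latin Hypercube property noted in the text.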

3 Designing for switches

The designs described in the last section cover the case where all the input variables are continuous. Often, however, we have input variables that are factors, i.e. that can only take a limited number of values. If a factor can only take 2 values (on and off) we refer to it as a switch. An example from climate science might be a variable that controls which cloud scheme is used in a model of the earth’s atmosphere. Switches (and factors) cause problems for the standard statistical analysis of computer codes. The Gaussian process emulator theory assumes that all inputs are continuous variables. Qian et al. (2008) and Han et al. (2009) have both put forward extensions to the Gaussian process emulator that can include qualitative factors, but neither is widely used. Current practice is to build separate emulators for each value of the switch or factor. As the number of switches becomes large this very quickly becomes unwieldy. Four separate switches in a model would necessitate the creation of $2^4$ (16) emulators, each requiring the same number of runs of the simulator.

Switches also introduce some interesting design problems. Because we may need to build separate emulators for each setting of the switch, we need the design for each setting to span the full input space. An obvious solution therefore would be to repeat the same design for each switch setting. However we may find, as we do our analysis, that the switch actually has no effect on the output we are considering. In such a case any duplicate points will be wasted, and in computer experiments simulator evaluations are usually expensive. We want to minimise the number of evaluations. Thus we are looking for designs that produce a number of space filling designs (one for each switch setting), each having a small, but also space filling, design embedded within it.
The simplest such design would be to use a standard space filling design such as those described above, including switches and factors as additional input variables and discretising those dimensions. The sliced space-filling design of Qian and Wu (2009) is a more sophisticated version of this idea. Such a design has a number of attractions. It can be carried out with the same software as used when we don’t have switches and gives good coverage of the full input space. However there are a number of disadvantages. First, the sub-designs for a single switch setting are no longer space filling, as splitting a space filling design in two does not, in general, produce a space filling design, which may introduce problems. Second, we have no points where all input variables are held constant for both settings of the switch. The latter is important because one question we need to answer is whether the effect of the switch on the output of the simulator is significant enough that it needs to be included in our emulator and consequent statistical inference. A design which pushes the points as far apart as possible between switch settings or factor levels would make this difficult. We would have no points at both switch settings to compare directly. In fact, if the design is working properly we will not even have points that are close. Our tests of whether the switch is needed will rely on the emulator we build. In some cases this is appropriate. If we have a strong prior opinion that the switch is a vital part of the simulator then we do not need to test whether it is needed and it should be included in the general space filling design. However in many cases it is not known if the switches are needed, and it is worth expending some effort to see if we can reduce complexity by removing the switch from the emulator.

These two suggested designs, a repeated design at each switch setting and treating the switch in the same way as a quantitative input, can be thought of as being at opposite ends of a spectrum. The first gives us maximum information for the intercomparison of switch or factor levels, whereas the latter gives us the maximum spread of points regardless of the switch level (although, as we have seen, the sub-designs at each factor level are not themselves necessarily space filling). We now propose a design that allows us both to explore input space efficiently and to test whether switches are important input variables that need to be included in our emulators. To reiterate our requirements for a design that involves factors or switches: we want a design that is space filling for each switch setting but has a common set of runs to allow us to test whether it is necessary to include the switch in any emulator we build. The extension from switches to multi-level factors is usually obvious in what follows, so we treat the simpler two level designs for clarity. When the extension to multi-level factors isn’t obvious we will point this out.

4 The Switched Sobol Design

We now propose a design based on a Sobol sequence. The basic design is a Sobol sequence of size $n = 2^m - 1$ and dimension d, where d is the number of quantitative variables in the simulation. Assume initially that we have a single switch. We now allocate the first $2^{m-1} - 1$ points to the first switch setting and the remainder to the second. This design gives us two Latin Hypercubes that are also low discrepancy sequences and that, combined, have the same properties. They are not quite the same size, as the second part is one point larger. For normal size designs with at least 10 points per dimension (Loeppky et al., 2009) a difference of one point between the two halves of the design is irrelevant.

This design has nice properties but does not allow us to compare the simulator output directly at both switch settings. We will have to use our emulator to do such comparisons, which makes them subject to the uncertainty in the emulation process. Our proposed solution to this problem is to repeat the first $2^p - 1$ ($p < m$) simulator runs at the second switch setting. This gives us $2^{m-1} - 1$ points at the first switch setting and $2^{m-1} + (2^p - 1)$ at the second setting. These additional $2^p - 1$ runs are the cost we have to pay for the ability to directly compare the two switch settings. If we have strong a priori reasons for believing that the switch is important, for example from previous experiments, then we may choose not to incur this expense and only run a small number of common simulations to confirm our beliefs. On the other hand, we may have strong beliefs that the switch does not affect the output of the simulator. In this case, we might want a small p so we can validate our beliefs but at minimal cost. A larger p would be appropriate where we wanted to verify that the relationship between the simulator output at the different switch levels was constant across input space. Consider an example.
For simplicity we will assume that there are only two quantitative inputs or parameters. The extension to any number of inputs is simple, and with only two quantitative inputs we can easily plot the design. For our first example, we have only a single switch. Our base Sobol sequence has 255 members, that is we set m to be 8. The joint part of the design has 15 members, i.e. p is set to be 4. As this is an artificial example we could set m and p to be what we like, but these are the sort of values we might use in practice. Figure 1 shows the design. It is clear that the design gives good coverage for both switch settings regardless of whether the switch settings give significantly different results or not.

This shows the situation for a single switch. In some simulators there are a number of switches, making the design a little more complex. We still keep the joint runs for all switches and all switch settings, but we now have to partition the non-common runs in a different way. Our proposal is to split each of the switch settings for switch 1 into two and allocate these to the on/off states of switch 2. For a third switch we continue to divide each set allocated to the previous switch setting in half and allocate each half to on or off for switch 3. This is shown in figure 2. The extension to additional switches is obvious. We maintain the common set for every switch setting and continue to sub-divide the rest of the design in half in a similar way. Although this is satisfactory, if not ideal, for a small number of switches, as the number increases the method breaks down. If our total Sobol sequence is of length $2^m$ then we cannot have m or more switches.
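The single-switch allocation used in the example of this section can be written down as a small bookkeeping sketch in Python. It only assigns indices of the Sobol sequence to switch settings; the points themselves would come from whatever Sobol generator is in use. The function name and the 1-based indexing (so that the sequence has $2^m - 1$ members) are assumptions made for illustration. Since the first setting takes the first $2^{m-1} - 1$ points, the remainder has $2^{m-1}$ points, and the second setting ends up with $2^{m-1} + (2^p - 1)$ runs.

```python
def switched_sobol_allocation(m, p):
    """Assign Sobol point indices 1..2^m - 1 to the two settings of one switch.

    The first 2^(m-1) - 1 points go to 'off', the remainder to 'on', and the
    first 2^p - 1 points (the common design) are repeated at 'on'.
    """
    if not 0 < p < m:
        raise ValueError("need 0 < p < m")
    n = 2 ** m - 1
    first_half = 2 ** (m - 1) - 1
    common = list(range(1, 2 ** p))              # indices 1 .. 2^p - 1
    off = list(range(1, first_half + 1))         # includes the common points
    on_only = list(range(first_half + 1, n + 1))
    return {"off": off, "on": on_only + common, "common": common}


runs = switched_sobol_allocation(m=8, p=4)
```

For m = 8 and p = 4 this gives 127 runs with the switch off and 143 with it on, 270 simulator runs in total, of which 15 input points are run at both settings.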

5 A Latin Hypercube Based Design

As an alternative to a design based on Sobol sequences we could build a similar system using Latin Hypercubes. The nested Latin Hypercube described in Qian (2009) addresses a similar problem. These designs have been proposed for nested simulators where we have a slow complex simulator and a faster, less complex one (Kennedy and O’Hagan, 2000). Here the first t points in a design are used to produce a conventional Latin Hypercube. The next n − t points are then sampled from a nested permutation described in Qian (2009). This ensures that the entire set of n points is also a Latin Hypercube. Thus we have a small Latin Hypercube nested within a larger one. Following our Sobol method, we do not want a single nested design. We want 2 large designs with a single common small design nested within them. This means that we need to make a very simple extension to the original nested Latin Hypercube design from Qian (2009). In the second stage, instead of taking a single sample, we produce 2 such nested designs (for a single switch). To produce a nested Latin Hypercube design, the number of points in the large design has to be an integer multiple of the number in the small nested hypercube. Thus for comparison with our Sobol design we take 16 points for the nested Latin Hypercube, which is repeated for each switch setting, and 128 for each of the larger Latin Hypercubes. Thus we have a total of 256+16 points in our total design in contrast to the 255+15 for the Sobol. An example of the resulting design for the same parameters is shown in figure 3. A disadvantage of this design is that, although the design points for the switch off and for the switch on are each space filling, the joint design that we would use if we discover that the switch is not important is not. It is the union of two nested Latin Hypercubes, so it will have good properties, but we have not chosen these two designs to work well as a single design.
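The idea of one small Latin Hypercube shared by two larger ones can be illustrated with the following Python sketch. It is written in the spirit of the nested construction and is not a transcription of the algorithm in Qian (2009), whose details may differ: the small design takes exactly one of the n = ts fine levels from each block of s consecutive levels, and each switch setting then fills in the remaining levels with its own random permutation. All names and the jittering of points within cells are assumptions.

```python
import math
import random


def nested_levels(t, s, rng):
    """Levels in 1..t*s for the small design: one level in each block of s."""
    blocks = list(range(t))
    rng.shuffle(blocks)
    return [b * s + rng.randint(1, s) for b in blocks]


def remaining_levels(used, n, rng):
    """The levels in 1..n not taken by the small design, in random order."""
    taken = set(used)
    rest = [v for v in range(1, n + 1) if v not in taken]
    rng.shuffle(rest)
    return rest


def jitter(levels_by_dim, n, rng):
    """Turn integer levels (one list per input) into points in (0, 1]^d."""
    count = len(levels_by_dim[0])
    return [tuple((lv[i] - rng.random()) / n for lv in levels_by_dim)
            for i in range(count)]


def switched_nested_lhd(t, s, d, seed=0):
    """Common t-point design nested in a (t*s)-point design per switch setting."""
    rng = random.Random(seed)
    n = t * s
    common_levels = [nested_levels(t, s, rng) for _ in range(d)]
    common = jitter(common_levels, n, rng)
    designs = {}
    for setting in ("off", "on"):
        extra = [remaining_levels(lv, n, rng) for lv in common_levels]
        designs[setting] = common + jitter(extra, n, rng)
    return common, designs


def occupied_cells(points, j, bins):
    """Sorted cell indices (1..bins) occupied along input j."""
    return sorted(math.ceil(p[j] * bins) for p in points)


common, designs = switched_nested_lhd(t=16, s=8, d=2)
```

Here each 128-point design contains the 16 common points, so the common points are a Latin Hypercube on a 16-section grid and each switch setting is a Latin Hypercube on a 128-section grid. Nothing in the sketch tries to make the union of the two settings space filling, which matches the disadvantage noted above.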
Looking at figures 1 and 3 it will be seen that in general the Sobol based design is better at space filling than the nested Latin Hypercube. However it should be noted that we have not tried to optimise the Latin Hypercubes by producing maximin or orthogonal designs. Such an extension to the nested Latin Hypercube would not be difficult and would improve its space filling properties. We could of course also improve our Sobol sequences by scrambling them. An advantage of the nested Latin Hypercube is that it is much more adaptable. With the Sobol design we are restricted to powers of 2 for our numbers of points. The only restriction for the nested Latin Hypercube is that the total number of runs of the simulator is $n = t(2N - 1)$, where N is a positive integer. Another advantage of the nested Latin Hypercube is that it is relatively easy to extend to multiple level factors. We simply add an additional nested permutation (the points that make a Latin Hypercube when combined with the nested Latin Hypercube) for each level of the factor. Two or more switches are more difficult to achieve in this framework.

[Figure 1 image: four scatter-plot panels, (a) Switch Off, (b) Switch On, (c) Switch On (and Off), (d) Total Design, each with axes x and y running from 0 to 1.]

Figure 1: An example of a switched Sobol design for 2 variables and a single switch. The total design (d) has 255 points and the common design where we run the simulator for both switch settings (c) has 15. (a) shows the design points for the switch off, (b) shows them for the switch on. In addition the points in (c) are also run for the switch on. The total design is shown in (d), switch off points with a ◦ and switch on points with a +.

[Figure 2 image: tree diagram. Top row: All switch combinations. Switch 1: Off, On. Switch 2: Off, On, Off, On. Switch 3: Off, On, Off, On, Off, On, Off, On.]

Figure 2: A diagram to show how 2nd and 3rd switch settings are matched to the Sobol sequence. The Sobol sequence points run from the first point in the sequence on the left to point $2^m - 1$ on the right. The top line shows the points which are run at both levels for all three switch settings.

[Figure 3 image: four scatter-plot panels, (a) Switch Off, (b) Switch On, (c) Switch On (and Off), (d) Total Design, each with axes x and y running from 0 to 1.]

Figure 3: An example of a nested Latin Hypercube design for 2 variables and a single switch. The total design (d) has 256 points and the common design where we run the simulator for both switch settings (c) has 16. (a) shows the design points for the switch off, (b) shows them for the switch on. In addition the points in (c) are also run for the switch on. The total design is shown in (d), switch off points with a ◦ and switch on points with a +.

6 Conclusions

In this paper we have proposed two designs that give good space filling properties for computer experiments that involve switches. The first proposed design includes a small common design that is both a Sobol sequence and a Latin Hypercube, and larger designs that are always from a Sobol sequence and, for a single switch, have the Latin Hypercube property as well. As the number of switches increases, the efficacy of the design decreases. If the number of switches becomes larger than m, where the size of the design for each switch setting is $2^{m-1}$, then the design becomes impossible as we cannot divide up the simulator runs between the switch settings in a unique way. The second design is based on a nested Latin Hypercube (Qian, 2009). It is also based on a common design with more extensive designs for each switch setting. These are based on Latin Hypercubes rather than on Sobol sequences. In practice these may prove more adaptable but do not have such good space filling properties as the Sobol design, particularly if it is found that the switch is not important and the designs are collapsed into a single design. For a small number of switches, ideally only one, the methods proposed give a design that both allows us to check easily whether the switches make enough difference to the simulator output to need including in the emulator, and gives a good space filling design for the quantitative variables. For both designs we need to decide how many points we should put in the nested design. This number needs to be large enough to allow us to test the switch across the full range of the other inputs, but we do not want to waste runs unnecessarily. For continuous variables ten is recommended as the minimum number of runs per variable (Loeppky et al., 2009). We can therefore assume that 8 or 16 would be the minimum number of runs needed for a switch with the Sobol design, and similar numbers for the nested Latin Hypercube.
In most cases eight would probably suffice, but if the switch were extremely important more would be required. If it is decided a priori that the switch variables are vital to the working of the simulator then treating them in a similar way to the other inputs, as in Qian and Wu (2009), would probably be a better strategy.

7 Acknowledgements

This paper was in part funded by the NERC RAPID-WATCH project RAPIT and the Oceans2025 programme. The Sobol sequences were calculated using the package ‘pomp’ in the R language (R Development Core Team, 2010). I would like to thank my colleagues Hugo Maruri, Yiannis Andrianakis and Robin Tokmakian for discussions during the writing of this paper and an anonymous reviewer for their comments.

References

Bratley, P. and B. Fox (1988). Algorithm 659: implementing Sobol’s quasirandom sequence generator. ACM Transactions on Mathematical Software (TOMS) 14(1), 88–100.
Halton, J. (1960). On the efficiency of certain quasi-random sequences of points in evaluating multidimensional integrals. Numerische Mathematik 2, 84–90.
Han, G., T. J. Santner, W. I. Notz, and D. L. Bartel (2009). Prediction for computer experiments having quantitative and qualitative input variables. Technometrics 51(3), 278–288.
Johnson, M., L. Moore, and D. Ylvisaker (1990). Minimax and maximin distance designs. Journal of Statistical Planning and Inference 26(2), 131–148.
Joseph, V. R. and Y. Hung (2008). Orthogonal-maximin Latin hypercube designs. Stat Sinica 18(1), 171–186.
Kennedy, M. and A. O’Hagan (2000). Predicting the output from a complex computer code when fast approximations are available. Biometrika 87(1), 1–13.
Loeppky, J. L., J. Sacks, and W. J. Welch (2009). Choosing the sample size of a computer experiment: A practical guide. Technometrics 51(4), 366–376.
Maruri, H. (2010). Personal communication.
McKay, M., R. Beckman, and W. Conover (1979). A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21(2), 239–245.
Niederreiter, H. (1992). Random number generation and quasi-Monte Carlo methods. In CBMS-NSF Regional Conference Series in Applied Mathematics, Volume 63, Philadelphia. SIAM.
O’Hagan, A. (2006). Bayesian analysis of computer code output: A tutorial. Reliability Engineering and System Safety 91, 1290–1300.
Owen, A. (1992). Orthogonal arrays for computer experiments, integration and visualization. Stat Sinica 2(2), 439–452.
Qian, P. Z. G. (2009). Nested Latin hypercube designs. Biometrika 96(4), 957–970.
Qian, P. Z. G. and C. F. J. Wu (2009). Sliced space-filling designs. Biometrika 96(4), 945–956.
Qian, P. Z. G., H. Wu, and C. F. J. Wu (2008). Gaussian process models for computer experiments with qualitative and quantitative factors. Technometrics 50(3), 383–396.
R Development Core Team (2010). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.
Santner, T. J., B. J. Williams, and W. Notz (2003). The design and analysis of computer experiments. Springer.
Sobol, L. (1967). On the distribution of points in a cube and the approximate evaluation of integrals. USSR Comput. Math. and Math. Phys. 7, 86–112.
