Comput Math Organ Theory (2012) 18:22–62 DOI 10.1007/s10588-011-9097-3

Opening the ‘black box’ of simulations: increased transparency and effective communication through the systematic design of experiments

Iris Lorscheid · Bernd-Oliver Heine · Matthias Meyer

Published online: 27 October 2011 © Springer Science+Business Media, LLC 2011

Abstract  Many still view simulation models as a black box. This paper argues that perceptions could change if the systematic design of experiments (DOE) for simulation research was fully realized. DOE can increase (1) the transparency of simulation model behavior and (2) the effectiveness of reporting simulation results. Based on DOE principles, we develop a systematic procedure to guide the analysis of simulation models as well as concrete templates for sharing the results. A simulation model investigating the performance of learning algorithms in an economic mechanism design context illustrates our suggestions. Overall, the proposed systematic procedure for applying DOE principles complements current initiatives for a more standardized simulation research process.

Keywords  Communication · Design of experiments · Simulation · Standards · Transparency

I. Lorscheid · B.-O. Heine · M. Meyer
Institute of Management Control and Accounting, Hamburg University of Technology, Hamburg, Germany
e-mail: [email protected]
B.-O. Heine, e-mail: [email protected]
M. Meyer, e-mail: [email protected]

1 Introduction

Simulation has become increasingly established as a research method in the social sciences over the past years. Nevertheless, the extent of acceptance varies by field, and some disciplines still hesitate to use simulation extensively. This is also reflected in a relatively low number of publications in top journals, for example in economics or sociology (Harrison et al. 2007; Richiardi et al. 2006; Reiss 2011). At least two possible reasons explain this situation. First, many may still perceive simulation models as a black box because simulation models and the analyses of their behavior are often not described exhaustively. Leaving out details is frequently deemed necessary in order to stay within the space limits of publication outlets or to avoid overloading the audience (Axelrod 1997; Grimm et al. 2006). The challenge of reporting the analysis process is aggravated by the fact that non-experts in simulation are often the addressees when a specific discipline is targeted. Second, the reluctance to accept simulation might also stem from the difficulty of communicating simulation results (Axelrod 1997; Richiardi et al. 2006). The complexity typical of simulation data makes it difficult to be conclusive without providing too many details and distracting from the main findings. An interdisciplinary audience can further hamper the effective and efficient reporting of results, as it requires more extensive explanations.

One way to address these issues is standardization. Axelrod (1997) is a prominent advocate of creating standards. In particular, he sees two challenges for advancing the method further: progress in simulation model analysis and in result sharing, both ideally incorporated in standards for the field. To illustrate the feasibility, he cites empirical research as a model for success. In this area, many standardized procedures have been developed for analyzing and presenting empirical data, such as a clear understanding of the meaning of “p < 0.05” shared by all recipients of empirical research. For Axelrod, establishing similar standards for simulations is clearly “needed for the field to become mature so that its potential contribution to the social sciences can be realized” (Axelrod 1997:16).[1]

This paper proposes just that: a systematic and ideally standardized procedure for simulation research[2] to address these two important challenges. It argues that the potential of a systematic design of experiments (DOE) has not yet been fully realized, despite its ability to systematically analyze the behavior of simulation models and present the results. DOE can be drawn on to address both of the challenges described above: increasing (1) the transparency of simulation model behavior and (2) the effectiveness of reporting simulation results. Overall, we propose a systematic procedure for applying DOE principles that hopefully contributes to the current initiatives for a more standardized simulation research process (Grimm et al. 2006; Richiardi et al. 2006). The ultimate goal of all such endeavors is to establish simulation as an accepted research method.

This paper is organized as follows: First, we review the extant literature with respect to DOE, simulation and standards. Then, we develop the suggested systematic procedure for simulation model analysis based on DOE principles. This also includes a section in which we critically assess the procedure, especially with respect to possible pitfalls. In the final section we discuss the main results and give a brief outlook on future research.

[1] An example of successful standard implementation is empirical marketing research. The discipline's flagship journal, the "Journal of Marketing", has experienced a rise in the share of empirical papers from 62% in 1992 to 86% in 2002. Homburg and Klarmann (2003) argue that implementing standards in empirical research substantially contributed to this development.
[2] Given the background of the authors, we have a typical research project in social simulation with an agent-based model in mind. Still, the basic ideas should also be applicable to other simulation approaches such as Monte Carlo simulation.

2 Literature review

To contextualize the suggested procedure, three main literature streams will be discussed. First, we present the essentials of DOE as the theoretical basis. Second, we discuss the extant literature on the transfer of DOE principles to simulation experiments. Lastly, we relate our call for a systematic DOE to existing initiatives for the standardization of simulation research.

A good starting point for characterizing DOE is the definition provided by Antony (2003).[3] He defines DOE as "the process of planning, designing and analyzing the experiment so that valid and objective conclusions can be drawn effectively and efficiently" (Antony 2003:7). Two major aspects of this definition should be stressed: First, the design ensures the validity and reliability of the conclusions (effectiveness), e.g. by controlling for possible biases. Second, DOE provides solutions for coping with the experimenter's resource constraints (efficiency), e.g. by limiting the number of experiments to the most important combinations of factor levels.

DOE emerged as a response to common criticisms of experiments in agriculture and biology (Wu and Hamada 2000) and as a sub-discipline of mathematical statistics (Kleijnen 1998). The statistician Fisher (1971) argued that typically either the logical structure of an experiment or the interpretation of its results is criticized. This inspired his pioneering work on DOE. While the interpretation simply required the appropriate application of existing statistical methods, the corresponding principles for a proper logical structure, i.e. the design of the experiment, had not yet been developed to the same extent. At that time, agricultural and biological experiments were commonly geared towards comparing available treatments, e.g. to see whether barley varieties differ in yield and drought resistance and, if so, to what extent (Wu and Hamada 2000). These experiments were typically large in scale, very time consuming and exposed to uncontrollable conditions such as weather. To cope with these challenges, the basic principles of DOE were developed by Fisher (1971) and subsequent contributors between 1935 and 1945. Among the most important principles are replication, blocking, randomization, orthogonality, analysis of variance (ANOVA) and fractional factorial designs (Wu and Hamada 2000).

After World War II, the focus shifted from treatment comparison to process modeling and optimization in industrial settings (Wu and Hamada 2000). The new setting allows for faster experiments and emphasizes cost consciousness. It led to a methodology that increasingly used regression analysis in order to determine optimal designs for the specific experimental conditions. Subsequently, the reduction of variation between processes was highlighted, driven for the most part by quality considerations in manufacturing (Wu and Hamada 2000).

[3] There is a vast body of literature covering basic DOE as a major part of experimental methodology, e.g. Fisher (1971), Wu and Hamada (2000), Box et al. (2005) and Montgomery (2009).

This short historical perspective already offers some insight into the classes of problems that can be tackled by DOE. Wu and Hamada (2000) summarize the basic issues that can be addressed by DOE as follows:

• Treatment Comparison: Comparing treatments and identifying the best one(s).
• Variable Screening: Identifying the most important variables among a large number of candidates.
• Response Surface Exploration: Exploring the effect of a few important variables in detail.
• System Optimization: Identifying a variable combination that optimizes the response, typically by means of a sequential strategy to narrow down areas of potential optimality.
• Increasing System Robustness: Choosing suitable levels of control factors to limit the impact of noise factors.[4]

The listed issues represent typical questions also addressed by simulation research. The potential of DOE for simulation research in general stems from the fact that most simulations require experimental techniques (explicitly or implicitly), as many parameters and structural assumptions could have a significant impact on the performance measure (Law 2007). Hence, experiments are considered an important element in the simulation research process (Harrison et al. 2007; Taber and Timpone 1996).[5]

The major monograph with respect to DOE for simulation research was published by Kleijnen (2008), with its predecessors dating back to 1974/75.[6] Four important differences between field/laboratory and simulation experiments form the starting point for applying DOE principles to simulation research (Kleijnen 1998):

1. Simulation models tend to comprise more factors to be analyzed.
2. More control over noise is granted due to the common use of (pseudo) random numbers. Apart from the effect of (pseudo) random numbers, replications by the same experimenter are unnecessary (Wu and Hamada 2000).
3. Blocking as a technique to reduce systematic differences between experimental units (within a block the units are homogeneous) is not needed. Differences between experimental units are modeled only if they are part of the variables under investigation.
4. Randomization as a coping mechanism to address systematic differences between experimental units is not necessary, even when blocking is not possible or desirable.

Furthermore, the costs of most simulation experiments are typically substantially lower than those of field or laboratory settings.

[4] Control factors are variables whose values can be chosen by the experimenter and remain constant afterwards. Noise factors are to some extent uncontrollable in their variation.
[5] Peck (2004) suggests viewing simulation as just another kind of experimental system and argues that it should therefore be analyzed as such. Reiss (2011) supports this statement and argues that computer simulations can help economics as an experimental science.
[6] For a primer on global sensitivity analysis see Saltelli et al. (2000) and Saltelli et al. (2004).

Marginal costs for additional experiments may even approach zero, which mitigates a major theme of standard DOE: the objective of reducing the number of design points (factor level combinations) within the experimental design. The decreased need for fractional factorial designs is only one advantage. Simulation experiments are also very likely to actually reveal all existing effects of the variables under investigation (given a proper DOE), because even small effects can be detected thanks to the possibility of numerous simulation runs. Therefore, existing effects in a simulation model usually turn out to be statistically significant. This cannot be achieved as easily in field or laboratory settings, given the higher costs of additional experiments.

Based on these characteristics, Kleijnen (2008) elaborates in detail how DOE principles can be applied to simulations (see also Barton 2004; Kelton and Barton 2003; Kleijnen 1998; Kleijnen et al. 2005; Sanchez 2006). The viability of his idea is confirmed by the fact that the basic principles have been incorporated into standard textbooks (e.g. Law 2007)[7] and that general DOE texts now include sections on the application to simulations (e.g. Siebertz et al. 2010).

One might expect a wide application of DOE principles in simulation, given the described similarities between experimental research and simulation research, the possible benefits of DOE and the extant work on the application of DOE principles in simulation. Unfortunately, the current implementation level of DOE in many areas of simulation research stands in stark contrast to its potential.[8] Kleijnen et al. (2005:263), for example, register that "DOE is not used as widely or effectively in the practice of simulation as it should be". Richiardi et al. (2006:1.5) go even further for agent-based simulations and claim that "many articles seem to ignore the basics of experimental design". Simulation models are often analyzed without any reference to experimental methods, omit sensitivity analysis or adopt a 'one-factor-at-a-time' experimental approach (resembling 'ceteris paribus' statements) (Wu and Hamada 2000). The latter kind of analysis in particular has serious limitations. It cannot identify or analyze interaction effects between variables, which unnecessarily restricts the generalizability of conclusions, and it can miss optimal factor settings (Wu and Hamada 2000). It seems that the overall potential of DOE for simulation research has not been realized so far.

This paper's claim, however, is more specific with respect to the possible benefits of a systematic application of DOE principles. We argue that DOE provides a particularly promising means to address the two challenges described in the introduction: (1) the perceived low transparency of simulation model behavior and (2) the effective communication of simulation results.

The first challenge stems from the fact that simulation models often become very complex very fast. This makes them difficult to understand for users and designers alike. Important drivers of this complexity are the number of variables, potential interactions between these variables[9] and possible non-linear effects (Gilbert and Troitzsch 2005).

[7] Further DOE challenges specific to simulation research are still being worked out (Kleijnen et al. 2005; Oh et al. 2009).
[8] Of course, there are also positive exceptions. See e.g. Raghu et al. (2003).
[9] Therefore, one might expect that many simulation models have undetected interaction effects.

Characteristics like the potential sensitivity of simulation runs to initial conditions and stochastic elements further impede an adequate understanding of these models (Axelrod 1997). The described effectiveness and efficiency of DOE are helpful for improved variable screening and can therefore increase the transparency of the behavior of (complex) simulation models. DOE can be used to systematically analyze and quantify the effects of a large number of variables and to detect possible interaction effects, which easily go unnoticed by traditional sensitivity analyses. Moreover, techniques like the estimation of experimental error variance can deal with differences between simulation runs caused by variations in initial conditions and stochastic elements.[10]

The second challenge, effective result communication, is rooted in several factors. The potentially high amount and complexity of simulation data certainly adds to the reporting problem. The question is how to effectively reduce the complexity when presenting the main findings. An interdisciplinary audience can make reporting difficult as well, because it means that at least part of the audience is confronted with unfamiliar concepts and techniques (Axelrod 1997). Thus, common concepts and techniques for the presentation of results would foster effective and efficient communication (Axelrod 1997). DOE represents a promising avenue to meet this demand as well.[11] First of all, standardized approaches have already been developed within DOE to tackle a wide range of problems (Wu and Hamada 2000, see above). They include typical formats for reporting the results of experiments investigating such problems. As the structure of most research questions in simulation research resembles the previously outlined problem classes (Kleijnen et al. 2005), the existing DOE formats could offer a viable solution for reporting the results of simulation experiments. Furthermore, the results are mainly expressed as relations between variables and in statistical terms. They could provide a "universal" language that is accessible to an interdisciplinary audience, because it is also the predominant language of most reports on empirical research (with the exception of qualitative research). Thus, researchers working empirically in these fields should be encouraged to realize the inherent possibility for exchanging ideas and collaborating that emerges from a shared language.

At the moment, well-established practices to address these problems do not exist in simulation research. The fact that DOE so far only forms a loose collection of techniques might add to the shortcomings in applying them to simulation research. In the following section, we therefore develop a systematic procedure to effectively integrate DOE principles into the simulation research process. It uses DOE principles in combination with concrete reporting templates for a systematic simulation model analysis and the focused reporting of results. This systematic procedure aims at fostering the transparency and communicability of simulation model behavior and its results. In our opinion, it can be a first step toward developing a collection of "concepts that can become common knowledge and then be communicated briefly" as requested by Axelrod (1997:19).

[10] By means of DOE, one can address these issues at the abstract level of general simulation behavior. At the same time, it might also be the model's explicit purpose to study concrete histories of simulation runs (Axelrod 1997; Malerba et al. 1999).
[11] Again, the problem of describing detailed histories cannot be tackled via DOE. For a discussion of this issue see Axelrod (1997).


Our standardized procedure for applying DOE principles to simulation research offers another advantage as well: it answers the calls for an increased standardization of procedures in simulation research in general (Richiardi et al. 2006). Grimm et al. (2006), Richiardi et al. (2006) and Manuj et al. (2009) have already suggested standard procedures for simulation research to meet this objective.[12] The ODD protocol by Grimm et al. (2006) and Grimm et al. (2010) focuses on documenting and describing the simulation model,[13] which are important process steps prior to the actual simulation experiments. Richiardi et al. (2006) and Manuj et al. (2009) consider the entire simulation research process in their suggestions. We complement these existing approaches by focusing on two main aspects of the research process: the systematic analysis of simulation model behavior and the effective reporting of results.

In addition, Richiardi et al. (2006) formulate clear demands for the generation of simulation results (investigation) as well as for sensitivity analysis and validation. They also explicitly note the insufficient application of DOE principles (see above), but do not elaborate on how to overcome this shortcoming. It will be shown in more detail that the suggested procedure based on DOE principles can directly answer the requests made by Richiardi et al. (2006). Systematic DOE offers clear guidance based on an established body of theory. Consequently, proposing a systematic procedure for applying DOE principles to simulation can contribute to the initiatives for standardizing the research process that are already underway.

[12] Richiardi et al. (2006) focus on agent-based simulation and Manuj et al. (2009) on discrete-event simulation. Corresponding initiatives exist in the area of ecological simulation modeling, see Schmolke et al. (2010).
[13] ODD stands for Overview, Design and Details. For examples see Polhill et al. (2008).

3 Model analysis and reporting based on DOE principles

This section develops a systematic procedure to analyze simulation models and to report simulation results based on DOE principles. We start by providing an overview of the process stages. The process is embedded in the overall simulation research process and designed to address several typical challenges, such as the stochasticity of input parameters and the large number of potentially interacting variables. In addition, the complication of complex factors in a model will be described and a cascaded DOE introduced. The multi-level structure of this DOE supports a systematic analysis of complex factors. Finally, output formats for representing simulation results will be presented.

3.1 Systematic model analysis based on DOE principles

A simulation research project can be divided into several steps: designing a model, building a model, verification and validation, and a sensitivity analysis (Gilbert and Troitzsch 2005). The suggested process for simulation model analysis is intended for the later phases of a simulation research project.

A main issue in this phase is the question of how to produce and analyze simulation data so that the research question can be answered. When looking at the results, interesting and unexpected model behavior might be discovered. But, as Railsback and Grimm (2011) point out, the researcher needs to be sure whether the results are indeed novel and important, or just the consequence of a model design decision or even a programming mistake. To find an explanation for discrepancies between prediction and output, some 'detective work' has to be done (Railsback and Grimm 2011). Railsback and Grimm (2011) propose to search for explanations by producing lots of output, looking at it intensively and investigating whether any of it seems unexpected. The following procedure can support the understanding of simulation data by guiding the researcher through a systematic analysis process within these iterative steps.

The analysis process itself is divided into two parts: a pre-experimental phase and the experimental analysis (see Montgomery 2009). Part of the pre-experimental phase is building a bridge between the simulation model and the simulation experiment. On this basis, the actual simulation experiment is planned and conducted. Figure 1 gives an overview of the analysis process.

The process starts with formulating (I) the objective of the simulation experiment. A clear reference to the research goal is needed for the experimental setup in order to ensure an effective and efficient analysis of simulation results. Models can be analyzed from many perspectives, and the selection depends on the research focus. Manifold potential objectives exist, such as treatment comparison, variable screening, response surface exploration and system optimization (see above). The research question can only be answered if the simulation experiment produces adequate data. The objective of the simulation experiment specifies the direction of the data analysis. Two major objectives are typically stressed: first, a relative comparison of alternative simulation configurations, e.g. identifying important factors and their effects on the response; second, the performance assessment of different simulation configurations, e.g. finding the optimal parameter settings (Law 2007).

Typically, simulation models have a large number of variables that influence the model behavior. This poses many challenges for the researcher and requires a variable classification. First, the set of variables has to be divided into those that are important for the given research question and those that are not important but could affect the model behavior as well. Second, the variables measuring simulation performance have to be identified in order to evaluate the model behavior. Hence, the classification of model variables provides an overview of the different types of variables based on their roles with respect to the model and its analysis. Within the (II) classification of variables, the variables are assigned to one of three groups: independent, dependent and control variables[14] (Field and Hole 2003). The dependent variables represent the model's outcome. Their performance depends on the value combination of the independent variables. They are analyzed by varying those values and assessing the effects on the dependent variables.

[14] In the context of the model, the terms independent, dependent and control variables are used. For discussing an experiment, the terms factor (→ independent variable), factor level (→ factor value) and response (→ observed dependent variable) are used, cf. Law (2007). For a more technical perspective on simulation, the terms input and output parameter are used. A more detailed description of experimental terminology can be found in Kleijnen et al. (2005), for example, and in Appendix B of this paper.

Fig. 1 Simulation model analysis process based on DOE

Control variables are fixed to one value or randomly assigned by a probability distribution. We are able to define them as control variables because they do not have a relevant effect on the simulation outcome and/or are not of interest with respect to the research question under investigation.

The resulting variable classification also offers the possibility to represent the model's underlying research question in a condensed manner, because it contains the expected core relationships between the independent and dependent variables of the model (see, e.g. Raghu et al. 2003). In line with the idea of a minimized model description consisting of model parameters, i.e. the "mathematical skeleton" proposed by the ODD protocol (Grimm et al. 2006, 2010), using variables might make the simulation experiment more easily accessible, particularly for non-experts.
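To make this classification tangible, the following minimal sketch shows one way such a grouping might be recorded so that it can later be rendered as reporting template (1); the variable names are purely hypothetical and not taken from the paper's example model.

```python
# Minimal sketch of reporting template (1); all variable names are hypothetical.
variable_classification = {
    "independent": ["learning_algorithm", "number_of_agents"],
    "control":     ["number_of_ticks", "random_seed"],
    "dependent":   ["average_agent_profit", "market_efficiency"],
}

for role, names in variable_classification.items():
    print(f"{role:>12} variables: {', '.join(names)}")
```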

To generate a comprehensive overview, a validation of all variables against the simulation program parameters is proposed. All variables that influence the model behavior are included in the program code as parameters. Therefore, the program parameters provide a specific reference point for the identification and classification of variables (please see the appendix for an example). Listing them serves both as a checklist and as a tool for increasing the simulation model's transparency.[15]

After defining the model variables and their values in the second step (II), they must be translated into factors, factor level ranges and response variables for the simulation experiment. Response variables are needed to measure the experimental outcome (Law 2007), i.e. to estimate the dependent variables. Studying the independent variables in the experiment requires the definition of factors (Wu and Hamada 2000). For the simulation experiment, quantitative or qualitative factor level ranges and discrete or continuous response variables have to be established, because they guide the analysis of the model behavior. It might be possible to carry the model variable values over directly into factor level ranges and response values. In most cases, however, it is necessary to transform them first and, thus, to define the (III) response variables and factors for the simulation experiment. Potential independent variables that were fixed as control variables during the classification of variables (II) can be included as additional factors at this stage in order to understand their effects as well.[16] After determining their effect strength and direction, the control variables can be defined plausibly and without the risk of hidden effects of fixed factor levels. A major control variable typically found in simulations is the number of ticks (time steps) per simulation run. Turned into a factor, it can be tested for its sensitivity.

One needs to be aware of the possible existence of nuisance factors in a simulation model (Montgomery 2009), as they affect the response. Nuisance factors can be controllable, uncontrollable or even unknown. Controllable nuisance factors are treated as control variables. They can be tested as factors in a sensitivity analysis and fixed to a value or distribution afterwards. Nuisance factors can also be uncontrollable (Montgomery 2009). This applies to stochastic process elements such as arrival processes or random selection. When nuisance factors are uncontrollable in the process but controllable for experimental purposes, they are also referred to as noise factors (Montgomery 2009). One possibility to implement such noise factors is a random number generator that can be controlled by a fixed seed or stream in the experiment.[17] Sometimes nuisance factors are unknown (Montgomery 2009) and represent a hidden influence on the model behavior. A robustness study can analyze and minimize the effects of uncontrollable nuisance and noise factors. This will be explained in more detail later (see (V) estimation of error variance).

[15] Additionally, the publication of the source code is recommended.
[16] The resulting effects of control variables will be reported in the comprehensive effect matrix (see below, Sect. 3.3).
[17] Using the same seed or stream of random numbers in all system configurations is a variance-reduction technique called common random numbers (CRN) (see Law 2007).
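As a brief illustration of controlling such a noise factor, the following sketch reuses a fixed seed across configurations in the spirit of common random numbers; the run_simulation function and the configurations are hypothetical stand-ins, not the paper's model.

```python
import numpy as np

def run_simulation(config, rng):
    # Hypothetical stand-in for one simulation run; all stochastic elements
    # (arrival processes, random selection, ...) draw from the passed rng.
    noise = rng.normal(0.0, 1.0)
    return config["x1"] + 2.0 * config["x2"] + noise

configurations = [{"x1": 1.0, "x2": 0.0}, {"x1": 1.0, "x2": 1.0}]

# Common random numbers: every configuration reuses the same seed (stream),
# so differences between configurations are not masked by differing noise.
for config in configurations:
    rng = np.random.default_rng(seed=42)
    print(config, round(run_simulation(config, rng), 3))
```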

If only one factor exists in the model, its factor levels can simply be varied within the experiment to analyze its effect on the response. Most models, however, have more factors and thus require more complex experiments. An experimental "roadmap" has to be specified for planning the simulation experiment. DOE techniques allow for creating such a "roadmap": they define how the settings for the simulation experiment have to be configured in order to collect appropriate data. Factorial design is an important technique in this context, as it determines which factor level combinations should be systematically analyzed. Selecting (IV) an appropriate factorial design therefore forms the next step in the simulation model analysis process based on DOE.

Two main reasons make factorial design a suitable technique for simulation experiments. First, factorial design ensures a systematic analysis, so that valid and objective results are produced and interactions between factors can be identified. Factors interact when the effect of one factor on the response is weakened or reinforced by the level of another factor (Field and Hole 2003). For models with more than one factor, knowledge about existing interactions is important for the interpretation of results. Alternative approaches, such as the best-guess approach or the one-factor-at-a-time approach, result in "blind spots" in the analysis (Montgomery 2009) and cannot systematically detect interactions (Law 2007). Second, factorial design can efficiently deal with a large number of factors and factor level ranges. As simulation models often comprise a high number of factors, this ability is crucial and ultimately ensures a systematic analysis of complex models. Depending on the number of factors, a reduction of the experimental complexity might be required. This can be accomplished by defining just two factor levels per factor or by fractioning subsets of factors.[18] In addition, techniques other than systematic DOE might be used to reduce the complexity of the simulation model. Depending on the respective model, the complexity could be reduced by, for example, holding environmental circumstances constant, using homogeneous agent types, using zero-intelligence agents or switching certain mechanisms on/off. Simplified model scenarios can thereby be used as a benchmark, and the complexity can be expanded again in iterative experiments.

The choice of the right factorial design is basically defined by the number of factors, the set of factor levels for every factor and their combinations. An important common design is the 2^k factorial design, for which two factor levels are chosen for each factor, usually one high value (represented by the symbol '+') and one low value (represented by the symbol '−') per factor. The values depend on the model and the objective of the simulation experiment.[19] The 2^k factorial design is an efficient method not only for measuring all k factor effects in the model but also for analyzing all combinations of interaction effects between the k factors (Trocine and Malone 2001). As only two factor levels per factor are defined, it requires relatively few settings. However, this approach has some limitations. With only two factor levels per factor, non-linear effects of a single factor cannot be detected. In addition, the 2^k factorial design assumes linearity in the factor effects. As Montgomery (2009) points out, perfect linearity is not required, though. The 2^k design determines an effect sufficiently well as long as the assumption of linearity holds approximately.

[18] Our paper focuses on 2^k factorial designs while acknowledging the need for other designs. For a detailed description of different factorial designs, see Law (2007) or Montgomery (2009).
[19] Law (2007) suggests discussing these values with subject matter experts.
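The enumeration of design points for such a design is mechanical; the following sketch generates the full 2^k design matrix in coded −1/+1 levels (the factor names are hypothetical):

```python
from itertools import product

# Hypothetical factors of a simulation model, coded low (-1) and high (+1).
factors = ["number_of_agents", "learning_rate", "information_level"]  # k = 3

# Full 2^k factorial design: one design point per combination of coded levels.
design_matrix = [dict(zip(factors, levels))
                 for levels in product((-1, +1), repeat=len(factors))]

for i, design_point in enumerate(design_matrix, start=1):
    print(f"design point {i}: {design_point}")   # 2^3 = 8 design points
```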

Law (2007) argues that the response should at least be monotonic over the range of the factor for 2^k designs to be applicable. In the case of non-linearity, however, two points per factor are not sufficient and more factor levels become necessary.[20] Overall, a simulation experiment specified by a 2^k design provides only a rough data set. Consequently, the 2^k design is often applied as a screening method in the initial phase of experimental studies to detect the important factors that are responsible for most of the effects on the response (Trocine and Malone 2001).[21] The 2^k factorial design can be used to determine the direction for further experimentation and can be augmented for more detailed local experimentation and more sophisticated analyses. It therefore fits the sequential strategy of iterative simulation experiments (Box et al. 2005). The factor level combinations specified by the factorial design are called design points (or scenarios) and determine the simulation settings for performing the simulation experiment. They are structured in a design matrix, as illustrated in detail in the appendix.

It was outlined before (III) that simulation models often contain uncontrollable or even unknown nuisance factors. Both result in non-deterministic simulation responses. Consider, for example, a simulation model with nuisance factors that is run several times with the same setting. The results would show some variability in the response between simulation runs due to the (hidden or known) nuisance factors. We call this variability experimental error.[22] It could distort the analysis of outcome differences between simulation settings. In order to obtain meaningful results, the mean and variance over several simulation runs per setting must be analyzed (Gilbert 2008). For an initial estimation of the number of simulation runs required per simulation setting, the size of the experimental error needs to be analyzed prior to executing the simulation experiment. Therefore, an (V) estimation of experimental error variance is performed as part of the simulation model analysis process.

Variance measures can be used for a first approximation of how many runs are needed per setting for the given simulation. Calculating the mean and variance for an increasing number of runs and some fixed settings only provides a first impression of the experimental error; a variance measure and criteria for determining the needed number of runs are required. Prominent measures for analyzing the accuracy of the mean and the variability of data in descriptive statistics are the sum of squared errors, the variance and the standard deviation (Field and Hole 2003). We suggest using the coefficient of variation (cv). Providing a dimensionless and normalized measure of variation, it allows for comparing data sets with different units and means.

[20] Montgomery (2009) describes the method of adding center points to the 2^k design for detecting curvature within the effect functions.
[21] According to Bettonvil and Kleijnen (1996), the parsimony principle implies that the task of science is to identify a short list of important factors. They describe the need for a screening process in the pilot phase of complex simulation studies to identify the relevant factors so that they can be investigated further in later phases.
[22] The fluctuation that occurs between experiment repetitions is called noise, experimental variation or experimental error (Box et al. 2005).

It is defined as the ratio of the standard deviation of a number of measurements to their arithmetic mean (Hendricks and Robey 1936):

cv = s / μ

where s is the standard deviation and μ is the arithmetic mean of a set of values. The fact that this measure is dimensionless represents an advantage when examining the variance of multiple response variables, which is a common operation in simulation experiments. Thus, the coefficient of variation is used for the analysis of experimental error variance.

To calculate the coefficient of variation, the analysis starts by repeating the simulation with a selected, fixed setting[23] for a relatively low number of runs. The response variable values are measured and recorded for every run. After calculating the mean and the coefficient of variation, the number of runs is increased iteratively and the given number of simulation runs is performed for the same setting. Increasing the number of repetitions typically stabilizes the variability of the response and of cv up to the point where the coefficient of variation no longer changes with an increasing number of runs. The stability of cv can be used as the stopping criterion and, thus, to specify the number of runs N required for calculating the response measures in the subsequent simulation experiment. The appendix shows how an error variance matrix can help to structure this analysis.

Nevertheless, the experimental error needs to be interpreted with respect to the respective model. General criteria might not be applicable to the given model, e.g. because the variability of response variables does not stabilize over an affordable number of runs. The definition of the needed number of runs is a tradeoff between stability and costs, just as in empirical research, where more observation points lead to higher certainty in the results but also require more resources (more questionnaires, experiments, etc.). Accordingly, in simulation research more runs per setting lead to more precision, but also increase the demand for computing power and time. This step, the (V) estimation of experimental error variance, should result in a first impression of the error variance ("variance screening") and in the ability to approximate the required number of runs per setting for the simulation experiment. It thereby provides a tool for determining the number of runs and thus for communicating the selection criteria transparently. To assure meaningful results, we suggest including an additional check of the experimental error variance through ANOVA later in the process (see (VII) effect analysis).

In step (VI) the simulation experiment is performed to produce the simulation data. The factor level combinations for the experiment are derived from the factorial design (IV). For every design point, a simulation (→ sub-experiment) is performed N times, as determined in the preceding analysis of error variance (step (V)). The response values are recorded as average values over the N runs for every factor level combination. Every response value per run is captured in the design matrix for subsequent statistical analyses (see the illustration in Appendix A for an example).

[23] As a first approach, two design points with extreme values and one design point with mean values of factor levels can be chosen from the factorial design.
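A minimal sketch of the run-number estimation described in step (V) could look as follows; the run_simulation function, the stability tolerance and the step sizes are assumptions made for illustration only.

```python
import numpy as np

def run_simulation(design_point, rng):
    # Hypothetical stand-in: one stochastic response value for a design point.
    return 10.0 + rng.normal(0.0, 2.0)

def estimate_required_runs(design_point, start=10, step=10, max_runs=1000,
                           tol=0.01, seed=1):
    """Increase the number of runs until the coefficient of variation
    cv = s / mean changes by less than `tol` between iterations."""
    rng = np.random.default_rng(seed)
    responses, previous_cv, n = [], None, start
    while n <= max_runs:
        while len(responses) < n:
            responses.append(run_simulation(design_point, rng))
        cv = np.std(responses, ddof=1) / np.mean(responses)
        if previous_cv is not None and abs(cv - previous_cv) < tol:
            return n, cv
        previous_cv, n = cv, n + step
    return len(responses), previous_cv

n_runs, cv = estimate_required_runs({"x1": +1, "x2": -1})
print(f"estimated runs per setting: {n_runs}, cv = {cv:.3f}")
```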


So far, the process has focused on the systematic production of appropriate simulation data. The following step emphasizes the analysis of the produced data. Step (I) yielded a definition of the objective of the simulation experiment, such as the analysis of differences between system configurations and/or the identification of important factors and their effects. To establish which factors are important in the simulation model and how they influence the simulation response, an (VII) effect analysis is recommended. Each of these two components requires a different approach for its assessment.

First, a factorial analysis of variance (ANOVA[24]) over the recorded means of all simulation runs is applied. Its result provides two clues. R² indicates the proportion of systematic variance explained by the identified factor effects relative to the unsystematic variance from all other factors in the simulation model (Field and Hole 2003). ANOVA also indicates the significance of individual factor effects and of two-factor interaction effects. Typically, this is restricted to two-factor interactions, but it may be extended to higher orders of interaction.[25] If all (or most of) the expected effects are significant, step (V), the estimation of experimental error variance, is verified. Second, the effect strength is calculated for all significant effects identified within the effect analysis in order to specify their strength and direction. To this end, the response values recorded in the design matrix are used. A factor's effect strength is the average change in the response caused by a change in the factor level of this factor (see Law 2007).

How to proceed from here depends on the respective objective of the simulation experiment, on the analysis results already obtained and on the simulation model. Due to the diversity of possible models in particular, the next steps cannot be entirely standardized. This challenges the researcher to make the right choices for the respective model. Nevertheless, steps (VI) and (VII) introduced above provide a roadmap and suggest proceeding iteratively, drilling down into design points of interest. The following three potential scenarios illustrate this. First, a researcher might have identified one or more factors with a high impact (high effect strength) on the response values. For these influential factors, further simulation experiments with additional factor levels can be performed in the next iteration to explore the relationship between the factors and the response in more detail. Second, a researcher might encounter interesting two-factor interaction effects. More experimental data for design points with more detailed factor levels for the respective interacting factors might expose non-linear interactions. A third scenario could be the discovery that different simulation settings lead to different response performances, e.g. to different levels of success for the agents. A further analysis could then aim at combining the parameters with the best performance. Thus, new factor levels derived from the original ones could be defined and analyzed as design points in further simulation experiments. As mentioned before, the 2^k factorial design cannot detect non-linear effects of a single factor. Therefore, we suggest a more in-depth analysis of important effects by specifying additional factor levels per factor in further experiment iterations.

[24] For a more detailed description, see Field and Hole (2003).
[25] See Box et al. (2005). Higher-order interaction effects can become very difficult to interpret.
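The effect-strength part of this analysis can be computed directly from the design matrix. The following sketch estimates main and two-factor interaction effects as the average change in the response between the low and high level of the corresponding contrast; the responses and factor names are hypothetical, and significance testing via ANOVA would be added on top of this with a statistics package.

```python
import numpy as np
from itertools import combinations, product

factors = ["A", "B", "C"]
design = list(product((-1, +1), repeat=len(factors)))      # coded 2^3 design
# Hypothetical mean responses (over N runs) for the eight design points.
responses = np.array([8.2, 9.1, 7.9, 12.4, 8.0, 9.3, 8.1, 12.6])

def effect(columns):
    # Contrast = product of the coded levels of the involved factors;
    # effect strength = mean response at +1 minus mean response at -1.
    contrast = np.array([np.prod([point[i] for i in columns]) for point in design])
    return responses[contrast == +1].mean() - responses[contrast == -1].mean()

main_effects = {factors[i]: effect([i]) for i in range(len(factors))}
interaction_effects = {(factors[i], factors[j]): effect([i, j])
                       for i, j in combinations(range(len(factors)), 2)}

print({k: round(v, 2) for k, v in main_effects.items()})
print({k: round(v, 2) for k, v in interaction_effects.items()})
```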


Fig. 2 Concept of cascaded DOE

To finalize the analysis and results, all outputs are documented in an updated and consistent manner along the process.

3.2 Cascaded DOE for complex factors

An important complication typical of simulation models is the existence of complex factors. Complex factors are factors with substructures: their factor levels are not determined by one quantitative or qualitative value, but by a combination of values at a subordinate level. We refer to these values as sub-factors. In other words, the factor level of a complex factor is represented by value combinations of sub-factor sets. An example of such a complex factor is a learning model with several variables; its learning capabilities are determined by a combination of the values of these variables (see Appendix A). In order to define factor levels for complex factors, the sub-factors must be calibrated, which increases the experiment's complexity. Examples of calibration goals are producing the best possible performance or testing the robustness of complex factors. The research question, however, focuses on the interaction of the complex factor with the other factors in the model. Given that the interaction between sub-factors at different levels of the same complex factor is only of secondary interest, the calibration of sub-factor sets can be divided into disjoint modules. Based on the principle of using hierarchies to handle complex systems (see Simon 1973, for example), this paper introduces the concept of a cascaded DOE to provide a structure for simulation experiments with complex factors, as shown in Fig. 2.

The concept of a cascaded DOE comprises two different DOE types: one Top-Level DOE and several Sub-DOEs (see Fig. 2). The Top-Level DOE aims at answering the overall research question. To reduce the experiment's complexity, the analysis is divided into sub-experiments that correspond to the factor levels of the complex factor. One Sub-DOE is specified for each factor level of the complex factor. A Sub-DOE contains all sub-factors of one complex factor level as well as all other factors from the Top-Level DOE. To calibrate the sub-factors for the given scenario, possible interactions with the other factors in the model need to be considered. Therefore, the factors from the Top-Level DOE are included and systematically varied with the sub-factors in the sub-experiment. Factor levels of complex factors can be represented as functions over a set of sub-factors, as exemplified in Fig. 2. In this illustration, the factor level value f is determined by the values taken by x1 and y1, which represent the sub-factors (here functional arguments) of f. After calibrating all sub-factor sets within the Sub-DOEs, i.e. assigning values to the arguments x1 and y1, the factor levels of the complex factor are specified, so that the Top-Level experiment can be performed to address the actual research question.
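A minimal sketch of this cascade is given below; the complex factor, its sub-factors, the value ranges and the calibration criterion are all purely hypothetical stand-ins.

```python
from itertools import product

# Hypothetical complex factor "learning_model" with two factor levels,
# each defined by its own set of sub-factors (and candidate values).
sub_factor_ranges = {
    "reinforcement": {"learning_rate": (0.1, 0.5), "discount": (0.8, 0.99)},
    "imitation":     {"sample_size": (1, 5), "copy_probability": (0.2, 0.8)},
}

def run_sub_doe(sub_factors):
    # Sub-DOE (sketch): enumerate sub-factor combinations and pick the one
    # with the best dummy calibration score; a real Sub-DOE would also vary
    # the other factors from the Top-Level DOE here.
    candidates = [dict(zip(sub_factors, values))
                  for values in product(*sub_factors.values())]
    return max(candidates, key=lambda c: sum(c.values()))

# One Sub-DOE per factor level of the complex factor.
calibrated_levels = {level: run_sub_doe(ranges)
                     for level, ranges in sub_factor_ranges.items()}

# Top-Level DOE: calibrated complex-factor levels crossed with the other factors.
top_level_design = [{"learning_model": level, "market_size": size}
                    for level in calibrated_levels for size in (-1, +1)]

print(calibrated_levels)
print(top_level_design)
```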

3.3 Reporting templates

Presenting simulation results bears the risk of both providing too much detail and impairing the required transparency. A detailed description of the analysis process may overload the reader of a publication and, ultimately, may distract from the main findings. Due to the complexity often inherent in simulation research, a comprehensible presentation is crucial for maintaining the transparency of the method and results in a publication (Axelrod 1997; North and Macal 2007). In the simulation model analysis process outlined before, standardized structures enhance accessibility, as they are easy to grasp for a scientifically trained audience. Standard report formats further ease the transfer and comparison of results. DOE concepts offer the possibility of summarizing results relevant to the research question in a condensed and standardized manner. Even though the systematic DOE yields elaborate information within the analysis process, the final DOE results summarize the model behavior and main findings in a compact yet informative way. Given the significance of a detailed DOE analysis process for valid and objective results, some of its concepts may be employed to communicate the research question and the simulation results. This section discusses how to use the results of the proposed model analysis based on DOE principles for communicating simulation results effectively.

Typically, the complexity of simulation models is rooted in a relatively high number of variables and processes. Non-experts in particular may find it difficult to describe the research focus with reference to the implemented simulation model. Thus, the classification of variables is suggested as a first reporting template. It can support the communication of the research question with respect to the variables of the simulation model (see Table 1).

Table 1 Reporting template (1): Classification of variables

Independent variables   Control variables   Dependent variables
Var1                    Var4                Var6
Var2                    Var5
Var3


Table 2 Reporting template (2): Effect matrix for response variable Ri
(rows and columns are the factors; diagonal cells hold the main effects, off-diagonal cells the two-factor interaction effects; int. = interaction effect)

Factors  x1               x2               x3               x4               x5
x1       main effect x1   int. (x1, x2)    int. (x1, x3)    int. (x1, x4)    int. (x1, x5)
x2                        main effect x2   int. (x2, x3)    int. (x2, x4)    int. (x2, x5)
x3                                         main effect x3   int. (x3, x4)    int. (x3, x5)
x4                                                          main effect x4   int. (x4, x5)
x5                                                                           main effect x5

The final classification of variables contains all independent, dependent and control variables of the model. The table makes immediately clear which relationships the research focuses on and which variables are not of interest and are thus treated as control variables. Hence, the variable classification enables a standardized and condensed presentation of the major questions under investigation. These are typically reflected in hypotheses on the effects of the independent variables on the dependent variables (see, e.g. Raghu et al. 2003). The biggest advantages are likely to be gained in an interdisciplinary context, because the template allows the relationships of interest to be expressed in the "universal language" of variables and their relationships instead of drawing on a jargon exclusive to one of the respective disciplines.[26]

The potentially high complexity of simulation models also poses a challenge for effectively reporting the main results. Simulation experiments often not only consider a high number of factors; they also involve calculating the effect strengths for all factors and factor pairs specified within the experimental setting. To this end, the final effect matrix is suggested as a second reporting template to describe the simulation behavior in a standardized and quantitative way (see Table 2). The effect matrix provides an overview of the effects of all factors and their interdependencies that are relevant to the research question. Factor effects are the predominant resource drawn upon for answering the research question(s), e.g. by supporting or falsifying the investigated hypotheses. In addition, the strength of the effects and their direction can easily be read off the matrix.

[26] For a similar suggestion see Grimm et al. (2006). They recommend a simple, easily communicable model description that is independent of any specific structure, purpose and form of implementation.
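Filling this reporting template is a mechanical step once the effects have been estimated; the following sketch prints an upper-triangular effect matrix in the spirit of Table 2, using hypothetical effect values rather than results from the paper's example model.

```python
factors = ["x1", "x2", "x3"]
# Hypothetical effect estimates taken from the effect analysis (step VII).
main_effects = {"x1": 2.1, "x2": -0.4, "x3": 0.9}
interaction_effects = {("x1", "x2"): 0.7, ("x1", "x3"): 0.0, ("x2", "x3"): -1.2}

# Print the upper-triangular effect matrix of reporting template (2).
print("      " + "".join(f"{f:>8}" for f in factors))
for i, row in enumerate(factors):
    cells = []
    for j, col in enumerate(factors):
        if j < i:
            cells.append(f"{'':>8}")
        elif j == i:
            cells.append(f"{main_effects[row]:>8.2f}")
        else:
            cells.append(f"{interaction_effects[(row, col)]:>8.2f}")
    print(f"{row:<6}" + "".join(cells))
```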

In addition, a comprehensive effect matrix of the model could be included in the appendix to provide additional information. This comprehensive effect matrix does not only communicate the main effects with regard to the investigated research question: it contains the effects of all factors and control variables examined during the simulation experiments and therefore displays all relationships identified in the model. This should further enhance transparency and foster the credibility of the simulation model. Additional information about the analysis process can be provided in the appendix and/or as freely accessible attachments. This would open the 'black box' of simulation even more and create credibility through uncompromised transparency. The table of code parameters and the variance matrix used as a reference for the number of runs are two other output formats from the previous section that may be made available in the appendix or as attachments. Furthermore, the ODD protocol (Grimm et al. 2006, 2010) can be attached for a standardized specification of the simulation model. Finally, transparency can be fostered further by publishing the source code, e.g. via the OpenABM platform.[27]

3.4 Critical considerations

The proposed process based on DOE principles offers a general structure that should make it applicable to a broad range of simulation models. Nonetheless, each actual application must account for the specific model under investigation in order to translate the proposed procedures into actions for the individual researcher, who may even find it necessary to adjust some of the procedures. To reflect this, we finally discuss some critical considerations along the proposed process (see Fig. 3).[28]

The first step, (I) formulating the objective of the simulation experiment, requires evaluating the fit between the data yielded by the planned simulation experiments and the given research question. A simulation model allows for a lot of flexibility and enables the researcher to observe the model behavior from different perspectives, which generally is a significant advantage. The researcher can change the focus of the analysis over the course of the experimental phase if results indicate interesting new issues unforeseen by the modeler. Despite this freedom, the experimental design must focus on producing data that are appropriate for addressing the chosen research questions. Consequently, the fit between the research questions and the objective of the simulation experiments has to be assured continuously.

During the (II) classification of variables, the existence of explicit and traceable links between program code and variables represents a critical aspect. A check of deeply hidden parts of the code, such as local parameters in methods or subclasses, ensures the identification of influencing variables. Qualitative model assumptions in particular may not be recognized immediately as potential variables. One has to be diligent in identifying all potential modifications to which the responses might be sensitive.

For (III) the definition of response variables and factors, the variables to be considered as factors for the experiment must be carefully selected. The decision is closely linked to the model's construction.

[27] See http://www.openabm.org/.
[28] Note that we do not intend to be exhaustive, but rather want to give an idea of the challenges associated with the application of systematic DOE.

Fig. 3 Critical considerations along the suggested process


Of particular importance is the distinction between model construction decisions and control variables. The former are a given for the experiment, while the latter are optional factors that can be added for the effect analysis. Richiardi et al. (2006) note that aspects such as the temporal model, data aggregation and the decision process are typically part of model construction decisions that are not analyzed further. However, it is often worthwhile to analyze the effect of the number of ticks, the seed or the noise level by temporarily adding them as factors to the experiment.

When (IV) selecting an appropriate factorial design, the model's setting and complexity largely determine the appropriate design (Law 2007). The literature offers a broad range of principles and specific designs that should be screened for applicability (Kleijnen et al. 2005; Wu and Hamada 2000).

Regarding the (V) estimation of error variance, researchers might benefit from testing whether the coefficient of variation changes with a considerably greater number of runs. Once the stability of the coefficient over an increasing number of runs has been identified, its steadiness should be evaluated with a much higher number of runs (in our example: up to 200 times the runs at the reference point, see Appendix A). As proposed for the sample model, a critical consideration of the design points chosen for the variance analysis might lead to the definition of factor levels other than extreme and mean values. The objective is to define a small set of design points that produces a range of extreme response values for the given model and to test the variance for response values with noise. The statistical techniques that can and should be applied to assess the stability of simulation responses and, later, to analyze the effects should be carefully selected in this phase.

Before (VI) running the simulation experiment, one must ensure that the statistical methods are applicable with regard to their basic assumptions and sophisticated enough to make the best use of the simulation data in order to allow for valid interpretations (Fisher 1971).

The (VII) effect analysis has the inherent potential to reduce complexity. By showing which factors have no or minimal effects on the response values, it reveals which factors can be defined as control variables. Prior to this decision, another iteration of sensitivity analysis might prove valuable. Including other factor levels and the (internal) validation of both the model and the simulation code yields an effect matrix that helps to determine whether or not to continue the experiment. The answer depends highly on the research question and on the results obtained at that point. As every iteration can add insight into the model behavior at typically low cost, a bias toward more iterations is desirable ("erring on the side of caution") for this last step.

Finally, it is important to note that the analysis of simulation models actually requires more than just a systematic analysis of factor value spaces. It might be useful to first explore the simulation behavior, for example by running simplified scenarios or extreme parameter value spaces, or by looking at the simulation behavior in detail. Thereby, the researcher acquires a sense for the model, which might help to define the further direction of research within the simulation experiment. Once the direction of research is found, the proposed procedure based on DOE principles can be applied for a systematic simulation data analysis.


4 Discussion and conclusion

This paper developed a systematic procedure for analyzing simulation models and presenting their results. It was motivated by the observation that many still perceive simulation models as a black box due to their complexity and the fact that their results are not easy to communicate. To address this problem, we suggest a systematic application of several principles from the DOE literature, which we aligned with the simulation research process. The result is not only an effective and efficient analysis of simulation behavior. The approach also makes simulation models more transparent at the same time. Moreover, it generates reports that can be used to communicate the results of simulation models effectively. Overall, we hope that the suggested procedure contributes to an increased acceptance of simulation as a research method.

The framework of research evaluation by Mårtensson and Mårtensson (2007) is helpful for specifying the potential contributions (see Fig. 4). It systematizes important dimensions usually associated with high research quality (Mårtensson 2003). According to Mårtensson and Mårtensson (2007), good research is credible, contributory and communicable. This paper addresses at least two criteria of this framework directly. First, it makes simulation models more transparent, thereby strengthening one important sub-dimension of credibility. Second, it facilitates the communication of simulation research by making it both more accessible and more consumable. Therefore, applying the ideas suggested in this paper is likely to have a positive impact on the perceived quality of simulation research.

The suggested approach can easily be combined with existing standards because it complements them by focusing on two main aspects of the research process: the systematic analysis of simulation model behavior and effective results reporting.

Fig. 4 Framework for research evaluation (Mårtensson and Mårtensson 2007)


Furthermore, some of our results can help specify existing initiatives. Grimm et al. (2006) mainly aim at an efficient and effective description of simulation models. Adding to their efforts, our paper reveals another way to describe a model: the classification of variables (see Table 1). This idea is in line with Grimm et al. (2006) and Grimm et al. (2010), who propose using model variables to define the purpose of the model. Both Richiardi et al. (2006) and Manuj et al. (2009) mention the applicability of DOE principles in their standards and even offer an explicit link to DOE. Hence, a wider integration of our systematic DOE procedure could advance these standards by increasing the level of detail of the model analysis. The fact that we outline a concrete step-by-step procedure enhances their applicability.

The compatibility of our suggestions with existing standards should increase the adoption of our proposed approach. As with existing standards, having journals, reviewers and other members of the simulation community as multipliers would further foster its adoption and increase convergence to a shared standard. Similarly, a more extensive coverage of DOE principles in PhD courses, simulation summer schools, etc. would provide younger researchers with guidance and would be a good platform for discourse. We believe that the application of the suggested approach to analyze simulation models and their results could even be self-enforcing to some extent. Given the aim of DOE to foster the efficiency and effectiveness of (simulation) experiments, its application is in the researchers' self-interest. Drawing upon DOE raises the model's transparency and, therefore, the level of understanding of the model. We also anticipate that the suggested approach increases the credibility of simulation models among non-simulation researchers. The emergence of a common language shared by quantitative field researchers, laboratory experimenters and simulation researchers may support interdisciplinary research considerably.

Besides these advantages, some limitations and avenues for future research should be mentioned. While we expect the general process of systematically applying DOE elements to be quite stable, the actual selection of DOE elements might differ between simulation models. The support for our approach is currently limited to one example (see the illustration in the appendix). Although other examples of a successful application of DOE principles exist in the literature (e.g. Raghu et al. 2003), further applications are warranted to underline our propositions and to explore aspects such as the economic viability and feasibility of full factorial designs for a large number of variables. A very high number of variables might pose a general challenge to the approach. Future research addressing these challenges would not only yield valuable answers for simulation researchers, but could also contribute to the advancement of DOE principles. In the long run, simulations and DOE could mutually benefit from each other, with advances in one area stimulating progress in the other.

Acknowledgements We thank the participants of EPOS 2010 (IV Edition of Epistemological Perspectives on Simulation, a cross-disciplinary workshop held on 23–25 June 2010 in Hamburg, Germany) for their comments, especially Prof. Petra Ahrweiler for her contribution as discussant. Moreover, we thank the two reviewers for their suggestions which helped to improve the paper.


Appendix A: Illustration

This appendix demonstrates how the proposed process can be applied to increase the transparency of the simulation model behavior. First, we introduce the sample model that the illustration is based on. Second, the analysis procedure is applied to the sample model, highlighting the role of DOE principles. Finally, the use of DOE formats to communicate simulation results effectively is shown.

A.1 Introduction of sample model

The sample model is an agent-based simulation from mechanism design. Mechanism design is a subfield of game theory and deals with the problem of how to design the rules of a game in such a way that self-interested actors behave in the desired way (Dutta 1999). Capital budgeting in decentralized firms represents one application. An optimal investment program requires truthful reports on the productivity of the investments suggested by each business unit. However, only the unit managers have the relevant information, and maximizing their own profit offers incentives not to report truthfully (Ewert and Wagenhofer 2008). Mechanism design in capital budgeting searches for rules for resource allocation and profit distribution that establish truthful reporting as the best strategy for the unit managers and, therefore, as the Nash equilibrium of the game (Myerson 1999).

The Groves mechanism represents an example of mechanism design (Groves 1973; Groves and Loeb 1979). It is a truth-inducing mechanism for bidding processes, which form the context for the sample model. Surprisingly, this mechanism is seldom applied in practice despite its theoretically well-founded truth-inducing effect. Its complexity might partially explain this (Arnold et al. 2008). In a similar vein, recent experiments have shown that many participants do not report truthfully, although it is the dominant strategy from a game-theoretical perspective (Arnold et al. 2008). Answers in questionnaires indicated that the participants did not understand the Groves mechanism.

The sample model aims at examining this phenomenon via an agent-based simulation. The analysis of the actors' cognitive capabilities forms its core, as the actors are required to understand the rules and their implications. Existing learning concepts for agent-based simulations represent the different degrees of cognitive capabilities.30 The sample model compares the learning algorithms to identify the cognitive abilities required of agents for behaving successfully within the Groves mechanism. The sample model serves as a starting point and focuses on the complexity of the Groves mechanism. Later, the results for Groves will be compared with other mechanisms in a comparative analysis. The goal is to rank mechanisms by complexity. The subsequent analysis of the differences between successful and failed learning algorithms should enable the identification of the cognitive requirements for successful interaction within the mechanisms.

30 We omit the theoretical discussion of the given learning algorithms as representations of different degrees of cognitive capabilities. This aspect would have to be addressed for the sample model if it were not used merely for illustrating the DOE-based simulation model analysis process.


In the sample model, the research centers on the learning behavior of one player, and the research question targets the success of the learning algorithms independently of the opponents' behavior. Consequently, how the learning process of one player depends on the opponents' behavior is not of interest for the research question and is therefore excluded from the sample model.

The simulation model is grounded in a published laboratory experiment investigating the applicability of the Groves mechanism (Arnold et al. 2008). Its basic set-up comprises two actors who represent two reporting units in a firm. Only the actors know their own true productivity value, and their task is to report it to management. Management needs to know the true productivity in order to allocate the given resources in the best way possible. The Groves mechanism's payment functions calculate the payoff for each actor based on the actual and reported productivity values (Arnold et al. 2008). The design of the payment function assures that each actor's optimal strategy is to report the true productivity value.31 The agent tries to learn the best strategy from the payments received for previous decisions. In every simulation step, the agent reports the unit's productivity, receives a payment for the decision and updates the attraction to strategies. The true productivity is random.32

The implemented learning algorithms reflect reinforcement learning concepts that are stepwise extended and are adopted from previous research by Sarin and Vahid (1999). The general model Experience-Weighted Attraction stems from Camerer and Ho (1999). The results for a zero intelligence agent (Gode and Sunder 1997) are analyzed as well to establish a reference point; the decisions of a zero intelligence agent are random and exclude learning. The respective learning algorithm is one major factor driving the results, but it can easily become a black box at the same time due to its internal complexity. This makes it an interesting example for our considerations. Next, the sample model is used to demonstrate the ability of systematic DOE to increase the transparency of simulation models.

31 For a detailed discussion of the payment function see again Arnold et al. (2008).

32 The other agent's reporting strategy is chosen randomly, i.e. independently of the true productivity ("zero intelligence"). This introduces some noise for the observed agent. However, the major property of the Groves mechanism to induce truthful reporting holds under this assumption as well. An exception from the rule would be the coordinated behavior of the two agents ("collusion"), which should not emerge under this model's assumptions (Heine et al. 2005).

A.2 Systematic model analysis – process illustration

This section illustrates the proposed procedure for simulation model analysis by applying the process to the sample model. It also introduces additional output formats based on DOE principles for structuring the analysis. We present the information in great detail to make the process easy to understand. However, the level of detail should not be seen as a benchmark for the publication of simulation results. For the communication of results, the use of standardized output formats is shown in a separate section.
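Before turning to the analysis, the per-tick cycle described above (report, payment, attraction update) can be sketched in a few lines of Java. This is a minimal, self-contained illustration only: the payment function below is a placeholder that merely rewards reports close to the true value and does not reproduce the actual Groves payments of Arnold et al. (2008), and the attraction update and softmax choice are generic stand-ins rather than the exact algorithms of Sarin and Vahid (1999) or Camerer and Ho (1999).

import java.util.Random;

// Minimal sketch of one agent's per-tick decision cycle in the bidding game.
public class GrovesTickSketch {

    static final double[] STRATEGIES = {1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1};
    static final Random RNG = new Random();

    // Placeholder payment, NOT the real Groves payment function.
    static double payment(double report, double truth, double opponentReport) {
        return 1.0 - Math.abs(report - truth);
    }

    public static void main(String[] args) {
        double[] attraction = new double[STRATEGIES.length];
        int ticks = 1000;
        for (int t = 0; t < ticks; t++) {
            double truth = STRATEGIES[RNG.nextInt(STRATEGIES.length)];    // (3) real productivity, random
            double opponent = STRATEGIES[RNG.nextInt(STRATEGIES.length)]; // (4) report of opponent, random
            int choice = chooseStrategy(attraction);                      // agent's report
            double payoff = payment(STRATEGIES[choice], truth, opponent);
            attraction[choice] += 0.1 * (payoff - attraction[choice]);    // simple reinforcement-style update
        }
    }

    // Softmax choice over attractions; the intensity parameter plays a role analogous to lambda.
    static int chooseStrategy(double[] attraction) {
        double lambda = 1.0, sum = 0;
        double[] w = new double[attraction.length];
        for (int j = 0; j < w.length; j++) { w[j] = Math.exp(lambda * attraction[j]); sum += w[j]; }
        double u = RNG.nextDouble() * sum;
        for (int j = 0; j < w.length; j++) { u -= w[j]; if (u <= 0) return j; }
        return w.length - 1;
    }
}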


Table 3 Simulation parameters as reference point

Simulation parameter (main class)    DOE variable    Description
String learning_algorithm            √               Learning algorithm
double avg_reward                    √               Quality of learning
double sum_reward                    –               Support parameter
double avg_reward_i                  –               Support parameter
double sum_reward_i                  –               Support parameter
double T                             √               Speed of learning
double T_i                           –               Support parameter
double T_max                         –               Support parameter
int T_length                         –               Support parameter
double E                             √               Stability of learning
double E_i                           –               Support parameter
int counter_E                        –               Support parameter
double[] strategies                  √               Strategy set
double pi                            –               Simulation output (report)
double pi_real                       √               Real productivity
double pj                            √               Report of opponent
int runs                             –               Result of (V), error variance analysis
int ticks                            √               Ticks
FileWriter outputfile                –               Technical parameter, creates output file

The implemented sample model, an agent-based simulation programmed in Java, serves as the starting point for the experiment. As described above, the sample model is used to examine the complexity of the Groves mechanism. Therefore, the performances of agents with different learning algorithms are analyzed and compared. The simulation experiment has two objectives (I): First, the effects of the different cognitive capabilities on truthful reporting are analyzed through a relative comparison of the learning performance between agents. Second, the optimum of each learning algorithm's performance in the given model scenario is determined to enable this comparison.

To prepare the simulation experiment, the variables of the sample model are classified (II) in two steps. First, the implemented simulation program parameters are used as reference points to identify all model variables (see Table 3). All parameters influencing the model behavior are included in the DOE as variables; parameters needed to evaluate the simulation model are entered as well. Support parameters are needed for calculating the simulation outcome. As they are required for implementation purposes only, they are not included as model variables. Second, the variables are classified as independent, dependent or control variables to get an overview of the model components (see Table 4).
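To give an impression of how the parameters of Table 3 might appear in the model's main class, the following abbreviated field declarations annotate each parameter with its DOE role. The field names are taken from Table 3; only a selection of the support parameters is shown, and the class itself is a hypothetical sketch rather than the actual implementation.

import java.io.FileWriter;

// Sketch: the simulation parameters of Table 3 as fields of the main class.
public class SampleModelParameters {
    String learning_algorithm; // DOE variable: learning algorithm (factor)
    double avg_reward;         // DOE variable: quality of learning (response R)
    double sum_reward;         // support parameter
    double T;                  // DOE variable: speed of learning (response T)
    double E;                  // DOE variable: stability of learning (response E)
    double[] strategies;       // DOE variable: strategy set (control variable)
    double pi;                 // simulation output: the agent's report
    double pi_real;            // DOE variable: real productivity (control variable)
    double pj;                 // DOE variable: report of opponent (control variable)
    int runs;                  // result of step (V), the error variance analysis
    int ticks;                 // DOE variable: ticks (control variable, tested as factor)
    FileWriter outputfile;     // technical parameter, creates the output file
}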


Table 4 Classification of variables

Independent variables     Control variables         Dependent variables
(1) Learning algorithm    (2) Strategy set          (6) Quality,
                          (3) Real productivity     (7) Speed,
                          (4) Report of opponent    (8) Stability
                          (5) Ticks                 ... of learning truthful reporting

Based on the classification of variables, the investigated relation in the model can be described as the effect that different (1) learning algorithms have on the (6) quality, (7) speed and (8) stability of learning the dominant strategy in the Groves mechanism, i.e. truthful reporting. The variation of the potential independent variables (2) strategy set, (3) real productivity, (4) report of the opponent and (5) ticks is not of interest for the given research question. Consequently, these parameters remain fixed as control variables in the model.

The variable (2) strategy set contains all productivity values that the laboratory experiment can possibly yield and that form the model's basis. In every step, the agent needs to report a productivity value. The agent knows his/her true productivity value, which is reflected in the control variable (3) real productivity. It is a random variable with a uniformly distributed probability over all eight productivity values. The productivity value (4) report of the opponent is also random, with a uniform distribution over the strategy set. The control variable (5) ticks defines the number of steps per simulation. In our example, it states the number of rounds in the game and is set to 1,000 as a starting point.

The dependent variables are (6) quality, (7) speed and (8) stability of learning to report truthfully. Quality denotes the average relative reward gained by the agent. Speed indicates the average time until the agent learns truthful reporting, i.e. until (s)he discloses the correct information for the first time. Stability of learning is measured by the average period during which the agent behaves in the desired equilibrium.

Within the simulation experiment, the described model variables are analyzed as (III) response variables and factors. To accomplish this, the model variables need to be transformed into factors, and factor level ranges are to be defined (see Table 5). As shown in the table of factors, the independent variables are analyzed as factors and the dependent variables as response variables in the simulation experiment. The response variables correspond to the dependent variables and are abbreviated by the letters R (quality of learning), T (speed of learning) and E (stability of learning). To test the effect of the simulation run length on the simulation response, the control variable number of ticks is also tested as a factor in the experiment. The required length of one simulation run depends on the course of the learning process: according to the respective learning algorithm, the agent learns at a different speed, if at all. By including ticks as a factor, the various simulation run lengths can be tested for sensitivity, which makes it possible to determine whether learning also occurs later in the simulation. The qualitative factor learning_algorithm consists of three factor levels: Zero Intelligence, Reinforcement Learning and Experience-Weighted Attraction.
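The verbal definitions of the three dependent variables can be translated into per-run measures roughly as follows. This sketch assumes hypothetical tick-level inputs (the relative reward per tick and a flag marking a truthful report) and may deviate from the exact operationalisation used in the sample model.

// Sketch: per-run response values R_i, T_i and E_i computed from tick-level data.
public class ResponsePerRun {

    static double quality(double[] relativeReward) {                 // R_i: average relative reward
        double sum = 0;
        for (double r : relativeReward) sum += r;
        return sum / relativeReward.length;
    }

    static int speed(boolean[] truthful) {                           // T_i: first tick with a truthful report
        for (int t = 0; t < truthful.length; t++) if (truthful[t]) return t;
        return truthful.length;                                      // never learned within this run
    }

    static double stability(boolean[] truthful, int firstTruthful) { // E_i: share of ticks in equilibrium after T_i
        int inEquilibrium = 0, remaining = truthful.length - firstTruthful;
        for (int t = firstTruthful; t < truthful.length; t++) if (truthful[t]) inEquilibrium++;
        return remaining == 0 ? 0.0 : (double) inEquilibrium / remaining;
    }

    public static void main(String[] args) {
        double[] reward = {0.4, 0.6, 0.9, 1.0, 1.0};                 // toy tick-level data
        boolean[] truthful = {false, false, true, true, true};
        int t = speed(truthful);
        System.out.printf("R = %.2f, T = %d, E = %.2f%n", quality(reward), t, stability(truthful, t));
    }
}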


Table 5 Definition of factors, factor level ranges and response variables

Classification of variables    Table of factors         Factor level ranges
Independent variables          Factors
  Learning algorithm             Learning algorithm     {zero intell., reinforc. learn., exp. weight. attract.}
                                 Ticks                  N
Control variables              Control variables
  Strategy set                   Strategy set           {1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1}
  Real productivity              Real productivity      x ∈ strategy_set (random, uniformly distributed probability)
  Report of opponent             Report of opponent     x ∈ strategy_set (random, uniformly distributed probability)
  Ticks
Dependent variables            Response variables
  Quality of learning            R (Quality of learning)
  Speed of learning              T (Speed of learning)
  Stability of learning          E (Stability of learning)

The factor levels, in turn, are defined by a combination of values. These learning parameters form substructures and have to be calibrated for the given model scenario in order to analyze their performance. Thus, the factor learning_algorithm is a complex factor according to the description given in Sect. 3.3. To structure the experiment and decrease the complexity of the analysis, the concept of cascaded DOE is applied (see Fig. 5). Figure 5 shows the three Sub-DOE: "Zero Intelligence", "Reinforcement Learning" and "Experience-Weighted Attraction" (EWA). They allow us to identify the best factor level combinations in terms of learning performance for the given model scenario. Once the optimal algorithm calibration is found for every Sub-DOE, the performance values of the calibrated learning algorithms are compared. This evaluation of the Top-Level-DOE represents the final step towards answering the research question. We use the Sub-DOE Experience-Weighted Attraction to illustrate the subsequent process stages.

A.2.1 Sub-DOE EWA – 1st iteration

The sub-experiment EWA aims at identifying the best performance of the learning algorithm. To this end, the analysis focuses on how the listed learning parameters impact quality, speed and stability of learning. A Sub-DOE contains not only all sub-factors of one complex factor level, i.e. its learning parameters; it also comprises all other factors from the Top-Level-DOE in order to test for interactions. In the case of EWA, ticks represent the only factor from the Top-Level-DOE. Performing the simulation experiment requires (IV) an appropriate factorial design. The learning parameters are calibrated by testing their factor level ranges for performance in a factor screening process.

Fig. 5 Cascaded DOE

Table 6 Factor levels for the 2^k factorial design

Factors    Factor level range    Factor levels     Representation
ρ          ∈ [0, 1]              {0.15, 0.85}      {−, +}
σ          ∈ [0, 1]              {0.15, 0.85}      {−, +}
φ2         ∈ [0, 1]              {0.15, 0.85}      {−, +}
λ          ∈ [0, 1]              {0.15, 0.85}      {−, +}
Ticks      N                     {1,000, 5,000}    {−, +}

Table 7 Preliminary design points for the estimation of error variance

Design point    ρ       σ       φ2      λ       Ticks    Definition by
1. VA           0.15    0.15    0.15    0.15    1,000    Low factor levels
2. VA           0.5     0.5     0.5     0.5     1,000    Mean factor levels
3. VA           0.85    0.85    0.85    0.85    5,000    High factor levels

The 2^k factorial design is a suitable design for factor screening. It provides insight into factor effects with respect to their strength and direction, and it determines the best performing factor levels in a stepwise approach (Box et al. 2005). Two values per factor are analyzed in a 2^k factorial design. Table 6 presents the selected factor level ranges and levels for Sub-DOE EWA. As shown in Table 6, one high and one low value per factor define the starting point. Within the experiment, every possible factor level combination is investigated.

Before the simulation experiment can be performed, it has to be determined how many runs are necessary per setting. The agent's decision-making process contains stochastic elements, as strategies are randomly chosen with a given probability in every step.33 Therefore, the simulation model has to be run several times for each parameter setting to achieve meaningful results. Estimating the (V) experimental error variance constitutes a first approach to defining the required number of runs. Within the analysis of the experimental error variance, the mean and the coefficient of variation are calculated for an increasing number of runs and all response variables. This means that a series of pre-experimental simulation runs needs to be performed with specified parameter settings. To gain a better understanding of the sample model, sets of extreme and mean design points from the factorial design serve as preliminary design points (see Table 7).

The goal of the pre-experimental simulations is to assess the experimental error variance. To support the analysis, an error variance matrix is used that structures the recorded response values. Table 8 shows the error variance matrix of the sample model; it includes the mean value and the coefficient of variation for each of the three response variables R, T and E. The matrix is filled with the output data from all pre-experiments and contains the observed mean values and coefficients of variation for every response variable, preliminary design point and defined set of runs.

33 The probability is defined and updated by the respective learning algorithm.
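The pre-experimental analysis can be sketched as follows: for one fixed design point, the mean and the coefficient of variation of a response are computed for an increasing number of runs, and the number of runs is considered sufficient once the coefficient stops changing. The single-run response is stubbed with arbitrary numbers here; in the sample model it would be one complete simulation run.

import java.util.Random;

// Sketch of the error variance estimation for one design point.
public class ErrorVarianceCheck {

    static final Random RNG = new Random();

    // Placeholder for one simulation run at a fixed design point.
    static double simulateOnce() {
        return 0.70 + 0.02 * RNG.nextGaussian();
    }

    public static void main(String[] args) {
        int[] runCounts = {10, 100, 500, 1000, 5000, 10000, 20000, 40000};
        for (int n : runCounts) {
            double sum = 0, sumSq = 0;
            for (int i = 0; i < n; i++) { double x = simulateOnce(); sum += x; sumSq += x * x; }
            double mean = sum / n;
            double variance = sumSq / n - mean * mean;
            double cv = Math.sqrt(Math.max(variance, 0)) / mean;   // coefficient of variation
            System.out.printf("n = %6d  mean = %.3f  cv = %.3f%n", n, mean, cv);
        }
    }
}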


Table 8 Error variance matrix

                                         Number of runs
                                         10      100     500     1,000   5,000   10,000  20,000  40,000  60,000  80,000
Design point 1. VA
R (Quality of learning)    Mean          0.70    0.70    0.70    0.70    0.70    0.70    0.70    0.70    0.70    0.70
                           Coeff. var.   0.02    0.02    0.02    0.02    0.02    0.02    0.02    0.02    0.02    0.02
T (Speed of learning)      Mean          361.90  524.01  500.26  498.88  502.03  503.25  503.94  502.71  503.11  502.25
                           Coeff. var.   0.28    0.34    0.38    0.37    0.38    0.37    0.37    0.37    0.37    0.37
E (Stability of learning)  Mean          0.01    0.01    0.01    0.01    0.01    0.01    0.01    0.01    0.01    0.01
                           Coeff. var.   0.67    0.93    0.93    0.94    0.95    0.95    0.95    0.95    0.95    0.95

Design point 2. VA
R (Quality of learning)    Mean          0.66    0.66    0.66    0.66    0.66    0.66    0.66    0.66    0.66    0.66
                           Coeff. var.   0.02    0.02    0.02    0.02    0.02    0.02    0.02    0.02    0.02    0.02
T (Speed of learning)      Mean          400.30  522.90  525.89  517.81  513.88  516.75  517.92  515.27  515.70  518.83
                           Coeff. var.   0.33    0.37    0.37    0.38    0.38    0.37    0.37    0.37    0.37    0.37
E (Stability of learning)  Mean          0.01    0.01    0.01    0.01    0.01    0.01    0.01    0.01    0.01    0.01
                           Coeff. var.   0.73    0.94    1.09    1.03    1.01    1.01    1.01    1.01    1.01    1.02

Design point 3. VA
R (Quality of learning)    Mean          0.66    0.67    0.66    0.67    0.67    0.67    0.70    0.67    0.66    0.66
                           Coeff. var.   0.03    0.02    0.03    0.03    0.03    0.03    0.02    0.03    0.03    0.03
T (Speed of learning)      Mean          514.70  513.45  488.99  515.62  515.46  513.27  503.94  515.90  516.46  515.71
                           Coeff. var.   0.25    0.32    0.40    0.37    0.37    0.37    0.37    0.37    0.37    0.37
E (Stability of learning)  Mean          0.01    0.01    0.01    0.01    0.01    0.01    0.01    0.01    0.01    0.01
                           Coeff. var.   1.00    0.96    0.91    0.97    1.01    1.02    0.96    1.01    1.01    1.01

The values in the error variance matrix offer a first approximation of the required number of runs, as they allow for identifying the point from which the coefficient of variation no longer changes as the number of runs increases. The table for our sample model shows that the variance stabilizes after 40,000 runs at the latest. Hence, all simulation outputs will be calculated with over 40,000 runs per design point for the following analysis.

Upon completing the required number of runs for the pre-experimental simulations, (VI) the simulation experiment is performed. Its design points are based on the factorial design (IV). In the sample model, two factor levels per factor and five factors specify the 32 (2^5) parameter combinations for the simulation experiment.
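A sketch of how the 32 design points could be enumerated and evaluated is given below. The factor order and levels follow Table 6; the single-run response is a placeholder, and the number of replications corresponds to the result of the error variance analysis.

// Sketch of step (VI): enumerating all 2^5 design points and averaging the response.
public class FullFactorialSketch {

    static final double[][] LEVELS = {     // {low, high} per factor: ticks, rho, sigma, phi2, lambda
        {1000, 5000}, {0.15, 0.85}, {0.15, 0.85}, {0.15, 0.85}, {0.15, 0.85}
    };

    static double runOnce(double[] setting) { return Math.random(); }  // placeholder response R

    public static void main(String[] args) {
        int k = LEVELS.length, replications = 40000;
        for (int dp = 0; dp < (1 << k); dp++) {
            double[] setting = new double[k];
            StringBuilder signs = new StringBuilder();
            for (int f = 0; f < k; f++) {
                int high = (dp >> (k - 1 - f)) & 1;          // leftmost factor varies slowest
                setting[f] = LEVELS[f][high];
                signs.append(high == 1 ? '+' : '-');
            }
            double sum = 0;
            for (int i = 0; i < replications; i++) sum += runOnce(setting);
            System.out.printf("dp %2d  %s  R = %.3f%n", dp + 1, signs, sum / replications);
        }
    }
}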

Table 9 Design matrix, 2^k design, Sub-DOE EWA

Design point    Ticks   ρ    σ    φ2   λ       R       T         E
1               −       −    −    −    −       0.705   456.487   0.012
2               −       −    −    −    +       0.731   181.767   0.052
3               −       −    −    +    −       0.734   364.125   0.022
4               −       −    −    +    +       0.783   207.305   0.104
5               −       −    +    −    −       0.705   457.304   0.012
6               −       −    +    −    +       0.731   181.033   0.052
7               −       −    +    +    −       0.734   367.212   0.022
8               −       −    +    +    +       0.784   206.150   0.105
9               −       +    −    −    −       0.699   512.463   0.009
10              −       +    −    −    +       0.706   461.628   0.012
11              −       +    −    +    −       0.705   484.552   0.011
12              −       +    −    +    +       0.735   362.173   0.022
13              −       +    +    −    −       0.699   509.315   0.009
14              −       +    +    −    +       0.705   461.081   0.012
15              −       +    +    +    −       0.705   484.614   0.010
16              −       +    +    +    +       0.735   361.748   0.022
17              +       −    −    −    −       0.705   462.672   0.018
18              +       −    −    −    +       0.732   182.305   0.053
19              +       −    −    +    −       0.737   368.752   0.025
20              +       −    −    +    +       0.800   222.256   0.109
21              +       −    +    −    −       0.705   458.574   0.018
22              +       −    +    −    +       0.732   180.379   0.053
23              +       −    +    +    −       0.737   367.273   0.025
24              +       −    +    +    +       0.799   226.051   0.109
25              +       +    −    −    −       0.699   512.191   0.016
26              +       +    −    −    +       0.705   459.364   0.018
27              +       +    −    +    −       0.705   483.745   0.017
28              +       +    −    +    +       0.737   361.640   0.025
29              +       +    +    −    −       0.699   517.674   0.016
30              +       +    +    −    +       0.706   459.273   0.018
31              +       +    +    +    −       0.705   490.982   0.017
32              +       +    +    +    +       0.737   358.334   0.025

Table 9 depicts the design matrix. The experiment's simulation output provides the response variable values of R, T and E. The design matrix structures the simulation experiment and contains all factor level combinations and the simulation responses (Law 2007). The factor levels in a 2^k factorial design matrix are represented by the symbols '+' and '−'. The response values are calculated per design point based on the simulation output. For each run i, the value Ri is calculated as the average reward the agent receives within one simulation run.

Table 10 Analysis of variance results for main effects and interaction effects

Variable              df        MS       F
Main effect of ρ      1         3.520    8,241.306 ***
Main effect of σ      1         0.001    1.347
Main effect of φ2     1         4.330    10,138.393 ***
Main effect of λ      1         3.627    8,492.063 ***
ρ × σ                 1         0.000    0.281
ρ × φ2                1         0.759    1,777.370 ***
ρ × λ                 1         0.508    1,190.631 ***
σ × φ2                1         0.000    0.676
σ × λ                 1         0.000    0.460
φ2 × λ                1         0.823    1,926.075 ***
ρ × σ × φ2            1         0.000    0.684
ρ × σ × λ             1         0.001    2.631
ρ × φ2 × λ            1         0.000    0.000
σ × φ2 × λ            1         0.000    0.001
ρ × σ × φ2 × λ        1         0.001    2.504
Error                 15,984    0.000

*** p < .001

Ti represents the time step at which the agent reaches the desired equilibrium in run i for the first time. Ei captures for how long the agent's behavior displays the desired equilibrium in run i after Ti. The values R, T and E are average values over 40,000 runs, i.e. over the required number of runs defined by the error variance analysis (V).

Next, the (VII) effect analysis of all factors is conducted for the simulation data. For a systematic analysis of the factor effects, the analysis proceeds in two steps. First, the factors are tested for significant effects on the simulation response. Second, the strength of all identified significant factor effects is calculated.

For the first step, testing the factor effects for significance, a factorial ANOVA is performed. Table 10 shows the output of the "Tests of Between-Subjects Effects" for the sample model; it is the main factorial ANOVA table. Before evaluating the significance of the single factor effects, the information of the factorial ANOVA can be used to assess the overall effect of the experimental manipulation on the results. R^2 indicates whether the identified factor effects explain more experimental variation than the unsystematic error variance caused by stochastic elements in the model. The R^2 value is 0.665 in the given model. Hence, more variance is caused by systematic variation than by experimental error in the model. The significance of the factor effects can be read from the factorial ANOVA table. The factors ρ, φ2 and λ have significant effects in the sample model, whereas σ has no significant effect on the response value R. Significant 2-factor interaction effects can be identified between all significant factors ρ, φ2 and λ.

In the second step of the effect analysis, the average effect strengths of the identified factors are calculated (for details see Law 2007). The main effect of a factor is the average change in the response triggered by changing its factor level alone (Law 2007). The strength of all identified significant 2-factor interaction effects is also calculated (Law 2007).
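The computation of effect strengths from a design matrix with coded factor levels can be sketched as follows: the effect of a factor (or of the product of two factors, for an interaction) is the difference between the mean response at '+' and the mean response at '−'. The toy example below uses only design points 1–4 of Table 9, i.e. the combinations of φ2 and λ with all other factors at their low levels, so the resulting numbers differ from the averages over all 32 design points reported in Table 11.

// Sketch of the effect-strength estimation from a coded design matrix.
public class EffectEstimation {

    // codes[i][f] is -1 or +1 for design point i and factor f; r[i] is the response.
    static double effect(int[][] codes, double[] r, int... factors) {
        double plus = 0, minus = 0;
        int nPlus = 0, nMinus = 0;
        for (int i = 0; i < r.length; i++) {
            int sign = 1;
            for (int f : factors) sign *= codes[i][f];
            if (sign > 0) { plus += r[i]; nPlus++; } else { minus += r[i]; nMinus++; }
        }
        return plus / nPlus - minus / nMinus;
    }

    public static void main(String[] args) {
        // Columns: phi2, lambda; responses R of design points 1-4 in Table 9.
        int[][] codes = { {-1, -1}, {-1, +1}, {+1, -1}, {+1, +1} };
        double[] r = {0.705, 0.731, 0.734, 0.783};
        System.out.printf("main effect phi2          = %.3f%n", effect(codes, r, 0));
        System.out.printf("main effect lambda        = %.3f%n", effect(codes, r, 1));
        System.out.printf("interaction phi2 x lambda = %.3f%n", effect(codes, r, 0, 1));
    }
}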

Table 11 (Preliminary) effect matrix, response variable R

Factors    Ticks     ρ         σ        φ2       λ
Ticks      0.003    −0.002     0.000    0.003    0.002
ρ                   −0.030     0.000   −0.013    0.000
σ                              0.000    0.000    0.000
φ2                                      0.032    0.014
λ                                                0.030

In our example, φ2 has a positive effect of 0.014 on the effect of λ on R. Using an effect matrix for every response value helps to organize the calculated effect strengths. The effect matrix for the response variable R (see Table 11) contains all factor effects and 2-factor interaction effects on R. The main effects are listed at the crossing points (m, m) and the interaction effects between two factors at the points (m, n) of the effect matrix.

The effect matrix allows for interpreting the model behavior from the perspective of the given factors and the response variable. It becomes evident from Table 11 that the high factor levels of the learning parameters λ and φ2 combined with the low factor level of ρ facilitate a higher learning quality. The factor levels of ticks have only a small effect on R. As the simulation development does not change between 1,000 and 5,000 ticks within one simulation run, a run of 1,000 ticks is sufficient for our sample. Additionally, σ does not appear to have any influence. The parameter σ in EWA calibrates the hypothetical influence from having observed other agents' experiences. This hypothetical influence does not take effect and, consequently, σ has no impact on the outcome. Given these results, the factors ticks and σ can be fixed as control variables for the further analysis.

At this point, the analysis process is pursued iteratively. The analysis can be continued by running more simulation experiments with further factor level combinations to find the best possible learning performance of EWA for the given model scenario.

A.2.2 Sub-DOE EWA – 2nd iteration

The starting point is once again (IV) selecting an appropriate factorial design. New factor levels around the previously successful levels are defined for the next iteration of the simulation experiment, with the ultimate goal of identifying the most successful factor levels in the factor screening process. The result is shown in Table 12. According to the effect matrix (see above, step VII), the factors φ2 and λ affected R positively. Therefore, new factor levels for φ2 and λ are defined around the former high factor level ('+' = 0.85). As the factor ρ had a negative effect on R, its new factor levels are specified around the former low factor level ('−' = 0.15). The factors ticks and σ had a small or no effect on the learning quality. They can be fixed as control variables for the next iteration with the factor levels of 1,000 and 0.15, respectively.
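The zoom-in on new factor levels can be expressed as a small helper: depending on the sign of a factor's effect, the next pair of levels is centred on the previous high or low level. The step width of 0.10 simply reproduces the move from {0.15, 0.85} to {0.05, 0.25} and {0.75, 0.95} in this iteration; it is not a general rule.

// Sketch of the zoom-in step that derives the next pair of factor levels.
public class NextFactorLevels {
    static double[] nextLevels(double low, double high, double effect, double step) {
        double centre = effect >= 0 ? high : low;   // follow the direction of improvement
        return new double[] { centre - step, centre + step };
    }

    public static void main(String[] args) {
        double[] rho  = nextLevels(0.15, 0.85, -0.030, 0.10);  // -> {0.05, 0.25}
        double[] phi2 = nextLevels(0.15, 0.85,  0.032, 0.10);  // -> {0.75, 0.95}
        System.out.printf("rho:  {%.2f, %.2f}%n", rho[0], rho[1]);
        System.out.printf("phi2: {%.2f, %.2f}%n", phi2[0], phi2[1]);
    }
}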

Table 12 Factor levels for the 2^k factorial design (2)

Factors    Factor level range    Factor levels    Representation
ρ          ∈ [0, 1]              {0.05, 0.25}     {−, +}
σ          ∈ [0, 1]              0.15             Control variable
φ2         ∈ [0, 1]              {0.75, 0.95}     {−, +}
λ          ∈ [0, 1]              {0.75, 0.95}     {−, +}
Ticks      N                     1,000            Control variable

Table 13 Design matrix (2)

Design point    ρ    φ2   λ       R       T         E
1               −    −    −       0.771   197.231   0.082
2               −    −    +       0.774   184.363   0.095
3               −    +    −       0.799   238.113   0.143
4               −    +    +       0.797   221.682   0.149
5               +    −    −       0.766   213.288   0.069
6               +    −    +       0.771   197.715   0.082
7               +    +    −       0.801   253.970   0.134
8               +    +    +       0.799   236.818   0.143

Table 14 Effect matrix (2)

Factors    ρ         φ2        λ
ρ          −0.001    0.003     0.000
φ2                   0.029    −0.003
λ                              0.001

The combinations of the new factor levels yield a total of eight (2^3) new design points. They define the (VI) simulation experiment as shown in the new design matrix (Table 13). The values depicted in the design matrix once again enable the two-step process outlined above for analyzing all factor effects and their effect strengths. The effect matrix captures the outcome of the effect analysis (see Table 14). The high factor level of φ2 stands out with its positive effect on R. Thus, a further analysis of factor φ2 around the high factor level ('+' = 0.95) might lead to a better learning performance than R = 0.801, which is the best value observed so far (see Table 13, design point 7). The factors ρ and λ have no effect on R within the given factor level interval. Hence, they are fixed as control variables for the next analysis. Factor ρ is set at the mean of its low factor levels (0.15) and factor λ at the mean of its high factor levels (0.85).


Table 15 Factorial design (3)

Factors    Factor level range    Factor levels           Representation
ρ          ∈ [0, 1]              0.15                    Control variable
σ          ∈ [0, 1]              0.15                    Control variable
φ2         ∈ [0, 1]              {0.9, 0.91, ..., 1}     {0.9, 0.91, ..., 1}
λ          ∈ [0, 1]              0.85                    Control variable
Ticks      N                     1,000                   Control variable

Table 16 Analysis of φ2

Design point    φ2      R       T         E
1               0.9     0.791   218.379   0.120
2               0.91    0.792   220.468   0.123
3               0.92    0.794   220.871   0.128
4               0.93    0.795   225.274   0.133
5               0.94    0.798   227.638   0.138
6               0.95    0.799   236.125   0.143
7               0.96    0.802   244.362   0.151
8               0.97    0.804   258.794   0.159
9               0.98    0.808   280.879   0.173
10              0.99    0.814   324.725   0.196
11              1       0.837   518.486   0.248

Table 17 Final parameter combination ("best performance")

Final design point    Ticks    ρ       σ       φ2    λ       R       T         E
Best performance      1,000    0.15    0.15    1     0.85    0.837   518.486   0.248

A.2.3 Sub-DOE EWA – 3rd iteration

As explained before, only factor φ2 had an effect on the response variable R in the previous iteration of the simulation experiment. Therefore, new factor levels are defined only for φ2 in the subsequent analysis. Ten new factor levels around the former successful high factor level of φ2 are chosen (see Table 15). In the given scenario, 1.0 emerges as the best factor level for φ2, with R = 0.837 (see Table 16). This learning parameter combination offers the best learning quality for the learning algorithm EWA. The final step of the factor screening is to set the factor levels of all sub-factors to the combination with the highest measured quality of learning for the learning algorithm EWA (see Table 17). The response variable values of this parameter combination will serve as the performance value for EWA in the Top-level DOE. All Sub-DOE of our sample model have been analyzed in the described way.

Table 18 Design matrix, top-level DOE

Design point    Learning algorithm                R       T         E
1               Zero intelligence                 0.698   517.999   0.015
2               Reinforcement learning            0.837   414.238   0.203
3               Experience-weighted attraction    0.837   518.486   0.248

Table 19 Classification of variables, top-level DOE

Independent variables    Control variables       Dependent variables
Learning algorithm       Strategy set            Quality
                         Real productivity       Speed
                         Report of opponent      Stability
                         Ticks                   ... of learning truthful reporting

On this basis, the calibrated sub-factors with the best performance can be entered in the Top-level DOE as the factor level value for the respective learning algorithm.

A.2.4 Top-level DOE

The Sub-DOE revealed the best learning performances for the single learning algorithms. The Top-level DOE now allows for comparing them against each other (see Table 18). The subsequent iteration of the experiment at the top level has a very simple set-up. The experimental design comprises three (= 3^1) design points that are defined by one factor (learning_algorithm) with three factor levels (Zero Intelligence, Reinforcement Learning, Experience-Weighted Attraction). The factor levels make it possible to compare the learning algorithms with respect to their performance. The learning algorithms Reinforcement Learning and Experience-Weighted Attraction are shown to result in the same learning performance. Neither of them enables the agent to adopt the intended strategy, truthful reporting, as its only strategy. The value of R shows that the agent achieves only approximately 84% of the maximum payoff that could be obtained by always reporting truthfully.

A.3 DOE and effective result communication

Describing the analysis process in as much detail as in the previous section would overload the communication of the experimental results. That is why we outline next how the introduced output formats can be applied to communicate simulation results efficiently. Basic aspects of the research question can easily be communicated using the classification of variables (see Tables 19 and 20).

Table 20 Classification of variables, sub-DOE EWA

Independent variables    Control variables       Response variables
ρ                        Strategy set            R (Quality of learning)
σ                        Real productivity       T (Speed of learning)
φ2                       Report of opponent      E (Stability of learning)
λ                        Ticks

Fig. 6 Graphical representation of a 2-factor-interaction-effect

For our sample model, the classification of variables (Table 19) highlights that the research focuses on a comparative analysis of the effects that different cognitive abilities have on quality, speed and stability of learning truthful reporting (dependent variables). The different cognitive abilities are represented by different learning algorithms (independent variables). Besides the overall differences between learning algorithms, their respective optimal configuration is of interest. Explaining the (optimal) behavior of complex learning algorithms can become a tedious task, even when combining sensitivity analyses in a clever way (Arifovic and Ledyard 2004). Faced with complexity, Sub-DOE can offer the desired clarity.34 We examined the Sub-DOE EWA in detail for our sample model. From the classification of variables (Table 20), one can see that we analyze the effect of the EWA learning parameters ρ, λ, σ and φ2 on quality, speed and stability of learning. The analysis was conducted in order to find the parameter values that result in the best performance of the learning algorithm (exploratory/optimizing approach).

The effect matrix shows all factor and interaction effects. It enables us to conclude that the factors λ and φ2 have a positive effect, factor ρ has a negative effect and factor σ has no impact on the quality of learning. Consequently, high factor levels of λ and φ2 and low factor levels of ρ support a higher value of R and, thus, a better learning performance, whereas factor σ has no effect on the learning performance.

34 Depending on the relevance of these challenges, the results can be reported in the main body of the text or in the appendix/attachment.

However, the effect matrix does not only provide a compressed, albeit understandable, explanation of the simulation results; it also denotes the 2-factor interaction effects. There are other ways to depict the 2-factor interaction effects in more detail besides the effect matrix. Figure 6 exemplifies one possibility to expand on the preliminary insight given by the effect matrix. The graphical representation helps to further understand the interaction effects and their values because it clearly indicates the direction and size of each effect. In our example, a high value of λ increases the effect of φ2 on the quality of learning, as visualized by the non-parallel lines in the chart. Other outputs, such as the comprehensive effect matrix, can be provided in the appendix of a paper.
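The data behind an interaction plot such as Fig. 6 are simply the mean responses per combination of the two factor levels; non-parallel lines across these cell means indicate an interaction. The following sketch computes such cell means for a toy 2^2 example that reuses the responses of design points 1–4 of Table 9; it is not the code that produced Fig. 6.

// Sketch: cell means behind an interaction plot for two factors.
public class InteractionCells {
    public static void main(String[] args) {
        int[][] codes = { {-1, -1}, {-1, +1}, {+1, -1}, {+1, +1} };   // columns: phi2, lambda
        double[] r = {0.705, 0.731, 0.734, 0.783};
        double[][] cellSum = new double[2][2];
        int[][] cellN = new int[2][2];
        for (int i = 0; i < r.length; i++) {
            int a = codes[i][0] > 0 ? 1 : 0, b = codes[i][1] > 0 ? 1 : 0;
            cellSum[a][b] += r[i];
            cellN[a][b]++;
        }
        for (int a = 0; a < 2; a++)
            for (int b = 0; b < 2; b++)
                System.out.printf("phi2 %s, lambda %s: mean R = %.3f%n",
                        a == 1 ? "+" : "-", b == 1 ? "+" : "-", cellSum[a][b] / cellN[a][b]);
    }
}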

Appendix B: Glossary

Independent variable: A model variable whose variations affect the model outcome.
Dependent variable: Representation of the model outcome.
Factor: Experimental term for an independent variable in an experiment.
Factor level: Potential factor values (quantitative or qualitative).
Control variable: Variables with no relevant effect on the model/experiment outcome and/or not of interest with respect to the research question under investigation; therefore fixed to one value or distribution within the experiment (see nuisance factors).
Response variable: Measure for the experiment outcome.
Input parameter: Technical term for a factor, control variable or technical simulation input within the simulation program code.
Output parameter: Technical term for a response variable or technical simulation output within the simulation program code.
Nuisance factors: Existing factors within the simulation experiment which are not relevant for the experimenter, but which have an effect on the experiment outcome. Nuisance factors can be controllable (as control variables), uncontrollable (see noise factors) or even unknown.
Noise factors: Uncontrollable nuisance factors within the process, which can be controlled within the simulation experiment by controlled happenstance, e.g. by probability distributions and/or common random numbers.
Experimental design: Experimental roadmap to plan the simulation experiment. Definition of how the settings for the experiment have to be configured in order to collect appropriate data.
Factorial design: Important experimental design technique to determine the factor levels to be systematically analyzed within the experiment.
Design points: Factor level combinations specified by the factorial design, which determine the simulation settings for performing the simulation experiment.
Design matrix: Format to structure the design points and measured response variable values within the experiment.
Experimental error: Variability in the response variable values between simulation runs with the same design point specification, caused by (hidden or known) nuisance factors in the simulation model.
Error variance matrix: Format to structure the analysis for the estimation of the experimental error variance.

Definitions based on Kleijnen (1998, 2008), Wu and Hamada (2000), Law (2007) and Montgomery (2009).

References

Antony J (2003) Design of experiments for engineers and scientists. Butterworth-Heinemann, Amsterdam
Arifovic J, Ledyard J (2004) Scaling up learning models in public good games. J Public Econ Theory 6:203–238
Arnold M, Ponick E, Schenk-Mathes H (2008) Groves mechanism vs. profit sharing for corporate budgeting: An experimental analysis with preplay communication. Eur Account Rev 17:37–63
Axelrod R (1997) Advancing the art of simulation in the social sciences. In: Conte R, Hegselmann R, Terna P (eds) Simulating social phenomena. Springer, Berlin, pp 21–40
Barton R (2004) Designing simulation experiments. In: Proceedings of the winter simulation conference, Washington DC
Bettonvil B, Kleijnen JPC (1996) Searching for important factors in simulation models with many factors: Sequential bifurcation. Eur J Oper Res 96:180–194
Box GEP, Hunter JS, Hunter WG (2005) Statistics for experimenters: design, innovation, and discovery. Wiley-Interscience, Hoboken
Camerer C, Ho T (1999) Experience-weighted attraction learning in normal form games. Econometrica 67:827–874
Dutta P (1999) Strategies and games: theory and practice. MIT Press, Cambridge
Ewert R, Wagenhofer A (2008) Interne Unternehmensrechnung. Springer, Berlin
Field A, Hole G (2003) How to design and report experiments. SAGE, London
Fisher RA (1971) The design of experiments. Hafner, New York
Gilbert N, Troitzsch K (2005) Simulation for the social scientist. Open University Press, Maidenhead
Gilbert N (2008) Agent-based models. SAGE, London
Gode DK, Sunder S (1997) What makes markets allocationally efficient? Q J Econ 112:602
Grimm V, Berger U, Bastiansen F, Eliassen S, Ginot V, Giske J, Goss-Custard J, Grand T, Heinz SK, Huse G, Huth A, Jepsen JU, Jørgensen C, Mooij WM, Müller B, Pe'er G, Piou C, Railsback SF, Robbins AM, Robbins MM, Rossmanith E, Rüger N, Strand E, Souissi S, Stillman RA, Vabø R, Visser U, DeAngelis DL (2006) A standard protocol for describing individual-based and agent-based models. Ecol Model 198:115–126
Grimm V, Berger U, DeAngelis DL, Polhill G, Giske J, Railsback SF (2010) The ODD protocol: a review and first update. Ecol Model 221:2760–2768
Groves T (1973) Incentives in teams. Econometrica 41:617–631
Groves T, Loeb M (1979) Incentives in a divisionalized firm. Manag Sci 25:221–230
Harrison J, Lin Z, Carroll G, Carley K (2007) Simulation modeling in organizational and management research. Acad Manag Rev 32:1229–1245
Heine B-O, Meyer M, Strangfeld O (2005) Stylised facts and the contribution of simulation to the economic analysis of budgeting. J Artif Soc Simul 8(4)
Hendricks W, Robey K (1936) The sampling distribution of the coefficient of variation. Ann Math Stat 7:129–132
Homburg C, Klarmann M (2003) Empirische Controllingforschung – Anmerkungen aus der Perspektive des Marketing. In: Weber J, Hirsch B (eds) Zur Zukunft der Controllingforschung, Wiesbaden, pp 65–88
Kelton WD, Barton RR (2003) Experimental design for simulation. In: Proceedings of the 2003 winter simulation conference, New Orleans, Louisiana
Kleijnen JPC (1998) Experimental design for sensitivity analysis, optimization, and validation of simulation models. In: Banks J (ed) Handbook of simulation: principles, methodology, advances, applications and practice, New York, pp 173–224
Kleijnen JPC (2008) Design and analysis of simulation experiments. Springer, New York
Kleijnen JPC, Sanchez S, Lucas T, Cioppa T (2005) A user's guide to the brave new world of designing simulation experiments. INFORMS J Comput 17:263–289
Law A (2007) Simulation modeling and analysis. McGraw-Hill, Boston
Malerba F, Nelson R, Orsenigo L, Winter S (1999) History-friendly models of industry evolution: the computer industry. Ind Corp Change 8(1):3–40
Manuj I, Mentzer JT, Bowers MR (2009) Improving the rigor of discrete-event simulation in logistics and supply chain research. Int J Phys Distrib Logist Manag 39:172–201
Mårtensson A (2003) Managing mission-critical IT in the financial industry. Economic Research Institute, Stockholm School of Economics
Mårtensson A, Mårtensson P (2007) Extending rigor and relevance: towards credible, contributory and communicable research. In: Proceedings of the 15th European conference on information systems, St Gallen
Montgomery DC (2009) Design and analysis of experiments. Wiley, Hoboken
Myerson R (1999) Nash equilibrium and the history of economic theory. J Econ Lit 37:1067–1082
North MJ, Macal CM (2007) Managing business complexity: discovering strategic solutions with agent-based modeling and simulation. Oxford University Press, Oxford
Oh RPT, Sanchez SM, Lucas TW, Wan H, Nissen ME (2009) Efficient experimental design tools for exploring large simulation models. Comput Math Organ Theory 15:237–257
Peck SL (2004) Simulation as experiment: a philosophical reassessment for biological modeling. Trends Ecol Evol 19:530–534
Polhill JG, Parker D, Brown D, Grimm V (2008) Using the ODD protocol for describing three agent-based social simulation models of land-use change. J Artif Soc Simul 11(2). http://jasss.soc.surrey.ac.uk/11/2/3.html
Raghu TS, Sen PK, Rao HR (2003) Relative performance of incentive mechanisms: computational modeling and simulation of delegated investment decisions. Manag Sci 49:160–178
Railsback SF, Grimm V (2011) Agent-based and individual-based modeling: a practical introduction. Princeton University Press, Princeton
Reiss J (2011) A plea for (good) simulations: Nudging economics toward an experimental science. Simul Games 42:243–264
Richiardi M, Leombruni R, Saam NJ, Sonnessa M (2006) A common protocol for agent-based social simulation. J Artif Soc Simul 9(1). http://jasss.soc.surrey.ac.uk/9/1/15.html
Saltelli A, Chan K, Scott EM (2000) Sensitivity analysis. Wiley, New York
Saltelli A, Tarantola S, Campolongo F, Ratto M (2004) Sensitivity analysis in practice. Wiley, Chichester
Sanchez SM (2006) Work smarter, not harder: Guidelines for designing simulation experiments. In: Proceedings of the 2006 winter simulation conference, Monterey
Sarin R, Vahid F (1999) Payoff assessments without probabilities: a simple dynamic model of choice. Games Econ Behav 28:294–309
Schmolke A, Thorbek P, DeAngelis DL, Grimm V (2010) Ecological modeling supporting environmental decision making: a strategy for the future. Trends Ecol Evol 25:479–486
Siebertz K, Bebber Dv, Hochkirchen T (2010) Statistische Versuchsplanung: Design of Experiments (DoE). Springer, Heidelberg
Simon H (1973) The organization of complex systems. In: Pattee HH (ed) Hierarchy theory: the challenge of complex systems. Braziller, New York, pp 1–27
Taber CS, Timpone RJ (1996) Computational modeling. SAGE, Thousand Oaks
Trocine L, Malone LC (2001) An overview of newer, advanced screening methods for the initial phase in an experimental design. In: Proceedings of the 2001 winter simulation conference, Arlington
Wu CFJ, Hamada M (2000) Experiments: planning, analysis, and parameter design optimization. Wiley, New York

Iris Lorscheid is a research assistant and doctoral student at the Institute of Management Control and Accounting at Hamburg University of Technology. She graduated in 2006 in computer science from the University of Koblenz-Landau and worked from 2008 to 2009 as a research associate for the research group of Modeling and Simulation at the University of Koblenz-Landau, Institute of Information Systems Research. Her research interests are methodological issues of simulation, in particular design of experiments and learning algorithms for agent-based simulation.

Bernd-Oliver Heine is a case team leader with Bain & Company in Germany as well as a visiting scholar at the Institute of Management Control and Accounting at Hamburg University of Technology. He received his Ph.D. from WHU – Otto Beisheim School of Management in Koblenz, Germany, with a dissertation on the use of information in management control and agent-based simulation. He studied business administration and economics. His research interests include management control, computer simulation, and methodology.

Matthias Meyer is professor of management control and accounting and director of the Institute of Management Control and Accounting at Hamburg University of Technology. He has a habilitation degree from WHU – Otto Beisheim School of Management in Koblenz and received his Ph.D. in 2003 from the Ludwig-Maximilians-University in Munich, Germany, with a dissertation on principal agent theory and methodology. He studied business administration, economics, philosophy and philosophy of science. His research interests include computer simulation, management control and accounting, institutional economics and methodology.