Visualization and rule validation in human-behavior representation

Simulation & Gaming http://sag.sagepub.com

Visualization and rule validation in human-behavior representation Lisa Jean Moya, Frederic D. McKenzie and Quynh-Anh H. Nguyen Simulation Gaming 2008; 39; 101 originally published online Jan 15, 2008; DOI: 10.1177/1046878107308096 The online version of this article can be found at: http://sag.sagepub.com/cgi/content/abstract/39/1/101

Published by: http://www.sagepublications.com

On behalf of: Association for Business Simulation & Experiential Learning International Simulation & Gaming Association Japan Association of Simulation & Gaming

North American Simulation & Gaming Association Society for Intercultural Education, Training, & Research

Additional services and information for Simulation & Gaming can be found at: Email Alerts: http://sag.sagepub.com/cgi/alerts Subscriptions: http://sag.sagepub.com/subscriptions Reprints: http://www.sagepub.com/journalsReprints.nav Permissions: http://www.sagepub.com/journalsPermissions.nav

Downloaded from http://sag.sagepub.com at PENNSYLVANIA STATE UNIV on April 17, 2008 © 2008 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution.

Visualization and rule validation in human-behavior representation Lisa Jean Moya WernerAnderson, Inc., USA Frederic D. McKenzie Old Dominion University, USA Quynh-Anh H. Nguyen WernerAnderson, Inc., USA

Human behavior representation (HBR) models simulate human behaviors and responses. The Joint Crowd FederateTM cognitive model developed by the Virginia Modeling, Analysis, and Simulation Center (VMASC) and licensed by WernerAnderson, Inc., models the cognitive behavior of crowds to provide credible crowd behavior in support of military training, exercise support, and wargaming experiments and can be federated with semi-automated forces (SAF) simulations. Other possible uses include training support to emergency management and security personnel. The large parameter spaces and behavior complexities associated with HBR can make validating these simulations difficult, but with a multipronged approach, it is not impossible. A comprehensive approach investigates not only the results of the simulation but also the mathematical instantiation in the conceptual model. This article demonstrates a technique developed to understand the mathematical instantiation of the parameter space embodied by our HBR model and the effects of different parameterizations on the model’s initialization space. KEYWORDS: agent-based simulation; ABS; cognitive model; crowd modeling; human behavior representation; HBR; Joint Crowd FederateTM; mathematical instantiation; military simulation; multiagent system; validation

Although discrete event and time-stepped models dominate military simulations, multiagent models are of growing interest to the military community. Cares (2002) maintains that agent-based simulations (ABS) are inherently representative in nature and thus provide realistic modeling of the complex systems of interest to the military. He further claims that verification and validation for multiagent models is simpler than for the more usual data-intensive military models, since the representations are closer to subject matter experts' views of the world. He implies that the emergent behavior so often sought and found in these models provides valuable insight into the real-world systems modeled using this paradigm. The U.S. Marine Corps Warfighting Laboratory's Project Albert gives evidence of the growing interest. Project Albert is a research and development effort to investigate data-farming techniques using low-resolution, entity-based distillation models (Marine Corps Warfighting Laboratory, 2006). The hope is that the simple rule structures of these models can capture the



complexities of the battle space environment and that fully exploring the parameter space within these distillations will lead to needed insights. Research from Project Albert has demonstrated that although the rules for describing multiagent simulation behavior can be simple, which fulfills the promise described by Cares (2002), variances in implementation can create difficulties. Therefore, it is important that developers pay significant attention to validation within the cycle of rule set development and application. This article demonstrates a visualization technique for evaluating the behavior rule sets of human behavior representation (HBR) agent-based simulations, using the Joint Crowd FederateTM research prototype as an example.

Validation of HBRs in military simulations

The Department of Defense requires verification and validation (V&V) of military simulations by way of DODD 5000.1 (Department of Defense, 2003a) and DODI 5000.61 (Department of Defense, 2003b). The Defense Modeling and Simulation Office (DMSO) provides guidance in its Verification, Validation, and Accreditation (VV&A) Recommended Practices Guide (Defense Modeling and Simulation Office, 2004). While the literature frequently considers results validation, less attention has been paid to conceptual model and knowledge base validation. This is a testament to the difficulty of validating military simulations and HBR models in this domain specialty.

Difficulties in validating HBR models

Harmon, Hoffman, Gonzalez, Knauf, and Barr (2002) discuss difficulties with validating HBR models. First, while most validation efforts account for some type of results validation, less attention is paid to conceptual model development. Perhaps this is due in part to the limited sociological and psychological sources available to support development and the many disparities among the documents that do exist. Harmon et al. (2002) suggest that this leads "validators to choose . . . sometimes with little more justification than their intuition." Missing documentation further exacerbates this problem. Another difficulty lies in the specification of HBR requirements. Subject matter experts (SMEs) serve as useful sources, but requirements developed without reference to effects and performance metrics are less useful in development than they would otherwise be. For face validation, SMEs are often indispensable but also come with their share of difficulties. Harmon et al.
mention subjective opinions, difficulty assessing the simulation for its intended use, the danger of a tendency to invent answers, and other difficulties with using SMEs for results validation, all of which make rationalizing analyses challenging. Objective validation methods that can contribute to and complement SME-based face validation would therefore be welcome. Finally, the inherent complexity of HBR simulations makes a black box evaluation (i.e., sole reliance on results evaluation) less useful than a white box evaluation, which compares essential components, such as the knowledge


base, internal state representation, and behavior engine, to the referent developed in the conceptual model.

Results validation

Results validation techniques appear frequently. Simpkins, Paulo, and Whitaker (2001) present a reusable methodology for results validation, applied using the theater missile defense capabilities of EADSIM as a baseline for comparison to Wargame 2000. Both models were instantiated with identical parameter data, and results were compared using specified measures of effectiveness along with graphical and statistical methods. In general, the proposed methodology relies on the underlying validity of the results provided by the baseline model. This approach is reasonable given the desire for the model to match EADSIM fidelity (Simpkins et al., 2001). The article demonstrates two important caveats against overgeneralizing the validity of models when using a similar approach for other validation efforts. First, should the two models be determined to produce the same output, this does not imply blanket validity. Rather, if the baseline model is invalid for an intended use, the compared model may produce equal results yet still inherits the validity limits of the referent model. Second, when using statistical techniques in validation, while the rejection of a null hypothesis that the two models give the same results provides evidence that the models differ, a failure to reject does not mean that the results are the same. McCue (2005) also identifies this basic limitation, along with the difficulty of claiming blanket validity without the concept of intended use to bound the question and without qualitatively established minimal requirements for validity. In a similar vein, Herington, Lane, Corrigan, and Golightly (2002) use a historical event to validate a command-and-control event-driven simulation model. The validation consisted of comparing actual and simulated campaign metrics.
Not all of the real-world conditions in the historical scenario could be represented, for example, night transit; in these cases, appropriate parameter modifications were made within the model to account for the resultant differences in capabilities. Although there were some noteworthy discrepancies between the simulated and actual events, the authors believe their model was able to reproduce key historical events in the scenario. Further, the process of identifying the underlying factors producing differences between model and history was used to improve the model. Moffat, Campbell, and Glover (2004) also evaluate the validity of a command-and-control decision-making simulation in the context of a historical mission. Theirs is an agent-based model for command and control, which directly affects the number of engagements. In the validation approach, the effectiveness per engagement is calibrated to that of a historical event. If the casualties per engagement are then consistent with the event (although this consistency is ill defined), the model is considered valid. Neither of these papers presents a statistical basis for good enough, nor indicates whether the assessment implies a predictive capability or any limitations on intended use.


What is good enough?

Good enough is an issue for HBR validation, both quantitatively and qualitatively. An overview of the inherent difficulties of model and simulation validation with respect to this issue can be found in Akst (2006). While matching event results on specific metrics is useful, a state-by-state matching may not be necessary, or even knowable or desirable, in the case of emergent behavior systems. A human behavior model matching measures of outcome for a single historical event merely shows that the model is capable of getting the same answer for a particular event, not that it arrived at the answer correctly, nor that it can get the correct answer for another historical event or for some future event. That is, matching a single event in time does not guarantee that a model is predictive; further, not all models require a level of validity sufficient for predictive behavior. Weisel (2004) discusses validity in terms of relations between the states in the real system and in the modeled system, a framing that addresses this difference. HBR models intrinsically have a large state trajectory space built from tightly coupled behavior engines. Small changes in state or stimuli can produce large, and unpredictable, differences in system response. This makes SME face validation difficult, since the nonlinear relationships make it impossible for an SME to generalize from one observed behavior condition to another (Defense Modeling and Simulation Office, 2004). In fact, MacKenzie, Schulmeyer, and Yilmaz (2002) maintain that while having accurate results is a necessary condition for validity, it is not sufficient. The model must also accurately reflect the system that is modeled. This includes, but is not necessarily limited to, the conceptualization, logic, and mathematical structures.

Validating agent-based simulations

Where agent-based simulations are involved, validation has been shown to be complex and inconclusive.
Gill and Grieger (2001) thoroughly report on an evaluation of movement algorithms for two agent-based distillation, or high-level, combat models: MANA and EINSTein. The movement algorithm uses user-defined weights and a hard-coded penalty function. Of particular interest in their study was whether the weights matched user intentions for the assigned values. In previous work, the authors had found counterintuitive results with the assigned weighting schema. Their study evaluated the current form of the penalty function along with several alternatives. Significant differences were seen depending on the form of the function used, with some results being unintuitive; one of the main conclusions from their work was that assumptions made in implementing the penalty function should be made explicit to the user. Sanchez and Lucas (2002) report on related work by Gill and Shi (2002). In their report, Sanchez and Lucas relate the importance of matching the user's conceptual model of a parameter's meaning to the implemented model of that parameter, in this case a weighting factor. Even with an explicit weighting schema, differences in intent or application can create counterintuitive results for the user once the rules are implemented. Sanchez and Lucas (2002) also report on work by


Vinyard and Lucas (2002), in which a well-known simple deterministic combat model (Dewar, Gillogly, & Juncosa, 1991), based on the Lanchester model for combat, demonstrated surprisingly non-monotonic, chaotic-like (Palmore, 1996) behavior. Work by Tolk (2002) investigated the claims of non-monotonicity and chaotic-like behavior in this model. Tolk was able to demonstrate that the emergent effects seen were the result of the rule set implemented and not necessarily a natural characteristic of the system itself. Further, with reasonable adjustments to the rule set, the resulting behaviors no longer exhibited the chaotic-like behavior and better matched the author's intuition. These differences raise the question of whether the generated behavior is an artifact of the system or of the model instantiation. They also highlight the difficulty of relying solely on SME-based validation of HBR models. That is, can an SME merely look at the rules, as in the Cares (2002) assertion, and accept the results in total? Sanchez and Lucas (2002) show that this is insufficient, since interpretation also matters, and Gill and Grieger (2001) demonstrate that there can be multiple rule choices. Tolk (2002) intimates further that if unexpected (emergent) behavior occurs, then perhaps the rule set should be changed. This poses some interesting questions for the validity of reactive agent-based simulation:

• Is the emergent behavior exhibited by an agent-based simulation an inherent characteristic of the system modeled, or merely the result of a specific set of choices made by the modeler?
• What determines the validity of a chosen rule set: the generated behavior, the conceptual validity of the rules, both, or other criteria?
• What makes one instantiation of a rule more valid than another?

Techniques that enable model builders to understand one rule set instantiation and/or interpretation relative to other rule sets and their possible interpretations can help to support overall model development and validation. Visualization provides one such method. Sanchez and Lucas (2002) demonstrate some methods based on large data-mining efforts over the entire set of state space trajectories. Data mining of this magnitude may not always be possible. This article discusses a visualization technique for understanding the effects of parameter initialization on a simple rule set, demonstrated using the research prototype of the Joint Crowd FederateTM cognitive model, without requiring a large data-mining effort.

Organization of the article

We first give an overview of the Joint Crowd FederateTM prototype model used to illustrate the technique for evaluating the initialization of the rule set generating the behavior of an agent-based HBR simulation. An informal description highlights the need for this kind of HBR simulation and gives an overview of the conceptualization used to support the explanation of the methodology. A formal description motivates the purpose of the methodology. Next, the methodology is presented, followed by a discussion of interpreting the results. Finally, concluding thoughts are given.


Joint Crowd FederateTM research prototype overview

Crowds are a natural part of realistic urban settings and therefore should naturally appear in training simulations supporting urban military operations. Crowd behavior is different from population behavior, which provides the patterned backdrop of a target community. Population representations typically do not respond significantly to environmental distractions. Conversely, crowds are sensitive to disturbances in the environment and react to influences from the presence of military or control forces. More significant disturbances usually produce correspondingly more significant reactions, depending on the crowd makeup. To fill this need, the Joint Crowd FederateTM prototype builds toward a model of crowds that is highly realistic, psychologically based, real-time, and interactive. Significant validation is necessary to substantiate this claim. The Joint Crowd FederateTM model itself adapts Musse and Thalmann's (2001) concept of a crowd hierarchy to represent crowds. A crowd consists of a hierarchy of groups, with the groups composed of individuals (Nguyen, McKenzie, & Petty, 2005). This research prototype of the Joint Crowd FederateTM does not model the individual cognition of each group member. Rather, individuals perform navigation and obstacle avoidance at the physical model layer, while the cognitive model focuses on the group-level mood and parameters for behavior choices. There may be multiple groups and multiple crowds present in a given scenario, each with its own set of parameters and group mental states guiding its action selection. This model represents the human behavior of crowds at the group level.
Informal description

The Joint Non-Lethal Weapons Directorate's Human Effects Advisory Panel (Kenny et al., 2001) found that crowds are gatherings of small groups, normally composed of friends, family members, and acquaintances, called "companion clusters," who normally arrive together, rally together, and leave together. This is the basis for the definition of a group. Groups are typically small, having two to six members, with a nominal leader identified as the decision maker for the group. The total number of individuals making up the groups within a crowd determines the overall crowd size. This model uses four initial group types, representing crowd member classifications, characterized as follows:

• Agitators are individuals who come to the crowd with the intention to create conflict.
• Protestors are individuals who have joined the crowd to protest and tend to be more committed to the goals of the crowd.
• Casual members are not as committed to the crowd's goals but become part of the crowd's actions.
• Bystanders are individuals near the crowd who only observe what is taking place.

A crowd can be composed of any number of these group types, with the different variations in crowd compositions determining the overall initial aggression level of the crowd and the groups’ likelihood of participating in violent actions.


Based on the companion cluster concept, which supports the idea that individuals of a small group behave similarly, the prototype model contains a single metric, the group aggression level (GAL), to measure each group's aggression level. Since these groups stay together and act together, we identify a nominal leader for each group; based on stimuli received by that leader, the values of the current crowd psychological parameters, and the current aggression level of that group, a new mental state is computed for that group. The GAL is a discretized utility score between 0 and 1, inclusive, used to determine the behaviors available to the group. The cognitive model chooses a behavior based on the type of group and the group's mood. Once the cognitive model chooses a behavior, all members of the leader's group participate in that behavior. The crowd aggression level (CAL) is the weighted average of the GALs, where the weights depend on both the group type and the number of groups of each type in the crowd composition. For the purposes of this model, the agent-entities are the groups making up the crowd. The rule set is the combination of the equation used to determine the GAL update and the knowledge base used to determine the available behaviors and the groups' reactions to incoming stimuli. The visual technique provides a method to evaluate the effect of parameter values on the initialization of the GAL state.

Formal description

Each of the four crowd group types, agitators (A), protestors (P), casual (C), and bystanders (B), in any combination, has an associated GAL. The initialization of the GALs in this example depends on the assumed overall CAL and the crowd composition. Thus, the parameter space is a tuple (nA, nP, nC, nB, CAL), where nx ∈ ℕ and CAL is the specified starting CAL. To simplify the model, each group of type x is assumed to start with the same GAL.
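As one concrete illustration of this initialization, the sketch below distributes starting GALs across group types so that the count-and-type-weighted average reproduces a specified starting CAL. The per-type influence weights and the rule that a type's starting GAL is proportional to its weight are invented for illustration; the published model's actual calibration is not reproduced here.

```python
# Sketch: per-type starting GALs chosen so the weighted average equals a
# target CAL. TYPE_WEIGHT values are hypothetical, not the model's own.
TYPE_WEIGHT = {"A": 4.0, "P": 3.0, "C": 2.0, "B": 1.0}

def initial_gals(counts, target_cal):
    """counts: dict type -> number of groups; returns dict type -> starting GAL.

    Assumes each type's starting GAL is proportional to its influence weight,
    scaled so the count-and-weight-weighted average equals target_cal.
    Clamping to [0, 1] may break exactness for extreme target CALs.
    """
    total_w = sum(TYPE_WEIGHT[x] * n for x, n in counts.items() if n > 0)
    prop = {x: TYPE_WEIGHT[x] for x, n in counts.items() if n > 0}
    cal_prop = sum(TYPE_WEIGHT[x] * counts[x] * g for x, g in prop.items()) / total_w
    scale = target_cal / cal_prop
    return {x: min(1.0, g * scale) for x, g in prop.items()}

def crowd_aggression_level(counts, gals):
    """CAL as the weighted average of GALs (weight = type weight x group count)."""
    total_w = sum(TYPE_WEIGHT[x] * n for x, n in counts.items() if n > 0)
    return sum(TYPE_WEIGHT[x] * counts[x] * gals[x] for x in gals) / total_w
```

Under these assumed weights, more aggressive types (agitators) initialize at higher GALs than bystanders for any attainable target CAL.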
The aggression level of each group then progresses independently, depending on the individual stimuli it receives. The initialization space is the set of all possible initialized values for the groups. While it is understood that, for feasibility, 0 ≤ GAL_{x_i} ≤ 1 for each group, not all of these values may be attainable. The visualization method helps to determine the attainable state values. The purpose of the methodology is to develop a visualization of the initialization space over the parameter space, that is, over the set of possible crowd compositions, or the feasible set of states. Formally, we can describe every modeled crowd as follows. Let x ∈ {A, P, C, B} denote a group type. Then:

∀ x ∈ {A, P, C, B}, ∃ nx ∈ ℕ giving the number of groups of type x
∀ x_i, i = 1, …, nx (for nx ≠ 0): GAL^0_{x_i} = GAL^0_x, where GAL^0_{x_i} denotes the initial GAL of the ith group of type x
GAL^t_{x_i} ∈ [0, 1], ∀ x_i and all steps t

The transition of the aggression levels, or the state trajectory, can be described by adapting notation found in Wooldridge (2002) and is given below. Iteration t in this


definition represents any instance where new stimuli are received for processing. For this discussion, a more precise definition is not required.

Let Stim = {stim_1, stim_2, …} be the set of possible stimuli the groups could receive. Then Stim* is the set of all stimuli a group could perceive at any iteration, where * denotes the power set. Let per^t_{x_i} ∈ Stim* be the perceived stimuli of group x_i at iteration t, and let GAL^t_{x_i} ∈ [0, 1] be the GAL of group x_i at iteration t. Let Ac = {α_1, α_2, …} be the set of possible actions; as discussed previously, the group type and GAL limit this set, denoted Ac_{GAL,x} ⊂ Ac. Then we define a next-state function, next: GAL × Stim* → GAL, and an action function, action: GAL → Ac, where

GAL^{t+1}_{x_i} = next(GAL^t_{x_i}, per^t_{x_i}) and Ac^t_{x_i} = action(GAL^t_{x_i})

From the above, it is clear that the set of available actions, or behaviors, is constrained by a group's GAL and that subsequent GAL updates are constrained by the stimuli reaction of each group. That is, the state representation and the change in state in response to stimulation based on the knowledge base fully define the reactions of the groups in the crowd (Defense Modeling and Simulation Office, 2004). Thus, when applying the visualization method, one evaluates the range of behaviors available to the group agents in the knowledge base, based on the initialized state values, to ensure that this set is reasonable and valid at the initialized state, that is, at the start of the simulation. When evaluating the effect of this initialization on the possible GAL state trajectories, an evaluation of the possible stimuli responses available to each group for its initialized GAL is made across each crowd composition. That is, an evaluation of the possible next-step GALs is made. Complex stimuli responses may require further analysis and additional techniques, but simpler responses lend themselves well to this approach.
In this specification, the updated GAL at step t + 1 is a function of the current state (GAL) and the perceived stimuli; the next behavior is then chosen based on this new state, while the initial action is based solely on the initialized GAL. The purpose of the developed technique is to investigate the initialization space of the GALs presented by the possible crowd compositions. This article illustrates a technique to assess a crowd's behavior given the above descriptions.
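The next-state and action functions of this specification can be sketched as follows. The stimulus impacts, GAL thresholds, and behavior names are invented for illustration; only the structure (accumulate stimulus effects, clamp GAL to [0, 1], gate available behaviors by GAL) follows the formal description.

```python
# Sketch of next: GAL x Stim* -> GAL and action: GAL -> Ac.
# STIM_IMPACT and the behavior thresholds are hypothetical.
STIM_IMPACT = {"taunt": 0.1, "projectile": 0.2, "withdrawal": -0.15}

def next_state(gal, perceived):
    """GAL_{t+1} = next(GAL_t, per_t): accumulate stimulus impacts, clamp to [0, 1]."""
    for stim in perceived:
        gal += STIM_IMPACT.get(stim, 0.0)
    return min(1.0, max(0.0, gal))

def action(gal):
    """Ac_t = action(GAL_t): behaviors available at the current aggression level."""
    if gal < 1 / 3:
        return {"observe", "mill_about"}
    if gal < 2 / 3:
        return {"observe", "chant", "gesture"}
    return {"chant", "gesture", "throw_objects"}
```

For example, next_state(0.5, {"taunt", "projectile"}) yields approximately 0.8, a GAL at which throw_objects becomes available under these illustrative thresholds.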

Methodology

This visualization technique allows investigation of the broad effects of a given range of instantiations in the rule set of an HBR agent model, allowing the model developer to evaluate whether the methodology chosen to instantiate the


conceptual model is valid with respect to that conceptualization. In our example, when evaluating the effect of crowd composition on group initialization and subsequent behaviors, two questions are of interest:

• Are appropriate behaviors available to the group (based on the initialized value)?
• Does the initialized value over- or underconstrain the updated GAL values? That is, what future states are achievable, and are these reasonable?

Further, we want to be able to assess whether the initialized values are reasonable for a given crowd composition relative to the other possible compositions. Answering these questions requires both an appropriate visualization method and a useful partitioning of the parameter space describing the crowd composition. This allows us to take a broad-brush look at the large trajectory space. Many partitionings of the space are possible, depending on the areas of interest. Some might require a statistical design, while others could benefit from a more exhaustive approach. The method presented does not depend on the partitioning used. For this example, we present the method for determining the expected behavior of the model over the range of possible crowds without an exhaustive data-mining effort against all possible values within the parameter space, such as that discussed in Sanchez and Lucas (2002). The technique developed for this example is illustrated below.

Parameter space

Although the parameter space describing the crowd is reduced by the simplifications imposed on the initialized GALs, it is still prohibitively large. Even exploring a split of the crowd in 10% increments for each of only three group types still allows 11³ (1,331) crowd combinations. Fortunately, fine granularity is not necessary to see the behavior trends.
For the purposes of illustration, consider the following partitioning:

• x-heavy: loosely defined as a crowd primarily consisting of group x ∈ {A, P, C, B}, with the other groups present and distributed evenly; for example, A-heavy might have nA = 27 and nP = nC = nB = 1
• Half-x: half of the crowd consists of group x ∈ {A, P, C, B}, with the other groups present and distributed evenly; for example, Half-A might have nA = 15 and nP = nC = nB = 5
• Equal Groups: each group x ∈ {A, P, C, B} is present and distributed evenly; for example, nA = nP = nC = nB = 10
• Equivalent Influence: nA, nP, nC, and nB are chosen such that the weighted influence each has on the overall CAL is equal for each group x ∈ {A, P, C, B}

This last partitioning is chosen such that the weights applied to each group in the overall calculation of the crowd aggression level are equal. These partitionings clearly do not exhaustively explore the space; for example, instances where nx = 0 are completely excluded. However, these are special cases of the general one, where x ∈ G ⊂ {A, P, C, B}, and the same method can be used to explore them.
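The partitioning above can be generated mechanically. In the sketch below, the crowd size of 30 groups is chosen to mirror the examples in the text (A-heavy: nA = 27, others 1; Half-A: nA = 15, others 5), and the influence weights used for Equivalent Influence are hypothetical.

```python
# Sketch: enumerating the crowd-composition partitioning of the parameter
# space. TYPE_WEIGHT values and the influence budget are illustrative.
TYPES = ["A", "P", "C", "B"]
TYPE_WEIGHT = {"A": 4.0, "P": 3.0, "C": 2.0, "B": 1.0}

def x_heavy(x, total=30):
    """Crowd primarily of type x; one group of each other type."""
    n = {t: 1 for t in TYPES}
    n[x] = total - (len(TYPES) - 1)
    return n

def half_x(x, total=30):
    """Half the crowd is type x; the rest split evenly among the others."""
    n = {t: (total // 2) // (len(TYPES) - 1) for t in TYPES}
    n[x] = total // 2
    return n

def equal_groups(per_type=10):
    """The same number of groups of each type."""
    return {t: per_type for t in TYPES}

def equivalent_influence(budget=48):
    """Counts chosen so weight * count is equal across types."""
    per_type = budget / len(TYPES)
    return {t: int(per_type / TYPE_WEIGHT[t]) for t in TYPES}

compositions = {"Equal Number Grps": equal_groups()}
for t in TYPES:
    compositions[f"Half-{t}"] = half_x(t)
    compositions[f"{t}-heavy"] = x_heavy(t)
compositions["Equivalent Influence"] = equivalent_influence()
```

Each entry of `compositions` is then one point of the parameter space at which the initialized GALs are evaluated.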


[Figure 1 (chart residue removed): four panels showing the same illustrative data as a column chart, a stacked column chart, a line graph over categories, and a stacked area chart over categories. Each panel plots starting GAL (y-axis, in discretized increments of about 0.11111) for the Agitator, Protestor, Casual, and Bystander series across the crowd compositions Equal Number Grps, Bystander Half, Casual Half, Protestor Half, Agitator Half, Equal Influence, Bystander Heavy, Casual Heavy, Protestor Heavy, and Agitator Heavy.]
FIGURE 1: Sample Data Display Methods

Visualization technique

The data for the evaluations consist of the crowd composition and the initial GAL for each group. There are several common methods for displaying such information: bar charts, column charts, stacked bar or column charts, line graphs, and area graphs, among others. Sanchez and Lucas (2002) give examples of other visualization methods for large data spaces, and Tufte (2001) is a good resource for more esoteric visualization and data display methods. Figure 1 displays the same data using a column chart, a stacked column chart, a line graph, and an area graph, all commonly available in most software packages. The data used in each graph are for illustrative purposes only. The graphs show the GALs by group, with the x-axis giving the crowd composition and the y-axis displaying the GAL. Notice that, to facilitate mapping the resulting GALs to the behavior selection function, the major units on the graphs correspond to the discretized GAL utility description rather than some other scale. While each of these methods displays the data, none adequately allows a vivid understanding of either the behaviors available to the groups or the future GAL


Moya et al. / HUMAN-BEHAVIOR REPRESENTATION


FIGURE 2: Radar Chart Displaying Example Data ("Starting GALs by Crowd Composition")
NOTE: GAL = group aggression level.

trajectories over the range of crowd compositions. Further, as the number of crowd compositions increases, none of these methods scales well. To overcome this scalability problem and better inform our intuition about the crowd behavior, we utilized a radar chart. Figure 2 shows the data from the charts of Figure 1 displayed as a radar chart. In a radar chart, each spine represents a category, and the y-axis data (here, GAL) are plotted along the spines. More categories, that is, parameter-space partitionings, create additional spines on the graph. The circular display allows the visualization of any tug-and-pull between categories. Lastly, with each marker on the spines marking a new GAL level, we can identify trends in the initialization pattern that help determine the effects of the initialization on available behaviors and possible trajectories.
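A radar chart of this kind can be sketched with matplotlib's polar axes. Everything below is an illustrative placeholder, not model output: the compositions, the GAL values, and the output file name are assumptions, and the y-axis major units are set to the 1/9 discretization visible in the figures.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Spines: one per crowd composition (illustrative subset).
compositions = ["Equal Groups", "Agitator Heavy", "Protestor Heavy",
                "Casual Heavy", "Bystander Heavy", "Equal Influence"]
gal_by_group = {  # illustrative initial GALs, one value per composition
    "Agitator":  [0.78, 0.89, 0.82, 0.85, 0.88, 0.67],
    "Protestor": [0.56, 0.62, 0.58, 0.64, 0.70, 0.44],
    "Casual":    [0.22, 0.25, 0.24, 0.28, 0.31, 0.22],
    "Bystander": [0.10, 0.11, 0.10, 0.11, 0.11, 0.11],
}

# Angles for the spines; repeat the first angle to close each polygon.
angles = np.linspace(0, 2 * np.pi, len(compositions), endpoint=False)
closed = np.concatenate([angles, angles[:1]])

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
for group, values in gal_by_group.items():
    ax.plot(closed, values + values[:1], label=group)
ax.set_xticks(angles)
ax.set_xticklabels(compositions, fontsize=8)
ax.set_ylim(0, 1.0)                        # clamp to the feasible GAL range
ax.set_yticks(np.arange(0, 1.001, 1 / 9))  # major units at discretized GAL levels
ax.legend(loc="lower right", fontsize=8)
fig.savefig("starting_gals.png")
```

Adding a composition adds a spine automatically, which is the scalability property exploited above.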


Interpreting the data

When evaluating the data displayed in the radar chart, several items are of interest:

1. Do any of the crowd compositions create infeasible GALs for one or more of the groups? The implication is either an infeasible initial CAL or an invalid initial GAL.
2. Do the initialized GALs allow a reasonable set of initial behaviors for each group when evaluated against the available responses found in the knowledge base for the initialized state?
3. Do the possible responses to incoming stimuli at the initialized values allow for reasonable GAL trajectories? That is, based on the initial state and the available responses to stimuli over time, does the set of possible trajectory responses fit with respect to the conceptual model?
4. Is the spectrum of initial GALs for each group reasonable, based on the investigated crowd compositions, with respect to the conceptual model?
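The first question can also be screened mechanically before any chart is drawn. The sketch below assumes, per the conceptual model, that a feasible GAL lies in [0, 1]; the compositions and values are illustrative, echoing the Figure 3 example in which the Agitator GAL exceeds 1.

```python
# Screen an initialization table for infeasible GALs. All names and numbers
# here are illustrative; the bounds assume a [0, 1] feasible GAL range.
FEASIBLE_MIN, FEASIBLE_MAX = 0.0, 1.0

initial_gals = {  # composition -> {group: initial GAL}
    "Equal Groups": {"Agitator": 0.89, "Protestor": 0.56,
                     "Casual": 0.22, "Bystander": 0.11},
    "Casual Heavy": {"Agitator": 1.22, "Protestor": 0.67,
                     "Casual": 0.33, "Bystander": 0.11},
}

def infeasible(table, lo=FEASIBLE_MIN, hi=FEASIBLE_MAX):
    """Return (composition, group, gal) triples violating the bounds."""
    return [(comp, grp, gal)
            for comp, row in table.items()
            for grp, gal in row.items()
            if not lo <= gal <= hi]

for comp, grp, gal in infeasible(initial_gals):
    print(f"{comp}: {grp} GAL = {gal} lies outside "
          f"[{FEASIBLE_MIN}, {FEASIBLE_MAX}]")
```

Such a screen complements the visual check: the chart shows where and how badly the bounds are violated, while the screen guarantees no violation is overlooked.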

Infeasible CAL/invalid GAL initialization

The first question can be answered using any of the graphing techniques described above. However, using the radar chart and limiting the y-axis values to feasible ones creates a vivid illustration of infeasibility. Figure 3 demonstrates this for the Casual Heavy and Bystander Heavy crowd compositions: for these compositions, the Agitator group is initialized with a GAL > 1, breaking with the conceptual model. In the chart on the left, first notice that, to show the attained values for each group, the scale had to extend beyond the maximum feasible GAL. This shows the absolute attained values, but it is not obvious that the Casual Heavy and Bystander Heavy crowds create infeasible initialized GAL values. In contrast, the graph on the right dramatically demonstrates this infeasibility. Moreover, the associated magnitudes of the other groups' GALs are clearly apparent. Figure 4 shows an example of extreme infeasibility of GALA across crowd compositions using this method of visualization. Looking at this graph, the analyst knows that either the model will be initialized with infeasible values for one of the groups for a majority of the crowd compositions, or the model will make an internal adjustment to force feasibility on the group, in which case the specified starting CAL will not be achieved. In this case, the code works properly, but the chosen rule instantiation is invalid with respect to the conceptual model.

Available behaviors and trajectories

To evaluate the effect of initialization on the crowd behavior, it is necessary to understand the behavior set available to each group for its initialized GAL over each crowd composition. To evaluate the effect of initialization on the possible GAL trajectories, it is necessary to understand the possible stimuli responses available to each group for its initialized GAL over each crowd composition.
Complex stimuli responses may require further analysis and additional techniques, but simpler responses can lend themselves well to this ad hoc approach.


FIGURE 3: Charts Showing Invalid Group Aggression Level (GAL) Initialization
NOTE: The left panel (invalid GALA: column chart) and the right panel (invalid GALA: radar chart) display the same example data.

FIGURE 4: Example of Extreme Infeasibility
NOTE: Radar chart of initial GALs by crowd composition; the scale extends far beyond the feasible GAL range.

In this approach, recall that the next behavior, ac_{x_i}, depends on the initialized value. Subsequent behaviors depend on the updated GAL and the incoming stimuli. Because the graph is scaled to the GAL utility values, it is simple to map each crowd composition category to the behaviors the knowledge base makes available to the group at initialization. Thus, while the specific behavior chosen by the model is unknown to the


analyst, since the choice is stochastic, the range of possible behaviors is known. The analyst can then evaluate the reasonableness of the crowd behaving in this manner at the start of the simulation with respect to the conceptual model. Further, the behavior update is a function of the current GAL and the incoming stimuli, and this response differs by group type. By fixing the current GAL, assessing the possible responses to stimuli, and evaluating those responses for their effect on the state-update function, it is possible to assess the possibilities of overall behavior. A simple example is explained using Figure 2, in which the following observations are apparent:

1. The bystander group never initializes above 0.1111. This means that, irrespective of the model setup, this group's reaction is limited to the set of trajectories available at this aggression level.
2. The casual group is limited to initializations between 0.1111 and 0.3333. This provides a broader range of behavior possibilities to this group. The analyst would need to ensure that these possibilities allow the group to become involved in the crowd actions, per the group definition, based on the possible stimuli. Further, the analyst would ensure that the possibilities that lead to this involvement are reasonable.
3. The protestor group has a similar interpretation, with one important difference. For all crowd compositions, this graph shows the protestor group initializing with consistently higher GALs than both the casual and bystander groups. This initialization is highest for the Bystander Heavy crowd. The analyst, in this case, would need to ensure that the behaviors available to the group at this level of initialization are appropriate for the start of the crowd activity.
4. The interpretation for the agitator group is similar to that for the protestor group, only more extreme.
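The mapping from an initialized GAL to its behavior set can be sketched as a lookup against the discretized utility bands. The band boundaries and behavior names below are hypothetical stand-ins; the actual knowledge base defines its own discretization and responses.

```python
# Hypothetical discretization of GAL utility bands to available behaviors.
BEHAVIOR_BANDS = [  # (upper GAL bound, behaviors available at or below it)
    (0.1111, ["observe", "disperse"]),
    (0.3333, ["observe", "disperse", "mill"]),
    (0.6666, ["mill", "chant", "gesture"]),
    (1.0,    ["chant", "advance", "confront"]),
]

def available_behaviors(gal):
    """Return the behavior set for an initialized GAL, per the bands above."""
    for upper, behaviors in BEHAVIOR_BANDS:
        if gal <= upper:
            return behaviors
    raise ValueError(f"GAL {gal} is infeasible (exceeds 1)")

# A bystander-like initialization maps to the lowest band:
print(available_behaviors(0.10))  # ['observe', 'disperse']
```

The analyst reads the radar chart against a table like this: the marker a group's spine reaches picks out the band, and the band picks out the behaviors the model could select stochastically.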

Clearly, this method allows the analyst to focus his or her efforts on the important aspects and parameters of the model and to verify that the responses are as expected. Further, multiple time slices can be evaluated to visualize the trajectory of each composition over time.

GAL spectrum

One last benefit of this visualization method is the ability to evaluate the relative values of the initialization space across categories and metrics. Multiple metrics with the same scale can be shown on a single graph, as we have shown, while metrics with different scales can be compared across multiple graphs, looking for common patterns. Further, statistical tests could be designed to determine whether the initialization values are inappropriately correlated. Figure 5 illustrates this use of the visualization method. In these charts, first notice the high (perfect) correlation between initial GALs. Second, note that the spread between GALs is larger in Chart 1 than in Chart 2, with the less aggressive groups starting much lower than the more aggressive ones, and that this relationship becomes progressively more pronounced as the crowd becomes more dominated by the less aggressive groups. Behavior of the crowds in Chart 2 begins less


FIGURE 5: Examples of Spectrum
NOTE: Chart 1 and Chart 2 are radar charts of initial GALs by crowd composition, plotted on different scales.

aggressively than that of the crowds portrayed in Chart 1. With these data, the analyst can compare the relative start states over a broad range of parameterizations to determine whether the relative differences between entities fit within the conceptual construct of the model for the given starting states.

Final thoughts

We have shown a method for visualizing the multidimensional initialization space, which we believe is generalizable for analyzing the initialized behavior of HBR models that use a rule-based construct. The method displays initialization values over a partitioned space of the parameters influencing the initialization of a metric or set of metrics. The chart can then be evaluated to determine whether some portions of the parameter space exhibit anomalous, unexpected, or infeasible results based on the available initialized behaviors found in the knowledge base. Further, a progression or trajectory over time can be evaluated by comparing charts over multiple time slices. While we interpreted an example using this method, the specific pattern that would suggest good or poor initialization depends on the specific model and parameters evaluated. The analyst would need to assess whether there is a range of initialization values that would be infeasible or undesirable based on the conceptual model for acceptable behavior; whether there ought to be similar initializations for some sets of parameterizations and opposing values for others; and whether there are desirable relationships between the various initialized metrics. By looking at a rational partitioning of the parameter space, this method makes it possible to visualize the multidimensional shape of the initialization space without plotting each of the combinatorial possibilities given by the parameters.


References

Akst, G. (2006). Musings on verification, validation, and accreditation (VV&A) of analytical combat simulations. Phalanx, 39, 9-11.
Cares, J. R. (2002). The use of agent-based models in military concept development. In E. Yücesan, C.-H. Chen, J. L. Snowdon, & J. M. Charnes (Eds.), The 2002 Winter Simulation Conference (pp. 935-939). Piscataway, NJ: Institute of Electrical and Electronic Engineers.
Defense Modeling and Simulation Office. (2004). The verification, validation, and accreditation recommended practices guide. Alexandria, VA: Author.
Department of Defense. (2003a). Department of Defense Directive (DoDD) 5000.1: The defense acquisition system. Washington, DC: Under Secretary of Defense for Acquisition & Technology.
Department of Defense. (2003b). Department of Defense Instruction (DoDI) 5000.61: DoD modeling and simulation (M&S) verification, validation, and accreditation (VV&A). Washington, DC: Under Secretary of Defense for Acquisition & Technology.
Dewar, J. A., Gillogly, J. J., & Juncosa, M. L. (1991). Non-monotonicity, chaos, and combat models (Report No. R-3995-RC). Santa Monica, CA: RAND.
Gill, A., & Grieger, D. (2001). Validation of agent based distillation movement algorithms. Commonwealth Department of Defence: Defence Science and Technology Organisation, Electronics and Surveillance Laboratory.
Gill, A., & Shi, P. (2002). Movement algorithm verification and issues in joint concept development and experimentation. In 5th Project Albert International Workshop. Quantico, VA: Marine Corps Warfighting Laboratory.
Harmon, S. Y., Hoffman, C. W. D., Gonzalez, A. J., Knauf, R., & Barr, V. B. (2002). Validation of human behavior representations. In Foundations for V&V in the 21st Century Workshop (Foundations '02) (pp. B3-1-B3-34). San Diego, CA: Society for Modeling and Simulation International.
Herington, J., Lane, A., Corrigan, N., & Golightly, J. A. (2002). Representation of historical events in a military campaign simulation model. In E. Yücesan, C.-H. Chen, J. L. Snowdon, & J. M. Charnes (Eds.), The 2002 Winter Simulation Conference (pp. 859-863). Piscataway, NJ: Institute of Electrical and Electronic Engineers.
Kenny, J. M., McPhail, C., Waddington, P., Heal, S., Ijames, S., Farrer, D. N., Taylor, J., & Odenthal, D. (2001). Crowd behavior, crowd control, and the use of non-lethal weapons. State College, PA: Institute for Non-Lethal Defense Technologies.
MacKenzie, G. R., Schulmeyer, G. G., & Yilmaz, L. (2002). Validation of the mission-based approach to representing command and control in simulation models of conflict. In Foundations for V&V in the 21st Century Workshop (Foundations '02) (pp. A1-1-A1-40). San Diego, CA: Society for Modeling and Simulation International.
Marine Corps Warfighting Laboratory. (2006). Project Albert. Retrieved January 17, 2007, from http://projectalbert.org
McCue, B. (2005). A chessboard model of the U-boat war in the Atlantic with applications to signals intelligence. Naval Research Logistics, 52, 107-136.
Moffat, J., Campbell, I., & Glover, P. (2004). Validation of the mission-based approach to representing command and control in simulation models of conflict. Journal of the Operational Research Society, 55, 340-349.
Musse, S. R., & Thalmann, D. (2001). Hierarchical model for real time simulation of virtual human crowds. IEEE Transactions on Visualization and Computer Graphics, 7, 152-164.
Nguyen, Q. H., McKenzie, F. D., & Petty, M. D. (2005). Crowd behavior cognitive model architecture design. In 2005 Conference on Behavior Representation in Modeling and Simulation. Orlando, FL: Simulation Interoperability Standards Organization.
Palmore, J. I. (1996). Research on the causes of dynamical instability in combat models (Report No. ADA315 079/4). Champaign, IL: Construction Engineering Research Lab.
Sanchez, S. M., & Lucas, T. W. (2002). Exploring the world of agent-based simulations: Simple models, complex analyses. In E. Yücesan, C.-H. Chen, J. L. Snowdon, & J. M. Charnes (Eds.), The 2002 Winter Simulation Conference (pp. 116-126). Piscataway, NJ: Institute of Electrical and Electronic Engineers.
Simpkins, S. D., Paulo, E. P., & Whitaker, L. R. (2001). Case study in modeling and simulation validation methodology. In The 2001 Winter Simulation Conference (pp. 758-766). Piscataway, NJ: Institute of Electrical and Electronic Engineers.
Tolk, A. (2002). Human behavior representation—recent developments. In Lecture Series on Simulation of and for Military Decision Making; Studies, Analyses and Simulation Panel (SAS). Neuilly sur Seine, France: NATO Research & Technology Organization.
Tufte, E. R. (2001). The visual display of quantitative information. Cheshire, CT: Graphics Press.
Vinyard, W. C., & Lucas, T. W. (2002). Exploring combat models for non-monotonicities and remedies. Phalanx, 35, 36-38.
Weisel, E. W. (2004). Models, composability, and validity. Unpublished doctoral dissertation, Old Dominion University, Norfolk, VA.
Wooldridge, M. J. (2002). An introduction to multiagent systems. West Sussex, UK: John Wiley.

Lisa Jean Moya is the chief scientist of WernerAnderson, Inc., a private veteran-owned small business that provides modeling, simulation, and analytical support to government, academic, and commercial customers. She is a research analyst with 10 years' experience applying various operations research techniques and a student in the PhD program in modeling and simulation at Old Dominion University (ODU). She received an MS in operations research from The College of William and Mary and a BS in applied mathematics from ODU. Her areas of expertise include multiple objective decision analysis (MODA), multi-attribute utility theory (MAUT), modeling and simulation, and the development and application of analysis. Her research interests include agent-based simulation, validation, agent architectures, and modeling and simulation formalism.

Frederic (Rick) D. McKenzie, PhD, is an associate professor of electrical and computer engineering at Old Dominion University (ODU), where he currently serves as the principal investigator (PI) and co-PI on projects involving simulation architectures, behavior representation in simulations, and medical modeling and simulation. Prior to joining ODU, he held a senior scientist position at Science Applications International Corporation (SAIC), serving as the principal investigator for several distributed simulation projects. At SAIC, he was a team lead on a large distributed simulation system. He has more than 10 years of research and development experience in the software and artificial intelligence fields, including object-oriented design and knowledge-based systems. Both his MS and PhD work were in artificial intelligence, focusing on knowledge representation and model-based diagnostic reasoning.

Quynh-Anh (Mimi) H. Nguyen is a PhD student in the modeling and simulation (M&S) program at Old Dominion University (ODU) and a technical director for WernerAnderson, Inc., in the development of the Joint Crowd FederateTM.
She received an MS degree in M&S from ODU in 2003 and a BS degree in electrical engineering from George Mason University.

ADDRESSES: LJM: WernerAnderson, Inc., 6596 Main Street, Gloucester, VA 23061, USA; telephone: +1 804-694-3173; fax: +1 804-694-3174; e-mail: [email protected]. FDM: Virginia Modeling Analysis and Simulation Center, Old Dominion University, Norfolk, VA 23529, USA; telephone: +1 757-683-5590; fax: +1 757-683-3220; e-mail: [email protected]. Q-AHN: WernerAnderson, Inc., 6596 Main Street, Gloucester, VA 23061, USA; telephone: +1 804-694-3173; fax: +1 804-694-3174; e-mail: [email protected].
