Using Subject Matter Experts for Results Validation of a Complex Theater Warfare Simulation: A Progress Report

Michael L. Metz
Innovative Management Concepts, Inc.
45625 Willow Pond Plaza
Sterling, Virginia 20164
703-318-8044 ext 210
e-mail: [email protected]

S.Y. Harmon
Zetetix, Inc.
P.O. Box 2640
Agoura, CA 91376-2640
(818) 991-0480
[email protected]

Keywords: Military, Validation, Results Validation, JWARS
ABSTRACT: This paper addresses the progress that has been made to date in planning for the use of Subject Matter Experts (SMEs) to support the results validation of the Joint Warfare System (JWARS) simulation. The initial paper was presented at the Fall SISO-SIW in September 2001 (01F-SIW-036, "Using Subject Matter Experts for Results Validation of a Complex Theater Warfare Simulation"). With permission of the JWARS Office and support from the Defense Modeling and Simulation Office (DMSO), the authors are using the JWARS simulation's verification and validation (V&V) process as their test bed for this effort. Previously, the JWARS results validation process has included a base case output analysis, limited SME predictions of the results of excursions, and comparisons to other theater level simulations. This effort will expand the JWARS results validation process to include a more formal and objective comparison to SME predictions. The planning for formal results validation using SME predictions and comparing them to JWARS output is nearing completion, and the actual work will begin in the spring or summer of 2002 when JWARS Release 1.4 is made available for user beta testing and additional V&V. Planning activities relate to the identification of the SMEs, how the SME opinions will be gathered, development of the information the predictions will be based on, identification of the statistical techniques that will be used, development of the methods for comparison, and how the comparisons will be analyzed and reported.

Paper Number: 02S-SIW-095
1. Introduction

In our earlier paper on this subject (Metz and Harmon 2001, SISO-SIW paper 01F-SIW-036) we described the problems related to the use of subject matter experts (SMEs) for validation of a complex theater level warfare simulation. In the validation process, SMEs frequently provide the referent knowledge, interpret the requirements, and give the V&V Agent the ability to compare simulation capabilities against this information to make validation decisions. Combining these separate and distinct functions almost always causes confusion for the SMEs and the V&V Agent, and that confusion leads to multiple problems, including inconsistency, lack of repeatability, failure to identify causes, and difficulties reconciling opinions. In combination, these problems prevent the V&V Agent from gaining the necessary understanding of the simulation's validity or from adding to user confidence in the credibility of the simulation.

Given these problems, why use SMEs in the process at all? Currently, it is not possible to provide any other acceptable referent for complex theater level warfare simulations. Obviously, actual warfare with future doctrine and future forces cannot be conducted and measured on the battlefield. Other results validation methods, including the use of test and evaluation data and comparison to other model and simulation results, also do not provide a sufficient referent for comparison. Since the experimental data cannot be obtained elsewhere, SME opinions will be required to conduct results validation of these complex theater level simulations for the foreseeable future. And since the new warfare simulations under development will be used to help senior Department of Defense leaders make multi-trillion-dollar decisions, they must be validated to the maximum extent possible.

This paper describes one of the new techniques under development to help make the use of SME opinions in validation more independent, consistent, and repeatable. We describe the progress in planning to date for this work, to be done on JWARS Release 1.4, the beta test version of JWARS slated for release in the spring or summer of 2002: how the SMEs will be identified, how their opinions will be obtained, how the JWARS scenario snippets or vignettes will be obtained (or developed) to form the basis of the SME predictions, what statistical techniques will be used, how the comparisons will be made, and, finally, how the comparisons will be analyzed and reported. If this effort is successful, V&V practitioners will have a better method to assess the validity, and thus the credibility, of complex theater level warfare simulations.

2. JWARS and the JWARS V&V Process

The JWARS simulation is a perception-based constructive simulation of joint force theater level warfare intended for analysis. Information about JWARS is available in our previous paper. If more information is desired, an executive overview presentation and the Operational Requirements Document are available from the Joint Staff (J-8) Studies and Analysis Management Division's (SAMD) Joint Models web site (https://www.jointmodels.mil/JWARS/).

The JWARS V&V process includes pre-design artifact validation, design verification, algorithm validation, and results validation. Detailed information about the JWARS V&V process is available in the Joint Warfare System (JWARS) Verification and Validation Plan (JWARS 1998).

Current JWARS results validation includes a detailed evaluation of the simulation base case outputs by the V&V Agent's warfare experts and a limited evaluation of the sensitivity of the base case results. JWARS results validation encompasses two levels of SME activities: the JWARS Verification & Validation (V&V) Team's SMEs and the group of SMEs who are members of the JWARS User Subgroups. This paper addresses the planning for using these SMEs to support the results validation of the JWARS simulation.

3. Planning for Using SMEs in the JWARS Results Validation Process

In the JWARS V&V team's work through Release 1.3 (1.3 was released by the JWARS Office in the fall of 2002, and the V&V Agent review is slated to be completed in February 2003), the results validation process has included multiple activities: V&V Agent analysis of the base case output; comparison of V&V team SME predictions for the results of excursions against the simulation output of those excursions; and comparison of JWARS output to the output of other theater level simulations. The planned addition of other SMEs to the Release 1.4 V&V activities will expand the JWARS results validation process to include a more formal and objective comparison to SME predictions. This paper reports on the results of the planning to date for this formal results validation as we approach the completion of JWARS Release 1.4 in the spring or summer of 2002. When completed, Release 1.4 will be made available for user beta testing and additional V&V assessment. Specific planning activities relate to the identification of the SMEs, how the SME opinions will be gathered, development of the information the predictions will be based on, identification of the statistical techniques that will be used, development of the methods for comparison, and how the comparisons will be analyzed and reported.

3.1 How the SME Opinions Will Be Gathered

The SMEs for JWARS results validation will be members of the JWARS User Sub Groups. Currently, JWARS Sub Groups with warfare functional area responsibilities include air, C4ISR, ballistic missile defense, land, maritime, space, special operations, transportation & logistics, and weapons of mass destruction. We are working with the JWARS Office to obtain the list of members of the Sub Groups in each of these warfare areas in order to create a candidate list of SMEs and contact them with requests to participate in the SME prediction process. At the same time, we will develop a process to qualify each of them in terms of their expertise and experience via a questionnaire (either hard copy or via electronic access at the IMC web site) to identify their area of expertise and depth of experience. Glasow (1998) and Pace (1999) have written excellent papers on this process.

Once the determination of the SME group is made, the next challenge is determining how to get the results validation information to them in order for them to make a prediction. A face-to-face meeting with the whole group adds to the danger that the SMEs will discuss the predictions and bias their opinions. Face-to-face individual meetings would overtask the available V&V Agent resources. Providing the information via e-mail will require that each of the SMEs have secure internet access. Mailing the information to the SMEs will have to be done using classified mail procedures and will require significant time to print and prepare the information packages. The decision on how to distribute the information must be made soon. At this time the most likely method is via classified e-mail.

3.2 Snippet and Vignette Development
The next, and perhaps most significant, challenge is the development of the snippets or vignettes that will be the basis for the SME predictions. We plan to take advantage of two snippet packages that were prepared for the JWARS Study Team (a group of early user testers) in the areas of land warfare and strategic mobility. These snippets will be reviewed and modified for our results validation purposes. In the other warfare areas (air & space, maritime, and C4ISR) the V&V team will have to develop vignettes for each area. At this time the following areas have been identified for analysis: theater ballistic missile defense, sensor operations, air-to-ground adjudication, and mine warfare.

As described in our previous paper, each vignette must represent a narrow operational situation that covers the simulation behavior space necessary to suit the purpose. Each vignette will define (see the sketch after this list):

o A set of operational components that participate and interact (units and their support forces),
o A set of conditions under which the components must operate (terrain, weather, light conditions),
o A set of constraints that limit the extent of the operations (physical space and time, for example), and
o The variables against which the simulation results will be measured (losses by asset type, ability of sensors to locate battle space entities, or quantity moved over time).

These operational vignettes will be used to structure the SME questionnaires that will then be distributed to the selected SMEs in order to survey their predictions of the outcomes.
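To make the four vignette elements concrete, the sketch below expresses them as a simple data structure in Python. This is an illustration only; the field names and example values are our assumptions for exposition, not the JWARS vignette schema.

# A minimal sketch of the vignette elements listed above. All names and
# values are hypothetical illustrations, not JWARS data.
from dataclasses import dataclass, field

@dataclass
class Vignette:
    name: str
    components: list[str] = field(default_factory=list)          # units and support forces
    conditions: dict[str, str] = field(default_factory=dict)     # terrain, weather, light
    constraints: dict[str, str] = field(default_factory=dict)    # space and time limits
    measured_variables: list[str] = field(default_factory=list)  # outcome measures

# Example: a hypothetical air-to-ground adjudication vignette.
strike = Vignette(
    name="air-to-ground adjudication",
    components=["strike package", "armored battalion", "SAM battery"],
    conditions={"terrain": "open desert", "weather": "clear", "light": "day"},
    constraints={"area": "50 x 50 km box", "duration": "6 hours"},
    measured_variables=["losses by asset type", "sorties flown"],
)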
3.3 Analysis of Referent Error Sources

Any experimental effort must begin by understanding its error sources and the magnitude of the effects of those errors upon the measurements. To this end, we have identified two potentially large sources of errors that could contaminate the referent built from these SME responses to the questionnaires:

• SME interdependence arising from their interactions, and
• Expertise biases.

The first of these error sources results from the possible interactions between collocated SMEs responding to the questionnaire via the Internet. The lack of supervision during the survey administration creates the possibility that collocated SMEs might cooperate, even casually. We have assumed that each responding SME acts as an independent source of knowledge, and any cooperation could introduce coherence into the predictions that does not actually exist.

The second error source that could introduce bias is SME area and level of expertise. SMEs responding to situations outside the boundaries of their expertise could introduce greater uncertainty into the compiled predictions than that derived from knowledgeable responses. Further, inexperience also tends to introduce bias towards particular answers. Both of these error sources could contaminate part or all of the data derived from these questionnaires.

As a result, we will analyze the raw responses to assess the contributions of these error sources, determine the magnitude of their effects, and correct for them when possible. To this end, we will:

• Identify collocated groups of SMEs from their responses and our understanding of their current assignments and duty stations, and
• Characterize respondent SME areas and levels of expertise from their self-reports and from the questions that indirectly test their expertise.

We will then apply Chi-squared tests to the raw data we receive to assess the independence of that data from these error sources (a sketch follows). We will examine any data that do not pass these tests and determine whether quantifiable biases affect those data. If we identify any biases in the data, we will attempt to localize those biases in the data sets, quantify them, and normalize the affected data sets appropriately. Data that we cannot normalize will simply be removed from the referent data set.
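The sketch below illustrates one way to run the Chi-squared independence screen just described, assuming the predictions for a single vignette variable arrive as numeric values tagged with a collocation-group label. The function name, the binning strategy, and the example data are our assumptions, not part of the JWARS plan.

# A minimal sketch of the independence screen. All names and data are
# hypothetical illustrations.
import numpy as np
from scipy.stats import chi2_contingency

def independence_screen(predictions, groups, bins=4, alpha=0.05):
    """Chi-squared test of whether binned SME predictions depend on
    collocation-group membership."""
    x = np.asarray(predictions, dtype=float)
    groups = np.asarray(groups)
    # Bin the continuous predictions into equal-width categories.
    edges = np.linspace(x.min(), x.max(), bins + 1)
    binned = np.digitize(x, edges[1:-1])
    # Contingency table: rows are collocation groups, columns are bins.
    labels = np.unique(groups)
    table = np.array([[np.sum((groups == g) & (binned == b))
                       for b in range(bins)] for g in labels])
    table = table[:, table.sum(axis=0) > 0]  # drop empty bins
    chi2, p, dof, _ = chi2_contingency(table)
    return chi2, p, p < alpha  # True flags suspect dependence

# Example: thirty hypothetical loss predictions from three sites.
rng = np.random.default_rng(0)
preds = rng.normal(100.0, 15.0, size=30)
grps = np.repeat(["siteA", "siteB", "siteC"], 10)
print(independence_screen(preds, grps))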
3.4 Characterization of Referent Distributions

Having corrected the raw data for the effects of errors, we will then proceed to build the referent. This involves computing the arithmetic means of the SME predictions. If necessary, we will compute separate means for the different levels of expertise represented in the respondent sample. We will also compute several characteristics of the uncertainty associated with those means, including:

• Standard deviation (second moment),
• Skewness (third moment), and
• Kurtosis (fourth moment).

We intend to compute skewness and kurtosis in their nondimensional forms. We will then assess the skewness of each result to determine the impact of the distribution's deviation from a normal distribution. If the skewness shows moderate or large deviation (i.e., skewness > 0.5), we shall adjust the means and standard deviations to more correctly reflect the nature of the uncertainties. We have also assumed in this analysis that all distributions are unimodal. We will examine the raw data carefully to determine the likelihood of any violations of this assumption and will correct the data appropriately if such cases are detected. We will also examine all of the statistical results for any anomalies that may have manifested in these refined data. If we discover any, we will attempt to identify the nature of the error sources and correct or annotate the data appropriately. Further, we will use the statistical characteristics of the individual SME predictions to compute the variances of the standard deviations, skewnesses, and kurtoses themselves. This gives us another set of measures of the statistical behavior of the data set against which to compare the simulation results. A sketch of the basic moment computations follows.
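The sketch below illustrates the moment computations, assuming the corrected SME predictions for one vignette variable are available as a numeric array. The skewness screen at 0.5 follows the text above; the function name and example data are ours.

# A minimal sketch of the referent moment computations. Names and data
# are hypothetical illustrations.
import numpy as np
from scipy.stats import skew, kurtosis

def characterize(predictions):
    """Mean plus nondimensional second-, third-, and fourth-moment
    characteristics of one referent prediction distribution."""
    x = np.asarray(predictions, dtype=float)
    stats = {
        "mean": x.mean(),
        "std": x.std(ddof=1),     # sample standard deviation
        "skewness": skew(x),      # nondimensional third moment
        "kurtosis": kurtosis(x),  # excess (nondimensional) fourth moment
    }
    # Flag moderate or large deviation from normality (skewness > 0.5).
    stats["needs_adjustment"] = abs(stats["skewness"]) > 0.5
    return stats

rng = np.random.default_rng(1)
print(characterize(rng.gamma(2.0, 10.0, size=25)))  # hypothetical responses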
Finally, we will analyze the raw and statistical data for any correlations between the prediction results and the vignette conditions using multivariate analysis. This analysis will identify any dependencies between the vignette conditions and the prediction outcomes as well as the nature and significance of these correlations. We intend to identify the degree of correlation initially by the correlation coefficient and to compute correlation significance from the significance of that coefficient (a sketch follows). When we discover a significant correlation, we will attempt to determine whether a causal relationship exists between the examined variables or whether the apparent relationship arises from those variables sharing another common variable. From this information, we will attempt to statistically characterize any causal relationships that we discover. If we fail to identify the causal relationships underlying any correlations, we will characterize the correlations alone. We will include in the referent the most significant correlations and descriptions of the causal relationships that we discover in the data.
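The sketch below illustrates the initial correlation screen for one numeric vignette condition paired with one prediction outcome. The Pearson coefficient and its significance stand in for the fuller multivariate analysis described above, and all names and data are our assumptions.

# A minimal sketch of the correlation screen. Names and data are
# hypothetical illustrations.
import numpy as np
from scipy.stats import pearsonr

def screen_correlation(condition, outcome, alpha=0.05):
    """Correlation coefficient between a vignette condition and a
    prediction outcome, with the significance of that coefficient."""
    r, p = pearsonr(np.asarray(condition, float), np.asarray(outcome, float))
    return {"r": r, "p_value": p, "significant": p < alpha}

rng = np.random.default_rng(2)
visibility = rng.uniform(1.0, 20.0, size=30)           # condition (km)
detections = 5.0 * visibility + rng.normal(0, 10, 30)  # outcome
print(screen_correlation(visibility, detections))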
The results of this analysis of the raw SME predictions will establish the referent for the JWARS simulation. This referent will describe the mean predictions as a function of SME expertise level and the combined mean predictions. It will also describe the nature of the uncertainties and the correlations associated with each of these prediction sets. In addition, this referent will identify areas where SME opinions do not agree. This disagreement highlights the areas where the referent is weakest. Such weaknesses may reflect the need to collect further information through additional surveys if possible. With descriptions of strengths and weaknesses combined, the integrated result of all this information will be a referent far richer and more objective than could be derived subjectively from a small set of SMEs.
3.5 Comparing Results against the Referent

The next step in our validation process is to execute the vignettes in the JWARS simulation. Executions, repeated a statistically significant number of times, will generate the distributions of the simulation's predictions. We will then analyze the results from these executions to:

• Determine the prediction means,
• Compute the statistical characteristics (i.e., second, third, and fourth moments) of the distributions around those means, and
• Identify the correlations between the simulation's predictions and the vignette conditions as well as the significance of those correlations.
In this effort, we will reproduce the data in the referent from the simulation instead of the SMEs. Using the same vignettes and the same analysis procedures as used to build the referent will make the comparison of the simulation results against the referent straightforward. Given that both the referent and the JWARS simulation represent uncertain events, we expect the simulation results and the referent to vary in places. The question is whether this variance is significant or not. We will test significance by applying the Student's t-test to these comparisons. We have chosen this significance test over others because we only have the means and the standard deviations of the sampled data available. This test will give us a measure of the confidence that the simulation results agree with the referent. We will apply this test to each of the prediction means, statistical characteristics, and correlations described in the referent. Each factor comparison will result in a difference between simulation results and referent and a confidence that the two results agree statistically (a sketch follows).
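The sketch below illustrates this comparison for a single factor, assuming only summary statistics (mean, standard deviation, and sample size) are available from the referent and from the simulation replications. The function name and the example numbers are our assumptions, not JWARS output.

# A minimal sketch of the per-factor significance test. Names and
# numbers are hypothetical illustrations.
from scipy.stats import ttest_ind_from_stats

def compare_factor(ref_mean, ref_std, ref_n, sim_mean, sim_std, sim_n,
                   alpha=0.05):
    """Student's t-test from summary statistics: the difference between
    simulation result and referent, with a two-sided p-value."""
    t, p = ttest_ind_from_stats(ref_mean, ref_std, ref_n,
                                sim_mean, sim_std, sim_n)
    return {"difference": sim_mean - ref_mean, "t": t, "p_value": p,
            "agrees": p > alpha}  # high p: no significant disagreement

# Example: referent from 24 SMEs, simulation from 50 replications.
print(compare_factor(ref_mean=102.0, ref_std=14.0, ref_n=24,
                     sim_mean=97.5, sim_std=11.0, sim_n=50))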
This analysis takes validation beyond its current state by delivering not only a comparison of how closely the simulation results agree with the referent but also a measure of the confidence in that agreement at each point of the comparison.

3.6 Activity Schedule

We plan to conduct the activities as shown in Figure 1 and complete the effort in this calendar year.
[Figure 1 - Activity Schedule: a timeline from JAN 02 through JAN 03 covering Create Vignettes (land combat direct and indirect fire, air-to-ground attrition, strategic mobility, and surveillance/perception), Build Questionnaire, Do Survey, Analyze Survey, Iterate, Run Simulation, Compare, and Report, with IMC and Zetetix IPR milestones marked.]
As shown in Figure 1, building the simulation vignettes is the longest activity and will be iterated as we build the questionnaires to include areas we feel are not discrete enough or descriptive enough in the first iteration.

We will run the JWARS simulation during the time that we conduct the survey and analyze the survey responses, as these activities are not dependent on one another. The comparison phase will follow, and then we will report on the results in the JWARS V&V Report for Release 1.4.

4. Summary

This paper provides an update on the planning that has been conducted in preparation for using objective validation techniques for JWARS results validation. The planning process has been limited to those activities that can be conducted prior to the release of version 1.4 of the JWARS simulation in late January or February 2002 (this paper was written in December 2001). We believe that this process will significantly improve the results validation process for JWARS and give future accreditors a better basis for determining the credibility of JWARS in their assessments of whether or not JWARS is suitable for their intended use.

By removing as much of the subjectivity as possible from the process, we hope to use the SMEs to obtain an objective evaluation of the true capabilities of the JWARS simulation. However, we realize that it is not possible to remove all subjectivity and obtain a completely objective validation. In future papers we will provide updates on the progress of the process and will make our observations and conclusions available to other members of the simulation community.

5. References

Note: website URLs were accurate on 2 January 2002.

DMSO (2000). Department of Defense Verification, Validation, and Accreditation Recommended Practices Guide, Build 2 (and future completed editions). Available at: http://www.msiac.dmso.mil/vva/.

Glasow (1998). Priscilla Glasow, "Characteristics, Selection, and Use of Subject Matter Experts," POET WG2K IV&V Paper No. 25, MITRE Technical Report MTR 99W0000048, June 1998.

Joint Staff J-8 (1998). Joint Warfare System (JWARS) Operational Requirements Document (ORD), 27 August 1998. Available at: https://www.jointmodels.army.mil/.

JWARS (1998). Joint Warfare System (JWARS) Verification and Validation Plan, Version 3.0, 13 August 1998.

JWARS (2001). Joint Warfare System (JWARS) Office Executive Overview Briefing, August 27, 2001. Available at: https://www.jointmodels.army.mil/jwars/library.html.

Metz and Harmon (2001). Michael L. Metz and Scott Y. Harmon, "Using Subject Matter Experts for Results Validation of a Complex Theater Warfare Simulation," 2001 Fall Simulation Interoperability Workshop Papers, September 2001. Available at: http://www.sisostds.org/index.htm.

Pace (1999). Dale K. Pace, "Use of Subject Matter Experts (SMEs) in Simulation Evaluation," 1999 Fall Simulation Interoperability Workshop Papers, September 1999. Available at: http://www.sisostds.org/index.htm.

Author Biographies

MICHAEL METZ, Vice President of Innovative Management Concepts, is the Technical Director of the Joint Warfare System (JWARS) Verification and Validation program. He is a specialist in the design, development, verification, validation, and accreditation of simulations. Mr. Metz is a member of the Defense Modeling and Simulation Office's Verification, Validation and Accreditation (VV&A) Technical Support Team (TST). In his work for the TST he is a contributing author of the Department of Defense VV&A Recommended Practices Guide.

SCOTT HARMON is president of Zetetix, a small business specializing in modeling complex information systems. Mr. Harmon has been developing rigorous techniques for the validation of simulation federations and human behavior representations.