MILITARY PSYCHOLOGY, 2003, 15(1), 3–16 Copyright © 2003, Lawrence Erlbaum Associates, Inc.
Training Evaluation in the Military: Misconceptions, Opportunities, and Challenges

Eduardo Salas and Laura M. Milham
Department of Psychology and Institute for Simulation and Training
University of Central Florida
Clint A. Bowers
Department of Psychology
University of Central Florida
Due to a number of misconceptions about training evaluation in the military, these evaluations are rarely done. In this article, we review recent findings that identify obstacles to training evaluation in the military and offer some alternatives for dealing with these problems. Further, we discuss the use of theoretically driven evaluation outcomes to provide evaluators with information that can feed back into the training system. Finally, we discuss future challenges for training evaluation in the military environment, such as the evolution from physical to cognitive tasks; training large, distributed teams; and the use of simulation in training design.
The small number of training evaluations that are routinely completed (and subsequently reported) is surprising. It is tempting merely to create our best solutions to training problems and rest with the belief that this training is effective. Military training is expensive, however. It has been estimated that in past years $27.1 billion was spent on individual and collective training, and an additional $2.2 billion was spent on simulators and necessary support (Orlansky et al., 1994). This expenditure is often made without asking such questions as (a) Does training really work? (b) Is a multi-million-dollar piece of simulation technology effective at training war-fighting skills? and (c) Do military personnel learn after attending training?
In fact, it is interesting to note that our simulation equipment goes through validation, verification, and acceptance testing, but our training does not. As one of the largest consumers of training, the military must have a way to evaluate what has been learned. Thus, we have attempted to identify some of the current obstacles to training evaluation and to suggest some solutions for dealing with these obstacles.
MISCONCEPTIONS ABOUT TRAINING EVALUATION

Perhaps the most challenging obstacle in all of military training is the set of beliefs held by those who procure and develop training. We have had the opportunity to observe the development of large and small training systems over the past several years. The following beliefs are often openly stated but more often implicit in the behavior of training organizations. Because these beliefs are the single largest impediment to training evaluation, we list and challenge them in the following paragraphs.

“Training Evaluation Is Impossible in Our Environment”

This problem probably is created by training evaluators themselves. Our insistence on pristine experimental designs may have rendered us irrelevant to the military. For example, experimental evaluation designs require a control group, a condition often described as “absolutely necessary.” However, nontreatment control groups often are simply impossible given mission demands. When the military is forced to choose between training and evaluation, the outcome is not surprising. Further, when each person goes through the program to achieve “readiness,” it is difficult to identify the strengths and weaknesses of each student and focus training accordingly. Currently, training often is treated as a “check-in-the-box” procedure, and students are evaluated only on the degree to which they complete the training. To begin to address this issue, alternative training evaluation designs can be explored to investigate options that do not require control groups. These methods can allow training evaluators to obtain valuable information about the training.

“Training Evaluation Is Not Needed; We Know That It Works”

Common types of evaluation data, such as reactions to the training, often suggest that training is very effective. Yet, these data may not provide training designers with useful information to restructure training or to determine whether training increases learning or is transferred back to the job. This is a critical concern in military settings, as the goals of military training include the ability of trainees to perform
trained behaviors in operational settings. Without looking at these various outcomes, training designers have little understanding of the nature of transfer and the optimal length of time before retraining becomes necessary. This problem can be addressed by investigating additional outcomes of training evaluation (e.g., learning, behavioral, and organizational) and by using theoretically based models to drive outcome selection (Kraiger, Ford, & Salas, 1993).
“Training Evaluation Is Not Needed; Our Simulators Are Perfect”

This misconception stems from the belief that “simulation = training,” although recent reviews have suggested that this is not the case (Salas, Cannon-Bowers, Rhodenizer, & Bowers, 1999). Simulation allows trainees to practice skills but often does not collect performance data, and simply practicing skills does not ensure that a trainee can perform a task well. The process of developing a simulation system should begin with a training-needs analysis to determine the specific training objectives to be met by the simulator. Further, performance measurement in simulators is not commonly used to track how well a trainee is meeting those objectives. Unless training design principles are folded into the front end and back end of simulators, there are few data with which to evaluate the system.
IMPROVING TRAINING EVALUATION IN MILITARY SETTINGS

New Designs

One bottleneck to performing training evaluation is the rigidity of traditional experimental design. For example, safety issues may require that all soldiers go through a training program, thereby eliminating the control group required by experimental designs. However, recent design alternatives provide flexibility for training evaluators. One of the more interesting ways to conduct training evaluation is to redefine what needs to be evaluated. Sackett and Mullen (1993) suggested that, in addition to the traditional question about the degree of change that has occurred because of training, evaluators may be interested in whether a target performance level has been met. Sackett and Mullen argued that measuring change from pretest to posttest, or from control group to experimental group, is important only in three situations: (a) if the experimenter wants to evaluate the utility of a training program, (b) if the evaluator wants to compare the effectiveness of two training programs, or (c) if the evaluator wants to investigate the effectiveness of training methods or approaches.
In contrast, some military evaluators may want to know only whether competencies can be performed at an acceptable level after training. In a situation where the trained competencies are targeting skills, for example, the most interesting outcome might be the trainee’s accuracy on the bombing range. To evaluate this, Sackett and Mullen (1993) stated that a target performance level must exist against which trainee performance can be tracked. Target performance levels identify a minimally acceptable level, or a range of acceptable levels, of performance for trainees. For example, a trainer can use a minimum criterion of bombs on target 60% of the time to assess a trainee’s performance and to follow his or her improvement. Finally, Sackett and Mullen suggested that, once the acceptable performance level is known, it should be used to make such decisions as whether the trainee should go through a training program again or receive remediation. This use of performance data is important because it allows an assessment of competencies that is more helpful than a check-in-the-box procedure, which simply indicates that trainees have completed a program.
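To make this criterion-referenced logic concrete, the short Python sketch below scores hypothetical bombing-range data against the 60% bombs-on-target criterion described above and flags trainees for remediation. The data, the threshold, and the function names are illustrative assumptions, not part of Sackett and Mullen's (1993) procedure.

```python
# Illustrative sketch only: criterion-referenced evaluation against a target
# performance level (hypothetical data and threshold).

TARGET_HIT_RATE = 0.60  # assumed minimum proportion of bombs on target

def hit_rate(drops):
    """Proportion of recorded weapon drops scored as on target."""
    return sum(drops) / len(drops)

def evaluate(trainees):
    """Return a meets-target/remediate decision for each trainee."""
    decisions = {}
    for name, drops in trainees.items():
        rate = hit_rate(drops)
        decisions[name] = ("meets target" if rate >= TARGET_HIT_RATE
                           else "remediate")
        print(f"{name}: {rate:.0%} on target -> {decisions[name]}")
    return decisions

# Hypothetical range data: 1 = on target, 0 = miss.
trainees = {
    "Trainee A": [1, 1, 0, 1, 1, 0, 1, 1],   # 75% on target
    "Trainee B": [0, 1, 0, 0, 1, 0, 1, 0],   # 38% on target
}
evaluate(trainees)
```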
Evaluating Training Without a Control Group

Sackett and Mullen (1993) also argued that three preexperimental designs can be used as input for making decisions about training. The posttest-only no-control-group design, the pretest–posttest no-control-group design, and the posttest-only nonequivalent-control-group design have been suggested as potential evaluation strategies. Posttest-only no-control-group designs include neither a control group nor a pretest with which to assess whether there are changes after training. Pretest–posttest no-control-group designs include a pretest and a posttest but do not require a control group. Finally, the posttest-only nonequivalent-control-group design does not include a pretest. Posttest-only no-control-group designs can provide evaluators with information about whether trainees have reached target levels of performance, as described previously. The pretest–posttest no-control-group design and the posttest-only nonequivalent-control-group design can provide evaluators with a measure of change, although the reason for the change cannot be attributed unequivocally to the training (Sackett & Mullen, 1993). Sackett and Mullen suggested that a rational examination of threats to internal validity can provide insight regarding the risks posed by these threats.
Haccoun and Hamtiaux (1994) provided a way to increase the utility of the pretest–posttest no-control-group design: the Internal Referencing Strategy (IRS). The IRS incorporates both relevant (trained) and nonrelevant (untrained) material into the pre- and posttests. The changes from pretest to posttest on the two types of material are compared, and training is considered effective when the change on the trained material is greater than the change on the untrained material. For a fighter pilot trained on High-Speed Anti-Radiation Missile (HARM) procedures, this may mean including Standoff Land Attack Missile (SLAM) procedures in the pre- and posttests. Training is considered effective, then, when the pilot improves more on the HARM procedures than on the SLAM procedures.
The pattern of results found by Haccoun and Hamtiaux suggested that the IRS design provides information similar to that of a pretest–posttest with comparison group design, although the IRS may be susceptible to Type II error, that is, concluding that an effective program is ineffective. Given the constraints of military training, however, such methods may provide useful information to evaluators.
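The sketch below illustrates the IRS logic with hypothetical test scores: pretest-to-posttest gains on trained (HARM) items are compared with gains on untrained (SLAM) items using a simple paired comparison. The scores and the choice of a paired t test are assumptions made for illustration; they are not drawn from Haccoun and Hamtiaux (1994).

```python
# Illustrative IRS-style comparison with hypothetical scores.
import numpy as np
from scipy import stats

# Rows = trainees; percentage-correct scores on trained (HARM) and
# untrained (SLAM) knowledge items, before and after training.
pre_trained    = np.array([55, 60, 48, 62, 57])
post_trained   = np.array([80, 85, 74, 88, 79])
pre_untrained  = np.array([50, 58, 47, 60, 55])
post_untrained = np.array([53, 60, 49, 63, 56])

gain_trained   = post_trained - pre_trained       # change on trained material
gain_untrained = post_untrained - pre_untrained   # change on untrained material

# IRS logic: training is credited only if gains on trained material
# exceed gains on untrained material.
t, p = stats.ttest_rel(gain_trained, gain_untrained)
print(f"mean gain trained = {gain_trained.mean():.1f}, "
      f"untrained = {gain_untrained.mean():.1f}, t = {t:.2f}, p = {p:.4f}")
```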
Reducing the Expense of Training Evaluation

Yang, Sackett, and Arvey (1996) suggested ways for evaluators to reduce the costs of evaluation while maintaining statistical power. When comparing a pretest–posttest design to a posttest-only design, the pretest–posttest design may be more powerful and thus require fewer participants. In cases where measuring criterion performance is expensive, however, the posttest-only design may be cheaper, even if it requires more participants to achieve the same power. For example, if criterion performance gauges how well a navigator performs procedures during an actual flight, then measuring each individual before and after training may be more expensive than using more trainees and measuring them only after training.

Another way to decrease costs is to examine how participants are allocated to control and experimental groups. Yang et al. (1996) suggested that, if training costs are high and control costs are relatively low, it is possible to skew the allocation in favor of the control group and still maintain power. They provided guidelines for setting the ratio of control group members to training group members based on the costs of training participants, administering criterion measures, and creating criterion measures.

Finally, Yang et al. (1996) suggested that a less expensive proxy criterion can be used in place of the target criterion measure. Target criterion measures are those gathered in the actual operational environment. In many cases, performance measurement in the operational environment is unrealistic due to safety and monetary concerns. Proxy measures have less fidelity than the target criterion measures but still represent the criterion. Because they are less expensive, proxy criteria allow evaluators to increase the sample size, which in turn increases the likelihood that changes in performance are detected. An example of a proxy measure is evaluating performance in a low-fidelity simulator rather than during an actual flight. Another example is measuring team performance during a scenario engineered to elicit teamwork skills. Indeed, using such proxy measures may increase control of the situation by allowing evaluators to elicit the trained competencies.
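As a rough illustration of these cost and power trade-offs, the sketch below searches over allocations of a fixed evaluation budget between trained and control participants, using a normal-approximation power formula for a two-group posttest comparison. The costs, budget, effect size, and the approximation itself are illustrative assumptions; they are not the formulas or guidelines given by Yang et al. (1996).

```python
# Simplified sketch: for a fixed evaluation budget, search over how many
# trainees vs. control participants to run, using a normal-approximation
# power formula for a two-group posttest comparison. Costs, effect size,
# and budget are hypothetical.
from math import sqrt
from scipy.stats import norm

EFFECT_SIZE = 0.5          # assumed standardized training effect (d)
ALPHA = 0.05
COST_TRAIN = 2000.0        # hypothetical cost to train one participant
COST_MEASURE = 300.0       # hypothetical cost to collect the criterion on one person
BUDGET = 60000.0

def power(n_train, n_control, d=EFFECT_SIZE, alpha=ALPHA):
    """Approximate power of a two-sample z test with unequal group sizes."""
    se_term = sqrt(n_train * n_control / (n_train + n_control))
    return norm.cdf(d * se_term - norm.ppf(1 - alpha / 2))

best = None
for n_train in range(5, 200):
    spent_on_training = n_train * (COST_TRAIN + COST_MEASURE)
    remaining = BUDGET - spent_on_training
    if remaining < COST_MEASURE:
        break
    n_control = int(remaining // COST_MEASURE)  # spend the rest on controls
    p = power(n_train, n_control)
    if best is None or p > best[0]:
        best = (p, n_train, n_control)

p, n_train, n_control = best
print(f"best allocation under budget: {n_train} trained, "
      f"{n_control} controls, approximate power = {p:.2f}")
```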
Measuring the Right Things

In addition to identifying alternative designs for training evaluation, researchers have made improvements in measuring the key training outcomes identified by Kirkpatrick (1976). The following sections review how these outcomes can be used to assess whether training or simulator systems are providing the competencies they target.
Reaction outcomes. Reactions, the most common form of training evaluation outcome collected today, are typically self-report measures capturing the degree to which trainees liked the training. When only this type of information is collected, key attitude data are ignored, such as the relevance and perceived value of the training. These data can help training designers identify whether the current content is applicable, whether there are gaps in the training program, and when task analyses should be updated. Other research on trainees’ reactions has suggested that utility-type reaction measures are related both to learning and to the degree to which the training is transferred to on-the-job performance (Alliger, Tannenbaum, Bennett, Traver, & Shotland, 1997). The method of collecting these data may alter the chances of receiving useful information. For example, if trainees are given an open-ended response format, they may suggest areas that need to be addressed in training (Tannenbaum, Cannon-Bowers, Salas, & Mathieu, 1993). To illustrate, if the switchology of a new weapons system is much different from that of older systems, trainees may identify the need to spend more time on how to employ it. On the other hand, if the systems are similar, trainees may feel that the training is redundant and that another component of the system should be emphasized. These data can feed directly back into the training system to update and improve training. Each of these findings suggests that valuable attitude data can be collected if utility data are gathered in addition to affective reaction data.

Learning outcomes. Learning outcomes have also benefited from an influx of recent research. The measurement of learning is purported to capture the principles, facts, techniques, and rules targeted in training (Kirkpatrick, 1976). New research on learning outcomes (Kraiger et al., 1993) provides a framework for choosing evaluation measures based on learning outcomes. The approach treats learning as a multidimensional construct in that learning is evident in cognitive outcomes, skilled behaviors, and affective changes. For example, skill-based learning outcomes examine the development of technical or motor skills. Such learning is characterized by the linking of behaviors in an organized manner (Kraiger et al., 1993; Weiss, 1990) and includes procedures such as assembling a gun, landing a plane, or using communication skills.

Cognitive outcomes. Cognitive outcomes include the quantity and type of knowledge and the relations among knowledge elements (Kraiger et al., 1993). The components of cognitive outcomes are verbal knowledge, knowledge organization, and cognitive strategies. Verbal knowledge includes declarative knowledge, or what a trainee needs to know to perform a job. Some researchers (Johnson-Laird, 1983; Rouse & Morris, 1986) have purported that knowledge organization is equal to, or more important than, declarative knowledge.
The structure of knowledge is generally described as the organization of, and interrelations among, domain knowledge. Recent research has uncovered differences between the knowledge structures of novices and experts. Specifically, the knowledge bases of experts are characterized by hierarchical storage and strong paths between critical elements (Glaser & Chi, 1989). Further, expert structures have strong links between problems and solutions, enabling experts to assess a problem and arrive at a solution quickly (Glaser & Chi, 1989), whereas novices hold separate knowledge structures for problem definition and solution strategies. Based on the assumption that one of the goals of training is to turn novice performers into more expert performers, Kraiger et al. (1993) suggested that trainees’ understanding of course material may be addressed best by measuring their structural knowledge. Knowledge organization can be targeted at each step of the training process, from needs assessment to training evaluation. Because of the importance of this factor, both the structure and the content of knowledge are hypothesized to be effective tools for training evaluation.

Cognitive strategies are the final component of cognitive outcomes. Because of the differences between novice and expert knowledge, experts may have more effective task strategies than novices. Various knowledge elicitation methods can be used to evaluate how experts deal with both problem situations and everyday situations. Fowlkes, Salas, Baker, Cannon-Bowers, and Stout (2000) suggested a scenario-based interview approach whereby both experts and novices are given a challenging situation and are asked to provide ways of resolving the problem. Expert and novice strategies are then compared to determine how experts use strategies and knowledge to recognize the problem as it unfolds, determine the correct course of action, and execute a problem-solving strategy. For example, expert radar operators may be able to predict, based on snapshots during a brief, the times during a mission when there is potential for fratricide; novice operators may not recognize the potential until much later in the mission.

Even though some researchers have suggested that measuring knowledge structures or mental models is a better indication of learning than measuring declarative knowledge, methodologies for measuring conceptual knowledge constructs need further development. The model proposed by Kraiger et al. (1993) provides a solid theoretical basis for the measurement of such constructs but does not provide a specific methodology for measuring mental models. Further, more research is required to determine how declarative, procedural, and structural knowledge affect performance and how expert strategies and problem solving can be measured more effectively. Finally, methodologies need to be developed that spell out how to use this information to drive future training. Experiential training has been suggested as a method of using structured scenarios to create families of experience for novices (Marshall, 1995), an approach that would result in more accurate mental models. These related experiences are hypothesized to illustrate the differences between situations and to provide practice opportunities for dealing with them.
Additional research is needed, however, to examine how well these training programs impact learning outcomes.
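One simple way to quantify structural knowledge, sketched below with hypothetical data, is to collect pairwise concept-relatedness ratings from a trainee and correlate them with an expert referent structure. The concepts, rating scale, and scoring rule are illustrative assumptions; Kraiger et al. (1993) do not prescribe this particular procedure.

```python
# Illustrative sketch: quantify structural knowledge as the correlation
# between a trainee's pairwise concept-relatedness ratings and an expert
# referent structure. Concepts and ratings are hypothetical.
from itertools import combinations
import numpy as np

concepts = ["threat ID", "rules of engagement", "weapon selection", "deconfliction"]
pairs = list(combinations(range(len(concepts)), 2))  # 6 concept pairs

# Relatedness ratings (1 = unrelated ... 7 = highly related), one per pair.
expert_ratings  = np.array([6, 5, 3, 7, 4, 6])
trainee_ratings = np.array([5, 5, 2, 6, 3, 4])

similarity = np.corrcoef(expert_ratings, trainee_ratings)[0, 1]
print(f"{len(pairs)} concept pairs rated; "
      f"trainee-expert structural similarity r = {similarity:.2f}")
```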
Behavioral outcomes. Behavioral outcomes are an important indicator of how well someone can perform a set of tasks in the training environment. In a military setting, this outcome is critical, as the consequences of incorrect performance are high. Advances in the understanding of training performance have focused primarily on the distinction between process and outcome performance (Blickensderfer, Cannon-Bowers, & Salas, 1997; Smith-Jentsch, Zeisig, Acton, & McPherson, 1998). Outcome measures of performance capture how well a team accomplished the mission (e.g., quantity of output, time to perform the task, number of errors). Such measures are important for assessing the ability of teams to use the trained competencies in a given environment. Another way to examine team performance is by measuring the way a task is accomplished. Process measurement, for example, can focus on the degree to which a team used teamwork skills effectively (Fowlkes, Lane, Salas, Franz, & Oser, 1994; Smith-Jentsch et al., 1998). These measures can be extremely diagnostic of performance deficiencies if the observation of team processes is driven by a priori constructs and expectations (Fowlkes, Dwyer, Milham, Burns, & Pierce, 1999). Event-based training (EBAT) is a method for collecting information on both outcome and process measures. EBAT involves introducing events into scenarios to provide opportunities for performing targeted skills (Fowlkes et al., 1994). Further, it provides a measurement opportunity that can, in turn, be used to give trainees feedback on both their taskwork and teamwork skills.
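The sketch below illustrates the spirit of an event-based checklist: each scripted scenario event has an expected response, observers record whether it was exhibited, and responses are summarized by skill dimension. The events, dimensions, and observations are hypothetical and are not taken from the TARGETS or EBAT materials cited above.

```python
# Illustrative event-based checklist: scripted events, expected responses,
# and hit rates summarized by skill dimension (hypothetical data).
from collections import defaultdict

# (scripted event, skill dimension, expected response exhibited?)
observations = [
    ("pop-up threat detected",        "information exchange",  True),
    ("ambiguous radar contact",       "information exchange",  False),
    ("wingman low on fuel",           "supporting behavior",   True),
    ("conflicting target assignment", "initiative/leadership", True),
    ("lost communications",           "supporting behavior",   False),
]

hits = defaultdict(int)
totals = defaultdict(int)
for _event, dimension, exhibited in observations:
    totals[dimension] += 1
    hits[dimension] += int(exhibited)

for dimension in totals:
    rate = hits[dimension] / totals[dimension]
    print(f"{dimension}: {hits[dimension]}/{totals[dimension]} "
          f"targeted responses observed ({rate:.0%})")
```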
Aside from performance during training, another important outcome is whether a trainee transfers what was learned back to the operational environment. In fact, in many cases, training is provided to enhance job performance. Oddly enough, however, transfer is rarely assessed (Salas et al., 1999). Retention of learned material and changes in job performance should be targeted more frequently for training evaluation, and more rigorous methods for collecting such data are needed (Salas & Cannon-Bowers, 2001).
Organizational outcomes. Organizational outcomes describe changes in the organization that can be linked back to specific training objectives (Tannenbaum et al., 1993). Examples of these outcomes may range from increases in safety to overall increases in productivity. Frequently, organizational goals are not evaluated either before or after training. This is surprising, as organizational needs are, at least hypothetically, the reason training is requested. The difficulty of identifying these outcomes often precludes their inclusion in the training system, however. In some cases, these goals may be diametrically opposed, in that the official organizational goals are not reflected in reward programs.
One such circumstance occurs when safety is the identified training goal and competencies are identified and trained to support that goal, but an unwritten goal of the organization is to increase productivity. If “cutting corners,” such as skipping safety steps, supports the rewarded behaviors, then trainees may not perform the trained behaviors (Cannon-Bowers, Salas, & Milham, 2000).

In summary, there is a need to develop more diagnostic and rigorous assessments of learning, behavioral, and organizational outcomes (Salas & Cannon-Bowers, 2001) to ensure that training is leading to the expected changes in each of these areas. Without such assessments, training developers cannot be sure that their programs and simulators are providing the intended competencies to trainees.
CHALLENGES IN TRAINING EVALUATION

It Is More About Thinking and Less About Doing

The changing roles of military jobs play a large factor in the future of training evaluation. In the past, jobs requiring motor skills were predominant. Today, however, tasks are becoming increasingly cognitive in nature. Cognitive tasks include those relying on the understanding and application of factual knowledge, and tasks with cognitive or procedural components may require additional considerations in training design. For example, although there is a general lack of data on skill retention and degradation (Meltzer, Swezey, & Bergondy, 1994), existing studies have indicated that tasks with cognitive or procedural components decay more severely and more rapidly than motor skills (Adams & Hufford, 1962; Childs & Spears, 1986; Mengelkoch, Adams, & Gainer, 1971; Schendel, Shields, & Katz, 1978). To maintain acceptable levels of mission readiness, this faster decay demands a more frequent and accurate gauge of when retraining is necessary; in addition, training design must take retention into account. Training theory and strategies to promote retention have been proposed (Schmidt & Bjork, 1992), but evaluation is necessary to determine the effectiveness of such methods.
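As a simple illustration of how decay estimates might inform retraining schedules, the sketch below assumes that proficiency decays exponentially after training and computes the time at which predicted proficiency falls below a readiness threshold. The decay rates, the threshold, and the exponential form itself are illustrative assumptions, not values reported in the retention studies cited above.

```python
# Illustrative sketch: estimate a retraining interval under an assumed
# exponential decay of post-training proficiency (hypothetical parameters).
from math import log

READINESS_THRESHOLD = 0.75   # minimum acceptable fraction of end-of-training skill

def weeks_until_retraining(decay_per_week, threshold=READINESS_THRESHOLD):
    """Weeks until exp(-decay * t) drops below the readiness threshold."""
    return log(1 / threshold) / decay_per_week

# Assumed decay rates: cognitive/procedural tasks decaying faster than motor tasks.
for task, decay in [("procedural checklist", 0.08), ("motor stick-and-rudder", 0.02)]:
    print(f"{task}: retrain after about {weeks_until_retraining(decay):.1f} weeks")
```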
Another obstacle to measuring cognitive skills is the difficulty of observing them. As suggested previously, however, theoretical and practical work is providing more systematic ways of evaluating these types of skills. As event-based processes and mental model measurement become more refined, these tools can provide ways to measure cognitive and procedural skills within rich, operational contexts.

In addition, the evaluation of team training programs needs to be considered. The measurement of team performance has come a long way; more research is required, however, to develop psychometrically sound measures of team competencies. As is the case in individual training evaluation, reactions are the most common type of training assessment (Dwyer, Fowlkes, Oser, & Salas, 1997).
It is important, however, to evaluate training from multiple perspectives, namely reactions to training, trainee attitudes toward teamwork, knowledge of teamwork concepts, and team performance (Cannon-Bowers et al., 1989). Further, team performance measures have tended to be global, numerical rating scales used by subject matter experts (SMEs). These types of measures tend to lack diagnosticity, in that they may not provide training facilitators with enough information to identify and target weaknesses in training.

Training for Larger, Distributed Teams

Increasingly, teams are changing from those consisting of colocated members to those whose members are spatially separated, or distributed. Another trend is toward very large tactical teams working toward common goals (Fowlkes, Milham, & Neville, 2000). A preponderance of team performance research is based on teams of three or four members, although tactical teams can range from strike teams with 20 members to ground forces comprising hundreds of team members. In large, distributed teams, there may be additional challenges due to constraints on exchanging information (Milham, Fowlkes, & Neville, 2000). Team members may lose nonverbal cues and may be limited to communicating by voice or data links. Nonverbal information is important for team members because it provides visual information about the state of a team’s task; without it, coordination may be more difficult (Milham et al., 2000). It may also be harder to keep track of team members who are in another location, leading to decreased attention and possible lapses in situation awareness. Further, given the sheer size and distributed nature of such teams, members may experience process loss when concerted effort is required simply to coordinate.

Dwyer et al. (1997) suggested that distributed training poses several measurement challenges. First, the tasks are often more similar to real-world tasks, which makes performance measurement more difficult. Second, the increased number of trainees and the dynamic relations among distributed team members make it harder to control team exercises by creating specific measurement opportunities. Third, equipment failures are more likely as additional and different technologies are involved and as various elements use computer and simulation systems that were not designed for optimal compatibility. This problem has, in part, been addressed by the event-based measurement methodologies discussed in previous sections (Dwyer et al., 1997; Smith-Jentsch et al., 1998). EBAT provides training designers with a flexible tool for designing environments that allow trainees to exhibit and practice trained behaviors, and it provides a measurement opportunity that can, in turn, be used to give trainees behavioral feedback. ShipMATE (Shipboard Mobile Aid for Training and Evaluation) is a hand-held, computer-based technology that can facilitate real-time observations, data reduction, and presentation to support EBAT methodologies (Lyons & Allen, 2000).
Another consideration for the evaluation of team exercises is the expertise of raters. Some field research (Fowlkes & Milham, 2000) has found that, when raters have expertise in an area different from that of the trainees being evaluated, their ratings may not agree with those of raters whose expertise is similar to the trainees’; the result is that the same performance is rated very differently. Again, event-based measurement, as a structured method of collecting performance data, is hypothesized to increase agreement among experts in these situations and to provide trained nonexpert raters with a tool for achieving agreement with trained expert raters (Fowlkes & Milham, 2000).
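Agreement between expert and nonexpert raters can be summarized with simple indexes such as percent agreement and Cohen's kappa, as in the sketch below; the event-level codes are hypothetical and are not data from the cited field research.

```python
# Illustrative sketch: percent agreement and Cohen's kappa between an expert
# and a nonexpert rater coding the same event-based checklist (hypothetical codes).
def cohen_kappa(r1, r2):
    """Cohen's kappa for two raters' binary codes (1 = response observed)."""
    n = len(r1)
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    p1_yes = sum(r1) / n
    p2_yes = sum(r2) / n
    expected = p1_yes * p2_yes + (1 - p1_yes) * (1 - p2_yes)
    return (observed - expected) / (1 - expected)

expert    = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
nonexpert = [1, 1, 0, 1, 1, 1, 0, 0, 1, 0]

agreement = sum(a == b for a, b in zip(expert, nonexpert)) / len(expert)
print(f"percent agreement = {agreement:.0%}, "
      f"kappa = {cohen_kappa(expert, nonexpert):.2f}")
```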
The Equipment Is Really Expensive, but the Opportunities for Learning Are Not Any Better

In some domains, the operational environment is too dangerous or impractical to use for training. Advances in simulation and virtual environments give designers powerful tools for building training systems with realistic fidelity to the operational environment. Unfortunately, in many cases, training methods are not used in designing simulated systems, and specific training objectives or performance measurement are often not included in simulated training. Although this has been identified as a problem (Vreuls & Obermayer, 1985), simulator development should be driven by specific training goals. Moreover, transfer of skills from simulation to the operational environment has not been established. Indeed, several researchers (Caird, 1996; Kozak, Hancock, Arthur, & Chrysler, 1993) have questioned the notion that training in virtual reality will transfer to the real world, suggesting that transfer may be limited. This is surprising, as more expensive virtual reality systems are often chosen for training because of their high fidelity. Fidelity, however, is not the only component of effective training (Jentsch & Bowers, 1998). For example, providing feedback is critical to performance improvement (Blickensderfer et al., 1997). Simulators have the potential to collect various kinds of performance data for this purpose; on the other hand, some simulator systems collect countless pieces of information with little focus. A thoughtful approach is required, whereby training technology incorporates training design principles into each step of the system. One way to do this is to use training objectives as guidance for assessing performance in the system. For example, if a training objective involves using standard procedures for a night carrier landing, eye-tracking data can be collected to ensure that the aviator is looking at the appropriate equipment and visual cues, in addition to striving for landing accuracy.
Online diagnostic assessment is another way of providing performance feedback. This involves collecting performance data and adjusting training, providing performance feedback, or both during the training scenario. If transparent feedback is provided to trainees, they can adjust their performance accordingly until they perform at acceptable levels. If not, an intelligent system can adjust the scenario so that trainees receive more guidance, more practice, or both in areas of weak performance. Additional research is required to uncover the conditions under which virtual reality and high-fidelity simulators are effective training tools. Simply providing practice in any environment is not necessarily the best training intervention, especially if it is the only intervention used (Cannon-Bowers, Rhodenizer, Salas, & Bowers, 1998). Diagnostic assessment and feedback are potential ways to use training strategies to increase the utility of a simulated training environment. As with any training test bed, however, there is no substitute for sound training methodologies in the design of training.
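The sketch below illustrates one possible online diagnostic loop: segment-level performance scores are monitored as a scenario unfolds, and additional guidance is triggered when a rolling average falls below an acceptable level. The thresholds, scores, and adjustment rule are illustrative assumptions rather than features of any particular training system.

```python
# Illustrative online diagnostic loop: score each scenario segment as it
# arrives, feed the score back, and flag a need for added guidance when a
# rolling average drops below an acceptable level (hypothetical values).
from collections import deque

ACCEPTABLE = 0.70       # minimum acceptable rolling score
WINDOW = 3              # number of recent segments to average

def run_scenario(segment_scores, transparent_feedback=True):
    recent = deque(maxlen=WINDOW)
    for i, score in enumerate(segment_scores, start=1):
        recent.append(score)
        rolling = sum(recent) / len(recent)
        if transparent_feedback:
            print(f"segment {i}: score {score:.2f}, rolling {rolling:.2f}")
        if rolling < ACCEPTABLE:
            # An intelligent tutor could instead branch the scenario here.
            print(f"  -> below {ACCEPTABLE:.0%}: add guidance or extra practice")

# Hypothetical segment scores for a night carrier-landing scenario.
run_scenario([0.82, 0.75, 0.55, 0.60, 0.78, 0.85])
```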
CONCLUSIONS

Training is a way of life in the military. In peacetime, military personnel spend nearly all of their time in training, getting ready. So why do we not evaluate the training we provide? Why do we not determine what has been learned? The science of training now provides some viable alternatives. Although much work remains to be done to create rigorous methodologies for training evaluation, progress has been made. Ours is a culture in which technology drives training, in that acquisition personnel buy simulators and assume that training is being conducted when people log time with the machines. Such a mind-set needs to end, so that training technologies can be folded into military systems in a way that ensures the training is accomplishing what is expected.
REFERENCES

Adams, J., & Hufford, L. (1962). Contributions of a part-task trainer to the learning and relearning of a time-shared flight maneuver. Human Factors, 4, 159–170.
Alliger, G., Tannenbaum, S., Bennett, W., Traver, H., & Shotland, A. (1997). A meta-analysis of the relations among training criteria. Personnel Psychology, 50, 341–358.
Blickensderfer, B., Cannon-Bowers, J., & Salas, E. (1997). Theoretical bases for team self-correction. Advances in Interdisciplinary Studies of Work Teams, 4, 249–279.
Caird, J. (1996). Persistent issues in the application of virtual environment systems to training. In Proceedings of the 3rd Annual Symposium on Human Interaction With Complex Systems (pp. 124–132). Dayton, OH: Institute of Electrical and Electronics Engineers.
Cannon-Bowers, J., Prince, C., Salas, E., Owens, J., Morgan, B. B., Jr., & Gonos, F. (1989). Determining aircrew coordination training effectiveness. In Proceedings of the 11th Interservice/Industry Training Systems Conference (pp. 128–136). Arlington, VA: American Defense Preparedness Association.
Cannon-Bowers, J., Rhodenizer, L., Salas, E., & Bowers, C. (1998). A framework for understanding pre-practice conditions and their impact on learning. Personnel Psychology, 51, 291–320.
Cannon-Bowers, J., Salas, E., & Milham, L. (2000). The transfer of team training: Propositions and guidelines. Advances in Developing Human Resources: Managing and Changing Learning Transfer Systems in Organizations, 8, 63–74.
Childs, J., & Spears, W. (1986). Flight skill decay and recurrent training. Perceptual & Motor Skills, 62, 235–242.
Dwyer, D., Fowlkes, J., Oser, R., & Salas, E. (1997). Team performance measurement in distributed environments: The TARGETS methodology. In M. T. Brannick & E. Salas (Eds.), Team performance assessment and measurement: Theory, methods, and applications (pp. 137–153). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Fowlkes, J., Dwyer, D., Milham, L., Burns, J., & Pierce, L. (1999). Team skills assessment: A test and evaluation component for emerging weapons systems. In Proceedings of the 1999 Interservice/Industry Training, Simulation, and Education Conference (pp. 994–1004). Arlington, VA: National Training Systems Association.
Fowlkes, J., Lane, N., Salas, E., Franz, T., & Oser, R. (1994). Improving the measurement of team performance: The TARGETS methodology. Military Psychology, 6, 47–61.
Fowlkes, J., & Milham, L. (2000). Team member perspective in defining situation awareness. Paper presented at the meeting of the American Psychological Society, Miami, FL.
Fowlkes, J., Milham, L., & Neville, K. (2000). Carrier air wing domain description. Orlando, FL: Naval Air Warfare Center, Training Systems Division.
Fowlkes, J., Salas, E., Baker, D., Cannon-Bowers, J., & Stout, R. (2000). The utility of event-based knowledge elicitation. Human Factors, 42, 24–35.
Glaser, R., & Chi, M. T. (1989). Overview. In M. T. Chi, R. Glaser, & M. Farr (Eds.), The nature of expertise (pp. xv–xxviii). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Haccoun, R., & Hamtiaux, T. (1994). Optimizing knowledge tests for inferring learning acquisition levels in single group training evaluation designs: The internal referencing strategy. Personnel Psychology, 47, 593–604.
Jentsch, F., & Bowers, C. (1998). Evidence for the validity of PC-based simulations in studying aircrew coordination. The International Journal of Aviation Psychology, 8, 243–260.
Johnson-Laird, P. (1983). Mental models. Cambridge, MA: Harvard University Press.
Kirkpatrick, D. (1976). Evaluation of training. In R. L. Craig (Ed.), Training and development handbook (2nd ed., pp. 301–319). New York: McGraw-Hill.
Kozak, J., Hancock, P., Arthur, E., & Chrysler, S. (1993). Transfer of training from virtual reality. Ergonomics, 36, 777–784.
Kraiger, K., Ford, K., & Salas, E. (1993). Application of cognitive, skill-based, and affective theories of learning outcomes to new methods of training evaluation. Journal of Applied Psychology, 78, 311–328.
Lyons, D., & Allen, B. (2000). Mobile aid for training and evaluation (MATE): A hand-held, configurable set of team performance measurement tools. In Proceedings of the 2000 Interservice/Industry Training, Simulation, and Education Conference (pp. 661–671). Arlington, VA: National Training Systems Association.
Marshall, S. (1995). Schemas in problem solving. Cambridge, England: Cambridge University Press.
Meltzer, A., Swezey, R., & Bergondy, M. (1994). Guidelines and recommendations for the development and implementation of recurrent aircrew coordination training (ReACT). Orlando, FL: Naval Air Warfare Center, Training Systems Division.
Mengelkoch, R., Adams, J., & Gainer, C. (1971). The forgetting of instrument flying skills. Human Factors, 13, 397–405.
Milham, L., Fowlkes, J., & Neville, K. (2000). Candidate training strategies for the facilitation of team skills in large, distributed teams. Orlando, FL: Naval Air Warfare Center, Training Systems Division.
Orlansky, J., Dahlman, C., Hammon, C., Metzko, J., Taylor, H., & Youngblut, C. (1994). The value of simulation for training (IDA Paper No. P–2982). Alexandria, VA: Institute for Defense Analyses.
Rouse, W., & Morris, N. (1986). On looking into the black box: Prospects and limits in the search for mental models. Psychological Bulletin, 100, 349–363.
Sackett, P., & Mullen, E. (1993). Beyond formal experimental design: Towards an expanded view of the training evaluation process. Personnel Psychology, 46, 613–627.
Salas, E., & Cannon-Bowers, J. (2001). The science of training: A decade of progress. Annual Review of Psychology, 52, 471–499.
Salas, E., Cannon-Bowers, J., Rhodenizer, L., & Bowers, C. (1999). Training in organizations: Myths, misconceptions, and mistaken assumptions. Research in Personnel and Human Resource Management, 17, 123–161.
Schendel, J., Shields, J., & Katz, M. (1978). Retention of motor skills: A review (Tech. Paper No. 313). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Schmidt, R., & Bjork, R. (1992). New conceptualizations of practice: Common principles in three paradigms suggest new concepts for training. Psychological Science, 3, 207–217.
Smith-Jentsch, K., Zeisig, R., Acton, B., & McPherson, J. (1998). Team dimensional training: A strategy for guided team self-correction. In J. Cannon-Bowers & E. Salas (Eds.), Making decisions under stress: Implications for individual and team training (pp. 271–297). Washington, DC: American Psychological Association.
Tannenbaum, S., Cannon-Bowers, J., Salas, E., & Mathieu, J. (1993). Factors that influence training effectiveness: A conceptual model and longitudinal analysis (Tech. Rep. No. 93-011). Orlando, FL: Naval Training Systems Center, Human Systems Integration Division.
Vreuls, D., & Obermayer, R. (1985). Human–system performance measurement in training simulators. Human Factors, 27, 241–250.
Weiss, H. (1990). Learning theory and industrial and organizational psychology. In M. D. Dunnette & L. M. Hough (Eds.), Handbook of industrial and organizational psychology (2nd ed., Vol. 1, pp. 171–221). Palo Alto, CA: Consulting Psychologists Press.
Yang, H., Sackett, P., & Arvey, R. (1996). Statistical power and cost in training evaluation: Some new considerations. Personnel Psychology, 49, 651–668.