COGNITIVE DYNAMICS OF DISTRIBUTED TEAMS PERFORMING DYNAMIC, COMPLEX TASKS

Jason H. Wong, Helen Wright & Lauren Ogren
Naval Undersea Warfare Center, Newport, RI
Not subject to U.S. copyright restrictions. DOI 10.1177/1541931214581496
Traditional team training is conducted by physically co-locating students who access a common system; the entire team must be brought to where the system is located. Training could be made more efficient by allowing students to access these systems remotely from their duty stations. However, the effects of conducting training in this distributed manner have not been investigated. This study identifies task and cognitive performance decrements in distributed teams. Teams of three people were trained on a complex, dynamic task and performed several scenarios. The key variable of interest was whether team members were seated next to each other (co-located) or in separate spaces (distributed). Task performance metrics were collected, along with measures of workload, communications, and team cohesion. Results demonstrate that all teams had similar mission outcomes, but distributed teams reported higher team workload, lower group integration, and increased communications. This suggests that the act of distributing a team will impact training, but these impacts may not all be negative. Improvements to distributed training environments should be developed to increase performance and reduce workload in distributed teams.

INTRODUCTION

The successful execution of a naval mission requires cohesive teamwork, and smaller teams (e.g., a Sonar supervisor and a group of Sonar technicians) have a high degree of collaboration and coordination. At the same time, a key principle in submarine crew training is that of forceful backup: operators are expected to push critical information up the chain of command, momentarily breaking protocol. Navy schoolhouses emphasize teamwork and team coordination in many classes, both explicitly in the curriculum and implicitly as students work together to solve problems and practice new concepts. However, the resources required to build and maintain training facilities, staff them with instructors, and bring students to the facilities are significant.

A possible mitigation of this issue is the creation of a virtual naval control room populated with live, shipboard displays. Students would then be able to log in remotely and virtually inhabit this environment. Such a system would make more efficient use of available training hardware by distributing it over the network, but it presents a possible problem: students who are not physically co-located in a laboratory may not form as strong and coordinated a team as they would in the same physical space (Cooke, Salas, Kiekel & Bell, 2004). This has the potential to impact training effectiveness (in terms of knowledge absorption, retention, and time to reach proficiency) and team dynamics. Team-level cognitive workload may increase, situational awareness may decrease, and cohesion may not be as strong. It is necessary to understand how these dynamics are affected when teams are not co-located in the physical world.

Early research viewed team cognition simply as the sum of individual mental processes and held that teamwork was almost always positive because workload could be divided among team members (Entin, Serfaty & Kerrigan, 1998). More recent thinking hypothesizes that teams must balance two processes: task work, the actual task that needs to be accomplished, and team work, which involves monitoring other members of the team and ensuring an efficient distribution of workload with no redundant effort (unless redundancy is desired) (Cooke, Salas, Kiekel, & Bell, 2004). Therefore, researchers cannot simply measure individual cognition (i.e., situational awareness or cognitive workload) and then draw sound conclusions about the overall team.

Workload has long been measured via self-report through the NASA Task Load Index (TLX; Hart & Staveland, 1988). The TLX examines six components of individual workload and has been validated in many studies (for a review, see Hart, 2003). Entin, Serfaty & Kerrigan (1998) sought to extend the TLX with five team factors: communication, monitoring, control, coordination, and leadership. This combined TLX can measure workload arising from both taskwork and teamwork.

Team cohesion and resilience is another critical team performance area. Within the submarine domain, resilience has been defined as "a team's capacity to recognize, deep within the command structure, developing danger and opportunity under ambiguous and uncertain conditions" (Smallridge et al., 2013). General measures of team cohesion have been developed, such as the Group Environment Questionnaire (Widmeyer, Brawley & Carron, 1985), a self-report measure examining Individual Attraction to the Group and Group Integration. The submarine community recently approved an observational method for subject matter experts, the Submarine Team Behaviors Tool (STBT), which uses a behaviorally-anchored rating scale to measure resilience (Smallridge et al., 2013).

A final important construct when examining teamwork is communications. Co-located team members have nonverbal methods of communication available, including looking at each other's displays, pointing and gesturing, and sensing affective states (e.g., frustration, boredom). Distributed team members are unable to take advantage of these cues, and additional communications may be required to compensate. Therefore, while it is not necessary to collect the exact content of every utterance, understanding the kinds of formal communications that take place (i.e., issuing orders, providing forceful backup) would serve to measure team performance (Driskell, Radtke & Salas, 2003).
Previous research on virtual teams has primarily explored the usability of the environment. In the medical community, Dev, Youngblood, Heinrichs & Kusumoto (2007) created a virtual emergency department, complete with simulated patients exhibiting realistic models of medical conditions. Heinrichs, Youngblood, Harter & Dev (2008) conducted exercises in this virtual world and demonstrated that participants rated the scenarios as realistic enough to "suspend disbelief" that they were using a virtual environment, and rated the software usability reasonably high. Team performance, however, was not measured. Similar findings were reported by Creutzfeldt, Hedman, Medin, Heinrichs & Fellander-Tsai (2010) with a virtual world dedicated to training cardiopulmonary resuscitation (CPR). Three-person CPR teams were trained, tested, and re-tested after six months. A survey examining self-efficacy and usability of the virtual world showed high marks at the time of training and six months later. Testing demonstrated that some knowledge was lost over the six-month period, but much was retained. One flaw of this study is that no real-world training condition was conducted as a baseline for learning and retention. Despite this, the authors conclude that the virtual world appears to be a suitable environment for team training.

The medical and military communities have only begun to examine team training in a virtual world. Military team training environments have been created (Surface, Dierdorff & Watson, 2007; Goldberg, 2010), but the participants have often been co-located in the same room while working on separate workstations. This study uses a complex, dynamic task to examine team and cognitive performance when team members are sitting next to each other (co-located) or sitting far apart (distributed).

PROCEDURE

Participants. A total of eighteen employees of the Naval Undersea Warfare Center voluntarily participated in the experiment, grouped into six teams of three (11 males, 7 females, mean age = 26, SE = 2.01). The experiment was conducted during regular working hours. For the task, each team had three positions: Captain, Helm/Communications, and Weapons/Science. A biographical questionnaire included self-report questions about each participant's knowledge of science fiction, experience with video games, and ease in a military environment on a 1-7 Likert scale. The Captain role was given to the participant with the highest combination of these factors, and the other two positions were assigned randomly.

Task and Stimuli. The Artemis Spaceship Bridge Simulator (version 1.7, http://artemis.eochu.com/) is designed as a cooperative game in which up to six players assume different roles on the bridge of a spaceship. This game was chosen based on the similarity of its roles to those found in a submarine environment. The basic game involves randomly generated space "terrain." The players' spaceship, named Artemis, is placed into a sector of space and tasked with protecting four friendly space stations (which also provide refueling and restocking of torpedoes) and destroying all enemy vessels.
These vessels range in strength (shields and armor), number (appearing singly or in groups), and capability (cloaking, launching fighters, etc.). Participants are asked to "minimize damage to your own ship and to be efficient when destroying enemies." The three team members are each assigned different roles (see Figure 1). The role of the Captain is to see the big picture of the entire space sector, develop and execute strategy, and coordinate the other two positions. The Captain has access to a map of the entire sector and can see environmental obstacles, enemy location and status, and a portion of ownship status. The Captain must issue orders to move the ship, fire weapons, scan enemies, or communicate with other entities.
Figure 1. Screenshots of the user interface for each role: Captain, Helm/Communications, and Weapons/Science.
The Helm/Communications role covers both ship navigator and communications officer. Communications is primarily a point-and-click task: choosing the comms target (e.g., an enemy or space station) and then the pre-defined message to send (e.g., an order to surrender, an order to attack an enemy, or a request for torpedo stock on hand). The Helm position is more involved, as this person is the only one who can drive the ship. Clicking a position around a 360-degree circle sets ship heading, and ship speed is set via sliders or keyboard shortcuts. Because nobody else can drive the ship, Helm must be responsive to orders from the Captain and the needs of the Weapons officer, and must remain aware of the environment.

The Weapons/Science role also has two parts. The Science officer is tasked with scanning enemies. Scanning identifies enemy ship type, shield strength, and vulnerabilities to weapons frequency. The Weapons position involves loading, managing, and firing both phasers (unlimited-quantity laser beams with a set recharge time) and four kinds of torpedoes with different capabilities (available in limited quantities). Coordinating with Helm (by communicating through the Captain) is critical, since torpedoes have a specific range and the ship must be facing the enemy within a certain boundary to fire weapons.

Apparatus. Three computers were networked together to run Artemis. An Alienware desktop computer ran the Artemis server and an instance of the Artemis client; the server was minimized, and the Captain used the client on this computer. Two Dell laptops running the Artemis client were also used: one for the Helm/Communications position and the other for the Weapons/Science position.
For distributed teams, all participants wore noise-dampening headsets with microphones and communicated using the voice-over-IP software Mumble (http://mumble.sourceforge.net).

Metrics and Measures. The Artemis simulator provides a number of performance metrics at the end of every scenario, and these numbers were used to evaluate mission outcome and process metrics (e.g., number of torpedoes fired, number of enemy weapon hits, mission duration, etc.).

Measures of cognitive workload were gathered at the end of every scenario. The NASA Task Load Index (TLX; Hart & Staveland, 1988) is designed to measure individual workload, and five additional questions were added to examine team workload (i.e., the demands of communication, monitoring, control, coordination, and leadership). These additional factors (Entin, Serfaty & Kerrigan, 1998) are designed to capture team-specific processes and are scored and weighted similarly to the traditional measures of individual workload.

Communications were recorded by an audio recorder and also categorized in real time using software developed in-house known as the Common Observation Recording Tool (CORT). CORT is an electronic notebook designed for collecting observational data. Instead of transcribing event audio, formal utterances were binned into pre-defined categories and tagged with sender and receiver identities. These bins are based on naval litany and include: Acknowledge/Repeat-back, Order, Express Intent, Push Information, Recommend, Request, and Discussion (for less formal conversation).

The Group Environment Questionnaire (GEQ; Widmeyer, Brawley & Carron, 1985) was originally designed to measure cohesion in sports teams but was adapted as a nine-question survey focusing on two factors: Individual Attraction to the Group and Group Integration. These areas are the best fit for the teams created for this study, especially since physical proximity has been shown to increase feelings of team cohesiveness (Cooke, Salas, Kiekel & Bell, 2004).

The Submarine Team Behaviors Tool (STBT; Steed, 2012) was designed as an observer-based evaluation of submarine team behavior. The STBT provides behavioral anchors across the performance spectrum (from un-stressed to advanced resilience). A subset of STBT measures from the major components (Problem Solving Capacity, Critical Thinking, Decision Making, Bench Strength, and Dialogue) was used, because some measures did not apply to this environment and others were not observable across all six teams.

Design and Procedure. Teams were randomly assigned to one of two between-teams conditions: co-located (participants shared a space, enabling free communications and screen sharing) or distributed (participants were physically separated, requiring push-to-talk communications with no screen sharing). Participants were assigned to one of three roles: Captain, Helm/Communications, or Weapons/Science. The experiment was conducted over two days in 2-3 hour sessions. On the first day, team members filled out a biographical questionnaire and received an hour of training on the Artemis user interface and the responsibilities of each position. The desired communications litany was also trained and practiced during this first hour. The litany was patterned after standard submarine litany but adapted to be significantly smaller, simpler, and compatible with the game mechanics.
The Artemis litany included rules such as spelling out numbers ("zero-three-zero" instead of "thirty"), identifying the sender and recipient of each communication ("Captain, this is Helm. Coming to course one-eight-zero."), and enforcing the use of repeat-backs to ensure each order was correctly understood. Litany was enforced through practice and comments during the debriefing after each scenario. Litany was heavily emphasized in this study because of its importance in the military setting: there is a huge amount of information in any operations center (e.g., a submarine attack center), and students must always focus on sending and receiving information in a clear, concise manner.

After this period of training, two scenarios were completed on the first day of the experiment. Co-located teams sat next to each other at the same desk, with all three displays side by side. Distributed teams were seated in different corners of a large room and wore headsets with integrated microphones that blocked out external noise. All communication was accomplished by pressing "~" and then speaking. This push-to-talk requirement mirrors the radio equipment that warfighters typically use. The first scenario was designed as practice for the participants, with four groups of weak enemies appearing in sequence. The second scenario involved randomly generated terrain (including an asteroid field, a mine field, and a black hole) and randomly generated groups of enemies (of various strengths, numbers, and capabilities). The second scenario was set to a difficulty level of 1 (out of a possible 10).

On the second day of the experiment, teams practiced litany using a pre-written script and then conducted two more scenarios, of difficulty 2 and 3, respectively. Terrain complexity was held constant, so overall scenario difficulty was defined by sensor strength (how far sensors could see through the environment) and enemy strength, number, and capabilities. Overall, difficulty increased across the four scenarios each team conducted. After each scenario, participants filled out the individual and team NASA TLX survey and the adapted GEQ. Then a debriefing was conducted with experimenters and participants; scenario performance, communications, and strategies were discussed and recorded for later analysis.

RESULTS

There are three major variables of interest in this study: the between-teams variable of co-located versus distributed team members, the between-subjects variable of role (Captain, Helm/Communications, Weapons/Science), and the within-teams variable of scenario difficulty (Scenarios 1-4).

Task Performance. Mission execution data can be divided into outcome metrics (win percentage, base survival, and enemies destroyed) and process metrics (weapons fired and hits taken). These five metrics were summed across scenarios, and each process metric was normalized for scenario length (i.e., hits taken per minute). Across these five measures, no significant differences were found between co-located and distributed teams using independent-samples t-tests (means can be found in Table 1; greatest t(4) = 1.79, lowest p = 0.15).
This indicates that teams were equally successful at performing and completing the mission.

              Win %   % bases survived   % enemies destroyed   Weapons/min   Hits/min
Co-located    0.75    0.98               0.94                  1.89          2.16
Distributed   0.83    0.90               0.99                  1.61          2.03

Table 1. Outcome and process performance metrics.
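To make the team-level analysis concrete, the following is a minimal sketch of an independent-samples t-test on one summed metric, not the study's actual analysis code. With three teams per condition, the degrees of freedom are 3 + 3 - 2 = 4, matching the reported t(4); the per-team values below are illustrative, not the study's data.

    # Illustrative per-team values for one outcome metric (not real data)
    from scipy import stats

    co_located  = [0.70, 0.75, 0.80]   # e.g., win proportion for each of 3 teams
    distributed = [0.78, 0.83, 0.88]

    # Independent-samples t-test; df = n1 + n2 - 2 = 4
    t, p = stats.ttest_ind(co_located, distributed)
    print(f"t(4) = {t:.2f}, p = {p:.2f}")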
Workload. Individual and team workload were measured using factors of the NASA TLX. Numerical weights for each factor were gathered at the start of the study and used to weight the workload ratings. Individual workload for co-located and distributed teams was similar for Scenarios 1-3 but differed significantly during the difficult Scenario 4 (* = p < 0.05). When data from the Captain are separated from the Helm and Weapons positions (averaged into a single data point for each scenario), this difference is starker (see Figure 2).
Figure 2. Individual workload scores across scenarios, split up by Captain versus Helm and Weapons.
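As a minimal sketch of the scoring described above, assuming the standard TLX weighted-average procedure (each subscale rating is multiplied by the weight elicited at the start of the study, and the weighted sum is divided by the total weight); the factor names match the five team factors, but the ratings and weights are illustrative:

    # Weighted TLX-style score: sum(weight_i * rating_i) / sum(weights)
    def weighted_workload(ratings, weights):
        total_weight = sum(weights[factor] for factor in ratings)
        weighted_sum = sum(weights[factor] * ratings[factor] for factor in ratings)
        return weighted_sum / total_weight

    # Illustrative team-workload ratings (0-100) and elicited weights
    ratings = {"communication": 70, "monitoring": 55, "control": 40,
               "coordination": 65, "leadership": 30}
    weights = {"communication": 4, "monitoring": 2, "control": 1,
               "coordination": 3, "leadership": 0}
    print(weighted_workload(ratings, weights))  # -> 62.5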
Team workload showed a trend-level difference between co-located and distributed teams across all scenarios (‡ = p < 0.10), along with differences between the Captain and the Helm/Weapons positions (Figure 3). Combined, these workload data demonstrate that self-reported workload is greater for distributed teams: team-level demands such as communication and monitoring are much greater, and individual taskwork generates more workload when the scenario is very difficult.

Figure 3. Team workload scores across scenarios, split up by Captain versus Helm and Weapons.

Team Cohesion. The Group Environment Questionnaire used in this experiment measured two factors: individual attraction to the group and group integration. Examining self-report scores between conditions and across scenarios revealed no significant differences in group attraction but significant differences in group integration (Figure 4). Group integration questions specifically asked team members about the "togetherness" they felt as a group, and distributed teams felt less cohesive than co-located teams.

Figure 4. Scores on the GEQ for group integration, indicating that cohesion was lower for distributed teams.

Submarine Team Behaviors Tool. Resilience measures adapted for Artemis included: Managing Complexity, Summary/Intent, Tripwires, Information Push up the Chain, Delegation down the Chain, Command Presence, Performance Feedback, Formality, and Discussions. Each measure was graded on a 0-7 Likert scale (higher is better; no team in this experiment scored higher than 5 on any measure) based on the number of behaviors observed for each measure. For this study, resilience was measured as growth, so a delta score was obtained for each team by subtracting its minimum score from its maximum across the four scenarios. These deltas were averaged across teams to produce a Resilience Growth score for co-located (M = 2.09, SE = 0.18) and distributed (M = 3.00, SE = 0.25) teams (Figure 5). On every resilience measure but one (Discussions), distributed teams showed more growth than co-located teams, and there was a significant difference in the average Resilience Growth scores (t(4) = 2.96, p < 0.05).

Figure 5. Average improvement in Resilience measures: co-located vs. distributed.
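The Resilience Growth computation lends itself to a short sketch: for each team, the minimum score on a measure across the four scenarios is subtracted from the maximum, and the deltas are averaged within each condition. The scores below are illustrative, not the study's data.

    # STBT scores (0-7) on one measure across Scenarios 1-4, per team (invented)
    scores_by_team = {
        "team_A": [1, 2, 3, 4],
        "team_B": [2, 2, 4, 5],
        "team_C": [1, 3, 3, 4],
    }

    def resilience_growth(scores_by_team):
        # Delta per team = max - min across scenarios; average over teams
        deltas = [max(s) - min(s) for s in scores_by_team.values()]
        return sum(deltas) / len(deltas)

    print(resilience_growth(scores_by_team))  # -> 3.0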
Communications. Analyses were performed for each communications category across scenarios and between groups. Results showed no significant differences between groups. While there were identifiable patterns, such as a greater overall number of communications (see Figure 6) and a greater number of orders given in distributed teams, none reached statistical significance. This is possibly because some teams had more difficulty with litany, and each team struck its own balance between formality and informality. Table 2 includes means across all categories of utterances.
Figure 6. Total utterances per minute across scenarios and conditions. There is a pattern of greater communications in the distributed condition.

              Order   Acknowledge   Intent   Push   Recommend   Discussion   Overall
Co-located    3.20    2.70          0.33     0.94   0.40        0.32         7.96
Distributed   3.63    2.94          0.19     1.19   0.52        0.30         9.73

Table 2. Number of communications per minute, averaged across scenarios.
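As an illustration of how the per-minute rates in Table 2 can be derived from CORT-style observation records, here is a sketch under the assumption that each record carries a category bin plus sender and receiver tags, as described under Metrics and Measures; the records and duration below are invented.

    from collections import Counter

    # Hypothetical CORT records: binned utterances with sender/receiver tags
    records = [
        {"category": "Order",       "sender": "Captain", "receiver": "Helm"},
        {"category": "Acknowledge", "sender": "Helm",    "receiver": "Captain"},
        {"category": "Push",        "sender": "Weapons", "receiver": "Captain"},
        {"category": "Order",       "sender": "Captain", "receiver": "Weapons"},
    ]
    scenario_minutes = 2.0  # mission duration used for normalization

    # Utterances per minute for each category
    counts = Counter(r["category"] for r in records)
    rates = {cat: n / scenario_minutes for cat, n in counts.items()}
    print(rates)  # {'Order': 1.0, 'Acknowledge': 0.5, 'Push': 0.5}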
DISCUSSION

This set of findings reveals several important aspects of distributed team performance. All teams were equally able to complete their set of missions. This is an important finding, but it is likely task-dependent: this was a fast-paced video game, while many military environments (e.g., submarine combat systems) are slower-paced and require different skills to operate successfully. Nonetheless, the lack of difference is a relevant and encouraging finding.

In terms of cognitive performance, the largest differences were found in team-level constructs. Individual task workload did not vary greatly, except when the scenario was very difficult; in that case, distributed team members reported greater individual workload. At the team level, however, distributed teams reported less cohesion and greater difficulty with coordination, communication, and other team-level skills. This combination of results demonstrates that distributed teams can still perform the mission, but at the cost of increased perceived workload and lower group cohesion.

Observer-based measures paint a different picture than the self-report measures, however. Results from the STBT demonstrate that while distributed teams reported less cohesion and greater team workload, they showed an increase in observed resilient behaviors, often relating to coordination, communication, and cohesion elements. Communications showed a pattern of increased utterances by distributed teams, though these results did not reach statistical significance. This pattern did not hold for the Intent category, in which Captains communicate intent to their subordinates; the resilience data, however, demonstrate that distributed teams increased their Communicating Intent behaviors across scenarios more than co-located teams did. In all other categories, increased communications were seen, but this may not be negative: litany is very difficult for new submariners to learn, and the distributed environment emphasizes communications and allows for more litany practice.
Together, this pattern of data suggests that training can be conducted in a distributed fashion, but students, instructors, and curriculum developers must be aware of the increased cognitive load and reduced group cohesion. However, resilient behaviors are likely to improve when these added workload and cohesion challenges are applied in proportion to the team's skill level. Overall, there are potential benefits to distributed training, but student performance outcomes must be closely examined. Potential mitigations should be explored to improve distributed team performance. These mitigations can be drawn from research demonstrating increased presence and performance when certain capabilities are integrated (Dede, 1996), or from improvements to user interfaces that allow for collaboration at a distance (e.g., larger displays that allow neighboring screens to be viewed). Future research will implement these capabilities in a distributed training environment and determine whether cognitive performance can be raised to the level of traditional, co-located training.

REFERENCES

Cooke, N. J., Salas, E., Kiekel, P. A., & Bell, B. (2004). Team cognition: Understanding the factors that drive process and performance. American Psychological Association.

Creutzfeldt, J., Hedman, L., Medin, C., Heinrichs, W. L. R., & Felländer-Tsai, L. (2010). Exploring virtual worlds for scenario-based repeated team training of cardiopulmonary resuscitation in medical students. Journal of Medical Internet Research, 12(3).

Dede, C. (1996). The evolution of distance education: Emerging technologies and distributed learning. American Journal of Distance Education, 10(2), 4-36.

Dev, P., Youngblood, P., Heinrichs, W. L. R., & Kusumoto, L. (2007). Virtual worlds and team training. Anesthesiology Clinics, 25(2), 321-336.

Driskell, J. E., Radtke, P. H., & Salas, E. (2003). Virtual teams: Effects of technological mediation on team performance. Group Dynamics: Theory, Research, and Practice, 7(4), 297.

Entin, E., Serfaty, D., & Kerrigan, C. (1998). Choice and performance under three command and control architectures. Paper presented at the 1998 Command and Control Research and Technology Symposium, Monterey, CA.

Hart, S. G. (2003). NASA-Task Load Index (NASA-TLX); 20 years later. Paper presented at the 47th Annual Meeting of the Human Factors and Ergonomics Society, Denver, CO.

Hart, S. G., & Staveland, L. E. (1988). Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In Human mental workload (pp. 139-183).

Heinrichs, W., Youngblood, P., Harter, P., & Dev, P. (2008). Simulation for team training and assessment: Case studies of online training with virtual worlds. World Journal of Surgery, 32(2), 161-170.

Smallidge, T., Jones, E., Lamb, J., Feyre, R., Steed, R., & Caras, A. (2013). Modeling complex tactical team dynamics in observed submarine operations. In Foundations of Augmented Cognition (pp. 189-198). Springer Berlin Heidelberg.

Surface, E. A., Dierdorff, E. C., & Watson, A. M. (2007). Special Operations language training software measurement of effectiveness study: Tactical Iraqi study final report.

Widmeyer, W. N., Brawley, L. R., & Carron, A. V. (1985). The measurement of cohesion in sport teams: The Group Environment Questionnaire. Sports Dynamics.