VALIDATION OF AN AIR-TRAFFIC CONTROLLER BEHAVIORAL MODEL FOR FAST TIME SIMULATION
Přemysl Volf, Jan Jakubův, Lukáš Koranda, David Šišlák, Michal Pěchouček, Czech Technical University, Faculty of Electrical Engineering, Department of Computer Science, Agent Technology Center, Prague, Czech Republic
Stefania Mereu, Brian Hilburn, TASC, Inc., Chantilly, VA
Duc N. Nguyen, Drexel University, Philadelphia, PA
Abstract
The US National Airspace System (NAS) is incredibly complex, and consists of many specific functions. Given predicted increases in air traffic, enhancement of the current system and development of the NextGen system are critical to maintain safe and efficient operation. Each feature of the system needs to be carefully designed, developed, tested and validated. To this end, human-in-the-loop (HITL) simulations represent one of the most powerful and realistic testing tools. HITL simulations can provide valuable feedback on how new features influence the behavior of human operators. The drawbacks of HITL simulations include limited flexibility and scalability, and high cost. Computer simulations, which involve no direct human activity, can avoid some of these potential problems, and often represent an attractive alternative (or adjunct) to HITL simulation. A crucial question underlying the use of computer models is how well the model captures the human operator (in this case, the air traffic controller). This paper presents a validation of the AgentFly system, specifically a human en-route Air Traffic Controller (ATC) behavioral model. The ATC workload model is based on Multiple Resource Theory including visual scanning, radio emulation and different kinds of uncertainty. Validation of the AgentFly system was performed by comparing model output to HITL simulation data. The AgentFly system used simulated behavior of air traffic controllers and pilots to collect similar data. Both types of output data were processed and compared based on selected metrics.
978-1-4799-4891-8/14/$31.00 ©2014 IEEE
Introduction
Increased complexity of the National Airspace System (NAS) [1] requires the development of new
air traffic management (ATM) tools, to ensure safe and efficient flight operations. New features must be tested before they can be deployed and used by air traffic controllers. Models and simulations are often used to test such tools, particularly in the earliest stages of concept development.
Human-in-the-loop (HITL) simulation is a reasonably high-fidelity model of real-world scenarios. HITL data are often thought to provide the strongest level of reliability in testing the interaction between humans and machines, but there are obvious drawbacks to such research. Human participants tend to be relatively expensive, difficult to recruit, and prone to fatigue. In addition, HITL simulations have limited flexibility and scalability, and they are limited by the capacity of the testing facility.
Fast-time simulation avoids several of the potential drawbacks of HITL methods. It enables flexible and rapid prototyping of new functionalities and features in current ATM systems. It can be used for repeated, simultaneous executions of simulations with a wide range of configurations prior to live trials. Moreover, fast-time simulation lends itself to large-scale simulations (such as those involving multi-sector, multi-agent scenarios).
The AgentFly system, currently under development at the Czech Technical University (CTU), is an agent-based fast-time simulation that supports modeling of an air traffic controller’s behavior. The AgentFly system was validated against data collected in a HITL simulation to evaluate how accurately it simulates the behavior of controller agents, especially under increased traffic scenarios.
The FAA provided input and output data from a HITL simulation, and provided subject matter expert opinion on the correct parameterization of AgentFly. Drexel University performed a set of experiments
running the AgentFly system under the same conditions as the HITL (sector definition, traffic data, policies, etc.) and generated the output data. TASC then processed and transformed the output data, generated appropriate metrics and compared them to the results of the HITL experiment.
AgentFly ATC Behavior Model
AgentFly System
AgentFly [2] is a multi-agent NAS-wide simulator that supports distributed, scalable, fast-time simulation with a highly modular architecture. AgentFly is designed to model both flights (aircraft performance and pilot) and the air traffic controller (simplified to include only ATC agent behavior and, partially, the En Route Automation Modernization (ERAM) system).
Each simulated aircraft has its own pilot agent. An airplane’s characteristics are based on performance models from EUROCONTROL’s Base of Aircraft Data (BADA) [3]. The pilot agent operates its radio and interacts with ATM services through the sector radio communication channel. The pilot agent is responsible for confirming and implementing the ATM control clearances it receives.
The ATC agent represents the human controller providing ATM services in the sector for which it is fully responsible. ATC agents emulate possible interactions with available ATM tools and provide control to pilot agents via simulated sector radio links. Human controllers are emulated by means of the Visual, Cognitive, Auditory and Psychomotor (VCAP) workload model with limited resources [4]. Details of the model are provided in the following section.
AgentFly combines two simulation approaches: time-stepped and event-driven. The time-stepped simulation advances time by predefined, equal-sized time steps. The new states of the simulation are computed after each time step, and each round of simulation begins with sensor computation and ends with agent actions. The time-stepped approach is used for the simulation of the environment, movement of airplanes, weather, etc. The rest of the simulation is based on an event-driven approach. Each event is scheduled for a certain moment in time. Events scheduled for the same moment are processed based on their mutual priorities. This combination makes the whole simulation fully repeatable as long as the configuration is unchanged. Controlled randomness is introduced through an explicitly set random seed.
Human Behavior Model
AgentFly’s ATC agent emulates controller operations, including a model of the controller’s perceived workload. The workload model is based on Multiple Resource Theory (MRT) [5]. MRT proposes that the human operator has several different pools of resources that can be tapped simultaneously. MRT views performance decrement as a shortage of these different resources and describes humans as having limited capability for processing information. AgentFly uses a simplified version of MRT. Specifically, the integrated workload model uses four processing resources: visual, cognitive, auditory and psychomotor. The visual and auditory components in the model reflect the external stimuli that human controllers need to attend to. The cognitive component describes the level of information processing required to perform a specific controller task. The psychomotor component describes the physical actions necessary to complete each task.
The human controller’s duties are modeled as procedures composed of actions. Actions are organized into dependency chains and procedures. The procedures branch actions into several chains, each to be executed under given circumstances. Each action specifies which components of the VCAP model it requires, together with its duration, workload and priority. An action can be performed if its predecessor(s) are completed and the respective VCAP components are available at that moment. When two or more actions are ready for execution at the same time, the action with the higher priority is selected. Actions that are ready but not selected are automatically postponed until they can be processed. The duration of each action can be fixed or drawn from a probabilistic model. The workload for each action is computed as the sum of a static and a dynamic part. The static workload is assigned to the action based on the configuration. The dynamic workload is computed based on the current state of the simulation and increases with the number of conflicts. A conflict occurs when an action is being processed and another action is scheduled during this processing.
AgentFly emulates controller interactions via a simulated radar display system based on ERAM (see Figure 1). Visual stimuli and psychomotor actions are sensor inputs of the controller model and are connected to the ERAM model. Because of this limited interaction, controllers do not have access to the internal states and plans of other
components in the system. For tasks that work with airplane flight trajectories (e.g., handoff, conflict detection, conflict resolution), the ATC agent builds an internal flight information model for each flight. The internal model is updated based on the processed external stimuli and on the ATM control actions taken and planned. The internal flight model also integrates controller predictions and uncertainty.
Figure 1. The ERAM Screen within AgentFly
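The action-selection rules described in the Human Behavior Model section can be illustrated with a minimal sketch. It is not AgentFly's implementation: the `Action` fields, the `simulate` function, the `conflict_weight` parameter and the three example actions are all hypothetical, durations are fixed rather than probabilistic, and a "conflict" is approximated as a ready action that has to be postponed because a required VCAP resource is busy.

```python
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    resources: frozenset        # subset of {"V", "C", "A", "P"} (VCAP channels)
    duration: float             # seconds (fixed here; could be sampled instead)
    priority: int               # larger value = selected first
    static_workload: float      # configured workload of the action
    predecessors: tuple = ()    # names of actions that must finish first

def simulate(actions, conflict_weight=0.5):
    """Start ready actions by priority whenever their VCAP resources are free."""
    pending = list(actions)
    finish_time = {}            # action name -> time at which it completes
    running = []                # (finish time, action) pairs
    log = []                    # (start time, action name, workload)
    now = 0.0
    while pending:
        busy = set()
        for _, act in running:
            busy |= act.resources
        # An action is ready once all of its predecessors have completed.
        ready = [a for a in pending
                 if all(finish_time.get(p, float("inf")) <= now
                        for p in a.predecessors)]
        ready.sort(key=lambda a: a.priority, reverse=True)
        started = []
        for act in ready:
            if act.resources & busy:
                continue        # postponed until the resources are released
            busy |= act.resources
            started.append(act)
        postponed = len(ready) - len(started)
        for act in started:
            # Workload = static part + dynamic part growing with conflicts.
            workload = act.static_workload + conflict_weight * postponed
            finish_time[act.name] = now + act.duration
            running.append((now + act.duration, act))
            pending.remove(act)
            log.append((now, act.name, workload))
        if not running:
            raise ValueError("remaining actions have unsatisfiable predecessors")
        now = min(end for end, _ in running)            # jump to next completion
        running = [(end, a) for end, a in running if end > now]
    return log

# Hypothetical actions, for illustration only.
actions = [
    Action("detect_conflict", frozenset({"V", "C"}), 4.0, 3, 3.0),
    Action("scan_display", frozenset({"V"}), 2.0, 1, 1.0),
    Action("issue_clearance", frozenset({"A", "P"}), 3.0, 2, 2.0,
           predecessors=("detect_conflict",)),
]
log = simulate(actions)
for start, name, workload in log:
    print(f"t={start:4.1f}s  {name:16s} workload={workload}")
```

In this sketch the low-priority display scan is postponed while conflict detection occupies the visual channel, which raises the workload recorded for the conflict-detection action; once that action completes, the clearance and the scan run in parallel because they use disjoint resources.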
Validation Methodology
To validate the accuracy of the AgentFly en-route controller, AgentFly simulation output was compared to the results obtained in a human-in-the-loop (HITL) simulation of the same scenario. The performance of AgentFly was evaluated over the same scenarios and conditions used in the Separation Management II (SepMan2) study [6], conducted at the FAA’s Research Development and Human Factors Lab (RDHFL) with the aim of evaluating proposed modifications to the En Route Automation Modernization (ERAM) system. The study was selected because it tested many features implemented by AgentFly. The dataset included the simulation specifics and human performance data for a total of ten participants, tested individually. Six of these participants were tested in one flight scenario (scenario B) and four in an alternative flight scenario (scenario D). AgentFly was tested in both scenarios B and D using the same input data (flights, policies).
AgentFly was configured (in terms of initial parameterization) in cooperation with air traffic controller Subject Matter Experts (SMEs). We obtained workload ratings for each of the VCAP modalities through an online questionnaire that we compiled. Before giving the ratings, two SMEs were briefed over the phone, and questions were answered as they arose. The questionnaire included 27 questions, one for each of the parameters associated with a specific ATC action within AgentFly. SMEs were asked to indicate, on a scale of 1 to 7, the effort required in each modality for each given task. Scores were averaged and used as needed by CTU to replace missing values in the system.
This section summarizes the validation methodology, the gathered metrics and the data analysis approach. A full description of the methodology and results is given in the final report [7].
Data Sources
The team used human performance data from the SepMan2 HITL study, together with SME input on controller behavior, to compare with output from AgentFly. Participants in the SepMan2 study completed a 45-minute, high-traffic flight scenario. Data were recorded from various communication sources within the simulation. The same data were generated from AgentFly’s output, so that they could be compared with those generated from the HITL output.
Several behavioral measures were collected as part of the SepMan2 experimental protocol. The rich dataset consisted of several files relevant to the validation effort, including but not limited to workload, aircraft count, handoffs and the complete set of actions performed by the participants on the aircraft included in the scenario. Of great importance for the evaluation were the measures obtained using the Air Traffic Workload Input Technique (ATWIT), which required participants to press one of seven buttons to indicate instantaneous workload during the simulation. ATWIT workload data were collected every two minutes.
Metrics
Several measurement issues must be considered in selecting metrics. These include whether a metric is reliable (repeatable), valid (meaningful), and in fact even measurable. The measured metrics were selected from the list of metrics created by the FAA and EUROCONTROL, including their collaborative work in the Action Plan 9 (AP9) effort to agree on a common set of ATM research metrics. Given the ambiguity and overlap among metrics lists, we used the following preliminary criteria for identifying candidate metrics: 1) Must be focused on en route, as opposed to terminal or ground; 2) Must be a metric or method from which a specific measurement procedure and measurement units could be inferred; and 3) Must be non-redundant in terms of measurement units (e.g., aircraft per hour, per day, and per 15 minutes were all seen as the same metric: aircraft per unit time).
The final set included two main types of metrics: human performance and system performance. The human performance constructs include workload, taskload, coordination, effectiveness and efficiency. System performance constructs include sector capacity, safety, and cost.
Data Analysis Approach
The richness of the available dataset allowed us to perform several analyses, which included subjective evaluation of the output, correlation between the outputs (when possible) and testing of the difference between means (e.g., t-test and ANOVA). Bayesian statistical methods were used where appropriate. Accordingly, when a hypothesis test did not allow us to reject the null hypothesis (that the difference between AgentFly’s output and the HITL data equals zero), we used Bayesian estimation to assess the credibility of the null hypothesis. From the Bayesian analyses, we report the 95% highest density interval (HDI), which indicates the interval of most credible hypotheses regarding the difference between the two simulations [8]. An HDI distributed around zero indicates that no difference between the simulations is a highly credible hypothesis.
It is necessary to make some assumptions when studying such complex phenomena. First, as in any simulation, some simplification is required (certainly in the area of human behavior). Second, some assumptions were made in order to be able to compare different sets of data, some of which were not necessarily collected with the purpose of being compared to human behavior.
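As a rough illustration of how an HDI is read, the sketch below computes the narrowest interval that contains a given share of posterior samples and checks whether zero lies inside it. It is only a sketch under stated assumptions: the `hdi` helper is hypothetical, the samples are synthetic draws standing in for a posterior over the AgentFly minus HITL difference (they are not study data), and the Bayesian estimation step that would produce real posterior samples is not shown.

```python
import math
import random

def hdi(samples, mass=0.95):
    """Narrowest interval containing at least `mass` of the samples."""
    s = sorted(samples)
    n = len(s)
    k = max(1, math.ceil(mass * n))          # number of points inside the interval
    # Slide a window of k consecutive sorted points and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[best], s[best + k - 1]

# Synthetic stand-in for posterior samples of the difference between means.
random.seed(0)
diff_samples = [random.gauss(0.1, 0.5) for _ in range(5000)]

lo, hi = hdi(diff_samples)
print(f"95% HDI: [{lo:.2f}, {hi:.2f}]")
print("zero is among the most credible values:", lo < 0 < hi)
```

For a unimodal posterior this sample-based interval approximates the highest density interval; an interval that comfortably spans zero corresponds to the case described above, in which "no difference" remains a credible hypothesis.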
Results
The following section presents results from a variety of gathered metrics. The full validation study focused on results from two separate scenarios representing slightly different air traffic in the same sector. This paper presents results from a single scenario, because results in both scenarios are very similar. The results are categorized by human performance and system performance.
Workload
AgentFly workload output was compared to the workload ratings of SepMan2 HITL participants. Ratings were averaged across runs and subjects, and compared separately for each scenario. Because HITL and AgentFly data used different scales, the outputs were standardized as z-scores to enable comparison. Results showed a significant correlation between AgentFly and participant ratings over time (r = 0.81; p < .001; Figure 2).
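The standardization and correlation steps can be sketched as follows. The two series are made-up placeholders (a 1 to 7 rating scale and an arbitrary model scale), not values from the study, and the helper names are hypothetical.

```python
import math
from statistics import mean, stdev

def z_scores(xs):
    """Standardize a series to zero mean and unit (sample) standard deviation."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equally long series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Placeholder series sampled at the same time points (illustrative only).
hitl_rating = [2, 3, 3, 4, 5, 6]              # e.g. averaged 1-7 ratings
agentfly_workload = [10, 14, 15, 22, 24, 31]  # model output on its own scale

z_hitl = z_scores(hitl_rating)
z_agentfly = z_scores(agentfly_workload)
r = pearson_r(z_hitl, z_agentfly)
print(f"r = {r:.3f}")
```

Pearson's r is unchanged by standardization, so the z-scores matter mainly for plotting both series on a common axis, as in Figure 2.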
Figure 2. Workload as Subjectively Perceived by Participants (HITL) and as Predicted by AgentFly of the Simulation
Simplified Dynamic Density
To estimate a more objective workload rating, the validation effort gathered and measured the Simplified Dynamic Density (SDD) metric as developed by [9]. SDD is a simplified version of Dynamic Density (DD) [10], which has been under development for over 15 years as an accurate and robust indicator of sector traffic complexity. In its original form, DD was defined as “a measure of control related workload that is a function of the number of aircraft and the complexity of traffic patterns in a volume of airspace”. DD, however, has been criticized as unwieldy. SDD was computed as the weighted sum of the following seven parameters, in order of weighting: traffic density, occupancy counts, sector boundary crossing, proximities (4 levels), altitude changes, heading variance, and speed variance. Each parameter was sampled every 300 seconds of the simulation for both AgentFly and SepMan2 in the scenarios used for this validation. We then computed the correlation between the two simulations (i.e., AgentFly and HITL). Results (Figure 3) show that the SDD values obtained in AgentFly correlated highly with those obtained in SepMan2 (rs = 0.98; p < 0.001).
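A minimal sketch of the SDD computation and the rank correlation follows. The weights are hypothetical placeholders chosen only to respect the stated order of weighting (the actual weights come from [9] and are not reproduced here), the four proximity levels are collapsed into a single term for brevity, and the two SDD series are invented for illustration.

```python
import math

# Hypothetical weights in the stated order of weighting; not the values from [9].
WEIGHTS = {
    "traffic_density": 0.30,
    "occupancy_count": 0.20,
    "boundary_crossings": 0.15,
    "proximities": 0.15,        # four proximity levels collapsed into one term
    "altitude_changes": 0.10,
    "heading_variance": 0.05,
    "speed_variance": 0.05,
}

def sdd(sample):
    """Weighted sum of the seven parameters for one 300-second sample."""
    return sum(WEIGHTS[name] * sample[name] for name in WEIGHTS)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def pearson_r(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman_rs(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Invented SDD series, one value per 300-second sample of a 45-minute run.
sdd_hitl = [2.1, 2.8, 3.0, 3.9, 4.4, 5.2, 5.1, 6.0, 6.6]
sdd_agentfly = [2.0, 2.9, 3.3, 3.7, 4.6, 5.0, 5.4, 6.1, 6.4]
print(f"rs = {spearman_rs(sdd_hitl, sdd_agentfly):.3f}")
```

Using ranks rather than raw values makes the comparison insensitive to any monotone rescaling between the two SDD series.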
The overall average traffic count (i.e., number of aircraft simultaneously both within sector and under control) was 16.75 and 17.55 for HITL and AgentFly, respectively. Over session time, AgentFly was also able to closely match the HITL traffic. As shown in Figure 5, AgentFly-derived traffic count generally fell within the range of HITL values (note the dashed minimum and maximum lines around the HITL average).
Figure 3. SDD: AgentFly vs. HITL
Number of Aircraft under Control
Traffic load is known to be one of the prime drivers of task load (and workload) in ATC. All else being equal, the more aircraft under control, the higher the task load. Given the fundamental role of traffic level, we were therefore interested in first confirming that AgentFly could reasonably reproduce the traffic levels experienced under SepMan2. For each of the 10 HITL participants, maximum instantaneous traffic load (i.e., the maximum number of aircraft simultaneously under sector control) was computed over successive two-minute intervals. These interval values were then averaged across all 10 participants. Traffic load increased steadily over the 45-minute HITL sessions. Average maximum traffic load ranged from 8.9 aircraft (at the very beginning of the scenarios) to 25.3 aircraft (at the end). There was a strong positive correlation between HITL and AgentFly in terms of the count of aircraft under sector control (r=0.95, p