Problem solving in expert teams: Functional models and task processes

Simulation and Human Factors in Modeling of Spaceflight Mission Control Teams
International Symposium on Resilient Control Systems
15 August 2012, Salt Lake City, UT
Barrett S. Caldwell, PhD; Jeffrey D. Onken, PhD

Background: Four Team Processes (GROUPER History)

• Asking (2005 paper, by Caldwell): A novice comes to experts with a question.
• Sharing (2006 paper, by Ghosh & Caldwell): Experts share opinions, experience, and backgrounds.
• Solving (this paper): An expert finds a problem, and the team needs to solve it.
• Learning (future work): Experts study together to expand their knowledge.

NASA Flight Control Team

Flight controllers are highly trained experts, all on one team. Their job is to monitor the crew and the Shuttle subsystems via their consoles.

The flight director is the decision lead of the team. The team is grouped according to engineering and scientific disciplines. They communicate on voice loops.

They solve problems together; no one simply dictates instructions.


Foundational Work: Distributed Supervisory Coordination

[Diagram: a Human Supervisory Coordinator connects through Human-Human Communication Interfaces to Human Supervisory Controllers, who connect through Human-System Interfaces to the engineering system being controlled. Surrounding concepts: Situation Awareness, Group Performance, Dimensions of Expertise, Information Flow, and Group Decision Processes ("Solving").]

Asking and Sharing modules have been modeled. This work models the Solving module.

Problem Statement: Analysis of Alternatives

How do we guide the design of an MCC? Trial and error takes years, is very risky, and is limited to only slight variations. An offline research MCC would be prohibitively expensive and would still take a long time. A model & simulation is cheap, fast, and safe.

[Table: weighted figures of merit (Extensibility, Sustainability, Adaptability, Effectiveness, Robustness, Safety) scored against the alternatives: Status Quo MCC, Skeleton MCC (=10), Smaller MCC (-10), Larger MCC (+10), Unmonitored Sleep, and Mobile Team. The scores are the question marks the analysis must fill in.]

Challenges for a Good Model: Integrating Human Factors and Simulation Disciplines

An effective model must include:
• Realistic examples of anomalies, taken from an actual spaceflight mission
• Believable and realistic dynamics of agents, based on the authors' experience and the research literature on "intellective" and decision-making tasks, permitting "non-rationality" (but "truth wins" convincing)
• Recognition of task demands and constraints: dynamic function allocation and mission safety criteria
• Stability and convergence of estimates of parameter and scenario performance: Monte Carlo analyses, 300-trial convergence
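The convergence requirement in the last bullet can be sketched as a running-mean stability check over Monte Carlo trials. This is an illustrative reading, not the authors' code; the window and tolerance values are assumptions.

```python
import random

def running_means(samples):
    """Cumulative mean after each additional Monte Carlo trial."""
    means, total = [], 0.0
    for i, x in enumerate(samples, start=1):
        total += x
        means.append(total / i)
    return means

def converged(samples, window=50, tol=0.01):
    """Heuristic check: has the running mean stabilized to within a
    relative tolerance over the last `window` trials?"""
    means = running_means(samples)
    if len(means) < window:
        return False
    center = means[-1]
    return all(abs(m - center) <= tol * abs(center) for m in means[-window:])

# Example: 300 trials of a notional consensus-time distribution.
random.seed(0)
trials = [random.expovariate(1 / 600.0) for _ in range(300)]
print(converged(trials))
```

With 300 trials, a stable running mean is what the slide's "300 trial convergence" criterion is after; unstable estimates would call for more trials.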

Telemetry and Anomalies

Telemetry is data from subsystems. Anomalies are irregularities: regular telemetry is nominal, and anomalies are off-nominal telemetry. Flight-control consoles display telemetry.

Anomalies may be evidence of underlying problems, or merely a blip in a sensor or communications. Either way, anomalies are problems that the team must solve.

Decisions must be made by the team, all together, to keep the crew safe. The flight controllers cannot ignore any anomaly; the one they ignore may be the one that kills the crew.

Fault Detection, Isolation, and Recovery (FDIR)

The JPL Purdue Architecture Analysis (PAA) project introduced an FDIR model. Anomalies:
1. start appearing in telemetry at some mission elapsed time (MET)
2. disappear after some set lifetime
3. impart some penalty based on whether they are degrading

There are separate processes for anomaly detection, anomaly isolation, and anomaly resolution. The anomaly model and anomaly-resolution process here are based on this.
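The three lifecycle rules above can be sketched as a small data type. Field names and units are illustrative assumptions, not the PAA model's own.

```python
from dataclasses import dataclass

@dataclass
class Anomaly:
    """Sketch of the PAA-style anomaly lifecycle: an anomaly appears at
    some MET, lives for a fixed lifetime, and imparts a penalty only if
    it is degrading."""
    onset_met: float   # mission elapsed time at which it appears (s)
    lifetime: float    # seconds before it expires on its own
    degrading: bool
    penalty: float     # penalty imparted while active, if degrading

    def active(self, met: float) -> bool:
        return self.onset_met <= met < self.onset_met + self.lifetime

    def penalty_at(self, met: float) -> float:
        return self.penalty if (self.degrading and self.active(met)) else 0.0

a = Anomaly(onset_met=3600.0, lifetime=1800.0, degrading=True, penalty=5.0)
print(a.active(4000.0), a.penalty_at(4000.0))  # True 5.0: active within its lifetime
```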

Working the Problems, When They Arise

Anomalies are evidence of problems and could be anything from non-issues to catastrophes. NASA calls problems faults.

The FDIR pipeline runs: Monitoring for Faults -> Fault Detection -> Fault Isolation -> Fault Recovery.
• Fault Detection requires situation awareness.
• Fault Isolation requires hypotheses.
• Fault Recovery requires a solution.

Fault Tolerance: some on-orbit systems may not tolerate any downtime. The MCC works to maximize reliability and availability.

Anomaly Model

All anomalies expire after some lifetime.

Anomaly severity ranges from Minor (no effect) through Impairing to Critical (worst effect). Minor anomalies do not penalize the score; Impairing anomalies do; Critical anomalies make the score zero.

Some anomalies affect objectives. Non-degrading anomalies do not block objectives. Degrading anomalies block all objectives; they must expire or be resolved to unblock.

Resolving anomalies: agents are not privy to anomaly details. Agents resolve anomalies through hypotheses.
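The severity and blocking rules above can be sketched as follows. The enum, the unit penalty per impairing anomaly, and the function names are assumptions for illustration.

```python
from enum import Enum

class Severity(Enum):
    MINOR = "minor"          # no effect on the score
    IMPAIRING = "impairing"  # penalizes the score
    CRITICAL = "critical"    # zeroes the score

def apply_severity(score, active):
    """Scoring rule sketch: any critical anomaly zeroes the score;
    each impairing anomaly subtracts a (here, unit) penalty; minor
    anomalies leave the score alone."""
    if any(s is Severity.CRITICAL for s in active):
        return 0.0
    penalty = sum(1.0 for s in active if s is Severity.IMPAIRING)
    return max(score - penalty, 0.0)

def unblocked_objectives(objectives, any_degrading_active):
    """Degrading anomalies block all objectives until they expire or resolve."""
    return [] if any_degrading_active else list(objectives)
```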

Agent Model

Agent-based modeling: agents represent flight controllers. Agents are dynamic model elements.

Character stats simplify complexity through all-inclusive attributes. Agents have four of the six dimensions of expertise:
• Subject-matter expertise
• Communication expertise
• Situational-context expertise
• Expert-identification expertise

Agents have two more character stats:
• Saturation: total demand of simultaneous tasks
• Independence: autonomy to do their own tasks without discussing them with others
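One minimal reading of these character stats as an agent record, with an illustrative effectiveness rule (the rule and field names are assumptions, not the authors' model):

```python
from dataclasses import dataclass

@dataclass
class Agent:
    """Sketch of an agent's character stats. The four expertise
    dimensions and independence are constant; saturation varies as
    tasks are assigned and completed."""
    subject_matter: float         # subject-matter expertise, 0..1
    communication: float          # communication expertise, 0..1
    situational_context: float    # situational-context expertise, 0..1
    expert_identification: float  # expert-identification expertise, 0..1
    independence: float           # autonomy to act without discussion, 0..1
    saturation: float = 0.0       # total demand of simultaneous tasks, 0..1

    def effectiveness(self) -> float:
        """Illustrative rule: mean expertise, discounted by saturation."""
        expertise = (self.subject_matter + self.communication +
                     self.situational_context + self.expert_identification) / 4
        return expertise * max(1.0 - self.saturation, 0.0)
```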

Multidisciplinary Modeling Framework

[Diagram: actors exchange high-level messages on the voice loop; consoles feed telemetry to actors as low-level messages; anomalies and hypotheses circulate between them.]

Agents each have four character stats: three are constant, while saturation is dynamic. These influence the actor's effectiveness, as do the information requirements for acquiring and maintaining situation awareness.

Agents communicate on voice loops, and information sharing occurs via loops. The team must arrive at a consensus to select the best hypothesis. Anomalies occur within loop domains, not across them. Degrading anomalies block progress on mission objectives.
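A toy sketch of consensus on a voice loop. The endorsement rule, the majority threshold, and every name here are assumptions for illustration, not the authors' decision model.

```python
import random

def reach_consensus(agents, hypotheses, p_correct=0.2, threshold=0.5, rng=None):
    """Each agent endorses one hypothesis on the loop: the correct one
    (assumed to be hypotheses[0]) with probability p_correct, otherwise
    a competing one. A hypothesis endorsed by more than `threshold` of
    the team becomes the consensus; otherwise no consensus is reached.
    Assumes at least two hypotheses."""
    rng = rng or random.Random()
    votes = {}
    for _ in agents:
        h = hypotheses[0] if rng.random() < p_correct else rng.choice(hypotheses[1:])
        votes[h] = votes.get(h, 0) + 1
    best, count = max(votes.items(), key=lambda kv: kv[1])
    return best if count > threshold * len(agents) else None
```

With a low p_correct (0.2 in the baseline scenario), the loop often converges on a wrong hypothesis or none at all, which is why repeated attempts matter.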

The Baseline: STS-135 Timeline (Scenario 0)

[Timeline over mission elapsed time (MET), 0 to 14 days; entries are scenario objectives, anomalies, and degrading anomalies.]

• Check heat shield (#0)
• Atlantis' GPC3 failed (#0)
• Orbital debris tracking (#1)
• Docking with ISS & leak checks (#1)
• Install Raffaello module to Harmony (#2)
• Toilet problem (#2)
• Inventory swap with ISS (#3)
• Atlantis' GPC4 failed due to single-event upset (#3)
• Spacewalk (#4)
• Bypass planned communications outage to fix GPC4 (#4)
• Broken LiOH latches (#5)
• Microbial air samples (#5)
• Uninstalling Raffaello from ISS (#6)
• Undocking from ISS (#7)
• Photographing the ISS (#8)
• Deploying picosat (#9)

These objectives and anomalies were taken from the mission status reports, with times estimated from context.

Discussion

Scenario 0: Convergence and Anomaly Resolution

There were big differences between complex and simple anomalies.

[Bar chart: outcomes over 300 trials by anomaly (Anomaly 0-5), counting trials reaching first consensus and trials reaching resolution (after first consensus); most anomalies reached consensus and resolution in the large majority of trials, while some expired unresolved.]

[Log-scale plots over sorted trials: "Resolved Consensus Times by Anomaly Types" (consensus time, s) and "Resolution Times by Team" (average resolution time, s, for Scenario 0 Complex, Scenario 0 DYN, and Scenario 0 MECH).]

Scenarios

Scenario 0 (the baseline): probability of correct hypothesis 0.2; team sizes 3, 13, 11; level of integration Level 2; probability of hypothesis competition 0.1.

• Scenario 1: probability of correct hypothesis increased to 0.8
• Scenario 2: team sizes decreased to 3, 3, 3
• Scenario 3: team sizes increased to 13, 23, 21
• Scenario 4: level of integration increased to Level 4
• Scenario 5: level of integration increased to Level 4; probability of hypothesis competition increased to 0.5
• Scenario 6: team sizes decreased to 3, 9, 7; level of integration increased to Level 4
• Scenario 7: team sizes decreased to 3, 9, 7; level of integration increased to Level 4; probability of hypothesis competition increased to 0.5
• Scenario 8: team sizes decreased to 3, 9, 7
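The sweep above is naturally expressed as a baseline plus per-scenario overrides. The parameter values are from the table; the key names are illustrative.

```python
# Baseline (Scenario 0) parameters and the per-scenario overrides.
BASELINE = {
    "p_correct_hypothesis": 0.2,
    "team_sizes": (3, 13, 11),
    "integration_level": 2,
    "p_hypothesis_competition": 0.1,
}

OVERRIDES = {
    1: {"p_correct_hypothesis": 0.8},
    2: {"team_sizes": (3, 3, 3)},
    3: {"team_sizes": (13, 23, 21)},
    4: {"integration_level": 4},
    5: {"integration_level": 4, "p_hypothesis_competition": 0.5},
    6: {"team_sizes": (3, 9, 7), "integration_level": 4},
    7: {"team_sizes": (3, 9, 7), "integration_level": 4,
        "p_hypothesis_competition": 0.5},
    8: {"team_sizes": (3, 9, 7)},
}

def scenario(n: int) -> dict:
    """Scenario n = baseline with that scenario's overrides applied."""
    return {**BASELINE, **OVERRIDES.get(n, {})}
```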

Variability: Model Parameter

Changing a model parameter to see how the results vary.

Increasing the probability of correct hypotheses (Scenario 1) decreases the time to first consensus for simple anomalies and does not change it for complex anomalies.

[Log-scale plot: "Scenario 1 vs. Scenario 0 in Consensus Time" (consensus time, s, over sorted trials) for Scenario 1 Complex, Scenario 1 Simple, Scenario 0 Complex, and Scenario 0 Simple.]

Variability: Scenario Parameter

Changing a scenario parameter to see how the results vary.

Decreasing the teams' sizes down to three members each (Scenario 2) does not affect consensus time, but increases the speed at which the teams get through the anomaly-resolution process for simple anomalies.

[Log-scale plot: "Resolution Times in Scenarios 2 and 0" (average time to resolution, s, over sorted trials) for Scenario 2 DYN, Scenario 2 MECH, Scenario 0 DYN, and Scenario 0 MECH.]

Increasing the teams' sizes by ten members each (Scenario 3) prevents the teams from reaching consensus for complex anomalies.

[Bar chart: "Expired Anomalies in Scenarios 3 and 0" — number of trials (out of 300) with expired anomalies, by anomaly #0-5, for Scenario 3 vs. Scenario 0.]

Analysis of Alternatives Results

Effectiveness at resolving anomalies (baseline: Level 2 integration; team sizes 3, 13, 11; hypothesis competition probability 0.1):

• Scenario 0 (baseline): Good
• Scenario 4 (integration -> Level 4): Almost as good
• Scenario 5 (integration -> Level 4; competition -> 0.5): Pretty bad
• Scenario 6 (integration -> Level 4; team sizes -> 3, 9, 7): Good
• Scenario 7 (integration -> Level 4; team sizes -> 3, 9, 7; competition -> 0.5): Pretty bad
• Scenario 8 (team sizes -> 3, 9, 7): The best

[Bar chart: "Analysis of Alternatives Results: Anomalies Resolved" — number of trials (out of 300) with each anomaly (#0-5) resolved, for Scenarios 0, 4, 5, 6, 7, and 8.]

Sample Unexpected Insight: Cliffs in Consensus Time

Cliffs appear in the distribution of consensus time, for both simple and complex anomalies. These are consensuses that the team reaches hours or days after their first attempts fail, long after the anomaly disappears from their telemetry. Not rational agent behavior!

These are retrospective consensuses: the team revisits old problems to try again to solve them. Small mutations in hypotheses occur at a steady state; eventually they lead to breakthroughs.

[Log-scale plots of average consensus time (s) over sorted trials: "AoA: Scenarios 4 vs. 0 in Consensus Time" (Scenario 4 Complex/Simple vs. Scenario 0 Complex/Simple) and "AoA: Scenarios 7, 6, & 0 in Consensus Time" (Scenario 7, 6, and 0 Complex).]

Some Conclusions

• Cheaper than building new control rooms, but not trivial.
• Effective analysis-of-alternatives planning and design requires plausible agents, plausible scenarios, and plausible mission and task dynamics.
• Such multidisciplinary integration supports novel links of "simulation-informed human factors" and "human factors-informed simulation."
• Agent models demonstrate learning and "retrospective" behaviors suggesting memory and integration. Are such findings reliable? Useful?


Acknowledgements and Contact

Material from J. Onken's dissertation, "Modeling Real-time Coordination of Distributed Expertise and Event Response in NASA Mission Control" (2012). Dr. Onken is now at Northrop Grumman; both Prof. Caldwell and Dr. Onken have been supported by NASA. However, this work was not supported by NG and does not reflect any official positions of NG, NASA, or others.

http://www.grouperlab.org
Barrett Caldwell: [email protected]; Jeffrey Onken: [email protected]
