Data science approaches to prevent failures in systems engineering

Sponsor: DASD(SE)
By Prof. Karen Marais and Prof. Bruno Ribeiro
9th Annual SERC Sponsor Research Review
November 8, 2017
FHI 360 Conference Center
1825 Connecticut Avenue NW, 8th Floor
Washington, DC 20009
www.sercuarc.org
SSRR 2017

Project failures occur despite systems engineering best practices

Project delays, cost overruns, quality concerns, cancellations…


Why aren’t these methods helping (as much as we hope)?

Several possible reasons…
① They rely on extensive data creation, collection, and tracking, which is hard to do
② We think they are not useful, and so they are not


Our core ideas:
① Base risk assessment on the “real reasons” for systems engineering failures, and
② Augment existing data with team assessments, the Wisdom of the Crowd (WoC), to uncover problems and their likely “real reason” causes


Most systems engineering failures do not involve black swans

Most failures result from rather prosaic and predictable white swans:

Lost tacit knowledge when employee(s) departed

Subjected to insufficient testing

Created deficient requirements

Failed to provide resources

Violated regulations

Failed to inspect

Used inadequate justification

Violated procedures

Subjected to inadequate  reviews

Failed to form a contingency  plan

Managed risk poorly

Kept poor records

Failed to consider systems  factor

Created deficient procedures

Failed to supervise

Lacked experience

Enforced deficient regulations

Did not allow aspect to stabilize

Failed to consider human factor

Did not learn from failure

Failed to maintain

Diane Sorenson and Karen Marais, “Patterns of Causation in Accidents and Other Systems Engineering Failures,” IEEE Systems Conference, April 2016, IEEE, Orlando, FL.

The failure cause network shows how these causes relate to one  another


Consider the Mars Climate Orbiter failure

The project was severely understaffed, with some people working 80 hours per week.

The team monitoring the spacecraft saw that errors were accumulating on the aim point for the spacecraft, but did not investigate.

People knew something wasn’t right…

Can we develop ways to get this information? (without requiring ever more paperwork…)


Big data can help, but it’s harder than you might think.


Expectation of Machine Learning/AI/Big Data

[Diagram: Human bookkeeping + Automated data collection (sensors), fed to Machine Learning / AI = insights that prevent and forecast failures]


Reality of Machine Learning/AI/Big Data

Data collection not high priority when project is in trouble

[Diagram: Human bookkeeping (low priority in crunch time) + Automated data collection (sensors), fed to Machine Learning / AI. Low priority: faulty, incomplete when needed most. Pictured notes: “Estimated MTBF ~10 years? Check later with Emily”; “If sensor measurement is wrong do… TODO: Describe our work‐around. Need to revisit this later.”]

We propose a tool that uses both existing data and Wisdom of the  Crowds to help predict failures

[Diagram: Enterprise Software Derived Inputs and App Derived Inputs feed the Machine Learning Algorithm, which produces a Failure Prediction; Actual Failures are also shown]

Algorithm is continuously updated within the organization
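
As a rough illustration of what "continuously updated" could mean in practice, the sketch below updates a small logistic regression one observation at a time, as each actual outcome is recorded. It is not the project's algorithm: the class name `OnlineFailureModel`, the two input signals, the learning rate, and the toy history are all assumptions made for this example.

```python
import math


class OnlineFailureModel:
    """Logistic regression updated one observation at a time (illustrative only)."""

    def __init__(self, n_signals, learning_rate=0.1):
        self.weights = [0.0] * n_signals
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, signals):
        """Return the estimated probability of failure for one set of signals."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, signals))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, signals, failed):
        """Take one gradient step once the actual outcome (True/False) is known."""
        error = self.predict(signals) - (1.0 if failed else 0.0)
        self.weights = [w - self.learning_rate * error * x
                        for w, x in zip(self.weights, signals)]
        self.bias -= self.learning_rate * error


# Hypothetical history: (enterprise-derived signal, app-derived signal) -> failed?
history = [((0.9, 0.8), True), ((0.1, 0.2), False),
           ((0.8, 0.7), True), ((0.2, 0.1), False)]

model = OnlineFailureModel(n_signals=2)
for _ in range(200):                      # replay the small history several times
    for signals, failed in history:
        model.update(signals, failed)

risky = model.predict((0.85, 0.9))        # resembles the failed projects
healthy = model.predict((0.1, 0.1))       # resembles the successful projects
```

Because each update needs only the latest observation, a model of this kind can keep learning inside the organization without retraining from scratch.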


How will we get there?

IDENTIFY INPUT DATA and DEVELOP COLLECTION APP
IDENTIFY Enterprise Software Derived Inputs
IDENTIFY Human Derived Inputs; DEVELOP WoC App
DEVELOP First Generation of Machine Learning Algorithm
Failure Prediction at PARTNER ORGANIZATION
RECORD Actual Failures at PARTNER ORGANIZATION

Use partner organization data to train first generation of machine learning algorithm and tailor set of input parameters.

Identifying input data

Use student and partner organization data to train first and second generations of machine learning algorithm and tailor set of input signals.

SIGNALS: From Enterprise Software and WoC App
FACTORS: Identified based on literature
REAL REASONS: Provide initial seed for selecting factors
REAL FAILURES

Over time, the machine learning code makes direct links between signals and failures, so we can discard the factors and real reasons.

How does input data relate to the real reasons?

SIGNAL from Employee App: How many times did you ask your team members a “why” question today?
FACTOR: Low Proactivity. Low proactivity may mean missed opportunities to question and improve requirements.

SIGNAL from Finance Software: The percentage of budget associated with replacing faulty or unsuitable parts.
FACTOR: Faulty Parts. Many faulty parts may be a sign of poor requirements specification (e.g., good part used in wrong way, or poor quality part).

REAL REASON: Conducted poor  requirements engineering
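
The signal, factor, and real-reason chain on this slide can be written down as a small lookup structure. The sketch below is only one possible encoding: the `Signal` class, the `FACTOR_TO_REAL_REASON` table, and the helper `real_reasons_for` are hypothetical names introduced here, while the two signals, two factors, and the real reason are taken from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    source: str      # where the signal is collected
    question: str    # what is measured
    factor: str      # factor the signal is taken to indicate


# The two example signals from the slide, each paired with its factor.
SIGNALS = [
    Signal("Employee App",
           "How many times did you ask your team members a 'why' question today?",
           "Low Proactivity"),
    Signal("Finance Software",
           "Percentage of budget associated with replacing faulty or unsuitable parts",
           "Faulty Parts"),
]

# Both factors point to the same real reason on the slide.
FACTOR_TO_REAL_REASON = {
    "Low Proactivity": "Conducted poor requirements engineering",
    "Faulty Parts": "Conducted poor requirements engineering",
}


def real_reasons_for(sources):
    """Return the real reasons reachable from signals collected by the given sources."""
    return {FACTOR_TO_REAL_REASON[s.factor] for s in SIGNALS if s.source in sources}


reasons = real_reasons_for({"Finance Software"})
```

A table like this is only the initial seed; the deck's stated goal is for the learned model to link signals to failures directly over time.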


Wisdom‐of‐the‐Crowd Information

Accurate Wisdom‐of‐the‐Crowd Predictions from Incomplete Pictures

An expert with a complete view and understanding of the entire process may be able to give a reasonable assessment of potential problems and delays.

As projects become more complex, this assessment is increasingly hard and dedicated experts are no longer able to have a complete view.

(Wisdom of the Crowd) Can we use the assessments of non‐experts with partial views to train neural networks to learn to
a. Predict success and failures using non‐experts with incomplete (possibly biased) information?
b. Ask relevant questions to these non‐experts to help make the data richer to better predict success and failure?

Predicting Outcomes from WoC Inputs

Hypothesis: Non‐expert opinions and their relationships can help predict project outcomes

Approach: Use our newly developed Sparse Pattern Convolutional Neural Network (SPCNN) to learn dynamic relational dependencies between group actors, their opinions, and project outcomes.

Input: WoC team member assessment of project health, WoC assessment of potential personal issues, team structure, traditional indicators

Output: Predicted outcome of project milestones

[Diagram: inputs from each individual in team + relevant extra data → Sparse Pattern Convolutional Neural Network → predicted outcome. The team graph encodes relationship patterns among nodes labeled M (Manager), C (Contractor), E (Engineer), and EI (Eng. Intern).]

Meng, C., Sekar, C., Ribeiro, B., Neville, J., 2017, Predicting Subgraph Evolution in Heterogeneous Dynamic Networks (preprint).
Yang, J., Ribeiro, B., Neville, J., Should We Be Confident in Peer Effects Estimated from Partial Crawls of Social Networks? AAAI Conference on Web and Social Media (ICWSM), 2017.
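
To make the idea of a graph that "encodes relationship patterns" more concrete, the sketch below builds a tiny team graph with role labels and counts which pairs of roles are connected. This is not the SPCNN and does no learning; it only shows one simple way such a graph could be represented. The member names, the edges, and the helper `role_pair_counts` are hypothetical.

```python
from collections import Counter

# Hypothetical team: member id -> role (M = manager, C = contractor,
# E = engineer, EI = engineering intern), following the slide's node labels.
roles = {"ann": "M", "bob": "C", "cat": "E", "dan": "EI", "eve": "C"}

# Hypothetical working relationships (undirected edges between members).
edges = [("ann", "bob"), ("ann", "cat"), ("cat", "dan"), ("cat", "eve")]


def role_pair_counts(roles, edges):
    """Count how often each unordered pair of roles is connected by an edge."""
    counts = Counter()
    for a, b in edges:
        pair = tuple(sorted((roles[a], roles[b])))
        counts[pair] += 1
    return counts


patterns = role_pair_counts(roles, edges)
```

Counts like these are a hand-built summary; the approach named on the slide instead learns relational dependencies from the graph itself.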

Active Learning + Contextual Bandits

• Problem: Too many questions we would like to ask
― Must limit the number of questions to ask (avoid subject fatigue)
― Which questions/answers most correlate with outcome?
o Active learning approach to learn which questions to ask

[Diagram labels: Learns to choose questions to ask; Gets answer; Makes prediction; Tests prediction; Measures Prediction Accuracy; Gives feedback of usefulness of question. A network over the team graph (nodes M, C, E) predicts usefulness of each question.]
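
The question-selection loop can be illustrated with a much simpler stand-in than the approach named on the slide. The sketch below uses a plain epsilon-greedy bandit, which ignores context and team structure, to learn which of a few questions is most useful to ask. The class `QuestionSelector`, the three questions, and the hidden usefulness probabilities are all made up for the example; in a real system the reward would come from the measured change in prediction accuracy.

```python
import random


class QuestionSelector:
    """Epsilon-greedy bandit over a fixed pool of questions (illustrative only)."""

    def __init__(self, questions, epsilon=0.1, seed=0):
        self.questions = list(questions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.asked = {q: 0 for q in self.questions}
        self.usefulness = {q: 0.0 for q in self.questions}  # running mean reward

    def choose(self):
        """Mostly pick the question with the best estimate; sometimes explore."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.questions)
        return max(self.questions, key=lambda q: self.usefulness[q])

    def feedback(self, question, reward):
        """Update the running mean usefulness of a question (reward in [0, 1])."""
        self.asked[question] += 1
        n = self.asked[question]
        self.usefulness[question] += (reward - self.usefulness[question]) / n


# Hypothetical questions and hidden probabilities that an answer
# improves the prediction; a real system would measure this instead.
true_usefulness = {"schedule confidence?": 0.8,
                   "requirements clear?": 0.5,
                   "enough testers?": 0.2}

selector = QuestionSelector(true_usefulness, epsilon=0.2, seed=1)
env = random.Random(2)
for _ in range(2000):
    q = selector.choose()
    reward = 1.0 if env.random() < true_usefulness[q] else 0.0
    selector.feedback(q, reward)

best = max(selector.usefulness, key=selector.usefulness.get)
```

Limiting each round to one chosen question is what keeps subject fatigue down: the selector spends most rounds on the question it currently rates highest and only occasionally tries the others.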

Vision for Product Development: Year One and Future

IDENTIFY INPUT DATA and DEVELOP COLLECTION APP
Enterprise Software Derived Inputs: IDENTIFY and TAILOR; EXPAND and REFINE
Human Derived Inputs: IDENTIFY; DEVELOP WoC App; REFINE WoC App
Machine Learning Algorithm: DEVELOP 1st Generation; REFINE 2nd Generation
Failure Prediction at PARTNER ORGANIZATION
Actual Failures: at STUDENT ORGANIZATIONS; at PARTNER ORGANIZATION

Use student organization data to train first generation of machine learning algorithm and tailor set of input parameters.
Use partner organization data to train second generation of machine learning algorithm and expand and refine set of input parameters as necessary.
