Data science approaches to prevent failures in systems engineering
Sponsor: DASD(SE)
Prof. Karen Marais and Prof. Bruno Ribeiro
9th Annual SERC Sponsor Research Review, November 8, 2017
FHI 360 Conference Center, 1825 Connecticut Avenue NW, 8th Floor, Washington, DC 20009
www.sercuarc.org
Project failures occur despite systems engineering best practices
Project delays, cost overruns, quality concerns, cancellations…
Why aren’t these methods helping (as much as we hope)?
Several possible reasons…
① They rely on extensive data creation, collection, and tracking, which is hard to do
② We think they are not useful, and so they are not
Our core ideas:
① risk assessment based on the “real reasons” for systems engineering failures, and
② augmenting existing data with team assessments, i.e., Wisdom of the Crowd (WoC), to uncover problems and likely “real reason” causes
Most systems engineering failures do not involve black swans
Most failures result from rather prosaic and predictable white swans:
• Lost tacit knowledge when employee(s) departed
• Subjected to insufficient testing
• Created deficient requirements
• Failed to provide resources
• Violated regulations
• Failed to inspect
• Used inadequate justification
• Violated procedures
• Subjected to inadequate reviews
• Failed to form a contingency plan
• Managed risk poorly
• Kept poor records
• Failed to consider systems factors
• Created deficient procedures
• Failed to supervise
• Lacked experience
• Enforced deficient regulations
• Did not allow aspect to stabilize
• Failed to consider human factors
• Did not learn from failure
• Failed to maintain
Diane Sorenson and Karen Marais, “Patterns of Causation in Accidents and Other Systems Engineering Failures,” IEEE Systems Conference, Orlando, FL, April 2016.
The failure cause network shows how these causes relate to one another
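For illustration, this network can be encoded as a directed graph whose nodes are the causes above and whose edges record which cause contributed to which. Below is a minimal Python sketch using networkx; the edges are hypothetical placeholders for the figure's actual links, which come from Sorenson and Marais (2016).

```python
# Minimal sketch: encode the failure cause network as a directed graph.
# The edges below are hypothetical placeholders, NOT the actual network
# reported by Sorenson and Marais (2016).
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("Failed to provide resources", "Subjected to insufficient testing"),
    ("Lacked experience", "Created deficient requirements"),
    ("Created deficient requirements", "Subjected to insufficient testing"),
    ("Kept poor records", "Did not learn from failure"),
])

# Causes that enable many downstream causes (high out-degree) are natural
# leverage points for prevention.
for cause, out_deg in sorted(G.out_degree, key=lambda kv: kv[1], reverse=True):
    print(f"{cause}: enables {out_deg} downstream cause(s)")
```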
Consider the Mars Climate Orbiter failure
The project was severely understaffed, with some people working 80 hours per week. The team monitoring the spacecraft saw that errors were accumulating on the aim point for the spacecraft, but did not investigate.
People knew something wasn’t right… Can we develop ways to get this information? (without requiring ever more paperwork…)
Big data can help, but it’s harder than you might think.
Expectation of Machine Learning/AI/Big Data
Human bookkeeping + Automated data collection (sensors) + Machine Learning / AI = insights that prevent and forecast failures
Reality of Machine Learning/AI/Big Data
The bookkeeping that feeds these models often looks like:
• “Estimated MTBF ~10 years? Check later with Emily”
• “If sensor measurement is wrong do… TODO: Describe our work‐around. Need to revisit this later.”
And data collection is not a high priority when a project is in trouble.
Human bookkeeping (low priority in crunch time) + Automated data collection (sensors) + Machine Learning / AI = data that is faulty and incomplete when needed most
We propose a tool that uses both existing data and Wisdom of the Crowd to help predict failures
Enterprise Software Derived Inputs + App Derived Inputs → Machine Learning Algorithm → Failure Prediction
Predictions are compared against actual failures, and the algorithm is continuously updated within the organization.
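As a rough sketch of this pipeline (not the project's actual algorithm), the two input streams can be concatenated into a single feature vector, a classifier fit on recorded outcomes, and the fit repeated as new actual failures are logged. The feature layout and synthetic data below are assumptions made for illustration.

```python
# Minimal sketch of the proposed loop: enterprise-software inputs and WoC-app
# inputs feed one classifier, refit whenever new actual-failure labels arrive.
# Synthetic data stands in for both input sources.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
enterprise_inputs = rng.normal(size=(n, 4))   # e.g., budget, schedule signals
app_inputs = rng.normal(size=(n, 3))          # e.g., WoC health assessments
X = np.hstack([enterprise_inputs, app_inputs])
y = rng.integers(0, 2, size=n)                # recorded actual failures (0/1)

model = LogisticRegression().fit(X, y)

def predict_failure_risk(enterprise_row, app_row):
    x = np.hstack([enterprise_row, app_row]).reshape(1, -1)
    return model.predict_proba(x)[0, 1]

# "Continuously updated": refit as the organization records new outcomes.
def record_outcomes_and_update(new_X, new_y, X, y):
    X, y = np.vstack([X, new_X]), np.concatenate([y, new_y])
    return LogisticRegression().fit(X, y), X, y
```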
How will we get there?
IDENTIFY INPUT DATA and DEVELOP COLLECTION APP:
• IDENTIFY Enterprise Software Derived Inputs
• IDENTIFY Human Derived Inputs and DEVELOP WoC App
DEVELOP First Generation of Machine Learning Algorithm
→ Failure Prediction at PARTNER ORGANIZATION
→ RECORD Actual Failures at PARTNER ORGANIZATION
Use partner organization data to train the first generation of the machine learning algorithm and tailor the set of input parameters.
Identifying input data
Use student and partner organization data to train the first and second generations of the machine learning algorithm and tailor the set of input signals.
REAL REASONS provide the initial seed for selecting FACTORS; FACTORS are identified based on the literature; FACTORS point to SIGNALS from the enterprise software and the WoC app; SIGNALS are then linked to REAL FAILURES.
Over time, the machine learning code makes direct links between signals and failures, so we can discard the factors and real reasons.
How does input data relate to the real reasons?
SIGNAL from Employee App: How many times did you ask your team members a “why” question today?
→ FACTOR: Low proactivity. Low proactivity may mean missed opportunities to question and improve requirements.
SIGNAL from Finance Software: The percentage of budget associated with replacing faulty or unsuitable parts.
→ FACTOR: Faulty parts. Many faulty parts may be a sign of poor requirements specification (e.g., a good part used in the wrong way, or a poor-quality part).
Both factors point to the same REAL REASON: Conducted poor requirements engineering.
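To make the signal-to-factor-to-real-reason chain concrete, here is a minimal sketch of a lookup structure tying each signal to its factor and real reason. The signal names and thresholds are hypothetical, chosen only to mirror the two examples above.

```python
# Minimal sketch of a signal -> factor -> real-reason mapping.
# Signal names and thresholds are illustrative, not the project's schema.
SIGNAL_MAP = {
    "why_questions_per_day": {          # from the employee WoC app
        "factor": "Low proactivity",
        "real_reason": "Conducted poor requirements engineering",
        "concern_if": lambda v: v < 1,  # hypothetical threshold
    },
    "faulty_part_budget_pct": {         # from finance software
        "factor": "Faulty parts",
        "real_reason": "Conducted poor requirements engineering",
        "concern_if": lambda v: v > 0.05,
    },
}

def flag_concerns(signals):
    """Return (signal, factor, real_reason) triples that look concerning."""
    return [
        (name, m["factor"], m["real_reason"])
        for name, m in SIGNAL_MAP.items()
        if name in signals and m["concern_if"](signals[name])
    ]

print(flag_concerns({"why_questions_per_day": 0, "faulty_part_budget_pct": 0.12}))
```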
Wisdom‐of‐the‐Crowd Information
Accurate Wisdom‐of‐the‐Crowd predictions from incomplete pictures:
An expert with a complete view and understanding of the entire process may be able to give a reasonable assessment of potential problems and delays. As projects become more complex, this assessment becomes increasingly hard, and dedicated experts are no longer able to maintain a complete view.
(Wisdom of the Crowd) Can we use the assessments of non‐experts with partial views to train neural networks to learn to
a. predict successes and failures using non‐experts with incomplete (possibly biased) information?
b. ask relevant questions of these non‐experts to make the data richer and better predict success and failure?
Predicting Outcomes from WoC Inputs
Hypothesis: Non‐expert opinions and their relationships can help predict project outcomes.
Approach: Use our newly developed Sparse Pattern Convolutional Neural Network (SPCNN) to learn dynamic relational dependencies between group actors, their opinions, and project outcomes.
Input: WoC team member assessments of project health, WoC assessments of potential personal issues, team structure, and traditional indicators: inputs from each individual in the team plus relevant extra data. A graph encodes relationship patterns such as Manager (M)–Contractor (C) and Engineer (E)–Engineering Intern (EI).
Output: Predicted outcome of project milestones.
Meng, C., Sekar, C., Ribeiro, B., and Neville, J., “Predicting Subgraph Evolution in Heterogeneous Dynamic Networks,” 2017 (preprint).
Yang, J., Ribeiro, B., and Neville, J., “Should We Be Confident in Peer Effects Estimated from Partial Crawls of Social Networks?” AAAI Conference on Web and Social Media (ICWSM), 2017.
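For flavor only, the sketch below pools each member's WoC opinion scores with counts of relationship patterns (M–C, E–EI, and so on) and scores the milestone with a plain logistic model. This is not SPCNN: the roles, weights, and features are invented for illustration.

```python
# Stand-in for graph-based outcome prediction: pool per-member opinions with
# counts of role-pair relationship patterns, then score with a logistic model.
# NOT the authors' SPCNN; all names and weights here are hypothetical.
import numpy as np

ROLE_PAIRS = [("M", "C"), ("M", "E"), ("E", "EI"), ("C", "E")]

def team_features(opinions, edges):
    """opinions: {member: health score in [0,1]}; edges: [(member, member)]."""
    roles = {m: m.rstrip("0123456789") for m in opinions}   # "M1" -> "M"
    pattern_counts = [
        sum(1 for a, b in edges if {roles[a], roles[b]} == set(pair))
        for pair in ROLE_PAIRS
    ]
    return np.array([np.mean(list(opinions.values())),
                     np.min(list(opinions.values())),
                     *pattern_counts], dtype=float)

# Hypothetical trained weights; in practice these would be fit on recorded
# milestone outcomes.
w = np.array([2.0, 1.5, -0.1, 0.05, 0.05, -0.05])

def milestone_success_prob(opinions, edges):
    z = team_features(opinions, edges) @ w
    return 1.0 / (1.0 + np.exp(-z))

team = {"M1": 0.8, "C1": 0.4, "E1": 0.6, "EI1": 0.7}
print(milestone_success_prob(team, [("M1", "C1"), ("E1", "EI1"), ("M1", "E1")]))
```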
Active Learning + Contextual Bandits
• Problem: There are too many questions we would like to ask.
― We must limit the number of questions asked (to avoid subject fatigue).
― Which questions/answers correlate most with the outcome?
• Approach: An active-learning loop that learns which questions to ask. The learner predicts the usefulness of each question, chooses questions to ask, gets answers, and makes a prediction; it then tests the prediction, measures prediction accuracy, and feeds the measured usefulness of each question back into its future choices.
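A minimal epsilon-greedy stand-in for this loop: treat each candidate question as a bandit arm, use the measured gain in prediction accuracy as its reward, and favor questions with the highest running usefulness while still exploring. The question names and simulated rewards below are assumptions, not the project's implementation.

```python
# Epsilon-greedy sketch of the question-selection loop: estimate each
# question's usefulness from how much asking it improved prediction accuracy,
# and prefer the most useful questions while still exploring.
import random

questions = ["health", "workload", "morale", "schedule_confidence"]
usefulness = {q: 0.0 for q in questions}   # running mean reward per question
asked = {q: 0 for q in questions}
EPSILON, BUDGET = 0.2, 2                   # explore rate; max questions/round

def choose_questions():
    ranked = sorted(questions, key=lambda q: usefulness[q], reverse=True)
    return [random.choice(questions) if random.random() < EPSILON else q
            for q in ranked[:BUDGET]]

def update(question, accuracy_gain):
    """Feedback step: reward = measured improvement in prediction accuracy."""
    asked[question] += 1
    usefulness[question] += (accuracy_gain - usefulness[question]) / asked[question]

# One simulated round: ask, observe usefulness, update.
for q in choose_questions():
    update(q, accuracy_gain=random.random())  # stand-in for measured gain
print(usefulness)
```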
Vision for Product Development: Year One and Future
IDENTIFY INPUT DATA and DEVELOP COLLECTION APP:
• Enterprise Software Derived Inputs: IDENTIFY and TAILOR (Year One); EXPAND and REFINE (Future)
• Human Derived Inputs: IDENTIFY inputs and DEVELOP WoC App (Year One); REFINE WoC App (Future)
Machine Learning Algorithm: DEVELOP 1st Generation (Year One); REFINE 2nd Generation (Future)
Failure Prediction at PARTNER ORGANIZATION; Actual Failures recorded at STUDENT ORGANIZATIONS (Year One) and at PARTNER ORGANIZATION (Future)
Year One: Use student organization data to train the first generation of the machine learning algorithm and tailor the set of input parameters.
Future: Use partner organization data to train the second generation of the machine learning algorithm and expand and refine the set of input parameters as necessary.