Business Perspectives on Provably-Correct Software

Wayne Lobb, W. A. Lobb, LLC, Concord, Massachusetts, USA

Bits & Chips, Eindhoven, 3 June 2015


Preliminaries

W. Lobb

Bits & Chips

Eindhoven, 3 June 2015

Model-Driven Development


Terminology
• “Correct Software”
  – Designed and built to specifications, with no implementation errors
  – Specifications may be wrong, but that’s a different problem
• “Provably-Correct Software”
  – Tools can verify, or can prove mathematically, that no errors exist
  – Note that correctness is a function of the software’s environment
  – A component can be flawless in version N yet fail in version N+1 when it is called differently
• “Business Perspectives”
  – Can we really build provably-correct software? Must we?
  – If we can and we must, how much will it cost? Can we afford it?
  – What has our species discovered so far in these matters?
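The point that correctness depends on how a component is called can be sketched in a few lines of C. Everything here is hypothetical and chosen only for illustration (the names `duty_to_pwm` and `duty_to_pwm_checked`, and the percent-to-PWM conversion itself); none of it comes from a system discussed in this talk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical component: converts a duty cycle in percent (0..100)
 * to an 8-bit PWM register value. It is correct for every input in
 * its specified range. */
uint8_t duty_to_pwm(int percent)
{
    /* Out-of-range inputs wrap silently when narrowed to 8 bits. */
    return (uint8_t)(percent * 255 / 100);
}

/* Same component with its assumption made explicit: a caller outside
 * the specified range is rejected instead of getting a wrapped value. */
int duty_to_pwm_checked(int percent, uint8_t *out)
{
    if (percent < 0 || percent > 100)
        return -1;              /* precondition violated by the caller */
    *out = (uint8_t)(percent * 255 / 100);
    return 0;
}
```

In "version N" every caller passes 0..100 and `duty_to_pwm` never fails. If a "version N+1" caller passes 150, the unchanged function returns 126, roughly half power instead of full power. The component did not change; its environment did.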


Can we really build provably-correct SW?
• Yes, absolutely!
  – We can write 1 line or 2 or 10 of error-free code:
        int i = 2;
        assert(2 == i);
  – If these 2 lines compile and link without errors, they will never fail, true?
• But something goes wrong in real-world systems
• It’s not just a question of quantity/size of code
  – 10,000 lines of data-initialization source code are easily made flawless
• It’s about complexity
  – Cyclomatic complexity: number of linearly independent paths of execution through the code
  – Even this is not a perfect measure: e.g. a massive switch wrapping API calls scores high but is simple
• It’s about human (in)ability to understand and predict
  – Flow of execution is logic in action, as signals are in circuitry
  – Logic is inherently tangled, difficult, daunting – it’s pure mathematics
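As a rough illustration of why cyclomatic complexity is an imperfect measure, the sketch below applies McCabe's formula M = E − N + 2P to a small switch-based dispatcher. The command codes, the `dispatch` function and its return values are hypothetical stand-ins for "a switch wrapping API calls"; only the formula itself is standard.

```c
#include <assert.h>

/* McCabe's formula for a control-flow graph:
 * M = E - N + 2P (edges, nodes, connected components). */
int cyclomatic_complexity(int edges, int nodes, int components)
{
    return edges - nodes + 2 * components;
}

/* Hypothetical command codes, for illustration only. */
enum command { CMD_START, CMD_STOP, CMD_PAUSE, CMD_RESUME };

/* Four cases plus a default. The control-flow graph has 7 nodes
 * (the switch, five returns, one exit) and 10 edges, so M = 5,
 * yet every path is trivial to review. */
int dispatch(enum command cmd)
{
    switch (cmd) {
    case CMD_START:  return 10;   /* stand-in for an API call */
    case CMD_STOP:   return 20;
    case CMD_PAUSE:  return 30;
    case CMD_RESUME: return 40;
    default:         return -1;   /* unknown command */
    }
}
```

A plain if/else (4 nodes, 4 edges) scores M = 2, while this dispatcher scores M = 5 and would score 101 with a hundred cases, without becoming any harder to understand. The number counts paths, not how tangled they are.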


Two case studies and lessons from them
• NASA Mars Code – Curiosity rover
  – How NASA’s approach to SW development has evolved
  – Precautions sending proven-correct software to Mars
• Toyota Camry unintended acceleration (UA) problems
  – In the US Toyota has paid $2bn for UA lawsuits
  – Do we know that engine control software caused UA?
  – How will Toyota avoid such problems in the future?
• Lessons: we can and must build proven-correct SW
  – Competitiveness in life/mission-critical software control demands it
  – Proven-correct software costs less to make and reaches market faster!


NASA

What NASA does in software, the world comes around to


NASA SW Development Handbook 1984
In 1984 NASA did waterfall and phase-gate (colors added)
Recognition that requirements and design can evolve


NASA SW Development Handbook 1990
By 1990 NASA was anticipating aspects of Agile; the Agile Manifesto came in 2001
Coding starts day 1
System test starts before preliminary design is complete
Requirements can evolve even during Acceptance Testing


NASA Mars Code 2005-today
• Lessons learned from control SW for Curiosity rover
• Main reference: Gerard Holzmann article “Mars Code”
  – February 2014 issue of the Communications of the ACM
  – Plus supplementary info at www.spinroot.com and other sources
• Key requirements for Curiosity software
  – Human safety #1 during rover construction, testing and deployment
  – Thereafter, just one chance to get it right – rover has to land safely
  – Then, ability to monitor, control and debug from some 250m km away (14 light-minutes), after a 570m km journey
• (Here is one provably-correct line of the source:
        " Elvis has Spirit. The answer is 42...END\r\n"; /* 1095 with NULL */
  – Part of the fill-packet message Curiosity still sends daily to Earth)


NASA standard methods for complex SW
• (Quoting from the Holzmann article)
• Architecture with clean separation of concerns
• Data hiding, modularity
• Well-defined interfaces, strong fault-protection
• Clearly stated requirements
• Requirements tracking
• Daily integration builds
• Rigorous unit and integration testing
• Extensive simulation


NASA additional methods for Mars code
• (Quoting from the Holzmann article)
• Software redundancy: two different landing code bases
• Sparse, risk-based [not style-based] coding standards
  – Especially to prevent errors from undisciplined use of multitasking
  – Meant to secure predictable execution in an embedded system context
• Redefined code-review process w/automated tools
• Static analysis: Coverity, CodeSonar, Semmle, Uno
• Logic model-checking tools formally verify critical code
  – Strongest check for multithreaded code – rover has 120 tasks under RTOS
  – Used Spin model-checker plus an extended model extraction tool for C
  – Spin was developed at Bell Labs early 1980s, freely available since 1991


Spin and other logic model checkers
• Inspired by logic-checking in design of digital circuits
  – Impossible to design and test correct circuits without logic-checking tools
  – Same is true for software “circuits”!
• All checkers do the same thing: test mathematical logic
  – Exhaustively examine all execution paths through a logical model
  – Uncover the first instance of a race condition, deadlock, livelock, etc.
  – If no first instance is found, then the model has been proven correct for the properties checked
• Biggest challenge is explosion of state space
  – Every reachable combination of states and events must be examined
  – Must test the simplest faithful model possible – can’t test source code directly
• ONLY way to cost-effectively test concurrent SW logic
  – Upwards of 80 model checkers exist, nearly all open-source or free to use
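What "exhaustively examine all execution paths through a logical model" means can be sketched with a toy explicit-state checker in C. This is not how Spin is implemented; it is a minimal illustration under stated assumptions. The model is hypothetical: two tasks each take two locks in a fixed order and then release both, and the checker visits every reachable combination of task states looking for a deadlock. All names (`checker`, `explore`, `check_model`) are invented for this sketch.

```c
#include <assert.h>
#include <string.h>

enum { LOCK_A = 0, LOCK_B = 1 };
enum { PC_DONE = 2, NUM_STATES = 9 };   /* pc is 0, 1 or 2 per task */

typedef struct {
    int first[2];             /* first lock each task acquires */
    int visited[NUM_STATES];  /* one flag per (pc0, pc1) pair */
    int states;               /* reachable states examined */
    int deadlocks;            /* reachable states where nobody can move */
} checker;

/* A task holds its first lock exactly while its pc is 1. */
static int lock_is_free(const checker *c, const int pc[2], int lock)
{
    for (int t = 0; t < 2; t++)
        if (pc[t] == 1 && c->first[t] == lock)
            return 0;
    return 1;
}

/* Task t can step if the lock it needs next is free. */
static int can_step(const checker *c, const int pc[2], int t)
{
    if (pc[t] == PC_DONE)
        return 0;
    int wanted = (pc[t] == 0) ? c->first[t] : 1 - c->first[t];
    return lock_is_free(c, pc, wanted);
}

/* Depth-first search over every reachable interleaving. */
static void explore(checker *c, int pc0, int pc1)
{
    int id = pc0 * 3 + pc1;
    if (c->visited[id])
        return;
    c->visited[id] = 1;
    c->states++;

    int pc[2] = { pc0, pc1 };
    int moves = 0;
    for (int t = 0; t < 2; t++) {
        if (!can_step(c, pc, t))
            continue;
        moves++;
        if (t == 0) explore(c, pc0 + 1, pc1);
        else        explore(c, pc0, pc1 + 1);
    }
    if (moves == 0 && !(pc0 == PC_DONE && pc1 == PC_DONE))
        c->deadlocks++;       /* stuck before both tasks finished */
}

/* Returns the number of reachable deadlock states; 0 means the model
 * is proven deadlock-free. */
int check_model(int first0, int first1, int *states_out)
{
    checker c;
    memset(&c, 0, sizeof c);
    c.first[0] = first0;
    c.first[1] = first1;
    explore(&c, 0, 0);
    if (states_out)
        *states_out = c.states;
    return c.deadlocks;
}
```

With opposite lock orders, `check_model(LOCK_A, LOCK_B, &n)` examines all 9 states and reports 1 deadlock (each task holds one lock and waits for the other). With the same order, `check_model(LOCK_A, LOCK_A, &n)` examines 8 reachable states and reports none. The state-space explosion is visible in the arithmetic: two tasks with 3 local states give at most 3^2 = 9 combined states, but 120 tasks with 3 states each would give up to 3^120, roughly 10^57. This is why the model must be kept as simple as faithfulness allows.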


“Reverse modeling” of C code in Spin
• Spin has an automated model extractor, Modex
  – Constructs an abstract model of events and states in a stretch of C code
  – May require manual intervention to get the model ready for checking
• Manual creation of Spin-checkable models from C
  – (Quoting Holzmann article)
  – Critical data-management model implemented in 45,000 lines of C
  – Spin verification model of approximately 1,600 lines
  – Model-checking runs identified most subtle concurrency flaws
  – For the file system in particular, model-checking runs became routine
  – Often surprising us by identifying newly committed coding errors
• Lesson: let computers exhaustively comb for errors
  – Computers are far more patient, persistent and reliable than humans
  – Model checkers force you to divide-and-conquer to tame complexity


How error-free is Curiosity software?
• Flawless landing on Mars on 6 August 2012
• Essentially flawless execution in the ~3 years since
• Only significant error: contention for overlaid memory
  – At topmost level, groups had overlooked a potential overlay conflict
  – But NASA was able to debug and repair it remotely from Earth
• (Footnote: double-ended queue model-check example
  – Holzmann’s article illustrates model-checking power on deque source
  – Not allowed to publish Curiosity source code, so resorted to this example
  – But the article did not get the example completely correct
  – Holzmann published a lengthy erratum at www.spinroot.com/dcas
  – Problems came mainly from the difficulties of model extraction)




Toyota

Toyota sold 10.23m vehicles in 2014, #1 in the world


Toyota unintended acceleration (UA)
• Camry recalls began late 2009, totaled 10m+ autos
• First attributed to pedals trapped by floor mats, or to sticking pedals
• By late 2010, some 37 deaths had been reported
  – But all in the US, none elsewhere in the world – why? How can this be?
• Much controversy over driver error versus car error
• NASA was brought in to analyze control software
  – 10-month analysis found no fatal problems in electronics or software
  – "Our conclusion is Toyota's problems were mechanical, not electrical."
• But Barr consulting group did deeper analyses 2011-12
• From 2013 on, US juries are finding Toyota SW at fault
  – Toyota has paid/lost over $2bn so far and may pay more


NASA’s investigation of Toyota UA
• Focused on 2005 Camry engine throttle control SW
• Applied Coverity, CodeSonar, Uno to source code
• Built and checked Spin logic models for throttle control
  – Potential issue with sensor input: 10- vs 12-bit reads, but not proven
  – Potential issue with PWM: hard-coded sleeps, but could not show UA
• Built and ran MATLAB, Simulink, Stateflow simulations
  – Found no firm evidence of problems that would cause UA
• Analyzed statistical/probabilistic models of timing
  – Detected recursion, forbidden by MISRA software standards, in one case
  – But no firm evidence that timing problems could cause UA
• In sum: no “smoking gun”, but can’t rule out SW cause


Barr Group analyses of Toyota UA
• Seven embedded SW experts worked 2011-2012
  – Had access to all relevant Toyota source code, unlike NASA
• Far more damaging report than NASA’s
  – Toyota did not take many of the standard precautions that NASA does
  – Much less the additional ones NASA used in Mars code
• Barr injected one bit-flip, main task died, UA did occur
  – Was this a true “smoking gun” in a real Camry? Not clear…
  – Q: Must software be immune to any bit-flip any time?
  – A: In safety-critical, maybe – but it seems extreme – how to test?
• In sum: no proof that SW caused UA, but a likelihood
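On the "how to test?" question, single-bit fault injection is at least easy to sketch. The C fragment below is purely hypothetical and is not Toyota's or Barr's code: it assumes a scheduler that keeps a bitmask of enabled tasks, injects one bit-flip into it, and shows how a mirrored (complemented) copy lets the software detect the corruption. The type `task_table` and all function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scheduler state: bit n set means task n may run. */
typedef struct {
    uint32_t enabled;      /* primary copy */
    uint32_t enabled_inv;  /* mirror: bitwise complement of the primary */
} task_table;

void task_table_init(task_table *t, uint32_t enabled)
{
    t->enabled = enabled;
    t->enabled_inv = ~enabled;
}

/* Fault injection: flip one bit of the primary copy only,
 * as a single memory upset would. */
void inject_bit_flip(task_table *t, unsigned bit)
{
    t->enabled ^= (UINT32_C(1) << bit);
}

/* Returns 1 if the scheduler would still run the given task. */
int task_is_enabled(const task_table *t, unsigned task)
{
    return (int)((t->enabled >> task) & 1u);
}

/* Returns 1 if primary and mirror still agree, 0 if one is corrupted. */
int task_table_is_consistent(const task_table *t)
{
    return t->enabled == (uint32_t)~t->enabled_inv;
}
```

Starting from `task_table_init(&t, 0x7u)`, a call to `inject_bit_flip(&t, 1)` silently disables task 1, while `task_table_is_consistent(&t)` drops from 1 to 0. That gives a watchdog or fail-safe something to act on. Mirroring only detects the fault; what the system should do next is a separate design decision.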


Lesson from Toyota UA
• Toyota’s overall auto safety record is excellent
  – Though Toyota has never committed to following the MISRA software standard
• Toyota has been assaulted on UA for not being rigorous
• In May 2013 Toyota began adopting model-checking
  – Bought the Altran AdaCore SPARK toolset, based on FDR model checker
  – Same logic-checking engine used by Verum’s ASD and Dezyne toolsets
• In October 2013 Toyota settled in face of Barr report
• In February 2015 Toyota lost another acceleration trial
  – 1996 Camry – not related to the 2005 Camry UA, but in same spirit
• Lesson: SW found liable even if not proved at fault
  – Based on not applying available mature tools and industry standards


Conclusions


Concurrency requires model-checking
• Concurrency demands a new level of SW engineering
  – Computing today is multi-computer, multi-process, multi-thread
  – Many or most significant bodies of code use event-driven concurrency
• NASA and others have shown the way: model-checking
  – There is no other cost-effective way to achieve provably-correct concurrent logic
  – Amazon uses TLA+ model-checking for cloud data security and integrity
• But model-checking has existed, largely unused, for more than 30 years!
  – Why hasn’t it spread? Because the tools have not been fully ready.
  – Too hard to use, too long to ramp up, not supported commercially
• The Eindhoven area leads the world in model-checking
  – You have demonstrated how it lowers both costs and time-to-market
  – Disruptive, game-changing new tools are emerging – use them!


Caution: software liability
• Toyota is a world-leading corporation
  – Toyota Way has profoundly influenced reliability engineering worldwide
  – Statistical process control, the 5 Whys, and so on
• Toyota engineering is well known for care and diligence
• But Toyota has historically gone its own way on SW
  – Did not sign up for MISRA automotive software standards
  – Did not follow standard best practices as NASA and others do
• Camry unintended acceleration is still a mystery
  – How could it cause dozens of deaths in the US but none elsewhere?
  – IMO NASA and Barr did not show causation beyond a reasonable doubt
• Software can be held liable even if not fully proven at fault
  – So: when model-checking is standard practice, you’ll need to be using it


Closing thoughts
Model-checking is the bedrock of concurrency design.
It will become standard practice in development.
Now is the time to start benefiting from it.


Backup


Some orgs that use model-checking – 1 of 2
• European Space Agency: Esterel, Lustre
  – http://www.esa.int/TEC/Software_engineering_and_standardisation/TECLCAUXBQE_0.html
• NASA: Spin for space missions, e.g. Mars rovers
  – http://spinroot.com/spin/whatispin.html
• Amazon: TLA+ for scalable secure transactions
  – http://research.microsoft.com/en-us/um/people/lamport/tla/tla.html
  – Also, Spin for model-checking distributed transaction protocols
• [Medical device builder]: QuantumLeaps
  – www.state-machines.com
• Airbus: ASTRÉE for aircraft control systems
  – http://www.astree.ens.fr/


Some orgs that use model-checking – 2 of 2
• ASML Eindhoven NL: Verum for fab wafer handling etc
  – http://www.verum.com/
• FEI Eindhoven NL: Verum for microscope control
  – http://www.verum.com/
• Philips Healthcare NL: Verum for medical imaging
  – http://www.verum.com/
• Praxis/Altran: Z and SPARK for air traffic control in UK
  – http://www.adacore.com/sparkpro/
• FAA now mandates Z language specifications, DO-333
  – ISO/IEC 13568:2002; Z is used by the Event-B and Alloy model checkers
• … and many, many others
