
Software Risk - Why must we keep learning from experience


Software Risk: Why must we keep learning from experience?
Don Shafer
Chief Technology Officer
Athens Group, Inc.
4201 Southwest Freeway
Houston, Tx 77027
www.athensgroup.com


Presentation Outline

• “Ready” means “fit for purpose”
• Why do we care about software?
  – Cases of software failure.
• Where do we stand with O&G E&P software?
• Goals of Software Engineering
• Is there hope?
• Software Engineering Payback
• Questions

DP Conference Houston

September 28-30, 2004

Why Software Failure

1. FAT, FMEA, FMECA of Software
2. In Production Software Maintenance
3. No Process
4. No Systems view

“Computer science also differs from physics in that it is not actually a science. It does not study natural objects. Neither is it, as you might think, mathematics; although it does use mathematical reasoning pretty extensively. Rather, computer science is like engineering - it is all about getting something to do something, rather than just dealing with abstractions as in pre-Smith geology.”
Richard Feynman, Feynman Lectures on Computation, 1996, pg xiii


This is “ready” software?

1) DP computer crashed. Determined to be calendar based program faults so it crashes once per month. A fix has been prepared but until we have an opportunity to install it we are just automatically rebooting periodically before they have a chance to crash. That keeps it on OUR schedule rather than as a worst possible moment experience.

2) DPO reports poor positioning during heading maneuver. Loss of position during 180 degree change was 25 meters. Frequent DGPS alarms. No bias, improper gain settings, frequent change of heading set point during turn, incorrect ocean current estimate.


There is a Process?

Drive-off is caused by a fault in the thrusters and/or their control system and a fault in the position reference system. As for the drift-off incidents, the failure frequencies are derived from the DPVOA database, which consists of mainly DP Class 2 vessels. For the drive-off incident, these figures are representative also for the DP Class 3 vessels. The major difference between DP Class 2 and 3 is that only the latter is designed to withstand fire and flooding in any one compartment. These two incident types do not affect the drive-off frequency significantly due to the fact that drive-off is caused by logical failures in the thrusters, their control system and/or the position reference system. Hence, the DP Class 2 figures are used directly also for the DP Class 3 vessel. The frequency of fault in the thrusters and/or their control system can be obtained from data in Ref. /A-4/. The control system is here meant to include all relevant software and hardware. The statistics show that the software failure is about four times as frequent as the hardware failure and slightly more frequent than the pure thruster failure.

– KONGSBERG OFFSHORE A.S. • DEMO 2000 - LIGHT WELL INTERVENTION • HSE EVALUATION • REPORT NO. 00-4002, REVISION NO. 00


IMCA Software Incidents as a Percentage of the Total


Class 1: “Serious loss of position with serious actual or potential consequences.”


Where is the Systems Engineering?


Where do we stand with O&G E&P Software? (1)

• Many vendors, multiple networks, no systems/software engineering
• Incomplete, but incorrect, requirement sets
• No enforced standards ensuring that all purchased components, including embedded Programmable Logic Controllers, can communicate through existing, defined interfaces
• Network topologies and protocols are incompatible
• Single points of failure in the software and network subsystems are NOT identified
• Early estimation of the network data throughputs and maximum loading scenarios is either lacking or inadequate
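One way to make the single-point-of-failure item concrete is to treat the control network as a graph and ask which node, if lost, cuts the remaining nodes off from each other. The sketch below is illustrative only: the node names and links are hypothetical and not taken from any vessel, and a node-removal check like this covers loss of connectivity only, not software faults inside a node.

```python
from collections import deque


def is_connected(adjacency, removed):
    """Return True if every node except `removed` can still reach the others."""
    nodes = [node for node in adjacency if node != removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour != removed and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(nodes)


def single_points_of_failure(links):
    """List nodes whose loss splits the rest of the network (brute force)."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return sorted(n for n in adjacency if not is_connected(adjacency, n))


# Hypothetical topology: two DP controllers and two devices on one switch.
LINKS = [
    ("dp_controller_a", "switch_1"),
    ("dp_controller_b", "switch_1"),
    ("dp_controller_a", "dp_controller_b"),
    ("switch_1", "thruster_plc"),
    ("switch_1", "dgps_receiver"),
]

# Same devices with a second, redundant switch wired to each of them.
REDUNDANT_LINKS = LINKS + [
    ("dp_controller_a", "switch_2"),
    ("dp_controller_b", "switch_2"),
    ("switch_2", "thruster_plc"),
    ("switch_2", "dgps_receiver"),
]

single_switch_result = single_points_of_failure(LINKS)
dual_switch_result = single_points_of_failure(REDUNDANT_LINKS)
print(single_switch_result)
print(dual_switch_result)
```

In this hypothetical layout the single switch is the only node reported, and adding the second switch clears the list. A real review would also need to cover shared power, shared software images and common-mode faults, which a connectivity check cannot see.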


Where do we stand with O&G E&P Software? (2)

• No early identification of overly complex control system designs that lead to impossible to test operational domains
• No early estimates of the software development timeline and impacts on commissioning
• No system models or standard equipment definitions against which to build a library of FAT, commissioning and regression test scenarios
• No system or equipment models against which to rationally conduct software and network FMEAs
• No baseline for comprehensive make versus buy decisions and the next system’s contract
• No life cycle model for system upgrade or work-over


Is there hope?

• Semiconductor Fabrication Facility Model

Attribute | Offshore Drilling Rig | Semiconductor Fab
Software Fails – People Die | Yes | Yes
Drills Wells | Yes - 10 * 10^3 meters | Yes - 2 * 10^-7 meters
Production Environment | Flying mud, 12 hour shifts, steel-toed boots | Clean Room, Positive Pressure, bunny suits
Cost of Software Failure | $100,000s/day | $10,000,000s/day
Automation Software | Plug and Prey | Plug and Play
Software Quality | No effort to measure | 3.4 errors/million
Equipment Standards | No, not POSC, WITS nor WITSML | SEMI Standards begun in early 1970s
Software Process Model | NONE | SEI/CMMI
Years to Improve Quality | UNKNOWN | 15


Yes, It is a Cyclical Business!

[Chart: B2B (axis 0 to 1.6) and $/Barrel (axis 0 to 45), plotted quarterly from Sep-99 to Jun-04]

Systems Engineering Life Cycle Model

[Diagram. Recoverable labels:]
Phases: Concept Phase; Requirements Definition Phase; Development Phase; Integration Phase; Operations and Maintenance Phase
Reviews: Review 0; Review 1 FMEA; Review 2 FAT; Review 3
Baseline Waterfall Model Life Cycle: Concept Definition Activity; Product and System Requirements; System Architecture Review; Out of bounds Review; Integrated System Testing; Production and Deployment; Operations and Maintenance
Hardware Path: Hardware Requirements; Preliminary Design; Detailed Design; Module Development and Unit test; Integration and Test
Software Path: Software Requirements; Preliminary Design; Detailed Design; Coding and Unit Test; Integration and Testing
COTS Acquisition Model Life Cycle: Requirements Analysis; Architecture Definition; COTS Selection; System Integration and Test
Legend: Areas likely needing software engineering improvements and risk mitigation


Payback in Software Engineering

[Chart. Recoverable labels:]
Vertical axes: Dollar Value of the Project; Amount of Risk
Curves: Without Simulation; Without Software Engineering; With Software Engineering
Life cycle stages: Concept Exploration; System Exploration; Requirements; Design; Implement; Installation; Operations & Maintenance Support; Retirement
Bands: Highest Project Planning Impacts; Highest Project Execution Impacts; Highest Production Impacts


Cost of Software Quality

IBM Cost Data

Boehm Cost Data


Cost of Platform Down Time

• Average Daily Rate for 5th Generation Drilling Rigs:
  – $154,022* per day, ~$77,000 per 12 hour shift
  – In startup, could be down 10 shifts per week at a cost of $770,000 per week
• Production platforms pump 20,000 bbl per day
  – Cost of Nymex Crude $41.25** per barrel
  – Production platform down for 21 days cost $17,325,000
• Cost of doing a front end software and network Verification and Validation:
  – < $100,000

* Updated at 1330hrs, 16 July 2004, www.rigzone.com
** Updated at 1330hrs, 16 July 2004, www.bloomberg.com
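The arithmetic behind these figures can be reproduced in a few lines. The sketch below uses only the numbers quoted on the slide; the function and variable names are illustrative, and the $100,000 figure is treated as the upper bound the slide gives for front end verification and validation.

```python
# Figures quoted on the slide (rates as of 16 July 2004).
RIG_DAY_RATE_USD = 154_022        # 5th generation drilling rig, per day
SHIFTS_PER_DAY = 2                # two 12 hour shifts
ROUNDED_SHIFT_RATE_USD = 77_000   # the slide's rounded per-shift figure
PLATFORM_BBL_PER_DAY = 20_000     # production platform output
CRUDE_USD_PER_BBL = 41.25         # Nymex crude
VV_COST_USD = 100_000             # upper bound quoted for front end V&V


def rig_downtime_cost(shifts_down, shift_rate_usd):
    """Cost of lost rig time for a number of 12 hour shifts."""
    return shifts_down * shift_rate_usd


def platform_downtime_cost(days_down, bbl_per_day, usd_per_bbl):
    """Value of production not pumped while the platform is down."""
    return days_down * bbl_per_day * usd_per_bbl


exact_shift_rate = RIG_DAY_RATE_USD / SHIFTS_PER_DAY
startup_week_cost = rig_downtime_cost(10, ROUNDED_SHIFT_RATE_USD)
platform_21_day_cost = platform_downtime_cost(
    21, PLATFORM_BBL_PER_DAY, CRUDE_USD_PER_BBL
)
downtime_to_vv_ratio = platform_21_day_cost / VV_COST_USD

print(f"Per shift (exact): ${exact_shift_rate:,.0f}")
print(f"10 shifts down in startup: ${startup_week_cost:,.0f}")
print(f"Platform down 21 days: ${platform_21_day_cost:,.0f}")
print(f"Downtime cost / V&V cost: {downtime_to_vv_ratio:.2f}x")
```

On these numbers, one 21 day production outage costs about 173 times the quoted ceiling for a front end verification and validation effort.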


Measure and Simulate the Control Software


Collision Timing Results - 1st Pass


Collision Timing Results - 2nd Pass


Software Engineering Goals Checklist

• A complete and correct set of requirements has been collected
• All purchased components that include embedded Programmable Logic Controllers (PLCs) are required to communicate with each other through existing, defined interfaces
• Network topologies and protocols are compatible
• Early identification of single points of failure in the software and network subsystems
• Early estimation of the network data throughputs and maximum loading scenarios
• Early identification of overly complex control system designs that lead to impossible to test operational domains
• Early estimates of the software development timeline and impacts on commissioning
• Development of a model system against which to build a library of FAT, commissioning and regression test scenarios
• Providing a model against which to rationally conduct software and network FMEAs
• Providing a baseline for comprehensive make versus buy decisions and the next system’s contract
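As a rough illustration of the throughput item, an early estimate can be as simple as summing message size times message rate per source and comparing the total with the link capacity, once for nominal traffic and once for a maximum loading scenario. Everything in the sketch below is an assumption made for illustration: the source names, message sizes, rates and the 1 Mbit/s link are hypothetical, and protocol overhead is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrafficSource:
    name: str
    message_bytes: int      # payload size per message
    nominal_hz: float       # messages per second in normal operation
    worst_case_hz: float    # messages per second under maximum loading


def load_bits_per_second(sources, worst_case=False):
    """Sum the offered load of all sources in bits per second."""
    total = 0.0
    for source in sources:
        rate = source.worst_case_hz if worst_case else source.nominal_hz
        total += source.message_bytes * 8 * rate
    return total


def utilisation(load_bps, link_bps):
    """Fraction of link capacity consumed by the offered load."""
    return load_bps / link_bps


# Hypothetical sources on a hypothetical 1 Mbit/s control network segment.
SOURCES = [
    TrafficSource("thruster_plc", 200, nominal_hz=10, worst_case_hz=50),
    TrafficSource("dgps_receiver", 100, nominal_hz=1, worst_case_hz=5),
    TrafficSource("alarm_server", 500, nominal_hz=2, worst_case_hz=200),
]
LINK_BPS = 1_000_000

nominal_load = load_bits_per_second(SOURCES)
worst_load = load_bits_per_second(SOURCES, worst_case=True)

print(f"Nominal: {nominal_load:,.0f} bit/s "
      f"({utilisation(nominal_load, LINK_BPS):.1%} of link)")
print(f"Worst case: {worst_load:,.0f} bit/s "
      f"({utilisation(worst_load, LINK_BPS):.1%} of link)")
```

With these made-up numbers the segment looks almost idle in normal operation and close to saturation during an alarm flood, which is the kind of gap an early estimate is meant to expose before commissioning.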


Don Shafer
Chief Technology Officer
Athens Group, Inc.
4201 Southwest Freeway, Suite 220
Houston, Tx 77027
[email protected]
www.athensgroup.com
713.960.5094 x117


Abstract

Is there software risk in the dynamic positioning regime? Kongsberg Offshore A.S. states in HSE Evaluation Report No. 00-4002: “The statistics show that the software failure is about four times as frequent as the hardware failure and slightly more frequent than the pure thruster failure.” Based on IMCA data, software caused 33% of “Loss of Position Class 1” DP problems over a recent 5 year period.

Can we mitigate the risk? FMEA, FMECA and good software engineering practices will go a long way toward reducing today’s DP software risks. The problem is not rocket science but a lack of good engineering practices. The Airbus 300 series and the latest Boeing 7x7 aircraft are completely fly by wire. The airbags in your automobile have autonomous processors with embedded software. Embedded medical devices contain processors run by software. We would never tolerate in these devices the number of software failures that occur on DP systems. Why don’t they fail at a 33% rate?

This presentation will propose a set of software and hardware life cycle processes along with a mitigation model for identifying and eliminating software risk within DP systems. Several recent incidents will be analyzed to show how these processes would have mitigated the potential for failure. Attendees will be able to take away processes they can implement in their own organizations to reduce software failures.


Presenter’s Bio

Don Shafer is a co-founder, corporate director and Chief Technology Officer of Athens Group, Inc. Incorporated in June 1998, Athens Group is an employee-owned consulting firm, integrating technology strategy and software solutions. Since 2001, he has led Athens engineers in the analysis, verification, validation and simulation of drilling, safety and positioning systems for deep water oil platforms. Prior to that he worked on the development of seismic databases, down hole logging equipment design, SCADA system design and production control. His latest patents are on joint work done with Agilent Technologies in state-based machine control. He earned a BS degree from the USAF Academy and an MBA from the University of Denver. Shafer’s work experience includes positions held at Boeing and Los Alamos National Laboratory. He is currently an adjunct professor in graduate software engineering at Texas State University, a Senior Member of the IEEE and the Editor in Chief of the IEEE Computer Society Press. His latest 2004 presentations on oil industry software engineering are for the Society of Petroleum Engineers Leading-Edge Drilling Rigs Advanced Technology Workshop and the Marine Technology Society Dynamic Positioning Conference. In 2002, with two colleagues, he wrote Quality Software Project Management for Prentice-Hall, now used in both industry and academia. He is a multiple book author for the IEEE Software Engineering Series.
