Technologies for Improving the Dependability of Software-Intensive Systems: A Review of NASA Experience and Needs

George E. Stark • The MITRE Corporation • Houston

Key Words: Best current practice, Complexity, Fault tolerance, Formal methods, Software metrics, Software reliability

SUMMARY & CONCLUSIONS

Software is central to the mission of many NASA systems. It is also a major source of failure. Fortunately, techniques such as best current practice, complexity measurement, fault tolerance, formal methods, and software reliability exist to mitigate the risk of system failure because of software. Unfortunately, these techniques have become disciplines of their own with very little integration, resulting in a lack of support from project managers. NASA SR&QA should develop a strategy to integrate the techniques into the agency reliability plan.

THE NEED FOR MORE DEPENDABLE SOFTWARE

This paper is about the dependability of systems that rely on software. Dependability is a collective term subsuming the notions of reliability, safety, availability, and security [30]. Specifically, I discuss five technologies that can improve the reliability and safety of software-intensive NASA systems, and reduce the amount of “engineering judgment” used in the decision to field complex space and ground systems.

The five technologies are best current practice, software complexity measurement, fault tolerance, formal methods, and software reliability. Each provides evidence toward improved system dependability. They involve measurement on a particular implementation of the program under study. Individually, there is no quantitative evidence that any one technology provides a significant improvement in system reliability. Taken together, however, they may provide the needed level of confidence. A NASA goal should be to integrate these technologies as a normal part of its engineering efforts.
To achieve this goal, the Safety, Reliability, and Quality Assurance (SR&QA) organization should define a roadmap of short- and long-term needs and invest in projects to meet these needs. This paper reviews NASA experience with the technologies and offers a starting point for the roadmap.

Over the years, NASA spacecraft and ground systems have become increasingly dependent on software to meet mission objectives. Figure 1 shows the software size for several NASA ground and space applications over time. Although onboard software for unmanned applications remains comparatively small, it has grown from around 500 source lines of code (SLOC) in Mariner 9 to an estimated 35,000 SLOC for the Cassini spacecraft, expected to launch in 1997. The onboard, manned software has grown from 16,500 SLOC for the Apollo Saturn V to more than 500,000 SLOC for the shuttle, and a projected 900,000 SLOC for the space station data management system. The ground systems used to train the astronauts and to monitor and control the spacecraft average more than one million SLOC. The shuttle mission control center (MCC) has doubled in size over the last ten years to more than 1.5 million executable SLOC. For each type of system the amount of software has become the dominant aspect of system complexity.

RF

94RM-162

RF

[Plot: Software Size of Major NASA Manned and Unmanned Space and Ground Systems. Series: Ground Systems, Crew Onboard Systems, Satellite Onboard Systems. Horizontal axis: year, 1965 to 2000. Vertical axis: logarithmic, 0.1 to 10000.]

Figure 1. NASA project size over time

This increase in SLOC is not surprising. The presence of astronauts onboard a spacecraft forces the use of environmental controls and other sensors not required for smaller unmanned systems. Furthermore, because processing on the ground is significantly less expensive than computing on orbit, NASA tries to do as much processing on the ground as possible. With more complicated missions, such as Space Station and Cassini, more functionality will be implemented in software. This increase adds greater risk of software faults leading to mission failure.

To understand this risk, we must get a feel for the number of faults* in the code and the associated failure rate of the software component. Numerous fault density results are available [8,19,54,55]. Musa calculated the average fault density from 131 projects to be 1.48 faults/KSLOC at system release. Based on experience in testing products, Miller defines the quality levels in Table 1 for software products upon delivery [37].
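To give a rough feel for the numbers, the fault count implied by a fault density can be sketched as density times size. The Python sketch below is illustrative only: it assumes that Musa's cross-project average of 1.48 faults/KSLOC applies to NASA systems, and it uses the sizes quoted earlier in this paper, several of which are lower bounds ("more than") or projections. The names `expected_faults` and `SYSTEM_SIZES_SLOC` are hypothetical and chosen for this example.

```python
# Illustrative estimate only: expected faults = fault density x code size.
# Assumption: Musa's average of 1.48 faults/KSLOC at release applies here.
FAULTS_PER_KSLOC = 1.48

# Sizes quoted earlier in this paper, in SLOC (lower bounds or projections).
SYSTEM_SIZES_SLOC = {
    "Shuttle onboard software": 500_000,
    "Space station data management system (projected)": 900_000,
    "Shuttle mission control center": 1_500_000,
}


def expected_faults(sloc, faults_per_ksloc=FAULTS_PER_KSLOC):
    """Return the expected number of faults for a program of `sloc` lines."""
    return faults_per_ksloc * sloc / 1000.0


# Expected fault count for each system under the assumption above.
ESTIMATES = {
    name: expected_faults(size) for name, size in SYSTEM_SIZES_SLOC.items()
}

if __name__ == "__main__":
    for name, faults in ESTIMATES.items():
        print(f"{name}: about {faults:.0f} faults at release")
```

Under these assumptions the estimates range from several hundred to a few thousand faults at release, which is why fault density alone is not a sufficient basis for a fielding decision; the failure rate those faults produce in operation also matters.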

*In this paper I use the definitions from the AIAA recommended practice for software reliability: an error is a human action that results in a fault, a fault is a defect in the software (i.e., a bug), and a failure is the manifestation of a fault during execution that results in the system not performing its intended function.

Table 1. Quality Levels defined by Miller

Quality Level    Defects/KSLOC
Normal
