IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. SE-12, NO.9, SEPTEMBER 1986

Foreword: Reliability and Safety in Real-Time Systems

Building safe and reliable software is a major problem when using computers in safety-critical environments where failures could result in loss of life and tremendous economic costs. These systems are complex and must operate in real time. The single most important characteristic of any real-time system is that its actions, or inactions, cannot be overlooked or ignored; the system can never be returned to a prior state. The effect that a decision of the computer system has on the controlled system and, in turn, the controlled system on the real world may be desirable, inconvenient, or disastrous: it is never inconsequential. This requires that each decision be correct and, because of this, extra care and effort must be devoted to producing such systems in order to enhance their reliability and safety.

Safety and reliability are often equated, especially with respect to software, but there is a growing trend to separate the two concepts. Reliability is usually defined as the probability that a system will perform its intended function for a specified period of time under a set of specified environmental conditions. Safety is the probability that conditions which can lead to an accident (hazards) do not occur, whether the intended function is performed or not. Another way of saying this is that software safety involves ensuring that the software will execute within a system context without resulting in unacceptable risk. In general, reliability requirements are concerned with making a system failure-free whereas safety requirements are concerned with making it accident-free. These are not usually synonymous. There are many failures of differing consequences that are possible in any complex system. The consequences may range from minor annoyance up to death or injury. Reliability is concerned with every possible software fault whereas safety is only concerned with those which result in actual system hazards.
That is, hazard is not equivalent to failure; hazards involve the risk of loss or harm whereas failures may not. Not all software faults cause safety problems and not all software which functions according to specification is safe. Severe accidents have occurred while something was operating exactly as intended, i.e., without failure.

The need to separate safety and reliability results from the fact that different techniques may be used to enhance the two qualities and, even more important, these two qualities are often conflicting. Increasing safety may, in some circumstances, actually decrease reliability.

Real-time software is a special class of software with some unique characteristics. The most important is that often the inputs are not fixed in time or foreseeable; the timing is determined by the real world and not by the programmer. This implies that we cannot anticipate a priori all possible conditions with which the software may have to deal. Another important characteristic is that normally such systems are distributed among different processors, each controlling different real-world processes which potentially interact. Synchronization of different processors and/or different software processes on the same processor is a difficult problem; conceptual tools must be developed that make the modeling of real-time process intercommunication easy to analyze and to implement. Real-time systems normally also have rigorous time constraints that introduce one more dimension in the problem of verification of these systems; to verify the correctness of the functional aspects is not sufficient.

The above characteristics require the use of specially tailored techniques and tools for the design and verification of real-time software. And when very high standards for safety and reliability are required, extreme care must be devoted to the design and verification of the software throughout its entire life-cycle.

The concept of fault-tolerance is important for safe and reliable systems design. Hardware fault-tolerance is usually implemented through functional and hardware redundancy. To detect hardware failures, identical programs are executed on redundant hardware with majority voting on the results or on a single hardware system with self-test and standby switching to spares. The latter may conflict with the real-time requirements and not be practical for some systems. An attempt has been made to apply these hardware techniques to software. However, the potential benefits of using multiple versions of the software will be decreased by common mode failures of the software. There is recent experimental evidence to indicate that independent development of software does not guarantee the absence of correlated failures of the different program versions.
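The majority-voting idea described above can be illustrated with a short Python sketch. The function name `majority_vote` and the sample values are hypothetical and chosen only for illustration; they are not taken from any paper in this issue. The last case shows why correlated (common mode) failures reduce the benefit of multiple versions: two versions that share a fault outvote the one correct result.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(results: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by a strict majority of channels, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    # A strict majority requires more than half of all channels to agree.
    return value if count > len(results) / 2 else None


# One faulty channel out of three is masked by the two that agree.
masked = majority_vote([42, 42, 17])

# No majority: the voter can only signal disagreement.
undecided = majority_vote([1, 2, 3])

# Common mode failure: two versions share the same fault, so the wrong
# value wins the vote even though one channel is correct.
common_mode = majority_vote([17, 17, 42])
```

The voter cannot tell the first case from the third; it only sees agreement, which is why independence of failures matters more than the number of versions.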
Testing real-time software presents special problems. It is impossible to predict all the possible input situations or all the possible sequences of input events when external interrupts are a feature of the system. Furthermore, even if such tests were possible, the interpretation of the output results may be difficult. For these reasons, analytic techniques are essential to verify the correctness of real-time software. A number of models exist for modeling the software. Petri nets seem to be receiving much attention and several papers included in this issue involve modeling of real-time systems with Petri nets.

Programming languages for real-time software also deserve special attention. Using real-time languages which include special features for implementing the timing-critical aspects of the application along with features to improve safety and reliability can enhance the quality of the resulting code.

The papers which follow cover many different aspects of the problem of developing safe and reliable real-time systems including design, specification and verification, programming languages, and assessment. We hope that this special issue will enhance awareness of some of the techniques which are available or being developed to deal with these problems and also will encourage research on these important topics.

SANDRO BOLOGNA
NANCY G. LEVESON

Guest Editors

Sandro Bologna received the degree in physics from the University of Rome, Rome, Italy, in 1972. Since his graduation, he has been working for ENEA (Italian Agency for Nuclear and Alternative Energy) in the area of process control, with a special emphasis on the area of computer applications for nuclear power plant operation and control. During this time he has visited the Risø National Laboratory in Denmark, Gesellschaft für Reaktorsicherheit in West Germany, and Westinghouse in the U.S.A. His research interests include software engineering, formal specifications, verification techniques, and knowledge-based support systems design. Dr. Bologna is an active member of the European Workshop on Industrial Computer Systems (EWICS) Technical Committee on Reliability and Safety and a member of the IEEE Computer Society and the Association for Computing Machinery.

Nancy G. Leveson received the B.A. degree in mathematics, the M.S. degree in management, and the Ph.D. degree in computer science from the University of California, Los Angeles. She has worked for IBM and is currently an Associate Professor of Computer Science at the University of California, Irvine. Her current interests are in software reliability, safety, and fault tolerance. She heads the Software Safety Project at UCI which is exploring a range of software engineering topics involved in specifying, designing, verifying, and assessing reliable and safe real-time software. Dr. Leveson is a member of the Association for Computing Machinery, the IEEE Computer Society, and the System Safety Society.

List of Referees

We would like to express our thanks to the following referees who helped make this issue possible:

Arvind, T. Anderson, B. Arbab, H. Aschmann, E. Ashcroft, J. Bartlett, D. Berry, L. Bic, P. Bishop, E. Boebert, R. Campbell, L. Clarke, W. Ehrenberger, C. Ghezzi, J. Goldberg, D. Guaspari, L. Harris, P. Harter, A. Iannino, C. Jones, T. Jones, R. Kemmerer, K. Kim, E. Kligerman, J. Knight, G. Koch, C. Krishna, C. Landwehr, D. Lane, R. Lauber, G. Levi, B. Littlewood, M. Maiocchi, D. Marinescu, D. Mandrioli, J. McHugh, A. Moitra, M. Molloy, D. Morgan, P. Neumann, K. Okumoto, C. Pidgeon, D. Potier, W. Quirk, M. Rata, R. Razouk, D. Rombach, M. Rose, B. Runge, J. Rushby, T. Shimeall, K. Shin, S. Shrivastava, N. Singpurwalla, B. Smith, L. Simoncini, J. Stolzy, L. Strigini, R. Taylor, M. Vernon, G. Weber, E. Weyuker, N. Wirth