Research in Software Reliability Engineering

Carol Smidts, Ph.D., University of Maryland

Key Words: Software Reliability, Software Testing, Probabilistic Risk Assessment

INTRODUCTION

Research and education define a faculty member's life, and an exposition of one's research and education is a way to "Meet the Faculty". This paper therefore discusses my research in software reliability engineering and related areas at the University of Maryland Center for Risk and Reliability Engineering.

The Center for Risk and Reliability Engineering is hosted in the James Clark School of Engineering at the University of Maryland. The Center is the research arm of the Reliability Engineering Program, an accredited program that delivers Masters and doctoral degrees in Reliability Engineering. The Reliability Engineering Program is currently hosted in the Department of Mechanical Engineering. Seven faculty and one hundred and twenty full-time and part-time graduate students constitute the core of the program and of the Center. Research and education are centered on several concentration areas: General Reliability, Software Reliability, Micro-electronics Reliability, and Probabilistic Risk Assessment. In addition to the core faculty, multiple affiliate faculty borrowed from different schools (Computer Science, Business) and departments (Electrical Engineering, Civil Engineering) participate in research and teaching activities.

My area of responsibility is Software Reliability Engineering. As such, I have been the main developer of the Software Reliability Engineering Curriculum, a series of four graduate-level courses allowing graduate students to achieve a certificate in Software Reliability Engineering, or a Masters or PhD degree in Reliability Engineering with a concentration in Software Reliability Engineering. The four courses taught are Software Quality Assurance, Software Reliability, Software Safety, and Information Security. The Curriculum was instituted in 1996 with joint funding from NSA and is an active educational area with two core faculty, myself and a recent addition, Dr. M. Cukier. At this point, the Curriculum has graduated 27 MS and PhD students who have taken positions in research laboratories such as Motorola Labs and IBM Watson, and in companies such as SUN and Booz Allen Hamilton. In parallel to the development of the curriculum, multiple research activities have flourished; these are described below.
1. SOFTWARE RELIABILITY AND EARLY PREDICTION
Models for predicting software reliability in the early phases of development are of paramount importance since they provide early identification of cost overruns, software development process issues, and optimal development strategies. A few models geared towards early reliability prediction, applicable to well-defined domains, were developed during the 1990s. However, many questions related to early prediction are still open, and more research in this area is needed, particularly on a generic approach to early reliability prediction. Our research has focused on the development of an approach for predicting software reliability based on a systematic identification of software process failure modes and their likelihoods [1]. A direct consequence of the approach and its supporting data collection efforts is the identification of weak areas in the software development process. A Bayesian framework for the quantification of software process failure mode probabilities is useful here since it allows the use of historical data that are only partially relevant to the software at hand. The approach has been applied in the context of a waterfall life-cycle and for failure modes related to the requirements phase (see Figure 1 and Figure 2). Within these constrained limits, the approach seems promising. The key characteristics of the approach should apply to other software development life-cycles and phases. However, it is unclear how difficult the implementation of the approach would be, and how accurate the predictions would be. Further research is thus necessary in several areas: 1) the investigation of alternative software life-cycles, 2) the development of software process failure modes for life-cycle phases other than requirements, 3) the development of tools to support the approach, and 4) validation on multiple projects.

The last point broaches an important issue for software reliability engineering research: the struggle for data. Much of the publicly available data is limited and usually dated. Since software development techniques evolve rapidly, the usefulness of data obtained on older systems rapidly diminishes with time. More recent data is usually difficult to obtain because it constitutes a competitive advantage or is protected by export control laws. The pedigree of the data is often unavailable, making it altogether useless. A concerted effort needs to be initiated to allow the systematic development of software failure database repositories.
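As an illustration of the Bayesian quantification idea discussed above, the sketch below updates a failure mode probability with partially relevant historical data. The Beta-Binomial form and the relevance-based discounting weight are illustrative assumptions, not the exact model of [1].

```python
# Illustrative Beta-Binomial update of a software process failure mode
# probability. Discounting partially relevant historical data via a
# relevance weight is an assumption for illustration, not the model of [1].

def posterior_failure_mode_probability(
    prior_a: float, prior_b: float,        # Beta(a, b) prior on the probability
    hist_failures: int, hist_trials: int,  # historical (partially relevant) evidence
    relevance: float,                      # 0.0 = irrelevant history, 1.0 = fully relevant
    new_failures: int, new_trials: int,    # evidence from the project at hand
) -> float:
    """Posterior mean of the failure mode probability."""
    # Discount historical evidence by its relevance before pooling it
    # with project-specific evidence.
    a = prior_a + relevance * hist_failures + new_failures
    b = prior_b + relevance * (hist_trials - hist_failures) + (new_trials - new_failures)
    return a / (a + b)

# Example: a vague prior, 4 failures in 120 historical requirements reviews
# judged 50% relevant, and 1 failure in 30 reviews on the current project.
p = posterior_failure_mode_probability(1.0, 1.0, 4, 120, 0.5, 1, 30)
print(f"Estimated failure mode probability: {p:.4f}")
```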
2. SOFTWARE RELIABILITY AND ARCHITECTURES
Most software reliability models in use today treat software as a monolithic block; an aversion towards "atomic" models seems to exist. These models appear to add complexity to the modeling and to the data collection, and seem intrinsically difficult to generalize. However, architectural models can yield multiple benefits, such as the determination of weak areas of the architecture and rapid evaluations of the reliability of architectures built from reused components. The architecture's atomic elements are the lowest level functions specified in the requirements. Different approaches to architectural modeling may be used depending on the atomic element considered, i.e. module, object or function. The architecture may either replicate the actual architecture of the code or be an abstraction of the requirements. Our research has targeted the development of an architecture based on the software requirements specification, denoted the "functional architecture" [2] (see Figure 3 and Table 1). Since the architecture is hierarchical, higher level functions can be expressed easily as a function of the atomic elements. In addition, the architecture allows explicit modeling of non-functional requirements through the concept of attributes. Failure modes are defined for lower level functions, higher level functions, and non-functional requirements.

Figure 1. Software Requirements Failure Modes (Extract from [1])

Figure 2. Requirements Failure Rate as a Function of Life-Cycle Effort (Extract from [1], Texas Instruments data)

Figure 3. Functional Architecture (Extract from [2])

Table 1. Functions in the Architecture (Extract from [2])
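As a sketch of how a hierarchical functional architecture supports reliability roll-up, the fragment below composes higher-level function reliability from atomic-function failure probabilities, assuming (for illustration only) independent failures and series logic; the actual model of [2] is richer and includes attributes and explicit failure modes.

```python
# Illustrative hierarchical functional architecture. The class layout and the
# series/independence assumptions are for illustration only; they are not the
# exact formulation of [2].
from dataclasses import dataclass, field

@dataclass
class Function:
    name: str
    p_failure: float = 0.0               # failure probability of an atomic function
    children: list["Function"] = field(default_factory=list)  # sub-functions

    def reliability(self) -> float:
        """Reliability of the function, rolled up from its atomic elements."""
        if not self.children:             # atomic element from the requirements
            return 1.0 - self.p_failure
        r = 1.0                           # series logic: all sub-functions must succeed
        for child in self.children:
            r *= child.reliability()
        return r

# Toy architecture: a top-level function built from three atomic functions.
top = Function("process command", children=[
    Function("read input", p_failure=1e-4),
    Function("compute setpoint", p_failure=5e-4),
    Function("issue actuation", p_failure=2e-4),
])
print(f"Reliability of 'process command': {top.reliability():.6f}")
```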
Development of a functional architecture can be done early in the software development life-cycle, i.e. as soon as the requirements are available. This is not true for architectural approaches based on code. Consequently, early assessments of the reliability of the software can be obtained if one uses priors to evaluate the probability of failure of each of the elements of the architecture. This also allows early identification of weak areas in the design, the redirection of resources towards these areas, or the consideration of design alternatives. A side benefit of the approach is that it challenges the requirements and thus constitutes an indirect validation of them.

Development of the architecture may be a time-consuming and error-prone process, especially when the software to be developed is large. We have started exploring means to extract the architecture automatically from the requirements specification expressed in natural language. This research has led to the introduction of an intermediate language called InterLang, which retains many of the characteristics of the English language while leading to the automatic construction of a unique architecture. To obtain the architecture, an analyst is tasked to rewrite the natural language specification in InterLang. The InterLang specification is then parsed and the corresponding architecture is built automatically. Although InterLang seems to constitute a possible solution to the problem of architectural development, it may not be the best or even the final solution, and alternative approaches may need to be examined, such as the use of domain specific languages. Another remaining issue, pervasive to many software reliability prediction approaches, is the establishment of accurate priors, a problem still in search of a solution.
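InterLang itself is not reproduced in this paper, so the following is only a hypothetical sketch of the parsing idea: a controlled "X shall VERB OBJECT" sentence form is mapped mechanically to candidate atomic functions of an architecture. The sentence pattern and all names are invented for illustration.

```python
# Hypothetical sketch of parsing a controlled requirements language into
# architecture elements. The sentence pattern and naming are invented for
# illustration; they are not the actual InterLang grammar.
import re

PATTERN = re.compile(r"^(?P<actor>[\w ]+) shall (?P<verb>\w+) (?P<obj>[\w ]+)\.$")

def parse_controlled_spec(text: str) -> list[dict]:
    """Map each controlled-language requirement to a candidate atomic function."""
    functions = []
    for line in text.strip().splitlines():
        m = PATTERN.match(line.strip())
        if m is None:
            raise ValueError(f"Sentence not in controlled form: {line!r}")
        functions.append({
            "function": f"{m['verb']} {m['obj']}",  # atomic function name
            "component": m["actor"],                # element responsible for it
        })
    return functions

spec = """The controller shall sample the sensor.
The controller shall compute the setpoint.
The actuator shall apply the setpoint."""
for f in parse_controlled_spec(spec):
    print(f)
```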
3. SOFTWARE RELIABILITY AND MODEL-BASED TESTING
Testing and reliability are closely related disciplines. Testing is the one activity which allows assessment of the reliability of a software product before it is released. It also allows identification of bugs and hence has a direct impact on reliability improvement. For most companies, it still remains the only approach available to "build reliability into the software". Consequently, our reliability research has led to related research in testing, and more specifically in the automation of black box testing. Black box testing is of interest to reliability engineers because it validates the system level functions of the software, which are meaningful to the user and hence directly related to reliability. Test automation is necessary to make testing effective and efficient. Automating black box testing requires generating test cases on the basis of a representative structural or behavioral model of the system called the test model. These techniques are therefore collectively known as model-based test automation techniques. Model-based testing has many advantages. Models not only enhance the understanding of a product and its architecture, but enable one to automatically generate test cases at an early development stage. Model-based test automation techniques make the test generation process faster and less susceptible to human error by automating routine and error-prone tasks. They also make the test process more reproducible by making it less dependent on human interpretation. With suitable enhancements, models can be used to generate scripts for executing test cases using commercially available test harnesses like WinRunner, SilkTest or RationalXDE.

In practice, test models for model-based test automation techniques are created from software artifacts like requirements documents or design specifications, and hence these techniques rely on the completeness of the specification for the completeness of the test models. These software artifacts are frequently underspecified because the user, who is familiar with the domain and defines the product requirements, may consider certain domain specific requirements too trivial to be specified explicitly in the requirements document. The tester and the developer may not have the necessary domain knowledge and hence may never realize that such a requirement is missing. Our research has targeted the development of a model-based testing technique called HOTTest [3] which reduces such testing errors and makes testing more effective. HOTTest is an acronym for Higher Ordered Typed Specification based testing. It uses a higher-ordered domain specific language to model the system.

Domain Specific Languages are languages dedicated to a particular application domain. Since the domain is constrained, the language constructs are limited; these languages are thus easy to learn. HOTTest develops the test oracles automatically. Further, HOTTest can extract domain-specific axioms from the model to derive additional test cases. HOTTest has been used in the context of database applications and is currently being extended to other application domains such as Web-based applications and GUI applications. Usability and feasibility studies have been performed on classroom case studies as well as on an industrial application, and demonstrate that for the domain of database applications HOTTest is more effective in capturing domain properties than most other commonly used model-based test design techniques. HOTTest is also more efficient and can deliver higher requirements coverage.

Figure 4. HOTTest (Extract from [3])
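HOTTest's Haskell-based machinery is not reproduced here; the sketch below only illustrates the general model-based idea of deriving test cases and oracles from a typed, domain-constrained model. All names, the toy banking domain, and the enumeration strategy are hypothetical.

```python
# Hypothetical illustration of model-based test derivation from a small,
# domain-constrained model. This mimics the spirit of deriving tests and
# oracles from a typed model; it is not HOTTest's actual mechanism.
from itertools import product

# A toy "model": a banking-domain operation with typed, bounded parameters.
ACCOUNT_STATES = ["open", "frozen", "closed"]
AMOUNTS = [0, 1, 10_000]                 # boundary values from the domain types

def oracle_withdraw(state: str, amount: int) -> bool:
    """Expected outcome derived from the model: a domain axiom says
    withdrawals succeed only from open accounts and for positive amounts."""
    return state == "open" and amount > 0

def generate_test_cases() -> list[tuple]:
    """Enumerate the typed input space of the model to obtain test cases,
    each paired with its automatically derived oracle verdict."""
    return [
        (state, amount, oracle_withdraw(state, amount))
        for state, amount in product(ACCOUNT_STATES, AMOUNTS)
    ]

for state, amount, expected_ok in generate_test_cases():
    print(f"withdraw(state={state!r}, amount={amount}) -> expect ok={expected_ok}")
```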
4. SOFTWARE RELIABILITY AND PROBABILISTIC RISK ASSESSMENT

Probabilistic Risk Assessment (PRA) is a technique used to assess the probability of failure or success of a large technological system such as a chemical plant, a nuclear power plant, or an assembly such as the Space Station or the Space Shuttle. Results provided by the risk assessment methodology are used to make decisions concerning the choice of upgrades, the scheduling of maintenance, the decision to launch, start-up, shut-down, the decision to abort in flight, and other key parameters.

The PRA methodology accounts for hardware and, to some extent, for human interventions, but does not account for software contributions to risk. The consequence is that the estimated level of risk is inaccurate and probably significantly lower than it should be. This might not have been an issue 30 years ago, but given the increased dependence of current technological systems on software (and even the possibility of total reliance on autonomous systems which will learn new conditions, and the particular responses to these conditions, on the fly), the problem has become significant.

4.1. Integrating Software Into Classical PRA

Classical PRA is a well-defined methodology supported by:
1. A set of concepts such as:
   a. Initiators (i.e. the beginning of an accident sequence),
   b. Intermediate events (i.e. intermediate states in an accident progression),
   c. End states (i.e. the final states of the accident progression, such as Loss of Vehicle, Loss of Crew, Core Melt, etc.);
2. A set of supporting logic models such as master logic diagrams, event trees or event sequence diagrams, and fault trees;
3. Quantification models, which help quantify the different quantities involved in the risk assessment;
4. Software tools such as SAPHIRE and/or QRAS [4].

Our past research has developed an approach titled "The Test-based Approach to Integrate Software Risk Contribution into PRA" [5]. This approach relies on the following steps (an illustrative quantification sketch follows the list):
Step 1: Identify events/components controlled/supported by software in the Master Logic Diagram, accident scenarios and Fault Trees;
Step 2: Specify the functions involved;
Step 3: Model software functions in Event Sequence Diagrams/Event Trees and Fault Trees;
Step 4: Construct the input tree. The input tree is a concept similar to the operational profile; however, it is specific to a particular accident scenario and applies to the software functions triggered in that particular scenario;
Step 5: Quantify the input tree;
Step 6: Develop and perform software safety tests;
Step 7: Conduct simulation to evaluate the impact of hardware failures on software behavior.
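To make Steps 4-6 concrete, here is a minimal, assumption-laden sketch: the input tree is represented as scenario-specific input classes with occurrence probabilities, and per-class failure probabilities estimated from safety tests are combined into a scenario-level software failure probability. The structure, the numbers, and the point estimate are illustrative, not the formulation of [5].

```python
# Illustrative quantification of a software contribution to an accident
# scenario (Steps 4-6). The input tree structure and the point estimate
# used here are simplifications, not the exact method of [5].

# Steps 4/5: input tree for one accident scenario -- input classes with
# their probabilities of occurring in that scenario.
input_tree = {
    "nominal sensor input": {"p_class": 0.90, "tests": 2000, "failures": 0},
    "out-of-range input":   {"p_class": 0.08, "tests": 500,  "failures": 2},
    "stale/delayed input":  {"p_class": 0.02, "tests": 300,  "failures": 3},
}

def class_failure_probability(failures: int, tests: int) -> float:
    """Step 6: estimate a per-class failure probability from safety tests.
    A (failures + 1) / (tests + 2) Laplace estimate is used so that zero
    observed failures do not yield an unjustified zero probability."""
    return (failures + 1) / (tests + 2)

# Scenario-level software failure probability: total probability over the
# input classes of the tree.
p_sw = sum(
    leaf["p_class"] * class_failure_probability(leaf["failures"], leaf["tests"])
    for leaf in input_tree.values()
)
print(f"Scenario software failure probability: {p_sw:.2e}")
```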
Figure 5. Modeling Software in the Event Sequence Diagram. The diagram branches on the state of the support platform (normal, degraded, fully non-functional), on the correctness of the input, on whether the software generates the required output, and on whether the resulting behavior leads to a safe condition or an unsafe state; branches are quantified with probabilities such as P(f_HW ∪ f_CAT), P(f_CAT), P(f_HW ∩ f_CAT) and P(f_SWHW).
Quantification is based on failure modes (such as input, output and support failure modes) and is achieved experimentally. The approach is currently being tested on different NASA systems. If successful, it should be integrated into the set of PRA modeling guidelines defined by NASA.

4.2. Integrating Software Into Dynamic PRA

PRA has been proven to be a systematic, logical, and comprehensive methodology for risk assessment. However, the classical PRA framework is widely believed to be very limiting when it comes to identifying software and human contributions to system risk. The enumeration of risk scenarios in the case of highly complex, hybrid systems of hardware, software, and human components is very difficult using the classical PRA method. The dynamic interactions among the components inside the system often make it hard to identify and predict all the possible scenarios. Dynamic Probabilistic Risk Assessment (DPRA) is a set of methods and techniques in which executable models that represent the behavior of the elements of a system are exercised in order to identify risks and vulnerabilities of the system. Using the DPRA method, the analyst no longer needs to enumerate all the possible risk scenarios: the computer model explores the possible scenarios based on the system model. Therefore, the burden of proof of correctness is shifted from the analyst to the DPRA methodology. The fact remains, however, that modeling software for use in the DPRA framework is also quite complex, and little has been done to address the question directly and comprehensively.

Software modeling in the DPRA environment [6] differs from the traditional PRA environment. The analyst no longer needs to study the fault propagation and enumerate all the possible accident sequences; that task is replaced by building an executable software model and identifying possible software-related initiating events. The simulation environment explores the scenario space based on the system model, and the software risks and vulnerabilities are identified using the simulation results. In this approach, an executable software model first needs to be constructed to simulate the software behavior. The software-related failure modes need to be identified, similarly to the classical PRA framework. The selected failure modes are superimposed on the executable behavior model as stochastic events. The software-related failures are controlled by the simulation guidance model during simulation, based on predefined rules for exploring the risk-scenario space following the selected initiating events.

The software representation in the adaptive-scheduling DPRA environment includes both a behavior model and a software guidance model (see Figure 6). The behavior model is an executable model; it is plugged into the system environment to represent the software behavior. The software behavior model is a combination of a deterministic model and a stochastic model. The deterministic model is used to simulate the behavior of the software, as well as the interaction between the software and other parts of the system. The stochastic model is superimposed onto the deterministic model to represent the uncertain behavior of the software, e.g., software failures. The Abstraction Knowledge Base and the Failure Injection Knowledge Base are automatically generated from the behavior model; this information is used in the guidance model to control the controllable variables. A Simulation Finite State Machine (SFSM) is used to build the software behavior model. The software guidance model is used to guide the simulation to explore scenarios of interest instead of performing a wide-scale exploration. A Simulation Knowledge Base is constructed inside the software guidance model to store prior knowledge about the software system. The guidance model adjusts the behavior model based on the requirements from the high-level scheduler and planner, as well as the information in the knowledge base. Meanwhile, the software guidance model provides information to update the planner. The software representation described above is currently under validation on several applications, including telescope control software.
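As a minimal sketch of the behavior model idea (not the SFSM formalism of [6]), the following combines a deterministic state machine with a superimposed stochastic failure event; the states, transitions, and failure rate are invented for illustration.

```python
# Minimal sketch of a DPRA-style software behavior model: a deterministic
# finite state machine with a stochastic failure event superimposed on it.
# States, transitions and the failure rate are invented for illustration;
# this is not the SFSM formalism of [6].
import random

TRANSITIONS = {                       # deterministic model of the software behavior
    ("idle", "start"):       "monitoring",
    ("monitoring", "tick"):  "monitoring",
    ("monitoring", "alarm"): "shutdown",
}

def simulate(events: list[str], p_fail: float, rng: random.Random) -> list[str]:
    """Run one simulation history; a guidance layer would normally choose
    p_fail and the event sequence to steer toward scenarios of interest."""
    state, history = "idle", ["idle"]
    for event in events:
        if rng.random() < p_fail:     # stochastic model: injected software failure
            state = "failed"          # failure mode superimposed on the FSM
        else:
            state = TRANSITIONS.get((state, event), state)
        history.append(state)
    return history

rng = random.Random(42)               # fixed seed for reproducibility
for run in range(3):
    print(simulate(["start", "tick", "tick", "alarm"], p_fail=0.05, rng=rng))
```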
Figure 6. Software Integration in the DPRA Environment: a planner and scheduler drive guidance models, which control multi-layer behavior models for the software (SW), hardware (HW) and human (HM) elements (Extract from [6])
5. SOFTWARE RELIABILITY AND MEASUREMENT
Software reliability models are typically based on failure data or failure trends collected or observed during either the testing phase or the operational phase; consequently, failure data must be available. This may not always be the case: companies may not wish to release their data, and in the case of a highly reliable system, failure data may be too rare to obtain accurate statistical estimates of reliability. Our research has targeted the development of alternate means of obtaining reliability estimates based on software engineering measures rather than on failure data. Obtaining such estimates is more cost effective for an organization. Different estimates can be obtained using different measures, and these estimates may also be used to reduce the number of test cases required to establish a particular reliability target.

The principal concept introduced in this line of research is the Reliability Prediction System (RePS), i.e. a complete set of measures from which software reliability can be predicted. A RePS (see Figure 7) is typically built around a main measure called the root measure. Support measures are then identified which connect the root measure to reliability, and a model then connects the measures to reliability. Multiple RePSs can be constructed, and ideally one should select the best to estimate reliability. By best, it is meant that the measures constituting the RePS are easy to obtain, the RePS is theoretically credible, and the measurement process is repeatable. Unfortunately, it is not possible to rank RePSs directly; one obvious reason is that RePSs have not yet been identified thoroughly. We therefore resorted to a simpler approach: ranking the measures individually, hypothesizing that a better root measure will lead to a better RePS. Thirty software engineering measures were ranked by field experts [7] for the purpose of selecting those that best predict software reliability. The top measures identified through this ranking are candidates to build acceptable RePSs. A preliminary validation of these rankings considered five measures [8] and ten small applications. These measures were chosen so that high, medium, and low ranked measures were covered; the ease of RePS construction was also taken into account. The validation showed agreement between the ranking of the experts and the relative error in the reliability estimate produced by each RePS. The validation effort is being pursued on twelve root measures and a larger safety critical application. If positive, the results of this study may be used to influence the current review process used at the USNRC for licensing of digital safety systems.
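As a schematic illustration of the RePS concept (the actual models of [7, 8] are measure-specific), the sketch below combines a hypothetical root measure, defect density, with support measures into a reliability estimate through an exponential reliability model.

```python
# Schematic RePS: a root measure plus support measures feed a model that
# outputs a reliability estimate. The choice of defect density as the root
# measure, the support measures, and the exponential model are illustrative
# assumptions, not the specific RePSs of [7, 8].
import math

def reps_reliability(
    defects_per_kloc: float,      # root measure: defect density
    kloc: float,                  # support measure: size of the software
    fault_exposure_ratio: float,  # support measure: fraction of residual
                                  # faults whose execution causes failure per run
    missions: float = 1.0,        # number of executions over which to predict
) -> float:
    """Model connecting the measures to reliability: an exponential model
    driven by the expected number of residual, exposable faults."""
    residual_faults = defects_per_kloc * kloc
    failure_intensity = residual_faults * fault_exposure_ratio
    return math.exp(-failure_intensity * missions)

# Example: 0.5 defects/KLOC in a 20 KLOC system, with a small exposure ratio.
print(f"Predicted reliability: {reps_reliability(0.5, 20.0, 1e-3):.4f}")
```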
Figure 7. RePS: a root measure and support measures 1..m are connected to reliability through a model

REFERENCES

1. C. Smidts, M. Stutzke, R. W. Stoddard, "Software Reliability Modeling: An Approach to Early Reliability Prediction", IEEE Transactions on Reliability, Vol. 47, No. 3, 1998.
2. C. Smidts, D. Sova, "An Architectural Model for Software Reliability Quantification: Sources of Data", Reliability Engineering and System Safety, Vol. 64, pp. 279-290, 1999.
3. A. Sinha, C. Smidts, A. Moran, "Enhanced Testing of Domain Specific Applications by Automatic Extraction of Axioms from Functional Specifications", Proceedings of the 14th IEEE International Symposium on Software Reliability Engineering, Denver, Colorado, Nov. 2003.
4. F. Groen, C. Smidts, A. Mosleh, "QRAS - The Quantitative Risk Assessment System", accepted for publication, Reliability Engineering and System Safety, 2005.
5. B. Li, M. Li, C. Smidts, "Integrating Software into PRA: A Test-Based Approach", accepted for publication, Journal of Risk Analysis, 2005.
6. D. Zhu, A. Mosleh, C. Smidts, "Software Modeling Framework for Dynamic PRA", Proceedings of the European Safety and Reliability Conference, 2005, pp. 2099-2107.
7. M. Li, C. Smidts, "A Ranking of Software Engineering Measures Based on Expert Opinion", IEEE Transactions on Software Engineering, Vol. 29, No. 9, pp. 811-824, 2003.
8. M. Li, Y. Wei, D. Desovski, H. Najad, S. Ghose, B. Cukic, C. Smidts, "Validation of a Methodology for Assessing Software Reliability", Proceedings of the 15th IEEE International Symposium on Software Reliability Engineering, Saint-Malo, Bretagne, France, Nov. 2004.

BIOGRAPHY
Carol Smidts, Ph.D.
Center for Risk and Reliability Engineering
2100 Marie Mount Hall
University of Maryland, College Park 20742 USA
[email protected]

Dr. Carol Smidts is an associate professor in the Center for Risk and Reliability Engineering at the University of Maryland. The Center is currently hosted in the Department of Mechanical Engineering. Dr. Smidts holds a combined BS/MS degree from the Université Libre de Bruxelles (1986) and a PhD degree from the same university (1992). Her research focuses on software reliability modeling, software testing, probabilistic risk assessment and human reliability. She is a member of the IEEE and the ANS, and is the winner of multiple awards, such as the NASA Flight Safety Award for the development of the QRAS system for space mission risk assessment, a Stellar Team Award (1998), and the NASA Office of Safety and Mission Assurance Award for the project entitled "Integrating Software Into PRA". She has more than 100 refereed publications in journals and conference proceedings, and has been the Director of the Software Reliability Engineering Curriculum at the University of Maryland since 1996.