A METHOD FOR THE PREDICTION OF SOFTWARE RELIABILITY
A. D’Ambrogio, G. Iazeolla, R. Mirandola
Department of Computer Science, S&P University of Roma “Tor Vergata”
1 Via del Politecnico, I-00133 Roma, Italy
email: {dambro, iazeolla, mirandola}@info.uniroma2.it
ABSTRACT
This paper deals with the reliability assessment of component-based software, with the aim of predicting the reliability of the software product at an early stage. The proposed approach transforms a specification written in a semi-formal language into a stochastic model to be used for reliability evaluation. The paper assumes a UML-based system specification and introduces a method to map the specification onto a failure model. The method enables software designers with no specific knowledge of reliability theory to predict, at design time, the reliability of the final product, thus introducing lifecycle reliability prediction into their development best practices. The method is illustrated by an application case study that deals with the development of distributed software. A software architecture that allows the method to be easily integrated into UML-based software development environments is also introduced. The architecture is based on the use of XML (eXtensible Markup Language) to represent both the UML system specification and the failure model.
KEY WORDS
software reliability, reliability prediction, automatic model production
Work partially supported by funds from the MURST project on Software Architectures for Heterogeneous Access Network Infrastructures (SAHARA), from the University of Roma Tor Vergata research on the Performance Validation of Advanced Systems, and from the University of Rome Tor Vergata CERTIA Research Center.
1 Introduction
Reliability prediction requires insights and skills with which software designers are in general not familiar. Therefore, methods that automatically map system specifications into appropriate reliability models can minimize the effort and cost of prediction. Such methods should be integrated into commercial software design environments, to enable software developers to experiment with different design alternatives and configurations, or to revise the design to improve reliability. One of the first proposals in this direction can be found in [12], for the production of stochastic models from Estelle language specifications. More recently, methods starting from UML specifications have emerged [13, 10, 5, 11, 9], which, however, mainly deal with performance (delay) problems; only in a few cases [13, 9] do they address reliability extensions.
There exist various types of reliability models, which according to [6] can be classified into state-based and path-based models. State-based models use discrete-time or continuous-time Markov chains, or semi-Markov processes, to represent the software architecture [7, 4, 8]. Path-based models, instead, combine the software architecture with its failure behavior by considering the possible execution paths of the program [6, 14]. In [13] the approach consists of mapping UML system specifications onto state-based models, in particular generalized semi-Markov processes. In [9] the approach consists of mapping UML system specifications onto Timed Petri Net models, which in turn can be translated into state-based continuous-time Markov chain models.
In this paper the path-based approach is considered, and the UML system specification is mapped onto a fault tree (FT) model. The method is implemented in a tool that gives the reliability prediction of software architectures described by UML sequence diagrams (SD) and deployment diagrams (DD) [15]. The SD shows the interaction in time sequence among objects, or instances of each given class, in the scenario: vertical dashed lines represent the lifelines of the objects (time progresses top-down) and horizontal arrows represent control transfers (or object interactions) from the sender object to the receiving object. Each UML-type activation box [15] between an incoming arrow and the subsequent outgoing arrow of the SD graphically represents the code executed by each class method. The DD describes the platform configuration and devices, with the allocation of the software objects on platform devices. An FT is a graphical and logical framework for analyzing the failure modes of a system [1].
An FT diagram uses logic gates to link failure events, which can be of two types:
basic events, represented by ellipses, which represent the basic causes of the failure
intermediate events, represented by rectangles, which can be further exploded into more detailed fault tree sub-diagrams
The diagram links the undesired top event (system failure) to more basic events by logic gates. The top event is resolved into its constituent causes, connected by AND and OR logic gates, which are then further resolved until basic events are identified. The basic events thus give the limit of resolution of the fault tree.
The paper is organized as follows. Section 2 describes the general method for producing a fault tree model from UML system specifications. Section 3 gives details of the method application by use of an example case study. Section 4 describes the software architecture of an environment that implements the method.
2 Method Outline
Figure 1 gives the method outline for the reliability model production. The assumption is made that the software is developed according to a distributed object-oriented approach and that the UML formalism is used [3, 15]. In more detail, it is assumed that the following software artifacts are available (see Figure 1):
the set of sequence diagrams SD1, SD2, through SDn
the deployment diagram DD
the operational profile (OP), which describes the occurrence probability of sequence diagrams SD1, SD2, through SDn
the software failure preliminary data (SFD), which describe the lifetime distribution of each software object, obtained from statistical testing data on emulated or similar software functions
the platform devices failure data (PFD), which describe the lifetime distribution of each platform device, obtained from manufacturer data
Figure 1. Method outline for reliability model production (blocks: SDi mapper, HFTi builder, SFTi and HFTi merger, HSFT merger, GFT parameterizer fed by OP, SFD and PFD, GFT solver yielding the reliability prediction; legend: SDi, i-th Sequence Diagram; DD, Deployment Diagram; SFTi, i-th Software Fault Tree; HFTi, i-th Hardware Fault Tree; HSFTi, i-th HW and SW Fault Tree; GFT, Global Fault Tree; OP, Operational Profile; SFD, Software Failure Data; PFD, Platform Devices Failure Data)
The method consists of a series of steps, illustrated by the blocks in Figure 1:
Step1a: each sequence diagram SDi (i=1..n) is first translated into the corresponding software fault tree SFTi (i=1..n) by use of the SDi mapper block. In many practical situations, only a subset of the sequence diagrams is used, selected on the basis of their occurrence probability as expressed by the OP.
Step1b: from each sequence diagram SDi (i=1..n) and from the deployment diagram DD, the corresponding hardware fault tree HFTi (i=1..n) is obtained by use of the HFTi builder block, which combines the basic failure events of the platform devices hosting SDi’s software objects.
Step2: each SFTi is then grouped with the corresponding HFTi into a so-called hardware software fault tree (HSFTi) by use of the SFT and HFT merger block.
Step3: all the HSFTi so obtained are then grouped into the global fault tree (GFT) by use of the HSFT merger block.
Step4: the GFT is translated into a parameterized GFT by use of the parameterizer block. This block associates with each basic event of the GFT the related lifetime distribution function, obtained from PFD if the event relates to platform device failures, from SFD if it relates to software object failures, and from OP if it relates to the operational profile. The parameterized GFT so obtained is then evaluated by use of a reliability evaluation tool [12, 17] that yields the software reliability prediction.
Figure 2 illustrates the activities performed by the SDi mapper block of Figure 1. According to the illustration, the mapper first decomposes the SD into its basic constructs, which can be of the following types:
synchronous call (SY)
asynchronous call (ASY)
local call (LC)
branch (B)
loop (L)
object creation/destruction (C/D)
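The fault-tree semantics described above (basic events at the leaves, AND/OR gates combining them into intermediate events up to the top event) can be illustrated with a minimal Python sketch of the kind of evaluation a fault tree solver performs in Step4. It assumes that basic events are statistically independent and that each appears only once in the tree. Under that assumption an AND gate multiplies its inputs' failure probabilities, and an OR gate returns one minus the product of their survival probabilities. The names `BasicEvent`, `Gate` and `failure_probability` are hypothetical and belong neither to the method nor to the tools cited in Step4.

```python
import math
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class BasicEvent:
    """Leaf of the fault tree: a basic cause of failure (an ellipse in the FT diagram)."""
    name: str
    prob: float  # probability that this event has occurred by the mission time


@dataclass(frozen=True)
class Gate:
    """Intermediate event (a rectangle in the FT diagram) produced by an AND or OR gate."""
    kind: str  # "AND" or "OR"
    children: Tuple["Node", ...]


Node = Union[BasicEvent, Gate]


def failure_probability(node: Node) -> float:
    """Probability that `node` occurs, assuming independent, non-repeated basic events."""
    if isinstance(node, BasicEvent):
        return node.prob
    child_probs = [failure_probability(child) for child in node.children]
    if node.kind == "AND":
        # The gate output fails only if all inputs fail.
        return math.prod(child_probs)
    if node.kind == "OR":
        # The gate output fails unless every input survives.
        return 1.0 - math.prod(1.0 - p for p in child_probs)
    raise ValueError(f"unsupported gate type: {node.kind}")


# Example: top event = a OR (b AND c)
top = Gate("OR", (
    BasicEvent("a", 0.1),
    Gate("AND", (BasicEvent("b", 0.2), BasicEvent("c", 0.5))),
))
print(failure_probability(top))  # approximately 0.19
```

When the same basic event feeds several gates (for instance, a platform device shared by several scenarios), this bottom-up formula is no longer exact. A solver would then work on minimal cut sets or binary decision diagrams instead.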
Figure 2. Details of the SD mapper block (the SD is decomposed into its basic constructs SY, ASY, LC, B, L and C/D; each construct goes to the corresponding mapper, which yields BFT SY, BFT ASY, BFT LC, BFT B, BFT L or BFT C/D; the BFT merger combines them into the SFT)
For each basic construct there exists a corresponding mapper (SY-mapper, ASY-mapper, LC-mapper, B-mapper, L-mapper and C/D-mapper) that yields the corresponding basic fault tree (BFT). All such BFTs are finally merged into the SFT by use of the BFT merger block. As an example, Figures 3 and 4 give details about the translation of the SY and B constructs, respectively, into the corresponding BFTs. Figure 3 shows the SY basic construct with the corresponding BFT. The rationale of the translation is that the SY basic construct fails if a failure occurs in the method call performed by the calling software object objA, or if the execution of the method by the receiving software object objB fails, or if objA fails when obtaining the call return. Figure 4 shows the B basic construct with the corresponding BFT. The rationale of the translation is that the B construct fails if the software object objA fails when evaluating the branch condition, or if a failure occurs in the synchronous call from objA to objB, or if a failure occurs in the synchronous call from objA to objC. A similar reasoning holds for the rest of the mappers (ASY-mapper, LC-mapper, L-mapper, C/D-mapper).
3 Method Description
A constructive description of the method is given here, by use of an application case that deals with the development of a distributed software system called mail system. The system gives users the capability to download bulks of mail messages coming from external sources. A database stores log records of each user operation, and is updated by the system when the user closes the session. There are two categories of users: administrators, who are allowed to download both the log records and the mail messages, and standard users, who are allowed to download only mail messages. Two scenarios are thus defined for this system, each one referring to a particular category of users.
3.1 Inputs from system specification documents
According to the OP document, administrators are 10% of total users and download one mail message per session, while standard users are 90% of total users and download 3 mail messages per session. A mail system process is run for each user logged on to the user workstation. Each user must be authenticated in order to access the mail system. It is assumed that the mail system platform configuration is the one in Figure 5, consisting of one User Workstation and three servers: HW Server A, HW Server B1 and HW Server B2. The User Workstation is connected to HW Server A by the LAN and to HW Servers B1/B2 by the WAN. Additional artifacts from the system specification phase are the set of SDs and the DD, on whose basis the GFT can be generated as described below. Only two SDs are illustrated here, given in Figure 7 and Figure 8 for the standard-user scenario (SD1) and the administrator scenario (SD2), respectively. In the administrator scenario, for example (see Figure 8), the User object initially calls the mail_session method of the System object. The System executes the method by first verifying the user rights and then by calling the view_log method of the History_Mgr object and the read_mail method of the Mail_Mgr object. Finally, the System returns control to the User object and, at the same time, calls the update_log method of the History_Mgr object. This call is asynchronous, so there is no return arc to the System object. Figure 6 shows the deployment diagram for the considered mail system, with an example allocation (Alternative 1) of software objects on platform devices. System objects are allocated on the User Workstation node, Mail_Mgr objects on the HW Server A node, while History_Mgr objects are redundantly allocated on both HW Server B1 and HW Server B2.
An alternative decision (Alternative 2) could be taken by allocating both the Mail_Mgr and History_Mgr objects onto the same HW Server (A or B1/B2). The method introduced in this paper can then be used to identify the more reliable alternative.
Figure 3. Details of the SY-mapper block (objA synchronously calls method 1() of objB and receives the return; the BFT links "objA call failure", "objB method 1 failure" and "objA return failure" through an OR gate)
Figure 4. Details of the B-mapper block (objA evaluates [condition] and calls method 1() of objB or method 2() of objC; the BFT links "objA eval condition failure", "SY 1 failure" and "SY 2 failure" through an OR gate)
Figure 5. Platform configuration of the considered mail system (User Workstation connected to HW Server A by the LAN and to HW Servers B1 and B2 by the WAN)
Figure 6. Deployment diagram (DD) for the considered mail system (System on the User Workstation, Mail_Mgr on HW Server A, History_Mgr on both HW Server B1 and HW Server B2)
Figure 7. SD for the standard-user scenario (SD1) (objects :User, :System, :Mail_Mgr, :History_Mgr; messages mail_session, verify_user, read_mail three times, update_log)
Figure 8. SD for the administrator scenario (SD2) (objects :User, :System, :Mail_Mgr, :History_Mgr; messages mail_session, verify_user, view_log, read_mail, update_log)
3.2 Generation of the GFT
According to step1a of the method (see Section 2), two software fault trees, SFT1 and SFT2, are to be produced from the two sequence diagrams, SD1 and SD2, in Figures 7 and 8 respectively. For the sake of brevity, only the translation of SD2 into SFT2 is illustrated here (see Figure 9). The following basic constructs can be recognized in Figure 8:
four synchronous calls, corresponding to the mail_session, verify_user, view_log and read_mail methods, respectively
one asynchronous call, corresponding to the update_log method
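The translation of SD2 into SFT2 can be sketched in Python as follows. The SY-mapper follows the rule of Figure 3 (the caller fails in issuing the call, the callee fails in executing the method, or the caller fails in obtaining the return), and the BFT merger joins all BFTs with an OR gate. Two points are assumptions not stated in this section: the ASY-mapper is modeled as an OR of the call failure and the remote execution failure only, since there is no return arc, and verify_user is taken to be a call from System to itself. All function names and event labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class BasicEvent:
    name: str  # probabilities are attached later, in the parameterization step


@dataclass(frozen=True)
class Gate:
    kind: str  # "AND" or "OR"
    children: Tuple[Union["BasicEvent", "Gate"], ...]


def sy_mapper(caller: str, callee: str, method: str) -> Gate:
    # Figure 3: a synchronous call fails if the call, the method execution,
    # or the return to the caller fails.
    return Gate("OR", (
        BasicEvent(f"{caller}: call {method} failure"),
        BasicEvent(f"{callee}: {method} failure"),
        BasicEvent(f"{caller}: return {method} failure"),
    ))


def asy_mapper(caller: str, callee: str, method: str) -> Gate:
    # ASSUMPTION: no return arc, so only the call and the execution can fail.
    return Gate("OR", (
        BasicEvent(f"{caller}: call {method} failure"),
        BasicEvent(f"{callee}: {method} failure"),
    ))


MAPPERS = {"SY": sy_mapper, "ASY": asy_mapper}


def sd_mapper(constructs: List[Tuple[str, str, str, str]]) -> Gate:
    # BFT merger: the SD fails if at least one of its basic constructs fails.
    return Gate("OR", tuple(MAPPERS[kind](caller, callee, method)
                            for kind, caller, callee, method in constructs))


def basic_events(node) -> List[str]:
    """List the names of the basic events (leaves) of a fault tree."""
    if isinstance(node, BasicEvent):
        return [node.name]
    return [name for child in node.children for name in basic_events(child)]


# Basic constructs recognized in SD2 (administrator scenario, Figure 8).
SD2 = [
    ("SY", "User", "System", "mail_session"),
    ("SY", "System", "System", "verify_user"),  # ASSUMPTION: self-call on System
    ("SY", "System", "History_Mgr", "view_log"),
    ("SY", "System", "Mail_Mgr", "read_mail"),
    ("ASY", "System", "History_Mgr", "update_log"),
]

sft2 = sd_mapper(SD2)
print(len(sft2.children), len(basic_events(sft2)))  # 5 14
```

Keeping the basic events free of probabilities mirrors the separation between the structural steps (Step1a to Step3) and the parameterization of Step4.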
According to Figure 2, all such basic constructs are translated into the corresponding BFTs, which are all merged into the software fault tree SFT2 illustrated in Figure 9. According to this figure, all such BFTs are logically merged by use of an OR gate, the rationale being that SD2 fails if a failure occurs in at least one of its basic constructs. Each SY failure block is in turn expanded into the corresponding fault tree according to Figure 3, as illustrated by the SY1 expansion in Figure 9. A similar reasoning applies to the rest of the SY and ASY blocks, whose expansion is not illustrated in Figure 9.
According to step1b of the method (see Section 2), two hardware fault trees, HFT1 and HFT2, are to be produced from the deployment diagram DD and the two sequence diagrams, SD1 and SD2, in Figures 7 and 8 respectively. For the sake of brevity, only the production of HFT2 is illustrated here (see Figure 10). According to the DD in Figure 6, the Alternative 1 allocation of software objects is considered. Therefore a failure occurs if the User Workstation, HW Server A, the LAN or the WAN fails, or if both HW Servers B1 and B2 fail; this is schematically represented by the OR/AND hardware fault tree in Figure 10.
According to step2 of the method (see Section 2), SFT2 and HFT2 are then merged into a single HSFT2 by use of an OR logic gate, as illustrated in Figure 11.
Finally, according to step3 of the method (see Section 2), HSFT1 and HSFT2 are grouped into the global fault tree, as illustrated in Figure 12. The rationale of this figure is that the system fails if HSFT1 or HSFT2 fails. On the other hand, the failure rates of HSFT1 and HSFT2 are to be weighted according to the occurrence probabilities of SD1 and SD2, respectively. This is modeled by connecting each HSFTi, through an AND gate, to the event representing the occurrence of SDi, since the global failure rate is the product of the SD occurrence probability and the HSFT failure rate. Occurrence probabilities and failure rates are the result of the GFT parameterization discussed in the next section.
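Steps 1b to 3 for the mail system can be sketched numerically. The hardware tree follows the Alternative 1 reasoning above (an OR over the User Workstation, HW Server A, the LAN and the WAN, plus an AND over the redundant HW Servers B1 and B2), each HSFTi is the OR of SFTi and HFTi, and the OP weights 0.9 and 0.1 come from the standard-user and administrator shares of Section 3.1. Everything else is an assumption for illustration. The device failure probabilities and the single aggregated SFT probabilities are hypothetical values. HFT1 is given the same structure as HFT2 because SD1 involves the same objects. The two scenarios are treated as mutually exclusive (one per session), so the GFT failure probability is computed as a sum of occurrence-weighted HSFT failure probabilities rather than with the independent-events OR formula. All names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Event:
    name: str
    prob: float  # failure probability at the mission time (hypothetical values below)


@dataclass(frozen=True)
class Gate:
    kind: str  # "AND" or "OR"
    children: Tuple[Union["Event", "Gate"], ...]


def prob(node) -> float:
    """Bottom-up evaluation, assuming independent, non-repeated basic events."""
    if isinstance(node, Event):
        return node.prob
    ps = [prob(child) for child in node.children]
    if node.kind == "AND":
        return math.prod(ps)
    if node.kind == "OR":
        return 1.0 - math.prod(1.0 - p for p in ps)
    raise ValueError(f"unsupported gate type: {node.kind}")


def hft_alternative1(q: Dict[str, float]) -> Gate:
    # Failure if UW, Server A, LAN or WAN fails, or if both History_Mgr hosts fail.
    return Gate("OR", (
        Event("User Workstation", q["UW"]),
        Event("HW Server A", q["A"]),
        Event("LAN", q["LAN"]),
        Event("WAN", q["WAN"]),
        Gate("AND", (Event("HW Server B1", q["B1"]), Event("HW Server B2", q["B2"]))),
    ))


def hsft(sft: Event, hft: Gate) -> Gate:
    # Step 2: a scenario fails if its software or its hardware fails.
    return Gate("OR", (sft, hft))


def gft_failure(op: Dict[str, float], hsfts: Dict[str, Gate]) -> float:
    # Step 3: weight each HSFT by its scenario's occurrence probability.
    # ASSUMPTION: one scenario per session (mutually exclusive), so terms add.
    return sum(op[sd] * prob(hsfts[sd]) for sd in op)


OP = {"SD1": 0.9, "SD2": 0.1}  # standard users 90%, administrators 10% (Section 3.1)
q = {"UW": 0.01, "A": 0.02, "LAN": 0.005, "WAN": 0.01, "B1": 0.05, "B2": 0.05}  # hypothetical
sft = {"SD1": Event("SFT1", 0.03), "SD2": Event("SFT2", 0.02)}  # hypothetical aggregates

hsfts = {sd: hsft(sft[sd], hft_alternative1(q)) for sd in OP}
print(round(gft_failure(OP, hsfts), 4))
```

Comparing Alternative 1 with Alternative 2 would only require a different hardware-tree builder, since in this method the software trees are derived from the SDs alone.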
3.3 GFT parameterization
At this stage, according to step4 of the method (see Section 2), each basic event of the GFT receives an associated lifetime distribution function, obtained from PFD if the event relates to platform device failures, from SFD if it relates to software object failures, and from OP if it relates to the operational profile.
In the case of an exponentially distributed lifetime for each basic event $e_i$, the lifetime distribution function is easily obtained as:
$$F_i(t) = 1 - e^{-t/\mathrm{MTTF}_i} \qquad (1)$$
where $\mathrm{MTTF}_i$ is the mean time to failure associated with the basic failure event $e_i$.
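For exponentially distributed lifetimes, the parameterizer only needs each basic event's MTTF and a mission time t to compute F(t) = 1 - exp(-t/MTTF). A minimal Python sketch follows; the event names and MTTF values are hypothetical, and OP events, which carry occurrence probabilities directly, are not shown. `math.expm1` is used because `-expm1(-x)` equals `1 - exp(-x)` but stays accurate when t is much smaller than the MTTF.

```python
import math
from typing import Dict


def exp_lifetime_cdf(t: float, mttf: float) -> float:
    """F(t) = 1 - exp(-t / MTTF): probability that the basic event has occurred by time t."""
    if mttf <= 0:
        raise ValueError("MTTF must be positive")
    if t <= 0:
        return 0.0
    # -expm1(-x) equals 1 - exp(-x) but avoids cancellation when x is small.
    return -math.expm1(-t / mttf)


def parameterize(mttf_by_event: Dict[str, float], t: float) -> Dict[str, float]:
    """Attach to each basic event its failure probability at mission time t."""
    return {event: exp_lifetime_cdf(t, mttf) for event, mttf in mttf_by_event.items()}


# Hypothetical MTTFs in hours (from PFD for devices, from SFD for software objects).
mttf_hours = {"HW Server A": 20_000.0, "LAN": 50_000.0, "System: mail_session failure": 5_000.0}
for event, p in parameterize(mttf_hours, t=100.0).items():
    print(f"{event}: {p:.6f}")
```

The resulting probabilities are what a fault tree evaluator would consume as basic event values.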