SOFTWARE TESTING, VERIFICATION AND RELIABILITY Softw. Test. Verif. Reliab. (2009) Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/stvr.418
Fault-driven stress testing of distributed real-time software based on UML models Vahid Garousi∗, †, ‡ Software Quality Engineering Research Group (SoftQual), Department of Electrical and Computer Engineering, Schulich School of Engineering, University of Calgary, 2500 University Drive NW, Calgary, Alta., Canada T2N 1N4
SUMMARY
In a previous article, a stress testing methodology was reported to detect network traffic-related Real-Time (RT) faults in distributed RT systems based on the design UML model of a System Under Test (SUT). The stress methodology, referred to as Test LOcation-driven Stress Testing (TLOST), aimed at increasing the chances of RT failures (violations of RT constraints) associated with a given stress test location (a network or a node under test). As demonstrated in this article, although TLOST is useful in stress testing different test locations (nodes and networks), it does not guarantee to target (test) all RT constraints in an SUT. This is because the durations of message sequences bounded by some RT constraints might never be exercised (covered) by TLOST. A complementary stress test methodology is proposed in this article, which guarantees to target (cover) all RT constraints in an SUT and detect their potential RT faults (if any). Using a case study, this article shows that the new complementary methodology is capable of targeting the RT faults not detected by the previous test methodology. Copyright © 2009 John Wiley & Sons, Ltd.
Received 13 August 2008; Revised 8 August 2009; Accepted 13 August 2009
KEY WORDS:
search-based software testing; stress testing; model-based testing; fault-driven testing; distributed systems; UML; network traffic
∗ Correspondence to: Vahid Garousi, Software Quality Engineering Research Group (SoftQual), Department of Electrical and Computer Engineering, Schulich School of Engineering, University of Calgary, 2500 University Drive NW, Calgary, Alta., Canada T2N 1N4.
† E-mail: [email protected]
‡ Assistant Professor.
Contract/grant sponsor: Natural Sciences and Engineering Research Council of Canada (NSERC); contract/grant number: 341511-07
Contract/grant sponsor: Alberta Ingenuity New Faculty Award; contract/grant number: 200600673
Copyright © 2009 John Wiley & Sons, Ltd.
V. GAROUSI
1. INTRODUCTION
Distributed Real-Time Systems (DRTS)§ are becoming more important in our everyday life. Examples include command and control systems, aircraft aviation systems, robotics, and nuclear power plant systems [1]. Sources of failures in the United States Public-Switched Telephone Network (PSTN), a very large DRTS, were investigated by Kuhn [2]. It is reported that during the period 1992–1994, although only 6% of the outages were overloads, they led to 44% of the PSTN's service downtime. In the system under study, overload was defined as the situation in which service demand exceeds the designed system capacity. Thus, although overloads do not happen frequently, the failures resulting from them can be very costly. The motivation for the current work follows directly: because DRTSs are by nature concurrent and can be data intensive [1], there is a need for methodologies and tools for testing and debugging DRTSs under stress conditions such as heavy user loads and intense network traffic. These systems should be tested under stressing conditions before being deployed in order to assess their robustness to distribution- and network-specific problems.

In this article, the focus is on network traffic, one of the fundamental distribution-specific factors affecting the behaviour of DRTSs. Distributed nodes of a DRTS regularly need to communicate with each other to perform system functionality. Network communications are not always successful and on time, as problems such as congestion, transmission errors, or delays might occur. At the same time, many real-time (RT) and safety-critical systems have hard deadlines for many of their operations, and if those deadlines are not met, serious or even catastrophic consequences (e.g. an explosion) can follow. Furthermore, a DRTS might behave well under normal network traffic loads (e.g. in terms of the amount of data or the number of requests), but communication might turn out to be poor and unreliable if many network messages or high loads of data are concurrently transmitted over a particular network or towards a particular node.

Since 1997, UML [3] has become the de facto standard for modelling object-oriented software for nearly 70% of the IT industry [4]. As it is generally expected in the community that UML will be increasingly used for DRTSs, it is important to develop automatable UML model-driven stress test methodologies. Assuming that the UML design model of a DRTS is in the form of sequence diagrams annotated with timing information and that the system's network topology is given in a specific modelling format, Garousi et al. [5] proposed a systematic methodology to derive test requirements to stress the system with respect to network traffic in a way likely to reveal robustness problems. The methodology [5] can be used to automatically generate an interleaving¶ that will stress the network traffic on a network or a node so as to analyse the system under strenuous but valid conditions. If any network traffic-related failure is observed, designers will be able to apply any necessary fixes to increase the robustness before system delivery.
§ For reading convenience, a glossary of acronyms is provided in the Appendix.
¶ A network interaction interleaving is a possible sequence of network interactions among a subset of objects on a subset of nodes.
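The notion of an interleaving stressing the traffic on a network can be made concrete with a small sketch. The following is a minimal illustration, not the methodology of [5]: the message tuples, link names, and sizes are invented, and the peak-load estimate is just one plausible measure of how much instantaneous traffic a candidate interleaving imposes on one network.

```python
# Illustrative sketch (not the actual TLOST algorithm): estimate the peak
# instantaneous data load that a set of concurrently scheduled messages
# imposes on one network link. All message tuples below are hypothetical.

def peak_traffic(messages, link):
    """messages: list of (link, start, end, size_bytes); returns peak load on `link`."""
    events = []
    for lnk, start, end, size in messages:
        if lnk != link:
            continue
        events.append((start, +size))   # message begins occupying the link
        events.append((end, -size))     # message finishes
    load = peak = 0
    # sort ends before starts at equal instants so back-to-back messages
    # are not double-counted as overlapping
    for _, delta in sorted(events, key=lambda e: (e[0], e[1])):
        load += delta
        peak = max(peak, load)
    return peak

msgs = [("N1", 0.0, 2.0, 512), ("N1", 1.0, 3.0, 1024), ("N2", 0.5, 1.5, 256)]
print(peak_traffic(msgs, "N1"))  # the two N1 messages overlap during [1.0, 2.0)
```

A stress test generator in this spirit would search for the interleaving that maximizes such a load measure on the chosen test location.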
FAULT-DRIVEN STRESS TESTING
The methodology presented by Garousi et al. [5], referred to in this article as Test LOcation-driven Stress Testing (TLOST), aimed at increasing the chances of violations of RT constraints associated with a given stress test location (a network or a node under test). As discussed in detail in Section 3.2, although TLOST is useful in stress testing different test locations (nodes and networks), it does not guarantee to target all RT constraints in an SUT. This is due to the manner in which TLOST is designed: it chooses Control Flow Paths (CFPs) that entail the maximum possible traffic on a given network or node. Even if all networks and nodes of a system are stress tested with such an approach, some particular CFPs might never be chosen as stress test cases. In such a case, a few particular RT constraints specified inside those CFPs will never be exercised by TLOST.

To address the above limitation of TLOST, this article proposes a complementary stress test methodology, referred to as Real-Time FAult-driven Stress Testing (RTFAST), which guarantees to target (test) all RT constraints of an SUT with maximum stress. To do so, RTFAST picks specific CFPs from the UML sequence diagrams of an SUT, together with specific test locations, as test requirements which, if triggered, will increase the chances of failures in a given RT constraint. By using RTFAST, all RT constraints of an SUT can be checked one by one to ensure that they are met under the most stressed conditions of a system. Note that RTFAST is not intended to replace TLOST [5], but to complement it; that is, both RTFAST and TLOST should be used to stress test a DRTS, as they each have a different stress testing objective. Both TLOST and RTFAST are search-based software testing techniques, with slightly different search spaces and objective functions.
The goal of TLOST [5] is to target a given network or node and, by maximizing the network traffic in that network or node, to increase the chances of only those RT failures associated with that network or node. On the other hand, RTFAST's objective is to target a given RT constraint and increase the chances of failures in that RT constraint by maximizing the network traffic affecting that constraint. The relationship between TLOST and RTFAST can be better explained using the concept of test coverage from the foundations of software testing [6]. Recall that the fault hypothesis behind statement coverage is that faults cannot be discovered if the parts containing them are not executed. Similarly, in the current context, RT faults cannot be discovered if the behavioural scenarios (e.g. CFPs) associated with them are not executed. As discussed in Section 3.2.3, although TLOST [5] is useful in covering all test locations in an SUT, it might not cover all RT constraints. The new RTFAST methodology, on the other hand, can cover all RT constraints and thus increase the chances of detecting RT faults. In other words, RTFAST can be used to provide full stress test coverage of RT constraints.

The remainder of this article is structured as follows. A survey of the related work is presented in Section 2. Section 3 presents background information, including the previous TLOST technique [5]. The new complementary stress test methodology (RTFAST) is presented in Section 4 and applied in Section 5 to a prototype DRTS. Conclusions and future work are discussed in Section 6.
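The coverage argument above can be illustrated with a toy sketch. The CFP names, traffic figures, and constraint sets below are invented for illustration and are not taken from the article; the point is only that the two objective functions can select different CFPs, which is exactly why TLOST alone may leave some RT constraints uncovered.

```python
# Illustrative sketch of the two stress test objectives (all CFP data invented).
# TLOST picks the CFP with the heaviest traffic on a test location; RTFAST picks
# the heaviest-traffic CFP among those exercising a given RT constraint.

cfps = [
    {"name": "CFP1", "traffic_on_N1": 900, "constraints": set()},
    {"name": "CFP2", "traffic_on_N1": 300, "constraints": {"RT1"}},
]

def tlost_choice(cfps):
    # location-driven: maximize traffic on network N1, ignoring RT constraints
    return max(cfps, key=lambda c: c["traffic_on_N1"])["name"]

def rtfast_choice(cfps, rt):
    # fault-driven: only CFPs covering the target RT constraint are candidates
    covering = [c for c in cfps if rt in c["constraints"]]
    return max(covering, key=lambda c: c["traffic_on_N1"])["name"]

print(tlost_choice(cfps))          # CFP1: RT1 inside CFP2 is never exercised
print(rtfast_choice(cfps, "RT1"))  # CFP2: the constraint is covered under stress
```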
2. RELATED WORK
To the best of the author’s knowledge, no existing work addresses the automated derivation of test requirements from UML models for the performance stress testing of DRTSs from the perspective
of increasing the chances of exhibiting RT failures in given RT constraints. In general, there has been little work on the systematic generation of stress and load test suites for software systems, with notable exceptions [5,7–12]. The previous technique [5] is a stress test methodology aimed at increasing the chances of discovering RT faults originating from network traffic overloads in DRTSs. An overview of the TLOST methodology [5] is presented in Section 3.2.

Avritzer and Weyuker [7] proposed a class of load test case generation algorithms for telecommunication systems that can be modelled by Markov chains. The proposed black-box techniques are based on system operational profiles [13]. The Markov chain that represents a system's behaviour is first built. The operational profile of the software is then used to calculate the probabilities of the transitions in the Markov chain. The steady-state probability solution of the Markov chain is then used to guide the generation of test cases according to a number of criteria, in order to target specific types of faults. For instance, using the probabilities in the Markov chain, it is possible to ensure that a transition in the chain is involved many times in a test case, so as to target the degradation of the number of calls that can be accepted by the system. From a practical standpoint, targeting only systems whose behaviour can be modelled by Markov chains can be considered a limitation of Avritzer and Weyuker's test technique. Furthermore, using only operational profiles to test a system may not lead to stressing situations.

Briand et al. [8] propose a methodology for the derivation of test cases which aims at maximizing the chances of deadline misses in RT systems. They show that task deadlines may be missed even though the associated tasks have been identified as schedulable through appropriate schedulability analysis.
The authors note that although it is argued that schedulability analysis simulates the worst-case scenario of task executions, this is not always the case because of the assumptions made by schedulability theory. The authors develop a methodology that helps identify performance scenarios that can lead to performance failures in a system. This stress testing technique uses RT job schedules to find worst-case scenario test cases and is not based on stress conditions due to network traffic usage.

Yang [9] proposes a technique to identify potentially load-sensitive code regions in order to generate load test cases. The technique targets memory-related faults (e.g. incorrect memory allocation/deallocation, incorrect dynamic memory usage) through load testing. The approach is to first identify statements in the module under test that are load sensitive, that is, statements involving the use of malloc() and free() (in C) and pointers referencing allocated memory. Then, data flow analysis is used to find all data definition–use (DU) pairs∗∗ that trigger the load-sensitive statements. Test cases are then built to execute paths covering those DU pairs.

Zhang and Cheung [10] describe a procedure for automating stress test case generation in multimedia systems, which is similar to both the work in [5] and the current article. The authors consider, as an SUT, a multimedia system consisting of a group of servers and clients connected through a network. Stringent timing constraints as well as synchronization constraints are present during the transmission of information from servers to clients and vice versa. The authors identify test cases
The operational profile of a system is defined as the expected workload of the system once it is operational [13].
∗∗ A data definition and a data use statement in source code, where the data use uses the value defined in the data definition [14].
that can lead to the saturation of one kind of resource, namely the CPU usage of a node in the distributed multimedia system. The authors first model the flow and concurrency control of multimedia systems using Petri nets coupled with timing constraints. A specific flavour of temporal logic is used to model temporal constraints. The following are some of the limitations of their technique: (1) the technique cannot easily be generalized to generate test cases that stress test other kinds of resources, such as network traffic, as this would require important changes in the test model; (2) the resource (CPU) utilization of media objects is assumed to be constant over time, whereas the current article considers the more realistic case of resource utilization varying over time as each CFP in an SUT executes; and (3) if the technique presented by Zhang and Cheung [10] is applied in a UML-based development, it requires additional knowledge (Petri nets and a specific flavour of temporal logic), which can be an impediment to its use.

There have also been techniques in the literature for timeliness testing in RT systems. Two papers by Nilsson et al. [11,12] proposed mutation-based test criteria for timeliness [11] and test case generation based on those criteria [12]. Nilsson et al. argue that conventional test coverage criteria (e.g. control flow-based criteria such as statement coverage) ignore task interleaving and timing, and thus do not help determine which execution orders need to be exercised to test for temporal correctness. The input to the approach [11,12] is a timed automaton for the tasks of an RT system. To measure test adequacy, a new criterion based on a set of six mutation operators was defined (i.e. the test criterion specifies the mutation operators to use). To define the mutation operators for testing timeliness, the authors identified two categories of timeliness faults.
The first category represents incorrect assumptions about the system behaviour during schedulability analysis and design. This includes assumptions about execution times, use of shared resources, precedence relations, overhead times of context switches, and cache efficiency. The second category of timeliness faults concerns the system's ability to cope with unanticipated discontinuities and changes in the environmental behaviour, for example, disturbances in the sampling periods. Two of the execution time mutation operators proposed [11] for the first category are called execution-time+ and execution-time−, which, respectively, increase or decrease the execution time of a task by a constant time delta. The mutants created by these operators represent an overly optimistic estimation of the worst-case (longest) execution time of a task or an overly pessimistic estimation of the best-case (shortest) possible execution time. The execution-time− operator is relevant when multiple active tasks share resources in the system: a shorter than expected best-case execution time of a task may lead to a scenario where a lower priority task acquires a resource and blocks a higher priority task so that it misses its deadline. A case study was reported [11,12] to demonstrate that the faults represented by the mutation operators can lead to missed deadlines.
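A minimal sketch of the two execution-time mutation operators as described above may help fix the idea. The Task representation and the delta value are illustrative assumptions, not Nilsson et al.'s actual implementation.

```python
# Hedged sketch of the execution-time+/- mutation operators described above.
# The Task type and the numeric values are invented for illustration only.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Task:
    name: str
    wcet: float  # worst-case execution time assumed during schedulability analysis

def execution_time_plus(task, delta):
    # models an overly optimistic WCET estimate: the real task runs longer
    return replace(task, wcet=task.wcet + delta)

def execution_time_minus(task, delta):
    # models an overly pessimistic best-case estimate: the real task runs shorter,
    # which can change blocking patterns among tasks sharing resources
    return replace(task, wcet=task.wcet - delta)

t = Task("sampler", wcet=5.0)
print(execution_time_plus(t, 1.5).wcet)   # 6.5
print(execution_time_minus(t, 1.5).wcet)  # 3.5
```

A test suite is then judged adequate if it distinguishes (kills) such mutants, i.e. if some test case makes the mutated system miss a deadline that the original meets.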
3. BACKGROUND
As the test technique presented in this article targets RT faults, Section 3.1 presents a background on RT constraints. Section 3.2 provides an overview of the TLOST stress test methodology [5], and also discusses in detail its limitation in targeting all RT constraints.
3.1. Introduction to RT constraints
RT constraints are timing constraints on operations in DRTSs. For example, the specification of a nuclear power plant system might require that an over-heated reactor be cooled down within 5 s, or a catastrophic result will follow. There are usually two types of RT constraints and RT systems: hard and soft [1]. A hard RT (HRT) constraint on an operation enforces that the operation must complete within the specified time frame (e.g. 2 s) or the operation is, by definition, incorrect and unacceptable, and usually has an associated penalty if it is missed. The penalty can be infinite (in the case of a critical hard deadline). On the other hand, in the case of a soft RT constraint on an operation, the value of the operation declines according to a given value function after the deadline expires. RT tasks completed after their respective deadlines are considered less valuable than those whose deadlines have not yet expired [1].

As the stress testing methodology presented in this article is in the context of DRTSs and UML-driven development, and the goal is to increase the chances of failures in RT constraints, it is discussed next how RT constraints can be specified in UML models. The UML profile for Schedulability, Performance, and Time (UML SPT) [15] proposes comprehensive modelling constructs for timing information. The UML SPT profile briefly mentions soft and HRT constraints (Section 2.2.3 of [15]); however, it does not propose any specific stereotypes to distinguish between hard and soft RT constraints in UML models. Note that the UML SPT profile was the standard profile when this research was conducted. As of the time of this writing (February 2009), the Object Management Group is still finalizing a new, improved profile, called the UML profile for Modelling and Analysis of Real-time and Embedded Systems (MARTE) [16], which is expected to replace SPT in the near future.
It seems that MARTE will better support the specification of hard and soft RT constraints than the UML SPT profile. The framework reported in this article can easily be modified to work with MARTE, as its only dependency is on the tagged value that specifies the type of an RT constraint (hard or soft) in UML models. It should be noted that explicitly distinguishing soft and HRT constraints when modelling RT systems can be beneficial, as it helps analysts, developers, and testers tell the two types apart and perform different types of analyses for each. For example, stress testing with the intention of finding a HRT failure (violation of a HRT constraint) is more cost effective than targeting soft RT failures, as the failure costs of the former type are generally considered more severe than those of the latter.

In order to model hard and soft RT constraints, the author proposes two extensions to the RTaction stereotype of the UML SPT profile, referred to as hard RT action (HRTaction) and soft RT action (SRTaction). Example usages of the SRTaction and HRTaction stereotypes in a UML sequence diagram and an Interaction Overview Diagram (IOD)†† are demonstrated in Figure 1.
†† IODs were introduced as a new UML diagram in UML 2.0 (Section 14.4 of [17]). IODs 'define interactions through a variant of activity diagrams in a way that promotes overview of the control flow' [17]. IODs are specializations of activity diagrams in which the object nodes are either Interactions or InteractionOccurrences.
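The hard/soft distinction discussed above can be illustrated with a small sketch of how a test oracle might treat the two kinds of constraints. The linear value function and all numbers below are illustrative assumptions; neither SPT nor MARTE prescribes a particular value function.

```python
# Illustrative sketch (not from the SPT or MARTE profiles): how a test oracle
# might evaluate an observed duration against a hard versus a soft RT constraint.

def hard_rt_violated(observed, deadline):
    # a hard constraint is pass/fail: any overrun is, by definition, a failure
    return observed > deadline

def soft_rt_value(observed, deadline, decay=0.5):
    # a soft constraint's value declines after the deadline expires; the linear
    # decay used here is just one possible value function
    if observed <= deadline:
        return 1.0
    return max(0.0, 1.0 - decay * (observed - deadline))

print(hard_rt_violated(2.3, 2.0))        # True: the hard deadline is missed
print(soft_rt_value(2.3, 2.0))           # value degraded, but not zero
```

Under this framing, a stress test that drives `observed` past `deadline` exposes an RT failure outright for a hard constraint, while for a soft constraint it merely lowers the delivered value.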
[Figure 1. Example usage of the SRTaction and HRTaction stereotypes in a UML sequence diagram (sd M) and an IOD. Only fragments of the figure survive text extraction: objects o1, o2, and o3 deployed on nodes n1, n2, and n3; messages m1 and m2; an alt fragment with guard [condition] referencing SD1; and an «SRTaction» {RTduration … annotation.]