MBT 2006 Second Workshop on Model Based Testing
March 25–26, 2006 Vienna, Austria Satellite workshop of ETAPS 2006
Organizers Bernd Finkbeiner, Yuri Gurevich, and Alexander K. Petrenko
Preface

This volume contains the proceedings of the Second Workshop on Model Based Testing (MBT 2006). MBT 2006 is to be held on March 25–26, 2006, as a satellite workshop of the European Joint Conferences on Theory and Practice of Software (ETAPS 2006). The workshop is devoted to model-based testing of both software and hardware. Model-based testing uses models that describe the behavior of the system under consideration to guide such efforts as test selection and test results evaluation. Model-based testing has gained attention with the popularization of models in software/hardware design and development. Not all the models used now are suitable for testing; models with formal syntax and precise semantics are particularly important. Testing based on such a model helps one to measure to what degree the code faithfully implements the model. Techniques to support model-based testing are drawn from diverse areas, like formal verification, model checking, control and data flow analysis, grammar analysis, and Markov decision processes. The intent of this workshop is to bring together researchers and users using different kinds of models for testing and to discuss the state of the art in theory, applications, tools, and industrialization of model-based testing.

We would like to thank the program committee and all reviewers for the excellent job of evaluating the submissions. We are also grateful to the ETAPS 2006 organizing committee for their valuable assistance.

We use this opportunity to honor Alexandre Zamulin, a member of the program committee, who died after a long illness during the preparation of the workshop. Alexandre made significant contributions to numerous areas of computer science. He was a great citizen of the worldwide computer science community, a friend, and a wonderful human being. May his memory be blessed!

Bernd Finkbeiner, Yuri Gurevich, and Alexander K. Petrenko
February 2006
Program Committee of MBT 2006

Bernhard K. Aichernig (UNU-IIST, Macau)
Jonathan Bowen (South Bank University, UK)
Mirko Conrad (DaimlerChrysler, Germany)
John Derrick (University of Kent, UK)
Bernd Finkbeiner (Universität des Saarlandes, Germany)
Susanne Graf (Verimag, Grenoble, France)
Yuri Gurevich (Microsoft Research, USA)
Alexander S. Kossatchev (ISP RAS, Russia)
Darko Marinov (University of Illinois, USA)
Bruno Marre (Université Paris-Sud, France)
Jefferson Offutt (George Mason University, USA)
Alexander K. Petrenko (ISP RAS, Russia)
Alexandre Petrenko (Computer Research Institute of Montreal, Canada)
Nikolai Tillmann (Microsoft Research, USA)
Jan Tretmans (University of Nijmegen, Netherlands)
Alexandre Zamulin † (IIS RAS, Novosibirsk, Russia)
Additional Referees

Sergiy Boroday, Klaus Dräger, Yaniv Eytani, James K. Huggins, Jiale Huo, Lars Kuhtz, Sasa Misailovic, Laurent Mounier, Benoît Ries, Sven Schewe, Tim Willemse
Contents

An Extension of the Classification-Tree Method for Embedded Systems for the Description of Events
Alexander Krupp and Mirko Conrad . . . 1

A model-based integration and testing approach to reduce lead time in system development
N.C.W.M. Braspenning, J.M. van de Mortel-Fronczak, and J.E. Rooda . . . 10

Towards Test Purpose Generation from CTL Properties for Reactive Systems
Daniel Aguiar da Silva and Patrícia D. L. Machado . . . 27

Runtime Verification for High-Confidence Systems: A Monte Carlo Approach
Sean Callanan, Radu Grosu, Abhishek Rai, Scott A. Smolka, Mike R. True, and Erez Zadok . . . 41

Controlling Testing using Three-Tier Model Architecture
Antti Kervinen, Mika Maunumaa, and Mika Katara . . . 54

Testing Self-Similar Networks
Constantinos Djouvas, Nancy D. Griffeth, and Nancy A. Lynch . . . 69

Formal Conformance Testing of Systems with Refused Inputs and Forbidden Actions
Igor B. Bourdonov, Alexander S. Kossatchev, and Victor V. Kuliamin . . . 87

Test Case Generation for Mutation-based Testing of Timeliness
Robert Nilsson, Jeff Offutt, and Jonas Mellin . . . 102

When Model-based Testing Fails
Bernhard K. Aichernig and Chris George . . . 122
An Extension of the Classification-Tree Method for Embedded Systems for the Description of Events

Mirko Conrad (DaimlerChrysler AG, Alt-Moabit 96a, D-10559 Berlin, Germany)
Alexander Krupp (Paderborn University / C-LAB, Fuerstenallee 11, D-33102 Paderborn, Germany)
Abstract

Nowadays, model-based test approaches are indispensable for the quality assurance of in-vehicle control software. In practice, the Classification-Tree Method for Embedded Systems (CTMEMB) is used to realize a compact graphical representation of test scenarios. Up to now, the CTMEMB has been used mainly in the area of continuous systems. Although the depiction of events within test scenarios is already possible with the existing means of description, there is still room for improvement. In the following, we therefore introduce a novel extension of the Classification-Tree Method for Embedded Systems for a compact, natural depiction of event-like behaviour, which we illustrate by means of several examples from the area of in-vehicle control software.

Key words: classification-tree method, CTMEMB, embedded systems, test scenario description, events
1 Introduction
The selection of suitable, i.e. error-sensitive, test scenarios is the most crucial activity for a trustworthy test of in-vehicle software [3]. It finally determines the scope and quality of the test. Moreover, an appropriate description of the test scenarios used is essential for the human tester. Based on the data-oriented partitioning of the input domain of the system under test, the Classification-Tree Method for Embedded Systems CTMEMB [4,5,6] facilitates a systematic design of time-variable test scenarios and their graphical description. CTMEMB provides a compact, problem-oriented graphical representation, which is suitable for a human tester, containing a
high potential for understandability and reusability. The CTMEMB has recently been employed successfully in different control software development projects [12]. One of its main application areas is the testing of in-vehicle software developed in a model-based way [1,2,15]. A strength of the CTMEMB approach is the description of time-continuous test patterns. However, parts of modern automotive control systems are event-based, so events have to be a natural part of test descriptions for mixed discrete-continuous systems. The current CTMEMB notation is already capable of describing event-like test scenarios, but the resulting descriptions are unnecessarily complex. Therefore, a novel extension of the Classification-Tree Method for Embedded Systems facilitating a compact description of events is proposed in the remainder of the paper. Section 2 summarizes the main concepts and the notation of the CTMEMB. Section 3 describes the proposed extension for event description. Section 4 illustrates its application by means of application examples. Section 5 relates to other work and Section 6 concludes the paper.
2 The Classification-Tree Method for Embedded Systems
Classification trees were introduced in the early 90s by Grimm and Grochtmann for the structured representation of test cases [9,10]. The construction of classification trees and their associated combination tables is supported by the Classification-Tree Method (CTM), which is derived from the category-partition method [13]. In its basic form, a classification tree and the accompanying combination table describe abstract high-level test cases in a graphical manner without an explicit notion of time. Initially, the input domain of the system under test (SUT) is partitioned separately under various aspects relevant to the test. This is visually represented by means of a classification tree. Then, the resulting partitions are recombined within the combination table in order to form test cases. Since 1999, the method and notation have been enhanced by Conrad and Fey to accommodate the description of time-dependent test scenarios termed test sequences [4,5,6,7]. These extensions are known as the Classification-Tree Method for Embedded Systems CTMEMB (previously known as CTM/ES). In CTMEMB, the classification tree is derived directly from the technical interface of the system under test, i.e. each input of the SUT is represented as a classification in the tree. Each input domain is partitioned into intervals or single values, represented as classes below the accompanying classification. Abstract test sequences can be described by means of the classifications and classes constructed in this manner. These sequences consist of separate test steps, whose chronology is shown in the rows of the combination table beneath the tree. Each row represents a test step, in which each input in the classification tree is restricted to one of its classes via a marking in the combination table. The activation time of each test step is noted in a separate column of the combination table as Time Tag.
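To make the structure of such a test sequence concrete, the following minimal Python sketch (ours, not part of the CTM/ES tooling; all input names, class ranges and time tags are hypothetical) represents classifications with their classes and a combination table whose rows select one class per input at each time tag.

# Each classification partitions one SUT input into named classes (value intervals).
classifications = {
    "phi_Acc": {"zero": (0.0, 0.0), "low": (0.0, 40.0), "high": (40.0, 100.0)},
    "v_tar":   {"slow": (0.0, 30.0), "fast": (30.0, 120.0)},
}

# Each combination-table row is a test step: a time tag plus one class per input.
combination_table = [
    (0.0,  {"phi_Acc": "zero", "v_tar": "slow"}),
    (10.0, {"phi_Acc": "high", "v_tar": "fast"}),
    (25.0, {"phi_Acc": "low",  "v_tar": "fast"}),
]

def value_compatible(classification, cls, value):
    """A concrete value is compatible if it lies within the selected class interval."""
    lo, hi = classifications[classification][cls]
    return lo <= value <= hi

assert value_compatible("v_tar", "fast", 50.0)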
●  (full circular marking):  select any value from the class range
❍  (empty circular marking): return to / repeat the last value from the class (history-dependent marking)
■  (quadratic marking):      fire event

Table 1: Marking types
The input values between synchronization points are calculated by means of interpolation. The interpolation function can be chosen from different function types, e.g. step, ramp and sine functions. An interpolation function is represented in the combination table by a certain line style between two subsequent markings. Technically, the classifications of the classification tree describe an abstract state space with domain Xi for a classification i (Fig. 1, left). Each classification represents an input variable of the SUT. The test scenario is described in the time domain by means of synchronization points. If T = {t0, t1, ..., te} with t0 < t1 < ... < te denotes the set of synchronization points, then the time intervals [t0, t1], [t1, t2], ..., [te-1, te] are called test steps. A class function π̃i : T → Pi(Xi) assigns to each synchronization point a partition (i.e. a class) of Xi; it is described by means of the combination table. A value function vi : T → Xi assigns a value to each synchronization point of a classification i. vi is called compatible if for all times t ∈ T it holds that πi(vi(t)) ∈ π̃i, with the standard projection πi : Xi → Pi(Xi). An interpolation function ii : T → I assigns an interpolation rule with I = {step, ramp, sine} to an input i for every synchronization point in T. An interpolation rule ii,tk(t) : [tk, tk+1] → Xi provides the value of an input beginning at synchronization point tk through the interval [tk, tk+1]. When, e.g., a ramp or a sine is applied as interpolation rule with a compatible value function, the result is a continuous test-data waveform (time series) v̄i : T̄ → Xi, where T̄ ⊃ T is a strictly ordered set of times according to the classical notion of time (cf. [5]).
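As an illustration of how such an interpolation could be evaluated, here is a small Python sketch (illustrative only, not the CTM/ES tool implementation; the sampling time, values and the particular smooth "sine" transition are assumptions) that samples one test step with a step, ramp, or sine interpolation function.

import math

def interpolate(t0, t1, x0, x1, kind, dt):
    """Sample one test step [t0, t1) with sampling time dt, moving from the value x0
    at t0 towards the value x1 selected at the next synchronization point."""
    n = int(round((t1 - t0) / dt))
    samples = []
    for k in range(n):
        t = t0 + k * dt
        if kind == "step":
            x = x0                                              # hold the last value
        elif kind == "ramp":
            x = x0 + (x1 - x0) * (t - t0) / (t1 - t0)           # linear transition
        elif kind == "sine":                                    # one possible smooth transition
            x = x0 + (x1 - x0) * 0.5 * (1 - math.cos(math.pi * (t - t0) / (t1 - t0)))
        else:
            raise ValueError(kind)
        samples.append((t, x))
    return samples

# A ramp from 0 to 100 over 10 s, then holding 100 for another 10 s (hypothetical input).
waveform = interpolate(0.0, 10.0, 0.0, 100.0, "ramp", 1.0) \
         + interpolate(10.0, 20.0, 100.0, 100.0, "step", 1.0)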
3 A Description of the Extensions
In order to describe an input event i with the possible values i1, i2, ..., ik (or, alternatively, k exclusive events which are related), a classification i is created in the classification tree, containing the classes i1, i2, ..., ik and idefault for the default value, which essentially means 'event is NOT available'. The temporal sequence of events as well as the design of time-variable signal forms is described in the combination table beneath the classification tree. For this, an additional quadratic marking type "■" is introduced. An overview and short description of the available marking types is given in Table 1. If the appearance of an event ik needs to be described at the point in time ti in the combination table, a corresponding row with the time tag ti is inserted into the table. This row is marked with the new quadratic marking type underneath the class ik.
Figure 1. Modeling of Events: previous and proposed notation
For the beginning and the end of the test sequence, two more rows are usually needed at the beginning and the end of the combination table, which are marked underneath the default class idefault with one of the circular standard markings (Fig. 1, right-hand side). The full circular marking means that a specific value is chosen arbitrarily from the associated class, whereas an empty circular marking means that the same value is to be selected which was selected the last time the associated class was used. The meaning of the circular standard markings remains unchanged compared to the familiar form of CTMEMB (see [4]). A marking of a class ik at the point in time ti with the quadratic marking type means that at this point in time an event i with property k fires. After the point in time ti, the default value idefault is assumed until another quadratic marking is encountered at some synchronization point tj. The proposed extension of the classification-tree method for embedded systems by event markings allows a decidedly more compact description of events or similar issues when compared to the existing means of description. Up to now, two rows labeled with the time tags ti and ti+Δt were needed to describe an event ik at the time ti in the combination table, where Δt denotes the cycle time of the system under consideration. The first of the two rows describes the appearance of the new value ik, while the second row describes its retraction in the next possible time step (Fig. 1, left). The introduction of event markings can thus reduce the number of rows in the combination table by up to 50%. The result is significantly more compact and thus more readable combination tables. Fig. 1 juxtaposes the present (left) and the new, more compact method of description with
event markings (right) using a generic example. The proposed extension proves to be beneficial particularly for simultaneous descriptions of events and time-continuous signal forms within a classification tree. Where an equivalence class of the CTMEMB represents a value or a value interval, an event equivalence class usually represents a specific message or a class of messages to be sent. The underlying test bench is notified as soon as the message event is triggered. Based on the notion of test sequence and interpolation rule in CTMEMB [5], we describe an expanded notion which allows the introduction of events at precise moments in time. CTMEMB describes the derivation of continuous test sequences by means of variables and by test steps which are connected by interpolation rules. An event variable is a variable xi with a (finite and enumerable) domain Xi. For an event variable, we define a modified interpolation rule as follows. Let {t0, t1} be a set of (two) consecutive synchronization points with t0 ≤ t1 and with associated values x0, x1 ∈ X. Δt is called the sampling time; its multiples are kΔt, k ∈ ℕ0. Then a function j : X^{t0,t1} × ℕ0 → X associates a value from X with each discrete point in time kΔt within and including the boundaries {t0, t1}. We call j the discrete interpolation rule. One example of a discrete interpolation rule is

    j_impulse(k) = x0,  if k = 0
                   0,   otherwise

j_impulse facilitates the introduction of a cycle-accurate singular impulse into the discretized test data waveform such that the first sample of the synchronization interval at 0·Δt has the value x0 and all following samples have the value 0. This function also provides a discretization rule for signals described by way of Dirac distributions, as shown in [11] for signal derivatives. For message-sending events the discrete interpolation rule is

    j_event(k) = x0,    if k = 0
                 null,  otherwise

with x0 ∈ X, X = {(idefault ≡ null), i1, i2, ...} ⊂ (STRING ∪ {null}), where X is the input domain which encompasses a set of strings and a neutral element null representing the case that no message is queued for the respective input. Note that the application of j_event to an input signal must not be mixed with other interpolation rules on the same signal which would cover an input domain other than strings and null.
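The following Python sketch (an illustration under the abstractions above, not the tool implementation) evaluates the two discrete interpolation rules over one synchronization interval, using None as the neutral element null; the example event name is hypothetical.

def j_impulse(x0, k):
    """Impulse rule: the first sample of the interval carries x0, all later samples are 0."""
    return x0 if k == 0 else 0

def j_event(x0, k):
    """Event rule: the message x0 is present only at the synchronization point itself;
    afterwards the input falls back to the neutral element (None stands for null)."""
    return x0 if k == 0 else None

def sample_interval(rule, x0, t0, t1, dt):
    """Apply a discrete interpolation rule to every sample k*dt within [t0, t1]."""
    n = int(round((t1 - t0) / dt))
    return [rule(x0, k) for k in range(n + 1)]

print(sample_interval(j_event, "Set+", 0.0, 0.5, 0.1))
# ['Set+', None, None, None, None, None]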
4 Application Examples
The following subsections explain the event extension of the CTMEMB by means of two examples: the first example considers a 4-way cruise control lever, and the second example explains the temporal modeling of CAN bus messages.
Figure 2. Cruise Control Lever Positions
4.1 Event Groups
The modeling of event groups can be explained by considering the modeling of a cruise control lever. Normally, the control lever of an adaptive cruise control system is positioned in the middle. Short-term moving (tapping) of the lever in one of the four directions activates one of the functions Accelerate (Set+), Decelerate (Set-), Resume and Off (see Fig. 2). Immediately after tapping, the lever automatically returns to the normal position. These events cannot occur simultaneously by design. Using the above-mentioned extension of the CTMEMB for describing events, the position of the cruise control lever can be described through the classification LeverPos (compare Fig. 3). Class 0 corresponds to the default position, classes 1 to 4 to the four possible alternative events. In this way, usage scenarios or test patterns for an adaptive cruise control system can be described compactly. Besides control lever events, these usage scenarios also include continuous input signal forms for the other input signals, such as the two pedal positions phi_Acc and phi_Brake. Fig. 3 depicts such a test pattern for cruise controls, which was automatically generated with the aid of the EST (Evolutionary Safety Testing) approach. The depicted test pattern was derived from [14]; the temporal sequence of the control lever events is depicted on the top left. Without the extension for event depiction, the combination table would be roughly double its size. Furthermore, the ramp-shaped input signal segments of v_tar would have to be split up into two ramp segments each, requiring additional (auxiliary) classes belonging to v_tar.

Figure 3. Modeling of a Test Pattern for an Adaptive Cruise Control System with Event Group and Continuous Signal Waveforms
4.2 CAN Messages
The proposed extension is additionally well-suited for the description of bus communication between electronic control units. As an example, Fig. 4 describes the transmission behavior of two control units on a CAN bus (cf. [8]), which can send the CAN messages a, b, c and x, y, respectively. The two control units show independent transmission behavior; therefore, each control unit is modeled by an individual event group, i.e. a classification, to which the possible CAN identifiers and 'no msg' (for the default value) are assigned as classes. This way, it is possible to easily depict both the concurrent transmission attempts of the two control units (time tags t1 and t4 in Fig. 4) and the exclusive transmission of one control unit (time tags t2 and t3 in Fig. 4). A combination of the presented description of the transmission behavior of the control units with the mechanisms proposed in the original CTMEMB for the depiction of expected behavior at the classification-tree level (cf. [5]) allows a test description of complex CAN networks.

Figure 4. Modeling CAN Bus Signals
5 Related Work
An approach for improving the test coverage of Simulink models by means of the classification-tree method for embedded systems was introduced in [11]. There, an input is described by the actual signal form as well as by its derivative. The derivative waveform of the signals is described via Dirac impulses, which can be understood as an instantiation of the general approach to event description presented in this paper.
6 Summary
This paper presented an extension to the Classification-Tree Method for Embedded Systems (CTMEMB) for the compact description of events within test scenarios for mixed discrete/continuous systems. For the depiction of event-like issues, a new square marking type is used, which corresponds to an additional interpolation rule with event semantics. The proposed notational extensions of the CTMEMB allow very compact and natural descriptions of events, especially within test or usage scenarios where continuous and event-based inputs are to be combined. In comparison with the previously used approaches, the size of the combination tables can be reduced by up to 50%. Thanks to the open structure of the original CTMEMB framework, it is possible to integrate the extensions seamlessly into the present syntax and semantics. The extension of the Classification-Tree Method for Embedded Systems presented in this paper opens up new fields of application for the compact depiction of test scenarios by means of the CTMEMB, among them, for example, event-based body control systems and ECU clusters connected via CAN. The implementation of the introduced extensions in the supporting test tools is possible in a straightforward manner.
Acknowledgements

The work described was partially performed within the IMMOS project funded by the German Federal Ministry of Education and Research (www.immos-project.de). The authors wish to thank Wolfgang Mueller (C-LAB) and Ines Fey (DaimlerChrysler) for helpful discussions.
References

[1] Aldrich, W., Using Model Coverage Analysis to Improve the Controls Development Process, in: AIAA Modeling and Simulation Conference, Monterey, California, 2002.
[2] Angermann, A., M. Beuschel, M. Rau and U. Wohlfahrth, "Matlab – Simulink – Stateflow: Grundlagen, Toolboxen, Beispiele," Oldenbourg Verlag, München, 2004.
[3] Broekman, E. and E. Notenboom, "Testing Embedded Software," Addison-Wesley, London (GB), 2003.
[4] Conrad, M., A Systematic Approach to Testing Automotive Control Software, in: Proc. 30th Int. Congress on Transportation Electronics (Convergence '04), Detroit, MI, USA, 2004, pp. 297–308, SAE Techn. Paper #2004-21-0039.
[5] Conrad, M., "Modell-basierter Test eingebetteter Software im Automobil (Model-based Testing of Embedded Automotive Software)," PhD Thesis, Deutscher Universitäts-Verlag, Wiesbaden, 2004.
[6] Conrad, M., The Classification-Tree Method for Embedded Systems, in: Dagstuhl Seminar Proceedings 04371, 2005.
[7] Conrad, M., H. Dörr, I. Fey and A. Yap, Model-based Generation and Structured Representation of Test Scenarios, in: Workshop on Software-Embedded Systems Testing (WSEST), Gaithersburg, USA, 1999.
[8] Etschberger, K., editor, "CAN Controller-Area-Network," Fachbuchverlag Leipzig, 2002.
[9] Grimm, K., "Systematisches Testen von Software – Eine neue Methode und eine effektive Teststrategie (Systematic Software Testing – A New Method and an Effective Test Strategy)," Number 251 in GMD-Report, GMD, Oldenbourg, 1995.
[10] Grochtmann, M. and K. Grimm, Classification Trees for Partition Testing, Software Testing, Verification and Reliability 3(2), 1993, pp. 63–82.
[11] Krupp, A. and W. Mueller, Die Klassifikationsbaummethode für eingebettete Systeme mit Testmustern für nichtkontinuierliche Reglerelemente, in: INFORMATIK 2005 – Informatik LIVE!, Bd. 2, 35. GI-Jahrestagung, 3. ASWE Workshop, GI, Bonn, 2005.
[12] Lamberg, K., M. Beine, M. Eschmann, R. Otterbach, M. Conrad and I. Fey, Model-based Testing of Embedded Automotive Software using MTest, SAE 2004 Transactions, Journal of Passenger Cars – Electronic and Electrical Systems 7 (2005), pp. 132–140.
[13] Ostrand, T. J. and M. J. Balcer, The Category-Partition Method for Specifying and Generating Functional Tests, Commun. ACM 31(6), 1988, pp. 676–686.
[14] Pohlheim, H., M. Conrad and A. Griep, Evolutionary Safety Testing of Embedded Control Software by Automatically Generating Compact Test Data Sequences, SAE 2005 Transactions, Journal of Passenger Cars – Mechanical Systems (2005), pp. 804–814.
[15] Rau, A., "Model-Based Development of Embedded Automotive Control Systems," PhD Thesis, Dept. of Computer Science, University of Tübingen, Germany, 2002.
A model-based integration and testing approach to reduce lead time in system development

N.C.W.M. Braspenning, J.M. van de Mortel-Fronczak, and J.E. Rooda
Department of Mechanical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
Abstract

New methods and techniques are needed to reduce the very costly integration and test lead time in the development of high-tech multi-disciplinary systems. To facilitate lead time reduction, we propose a method called model-based integration, which can be described as follows. Models of system components that are not yet physically realized are integrated with available realizations of other components. The achieved combination of models and realizations is used for early analysis of the integrated system by means of validation, verification, and testing. This analysis enables early detection and prevention of problems that would otherwise occur during real integration. Early prevention of problems reduces the time invested in integration and testing of the real system. This paper illustrates how models developed for model-based integration can be used for automated model-based testing, which allows time-efficient determination of the conformance of component realizations with respect to their requirements. The combination of model-based integration and model-based testing is practically illustrated in a realistic industrial case study. Results obtained from this study encourage further research on model-based integration as a prominent method to reduce the integration and test lead time.

Key words: Model-based integration, model-based testing, industrial case study
This work has been carried out as part of the Tangram project under the responsibility of the Embedded Systems Institute. This project is partially supported by the Netherlands Ministry of Economic Affairs under grant TSIT2026.
1 Introduction
High-tech multi-disciplinary systems like wafer scanners, electron microscopes and high-speed printers are becoming more complex every day. These systems, consisting of numerous hardware and software components connected through many interfaces, have to meet the strict quality requirements set by the customer in market conditions where lead time (in the context of time to market) is critical. This increasing system complexity also increases the effort needed for the so-called integration and test phases. During these phases, the system is realized by combining component realizations (implementations) and, subsequently, tested against the system requirements. In most current development processes, the integration and test phases start when the component realizations become available, and these phases should be finished before the system's shipment date agreed with the customer. As a result, the main lead time burden is shifting from the design and implementation phases to the integration and test phases [6]. Furthermore, finding and fixing integration and test problems late in the system development process (which is the case in the current approach) can be up to 100 times more expensive than finding and fixing the problems during the requirements and design phases [3]. Many research activities that aim at countering this increase of development effort (in terms of lead time, costs, resources) involve model-based techniques like requirements modeling [8], model-based design [13,17], model-based code generation [9], and hardware-software co-simulation [19]. In most cases, however, these model-based techniques are investigated in isolation, and little work is reported on combining these techniques into an overall method. Although model-based systems engineering [18] and OMG's model-driven architecture [16] (for software-only systems) are such overall model-based methods, these methods mainly focus on the requirements, design, and implementation phases, rather than on the integration and test phases. Furthermore, literature barely mentions realistic industrial applications of such methods, at least not for high-tech multi-disciplinary systems. Our research within the Tangram project [20] focuses on a method of model-based integration, in which model-based techniques are developed and applied in industry in order to reduce the integration and test lead time. In this method, models of system components that are not yet physically realized are integrated with available realizations of other components, establishing a model-based integrated system. This model-based integrated system is used for analysis on the system level before all components are realized. This early analysis takes integration and test effort out of its critical position and enables the developers to detect and prevent problems that would otherwise occur during real integration (i.e. earlier and thus cheaper), eventually resulting in a reduction of the lead time. Furthermore, the model-based analysis techniques help in clarifying and improving the often difficult decomposition
of the requirements and design of the system (usually clear and certain) into the requirements and designs of all components (usually unclear and based on assumptions). These improved insights into the system decomposition eventually improve the quality of the system realization. After sufficient and successful validation and verification, the models used for model-based integration are good representations of the requirements and the designs of the corresponding components. When the realization of such a component becomes available, it would be interesting to determine whether this realization conforms to the model (and thus to the requirements and design) before integrating it into the system. When discrepancies between realization and model are found during this analysis, this means that either a problem in the realization is found that needs to be fixed, or it pinpoints incomplete or unclear parts of the requirements and the design that need improvement. Testing the conformance of a component realization with respect to a specification model is the topic of model-based testing research [7], for which several model-based test tools [14] are available. In this paper, we describe how model-based testing is positioned in the model-based integration method and used to determine whether a component realization conforms to the model developed for model-based integration. This model-based integration and testing approach has been applied to an industrial case study concerning the ASML [1] wafer scanner. The structure of the paper is as follows. The model-based integration method and the accompanying techniques and tools are introduced in Section 2. Section 3 describes how model-based testing is positioned in this method. The case study application and results are presented in Section 4. Finally, the conclusions are drawn and discussed in Section 5.
2 Model-based integration
In current industrial practice, the system development process is subdivided into multiple concurrent component development processes. Subsequently, the resulting components are integrated into the system. The development process of a component Ci consists of a requirements definition phase, a design phase, and a realization phase. Each of these three phases results in a different representation form of the component, namely the requirements, the design, and the realization of the component, denoted here as Ri, Di, and Zi, respectively. In the development process of a system S that consists of multiple components, e.g. components C1 and C2, the system requirements and system design, denoted here as R and D, respectively, precede the development processes of the components. The realization of system S is the result of the integration of realizations Z1 and Z2 of components C1 and C2. This integration is denoted as {Z1, I12, Z2}, where I12 denotes the infrastructure connecting Z1 and Z2. Figure 1 shows the development process of system S. In this way of working, only two types of system-level analysis can be applied.
Fig. 1. Current system development process
On one hand, the consistency between requirements and designs on component level and on system level can be checked, e.g. R vs. R1, R2 and D vs. D1, D2 (which usually boils down to reviewing lots of documents). On the other hand, the integrated system realization, e.g. {Z1, I12, Z2}, can be tested against the system requirements R, which requires that all components are realized and integrated. This means that when problems occur and need to be fixed during the integration and test phases, the effort invested in these phases immediately increases, directly threatening on-time system shipment. We propose a method of model-based integration to reduce the integration and test lead time. In this method, the designs of the components (e.g. software, mechanics, electronics) are represented by formal, executable models of communicating concurrent processes, expressed in a process algebra [2]. The resulting models, denoted here as Mi for a component Ci, enable formal analysis of component and system behavior. With model validation (e.g. simulation), it can be checked whether the behavior of the integrated models, e.g. {M1, I12, M2}, conforms to the system design D. With model verification (e.g. model checking), it can be checked whether certain properties from the system requirements R are satisfied by the integrated models. Analysis by validation and verification helps in evaluating and improving the correctness of the decomposition of the requirements and design of the system into the requirements and designs of the components. Besides enabling these additional model-based analysis techniques, models can also replace realizations. This means that integrations of models and realizations can be tested against the system requirements without the necessity that all component realizations are available. As models are usually available earlier than realizations, testing on the system level can start earlier. Earlier testing allows earlier detection and prevention of system integration problems, which should lead to a reduction of the effort invested during real integration and testing. Figure 2 shows the development process of system S in the model-based integration method, where Mi denotes a model of component Ci (based on its design Di), and where I12 denotes an infrastructure that allows the integration of components C1 and C2, both represented by either a model or a realization. Note that with code generation, the realization of a software component Zi could also be based on its model Mi.
Fig. 2. System development process in the model-based integration method
In our research, we use the timed process algebra χ [23], developed at the Systems Engineering Group of Eindhoven University of Technology, for modeling components. For each component, the internal behavior of the process (assignments, guarded alternatives, guarded repetitions, delays) and the external communication with other processes (sending, receiving) is modeled. The system of components is modeled as the parallel composition of all instantiated component processes, connected by communication channels. The χ toolset contains a simulator to simulate such a system model. Furthermore, several back ends have been added to the χ toolset to enable other analysis techniques like model verification, distributed/real-time simulation, and software/hardware-in-the-loop testing, which are all used in the model-based integration method. As mentioned in the introduction, it would be interesting to determine whether a component realization, when it becomes available, conforms to the model used for model-based integration, before the component realization is integrated with other components. Therefore, the set of analysis techniques mentioned previously is extended with model-based testing, as described in the next section.
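As a loose analogy (plain Python, not χ syntax; the component names and commands are invented), the following sketch mimics two component processes that run in parallel and communicate over channels, in the spirit of the controller and laser models of the case study in Section 4.

import queue, threading

command = queue.Queue()    # channel: controller -> component
response = queue.Queue()   # channel: component -> controller

def controller(sequence):
    """Send a configured command sequence and collect the responses."""
    for cmd in sequence:
        command.put(cmd)
        print("controller received:", response.get())
    command.put(None)                      # close the channel

def component():
    """React to every command with a response until the channel is closed."""
    state = "off"
    while (cmd := command.get()) is not None:
        state = {"go_standby": "standby", "go_off": "off"}.get(cmd, state)
        response.put("state " + state)

# 'Parallel composition' of the two processes, connected by the two channels.
procs = [threading.Thread(target=controller, args=(["go_standby", "go_off"],)),
         threading.Thread(target=component)]
for p in procs: p.start()
for p in procs: p.join()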
3 Model-based testing
Model-based testing provides theories and tools for automated testing, which is receiving more and more attention as an alternative to manual and scripted testing, since these are becoming incapable of finding all errors in time. In model-based testing, a formal specification model of a component is used to generate tests, and these tests are executed on-the-fly on the component realization, resulting in a 'pass' or 'fail' test verdict. In the Tangram project, the test tool TorX [21], based on the theory of input-output conformance (ioco), is used for model-based testing. While extensions towards timed testing [4] and testing with more complex data [12] are being developed, the version of TorX used in our experiments only supports model-based testing of untimed, discrete-event systems without complex data.
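To illustrate the on-the-fly idea in isolation (a drastically simplified Python sketch, not TorX and not full ioco: quiescence, nondeterminism and input-enabledness are ignored; states, inputs and outputs are invented), the loop below randomly selects inputs, applies them to the system under test, and compares every observed output against a specification model until a discrepancy yields a 'fail' verdict.

import random

# Hypothetical specification model: (state, input) -> (expected output, next state).
SPEC = {
    ("off", "go_standby"):     ("state standby", "standby"),
    ("off", "go_off"):         ("not allowed",   "off"),
    ("standby", "go_off"):     ("state off",     "off"),
    ("standby", "go_standby"): ("not allowed",   "standby"),
}

def test_on_the_fly(sut_step, depth=1000, seed=0):
    """Drive the system under test with randomly selected inputs and compare every
    observed output with the model; return a 'fail' verdict at the first discrepancy."""
    rng, state = random.Random(seed), "off"
    for _ in range(depth):
        inp = rng.choice(["go_standby", "go_off"])
        expected, next_state = SPEC[(state, inp)]
        observed = sut_step(inp)
        if observed != expected:
            return "fail", state, inp, expected, observed
        state = next_state
    return "pass", None, None, None, None

class DummySUT:
    """A correct dummy implementation, standing in for the real component."""
    def __init__(self): self.state = "off"
    def step(self, inp):
        out, self.state = SPEC[(self.state, inp)]
        return out

print(test_on_the_fly(DummySUT().step)[0])   # 'pass'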
As previously mentioned, the models used for model-based integration are developed in χ, which is currently not supported by TorX. However, TorX supports Trojka [10], a slightly modified version of Promela, the specification formalism of the model checker Spin [15]. Spin is also used in the model verification back end of the χ toolset [5], for which a translation scheme from χ to Promela has been developed [22]. By combining the translation of χ to Promela and the model-based testing capabilities of TorX, the conformance of a component realization with respect to the χ model used for model-based integration can be determined. This approach is visualized in Figure 3 for system S, in the case that the realization of component 2 becomes available first. In this figure, the χ model of component 2, M2,χ, used for model-based integration with M1,χ, is translated into a Promela equivalent, M2,P. Subsequently, TorX tests whether the realization Z2, used for model-based integration with M1,χ and later for real integration with Z1, is ioco conforming to M2,P.
Fig. 3. Model-based integration and test approach with χ and TorX
The procedure for the model-based integration method extended with model-based testing is as follows:

1. Modeling of components, e.g. M1 and M2, based on their designs.
2. Validation and verification of the model-based integrated system with models only, e.g. χ simulation and Spin model checking of {M1, I12, M2}.
3. For each component:
   (a) Replacement of the model by the realization, e.g. M2 by Z2, using an infrastructure that enables the integration with the other components.
   (b) Model-based testing of the realization with respect to the model, e.g. Z2 with respect to M2 using TorX.
   (c) Testing of the model-based integrated system with models and realizations, e.g. {M1, I12, Z2}.
4. Testing of the real integrated system, e.g. {Z1, I12, Z2}.

Note that in this paper, model-based testing is used for components only,
although the same technique can also be applied to a system of components. Model-based testing of a system of components can be achieved by using the integrated component models, e.g. {M1, I12, M2}, as a basis for test generation and by using the integrated component realizations, e.g. {Z1, I12, Z2}, as the system under test. This is possible as long as the size and complexity of the integrated models are not beyond the limitations of the test tool (model abstraction can be used to solve this issue), and as long as the test tool can access the required test interfaces of the integrated system realization. Furthermore, this paper does not use a compositional testing technique, as described in [24], to imply conformance of the integrated system based on the conformance of the individual components. This compositional testing technique requires that the models are complete, i.e. that they explicitly specify all allowed responses for any possible input, which is not the case for the models developed for model-based integration.
4 Case study: ASML laser subsystem
The model-based integration and testing approach described in the previous sections has been applied to a case study concerning the laser subsystem of the ASML [1] wafer scanner, which is used in the lithography industry for the production of integrated circuits or chips. In a wafer scanner, the lithographic process of exposing a silicon wafer with a certain pattern (corresponding to one layer of a chip) takes place. The laser subsystem of a wafer scanner generates the laser light that is used for this lithographic process. A controller, which is part of the wafer scanner, communicates with the laser subsystem in order to get the required amount of laser light for each exposure. This communication is realized by two bi-directional interfaces: a serial (RS232) interface for commands and responses and a parallel interface for multiple status signals. Experience has shown that the interface between the wafer scanner controller and the laser subsystem is difficult to understand, integrate and diagnose. This is mainly caused by the fact that the laser subsystem is produced by a third-party manufacturer, meaning that the ASML engineers do not have full insight into and control over the behavior as implemented in the laser subsystem. Therefore, correct integration of the wafer scanner controller and the laser subsystem is an important aspect for the performance and reliability of the wafer scanner. For safety and cost reasons, a hardware laser simulator has been used in the case study instead of the real laser. This hardware laser simulator has the same electrical interfaces, and the software running on it is specified to behave exactly the same as the real laser. The laser simulator has been developed by ASML and is used for testing the software and electronics of the wafer scanner controller, without the need for a real laser (including the required space and facilities). Because the laser simulator is used for testing of the wafer scanner controller, it is important that the behavior of the laser simulator satisfies the behavior specification of the real laser, in order to avoid faulty test outcomes
and, even worse, faulty fixes in the wafer scanner controller. In the case study, we illustrate the application of steps 1, 2, 3a, and 3b of the procedure described in the previous section. χ models of the wafer scanner controller and of the laser subsystem have been developed, integrated, and analyzed by χ simulation and Spin model checking. Subsequently, the conformance of the hardware laser simulator with respect to the Promela equivalent of the laser subsystem χ model is determined using model-based testing with TorX. The application and the results of each of these steps are presented in the sequel.

Step 1: Modeling of components

The specification documents of the laser subsystem and of the communication with the wafer scanner controller have been taken as a starting point for modeling the components. Our experience is that the modeling activities help in finding and clarifying errors, inconsistencies, and incompleteness in the requirements and design documents. Figure 4 shows the processes (circles) and communication channels (arrows) that have been modeled as described below. Note that processes IO, LC, and LS are all part of the laser subsystem model.
Fig. 4. Processes and channels of wafer scanner controller and laser subsystem
Wafer scanner controller C: This process can be configured (using an external configuration file) to execute specific command sequences for behavior validation, e.g. operational sequences as specified in the documentation.
I/O interface IO: This process receives the commands from C and, after the handling of the commands by LC or by LS, it sends the responses back to C.

Laser communication LC: This process receives the commands from C (passed through by IO) and, according to its configuration (stored in an external configuration file), it performs the necessary actions (e.g. a state change) and creates the corresponding responses.

Laser state LS: This process keeps track of the laser state, which is used by IO for the response to a laser state query command by C.

Each process definition contains the state and temporal behavior of the component and the communication behavior including the data that is communicated. Here, the communication involves both the serial and the parallel interface of the laser subsystem. Furthermore, the model of the laser subsystem contains the error handling of 'unknown' commands (unspecified commands) and 'bad context' commands (specified commands that are not allowed in a certain state). As previously mentioned, the temporal behavior and the complex data, frequently used in the models developed for model-based integration, are not supported by the version of TorX used in the case study. Also the χ to Promela translation scheme used in step 2 of the case study and the Promela language itself have their limitations concerning time and data. Therefore, the original χ models have been made suitable for translation to Promela and for model-based testing with TorX by applying abstractions from time and complex data. These abstractions do not influence the state and communication behavior of the system that is analyzed in steps 2 and 3 of the case study. The resulting χ models are configurable in the sense that the command sequences of the wafer scanner controller and the behavior of the laser subsystem can be modified in external configuration files without modification and recompilation of the χ models. This flexibility of behavior modeling has shown its advantage when the hardware laser simulator was not available for a certain laser type and another laser type had to be modeled. The model-based integrated system, consisting of both the wafer scanner controller and the laser subsystem, is obtained by the parallel composition of all χ processes. Here, the parallel composition operator, defined in the process algebra χ, is used as the infrastructure between the two components (corresponding to I12 in Figure 3). The resulting χ system model contains 350 lines of code in total, including the necessary data definitions and functions.

Step 2: Validation and verification of integrated models

The model-based integrated system developed in step 1 has been validated using the model simulator of the χ toolset. Several simulation runs have been executed, in which the command sequences from the specification documents, e.g. for switching the laser subsystem on and off, are specified in the configuration file of the wafer scanner controller model C. Based on the simulation
results, the laser subsystem behavior conforms to the specification documents for all command sequences. Besides validation, certain properties of the model-based integrated system have also been verified by Spin model checking. To perform this type of analysis, the χ model has been translated into Promela, using the translation scheme from [22]. As previously mentioned, this translation scheme and the Promela language itself have their limitations regarding time and data; however, the abstractions applied in step 1 result in a model that is suitable for translation to Promela. The resulting Promela system model contains 1850 lines of code, including 900 lines for representing the equivalent of all data definitions used in the χ model and 300 lines for representing the equivalent of all functions used in the χ model as additional processes. Besides a model expressed in Promela, the properties to be verified have to be specified for model checking with Spin. Eight properties of the system have been verified: absence of deadlock, a system invariant concerning the translation of a specific χ statement, and six model-specific behavioral properties. Checking the absence of deadlock, or invalid end states, is a standard option in Spin. The system invariant has been checked by defining a safety property on the precondition and the postcondition of the translated χ statement, which is expressed in the linear temporal logic (LTL) formula □(precondition → ♦postcondition). Two of the model-specific behavioral properties concern the allowed order of state transitions, e.g. from the 'off' state, the laser state can only become 'standby' without being 'on' in between, which is expressed in the LTL formula □(state_off → (¬state_on U state_standby)). The other four model-specific behavioral properties concern all possible actions and responses to each command. For example, when the laser receives the 'go off' command while it is in the 'off' state or in the 'on' state, it stays in the current state and responds with 'not allowed'; or, when it receives the 'go off' command while it is in the 'standby' state, it goes to the 'off' state and responds with 'state off'. This is expressed in the LTL formula:

    □(cmd_go_off → ♦((state_off U rsp_not_allowed) ∨
                      (state_on U rsp_not_allowed) ∨
                      (state_standby U (state_off ∧ (state_off U rsp_state_off)))))

All these properties have been verified and found to be correct. Based on these verification results, together with the correct simulation results, there is enough confidence that the model is a good representation of the requirements and the design of the laser subsystem, and therefore a good basis for automated model-based testing of the hardware laser simulator.

Step 3a/3b: Model-based testing of the laser subsystem

In this step of the case study, we use the model developed in step 1, and validated and verified in step 2, for automated model-based testing of the
hardware laser simulator using the TorX test tool. As visualized in Figure 3, TorX is connected to the model on one side and to the realization (in this case the hardware laser simulator) on the other side. To connect the model to TorX, the Promela model of step 2 has been slightly modified, resulting in a Trojka model that is suitable for TorX. A Trojka model must be an open model, meaning that some channels of the processes are not connected to other processes. These unconnected channels are the so-called observable channels, on which test inputs can be given and on which test outputs can be observed. In the case study, an open Trojka model with observable channels has been obtained by removing the C process from Figure 4 and by giving the unconnected command and response channels of process IO the special channel attribute OBSERVABLE. The resulting Trojka model of only the laser subsystem contains 1000 lines of code, including 300 lines for representing all data definitions and 300 lines for representing all functions. To connect the realization to TorX, the abstract commands from the Trojka model need to be transformed into the real commands for the realization, and vice versa for the responses of the realization. As previously mentioned, the real commands and responses for the hardware laser simulator (as well as for the real laser) are sent and received through a serial interface and a parallel interface. Unfortunately, direct access to these interfaces from outside, as required for model-based testing with TorX, is limited. While functionality for direct access from outside is provided for the serial interface, this is not the case for the parallel interface, because this interface uses an ASML-specific communication protocol, embedded in the electronics of the wafer scanner controller. This limitation in interface access from outside drastically reduces the laser subsystem behavior that can be automatically tested, since the larger part of the laser subsystem state space can only be reached by using the parallel interface. However, the reduced behavior that can be tested with serial communication only is still sufficient to demonstrate automatic testing with TorX based on the models developed for model-based integration. For correct communication over the serial interface, an adapter component has been written in Python that accepts test inputs from TorX, performs the necessary transformations of the abstract model commands into real laser commands (e.g. a left-justified string of 128 bits), and uses the provided direct access functionality to send the command over the serial interface. The response to the command is received from the laser subsystem, and after it is transformed back into the abstract response used in the model, it is sent back to TorX by the adapter component. Now that both the model and the realization have been connected to TorX, the conformance of the hardware laser simulator with respect to the model can be determined by model-based testing. For all three test runs that have been performed, a random test selection strategy is used to select the test inputs
from the set of commands in the Trojka model. The selected commands are sent to the laser simulator through the adapter and the provided direct access functionality, and subsequently the responses from the laser simulator are observed and compared with the behavior specified in the model. The first test run had a limited depth (less than 20 events) and took less than ten seconds until a discrepancy between realization and model was found. To clarify the discrepancy found, Figure 5 shows the state diagram of the laser subsystem that has been automatically tested (i.e. using the serial interface only). In this figure, the nodes depict the states of the model, the solid edges depict the commands sent over the serial interface (starting with ‘LS’), and the dashed edges depict the responses to the commands (starting with ‘LS’ or ‘??’). The central states at the top and bottom denote the actual laser states ‘off’ and ‘standby’, numbered ‘00’ and ‘03’, respectively. The ‘state change’, ‘state query’, and ‘bad context’ states are intermediate states between the commands and the corresponding responses. Note that any other command not shown in the figure results in an ‘unknown command’ response (‘??=00’).
Fig. 5. Laser subsystem behavior that has been tested
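For reference, the serially testable behavior of Fig. 5 can be paraphrased in a few lines of Python (our reading of the figure and of the specifications described in the surrounding text, not the actual χ/Promela model); the assertions at the end correspond to the expected responses at which the first and second test runs found discrepancies.

# States '00' (off) and '03' (standby); commands 'LS=00', 'LS=03', 'LS?';
# error responses '??=02' (bad context) and '??=00' (unknown command).
class LaserModel:
    KNOWN = {"LS=00": "00", "LS=03": "03"}

    def __init__(self):
        self.state = "00"                    # the laser starts in the 'off' state

    def command(self, cmd):
        if cmd == "LS?":                     # state query
            return "LS=" + self.state
        if cmd not in self.KNOWN:            # unspecified command
            return "??=00"
        target = self.KNOWN[cmd]
        if target == self.state:             # request to enter the current state
            return "??=02"                   # 'bad context'
        self.state = target                  # state change
        return "LS=" + self.state

laser = LaserModel()
assert laser.command("LS=03") == "LS=03"     # off -> standby
assert laser.command("LS=03") == "??=02"     # first test run: simulator answered 'LS=03' here
assert laser.command("LS=01") == "??=00"     # second test run: simulator answered with the state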
Figure 6 shows the message sequence chart of the first model-based test run, where the 'TDRV' thread represents the test tool, the 'iut' (implementation under test) thread represents the realization, and the 'out' thread represents the output of the test run. Note that the '=' and '?' characters are
replaced by ‘ eq ’ and ‘ QM’, respectively, because they are not allowed to be used in Promela/Trojka. When the hardware laser subsystem is in the ‘standby (03)’ state (the response to ‘LS?’ is ‘LS=03’, fourth arrow from below), it receives the command ‘LS=03’ (third arrow from below) and responds with the current laser state, ‘LS=03’ (second arrow from below). However, according to the specifications (see Figure 5), giving a command to go to the current state (here the command ‘LS=03’ in the ‘standby (03)’ state) should result in a ‘bad context’ response (‘??=02’), which is indicated by the last ‘Expected’ arrow in Figure 6.
Fig. 6. Message sequence chart showing the discrepancy
This discrepancy, resulting in a ‘test fail’ verdict, means that the realization is not conforming to the model. Further diagnosis has shown that this non-conformance is due to an incorrect implementation of the error handling behavior of the hardware laser simulator. Directly fixing this error in the laser simulator software was impossible, because the required knowledge and tools were not available at that moment. Therefore, in order to enable further testing, a small modification has been made in the model such that for the next test run there is no discrepancy between model and realization for the handling of the ‘bad context’ error. 22
The second test run had a limited depth as well (less than 20 events) and again took less than ten seconds until another discrepancy was found, involving the handling of the ‘unknown command’ error. According to the laser subsystem specifications, any laser command other than ‘LS=00’, ‘LS=03’, or ‘LS?’, for instance ‘LS=01’, is an unknown command and should thus result in an ’unknown command’ response (??=00). However, when such an ‘unknown’ command (e.g. ‘LS=01’) was selected, which is an allowed test input, the laser simulator responded with the current laser state, as if the laser state query command ’LS?’ was given. Further diagnosis has shown that also this non-conformance is due to an incorrect implementation of the error handling behavior of the hardware laser simulator. To enable further testing, the set of allowed test inputs has been restricted to known commands only, i.e. ‘LS=00’, ‘LS=03’, or ‘LS?’, such that for the next test run the ‘unknown command’ error handling of the laser simulator will not be tested. The third test run, in which the discrepancies found in the first two test runs are not detected any more, kept going for a long time (test depth of more than 1000 events, taking more than 15 minutes), without finding new discrepancies. Although no coverage metrics have been applied (as this is not a current feature of TorX), the test results of the third test run provided enough confidence that no other discrepancies between the realization and the model would be found. The two implementation errors that have been found by automatic modelbased testing are both related to the error handling behavior of the laser simulator. After discussion with the ASML engineers, it became clear that the laser simulator is mainly used for testing the wafer scanner controller under nominal behavior conditions. Although the errors may appear to be trivial and should normally not be encountered during nominal testing with the laser simulator, this experiment shows that such errors are not easily detected in the current industrial way of working, and that a more systematic approach like model-based integration and testing certainly has potential. Furthermore, when these errors would remain undetected, they may still have a substantial impact in the case that the wafer scanner controller contains errors related to the laser simulator errors. In that case, the errors in the wafer scanner controller remain hidden when the development tests rely on the laser simulator, and these errors may cause problems later when the wafer scanner controller is used together with a real laser in a real production environment. In the testing experiments that have been performed, only the relation between individual commands and responses has been tested using the regular testing features of TorX. However, it would also be interesting to test the specific behavioral properties as verified in step 2 of the case study, involving relations of subsequent state transitions and combinations of responses and current states. Focussing model-based test runs towards such specific behaviors can be achieved by defining test purposes, a feature that is supported by TorX [11]. 23
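The random test selection and the restriction to known commands used in these runs can be pictured as a small driver loop. This is only a sketch of the idea; adapter.send and spec are placeholder names for the Python serial adapter and the model oracle described earlier, and none of the names correspond to actual TorX interfaces.

import random

KNOWN_COMMANDS = ["LS=00", "LS=03", "LS?"]   # restricted input set of the third run

def run_random_test(adapter, spec, initial_state="off", max_depth=1000, seed=None):
    # adapter.send(cmd) stands for the serial adapter that stimulates the
    # realization; spec(state, cmd) returns the (response, next state) pair
    # predicted by the model. Testing stops at the first discrepancy.
    rng = random.Random(seed)
    state = initial_state
    for depth in range(max_depth):
        cmd = rng.choice(KNOWN_COMMANDS)       # random test selection
        observed = adapter.send(cmd)           # stimulate the realization
        predicted, state = spec(state, cmd)    # model prediction
        if observed != predicted:
            return ("fail", depth, cmd, observed, predicted)
    return ("pass", max_depth, None, None, None)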
5
Conclusions
A method of model-based system integration with χ is extended with modelbased testing using TorX. Both the method and the extension are successfully applied in a realistic industrial case study to practically illustrate the advantages of model-based validation, verification, and testing, by means of χ simulation, Spin model checking, and test generation and execution with TorX, respectively. These model-based analysis techniques facilitate detection of documentation errors and provide automatic detection of non-conformance of a component realization with respect to the corresponding requirements. The case study presented is an instructive investigation of the advantages and challenges of combining model-based integration and model-based testing. The use of process algebra χ, as a basis for model-based integration, allows easy specification of state, temporal and communication behavior of components; moreover, several complex data structures are supported in χ. Models implemented in χ can easily be integrated, allowing the application of different model-based system analysis techniques. With further improvements of these analysis techniques, e.g. time extensions for Spin and TorX, and by improving and automating the translation from χ to Promela, we can achieve a powerful environment for formal model-based system analysis and automated test generation. Additionally, more work is needed on facilitating the integration of models and realizations. Currently, an infrastructure that allows straightforward coupling of models and realizations is under development. This model-based integration infrastructure should be capable of dealing with the issues of synchronous/asynchronous communication and real-time execution of distributed components. This paper practically illustrates a formal approach for early detection and prevention of system integration problems, on one hand, and time-efficient determination of conformance of component realizations with respect to their requirements, on another. Both early prevention of problems and efficient conformance testing contribute to the reduction of integration and test lead time, in comparison with current industrial practice focussing only on realizations and usually using manual test techniques. Altogether, the current results of the model-based integration method look promising, however further research and application in industry is needed to really illustrate a reduction of integration and test lead time in the development of high-tech multidisciplinary systems.
6
Acknowledgements
The authors would like to thank Nicola Trčka, René de Vries, and Will Denissen for their support regarding Promela, TorX and test infrastructure. We also thank Jan Tretmans, Dragan Kostić and all other Tangram project members for their valuable comments and fruitful discussions.
References [1] ASML, Website (2005), http://www.asml.com. [2] Baeten, J. and W. Weijland, “Process Algebra,” Cambridge Tracts in Theoretical Computer Science 18, Cambridge University Press, 1990. [3] Boehm, B. and V. Basili, Software defect reduction top 10 list, IEEE Computer 34 (2001), pp. 135–137. [4] Bohnenkamp, H. and A. Belinfante, Timed testing with TorX, in: FM 2005: Formal Methods: International Symposium of Formal Methods Europe, Newcastle, UK, Lecture Notes in Computer Science 3582 (2005), pp. 173–188. [5] Bortnik, E., N. Trˇcka, A. Wijs, S. Luttik, J. van de Mortel-Fronczak, J. Baeten, W. Fokkink and J. Rooda, Analyzing a χ model of a turntable system using SPIN, CADP and Uppaal, Journal of Logic and Algebraic Programming 65 (2005), pp. 51–104. ¨ [6] Bratthall, L., P. Runeson, K. Adelsward and W. Eriksson, A survey of lead-time challenges in the development and evolution of distributed real-time systems, Information and Software Technology 42 (2000), pp. 947–958. [7] Brinksma, E. and J. Tretmans, Testing transition systems: An annotated bibliography, in: MOVEP 2000 – Modelling and Verification of Parallel Processes, Lecture Notes in Computer Science 2067 (2001), pp. 187–195. [8] Broy, M. and O. Slotosch, From requirements to validated embedded systems, in: Embedded Software: First International Workshop, EMSOFT 2001, Tahoe City, CA, USA, Springer-Verlag (2001), pp. 51–65. [9] Budinsky, F., M. Finnie, J. Vlissides and P. Yu, Automatic code generation from design patterns, IBM Systems Journal 35 (1996), pp. 151–171. [10] de Vries, R. and J. Tretmans, On-the-fly conformance testing using Spin, in: Fourth Workshop on Automata Theoretic Verification with the Spin Model Checker, ENST 98 S 002 (1998), pp. 115–128. [11] de Vries, R. and J. Tretmans, Towards formal test purposes, in: Proceedings of the Workshop on Formal Approaches to Testing of Software, FATES ’01, Aalborg, Denmark, BRICS Notes Series NS-01-4, University of Aarhus (2001), pp. 61–76. [12] Frantzen, L., J. Tretmans and T. Willemse, Test generation based on symbolic specifications, in: Proceedings of the Workshop on Formal Approaches to Software Testing (FATES 2004), Linz, Austria, Lecture Notes in Computer Science 3395 (2004), pp. 1–15. [13] Gomaa, H., “Designing Concurrent, Distributed, and Real-Time Applications with UML,” Addison-Wesley Professional, 2000, 1st edition. [14] Hartman, A., Model-based test generation tools, AGEDIS report, AGEDIS project (2002).
[15] Holzmann, G., The model checker SPIN, Software Engineering Journal 23 (1997), pp. 279–295. [16] Kleppe, A., W. Bast and J. Warmer, “MDA Explained: The Model Driven Architecture: Practice and Promise,” Addison-Wesley Professional, 2003, 1st edition. [17] Liu, X., J. Liu, J. Eker and E. Lee, Heterogeneous modeling and design of control systems, in: Software-Enabled Control: Information Technology for Dynamical Systems, Wiley-IEEE Press, 2003 pp. 105–122. [18] Ogren, I., On principles for model-based systems engineering, Systems Engineering 3 (2000), pp. 38–49. [19] Rowson, J., Hardware/software co-simulation, in: DAC ’94: Proceedings of the 31st annual conference on Design automation (1994), pp. 439–440. [20] TANGRAM Project, Website (2003), http://www.esi.nl/tangram. [21] Tretmans, J. and E. Brinksma, TorX : Automated model based testing, in: First European Conference on Model-Driven Software Engineering, AGEDIS project, 2003. [22] Trˇcka, N., Verifying χ models of industrial systems with Spin, Computer Science Reports 05-12, Eindhoven University of Technology (2005). [23] van Beek, D., K. Man, M. Reniers, J. Rooda and R. Schiffelers, Syntax and semantics of timed chi, Computer Science Reports 05-09, Eindhoven University of Technology (2005). [24] van der Bijl, M., A. Rensink and J. Tretmans, Compositional testing with ioco, in: Formal Approaches to Software Testing: 3rd Int. Workshop, FATES 2003, Montreal, Quebec, Canada, Lecture Notes in Computer Science 2931 (2003), pp. 86–100.
MBT 2006
Towards Test Purpose Generation from CTL Properties for Reactive Systems Daniel Aguiar da Silva 1,2 and Patrícia D. L. Machado 3 Grupo de Métodos Formais Universidade Federal de Campina Grande Campina Grande, Brazil
Abstract This paper presents an approach for the generation of test purposes in the form of labelled transition systems from specifications of properties in CTL . The approach is aimed at adapting the model checking process, by extending search algorithms to perform further analysis so that examples and counter-examples can be extracted. An algorithm for the generation of test purposes through analysis over the examples and counter-examples is presented, along with a case study to show the correspondence between the CTL properties and the generated test purposes. Key words: Test Purpose, Testing, Formal Testing, Formal Verification, Model Checking.
1
Introduction
Specifying systems that are reactive, distributed and concurrent can be complex and error-prone. Many formalisms (e.g. [16,13,14]) have been defined to support this task, producing more precise and correct models. Thus, the verification of properties against models may be done through formal methods, becoming automated and more rigorous. The success of applying formal verification techniques (e.g. model checking [4]) to software development is increasing with the evolution of algorithms and tools. Properties are verified in an efficient and automated way, even against the complex and huge models of such systems, becoming essential to the correctness assurance of the models. Despite the important contribution of formal verification techniques to produce more reliable software systems, it does not assure conformance between implementation and models. Thus, 1 2 3
1 The author is supported by CAPES.
2 Email: [email protected]
3 Email: [email protected]
a validation technique, like conformance testing, is necessary to complement the software verification and validation process. Testing is a popular validation technique recognized as a complement to verification techniques [25] (e.g. model checking). Conformance testing is a black box functional testing technique [15] that consists of checking the conformance between the implementation under test (IUT) and the specification. The IUT is a black box, so, its behaviour may only be visible through interactions with the tester. Such interactions are performed through the system’s boundaries with the environment, called points of control and observation (PCO’s). An approach to test case generation is based on the explicit specification of properties to be tested. Such properties are called test purposes, and they focus on specific parts of the specification [15]. In model-based testing the specification is given as a model [7], so, we use the terms as synonyms here. Applying conformance testing to the testing of reactive, distributed and concurrent systems is a laborious and difficult task due to the nondeterminism of these systems. The testing process can become too expensive and inefficient. The application of conformance testing from formal specifications represents an important branch in the efforts to make testing more rigorous and efficient. Many tools (e.g. [15,5,23]) have been developed and applied to industry experiments (e.g. [9]). However, we believe that the lack of techniques and tools for the specification of test purposes has been a great barrier to the application of conformance testing tools. Like specifications, they are usually written based on low level abstract formalisms, therefore, difficult to understand. Moreover, maintaining them based on the commonly huge systems specifications is laborious and error-prone. As the IUT should conform to the model, properties that need to be verified against the model, also needs to be tested against the IUT. Thus, test cases must be generated based on such properties, making test purposes correspondent to them. Based on this correspondence, test purpose generation may be based on model checking, which provides efficient mechanisms to perform model analysis. This paper aims to present an approach for the automatic generation of test purposes for reactive distributed systems based on verification techniques. Our approach uses a model checker to perform the test purpose generation. The test purpose generation consists on the specification of the properties to be verified, with later synthesis from the extracted examples and counterexamples through the model checker. This paper focus on CTL formulas, more specifically, on the EU connective. The test purposes are generated as labbelled transition systems (LTS). As main contribution to testing, we provide a rigorous automated procedure for test purpose specification and generation. The properties to be tested can be specified as an abstract formal language, more suitable for human reasoning. Moreover, the formal verification and testing processes may be linked, providing more consistency to the 28
verification and validation based on formal methods. The paper is organized as follows: Section 2 presents the theoretical background; Section 3 presents the approach for test purpose generation; Section 4 presents a case study for a test purpose and test case generation; Section 5 presents some related works; Section 6 presents concluding remarks.
2
Background
As theoretic background we use the formal framework proposed in [25] and its extension presented in [6]. This framework presents the basic formal concepts used in conformance testing and provides mechanisms to test cases evaluation. The extension presented introduces the formal concept of test purposes, called observation objectives 4 . We present some formal concepts related to the observation objectives presented in [6] relating them to the model checking theory.
2.1 Formal Test Purposes
Test purposes describe desired behaviour that must be observed during test execution. The test cases related to the test purposes are generated and executed aiming at the exhibition of the desired behaviour by the implementation. Thus, we define a relation exhibits ⊆ IMPS × TOBS, where IMPS is the domain of implementations and TOBS is the domain of test purposes. However, implementations are not suitable for formal reasoning, making it difficult to give a formal definition to this relation. Based on the test hypothesis [1], we assume the existence of a model i_IUT ∈ MODS for the IUT, where MODS is the universe of models. Now, we can establish a relation in the formal domain making it possible to reason about exhibition. This relation is called the reveal relation, defined as rev ⊆ MODS × TOBS. Thus, for an implementation IUT ∈ IMPS, a model of the IUT i_IUT ∈ MODS and a test purpose e ∈ TOBS: IUT exhibits e ⇐⇒ i_IUT rev e. A verdict function decides whether a test purpose is exhibited by an implementation: H_e : P(OBS) → {hit, miss}. Then, considering a test suite T_e, iut hits e by T_e =def H_e(∪{exec(t, iut) | t ∈ T_e}) = hit. A test suite that is e-complete can distinguish among all exhibiting and non-exhibiting implementations, such that iut exhibits e if and only if iut hits e by T_e. A test suite is e-exhaustive when it can only detect non-exhibiting implementations (iut exhibits e implies iut hits e by T_e), whereas a test suite is e-sound when it can only detect exhibiting implementations (iut exhibits e if iut hits e by T_e).
For the sake of clarity, we use the well-known term test purposes throughout the paper
2.2 Relating Formal Test Purposes to Model Checking Theory The model checking problem is defined in [4] as: given a kripke structure M , which models a concurrent finite state system and a temporal logic formula f expressing a property p, identify the set of states S of M that satisfy f . Formally: {s ∈ S | M, s |= f }. Consider a given specification mIU T as a kripke structure and a model iIU T ∈ M ODS that implements it. If there is a set of states in mIU T that satisfies a given property p, then iIU T is able to reveal p. Assuming that p can be expressed as a temporal logic formula f and by a test purpose e, we can establish that: iIU T rev e ⇐⇒ ∃s ∈ S : mIU T , s |= f . The states satisfying f form sets of states that represent the property p w.r.t. the specification mIU T . These sets contain states related by a predecessor/successor relation, i.e., traces of the kripke structure representing p. As these traces correspond to abstract specifications of p, they may be used to guide the generation of test purposes.
3
Test Purpose Generation
The verification of properties through model checking has been successfully done against realistic size concurrent systems [4]. However, the same rigour is not usually applied to testing implementations, creating a large gap between these processes and making possible the presence of failures on the implementation in points where the specification was successfully corrected. Therefore, we aim to reduce this gap through the generation of test purposes from such properties specified in temporal logic formulas, based on the similarity of them. To achieve this goal, we aim to perform analysis over the model through its state space, like model checking does. However, the process is adapted to get enough information for the test purpose generation in addition to the correctness verification of the model. The approach consists of an adaptation of a model checker algorithm[22] to extract model traces representing examples and counter-examples (if there are any) from the state space and later analysis over these traces to generate an abstract graph representing the test purpose (Fig. 1).
Fig. 1. Test purpose generation process.
The test purpose is given as an LTS. Formally, a test purpose is a tuple e = (Q, A, →, q0 ), where Q is a finite set of states, A the alphabet of actions, →⊆ Q × A × Q the transition relation and q0 ∈ Q the initial state. The test purpose is equipped with two sets of special states accept and refuse for sequences to be selected or not to compose test cases, respectively. 3.1 Extracting Examples and Counter-examples The adaptation of the model checking technique consists of changes on the search algorithms of a model checker, making it possible to extract a larger number of model traces (i.e. examples and counter-examples). These traces are aimed to provide sufficient information about the model for the test purpose generation. This information is obtained from analysis over the examples and counter-examples that are made to identify the relevant transitions w.r.t. the specified property. Such transitions compose the LTS of the test purpose. The examples are used to provide information about the accepted behaviour defined by the test purpose. The relevant transitions are then taken to construct the accept traces of the test purpose. The irrelevant ones are abstracted, usually by ”*-transitions”. Such *-transitions replace any occurring transition, except the transitions leading to another states. Since the model checking technique is defined over transitions and states in terms of the kripke model, the use of LTS may lead to a misrepresentation of the property. The abstraction by *-transitions may be higher than the necessary to make the LTS correspondent to the formula, making possible the generation of test cases with transition sequences that may lead to property violation. To solve this problem, counter-examples of the formula, containing such undesirable transitions, are used to restrict the LTS to be generated. These transitions compose the traces leading to the refuse states of the test purpose. These states are interesting to the non-determinism problem of reactive systems too. It provides constraints on the test case generation algorithm. 3.2 Analysis and Abstraction To simplify the analysis of the model traces, we define a simplified representation of its states in an abstract way. Thus, we represent these traces by a basic finite state machine defined by the tuple (Q, Σ, δ, q0 , F ), where Q is a finite non-empty set of states, δ a finite set of alphabet symbols accepted by the machine, δ : Q × Σ −→ Q a transition function, q0 ∈ Q an initial state and F ⊆ Q a set of final states, called accept states. The states of each trace are classified into sets defined by the propositions of the respective CTL formula, based on the satisfiability of the states w.r.t. to the formula propositions. Fig. 2 shows model traces (Fig. 2(a) and Fig. 2(b)) related to a CTL formula EU (p, q). The states of the example (Fig. 2(a)) are classified into two sets of state types, p and q. The states satisfying the proposition p 31
are called p-states and the state satisfying the proposition q is called q-state. For EU (p, q) formulas, the q-state represents the accept state of the machine.
(a) Example: a trace of p-states reaching a q-state through transitions x, y and z. (b) Counter-example: a trace of p-states reaching a not(p)-state through transitions x and v.
Fig. 2. Simplified representation of traces of the EU (p, q) formula.
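To make the classification concrete, a trace can be encoded as a list of states, each carrying the propositions it satisfies and the label of its outgoing transition. The Python sketch below uses our own minimal encoding and invented helper names; it is not the notation or code of the implementation described in this paper.

# A trace is a list of (propositions, outgoing transition label) pairs; the
# last state has no outgoing transition. Example trace of Fig. 2(a):
# p --x--> p --y--> p --z--> q
EXAMPLE = [({"p"}, "x"), ({"p"}, "y"), ({"p"}, "z"), ({"q"}, None)]

def classify_states(trace, p="p", q="q"):
    """Split the states of a trace into p-states and q-states for EU(p, q)."""
    p_states = [i for i, (props, _) in enumerate(trace) if p in props]
    q_states = [i for i, (props, _) in enumerate(trace) if q in props]
    return p_states, q_states

def leads_to_q(index, trace, q="q"):
    """True if the transition leaving state `index` ends in a q-state."""
    return index + 1 < len(trace) and q in trace[index + 1][0]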
The analysis algorithm classifies the relevant and the irrelevant transitions of the traces w.r.t. the property based on the detection of the state changes over the simplified representation of them. This is done by identification of the transition and/or sequence of transitions necessary for the state changes. This identification consists in classifying the transitions of each trace extracted into the two sets (the relevant and the irrelevant). To detect a sequence of transitions necessary to cause a state change, the algorithm performs an intersection operation over the two sets. Only sequences of transitions that can occur in alternate orders are detected, i.e. if a set of transitions causes state changes jointly, in alternate orders, traces containing a transition of such a set that causes a state change must contain all of them. Therefore, two subsets of the relevant set are created, one for the transitions identified in the intersection operation and one for the others. After the examples and counter-examples analysis and transitions classification steps, the next step performed is the test purpose generation. The transitions of the two subsets of the relevant ones are used to construct the test purpose graph: (i) leading to accept states in case of the transitions obtained from examples and (ii) leading to refuse states in case of the transitions obtained from counter-examples. Fig. 3 shows a test purpose generated from the examples of Figures 2(a) and 2(b).
Fig. 3. Test purpose generated from the graphs of Figures 2(a) and 2(b): a '*' self-loop on the initial state, z leading to the accept state, and v leading to the refuse state.
3.3 The EU Test Purpose Generation Algorithm The algorithm, shown in Algorithm 1, is based on the state changes. A partition over the transitions of the model traces must be made over two sets, L and N (lines 2-8). This partition is performed with the aid of the function leadsToQ(t,e) (line 3). Transitions that lead to a q-state, i.e. the relevant 32
transitions, are added to the L set (line 6). Transitions that do not lead to a q-state, i.e. the irrelevant ones, are added to the N set (line 4). Fig. 4 shows a set of traces related to a given EU (p, q).
Fig. 4. Examples (4(a)) and counter-examples (4(b)) of a given EU (p, q) formula
The resulting sets of examples transitions are L = {z, f, g} and N = {x, y, a, b, c, d, f, g}. Some transitions belong to both sets (e.g. f and g). We can conclude that these transitions cause state changes in a joint way. Thus, we intersect the two sets to obtain a third set I = {f, g} to group such transitions in order to create the abstract graph (lines 9-16). For each example, the combination of the transitions sequences of the I set must be regarded by the test purpose. Subsets based on these combinations are created, based on a predecession relation over the transitions, to define the correspondent traces of the test purpose. The function predecessor(t, e) (line 12) returns the transitions occurring earlier than a given transition t, in a given example e, regarding their orders. The traces of the graph are created between the lines 21-37. A special set S = {z} containing the transitions belonging only to the L set is created (line 18). These transitions are used to make traces linking the initial state of the graph to the accepting state (lines 21-23). The traces containing transitions belonging to the I set are made based on the sequences of transitions defined through the subsets created to regard such sequences. For each subset a trace must be made (lines 26-30). Transitions from the I and S sets related by the predecession relation are verified between lines 31-35. If a transition t from the I set is a predecessor of a transition j from the S set, a trace containing such transitions must be made (lines 32-34). The graph created from the relevant transitions of the examples is called accepting graph (Fig. 5(a)). The same procedure applied over the examples in order to create the accepting graph is applied to the counter-examples (Fig. 4(b)), generating a graph called refuse graph (Fig. 5(b)). The test purpose graph must contain the information of both graphs. The test purpose resultant from the procedure is shown in Fig. 5(c). 33
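The construction just described can be transcribed roughly as follows. This is a simplified, illustrative Python rendering under the trace encoding of the earlier sketch; it only builds the accepting part from the set S and omits the joint sequences derived from I, so it is not a faithful re-implementation of Algorithm 1.

def partition_transitions(examples, leads_to_q):
    # L: transitions that lead to a q-state; N: transitions that do not.
    L, N = set(), set()
    for trace in examples:
        for i, (_, label) in enumerate(trace):
            if label is None:
                continue
            (L if leads_to_q(i, trace) else N).add(label)
    I = L & N          # transitions that cause a state change only jointly
    S = L - I          # transitions that cause a state change on their own
    return L, N, I, S

def accepting_graph(S):
    # Accepting part of the test purpose: a '*' self-loop on the initial
    # state plus one transition per element of S to the accept state.
    # (Sequences derived from I are omitted in this sketch.)
    edges = [("q0", "*", "q0")]
    edges += [("q0", label, "accept") for label in sorted(S)]
    return edges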
Fig. 5. Graphs obtained through the process: (a) the accepting graph, (b) the refuse graph, and (c) the resultant test purpose combining both.
4
Case Study
A case study was performed with a specification of the Mobile IP protocol [19]. A test purpose was generated based on our approach and test cases were generated with the TGV tool [15]. The internet protocols do not provide dynamic addressing to mobile devices, called mobile nodes, that can migrate over the network. A migration could cause the connection to get lost. The Mobile IP protocol was developed to solve this problem, providing transparent migration and new IP address assignments. To provide the transparency to the migrations the protocol provides two addresses to the mobile nodes. A home address and a foreign address, called care-of-address (COA). The home address is obtained from the home network, while the COA is obtained from the foreign network for which the mobile node is migrating to. While the mobile node (MN) is within the foreign network, the messages addressed to it are delivered by the foreign router, called foreign agent (FA). Messages sent from a host to a mobile node are addressed to the home address. The home network router, called home agent (HA), encapsulates the message within another one addressed to the COA and sends it to the foreign agent. This process is known as tunnelling. When the mobile node migrates, a COA is assigned to it and the foreign agent sends an advertisement message to the home agent. 4.1 Test Purpose Generation The formalism used to model the protocol was RPOO [22]. RPOO is an objectoriented modelling language based on Petri Nets [16]. The model checker used to verify properties over RPOO models, and adapted to our case study, was Veritas [22], a CTL based model checking tool. As a test purpose, we wish to reason about the conformance between IUT and model in cases messages are sent to the mobile node. While the mobile node is home, the messages must be delivered by the home agent. We 34
Algorithm 1 EU Test Purpose Generation Algorithm
1: for all e ∈ Examples do
2:   for all t ∈ e do
3:     if ¬ leadsToQ(t, e) then
4:       add(t, N)
5:     else
6:       add(t, L)
7:     end if
8:   end for
9:   I = L ∩ N
10:   for all t ∈ I do
11:     SUBI_t = ∅
12:     for all p ∈ predecessors(t, e) do
13:       add(p, SUBI_t)
14:     end for
15:     add(t, SUBI_t)
16:   end for
17: end for
18: S = L − I
19: TestPurpose = ∅
20: i = 0
21: for all t ∈ S do
22:   add((i, t, "accept"), TestPurpose)
23: end for
24: for all t ∈ I do
25:   for all s ∈ SUBI_t do
26:     if s ≠ t then
27:       add((i, s, i + 1), TestPurpose)
28:     else
29:       add((i, s, "accept"), TestPurpose)
30:     end if
31:     for all j ∈ S do
32:       if s ∈ predecessors(j) then
33:         add((i + 1, j, "accept"), TestPurpose)
34:       end if
35:     end for
36:     i = i + 1
37:   end for
38: end for
specify a simple EU(NOT(p), q) formula, where p means "the mobile node has migrated to the foreign network" and q means "the home agent delivers the messages to the mobile node". The extraction of the model traces was based on the depth-first search algorithm, producing traces containing many states in common. We aimed to cover all q states, with only one example for each q state. So, examples leading to q states previously selected are not extracted. Fig. 6 illustrates the depth-first selection of traces. States 4 and 6
are q states, covered by the examples marked as a thick arrow. The dashed arrows indicate examples that should not be extracted. However, this strategy would miss some relevant transitions contained by the "dashed examples", not contained by the other traces. To solve this problem, in such cases, these dashed examples must be extracted. Thus, all relevant transitions related to the specified property are covered, providing complete information for the generation of a test purpose consistent w.r.t. the CTL formula.
Fig. 6. Depth search
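The selection strategy can be sketched as a bounded depth-first search that keeps one example per q-state. The model interface (successors, satisfies_q) is assumed for illustration only and does not correspond to the Veritas API; the handling of the skipped 'dashed' traces is summarized in a comment rather than implemented.

def extract_examples(initial, successors, satisfies_q, max_depth=50):
    # Depth-first search that extracts one example trace per q-state.
    # q-states that have already been covered are skipped; in the full
    # approach a skipped ('dashed') trace is still extracted when it
    # contains relevant transitions not covered by any other trace.
    examples, covered = [], set()

    def dfs(state, path):
        if satisfies_q(state):
            if state not in covered:
                covered.add(state)
                examples.append(list(path) + [state])
            return
        if len(path) >= max_depth:
            return
        for nxt in successors(state):
            dfs(nxt, path + [state])

    dfs(initial, [])
    return examples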
The analysis of the examples selected concluded that only one action was necessary to reach the q states. Thus, the accepting graph only specifies it, abstracting the others with the *-transition (Fig. 7(a)). As we are not interested in cases where the mobile node migrates, the counter-examples obtained represent the violation of the proposition NOT(p), with the migration of the mobile node and the sending of an advertisement message from the foreign agent to the home agent. The transitions representing such a violation compose the refuse graph (Fig. 7(b)). The test purpose resulting from the abstraction process is shown in Fig. 8.
Fig. 7. Graphs obtained through the process: (a) the accepting graph, a '*' self-loop with HA:MN.receiveDatagram(dat) leading to the accept state; (b) the refuse graph, a '*' self-loop with FA:MN.receiveAdv(ip) leading to the refuse state.
The TGV tool was used to generate the test cases from the test purpose of Fig. 8. It produced a complete test graph (CTG) through a synchronous product between the model 5 and the test purpose. The CTG contained all the examples selected, covering all q states. However, the CTG covered all the possibilities leading to the q states (e.g. the dashed examples as in Fig. 6 were covered too). Therefore, all the model traces corresponding to the CTL formula were covered by the test cases, showing the correspondence between the generated test purpose and the CTL formula w.r.t. to the model. 5
The kripke model was converted into an LTS one in the format required by TGV.
Fig. 8. The resultant test purpose: a '*' self-loop on the initial state, with HA:MN.receiveDatagram(dat) leading to the accept state and FA:MN.receiveAdv(ip) leading to the refuse state.
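The resultant test purpose of Fig. 8 is small enough to write out explicitly. The encoding below (an explicit transition list plus accept and refuse sets) is our own illustration; TGV consumes the test purpose as an LTS in its own input format.

# The test purpose of Fig. 8 as an explicit LTS: a '*' self-loop on the
# initial state, one action leading to accept and one leading to refuse.
TEST_PURPOSE = {
    "initial": "q0",
    "transitions": [
        ("q0", "*", "q0"),                           # abstract all other actions
        ("q0", "HA:MN.receiveDatagram(dat)", "accept"),
        ("q0", "FA:MN.receiveAdv(ip)", "refuse"),
    ],
    "accept": {"accept"},
    "refuse": {"refuse"},
}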
The generated CTG is e-exhaustive, containing an infinite number of test cases. TGV guarantees the e-soundness of the generated test cases. So, we can call the test suite composed by the CTG e-complete.
5
Related Works
Test generation using model checkers is a well explored research area. Many approaches have been proposed (e.g. [18,11]) so that model checkers are used to generate test cases directly from model traces. In these cases the test purposes are formalized as temporal logic formulas and applied to the process. However, these approaches are not based on a clear testing theory and are not appropriate to non-deterministic systems [15]. The adaptation of model checking techniques and tools to test case generation is explored in [15,5]. Based on clear theory of conformance testing they provide an exclusive process for test case generation. In [15] the test purposes are given as an LTS, however, the technique does not provide ways to its generation. Another LTS approach to automatically produce test cases allowing checking of satisfiability of a linear property on a given implementation is discussed in [8]. This approach is based on a partial specification and an observer specified as a Rabin automata [21] to recognize the desired execution sequences. A concept of bounded properties is introduced to limit the infinite execution sequences. The partial specifications provide more flexibility to the test case generation and execution. Aiming to solve the state space explosion problem [4] of the explicit state space enumeration techniques like the based on LTS [15,8,24], symbolic approaches have been proposed [3,10]. An algorithm for the test purpose generation is presented in [12]. The approach is aimed at the identification of the significant behaviours of a system modelled as labelled event structures to generate the test purposes in form of MSC’s. Each significant behaviour is to be converted into a test purpose aiming at the generation of a test case for each one. Despite the characteristic of automation of this technique, the test purposes do not provide a higher level of abstraction w.r.t. the model. The test suite tends to be small and not exhaustive. 37
6
Conclusion
The presented approach makes possible the straight use of CTL properties to test purpose generation. Also, it promotes the integration of the verification and validation processes, providing a link between the model checking and conformance testing techniques. The test purposes generated through our approach represent rigorous specification of properties to guide the generation of conformance test cases. The test case generation from such test purposes through the related theory presented in [24,25] may lead to e-complete test suites. However, the use of temporal logic properties in the test case generation suffers from some restrictions related to the length of the test cases. Infinite executions, usually represented by liveness properties [17], are not practical to testing. Thus, test case generation techniques based on such properties must provide ways to limit the test case execution (e.g. [8]). The generalization of the presented approach may be reached through its adaptation to nested formulas and EG connective based formulas. Covering the EU and EG connectives suffices, once any CTL formula can be expressed in terms of these connectives. Such generalization may be obtained through a definition of special representation of examples and counter-examples for the EG connective. However, we are investigating a more general representation, covering any kind of CTL formula, on which examples and counter-examples are analyzed in a joint way, distinguished only by the final states accept and refuse, respectively. The algorithms must be adapted to perform the analysis based on the new representation and to treat more kind of states than the current one. The application of the presented approach using linear temporal logic descriptions using automata on infinite words (e.g. [21,2]), like in [8], may be aimed at future works. Applying the proposed approach to finite state machines testing approach [20] constitutes another important research line. It is important to provide techniques to the problem of test case selection as well. As test case generation usually produces an infinite number of test cases, it is necessary to provide ways to select among them (e.g. coverage strategies and/or heuristics to select the most promising execution sequences [8]).
References [1] G. Bernot. Testing Against Formal Specifications: A Theoretical View. In TAPSOFT, Vol.2, pages 99–119, 1991. [2] J. R. B¨ uchi. On a decision method in restricted second-order arithmetic. In Proceedings Logic, Methodology and Philosophy of Sciences 1960, Stanford, CA, 1962. Stanford University Press. [3] D. Clarke, T. Jeron, V. Rusu, and E. Zinovieva. STG – A Symbolic Test
38
Generation Tool. In Proceedings of Tools and Algorithms for the Construction and Analysis of Systems (TACAS’02), volume 2280 of Lecture Notes in Computer Science. Springer, 2002. [4] E. Clarke, O. Grumberg, and D. Peled. Model Checking. MIT Press, 1999. [5] R. G. de Vries and J. Tretmans. On-the-fly conformance testing using spin. In E. Najm G. Holzmann and A. Serhrouchni, editors, Fourth Workshop on Automata Theoretic Verification with the Spin Model Checker, ENST 98 S 002, Paris, France. Ecole Nationale Sup´erieure des T´el´ecommunications, pages 115– 128, November 1998. [6] R. G. de Vries and J. Tretmans. Towards Formal Test Purposes. In Proc. 1st International Workshop on Formal Approaches to Testing of Software (FATES), Aalborg, Denmark, pages 61–76, August 2001. [7] I. K. El-Far and J. A. Whittaker. Model-based software testing. Encyclopedia on Software Engineering, 2001. [8] J. Fernandez, L. Mounier, and C. Pachon. Property oriented test case generation. In Formal Approaches to Software Testing, Proceedings of FATES 2003, volume 2931 of Lecture Notes in Computer Science, pages 147–163, Montreal, Canada, 2004. Springer. [9] J.-C. Fernandez, C. Jard, T. J´eron, and G. Viho. An Experiment in Automatic Generation of Conformance Test Suites for Protocols with Verification Technology. Science of Computer Programming, 29:123–146, 1997. [10] L. Frantzen, J. Tretmans, and T. A. C. Willemse. Test generation based on symbolic specifications. In J. Grabowski and B. Nielsen, editors, FATES’04, number 3395 in Lecture Notes in Computer Science, pages 1–15. Springer, 2005. [11] A. Gargantini and C. Heitmeyer. Using model checking to generate tests from requirements specifications. In ACM SIGSOFT Software Engineering Notes, volume 24 of Software Engineering Notes, pages 146–162, November 1999. [12] O. Henniger, M. Lu, and H. Ural. Automatic generation of test purposes for testing distributed systems. In Formal Approaches to Software Testing, Proceedings of FATES’03, volume 2931 of Lecture Notes in Computer Science, pages 178–191. Springer, October 2003. [13] C. A. R. Hoare. Communicating Sequential Processes. Prentice-Hall, 1985. [14] G. J. Holzmann. The Model Checker SPIN. 23(5):279–295, 1997.
IEEE Trans. Softw. Eng.,
[15] C. Jard and T. J´eron. TGV: theory, principles and algorithms – A tool for the automatic synthesis of conformance test cases for non-deterministic reactive systems. Software Tools for Technology Transfer (STTT), 6, October 2004. [16] K. Jensen. Coloured Petri Nets 1: Basic Concepts, Analysis Methods and Practical Use, volume 1. Springer-Verlag, Berlin, Alemanha, 1992.
39
[17] L. Lamport. Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2002. [18] P. E. Black P. E. Ammann and W. Majursky. Using model checking to generate tests from specifications. In IEEE Computer Society, editor, In Proceedings of the Second IEEE International Conference on Formal Engineering Methods (ICFEM’98), pages 46–54, November 1998. [19] C. Perkins. Rfc 3344:IP mobility support for IPv4, aug 2002. Status: Proposed Standard. [20] A. Petrenko, S. Boroday, and R. Groz. Confirming configurations in efsm testing. IEEE Trans. Softw. Eng., 30(1):29–42, 2004. [21] M. O. Rabin. Automata on Infinite Objects and Church’s Problem. American Mathematical Society, Boston, MA, USA, 1972. [22] C. L. Rodrigues, F. V. Guerra, J. C. A. de Figueiredo, D. D. S. Guerrero, and T. S. Morais. Modeling and verification of mobility issues using object-oriented petri nets. In Proc. of 3rd International Information and Telecommunication Technologies Symposium (I2TS2004), 2004. [23] M. Schmitt, A. Ek, J. Grabowski, D. Hogrefe, and B. Koch. Autolink - Putting SDL-based test generation into practice. In In: Testing of Communicating Systems (Editors: A. Petrenko,N . Yevtuschenko), volume 11, Kluwer Academic Publishers, 1998, June 1998. [24] J. Tretmans. Test Generation with Inputs, Outputs and Repetitive Quiescence. Software - Concepts and Tools, 17(3):103–120, 1996. [25] J. Tretmans. Testing Concurrent Systems: A Formal Approach. In J.C.M Baeten and S. Mauw, editors, CONCUR’99 – 10 th Int. Conference on Concurrency Theory, volume 1664 of Lecture Notes in Computer Science, pages 46–65. Springer-Verlag, 1999.
MBT 2006
Runtime Verification for High-Confidence Systems: A Monte Carlo Approach Sean Callanan Radu Grosu Abhishek Rai Scott A. Smolka Mike R. True Erez Zadok Computer Science Department Stony Brook University
Abstract We present a new approach to runtime verification that utilizes classical statistical techniques such as Monte Carlo simulation, hypothesis testing, and confidence interval estimation. Our algorithm, MCM, uses sampling-policy automata to vary its sampling rate dynamically as a function of the current confidence it has in the correctness of the deployed system. We implemented MCM within the Aristotle tool environment, an extensible, GCC-based architecture for instrumenting C programs for the purpose of runtime monitoring. For a case study involving the dynamic allocation and deallocation of objects in the Linux kernel, our experimental results show that Aristotle reduces the runtime overhead due to monitoring, which is initially high when confidence is low, to levels low enough to be acceptable in the long term as confidence in the monitored system grows.
1
Introduction
In previous work [7], we presented the MC2 algorithm for Monte Carlo Model Checking. Given a (finite-state) reactive program P, a temporal property ϕ, and parameters ε and δ, MC2 samples up to M random executions of P, where M is a function of ε and δ. Should a sample execution reveal a counterexample, MC2 answers false to the model-checking problem P |= ϕ. Otherwise, it decides with confidence 1 − δ and error margin ε that P indeed satisfies ϕ. Typically the number M of executions that MC2 samples is much smaller than the actual number of executions of P. Moreover, each execution sampled starts in an initial state of P, and terminates after a finite number of execution steps, when a cycle in the state space of P is reached. In this paper, we show how the technique of Monte Carlo model checking can be extended to the problem of Monte Carlo monitoring and runtime verification. Our resulting algorithm, MCM, can be seen as a runtime adaptation
of MC2 , one whose dynamic behavior is defined by sampling-policy automata (SPA). Such automata encode strategies for dynamically varying MCM’s sampling rate as a function of the current confidence in the monitored system’s correctness. A sampling-policy automaton may specify that when a counterexample is detected at runtime, the sampling rate should be increased since MCM’s confidence in the monitored system is lower. Conversely, if after M samples the system is counterexample-free, the sampling rate may be reduced since MCM’s confidence in the monitored system is greater. The two key benefits derived from an SPA-based approach to runtime monitoring are the following: •
As confidence in the deployed system grows, the sampling rate decreases, thereby mitigating the overhead typically associated with long-term runtime monitoring.
•
Because the sampling rate is automatically increased when the monitored system begins to exhibit erroneous behavior (due either to internal malfunction or external malevolence), Monte Carlo monitoring dynamically adapts to internal mode switches and to changes in the deployed system’s operating environment.
A key issue addressed in our extension of Monte Carlo model checking to the runtime setting is: What constitutes an adequate notion of a sample? In the case of Monte Carlo runtime verification, the monitored program is already deployed, and restarting it after each sample to return the system to an initial state is not a practical option. Given that every reactive system is essentially a sense-process-actuate loop, in this paper we propose weaker notions of initial state that are sufficient for the purpose of dynamic sampling. One such notion pertains to the manipulation of instances of dynamic types: Java classes, dynamic data structures in C, etc. In this setting, a sample commences in the program state immediately preceding the allocation of an object o and terminates in the program state immediately following the deallocation of o, with these two states being considered equivalent with respect to o. To illustrate this notion of runtime sampling, we consider the problem of verifying the safe use of reference counts (RCs) in the Linux virtual file system (VFS). The VFS is an abstraction layer that permits a variety of separatelydeveloped file systems to share caches and present a uniform interface to other kernel subsystems and the user. Shared objects in the VFS have RCs so that the degree of sharing of a particular object can be measured. Objects are placed in the reusable pool when their RCs go to zero, objects with low RCs can be swapped out, but objects with high RCs should remain in main memory. Proper use of RCs is essential to avoid serious correctness and performance problems for all file systems. To apply Monte Carlo runtime monitoring to this problem, we have defined Real Time Linear Temporal Logic formulas that collectively specify what it means for RCs to be correctly manipulated by the VFS. We further imple42
mented the MCM algorithm within the Aristotle environment for Monte Carlo monitoring. Aristotle provide a highly extensible, GCC-based architecture for instrumenting C programs for the purposes of runtime monitoring. Aristotle realizes this architecture via a simple modification of the GNU C compiler (GCC) that allows one to load an arbitrary number of plug-ins dynamically and invoke code from those plug-ins at the tree-optimization phase of compilation. Using a very simple sampling policy, our results show that Aristotle brings runtime overhead, which is initially very high when confidence is low, down to long-term acceptable levels. For example, a benchmark designed to highlight overheads under worst-case conditions exhibited a 10x initial slowdown; 11 minutes into the run, however, we achieved 99.999% confidence that the error rate for both classes of reference counts was below one in 105 . At this point, monitoring for that class was reduced, leaving an overhead of only 33% from other monitoring. In addition to reference counts, Aristotle currently provides Monte Carlo monitoring support for the correct manipulation of pointer variables (bounds checking), lock-based synchronization primitives, and memory allocation library calls. Due to its extensible architecture based on plug-ins, support for other system features can be easily added. The rest of the paper is organized as follows. Section 2 describes our system design. Section 3 presents our Monte Carlo runtime monitoring algorithm. Section 4 details the Aristotle design and implementation. Section 5 gives an example application of Aristotle, and Section 6 discusses related work. Section 7 contains our concluding remarks and directions for future work.
2
Aristotle Design Overview
Figure 1 depicts the various stages of operation for Aristotle as it processes a system’s source code. A modified version of the GNU C compiler (GCC) parses the source code, invoking an instrumenting plug-in to process the control flow graph for each function. The instrumenting plug-in inserts calls to verification code at each point where an event occurs that could affect the property being checked. The verification code is part of a runtime monitor, which maintains auxiliary runtime data used for property verification and is bound into the software at link time. The runtime monitor interacts with the confidence engine, which implements a sampling policy based on our Monte Carlo runtime monitoring algorithm (described in Section 3). The confidence engine maintains a confidence level for the properties being checked and may implement a sampling policy automaton to regulate the instrumentation or perform other actions. This regulation can be based on changes in the confidence level and could respond to other events in the system, such as the execution of rarely-used code paths. 43
Fig. 1. Architectural overview of the Aristotle system. At compile time, unmodified source code passes through GCC tree optimization, where an instrumenting plug-in inserts calls (in an RC debugger, it instruments all code that manipulates reference counts; in a bounds checker, all memory accesses and allocations), followed by GCC code emission and linking. At run time, the instrumented system reports to a runtime monitor (which monitors objects and flags improper use and leaks, or monitors currently allocated areas and flags bad accesses), and a confidence engine regulates monitoring based on sampled behavior and the sampling policy.
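As an illustration of the confidence engine's role, the sketch below derives a sample target from ε and δ and switches between a full and a background sampling rate. The two-state policy and the default thresholds are invented for the example and are not Aristotle's actual policy code.

import math

class ConfidenceEngine:
    # Toy two-state sampling policy: monitor every allocation until M
    # violation-free samples give confidence 1 - delta that the error rate
    # is below epsilon, then drop to a low background rate; any violation
    # switches back to full monitoring.
    def __init__(self, epsilon=1e-5, delta=1e-5, background_rate=0.01):
        self.target = math.ceil(math.log(delta) / math.log(1.0 - epsilon))
        self.remaining = self.target
        self.background_rate = background_rate
        self.rate = 1.0                       # start by sampling everything

    def report(self, violation):
        if violation:
            self.remaining = self.target      # confidence lost: resample
            self.rate = 1.0
        elif self.remaining > 0:
            self.remaining -= 1
            if self.remaining == 0:
                self.rate = self.background_rate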
3
Monte Carlo Monitoring
In this section, we present our MCM algorithm for Monte Carlo monitoring and runtime verification. We first present MCM in the context of monitoring the correct manipulation of reference counts (RCs) in the Linux virtual file system (VFS). RCs are used throughout the Linux kernel, not only to prevent premature deallocation of objects, but also to allow different subsystems to indicate interest in an object without knowing about each other’s internals. Safe use of reference counts is an important obligation of all kernel subsystems. We then consider generalizations of the algorithm to arbitrary dynamic types. In the case of the Linux VFS, the objects of interest are dentries and inodes, which the VFS uses to maintain information about file names and data blocks, respectively. The VFS maintains a static pool of these objects and uses RCs for allocation and deallocation purposes: a free object has an RC of zero and may be allocated to a process; an object with a positive RC is considered in-use and may only be returned to the free pool when the state of the RC returns to zero. Additionally, an object with a high reference count is less likely to be swapped out to disk. To apply Monte Carlo runtime monitoring to this problem, we first define the properties of interest. These are formally defined in Figure 2. Each of these properties is formalized using Real-Time Linear Temporal Logic [2], where G, F and X are unary temporal operators. G requires the sub-formula over which it operates to be true Globally (in all states of an execution), F requires it to hold Finally (in some eventual state of an execution), and X requires it to hold neXt (in the next state of an execution). Also, an unprimed variable refers to its value in the current state and the primed version refers to its value in the next state. Each property uses universal 44
(stI) ∀o : C. G o.rc ≥ 0
RC values are always non-negative.
(trI) ∀o : C. G |o'.rc − o.rc| ≤ 1
RC values are never incremented or decremented by more than 1.
(lkI) ∀o : C. G (o'.rc ≠ o.rc ⇒ XF≤T o'.rc ≤ o.rc)
A change in the value of an RC is always followed within time T by a decrement.
Fig. 2. Reference-count correctness properties.
quantification over all instances o of a dynamic type C. The first property is a state invariant (stI) while the second property is a transition invariant (trI). The third property is a leak invariant (lkI) that is intended to capture the requirement that the RC of an actively used object eventually returns to zero. It is expressed as a time-bounded liveness constraint, with time bound T . Since each of these properties can be proved false by examining a finite execution, they are safety properties, and one can therefore construct a deterministic finite automaton (DFA) A that recognizes violating executions [10,16]. The synchronous composition (product) CA of C with A is constructed by instrumenting C with A such that C violates the property in question iff an object o of type C can synchronize with A so as to lead A to an accepting state. We view an object o of type C as executing in a closed system consisting of the OS and its environment. We assume that the OS is deterministic but the environment is a (possibly evolving) Markov chain; i.e., its transitions may have associated probabilities. As a consequence, CA is also a Markov chain. Formally, a Markov chain M = (X, E, p, p0 ) consists of a set X of states; a set E ⊆ X × X of transitions (edges); an assignment of positive transition probabilities p(x, y) to all transitions (x, y) so that for each state x, Σy∈X p(x, y) = 1; and an initial probability distribution p0 on the states such that Σx∈X p0 (x) = 1. A finite trajectory of M is the finite sequence of states x = x0 , x1 , . . . , xn , such that for all i, (xi , xi+1 ) ∈ E and p(xi , xi+1 ) > 0. The probability of a finite trajectory x = x0 , x1 , . . . , xn is defined as PM (x) = p0 (x0 )p(x0 , x1 ) · · · p(xn−1 , xn ). Each trajectory of CA corresponds to an object execution. The more objects displaying the same execution behavior, the higher the probability of the associated trajectory. Hence, although the probabilities of CA are not explicitly given, they can be learned via runtime monitoring. Assuming that kernel-level objects have finite lifetimes (with the possible exception of objects such as the root file-system directory entry), and that state is dependent on the object’s history, CA is actually a Markov tree, since no object goes backward in time. The leaves of CA fall into two categories: (i) violation-free executions of objects of type C which are deallocated after their RCs return to zero, and (ii) executions violating property stI, trI, or lkI. 45
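Before turning to the sampling analysis, note that the three properties of Fig. 2 can be read directly as per-object runtime checks. The Python sketch below is illustrative only; in Aristotle the corresponding checks are generated C code inserted by the instrumenting plug-in, and the timeout handling here merely mirrors the timer thread described later for MCM.

class RCMonitor:
    # Illustrative per-object monitor. update() checks stI and trI on each
    # reference-count change and re-arms a timeout of T ticks on every
    # decrement; tick() flags a potential lkI violation (a leaked object)
    # when the timeout expires without a decrement having occurred.
    def __init__(self, T):
        self.rc = 0
        self.T = T
        self.timeout = T

    def update(self, new_rc):
        if new_rc < 0:                        # stI: RCs are never negative
            raise AssertionError("stI violated")
        if abs(new_rc - self.rc) > 1:         # trI: RCs change by at most 1
            raise AssertionError("trI violated")
        if new_rc < self.rc:                  # decrement: re-arm the timer
            self.timeout = self.T
        self.rc = new_rc

    def tick(self):                           # called every d time units
        self.timeout -= 1
        if self.timeout <= 0:
            raise AssertionError("lkI violated (possible leak)")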
Thus, a trajectory in CA can be viewed as an object execution from its birth to its death or to an error state representing a property violation. We consider such a trajectory to be a Bernoulli random variable Z such that Z = 0 if the object terminated normally, and Z = 1 otherwise. Further, let pZ be the probability that Z = 1 and qZ = pZ − 1 be the probability that Z = 0. The question then becomes: how many random samples of Z must one take to either find a property violation or to conclude with confidence ratio δ and error margin that no such violation exists? To answer this question, we rely, as we did in the case of Monte Carlo model checking, on the techniques of acceptance sampling and confidence interval estimation. We first define the geometric random variable X, with parameter pZ , whose value is the number of independent trials required until success, i.e., until Z = 1. The probability mass function of X is p(N ) = P[X = N ] = qZN −1 pZ , and the cumulative distribution function (CDF) of X is X F (N ) = P[X ≤ N ] = p(n) = 1 − qZN n≤N
Requiring that F(N) = 1 − δ for confidence ratio δ yields N = ln(δ)/ln(1 − pZ), which provides the number N of attempts needed to find a property violation with probability 1 − δ. In our case, pZ is unknown. However, given error margin ε and assuming that pZ ≥ ε, we obtain that M = ln(δ)/ln(1 − ε) ≥ N = ln(δ)/ln(1 − pZ), and therefore that P[X ≤ M] ≥ P[X ≤ N] = 1 − δ. Summarizing, for M = ln(δ)/ln(1 − ε) we have:
(1)   pZ ≥ ε ⇒ P[X ≤ M] ≥ 1 − δ
Inequality 1 gives us the minimal number of attempts M needed to achieve success with confidence ratio δ under the assumption that pZ ≥ ε. The standard way of discharging such an assumption is to use statistical hypothesis testing [12]. We define the null hypothesis H0 as the assumption that pZ ≥ ε. Rewriting inequality 1 with respect to H0 we obtain:
(2)   P[X ≤ M | H0] ≥ 1 − δ
We now perform M trials. If no counterexample is found, i.e., if X > M, then we reject H0. This may introduce a type-I error: H0 may be true even though we did not find a counterexample. However, the probability of making this error is bounded by δ; this is shown in inequality 3, which is obtained by taking the complement of X ≤ M in inequality 2:
(3)   P[X > M | H0] < δ
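As an illustrative aside, not part of the original text, the bound M is easy to compute numerically. The Python sketch below assumes δ = 10^-5 and ε = 10^-5, values chosen to mirror the 99.999% confidence level and 1-in-10^5 error rate used in the case study of Section 5; it is our own example, not the authors' code.

    import math

    def samples_needed(delta, epsilon):
        # M = ln(delta) / ln(1 - epsilon), rounded up to a whole number of trials
        return math.ceil(math.log(delta) / math.log(1.0 - epsilon))

    # Hypothetical values: confidence ratio delta and error margin epsilon
    delta, epsilon = 1e-5, 1e-5
    M = samples_needed(delta, epsilon)
    print(M)  # roughly 1.15 million sampled object lifetimes

The number of required samples grows only logarithmically in 1/δ but roughly linearly in 1/ε, which is why the sampling-policy automata described below adapt the sampling rate rather than fix it.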
With the above framework in place, we now present MCM, our Monte Carlo Monitoring algorithm. MCM, whose pseudo-code is given in Figure 3, utilizes DFA A to monitor properties stI, trI, and lkI, while keeping track of the number of samples taken.

    input:  ε, δ, C, t, d;
    global: tn, cn;
    tn = cn = ln(δ)/ln(1-ε);
    set(timeout, d);
    when (created(o:C) && flip())
        if (tn > 0) { tn--; o.to = t; o.rc = 0; }
    when (destroyed(o:C)) {
        cn--;
        if (cn == 0) monitoring stop; }
    when (monitored(o:C) && modified(o.rc)) {        /* stI, trI */
        if (o'.rc < 0 || |o'.rc - o.rc| > 1) safety stop;
        if (o.rc - o'.rc == 1) o.to = t; }
    when (timeout(d))                                /* lkI */
        for each (monitored(o:C)) {
            o.to--;
            if (o.to == 0) leak stop; }

Fig. 3. The MCM algorithm.
MCM consists of an initialization part, which sets the target (tn) and current
(cn) number of samples, and a monitoring part, derived from the properties to be verified. The latter is a state machine whose transitions (when statements) are triggered either by actions taken by objects of type C or by a kernel timer thread. The timer thread wakes up every d time units, and the time window used to sample object executions is t ∗ d, where t and d are inputs to the algorithm. When an object o:C is created and the random boolean variable flip() is true, the target number of samples is decremented. The random variable flip() represents one throw of a multi-sided, unweighted coin with one labeled side, and returns true precisely when the labeled side comes up. If enough objects have been sampled (tn=0), no further object is monitored. For a monitored object, its reference count rc and timeout interval to are appropriately initialized. When an object is destroyed, cn is decremented. If the target number of samples was reached (cn=0), the required level of confidence is achieved and monitoring can be disabled. When the RC of a monitored object is altered, we check for a violation of safety properties stI or trI, stopping execution if one has occurred. If an object's RC is decremented, we reset its timeout interval; moreover, should its RC reach zero, the object is destroyed or reclaimed. When the timer thread awakens, we adjust the timeout interval of all monitored objects. If an object's timeout interval has expired, leak invariant lkI has been violated and the algorithm halts. Due to the random variable flip(), MCM does not monitor every instance o of type C. Rather, it uses a sampling-policy automaton to determine the rate at which instances of C are sampled. For example, consider the n-state policy automaton PAn such that, in state k, 1 ≤ k ≤ n, MCM samples o only if flip()
returns true for a 2^k-sided coin. Moreover, PAn makes a transition from state k to k + 1 mod n after exactly M samples. Hence, after M samples (without detecting an error) the algorithm uses a 4-sided coin, after 2M samples an 8-sided coin, etc. For a given error margin ε, the associated confidence ratio δ will then be (1 − ε)^M, (1 − ε)^(2M), (1 − ε)^(3M) and so on. PAn also makes a transition from state k to j, where j < k, when an undesirable event occurs, such as a counterexample, or perhaps an execution of as yet unexecuted code. Sampling policies such as the one encoded by PAn assure that MCM can adapt to environmental changes, and that the samples taken by MCM are mutually independent (as n tends toward infinity). MCM is very efficient in both time and space. For each random sample, it suffices to store two values (old and new) of the object's RC. Moreover, the number of samples taken is bounded by M. That M is optimal follows from inequality 3, which provides a tight lower bound on the number of trials needed to achieve success with confidence ratio δ and lower bound ε on pZ. Our kernel-level implementation of MCM is such that if a violating trajectory is observed during monitoring, it is usually the case that a sufficient amount of diagnostic information can be gleaned from the instrumentation to pinpoint the root cause of the error. For example, if an object's RC becomes negative, the application that executed the method that led to this event can be determined. In another example, if the object's RC fails to return to zero and a leak is suspected, diagnostic information can be obtained by identifying the object's containing type. Suppose the object is an inode; we can use this information to locate the corresponding file name and link it back to the offending application. The MCM algorithm of Figure 3 can be extended by expanding the class of correctness properties supported by the algorithm. The third and fourth when branches of the algorithm correspond to safety and bounded-liveness checks, respectively. Hence, the MCM algorithm can be generalized in the obvious way, to allow the treatment of arbitrary safety and bounded-liveness properties for any reactive program involving dynamic types. For example, in addition to reference counts, Aristotle currently provides Monte Carlo monitoring support for the correct manipulation of pointer variables (bounds checking), lock synchronization primitives, and memory allocation library calls. Due to its extensible, plug-in-oriented architecture, support for other properties can easily be added.
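To make the sampling-policy idea concrete, here is a small Python sketch, our own illustration rather than code from Aristotle, of an n-state policy automaton in the spirit of PAn: each state k uses a 2^k-sided coin, the automaton advances after M error-free samples, and it falls back to an earlier state when an undesirable event is reported. All class and method names are hypothetical.

    import random

    class PolicyAutomaton:
        def __init__(self, n_states, samples_per_state):
            self.n = n_states
            self.m = samples_per_state   # M error-free samples per state
            self.state = 1               # state k uses a 2^k-sided coin
            self.count = 0

        def flip(self):
            # True when the single labeled side of a 2^k-sided coin comes up
            return random.randrange(2 ** self.state) == 0

        def sample_taken(self):
            self.count += 1
            if self.count >= self.m:     # advance after M samples in this state
                self.state = self.state % self.n + 1
                self.count = 0

        def undesirable_event(self, target_state=1):
            # e.g. a counterexample or newly executed code: drop to a lower state
            self.state = min(self.state, target_state)
            self.count = 0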
4 Implementation
In Aristotle, we instrument a program with monitoring code using a modified version of the GNU C compiler (GCC), version 4. We modified the compiler to load an arbitrary number of plug-ins and invoke code from those plug-ins at the tree-optimization phase of a compilation. At that point in the compilation, the abstract syntax tree has been translated into the GIMPLE
intermediate representation [6], which includes syntactic, control-flow, and type information. A plug-in is invoked that can use the GCC APIs to inspect each function body in turn and add or remove statements. The plug-in can even invoke other GCC passes to extract information; for example, one plug-in we developed for bounds checking uses the reference-analysis pass to obtain a list of all variables used by a function. Our use of GCC as the basis for Aristotle offers several advantages. First, it can be used to instrument any software that compiles with GCC. Prior static-checking and meta-compilation projects have used lightweight compilers [4,8] that do not support all of the language extensions and features of GCC. Many of these extensions are used by open-source software, particularly the Linux kernel. Second, the modular architecture of Aristotle allows programmers to instrument source code without actually changing it. Third, Aristotle users can take advantage of GCC's library of optimizations and ability to generate code for many architectures. Adding GCC support for plug-ins is very simple; we added a command-line option to load a plug-in and changed the way GCC is built to expose GCC's internal APIs to plug-ins. The information collected at the instrumented locations in the system's source code is used by runtime monitors. A runtime monitor is a static library, linked with the system at compile time. The runtime monitor contains checking code which verifies that each detected event satisfies all safety properties; furthermore, it may spawn threads that periodically verify that all bounded-liveness properties hold. The monitor interfaces with the confidence engine, reporting rule violations and regulating its operation according to the confidence engine's instructions, which reflect the operation of a sampling-policy automaton. Finally, it may also perform other operations, like verbose logging and network-based error reporting, which vary from application to application.
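The division of labour between instrumentation and the runtime monitor can be pictured with the following Python sketch, which is our paraphrase and not the Aristotle library: the instrumented locations report each reference-count modification as an event, and the monitor checks the state and transition invariants and refreshes the leak timeout. Method and field names are invented for illustration.

    class RefcountMonitor:
        def __init__(self, timeout_ticks):
            self.timeout_ticks = timeout_ticks
            self.tracked = {}   # object id -> (current rc, remaining ticks)

        def on_create(self, oid):
            self.tracked[oid] = (0, self.timeout_ticks)

        def on_destroy(self, oid):
            self.tracked.pop(oid, None)

        def on_rc_modified(self, oid, new_rc):
            if oid not in self.tracked:          # not a sampled object
                return
            old_rc, ticks = self.tracked[oid]
            if new_rc < 0 or abs(new_rc - old_rc) > 1:
                raise AssertionError("stI/trI violated for object %r" % oid)
            if new_rc < old_rc:                  # a decrement refreshes the lkI timeout
                ticks = self.timeout_ticks
            self.tracked[oid] = (new_rc, ticks)

        def on_timer_tick(self):
            for oid, (rc, ticks) in list(self.tracked.items()):
                if ticks - 1 == 0:
                    raise AssertionError("lkI violated: leak suspected for %r" % oid)
                self.tracked[oid] = (rc, ticks - 1)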
5 Case Study: The Linux VFS
The Linux Virtual File System (VFS) is an interface layer that manages installed file systems and storage media. Its function is to provide a uniform interface to the user and to other kernel subsystems, so that data on mass storage devices can be accessed in a consistent manner. To accomplish this, the VFS maintains unified caches of information about file names and data blocks: the dentry and inode caches, respectively. The entries in these caches are shared by all file systems. The VFS and file systems use reference counts to ensure that entries are not reused without a file system’s knowledge and to prioritize highly-referenced objects for retention in main memory as opposed to being swapped out. The fact that these caches are shared by different file systems, implemented by different authors and of varying degrees of maturity, introduces the potential for system resource leaks and faults arising from misuse of cached objects. 49
For example, a misbehaving file system may prevent a storage device from being safely removed because the reference count for an object stored to that device was not safely reduced to zero. Worse, a misbehaving file system could hamper the performance of other file systems by failing to decrement the reference counts of cache data structures. Using the Aristotle framework, we developed a tool that monitors reference counts in the Linux VFS. As described in Section 3, we enforced a state invariant (stI), a transition invariant (trI), and a leak invariant (lkI). The plug-in for this case study instruments every point in the source code at which a reference count was modified. Because we had access to type information, we were able to classify reference counts for dentry and inode objects. Whenever it is invoked, the runtime monitor checks the operation to ensure that the safety properties hold. Additionally, if the operation is a decrement, the monitor updates a timestamp for that reference count, which is maintained in an auxiliary data structure. A separate thread periodically traverses the data structure to verify that all reference counts have been decremented more recently than time interval T . Additionally, all checked operations are optionally logged to disk. The confidence engine maintains separate confidence levels for dentry and inode reference counts using our Monte Carlo model checking algorithm. For clarity, we demonstrate the system with a sampling policy automaton that disables checking when a 99.999% confidence level has been reached that the error rate for that reference counter category is less than 1 in 105 samples. As discussed in Section 3, a sample is defined as the lifetime of a cached object, that is, the period when the object’s reference counter is nonzero. Other sampling policies, such as flipping an n-sided coin where n increases as confidence increases to determine whether to sample a given object, allow more fine-grained trade-offs of performance vs. confidence; additionally, it may be advisable to increase the sampling rate as the environment changes. Figure 4(a) shows the performance overhead of the system with logging and checking enabled, logging disabled but checking enabled, and no instrumentation, under a micro-benchmark designed to exercise the file system caches. In each run, the micro-benchmark creates a tree of directories, does a depthfirst traversal of that tree, and deletes the tree. Because directories are being created and deleted, on-disk data is being manipulated, causing creation and deletion of objects in the inode cache. Additionally, the directory traversal stress-tests the dentry cache. We observe an initial 10x overhead as both dentry and inode reference counts are being monitored and all accesses are being logged. After five runs, which take six minutes in total, dentry confidence reaches the target, and overhead falls to a factor of three. Finally, five minutes later, after eleven runs, overhead drops to 33% when inode confidence reaches the target. The remaining overhead is a characteristic of our prototype; we expect optimization to reduce it significantly. 50
[Figure 4: two plots of time in seconds against run number, with series for events inhibited, events checked and events logged; panel (a) shows the directory-tree microbenchmark and panel (b) the compilation of GNU tar.]
Fig. 4. Overhead reduction as confidence increases.
Figure 4(b) shows the effects under a benchmark that puts less stress on the file system. Compiling the GNU tar utility involves less cache activity than the micro-benchmark described above, so the overheads from monitoring are lower; however, it also takes longer for confidence to reach the target. Initial overhead with logging was 46%. After ten runs, or eleven minutes, this overhead dropped to 14% as dentry confidence reached the target. Forty minutes later, at the 55th run, overheads dropped to 11% as inode confidence reached its target as well.
6 Related Work
Runtime verification is the subject of much recent research [3,14,15]. Our work combines and takes these concepts one step further, detecting instrumentation points at compile time and managing itself autonomously at runtime using statistically-driven sampling policies. Other related work includes metacompilation [8] and ESP [11], which extend compilers with static checkers to find violations of system-specific properties; and the model-checking efforts directed at network protocols [5,13], file systems [17], and device drivers [1]. Chilimbi and Hauswirth [9] have implemented a sampling-based technique for detecting memory leaks in programs. They maintain a timestamp with each memory object and a sampling rate with each basic block of code. Each time a basic block b makes a reference to an object o, o’s timestamp is updated and b’s sampling rate is decreased. No attempt is made to quantify the confidence level and error margin introduced by this technique.
7 Conclusions
We have presented the MCM algorithm for Monte Carlo monitoring and runtime verification, which uses sampling-policy automata to vary its sampling rate dynamically as a function of the current confidence in the monitored system's
correctness. We implemented MCM within the Aristotle tool environment, an extensible, GCC-based architecture for instrumenting C programs for the purposes of runtime monitoring. Aristotle realizes this architecture via a simple modification of GCC that allows one to load an arbitrary number of plugins dynamically and invoke code from those plug-ins at the tree-optimization phase of compilation. Our experimental results show that Aristotle reduces the runtime overhead due to monitoring, which is initially high when confidence is low, to long-term acceptable levels as confidence in the deployed system grows. As future work, we are developing an instrumentation-specification language to facilitate plug-in construction and insertion into GCC. Additionally, we are investigating the integration of auxiliary information, such as code coverage, into sampling policies. This would allow, for example, instrumentation to be increased when a rarely-used section of code is executed.
8 Acknowledgments
Yanhong A. Liu and Scott D. Stoller provided valuable feedback to the architectural model described in Section 2 and are collaborating with us on the future work described in Section 7. We are also grateful to the anonymous reviewers who provided invaluable feedback that helped us present our work in as clear a manner as possible. This work was partially made possible thanks to a Computer Systems Research NSF award (CNS-0509230) and an NSF CAREER award in the Next Generation Software program (EIA-0133589).
References [1] Ball, T. and S. K. Rajamani, The SLAM toolkit, in: CAV ’01: Proceedings of the 13th International Conference on Computer Aided Verification (2001), pp. 260–264. [2] Bernstein, A. and J. P. K. Harter, Proving real-time properties of programs with temporal logic, in: SOSP ’81: Proceedings of the eighth ACM symposium on Operating systems principles (1981), pp. 1–11. [3] Bodden, E., A lightweight LTL runtime verification tool for Java, in: Companion to the 19th annual ACM SIGPLAN conference on Object-oriented programming systems, languages, and applications (2004), pp. 306–307. [4] C. W. Fraser and D. R. Hanson, “A Retargetable C Compiler: Design and Implementation,” Addison-Wesley Longman Publishing Co., Inc., 1995. [5] Dong, Y., X. Du, G. Holzmann and S. A. Smolka, Fighting Livelock in the iProtocol: A Case Study in Explicit-State Model Checking, Software Tools for Technology Transfer 4 (2003).
[6] GCC team, "GCC online documentation," (2005), http://gcc.gnu.org/onlinedocs/.
[7] Grosu, R. and S. A. Smolka, Monte carlo model checking (extended version), in: LNCS 3440 on SpringerLink (2004), pp. 271–286. [8] Hallem, S., B. Chelf, Y. Xie and D. Engler, A System and Language for Building System-Specific, Static Analyses, in: ACM Conference on Programming Language Design and Implementation, Berlin, Germany, 2002, pp. 69–82. [9] Hauswirth, M. and T. M. Chilimbi, Low-overhead memory leak detection using adaptive statistical profiling, SIGARCH Comput. Archit. News 32 (2004), pp. 156–164. [10] Kupferman, O. and M. Y. Vardi, Model Checking of Safety Properties, Formal Methods in System Design 19 (2001), pp. 291–314. [11] M. Das and S. Lerner and M. Seigle, ESP: Path-Sensitive Program Verification in Polynomial Time, in: Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2002, pp. 57–68. [12] Mood, A. M., F. Graybill and D. Boes, “Introduction to the Theory of Statistics,” McGraw-Hill Series in Probability and Statistics, 1974. [13] Musuvathi, M., D. Y. W. Park, A. Chou, D. R. Engler and D. L. Dill, CMC: A Pragmatic Approach to Model Checking Real Code, in: Proceedings of the Fifth Symposium on Operating System Design and Implementation (OSDI ’02) (2002), pp. 75–88. [14] Rosu, G. and K. Sen, An Instrumentation Technique for Online Analysis of Multithreaded Programs, in: 18th International Parallel and Distributed Processing Symposium, 2004, p. 268b. [15] Sammapun, U., A. Easwaran, I. Lee and O. Sokolsky, Simulation of Simultaneous Events in Regular Expressions for Run-Time Verification, in: Proceeding of Runtime Verification Workshop (RV’04), Barcelona, Spain, 2004, pp. 123–143. [16] Vardi, M. Y. and P. Wolper, An Automata-Theoretic Approach to Automatic Program Verification, in: Proceedings of the Symposium on Logic in Computer Science (LICS), Cambridge, MA, 1986, pp. 332–344. [17] Yang, J., P. Twohey, D. R. Engler and M. Musuvathi, Using Model Checking to Find Serious File System Errors, in: Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI 2004), San Francisco, CA, 2004, pp. 273–288.
Controlling Testing using Three-Tier Model Architecture
Antti Kervinen, Mika Maunumaa, Mika Katara
Institute of Software Systems, Tampere University of Technology, PO Box 553, FI-33101 Tampere, Finland
Abstract
In this paper, based on our earlier work, we introduce a test model architecture for model-based testing through a GUI. The model architecture consists of three tiers. The tiers separate important concerns in GUI testing: navigation in the GUI using keywords, high-level concepts expressed as action words, and test-control related issues defined as control words. For test control, we define a novel coverage language to express coverage objectives. Furthermore, we introduce our refined vision for the associated tool platform enabling system-level testing in the Symbian environment. The architecture includes a commercial GUI testing tool that we have extended with components enabling the use of test models.
Key words: Model-based testing, Symbian, GUI, action word, keyword, test control, coverage
1 Introduction
Model-based testing is one of the most promising approaches for tackling the increasing testing challenges in software development. Conventional test automation solutions rarely find previously undetected defects and provide a return on investment almost only in regression testing, where the same test suites are executed frequently. In contrast, model-based testing practices carry the promise of also finding new defects, thus enabling the automation of other types of testing as well. In our earlier work [7], we have investigated model-based GUI testing and its deployment in the Symbian [12] environment. Symbian is an operating system for mobile devices such as smart phones. Our initial solution [7] consisted of a prototype implementation that was built upon an existing GUI test automation system.
The basic idea was to specify the behavior of the system under test (SUT) using Labeled Transition Systems (LTSs) that were fed to the underlying GUI testing tool. Simple random heuristic was used to traverse the model and execute the associated events using the facilities provided by the GUI testing tool. The lessons we learned from that experiment suggest that we still have a long way to go before an industrial-strength approach can be provided. Based on our experiences, we need more advanced metrics and heuristics in order to use the test models more effectively. Furthermore, it is not clear how model-based testing affects test control, i.e. how the different types of tests should be handled. Towards these ends, in this paper, we build on our previous results and address the following problems: On the one hand, concerning coverage, we need to define how to state coverage objectives and how to interpret achieved coverage data. In addition, in certain situations, testing previously tested areas should be avoided when retesting with the same model. On the other hand, we need a better control over the test runs. In other words, how to utilize the same models for smoke testing and long period robustness testing, for instance? The state of the art in GUI testing is represented by so-called keyword and action word techniques [2,1]. They help in maintenance problems by providing a clear separation of concerns between business logic and the GUI navigation needed to implement the logic. Our solution is based on three-tier architecture of test models, the two lowest tiers of which were used already in the prototype. Keyword tier is the lowest level tier in our model architecture defining how to navigate in the GUI. LTSs in this tier are called refinement machines. They define how an action is executed and tested. For instance, a refinement machine can define that the action of starting Camera application in a Symbian smart phone is executed by pressing a button, and the action of verifying that the application is running consists of checking that certain text is displayed on the screen. Action tier is the intermediate layer. Action machines, i.e. LTSs in this tier, describe the behavior that can be tested. They consist of action words, which correspond to high-level concepts that describe the problem domain rather than the solution domain. Action words are refined to keywords by refinement machines in the keyword tier. An action machine that tests interactions of two applications can be built by combining the action machines that define the behavior of the applications. Finally, Test Control tier is the highest-level tier. We call LTSs in this layer Test control machines. Composing LTSs in the two lower-level tiers results in a conventional test model. However, it is on this layer where we express which type of tests (e.g. smoke or a long period test) are to be run, which test models are used in the runs, which test guidance heuristics should be used, and what are the coverage objectives for each run. In the following, we will discuss the above scheme in detail and introduce our refined vision for the associated tool architecture that does not tie our hands to any specific GUI testing tool. The rest of the paper is structured as follows. In Section 2, we present the background of the paper. Sections 3 and 4 introduce the three-tier 55
model architecture and the coverage language, respectively. The tool architecture is presented in Section 5. Finally, some conclusions are drawn in Section 6.
2 Background
A test model specifies the inputs that can be given to and the observations that can be made on the SUT during a test run. In our approach, a test model is a labeled transition system (LTS). It is a directed graph with labeled edges and with exactly one node marked as an initial state.
Definition 2.1 [LTS] A labeled transition system, abbreviated LTS, is defined as a quadruple (S, Σ, ∆, ŝ) where S denotes a set of states, Σ is a set of actions (alphabet), ∆ ⊆ S × Σ × S is a set of transitions and ŝ ∈ S is an initial state.
In the test execution, we assume the test model to be deterministic. That is, it does not contain a state where two leaving transitions share the same action. The test model LTS is composed out of small, hand-drawn LTSs by a parallel composition tool. We use the parallel composition given in [6]. The speciality of the parallel composition is that it is explicitly given the combinations of actions that are executed synchronously. This way we can state, for example, that action a in LTS Lx is synchronized with action b in LTS Ly and their synchronous execution is observed as action c (the result).
Definition 2.2 [Parallel composition "∥R"] ∥R(L1, ..., Ln) is the parallel composition of n LTSs Li = (Si, Σi, ∆i, ŝi) according to rules R. Let ΣR be a set of resulting actions and √ a "pass" symbol such that ∀i : √ ∉ Σi. The rule set R ⊆ (Σ1 ∪ {√}) × ··· × (Σn ∪ {√}) × ΣR. Now ∥R(L1, ..., Ln) = (S, Σ, ∆, ŝ), where
• S = S1 × ··· × Sn
• Σ = {a ∈ ΣR | ∃a1, ..., an : (a1, ..., an, a) ∈ R}
• ((s1, ..., sn), a, (s1′, ..., sn′)) ∈ ∆ if and only if there is (a1, ..., an, a) ∈ R such that for every i (1 ≤ i ≤ n)
  · (si, ai, si′) ∈ ∆i, or
  · ai = √ and si = si′
• ŝ = ⟨ŝ1, ..., ŝn⟩
A rule in a parallel composition associates an array of "pass" symbols and actions of input LTSs to an action in the resulting LTS. The action is the result of the synchronous execution of the other actions in the array. If there is √ instead of an action, the corresponding LTS will not participate in the synchronous execution.
The test engine is a computer program that explores the test model starting from its initial state. In every step, it first chooses one action in the transitions that leave the current state. There are three types of actions: keyword-success, keyword-fail and the rest. For example, kwVerifyText is a keyword-success action that corresponds to a successful observation: the text "Camera" can be found on the display of the SUT. Its negation is the keyword-fail action ~kwVerifyText: the
text does not show up. awStartCam could be one of the rest actions. If the chosen action was one of the rest, the test engine silently (without any communication with the SUT) executes the corresponding transition. That is, it updates the current state to the destination state of the transition. If the chosen action was either of the keywords, the test engine communicates with the SUT accordingly. The communication may be an observing, inputting or a mix-up, for example searching for a text on the display, pressing a button and selecting a menu-item (the menu has to be browsed to find the item, thus both inputs and observations are required). The communication may either succeed or fail: the text either is or is not found on the display, the menu item could or could not be selected. Depending on the result, the test engine tries to execute a transition whose label matches the keyword and the result. If the result is “success”, it tries to execute the keyword action without a tilde, otherwise with a tilde. If there is no such transition leaving the current state, an error is found. Otherwise the transition is executed. Coverage data of a test model LTS (S, Σ, ∆, s) ˆ is a function ∆exec : ∆ → N. It associates transitions with numbers that represent how many times the transitions have been executed. Initially ∀t ∈ ∆ : ∆exec (t) = 0. The function is updated every time a transition is executed. Test engine runs a test with a test model until it finds an error or reaches coverage objectives stated for the test run. In the first case, the test run is halted immediately because the test model does not contain enough information to recover from errors. The behaviour in the second case depends on what has been defined in the test control tier, as we will show in the next section.
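The stepping behaviour just described can be summarised by the following Python sketch; it is our paraphrase, not the actual test engine, and adapter.execute is a hypothetical call that performs a keyword on the SUT and reports success or failure.

    def step(model, state, choose, adapter, coverage):
        transitions = model.out(state)            # [(state, action, dest), ...]
        action = choose(transitions)              # guidance heuristic picks a leaving action
        if action.lstrip("~").startswith("kw"):   # keyword: communicate with the SUT
            ok = adapter.execute(action.lstrip("~"))
            label = action.lstrip("~") if ok else "~" + action.lstrip("~")
        else:
            label = action                        # "rest" action: executed silently
        taken = [t for t in transitions if t[1] == label]
        if not taken:
            raise AssertionError("error found: no transition for " + label)
        coverage[taken[0]] = coverage.get(taken[0], 0) + 1   # update coverage data
        return taken[0][2]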
3 Three-tier model architecture
Next, we will have a look at LTSs on the three tiers of the test model architecture and how they communicate within the tier and with LTSs on adjacent tiers. To illustrate the design, we will use the following running example throughout the rest of the paper. We are testing a device that is capable of taking pictures, making phone calls and sending both email and multimedia messages (MMS). A user story is that Alice takes a picture and sends it to Bob by email or MMS. After five minutes, Bob calls back to thank her for the picture. With this story in mind, we are going to build a model and design a long period test and a smoke test that use the model.
3.1 Keyword tier
To form a test model we use LTSs of the two lowest tiers in the hierarchy, that is, action machines and refinement machines (see Figure 1). The purpose of the refinement machines is to refine every high level action (that is, action word) in action machines to sequences of executable events in the GUI of the SUT (that is, sequences of keywords). Separating the functionality from the GUI events should allow us to reuse action machines with SUTs that can perform the same operations but have different user interfaces.
[Figure 1: the three-tier model architecture. The test control tier (test control machines) sets the test model and coverage objectives and receives the verdict when a test is finished; the action tier (action machines) is asked to execute high-level actions and reports when execution has finished; the keyword tier (refinement machines) executes events through the adapter on the SUT and returns whether each event succeeded or failed.]
Fig. 1. Three-tier model architecture
For example, two camera applications may have exactly the same functionality in two devices where one has buttons and the other a touch screen. The reusability is very desirable, because designing an action machine takes much effort and insight into what is worth testing. On the other hand, when an action machine is already given, writing a refinement machine to use the action machine with a new device should be easier. In our running example, we could define three refinement machines to hide the user interfaces of three applications: Camera, Messaging and Telephone. To interleave the use of these applications, a task switching application is needed. The application allows the user to activate any application that is running in the background. We use yet another refinement machine to abstract the use of the task switcher. CameraRM in Figure 2 is a refinement machine of the Camera application. In its initial state (the filled circle) it is able to refine high level actions which mean starting the Camera application (awStartCam) and verifying that the application is running (awVerifyCam). Certainly many other actions should also be refined, but they are omitted in the figure for clarity. There are usually many refinement machines on the keyword tier, but they are not synchronized with each other. Instead, every LTS on the keyword tier is synchronized with an action machine on the action tier so that the transitions labeled by action words in the action machines become refined to keywords. We do not allow the refinement to change the behavior of the action machine. A refined action machine is able to execute exactly the same sequences of actions as it did before the refinement.
[Figure 2: CameraRM, the refinement machine for the Camera application. From its initial state, start_awStartCam is refined through the keywords kwSelectMenu and kwPressKey (twice) to end_awStartCam, and start_awVerifyCam is refined through kwVerifyText to end_awVerifyCam.]
Fig. 2. Refinement machine for Camera application
[Figure 3: CameraAM, the Camera action machine, with transitions labeled awStartCam, awVerifyCam, awTakePhoto, awVerifyPhoto, awDeletePhoto, awCreateMMS, awVerifyMMS, awCancelMMS, awQuit and awVerifyNoCam, together with the synchronization actions INT, IRET and PALLOW.]
Fig. 3. Camera action machine
That is, a valid refinement machine contains neither deadlocking nor infinite sequences of keywords (loops). Furthermore, in the parallel composition, the refinement machine never blocks the execution of any action word in the action machine.
3.2 Action Tier
Every action machine LTS can be synchronized with several refinement machines and also with other action machines. Instead of using all the power of the parallel composition operation in the synchronizations, we restrict ourselves to two synchronization mechanisms within the action tier. With this restriction we aim at two goals. Firstly, the test designer does not need to define parallel composition rules. The rules can be generated automatically based on the actions in the LTSs instead. Secondly, the restrictions enable us to do sophisticated automatic checks to find design flaws in LTSs. The states of action machines can be divided into two classes: running and sleeping states. The machines can execute action words (which again are refined to keywords by refinement machines) only in the running states. If the action machines are valid, the synchronization mechanisms guarantee that there is always exactly one action machine in a running state at a time. Initially, the running action machine is a scheduler action machine. A simple action machine for testing the Camera application of our running example is presented in Figure 3. It tests starting the application (awStartCam), taking a photo (awTakePhoto) and creating a multimedia message containing the photo (awCreateMMS). The three lowest states of the action machine in the figure are sleeping states, the leftmost of which is the initial state. When the camera action machine wakes up for the first time, it starts the Camera application and verifies that it indeed started. Then it is up to the test guidance algorithm whether a photo will be taken, the Camera application will be quit or left in the background, which means putting the action machine back to the sleep state. There are two synchronization mechanisms in the action layer. The first one controls which action machine is running, and uses the interrupt primitives INT, IRET, IGOTO and IRST. An action machine goes to a sleep state by executing the INT action, which synchronously wakes up the scheduler action machine. After some steps, the
scheduler then executes IRET synchronously with another (or the same) action machine and enters a sleep state. The other action machine is in a running state after execution of IRET. In some cases it is handy to bypass the scheduler. Running action machine A can put itself into sleep and synchronously wake up another action machine B by executing action IGOTO (the woken action machine executes IRST
). In this case the scheduler action machine stays asleep all the time. The INT-IRET and IGOTO-IRST interrupt modelling mechanisms are inspired by the two ways in which a user can activate applications in Symbian. INT-IRET corresponds to the situation where the task switching application (modelled by the scheduler action machine) is used to activate any running application. This could be compared to using "Alt–Tab" in MS Windows. With IGOTO-IRST we model the situation where the user activates a specific application directly within another application. For instance, in a Symbian phone, it is possible to activate the Gallery application by choosing "Go to Gallery" from the menu of the Camera application. Although the ideas for these mechanisms originate from the Symbian world, the mechanisms are generally applicable when testing many applications on any platform where the user is able to switch from one application to another. The other synchronization mechanism is used for exchanging information on shared resources between action machines. There are primitives for requesting (PREQ) and giving permissions (PALLOW). The former is to be executed in running states and the latter in sleeping states. The mechanism cannot wake up sleeping action machines or put the running action machine to sleep. The action machine in Figure 3 is able to execute PALLOW synchronously with the PREQ action of some other action machine. In our example, the Messaging action machine requests a photo file for sending it via MMS or Email. In the Messaging action machine we can safely assume the file to be usable in the transitions after PREQ until a sleep state can be visited. After that the file cannot be used without a new request because the Camera action machine may have deleted the photo while the Messaging was asleep. A test model is constructed out of LTSs in the action and keyword tiers with the parallel composition. The rules for the composition can be generated automatically from the alphabets of the LTSs: INT, IRET, IGOTO, IRST, PALLOW and PREQ actions are synchronized within the action tier. Action word transitions are split in two ((s, awX, s′) becomes (s, start_awX, s_new) and (s_new, end_awX, s′)) and the new action names are synchronized with the same actions in refinement machines. We still need a way to tell the test control tier when it is safe to stop running the test with the current test model. Therefore, we assume that whenever the test model is in its initial state, the test run with that model can be (re)started or stopped. Thus, the initial state should be reachable from every state of the test model. This property can be checked automatically, unless the test model is very large due to the state explosion problem.
3.3 Test Control Tier In the test control tier, we define which test models we use, what kind of tests we run and the order of the tests. The kind of a test is determined by setting coverage objectives, that is, what should be tested in the test model before the test run can be ended. In our running example, a test control machine could first run three very short smoke tests with three different test models. Each model could be built from a single action machine composed with its refinement machines. The first would test the Camera application, the second the Messaging, and the last the Telephone. When we know that at least the applications start and stop properly, we could run a longer test covering the elements in the user story: take a photo, send it and receive a phone call. This time the model would be composed of all the LTSs of the previous models, together with task switcher LTSs. Finally, we could start a possibly never-ending long period test with the same model that was used in the last test. We also would like to use the coverage data obtained in the previous test to avoid testing the same paths again. Execution of a transition in a test control machine corresponds to setting up a new test, running the test and handling the coverage data. All the information about test setup, coverage objectives and test guidance is encoded to the label of the transition, which is called control word. Firstly, for test setup, a control word determines which test model should be used in the test run. It also specifies what kind of initial coverage data should be used. It is possible to start testing in a situation where nothing is already covered, or to create the coverage data based on the execution histories of some previous test runs with the model. Secondly, for test run, coverage objectives and guidance heuristics are defined in the control word. Coverage objectives are stated in the coverage language, which we will introduce in Section 4. Because every coverage objective is a boolean function whose domain is the coverage data, it is natural to define the coverage requirement for the test run by combining objectives with logical “and” and “or” operators. There is still need to develop guidance algorithms that take both coverage data and coverage objectives into consideration. One possible approach could be defining a step evaluation function which ranks reaching a coverage objective to be the most desirable step, making progress closer to an objective to be very desirable, and executing a transition for the first time just desirable. This function could then be plugged into a game-like guidance heuristic as done in [8]. Another feature needed from the guidance algorithm is that it tries to guide the test model back to the initial state when the coverage objectives are fulfilled. Only after that, the test run with the model is finished and the test control machine is able to proceed. Finally, the control word in the test control machine defines what to do after the test run. Gathered coverage data can be either stored or erased. This choice may affect the later test runs, depending on whether or not they import coverage data 61
objective     ::= "require" o_quantifier type requirement "in" item_list
query         ::= "get" q_quantifier type query "in" item_list
o_quantifier  ::= "any" | "every" | "combined"
q_quantifier  ::= "every" | "combined"
type          ::= "action" | "state" | "transition"
requirement   ::= "count >=" number
query         ::= "count"
item_list     ::= name_regexp (" " item_list)*
Fig. 4. Grammar of the coverage language
and how their guidance algorithms react on executing already covered transitions to fulfil coverage objectives. On the other hand, this choice does not affect the ability to make queries on the achieved coverage later on, because the queries can be answered based on the execution log.
4 Coverage language
The coverage language is a simple language for expressing coverage objectives and querying what has been covered according to coverage data. The purpose of the language is two-fold. On one hand, objectives let us define test purposes for test runs. It is possible to run smoke and long period tests with the same test model just by varying the coverage objectives. On the other hand, queries should make the contents of the coverage data more accessible to testers, during and after the test run. There are two statements in the language: objective and query (see Figure 4). For example, objective require any action count >= 1 in end_awReceiveEmail end_awReceiveMMS is achieved when an email or a multimedia message is received at least once. It would be a reasonable coverage objective for a smoke test in our running example. Another good thing to test in the smoke test could be pressing every key at least once. The coverage objective would be require every action count >= 1 in kwPressKey.*. For a long period test we could require executing every possible transition that initiates sending email with objective require every transition count >= 1 in ([0-9]+, start_awSendEmail, [0-9]+). Regular expressions "[0-9]+" in the objective match any integers that identify the starting and destination states of transitions. If it seems that reaching every coverage requirement is too hard, we could set an alternative coverage objective require combined transition count >= 10000 in .*. This objective is met after 10000 executions of any transitions. Accomplishing objective
require any state count >= 1 in 120 121 requires visiting at least one of the states 120 and 121. (States in LTSs do not have labeling, but states are identified by natural numbers in our LTS file format. We refer to states with those numbers in the coverage language. Defining a labeling also over states, for example in the form of state propositions [3], would help in stating coverage requirements on states, but it would also complicate the parallel composition. On the other hand, visiting a state is not as useful a piece of information as executing an action, because we are dealing with action-based semantics here. There is no clear correspondence between the states of the test model and the states of the SUT.)
The query statement (in Figure 4) in the coverage language is used for inspecting what has been covered. It returns either a single value (when the "combined" quantifier is used) or a list of items with their execution counts (in case of the "every" quantifier). For instance, get every action count in kw.* lists all keywords and how many times they have been executed. A state is considered to be visited whenever it is the destination state of an executed transition. Initially there are no visited states (in particular, even the initial state is not initially visited). The execution count of an action is incremented whenever any transition labeled by the action is executed.
The rest of this section is dedicated to formalizing the semantics of the language. A transition is represented by a string (s,a,s′) where s and s′ are numbers that identify the source and the destination states of the transition, and a is a string that is the label of the transition. In the following, suppose that a contains characters a-z, 0-9 and '_'. match(r,t) is a boolean function that returns true if and only if regular expression r matches the string that represents transition t. Let R be a set of regular expressions and T a set of transitions. We define M(R, T) = {t ∈ T | ∃r ∈ R : match(r,t)}, that is, the set of those transitions in T whose string representation could be matched by at least one regular expression in R. When ∆exec is the coverage data related to a test model (S, Σ, ∆, ŝ) and n is a natural number:
require any transition count >= n in R ⇔ ∃t ∈ M(R, ∆) : ∆exec(t) ≥ n
require every transition count >= n in R ⇔ ∀t ∈ M(R, ∆) : ∆exec(t) ≥ n
require combined transition count >= n in R ⇔ Σ_{t∈M(R,∆)} ∆exec(t) ≥ n
Objectives considering actions and states can be changed to objectives on transitions by changing the regular expressions. actre(R) = {([0-9]+,r,[0-9]+) | r ∈ R} is a function that converts regular expressions matching actions so that they match every transition whose label could be matched by the original regular expression. Similarly, function statere(R) = {([0-9]+,[a-z0-9]+,r) | r ∈ R} converts expressions matching states to expressions matching transitions:
require any action count >= n in R ⇔ ∨_{r∈R} require combined transition count >= n in actre(r)
require any state count >= n in R ⇔ ∨_{r∈R} require combined transition count >= n in statere(r)
require every action count >= n in R ⇔ ∧_{r∈R} require combined transition count >= n in actre(r)
require every state count >= n in R ⇔ ∧_{r∈R} require combined transition count >= n in statere(r)
require combined action count >= n in R ⇔ (Σ_{r∈R} get combined transition count in actre(r)) ≥ n
require combined state count >= n in R ⇔ (Σ_{r∈R} get combined transition count in statere(r)) ≥ n
Note that in the last two expressions (combined action and combined state objectives) we used query statements. Unlike objectives, which are truth-valued expressions, queries return numbers or sets of item–number pairs. They are defined for transitions as follows:
get every transition count in R = {(t, ∆exec(t)) | t ∈ M(R, ∆)}
get combined transition count in R = Σ_{t∈M(R,∆)} ∆exec(t)
Action and state queries can be expressed with transition queries using the same ideas as in the conversion of coverage objectives:
get every action count in R = {(a, n) | ∃s, s′ : (s, a, s′) ∈ M(actre(R), ∆) ∧ n = get combined transition count in actre({a})}
get every state count in R = {(s′, n) | ∃s, a : (s, a, s′) ∈ M(statere(R), ∆) ∧ n = get combined transition count in statere({s′})}
get combined action count in R = get combined transition count in actre(R)
get combined state count in R = get combined transition count in statere(R)
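As an illustration only, and not the authors' implementation, the objective semantics for transitions translate almost directly into Python; a transition (s, a, s′) is rendered as the string "(s,a,s')" and matched against the regular expressions of the item list. Note that standard Python regular expressions require the literal parentheses to be escaped, unlike the examples in the text.

    import re

    def matched(regexps, transitions):
        # M(R, T): transitions whose string form is matched by some expression in R
        return {t for t in transitions
                if any(re.fullmatch(r, "(%s,%s,%s)" % t) for r in regexps)}

    def require(quantifier, n, regexps, coverage):
        # coverage: dict mapping transitions (s, a, s') to execution counts
        counts = [coverage[t] for t in matched(regexps, coverage)]
        if quantifier == "any":
            return any(c >= n for c in counts)
        if quantifier == "every":
            return all(c >= n for c in counts)
        if quantifier == "combined":
            return sum(counts) >= n
        raise ValueError(quantifier)

    # e.g. require("every", 1, [r"\([0-9]+,start_awSendEmail,[0-9]+\)"], coverage)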
[Figure 5 (component diagram): a Design Machine with the Model designer and Test model composer, used by the TestEngineer, producing LTSs, the test control model, test model and test data; a Model Machine with TestController, TestEngine, TestGuidance, Coverage, TestLog, TestLogger, TestVisualizer, TestReporter, TestControlModel, TestModel and TestData; a Test Machine running QuickTest Pro with the TestToolAdapter, TestScript and Keyword Library, operated by the TestTechnician; and Symbian mobile devices (owners Alice and Bob) reached over the mobile network through mobile clients for m-Test.]
Fig. 5. Test tool architectural concepts
5 Test tool architecture
Although our feasibility study on GUI testing with action words and keywords reached its goal [7], there were several shortcomings in the design and implementation of the tools. To mention some of them, coverage data was not collected, the test engine was tightly bound to a specific commercial GUI testing tool, and controlling tests (setting up, stop criteria) and different test types (smoke test, long period test) were not considered. In this section we outline our plan for a test tool architecture in which the test engine is not bound to any implementation language or any specific testing tools. The tool itself is currently only partly implemented. Similar keyword-based frameworks have been studied ([11], [10]), but not in the model-based context. On the other hand, a general model-based testing architecture (AGEDIS) presented in [4] does not seem to offer direct support for action words or keywords. The test tool consists of an adapter part and a model execution part. The adapter part provides a high-level interface through which the model execution part can execute keywords and inspect the results of the executions.
We build the adapter part inside a GUI testing tool called Mercury’s QuickTest Pro (QTP) [9] (see Figure 5). QTP is a testing tool for MS Windows. It is capable of capturing information about window resources, such as buttons and text fields, and providing access to those resources through an API. QTP also enables writing and executing test procedures using a scripting language (VBScript) and recording a test log when requested. As a remote control tool we used m-Test [5]. The tool provides access to the GUI of the SUT and to some internal information; list of running processes, for instance. m-Test brings interactive user interface of a Symbian device (buttons and display) to an ordinary application window in MS Windows, which again can be accessed from the QTP. With the scripting language provided by QTP we implement the keywords in keyword library so that they are converted to events in the m-Test window. The same scripting system is also used for implementing a small communication module (TestToolAdapter) which connects to the model execution part of our test tool. The model execution part is an external application that may be running in another computer. The first active component in the model execution part is the TestController. When the system is started, TestController instantiates a TestControl object which encapsulates a test control machine. TestController creates TestGuidance and Coverage objects to guide the execution in the test control machine. As already mentioned, executing a transition in a test control machine corresponds to setting up and running a test run with a test model. When a new test run is being set up, TestModel object is instantiated and new TestGuidance and Coverage objects are created to guide the test run. At that point it is possible to load the coverage data of a previous test run to the Coverage object. The old data is useful if we want to avoid testing the same behavior as in the previous run or if we want to repeat the same test as closely as possible, for example. How the coverage data affects the test run, is determined by the test guidance algorithms. In the test first run we could use an “Explorer” guidance algorithm which tries to reach coverage goals by executing as many unseen actions and untraversed transitions as possible. If an error is found and later on corrected, we could then use “RegressionTester” guidance algorithm which tries to reach the objectives by preferring already traversed transitions (perhaps also follow the last execution trace). TestController starts a test run by passing newly created TestGuidance, Coverage and TestModel objects to the TestEngine. The Coverage component contains the coverage data and it also manages coverage objectives. It is able to answer whether or not the execution of a transition takes us closer to achieving the coverage objectives. The information is essential for the test guidance algorithms in TestGuidance component. In addition to the coverage status, the test guidance algorithms can base their decisions on the structure of the test model and on the decisions of other selection algorithms in the TestGuidance component. 66
During the test run both TestControl and TestEngine write information on their events (loaded test models, coverage objectives, executed transitions, results of the executions) to the TestLog component. The contents of the log can be visualized with TestVisualizer. It can be used both in on-line and off-line modes, to see how the current test is advancing and to help debugging after the test. Finally, Model designer is a graphical tool for designing test control machines and test model components. The test model is built from the components using Test model composer. Because of the specified semantics of labels in the LTSs, it is enough that the test engineer selects appropriate action machines and refinement machines to obtain a model. Parallel composition rules can be constructed automatically based on the labels of the LTSs.
6 Conclusions
In this paper, based on our earlier work, we have presented our refined vision for model-based GUI testing in the Symbian environment. In this setting, the SUT is an embedded system, but unlike in usual embedded environments, we run system-level tests through a GUI. However, unlike in GUI-based testing of PC applications, there is no access to the GUI resources, which means that screen captures must be used. Our solution is based on a commercial GUI testing tool for the Symbian environment that we have extended with components for using test models. The test models fed to the tools are composed of component models of three different types: test control, action word, and keyword models, corresponding to different levels of abstraction. This three-tier architecture separates important concerns, facilitating test control as well as test model design. In future work, case studies are needed to assess the applicability of our approach.
References [1] Buwalda, H., Action figures, STQE Magazine, March/April 2003 (2003), pp. 42–47. [2] Fewster, M. and D. Graham, “Software Test Automation,” Addison–Wesley, 1999. [3] Hansen, H., H. Virtanen and A. Valmari, Merging state-based and action-based verification, in: Proceedings of the Third International Conference on Application of Concurrency to System Design (2003), pp. 150–156. [4] Hartman, A. and K. Nagin, The AGEDIS tools for model based testing, in: ISSTA ’04: Proceedings of the 2004 ACM SIGSOFT International Symposium on Software Testing and Analysis (2004), pp. 129–132. [5] Intuwave, m-Test homepage, At URL http://www.intuwave.com. [6] Karsisto, K., A new parallel composition operator for verification tools, Doctoral dissertation, Tampere University of Technology (number 420 in publications) (2003).
[7] Kervinen, A., M. Maunumaa, T. Pääkkönen and M. Katara, Model-based testing through a GUI, in: Proceedings of the 5th International Workshop on Formal Approaches to Testing of Software (FATES 2005), 2005.
[8] Kervinen, A. and P. Virolainen, Heuristics for faster error detection with automated black box testing, in: Proceedings of the Workshop on Model Based Testing (MBT 2004), Electronic Notes in Theoretical Computer Science 111 (2005), pp. 53–71.
[9] Mercury Interactive, QuickTest Pro homepage, At URL http://www.mercury.com.
[10] Rankin, C., The software testing automation framework, IBM Systems Journal: Software Testing and Verification 41(1) (2002), pp. 126–139.
[11] SAFS, Software automation framework support homepage, At URL http://safsdev.sourceforge.net/ (2006).
[12] Symbian Ltd., Symbian Operating System homepage, At URL http://www.symbian.com/ (2006).
Testing Self-Similar Networks
Constantinos Djouvas 1,2
Computer Science, The Graduate Center, The City University of New York, New York, NY, USA
Nancy D. Griffeth 3 Mathematics and Computer Science Lehman College, The City University of New York Bronx, NY, USA
Nancy A. Lynch 4 Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Boston, MA, USA
Abstract A hard problem in network testing is verifying the correctness of a class of networks, as well as the actual networks under test. In practice, at most a few networks (sometimes only one) are actually tested. Thus an important question is how to select one or more networks that are sufficiently representative to apply the results to a class of networks. We present a model-based technique for selecting a representative network. The central theorem establishes that the representative network displays any faults present in any network of the class. This paper introduces the concept of “self-similarity,” which is used to select the network, and presents the results of an experiment in testing one class of networks. Key words: Testing, verification, model-checking, I/O automata, parameterized processes.
1 This work was supported in part by DARPA/AFOSR MURI Award F49620-02-1-0325, MURI AFOSR Award SA2796PO 1-0000243658, NSF Award CCR-0121277, NSF Award NeTS-0435130, USAF, AFRL Award FA9550-04-1-012, DARPA Air Force (STTR) contract FA9550-04-C-0084, and Cisco URP Award.
2 Email: [email protected]
3 Email: [email protected]
4 Email: [email protected]
This paper is electronically published in Electronic Notes in Theoretical Computer Science, URL: www.elsevier.nl/locate/entcs
1 Introduction
When a vendor tests its own network equipment, the goal is to verify that the equipment works for a range of network topologies and configurations. Network users may also need to verify correctness of a class of networks. For example, ISP networks change continuously. Even small organizations add new hosts regularly. Anyone may add or swap in new network equipment as new technologies or higher bandwidths become available. The remaining equipment must continue working as expected. This observation motivates the problem of how to choose networks for testing, when the real goal is to verify that a class of networks works.
The central goal of this work is to find a single representative of a class of networks, whose correctness implies the correctness of the class. This paper investigates the use of a subnetwork that is common to all of the networks in the class and whose behavior looks like the behavior of any of the networks. When a subnetwork has this property, the class is called “self-similar”. A tester can also use a weaker condition, self-similarity with respect to a property, to establish that the network conforms to a single requirement imposing that property. In the latter case, it is necessary only to state the property and prove that if a network conforms to it, any composition consisting of multiple copies of the network also conforms to it.
Many properties of Internet protocols are self-similar by design. Proxies are a well-known example of self-similarity. A Web server behind a proxy looks like a Web server to a client; similarly, a proxy and client together look like a client to the Web server. Switching and routing algorithms are designed to hide the structure of the networks they support, so that the behavior of a single switch or router can look like the behavior of a larger network. DHCP failover servers are designed to look like a single, highly-reliable server.
In this paper, we address how to reduce the size and complexity of the network under test without reducing the test coverage. The central contribution of the paper is a method for choosing the network to be tested, by finding a common substructure of all the networks that behaves like each of the networks. Definitions and the basic theorem are presented in sections 3 and 4. In sections 5 and 6, we describe a case study, in which we modeled the forwarding function of learning bridges and proved self-similarity. In section 7, we describe an experiment on network testing, in which three tests were run, each consisting of a different learning bridge configuration.
2 Related Work
The general question is how to identify a small test that will verify correctness of an entire class of networks. Protocol conformance testing solves the problem by verifying that the implementation of a single network device conforms to
the required protocol standards. Then, assuming that the protocol standards guarantee that the network has the required properties, protocol conformance testing shows that a network consisting of any number of interconnected devices has the required properties. An excellent review of protocol conformance testing appears in [10]. However, conformance testing presupposes a validated formal model of each protocol and proofs that the models have the required properties. In practice, Internet standards have rarely been formalized and the job of developing formal proofs has barely begun. Some standards, such as BGP, have been shown to have serious problems [6]. Others, such as DHCP, work correctly with high probability, but behave incorrectly on rare occasions [3]. Nonetheless, these protocols have desirable properties, and it is important to be able to verify desired properties for specific implementations.
A different approach to network testing is to extend protocol conformance testing to “network interoperability testing,” as in [4,7]. This approach treats the network as a black box, whose external behavior is known but whose internal behavior cannot be observed. The test methodology requires developing a formal model of the network’s external behavior to generate tests that cover all possible sequences of visible actions. As noted above, models of networks and protocols are rarely available and time-consuming to develop.
Descriptions of industrial network testing based on actual practice appear in [1,5]. Buchanan [1] presents ad hoc and common-sense approaches to testing networks. While these techniques are valuable, it is hard to analyze and optimize them. Griffeth [5] presents a case study of interoperability testing in an industrial lab. A study of the time required in each stage of testing for four test projects (one Voice over IP, two Data Center, and one Network Management) at the Lucent Next Generation Networking Interoperability Lab (NGN) shows that the overwhelming majority of time is spent on test network setup [5]. Figure 1 summarizes the results from this study along with the results of the current experiment. The hypothesis of this paper is that testing only one configuration will result in significant savings in time, since only one network needs to be set up.
A similar problem, that of verifying a parameterized collection of processes, has been addressed in model-checking. Wolper and Lovinfosse [12] and Kurshan and McMillan [9] have shown how to apply induction to verify a parameterized collection of processes. Their results apply to collections of identical processes, a restriction that is not strictly required in this paper. Also, they require bisimulation of the processes; the present result requires only containment. They also require the tester to devise an invariant. This is not necessary for this work. In the simplest case, the tester must identify only that a requirement imposes a self-similar property. Other work on reducing the complexity of model-checking
Fig. 1. Time Required in Stages of Network Testing. The shaded bars show the results of the NGN study. The only test in which test lab setup did not take most of the time was a test of network configuration tools, i.e., network setup. The cross-hatched bars show the results of the experiment reported in this paper.
3 The I/O Automata Model
To analyze network properties, we use the I/O automata modeling framework [11], which models network components as automata and their interactions as shared actions of the automata. The model provides a formal basis for saying that one network behaves like another: automaton A is said to implement automaton B if all externally visible behaviors of A are also externally visible behaviors of B. An important technique for proving that one automaton implements another is simulation. Automaton A is said to simulate B if there is a simulation relation (defined in Section 6) relating the states of A to those of B. A self-similar automaton A is one that can be replicated and connected to itself via a channel to form a new automaton that implements the original automaton A. Another important concept is that of a self-similar property, which is a property of an automaton that is preserved by such a composition.
We review the definition of I/O Automata briefly; for details, see [11]. A small illustrative sketch in code follows the definition.
Definition 3.1 An I/O automaton consists of the following components:
• sig(A), a signature, consisting of three disjoint sets of actions: the input actions in(A), output actions out(A), and internal actions int(A). Output and internal actions are locally controlled; input actions are controlled by the environment. The set of all actions in the signature is denoted acts(A).
• states(A), a nonempty, possibly infinite set of states.
• start(A), a nonempty subset of states(A), called the start states.
• trans(A), a state-transition relation, contained in states(A) × acts(sig(A)) × states(A). We require that for each state s and input action π, there is a transition (s, π, s′).
• tasks(A), a task partition, which is an equivalence relation on the locally controlled actions having at most countably many equivalence classes.
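The sketch below renders the components of this definition as a small Python structure. The class and method names are ours and purely illustrative; the task partition is omitted for brevity.

    from dataclasses import dataclass, field

    @dataclass
    class IOAutomaton:
        # Signature: disjoint sets of input, output, and internal actions.
        inputs: frozenset
        outputs: frozenset
        internals: frozenset
        # States, start states, and the transition relation as (state, action, state) triples.
        states: set = field(default_factory=set)
        start: set = field(default_factory=set)
        trans: set = field(default_factory=set)

        def actions(self):
            return self.inputs | self.outputs | self.internals

        def enabled(self, state):
            """Actions with at least one outgoing transition from the given state."""
            return {a for (s, a, t) in self.trans if s == state}

        def is_input_enabled(self):
            """The requirement that every input action is enabled in every state."""
            return all(self.inputs <= self.enabled(s) for s in self.states)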
An execution of I/O automaton A is a sequence s0, π1, s1, ..., sn−1, πn, sn, where s0 is a start state and (si−1, πi, si) is a transition for each i ≥ 1. An execution can be finite or infinite. The set of executions of A is denoted as execs(A). We define traces(A) as the set of all sequences π1, π2, ..., πn, ... obtained by removing the states and internal actions from a sequence in execs(A). Traces capture the notion of externally visible behavior. A trace property of an automaton A is a property that holds for all traces of A.
The composition operation allows the construction of complex I/O automata by combining primitive I/O automata. To compose automata, we treat actions with the same signature in different automata as the same action, and when any component performs an action π, it forces all the components having the same action to perform it. To compose automata, they must be compatible:
Definition 3.2 A countable collection {Si}i∈I is compatible if for all i, j ∈ I, i ≠ j, all of the following hold: (1) int(Si) ∩ acts(Sj) = ∅, (2) out(Si) ∩ out(Sj) = ∅, and (3) no action is contained in infinitely many sets acts(Si).
Definition 3.3 Given a compatible collection {Ai}i∈I of automata, the composition A = Πi∈I Ai is formed by the following rules:
• sig(A) is defined by: out(A) = ∪i∈I out(Ai), int(A) = ∪i∈I int(Ai), and in(A) = ∪i∈I in(Ai) − ∪i∈I out(Ai).
• states(A) = Πi∈I states(Ai).
• start(A) = Πi∈I start(Ai).
• trans(A) is the set of triples (s, π, s′) such that for all i ∈ I, if π ∈ acts(Ai) then (si, π, s′i) ∈ trans(Ai); otherwise, si = s′i.
• tasks(A) = ∪i∈I tasks(Ai).
We denote a finite composition of automata A1, ..., An by A1 ∥ ... ∥ An. After composing I/O Automata, we may want to hide actions used for communication between components, making them internal actions of the composition. Thus, ActHideΦ(A), for Φ ⊆ out(A), is defined as the automaton obtained from A by reclassifying each action in Φ as internal.
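Building on the IOAutomaton sketch above, the binary case of this composition rule can be written as follows. Again this is an illustration under our own representation; it enumerates composite states explicitly and ignores the task partition and hiding.

    from itertools import product

    def compose(a, b):
        """Binary parallel composition a || b of two compatible I/O automata (sketch)."""
        assert not (a.internals & b.actions()) and not (a.outputs & b.outputs)
        shared = a.actions() & b.actions()
        trans = set()
        for (sa, sb), act in product(product(a.states, b.states),
                                     a.actions() | b.actions()):
            ta = {t for (s, x, t) in a.trans if s == sa and x == act}
            tb = {t for (s, x, t) in b.trans if s == sb and x == act}
            if act in shared:
                # A shared action is performed by both components at once.
                trans |= {((sa, sb), act, (na, nb)) for na in ta for nb in tb}
            elif act in a.actions():
                trans |= {((sa, sb), act, (na, sb)) for na in ta}
            else:
                trans |= {((sa, sb), act, (sa, nb)) for nb in tb}
        return IOAutomaton(
            inputs=frozenset((a.inputs | b.inputs) - (a.outputs | b.outputs)),
            outputs=frozenset(a.outputs | b.outputs),
            internals=frozenset(a.internals | b.internals),
            states=set(product(a.states, b.states)),
            start=set(product(a.start, b.start)),
            trans=trans,
        )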
4 Self-Similarity
The problem that motivates this paper is that of finding a representative network to test instead of testing all members of a class. If there is a small
network N that “looks like” all larger networks in the class, then the smallest such network is an obvious candidate. This is because we can test N by itself to determine properties of the entire class.
Defining Self-Similarity.
Because we are interested in networks, we consider only automata with send output actions and receive input actions. These automata are parameterized by the number of ports (interfaces) they have to the network. Each send action is associated with a port, and sends the message out using the port. Each receive action is also associated with a port and receives a message arriving on the port. Message is the set of possible messages over a port. An automaton with n ports has a signature containing at least the following actions: send(m : Message, i : Int), where 1 ≤ i ≤ n, and receive(m : Message, i : Int), where 1 ≤ i ≤ n.
To combine automata, we use a channel automaton Channel(A, B)i,j, as described in [11]. It joins port i of automaton A to port j of automaton B. (When only two automata are being composed, we write just Channeli,j.) This automaton has input actions send(m, i)A and send(m, j)B and output actions receive(m, i)A and receive(m, j)B. We assume that messages are delivered reliably, in order, and with no duplication.
Suppose that an automaton N is parameterized by the number n of ports. Then we say that N(n) is self-similar if traces(ActHideΦ(N(n) ∥ Channeli,j ∥ N(n))) ⊆ traces(N(2n − 2)), where Φ = {send(m, i)a, send(m, j)b, receive(m, i)a, receive(m, j)b}. In other words, the externally visible behavior of the composition of N(n) with itself, using a channel connecting ports i and j, looks like that of a single automaton N(2n − 2), ignoring actions on the ports connecting the automata.
We also define self-similarity for properties of networks, since it may be easier to establish self-similarity of interesting properties than of entire automata. We say that a trace property T is self-similar if the network N(n) ∥ Channeli,j ∥ N(n) has property T whenever the network N(n) has property T. Thus test results concerning a self-similar property of a network N(n) can be generalized to apply to larger networks.
Using Self-Similarity in Testing.
By the definition of self-similarity, correct behavior of a self-similar network N implies correct behavior of a larger network composed of multiple instances of N. Perhaps more importantly, if there are bugs in the larger network, they will also be found in N. There are two approaches that allow us to take advantage of self-similarity to reduce the size of the network under test. First, we can define a self-similar model of the network that has the properties of interest in the test effort.
Second, we can test directly whether the properties of interest are self-similar. The case study of learning bridges in Section 6 follows the first approach. A set of axioms for learning bridges, and a proof that a composition of two automata obeying the axioms also obeys them, is presented in a longer version of this paper [2].
Self-Similar Models.
This approach requires a generalized model M of the network that is self-similar. If the specification holds for M and if we establish by testing that N implements M, we can use the test results as if N itself were self-similar. The following theorem is the basis of this claim.
Theorem 4.1 If M(n) is self-similar and if traces(N(n)) ⊆ traces(M(n)) ⊆ traces(S), then traces(ActHideΦ(N(n) ∥ Channeli,j ∥ N(n))) ⊆ traces(S).
This theorem says that given a network N(n) and a self-similar model M(n), where M(n) implements S and N(n) implements M(n), we can conclude that two composed instances of network N(n) implement S. By induction, we can compose any number of instances of N(n) and still conform to S.
Proof. Follows immediately from the definitions. □
Self-Similar Properties.
If self-similar trace properties S and T both hold for a network N, then clearly so does the trace property S ∧ T. This fact can be used in showing that a complex network satisfies a conjunction of properties T1 ∧ T2 ∧ . . . ∧ Tn: in showing this, one can prove that each individual property Ti is self-similar, rather than considering the properties together.
Not every property we are interested in testing will turn out to be self-similar. However, we believe that many will be; for these, testing can be carried out using small networks.
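For finite-state models, the trace inclusion in the definition of self-similarity can be checked mechanically up to a bounded trace length. The sketch below is only an approximation of the definition and assumes a simple finite representation of the two systems; the helper names and the (trans, start, hidden) representation are our own.

    def traces_upto(trans, start, hidden, depth):
        """Externally visible traces of bounded length.
        trans maps a state to a list of (action, next_state) pairs."""
        result, frontier = {()}, {((), s) for s in start}
        for _ in range(depth):
            step = set()
            for trace, state in frontier:
                for action, succ in trans.get(state, []):
                    new_trace = trace if action in hidden else trace + (action,)
                    step.add((new_trace, succ))
                    result.add(new_trace)
            frontier = step
        return result

    def looks_self_similar(composed, single, depth=6):
        """Bounded check of traces(ActHide(N(n) || Channel || N(n))) being contained
        in traces(N(2n-2)).  composed and single are (trans, start, hidden) triples."""
        return traces_upto(*composed, depth) <= traces_upto(*single, depth)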
5 Learning Bridges
A learning bridge interconnects separate IEEE 802 LAN segments into a single bridged LAN. It relays and filters frames “intelligently” between the separate LAN segments [8]. A learning bridge incorporates a forwarding algorithm and a spanning tree algorithm. The forwarding algorithm initially forwards every frame that arrives at a port out every other port. Also, when a frame arrives at a port, the forwarding algorithm “learns” the relationship between the source address and the port. It records this relationship in a filtering database. Once the forwarding algorithm learns the address-to-port relationship, it forwards any frame sent to that address on the corresponding port. The spanning tree algorithm converts an arbitrary topology to a tree. This eliminates cycles from the network so that frames will not be forwarded forever. We assume that the following important property is enforced by the spanning tree algorithm, as required by the standard: “The spanning tree algorithm creates a single spanning tree for any bridged LAN topology.” Thus, there is a unique path between any two hosts and cycles are eliminated.
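The forwarding algorithm just described can be sketched in a few lines of Python; the class, port numbers, and frame fields below are hypothetical illustrations and do not come from the IEEE 802.1D text.

    class LearningBridge:
        """Sketch of the forwarding algorithm: learn source-address-to-port
        bindings and forward to the learned port, otherwise flood."""

        def __init__(self, ports):
            self.ports = set(ports)
            self.filtering_db = {}          # learned MAC address -> port

        def handle_frame(self, in_port, src, dst):
            # Learn the relationship between the source address and the arrival port.
            self.filtering_db[src] = in_port
            out = self.filtering_db.get(dst)
            if out is None:
                return self.ports - {in_port}   # unknown destination: flood
            if out == in_port:
                return set()                    # destination on the arrival segment: filter
            return {out}                        # known destination: forward on the learned port

    # Example: frames from host A on port 1 teach the bridge where A is.
    bridge = LearningBridge(ports=[1, 2, 3, 4])
    assert bridge.handle_frame(1, src="A", dst="B") == {2, 3, 4}   # B unknown, flood
    assert bridge.handle_frame(2, src="B", dst="A") == {1}         # A was learned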
6 Self-Similarity of Learning Bridges
This section presents our proof that learning bridges are self-similar. The proof is based on a generalized model of learning bridges. The self-similarity property allows a tester to use Theorem 4.1 to justify testing only a single learning bridge to verify an entire network 5.
Learning bridge operation can be described briefly as “send incoming frames out all ports until the correct port is known; then send out the correct port only.” A network of bridges that conform exactly to this requirement is not self-similar. Consider the following example:
Example 6.1 Learning Bridge. Bridges A and B are connected to each other, with A preceding B in a path from S (source) to D (destination). Suppose that the filtering database in A does not contain an entry for D, while the filtering database of B does contain an entry for D. Then if a message is initiated from S to D, A forwards this message to every active port but B forwards it to only the correct port. Compose A and B into one bridge AB. The requirement above means that an external observer would expect the trace of AB to have only one outgoing message with destination D. But this does not happen. Instead the message is forwarded to all ports that are inherited from A and to a single port inherited from B—the same one that B forwards the message to.
5 Note that we address only the forwarding of messages in this paper, not the construction of the spanning tree.
So we define a generalized model, which requires only that the bridge copies each message to the “correct port”, and perhaps to other ports as well. By “correct port” P, we mean that P is the port through which the destination is reachable. The learning bridge implements this by using a filtering database to record the source address of each arriving message along with the port at which it arrived. All subsequent messages sent to that address will be copied to the corresponding port (and possibly other ports). If no message has been received from the destination address, the filtering database does not have an entry for the address, and the bridge forwards the message to all ports.
The Generalized Model.
Each bridge has five actions: input action receive, output action send, and internal actions copyIn, copyOut, and delete. It has a filtering database, an input and output buffer for each port, and an array of queues corresponding to each (input port, output port) pair. The array entry queue[i, j] is a queue of messages that have arrived at port i and are destined to be sent out port j. The receive action adds received messages to the input buffer for the arrival port and updates the filtering database. The send action sends the first message in a port’s output buffer to the channel connected to the port. The copyIn action copies a message from an input buffer to the end of all the internal queues for the input port; copyOut copies a message from one internal queue to an output buffer. Finally, the delete action can delete an arbitrary message m from an internal queue, if the correct port is known at the time of the delete and the queue doesn’t correspond to the correct port for the message 6. We assume that there are a finite number of active ports in any bridge and that the spanning tree algorithm determines which ports are active.

automaton bridge(n : Int)i
  signature
    input receive(m, inPort)i
    output send(m, outPort)i
    internal copyIn(m, inPort), copyOut(m, inPort, outPort)i, delete(m, inPort, outPort)i
  states
    inbuf, an array of input buffers, indexed by {1, ..., n}, one for each port
    outbuf, an array of output buffers (FIFO queues) indexed by {1, ..., n}, one for each port, initially all empty
    queue, an array of FIFO queues indexed by {1, ..., n} × {1, ..., n}, one for each pair of ports, initially all empty
    filterDB, a mapping of message destinations to ports of bridgei, indexed by {1, ..., n}, initially all nil
  transitions
    receive(m, inPort)i
      effect
        add m to inbuf(inPort)
        set filterDB(m.src) := inPort
    send(m, outPort)i
      precondition
        m is the first element on outbuf(outPort)
      effect
        remove first element from outbuf(outPort)
    copyIn(m, inPort)
      precondition
        m is the first element on inbuf[inPort]
      effect
        add m to queue[inPort, i] for all i ≠ inPort
        remove m from inbuf[inPort]
    copyOut(m, inPort, outPort)i
      precondition
        m is the first element on queue[inPort, outPort]
      effect
        add m to outbuf[outPort]
        remove m from queue[inPort, outPort]
    delete(m, inPort, outPort)i
      precondition
        m is in the queue queue[inPort, outPort] ∧ filterDB[dest(m)] ≠ nil ∧ filterDB[dest(m)] ≠ outPort
      effect
        remove m from queue[inPort, outPort]

6 The delete action is one of many ways to model a bridge that is allowed to forward a message out a port other than the correct one. It nondeterministically removes messages from queues that don’t lead to the correct port.
Composition of Bridges:
Now we describe the composition of two learning bridges. We assume that the spanning tree algorithm has been run to completion by all the bridges in the network and that there are no failures. Because of this, there is only one active path between any two bridges. Let bridge1 and bridge2 be two learning bridges running the IOA code given above. We use the convention that port i is a port of bridge1 and j is a port of bridge2. Without loss of generality, we assume that port i0 of bridge1 is connected with port j0 of bridge2 through Channeli0,j0. Because of the spanning tree algorithm, these are the only active ports connecting bridge1 and bridge2.
Let bridgec be the result of renaming ports of bridge2 to n + 1, ..., 2n (to avoid conflict with port numbers of bridge1), then composing bridge1 and bridge2 with a connecting channel, and finally hiding the send and receive actions on the channel between them: bridgec = ActHideΦ(bridge1 ∥ Channeli0,j0 ∥ bridge2), where Φ = {send(m, i0)1, receive(m, i0)1, send(m, j0)2, receive(m, j0)2}.
Our goal is to show that bridgec is essentially the same as a single bridge,
which we will call bridgep, running the learning bridge IOA. bridgep must have the same number of ports as bridge1 and bridge2 together, minus the two connected ports. Thus if bridge1 and bridge2 each have n active ports, bridgep has 2n − 2 active ports. Port i of bridgep, with 1 ≤ i ≤ n, is connected to the same channel as the corresponding port i of bridge1. Similarly port j of bridgep, with n + 1 ≤ j ≤ 2n, is connected to the same channel as the corresponding port j of bridge2. Finally, the input and output actions of bridgep are renamed so that the actions on port i, 1 ≤ i ≤ n, are receive(m, i)1 and send(m, i)1 (instead of receive(m, i)p and send(m, i)p); similarly, actions on port j, n + 1 ≤ j ≤ 2n, are receive(m, j)2 and send(m, j)2.
Simulating a bridge with a composition of bridges:
We use an important theorem about IOA to show the equivalence of bridgec to bridgep. The theorem says that if there is a simulation relation (defined below) from an IOA A to an IOA B, then traces(A) ⊆ traces(B).
Definition 6.2 A simulation relation from an IOA A to an IOA B is a relation R ⊆ states(A) × states(B). Define f : states(A) → P(states(B)) by f(s) = {t | (s, t) ∈ R}. To be a simulation relation, R must satisfy:
(i) (Start condition:) If s ∈ start(A), then f(s) ∩ start(B) ≠ ∅.
(ii) (Step condition:) If s is a reachable state of A, u ∈ f(s) is a reachable state of B, and (s, π, s′) ∈ trans(A), then there is an execution fragment α of B starting in state u and ending in some state u′ ∈ f(s′) such that trace(α) = trace(π).
Below, we define a relation R from bridgec to bridgep and prove that R is a simulation relation. This gives us the desired result:
Theorem 6.3 The learning bridge automaton bridge(n) is self-similar.
Proof. Let s be a state of bridgec and t be a state of bridgep. We use dot notation to denote a state variable in a bridge, e.g., s.filterDB1 is the value of the filtering database of bridge1 in state s of bridgec. The pair (s, t) belongs to the relation R if:
(i) t.filterDB = s.filterDB1 ∪ s.filterDB2 − {⟨addr, port⟩ | port ∈ {i0, j0}}
(ii) t.outbuf[i] = s.outbuf[i]m for i ∈ ports1 ∪ ports2 − {i0, j0}, and the value m ∈ {1, 2} depends on the value of i.
(iii) t.inbuf[i] = s.inbuf[i]m for i ∈ ports1 ∪ ports2 − {i0, j0}, and the value of m ∈ {1, 2} depends on the value of i.
(iv) The internal array of message queues t.queue corresponds to the combined arrays s.queue1 and s.queue2 as follows:
• t.queue[i, i′] = s.queue[i, i′]1 if i, i′ ∈ ports1, i, i′ ≠ i0
• t.queue[j, j′] = s.queue[j, j′]2 if j, j′ ∈ ports2, j, j′ ≠ j0
• t.queue[i, j] is a concatenation of the following queues for i ∈ ports1, j ∈ ports2, with i ≠ i0, j ≠ j0: s.queue[j0, j]2, s.outbuf[j0]2, s.queue[j0, i0] (the channel queue), s.inbuf[i0]1, s.queue[i, i0]1
• t.queue[j, i] is defined symmetrically for i ∈ ports1, j ∈ ports2, with i ≠ i0, j ≠ j0.
These conditions mean that:
(i) The filtering database of bridgep contains the same entries as the union of the filtering databases of the two component bridges of bridgec, excluding the entries for the internal ports.
(ii) The output buffer for each port of bridgep contains the same messages as the output buffer of the corresponding port of bridgec. There are no buffers in bridgep corresponding to i0 and j0. These buffers in bridgec may contain any messages consistent with the other conditions.
(iii) The input buffer for each port of bridgep contains the same messages as the input buffer of the corresponding port of bridgec.
(iv) Entries in the internal array of queues are the same in bridgep as bridgec if the entry connects an input port to an output port of the same component bridge; otherwise, they are a concatenation involving the channel queue and the buffers for ports i0 and j0.
To show that R is a simulation relation, we must prove the start condition and the step condition. The former is trivial because all states of both bridges are initially empty. The latter requires proving that states of bridgep and bridgec correspond after each action. First we prove state correspondence for the filtering databases:
Definition 6.4 State Invariant: In all reachable states of the composed IOA, the filtering database of bridgec corresponds to the filtering database of bridgep as defined by the simulation relation.
The proof is by induction on the length of an execution. The result is clear if a message is forwarded only on ports of the bridge at which it arrived. It is less obvious when a frame arrives at one bridge and is forwarded out the second bridge. In this case, the filtering databases of both bridge1 and bridgep are updated on receipt of the message with the relationship between the arrival port and the source address. Later, the filtering database of bridge2 is updated to show the path to the source goes through bridge1. Since the simulation relation refers only to the entry in bridge1 and ignores the entry in bridge2, it is preserved in this case (as well as all others).
To show that input buffers, output buffers, and internal queues correspond after each action, we consider all actions π. Table 1 summarizes all the possible actions of bridgec, the corresponding execution fragment of bridgep, and the trace, which is the same for both bridges.
No.  Action of Bridgec                  Execution fragment of Bridgep                      Trace
1    receive(m, i)1, i ≠ i0             receive(m, i)1                                     receive(m, i)1
2    receive(m, j)2, j ≠ j0             receive(m, j)2                                     receive(m, j)2
3    receive(m, i0)1                    λ                                                  λ
4    receive(m, j0)2                    λ                                                  λ
5    send(m, i)1, i ≠ i0                send(m, i)1                                        send(m, i)1
6    send(m, j)2, j ≠ j0                send(m, j)2                                        send(m, j)2
7    send(m, i0)1                       λ                                                  λ
8    send(m, j0)2                       λ                                                  λ
9    delete(m, i, i′)1, i′ ≠ i0         delete(m, i, i′)p                                  λ
10   delete(m, j, j′)2, j′ ≠ j0         delete(m, j, j′)p                                  λ
11   delete(m, i, i0)1                  Sequence delete(m, i, j)p for j ∈ ports2, j ≠ j0   λ
12   delete(m, j, j0)2                  Sequence delete(m, j, i)p for i ∈ ports1, i ≠ i0   λ
13   copyIn(m, i)1, i ≠ i0              copyIn(m, i)p                                      λ
14   copyIn(m, j)2, j ≠ j0              copyIn(m, j)p                                      λ
15   copyIn(m, i0)1                     λ                                                  λ
16   copyIn(m, j0)2                     λ                                                  λ
17   copyOut(m, i, i′)1, i′ ≠ i0        copyOut(m, i, i′)p                                 λ
18   copyOut(m, j, j′)2, j′ ≠ j0        copyOut(m, j, j′)p                                 λ
19   copyOut(m, i, i0)1                 λ                                                  λ
20   copyOut(m, j, j0)2                 λ                                                  λ

Table 1. Correspondence between actions of Bridgec and Bridgep.
A simple case analysis establishes the result. □
7 Experiment
We performed three tests on learning bridges with the goal of quantifying the impact of self-similarity in reducing test time. The first test used a single bridge, the second two connected bridges, and the third used three connected bridges. Our hypothesis was that doing only the first test would reduce the test time by at least a factor of 2 over testing three connected bridges, since only one configuration need be tested rather than three. Test setup in this case is much simpler than most network test setup, so the time savings should be understated.
In our tests, we used three Cisco Catalyst 2950 switches, each with four hosts connected to it, all on a single vlan (vlan1). We used 300 seconds (the default) for the expiration time of an entry in the mac-addr-table, which is the internal table on Cisco switches containing the learned MAC addresses. Thus entries that are not used for 5 minutes will be removed from the table. The hosts were configured with network addresses in the 192.168.0.0/24 network. Four hosts were connected to each switch. The network was not connected to a router, so that only traffic from the LAN was visible.
In each test, one of the hosts executed a script to ping each other connected host 5 times. In addition, the pinger tried to ping various non-existent hosts 5 times each. After attempting to ping all hosts in the list, the pinger slept for 600 seconds, allowing the mac-addr-table entries to expire, and then repeated the pings. For the ping, the pinger used the parameters -f -c 5 -p (a sketch of such a driver script follows the option list below):
• -f: Flood ping with 0 interval: send packets as fast as the host supplies them.
• -c 5: Packet count is 5.
• -p : Fill the packet with the given hexadecimal pattern.
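The driver script itself is not reproduced in the paper; as an illustration of the procedure described above, it might look roughly like the following Python sketch, in which the host addresses, patterns, and overall structure are our own assumptions.

    import subprocess
    import time

    CONNECTED = ["192.168.0.2", "192.168.0.3", "192.168.0.4"]      # hypothetical host list
    NON_EXISTENT = ["192.168.0.200", "192.168.0.201"]              # hosts expected to be absent
    PATTERNS = ["ff", "00", "a5"]                                  # payload pattern varied per ping

    def ping_round():
        for i, host in enumerate(CONNECTED + NON_EXISTENT):
            pattern = PATTERNS[i % len(PATTERNS)]
            # Flood ping, 5 packets, payload filled with the given hexadecimal pattern.
            subprocess.run(["ping", "-f", "-c", "5", "-p", pattern, host])

    while True:
        ping_round()
        time.sleep(600)    # let the mac-addr-table entries expire, then repeat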
The flood option was used to stress the switch as much as possible, assuming that errors are more likely when the switch is stressed. The pattern was varied in each ping to pick up potential data-dependent issues on the network. The network traces were captured with the command tcpdump -s0, to capture the entire frame. For analysis, we used tcpdump with options -exxtts0, meaning:
• -e: Print the link-level header with each frame. This is required to evaluate the switch behavior, since it is a link-layer device.
• -s0: Capture all octets in the frame, for use in evaluating unexpected behavior.
• -tt: Print an unformatted timestamp with each frame, to disambiguate which messages match.
• -xx: Print each frame, including its link-level header, in hex.
Correct bridge behavior would require that hosts capture the following messages (a sketch of such a trace check follows this list):
• Broadcast: All ARP request messages broadcast by any host must appear in the traces for all hosts. In other words, if an ARP request message appears in the trace for the source host, it must also appear in the trace for each other host. For connected hosts, the number of ARP requests was one or two, although the number could correctly be higher (on other tests, we have seen as many as three on larger LANs). For hosts that were not available, six messages were broadcast.
• Unicast: For each unicast message appearing in the trace for a source host, the trace at the destination host must contain the same message (ARP reply message, echo request message, or echo reply message).
• Received messages: Each message received must match a message that was sent.
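A minimal sketch of such a check over already parsed traces is shown below. It assumes each host's tcpdump output has been reduced to a list of (kind, src, dst, ident) tuples; that representation and the function name are our own, not an output format of tcpdump.

    def check_traces(traces):
        """traces: dict mapping host name -> list of (kind, src, dst, ident) tuples,
        one entry per frame captured at that host. Returns a list of violations."""
        problems = []
        hosts = set(traces)
        for host, trace in traces.items():
            for msg in trace:
                kind, src, dst, ident = msg
                if src == host:
                    if kind == "arp-request":
                        # Broadcast: must appear in the trace of every other host.
                        for other in hosts - {host}:
                            if msg not in traces[other]:
                                problems.append(f"broadcast {ident} missing at {other}")
                    elif dst in hosts:
                        # Unicast: must appear in the trace of the destination host.
                        if msg not in traces[dst]:
                            problems.append(f"unicast {ident} missing at {dst}")
                else:
                    # Received messages: must match a message captured at the sender.
                    if src in hosts and msg not in traces[src]:
                        problems.append(f"{host} received {ident} never seen at {src}")
        return problems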
Subsequent analysis of the network traces generated by tcpdump found all of the required messages. The tests were set up and run by a single member of the project research staff, a recent graduate of the computer science program at Lehman College. Since the first tests run were the single switch tests, followed by the two switch tests, and finally the three switch tests, it is possible that learning from earlier tests reduced the time required for setting up the later tests. Because of time constraints, we actually used only one host as the pinger instead of rotating through the hosts; this affected the total execution time, which is predictable since we used scripts. It would have been multiplied by the number of hosts in each test (4 for the one switch case, 8 for the two switch case, and 12 for the three switch case). We assume that the effect on setup time would have been minor, on the order of a few minutes for copying the scripts to the other hosts. Table 2 shows the distributions of time observed for running the tests. The short time required for test planning can be attributed to the simple nature of the test. We observe that after setting up for the first test suite, on the single learning bridge configuration, the time required for setting up the lab for later test suites was greatly reduced.
                     One bridge               Two bridges              Three bridges
Test Planning        1 hour                   -                        -
Test Lab Setup       12.5 hours               1.08 hours               0.92 hours
Test Execution       2.33 hours (9.3 hours)   2.33 hours (18.6 hours)  2.33 hours (27.9 hours)
Test Documentation   3 hours                  2 hours                  2 hours
Total                18.83 hours              5.41 hours               5.25 hours

Table 2. Times required for stages of testing for 1, 2, and 3 bridges. Presumed test execution times for using each of the hosts as a pinger, instead of only one, are shown in parentheses.

It took approximately 1.6 times as long to run three tests as it did to run the first, instead of 2 times as long. One reason for this was that, because the networks are self-similar, the test setup is also almost the same; thus the experience gained setting up one configuration reduces the time required to set up the next configuration. Another reason was that the configuration tasks themselves were not difficult. Creating test execution scripts and verifying that the network configuration was correct was the most difficult part of the setup.
We note that in practice, rather than testing until a desired level of confidence is reached, testers actually test until they run out of time. This phenomenon affected this test as well. Thus it is likely that the primary contribution of using self-similarity in testing will be to help testers select better tests and to improve the level of confidence in the results of testing.
A secondary goal of this experiment was to identify useful tools that might be built to use test models, especially self-similar models, to support more cost-effective testing. Difficult problems observed in the testing were evaluating test results (i.e., correct or not) and verifying correctness of the test lab setup. A model that supports determining whether a network trace is valid would be useful for evaluating test results. Better network management tools would help verify correctness of the test lab setup.
8 Conclusions
In this paper, we have shown that the self-similarity of network devices and their properties provides a powerful tool for reducing the size of a network testing effort. All networks in a class of self-similar networks can be tested by testing the smallest self-similar subnetwork. This reduces to one the number
of networks to be tested while minimizing the size of the network. A case study of the self-similarity of learning bridges illustrates one approach to using self-similarity in network testing. This approach uses a self-similar network model that captures the behaviors that the network must implement. A longer version of this paper [2] shows how to define required properties of learning bridges and prove self-similarity. The latter approach will be necessary when a model of the network protocol is not available.
Additional work is needed to identify other self-similar networks and important self-similar properties of networks. Also, it will be useful to investigate the use of models for evaluating the results of a network test. Another line of investigation is to determine how to evaluate the coverage of a set of tests for a network and to develop ways to measure the level of confidence we have that a network works, given a test suite for the network.
Acknowledgments. We are grateful to Pearl Abotsi for her excellent work running the tests.
References [1] Robert W. Buchanan. The Art of Testing Network Systems. Wiley, 1996. [2] Constantinos Djouvas, Nancy Griffeth, and Nancy Lynch. Using self-similarity for efficient network testing. http://comet.lehman.cuny.edu/griffeth/ Papers/selfsimlong.pdf, September 2005. [3] Ralph Droms. RFC 2131: Dynamic host configuration protocol, March 1997. [4] Nancy Griffeth, Ruibing Hao, David Lee, and Rakesh Sinha. Integrated system interoperability testing with applications to voip. In Proceedings of FORTE/PSTV 2000, Pisa, Italy, October 2000. [5] Nancy Griffeth and Frederick Stevenson. An approach to best-in-class interoperability testing. Journal of the International Test and Evaluation Association, 23(3):68–82, October 2002. [6] Timothy G. Griffin and Gordon T. Wilfong. An analysis of BGP convergence properties. In Proceedings of SIGCOMM, pages 277–288, Cambridge, MA, August 1999. [7] Ruibing Hao, David Lee, Rakesh K. Sinha, and Nancy Griffeth. Integrated system interoperability testing with applications to voip. IEEE/ACM Transactions on Networking, 12(5):823–836, 2004. [8] IEEE standard for local and metropolitan area networks: Media access control (MAC) bridges. Standard 802.1D-2004, June 2004. [9] R. P. Kurshan and K. McMillan. A structural induction theorem for processes. In PODC ’89: Proceedings of the eighth annual ACM Symposium on Principles of distributed computing, pages 239–247, New York, NY, USA, 1989. ACM Press. [10] D. Lee and M. Yannakakis. Principles and Methods of Testing Finite State Machines - A Survey. In Proceedings of the IEEE, volume 84, pages 1090–1126, 1996. [11] Nancy Lynch. Distributed Algorithms. Morgan Kaufmann Publishers, Inc., March 1996. [12] P. Wolper and V. Lovinfosse. Verifying properties of large sets of processes with network invariants. In Automatic Verification Methods for Finite State Systems, volume 407 of Lecture Notes in Computer Science, pages 68–80. Springer-Verlag, 1989.
Formal Conformance Testing of Systems with Refused Inputs and Forbidden Actions
Igor B. Bourdonov 1,2, Alexander S. Kossatchev 1,3, and Victor V. Kuliamin 1,4
Institute for System Programming of Russian Academy of Sciences, 1009004, B. Kommunisticheskaya, 25, Moscow, Russia
Abstract
The article introduces an extension of the well-known conformance relation ioco on labeled transition systems (LTS) with refused inputs and forbidden actions. This extension helps to apply the usual formal testing theory based on LTS models to incompletely specified systems, which are often met in practice. Another topic addressed in the article is compositional conformance. More precisely, we try to define a completion operation that turns any LTS into an input-enabled one having the same set of ioco-conforming implementations. Such a completion enforces preservation of ioco conformance by the parallel composition operation on LTSes.
Key words: Formal testing, conformance testing, LTS, implementation relation, refusals, ioco.
1 Introduction
In the modern world a large part of human activities is controlled by various computer-based systems. Reliability and quality of such systems have become crucial for the dependable evolution of our society. One of the tools that help us to ensure system quality is conformance testing. Conformance testing in general is an activity that checks conformance between the real behavior of a software or hardware system and the requirements on this behavior. To make the results of conformance testing more sound and convincing, the testing process needs
1 This work is partially supported by RFBR grants 05-01-00999-a and 04-07-90386-b, by a grant of the Russian Science Support Foundation, and by Program 4 of the Mathematics Branch of RAS.
2 Email: [email protected]
3 Email: [email protected]
4 Email: [email protected]
This paper is electronically published in Electronic Notes in Theoretical Computer Science, URL: www.elsevier.nl/locate/entcs
a formal framework, including a formalism for the description of requirements and a formal definition of the conformance relation. To make the reasoning about conformance rigorous, one models both the actual behavior of the system under test (SUT) and the requirements to it in some formalism. The choice of such a formalism is directed by the class of systems we need to describe with it. It is preferable to use a theory that allows reasoning about a wide range of software and hardware systems of practical significance. The labeled transition systems (LTS) formalism is a good candidate, and it has been used successfully for a long time to model rather complex behavior of distributed software and hardware units, including concurrency aspects. LTSes also serve as a semantic metamodel for various process calculi, such as CSP [1] and CCS [2], and for formal languages actively applied in distributed software and hardware verification, e.g. SDL, LOTOS, and Estelle.
During testing one usually distinguishes between inputs and outputs of the SUT. The tester provides the former to the SUT; the SUT provides the latter to the tester. So, the LTS model should be regarded as an IOLTS, i.e. an input-output labeled transition system, where labels on transitions are partitioned into input and output symbols. By a specification one means a description of the requirements on the SUT’s behavior in terms of the formalism chosen, e.g. an LTS modeling the required behavior. Since the requirements are represented formally, one can speak about formal conformance between them and the actual behavior of the SUT, but only if this actual behavior also has some formal representation. Usually the basic test hypothesis states that the actual behavior of the SUT can be adequately described by a model of the same kind [3,4]. In our case this means that there exists an LTS, which is called an implementation, adequately representing the real behavior of the SUT. One does not know it exactly, but can reason about its properties on the basis of observations of the SUT’s behavior.
Many relations between LTSes can be chosen as conformance relations checked in testing. [6] gives an extensive review of them. The choice of conformance relation depends on the testing abilities – the abilities to control the SUT and to observe various aspects of its behavior during testing. On the other hand, the testing abilities determine the properties of the system under test that can be checked. One of the most useful and natural conformance relations used in testing is ioco, introduced in the works of Jan Tretmans [8,7]. He also developed the theory that helps to construct test suites necessary and sufficient to check conformance between a model and an implementation according to ioco.
1.1 ioco relation and its problems
ioco uses three rather natural and basic testing abilities – the ability to provide inputs, the ability to observe outputs, and the less obvious ability to observe quiescence, a situation in which the SUT will not provide any more output.
Further observation of a quiescence in traces is denoted as δ. In practice one usually supposes that there exists some finite time T such that, in any state, if no inputs are provided and the SUT is going to provide an output, it always does so in time less than T. This hypothesis allows us to detect quiescence as the observation of no outputs during some timeout.
A more careful analysis of the testing abilities used by ioco reveals two subtle issues.
• We suppose that the implementation is input-enabled, i.e. in each stable state (where there are no internal transitions) it has a transition for each input symbol. Informally, an input-enabled system should always accept any input provided to it. This may be reasonable when we test large components and systems as a whole, because it is natural for them to process any possible inputs. But internal components are often developed in a collaborative mode, not a protective one, and rely upon some restrictions on the input.
• We suppose that during testing we can prevent the SUT from giving us an output if we want it to accept our input first. This property follows from the semantics of LTS interaction based on the rendezvous mechanism. If we model testing as interaction between an implementation LTS and a tester LTS by means of parallel composition, we need to have in practice the special ability to prevent the SUT from producing an output to the testing system if the testing system is not ready to accept it.
Both issues were already mentioned by several authors, including Tretmans himself [8]. These assumptions give the tester a very high level of control over the SUT. The second property is considered by some authors as particularly suspicious, since it can rarely be met in practice. Only in special contexts, for example during debugging, does the tester have enough control over the execution of the SUT to make this assumption valid. However, in the framework of LTS models the lack of control over the SUT is a consequence of the presence of some testing context, which represents the transport mechanism delivering actions from the tester to the SUT and back. The testing performed through some context is called asynchronous testing, while the testing giving the tester full control over the SUT is called synchronous.
The second issue can be interpreted as meaning that ioco is intended to be used in synchronous testing only. If we need to check conformance between an implementation and a specification by means of testing through some context, it is natural to use the composition of the LTS modeling the context with the original specification as the specification of the observable SUT’s behavior and check its real behavior against the derived specification [4]. Here we face a known problem of ioco – it is not preserved by the parallel composition of LTSes, i.e. the composition I∥Q of an implementation I conforming with a specification S and an LTS Q modeling the context may not be conforming with S∥Q. Examples of such implementation and specification can be found in [9]. Another example is shown on Fig. 1. The specification S
and the implementation I presented there are ioco-conforming, but are not ioco-conforming if they are observed through input and output queues (that is, being composed with two unbounded queues, or even queues of length 2). This problem seems to be a consequence of some bias of process calculi to consider the bisimulation relation as the most natural conformance relation between processes. Parallel composition preserves bisimulation, which is thought to be the desired relation between a specification and its implementation. However, bisimulation is not testable in natural settings. During black-box testing we cannot check it completely and often actually do not want to, because a specification may describe more general behavior, only a part of which should be realized in any implementation.
Fig. 1. Example of ioco-conforming specification and implementation, which are not ioco-conforming when observed through queues.
So, to propose a more practical conformance relation for testing in context, we can go two ways.
• To consider some practical variants of contexts and develop a testing framework for them, including specialized conformance relations. This approach, for contexts modeled by infinite or bounded input and output queues, is presented in works of Petrenko and Yevtushenko [10,11]. Another paper taking such an approach is [12], where the authors propose to augment events provided by the SUT with special stamps revealing the actual order of events in the SUT for the tester. Such instrumentation makes another conformance relation, ioconf, also used in synchronous testing, useful for the asynchronous one.
• To define a more convenient composition operation that preserves the conformance relation, so that we can check the SUT’s behavior through any context against the specification composed with the LTS modeling this context. This way is chosen in recent works of Tretmans with co-authors [9]. It is shown there that input-enabled ioco-conforming LTSes have no problems with composition – if both the specification and the implementation are completely specified and they are ioco-conforming, then their compositions with any context LTS are also ioco-conforming. So, the main problem to be overcome on the way to a more convenient composition is unspecified inputs. The demonic completion of a specification is proposed in [9]. It forces unspecified inputs to take the specification into a special chaotic state, where
any behavior is possible. This is done to make any possible SUT behavior in the unspecified area conforming to the completed specification.
1.2 The proposed approach
We also would like to go the second way, since it makes possible testing through different contexts, which is useful in practice. For example, contexts not preserving the sequence of actions (as queues do) can be met in practical testing of Internet protocols, components of GRID networks, and Web services. Instrumentation of the SUT is not always possible, especially if it is distributed itself.
On this way it is reasonable first to examine more thoroughly the meaning of unspecified inputs, which are the main source of the problems with the definition of a ‘good’ composition operation. One can notice that this issue is related to the implementation input-enabledness hypothesis. The original definition of ioco is asymmetric in two ways – first, it assumes that an implementation should always accept inputs provided to it, while the tester can abstain from accepting an implementation’s output; second, a specification can be partial and not input-enabled, in contrast with an implementation. Both sources of asymmetry can be removed if we allow an implementation also to be partially defined, not input-enabled.
One can distinguish the following ways of understanding an unspecified input. Some of them were already mentioned in the literature [13].
• Forbidden input. Such an input is forbidden to be provided to the SUT, due to various reasons. It may cause serious destruction of the SUT, or move it into a situation which we want to avoid during testing, for example, divergence – an infinite path through internal actions. In fact, when demonic completion is introduced, it means the same thing – we don’t want to check the behavior of the SUT after accepting this input, but such a completion may cause us to perform these unwanted checks. We prefer to mark a ‘bad’ situation we need to avoid with the special forbidden action label γ. Any input that can lead to a state where a forbidden action can occur (maybe after a path through internal transitions) is considered as forbidden. The same holds for outputs that can lead us to a state with a forbidden action. But outputs in some state are under the full control of the SUT – it is the SUT which chooses an output to produce. So, we need to ban the mere waiting for an output in states where some output can lead us to a forbidden action.
• Refused input. This input can be provided to the SUT and in response it demonstrates a refusal to accept it. Here we need the new testing ability to observe input refusals. Refused inputs can model situations of practical significance. For example, a tea-coffee machine having two buttons for requesting tea and coffee and a slot for coin insertion may also have a special shutter closing the slot until some button is pressed. When trying to insert a coin before pressing a button we may observe that the coin is not taken. A more practical example is given by Graphical User Interface controls – menu items and buttons, which can be enabled or disabled. In this case a disabled control means that the system refuses to accept actions on this control. Refused inputs are considered as a particular case of refusals forming refusal sets in [5,6] and some papers on conformance testing, e.g. [14] and works on Multi Input-Output Transition Systems (MIOTS) [15,16,17]. In MIOTS-based testing, input refusals attract more attention, since the blocking of one channel caused by a refused input can be resolved after accepting an input on another channel. Here we do not need a detailed consideration of refusal sets and pay more attention to refused inputs.
• Erroneous input. This is a more subtle case. In some situations we can provide an input to the SUT, but the fact that the SUT has accepted it says that it is not conforming to the specification. Consider the example presented in Fig. 2. In the specification LTS presented there, the δ-trace δ?aδ ends in a state where input a is not specified. And its subtraces δ?a and ?aδ end in states where input a is defined, but is followed by different outputs. So, what if we observe the trace δ?aδ in the implementation and then provide an input a? The conforming implementation should be input-enabled and it should accept a, but it can provide neither x nor y, nor can it demonstrate quiescence in response. Otherwise, if it has the trace δ?aδ?aδ, it should have δ?a?aδ, which is absent in the current specification; if it has the trace δ?aδ?a!x, it should have ?aδ?a!x, which is also absent; and if it has the trace δ?aδ?a!y, it should have δ?a?a!y, which is absent in the specification again. So, the only reasonable conclusion is that this implementation is not conforming to the specification presented, just after it has demonstrated the trace δ?aδ?a. The last input a is erroneous in the sense that no possible behavior after it (any output or refusal) can be observed in a conforming implementation. We model such an input as leading to a separate state with a single outgoing transition marked with a special error output. This construction will be necessary in the consideration of possible completion operations for LTSes.
Fig. 2. Example of the specification having the trace δ?aδ that should not exist in any ioco-conforming implementation.
• Unspecified input can be considered as doing nothing and so as corresponding to a self-loop transition (so-called ‘angelic’ behavior). We think, however,
that such inputs should be specified in an accurate specification, and it should be tested that they actually do nothing. To make an input unspecified there must be more serious reasons (see above).
Bearing in mind all the listed possibilities, we do the following.
(i) Define an extension of the ioco relation for LTSes that can have forbidden actions and refused inputs. The error output is an auxiliary mark to check conformance. This relation is designated as iocoβγδ in this paper.
(ii) Since parallel composition of LTSes breaks ioco only on partially specified LTSes, we need to define some completion of the original LTS before composition. This completion should, on the one hand, give an input-enabled LTS and, on the other hand, the original LTS and the completed one should have the same set of ioco-conforming implementations.
The next sections of the article present the implementation of these steps. It seems that the main contribution of this paper is the direct introduction of forbidden actions into the definition of the conformance relation and the construction of the corresponding completion operation, disallowing processing of inputs unspecified in the original LTS.
2 Extended ioco Conformance Relation
Below we recall part of the LTS-based formalism and the usual arrow notation.
Definition 2.1 An LTS is a tuple L = (Q, C, T, q0) where
•
Q is a non-empty set of states;
•
C = I ∪ U is a set of symbols, where I consists of input symbols and U is disjoint from I and consists of output symbols;
•
T ⊆ Q × (C ∪ {τ, γ}) × Q is a set of transitions. A transition (q, a, q′) starts in the state q, ends in the state q′, and is marked with the label a. We use labels with a question mark (?a) to denote input symbols and labels with an exclamation mark (!x) to denote output symbols. τ ∉ I ∪ U is considered as an empty symbol marking internal transitions. γ ∉ I ∪ U, γ ≠ τ, is considered as the forbidden action symbol.
•
q0 ∈ Q is the initial state.
We denote the fact that (q, x, q′) ∈ T_L in an LTS L by q −x→ q′. q −x→ denotes ∃q′ ∈ Q q −x→ q′, and q −x/→ denotes ∀q′ ∈ Q (q, x, q′) ∉ T_L. By a stable state we mean a state q such that q −τ/→ and q −γ/→.
An LTS can be partially specified, i.e. it can have states where not all inputs are possible. However, we can consider it as completely specified due to the following interpretation. If q −?a/→ for a stable state q, we may mean that ?a can be given in q and the LTS should demonstrate a refusal to accept it in response. For an input symbol ?a we denote the refusal of this input
as {?a}. In addition to input symbols, output symbols, the empty symbol, and the forbidden action, we use the symbol δ to denote quiescence, i.e. the situation where the LTS does not have any transitions marked with output symbols, γ, or τ. Input refusals and quiescence together are called refusals. We call an LTS L strongly convergent if it does not have infinite paths through internal transitions. It is possible to convert any LTS into a strongly convergent one by replacing the symbol τ on the transitions of every such path with γ. From the testing viewpoint this means that we avoid actions that can lead us to such a path. Further on we consider only strongly convergent LTSes. We can transform an original LTS by converting divergence into forbidden actions, making all transitions marked with γ lead into a special additional state, and adding refusal transitions as self-loops in stable states. An example of such a transformation is shown in Fig. 3.
Fig. 3. Example of the transformation making refusals explicit and adding a special state entered after forbidden actions.
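As a rough illustration of this transformation, the following sketch (our own representation, not the paper's) operates on an explicit finite LTS given as (state, label, state) triples, with input labels written '?a', output labels '!x', and the internal and forbidden symbols written 'tau' and 'gamma'; the names make_refusals_explicit and q_err are illustrative assumptions.

```python
# Sketch of the transformation of Fig. 3 on an explicit finite LTS:
# quiescence (delta) and input refusals become self-loops in stable states,
# and every gamma-transition is redirected to one special state q_err.

def make_refusals_explicit(transitions, states, inputs):
    out = []
    for q in states:
        here = [(a, r) for (p, a, r) in transitions if p == q]
        labels = {a for (a, _) in here}
        stable = 'tau' not in labels and 'gamma' not in labels
        for (a, r) in here:
            # gamma-transitions now lead into the special state q_err
            out.append((q, a, 'q_err') if a == 'gamma' else (q, a, r))
        if stable:
            if not any(a.startswith('!') for a in labels):
                out.append((q, 'delta', q))              # quiescence self-loop
            for a in inputs:
                if a not in labels:
                    out.append((q, '{' + a + '}', q))    # input refusal self-loop
    return out, list(states) + ['q_err']
```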
By βγδ-traces in the alphabet I ∪ U we mean sequences consisting of input and output symbols, γ, and refusal symbols – δ and refusals of input symbols. We denote the concatenation of traces σ and µ by σµ. µ ≤ σ denotes that µ is a beginning (prefix) of σ. If s is an input symbol, output symbol, input refusal, γ, or δ, then ⟨s⟩ means the sequence with the single element s. A run of an LTS L starting in a state q is a sequence of transitions of L, transformed according to the procedure described above, the first of which starts in q, and each next transition starts in the end state of the previous one. A βγδ-trace of a run p is the sequence of labels of transitions of p from which all symbols τ are skipped. One can see that each time p goes through a stable state without outgoing output transitions, δ may be inserted in the corresponding place several times. Similarly, each time p goes through a stable state without an outgoing transition marked with an input symbol ?a, the symbol {?a} may be inserted in the corresponding place several times. According to the transformation rules, the first γ met in a βγδ-trace is always its last symbol – there is no need to extend a βγδ-trace after the first forbidden action in it. The set of all βγδ-traces of runs of L starting in a state q is denoted by Tracesβγδ(q, L). Tracesβγδ(L) is Tracesβγδ(q0, L). If σ is a βγδ-trace of an LTS L, then L after σ is the set of all states of L that can be reached by paths having σ as their βγδ-trace.
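Continuing the sketch, the after operator and trace membership could be computed on such a transformed LTS roughly as follows (again an illustrative assumption, not the authors' implementation); τ-transitions are skipped, and all other labels, including 'delta' and refusals, are matched literally.

```python
class LTS:
    """Transformed LTS: refusals, delta and gamma are explicit labels."""

    def __init__(self, initial, transitions):
        self.initial = initial
        self.trans = list(transitions)    # (state, label, state) triples

    def _tau_closure(self, states):
        seen, stack = set(states), list(states)
        while stack:
            q = stack.pop()
            for (p, a, r) in self.trans:
                if p == q and a == 'tau' and r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    def after(self, trace):
        """States reachable by runs whose beta-gamma-delta-trace is `trace`."""
        current = self._tau_closure({self.initial})
        for symbol in trace:
            step = {r for (p, a, r) in self.trans if p in current and a == symbol}
            current = self._tau_closure(step)
        return current

    def has_trace(self, trace):
        return bool(self.after(trace))


# Toy specification (hypothetical): quiescent initially, after '?a' it emits '!x'.
spec = LTS('s0', [('s0', 'delta', 's0'), ('s0', '?a', 's1'),
                  ('s1', '{?a}', 's1'), ('s1', '!x', 's0')])
assert spec.after(['delta', '?a']) == {'s1'}
assert not spec.has_trace(['?a', '!y'])
```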
We need the notion of safe actions, which makes us safe from triggering a forbidden action. An input symbol ?x or its refusal is called safe in an LTS L after a βγδ-trace σ if ∀q ∈ (L after σ⟨?x⟩) q −γ/→. An output symbol or δ is said to be safe in L after its trace σ if ∀!x ∈ U ∀q ∈ (L after σ⟨!x⟩) q −γ/→. A βγδ-trace σ of L is safe if each of its symbols is safe in L after the beginning of σ preceding this symbol. The set of all safe βγδ-traces of L is denoted by Safe(L). We also call an extension of a safe trace σ of L with a symbol safe in L after σ a test trace of L. The set of all test βγδ-traces of L is denoted by TT(L). It is easy to note that TT(L) ∩ Tracesβγδ(L) = Safe(L).
Consider again a specification LTS S and an implementation LTS I. We may perform testing according to S only if we are sure that I operates properly during this process. In ioco theory this is guaranteed by the input-enabledness of I. Although we relax this assumption, we still need some safety hypothesis about I. This leads us to the following definition.
Definition 2.2 If I and S are LTSes, I is said to be safe for S if TT(S) ∩ Tracesβγδ(I) ⊆ Safe(I).
This definition says that if we construct a test avoiding any possibility of a forbidden action occurring in the specification, its application to any implementation safe for this specification cannot lead to a forbidden action either. So, implementations safe for a specification can be safely tested according to it. Now we are ready to give the definition of the iocoβγδ relation.
Definition 2.3 Let I and S be LTSes. Then I iocoβγδ S if and only if I is safe for S and for each βγδ-trace σ ∈ Safe(S) and for each symbol s (including refusals) safe in S after σ, σ⟨s⟩ ∈ Tracesβγδ(I) ⇒ σ⟨s⟩ ∈ Tracesβγδ(S).
A finer (but less intuitive) expression of this fact is given by the formula TT(S) ∩ Tracesβγδ(I) ⊆ Tracesβγδ(S) ∩ TT(I). Informally, an implementation I safe for S is said to be iocoβγδ-conforming to S when, after an S-safe trace, I can accept an input symbol, give an output symbol, demonstrate quiescence, or demonstrate an input refusal only if S can do just the same thing after the same trace. It is easy to show that iocoβγδ defines a preorder on LTSes. Note that classic ioco is not a 'good' preorder, since it imposes asymmetric restrictions on the implementation and the specification: while the latter can be incompletely specified, the former must not be. The fact that for specifications without forbidden actions and input refusals (usual completely specified LTSes) iocoβγδ is equivalent to ioco is also rather obvious; in this case they both are equivalent to trace inclusion.
2.1 Test Derivation
During testing we should check the SUT's behavior on every trace that is safe in the specification. Moreover, we should check it for all the symbols safe after
such a trace in the specification. As usual, we model test cases by LTSes with inverted inputs and outputs – inputs of the specification become outputs of a test case, and outputs of the specification (and implementation) are inputs of a test case. In addition, the symbol θ is used to mark deadlock resolution transitions in a test case. θ is considered as an input symbol and means observation of quiescence in the test case states where any of the SUT's outputs can be accepted, or observation of an input refusal in the test case states where a specification's input symbol is provided by the test. A test case has two special states fail and pass without outgoing transitions. Other constraints on test case LTSes are given below.
•
Each maximal trace of a test case should be finite and should end either in the fail state or in the pass state.
•
A test case should resolve all possible deadlocks in its interaction with an implementation. If in some state q of a test case there exists a ∈ I such that q −!a→, then q −θ→, which fires if the input a is refused by the SUT. If in some state q of a test case q −?x→ for all x ∈ U, then q −θ→, which fires if no output is observed.
•
A test case should be as deterministic as possible. Each of its states should be either an input state or an output state. An input state should have outgoing transitions marked with all possible SUT outputs and with θ. An output state should have only one outgoing transition marked with some input of the specification and one outgoing transition marked with θ.
An implementation LTS I passes a test case T if their parallel composition (extended by correlating θ in the test case with δ and input refusals in the implementation) has no states with a fail component reachable from the initial state. A test suite for a specification S is a set of test cases for S. An implementation passes a test suite if it passes each of its test cases. A test suite is called sound for a specification S if any iocoβγδ-conforming implementation passes it, and exhaustive for S if any implementation passing it is iocoβγδ-conforming to S. A sound and exhaustive test suite is called complete.
Theorem 2.4 Let us denote the set of safe finite βγδ-traces of a specification S as Safef(S). For each trace σ ∈ Safef(S) construct a test case T(σ) with the help of the following transformations.
•
Take the sequence of symbols of σ, construct an inverted symbol for each of them (?a ↦ !a, !x ↦ ?x, δ ↦ θ, {?a} ↦ θ), and build the sequence of transitions marked with the resulting symbols. Let us denote a state of this LTS by µ̄, where µ is the corresponding prefix of σ. σ̄ should be pass.
•
For each prefix µ of σ and symbol s such that µ⟨s⟩ ≤ σ we add new transitions. There are several possibilities, listed below. For each case we consider possible extensions of µ with alternatives to s. If s is an input, its alternative is the corresponding input refusal, and the alternative to an input refusal is
the corresponding input. Alternatives to an output are all other output symbols from the alphabet and quiescence, and alternatives to δ are all output symbols. For each alternative to s we should add an additional transition to our test case. If this alternative is possible in the specification (the trace µ can have several different extensions in the specification), we add the corresponding transition leading to pass, otherwise it should lead to fail. More precise rules are given in the following list.
· s is an input ?s ∈ I. Then add a transition µ̄ −θ→ pass if µ⟨{?s}⟩ ∈ Tracesβγδ(S), and µ̄ −θ→ fail otherwise.
· s is {?r}, where ?r ∈ I. Then add a transition µ̄ −!r→ pass if µ⟨?r⟩ ∈ Tracesβγδ(S), and µ̄ −!r→ fail otherwise.
· s is an output !s ∈ U. Then any !r ∈ U, !r ≠ !s, and δ are safe in S after µ. For each such !r, add a transition µ̄ −?r→ pass if µ⟨!r⟩ ∈ Tracesβγδ(S), and µ̄ −?r→ fail otherwise. Also add a transition µ̄ −θ→ pass if µ⟨δ⟩ ∈ Tracesβγδ(S), and µ̄ −θ→ fail otherwise.
· s is δ. Then any !r ∈ U is safe after µ in S. For each such !r, add a transition µ̄ −?r→ pass if µ⟨!r⟩ ∈ Tracesβγδ(S), and µ̄ −?r→ fail otherwise.
Then T(Safef(S)) is a complete test suite for S.
Soundness of the test cases from T(Safef(S)) is implied by their construction – if the composition of such a test and an implementation comes to a state with a fail component, then the implementation has a trace that does not exist in the specification. To prove the exhaustiveness of the constructed test suite, one should take an implementation that does not conform to the specification and find a safe trace σ in the specification that can be extended in the implementation by a safe symbol s for which σ⟨s⟩ ∉ Tracesβγδ(S) holds. Then it is sufficient to consider the test case constructed for σ extended with an alternative to s, which is a safe trace in S. The chosen implementation cannot pass this test case. More details of the proof can be found in [18].
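To illustrate how the construction of Theorem 2.4 could be mechanized, the sketch below (ours, not the authors' algorithm) turns a safe trace σ into the verdict transitions of T(σ), assuming a membership predicate has_trace(spec, trace) for Tracesβγδ(S) (for instance the method sketched earlier, wrapped as a function) and an explicit list of the specification's output labels.

```python
THETA = 'theta'   # deadlock-resolution label of the test case

def is_input(s):   return s.startswith('?')
def is_output(s):  return s.startswith('!')
def is_refusal(s): return s.startswith('{')

def invert(symbol):
    # ?a -> !a, !x -> ?x, delta -> theta, {?a} -> theta
    if is_input(symbol):  return '!' + symbol[1:]
    if is_output(symbol): return '?' + symbol[1:]
    return THETA

def test_case(spec, sigma, outputs, has_trace):
    """Verdict map {(prefix of sigma, test-case label): 'pass'|'fail'|'continue'}."""
    verdicts = {}
    for i, s in enumerate(sigma):
        mu = sigma[:i]
        verdicts[(tuple(mu), invert(s))] = 'continue'   # follow sigma itself
        if is_input(s):            # alternative: refusal of s, observed via theta
            ok = has_trace(spec, mu + ['{' + s + '}'])
            verdicts[(tuple(mu), THETA)] = 'pass' if ok else 'fail'
        elif is_refusal(s):        # alternative: the refused input itself
            a = s[1:-1]
            ok = has_trace(spec, mu + [a])
            verdicts[(tuple(mu), invert(a))] = 'pass' if ok else 'fail'
        else:                      # s is an output or delta: other outputs (and delta)
            for r in outputs:
                if r != s:
                    ok = has_trace(spec, mu + [r])
                    verdicts[(tuple(mu), invert(r))] = 'pass' if ok else 'fail'
            if s != 'delta':
                ok = has_trace(spec, mu + ['delta'])
                verdicts[(tuple(mu), THETA)] = 'pass' if ok else 'fail'
    verdicts[(tuple(sigma), None)] = 'pass'             # the end state for sigma
    return verdicts
```

A real implementation would additionally check that σ and the symbols considered are safe in the sense defined above before emitting the test case.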
3 Completion Operations
The next step is to define a completion operation Comp for LTSes such that for each LTS S, Comp(S) is input-enabled and has the same set of iocoβγδ-conforming implementations. The results presented further are partial: only a solution for the classic ioco relation is given. The authors are now working on the full completion operation, but have no compact and proved construction for it yet. In [9] the demonic completion Ξ is defined as a candidate for the needed completion for ioco. However, as is also noted there, this completion does not preserve full information on unspecified inputs. Moreover, the demonic completion from [9] is a state completion – it defines some additional behavior after an input in some state – and just this fact makes it slightly inadequate. State
completions can make a non-conforming implementation conforming, as mentioned in [12]. Fig. 4 shows an example of a specification S and an implementation I such that I ioco S does not hold, but I ioco Ξ(S) holds.
Fig. 4. Example of a specification for which Ξ changes the ioco relation. The 'correct' completion variants ∆ and Γ are also presented.
The same Fig. 4 also presents examples of the ∆ and Γ completions defined below. They both are more suitable completion operations, not extending the set of ioco-conforming implementations. In this figure, additional transitions added by the completion operations are shown as hatched lines. We propose two completion operations, ∆ and Γ, that differ in their interpretation of unspecified inputs in the original LTS. ∆-completion treats them as leading into states where any possible behavior can be observed, but they can be given to the completed LTS. Γ-completion treats them as forbidden inputs, which should not be provided during testing at all. The main ideas are the following. First we construct a basic completion, which augments an LTS with additional transitions and states that do not change the set of traces and protects the LTS from extending the set of ioco-conforming implementations by the further state completion. In Fig. 4 transitions added by the basic completion are shown as small-hatched lines. Then we perform the state completion according to the operation used – for ∆-completion we add all possible behaviors after all inputs that remained unspecified after the first step, for Γ-completion we add γ transitions after those inputs. In Fig. 4 transitions added by ∆- or Γ-completions are shown as long-hatched lines.
Definition 3.1 The basic completion operation Bc transforms an LTS L with states Q, inputs I and outputs U in the following way. The resulting LTS Bc(L) has states corresponding to Cδ* – all possible sequences of symbols from I ∪ U ∪ {δ}. Inputs of Bc(L) coincide with those of L, and its outputs are U ∪ {!error}. For each σ ∈ Cδ*, R(σ) denotes the set of δ-traces of L obtained from σ by
deleting some or all δ symbols. The set of transitions is the minimal set derived from the following rules.
• ∀?a ∈ I ∃µ ∈ R(σ) µ⟨?a⟩ ∈ Tracesδ(L) ⇒ σ −?a→ σ⟨?a⟩ in Bc(L).
• ∀!x ∈ U ∀µ ∈ R(σ) µ⟨!x⟩ ∈ Tracesδ(L) ⇒ σ −!x→ σ⟨!x⟩ in Bc(L).
• ∀µ ∈ R(σ) µ⟨δ⟩ ∈ Tracesδ(L) ∧ σ does not end in δ ⇒ σ −τ→ σ⟨δ⟩ in Bc(L).
• ∀µ ∈ R(σ) ∀!x ∈ U µ⟨!x⟩ ∉ Tracesδ(L) ∧ µ⟨δ⟩ ∉ Tracesδ(L) ⇒ σ −!error→ σ⟨!error⟩ in Bc(L).
∆-completion of an LTS L is constructed as a completion of Bc(L) with two states qU and qI demonstrating all possible behaviors, i.e. qU −τ→ qI, ∀!x ∈ U qU −!x→ qU, and ∀?a ∈ I qI −?a→ qU. For each state q of Bc(L) and each ?a ∈ I, if q −?a/→ in Bc(L), then q −?a→ qU in ∆(L).
Γ-completion of an LTS L is constructed as a completion of Bc(L) with one state qγ having a γ-self-loop. For each state q of Bc(L) and each ?a ∈ I, if q −?a/→ in Bc(L), then q −?a→ qγ in Γ(L).
Theorem 3.2 ∆ and Γ turn any LTS S into an input-enabled one and preserve the set of ioco-conforming implementations, i.e. ∀I I ioco S ⇔ I ioco ∆(S) ⇔ I ioco Γ(S).
We have to skip the proof (see its details in [18]) due to restrictions on the size of the paper. The two completions defined can be used to describe the relation between classic ioco and the iocoβγδ introduced above. To formulate this relation we first note that ioco conformance to a specification S can be naturally extended to the set Iγ(S) of LTSes that may have refused inputs and forbidden actions, but satisfy the following conditions.
•
The empty trace is safe in any I ∈ Iγ(S).
•
Let us call βγδ-traces without input refusals and without γ δ-traces, and denote the set of all δ-traces of an LTS L by Tracesδ(L). For each σ that is a δ-trace of both S and I ∈ Iγ(S), any output should be safe in I after σ.
•
For each σ that is a δ-trace of both S and I ∈ Iγ(S), and each input ?a that can extend σ in the specification (that is, σ⟨?a⟩ ∈ Tracesδ(S)), ?a should be safe in I after σ, and σ⟨?a⟩ should also be a δ-trace of I.
For I ∈ Iγ(S) we can say that I ioco S if and only if for each σ ∈ Tracesδ(S) and for each s ∈ U ∪ {δ}, σ⟨s⟩ ∈ Tracesδ(I) ⇒ σ⟨s⟩ ∈ Tracesδ(S).
Theorem 3.3 • For each specification S without forbidden actions and each completely defined implementation I without forbidden actions (the domain of
the classic ioco), I ioco S ⇔ I iocoβγδ ∆(S) ⇔ I iocoβγδ Γ(S).
• For each specification S without forbidden actions and each I ∈ Iγ(S), I ioco S ⇔ I iocoβγδ Γ(S).
The proof of this statement can also be found in [18]. Note that in the second case Γ(S) cannot be substituted by ∆(S), since implementations from Iγ(S) nonconforming to S may conform to ∆(S).
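To make the state-completion step of Section 3 concrete, here is a small sketch (our own, not the authors' construction) of the ∆- and Γ-completions applied to an explicitly given finite LTS that is assumed to already be a basic completion Bc(L); the states q_U, q_I and q_gamma correspond to the added states described above.

```python
# Sketch of the Delta- and Gamma-completion steps over an explicit finite LTS
# assumed to already be a basic completion Bc(L).  Transitions are
# (state, label, state) triples; '?a' are inputs, '!x' outputs.

def _unspecified_inputs(transitions, states, inputs):
    """Pairs (q, ?a) such that q has no outgoing ?a-transition."""
    specified = {(q, a) for (q, a, _) in transitions}
    return [(q, a) for q in states for a in inputs if (q, a) not in specified]

def gamma_completion(transitions, states, inputs):
    """Gamma(L): unspecified inputs lead to q_gamma, which has a gamma self-loop."""
    extra = [('q_gamma', 'gamma', 'q_gamma')]
    extra += [(q, a, 'q_gamma')
              for (q, a) in _unspecified_inputs(transitions, states, inputs)]
    return list(transitions) + extra, list(states) + ['q_gamma']

def delta_completion(transitions, states, inputs, outputs):
    """Delta(L): q_U and q_I show all behaviors; unspecified inputs lead to q_U."""
    extra = [('q_U', 'tau', 'q_I')]
    extra += [('q_U', x, 'q_U') for x in outputs]
    extra += [('q_I', a, 'q_U') for a in inputs]
    extra += [(q, a, 'q_U')
              for (q, a) in _unspecified_inputs(transitions, states, inputs)]
    return list(transitions) + extra, list(states) + ['q_U', 'q_I']
```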
4 Conclusion
The main results of this paper are the definition of a conformance relation iocoβγδ, which introduces the semantics of forbidden actions and refused inputs into conformance testing theory based on LTS models, and the construction of two completion operations that transform any LTS into an input-enabled one having the same set of ioco-conforming implementations. The second result makes possible the definition of a 'proper' LTS composition preserving ioco-conformance. Nevertheless, the problems stated at the end of the Introduction are not solved completely. We have no compact construction of the analogous completion preserving the set of iocoβγδ-conforming implementations for an LTS with refused inputs. This construction is under development now.
Acknowledgements. We thank A. Petrenko from CRIM for helpful discussions.
References
[1] C. A. R. Hoare. Communicating Sequential Processes. Prentice-Hall, 1985.
[2] R. Milner. Communication and Concurrency. Prentice-Hall, 1989.
[3] G. Bernot. Testing against Formal Specifications: A Theoretical View. In Proc. of TAPSOFT'91, Vol. 2, S. Abramsky and T. S. E. Maibaum, eds. LNCS 494, pp. 99–119, Springer-Verlag, 1991.
[4] ISO/IEC JTC1/SC21 WG7, ITU-T SG 10/Q.8. Information Retrieval, Transfer, and Management for OSI. Framework: Formal Methods in Conformance Testing. Committee Draft CD 13245-1, ITU-T Proposed Recommendation Z.500. ISO–ITU-T, Geneva, 1996. See also ITU-T. Recommendation Z.500. Framework on formal methods in conformance testing. International Telecommunications Union, Geneva, Switzerland, 1997.
[5] I. C. C. Phillips. Refusal Testing. Theoretical Computer Science 50, pp. 241–284, 1987.
[6] R. J. van Glabbeek. The Linear Time – Branching Time Spectrum II: The Semantics of Sequential Processes with Silent Moves. Proc. of CONCUR'93, Hildesheim, Germany, August 1993. E. Best, ed. LNCS 715, pp. 66–81, Springer-Verlag, 1993.
[7] J. Tretmans. Test Generation with Inputs, Outputs, and Repetitive Quiescence. Software – Concepts and Tools, 17(3):103–120, 1996.
[8] J. Tretmans. A Formal Approach to Conformance Testing. PhD thesis, University of Twente, Enschede, The Netherlands, 1992.
[9] M. van der Bijl, A. Rensink, J. Tretmans. Component Based Testing with ioco. CTIT Technical Report TR-CTIT-03-34, University of Twente, 2003.
[10] A. Petrenko, N. Yevtushenko, J. L. Huo. Testing Transition Systems with Input and Output Testers. Proc. of TestCom 2003, LNCS 2644, pp. 129–145, Springer-Verlag, 2003.
[11] J. L. Huo, A. Petrenko. On Testing Partially Specified IOTS through Lossless Queues. Proc. of TestCom 2004, LNCS 2978, pp. 76–94, Springer, 2004.
[12] C. Jard, T. Jéron, L. Tanguy, C. Viho. Remote testing can be as powerful as local testing. In Proc. of the IFIP TC6 WG6.1 Joint International Conference on Formal Description Techniques for Distributed Systems and Communication Protocols (FORTE XII) and Protocol Specification, Testing and Verification (PSTV XIX), October 1999, pp. 25–40.
[13] G. V. Bochmann, A. Petrenko. Protocol Testing: Review of Methods and Relevance for Software Testing. Proc. of ACM SIGSOFT ISSTA'1994, Software Engineering Notes, Special Issue, pp. 109–124.
[14] J. Helovuo, S. Leppanen. Exploration Testing. Proc. of the 2nd International Conference on Application of Concurrency to System Design, Newcastle upon Tyne, U.K., June 2001, pp. 201–210.
[15] L. Heerink. Ins and Outs in Refusal Testing. PhD thesis, IPA-CTIT, 1998.
[16] L. Heerink, J. Tretmans. Refusal Testing for Classes of Transition Systems with Inputs and Outputs. In T. Mizuno, N. Shiratori, T. Higashino, A. Togashi, eds. Formal Description Techniques and Protocol Specification, Testing and Verification. Chapman & Hall, 1997.
[17] Z. Li, J. Wu, and X. Yin. Refusal Testing for MIOTS with Nonlockable Output Channels. In International Conference on Computer Networks and Mobile Computing, Beijing, China, October 2003, pp. 517–522.
[18] I. B. Bourdonov, A. S. Kossatchev, V. V. Kuliamin. Theory of conformance testing for systems with refused inputs and forbidden actions. Synchronous case. ISP RAS Technical Report, 2005, in Russian. http://www.ispras.ru/~RedVerst/RedVerst/Publications/TR-01-2005.pdf
MBT 2006
Test Case Generation for Mutation-based Testing of Timeliness
Robert Nilsson a,1,2, Jeff Offutt b,3 and Jonas Mellin a,1,4
a Distributed Real-time Systems Group, School of Humanities and Informatics, University of Skövde, Skövde, Sweden
b Information and Software Engineering, George Mason University, Fairfax, Virginia, USA
Abstract
Temporal correctness is crucial for real-time systems. Few methods exist to test temporal correctness and most methods used in practice are ad hoc. A problem with testing real-time applications is the response-time dependency on the execution order of concurrent tasks. Execution order in turn depends on execution environment properties such as scheduling protocols and the use of mutually exclusive resources, as well as the points in time when stimuli are injected. Model based mutation testing has previously been proposed to determine the execution orders that need to be verified to increase confidence in timeliness. An effective way to automatically generate such test cases for dynamic real-time systems is still needed. This paper presents a method using heuristic-driven simulation to generate test cases.
Key words: Real-time Systems, Mutation Testing, Model based
1 Introduction
Current real-time systems must be both flexible and timely. There is a desire to increase the number of services that real-time systems offer while using few, standardized hardware components. This can increase system complexity and introduce sources of temporal non-determinism (for example, caches
1 This work has been funded by the Swedish Foundation for Strategic Research (SSF) through the FLEXCON programme.
2 Email: [email protected]
3 Email: [email protected]
4 Email: [email protected]
and pipelines) that make it hard to predict the execution behavior of tasks [26]. Faults in such predictions may result in software timeliness violations and costly accidents. Thus we need methods to detect violations of timing constraints for computer architectures for which we cannot rely on accurate off-line assumptions.
Timeliness is the ability of software to meet time constraints. For example, a time constraint for a flight monitoring system can be that once landing permission is requested, a response must be provided within 30 seconds [28]. When designing real-time systems, software behavior is modelled by periodic and sporadic tasks that compete for system resources (for example, processor time, memory and semaphores). The response times of these tasks depend on the order in which they are scheduled to execute. Periodic tasks are activated with fixed inter-arrival times, thus all the points in time when such tasks are activated are known. Sporadic tasks are activated dynamically, but assumptions about their activation patterns, such as minimum inter-arrival times, are used in analysis. Each real-time task typically has a deadline. Tasks may also have an offset, which denotes the time before a task of that type is activated.
Testing methods must be adapted to address timeliness because it is difficult to characterize a critical sequence of inputs without considering the effect on the set of active tasks and real-time protocols. However, existing testing techniques seldom use information about real-time design in test case generation, nor do they predict what execution orders may reveal faults in off-line assumptions (see section 5 for an overview of related work).
In the real-time community, timeliness is traditionally analyzed and maintained using scheduling analysis techniques or regulated online through admission control and contingency schemes [34]. However, these techniques use assumptions about the tasks and activation patterns that must be correct for timeliness to be maintained. Further, doing full schedulability analysis of non-trivial system models is complicated and requires specific rules to be followed by the run-time system. In contrast, testing of timeliness is general in the sense that it applies to all system architectures and can be used, as a complement, to gain confidence in assumptions by systematically sampling among the execution orders that can lead to missed deadlines. However, only some of the possible execution orders typically reveal timeliness violations in the presence of timing faults.
Mutation-based testing of timeliness is inspired by a model based method for automatic test case generation presented by Ammann, Black and Majurski [2]. The main idea behind the method is to systematically "guess" what faults a system contains and then evaluate what the effect of such faults could be in a model of the system. Once faults with bad consequences are identified, test cases are constructed that try to reveal those faults in the system implementation.
Model-checking has previously been used to analyze models of real-time
systems for generating test cases for testing of timeliness [22]. A problem in this context is that analysis of the dynamic real-time system models often becomes so computationally complex that the previously presented model-checking approach does not work. In particular, this happens in models of event-triggered systems where the timing of different sporadic interrupts can influence the execution order of tasks [31]. This paper investigates whether application-specific heuristics and simulation can be used as an alternative for analyzing such models. Consequently, this paper proposes a method where a mutated specification model that captures possible execution behaviors is mapped to a simulator. The simulator is then iteratively executed using a genetic algorithm to find input sequences that reveal the potential failures in the mutated model. The method is demonstrated in two experiments. The first experiment compares the method with the model-checking based approach to gain basic confidence in its reliability. The method is also evaluated using a larger, more dynamic system specification for which the model-checking based approach fails. The experiments indicate that the simulation-based method remains effective for the dynamic specification model and that the heuristic functions presented enhance the performance.
The inputs to mutation-based testing of timeliness are a specification of a real-time system and a testing criterion. The testing criterion specifies what mutation operators to use and thus determines the level of thoroughness of testing and what kind of test cases will be produced. A mutant generator applies the mutation operators to the specification and sends the mutated specifications to an execution order analyzer that determines if and how the mutation can lead to a timeliness failure. We call a mutated specification model that contains a fault that can lead to a timeliness failure a malignant mutant. If analysis reveals a timeliness violation in a mutated model, the mutant is marked as killed. Traces from the killed mutants are fed into a test case generation filter that extracts an activation pattern that has the ability to detect faults similar to the malignant mutant in the actual system under test. It is also possible to automatically extract the execution orders of tasks that can lead to a deadline violation when the input stimuli are injected. During test case execution, test inputs are injected into the real-time system according to the activation pattern. Problems associated with controllability and observability when testing flexible real-time systems are outside the scope of this paper. Prefix-based and non-deterministic test execution techniques [15,33,21] are complementary to our approach.
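The flow just described could be organized roughly as in the following sketch (all names are placeholders for the paper's components – mutant generator, execution order analyzer and test case generation filter – not an actual tool):

```python
def generate_timeliness_tests(specification, mutation_operators, analyze):
    """analyze(mutant) is assumed to return an execution-order trace ending in a
    missed deadline (a list of (time, task_name, event) tuples), or None."""
    test_cases = []
    for operator in mutation_operators:
        for mutant in operator(specification):      # mutant generator
            trace = analyze(mutant)                  # execution order analyzer
            if trace is None:
                continue                             # benign mutant, not killed
            # test case generation filter: keep the controllable stimuli, i.e.
            # the activation pattern of sporadic requests, from the killed mutant
            activation_pattern = [(t, task) for (t, task, event) in trace
                                  if event == 'sporadic_request']
            test_cases.append(activation_pattern)
    return test_cases
```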
2 System Model and Testing Criteria
This paper uses a subset of Timed Automata with Tasks (TAT) [24,11] to define the assumptions about the system under test and as a source for model
based test case generation. Timed Automata (TA) [1] have been used to model many different aspects of real-time systems. A TA is a finite state machine extended with a collection of real-valued clocks. Each transition can have a guard, an action and a number of clock resets. A guard is a condition on clocks and variables, such as a time constraint. An action can do calculations and assign values to variables. The clocks increase uniformly from zero until they are individually reset in a transition. When a clock is reset, it is instantaneously set to zero and then starts to increase at the same rate as the other clocks (we assume synchronized clocks).
Within TAT models, TA is used to specify the activation pattern of tasks, that is, the order and the points in time at which different task executions are requested. Further, TAT extends the TA notation with a set of real-time tasks P, which need to be scheduled to perform computations in response to an activation. Elements in P express information about tasks as quadruples (c, d, SEM, PREC), where c is the assumed execution time of the task, d is the relative deadline, and SEM and PREC are defined in the following paragraphs. Shared resources are modeled by a set of system-wide semaphores, R, where each semaphore s ∈ R can be locked and unlocked by tasks at fixed time points in their execution. The set SEM contains tuples of the form (s, t1, t2) where t1 and t2 are the lock and unlock times of semaphore s ∈ R. These times are expressed relative to the task's start time. Precedence constraints are relations between pairs of tasks A and B stating that an instance of task A must have executed to completion between the executions of two consecutive instances of task B (otherwise, the second instance of task B is blocked). Hence, PREC is a subset of P that specifies what other tasks must precede this task.
We call a task's behavior, including the points in its execution where different resources are locked and unlocked, the task's execution pattern. In TAT, task execution patterns are fixed. This may appear unrealistic, especially if the input data to a task may vary. In this step we assume that the execution pattern for a task is associated with a particular (typical or worst case) equivalence class of input data. After a critical activation pattern is found, the target system can be tested several times using different task inputs in that sequence, stressing it to reveal faulty behavior.
2.1 Mutation Operators
A test criterion defines test requirements that must be satisfied when testing software. An example of a test criterion is "execute all statements once". A test coverage measure expresses how thoroughly tests have satisfied a test criterion, usually in terms of how many test requirements are satisfied. A mutation-based test criterion is defined by a set of mutation operators. Hence, progress of testing can be expressed in terms of mutants killed during test case generation. For example, if a set of test cases derived from killing all malignant "(∆ = 3) execution time mutants" has been run on the
target system, then 100 percent coverage has been reached for that testing criterion.
Mutation operators mimic possible faults that can lead to timeliness failures. Our previous work identified and presented formal definitions of seven types of faults or deviations from assumptions that can lead to timeliness failures [22], whereas this paper describes the operators informally and classifies them with respect to the maximum number of mutants created.
2.1.1 Task property mutations O(n)
The following operators create 2n mutants, where n is the number of tasks. Execution time operators increase the modelled worst case execution time of a task by a constant time ∆ or decrease the best case execution time by the same amount. This mutation represents a situation where the assumption about a task's execution times, used for analysis, does not correspond with the execution times that are possible in the implementation. Minimum inter-arrival time operators decrease or increase the assumed inter-arrival time between requests for task execution by a constant time ∆. This reflects a change in the system's environment that causes requests to come more or less frequently than expected. Such recurring environment requests can also be assumed to have fixed offsets to each other. Pattern offset operators change the offset between two activation patterns by a constant ∆ time units.
2.1.2 Resource locking mutations O(nrl)
These mutation operators increase or decrease the time when a particular resource is locked by ∆ time units. The lock time operator changes the point in time resources are locked and the unlock time operator changes the time resources are unlocked relative to the start time of the task. The hold time shift operator changes both the lock and unlock times. Since mutants are created for each pair of tasks and resource-protected critical sections, the maximum number of mutants is 2n · r · l, where r is the number of resources and l is the maximum number of times a resource is needed by a particular task throughout its execution.
2.1.3 Precedence mutations O(n²)
For each pair of tasks, if a precedence constraint exists between the pair, then it is removed. If there is no precedence constraint, a new constraint is added. A task cannot be constrained to precede itself, so the number of mutants that can be created is n² − n.
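As an illustration of the task model and of one of the operators above, a minimal sketch (our representation and names, not the paper's notation) of the (c, d, SEM, PREC) quadruple and of the WCET-increase half of the execution time operator might look like this:

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Tuple

@dataclass(frozen=True)
class Task:
    name: str
    c: int                                   # assumed (worst-case) execution time
    d: int                                   # relative deadline
    sem: Tuple[Tuple[str, int, int], ...]    # (semaphore, lock time, unlock time)
    prec: FrozenSet[str]                     # names of tasks that must precede

def execution_time_mutants(tasks, delta=3):
    """One mutant task set per task: its execution time increased by delta."""
    for i, t in enumerate(tasks):
        yield tasks[:i] + [replace(t, c=t.c + delta)] + tasks[i + 1:]

# Two tasks give two mutants for this half of the operator (O(n), cf. 2.1.1).
tasks = [Task('A', c=4, d=10, sem=(('s', 1, 3),), prec=frozenset()),
         Task('B', c=2, d=8, sem=(), prec=frozenset({'A'}))]
mutants = list(execution_time_mutants(tasks))
assert len(mutants) == 2
```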
3 Automated Test Generation using Genetic Algorithms
The previously presented method based on model-checking [22] is safe for analyzing mutated TAT models in the sense that vulnerabilities are guaranteed to be revealed if they exist. However, for some systems the state space becomes too large for model-checking to be effective. In particular, the computational
complexity (both time and memory) grows when triggering events are allowed to occur at many different points in time. In dynamic real-time systems, there are many sporadic tasks, making model-checking impractical. For these systems, we propose an approach where a simulation of each mutant model is iteratively run and evaluated using genetic algorithms with application-specific heuristics. By using a simulation-based method instead of model-checking for execution order analysis, the combinatorial explosion of full state exploration is avoided. Further, we conjecture that it is easier to modify a system simulation than a model-checker to correspond to the architecture of the system under test.
When simulation is used for mutation analysis, the model task set must be mapped to task entities in a real-time simulator. The activation pattern of periodic tasks is known and can be included in the static configuration of the simulator. The activation pattern for sporadic tasks should be varied for each iteration of the simulation to find the execution orders that can lead to timeliness failures. Consequently, a necessary input to the simulation of a particular TAT model is an activation pattern for the sporadic tasks. The relevant output from the simulation is an execution order trace where the sporadic requests have been injected according to the activation pattern. A desirable output from a testing perspective is an execution order trace that leads to a timeliness failure in the mutant. By treating test case generation from the TAT model as an optimization problem, different heuristic methods can be applied to find a trace leading to a missed deadline. This paper focuses on genetic algorithms, since they are highly configurable and cope well with optimization problems that contain local optima [18].
Genetic algorithms operate by iteratively refining a set of solutions to an optimization problem through random changes and by combining features from existing solutions. In this context the solutions are called individuals and the set of individuals is called the population. Each individual has a genome that represents its unique features in a standardized format. Common formats for genomes are bit-strings and arrays of real values. Consequently, users of a genetic algorithm must supply a problem-specific mapping function from a genome in any of the standard formats to a particular candidate solution for the problem. It has been argued that the mapping is important for the success of the genetic algorithm. For example, it is desirable that all possible genomes represent a valid solution [18].
The role of the fitness function in genetic algorithms is to evaluate the optimality or fitness of a particular individual. The individuals with the highest fitness in a population have a higher probability of being selected as input to cross-over functions and of being copied to the next generation. Cross-over functions are applied to the selected individuals to create new individuals with higher fitness in the next generation. This means either combining properties from several individuals, or modifying a single individual
RELEASE OFFSET Task i y