Voice of Evidence

Editor: Forrest Shull, Fraunhofer Center for Experimental Software Engineering, Maryland, [email protected]

Improving Evidence about Software Technologies: A Look at Model-Based Testing

Arilo Dias Neto, Rajesh Subramanyan, Marlon Vieira, Guilherme Horta Travassos, and Forrest Shull

A rich body of experiences hasn’t yet been published on all the software development techniques researchers have proposed. In fact, by some estimates, the techniques for which we do have substantial experience are few and far between. When we started looking at the evidence on model-based testing (MBT), we thought we’d come across some strong studies that showed this approach’s capabilities compared to conventional testing techniques. This wasn’t the case. However, we can still extract some useful knowledge and also discuss some issues that are relevant to other software technologies with similar types of evidence.

IEEE Software, published by the IEEE Computer Society

A new testing paradigm

MBT approaches help automatically generate test cases using models extracted from software artifacts.1 MBT promises to greatly affect how we build software as well as the level of confidence developers can have that their software meets its requirements, especially those related to safety, reliability, and performance. This promise is based on the fact that MBT approaches extract relevant testing information directly from the models, which can automate tasks that fallible humans usually undertake and allow more complete and correct test suites. The technical literature often highlights this goal.1–3

MBT has been studied since at least 1976, when Chittoor V. Ramamoorthy and his colleagues described a seminal approach for test-data generation.4 Given a program graph for a Fortran program, their approach identifies a set of paths that satisfy some given testing criteria. Since this development, researchers have proposed and refined several different MBT approaches.

According to the literature, certain specialized domains use MBT. A review of available MBT case studies, for example, highlighted its use in hardware or embedded systems, namely processor architectures, communication protocols, and smart-card applications, as well as operating systems and language specifications.5 However, our (admittedly anecdotal) experience working with software development teams in other domains is that most developers don’t view MBT as a mainstream approach. It’s interesting to ask why not, and whether the available evidence has some light to shed on MBT’s broader usefulness.
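The idea of selecting paths through a program graph so that a testing criterion is met can be illustrated with a small sketch. This is not the 1976 algorithm itself: it assumes edge coverage as the criterion and uses a simple greedy strategy, and the graph, node names, and function names are all hypothetical.

```python
from collections import deque

# Hypothetical control-flow graph of a small program: node -> successor nodes.
# The node names are illustrative only and are not taken from any cited paper.
PROGRAM_GRAPH = {
    "entry": ["check"],
    "check": ["then", "else"],
    "then": ["exit"],
    "else": ["loop"],
    "loop": ["loop_body", "exit"],
    "loop_body": ["loop"],
    "exit": [],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the node list from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def edge_covering_paths(graph, entry, exit_node):
    """Return entry-to-exit paths that together traverse every edge that lies
    on some entry-to-exit path (assumed criterion: edge coverage).

    Greedy strategy: for each still-uncovered edge (u, v), join a shortest
    path entry ~> u with a shortest path v ~> exit_node.
    """
    uncovered = [(u, v) for u in graph for v in graph[u]]
    paths = []
    while uncovered:
        u, v = uncovered[0]
        head = shortest_path(graph, entry, u)
        tail = shortest_path(graph, v, exit_node)
        if head is None or tail is None:
            uncovered.pop(0)  # edge cannot appear on any entry-to-exit path
            continue
        path = head + tail
        paths.append(path)
        covered = set(zip(path, path[1:]))
        uncovered = [edge for edge in uncovered if edge not in covered]
    return paths


if __name__ == "__main__":
    for p in edge_covering_paths(PROGRAM_GRAPH, "entry", "exit"):
        print(" -> ".join(p))
```

Each returned path would still need concrete input values that drive the program along it, which is the test-data generation step proper; the sketch stops at path selection.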

0740-7459/08/$25.00 © 2008 IEEE

A broad view

We could find no prior formal survey of MBT approaches in the literature. To characterize how much evidence existed on MBT, we conducted a systematic review of the published work in this area through mid-2006. (See the details of our review in the Web appendix; go to www.computer.org/software and click on this issue’s Web Extra.) The initial search returned 406 papers, which we reviewed manually. Once we removed clearly irrelevant and duplicate papers, 202 papers remained. Motivated by the popularity of the Unified Modeling Language (UML) among development teams that we work with, we selected as our priority for initial analysis

- all papers describing MBT approaches that are useful for UML models, and
- papers describing MBT approaches applied to models other than UML, which were either recent (published in or after 2004) or heavily cited (receiving three or more citations).

This left us with 85 papers, describing 71 distinct MBT approaches, for our initial analysis.

To determine how much evidence has been published regarding these approaches’ usefulness, we categorized the papers according to the level of evidence they presented. To avoid being called “ivory tower academics,” we designed the five levels to make explicit that lessons learned and experiences from the field, not just academic studies, provide useful evidence:

- Speculation. These papers describe the MBT approach without presenting any study or example that would indicate its feasibility in software projects.
- Example. These papers describe the MBT approach and an example of its use. However, they provide no evaluation criteria against which to compare the described approach’s performance.
- Proof of concept. These papers describe using an MBT approach for a “toy” system or a project without commercial pressure. Some measures might be collected that show that the approach can be successfully applied, but not necessarily its effectiveness.
- Experience/industrial reports. These papers describe a real team developing software in industry using the MBT approach, and include some measures or subjective opinions to understand its utility.
- Experimentation. These papers evaluate the MBT approach in some detail through an experimental study (such as a case study, rigorous observation of developers, or evaluations against a control approach). They include measures and analyses regarding the results in a specific environment.

We include all five levels here because, although they don’t all provide highly rigorous insight into MBT practices, we’ve found that they can be useful when transferring new technologies into practice.6 To illustrate, think about whether you would have any motivation to try a new approach if its developer couldn’t at least provide a proof of concept or a small study showing feasibility. Less rigorous results are still useful for demonstrating that proceeding to more rigorous and more expensive trials has some merit.

Table 1 shows our results from the literature review, divided according to whether the evidence concerned MBT techniques applicable for systems that use UML diagrams or other types of models. In table 2, we categorize the software modeling paradigm and the types of software to which MBT researchers have applied MBT approaches.

Table 1. Types of evidence in our data set (number of papers)

Type of evidence                 UML-based   Non-UML   Total   Percent
Speculation                          17          6       23       27
Example                              22         16       38       45
Proof of concept                      5          8       13       15
Experience/industrial reports         0          4        4        5
Experimentation                       3          4        7        8
Total                                47         38       85      100

Table 2. Types of software development paradigms and domains our data set covers (number of model-based testing (MBT) approaches)

Software construction paradigm          UML-based   Non-UML   Total
Object-oriented (and component-based)       41          1       42
Based on formal specification                4         17       21
Based on a finite state machine              0          6        6
Reverse engineering (legacy system)          1          0        1
Aspect-oriented                              0          1        1
Total                                       46         25       71

Target domain*                          UML-based   Non-UML   Total
Not defined                                 36         12       48
Reactive                                     1          5        6
Safety-critical                              2          4        6
Embedded                                     1          3        4
Distributed                                  2          1        3
Web application                              1          1        2
COTS                                         2          0        2
Real-time                                    0          1        1
Product line                                 1          0        1
Concurrent                                   0          1        1
Web services                                 0          1        1

*These categories are not orthogonal; some MBT approaches fall under multiple categories.

A high number of MBT approaches (67 percent) don’t indicate a software category to which developers could apply them. This proportion is higher for UML-based approaches (36, or 78 percent, give no specific category). This might suggest that you can apply these 36 approaches to any type of software category expressed in UML. However, we caution that additional evidence in this area would be helpful. Many MBT approaches are demonstrated on just a subset of the UML diagrams, which leaves open the risk that certain application domains (such as reactive systems) pose specific requirements that a given approach can’t deal with. Even a small feasibility study, using representative software models, could provide additional confidence.

Looking at table 1, we can see that most papers about UML-based approaches fall into the speculation or example categories. UML is widely deployed and well accepted, so developers hunting for solutions might find it easier and less risky to pick up potential solutions that work within this paradigm. Table 2 shows that domains in which we find the most examples of non-UML approaches demand high reliability or require the system to react within certain constraints, such as safety-critical systems or embedded systems. We can imagine that approaches targeting these environments would demand more effort to make it clear that they’re feasible and effective, which would also explain why fewer speculative papers exist for the non-UML approaches, whereas those containing at least a concrete example of use increase.

An important issue is whether MBT approaches effectively demonstrate that a system satisfies not just its functional requirements but also nonfunctional ones such as reliability, safety, or performance. Many of the traditional models that MBT approaches use can’t represent both functional and nonfunctional requirements. For instance, to represent text control flow, MBT approaches most often use original (without extensions) UML diagrams (state charts, class, activity, and sequence), finite state machines, and event-action graphs (see a list with all the models in the Web appendix).
None of these models, as originally defined, explicitly allow the representation of constraints regarding nonfunctional requirements such as execution-response times, security properties, or usability characteristics. We found only 10 MBT approaches (out of 71) that explicitly claim to deal with nonfunctional requirements (U21, U23, U41, N03, N12, N13, N15, N16, N17, N24 in the Web appendix). At the least, extending these models and the test-development approaches to handle nonfunctional requirements might require significant effort. Today, other approaches such as UML 2.0 and SysML enable test generation that includes some nonfunctional requirements. However, only two MBT approaches found in our review used these models (U21, U25).
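The proportions quoted in this section can be re-derived from the counts in tables 1 and 2. The short check below is only an arithmetic sketch; the variable and function names are ours, and the counts are copied from the two tables.

```python
# Counts copied from table 1: type of evidence -> (UML-based, non-UML) papers.
TABLE_1 = {
    "Speculation": (17, 6),
    "Example": (22, 16),
    "Proof of concept": (5, 8),
    "Experience/industrial reports": (0, 4),
    "Experimentation": (3, 4),
}
# Counts copied from table 2: column totals and the "Not defined" domain row.
APPROACHES = {"UML-based": 46, "Non-UML": 25}
DOMAIN_NOT_DEFINED = {"UML-based": 36, "Non-UML": 12}


def percent(part, whole):
    """Share of `whole` expressed as an unrounded percentage."""
    return 100.0 * part / whole


total_papers = sum(uml + other for uml, other in TABLE_1.values())

# Percent column of table 1, rounded to whole numbers.
evidence_share = {
    kind: round(percent(uml + other, total_papers))
    for kind, (uml, other) in TABLE_1.items()
}

# Share of approaches that name no target domain, overall and UML-based only.
undefined_all = percent(sum(DOMAIN_NOT_DEFINED.values()), sum(APPROACHES.values()))
undefined_uml = percent(DOMAIN_NOT_DEFINED["UML-based"], APPROACHES["UML-based"])

if __name__ == "__main__":
    print(total_papers, evidence_share)
    print(f"{undefined_all:.1f} {undefined_uml:.1f}")
```

The rounded shares reproduce the Percent column of table 1. For the approaches with no stated domain, 48 of 71 is about 67.6 percent, so the 67 percent in the text appears to be truncated rather than rounded; 36 of 46 is about 78.3 percent, matching the 78 percent quoted for UML-based approaches.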

Demonstrations of effectiveness

Let’s drill down more deeply into the seven experimental studies indicated in table 1. What kind of information do they give us? For six out of the seven, the primary measure of effectiveness is to determine what percentage of known defects the MBT approach under study can detect. To do this, we need a testable system. These studies used either toy systems, with a few hundred lines of code, or small applications ranging from 1 to 6.5 KLOC. Researchers seeded defects into these systems. In most of the studies, they did so via code mutation. A smaller percentage used naturally occurring faults, although in these cases, the researchers selected and seeded the faults. No study claimed that the faults were a complete set of all faults occurring during development, so we don’t know whether similar results are achievable in the field. Three of these studies focused on UML-based test approaches (see U12, U19, and U38 in the Web appendix), whereas three others used approaches based on other modeling notations (N04, N07, N14).

These studies’ results show that researchers can produce combinations of test suites and input sets that detect 100 percent of the seeded defects. Usually, these require large input sets (up to 500 input values in some cases, although this parameter varies according to the program being tested) and necessitate generating test cases that cover all program execution paths. Generating smaller test sets that rely on looser criteria or statistical sampling still produces high coverage, usually 70 to 90 percent detection of defects. Unfortunately, only one study (N07) reported more precisely on the tradeoffs between smaller and larger test sets (for example, in terms of the time needed to run the tests or the memory footprint required), so it’s hard to objectively measure various approaches’ costs and benefits.
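The effectiveness measure used in these six studies, the percentage of seeded defects a test set detects, can be sketched as follows. The program under test, the four hand-written mutants, and both input sets are hypothetical and far smaller than anything in the studies; the resulting rates illustrate only how the measure is computed, not the studies’ findings.

```python
def classify(a, b, c):
    """Hypothetical program under test: classify a triangle by side lengths."""
    if a + b <= c or a + c <= b or b + c <= a:
        return "invalid"
    if a == b == c:
        return "equilateral"
    if a == b or b == c or a == c:
        return "isosceles"
    return "scalene"


def _rest(a, b, c):
    """Unmodified tail of classify, shared by the mutants of the first check."""
    if a == b == c:
        return "equilateral"
    if a == b or b == c or a == c:
        return "isosceles"
    return "scalene"


def mutant_strict(a, b, c):
    # Seeded defect: '<=' replaced by '<' in the validity check.
    if a + b < c or a + c < b or b + c < a:
        return "invalid"
    return _rest(a, b, c)


def mutant_missing_case(a, b, c):
    # Seeded defect: the 'b + c <= a' condition is dropped.
    if a + b <= c or a + c <= b:
        return "invalid"
    return _rest(a, b, c)


def mutant_equilateral(a, b, c):
    # Seeded defect: 'a == b == c' weakened to 'a == b'.
    if a + b <= c or a + c <= b or b + c <= a:
        return "invalid"
    if a == b:
        return "equilateral"
    return "isosceles" if (b == c or a == c) else "scalene"


def mutant_isosceles(a, b, c):
    # Seeded defect: the 'a == c' condition is dropped.
    if a + b <= c or a + c <= b or b + c <= a:
        return "invalid"
    if a == b == c:
        return "equilateral"
    return "isosceles" if (a == b or b == c) else "scalene"


MUTANTS = [mutant_strict, mutant_missing_case, mutant_equilateral, mutant_isosceles]


def detection_rate(original, mutants, inputs):
    """Fraction of mutants whose output differs from the original program's
    output on at least one input (the original serves as the oracle)."""
    detected = sum(
        any(m(*x) != original(*x) for x in inputs) for m in mutants
    )
    return detected / len(mutants)


SMALL_SET = [(3, 3, 3), (3, 4, 5), (2, 2, 3), (1, 2, 3)]
LARGER_SET = SMALL_SET + [(2, 3, 2), (5, 1, 2)]

if __name__ == "__main__":
    print(detection_rate(classify, MUTANTS, SMALL_SET))
    print(detection_rate(classify, MUTANTS, LARGER_SET))
```

In this toy setup the smaller input set misses the two mutants whose defects are triggered only by specific argument orderings, which mirrors in miniature the tradeoff between test-set size and detection discussed above.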

Experimenting with fallible humans

Taking a different tack, the seventh study (N20) is particularly noteworthy because it evaluates how well MBT approaches can perform when humans use them. This paper provided some insight into model building’s costs and risks. The six studies we just discussed concern only test case generation and execution. Actually creating the software models needed as inputs to the MBT approaches, not to mention keeping them updated, demands software developers’ participation and can require significant time and effort. However, as the authors of paper N20 pointed out, MBT approaches’ ultimate success relies on the quality of the models that support them.7 So, mistakes humans make during model building (such as a domain expert thinking that certain characteristics are too trivial to include in the model) can directly affect the success of the generated test suite.

The seventh study provided such data as a way of characterizing MBT approaches. It compared two approaches. The first was HOTTest, a model-based test automation technique based on models of the system described in an embedded domain-specific language. The second approach was based on extended finite state machines that developers have used previously in industry contexts. Undergraduates applied both approaches to small but real applications (measuring 1 KLOC and 4.2 KLOC). HOTTest generated test suites that were at least seven times more effective. The tradeoff was that the average time required to develop HOTTest’s input model was higher, and users subjectively felt that HOTTest was also harder to use. This study reports an interesting dichotomy: although users feel less at ease with more formal approaches, such as HOTTest, they actually perform better with them.

This study was promising; it illustrated some issues that make wider MBT deployment difficult while showing that such approaches have benefits that might warrant the necessary training and cost of use. In our view, this type of information is necessary for supporting developers’ decisions about whether to use an MBT approach. Developers must obviously take care to select an MBT approach that matches their project’s specific needs: black-, gray-, or white-box testing? Unit, component, integration, or system testing? However, it’s also risky to choose an MBT approach without having a clear view of the complexity, cost, effort, and skill required to create the necessary models, which can impact the software project’s budget and flexibility. Evidence on these topics could be a useful step in determining whether wider deployment of MBT approaches to different domains is worthwhile.

References

1. S. Dalal et al., “Model-Based Testing in Practice,” Proc. 1999 Int’l Conf. Software Eng. (ICSE 99), ACM Press, 1999, pp. 285–294.
2. I.K. El-Far and J.A. Whittaker, “Model-Based Software Testing,” Encyclopedia of Software Eng., J.J. Marciniak, ed., John Wiley & Sons, 2001, pp. 825–837.
3. A. Pretschner, “Model-Based Testing,” Proc. 27th Int’l Conf. Software Eng. (ICSE 05), ACM Press, 2005, pp. 722–723.
4. C.V. Ramamoorthy, S.F. Ho, and W.T. Chen, “On the Automated Generation of Program Test Data,” IEEE Trans. Software Eng., vol. 2, no. 4, 1976, pp. 293–300.
5. W. Prenninger, M. El-Ramly, and M. Horstmann, “Case Studies,” Model-Based Testing of Reactive Systems, M. Broy et al., eds., Springer, 2005, pp. 439–464.
6. F. Shull, J. Carver, and G.H. Travassos, “An Empirical Methodology for Introducing Software Processes,” Proc. 8th European Software Eng. Conf., IEEE CS Press, 2001, pp. 288–296.
7. A. Sinha and C. Smidts, “An Experimental Evaluation of a Higher-Ordered-Typed-Functional Specification-Based Test-Generation Technique,” Empirical Software Eng., vol. 11, no. 2, 2006, pp. 173–202.

Arilo Dias Neto is a doctoral student working with research regarding model-based testing at COPPE/Federal University of Rio de Janeiro (Experimental Software Engineering Group), Brazil. Contact him at [email protected].

Rajesh Subramanyan is a project manager at Siemens Corporate Research. Contact him at [email protected].

Marlon Vieira is a program manager at Siemens Corporate Research. Contact him at [email protected].

Guilherme Horta Travassos is a software engineering professor at COPPE/Federal University of Rio de Janeiro, where he leads the Experimental Software Engineering Group, and a CNPq researcher. Contact him at [email protected].

Forrest Shull is a senior scientist at the Fraunhofer Center for Experimental Software Engineering, Maryland, and director of its Measurement and Knowledge Management Division. Contact him at [email protected].

IEEE Software, May/June 2008
