Using Genetic Algorithms to Generate Test Plans ... - ACM Digital Library

3 downloads 2073 Views 342KB Size Report
Florida Tech. Melbourne, Florida, USA rmenezes@cs.fit.edu. Márcio Braga. Software Engineering. IVIA Ltda. Fortaleza, Ceará, Brazil [email protected].
Using Genetic Algorithms to Generate Test Plans for Functionality Testing∗ †

Francisca Emanuelle Vieira Francisco Martins Rafael Silva NATUS Project, IVIA Ltda. Fortaleza, Ceara, ´ Brazil [email protected] [email protected] [email protected]

Ronaldo Menezes Computer Sciences Florida Tech Melbourne, Florida, USA [email protected]

ABSTRACT

Software Engineering IVIA Ltda. Fortaleza, Ceara, ´ Brazil [email protected]

of software testing tools (and programmers) today perform functionality where testing plans (pre-defined sequence of steps) are executed and the outcome of the execution used to evaluate the correctness of the software. This approach is necessary but the bias on the choice of test plans is quite significant. The choice of test plans is driven by the experts’ notion of what is important to be tested or at least of what should be prioritized. Still, software testers struggle to answer questions such as: what test plans should be executed? and What sequence of operations should be placed in a test plan? What we all would like to answer in both cases is: all of them. However, this is an impossibility for medium to large systems for number of possible sequences becomes too large, making the problem very complex – and humans have limited capability when it comes to understanding complex systems [1]. Dijkstra [6] observed that the number of bugs in an application has a direct relation to the logic implemented rather than just the input values. In general, there are two basic approaches to test software. The first, more appreciated in the academia, consists of using formal specifications to design an application and then use theorem provers to demonstrate the application’s properties. This approach is very strict but unfortunately not often used given that the majority of software developers and designers are not prepared to deal with the strictness it requires. As if this was not sufficient, the breadth of formal specification methods do not encompass all the functionality needed in today’s complex applications. The second approach consists of doing test as part of the traditional engineering models (e.g. waterfall, spiral, prototyping) that have a specific phase for testing generally occurring after the application has been implemented – some argue this to be the correct form given that testing can be seen as a process that tries to destroy the software rather than implement it. Granted: modifications to these traditional models have being incorporating testing in every phase of the software development with methodologies such as extreme programming [2]. Despite all the claims, the truth here is that current approaches are insufficient to test software appropriately, thus causing the current status of the field which clearly seems to be loosing the battle of providing

Like in other fields, computer products (applications, hardware, etc.), before being marketed, require some level of testing to verify whether they meet their design and functional specifications – called functionality test. The general process of performing functionality test consists in the production of a test plan that is then executed by humans or by automated software tools. The main difficulty in this entire process is the definition of such test plan. How can we know what a good sequence (test plan) is? The rule of thumb is to trust on people who understand the workings of the application being tested and who can decide what should be tested. The danger is that experts, due to their overconfidence on their knowledge, may become blind to issues that should otherwise be easy to see. This paper describes a technique based on genetic algorithms that is able to generate good test plans in an unbiased way and with minimum expert interference.

1.

Marcio Braga ´

INTRODUCTION

Software applications consist of a sequence of well-defined instructions intended for execution in a computer. The exact sequence that is executed depends on several factors but primarily on the input values and what the use is doing with the application – the sequence of operations being followed by the user. Users deal with applications at the functionality level by performing high level operations. The majority ∗Research funded by the Conselho Nacional de Desenvolvimento Cient´ıfico e Tecnol´ ogico (CNPq), Brazil, under grant number 551761/2005-9. †Corresponding Author

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ACM SE ’06 March 10-12, 2006, Melbourne, Florida, USA Copyright 2006 ACM 1-59593-315-8/06/0004 ...$5.00.

140

users with reliable software. We understand that complete reliability is hard to achieve in empirical approaches to testing – complete testing is impossible – but this does not mean that a good set representing the full space of possible tests cannot be automatically generated thus reducing the cost of software development. The point to be made is not that current approaches in software testing are not sufficient to convince academics and practitioners that that field needs to be taken seriously. From a financial point of view, it only makes sense to look at software testing with more attention given that low estimates, put the cost of software testing at around 40% of the overall cost of development [12]. Therefore, we need a scientific method that allows testing to be carried out effectively without the need for the rigorous approach that exist in formal methods. This paper discusses how genetic algorithms may be used to generate good, unbiased test plans for functionality testing. We argue that this is a step towards a more scientific approach in the field software testing. This paper is divided as follows. In Section 2 we provide a brief introduction of genetic algorithms. Section 3 explains the methodology proposed in our project (called NATUS); this section introduces concepts of how to evaluate a test plan as well as the background necessary to understand the results presented in Section 4. We have implemented our proposal and produced results showing the promise of GAs to the area of software testing automation. Section 5 discusses other works that have used genetic algorithms in software testing and how NATUS differs from them. Finally, in Section 6 we describe the next steps to be taken in our project.

2.

Mating: It requires the selection of two or more individuals. Mating can occur in various ways but the most common is point-crossover in which a “cut” point is randomly selected and new individuals are generated from combinations of the parts of the genotypes. For instance, with a 1-point-crossover and two individuals, one can generate two new individuals combining the first part of one parent with the second part of the other, and vice-versa. The decision to mate is also stochastic – individuals may be selected and not mate. Mutation: Mutation is a process that allows the re-introduction of new genetic material in the population. Although mutation is an important step, it is normally desirable with less frequency. Hence GAs normally set the mutation rate to very low probabilities. A larger than necessary mutation rate may have an undesirable side-effect to the convergence of the algorithm to a good solution. Figure 1 shows a pictorial view of the steps described above. It should be noted in this figure that two individuals are selected for mating, if mate (crossover) occurs, we will have two other individuals originated from the genetic code of the two selected. Understanding the steps above are essential to the understanding of what GAs can offer to software testing. Software testing consists of a difficult search for solutions within an open search-space. A GA is a mechanism to explore this space in a consistent, impartial, and systematic way. A GA approach may remove the partiality of currently software testing techniques that generally rely on the knowledge of a software-testing expert. Section 3 describes the use of GAs in functionality testing while Section 5 describes other aspects of the use of GAs in software testing.

GENETIC ALGORITHMS

Genetic Algorithms (GAs) was proposed by Holland in 1975 [7] as a meta-heuristic to combinatorial problems. It was inspired by the process of natural selection as proposed by Darwin [5]. The GA idea is simple but, as in natural selection, very useful. GAs have been successfully used in a plethora of practical applications. Examples include robot movement planning, protein folding, optimization of mobile communication infrastructure, design of antennas, allocation of resources in distributed systems, and many others. GAs have very well-defined steps:

3.

NATUS PROJECT

The use of GA in the generation of test plan is one of the main milestones of the NATUS project. The overall goal of the project is to provide a methodology in which strategies inspired by nature play a major role on the testing process while eliminating the bias that arise from experts’ decisions. The obvious question is whether the approaches based on GAs can yield good test plans and, in the future, good testing tools. In this section, we discuss our approach to generate and evaluate testing plans based on GAs. First, it is worth mentioning that our project does not aim at criticizing experts in software testing. We understand perfectly that experts have valuable opinions; our process also works well in tandem with them. However we also think that in many occasions they are blind to certain problems in the software because they tend to focus on what they see as critical and not necessarily on how the system is going to be used, or looking to test the software in a way that covers most of the operations (diversity in the system). Our GA approach makes use of a table (Table 1) whose values can be initialized by experts of even modified on-the-fly to reflect some expert knowledge. Here, we describe the paper assuming the values are not necessarily expert generated. The occurrence of a problem in a software application is directly proportional to the level of inconsistency of the state in which the application is. This means that when an error occurs in an application, it was not necessarily the last operation, `, executed by the user that caused the error, but it

Codification: Solutions to the problems need to be coded as genotypes that are evaluated and assigned fitness values. There must be an ordering on the fitness values of genotypes to allow solutions to be compared against each other. Population Creation: A group of (normally) randomlygenerated individuals are created to form a population of fixed size. As part of this step, one needs to look at the issue of diversity in the population. The more diverse the population, the more likely it is that good genetic sequences (part of the genotype) are present somewhere in the population. Individual Selection: Given the fitness of each individual as calculated by the fitness function, the algorithm can stochastically select candidates of the population to be used in the next step (mating). It is common in this step to use a roulette so that each individual are given chances proportional to the goodness of their fitness.

141

Point Crossover (Mate)

Initial Population

Creation of Next Population

mutation mutation Mutation

Resulting Individuals

Figure 1: Pictorial representation of the GA steps.

may have been a sequence of previously executed operations that caused the application to be in an inconsistent state by the time ` was executed. Our premise is that an application has a state of inconsistency φi which is non-decreasing as operations ,`i , are executed. Or in other words, given a sequence of operations, i, i + 1, i + 2, . . ., i + n, we can say that φi ≤ φi+1 ≤ φi+1 ≤ . . . ≤ φi+n . This means that every transition t from instruction `i to `i+1 , written as t(`i → `i+1 ), adds to the state of inconsistency of the application (or at least keep it the same). In these lines, the goal of software testing is to minimize t(`j → `k ), ∀j, k – minimize the degree of inconsistency added to the system in all possible transitions in the application. It is worth explaining in more details what these operations are in our case. The scenario we envision for our process is one where alpha versions of the application are released internally to testers who have the task of using the application in ways they find necessary. The testers execute sequence of operations and keep a log of what they did in case problems are found – problems need to be replicable. The assumption here is that testers are verifying functionally with operations of large granularity (e.g. open file, mark text as bold, delete all text, undo, etc.). Note that this does not affect the use of other testing approaches at the code level, for instance. Our process is used to identify sequences of operations that take the application to an inconsistent state. Obviously it may still be necessary to look at each of these operations in more details if the problem is to be fixed. For instance, a developer may have to fix and test the unit ”open window” if the tester found that it is causing problems to the application. Given the definition of inconsistency above, the goal of a software testing professional (or automated tool) is to assign values to each possible transition in a software system to represent the contribution of that transition to the degree of inconsistency of the application. Clearly, this is a biased approach that should be avoided in practice – we require an expert to define how problematic a particular transition is.

Yet, our main goal here is to demonstrate that approaches such as GAs can used to evolve good test plans (sequences of operations being executed); later we describe how this bias may be eliminated with randomly generated values that, overtime through leaning and the process of feedback, evolve to what they should be. Table 1 shows the representation of the transition values. The NATUS project will incorporate the GA approach with a mechanism that updates the values of the table as the problems are fixed. Therefore, even a randomly generated table will eventually converge to one where the values are realistic to the application being tested.

`1 `2 `3 .. . `n

`1 v1,1 v2,1 v3,1 .. . vn,1

`2 v1,2 v2,2 v3,2 .. . vn,2

`3 v1,3 v2,3 v3,3 .. . vn,3

··· ··· ··· ··· .. . ···

`n v1,n v2,n v3,n .. . vn,n

Table 1: Representation of the assigned values for the inconsistency added by each transition. For instance t(`2 → `3 ) = v2,3 . Assuming a transition table as the one in Table 1 we can define a fitness function, fp , for a test plan (sequence of Pk−1 operations) p = `1 , `2 , ..., `k as fp = t(`i → `i+1 ). i This indicates that the larger the value of fp the better the sequence is considered given that it is more likely to take the application to an inconsistent state. GA can now be used to evolve a population of test plans of a fixed size into individuals (test plans) that are close to a maximum value for fp . Note that the description above is just a (perhaps na¨ıve) example. Fitness functions based on other factors may be used to generate a sequence. In fact, in a more elaborate use of GAs, the table itself can be updated in a feedback loop based on the result of the execution of the test plan. That

142

is, it may be possible to initialize the transition table with the random values for each cell and upon the execution of a sequence that did not cause problem decrease the values for all transitions that are part of the sequence. Another point that need to be noted is that the fitness function as defined earlier has a strong dependency to the application one needs to test. For instance, it may be more meaningful to use a function based on the standard deviation of the inconsistency so that test plans that do not fall within the norm are given a higher score and hence tested first. Our approach to the generation of test plans is quite different from other approaches using nature-inspired (or bioinspired) strategies such as GA and Neural Networks. We envision a test plan as a sequence of pre-defined operations executed by a person sitting in front of a computer. We assume the granularity of these operations are to the level of functionality (e.g. open a window, insert a record, close window, open file, etc.). Our initial goal in the NATUS project is not to test within each of these units (unit testing). However we are aware that GAs and other nature-inspired approaches may be used at other level of testing. Section 5 describe some of the other approaches available in the literature.

4.

Figure 2: These results show the application of our GA on the generation of test plans with 10 operations.

prove upon test plans even though the size of the individual (10 operations) does not allow for that much optimization.

RESULTS

As part of the NATUS project we have implemented a genetic algorithm to generate test plans. We have implemented our GA using the Java language, but since we are not measuring time of execution, the known issues with Java are not relevant here. In all experiments we have used an initial population of random generated test plans of size 150. The optimization has been run for 50 iterations (termination condition). A mate (crossover) probability of 80% is being used with mutation being at only 1%. In essence, these rates indicate that after selection, the individuals may decide to mate with a probability of 80% that mating will take place. If it does, the resulting individuals will be sent to the mutation phase of the algorithm. If it does not, then the selected individuals themselves are passed on to the mutation phase. After mating, the individuals may suffer mutation but with a low probability. Although mutation is very important in GAs for it introduces new genetic material, it has to be carefully controlled in order to allow for the convergence of the algorithm. In our case, the mutation (if it takes place) happens only in one of the positions of the test plan. The algorithm uses point crossover as depicted in Figure 1. Point crossover works by selecting a break point in the chromosome of the two selected individuals and recombining them using each-others’ half. For these experiments we have used a table of size 30×30 with randomly generated transitions values (as shown in Table 1) . The operations are generic and do not apply to any specific application as the objective was to demonstrate that GAs can be used to evolve test plans in a semi-automated way – in practice these operations need to be applicationspecific. Each experiment was run several times – Figures 2 to 4 show the results based on 9 runs of each experiment. For each iteration depicted in the figures, the point value represents the average of the best test plan for that iteration. Figure 2 shows the results when our GA is used to generate test plans of size 10 (10 consecutive operations). It is interesting to see that the algorithm does manage to im-

Figure 3: These results show the application of our GA on the generation of test plans with 20 operations. The two series being plotted in Figure 2 represent the point value (as described above) and the moving average (an average of the last 10 point values in the graph). The moving average is used to better depict the fact that the algorithm is improving the quality of the test plans. The point values are useful because they remind us that GA is a heuristic and solutions may go from from good solutions (fitness values) to worse ones. Although the best sequence is unknown, the goal here was not to show the best is found but rather that the GA is able to generate good test plans. The GA we implement does not employ any elitism. For this reason, we observe that best individuals generated in previous iterations of the algorithm are not automatically carried over to the next generations. Figures 3 and 4 corroborate our results. In these figures, test plans of 20 and 30 operations are generated using genetic algorithms. Note that the fitness of the best individual for the case test plans of 20 operation is approximately double of the fitness of the best individuals for 10 operations.

143

is to work on a methodology for test automation completely inspired from nature approaches such as GAs. In this process we are also considering the use of Swarm Intelligence for the implementation of the testing tools themselves. Given test plans generated using GA, we could have many (in parallel) testing tools whose job consist of selecting testing plans, analyzing the results, and provide feedback to the the inconsistency table used in the GA. For instance, lets assume that a test plan p generated by the GA was considered a good plan, but it turned out that when the testing tool executed it, it did not cause problems in the application. The immediate conclusion here is that the values used in the table for all transitions in p are not what they should be. If the execution finished without errors, then the testing tool could update the inconsistency table to reflect the fact that those transitions have been tested – the values of the transitions are decreased.

Figure 4: These results show the application of our GA on the generation of test plans with 30 operations.

Characteristics passed GA

Similarly, the best test plan with 30 individuals has a fitness function that is about 3 times more than the fitness of the best individual for 10 operations.

5.

Characteristics of the Application to be Tested

Execution of Test Plans using Swarm Intelligence

RELATED WORK

GAs have been used in software testing before. Berndt [3] describes a technique that is placed as part a normal software development life cycle. Berndt generates test plans based on a fitness function that reflects the test originality (degree of novelty of the test), proximity (how close it is to other good test plans that are stored in the system), and seriousness (how severe is the error that the test can cause). Although these are very interesting characteristics, our approach differs from Berndt’s because we propose that test plans that cause errors are less desirable after one is generated. Berndt’s argue that similarity is a good characteristic because if a test plan is problematic, similar ones will also be. Note that we see the use of GA as part of a larger methodology where errors are being fixed (see Figure 5), hence if a test caused an error and the error is fixed, similar test plans are no longer as desirable as before. Berndt’s has also looked at how well GAs do compared with other automated test generations [4] Korel [9] discussed automated test generation from a general point of view. He classifies the generators into three categories: random test generators, path-oriented test data generators, and dynamic test data generators. He also points out that most of these methods are insufficient because they are based solely on the flow of the application. Our approach does not use the structure of the application and generate test plans based primarily on a notion of inconsistency we introduced in Section 3. Michael [11] made a study to show that for data generation which is used in testing, genetic algorithms is almost always better than random generation. Our approach does not deal with data yet and looks only at the level of macro operations in functionality testing.

6.

Generation of Test Plans using Genetic Algorithms

Test Plans Passed Execution Phase

Result of Execution used to update Characteristics the application

Figure 5: Automated testing methodology proposed in the NATUS project.

Figure 5 shows the automated testing methodology proposed as part of the NATUS project. Shaded objects are the subject of this paper. The Swarm Intelligence part of the methodology is currently ongoing. We are developing a model of coverage of a space of testing plans based on an abstraction of alarm pheromone in insects [10]. We believe this abstraction provides an uniform coverage of the space of test plans that need to be executed. We believe we can make the Swarm Intelligence part of our methodology use information of previous executions of test plans to make better predictions of what tests should be executed. This approach is similar to how regression test is performed. Given that our initial results are promising, we will, in our next step, verify the behavior of our GA with real software systems. The NATUS project is being developed in a software company (IVIA Ltda., Brazil) with a diverse portfolio of applications that are suitable candidates to our approach. We intend to start applying our GA to generate test plan for the phonnia [8] – an application to choose long distance service provider in real time. With this application we will be able to test the quality of our test plans against expertcreated test plans. Our focus will be on web-based applications because they have a well-defined protocol (HTTP). Hence the test cases consist of a sequence of HTTP instructions sent between the user front-end (browser) and the web application. A study of our fitness function is also necessary, currently we are using a function in which takes the summation of all

FUTURE ACTIVITIES

The NATUS project aims at developing more than just algorithms to generate test plans. The purpose of this project

144

inconsistency values associated to each transition pair in the test plan. Other approaches such as the use of a function based on the standard deviation may also be useful. We believe the fitness function is heavily dependent on the application. Take for instance an approach based on standard deviation; the function may be such that outliers are given higher fitness scores. This approach is based on the fact that different test plans are the most promising ones (the ones more likely to cause errors in the application).

7.

on Software testing and analysis, pages 209–215, New York, NY, USA, 1996. ACM Press. [10] C. Lloyd. The alarm pheronones of social insects: A review. Technical report, Colorado State University, 2003. [11] C. C. Michael, G. E. McGraw, M. A. Schatz, and C. C. Walton. Genetic algorithms for dynamic test data generation. In ASE ’97: Proceedings of the 1997 International Conference on Automated Software Engineering (ASE ’97) (formerly: KBSE), pages 307–308, Washington, DC, USA, 1997. IEEE Computer Society. [12] C. E. Williams. Software testing and uml. In Proceedings of the 10th International Symposium on Software Reliability Engineering, Boca Raton, Florida, Nov. 1999. IEEE Press.

CONCLUSION

This paper described how genetic algorithms can be used to generate test plans for functionality testing (a sequence of operations to be tested). We argue that such approach eliminates the bias from the expert-generated plans thus enabling computer applications to be more thoroughly tested. We have introduced the concept that the inconsistency of an application is a value that is non-decreasing, meaning that as operations are executed the application can only move to a state that is at best as inconsistent as the previous state it was on. It is with this assumption in mind that we have defined a fitness function that captures how much a test plan contributes to the inconsistency of the application. In this paper a good test plan is the one with the highest contribution to the inconsistency of the application – good is actually bad for the application. We argued that our GA approach can be integrated into a methodology where testing application can use the generated test plans and provide feedback to the inconsistency value of each transition pair in the test plan. If a plan is executed and caused no errors the inconsistency of all transition pairs in that plan can be decreased; otherwise, it the values can be increased or left at the same level.

8.

REFERENCES

[1] A.-L. Barab´ asi. Linked: The New Science of Networks. Perseus Publishing, 2002. [2] K. Beck. Extreme Programming Explained: Embrace Change. Addison-Wesley, 1999. [3] D. Berndt, J. Fisher, L. Johnson, J. Pinglikar, and A. Watkins. Breeding software test cases with genetic algorithms. In HICSS ’03: Proceedings of the 36th Annual Hawaii International Conference on System Sciences (HICSS’03), pages 338–347, Washington, DC, USA, 2003. IEEE Computer Society. [4] D. J. Berndt. Investigating the performance of genetic algorithms-based software test case generation. In Eighth IEEE International Symposium on High Assurance Systems Engineering (HASE’04), 2004. [5] C. Darwin. On the Origin of Species: A facsimile of the first edition. Harvard University Press, July 1975. [6] E. W. Dijkstra. Structured programming. In O.-J. Dahl, E. W. Dijkstra, and C. A. R. Hoare, editors, Notes on Structured Programming, pages 1–82. Academic Press, 1972. [7] J. H. Holland. Adpatation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI, 1975. [8] IVIA Ltda. Phonnia. http://www.phonnia.com.br/. [9] B. Korel. Automated test data generation for programs with procedures. In ISSTA ’96: Proceedings of the 1996 ACM SIGSOFT international symposium

145