Arab J Sci Eng (2014) 39:4593–4607 DOI 10.1007/s13369-014-1074-y

RESEARCH ARTICLE - COMPUTER ENGINEERING AND COMPUTER SCIENCE

Generating Test Data for Software Structural Testing Based on Particle Swarm Optimization

Chengying Mao

Received: 26 September 2012 / Accepted: 30 April 2013 / Published online: 30 March 2014 © King Fahd University of Petroleum and Minerals 2014

Abstract Testing is an important way to ensure and improve the quality of a software system. However, it is a time-consuming and labor-intensive activity. In this paper, our main concern is software structural testing, and we propose a search-based test data generation solution. In our framework, the particle swarm optimization (PSO) technique is adopted due to its simplicity and fast convergence speed. For the test data generation problem, the inputs of the program under test are encoded into particles. Once a set of test inputs is produced, coverage information can be collected by the test driver. Meanwhile, the fitness value of branch coverage can be calculated based on such information. Then, the fitness is used by PSO to adjust the search direction. Finally, the test data set with the highest coverage rate is yielded. In addition, eight well-known programs are used for experimental analysis. The results show that the PSO-based approach has a distinct advantage over traditional evolutionary algorithms such as genetic algorithm and simulated annealing, and also outperforms comprehensive learning-PSO both in covering effect and in evolution speed.

Keywords Test data generation · PSO · Branch coverage · Fitness function · Convergence speed

C. Mao (B) School of Software and Communication Engineering, Jiangxi University of Finance and Economics, Nanchang 330013, China e-mail: [email protected]

1 Introduction

Software has widely affected our work and lives, and brought us great convenience. However, its failure may lead to significant economic loss or threats to life safety. For example, the explosion of the Ariane-V rocket [1] and the BP Deepwater Horizon disaster [2] are typical instances of such failures in past years. As a consequence, software quality has become an important concern in the current information society. Over the past three decades, testing has proved to be one of the effective approaches to ensure and improve software quality. Generally speaking, software testing methods can be divided into two categories: functional (black-box) testing and structural (white-box) testing [3,4]. Compared with functional testing, structural testing exhibits much higher defect exposure capability, so it has been adopted as an important fault-revealing method during the software development process. For this kind of testing, however, generating a test data set that satisfies a specific coverage criterion is not an easy task.

In recent years, search techniques have been widely applied in the field of software testing, in what is called search-based software testing (SBST) [5], especially for the test data generation problem. The general idea behind search-based test data generation is to select a set of test cases from the program input space to meet the testing requirement. The basic procedure is to generate a test suite that satisfies a specific test adequacy criterion, which is usually expressed as a fitness function [6]. When a coverage criterion is selected, the search activity should attempt to produce a test suite that covers all construct elements mentioned in the criterion. Meanwhile, testing is a time-consuming and labor-intensive activity. Hence, the size of the test suite should be as small as possible to reduce the testing time and cost.

Among the existing search methods, meta-heuristic search (MHS) techniques, such as simulated annealing (SA) and genetic algorithm (GA), are the most popular algorithms, and have been widely adopted for generating test data. Although they can produce test data with appropriate fault-revealing ability [7,8], they fail to produce them quickly due to their slow evolutionary speed. Recently, as a swarm intelligence technique, particle swarm optimization (PSO) [9,10] has become a hot research topic in the area of intelligent computing. Its significant features are simplicity and fast convergence speed. In this paper, we study how to apply this optimization technique to generate test data for software structural testing. Meanwhile, some well-known programs have been used for comparison analysis, and the experimental results show that PSO outperforms other traditional search techniques, such as GA and SA, in the quality of the generated test data.

The paper is structured as follows. In the next section, we briefly introduce the basic procedure of SBST and address the related work on this problem. Then, PSO is introduced in Sect. 3. In Sect. 4, the overall framework and some technical details of PSO-based test data generation are discussed. The experimental and comparison analyses are presented in Sect. 5. Finally, concluding remarks are given in Sect. 6.

2 Background

2.1 Search-Based Test Data Generation

Test data generation is a difficult task in software testing. In fact, it is a process of sampling, from the input space, the representative inputs that can reveal the potential faults in a program. In recent years, some search techniques have been incorporated into test data generation. This has been recognized as a rational solution in the field of software testing, also known as SBST.

As shown in Fig. 1, the basic process of search-based test data generation [11] can be described as follows. At the initial stage, the search algorithm generates a basic test suite by a random strategy. Then the test engineer or tool seeds test inputs from the test suite into the program under test (PUT) and runs it. During this process, the instrumented code monitors the PUT and produces information about execution traces and results. Based on such trace information, the coverage metric for a specific criterion can be calculated automatically. Subsequently, the metric is used to update the fitness value of the pre-set coverage criterion. Finally, the test data generation algorithm adjusts the search direction in the next iteration according to the fitness information. The search process terminates once the generated test suite Stc satisfies the pre-set coverage criterion.

Fig. 1 The basic framework for search-based test data generation [Diagram nodes: Search Algorithm, Test Suite, Program Under Test, Results & Traces, Coverage Info., Coverage Criterion, Fitness Function. Edge labels: generates, executes, uses, produces, statistical analysis, refers, updates.]

There are three key issues in the above framework: trace collection, fitness function construction and search algorithm selection. In general, execution traces are collected by code instrumentation, which can be handled with compiler-level analysis. When constructing the fitness function, testers should determine which kind of coverage is needed. Experience shows that branch coverage is a cost-effective criterion [12]. For the last issue, we adopt PSO to produce test inputs due to its excellent features, such as easy implementation and fast convergence speed.

2.2 Related Work

Test data generation is a key problem in automated software testing, which has attracted extensive attention from researchers in the past decades. Here, we mainly focus on the test data generation methods based on MHS algorithms. In the 1990s, genetic algorithm was adopted to generate test data. Jones et al. [13] and Pargas et al. [14] investigated the use of GA for automatically generating test data

for branch coverage. Experiments on some small programs showed that GA could usually outperform the random algorithm significantly. In recent years, McMinn [7] and Harman and McMinn [8] performed empirical studies on GA-based test data generation for large-scale programs, and validated its effectiveness over other MHS algorithms.

Another well-known search algorithm is simulated annealing [15], which can solve complex optimization problems based on the idea of neighborhood search. Tracey et al. [25] proposed a framework to generate test data based on the SA algorithm. Their method can incorporate a number of testing criteria, for both functional and non-functional properties, but their experimental analysis is not sufficient. On the other hand, Cohen et al. [16] adopted simulated annealing to generate test data for combinatorial testing, but their method is mainly intended for functional testing. In this paper, we introduce a newly emerging algorithm (i.e. PSO) to settle this problem. In our experimental analysis, we found that the test data sets generated by GA or SA had lower average coverage than those generated by PSO. Meanwhile, PSO generates test data faster than both GA and SA.

Although GA and SA are classical search algorithms, their convergence speed is relatively slow. PSO, proposed by Kennedy and Eberhart [9,10], can overcome this shortcoming. Windisch et al. [17] first applied a variant of PSO, comprehensive learning particle swarm optimization (CL-PSO), to generate structural test data, but some experiments have indicated that the convergence speed of CL-PSO may be worse than that of the basic PSO [18]. Meanwhile, their experiments mainly focused on artificial test objects. In our experiments, we also performed comparison analysis on these two algorithms, and the results show that the basic PSO is more suitable for the test data generation problem than CL-PSO.

Recently, Chen et al. [19] used PSO to generate test suites for pairwise testing. Ahmed and Zamli [20] also adopted PSO for generating variable-strength interaction test suites of small size. Their work belongs to combinatorial testing, which is a kind of functional testing. In contrast, our research focuses on program code-oriented testing. Furthermore, the PSO algorithm has been introduced to settle some other testing problems, such as regression testing [21] and test case prioritization [22]. In this paper, we address how to utilize the basic PSO algorithm to generate test data for branch coverage testing, and perform comparison analysis on some well-known benchmark programs.

Moreover, our previous work in [11] presented the basic idea of applying PSO to generate test data for software structural testing, and only reported preliminary results. In this paper, detailed descriptions of some key techniques, such as the algorithm process, fitness function construction and branch weight estimation, are provided. In addition, experimental analysis on a larger benchmark program set is performed. More in-depth analysis, such as impact analysis on population size and statistical analysis on repeated trials, is also addressed in detail. Based on these investigations, it is much clearer that the PSO algorithm is more suitable for generating structural test cases than GA and SA.

3 Particle Swarm Optimization

PSO is a typical new technique in the field of swarm intelligence. It emulates the swarm behavior of insects, herding animals, flocking birds and schooling fish, where these swarms search for food in a collaborative manner. PSO was first introduced in 1995 by Kennedy and Eberhart [9], and has been exploited across a vast area of research [23,24]. Although PSO shares some similarities with genetic search techniques, it does not use evolution operators such as crossover and mutation. Each member of the swarm, called a particle, adapts its search pattern by learning from its own experience and other members' experiences.

During the iteration process, each particle maintains its own current position, its present velocity and its personal best position. The iterative operation leads to a stochastic manipulation of velocities and flying direction according to the best experiences of the swarm, in order to search for the global optimum in the solution space. In general, the personal best position of particle i is denoted by $pbest_i$, while the global best position of the entire population is called $gbest$. Suppose the population size is s in the D-dimensional search space, where a particle represents a potential solution. The velocity and position of the dth dimension of the ith particle can be updated by formulas (1) and (2) [10], respectively.

$$V_i^d(t) = w \cdot V_i^d(t-1) + c_1 \cdot r_{1i}^d \cdot \left(pbest_i^d - X_i^d(t-1)\right) + c_2 \cdot r_{2i}^d \cdot \left(gbest^d - X_i^d(t-1)\right) \qquad (1)$$

$$X_i^d(t) = X_i^d(t-1) + V_i^d(t) \qquad (2)$$

where $X_i = (X_i^1, X_i^2, \ldots, X_i^D)$ is the position of the ith particle and $V_i = (V_i^1, V_i^2, \ldots, V_i^D)$ represents the velocity of particle i. $pbest_i^d$ is the personal best position found by the ith particle for dimension d, and $gbest^d$ is the global best position for dimension d. The inertia weight $w$ controls the impact of the previous history on the new velocity. The parameters $c_1$ and $c_2$ are the acceleration constants reflecting the weighting of the stochastic acceleration terms that pull each particle toward the $pbest_i$ and $gbest$ positions, respectively. $r_{1i}^d$ and $r_{2i}^d$ are two random numbers in the range [0, 1]. Moreover, a particle's velocity on each dimension is clamped to a maximum magnitude $V_{max}$.


The concept of inertia weight is introduced in [10], which is used to balance the global and local search abilities. Generally speaking, a large inertia weight is more appropriate for global search, and a small inertia weight facilitates local search.
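As an illustration, one iteration of formulas (1) and (2) can be sketched in Python as below. This is a minimal sketch rather than the implementation used in the paper: the function name `pso_step` and the default values of `w`, `c1`, `c2` and `v_max` are assumptions chosen only for the example.

```python
import random

def pso_step(positions, velocities, pbest, gbest,
             w=0.7, c1=2.0, c2=2.0, v_max=1.0, rng=random):
    """Apply formulas (1) and (2) once to every particle, in place.

    The default parameter values are illustrative assumptions.
    """
    for i, (x, v) in enumerate(zip(positions, velocities)):
        for d in range(len(x)):
            # random() draws from [0, 1); the text states the range [0, 1].
            r1, r2 = rng.random(), rng.random()
            new_v = (w * v[d]
                     + c1 * r1 * (pbest[i][d] - x[d])   # pull toward pbest_i
                     + c2 * r2 * (gbest[d] - x[d]))     # pull toward gbest
            # Clamp the velocity on each dimension to magnitude v_max.
            v[d] = max(-v_max, min(v_max, new_v))
            # Formula (2): move the particle by its new velocity.
            x[d] += v[d]
```

A larger `w` keeps more of the previous velocity and so favors global search, while a smaller `w` lets the two attraction terms dominate and favors local search.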

4 PSO-Based Test Data Generation

4.1 Algorithm Description

Search-based test data generation is essentially a cooperation between the MHS algorithm and the dynamic execution of the program. Once the search algorithm produces a test suite, the test cases are seeded into the PUT and the program is executed to gather coverage information. Based on such information, the fitness value of the corresponding coverage criterion can be calculated. Then, the fitness value is used to adjust the search direction to find a new test suite that achieves the maximum possible coverage. The key problem is therefore how to realize the interaction between the basic search algorithm and the extraction of coverage information.

The overall algorithm of PSO-based test data generation is graphically represented in Fig. 2, and its execution steps can be described as follows. At the initialization stage, we encode the argument list $arg = (a_1, a_2, \ldots, a_m)$ of the PUT into an m-dimensional position vector. For a given structural coverage criterion C, such as branch coverage, the fitness function $f(\ast)$ for PSO should be designed. Meanwhile, the initial values of $f(pbest_i)$ and $f(gbest)$ are set to 0. To calculate the fitness of each particle (i.e. test case), we instrument the PUT (P) to gather the coverage information about construct elements. In addition, random values are used to initialize the velocity vector $V_i^d$ and position vector $X_i^d$.

In the main part of the algorithm, formulas (1) and (2) are used to update the velocity $V_i^d$ and position $X_i^d$ of particle i in each dimension d, respectively. Then, the position vector $X_i$ of particle i is decoded into a test case. The test case is seeded into the PUT to collect execution trace information. Subsequently, the fitness $f(X_i)$ of the test case represented by $X_i$ is evaluated. Based on the fitness value of each particle (i.e. test case), the personal best position $pbest_i$ and the global best position $gbest$ can be updated.

The whole particle evolution process is controlled by the termination condition. For the test data generation problem, the termination condition can be one of the following two cases: (1) all construct elements (e.g. branches) have been covered, or (2) the maximum evolution generation (maxGen) is reached. Finally, the particles in the last generation are decoded into the set of test cases for the PUT. Obviously, the size of the test suite is the pre-set size of the particle population.
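The steps described above can be sketched as a small, self-contained Python loop. Everything in this sketch is hypothetical and chosen only for illustration: the toy program `put` with its hand-written instrumentation, the stand-in `toy_fitness` (not the fitness function of the paper), and all parameter values. Coverage is accumulated over all generations, which is a simplification.

```python
import random

ALL_BRANCHES = {"gt:T", "gt:F", "sum:T", "sum:F"}

def put(a, b, trace):
    """Toy PUT with hand-written instrumentation; records branches taken."""
    trace.add("gt:T" if a > b else "gt:F")
    trace.add("sum:T" if a + b == 10 else "sum:F")

def decode(x):
    """Decode a position vector into integer test inputs."""
    return tuple(int(round(v)) for v in x)

def toy_fitness(a, b):
    """Hypothetical fitness: larger when the hard branch a + b == 10 is closer."""
    return 1.0 / (0.01 + abs(a + b - 10))

def generate(swarm=10, max_gen=50, w=0.7, c1=2.0, c2=2.0, v_max=5.0, seed=1):
    rng = random.Random(seed)
    dim = 2  # one dimension per PUT argument
    X = [[rng.uniform(-20, 20) for _ in range(dim)] for _ in range(swarm)]
    V = [[rng.uniform(-v_max, v_max) for _ in range(dim)] for _ in range(swarm)]
    pbest, pbest_fit = [x[:] for x in X], [0.0] * swarm  # f(pbest_i) = 0
    gbest, gbest_fit = X[0][:], 0.0                      # f(gbest) = 0
    covered = set()
    for _ in range(max_gen):                  # termination case (2)
        for i in range(swarm):
            for d in range(dim):              # formulas (1) and (2)
                v = (w * V[i][d]
                     + c1 * rng.random() * (pbest[i][d] - X[i][d])
                     + c2 * rng.random() * (gbest[d] - X[i][d]))
                V[i][d] = max(-v_max, min(v_max, v))
                X[i][d] += V[i][d]
            a, b = decode(X[i])               # particle -> test case
            put(a, b, covered)                # execute PUT, collect coverage
            fit = toy_fitness(a, b)
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = X[i][:], fit
            if fit > gbest_fit:
                gbest, gbest_fit = X[i][:], fit
        if covered == ALL_BRANCHES:           # termination case (1)
            break
    # The last generation is decoded into the test suite.
    return [decode(x) for x in X], len(covered) / len(ALL_BRANCHES)
```

Calling `generate()` returns a test suite whose size equals the swarm size, together with the fraction of branches reached during the search.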

[Fig. 2: flowchart of the PSO-based test data generation algorithm. Node labels recoverable from the figure: start; encoding; static analysis for PUT; extract interface info.; instrument PUT; construct fitness function; initialize position X, associated velocities V, pbest and gbest, set k = 0; test "k >= maxGen or all branches are covered"; update Xi and Vi for particle i in each dimension; decoding; execute PUT with real parameters; collect coverage info.; calculate the fitness of particle i; if fit(Xi) > fit(pbesti) then pbesti = Xi; if fit(Xi) > fit(gbest) then gbest = Xi; i = i + 1; stop the search process, and generate the final test data set; end.]

[…] k (k > 0) refers to a constant that is always added if the term is not true. In other words, k serves as a penalty factor for the deviation of the branch actually executed by a test case from its expected branch; it is added whenever the branch predicate is not true. In our experiments, we set it to 0.1, following the previous studies in [7,25–28].

As stated in Table 1, branch distance can be divided into the following three categories: (1) For an atomic proposition, the distance is 0 if the proposition is true, otherwise k. (2) For an arithmetic or relational expression, similarly, the distance is 0 if the expression is true. Otherwise, the distance is the sum of k and the absolute difference between the two operands of the expression. (3) For a compound logical expression, the branch distance is a combination of the distances of its basic propositions (i.e., the above two kinds). For conjunctive logic, the distances are added. For disjunctive logic, the function min() is applied.

A branch distance function can be constructed for each branch predicate in the PUT according to the rules in Table 1. Then, the fitness function of the whole program can be defined by comprehensively considering the fitness of each branch. Suppose a program has s branches; the fitness function of the whole program can be calculated via formula (3).

$$fitness = \frac{1}{\left[\theta + \sum_{i=1}^{s} w_i \cdot f(bch_i)\right]^2} \qquad (3)$$

where $f(bch_i)$ is the branch distance function for the ith branch in the program, $\theta$ is a small constant set to 0.01 in our experiments, and $w_i$ is the corresponding weight of the ith branch. Obviously, $\sum_{i=1}^{s} w_i = 1$. Generally speaking, each branch is assigned a different weight according to how difficult it is to reach. The calculation of branch weights is addressed in the next subsection. Since the covered status of a branch is determined by the test data set, fitness in formula (3) in fact depends on the test suite Stc. As a consequence, the optimization objective on test suite Stc can be represented via fitness in our algorithm implementation.

4.3 Branch Weight

Table 1 The branch functions for several kinds of branch predicates

No.   Predicate   Branch distance function f(bch_i)
1     Boolean     If true then 0 else k
2     ¬a          Negation is propagated over a
3     a = b       If abs(a − b) = 0 then 0 else abs(a − b) + k
4     a ≠ b       If abs(a − b) ≠ 0 then 0 else k
5     a < b       If a − b < 0 then 0 else abs(a − b) + k
6     a ≤ b       If a − b ≤ 0 then 0 else abs(a − b) + k
7     a > b       If b − a < 0 then 0 else abs(b − a) + k
8     a ≥ b       If b − a ≤ 0 then 0 else abs(b − a) + k
9     a and b     f(a) + f(b)
10    a or b      min(f(a), f(b))
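The rules in Table 1, together with formula (3), can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, and only the values k = 0.1 and θ = 0.01 are taken from the text.

```python
K = 0.1       # penalty constant k, set to 0.1 in the experiments
THETA = 0.01  # small constant theta from formula (3)

def dist_bool(value, k=K):
    """Rule 1: atomic Boolean proposition."""
    return 0.0 if value else k

def dist_eq(a, b, k=K):
    """Rule 3: a = b."""
    return 0.0 if a == b else abs(a - b) + k

def dist_ne(a, b, k=K):
    """Rule 4: a != b."""
    return 0.0 if a != b else k

def dist_gt(a, b, k=K):
    """Rule 7: a > b."""
    return 0.0 if b - a < 0 else abs(b - a) + k

def dist_ge(a, b, k=K):
    """Rule 8: a >= b."""
    return 0.0 if b - a <= 0 else abs(b - a) + k

def dist_and(fa, fb):
    """Rule 9: conjunction adds the two distances."""
    return fa + fb

def dist_or(fa, fb):
    """Rule 10: disjunction takes the smaller distance."""
    return min(fa, fb)

def fitness(distances, weights, theta=THETA):
    """Formula (3): 1 / (theta + sum_i w_i * f(bch_i))^2."""
    total = sum(w * f for w, f in zip(weights, distances))
    return 1.0 / (theta + total) ** 2
```

For example, for the hypothetical predicate `a > b and a == c` with a = 3, b = 5 and c = 3, the distance is `dist_and(dist_gt(3, 5), dist_eq(3, 3))`, i.e. 2 + k from the unsatisfied first condition plus 0 from the satisfied second one. The fitness grows as the weighted distances shrink and reaches its maximum of 1/θ² when every branch distance is 0.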

According to formula (3), the fitness compels the search algorithm to spend more effort on generating test cases that cover the branches with high weight. Therefore, it is very important to assign a precise weight to each branch in accordance with how difficult it is to reach. Without loss of generality, the reachable difficulty of a branch is determined by the following two factors: nesting weight and predicate weight.

Generally speaking, a branch at a deep nesting level is hard to reach. For branch $bch_i$ (1 ≤ i ≤ s), suppose its nesting level is $nl_i$. The maximum and minimum nesting levels of all branches are denoted by $nl_{max}$ and $nl_{min}$, respectively. Thus, the nesting weight for branch $bch_i$ can be computed as follows.

$$wn(bch_i) = \frac{nl_i - nl_{min} + 1}{nl_{max} - nl_{min} + 1} \qquad (4)$$

Furthermore, this weight can be normalized via formula (5).

$$wn'(bch_i) = \frac{wn(bch_i)}{\sum_{j=1}^{s} wn(bch_j)} \qquad (5)$$
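A short Python sketch of formulas (4) and (5) is given below. The function name `nesting_weights` and the sample nesting levels are hypothetical; the levels are assumed to have been obtained beforehand, for example by static analysis.

```python
def nesting_weights(levels):
    """Return the normalized nesting weight of each branch.

    levels[i] is the nesting level nl_i of branch i.
    """
    nl_min, nl_max = min(levels), max(levels)
    span = nl_max - nl_min + 1
    # Formula (4): raw nesting weight of each branch.
    raw = [(nl - nl_min + 1) / span for nl in levels]
    # Formula (5): normalize so that the weights sum to 1.
    total = sum(raw)
    return [w / total for w in raw]
```

For three branches at nesting levels 1, 2 and 3, the raw weights are 1/3, 2/3 and 1, and the normalized weights are 1/6, 1/3 and 1/2, so the most deeply nested branch receives the largest weight.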

In general, the nesting level of a branch can be obtained automatically by static program analysis. Once $nl_i$ of the ith branch is known, its nesting weight can be calculated according to formulas (4) and (5). It follows that a branch at a deeper nesting level has a greater weight.

Besides the nesting depth of a branch, the difficulty of covering it also depends on its predicate condition. According to the semantics of the predicate statement, we can roughly classify predicate conditions into the following four groups: equation, boolean expression, inequality and non-equation. From the perspective of cognitive informatics, different program constructs cause different difficulties or complexities for comprehension and execution. At present, existing research mainly concerns the complexity measurement of different control logic constructs in program code or processes, whereas research on measuring the satisfiability of a given predicate is relatively rare. In fact, different types of program predicates differ in how difficult they are to satisfy. Roughly speaking, a predicate formed by the operator '>' or '!=' is easier to satisfy than a predicate with the operator '=='. Accordingly, we classify the usual condition operators into the four categories shown in Table 2. To determine the satisfaction difficulty of the four condition types (i.e., the reference weights in Table 2), ten experienced and skilled programmers were consulted to rate their difficulty coefficients, and the final weights were established accordingly.

The above reference model mainly addresses the case of a branch predicate with only one condition. In practice, however, a branch predicate may contain several conditions. This structural information can also be obtained automatically via static analysis of the program code. Here, we assume that branch predicate $bch_i$ (1 ≤ i ≤ s) contains u conditions.
For each condition $c_j$ (1 ≤ j ≤ u), its reference weight $wr(c_j)$ can be […]

[Table 2: Condition type […]]

1. If predicate $bch_i$ is formed by combining u conditions with the and operation, its predicate weight is the square root of the sum of $wr^2(c_j)$. Generally speaking, the number of conjunctions in a predicate is not more than four, so $wp(bch_i)$ will […]

[…] n > 45 for program line and n > 25 for the remaining three programs. Regarding the effects of SA and GA, there is no significant difference between them in our experiments. From the above analysis, the PSO-based approach shows good stability with respect to changes in population size. Meanwhile, this approach can achieve full coverage and fast convergence with a very small population size.

5.4 Statistical Analysis on Repeated Trials

To validate PSO's advantages over GA and SA from a statistical perspective, the experiment on each subject program was repeated 1,000 times. Then, the branch coverage information of each program was used for a statistical difference test. We compare the coverage effect of PSO with those of GA and SA, respectively. Thus, the following two null hypotheses can be formed. Here, the Wilcoxon–Mann–Whitney rank sum test (or U test) [32] is used as the test method, and R [33] is used as the analysis tool for the statistical test.

H01: PSO is not significantly different from GA with regard to branch coverage.
H02: PSO is not significantly different from SA with regard to branch coverage.
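To make the test procedure concrete, the rank sum test can be sketched in pure Python as below. The paper uses R for this analysis, so this is only an illustrative stand-in: the function name `mann_whitney_u` is hypothetical, and the p value uses the tie-corrected normal approximation without continuity correction, which may differ slightly from the value R reports.

```python
import math

def mann_whitney_u(xs, ys):
    """Two-sided Wilcoxon-Mann-Whitney test via the normal approximation.

    Returns (U statistic of xs, approximate two-sided p value).
    """
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((xs, ys)) for v in sample)
    n1, n2 = len(xs), len(ys)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2.0
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean = n1 * n2 / 2.0
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u, 1.0  # all observations are identical
    z = (u - mean) / math.sqrt(var)
    return u, math.erfc(abs(z) / math.sqrt(2))
```

In use, `xs` and `ys` would hold the branch coverage values of two algorithms over the repeated trials, and the null hypothesis is rejected when the returned p value falls below the chosen significance level.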


Fig. 3 Average coverage (AC) vs. population size [Plot panels: (a) triangleType, (b) calDay, (c) cal, (d) remainder, (e) computeTax, (f) bessj, (g) printCalendar, (h) line. x-axis: population; y-axis: avg. coverage; series: PSO, GA, SA.]

Fig. 4 Average convergence generation (AG) vs. population size [Plot panels: (a) triangleType, (b) calDay, (c) cal, (d) remainder, (e) computeTax, (f) bessj, (g) printCalendar, (h) line. x-axis: population; y-axis: avg. generation; series: PSO, GA, SA.]


Table 8 p values of hypotheses H01 and H02 for repeated experiments on eight subject programs

Program         p value of H01   p value of H02
triangleType    3.784e−12        0.0010150
calDay          1.102e−12        0.0006281
cal             1.167e−08        6.395e−12
remainder       1.140e−12        1.191e−7
computeTax      1.617e−12        1.614e−12
bessj           1.824e−11        0.002035
printCalendar   2.108e−11        2.099e−11
line            8.924e−11        6.097e−12

The results of statistical test are shown in Table 8. For null hypothesis H01 , i.e. the comparison between PSO and GA, we can find that the p values for all eight programs are far
