ACM SIGSOFT Software Engineering Notes
Page 1
September 2011 Volume 36 Number 5
Radial Basis Function Neural Network Based Approach to Test Oracle

Om Prakash Sangwan
School of ICT, Gautam Buddha University, Greater Noida, Uttar Pradesh, India
[email protected]

Pradeep Kumar Bhatia
Department of CSE, GJU of Science & Technology, Hisar, Haryana, India
[email protected]

Prof. Yogesh Singh
Vice Chancellor, M.S. University of Baroda, Gujarat, India
[email protected]
ABSTRACT Software testing is an important discipline and consumes a significant amount of effort. A proper strategy is required to design and generate test cases systematically and effectively. In this paper, automated software test case generation with a Radial Basis Function Neural Network (RBFNN) is proposed, empirically validated with the help of a case study, and compared with other soft computing techniques. Experimental results show that RBFNN is one of the best techniques for automated test case generation.
Categories and Subject Descriptors D.2.7 [Software Engineering]: Software Testing, Test Suites, Test Case Generation.
General Terms Algorithms, Measurement, Documentation, Performance, Design, Verification.
Keywords Artificial Neural Network, Radial Basis Neural Network, Feed Forward Backpropagation, Test Oracle, Software Testing.
1. INTRODUCTION Software testing is the process of testing the software product. Effective software testing leads to a better quality product, lower maintenance costs, and more accurate results. Pressman defined software testing as a critical element of software quality assurance that represents the ultimate review of specification, design and code generation [24]. The essence of software testing is to determine a set of test cases for the software being tested. Test cases are the heart of software testing. Firstly, the test cases that need to be executed on the software have to be identified, as exhaustive testing is almost impossible. Secondly, these test cases have to be run on the software. A test has an identity, and is closely associated with program behavior. A test case has a set of inputs and a list of expected outputs for the implementation under test [3, 7, 9]. So a test case may be represented as:

    Test Case = {inputs, expected outputs}    (1)

The test suite (collection of test cases) cannot be used without the desired output value being a part of the test case. Therefore many approaches in software testing assume the presence of a software oracle [3]. The oracle is generally implemented manually, so it is important to automate the oracle in software testing. A perfect automated oracle would be behaviorally equivalent to the Implementation Under Test (IUT) and a trusted source of expected results for the given sets of input: it would accept every input specified for the IUT and would always produce a correct result. Developing a perfect oracle is therefore as difficult as solving the original design problem. If a perfect oracle were available and portable to the target environment, we could dispense with the IUT and serve the application environment with the ideal oracle. We cannot guarantee the ability of any algorithm to decide that another algorithm is correct for all possible cases, so the perfect oracle is something like a philosopher's stone for software [19].
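The representation in equation (1) can be sketched in code. This is a hypothetical illustration (the function and variable names are ours, not the paper's): a test case pairs inputs with the expected outputs supplied by an oracle, and a test passes when the implementation under test reproduces that expectation.

```python
# Minimal sketch of equation (1): Test Case = {inputs, expected outputs}.
# The oracle supplies the expected outputs for the given inputs.

def make_test_case(inputs, oracle):
    return {"inputs": inputs, "expected": oracle(*inputs)}

def run_test(iut, test_case):
    # A test passes when the IUT's output matches the oracle's expectation.
    return iut(*test_case["inputs"]) == test_case["expected"]

# Toy example: the oracle is a trusted reference implementation.
oracle = lambda a, b: a + b
buggy_iut = lambda a, b: a + b + (1 if a == 0 else 0)   # fault only when a == 0

tc = make_test_case((2, 3), oracle)
print(run_test(oracle, tc))                              # True: oracle agrees with itself
print(run_test(buggy_iut, tc))                           # True: this input misses the fault
print(run_test(buggy_iut, make_test_case((0, 1), oracle)))  # False: fault revealed
```

As the toy example shows, the oracle only reveals faults on inputs that exercise them, which is why the choice of test cases matters as much as the oracle itself.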
DOI: 10.1145/2020976.2020992
A few attempts have been made to generate test oracles. Wang et al. proposed an automatic test data generation mechanism using gray-box testing [29]. Nebut et al. proposed a use case driven approach for automating the generation of system test scenarios in the context of object-oriented embedded software, taking into account traceability problems between high-level views and concrete test case execution. The approach was evaluated in three case studies by estimating the quality of the test cases generated by the proposed prototype tools [23]. Rajan presented the concept of auto-generating black-box test cases from requirements [25]. They defined coverage metrics directly on the structure of the formalized requirements, and used an automated test case generation tool, such as a model checker, to generate test cases from formal requirements that satisfy the desired criteria. The idea of defining coverage metrics for requirements-based testing holds promise: auto-generating black-box test suites that satisfy rigorous requirements coverage metrics will help (1) gain dramatic time and cost savings, (2) identify weaknesses in the requirements set, and (3) identify flaws in the implementation. Doungsa-ard et al. proposed an AI-based framework for automatic test data generation. The proposed framework is pluggable, so it can be used in many test generation approaches [11]. The ANN model has proved to be a useful approach for test case effectiveness prediction and for the development of an approximation oracle for a given implementation under test [2, 4, 5, 20, 21, 22]. Ying and Mao explored the Radial Basis Function Neural Network (RBF NN) to construct an automated test oracle. The automated oracle generates outputs that are close to the expected outputs after training. Their experimental results conclude that the RBF NN can be used to implement an automated oracle.
It can generate approximate output that is close to the expected output for the IUT, thereby saving a lot of time and cost in software testing [30]. In this paper, a Radial Basis Function Neural Network (RBF NN) model is proposed as a test oracle. Neural networks have gained much popularity because they can approximate complex nonlinear mappings directly from input and output samples with a simple topology structure [6, 12, 17, 18, 27]. An NN can approximate an arbitrary continuous function effectively without the need for knowledge of that function. A few experiments have been conducted to validate the results. With the help of the Triangle Classification Problem (TRIYP), experimental results suggest that the RBF Neural Network can be used to give automated outputs with a reasonable degree of accuracy [1, 3, 8, 10, 26]. The paper is organized as follows: the proposed RBF NN is explained in Section 2. Section 3 describes the experiment design. Section 4 presents the results of the experimental study, and the conclusion is presented in Section 5.
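The triangle classification program used throughout this study can be sketched as follows. This is an assumed reconstruction of the classic TRIYP benchmark behavior (the paper does not reproduce the program itself, so the exact rules of the original tool may differ): three integers are read as side lengths and one of four classifications is returned.

```python
# Sketch of the TRIYP triangle-classification program (assumed behavior).

def triangle_type(a, b, c):
    # Sides must be positive and satisfy the triangle inequality.
    if min(a, b, c) <= 0 or a + b <= c or b + c <= a or a + c <= b:
        return "not a triangle"
    if a == b == c:
        return "equilateral"
    if a == b or b == c or a == c:
        return "isosceles"
    return "scalene"

print(triangle_type(3, 3, 3))   # equilateral
print(triangle_type(2, 2, 3))   # isosceles
print(triangle_type(3, 4, 5))   # scalene
print(triangle_type(1, 2, 3))   # not a triangle
```

A manually written classifier like this is exactly the kind of hand-built oracle the paper seeks to replace with a trained network.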
2. PROPOSED RADIAL BASIS FUNCTION NEURAL NETWORKS Radial Basis Function Neural Networks (RBF NN) may require more neurons than standard feed-forward backpropagation networks, but they can often be designed in a fraction of the time it takes to train standard feed-forward networks. They work best when many training vectors are available. A radial basis function network is an artificial neural network that uses radial basis functions as activation functions; its output is a linear combination of radial basis functions. Such networks are used in function approximation, time series prediction, and control. Radial Basis Function (RBF) networks typically have three layers: an input layer, a hidden layer with an RBF activation function, and a linear output layer. The output, y(x), of the network is thus

    y(x) = sum_{i=1..N} a_i * rho(||x - c_i||)    (2)

where N is the number of neurons in the hidden layer, c_i is the center vector for neuron i, and a_i are the weights of the linear output neuron. In the basic form, all inputs are connected to each hidden neuron. The norm is typically taken to be the Euclidean distance and the basis function is taken to be Gaussian:

    rho(||x - c_i||) = exp(-beta * ||x - c_i||^2)    (3)

The Gaussian basis functions are local in the sense that

    rho(||x - c_i||) -> 0  as  ||x|| -> infinity    (4)

i.e., changing the parameters of one neuron has only a small effect for input values that are far away from the center of that neuron. In our study we have developed an ANN model based on the training data set and propose a model using the RBF NN function as a test oracle.

3. EXPERIMENTAL DESIGN The goal of our work is to find out whether an RBF Neural Network can be used as a test oracle. For our experimentation, we used the triangle classification problem (TRIYP) that has been widely studied in the literature [1, 10, 16]. The program accepts three integers that represent the relative lengths of the sides of a triangle. The output of the program is Equilateral, Isosceles, Scalene or Not a triangle, depending on the lengths of the sides. Each input variable can take any integer value between 1 and 200, so the number of input combinations is 200^3 = 80 lakh (8 million), and producing the output manually for all these combinations of input is a tough job. We used the following experimental set-up to see if an RBF neural network could produce the output for every combination of input after being trained. Test cases were divided into four categories based on the four types of expected outputs, and their target values were set as mentioned below.
1. T1 (test cases corresponding to Isosceles output): Target Value 1
2. T2 (test cases corresponding to Scalene output): Target Value 2
3. T3 (test cases corresponding to Not a triangle output): Target Value 3
4. T4 (test cases corresponding to Equilateral output): Target Value 4
The inputs and outputs were normalized using min-max normalization, which performs a linear transformation on the original data [14]. To solve this problem, we initially chose the widely used feed-forward backpropagation neural network algorithm [28]. We used a sigmoidal feed-forward network with a single hidden layer and one sigmoidal node in the output layer. There are three nodes in the input layer. The number of neurons in the hidden layer was varied from 12 to 30, and one neuron was kept in the output layer to represent the four possible outputs, as shown in Table 1.

Table 1. NN Model Summary with Backpropagation Algorithm
    No. of Neurons at Layer 1 (hidden layer): 24
    No. of Neurons at Layer 2 (output layer): 01
    Input Units: 03
    Output Unit: 01
    Training Algorithm: Feed-forward Back Propagation
    Training Function: Trainbr
    Adaptive Learning Function: Learngdm
    Transfer Function: Logsig

The MATLAB training function used was 'trainbr', and the adaptation learning function selected for this experiment was 'learngdm'. The performance functions used are MARE and MRE. The transfer function was 'log-sigmoid' in both layers. Each network was trained for 100 epochs, keeping the goal at 0.005. The network was trained with 203 test cases and validated on 200 test cases. The network with 24 hidden neurons was found to be the most appropriate when the actual and predicted outputs were compared on the validation set of 200 test cases, as shown in Figure 1.

[Figure 1: plot of actual vs. predicted output with the backpropagation network over the 200 validation test cases.]

Figure 1. Validation Set Results with Feed-Forward Backpropagation Algorithm

Further to our study, another variant of the backpropagation algorithm, the cascade-forward backpropagation algorithm, was also designed, as shown in Table 2.
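The min-max normalization step used above can be sketched as follows. This is an illustrative reconstruction (the paper does not state its target range, so the common [0, 1] range is assumed, and the function name is ours):

```python
# Min-max normalization [14]: linearly map a value v from its original
# range [min_a, max_a] onto a new range [new_min, new_max].
def min_max_normalize(v, min_a, max_a, new_min=0.0, new_max=1.0):
    return (v - min_a) / (max_a - min_a) * (new_max - new_min) + new_min

# Triangle-side inputs lie in 1..200; target values lie in 1..4.
sides = (3, 4, 5)
normalized_inputs = [min_max_normalize(s, 1, 200) for s in sides]
normalized_target = min_max_normalize(2, 1, 4)   # target '2' encodes Scalene

print(normalized_inputs)   # three values in [0, 1]
print(normalized_target)
```

Normalizing both inputs and targets to a common range keeps the sigmoid and radial-basis units operating away from their saturated regions, which is why the paper applies it before training.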
Table 2. NN Model Summary with Cascade-Forward Backpropagation Algorithm
    No. of Neurons at Layer 1 (hidden layer): 24
    No. of Neurons at Layer 2 (output layer): 01
    Input Units: 03
    Output Unit: 01
    Training Algorithm: Cascade-forward Back Propagation
    Training Function: Trainlm
    Adaptive Learning Function: Learngdm

The network was trained with the training function 'trainlm' and the adaptive learning function 'learngdm' on the same training data set, and was validated on the same validation set of 200 test cases. The performance of this algorithm on the validation set was not satisfactory. A comparison of the actual and predicted output is displayed in Figure 2.

[Figure 2: plot of actual vs. predicted output with the cascade-forward network over the 200 validation test cases.]

Figure 2. Validation Set Results with Cascade-Forward Algorithm

It is clear from the figure that the model with cascade-forward backpropagation was not suitable for this case study.

3.1 Design of RBFNN
The goal of our design is to propose a model that gives better performance than the traditional backpropagation algorithm, and to check whether the proposed RBF NN model is effective, i.e., whether the oracle can generate approximate output closer to the expected output than that generated by the other neural network models. The net input to the radbas (radial basis) transfer function is the vector distance between its weight vector w and the input vector p, multiplied by the bias b. The transfer function for a radial basis neuron is

    radbas(n) = e^(-n^2)    (5)

In our design, we use three input variables and one output variable. The proposed RBF NN model was trained with 203 test cases, with 203 neurons in the hidden layer (the radial basis layer) and one neuron at the output layer (the linear layer), as shown in Figure 3.

Figure 3. The Architecture of the RBF NN with three inputs, 203 neurons in the hidden (radial basis) layer and one output (linear layer)

4. EXPERIMENTAL RESULTS
Compared to the feed-forward network, the radial basis function (RBF) network is the next-most-used network model, and here it gives better performance. The model was trained using the training test cases and evaluated on the validation test cases. Different error measurements have been used by various researchers, but for our study the main measures of model accuracy are the Mean Relative Error (MRE) and the Mean Absolute Relative Error (MARE). MARE is the preferred error measure for software measurement research and is calculated as shown in equation (6) below [13].
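The RBF-oracle design of Section 3.1 and the error measures used to score it can be sketched together as follows. This is an illustrative reconstruction, not the authors' MATLAB code: it fits an exact-interpolation RBF network (one Gaussian neuron per training sample, similar in spirit to MATLAB's `newrbe` design) by solving the linear system for the output weights with naive Gauss-Jordan elimination, and implements the MARE, MRE, and correlation measures. The `spread` value and helper names are assumptions.

```python
import math

def radbas(n):
    # Equation (5): radial basis transfer function.
    return math.exp(-n * n)

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def fit_exact_rbf(X, y, spread=1.0):
    """One hidden neuron per training sample (centers = X); solve
    Phi @ w = y for the linear output weights by Gauss-Jordan elimination."""
    n = len(X)
    # Augmented matrix [Phi | y], Phi[i][j] = radbas(||x_i - c_j|| / spread).
    A = [[radbas(dist(X[i], X[j]) / spread) for j in range(n)] + [y[i]]
         for i in range(n)]
    for col in range(n):                 # elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(n):
            if r != col and A[r][col]:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    w = [A[i][n] / A[i][i] for i in range(n)]
    # The fitted oracle: a weighted sum of radial basis responses.
    return lambda x: sum(wi * radbas(dist(x, ci) / spread)
                         for wi, ci in zip(w, X))

def mare(actual, est):   # mean absolute relative error
    return sum(abs(e - a) / a for e, a in zip(est, actual)) / len(actual)

def mre(actual, est):    # mean relative error (signed: bias indicator)
    return sum((e - a) / a for e, a in zip(est, actual)) / len(actual)

def correlation(x, y):   # Pearson coefficient of correlation r
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

With one center per training sample, the network interpolates the 203 training cases exactly; its quality as an oracle is then judged by MARE, MRE, and r on the held-out validation cases, as the next section does.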
    MARE = (1/n) * sum_{i=1..n} |estimate_i - actual_i| / actual_i    (6)

where estimate is the network output for each observation and n is the number of observations. To estimate whether models are biased and tend to over- or under-estimate, the MRE is calculated as follows [13]:

    MRE = (1/n) * sum_{i=1..n} (estimate_i - actual_i) / actual_i    (7)

A large positive MRE would suggest that the model overestimates, whereas a large negative value indicates the reverse. Another measure used in our study is correlation, which is widely used with observational data, including software measurements. It is used to evaluate the relationship between the actual and predicted values of the dependent variable. The correlation of the predicted and the observed values is represented by the coefficient of correlation (r), given by [15]:

    r = sum (x - x_bar)(y - y_bar) / sqrt( sum (x - x_bar)^2 * sum (y - y_bar)^2 )    (8)

where x_bar and y_bar are the sample means of the two arrays. The validation results obtained using the various training algorithms, in terms of MRE, MARE and the correlation coefficient (r), are shown in Table 3.

Table 3. Validation Results using Various Training Algorithms
    Performance      Radial-Basis      Feed-Forward        Cascade-Forward
    Measure          Function          Backpropagation     Backpropagation
    MARE             0.137421          0.248468            0.473434
    MRE              0.044266          0.089356            0.24401
    Correlation (r)  0.908735          0.81053             0.226686

Figure 4 gives the simulation plot for the case study. The simulation was performed on 200 test cases (T1, T2, T3 and T4 combined). Test cases were chosen randomly, using a random number generator.

[Figure 4: plot of actual vs. predicted output with the radial basis network over the 200 validation test cases.]

Figure 4. Validation Set Results with Radial Basis Network

From the above figure it is evident that our proposed model produces outputs that are comparable to the measured outputs and provides better results than the other NN models.

5. CONCLUSIONS
In this paper an attempt has been made to explore the possibility of using an Artificial Neural Network for test case prediction. The Triangle Classification Problem (TRIYP) has been used as the case study. The Radial Basis Function Neural Network, which belongs to the supervised stream of neural networks, is demonstrated to be suitable for this purpose. Multiple training algorithms were used in the experiments. The performance of the proposed RBFNN model, in terms of MRE and MARE, was compared with the feed-forward backpropagation and cascade-forward backpropagation algorithms. The results of the proposed Radial Basis Function model, with MARE 0.137 and MRE 0.044, are the best among the compared models. It is therefore concluded that the RBF NN model yields better results as a test oracle.

6. REFERENCES
[1] Acree A. T., Budd T. A., DeMillo R. A., Lipton R. J., Sayward F. G. 1979. Mutation Analysis. School of Information and Computer Science, Georgia Institute of Technology, Atlanta, Tech. Rep. GIT-ICS-79/08.
[2] Anderson C., Von Mayrhauser A., Mraz R. 1995. On the Use of Neural Networks to Guide Software Testing Activities. Proceedings of the International Test Conference, 21-25 Oct. 1995, pp 720-729.
[3] Aggarwal K. K., Singh Yogesh, 2001. Software Engineering: Programs, Documentation, Operating Procedure. New Age International Publishers.
[4] Aggarwal K. K., Singh Yogesh, Kaur Arvinder, Sangwan O. P. 2004. A Neural Net Based Approach to Test Oracle. ACM SIGSOFT Software Engineering Notes, Vol. 29, No. 4 (May 2004).
[5] Aggarwal K. K., Singh Yogesh, Kaur Arvinder, 2005. Empirical Studies on Application of Neural Networks as Approximation Oracle. Journal of Computer Science 1(3): pp 341-345.
[6] Alippi C., Piuri V., Scotti F., 2001. Accuracy versus Complexity in RBF Neural Networks. IEEE Instrumentation and Measurement Magazine 4, pp 32-36.
[7] Beizer B., 1990. Software Testing Techniques. Van Nostrand Reinhold, New York.
[8] Clarke L. A., Richardson D. J., 1983. The Application of Error Sensitive Testing Strategies to Debugging. In Symposium on High-Level Debugging, ACM SIGSOFT/SIGPLAN, pp 45-52.
[9] Collofello J. S. 1988. Introduction to Software Verification and Validation. SEI-CM-13-1.1, Software Engineering Institute, Pittsburgh, PA, USA.
[10] DeMillo R. A., Lipton R. J., Sayward F. G. 1978. Hints on Test Data Selection: Help for the Practicing Programmer. IEEE Computer, Vol. C-11: pp 34-41.
[11] Doungsa-ard C., Dahal K., Hossain A., 2006. AI Based Framework for Automatic Test Data Generation. IWS06. http://eastwest.inf.brad.ac.uk/document/publication/Doungsa-ardIWS06%20.pdf
[12] Duda R. O., Hart P. E., Stork D. G., 2001. Pattern Classification. 2nd Edition, John Wiley and Sons.
[13] Finnie G. R., Wittig G. E., 1996. AI Tools for Software Development Effort Estimation. IEEE Transactions on Software Engineering, pp 346-353.
[14] Han J., Kamber M., 2001. Data Mining: Concepts and Techniques. Harcourt India Private Limited.
[15] Hopkins W. G. 2003. A New View of Statistics. Sport Science.
[16] Howden William E., Eichhorst Peter, 1978. Proving Properties of Programs from Program Traces. In Tutorial: Software Testing and Validation Techniques, E. Miller and W. E. Howden (eds.). New York: IEEE Computer Society Press.
[17] Karayiannis N. B., Mi G. W., 1997. Growing Radial Basis Neural Networks: Merging Supervised and Unsupervised Learning with Network Growth Techniques. IEEE Transactions on Neural Networks, 8, pp 1492-1506.
[18] Li Y. H., Qiang S., Zhuang X. Y., Kaynak O., 2004. Robust and Adaptive Backstepping Control for Nonlinear Systems Using RBF Neural Networks. IEEE Transactions on Neural Networks, 15, pp 693-701.
[19] Manna Zohar, Waldinger Richard, 1978. The Logic of Computer Programming. IEEE Transactions on Software Engineering SE-4(3): pp 199-229.
[20] Mayrhauser A. V., DeMillo A., Offutt A. J., 1991. Constraint-Based Automatic Test Data Generation. IEEE Transactions on Software Engineering SE-17, pp 900-910.
[21] Mayrhauser A. Von, Walls J., Mraz R. 1994. Testing Applications Using Domain Based Testing and Sleuth. Proceedings of the Fifth International Software Reliability Engineering Symposium, Monterey, November 1994, pp 206-215.
[22] Mayrhauser A. Von, Anderson C., Mraz R. 1995. Using a Neural Network to Predict Test Case Effectiveness. Proceedings of the IEEE Aerospace Applications Conference, Snowmass, CO.
[23] Nebut Clementine, Fleurey Franck, Le Traon Yves, Jezequel Jean-Marc, 2006. Automatic Test Generation: A Use Case Driven Approach. IEEE Transactions on Software Engineering, Vol. 32, No. 3, pp 140-155.
[24] Pressman R. S. 1977. Software Engineering: A Practitioner's Approach. McGraw Hill, New York.
[25] Rajan Ajitha, 2006. Automated Requirements-Based Test Case Generation. FSE 06, November 5-11, Portland, Oregon, USA.
[26] Ramamoorthy C. V., Ho S. F., Chen W. T. 1976. On the Automated Generation of Program Test Data. IEEE Transactions on Software Engineering, Vol. SE-2, pp 293-300.
[27] Roy A., Govil S., Miranda R., 1997. A Neural Network Learning Theory and a Polynomial Time RBF Algorithm. IEEE Transactions on Neural Networks, 8, pp 1301-1313.
[28] Rumelhart D. et al. 1986. Parallel Distributed Processing: Explorations in the Microstructure of Cognition. MIT Press, Cambridge, MA.
[29] Wang L. et al. 2004. Generating Test Cases from UML Activity Diagrams Based on the Gray-Box Method. In Proceedings of the 11th Asia-Pacific Software Engineering Conference.
[30] Ying Lu, Mao Ye, 2007. Oracle Model Based on RBF Neural Networks for Automated Software Testing. Information Technology Journal, 6(3), pp 469-474.