Prace Studentów Politechniki Wrocławskiej Nr 19, Konferencje, 2014
software testing, object-oriented metrics, classification, defect prediction
Ngoc Trung NGUYEN
AUTOMATIC PRIORITIZATION OF FUNCTIONAL TESTS
To deliver high-quality software to clients, it must be tested thoroughly. However, this process is often time-consuming and laborious. This paper addresses the problem by constructing a defect prediction model based on object-oriented class metrics.
1. INTRODUCTION

Nowadays software testing plays a very important role in the software production process. Companies that provide applications to clients make every effort to ensure the highest quality of their products and to maximize the comfort of potential users. However, what does the term "quality" exactly mean? Clients often say that a product has "good quality" or "bad quality", but such statements are not precise. The term "quality" is defined as "compliance with specification" or "suitability for use" [8]. Every deviation from the specification is treated as a defect, and in the field of software engineering quality is defined as a product without defects [8]. The role of testers is hence to find as many defects in the application as possible so that they can be resolved. However, it often happens that after a developer makes even a small change in one module of a program, a tester has to re-check the entire application. This is very time-consuming and laborious, and as a result application testing is very expensive for the company.

This article presents a data mining approach to automatic prioritization of functional tests. The objective of this research is to construct a classification model based on the Chidamber & Kemerer (CK), QMOOD, Henderson-Sellers, Martin's, Tang's and McCabe's metrics that would help a tester decide which of the components should be tested [6]. There are many articles about defect prediction, but this paper focuses only on prediction based on object-oriented class metrics. For similar research, the reader is referred to, inter alia, [6] and [9]. The first article constructs a prediction model using ckjm, and the second presents the influence of CK metrics on defects.
2. EXPERIMENT DESIGN

This research was divided into four stages: the data collection process, creating the final version of the dataset, feature selection, and creating the classification model. Section 2.1 describes the source of the data, how they were collected, and which independent variables were taken into consideration. Section 2.2 covers creating the final version of the dataset used in the classification process. Feature selection is described in Section 2.3 and the construction of the prediction model in Section 2.4.

2.1. DATA COLLECTION PROCESS
The object of this study was the Camel project, which is available on the Apache website and developed in Java [11]. The advantages of Apache Camel are the large number of bugs reported by users and the fact that it consists of many components. For each component of the Apache Camel project two kinds of data were collected:
·data related to its defects reported by users via Atlassian JIRA,
·data related to class metrics.

In the first stage of the data collection process, information about bugs in each component of Apache Camel was analyzed using the Atlassian JIRA system. When a user wants to report a new defect in this system, he has to create a new issue. Five kinds of issues are distinguished: Bug, Improvement, New Feature, Task and Custom issue [10]. In this research only the Bug type was taken into consideration. Every issue can be assigned one of five priorities: Blocker, Critical, Major, Minor and Trivial [10]. The main objective of this stage was to collect the number of defects of each priority that appeared in each component. We looked for information about bugs in the early phase of each component's development, because a lot of defects usually occur in that period. To find those data in Atlassian JIRA, a period of time in which the issues were created had to be specified by setting a start date and an end date. The components appeared in various phases of the project's development, so the search periods were set separately for each component. The start date was always constant, 02/07/2007, the release date of version 1.0.0 of Apache Camel. The end date was the day before the release date of one of the following versions: 1.0.0, 1.3.0, 2.0.0, 2.5.0, 2.7.0, 2.9.0, 2.10.0, 2.11.0, 2.12.3. To capture defects from the early phase of a component's development, the earliest possible version from the list above was chosen for the end date, depending on when the component appeared in the project.

In addition to the data collected as described above, version 2.13.0 of the project was also analyzed to find the number of defects of each priority in each component. In this case the data were searched for all components, and only bug issues with status Unresolved were taken into consideration [10]. The period of time was set to 02/07/2007 - 21/03/2014.

The objective of the second stage of the data collection process, in turn, was to obtain the independent variables. To do this, the source files of each component in the appropriate version of the project were analyzed. The ckjm program was used to compute the CK, QMOOD, Henderson-Sellers, Martin's, Tang's and McCabe's metrics for the source code of each component [6]. A component usually consists of many classes, so for each metric in a component the following statistics were computed: average (AVG), maximum (MAX) and standard deviation (STDEV). These statistics are treated as the independent variables. Additionally, two others were added: "Number Of Classes", which specifies how many classes each component contains, and "Duration Of Project", which contains the number of days in the period used to search for defect information in Atlassian JIRA.

2.2. CREATING A FINAL DATASET
Having collected the data as described above, it had to be decided what the final dataset would look like. Each independent variable described in the second stage of Section 2.1 is treated as a feature (59 in total). There are 195 instances which store the values of those features. Apart from the independent variables, a dependent variable is required that indicates whether an instance is classified as a "good" or a "bad" component. This variable, called "IsGood", depends on the number of bugs of each priority, described in Section 2.1, for each component. For a component not to be described as "bad", the following conditions were set:
·Blocker – no issues with this priority may appear in any of the classes,
·Critical – no issues with this priority may appear in any of the classes,
·Major – the number of issues with this priority may amount to at most 10% of the number of classes,
·Minor – the number of issues with this priority may amount to at most 30% of the number of classes,
·Trivial – the number of issues with this priority may amount to at most 50% of the number of classes.
Taking those conditions into consideration, formulas (1) and (2) were created to define which type of component an instance represents:
s = 1 - (1/N) * (2N*Bl + N*Cr + 10*Ma + (10/3)*Mi + 2*Tr)    (1)

f(s) = 1 / (1 + e^(-a*s))    (2)
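As an illustration, formulas (1) and (2) can be sketched in Python. Note that the steepness coefficient a of the logistic function is not specified in the paper, so a = 5 below is an assumption made only for this example.

```python
import math

def score(n_classes, bl, cr, ma, mi, tr):
    """Formula (1): defect score of a component.

    n_classes is N; bl, cr, ma, mi, tr are the numbers of bugs with
    Blocker, Critical, Major, Minor and Trivial priority.
    """
    penalty = (2 * n_classes * bl + n_classes * cr
               + 10 * ma + (10 / 3) * mi + 2 * tr)
    return 1 - penalty / n_classes

def f(s, a=5.0):
    """Formula (2): logistic squashing of the score; the value of a is assumed."""
    return 1 / (1 + math.exp(-a * s))

def is_good(n_classes, bl, cr, ma, mi, tr):
    """IsGood label: 1 if f(s) exceeds the 0.8 threshold used in the paper."""
    return 1 if f(score(n_classes, bl, cr, ma, mi, tr)) > 0.8 else 0
```

For example, a defect-free component gives s = 1 and f(s) ≈ 0.99 with a = 5 (label "good"), while a single Blocker bug gives s = -1 and the label "bad".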
N is the number of classes in the component, Bl – the number of bugs with Blocker priority, Cr – with Critical priority, Ma – with Major priority, Mi – with Minor priority, Tr – with Trivial priority. The fewer defects a component contains, the higher the value received in (2). In the ideal case, when no bugs appear, the value of s equals 1 and f(s) takes its largest possible value, approximately 1. It is worth noticing that the coefficients of the factors Bl, Cr, Ma, Mi and Tr are the inverses of the percentages included in the conditions above. Thanks to that, if a limit is exceeded, the value of s becomes negative, which makes the value of f(s) lower. It was assumed that if f(s) > 0.8, an instance represents a good component (IsGood = 1), and a bad one otherwise (IsGood = 0). After applying formulas (1) and (2) the final dataset was created. It consists of 170 instances representing "good" components and 25 instances representing "bad" components.

2.3. FEATURE SELECTION
Before building the classification model, a new subset of attributes was created. Feature selection was applied because classification accuracy with a reduced feature set is better than with the full set [7]. The feature selection process requires two elements: an evaluator, which assesses a subset of features, and a search method. The J48 decision tree, which implements the C4.5 algorithm, was chosen as the evaluator, while Best First Search was set as the search method [1][5]. As a result, 5 features were selected: MAX(AMC), AVG(DIT), AVG(Ca), STDEV(MFA) and Duration Of Project. Only those 5 of the 59 features are used in the classification model.

2.4. PREDICTION MODEL CONSTRUCTION
To construct the classification model, AdaBoost with the J48 decision tree as the base learner was used. J48 is the WEKA implementation of the C4.5 algorithm and has important advantages for this research [1]:
·it is easy to understand,
·it is able to work with large amounts of data and with missing values,
·it has high performance.
To improve the classification results, the AdaBoost algorithm was applied; according to other research it can improve accuracy [4].
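The paper uses WEKA's AdaBoost with J48; as a language-neutral illustration of the boosting idea, the sketch below implements the core AdaBoost weight-update loop with one-level decision stumps instead of full C4.5 trees. The stump learner is an assumption made only to keep the example self-contained.

```python
import math

def stump_train(X, y, w):
    """Find the (feature, threshold, polarity) stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for polarity in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (polarity if xi[j] >= t else -polarity) != yi)
                if err < best_err:
                    best_err, best = err, (j, t, polarity)
    return best, best_err

def stump_predict(stump, x):
    j, t, polarity = stump
    return polarity if x[j] >= t else -polarity

def adaboost_train(X, y, rounds=10):
    """Discrete AdaBoost with labels in {-1, +1}: reweight misclassified
    instances each round so later learners focus on the hard cases."""
    n = len(X)
    w = [1 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump, err = stump_train(X, y, w)
        err = max(err, 1e-10)
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weights of misclassified instances, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, x):
    return 1 if sum(a * stump_predict(s, x) for a, s in ensemble) >= 0 else -1
```

This is only a sketch of the weight-update mechanism; the model in the paper boosts full J48 trees inside WEKA rather than stumps.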
3. RESULTS

To examine the prediction model, 10-fold cross-validation was applied to the dataset with the selected subset of attributes. The results are presented in Table 1.

Table 1. Values of selected parameters

Name                               Value
Correctly identified instances     85.641%
Incorrectly identified instances   14.359%
TP Rate [2]                        0.856
FP Rate [2]                        0.669
Precision [2]                      0.831
Recall [2]                         0.856
F-Measure [2]                      0.841
ROC Area [2]                       0.744
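Cross-validation in WEKA stratifies the folds by class. A minimal sketch of such a stratified fold assignment (the split only, with the classifier itself omitted) could look like this; the function name and seed handling are illustrative assumptions.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=10, seed=0):
    """Assign each instance index to one of k folds so that every fold
    keeps roughly the overall class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin into the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds
```

For the dataset in this paper (170 "good" and 25 "bad" instances), each of the 10 folds would receive exactly 17 "good" and 2-3 "bad" instances.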
As shown in Table 1, 85.641% of instances (167 of 195) were identified correctly and 14.359% (28 of 195) incorrectly. The values of those parameters are satisfactory. An important element of prediction model assessment is the ROC curve. Its main advantage is that it is resistant to imbalanced data [3]. By computing the area under this curve (AUC), the quality of a classifier can be evaluated. For a random classifier, AUC equals 0.5; the closer AUC is to 1, the better the results achieved by the prediction model. In this research AUC equals 0.744, which is a satisfactory result.
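The AUC has a simple probabilistic reading: it is the probability that a randomly chosen positive instance is ranked above a randomly chosen negative one [3]. A short stdlib sketch of this pairwise definition (counting ties as half):

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    ties count as half. labels: 1 = positive, 0 = negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]) returns 0.75, since 3 of the 4 positive-negative pairs are ranked correctly, while a random scorer hovers around 0.5.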
4. THREATS TO VALIDITY

During the research, some limitations appeared that could have influenced the final results. They are listed below:
·The dataset is small. It contains only 195 instances, which makes it difficult to achieve better classification results.
·The dataset is imbalanced. It consists of 170 instances describing "good" components and 25 describing "bad" ones. Because of this, the classifier may have problems identifying which components are truly "bad".
·The constructed classification model was not tested on another real programming project. It is possible that the defect prediction may not work properly on other projects.
5. CONCLUSIONS AND FUTURE WORK

This paper presented research based on data about bugs and object-oriented class metrics, whose objective was to build a defect prediction model. The results of the classifier are satisfactory, which makes it possible to develop the research further, but the limitations described in Section 4 have to be taken into consideration. The fairly high value of AUC showed that there is a relation between object-oriented metrics and software defects. This makes it possible to create a plugin for the Atlassian JIRA system in the future that would indicate whether a specified component is "good" or "bad" based on training data provided by the user. Thanks to that, software testers would know which components should be tested thoroughly and which of them could be skipped in the testing process.

REFERENCES

[1] N. Bhargava, G. Sharma, R. Bhargava, M. Mathuria, Decision Tree Analysis On J48 Algorithm for Data Mining, International Journal of Advanced Research in Computer Science and Software Engineering, pp. 1114-1119, 2013
[2] R. R. Bouckaert, E. Frank, M. Hall, R. Kirkby, P. Reutemann, A. Seewald, D. Scuse, WEKA Manual for version 3-6-10, pp. 21-22, 2013
[3] T. Fawcett, An introduction to ROC analysis, Pattern Recognition Letters, pp. 861-874, 2006
[4] J. Gholap, Performance Tuning Of J48 Algorithm For Prediction Of Soil Fertility, Asian Journal of Computer Science And Information Technology, vol. 2, no. 8, pp. 251-252, 2012
[5] M. Hall, A. Smith, Practical Feature Subset Selection for Machine Learning, pp. 1-11
[6] M. Jureczko, D. Spinellis, Using object-oriented design metrics to predict software defects, in: Proceedings of the Fifth International Conference on Dependability of Computer Systems, Monographs of System Dependability, Oficyna Wydawnicza Politechniki Wrocławskiej, Wrocław, Poland, 2010, pp. 69-81
[7] A.G.K. Janecek, W.N. Gansterer, M.A. Demel, G.F. Ecker, On the Relationship Between Feature Selection and Classification Accuracy, Workshop and Conference Proceedings 4, pp. 90-105, 2008
[8] S.H. Kan, Metrics and Models in Software Quality Engineering, Second Edition, Pearson Education, Inc., Boston 2003, pp. 336-385
[9] R. Subramanyam, M.S. Krishnan, Empirical Analysis of CK Metrics for Object-Oriented Design Complexity: Implications for Software Defects, IEEE Transactions on Software Engineering, vol. 29, no. 4, pp. 297-309, April 2003
[10] https://confluence.atlassian.com/display/JIRA/JIRA+User%27s+Guide. Last accessed on June 14, 2014
[11] https://camel.apache.org. Last accessed on June 14, 2014