International Journal of Innovations & Advancement in Computer Science IJIACS ISSN 2347 – 8616 Volume 4, Issue 1 January 2015
Associative Classifier for Software Fault Tolerance in Presence of Class Imbalance

Vijaya Bharathi Manjeti, GMRIT Institute of Technology, Rajam, Srikakulam
Sireesha Rodda, GITAM Institute of Technology, GITAM University, Visakhapatnam
ABSTRACT
Software fault prediction is crucial for reducing the overall cost of developing a software product and for assuring the quality of the finished product. Different software quality models based on data mining techniques exist for identifying fault-prone software modules. However, the presence of the class imbalance problem reduces the overall quality of the developed software product. This paper addresses the effects of class imbalance on classification algorithms intended to perform software fault prediction. An ensemble-based classifier is proposed to mitigate the effects of class imbalance. As demonstrated in the results, this classifier learns defect prediction efficiently.

Keywords: Software defect prediction, class imbalance learning, ensemble classifiers.

1. INTRODUCTION
The presence of software faults can turn out to be expensive during software development in terms of quality and cost [1]. The conventional process of manual software reviews and testing activities can detect only 60% of faults [2]. Menzies et al. [3] found that defect predictors can increase the probability of detection to 71%. Various machine learning and statistical approaches have been investigated for Software Defect Prediction. Classification is a popular option for performing software defect prediction: the classification algorithm predicts which modules are more prone to defects, based on a classifier built from existing data culled from previous development projects.
Association Mining (AM) [4] refers to the task of finding the complete set of frequent itemsets from which class association rules are generated, based on their association with the pertinent class labels. Associative Classification [5] treats the set of features as itemsets and applies Association Mining techniques to discover the set of frequent itemsets that occur in the training dataset, based on a user-specified minimum support threshold. An associative classifier uses the Class Association Rules (CARs) generated by Association Mining to predict the class label of an unseen instance. Once the classification model is built using CARs, it is evaluated on the test data. Associative classifiers have been shown to perform better than other classifiers, and the rules they generate are understandable to a human user.
Software defect prediction features an imbalance between the defect and non-defect class labels of the dataset. Generally, the number of non-defect samples (majority class) is much larger than that of defective samples (minority class). The imbalanced distribution of data contributes to the poor performance of the classifier, negatively affecting the classification of defective samples. Arunasalam and Chawla [6] show that accuracy is not a suitable metric for evaluating the efficiency of a classifier, particularly on imbalanced data, and that the support and confidence framework is biased towards the majority class. The presence of class imbalance in software defect prediction demands that more importance be given to the identification of minority class elements, even at the cost of accuracy. Therefore, specialized
techniques that are custom-made for imbalanced data must be used. This paper uses a Partition-based Associative Classification technique for handling imbalanced datasets.
The rest of the paper is organized as follows. Section 2 provides a brief review of recent developments in software defect prediction for imbalanced datasets. Section 3 discusses the methodology and algorithm of the Partition-based Associative Classifier. Section 4 discusses the evaluation metrics used for comparing the performance of various classifiers on imbalanced datasets. Section 5 presents and analyzes the results obtained, comparing them with the performance of other classifiers. Section 6 presents conclusions.

2. RELATED WORK
Existing work on Software Defect Prediction (SDP) has been based on classification algorithms such as Naïve Bayes [7], Decision Trees [8], Random Forest [9], AdaBoost [10], and Neural Networks [11]. Naïve Bayes is the simplest form of Bayesian classification. Bayesian classification develops probabilistic models that best fit the training data; the networks learn to approximate the dependency patterns in the data using probabilities. In software defect prediction with imbalanced data, the dependency patterns in the rare class are not significant and are usually insufficient to be encoded into the networks. Hence, small classes are often misclassified by Bayesian classification. A decision tree uses a tree-like data structure in which the non-leaf nodes are labeled with attributes, the arcs out of a node are labeled with the possible values of that node's attribute, and the leaf nodes are labeled with the classes, indicating whether the current module is fault-prone or not. In the presence of imbalanced data, pruning might remove the branches predicting the minority class, whose leaves might then be relabeled to the majority class. Pruning is based on predicted error.
To reduce the error rate, pruning might remove the branches leading to the minority class. The stopping criterion also might not allow the decision tree to grow until the minority
class instances are detected. Hence, decision trees cannot handle imbalanced datasets successfully. Other conventional classifiers that maximize accuracy or minimize the error rate likewise ignore the classification rules pertaining to the minority class.
It has been observed that the imbalanced distribution between fault-prone and non-faulty modules can degrade a classifier's performance, and some researchers have applied class imbalance learning approaches to alleviate this effect. Menzies et al. [12] used undersampling to reduce the number of non-faulty modules to the number of fault-prone modules. Ensemble-based algorithms and cost-sensitive learning algorithms have also been proposed to alleviate the effect of class imbalance on SDP [13, 14]. This paper investigates an ensemble-based Associative Classifier for handling class imbalance; the methodology is presented in the next section.

3. METHODOLOGY
This paper uses a Partition-based Associative Classification framework for performing SDP on imbalanced datasets. The dataset is divided into two partitions based on the class label: a Majority Partition (non-faulty modules) and a Minority Partition (fault-prone modules). There is no need to represent the class attribute in either partition. Locally frequent itemsets are then generated using any Frequent Itemset Mining algorithm; in this paper, the Apriori algorithm was used. The minimum support threshold is specified by the user. As the majority and minority samples are considered independently, all locally significant itemsets that pass the minimum support threshold are generated. This results in the frequent itemsets of the majority partition and the frequent itemsets of the minority partition. The frequent itemsets directly represent CARs, as the right-hand side of a CAR is simply the label of the partition to which the frequent itemset belongs.
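As a concrete illustration, the partitioning and per-partition itemset mining described above can be sketched in Python. This is a minimal sketch, not the authors' implementation: the transactions and item names are made up, brute-force enumeration stands in for Apriori, and the Strength Score of Section 3.1 is assumed to be the support-to-CCS ratio floored by the constant t, following [6].

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(partition, min_sup, max_len=2):
    """Return {itemset: count} for itemsets whose local support
    (fraction of the partition's transactions containing them) >= min_sup."""
    counts = Counter()
    for trans in partition:
        items = sorted(trans)
        for r in range(1, max_len + 1):
            for combo in combinations(items, r):
                counts[frozenset(combo)] += 1
    n = len(partition)
    return {iset: c for iset, c in counts.items() if c / n >= min_sup}

def count_in(partition, itemset):
    """Number of transactions in a partition that contain the itemset."""
    return sum(1 for trans in partition if itemset <= trans)

# Toy transactions: items are hypothetical discretized attribute values,
# P_maj = non-faulty modules, P_min = fault-prone modules.
P_maj = [{"loc=low", "cc=low"}, {"loc=low", "cc=med"}, {"loc=low", "cc=low"}]
P_min = [{"loc=high", "cc=high"}, {"loc=high", "cc=med"}]

t = 0.01  # small constant, as in Eq.(4)
for iset, c in frequent_itemsets(P_min, min_sup=0.5).items():
    supp = c / len(P_min)                      # class support, Eq.(1)
    ccs = count_in(P_maj, iset) / len(P_maj)   # complement class support, Eq.(2)
    ss = supp / max(ccs, t)                    # strength score, Eq.(4) (assumed form)
    print(sorted(iset), supp, round(ccs, 3), round(ss, 2))
```

Each frequent itemset of P_min directly forms a CAR whose right-hand side is the minority label; a real implementation would use a proper Apriori pruning step rather than full enumeration.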
In the classification phase, the number of occurrences of each frequent itemset in the partition other than the one it was generated from is calculated. Using this information, the Complement Class Support [6], Confidence, and Strength Score [6] of every rule are computed. While classifying a test instance, all the rules applicable from both the majority class and the minority class are found. The ratio of the majority class size to the minority class size is captured in a constant k, from which a Scoring Function [6] value is calculated; if this value is greater than a user-specified threshold, the test instance is assigned to the minority class, and otherwise to the majority class. The details of the algorithm, in terms of a Learning Phase and a Classification Phase, are given below.

3.1 Algorithm

Learning Phase:
1. The majority (or negative) and minority (or positive) class labels of the dataset are identified based on their frequency of occurrence.
2. The training dataset is divided into two partitions: Pmaj (the training instances belonging to the majority class) and Pmin (the training instances belonging to the minority class). Within each partition, every training instance is represented as a transaction after removing its class label.
3. Locally frequent itemsets are generated for each partition using the Apriori algorithm. Let Ai be an itemset belonging to the partition with class label Cj. The Class Support of Ai is calculated as:

Supp(Ai → Cj) = count(Ai, Pj) / |Pj|    Eq.(1)

where count(Ai, Pj) is the number of transactions in partition Pj that contain Ai. The local support of an itemset is thus the fraction of instances in that partition containing the itemset. If the support of an itemset exceeds the user-defined threshold, it is considered frequent.
4. Once frequent itemsets are identified, generating CARs is straightforward: the right-hand side of each CAR is the class label of the partition currently being used, and its left-hand side is the locally frequent itemset.
5. Load the partition Pmin into main memory. For each frequent itemset in Pmaj, find its support count in Pmin; from this, the global frequency of the itemset in the training dataset is obtained. These values yield the Confidence (Conf), Complement Class Support (CCS), and Strength Score (SS) of each rule:

CCS(Ai → Cj) = count(Ai, P̄j) / |P̄j|    Eq.(2)

Conf(Ai → Cj) = count(Ai, Pj) / (count(Ai, Pj) + count(Ai, P̄j))    Eq.(3)

SS(Ai → Cj) = Supp(Ai → Cj) / max(CCS(Ai → Cj), t), where t = 0.01    Eq.(4)

Here P̄j denotes the complement partition (the partition other than Pj), and t is a small constant that prevents division by zero when CCS is zero. The Strength Score represents the accuracy with which Ai indicates membership in Cj.
6. Repeat step 5 after loading partition Pmaj, to find the global counts of the frequent itemsets of Pmin.

Classification Phase:
7. For every test instance, find the set of applicable CARs from the majority (negative) class and the set of applicable CARs from the minority (positive) class, and calculate the Scoring Function [6]:

S = k · ΣSSmin / (k · ΣSSmin + ΣSSmaj)    Eq.(5)

where ΣSSmin and ΣSSmaj are the sums of the Strength Scores of the applicable minority-class and majority-class rules, respectively, and k is the ratio of the majority class size to the minority class size: if this ratio exceeds 1, its value is substituted for k, otherwise k = 1. S ∈ [0, 1], and an S value tending to one suggests the minority class. If the S value of the test instance is greater than a user-specified cutoff, the minority class label is assigned to the instance; otherwise, the majority class label is assigned.

4. PERFORMANCE METRICS
While learning from an extremely imbalanced dataset, overall accuracy is not an appropriate measure of performance: a classifier that predicts every test instance as the majority class can still achieve high accuracy. [6, 15] show that accuracy is not a proper metric for evaluating classifiers on imbalanced datasets, while [16, 17] discuss Precision, Recall, and F-measure as the metrics commonly used to evaluate classification models for imbalanced data. Hence, the proposed classifier's performance is analyzed using Classification Accuracy, Precision, Recall, True Positive Rate, False
Positive Rate, and F-measure, which are defined in terms of the entries of the confusion matrix shown in Table I. The rows of the confusion matrix correspond to actual classes, while the columns correspond to predicted classes. In the test set, let P denote the number of test instances belonging to the positive class and N the number of test instances belonging to the negative class. According to Table I, the performance metrics are defined as follows:

Accuracy = (TP + TN) / (P + N)    Eq.(6)

True Positive Rate (TPR) = TP / (TP + FN)    Eq.(7)

False Positive Rate (FPR) = FP / (FP + TN)    Eq.(8)

Precision = TP / (TP + FP)    Eq.(9)

Recall = TP / (TP + FN)    Eq.(10)

F-measure = (2 × Precision × Recall) / (Precision + Recall)    Eq.(11)

5. RESULTS
In this section, the performance of different classifiers is evaluated on datasets obtained from the publicly available PROMISE repository, which collates data from practical projects [18]. Three datasets were selected from those available in the repository for this study. The characteristics of the datasets under consideration are presented in Table 1.

Dataset | Language | Examples | Attributes | % Imbalance
jm1     | C        | 10885    | 21         | 19.35
cm1     | C        | 498      | 21         | 9.83
pc1     | C        | 1109     | 21         | 6.94
Table 1: PROMISE Datasets
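For illustration, the constant k used by the Scoring Function of Section 3 can be derived from the % Imbalance column above. The helper below is hypothetical (not from the paper), and assumes k is the majority-to-minority size ratio, floored at 1 as the algorithm description requires.

```python
# Hypothetical helper: derive the constant k of the Scoring Function,
# assuming k = |majority| / |minority|, floored at 1.
def k_from_imbalance(pct_minority: float) -> float:
    """pct_minority: percentage of minority-class (defective) samples."""
    ratio = (100.0 - pct_minority) / pct_minority
    return max(1.0, ratio)

# % Imbalance values from Table 1.
for name, pct in [("jm1", 19.35), ("cm1", 9.83), ("pc1", 6.94)]:
    print(f"{name}: k = {k_from_imbalance(pct):.2f}")
```

Under this assumption, the more imbalanced the dataset (e.g. pc1 at 6.94%), the larger k becomes, and the more the minority-class rules are boosted during scoring.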
Each sample in the datasets describes the attributes of one module or method, with a class label stating whether the module is fault-prone or not. The non-class attributes include information such as McCabe metrics, Halstead metrics, lines of code, and other measures. As most classifiers deal with discrete or categorical values, the three datasets have been discretized using the WEKA package. The missing values in the jm1 dataset have
been handled using the Replace Missing Values option in the WEKA package. The default options of both preprocessing procedures were retained. The classifiers used for comparison include Naïve Bayes, AdaBoost, PART, ID3, J48 (C4.5), CBA, and the proposed Partition-based Associative Classifier (Partition). The performance of these classifiers is evaluated on the three classification datasets using the metrics Precision, Recall, True Positive Rate, False Positive Rate, Accuracy, and F-measure. For the sake of comparison, a minimum support threshold of 0.01 is used for both the Partition-based Associative Classifier and CBA.
The results of the different classifiers on the jm1, cm1, and pc1 datasets are presented in Tables II, III, and IV respectively. As the jm1 dataset is not very imbalanced, most of the considered classifiers (except CBA) classify the minority class samples (fault-prone/defective modules) efficiently. However, the cm1 and pc1 datasets are both imbalanced, containing only a small portion of minority class samples, as shown in Table 1. On the cm1 dataset, all algorithms except the proposed Partition algorithm show sub-optimal performance in classifying defective modules, as shown in Table III. The pc1 dataset is the most imbalanced of the three. The results in Table IV show that, although the Partition-based Associative Classifier does not return the best overall accuracy, it returns the best values for Precision, Recall, and F-measure. Hence the Partition-based Associative Classifier performs well on pc1, an imbalanced dataset. Its performance matches the other classifiers on the jm1 dataset, in which the class distribution is roughly balanced. In the case of cm1, the partition-based
associative classifier outperforms all the other classifiers considered.

6. CONCLUSIONS
A Partition-based Associative Classifier is presented in this paper. Unlike most classifiers, which assume approximately balanced class distributions, this classifier is specifically designed to predict software defects in development projects in the presence of class imbalance. Its performance is compared with that of other classifiers with respect to six performance metrics. Results show that the proposed classifier performs better than the other classifiers when the dataset is skewed, and shows comparable performance when the dataset is balanced.

7. REFERENCES
[1] Jones, C., & Bonsignour, O. (2012). The Economics of Software Quality. Pearson Education, Inc.
[2] Shull, F., Basili, V., Boehm, B., Brown, A. W., Costa, P., Lindvall, M., … Zelkowitz, M. (2002). What we have learned about fighting defects. In Proceedings of the Eighth IEEE Symposium on Software Metrics (pp. 249–258).
[3] Menzies, T., Milton, Z., Turhan, B., Cukic, B., Jiang, Y., & Bener, A. (2010). Defect prediction from static code features: current results, limitations, new approaches. Automated Software Engineering, 17(4), 375–407.
[4] Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. In Proceedings of VLDB '94, Santiago, Chile, September 1994.
[5] Janssens, D., Wets, G., Brijs, T., & Vanhoof, K. (2003). Integrating classification and association rules by proposing adaptations to CBA. In Proceedings of the 10th International Conference on Recent Advances in Retailing and Services Science, Portland, Oregon.
[6] Arunasalam, B., & Chawla, S. (2006). Parameter-free classification for imbalanced data scoring using complement class support. School of Information Technologies, The University of Sydney, Sydney, N.S.W.
[7] Menzies, T., Greenwald, J., & Frank, A. (2007). Data mining static code attributes to learn defect predictors. IEEE Transactions on Software Engineering, 33(1), 2–13, Jan. 2007.
[8] Khoshgoftaar, T. M., Seliya, N., & Gao, K. (2005). Assessment of a new three-group software quality classification technique: An empirical case study. Empirical Software Engineering, 10(2), 183–218.
[9] Guo, L., Ma, Y., Cukic, B., & Singh, H. (2004). Robust prediction of fault-proneness by random forests. In Proceedings of the 15th International Symposium on Software Reliability Engineering (ISSRE 2004) (pp. 417–428). doi:10.1109/ISSRE.2004.35
[10] Zheng, J. (2010). Cost-sensitive boosting neural networks for software defect prediction. Expert Systems with Applications, 37(6), 4437–4543.
[11] Singh, M., & Salaria, D. S. (2013). Software defect prediction tool based on neural network. International Journal of Computer Applications, 70(1), May 2013.
[12] Menzies, T., Turhan, B., Bener, A., Gay, G., Cukic, B., & Jiang, Y. (2008). Implications of ceiling effects in defect predictors. In Proceedings of the 4th International Workshop on Predictor Models in Software Engineering (PROMISE '08) (pp. 47–54).
[13] Seiffert, C., Khoshgoftaar, T. M., & Van Hulse, J. (2009). Improving software-quality predictions with data sampling and boosting. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 39(6), 1283–1294.
[14] Wang, S., & Yao, X. (2013). Using class imbalance learning for software defect prediction. IEEE Transactions on Reliability, 62(2), 434–443.
[15] Weng, C. G., & Poon, J. A new evaluation measure for imbalanced datasets. Conferences in Research and Practice in Information Technology, 87, 27–32.
[16] Chawla, N. V. (2003). C4.5 and imbalanced data sets: investigating the effect of sampling method, probabilistic estimate and tree structure. In Proceedings of the ICML '03 Workshop on Class Imbalances.
[17] Qiong Gu, Zhihua Cai, Li Ziu, Classification of Imbalanced Data Sets by Using the Hybrid Re-sampling Algorithm Based on Isomap, In LNCS, Advances in Computation and Intelligence, vol. 5821, pp:287-296, 2009
[18] G. Boetticher, T. Menzies, T. J. Ostrand, (2007) Promise repository of empirical software engineering data. [Online]. Available: http://promisedata.org/repository
                      | Predicted Positive Class | Predicted Negative Class
Actual Positive Class | TP (True Positive)       | FN (False Negative)
Actual Negative Class | FP (False Positive)      | TN (True Negative)
Table I: Confusion Matrix
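As an illustration of how the confusion-matrix entries combine, the metrics of Eqs. (6)–(11) can be computed with a short Python sketch. The counts below are made-up examples, not values from the paper's experiments.

```python
def metrics(tp, fn, fp, tn):
    """Compute the six performance metrics from confusion-matrix entries."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)    # Eq.(6): P = tp+fn, N = fp+tn
    tpr = tp / (tp + fn)                          # Eq.(7)
    fpr = fp / (fp + tn)                          # Eq.(8)
    precision = tp / (tp + fp)                    # Eq.(9)
    recall = tpr                                  # Eq.(10): Recall equals TPR
    f_measure = 2 * precision * recall / (precision + recall)  # Eq.(11)
    return accuracy, tpr, fpr, precision, recall, f_measure

# Example: 8 of 10 positives found, 5 false alarms among 90 negatives.
print(metrics(tp=8, fn=2, fp=5, tn=85))
```

Note how a degenerate classifier with tp = 0 still scores high accuracy on a skewed test set while Precision, Recall, and F-measure all collapse to zero, which is why the tables below report all six metrics.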
Classifier  | TPR    | FPR   | Precision | Recall | F-measure | Accuracy %
Partition   | 1      | 0.085 | 0.8518    | 1      | 0.92      | 94.2857
Naïve Bayes | 1      | 0.085 | 0.852     | 1      | 0.92      | 94.2857
AdaBoost    | 1      | 0.085 | 0.852     | 1      | 0.92      | 94.2857
PART        | 1      | 0.085 | 0.852     | 1      | 0.92      | 94.2857
ID3         | 1      | 0.085 | 0.852     | 1      | 0.92      | 94.2857
J48 (C4.5)  | 1      | 0.085 | 0.852     | 1      | 0.92      | 94.2857
CBA         | 0.9583 | 0.045 | 0.92      | 0.95   | 0.938     | 94.2028
Table II: Performance Comparison for jm1 Dataset
Classifier  | TPR    | FPR    | Precision | Recall | F-measure | Accuracy %
Partition   | 0.4893 | 0.0209 | 0.7931    | 0.4893 | 0.6052    | 91.0179
Naïve Bayes | 0.191  | 0.024  | 0.563     | 0.191  | 0.286     | 86.53
AdaBoost    | 0      | 0      | 0         | 0      | 0         | 85.9281
PART        | 0.298  | 0.045  | 0.519     | 0.298  | 0.378     | 86.2275
ID3         | 0.261  | 0.061  | 0.414     | 0.261  | 0.32      | 81.4371
J48 (C4.5)  | 0.191  | 0.024  | 0.563     | 0.191  | 0.286     | 86.5269
CBA         | 0.0667 | 0.069  | 0.6       | 0.0667 | 0.12      | 86.486
Table III: Performance Comparison for cm1 Dataset
Classifier  | TPR | FPR   | Precision | Recall | F-measure | Accuracy %
Partition   | 0.5 | 0.065 | 0.25      | 0.5    | 0.3333    | 91.6667
Naïve Bayes | 0   | 0.065 | 0         | 0      | 0         | 89.5833
AdaBoost    | 0   | 0.065 | 0         | 0      | 0         | 89.5833
PART        | 0   | 0     | 0         | 0      | 0         | 95.8333
ID3         | 0   | 0     | 0         | 0      | 0         | 95.8333
J48 (C4.5)  | 0   | 0     | 0         | 0      | 0         | 95.8333
CBA         | 0   | 0     | 0         | 0      | 0         | 76.5957
Table IV: Performance Comparison for pc1 Dataset