2011 5th Malaysian Conference in Software Engineering (MySEC)
Efficient Prediction of Software Fault Proneness Modules Using Support Vector Machines and Probabilistic Neural Networks

Hamdi A. Al-Jamimi
King Fahd University of Petroleum and Minerals
Information and Computer Science Department
Dhahran, Saudi Arabia
[email protected]

Lahouari Ghouti
King Fahd University of Petroleum and Minerals
Information and Computer Science Department
Dhahran, Saudi Arabia
[email protected]
Abstract— A software fault is a defect that causes software failure in an executable product. Fault prediction models usually aim to predict either the probability or the density of faults that the code units contain. Many fault prediction models using software metrics have been proposed in the Software Engineering literature. This study focuses on evaluating high-performance fault predictors based on support vector machines (SVMs) and probabilistic neural networks (PNNs). Five public NASA datasets from the PROMISE repository are used to make these predictive models repeatable, refutable, and verifiable. According to the obtained results, the probabilistic neural networks generally provide the best prediction performance for most of the datasets in terms of the accuracy rate.

Keywords— fault proneness; software metrics; probabilistic neural networks; support vector machines
I. INTRODUCTION
The challenge in today's software engineering is to deliver high-quality software systems to customers on time. It is therefore mandatory to focus on customer satisfaction and software quality, to ensure that the desired quality is built into the software product and that customers remain loyal to it. The quality of a software system is relative to the number of defects reported in the final product, since defective software modules may increase development and maintenance costs, decrease customer satisfaction, and result in software failures [1]. Accordingly, early and effective prediction of defect-prone software modules before deployment is very important for estimating the likely delivered quality and maintenance effort, especially for large and complex systems [2]. Additionally, it enables software developers to focus quality assurance activities and to allocate effort and resources more efficiently, which in turn can lead to a substantial improvement in software quality. Fault prediction has been an active research direction for several years and has recently become one of the major fields in the area. The growing interest in building models for predicting software quality attributes motivates the use of artificial intelligence techniques. Artificial neural networks (ANNs) have seen an explosion of interest over the years and are being successfully applied across a range of problem domains, in areas as diverse as finance, medicine, engineering, geology and physics. Indeed, wherever there are problems of prediction, classification or control, neural networks (NNs) and machine learning (ML) are being introduced.

Software metrics have become essential in software engineering for several reasons, among which are quality assessment and reengineering. They provide ways to evaluate the quality of software, and their use in the earlier phases of software development can help organizations assess large software developments quickly and at low cost [3]. In the field of software evolution, metrics can be used to identify stable or unstable parts of software systems. In a stable development environment, software metrics can be used to predict modules that are likely to be fault-prone [4]. Some researchers advocate the use of product metrics, such as Halstead complexity, McCabe's cyclomatic complexity, and various code size measures, to predict fault-prone modules [4-7]. Fault prediction models that are essentially based on software metrics can predict the number and density of faults in software modules.

The objective of this paper is to evaluate the capability of SVM and PNN models in predicting defect-prone software modules in five NASA products, namely CM1, JM1, KC1, KC2, and PC1. The two approaches are inherently different, raising the question of whether one approach performs better than the other. Thus, the goal of this paper is to empirically evaluate the performance of the aforementioned approaches for predicting software fault proneness. To the best of our knowledge, this is the first study that evaluates the performance of these models on this number of datasets, especially while also considering the correlation-based feature selection (cfs) technique to reduce the number of metrics and comparing the obtained results.

The rest of the paper is organized as follows: Section II describes the models used and the datasets, while Section III summarizes the related work. The experimental evaluations are detailed in Section IV, and the obtained results are discussed in Section V.
Finally, the conclusions and future work are stated in Section VI.

II. BACKGROUND
A. Models Used
This section explains the details of each model used during the experiments.

• SVM Model: A linear decision boundary is a simple classifier that can be learned very efficiently. However, due to its low complexity it can correctly classify only data that are linearly separable. A more complex decision boundary, on the other hand, can correctly classify general data that may not be linearly separable, but such a classifier may be much harder to train. SVM combines the best of both worlds: it uses an efficient training algorithm while at the same time being capable of representing complex decision boundaries. The hyperplanes corresponding to w·x + b = −1 and w·x + b = +1 are the bounding hyperplanes, and the distance between them is the margin, which is equal to 2/||w||. It can be shown that, for given training data, maximizing the margin of separation between the two classes has the effect of reducing the complexity of the classifier and thus optimizing generalization performance. The optimal hyperplane is the one that minimizes the training error and, at the same time, has the maximal margin of separation between the two classes.

• PNN Model: This network provides a general solution to pattern classification problems by following an approach developed in statistics, known as Bayesian classifiers. The probabilistic neural network uses a supervised training set to develop distribution functions within a pattern layer. These functions are used to estimate the likelihood of an input feature vector belonging to a learned category, or class. PNN is based on one-pass learning with a highly parallel structure. It is a powerful memory-based network able to deal with sparse data effectively. In a PNN, the number of neurons in the hidden layer is usually the number of patterns in the training set, because each pattern in the training set is represented by one neuron. The main advantage of PNN is the speed at which the network can be trained: training a PNN is performed in one pass. The smoothing factor allows the PNN to interpolate between the patterns in the training set.
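To make the two classifiers concrete, the following is a minimal sketch in Python, not the authors' implementation nor the DTREG/Weka tooling used later in the paper: an RBF-kernel SVM from scikit-learn next to a bare-bones PNN written as a Gaussian kernel classifier with one kernel per training pattern. The smoothing factor value, the synthetic data and all names are illustrative assumptions only.

```python
# Minimal sketch of the two classifiers (illustrative only, not the paper's setup).
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.svm import SVC


class PNN(BaseEstimator, ClassifierMixin):
    """Bare-bones probabilistic neural network: one Gaussian kernel per training pattern."""

    def __init__(self, sigma=0.5):      # sigma is the smoothing factor (assumed value)
        self.sigma = sigma

    def fit(self, X, y):                # "training" is a single pass: just store the patterns
        self.X_, self.y_ = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(self.y_)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Pattern layer: Gaussian kernel between each test vector and each stored pattern.
        d2 = ((X[:, None, :] - self.X_[None, :, :]) ** 2).sum(axis=2)
        k = np.exp(-d2 / (2.0 * self.sigma ** 2))
        # Summation layer: average kernel response per class; output layer: largest wins.
        scores = np.stack([k[:, self.y_ == c].mean(axis=1) for c in self.classes_], axis=1)
        return self.classes_[scores.argmax(axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 21))               # stand-in for 21 module-level metrics
    y = (X[:, 0] + X[:, 1] > 0).astype(int)      # stand-in fault-proneness label
    svm = SVC(kernel="rbf").fit(X, y)            # maximum-margin classifier
    pnn = PNN(sigma=0.5).fit(X, y)               # one-pass "training"
    print(svm.score(X, y), (pnn.predict(X) == y).mean())
```

Because the PNN sketch implements the scikit-learn estimator interface, it can be passed to the same cross-validation utilities as the SVM.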
B. Datasets
In this paper, the experimental context is a set of NASA projects. We make use of well-known, public-domain datasets belonging to several NASA projects [25]: KC1, KC2, JM1, PC1 and CM1. Table I briefly describes the datasets used. CM1 belongs to a NASA spacecraft instrument project developed in the C programming language; it has 498 modules, of which 10% are defective. JM1 belongs to a real-time predictive ground system project and consists of 10885 modules; it is the largest dataset in our experiments, is implemented in the C programming language, and has 19% defective modules. The KC1 dataset, which is implemented in the C++ programming language, belongs to a storage management project for receiving/processing ground data; it has 2109 software modules, 15% of which are defective. The KC2 dataset belongs to a data processing project and has 523 modules; C++ is used for the KC2 implementation, and 21% of the modules in the KC2 dataset have defects. PC1 belongs to a flight software project for an earth-orbiting satellite; it has 1109 modules, 7% of which are defective, and the implementation language is C. Each dataset contains twenty-one software metrics (independent variables) at the module level and the associated dependent Boolean variable.
TABLE I. THE USED DATASETS

Dataset   Language   Size (LOC)   # instances   Defective instances
CM1       C          20 KSLOC     498           10%
JM1       C          315 KSLOC    10885         19%
KC1       C++        43 KSLOC     2109          15%
KC2       C++        23 KSLOC     522           21%
PC1       C          40 KLOC      1109          7%
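As an aside (not part of the original study), the PROMISE distributions of these datasets are ARFF files; a minimal sketch of loading one and separating the 21 metrics from the defect label could look as follows. The file name and the label column name and values are assumptions and differ slightly between datasets.

```python
# Hedged sketch: load a PROMISE ARFF file and split metrics from the defect label.
# "cm1.arff" and a "defects" column with "true"/"false" values are assumptions.
import pandas as pd
from scipy.io import arff

raw, meta = arff.loadarff("cm1.arff")            # structured array + attribute metadata
df = pd.DataFrame(raw)

label = df["defects"].str.decode("utf-8")        # nominal ARFF attributes load as bytes
y = (label == "true").astype(int)                # 1 = defective module, 0 = defect-free
X = df.drop(columns=["defects"]).values          # the 21 method-level metrics
print(X.shape, y.mean())                         # (modules, metrics), defect ratio
```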
III. RELATED WORK
Numerous studies have been introduced in the field of fault prediction. This effort started with the earliest studies, which concentrated on establishing relationships between defects and software complexity and size. The most broadly proposed metrics were those of Halstead's theory [8] and McCabe's cyclomatic complexity [9]. More recently, many techniques have been applied to fault prediction, such as logistic regression, clustering and data mining. In this regard, several data mining methods have been proposed for defect analysis and for building software fault prediction models. Researchers have used different methods such as Neural Networks [10], Genetic Programming [11], Decision Trees [12], Case-based Reasoning [13], Naïve Bayes [14], Fuzzy Logic [15] and Logistic Regression [16] for software fault prediction. Catal et al. [17, 18] developed and validated several Artificial Immune Systems-based models for the software fault prediction problem. Elish et al. [19] stated that the performance of SVMs is generally better than, or at least competitive with, other statistical and machine learning models in the context of four NASA datasets; they compared the performance of SVMs with that of Logistic Regression, MLP, Bayesian Belief Networks, Naive Bayes, Random Forests, and Decision Trees. Gondra's [20] experimental results showed that SVMs provided higher performance than Artificial Neural Networks for software fault prediction. Kanmani et al. [21] validated PNN and a Backpropagation Neural Network (BPN) in order to compare their results with those of statistical techniques, and PNN provided better performance than BPN. Quah [10] proposed new sets of metrics for the presentation logic tier and the data access tier; additionally, he used a NN model with a genetic training strategy to predict the number of faults, the number of code changes required to correct a fault, and the amount of time needed to make the changes. Menzies et al. [14] showed that Naive Bayes with a logNum filter is the best software fault prediction model, even though it is a very simple algorithm. They also stated that there is no need to find the best group of software metrics for software fault prediction, because the performance variation of models with different metric groups is not significant.

Almost all software fault prediction studies use metrics and fault data of previous software releases to build fault prediction models; these are called supervised learning approaches in the ML community. However, in some cases there are very few or no previous fault data, and consequently there is a need for new models and techniques for these two challenging prediction problems. Unsupervised learning approaches such as clustering methods can be used when there are no previous fault data, whereas semi-supervised learning approaches can be used when there are very few fault data [22]. Zhong et al. [23] used Neural-Gas and K-means clustering algorithms to create clusters and then had an expert examine representative modules of the clusters to assign the fault-proneness label; the performance of their model was comparable with classification algorithms [23]. In cases where a company does not have previously collected fault data, or when a new type of project is initiated, models based on unsupervised learning approaches can be used. Seliya et al. [24] used the Expectation-Maximization (E-M) algorithm for the semi-supervised software fault prediction problem, also showing that their model had results comparable with classification algorithms. Thus, semi-supervised fault prediction models are needed especially in decentralized software projects, where many companies find it difficult to collect fault data.
IV. EXPERIMENTAL EVALUATIONS
This section describes in detail the goal, the datasets used, the context and variable selection, the validation method, the accuracy measure and the experimental operations.

A. Goal
The goal of this paper is to apply two different approaches, the SVM and PNN models, to predicting modules' faults using the different sets of metrics explained in the next subsection. Afterwards, we evaluate and compare the prediction accuracy of the two approaches against each other in the context of the five datasets introduced previously: CM1, JM1, KC1, KC2 and PC1.

B. Dependent and Independent Variables
The experimental variables are divided into dependent and independent variables. Each dataset contains twenty-one independent variables and a dependent variable. The dependent variable is a binary variable that indicates whether or not the module has any defects. The public NASA datasets include 21 method-level metrics proposed by Halstead [8] and McCabe [9, 26]. The twenty-one metrics (independent variables) include: five different lines-of-code measures, three McCabe metrics, four basic Halstead measures, eight derived Halstead measures, and a branch count. Table II lists and describes these metrics. However, some researchers use only 13 metrics from these datasets [24], stating that the derived Halstead metrics do not present any extra information for software fault prediction. Additionally, Munson [27], in his book on software measurement, clarifies that the four basic Halstead metrics describe the variation of all of the remaining Halstead metrics; that is, there is nothing new to gain from collecting the derived Halstead metrics. Therefore, in our study we conduct some experiments considering the 21 module-level metrics and others in which only the 13 metrics are considered. Some researchers have used feature reduction techniques such as Principal Component Analysis (PCA) to reduce multicollinearity and to improve the performance of the models. In our study, we applied the correlation-based feature selection (cfs) technique [28] to obtain the relevant metrics, and then we performed additional experiments based on that.
TABLE II. TWENTY-ONE METHOD-LEVEL METRICS

Type               Metric              Information
Line Count         LOCode              # of lines of statement
                   LOComment           # of lines of comment
                   LOBlank             # of blank lines
                   LOCodeAndComment    # of lines of code and comments
                   LOC                 Total lines of code
Derived Halstead   N                   Total # of operators and operands
                   V                   Volume on minimal implementation
                   L                   Program length = V/N
                   D                   Difficulty = 1/L
                   I                   Intelligent count
                   E                   Effort to write program = V/L
                   B                   Effort estimate
                   T                   Time to write program = E/18 s
Basic Halstead     UniqOp              # of unique operators
                   UniqOpnd            # of unique operands
                   TotalOp             Total # of operators
                   TotalOpnd           Total # of operands
McCabe             V(g)                Cyclomatic complexity
                   EV(g)               Essential complexity
                   IV(g)               Design complexity
Branch             BranchCount         Total # of branches
C. Prediction Accuracy Measure
The accuracy measure is used to assess the performance of the prediction models for a two-class problem, as in our work (defective or not defective) [31]. Accuracy refers to the correct classification rate; it is defined as the ratio of the number of modules correctly predicted to the total number of modules, and is calculated as follows:

Accuracy = (number of correctly predicted modules) / (total number of modules)
D. Cross Validation
In all the experiments we used the K-fold cross-validation procedure, in which the original sample is randomly partitioned into K sub-samples. Of the K sub-samples, K−1 are used as training data, while the remaining sub-sample is held out as the validation data for testing the model. The cross-validation process is then repeated K times (the folds), with each of the K sub-samples used exactly once as the validation data. The K results from the folds are then averaged or combined to produce a single estimate. 10-fold cross-validation is commonly used, and it is the setting adopted in our work.
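A rough sketch of how such a 10-fold accuracy estimate can be computed is given below. It is illustrative only: the data are synthetic stand-ins (a real run would load a PROMISE dataset as in the earlier loading sketch), and the PNN class sketched in Section II.A could be passed to the same utility because it follows the scikit-learn estimator interface.

```python
# Hedged sketch of 10-fold cross-validated accuracy for the SVM model.
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 21))                 # stand-in for a PROMISE dataset's 21 metrics
y = (X[:, 0] - X[:, 3] > 0).astype(int)        # stand-in fault-proneness label

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=cv, scoring="accuracy")
print("mean 10-fold accuracy: %.2f%%" % (100 * scores.mean()))
```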
E. Supporting Tools
We list below the tools used to accomplish this study:

• DTREG: a predictive modeling software package and a powerful statistical analysis program that generates classification and regression decision trees that model data and can be used to predict values [29].
• Weka 3: a collection of machine learning algorithms for data mining tasks; it contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization [30].

V. RESULTS AND DISCUSSION
This section describes the experiments conducted in our study. All the experiments investigated the two compared models, SVM and PNN, and all were performed using 10-fold cross-validation. Experiments were repeated more than once to produce reliable results. The performances of the various models were compared according to their accuracy measures. In each experiment we use different independent variables; however, in all experiments the fault-proneness label is the dependent variable. This label can be fault-prone or not fault-prone depending on the result of the prediction. All three experiments were conducted with module-level metrics.

A. Experiment #1
This experiment is based on the 21 module-level metrics (McCabe and Halstead metrics) explained in Table II. We computed the accuracy values of the models for both training and testing data to check whether the accuracy level is acceptable for the relevant model. The accuracy values for all the investigated models in the context of all datasets are shown in Table III; the higher the accuracy value, the better the performance. According to Table III and Figure 1, the PNN model provides the best prediction performance, in terms of both training and testing data, for three of the datasets, namely JM1, KC2 and PC1. For KC1, although SVM provides better performance on the training data, the two models behave the same on the testing data. Regarding the CM1 training data, the PNN model achieves the highest accuracy value, but both models provide the same performance for the CM1 testing data. That is, for the training data PNN provides the highest accuracy values in the context of all the datasets except KC1, where SVM behaves better. The SVM model is stable for the CM1 dataset across training and testing data. In addition, as noticed in Table III and Figure 1, SVM provides close results for the training and testing data of the KC2 and PC1 datasets. For PNN, in contrast, the difference between the training and testing performance is obvious. Another observation is that the SVM and PNN prediction models show high accuracy values (above 90%) for the datasets with the smallest percentages of defective modules, i.e., CM1 and PC1, while the accuracy values do not reach 90% for the datasets with a large percentage of defective modules (see Table I).

TABLE III. THE ACCURACY VALUES (%) FOR EXPERIMENT #1

           SVM                          PNN
Dataset    Training Data  Testing Data  Training Data  Testing Data
CM1        90.16          90.16         93.17          90.16
JM1        83.08          81.22         84.97          81.52
KC1        89.80          86.69         89.22          86.69
KC2        84.67          84.29         89.66          84.48
PC1        94.94          94.85         99.54          95.13
Figure 1. The accuracy measures for experiment #1 (based on 21 metrics)
B. Experiment #2
This experiment used only 13 method-level metrics, without employing the derived Halstead metrics. In this experiment we investigate the claim stated in [24] that the derived Halstead metrics do not present any extra information for software fault prediction. Table IV and Figure 2 present the accuracy values of the SVM and PNN models for this experiment. According to them, the PNN model provides the best prediction performance for the CM1, JM1, KC1, and PC1 datasets in terms of both training and testing data. On the other hand, the SVM model provides the best performance for the testing data of the KC2 dataset, whereas the PNN model provides the best accuracy for the same dataset on the training data. The PNN model achieves better performance with 13 metrics compared with using all the metrics, especially for the testing data. The accuracy values of SVM are stable for CM1 and for the KC2 training data with either 21 or 13 metrics. Experiments 1 and 2 together show that the derived Halstead metrics do not change the performance considerably; therefore, there is no need to collect these derived metrics for the software fault prediction problem. Thus, one of the observations from the results of the first two experiments confirms what was stated in [24]: the derived Halstead metrics do not introduce any extra information for software fault prediction.
TABLE IV. THE ACCURACY VALUES (%) FOR EXPERIMENT #2

           SVM                          PNN
Dataset    Training Data  Testing Data  Training Data  Testing Data
CM1        90.16          90.16         99.80          90.36
JM1        81.45          81.21         84.15          81.63
KC1        87.69          86.11         89.70          86.83
KC2        84.67          84.67         90.42          84.10
PC1        95.40          94.76         99.54          95.22
Figure 2. The accuracy measures for experiment #2 (based on 13 metrics)

C. Experiment #3
In this experiment, we applied the correlation-based feature selection (cfs) technique [28] to obtain the relevant independent variables, so the number of selected metrics varies from one dataset to another. That is, the independent variables for each dataset are identified by applying the correlation-based feature selection technique to the 21 independent variables in order to reduce their number. Thus, each dataset has a different collection of metrics for predicting fault proneness.

The metrics chosen with the cfs technique for each dataset are listed in Table V. It is noticeable from the table that the number of metrics used is reduced significantly. For example, the KC1 and JM1 datasets need only 8 independent variables to predict fault proneness, with the CM1 dataset only 6 independent variables are used, and even fewer independent variables are needed for PC1 and KC2, where only 2 and 3 independent variables, respectively, are used. However, the performance varies from one dataset to another, as shown in Table VI and Figure 3. For the KC1 dataset the PNN model shows better performance when the metrics are reduced, while it does not for JM1, so we cannot judge whether reducing the number of metrics to this level improves the performance or impairs it.

TABLE V. THE METRICS USED FOR EACH DATASET WITH CFS

Dataset   The chosen metrics
KC1       ev(g), v, d, i, lOCode, lOComment, lOBlank, uniq_Opnd
KC2       ev(g), b, uniq_Opnd
PC1       I, locCodeAndComment
CM1       loc, iv(g), I, lOComment, lOBlank, uniq_Op, uniq_Opnd
JM1       loc, v(g), ev(g), iv(g), i, lOComment, lOBlank, locCodeAndComment

Table VI and Figure 3 show the accuracy values of the prediction models after reducing the number of independent variables with the correlation-based feature selection technique. Again, the PNN model provides the best performance against the SVM model for the KC1, PC1, CM1 and JM1 datasets. On the other hand, for the KC2 dataset the best accuracy value for the testing data is obtained by the SVM model, whereas PNN provides the best accuracy on the KC2 training data. The accuracy values obtained by the SVM model are moderate for all the datasets. In addition, in this experiment the SVM model provides the same accuracy for both training and testing data for the CM1 and PC1 datasets. Another observation is that the SVM model provides the same accuracy for the smallest dataset (CM1) across the three experiments with different numbers of independent variables. For the three largest datasets (JM1, KC1 and PC1), the PNN model provides the best performance almost all the time.

TABLE VI. THE ACCURACY VALUES (%) FOR EXPERIMENT #3

           SVM                          PNN
Dataset    Training Data  Testing Data  Training Data  Testing Data
CM1        90.16          90.16         99.80          90.36
JM1        81.69          81.30         83.74          81.44
KC1        88.31          86.02         89.89          87.12
KC2        85.06          85.06         86.02          83.91
PC1        94.76          94.76         99.54          95.13
Figure 3. The accuracy measures for experiment #3 (after applying cfs to select features)
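As an illustration of the feature-selection step in Experiment #3: the authors applied the cfs/FCBF technique of [28] through Weka, and the filter below is only a greatly simplified stand-in that keeps the metrics most correlated with the fault label. The 0.2 threshold, the synthetic data and the metric names are arbitrary assumptions.

```python
# Hedged sketch of a simple correlation-based feature filter (not the exact cfs/FCBF of [28]).
import numpy as np

def correlation_filter(X, y, names, threshold=0.2):
    """Keep metrics whose absolute Pearson correlation with the label exceeds the threshold."""
    y = np.asarray(y, dtype=float)
    keep = []
    for j, name in enumerate(names):
        r = np.corrcoef(X[:, j], y)[0, 1]        # Pearson correlation with the fault label
        if np.isfinite(r) and abs(r) >= threshold:
            keep.append((name, round(r, 3)))
    return keep

# Example with synthetic data standing in for a PROMISE dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 21))
y = (X[:, 2] + 0.5 * X[:, 7] + rng.normal(size=500) > 0).astype(int)
names = [f"metric_{j}" for j in range(21)]
print(correlation_filter(X, y, names))
```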
VI. SUMMARY AND CONCLUSIONS

In this study, we identified and used high-performance fault prediction models based on the PNN and SVM models. We used public NASA datasets from the PROMISE repository to make our predictive models repeatable, refutable, and verifiable. Three experiments were conducted to evaluate the performance of the SVM and PNN models in response to changing the number of independent variables, and the two models were evaluated on each of the five public datasets. According to the experiments we conducted, the PNN model mostly provided the best performance, particularly for the large datasets. Moreover, the derived Halstead metrics provided no extra information for software fault prediction.
ACKNOWLEDGMENT
The authors would like to thank King Fahd University of Petroleum and Minerals for supporting this research.

REFERENCES
[1] A. Koru and H. Liu, "Building Effective Defect Prediction Models in Practice," IEEE Software, vol. 22, no. 6, pp. 23-29, 2005.
[2] N. Fenton and M. Neil, "A Critique of Software Defect Prediction Models," IEEE Trans. on Software Engineering, vol. 25, no. 5, pp. 675-689, 1999.
[3] K. K. Aggarwal, Y. Singh, A. Kaur, and R. Malhotra, "Investigating the Effect of Coupling Metrics on Fault Proneness in Object-Oriented Systems," Software Quality Professional, vol. 8, no. 4, pp. 4-16, 2006.
[4] R. Ishrat, R. Parveen, and S. I. Ahson, "Pattern Trees for Fault-Proneness Detection in Object-Oriented Software," Journal of Computer Science, vol. 6, no. 10, pp. 1078-1082, 2010.
[5] N. N. T. Ball and B. Murphy, "Using historical data and product metrics for early estimation of software failures," in Proc. ISSRE'06, Raleigh, NC, USA, 2006.
[6] V. U. Challagulla, F. B. Bastani, and I.-L. Yen, "A unified framework for defect data analysis using the MBR technique," in Proc. of the IEEE International Conference on Tools with Artificial Intelligence (ICTAI'06), 2006.
[7] L. Guo, Y. Ma, B. Cukic, and S. Harshinder, "Robust prediction of fault-proneness by random forests," in Proc. of the 15th International Symposium on Software Reliability Engineering (ISSRE'04), 2004.
[8] M. Halstead, Elements of Software Science. Elsevier, 1977.
[9] T. McCabe, "A complexity measure," IEEE Trans. on Software Engineering, vol. 2, no. 4, pp. 308-320, 1976.
[10] M. M. Thwin and T. Quah, "Application of neural networks for software quality prediction using object-oriented metrics," in Proceedings of the 19th International Conference on Software Maintenance, Amsterdam, Netherlands, 2003.
[11] M. Evett, T. M. Khoshgoftaar, and P. D. Chien, "GP-based software quality prediction," in Proceedings of the Third Annual Genetic Programming Conference, San Francisco, USA, 1998.
[12] T. M. Khoshgoftaar and N. Seliya, "Software quality classification modeling using the SPRINT decision tree algorithm," in Proceedings of the Fourth IEEE International Conference on Tools with Artificial Intelligence, Washington, DC, USA, 2002.
[13] K. El Emam, S. Benlarbi, and N. Goel, "Comparing case-based reasoning classifiers for predicting high risk software components," Journal of Systems and Software, vol. 55, no. 3, pp. 301-320, 2001.
[14] T. Menzies, J. Greenwald, and A. Frank, "Data mining static code attributes to learn defect predictors," IEEE Trans. on Software Engineering, vol. 33, no. 1, pp. 2-13, 2007.
[15] X. Yuan, T. M. Khoshgoftaar, and E. B. Allen, "An application of fuzzy clustering to software quality prediction," in Proceedings of the Third IEEE Symposium on Application-Specific Systems and Software Engineering Technology, Washington, DC, USA, 2000.
[16] H. M. Olague, S. Gholston, and S. Quattlebaum, "Empirical validation of three software metrics suites to predict fault-proneness of object-oriented classes developed using highly iterative or agile software development processes," IEEE Trans. on Software Engineering, vol. 33, no. 6, pp. 402-419, 2007.
[17] C. Catal and B. Diri, "A fault prediction model with limited fault data to improve test process," in Proceedings of the Ninth International Conference on Product Focused Software Process Improvement, Lecture Notes in Computer Science, Rome, Italy, 2008.
[18] C. Catal and B. Diri, "Software defect prediction using artificial immune recognition system," in Proceedings of the Fourth IASTED International Conference on Software Engineering (IASTED'07), Innsbruck, Austria, 2007.
[19] K. O. Elish and M. O. Elish, "Predicting defect-prone software modules using support vector machines," The Journal of Systems and Software, vol. 81, pp. 649-660, 2008.
[20] I. Gondra, "Applying machine learning to software fault-proneness prediction," Journal of Systems and Software, vol. 81, no. 2, pp. 186-195, 2008.
[21] S. Kanmani, V. R. Uthariaraj, V. Sankaranarayanan, and P. Thambidurai, "Object-oriented software fault prediction using neural networks," Information and Software Technology, vol. 49, no. 5, pp. 483-492, 2007.
[22] C. Catal and B. Diri, "A conceptual framework to integrate fault prediction sub-process for software product lines," in Proceedings of the Second IEEE International Symposium on Theoretical Aspects of Software Engineering, Nanjing, China, 2008.
[23] S. Zhong, T. M. Khoshgoftaar, and N. Seliya, "Unsupervised learning for expert-based software quality estimation," in Proceedings of the Eighth International Symposium on High Assurance Systems Engineering, Tampa, FL, USA, 2004.
[24] N. Seliya, T. M. Khoshgoftaar, and S. Zhong, "Semi-supervised learning for software quality estimation," in Proceedings of the 16th International Conference on Tools with Artificial Intelligence, Boca Raton, FL, USA, 2004.
[25] PROMISE Software Engineering Repository, http://promise.site.uottawa.ca/SERepository/datasets-page.html.
[26] T. McCabe and C. Butler, "Design complexity measurement and testing," Communications of the ACM, vol. 32, no. 12, pp. 1415-1425, 1989.
[27] J. C. Munson, Software Engineering Measurement, Auerbach Publications, Boca Raton, FL, 2003.
[28] L. Yu and H. Liu, "Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution," in Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003), Washington, DC, USA, 2003.
[29] DTREG: Software for Predictive Modeling and Forecasting, http://www.dtreg.com/.
[30] Weka Machine Learning Group, http://www.cs.waikato.ac.nz/~ml/.
[31] I. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed., San Francisco: Morgan Kaufmann, 2005.