Evaluation of Classifiers in Software Fault-Proneness Prediction

Authors

F. Karimian, Department of Computer Engineering, University of Kashan, Kashan, Iran.

S. M. Babamir, Department of Computer Engineering, University of Kashan, Kashan, Iran.

Abstract:

The reliability of software depends on its fault-prone modules: the fewer fault-prone units a system contains, the more it can be trusted. Therefore, if we can predict the number of fault-prone modules in a software system, we can judge its reliability. One contributing factor in predicting fault-prone modules is the software metric, by which modules can be classified as fault-prone or non-fault-prone. To perform this classification, we investigated 17 classifiers whose features (attributes) are software metrics (39 metrics) and whose instances (software modules) are drawn from 13 datasets reported by NASA. Two important issues influence prediction accuracy when data mining methods are used: (1) selecting the most influential features (i.e., software metrics) when there is a wide diversity of them, and (2) sampling instances to balance the imbalanced data; with two imbalanced classes, the classifier is biased towards the majority class. Based on feature selection and instance sampling, we considered 4 scenarios for appraising the 17 classifiers in predicting fault-prone software modules. For feature selection we used Correlation-based Feature Selection (CFS), and for instance sampling we used the Synthetic Minority Oversampling Technique (SMOTE). Empirical results showed that suitable sampling of software modules significantly influences the accuracy of predicting software reliability, whereas metric selection has no considerable effect on the prediction.
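
To make the instance-sampling step concrete, the following is a minimal sketch of the core idea behind SMOTE: new minority-class (fault-prone) instances are created by interpolating between an existing minority instance and one of its nearest minority neighbours. It is not the authors' implementation. The function name `smote`, its parameters, and the toy metric values are all assumptions made for illustration, and the sketch uses only the Python standard library.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic rows interpolated between minority rows.

    Illustrative sketch only: `minority` is a list of numeric feature
    vectors (hypothetically, software metrics of fault-prone modules).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other rows
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the chosen row (Euclidean distance)
        others = [row for j, row in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: (lines of code, cyclomatic complexity) per module.
faulty = [[120.0, 14.0], [200.0, 22.0], [150.0, 18.0]]
clean_count = 10  # assumed size of the majority (non-fault-prone) class

new_rows = smote(faulty, n_new=clean_count - len(faulty), k=2)
balanced_faulty = faulty + new_rows
print(len(balanced_faulty), "fault-prone rows after oversampling")
```

Because every synthetic row lies on a line segment between two real fault-prone modules, the minority class grows without exact duplication, which is what distinguishes this approach from simple random oversampling.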

Similar resources

Software Fault-proneness Prediction using Random Forest

Many metric-based classification models have been developed and applied to software fault-proneness prediction. This paper presents a novel prediction model using Random Forest classifier. Random Forest (RF) can be a promising candidate for software quality prediction because it is one of the most accurate classification algorithms available and has strengths in noise handling and efficient run...

Software Metrics Reduction for Fault-Proneness Prediction of Software Modules

It would be valuable to use metrics to identify the fault-proneness of software modules. However, few research works are on how to select appropriate metrics for fault-proneness prediction currently. We conduct a large-scale comparative experiment of nine different software metrics reduction methods over eleven public-domain data sets from the NASA metrics data repository. The Naive Bayes data ...

Software Fault-proneness Prediction using Module Severity Metrics

Most of the fault prediction studies have focused on the binary classification models that determine whether the input modules are fault-prone or not. More recently, several studies have shown that severity-based multi-classification models are more useful since they can predict the fault-proneness depending on the severity of the defects in the module. We present new severity-based prediction ...

Software Fault Proneness Prediction Using Support Vector Machines

Empirical validation of software metrics to predict quality using machine learning methods is important to ensure their practical relevance in the software organizations. In this paper, we build a Support Vector Machine (SVM) model to find the relationship between object-oriented metrics given by Chidamber and Kemerer and fault proneness. The proposed model is empirically evaluated using public...

State-Of-The-Art In Empirical Validation Of Software Metrics For Fault Proneness Prediction: Systematic Review

With the sharp rise in software dependability and failure cost, high quality has been in great demand. However, guaranteeing high quality in software systems which have grown in size and complexity coupled with the constraints imposed on their development has become increasingly difficult, time and resource consuming activity. Consequently, it becomes inevitable to deliver software that have no...

Journal title

Journal of Artificial Intelligence and Data Mining

volume 5 issue 2

pages 149-167

publication date 2017-07-01

Keywords

Software fault prediction; Classifier performance; Feature selection; Data sampling; Software metric
