Software quality analysis by combining multiple projects and learners
This paper continues to explore noise filtering using learners. Four classification scenarios were investigated. The first applies the classical approach: one classifier is trained on a single fit dataset and used to predict the test dataset. The second is a popular data mining method: a classifier is built from the predictions of multiple learners induced on the same fit dataset (multi-learner classifier). The third combines the predictions of the same learner induced on multiple fit datasets (multi-dataset classifier). Finally, the most generic approach combines the predictions of multiple learners built on multiple fit datasets and applied to the dataset we want to predict. This technique is referred to as a multi-learner multi-dataset classifier.
To our knowledge, this empirical work is one of the largest in terms of both scale and scope: 119 (17 × 7) base classification models were built, and more than 700 vectors of base estimates were generated. This paper was published in the Software Quality Journal. You can find more information on Springer and on the ACM portal.
When building software quality models, the usual approach is to train data mining learners on a single fit dataset. Typically, this fit dataset contains software metrics collected during a past release of the software project whose quality we want to predict. To improve the predictive accuracy of such quality models, it is common practice to combine the predictive results of multiple learners to take advantage of their respective biases. Although multi-learner classifiers have proven successful in some cases, the improvement is not always significant because the information in the fit dataset can sometimes be insufficient. We present an innovative method to build software quality models using majority voting to combine the predictions of multiple learners induced on multiple training datasets. To our knowledge, no previous study in software quality has attempted to take advantage of multiple software project data repositories, which are generally spread across the organization. In a large-scale empirical study involving seven real-world datasets and seventeen learners, we show that, on average, combining the predictions of one learner trained on multiple datasets significantly improves the predictive performance compared to one learner induced on a single fit dataset. We also demonstrate empirically that combining multiple learners trained on a single training dataset does not significantly improve the average predictive accuracy compared to the use of a single learner induced on a single fit dataset.
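The four scenarios differ only in which base prediction vectors take part in the majority vote. The Python sketch below is an illustration, not the paper's implementation. The function names (`majority_vote`, `combine`), the learner and project names, the `fp`/`nfp` labels (fault-prone and not fault-prone) and the toy predictions are all hypothetical. The tie-breaking rule, which favors `fp`, is also an assumption, since the text above does not say how ties are resolved.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical layout: predictions[(learner, fit_dataset)] holds the labels
# that one base model assigns to each module of the test dataset.
Predictions = Dict[Tuple[str, str], List[str]]


def majority_vote(vectors: Sequence[Sequence[str]], tie_label: str = "fp") -> List[str]:
    """Combine base prediction vectors module by module.

    Ties between the two most frequent labels go to tie_label (an assumption).
    """
    combined = []
    for votes in zip(*vectors):
        counts = Counter(votes).most_common()
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            combined.append(tie_label)
        else:
            combined.append(counts[0][0])
    return combined


def combine(predictions: Predictions,
            learners: Optional[Set[str]] = None,
            datasets: Optional[Set[str]] = None) -> List[str]:
    """Vote over the base models selected by learner and fit dataset.

    Passing None for an argument means "use every learner" or "use every dataset".
    """
    selected = [vector for (learner, dataset), vector in predictions.items()
                if (learners is None or learner in learners)
                and (datasets is None or dataset in datasets)]
    if not selected:
        raise ValueError("no base model matches the selection")
    return majority_vote(selected)


# Toy data: 3 hypothetical learners x 3 hypothetical fit datasets,
# each base model predicting the same 3 test modules.
predictions: Predictions = {
    ("learner_1", "project_1"): ["fp", "nfp", "nfp"],
    ("learner_2", "project_1"): ["nfp", "nfp", "fp"],
    ("learner_3", "project_1"): ["fp", "fp", "nfp"],
    ("learner_1", "project_2"): ["fp", "fp", "nfp"],
    ("learner_2", "project_2"): ["fp", "nfp", "fp"],
    ("learner_3", "project_2"): ["nfp", "nfp", "nfp"],
    ("learner_1", "project_3"): ["nfp", "fp", "nfp"],
    ("learner_2", "project_3"): ["fp", "nfp", "nfp"],
    ("learner_3", "project_3"): ["fp", "nfp", "fp"],
}

# Scenario 1: one learner, one fit dataset.
single = combine(predictions, learners={"learner_1"}, datasets={"project_1"})
# Scenario 2: multi-learner classifier (all learners, one fit dataset).
multi_learner = combine(predictions, datasets={"project_1"})
# Scenario 3: multi-dataset classifier (one learner, all fit datasets).
multi_dataset = combine(predictions, learners={"learner_1"})
# Scenario 4: multi-learner multi-dataset classifier (every base model votes).
multi_both = combine(predictions)
```

With an odd number of voters and two classes, as in the toy data, no tie can occur. With seven fit datasets or seventeen learners the same holds for a single-axis vote, so the tie rule would only matter when the number of selected base models is even.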