Quality Problem in Software Measurement Data
After publishing a couple of articles on the quality of software metrics, the Empirical Software Engineering Laboratory was contacted to contribute a chapter to the special issue on Quality Software Development of the book Advances in Computers, published by Elsevier.
As a result, I became the first author of the chapter, in which I presented the various results about quality in software metrics, noise filtering, and partitioning algorithms. The book was recently added to Google Books.
An approach to enhance the quality of software measurement data is introduced in this chapter. Using poor-quality data during the training of software quality models can have costly consequences in software quality engineering. By removing such noisy entries, i.e., by filtering the training dataset, the accuracy of software quality classification models can be significantly improved.
The Ensemble-Partitioning Filter works by splitting the training dataset into subsets and inducing multiple learners on each subset. The predictions are then combined, and an instance is identified as noisy if it is misclassified by a given number of learners. The conservativeness of the Ensemble-Partitioning Filter depends on the filtering level and the number of iterations. The filter generalizes some filtering techniques commonly used in the literature, namely the Classification, the Ensemble, the Multiple-Partitioning, and the Iterative-Partitioning Filters. This chapter also formulates an innovative and practical technique to compare filters using real-world data. We use an empirical case study of a high-assurance software project to analyze the performance of the different filters obtained by specializing the Ensemble-Partitioning Filter. These results allow us to provide a practical guide for selecting the appropriate filter for a given software quality classification problem. Using several base classifiers and performing several iterations with a conservative filtering scheme can improve the efficiency of the filtering scheme.
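To make the mechanism in the abstract more concrete, here is a minimal single-pass sketch in Python. It is an illustration written for this page, not the implementation used in the chapter: the function names (`ensemble_partitioning_filter`, `nearest_centroid`, `one_nearest_neighbour`), the two toy learners, the stride-based partitioning, and the tiny dataset are all assumptions. It also assumes that every learner votes on every instance of the full training set, and it leaves out the iterations mentioned above.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Instance = Tuple[Sequence[float], int]        # (feature vector, class label)
Predictor = Callable[[Sequence[float]], int]
Learner = Callable[[List[Instance]], Predictor]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest_centroid(train: List[Instance]) -> Predictor:
    """Toy learner: predict the class whose mean feature vector is closest."""
    groups: Dict[int, List[Sequence[float]]] = {}
    for x, y in train:
        groups.setdefault(y, []).append(x)
    centroids = {y: [sum(col) / len(xs) for col in zip(*xs)]
                 for y, xs in groups.items()}
    return lambda x: min(centroids, key=lambda y: sq_dist(x, centroids[y]))


def one_nearest_neighbour(train: List[Instance]) -> Predictor:
    """Toy learner: predict the label of the closest training instance."""
    return lambda x: min(train, key=lambda inst: sq_dist(x, inst[0]))[1]


def ensemble_partitioning_filter(data: List[Instance],
                                 learners: List[Learner],
                                 n_partitions: int,
                                 filtering_level: int) -> List[int]:
    """Return indices of instances misclassified by >= filtering_level models."""
    # Hypothetical partitioning choice: every n-th instance goes to one subset.
    partitions = [data[i::n_partitions] for i in range(n_partitions)]
    # Induce every learner on every subset: n_partitions * len(learners) models.
    models = [learn(part) for part in partitions for learn in learners]
    noisy = []
    for index, (x, y) in enumerate(data):
        errors = sum(1 for predict in models if predict(x) != y)
        if errors >= filtering_level:
            noisy.append(index)
    return noisy


# Two well-separated classes; index 8 ([1.5] labelled 1) is deliberately mislabelled.
data = [([0.0], 0), ([1.0], 0), ([10.0], 1), ([11.0], 1), ([2.0], 0),
        ([3.0], 0), ([12.0], 1), ([13.0], 1), ([1.5], 1), ([11.5], 1)]
learners = [nearest_centroid, one_nearest_neighbour]

majority = ensemble_partitioning_filter(data, learners, 2, 3)   # [8]
aggressive = ensemble_partitioning_filter(data, learners, 2, 1) # [1, 8]
consensus = ensemble_partitioning_filter(data, learners, 2, 4)  # []
```

With two partitions and two learners there are four models. Requiring three misclassifications flags only the mislabelled instance, a level of one also removes a clean neighbour of it, and requiring all four flags nothing. This is the sense in which a higher filtering level makes the filter more conservative. In this sketch, setting `n_partitions` to 1 or passing a single learner gives simplified analogues of the special cases named in the abstract.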