Achieving higher classification accuracy with ensemble of trees
Cheng, Wen Xin
Date of Issue: 2017-05-18
School of Electrical and Electronic Engineering
Classification is a process in which a classifier assigns a class label to an object based on a set of inputs. One simple method for solving classification problems is the decision tree, a classifier that can be easily interpreted as a graph and yet can produce high accuracy. However, this method has a limitation: decision trees are unstable classifiers, meaning that a small change in the dataset can result in a completely different tree structure. This motivates ensemble methods, which can significantly improve performance. In this project, we study standard decision trees and their ensembles (Bootstrapped Aggregating, Random Forest, Extremely Randomised Trees, Rotation Forest, Gradient Boosting and Adaptive Boosting) and assess the performance of selected classifiers on real-world datasets. In addition, we propose a new ensemble method called Heterogeneous Ensemble of trees and compare its performance with existing methods. An evaluation of 6 different classifiers on 10 real-world datasets shows that Heterogeneous Ensemble of trees outperforms the other classifiers, including Random Forest, the best-ranked of 179 classifiers in a recent survey covering 121 real-world datasets.
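As an illustration of the ensemble idea mentioned in the abstract, the following is a minimal sketch of Bootstrapped Aggregating in plain Python. It is not the implementation used in the project: it assumes a single numeric feature, binary labels 0/1, and one-split trees (stumps), and all function names (`fit_stump`, `predict_stump`, `fit_bagging`, `predict_bagging`) are hypothetical. Each stump is trained on a bootstrap sample, so individual stumps may differ, while the majority vote smooths out those differences.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a one-split decision tree (stump) on (x, label) pairs, labels 0/1.

    Returns (threshold, left_label); the right side gets the other label.
    """
    xs = sorted({x for x, _ in points})
    # Candidate thresholds: midpoints between consecutive distinct values.
    # If only one distinct value exists, fall back to that value.
    thresholds = [(a + b) / 2 for a, b in zip(xs, xs[1:])] or [xs[0]]
    best = None
    for t in thresholds:
        for left_label in (0, 1):
            right_label = 1 - left_label
            errors = sum(
                (left_label if x <= t else right_label) != y
                for x, y in points
            )
            if best is None or errors < best[0]:
                best = (errors, t, left_label)
    _, t, left_label = best
    return t, left_label


def predict_stump(stump, x):
    """Predict the label of x with a single stump."""
    t, left_label = stump
    return left_label if x <= t else 1 - left_label


def fit_bagging(points, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample of the data."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    stumps = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(points) items with replacement.
        sample = [rng.choice(points) for _ in points]
        stumps.append(fit_stump(sample))
    return stumps


def predict_bagging(stumps, x):
    """Predict by majority vote over all stumps."""
    votes = Counter(predict_stump(s, x) for s in stumps)
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): class 0 below 0.5, class 1 above.
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
        (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
ensemble = fit_bagging(data)
```

Calling `predict_bagging(ensemble, x)` for a new value `x` returns the label chosen by most of the stumps. Inspecting the thresholds in `ensemble` shows how the individual stumps vary from one bootstrap sample to the next, a small-scale version of the instability described above.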
Final Year Project (FYP)
Nanyang Technological University