Multiple centers based fuzzy clustering for imbalanced data
Date of Issue: 2016-05-26
School of Electrical and Electronic Engineering
Clustering is a useful data mining technique for identifying interesting distributions and discovering groups in the underlying data. K-means is a well-known and widely used family of clustering techniques, valued for its low computational cost; it mainly comprises the hard k-means and the fuzzy k-means clustering algorithms. Many factors may affect the performance of k-means clustering algorithms, such as high dimensionality, the scale of the data, and noise. The data distribution is another important factor that can significantly affect performance, for both hard and fuzzy k-means clustering. The problem caused by imbalanced data is also called the “uniform effect”. In this thesis, the multicenter clustering algorithm (MC), which aims to solve the “uniform effect”, has been studied and implemented. The MC algorithm contains three sub-algorithms: the fast global fuzzy k-means algorithm (FGFKM), the best m-plot algorithm (BMP) and the grouping multicenter algorithm (GMC). An experimental study of MC and its three sub-algorithms has been conducted, and the performance of the algorithms is evaluated. Comparisons between MC and its related algorithms have been made using several datasets.
DRNTU::Engineering::Electrical and electronic engineering