Pattern Clustering Using Incremental Splitting for Non-Uniformly Distributed Data
S. C. Chu and John F. Roddick
School of Informatics and Engineering, Flinders University of South Australia, Australia
Abstract. This article reports on our work on the clustering of non-uniformly distributed data. An innovative method, termed incremental splitting, is presented. Taking the K-means method as the core, the proposed approach splits only clusters with the largest total error in each iteration. This heuristic has the effect of allocating more clusters to those regions having more sample data. Consistent experimental results reveal that our method outperforms commonly used heuristics, including random initialization, binary splitting and pair-wise nearest neighbour.
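To make the heuristic concrete, the following is a minimal sketch of incremental splitting in plain Python. It rests on assumptions rather than on details given above: the abstract states only that K-means is the core and that clusters with the largest total error are split. Here a single cluster is split per pass, the split replaces its centroid with two copies offset by a small random vector, and a fixed number of K-means iterations follows each split. All names (`incremental_split`, `kmeans`, `eps`, `iters`) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def assign(data, centroids):
    """Index of the nearest centroid for every data item."""
    return [min(range(len(centroids)), key=lambda k: sq_dist(x, centroids[k]))
            for x in data]


def kmeans(data, centroids, iters=20):
    """Plain K-means refinement from the given starting centroids."""
    centroids = [list(c) for c in centroids]
    for _ in range(iters):
        labels = assign(data, centroids)
        for k in range(len(centroids)):
            members = [x for x, l in zip(data, labels) if l == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = mean(members)
    return centroids, assign(data, centroids)


def incremental_split(data, n_clusters, eps=1e-3, seed=0):
    """Grow from one cluster to n_clusters, splitting the worst cluster each pass."""
    rng = random.Random(seed)
    centroids = [mean(data)]
    labels = [0] * len(data)
    while len(centroids) < n_clusters:
        # Total squared error accumulated by each cluster.
        errors = [0.0] * len(centroids)
        for x, l in zip(data, labels):
            errors[l] += sq_dist(x, centroids[l])
        worst = max(range(len(centroids)), key=lambda k: errors[k])
        # Assumed splitting rule: two copies of the centroid, offset by +/- delta.
        c = centroids[worst]
        delta = [eps * (rng.random() - 0.5) for _ in c]
        centroids[worst] = [ci + di for ci, di in zip(c, delta)]
        centroids.append([ci - di for ci, di in zip(c, delta)])
        # Re-run K-means over all clusters after the split.
        centroids, labels = kmeans(data, centroids)
    return centroids, labels
```

Calling `incremental_split(data, n_clusters)` on a list of equal-length numeric sequences returns the final centroids together with one cluster index per data item. Because a region holding many samples accumulates a large total error, it tends to be split again, which is how such a scheme places more clusters where the data are dense.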
1 Introduction

Data clustering is a common practice in various fields of research and application development. For instance, in data mining, we might need to extract and capture hidden regularities diffused across a large database and store them as a limited number of representative entities. For codebook design in vector quantization, we require a small number of the most representative vectors from potentially vast volumes of training data in order to minimize the quantization error. Without loss of generality, data clustering can be formulated as a problem of finding $N$ most representative entities, $C_i$, $i = 1, \ldots, N$, from $M$ supplied data items, $X_i$, $i = 1, \ldots, M$. Generally N