
Approximating K-means-type clustering via semidefinite programming

Jiming Peng, Yu Wei

May 16, 2005

Abstract

One of the fundamental clustering problems is to assign n points to k clusters according to the minimum sum-of-squares criterion (MSSC), i.e., so that the total squared Euclidean distance from each point to the center of its cluster is minimized. This problem is known to be NP-hard. In this paper, by using matrix arguments, we first model MSSC as a so-called 0-1 semidefinite programming (SDP) problem. We show that our 0-1 SDP model provides a unified framework for several clustering approaches, such as normalized k-cut and spectral clustering. Moreover, the 0-1 SDP model allows us to solve the underlying problem approximately via its linear and semidefinite programming relaxations. Secondly, we consider how to extract a feasible solution of the original MSSC model from an approximate solution of the relaxed SDP problem. Using principal component analysis, we develop a rounding procedure that constructs a feasible partition from a solution of the relaxed problem. In our rounding procedure, we need to solve a k-means clustering problem in […] 1, because otherwise the underlying clustering problem becomes trivial.
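The matrix argument behind the 0-1 SDP model can be checked numerically. The sketch below assumes a standard encoding that this excerpt does not spell out: a partition is represented by a 0-1 assignment matrix $X$ (n × k, exactly one 1 per row), $W$ is the n × d matrix whose rows are the points, and $Z = X(X^TX)^{-1}X^T$. Under these assumptions the MSSC objective equals $\mathrm{Tr}(WW^T(I - Z))$, and $Z$ is symmetric, idempotent, has unit row sums and trace k. The helper names (`assignment_matrix`, `projection_matrix`, `sdp_form_objective`) are hypothetical.

```python
import numpy as np

def mssc_objective(W, labels, k):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    total = 0.0
    for j in range(k):
        members = W[labels == j]
        if len(members) > 0:
            total += ((members - members.mean(axis=0)) ** 2).sum()
    return total

def assignment_matrix(labels, k):
    """Hypothetical helper: 0-1 matrix X with X[i, j] = 1 iff point i is in cluster j."""
    X = np.zeros((len(labels), k))
    X[np.arange(len(labels)), labels] = 1.0
    return X

def projection_matrix(labels, k):
    """Z = X (X^T X)^{-1} X^T. X^T X is diagonal (cluster sizes), so no cluster may be empty."""
    X = assignment_matrix(labels, k)
    return X @ np.linalg.inv(X.T @ X) @ X.T

def sdp_form_objective(W, labels, k):
    """Matrix form of the MSSC objective: Tr(W W^T (I - Z))."""
    Z = projection_matrix(labels, k)
    return np.trace(W @ W.T @ (np.eye(W.shape[0]) - Z))

rng = np.random.default_rng(0)
W = rng.normal(size=(12, 3))       # 12 points in R^3, one point per row
labels = np.array([0, 1, 2] * 4)   # an arbitrary partition into k = 3 clusters
k = 3
Z = projection_matrix(labels, k)

print(np.isclose(mssc_objective(W, labels, k), sdp_form_objective(W, labels, k)))  # True
print(np.allclose(Z, Z.T), np.allclose(Z @ Z, Z))                    # True True
print(np.allclose(Z.sum(axis=1), 1.0), np.isclose(np.trace(Z), k))  # True True
```

The identity holds because $ZW$ replaces each row of $W$ by the centroid of its cluster, and $I - Z$ is a symmetric projection. Since $\mathrm{Tr}(WW^T)$ does not depend on the partition, minimizing the objective is equivalent to maximizing $\mathrm{Tr}(WW^TZ)$ over such matrices $Z$.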


K-means clustering algorithm
(1) Randomly generate k cluster centers in a domain containing all the points.
(2) Assign each point to the closest cluster center.
(3) Recompute the cluster centers using the current cluster memberships.
(4) If a convergence criterion is met, stop; otherwise, go to step (2).

Another way to model MSSC is based on the assignment of points to clusters. Let $X = [x_{ij}] \in$ …
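The four steps of the K-means clustering algorithm listed above can be sketched as follows. The text leaves several details open, so this sketch makes assumptions that are labeled in the code: initial centers are drawn uniformly from the bounding box of the points (one choice of "a domain containing all the points"), the convergence criterion is that no center moves by more than `tol`, and an empty cluster keeps its previous center. The function name `kmeans` and its parameters are hypothetical.

```python
import numpy as np

def kmeans(W, k, max_iter=100, tol=1e-8, seed=0):
    """Lloyd-style k-means on the rows of W, following steps (1)-(4).

    Assumptions: bounding-box initialization, a center-movement stopping
    rule, and empty clusters keeping their old center.
    """
    rng = np.random.default_rng(seed)
    lo, hi = W.min(axis=0), W.max(axis=0)
    # (1) random centers inside the bounding box of all the points
    centers = rng.uniform(lo, hi, size=(k, W.shape[1]))
    for _ in range(max_iter):
        # (2) assign each point to its closest center (squared Euclidean distance)
        d2 = ((W[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # (3) recompute each center as the mean of its current members
        new_centers = centers.copy()
        for j in range(k):
            members = W[labels == j]
            if len(members) > 0:
                new_centers[j] = members.mean(axis=0)
        # (4) stop once the centers no longer move; otherwise repeat step (2)
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:
            break
    # final assignment consistent with the returned centers
    d2 = ((W[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centers

# Usage: two well-separated groups of 20 points in the plane
rng = np.random.default_rng(1)
W = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
               rng.normal(5.0, 0.1, size=(20, 2))])
labels, centers = kmeans(W, k=2)
```

Each iteration cannot increase the sum-of-squares objective, but the procedure only reaches a local minimum that depends on the random start (an unlucky start can even leave a cluster empty). Running it from several seeds and keeping the result with the smallest MSSC value is a common remedy.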
