Approximating K-means-type clustering via semidefinite programming

Jiming Peng

Yu Wei



May 16, 2005

Abstract

One of the fundamental clustering problems is to assign n points to k clusters based on the minimum sum-of-squares criterion (MSSC), a problem known to be NP-hard. In this paper, using matrix arguments, we first model MSSC as a so-called 0-1 semidefinite program (SDP). We show that our 0-1 SDP model provides a unified framework for several clustering approaches, such as normalized k-cut and spectral clustering. Moreover, the 0-1 SDP model allows us to solve the underlying problem approximately via relaxed linear and semidefinite programming. Second, we consider how to extract a feasible solution of the original MSSC model from the approximate solution of the relaxed SDP problem. Using principal component analysis, we develop a rounding procedure that constructs a feasible partitioning from a solution of the relaxed problem. In our rounding procedure, we need to solve a k-means clustering problem in [...] 1, because otherwise the underlying clustering problem becomes trivial.

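For orientation, the four-step K-means procedure stated in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `kmeans`, `squared_distance` and `sum_of_squares`, the parameters `max_iter` and `seed`, the choice of the data's bounding box as the sampling domain in step (1), and the stopping rule (assignments no longer change) are all hypothetical choices made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Sketch of the four-step K-means heuristic; returns (centers, labels)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Step 1: draw k centers uniformly from the bounding box of the data
    # (assumed sampling domain; any domain containing the points would do).
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]
    centers = [[rng.uniform(lo[d], hi[d]) for d in range(dim)] for _ in range(k)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to its closest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Step 4: stop once the assignments no longer change (assumed criterion).
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute each center as the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                ]
    return centers, labels


def sum_of_squares(points, centers, labels):
    """MSSC objective: total squared distance from points to assigned centers."""
    return sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
```

Because the initial centers are random, different seeds can end in different local minima of the sum-of-squares objective. The heuristic carries no guarantee of reaching the global minimum of the MSSC problem.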

K-means clustering algorithm

(1) Choose k cluster centers randomly generated in a domain containing all the points,
(2) Assign each point to the closest cluster center,
(3) Recompute the cluster centers using the current cluster memberships,
(4) If a convergence criterion is met, stop; otherwise go to step (2).

Another way to model MSSC is based on the assignment. Let X = [x_ij] ∈