Lecture Notes on Data Science: Online k-Means Clustering
Christian Bauckhage
B-IT, University of Bonn

In this note, we discuss an approach to k-means clustering that allows for application in online settings where the data arrive one at a time.
Introduction

In practice, we often resort to k-means clustering¹ not because we want to determine clusters but because we are interested in vector quantization.

¹ C. Bauckhage. Lecture Notes on Data Science: k-Means Clustering, 2015c. DOI: 10.13140/RG.2.1.2829.4886

That is, given a set of n data points

    X = {x_1, ..., x_n},   x_j ∈ ℝ^m,                                   (1)

the goal is to compute a codebook

    V = {v_1, ..., v_k},   v_i ∈ ℝ^m,                                   (2)
of k ≪ n prototypes or codebook vectors. On the one hand, this allows for data compression because appropriate codebooks can represent possibly many data points in terms of only a few prototypes. On the other hand, prototypes are used for feature learning, hierarchical information retrieval, or nearest neighbor classification, to name but a few applications².

Given our earlier discussions, it is easy to see that k-means clustering provides a solution to the vector quantization problem. Recall that when we use it to partition X into k clusters C_1, ..., C_k, the algorithm initializes k centroids µ_1, ..., µ_k, determines clusters

    C_i = { x_j ∈ X | ‖x_j − µ_i‖² ≤ ‖x_j − µ_l‖² ∀ l ≠ i },             (3)

updates the centroids

    µ_i = (1/n_i) ∑_{x_j ∈ C_i} x_j    where    n_i = |C_i|,             (4)
and repeats steps (3) and (4) until convergence. Upon convergence, each centroid thus indicates the mean of a cluster of data points and can therefore be considered a prototype of the data in X.

However, in vector quantization, we are not actually interested in the clusters C_i but only in the prototypes µ_i. Step (3) therefore appears to introduce overhead, and the question is:

Q: Is there an algorithm that determines the centroids µ_i but avoids the computation of the clusters C_i?

Below, we will answer this question affirmatively: yes, such an algorithm exists. Moreover, as it avoids (3), it does not require prior knowledge of the full data set X and thus applies to settings where X is not known in advance but the data arrive one at a time. It is therefore commonly known as online k-means clustering.
² Interesting examples can be found in: A. Coates and A.Y. Ng. Learning Feature Representations with K-Means. In G. Montavon, G.B. Orr, and K.-R. Müller, editors, Neural Networks: Tricks of the Trade, volume 7700 of LNCS. Springer, 2012; and D. Nister and H. Stewénius. Scalable Recognition with a Vocabulary Tree. In Proc. CVPR, 2006.
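To make the batch procedure we just recalled concrete, here is a minimal NumPy sketch of the iteration over steps (3) and (4); note that the function name batch_kmeans, the random initialization, and the convergence test are our own choices for illustration and are not part of the original notes.

import numpy as np

def batch_kmeans(X, k, n_iter=100, seed=0):
    # Lloyd-style k-means: alternate steps (3) and (4) until the centroids stop moving.
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # step (3): assign each x_j to its closest centroid
        d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # step (4): recompute each centroid as the mean of its cluster
        new_mu = np.array([X[labels == i].mean(axis=0) if np.any(labels == i) else mu[i]
                           for i in range(k)])
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    return mu, labels

Note that this sketch needs the full data set X up front, which is exactly the requirement the online variant discussed next removes.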
Online k-Means Clustering

The fundamental observation for the design of an online k-means algorithm is that sample means can be computed recursively. To see how, let us consider the mean µ^(n) of a sample of n points {x_1, ..., x_n} and observe that

    µ^(n) = (1/n) ∑_{j=1}^{n} x_j
          = (1/n) ∑_{j=1}^{n−1} x_j + (1/n) x_n
          = ((n−1)/n) µ^(n−1) + (1/n) x_n.                                (5)
This is to say that the mean µ^(n) is a convex combination of the previous mean µ^(n−1) and the new data point x_n. After another truly straightforward algebraic manipulation, we also realize that

    ((n−1)/n) µ^(n−1) + (1/n) x_n = µ^(n−1) − (1/n) µ^(n−1) + (1/n) x_n       (6)

so that the mean µ^(n) in (5) can be written as

    µ^(n) = µ^(n−1) + (1/n) (x_n − µ^(n−1)).                                  (7)
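As a quick sanity check of (7) (not part of the original notes), the following sketch feeds a simulated stream of points through the recursive update and confirms that it reproduces the batch mean:

import numpy as np

rng = np.random.default_rng(1)
stream = rng.normal(size=(1000, 3))          # a "stream" of 1000 points in R^3

mu = np.zeros(3)
for n, x_n in enumerate(stream, start=1):
    mu += (x_n - mu) / n                     # update rule (7)

print(np.allclose(mu, stream.mean(axis=0)))  # True: recursive and batch mean agree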
Looking at (7), we recognize an instance of the standard rule of competitive learning. That is, we can estimate the mean iteratively: once a new data point arrives, we move our current estimate slightly towards the new point. Regarding our problem of online k-means clustering, this then leads to the algorithm in Fig. 1.

Assuming a constant stream of data points x_j, the algorithm operates in two phases. In the initial phase, the first k data points are used to initialize k centroids µ_i; for each centroid, the number of points n_i it represents is also initialized to 1. In the second phase, each incoming data point x_j is compared to all centroids, and the centroid µ_i that is closest to x_j is updated according to (7). In addition, in order to register the fact that another data point has contributed to µ_i, the count n_i of points it represents is incremented.
    i ← 1
    for all x_j ∈ {x_1, x_2, ...} do
        if j ≤ k then
            // initialize centroids
            µ_i ← x_j
            n_i ← 1
            i ← i + 1
        else
            // determine winner centroid
            µ_i ← argmin_{µ_l} ‖x_j − µ_l‖²
            // update winner centroid and n_i
            µ_i ← µ_i + (1/(n_i + 1)) (x_j − µ_i)
            n_i ← n_i + 1

Figure 1: Online k-means.
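For readers who prefer running code, the following is a minimal Python/NumPy sketch of the procedure in Fig. 1; the class name OnlineKMeans and the use of plain Python lists for the centroids are our own choices, not part of the original notes.

import numpy as np

class OnlineKMeans:
    # Online k-means as in Fig. 1: the first k points become the centroids,
    # and every later point pulls its nearest centroid slightly towards itself.

    def __init__(self, k):
        self.k = k
        self.centroids = []   # centroid vectors mu_i
        self.counts = []      # n_i, the number of points represented by centroid i

    def update(self, x):
        x = np.asarray(x, dtype=float)
        if len(self.centroids) < self.k:
            # initial phase: use the first k data points as centroids
            self.centroids.append(x.copy())
            self.counts.append(1)
        else:
            # determine the winner centroid, i.e. the one closest to x
            mu = np.stack(self.centroids)
            i = int(((mu - x) ** 2).sum(axis=1).argmin())
            # move the winner towards x with step size 1/(n_i + 1), cf. (7)
            self.centroids[i] += (x - self.centroids[i]) / (self.counts[i] + 1)
            self.counts[i] += 1

# example usage on a simulated stream
rng = np.random.default_rng(0)
okm = OnlineKMeans(k=3)
for x in rng.normal(size=(10000, 2)):
    okm.update(x)

The step size 1/(n_i + 1) is the learning rate from Fig. 1: since n_i counts the points the winner already represents, the incoming point is its (n_i + 1)-th contribution, which makes the update consistent with (7).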
To conclude our discussion, we emphasize the following:

1. Should we want to use this algorithm for clustering rather than merely for vector quantization, we could, at any point during its execution, use the current centroid estimates µ_1, ..., µ_k to compute clusters according to (3).

2. It may be hard to believe that this simple procedure really works, but it does³.

3. However, our earlier caveats regarding suitable initializations and the tendency of k-means to produce Gaussian clusters⁴ of small variance⁵ still apply.

³ Here is a video to illustrate this point: www.youtube.com/watch?v=hzGnnx0k6es
⁴ C. Bauckhage. Lecture Notes on Data Science: k-Means Clustering Is Gaussian Mixture Modeling, 2015a. DOI: 10.13140/RG.2.1.3033.2646
⁵ C. Bauckhage. Lecture Notes on Data Science: k-Means Clustering Minimizes Within Cluster Variances, 2015b. DOI: 10.13140/RG.2.1.1292.4649
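Regarding the first concluding remark, cluster assignments can be recovered at any time by a single application of step (3) to the centroid estimates gathered so far; a small helper (ours, for illustration) might look as follows.

import numpy as np

def assign_clusters(X, centroids):
    # step (3): label each row of X with the index of its closest centroid
    mu = np.asarray(centroids, dtype=float)
    d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)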
References

C. Bauckhage. Lecture Notes on Data Science: k-Means Clustering Is Gaussian Mixture Modeling, 2015a. DOI: 10.13140/RG.2.1.3033.2646.
C. Bauckhage. Lecture Notes on Data Science: k-Means Clustering Minimizes Within Cluster Variances, 2015b. DOI: 10.13140/RG.2.1.1292.4649.
C. Bauckhage. Lecture Notes on Data Science: k-Means Clustering, 2015c. DOI: 10.13140/RG.2.1.2829.4886.
A. Coates and A.Y. Ng. Learning Feature Representations with K-Means. In G. Montavon, G.B. Orr, and K.-R. Müller, editors, Neural Networks: Tricks of the Trade, volume 7700 of LNCS. Springer, 2012.
D. Nister and H. Stewénius. Scalable Recognition with a Vocabulary Tree. In Proc. CVPR, 2006.