Unifying Single-Linkage and Max-Sum Clustering

Reza Bosagh Zadeh School of Computer Science Carnegie Mellon University, Pittsburgh, Pennsylvania, USA [email protected]

Shai Ben-David David R. Cheriton School of Computer Science University of Waterloo, Waterloo, Ontario, Canada shai@cs.uwaterloo.ca

Abstract

To unify the motivating principles behind Max-Sum and Single-Linkage clustering, we consider an axiomatic approach to the theory of clustering. We discuss abstract properties of clustering functions, following the framework of Kleinberg [7]. By relaxing one of Kleinberg's clustering axioms, we sidestep his impossibility result and arrive at a consistent set of axioms. We suggest extending these axioms, aiming to provide an axiomatic taxonomy of clustering paradigms. Such a taxonomy should give users some guidance concerning the choice of the appropriate clustering paradigm for a given task. The main result of this paper is a set of abstract properties that characterize the Max-Sum and Single-Linkage clustering functions. These functions have traditionally been treated separately, as the principles motivating their use have never been unified. In the current paper we provide alternative axiomatic definitions for Max-Sum and Single-Linkage which shed light on their common underlying principles and will guide the user in deciding which, if either, is appropriate for a particular task.

1 Introduction

Clustering a set of comparable objects into groups is a natural task employed in many fields. Equipped with a measure of similarity between pairs of objects, the task is to find a partitioning of a set of objects that respects the given similarities. Under this definition, the clustering task is used in a wide variety of applications. It is, however, ill-defined: given a specific data set, there are often many different ways to partition the data in a way that adheres to the above description. Consequently, there is an overwhelming variety of clustering paradigms and techniques currently in use.

The majority of work in the study of clustering has so far been in the context of concrete methods, concrete data-generative models, or concrete data sets. Only recently has the community started considering clustering from a more general perspective, by looking into unifying principles that are common to all clustering paradigms and by comparing different clustering paradigms in terms of their characterizing features. Our research aims to address these higher-level aspects of clustering.

We define a clustering function as one that satisfies a set of axioms; this is often termed an axiomatic framework. Functions that satisfy the basic axioms required to be called a clustering function can then be further classified according to other properties, to provide users with guidance concerning which such function best suits their application. Kleinberg [7] sets forth three properties that one may expect all clustering functions to satisfy, but then proves that no function can satisfy all three. By shifting the formalism of that discussion to clustering quality measures, [1] concludes that Kleinberg's impossibility result is more an artifact of the formalism and framework in which the result was stated, rather than a natural and inherent

contradiction in our intuition about clustering. In the current paper, we circumvent Kleinberg's impossibility result by relaxing one of his axioms. We then propose some additional abstract properties of clustering functions and obtain two uniqueness theorems, showing that only the Single-Linkage and Max-Sum functions satisfy the respective resulting sets of properties. Our result highlights the aspects of a similarity measure over a data set that Single-Linkage and Max-Sum focus on.

Our framework uses the axioms proposed by Kleinberg [7]; however, we circumvent the impossibility result by restricting our attention to clustering algorithms that split the data into two parts. It is known that such a restriction suffices to render the resulting set of axioms consistent. In fact, many of the common clustering algorithms, such as Single-Linkage and Max-Sum, satisfy this relaxed set of axioms.

We focus on the underlying principles behind the Single-Linkage and Max-Sum clustering functions. These functions have traditionally been treated separately, as the principles motivating their use have not been unified. Single-Linkage is often used when one needs a hierarchical clustering algorithm, while Max-Sum is the result of optimization with respect to some objective function. In the current paper we provide alternative axiomatic definitions for Max-Sum and Single-Linkage which shed light on their common underlying principles and will guide the user in deciding which, if either, is appropriate for a particular task.

To characterize Max-Sum and Single-Linkage, we will use two well-known tree constructions. Given a graph G, we are interested in

• the Maximum Spanning Tree (MST) of G, for characterizing Single-Linkage, and

• the Minimum Cut Tree (MCT) of G, for characterizing Max-Sum.

Both constructions are defined formally in Section 1.2. We show that Max-Sum and Single-Linkage are axiomatically identical, except for which of the above trees they use in the course of their operation. Furthermore, we show that they are the only clustering functions determined by those trees.

This paper is concerned with partitioning the data into two clusters. It is clear that fixing the number of clusters to 2 beforehand restricts the set of functions that can be described by the framework. However, one interpretation of Kleinberg's impossibility result is that if one does not give algorithms the number of clusters they are to return, then the algorithm must perform some unintuitive operations.

1.1 Background

As mentioned above, our formal framework is based on Kleinberg's [7]. We also adopt two of the three axioms proposed in that paper, Consistency and Scale-Invariance. We replace the Richness axiom of [7] by its version for the case of a fixed number of clusters, 2-Richness. An axiomatic characterization of Single-Linkage is available in [8], but it provides no characterization of Max-Sum. [3] present some algorithms which use Minimum Cut Trees for the purpose of clustering, but they do not discuss axioms or uniqueness theorems. Another related axiomatic approach is the work of Jardine and Sibson [6]. They constrain their view of clustering functions and only consider hierarchical functions. Our collection of properties is drastically different from that of [6] and does not restrict the formalism to hierarchical functions.

1.2 Formal Preliminaries

A partitioning function acts on a set S of n ≥ 2 points and the pairwise similarities among the points in S. The points in S are not assumed to belong to any specific set; the pairwise similarities are the only data the partitioning function has about them. Since we wish to deal with point sets that do not necessarily belong to a specific set, we identify the points with the set S = {1, 2, . . . , n}. We can then define a similarity function to be any function s : S × S → R such that for distinct i, j ∈ S we have s(i, j) > 0 and s(i, j) = s(j, i), while s(i, j) is undefined if and only if i = j. In other words, s must be positive and symmetric, and a point is treated as infinitely similar to itself.

Sometimes we write s = ⟨e1, e2, . . . , em⟩, where m = n(n−1)/2, to mean the list of the edges that exist between all pairs of the n points. This list is always ordered by decreasing similarity. w(e) denotes the weight of edge e; if e connects two points i and j, then w(e) = s(i, j).

A partitioning function is a function F that takes a similarity function s on S × S and returns a 2-partitioning of S. A 2-partitioning of S is a collection of 2 non-empty disjoint subsets of S whose union is S. The sets in F(s) will be called its clusters. Two partitioning functions are equivalent if and only if they output the same partitioning for all values of s, i.e. they are functionally equivalent.

A natural representation for a similarity function s = ⟨e1, e2, . . . , em⟩ is a complete weighted graph Gs, whose n nodes correspond to our objects and whose edge weights correspond to the similarity scores assigned by s. When there is no ambiguity about which similarity function is being used, we drop the subscript and simply write G. Also, when it is natural to think of the complete graph for the similarity function, we simply use s to refer to the graph instead of Gs. In the next two subsections we define two trees, each associated with a particular s.
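Before turning to those trees, here is a minimal sketch of the graph representation just described. It is our own illustration, not part of the paper: the networkx library and the dict-of-pairs encoding of s are assumptions made for convenience.

import networkx as nx

def build_graph(n, sim):
    """Build the complete weighted graph G_s on the points {1, ..., n}.

    `sim` maps frozenset({i, j}) -> s(i, j) > 0 for distinct i, j;
    s(i, i) is left undefined, matching the definition above.
    """
    G = nx.Graph()
    G.add_nodes_from(range(1, n + 1))
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            G.add_edge(i, j, weight=sim[frozenset((i, j))])
    return G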

1.2.1 Minimum Cut Tree

Let G = (V, E, s) be an arbitrary weighted, undirected, connected graph with |V| = n. Two subsets A and B of V with A ∪ B = V and A ∩ B = ∅ define a cut in G; the sum of the weights of the edges crossing the cut is the cut value. The Minimum Cut Tree of G, call it MCT(G), is a tree which has the same nodes as G, but a different set of edges and weights. The edges of MCT(G) satisfy the following two properties:

• For every two nodes s, t in G, the minimum-weight cut in G that separates these points is the cut defined by the connected components that remain once the minimum-weight edge on the unique path from s to t in MCT(G) is deleted.

• For every two nodes s, t in G, the weight of the minimum-weight edge on the path connecting them in MCT(G) equals the minimum weight of a cut of G that separates these points.

Note that MCT(G) is not in general a subgraph of G. For every undirected graph, a minimum cut tree always exists [4]. These trees have been well known for decades and have a rich history; they were introduced by Gomory and Hu [4]. Minimum Cut Trees are not always unique, so we define a canonical MCT function which fixes a particular ordering on pairs of points and uses the algorithm defined in [4]. Under these conditions, the output of the Gomory-Hu algorithm is unique, and we denote its value MCT(G).
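For readers who wish to experiment, networkx ships an implementation of the Gomory-Hu construction. The sketch below is ours, assuming the build_graph helper above with similarities stored under the 'weight' key; it also shows how the second property recovers a minimum s-t cut value. Note that networkx makes its own tie-breaking choices, so when minimum cuts are not unique the returned tree need not coincide with the canonical MCT fixed above.

import networkx as nx

def min_cut_tree(G):
    # networkx's Gomory-Hu routine; `capacity` names the edge attribute.
    return nx.gomory_hu_tree(G, capacity='weight')

def min_st_cut_value(T, s, t):
    # Second property: the minimum s-t cut value in G equals the
    # smallest edge weight on the unique s-t path in T = MCT(G).
    path = nx.shortest_path(T, s, t)
    return min(T[u][v]['weight'] for u, v in zip(path, path[1:]))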

1.2.2 Maximum Spanning Tree

Given a connected, undirected, weighted graph G, a spanning tree of G is a subgraph which is a tree and connects all the vertices. A single graph can have many different spanning trees. The weight of a spanning tree is the sum of the weights of its edges. A Maximum Spanning Tree (MST) is a spanning tree with weight greater than or equal to the weight of every other spanning tree. Like Minimum Cut Trees, MSTs have a rich history, and they can be computed efficiently by Kruskal's algorithm. Maximum Spanning Trees are not always unique, but once we fix an edge ordering, it is well known that they are unique for a particular G. We denote the canonical unique Maximum Spanning Tree of a graph by MST(G).
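As a sketch (again ours, under the same networkx conventions), the canonical tree can be computed as follows; networkx's Kruskal-based routine breaks ties by its internal edge order, which plays the role of the fixed edge ordering assumed in the text.

import networkx as nx

def max_spanning_tree(G):
    # Kruskal's algorithm on similarities, taking heaviest edges first.
    return nx.maximum_spanning_tree(G, weight='weight')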

1.2.3 Single-Linkage

Single-Linkage is the clustering function which starts with all points in singleton clusters and successively merges clusters until only 2 clusters remain, where the similarity of two clusters is the similarity of the two most similar points, one from each cluster. Single-Linkage is made formal in Algorithm 1.

Algorithm 1 SINGLE-LINKAGE(s)
Input: s = ⟨e1, e2, . . . , em⟩, the edges ordered by decreasing similarity.
Output: The Single-Linkage 2-partitioning.
  Γ ← {{1}, {2}, . . . , {n}}
  i ← 1
  while |Γ| > 2 do
      let x, y be the two ends of ei
      let cx ∈ Γ, cy ∈ Γ be the clusters of x and y
      if cx ≠ cy then
          Γ ← (Γ \ {cx, cy}) ∪ {cx ∪ cy}    (merge cx and cy)
      end if
      i ← i + 1
  end while
  output Γ
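The following is a small runnable Python rendering of Algorithm 1. It is a sketch: the (weight, i, j) triple encoding of the edge list is our choice for convenience, not the paper's.

def single_linkage(n, edges):
    """Single-Linkage 2-partitioning of {1, ..., n} from (w, i, j) triples."""
    cluster = {i: frozenset([i]) for i in range(1, n + 1)}  # singletons
    clusters = set(cluster.values())
    # Walk the edges in order of decreasing similarity, merging the two
    # incident clusters, until only 2 clusters remain.
    for _, x, y in sorted(edges, reverse=True):
        if len(clusters) == 2:
            break
        cx, cy = cluster[x], cluster[y]
        if cx != cy:
            merged = cx | cy
            clusters -= {cx, cy}
            clusters.add(merged)
            for p in merged:
                cluster[p] = merged
    return clusters

For example, single_linkage(3, [(9, 1, 2), (2, 1, 3), (1, 2, 3)]) returns {frozenset({1, 2}), frozenset({3})}, since points 1 and 2 are the most similar pair.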

Another way to compute the Single-Linkage 2-partitioning is to cut the smallest edge of the Maximum Spanning Tree of Gs [5]. It should be noted that the behavior of Single-Linkage is robust against small fluctuations in the edge weights of s, so long as the order of the edges does not change; this can be seen readily from Algorithm 1.
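That MST shortcut is easy to express with the networkx helpers assumed earlier; the sketch below (ours) cuts the smallest edge of the maximum spanning tree and reads off the two components.

import networkx as nx

def single_linkage_via_mst(G):
    T = nx.maximum_spanning_tree(G, weight='weight')
    u, v, _ = min(T.edges(data='weight'), key=lambda e: e[2])  # smallest MST edge
    T.remove_edge(u, v)
    return {frozenset(c) for c in nx.connected_components(T)}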

1.2.4 Max-Sum

The objective of Max-Sum is to maximize

    Λ_s(Γ) = Σ_{i,j ∈ A} s(i, j) + Σ_{i,j ∈ B} s(i, j)

over all 2-partitionings Γ = {A, B}. This objective is equivalent to the balanced k-median and graph-cut objective functions [2, 9]. Finding the optimal k-partitioning is NP-hard when k > 2. Notice that finding the optimal Max-Sum 2-partitioning is the same as finding the overall minimum cut of Gs, as the problems are duals: the total edge weight is fixed, so maximizing the weight kept inside the two clusters minimizes the weight cut between them. Since finding the overall minimum cut is equivalent to cutting the smallest edge of the Minimum Cut Tree, Max-Sum can be reinterpreted as the algorithm which cuts the smallest edge of the Minimum Cut Tree of Gs.
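Concretely, the 2-cluster case can therefore be computed in polynomial time with any global minimum cut routine. The sketch below (ours) uses the Stoer-Wagner implementation in networkx; cutting the smallest edge of min_cut_tree(G) from the earlier sketch would give the same partitioning.

import networkx as nx

def max_sum(G):
    # The Max-Sum 2-partitioning is the complement of the global min cut.
    cut_value, (A, B) = nx.stoer_wagner(G, weight='weight')
    return frozenset(A), frozenset(B)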

2 Uniqueness Results

2.1 Axioms

Now, in an effort to distinguish clustering functions from partitioning functions, we lay down some properties that one may like a clustering function to satisfy. Here is the first one. If s is a similarity function, then we define α · s to be the same function with all similarities multiplied by α.

SCALE-INVARIANCE. For any similarity function s and scalar α > 0, we have F(s) = F(α · s).

This property simply requires the function to be immune to stretching or shrinking the data points linearly. It effectively disallows clustering functions from being sensitive to changes in units of measurement, which is desirable: we would like clustering functions not to have any predefined, hard-coded similarity values in their decision process. This property was initially used in an axiomatic clustering framework by [6].

The next property ensures that the clustering function is "rich" in the types of partitioning it can output. For a fixed S, let Range(F(•)) be the set of all possible outputs of F while varying s.

2-RICHNESS. Range(F(•)) is equal to the set of all 2-partitionings of S.

In other words, if we are given a set of points such that all we know about the points is pairwise similarities, then for any partitioning Γ there should exist an s such that F(s) = Γ. By varying similarities amongst points, we should be able to obtain all possible 2-partitionings.

The next property is more subtle and was initially introduced in [7], along with richness. We call a partitioning function "consistent" if it satisfies the following: when we increase similarities between points in the same cluster and decrease similarities between points in different clusters, we get the same result. Formally, we say that s′ is a Γ-transformation of s if (a) for all i, j ∈ S belonging to the same cluster of Γ, we have s′(i, j) ≥ s(i, j); and (b) for all i, j ∈ S belonging to different clusters of Γ, we have s′(i, j) ≤ s(i, j). In other words, s′ is a transformation of s such that points inside the same cluster are made more similar and points not inside the same cluster are made less similar.

CONSISTENCY. Let s be a similarity function, and s′ an F(s)-transformation of s. Then F(s) = F(s′).

In other words, suppose that we run the partitioning function F on s to get back a particular partitioning Γ. Now, with respect to Γ, if we expand in-cluster similarities or shrink between-cluster similarities and run F again, we should still get back the same result, namely Γ.

The difference between these and the properties defined in [7] is that at all times the partitioning function F is forced to return a fixed number of clusters; if this were not the case, then the above properties could never be simultaneously satisfied by any function. In most popular clustering algorithms, such as k-means, Single-Linkage, and spectral clustering, the number of clusters to be returned is determined beforehand, by the human user or other methods, and passed into the clustering function as a parameter. Kleinberg mentions this fact, but since the goal of [7] was an impossibility result, k was not provided to partitioning functions, thereby making the properties unsatisfiable.

Now we are ready to give our definition of a clustering function.

Definition 1. A Clustering Function is a partitioning function that satisfies Consistency, Scale-Invariance, and 2-Richness.

We have already introduced two clustering functions.

Theorem 2. Single-Linkage and Max-Sum are clustering functions.

Proofs are available in [8]. A natural question to ask is what types of properties distinguish these two clustering functions from one another. We introduce two new properties that one may not expect all clustering functions to satisfy, but that may at times be desirable.
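To make the axioms concrete, here is a small empirical harness, purely our own illustration: F stands for any candidate partitioning function on the weighted networkx graphs used earlier, and the numeric factors are arbitrary test values. 2-Richness, being a statement over all possible inputs, cannot be checked this way.

def rescale(G, alpha):
    # alpha * s: multiply every similarity by a positive scalar.
    H = G.copy()
    for u, v in H.edges:
        H[u][v]['weight'] *= alpha
    return H

def gamma_transform(G, partition, grow=2.0, shrink=0.5):
    # An F(s)-transformation: in-cluster similarities may only grow,
    # between-cluster similarities may only shrink.
    cluster_of = {p: k for k, c in enumerate(partition) for p in c}
    H = G.copy()
    for u, v in H.edges:
        H[u][v]['weight'] *= grow if cluster_of[u] == cluster_of[v] else shrink
    return H

def passes_checks(F, G):
    norm = lambda P: {frozenset(c) for c in P}
    base = norm(F(G))
    scale_ok = all(norm(F(rescale(G, a))) == base for a in (0.5, 3.0))
    consistency_ok = norm(F(gamma_transform(G, F(G)))) == base
    return scale_ok and consistency_ok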

2.2 Properties

For ease of notation, let MST(s) be the Maximum Spanning Tree of Gs, and let MCT(s) be the Minimum Cut Tree of Gs. Now we are ready to define two distinguishing properties.

MST-CONSISTENCY. If s and s′ are similarity functions such that MST(s) and MST(s′) have the same minimum cut, then F(s) = F(s′).

In other words, a clustering function is MST-Consistent if it makes all its decisions based on the Maximum Spanning Tree. Single-Linkage obviously satisfies this property, since one definition of Single-Linkage cuts the smallest edge of the MST. Note that this is a property, not an axiom: we do not expect all clustering functions to make their decisions using the Maximum Spanning Tree. Later in this section we show that Single-Linkage is the only clustering function that is MST-Consistent.

We define MCT-Consistency identically, with MST replaced by MCT.

MCT-CONSISTENCY. If s and s′ are similarity functions such that MCT(s) and MCT(s′) have the same minimum cut, then F(s) = F(s′).

This property forces a clustering function to make all its decisions based on the Minimum Cut Tree. Max-Sum satisfies this property, since it always cuts the smallest edge of the Minimum Cut Tree. We show that Max-Sum is the only clustering function that is MCT-Consistent.
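In code, the summary each property allows a function to depend on can be sketched as follows (ours, reusing the earlier helpers and assuming canonical tie-breaking): an MST-Consistent or MCT-Consistent function must return the same output on any two inputs whose summaries coincide.

import networkx as nx

def tree_min_cut(T):
    # The minimum cut of a tree: delete its smallest edge and return
    # the induced 2-partitioning of the nodes.
    u, v, _ = min(T.edges(data='weight'), key=lambda e: e[2])
    T = T.copy()
    T.remove_edge(u, v)
    return {frozenset(c) for c in nx.connected_components(T)}

# MST summary: tree_min_cut(nx.maximum_spanning_tree(G, weight='weight'))
# MCT summary: tree_min_cut(nx.gomory_hu_tree(G, capacity='weight'))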

Notice that both MST-Consistency and MCT-Consistency imply Scale-Invariance: if a function is either MST-Consistent or MCT-Consistent, then it is also Scale-Invariant, since scaling all similarities by α > 0 leaves the structure of both trees, and hence their minimum cuts, unchanged. For this reason, whenever a function satisfies {MCT, MST}-Consistency, we omit mention of Scale-Invariance.

2.3 Uniqueness Theorems

Shortly, we will show that Single-Linkage and Max-Sum are uniquely characterized by MST-Consistency and MCT-Consistency, respectively. Before doing so, we reflect on the relationships between subsets of our axioms and properties. We will show that if any of the axioms or properties we use for proving uniqueness were missing, then the uniqueness results would not hold. In other words, all of the axioms and properties are necessary to prove uniqueness of Single-Linkage and Max-Sum.

Theorem 3. Consistency, MCT-Consistency, and 2-Richness are necessary to characterize Max-Sum.

Proof. For each of the mentioned properties, we exhibit a partitioning function that acts differently from Max-Sum and satisfies all of the properties except that one; thus the remaining properties do not by themselves uniquely characterize Max-Sum.

Consistency is necessary. We define the Minimum Cut Tree Cuts (MCT-Cuts) family of partitioning functions. As usual, the task is to partition n points into 2 clusters. Let σ be a permutation of the set of all 2-partitionings of S. A particular member of the MCT-Cuts family computes the Max-Sum 2-partitioning, then applies σ to the output of Max-Sum. The entire family is obtained by varying the permutation used. Since Max-Sum is 2-Rich and σ is a bijection, all members of MCT-Cuts are also 2-Rich. Since Max-Sum is MCT-Consistent and σ does not look at the input, all members of MCT-Cuts are also MCT-Consistent. However, it is not true that all members of MCT-Cuts are Consistent. This is implied by Theorem 5, which says the only Consistent member of MCT-Cuts is Max-Sum itself, i.e. the case where σ is the identity permutation.

MCT-Consistency is necessary. Single-Linkage satisfies Consistency, Scale-Invariance, and 2-Richness, but is obviously not the same function as Max-Sum. Thus MCT-Consistency is necessary.

2-Richness is necessary. Consider the Constant clustering function, which always returns the first n − 1 elements of S as a single cluster and the remaining point as a singleton cluster (a singleton is a cluster with a single point in it), making a total of 2 clusters. Because this function does not look at s, it is trivially MST-Consistent, MCT-Consistent, Consistent, and Scale-Invariant. However, it is not 2-Rich: its range contains only a single partitioning, so, for instance, we could never reach a 2-partitioning that has no singletons.

Theorem 4. Consistency, MST-Consistency, and 2-Richness are necessary to characterize Single-Linkage.

Proof. As for Max-Sum, for each of the mentioned properties we show that the other properties are not enough to uniquely characterize Single-Linkage.

Consistency is necessary. The Maximum Spanning Tree Cuts (MST-Cuts) family of partitioning functions introduced in [8] satisfies MST-Consistency, Scale-Invariance, and 2-Richness, but not Consistency. Thus, leaving out Consistency would be detrimental to any uniqueness theorem for Single-Linkage.

MST-Consistency is necessary. Max-Sum satisfies Consistency, Scale-Invariance, and 2-Richness, but is obviously not the same function as Single-Linkage. Thus MST-Consistency is necessary.

2-Richness is necessary. The Constant clustering function described in the proof of Theorem 3 witnesses this claim.

A summary of these results is available in Table 1.
Now that we have seen that our properties do not trivially characterize either Single-Linkage or Max-Sum, we can move on to proving the uniqueness theorems.

                       | Consistency  2-Richness | MST-Consistency  MCT-Consistency
Single-Linkage         |      ✓           ✓      |        ✓                ✗
Max-Sum                |      ✓           ✓      |        ✗                ✓
-----------------------+-------------------------+---------------------------------
MST cuts family        |      ✗           ✓      |        ✓                ✗
MCT cuts family        |      ✗           ✓      |        ✗                ✓
Constant partitioning  |      ✓           ✗      |        ✓                ✓

Table 1: Overview of the discussed partitioning functions. Even if one were to consider more partitioning functions, as a consequence of Theorems 5 and 6 the Single-Linkage and Max-Sum rows are unique amongst all partitioning functions. The horizontal divide separates clustering functions from other partitioning functions; the vertical divide separates axioms from properties.

Theorem 5. Max-Sum is the only MCT-Consistent Clustering function.

Proof. Let F be any Consistent, 2-Rich, MCT-Consistent clustering function, and let s be any similarity function on n points. We want to show that for all s, F(s) = MS(s), where MS(s) is the result of Max-Sum on s. For this purpose, let Γ be the output of MS on s, so MS(s) = Γ. Whenever we say "inner" or "outer" edge in this proof, we mean with respect to Γ. Through a series of transformations that preserve the output of F, we transform s_1 into s_2, then s_2 into s_3, and so on, until we arrive at s.

1. By 2-Richness, we know there exists an s_1 such that F(s_1) = MS(s) = Γ.

2. Let ε denote the smallest similarity value in s, and let there be t outer edges in s. For each of the t outer edges e, we shrink e until it has similarity ε/n². This maintains the output of F, by Consistency, since ε/n² is guaranteed to be smaller than the original similarity value and e is an outer edge. We perform this operation t times and arrive at F(s_{t+1}) = Γ.

3. By Consistency, we shrank all t outer edges of s_1 until they formed the unique global minimum cut of the resulting similarity function, which we called s_{t+1}. Note that here we assume the global minimum cut of s is unique; otherwise Max-Sum would be undefined and we would be done. Also, it is important that the similarity values are positive; otherwise we would not have been able to shrink them enough to obtain a unique global minimum cut for s_{t+1}.

4. Now consider the minimum cut tree of s_{t+1}. The smallest edge of MCT(s_{t+1}) must correspond to the global minimum cut of s_{t+1}, and by the properties of Minimum Cut Trees, cutting that edge gives the two sides of the cut in G_{s_{t+1}}. Since the global minimum cuts of s_{t+1} and s were made the same in the previous step, we can invoke MCT-Consistency and transform s_{t+1} into s while maintaining the output of F, calling the result s_{t+2} with F(s_{t+2}) = Γ.

5. Thus we have F(s_{t+2}) = F(s) = Γ.

We started with an arbitrary s and showed that F(s) = Γ = MS(s). We also know that Max-Sum satisfies all three axioms and MCT-Consistency. Thus it is uniquely characterized.

Theorem 6. Single-Linkage is the only MST-Consistent Clustering function.

Proof. [8] prove a uniqueness theorem which uses the following two properties:

• ORDER-CONSISTENCY. For any two similarity functions s and s′, if the order of edges in s is the same as the order of edges in s′, then F(s) = F(s′).

• MST-COHERENCE. If s and s′ are similarity functions such that MST(s) = MST(s′), then F(s) = F(s′).

We show that if a function F satisfies MST-Consistency, then it also satisfies Order-Consistency and MST-Coherence.

First consider an Order-Consistent change to the input s. Since the order of edges is preserved, so is the structure of the maximum spanning tree of s, by Kruskal's algorithm. Thus the smallest edge of the MST of s does not change, and since the minimum cut in a tree is its smallest edge, the minimum cut of MST(s) also remains intact. Therefore, any Order-Consistent change to s is also an MST-Consistent change to s, so MST-Consistency implies Order-Consistency.

Now consider MST-Coherent changes to s. Clearly, if the MST of s remains intact, so does the minimum cut of the MST of s. Therefore, any MST-Coherent change to s is also an MST-Consistent change to s, so MST-Consistency implies MST-Coherence.

[8] prove that the only function simultaneously satisfying Consistency, 2-Richness, MST-Coherence, and Order-Consistency is Single-Linkage. Since MST-Consistency implies MST-Coherence and Order-Consistency, we are done.

3 Conclusions & Future Directions

In this paper we take a step towards the important and ambitious goal of developing a general theory of clustering. More concretely, we have unified the underlying principles motivating two popular clustering algorithms which have traditionally been treated separately, with differing motivating principles. Using our framework, one can aim to build a suite of abstract properties of clustering that induces a taxonomy of clustering paradigms. Such a taxonomy should help users utilize prior domain knowledge to make an educated choice of a clustering method appropriate for a given clustering task.

At this point, we present a consistent axiomatic basis for clustering and, under that framework, offer a concrete set of properties that characterize the Single-Linkage and Max-Sum clustering algorithms. Our contribution provides new insight into the connection between the Single-Linkage and Max-Sum clustering functions, as we show they are axiomatically identical except for one property. These uniqueness results are set in a framework similar to one previously used for the purpose of showing impossibility, demonstrating that there is no inherent impossibility in formalizing clustering. The listing in Table 1 demonstrates the type of taxonomy of clustering functions we desire, based on the properties each function satisfies. To investigate the ramifications of algorithms that satisfy only a subset of our properties, we introduced the Minimum Cut Tree Cuts family of partitioning functions, of which Max-Sum is the only Consistent member. The uniqueness theorems came about as a result of forcing functions to focus on Minimum Cut Trees and Maximum Spanning Trees.

We hope that this work spurs new research in building a more solid theory of clustering, especially an axiomatic theory. Valuable future directions include the addition of new clustering functions and properties to Table 1. We have characterized Single-Linkage and Max-Sum in a useful way, leaving the characterization of other clustering functions for future work.

Acknowledgments

Anonymous for review.


References

[1] M. Ackerman and S. Ben-David. Measures of clustering quality: A working set of axioms for clustering. In Advances in Neural Information Processing Systems: Proceedings of the 2008 Conference, 2008.

[2] Y. Bartal, M. Charikar, and D. Raz. Approximating min-sum k-clustering in metric spaces. In Proceedings of the Thirty-Third Annual ACM Symposium on Theory of Computing, pages 11-20. ACM, 2001.

[3] G.W. Flake, R.E. Tarjan, and K. Tsioutsiouliklis. Graph clustering and minimum cut trees. Internet Mathematics, 1(4):385-408, 2004.

[4] R.E. Gomory and T.C. Hu. Multi-terminal network flows. Journal of the Society for Industrial and Applied Mathematics, pages 551-570, 1961.

[5] J.C. Gower and G.J.S. Ross. Minimum spanning trees and single linkage cluster analysis. Applied Statistics, 18(1):54-64, 1969.

[6] N. Jardine and R. Sibson. The construction of hierarchic and non-hierarchic classifications. Multivariate Statistical Methods, Among-groups Covariation, 1975.

[7] J. Kleinberg. An impossibility theorem for clustering. In Advances in Neural Information Processing Systems 15: Proceedings of the 2002 Conference. MIT Press, 2003.

[8] Anne Nonymous. An anonymous paper for review. In The Magical Conference, 2012.

[9] U. von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395-416, 2007.

