Pattern Recognition Prof. Christian Bauckhage
outline additional material for lecture 13
general advice for data clustering
note
clustering is generally an ill-posed problem
note
different kinds of data require different kinds of clustering algorithms ⇔ there is no one-size-fits-all solution for clustering
⇒ check what kind of implicit assumptions your favorite algorithm makes and verify whether or not they apply to the problem at hand
note
prototype-based clustering algorithms (such as k-means) crucially depend on their initialization ⇔ different runs on the same data will very likely produce different results
⇒ when using prototype-based clustering, always run the algorithm repeatedly and keep the “best” result
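example
the advice to run repeatedly and keep the “best” result could be sketched as follows. This is a minimal illustration, not the lecture's reference implementation: it assumes that “best” means the lowest within-cluster sum of squared distances (SSE), that initial centroids are drawn at random from the data, and that the toy data and the names kmeans and best_of_n_runs are made up for this sketch.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm from a random initialization.

    Returns (centroids, labels, sse), where sse is the within-cluster
    sum of squared distances.
    """
    centroids = rng.sample(points, k)  # assumption: init with k random data points
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # keep the old centroid if a cluster ran empty
            updated.append(tuple(sum(c) / len(members) for c in zip(*members))
                           if members else centroids[j])
        if updated == centroids:  # no centroid moved: converged
            break
        centroids = updated
    labels = assign(points, centroids)
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse


def best_of_n_runs(points, k, n_runs=10, seed=0):
    """Run k-means n_runs times; return the lowest-SSE run and all SSE values."""
    rng = random.Random(seed)
    runs = [kmeans(points, k, rng) for _ in range(n_runs)]
    best = min(runs, key=lambda run: run[2])
    return best, [run[2] for run in runs]


# toy data: three Gaussian blobs in the plane (made up for illustration)
data_rng = random.Random(42)
points = [(data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
          for cx, cy in [(0, 0), (5, 0), (0, 5)] for _ in range(50)]

(centroids, labels, sse), all_sse = best_of_n_runs(points, k=3)
print("SSE per run:", [round(s, 1) for s in all_sse])
print("best SSE   :", round(sse, 1))
```

printing the SSE of every run makes the dependence on initialization visible; any spread between the values is the reason a single run should not be trusted.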
note
cluster quality measures or cluster quality indices suggest a form of objectivity that is hardly ever justified
⇒ do not put too much faith in cluster quality measures
question how to choose k in k-means clustering?
answer if you perform cluster analysis to assist human experts, set k ≈ 7 ± 2 (it simply does not make sense to report to management that their customers can be grouped into 152 clusters)
if you perform cluster analysis to facilitate subsequent computations, determine experimentally whether large or small values lead to better results
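example
determining k experimentally could look like the sketch below. It rests on several assumptions that are not part of the lecture: the subsequent computation is taken to be a prototype-based classifier (each centroid receives the majority label of its training points), quality is measured as accuracy on a held-out validation set, and the toy data, the candidate values, and the names downstream_accuracy and choose_k are hypothetical.

```python
import random
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))


def kmeans(points, k, rng, max_iter=100):
    """Lloyd's algorithm from a random initialization; returns the centroids."""
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # keep the old centroid if a cluster ran empty
            updated.append(tuple(sum(c) / len(members) for c in zip(*members))
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids


def downstream_accuracy(train, val, k, rng):
    """Assumed downstream task: classify by the majority label of the nearest centroid."""
    centroids = kmeans([x for x, _ in train], k, rng)
    votes = [Counter() for _ in range(k)]
    for x, y in train:
        votes[nearest(x, centroids)][y] += 1
    # a centroid without training points gets no label and never scores a hit
    proto_label = [v.most_common(1)[0][0] if v else None for v in votes]
    hits = sum(proto_label[nearest(x, centroids)] == y for x, y in val)
    return hits / len(val)


def choose_k(train, val, candidates, seed=0):
    """Evaluate every candidate k on the downstream task and return the best one."""
    rng = random.Random(seed)
    scores = {k: downstream_accuracy(train, val, k, rng) for k in candidates}
    return max(scores, key=scores.get), scores


# toy data: two classes, each made of two blobs in an XOR layout (made up)
data_rng = random.Random(1)
blobs = [((0, 0), 0), ((5, 5), 0), ((5, 0), 1), ((0, 5), 1)]
samples = [((data_rng.gauss(cx, 0.7), data_rng.gauss(cy, 0.7)), y)
           for (cx, cy), y in blobs for _ in range(40)]
data_rng.shuffle(samples)
train, val = samples[:120], samples[120:]

best_k, scores = choose_k(train, val, candidates=[1, 2, 4, 8, 16])
print("validation accuracy per k:", scores)
print("selected k:", best_k)
```

since each k-means run depends on its initialization, a more careful experiment would repeat the runs for every k and compare averaged scores rather than single values.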