Interpreting Cluster Structure in Waveform Data with Visual Assessment and Dunn’s Index
Sara Mahallati, James C. Bezdek, Dheeraj Kumar, Milos R. Popovic, and Taufik A. Valiante
in Frontiers in Computational Intelligence, eds. S. Mostaghim, A. Nuernberger, C. Borgelt, 2017, Springer.
Abstract Dunn’s index was introduced in 1974 as a way to define and identify a “best” crisp partition on n objects represented by either unlabeled feature vectors or dissimilarity matrix data. This article examines the intimate relationship that exists between Dunn’s index, single linkage clustering, and a visual method called iVAT for estimating the number of clusters in the input data. The relationship of Dunn’s index to iVAT and single linkage in the labeled data case affords a means to better understand the utility of these three companion methods when data are crisply clustered in the unlabeled case (the real case). Numerical examples using simulated waveform data drawn from the field of neuroscience illustrate the natural compatibility of Dunn’s index with iVAT and single linkage. A second aim of this note is to study customizing the three methods by changing the distance measure from Euclidean distance to one that may be more appropriate for assessing the validity of crisp clusters of finite sets of waveform data. We present numerical examples that support our assertion that, when used collectively, the three methods afford a useful approach to evaluating crisp clusters in unlabeled waveform data.

Sara Mahallati and Milos R. Popovic
Institute of Biomaterials and Biomedical Engineering, University of Toronto, Canada, and Toronto Rehabilitation Institute, University Health Network, Toronto, Canada

Taufik A. Valiante
Institute of Biomaterials and Biomedical Engineering, University of Toronto, Canada, and Krembil Research Institute, University Health Network, Toronto, Canada

Dheeraj Kumar
Electrical and Electronic Engineering Department, University of Melbourne, Australia

James C. Bezdek
Computer Science and Information Systems Departments, University of Melbourne, Australia
Key words: internal cluster validity, Dunn’s index, customized Dunn’s indices, single linkage, iVAT, neuronal spike data
1 Introduction
Let O = {o1, ..., on} denote any set of n objects (hockey teams, airplanes, epilepsy patients, etc.). Two kinds of numerical data are used to represent O. Numerical object data (feature vector data) has the form X = {x1, ..., xn} ⊂