Fuzzy Clustering: a practical evolution approach

Luis Almeida CGD Email: luí[email protected]

Susana Nascimento DI and CENTRIA, FCT-UNL Email: [email protected]

Abstract: Clustering usually turns out to be an NP-hard problem, challenging the most efficient optimization methods and algorithms. Fuzzy clustering extends its usefulness and can be implemented by the same type of algorithms as classical clustering. In this paper, we present a general Evolution Algorithm framework that comprehends Genetic Algorithms, Differential Evolution, Particle Swarm Optimization and population variants of Fuzzy C-Means and Random Search, using the latter two for comparison, and analyse their performance on standard fuzzy clustering benchmarks.

Key words: evolution algorithms, genetic algorithms, differential evolution, fuzzy clustering, optimization

Abbreviations: EAs – evolutionary algorithms; ESs – evolution strategy algorithms; FCM – Fuzzy C-Means; GAs – genetic algorithms; PSO – Particle Swarm Optimization; RS – Random Search; TRW – Trace Within criterion

1. INTRODUCTION

Cluster analysis, or clustering, is an unsupervised classification method for a set of objects, based on identifying homogeneous groups (clusters) and summarising the characteristics shared within each group. Partitional clustering distributes the objects into a given number of clusters C in a well-defined and effective way, trying to minimize the total object dissimilarity within the clusters while maximizing it between distinct ones. Dissimilarity is described by a distance metric relevant to the underlying classification problem. In classical (a.k.a. hard or crisp) clustering each object is assigned to exactly one cluster, whereas in fuzzy clustering group membership is considered not a matter of affirmation or denial but rather a matter of degree [Bezdek, 81; Nascimento, 05]. In either case, the clustering is defined by a membership function and a set of C cluster prototypes: cluster centroids and/or other geometrical characteristics. If the clustering quality, or the goodness of the partition, is defined by a differentiable function, it is possible to define an iterative deterministic algorithm for finding local optima, like K-means [MacQueen, 67] or fuzzy C-means [Bezdek, 81], respectively for crisp and fuzzy clustering. For global optima, several reruns have to be performed or, alternatively, population-based stochastic algorithms can be used, like polyhedron search [Nelder and Mead, 65], controlled random search [Price, 78] and evolutionary algorithms, the main focus of this paper.

Evolutionary Algorithms (EAs) consist of gradually improving a population of individuals, in an iterative and non-deterministic way, using computational analogies of biological mechanisms. Each individual codifies a solution of the underlying optimization problem, and the EA solution is the fittest individual found, hopefully the global optimum. The iteration can be generational, where some individuals are replaced by their offspring, or positional, where each individual is replaced by a new one, corresponding to a new position of a trajectory. Common to all types of EAs is the fundamental principle of trying to balance exploration versus exploitation of the search space: exploring different individual transformations, controlling the population diversity, allowing some cooperation between individuals, applying the right amount of selective pressure in population transitions and, most important of all, obtaining an emergent behaviour suitable for finding the global optimum while avoiding premature stagnation over local optima. Evolutionary Programming [Fogel, 62], Evolution Strategies [Rechenberg, 73; Schwefel, 94], Genetic Algorithms (GAs) [Holland, 75; Goldberg, 89], Differential Evolution (DE) [Price and Storn, 95] and Particle Swarm Optimization (PSO) [Engelbrecht, 05] can be considered different types of EAs, even if their origins and inspirations differ. Implementations are usually application-domain dependent and extend the original proposals in different and sometimes hybrid ways. Even though EAs can be robust global optimizers, their complex and non-deterministic dynamics make them hard to tune and control.

Genetic Algorithms (GAs) are a generational type of EA that simulates genetic evolution by crossover and mutation of individual characteristics. Crossover tries to create offspring equipped with the best characteristics of their parents, and hopefully better than them, while mutation exploits the vicinity of the individual and improves population diversity. Tournament selection and elitism are usually combined in order to obtain the desired level of selective pressure between generations. GAs have been successfully applied to partitional clustering in three forms, according to the codification used: (i) direct encoding of the object-cluster association [Raghavan and Birchard, 79; Falkenauer, 98]; (ii) encoding of cluster separating boundaries [Bandyopadhyay et al., 95]; and (iii) encoding of clusters by prototypes [Srikanth et al., 95; Nascimento and Moura Pires, 97]. Classic GA operators are known to perform badly in clustering [Falkenauer, 98], and extra tuning has to be applied. Differential Evolution (DE) [Price and Storn, 95] is also a generational type of EA; it lacks the classical mutation operator and maximizes selective pressure by replacing an individual only with a better one. Particle Swarm Optimization (PSO) is a positional type of EA, which models the behaviour of groups of coordinated animals (swarms), like birds and insects. Each individual is characterized by its trajectory, defined both in terms of the best position already attained by the individual and the best position attained by the population. The emergent behaviour of the swarm is to converge around a near-global optimum [Kennedy and Eberhart, 95]. Applications of PSO to clustering are described in [Engelbrecht, 05]. Random Search (RS) and Fuzzy C-Means (FCM) algorithms implement individual trajectories that can easily be adapted to population-based evolution.

Recently, in [Paterlini and Krink, 06], GA, PSO and DE algorithms have been tried on classical clustering and their performance compared, taking the K-means and RS algorithms as reference. The authors concluded that DE outperformed the other algorithms and required less tuning effort on a set of several well-known non-noisy data sets. We note that algorithm benchmarking is in itself a difficult task, with each algorithm demanding its own level of tuning to exhibit its best performance. In this work, we propose an evolutionary framework for fuzzy clustering (EA4FC), focusing on aspects of unification, adaptability to less usual distance metrics and fitness functions, and practicability in real-world problems.

The remaining sections of the paper are organized as follows. Section 2 introduces point-prototype fuzzy clustering. Section 3 describes the evolutionary framework for fuzzy clustering and the adopted evolutionary operators. The experimental study, regarding the algorithmic parameter set-up, is presented in Section 4, and Section 5 reports the main concluding remarks.
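Purely as an illustration, the generic evolutionary loop common to all of these algorithms can be summarized in the minimal Python sketch below; the function names (init_population, evaluate, vary, select), the minimization convention and the fixed iteration budget are assumptions of the sketch, not the actual EA4FC interface described in Section 3.

    def evolutionary_loop(init_population, evaluate, vary, select,
                          pop_size=30, max_iters=200):
        # Generic EA skeleton: iteratively improve a population and return the
        # fittest (here: lowest-cost) individual found over the whole run.
        population = init_population(pop_size)
        fitness = [evaluate(ind) for ind in population]
        best, best_fit = min(zip(population, fitness), key=lambda p: p[1])
        for _ in range(max_iters):
            # Generational EAs (GAs, DE) build offspring by crossover/mutation;
            # positional EAs (PSO, population RS/FCM) move each individual along a trajectory.
            offspring = vary(population, fitness)
            offspring_fit = [evaluate(ind) for ind in offspring]
            # Selection applies the chosen selective pressure (tournament plus elitism in
            # GAs, one-to-one replacement by a better individual in DE, etc.).
            population, fitness = select(population, fitness, offspring, offspring_fit)
            cur_best, cur_fit = min(zip(population, fitness), key=lambda p: p[1])
            if cur_fit < best_fit:
                best, best_fit = cur_best, cur_fit
        return best, best_fit

For fuzzy clustering, an individual would typically encode the C cluster prototypes, and evaluate would be the clustering criterion being minimized.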

2. PARTITIONAL DISTANCE-BASED FUZZY CLUSTERING

Let X = {x_1, ..., x_N} be a set of N objects described by M real-valued patterns, such that x_i = (x_{i1}, ..., x_{iM}) ∈ O ⊂ ℜ^M, and let its total-scatter matrix T = [T_{jl}]_{M×M} be defined by (µ_j being the average of feature j over the set):

    T_{jl} = \sum_{i=1}^{N} (x_{ij} - \mu_j)(x_{il} - \mu_l)    (1)
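As a small worked example of equation (1) (assuming the reconstruction above, with feature indices j and l), the total-scatter matrix of a toy data set can be computed in NumPy as follows; the array names are arbitrary.

    import numpy as np

    X = np.array([[1.0, 2.0],
                  [3.0, 0.0],
                  [2.0, 4.0]])      # N = 3 objects, M = 2 features
    mu = X.mean(axis=0)             # mu_j: average of feature j over the set
    D = X - mu                      # deviations from the feature averages
    T = D.T @ D                     # T[j, l] = sum_i (x_ij - mu_j)(x_il - mu_l)
    # Here mu = [2., 2.] and T = [[2., -2.], [-2., 8.]]; T is symmetric, M x M, and
    # its trace equals the total sum of squared deviations from the feature averages.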

Let the object dissimilarity be described by the Euclidean distance metric,

    d(x, y) = \left( \sum_{j=1}^{M} (x_j - y_j)^2 \right)^{1/2}    (2)

Let C, (1
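Below is a minimal sketch of the Euclidean distance in (2), together with the object-to-prototype distance matrix that partitional distance-based fuzzy clustering typically relies on; the function names and the C x M prototype matrix V are illustrative assumptions.

    import numpy as np

    def euclidean(x, y):
        # d(x, y) as in (2): square root of the summed squared coordinate differences.
        x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
        return np.sqrt(np.sum((x - y) ** 2))

    def object_prototype_distances(X, V):
        # N x C matrix of distances from each of the N objects (rows of X)
        # to each of the C cluster prototypes (rows of V).
        return np.sqrt(((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2))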