An Approach to Spatial Data Mining Inspired by Agent Swarming

Gianluigi Folino, Agostino Forestiero, Giandomenico Spezzano
Institute for High Performance Computing and Networking (ICAR) – CNR
Via P. Bucci 41C, I-87030 – Rende (CS), Italy
{folino, forestiero, spezzano}@icar.cnr.it
1. Introduction

The use of Swarm Intelligence (SI) techniques [1] for data mining tasks is an emerging area of research that has attracted many investigators in recent years. SI is a branch of Artificial Life in which a problem is solved by a set of biologically inspired (unintelligent) agents that exhibit collectively intelligent behaviour. An agent is an entity capable of sensing its environment and performing simple processing of its observations in order to choose an action from those available to it. These actions may include modifying the environment in which the agent operates.

Intelligent behaviour frequently arises through indirect communication between the agents, following the principle of stigmergy [2]. Two forms of stigmergy have been observed: sematectonic and sign-based. In sematectonic stigmergy, the only relevant interactions between individuals occur through modifications of the environment: individual behaviour modifies the environment, which in turn modifies the behaviour of other individuals. In sign-based stigmergy, something is deposited in the environment that makes no direct contribution to the task being undertaken but influences subsequent task-related behaviour. Sign-based stigmergy is used, for example, by ant colonies, termites, and swarms of bees.

Clustering spatial data is the process of grouping similar objects according to their distance, connectivity, or relative density in space [3]. Spatial clustering has been an active area of data mining research, and many effective and scalable clustering methods have been developed. These methods can be classified into partitioning methods [4], hierarchical methods [5], density-based methods [6], and grid-based methods [7]. The survey by Han, Kamber, and Tung [8] is a good introduction to this subject.
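To make the density-based notion of a cluster concrete, the following minimal Python sketch tests whether a point lies in a dense region in the usual DBSCAN sense [15]: a point is a core point if at least a minimum number of points fall within a given radius of it. The function names and the parameters `eps` and `min_pts` are illustrative assumptions; the sketch is not taken from any of the cited implementations.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def eps_neighborhood(points, center, eps):
    """Return all points within distance eps of center (center itself included if present)."""
    return [p for p in points if dist(p, center) <= eps]


def is_core_point(points, center, eps, min_pts):
    """A point is a core point when its eps-neighbourhood holds at least min_pts points."""
    return len(eps_neighborhood(points, center, eps)) >= min_pts


# Hypothetical toy data: four tightly packed points and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]

dense = is_core_point(points, (0.0, 0.0), eps=0.2, min_pts=4)
sparse = is_core_point(points, (5.0, 5.0), eps=0.2, min_pts=4)
print(dense, sparse)
```

With these assumed values, the point at the origin has four points in its neighbourhood and counts as a core point, while the isolated point does not.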
Recently, other algorithms based on biological models [9-13] have been proposed to solve the clustering problem. These algorithms are characterized by the interaction of a large number of simple agents that sense and change their environment locally. They exhibit complex, emergent behaviour that is robust with respect to the failure of individual agents.

This paper presents a parallel spatial clustering algorithm based on Swarm Intelligence techniques. The algorithm, called SPARROW, combines a smart exploratory strategy based on a flock of birds [14] with a density-based clustering algorithm to discover clusters of arbitrary shape and size in spatial data. Modified rules of the standard flocking algorithm turn each agent into a hunter foraging for clusters in spatial data. We have applied this algorithm to two synthetic data sets and measured, through computer simulation, the impact of the flocking search strategy on performance. Moreover, we have evaluated the accuracy of SPARROW against the DBSCAN algorithm [15].
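For readers unfamiliar with the standard flocking model [14], the sketch below shows one synchronous update of its three classical steering rules (separation, alignment, cohesion) in two dimensions. It illustrates only the unmodified rules, not SPARROW's modified ones; the function name `flock_step`, the visibility `radius`, the `min_gap` threshold, and the weights are all hypothetical values chosen for the example.

```python
from math import hypot


def flock_step(positions, velocities, radius=2.0, min_gap=0.5,
               w_sep=1.0, w_ali=0.5, w_coh=0.1):
    """Advance every agent by one synchronous step; returns (positions, velocities)."""
    new_velocities = []
    for i, (px, py) in enumerate(positions):
        vx, vy = velocities[i]
        sep_x = sep_y = ali_x = ali_y = coh_x = coh_y = 0.0
        neighbours = 0
        for j, (qx, qy) in enumerate(positions):
            if j == i:
                continue
            d = hypot(qx - px, qy - py)
            if d > radius:
                continue  # outside the (assumed) visibility range
            neighbours += 1
            ali_x += velocities[j][0]
            ali_y += velocities[j][1]
            coh_x += qx
            coh_y += qy
            if d < min_gap:
                # Separation: steer away from flockmates that are too close.
                sep_x += px - qx
                sep_y += py - qy
        if neighbours:
            # Alignment: steer towards the neighbours' mean velocity.
            ax = ali_x / neighbours - vx
            ay = ali_y / neighbours - vy
            # Cohesion: steer towards the neighbours' centroid.
            cx = coh_x / neighbours - px
            cy = coh_y / neighbours - py
            vx += w_sep * sep_x + w_ali * ax + w_coh * cx
            vy += w_sep * sep_y + w_ali * ay + w_coh * cy
        new_velocities.append((vx, vy))
    new_positions = [(px + vx, py + vy)
                     for (px, py), (vx, vy) in zip(positions, new_velocities)]
    return new_positions, new_velocities


# Two agents at rest within sight of each other drift together through cohesion.
positions = [(0.0, 0.0), (1.0, 0.0)]
velocities = [(0.0, 0.0), (0.0, 0.0)]
positions, velocities = flock_step(positions, velocities)
print(positions)
```

Each agent reads only the state of neighbours inside its visibility radius, so the group-level motion emerges from purely local rules. An agent with no neighbours in range simply keeps its current velocity.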
References

1. Bonabeau E., Dorigo M., Theraulaz G., Swarm Intelligence: From Natural to Artificial Systems, Oxford University Press, 1999.
2. Grassé P.P., La Reconstruction du nid et les Coordinations Inter-Individuelles chez Bellicositermes Natalensis et Cubitermes sp. La Théorie de la Stigmergie : Essai d’interprétation du Comportement des Termites Constructeurs, Insect. Soc. 6, pp. 41-80, 1959.
3. Han J., Kamber M., Data Mining: Concepts and Techniques, Morgan Kaufmann, 2000.
4. Kaufman L., Rousseeuw P. J., Finding Groups in Data: An Introduction to Cluster Analysis, John Wiley & Sons, 1990.
5. Karypis G., Han E., Kumar V., CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling, IEEE Computer, vol. 32, pp. 68-75, 1999.
6. Sander J., Ester M., Kriegel H.-P., Xu X., Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and its Applications, Data Mining and Knowledge Discovery, an Int. Journal, Kluwer Academic Publishers, vol. 2, n. 2, pp. 169-194, 1998.
7. Wang W., Yang J., Muntz R., STING: A Statistical Information Grid Approach to Spatial Data Mining, Proc. of Int. Conf. Very Large Data Bases (VLDB’97), pp. 186-195, 1997.
8. Han J., Kamber M., Tung A.K.H., Spatial Clustering Methods in Data Mining: A Survey, H. Miller and J. Han (Eds), Geographic Data Mining and Knowledge Discovery, Taylor and Francis, 2001.
9. Deneubourg J. L., Goss S., Franks N., Sendova-Franks A., Detrain C., Chretien L., The Dynamic of Collective Sorting: Robot-like Ants and Ant-like Robots, Proc. of the first Conf. on Simulation of Adaptive Behavior, J.A. Meyer and S.W. Wilson (Eds), MIT Press/Bradford Books, pp. 356-363, 1990.
10. Lumer E. D., Faieta B., Diversity and Adaptation in Populations of Clustering Ants, Proc. of the third Int. Conf. on Simulation of Adaptive Behavior: From Animals to Animats (SAB94), D. Cliff, P. Husbands, J.A. Meyer, S.W. Wilson (Eds), MIT-Press, pp. 501-508, 1994.
11. Kuntz P., Snyers D., Emergent Colonization and Graph Partitioning, Proc. of the third Int. Conf. on Simulation of Adaptive Behavior: From Animals to Animats (SAB94), D. Cliff, P. Husbands, J.A. Meyer, S.W. Wilson (Eds), MIT-Press, pp. 494-500, 1994.
12. Monmarché N., Slimane M., Venturini G., On Improving Clustering in Numerical Databases with Artificial Ants, Advances in Artificial Life: 5th European Conference, ECAL 99, LNCS 1674, Springer, Berlin, pp. 626-635, 1999.
13. Macgill J., Using Flocks to Drive a Geographical Analysis Engine, Artificial Life VII: Proceedings of the Seventh International Conference on Artificial Life, MIT Press, Reed College, Portland, Oregon, pp. 1-6, 2000.
14. Reynolds C. W., Flocks, Herds, and Schools: A Distributed Behavioral Model, Computer Graphics, vol. 21, n. 4 (SIGGRAPH 87), pp. 25-34, 1987.
15. Ester M., Kriegel H.-P., Sander J., Xu X., A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining (KDD-96), Portland, OR, pp. 226-231, 1996.