Exploring the Capabilities of Mobile Agents in Distributed Data Mining U.P.Kulkarni {1}, K.K.Tangod {2} , S. R. Mangalwede {3} , A.R.Yardi {4} {1} Faculty, Department of Computer Science and Engineering, SDM College of Engineering and Technology, Dharwad, India. {2,3} Faculty, Department of Computer Science and Engineering, Gogte Institute of Technology, Belgaum, India. {4} Principal, Walchand College of Engineering and Technology, Sangli, India.
[email protected] Abstract Over the past decade, data mining has gained an important role in analysis of large datasets and there by understanding the complex systems in almost all areas. Such datasets are often collected in a geographically distributed way, and cannot, in practice, be gathered in to single repository. Existing data mining methods for distributed data are of communication intensive. Many algorithms for data mining have been proposed for a data at a single location and some at multiple locations with improvement in terms of efficiency of algorithms as a part of quality but effectiveness of these algorithms in real time distributed environment are not addressed, as data on the web/network are distributed by very of its nature. As a consequence, both new architectures and new algorithms are needed. In this paper we introduce the software agent technology that supports building of distributed data mining architecture and explore the capabilities of mobile agents’ paradigm and will show by experiment that, it is suited for distributed data Mining compared to traditional approaches like client server computing..
1. Introduction Progress in bar-code technology has made it possible for retail organizations to collect and store massive amounts of sales data, referred to as the basket data. The emergence of network-based computing environments has introduced a new and important dimension i.e distributed sources of data and computing. The advent of laptops, palmtops, handhelds, embedded systems, and wearable computers are also making ubiquitous access to a large quantity of distributed data a reality. Advanced
analysis of distributed data for extracting useful knowledge is the next natural step in the increasingly connected world of ubiquitous and distributed computing. Most of the popular data mining algorithms are designed to work for centralized data and they often do not pay attention to resource constraints of distributed and mobile environments. Recent research in this area has demonstrated that handling these resource constraints in an optimal fashion requires a new breed of data mining algorithms and systems that are very different from their centralized counterparts A mobile agent is a running program that can move from host to host in a network and created a new paradigm for data exchange and resource sharing in rapidly growing and continually changing computer network [4,8,9]. Mobile code promises to increase system flexibility, scalability, and reliability. To date, however, this promise has been only partially fulfilled. Among the reasons for the technology’s unmet potential are security concerns and incomplete knowledge of the possible consequences of mobile code use [7]. This paper explores the capabilities of Mobile Agents’ Technology for Distributed Data Mining activities. .
2. Motivations and Related Work The process of data mining is becoming harder for large datasets because of the following reasons: • Data stored online doubles every year. • Datasets are distributed geographically • Datasets are immovable. In traditional approach, data mining algorithms are applied to the data at a single location. When data is collected in distributed way; this means that data must
Goal of this work is to develop a system using Mobile Agents and show quantitatively how Mobile Agents are better to mine the huge data available in the distributed environment where the component data are distributed among several sites. The experiments are conducted using scatter-gather style. In the first case agents are sent to remote data repository and large datasets are transferred to central site and then Data mining algorithm is applied on this collected large dataset at central site. Around 6MB data size was used for the faster data transfer because as data chunk size is increased the time taken by Mobile Agent to migrate reduces, as indicated in figure-4. In the second case agents are sent to each remote data repository. Each such agent computes local models, and are brought to central site for combining them to generate global model at central site. The comparative performances of these two cases are shown in figure-1. From figure-1, it is clear that Mobile Agents perform better in terms of time and network bandwidth usage compared with traditional system built over client server technology. The percentage improvement of performance increases with decrease in the number of patterns obtained in each site because, more the number of patterns, implies, more data to be carried by mobile agents during migration to central site. Mobile agents are not good for data transfer which is clear from figure-3, where performance of Mobile agents are compared with File Transfer Protocol (FTP) to transfer huge data over internet. The poor performance is due to serialization and deserialization process of objects involved in migration. However there is limitation on the capacity (Up to 6.004 MB + 514 Bytes) of the data/knowledge, which the agents carry with them, as indicated in Figure-2.
12000 10000 8000 6000 4000 2000 0 1 2 3 4 5 10 2 300 4 50 10 0 200 300 400 0 1 05 0 0 2000 3000 4000 5000 00
3. Proposed Work
Comparing Centrallised & Decentralised Mining of data D a ta S e t T r a n s fe r tim e ( in m S e c )
be transferred to single central place to enable conventional algorithms to be applied. This approach is costly in terms of communication, storage at central site. The various new approaches proposed [17,19,21,22,26,27,28] in the recent past aim at integrating the knowledge, which is discovered out of data at different geographically distributed sites, but it is necessary to have minimum amount of network communication, and maximum of local computation.
Data Set size in Number of Records (in 1000s)
File Transfer+Mining Total Time Required for Mining Remote Data & Getting Result to Central Site (in mSec)
Figure-1: Comparison of Traditional and Distributed Data mining with Mobile Agents.
Effect on Aglet Size on Netw ork Latency (Observed Average) 16000 14000 12000 10000 8000 6000 4000 2000 0
Si z e i n M B
INVALID EXCEPTION
Figure-2: Limitations of Mobile Agents’ Capacity to carry Information.
4. Conclusion
FTP Vs. Aglets 7000000
Transfer Time(msec)
6000000 5000000 Using FTP
4000000
Using Aglets
3000000 2000000 1000000
Compared to the existing applications of similar kind, Mobile Agents’ application proves to be one of the best and robust methods to handle distributed data and hence distributed data mining, in faster way. This also addresses the issue of handling of dynamically generated data, because the data is maintained at remote sites, which can be updated continuously, or as and when it is required. Since this approach does not require the huge amount of data transfer from remote to central site, the network resources are used optimally. Hence existing client server based distributed data mining systems needs to be rewritten using Mobile Agents paradigm, to take the advantage they offer. The development of methodology for consolidating the collected Knowledge from different sites is in progress.
0 1
2
3
4
5. Bibliography
Trial Num ber
Figure-3: Comparison of File Transfer using FTP and Agent.
F i le t r ansf er using A g let s f o r d i f f er ent chunks ( 1M B F ile)
180000 169804
Transfer time
160000
140000
Time in mSec
120000
100000
80000
75619
60000
40000
40899 31405
20000
14351
7791
4687
1873
1081
681
651
641
0
Chunk size in Bytes
Figure-4: Mobile agent’s Performance with varying Chunk size.
[1] Chong Xu, Dongbin Tao. “Building Distributed Application with Aglet”. Project is listed in the IBM Tokyo Research Lab's aglet homepage. [3] Danny B. Lange. “Mobile Objects and Mobile Agents: The Future of Distributed Computing”, 12th European Conference on Object-Oriented Programming Brussels, Belgium, July 20 - 24, 1998 [4] Danny Lange and Mitsuru Oshima, “Programming and deploying Java mobile agents with Aglets”, Addison-Wesley, 1998. [5] Agrawal R. Srikant R., “Fast Algorithm for Mining Association Rules in large databases”. Research report , IBM Almedan Research Center, San Jose, California, June 1994. [6] David Kotz, Robert Gray, and Daniela Rus, “Future Directions for Mobile Agent Research”, Dartmouth College.IEEE Distributed System Online-August 2002. [07] Gian Pietro Piccolo, “Reflections about Mobile Agents and Software engineering”, Dipartimento di Elettronica e Informazione Politecnico di Milano, Italy. [08] Giovanni Vigna, “Mobile Agents: Ten Reasons for Failure”, Reliable Software Group, Department of Computer Science University of California, Santa Barbara. [09] Danny B Lange and Mitsuru Oshima, “Seven good reasons for mobile agents”, Communication of the ACM, 42(3):8889, March 1999. [10] Robert Cooley, Bamshad Mobasher, Jaideep Srivastava, “Web Mining: Information and Pattern Discovery on the World Wide Web”, Department of Computer Science, University of Minncosta, USA. [11] Jian Pei, Jiawei Han,et al, “Mining Frequent Item sets with Convertible Constraints”, Simon Fraser University, Canada. [12] David W. Cheung, Vincent T.Ng, Ada W.Fu et al, “Efficient Mining of Association Rules in Distributed
Database”, IEEE Transaction on Knowledge and Data Engineering, Vol 8, No. 6, Dec 1996. [13] R. Agarwal, T. imielinski, and A. Swami. “Mining association rules between sets of items in large databases”. In Proc. Of the ACM SIGMOD Conference on Management of Data, Washington, D.C., May 1993. [14] Agrawal R., Mannila H., Srikant R., Toivonen H., Verkamo A., “Fast Discovery of Association Rules”, in Advances in Knowledge Discovery and Data Mining, MIT 1996. [15] Hillol Kargupta. IIker Hamzaoglu, Brianstafford, “Parallel Data Mining Agents (PADMA)”, Los Alamous National Laboratory. [16] Walter A. Kosters, Wim Pijls., “APRIORI A depth First Implementation”. [17] The SPIDER Data Mining Project, Mini symposium on Data Mining, 6th International Conference on High Performance Computing, Culcutta, India, December 1999. [18] Philip K. Chan, Wei Fan, et al, “Distributed Data Mining in Credit Card Fraud Detection”, Nov/Dec 1999 IEEE, pp67. [19] P.L.Shopbell, M.C.Britton, R.Ebert, “Distributed Data Mining for Astrophysical Datasets”, Astronomical Data Analysis Software and Systems XIV, ASP Conference Series, Vol. XXX, 2005. [20] D.B.Skillicorn, “Distributed Data Intensive Computation and the Data centric Grid”, School of Computing, Queen’s University. [21] Mohammed J. Zaki, Yi Pan “Introduction: Recent Developments in Parallel and Distributed Data Mining”, Distributed and Parallel Databases, Kluwer Academic Publisher, 2002. [22] Shonali Krishnaswamy, Seng Wai Loke, Arkady Zaslavsky, “Supporting The Optimization of Distributed Data Mining by Prediction Application Run Times”. [23] Naren Ramakrishnan, Ananth Y. Grama, “Mining Scientific Data”, Department of Computer Science, Virginia Tech, VA 24061 & Purdue University , IN. [24] Parthasarathy S., Subramonian R., “An Interactive Resource Aware Framework for Distributed Data Mining”, in News letter of the IEEE Technical Committee on Distributed Processing, Spring 2001, pp24-32. [25]. Turinsky A., Grossman R., “A Framework for Finding Distributed Data Mining Strategies that are Intermediate between Centralized strategies and Inplace Strategies”, Workshop on Distributed and Parallel Knowledge Discovery at KDD-2000, Boston, pp1-7. [26] H. Kargupta, B. Park, D. Hershberger, and E. Johnson. Advances in Distributed and Parallel Knowledge Discovery, chapter 5, “Collective Data Mining: A New Perspective towards Distributed Data Mining”, AAAI/MIT Press, 2000. [27] S. J. Stolfo, A. L. Prodromidis, S. Tselepis, W. Lee, D. W. Fan, and P. K. Chan. “JAM: Java agents for metalearning over distributed databases”, In Proc. KDD-97, pages 74–81, Newport Beach, California, USA, 1997. [28] S. Bailey, R. Grossman, H. Sivakumar, and A. Turinsky. Papyrus: a system for data mining over local and wide area clusters and super-clusters. In Proc. Conference on Supercomputing, page 63. ACM Press, 1999. [29] Matthias Klusch et al, “The Role of Agent in Distributed Data Mining: Issues and Benefits”, Proceedings of the
IEEE/WIC International conference on Intelligent Agent Technology (IAT’03) 2003 IEEE. [30] Thomas H. Hinke et al, “Data Mining on NASA’s Information Power Grid”, 1082-8907/00, 2000 IEEE. [31] Frank Wang et al, “A Distributed and Mobile Data Mining System”, 0-7803-7840-7/03, 2003 IEEE. [32] S. Krishnaswamy et al, “An Architecture to Support Distributed Data Mining Services in E-Commerce Environments”, 0-7605-0610-0/00, 2000 IEEE. [33] Valdimir Gorodetsky et al, “Multi-agent Technology for Distributed Data Mining and Classification”, Proceedings of the IEEE/WIC.