
An Architecture to Support Distributed Data Mining Services in E-Commerce Environments•

S. Krishnaswamy1, A. Zaslavsky1, S.W. Loke2

1 School of Computer Science & Software Engineering, Monash University, 900 Dandenong Road, Caulfield East 3145, Australia. Email: {shonali.krishnaswamy, arkady.zaslavsky}@csse.monash.edu.au

2 CRC for Enterprise Distributed Systems Technology, 900 Dandenong Road, Caulfield East 3145, Australia. Email: [email protected]

Abstract

This paper presents our hybrid architectural model for Distributed Data Mining (DDM), which is tailored to meet the needs of e-businesses where application service providers sell DDM services to e-commerce users and systems. The hybrid architecture integrates the client-server and mobile agent technologies. The model focuses on the optimisation and costing issues of DDM, which are particularly relevant in the context of billing users for data mining services.

1. Introduction

The paradigm of Application Service Providers (ASP) has emerged recently to address the application software needs of medium range enterprises. The underlying principle of ASP is the notion of “renting” software [4]. Thus, instead of buying a package and installing it, organisations log on to an application service provider (via either the Internet or a dedicated communication channel), use the application packages provided by the ASP and pay for this usage. This paradigm is particularly useful for small to medium range organisations, as they are often constrained by the high cost of software. The emergence of ASP is closely tied with e-commerce [8]. Technologies like e-commerce provide an opportunity for small and medium range companies to compete in global markets, which were previously the domain of large organisations and multi-nationals. ASP technology provides smaller companies with cutting edge software technology that makes the e-business arena a more level playing field.

The objective of this paper is to demonstrate how distributed data mining (DDM) can be provided as a service hosted by an ASP in an e-commerce environment. We present our hybrid DDM architecture, which is tailored to meet the specific needs of an environment where e-commerce systems interact with an ASP for fulfilment of their data mining needs. Data mining has been recognised as a contributing technology in e-commerce frameworks [1]. However, it is the evolution of data mining along the dimension of distribution that has enabled easier integration with e-commerce environments (which are intrinsically linked with the World Wide Web, distribution and heterogeneity). Distributed data mining systems are largely seen as operating within intra-organisational domains (albeit to meet the needs of global organisations which have distributed and heterogeneous data resources). However, when distributed data mining moves from the confines of an organisation to become a generic service provided by an ASP and accessed by different components of an e-commerce system, there are additional requirements and challenges that have to be addressed. The primary issues of concern are billing of users based on estimated costs and response times, improved performance to meet real time needs, and the flexibility and extensibility to serve the diverse data mining needs of different organisations. In this paper we present our hybrid DDM architecture, which addresses these issues.

The paper is organised as follows. In section 2, we illustrate the interactions of an e-commerce environment with a DDM service provided by an ASP. In section 3 we present our hybrid architecture for DDM. In section 4 we present the cost models which we have developed for the distributed data mining process. In section 5 we discuss related work. Finally, in section 6 we discuss the current status of our work and the future directions.

• The work reported in this paper has been funded in part by the Co-operative Research Centre Program through the Department of Industry, Science and Tourism of the Commonwealth Government of Australia.

2. Role of DDM in an E-Commerce Scenario

In this section, we present a hypothetical e-commerce scenario and focus on the role of distributed data mining as a service hosted by an ASP. Consider an on-line shopping centre which consists of customers, vendors and a trader. The customers access the shopping centre through a web interface and interact with the vendors via the trader. At one level, the trader provides catalogue services to customers in terms of vendor profiles and availability of goods and services. At another level, the trader negotiates transactions between the customers and the vendors.

The need for distributed data mining in such a scenario arises from two possible sources, namely the vendor and the trader. The vendors’ data mining requirements have their origins in traditional data mining applications such as market basket analysis. The trader’s data mining needs will be centred on customer profiling to improve the level of service provided to individual customers. Since the environment is inherently distributed and heterogeneous, the focus is on distributed data mining. In addition to the complexity of distribution, e-commerce adds a further dimension of complexity to the mining process by emphasising the importance of optimised response time. For example, in a situation where a product required by a customer is not currently available, the trader might want to tell the customer when the product is likely to become available by analysing past trends or similar products offered by vendors. The trader might also want to give the customer an incentive for waiting by analysing dependencies with seasonal specials.

Traditionally, both the trader and individual vendors would have their own data mining systems to meet their individual business needs. However, the rapidly emerging ASP trend provides a means for distributed data mining to be a generic service. The advantage of this is that it allows organisations to access data mining services without having to be concerned with the set-up costs. Further, such a service would have the advantage of being extensible enough to incorporate a suite of data mining algorithms that different users and/or the ASP would offer as an integrated service. A framework for the role of distributed data mining in the e-commerce scenario discussed above is illustrated in figure 1. The components of the figure are:

Customers. Customers are buyers who use the on-line shopping centre to procure goods and services.

E-Commerce system. The e-commerce system provides the infrastructure for the on-line shopping centre. It comprises a web interface, an e-catalog, an intermediary and a database. The web interface is the point of access for the customers into the shopping centre. The “e-catalog” is a directory of the goods, services and vendor profiles. The “intermediary” negotiates transactions between the customers and the vendors. The “database” is used to maintain transaction details, vendor and customer information for use by the e-catalog and the intermediary.

Vendors. The vendors are the businesses that use the on-line shopping centre as a medium for marketing and selling their products.

[Figure 1 diagram. Labels: DDM Service; Web-Interface; E-Catalog; Database; Trader; On-line Shopping Centre; Vendor 1; Vendor 2; Oracle; Flat Files; Sybase; Legacy; Application Service Provider.]

Figure 1. DDM in an E-Commerce Environment

Application Service Providers (ASP). The ASP provides application services to the e-commerce system components and the vendors. The focus in the above scenario is on the data mining service that is provided by the ASP. Those vendors that require this service and the e-commerce system pay the ASP for accessing the distributed data mining system that is provided.

Distributed Data Mining (DDM) System. This is the distributed data mining system that the ASP uses to provide generic data mining services to its subscribers. In order to support the robust functioning of the system in the environment illustrated in figure 1, it needs to possess certain characteristics such as heterogeneity, a costing infrastructure, optimisation, security and extensibility.

Heterogeneity implies that the system must be able to mine data from heterogeneous and distributed locations. It must be able to support user requirements with respect to different distributed computing paradigms (including the client-server and mobile agent based models). The underlying philosophy is that the ASP should not impose one of the models on the users and must be able to support the specific needs and requirements of each user.

The costing infrastructure refers to the system having a framework for estimating the costs of different tasks. This implies that a task that requires higher computational resources and/or faster response time should cost the users more on a relative scale of costs. Further, the system should be able to optimise the distributed data mining process to provide the users with the best response time possible (given the constraints of the mining environment and the expenses the user is willing to incur).

Security is a requirement because, in some instances, the user might be mining highly sensitive data that should not leave the owner’s site. In such cases, the option is to use the mobile-agent model, where the mining algorithm and the relevant parameters are shipped to the data site and at the end of the process the mobile agent is destroyed on the site itself (i.e. it does not leave the site).

The system must be extensible to provide for a wide range of mining algorithms. Users must be able to register their algorithms with the ASP for use in their specific DDM jobs. This implies that there needs to be a high level semantic specification of the distributed data mining process.

The above discussion highlights the need for distributed data mining in e-commerce. It also outlines the specific requirements for a DDM system to operate in an e-commerce environment as a generic service provided by an ASP.

3. Hybrid Model for Distributed Data Mining

In this section we present our hybrid model for distributed data mining, which is tailored to meet the requirements for DDM systems to operate in e-commerce and ASP dominated environments. The distinguishing features of this architecture are the integration of the client-server model and the mobile-agent paradigm, and an “optimiser” which builds cost estimates for DDM tasks. It supports mining at remote sites using mobile agents and it also incorporates dedicated data mining servers with well-defined computational resources. This helps it to deal with heterogeneous and varied client needs. The cost estimates address the issues of costing and optimisation of the DDM process. The hybrid model operates on the principle of adopting the most suitable approach for a DDM task depending on user and resource constraints. Thus, it has the option of using the client-server model, the mobile-agent model, or an integrated approach involving both. The components of the hybrid DDM architecture illustrated in figure 2 are as follows:

Users. The users request data mining services by connecting to the distributed data mining server.

Dedicated Distributed Data Mining Server. This is a server with high computational power that acts as both the point of control for the distributed data mining process and the provider of dedicated resources for mining. The server maintains the distributed data mining management system.

Distributed Data Mining Management System (DDMMS). The DDMMS is the software that performs the various tasks associated with the distributed data mining process. The DDMMS forms the core of this architecture and the way it is structured encapsulates the framework for resource optimisation. The components within the DDMMS include a user manager, algorithm manager, optimiser, mining process manager and an agent control centre. We now present a detailed outline of the functionality and structure of each of these subcomponents.

User Manager. The users connect to the distributed data mining system through the user manager. The user manager performs the following functions: authentication of users; profiling of the data mining task in terms of the user requirements, including the data mining query, the output required and the time frame within which the output is required; and assigning priorities to tasks as they arrive.

Algorithm Manager. The algorithm manager’s primary task is to maintain the data mining algorithms that are part of the distributed data mining system. Users can register any mining algorithm with the system. The users can choose to make the algorithms that they have registered available to other users. At the time of incorporating an algorithm into the system, the algorithm manager records meta level information about the algorithm and its characteristics such as name, version, input parameters, operating environment and output produced. The algorithm manager feeds this information to the mining process manager, which maintains profiles about algorithmic characteristics.

Optimiser. The optimiser is the component that is primarily responsible for building estimated costs of alternative strategies and determining the best option for performing the data mining task to meet user needs. The optimiser interacts with the mining process manager in order to collect statistics regarding the current status of the communication channels and the task profile (specifically to determine the user requirements for task completion and the algorithm allocated for the task). It also interacts with the agent control centre (i.e. the mine sweeper agent) for details regarding the data set size. Using the data collected by the mine sweeper and the mining process manager, the optimiser builds an estimated cost model for the alternative ways to perform the data mining and decides on the option that will meet the user requirements as closely as possible.
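The registration step performed by the algorithm manager can be illustrated with a minimal Python sketch. The paper does not specify an interface, so the class names `AlgorithmProfile` and `AlgorithmManager`, the `owner` and `shared` fields, and the method names are all hypothetical; only the recorded metadata fields (name, version, input parameters, operating environment, output produced) come from the description above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AlgorithmProfile:
    """Meta level information recorded when an algorithm is registered."""
    name: str
    version: str
    input_parameters: Tuple[str, ...]
    operating_environment: str
    output_produced: str
    owner: str            # hypothetical field: the registering user
    shared: bool = False  # hypothetical field: visible to other users?


class AlgorithmManager:
    """Hypothetical in-memory registry keyed by (name, version)."""

    def __init__(self) -> None:
        self._profiles: Dict[Tuple[str, str], AlgorithmProfile] = {}

    def register(self, profile: AlgorithmProfile) -> None:
        key = (profile.name, profile.version)
        if key in self._profiles:
            raise ValueError(f"{profile.name} {profile.version} is already registered")
        self._profiles[key] = profile

    def available_to(self, user: str) -> List[str]:
        """Names of algorithms the user owns or that were shared by others."""
        return sorted(
            p.name for p in self._profiles.values() if p.shared or p.owner == user
        )
```

In this sketch an algorithm is visible to its owner by default and to everyone only when the owner sets `shared`, which mirrors the statement that users can choose to make their registered algorithms available to other users.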

[Figure 2 diagram. Labels: Users; Notebook; Workstation; PC; Result; DDM Management System; Optimiser; Algorithm Manager; User Manager; Mining Process Manager; User Agent; Mine Sweeper Agent; Knowledge Integrator; Agent Control Centre; Local Computational Resources; Mining Agent; DDM Server; Resource Monitoring Agents; Network Monitoring Agent; Data Server 1; Data Server 2; Client Server Model; Data transfer for mining locally; Status Monitoring Information; Mobile Agent Model.]

Figure 2. Hybrid Model For Distributed Data Mining

Mining Process Manager. This module forms the core of the distributed data mining system. It coordinates the different components of the system and provides current information about the system. It forms a point of reference from which information can be obtained regarding the current status of various aspects of the system. The mining process manager can be viewed as a dynamic directory for distributed data mining tasks. We are currently developing a specification of the entire spectrum of the data mining process. This specification will form the basis for the operations of this component. To the best of our knowledge, the mining process manager is the first integrated attempt at dynamically tracking and specifying the components and their interactions within the distributed data mining framework.

Agent Control Centre (ACC). The agent control centre is the framework within which the agent activities in the distributed data mining system take place. The ACC is responsible for activating, generating and assembling the other agents required for the data mining process. The different agent types and their tasks are briefly discussed below.

User Agents. The user agent’s primary task is to support the push model for providing users with updates of the status of their tasks and the final results of the data mining.

Network Monitoring Agent. This agent continuously monitors the links to the data servers by traversing the network and updating the status of the communication links in the mining process manager.

Data Resource Monitoring Agent. A data resource monitoring agent is assigned to each data source that becomes part of the system. The agent is responsible for providing information about the contents of the data sources to the mining process manager.

Mine-Sweeper Agent. The mine-sweeper agent is responsible for travelling to a data server, performing preprocessing of the data, determining the available computational resources at the data server and estimating the data size.

Mining Agent. The mining agent is an instantiation of the mining algorithm allocated for a task.

Knowledge Integrator. The knowledge integrator is a component that combines the data mining results from different data sources and provides the final result to the user agent, which in turn communicates this to the user.

In this section of the paper we have proposed an architecture for distributed data mining which includes an optimisation component. We now present a mathematical cost model used by the optimiser to develop cost estimates for DDM tasks.

4. Cost Models for Distributed Data Mining

In this section of the paper, we present our cost models for distributed data mining. The cost formulae are estimates of the distributed data mining response time for a given task using a specified architectural model when environmental factors are taken into consideration. The cost model provides the theoretical basis for estimating response times for DDM tasks and is used by the “optimiser” in the hybrid architectural model. We first present the general cost model for distributed data mining. We then illustrate how the general model is mapped to the cost functions for alternative DDM scenarios. The response time (expressed at a high level of abstraction) for distributed data mining is as follows:

$$T = t_{ddm} + t_{ki} \tag{1}$$

where $T$ is the response time, $t_{ddm}$ is the time taken to perform mining in a distributed environment and $t_{ki}$ is the time taken to perform knowledge integration. Depending on the model used for distributed data mining (i.e. mobile agent or client-server) and the different scenarios within each model, the factors which determine $t_{ddm}$ will change. This results in a consequent change in the actual cost function that determines $t_{ddm}$. In the following discussion we present the different distributed data mining scenarios and the cost functions to determine $t_{ddm}$ for each case.

4.1. Mobile Agent Model

This case is characterised by a given distributed data mining task being executed in its entirety using the mobile agent paradigm. The core steps involved are: submission of a task by a user, dispatching of a mobile agent (or agents) to the respective data server (or servers), data mining, and the return of the mobile agent(s) from the data resource(s) with mining results. This model is characterised by a set of mobile agents traversing the relevant data servers to perform mining. In general, this can be expressed as m mobile agents traversing n data sources. There are three possible alternatives within this scenario. The first possibility is m = n, where the number of mobile agents is equal to the number of data servers. This implies that one data mining agent is sent to each data source involved in the distributed data mining task. The second option is m < n, where the number of mobile agents is less than the number of data servers. The implication of having fewer agents than servers is that some agents may be required to traverse more than one server. We do not consider the third case of m > n, since this is in effect equivalent to the first case, where there is a mobile agent available per data server. Each of the above alternatives has its own cost function. These cost models are described as follows.

4.1.1. Equal number of mobile agents and data servers (m = n). This is a case, as illustrated in figure 3, where data mining from different distributed data servers is performed in parallel. The algorithm used across the different data servers can be uniform or varied. The system dispatches a mobile agent encapsulating the data mining algorithm (with the relevant parameters) to each of the data servers participating in the distributed data mining activity. Let n be the number of data servers. Therefore, the number of mobile agents is n (since m = n). In order to derive the cost function for the general case involving n data servers and n data mining agents, we first formulate the cost function for the case where there is one data server and one data mining agent.

[Figure 3 diagram. Labels: Agent Centre; Agent 1; Agent 2; Agent 3; Data Source 1; Data Source 2; Data Source n.]

Figure 3. Equal number of mobile agents and data sources

Let us consider the case where data mining has to be performed at the ith data server (i.e. 1 ≤ i ≤ n). The cost function for the response time to perform distributed data mining involving the ith data server is as follows:

$$t_{ddm} = t_{dm}(i) + t_{dmAgent}(AC, i) + t_{resultAgent}(i, AC) \tag{2}$$

The terms in the above cost estimate are discussed below.

$t_{ddm}$. This is the response time for performing distributed data mining. In this particular case the distributed data mining process is characterised by one data server and one mobile agent.

$t_{dmAgent}(AC, i)$. In our cost model, the representation $t_{mobileAgent}(x, y)$ refers to the time taken by the agent mobileAgent to travel from node x to node y. Therefore $t_{dmAgent}(AC, i)$ is the time taken by the mobile agent dmAgent (which is the agent encapsulating the mining algorithm and the relevant parameters) to travel from the agent centre (AC) to the data server (i). In general, the time taken for a mobile agent to travel depends on the following factors: the size of the agent and the bandwidth between nodes (e.g. in kilobits per second). The travel time is proportional to the size of the agent and is inversely proportional to the bandwidth (i.e. the time taken increases as the agent size increases and decreases as the bandwidth increases). This can be expressed as follows:

$$t_{dmAgent}(AC, i) \propto \text{size of } dmAgent \tag{3}$$

$$t_{dmAgent}(AC, i) \propto \frac{1}{\text{bandwidth}} \tag{4}$$

From (3) and (4):

$$t_{dmAgent}(AC, i) = \frac{k \cdot \text{size of } dmAgent}{\text{bandwidth between } AC \text{ and } i}$$

In the above expression for the time taken by the data mining agent to travel from the agent centre to the data server, k is a constant. In [12] the size of an agent is given by the following triple:

size of an agent = < Agent State, Agent Code, Agent Data >

where agent state is the execution state of the agent, agent code is the program encapsulated within the agent that performs the agent’s functionality, and agent data is the data that the agent carries (either as a result of some computation performed at a remote location or the additional parameters that the agent code requires). On adapting the above representation to express the size of the data mining agent (dmAgent), we get:

size of dmAgent = < dmAgent state, data mining algorithm, input parameters >

$t_{resultAgent}(i, AC)$. This is the time taken for the data mining results to be transferred from the data server (i) to the agent centre (AC). It is estimated in the same way as the travel time of dmAgent. This agent does not carry code to be executed, but merely transfers the results to the agent centre for knowledge integration. It must be noted that, unlike the time taken by the data mining agent to travel from the agent centre to the data site, the time taken for the result to be carried cannot be estimated a priori, since the size of the results depends on the characteristics of the data.

$t_{dm}(i)$. This is the time taken to perform data mining at server i. The duration of the data mining process ($t_{dm}$) depends on factors such as the processor speed, available main memory, data size and complexity of the algorithm. We are currently working on developing techniques for estimating the data mining response time based on historical information.

In the foregoing discussion, we developed the cost estimate for distributed data mining in a scenario with one mobile agent and one data server. We now extend the cost estimate to the general case characterised by n mobile agents and n distributed data sources. Let there be n data sources which need to be accessed for a particular distributed data mining exercise. The agent centre dispatches n mobile agents encapsulating the respective mining algorithms and parameters (i.e. one to each of the data sources) concurrently. Mining is performed at each of the sites in parallel and the results are returned to the agent centre. Since the mining is performed at the distributed locations concurrently, the total time taken is equal to the time interval required by the server which takes the longest time to perform mining and return results. Therefore,

$$t_{ddm} = \max_{1 \le i \le n} \left( t_{dm}(i) + t_{dmAgent}(AC, i) + t_{resultAgent}(i, AC) \right) \tag{5}$$

The expression $t_{dm}(i) + t_{dmAgent}(AC, i) + t_{resultAgent}(i, AC)$ represents the duration of the data mining process in the ith server, and i varies from 1 to n. The parts of the expression, $t_{dm}(i)$, $t_{dmAgent}(AC, i)$ and $t_{resultAgent}(i, AC)$, are estimated as explained previously. The
knowledge integration can only take place after all the results are returned to the agent centre.

4.1.2. Fewer mobile agents than data servers (m < n). [...] Thus the data carried by the data mining agent in this case includes both the input parameters and the data mining results. This implies that the size of the agent increases incrementally as the number of sites mined along its route increases.

We now extend the above cost estimate to the general case involving m mobile agents and n DDM sites. The n data sites are numbered from 1 through n. Let ds be the set of data site labels, so that ds = {1, 2, …, n}. Let m be the number of data mining agents available. The set ds is divided into m subsets $ds_1, ds_2, \ldots, ds_m$ with the following property: $ds_i \subseteq ds$ (i.e. $ds_i \subseteq \{1, 2, \ldots, n\}$, 1 ≤ i ≤ m). The sets $ds_i$ (1 ≤ i ≤ m) have the additional property of being pairwise disjoint, that is, $ds_i \cap ds_j = \emptyset$ for i ≠ j. Since the subsets together cover ds, the sum of their cardinalities is equal to the cardinality of ds: $|ds_1| + |ds_2| + \ldots + |ds_m| = |ds|$. Thus, the n data servers are divided into m subsets and the ith data mining agent is assigned the task of mining the data sites in the subset $ds_i$ (where 1 ≤ i ≤ m).

Distributed data mining can occur concurrently at one level (i.e. there are m different mining agents operating at the same time). However, each of these m agents could be assigned several sites to mine (i.e. the agent has to travel to different sites). Thus the ith agent (where 1 ≤ i ≤ m) has to travel to and perform mining at $|ds_i|$ sites. The total time taken to mine is therefore the time taken by the agent that needs the longest time interval to complete its task. Writing $ds_i(j)$ for the jth data site on the ith agent's path, the cost estimate for the ith agent’s response time is given in equation (8):

$$t_{ddm}(i) = t_{dmAgent}(AC, ds_i(1)) + \sum_{j=1}^{|ds_i| - 1} \left[ t_{dm}(ds_i(j)) + t_{dmAgent}(ds_i(j), ds_i(j+1)) \right] + t_{dm}(ds_i(|ds_i|)) + t_{dmAgent}(ds_i(|ds_i|), AC) \tag{8}$$

In the above expression, the first term is the time taken by the data mining agent to travel from the agent centre to the first data server in its path (i.e. the first site in the ith subset $ds_i$). The term involving the summation is the time taken for the agent to mine and to travel to the respective data sites within the set assigned to it (excluding the final site). The second last term is the time taken to mine at the last data site in its path. The final term in the expression is the time taken for the agent to travel from the last site on its path to the agent centre. Since there are m agents operating concurrently, the time taken for completion of the distributed data mining process is the time taken by the agent requiring the longest completion time. Thus,

$$t_{ddm} = \max_{1 \le i \le m} t_{ddm}(i) \tag{9}$$

where $t_{ddm}(i)$ is estimated from equation (8).
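The mobile agent estimates of this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `AC` label and the caller-supplied `t_dm` and `t_travel` callables are hypothetical, and the growth of the agent as it accumulates results along its route is assumed to be folded into whatever `t_travel` returns for each hop.

```python
from typing import Callable, Hashable, Sequence

AC = "AC"  # hypothetical label for the agent centre node


def travel_time(size_kbits: float, bandwidth_kbps: float, k: float = 1.0) -> float:
    """Agent travel time k * size / bandwidth, the relation derived from (3) and (4)."""
    if bandwidth_kbps <= 0:
        raise ValueError("bandwidth must be positive")
    return k * size_kbits / bandwidth_kbps


def t_ddm_single(t_dm: float, t_dm_agent: float, t_result_agent: float) -> float:
    """Equation (2): mining time plus outbound and return travel times."""
    return t_dm + t_dm_agent + t_result_agent


def t_ddm_equal(per_site: Sequence[float]) -> float:
    """Equation (5): with one agent per site, the slowest site sets the response time."""
    return max(per_site)


def t_ddm_route(
    route: Sequence[Hashable],
    t_dm: Callable[[Hashable], float],
    t_travel: Callable[[Hashable, Hashable], float],
) -> float:
    """Equation (8): one agent visiting the sites of its subset in route order."""
    if not route:
        raise ValueError("an agent needs at least one site to visit")
    total = t_travel(AC, route[0])              # agent centre to first site
    for here, nxt in zip(route, route[1:]):     # mine, then move on (all but last site)
        total += t_dm(here) + t_travel(here, nxt)
    total += t_dm(route[-1]) + t_travel(route[-1], AC)  # mine last site, return
    return total


def t_ddm_fewer(routes, t_dm, t_travel) -> float:
    """Equation (9): m agents run concurrently, so the slowest agent dominates."""
    return max(t_ddm_route(r, t_dm, t_travel) for r in routes)


if __name__ == "__main__":
    mining = {1: 40.0, 2: 25.0, 3: 60.0}        # made-up mining times per site
    hop = lambda src, dst: 5.0                  # made-up constant travel time
    print(t_ddm_equal([t_ddm_single(mining[s], 5.0, 5.0) for s in mining]))  # 70.0
    print(t_ddm_fewer([[1, 2], [3]], mining.__getitem__, hop))               # 80.0
```

With these made-up figures, two agents covering three sites take 80 time units against 70 for one agent per site, because the first agent mines sites 1 and 2 in sequence.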

4.2. Client-server Model

The cost estimate for the response time in DDM systems that use the traditional client-server paradigm is presented in this section. Typically, data from distributed sources is brought to the data mining server (a fast, parallel server) and then mined. Let there be n data sites from which data has to be mined. Let $s_i$ be the data set obtained from the ith site (where 1 ≤ i ≤ n). The response time for DDM for the data set $s_i$ from the ith site is expressed in equation (11):

$$t_{ddm}(i) = t_{dataTransfer}(i, DMS, s_i) + t_{dm}(DMS), \quad 1 \le i \le n \tag{11}$$

The term $t_{dataTransfer}(i, DMS, s_i)$ is the time taken to transfer the data set ($s_i$) from the ith site to the DDM server (DMS) and is estimated as follows:

$$t_{dataTransfer}(i, DMS, s_i) = \frac{\text{size of } s_i}{\text{bandwidth between } i \text{ and } DMS}$$

The second term, $t_{dm}(DMS)$, is the time taken to mine at the data mining server and is estimated as discussed for $t_{dm}(i)$ in section 4.1.1. As equation (11) shows, the data transfer component adds to $t_{ddm}$. This can be a significant addition when the data volumes are large and/or the bandwidth is low. If the mining is done in a parallel server, then the total response time for n data sources is equal to the time taken by the data set requiring the maximum processing time. From equation (11),

$$T = \max_{1 \le i \le n} t_{ddm}(i) + t_{ki} \tag{12}$$

If, on the other hand, the client-server model mines the data sets sequentially, the total response time is expressed in equation (13):

$$T = \sum_{i=1}^{n} t_{ddm}(i) + t_{ki} \tag{13}$$
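The client-server estimates of this section can be sketched in Python as follows. The function names are hypothetical, the sizes, bandwidths and times are made-up figures, and the mining time at the server is passed per data set on the assumption that it varies with the data being mined.

```python
from typing import Sequence


def data_transfer_time(size_kbits: float, bandwidth_kbps: float) -> float:
    """Transfer time for one data set: size divided by bandwidth to the server."""
    if bandwidth_kbps <= 0:
        raise ValueError("bandwidth must be positive")
    return size_kbits / bandwidth_kbps


def t_ddm_site(size_kbits: float, bandwidth_kbps: float, t_dm_server: float) -> float:
    """Equation (11): transfer time plus mining time at the data mining server."""
    return data_transfer_time(size_kbits, bandwidth_kbps) + t_dm_server


def response_parallel(per_site: Sequence[float], t_ki: float) -> float:
    """Equation (12): a parallel server is bounded by the slowest data set."""
    return max(per_site) + t_ki


def response_sequential(per_site: Sequence[float], t_ki: float) -> float:
    """Equation (13): sequential mining adds up the per-site times."""
    return sum(per_site) + t_ki


if __name__ == "__main__":
    # (size in kilobits, bandwidth in kilobits per second, mining time in seconds)
    sites = [(8000.0, 1000.0, 30.0), (2000.0, 500.0, 20.0)]
    per_site = [t_ddm_site(*s) for s in sites]   # [38.0, 24.0]
    print(response_parallel(per_site, t_ki=5.0))    # 43.0
    print(response_sequential(per_site, t_ki=5.0))  # 67.0
```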



4.3. Hybrid Model

 

The hybrid model has the following two principal characteristics. First, it combines the best aspects of the agent model and the client-server approach by incorporating an agent framework with a dedicated data mining server. This provides dedicated data mining resources and thus alleviates the issues associated with lack of control over remote computational resources in the agent model. Second, it has the ability to circumvent the communication overheads associated with the client-server approach. This gives the model the option of applying either approach to a particular DDM task.

Let n be the number of data sites to be mined. The optimiser decides, on the basis of the cost estimates that it builds, that $n_a$ sites are to be mined using the agent model and $n_{cs}$ sites are to be mined using the client-server paradigm, where $n = n_a + n_{cs}$. When $n_{cs} = 0$, the DDM architectural model deployed is in effect the agent model; conversely, when $n_a = 0$, it is in effect the client-server model. Assuming that the mobile agent-based mining and the client-server mining are done in parallel, the response time in the hybrid model is the time taken by the technique requiring the longer duration:

$$t_{hybrid} = \max(t_{na}, t_{cs}) \tag{14}$$

where $t_{na}$ is the time taken to mine the $n_a$ sites using the agent model and $t_{cs}$ is the time taken to mine the $n_{cs}$ sites using the client-server paradigm.
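One way the optimiser could split the sites and evaluate equation (14) is sketched below in Python. The paper does not describe the decision procedure, so this sketch makes several assumptions: every agent-mined site gets its own agent (the m = n case), the server mines its data sets in parallel, and each site is simply assigned to whichever approach has the lower per-site estimate. The names `SiteEstimate` and `plan_hybrid` are hypothetical and the figures are made up.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class SiteEstimate:
    site: str
    t_agent: float  # estimated per-site time under the mobile agent model
    t_cs: float     # estimated per-site time under the client-server model


def plan_hybrid(estimates: Sequence[SiteEstimate]) -> Tuple[List[str], List[str], float]:
    """Return (agent-mined sites, server-mined sites, t_hybrid)."""
    agent_sites = [e for e in estimates if e.t_agent <= e.t_cs]
    cs_sites = [e for e in estimates if e.t_agent > e.t_cs]
    # Both groups run in parallel, so each is bounded by its slowest site.
    t_na = max((e.t_agent for e in agent_sites), default=0.0)
    t_cs = max((e.t_cs for e in cs_sites), default=0.0)
    # Equation (14): the slower of the two techniques sets the response time.
    return [e.site for e in agent_sites], [e.site for e in cs_sites], max(t_na, t_cs)


if __name__ == "__main__":
    estimates = [
        SiteEstimate("A", t_agent=50.0, t_cs=38.0),
        SiteEstimate("B", t_agent=35.0, t_cs=60.0),
        SiteEstimate("C", t_agent=70.0, t_cs=90.0),
    ]
    print(plan_hybrid(estimates))  # (['B', 'C'], ['A'], 70.0)
```

Under these assumptions the per-site choice is enough, because both groups are bounded by their slowest member; with fewer agents than sites or a sequential server, the optimiser would need to compare whole assignments instead.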

5. Related Work

Several DDM systems using either agent-based architectures or the client-server model have been proposed. The agent paradigm is the more popular approach for building DDM systems. Agent-based systems include Parallel Data Mining Using Agents (PADMA) [5], Generic Data Mining [2], InfoSleuth [7], Java Agents for Meta-Learning (JAM) [11], Besizing Knowledge through Distributed Heterogeneous Induction (BODHI) [6] and Papyrus [10]. DDM systems based on the client-server paradigm include DecisionCentre [3] and IntelliMiner [9]. To the best of our knowledge, there has been no previous attempt to develop an architectural model to support on-line DDM services provided by Application Service Providers.

6. Conclusions and Future Directions

In this paper, we have presented a hybrid architecture for distributed data mining that integrates the client-server and the mobile agent models. We have shown how the hybrid DDM model enables the provision of generic DDM services by an ASP in an e-commerce environment by addressing the issues of optimisation, costing and extensibility. We have also developed a cost model for estimating the DDM response time for alternative scenarios to aid in the optimisation process. The experimental validation of the cost model and the implementation of the hybrid DDM architecture are currently in progress.

References
[1] Adam,N,R., Dogramaci,O., Gangopadhyay,A., and Yesha,Y., (1999), “Electronic Commerce: Technical, Business and Legal Issues”, Prentice Hall, New Jersey, USA.
[2] Botia,J,A., Garijo,J,R., and Skarmeta,A,F., (1998), “A Generic Data Mining System: Basic Design and Implementation Guidelines”, in Workshop on Distributed Data Mining at the 4th Int. Conf. on Data Mining and Knowledge Discovery (KDD98), New York, USA, AAAI Press.
[3] Chattratichat,J., Darlington,J., Guo,Y., Hedvall,S., Köhler,M., and Syed,J., (1999), “An Architecture for Distributed Enterprise Data Mining”, in Proc. of the 7th Int. Conf. on High Performance Computing and Networking (HPCN Europe’99), Amsterdam, The Netherlands, Springer-Verlag LNCS 1593.
[4] Clark-Dickson,P., (1999), “Flag-fall for Application Rental”, Systems, (August), pp. 23-31.
[5] Kargupta,H., Hamzaoglu,I., and Stafford,B., (1997), “Scalable, Distributed Data Mining Using An Agent Based Architecture”, in Proc. of the 3rd Int. Conf. on Knowledge Discovery and Data Mining, Newport Beach, California, (eds) D.Heckerman, H.Mannila, D.Pregibon, and R.Uthurusamy, AAAI Press, pp. 211-214.
[6] Kargupta,H., Park,B., Hershberger,D., and Johnson,E., (1999), “Collective Data Mining: A New Perspective Toward Distributed Data Mining”, to appear in Advances in Distributed Data Mining, (eds) H.Kargupta and P.Chan, AAAI Press.
[7] Martin,G., Unruh,A., and Urban,S., (1999), “An Agent Infrastructure for Knowledge Discovery and Event Detection”, Technical Report MCC-INSL-003-99, Microelectronics and Computer Technology Corporation (MCC).
[8] Morency,J., (1999), “Application Service Providers and EBusiness”, Network World Fusion Newsletter, URL:http://www.nwfusion.com/newsletters/nsm/0705nm.html
[9] Parthasarathy,S., and Subramonian,R., (1999), “Facilitating Data Mining on a network of workstations”, to appear in Advances in Distributed Data Mining, (eds) H.Kargupta and P.Chan, AAAI Press.
[10] Ramu,A,T., (1998), “Incorporating Transportable Software Agents into a Wide Area High Performance Distributed Data Mining Systems”, Masters Thesis, University of Illinois, Chicago, USA.
[11] Stolfo,S,J., Prodromidis,A,L., Tselepis,L., Lee,W., Fan,D., and Chan,P,K., (1997), “JAM: Java Agents for Meta-Learning over Distributed Databases”, in Proc. of the 3rd Int. Conf. on Data Mining and Knowledge Discovery (KDD-97), Newport Beach, California, (eds) D.Heckerman, H.Mannila, D.Pregibon, and R.Uthurusamy, AAAI Press, pp. 74-81.
[12] Straßer,M., and Schwehm,M., (1997), “A Performance Model for Mobile Agent Systems”, in Proc. of the Int. Conf. on Parallel and Distributed Processing Techniques and Applications (PDPTA’97), (eds) H.Arabnia, Vol II, CSREA, pp. 1132-1140.
