A proposal for Meta-learning through a Multi-agent System

Juan A. Botia [email protected]  Antonio G. Skarmeta [email protected]

Mercedes Garijo and Juan R. Velasco {mga, [email protected]}

Dep. Informática, Inteligencia Artificial y Electrónica, Facultad de Informática, Universidad de Murcia, E-30001 Murcia, Spain

Dep. de Ingeniería de Sistemas Telemáticos, E.T.S.I. Telecomunicación, Universidad Politécnica de Madrid, E-28040 Madrid, Spain

This work has been partially funded by the Spanish Government under the CICYT project TIC 97-1343-C02-01.

1 Introduction

The problem we tackle in the present work has become very important in recent years: the problem of automating the process of deciding the most suitable learning algorithm for a concrete intelligent data analysis task. We call it the meta-learning problem. Perhaps the most important reason for its emergence is a certain stability reached in both the machine learning and data mining research fields. On the one hand, many algorithms have been developed that try to create approximate theories from a number of observed examples (for a good introduction to the state of the art see [11]). On the other hand, algorithms coming from the machine learning paradigm have proved successful in many real-world data mining applications. But a problem remains open: there is a lack of methodologies to guide the correct design of a machine learning experiment. Figure 1 shows the typical knowledge discovery session scheme; for a compendium of KDD (Knowledge Discovery in Databases) applications see [4]. Once we have become familiar with the problem domain and have the data ready to be automatically analysed, two important decisions have to be taken by the data miner: (1) choosing a data mining task, and (2) deciding which data analysis algorithm to apply to the data. These decisions have to be taken one after the other, as (2) heavily depends on decision (1).

[Figure 1: A scheme of a complete Inductive Knowledge Discovery session. The flowchart runs: Understand Problem Domain, User Goal Definition, Prior Knowledge Compilation, Select Target Data, Data Cleaning, Data Reduction, DM Task Selection (choosing the knowledge language), Learning Algorithm Selection (choosing model evaluation and model search), Perform DM, Interpret & Evaluate, and Exploit New Knowledge, with a loop back through Change Algorithm's Parameters when the knowledge is not OK.]

1.1 Choosing the data mining task

Some data mining tasks are:



• Classification is learning a function that maps a data item into one of several predefined classes. Think, for example, of automated identification of objects of interest in large image databases, or classifying trends in financial markets.

• Regression is learning a function that maps a data item to a real-valued prediction variable. There are many examples of regression use: predicting consumer demand for a new product depending on the advertising expenditure, estimating the probability that a patient will get better, or worse, given the results of a set of diagnostic tests, etc.

• Clustering is identifying a finite set of categories or clusters to describe the data, where the categories may be mutually exclusive and exhaustive, hierarchical or overlapping. An example is discovering homogeneous sub-populations of consumers in a marketing database.

• Summarization covers methods for finding a compact description for data. A simple example is tabulating the means and standard deviations for all fields; a more sophisticated method is the discovery of functional relations between variables.

• Dependency Modeling is finding a model describing structural and quantitative dependencies between variables. Having a Bayesian belief network in mind [5], arrows between variables show structural relations and conditional probabilities show their strength.

• Change and Deviation Detection tries to discover the most significant changes in data with respect to previously measured or normative values. This technique is used, for example, in credit card fraud detection.

To get deeper into the problem, we shall introduce the factors that influence decision (1). Choosing a data mining task depends on three factors:

1. User requirements: this is an important factor, as a great portion of the software application's success will depend on the final user satisfaction level. We have clearly identified three dimensions of user requirements: the user's primary goal (i.e. why he is using a data analysis tool), his interactivity needs, and the desired quality level of the final data model.

2. The nature of the source data: as we understand it, this is the most important factor when choosing among a set of data analysis algorithms. Meta-data parameters about the source data, like the number of available tuples, the number of features, the percentage of nulls and so on, have to be taken into account in order to obtain a data theory of optimum quality.

3. The type of model to obtain: this is related to the above-mentioned user satisfaction level, in the sense that obtaining a model that the user can easily understand and interpret is important. Besides, the data mining expert may have more skill with certain types of models than with others, and so may prefer the first group. Another reason to consider the type of data model an important factor is that a regression task that does not require an interpretation of the data leads to using artificial neural networks as black boxes, while a task of detecting clouds of similar behaviour leads to a clustering model (i.e. a set of points and a metric to obtain the tuples that are closest to each group).

1.2 Choosing the right algorithm

The concrete learning algorithm chosen is determined by three parameters: the way of representing knowledge, the strategy for evaluating new knowledge, and the search technique for discovering new knowledge.

• Model Representation is the language, L, for describing discoverable patterns. The expressive power of a given language L must be fully known by the analyst, because that power may either limit the model accuracy or increase the search complexity. For example, a decision-tree-based language cannot discover relations like x = y, but its search complexity is not so high (it is reduced to finding the most discriminant attributes). Others, like first-order logic, have very powerful expressiveness, but their search complexity is very high.

• Model Evaluation depends on the primary data analysis task. The most commonly used quantitative measure for almost all models is accuracy. Three approaches are available for estimating it:

  - train & test: there are two differentiated data sets, the training data set and the evaluation data set. The first is used for learning and the second evaluates the newly learned knowledge.

  - bootstrap: several learning processes are run in parallel, with different data sets formed randomly and with replacement from the original data set; all classifiers are then tested with the examples that did not take part in the corresponding training phase. All error rates can then be combined to obtain statistics.

  - cross-validation: the original data set is divided into m subsets and m learning processes are run in parallel, each one with a training set formed by the union of m-1 subsets: all subsets but its evaluation one, which must be distinct for all m classifiers. The evaluation ratio is the mean of the m produced ratios.

  Evaluation in a classification task involves other factors besides prediction accuracy: novelty, understandability and utility. These are perhaps more subjective criteria.

• Search Method depends on whether we are searching for a model or for the right parameters of an already fixed model. Searching for a model implies a larger search space, usually explored with heuristics. Searching for parameters that optimize a fixed model with observed data is usually a greedy search, as in the gradient descent method of backpropagation neural nets.
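To make the estimation strategies above concrete, here is a minimal Python sketch of the cross-validation estimate; the fit/predict hooks (train_fn, predict_fn) are our own assumed interfaces, not part of GEMINIS:

import random

def cross_validation_accuracy(examples, labels, train_fn, predict_fn, m=10):
    """Estimate accuracy as the mean over m folds, as described above.

    train_fn(xs, ys) -> model and predict_fn(model, x) -> label are
    assumed interfaces; any learning algorithm can be plugged in."""
    indexed = list(range(len(examples)))
    random.shuffle(indexed)
    folds = [indexed[i::m] for i in range(m)]          # m disjoint subsets
    accuracies = []
    for i in range(m):
        test_idx = set(folds[i])
        train_idx = [j for j in indexed if j not in test_idx]
        model = train_fn([examples[j] for j in train_idx],
                         [labels[j] for j in train_idx])
        hits = sum(predict_fn(model, examples[j]) == labels[j]
                   for j in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / m                          # mean of the m ratios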

1.3 Our Proposal

Once the problems have been stated, it is time to offer solutions. As has been clearly emphasized, the more analysis algorithms we have, the more final user satisfaction we can achieve. One direction of this work is the proposal of a multi-agent architecture that provides a repository of ready-to-go data mining algorithms that:

• are encapsulated into learning agents, each one offering a common interface, so all of them can be managed in the same manner no matter the type of data mining algorithm they implement;

• can be distributed across different machines, with different operating systems and hardware architectures. In this way we can offer both a powerful data mining tool (in terms of performance) and a scalable scheme of integration of data mining algorithms, as the system can be extended over as many hosts as needed.

The other direction is that this system is being used, at the same time, to study the problem of meta-learning under an inductive approach. That is, we maintain the idea of using all data mining session results as feedback for the system, in order to automatically learn what should be done in the next session. In this sense the approach is inductive, if we see each data mining session's results as a tuple of a meta-learning data set.

Section 2 introduces and explains the part of the multi-agent architecture devoted to offering a repository of intelligent analysis algorithms. Section 3 shows our approach to affording scalability. Section 4 goes deeper into the meta-learning issue and gives details of the inductive approach we use. In section 5, related work is introduced and both conclusions and future work are pointed out.

2 A Multi-agent Architecture for Intelligent Data Analysis

In this section we concentrate on the software engineering process of encapsulating a machine learning algorithm into a software agent. In order to better understand a sample scenario of the final system we are proposing, which we call GEMINIS (GEneric MINIng System), we begin by showing a possible configuration of the system, which appears in figure 2. The figure shows a host named A running three different processes, each one a different data analysis algorithm. Two of them work with decision rules (i.e. AQ11 and ID3) and the third is an artificial neural network that learns with the backpropagation rule. The host named B is running FOIL, which learns first-order logic predicates, and C4.5, which generates decision trees. Finally, on the host named C there is a single algorithm, ANFIS, which generates fuzzy rules from data. Two important issues must be taken into account after having a look at the figure:



• All algorithms are seen in the same manner by a possible client (i.e. they offer the same interface in terms of service calls).

• There is a middleware that offers distribution services and holds communication between clients and learning agents, based on CORBA (Common Object Request Broker Architecture).

2.1 Learning Services Declaration

Both symbolic and connectionist algorithms can run without any problem on the same host. Besides, each one is self-contained, in the sense that none of them needs another one to provide its learning services.

[Figure 2: Typical scenario of GEMINIS. Host A runs ID3, AQ11 and an ANN with backpropagation; host B runs FOIL and C4.5; host C runs ANFIS. All of them communicate through a CORBA bus, a middleware that works as an IPC (Inter Process Communication) layer.]

module algorithms{
  interface IMachineLearningService{
    void conf_algorithm();

    void make_learning(in TLearningTimeConstraint mode, in unsigned long time)
      raises (AlgorithmIsNotConfiguredException,
              BadConfigurationException,
              LearningExperimentDoNotConvergeException,
              BestEffortDoneBeforeTimeException,
              TimeTooConstrainedException);

    model::IInference make_inference(in data_server::IExample example)
      raises (LearningIsNotDoneException, BadExampleException);

    //This call must return, in some way represented, the learned model
    model::IModelService get_learned_model()
      raises (LearningIsNotDoneException);

    //This call returns a double representing, in some way, the quality of the
    //learned model
    double get_experiment_performance()
      raises (LearningIsNotDoneException);

    //This call will allow for a new experiment to begin
    void forget_all() raises (LearningIsNotDoneException);
  };
};

Figure 3: Common Services Interface for all Learning algorithms
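As a hedged illustration of how a client might drive this interface, the following sketch uses the omniORB Python mapping; the algorithms stub module would be generated from the IDL of figure 3 with omniidl, and the IOR file name and the enum constant spelling are assumptions of ours:

# Hypothetical client; assumes stubs generated with: omniidl -bpython algorithms.idl
import sys
from omniORB import CORBA
import algorithms  # generated from the IDL in figure 3

orb = CORBA.ORB_init(sys.argv, CORBA.ORB_ID)
obj = orb.string_to_object(open("learning_server.ior").read())
server = obj._narrow(algorithms.IMachineLearningService)

server.conf_algorithm()                      # set general + specific parameters first
try:
    # BEST_EFFORT: wait at most 60 seconds for a model
    server.make_learning(algorithms.BEST_EFFORT, 60)
except algorithms.BestEffortDoneBeforeTimeException:
    pass                                     # the desired model arrived in time
print("model quality:", server.get_experiment_performance())
server.forget_all()                          # ready for a new experiment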

The set of common services that all algorithms offer must be specified in a descriptive language. This language is IDL (Interface Definition Language) [12], and it is commonly used to define CORBA-based distributed services that can be accessed from anywhere through the CORBA bus. Figure 3 shows a reduced IDL definition of the services that all learning algorithms in GEMINIS should offer. In that specification there is a name space called algorithms, inside which all declarations related to data analysis algorithms should be enclosed. Inside this name space appears the interface IMachineLearningService, which includes all the services that must be offered by learning algorithms in GEMINIS. IDL follows an object-oriented approach, so inheritance can be used. In this way each particular algorithm can be specified by defining its particular interface, including its configuration parameters as data members, so that it can be properly configured before learning. Considered as simple servers, the algorithms are in the group of servers with context (also named servers with state, see [3]). This property is important in the sense that it is not possible for a concrete algorithm to work with more than one client at the same time. Paying attention to figure 3, the services offered by IMachineLearningService are:

• void conf_algorithm(): used to configure the initial parameters of the learning algorithm. The initial parameters comprise a group of general parameters, common to all algorithms (e.g. the data source to learn from), and a group of specific parameters (e.g. for an artificial neural network with the backprop learning rule there are parameters like the learning rate, momentum, topology, etc.).

• void make_learning(): this is, obviously, the most important call; it makes the algorithm learn a new theory from data. The algorithm must be configured (i.e. its parameters must be set) before learning; if it is not, an AlgorithmIsNotConfiguredException exception must be raised. There are three modes for learning, indicated by the argument of type TLearningTimeConstraint:

1. FREE_MODE: the algorithm can learn until its process converges to a model acceptable by itself.

2. BEST_EFFORT: the algorithm is given a time limit and the client must wait for it to learn. If the time limit passes and the algorithm has not finished learning, it must stop and return the best model obtained so far (the client only sees a normal ending of the call). If the desired model is obtained in time, then a BestEffortDoneBeforeTimeException exception is raised from the server and the learning call finishes.

3. TIME_CONSTRAINED: this time the time limit is severe and the model is only accepted if the learning process is completed in time. If it is not, a TimeTooConstrainedException is raised and the model is destroyed.

• IInference make_inference(IExample): once the desired model has been obtained, inferences can be made (by the learning server) from a given example. Note that an IInference object will be of the concrete type of inference the learning server uses, by means of the inheritance used in IDL. Besides, if the learning process was not completed before this call, the client will get a LearningIsNotDoneException exception.

• double get_experiment_performance(): this call allows the client to obtain a goodness measure for the obtained model. This measure depends on the type of error estimation used by the algorithm's internals. If the model is to be evaluated by a standard mechanism and from the client side, then the former call (i.e. make_inference()) should be used.

• void forget_all(): tells the server to forget the learned model and put itself in an idle state. The obtained model can be used remotely for as long as desired, without worrying about its type or particular functionality.
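A minimal sketch of how a server loop might honour the three modes (our own illustration under assumed step/convergence hooks, not the paper's implementation):

import time

FREE_MODE, BEST_EFFORT, TIME_CONSTRAINED = range(3)

class TimeTooConstrainedError(Exception):
    pass

def make_learning(step_fn, converged_fn, mode, time_limit=None):
    """step_fn() performs one learning iteration; converged_fn() tests
    whether the current model is acceptable. Both are assumed hooks."""
    start = time.monotonic()
    while not converged_fn():
        step_fn()
        if mode != FREE_MODE and time.monotonic() - start > time_limit:
            if mode == BEST_EFFORT:
                return "best model so far"       # client sees a normal ending
            raise TimeTooConstrainedError()      # TIME_CONSTRAINED: model discarded
    return "converged model"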

[Figure 4: State transitions (behaviour) of a typical GEMINIS learning server. The automaton moves through the states idle, Configuration (configured), Learning (learned), Model Evaluation (model_evaluated) and Model Deployment, driven by the events conf_algorithm, make_learning, learning_done, get_model_performance, make_inference and forget_all.]


[Figure 5: Simple interaction: configuration (set_parameter calls and conf_algorithm), learning (make_learning), model evaluation (get_model_performance), inference (make_inference) and reset (forget_all).]

2.2 Agents Behaviour Modeling

The internal functioning of a typical GEMINIS learning server is modeled by the finite deterministic automaton that appears in figure 4. The simplest interaction between a client and a learning server is one in which, first of all, the algorithm is configured; then the learning process is carried out in time; after that the model is evaluated, accepted, and inferences are made with it. Once the work is done the algorithm goes into an idle state again. The events diagram appearing in figure 5 shows this interaction.

Another typical scenario is one in which the time restrictions are of type TIME_CONSTRAINED and the time is totally spent. Notice that in the case depicted in figure 5, just after the make_learning() call a time out is set at the client. This mechanism is necessary because the following cases can happen:

1. The client is willing to wait a concrete time for the algorithm to finish its learning task. Here, two different situations can occur once the whole time is spent:

(a) The client is happy with the obtained model (the client has worked under BEST_EFFORT mode).

(b) The client does not like the obtained model, as it understands that the algorithm was not able to reach the desired level of quality in the theory.

2. The client does not use time restrictions, but the algorithm sets to work and does not ever converge.

Scenario 1a is depicted in the interaction scheme of figure 6. Scenario 1b appears in figure 7. The last scenario, 2, is represented in figure 8.
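The automaton of figure 4 can be captured in a small transition table; the following simplified Python sketch (state and event names taken from the figures, everything else ours) rejects calls that are invalid in the current state:

# States and transitions of a GEMINIS learning server, after figure 4
TRANSITIONS = {
    ("idle",            "conf_algorithm"):        "configured",
    ("configured",      "make_learning"):         "learning",
    ("learning",        "learning_done"):         "learned",
    ("learned",         "get_model_performance"): "model_evaluated",
    ("model_evaluated", "make_inference"):        "model_evaluated",
    ("model_evaluated", "forget_all"):            "idle",
}

class LearningServerFSM:
    def __init__(self):
        self.state = "idle"

    def fire(self, event):
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise RuntimeError(f"{event!r} not allowed in state {self.state!r}")
        self.state = TRANSITIONS[key]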

[Figure 6: Simple interaction with a spent time out: the client accepts what the algorithm learned in that time.]

[Figure 8: Simple interaction without time restrictions, and without an answer from the learning server: an intervention from a third party is necessary in order to re-establish the normal functioning of the system.]

Scenario 2 is the most interesting. It shows an important detail related to concepts like software productivity and code replication: a third-party entity, the controller, whose purpose is to concentrate all the coordination tasks among the rest of the entities that take part in a data mining session. As can be seen, when the client makes the make_learning call, it is necessary to trigger an event to this third party so it can keep account of all the learning processes taking part in the system at any time. So when a time out occurs (the controller itself determines it), the controller comes into play and tries to re-establish the normal functioning.
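One plausible way to realise the client-side time out and the third-party intervention is a watchdog timer; this is only a sketch of the idea, with invented names such as handle_stuck_server:

import threading

def learn_with_watchdog(server, controller, timeout_s):
    """Ask `server` to learn; if it does not answer within timeout_s,
    notify the `controller` so it can re-establish normal functioning."""
    timer = threading.Timer(timeout_s, controller.handle_stuck_server,
                            args=(server,))   # fires only on time out
    timer.start()
    try:
        server.make_learning()                # blocking remote call
    finally:
        timer.cancel()                        # normal ending: disarm the watchdog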

[Figure 7: Simple interaction with a spent time out: the client refuses the model obtained by the learning server in that time interval.]

3 Achieving Scalability


In section 1.3 we mentioned that it is important to have as many learning algorithms as possible in the system. It is also clear that, to build a mechanism for reasoning about the learning capabilities of an agent, those capabilities must be clearly defined and located in the GEMINIS system. In this section we first define both our characterisation of a learning agent and our characterisation of a learning data set. Then we justify the use of an X.500-based directory service [6] as a way to centralise the information related to the description of all agents and data. We will see that a directory service is a powerful software mechanism to provide scalability.


[Figure 9: A non-exhaustive hierarchy of learning techniques considered in GEMINIS: inductive logic programming, evolutive learning, neural networks (backprop learning, Kohonen networks, fuzzy neural networks), decision trees, reinforcement learning, and clustering (normal clustering: k-means, fuzzy k-means, mountain clustering; and hierarchical clustering).]

3.1 The Learning Agent Information Model


A learning agent in GEMINIS is defined from three different angles. The first concerns the agent's particular capabilities as a concrete learning algorithm. The second is related to its implementation features (e.g. the language it is programmed in, possible parallelisation, etc.). The last one has to do with the execution environment it is located in (the concrete machine and operating system). These three dimensions are included in the definition of a GEMINIS learning server.

With respect to learning capabilities, an agent is enclosed in a concrete machine learning family, and all families are organised in a hierarchical manner. At the moment, the hierarchy we have designed is the one appearing in figure 9. Besides, for each type of family there are common descriptors like:

• The level of visual interpretation of the produced model.

• The endurance to noisy data that a family shows.

• The type of data the algorithm works with (i.e. continuous, symbolic or both).

• The management of data used for learning, understood as a boolean variable that indicates whether the algorithm needs all data in memory for learning or not (see [13], pg. 53).

• The typical load that is injected into the host by the algorithm when working.

Features about the execution environment that are taken into account in the system are:

• Architecture of the host (e.g. Sparc, Intel, HP, etc.).

• Operating system used.

• The typical machine load.

• The available secondary memory space at the moment of execution.

• Main memory used in the host.
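These three dimensions could be pictured as a single descriptor record per agent; the following dataclass is our own illustrative rendering, not a GEMINIS data structure:

from dataclasses import dataclass

@dataclass
class LearningAgentDescriptor:
    # learning-capability dimension
    family: str                     # e.g. "neural networks/backprop learning"
    interpretability: str           # visual interpretation level of the model
    noise_endurance: str            # endurance to noisy data
    data_types: str                 # "continuous", "symbolic" or "both"
    needs_all_data_in_memory: bool  # data management descriptor
    typical_algorithm_load: str     # load injected into the host
    # implementation dimension
    language: str                   # "Java", "C++" or "C"
    # execution-environment dimension
    host_architecture: str          # e.g. "Sparc"
    operating_system: str
    host_typical_load: str          # symbolic value such as "LOW_MEDIUM"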

Most of the parameters mentioned above take symbolic values. For example, a parameter like the typical machine load could take values in {LOW, LOW_MEDIUM, MEDIUM, MEDIUM_HIGH, HIGH}. A group of them have fixed values, and others, like the one related to available secondary memory, can change dynamically. However, at the moment this possibility is not observed. We plan to use SNMP (Simple Network Management Protocol) [15] to update these parameters.

Implementation features considered relevant in GEMINIS are:

• The language used to code the agent: it is important, in terms of resources and CPU consumption, to distinguish whether the agent is programmed in Java, C++ or C (i.e. these are the three languages in which the algorithms encapsulated into agents are programmed).


3.2 The Training Data Sets Information Model

Training data set features are important in order to choose, among the algorithms available, the one that best suits the user requirements. Hence, it is important to characterise data sets in terms of general parameters and particular parameters of each feature and class (if the tuples are labeled as pertaining to a concrete class).

In the following, we enumerate all the important parameters that are considered general:

• Number of examples, number of classes, number of continuous and discrete attributes: these give an idea of the size the data set has.

• Examples per class, examples per attribute, attributes per class: these give an idea of the relative number of examples for each class, of examples for each attribute (the length of the set), and of attributes for each class.


• Nulls percentage: obtained by dividing the number of nulls in the whole data set by the total number of individual data in the set, as in

$$\mathrm{Nulls\%} = \frac{\mathrm{nulls}}{\mathrm{features} \times \mathrm{tuples}}$$

• Entropy: calculated as the arithmetic mean of the entropies of all attributes in the data set. This measure sheds light on the quality of the data set for classification tasks. The expression used is

$$H = \frac{1}{n} \sum_{i=1}^{n} \left( -\sum_{j=1}^{m} p_j \log p_j \right)$$

• Univar: a typical measure when working with data coming from sensors [14]; it estimates the level of separability of the classes in the data. For a given class $w_i$ and a given value $x$, the normalised probabilistic distance of $x$ to $w_i$ is defined as

$$d_i = \frac{|x - \mu_i|}{\sigma_i}$$

where $\mu_i$ is the mean of class $w_i$ and $\sigma_i$ its standard deviation. This way, the distance between two classes $w_i$ and $w_j$ is defined as

$$d = \frac{|\mu_i - \mu_j|}{\sigma_i + \sigma_j}$$

Following [14], a measure of separability between the two classes is

$$\delta(w_i, w_j) = 1 - \left( \frac{\sigma_i + \sigma_j}{\mu_i - \mu_j} \right)^2$$

So the mean of all distances between all classes is

$$\frac{2}{c(c-1)} \sum_{i=1}^{c-1} \sum_{j=i+1}^{c} \left[ 1 - \left( \frac{\sigma_i + \sigma_j}{\mu_i - \mu_j} \right)^2 \right]$$

and finally, averaging over the n attributes, the whole measure is

$$\frac{1}{n} \sum_{k=1}^{n} \frac{2}{c(c-1)} \sum_{i=1}^{c-1} \sum_{j=i+1}^{c} \left[ 1 - \left( \frac{\sigma_i + \sigma_j}{\mu_i - \mu_j} \right)^2 \right]$$

Now we explain all the parameters that are calculated for each attribute of the data set. First of all, parameters for symbolic attributes are a subset of the parameters for continuous attributes, so let us start with that first subset.

Parameters for both continuous and discrete attributes are:

• Attribute entropy, explained above.

• Attribute mode, that is, the most frequent value.

The most important parameters only for continuous attributes are:

• The median, that is, the value found in the middle of the set of all values.

• The arithmetic mean of an attribute, with n the number of examples:

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}$$

• The geometric mean of an attribute:

$$\bar{x}_g = \sqrt[n]{x_1 x_2 \dots x_n}$$

• The harmonic mean of an attribute:

$$\bar{x}_a = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}}$$

• The lower, middle and upper quartiles, that is, the values of the attribute under which we can find 25%, 50% and 75% of the values, respectively.

• The skewness of an attribute, given by the expression

$$sk = \frac{E[(X - \mu)^3]}{\sigma^3}$$

which measures the similarity of the attribute's values to a normal-like distribution.

• The kurtosis, given by the expression

$$kr = \frac{E[(X - \mu)^4]}{\sigma^4}$$

which estimates the proportion of attribute values that are far from the mean.

• The variance, calculated as

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

• The variation coefficient, using the mean and variance:

$$C^2 = \frac{s^2}{\bar{x}^2}$$

All these parameters, both general for the whole dataset and particular for each attribute, are intended to inform the decision process of meta-learning.
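As an illustration, a few of these parameters could be computed as in the following sketch (our own code; the paper does not prescribe an implementation):

import math

def nulls_percentage(table):
    """table: list of tuples; None marks a null value."""
    cells = [v for row in table for v in row]
    return cells.count(None) / len(cells)    # nulls / (features * tuples)

def attribute_entropy(values):
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    n = len(values)
    return -sum(c / n * math.log(c / n) for c in counts.values())

def skewness(values):
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return sum((x - mean) ** 3 for x in values) / (n * sd ** 3)

def univar_separability(mu, sigma):
    """mu, sigma: per-class means and standard deviations of one attribute;
    returns the mean separability over all class pairs, as defined above."""
    c = len(mu)
    pairs = [(i, j) for i in range(c - 1) for j in range(i + 1, c)]
    return 2 / (c * (c - 1)) * sum(
        1 - ((sigma[i] + sigma[j]) / (mu[i] - mu[j])) ** 2 for i, j in pairs)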

3.3 An X.500 based service as GEMINIS Repository


As we explained in sections 3.1 and 3.2, there are two important blocks of structured information that are critical for meta-learning. We have decided to use a hierarchical information model to represent all this data, and an LDAP (Lightweight Directory Access Protocol) [15] based directory to store it. In the following two sections, 3.4 and 3.5, we explain how both the agent-related and the training-data-related information is organized in a hierarchical manner.


3.4 Snapshots of Agents DB


In an X.500-based directory there is a root node from which all the other entries of the database can be reached by following a path from it, through other entries, to the target node (which may or may not be a leaf). Following the standard X.500 structure now working on the Internet, we have decided to put our subtree under the path "ou=Geminis, ou=ants, ou=Intelec, o=Universidad de Murcia, c=ES". So the GEMINIS X.500 information hangs under our research group (ANTS), which hangs under our department (Intelec), which hangs under our university, and so on. This gives us the possibility of putting the system on the Internet once it is stable. A first look at what hangs under our root node can be seen in figure 10. Under the organizational unit (ou) entry that refers to GEMINIS there are five other ou's. Each ou under GEMINIS groups the most important entities in the system: hosts, data repositories from which an algorithm can get its data, geminis techniques referring to the learning techniques available, learning support agents like the Directory Service Agent (DSA), and all the learning datasets and DBs taken into account until now. Under "ou=geminis techniques, techniquename=neural networks" comes another level of specificity, as depicted in figure 10. Each concrete technique has its own three different ou's: one for the agents, one for holding experiment results, and one for holding the models of each experiment (i.e. in this concrete case, the model is the topology of the neural network and the arcs' weights between nodes). For example, figure 12 shows the detail of a concrete set of experiments, generically called "MSE-Topology", which studies the effects of topology on the mean squared error in predictions after learning. This is precisely the basis for meta-learning data: those results will compound tuples that will themselves compound a meta-learning data set (see figure 14).


[Figure 10: A non-exhaustive representation of the GEMINIS directory contents, related to the intelligent techniques available in GEMINIS. Under ou=geminis techniques hang entries such as techniquename=decission trees, techniquename=clustering, techniquename=evolutive learning, techniquename=inductive logic programming and techniquename=neural networks; under the latter hangs techniquename=backpropagation learning with ou=bpn agents, ou=bpn experiments and ou=bpn models.]
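Looking up agents, experiments or data sets then reduces to ordinary LDAP searches under the GEMINIS subtree; here is a sketch with the python-ldap bindings, where the server address is illustrative and the attribute names are taken from figure 12:

import ldap

con = ldap.initialize("ldap://directory.example.org")   # hypothetical host
base = ("ou=geminis techniques, ou=Geminis, ou=ants, ou=Intelec, "
        "o=Universidad de Murcia, c=ES")

# Find every experiment entry run on the "Wines Types" data source
results = con.search_s(base, ldap.SCOPE_SUBTREE,
                       "(&(objectClass=*)(datasourcename=Wines Types))")
for dn, attrs in results:
    print(dn, attrs.get("testerror"))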


[Figure 11: A non-exhaustive representation of the GEMINIS directory contents, related to the training data sets available in GEMINIS. Under ou=learning datasets hang entries such as datasourcename=Protein Localization Sites, datasourcename=Wine Types Classification, datasourcename=Iris Classification Problem and datasourcename=Tic-Tac-Toe Game.]



[Figure 13: A detail of the representation in the directory of the "Iris Classification Problem". Under the datasourcename=Iris Classification Problem entry hang entries for each attribute (sepal length, sepal width, petal length, petal width) and for each class (Iris Setosa, Iris Virginica, Iris Versicolor), the latter with attributes such as description="The Iris Setosa type of flower" and classproportion=0.333.]


3.5 Snapshots of Training Data Sets


[Figure 12: A detail of some attributes of a learning experiment, executed by an agent called Turtle, which implements a neural network backpropagation algorithm to learn from the data set called "Wines Types". Under ou=bpn experiments hang experiment relations such as ou=Iris-Topology, ou=Images Recognition and ou=MSE-Topology, each holding entries like experimentId=Turtle1 with attributes momentum=0.01, hidden_n=5, epochs=100, attsusedforlearning=1 2 3, attusedasclass=4, datasourcename=Wines Types, learningerror=0.00323 and testerror=0.00562. The generalization error obtained after 100 epochs is 0.00562.]

Accounting for experiments is important, as was noticed in the previous section. However, for a correct study of the meta-learning problem, all the parameters described in section 3.2 must also be easily accessible from the directory. This is what we show in figure 13, which contains the representation of the "Iris Classification Problem". There is a first level of depth where an entry appears for each attribute and class. This data set has four continuous data attributes and three different classes which label each tuple. The next level for attributes would show entries for the parameters mentioned in section 3.2. Parameters for classes are the same for all classes in all data sets: a description, if available, and the class proportion in the whole dataset.

4 Inductive Meta-Learning

Very little has been said until now about our approach to meta-learning. Our approach is inductive, in the sense of trying to discover models of the behaviour of the algorithms in the form of rules. The scheme of inductive meta-learning is depicted in figure 14. There we have some data sets, which must be converted into a GEMINIS common format.

[Figure 14: Our meta-learning scheme. Training data sets TS1 ... TSn, in a common format, and learning algorithms LA1 ... LAm are combined in learning experiments, whose results C11 ... Cnm compound a meta-data set; a rules generator algorithm (RGA) then produces meta-learning rules.]

At the moment we are using 25 different data sets from the UCI machine learning database, but we plan to include more. Once we have the data sets and the algorithms, learning experiments can be run. Each learning experiment is carried out with a learning algorithm and a learning data set, with a concrete tuple of configuration parameters for the algorithm (e.g. if we are using a neural network, the topology could be a configuration parameter). In this way, each arrow that goes from either an algorithm or a data set to a result is actually a potentially infinite number of arrows, one for each different tuple of configuration parameters. With all the results we can compound a global meta-learning data set, from which rules describing learning behaviour can be extracted. This is our main thesis.
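To make the thesis concrete: each experiment contributes one meta-level tuple, and any rule inducer can play the role of the RGA. In the hedged sketch below, the meta-feature values are invented placeholders and scikit-learn's decision tree merely stands in for the unspecified rules generator algorithm:

from sklearn.tree import DecisionTreeClassifier, export_text

# One meta-tuple per experiment: data set meta-features + algorithm id -> outcome
meta_X = [
    # [n_examples, n_attributes, nulls_pct, mean_entropy, algorithm_id]
    [150, 4, 0.00, 1.58, 0],     # placeholder values, e.g. Iris with backprop
    [178, 13, 0.01, 2.10, 1],    # placeholder values, e.g. Wine with C4.5
]
meta_y = ["good", "bad"]         # discretised experiment performance

rga = DecisionTreeClassifier(max_depth=3).fit(meta_X, meta_y)
print(export_text(rga))          # readable meta-learning rules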

5 Conclusions, Related and Future Work

This work focuses mainly on meta-learning, although many particular issues are addressed in it. An important development effort has been put into the multi-agent system. An interesting project based on Java agents is JAM [16]. In JAM, agents are distributed depending on where the multiple data sources are located; each agent does its own data mining and then other meta-learning agents combine all the results. They are used for fraud and intrusion detection. A similar approach is observed in the PADMA [7] architecture, in which data mining agents carry out the discovery process in parallel without merging results. The integration of different machine learning tools was outlined previously in the KEPLER system [17], which introduces the concept of extensibility of a data mining system, in the sense of integrating any machine learning algorithm. It is based on the concept of a plug-in; however, it does not incorporate a decision mechanism to choose among those algorithms for a given data mining session. Related works are Statlog [10], the MLT (Machine Learning Tool) project (ESPRIT-2154) and MLC++ [8]. The Statlog project was intended to compare statistical approaches with machine learning and neural networks, in order to obtain some basic guidelines for deciding on the algorithms' best possible uses. In the MLT project, a set of machine learning algorithms were developed and compared in performance. MLC++ provides a set of C++ classes for supervised machine learning, and many of them, if not all, will be integrated into our system. They have already been used in MineSet [2], a data mining system which focuses mainly on results visualization. Meta-learning is also outlined in the earlier-mentioned JAM project; JAM's meta-learning must be understood in terms of combining different classifiers previously obtained by each data mining agent.

For the engineering of a generic data mining system able to integrate any implementation of any machine learning algorithm, both a distributed processing platform and a powerful directory service are needed in order to assure scalability. CORBA has been singled out here as the most popular framework for object-oriented distributed processing. Besides, LDAP is the most used protocol for accessing X.500-based directory services.

However, there is a lot of work still to be done. The meta-learning software infrastructure is now ready, but there are no results yet. Nevertheless, the GEMINIS system is already being used in other projects [9, 1] to do generic intelligent data analysis, as it can be a powerful and flexible data analysis tool.

References


[1] J. Botia-Blaya, A. Gomez-Skarmeta, G. Sanchez, M. Valdes, and J. A. Lopez-Morales. Aplicación de Técnicas Inteligentes a la Mejora de los Procesos de Riego en el Entorno Agrícola de la Región de Murcia. In CAEPIA'99 TTIA'99, Libro de Actas, Volumen II.

[2] Cliff Brunk, James Kelly, and Ron Kohavi. MineSet: An Integrated System for Data Mining. In David Heckerman, Heikki Mannila, Daryl Pregibon, and Ramasamy Uthurusamy, editors, The Third International Conference on Knowledge Discovery & Data Mining. AAAI Press, August 1997.

[3] Douglas E. Comer. Internetworking with TCP/IP. Prentice Hall, 1995.

[4] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors. Advances in Knowledge Discovery and Data Mining. AAAI Press/The MIT Press, 1996.

[5] HUGIN. Introduction to Bayesian Networks. Technical report, HUGIN EXPERT A/S, 1993.

[6] ISO/IEC-9594-1, X.501. The Directory: Overview of Concepts, Models and Services. An ISO/ITU-T Standard, 1995.

[7] Hillol Kargupta, Ilker Hamzaoglu, and Brian Stafford. Scalable, Distributed Data Mining: an Agent Architecture. In David Heckerman, Heikki Mannila, Daryl Pregibon, and Ramasamy Uthurusamy, editors, The Third International Conference on Knowledge Discovery & Data Mining. AAAI Press, August 1997.

[8] R. Kohavi, G. John, R. Long, D. Manley, and K. Pfleger. MLC++: A machine learning library in C++. In Tools with Artificial Intelligence, pages 249-271. IEEE Computer Society Press, 1993.

[9] H. Martinez-Barbera, A. Gomez-Skarmeta, M. Zamora-Izquierdo, and J. Botia-Blaya. Neural Networks for Sonar and Infrared Sensors Fusion. In Fusion 2000, Paris, July 2000.

[10] Donald Michie, David J. Spiegelhalter, and Charles C. Taylor, editors. Machine Learning, Neural and Statistical Classification. Ellis Horwood, 1994.

[11] Tom M. Mitchell. Machine Learning. McGraw-Hill, 1997.

[12] OMG. The Common Object Request Broker: Architecture and Specification. Technical report, Object Management Group, July 1995.

[13] J. Ross Quinlan. C4.5: Programs for Machine Learning. The Morgan Kaufmann Series in Machine Learning. Morgan Kaufmann, San Mateo, California, 1993.

[14] T. W. Rauber, M. M. Barata, and A. S. Steiger-Garção. A toolbox for analysis and visualization of sensor data in supervision. In International Conference on Fault Diagnosis.

[15] M. Schoffstall and J. Davin. A Simple Network Management Protocol (SNMP). Technical report, Internet Standard, RFC 1098, 1990.

[16] Salvatore Stolfo, Andreas L. Prodromidis, Shelley Tselepis, Wenke Lee, and Dave W. Fan. JAM: Java Agents for Meta-Learning over Distributed Databases. In David Heckerman, Heikki Mannila, Daryl Pregibon, and Ramasamy Uthurusamy, editors, The Third International Conference on Knowledge Discovery & Data Mining. AAAI Press, August 1997.

[17] Stefan Wrobel, Dietrich Wettschereck, Edgar Sommer, and Werner Emde. Extensibility in data mining systems. In Evangelos Simoudis, Jiawei Han, and Usama Fayyad, editors, The Second International Conference on Knowledge Discovery & Data Mining. AAAI Press, August 1996.
