A Proposal for Meta-learning through a Multi-agent System

Juan A. Botía ([email protected]) and Antonio G. Skarmeta ([email protected])
Dep. Informática, Inteligencia Artificial y Electrónica, Facultad de Informática, Universidad de Murcia, E-30001 Murcia, Spain

Mercedes Garijo and Juan R. Velasco ({mga, jvelasco}@dit.upm.es)
Dep. de Ingeniería de Sistemas Telemáticos, E.T.S.I. Telecomunicación, Universidad Politécnica de Madrid, E-28040 Madrid, Spain

This work has been partially funded by the Spanish Government under the CICYT project TIC 97-1343-C02-01.
1 Introduction

The problem we tackle in the present work has become very important in recent years: the problem of automating the choice of the most suitable learning algorithm for a concrete intelligent data analysis task. We call it the meta-learning problem. Perhaps the most important reason for its emergence is a certain stability reached in both the machine learning and data mining research fields. On the one hand, lots of algorithms have been developed that try to create approximate theories from a number of observed examples (for a good introduction to the state of the art see [11]). On the other hand, algorithms coming from the machine learning paradigm have proved successful in many real world data mining applications. But a problem remains open: there is a lack of methodologies to guide the correct design of a machine learning experiment.

Figure 1 shows the typical knowledge discovery session scheme. For a compendium of KDD (Knowledge Discovery in Databases) applications see [4]. Once we have become familiar with the problem domain and have the data ready to be automatically analysed, two important decisions have to be taken by the data miner: (1) choosing a data mining task, and (2) deciding which data analysis algorithm to apply to the data. These decisions have to be taken one after the other, as (2) heavily depends on decision (1).
Figure 1: A scheme of a complete inductive Knowledge Discovery session. The depicted pipeline: understand problem domain, user goal definition, prior knowledge compilation, select target data, data cleaning, data reduction, DM task selection, learning algorithm selection, perform DM, interpret and evaluate, and exploit new knowledge (with a feedback loop to change the algorithm's parameters).
1.1 Choosing the data mining task

Some data mining tasks are:
Classification is learning a function that maps a data item into one of several predefined classes. Think, for example, of the automated identification of objects of interest in large image databases, or of classifying trends in financial markets.

Regression is learning a function that maps a data item to a real-valued prediction variable. There are many examples of regression use: predicting consumer demand for a new product depending on the advertising expenditure, estimating the probability that a patient will get better, or worse, given the results of a set of diagnostic tests, etc.

Clustering is identifying a finite set of categories or clusters to describe the data, the categories being mutually exclusive and exhaustive, hierarchical or overlapping. An example is discovering homogeneous sub-populations of consumers in a marketing database.

Summarization covers methods for finding a compact description for data. A simple example is tabulating the means and standard deviations of all fields; a more sophisticated method is the discovery of functional relations between variables.

Dependency Modeling is finding a model describing structural and quantitative dependencies between variables. Having a Bayesian belief network in mind [5], arrows between variables show structural relations, and conditional probabilities show their strength.

Change and Deviation Detection tries to discover the most significant changes in data with respect to previously measured or normative values. This technique is used, for example, in credit card fraud detection.

To get deeper into the problem, we shall introduce the factors that influence decision (1). Choosing a data mining task depends on three factors:

1. User requirements: this is an important factor, as a great portion of the software application's success will depend on the final user satisfaction level. We have clearly identified three dimensions of user requirements: the user's primary goal (i.e. why he is using a data analysis tool), his interactivity needs, and the desired quality level of the final data model.

2. The nature of the source data: as we understand it, this is the most important factor in choosing among a set of data analysis algorithms. Meta-data parameters about the source data, like the number of available tuples, the number of features, the nulls percentage and so on, have to be taken into account in order to get a data theory of optimal quality.

3. The type of model to obtain: this is related to the above mentioned user satisfaction level, in the sense that obtaining a model the user can easily understand and interpret is important. Besides, the data mining expert may have more skill with certain types of models than with others, and so may prefer the former. Another reason to consider the type of data model an important factor when deciding the task type is that a regression task that does not require an interpretation of the data leads to using artificial neural networks as black boxes, while a task of detecting clouds of similar behaviour leads to a clustering model (i.e. a set of points and a metric to obtain the tuples that are closest to each group).

1.2 Choosing the right algorithm

The concrete learning algorithm chosen is determined by three parameters. The first is the way of representing knowledge, the second is the strategy for evaluating new knowledge, and the third is the search technique for discovering new knowledge.

Model Representation is the language, L, for describing discoverable patterns. The expressive power of a given language L must be fully known by the analyst, because that power may either limit the model accuracy or increase the search complexity. For example, a decision tree based language cannot discover relations like x = y, but its search complexity is not so high (it is reduced to finding the most discriminant attributes). Others, like first order logics, have a very powerful expressiveness, but their search complexity is very high.

Model Evaluation depends on the primary data analysis task. The most commonly used quantitative measure for almost all models is accuracy. Three approaches are available for estimating it:

- train & test: there are two differentiated data sets, the training data set and the evaluation data set. The first one is used for learning, and the second evaluates the newly learned knowledge.

- bootstrap: several learning processes are run in parallel, with different data sets formed randomly and with replacement from the original data set; all classifiers are then tested with the examples that did not take part in their corresponding training phase. All error rates can then be combined to obtain statistics.

- cross-validation: the original data set is divided into m subsets and m learning processes are run in parallel, each one with a training set formed by the union of m - 1 subsets: all subsets but its evaluation one, which must be distinct for all m classifiers. The evaluation ratio is the mean of the m produced ratios (see the sketch at the end of this section).

Evaluation in a classification task involves other factors besides prediction accuracy: novelty, understandability and utility. These are perhaps more subjective criteria.

Search Method depends on whether we are searching for a model or for the right parameters of an already fixed model. Searching for a model implies a larger search space, usually explored with heuristics. Searching for parameters that optimize a fixed model given observed data is usually a greedy search, like the gradient descent method of backpropagation neural nets.
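To make the cross-validation estimate above concrete, the following minimal sketch shows it in Java. The Learner interface is hypothetical, standing in for any classification algorithm; the whole class is illustrative, not part of any system described here.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // Hypothetical stand-in for any classification algorithm.
    interface Learner {
        void train(List<double[]> features, List<Integer> labels);
        int predict(double[] example);
    }

    public class CrossValidation {
        // Mean accuracy over m folds: each process trains on the union of m-1
        // subsets and is evaluated on the remaining, distinct, subset.
        public static double mFoldAccuracy(Learner learner, List<double[]> features,
                                           List<Integer> labels, int m) {
            int n = features.size();
            List<Integer> order = new ArrayList<>();
            for (int i = 0; i < n; i++) order.add(i);
            Collections.shuffle(order); // random partition of the original data set

            double accuracySum = 0.0;
            for (int fold = 0; fold < m; fold++) {
                List<double[]> trainX = new ArrayList<>();
                List<Integer> trainY = new ArrayList<>();
                List<double[]> testX = new ArrayList<>();
                List<Integer> testY = new ArrayList<>();
                for (int i = 0; i < n; i++) {
                    int j = order.get(i);
                    if (i % m == fold) { // each example lands in exactly one evaluation set
                        testX.add(features.get(j)); testY.add(labels.get(j));
                    } else {
                        trainX.add(features.get(j)); trainY.add(labels.get(j));
                    }
                }
                learner.train(trainX, trainY);
                int hits = 0;
                for (int i = 0; i < testX.size(); i++)
                    if (learner.predict(testX.get(i)) == testY.get(i)) hits++;
                accuracySum += (double) hits / testX.size();
            }
            return accuracySum / m; // the mean of the m produced ratios
        }
    }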
1.3 Our Proposal
Once the problems have been stated, it is time to offer solutions. As has been clearly emphasized, the more analysis algorithms we have, the higher the final user satisfaction we can achieve. One direction of this work is the proposal of a multi-agent architecture that provides a repository of ready-to-go data mining algorithms that:

- are encapsulated into learning agents, each of them offering a common interface, so that all of them can be managed in the same manner, no matter which type of data mining algorithm they wrap;

- can be distributed over different machines, with different operating systems and hardware architectures.

In this way we can offer both a powerful data mining tool (in terms of performance) and a scalable scheme for the integration of data mining algorithms, as the system can be extended through as many hosts as we need. The other direction is that this system is being used, at the same time, to study the meta-learning problem under an inductive approach. That is, we maintain the idea of using all data mining session results as feedback for the system, in order to automatically learn what should be done in the next session. In this sense the approach is inductive, if we see each data mining session's results as a tuple of a meta-learning data set.

Section 2 introduces and explains the part of the multi-agent architecture devoted to offering a repository of intelligent analysis algorithms. Section 3 shows our approach to achieving scalability. Section 4 goes deeper into the meta-learning issue and gives details on the inductive approach we use. In section 5, related work is introduced, and conclusions and future work are pointed out.

2 A Multi-agent Architecture for Intelligent Data Analysis

In this section we concentrate on the software engineering process of encapsulating a machine learning algorithm into a software agent. In order to better understand what a sample scenario of the final system we are proposing could be (we call it GEMINIS, for GEneric MINIng System), we begin by showing a possible configuration of the system, which appears in figure 2. This figure shows a host named A that runs three different processes, each of them a different data analysis algorithm: two of them work with decision rules (AQ11 and ID3), and another one is an artificial neural network that learns with the backpropagation rule. The host named B runs FOIL, which learns first order logic predicates, and C4.5, which generates decision trees. Finally, on the host named C there is a single algorithm, ANFIS, which generates fuzzy rules from data. There are two important issues that must be taken into account after having a look at the figure:
- All algorithms are seen in the same manner by a possible client (i.e. they offer the same interface in terms of service calls).
- There is a middleware that offers distribution services and handles communication between clients and learning agents, based on CORBA (Common Object Request Broker Architecture).
Both symbolic and connectionist algorithms can run without any problem in the same host. Besides, each one is self-contained, in the sense that none of them needs another one to provide its learning services.

Figure 2: Typical scenario of GEMINIS. Notice that there is a middleware that works as an IPC (Inter Process Communication) layer, based on CORBA.
2.1 Learning Services Declaration
    module algorithms{
      interface IMachineLearningService{
        void conf_algorithm();

        void make_learning(in TLearningTimeConstraint mode, in unsigned long time)
          raises (AlgorithmIsNotConfiguredException,
                  BadConfigurationException,
                  LearningExperimentDoNotConvergeException,
                  BestEffortDoneBeforeTimeException,
                  TimeTooConstrainedException);

        model::IInference make_inference(in data_server::IExample example)
          raises (LearningIsNotDoneException, BadExampleException);

        //This call must return, in some way represented, the learned model
        model::IModelService get_learned_model()
          raises (LearningIsNotDoneException);

        //This call returns a double representing, in some way, the quality of the
        //learned model
        double get_experiment_performance()
          raises (LearningIsNotDoneException);

        //This call will allow for a new experiment to begin
        void forget_all()
          raises (LearningIsNotDoneException);
      };
    };
Figure 3: Common services interface for all learning algorithms in GEMINIS.
The set of common services that all algorithms offer must be specified in a descriptive language. This language is IDL (Interface Definition Language) [12], commonly used to define CORBA based distributed services that can be accessed from anywhere through the CORBA bus. Figure 3 shows a reduced IDL definition of the services that all learning algorithms in GEMINIS should offer. In that specification there is a name space called algorithms, inside which all declarations related to data analysis algorithms should be enclosed. Inside this name space appears the interface IMachineLearningService, which includes all services that must be offered by learning algorithms in GEMINIS. IDL follows an object oriented approach, so inheritance can be used. In this way, each particular algorithm can be specified by defining its particular interface, including its configuration parameters as data members, so that it can be properly configured before learning. Considered simply as servers, the algorithms belong to the group of servers with context (also named servers with state, see [3]). This property is important in the sense that it is not possible for a concrete algorithm to work with more than one client at the same time. Paying attention to figure 3, the services offered by IMachineLearningService are:

void conf_algorithm(): used to configure the initial parameters of the learning algorithm. Initial parameters comprise a group of general parameters, common to all algorithms (e.g. the data source to learn from), and a group of specific parameters (e.g. for an artificial neural network with the backprop learning rule there are parameters like the learning rate, momentum, topology, etc.).

void make_learning(): this is, obviously, the most important call; it makes the algorithm learn a new theory from data. The algorithm must be configured (i.e. its parameters must be set) before learning; if it is not, an AlgorithmIsNotConfiguredException exception must be raised. There are three modes for learning, indicated by the argument of type TLearningTimeConstraint:

1. FREE_MODE: the algorithm can learn until its process converges to a model acceptable by itself.

2. BEST_EFFORT: the algorithm is given a time limit and the client must wait for it to learn. If the time limit is passed and the algorithm has not finished learning, it must stop and return the best model obtained so far (the client only sees a normal ending of the call). If the desired model is obtained in time, then a BestEffortDoneBeforeTimeException exception is raised from the server and the learning call finishes.

3. TIME_CONSTRAINED: this time the time limit is strict, and the model is only accepted if
the learning process finishes in time. If not, a TimeTooConstrainedException is raised and the model is destroyed.

model::IInference make_inference(in data_server::IExample example): once the desired model has been obtained, inferences can be done (by the learning server) from a given example. It must be noticed that the IInference object will be of the concrete type of inferences the learning server uses, by means of the inheritance available in IDL. Besides, if the learning process was not done before this call, the client will get a LearningIsNotDoneException exception.

double get_experiment_performance(): this call allows the client to obtain a goodness measure for the obtained model. This measure depends on the type of error estimation used by the algorithm's internals. If the model is to be evaluated by a standard mechanism and from the client, then the former call (i.e. make_inference()) should be used.

void forget_all(): tells the server to forget the learned model and put itself in an idle state.

The obtained model can be used remotely, for as long as desired, without worrying about its type or particular functionality.
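To make the life cycle concrete, the following hedged sketch shows how a Java client could drive a learning agent through the interface of figure 3, assuming stubs generated from that IDL with idlj. The IMachineLearningServiceHelper class, the TLearningTimeConstraint enum constants and the way the IOR is obtained are assumptions of this sketch, not part of any published GEMINIS API.

    import org.omg.CORBA.ORB;

    public class LearningClient {
        public static void main(String[] args) {
            ORB orb = ORB.init(args, null);
            // args[0] carries the agent's object reference (IOR); how it is
            // published (naming service, file, ...) is left out of this sketch.
            org.omg.CORBA.Object ref = orb.string_to_object(args[0]);
            algorithms.IMachineLearningService agent =
                    algorithms.IMachineLearningServiceHelper.narrow(ref);
            try {
                agent.conf_algorithm(); // general and specific parameters set beforehand
                // BEST_EFFORT: wait at most 60 seconds; the server keeps the best model so far.
                agent.make_learning(algorithms.TLearningTimeConstraint.BEST_EFFORT, 60);
            } catch (algorithms.BestEffortDoneBeforeTimeException ready) {
                // The desired model converged before the time limit expired.
            } catch (org.omg.CORBA.UserException error) {
                System.err.println("learning failed: " + error);
                return;
            }
            try {
                double quality = agent.get_experiment_performance(); // internal estimate
                System.out.println("model quality: " + quality);
                agent.forget_all(); // back to idle, ready for a new experiment
            } catch (org.omg.CORBA.UserException error) {
                System.err.println(error.toString());
            }
        }
    }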
2.2 Agents Behaviour Modeling
The internal functioning of a typical GEMINIS learning server is modeled by the finite deterministic automaton that appears in figure 4.

Figure 4: State transitions (behaviour) of a typical GEMINIS learning server.

The simplest interaction between a client and a learning server is one in which, first of all, the algorithm is configured, then the learning process is carried out in time, and after that the model is evaluated, accepted, and inferences are done with it. Once the work is done, the algorithm goes into an idle state again. The events diagram appearing in figure 5 shows this interaction.

Figure 5: Simple interaction: configuration, learning, model evaluation, inference and reset.

Another typical scenario is one in which the time restrictions are of type TIME_CONSTRAINED and the time is totally spent. Notice that, in the case depicted in figure 5, just after the make_learning() call a timeout is set at the client. This mechanism is necessary because the following cases can happen:

1. The client is willing to wait a concrete time for the algorithm to finish its learning task. Here, two different situations can occur once the whole time is spent:

(a) The client is happy with the obtained model (the client has worked under BEST_EFFORT mode).

(b) The client does not like the obtained model, as it understands that the algorithm was not able to reach the desired level of quality in the theory.

2. The client does not use time restrictions, but the algorithm sets to work and never converges.

Scenario 1a is depicted in the interaction scheme of figure 6. Scenario 1b appears in figure 7. The last scenario, 2, is represented in figure 8.

Figure 6: Simple interaction with spent timeout: the client accepts what the algorithm learned in that time.

Figure 7: Simple interaction with spent timeout: the client refuses the model obtained by the learning server in that time interval.

Figure 8: Simple interaction without time restrictions and without an answer from the learning server: an intervention from a third party is necessary in order to re-establish the normal functioning of the system.

Scenario 2 is the most interesting one. It shows an important detail related to concepts like software productivity and code replication. The third-party entity just mentioned has the purpose of concentrating all coordination tasks between the rest of the entities that take part in a data mining session. As can be seen, when the client makes the make_learning call, it is necessary to trigger an event to this third party so that it can account for all the learning processes taking part in the system at any time. So, when there is a timeout (determined by the third party itself), it comes into play and tries to re-establish the normal functioning.
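Read as code, the automaton of figure 4 amounts to something like the sketch below. This is our own illustrative Java rendering, not GEMINIS source; the state names follow the figure, and the mapping of calls to transitions is our reading of it.

    // Illustrative state machine for a learning server, after figure 4.
    public class LearningServerAutomaton {
        enum State { IDLE, CONFIGURED, LEARNING, LEARNED, MODEL_EVALUATED }

        private State state = State.IDLE;

        public void confAlgorithm() {
            check(state == State.IDLE, "server is busy or already configured");
            state = State.CONFIGURED;
        }

        public void makeLearning() {
            check(state == State.CONFIGURED, "AlgorithmIsNotConfiguredException");
            state = State.LEARNING;
            // ...the learning process runs here; on convergence (or when the
            // time limit applies) the server reaches the learned state...
            state = State.LEARNED;
        }

        public double getModelPerformance() {
            check(state == State.LEARNED || state == State.MODEL_EVALUATED,
                  "LearningIsNotDoneException");
            state = State.MODEL_EVALUATED;
            return 0.0; // placeholder for the internal error estimate
        }

        public void makeInference(double[] example) {
            check(state == State.LEARNED || state == State.MODEL_EVALUATED,
                  "LearningIsNotDoneException");
            // ...apply the learned model to the given example...
        }

        public void forgetAll() {
            state = State.IDLE; // forget the model: ready for a new experiment
        }

        private static void check(boolean ok, String error) {
            if (!ok) throw new IllegalStateException(error);
        }
    }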
3 Achieving Scalability
In section 1.3 we mentioned that it is important to have as many learning algorithms as possible in the system. It is also clear that, in order to build a mechanism for reasoning about the learning capabilities of an agent, those capabilities must be clearly defined and located in the GEMINIS system. In this section we will first define both our characterisation of a learning agent and our characterisation of a learning data set. Then we will justify the use of an X.500 based directory service [6] as a way to centralise the information related to the description of all agents and data. We will see that a directory service is a powerful software mechanism to provide scalability.
Figure 9: A non-exhaustive hierarchy of the learning techniques considered in GEMINIS: inductive logic programming; evolutive learning; neural networks (backpropagation learning, Kohonen networks, fuzzy neural networks); decision trees; reinforcement learning; and clustering (normal and hierarchical, with k-means, fuzzy k-means and mountain clustering).
3.1 The Learning Agent Information Model
A learning agent in GEMINIS is defined from three different angles. The first concerns the agent's particular capabilities as a concrete learning algorithm. The second is related to its implementation features (e.g. the language it is programmed in, possible parallelisation, etc.). The last one has to do with the execution environment it is located in (the concrete machine and operating system). These three dimensions are included in the definition of a GEMINIS learning server.

With respect to learning capabilities, an agent is enclosed in a concrete machine learning family, and all families are organised in a hierarchical manner. At the moment, the hierarchy we have designed is the one appearing in figure 9. Besides, for each family there are common descriptors like:

- The level of visual interpretation of the produced model.

- The endurance to noisy data that a family shows.

- The type of data the algorithm works with (i.e. continuous, symbolic or both).

- The management of the data used for learning, understood as a boolean variable that indicates whether the algorithm needs all the data in memory for learning or not (see [13], pg. 53).

- The typical load that is injected into the host by the algorithm when working.

Implementation features considered relevant in GEMINIS are:

- The language used to code the agent: it is important, in terms of resources and CPU consumption, to distinguish whether the agent is programmed in Java, C++ or C (these are the three languages in which the algorithms encapsulated into agents are programmed).

Features about the execution environment that are taken into account in the system are:

- Architecture of the host (e.g. Sparc, Intel, HP, etc.).

- Operating system used.

- The typical machine load.

- The available secondary memory space at the moment of execution.

- Main memory used in the host.

Most of the parameters mentioned above take symbolic values. For example, a parameter like the typical machine load could take values in {LOW, LOW_MEDIUM, MEDIUM, MEDIUM_HIGH, HIGH}. A group of them have fixed values, and others, like the one related to available secondary memory, can change dynamically. However, at the moment this possibility is not observed; we plan to use SNMP (Simple Network Management Protocol) [15] to update these parameters.
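Put together, the three angles could be carried as a plain record like the following sketch. The field names are ours and purely illustrative, but the values follow the descriptors just listed.

    // Illustrative record of an agent description (all field names are ours).
    public class AgentDescription {
        // Learning capabilities
        String family;                   // a node of the hierarchy in figure 9
        int visualInterpretationLevel;   // of the produced model
        boolean enduresNoisyData;
        String dataTypes;                // "continuous", "symbolic" or "both"
        boolean needsAllDataInMemory;    // management of the data used for learning
        String typicalAlgorithmLoad;     // symbolic: LOW ... HIGH

        // Implementation features
        String language;                 // "Java", "C++" or "C"

        // Execution environment
        String hostArchitecture;         // e.g. "Sparc", "Intel", "HP"
        String operatingSystem;
        String typicalMachineLoad;       // symbolic: LOW ... HIGH
        long availableSecondaryMemory;   // could be kept up to date via SNMP
        long mainMemoryUsed;
    }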
3.2 The Training Data Sets Information Model
Training data set features are important in order to choose, among the available algorithms, the one that best suits the user requirements. Hence, it is important to characterise data sets in terms of general parameters and particular parameters of each feature and class (if the tuples are labeled as pertaining to a concrete class). In the following we enumerate all the parameters that are considered general:

- Number of examples, number of classes, number of continuous and discrete attributes: these give an idea of the size of the data set.
- Examples per class, examples per attribute, attributes per class: these give an idea of the relative number of examples for each class, of examples for each attribute (the length of the set), and of attributes for each class.
- Nulls percentage: obtained by dividing the number of nulls in the whole data set by the total number of individual data in the set, as in
\[ \mathrm{Nulls\%} = \frac{nulls}{features \cdot tuples} \]

- Entropy: calculated as the arithmetic mean of the entropies of all attributes in the data set. This measure gives light about the quality of the data set for classification tasks. The expression used is
\[ E = -\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{m} p_{ij} \log p_{ij} \]
where $n$ is the number of attributes and $p_{ij}$ is the relative frequency of the $j$-th value of attribute $i$.

- Univar: a typical measure when working with data coming from sensors [14]; it estimates the level of separability of the classes in the data. For a given class $w_i$ and a given value $x$, the normalised probabilistic distance of $x$ to $w_i$ is defined as
\[ d_i = \frac{|x - \mu_i|}{\sigma_i} \]
where $\mu_i$ is the mean of class $w_i$ and $\sigma_i$ its standard deviation. This way, the distance between two classes $w_i$ and $w_j$ is defined as
\[ d = \frac{|\mu_i - \mu_j|}{\sigma_i + \sigma_j} \]
Following [14], a measure of separability between the two classes is
\[ \delta(w_i, w_j) = 1 - \left( \frac{\sigma_i + \sigma_j}{\mu_i - \mu_j} \right)^2 \]
so the mean of all distances between all classes is
\[ \frac{2}{c(c-1)} \sum_{i=1}^{c-1} \sum_{j=i+1}^{c} \left[ 1 - \left( \frac{\sigma_i + \sigma_j}{\mu_i - \mu_j} \right)^2 \right] \]
and, finally, the whole measure averages this quantity over the $n$ attributes:
\[ \frac{1}{n} \sum_{k=1}^{n} \frac{2}{c(c-1)} \sum_{i=1}^{c-1} \sum_{j=i+1}^{c} \left[ 1 - \left( \frac{\sigma_i + \sigma_j}{\mu_i - \mu_j} \right)^2 \right] \]

And now we explain the parameters that are calculated for each attribute of the data set. First of all, we must say that the parameters for symbolic attributes are a subset of the parameters for continuous attributes, so let us start with that first subset. Parameters for both continuous and discrete attributes are:

- Attribute entropy, explained above.

- Attribute mode, that is, the most frequent value.

The most important parameters only for continuous attributes are:

- The median, that is, the value found in the middle of the set of all values.

- The arithmetic mean of an attribute, being $n$ the number of examples:
\[ \bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} \]

- The geometric mean of an attribute:
\[ x_g = \sqrt[n]{x_1 x_2 \cdots x_n} \]

- The harmonic mean of an attribute:
\[ x_a = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}} \]

- The lower, middle and upper quartiles, that are the values of the attribute under which we can find 25%, 50% and 75% of the values, respectively.

- The skewness of an attribute, given by the expression
\[ sk = \frac{E[(X - \mu)^3]}{\sigma^3} \]
which measures how similar to a normal-like distribution the values of the attribute are.

- The kurtosis, given by the expression
\[ kr = \frac{E[(X - \mu)^4]}{\sigma^4} \]
which estimates the proportion of values that are far from the mean.

- The variance, calculated with
\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

- The variation coefficient, using the mean and the variance:
\[ C^2 = \frac{s^2}{\bar{x}^2} \]

All these parameters, both general for the whole data set and particular for each attribute, are intended to inform the decision process of meta-learning.
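As a small illustration, the sketch below computes two of the general parameters just defined, the nulls percentage and the data set entropy, over a data set held as a matrix of (discretised) attribute values. Encoding a null as Double.NaN is an assumption of this sketch.

    import java.util.HashMap;
    import java.util.Map;

    public class MetaDataParams {
        // Nulls% = nulls / (features * tuples); a null is encoded here as Double.NaN.
        static double nullsPercentage(double[][] data) {
            int nulls = 0, tuples = data.length, features = data[0].length;
            for (double[] tuple : data)
                for (double v : tuple)
                    if (Double.isNaN(v)) nulls++;
            return (double) nulls / (features * (double) tuples);
        }

        // Entropy of one (discretised) attribute: -sum_j p_j log p_j over its value frequencies.
        static double attributeEntropy(double[][] data, int attr) {
            Map<Double, Integer> freq = new HashMap<>();
            for (double[] tuple : data)
                freq.merge(tuple[attr], 1, Integer::sum);
            double h = 0.0;
            for (int count : freq.values()) {
                double p = (double) count / data.length;
                h -= p * Math.log(p);
            }
            return h;
        }

        // Data set entropy: the arithmetic mean of the attribute entropies.
        static double dataSetEntropy(double[][] data) {
            double sum = 0.0;
            int features = data[0].length;
            for (int a = 0; a < features; a++) sum += attributeEntropy(data, a);
            return sum / features;
        }
    }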
3.3 An X.500 based service as the GEMINIS Repository
As explained in sections 3.1 and 3.2, there are two important blocks of structured information that are critical for meta-learning. We have decided to use a hierarchical information model to represent all this data, and an LDAP (Lightweight Directory Access Protocol) based access to the database that stores it. In the following two sections, 3.4 and 3.5, we explain how both the agent related and the training data related information is organized in a hierarchical manner.
3.4 Snapshots of Agents DB
In an X.500 based directory there is a root node from which all the other entries of the database can be reached, following a path through other entries to the target node (which may or may not be a leaf). Following the standard X.500 structure that is now working on the Internet, we have decided to put our subtree under the path "ou=Geminis, ou=ants, ou=Intelec, o=Universidad de Murcia, c=ES". So, the GEMINIS X.500 information hangs under our research group (ANTS), which hangs under our department (Intelec), which hangs under our university, and so on. This gives us the possibility of publishing the system on the Internet once it is stable. A first look at what hangs under our root node can be seen in figure 10. Under the organizational unit (ou) entry that refers to GEMINIS there are five other ou's. Each ou under GEMINIS groups the most important entities in the system: hosts, data repositories where an algorithm can get its data, geminis techniques (the learning techniques available), learning support agents like the Directory Service Agent (DSA), and all the learning data sets and DBs considered so far. Under "ou=geminis techniques, techniquename=neural networks" comes another level of specificity, as depicted in figure 10. Each concrete technique then has its own three different ou's: one for the agents, one for holding experiment results, and one for holding the models of each experiment (in this concrete case, the model is the topology of the neural network and the arcs' weights between nodes). For example, figure 12 shows the detail of a concrete relation of experiments, generically called "MSE-Topology", that tries to study the effects of topology on the mean squared error of predictions after learning. This is precisely the base for the meta-learning data: those results will compound tuples that will themselves compound a meta-learning data set (see figure 14).
Figure 10: A non-exhaustive representation of the GEMINIS directory contents, related to the intelligent techniques available in GEMINIS.
Figure 11: A non-exhaustive representation of the GEMINIS directory contents, related to the training data sets available in GEMINIS.
Figure 13: A detail of the representation in the directory of the "Iris Classification Problem".
Figure 12: A detail of some attributes of a learning experiment, executed by an agent called Turtle, that implements a neural network backpropagation algorithm to learn from the data set called "Wines Types". The generalization error obtained after 100 epochs is 0.00562.
3.5 Snapshots of Training Data Sets
Accounting of experiments is important, as noticed in the previous section. However, for a correct study of the meta-learning problem, all the parameters described in section 3.2 must also be easily accessible from the directory. This is what we show in figure 13, which presents the representation of the "Iris Classification Problem". At a first level of depth, an entry appears for each attribute and class. This data set has four continuous attributes and three different classes labelling each tuple. The next level for attributes shows entries for the parameters mentioned in section 3.2. The parameters for classes are the same for all classes in all data sets: a description, if available, and the class proportion in the whole data set.
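Since the repository is reachable through plain LDAP, a client can read experiment tuples back with a generic LDAP API. The sketch below uses Java's JNDI; the directory host is hypothetical, while the DN components and the attribute names (experimentId, testerror) are taken from figures 10 and 12.

    import javax.naming.Context;
    import javax.naming.NamingEnumeration;
    import javax.naming.directory.InitialDirContext;
    import javax.naming.directory.SearchControls;
    import javax.naming.directory.SearchResult;
    import java.util.Hashtable;

    public class ExperimentLookup {
        public static void main(String[] args) throws Exception {
            Hashtable<String, String> env = new Hashtable<>();
            env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
            env.put(Context.PROVIDER_URL, "ldap://directory.example.org:389"); // hypothetical host
            InitialDirContext ctx = new InitialDirContext(env);

            // Search the backpropagation experiments subtree for every stored experiment.
            String base = "ou=MSE-Topology, ou=bpn experiments,"
                        + " techniquename=backpropagation learning, techniquename=neural networks,"
                        + " ou=geminis techniques, ou=Geminis, ou=ants, ou=Intelec,"
                        + " o=Universidad de Murcia, c=ES";
            SearchControls sc = new SearchControls();
            sc.setSearchScope(SearchControls.ONELEVEL_SCOPE);
            NamingEnumeration<SearchResult> results = ctx.search(base, "(experimentId=*)", sc);
            while (results.hasMore()) {
                SearchResult r = results.next();
                System.out.println(r.getName() + " -> "
                        + r.getAttributes().get("testerror")); // generalisation error of the run
            }
            ctx.close();
        }
    }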
4 Inductive Meta-Learning

Very little has been mentioned until now about our approach to meta-learning. Our approach is inductive, in the sense of trying to discover models of behaviour for the algorithms in the form of rules. The scheme of inductive meta-learning is depicted in figure 14: there we have some data sets, which must be converted into a GEMINIS common format.
Figure 14: Our meta-learning scheme. Training data sets TS1, ..., TSn, converted into the common format, and learning algorithms LA1, ..., LAm are combined in learning experiments; the experiment results Cij compound the meta-data set, from which a rules generator algorithm (RGA) extracts meta-learning rules.
At the moment we are using 25 different data sets from the UCI machine learning repository, but we are planning to include more. Once we have the data sets and the algorithms, learning experiments can be run. Each learning experiment is accomplished with a learning algorithm and a learning data set, with a concrete tuple of configuration parameters for the algorithm (e.g. if we are using a neural network, the topology could be a configuration parameter). In this way, each arrow that goes from either an algorithm or a data set to a result can actually stand for a potentially infinite number of arrows, one for each different configuration parameter tuple. With all the results we can compound a global meta-learning data set, and learning behaviour rules can be extracted from it. This is our main thesis.
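As a sketch of what one tuple of that meta-learning data set could carry, the following Java record joins the data set's general parameters, the algorithm's configuration and the observed result; all field names are illustrative.

    // Illustrative only: one row of the meta-learning data set.
    public class MetaTuple {
        String dataSetName;      // e.g. "Iris Classification Problem"
        int examples;            // general parameters of the data set (section 3.2)
        int classes;
        double nullsPercentage;
        double entropy;
        String algorithm;        // e.g. "backpropagation learning"
        String configuration;    // e.g. "hidden_n=5 epochs=100 momentum=0.01"
        double testError;        // quality of the learned model, read from the directory

        // A rules generator algorithm (RGA) would consume a collection of these
        // tuples and induce behaviour rules, e.g. relating data set entropy and
        // algorithm choice to the expected test error.
    }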
5 Conclusions, Related and Future Work

This work focuses mainly on meta-learning, although many particular issues are addressed in it. An important development effort has been devoted to the multi-agent system. An interesting project based on Java agents is JAM [16]. In JAM, agents are distributed depending on where the multiple data sources are located. Each agent does its own data mining, and then other meta-learning agents combine all the results. They are used for fraud and intrusion detection.
A similar approach is observed in the PADMA [7] architecture, in which data mining agents run the discovery process in parallel without merging results. Integrating different machine learning tools was previously outlined in the KEPLER system [17], which introduces the concept of extensibility of a data mining system, in the sense of integrating any machine learning algorithm. It is based on the concept of plug-in; however, it does not incorporate a decision mechanism to choose among those algorithms for a given data mining session. Related works are Statlog [10], the MLT (Machine Learning Toolbox) project (ESPRIT-2154) and MLC++ [8]. The Statlog project was intended to compare statistical approaches with machine learning and neural networks, in order to obtain some basic guidelines for deciding on the algorithms' best possible uses. In the MLT project, a set of machine learning algorithms were developed and compared in performance. MLC++ provides a set of C++ classes for supervised machine learning, and many of them, if not all, will be integrated into our system. They have already been used in MineSet [2], a data mining system which focuses mainly on results visualization. Meta-learning is also outlined in the earlier mentioned JAM project; JAM's meta-learning must be understood in terms of combining different classifiers previously obtained by each data mining agent.

For the engineering of a generic data mining system able to integrate any implementation of any machine learning algorithm, both a distributed processing platform and a powerful directory service are needed, in order to assure scalability. CORBA has been signaled here as the most popular framework for object-oriented distributed processing, and LDAP as the most used protocol for accessing X.500 based directory services. However, there is a lot of work still to be done: the meta-learning software infrastructure is now ready, but there are no results yet. Meanwhile, the GEMINIS system is already being used in other projects [9, 1] to do generic intelligent data analysis, as it can be a powerful and flexible data analysis tool.
References
[1] J. Botía-Blaya, A. Gómez-Skarmeta, G. Sánchez, M. Valdés, and J. A. López-Morales. Aplicación de Técnicas Inteligentes a la Mejora de los Procesos de Riego en el Entorno Agrícola de la Región de Murcia. In CAEPIA'99-TTIA'99, Libro de Actas, Volumen II, 1999.
[2] Cliff Brunk, James Kelly, and Ron Kohavi. MineSet: An Integrated System for Data Mining. In David Heckerman, Heikki Mannila, Daryl Pregibon, and Ramasamy Uthurusamy, editors, The Third International Conference on Knowledge Discovery & Data Mining. AAAI Press, August 1997.

[3] Douglas E. Comer. Internetworking with TCP/IP. Prentice Hall, 1995.

[4] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors. Advances in Knowledge Discovery and Data Mining. AAAI Press/The MIT Press, 1996.

[5] HUGIN. Introduction to Bayesian Networks. Technical report, HUGIN EXPERT A/S, 1993.

[6] ISO/IEC 9594-1, X.501. The Directory: Overview of Concepts, Models and Services. ISO/ITU-T Standard, 1995.

[7] Hillol Kargupta, Ilker Hamzaoglu, and Brian Stafford. Scalable, Distributed Data Mining: an Agent Architecture. In David Heckerman, Heikki Mannila, Daryl Pregibon, and Ramasamy Uthurusamy, editors, The Third International Conference on Knowledge Discovery & Data Mining. AAAI Press, August 1997.

[8] R. Kohavi, G. John, R. Long, D. Manley, and K. Pfleger. MLC++: A machine learning library in C++. In Tools with Artificial Intelligence, pages 249-271. IEEE Computer Society Press, 1993.

[9] H. Martínez-Barberá, A. Gómez-Skarmeta, M. Zamora-Izquierdo, and J. Botía-Blaya. Neural Networks for Sonar and Infrared Sensors Fusion. In Fusion 2000, Paris, July 2000.

[10] Donald Michie, David J. Spiegelhalter, and Charles C. Taylor, editors. Machine Learning, Neural and Statistical Classification. Ellis Horwood, 1994.

[11] Tom M. Mitchell. Machine Learning. McGraw-Hill, 1997.

[12] OMG. The Common Object Request Broker: Architecture and Specification. Technical report, Object Management Group, July 1995.

[13] J. Ross Quinlan. C4.5: Programs for Machine Learning. The Morgan Kaufmann Series in Machine Learning. Morgan Kaufmann, San Mateo, California, 1993.

[14] T. W. Rauber, M. M. Barata, and A. S. Steiger-Garção. A toolbox for analysis and visualization of sensor data in supervision. In International Conference on Fault Diagnosis.

[15] M. Schoffstall and J. Davin. A Simple Network Management Protocol (SNMP). RFC 1098, Internet standard, 1990.

[16] Salvatore Stolfo, Andreas L. Prodromidis, Shelley Tselepis, Wenke Lee, and Dave W. Fan. JAM: Java Agents for Meta-Learning over Distributed Databases. In David Heckerman, Heikki Mannila, Daryl Pregibon, and Ramasamy Uthurusamy, editors, The Third International Conference on Knowledge Discovery & Data Mining. AAAI Press, August 1997.

[17] Stefan Wrobel, Dietrich Wettschereck, Edgar Sommer, and Werner Emde. Extensibility in data mining systems. In Evangelos Simoudis, Jiawei Han, and Usama Fayyad, editors, The Second International Conference on Knowledge Discovery & Data Mining. AAAI Press, August 1996.