Jornadas de Seguimiento de Proyectos, 2007
Programa Nacional de Tecnologías Informáticas
Adaptive Learning of Probabilistic Graphical Models in Data Mining and Automatic Personalisation
TIN2004-06204-C03
Antonio Salmerón ∗ Universidad de Almería
Andrés Cano † Universidad de Granada
José A. Gámez ‡ Universidad de Castilla-La Mancha
Abstract In this project we intend to develop the methodology of adaptive learning of probabilistic graphical models, especially Bayesian networks, with a focus on its application to data mining and automatic personalisation. The background software is the Elvira platform [22], in whose development the groups in this project took part through projects TIC971137-C04 and TIC2001-2973-C05. The most important part of the project is devoted to the development of applications that rely heavily on the Elvira platform. Each sub-project is responsible for two applications. Sub-project 1 (Almería) is developing a personalised academic advisor for students, based on a Bayesian network constructed from the student database of the University of Almería. It has also implemented an application for bookmark personalisation in web browsers. Sub-project 2 (Albacete) is working on the implementation of an advisor for academic managers based on a Bayesian network obtained from the data provided by the University of Almería. It is also developing a system for the classification of e-mail into folders. Sub-project 3 (Granada) is responsible for the implementation of a system for personalising the results of a web search according to the user's preferences. It is also working on the construction of a system for detecting urgent e-mail, which is especially useful when only a small number of messages can be read.
Keywords: Probabilistic graphical models, inference, supervised classification, unsupervised classification, applications to e-mail, information retrieval, web search, web browsers, automatic personalisation, data mining.
∗ Email: [email protected]
† Email: [email protected]
‡ Email: [email protected]
1 Objectives
We distinguish two types of objectives:
A. Objectives of methodological development.
B. Objectives of construction of applications.
Regarding methodological development, the project application established the objectives described below, where Mn means that the planned deadline for a given objective is the n-th month after the project start:
A1 Data preprocessing. Any knowledge discovery task requires data preprocessing, that is, a transformation of the initial database that prepares it for our learning or inference goals. We proposed to study the problem of discretisation and the establishment of hierarchies of concepts (M16), to develop methodologies for variable selection (M15), and to study variable construction and the discovery of new variables (M16).
A2 Supervised classification. In this area we planned to investigate the construction of classification models incorporating continuous variables, such as Gaussian models and MTEs (mixtures of truncated exponentials). We also considered the extension to models where the class variable is continuous (i.e. regression models). The deadline for this objective was M24.
A3 Unsupervised classification. The aim was to explore classification problems where the class variable is unknown (i.e. clustering problems), incorporating the ability to adapt to the arrival of new data. The deadline was M15.
A4 Learning of Bayesian networks. Within this objective, we planned tasks consisting of learning in domains of high dimensionality, learning from data streams and incremental learning. The deadline was M28.
With respect to the construction of applications, we defined the following objectives:
B1 Classification of e-mails into folders. The goal of this application is to classify any incoming mail into the appropriate folder among those defined by the user. The classification model is constructed from a set of e-mails previously classified by the user. The folders may be structured as a tree. The application must be able to handle misclassifications by offering the user the possibility of correcting the initially proposed folder. The deadline for this objective is M36.
B2 Classification of urgent e-mail. The main goal of this application is to sort the incoming e-mails by their degree of urgency. The degree of urgency depends on the user, and thus the underlying model must be learnt from the user's opinion, as well as from external information (such as the date and time). This application may help a user who receives large numbers of messages to attend to them in a more productive way. The deadline for this objective is M36.
B3 Bookmarks personalisation. This application intends to increase the functionality of web browsers by automatically classifying the web sites stored by the user. The classification will take place within a set of folders, possibly hierarchically structured, and will take into account the preferences of the user as well as external information, such as the classification of the site provided by Google's tree of categories. The deadline for this objective is M36.
B4 Personalised web search. This application will improve web search engines by incorporating user preferences into the way in which the search results are displayed. For instance, Google sorts them using PageRank. Our aim is to incorporate the user's opinion, expressed as a degree of satisfaction with the displayed results. The deadline for this objective is M36.
B5 Personalised academic advisor for University students. Spanish degree programs usually include a large number of subjects among which students must choose in order to create their own profile within the degree program. The aim of this application is to provide the user (the student) with the list of subjects that he/she should choose in order to meet some criteria, such as maximising the probability of success or matching the desired professional profile. The model will be constructed from the administration databases provided by the Unit for Data Coordination of the University of Almería. The deadline for this task is M36.
B6 Academic advisor for lecturers and administration. This objective is related to B5 in the sense that the same data set will be used. The system will provide advice to lecturers regarding the kind of students a lecturer can expect to find when approaching a new subject, classifications of the student body in terms of previous background, etc. With respect to administration, the system may help to establish prerequisites for given subjects, to cluster similar or related subjects, etc. The deadline for this objective is M36.
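As a rough illustration of the behaviour described in objective B1 (learn from messages the user has already filed, propose a folder for new mail, and absorb the user's corrections), the following minimal sketch uses an incremental multinomial naive Bayes classifier over message tokens. It is not the project's implementation: the class name `FolderClassifier`, the flat (non-hierarchical) folder set, the bag-of-words representation and the Laplace smoothing are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


class FolderClassifier:
    """Incremental multinomial naive Bayes over e-mail tokens (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # Laplace smoothing constant (assumed)
        self.folder_counts = Counter()            # number of messages filed per folder
        self.word_counts = defaultdict(Counter)   # folder -> token frequencies
        self.vocabulary = set()

    def learn(self, tokens, folder):
        """Add one message that the user has filed (or re-filed) in `folder`."""
        self.folder_counts[folder] += 1
        self.word_counts[folder].update(tokens)
        self.vocabulary.update(tokens)

    def predict(self, tokens):
        """Return the folder with the highest posterior log-probability."""
        total = sum(self.folder_counts.values())
        best_folder, best_score = None, -math.inf
        for folder, n in self.folder_counts.items():
            counts = self.word_counts[folder]
            denom = sum(counts.values()) + self.alpha * len(self.vocabulary)
            score = math.log(n / total)           # log prior of the folder
            for t in tokens:                      # log likelihood of each token
                score += math.log((counts[t] + self.alpha) / denom)
            if score > best_score:
                best_folder, best_score = folder, score
        return best_folder


clf = FolderClassifier()
clf.learn(["project", "deadline", "report"], "work")
clf.learn(["meeting", "project"], "work")
clf.learn(["pizza", "party", "weekend"], "friends")

proposed = clf.predict(["party", "report"])
# The user confirms or corrects the proposal; the final folder is what gets learnt.
clf.learn(["party", "report"], "friends")
```

In this sketch a correction is handled simply by learning the message under the folder the user finally chose, so the model only ever accumulates user-confirmed labels.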
2 Status of project development
2.1 General considerations
In general, we consider that the project is progressing satisfactorily, even though some of the objectives are delayed with respect to the initial plan. However, it must be taken into account that the schedule assumed that the Ministry would provide funds to incorporate three research assistants into the project staff (one per group). Unfortunately, we finally obtained funds for only one research assistant (assigned to the group in Almería). Even though the human resources are lower than expected, we decided not to reduce the objectives of the project. A related problem is the difficulty of finding a research assistant to join the project. The first candidate incorporated into the Almería group left the project after one year of work, having found a new position with a better salary and better prospects. Afterwards, the paperwork required to reopen the position meant that, for around six months, our staff was again below the expected level.
2.2 Achievements regarding methodological tasks
A1 Data preprocessing. With respect to discretisation, in this project we were more interested in the problem of grouping the cases of categorical variables with a high number of cases, a problem equivalent to the discretisation of continuous variables. Thus, we started by implementing in Elvira a well-known method (KEX), whose usefulness as a preprocessing tool was satisfactorily tested in [24]. Then, we developed two semi-naive Bayes classifiers that search for dependent attributes using different filter approaches or the imprecise Dirichlet model [6], and join them into a single attribute. After two variables are merged, the algorithms also apply a grouping procedure in order to reduce the number of cases of the new variable. Regarding hierarchies of concepts, our work is focused on the classification of bookmarks into a folder structure, in connection with task B3. With respect to variable selection, we have approached the problem from two different points of view: (1) through the construction of a general Bayesian network in which the set of relevant features is taken to be the Markov blanket of the class variable, an approach that has been tested on the classification of oceanic satellite images [44]; and (2) by proposing a filter+wrapper approach in which the variables are first ranked using a (fast) filter method, and a wrapper is then applied that considers the variables in ranking order instead of using all the variables as candidates at each stage. This second approach has been applied in combination with feature construction in breeding value classification [24, 25]. With regard to the construction of new variables, the previous algorithms [6] are methods for obtaining new variables with higher predictive accuracy than the original ones. This task is still at a rather preliminary stage, and we are now investing effort in it in order to advance as much as possible.
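The filter+wrapper idea in point (2) above can be sketched as follows. This is only an illustration under assumptions that the report does not state: the filter is taken to be the empirical mutual information with the class, the wrapped classifier is a naive Bayes with Laplace smoothing, and the wrapper score is leave-one-out accuracy. All function names are hypothetical. What the sketch does reflect is the point made in the text: the wrapper evaluates one candidate per step, following the ranking, instead of re-evaluating every remaining variable at each stage.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def nb_predict(train_rows, train_labels, row, features):
    """Naive Bayes prediction restricted to `features`, with Laplace smoothing."""
    class_counts = Counter(train_labels)
    best, best_score = None, -math.inf
    for c, nc in class_counts.items():
        score = math.log(nc / len(train_labels))
        for f in features:
            n_values = len({r[f] for r in train_rows})
            match = sum(1 for r, y in zip(train_rows, train_labels)
                        if y == c and r[f] == row[f])
            score += math.log((match + 1) / (nc + n_values))
        if score > best_score:
            best, best_score = c, score
    return best


def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of naive Bayes using only `features`."""
    hits = 0
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        hits += nb_predict(train_rows, train_labels, rows[i], features) == labels[i]
    return hits / len(rows)


def filter_wrapper_selection(rows, labels):
    """Rank features with the filter, then add them greedily in rank order."""
    ranking = sorted(range(len(rows[0])),
                     key=lambda f: mutual_information([r[f] for r in rows], labels),
                     reverse=True)
    selected, best = [], loo_accuracy(rows, labels, [])
    for f in ranking:                       # a single wrapper evaluation per feature
        acc = loo_accuracy(rows, labels, selected + [f])
        if acc > best:                      # keep the feature only if accuracy improves
            selected, best = selected + [f], acc
    return selected, best
```

With n candidate variables, this scheme needs n wrapper evaluations, whereas a plain forward wrapper that tries every remaining variable at each stage needs on the order of n² in the worst case.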
With the same goal, but in the specific task of e-mail foldering, we have tried to construct new terms by combining existing ones, using methods inspired by the X-of-N methodology. Specifically, we have designed wrapper methods that look for good constructed attributes, based on evolutionary computation [14] and on greedy search [15]. These algorithms have been tested on the well-known Enron e-mail data set.
A2 Supervised classification. The first group of achievements within this goal concerns supervised classification when the class variable is continuous (i.e. prediction models). We have developed three Gaussian models based on three different network structures, with a multivariate Gaussian joint distribution in all cases. The results have been applied to the prediction of the breeding value in Manchego sheep [34]. In order to handle scenarios in which continuous and discrete feature variables appear simultaneously, we have developed a prediction model based on the naive Bayes structure in which the joint distribution is approximated by a Mixture of Truncated Exponentials (MTE), and we have tested the performance of the models on problems related to the prediction of performance indicators in higher education [41, 42]. Some effort has also been dedicated to the problem of supervised classification when both the class and the attribute variables are discrete [3, 7]. We have developed new
classification methods using classification trees, which make use of the imprecise Dirichlet model to estimate the probabilities of belonging to the respective classes. Regarding the induction of the structure of the Bayesian network when the naive Bayes restriction is not imposed, but continuous and discrete variables are still considered simultaneously, we have developed an algorithm for MTE models based on search guided by hill climbing and simulated annealing [46]. For the estimation of conditional MTE distributions, we have refined previously existing methods by approximating the sample histogram using kernel functions [47]. The behaviour of the estimation algorithm for conditional MTE distributions is tested in a financial setting in [20]. In all cases, as envisaged in the project application, the software implementing the methods cited above has been included in the Elvira system [22]. Once the model has been constructed, classification is carried out through an inference process in which the distribution of the class variable is computed given the observed values of the feature variables. We have developed an approximate algorithm for carrying out this task when the model is of MTE type, which makes it possible to deal with scenarios involving a large number of variables. The method is based on the so-called Penniless technique [17], which adjusts the size of the data structures used to represent the density functions. When all the variables in the network are discrete, the inference required for classification can be carried out more efficiently, but there may still be situations in which the complexity of this operation is too high. To handle such cases, we have developed an approximate inference algorithm based on dynamic importance sampling that controls the level of approximation through the factorisation of the probability trees that represent the sampling distributions [37]. On the other hand, we have investigated a new family of Bayesian classifiers, in which the probabilistic graphical model is not a Bayesian network but a dependency network, that is, a network in which directed cycles are allowed. The main advantage of this model is that the parents of each variable can be learnt in parallel, so more efficient algorithms can be designed. In these two years we have explored learning dependency network classifiers from data using score+search, independence tests and evolutionary algorithms [29, 30, 32]. In view of the above, we can conclude that the work in this task is progressing according to the initial schedule.
A3 Unsupervised classification. In this task we have focused on two different problems. The first is the design of a new segmentation algorithm that receives as input a dataset with only discrete variables and outputs a full partitioning of the representation space by using a tree-shaped probabilistic structure [27]. Two additional advantages of this structure (still to be explored) are its use as an approximate factorisation of a probability distribution and its use for approximate inference. Second, we have developed an unsupervised clustering algorithm (that is, one in which the class variable is hidden) that is able to deal with discrete and continuous
features simultaneously [35]. Again, this is possible thanks to the use of MTEs. Although some points of the method still need to be refined, the results are competitive with those of well-known algorithms.
A4 Learning of Bayesian networks. Within the framework of this objective, we have built a system based on Bayesian networks that is able to adapt itself to the user profile and that learns using a very large number of variables. The system is in fact a computer chess program that adapts its game strategy to the user's style and dynamically re-learns its search heuristic according to several parameters that may change during the game. We think that chess may serve as a benchmark for testing the appropriateness of Bayesian networks for constructing adaptive systems. The system is called BayesChess and is described in [23]. Regarding learning in high-dimensional spaces, we are currently developing methods based on the factorisation of probability trees, in which the learnt structure is a join tree rather than a Bayesian network. We have already developed the necessary theory behind approximate factorisation [36]. Also on this topic, we have developed CHC (Constrained Hill Climbing) [33], an algorithm that improves the performance of the classical hill climbing algorithm by limiting the number of comparisons carried out during the search. The algorithm reduces the number of comparisons by constraining the set of candidate parents for each variable, with the advantage that this knowledge is acquired (and refined) during the search process itself. A new methodology for handling several types of structural restrictions within algorithms for learning Bayesian networks has been proposed [16]. These restrictions may encode expert knowledge in a given domain, in such a way that a Bayesian network representing that domain should satisfy them.
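To give a concrete feel for how structural restrictions of the kind proposed in [16] might be enforced inside a score+search procedure, the sketch below checks whether a single local-search move (arc addition, removal or reversal) is admissible. It is a simplified illustration, not the method of [16]: it handles directed arcs only (no undirected edges), represents the ordering restriction as a total order given by a position map, and all names (`admissible`, `required`, `forbidden`, `order`) are hypothetical.

```python
def creates_cycle(arcs, new_arc):
    """True if adding new_arc = (u, v) to the set of arcs would close a directed cycle."""
    u, v = new_arc
    stack, seen = [v], set()
    while stack:                          # look for a directed path v -> ... -> u
        node = stack.pop()
        if node == u:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(child for parent, child in arcs if parent == node)
    return False


def admissible(arcs, op, arc, required=frozenset(), forbidden=frozenset(), order=None):
    """Check one local-search move against the structural restrictions.

    arcs      : set of (parent, child) pairs in the current graph
    op        : "add", "remove" or "reverse"
    required  : arcs that must be present (existence restrictions)
    forbidden : arcs that must not be present (absence restrictions)
    order     : optional dict node -> position; a parent must precede its child
    """
    u, v = arc

    def allowed(a, b):
        if (a, b) in forbidden:
            return False
        if order is not None and order[a] > order[b]:
            return False
        return True

    if op == "add":
        return arc not in arcs and allowed(u, v) and not creates_cycle(arcs, arc)
    if op == "remove":
        return arc in arcs and arc not in required
    if op == "reverse":
        if arc not in arcs or arc in required or not allowed(v, u):
            return False
        return not creates_cycle(arcs - {arc}, (v, u))
    raise ValueError(f"unknown operator: {op}")


current = {("A", "B"), ("B", "C")}
print(admissible(current, "add", ("A", "C"), forbidden={("A", "C")}))
print(admissible(current, "remove", ("B", "C"), required={("A", "B")}))
```

A local search would call a check of this kind before scoring each neighbouring structure, so that inadmissible moves are never evaluated.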
Three types of restrictions are formally defined: existence of arcs and/or edges, absence of arcs and/or edges, and ordering restrictions. The work analyses how the restrictions can be managed within Bayesian network learning algorithms based on both the score+search and the conditional independence paradigms, and then particularises to two classical learning algorithms: a local search algorithm guided by a scoring function, with the operators of arc addition, arc removal and arc reversal, and the PC algorithm. Some modifications of the basic PC learning algorithm have been proposed [4]. The variations are: determining minimum-size cut sets between two nodes when studying the deletion of a link, making statistical decisions on the basis of a Bayesian score instead of a classical chi-square test, refining the learnt network by a greedy optimisation of a Bayesian score, and resolving link ambiguities by taking into account a measure of their strength. It is shown that some of these modifications can improve the performance of PC, depending on the learning task: discovering the causal structure or approximating the joint probability distribution of the problem variables.
B1 E-mail foldering. This task refers to the problem of classifying incoming mail into the folders previously created by a user. The first stage in this task has been the study of text mining techniques and the preparation of a survey of the state of the art [45]. We have approached this problem by constructing highly discriminant new attributes from the available ones. The attributes constructed are inspired by the
X-out-of-N methodology, but we have modified the definition to one better suited to numerical variables that represent word frequencies in the documents. To look for good constructed attributes, we have designed algorithms based on evolutionary computation [14] and on greedy search [15], which have been tested on a classical (though recent) benchmark: the Enron dataset.
B2 Classification of urgent e-mail. No database is available that contains the degree of urgency of each e-mail for each user. Therefore, we are developing a prototype program to build such a database from the incoming e-mail of a given user. The program makes use of Elvira, Lucene and Weka. We plan to build a database from the e-mails of our team in order to test different classification algorithms.
B3 Bookmarks personalisation. In this task we have addressed the following problems. We have implemented a plug-in for Mozilla Firefox that automatically classifies the bookmarks saved by the user. The basis for the classification is the tree of categories used by Google. We are currently improving the system by incorporating the user preferences. A typical problem in bookmark management is that users usually just add new bookmarks to a single favourites folder, without creating a structure that makes retrieval easier. To alleviate this problem, we have designed a tool [43] that uses hierarchical clustering to create a tree-shaped structure in which the bookmarks are placed according to their content, which is analysed using classical information retrieval techniques. Moreover, the tool allows the user to navigate the constructed hierarchy and to merge nodes if necessary. We are currently working on a different task related to bookmarks. Although a user can store a large number of bookmarks, only a few of them (fewer than 10) can be placed in the personal toolbar folder of a browser, that is, those that are always shown as buttons in a bar of the browser. This bar is very useful because a web page can be reached with a single mouse click. We are now working on the automatic modification of the content of this bar by inducing a classifier that takes into account not only the pages previously visited but also the day of the week, the month, etc.
B4 Personalised web search. We have analysed the factors involved in the relevance assessment of web pages for specific users [38, 39]. We studied more than 150 web features (textual, visual/layout and structural, among others) in order to investigate, using a machine learning approach, which ones are the most informative about the relevance of web pages. The results showed that, while the textual and visual features were useful for relevance prediction in a topic-independent condition, a wider range of features can be effective when task knowledge is available. We have also implemented a plug-in for Mozilla Firefox that records the user's preferences when he or she makes a search using Google. This makes it possible to build a local database for each user, which will be used to reorder the list of links that Google retrieves in future searches.
B5 Personalised academic advisor for University students. We have completed the data preprocessing step required by this task, which has produced databases of performance indicators for all the subjects offered at the University of Almería since the academic year 1993/1994. A thorough
analysis of the data is carried out in [40, 42]. We are currently implementing the web interface of the final application.
B6 Academic advisor for lecturers and administration. This objective is progressing in parallel with the previous one, although it has accumulated some delay because of the availability of the data from the UCLM. The initial dataset was received a few months ago, but it had to be enriched with extra variables, such as the professional category of the teacher and the type of subject. This new dataset has just been received this month.
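The re-ordering step mentioned under B4 (a local, per-user database of recorded preferences used to reorder the links returned by the search engine) could look roughly like the sketch below. The report does not describe the actual model, so everything here is an assumption made for illustration: feedback is reduced to a satisfied/unsatisfied flag, preferences are aggregated per host name, and the score is a smoothed fraction of satisfied visits. The names `PreferenceStore` and `rerank` are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


class PreferenceStore:
    """Local per-user record of feedback on visited results (hypothetical)."""

    def __init__(self):
        self.liked = defaultdict(int)    # host -> number of satisfied visits
        self.seen = defaultdict(int)     # host -> number of rated visits

    def record(self, url, satisfied):
        """Store one piece of user feedback for the host of `url`."""
        host = urlparse(url).netloc
        self.seen[host] += 1
        self.liked[host] += int(satisfied)

    def score(self, url):
        """Smoothed fraction of satisfied visits; 0.5 for hosts never rated."""
        host = urlparse(url).netloc
        return (self.liked[host] + 1) / (self.seen[host] + 2)


def rerank(results, store):
    """Reorder engine results by descending preference score.

    sorted() is stable, so results with equal scores keep the engine's order.
    """
    return sorted(results, key=store.score, reverse=True)


store = PreferenceStore()
store.record("https://b.example/p1", True)
store.record("https://a.example/p", False)
print(rerank(["https://a.example/1", "https://c.example/1", "https://b.example/1"], store))
```

Because ties fall back to the engine's original order, a user with no recorded feedback sees the results unchanged.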
2.3 Additional work
Though not in the original plan, we have carried out some research that was considered necessary or useful for the project development. We want to mention the following contributions:
• Evaluation of influence diagrams. Influence diagrams are powerful tools for representing and solving complex decision-making problems. Their evaluation may require an enormous computational effort, which is a primary issue when processing real-world models. We have proposed [18] an approximate inference algorithm to deal with very large models, for which an exact solution is infeasible. It is an anytime algorithm that returns approximate solutions which are increasingly refined as computation progresses.
• Inference in credal networks. A credal network is a graphical structure similar to a Bayesian network in which imprecise probabilities are used to represent quantitative knowledge. Credal networks can be applied, for example, to supervised learning problems. We have proposed two new algorithms [19] for inference in credal networks. These algorithms enable probability intervals to be obtained for the states of a given query variable.
• Properties of uncertainty measures. We have studied properties of uncertainty measures for imprecise probabilities [5, 8, 9, 10, 11]. These works have been applied in later papers to problems of supervised learning and variable selection.
• Applications to genome analysis of linkage disequilibrium (LD). BMapBuilder builds maps of pairwise linkage disequilibrium (LD) in either two or three dimensions. The optimised resolution allows for graphical display of LD for single nucleotide polymorphisms (SNPs) in a whole chromosome. We have used a Bayesian estimator of LD [1, 2] that greatly reduces the effect of ascertainment bias in the inference of LD decay.
• Applications of decision support systems in soaring site recommendation. We have developed a recommendation system for adventure tourism that integrates a data warehouse and a Decision Support System, in order to help retrieve data from different databases and information sources and analyse them to provide useful and explicit information [12, 13].
• Applications to evolutionary computation. New estimation of distribution algorithms have been designed that use dependency networks as the probabilistic graphical model that guides the search [31].
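Several of the contributions above, as well as the classifiers described under A1 and A2, rely on imprecise probabilities and in particular on the imprecise Dirichlet model (IDM). As a reminder of what that model yields, the sketch below computes the standard IDM probability interval for each category: with counts n_i, total N and hyperparameter s > 0, the lower probability is n_i / (N + s) and the upper probability is (n_i + s) / (N + s). The function name `idm_intervals` and the default s = 1 are choices made for this example, not values taken from the cited papers.

```python
def idm_intervals(counts, s=1.0):
    """Lower and upper probabilities of each category under the imprecise Dirichlet model.

    counts : dict mapping each category to its observed frequency
    s      : IDM hyperparameter (s > 0); larger values give wider intervals
    """
    total = sum(counts.values())
    return {c: (n / (total + s), (n + s) / (total + s)) for c, n in counts.items()}


for category, (low, high) in idm_intervals({"pass": 3, "fail": 1}).items():
    print(f"{category}: [{low:.2f}, {high:.2f}]")
```

The width of every interval is s / (N + s), so the intervals shrink as more data are observed, which is what makes them usable as a measure of how well supported an estimated class probability is.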
3 Results indicators
The following indicators support, in our opinion, the results of the project:
• The number of publications generated by the project. The list of references includes the publications that have received financial support from project funds, and it covers the most reputed journals and conferences in the field. Several papers are currently under review, and we expect further publications in the near future.
• Two PhD theses (Julia Flores, UCLM, and María Morales, UAL) have been approved during the first part of the project, and other theses are about to be submitted.
References
[1] M. M. Abad-Grau, R. Montes, P. Sebastiani. Building chromosome-wide LD maps. Bioinformatics, Vol. 22, No. 16, pp. 1933-1934, 2006.
[2] M. M. Abad-Grau, P. Sebastiani. Bayesian correction for SNP ascertainment bias. Lecture Notes in Computer Science, Vol. 3885, Modeling Decisions for Artificial Intelligence (Conference on Modeling Decisions for Artificial Intelligence, Tarragona, Spain, 3-5 April), pp. 262-273, 2006.
[3] J. Abellán. Completing an Uncertainty Criterion of Classification. Mathware and Soft Computing, Vol. XII, pp. 83-95, 2005.
[4] J. Abellán, M. Gómez-Olmedo and S. Moral. Some Variations on the PC Algorithm. Proceedings of the Third European Workshop on Probabilistic Graphical Models, Prague, Czech Republic, pp. 1-8, September 2006.
[5] J. Abellán. Uncertainty measures on probability intervals from the imprecise Dirichlet model. International Journal of General Systems, Vol. 35, No. 5, pp. 509-528, October 2006.
[6] J. Abellán. Application of uncertainty measures on credal sets on the naive Bayesian classifier. International Journal of General Systems, Vol. 35, No. 6, pp. 675-686, 2006.
[7] J. Abellán, S. Moral, M. Gómez and A. R. Masegosa. Varying Parameter in Classification Based on Imprecise Probabilities. Soft Methods for Integrated Uncertainty Modelling. Advances in Soft Computing, Vol. 6, pp. 231-239. Springer-Verlag, ISBN: 3-540-34776-3, 2006.
[8] J. Abellán, G. J. Klir. Additivity of uncertainty measures on credal sets. International Journal of General Systems, Vol. 34, No. 6, pp. 691-713, December 2005.
[9] J. Abellán, G. J. Klir, S. Moral. Disaggregated total uncertainty measure for credal sets. International Journal of General Systems, Vol. 35, No. 1, pp. 29-44, February 2006.
[10] J. Abellán, M. Gómez. Measures of divergence on credal sets. Fuzzy Sets and Systems, Vol. 157, pp. 1514-1531, 2006.
[11] J. Abellán, S. Moral. An algorithm to compute the upper entropy for order-2 capacities. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, Vol. 14, No. 2, pp. 141-154, 2006.
[12] F. Araque, A. Salguero and M. M. Abad-Grau. Application of data warehouse and Decision Support System in soaring site recommendation. Proceedings of the Thirteenth Conference on Information and Communication Technologies in Tourism, ENTER 2006, M. Hitz, M. Sigala and J. Murphy (eds.), Wien-NewYork: Springer-Verlag Computer Science, pp. 308-319, 2006.
[13] F. Araque, A. Salguero, M. M. Abad-Grau. Supporting decision making as a plus in adventure tourism. Submitted to Information Technology and Tourism, 2007.
[14] P. Bermejo, J. A. Gámez and J. M. Puerta. Construcción de atributos: un caso de estudio en clasificación de correo-e. In Actas del V Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bioinspirados, MAEB-07, 2007 (accepted).
[15] P. Bermejo, J. A. Gámez and J. M. Puerta. Attribute Construction for E-mail foldering by using wrappered forward greedy search. 2007 (submitted).
[16] L. M. de Campos, J. G. Castellano. Bayesian network learning algorithms using structural restrictions. To appear in International Journal of Approximate Reasoning, 2007.
[17] A. Cano, S. Moral, A. Salmerón. Penniless propagation in join trees. International Journal of Intelligent Systems, 15:1027-1059, 2000.
[18] A. Cano, M. Gómez, S. Moral. A forward-backward Monte Carlo method for solving influence diagrams. International Journal of Approximate Reasoning, Vol. 42, pp. 119-135, 2006.
[19] A. Cano, M. Gómez, S. Moral and J. Abellán. Hill-climbing and Branch-and-bound algorithms for exact and approximate inference in credal networks. To appear in International Journal of Approximate Reasoning, 2007.
[20] B. Cobb, R. Rumí and A. Salmerón. Especificación de distribuciones condicionadas de variables continuas en redes Bayesianas. In Actas de las VI Jornadas de Transferencia de Tecnología en Inteligencia Artificial, pp. 21-28, 2005.
[21] B. Cobb, P. P. Shenoy and R. Rumí. Approximating probability density functions in hybrid Bayesian networks with mixtures of truncated exponentials. Statistics and Computing, 16:293-308, 2006.
[22] Elvira Consortium. Elvira: An environment for creating and using probabilistic graphical models. In J. Gámez, A. Salmerón (Eds.), Proceedings of the First European Workshop on Probabilistic Graphical Models, pp. 222-230, 2002.
[23] A. Fernández and A. Salmerón. BayesChess: programa de ajedrez adaptativo basado en redes bayesianas. In Actas del Campus Multidisciplinar en Percepción e Inteligencia (CMPI-2006), pp. 613-624, 2006.
[24] Julia Flores and José A. Gámez. Breeding value classification in manchego sheep: a study of attribute selection and construction. In Proceedings of the 9th International Conference on Knowledge-Based & Intelligent Information & Engineering Systems (KES2005), volume 3682 of Lecture Notes in Computer Science, pp. 1338-1346, Melbourne, 2005. Springer Verlag.
[25] Julia Flores, José A. Gámez, Juan L. Mateo and José M. Puerta. Selección genética para la mejora de la raza ovina manchega mediante técnicas de minería de datos. Inteligencia Artificial (Revista Iberoamericana de Inteligencia Artificial), 29:69-77, 2006.
[26] M. J. Flores, J. A. Gámez and S. Moral. Finding a partition of the explanation space in Bayesian abductive inference. Proceedings of the 8th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU-2005), volume 3571 of Lecture Notes in Computer Science, Barcelona, Springer Verlag, pp. 63-75, 2005.
[27] M. J. Flores, J. A. Gámez and S. Moral. The Independency tree model: a new approach for clustering and factorisation. Proceedings of the Third European Workshop on Probabilistic Graphical Models, Prague, Czech Republic, pp. 83-90, September 2006.
[28] J. A. Gámez, I. García-Varea and J. Hernández-Orallo. Minería de Datos: Técnicas y Aplicaciones. Ediciones del Departamento de Informática de la UCLM, 2005.
[29] J. A. Gámez, J. L. Mateo and J. M. Puerta. Dependency networks based classifiers: learning models by using independence. In Proceedings of the 3rd European Workshop on Probabilistic Graphical Models, pp. 115-122, 2006.
[30] J. A. Gámez, J. L. Mateo and J. M. Puerta. Aprendizaje automático de clasificadores basados en redes de dependencia mediante búsqueda heurística. In Actas del V Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bioinspirados, MAEB-07, 2007 (accepted).
[31] J. A. Gámez, J. L. Mateo and J. M. Puerta. EDNA: Un Algoritmo de Estimación de Distribuciones basado en Redes de Dependencia. In Actas del V Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bioinspirados, MAEB-07, 2007 (accepted).
[32] J. A. Gámez, J. L. Mateo and J. M. Puerta. Learning Bayesian classifiers from dependency network classifiers. In Proceedings of the 8th International Conference on Adaptive and Natural Computing Algorithms (ICANNGA-07), Lecture Notes in Computer Science, Warsaw, 2007. Springer Verlag (in press).
[33] José A. Gámez and José M. Puerta. Constrained score+(local)search methods for learning Bayesian networks. In Proceedings of the 8th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU-2005), volume 3571 of Lecture Notes in Computer Science, pp. 161-173, Barcelona, 2005. Springer Verlag.
[34] J. A. Gámez and A. Salmerón. Predicción del valor genético en ovejas de raza manchega usando técnicas de aprendizaje automático. In Actas de las VI Jornadas de Transferencia de Tecnología en Inteligencia Artificial, pp. 71-80. Paraninfo, 2005.
[35] J. A. Gámez, R. Rumí and A. Salmerón. Unsupervised naive Bayes for data clustering with mixtures of truncated exponentials. In Proceedings of the 3rd European Workshop on Probabilistic Graphical Models (PGM'06), pp. 123-132, 2006.
[36] I. Martínez, S. Moral, C. Rodríguez and A. Salmerón. Approximate factorisation of probability trees. ECSQARU'05. Lecture Notes in Artificial Intelligence, 3571:51-62, 2005.
[37] I. Martínez, C. Rodríguez and A. Salmerón. Dynamic importance sampling in Bayesian networks using factorisation of probability trees. In Proceedings of the 3rd European Workshop on Probabilistic Graphical Models (PGM'06), pp. 187-194, 2006.
[38] A. Masegosa, H. Joho and J. Jose. Identifying Features for Relevance Web Pages Prediction. 1st International Workshop on Adaptive Information Retrieval, Glasgow, UK, pp. 36-37, 2006.
[39] A. Masegosa, H. Joho and J. Jose. Evaluating the query-independent object features for the relevancy prediction. To appear in Lecture Notes in Computer Science (29th European Conference on Information Retrieval, Roma), 2007.
[40] M. Morales, C. Rodríguez and A. Salmerón. Estudio de dependencias de indicadores de rendimiento del alumnado universitario mediante redes bayesianas. In Actas de las VI Jornadas de Transferencia de Tecnología en Inteligencia Artificial, pp. 29-36, 2005.
[41] M. Morales, C. Rodríguez and A. Salmerón. Selective naive Bayes predictor with mixtures of truncated exponentials. In Proceedings of the International Conference on Mathematical and Statistical Modeling, 2006.
[42] M. Morales. Modelización y predicción en estadística universitaria. PhD Thesis. Department of Statistics and Applied Mathematics, University of Almería, 2006.
[43] J. A. Moreno. Gestión semiautomática de los marcadores de un navegador Internet usando técnicas de minería de datos. Proyecto Fin de Carrera. Ingeniería Informática. Universidad de Castilla-La Mancha. Diciembre, 2006.
[44] J. A. Piedra, A. Salmerón, F. Guindos, M. Cantón. Reduction of Irrelevant Features in Oceanic Satellite Images by means of Bayesian networks. In Actas de las VI Jornadas de Transferencia de Tecnología en Inteligencia Artificial, pp. 133-140, 2005.
[45] J. M. Puerta. Introducción a la Minería de Textos. In [28], pp. 67-85, 2005.
[46] V. Romero, R. Rumí and A. Salmerón. Learning hybrid Bayesian networks using mixtures of truncated exponentials. International Journal of Approximate Reasoning, 42:54-68, 2006.
[47] R. Rumí. Kernel methods in Bayesian networks. In Proceedings of the International Mediterranean Congress of Mathematics, pp. 135-149, 2005.
[48] R. Rumí and A. Salmerón. Penniless propagation with mixtures of truncated exponentials. Lecture Notes in Computer Science, 3571:39-50, 2005.