From unsupervised learning to data mining: linking cognition and data analysis

Luis Talavera
Departament de Llenguatges i Sistemes Informatics, Universitat Politecnica de Catalunya
Campus Nord, Modul C6, Jordi Girona 3, 08034 Barcelona, Catalonia, Spain
[email protected]
Abstract

Recently, Knowledge Discovery in Databases (KDD) has emerged as a promising research area encompassing methods from several disciplines. In particular, the data mining step of KDD shares most of its goals with unsupervised learning. But data mining methods are biased towards statistical techniques, arguing that Machine Learning (ML) methods are not suitable to deal with real-world databases. We claim that (a) the problems of ML systems in dealing with databases may come from their traditional symbolic nature, (b) some ML challenges may be interpreted from a data mining standpoint, and, as a consequence, (c) ML methods which make use of statistical concepts may be good candidates to solve data mining problems, still incorporating characteristic symbolic biases. Conversely, we also suggest the use of ML biases to constrain existing data mining techniques, thus bridging purely statistical techniques and more heuristic (and cognitive-inspired) ML ones. The unsupervised learning system ISAAC is described to show an application of the proposed ideas.

Keywords: Unsupervised learning, Knowledge Discovery, Data Mining, Clustering, Cognitive Psychology.

1 Introduction

Inductive concept learning (Michalski, 1983) is one of the most widely studied areas in Machine Learning (ML). From a cognitive standpoint, it is very important for an intelligent agent to have the ability of grouping and characterizing observations from a given environment. This ability allows agents to create knowledge structures (concept descriptions) from raw data that facilitate performance with respect to some given task. The research on inductive learning has traditionally been split into two different approaches: supervised and unsupervised learning. In supervised learning, observations are labeled by some external teacher indicating their class membership, and the learning task consists of finding a characterization of each predefined class. The unsupervised learning task assumes that no previous information exists about the class membership of the observations, so learning systems must also find the underlying structure of the domain by themselves before providing a characterization of the concepts. Application areas of unsupervised learning include discovery, problem solving, planning, engineering, natural language and information retrieval (Fisher & Pazzani, 1991).

On the other hand, the increasing automation of industrial and business activities produces a rapidly growing stream of data which has become too large to be analyzed by manual methods. Therefore, methods for automatically extracting useful information from data, or for assisting humans in this process, are needed. A process which serves these purposes is knowledge discovery in databases (KDD), which is concerned with the discovery and interpretation of useful knowledge from data. KDD is intended to be an interdisciplinary field, enclosing methods and concepts from statistics, databases and ML. A related, commonly used term is data mining, which some authors use to denote the application of algorithms for extracting patterns from
data, thus considering these methods as a part of the KDD process. Typical data mining tasks are classification, regression, clustering, summarization, dependency modeling and change detection (Fayyad, Piatetsky-Shapiro & Smyth, 1996).

Both inductive learning and data mining share the common goal of extracting information (or knowledge) from data. Thus, inductive learning systems appear as good candidates for data mining tasks. At first sight, applying inductive learning to a database should be straightforward, requiring only some changes in terminology: instances for records, attributes for fields, and so on. As an example, consider the unsupervised learning task, which is the analog of clustering in data mining. However, some authors point out several problems that arise when porting ML techniques to data mining. These problems are supposed to arise from some implicit assumptions made in ML approaches about the nature of the training set. In particular, they claim that many ML algorithms assume a small, well structured, error-free dataset from which learning takes place. Using a database as a training set for ML systems may cause several problems, as databases contain data generated for purposes other than learning (Frawley et al., 1991; Holsheimer & Siebes, 1994).

The first problem is the size of databases. A data mining system is expected to deal with large databases, containing large data volumes. In addition, in many databases, objects are described by a large number of fields, that is, they have a high dimensionality. A high dimensional dataset increases the inductive search space in a combinatorial manner, thus making the learning process more expensive. Computational efficiency issues therefore play an important role in data mining practice. In contrast, typical ML datasets are small and contain few attributes compared with databases, and are driven much more by qualitative concerns about the results than by computational concerns.
Real-world data is often corrupted, containing missing attribute values and non-systematic errors in attribute values. Thus, databases are incomplete and noisy, and data mining methods must deal with these problems in order to be effective. Many datasets used in ML do not tend to be contaminated by errors as databases are.

Finally, data mining may involve several user decisions in order to fulfill the needs of a specific task. Therefore, user interaction plays an important role in data mining. Data mining methods should incorporate mechanisms to make use of both user preferences and prior knowledge about the current problem. However, unlike data mining, ML is often concerned with fully automating the learning process, so ML systems often tend to avoid external intervention.

Statistics has a wealth of techniques to help in solving all of these problems. This fact contributes to the ubiquity of statistical techniques in data mining, possibly discouraging the use of existing ML systems.

The goal of this paper is to show that (a) the problems of ML systems in dealing with databases may come from their traditional symbolic nature, (b) some typical challenges in ML may be interpreted from a data mining standpoint, and, as a consequence, (c) ML methods which make use of statistical concepts may be good candidates to solve data mining problems, still incorporating interesting (possibly cognitive-inspired) traditional biases. Conversely, we suggest that ML biases can offer new and constructive perspectives on statistical techniques in data mining methods.
2 Unsupervised symbolic learning

The symbolic learning paradigm (and, in general, the symbolic approach in Artificial Intelligence (AI)) has its roots in several other disciplines such as logic, philosophy and cognitive psychology. It assumes that learning is performed by means of internal processes which manipulate symbolic concept representations. Clearly, the main focus in this paradigm has been on human cognition, either modeling cognitive processes or identifying useful biases from psychological research. The symbolic approach to learning is therefore concerned with classical AI issues such as heuristics, search and cognitive plausibility, which, obviously, have conditioned the research goals pursued in this approach. Since symbolic approaches often try to model cognitive behaviors, simple and noise-free datasets are needed in order to evaluate the learning processes performed by the systems and the results obtained. Large, high dimensional and
noisy datasets like those handled in data mining would make the analysis of the underlying processes and their results too complex to assess the cognitive plausibility of the methods. Also, there is a strong tendency to consider cognition modeling as closely related to the construction of autonomous systems. Because of that, it is common for symbolic learning systems not to allow any external intervention in the learning processes performed. As mentioned in section 1, this view is directly opposed to that of data mining, where the user's contribution to the learning process is of great interest.

On the other hand, there is a variety of formalisms underlying symbolic concept representations, logic (or similar restricted languages) being a widely used one. In fact, logic has been present in many subfields of Artificial Intelligence as a natural form of representing knowledge in a comprehensible manner. Logic also provides well-defined inference methods that represent an intuitive manner of dealing with symbolic information. This tendency was followed, for example, in one of the first approaches to unsupervised learning, the CLUSTER/2 system (Michalski & Stepp, 1983). This system is the original implementation of the conceptual clustering ideas. This approach advocates a tight coupling of the interpretation task (carried out by an external analyst in statistical approaches) and the clustering process. To achieve this interaction it is necessary to have a measure of concept quality in order to assess the goodness of the concepts associated with the formed clusters. Logical descriptions fit well into this framework, as they facilitate evaluation under quality criteria such as simplicity, fit with instances, or discrimination, among others. However, classical logic formalisms are not well suited to deal with noisy and missing information.
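The brittleness of classical logical descriptions under noise can be seen in a two-line membership test. The "bird" concept below is an invented example, not drawn from CLUSTER/2 or any actual system:

```python
# A conjunctive (logical) concept description: an instance is a member only
# if every condition holds. One noisy attribute is enough to reject it.

def matches(instance, description):
    """All-or-none membership test for a conjunctive concept."""
    return all(instance.get(attr) == value for attr, value in description.items())

bird = {"lays_eggs": True, "flies": True}

print(matches({"lays_eggs": True, "flies": True}, bird))    # True
# A penguin (or a record with a corrupted 'flies' field) is rejected
# outright; there is no notion of partial or graded membership.
print(matches({"lays_eggs": True, "flies": False}, bird))   # False
```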
Since these formalisms (or derivations of them) are very common in symbolic learning, it may appear that symbolic learning systems can only work under ideal conditions and, therefore, are not useful for data mining tasks. But it is not necessary to restrict the view of concepts to logical descriptions. Rather, as shown in several unsupervised learning systems, other formalisms fit well into the basic framework of symbolic learning. There are several examples of this tendency, such as COBWEB (Fisher, 1987),
AUTOCLASS (Cheeseman et al., 1988) or WITT (Hanson, 1990), which make use of typical statistical notions such as conditional probabilities, Bayesian theory or statistical correlations, but are not by themselves purely statistical methods. In fact, there is often a cognitive rationale behind the use of statistical principles. The shift from purely logic-based systems to systems which employ some sort of statistical principles in unsupervised learning shows that it is possible to combine the classical symbolic view, which provides cognitive biases to the learning systems, with statistical approaches, which are suitable for dealing with imperfect information. In the next section, we will review some common topics in unsupervised learning in order to show that they are very close to the data mining problems explained in section 1.
3 Unsupervised learning topics and data mining

We have seen that, because symbolic learning methods are primarily cognitive-inspired, they may not be capable of efficiently dealing with databases. From this fact, it would appear that cognitive constraints are harmful when it comes to applying learning systems to the data mining task. But unsupervised learning challenges, even if influenced by the symbolic nature of the approach, are closer to those in data mining than it might seem. Next, we describe some interesting issues present in unsupervised learning research and typical of the symbolic learning paradigm, pointing out their correspondence with data mining problems (summarized in Table 1).

Probabilistic concepts. Conjunctive concepts, common in logical representations, imply that all concept members are equal. However, humans regard certain members as more typical than others. Psychologists have developed a number of concept representations to model this behavior. Smith and Medin (1981) refer to a class of representations termed probabilistic concepts, which associate a weight or probability with each attribute value of a concept description. This psychological research primarily inspired the design of the COBWEB system, but other systems use some sort of probabilistic representation as
well (e.g. WITT, AUTOCLASS). The main reason to use this sort of representation is that it provides greater flexibility than logical ones. Probabilistic concept representations are better suited to deal with imperfect information than their logical counterparts. Therefore, they should be advantageous when applied to data mining problems, which are uncertain in nature. Note that in this case an initial cognitive motivation (typicality effects) for a learning topic leads to a more practical application (dealing with noise) without an a priori clear relationship between the two aspects [1].

Incremental learning. Incremental constraints are considered to arise in many real-world situations. Humans have the capability of modifying their conceptual schemes as new examples are observed, and so learn from a stream of observations. Hence, it is not surprising that an important part of the research in unsupervised learning has focused on incremental systems. In particular, Gennari, Langley and Fisher (1989) delineate the task of concept formation. Concept formation systems are presented as incremental learners using a hill-climbing strategy that operates under reduced search control, with low memory requirements. Another common feature of concept formation systems is their hierarchical organization of concepts, which provides a logarithmic average assimilation cost. Although the efficiency constraints present in the incremental learning approach are mainly cognitively justified, they still may be useful to deal with large amounts of data.

Selective attention. Psychologists have observed a tendency in humans to focus their cognitive effort on some properties when creating categories. This behavior is often referred to as selective attention in Cognitive Psychology and sometimes results in a very strong bias, since humans tend to form even unidimensional categories (Ahn & Medin, 1992).
This fact has inspired some research aimed at incorporating similar constraints into unsupervised learning systems. Gennari's work on focused concept formation (Gennari, 1989) moves along these lines, incorporating an attentional mechanism into COBWEB in order to improve the efficiency of the system. More recent research has studied other methods for feature selection
Data mining              Unsupervised learning
-----------------------  --------------------------------------
imperfect information    probabilistic concepts
database size            hill-climbing and incremental learning
high dimensionality      selective attention
user interaction         bias selection

Table 1: Correspondence between data mining problems and unsupervised learning topics
in COBWEB (Devaney & Ram, 1997). Additional work outside the COBWEB framework includes the ISAAC system (Talavera & Cortes, 1996), which computes attribute relevances in order to guide induction. Although data mining and symbolic learning methods may view the problem of reducing the number of terms used in the induction process in a very different light, essentially both approaches try to solve a similar question.

Inductive bias. There is a potentially infinite number of inductive hypotheses that can be formulated, but only a small subset makes sense for humans. These preferences are implemented in ML algorithms as a bias, considering a bias to be any factor that influences the formulation and selection of inductive hypotheses (Gordon & Desjardins, 1995). Some research has been oriented towards the design of autonomous systems, thus incorporating only internal biases (e.g. COBWEB). Other systems allow the user to specify certain preferences about the concepts to be created. For example, CLUSTER/2 tries to optimize a list of user preferences about the sort of concepts to be created, such as simplicity or fit. WITT incorporates several parameters that control the cohesion of the categories formed. In general, we can observe that we must trade autonomy for flexibility when designing a learning system. If there are many parameters, the system may become too complex to use, while having no parameters can imply a lack of the flexibility necessary to achieve a good interaction with users. If a bias can be characterized in some formal way, such that the behavior of the system can be changed by tuning some parameter, then bias selection is equivalent to the parameter setting problem. Therefore, interaction with the user in learning algorithms can be more easily modeled in systems capable of performing some sort of bias selection via parameter setting (Talavera & Cortes, 1997a).

[1] However, it has been suggested that low typicality could be an indicator of noisy instances (Talavera, 1996).
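The idea that a single, well-understood parameter can act as a user-controlled bias can be illustrated with a toy clustering routine. The "generality" threshold below is a hypothetical knob invented for illustration, not a parameter of CLUSTER/2, WITT or any actual system:

```python
# Illustrative only: a single "generality" parameter acting as a user-set bias.

def simple_clusters(values, generality):
    """Group sorted 1-D values; a new cluster starts whenever the gap to the
    previous value exceeds `generality`. Larger settings give fewer, more
    general clusters; smaller settings give many specific ones."""
    clusters = [[values[0]]]
    for prev, cur in zip(values, values[1:]):
        if cur - prev > generality:
            clusters.append([cur])      # gap too wide: open a new concept
        else:
            clusters[-1].append(cur)    # close enough: extend current concept
    return clusters

data = sorted([1, 2, 3, 10, 11, 12, 30])
print(simple_clusters(data, generality=5))   # [[1, 2, 3], [10, 11, 12], [30]]
print(simple_clusters(data, generality=20))  # [[1, 2, 3, 10, 11, 12, 30]]
```

Tuning the parameter selects among inductive hypotheses of different generality, which is exactly the sense in which parameter setting implements bias selection.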
But note that some parameters may not be considered adequate for this task, as they may not have as clear a meaning for the user as biases have.

So far, we have described some traditional concerns in unsupervised symbolic learning and their usefulness as a starting point to solve data mining problems. But it is worth stating at this point two important caveats regarding the application of these ideas to real-world problems. In the first place, we have to point out that, although the topics presented could serve as a basis for applying ML techniques to data mining problems, there is a strong chance that a great amount of adaptation or tuning will be necessary. For example, selective attention and dimensionality reduction face basically the same problem, but it is likely that not every attentional learning mechanism will work for every data mining problem without changes. Actually, the need to improve an existing model to solve a problem is not a specific drawback of applying ML to data mining problems, but a general requirement of the process of applying ML algorithms (Brodley & Smyth, 1995). Secondly, we cannot assume that every data mining problem has a cognitive-inspired counterpart in symbolic learning. Rather, some problems must be approached by directly using statistical principles, although still combined with heuristics and control strategies characteristic of symbolic AI approaches. As an example, consider the application of statistical tests in concept tree pruning strategies within the COBWEB framework (Fisher & Chan, 1990), aimed at reducing the noise sensitivity of the system.

In the light of this brief account of the relationship between the unsupervised learning topics described and data mining problems, two main and related conclusions can be drawn. First, some features found in ML methods may contribute to solving several data mining problems with proper adaptation to particular applications.
It is likely that this adaptation process will imply using statistical principles that help in improving performance on ill-structured domains, as in the approaches discussed above. But adapted methods can still incorporate interesting traditional symbolic biases. Second, the use of symbolic learning biases can help to find useful constraints for existing data mining techniques. This double implication should help in bridging the gap between purely statistical techniques and more heuristic (and cognitive-inspired) ML ones.
4 An example: the ISAAC learning system

In this section an unsupervised inductive learning system, ISAAC, is introduced in order to give an example of the claims made so far. ISAAC is designed under some cognitive assumptions which, in turn, may solve some KDD problems. ISAAC is an unsupervised learning system that induces knowledge in the form of probabilistic concepts (see section 3). In ISAAC, a probabilistic concept consists of a summary that lists attribute values and their associated conditional probabilities. Thus, each concept is represented by a prototype which lists probabilities of the form P(A_i = V_ij | C_k), where A_i is an attribute, V_ij is an attribute value and C_k is a concept. On the other hand, membership in probabilistic concepts is graded, overcoming the limitations imposed by logical descriptions, which, in turn, can be considered a special case in which probabilities take only the two extreme values, zero or one.

ISAAC allows the user to specify preferences about the generality of the hierarchies it produces. It takes a parameter, NG, which allows the user to set the degree of generality of induced partitions, more general partitions having few, large concepts and more specific partitions having many, small concepts. ISAAC generates a flat partition with the indicated level of generality, but a complete hierarchy can be built by specifying a set of different NG values. The NG parameter therefore gives the user some control over the algorithm's biases, allowing the creation of hierarchy levels with a generality that is better suited for a given task. This is important because users may want to create different types of hierarchies from one data set depending on the performance goal required. This approach is an example of how bias selection can be performed through parameter setting, allowing the user to interact with the system in order to tune the results obtained.

The ISAAC system consists of three stages: Preprocessing, Reflection, and Refinement. The relationship between these stages is shown in figure 1.
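As a rough illustration of this representation (a minimal sketch, not ISAAC's actual implementation), a probabilistic concept can be stored as per-attribute conditional probability tables, with a simple graded membership score computed from them; the averaging rule below is an assumption made for the example:

```python
from collections import Counter

def concept_summary(instances):
    """Build a probabilistic concept: P(A_i = V_ij | C_k) for each attribute.

    `instances` is a list of equal-length tuples of attribute values
    belonging to one concept (cluster)."""
    n = len(instances)
    summary = []
    for i in range(len(instances[0])):
        counts = Counter(inst[i] for inst in instances)
        summary.append({v: c / n for v, c in counts.items()})
    return summary

def membership(instance, summary):
    """Graded membership: average P(A_i = v_i | C) over attributes. With
    0/1 probabilities this degenerates to a logical, all-or-none test."""
    probs = [attr.get(v, 0.0) for attr, v in zip(summary, instance)]
    return sum(probs) / len(probs)

# Toy concept formed from three observations
cluster = [("red", "round"), ("red", "round"), ("green", "round")]
proto = concept_summary(cluster)   # P(red|C)=2/3, P(green|C)=1/3, P(round|C)=1
print(membership(("red", "round"), proto))    # ~0.833
print(membership(("blue", "square"), proto))  # 0.0
```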
[Figure 1: Stages of the ISAAC system. The training data set and the NG and α parameters feed the Preprocessing, Reflection and Refinement stages, which exchange the current hypothesis and a weighted attribute subset to produce a probabilistic concept hierarchy.]

In the Preprocessing stage, the goal is to obtain a rough view of the training data. This stage consists only of an incremental centroid-based clustering algorithm which uses a distance measure to form clusters. The algorithm requires an α parameter which constrains the maximum distance allowed for an object to belong to a cluster. Obviously, this parameter is not, theoretically speaking, as important to the ISAAC system as the NG parameter, given that the main learning process is done in the following phases. This phase provides a quick initial partition of the data and helps to reduce the initial size of the dataset, grouping very similar or identical observations. Usually, the user does not have to interact much with the system through the α parameter, which can be set to some fixed value. However, when dealing with large datasets, the user may want to tune the α parameter to further reduce the size of the initial partition. This reduction may, in turn, decrease the complexity of the following stages, which depends on the initial partition size. An excessive reduction may result in the clustering process being almost entirely performed in this stage, so the user must establish a trade-off between the quality of the results and the efficiency of the process.

After initial preprocessing, the partition created is passed to the Reflection and Refinement stages. Both stages interact with each other to achieve a hierarchy with a new level which reflects the generality preference of the user as given by the NG parameter. When looking for a given level,
ISAAC continuously merges concepts and checks whether the desired level has been reached. At each generalization (merging) step, an abstraction procedure is performed. This procedure consists of selecting the most useful attributes, that is, those attributes which are considered to discriminate best among the current hypotheses. New abstract concept descriptions are generated so that they contain only the subset of attributes considered useful. This is done in the Reflection stage, which uses a relevance measure formerly used for attribute selection in decision trees, called the distance measure (Lopez de Mantaras, 1991), together with the current set of hypotheses, to compute a relevance for each individual attribute. The next generalization step is performed using only the useful attribute subset, weighting calculations with the computed attribute relevances. All of this allows the system to dynamically reduce the dimensionality of the dataset, removing those attributes which are not important for the concepts induced so far.

From a cognitive standpoint, although ISAAC is not specifically intended to be a model of human learning, we can find some cognitive basis behind it. The three stages explained above try to model the assumption that humans often make use of contextual information in categorization processes. This information comes from background knowledge which reflects previous experiences and beliefs (Medin, 1989). But in some situations there is no previous experience in a given domain and background knowledge may not be available. In
such a case, an inductive system may either perform a completely uninformed clustering or try to obtain some information during the process to guide induction. ISAAC deals with contextual information through the Reflection stage, which acts as a sort of meta-level dynamically controlling and adjusting the underlying clustering process. When no other knowledge is present, background knowledge is automatically created in the form of attribute relevances, which provide a specific context. In addition, other features are planned to be added to the Reflection stage within this framework, such as constructive induction capabilities, using domain knowledge [2] or detecting attribute dependencies, which will contribute to building a more knowledge-rich induction context. Moreover, the attribute selection procedure implements an attentional mechanism in the system which, as mentioned earlier, is cognitively well justified. Also, there is some cognitive basis for using a simple initial classification procedure, assuming that the results could indicate more interesting relationships (Medin & Ortony, 1989).

To summarize, ISAAC exemplifies a system which incorporates some traditional unsupervised symbolic learning biases and, at the same time, addresses important data mining problems such as database size, dimensionality and user interaction. Furthermore, ISAAC possesses other features that are well suited to the data mining process. The Preprocessing stage substitutes missing values using the majority value of the category to which the instance is assigned, as proposed by Bejar (1995). Also, some experiments have been made on automating the selection of the NG parameter for measurable performance goals (e.g. predictive accuracy), using cross-validation techniques to estimate performance (McEwen, 1996; Talavera & Cortes, 1997b).
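The missing-value substitution just mentioned can be sketched as follows; this is a schematic reconstruction of the idea attributed to Bejar (1995), not the actual ISAAC code, and the sentinel and data are invented for the example:

```python
from collections import Counter

MISSING = None  # sentinel for an absent attribute value

def fill_missing(instances, assignments):
    """Replace missing attribute values by the majority value of the
    category to which the instance is assigned."""
    filled = [list(inst) for inst in instances]
    for cat in set(assignments):
        members = [inst for inst, a in zip(instances, assignments) if a == cat]
        for i in range(len(members[0])):
            known = [m[i] for m in members if m[i] is not MISSING]
            if not known:
                continue  # no observed value in this category: leave as-is
            majority = Counter(known).most_common(1)[0][0]
            for j, a in enumerate(assignments):
                if a == cat and filled[j][i] is MISSING:
                    filled[j][i] = majority
    return filled

instances = [("red", "round"), ("red", MISSING), ("green", "square")]
assignments = [0, 0, 1]  # instance -> category
print(fill_missing(instances, assignments))
# [['red', 'round'], ['red', 'round'], ['green', 'square']]
```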
Although these two additional features may lean more towards statistical methods, they couple well with the other cognitively-related features, thus effectively showing the plausibility of combining symbolic learning and statistical principles to solve complex problems.
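The merge-until-general-enough interplay of the Refinement and Reflection stages might be caricatured as below. The numeric prototypes, the gap-based relevance measure and the target concept count are deliberately simplified stand-ins; ISAAC itself works with probabilistic concepts, the Lopez de Mantaras distance measure and the NG parameter:

```python
# Schematic sketch only: merge the two closest concepts until a target number
# of concepts (standing in for the NG generality preference) is reached,
# recomputing a crude attribute relevance at each step.

def distance(c1, c2, weights):
    """Relevance-weighted L1 distance between two numeric prototypes."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, c1, c2))

def relevances(concepts):
    """Crude relevance: attributes that spread the current concepts apart
    more get a higher weight (a stand-in for a real relevance measure)."""
    rel = []
    for i in range(len(concepts[0])):
        col = [c[i] for c in concepts]
        rel.append(max(col) - min(col))
    total = sum(rel) or 1.0
    return [r / total for r in rel]

def generalize(concepts, target):
    concepts = [tuple(c) for c in concepts]
    while len(concepts) > target:
        weights = relevances(concepts)          # "Reflection" step
        pairs = [(distance(concepts[i], concepts[j], weights), i, j)
                 for i in range(len(concepts))
                 for j in range(i + 1, len(concepts))]
        _, i, j = min(pairs)                    # "Refinement" step: merge
        merged = tuple((a + b) / 2 for a, b in zip(concepts[i], concepts[j]))
        concepts = [c for k, c in enumerate(concepts) if k not in (i, j)]
        concepts.append(merged)
    return concepts

prototypes = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
print(sorted(generalize(prototypes, target=2)))
# two merged prototypes, near (0.05, 0.0) and (5.05, 5.0)
```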
5 Concluding remarks

Data mining is an emerging field which encloses topics from several disciplines and faces the task of extracting useful information from large, noisy and uncertain data. Although most of the research done in ML pursues the same goals as those established for the data mining task, ML techniques do not seem to be suitable when applied to databases. We have examined the particular case of unsupervised symbolic learning systems, although the conclusions can be extended to supervised approaches as well. We argue that the problems of ML systems in dealing with databases may come from their traditional symbolic nature, which is not exactly concerned with the topics found in data mining. However, we have suggested that some traditional symbolic learning topics may be interpreted from a data mining standpoint and that symbolic learning methods which make use of statistical principles may be good candidates to solve data mining problems, without excluding traditional symbolic issues.

The ideas presented here are not completely new to the ML community. Fisher's work on iterative optimization of hierarchical clusterings (Fisher, 1995) is an example of how a learning system designed from psychological findings (COBWEB) can be profitably extended to serve as a data analysis tool. From another perspective, current data mining research could benefit from incorporating some ideas from topics in symbolic learning, since symbolic AI is greatly concerned with heuristic and search issues which can complement existing statistical techniques. It cannot be presumed that every method related to cognition can only work with "toy problems". On the other hand, we have pointed out that, in order to get acceptable results in data mining, symbolic learning approaches may need some amount of adaptation to specific tasks. Although most of the solutions found in ML have not been extensively tested with real databases, they can still be useful and deserve a closer examination.
Further testing will allow existing ML solutions to be assessed in the light of more practical experience. This future research should help in solving some of the problems found in data mining without excluding the study of the cognitive rationale for the concept formation process.

[2] Actually, some initial work on using prior knowledge in the form of attribute relevances has been carried out (Talavera, 1997).
Acknowledgments. I thank the anonymous reviewer for his detailed comments and suggestions that greatly improved the quality of this paper. I also thank Ramon Sanguesa for helpful comments and discussions.
References

Ahn, W. & Medin, D. L. (1992). A two-stage model of category construction. Cognitive Science, (16), 81-121.

Bejar, J. (1995). Adquisicion automatica de conocimiento en dominios poco estructurados. PhD thesis, Facultat d'Informatica de Barcelona, UPC.

Brodley, C. & Smyth, P. (1995). The process of applying machine learning algorithms. In Aha, D. & Riddle, P. (Eds.), Working Notes for Applying Machine Learning in Practice: A Workshop at the Twelfth International Machine Learning Conference, Washington, DC.

Cheeseman, P., Kelly, J., Self, M., Stutz, J., Taylor, W., & Freeman, D. (1988). AutoClass: A Bayesian classification system. In Proceedings of the Fifth International Workshop on Machine Learning (pp. 54-64). San Mateo, CA: Morgan Kaufmann.

Devaney, M. & Ram, A. (1997). Efficient feature selection in conceptual clustering. In Machine Learning: Proceedings of the Fourteenth International Conference, Nashville, TN. (To appear).

Fayyad, U. M., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery: An overview. In U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, & R. Uthurusamy (Eds.), Advances in Knowledge Discovery and Data Mining (pp. 1-34). Cambridge, Massachusetts: AAAI Press.

Fisher, D. (1995). Optimization and simplification of hierarchical clusterings. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (pp. 118-123), Montreal, Quebec, Canada. AAAI Press.

Fisher, D. & Chan, P. (1990). Statistical guidance in symbolic learning. Annals of Mathematics and Artificial Intelligence, (2), 135-148.

Fisher, D. & Pazzani, M. (1991). Concept formation in context. In D. Fisher, M. Pazzani, & P. Langley (Eds.), Concept Formation: Knowledge and Experience in Unsupervised Learning (pp. 307-322). San Mateo, CA: Morgan Kaufmann.

Fisher, D. H. (1987). Knowledge acquisition via incremental conceptual clustering. Machine Learning, (2), 139-172.

Frawley, W. J., Piatetsky-Shapiro, G., & Matheus, C. J. (1991). Knowledge discovery in databases: an overview. In G. Piatetsky-Shapiro & W. J. Frawley (Eds.), Knowledge Discovery in Databases (pp. 1-27). Cambridge, Massachusetts: AAAI Press.

Gennari, J. (1989). Focused concept formation. In Proceedings of the Sixth International Workshop on Machine Learning (pp. 379-382). Morgan Kaufmann.

Gennari, J. H., Langley, P., & Fisher, D. (1989). Models of incremental concept formation. Artificial Intelligence, (40), 11-61.

Gordon, D. F. & Desjardins, M. (1995). Evaluation and selection of biases in machine learning. Machine Learning, 20(1-2), 5-22.

Hanson, S. J. (1990). Conceptual clustering and categorization: Bridging the gap between induction and causal models. In R. S. Michalski & Y. Kodratoff (Eds.), Machine Learning: An Artificial Intelligence Approach (Volume III), chapter 9 (pp. 235-268). San Mateo, CA: Morgan Kaufmann.

Holsheimer, M. & Siebes, A. (1994). Data mining: the search for knowledge in databases. Technical Report CS-R9406, Computer Science/Department of Algorithmics and Architecture, CWI.

Lopez de Mantaras, R. (1991). A distance-based attribute selection measure for decision tree induction. Machine Learning, (6), 81-92.

McEwen, M. (1996). ML experiments with an accuracy-efficiency trade-off in unsupervised attribute prediction. Master's thesis, University of Aberdeen, Computer Science Department.

Medin, D. L. (1989). Concepts and conceptual structure. American Psychologist, 44(12), 1469-1481.

Medin, D. L. & Ortony, A. (1989). Psychological essentialism. In S. Vosniadou & A. Ortony (Eds.), Similarity and Analogical Reasoning. New York: Cambridge University Press.

Michalski, R. S. (1983). A theory and methodology of inductive learning. In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell (Eds.), Machine Learning: An Artificial Intelligence Approach (pp. 83-134). Los Altos, CA: Morgan Kaufmann.

Michalski, R. S. & Stepp, R. E. (1983). Learning from observation: Conceptual clustering. In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell (Eds.), Machine Learning: An Artificial Intelligence Approach (pp. 331-363). Los Altos, CA: Morgan Kaufmann.

Smith, E. E. & Medin, D. L. (1981). Categories and Concepts. Cambridge, MA: Harvard University Press.

Talavera, L. (1996). Reflexion y refinamiento del conocimiento en la formacion de conceptos. Master's thesis, Facultat d'Informatica de Barcelona, UPC.

Talavera, L. (1997). Using background knowledge to guide conceptual clustering: an attribute relevance based approach. Unpublished manuscript.

Talavera, L. & Cortes, U. (1996). Generalizacion y atencion selectiva para la formacion de conceptos. In V Congreso Iberoamericano de Inteligencia Artificial, IBERAMIA-96 (pp. 320-330), Cholula, Puebla, Mexico. Limusa, Mexico.

Talavera, L. & Cortes, U. (1997a). Exploiting bias shift in knowledge acquisition. In 10th European Workshop on Knowledge Acquisition, Modeling, and Management, Sant Feliu de Guixols, Barcelona, Spain. To appear.

Talavera, L. & Cortes, U. (1997b). Inductive hypothesis validation and bias selection in unsupervised learning. In Vanthienen, J. & van Harmelen, F. (Eds.), Proceedings of the 4th European Symposium on the Validation and Verification of Knowledge Based Systems, EUROVAV-97 (pp. 169-179), Leuven, Belgium.