Towards a Better Integration of Data Mining and Decision ... - CiteSeerX

3 downloads 21618 Views 191KB Size Report
the lines presented here. Some recent conference calls for papers confirm the relevance of our viewpoint, such as [1]: “Recently emerged the idea that database.
Towards a Better Integration of Data Mining and Decision Support via Computational Intelligence Sylvain Delisle Département de mathématiques et d’informatique Université du Québec à Trois-Rivières Québec, Canada, G9A 5H7 [email protected] www.uqtr.ca/~delisle

Abstract Despite a high level of activity and a large number of researchers involved, current research in data mining is plagued with several serious problems that should be regarded as top-priority challenges. First, large-scale, visible results and benefits seem rare. Second, the field has become utterly specialized and a lot of work remains to be done at the “big picture” level. Finally, data mining is very poorly integrated with the decision support level. This paper concentrates on the latter challenge and presents a research proposal in that direction.

1. Introduction With the relentless growth of information available on the Internet, combined with the storage of gigantic quantities of data on personal and corporate computers, individuals and businesses often find themselves in situations where they feel overwhelmed by the sheer amount of data they must process to make timeconstrained but informed decisions. To add to this already challenging state of affairs, the last couple of years have brought along a new twist: valuable information could be "hidden" in your personal/corporate/Web data, and data mining (DM) may allow you to uncover it. And so DM, and various related methodologies, techniques and tools are now becoming more and more widely used in order to uncover this implicit and potentially useful information or knowledge. Unfortunately, despite the huge interest and sometimes unrealistic hopes it has generated, DM still faces several serious challenges. One of these is the fact that large-scale, systematic, and predictable applications of DM to the benefit of numerous organizations and businesses seem rather uncommon. Another challenge is the fact that the majority of current progress in DM research seems to be based on

utterly-specialized (computer science- or mathematicsrelated) techniques, whereas research on strategic, methodological, and even epistemological aspects of DM are rare—too few people seem to be working at the level of the forest and too many at the level of the individual tree! Yet another challenge, and one on which we will focus in this paper, is the fact that DM remains essentially very poorly integrated with the decision support level (and its corresponding computerized systems) it is supposed to serve, thus not fully realizing its very own raison d'être. In this paper, we present the core of our new research project currently in its initial stage: integrating DM and decision support (DS) through computational intelligence. Our main goal is to make DM more accessible and effective for decision makers to support application-oriented DS. We believe there is currently a lack of research in this area; progress in that direction could foster significant advancement in applied DM.

2. Research Goals The long term goal of this research project is to discover and evaluate new solutions to the crucial and very challenging problem of poor integration between DM and DS. These solutions will be based mainly on computational intelligence concepts and tools (e.g. ontologies and reasoning systems), but also on new development methodologies that take DM aspects early into consideration in the development cycle of information and DS systems. The combination of these two domains, knowledge and methodology, constitutes a most promising avenue relative to the current state of the art in computer science. Our short term objective is to find solutions to the following specific problems which constitute serious limitations in typical situations of DM and DS integration: i) Current DM processes make very little use of already existing corporate knowledge, even when such knowledge has been (partially) formalized

and computerized. Consequently, DM is more tedious than needed and its results tend to produce “already known information”. ii) Most DM projects occur as a posteriori development for which specific databases and processing capabilities must be (re-)developed, and for which special (non-interactive and non-robust) means are used to make DM results usable by the DS level. Thus, the lack of methodological and technical integration with standard information systems causes DM projects to appear as “extra work” for which the return on investment can hardly be measured. iii) There is a lack of standardisation in DM that generates a lot of redundant work and complicates in a major way the evaluation and comparison of tools, methods, and results. Steps must be made towards the adoption of effective DS-oriented DM standards, at least at the local (e.g. enterprise) level. Because we live in an information-based world, the problem and research objectives identified above are of the greatest importance, both at the scientific and economic levels. They are directly related to the allpervading problem of information overload for which DM offers valuable (though partial) solutions, even more so if adequately coupled with modern businesses’ DS systems. We think our project offers great potential for advancing the state of the art in DM applied to information and DS systems.

3. Related Work The problem of poor DM and DS integration has only recently been clearly acknowledged in the scientific literature and little work has been done along the lines presented here. Some recent conference calls for papers confirm the relevance of our viewpoint, such as [1]: “Recently emerged the idea that database management systems should provide support and technology for data mining, as it occurred for OLAP and data warehouses in the last decade”. [15], as several others, mentions “integration of data warehousing, OLAP and data mining” in its list of preferred topics of interest. Some conferences are aimed more specifically at decision making and less so at the technological infrastructure supporting it, as in the work of [3], while others are associated with the field of multiagent systems ([17]): “Integrated knowledge intensive systems emerge in all domains of business and engineering, when intelligent decision support … requires knowing how this knowledge is produced, measured, communicated, and interpreted”—see also [27] for an agent-based approach to DM. The recent domain of grid intelligence, see e.g. [16], is concerned with large-scale

distributed enterprises information systems and can surely be considered promising as it strives to integrate the so-called data and computational grids with the knowledge grid—see also [8]. One of the rare references dealing in some depth with DM and DS is that of [19] with chapters dedicated to various aspects of the integration of DM and DS— these researchers have proposed the equation “data mining + decision tools = better business”. Work in meta-learning is also quite relevant as it attempts to support DM: see METAL (www.metal-kdd.org), [6], and [24]. Other recent work in the field of KDD (Knowledge Discovery in Databases) is that of [13] in which an ontology is used to model domain knowledge to support the KDD process, that of [20] where an environment for the rapid development of pre-DM processing chains is introduced, and that of [25] in which expert system and machine learning technologies are combined to support DM. Work dealing with conceptual queries and online/Web (interactive) DM is also of interest since it must take into account some elements of the DS dimension: see [9], for instance. In the domain of DS systems, the Journal of Decision Support Systems published in 2002 a special issue on “directions for the next decade” containing several key papers: [7] propose to integrate DS and knowledge management processes using DM; [21] propose the concept of a knowledge warehouse to manage the knowledge of the firm; and [22] summarize the evolution of DS systems and emphasize the importance of data warehouses, OLAP, and DM in those systems. Also of interest is case-based reasoning (CBR) which offers valuable tools for DS, as exemplified in the works of [2], [18], and [23]. Because decision makers tend to use analogy-based reasoning processes when solving problems and because they usually do so by considering only a very limited number of cases, especially in the field of business administration, we hypothesize that CBR constitutes a promising approach for our research project—more on this in the following section.

4. For a Better Integration of DM and DS Holsapple & Whinston ([14]) consider that DS systems’ raison d’être is to increase the productivity of decision makers through implementing their abilities to manipulate knowledge, by facilitating problemsolving, and by providing an aid for non-structured problems. Nowadays, this requires the constant processing of phenomenal quantities of data and information and always involves the use of computerized systems. DM methods and tools, hold the promise of facilitating the decision maker’s life by extracting hidden information from a sea of data, and

compacting it into a form easily amenable to decision making. But as we argued before, DM is poorly integrated with DS and, consequently, cannot effectively support decision makers. We start from the work of Bolloju et al. ([7]) in order to depict the typical situation addressed by our proposal, from a DS perspective. Indeed, as shown in Figure 1 below, DM and DS systems are two distinct components that do not usually interact through a computerized tool. Thus, DM and DS are not integrated at all—Bolloju et al. do not consider this question in their work which focuses on knowledge management. From a DM perspective and in relation to our objectives, we must remark that all too often in DM or machine learning, the zero-knowledge (tabula rasa) hypothesis is used and, not very surprisingly, leads to results that can be qualified as “already known information or knowledge” (e.g. “rediscovering” database functional dependencies).

What we propose, in order to support a fruitful integration of DM and DS, is to better exploit existing knowledge—be it background or specialized, or, relative to the enterprise, internal or external—via what we will be referring to as KIVs, i.e. Knowledge (DMDS) Integration Vectors. We define a KIV as a source of data, information or knowledge that plays a role in DM and DS, and in their integration. Examples of KIVs are data models, decision models, general and domain-specific ontologies, and general and domainspecific rule bases. All KIVs will be exploitable and managed by KIVE, i.e. the KIV Environment we propose. KIVE could be added to the architecture depicted in Figure 1 as a new system interfacing with decision makers and interconnected with existing resources. KIVE will contain a case-based system that will index (see [5]) all relevant information to facilitate DM, and ultimately decision making, from the decision maker’s viewpoint.

Figure 1 (Source: Bolloju et al. 2002 [7]). EIS=Enterprise Information System; DSS=Decision Support System(s); model bases= models used by decision makers

DS is now an important application domain of casebased reasoning ([4], [12]) as it embodies well the processing performed by actual decision makers, including their potential usefulness with regard to decision explanation ([10]). The case-based approach we suggest here is somewhat similar to that of [20], although more much elaborated. KIVE will not only consider pre-DM processing, but all aspects of DM related to DS. Also, contrary to [20], we do not exclude unsuccessful cases as they can also be quite informative for DM purposes. Thus, KIVE will constitute a DS-oriented DM environment that contains the following main components: a user interface suitable for decision makers that includes a DSoriented DM wizard; a catalogue of KIVs which are linked to other relevant resources, information and DS

systems; a feature/attribute selection and engineering tool; a catalogue of DM algorithms; a catalogue of (positive and negative) commented cases that will be indexed on multiple criteria, including information on the decision problem related to it; and a case learner and knowledge acquisition module. The development and implementation of the proposed KIVE environment is a challenging project that will allow us to study fundamental issues related to the problem of DM-DS integration. Ultimately, this will support significant progress in applied DM. We now consider many key questions (in no particular order) that will motivate our thinking during this research project: • KIVs include a wide variety of data, information, and knowledge. What is the best way to represent













KIVs, especially for DM purposes and in a format (and language) amenable to DS? How to ensure that existing corporate knowledge is properly formalized and accessible in due time to supply reliable KIVs? What types of metadata and ontologies should be developed to successfully guide the DM process? How can recent work on meta-learning be put to use here? Should metadata be related to a specific DM methodology and/or to a specific theory of knowledge? How to define cases, or derive them from databases ([20], [26]) in order to serve both the objectives of DM and DS? Meta-learning seems promising, as well as knowledge extraction from DM experts. And what is the best way to process (i.e. index, retrieve, assess similarity, adapt) them within KIVE’s case system? How should the knowledge base of KIVE’s wizard be developed in the first place, and what type of rule should it use? Should it put more emphasis on DM or on DS aspects? Should it be data- or model-oriented, or both? Should the wizard’s rulebased system be replaced altogether by a case system instead (with learning capacities)? To what extend can the trial and error strategy, which is often used in DM, be minimized during the interaction with the KIVE interactive wizard when no pre-existing case is available? Here again, meta-learning could be useful. How to define a productive concept of interactive DS through DM when many DM processes may require a lot more time than what is usually understood by “online, interactive”? How can standard development methodologies for information systems and database systems be revised or adjusted to include right from the start relevant requirements and specifications related to DM and DS? Or should entirely new methodologies be devised? How DM and DS integration should be balanced (i.e. with more emphasis on DM or on DS) to better serve the needs of typical decision makers? Should DM-specific and DS-specific functions be accessible separately (through the wizard) when needed? Can we define precisely the notion of DM-DS integration and propose a theoretical model supporting it? Can this integration be measured or quantified?

5. Conclusion We have briefly presented the core of our new research project currently in its initial stage. Consequently, this paper has put forward more

questions than definite answers. We believe finding answers to some of these questions could trigger significant advancement in applied DM, especially in the context of decision support. Our goal is to develop an intelligent environment (named KIVE) to support decision makers in their appropriate application of DM so that they can efficiently make better decisions in a given domain. So the “first order” goal is DM for DS, i.e. DM for application-domain-related DS. However, intelligently supporting DM involves DM-related DS (e.g. how to select the most appropriate DM technique during the DM process). Thus, the “second order” goal is DS for DM. Both first and second order goals will be tackled with DM, DS, and computational intelligence concepts and techniques. For instance, KIVE should be able to allow a manager to perform an in-depth analysis, involving DM, of the last two years’ production data in order to derive decision trees, classification rules, or regression functions that would help him/her better understand how to control production costs and thus be in a position to make better decisions in that area. Another example could be the use of DM on enterprise data in order to derive a parameterized model of its overall performance (see our related work in [11]). This work is oriented towards the elaboration of new conceptual models and computer-oriented techniques, and their implementation and testing in software and related methodologies—but all this while keeping “the forest” in mind. We hypothesize that the current state of the art in DM is limited by hyper-specialized subdisciplines not sufficiently taking into consideration other DM-related disciplines and challenges from a broader perspective. DM is a multidisciplinary field. It obviously involves statistical and computer-science related techniques, but it also involves decision theory and epistemological aspects at an equally important level. We think that much more research is needed on the synergistic combination of the various sub-fields of DM, especially, as we have argued here, with regard to decision support and computational intelligence.

6. Acknowledgments We thank the anonymous reviewers who provided judicious comments that helped us improve our paper. We acknowledge the financial support of the National Sciences and Engineering Research Council of Canada (NSERC).

7. References [1] ACM/SAC (2005), The 2005 ACM Symposium on Applied Computing, Special Track on Data Mining, CFP, Santa Fe (New Mexico, USA), March 2005.

[2] Angehrn, A.A. & S. Dutta (1998), “Case-Based Decision Support”, Communications of the ACM, 41(5). [3] Arnott, D., G. Pervan, P. O’Donnell & G. Dodson (2004), “An Analysis of Decision Support Systems Research: Preliminary Results”, The 2004 IFIP International Conference on Decision Support Systems, Prato (Italy), July 2004. [4] Avesani, P., S. Ferrari & A. Susi (2003), “Case-Based Ranking for Decision Support Systems”, The ICCBR 2003 Conference, Lecture Notes in Artificial Intelligence 2689 (Spinger), 35-49, Trondheim (Norway), June 2003. [5] Azuaje, F., W. Dubitzky, N. Black & K. Adamson (2000), “Retrieval Strategies for Case-Based Reasoning: A Categorized Bibliography”, The Knowledge Engineering Review, 15(4), 371-379. [6] Blanzieri, B., P. Giorgini, P.Massa & S. Recla (2001), “Data Mining, Decision Support and Meta-Learning: Towards an Implicit Culture Architecture for KDD”, Workshop on Positions, Developments and Future Directions, 2001 ECML/PKDD Conference, Freiburg (Germany), September 2001. [7] Bolloju, N., M. Khalifa & E. Turban (2002), “Integrating Knowledge Management into Enterprise Environments for the Next Generation Decision Support”, Decision Support Systems, 33, 163-176. [8] Cannataro, M. & D. Talia (2003), “The Knowledge Grid”, Communications of the ACM, 46(1). [9] Chen, Q., X. Wu & X. Zhu (2004), “OIDM: Online Interactive Data Mining”, The 2004 IEA/AIE Conference, Lecture Notes in Artificial Intelligence 3029 (SpringerVerlag), 66-76, Ottawa (Canada), May 2004. [10] Cunningham, P., D. Doyle & J. Loughrey (2003), “An Evaluation of the Usefulness of Case-Based Explanation”, The ICCBR 2003 Conference, Lecture Notes in Artificial Intelligence 2689 (Spinger), 122-130, Trondheim (Norway), June 2003. [11] Delisle, S., J. St-Pierre & T. Copeck (to appear), “A Hybrid Diagnostic-Advisory System for Small and MediumSized Enterprises: A Successful AI Application", Applied Intelligence (The International Journal of Artificial Intelligence, Neural Networks, and Complex ProblemSolving Technologies), Springer. th [12] ECCBR (2004), The 7 European Conference in CaseBased Reasoning, CFP, Madrid (Spain), August 2004. [13] Euler, T. & M. Scholz (2004), “Using Ontologies in a KDD Workbench”, Workshop on Knowledge Discovery and Ontologies, 2004 ECML/PKDD Conference, Pisa (Italy), September 2004. [14] Holsapple, C.W. & A.B. Whinston (1996), Decision Support Systems: A Knowledge-Based Approach, West. [15] ICDM (2004), The Fourth IEEE International Conference on Data Mining, CFP, Brighton (U.K.), November 2004. [16] KGGI (2003), Workshop on Knowledge Grid and Grid Intelligence, The 2003 IEEE/WIC International Conference on Web Intelligence / Intelligent Agent Technology, Halifax (Nova-Scotia, Canada), October 2003.

[17] KIMAS (2005), The 2005 IEEE International Conference on Integration of Knowledge Intensive MultiAgent Systems, Waltham (Massachusetts, USA), April 2005. [18] Lee, J.K. & J.K. Kim (2002), “A Case-Based Reasoning Approach for Building a Decision Model”, Expert Systems, 19(3). [19] Mladenic, D., N. Lavrac, M. Bohanec & S. Moyle (2003), Data Mining and Decision Support (integration and collaboration), Kluwer. [20] Morik, K. & M. Scholz (2004), “The MiningMart Approach to Knowledge Discovery in Databases”, in Intelligent Technologies for Information Analysis, N. Zhong & J. Liu (eds.), Springer. [21] Nemati, H.R., D.M. Steiger, L.S. Iyer & R.T.Herschel (2002), “Knowledge Warehouse: An Architectural Integration of Knowledge Management, Decision Support, Artificial Intelligence and Data Warehousing”, Decision Support Systems, 33, 143-161. [22] Shim, J.P., M. Warkentin, J.F. Courteny, D.J. Power, R. Sharda & C. Carlsson (2002), “Past, Present, and Future of Decision Support Technology”, Decision Support Systems, 33, 111-126. [23] Sun, B., L.D. Xu, X. Pei & H. Li (2003), “ScenarioBased Knowledge Representation in Case-Based Reasoning Systems”, Expert Systems, 20(2). [24] Vilalta, R., C. Giraud-Carrier, P. Brazdil & C. Soares (2004), “Using Meta-Learning to Support Data Mining”, International Journal of Computer Science and Applications, 1(1), 31-45. [25] Weiss, S.M., S.J. Buckley, S. Kapoor & S. Damgaard (2003), “Knowledge-Based Data Mining”, The 2003 SIGKDD Conference, Washington (D.C., USA), August 2003. [26] Yang, Q. & H. Cheng (2003), “Case Mining from Large Databases”, The ICCBR 2003 Conference, Lecture Notes in Artificial Intelligence 2689 (Spinger), 691-702, Trondheim (Norway), June 2003. [27] Zhang, Z., C. Zhang & S. Zhang (2003), “An AgentBased Hybrid Framework for Database Mining”, Applied Artificial Intelligence, 17, 383-398.

Suggest Documents