Towards Validation of Data Mining Systems

Klaus P. Jantke
FIT Leipzig at HTWK Leipzig, P.O. Box 30066, 04251 Leipzig
e-mail: [email protected]
Abstract
The growing complexity of systems humans are able to design and implement has led to an obvious need to develop methodologies and techniques for the validation of intelligent systems. Otherwise, the human inability to validate complex systems might become the crucial obstacle to applying what has been built. Automated systems for knowledge discovery in data bases (data mining systems, for short) have been chosen as a prototypical application domain for analyzing validation problems and for developing a framework of system validation. The paper aims at developing certain novel theoretical concepts for representing an interactive system's validity. Those concepts need to be based on sequences of structured objects describing snapshots of a system's validity evaluation during an interactive examination. In addition, a systematization is provided for the validation of data mining systems, in particular.
1 Motivation and Introduction

In [WW93], the authors point to the bottleneck of complex systems validation by warning that the inability to adequately evaluate systems may become the limiting factor in our ability to employ systems that our technology and knowledge will allow us to design.
There is abundant evidence of the need to validate complex systems. Severe accidents (cf. [NN66], e.g.) shed some light on the urgent need of system verification and validation, and investigations like [Tho94] exhibit the potentials of formally well-based approaches. [Rei96] provides a related system verification perspective with quite illuminating examples. Statistics about accidents in German industries during almost 20 years (cf. [KRSU95], [KRSU96], [SU94a], [SU94b]) prove the urgent need of progress in system validation. In particular, the high percentage of human errors (in the application domain of air traffic control, for instance, it is estimated at 66% as a causal factor of air carrier accidents, cf. [PKZI93]) exhibits that the validation issue substantially exceeds the problem of system module verification. There is the particular need to step forward from verifying artifacts to validating man-machine systems, as already argued in [Por64]. In contrast to verification, the validation task is less formally specified and involves substantial human factor issues (cf. [Har93], [Hop82], and [Sta93], e.g.). The political and emotional issues associated with the acceptance of some "technically adequate" system (e.g. nuclear power, totally automatic public transportation systems) must also be considered, as said in [WW93].
Following [OO93], validation is distinguished from verification by the illustrative circumscription of dealing with building the right system, whereas verification deals with building the system right. According to the complexity of the general system validation task, there is an enormous variety of approaches. The spectrum ranges from remarkably different heuristic approaches as in [Ter94] and [GGC96], e.g., to strictly mathematical investigations like in [ABB+91] and [BKM91], e.g. Very specific problems deal with choosing the right experts (cf. [Sch93]). The present paper aims at the investigation of basic phenomena, at the systematization of essential steps towards system validation, and at the introduction of novel concepts. The present approach is based on the case study of validating automated systems for knowledge discovery in data bases (data mining systems, for short). Therefore, a very brief introduction to the essentials of data mining seems appropriate. The key conceptual novelty is to circumscribe some system's validity by a sequence of appropriately structured validity evaluations. The ultimate validity statement is established by such a sequence meeting certain criteria of convergence and, perhaps, monotonicity. Chapter 2 provides a brief introduction to data mining systems. Chapter 3 deals with learning - the algorithmic core of data mining - and with related validation problems. Chapter 4 adopts a bird's-eye view of the validation of data mining systems.
2 Data Mining Systems - A Case for Validation

Knowledge discovery is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data (cf. [FPSM91]). This is a quite condensed formulation of what is nicely described in [FPSS96] as follows: As we march into the age of digital information, the problem of data overload looms ominously ahead. Our ability to analyze and understand massive datasets lags far behind our ability to gather and store the data. A new generation of computational techniques and tools is required to support the extraction of useful knowledge from the rapidly growing volumes of data. These techniques and tools are the subject of the emerging field of knowledge discovery (KDD) and data mining.
According to the motivation briefly sketched above, knowledge discovery deals with truly large data bases, faces all the problems of vagueness and incompleteness unavoidable in reality, and aims at practically useful results. Consequently, the data mining process requires some preprocessing of data (including data selection to reduce the size of the data to be processed and data cleaning to fit data to the requirements of algorithms and programs). Similarly, there is some need of postprocessing to make the results of data mining understandable and to allow for potential revisions of the data mining process. According to fundamental insights of the philosophy of sciences (cf. [Pop65] resp. [Pop34]), all outputs generated by data mining systems are essentially hypothetical in character. Furthermore, it will be generally impossible to prove those results correct. This imposes some strict limitations on the possibilities of validating systems for knowledge discovery in data bases. In their right perspective, algorithms for data mining are situated inductive learning algorithms. Recent tools for knowledge discovery in data bases offer a certain collection of data mining procedures. Key features implemented are clustering, classification, induction of functional dependencies, association discovery, sequential pattern inference and similar time sequence discovery. Far beyond standard techniques, there are very recent attempts towards the development and implementation of knowledge discovery in multimedia and hypermedia systems (cf. [PE95] and [Etz96], e.g.).
The following figure illustrates the key functionalities and the basic architecture of systems for knowledge discovery in data bases. We adopt the view that the core functionality is data mining.
Raw Data -> Target Data -> Preprocessed Data -> Data Mining -> Patterns / Regularities -> Knowledge
Figure 1: A Process and Architectural Perspective at Knowledge Discovery in Data Bases

Thus, the problem of validation and verification of learning procedures (especially inductive learning like classification and pattern inference, e.g.) is playing a central role. However, even a successful correctness proof for data mining procedures does not imply that the overall system under validation is the right one for the extraction of previously unknown, and potentially useful information as required. Data preprocessing (data selection and cleaning) and postprocessing (incl. visualization) have to be taken into account. Human factor issues evolve and psychology comes into play.
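To make the process chain of Figure 1 concrete, the following sketch wires the four stages together in Python. The function names, the toy record format, and the naive co-occurrence "miner" are illustrative assumptions made up for this sketch; they are not taken from the paper or from any particular KDD tool.

```python
# Minimal sketch of the chain of Figure 1: select -> preprocess -> mine -> postprocess.
from collections import Counter
from itertools import combinations

def select(raw_records, attributes):
    """Raw data -> target data: keep only the attributes of interest."""
    return [{a: r[a] for a in attributes if a in r} for r in raw_records]

def preprocess(target_records):
    """Target data -> preprocessed data: drop incomplete records (toy 'cleaning')."""
    width = max(len(r) for r in target_records)
    return [r for r in target_records if len(r) == width]

def mine(records, min_support=2):
    """Preprocessed data -> patterns: attribute-value pairs that co-occur often."""
    counts = Counter()
    for r in records:
        for pair in combinations(sorted(r.items()), 2):
            counts[pair] += 1
    return [(pair, n) for pair, n in counts.items() if n >= min_support]

def postprocess(patterns):
    """Patterns -> 'knowledge': render hypotheses for inspection by the user."""
    return [f"{a1}={v1} tends to co-occur with {a2}={v2} (support {n})"
            for ((a1, v1), (a2, v2)), n in patterns]

raw = [{"age": "young", "income": "low", "buys": "yes", "zip": "04251"},
       {"age": "young", "income": "low", "buys": "yes"},
       {"age": "old", "income": "high", "buys": "no", "zip": "04103"}]
for line in postprocess(mine(preprocess(select(raw, ["age", "income", "buys"])))):
    print(line)
```

Even in this toy form, the chain makes visible where validation has to look beyond the mining step: the selection and cleaning decisions shape which patterns can be found at all, and the rendering step decides whether the user can judge them.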
3 Validation of Algorithms resp. Programs for Data Mining
Because implemented algorithms for data mining form the core computational components of automated systems for knowledge discovery in data bases, the validation or verification of those procedures does play a central role in the validation of the corresponding overall knowledge discovery systems. The validity of learning procedures as the heart of systems for knowledge discovery in data bases may be formally specified to some extent. There is a certain type of "formulae" taken as potential hypotheses of every particular data mining process. In its right perspective, the underlying data base should be a model of every formula guessed. Unfortunately, this logically simple and clear approach is somehow complicated in reality. First, due to the necessity to restrict the system's attention to only a small part of selected data, one cannot expect a formula, which is hypothetically generated upon a small set of sample data, to be truly valid in the whole underlying data base. Statistics come into play and make the model concept a little cumbersome. This becomes even more complicated in the presence of data cleaning. Second, and even more important, the question whether or not some formula generated as a data mining algorithm's hypothetical result is considered previously unknown and potentially useful is not a formal one. This might become understandable from a logical and computational perspective. Over sufficiently expressive formal languages, there is an infinite set of logical tautologies which, naturally, are true in any data base. Most of them might not be considered interesting. However, it is conceptually quite difficult to separate, among all the valid formulae, those which are "more interesting than others".
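As a small illustration of the first complication, consider a functional-dependency-style hypothesis that is checked on a sample and then on the full relation: being a model of the formula on the selected data does not make the whole data base a model of it. The tiny table, the zip codes, and the helper function below are assumptions invented for this sketch only.

```python
# Sketch: a hypothesis induced from a sample ("zip determines city") need not
# hold in the whole underlying table.
from typing import Dict, List

def holds(relation: List[Dict[str, str]], lhs: str, rhs: str) -> bool:
    """Check whether the functional dependency lhs -> rhs holds in the relation."""
    seen: Dict[str, str] = {}
    for row in relation:
        if row[lhs] in seen and seen[row[lhs]] != row[rhs]:
            return False
        seen[row[lhs]] = row[rhs]
    return True

table = [
    {"zip": "04251", "city": "Leipzig"},
    {"zip": "04251", "city": "Leipzig"},
    {"zip": "99999", "city": "Erfurt"},
    {"zip": "99999", "city": "Weimar"},   # violates zip -> city in the full data
]
sample = table[:3]
print(holds(sample, "zip", "city"), holds(table, "zip", "city"))  # True False
```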
Beyond algorithmic correctness, evaluating learning systems is a widely open problem. [Sch91] is a prototypical publication in the area of validation of learning systems. On the one hand, this paper provides a substantial criticism of the state of the art: With a few exceptions systems have been supported only by anecdotal evidence. Researchers have simply reported the results of tests on small sets of selected problems. Even more explicitly, the author criticises: When a system is tested by a finite set of problems selected by its author, we get no information about its likely performance on new problems arising in any other way. Reported successes can always be attributed to a researcher's ability to find problems for which a system is suited rather than to any power inherent in its approach. Moreover, anecdotal evidence does nothing to distinguish a proposed system from a straw man designed to give correct answers for - and only for - the test set. On the other hand, this paper does not provide any non-trivial contribution to the area. The only result is the proposal to carefully combine artificially generated test data with test data drawn from practical applications. The crux is that an analysis of the peculiarities of learning systems is missing.

According to the insight (cf. [Pop34], as mentioned above) that knowledge discovery is (mainly inductive) hypothesis formation, and that hypotheses may be subject to refutation but can rarely be proved correct, there is no hope to validate a learning system by inspection of individual input/output pairs of the system's behaviour. Learning takes place in an open loop of man-machine interactions the number of which cannot be estimated in advance. Thus, validation has to face this limiting character of the target process under investigation. In chapter 5 of [JAK97], the fundamental formalisms underlying the authors' Turing test approach to validation have been generalized towards dealing with a priori unbounded sequences of man-machine interactions. One of the essential insights is that a system under inspection will lead to a sequence of validation statements rather than to a single declaration.

In formal terms, learning systems get fed in sequences of information units {i_n}_{n ∈ IN} and respond with related hypothetical outputs {o_n}_{n ∈ IN}. Those sequences, in practice, will always be finite. However, as a sequence's length is usually unknown in advance, the notation using infinite sequences provides a reasonable and convenient simplification [1]. Individual inputs i_n may be understood as test data. A set of subsequent inputs i_1, ..., i_k is a test data set abbreviated by i[k]. Experimentation takes place when a learning system is fed with test data sets. The related outputs are recorded and collected in some file similarly denoted by o[k]. The learning system's behaviour may be evaluated. Any evaluation of an individual experimentation result (i[k], o_k) might be inappropriate for validation of the overall learning behaviour. The crux is that the system's (in)validity is only established over sequences of system-environment interactions. Learning appears as a limiting process. In contrast, validation has to be based exclusively on local information. During learning, there are always back-strokes possible.

[1] As Gurevich says (cf. [Gur92]), the infinite is a good approximation of the finite.
The occurrence of an individual back-stroke does usually not imply the invalidity of the system, because it might not affect the system's overall learning achievements. Conceptually, there are at least two ways out. First, one may restrict validation investigations to exclusively checking local properties. For example, it might be considered an essential feature of the validity of a learning system if every generated output o_n reflects the input knowledge i[n] upon which it has been built. Second, one may try to reflect the limiting behaviour of learning by limiting concepts of validity. For instance, validation may be understood as some process of incrementally moving through some space of potential validity assessments such that the validation of some truly valid system results in some monotonically improving sequence of validity assessments. Faced with sequences of validity statements, each of them might be understood as expressing some confidence in the system's overall validity.

There are several problems with the local approach mentioned. It is well-known in learning theory (cf. [AS83] resp. [AS92], for a general introduction and survey, and [JB81], for an investigation of several local properties) that local properties are only loosely coupled to the overall learning success, i.e. most seemingly natural properties of learning engines do not imply the ultimate success in learning and, vice versa, success in learning usually does not require any of the seemingly natural properties. We confine ourselves to a few remarks on consistency, i.e. the requirement that any hypothetical output o_n reflects at least the input information i[n] it has been built upon. Consistency is neither sufficient nor necessary for successful learning. Thus, checking consistency does not reveal very much about the validity of a learning system under inspection. Even worse, giving up consistency might be an important step towards improving a learning system's efficiency (cf. [Wie92]). Validation of learning systems which is based on local investigations needs some inheritance of pliability properties. The search for "elements of resilience" in [Fos93] is quite similar in spirit.

Second, one may alternatively accept sequences of validity assessments as the result of intelligent systems validation. As this seems to exceed former validation concepts considerably, we put some more emphasis on a discussion of conceptual details. It might be insufficient to evaluate individual experiments (i[k], o_k) only. Hypotheses have to be seen in the context of some experimentation history. If the individual experiments in their growing historical context are denoted by (i[k], o[k]) and evaluated by some e_k = eval(i[k], o[k]), the overall (possibly terminating) sequence {e_n}_{n ∈ IN} is establishing the result of validation.
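The following sketch renders this experimentation loop in Python: after every new input, the growing histories i[k] and o[k] are evaluated to e_k. The mode-guessing learner and the crude consistency measure are toy stand-ins chosen purely for illustration; they are not the paper's formalism and not any particular data mining procedure.

```python
# Sketch of an experimentation loop producing an evaluation sequence e_1, e_2, ...
from collections import Counter
from typing import Callable, List

def validation_run(learner: Callable[[List[int]], object],
                   evaluate: Callable[[List[int], List[object]], float],
                   test_data: List[int]) -> List[float]:
    """Return the evaluation sequence e_1, ..., e_m for one finite experiment."""
    i_hist: List[int] = []        # i[k] = (i_1, ..., i_k), the inputs seen so far
    o_hist: List[object] = []     # o[k] = (o_1, ..., o_k), the hypotheses produced
    e_seq: List[float] = []
    for i_n in test_data:
        i_hist.append(i_n)
        o_hist.append(learner(i_hist))            # o_k is built from i[k]
        e_seq.append(evaluate(i_hist, o_hist))    # e_k = eval(i[k], o[k])
    return e_seq

# Toy instance: the learner guesses the most frequent value seen so far, and the
# evaluation measures which fraction of the history the current guess explains.
mode_learner = lambda xs: Counter(xs).most_common(1)[0][0]
consistency = lambda i_hist, o_hist: sum(x == o_hist[-1] for x in i_hist) / len(i_hist)
print(validation_run(mode_learner, consistency, [1, 1, 2, 1, 1, 1]))
```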
The reader may easily recognize that there is no simple and naturally distinguished way of discriminating "valid" systems from those being "invalid" by classifying (initial parts of) evaluation sequences. Similarly, there is no obvious approach to interpret those sequences by values of fuzziness or uncertainty. Roughly speaking, the information inherent in some validation history is naturally structured and cannot be easily reduced to numeric degrees of (un)certainty. Investigations into the nature of interactive systems validation require concepts [2] based on sequences of structured objects.

[2] Just for illustration, one might consider several combinations of properties of sequence elements (like consistency w.r.t. the underlying information), of properties of subsequences (like local stability), and of combined properties (like semantic monotonicity, e.g.).

Evaluation sequences are investigated in the area of reflective inductive inference (cf. [Jan94], [Jan95], and [Gri96]). These investigations focus on the power and limitations of learning systems which enjoy the particular ability of self-validation. Consequently, the results are of an immediate relevance to the present discussion. The first results on reflective inductive inference (cf. [Gri96] and the other references mentioned above) exhibit the narrow limits of self-validation. Furthermore, these results illustrate the importance of different criteria of convergence and of monotonicity.

Last but not least, when dealing with the validation of core procedures for data mining, one should be aware of the potential complexity of those procedures. Data mining approaches may incorporate certain heuristics such that the validation of an individual data mining procedure might involve the validation of some parameter decision, of certain heuristics in use, and so on. [SSS91] provides a very illustrative example in this respect. The authors investigate a very particular data mining problem and offer a solution which has been prototypically implemented and tested. The problem under investigation is query optimization. For a given database query, one searches for another query which is semantically equivalent, but operationally more efficient. Queries are learnt via the extraction of database usage patterns. The difficulty is that there is usually an enormous number of syntactically different queries which are semantically equivalent over some given data base. (Note that semantic equivalence may evolve and may disappear when a data base is changing over time.) Therefore, the query generation procedure invokes certain heuristics. Validation of the data mining procedure depends directly on validating these heuristics.
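Returning to the sequence-based view, a finite evaluation sequence can at best be screened for criteria of the kind mentioned above, such as monotonicity and a finite surrogate for convergence. The two checks below are a minimal sketch of such criteria; the window size, the tolerance, and the function names are arbitrary illustrative choices, not prescriptions from the literature cited here.

```python
# Sketch: interpreting a finite evaluation sequence e_1, ..., e_m.
from typing import Sequence

def is_monotone(e_seq: Sequence[float]) -> bool:
    """Evaluations never get worse over the experimentation history."""
    return all(a <= b for a, b in zip(e_seq, e_seq[1:]))

def has_stabilised(e_seq: Sequence[float], window: int = 3, tol: float = 0.05) -> bool:
    """The last `window` evaluations vary by at most `tol`: a finite stand-in
    for convergence of the potentially infinite sequence."""
    tail = list(e_seq[-window:])
    return len(tail) == window and max(tail) - min(tail) <= tol

e_seq = [0.50, 0.66, 0.75, 0.80, 0.83, 0.84]
print(is_monotone(e_seq), has_stabilised(e_seq))  # True True
```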
4 Scenarios for the Interactive Validation of Systems for Knowledge Discovery

Whereas the preceding chapter dealt with the core learning procedures of data mining, the present one adopts a bird's-eye view of systems for knowledge discovery in data bases as a whole. Such a system is called valid if it is doing its job, i.e. if it allows for the extraction of previously unknown and potentially useful knowledge from data. There is, obviously, no way to express this goal in a thoroughly formal manner. First, we briefly discuss a few issues related to the process and architectural perspective on systems for knowledge discovery in data bases (cf. Figure 1). As the functionality of those systems may be reasonably decomposed, and as those systems are usually structured in a sophisticated manner, validation considerations may refer to subfunctions and modules, thus combining the ultimate result of system validation from certain subresults. Second, we focus on the problem of getting human beings involved in system validation. The development of appropriate scenarios leads to some further refinement of the sequence-based concepts introduced in the preceding chapter.

Validity of the core data mining procedures does not suffice to establish the validity of a given system for knowledge discovery in data bases. There do exist further requirements of two types. First, the other system components for raw data selection, data preprocessing, and knowledge explication need to be valid, as well. Second, the feedback mechanisms of the overall system which allow for several loops during the process of knowledge discovery need to be valid. Knowledge discovery in data bases inevitably involves data base management issues (cf. [IM96]). Thus, the validity of the overall system includes the validity of some essential data base management functionalities implemented. Data mining presupposes the selection of data from a usually huge collection of raw data. This focuses the subsequent data mining process. Because of the fundamental role of data selection, high quality tool support is necessary. The validation of those tools involves, besides data base management questions, certain aspects of human-computer interface validation [3] and of cognitive psychology.

[3] To get an impression of the complexity of this task, the user should consult [Ter94]. This paper presents one very particular approach, where 17 "checking & measuring items" are listed for the validation of man-machine interfaces alone.

Next, target data selected for data mining usually require some so-called data cleaning. Data have to be cleaned from noise and, furthermore, have to be prepared for the data mining process itself. Sometimes, extra syntactical transformations are necessary (in [FPSS96], therefore, the authors carefully distinguish "preprocessed data" from "transformed data"). At the end of the knowledge discovery chain, the system's results have to be made available to the user. Patterns found and represented in an internal formalism are rarely understandable to the domain expert. Thus, a final step is required which interprets or evaluates the technical results to make them accessible. Visualization is usually playing a rather important role. Just to mention one aspect, the validation of visualization techniques and tools is a highly exciting problem which is widely open at the moment. It deeply involves cognitive psychology. One of the difficulties is to understand what the essential semantic contents of some data might be and, even more difficult, how to visualize such a content to make it appropriately accessible to human beings. Even the user's cultural environment may be of some relevance, as we know from other investigations in cognition (cf. [Lak87] and the references therein).

All the aspects discussed so far refer to the knowledge discovery process as a linear chain of activities. This is far too simple to meet the necessities of extracting previously unknown and potentially useful knowledge in the wild. Therefore, another issue of a knowledge discovery system's validation is the possibility to interact with the system for revising, correcting resp. refining discovery results. The overall system might be used in an open loop. For illustration, assume a user who asked for clustering a given data base. When clusters are presented, i.e. represented by some visualization technique to inform the user about the system's current hypothesis, the user might find the system's guess senseless or useless. In response, (s)he might suggest to exclude one possibly misleading attribute from clustering and run the data mining procedure again. Such a revision might be iterated until the user accepts an output as sufficiently new (i.e. previously unknown) and potentially useful. The user might also wish to repeat some run with the same preprocessed data, but in a randomly changed ordering, to check the result for robustness.
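In code, such a revision loop might look as follows. The one-dimensional clustering routine, the attribute names, and the toy records are stand-ins invented for this sketch; they are not a real data mining procedure and not part of the paper.

```python
# Sketch of the open revision loop: exclude a misleading attribute and rerun,
# then rerun on a reshuffled copy of the preprocessed data to probe robustness.
import random
from typing import Dict, List

def cluster(records: List[Dict[str, float]], attributes: List[str], k: int = 2,
            seed: int = 0) -> List[List[Dict[str, float]]]:
    """Toy clustering: crude 1-D k-means over the mean of the selected attributes."""
    keys = [sum(r[a] for a in attributes) / len(attributes) for r in records]
    centers = random.Random(seed).sample(keys, k)
    for _ in range(10):                       # a few refinement rounds
        buckets: List[List[float]] = [[] for _ in range(k)]
        for key in keys:
            buckets[min(range(k), key=lambda c: abs(key - centers[c]))].append(key)
        centers = [sum(b) / len(b) if b else centers[c] for c, b in enumerate(buckets)]
    groups: List[List[Dict[str, float]]] = [[] for _ in range(k)]
    for key, rec in zip(keys, records):
        groups[min(range(k), key=lambda c: abs(key - centers[c]))].append(rec)
    return groups

def summary(groups: List[List[Dict[str, float]]]) -> List[List[float]]:
    """Report each cluster by the sorted ages of its members."""
    return [sorted(r["age"] for r in g) for g in groups]

data = [{"age": 22, "income": 20, "zip": 4251}, {"age": 25, "income": 22, "zip": 4103},
        {"age": 61, "income": 55, "zip": 4251}, {"age": 58, "income": 60, "zip": 4109}]

# First run: the user judges the 'zip' attribute misleading and excludes it.
first = cluster(data, ["age", "income", "zip"])
revised = cluster(data, ["age", "income"])

# Robustness probe: rerun on the same preprocessed data in a changed ordering.
shuffled = data[:]
random.Random(42).shuffle(shuffled)
rerun = cluster(shuffled, ["age", "income"])
print(summary(revised), summary(rerun))
```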
To sum up, besides checking all the components of a given knowledge discovery system for validity (which may include those psychologically ambitious and exciting tasks like validating visualization techniques and procedures), the overall system validation will substantially require the validation of the system in loops of man-machine interactions according to all the possibilities sketched in Figure 1 above.

Interactive scenarios of system validation require human beings to get engaged in the validation process. The selection of appropriate personnel is frequently underestimated [4]. Among the difficulties to be taken into account, there is the distinction between experts and novices (cf. [FH92]). For some purposes, novices are desirable. The difficulty is that they lose their status during longer series of experiments. Other problems are more social, or even political, in character (cf. [Sch93], for a very illustrative discussion which exhibits that it might sometimes be completely impossible to get testers of the desired average level, i.e. social reasons are undermining the possibilities to reach the ultimate goal of system validation). We refrain from a further investigation of the problem of how to select experts (resp. novices) appropriately and turn to a discussion of validation scenarios.

First steps towards a systematic development of interrogation scenarios for intelligent systems validation have been published in [AKG96], [KP96], [JAK97], [HJK97b], [JKA97b], [AG97], [HJK97c], [KPG97], [JKA97a], [HJK97a], [KG97a], [KG97b], and [KJAP97]. In unison, these papers are propagating some methodology developed in analogy to the well-known Turing test (cf. [Tur50]). Although interrogation techniques like the Turing test [5] are not appropriate to reveal a system's "intelligence" (cf. [Hal87]), they are quite appropriate for validation. The key suggestion is to validate a given system for knowledge discovery in data bases - besides all attempts to validate or verify individual modules as usual - within a potentially unlimited validation dialogue which appears very much like an interrogation of the system by a team of experts and/or novices. In a systematic validation dialogue, a system under inspection is subject to a systematic interrogation. This requires a choice of interrogators, regulations for the interrogation, and some scientifically motivated approach to process the interrogation results towards some final validity assessment.

[4] Even large and well-known enterprises in the hardware and software business are sometimes extraordinarily careless in this respect (private communication).

[5] The Loebner prize competition annually provides a nice illustration in this respect. In case there will once be a computer system winning this competition, this will not prove the system's intelligence; but it will reveal the system's ability to fool its human interrogators. In its right perspective, this will be a successful system validation.
For rule-based expert systems, publications like [KP96], [KPG97], [KG97a], and [KG97b] provide instances of such an approach to system validation. For interactive systems usually used in open loops of man-machine interactions, such a dialogue is built upon the following basic concepts: the person performing the test, an object under consideration, a measure of experimentation intensity, an experimental run and its result, and the person's evaluation of the particular experiment. In other words, an expert (or even a novice) is performing sequences of experiments as discussed in the chapter before. A team of human interrogators, thus, is iteratively performing a bunch of series of experiments. A quintuple containing the information listed above is called a report. There are several ways to impose partial orderings on reports. The easiest is to compare reports only w.r.t. experimentation intensity. There are further ideas like ordering experts by competence or ordering objects under consideration (in data mining: phenomena to be learnt) by complexity, e.g. A validity assessment is a finite set of reports.

Validation scenarios will lead to sequences of validity assessments. Those sequences are potentially infinite, but a certificate of validity has to be based on finite sequences of assessments, naturally. This requires interpretations of those sequences of validity assessments. As discussed in the chapter before, we are faced with the difficulty of interpreting initial sequences of structured objects. In contrast to the discussion above, every element in such a sequence is remarkably richer in structure. Understanding the information inherent in individual validity assessments and in finite sequences of subsequently growing competence assessments means understanding the power and limitations of interactive system validation. To sum up, the validity of a given system for knowledge discovery in data bases is certified on the basis of validity certificates for system modules and a validation dialogue according to a well-specified interrogation scenario. There is a need to develop those scenarios for systems used in open loops like done before for one-shot systems (cf. [KG97b], e.g.). After those concepts have been developed, studied and tested, there will evolve the social problem of propagating these concepts towards the validation of data mining systems, in particular, and towards the validation of arbitrary intelligent interactive systems, in general.
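A possible rendering of the report and validity assessment notions in code is sketched below. The field names, the intensity-only ordering, and the example reports are illustrative assumptions; the paper itself fixes only that a report is a quintuple of the five items named above and that a validity assessment is a finite set of reports.

```python
# Sketch: reports as quintuples, validity assessments as finite sets of reports,
# and the simplest partial ordering (by experimentation intensity alone).
from dataclasses import dataclass
from typing import FrozenSet

@dataclass(frozen=True)
class Report:
    tester: str          # the person performing the test (expert or novice)
    target: str          # the object under consideration / phenomenon to be learnt
    intensity: int       # a measure of experimentation intensity (e.g. number of runs)
    result: str          # the recorded outcome of the experimental run
    evaluation: float    # the tester's evaluation of this particular experiment

    def dominated_by(self, other: "Report") -> bool:
        """Simplest partial ordering: compare reports w.r.t. intensity only."""
        return self.intensity <= other.intensity

ValidityAssessment = FrozenSet[Report]

a1: ValidityAssessment = frozenset({
    Report("novice-1", "clustering of customer data", 3, "2 clusters", 0.4),
    Report("expert-1", "clustering of customer data", 12, "2 stable clusters", 0.8),
})
a2: ValidityAssessment = a1 | {
    Report("expert-1", "association rules on sales data", 20, "5 rules accepted", 0.9),
}
# A validation scenario yields a finite initial segment of a sequence a1, a2, ...
assessment_sequence = [a1, a2]
print(len(assessment_sequence[-1]))
```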
5 Complex System Validation - A Social and a Marketing Perspective

Our case study of validating systems for knowledge discovery in data bases is taken as a basis for some more practically oriented considerations. Because of the socially highly relevant problem that the inability to adequately evaluate systems may become the limiting factor in our ability to employ systems that our technology and knowledge will allow us to design (cf. [WW93], as cited above), both system developers and customers will become more and more interested in product validation. Complex system validation might become a relevant marketing factor.

In contrast to verification, validation inevitably depends on the co-operation of human experts (and, possibly, carefully chosen novices). Sophisticated validation scenarios do depend essentially on human intervention. This leads to a serious obstacle: as there might be unwelcome findings, experts are not always interested in a system validation being as successful as possible. To overcome this general problem, one needs political, social, and psychological approaches. Consequently, in the long run, progress in complex system validation can only result from a concerted endeavour. Computer science research and development [6] can provide the conceptual basis, the methodologies, algorithms and tools, and prototypical implementations and tests. The final success of the overall endeavour is beyond the limits of computer science. Furthermore, there is the difficulty that sufficiently interesting problems in the area of information and communication technologies are usually undecidable. As a consequence, one can rarely come up with sharp and deterministically true validity assessments. Those interested in validity certificates may be disappointed by what they can get. The public might insist on validity certificates which are impossible, by nature. The best computer science can offer are (i) clearly motivated validation scenarios and, based on these scenarios, (ii) validity assessments which certify a certain system's validity w.r.t. the competence of some distinguished team of experts.

[6] It is worth mentioning here that research and development needs its funding. The approach to wait for the theoretical basis and, then, to invest into its exploration for applicability might fail. Research and development sponsoring agencies need to support the development of the theoretical basis first. One has to be aware of the difficulty that small and medium-size enterprises often don't have the economical power to lay the cornerstone for innovations in science and technology they are interested in and willing to apply.

6 Conclusions

Data mining and knowledge discovery in data bases are not only new buzzwords in modern information and communication technologies; these two phrases indicate an area of both academic research and commercial relevance which will attract much more attention in the near future. In the present paper, knowledge discovery systems have been chosen as a case for studying the problem of system validation. Complex systems, in particular those for solving difficult and not completely specified problems (like knowledge discovery and learning, e.g.), are frequently used in open loops of human-computer interactions. Validation of those systems is a manifold of diverse problems including social questions and phenomena of cognitive psychology. Many of these problems are beyond the author's competence and lie far outside the reach of computer science methodologies.

The validity of a complex interactive system can be determined only within an a priori unbounded process of human-computer interactions. This requires the specification of appropriate validation scenarios. The application of those validation scenarios is inevitably based on human activities. The selection of human experts to become engaged in the system validation process is another issue which has substantial social and psychological dimensions. Any scenario of interactive system validation will usually result in a sequence of validity assessments, where each validity assessment is based on a finite collection of evaluated experiments. A template of formalisms is proposed where validity assessments are finite sets of so-called reports. Every individual report is understood as a quintuple containing information about the human being involved, about the object taken as a basis for experimentation, a measure characterizing the intensity of experimentation, the particular result of running the experiment, and the expert's evaluation of the output. Reports are highly structured and partially ordered, and so are validity assessments. Determining a given system's validity requires some appropriate concepts of convergence for sequences of validity assessments and certain related monotonicity concepts. Approaches of this type are currently under investigation. These formalisms will allow for the interpretation of sequences of validity assessments. The detection of certain monotonicity resp. convergence phenomena may be taken as evidence for validity, to some extent, parameterized by the underlying validation scenario and by the human experts involved.
References
[ABB+91] Andrejs Auzins, Janis Barzdins, Janis Bicevskis, Karlis Cerans, and Audris Kalnins. Automatic construction of test sets: Theoretical approach. In Janis Barzdins and Dines Bjørner, editors, Baltic Computer Science, volume 502 of Lecture Notes in Artificial Intelligence, pages 286-359. Springer-Verlag, 1991.
[AG97] Thomas Abel and Avelino J. Gonzalez. Utilizing criteria to reduce a set of test cases for expert system validation. In Douglas D. Dankel II, editor, FLAIRS-97, Proc. Florida AI Research Symposium, Daytona Beach, FL, USA, May 11-14, 1997, pages 402-406. Florida AI Research Society, 1997.
[AKG96] Thomas Abel, Rainer Knauf, and Avelino J. Gonzalez. Generation of a minimal set of test cases that is functionally equivalent to an exhaustive set, for use in knowledge-based system validation. In John H. Stewman, editor, FLAIRS-96, Proc. Florida AI Research Symposium, Key West, FL, USA, May 20-22, 1996, pages 280-284. Florida AI Research Society, 1996.
[AS83] Dana Angluin and Carl H. Smith. A survey of inductive inference: Theory and methods. Computing Surveys, 15:237-269, 1983.
[AS92] Dana Angluin and Carl H. Smith. Inductive inference. In Stuart C. Shapiro, editor, Encyclopedia of Artificial Intelligence, Second Edition (Volume 1), pages 672-682. John Wiley & Sons, Inc., 1992.
[BKM91] Juris Borzovs, Audris Kalnins, and Inga Medvedis. Automatic construction of test sets: Practical approach. In Janis Barzdins and Dines Bjørner, editors, Baltic Computer Science, volume 502 of Lecture Notes in Artificial Intelligence, pages 360-432. Springer-Verlag, 1991.
[Etz96] Oren Etzioni. The world-wide web: Quagmire or gold mine? Comm. of the ACM, 39(11):65-68, 1996.
[FH92] Mícheál Foley and Anna Hart. Expert-novice differences and knowledge elicitation. In Robert R. Hoffman, editor, The Psychology of Expertise. Cognitive Research of Empirical AI, pages 233-244. Springer-Verlag, 1992.
[Fos93] Harold D. Foster. Resilience theory and system evaluation. In John A. Wise, V. David Hopkin, and Paul Stager, editors, Verification and Validation of Complex Systems: Human Factors Issues, volume 110 of NATO ASI Series, Series F: Computer and Systems Sciences, pages 35-60. Springer-Verlag, 1993.
[FPSM91] William J. Frawley, Gregory Piatetsky-Shapiro, and Christopher J. Matheus. Knowledge discovery in databases: An overview. In Gregory Piatetsky-Shapiro and William J. Frawley, editors, Knowledge Discovery in Databases, pages 1-27. AAAI Press / The MIT Press, 1991.
[FPSS96] Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth. The KDD process for extracting useful knowledge from volumes of data. Comm. of the ACM, 39(11):27-34, 1996.
[GGC96] A. J. Gonzalez, U. G. Gupta, and R. B. Chianese. Performance evaluation of a large diagnostic expert system using a heuristic test case generator. Engineering Applications of Artificial Intelligence, 1(3):275-284, 1996.
[Gri96] Gunter Grieser. Reflecting inductive inference machines and its improvement by therapy. In Setsuo Arikawa and Arun K. Sharma, editors, Proc. 7th International Workshop on Algorithmic Learning Theory (ALT'96), October 23-25, 1996, Sydney, Australia, volume 1160 of LNAI, pages 325-336. Springer-Verlag, 1996.
[Gur92] Yuri Gurevich. Zero-one laws. Bulletin of the EATCS, (46):90-106, 1992.
[Hal87] Mark Halpern. Turing's test and the ideology of artificial intelligence. 1:79-93, 1987.
[Har93] Kelly Harwood. Defining human-centered system issues for verifying and validating air traffic control systems. In John A. Wise, V. David Hopkin, and Paul Stager, editors, Verification and Validation of Complex Systems: Human Factors Issues, volume 110 of NATO ASI Series, Series F: Computer and Systems Sciences, pages 115-129. Springer-Verlag, 1993.
[HJK97a] Jörg Herrmann, Klaus P. Jantke, and Rainer Knauf. Towards cost driven system validation. In IWK-97, 42nd International Scientific Colloquium, Ilmenau University of Technology, 1997.
[HJK97b] Jörg Herrmann, Klaus P. Jantke, and Rainer Knauf. Towards using structural knowledge for system validation. Technical Report Meme-IMP-2/1997, Hokkaido University Sapporo, Meme Media Laboratory, February/March 1997.
[HJK97c] Jörg Herrmann, Klaus P. Jantke, and Rainer Knauf. Using structural knowledge for system validation. In Douglas D. Dankel II, editor, FLAIRS-97, Proc. Florida AI Research Symposium, Daytona Beach, FL, USA, May 11-14, 1997, pages 82-86. Florida AI Research Society, 1997.
[Hop82] V. David Hopkin. Human factors in air traffic control. NATO AGARDograph, No. 275, Paris, 1982.
[IM96] Tomasz Imielinski and Heikki Mannila. A database perspective of knowledge discovery. Comm. of the ACM, 39(11):58-64, 1996.
[JAK97] Klaus P. Jantke, Thomas Abel, and Rainer Knauf. Fundamentals of a TURING test approach to validation. Technical Report Meme-IMP-1/1997, Hokkaido University Sapporo, Meme Media Laboratory, February 1997.
[Jan94] Klaus P. Jantke. Towards reflecting inductive inference machines. GOSLER Report 24/93, HTWK Leipzig (FH), FB Informatik, Mathematik & Naturwissenschaften, September 1994.
[Jan95] Klaus P. Jantke. Reflecting and self-confident inductive inference machines. In Klaus P. Jantke, Takeshi Shinohara, and Thomas Zeugmann, editors, Proc. 6th International Workshop on Algorithmic Learning Theory (ALT'95), October 18-20, 1995, Fukuoka, Japan, volume 997 of LNAI, pages 282-297. Springer-Verlag, 1995.
[JB81] Klaus P. Jantke and Hans-Rainer Beick. Combining postulates of naturalness in inductive inference. EIK, 17(8/9):465-484, 1981.
[JKA97a] Klaus P. Jantke, Rainer Knauf, and Thomas Abel. The TURING test approach to validation. In IJCAI-97, Workshop on Validation, Verification & Refinement of AI Systems and Subsystems, August 1997, Nagoya, Japan, 1997.
[JKA97b] Klaus P. Jantke, Rainer Knauf, and Thomas Abel. A TURING test scenario to estimate an AI system's validity. Technical Report DOI-TR-133, Kyushu University 33, Department of Informatics, Fukuoka 812-81, Japan, March 1997.
[KG97a] Rainer Knauf and Avelino J. Gonzalez. Estimating an AI system's validity by a TURING test. In IWK-97, 42nd International Scientific Colloquium, Ilmenau University of Technology, 1997.
[KG97b] Rainer Knauf and Avelino J. Gonzalez. A TURING test approach to intelligent system validation. In Wolfgang S. Wittig and Gunter Grieser, editors, LIT-97, Proc. 5. Leipziger Informatik-Tage, Leipzig, 25./26. September 1997. Forschungsinstitut für InformationsTechnologien Leipzig e.V., 1997.
[KJAP97] Rainer Knauf, Klaus P. Jantke, Thomas Abel, and Ilka Philippow. Fundamentals of a TURING test approach to validation of AI systems. In IWK-97, 42nd International Scientific Colloquium, Ilmenau University of Technology, 1997.
[KP96] Rainer Knauf and Ilka Philippow. Ein TURING Test zur Validitätsabschätzung von KI-Systemen. In Klaus P. Jantke and Gunter Grieser, editors, LIT-96, Proc. 4. Leipziger Informatik-Tage, Leipzig, 29./30. August 1996, pages 147-152. Forschungsinstitut für InformationsTechnologien Leipzig e.V., 1996.
[KPG97] Rainer Knauf, Ilka Philippow, and Avelino Gonzalez. Towards an assessment of an AI system's validity by a TURING test. In Douglas D. Dankel II, editor, FLAIRS-97, Proc. Florida AI Research Symposium, Daytona Beach, FL, USA, May 11-14, 1997, pages 397-401. Florida AI Research Society, 1997.
[KRSU95] Michael Kleiber, Sabine Ramm, Susanne Sager, and Hans-Joachim Uth. Jahresbericht 1994. Umweltbundesamt Berlin, 1995.
[KRSU96] Michael Kleiber, Sabine Ramm, Susanne Sager, and Hans-Joachim Uth. Jahresbericht 1995. Umweltbundesamt Berlin, 1996.
[Lak87] George Lakoff. Women, Fire, and Dangerous Things. The University of Chicago Press, 1987.
[NN66] Radiation accident at Hammersmith. British Medical Journal, (5507):233, 1966.
[OO93] R. M. O'Keefe and D. E. O'Leary. Expert system verification and validation: A survey and tutorial. Artificial Intelligence Review, 7:3-42, 1993.
[PE95] M. Perkowitz and O. Etzioni. Category translation: Learning to understand information on the Internet. In Chris S. Mellish, editor, 14th International Joint Conference on Artificial Intelligence, IJCAI-95, Montreal, Canada, August 20-25, 1995, pages 930-936. Morgan Kaufmann, 1995.
[PKZI93] Joseph Pitts, Phyllis Kayten, and John Zalenchak III. The national plan for aviation human factors. In John A. Wise, V. David Hopkin, and Paul Stager, editors, Verification and Validation of Complex Systems: Human Factors Issues, volume 110 of NATO ASI Series, Series F: Computer and Systems Sciences, pages 529-540. Springer-Verlag, 1993.
[Pop34] Karl Popper. Logik der Forschung. J.C.B. Mohr, Tübingen, 1934.
[Pop65] Karl Popper. The Logic of Scientific Discovery. Harper & Row, 1965.
[Por64] E. H. Porter. Manpower Development. Harper & Row, 1964.
[Rei96] Wolfgang Reif. Risikofaktor Software. In Klaus P. Jantke and Gunter Grieser, editors, 4. Leipziger Informatik-Tage, 29./30. August 1996, pages 3-10. Forschungsinstitut für InformationsTechnologien Leipzig e.V., 1996.
[Sch91] Cullen Schaffer. On evaluation of domain-independent scientific function-finding systems. In Gregory Piatetsky-Shapiro and William J. Frawley, editors, Knowledge Discovery in Databases, pages 93-104. AAAI Press / The MIT Press, 1991.
[Sch93] Gerhard L. Schaad. Psychological aspects of human factor testing and evaluation of military human-machine systems. In John A. Wise, V. David Hopkin, and Paul Stager, editors, Verification and Validation of Complex Systems: Human Factors Issues, volume 110 of NATO ASI Series, Series F: Computer and Systems Sciences, pages 453-455. Springer-Verlag, 1993.
[SSS91] Michael Siegel, Edward Sciore, and Sharon Salveter. Rule discovery for query optimization. In Gregory Piatetsky-Shapiro and William J. Frawley, editors, Knowledge Discovery in Databases, pages 411-427. AAAI Press / The MIT Press, 1991.
[Sta93] Paul Stager. Validation in complex systems: Behavioral issues. In John A. Wise, V. David Hopkin, and Paul Stager, editors, Verification and Validation of Complex Systems: Human Factors Issues, volume 110 of NATO ASI Series, Series F: Computer and Systems Sciences, pages 99-114. Springer-Verlag, 1993.
[SU94a] Susanne Sager and Hans-Joachim Uth. Bericht: Meldepflichtige Ereignisse 1980-1992. Umweltbundesamt Berlin, 1994.
[SU94b] Susanne Sager and Hans-Joachim Uth. Jahresbericht 1993. Umweltbundesamt Berlin, 1994.
[Ter94] Takao Terano. The JIPDEC checklist-based guideline for expert system evaluation. International Journal of Intelligent Systems, 9:893-925, 1994.
[Tho94] Muffy Thomas. The story of the Therac-25 in LOTOS. High Integrity Systems Journal, 1(1):3-17, 1994.
[Tur50] Alan M. Turing. Computing machinery and intelligence. Mind, LIX(236):433-460, 1950.
[Wie92] Rolf Wiehagen. From inductive inference to algorithmic learning theory. In Shuji Doshita, Koichi Furukawa, Klaus P. Jantke, and Toyoaki Nishida, editors, Proc. 3rd Workshop on Algorithmic Learning Theory (ALT'92), October 20-22, 1992, Tokyo, volume 743 of Lecture Notes in Artificial Intelligence, pages 13-24. Springer-Verlag, 1992.
[WW93] John A. Wise and Mark A. Wise. Basic considerations in verification and validation. In John A. Wise, V. David Hopkin, and Paul Stager, editors, Verification and Validation of Complex Systems: Human Factors Issues, volume 110 of NATO ASI Series, Series F: Computer and Systems Sciences, pages 87-95. Springer-Verlag, 1993.