Guest Editor’s Introduction

Applications of Machine Learning

Alberto Maria Segre, Cornell University

Only recently have researchers begun to apply machine learning to real applications. This series will introduce novel applications of algorithms drawn from relatively new machine-learning paradigms and combinations. Several themes emerge.

Machine-learning research spans almost four decades. Much of the research has been devoted to defining various paradigms, establishing the relationships among them, and elaborating the algorithms that characterize them. Much less effort, relatively speaking, has been devoted to bringing machine learning to bear on real applications. Recently, researchers have focused more on real-world problems.

There are several reasons for this change in emphasis. First, as a research paradigm matures, simplistic proof-of-concept implementations become less acceptable as vehicles for describing results. Real applications validate previous work and motivate continued research in the paradigm. Second, synergy between research and applications can benefit both researchers and practitioners. When real applications are actively sought and tried, new and previously unnoticed problems emerge for the research community to address.

This special series will present research results from the machine-learning community that AI practitioners will find useful and meaningful. We do not claim that machine-learning algorithms have never been successfully applied to real problems. Successful and interesting applications do exist, and they have played a major role in the discipline's evolution. However, most applications-oriented articles discuss relatively straightforward applications of well-understood algorithms drawn from one of the more mature learning paradigms. In contrast, this series will introduce novel applications of algorithms drawn from relatively new machine-learning paradigms or combinations thereof.

Paradigms of learning

Machine-learning research covers widely disparate techniques: Inductive learning methods, clustering systems, neural networks, genetic algorithms, and explanation-based learning systems seem to have little in common. What they do share, of course, is their ability to alter system performance based on experience. Herbert Simon wrote:

Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the same task or tasks drawn from the same population more efficiently and more effectively the next time.1

This characterization is intentionally abstract so as to cover any number of techniques: the parameter adjustment of Samuel's checker player,2 the original work on perceptrons,3 the complex exemplar-guided pattern recognition of case-based reasoning,4 and the statistically inspired Bayesian approach of Autoclass.5 To understand how these different techniques are related, let's subdivide the field into five paradigms:6 supervised concept learning, conceptual clustering, analytic learning, genetic algorithms, and connectionist systems.

Supervised concept learning. This is the most mature machine-learning paradigm. A type of inductive learning, supervised concept learning constructs a concept description in some predefined description language based on a collection, or training set, of examples. Elements of the training set are marked as positive or negative examples; that is, as either members of the target concept or not. The resulting concept description can then be used to predict concept membership of future examples.

Such algorithms differ in several ways. The first is the language for expressing the target concept. Some systems, such as ID3,7,8 use decision trees; others, such as the AQ family,9 use logical formulations; and so on. Different description languages have different representational power: Concepts that are easily expressed in one language might be inexpressible in another. The application domain can also impose constraints on the representation language. For example, in some domains, examples can be adequately described using features with a finite number of possible values (such as color), while other domains might require continuously valued features (such as weight). In the latter case, the concept description language might need to include relational comparisons (such as less-than) to describe the target concept.

A second important difference is the inductive bias applied in constructing a concept description.10,11 For any set of input examples, a representation language can express multiple (perhaps infinitely many) concepts that accurately classify the entire training set. Selecting a description thus depends on the system's inductive bias. In some systems, the bias is implicit in the algorithm. For example, a form of Occam's razor might apply, yielding a preference for the shortest concept description in a given representation language. In other systems, the bias might be explicitly input to the algorithm.

A third difference is whether the algorithm operates incrementally or in batch mode. Traditional inductive algorithms generally operate in batch mode, where all training-set examples are made available to the system at the same time. Once the concept description is constructed, it is used to predict concept membership for future examples. More useful inductive algorithms accept training-set examples incrementally, using the current description for prediction and updating it as inconsistencies are discovered. Incremental algorithms are also more efficient, since they don't have to rederive the entire concept description each time they observe a new example that is inconsistent with the current concept. In the case of ID3, there are also incremental versions of the basic batch-mode algorithm.

Supervised concept learning has inspired most of the formal work on the theoretical foundations of learning (that is, the "probably approximately correct" learning theory). Not surprisingly, it is also the area that has produced the most applications to date. Nevertheless, many important problems remain: dealing effectively with noise in the input examples,12 worrying about concept drift (when the target concept changes over time),13 selecting the appropriate inductive bias,14 handling nondiscrete-valued features, and so on.
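
To make the batch-mode picture concrete, here is a minimal sketch (not drawn from any system in this series) of ID3-style decision-tree induction: discrete-valued features, positive and negative labels, and an information-gain split criterion. The feature names and training examples are hypothetical, and real implementations add pruning, noise handling, and support for continuous features.

```python
# Minimal ID3-style batch induction sketch; data and feature names are hypothetical.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def best_feature(examples, labels, features):
    """Pick the feature whose split maximizes information gain."""
    base = entropy(labels)
    def gain(f):
        remainder = 0.0
        for value in set(ex[f] for ex in examples):
            subset = [lab for ex, lab in zip(examples, labels) if ex[f] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return base - remainder
    return max(features, key=gain)

def induce_tree(examples, labels, features):
    """Return a nested-dict decision tree; leaves are class labels."""
    if len(set(labels)) == 1:          # pure node
        return labels[0]
    if not features:                   # no features left: majority vote
        return Counter(labels).most_common(1)[0][0]
    f = best_feature(examples, labels, features)
    tree = {"feature": f, "branches": {}}
    for value in set(ex[f] for ex in examples):
        idx = [i for i, ex in enumerate(examples) if ex[f] == value]
        tree["branches"][value] = induce_tree(
            [examples[i] for i in idx],
            [labels[i] for i in idx],
            [g for g in features if g != f])
    return tree

def classify(tree, example):
    """Follow branches until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"][example[tree["feature"]]]
    return tree

# Hypothetical training set: is an object a member of the target concept?
examples = [{"color": "red", "shape": "cube"}, {"color": "red", "shape": "sphere"},
            {"color": "blue", "shape": "cube"}, {"color": "blue", "shape": "sphere"}]
labels = ["+", "-", "+", "-"]
tree = induce_tree(examples, labels, ["color", "shape"])
print(classify(tree, {"color": "green", "shape": "cube"}))   # splits on shape, predicts "+"
```

An incremental variant would update the tree as each inconsistent example arrived rather than rebuilding it from the full training set.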

Conceptual clustering. Conceptual clustering systems differ from supervised learning systems in that the training examples are not marked as positive or negative by an outside agent or teacher. These systems must recognize the similarities between examples and group them according to some preestablished notion of similarity.15 Applications for these systems are readily drawn from the same problems usually addressed by traditional statistical clustering systems, with one important difference: Unlike statistical clustering algorithms, where the number of outcome clusters is predetermined, conceptual clustering algorithms determine the most appropriate number of clusters and then allocate examples to those clusters.

In general, conceptual clustering shares many open problems with supervised concept learning (this is hardly surprising given the close relation between them). Like supervised concept learning, conceptual clustering is a relatively mature paradigm that has been applied to real problems. An oft-cited success story for such methods is the discovery of a new categorization of stellar spectra that differed from the generally accepted clustering. The new classification was discovered from spectral data by the Autoclass Bayesian clustering system.5
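
As a rough illustration of clustering in which the number of clusters is not fixed in advance, the sketch below uses a simple threshold-based "leader" procedure. It is a much simpler stand-in for systems such as Autoclass, and the feature vectors and threshold value are hypothetical.

```python
# Threshold-based "leader" clustering sketch: clusters emerge from the data
# rather than being fixed in advance. All data values are hypothetical.
import math

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def leader_cluster(examples, threshold):
    """Assign each example to the nearest existing cluster centroid,
    or start a new cluster if none is within the threshold."""
    clusters = []          # each cluster: {"centroid": [...], "members": [...]}
    for ex in examples:
        best = min(clusters, key=lambda c: distance(c["centroid"], ex), default=None)
        if best is not None and distance(best["centroid"], ex) <= threshold:
            best["members"].append(ex)
            n = len(best["members"])
            best["centroid"] = [  # incremental centroid update
                (m * (n - 1) + x) / n for m, x in zip(best["centroid"], ex)]
        else:
            clusters.append({"centroid": list(ex), "members": [ex]})
    return clusters

data = [(0.1, 0.2), (0.15, 0.22), (5.0, 5.1), (5.2, 4.9), (9.8, 0.1)]
for i, c in enumerate(leader_cluster(data, threshold=1.0)):
    print(f"cluster {i}: {c['members']}")   # three clusters emerge from the data
```

A conceptual clustering system would additionally produce an intensional description of each cluster, not just a list of its members.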

Analytic learning. Analytic learning is more recent. A chief example of this paradigm is explanation-based learning (EBL) algorithms,16,17 which are intended to improve the efficiency of a problem-solving system. While they generally do not change the problems that are in principle solvable by the problem solver (that is, the problem solver's deductive closure), they do bias the problem solver's search space. For this reason, EBL has sometimes been described as speed-up learning.18 Naturally, given unlimited resources, a problem solver would eventually find a solution to any problem within its deductive closure; thus, EBL only makes sense when used to alter the future performance of a resource-limited problem solver.

For some EBL systems, this bias takes the form of acquired problem-space macro-operators,19 which alter the search space by compressing generalizations of previously useful solutions into more efficiently applicable idioms. Essentially, EBL integrates redundant problem-space operators with existing operators to bias the exploration of the search space. Acquired macro-operators can lead to quick solutions, but in other circumstances they can delay the discovery of a goal.

Other EBL systems represent acquired bias as explicit search-control heuristics for existing problem-space operators. These heuristics typically alter the ordering of alternative choices by promoting heuristically more promising operators so that they are tried first. Some heuristics reject certain operators outright, while others select a particular operator as especially suitable to the current situation (to the detriment of all other operators). As in the macro-operator systems, while heuristics should contribute to a quicker solution, the time spent evaluating these heuristics can slow down the search.

Several problems within this paradigm remain to be addressed. Speeding up real applications requires controlling performance degradation. No doubt, this problem can be alleviated or avoided altogether through clever indexing techniques coupled with heuristics for managing learned information in some semiprincipled fashion. Perhaps the biggest remaining problem is that, unlike inductive learning systems, EBL systems are domain-knowledge intensive. Thus most EBL systems require complete and correct problem-space descriptions (or domain theories). Recently, the analytic-learning community has begun to address the problem of revising inaccurate or incomplete domain theories on the basis of classified examples.20,21 This involves repairing inaccuracies in the domain knowledge that are exposed when examples are handled incorrectly by the original domain theory. Thus domain theory revision is a hybrid problem that shares elements with incremental, supervised, inductive learning problems. Starting from an initial theory (a concept description) that might contain some errors, we patch the theory to account for training examples that were misclassified by the current theory.

Given the relative youth of this paradigm, it is not surprising that most analytic learning systems are either proof-of-concept systems or research vehicles to study the performance characteristics of different learning algorithms. Direct applications of this technology to real problems are just beginning to emerge.
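
The macro-operator idea can be illustrated with a toy problem space. The sketch below (not based on any cited system) solves one search problem, compresses the successful operator sequence into a single macro-operator, and adds it to the operator set so that a later, similar problem can be solved in fewer steps. The integer-state domain and operator names are hypothetical, and a real EBL system would generalize the explanation rather than cache a literal operator sequence.

```python
# Toy macro-operator acquisition sketch; domain and operator names are hypothetical.
from collections import deque

def search(start, goal, operators, limit=10000):
    """Breadth-first search; returns the list of operator names applied, or None."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier and limit > 0:
        state, path = frontier.popleft()
        limit -= 1
        if state == goal:
            return path
        for name, op in operators.items():
            nxt = op(state)
            if nxt not in seen and nxt <= goal:   # prune overshoots in this toy domain
                seen.add(nxt)
                frontier.append((nxt, path + [name]))
    return None

def make_macro(operators, path):
    """Compose a solution path into one reusable macro-operator."""
    def macro(state):
        for name in path:
            state = operators[name](state)
        return state
    return macro

operators = {"inc": lambda x: x + 1, "double": lambda x: x * 2}

plan = search(2, 10, operators)
print("first solution:", plan)            # e.g. ['double', 'inc', 'double']

# Acquire the macro and add it to the problem space: a later, similar problem
# can now be solved in a single step instead of re-searching the space.
operators["macro-2-to-10"] = make_macro(dict(operators), plan)
print("with macro:", search(2, 10, operators))   # typically ['macro-2-to-10']
```

As the text notes, the same cached macro that shortcuts one problem enlarges the branching factor for every other problem, which is exactly the utility trade-off described above.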

Genetic algorithms. These adaptive search systems are inspired by the Darwinian notion of natural selection. First introduced by Holland,22 these algorithms are ideally suited to solving combinatorial optimization problems, since they efficiently search solution spaces for quasi-optimal solutions. For example, in its simplest form, a genetic algorithm might encode a solution to an optimization problem as a bit vector. By applying mutation operators to the best-performing members of a random pool of solutions, the algorithm essentially performs a parallel hill-climbing search for high-quality solutions. After multiple generations, the pool should contain bit vectors representing near-optimal solutions to the initial problem.

Genetic algorithms can be incorporated in a performance system in a variety of ways. Perhaps the simplest is to construct a system (for example, a classification system) in which a genetic algorithm adjusts the system's parameters. The system's overall performance would improve as the parameters attain their quasi-optimal settings. A similar architecture might also be used for a problem-solving system.

Several critical factors affect the success of this approach. There must be both an adequate representation of the solution space and an effective set of genetic operators for that representation. Typical applications encode system parameters as bit vectors and rely on biologically inspired mutation and crossover operators. By encoding nonbinary parameters as consecutive bits in the bit vector, crossover operations can generate new bit vectors that differ in a single parameter. Thus, because of this representational locality, the search for a quasi-optimal solution proceeds by tweaking individual parameters of the performance system. Mutation operations can randomize the search, which helps to avoid coming to rest in local minima of the search space.

The development of genetic algorithms has followed a path largely independent of the mainstream machine-learning community, spawning a specialized conference, the International Conference on Genetic Algorithms. Unlike some of the other learning paradigms, work in this area has been largely application driven.22,23
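
A minimal sketch of the kind of bit-vector genetic algorithm described above follows: a random pool, fitness-based selection of the best-performing members, one-point crossover, and low-probability mutation. The fitness function (counting 1 bits, standing in for a measured system performance) and all parameter settings are hypothetical.

```python
# Minimal bit-vector genetic algorithm sketch; fitness and parameters are hypothetical.
import random

GENOME_LEN, POOL_SIZE, GENERATIONS = 20, 30, 40
MUTATION_RATE = 0.02

def fitness(bits):
    """Toy objective: maximize the number of 1 bits."""
    return sum(bits)

def crossover(a, b):
    """One-point crossover producing a single offspring."""
    point = random.randrange(1, GENOME_LEN)
    return a[:point] + b[point:]

def mutate(bits):
    """Flip each bit independently with a small probability."""
    return [1 - bit if random.random() < MUTATION_RATE else bit for bit in bits]

def evolve():
    pool = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POOL_SIZE)]
    for _ in range(GENERATIONS):
        pool.sort(key=fitness, reverse=True)
        parents = pool[: POOL_SIZE // 2]              # keep the best-performing members
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(POOL_SIZE - len(parents))]
        pool = parents + children
    return max(pool, key=fitness)

random.seed(0)
best = evolve()
print(best, fitness(best))   # after several generations the pool holds near-optimal vectors
```

In a real parameter-tuning application, the bit vector would decode into the performance system's parameters, and fitness would be measured by running that system.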

Connectionist learning. The groundbreaking work on perceptrons3 in the late 1950s represents some of the earliest work on learning systems. After a hiatus of some 25 years, neurally inspired, fine-grained, massively parallel systems are once more attracting attention. Learning is an integral part of any neurally inspired system; indeed, the development of the backpropagation learning algorithm has largely spurred the recent activity in this area.24 Unlike the early perceptron work, this algorithm supports the training of networks with internal layers of units separating input and output units. Such networks avoid many of the pitfalls of earlier systems.25

As with genetic algorithms, much of the connectionist work is performed within a specialized community. Nevertheless, the basic problem is exactly the same as that addressed by supervised concept learning. Thus some researchers have evaluated the strengths and weaknesses of connectionist learning schemes and compared them with supervised concept-learning systems.26,27
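
As an illustration (not from the article) of training a network with an internal layer of units, the following sketch applies backpropagation to the XOR task, which a single-layer perceptron cannot learn. The layer sizes, learning rate, and iteration count are arbitrary choices, and convergence depends on the random initialization.

```python
# Backpropagation sketch for a 2-3-1 sigmoid network on XOR; settings are illustrative.
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(1)
n_in, n_hidden = 2, 3
# each weight vector carries a trailing bias term
w_hidden = [[random.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
w_out = [random.uniform(-1, 1) for _ in range(n_hidden + 1)]
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]   # XOR: not linearly separable
rate = 0.5

def forward(x):
    h = [sigmoid(sum(w[i] * x[i] for i in range(n_in)) + w[-1]) for w in w_hidden]
    y = sigmoid(sum(w_out[j] * h[j] for j in range(n_hidden)) + w_out[-1])
    return h, y

for _ in range(30000):
    x, target = random.choice(data)
    h, y = forward(x)
    d_out = (target - y) * y * (1 - y)                         # output-unit error term
    d_hid = [d_out * w_out[j] * h[j] * (1 - h[j]) for j in range(n_hidden)]
    for j in range(n_hidden):                                  # update output weights
        w_out[j] += rate * d_out * h[j]
    w_out[-1] += rate * d_out
    for j in range(n_hidden):                                  # propagate error to hidden layer
        for i in range(n_in):
            w_hidden[j][i] += rate * d_hid[j] * x[i]
        w_hidden[j][-1] += rate * d_hid[j]

for x, t in data:
    print(x, t, round(forward(x)[1], 2))   # outputs should approach the XOR targets
```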

On this special series

There are several recurring themes among the articles in this collection. First, some of the work blends multiple learning paradigms, using techniques from one paradigm to mask or correct the problems of other paradigms. For example, to facilitate engineering design, Sudhakar Yerramareddy et al. incorporate neural networks, traditional statistical-regression techniques, and inductive-learning algorithms such as ID3 and PLS. Their Adaptive and Interactive Modeling System treats both model creation (using multiple techniques) and use (that is, optimization) within the design process to provide decision support across the entire engineering design cycle. Their article appears on p. 52 of this issue.

In a similar vein, Riyaz Sikora merges a genetic algorithm and a probabilistic inductive-learning system, and evaluates the resulting hybrid system in a chemical process-control domain (he also mentions financial applications). The hybrid system not only has slightly better prediction accuracy in a noisy domain, but also produces more concise encodings of the final concept. This qualitative aspect of learned knowledge might interest practitioners even more than researchers. His article is on p. 35.

A second theme is that applications of learning algorithms are now drawn from more diverse learning paradigms. Keith Levi et al. describe one of the first applications of an analytic learning technique beyond the proof-of-concept stage. While the approach has yet to be scaled up and deployed, their results are encouraging to practitioners and researchers alike. The article appears on p. 44.

A third theme concerns the integration of learning with large database systems. Jeffrey Schlimmer describes an application that uses traditional inductive learning methods to check database consistency. Instead of using the learning algorithm to discover or characterize relations in data, the focus here is on providing a smart sanity-check mechanism in the background that uses the inherent regularity in data to identify inconsistent new entries. In a slightly different vein, Lawrence Hunter and David States describe an application of unsupervised concept learning in large biological databases. Searching for regularities, they compare the Bayesian classification method of Autoclass with traditional statistical clustering techniques. These two articles will appear in future issues.

Further reading

The premier source of information on machine learning is the journal Machine Learning, which first appeared in 1986. Several edited collections published by Morgan Kaufmann also provide excellent overviews of the field: Readings in Machine Learning (J. Shavlik and T. Dietterich, eds., 1990); Machine Learning: An AI Approach, Vols. 1 and 2 (R.S. Michalski, J.G. Carbonell, and T.M. Mitchell, eds., 1983 and 1986, respectively); and Machine Learning: An AI Approach, Vol. 3 (Y. Kodratoff and R.S. Michalski, eds., 1990). A few textbooks also address particular paradigms; for example, Computer Systems that Learn by S.M. Weiss and C.A. Kulikowski (Morgan Kaufmann, 1990). More recent research results can be found in the annual proceedings of the International Conference on Machine Learning (Morgan Kaufmann) or in the more mainstream AI conferences (the International Joint Conference on Artificial Intelligence, the National Conference on Artificial Intelligence, and the European Conference on Artificial Intelligence). In addition, there are more specialized conferences and workshops, such as the International Conference on Genetic Algorithms and the Conference on Learning Theory.

References

1. H.A. Simon, "Why Should Machines Learn?" in Machine Learning: An AI Approach, Vol. 1, R.S. Michalski, J.G. Carbonell, and T.M. Mitchell, eds., Morgan Kaufmann, San Mateo, Calif., 1983, pp. 25-38.

2. A.L. Samuel, "Some Studies in Machine Learning Using the Game of Checkers," in Readings in Machine Learning, J. Shavlik and T. Dietterich, eds., Morgan Kaufmann, San Mateo, Calif., 1990, pp. 535-554.

3. F. Rosenblatt, "The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain," in Readings in Machine Learning, J. Shavlik and T. Dietterich, eds., Morgan Kaufmann, San Mateo, Calif., 1990, pp. 138-149.

4. K.J. Hammond, "Chef," in Readings in Machine Learning, J. Shavlik and T. Dietterich, eds., Morgan Kaufmann, San Mateo, Calif., 1990, pp. 686-709.

5. P. Cheeseman et al., "Autoclass: A Bayesian Classification System," in Readings in Machine Learning, J. Shavlik and T. Dietterich, eds., Morgan Kaufmann, San Mateo, Calif., 1990, pp. 296-306.

6. J.G. Carbonell, "Introduction: Paradigms for Machine Learning," Artificial Intelligence, Vol. 40, Nos. 1-3, Sept. 1989, pp. 1-10.

7. J.R. Quinlan, "Induction of Decision Trees," in Readings in Machine Learning, J. Shavlik and T. Dietterich, eds., Morgan Kaufmann, San Mateo, Calif., 1990, pp. 57-69.

8. J.R. Quinlan, "Learning Efficient Classification Procedures and Their Application to Chess Endgames," in Machine Learning: An AI Approach, Vol. 1, R.S. Michalski, J.G. Carbonell, and T.M. Mitchell, eds., Morgan Kaufmann, San Mateo, Calif., 1983, pp. 463-482.

9. R.S. Michalski, "A Theory and Methodology of Inductive Learning," in Readings in Machine Learning, J. Shavlik and T. Dietterich, eds., Morgan Kaufmann, San Mateo, Calif., 1990, pp. 70-95.

10. P.E. Utgoff, "Shift of Bias for Inductive Concept Learning," in Machine Learning: An AI Approach, Vol. 2, R.S. Michalski, J.G. Carbonell, and T.M. Mitchell, eds., Morgan Kaufmann, San Mateo, Calif., 1986, pp. 107-148.

11. T.M. Mitchell, "The Need for Biases in Learning Generalizations," in Readings in Machine Learning, J. Shavlik and T. Dietterich, eds., Morgan Kaufmann, San Mateo, Calif., 1990, pp. 184-191.

12. J.R. Quinlan, "Probabilistic Decision Trees," in Machine Learning: An AI Approach, Vol. 3, Y. Kodratoff and R.S. Michalski, eds., Morgan Kaufmann, San Mateo, Calif., 1990, pp. 140-152.

13. J. Schlimmer and R. Granger, Jr., "Beyond Incremental Processing: Tracking Concept Drift," Proc. Nat'l Conf. AI (AAAI '86), MIT Press, Cambridge, Mass., 1986, pp. 502-507.

14. P.E. Utgoff, "Adjusting Bias in Concept Learning," Proc. Eighth Int'l Joint Conf. on AI (IJCAI '83), Morgan Kaufmann, San Mateo, Calif., 1983, pp. 447-449.

15. R.S. Michalski and R.E. Stepp, "Learning from Observation: Conceptual Clustering," in Machine Learning: An AI Approach, Vol. 1, R.S. Michalski, J.G. Carbonell, and T.M. Mitchell, eds., Morgan Kaufmann, San Mateo, Calif., 1983, pp. 331-364.

16. T.M. Mitchell, R.M. Keller, and S.T. Kedar-Cabelli, "Explanation-Based Generalization: A Unifying View," in Readings in Machine Learning, J. Shavlik and T. Dietterich, eds., Morgan Kaufmann, San Mateo, Calif., 1990, pp. 435-451.

17. G. DeJong and R. Mooney, "Explanation-Based Learning: An Alternative View," in Readings in Machine Learning, J. Shavlik and T. Dietterich, eds., Morgan Kaufmann, San Mateo, Calif., 1990, pp. 452-467.

18. T. Dietterich, "Machine Learning," Ann. Rev. Computer Science, Vol. 4, 1990, pp. 255-306.

19. R.E. Fikes, P.E. Hart, and N.J. Nilsson, "Learning and Executing Generalized Robot Plans," in Readings in Planning, J. Allen, J. Hendler, and A. Tate, eds., Morgan Kaufmann, San Mateo, Calif., 1990, pp. 189-206.

20. D. Ourston and R.J. Mooney, "Changing the Rules: A Comprehensive Approach to Theory Refinement," Proc. Nat'l Conf. AI (AAAI '90), MIT Press, Cambridge, Mass., 1990, pp. 815-820.

21. R. Feldman, A. Segre, and M. Koppel, "Incremental Refinement of Approximate Domain Theories," Proc. Eighth Int'l Machine Learning Workshop, Morgan Kaufmann, San Mateo, Calif., 1991, pp. 500-504.

22. L.B. Booker, D.E. Goldberg, and J.H. Holland, "Classifier Systems and Genetic Algorithms," in Readings in Machine Learning, J. Shavlik and T. Dietterich, eds., Morgan Kaufmann, San Mateo, Calif., 1990, pp. 404-428.

23. J. Holland, Adaptation in Natural and Artificial Systems, MIT Press, Cambridge, Mass., 1992.

24. D.E. Rumelhart, G.E. Hinton, and R.J. Williams, "Learning Internal Representations by Error Propagation," in Readings in Machine Learning, J. Shavlik and T. Dietterich, eds., Morgan Kaufmann, San Mateo, Calif., 1990, pp. 115-137.

25. M.L. Minsky and S. Papert, Perceptrons, MIT Press, Cambridge, Mass., 1969.

26. J.W. Shavlik, R.J. Mooney, and G.G. Towell, "Symbolic and Neural Learning Algorithms: An Experimental Comparison," Machine Learning, Vol. 6, No. 2, Mar. 1991, pp. 111-144.

27. R. Mooney et al., "An Experimental Comparison of Symbolic and Connectionist Learning Algorithms," in Readings in Machine Learning, J. Shavlik and T. Dietterich, eds., Morgan Kaufmann, San Mateo, Calif., 1990, pp. 171-176.

Alberto Maria Segre is an assistant professor at Cornell University, where his research involves analytic learning systems for planning. After receiving undergraduate degrees in music theory and computer engineering at the University of Illinois at Urbana-Champaign, Segre spent a year as a Fulbright scholar at the Computer Music Laboratory of the University of Milan. He returned to the University of Illinois to study artificial intelligence and machine learning, and received his PhD in electrical engineering for work on learning apprentice systems for assembly robots. Segre can be reached at the Dept. of Computer Science, Cornell University, Ithaca, NY 14853, or by e-mail to [email protected]
