Book Reviews

Learning Kernel Classifiers: Theory and Algorithms—R. Herbrich (Cambridge, MA: MIT Press, 2002, Series on Adaptive Computation and Machine Learning, pp. 384, ISBN: 0-262-08306-X). Reviewed by C. Angulo

Kernel methods were widely used in the 1990s as a tool in statistical learning theory for extending support vector machines from linear to nonlinear classification problems. Research on kernel-based algorithms grew rapidly owing to their good statistical performance, their suitability for general types of data, and their applicability to any learning algorithm that can be expressed in terms of dot products in a suitable feature space. Several books on the topic have been published since then, the monographs by Schölkopf and Smola [1] and by Shawe-Taylor and Cristianini [2] (preceded by Cristianini and Shawe-Taylor [3]) being the most popular.

Herbrich's book takes a different approach. Because it is based on the author's Ph.D. dissertation, it is weighted toward minor and subjectively selected research topics. This could be considered a drawback when recommending the book as a student textbook, and the book may be difficult to grasp for researchers coming from entirely different fields. On the other hand, by avoiding an encyclopedic treatment, the book is full of original opinions and contains perspectives on basic problems and formulations that cannot be found elsewhere, so it is very valuable as a secondary text for new research students requiring in-depth knowledge. It has a critical spirit similar to that of Vapnik's book [4], although it does not turn out to be as subjective.

Focusing on classification learning, the book covers learning algorithms and learning theory in two parts of three and two chapters, respectively, and concludes with appendices covering some of the technical aspects involved. Chapters 1 and 2 are classical from the machine learning perspective, describing the learning problem and introducing kernel classifiers. Chapter 3 provides a Bayesian perspective on the problem in order to introduce relevance vector machines and the idea of Bayes point machines. The Bayesian perspective is not in line with Vapnik's discourse on statistical learning theory and is not usually covered in other similar books, but it represents an interesting theoretical study of the learning problem that will be very useful for students.

The second part of the book covers learning theory, presenting the usual VC framework and the PAC-Bayesian framework, as well as a very different analysis using the "luckiness framework" proposed by the author. Other results from the author's Ph.D. dissertation are presented as well, such as the PAC-Bayesian margin bound. The appendices include the basic theoretical background, proofs and derivations of the main theorems for each of the two main parts, and a very useful list of pseudocodes for several of the algorithms presented. The source code of these algorithms, implemented in R, is publicly available at http://www.kernel-machines.org/.

The entire book is written in plain language and uses a very rigorous notation style, perhaps with a certain abuse of bold typeface. On the other hand, although it is claimed that important material is emphasized through many examples and remarks, this is not the reader's impression, especially in comparison with the other textbooks cited.

The reviewer is with the Automatic Control Department, Technical University of Catalonia, Vilanova i la Geltrú 08800, Spain (e-mail: cecilio.angulo@upc.edu).
Digital Object Identifier 10.1109/TNN.2008.2008390
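The kernel trick that underlies the whole book is easy to state in code: any mistake-driven linear algorithm whose updates and predictions use only dot products can be rewritten in dual form, with each dot product replaced by a kernel evaluation. The following minimal Python sketch is an illustration, not code from the book (the book's published implementations are in R, at the URL above); it kernelizes the classical perceptron with an RBF kernel.

    import numpy as np

    def rbf_kernel(X1, X2, gamma=1.0):
        """RBF kernel k(x, x') = exp(-gamma * ||x - x'||^2), evaluated pairwise."""
        sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)

    def kernel_perceptron(X, y, gamma=1.0, epochs=20):
        """Dual (kernel) perceptron: the weight vector w = sum_i alpha_i y_i phi(x_i)
        is never formed explicitly; training and prediction use only kernel values."""
        n = len(y)
        alpha = np.zeros(n)
        K = rbf_kernel(X, X, gamma)
        for _ in range(epochs):
            for i in range(n):
                # Decision value f(x_i) = sum_j alpha_j y_j k(x_j, x_i)
                if y[i] * ((alpha * y) @ K[:, i]) <= 0:
                    alpha[i] += 1.0  # mistake-driven update
        return alpha

    def predict(alpha, X_train, y_train, X_test, gamma=1.0):
        return np.sign(rbf_kernel(X_test, X_train, gamma) @ (alpha * y_train))

    # Toy usage: XOR-like data that no linear classifier separates.
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = np.array([-1., 1., 1., -1.])
    alpha = kernel_perceptron(X, y, gamma=2.0)
    print(predict(alpha, X, y, X, gamma=2.0))  # expected: [-1.  1.  1. -1.]

Nothing in the training loop refers to the feature space itself, which is why the same code runs unchanged on any data type for which a kernel can be defined.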
In conclusion, this book is a good reference for scientists and engineers interested in learning about kernel classifiers. It is not very suitable as a primary student textbook, but it is recommended as secondary reading for students requiring in-depth insight into this area.
REFERENCES
[1] B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge, MA: MIT Press, 2001.
[2] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis. New York: Cambridge Univ. Press, 2004.
[3] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. New York: Cambridge Univ. Press, 2000.
[4] V. N. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.
Numerical Solution of Stochastic Differential Equations—P. K. Kloeden and E. Platen (Berlin, Germany: Springer-Verlag, 2008, pp. 636, ISBN: 3-540-54062-8). Reviewed by F. Gianfelici

Stochastic modeling of artificial neural networks (ANNs) has recently attracted increasing attention because of the favorable accuracy it offers over deterministic modeling across a wide spectrum of practical applications and of theoretical work on the foundations of computational intelligence. In recent years, it has become widely accepted that modeling learning capabilities is equivalent to modeling physical systems, and that large classes of ANNs can therefore be represented effectively as stochastic dynamical systems described by stochastic differential equations (SDEs). Unlike other scientific trends, this approach resists the temptation to simplify the mathematical foundations of ANNs to fit the method and retains, instead, the full complexity of the problem. With these considerations in mind, a new generation of learning algorithms with optimal regulation capabilities and strong theoretical foundations can be developed by means of recent results, effective theorems, new stochastic-calculus-based theories, and the numerical algorithms commonly used to solve and regulate SDEs.

More specifically, the close relation between the dynamic behavior of ANNs and SDEs defines a scientific bridge between the two fields, making it possible to exploit the groundbreaking results on SDEs in the ANN context as well. This promising interdisciplinary methodology opens new research directions that, on the one hand, represent the starting point for the development of a new state of the art and, on the other hand, contribute to a movement with ancient roots that will restore the mathematical foundations of neural networks to their rightful position through adoption of the SDE-based viewpoint. Although this modeling has been increasingly used, the use of numerical solutions remains rather limited, owing to the scarcity of practical methods commonly known to engineers and researchers and to the intrinsic complexity of the problem.
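To make the ANN-as-SDE premise concrete (an illustration added here, not an equation taken from the review), a continuous-time additive network with state X_t, weight matrix W, activation function sigma, and a Wiener-process perturbation B_t with diffusion matrix D is exactly an Itô SDE of the general form treated in the book:

\[
  dX_t = a(X_t)\,dt + b(X_t)\,dB_t,
  \qquad\text{e.g.,}\qquad
  dX_t = \bigl(-X_t + W\,\sigma(X_t)\bigr)\,dt + D\,dB_t .
\]

Here the drift a encodes the deterministic network dynamics and the diffusion b the stochastic perturbation; the numerical schemes surveyed below approximate the trajectories and expectations of precisely such systems.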
The reviewer is with the Dipartimento di Elettronica, Intelligenza Artificiale e Telecomunicazioni, Università Politecnica delle Marche, Ancona 60121, Italy (e-mail: [email protected]).
Digital Object Identifier 10.1109/TNN.2008.2008405
In fact, the purpose of this book is to bring to the attention of the scientific community that an effective methodology for the numerical solution of applied mathematical problems exists, together with numerical schemes for particular problems or classes of problems. Moreover, the need for proper SDE methodologies in a numerical context is increasingly pressing, and it provides the motivation and starting point for this excellent book by Kloeden and Platen.

Motivated by the aim of providing an accessible introduction to SDEs and their applications, together with a systematic presentation of the methods available for their numerical solution, the book presents many new results on high-order methods for strong sample-path approximation and for weak functional approximation, including implicit, predictor-corrector, extrapolation, and variance-reduction methods.
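The simplest members of this family of stochastic Taylor schemes can be written down in a few lines. As a minimal sketch (an illustration, not code from the book), the following Python fragment applies the Euler-Maruyama scheme (strong order 0.5) and the Milstein scheme (strong order 1.0) to geometric Brownian motion, whose exact solution is known, so the pathwise (strong) errors can be compared directly.

    import numpy as np

    # Geometric Brownian motion: dX = mu*X dt + sigma*X dW, with known exact solution.
    mu, sigma, x0, T, N = 1.5, 0.4, 1.0, 1.0, 2**10
    dt = T / N
    rng = np.random.default_rng(0)
    dW = rng.normal(0.0, np.sqrt(dt), N)   # Brownian increments along one path

    x_em, x_mil = x0, x0
    for k in range(N):
        # Euler-Maruyama: truncate the stochastic Taylor expansion after the dW term.
        x_em = x_em + mu * x_em * dt + sigma * x_em * dW[k]
        # Milstein: keep the next term, (1/2) b b' (dW^2 - dt), with b(x) = sigma*x.
        x_mil = (x_mil + mu * x_mil * dt + sigma * x_mil * dW[k]
                 + 0.5 * sigma**2 * x_mil * (dW[k]**2 - dt))

    # Exact solution driven by the same Brownian path (W_T is the sum of increments).
    x_exact = x0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * dW.sum())
    print(abs(x_em - x_exact), abs(x_mil - x_exact))  # Milstein is typically closer

Halving dt shrinks the Euler-Maruyama pathwise error roughly by a factor of sqrt(2) and the Milstein error by a factor of 2, which is what strong orders 0.5 and 1.0 mean in practice.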
To better appreciate these contents, let us briefly introduce the six parts of the book. Part 1, entitled "Preliminaries," is divided into two chapters, "Probability and Statistics" and "Probability and Stochastic Processes," which provide the background material on probability, stochastic processes, and statistics. Part 2, entitled "Stochastic Differential Equations," is divided into three chapters, "Ito Stochastic Calculus," "Stochastic Differential Equations," and "Stochastic Taylor Expansions," which introduce stochastic calculus, stochastic differential equations, and stochastic Taylor expansions, respectively. Part 3, entitled "Applications of Stochastic Differential Equations," has two chapters, "Modeling with Stochastic Differential Equations" and "Applications of Stochastic Differential Equations," which survey the application of SDEs in a diversity of disciplines such as control, filtering, stability, and parameter estimation. Part 4, entitled "Time Discrete Approximations," also has two chapters, "Time Discrete Approximation of Deterministic Differential Equations" and "Introduction to Stochastic Time Discrete Approximation," which provide a brief review of time discretization methods for ordinary differential equations and an introduction to such methods for SDEs.

The remaining two parts of the book present different classes of numerical schemes: strong and weak approximations of numerical solutions of SDEs, in Part 5 and Part 6, respectively. Part 5, entitled "Strong Approximations," is divided into four chapters, "Strong Taylor Approximations," "Explicit Strong Approximations," "Implicit Strong Approximations," and "Selected Applications of Strong Approximations." Part 6, entitled "Weak Approximations," has four chapters, "Weak Taylor Approximations," "Explicit and Implicit Weak Approximations," "Variance Reduction Methods," and "Selected Applications of Weak Approximations." In these parts, the schemes, their convergence orders, their stability properties, and various applications are properly described.

Finally, many simulation exercises are included throughout the book to help the reader develop hands-on numerical skills and an intuitive understanding of the basic concepts, the properties, and the implementation issues of the numerical schemes introduced. The simulation exercises often build on earlier ones and reappear later in the text and applications, so the reader is encouraged to work through them systematically.
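Where only expectations such as E[X_T] matter, the weak approximations of Part 6 apply, and the variance-reduction techniques of its third chapter reduce the Monte Carlo error of such estimates. A minimal sketch in the same setting as above (again an illustration, not code from the book), combining the weak Euler scheme with antithetic variates:

    import numpy as np

    # Weak Euler scheme for dX = mu*X dt + sigma*X dW, estimating E[X_T].
    mu, sigma, x0, T, N, M = 1.5, 0.4, 1.0, 1.0, 2**6, 20000
    dt = T / N
    rng = np.random.default_rng(1)

    def euler_paths(noise):
        """Vectorized Euler steps over all simulated paths; noise has shape (M, N)."""
        x = np.full(noise.shape[0], x0)
        for k in range(N):
            x = x + mu * x * dt + sigma * x * noise[:, k]
        return x

    z = rng.normal(0.0, np.sqrt(dt), (M, N))
    # Antithetic variates: average each path with its mirror image driven by -z;
    # the two estimators are negatively correlated, which lowers the variance.
    estimate = (0.5 * (euler_paths(z) + euler_paths(-z))).mean()
    print(estimate, x0 * np.exp(mu * T))   # weak target: E[X_T] = x0 * exp(mu*T)

Because only the moments of the increments matter for weak convergence, the Gaussian increments may even be replaced by simpler two-point random variables taking the values plus or minus sqrt(dt), one of the simplified schemes discussed in the book.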