Fast Bayesian Support Vector Machine Parameter Tuning with the Nystrom Method

Carl Gold
Computation & Neural Systems, California Institute of Technology, 139-74, Pasadena, CA 91125
E-mail: [email protected]

Peter Sollich
Dept. of Mathematics, King's College London, Strand, London WC2R 2LS, U.K.
E-mail: [email protected]

Abstract— We experiment with speeding up a Bayesian method for tuning the hyperparameters of a Support Vector Machine (SVM) classifier. The Bayesian approach gives the gradients of the evidence as averages over the posterior, which can be approximated using Hybrid Monte Carlo (HMC) simulation. By using the Nystrom approximation to the SVM kernel, our method significantly reduces the dimensionality of the space to be simulated in the HMC. We show that this speeds up the running time of the HMC simulation from $O(n^2)$ (with a large prefactor) to effectively $O(n)$, where $n$ is the number of training samples. We conclude that the Nystrom approximation has an almost insignificant effect on the performance of the algorithm when compared to the full Bayesian method, and gives excellent performance in comparison with other approaches to hyperparameter tuning.
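To illustrate the dimensionality reduction the abstract refers to, the following sketch builds a rank-$m$ Nystrom approximation of an RBF kernel matrix from $m$ randomly chosen landmark points. The helper names (`rbf_kernel`, `nystrom_features`) and all parameter values are illustrative assumptions, not code from the paper; the paper itself applies the approximation inside an HMC simulation.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian RBF kernel matrix between rows of X and rows of Y."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def nystrom_features(X, m, gamma=1.0, rng=None):
    """Rank-m Nystrom feature map: returns Z (n x m) with Z @ Z.T ~ K."""
    rng = np.random.default_rng(rng)
    idx = rng.choice(len(X), size=m, replace=False)   # m landmark points
    K_mm = rbf_kernel(X[idx], X[idx], gamma)          # m x m landmark block
    K_nm = rbf_kernel(X, X[idx], gamma)               # n x m cross-kernel
    # Eigendecompose only the small m x m block, then map all n points
    # into an m-dimensional feature space: Z = K_nm V diag(vals)^(-1/2),
    # so that Z Z^T = K_nm K_mm^{-1} K_mn (the Nystrom approximation).
    vals, vecs = np.linalg.eigh(K_mm)
    vals = np.maximum(vals, 1e-12)                    # numerical safety floor
    return K_nm @ vecs / np.sqrt(vals)

X = np.random.default_rng(0).normal(size=(200, 3))
Z = nystrom_features(X, m=50, gamma=0.5, rng=1)
K_approx = Z @ Z.T
K_full = rbf_kernel(X, X, gamma=0.5)
err = np.abs(K_approx - K_full).max()
```

The key point for the running time is that only the $m \times m$ landmark block is decomposed, and each of the $n$ samples is then represented by an $m$-dimensional vector, so downstream cost scales linearly in $n$ for fixed $m$.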

I. SVM CLASSIFICATION

In the usual way we assume a set $D = \{(x_i, y_i),\ i = 1 \ldots n\}$ of training examples with binary outputs $y_i = \pm 1$. The SVM maps the inputs $x_i$ to vectors $\phi(x_i)$ in some high-dimensional feature space and uses a maximal margin hyperplane, $w \cdot \phi(x) + b = 0$, to separate the training examples. This is equivalent to minimizing $\|w\|^2$ subject to the constraints $y_i (w \cdot \phi(x_i) + b) \geq 1$ (see e.g. [1]). The offset parameter $b$ is treated as incorporated into $w$ in the following, by augmenting feature space vectors to $\phi(x) \to (\phi(x), 1)$. To avoid fitting noise in the training data, 'slack variables' $\xi_i \geq 0$ are introduced to relax the margin constraints to $y_i\, w \cdot \phi(x_i) \geq 1 - \xi_i$, and the term $C \sum_i \xi_i^p$ is then added to the objective function, with a penalty coefficient $C$ and typically $p = 1$ or $2$. This gives the SVM optimization problem: Find $w$ to minimize

$$\frac{1}{2}\|w\|^2 + C \sum_i \xi_i^p, \qquad \xi_i = \max(0,\ 1 - y_i\, w \cdot \phi(x_i)) \quad (1)$$
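As a concrete illustration of the objective in (1), the sketch below evaluates it for a given weight vector, with the offset absorbed into $w$ via an appended constant feature as described above. The function name and the toy data are hypothetical, chosen only to show how the slack variables arise from the margin violations.

```python
import numpy as np

def svm_objective(w, X, y, C, p=1):
    """Primal SVM objective (1): 0.5*||w||^2 + C * sum_i xi_i^p,
    with slack xi_i = max(0, 1 - y_i * w . phi(x_i)).
    Here X holds the (already augmented) feature vectors phi(x_i) as rows,
    so the offset b is absorbed into w, as in the text."""
    margins = y * (X @ w)                  # y_i * w . phi(x_i)
    slack = np.maximum(0.0, 1.0 - margins) # xi_i, nonzero only for violations
    return 0.5 * w @ w + C * np.sum(slack ** p)

# Toy check: two separable points with margins >= 1 incur zero slack,
# so the objective reduces to the regularization term 0.5*||w||^2.
X = np.array([[2.0, 1.0], [-2.0, 1.0]])   # last column = constant 1 (absorbs b)
y = np.array([1.0, -1.0])
w = np.array([1.0, 0.0])
obj = svm_objective(w, X, y, C=1.0)       # margins are both 2, slack is 0
```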
