Efficient Ensemble Learning with Support Vector Machines

Marc Claesen 1,2, Frank De Smet 3, Johan Suykens 1,2, Bart De Moor 1,2

1 KU Leuven, ESAT–STADIUS (STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics), Leuven, Belgium
2 iMinds Medical IT, Leuven, Belgium
3 KU Leuven, Dept. of Public Health, Leuven, Belgium

Large-scale learning with SVM

Key issues in nonlinear SVM
▷ Ω(n²) training complexity (n instances)
▷ difficult to parallelize/distribute
▷ high memory requirements

EnsembleSVM remedies these problems [1]
▷ trains SVM models on (small) subsets (roughly, p base models on subsets of size n/p cost p · Ω((n/p)²) = Ω(n²/p) instead of Ω(n²))
▷ creates an ensemble to improve generalization
▷ current focus on binary classification
▷ embarrassingly parallel
▷ reduced memory use

Development
▷ implemented in C++11, pthread-parallelized
▷ licensed under GNU LGPL v3+ (free software)
▷ portable via GNU Autotools & libtool

Benchmark results

                    covtype (n = 100,000)   ijcnn1 (n = 35,000)   sensit (n = 78,823)   rcv1 (n = 20,242)
                    LIBSVM      ESVM        LIBSVM      ESVM      LIBSVM      ESVM      LIBSVM      ESVM
accuracy (%)            92        89            98        98        86.5      83.8          97        93
training time (s)      728        35           9.5       0.3         591       7.9         148       0.3
memory (MB)           9496      1510           133         8        4187       122        3200        77

Figure: EnsembleSVM results relative to LIBSVM (LIBSVM = 100%) per measure (accuracy, training time, memory use) and per data set (rcv1, covtype, ijcnn1, sensit); EnsembleSVM needs only a few percent of LIBSVM's training time and memory while retaining 96–100% of its accuracy.

Workflow

Figure: the training set is split into subsets Tr(1), ..., Tr(p); a base model SVM(i) is trained on each subset Tr(i); test instances are evaluated by all base models and their predictions are aggregated (Σ) into the final label ŷ.
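This subsample-train-aggregate loop is easy to sketch. The snippet below uses scikit-learn's SVC (which also wraps LIBSVM) purely to illustrate the workflow; it is not the EnsembleSVM implementation, and the synthetic data, the number of base models p, and the SVM parameters are placeholders.

# Illustrative sketch of the subsample-train-aggregate workflow shown above.
# Not the EnsembleSVM implementation; data, p and SVM parameters are made up.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

p = 10                                   # number of base models
rng = np.random.default_rng(0)
models = []
for _ in range(p):
    # Tr(i): a small random subset of the training data
    idx = rng.choice(len(X_train), size=len(X_train) // p, replace=False)
    models.append(SVC(kernel="rbf", C=1.0).fit(X_train[idx], y_train[idx]))

# Sigma: aggregate base-model predictions, here by majority vote
votes = np.stack([m.predict(X_test) for m in models])
y_hat = (votes.mean(axis=0) > 0.5).astype(int)
print("ensemble accuracy:", np.mean(y_hat == y_test))

Majority voting is only one possible aggregation; EnsembleSVM's benchmark results above come from its own base models and aggregation schemes.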

EnsembleSVM functionality

Base models: instance-weighted SVM
▷ support for common & precomputed kernels
▷ LIBSVM is used as solver [2]

\min_{w,\xi,\rho} \; \frac{1}{2} w^T w + \sum_{i=1}^{n} C_i \xi_i
\text{subject to } y_i (w^T \phi(x_i) + \rho) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \ldots, n.

Aggregation of base model predictions
▷ support for common aggregation schemes
▷ flexible framework to prototype novel approaches
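As a rough illustration of what instance weighting means in the formulation above, the sketch below uses scikit-learn's sample_weight argument, which scales the misclassification penalty per instance (effectively C_i = C · w_i). It is not the EnsembleSVM API, and X, y and the weights are made-up toy values.

# Sketch of an instance-weighted SVM in the spirit of the formulation above.
# Not the EnsembleSVM API; toy data and weights chosen only for illustration.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([0, 1, 0, 1])

# e.g. penalize errors on the second and fourth instances more heavily
weights = np.array([1.0, 5.0, 1.0, 5.0])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y, sample_weight=weights)   # per-instance penalties C_i
print(clf.predict([[0.9, 0.2]]))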

Conclusions

EnsembleSVM compared to standard SVM
▷ significantly reduced training complexity
▷ competitive generalization performance

Future work
▷ distributed implementation on Hadoop/Spark
▷ GPGPU implementation using CUDA/OpenCL
▷ native interfaces to Python, R, MATLAB, ...

References

[1] M. Claesen, F. De Smet, J. Suykens, and B. De Moor, "EnsembleSVM: A library for ensemble learning using support vector machines," Journal of Machine Learning Research, vol. 15, pp. 141–145, 2014.
[2] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, pp. 27:1–27:27, 2011.

Acknowledgements

▷ Marc Claesen is funded by IWT grant number 111065.
▷ Research Council KU Leuven: GOA/10/09 MaNet, KUL PFV/10/016 SymBioSys, PhD/Postdoc grants.
▷ Industrial Research Fund (IOF): IOF/HB/13/027 Logic Insulin.
▷ Flemish Government: FWO: project G.0871.12N (Neural circuits), PhD/Postdoc grants; IWT: TBM Logic Insulin (100793), TBM Rectal Cancer (100783), TBM IETA (130256), PhD/Postdoc grants; Hercules Stichting: Hercules 3: PacBio RS; Hercules 1: the C1 single-cell auto prep system, BioMark HD System and IFC controllers (Fluidigm) for single-cell analyses; iMinds Medical Information Technologies SBO 2014; VLK Stichting E. van der Schueren: rectal cancer.
▷ EU: ERC AdG A-DATADRIVE-B.

More information at http://esat.kuleuven.be/stadius/ensemblesvm/
