Hyperparameter tuning in Python using Optunity

Marc Claesen, Jaak Simm, Dusan Popovic, Bart De Moor
KU Leuven, Dept. of Electrical Engineering, ESAT/STADIUS – iMinds Medical IT
Kasteelpark Arenberg 10, box 2446, 3001 Leuven, Belgium
Correspondence address: [email protected]
Abstract: We present Optunity, a Python library that bundles various strategies for solving hyperparameter tuning problems. The library provides general-purpose algorithms, ranging from undirected search methods to adaptive methods based on refinement strategies, heuristics and evolutionary computing. Optunity aspires to become a Swiss army knife for tuning problems of any nature. Its design focuses on code clarity, flexibility and ease of use.

Keywords: hyperparameter tuning, meta-optimization, free software, Python
1 Introduction

Machine learning tasks can be summarized as constructing a model which, in some way, captures relevant aspects of the data used to construct it. The methods used in machine learning vary greatly in approach and application domain (e.g. supervised vs. unsupervised, classification vs. regression, . . . ), but many expose a common issue: hyperparameters which must be optimized. Optimizing hyperparameters is referred to as tuning within the machine learning community. Commonly recurring tuning parameters are related to regularization, kernels, learning rates, etc. Some important general observations: (i) the effect of tuning parameters can differ greatly, (ii) many tuning parameters have (box) constraints, (iii) the importance of a parameter can vary per problem and (iv) typically only a subset of tuning parameters significantly affects the quality of the model.

In the tuning problem, the objective function is some user-defined score measure for a model built using a tuple of hyperparameters (the optimization variables). From an optimization perspective, the objective function is typically stochastic, not well-behaved (certainly not convex) and non-smooth. We additionally assume that evaluating the objective function is expensive, since each evaluation involves training a model and evaluating its performance; k-fold cross-validation is a common example. This last assumption leads us to focus on approaches that require few evaluations (adaptive methods). In Optunity, tuning tasks are formulated as maximization problems.

Optunity is in active development. The latest version can be obtained via GitHub: https://github.com/claesenm/optunity.
2 Functional overview

Optunity provides various strategies to solve tuning problems. A complete, up-to-date overview with examples of the main features is available at http://optunity.readthedocs.org/.

Formally, the tuning task involves optimizing a hyperparameter vector $\theta$ such that a model $\mathcal{M}$ trained on a data set $\mathcal{X}$ with hyperparameters $\theta$ maximizes some score function. This can be summarized as follows:

$$\max_{\theta} f(\theta), \qquad f(\theta) = \mathrm{score}\big(\mathcal{M}_{\theta}(\mathcal{X})\big)$$
The choice of score function depends on the task and its design priorities (such as accuracy for classification, R² for regression). Optunity treats the objective function f(θ) as a black box: all solution strategies operate based on sequential function evaluations, without additional information like gradients or error estimates. Optunity provides repeated k-fold cross-validation and a variety of score functions to estimate the generalization performance of supervised algorithms.

2.1 Solvers

The library offers a variety of solvers, both adaptive and unadaptive. The following approaches are currently offered: grid search, random search [1], particle swarms [2] and CMA-ES [3]. Future development will incorporate DIRECT [4] and Bayesian approaches such as EGO [5] and sequential Kriging [6].
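To illustrate the black-box setting, consider the following minimal sketch, which is not from the paper: the quadratic toy_score and its box constraints are invented for illustration, and the calls mirror the make_solver/optimize pattern of the listing in Section 4. The solver sees only hyperparameter values going in and a score coming out, exactly as with a real model.

import optunity as opt

def toy_score(x, y):
    # Toy black-box objective standing in for a real model score.
    return -(x - 1.0) ** 2 - (y + 2.0) ** 2

# Random search with 100 function evaluations over box constraints.
solver = opt.make_solver('random search', 100, x=[-5, 5], y=[-5, 5])
pars, _ = opt.optimize(solver, toy_score)
print(pars)  # close to {'x': 1.0, 'y': -2.0}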
3 Implementation overview

Optunity consists of loosely coupled modules. The code is extensively documented and uses doctests where applicable, both to extend the documentation and to provide unit tests. We opted to develop in Python for three key reasons: (i) terseness of the language, (ii) ease of integration with other tools and (iii) availability of other libraries (such as NumPy/SciPy [7] and DEAP [8]). Optunity has no hard dependencies on external packages, but can benefit from some: when useful algorithms are available in other packages, we can provide additional solvers.
4 Code example

The code snippet below shows how to optimize SVM hyperparameters c and g, using cross-validation to estimate generalization performance on the data set x ↦ y (data to labels). The Optunity functions used are cross_validated, make_solver and optimize.

1  import optunity as opt
2  @opt.cross_validated(x, y=y)
3  def svm_score(x_train, y_train, x_test, y_test, c, g):
4      model = train(x_train, y_train, c, g)
5      y_hat = predict(model, x_test)
6      return some_score_fcn(y_test, y_hat)
7  solver = opt.make_solver('random search', 100, c=[1, 2], g=[0.1, 10])
8  pars, _ = opt.optimize(solver, svm_score)
9  optimal_model = train(x, y, **pars)
Listing 1: tuning an SVM with Optunity

The cross_validated function decorator takes care of cross-validation (line 2). Calls to svm_score automatically yield a cross-validated estimate of generalization performance (computed using a fixed set of folds). The body of svm_score fits a model on the training folds and predicts the test fold; the test predictions and true test labels are then passed to some score function. svm_score is the objective function that will be maximized. Such train–predict–score chains are the only logic users need to implement to perform cross-validation.

The solver is constructed on line 7. We use random search [1], which requires a number of evaluations and a set of box constraints. The call to optimize on line 8 couples the solver with its objective function and returns the best parameter tuple, which can then be used to train a model on the full training set.
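The functions train, predict and some_score_fcn in Listing 1 are placeholders for user code. As an illustration that is not part of the paper, one possible instantiation using scikit-learn (which is not a dependency of Optunity) could look as follows:

import sklearn.svm
import sklearn.metrics

def train(x_train, y_train, c, g):
    # Fit an SVM with RBF kernel; c and g are the hyperparameters being tuned.
    model = sklearn.svm.SVC(C=c, gamma=g)
    model.fit(x_train, y_train)
    return model

def predict(model, x_test):
    return model.predict(x_test)

def some_score_fcn(y_test, y_hat):
    # Any score to be maximized works; accuracy is used here as an example.
    return sklearn.metrics.accuracy_score(y_test, y_hat)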
5 Conclusion

Optunity allows easy integration of sophisticated tuning strategies into machine learning workflows. Adaptive approaches provide disciplined ways to find good parameter values while using fewer function evaluations than traditional grid search. The library provides various solution strategies and complementary functionality such as cross-validation and common score functions for supervised algorithms. Future development goals include making Optunity available in other environments (e.g. R, MATLAB) and implementing additional state-of-the-art solvers.
Acknowledgments

Research supported by:
• Research Council KU Leuven: GOA/10/09 MaNet, KUL PFV/10/016 SymBioSys, PhD/Postdoc grants;
• Industrial Research Fund (IOF): IOF/HB/13/027 Logic Insulin;
• Flemish Government: FWO projects G.0871.12N (Neural circuits) and PhD/Postdoc grants; IWT: TBM Logic Insulin (100793), TBM Rectal Cancer (100783), TBM IETA (130256) and PhD/Postdoc grants; Hercules Stichting: Hercules 3: PacBio RS; Hercules 1: The C1 single-cell auto prep system, BioMark HD System and IFC controllers (Fluidigm) for single-cell analyses; iMinds Medical Information Technologies SBO 2014; VLK Stichting E. van der Schueren: rectal cancer.
References

[1] James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(1):281–305, 2012.
[2] Maurice Clerc and James Kennedy. The particle swarm - explosion, stability, and convergence in a multidimensional complex space. IEEE Transactions on Evolutionary Computation, 6(1):58–73, 2002.
[3] Nikolaus Hansen and Andreas Ostermeier. Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9(2):159–195, 2001.
[4] Donald R. Jones, Cary D. Perttunen, and Bruce E. Stuckman. Lipschitzian optimization without the Lipschitz constant. Journal of Optimization Theory and Applications, 79(1):157–181, 1993.
[5] Donald R. Jones, Matthias Schonlau, and William J. Welch. Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13(4):455–492, 1998.
[6] Deng Huang, T. T. Allen, W. I. Notz, and R. A. Miller. Sequential kriging optimization using multiple-fidelity evaluations. Structural and Multidisciplinary Optimization, 32(5):369–382, 2006.
[7] Eric Jones, Travis Oliphant, and Pearu Peterson. SciPy: Open source scientific tools for Python. http://www.scipy.org/, 2001.
[8] Félix-Antoine Fortin, François-Michel De Rainville, Marc-André Gardner, Marc Parizeau, and Christian Gagné. DEAP: Evolutionary algorithms made easy. Journal of Machine Learning Research, 13(1):2171–2175, 2012.