Meta-Learning & Algorithm Selection @ECAI 2014 Tutorial T4 18 August Prague
Pavel Brazdil, Carlos Soares, Joaquin Vanschoren (
[email protected],
[email protected],
[email protected])
Plan 1. The ML/DM algorithm selection problem 2. Algorithm selection via meta-learning
3. Considering both accuracy and time in algorithm selection 4. Combining metalearning and active testing 5. Extending metalearning to workflow design 6. Metalearning for streaming data
7. Metalearning and algorithm selection in other domains 8. Metalearning in specific problems 9. OpenML brazdil, soares & vanschoren @ ECAI 2014
2
07-08-2014
Metalearning & Algorithm Selection Part I 1
ECAI-2014 Tutorial T4 18 August 2014 Pavel Brazdil LIAAD Inesc Tec / FEP, U.Porto P.Brazdil - ECAI-2014 Tutorial T4
2
Overview 1. The ML/DM algorithm selection problem 2. Algorithm selection via meta-learning 2.1 Types of recommendation 2.2 Following the recommendation to identify the best algorithm 2.3 How to define the metalearning problem
3. Considering both accuracy and time in algorithm selection 4. Combining metalearning and active testing 5. Extending meta-learning to workflow design P.Brazdil - ECAI-2014 Tutorial T4
1
07-08-2014
3
1. The ML/DM Algorithm Selection Problem
Information Flow in Machine Learning (ML) / Data Mining (DM) Systems: .
Data Items (Examples)
Classification Algorithms: - Decision Trees - Neural nets - SVMs etc. Regression Algorithms: …
ML DM
Categoric / Numeric Predictions Novel Patterns
P.Brazdil - ECAI-2014 Tutorial T4
4
1. The ML/DM Algorithm Selection Problem (2) Some Examples of ML/DM Applications: ! Engineering: Controlling a robot path / production etc. ! Finance: Concession of Credit, Predicting bankruptcy ! Medicine: Diagnostics, controlling the state of patient in intensive care ! Bioinformatics: Predicting the toxicity of medical drugs ! Information retrieval: Learning to identify suitable terms / linguistic features and using them in identification of documents
P.Brazdil - ECAI-2014 Tutorial T4
2
07-08-2014
5
1. The ML/DM Algorithm Selection Problem (3)
A large set of algorithms is available in ML: + It increases a possibility of finding a good solution. - It is much harder to find the right algorithm.
Classification Algorithms: - Decision Trees - Neural nets - SVMs … Ensembles of algorithms
This problem is aggravated by: Many algorithms need parameter settings (NNs, SVMs etc.). We want methods that can help us to identify/select the algorithms with the best performance. We cannot test all algorithms for computational reasons (thousands of variants of algorithm + parameter settings) This problem was first formulated by Rice (1976). P.Brazdil - ECAI-2014 Tutorial T4
6
2. Algorithm Selection via Metalearning (1) Metalearning is learning about which method / algorithm is best for which situation. Basic scheme - Part 1: Generation of meta-level models: Description of datasets Description of algorithms
Metadata
Meta-level models
Experimental results of algorithms on datasets P.Brazdil - ECAI-2014 Tutorial T4
3
07-08-2014
7
2. Algorithm Selection via Metalearning (2) Basic scheme - Part 2: Applying the meta-level models to algorithm selection:
Description of a new dataset Meta-level models
Recommendation: - Best algorithm - Ranking
Metadata
P.Brazdil - ECAI-2014 Tutorial T4
8
2. Algorithm Selection via Metalearning (3) One simple scheme does not take into account the characteristics of datasets: Experimental results of algorithms on datasets
Merging Rankings
Recommended Ranking
This method can play the role of “straw man” / default method. More complex methods should perform better than this method.
P.Brazdil - ECAI-2014 Tutorial T4
4
07-08-2014
9
2. Algorithm Selection via Metalearning (4) More details on merging rankings Example with 2 rankings:
P.Brazdil - ECAI-2014 Tutorial T4
10
2. Algorithm Selection via Metalearning (5) Overview of Next Topics 2.1 Types of recommendation 2.2 Following recommendation to identify the best algorithm 2.3 How to define the metalearning problem
P.Brazdil - ECAI-2014 Tutorial T4
5
07-08-2014
11
2.1 Types of Recommendation Possible types of recommendation: 1. Single potentially best algorithm (e.g. A7) - Does not suggest other options 2. As above + all algorithm with a comparable performance (e.g. A7, A13 ,A15 ) +- Suggests more options, but this may not be sufficient 3. Ranking of algorithms + The best alternative! 4. (Predicted of performance of all algorithms → ranking) Unnecessarily more complex & more difficult! P.Brazdil - ECAI-2014 Tutorial T4
12
2.2 Following the Recommendation to Identify the Best Algorithm Top N strategy:
(Initialize Current-Best)
Set Algs[1] as the Current-Best
(Algs[1]= 1st algorithm in the ranking)
Evaluate the Current-Best using CV Set Perf-Current-Best
(CV = cross-validation) (Perf-Current-Best = its performance)
Repeat i←2 Evaluate Algs[i] using CV Set Perf-Algs[i] If Perf-Algs[i] > Perf-Current-Best Current-Best ← Algs[i] Perf-Current-Best ← Perf-Algs[i] i ← i+1 End Repeat
(Algs[i] = i-th algorithm in the ranking)
(Perf-Algs[i] = performance of Algs[i]) (Substitute Current-Best) (Store its performance) (Advance to next algorithm)
P.Brazdil - ECAI-2014 Tutorial T4
6
07-08-2014
13
2.2 Following the Recommendation to Identify the Best Algorithm Evaluation Accuracy loss = Perf-True-Best - Perf-Current-Best (Perf-True-Best = performance of the truly best algorithm) Accuracy loss
4,5
Here we consider 16 algorithms
4 3,5 3 2,5 2 1,5 1 0,5 0 1
2
P.Brazdil - ECAI-2014 Tutorial T4
14
3
4
5
6
7
8
9
10
11
12
13
14
15
16
Nº of algorithms evaluated
2.3 How to Define the Metalearning Problem? Create training data (metadata) i.e. training data for the meta-level task: - characteristics of datasets are used as attributes / features, - algorithm name is used as attribute / feature, - performance (e.g. accuracy) is used as the target variable. What kind of dataset characteristics can be used?
P.Brazdil - ECAI-2014 Tutorial T4
7
07-08-2014
15
2.3 How to Define the Metalearning Problem (2) Dataset characteristics: ! Statistical and information-theoretic measures, ! Landmarks, ! Sub-sampling landmarks.
P.Brazdil - ECAI-2014 Tutorial T4
16
2.3 How to Define the Metalearning Problem (3) Statistical and information-theoretic measures: ! Number of classes, ! Class entropy, ! Number of features, ! Ratio of examples to features, ! Degree of correlation between features and target concept, ! etc. + Positive and tangible results (e.g., in projects Statlog and METAL). - There is a limit on how much information these can capture uni- or bi-lateral measures (2 attributes or attribute/class) P.Brazdil - ECAI-2014 Tutorial T4
8
07-08-2014
17
2.3 How to Define the Metalearning Problem(4) Dataset characteristics (2) - Landmarks (Landmarkers) Measure the performance of a set of some (simple and fast) learning algorithms (landmarkers) (e.g. Simplified decision tree) The accuracy of these landmarkers is used to characterize the dataset. - Which landmarks should be used is a non-trivial problem P.Brazdil - ECAI-2014 Tutorial T4
18
2.3 How to Define the Metalearning Problem (5) Dataset characteristics (3) - Sub-sampling Landmarks Exploit performance information obtained on simplified versions of the data (samples). Accuracy results on these samples (or sequences of samples) serve to characterise individual datasets and are referred to as sub-sampling landmarks. Meta-learning system that exploited sub-sampling landmarks lead to marked improvements (Leite & Brazdil, 2010) P.Brazdil - ECAI-2014 Tutorial T4
9
07-08-2014
19
2.3 How to Define the Metalearning Problem (5) Dataset characteristics (3) – Performance-based Dataset Similarity Some researcher s proposed (Nguyen et al., 2012): Two datasets can be considered similar if the performance of a given set of algorithms on these is similar.
P.Brazdil - ECAI-2014 Tutorial T4
20
2.3 How to Define the Metalearning Problem (6) The meta-level model can be in the form of: ! k-NN
(used by many - incl. Brazdil, Gama & Henery, 1994)
! Meta-level rules (not very reliable), ! Neural network, ! Ranking trees
(Q. Sun, 2013; Sun & Pfahringer, 2013)
! Bagging ensembles
(Q. Sun, 2013; Sun & Pfahringer, 2013)
Etc.
P.Brazdil - ECAI-2014 Tutorial T4
10
07-08-2014
21
2.3 How to Define the Metalearning Problem (7) One example of a meta-learning system: k-NN with sample-based characterization (Leite & Brazdil, 2010) ! Get new dataset Dnew ! Construct partial learning curves – performance of all algorithms on N samples of Dnew. ! Use k-NN to identify the most similar partial learning curve. ! Retrieve curve, adapt & project to obtain estimate of performance. One research question: ! Can we obtain (nearly) the same results, but obtain partial learning curves of some algorithms only? P.Brazdil - ECAI-2014 Tutorial T4
22
2.3 How to Define the Metalearning Problem (8)
P.Brazdil - ECAI-2014 Tutorial T4
11
07-08-2014
23
3. Considering both Accuracy (or AUC) and Time If the aim is to consider several different performance measures - e.g. accuracy (or AUC) and time there are two approaches: ! Multi-objective analysis (e.g. DEA), ! Define a combined measure
P.Brazdil - ECAI-2014 Tutorial T4
24
3. Considering both Accuracy (or AUC) and Time (2) Defining a combined measure Some authors maintain that this measure should not include absolute values, but rather ratios: ! Ratios of accuracies of two algorithms ! Ratios of times of two algorithms (suitably rescaled)
One proposal (S.Abdulrahman, & Brazdil,2014):
P.Brazdil - ECAI-2014 Tutorial T4
12
07-08-2014
25
4. Combining Metalearning and Active Testing In 2.2 we presented a sequential process of following the recommended ranking (keep testing all algorithms, following the ranking). This can suffers from two problems: The set of algorithms may contain: 1. Suboptimal algorithms, 2. Very similar algorithms (e.g. variants of the same algorithm with different parameter settings).
Time can be waisted by testing these. How can this be avoided? P.Brazdil - ECAI-2014 Tutorial T4
26
4. Combining Metalearning and Active Testing (2) Eliminating sub-optimal algorithms can be done as follows: Process all datasets one by one. For each dataset mark all algorithms that achieved a competitive result (e.g. best / equivalent to best). After all datasets have been processed, drop all unmarked algorithms (they did not win on any dataset) Use the remaining algorithms from then on. Positive results were reported (Brazdil, Soares & Pereira, 2001) P.Brazdil - ECAI-2014 Tutorial T4
13
07-08-2014
27
4. Combining Metalearning and Active Testing (3) Dealing with very similar algorithms (1) Various techniques exist that can identify similar algorithms by considering performance e.g. by identifying algorithms that commit correlated errors (see e.g. Lee & Giraud-Carrier, 2011)
This could be exploited: The algorithms that commit similar errors to others could be dropped.
P.Brazdil - ECAI-2014 Tutorial T4
28
4. Combining Metalearning and Active Testing (4) Dealing with very similar algorithms (2) One approach maintains the concept of Current-Best (as in 2.2) and tries to identify an algorithm with the best chance of achieving a better performance. The method looks at all previous situations (datasets) in which a particular algorithm won over Current-Best & quantifies all the performance gains. The algorithm with the largest performance gains is evaluated. If it achieves better performance than Current-Best it replaces Current-Best. See (Leite, Brazdil & Vanschoren, 2012) for more details. P.Brazdil - ECAI-2014 Tutorial T4
14
07-08-2014
29
4. Combining Metalearning and Active Testing (5) Dealing with very similar algorithms (3) Good results were reported:
One active testing variant. It uses peformancebased similarity of datasets.
ATWs_k
Rand TopN
P.Brazdil - ECAI-2014 Tutorial T4
30
4. Combining Metalearning and Active Testing (6) Comments : The algorithm chosen for testing cannot be similar to Current-Best as otherwise it would not have been chosen. The method does not really follow the ranking: It jumps back and forth in the ranking, but similar algorithms are left behind. They are eventually tested, unless the process is terminated before.
P.Brazdil - ECAI-2014 Tutorial T4
15
07-08-2014
31
5. Extending Metalearning to Workflows (1) Workflow is a (partially) ordered sequence of operators ≈ Plan Example of a workflow: Discretization Dataset
Apply Naïve Bayes (class probabilities)
Class probability thresholding
OR
Classification Apply Decision Tree
P.Brazdil - ECAI-2014 Tutorial T4
32
5. Extending Metalearning to Workflows (2) Workflows have been incorporated into many DM systems: Weka, Knime, RapidMiner, SAS etc. Example of a workflow in Knime:
P.Brazdil - ECAI-2014 Tutorial T4
16
07-08-2014
33
5. Extending Metalearning to Workflows (3) Many different operations can be chosen at any step: ! Pre-processing operations (feature selection, discretization etc.), ! Classification algorithms (DT, NB, NN, SVM, kNN ..) ! Parameter settings for each, ! Ensembles (bagging, boosting etc.). Consequently: ! Designing complex workflows manually is time consuming. ! The resulting workflow(s) can have suboptimal performance (accuracy, AUC, training time etc.) => Users need support regards how to obtain good workflows! P.Brazdil - ECAI-2014 Tutorial T4
34
5. Extending Metalearning to Workflows (4) Naïve approach: ! Generate all possible workflows for a new dataset - Exploit a given ontology of abstract/concrete operators
! Use meta-knowledge associated with past problems/datasets to: - Retrieve past workflows associated with similar problems; - Rank these workflows according to the expected performance;
! Carry out tests to identify the best workflow;
Acknowledgments: The help of Salisu Abdulrahman in the preparation of some of these slides is gratefully acknowledged. P.Brazdil - ECAI-2014 Tutorial T4
17
07-08-2014
35
5. Extending Metalearning to Workflows (5)
A DM workflow as a hierarchical DAG (adapted from Hilario et al., 2011)
P.Brazdil - ECAI-2014 Tutorial T4
36
5. Extending Metalearning to Workflows (6) A HTN plan of the workflow shown earlier. Non-terminal nodes = tasks / methods. Abstract operators are in bold. Simple operators are in italics.
P.Brazdil - ECAI-2014 Tutorial T4
18
07-08-2014
37
5. Extending Metalearning to Workflows (7) The naïve approach is not practical: ! The number of possible workflows is normally too large; ! Meta-knowledge concerning different workflows may not be available. Some solutions: Expand preferably only the most promising nodes / branches, with the help of meta-knowledge in the form of: ! Association rules (Kietz et al., 2012) ! Conditional probabilities, ! Collaborative filtering (Sebag, invited talk at MetaSel-2014) P.Brazdil - ECAI-2014 Tutorial T4
38
.
End of Part I Thank you for your attention!
P.Brazdil - ECAI-2014 Tutorial T4
19
07-08-2014
39
References Books and Survey Articles: P. Brazdil, Christophe G. Giraud-Carrier, C. Soares, R. Vilalta: Metalearning - Applications to Data Mining. Springer, 2009. K.Smith-Miles: Cross-Disciplinary Perspectives on Meta-Learning for Algorithm Selection, ACM Computing Surveys, 2008 F Serban, J Vanschoren, JU Kietz and A Bernstein: A Survey of Intelligent Assistants for Data Analysis, ACM Computing Surveys, 2013
P.Brazdil - ECAI-2014 Tutorial T4
40
References SM Abdulrahman, P Brazdil: Measures for Combining Accuracy and Time for Meta-learning, Proc. of Workshop MetaSel-2014 associated with ECAI-2014, CEUR proceedings, 2014 P Brazdil, J Gama, B Henery, Characterizing the applicability of classification algorithms using meta-level learning, Machine Learning: ECML-94, 83-102 PB Brazdil, C Soares, JP da Costa: Ranking learning algorithms: Using IBL and meta-learning on accuracy and time results, Machine Learning 50 (3), 251-277, 2003 P Brazdil, C Soares, R Pereira: Reducing rankings of classifiers by eliminating redundant classifiers, Progress in Artificial Intelligence, 14-21, 2001 M Hilario, P Nguyen, H Do, A Woznica, A Kalousis, "Ontology-based meta-mining of knowledge discovery workflows." in N.J ankovski et al. (eds), Meta-Learning in Computational Intelligence. Springer Berlin Heidelberg, 2011. 273-315. Kietz, JU, F Serban, A Bernstein, & S Fischer, Designing KDD-workflows via HTN-planning for intelligent discovery assistance. In 5th Planning to Learn Workshop WS28 at ECAI-2012. P.Brazdil - ECAI-2014 Tutorial T4
20
07-08-2014
41
References JW Lee, C Giraud-Carrier, A metric for unsupervised metalearning, Intelligent Data Analysis, 2011 R Leite, P Brazdil: Active Testing Strategy to Predict the Best Classification Algorithm via Sampling and Metalearning. Proc. of ECAI, 309-314, 2010 R Leite, P Brazdil, J Vanschoren: Selecting Classification Algorithms with Active Testing, Proc. of MLDM, Springer, 2012; M. Mısır , M. Sebag: Algorithm Selection as a Collaborative Filtering Problem, Inria Research Report N°XX, December 2013, Research Centre Saclay – Île-de-France. P Nguyen, J Wang and M Hilario, A Kalousis: Learning Heterogeneous Similarity Measures for Hybrid-Recommendations in Meta-Mining, Proc. of ICDM-2012, also in arXiv:1210.1317, 2012 Rice, J. : The algorithm selection problem. Advances in Computers 15, 65–118, 1976. Quan Sun: Meta-Learning and the Full Model Selection Problem, PhD thesis, U.Waikato, 2013 Q Sun and B Pfahringer. Pairwise Meta-Rules for Better Meta-Learning-Based Algorithm Ranking. Machine Learning, 93(1):141-161, 2013. P.Brazdil - ECAI-2014 Tutorial T4
21
Metalearning*&*Algorithm*Selec2on* (Part*II)*
ECAI92014*Tutorial*T4** 18*August*2014** Carlos*Soares* INESC*TEC/*FEUP,*U.Porto*
plan*
6. metalearning*for*streaming*data* 7. metalearning*and*algorithm*selec2on*in* other*domains* 8. metalearning*in*specific*problems*
brazdil,*soares*&*vanschoren*@*ECAI*2014*
…(and(for(streams(
algorithm*selec2on*for*streams* typical(scenario(
brazdil,*soares*&*vanschoren*@*ECAI*2014*
streams:*earlier*work*on*2me* series* • rule9based*forecas2ng* • expert*system*approach* – i.e.*based*on*expert*knowledge*
• 6*features*
– func2onal*form,*outliers*(series*and*last* observa2on),*discon2nui2es,*change*in*trend* (whole*series*and*last*observa2ons)*
• iden2fied*based*on*simple*sta2s2cs* • tested*on*2me*series*from*the*M*compe22on* Adya,*M.*et*al.,*2001.*Automa2c*iden2fica2on*of*2me*series*features*for*rule9based*forecas2ng.* Interna'onal*Journal*of*Forecas'ng,*17(2),*pp.143–157.*
brazdil,*soares*&*vanschoren*@*ECAI*2014*
streams:*earlier*metalearning* approach*for*2me*series* • tradi2onal*metalearning*architecture* – based*on*Kalousis’*NOEMON*
• algorithms*
– classical*2me*series*methods* – NNs*
• combina2on*of*types*of*metafeatures* – tradi2onal**
• e.g.*number*of*observa2ons,*skewness,*kurtosis*
– specific*to*2me*series*
• e.g.*trend,*autocorrela2on,*number*of*turning*points*
• tested*on*M39compe22on*data*
Prudêncio,*R.B.C.*&*Ludermir,*T.B.,*2004.*Meta9learning*approaches*to*selec2ng*2me*series* models.*Neurocompu'ng,*61,*pp.121–137*
brazdil,*soares*&*vanschoren*@*ECAI*2014*
streams:*opportuni2es/ challenges*
granularity(of(selec4on(
brazdil,*soares*&*vanschoren*@*ECAI*2014*
data(characteriza4on(
Rossi,*A.L.D.*et*al.,*2014.*MetaStream:*A*meta9learning*based* method*for*periodic*algorithm*selec2on*in*2me9changing*data.* Neurocompu'ng,*127,*pp.52–64.*
streams:*algorithm*selec2on*à*la* streams*(1/3)* • assumes*
– data*divided*into*subsets*generated*by*different* phenomena* – sta2onary*within*subsets* – ...*which*may*be*recurring*
• approach*
brazdil,*soares*&*vanschoren*@*ECAI*2014*
– different*models*used*at*different*2mes* – each*model*may*be*reused* Gama,*J.*&*Kosina,*P.,*2011.*Learning*about*the*learning*process.* In*Proc.*of*the*10th*Int.*Conf.*on*Advances*in*Intelligent*Data* Analysis.*Springer9Verlag,*pp.*162–172.*
streams:*algorithm*selec2on*à*la* streams*(2/3)* • two*layers*architecture*
brazdil,*soares*&*vanschoren*@*ECAI*2014*
– level*0:*base9level* learning* – level*1:*is*classifier*able* to*predict*example* correctly*
Gama,*J.*&*Kosina,*P.,*2011.*Learning*about*the*learning*process.* In*Proc.*of*the*10th*Int.*Conf.*on*Advances*in*Intelligent*Data* Analysis.*Springer9Verlag,*pp.*162–172.*
streams:*algorithm*selec2on*à*la* streams*(3/3)* • model*management* – each*model*has*a* metamodel*
• if*drig*detected*at*the* base*level* – each*referee*predicts* whether*its*model*is* good*enough*
brazdil,*soares*&*vanschoren*@*ECAI*2014*
• may*decide*to*learn* new*model* Gama,*J.*&*Kosina,*P.,*2011.*Learning*about*the*learning*process.* In*Proc.*of*the*10th*Int.*Conf.*on*Advances*in*Intelligent*Data* Analysis.*Springer9Verlag,*pp.*162–172.*
gps*
6. metalearning*for*streaming*data* 7. metalearning*and*algorithm*selec2on*in* other*domains* 8. metalearning*in*specific*problems*
brazdil,*soares*&*vanschoren*@*ECAI*2014*
other*domains* • constraint*sa2sfac2on* – boolean*sa2sfiability*problem*(SAT)*
• search/op2miza2on* – traveling*salesman* – job9shop*scheduling* – quadra2c*assignment*problem*
• sor2ng* Smith9Miles,*K.A.,*2008.*Cross9disciplinary*perspec2ves*on*meta9learning*for* algorithm*selec2on.*ACM*Compu'ng*Surveys,*41(1),*pp.1–25.* * Kojhoff,*L.,*2014.*Algorithm*Selec2on*for*Combinatorial*Search*Problems:*A* Survey.*{AI}*Magazine,*pp.1–17.*
brazdil,*soares*&*vanschoren*@*ECAI*2014*
Kojhoff,*L.*(2012).*Hybrid*regression9classifica2on*models*for* algorithm*selec2on.*In*20th*European*Conference*on*Ar'ficial* Intelligence,*pp.*480–485*
• even*stacking*has*been* adapted!*
– challenges* – approaches*
• but*research*is*quite* similar*
algorithm*selec2on*in*other* domains* • different*names* – hiperheuris2cs* – landscape*analysis* – empirical*hardness*
• ...*terminology* – porkolio*is*the*set*of* algorithms*
• ...*goal* – most*ogen*execu2on*2me* – although*(meta9)heuris2cs* do*not*guarantee*op2mal* solu2on*
brazdil,*soares*&*vanschoren*@*ECAI*2014*
– level*of*domain*knowledge* used*
• related*to*the*possibility*to* switch*algorithm*during* search*
– ...*except*for*dynamical* ones*
• (sta2s2cal)*proper2es*of* the*instances* • landmarkers* • problem9specific*measures*
– similar*
• features*
other*domains*vs** machine*learning* • easier*to*obtain*meta9 examples* – ar2ficial*instances*are* ogen*used*for*research*
– search*methods*operate* on*the*same*space* – except*for*Brodley*(93)*
• possible*to*switch* method*during*search*
*
brazdil,*soares*&*vanschoren*@*ECAI*2014*
algorithm*selec2on*for* op2miza2on:*TSP*
main(contribu4ons( • meta9features*
– network*measures* – algorithm*specific*proper2es* – op2miza2on*landmarkers*
• mul29label*classifica2on* • systema2c*genera2on*of* instances*
brazdil,*soares*&*vanschoren*@*ECAI*2014*
traveling(salesman(problem(
Jorge*Kanda,*Carlos*Soares,*Eduardo*R.*Hruschka,*André*Carlos* Ponce*Leon*Ferreira*de*Carvalho:*A*Meta9Learning*Approach*to* Select*Meta9Heuris2cs*for*the*Traveling*Salesman*Problem*Using* MLP9Based*Label*Ranking.*ICONIP*(3)*2012:*4889495*
dynamic*selec2on*of*operators* • JSS* – find*schedule*that* minimizes*total* comple2on*2me* – …*without*conflicts*
• learn*model* – for*each*conflict* – given*current*schedule* and*what*remains*to*be* done* – which*heuris2c*to*use*
Abreu,*P.,*Soares,*C.*&*Valente,*J.M.,*2009.*Selec2on*of*Heuris2cs*for*the*Job9 Shop*Scheduling*Problem*Based*on*the*Predic2on*of*Gaps*in*Machines.*In* Learning*and*Intelligent*Op'miza'on:*Third*Interna'onal*Conference*(LION*3)*PP* Selected*Papers.*Berlin,*Heidelberg:*Springer9Verlag,*pp.*134–147.*
brazdil,*soares*&*vanschoren*@*ECAI*2014*
gps*
6. metalearning*for*streaming*data* 7. metalearning*and*algorithm*selec2on*in* other*domains* 8. metalearning*in*specific*problems*
brazdil,*soares*&*vanschoren*@*ECAI*2014*
algorithm*selec2on*for** gene*expression*data* microarray(data(
main(contribu4ons( • metafeatures*based*on* clustering*validity*measures* • weighted*k9NN*for*ranking*
brazdil,*soares*&*vanschoren*@*ECAI*2014*
adapted*from*(Harrington*et*al.,*2000)**
Bruno*Feres*de*Souza,*André*Carlos*Ponce*de*Leon*Ferreira*de* Carvalho,*Carlos*Soares:*A*comprehensive*comparison*of*ML* algorithms*for*gene*expression*data*classifica2on.*IJCNN*2010:*198*
response*automa2on*in*a* helpdesk* • finding*the*best*answer* to*customer*emails* • select*best*method* – based*on*similarity*of* email* – and*performance*of* methods*
Marom,*Y.,*Zukerman,*I.*&*Japkowicz,*N.,*2007.*A*meta9learning*approach*for* selec2ng*between*response*automa2on*strategies*in*a*help9desk*domain.*In* Proceeding*AAAI’07*Proceedings*of*the*22nd*na'onal*conference*on*Ar'ficial* brazdil,*soares*&*vanschoren*@*ECAI*2014* intelligence*P*Volume*1.*AAAI*Press,*pp.*907–912.*
metalearning*for*shelf*space* alloca2on*in*retail*
rela4on(of(predic4ve(performance( and(data(characteris4cs(
brazdil,*soares*&*vanschoren*@*ECAI*2014*
solu4on(architecture(
Fábio*Pinto,*Carlos*Soares:*Space*Alloca2on*in*the*Retail*Industry:* A*Decision*Support*System*Integra2ng*Evolu2onary*Algorithms* and*Regression*Models.*ECML/PKDD*(3)*2013:*5319546*
• • • • • •
references:*streams*
Adya,*M.*et*al.,*2001.*Automa2c*iden2fica2on*of*2me*series*features*for* rule9based*forecas2ng.*Interna'onal*Journal*of*Forecas'ng,*17(2),*pp. 143–157* Prudêncio,*R.B.C.*&*Ludermir,*T.B.,*2004.*Meta9learning*approaches*to* selec2ng*2me*series*models.*Neurocompu'ng,*61,*pp.121–137* Lemke,*C.*&*Gabrys,*B.,*2010.*Meta9learning*for*2me*series*forecas2ng* and*forecast*combina2on.*Neurocompu'ng,*73(10912),*pp.2006–2016* Rossi,*A.L.D.*et*al.,*2014.*MetaStream:*A*meta9learning*based*method*for* periodic*algorithm*selec2on*in*2me9changing*data.*Neurocompu'ng,*127,* pp.52–64* Gama,*J.*&*Kosina,*P.,*2011.*Learning*about*the*learning*process.*In*Proc.* of*the*10th*Int.*Conf.*on*Advances*in*Intelligent*Data*Analysis.*Springer9 Verlag,*pp.*162–172* Rijn,*J.N.*Van*et*al.,*2014.*Algorithm*Selec2on*on*Data*Streams.*To*be* published*in*Discovery*Science*
brazdil,*soares*&*vanschoren*@*ECAI*2014*
Smith9Miles,*K.A.,*2008.*Cross9disciplinary*perspec2ves*on*meta9learning*for*algorithm*selec2on.*ACM*Compu'ng* Surveys,*41(1),*pp.1–25* Kojhoff,*L.,*2014.*Algorithm*Selec2on*for*Combinatorial*Search*Problems:*A*Survey.*To*be*published*in*AI* Magazine.*
references:*other*domains* • • • • • • • • •
Guo,*H.*2003.*Algorithm*selec2on*for*sor2ng*and*probabilis2c*inference:*A*machine*learning9based*approach.* Ph.D.*disserta2on,*Kansas*State*University* Lagoudakis,*M.,*Lijman,*M.,*and*Parr,*R.*2001.*Selec2ng*the*right*algorithm.*In*Proceedings*of*the*AAAI*Fall* Symposium*Series:*Using*Uncertainty*within*Computa'on.* Burke,*E.K.*et*al.,*2010.*A*Classifica2on*of*Hyper9heuris2c*Approaches.*Handbook*of*Metaheuris'cs,*146,*pp.1–21.* Smith9Miles,*K.A.*et*al.,*2009.*Understanding*the*Rela2onship*between*Scheduling*Problem*Structure*and* Heuris2c*Performance*using*Knowledge*Discovery,*LNCS.*Learning*and*Intelligent*Op'miza'oN*LION,*3.* Kojhoff,*L.*(2012).*Hybrid*regression9classifica2on*models*for*algorithm*selec2on.*In*20th*European*Conference*on* Ar'ficial*Intelligence,*pp.*480–485* Jorge*Kanda,*Carlos*Soares,*Eduardo*R.*Hruschka,*André*Carlos*Ponce*Leon*Ferreira*de*Carvalho:*A*Meta9Learning* Approach*to*Select*Meta9Heuris2cs*for*the*Traveling*Salesman*Problem*Using*MLP9Based*Label*Ranking.*ICONIP* (3)*2012:*4889495* Abreu,*P.,*Soares,*C.*&*Valente,*J.M.,*2009.*Selec2on*of*Heuris2cs*for*the*Job9Shop*Scheduling*Problem*Based*on* the*Predic2on*of*Gaps*in*Machines.*In*Learning*and*Intelligent*Op'miza'on:*Third*Interna'onal*Conference*(LION* 3)*PP*Selected*Papers.*Berlin,*Heidelberg:*Springer9Verlag,*pp.*134–147.*
brazdil,*soares*&*vanschoren*@*ECAI*2014*
references:*specific*applica2ons*
• Bruno*Feres*de*Souza,*André*Carlos*Ponce*de*Leon*Ferreira* de*Carvalho,*Carlos*Soares:*A*comprehensive*comparison* of*ML*algorithms*for*gene*expression*data*classifica2on.* IJCNN*2010:*198* • Marom,*Y.,*Zukerman,*I.*&*Japkowicz,*N.,*2007.*A*meta9 learning*approach*for*selec2ng*between*response* automa2on*strategies*in*a*help9desk*domain.*In*Proceeding* AAAI’07*Proceedings*of*the*22nd*na'onal*conference*on* Ar'ficial*intelligence*P*Volume*1.*AAAI*Press,*pp.*907–912* • Fábio*Pinto,*Carlos*Soares:*Space*Alloca2on*in*the*Retail* Industry:*A*Decision*Support*System*Integra2ng* Evolu2onary*Algorithms*and*Regression*Models.*ECML/ PKDD*(3)*2013:*5319546* brazdil,*soares*&*vanschoren*@*ECAI*2014*
NET WORKED MACHINE LEARNING
#OpenML
J O A Q U I N VA N S C H O R E N ( T U / E ) , 2 0 1 4
Research different.
GALILEO GALILEI DISCOVERS SATURN’S RINGS
1610
‘ S M A I S M R M I L M E P O E TA L E U M I B U N E N U G T TA U I R A S ’
Research different. Royal society: Take nobody’s word for it Scientific Journal: Reputation-based culture
300 YEARS LATER
NETWORKED SCIENCE
JOURNAL: LONG-TERM MEMORY I N T E R N E T: W O R K I N G M E M O R Y H U G E , O N L I N E D A TA S E T S COMPLEX CODE REAL-TIME DISCUSSION OPEN, MASSIVE COLLABORATION C O M B I N E , R E U S E S C I E N T I F I C R E S U LT S CITIZEN SCIENCE
Collaboration
Broadcast question, hoping that many minds can find a solution
Transparency Data is the new soil
Sharing
Broadcast data, believing that many minds will ask the right questions
Democratization With the right tools, everybody can be a scientist
Opportunities for scientists
DESIGNED SERENDIPITY • Right person, right time, right place
• What is impossible for one scientist, is easy for the next
• Right expertise, resources at the right time • Critical mass: Ideas spark new ideas • Dramatically speeds up progress
R E P U TAT I O N
• Authorship: easier collaboration, contributions visible
• Visibility: earn respect from (notable) peers
• Productivity: Exploit your special insight and advantage • Citation: share data so that many can build on it • Funding: frequent reuse shows value to the community
MACHINE LEARNING • Ideal candidate for networked science
• Highly complex data, code, large-scale experiments, impossible to print in papers
• Experiments are not shared online: impossible to build on prior work, start each time from scratch • Low generalisability: studies contradict
• Low reproducibility: code, experiment details missing
NETWORKED MACHINE LEARNING
• Easily find code, data, reuse it in novel ways
• Reuse prior experiments to quickly answer questions • Faster benchmarking, leaderboards
• Mine collected results for patterns: meta-learning
• Collaborate with many people (interdisciplinary) • Great for students, education
BEYOND JOURNALS
• Enriches research output, linked to papers • Freely accessible • Organized online • Continuously updated • Immensely detailed • Reproducible • Online discussion • Diminishes publication bias
MORE TIME • Automate routinizable work: • Find code and data online
• Setup, run & organize experiments (tools)
• Relate to state-of-the-art (benchmarks) • Annotate code and data • Full log of your research
• Keep control of your data, code, experiments
MORE CREDIT • Citation • Tool tells others how to cite your work • Data easy to find by others
• Altmetrics: tracks how often your work is reused (ArXiv) • Productivity: more time for actual research • Visibility: collaborate, climb leaderboards
Demo
WEKA PLUGIN
MOA PLUGIN
O P E N M L PA C K A G E ( R )
RAPIDMINER PLUGIN
1 . O P E R AT O R T O D O W N L O A D TA S K ( TA S K T Y P E S P E C I F I C )
2 . S U B W O R K F L O W T H AT S O LV E S T H E TA S K , G E N E R AT E S R E S U LT S 3 . O P E R AT O R F O R U P L O A D I N G R E S U LT S
FUTURE WORK
• OpenML studies: online representation of paper: data, code, runs, discussions,…
• Social layer: control visibility: public, friends, private
• Collaborative leaderboards: all top-3 contributors • More data types, tasks
JOIN THE TEAM
SPREAD THE WORD
#OpenML