Meta-Learning & Algorithm Selection @ECAI 2014 Tutorial T4 18 August Prague

Pavel Brazdil, Carlos Soares, Joaquin Vanschoren ([email protected], [email protected], [email protected])

Plan
1. The ML/DM algorithm selection problem
2. Algorithm selection via meta-learning
3. Considering both accuracy and time in algorithm selection
4. Combining metalearning and active testing
5. Extending metalearning to workflow design
6. Metalearning for streaming data
7. Metalearning and algorithm selection in other domains
8. Metalearning in specific problems
9. OpenML


Metalearning & Algorithm Selection (Part I)

ECAI-2014 Tutorial T4, 18 August 2014. Pavel Brazdil, LIAAD INESC TEC / FEP, U.Porto


Overview
1. The ML/DM algorithm selection problem
2. Algorithm selection via meta-learning
   2.1 Types of recommendation
   2.2 Following the recommendation to identify the best algorithm
   2.3 How to define the metalearning problem
3. Considering both accuracy and time in algorithm selection
4. Combining metalearning and active testing
5. Extending meta-learning to workflow design


1. The ML/DM Algorithm Selection Problem

Information flow in Machine Learning (ML) / Data Mining (DM) systems:

Data items (examples) → ML/DM algorithms (classification algorithms: decision trees, neural nets, SVMs, etc.; regression algorithms: ...) → categoric / numeric predictions and novel patterns.


1. The ML/DM Algorithm Selection Problem (2)

Some examples of ML/DM applications:
- Engineering: controlling a robot path, production, etc.
- Finance: concession of credit, predicting bankruptcy
- Medicine: diagnostics, controlling the state of patients in intensive care
- Bioinformatics: predicting the toxicity of medical drugs
- Information retrieval: learning to identify suitable terms / linguistic features and using them in the identification of documents


1. The ML/DM Algorithm Selection Problem (3)

A large set of algorithms is available in ML (classification algorithms: decision trees, neural nets, SVMs, ...; ensembles of algorithms):
+ It increases the possibility of finding a good solution.
- It is much harder to find the right algorithm.

This problem is aggravated by the fact that many algorithms need parameter settings (NNs, SVMs, etc.), and we cannot test all algorithms for computational reasons (thousands of variants of algorithms + parameter settings). We therefore want methods that can help us identify/select the algorithms with the best performance. This problem was first formulated by Rice (1976).


2. Algorithm Selection via Metalearning (1)

Metalearning is learning about which method / algorithm is best for which situation.

Basic scheme - Part 1: Generation of meta-level models:
Descriptions of datasets + descriptions of algorithms + experimental results of algorithms on datasets → metadata → meta-level models.


2. Algorithm Selection via Metalearning (2)

Basic scheme - Part 2: Applying the meta-level models to algorithm selection:
Description of a new dataset + meta-level models (and metadata) → recommendation: best algorithm, or a ranking.


2. Algorithm Selection via Metalearning (3)

One simple scheme does not take the characteristics of datasets into account:
Experimental results of algorithms on datasets → merging of rankings → recommended ranking.

This method can play the role of a "straw man" / default method. More complex methods should perform better than it.


2. Algorithm Selection via Metalearning (4)

More details on merging rankings. Example with 2 rankings (figure not reproduced); a small sketch of average-rank merging is shown below.
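As a concrete illustration (not from the slides; the algorithm names and ranks are invented), per-dataset rankings can be merged by average rank:

```python
# Merge per-dataset rankings into one recommended ranking via average rank.
# Assumes every ranking covers the same set of algorithms (rank 1 = best).

def merge_rankings(rankings):
    algorithms = rankings[0].keys()
    avg_rank = {a: sum(r[a] for r in rankings) / len(rankings) for a in algorithms}
    return sorted(avg_rank, key=avg_rank.get)  # best (lowest) average rank first

ranking_d1 = {"C4.5": 1, "NaiveBayes": 2, "SVM": 3}   # ranking observed on dataset 1
ranking_d2 = {"SVM": 1, "C4.5": 2, "NaiveBayes": 3}   # ranking observed on dataset 2
print(merge_rankings([ranking_d1, ranking_d2]))       # ['C4.5', 'SVM', 'NaiveBayes']
```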


2. Algorithm Selection via Metalearning (5)

Overview of next topics:
2.1 Types of recommendation
2.2 Following the recommendation to identify the best algorithm
2.3 How to define the metalearning problem


2.1 Types of Recommendation

Possible types of recommendation:
1. Single potentially best algorithm (e.g. A7)
   - Does not suggest other options.
2. As above, plus all algorithms with comparable performance (e.g. A7, A13, A15)
   +- Suggests more options, but this may not be sufficient.
3. Ranking of algorithms
   + The best alternative!
4. Predicted performance of all algorithms → ranking
   - Unnecessarily more complex and more difficult!


2.2 Following the Recommendation to Identify the Best Algorithm

Top-N strategy (a runnable sketch follows below):

Set Algs[1] (the 1st algorithm in the ranking) as Current-Best.
Evaluate Current-Best using cross-validation (CV); store its performance as Perf-Current-Best.
For i = 2 .. N:
    Evaluate Algs[i] (the i-th algorithm in the ranking) using CV; store its performance as Perf-Algs[i].
    If Perf-Algs[i] > Perf-Current-Best:
        Current-Best ← Algs[i]           (substitute Current-Best)
        Perf-Current-Best ← Perf-Algs[i] (store its performance)
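A minimal Python sketch of this Top-N strategy, assuming a hypothetical helper `evaluate_cv(algorithm, dataset)` that returns a cross-validated accuracy estimate (not defined in the slides):

```python
def top_n(ranked_algorithms, dataset, n, evaluate_cv):
    """Evaluate the first n algorithms of the recommended ranking with CV and
    return the best algorithm found together with its estimated performance."""
    current_best = ranked_algorithms[0]
    perf_current_best = evaluate_cv(current_best, dataset)
    for alg in ranked_algorithms[1:n]:
        perf = evaluate_cv(alg, dataset)
        if perf > perf_current_best:               # substitute Current-Best
            current_best, perf_current_best = alg, perf
    return current_best, perf_current_best
```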


2.2 Following the Recommendation to Identify the Best Algorithm: Evaluation

Accuracy loss = Perf-True-Best - Perf-Current-Best
(Perf-True-Best = performance of the truly best algorithm)

[Figure: accuracy loss (y-axis, 0 to 4.5) as a function of the number of algorithms evaluated (x-axis, 1 to 16); here 16 algorithms are considered.]

2.3 How to Define the Metalearning Problem?

Create training data (metadata), i.e. training data for the meta-level task:
- characteristics of datasets are used as attributes / features,
- the algorithm name is used as an attribute / feature,
- performance (e.g. accuracy) is used as the target variable.
A sketch of such a meta-level table is shown below.

What kind of dataset characteristics can be used?
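Before turning to dataset characteristics, here is a purely illustrative sketch of the meta-level training table just described (the meta-feature values and algorithm names are invented; in practice the rows come from running the candidate algorithms on many datasets):

```python
import pandas as pd

metadata = pd.DataFrame([
    # dataset meta-features .............. algorithm ..... target
    {"n_examples": 150, "n_classes": 3, "class_entropy": 1.58,
     "algorithm": "C4.5", "accuracy": 0.94},
    {"n_examples": 150, "n_classes": 3, "class_entropy": 1.58,
     "algorithm": "SVM",  "accuracy": 0.96},
    {"n_examples": 768, "n_classes": 2, "class_entropy": 0.93,
     "algorithm": "C4.5", "accuracy": 0.74},
])
X_meta = metadata.drop(columns="accuracy")   # inputs for the meta-learner
y_meta = metadata["accuracy"]                # performance is the target variable
```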


2.3 How to Define the Metalearning Problem (2)

Dataset characteristics:
- Statistical and information-theoretic measures,
- Landmarks,
- Sub-sampling landmarks.


2.3 How to Define the Metalearning Problem (3)

Statistical and information-theoretic measures:
- Number of classes,
- Class entropy,
- Number of features,
- Ratio of examples to features,
- Degree of correlation between features and target concept,
- etc.
+ Positive and tangible results (e.g., in the StatLog and METAL projects).
- There is a limit on how much information these can capture: they are uni- or bi-variate measures (two attributes, or one attribute and the class).
A small sketch of a few such measures follows.
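A minimal sketch (assuming NumPy is available) computing some of the simple measures listed above:

```python
import numpy as np

def simple_meta_features(X, y):
    """X: (n_examples, n_features) array of inputs; y: array of class labels."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()                       # class distribution
    return {
        "n_examples": X.shape[0],
        "n_features": X.shape[1],
        "n_classes": len(counts),
        "examples_per_feature": X.shape[0] / X.shape[1],
        "class_entropy": float(-(p * np.log2(p)).sum()),
    }
```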


2.3 How to Define the Metalearning Problem (4)

Dataset characteristics (2) - Landmarks (landmarkers):
Measure the performance of a set of simple and fast learning algorithms (landmarkers), e.g. a simplified decision tree. The accuracy of these landmarkers is used to characterize the dataset.
- Which landmarkers should be used is a non-trivial problem.
A sketch with two possible landmarkers follows.
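A sketch of two landmarkers using scikit-learn (the specific choice of a decision stump and 1-nearest-neighbour is only illustrative; as noted above, choosing landmarkers is non-trivial):

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def landmark_features(X, y):
    stump = DecisionTreeClassifier(max_depth=1)     # a very simplified decision tree
    one_nn = KNeighborsClassifier(n_neighbors=1)
    return {
        "stump_accuracy": cross_val_score(stump, X, y, cv=3).mean(),
        "1nn_accuracy": cross_val_score(one_nn, X, y, cv=3).mean(),
    }
```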


2.3 How to Define the Metalearning Problem (5)

Dataset characteristics (3) - Sub-sampling landmarks:
Exploit performance information obtained on simplified versions of the data (samples). Accuracy results on these samples (or sequences of samples) serve to characterize individual datasets and are referred to as sub-sampling landmarks. A meta-learning system that exploited sub-sampling landmarks led to marked improvements (Leite & Brazdil, 2010).


2.3 How to Define the Metalearning Problem (5)

Dataset characteristics (3) - Performance-based dataset similarity:
Some researchers have proposed (Nguyen et al., 2012) that two datasets can be considered similar if the performance of a given set of algorithms on them is similar.


2.3 How to Define the Metalearning Problem (6)

The meta-level model can be in the form of:
- k-NN (used by many, incl. Brazdil, Gama & Henery, 1994),
- Meta-level rules (not very reliable),
- Neural network,
- Ranking trees (Q. Sun, 2013; Sun & Pfahringer, 2013),
- Bagging ensembles (Q. Sun, 2013; Sun & Pfahringer, 2013),
- etc.


2.3 How to Define the Metalearning Problem (7)

One example of a meta-learning system: k-NN with sample-based characterization (Leite & Brazdil, 2010):
- Get the new dataset Dnew.
- Construct partial learning curves: performance of all algorithms on N samples of Dnew.
- Use k-NN to identify the most similar partial learning curves (datasets).
- Retrieve the corresponding curves, adapt and project them to obtain an estimate of performance.
A simplified sketch follows.

One research question: can we obtain (nearly) the same results while obtaining partial learning curves for only some of the algorithms?
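A simplified sketch of the sample-based characterization: partial learning curves are compared with Euclidean distance, and the full-data performance observed on the k most similar stored datasets is averaged. The curve adaptation step of Leite & Brazdil (2010) is deliberately omitted, and the data structures are assumptions for illustration:

```python
import numpy as np

def nearest_datasets(new_partial_curve, stored_partial_curves, k=3):
    """new_partial_curve: accuracies of an algorithm on N growing samples of Dnew.
    stored_partial_curves: dict dataset_name -> accuracy vector of the same length."""
    dists = {name: np.linalg.norm(np.asarray(curve) - np.asarray(new_partial_curve))
             for name, curve in stored_partial_curves.items()}
    return sorted(dists, key=dists.get)[:k]

def estimate_full_performance(neighbours, stored_full_performance):
    """Average the full-data performance observed on the most similar datasets."""
    return float(np.mean([stored_full_performance[d] for d in neighbours]))
```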


2.3 How to Define the Metalearning Problem (8)

[Figure not reproduced.]


3. Considering both Accuracy (or AUC) and Time

If the aim is to consider several different performance measures, e.g. accuracy (or AUC) and time, there are two approaches:
- Multi-objective analysis (e.g. DEA),
- Defining a combined measure.


3. Considering both Accuracy (or AUC) and Time (2)

Defining a combined measure. Some authors maintain that this measure should not include absolute values, but rather ratios:
- Ratios of accuracies of two algorithms,
- Ratios of times of two algorithms (suitably rescaled).

One proposal (S. Abdulrahman & Brazdil, 2014) combines these two ratios; an illustrative sketch of a measure of this general form follows.
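As an illustration only (not necessarily the exact formula proposed by Abdulrahman & Brazdil, 2014), such a measure can be written as the accuracy ratio divided by a rescaled time ratio, with an exponent p controlling how much weight time receives:

```python
def combined_measure(acc_cand, acc_ref, time_cand, time_ref, p=1.0 / 64):
    """Higher is better: rewards accuracy gains over the reference algorithm and
    penalises (mildly, via the exponent p) longer run times."""
    accuracy_ratio = acc_cand / acc_ref
    time_ratio = (time_cand / time_ref) ** p      # "suitably rescaled" time ratio
    return accuracy_ratio / time_ratio
```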


4. Combining Metalearning and Active Testing

In 2.2 we presented a sequential process of following the recommended ranking (keep testing algorithms, following the ranking). This can suffer from two problems: the set of algorithms may contain
1. suboptimal algorithms,
2. very similar algorithms (e.g. variants of the same algorithm with different parameter settings).

Time can be wasted by testing these. How can this be avoided?


4. Combining Metalearning and Active Testing (2)

Eliminating sub-optimal algorithms can be done as follows:
- Process all datasets one by one.
- For each dataset, mark all algorithms that achieved a competitive result (e.g. best / equivalent to the best).
- After all datasets have been processed, drop all unmarked algorithms (they did not win on any dataset).
- Use only the remaining algorithms from then on.
Positive results were reported (Brazdil, Soares & Pereira, 2001). A small sketch follows.
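A small sketch of this elimination step; `results` maps each dataset to the measured performance of every algorithm, and `tolerance` stands in for the test of equivalence to the best (both names are assumptions for illustration):

```python
def competitive_algorithms(results, tolerance=0.01):
    """Keep only algorithms that were best (or within `tolerance` of the best)
    on at least one dataset; all unmarked algorithms can be dropped."""
    marked = set()
    for perfs in results.values():            # perfs: {algorithm: accuracy}
        best = max(perfs.values())
        marked.update(a for a, p in perfs.items() if p >= best - tolerance)
    return marked
```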


4. Combining Metalearning and Active Testing (3)

Dealing with very similar algorithms (1):
Various techniques exist that can identify similar algorithms by considering their performance, e.g. by identifying algorithms that commit correlated errors (see e.g. Lee & Giraud-Carrier, 2011). This could be exploited: algorithms that commit errors similar to those of others could be dropped.


4. Combining Metalearning and Active Testing (4)

Dealing with very similar algorithms (2):
One approach maintains the concept of Current-Best (as in 2.2) and tries to identify the algorithm with the best chance of achieving better performance. The method looks at all previous situations (datasets) in which a particular algorithm won over Current-Best and quantifies the performance gains. The algorithm with the largest performance gains is evaluated next; if it achieves better performance than Current-Best, it replaces Current-Best. See (Leite, Brazdil & Vanschoren, 2012) for more details. A rough sketch of the selection step follows.
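A rough sketch of the selection step, without the dataset-similarity weighting used in the full method: for each candidate, sum the gains it achieved over the current best algorithm on previously seen datasets, and test the candidate with the largest accumulated gain next (the data structures are assumptions for illustration):

```python
def next_algorithm_to_test(candidates, current_best, history):
    """history: dict dataset -> {algorithm: performance} from past experiments."""
    def estimated_gain(candidate):
        return sum(perfs[candidate] - perfs[current_best]
                   for perfs in history.values()
                   if candidate in perfs and current_best in perfs
                   and perfs[candidate] > perfs[current_best])
    return max(candidates, key=estimated_gain)
```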


4. Combining Metalearning and Active Testing (5)

Dealing with very similar algorithms (3): good results were reported.
[Figure: one active testing variant (ATWs_k), which uses performance-based similarity of datasets, compared with random selection (Rand) and Top-N.]


4. Combining Metalearning and Active Testing (6)

Comments:
- The algorithm chosen for testing cannot be similar to Current-Best, as otherwise it would not have been chosen.
- The method does not strictly follow the ranking: it jumps back and forth in the ranking, but similar algorithms are left behind. They are eventually tested, unless the process is terminated earlier.


5. Extending Metalearning to Workflows (1)

A workflow is a (partially) ordered sequence of operators ≈ a plan.
Example of a workflow:
Dataset → Discretization → either (Apply Naïve Bayes (class probabilities) → Class probability thresholding) or (Apply Decision Tree) → Classification.


5. Extending Metalearning to Workflows (2)

Workflows have been incorporated into many DM systems: Weka, KNIME, RapidMiner, SAS, etc.
[Figure: example of a workflow in KNIME.]


5. Extending Metalearning to Workflows (3)

Many different operations can be chosen at any step:
- Pre-processing operations (feature selection, discretization, etc.),
- Classification algorithms (DT, NB, NN, SVM, kNN, ...),
- Parameter settings for each,
- Ensembles (bagging, boosting, etc.).
Consequently:
- Designing complex workflows manually is time consuming.
- The resulting workflow(s) can have suboptimal performance (accuracy, AUC, training time, etc.).
=> Users need support on how to obtain good workflows!


5. Extending Metalearning to Workflows (4)

Naïve approach:
- Generate all possible workflows for a new dataset, exploiting a given ontology of abstract/concrete operators.
- Use meta-knowledge associated with past problems/datasets to:
  - retrieve past workflows associated with similar problems,
  - rank these workflows according to the expected performance.
- Carry out tests to identify the best workflow.

Acknowledgments: The help of Salisu Abdulrahman in the preparation of some of these slides is gratefully acknowledged.


5. Extending Metalearning to Workflows (5)

[Figure: a DM workflow as a hierarchical DAG (adapted from Hilario et al., 2011).]


5. Extending Metalearning to Workflows (6)

[Figure: an HTN plan of the workflow shown earlier. Non-terminal nodes = tasks / methods; abstract operators are in bold; simple operators are in italics.]


5. Extending Metalearning to Workflows (7)

The naïve approach is not practical:
- The number of possible workflows is normally too large;
- Meta-knowledge concerning different workflows may not be available.
Some solutions: preferably expand only the most promising nodes / branches, with the help of meta-knowledge in the form of:
- Association rules (Kietz et al., 2012),
- Conditional probabilities,
- Collaborative filtering (Sebag, invited talk at MetaSel-2014).


End of Part I Thank you for your attention!


References

Books and survey articles:
- P. Brazdil, C. Giraud-Carrier, C. Soares, R. Vilalta: Metalearning - Applications to Data Mining. Springer, 2009.
- K. Smith-Miles: Cross-Disciplinary Perspectives on Meta-Learning for Algorithm Selection. ACM Computing Surveys, 2008.
- F. Serban, J. Vanschoren, J.U. Kietz, A. Bernstein: A Survey of Intelligent Assistants for Data Analysis. ACM Computing Surveys, 2013.


References
- S.M. Abdulrahman, P. Brazdil: Measures for Combining Accuracy and Time for Meta-learning. Proc. of the MetaSel-2014 Workshop at ECAI-2014, CEUR proceedings, 2014.
- P. Brazdil, J. Gama, B. Henery: Characterizing the applicability of classification algorithms using meta-level learning. Machine Learning: ECML-94, pp. 83-102, 1994.
- P.B. Brazdil, C. Soares, J.P. da Costa: Ranking learning algorithms: Using IBL and meta-learning on accuracy and time results. Machine Learning 50(3), pp. 251-277, 2003.
- P. Brazdil, C. Soares, R. Pereira: Reducing rankings of classifiers by eliminating redundant classifiers. Progress in Artificial Intelligence, pp. 14-21, 2001.
- M. Hilario, P. Nguyen, H. Do, A. Woznica, A. Kalousis: Ontology-based meta-mining of knowledge discovery workflows. In N. Jankowski et al. (eds), Meta-Learning in Computational Intelligence, pp. 273-315. Springer Berlin Heidelberg, 2011.
- J.U. Kietz, F. Serban, A. Bernstein, S. Fischer: Designing KDD-workflows via HTN-planning for intelligent discovery assistance. In 5th Planning to Learn Workshop (WS28) at ECAI-2012.


References
- J.W. Lee, C. Giraud-Carrier: A metric for unsupervised metalearning. Intelligent Data Analysis, 2011.
- R. Leite, P. Brazdil: Active Testing Strategy to Predict the Best Classification Algorithm via Sampling and Metalearning. Proc. of ECAI, pp. 309-314, 2010.
- R. Leite, P. Brazdil, J. Vanschoren: Selecting Classification Algorithms with Active Testing. Proc. of MLDM, Springer, 2012.
- M. Mısır, M. Sebag: Algorithm Selection as a Collaborative Filtering Problem. Inria Research Report N°XX, December 2013, Research Centre Saclay - Île-de-France.
- P. Nguyen, J. Wang, M. Hilario, A. Kalousis: Learning Heterogeneous Similarity Measures for Hybrid-Recommendations in Meta-Mining. Proc. of ICDM-2012; also arXiv:1210.1317, 2012.
- J. Rice: The algorithm selection problem. Advances in Computers 15, pp. 65-118, 1976.
- Q. Sun: Meta-Learning and the Full Model Selection Problem. PhD thesis, U. Waikato, 2013.
- Q. Sun, B. Pfahringer: Pairwise Meta-Rules for Better Meta-Learning-Based Algorithm Ranking. Machine Learning 93(1), pp. 141-161, 2013.


Metalearning & Algorithm Selection (Part II)

ECAI-2014 Tutorial T4, 18 August 2014. Carlos Soares, INESC TEC / FEUP, U.Porto

plan

6. metalearning for streaming data
7. metalearning and algorithm selection in other domains
8. metalearning in specific problems

algorithm selection for streams

[Figures: the typical algorithm-selection scenario ... and for streams.]

streams: earlier work on time series
- rule-based forecasting
- expert system approach
  - i.e. based on expert knowledge
- 6 features
  - functional form, outliers (series and last observation), discontinuities, change in trend (whole series and last observations)
- identified based on simple statistics
- tested on time series from the M competition

Adya, M. et al., 2001. Automatic identification of time series features for rule-based forecasting. International Journal of Forecasting, 17(2), pp. 143-157.

streams: earlier metalearning approach for time series
- traditional metalearning architecture
  - based on Kalousis' NOEMON
- algorithms
  - classical time series methods
  - NNs
- combination of types of metafeatures
  - traditional, e.g. number of observations, skewness, kurtosis
  - specific to time series, e.g. trend, autocorrelation, number of turning points
- tested on M3-competition data

Prudêncio, R.B.C. & Ludermir, T.B., 2004. Meta-learning approaches to selecting time series models. Neurocomputing, 61, pp. 121-137.

streams: opportunities/challenges
- granularity of selection
- data characterization

Rossi, A.L.D. et al., 2014. MetaStream: A meta-learning based method for periodic algorithm selection in time-changing data. Neurocomputing, 127, pp. 52-64.

streams: algorithm selection à la streams (1/3)
- assumes
  - data divided into subsets generated by different phenomena
  - stationary within subsets
  - ... which may be recurring
- approach
  - different models used at different times
  - each model may be reused

Gama, J. & Kosina, P., 2011. Learning about the learning process. In Proc. of the 10th Int. Conf. on Advances in Intelligent Data Analysis. Springer-Verlag, pp. 162-172.

streams: algorithm selection à la streams (2/3)
- two-layer architecture
  - level 0: base-level learning
  - level 1: is the classifier able to predict the example correctly?

Gama, J. & Kosina, P., 2011. Learning about the learning process. In Proc. of the 10th Int. Conf. on Advances in Intelligent Data Analysis. Springer-Verlag, pp. 162-172.

streams: algorithm selection à la streams (3/3)
- model management
  - each model has a metamodel (referee)
- if drift is detected at the base level
  - each referee predicts whether its model is good enough
  - the system may decide to learn a new model
(a toy sketch of the referee idea follows)

Gama, J. & Kosina, P., 2011. Learning about the learning process. In Proc. of the 10th Int. Conf. on Advances in Intelligent Data Analysis. Springer-Verlag, pp. 162-172.
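A toy sketch (using scikit-learn rather than a stream-learning library, purely to illustrate the idea) of the two-layer architecture: a referee is trained to predict whether its base model classifies an example correctly, so that after a drift it can estimate whether the stored model is still good enough:

```python
from sklearn.tree import DecisionTreeClassifier

def train_referee(base_model, X, y):
    """Meta-level target: 1 if the base model predicts the example correctly."""
    correct = (base_model.predict(X) == y).astype(int)
    referee = DecisionTreeClassifier(max_depth=3)
    return referee.fit(X, correct)

def model_good_enough(referee, X_recent, threshold=0.7):
    """Fraction of recent examples the referee expects the base model to get right."""
    return referee.predict(X_recent).mean() >= threshold
```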

gps

6. metalearning for streaming data
7. metalearning and algorithm selection in other domains
8. metalearning in specific problems

other domains
- constraint satisfaction
  - boolean satisfiability problem (SAT)
- search/optimization
  - traveling salesman
  - job-shop scheduling
  - quadratic assignment problem
- sorting

Smith-Miles, K.A., 2008. Cross-disciplinary perspectives on meta-learning for algorithm selection. ACM Computing Surveys, 41(1), pp. 1-25.
Kotthoff, L., 2014. Algorithm Selection for Combinatorial Search Problems: A Survey. AI Magazine, pp. 1-17.

algorithm selection in other domains
- different names
  - hyperheuristics
  - landscape analysis
  - empirical hardness
- ... terminology
  - the portfolio is the set of algorithms
- ... goal
  - most often execution time
  - although (meta-)heuristics do not guarantee an optimal solution
- challenges and approaches differ, but the research is quite similar
  - features are similar: (statistical) properties of the instances, landmarkers, problem-specific measures
  - level of domain knowledge used
  - related to the possibility of switching algorithms during search
    - ... except for dynamical ones
- even stacking has been adapted!

Kotthoff, L. (2012). Hybrid regression-classification models for algorithm selection. In 20th European Conference on Artificial Intelligence, pp. 480-485.

other domains vs machine learning
- easier to obtain meta-examples
  - artificial instances are often used for research
- possible to switch method during search
  - search methods operate on the same space
  - except for Brodley (93)

algorithm selection for optimization: TSP (traveling salesman problem)

main contributions
- meta-features
  - network measures
  - algorithm-specific properties
  - optimization landmarkers
- multi-label classification
- systematic generation of instances

Jorge Kanda, Carlos Soares, Eduardo R. Hruschka, André Carlos Ponce de Leon Ferreira de Carvalho: A Meta-Learning Approach to Select Meta-Heuristics for the Traveling Salesman Problem Using MLP-Based Label Ranking. ICONIP (3) 2012: 488-495.

dynamic selection of operators
- JSS (job-shop scheduling)
  - find a schedule that minimizes total completion time
  - ... without conflicts
- learn a model
  - for each conflict
  - given the current schedule and what remains to be done
  - which heuristic to use

Abreu, P., Soares, C. & Valente, J.M., 2009. Selection of Heuristics for the Job-Shop Scheduling Problem Based on the Prediction of Gaps in Machines. In Learning and Intelligent Optimization: Third International Conference (LION 3), Selected Papers. Berlin, Heidelberg: Springer-Verlag, pp. 134-147.

gps

6. metalearning for streaming data
7. metalearning and algorithm selection in other domains
8. metalearning in specific problems

algorithm selection for gene expression data (microarray data)

main contributions
- metafeatures based on clustering validity measures
- weighted k-NN for ranking

[Figure adapted from (Harrington et al., 2000).]

Bruno Feres de Souza, André Carlos Ponce de Leon Ferreira de Carvalho, Carlos Soares: A comprehensive comparison of ML algorithms for gene expression data classification. IJCNN 2010: 1-8.

response automation in a helpdesk
- finding the best answer to customer emails
- select the best method
  - based on similarity of the email
  - and on the performance of the methods

Marom, Y., Zukerman, I. & Japkowicz, N., 2007. A meta-learning approach for selecting between response automation strategies in a help-desk domain. In Proceedings of AAAI'07, the 22nd National Conference on Artificial Intelligence, Volume 1. AAAI Press, pp. 907-912.

metalearning for shelf space allocation in retail

[Figures: relation of predictive performance and data characteristics; solution architecture.]

Fábio Pinto, Carlos Soares: Space Allocation in the Retail Industry: A Decision Support System Integrating Evolutionary Algorithms and Regression Models. ECML/PKDD (3) 2013: 531-546.

references: streams
- Adya, M. et al., 2001. Automatic identification of time series features for rule-based forecasting. International Journal of Forecasting, 17(2), pp. 143-157.
- Prudêncio, R.B.C. & Ludermir, T.B., 2004. Meta-learning approaches to selecting time series models. Neurocomputing, 61, pp. 121-137.
- Lemke, C. & Gabrys, B., 2010. Meta-learning for time series forecasting and forecast combination. Neurocomputing, 73(10-12), pp. 2006-2016.
- Rossi, A.L.D. et al., 2014. MetaStream: A meta-learning based method for periodic algorithm selection in time-changing data. Neurocomputing, 127, pp. 52-64.
- Gama, J. & Kosina, P., 2011. Learning about the learning process. In Proc. of the 10th Int. Conf. on Advances in Intelligent Data Analysis. Springer-Verlag, pp. 162-172.
- van Rijn, J.N. et al., 2014. Algorithm Selection on Data Streams. To be published in Discovery Science.

references: other domains
- Smith-Miles, K.A., 2008. Cross-disciplinary perspectives on meta-learning for algorithm selection. ACM Computing Surveys, 41(1), pp. 1-25.
- Kotthoff, L., 2014. Algorithm Selection for Combinatorial Search Problems: A Survey. To be published in AI Magazine.
- Guo, H., 2003. Algorithm selection for sorting and probabilistic inference: A machine learning-based approach. Ph.D. dissertation, Kansas State University.
- Lagoudakis, M., Littman, M. and Parr, R., 2001. Selecting the right algorithm. In Proceedings of the AAAI Fall Symposium Series: Using Uncertainty within Computation.
- Burke, E.K. et al., 2010. A Classification of Hyper-heuristic Approaches. Handbook of Metaheuristics, 146, pp. 1-21.
- Smith-Miles, K.A. et al., 2009. Understanding the Relationship between Scheduling Problem Structure and Heuristic Performance using Knowledge Discovery. LNCS, Learning and Intelligent Optimization (LION), 3.
- Kotthoff, L., 2012. Hybrid regression-classification models for algorithm selection. In 20th European Conference on Artificial Intelligence, pp. 480-485.
- Jorge Kanda, Carlos Soares, Eduardo R. Hruschka, André Carlos Ponce de Leon Ferreira de Carvalho: A Meta-Learning Approach to Select Meta-Heuristics for the Traveling Salesman Problem Using MLP-Based Label Ranking. ICONIP (3) 2012: 488-495.
- Abreu, P., Soares, C. & Valente, J.M., 2009. Selection of Heuristics for the Job-Shop Scheduling Problem Based on the Prediction of Gaps in Machines. In Learning and Intelligent Optimization: Third International Conference (LION 3), Selected Papers. Berlin, Heidelberg: Springer-Verlag, pp. 134-147.

references: specific applications
- Bruno Feres de Souza, André Carlos Ponce de Leon Ferreira de Carvalho, Carlos Soares: A comprehensive comparison of ML algorithms for gene expression data classification. IJCNN 2010: 1-8.
- Marom, Y., Zukerman, I. & Japkowicz, N., 2007. A meta-learning approach for selecting between response automation strategies in a help-desk domain. In Proceedings of AAAI'07, the 22nd National Conference on Artificial Intelligence, Volume 1. AAAI Press, pp. 907-912.
- Fábio Pinto, Carlos Soares: Space Allocation in the Retail Industry: A Decision Support System Integrating Evolutionary Algorithms and Regression Models. ECML/PKDD (3) 2013: 531-546.

NETWORKED MACHINE LEARNING

#OpenML

JOAQUIN VANSCHOREN (TU/e), 2014

Research different.

GALILEO GALILEI DISCOVERS SATURN’S RINGS

1610

'SMAISMRMILMEPOETALEUMIBUNENUGTTAUIRAS'

Research different.
Royal Society: Take nobody's word for it.
Scientific journal: reputation-based culture.

300 YEARS LATER

NETWORKED SCIENCE

JOURNAL: LONG-TERM MEMORY
INTERNET: WORKING MEMORY
HUGE, ONLINE DATASETS / COMPLEX CODE / REAL-TIME DISCUSSION / OPEN, MASSIVE COLLABORATION / COMBINE, REUSE SCIENTIFIC RESULTS / CITIZEN SCIENCE

Collaboration: Broadcast a question, hoping that many minds can find a solution.
Transparency: Data is the new soil.
Sharing: Broadcast data, believing that many minds will ask the right questions.
Democratization: With the right tools, everybody can be a scientist.

Opportunities for scientists

DESIGNED SERENDIPITY
• Right person, right time, right place
• What is impossible for one scientist is easy for the next
• Right expertise, resources at the right time
• Critical mass: ideas spark new ideas
• Dramatically speeds up progress

REPUTATION
• Authorship: easier collaboration, contributions visible
• Visibility: earn respect from (notable) peers
• Productivity: exploit your special insight and advantage
• Citation: share data so that many can build on it
• Funding: frequent reuse shows value to the community

MACHINE LEARNING
• Ideal candidate for networked science
• Highly complex data, code, large-scale experiments, impossible to print in papers
• Experiments are not shared online: impossible to build on prior work, start each time from scratch
• Low generalisability: studies contradict each other
• Low reproducibility: code, experiment details missing

NETWORKED MACHINE LEARNING
• Easily find code and data, reuse them in novel ways
• Reuse prior experiments to quickly answer questions
• Faster benchmarking, leaderboards
• Mine collected results for patterns: meta-learning
• Collaborate with many people (interdisciplinary)
• Great for students, education

BEYOND JOURNALS

• Enriches research output, linked to papers
• Freely accessible
• Organized online
• Continuously updated
• Immensely detailed
• Reproducible
• Online discussion
• Diminishes publication bias

MORE TIME
• Automate routinizable work:
  • Find code and data online
  • Set up, run & organize experiments (tools)
  • Relate to the state of the art (benchmarks)
  • Annotate code and data
• Full log of your research
• Keep control of your data, code, experiments

MORE CREDIT
• Citation: the tool tells others how to cite your work; data is easy for others to find
• Altmetrics: tracks how often your work is reused (ArXiv)
• Productivity: more time for actual research
• Visibility: collaborate, climb leaderboards

Demo

WEKA PLUGIN

MOA PLUGIN

OPENML PACKAGE (R)

RAPIDMINER PLUGIN

1. OPERATOR TO DOWNLOAD TASK (TASK TYPE SPECIFIC)
2. SUBWORKFLOW THAT SOLVES THE TASK, GENERATES RESULTS
3. OPERATOR FOR UPLOADING RESULTS
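For comparison, a minimal sketch of the same download-task / solve / upload pattern using the OpenML Python package rather than the RapidMiner plugin shown above (this assumes the `openml` package is installed and an API key is configured; the task id is an arbitrary example):

```python
import openml
from sklearn.tree import DecisionTreeClassifier

task = openml.tasks.get_task(31)                  # 1. download a task from OpenML
clf = DecisionTreeClassifier()                    # 2. something that solves the task
run = openml.runs.run_model_on_task(clf, task)    #    run it locally on the task's splits
run.publish()                                     # 3. upload the results to the server
```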

FUTURE WORK

• OpenML studies: online representation of a paper: data, code, runs, discussions, …
• Social layer: control visibility: public, friends, private
• Collaborative leaderboards: all top-3 contributors
• More data types, tasks

JOIN THE TEAM

SPREAD THE WORD

#OpenML