Multi-Scale Search for Black-Box Optimization: Theory & Algorithms

PhD Oral Defense for Abdullah Al-Dujaili
Supervised by Assoc. Prof. Suresh Sundaram
School of Computer Science & Engineering, Nanyang Technological University

Outline
• Background
• Motivation
• Contributions
  – Finite-Time Analysis
  – Algorithms for Expensive & Multi-Objective Optimization
  – Benchmarking Platform
• Conclusion
• Future Work

Optimization

Recurrent topic of interest for centuries.

Many applications:
• Control/planning
• Machine learning
• Design/manufacture

Many sub-fields:
• Convex
• Discrete
• Multi-objective

Optimization

Objective Function:
• Zero-order (value)
• Closed-form
• High-order (gradient)
• Smoothness

Black-Box Optimization

Objective Function:
• Zero-order (value)

A search problem through point-wise evaluations.
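To make the protocol concrete, here is a minimal sketch (mine, not from the thesis): the solver may only submit points and observe values. The objective `black_box` and the random-search baseline are illustrative placeholders.

```python
import random

def black_box(x):
    # Hypothetical objective; the solver must treat it as opaque.
    return sum((xi - 0.3) ** 2 for xi in x)

def random_search(f, dim, budget, lo=0.0, hi=1.0):
    """Zero-order baseline: the only interaction is point-wise evaluation."""
    best_x, best_y = None, float("inf")
    for _ in range(budget):
        x = [random.uniform(lo, hi) for _ in range(dim)]
        y = f(x)  # one point-wise evaluation
        if y < best_y:
            best_x, best_y = x, y
    return best_x, best_y

print(random_search(black_box, dim=2, budget=100))
```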

Black-Box Applications

Geijtenbeek, Thomas, Michiel van de Panne, and A. Frank van der Stappen. "Flexible muscle-based locomotion for bipedal creatures." ACM Transactions on Graphics (TOG) 32.6 (2013): 206.

The muscle routing and control parameters are optimized using the Covariance Matrix Adaptation [Hansen, 2006] black-box algorithm.

Black-Box Optimization: Some Challenges
• Dimensionality
• Modality
• Complexity
• Conditioning

Courtesy of Chen et al., 2015

Black-Box Approaches

Passive Optimization [Sukharev, 1971]
• Sample a grid of n points.
• Return the best point.
• Inefficient.

Active Optimization [Piyavskii, 1972; Shubert, 1972]
• Sequential decision-making: the next point depends on the previous points.
• Return the best point.
• Trades off exploration vs. exploitation.

[Diagram: a solver querying the objective function point-wise.]
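A hedged sketch of the contrast on a 1-D problem: the passive grid fixes all queries up front, while the active loop adapts each query to past evaluations. Helper names (`passive_grid`, `active_trisection`) are mine; the trisection rule spends two evaluations per round.

```python
def f(x):
    return (x - 0.7) ** 2

def passive_grid(f, n, lo=0.0, hi=1.0):
    # All n query points are chosen before any evaluation is seen.
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return min(xs, key=f)

def active_trisection(f, rounds, lo=0.0, hi=1.0):
    # Each new query depends on the outcomes of the previous ones.
    for _ in range(rounds):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2  # shrink toward the better third
        else:
            lo = m1
    return (lo + hi) / 2

print(passive_grid(f, 10), active_trisection(f, 5))
```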

Active Black-Box Optimization

Heuristic Algorithms [Srinivas & Patnaik, 1994]
• Nature-inspired
• Probabilistic guarantees
• E.g., Swarm Intelligence

MP Algorithms [Piyavskii, 1972; Shubert, 1972]
• Developed by the Mathematical Programming (MP) community
• A systematic procedure of sampling points to direct the search and construct models of the objective function
• Asymptotic convergence

Heuristics vs. MP Algorithms

[Audet & Kokkolaras, 2016]

Main Issues of MP Black-Box Optimization

1. Theoretical Analysis
• Several methods have been analyzed independently.
• Asymptotic analysis [Torn & Zilinskas, 1989; Conn et al., 1997; Sergeyev, 1998; Lewis & Torczon, 2002; Finkel & Kelley, 2004; Audet & Dennis, 2006; Sergeyev et al., 2016].
• Few papers have addressed finite-time performance [Hansen, 1991; Munos, 2011].

2. Expensive Optimization
• Inherently exploratory.
• May reduce to a grid search [Finkel & Kelley, 2004].

3. Multi-Criterion Decision Making
• Tailored for single-objective optimization.
• Few investigations on multi-objective optimization [Audet et al., 2010; Custodio et al., 2011].

Focus of this thesis

Black-Box Optimization Algorithms:
• Passive
• Active
  – Heuristics
  – MP: Trust-Region Methods; Direct Search (Simplex Methods, Line Search); Partitioning (Multi-Scale Search Optimization – MSO) Methods, the focus of this thesis

Contributions to MP Black-Box Optimization

1. Theoretical Analysis Contribution
• Finite-time analysis of several multi-scale search algorithms for single-objective optimization (Lipschitz, DIRECT, MCS) in a unified framework building on the work of Munos, 2011.
• Application of the theoretical framework to multi-objective optimization, providing a finite-time upper bound on the Pareto-compliant unary additive epsilon indicator.
• Numerical validation of the theoretical bounds on a set of synthetic problems.

Contributions to MP Black-Box Optimization

2. Algorithmic Design Contribution
• Proposed an exploitative MSO algorithm (NMSO) for expensive single-objective optimization with finite-time convergence.
• Developed two MSO algorithms for multi-objective optimization: MO-DIRECT and MO-SOO.

3. Benchmarking Contribution
• A thorough empirical analysis and comparison of classical (DIRECT, MCS), commercial (BB-LS), and recent (SOO, BaMSOO) algorithms.
• Developed the Black-box Multi-objective Optimization Benchmarking (BMOBench) platform based on 100 multi-objective optimization problems from the literature, collected by Custodio et al., 2011.

Contribution I: Finite-Time Analysis of MSO Algorithms

Al-Dujaili, Abdullah, S. Suresh, and N. Sundararajan. "MSO: a framework for bound-constrained black-box global optimization algorithms." Journal of Global Optimization 66.4 (2016): 811-845.

Active Optimization: Exploration vs. Exploitation

Initial investigations date back to 1933 [Thompson, 1933; Robbins, 1952]. Formally known as the multi-armed bandit (MAB) problem.

Courtesy of https://conversionxl.com/bandit-tests/

Under MAB, continuous black-box optimization can be framed in one of two ways:
• an infinitely many-armed bandit
• a hierarchy of multi-armed bandits
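For intuition, the textbook UCB1 rule (standard bandit material, not thesis code) shows how a single score combines exploitation (an arm's mean reward) with an exploration bonus that shrinks the more that arm is pulled:

```python
import math

def ucb1_pick(means, counts, t):
    # Score each arm: empirical mean + exploration bonus.
    scores = [m + math.sqrt(2 * math.log(t) / n)
              for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)

# After 10 pulls in total, under-explored arms get a larger bonus:
print(ucb1_pick(means=[0.4, 0.5, 0.3], counts=[4, 3, 3], t=10))
```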

Black-Box Optimization
• Employ a divide-and-conquer partitioning tree over the search space.
• Assign exploration and exploitation scores to each node.
• Iteratively expand nodes based on their scores (a minimal sketch follows).
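A minimal one-dimensional sketch of this loop, under assumptions of mine: trisection-based partitioning and an illustrative score that rewards both a low center value (exploitation) and a large interval size (exploration). Actual MSO algorithms differ in both ingredients.

```python
def mso_minimize(f, lo, hi, budget):
    # Each leaf of the partitioning tree: (left, right, value at center).
    leaves = [(lo, hi, f((lo + hi) / 2))]
    evals = 1
    while evals + 2 <= budget:
        # Expand the leaf with the best combined score.
        i = min(range(len(leaves)),
                key=lambda j: leaves[j][2] - (leaves[j][1] - leaves[j][0]))
        l, r, v = leaves.pop(i)
        third = (r - l) / 3
        # Trisect; the middle child reuses the parent's center evaluation.
        leaves.append((l, l + third, f(l + third / 2)))
        leaves.append((l + third, r - third, v))
        leaves.append((r - third, r, f(r - third / 2)))
        evals += 2
    return min(leaves, key=lambda leaf: leaf[2])

print(mso_minimize(lambda x: (x - 0.2) ** 2, 0.0, 1.0, budget=60))
```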

Classical Method: Lipschitzian Optimization [Piyavskii, 1972; Shubert, 1972]

At time t = 0, interval = [a, b].

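A compact sketch of Piyavskii-Shubert minimization with a known Lipschitz constant C; the sampling point and bound follow from intersecting the two cone-shaped lower bounds f(x) >= f(x_i) - C|x - x_i| over each sub-interval (variable names are mine):

```python
def piyavskii(f, a, b, C, budget):
    pts = sorted([(a, f(a)), (b, f(b))])
    for _ in range(budget - 2):
        best = None
        for (x1, y1), (x2, y2) in zip(pts, pts[1:]):
            # Minimizer and minimum of the lower envelope on [x1, x2]:
            x = 0.5 * (x1 + x2) + (y1 - y2) / (2 * C)
            bound = 0.5 * (y1 + y2) - 0.5 * C * (x2 - x1)
            if best is None or bound < best[0]:
                best = (bound, x)
        pts.append((best[1], f(best[1])))  # sample where the bound is lowest
        pts.sort()
    return min(pts, key=lambda p: p[1])

print(piyavskii(lambda x: (x - 0.6) ** 2, 0.0, 1.0, C=2.0, budget=20))
```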

Black-Box Optimization

[Figure series: tree leaves plotted as points, with interval size on one axis and function value at the interval's center on the other. A purely local search selects the interval with the lowest function value; a purely global search selects the largest interval; a line of slope C interpolates between the two, selecting the interval that minimizes the value offset by C times the size.]
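The trade-off in the figure can be written as a one-line selection rule; this is an illustrative reading of the plot (with the slope weight C as a knob), not any specific algorithm's code:

```python
def select(leaves, C):
    # leaves: list of (size, value); minimize value offset by C * size.
    return min(leaves, key=lambda sv: sv[1] - C * sv[0])

leaves = [(1.0, 0.9), (0.3, 0.2), (0.1, 0.05)]
print(select(leaves, C=0.01))  # ~local search: lowest function value wins
print(select(leaves, C=10.0))  # ~global search: largest interval wins
```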

MSO Algorithms in the Literature

Lipschitzian Optimization (LO)
• Shubert, B. O. "A sequential method seeking the global maximum of a function." SIAM Journal on Numerical Analysis 9.3 (1972): 379-388.
• Piyavskii, S. "An algorithm for finding the absolute extremum of a function." USSR Computational Mathematics and Mathematical Physics 12.4 (1972): 57-67.

Branch and Bound (BB)
• Pinter, J. Global Optimization in Action: Continuous and Lipschitz Optimization: Algorithms, Implementations and Applications. Volume 6. Springer Science & Business Media, 1995.

Dividing RECTangles (DIRECT)
• Jones, D.R., Perttunen, C.D., and Stuckman, B.E. "Lipschitzian optimization without the Lipschitz constant." Journal of Optimization Theory and Applications 79.1 (1993): 157-181.

Multilevel Coordinate Search (MCS)
• Huyer, W. and Neumaier, A. "Global optimization by multilevel coordinate search." Journal of Global Optimization 14.4 (1999): 331-355.

Simultaneous Optimistic Optimization (SOO)
• Munos, R. "Optimistic optimization of a deterministic function without the knowledge of its smoothness." In Advances in Neural Information Processing Systems, 2011.

Convergence Analysis

How much exploration is needed?

For any ε > 0, a global search explores the set of ε-optimal states X_ε = { x ∈ X : f(x) ≤ f(x*) + ε }.

Quantifying the size of X_ε measures how much exploration is needed. As long as the nodes covering X_ε keep being expanded, convergence is guaranteed.

Quantifying Exploration

Packing balls as an approach:
• covering number [Hsu et al., 2007]
• zooming dimension [Kleinberg et al., 2008]
• near-optimality dimension [Bubeck et al., 2008]

Provided:
• local Hölder continuity
• bounded partitions
• well-shaped partitions
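For reference, the packing-based definition these works converge on, stated loosely and up to constants (minimization, with optimum f*):

```latex
X_\varepsilon = \{\, x \in \mathcal{X} : f(x) \le f^* + \varepsilon \,\},
\qquad
d \;=\; \inf\bigl\{\, d' \ge 0 \;:\; \exists\, C > 0,\ \forall \varepsilon > 0,\;
\mathcal{N}(X_\varepsilon, \varepsilon) \le C\, \varepsilon^{-d'} \bigr\},
```

where N(A, r) is the maximal number of disjoint radius-r balls with centers in A. A small d means the ε-optimal set packs into few balls, so little exploration suffices.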


The Near-optimality Dimension

Number of intervals to be explored ~ C ε^(−d), where d is the near-optimality dimension (per the packing definition above).

Convergence Analysis

Theoretical convergence rates for MSO algorithms

DOO & SOO analysis [Munos, 2011]

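Stated loosely and up to constants, the finite-time loss rates from Munos (2011) that this analysis builds on take the following shape, with near-optimality dimension d and per-depth cell-size decay ρ ∈ (0, 1) (my paraphrase of the published bounds):

```latex
r_n \;=\; f(x_n) - f^* \;\le\;
\begin{cases}
  O\!\bigl(n^{-1/d}\bigr)          & \text{DOO, } d > 0,\\[2pt]
  O\!\bigl(\rho^{\,n/C}\bigr)      & \text{DOO, } d = 0,\\[2pt]
  O\!\bigl(\rho^{\,\sqrt{n}}\bigr) & \text{SOO, } d = 0.
\end{cases}
```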

Numerical Validation of Theoretical Bounds

Empirical convergence rate and its theoretical bound, with respect to the number of function evaluations, for minimizing a synthetic test function; the bound is evaluated using symbolic maths.

Summary of Contribution I
• Finite-time analysis of several multi-scale search algorithms for single-objective optimization (Lipschitz, DIRECT, MCS).
• A unified framework building on the work of Munos, 2011 and the notion of packing balls.
• Numerical validation of the theoretical bounds on a set of synthetic problems.


Contribution II: NMSO, an MSO Algorithm for Expensive Optimization

Al-Dujaili, Abdullah, and S. Suresh. "A Naive multi-scale search algorithm for global optimization problems." Information Sciences 372 (2016): 294-312.

Motivation
• MSO has been dominantly exploratory: the DIRECT algorithm may reduce to an exhaustive grid search [Finkel & Kelley, 2004].
• Some algorithms incorporate local search (exploitation) only as a separate component, e.g., the MCS algorithm [Huyer & Neumaier, 1999].
• Expensive optimization (i.e., a limited number of function evaluations) is the more practically relevant setting.
• Goal: incorporate local search (exploitation) in the MSO framework and provide finite-time convergence → NMSO.


Naïve Multi-scale Search Optimization (NMSO)
• Sampled function value as exploitation score.
• Depth as exploration score.
• Depth-wise expansion until no further improvement is noticed, then revisit the root.
• Guarantees asymptotic convergence.
• Available at http://ash-aldujaili.github.io/NMSO/
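A schematic sketch of the depth-wise policy described above (my simplification; the published NMSO handles child bookkeeping, root revisits, and convergence conditions more carefully):

```python
def depth_wise_descent(f, lo, hi, max_depth):
    # Follow the best child downward while the best sampled value improves.
    best = f((lo + hi) / 2)
    depth = 0
    while depth < max_depth:
        third = (hi - lo) / 3
        kids = [(lo, lo + third), (lo + third, hi - third), (hi - third, hi)]
        vals = [f((l + r) / 2) for l, r in kids]
        v = min(vals)
        if v >= best:  # no further improvement: end this descent
            break      # (NMSO would now revisit the root)
        best = v
        lo, hi = kids[vals.index(v)]
        depth += 1
    return best

print(depth_wise_descent(lambda x: (x - 0.8) ** 2, 0.0, 1.0, max_depth=20))
```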

“No further improvement” in NMSO for 2D problems


Theoretical Performance

*Proof is presented in the thesis.


Numerical Validation on 1000 problems

*More comprehensive numerical experiments are presented in the thesis.


Summary of Contribution II

NMSO: an MSO algorithm for expensive optimization.

Enjoys:
• a finite-time convergence rate
• asymptotic convergence

Limitations:
• ill-conditioned functions
• dimensionality


Contribution III: MSO for Multi-Objective Optimization

• Al-Dujaili, Abdullah, and S. Suresh. "Dividing rectangles attack multi-objective optimization." Evolutionary Computation (CEC), 2016 IEEE Congress on. IEEE, 2016.
• Al-Dujaili, Abdullah, and S. Suresh. "Multi-objective simultaneous optimistic optimization." arXiv preprint arXiv:1612.08412 (2016).
• Al-Dujaili, Abdullah, and S. Suresh. "A MATLAB Toolbox for Surrogate-Assisted Multi-Objective Optimization: A Preliminary Study." Proceedings of the 2016 Genetic and Evolutionary Computation Conference Companion. ACM, 2016.
• Wong, Cheryl Sze Yin, Abdullah Al-Dujaili, and Suresh Sundaram. "Hypervolume-Based DIRECT for Multi-Objective Optimisation." Proceedings of the 2016 Genetic and Evolutionary Computation Conference Companion. ACM, 2016.


Motivation
• The bulk of MSO algorithms have been tailored towards single-objective optimization.
• In practice, it is desirable to optimize multiple conflicting objectives, e.g., prediction accuracy vs. training time.
• Existing convergence analysis covers finite-set and/or discrete problems [Rudolph, 1998; Kumar & Banerjee, 2005], probabilistic guarantees [Hanne, 1999], and a total order on the solutions [Gabor et al., 1998].

Potentially-Optimal Nodes

Single-objective MSO algorithm: DIRECT [Jones et al., 1993]
• Function value (local score) vs. size (global score); a node is potentially optimal if it wins for some trade-off between the two (see the check below).
• Extension: from the 2D-space projection to an (m+1)D-space, where m is the number of objectives.
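In the single-objective case the rule can be checked directly: a node with (size s, value v) is potentially optimal iff some slope K ≥ 0 makes it minimize v − K·s over all nodes. A plain O(n²) feasibility check (illustrative; it omits DIRECT's extra sufficient-improvement condition):

```python
def potentially_optimal(nodes):
    """nodes: list of (size, value); returns the potentially-optimal ones."""
    kept = []
    for s, v in nodes:
        k_lo, k_hi, ok = 0.0, float("inf"), True
        for s2, v2 in nodes:
            if s2 > s:            # larger nodes cap K from above
                k_hi = min(k_hi, (v2 - v) / (s2 - s))
            elif s2 < s:          # smaller nodes bound K from below
                k_lo = max(k_lo, (v2 - v) / (s2 - s))
            elif v2 < v:          # same size, strictly better value
                ok = False
        if ok and k_lo <= k_hi:
            kept.append((s, v))
    return kept

print(potentially_optimal([(1.0, 0.9), (0.5, 0.4), (0.3, 0.5), (0.1, 0.2)]))
# -> [(1.0, 0.9), (0.5, 0.4), (0.1, 0.2)]
```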

Potentially-Optimal Nodes for Multi-Objective Optimization

Proposed two MSO algorithmic instances for multi-objective optimization based on DIRECT and SOO: MO-DIRECT and MO-SOO.
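The multi-objective analogue replaces the scalar comparison with Pareto dominance in the (m+1)-D space of m objective values plus a size coordinate. A minimal dominance filter (standard definition, minimization; the precise node-selection rules of MO-DIRECT and MO-SOO are in the thesis):

```python
def dominates(a, b):
    # a dominates b: no worse everywhere, strictly better somewhere.
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def non_dominated(points):
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Two objectives plus size (negated so larger intervals count as better):
nodes = [(0.2, 0.9, -1.0), (0.4, 0.4, -0.5), (0.5, 0.5, -0.3)]
print(non_dominated(nodes))  # the third node is dominated by the second
```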

Empirical Validation on 1000 problems

https://bbcomp.ini.rub.de/results/BBComp2016-2OBJ-expensive/summary.html

Theoretical Performance

A finite-time upper bound on the Pareto-compliant quality indicator: the additive epsilon indicator.
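For concreteness, the unary additive epsilon indicator of Zitzler et al. (2003), the quantity the bound targets, is the smallest shift ε such that every reference point is weakly dominated by some shifted approximation point (minimization; the toy fronts are illustrative):

```python
def additive_epsilon(approx, reference):
    # I_eps+(A, R) = max over r of min over a of the largest gap a_i - r_i.
    return max(
        min(max(ai - ri for ai, ri in zip(a, r)) for a in approx)
        for r in reference
    )

A = [(0.2, 0.8), (0.6, 0.4)]   # approximation front
R = [(0.1, 0.7), (0.5, 0.3)]   # reference front
print(additive_epsilon(A, R))  # 0.1: shifting A by 0.1 covers R
```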

Empirical Validation of Theoretical Bounds

Code for reproducing these bounds can be found at https://www.dropbox.com/s/ssiq1m52hczuj7a/mosoo-theory-validation.rar?dl=0

Summary of Contribution III
• Extension of MSO algorithms to multi-objective optimization: MO-SOO & MO-DIRECT.
• For the first time in the literature, a finite-time upper bound on the Pareto-compliant additive epsilon indicator, down to the conflict dimension.
• Numerical validation of the bound on a synthetic set of problems.


Contribution IV: Optimization Algorithms Benchmarking

• Al-Dujaili, Abdullah, and S. Suresh. "BMOBench: Black-Box Multi-Objective Optimization Benchmarking Platform." arXiv preprint arXiv:1605.07009 (2016).
• Tanweer, M. R., Abdullah Al-Dujaili, and S. Suresh. "Empirical Assessment of Human Learning Principles Inspired PSO Algorithms on Continuous Black-Box Optimization Testbed." International Conference on Swarm, Evolutionary, and Memetic Computing. Springer International Publishing, 2015.
• Al-Dujaili, Abdullah, Muhammad Rizwan Tanweer, and Sundaram Suresh. "On the Performance of Particle Swarm Optimization Algorithms in Solving Cheap Problems." Computational Intelligence, 2015 IEEE Symposium Series on. IEEE, 2015.
• Al-Dujaili, Abdullah, Muhammad Rizwan Tanweer, and Sundaram Suresh. "DE vs. PSO: A Performance Assessment for Expensive Problems." Computational Intelligence, 2015 IEEE Symposium Series on. IEEE, 2015.
• Al-Dujaili, Abdullah, and S. Suresh. "Analysis of the Bayesian multi-scale optimistic optimization on the CEC2016 and BBOB testbeds." Evolutionary Computation (CEC), 2016 IEEE Congress on. IEEE, 2016.
• Tanweer, M. R., A. Al-Dujaili, and S. Suresh. "Multi-Objective Self Regulating Particle Swarm Optimization Algorithm for BMOBench Platform."

Empirical Assessment on COCO
• Comprehensive benchmark of established MSO algorithms: classical (DIRECT, MCS), commercial (BB-LS), and recent (SOO, BaMSOO).
• NMSO, BB-LS, and MCS are suitable for expensive optimization.
• BaMSOO is suitable for cheap optimization problems.
• A detailed discussion is presented in the thesis.

BMOBench
• Inspired by COCO for single-objective optimization.
• A platform of 100 problems in C / MATLAB.
• Accommodates stochastic and deterministic algorithms.
• Data profiles generated in terms of 4 quality indicators.
• Compiled in a LaTeX-template file.
• Special session at SSCI 2016, Greece.
• Available at http://ash-aldujaili.github.io/BMOBench/

Acknowledgement: the BO-BBOB platform [Brockhoff et al., 2015] and the work of [Custódio et al., 2011]. Thanks extended to Bhavarth Pandya, Chaitanya Prasad, Khyati Mahajan, and Shaleen Gupta.
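A hedged sketch of the data-profile curves such platforms report: for each budget, the fraction of problems on which the target was reached within that many evaluations (the actual platform normalizes budgets and repeats this per quality indicator; this sketch strips those details):

```python
def data_profile(runs, budgets):
    # runs: problem -> evaluations needed to hit the target (None = never).
    n = len(runs)
    return [sum(1 for e in runs.values() if e is not None and e <= b) / n
            for b in budgets]

runs = {"p1": 120, "p2": None, "p3": 40, "p4": 800}
print(data_profile(runs, budgets=[50, 200, 500, 1000]))
# -> [0.25, 0.5, 0.5, 0.75]
```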

Summary of Contribution IV
• Comprehensive benchmarking of several established MSO algorithms to complement the theoretical analysis.
• BMOBench: a benchmarking platform for multi-objective algorithms with 100 problems.

Conclusion & Future Work

Conclusion
1. A Theoretical Framework for Finite-Time Analysis
2. A Finite-Time Pareto-Compliant Indicator Bound
3. An Exploitative Multi-Scale Search Algorithm for Expensive Optimization
4. Multi-Scale Search Algorithms for Multi-Objective Optimization
5. Comprehensive Benchmarking of MSO Algorithms
6. A New Benchmark for Multi-Objective Optimization

Future Directions
• Towards Adaptive Partitioning
• Towards Large-Scale Problems
• Towards AI Applications


The Curse of Dimensionality – on 1K problems

Theoretical Bound for Large-Scale Optimization

Al-Dujaili, A., & Suresh, S. "Embedded Bandits for Large-Scale Black-Box Optimization." AAAI Conference on Artificial Intelligence (2017).

EmbeddedHunter for Large-Scale Optimization

Available at https://goo.gl/nYDVBY

Al-Dujaili, A., & Suresh, S. "Embedded Bandits for Large-Scale Black-Box Optimization." AAAI Conference on Artificial Intelligence (2017).

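This line of work exploits low effective dimensionality; as a rough illustration (my sketch, not the paper's algorithm), a random embedding lets any low-dimensional black-box solver attack a high-dimensional problem:

```python
import random

D, d = 1000, 2  # ambient dimension vs. embedding dimension
A = [[random.gauss(0, 1) for _ in range(d)] for _ in range(D)]

def f(x):
    # Hypothetical objective with effective dimensionality 2.
    return (x[3] - 1) ** 2 + (x[42] + 2) ** 2

def g(y):
    # Low-dimensional proxy: optimize g over a small box in R^d.
    x = [sum(A[i][j] * y[j] for j in range(d)) for i in range(D)]
    return f(x)

print(g([0.1, -0.2]))  # any 2-D solver (e.g., the MSO loop above) applies
```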

Future Directions
• Towards Adaptive Partitioning
• Towards Large-Scale Problems
• Towards AI Applications

Black-Box for Machine Learning / AI

OpenAI: black-box algorithms for RL.

OpenAI Blog, https://blog.openai.com/evolution-strategies/

Publications
