Multi-Scale Search for Black-Box Optimization: Theory & Algorithms
PhD Oral Defense by Abdullah Al-Dujaili
Supervised by Assoc. Prof. Suresh Sundaram
School of Computer Science & Engineering, Nanyang Technological University
Outline
• Background & Motivation
• Contributions: Finite-Time Analysis; Algorithms for Expensive & Multi-Objective Optimization; Benchmarking Platform
• Conclusion & Future Works
Optimization
• A recurrent topic of interest for centuries.
• Many applications: control/planning, machine learning, design/manufacture.
• Many sub-fields: convex, discrete, multi-objective.
Optimization
The objective function may expose:
• zero-order information (values)
• a closed-form expression
• higher-order information (gradients)
• smoothness
Black-Box Optimization
The objective function exposes only zero-order information (values): a search problem through point-wise evaluations.
Black-Box Applications
Geijtenbeek, Thomas, Michiel van de Panne, and A. Frank van der Stappen. "Flexible muscle-based locomotion for bipedal creatures." ACM Transactions on Graphics (TOG) 32.6 (2013): 206.
The muscle routing and control parameters are optimized using the Covariance Matrix Adaptation [Hansen, 2006] black-box algorithm.
Black-Box Optimization: Some Challenges
• Dimensionality
• Modality
• Complexity
• Conditioning
(Illustrations courtesy of Chen et al., 2015)
Black-Box Approaches
Passive Optimization [Sukharev, 1971]
• Evaluate a grid of n points and return the best point.
• Inefficient.
Active Optimization [Piyavskii, 1972; Shubert, 1972]
• Sequential decision-making: the next point depends on the previous points; return the best point.
• Trades off exploration vs. exploitation.
Active Black-Box Optimization
Heuristic Algorithms [Srinivas & Patnaik, 1994]
• Nature-inspired, with probabilistic guarantees; e.g., swarm intelligence.
MP Algorithms [Piyavskii, 1972; Shubert, 1972]
• Developed by the Mathematical Programming (MP) community.
• Systematic procedure of sampling points: direct the search and construct models of the objective function.
• Asymptotic convergence.
Heuristics vs. MP Algorithms
[Audet & Kokkolaras, 2016]
Main Issues of MP Black-Box Optimization
1. Theoretical Analysis
• Several methods have been analyzed independently.
• Asymptotic analysis [Torn & Zilinskas, 1989; Conn et al., 1997; Sergeyev, 1998; Lewis & Torczon, 2002; Finkel & Kelley, 2004; Audet & Dennis, 2006; Sergeyev et al., 2016].
• Few papers addressed finite-time performance [Hansen, 1991; Munos, 2011].
2. Expensive Optimization
• Methods are inherently exploratory and may reduce to a grid search [Finkel & Kelley, 2004].
3. Multi-Criterion Decision Making
• Methods are tailored for single-objective optimization.
• Few investigations on multi-objective optimization [Audet et al., 2010; Custodio et al., 2011].
Focus of this Thesis
Black-box optimization algorithms:
• Passive
• Active
  • Heuristics
  • MP: trust-region methods; direct search (simplex methods, line search); partitioning (Multi-Scale Search Optimization, MSO) methods
This thesis focuses on partitioning (MSO) methods.
Contributions to MP Black-Box Optimization
1. Theoretical Analysis Contribution
• Finite-time analysis of several multi-scale search algorithms for single-objective optimization (Lipschitzian, DIRECT, MCS) in a unified framework building on the work of Munos, 2011.
• Application of the theoretical framework to multi-objective optimization, providing a finite-time upper bound on the Pareto-compliant unary additive epsilon indicator.
• Numerical validation of the theoretical bounds on a set of synthetic problems.
Contributions to MP Black-Box Optimization
2. Algorithmic Design Contribution
• Proposed an exploitative MSO algorithm (NMSO) for expensive single-objective optimization with finite-time convergence.
• Developed two MSO algorithms for multi-objective optimization: MO-DIRECT and MO-SOO.
3. Benchmarking Contribution
• A thorough empirical analysis and comparison of classical (DIRECT, MCS), commercial (BB-LS), and recent (SOO, BaMSOO) algorithms.
• Developed the Black-box Multi-objective Optimization Benchmarking (BMOBench) platform based on 100 multi-objective optimization problems from the literature collected by Custodio et al., 2011.
Contribution I: Finite-Time Analysis of MSO Algorithms
• Al-Dujaili, Abdullah, S. Suresh, and N. Sundararajan. "MSO: a framework for bound-constrained black-box global optimization algorithms." Journal of Global Optimization 66.4 (2016): 811-845.
Active Optimization: Exploration vs. Exploitation
• Initial investigations date back to 1933 [Thompson, 1933; Robbins, 1952].
• Formally known as the multi-armed bandit (MAB) problem.
(Illustration courtesy of https://conversionxl.com/bandit-tests/)
Under the MAB view, continuous black-box optimization can be framed in one of two ways: as an infinitely many-armed bandit, or as a hierarchy of multi-armed bandits.
Black-Box Optimization
• Employ a divide-and-conquer partitioning tree over the search space.
• Assign exploration and exploitation scores to each node.
• Iteratively expand nodes based on their scores.
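The expand-by-scores loop above can be sketched in a few lines; this is a minimal 1-D illustration in the spirit of SOO [Munos, 2011], where depth plays the exploration scale and the sampled value the exploitation score. The trisection rule and all names are illustrative assumptions, not the exact procedure of any algorithm analyzed in the thesis.

```python
# A minimal 1-D sketch of the multi-scale search (MSO) template in the
# spirit of SOO [Munos, 2011]: keep a partitioning tree over the search
# space and, sweeping the depths, expand at each depth the leaf with the
# best sampled value, provided it beats every shallower leaf expanded in
# the same sweep. The trisection rule and all names are illustrative.

def mso_maximize(f, lo, hi, budget=60):
    """Maximize f on [lo, hi]; returns the centre of the best sampled cell."""
    mid = (lo + hi) / 2.0
    # Leaf record: (left, right, centre, f(centre), depth).
    tree = [(lo, hi, mid, f(mid), 0)]
    evals = 1
    while evals + 2 <= budget:
        max_depth = max(n[4] for n in tree)
        best_so_far = float("-inf")
        for depth in range(max_depth + 1):
            leaves = [n for n in tree if n[4] == depth]
            if not leaves:
                continue
            node = max(leaves, key=lambda n: n[3])  # exploitation score
            if node[3] <= best_so_far:
                continue                            # a shallower leaf is better
            best_so_far = node[3]
            tree.remove(node)
            l, r, c, fc, d = node
            w = (r - l) / 3.0
            # Trisect; the middle child reuses the parent's evaluation.
            c1, c3 = l + w / 2.0, l + 5.0 * w / 2.0
            tree.append((l, l + w, c1, f(c1), d + 1))
            tree.append((l + w, l + 2.0 * w, c, fc, d + 1))
            tree.append((l + 2.0 * w, r, c3, f(c3), d + 1))
            evals += 2
            if evals + 2 > budget:
                break
    return max(tree, key=lambda n: n[3])[2]
```

Shallow leaves keep large cells alive (exploration) while the per-depth maxima drive the search into good regions (exploitation).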
Classical Method: Lipschitzian Optimization
At time t = 0, the interval is [a, b].
[Piyavskii, 1972; Shubert, 1972]
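The Piyavskii–Shubert scheme above can be sketched as a sawtooth lower-bounding loop; a minimal 1-D minimization sketch assuming the Lipschitz constant L is known (function and variable names are illustrative):

```python
# A minimal sketch of the Piyavskii-Shubert scheme for minimizing f on
# [a, b] with a known Lipschitz constant L: each interval carries a
# sawtooth lower bound built from f at its endpoints; the interval with
# the smallest bound is split at the bound's minimizer. Names are
# illustrative.

import heapq

def piyavskii_minimize(f, a, b, L, budget=30):
    fa, fb = f(a), f(b)
    best = min(fa, fb)

    def bound(a_, fa_, b_, fb_):
        # Minimum of the sawtooth lower bound over [a_, b_].
        return (fa_ + fb_) / 2.0 - L * (b_ - a_) / 2.0

    heap = [(bound(a, fa, b, fb), a, fa, b, fb)]
    for _ in range(budget - 2):
        lb, a_, fa_, b_, fb_ = heapq.heappop(heap)
        if lb >= best:
            break                    # no interval can improve on the incumbent
        # The sawtooth's minimizer inside [a_, b_]:
        x = (a_ + b_) / 2.0 + (fa_ - fb_) / (2.0 * L)
        fx = f(x)
        best = min(best, fx)
        heapq.heappush(heap, (bound(a_, fa_, x, fx), a_, fa_, x, fx))
        heapq.heappush(heap, (bound(x, fx, b_, fb_), x, fx, b_, fb_))
    return best
```

The slope L both directs the search (which interval to split next) and certifies optimality once every interval's lower bound exceeds the incumbent.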
Black-Box Optimization
(Figure: nodes plotted by selected-interval size vs. function value. Small intervals with good values drive local search, large intervals drive global search, and a slope C trades off between the two.)
MSO Algorithms in the Literature
• Lipschitzian Optimization (LO): Shubert, B. O. "A sequential method seeking the global maximum of a function." SIAM Journal on Numerical Analysis 9.3 (1972): 379-388; Piyavskii, S. "An algorithm for finding the absolute extremum of a function." USSR Computational Mathematics and Mathematical Physics 12.4 (1972): 57-67.
• Branch and Bound (BB): Pinter, J. Global Optimization in Action: Continuous and Lipschitz Optimization: Algorithms, Implementations and Applications, volume 6. Springer Science & Business Media, 1995.
• Dividing RECTangles (DIRECT): Jones, D. R., Perttunen, C. D., and Stuckman, B. E. "Lipschitzian optimization without the Lipschitz constant." Journal of Optimization Theory and Applications 79.1 (1993): 157-181.
• Multilevel Coordinate Search (MCS): Huyer, W., and Neumaier, A. "Global optimization by multilevel coordinate search." Journal of Global Optimization 14.4 (1999): 331-355.
• Simultaneous Optimistic Optimization (SOO): Munos, R. "Optimistic optimization of deterministic functions without the knowledge of its smoothness." Advances in Neural Information Processing Systems, 2011.
Convergence Analysis: How Much Exploration Is Needed?
For any accuracy ε > 0, global search explores the set of ε-optimal states X_ε (the points whose value is within ε of the optimum). The size of X_ε measures how much exploration is needed. As long as every ε-optimal state is eventually explored, convergence is guaranteed.
Quantifying Exploration
Packing balls as an approach:
• covering number [Hsu et al., 2007]
• zooming dimension [Kleinberg et al., 2008]
• near-optimality dimension [Bubeck et al., 2008]
Provided:
• local Hölder continuity
• bounded partitions
• well-shaped partitions
The Near-Optimality Dimension
Number of intervals to be explored ~ O(ε^{-d}), where d is the near-optimality dimension.
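Where the slide's equations did not survive extraction, the standard formulation (following Bubeck et al., 2008 and Munos, 2011) can be stated as below; the symbols ν, C, and d are the conventional ones and are assumptions here, not necessarily the slide's exact notation.

```latex
% Set of \epsilon-optimal states (maximization):
X_\epsilon = \{\, x \in \mathcal{X} : f(x) \ge \sup_{x' \in \mathcal{X}} f(x') - \epsilon \,\}
% Near-optimality dimension: the smallest d \ge 0 such that there exists
% C > 0 with, for all \epsilon > 0, the packing number of X_\epsilon by
% balls of radius \nu\epsilon bounded as
\mathcal{N}\bigl(X_\epsilon,\ \nu\epsilon\bigr) \le C\,\epsilon^{-d}
% Hence the number of cells a multi-scale search must explore at accuracy
% \epsilon scales as O(\epsilon^{-d}).
```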
Convergence Analysis
Theoretical convergence rates for MSO algorithms, following the DOO & SOO analysis [Munos, 2011].
Numerical Validation of Theoretical Bounds
Empirical convergence rate and its theoretical bound with respect to the number of function evaluations, computed using symbolic maths.
Summary of Contribution I
• Finite-time analysis of several multi-scale search algorithms for single-objective optimization (Lipschitzian, DIRECT, MCS).
• A unified framework building on the work of Munos, 2011 and the notion of packing balls.
• Numerical validation of the theoretical bounds on a set of synthetic problems.
Contribution II: NMSO, an MSO Algorithm for Expensive Optimization
• Al-Dujaili, Abdullah, and S. Suresh. "A naive multi-scale search algorithm for global optimization problems." Information Sciences 372 (2016): 294-312.
Motivation
• MSO has been dominantly exploratory: the DIRECT algorithm may reduce to an exhaustive grid search [Finkel & Kelley, 2004].
• Some algorithms incorporate local search (exploitation) only as a separate component, e.g., the MCS algorithm [Huyer & Neumaier, 1999].
• Expensive optimization (a limited number of function evaluations) makes exploitation more relevant.
• NMSO: incorporate local search (exploitation) into the MSO framework and provide finite-time convergence.
Naïve Multi-Scale Search Optimization (NMSO)
• Sampled function value as exploitation score; depth as exploration score.
• Depth-wise expansion until no further improvement is noticed, then revisit the root.
• Guarantees asymptotic convergence.
• Available at http://ash-aldujaili.github.io/NMSO/
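The depth-wise "expand until no further improvement, then revisit the root" loop might be sketched as below; the trisection rule and all names here are illustrative assumptions, and the released NMSO code at the URL above is the reference implementation.

```python
# An illustrative sketch of NMSO's control loop on a 1-D problem: descend
# the tree greedily (best sampled value first), expand the reached leaf,
# keep descending while the expansion improves on the incumbent, and
# revisit the root once no further improvement is noticed. The trisection
# rule and all names are assumptions, not the exact NMSO procedure.

def nmso_like_minimize(f, lo, hi, budget=90):
    mid = (lo + hi) / 2.0
    root = {"lo": lo, "hi": hi, "x": mid, "fx": f(mid), "kids": []}
    evals = 1
    best = root["fx"]

    def expand(node):
        # Trisect the node; the middle child reuses the parent's sample.
        nonlocal evals
        w = (node["hi"] - node["lo"]) / 3.0
        for i in range(3):
            l = node["lo"] + i * w
            c = node["x"] if i == 1 else l + w / 2.0
            fc = node["fx"] if i == 1 else f(c)
            evals += 0 if i == 1 else 1
            node["kids"].append({"lo": l, "hi": l + w, "x": c, "fx": fc, "kids": []})

    while evals + 2 <= budget:
        node = root
        while evals + 2 <= budget:
            while node["kids"]:                      # greedy descent to a leaf
                node = min(node["kids"], key=lambda n: n["fx"])
            expand(node)                             # costs 2 evaluations
            child = min(node["kids"], key=lambda n: n["fx"])
            if child["fx"] < best:
                best = child["fx"]                   # improved: keep descending
                node = child
            else:
                break                                # stalled: revisit the root
    return best
```

The greedy descent is the exploitative component; restarting from the root after a stall is what restores the exploration needed for asymptotic convergence.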
"No further improvement" in NMSO for 2D problems
(Figure: a sequence of 2D partitioning snapshots illustrating when NMSO stops descending and revisits the root.)
Theoretical Performance
*The proof is presented in the thesis.
Numerical Validation on 1000 Problems
*More comprehensive numerical experiments are presented in the thesis.
Summary of Contribution II
NMSO: an MSO algorithm for expensive optimization.
Enjoys:
• a finite-time convergence rate
• asymptotic convergence
Limitations:
• ill-conditioned functions
• dimensionality
Contribution III: MSO for Multi-Objective Optimization
• Al-Dujaili, Abdullah, and S. Suresh. "Dividing rectangles attack multi-objective optimization." 2016 IEEE Congress on Evolutionary Computation (CEC). IEEE, 2016.
• Al-Dujaili, Abdullah, and S. Suresh. "Multi-objective simultaneous optimistic optimization." arXiv preprint arXiv:1612.08412 (2016).
• Al-Dujaili, Abdullah, and S. Suresh. "A MATLAB toolbox for surrogate-assisted multi-objective optimization: A preliminary study." Proceedings of the 2016 Genetic and Evolutionary Computation Conference Companion. ACM, 2016.
• Wong, Cheryl Sze Yin, Abdullah Al-Dujaili, and Suresh Sundaram. "Hypervolume-based DIRECT for multi-objective optimisation." Proceedings of the 2016 Genetic and Evolutionary Computation Conference Companion. ACM, 2016.
Motivation
• The bulk of MSO algorithms have been tailored towards single-objective optimization.
• In practice, it is desirable to optimize multiple conflicting objectives, e.g., prediction accuracy vs. training time.
• Existing convergence analysis covers finite-set and/or discrete problems [Rudolph, 1998; Kumar & Banerjee, 2005], probabilistic guarantees [Hanne, 1999], and a total order on the solutions [Gabor et al., 1998].
Potentially-Optimal Nodes
Single-objective MSO algorithm: DIRECT [Jones et al., 1993]
• Function value as a local score; size as a global score.
• For m objectives: from the 2D-space projection to an (m+1)D-space.
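The selection of potentially-optimal nodes can be illustrated in the single-objective (2D projection) case: a node is potentially optimal if some slope C > 0 makes it minimize value − C·size over all nodes. A minimal sketch follows; the names are illustrative, and DIRECT's additional ε-improvement condition is omitted.

```python
# A minimal sketch of DIRECT-style selection of potentially-optimal nodes
# (minimization): node i, with size s_i and value v_i, is potentially
# optimal if some slope C > 0 makes v_i - C * s_i minimal over all nodes.
# The constraint v_i - v_j <= C (s_i - s_j) yields a lower bound on C for
# every smaller node j and an upper bound for every larger one; node i is
# selected when the resulting interval of slopes is non-empty.

def potentially_optimal(nodes):
    """nodes: list of (size, value); returns indices of potentially-optimal nodes."""
    po = []
    for i, (si, vi) in enumerate(nodes):
        c_lo, c_hi, feasible = 0.0, float("inf"), True
        for j, (sj, vj) in enumerate(nodes):
            if j == i:
                continue
            if sj == si:
                feasible = feasible and vi <= vj
            elif sj < si:                       # j smaller: lower bound on C
                c_lo = max(c_lo, (vi - vj) / (si - sj))
            else:                               # j larger: upper bound on C
                c_hi = min(c_hi, (vj - vi) / (sj - si))
        if feasible and c_lo <= c_hi and c_hi > 0:
            po.append(i)
    return po
```

Small slopes C favour small cells with good values (local search); large slopes favour large cells (global search), which is why the selected nodes trace the lower-right hull of the (size, value) cloud.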
Potentially-Optimal Nodes for Multi-Objective Optimization
• Proposed two MSO algorithmic instances for multi-objective optimization based on DIRECT and SOO: MO-DIRECT and MO-SOO.
Empirical Validation on 1000 Problems
• https://bbcomp.ini.rub.de/results/BBComp2016-2OBJ-expensive/summary.html
Theoretical Performance
A finite-time upper bound on the Pareto-compliant quality indicator: the additive epsilon indicator.
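The additive epsilon indicator itself can be computed directly from its definition (for minimization, following Zitzler et al.'s epsilon indicator); a minimal sketch, with A the approximation set and R a reference set as lists of objective vectors, names illustrative:

```python
# A minimal sketch of the additive epsilon indicator for minimization:
# the smallest shift eps such that translating every point of the
# approximation set A by -eps makes A weakly dominate the reference set R.
# Computed as max over r in R of the min over a in A of the largest
# per-objective gap a_i - r_i.

def additive_epsilon(A, R):
    """A, R: lists of objective vectors (tuples). Returns I_eps+(A, R)."""
    return max(
        min(max(a_i - r_i for a_i, r_i in zip(a, r)) for a in A)
        for r in R
    )
```

A value of 0 means A already weakly dominates R, which makes a finite-time upper bound on this quantity a direct statement about convergence towards the Pareto front.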
Empirical Validation of Theoretical Bounds
• Code for reproducing these bounds can be found at https://www.dropbox.com/s/ssiq1m52hczuj7a/mosoo-theory-validation.rar?dl=0
Summary of Contribution III
• Extension of MSO algorithms to multi-objective optimization: MO-SOO & MO-DIRECT.
• For the first time in the literature, a finite-time upper bound on the Pareto-compliant additive indicator, down to the conflict dimension.
• Numerical validation of the bound on a synthetic set of problems.
Contribution IV: Optimization Algorithms Benchmarking
• Al-Dujaili, Abdullah, and S. Suresh. "BMOBench: Black-box multi-objective optimization benchmarking platform." arXiv preprint arXiv:1605.07009 (2016).
• Tanweer, M. R., Abdullah Al-Dujaili, and S. Suresh. "Empirical assessment of human learning principles inspired PSO algorithms on continuous black-box optimization testbed." International Conference on Swarm, Evolutionary, and Memetic Computing. Springer, 2015.
• Al-Dujaili, Abdullah, Muhammad Rizwan Tanweer, and Sundaram Suresh. "On the performance of particle swarm optimization algorithms in solving cheap problems." 2015 IEEE Symposium Series on Computational Intelligence. IEEE, 2015.
• Al-Dujaili, Abdullah, Muhammad Rizwan Tanweer, and Sundaram Suresh. "DE vs. PSO: A performance assessment for expensive problems." 2015 IEEE Symposium Series on Computational Intelligence. IEEE, 2015.
• Al-Dujaili, Abdullah, and S. Suresh. "Analysis of the Bayesian multi-scale optimistic optimization on the CEC2016 and BBOB testbeds." 2016 IEEE Congress on Evolutionary Computation (CEC). IEEE, 2016.
• Tanweer, M. R., A. Al-Dujaili, and S. Suresh. "Multi-objective self regulating particle swarm optimization algorithm for BMOBench platform."
Empirical Assessment on COCO
• Comprehensive benchmark of established MSO algorithms: classical (DIRECT, MCS), commercial (BB-LS), and recent (SOO, BaMSOO).
• NMSO, BB-LS, and MCS are suitable for expensive optimization; BaMSOO is suitable for cheap optimization problems.
• A detailed discussion is presented in the thesis.
BMOBench
• Inspired by COCO for single-objective optimization.
• A platform of 100 problems, in C / MATLAB.
• Accommodates stochastic and deterministic algorithms.
• Data profiles generated in terms of 4 quality indicators, compiled into a LaTeX-template file.
• Special session at SSCI'2016, Greece.
• Available at http://ash-aldujaili.github.io/BMOBench/
• Acknowledgement: the BO-BBOB platform [Brockhoff et al., 2015] and the work of [Custodio et al., 2011]. Thanks extended to Bhavarth Pandya, Chaitanya Prasad, Khyati Mahajan, and Shaleen Gupta.
Summary of Contribution IV
• Comprehensive benchmarking of several established MSO algorithms to complement the theoretical analysis.
• BMOBench: a benchmarking platform for multi-objective algorithms with 100 problems.
Conclusion & Future Works
Conclusion
1. A theoretical framework for finite-time analysis.
2. A finite-time Pareto-compliant indicator bound.
3. An exploitative multi-scale search algorithm for expensive optimization.
4. Multi-scale search algorithms for multi-objective optimization.
5. Comprehensive benchmarking of MSO algorithms.
6. A new benchmark for multi-objective optimization.
Future Directions
• Towards adaptive partitioning
• Towards large-scale problems
• Towards AI applications
The Curse of Dimensionality – on 1K Problems
Theoretical Bound for Large-Scale Optimization
• Al-Dujaili, A., and Suresh, S. "Embedded bandits for large-scale black-box optimization." AAAI Conference on Artificial Intelligence, 2017.
EmbeddedHunter for Large-Scale Optimization
• Available at https://goo.gl/nYDVBY
• Al-Dujaili, A., and Suresh, S. "Embedded bandits for large-scale black-box optimization." AAAI Conference on Artificial Intelligence, 2017.
Black-Box for Machine Learning / AI
OpenAI: black-box algorithms for RL.
OpenAI Blog, https://blog.openai.com/evolution-strategies/
Publications