Asynchronous Parallel Bayesian Optimisation via Thompson Sampling
Kirthevasan Kandasamy, Akshay Krishnamurthy, Jeff Schneider, Barnabás Póczos
AutoML Workshop, ICML, Sydney, Aug 2017

Summary:

Parallelised Thompson Sampling for BO

• Setting: Bayesian optimisation with parallel evaluations.
• A direct application of Thompson sampling in the synchronous and asynchronous parallel settings does essentially as well as if the evaluations were made in sequence.
• When evaluation time is factored in, the asynchronous version outperforms the synchronous and sequential versions.
• The proposed methods are conceptually and computationally much simpler than existing methods for parallel BO.

A straightforward application of TS to parallel settings, in two variants: synchronous (synTS) and asynchronous (asyTS).

synTS (synchronous):
Input: prior GP(0, κ). D1 ← ∅, GP1 ← GP(0, κ).
for j = 1, 2, . . .
  1. Wait for all M workers to finish.
  2. Dj ← Dj−1 ∪ {(x_m, y_m)} for m = 1, . . . , M, where (x_m, y_m) is worker m's query/observation.
  3. Compute the posterior GP with Dj.
  4. Draw M samples g_m ∼ posterior GP, one per worker.
  5. Deploy worker m at argmax_x g_m(x), for each m.
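Steps 4–5 of one such round can be sketched as follows. This is a minimal illustration, not the paper's implementation: for a dependency-free example, the GP posterior is replaced by assumed independent Gaussian beliefs (mu, sigma) over a discretised domain, which preserves the sample-then-argmax structure:

```python
import random

def syn_ts_round(candidates, mu, sigma, M, rng):
    """Steps 4-5 of a synchronous round: draw M independent posterior
    samples and deploy worker m at the argmax of its own sample g_m."""
    batch = []
    for _ in range(M):
        # One posterior sample g_m: a random draw at every candidate.
        g = {x: rng.gauss(mu[x], sigma[x]) for x in candidates}
        batch.append(max(candidates, key=g.get))
    return batch

cands = [i / 10 for i in range(11)]
mu = {x: -(x - 0.7) ** 2 for x in cands}   # stand-in posterior mean (assumed)
sigma = {x: 0.3 for x in cands}            # stand-in posterior stddev (assumed)
batch = syn_ts_round(cands, mu, sigma, M=4, rng=random.Random(0))
print(batch)
```

Because each worker maximises its own random sample, the M queries differ from one another, giving diversity in the batch without any explicit repulsion term.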

Gaussian Process (Bayesian) Optimisation: maximise an expensive black-box function. Examples:
• Hyper-parameter tuning
• Maximum-likelihood estimation in astrophysics
• Optimal policy search in autonomous driving

Main theoretical results: Let f ∼ GP(0, κ). Then for seqTS, synTS, and asyTS, after n evaluations,

    E[SR(n)] ≲ √( log(n) · Ψn / n ),

where Ψn is the maximum information gain of the kernel κ.

Let the time taken for each evaluation be random. Then the expected numbers of evaluations completed within time T by seqTS, synTS, and asyTS satisfy nseq < nsyn < nasy.

Goal: minimise the simple regret after n evaluations,

    SR(n) = f(x⋆) − max_{t=1,...,n} f(x_t).

Therefore, asyTS achieves asymptotically better regret with time, E[SR′(T)], than synTS and seqTS.

Bayesian Optimisation via Thompson Sampling: model f ∼ GP(0, κ). At each time t:
  1. Sample g from the posterior GP.
  2. Choose x_t = argmax_{x∈X} g(x).
  3. y_t ← evaluate f at x_t.
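The sequential loop above can be sketched end to end. As a hedged, dependency-free stand-in for the GP posterior, this sketch uses assumed independent Gaussian beliefs per candidate with conjugate updates (the names seq_ts, cands, and the toy objective are all illustrative, not from the paper):

```python
import random

def seq_ts(f, candidates, n, noise_sd=0.1, seed=0):
    """Sequential Thompson sampling. The GP posterior is replaced by
    independent N(mu, var) beliefs per candidate point, updated with
    conjugate Gaussian rules after each noisy observation."""
    rng = random.Random(seed)
    mu = {x: 0.0 for x in candidates}   # prior mean 0, as in GP(0, k)
    var = {x: 1.0 for x in candidates}  # prior variance 1 (assumed)
    history = []
    for _ in range(n):
        # Step 1: sample g from the current posterior.
        g = {x: rng.gauss(mu[x], var[x] ** 0.5) for x in candidates}
        # Step 2: choose x_t = argmax g(x).
        xt = max(candidates, key=g.get)
        # Step 3: evaluate f at x_t, with observation noise.
        yt = f(xt) + rng.gauss(0.0, noise_sd)
        # Conjugate update of the belief at x_t.
        prec = 1.0 / var[xt] + 1.0 / noise_sd ** 2
        mu[xt] = (mu[xt] / var[xt] + yt / noise_sd ** 2) / prec
        var[xt] = 1.0 / prec
        history.append((xt, yt))
    return history

f = lambda x: -(x - 0.7) ** 2
cands = [i / 20 for i in range(21)]
hist = seq_ts(f, cands, n=50)
queried = [x for x, _ in hist]
simple_regret = f(0.7) - max(f(x) for x in queried)
print(round(simple_regret, 4))
```

Exploration here comes entirely from posterior randomness: points with high uncertainty occasionally produce the largest sample and get queried, with no explicit exploration bonus.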

N ← number of evaluations completed within time T; in parallel settings this counts the evaluations by all M workers. SR′(T) is practically more relevant than SR(n) and leads to new results in parallel BO.

nseq, nsyn, nasy for different random completion time models:

Distribution | pdf p(x)                              | seqTS: nseq | synTS: nsyn        | asyTS: nasy
Unif(a, b)   | 1/(b−a) for x ∈ (a, b)                | 2T/(a+b)    | MT(M+1)/(a+bM)     | M·nseq
HN(ζ²)       | (√2/(ζ√π)) e^(−x²/(2ζ²)) for x > 0    | T√π/(ζ√2)   | ≍ M·nseq/√log(M)   | M·nseq
Exp(λ)       | λe^(−λx) for x > 0                    | λT          | ≍ M·nseq/log(M)    | M·nseq
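The orderings in the table can be sanity-checked by Monte Carlo simulation. A sketch (all names assumed) counting evaluations completed within a budget T under halfnormal evaluation times:

```python
import random

def n_completed(mode, M, T, draw, rng):
    """Evaluations finished within time budget T under each schedule."""
    if mode == "seq":          # a single worker, evaluations back to back
        t, n = 0.0, 0
        while True:
            t += draw(rng)
            if t > T:
                return n
            n += 1
    if mode == "syn":          # each round waits for the slowest of M workers
        t, n = 0.0, 0
        while True:
            t += max(draw(rng) for _ in range(M))
            if t > T:
                return n
            n += M
    # "asy": M workers run independently, i.e. M sequential timelines
    return sum(n_completed("seq", 1, T, draw, rng) for _ in range(M))

rng = random.Random(0)
halfnormal = lambda r: abs(r.gauss(0.0, 1.0))   # HN(zeta^2 = 1) times
M, T = 10, 200.0
n_seq = n_completed("seq", M, T, halfnormal, rng)
n_syn = n_completed("syn", M, T, halfnormal, rng)
n_asy = n_completed("asy", M, T, halfnormal, rng)
print(n_seq, n_syn, n_asy)
```

With these settings the counts come out roughly in the ratio predicted by the table: the synchronous schedule loses a √log(M)-type factor to stragglers, while the asynchronous one attains close to M times the sequential count.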

The difference between the synchronous and asynchronous bounds grows with M, and is pronounced for heavy-tailed completion-time distributions.

This work: parallel evaluations with M workers. Several methods exist for this setting, but they either
▶ cannot handle asynchronicity,
▶ do not come with theoretical guarantees, or
▶ are conceptually/computationally complex.

Simple regret on a time budget: after time T,

    SR′(T) = f(x⋆) − max_{j ≤ N} f(x_j)    if N ≥ 1,
    SR′(T) = max_{x∈X} |f(x⋆) − f(x)|      otherwise.

Experiments
[Plots of SR′(T) against time units T on three problems:]
• Hartmann18, d = 18, M = 25, halfnormal evaluation times
• Park2, d = 4, M = 10, halfnormal evaluation times
• CurrinExp-14, d = 14, M = 35, pareto(k = 3) evaluation times
Methods compared: synRAND, synBUCB, synUCBPE, synTS, asyRAND, asyUCB, asyEI, asyHUCB, asyHTS, asyTS.
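SR′(T) can be computed directly from a trace of completed evaluations. A minimal sketch (function and variable names assumed, with an assumed toy trace of (finish time, point) pairs):

```python
def time_budget_regret(f, f_opt, completions, T, worst_gap):
    """SR'(T): regret of the best evaluation completed by time T.
    completions: list of (finish_time, x) pairs; worst_gap stands for
    max_x |f(x*) - f(x)|, returned when nothing has finished yet."""
    done = [x for (t, x) in completions if t <= T]
    if not done:
        return worst_gap
    return f_opt - max(f(x) for x in done)

f = lambda x: -(x - 0.7) ** 2          # toy objective, f(x*) = 0 at x* = 0.7
trace = [(1.0, 0.2), (2.5, 0.6), (4.0, 0.7)]
print(time_budget_regret(f, 0.0, trace, T=0.5, worst_gap=0.49))  # nothing done yet: 0.49
print(time_budget_regret(f, 0.0, trace, T=3.0, worst_gap=0.49))
print(time_budget_regret(f, 0.0, trace, T=5.0, worst_gap=0.49))
```

The otherwise-branch is what makes SR′(T) well defined for very small T, where a slow method may not have a single completed evaluation to report.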

Setup: f : X ≡ [0, 1]^d → R is a noisy, expensive, black-box function; x⋆ = argmax_{x∈X} f(x) is a maximiser.

asyTS (asynchronous):
Input: prior GP(0, κ). D1 ← ∅, GP1 ← GP(0, κ).
for j = 1, 2, . . .
  1. Wait for any one worker to finish.
  2. Dj ← Dj−1 ∪ {(x′, y′)}, where (x′, y′) is that worker's previous query/observation.
  3. Compute the posterior GP with Dj.
  4. Draw a sample g ∼ posterior GP.
  5. Re-deploy the worker at argmax_x g(x).
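The asynchronous loop can be sketched with a thread pool: whenever any one worker finishes, only that worker is re-deployed at the argmax of a fresh posterior sample. As in the earlier sketches, assumed independent Gaussian beliefs stand in for the GP posterior, and all names are illustrative:

```python
import concurrent.futures as cf
import random
import time

def eval_point(x):
    """Stand-in for an expensive black-box evaluation."""
    time.sleep(0.005)
    return x, -(x - 0.7) ** 2          # noiseless toy objective

def asy_ts(candidates, M, n_total, seed=0):
    """Asynchronous TS sketch with a simplified independent-Gaussian
    posterior; steps 1-5 of asyTS map onto wait / update / sample / deploy."""
    rng = random.Random(seed)
    mu = {x: 0.0 for x in candidates}
    var = {x: 1.0 for x in candidates}
    noise_var = 0.01

    def pick():                        # steps 4-5: sample g, maximise it
        g = {x: rng.gauss(mu[x], var[x] ** 0.5) for x in candidates}
        return max(candidates, key=g.get)

    results = []
    with cf.ThreadPoolExecutor(max_workers=M) as pool:
        pending = {pool.submit(eval_point, pick()) for _ in range(M)}
        while len(results) < n_total:
            # Step 1: block until at least one worker has finished.
            done, pending = cf.wait(pending, return_when=cf.FIRST_COMPLETED)
            for fut in done:
                x, y = fut.result()
                # Steps 2-3: fold the observation into the posterior.
                prec = 1.0 / var[x] + 1.0 / noise_var
                mu[x] = (mu[x] / var[x] + y / noise_var) / prec
                var[x] = 1.0 / prec
                results.append((x, y))
                if len(results) + len(pending) < n_total:
                    pending.add(pool.submit(eval_point, pick()))
    return results

grid = [i / 10 for i in range(11)]
out = asy_ts(grid, M=3, n_total=10)
print(len(out))
```

Step 1 of asyTS corresponds to `cf.wait(..., return_when=cf.FIRST_COMPLETED)`: no worker ever idles waiting for the slowest of the batch, which is exactly where the nasy > nsyn gap in the table comes from.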


See paper for more synthetic and real experiments.

Selected references:
• Russo, D., et al. (2014). Learning to Optimize via Posterior Sampling.
• Srinivas, N., et al. (2010). Gaussian Process Optimization in the Bandit Setting ….
