Asynchronous Parallel Bayesian Optimisation via Thompson Sampling
Kirthevasan Kandasamy, Akshay Krishnamurthy, Jeff Schneider, Barnabás Póczos
AutoML Workshop, ICML, Sydney, Aug 2017

Summary:

Parallelised Thompson Sampling for BO

• Setting: Bayesian optimisation with parallel evaluations.
• A direct application of Thompson sampling in the synchronous and asynchronous parallel settings does essentially as well as if the evaluations were made in sequence.
• When evaluation time is factored in, the asynchronous version outperforms the synchronous and sequential versions.
• The proposed methods are conceptually and computationally much simpler than existing methods for parallel BO.

A straightforward application of TS to parallel settings, in two variants: synchronous (synTS) and asynchronous (asyTS).

synTS (synchronous):
Input: prior GP(0, κ). D1 ← ∅, GP1 ← GP(0, κ).
for j = 1, 2, . . .
  1. Wait for all M workers to finish.
  2. Dj ← Dj−1 ∪ {(x_m, y_m)} for m = 1, . . . , M, where (x_m, y_m) is worker m's query/observation.
  3. Compute the posterior GP with Dj.
  4. Draw M samples g_m ∼ posterior GP, one per worker.
  5. Deploy worker m at argmax_x g_m(x), for each m.
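Steps 4–5 of one such round can be sketched as follows. This is a minimal illustration, not the paper's implementation: for a dependency-free example, the GP posterior is replaced by assumed independent Gaussian beliefs (mu, sigma) over a discretised domain, which preserves the sample-then-argmax structure:

```python
import random

def syn_ts_round(candidates, mu, sigma, M, rng):
    """Steps 4-5 of a synchronous round: draw M independent posterior
    samples and deploy worker m at the argmax of its own sample g_m."""
    batch = []
    for _ in range(M):
        # One posterior sample g_m: a random draw at every candidate.
        g = {x: rng.gauss(mu[x], sigma[x]) for x in candidates}
        batch.append(max(candidates, key=g.get))
    return batch

cands = [i / 10 for i in range(11)]
mu = {x: -(x - 0.7) ** 2 for x in cands}   # stand-in posterior mean (assumed)
sigma = {x: 0.3 for x in cands}            # stand-in posterior stddev (assumed)
batch = syn_ts_round(cands, mu, sigma, M=4, rng=random.Random(0))
print(batch)
```

Because each worker maximises its own random sample, the M queries differ from one another, giving diversity in the batch without any explicit repulsion term.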

Gaussian Process (Bayesian) Optimisation: maximise an expensive black-box function. Examples:
• Hyper-parameter tuning
• Maximum-likelihood estimation in astrophysics
• Optimal policy search in autonomous driving

Main theoretical results: Let f ∼ GP(0, κ). Then for seqTS, synTS, and asyTS, after n evaluations,

    E[SR(n)] ≲ √( log(n) · Ψn / n ),

where Ψn is the maximum information gain of the kernel κ.

Let the time taken for each evaluation be random. Then the expected numbers of evaluations completed within time T by seqTS, synTS, and asyTS satisfy nseq < nsyn < nasy.

Goal: minimise the simple regret after n evaluations,

    SR(n) = f(x⋆) − max_{t=1,...,n} f(x_t).

Therefore, asyTS achieves asymptotically better regret with time, E[SR′(T)], than synTS and seqTS.

Bayesian Optimisation via Thompson Sampling: model f ∼ GP(0, κ). At each time t:
  1. Sample g from the posterior GP.
  2. Choose x_t = argmax_{x∈X} g(x).
  3. y_t ← evaluate f at x_t.
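The sequential loop above can be sketched end to end. As a hedged, dependency-free stand-in for the GP posterior, this sketch uses assumed independent Gaussian beliefs per candidate with conjugate updates (the names seq_ts, cands, and the toy objective are all illustrative, not from the paper):

```python
import random

def seq_ts(f, candidates, n, noise_sd=0.1, seed=0):
    """Sequential Thompson sampling. The GP posterior is replaced by
    independent N(mu, var) beliefs per candidate point, updated with
    conjugate Gaussian rules after each noisy observation."""
    rng = random.Random(seed)
    mu = {x: 0.0 for x in candidates}   # prior mean 0, as in GP(0, k)
    var = {x: 1.0 for x in candidates}  # prior variance 1 (assumed)
    history = []
    for _ in range(n):
        # Step 1: sample g from the current posterior.
        g = {x: rng.gauss(mu[x], var[x] ** 0.5) for x in candidates}
        # Step 2: choose x_t = argmax g(x).
        xt = max(candidates, key=g.get)
        # Step 3: evaluate f at x_t, with observation noise.
        yt = f(xt) + rng.gauss(0.0, noise_sd)
        # Conjugate update of the belief at x_t.
        prec = 1.0 / var[xt] + 1.0 / noise_sd ** 2
        mu[xt] = (mu[xt] / var[xt] + yt / noise_sd ** 2) / prec
        var[xt] = 1.0 / prec
        history.append((xt, yt))
    return history

f = lambda x: -(x - 0.7) ** 2
cands = [i / 20 for i in range(21)]
hist = seq_ts(f, cands, n=50)
queried = [x for x, _ in hist]
simple_regret = f(0.7) - max(f(x) for x in queried)
print(round(simple_regret, 4))
```

Exploration here comes entirely from posterior randomness: points with high uncertainty occasionally produce the largest sample and get queried, with no explicit exploration bonus.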

N ← number of evaluations completed within time T; in parallel settings this counts the evaluations by all M workers. SR′(T) is practically more relevant than SR(n) and leads to new results in parallel BO.

nseq, nsyn, nasy for different random completion time models:

Distribution | pdf p(x)                              | seqTS: nseq | synTS: nsyn        | asyTS: nasy
Unif(a, b)   | 1/(b−a) for x ∈ (a, b)                | 2T/(a+b)    | MT(M+1)/(a+bM)     | M·nseq
HN(ζ²)       | (√2/(ζ√π)) e^(−x²/(2ζ²)) for x > 0    | T√π/(ζ√2)   | ≍ M·nseq/√log(M)   | M·nseq
Exp(λ)       | λe^(−λx) for x > 0                    | λT          | ≍ M·nseq/log(M)    | M·nseq
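The orderings in the table can be sanity-checked by Monte Carlo simulation. A sketch (all names assumed) counting evaluations completed within a budget T under halfnormal evaluation times:

```python
import random

def n_completed(mode, M, T, draw, rng):
    """Evaluations finished within time budget T under each schedule."""
    if mode == "seq":          # a single worker, evaluations back to back
        t, n = 0.0, 0
        while True:
            t += draw(rng)
            if t > T:
                return n
            n += 1
    if mode == "syn":          # each round waits for the slowest of M workers
        t, n = 0.0, 0
        while True:
            t += max(draw(rng) for _ in range(M))
            if t > T:
                return n
            n += M
    # "asy": M workers run independently, i.e. M sequential timelines
    return sum(n_completed("seq", 1, T, draw, rng) for _ in range(M))

rng = random.Random(0)
halfnormal = lambda r: abs(r.gauss(0.0, 1.0))   # HN(zeta^2 = 1) times
M, T = 10, 200.0
n_seq = n_completed("seq", M, T, halfnormal, rng)
n_syn = n_completed("syn", M, T, halfnormal, rng)
n_asy = n_completed("asy", M, T, halfnormal, rng)
print(n_seq, n_syn, n_asy)
```

With these settings the counts come out roughly in the ratio predicted by the table: the synchronous schedule loses a √log(M)-type factor to stragglers, while the asynchronous one attains close to M times the sequential count.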

The difference between the synchronous and asynchronous bounds grows with M, and is pronounced for heavy-tailed completion-time distributions.

This work: parallel evaluations with M workers. Several methods exist for this setting, but they either
▶ cannot handle asynchronicity,
▶ do not come with theoretical guarantees, or
▶ are conceptually/computationally complex.

Simple regret on a time budget: after time T,

    SR′(T) = f(x⋆) − max_{j ≤ N} f(x_j)    if N ≥ 1,
    SR′(T) = max_{x∈X} |f(x⋆) − f(x)|      otherwise.

Experiments
[Plots of SR′(T) against time units T on three problems:]
• Hartmann18, d = 18, M = 25, halfnormal evaluation times
• Park2, d = 4, M = 10, halfnormal evaluation times
• CurrinExp-14, d = 14, M = 35, pareto(k = 3) evaluation times
Methods compared: synRAND, synBUCB, synUCBPE, synTS, asyRAND, asyUCB, asyEI, asyHUCB, asyHTS, asyTS.
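SR′(T) can be computed directly from a trace of completed evaluations. A minimal sketch (function and variable names assumed, with an assumed toy trace of (finish time, point) pairs):

```python
def time_budget_regret(f, f_opt, completions, T, worst_gap):
    """SR'(T): regret of the best evaluation completed by time T.
    completions: list of (finish_time, x) pairs; worst_gap stands for
    max_x |f(x*) - f(x)|, returned when nothing has finished yet."""
    done = [x for (t, x) in completions if t <= T]
    if not done:
        return worst_gap
    return f_opt - max(f(x) for x in done)

f = lambda x: -(x - 0.7) ** 2          # toy objective, f(x*) = 0 at x* = 0.7
trace = [(1.0, 0.2), (2.5, 0.6), (4.0, 0.7)]
print(time_budget_regret(f, 0.0, trace, T=0.5, worst_gap=0.49))  # nothing done yet: 0.49
print(time_budget_regret(f, 0.0, trace, T=3.0, worst_gap=0.49))
print(time_budget_regret(f, 0.0, trace, T=5.0, worst_gap=0.49))
```

The otherwise-branch is what makes SR′(T) well defined for very small T, where a slow method may not have a single completed evaluation to report.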

Setup: f : X ≡ [0, 1]^d → R is a noisy, expensive, black-box function; x⋆ = argmax_{x∈X} f(x) is a maximiser.

asyTS (asynchronous):
Input: prior GP(0, κ). D1 ← ∅, GP1 ← GP(0, κ).
for j = 1, 2, . . .
  1. Wait for any one worker to finish.
  2. Dj ← Dj−1 ∪ {(x′, y′)}, where (x′, y′) is that worker's previous query/observation.
  3. Compute the posterior GP with Dj.
  4. Draw a sample g ∼ posterior GP.
  5. Re-deploy the worker at argmax_x g(x).
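The asynchronous loop can be sketched with a thread pool: whenever any one worker finishes, only that worker is re-deployed at the argmax of a fresh posterior sample. As in the earlier sketches, assumed independent Gaussian beliefs stand in for the GP posterior, and all names are illustrative:

```python
import concurrent.futures as cf
import random
import time

def eval_point(x):
    """Stand-in for an expensive black-box evaluation."""
    time.sleep(0.005)
    return x, -(x - 0.7) ** 2          # noiseless toy objective

def asy_ts(candidates, M, n_total, seed=0):
    """Asynchronous TS sketch with a simplified independent-Gaussian
    posterior; steps 1-5 of asyTS map onto wait / update / sample / deploy."""
    rng = random.Random(seed)
    mu = {x: 0.0 for x in candidates}
    var = {x: 1.0 for x in candidates}
    noise_var = 0.01

    def pick():                        # steps 4-5: sample g, maximise it
        g = {x: rng.gauss(mu[x], var[x] ** 0.5) for x in candidates}
        return max(candidates, key=g.get)

    results = []
    with cf.ThreadPoolExecutor(max_workers=M) as pool:
        pending = {pool.submit(eval_point, pick()) for _ in range(M)}
        while len(results) < n_total:
            # Step 1: block until at least one worker has finished.
            done, pending = cf.wait(pending, return_when=cf.FIRST_COMPLETED)
            for fut in done:
                x, y = fut.result()
                # Steps 2-3: fold the observation into the posterior.
                prec = 1.0 / var[x] + 1.0 / noise_var
                mu[x] = (mu[x] / var[x] + y / noise_var) / prec
                var[x] = 1.0 / prec
                results.append((x, y))
                if len(results) + len(pending) < n_total:
                    pending.add(pool.submit(eval_point, pick()))
    return results

grid = [i / 10 for i in range(11)]
out = asy_ts(grid, M=3, n_total=10)
print(len(out))
```

Step 1 of asyTS corresponds to `cf.wait(..., return_when=cf.FIRST_COMPLETED)`: no worker ever idles waiting for the slowest of the batch, which is exactly where the nasy > nsyn gap in the table comes from.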


See paper for more synthetic and real experiments.

Selected references:
• Russo, D., et al. (2014). Learning to Optimize via Posterior Sampling.
• Srinivas, N., et al. (2010). Gaussian Process Optimization in the Bandit Setting ….
