Sep 14, 2016 - There are several flavors of likelihood-free inference. In. Bayesian ..... IEEE. Conference on Systems, M
Bayesian Optimization for Likelihood-Free Inference Michael Gutmann https://sites.google.com/site/michaelgutmann University of Edinburgh
14th September 2016
Reference
For further information: M.U. Gutmann and J. Corander Bayesian optimization for likelihood-free inference of simulator-based statistical models Journal of Machine Learning Research, 17(125): 1–47, 2016 J. Lintusaari, M.U. Gutmann, R. Dutta, S. Kaski, and J. Corander Fundamentals and Recent Developments in Approximate Bayesian Computation Systematic Biology, in press, 2016
Michael Gutmann
BOLFI
2 / 23
Overall goal I
Inference: Given data y o , learn about properties of its source
I
Enables decision making, predictions, . . .
Data space Data source
Observation
yo
Unknown properties
Inference
Michael Gutmann
BOLFI
3 / 23
Approach I
Set up a model with potential properties θ (hypotheses)
I
See which θ are in line with the observed data y o
Data space Data source
Observation
yo
Unknown properties
M(θ) Model
Inference
Michael Gutmann
BOLFI
4 / 23
The likelihood function L(θ) I
Measures agreement between θ and the observed data y o
I
Probability to generate data like y o if hypothesis θ holds
Data space Data source
Observation
yo
Unknown properties
y|θ
Data generation
ε
M(θ) Model
Michael Gutmann
BOLFI
5 / 23
Performing statistical inference
I
If L(θ) is known, inference is straightforward
I
Maximum likelihood estimation θˆ = argmaxθ L(θ)
I
Bayesian inference p(θ|y ) ∝ p(θ) × L(θ) posterior ∝ prior × likelihood Allows us to learn from data by updating probabilities
Michael Gutmann
BOLFI
6 / 23
Likelihood-free inference
Statistical inference for models where 1. the likelihood function is too costly to compute 2. sampling – simulating data – from the model is possible
Michael Gutmann
BOLFI
7 / 23
Importance of likelihood-free inference One reason: Such generative / simulator-based models occur widely I
Astrophysics: Simulating the formation of galaxies, stars, or planets
I
Evolutionary biology: Simulating the evolution of life
I
Neuroscience: Simulating neural circuits
I
Computer vision: Simulating natural scenes
I
Health science: Simulating the spread of an infectious disease
I
Simulated neural activity in rat somatosensory cortex (Figure from https://bbp.epfl.ch/nmc-portal)
... Michael Gutmann
BOLFI
8 / 23
Flavors of likelihood-free inference
I
There are several flavors of likelihood-free inference. In Bayesian setting e.g. I I
Approximate Bayesian computation (ABC) Synthetic likelihood (Wood, 2010)
I
General idea: Identify the values of the parameters of interest θ for which simulated data resemble the observed data
I
Simulated data resemble the observed data if some distance measure d ≥ 0 is small.
Here: Focus on ABC, see JMLR paper for synthetic likelihood
Michael Gutmann
BOLFI
9 / 23
Meta ABC algorithm
I I
Let y o be the observed data. Iterate many times: 1. Sample θ from a proposal distribution q(θ) 2. Sample y |θ according to the model 3. Compute distance d(y , y o ) between simulated and observed data 4. Retain θ if d(y , y o ) ≤
I
Different choices for q(θ) give different algorithms
I
Produces samples from the (approximate) posterior when is small
Michael Gutmann
BOLFI
10 / 23
Implicit likelihood approximation Likelihood: Probability to generate data like y o if hypothesis θ holds yθ(1)
Model
M(θ)
Data space
yθ(2) yθ(3)
ε
yθ(4)
yo
yθ(5) Likelihood L(θ) ≈ proportion of green outcomes
yθ(6)
L(θ) ≈
1 N
(i) o i=1 1 d(yθ , y ) ≤
PN
Michael Gutmann
BOLFI
11 / 23
Example: Bacterial infections in child care centers I I
Likelihood intractable for cross-sectional data But generating data from the model is possible 5
Parameters of interest: - rate of infections within a center - rate of infections from outside - competition between the strains
Strain Strain
10 15 20 25 30 10
15 20 Individual
25
30
35
Individual
5
Strain
10 15 20
5
25
10
30 5
Strain
5
15
10 20
5 15 20 Individual
25
Strain
25 30 5
10
Time
10
30
35
15 20 15 20 25 Individual
25
30
35
10
15 20 Individual
30
(Numminen et al, 2013)
5
Michael Gutmann
BOLFI
25
30
35
12 / 23
Example: Bacterial infections in child care centers
I
Data: Streptococcus pneumoniae colonization for 29 centers
I
Inference with Population Monte Carlo ABC
I
Reveals strong competition between different bacterial strains 18
prior posterior
Expensive: I
I
4.5 days on a cluster with 200 cores More than one million simulated data sets
probability density function
16 14 12 10
strong
weak
6 4 2 0 0
Michael Gutmann
Competition
8
BOLFI
0.2
0.4 0.6 Competition parameter
0.8
1
13 / 23
Why is the ABC algorithm so expensive? 1. It rejects most samples when is small 2. It does not make assumptions about the shape of L(θ) 3. It does not use all information available 4. It aims at equal accuracy for all parameters 6 5
L(θ) ≈
1 N
PN
i=1
1 d(yθ(i) , y o ) ≤
Approximate likelihood function (rescaled)
Average distance
4 3
Approximate lik function for competition parameter. N = 300.
2 distances
Variability
1
Threshold ε
0 0
Michael Gutmann
BOLFI
0.05 0.1 0.15 Competition parameter
0.2
14 / 23
Proposed solution (Gutmann and Corander, 2016)
1. It rejects most samples when is small ⇒ Don’t reject samples – learn from them 2. It does not make assumptions about the shape of L(θ) ⇒ Model the distances, assume average distance is smooth 3. It does not use all information available ⇒ Use Bayes’ theorem to update the model 4. It aims at equal accuracy for all parameters ⇒ Prioritize parameter regions with small distances equivalent strategy applies to inference with synthetic likelihood
Michael Gutmann
BOLFI
15 / 23
Modeling (points 1 & 2)
(i)
I
Data are tuples (θi , di ), where di = d(yθ , y o )
I
Model the conditional distribution of d given θ ˆ Estimated model yields approximation L(θ) for any choice of
I
b (d ≤ | θ) ˆ L(θ) ∝ Pr
I
b is probability under the estimated model. Pr Here: Use (log) Gaussian process as model (with squared exponential covariance function)
I Approach not restricted to Gaussian processes.
Michael Gutmann
BOLFI
16 / 23
Data acquisition (points 3 & 4) I
Samples of θ could be obtained by sampling from the prior or some adaptively constructed proposal distribution
I
Give priority to regions in the parameter space where distance d tends to be small.
I
Use Bayesian optimization to find such regions
I
Here: Use lower confidence bound acquisition function
(e.g. Cox
and John, 1992; Srinivas et al, 2012)
s ηt2 vt (θ) At (θ) = µt (θ) − |{z} | {z } | {z } post mean
(1)
weight post var
t: number of samples acquired so far I Approach not restricted to this acquisition function.
Michael Gutmann
BOLFI
17 / 23
Bayesian optimization for likelihood-free inference 5
Model based on 2 data points 95%
6
90%
5
80%
distance
0
mean
4 3
50%
2
-5
20%
1
10% -10
Model based on 3 data points
0
5%
Acquisition function
-1 -2
-15 0
0.05
0.1
0.15
Competition parameter
6
0.2
Next parameter to try
-3 0
0.05
0.1
0.15
0.2
Competition parameter
Model based on 4 data points
5
Exploration vs exploitation
distance
4 3
Data
Model
2 1 0 -1 0
Bayes' theorem 0.05
0.1
0.15
0.2
Competition parameter
Michael Gutmann
BOLFI
18 / 23
Example: Bacterial infections in child care centers I
Comparison of the proposed approach with a standard population Monte Carlo ABC approach.
I
Roughly equal results using 1000 times fewer simulations. Developed Fast Method Standard Method
0.4
Posterior means: solid lines,
0.35 Competition parameter
4.5 days with 200 cores ↓ 90 minutes with seven cores .
credibility intervals: shaded areas or dashed lines
0.3 0.25 0.2 0.15 0.1 0.05 2
2.5
3 3.5 4 4.5 5 Computational cost (log10)
5.5
6
(Gutmann and Corander, 2016) Michael Gutmann
BOLFI
19 / 23
Example: Bacterial infections in child care centers I
Comparison of the proposed approach with a standard population Monte Carlo ABC approach.
I
Roughly equal results using 1000 times fewer simulations.
11 Developed Fast Method Standard Method
10
1.8
External infection parameter
Internal infection parameter
Developed Fast Method Standard Method
1.6
9 8 7 6 5 4 3
1.4 1.2 1 0.8 0.6
2 1
0.4
2
2.5
3 3.5 4 4.5 5 Computational cost (log10)
5.5
6
2
2.5
3 3.5 4 4.5 5 Computational cost (log10)
Posterior means are shown as solid lines, credibility intervals as shaded areas or dashed lines
Michael Gutmann
BOLFI
5.5
6
. 20 / 23
Further benefits
I
The proposed method makes the inference more efficient. I
I
Enables inference for models which were out of reach till now I
I
Allowed us to perform far more comprehensive data analysis than with standard approach (Numminen et al, 2016)
model of evolution where simulating a single data set took us 12-24 hours (Marttinen et al, 2015)
Enables easier assessment of parameter identifiability for complex models I
model about transmission dynamics of tuberculosis (Lintusaari et al, 2016)
Michael Gutmann
BOLFI
21 / 23
Open questions
I
Model: How to best model the distance between simulated and observed data?
I
Acquisition function: Can we find strategies which are optimal for parameter inference?
I
Efficient high-dimensional inference: Can we use the approach to infer the joint distribution of 1000 variables?
see JMLR paper for a discussion
Michael Gutmann
BOLFI
22 / 23
Summary
I
Topic: Inference for models where the likelihood is intractable but sampling is possible
I
Inference principle: Find parameter values for which the distance between simulated and observed data is small
I
Problem considered: Computational cost
I
Proposed approach: Combine statistical modeling of the distance with decision making under uncertainty (Bayesian optimization)
I
Outcome: Approach increases the efficiency of the inference by several orders of magnitude
Michael Gutmann
BOLFI
23 / 23
References I M.U. Gutmann and J. Corander. Bayesian optimization for likelihood-free inference of simulator-based statistical models, Journal of Machine Learning Research, 17(125): 1–47, 2016 I J. Lintusaari, M.U. Gutmann, R. Dutta, S. Kaski, and J. Corander. Fundamentals and Recent Developments in Approximate Bayesian Computation, Systematic Biology, in press, 2016 I E. Numminen, M.U. Gutmann, M. Shubin, et al. The impact of host metapopulation structure on the population genetics of colonizing bacteria Journal of Theoretical Biology, 396: 53–62, 2016 I J. Lintusaari, M.U. Gutmann, S. Kaski, and J. Corander. On the identifiability of transmission dynamic models for infectious disease Genetics, 202(3): 911–918, 2016 I P. Marttinen, N.J. Croucher, M.U. Gutmann, J. Corander, and W.P. Hanage. Recombination produces coherent bacterial species clusters in both core and accessory genomes, Microbial Genomics, 1(5), 2015 I Numminen et al. Estimating the Transmission Dynamics of Streptococcus pneumoniae from Strain Prevalence Data. Biometrics 9, 2013. I N. Srinivas, A. Krause, S.M. Kakade, and M. Seeger. Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Transactions on Information Theory, 58(5):3250–3265, 2012. I S.N. Wood. Statistical inference for noisy nonlinear ecological dynamic systems, Nature, 466: 1102–1104, 2010 I D. Cox and S. John. A statistical method for global optimization, Proc. IEEE Conference on Systems, Man and Cybernetics, 2: 1241–1246, 1992