Variational Inference and Experimental Design for Sparse Linear Models

Hannes Nickisch¹, joint work with Matthias Seeger², Rolf Pohmann¹ and Bernhard Schölkopf¹

¹ Max Planck Institute for Biological Cybernetics, Tübingen
² Saarland University & Max Planck Institute for Informatics, Saarbrücken

December 5, 2008


Sparse Estimation

[Figure: data-fit contours meeting the sparsity regularizer in (s_i, s_j) coefficient space]

y = X u + ε,   s = B u
(y: m×1, X: m×n, u: n×1, s: q×1, B: q×n)

- Wavelet denoising of medical images (Weaver, 91)
- Wavelet shrinkage (Donoho and Johnstone, 93)
- Selective shrinkage (Ishwaran and Rao, 05)
- Compressed sensing (Candès and Donoho, 06)
- Graphical model estimation (Meinshausen and Bühlmann, 06/08)

Real-world signals: Images

- Images are statistically sparse (noise bitmaps aren't)
- Images are not piecewise constant (phantoms are)

[Figure: heavy-tailed histograms of image coefficients] (Simoncelli, 99)

"Learning compressed sensing" for images (Seeger and Nickisch, ICML 08, [1])

Real-world signals: Images

Sparse estimation can be too sparse.

Is the mode sufficiently informative?

[Figure: data + sparsity ⇒ shrinkage in (s₁, s₂, s₃) coefficient space]

The estimator ŝ contains many zeros.

Shortcomings of sparse estimation alone:
- correlations between eliminated coefficients vanish
- the posterior mode is a singular point (no proper curvature)

Open questions:
- stability of the sparseness pattern
- ordering and confidence of elimination
- experimental design

The posterior distribution might help here.

Application to Magnetic Resonance Imaging

1) Prior:       P(u) ∝ ∏_j P(s_j), P(s_j) ∝ exp(−τ|s_j|), s = Bu   (Laplace prior)
2) Data:        P(y|u) ∝ exp(−‖Xu − y‖²₂ / (2σ²)), y = Xu + ε      (Gaussian likelihood)
3) Estimation:  posterior P(u|y)
4) Inference
5) Design

Design improves MR sequences (Seeger & Nickisch, NIPS 08, [3])

Inference

- Approximate inference: P(u|y) ≈ N(u|μ_γ, Σ_γ), a deterministic (Gaussian) approximation to the posterior
- Variational relaxation: min_γ φ(γ), a high-dimensional convex minimization (Seeger & Nickisch, 08, [2])
- Scalability: full images (n = 256 × 256)
  - numerical mathematics: linear systems (CG, Lanczos)
  - signal processing: MVMs (NFFT, filterbanks)

Minimization

- Naïve minimization of φ : ℝ^196096 → ℝ is convex but infeasible
- Concave + convex decomposition: φ = φ_∩ + φ_∪ ≤ φ_∕ + φ_∪
- Double-loop algorithm (like EM):
  - inner: γ^(i+1) ← argmin_γ φ_∕^i + φ_∪
  - outer: φ_∕^(i+1) ← tight bound at γ^(i+1)

[Figure: φ = φ_∩ + φ_∪; the concave part φ_∩ is upper-bounded by its tangent φ_∕^i, re-tightened at γ^(i+1)]

Sequential Design

Prior P(u) → [y₀ = X₀u + ε₀] → Post₀: P(u|y₀) ≈ Q(u|y₀) → [y₁ = X₁u + ε₁] → Post₁: P(u|y₀:₁) ≈ Q(u|y₀:₁) → … → [y_d = X_d u + ε_d] → Post_d: P(u|y₀:d) ≈ Q(u|y₀:d)

- (≈) Gaussian approximation: P(u|y) ≈ Q(u|y) = N(u|μ, Σ)
- (→) Gaussian design: X_i = argmax_{X_*} ln|I + X_*ᵀ Σ X_*|
- maximal uncertainty reduction ≡ information gain (D-optimal)
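The design score above can be evaluated directly. The sketch below is my own illustration, not the authors' code: the helper name `info_gain` and all sizes are made up, and the noise variance is absorbed into Σ. It scores candidate measurement blocks by ln|I + X_* Σ X_*ᵀ| (equal to ln|I + X_*ᵀ Σ X_*| by Sylvester's determinant identity) and keeps the best one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30                                   # latent dimension (made up)
A = rng.standard_normal((n, n))
Sigma = A @ A.T / n + 0.1 * np.eye(n)    # stand-in posterior covariance

def info_gain(X_new, Sigma):
    """D-optimal score ln|I + X_* Sigma X_*^T| (noise variance absorbed into Sigma)."""
    m = X_new.shape[0]
    _, logdet = np.linalg.slogdet(np.eye(m) + X_new @ Sigma @ X_new.T)
    return logdet

# candidate measurement blocks (e.g. k-space interleaves); keep the best scorer
candidates = [rng.standard_normal((4, n)) for _ in range(10)]
scores = [info_gain(X, Sigma) for X in candidates]
best = int(np.argmax(scores))
```

Working with the small m×m matrix I + X_* Σ X_*ᵀ keeps each candidate evaluation cheap even when n is large.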

Results

[Figure: k-space sampling patterns with reconstruction errors — Random φ₀: Err = 12.08; Equi-spaced φ₀: Err = 4.40; Optimized φ₀: Err = 3.95 — and log of L2 error vs. number of spiral arms (2–7)]

Summary, Outlook, Questions?

- Many problems require sparse inference, beyond sparse estimation: confidences, ordering, experimental design.
- For medical images, sparsity regularization can be too strong.
- Sparse estimation: design matters more than estimation.
- Scalable variational inference algorithm over a large non-Gaussian model.
- Bayesian experimental design: support for MRI sequence design.
- Can sparsity theory be extended to convex variational inference / experimental design?

References

[1] M. Seeger and H. Nickisch. Compressed sensing and Bayesian experimental design. In 25th International Conference on Machine Learning, 2008.
[2] M. Seeger and H. Nickisch. Large scale variational inference and experimental design for sparse generalized linear models. Technical Report 175, MPI for Biological Cybernetics, Tübingen, September 2008.
[3] M. Seeger, H. Nickisch, R. Pohmann, and B. Schölkopf. Bayesian experimental design of magnetic resonance imaging sequences. In Neural Information Processing Systems 21 (in press), 2008.

Model — Laplace Prior

Sparsity of (Medical) Images

[Figure: log-histogram of finite-difference (∇) coefficients — image vs. Normal vs. Laplace; optimal quantization]

Images are sparse [Field, 1987]:
- super-Gaussianity (peakedness / kurtosis > 0)
- compression (many fewer bits/pixel than noise)
- reconstruction from incomplete information

Laplace prior: P(u) ∝ exp(−‖τ ⊙ (Bu)‖₁)
- P(s_i) = (τ_i/2) exp(−τ_i|s_i|), s = Bu sparse coefficients
- B = [W; D₁; D₂] overcomplete transform (wavelet, finite differences)
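The super-Gaussianity claim is easy to check numerically: a Laplace sample has large positive excess kurtosis while a Gaussian sample has none. This is a small self-contained demo, not from the slides; the constant 3 below is the known excess kurtosis of the Laplace distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
gauss = rng.standard_normal(n)
lap = rng.laplace(0.0, 1.0, n)           # Laplace: the sparsity prior of the talk

def excess_kurtosis(x):
    """Peakedness measure: 0 for a Gaussian, 3 for a Laplace distribution."""
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2 - 3.0
```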

Model — Gaussian Likelihood

Measurements in k-space

[Figure: r-space image U(r), k-space samples Y(k), gradient waveforms g_x(t) and g_y(t) in mT/m over t in ms]

- Image u ≡ proton density of tissue in pixel space U(r)
- Measurements y ≡ Fourier coefficients in k-space Y(k)
- Spatial gradients g_x(t) and g_y(t) ⇒ k-space trajectory k(t)
- Gaussian likelihood: P(y|u) ∝ exp(−‖Xu − y‖²₂ / (2σ²))
- Linear model y = Xu + ε where X = [e^{−i2π r_jᵀ k(t_l)}]_{lj}
- Noise ε due to phase shifts and system imperfections
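For a uniform Cartesian trajectory, the measurement matrix X = [e^{−i2π r_j k_l}] reduces to a plain DFT, which the following 1-D sketch verifies (an illustration with made-up sizes; real spiral trajectories need a nonuniform FFT such as the NFFT mentioned in the talk):

```python
import numpy as np

n = 16
r = np.arange(n) / n                        # pixel positions r_j in [0, 1)
k = np.arange(n)                            # uniform k-space samples k_l
X = np.exp(-2j * np.pi * np.outer(k, r))    # X[l, j] = e^{-i 2π r_j k_l}

u = np.random.default_rng(2).standard_normal(n)   # toy "image"
y = X @ u                                   # noiseless measurements
# on the uniform grid, applying X is exactly the FFT
assert np.allclose(y, np.fft.fft(u))
```

This is why the talk's "computational primitives" are matrix–vector multiplications: X never needs to be stored explicitly.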

Model — Posterior

Estimation

Posterior (unimodal):
P(u|y) ∝ P(u) P(y|u) ∝ exp(−‖τ ⊙ (Bu)‖₁ − ‖Xu − y‖²₂ / (2σ²))

MAP estimation (convex):
- wavelet denoising [Weaver, 1991] (s_i < θ ⇒ s_i ← 0)
- Landweber iterations, shrinkage [Stein, 1960s]
- û_MAP = argmax_u P(u|y) = argmin_u ‖τ ⊙ (Bu)‖₁ + ‖Xu − y‖²₂ / (2σ²)

Compressed sensing:
- big hype: sensing → postprocessing [Candès & Donoho]
- advice: choose X incoherent to B, i.e. random X
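For B = I, the MAP problem above is classic ℓ1-regularized least squares and can be solved by iterative soft-thresholding (ISTA). The sketch below is my own minimal illustration, not the reconstruction code used in the talk; `map_ista` and all problem sizes are made up.

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding: the proximal operator of the ℓ1 penalty."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def map_ista(X, y, tau, sigma2=1.0, iters=3000):
    """û_MAP = argmin_u τ‖u‖₁ + ‖Xu − y‖²/(2σ²) via ISTA, assuming B = I."""
    L = np.linalg.norm(X, 2) ** 2 / sigma2     # Lipschitz constant of the smooth part
    u = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ u - y) / sigma2      # gradient step on the data fit
        u = soft(u - grad / L, tau / L)        # prox step on the ℓ1 term
    return u

# toy compressed-sensing check: a 5-sparse u0 from 40 random measurements
rng = np.random.default_rng(3)
m, n = 40, 100
X = rng.standard_normal((m, n)) / np.sqrt(m)   # random (incoherent) X, as advised
u0 = np.zeros(n)
u0[rng.choice(n, 5, replace=False)] = 3.0 * rng.standard_normal(5)
y = X @ u0
u_hat = map_ista(X, y, tau=0.02)
```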

The Bayesian Crank — Inference

Approximations and Relaxations

[Figure: −ln Z_VB ≥ −ln Z_KL ≥ −ln Z as functions of γ]

- Gaussian approximation: P(u|y) = (1/Z) ∏_{j=1}^q P(s_j) N(y) ≈ N(u|m, V) =: Q(u)
- Variational approach: min_{m,V} KL(Q‖P); at a fixed point, ∂KL(Q‖P)/∂V = 0 ⇒ each site is replaced by a Gaussian term: P(s_j) ← t̃_j(s_j) := e^{b_j s_j − s_j²/(2γ_j)}
- Bound chain:
  ln Z = ln ∫ ∏_j P(s_j) N(y) du
       ≥ ∫ Q ln [ ∏_j P(s_j) N(y) / Q ] du =: ln Z_KL   (Jensen)
       ≥ ln ∫ ∏_j t̃_j(s_j) N(y) du =: ln Z_VB          (convex relaxation)

The Bayesian Crank — Inference

Legendre Duality

- Express a convex f(x) as a maximisation; the workhorse of variational approximate inference:
  f(x) = max_u xᵀu − f*(u) = xᵀu* − f*(u*) with u* = ∇f(x)

[Figure: f(x) as the upper envelope of lines with slope u and intercept −f*(u)]
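A quick numeric check of the duality (a toy verification, not from the slides): for f(x) = x² the conjugate is f*(u) = u²/4, the envelope max_u [xu − f*(u)] reproduces f, and the maximiser is u* = ∇f(x) = 2x.

```python
import numpy as np

# f(x) = x^2 has conjugate f*(u) = u^2 / 4
f = lambda x: x ** 2
f_star = lambda u: u ** 2 / 4.0

u_grid = np.linspace(-20.0, 20.0, 200_001)
for x in (-3.0, 0.5, 2.0):
    tangents = x * u_grid - f_star(u_grid)      # family of lower bounds on f(x)
    assert abs(tangents.max() - f(x)) < 1e-3    # their envelope recovers f(x)
    u_star = u_grid[np.argmax(tangents)]
    assert abs(u_star - 2.0 * x) < 1e-2         # maximiser is ∇f(x) = 2x
```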

The Bayesian Crank — Inference

Site Bounding

[Figure: Gaussian lower bounds of the Laplace site]

- ln P(s_j) is convex in s_j² (all scale mixtures) [Wipf, 08]
- Legendre duality with P(s_j) =: e^{g(s_j²)} and g(x) = max_p px − g*(p):
  P(s_j) = max_{γ_j>0} e^{−s_j²/(2γ_j)} e^{−g*(−1/(2γ_j))}
- Laplace site: e^{−τ_j|s_j|} = max_{γ_j} N^U(s_j|0, γ_j) e^{−τ_j²γ_j/2}   [Girolami, 01]
- −ln Z ≤ −ln Z_VB(γ) =_c ln|A_γ| − 2 ∑_{j=1}^q g*(−1/(2γ_j)) − σ⁻² yᵀ X A_γ⁻¹ Xᵀ y,
  where A_γ = XᵀX + BᵀΓ⁻¹B
- −ln Z_VB(γ) is convex if ln P(s_j) is concave (Laplace, Bernoulli, logistic)
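The Laplace site bound can be verified numerically: maximizing e^{−s²/(2γ)} · e^{−τ²γ/2} over γ recovers e^{−τ|s|} exactly, with the maximum at γ* = |s|/τ. A toy check, assuming the unnormalized-Gaussian convention N^U(s|0, γ) = e^{−s²/(2γ)}:

```python
import numpy as np

tau = 2.0
gammas = np.linspace(1e-4, 10.0, 200_001)
for s in (0.3, 1.0, 2.5):
    # unnormalized-Gaussian bound e^{-s^2/(2γ)} · e^{-τ^2 γ/2}, maximised over γ
    vals = np.exp(-s**2 / (2.0 * gammas)) * np.exp(-tau**2 * gammas / 2.0)
    assert abs(vals.max() - np.exp(-tau * abs(s))) < 1e-4   # envelope equals Laplace site
    g_star = gammas[np.argmax(vals)]
    assert abs(g_star - abs(s) / tau) < 1e-2                # optimum at γ* = |s|/τ
```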

The Bayesian Crank — Inference

Double Loop Optimization

- Naïve minimization of φ(γ) := −ln Z_VB(γ) is infeasible: ∂φ/∂γ needs the inversion of A_γ = XᵀX + BᵀΓ⁻¹B, let alone ∂²φ/∂γ∂γᵀ
- Decomposition: φ(γ) = ln|A_γ| + γᵀτ² − σ⁻² yᵀ X A_γ⁻¹ Xᵀ y, where φ_∩(γ⁻¹) := ln|A_γ| is concave in γ⁻¹ (and convex in γ), and φ_∪(γ) := γᵀτ² − σ⁻² yᵀ X A_γ⁻¹ Xᵀ y is convex in γ
- Fenchel bound on the concave part: ln|A_γ| = min_z zᵀγ⁻¹ − φ_∩*(z); moreover
  γᵀτ² − σ⁻² yᵀ X A_γ⁻¹ Xᵀ y =_c γᵀτ² + σ⁻² min_u ‖y − Xu‖²₂ + uᵀBᵀΓ⁻¹Bu
- Inner loop: u ← argmin_u (1/(2σ²))‖y − Xu‖²₂ + τᵀ√(z + (Bu/σ)²), then γ ← √(z + (Bu/σ)²)/τ (elementwise)
- Outer loop: z ← dg(B A_γ⁻¹ Bᵀ)

The Bayesian Crank — Inference

Inner and Outer Loop

Inner: Gaussian mean, IRLS
- u ← argmin_u (1/(2σ²))‖y − Xu‖²₂ + τᵀ√(z + (Bu/σ)²)   (MAP for z = 0)
- Newton steps, approximated by LCG

Outer: Gaussian variances
- z ← dg(B A_γ⁻¹ Bᵀ) = σ⁻² V[Bu|y, γ]   [Schneider & Willsky, 01]
- Lanczos algorithm: Q_k = [q₁, …, q_k] such that Q_kᵀ A_γ Q_k = T_k, Q_kᵀ Q_k = I;
  A_γ⁻¹ ≈ Q_k T_k⁻¹ Q_kᵀ yields an estimate ẑ of z with 0 < ẑ ≤ z

Computational primitives
- MVMs with X, Xᵀ, B and Bᵀ, plus simple vector operations
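On a toy problem, the two loops fit in a few lines of dense linear algebra. This is a hedged sketch of the scheme, not the authors' scalable implementation: B = I, all matrices are dense, the IRLS weighting and the γ- and z-updates follow my reading of the slides, and no LCG or Lanczos is used.

```python
import numpy as np

def double_loop(X, B, y, tau, sigma, outer=5, inner=30):
    """Toy double-loop sketch: inner IRLS for the posterior mean,
    outer refit of the variances z = diag(B A_g^{-1} B^T)."""
    q = B.shape[0]
    u = np.linalg.lstsq(X, y, rcond=None)[0]   # warm start (avoids IRLS zero-locking)
    z = np.zeros(q)
    for _ in range(outer):
        for _ in range(inner):                 # inner: min ||y-Xu||^2/(2σ²) + τᵀ√(z+(Bu/σ)²)
            s = B @ u / sigma
            w = tau / np.sqrt(z + s**2 + 1e-12)              # IRLS weights
            A = (X.T @ X + B.T @ (w[:, None] * B)) / sigma**2
            u = np.linalg.solve(A, X.T @ y / sigma**2)
        gamma = np.sqrt(z + (B @ u / sigma)**2) / tau + 1e-10    # site widths
        A_g = X.T @ X + B.T @ ((1.0 / gamma)[:, None] * B)       # A_γ = XᵀX + BᵀΓ⁻¹B
        z = np.diag(B @ np.linalg.solve(A_g, B.T))               # outer: Gaussian variances
    return u, gamma, z

rng = np.random.default_rng(4)
m, n = 20, 15
X = rng.standard_normal((m, n))
B = np.eye(n)
u0 = np.zeros(n); u0[:3] = (3.0, -2.0, 1.5)
y = X @ u0 + 0.1 * rng.standard_normal(m)
u_hat, gamma, z = double_loop(X, B, y, tau=np.ones(n), sigma=0.1)
```

In the talk's setting, the dense solves are replaced by the MVM-based primitives of the previous slide.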

The Bayesian Crank — Design

Classical Gaussian Experimental Design

- y = Xu + ε with ε ~ N(0, σ²I)
- û_ML = X⁺y; residuals r = û_ML − u ~ N(0, σ²(XᵀX)⁻¹)
- Maximize the precision matrix P = σ⁻² XᵀX [Fisher, 1935]:
  - A-optimal: tr(P) = ∑_i λ_i(P)
  - D-optimal: ln|P| = ∑_i ln λ_i(P)
  - E-optimal: ‖P‖²_Fro = ∑_i λ_i(P)²
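The three criteria are spectral functionals of P and can be cross-checked against their matrix forms (a small verification sketch; the helper `criteria` and the sizes are made up):

```python
import numpy as np

def criteria(X, sigma=1.0):
    """A-, D- and Frobenius scores of P = σ⁻² XᵀX via its eigenvalues."""
    lam = np.linalg.eigvalsh(X.T @ X / sigma**2)
    return lam.sum(), np.sum(np.log(lam)), np.sum(lam**2)

rng = np.random.default_rng(5)
X = rng.standard_normal((20, 8))          # m > n so P is (generically) invertible
a_score, d_score, e_score = criteria(X)
P = X.T @ X                               # σ = 1 here
```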

Results — Simulations from raw data

Experimental Setup

- Data: Siemens 3T, TSE, 256×256 pixels, 16 slices
- Spiral trajectories; adjust the offset angle φ₀ ∈ [0, π)
- MAP reconstruction with random, equispaced, and optimized φ₀

[Figure: spiral k-space trajectories and MAP reconstructions for random, equispaced, and optimized φ₀]

Results — Simulations from raw data

Results

[Figure sequence: k-space sampling patterns and log of L2 error vs. number of spiral arms (2–7); per-frame reconstruction errors:]

  Random φ₀   Equi-spaced φ₀   Optimized φ₀
  29.91       29.47            27.51
  24.07       23.23            23.08
  20.02       19.03            17.50
  16.16       14.18            12.99
  12.86       10.06            8.31
  12.08       4.40             3.95

The optimized offset angle attains the lowest error in every frame.

Results — Simulations from raw data

Robustness / Optimization

[Figure: L2 reconstruction error (log scale, 2–30) vs. number of spiral arms (2–8) for MAP(op) and MAP(eq)]