Variational Inference and Experimental Design for Sparse Linear Models

Hannes Nickisch¹, joint work with Matthias Seeger², Rolf Pohmann¹ and Bernhard Schölkopf¹

¹ Max Planck Institute for Biological Cybernetics, Tübingen
² Saarland University & Max Planck Institute for Informatics, Saarbrücken

December 5, 2008


Sparse Estimation

[Figure: data-fit contours meeting the sparsity regularizer in (s_i, s_j) coefficient space]

y = X u + ε,   s = B u
(y: m×1, X: m×n, u: n×1, s: q×1, B: q×n)

- Wavelet denoising of medical images (Weaver, 91)
- Wavelet shrinkage (Donoho and Johnstone, 93)
- Selective shrinkage (Ishwaran and Rao, 05)
- Compressed sensing (Candès and Donoho, 06)
- Graphical model estimation (Meinshausen and Bühlmann, 06/08)

Real-world signals: Images

- Images are statistically sparse (noise bitmaps aren't)
- Images are not piecewise constant (phantoms are)

[Figure: heavy-tailed histograms of image coefficients] (Simoncelli, 99)

"Learning compressed sensing" for images (Seeger and Nickisch, ICML 08, [1])

Real-world signals: Images

Sparse estimation can be too sparse.

Is the mode sufficiently informative?

[Figure: data + sparsity ⇒ shrinkage in (s₁, s₂, s₃) coefficient space]

The estimator ŝ contains many zeros.

Shortcomings of sparse estimation alone:
- correlations between eliminated coefficients vanish
- the posterior mode is a singular point (no proper curvature)

Open questions:
- stability of the sparseness pattern
- ordering and confidence of elimination
- experimental design

The posterior distribution might help here.

Application to Magnetic Resonance Imaging

1) Prior:       P(u) ∝ ∏_j P(s_j), P(s_j) ∝ exp(−τ|s_j|), s = Bu   (Laplace prior)
2) Data:        P(y|u) ∝ exp(−‖Xu − y‖²₂ / (2σ²)), y = Xu + ε      (Gaussian likelihood)
3) Estimation:  posterior P(u|y)
4) Inference
5) Design

Design improves MR sequences (Seeger & Nickisch, NIPS 08, [3])

Inference

- Approximate inference: P(u|y) ≈ N(u|μ_γ, Σ_γ), a deterministic (Gaussian) approximation to the posterior
- Variational relaxation: min_γ φ(γ), a high-dimensional convex minimization (Seeger & Nickisch, 08, [2])
- Scalability: full images (n = 256 × 256)
  - numerical mathematics: linear systems (CG, Lanczos)
  - signal processing: MVMs (NFFT, filterbanks)

Minimization

- Naïve minimization of φ : ℝ^196096 → ℝ is convex but infeasible
- Concave + convex decomposition: φ = φ_∩ + φ_∪ ≤ φ_∕ + φ_∪
- Double-loop algorithm (like EM):
  - inner: γ^(i+1) ← argmin_γ φ_∕^i + φ_∪
  - outer: φ_∕^(i+1) ← tight bound at γ^(i+1)

[Figure: φ = φ_∩ + φ_∪; the concave part φ_∩ is upper-bounded by its tangent φ_∕^i, re-tightened at γ^(i+1)]

Sequential Design

Prior P(u) → [y₀ = X₀u + ε₀] → Post₀: P(u|y₀) ≈ Q(u|y₀) → [y₁ = X₁u + ε₁] → Post₁: P(u|y₀:₁) ≈ Q(u|y₀:₁) → … → [y_d = X_d u + ε_d] → Post_d: P(u|y₀:d) ≈ Q(u|y₀:d)

- (≈) Gaussian approximation: P(u|y) ≈ Q(u|y) = N(u|μ, Σ)
- (→) Gaussian design: X_i = argmax_{X_*} ln|I + X_*ᵀ Σ X_*|
- maximal uncertainty reduction ≡ information gain (D-optimal)
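The design score above can be evaluated directly. The sketch below is my own illustration, not the authors' code: the helper name `info_gain` and all sizes are made up, and the noise variance is absorbed into Σ. It scores candidate measurement blocks by ln|I + X_* Σ X_*ᵀ| (equal to ln|I + X_*ᵀ Σ X_*| by Sylvester's determinant identity) and keeps the best one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30                                   # latent dimension (made up)
A = rng.standard_normal((n, n))
Sigma = A @ A.T / n + 0.1 * np.eye(n)    # stand-in posterior covariance

def info_gain(X_new, Sigma):
    """D-optimal score ln|I + X_* Sigma X_*^T| (noise variance absorbed into Sigma)."""
    m = X_new.shape[0]
    _, logdet = np.linalg.slogdet(np.eye(m) + X_new @ Sigma @ X_new.T)
    return logdet

# candidate measurement blocks (e.g. k-space interleaves); keep the best scorer
candidates = [rng.standard_normal((4, n)) for _ in range(10)]
scores = [info_gain(X, Sigma) for X in candidates]
best = int(np.argmax(scores))
```

Working with the small m×m matrix I + X_* Σ X_*ᵀ keeps each candidate evaluation cheap even when n is large.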

Results

[Figure: k-space sampling patterns with reconstruction errors — Random φ₀: Err = 12.08; Equi-spaced φ₀: Err = 4.40; Optimized φ₀: Err = 3.95 — and log of L2 error vs. number of spiral arms (2–7)]

Summary, Outlook, Questions?

- Many problems require sparse inference, beyond sparse estimation: confidences, ordering, experimental design.
- For medical images, sparsity regularization can be too strong.
- Sparse estimation: design matters more than estimation.
- Scalable variational inference algorithm over a large non-Gaussian model.
- Bayesian experimental design: support for MRI sequence design.
- Can sparsity theory be extended to convex variational inference / experimental design?

References

[1] M. Seeger and H. Nickisch. Compressed sensing and Bayesian experimental design. In 25th International Conference on Machine Learning, 2008.
[2] M. Seeger and H. Nickisch. Large scale variational inference and experimental design for sparse generalized linear models. Technical Report 175, MPI for Biological Cybernetics, Tübingen, September 2008.
[3] M. Seeger, H. Nickisch, R. Pohmann, and B. Schölkopf. Bayesian experimental design of magnetic resonance imaging sequences. In Neural Information Processing Systems 21 (in press), 2008.

Model — Laplace Prior

Sparsity of (Medical) Images

[Figure: log-histogram of finite-difference (∇) coefficients — image vs. Normal vs. Laplace; optimal quantization]

Images are sparse [Field, 1987]:
- super-Gaussianity (peakedness / kurtosis > 0)
- compression (many fewer bits/pixel than noise)
- reconstruction from incomplete information

Laplace prior: P(u) ∝ exp(−‖τ ⊙ (Bu)‖₁)
- P(s_i) = (τ_i/2) exp(−τ_i|s_i|), s = Bu sparse coefficients
- B = [W; D₁; D₂] overcomplete transform (wavelet, finite differences)
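The super-Gaussianity claim is easy to check numerically: a Laplace sample has large positive excess kurtosis while a Gaussian sample has none. This is a small self-contained demo, not from the slides; the constant 3 below is the known excess kurtosis of the Laplace distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
gauss = rng.standard_normal(n)
lap = rng.laplace(0.0, 1.0, n)           # Laplace: the sparsity prior of the talk

def excess_kurtosis(x):
    """Peakedness measure: 0 for a Gaussian, 3 for a Laplace distribution."""
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2 - 3.0
```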

Model — Gaussian Likelihood

Measurements in k-space

[Figure: r-space image U(r), k-space samples Y(k), gradient waveforms g_x(t) and g_y(t) in mT/m over t in ms]

- Image u ≡ proton density of tissue in pixel space U(r)
- Measurements y ≡ Fourier coefficients in k-space Y(k)
- Spatial gradients g_x(t) and g_y(t) ⇒ k-space trajectory k(t)
- Gaussian likelihood: P(y|u) ∝ exp(−‖Xu − y‖²₂ / (2σ²))
- Linear model y = Xu + ε where X = [e^{−i2π r_jᵀ k(t_l)}]_{lj}
- Noise ε due to phase shifts and system imperfections
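For a uniform Cartesian trajectory, the measurement matrix X = [e^{−i2π r_j k_l}] reduces to a plain DFT, which the following 1-D sketch verifies (an illustration with made-up sizes; real spiral trajectories need a nonuniform FFT such as the NFFT mentioned in the talk):

```python
import numpy as np

n = 16
r = np.arange(n) / n                        # pixel positions r_j in [0, 1)
k = np.arange(n)                            # uniform k-space samples k_l
X = np.exp(-2j * np.pi * np.outer(k, r))    # X[l, j] = e^{-i 2π r_j k_l}

u = np.random.default_rng(2).standard_normal(n)   # toy "image"
y = X @ u                                   # noiseless measurements
# on the uniform grid, applying X is exactly the FFT
assert np.allclose(y, np.fft.fft(u))
```

This is why the talk's "computational primitives" are matrix–vector multiplications: X never needs to be stored explicitly.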

Model — Posterior

Estimation

Posterior (unimodal):
P(u|y) ∝ P(u) P(y|u) ∝ exp(−‖τ ⊙ (Bu)‖₁ − ‖Xu − y‖²₂ / (2σ²))

MAP estimation (convex):
- wavelet denoising [Weaver, 1991] (s_i < θ ⇒ s_i ← 0)
- Landweber iterations, shrinkage [Stein, 1960s]
- û_MAP = argmax_u P(u|y) = argmin_u ‖τ ⊙ (Bu)‖₁ + ‖Xu − y‖²₂ / (2σ²)

Compressed sensing:
- big hype: sensing → postprocessing [Candès & Donoho]
- advice: choose X incoherent to B, i.e. random X
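For B = I, the MAP problem above is classic ℓ1-regularized least squares and can be solved by iterative soft-thresholding (ISTA). The sketch below is my own minimal illustration, not the reconstruction code used in the talk; `map_ista` and all problem sizes are made up.

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding: the proximal operator of the ℓ1 penalty."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def map_ista(X, y, tau, sigma2=1.0, iters=3000):
    """û_MAP = argmin_u τ‖u‖₁ + ‖Xu − y‖²/(2σ²) via ISTA, assuming B = I."""
    L = np.linalg.norm(X, 2) ** 2 / sigma2     # Lipschitz constant of the smooth part
    u = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ u - y) / sigma2      # gradient step on the data fit
        u = soft(u - grad / L, tau / L)        # prox step on the ℓ1 term
    return u

# toy compressed-sensing check: a 5-sparse u0 from 40 random measurements
rng = np.random.default_rng(3)
m, n = 40, 100
X = rng.standard_normal((m, n)) / np.sqrt(m)   # random (incoherent) X, as advised
u0 = np.zeros(n)
u0[rng.choice(n, 5, replace=False)] = 3.0 * rng.standard_normal(5)
y = X @ u0
u_hat = map_ista(X, y, tau=0.02)
```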

The Bayesian Crank — Inference

Approximations and Relaxations

[Figure: −ln Z_VB ≥ −ln Z_KL ≥ −ln Z as functions of γ]

- Gaussian approximation: P(u|y) = (1/Z) ∏_{j=1}^q P(s_j) N(y) ≈ N(u|m, V) =: Q(u)
- Variational approach: min_{m,V} KL(Q‖P); at a fixed point, ∂KL(Q‖P)/∂V = 0 ⇒ each site is replaced by a Gaussian term: P(s_j) ← t̃_j(s_j) := e^{b_j s_j − s_j²/(2γ_j)}
- Bound chain:
  ln Z = ln ∫ ∏_j P(s_j) N(y) du
       ≥ ∫ Q ln [ ∏_j P(s_j) N(y) / Q ] du =: ln Z_KL   (Jensen)
       ≥ ln ∫ ∏_j t̃_j(s_j) N(y) du =: ln Z_VB          (convex relaxation)

The Bayesian Crank — Inference

Legendre Duality

- Express a convex f(x) as a maximisation; the workhorse of variational approximate inference:
  f(x) = max_u xᵀu − f*(u) = xᵀu* − f*(u*) with u* = ∇f(x)

[Figure: f(x) as the upper envelope of lines with slope u and intercept −f*(u)]
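A quick numeric check of the duality (a toy verification, not from the slides): for f(x) = x² the conjugate is f*(u) = u²/4, the envelope max_u [xu − f*(u)] reproduces f, and the maximiser is u* = ∇f(x) = 2x.

```python
import numpy as np

# f(x) = x^2 has conjugate f*(u) = u^2 / 4
f = lambda x: x ** 2
f_star = lambda u: u ** 2 / 4.0

u_grid = np.linspace(-20.0, 20.0, 200_001)
for x in (-3.0, 0.5, 2.0):
    tangents = x * u_grid - f_star(u_grid)      # family of lower bounds on f(x)
    assert abs(tangents.max() - f(x)) < 1e-3    # their envelope recovers f(x)
    u_star = u_grid[np.argmax(tangents)]
    assert abs(u_star - 2.0 * x) < 1e-2         # maximiser is ∇f(x) = 2x
```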

The Bayesian Crank — Inference

Site Bounding

[Figure: Gaussian lower bounds of the Laplace site]

- ln P(s_j) is convex in s_j² (all scale mixtures) [Wipf, 08]
- Legendre duality with P(s_j) =: e^{g(s_j²)} and g(x) = max_p px − g*(p):
  P(s_j) = max_{γ_j>0} e^{−s_j²/(2γ_j)} e^{−g*(−1/(2γ_j))}
- Laplace site: e^{−τ_j|s_j|} = max_{γ_j} N^U(s_j|0, γ_j) e^{−τ_j²γ_j/2}   [Girolami, 01]
- −ln Z ≤ −ln Z_VB(γ) =_c ln|A_γ| − 2 ∑_{j=1}^q g*(−1/(2γ_j)) − σ⁻² yᵀ X A_γ⁻¹ Xᵀ y,
  where A_γ = XᵀX + BᵀΓ⁻¹B
- −ln Z_VB(γ) is convex if ln P(s_j) is concave (Laplace, Bernoulli, logistic)
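The Laplace site bound can be verified numerically: maximizing e^{−s²/(2γ)} · e^{−τ²γ/2} over γ recovers e^{−τ|s|} exactly, with the maximum at γ* = |s|/τ. A toy check, assuming the unnormalized-Gaussian convention N^U(s|0, γ) = e^{−s²/(2γ)}:

```python
import numpy as np

tau = 2.0
gammas = np.linspace(1e-4, 10.0, 200_001)
for s in (0.3, 1.0, 2.5):
    # unnormalized-Gaussian bound e^{-s^2/(2γ)} · e^{-τ^2 γ/2}, maximised over γ
    vals = np.exp(-s**2 / (2.0 * gammas)) * np.exp(-tau**2 * gammas / 2.0)
    assert abs(vals.max() - np.exp(-tau * abs(s))) < 1e-4   # envelope equals Laplace site
    g_star = gammas[np.argmax(vals)]
    assert abs(g_star - abs(s) / tau) < 1e-2                # optimum at γ* = |s|/τ
```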

The Bayesian Crank — Inference

Double Loop Optimization

- Naïve minimization of φ(γ) := −ln Z_VB(γ) is infeasible: ∂φ/∂γ needs the inversion of A_γ = XᵀX + BᵀΓ⁻¹B, let alone ∂²φ/∂γ∂γᵀ
- Decomposition: φ(γ) = ln|A_γ| + γᵀτ² − σ⁻² yᵀ X A_γ⁻¹ Xᵀ y, where φ_∩(γ⁻¹) := ln|A_γ| is concave in γ⁻¹ (and convex in γ), and φ_∪(γ) := γᵀτ² − σ⁻² yᵀ X A_γ⁻¹ Xᵀ y is convex in γ
- Fenchel bound on the concave part: ln|A_γ| = min_z zᵀγ⁻¹ − φ_∩*(z); moreover
  γᵀτ² − σ⁻² yᵀ X A_γ⁻¹ Xᵀ y =_c γᵀτ² + σ⁻² min_u ‖y − Xu‖²₂ + uᵀBᵀΓ⁻¹Bu
- Inner loop: u ← argmin_u (1/(2σ²))‖y − Xu‖²₂ + τᵀ√(z + (Bu/σ)²), then γ ← √(z + (Bu/σ)²)/τ (elementwise)
- Outer loop: z ← dg(B A_γ⁻¹ Bᵀ)

The Bayesian Crank — Inference

Inner and Outer Loop

Inner: Gaussian mean, IRLS
- u ← argmin_u (1/(2σ²))‖y − Xu‖²₂ + τᵀ√(z + (Bu/σ)²)   (MAP for z = 0)
- Newton steps, approximated by LCG

Outer: Gaussian variances
- z ← dg(B A_γ⁻¹ Bᵀ) = σ⁻² V[Bu|y, γ]   [Schneider & Willsky, 01]
- Lanczos algorithm: Q_k = [q₁, …, q_k] such that Q_kᵀ A_γ Q_k = T_k, Q_kᵀ Q_k = I;
  A_γ⁻¹ ≈ Q_k T_k⁻¹ Q_kᵀ yields an estimate ẑ of z with 0 < ẑ ≤ z

Computational primitives
- MVMs with X, Xᵀ, B and Bᵀ, plus simple vector operations
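On a toy problem, the two loops fit in a few lines of dense linear algebra. This is a hedged sketch of the scheme, not the authors' scalable implementation: B = I, all matrices are dense, the IRLS weighting and the γ- and z-updates follow my reading of the slides, and no LCG or Lanczos is used.

```python
import numpy as np

def double_loop(X, B, y, tau, sigma, outer=5, inner=30):
    """Toy double-loop sketch: inner IRLS for the posterior mean,
    outer refit of the variances z = diag(B A_g^{-1} B^T)."""
    q = B.shape[0]
    u = np.linalg.lstsq(X, y, rcond=None)[0]   # warm start (avoids IRLS zero-locking)
    z = np.zeros(q)
    for _ in range(outer):
        for _ in range(inner):                 # inner: min ||y-Xu||^2/(2σ²) + τᵀ√(z+(Bu/σ)²)
            s = B @ u / sigma
            w = tau / np.sqrt(z + s**2 + 1e-12)              # IRLS weights
            A = (X.T @ X + B.T @ (w[:, None] * B)) / sigma**2
            u = np.linalg.solve(A, X.T @ y / sigma**2)
        gamma = np.sqrt(z + (B @ u / sigma)**2) / tau + 1e-10    # site widths
        A_g = X.T @ X + B.T @ ((1.0 / gamma)[:, None] * B)       # A_γ = XᵀX + BᵀΓ⁻¹B
        z = np.diag(B @ np.linalg.solve(A_g, B.T))               # outer: Gaussian variances
    return u, gamma, z

rng = np.random.default_rng(4)
m, n = 20, 15
X = rng.standard_normal((m, n))
B = np.eye(n)
u0 = np.zeros(n); u0[:3] = (3.0, -2.0, 1.5)
y = X @ u0 + 0.1 * rng.standard_normal(m)
u_hat, gamma, z = double_loop(X, B, y, tau=np.ones(n), sigma=0.1)
```

In the talk's setting, the dense solves are replaced by the MVM-based primitives of the previous slide.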

The Bayesian Crank — Design

Classical Gaussian Experimental Design

- y = Xu + ε with ε ~ N(0, σ²I)
- û_ML = X⁺y; residuals r = û_ML − u ~ N(0, σ²(XᵀX)⁻¹)
- Maximize the precision matrix P = σ⁻² XᵀX [Fisher, 1935]:
  - A-optimal: tr(P) = ∑_i λ_i(P)
  - D-optimal: ln|P| = ∑_i ln λ_i(P)
  - E-optimal: ‖P‖²_Fro = ∑_i λ_i(P)²
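The three criteria are spectral functionals of P and can be cross-checked against their matrix forms (a small verification sketch; the helper `criteria` and the sizes are made up):

```python
import numpy as np

def criteria(X, sigma=1.0):
    """A-, D- and Frobenius scores of P = σ⁻² XᵀX via its eigenvalues."""
    lam = np.linalg.eigvalsh(X.T @ X / sigma**2)
    return lam.sum(), np.sum(np.log(lam)), np.sum(lam**2)

rng = np.random.default_rng(5)
X = rng.standard_normal((20, 8))          # m > n so P is (generically) invertible
a_score, d_score, e_score = criteria(X)
P = X.T @ X                               # σ = 1 here
```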

Results — Simulations from raw data

Experimental Setup

- Data: Siemens 3T, TSE, 256×256 pixels, 16 slices
- Spiral trajectories; adjust the offset angle φ₀ ∈ [0, π)
- MAP reconstruction with random, equispaced, and optimized φ₀

[Figure: spiral k-space trajectories and MAP reconstructions for random, equispaced, and optimized φ₀]

Results — Simulations from raw data

Results

[Figure sequence: k-space sampling patterns and log of L2 error vs. number of spiral arms (2–7); per-frame reconstruction errors:]

  Random φ₀   Equi-spaced φ₀   Optimized φ₀
  29.91       29.47            27.51
  24.07       23.23            23.08
  20.02       19.03            17.50
  16.16       14.18            12.99
  12.86       10.06            8.31
  12.08       4.40             3.95

The optimized offset angle attains the lowest error in every frame.

Results — Simulations from raw data

Robustness / Optimization

[Figure: L2 reconstruction error (log scale, 2–30) vs. number of spiral arms (2–8) for MAP(op) and MAP(eq)]