Variational Inference and Experimental Design for Sparse Linear Models
Hannes Nickisch¹, joint work with Matthias Seeger², Rolf Pohmann¹ and Bernhard Schölkopf¹
¹ Max Planck Institute for Biological Cybernetics, Tübingen
² Saarland University & Max Planck Institute for Informatics, Saarbrücken
December 5, 2008

H. Nickisch (MPI), Inference & Design for Sparse Models, December 5, 2008, 1 / 12
Sparse Estimation
data fit + regularization (shrinkage of coefficients s_i, s_j)
y = Xu + ε,  s = Bu   (y ∈ R^m, X ∈ R^{m×n}, u ∈ R^n, s ∈ R^q)
Wavelet denoising of medical images (Weaver, 91)
Wavelet shrinkage (Donoho and Johnstone, 93)
Selective shrinkage (Ishwaran and Rao, 05)
Compressed sensing (Candès and Donoho, 06)
Graphical model estimation (Meinshausen and Bühlmann, 06/08)
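The shrinkage picture above has a closed form in the simplest setting. For orthonormal X and B = I, the ℓ1-regularized estimator is soft-thresholding of the back-projected data, which produces exact zeros. A minimal numpy sketch (not from the talk; sizes and τ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Sparse ground truth u, orthonormal design X (n = m), noisy data y = X u + eps
n = 64
u = np.zeros(n)
u[rng.choice(n, size=5, replace=False)] = rng.normal(0, 3, size=5)
X = np.linalg.qr(rng.normal(size=(n, n)))[0]   # orthonormal matrix
y = X @ u + 0.1 * rng.normal(size=n)

# With orthonormal X, argmin_u tau*||u||_1 + 1/2 ||X u - y||^2
# is coordinate-wise soft-thresholding (shrinkage) of X^T y.
tau = 0.3
z = X.T @ y
u_hat = np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

print("nonzeros in estimate:", np.count_nonzero(u_hat), "of", n)
```

The estimator contains many exact zeros, which is the point the later slides question: the zeros come with no confidence or ordering information.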
Real-world signals: Images
Images are statistically sparse (noise bitmaps aren't)
Images are not piecewise constant (phantoms are)
[Figure: heavy-tailed histograms of image gradient (∇) coefficients] (Simoncelli, 99)
"Learning compressed sensing" for images (Seeger and Nickisch, ICML 08, [1])
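The statistical-sparsity claim on this slide can be checked numerically: gradient coefficients of a piecewise-smooth signal have a heavy-tailed (super-Gaussian) histogram, while those of white noise do not. A small sketch with a synthetic 1-D "image" (illustrative, not the talk's data):

```python
import numpy as np

def excess_kurtosis(x):
    # Excess kurtosis: 0 for a Gaussian, large for heavy-tailed distributions
    x = x - x.mean()
    return (x**4).mean() / (x**2).mean()**2 - 3.0

rng = np.random.default_rng(1)

# Piecewise-constant "image": mostly flat regions with a few edges,
# so its finite differences are sparse (mostly zero, a few large jumps).
img = np.repeat(rng.normal(size=32), 32)     # 1024 samples, 32 flat pieces
noise = rng.normal(size=img.size)            # white-noise "bitmap"

k_img = excess_kurtosis(np.diff(img))
k_noise = excess_kurtosis(np.diff(noise))
print(f"gradient kurtosis: image {k_img:.1f}, noise {k_noise:.1f}")
```

The image gradients show large positive excess kurtosis; the noise gradients stay near zero, matching the slide's contrast between images and noise bitmaps.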
Real-world signals: Images
Sparse estimation can be too sparse.
Is the mode sufficiently informative?
Data + sparsity → shrinkage
[Figure: shrinkage of coefficients s1, s2, s3]
The estimator ŝ contains many zeros.
Shortcomings of sparse estimation alone:
correlations between eliminated coefficients vanish
the posterior mode is a singular point (no proper curvature)
Open questions:
stability of the sparsity pattern
ordering and confidence of elimination
experimental design
The posterior distribution might help here.
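The mode-versus-posterior distinction on this slide can be made concrete in one dimension. With a Laplace prior and Gaussian likelihood, the posterior mode can sit exactly at zero while the posterior mean and spread still carry information. A grid computation (parameter values are illustrative assumptions):

```python
import numpy as np

# 1-D posterior p(s|y) ∝ exp(-tau*|s| - (y - s)^2 / (2*sigma^2)),
# evaluated on a grid; tau, sigma, y chosen so the mode is pinned at 0.
tau, sigma, y = 3.0, 1.0, 1.5
s = np.linspace(-5, 5, 20001)
logp = -tau * np.abs(s) - (y - s)**2 / (2 * sigma**2)
p = np.exp(logp - logp.max())
p /= p.sum()                              # discrete normalization on the grid

s_map = s[np.argmax(p)]                   # posterior mode: exactly 0 (sparse)
s_mean = (s * p).sum()                    # posterior mean: shrunk but nonzero
s_std = np.sqrt(((s - s_mean)**2 * p).sum())
print(f"mode {s_map:.3f}, mean {s_mean:.3f}, std {s_std:.3f}")
```

The MAP point reports "eliminated", yet the posterior mean is nonzero and the standard deviation quantifies how confident that elimination actually is.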
Application to Magnetic Resonance Imaging
1) Prior P(u): Laplace prior P(u) ∝ ∏_j P(s_j), with P(s_j) ∝ exp(−τ|s_j|) and s = Bu
2) Data P(y|u): Gaussian likelihood P(y|u) ∝ exp(−‖Xu − y‖²/(2σ²)), with y = Xu + ε
3) Estimation: posterior P(u|y)
4) Inference
5) Design: design improves MR sequences (Seeger & Nickisch, NIPS 08, [3])
Inference
Approximate inference: P(u|y) ≈ N(u|µ_γ, Σ_γ), a deterministic (Gaussian) approximation to the posterior
Variational relaxation: min_γ φ(γ), a high-dimensional convex minimization (Seeger & Nickisch, 08, [2])
Scalability: full images (n = 256 × 256)
numerical mathematics: linear systems (CG, Lanczos)
signal processing: MVMs (NFFT, filterbanks)
[Figure: Gaussian-form lower bound N_γ ≤ L of the likelihood, parametrized by γ]
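The scalability point (linear systems solved with CG using only matrix-vector multiplications) can be sketched for the Gaussian mean at fixed γ: µ solves A_γ µ = X^⊤y with A_γ = X^⊤X + B^⊤Γ^{−1}B, and A_γ is never formed explicitly. A toy-sized sketch (problem sizes and B = I are assumptions for brevity):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(2)

# Toy sizes; in the talk n = 256*256, and only MVMs with X, X^T, B, B^T are used.
m, n = 40, 100
X = rng.normal(size=(m, n)) / np.sqrt(m)
B = np.eye(n)                       # identity "transform" for simplicity
gamma = rng.uniform(0.5, 2.0, n)    # fixed variational parameters
y = rng.normal(size=m)

def A_mv(v):
    # A_gamma v = X^T (X v) + B^T (Gamma^{-1} (B v)), one MVM per factor
    return X.T @ (X @ v) + B.T @ ((B @ v) / gamma)

A = LinearOperator((n, n), matvec=A_mv)
mu, info = cg(A, X.T @ y, atol=0.0)        # mean of the Gaussian approximation
res = np.linalg.norm(A_mv(mu) - X.T @ y)
print("CG converged:", info == 0, "residual:", res)
```

Only `A_mv` touches the data, so the same code structure scales to image-sized n as long as fast MVMs (FFTs, filterbanks) are available.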
Minimization
Naïve minimization of the convex φ: R^196096 → R is infeasible.
Concave-convex decomposition: φ = φ∩ + φ∪ ≤ φ∕ + φ∪, where φ∕ is a tangent upper bound of the concave part φ∩.
Double loop algorithm (like EM):
inner: γ^{i+1} ← argmin_γ φ∕^i + φ∪
outer: φ∕^{i+1} ← tight bound at γ^{i+1}
[Figure: φ = φ∩ + φ∪ and the successive upper bounds φ∕^i + φ∪, tightened at each γ^{i+1}]
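The double loop idea is a concave-convex procedure, which is easiest to see in one dimension: replace the concave part by its tangent (an affine upper bound), minimize the resulting convex bound in closed form, then re-tighten. A 1-D sketch with an assumed toy objective (ln(1+x) concave plus a quadratic):

```python
import numpy as np

# phi(x) = ln(1+x) + (x-3)^2/2 on x > 0: concave part + convex part.
phi_cap = lambda x: np.log1p(x)          # concave part
phi_cup = lambda x: 0.5 * (x - 3.0)**2   # convex part
phi = lambda x: phi_cap(x) + phi_cup(x)

x = 0.5                                  # initial point
trace = [phi(x)]
for _ in range(50):
    g = 1.0 / (1.0 + x)                  # outer loop: tangent slope of ln(1+x) at x
    # inner loop: minimize the convex upper bound g*x + const + phi_cup(x)
    x = 3.0 - g                          # closed form: d/dx [g*x + (x-3)^2/2] = 0
    trace.append(phi(x))

print("x* ≈", x)                         # analytic optimum: 1 + sqrt(3)
```

Each iterate decreases φ (the bound touches φ at the current point and upper-bounds it everywhere), which is the same EM-like monotonicity the slide relies on.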
Sequential Design

Prior P(u) → (Lik0: y0 = X0 u + ε0) → Post0: P(u|y0) ≈ Q(u|y0)
→ (Lik1: y1 = X1 u + ε1, with X1 chosen from Q) → Post1: P(u|y0:1) ≈ Q(u|y0:1)
→ ... → (Likd: yd = Xd u + εd, with Xd chosen from Q) → Postd: P(u|y0:d) ≈ Q(u|y0:d)

(≈) Gaussian approximation: P(u|y) ≈ Q(u|y) = N(u|µ, Σ)
(→) Gaussian design: X_i = argmax_{X*} ln|I + X* Σ X*^⊤|
maximal uncertainty reduction ≡ information gain (D-optimal)
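The design rule above admits a simple greedy sketch: score each candidate measurement row x* by ln|I + x* Σ x*^⊤| (a scalar for a single row), pick the best, then update Σ with the Gaussian rank-1 posterior update. Sizes, candidate pool, and unit noise are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

n = 30
Sigma = np.eye(n)                       # current (approximate) posterior covariance
cands = rng.normal(size=(200, n))       # candidate measurement vectors x*

def info_gain(x, Sigma):
    # D-optimal score ln|I + x Sigma x^T| for a single row x (a scalar here)
    return np.log(1.0 + x @ Sigma @ x)

chosen = []
for step in range(5):
    scores = np.array([info_gain(x, Sigma) for x in cands])
    j = int(np.argmax(scores))
    chosen.append(j)
    x = cands[j]
    # Gaussian posterior update (unit noise): rank-1 downdate of Sigma
    Sx = Sigma @ x
    Sigma = Sigma - np.outer(Sx, Sx) / (1.0 + x @ Sx)

print("chosen candidates:", chosen, "tr(Sigma):", np.trace(Sigma))
```

Each selection shrinks the posterior covariance most along the currently least-constrained candidate direction, which is exactly the "maximal uncertainty reduction" reading of the D-optimal criterion.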
Results
[Figure: k-space spiral trajectories and log L2 reconstruction error vs. number of arms (2-7)]
Equi-spaced φ0: Err = 4.40; Optimized φ0: Err = 3.95; Random φ0: Err = 12.08
Summary, Outlook, Questions?
Many problems require sparse inference beyond sparse estimation: confidences, ordering, experimental design.
For medical images, sparsity regularization can be too strong.
Sparse estimation: design matters more than estimation.
Scalable variational inference algorithm over a large non-Gaussian model.
Bayesian experimental design: MRI sequence design support.
Can sparsity theory be extended to convex variational inference/experimental design?
References
[1] M. Seeger and H. Nickisch. Compressed sensing and Bayesian experimental design. In 25th International Conference on Machine Learning, 2008.
[2] M. Seeger and H. Nickisch. Large scale variational inference and experimental design for sparse generalized linear models. Technical Report 175, MPI for Biological Cybernetics, Tübingen, September 2008.
[3] M. Seeger, H. Nickisch, R. Pohmann, and B. Schölkopf. Bayesian experimental design of magnetic resonance imaging sequences. In Neural Information Processing Systems 21 (in press), 2008.
Model / Laplace Prior
Sparsity of (Medical) Images
[Figure: log-histograms of gradient (∇) coefficients, image vs. normal vs. Laplace fit; optimal quantization]
Images are sparse [Field, 1987]:
super-Gaussianity (peakedness/kurtosis > 0)
compression (many fewer bits/pixel than noise)
reconstruction from incomplete information
Laplace prior: P(u) ∝ exp(−‖τ(Bu)‖₁)
sparse coefficients P(s_i) = (τ_i/2) exp(−τ_i|s_i|), s = Bu
B = [W; D1; D2] overcomplete transform (wavelet, finite differences)
Model / Gaussian Likelihood
Measurements in k-space
[Figure: r-space image U(r), k-space data Y(k), gradient waveforms gx(t), gy(t) in mT/m over t in ms]
Image u ≡ proton density of tissue in pixel space U(r)
Measurements y ≡ Fourier coefficients in k-space Y(k)
spatial gradients gx(t) and gy(t) ⇒ k-space trajectory k(t)
Gaussian likelihood: P(y|u) ∝ exp(−‖Xu − y‖²₂/(2σ²))
linear model y = Xu + ε, where X = [e^{−i2π r_j^⊤ k(t_l)}]_{lj}
noise ε due to phase shifts and system imperfections
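The encoding matrix X = [e^{−i2π r_j^⊤ k(t_l)}] can be sanity-checked in a 1-D analogue: when the measured k-space locations form a full Cartesian grid, X u is just the DFT of u, which is why FFT-based MVMs (and NFFT for non-Cartesian trajectories) apply. A minimal sketch:

```python
import numpy as np

# 1-D analogue of the MR encoding matrix X = [exp(-i 2 pi r_j k_l)]:
# for a full Cartesian k-space grid it reduces to the DFT.
n = 16
r = np.arange(n)                        # pixel positions
k = np.arange(n)                        # measured k-space locations
X = np.exp(-2j * np.pi * np.outer(k, r) / n)

u = np.random.default_rng(4).normal(size=n)   # "proton density"
y = X @ u                                      # noiseless measurements

print(np.allclose(y, np.fft.fft(u)))           # X u equals the DFT of u
```

Subsampling rows of X (a partial k-space trajectory) then corresponds to keeping a subset of Fourier coefficients, which is the design variable of the talk.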
Model / Posterior
Estimation
Posterior (unimodal): P(u|y) ∝ P(u) P(y|u) ∝ exp(−‖τ(Bu)‖₁ − ‖Xu − y‖²₂/(2σ²))
MAP estimation (convex):
wavelet denoising [Weaver, 1991] (s_i < θ ⇒ s_i ← 0)
Landweber iterations, shrinkage [Stein, 1960s]
û_MAP = argmax_u P(u|y) = argmin_u ‖τ(Bu)‖₁ + ‖Xu − y‖²₂/(2σ²)
Compressed Sensing:
big hype: sensing → postprocessing [Candès & Donoho]
advice: choose X incoherent to B, i.e. random X
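The MAP objective above can be minimized by proximal gradient descent (ISTA), whose shrinkage step is exactly the soft-thresholding mentioned on the slide. A sketch for B = I (the talk uses an overcomplete B; sizes and τ here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)

# min_u  tau*||u||_1 + 1/(2 sigma^2) ||X u - y||^2   (B = I for simplicity)
m, n, sigma, tau = 50, 120, 0.1, 20.0
u_true = np.zeros(n)
u_true[rng.choice(n, 8, replace=False)] = rng.normal(0, 2, 8)
X = rng.normal(size=(m, n)) / np.sqrt(m)
y = X @ u_true + sigma * rng.normal(size=m)

L = np.linalg.norm(X, 2)**2 / sigma**2        # Lipschitz constant of the data term
u = np.zeros(n)
for _ in range(500):                          # ISTA: gradient step + shrinkage
    grad = X.T @ (X @ u - y) / sigma**2
    v = u - grad / L
    u = np.sign(v) * np.maximum(np.abs(v) - tau / L, 0.0)

print("nonzeros:", np.count_nonzero(u), "of", n)
```

With step size 1/L the iteration decreases the objective monotonically, and the soft-threshold step is what drives coefficients to exact zeros.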
The Bayesian Crank / Inference
Approximations and Relaxations
[Figure: −ln Z_VB ≥ −ln Z_KL ≥ −ln Z as functions of γ]
Gaussian approximation: P(u|y) = (1/Z) ∏_{j=1}^q P(s_j) N(y) ≈ N(u|m, V) =: Q(u)
Variational approach: min_{m,V} KL(Q ‖ P); the fixed-point condition ∂KL(Q ‖ P)/∂V = 0 suggests replacing each site by a Gaussian-form term: P(s_j) ← t̃_j(s_j) := e^{b_j s_j − s_j²/(2γ_j)}
Two nested bounds on the evidence:
ln Z = ln ∫ ∏_j P(s_j) N(y) du ≥ ∫ Q ln[∏_j P(s_j) N(y) / Q] du = ln Z_KL   (Jensen)
relaxing the sites: ln Z_VB = ln ∫ ∏_j t̃_j(s_j) N(y) du ≤ ln Z_KL ≤ ln Z
The Bayesian Crank / Inference
Legendre Duality
express a convex f(x) as a maximisation; the workhorse of variational approximate inference
f(x) = max_u x^⊤u − f*(u) = x^⊤u* − f*(u*), with u* = ∇f(x)
[Figure: f(x) with its supporting line of slope u* and intercept −f*(u*)]
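The duality on this slide can be verified numerically on a grid: compute f*(u) = max_x [xu − f(x)], then recover f as the biconjugate f**(x) = max_u [xu − f*(u)]. A sketch with the self-dual f(x) = x²/2 (grid ranges are illustrative assumptions):

```python
import numpy as np

# Legendre transform on a grid: f*(u) = max_x [x u - f(x)], and back.
x = np.linspace(-4, 4, 4001)
f = 0.5 * x**2                          # convex f; its dual is f*(u) = u^2/2

u = np.linspace(-3, 3, 601)
f_star = np.max(np.outer(u, x) - f, axis=1)        # f*(u)
f_back = np.max(np.outer(x, u) - f_star, axis=1)   # f**(x), equals f(x)

print("dual error:", np.max(np.abs(f_star - 0.5 * u**2)))
```

The biconjugate matches f on the interior of the grid; near the grid edges the max over u is truncated, so the comparison is restricted to |x| well inside the u-range.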
The Bayesian Crank / Inference
Site Bounding
ln P(s_j) is convex in s_j² (all scale mixtures) [Wipf, 08]
Legendre duality: with P(s_j) =: e^{g(s_j²)} and g(x) = max_p [px − g*(p)],
P(s_j) = max_{γ_j>0} e^{−s_j²/(2γ_j)} e^{−g*(−1/(2γ_j))} ∝ max_{γ_j>0} N^U(s_j|0, γ_j^{−1}) e^{−g*(−1/(2γ_j))}
Laplace site: e^{−τ_j|s_j|} = max_{γ_j} N^U(s_j|0, γ_j^{−1}) e^{−τ_j²γ_j/2}   [Girolami, 01]
−ln Z ≤ −ln Z_VB(γ) =^c ln|A_γ| + 2 ∑_{j=1}^q g*(−1/(2γ_j)) − σ^{−2} y^⊤X A_γ^{−1} X^⊤y, where A_γ = X^⊤X + B^⊤Γ^{−1}B
−ln Z_VB(γ) is convex if ln P(s_j) is concave (Laplace, Bernoulli, logistic)
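The Laplace site bound can be checked numerically: the Gaussian-form bound e^{−s²/(2γ) − τ²γ/2} stays below e^{−τ|s|} for every γ > 0 and touches it at γ = |s|/τ. A grid sketch (τ and s values are illustrative assumptions):

```python
import numpy as np

# Variational (scale-mixture) representation of the Laplace site:
#   exp(-tau*|s|) = max_{gamma>0} exp(-s^2/(2*gamma) - tau^2*gamma/2)
tau, s = 1.5, -0.7
gamma = np.linspace(1e-3, 5, 200001)
bound = np.exp(-s**2 / (2 * gamma) - tau**2 * gamma / 2)

g_star = gamma[np.argmax(bound)]
print("argmax gamma:", g_star, "expected |s|/tau =", abs(s) / tau)
print("max of bound:", bound.max(), "exp(-tau|s|) =", np.exp(-tau * abs(s)))
```

The maximizing γ is the site's effective variance parameter; optimizing all γ_j jointly is exactly the −ln Z_VB minimization on this slide.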
The Bayesian Crank / Inference
Double Loop Optimization
Naïve minimization of φ(γ) := −ln Z_VB(γ) is infeasible: ∂φ/∂γ needs an inversion of A_γ = X^⊤X + B^⊤Γ^{−1}B, let alone ∂²φ/∂γ∂γ^⊤.
Decomposition φ(γ) = φ∩(γ^{−1}) + φ∪(γ):
φ∩(γ^{−1}) = ln|A_γ| = min_z [z^⊤γ^{−1} − φ∩*(z)], concave in γ^{−1} and convex in γ
φ∪(γ) = γ^⊤τ² − σ^{−2} y^⊤X A_γ^{−1} X^⊤y =^c γ^⊤τ² + σ^{−2} min_u [‖y − Xu‖²₂ + u^⊤B^⊤Γ^{−1}Bu], convex in γ
Double loop algorithm:
outer loop: z ← dg(B A_γ^{−1} B^⊤) = σ^{−2} V[Bu|y, γ], tightening the affine bound ln|A_γ| ≤ z^⊤γ^{−1} − φ∩*(z)
inner loop: u ← argmin_u (1/(2σ²))‖y − Xu‖²₂ + τ^⊤ √(z + (Bu/σ)²), then γ ← √(z + (Bu/σ)²) / τ
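The rewriting of φ∪ as a minimization over u rests on a standard quadratic identity, which is worth verifying on a small dense problem: min_u [‖y − Xu‖² + u^⊤B^⊤Γ^{−1}Bu] = ‖y‖² − y^⊤X A_γ^{−1} X^⊤y, attained at u = A_γ^{−1}X^⊤y. A sketch (sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 8, 12
X = rng.normal(size=(m, n))
B = rng.normal(size=(n, n))
gamma = rng.uniform(0.5, 2.0, n)
y = rng.normal(size=m)

A = X.T @ X + B.T @ (B / gamma[:, None])   # A_gamma = X^T X + B^T Gamma^{-1} B
u = np.linalg.solve(A, X.T @ y)            # minimizer of the penalized LS problem

lhs = np.sum((y - X @ u)**2) + u @ B.T @ (B @ u / gamma)
rhs = y @ y - y @ X @ np.linalg.solve(A, X.T @ y)
print(lhs, rhs)   # equal: phi_cup's quadratic term as a minimization over u
```

This is why the inner loop reduces to a penalized least-squares problem in u: the quadratic-over-γ term of φ∪ is exactly the value of that minimization.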
The Bayesian Crank / Inference
Inner and Outer Loop
Inner: Gaussian mean, IRLS
u ← argmin_u (1/(2σ²))‖y − Xu‖²₂ + τ^⊤ √(z + (Bu/σ)²)   (MAP for z = 0)
Newton steps, approximated by LCG
Outer: Gaussian variances
z ← dg(B A_γ^{−1} B^⊤) = σ^{−2} V[Bu|y, γ]   [Schneider & Willsky, 01]
Lanczos algorithm: Q_k = [q_1, .., q_k] s.t. Q_k^⊤ A_γ Q_k = T_k, Q_k^⊤ Q_k = I
A_γ^{−1} ≈ Q_k T_k^{−1} Q_k^⊤ yields an estimate ẑ with 0 < ẑ ≤ z
Computational primitives: MVMs with X, X^⊤, B and B^⊤, plus simple vector operations
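The Lanczos variance estimate and its underestimation property (0 < ẑ ≤ z) can be demonstrated on a small SPD matrix standing in for A_γ (with B = I, so z = diag(A^{−1})). A sketch with textbook Lanczos plus full reorthogonalization; sizes and k are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

# SPD test matrix A (stand-in for A_gamma) and exact variances z = diag(A^{-1})
n = 30
M = rng.normal(size=(n, n))
A = M @ M.T + np.eye(n)
z = np.diag(np.linalg.inv(A))

def lanczos(A, v0, k):
    # k steps of Lanczos tridiagonalization: Q_k^T A Q_k = T_k, Q_k^T Q_k = I
    Q = np.zeros((len(v0), k)); alpha = np.zeros(k); beta = np.zeros(k - 1)
    q = v0 / np.linalg.norm(v0)
    for i in range(k):
        Q[:, i] = q
        w = A @ q - (beta[i - 1] * Q[:, i - 1] if i > 0 else 0)
        alpha[i] = q @ w
        w = w - alpha[i] * q
        w = w - Q[:, :i + 1] @ (Q[:, :i + 1].T @ w)   # full reorthogonalization
        if i < k - 1:
            beta[i] = np.linalg.norm(w)
            q = w / beta[i]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return Q, T

k = 12
Q, T = lanczos(A, rng.normal(size=n), k)
z_hat = np.diag(Q @ np.linalg.solve(T, Q.T))   # estimate of diag(A^{-1})
print("underestimates:", np.all(z_hat <= z + 1e-8), "positive:", np.all(z_hat > 0))
```

The underestimate follows from Q T^{−1} Q^⊤ = A^{−1/2} P A^{−1/2} with P an orthogonal projection, so the estimate grows toward z as k increases, again using only MVMs with A.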
The Bayesian Crank / Design
Classical Gaussian Experimental Design
y = Xu + ε with ε ∼ N(0, σ²I)
û_ML = X⁺y, residuals r = û_ML − u ∼ N(0, σ²(X^⊤X)^{−1})
maximize the precision matrix P = σ^{−2} X^⊤X   [Fisher, 1935]
A-optimal: tr(P) = ∑_i λ_i(P)
D-optimal: ln|P| = ∑_i ln λ_i(P)
E-optimal: ‖P‖²_Fro = ∑_i λ_i(P)²
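The three scalarizations of the precision matrix can be compared on toy designs: a set of well-spread measurement directions versus a nearly rank-one design that repeats one direction. A sketch (designs and σ = 1 are illustrative assumptions; the Frobenius-norm "E" criterion follows the slide's convention):

```python
import numpy as np

rng = np.random.default_rng(8)

# Compare candidate designs X (rows = measurements) by classical criteria
# on the precision matrix P = X^T X / sigma^2 (sigma = 1 here).
def criteria(X):
    lam = np.linalg.eigvalsh(X.T @ X)
    return {"A": lam.sum(),                # tr(P)
            "D": np.sum(np.log(lam)),      # ln|P|
            "E2": np.sum(lam**2)}          # ||P||_Fro^2, per the slide

X_good = np.linalg.qr(rng.normal(size=(6, 4)))[0] * 2.0   # well-spread directions
X_bad = np.outer(rng.normal(size=6), rng.normal(size=4))  # rank-1: one direction only
X_bad = X_bad + 1e-3 * rng.normal(size=(6, 4))            # jitter to keep |P| > 0

for name, Xc in [("spread", X_good), ("rank-1", X_bad)]:
    print(name, criteria(Xc))
```

The D-criterion heavily penalizes the near-singular design (its small eigenvalues drive ln|P| toward −∞), which is why the talk's information-gain rule is the D-optimal one.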
Results / Simulations from raw data
Experimental Setup
Data: Siemens 3T, TSE, 256×256 pixels, 16 slices
Spiral trajectories; adjust offset angle φ0 ∈ [0, π)
[Figure: spiral arm at offset angle φ0; MAP reconstructions for random, equispaced and optimized φ0]
Results / Simulations from raw data
Results
[Figure series: k-space spiral trajectories and log L2 reconstruction error vs. number of arms (2-7), one frame per design step]
Equi-spaced φ0: Err = 29.47; Optimized φ0: Err = 27.51; Random φ0: Err = 29.91
Equi-spaced φ0: Err = 23.23; Optimized φ0: Err = 23.08; Random φ0: Err = 24.07
Equi-spaced φ0: Err = 19.03; Optimized φ0: Err = 17.50; Random φ0: Err = 20.02
Equi-spaced φ0: Err = 14.18; Optimized φ0: Err = 12.99; Random φ0: Err = 16.16
Equi-spaced φ0: Err = 10.06; Optimized φ0: Err = 8.31; Random φ0: Err = 12.86
Equi-spaced φ0: Err = 4.40; Optimized φ0: Err = 3.95; Random φ0: Err = 12.08
Results / Simulations from raw data
Robustness of the Optimization
[Figure: L2 reconstruction error (log scale, roughly 2-30) vs. number of spiral arms (2-8), comparing MAP(op) and MAP(eq)]

H. Nickisch (MPI), Inference & Design for Sparse Models, December 5, 2008, 12 / 12