
Half-Quadratic Minimization of Regularized Objectives as the Fixed Point Iteration for the Linearized Gradient

Mila Nikolova and Raymond Chan

Mila Nikolova is with the Centre de Mathématiques et de Leurs Applications (CNRS UMR 8536), ENS de Cachan, 61 avenue du Président Wilson, 94235 Cachan Cedex, France ([email protected]). Raymond Chan is with the Department of Mathematics, The Chinese University of Hong Kong, Shatin, Hong Kong (rchan@math.cuhk.edu.hk). This work was supported by HKRGC Grant CUHK 400503 and CUHK DAG 2060257.

Abstract. We focus on the minimization of regularized objective functions using the popular half-quadratic approach introduced by Geman and Reynolds in 1992. We show that, whenever applicable, this approach is equivalent to the very classical gradient linearization approach, also known as the fixed point iteration.

Key words: half-quadratic regularization, inverse problems, gradient linearization, fixed point iteration, optimization, signal and image restoration, variational methods.

I. INTRODUCTION

Let data y ∈ R^q be obtained from an original unknown image or signal x* via

    y = A x^* + \text{noise},

where A ∈ R^{q×p} is a linear transform. Typical applications are denoising, deblurring, and image reconstruction in

tomography and other inverse problems [1], [2], [3]. The sought-after x̂ ∈ R^p is defined as the minimizer of an objective function J : R^p → R of the form

    J(x) = \|Ax - y\|^2 + \beta \Phi(x),      (1)

    \Phi(x) = \sum_{i=1}^{r} \varphi(\|G_i x\|),      (2)

where G_i : R^p → R^s, s ≥ 1 for i = 1, …, r, are linear operators, ‖·‖ is the ℓ₂ norm, ϕ : R₊ → R is called a potential function (PF) and β > 0 is a parameter. Typically¹, either s = 1 and the operators G_i give rise to finite differences between neighboring samples of x, or s = 2 and every G_i yields a discrete approximation of the gradient of x at i. One can observe in the literature that s = 1 is used along with Markov random field models for Bayesian inference, e.g. [4], [1], [5], [6], [7], [8], while s = 2 is used along with variational formulations, e.g. [9], [10], [11], [12], [3]. The rationale of the image and signal restoration approach (1)–(2) has been widely discussed in the literature; see [13], [2], [1], [4], [7], [3] and many others.

Let G denote the rs × p matrix obtained by vertical concatenation of the matrices G_i for i = 1, …, r. A basic condition in order to have regularization is that

    \mathrm{Ker}(A^T A) \cap \mathrm{Ker}(G^T G) = \{0\}.      (3)
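As a concrete illustration (a minimal sketch, not from the paper, assuming a 1-D signal with s = 1 and first-order differences), the operators G_i can be stacked into the usual first-difference matrix and condition (3) checked numerically:

```python
import numpy as np

def difference_matrix(p):
    """Return the (p-1) x p first-difference matrix G (rows G_i, s = 1)."""
    G = np.zeros((p - 1, p))
    for i in range(p - 1):
        G[i, i] = -1.0      # row i computes x[i+1] - x[i]
        G[i, i + 1] = 1.0
    return G

p = 8
A = np.eye(p)               # e.g. denoising, where A is the identity
G = difference_matrix(p)

# Condition (3): Ker(A^T A) and Ker(G^T G) intersect only at {0}.
# Equivalently, the matrix obtained by stacking A and G has full column rank.
print(np.linalg.matrix_rank(np.vstack([A, G])) == p)   # True
```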

A major requirement for the choice of a PF ϕ is that it allows large differences ("edges") to be recovered in the solution x̂. Examples of PFs are given in Table I; for references see for instance [14], [7], [11], [15], [3]. In this paper we focus on the computation of a minimizer x̂ of J when the PF ϕ is either smooth or is the truncated quadratic PF, ϕ(t) = min{αt², 1}, where α > 0. Such a computation presents a challenge since usually the dimension p is high, A^T A is ill-conditioned and the derivative ϕ′ involves almost flat regions. In their seminal paper [6], Geman and Reynolds have shown that under appropriate conditions on ϕ (see Section II), x̂ is also given by

    (\hat{x}, \hat{b}) = \arg\min_{(x,b)} \mathcal{J}(x, b),

where 𝒥 is an augmented objective of the form

    \mathcal{J}(x, b) = \|Ax - y\|^2 + \beta \sum_{i=1}^{r} \left( \frac{b_i}{2} \|G_i x\|^2 + \psi(b_i) \right),      (4)

where ψ : R → R is such that

    \varphi(t) = \inf_{b \in \mathbb{R}} \left\{ \frac{1}{2} b t^2 + \psi(b) \right\}.      (5)
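As a concrete illustration (this worked example is not in the original paper): for the convex PF ϕ(t) = √(α + t²) of Table I, a direct computation gives

    \psi(b) = \frac{\alpha b}{2} + \frac{1}{2b}, \qquad b \in \left( 0, \tfrac{1}{\sqrt{\alpha}} \right],

and (5) indeed holds: ½bt² + ψ(b) = ½(t² + α)b + 1/(2b) is minimized at b = 1/√(t² + α), where it takes the value √(t² + α) = ϕ(t).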

Based on this observation, two-step alternate minimization schemes are used: if the (k-1)th iterate is (x^{(k-1)}, b^{(k-1)}), the next one is defined by

    b^{(k)} = \arg\min_{b} \mathcal{J}(x^{(k-1)}, b),      (6)

    x^{(k)} = \arg\min_{x} \mathcal{J}(x, b^{(k)}).      (7)

¹Let x be an m × n image. If s = 1, for every (i, j) ∈ {1, …, m} × {1, …, n}, there are associated two, or possibly four, operators G_i that yield the differences x_{i,j} − x_{i−1,j} and x_{i,j} − x_{i,j−1}, and possibly also x_{i,j} − x_{i−1,j+1} and x_{i,j} − x_{i+1,j−1}. When s = 2, for every (i, j) there is an operator G_{i,j} so that G_{i,j} x = [x_{i,j} − x_{i−1,j}, x_{i,j} − x_{i,j−1}]^T. The operators relevant to the boundaries are defined according to the boundary conditions.
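To make the footnote concrete, here is a small sketch of the s = 2 case (an illustration only; the zero boundary differences below are one possible choice of boundary condition, not prescribed by the paper):

```python
import numpy as np

def discrete_gradient(x):
    """Return an (m, n, 2) array whose (i, j) entry is
    G_{i,j} x = [x[i,j] - x[i-1,j], x[i,j] - x[i,j-1]]^T."""
    m, n = x.shape
    g = np.zeros((m, n, 2))
    g[1:, :, 0] = x[1:, :] - x[:-1, :]   # vertical differences
    g[:, 1:, 1] = x[:, 1:] - x[:, :-1]   # horizontal differences
    return g                             # first row/column left at zero

x = np.arange(12.0).reshape(3, 4)
print(discrete_gradient(x)[1, 2])        # [4. 1.]
```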


Convex PFs                      Nonconvex PFs
|t|^α, 1 < α ≤ 2                min{αt², 1}
√(α + t²)                       αt² / (1 + αt²)
log(cosh(αt))                   log(αt² + 1)
|t|/α − log(1 + |t|/α)          1 − exp(−αt²)

TABLE I: Commonly used smooth-at-zero PFs ϕ, where α > 0 is a parameter.

Using that x ↦ 𝒥(x, b) is quadratic and that b ↦ 𝒥(x, b) can be minimized separately for each b_i, explicit expressions for the minimizers in (6)–(7) have been derived [11]. The resultant method is called half-quadratic and has been considered in a large number of papers²; e.g. [16], [11], [17], [12], [18], [19]. Connections with statistical EM algorithms have been explored in [22].

²Let us mention that another form of half-quadratic minimization, where b is involved additively via terms of the form (b_i + g_i^T x)², was initiated in [8] and studied e.g. in [20], [19], [21]; it is outside the scope of this paper.

The contribution of this paper is to show that half-quadratic minimization is equivalent to the very classical gradient linearization approach, also known as the fixed point iteration: in order to solve the equation ∇J(x̂) = 0, at each iteration k one finds x^{(k)} by solving the linear problem

    L(x^{(k-1)})\, x^{(k)} = z,      (8)

where z ∈ R^p is independent of x and, for any given x ∈ R^p, z and L(x) ∈ R^{p×p} are uniquely defined by

    \nabla J(x) = L(x)\, x - z.      (9)

The connection between the two approaches was mentioned by Vogel and Oman in [23] for the particular case where ϕ(t) = √(α + t²). As we show below, the equivalence holds in general. In turn, convergence results on half-quadratic regularization can now be applied directly to the basically heuristic gradient linearization method in (8)–(9). Furthermore, half-quadratic minimization can be seen as a form of quasi-Newton minimization (this fact was exhibited in [21]).

II. HALF-QUADRATIC REGULARIZATION (MULTIPLICATIVE FORM)

Consider that ϕ : R₊ → R satisfies the following assumptions:



H1: t ↦ ϕ(√t) is concave,
H2: t ↦ ϕ(t) is increasing,
H3: ϕ is C¹ on (0, +∞),
H4: ϕ′(0) = 0 and ϕ″(0⁺) > 0 is finite.

Using the theory of convex conjugacy, it has been established in [6], [11], [19], [21] that if ψ : R → R is given by

    \psi(b) = \sup_{t \in \mathbb{R}} \left\{ -\frac{1}{2} b t^2 + \varphi(t) \right\},      (10)

then (5) holds and the infimum on its right-hand side is reached for b = σ(t), where

    \sigma(t) = \begin{cases} \varphi''(0^+) & \text{if } t = 0, \\ \varphi'(t)/t & \text{if } t \neq 0. \end{cases}      (11)

We can notice³ that

    \sigma(t) \in (0, \varphi''(0^+)], \quad \forall t \in \mathbb{R}.      (12)

³By H1, t ↦ (d/dt) ϕ(√t) = ϕ′(√t)/(2√t) = σ(√t)/2 is decreasing on R₊. Using H2, if t > 0, then ϕ′(t) ≥ 0, and hence σ(t) ≥ 0 for all t ∈ R.
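For instance (an illustration, not from the paper), for the convex PF ϕ(t) = √(α + t²) of Table I one has ϕ′(t) = t/√(α + t²), hence σ(t) = 1/√(α + t²) and ϕ″(0⁺) = 1/√α, and the bound (12) can be checked directly:

```python
import numpy as np

alpha = 0.1

def sigma(t):
    """Weight function (11) for phi(t) = sqrt(alpha + t^2):
    sigma(t) = phi'(t)/t = 1/sqrt(alpha + t^2), which extends
    continuously to t = 0 with sigma(0) = phi''(0+) = 1/sqrt(alpha)."""
    return 1.0 / np.sqrt(alpha + t**2)

# Check (12): 0 < sigma(t) <= phi''(0+) on a grid of t values.
t = np.linspace(0.0, 10.0, 1001)
assert np.all(sigma(t) > 0)
assert np.all(sigma(t) <= 1.0 / np.sqrt(alpha) + 1e-12)
```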

For convenience, let us write 𝒥 in (4) as

    \mathcal{J}(x, b) = \|Ax - y\|^2 + \frac{\beta}{2} (Gx)^T D(b)\, Gx + \beta \sum_{i=1}^{r} \psi(b_i),

where D : R^r → R^{rs×rs} reads

    D(b) = \mathrm{diag}\big( \underbrace{b_1, \ldots, b_1}_{s\ \mathrm{times}}, \ldots, \underbrace{b_r, \ldots, b_r}_{s\ \mathrm{times}} \big).      (13)

Given x^{(k-1)}, the first step of iteration k, as defined in (6), has an explicit form:

    [b^{(k)}]_i = \sigma(\|G_i x^{(k-1)}\|), \quad i = 1, \ldots, r.      (14)

Its second step is defined by (7) and hence amounts to

    x^{(k)} = \big( L(b^{(k)}) \big)^{-1} 2 A^T y,      (15)

where L : R^r → R^{p×p} reads

    L(b) = 2 A^T A + \beta G^T D(b)\, G,      (16)

for D as given in (13). Combining (3) and (12) shows that L(b^{(k)}) is invertible for all k.

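To fix ideas, here is a minimal NumPy sketch of the alternation (6)–(7) in its explicit form (14)–(15), for a 1-D denoising problem with s = 1, A = I and the PF ϕ(t) = √(α + t²) from Table I. The function name, parameter values and fixed iteration count are illustrative choices, not from the paper:

```python
import numpy as np

def half_quadratic(A, y, G, beta, alpha=0.1, iters=50):
    """Half-quadratic minimization of J(x) = ||Ax - y||^2 + beta * sum_i phi(|G_i x|)
    with phi(t) = sqrt(alpha + t^2) and s = 1, implementing (14)-(16)."""
    p = A.shape[1]
    x = np.zeros(p)
    z = 2.0 * A.T @ y                        # fixed right-hand side
    for _ in range(iters):
        t = G @ x                            # s = 1, so ||G_i x|| = |t_i|
        b = 1.0 / np.sqrt(alpha + t**2)      # b-step (14): b_i = sigma(|t_i|)
        L = 2.0 * A.T @ A + beta * G.T @ (b[:, None] * G)   # L(b) of (16)
        x = np.linalg.solve(L, z)            # x-step (15)
    return x

# Toy example: noisy piecewise-constant signal, A = I.
rng = np.random.default_rng(0)
x_true = np.concatenate([np.zeros(30), np.ones(30)])
y = x_true + 0.1 * rng.standard_normal(60)
G = np.diff(np.eye(60), axis=0)              # first-difference matrix
x_hat = half_quadratic(np.eye(60), y, G, beta=0.5)
```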


It is worth recalling the interpretation of 𝒥 exhibited by Geman and Reynolds in [6]. By (11), σ is decreasing on R₊. Inserting b_i = σ(‖G_i x‖) into (4) shows that each b_i suspends the smoothness constraint if ‖G_i x‖ is large (i.e. if it corresponds to an edge) and keeps it strong if ‖G_i x‖ is small (i.e. if it is in a homogeneous region). Thus the auxiliary variable b has been interpreted as a continuous-valued, non-interacting line process. This provides an important insight into the way in which edge-preserving regularization works.

III. FIXED POINT ITERATION FOR THE LINEARIZED GRADIENT

The fixed point iteration is aimed at solving ∇J(x) = 0 using (8) and (9). Below we develop these expressions in the case of an objective function of the form (1)–(2). Using that ϕ′(‖G_i x‖) = 0 if G_i x = 0, we have

    \nabla J(x) = 2 A^T A x + \beta \sum_{i \in I_x} G_i^T \frac{\varphi'(\|G_i x\|)}{\|G_i x\|} G_i x - 2 A^T y,      (17)

where I_x = {i ∈ {1, …, r} : G_i x ≠ 0}. By (11) and (17), we can see that

    \sigma(\|G_i x\|)\, G_i x = \begin{cases} \dfrac{\varphi'(\|G_i x\|)}{\|G_i x\|}\, G_i x, & \forall i \in I_x, \\ 0, & \text{else.} \end{cases}

Hence ∇J also reads

    \nabla J(x) = 2 A^T A x + \beta G^T D\big( [\sigma(\|G_i x\|)]_{i=1}^{r} \big)\, Gx - 2 A^T y,

where b ↦ D(b) is defined by (13). Thus ∇J can be put into the form (9) for

    L(x) = L\big( [\sigma(\|G_i x\|)]_{i=1}^{r} \big),      (18)

    z = 2 A^T y,      (19)

where L is the matrix-valued function defined in (16). By (3) and (12), the matrix L(x) is invertible for every x ∈ R^p. Applying (8) actually yields

    x^{(k)} = \Big( L\big( [\sigma(\|G_i x^{(k-1)}\|)]_{i=1}^{r} \big) \Big)^{-1} 2 A^T y,      (20)

which amounts to inserting (14) into (15). This shows in particular that the fixed-point iteration is convergent for all objective functions for which convergence of half-quadratic regularization has been proven [11], [17], [19], and vice versa.
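The equivalence can also be checked numerically. The sketch below (same illustrative conventions as the previous snippet: s = 1 and ϕ(t) = √(α + t²), with a random A) verifies that the directly computed gradient matches the linearized form (9) with L(x) and z as in (18)–(19), and that one fixed-point step (20) coincides with the quasi-Newton step x − (L(x))⁻¹∇J(x) derived in Section IV:

```python
import numpy as np

rng = np.random.default_rng(1)
p, alpha, beta = 20, 0.1, 0.5
A = rng.standard_normal((30, p))
y = rng.standard_normal(30)
G = np.diff(np.eye(p), axis=0)                    # s = 1 difference operators
x = rng.standard_normal(p)

phi_prime = lambda t: t / np.sqrt(alpha + t**2)   # phi'(t), odd in t
sigma = lambda t: 1.0 / np.sqrt(alpha + t**2)     # (11), even in t

t = G @ x
# Gradient of J by direct differentiation, cf. (17):
grad = 2.0 * A.T @ (A @ x - y) + beta * G.T @ phi_prime(t)

# Linearized form (9), with L(x) and z from (18)-(19):
L = 2.0 * A.T @ A + beta * G.T @ (sigma(t)[:, None] * G)
z = 2.0 * A.T @ y
print(np.allclose(grad, L @ x - z))               # True

# One fixed-point step (20) equals the quasi-Newton step of Section IV:
print(np.allclose(np.linalg.solve(L, z), x - np.linalg.solve(L, grad)))  # True
```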


IV. A SPECIAL FORM OF QUASI-NEWTON ITERATION

In [21] it has been pointed out that for convex PFs ϕ and s = 1, the iterations given by (14) and (15) correspond to a quasi-Newton minimization. The same holds in our more general context, since the gradient linearization method can always be viewed as a form of quasi-Newton method. Indeed, starting with (8), we derive

    x^{(k)} = \big( L(x^{(k-1)}) \big)^{-1} z
            = \big( L(x^{(k-1)}) \big)^{-1} \big( L(x^{(k-1)})\, x^{(k-1)} - L(x^{(k-1)})\, x^{(k-1)} + z \big)
            = x^{(k-1)} - \big( L(x^{(k-1)}) \big)^{-1} \nabla J(x^{(k-1)}).

In the last equality, we use (9). This shows that half-quadratic regularization (14)–(15), or equivalently the fixed-point iteration (20), performs a quasi-Newton iteration where the Hessian of J at x^{(k-1)} is approximated by L(x^{(k-1)}).

V. CONCLUSION

Half-quadratic minimization has been studied and used by numerous authors. In this paper we have shown that it is equivalent to the very classical approach where at each iteration a linear approximation of the gradient is used. We have also demonstrated that it is a form of quasi-Newton method.

REFERENCES

[1] G. Demoment, "Image reconstruction and restoration: Overview of common estimation structures and problems", IEEE Transactions on Acoustics, Speech and Signal Processing, vol. ASSP-37, no. 12, pp. 2024–2036, Dec. 1989.
[2] A. Tarantola, Inverse Problem Theory: Methods for Data Fitting and Model Parameter Estimation, Elsevier Science Publishers, Amsterdam, 1987.
[3] G. Aubert and P. Kornprobst, Mathematical Problems in Image Processing, Springer-Verlag, Berlin, 2002.
[4] J. E. Besag, "Digital image processing: Towards Bayesian image analysis", Journal of Applied Statistics, vol. 16, no. 3, pp. 395–407, 1989.
[5] D. Geman, Random Fields and Inverse Problems in Imaging, École d'Été de Probabilités de Saint-Flour XVIII, 1988, vol. 1427 of Lecture Notes in Mathematics, pp. 117–193, Springer-Verlag, 1990.
[6] D. Geman and G. Reynolds, "Constrained restoration and recovery of discontinuities", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-14, no. 3, pp. 367–383, Mar. 1992.
[7] C. Bouman and K. Sauer, "A generalized Gaussian image model for edge-preserving MAP estimation", IEEE Transactions on Image Processing, vol. 2, no. 3, pp. 296–310, July 1993.


[8] D. Geman and C. Yang, "Nonlinear image recovery with half-quadratic regularization", IEEE Transactions on Image Processing, vol. 4, no. 7, pp. 932–946, July 1995.
[9] A. Blake and A. Zisserman, Visual Reconstruction, The MIT Press, Cambridge, 1987.
[10] P. Perona and J. Malik, "Scale-space and edge detection using anisotropic diffusion", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-12, pp. 629–639, July 1990.
[11] P. Charbonnier, L. Blanc-Féraud, G. Aubert, and M. Barlaud, "Deterministic edge-preserving regularization in computed imaging", IEEE Transactions on Image Processing, vol. 6, no. 2, pp. 298–311, Feb. 1997.
[12] S. Teboul, L. Blanc-Féraud, G. Aubert, and M. Barlaud, "Variational approach for edge-preserving regularization using coupled PDE's", IEEE Transactions on Image Processing, vol. 7, no. 3, pp. 387–397, Mar. 1998.
[13] A. Tikhonov and V. Arsenin, Solutions of Ill-Posed Problems, Winston, Washington, DC, 1977.
[14] M. Black and A. Rangarajan, "On the unification of line processes, outlier rejection, and robust statistics with applications to early vision", International Journal of Computer Vision, vol. 19, no. 1, pp. 57–91, 1996.
[15] S. Li, Markov Random Field Modeling in Computer Vision, Springer-Verlag, New York, 1st edition, 1995.
[16] P. Charbonnier, L. Blanc-Féraud, G. Aubert, and M. Barlaud, "Two deterministic half-quadratic regularization algorithms for computed imaging", in Proceedings of IEEE ICIP, 1994, vol. 2, pp. 168–172.
[17] A. H. Delaney and Y. Bresler, "Globally convergent edge-preserving regularized reconstruction: An application to limited-angle tomography", IEEE Transactions on Image Processing, vol. 7, pp. 204–221, Feb. 1998.
[18] P. Kornprobst, R. Deriche, and G. Aubert, "Image sequence analysis via partial differential equations", Journal of Mathematical Imaging and Vision, vol. 11, no. 1, pp. 5–26, Oct. 1999.
[19] J. Idier, "Convex half-quadratic criteria and auxiliary interacting variables for image restoration", IEEE Transactions on Image Processing, vol. 10, no. 7, pp. 1001–1009, July 2001.
[20] G. Aubert and L. Vese, "A variational method in image recovery", SIAM Journal on Numerical Analysis, vol. 34, no. 5, pp. 1948–1979, 1997.
[21] M. Nikolova and M. Ng, "Analysis of half-quadratic minimization methods for signal and image recovery", SIAM Journal on Scientific Computing, to appear, 2005.
[22] F. Champagnat and J. Idier, "A connection between half-quadratic criteria and EM algorithms", IEEE Signal Processing Letters, vol. 11, no. 9, pp. 709–712, Sep. 2004.
[23] C. R. Vogel and M. E. Oman, "Fast, robust total variation-based reconstruction of noisy, blurred images", IEEE Transactions on Image Processing, vol. 7, no. 6, pp. 813–824, Mar. 1998.
