A primal--dual fixed point algorithm for convex separable minimization ...

2 downloads 0 Views 3MB Size Report
Jan 17, 2013 -
IOP PUBLISHING

INVERSE PROBLEMS

Inverse Problems 29 (2013) 025011 (33pp)

doi:10.1088/0266-5611/29/2/025011

A primal–dual fixed point algorithm for convex separable minimization with applications to image restoration Peijun Chen 1,2 , Jianguo Huang 1,3 and Xiaoqun Zhang 1,4 1 Department of Mathematics, and MOE-LSC, Shanghai Jiao Tong University, Shanghai 200240, People’s Republic of China 2 Department of Mathematics, Taiyuan University of Science and Technology, Taiyuan 030024, People’s Republic of China 3 Division of Computational Science, E-Institute of Shanghai Universities, Shanghai Normal University, People’s Republic of China 4 Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai 200240, People’s Republic of China

E-mail: [email protected], [email protected] and [email protected]

Received 11 August 2012, in final form 25 November 2012 Published 17 January 2013 Online at stacks.iop.org/IP/29/025011 Abstract Recently, the minimization of a sum of two convex functions has received considerable interest in a variational image restoration model. In this paper, we propose a general algorithmic framework for solving a separable convex minimization problem from the point of view of fixed point algorithms based on proximity operators (Moreau 1962 C. R. Acad. Sci., Paris I 255 2897–99). Motivated by proximal forward–backward splitting proposed in Combettes and Wajs (2005 Multiscale Model. Simul. 4 1168–200) and fixed point algorithms based on the proximity operator (FP2 O) for image denoising (Micchelli et al 2011 Inverse Problems 27 45009–38), we design a primal–dual fixed point algorithm based on the proximity operator (PDFP2 Oκ for κ ∈ [0, 1)) and obtain a scheme with a closed-form solution for each iteration. Using the firmly nonexpansive properties of the proximity operator and with the help of a special norm over a product space, we achieve the convergence of the proposed PDFP2 Oκ algorithm. Moreover, under some stronger assumptions, we can prove the global linear convergence of the proposed algorithm. We also give the connection of the proposed algorithm with other existing firstorder methods. Finally, we illustrate the efficiency of PDFP2 Oκ through some numerical examples on image supper-resolution, computerized tomographic reconstruction and parallel magnetic resonance imaging. Generally speaking, our method PDFP2 O (κ = 0) is comparable with other state-of-the-art methods in numerical performance, while it has some advantages on parameter selection in real applications. (Some figures may appear in colour only in the online journal) 0266-5611/13/025011+33$33.00 © 2013 IOP Publishing Ltd

Printed in the UK & the USA

1

Inverse Problems 29 (2013) 025011

P Chen et al

1. Introduction This paper is devoted to designing and discussing an efficient algorithmic framework for minimizing the sum of two proper lower semi-continuous convex functions, i.e. x∗ = arg min x∈Rn

( f1 ◦ B)(x) + f2 (x),

(1.1)

where f1 ∈ 0 (Rm ), f2 ∈ 0 (Rn ) and f2 is differentiable on Rn with a 1/β-Lipschitz continuous gradient for some β ∈ (0, +∞) and B : Rn → Rm a linear transform. This parameter β is related to the convergence conditions of algorithms 3–5 presented in the following section. Here and in what follows, for a real Hilbert space X , 0 (X ) denotes the collection of all proper lower semi-continuous convex functions from X to (−∞, +∞]. Despite its simplicity, many problems in image processing can be formulated in the form of (1.1). For instance, the following variational sparse recovery models are often considered in image restoration and medical image reconstruction: x∗ = arg min μBx1 + 12 Ax − b22 ,

(1.2)

x∈Rn

where  · 2 denotes the usual Euclidean norm for a vector, A is a p × n matrix representing a linear transform, b ∈ R p and μ > 0 is the regularization parameter. The termBx1 is the usual 1 -based regularization in order to promote sparsity under the transform B. For example, for the well-known Rudin–Osher–Fatemi (ROF) model [30] Bx1 represents the total-variation semi-norm which aims to recover piecewise constant images, with B being a 2n × n discrete differential matrix (cf [16, 24]). More precisely, Bx1 and Bx1,2 are for anisotropic total variation and isotropic total variation, respectively, and here we simply write them as Bx1 . Problem (1.2) can be expressed in the form of (1.1) by setting f1 = μ · 1 and f2 (x) = 12 Ax − b22 . One of the main difficulties in solving it is that f1 is non-differentiable. The case often occurs in many problems we are interested in. Another general problem often considered in the literature takes the following form: x∗ = arg min

f (x) + h(x),

(1.3)

x∈X

where f , h ∈ 0 (X ) and h is differentiable on X with a 1/β-Lipschitz continuous gradient for some β ∈ (0, +∞). Problem (1.1), which we are interested in this paper, can be viewed as a special case of problem (1.3) for X = Rn and f = f1 ◦ B, h = f2 . On the other hand, we can also consider that problem (1.3) is a special case of problem (1.1) for X = Rn , f2 = h, f1 = f and B = I, where I denotes the usual identity operator. For problem (1.3), Combettes and Wajs proposed in [12] a proximal forward–backward splitting (PFBS) algorithm, i.e. xk+1 = proxγ f (xk − γ ∇h(xk )),

(1.4)

where 0 < γ < 2β is a stepsize parameter, and the operator prox f is defined by prox f :

X x

→ →

X arg min f (y) + 12 x − y22 , y∈X

called the proximity operator of f . Note that this type of splitting method was originally studied in [23, 28] for solving partial differential equations, and the notion of proximity operators was first introduced by Moreau in [25] as a generalization of projection operators. The iteration (1.4) consists of two sequential steps. The first performs a forward (explicit) step involving the evaluation of the gradient of h; then the other performs a backward (implicit) step involving f . This numerical scheme is very simple and efficient when the proximity operator used in the second step can be carried out efficiently. For example, when f =  · 1 for sparse 2

Inverse Problems 29 (2013) 025011

P Chen et al

regularization, the proximity operator proxγ f (x) can be written as the famous componentwise soft-thresholding (also known as a shrinkage) operation. However, the proximity operators for the general form f = f1 ◦ B as in (1.1) do not have an explicit expression, leading to the numerical solution of a difficult subproblem. In fact, the subproblem of (1.2) is proxμ·1 ◦B (b) and often formulated as the ROF denoising problem: x∗ = arg min μBx1 + 12 x − b22 ,

(1.5)

x∈Rn

where b ∈ Rn denotes a corrupted image to be denoised. In recent years, many splitting methods have been designed to solve the last subproblem in order to take advantage of the efficiency of the soft-thresholding operator. For example, Goldstein and Osher proposed in [18] a splitting algorithm based on the Bregman iteration, namely the split Bregman, to implement the action of prox f1 ◦B , in particular for total variation minimization. This algorithm has shown to be very efficient and useful for a large class of convex separable programming problems. Theoretically, it is shown to be equivalent to the Douglas–Rachford splitting algorithm (see [31, 14]) and alternating direction of multiplier method (ADMM, see [15, 7]), and the convergence was then analyzed based on such equivalence. The split Bregman proposed in [18] is also designed to solve the convex separable problem (1.1). In particular, for the variational model (1.2), the subproblem involves solving a quadratic minimization, which sometimes can be time consuming. To overcome this, a primal–dual inexact split Uzawa method was proposed in [35] to maximally decouple the subproblems so that each iteration step is precise and explicit. In [16, 9], more theoretical analysis on the variants of the primal–dual-type method and the connection with existing methods were examined to bridge the gap between different types of methods. Also, the convergence of ADMM was further analyzed in [19] based on proximal point algorithm (PPA) formulation. In this paper, we will follow a different point of view. In [24], Micchelli–Shen–Xu designed an algorithm called FP2 O to solve prox f1 ◦B (x). We aim to extend FP2 O to solve the general problem (1.1) with a maximally decoupled iteration scheme. One obvious advantage of the proposed scheme is that it is very easy for parallel implementation. Then, we will show that the proposed algorithm is convergent in a general setting. Under some assumptions of the convex function f2 and the linear transform B, we can further prove the linear convergence rate of the method under the framework of fixed point iteration. Note that most of the existing works based on ADMM have shown a sub-linear convergence rate O(1/k) on the objective function and O(1/k2 ) on the accelerated version, where k is the iteration number. Recently, in [19], the ergodic and non-ergodic convergence on the difference of two sequential primal–dual sequences were analyzed. In this paper, we will prove the convergence rate of the iterations directly from the point of view of fixed point theory under some common assumptions. We note that, during the preparation of this paper, Deng and Yin [13] also considered the global linear convergence of the ADMM and its variants based on similar assumptions. In addition, we will reformulate our fixed point type of methods and show their connections with some existing first-order methods for (1.1) and (1.2). The rest of the paper is organized as follows. In section 2, we recall the fixed point algorithm FP2 O and some related works and then deduce the proposed PDFP2 O algorithm and its extension PDFP2 Oκ from our intuitions. In section 3, we first deduce PDFP2 Oκ again in the setting of fixed point iteration; we then establish its convergence under a general setting and the convergence rate under some stronger assumptions on f2 and B. In section 4, we give the equivalent form of PDFP2 O, and the relationships and differences with other first-order algorithms. 
In section 5, we show the numerical performance and efficiency of PDFP2 Oκ through some examples on image super-resolution, tomographic reconstruction and 3

Inverse Problems 29 (2013) 025011

P Chen et al

parallel magnetic resonance imaging (pMRI). In the final section, we give a discussion and some perspectives on our method and other state-of-the-art methods frequently used in image restoration. 2. Fixed point algorithms Similar to the fixed point algorithm on the dual for the ROF denoising model (1.5) proposed by Chambolle [8], Micchelli et al proposed an algorithm called FP2 O in [24] to solve the proximity operator prox f1 ◦B (b) for b ∈ Rn , especially for the total-variation-based image denoising. Let λmax (BBT ) be the largest eigenvalue of BBT . For 0 < λ < 2/λmax (BBT ), we define the operator H(v) = (I − prox f1 )(Bb + (I − λBBT )v) for all v ∈ Rm ; λ

(2.1)

then the FP2 O algorithm is described as algorithm 1, where Hκ is the κ-averaged operator of H, i.e. Hκ = κI + (1 − κ )H for κ ∈ (0, 1); see definition 3.3 in the following section. Algorithm 1 Fixed point algorithm based on proximity operator, FP2 O [24]. Step 1: set v0 ∈ Rm , 0 < λ < 2/λmax (BBT ), κ ∈ (0, 1). Step 2: calculate v ∗ , which is the fixed point of H, with iteration vk+1 = Hκ (vk ). Step 3: prox f1 ◦B (b) = b − λBT v ∗ .

The key technique to obtain the FP2 O scheme relies on the relation of the subdifferential of a convex function and its proximity operator, as described in the result (3.1). An advantage of FP2 O is that its iteration does not require solving the subproblem and the convergence is analyzed in the classical framework of the fixed point iteration. This algorithm has been extended in [2, 10] to solve 1 x∗ = arg min ( f1 ◦ B)(x) + xT Qx − bT x, 2 x∈Rn where Q ∈ Mn , with Mn being the collection of all symmetric positive definite n × n matrices, b ∈ Rn . Define  H(v) = (I − prox f1 )(BQ−1 b + (I − λBQ−1 BT )v) for all v ∈ Rm . λ

Then, the corresponding algorithm is given below, called algorithm 2, which can be viewed as a fixed point algorithm based on the inverse matrix and proximity operator or FP2 O based on the inverse matrix (IFP2 O). Here the matrix Q is assumed to be invertible and the inverse can be easily calculated, which is unfortunately not the case in most of the applications in imaging science. Moreover, there is no theoretical guarantee of convergence if the linear system is only solved approximately. Algorithm 2 FP2 O based on inverse matrix, IFP2 O [2]. Step 1: set v0 ∈ Rm and 0 < λ < 2/λmax (BQ−1 BT ), κ ∈ (0, 1).  with iteration  κ ( Step 2: calculate  v ∗ , which is the fixed point of H, vk+1 = H vk ). Step 3: x∗ = Q−1 (b − λBT  v ∗ ).

Further, the authors in [2] combined PFBS and FP2 O for solving problem (1.1), for which we call PFBS_FP2 O (cf algorithm 3 below). Precisely speaking, at step k in PFBS, after one forward iteration xk+1/2 = xk − γ ∇ f2 (xk ), we need to solve for xk+1 = proxγ f1 ◦B (xk+1/2 ). 4

Inverse Problems 29 (2013) 025011

P Chen et al

FP2 O is then used to solve this subproblem, i.e. the fixed point v ∗k+1 of Hxk+1/2 is obtained by the fixed iteration form v i+1 = (Hxk+1/2 )κ (v i ), where Hxk+1/2 (v) = (I − prox γλ f1 )(Bxk+1/2 + (I − λBBT )v) for all v ∈ Rm .

Then xk+1 is given by setting xk+1 = xk+1/2 − λBT v ∗k+1 . The acceleration combining with the Nesterov method [17, 26, 32, 33] was also considered in [2]. We note that algorithm 3 involves inner and outer iterations, and it is often problematic to set the appropriate inner stopping conditions to balance computational time and precision. In our algorithm developed later on, instead of using many number of inner fixed point iterations for solving proxγ f1 ◦B (x), we use only one inner fixed point iteration. Algorithm 3 Proximal forward–backward splitting based on FP2 O, PFBS_FP2 O [2]. Step 1: set x0 ∈ Rn , 0 < γ < 2β. Step 2: for k = 0, 1, 2, . . . xk+1/2 = xk − γ ∇ f2 (xk ), calculate the fixed point v ∗k+1 of Hxk+1/2 with iteration v i+1 = (Hxk+1/2 )κ (v i ), xk+1 = xk+1/2 − λBT v ∗k+1 . end for

Suppose κ = 0 in FP2 O. A very natural idea is to take the numerical solution vk of the fixed point of Hx(k−1)+1/2 as the initial value, and only perform one iteration for solving the fixed point of Hxk+1/2 ; then we can obtain the following iteration scheme:  vk+1 = (I − prox γλ f1 )(B(xk − γ ∇ f2 (xk )) + (I − λBBT )vk ), (2.2a) 2 (PDFP O) T xk+1 = xk − γ ∇ f2 (xk ) − λB vk+1 , (2.2b) which produces our proposed method algorithm 4, described below. This algorithm can also be deduced from the fixed point formulation, whose detail we will give in the following section. On the other hand, since x is the primal variable related to (1.1), it is very natural to ask what role the variable v plays in our algorithm. After a thorough study, we find out as given in section 4.1 that v is actually the dual variable of the primal–dual form related to (1.1). Based on these observations, we call our method a primal–dual fixed point algorithm based on the proximity operator, and abbreviate it as PDFP2 O, inheriting the notion of ‘FP2 O’ in [24]. If B = I, λ = 1, then form (2.2) is equivalent to form (1.4). So PFPS can be seen as a special case of PDFP2 O. Also, when f2 (x) = 12 x − b22 and γ = 1, then PDFP2 O reduces to FP2 O for solving prox f1 ◦B (b) with κ = 0. For general B and f2 , each step of the proposed algorithm is explicit when prox γλ f1 is easy to compute. Note that the technique of approximating the subproblem by only one iteration is also proposed in a primal–dual inexact Uzawa framework in [35]. We will show the connection to this algorithm and other ones in section 4. Algorithm 4 Primal–dual fixed point algorithm based on proximity operator, PDFP2 O. Step 1: set x0 ∈ Rn , v0 ∈ Rm , 0 < λ  1/λmax (BBT ), 0 < γ < 2β. Step 2: for k = 0, 1, 2, . . . xk+ 1 = xk − γ ∇ f2 (xk ), 2 vk+1 = (I − prox γλ f1 )(Bxk+ 1 + (I − λBBT )vk ), 2 xk+1 = xk+ 1 − λBT vk+1 . 2 end for

Borrowing the fixed point formulation of PDFP2 O, we can introduce a relaxation parameter κ ∈ [0, 1) to obtain algorithm 5, which is exactly a Picard method with parameters. 5

Inverse Problems 29 (2013) 025011

P Chen et al

The rule for parameter selection will be illustrated in section 3. If κ = 0, then PDFP2 Oκ reduces to PDFP2 O. Our theoretical analysis for PDFP2 Oκ given in the following section is mainly based on this fixed point setting. Algorithm 5 PDFP2 Oκ . Step 1: set x0 ∈ Rn , v0 ∈ Rm , 0 < λ  1/λmax (BBT ), 0 < γ < 2β, κ ∈ [0, 1). Step 2: for k = 0, 1, 2, . . . xk+ 1 = xk − γ ∇ f2 (xk ), 2  vk+1 = (I − prox γλ f1 )(Bxk+ 1 + (I − λBBT )vk ), 2  xk+1 = xk+ 1 − λBT  vk+1 , 2 vk+1 = κvk + (1 − κ ) vk+1 , xk+1 . xk+1 = κxk + (1 − κ ) end for

3. Convergence analysis 3.1. General convergence First of all, let us mention some related definitions and lemmas for later requirements. From now on, we use X to denote a finite-dimensional real Hilbert space. Moreover, we always assume that problem (1.1) has at least one solution. As shown in [12], if the objective function ( f1 ◦ B)(x) + f2 (x) is coercive, i.e. lim

x2 →+∞

(( f1 ◦ B)(x) + f2 (x)) = +∞,

then the existence of solution can be ensured for (1.1). Definition 3.1 (Subdifferential [30]). Let f be a function in 0 (X ). The subdifferential of f is the set-valued operator ∂ f : X → 2X , the value of which at x ∈ X is ∂ f (x) = {v ∈ X | v, y − x + f (x)  f (y) for all y ∈ X }, where ·, · denotes the inner-product over X . Definition 3.2 (Nonexpansive operators and firmly nonexpansive operators [30]). An operator T : X → X is nonexpansive if and only if it satisfies T x − Ty2  x − y2 for all (x, y) ∈ X 2 . T is firmly nonexpansive if and only if it satisfies one of the following equivalent conditions: (i) T x − Ty22  T x − Ty, x − y for all (x, y) ∈ X 2 . (ii) T x − Ty22  x − y22 − (I − T )x − (I − T )y22 for all (x, y) ∈ X 2 . It is easy to show from the above definitions that a firmly nonexpansive operator T is nonexpansive. Definition 3.3 (Picard sequence, κ-averaged operator [27]). Let T : X → X be an operator. For a given initial point u0 ∈ X , the Picard sequence of the operator T is defined by uk+1 = T (uk ), for k ∈ N. For a real number κ ∈ (0, 1), the κ-averaged operator Tκ of T is defined by Tκ = κI + (1 − κ )T . We also write T0 = T . 6

Inverse Problems 29 (2013) 025011

P Chen et al

Lemma 3.1. Suppose f ∈ 0 (Rm ) and x ∈ Rm . Then there holds y ∈ ∂ f (x) ⇔ x = prox f (x + y).

(3.1)

Furthermore, if f has 1/β-Lipschitz continuous gradient, then

∇ f (x) − ∇ f (y), x − y  β∇ f (x) − ∇ f (y)2 for all (x, y) ∈ Rm .

(3.2)

Proof. The first result is nothing but proposition 2.6 of [24]. If f has 1/β-Lipschitz continuous gradient, we have from [12] that β∇ f is firmly nonexpansive, which implies (3.2) readily.  Lemma 3.2 (Lemma 2.4 of [12]). Let f be a function in 0 (Rm ). Then prox f and I − prox f are both firmly nonexpansive operators. Lemma 3.3 (Opial κ-averaged theorem, theorem 3 of [27]). If S is a closed and convex set in X and T : S → S is a nonexpansive mapping having at least one fixed point, then for κ ∈ (0, 1), Tκ is nonexpansive, maps S to itself and has the same set of fixed points as T . Furthermore, for any u0 ∈ S and κ ∈ (0, 1), the Picard sequence of Tκ converges to a fixed point of T . Now, we are ready to obtain a fixed point formulation for the solution of problem (1.1) and discuss the convergence of PDFP2 Oκ . To this end, for any two positive numbers λ and γ , define T1 : Rm × Rn → Rm as T1 (v, x) = (I − prox γλ f1 )(B(x − γ ∇ f2 (x)) + (I − λBBT )v)

(3.3)

and T2 : R × R → R as m

n

n

T2 (v, x) = x − γ ∇ f2 (x) − λBT ◦ T1 .

(3.4)

Denote T : Rm × Rn → Rm × Rn as T (v, x) = (T1 (v, x), T2 (v, x)).

(3.5)

Theorem 3.1. Let λ and γ be two positive numbers. Suppose that x∗ is a solution of (1.1). Then there exists v ∗ ∈ Rm such that  ∗ v = T1 (v ∗ , x∗ ), x∗ = T2 (v ∗ , x∗ ). In other words, u∗ = (v ∗ , x∗ ) is a fixed point of T . Conversely, if u∗ ∈ Rm × Rn is a fixed point of T , with u∗ = (v ∗ , x∗ ), v ∗ ∈ Rm , x∗ ∈ Rn , then x∗ is a solution of (1.1). Proof. By the first-order optimality condition of problem (1.1), we have x∗ = arg min x∈Rn

( f1 ◦ B)(x) + f2 (x)

⇔ 0 ∈ −∇ f2 (x∗ ) − ∂ ( f1 ◦ B)(x∗ ) ⇔ 0 ∈ −γ ∇ f2 (x∗ ) − γ ∂ ( f1 ◦ B)(x∗ )   γ ⇔ x∗ ∈ x∗ − γ ∇ f2 (x∗ ) − λ BT ◦ ∂ f1 ◦ B (x∗ ). λ Let v∗ ∈

 γ  ∂ f1 ◦ B (x∗ ) = ∂ f1 (Bx∗ ). λ λ



(3.6)

Then x∗ = x∗ − γ ∇ f2 (x∗ ) − λBT v ∗ .

(3.7) 7

Inverse Problems 29 (2013) 025011

P Chen et al

Moreover, it follows from result (3.1) that (3.6) is equivalent to Bx∗ = prox γλ f1 (Bx∗ + v ∗ )

⇔ (Bx∗ + v ∗ ) − v ∗ = prox γλ f1 (Bx∗ + v ∗ ) ⇔ v ∗ = (I − prox γλ f1 )(Bx∗ + v ∗ ).

(3.8)

Inserting (3.7) into (3.8) gives   v ∗ = I − prox γλ f1 (B(x∗ − γ ∇ f2 (x∗ )) + (I − λBBT )v ∗ ). This shows v ∗ = T1 (v ∗ , x∗ ). Next, replacing v ∗ in (3.7) by T1 (v ∗ , x∗ ), we readily have x∗ = T2 (v ∗ , x∗ ). Therefore, for u∗ = (v ∗ , x∗ ), u∗ = T (u∗ ). On the other hand, if u∗ = T (u∗ ), then we can derive that x∗ satisfies the first-order  optimality condition of (1.1). Therefore, we conclude that x∗ is a minimizer of (1.1). In the following, we will show that the algorithm PDFP2 Oκ is a Picard method related to the operator Tκ . Theorem 3.2. Suppose κ ∈ [0, 1). Set Tκ = κI + (1 − κ )T . Then the Picard sequence {uk } of Tκ is exactly the one obtained by the algorithm PDFP2 Oκ . Proof. According to the definitions in (3.3)–(3.5), the component form of uk+1 = T (uk ) can be expressed as  vk+1 = T1 (vk , xk ) = (I − prox γλ f1 )(B(xk − γ ∇ f2 (xk )) + (I − λBBT )vk ) xk+1 = T2 (vk , xk ) = xk − γ ∇ f2 (xk ) − λBT ◦ T1 (vk , xk ) = xk − γ ∇ f2 (xk ) − λBT vk+1 . Therefore, the iteration uk+1 = T (uk ) is equivalent to (2.2). Employing the similar argument,  we can obtain the conclusion for general Tκ with κ ∈ [0, 1). Remark 3.1. From the last result, we find out that algorithm PDFP2 Oκ can also be obtained in the setting of fixed point iteration immediately. For the convergence analysis for PDFP2 Oκ , we will first prove a key inequality for general cases (cf equation (3.13)). Denote g(x) = x − γ ∇ f2 (x) for all x ∈ Rn ,

(3.9)

M = I − λBBT .

(3.10)

When 0 < λ  1/λmax (BB ), M is a symmetric positive semi-definite matrix, so we can define the semi-norm vM = v, Mv for all v ∈ Rm . (3.11) T

For an element u = (v, x) ∈ Rm × Rn , with v ∈ Rm and x ∈ Rn , let

uλ = x22 + λv22 .

(3.12)

We can easily see that  · λ is a norm over the produce space R × R whenever λ > 0. m

n

Theorem 3.3. For any two elements u1 = (v1 , x1 ), u2 = (v2 , x2 ) in Rm × Rn , there holds T (u1 ) − T (u2 )2λ  u1 − u2 2λ − γ (2β − γ )∇ f2 (x1 ) − ∇ f2 (x2 )22 −λBT (v1 − v2 )22 − λ(T1 (u1 ) − T1 (u2 )) − (v1 − v2 )2M .

8

(3.13)

Inverse Problems 29 (2013) 025011

P Chen et al

Proof. By lemma 3.2, I − prox γλ f1 is a firmly nonexpansive operator. This together with (3.3), (3.9) and (3.10) yields T1 (u1 ) − T1 (u2 )22  T1 (u1 ) − T1 (u2 ), B(g(x1 ) − g(x2 )) + M(v1 − v2 ) = T1 (u1 )−T1 (u2 ), B(g(x1 ) − g(x2 )) + T1 (u1 ) − T1 (u2 ), M(v1 − v2 ) . (3.14) It follows from (3.4), (3.9), (3.10) and (3.11) that T2 (u1 ) − T2 (u2 )22 = (g(x1 ) − g(x2 )) − λBT ◦ (T1 (u1 ) − T1 (u2 ))22 = g(x1 ) − g(x2 )22 − 2λ BT ◦ (T1 (u1 ) − T1 (u2 )), g(x1 ) − g(x2 ) + λBT ◦ (T1 (u1 ) − T1 (u2 ))22 = g(x1 ) − g(x2 )22 − 2λ T1 (u1 ) − T1 (u2 ), B(g(x1 ) − g(x2 )) − λT1 (u1 ) − T1 (u2 )2M + λT1 (u1 ) − T1 (u2 )22 . Observing the definitions in (3.5) and (3.9)–(3.12), we have by (3.14)–(3.15) T (u1 ) − T (u2 )2λ = T2 (u1 ) − T2 (u2 )22 + λT1 (u1 ) − T1 (u2 )22 = g(x1 ) − g(x2 )22 − 2λ T1 (u1 ) − T1 (u2 ), B(g(x1 ) − g(x2 ))

(3.15)

− λT1 (u1 ) − T1 (u2 )2M + 2λT1 (u1 ) − T1 (u2 )22  g(x1 ) − g(x2 )22 − λT1 (u1 ) − T1 (u2 )2M + 2λ T1 (u1 ) − T1 (u2 ), M(v1 − v2 ) = g(x1 ) − g(x2 )22 + λv1 −v2 2M − λ(T1 (u1 ) − T1 (u2 )) − (v1 − v2 )2M .

(3.16)

Using the definition in (3.9) and estimate (3.2), we know g(x1 )−g(x2 )22 = x1 − x2 22 −2γ ∇ f2 (x1 )−∇ f2 (x2 ), x1 − x2 + γ 2 ∇ f2 (x1 ) − ∇ f2 (x2 )22  x1 − x2 22 − γ (2β − γ )∇ f2 (x1 ) − ∇ f2 (x2 )22 . (3.17) By the definitions in (3.10) and (3.11), λv1 − v2 2M = λv1 − v2 22 − λBT (v1 − v2 )22 . (3.18) Recalling the definition in (3.12), we easily know that (3.13) is a direct consequence of (3.16)– (3.18).  From theorem 3.3, we can derive the following result. Corollary 3.1. If 0 < γ < 2β, 0 < λ  1/λmax (BBT ), then T is nonexpansive under the norm  · λ . Since T is nonexpansive, we are able to show the convergence of PDFP2 Oκ for κ ∈ (0, 1), in view of lemma 3.3. Theorem 3.4. Suppose 0 < γ < 2β, 0 < λ  1/λmax (BBT ) and κ ∈ (0, 1). Let uk = (vk , xk ) be a sequence generated by PDFP2 Oκ . Then {uk } converges to a fixed point of T and {xk } converges to a solution of problem (1.1). Proof. In view of theorem 3.2, we know uk+1 = Tκ (uk ), so {uk } is the Picard sequence of Tκ . By assumption, problem (1.1) has a solution, and hence operator T has a fixed point from theorem 3.1. According to corollary 3.1, T is nonexpansive. Therefore, by letting S = Rm , we find from lemma 3.3 that {uk } converges to a fixed point of T for κ ∈ (0, 1). With this result  in mind, {xk } converges to a solution of problem (1.1) from theorem 3.1. Now, let us proceed with the convergence analysis of PDFP2 O using some novel technique. 9

Inverse Problems 29 (2013) 025011

P Chen et al

Theorem 3.5. Suppose 0 < γ < 2β and 0 < λ  1/λmax (BBT ). Let uk = (vk , xk ) be the sequence generated by PDFP2 O. Then the sequence {uk } converges to a fixed point of T, and the sequence {xk } converges to a solution of problem (1.1). Proof. Let u∗ = (v ∗ , x∗ ) ∈ Rm × Rn be a fixed point of T . Using theorem 3.3, we have uk+1 − u∗ 2λ  uk − u∗ 2λ − γ (2β − γ )∇ f2 (xk ) − ∇ f2 (x∗ )22 −λBT (vk − v ∗ )22 − λvk+1 − vk 2M .

(3.19)

Summing (3.19) over k from 0 to +∞ gives +∞

γ (2β − γ )∇ f2 (xk ) − ∇ f2 (x∗ )22 + λBT (vk − v ∗ )22 + λvk+1 − vk 2M  u0 − u∗ 2λ .

k=0

So lim



k→+∞

γ (2β − γ )∇ f2 (xk ) − ∇ f2 (x∗ )22 + λBT (vk − v ∗ )22 + λvk+1 − vk 2M = 0,

which together with 0 < γ < 2β implies lim ∇ f2 (xk ) − ∇ f2 (x∗ )2 = 0,

(3.20)

lim BT (vk − v ∗ )2 = 0,

(3.21)

lim vk+1 − vk M = 0.

(3.22)

k→+∞ k→+∞ k→+∞

By the definitions in (3.10) and (3.11), vk+1 − vk 22 = vk+1 − vk 2M + λBT (vk+1 − vk )22 , which when combined with (3.21) and (3.22) gives lim vk+1 − vk 2 = 0.

k→+∞

(3.23)

On the other hand, from (3.7) we have −γ ∇ f2 (x∗ ) − λBT v ∗ = 0, and from (2.2b) xk+1 − xk = −γ ∇ f2 (xk ) − λBT vk+1 . Hence, xk+1 − xk = −γ (∇ f2 (xk ) − ∇ f2 (x∗ )) − λ(BT vk+1 − BT v ∗ ). Now, using (3.20) and (3.21) we immediately obtain lim xk+1 − xk 2 = 0.

k→+∞

(3.24)

By the definition in (3.12) and (3.23)–(3.24), lim uk+1 − uk λ = 0.

k→+∞

(3.25)

From (3.19), we know that the sequence {uk − u∗ λ } is non-increasing, so the sequence {uk } is bounded and there exists a convergent subsequence {uk j } such that lim uk j − u∗ λ = 0

j→+∞

for some u∗ ∈ Rm × Rn . Next, let us show that u∗ is a fixed point of T . In fact, T (uk j ) − u∗ λ = (uk j +1 − uk j ) − (uk j − u∗ )λ  uk j +1 − uk j λ + uk j − u∗ λ 10

(3.26)

Inverse Problems 29 (2013) 025011

P Chen et al

which, in conjunction with (3.25) and (3.26), leads to lim T (uk j ) − u∗ λ = 0.

(3.27)

j→+∞

The operator T is continuous since it is nonexpansive, so it follows from (3.26) and (3.27) that u∗ is a fixed point of T . Moreover, we know that {uk − u∗ λ } is non-increasing for any fixed point u∗ of T . In particular, by choosing u∗ = u∗ , we see that {uk − u∗ λ } is non-increasing. Combining this and (3.26) yields lim uk = u∗ .

k→+∞ ∗ ∗

Writing u∗ = (v , x ) with v ∗ ∈ Rm , x∗ ∈ Rn , we find from theorem 3.1 that x∗ is the solution of problem (1.1).  Note that if f2 (x) = 12 x − b22 and γ = 1, then PDFP2 O reduces to FP2 O for κ = 0. As a consequence of the above theorem, we can achieve the convergence of FP2 O for κ = 0 even when BBT is singular, for which no convergence is available from theorem 3.12 of [24]. Corollary 3.2. Suppose 0 < λ  1/λmax (BBT ). Let {vk } be the sequence generated by FP2 O for κ = 0. Set xk = b − λBT vk . Then the sequence {vk } converges to the fixed point of H(see (2.1)), the sequence {xk } converges to the solution of problem (1.1) with f2 (x) = 12 x − b22 . 3.2. Linear convergence rate for special cases In this section, we will give some stronger theoretical results about the convergence rate in some special cases. For this, we present the following condition. Condition 3.1. For any two real numbers λ and γ satisfying that 0 < γ < 2β and 0 < λ  1/λmax (BBT ), there exist η1 , η2 ∈ [0, 1) such that I − λBBT 2  η12 and g(x) − g(y)2  η2 x − y2

for all x, y ∈ Rn ,

where g(x) is given in (3.9). Remark 3.2. If B has full row rank, f2 is strongly convex, i.e. there exists some σ > 0 such that

∇ f2 (x) − ∇ f2 (y), x − y  σ x − y22

for all x, y ∈ Rn ,

(3.28)

then this condition can be satisfied. In fact, when B has a full row rank, we can choose η12 = 1 − λλmin (BBT ), where λmin (BBT ) denotes the smallest eigenvalue of BBT . In this case, η12 takes its minimum  2 λmin (BBT ) η1 min = 1 − λmax (BBT ) at λ = 1/λmax (BBT ). On the other hand, since f2 has 1/β-Lipschitz continuous gradient and is strongly convex, it follows from (3.2) and (3.28) that g(x) − g(y)22 = x − y22 − 2γ ∇ f2 (x) − ∇ f2 (y), x − y + γ 2 ∇ f2 (x) − ∇ f2 (y)22 γ (2β − γ )  x − y22 −

∇ f2 (x) − ∇ f2 (y), x − y β   σ γ (2β − γ )  1− x − y22 . β 11

Inverse Problems 29 (2013) 025011

P Chen et al

Hence we can choose η22 = 1 −

σ γ (2β − γ ) . β

In particular, if we choose γ = β, then η2 takes its minimum in the present form: (η22 )min = 1 − σ β. As a typical example, consider f2 (x) = 12 Ax − b22 with AT A full rank. Then we can find that β = 1/λmax (AT A) and σ = λmin (AT A), and hence  2 λmin (AT A) η2 min = 1 − . λmax (AT A) Despite most of our interesting problems not belonging to these special cases, and there will be more efficient algorithms if condition 3.1 is satisfied, the following results still have some theoretical values where the best performance of PDFP2 Oκ can be achieved. First of all, we show that Tκ is contractive under condition 3.1. Theorem 3.6. Suppose condition 3.1 holds true. Let the operator T be given in (3.5) and Tκ = κI + (1 − κ )T for κ ∈ [0, 1). Then Tκ is contractive under the norm  · λ . Proof. Let η = max{η1 , η2 }. It is clear that 0  η  1. Then, owing to the condition 3.1, for all u1 = (v1 , x1 ), u2 = (v2 , x2 ) ∈ Rm × Rn , there holds g(x1 ) − g(x2 )2  ηx1 − x2 2 , v1 − v2 M  ηv1 − v2 2 , from which, (3.12) and (3.16) it follows that T (u1 ) − T (u2 )2λ  g(x1 )−g(x2 )22 +λv1 −v2 2M − λ(T1 (u1 ) − T1 (u2 )) − (v1 − v2 )2M  η2 (x1 − x2 22 + λv1 − v2 22 ) = η2 u1 − u2 2λ . On the other hand, it is easy to check from the last estimate and the triangle inequality that Tκ (u1 ) − Tκ (u2 )λ  κu1 − u2 λ + (1 − κ )T (u1 ) − T (u2 )λ  θ u1 − u2 λ , with θ = κ + (1 − κ )η ∈ [0, 1). So, operator Tκ is contractive.



Now, we are ready to analyze the convergence rate of PDFP2 Oκ . Theorem 3.7. Suppose condition 3.1 holds true. Let the operator T be given in (3.5) and Tκ = κI + (1 − κ )T for κ ∈ [0, 1). Let uk = (vk , xk ) be a Picard sequence of the operator Tκ (or equivalently, a sequence obtained by algorithm PDFP2 Oκ ). Then the sequence {uk } must converge to the unique fixed point u∗ = (v ∗ , x∗ ) ∈ Rm × Rn of T , with x∗ being the unique solution of problem (1.1). Furthermore, there holds the estimate cθ k , (3.29) 1−θ where c = u1 − u0 λ , θ = κ + (1 − κ )η ∈ [0, 1) and η = max{η1 , η2 }, with η1 and η2 given in condition 3.1. xk − x∗ 2 

12

Inverse Problems 29 (2013) 025011

P Chen et al

Proof. Since the operator Tκ is contractive, by the Banach contraction mapping theorem, it has a unique fixed point, denoted by u∗ = (v ∗ , x∗ ). It is obvious that Tκ has the same fixed points as T , so x∗ is the unique solution of problem (1.1) from theorem 3.1. Moreover, it is routine that the sequence {uk } converges to u∗ . On the other hand, it follows from theorem 3.6 that uk+1 − uk λ  θ uk − uk−1 λ  · · ·  θ k u1 − u0 λ = cθ k . So for all 0 < l ∈ N, uk+l − uk λ 

l

uk+i − uk+i−1 λ = cθ k

i=1

l

θ i−1 

i=1

cθ k , 1−θ

which immediately implies xk − x∗ 2  uk − u∗ λ 

cθ k 1−θ

by letting l → +∞. The desired estimate (3.29) is then obtained.



If B = I, λ = 1, then form (2.2) is equivalent to form (1.4), so as a special case of theorem 3.7, we can obtain the convergence rate for PFBS. Corollary 3.3. Suppose 0 < γ < 2β and there exists η ∈ [0, 1) such that g(x) − g(y)2  ηx − y2

for all x, y ∈ Rn .

Let {xk } be a sequence generated by PFBS and x∗ be the solution of problem (1.3) for X = Rn . Set c = x1 − x0 2 . Then xk − x∗ 2 

cηk . 1−η

As a conclusion of theorem 3.7, we can also obtain the convergence rate of FP2 O for κ = 0 under the assumption I − λBBT  < 1. Corollary 3.4. Suppose 0 < λ  1/λmax (BBT ), the matrix B has full row rank and η1 is given by condition 3.1. Let v ∗ be the fixed point of H(cf (2.1)). Let {vk } be a sequence generated by FP2 O for κ = 0, with xk = b − λBT vk . Set

c = λBT (v1 − v0 )22 + λv1 − v0 22 . Then cη1k vk − v ∗ 2  √ , λ(1 − η1 )

xk − x∗ 2 

cη1k . 1 − η1

4. Connections to other algorithms We will further investigate the proposed algorithm PDFP2 O from the perspective of primal– dual forms and establish the connections to other existing methods. 13

Inverse Problems 29 (2013) 025011

P Chen et al

4.1. Primal–dual and proximal point algorithms For problem (1.1), we can write its primal–dual form using the Fenchel duality [29] as min maxL(x, v) := f2 (x) + Bx, v − f1∗ (v), v

x

(4.1)

where f1∗ is the convex conjugate function of f1 defined by f1∗ (v) = sup v, w − f1 (w). w∈Rm

By introducing a new intermediate variable yk+1 , equations (2.2) are reformulated as ⎧ yk+1 = xk − γ ∇ f2 (xk ) − λBT vk , (4.2a) ⎪ ⎨ (4.2b) vk+1 = (I − prox γλ f1 )(Byk+1 + vk ), ⎪ ⎩xk+1 = xk − γ ∇ f2 (xk ) − λBT vk+1 . (4.2c) According to Moreau decomposition (see equation (2.21) in [12]), for all v ∈ Rm , we have   λ γ v , v = v ⊕γ + v γ , where v ⊕γ = prox γλ f1 v, v γ = prox λ f1∗ γ λ λ λ λ λ γ from which we know     λ γ λ I − prox γλ f1 (Byk+1 + vk ) = prox λ f1∗ Byk+1 + vk . γ λ γ γ λ Let v k = γ vk . Then (4.2) can be reformulated as ⎧ y = xk − γ ∇ f2 (xk ) − γ BT v k , (4.3a) ⎪ ⎪   ⎨ k+1 λ (4.3b) Byk+1 + v k , v k+1 = prox λ f1∗ ⎪ γ γ ⎪ ⎩ T xk+1 = xk − γ ∇ f2 (xk ) − γ B v k+1 . (4.3c) In terms of the saddle point formulation (4.1), we have by a direct manipulation that ∇ f2 (xk ) + BT v k = ∇x L(xk , v k )   λ γ Byk+1 + v k = arg min − L(yk+1 , v) + v − v k 22 , prox λ f1∗ γ γ 2λ v∈Rm ∇ f2 (xk ) + BT v k+1 = ∇x L(xk , v k+1 ). Hence, (4.3) can be expressed as ⎧ yk+1 = xk − γ ∇x L(xk , v k ), ⎪ ⎪ ⎨ γ v − v k 22 , v k+1 = arg min − L(yk+1 , v) + m 2λ v∈R ⎪ ⎪ ⎩ xk+1 = xk − γ ∇x L(xk , v k+1 ).

(4.4a) (4.4b) (4.4c)

From (4.3a) and (4.3c), we can find out that yk+1 = xk+1 + γ BT (v k+1 − v k ). Then equation (4.4b) becomes v k+1 = arg min − L(xk+1 , v) + v∈Rm

γ v − v k 2M , 2λ

where M = 1 − λBBT . Together with (4.4c), the iterations (4.4) are ⎧ γ ⎨v v − v k 2M , k+1 = arg maxL(xk+1 , v) − 2λ v∈Rm ⎩x = x − γ ∇ L(x , v ). k+1

14

k

x

k

k+1

(4.5a) (4.5b)

Inverse Problems 29 (2013) 025011

P Chen et al

Table 1. Comparison between CP (θ = 1) and PDFP2 O.

CP (θ = 1) Form

Convergence Relation

PDFP2 O

v k+1 = (I + σ ∂ f1∗ )−1 (v k + σ Byk ) v k+1 = (I + γλ ∂ f1∗ )−1 (v k + γλ Byk ) xk+1 = xk − γ ∇ f2 (xk ) − γ BT v k+1 xk+1 = (I + τ ∇ f2 )−1 (xk − τ BT v k+1 ) yk+1 = 2xk+1 − xk yk+1 = xk+1 − γ ∇ f2 (xk+1 ) − γ BT v k+1 0 < σ τ < 1/λmax (BBT ) 0 < γ < 2β, 0 < λ  1/λmax (BBT ) σ = λ/γ , τ = γ

Thus the proposed algorithm can be interpreted as an inexact Uzawa method [3] applied on the dual formulation. Compared to the classical Uzawa method, (4.5) is more implicit since the update of v k+1 involves xk+1 and a proximal point iteration matrix M is used. This leads to a close connection with a class of primal–dual method studied in [35, 16, 9, 19]. For example, in [9], Chambolle and Pock proposed the following scheme for solving (4.1): ⎧ v k+1 = (I + σ ∂ f1∗ )−1 (v k + σ Byk ), (4.6a) ⎪ ⎪ ⎨ −1 T (4.6b) (CP) xk+1 = (I + τ ∇ f2 ) (xk − τ B v k+1 ), ⎪ ⎪ ⎩y = x + θ (x − x ), (4.6c) k+1

k+1

k+1

k

where σ, τ > 0, θ ∈ [0, 1] is a parameter. For θ = 0, we can obtain the classical Arrow– Hurwicz–Uzawa (AHU) method in [3]. The convergence of AHU with very small step length is shown in [16]. Under some assumptions on f1∗ or strong convexity of f2 , global convergence of the primal–dual gap can also be shown with specific chosen adaptive steplength [9]. Note that in the case of the ROF model, Chan and Zhu proposed in [36] a clever adaptive step lengths σ and τ for acceleration, and recently the convergence was shown in [6]. According to equation (4.3), using the relation prox λ f1∗ = (I + γλ ∂ f1∗ )−1 , and changing γ the order of these equations, we know that PDFP2 O is equivalent to     ⎧ λ ∗ −1 λ (4.7a) v = I + + v ∂ f By k+1 k k , ⎪ ⎪ γ 1 γ ⎨ xk+1 = xk − γ ∇ f2 (xk ) − γ BT v k+1 , (4.7b) ⎪ ⎪ ⎩ yk+1 = xk+1 − γ ∇ f2 (xk+1 ) − γ BT v k+1 . (4.7c) Let σ = λ/γ , τ = γ , then we can see that equations (4.6b) and (4.6c) are approximated by two explicit steps (4.7b)–(4.7c). In summary, we list the comparisons of CP for θ = 1 with the fixed step length and PDFP2 O in table 1. For f2 (x) = 12 Ax − b22 , (4.4) can be further expressed as ⎧ 1 ⎪ yk+1 = arg minL(x, v k ) + x − xk 2(I−γ AT A) , (4.8a) ⎪ ⎪ n ⎪ 2γ x∈R ⎨ γ v − v k 22 , v k+1 = arg min − L(yk+1 , v) + (4.8b) 2λ v∈Rm ⎪ 1 ⎪ ⎪ ⎪ x − xk 2(I−γ AT A) . L(x, v k+1 ) + (4.8c) ⎩ xk+1 = arg min 2γ x∈Rn Note that by introducing the proximal iteration norm through the matrix I − γ AT A ∈ Mn for 0 < γ < β with β = 1/λmax (AT A), (4.8a) and (4.8c) become explicit. This is particularly useful when the inverse of AT A is not easy to obtain in most of the imaging applications, such as super-resolution, tomographic reconstruction and parallel MRI [11]. Meanwhile, it is worthwhile pointing out that the condition on γ by this formulation is stricter than theorem 3.5, where γ is required as 0 < γ < 2β for the convergence. Furthermore, if we 15

Inverse Problems 29 (2013) 025011

T γ   ∗ k )−Bxk  T ) and P = λ (I−λBB denote uˆk = v k T , xk T and F (uˆk ) = B∂Tfv1k(v+∇ 0 f2 (xk ) we can also easily write the algorithm in the PPA framework [19] as

P Chen et al 1 γ

0 (I−γ AT A)

 , then

0 ∈ F (uˆk+1 ) + P(uˆk+1 − uˆk ).

(4.9)

We note that in [19], the Chambolle–Pock algorithm (4.6) for θ = 1 was also rewritten in the PPA structure as (4.9) with the same F, while 1  I B P = σT 1 . B I τ In [19, 9], a more general class of algorithms taking this form are studied. In particular, an extra extrapolation step can be applied to the algorithm (4.9) for acceleration. 4.2. Splitting type of methods There are other types of methods which are designed to solve problem (1.1) based on the notion of an augmented Lagrangian. For simplicity, we only list these algorithms for f2 (x) = 12 Ax − b22 . Among them, the alternating split Bregman (ASB) method proposed by Goldstein and Osher [18] is very popular for imaging applications. This method has been proved to be equivalent to the Douglas–Rachford method and the alternating direction of multiplier method (ADMM). In [34, 35], based on PFBS and Bregman iteration, a split inexact Uzawa (SIU) method is proposed to maximally decouple the iterations, so that each iteration is explicit. Further analysis and connections to primal–dual methods algorithm are given in [16, 35]. In particular, it is shown that the primal–dual algorithm scheme (4.6) with θ = 1 can be interpreted as SIU. In the following, we study the connections and differences between these two methods. ASB can be described as follows: ⎧ = (AT A + νBT B)−1 (AT b + νBT (dk − vk )), (4.10a) x ⎪ ⎨ k+1 (4.10b) (ASB) dk+1 = prox 1 f1 (Bxk+1 + vk ), ν ⎪ ⎩ vk+1 = vk − (dk+1 − Bxk+1 ), (4.10c) where ν > 0 is a parameter. The explicit SIU method proposed in the literature [35] can be described as ⎧ xk+1 = xk − δAT (Axk − b) − δνBT (Bxk − dk + vk ), (4.11a) ⎪ ⎨ (4.11b) (SIU) dk+1 = prox 1 f1 (Bxk+1 + vk ), ν ⎪ ⎩ vk+1 = vk − (dk+1 − Bxk+1 ), (4.11c) where δ > 0 is a parameter. We can easily see that we approximate the implicit step (4.10a) in ASB by an explicit step (4.11a) in SIU. From (4.2a) and (4.2c), we can find out a relation between yk and xk , given by xk = yk − λBT (vk − vk−1 ). Then eliminating xk , PDFP2 O can be expressed as y T T k+1 = yk − λB (2vk − vk−1 ) − γ ∇ f 2 (yk − λB (vk − vk−1 )), vk+1 = (I − prox γλ f1 )(Byk+1 + vk ).

(4.12a) (4.12b)

By introducing the splitting variable dk+1 in (4.12b), (4.12) can be further expressed as 16

Inverse Problems 29 (2013) 025011

P Chen et al

Table 2. The comparisons among ASB, SIU and PDFP2 O.

PDFP2 O

ASB

SIU

xk+1 = (AT A + νBT B)−1 (AT b + νBT (dk − vk ))

xk+1 = xk − δAT (Axk − b) −δνBT (Bxk − dk + vk )

xk+1 = xk − δAT (Axk − b) −δνBT (Bxk − dk + vk ) −δ 2 νAT ABT (dk − Bxk ) dk+1 = prox 1 f1 (Bxk+1 + vk ) dk+1 = prox 1 f1 (Bxk+1 + vk ) dk+1 = prox 1 f1 (Bxk+1 + vk ) ν ν ν vk+1 = vk − (dk+1 − Bxk+1 ) vk+1 = vk − (dk+1 − Bxk+1 ) vk+1 = vk − (dk+1 − Bxk+1 ) Convergence ν > 0 ν>0 0 < δ < 2/λmax (AT A) T T 0 < δ  1/λmax (A A + νB B) 0 < δν  1/λmax (BBT ) Form

⎧ ⎨yk+1 = yk − λBT (Byk − dk + vk ) − γ ∇ f2 (yk − λBT (Byk − dk )), dk+1 = prox γλ f1 (Byk+1 + vk ), ⎩ vk+1 = vk − (dk+1 − Byk+1 ).

(4.13)

For f2 (x) = 12 Ax − b22 , ∇ f2 (x) = AT (Ax − b). By changing the order and letting γ = δ, λ = δν, (4.13) becomes ⎧ yk+1 = yk − δAT (Ayk − b) − δνBT (Byk − dk + vk ) − δ 2 νAT ABT (dk − Byk ) (4.14a) ⎪ ⎨ (4.14b) dk+1 = (prox 1 f1 )(Byk+1 + vk ), ν ⎪ ⎩ (4.14c) vk+1 = vk − (dk+1 − Byk+1 ). We can easily see that equation (4.10a) in ASB is approximated by (4.14a). Although it seems that PDFP2 O requires more computation in (4.14a) than SIU in (4.11a), PDFP2 O has the same computation cost as that of SIU if the iterations are implemented cleverly. For the reason of comparison, we can change the variable yk to xk in (4.14). Table 2 gives the summarized comparisons among ASB, SIU and PDFP2 O. We note that the only difference of SIU and PDFP2 O is in the first step. As two algorithms converge, the algorithm PDFP2 O behaves asymptotically the same as SIU since dk − Bxk converges to 0. The parameters δ and ν satisfy respectively different conditions to ensure the convergence. 5. Numerical experiments In this section, we illustrate the numerical performance of PDFP2 Oκ for κ ∈ [0, 1) through three applications: image super-resolution, computerized tomography (CT) reconstruction and pMRI. Both the first two applications can be described as problem (1.2), where A is a linear operator representing the subsampling and tomographic projection operator respectively. In  pMRI, 1 Ax − b22 is replaced by 1 Nj=1 A j x − b j 22 , and a detailed description will be 2 2 given in section 5.3. Here, we use the total variation as the regularization functional, where the operator B : Rn → R2n is a discrete gradient operator. Furthermore, the isotropic definition is adopted, i.e. f1 (w) = μw1,2 , for all w = (w1 , . . . , wn , wn+1 , . . . , w2n )T ∈ R2n , n

2 . wi2 + wn+i w1,2 = i=1

Let wi = (wi , wn+i )T , wi 2 = expressed as



2 and  = wi2 + wn+i

(prox·1,2 (w))i,n+i = max{wi 2 − , 0}

μγ λ . Then prox·1,2 (w) can be

wi , wi 2

i = 1, . . . , n.

17

Inverse Problems 29 (2013) 025011

P Chen et al

For the implementation of PDFP2 O, we use the scheme presented in algorithm 4, where we compute directly (I − prox γλ f1 )(w). In fact, we can deduce that (I − prox γλ f1 )(w) = Proj (w), where Proj is the projection operator from R2n to 2,∞ ball of radius , i.e. wi , i = 1, . . . , n. (Proj (w))i,n+i = min{wi 2 , } wi 2 In the numerical experiments, we compare our proposed algorithm PDFP2 O with the three methods: ASB (cf (4.10)), CP (cf (4.6)) and SIU (cf (4.11)). Both ASB and CP involve linear system inversion (AT A + νBT B)−1 and (I + τ AT A)−1 respectively. In the experiments, we use the conjugate gradient (CG) method for quadratic subproblems. The maximal number of CG iterations is denoted as NI and the stopping criteria are set as the residual error is less than 10−10 . To numerically measure the convergence speed of various methods, we compute the relative error between the energy at kth outer iteration and the optimal value E. In practice, we run each method 5000 (outer iteration) steps with a large range of parameters and set the minimum of all the tests as the optimal minimum energy E. We denote εk = (Ek − E )/E.

(5.1)

In the following, we will use (5.1) as a criterion to compare the performance among ASB, CP, SIU and PDFP2 O. To guarantee the quality of recovered images, we also use the criterion peak signal-to-noise ratio (PSNR)   Im − Im 2F 2552 , with MSE = PSNR = 10 log10 , MSE s1 s2 Im denotes the recovered images obtained from where Im denotes the original image of s1 × s2 ,  various algorithms and ·F denotes the Frobenius norm. All the experiments are implemented under MATLAB7.11(R2010b) and conducted on a computer with Intel(R) core(TM) i5 CPU 750@ 2.67G. 5.1. Image super-resolution In the numerical simulation, the subsampling operator A is implemented by taking the average of every d × d pixels and sampling the average, if a zoom-in ratio d is desired. The experiment is performed on the test image ‘lena’ of size 512 × 512 and the subsampling ratio is d = 4. White Gaussian noise of mean 0 and variance 1 is added to the observed low-resolution image of 128 × 128. The regularization parameter μ is set as 0.1 for the best image quality. First we show the impacts of the parameters κ, γ and λ for the proposed algorithm in figure 1. The conditions for theoretical convergence are 0 < γ < 2β, 0 < λ  1/λmax (BBT ) and κ ∈ [0, 1) (see theorems 3.4 and 3.5). The constant β is given by 1/λmax (AT A), and the maximal eigenvalue of AT A is 1/16, so 0 < γ < 32. It is well known in total variation application that λmax (BBT ) = 8 for B being the usual gradient operator (see [16]), and then 0 < λ  1/8. Figures 1(a) and (b) show that for most cases κ = 0 achieves the fastest convergence compared to other κ ∈ (0, 1). Thus we choose κ = 0 for the following comparison. In figures 1(c) and (d), the parameter λ has relatively smaller impact on the performance of this algorithm. We compare the results for λ = 1/5, 1/6, 1/8, 1/16, 1/32. When λ = 1/6 > 1/8, the algorithm is convergent. While for λ = 1/5, the algorithm does not appear to converge, which shows that we cannot extend the range of λ to (0, 2/λmax (BBT )) generally, as given in [24] for denoising case (see algorithm 1). Hence, we only consider 0 < λ  1/λmax (BBT ) as indicated in theorem 3.5, for which the upper bound λ = 1/8 achieves the best performance. The parameter γ has relatively larger impact for the algorithm. We test γ = 8, 16, 24, 30, 32 for κ = 0, λ = 1/8. We observe that numerically larger γ leads to a faster convergence. For this reason, we can choose γ close to 2β. 18

Inverse Problems 29 (2013) 025011

P Chen et al

29.5

5.1 κ=0.5 κ=0.1 κ=0.01 κ=0.001 κ=0

29

29.5 κ=0.5 κ=0.1 κ=0.01 κ=0.001 κ=0

5.09

λ=1/5 λ=1/6 λ=1/8 λ=1/16 λ=1/32

29

5.08 28.5 PSNR

PSNR

log10(energy)

28.5

5.07

28

28 5.06

27.5

27.5 5.05

27

0

50

100

150

200

250 Iteration

300

350

400

450

5.04

500

0

50

100

150

200

(a)

250 Iteration

300

350

400

450

27

500

0

50

100

150

200

(b)

5.5

29.5 λ=1/5 λ=1/6 λ=1/8 λ=1/16 λ=1/32

5.45

5.4

250 Iteration

300

350

400

450

500

(c) 5.1 γ=8 γ=16 γ=24 γ=30 γ=32

29

γ=8 γ=16 γ=24 γ=30 γ=32

5.09

5.35

log10(energy)

28.5 PSNR

log10(energy)

5.08 5.3

5.25

5.2

5.07

28 5.06

5.15

5.1

27.5 5.05

5.05

5

0

50

100

150

200

250 Iteration

(d)

300

350

400

450

500

27

0

50

100

150

200

250 Iteration

(e)

300

350

400

450

500

5.04

0

50

100

150

200

250 Iteration

300

350

400

450

500

(f)

Figure 1. PSNR and energy versus iterations with different parameters. (a) and (b) are PSNR and energy versus iterations for κ = 0.5, 0.1, 0.01, 0.001, 0(λ = 1/8, γ = 30). (c) and (d) are PSNR and energy versus iterations for λ = 1/5, 1/6, 1/8, 1/16, 1/32 (κ = 0, γ = 30). (e) and (f) are PSNR and energy versus iterations for γ = 8, 16, 24, 30, 32 (κ = 0, λ = 1/8).

As mentioned above, the optimal value E of the optimization model (1.2) for this example is obtained by taking the minimum of a large range of parameter setting on each method with 5000 iterations. The performances for each method with different parameter sets are listed in tables 3–6 for ε = 10−i , i = 1, . . . , 6. For a given ε, the first column gives the least (outer) iteration number k such that εk < ε, and the second column in the bracket gives the corresponding running time in second. The ‘−’ entries indicate that the algorithm fails to drop the error below ε within a maximum number of 5000 iterations. Table 3 shows that the number of inner iteration steps NI in CG affects the speed of ASB. We highlight the best performance for each given tolerance ε. The parameter ν also plays an important role in the performance of ASB. For this example, ν = 0.01 generally gives the smallest number of iterations and the least computation time for different tolerance levels ε. For the CP algorithm, we run the tests with different σ , θ = 1 and τ = 8σ1 according to the convergence condition in table 1. The second quadratic subproblem is solved with CG. After a simple analysis, we observe that (I + τ AT A) has only two eigenvalues (1 + τ /16) and 1; thus, this subproblem can be solved within two CG steps theoretically. Therefore we only list the results with NI = 1, 2 in table 4 for comparison, and we can see that NI = 1 has even better performance in terms of total iteration steps and computation time. For this example, σ has some impact on the performance of CP, and σ = 0.005 gives the best performance for all the tolerance levels. Similar results for the SIU algorithm is given in table 5, and a larger δ yields a better performance on respecting the convergence condition with an accordingly ‘best’ ν. For this example, δ = 24 and ν = 0.06 give the best performance among the tested parameter sets. We also test various γ and λ and list the results for PDFP2 O in terms of computation time and relative error to the optimal minimum in table 6. As we observed previously, γ and λ being close to the upper bound 2β and 1/λmax (BBT ) gives a nearly optimal convergence 19

Inverse Problems 29 (2013) 025011

P Chen et al

Table 3. Performance evaluation for different choices of ν and NI in ASB for image superresolution.

NI

ν

1

0.001 0.005 0.01 0.05 0.1 2 0.001 0.005 0.01 0.05 0.1 5 0.001 0.005 0.01 0.05 0.1 10 0.001 0.005 0.01 0.05 0.1

ε = 10−1 41 8 6 9 16 24 8 4 4 7 13 5 4 4 7 13 5 4 4 7

(2.39) (0.49) (0.36) (0.52) (0.94) (1.79) (0.59) (0.32) (0.30) (0.53) (1.60) (0.61) (0.50) (0.49) (0.86) (2.69) (1.03) (0.84) (0.83) (1.42)

ε = 10−2 165 45 41 103 205 101 31 21 45 89 93 24 19 45 88 93 24 19 45 88

ε = 10−3

ε = 10−4

ε = 10−5

ε = 10−6

(9.55) 486 (28.12) 1800 (104.53) – – (2.64) 144 (8.37) 469 (27.34) 1599 (93.29) – (2.39) 153 (8.94) 464 (27.05) 1572 (93.14) 3633 (7.12) 544 (32.81) 1776 (105.88) – – (13.21) 1062 (63.32) 3485 (208.60) – – (7.53) 459 (34.25) 1772 (132.31) – – (2.33) 110 (8.28) 397 (29.89) 1502 (112.96) – (1.58) 74 (5.55) 254 (19.14) 908 (69.84) 2930 (3.39) 203 (15.28) 801 (61.61) 2266 (173.80) – (6.70) 399 (30.02) 1590 (120.95) 4474 (343.83) – (11.60) 454 (56.57) 1770 (220.55) – – (2.99) 101 (12.60) 367 (45.79) 1446 (180.41) 4968 (2.37) 71 (8.91) 248 (32.75) 884 (114.19) 2866 (5.63) 181 (22.69) 769 (99.59) 2108 (271.02) – (11.01) 356 (46.29) 1524 (195.54) 4156 (533.58) – (19.27) 454 (94.18) 1770 (367.16) – – (4.96) 101 (20.87) 367 (78.11) 1446 (309.09) 4967 (3.99) 72 (15.04) 248 (53.50) 886 (189.75) 2871 (9.59) 182 (38.08) 770 (163.58) 2105 (449.95) – (18.21) 359 (76.37) 1530 (326.83) 4159 (887.33) –

(216.50)

(225.19)

(620.12) (369.36)

(1060.13) (613.37)

Table 4. Performance evaluation for different choices of σ and NI in CP for image super-resolution.

NI 1

2

σ

ε = 10−1

ε = 10−2

ε = 10−3

ε = 10−4

0.0005 110 (4.64) 335 (14.51) 953 (42.64) 3567 (157.65) 0.001 45 (1.89) 171 (7.19) 487 (20.56) 1803 (77.75) 0.005 6 (0.25) 46 (1.94) 150 (6.35) 481 (20.37) 0.01 3 (0.13) 57 (2.42) 217 (10.22) 650 (28.45) 0.05 6 (0.25) 262 (11.09) 1049 (45.82) 3110 (134.20) 0.0005 33 (1.61) 185 (9.11) 909 (44.93) 3537 (178.11) 0.001 22 (1.06) 99 (4.89) 458 (24.12) 1773 (90.94) 0.005 5 (0.24) 45 (2.23) 149 (7.37) 480 (23.72) 0.01 3 (0.14) 56 (2.77) 216 (10.69) 648 (31.95) 0.05 6 (0.28) 262 (12.90) 1048 (51.63) 3109 (156.60)

ε = 10−5 – – 1616 2084 – – – 1617 2080 –

ε = 10−6

– – (69.77) – (90.45) 4186 (180.86) – – – (81.60) – (104.27) 4181 (211.21) –

Table 5. Performance evaluation for different choices of δ and ν in SIU for image super-resolution. The impacts of different ν for δ = 8, 16, 30 are similar to δ = 24; thus, we only list the cases with different ν for δ = 24.

δ

ν

ε = 10−1

ε = 10−2

ε = 10−3

8 0.02 4 (0.11) 81 (2.17) 327 16 0.01 5 (0.15) 46 (1.26) 173 24 0.006 7 (0.20) 42 (1.14) 141 0.005 8 (0.23) 47 (1.27) 153 0.001 54 (1.45) 191 (5.14) 507 0.0005 129 (3.46) 371 (10.12) 980 30 0.0016 32 (0.86) 109 (2.94) 323

20

ε = 10−4

ε = 10−5

(8.71) 975 (26.53) 3106 (4.65) 519 (13.86) 1675 (3.78) 446 (12.45) 1472 (4.23) 490 (13.22) 1625 (13.60) 1817 (48.52) – (26.77) 3584 (96.88) – (8.65) 1147 (30.73) 4496

(83.28) (44.64) (39.77) (43.96)

ε = 10−6

– 3542 (94.95) 4342 (116.73) – – – (121.20) –

Inverse Problems 29 (2013) 025011

P Chen et al

Table 6. Performance evaluation for different choices of γ and λ in PDFP2 O for image superresolution. The impacts of λ for γ = 8, 16 are similar with γ = 24, 30.

γ

λ

8 1/8 16 1/8 24 1/6 1/8 1/16 1/32 30 1/6 1/8 1/16 1/32

ε = 10−1 3 3 3 5 15 41 5 6 17 46

ε = 10−2

ε = 10−3

ε = 10−4

(0.09) 81 (2.05) 328 (8.30) 977 (24.75) (0.09) 49 (1.24) 178 (4.50) 540 (13.64) (0.09) 38 (0.97) 134 (3.41) 417 (10.56) (0.14) 45 (1.15) 149 (3.80) 478 (12.14) (0.38) 76 (1.95) 224 (5.74) 768 (19.93) (1.03) 144 (3.64) 397 (10.03) 1413 (36.15) (0.14) 38 (0.97) 128 (3.25) 417 (10.59) (0.18) 46 (1.20) 150 (3.83) 508 (13.26) (0.43) 82 (2.06) 252 (6.38) 897 (22.90) (1.15) 160 (4.00) 470 (11.88) 1733 (44.13)

ε = 10−5 3115 1750 1373 1587 2811 – 1412 1788 3465 –

(79.08) (44.10) (34.68) (40.15) (71.70) (35.81) (45.60) (88.06)

ε = 10−6 – 3982 3826 4940 – – 4579 – – –

(100.71) (96.96) (125.14) (116.08)

Table 7. Performance comparison among ASB, CP, SIU and PDFP2 O for image super-resolution. For a given error tolerance ε, the first column in the bracket gives the first outer iteration number k such that εk < ε, the second column in the bracket gives the corresponding run time in second and the third column in the bracket gives the corresponding PSNR. For ASB, NI = 2, ν = 0.01. For CP, NI = 1, σ = 0.005. For SIU, δ = 24, ν = 0.006. For PDFP2 O, γ = 30, λ = 1/6.

ε = 10−2 ASB CP SIU PDFP2 O

(21, (46, (42, (38,

1.58, 1.94, 1.14, 0.97,

29.34) 28.97) 28.91) 28.98)

ε = 10−3 (74, (150, (141, (128,

5.55, 6.35, 3.78, 3.25,

29.38) 29.24) 29.22) 29.25)

ε = 10−4 (254, (481, (446, (417,

19.14, 20.37, 12.45, 10.59,

ε = 10−5 29.37) 29.32) 29.31) 29.32)

(908, (1616, (1472, (1412,

69.84, 69.77, 39.77, 35.81,

29.36) 29.35) 29.35) 29.35)

speed. Table 6 shows that γ = 24, λ = 1/8 has slightly better convergence speed, while γ = 30, λ = 1/8 can get slightly higher PSNR in the first steps, as shown in figure 1(e). Finally, we compare the four methods with their ‘optimal’ parameter sets (averagely best for all the tolerance levels) in table 7. We also compare their corresponding values of PSNR to measure the recovered image quality. From table 7, we can see that PDFP2 O is better than ASB and CP in terms of the computation time, especially for a higher accuracy level. The performance of the two explicit methods SIU and PDFP2 O is similar. However, the choice of parameters for PDFP2 O is relatively easier compared to SIU. Also, we point out that ASB can attain higher PSNR at the first few steps for some good choices of ν, which can be interesting in practice when a crude approximation is needed in a short time. Figure 2 shows the images recovered with the four methods for ε = 10−4 and the images look similar as expected. 5.2. CT reconstruction In a simplified parallel beam tomographic problem, an observed body slice is modeled as a two-dimensional function, and projections modeled by line integrals represent the total attenuation of a beam of x-rays when it traverses the object. The operator for this application can be represented by a discrete Radon transform, and the tomographic reconstruction problem is then to estimate a function from a finite number of measured line integrals (see [4]). The standard reconstruction algorithm in clinical applications is the so-called filtered back projection (FBP) algorithm. In the presence of noise, this problem becomes difficult since the inverse of Radon transform is unbounded and ill-posed. In the literature, the model (1.2) is often used for iterative reconstruction. Here, A is the Radon transform matrix and b is the 21


[Figure 2 shows six panels: Original; Zooming; ASB, PSNR = 29.37; CP, PSNR = 29.32; SIU, PSNR = 29.31; PDFP2O, PSNR = 29.32.]

Figure 2. Super-resolution results from a 128 × 128 image to a 512 × 512 image by ASB, CP, SIU and PDFP2O, corresponding to tolerance error ε = 10^-4 with noise level 1.


Figure 3. PSNR and energy versus iterations for different parameters in CT reconstruction. (a) and (b) are PSNR and energy versus iterations for κ = 0.5, 0.1, 0.01, 0.001, 0 (λ = 1/8, γ = 1.3). (c) and (d) are PSNR and energy versus iterations for λ = 1/5, 1/6, 1/8, 1/16, 1/32 (κ = 0, γ = 1.3). (e) and (f) are PSNR and energy versus iterations for γ = 0.4, 0.7, 1, 1.2, 1.3 (κ = 0, λ = 1/8).

measured projections. Generally, the size of A is huge and its inverse is not easy to compute directly. We note that total variation regularization has become a standard tool in tomographic reconstruction, and first-order methods have recently been applied for faster implementations: for example, SIU is applied in [34] for CT, and the Chambolle–Pock algorithm is applied in [1] for PET and cone beam CT reconstruction.

Here, we use the same example tested in [35]: 50 uniformly oriented projections are simulated for a 128 × 128 Shepp–Logan phantom image, and then white Gaussian noise of mean 0 and variance 1 is added to the data. For this example, we compute numerically λmax(AA^T) = 1.5086, so we can set 0 < γ < 2/λmax(AA^T) = 1.3257.

As in the previous example, we first test the impact of the parameters κ, γ and λ. The impact of κ shows the same behavior as in the super-resolution example, i.e. κ = 0 is the best choice for κ ∈ [0, 1) (see figures 3(a) and (b)). Similarly, the parameter λ has a relatively small impact on the performance of the algorithm (see figures 3(c) and (d)); the algorithm still appears to converge with λ = 1/5, but it cannot achieve high accuracy (table 11). As in the previous example, the parameter γ has a larger impact on the convergence rate of the algorithm (see figures 3(e) and (f)). Theoretically, it should satisfy 0 < γ < 1.3257. Numerically, we test γ = 0.4, 0.7, 1, 1.2, 1.3 for κ = 0, λ = 1/8; better performance is observed with a larger γ (see figures 3(e) and (f)), while for γ = 1.4 the algorithm diverges.

As in the previous application, we show the performance of ASB, CP, SIU and PDFP2O with different parameter sets in tables 8, 9, 10 and 11, respectively. For the algorithm ASB, table 8 shows that NI = 5 is the best choice for ε = 10^-1, 10^-2, while NI = 2 is the best one for ε = 10^-i, i = 3, 4, 5, 6; we choose NI = 2 for a good average performance. Similarly, ν = 0.01 is the best for ε = 10^-i, i = 1, 2, 3, 4 and ν = 0.05 is the best for ε = 10^-5, 10^-6, so we use these two parameter sets for comparison (see table 12).
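The only spectral information the step-size rule needs is a rough estimate of λmax(AA^T), which can be obtained matrix-free by power iteration. The sketch below is illustrative; the handles A and At stand for the discrete Radon transform and its adjoint and are assumptions of this sketch:

    import numpy as np

    def lambda_max_AAt(A, At, data_shape, n_iter=50, seed=0):
        """Estimate lambda_max(A A^T) by power iteration (matrix-free)."""
        rng = np.random.default_rng(seed)
        y = rng.standard_normal(data_shape)
        for _ in range(n_iter):
            y = A(At(y))                          # one application of A A^T
            y /= np.linalg.norm(y)                # renormalize each step
        return float(np.vdot(y, A(At(y))).real)   # Rayleigh quotient

    # The step size is then chosen just below the bound, e.g.
    # gamma = 0.98 * 2.0 / lambda_max_AAt(A, At, b.shape)

A few dozen iterations typically suffice here, since only the order of magnitude of the bound matters for choosing γ.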


Table 8. Performance evaluation for different choices of ν and NI in ASB for CT reconstruction: iteration count (run time in seconds) needed to reach each tolerance ε.

NI   ν       ε = 10^-1     ε = 10^-2      ε = 10^-3      ε = 10^-4        ε = 10^-5        ε = 10^-6
1    0.001   195 (3.24)    551 (8.43)     995 (14.92)    2380 (35.15)     –                –
1    0.005   147 (2.11)    362 (5.20)     662 (11.13)    1070 (17.24)     2310 (35.08)     –
1    0.01    154 (2.22)    366 (5.27)     657 (9.46)     1002 (14.43)     1564 (24.64)     4893 (72.57)
1    0.05    145 (2.13)    360 (5.24)     646 (9.35)     960 (13.86)      1304 (18.81)     1794 (25.86)
1    0.1     148 (2.12)    367 (5.28)     654 (9.40)     967 (13.92)      1298 (18.66)     1692 (24.35)
2    0.001   76 (1.58)     206 (4.32)     422 (8.87)     2085 (43.81)     –                –
2    0.005   32 (0.66)     73 (1.52)      149 (4.13)     478 (11.83)      2083 (45.54)     –
2    0.01    28 (0.60)     67 (1.42)      129 (2.72)     279 (5.87)       1068 (24.49)     4670 (102.17)
2    0.05    47 (0.99)     124 (2.59)     229 (4.79)     345 (7.24)       492 (10.33)      1011 (21.23)
2    0.1     68 (1.42)     208 (4.35)     421 (8.82)     653 (13.69)      899 (18.86)      1204 (25.28)
5    0.001   18 (0.74)     59 (2.40)      355 (14.56)    1985 (83.22)     –                –
5    0.005   9 (0.38)      21 (0.86)      77 (3.16)      407 (16.63)      1897 (79.30)     –
5    0.01    9 (0.36)      26 (1.06)      63 (3.33)      219 (11.03)      977 (41.97)      4287 (181.05)
5    0.05    34 (1.38)     115 (4.68)     232 (9.44)     356 (16.26)      508 (22.99)      982 (42.31)
5    0.1     66 (2.68)     227 (9.25)     457 (18.62)    698 (30.16)      952 (40.51)      1261 (53.13)
10   0.001   10 (0.75)     44 (3.25)      352 (27.50)    1974 (149.35)    –                –
10   0.005   7 (0.51)      19 (1.40)      76 (5.59)      401 (29.54)      1872 (142.26)    –
10   0.01    9 (0.67)      27 (1.99)      63 (4.65)      217 (16.01)      963 (73.15)      4236 (322.62)
10   0.05    33 (2.42)     115 (8.47)     231 (17.02)    356 (26.24)      508 (37.45)      981 (74.43)
10   0.1     65 (5.95)     228 (18.76)    458 (35.73)    699 (53.50)      952 (74.23)      1261 (97.04)

Table 9. Performance evaluation for different choices of σ and NI in CP for CT reconstruction: iteration count (run time in seconds) needed to reach each tolerance ε.

NI   σ       ε = 10^-1      ε = 10^-2      ε = 10^-3      ε = 10^-4        ε = 10^-5       ε = 10^-6
1    0.001   171 (2.42)     440 (6.18)     847 (11.90)    2314 (32.57)     –               –
1    0.01    150 (2.18)     362 (7.23)     652 (11.26)    998 (16.05)      1559 (23.82)    4879 (71.22)
1    0.02    153 (2.10)     364 (5.02)     650 (8.97)     972 (15.55)      1375 (21.11)    2628 (38.39)
1    0.05    153 (2.11)     364 (5.00)     647 (8.88)     960 (13.17)      1305 (17.90)    1791 (24.57)
1    0.1     164 (2.27)     387 (5.32)     686 (9.41)     1015 (13.92)     1365 (19.41)    1779 (26.15)
2    0.001   77 (1.57)      175 (3.55)     415 (8.40)     2098 (42.52)     –               –
2    0.01    40 (0.81)      94 (2.04)      174 (4.76)     326 (8.72)       1083 (24.03)    4671 (98.46)
2    0.02    40 (0.81)      93 (1.89)      167 (3.37)     267 (5.40)       588 (12.73)     2401 (50.60)
2    0.05    82 (1.62)      195 (3.88)     346 (6.89)     515 (10.26)      714 (14.21)     1156 (23.05)
2    0.1     162 (3.21)     385 (7.68)     684 (13.65)    1013 (20.56)     1363 (29.14)    1777 (37.38)
5    0.001   18 (0.69)      56 (2.18)      354 (13.80)    1985 (79.67)     –               –
5    0.01    19 (0.75)      44 (1.72)      87 (3.75)      249 (11.54)      1053 (42.90)    4636 (186.29)
5    0.02    34 (1.34)      80 (3.13)      144 (5.63)     236 (9.22)       573 (24.49)     2389 (97.59)
5    0.05    81 (3.13)      193 (7.45)     344 (13.29)    513 (19.80)      712 (27.50)     1154 (46.67)
5    0.1     161 (6.20)     384 (14.80)    683 (27.40)    1011 (40.80)     1361 (54.30)    1775 (70.24)
10   0.001   12 (1.44)      46 (3.93)      353 (25.53)    1977 (143.57)    –               –
10   0.01    19 (1.36)      44 (3.11)      87 (6.14)      249 (19.95)      1054 (78.92)    4637 (339.05)
10   0.02    34 (2.36)      80 (5.59)      144 (10.04)    236 (16.46)      573 (41.86)     2388 (172.31)
10   0.05    81 (5.63)      193 (13.45)    344 (23.97)    513 (37.65)      712 (51.50)     1154 (82.26)
10   0.1     161 (11.41)    384 (26.90)    683 (47.70)    1011 (70.79)     1361 (92.93)    1775 (116.60)

Similar to ASB, the best parameter sets in terms of computation time over the different tolerances are NI = 2 and σ = 0.02 for CP (see table 9), δ = 1.3 and ν = 0.1 for SIU (see table 10), and γ = 1.3 and λ = 1/8 for PDFP2O (see table 11). All these results are compared in table 12, and figure 4 gives the corresponding images recovered for ε = 10^-4. From table 12, we can observe that the evolution of PSNR and energy is very close for PDFP2O and SIU, although the iterative schemes of the two algorithms are different.


Table 10. Performance evaluation for different choices of δ and ν in SIU for CT reconstruction: iteration count (run time in seconds) needed to reach each tolerance ε. The impact of ν for δ = 0.4, 0.7, 1, 1.2 is similar to that for δ = 1.3, so we only list the results with different ν for δ = 1.3.

δ     ν       ε = 10^-1     ε = 10^-2      ε = 10^-3       ε = 10^-4       ε = 10^-5       ε = 10^-6
0.4   0.2     501 (3.56)    1196 (8.52)    2127 (15.17)    3146 (22.43)    4210 (30.01)    –
0.7   0.2     286 (2.00)    684 (4.79)     1216 (8.52)     1798 (12.60)    2408 (16.87)    3071 (21.61)
1     0.1     201 (1.47)    479 (3.52)     852 (6.20)      1261 (9.08)     1695 (12.12)    2195 (15.64)
1.2   0.1     167 (1.17)    399 (2.80)     710 (4.98)      1051 (7.38)     1415 (9.94)     1842 (12.93)
1.3   0.2     184 (1.29)    –              –               –               –               –
1.3   0.1     154 (1.08)    368 (2.59)     655 (4.61)      971 (6.84)      1307 (9.20)     1707 (12.03)
1.3   0.05    155 (1.09)    369 (2.59)     656 (4.61)      975 (6.85)      1326 (9.32)     1816 (12.77)
1.3   0.01    157 (1.10)    373 (2.62)     668 (4.70)      1019 (7.16)     1580 (11.10)    4881 (34.56)
1.3   0.005   162 (1.14)    380 (2.67)     685 (4.81)      1098 (7.72)     2336 (16.58)    –

Table 11. Performance evaluation for different choices of γ and λ in PDFP2O for CT reconstruction: iteration count (run time in seconds) needed to reach each tolerance ε. The impact of λ for γ = 0.4, 0.7, 1, 1.2 is similar to that for γ = 1.3, so we only list the cases with different λ for γ = 1.3.

γ     λ      ε = 10^-1     ε = 10^-2      ε = 10^-3       ε = 10^-4       ε = 10^-5       ε = 10^-6
0.4   1/8    501 (3.51)    1196 (8.40)    2127 (15.10)    3145 (22.24)    4208 (29.66)    –
0.7   1/8    286 (2.01)    684 (4.79)     1216 (8.53)     1799 (12.61)    2409 (16.89)    3074 (21.56)
1     1/8    201 (1.41)    479 (3.34)     851 (5.93)      1260 (8.77)     1692 (11.79)    2178 (15.16)
1.2   1/8    167 (1.16)    399 (2.78)     710 (4.94)      1051 (7.33)     1414 (9.85)     1838 (12.81)
1.3   1/5    157 (1.10)    590 (4.11)     –               –               –               –
1.3   1/6    154 (1.07)    368 (2.57)     655 (4.57)      970 (6.76)      1303 (9.16)     1687 (11.94)
1.3   1/8    154 (1.11)    368 (2.61)     655 (4.61)      971 (6.81)      1307 (9.15)     1709 (11.95)
1.3   1/16   155 (1.09)    369 (2.57)     656 (4.57)      975 (6.80)      1327 (9.26)     1825 (12.72)
1.3   1/32   155 (1.08)    370 (2.58)     659 (4.59)      984 (6.85)      1371 (9.56)     2320 (16.16)

Table 12. Performance comparison among ASB, CP, SIU and PDFP2O for CT reconstruction. For a given error tolerance ε, each entry (k, t, PSNR) gives the first outer iteration number k such that ε_k < ε, the corresponding run time t in seconds and the corresponding PSNR. For ASB1, NI = 2, ν = 0.01; for ASB2, NI = 2, ν = 0.05. For CP, NI = 2, σ = 0.02. For SIU, δ = 1.3, ν = 0.1. For PDFP2O, γ = 1.3, λ = 1/8.

          ε = 10^-2             ε = 10^-3             ε = 10^-4             ε = 10^-5
ASB1      (67, 1.42, 30.29)     (129, 2.72, 31.97)    (279, 5.87, 32.43)    (1068, 24.49, 32.45)
ASB2      (124, 2.59, 30.06)    (229, 4.79, 31.75)    (345, 7.24, 32.26)    (492, 10.33, 32.41)
CP        (44, 1.72, 30.48)     (167, 3.37, 31.80)    (267, 5.40, 32.32)    (588, 12.73, 32.45)
SIU       (368, 2.59, 30.03)    (655, 4.61, 31.70)    (971, 6.84, 32.23)    (1307, 9.20, 32.38)
PDFP2O    (368, 2.61, 30.03)    (655, 4.61, 31.70)    (971, 6.81, 32.23)    (1307, 9.15, 32.38)

We note that both ASB and CP converge faster than PDFP2O and SIU in the early steps, while PDFP2O and SIU are slightly better if higher accuracy is required. One explanation for this behavior is that when the condition number of A^T A is large, explicit methods such as PDFP2O and SIU approximate the inverse of A^T A slowly. For ASB and CP, the regularized inverses (A^T A + νB^T B)^{-1} and (I + τA^T A)^{-1}, respectively, act as preconditioners that partially avoid the bad conditioning of A^T A in the early steps, at the cost of unnecessary inner iterations later on.
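The inner-iteration tradeoff for ASB can be made concrete: each outer step approximately solves a regularized normal equation of the form above with a few conjugate gradient steps. A minimal sketch, with hypothetical matrix-free operator handles, is as follows (this is an illustration of the idea, not the implementation used in the tests):

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, cg

    def asb_inner_solve(A, At, B, Bt, rhs, nu, n, n_inner=2, x0=None):
        """Approximately solve (A^T A + nu B^T B) x = rhs by n_inner CG steps,
        mimicking the inner loop of ASB (NI = n_inner in tables 8 and 9)."""
        def matvec(x):
            return At(A(x)) + nu * Bt(B(x))
        M = LinearOperator((n, n), matvec=matvec, dtype=np.float64)
        x, _ = cg(M, rhs, x0=x0, maxiter=n_inner)  # truncated: crude but cheap
        return x

With small NI the implicit step is solved crudely but cheaply; with large NI each outer iteration approaches the exact regularized inverse, which is precisely the behavior visible in tables 8 and 9.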


[Figure 4 shows six panels: Original; FBP; ASB, PSNR = 32.43; CP, PSNR = 32.32; SIU, PSNR = 32.23; PDFP2O, PSNR = 32.23.]

Figure 4. A tomographic reconstruction example for a 128 × 128 image with 50 projections, corresponding to tolerance error ε = 10^-4 with noise level 1.

For the ill-conditioned case, we can consider an efficient preconditioning technique for PDFP2O, which will be presented in a forthcoming work.

5.3. Parallel MRI

Magnetic resonance imaging (MRI) is a medical imaging technique widely used in clinical radiology to visualize the internal structure and function of the body by noninvasive and nonionizing means. It provides better contrast between different soft tissues than other modalities such as CT and PET. MRI images are obtained through an inversion of Fourier data acquired by the receiver coils. Parallel MRI (pMRI) is a recent technique to accelerate the sampling speed of conventional MRI. Instead of relying on increased gradient performance for imaging speed, pMRI extracts extra spatial information from an array of surface coils surrounding the scanned object: multiple receivers each collect, in parallel, a part of the Fourier components, resulting in an accelerated image acquisition. There are two general approaches for removing the aliasing artifacts due to Fourier space subsampling: image-domain methods and k-space-based methods (see [5]). Total variation regularization has also been considered in the literature in order to obtain a better image quality, as in [21, 22, 11].

In this paper, we employ image-domain methods and coil sensitivity maps to reconstruct the underlying image. Sensitivity encoding (SENSE) is the most common image-domain-based parallel imaging method. It is based on the following model, which relates the partial k-space data b_j acquired by the jth receiver to the unknown image x:

b_j = DFS_j x + n,

where b_j is the vector of measured Fourier coefficients at receiver j, D is a diagonal downsampling operator, F is the Fourier transform, S_j corresponds to the diagonal coil sensitivity mapping of receiver j and n is Gaussian noise.


Figure 5. In vivo MR images acquired: (a) four-channel spine data and (b) eight-channel head data.

In practice, S_j can often be estimated in advance. Let A_j = DFS_j; then we can recover the image x by solving the least-squares problem with total variation regularization

x∗ = arg min_{x∈R^n} μ‖Bx‖₁ + (1/2) Σ_{j=1}^N ‖A_j x − b_j‖₂²,    (5.2)

where B is the discrete gradient matrix and N is the total number of receivers. Conventionally, the downsampling operator D is implemented with a sampling ratio R = 2, 4 along one dimension (corresponding to phase encoding).

In our experiments, we use the test data provided by the online MATLAB toolbox PULSAR [20]. The toolbox contains two sets of real data acquired by MR systems with a coil array. The first is a spine data set acquired on a 3 tesla whole-body GE scanner using a four-channel CTL spine array, and the second is a brain data set acquired using an eight-channel head array; for the details of the machine configuration, see [20]. Figure 5 shows the images of the multichannel data, four coils for the spine and eight coils for the brain. We use the sensitivity maps S_j estimated by the built-in function in PULSAR. The square root of the sum of squares (SOS) image of the N full data coils (without downsampling D) is used as a reference image, for which pixel (i, j) is given by

SOS(i, j) = ( Σ_{k=1}^N |(F^{-1} b_k)(i, j)|² )^{1/2}.
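Both the coil-wise forward operator A_j = DFS_j and the SOS reference are direct to express with FFTs. The sketch below is a minimal illustration under the notation above (the array names sens and mask are assumptions: sens stacks the sensitivity maps S_j, mask is the binary diagonal of D; an orthonormal FFT is used so that F∗ = F^{-1}):

    import numpy as np

    def sense_forward(x, sens, mask):
        """Apply A_j = D F S_j to x for all N coils at once.
        sens: (N, ny, nx) sensitivity maps; mask: binary sampling pattern (D)."""
        return mask * np.fft.fft2(sens * x[None, :, :], norm="ortho")

    def sense_adjoint(y, sens, mask):
        """Apply A^* = sum_j S_j^* F^{-1} D^T to the stacked k-space data y."""
        return np.sum(np.conj(sens) * np.fft.ifft2(mask * y, norm="ortho"), axis=0)

    def sos_reference(full_kspace):
        """SOS(i, j) = sqrt(sum_k |(F^{-1} b_k)(i, j)|^2) from full coil data."""
        coil_images = np.fft.ifft2(full_kspace, norm="ortho")
        return np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=0))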


Given a region of interest (ROI), a measure of the artifact power (AP) of a reconstructed image I^rec with respect to a reference image I^ref is defined as

AP = Σ_{(i,j)∈ROI} | |I^ref(i, j)| − c |I^rec(i, j)| |² / Σ_{(i,j)∈ROI} |I^ref(i, j)|²,

where c = ( Σ_{(i,j)∈ROI} |I^ref(i, j)|² / Σ_{(i,j)∈ROI} |I^rec(i, j)|² )^{1/2}. The factor c is used to minimize the scaling effect that might be introduced during the reconstruction process. Another useful index for evaluating image quality when a reference image is not available is the two-region SNR, calculated from a ROS (region of signal) and a RON (region of noise) by

SNR = 20 log₁₀ (Mean of ROS / Standard Deviation of RON).

This SNR measure strongly depends on the locations of the chosen ROS and RON; the RON can usually be selected from background areas where no object features are present. The ROS and ROI definitions used for the SNR evaluation are shown in the first image (SOS) in figures 6 and 7.
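Both quality indices are simple to evaluate once the regions are fixed. A sketch with ROI, ROS and RON supplied as boolean masks (the helper names are hypothetical, and the square root in c follows the definition above):

    import numpy as np

    def artifact_power(rec, ref, roi):
        """AP over a boolean ROI mask; c compensates a global intensity scaling."""
        r, t = np.abs(ref[roi]), np.abs(rec[roi])
        c = np.sqrt(np.sum(r ** 2) / np.sum(t ** 2))
        return np.sum((r - c * t) ** 2) / np.sum(r ** 2)

    def two_region_snr(img, ros, ron):
        """SNR = 20 log10(mean over ROS / standard deviation over RON)."""
        return 20.0 * np.log10(np.mean(np.abs(img[ros])) / np.std(np.abs(img[ron])))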

In the experiments, we set μ = 0.005 and use zero as the initial value for the reconstruction algorithms ASB, CP, SIU and PDFP2O. We now estimate the maximum eigenvalue of the system matrix A∗A = Σ_{j=1}^N A_j∗ A_j, where A_j∗ is the conjugate transpose of A_j. Since A_j = DFS_j, we have

A∗A = Σ_{j=1}^N S_j∗ (F∗ D^T D F) S_j = S∗ (F∗ D^T D F) S,

where S = (S_1; . . . ; S_N) stacks the sensitivity mappings, S_j∗ denotes the conjugate of the sensitivity mapping of receiver j, and F∗ = F^{-1} is the inverse Fourier transform. Since D is a downsampling operator with diagonal elements 1 and 0, we have λmax(F^{-1} D^T D F) = 1. Thus

λmax(A∗A) = λmax(S∗ F∗ D^T D F S) = ‖DFS‖₂² ≤ ‖DF‖₂² ‖S‖₂² = λmax(S∗S) = λmax( Σ_{i=1}^N S_i∗ S_i ).

The matrix Σ_{i=1}^N S_i∗ S_i is diagonal and its maximum diagonal element is approximately 1. Based on this simple calculation, we can set γ ≈ 2, λ = 1/8 by our nearly optimal rule for applying PDFP2O. We also observe that a slightly larger γ may yield slightly better performance for R = 4 on the two data sets, but we still set the universal parameter γ = 2 according to this simple rule for the different test data and sampling ratios. For the other three methods, we run a large number of parameter sets and iteration numbers and choose the 'optimal' ones according to the AP and SNR criteria.

Figures 6 and 7 show the recovered images with the different methods for the two test data sets. According to the numerical results in [20], we choose the best three methods, SENSE, SPACE-RIP and GRAPPA, for comparison; their reconstructions are obtained using the PULSAR toolbox. In figure 6, on the spine data with sampling ratio R = 2, all four methods based on the TV model (5.2) outperform SENSE, SPACE-RIP and GRAPPA in terms of both computation time and SNR. Among the four TV methods, SIU and PDFP2O perform closely, as expected, and PDFP2O takes slightly less time to attain an AP and SNR similar to those of ASB and CP. For R = 4 on the spine data, SENSE fails to reconstruct a clean image, and GRAPPA has the best AP value; among the four TV methods, ASB uses the least amount of time with the 'optimal' parameter sets. Similar results are obtained for the brain data (figure 7): the four TV-based methods (5.2) outperform SENSE, SPACE-RIP and GRAPPA in terms of AP and SNR.

[Figure 6 shows the SOS reference and the images reconstructed by SENSE, SPACE-RIP, GRAPPA, ASB, CP, SIU and PDFP2O for the spine data; the ROI, ROS and RON used for the quality measures are marked on the SOS image. The quantitative results displayed under the panels are:

R = 2       SOS      SENSE       SPACE-RIP    GRAPPA      ASB         CP          SIU         PDFP2O
AP          0        0.004159    0.004159     0.002148    0.002502    0.002529    0.002522    0.002527
SNR         31.38    27.18       27.18        29.87       35.54       34.56       34.73       34.50
time (s)    0.17     6.10        102.55       11.44       0.59        0.72        0.37        0.23

R = 4       SOS      SENSE       SPACE-RIP    GRAPPA      ASB         CP          SIU         PDFP2O
AP          0        1.451063    0.137803     0.008238    0.028629    0.028580    0.036754    0.040011
SNR         31.38    17.82       15.87        25.42       36.73       38.16       38.01       38.06
time (s)    0.17     3.28        52.66        11.23       9.05        22.65       20.65       20.09]

Figure 6. Recovery results from the four-channel in vivo spine data with subsampling ratio R = 2, 4. The size of the image is 256 × 256. The AP, SNR and run time are shown under each image (tabulated above). ROI: [15, 220] × [35, 165]; ROS: [50, 120] × [130, 160]; RON: [190, 250] × [185, 215]. NO denotes the number of outer iterations and NI the number of inner iterations. For R = 2: ν = 0.1, NI = 2, NO = 4 for ASB; θ = 1, τ = 1/(8σ), σ = 0.05, NI = 1, NO = 8 for CP; δ = 1.5, ν = 0.05, NO = 8 for SIU; γ = 2, λ = 1/8, NO = 8 for PDFP2O. For R = 4: ν = 0.01, NI = 10, NO = 20 for ASB; θ = 1, τ = 1/(8σ), σ = 0.005, NI = 5, NO = 100 for CP; δ = 2.4, ν = 0.0025, NO = 500 for SIU; γ = 2, λ = 1/8, NO = 500 for PDFP2O.


[Figure 7 shows the SOS reference and the images reconstructed by SENSE, SPACE-RIP, GRAPPA, ASB, CP, SIU and PDFP2O for the brain data, with the ROI, ROS and RON marked on the SOS image. The quantitative results displayed with the panels are:

R = 2       SOS      SENSE       SPACE-RIP    GRAPPA      ASB         CP          SIU         PDFP2O
AP          0        0.001939    0.001939     0.001624    0.000811    0.000823    0.000823    0.000822
SNR         36.08    31.08       31.08        29.06       39.20       39.27       39.26       39.36
time (s)    0.19     5.94        155.88       44.85       1.81        3.72        1.87        1.84

R = 4       SOS      SENSE       SPACE-RIP    GRAPPA      ASB         CP          SIU         PDFP2O
AP          0        0.027495    0.027497     0.004879    0.002431    0.002490    0.002462    0.002478
SNR         36.08    27.05       27.04        23.55       41.85       41.74       42.52       43.07
time (s)    0.19     3.33        88.03        34.03       7.73        10.17       10.49       10.89]

Figure 7. Recovery results from the eight-channel in vivo brain data with subsampling ratio R = 2, 4. The AP, SNR and run time are shown for each image (tabulated above). The size of the image is 256 × 256. ROI: [65, 190] × [15, 220]; ROS: [70, 120] × [110, 180]; RON: [220, 250] × [200, 250]. NO denotes the number of outer iterations and NI the number of inner iterations. For R = 2: ν = 0.1, NI = 5, NO = 4 for ASB; θ = 1, τ = 1/(8σ), σ = 0.05, NI = 1, NO = 25 for CP; δ = 1.5, ν = 0.05, NO = 25 for SIU; γ = 2, λ = 1/8, NO = 25 for PDFP2O. For R = 4: ν = 0.05, NI = 10, NO = 10 for ASB; θ = 1, τ = 1/(8σ), σ = 0.01, NI = 2, NO = 50 for CP; δ = 2.2, ν = 0.008, NO = 150 for SIU; γ = 2, λ = 1/8, NO = 160 for PDFP2O.



For R = 2, PDFP2O takes almost the same amount of time as ASB, CP and SIU to obtain a similar AP and SNR. For R = 4, SENSE takes the least amount of time, while ASB, CP, SIU and PDFP2O obtain much higher AP and SNR; ASB is faster than CP, SIU and PDFP2O for comparable SNR and AP if the best iteration number is chosen, while PDFP2O can achieve a slightly better SNR.

6. Discussions and perspectives

We summarize the comparison of the proposed method PDFP2O with the other three methods, ASB, CP and SIU.

• All four methods in comparison are efficient and show comparable numerical performance, especially when the parameters are properly chosen. This also implies that first-order methods are particularly suitable for the large-scale non-differentiable optimization problems arising in image processing and inverse problems.

• For both ASB and CP, the inversion of a large-scale linear system is involved in solving (1.2). In practice, an inexact solver such as the conjugate gradient method can be applied without actual matrix inversion. However, the number of inner steps greatly affects the overall performance in most applications when the distribution of eigenvalues of the system matrix is unknown; therefore, the choice of the inner iteration number is rather ad hoc, and no available theoretical results on the number of inner iterations can be used directly as a general guideline. For both SIU and PDFP2O, the iteration schemes are straightforward, since only one inner iteration is used, and they are easy to implement in practice.

• The rules of parameter selection in PDFP2O are easier than those for the other three methods for practical purposes. Theoretically, both ASB and CP have the advantage that there is one almost free parameter (ν > 0 for ASB and SIU, σ > 0 or τ > 0 for CP; see tables 1, 2) that guarantees convergence. This property is naturally inherited from the implicit numerical scheme for solving the subproblem. However, an 'optimal' choice of the parameter sets in real applications is rather challenging, since these parameters greatly affect the convergence speed and restoration quality. For PDFP2O, if the maximal eigenvalue can be roughly estimated, as presented in the previous section, we empirically observe nearly optimal performance by setting the parameters γ and λ to their upper bounds in most cases. Therefore, this can serve as a practical principle for general image restoration and inverse problems.

• Finally, the convergence analysis for the proposed method is derived directly from the classical Banach contraction mapping theory, and the relation between the convergence rate and the condition numbers of the operators is clearly stated within this theoretical framework. Compared to the reduced version FP2O for the denoising case proposed in [24], our results are stronger and more general. This also provides an insight for considering preconditioning to accelerate the performance of the algorithm.

In conclusion, we have designed an efficient algorithm, PDFP2Oκ, for solving problem (1.1). We express it in a fixed point form to analyze its convergence for κ ∈ [0, 1) in very general settings, and for some special cases we further analyze the convergence rate of PDFP2Oκ. To highlight the nature of PDFP2O (κ = 0), we present some equivalent forms of PDFP2O and reveal its connections with, and differences from, other algorithms.
For the implementation of PDFP2Oκ, no linear systems need to be solved at each iteration, and the strategies for choosing the involved parameters are clear and instructive. The efficiency of PDFP2Oκ is illustrated through numerical examples on image super-resolution, CT reconstruction and pMRI reconstruction; in general, the proposed methods achieve results comparable to those of other state-of-the-art methods.


The algorithm can easily be adapted to many inverse problems involving the minimization of large-scale and complicated functionals, with an instructive rule of parameter selection. Finally, the efficiency can be improved by using a preconditioning technique, which will be discussed in a forthcoming work.

Acknowledgments

We are grateful to the anonymous reviewers for their valuable comments and advice, which led to a great improvement of the manuscript. We are also grateful to the authors of PULSAR for making the toolbox available. JH was partially supported by the NSFC (grant nos 11171219, 11161130004) and the E-Institutes of Shanghai Municipal Education Commission (E03004). XZ was partially supported by the NSFC (grant nos 11101277, 11161130004) and the Shanghai Pujiang Talent Program (grant no 11PJ1405900).

References

[1] Anthoine S, Aujol J, Boursier Y and Melot C 2011 On the efficiency of proximal methods in CBCT and PET ICIP 2011: IEEE 18th Int. Conf. on Image Processing pp 1365–8
[2] Argyriou A, Micchelli C A, Pontil M, Shen L and Xu Y 2011 Efficient first order methods for linear composite regularizers arXiv:1104.1436
[3] Arrow K J, Hurwicz L and Uzawa H 1958 Studies in Linear and Non-linear Programming (Stanford: Stanford University Press)
[4] Kak A C and Slaney M 2001 Principles of Computerized Tomographic Imaging (Philadelphia, PA: SIAM)
[5] Blaimer M, Breuer F, Mueller M, Heidemann R M, Griswold M A and Jakob P M 2004 SMASH, SENSE, PILS, GRAPPA: how to choose the optimal method? Top. Magn. Reson. Imaging 15 223–36
[6] Bonettini S and Ruggiero V 2012 On the convergence of primal–dual hybrid gradient algorithms for total variation image restoration J. Math. Imaging Vis. 44 236–53
[7] Boyd S, Parikh N, Chu E, Peleato B and Eckstein J 2010 Distributed optimization and statistical learning via the alternating direction method of multipliers Found. Trends Mach. Learn. 3 1–122
[8] Chambolle A 2004 An algorithm for total variation minimization and applications J. Math. Imaging Vis. 20 89–97
[9] Chambolle A and Pock T 2011 A first-order primal–dual algorithm for convex problems with applications to imaging J. Math. Imaging Vis. 40 120–45
[10] Chen D-Q, Zhang H and Cheng L-Z 2012 A fast fixed-point algorithm for total variation deblurring and segmentation J. Math. Imaging Vis. 43 167–79
[11] Chen Y, Hager W, Huang F, Phan D, Ye X and Yin W 2012 Fast algorithms for image reconstruction with application to partially parallel MR imaging SIAM J. Imaging Sci. 5 90–118
[12] Combettes P L and Wajs V R 2005 Signal recovery by proximal forward–backward splitting Multiscale Model. Simul. 4 1168–200
[13] Deng W and Yin W 2012 On the global and linear convergence of the generalized alternating direction method of multipliers UCLA CAM Report 12-52
[14] Eckstein J and Bertsekas D 1992 On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators Math. Program. 55 293–318
[15] Esser E 2009 Applications of Lagrangian-based alternating direction methods and connections to split Bregman UCLA CAM Report 09-31
[16] Esser E, Zhang X and Chan T F 2010 A general framework for a class of first order primal–dual algorithms for convex optimization in imaging science SIAM J. Imaging Sci. 3 1015–46
[17] Goldstein T, O'Donoghue B and Setzer S 2012 Fast alternating direction methods UCLA CAM Report 12-35
[18] Goldstein T and Osher S 2009 The split Bregman method for ℓ1 regularized problems SIAM J. Imaging Sci. 2 323–43
[19] He B and Yuan X 2012 Convergence analysis of primal–dual algorithms for a saddle-point problem: from contraction perspective SIAM J. Imaging Sci. 5 119–49
[20] Ji J X, Son J B and Rane S D 2007 PULSAR: a MATLAB toolbox for parallel magnetic resonance imaging using array coils and multiple channel receivers Concepts Magn. Reson. 31B 24–36


[21] Keeling S L, Clason C, Hintermüller M, Knoll F, Laurain A and Winckel G V 2012 An image space approach to Cartesian based parallel MR imaging with total variation regularization Med. Image Anal. 16 189–200
[22] Knoll F, Clason C, Bredies K, Uecker M and Stollberger R 2012 Parallel imaging with nonlinear reconstruction using variational penalties Magn. Reson. Med. 67 34–41
[23] Lions P L and Mercier B 1979 Splitting algorithms for the sum of two nonlinear operators SIAM J. Numer. Anal. 16 964–79
[24] Micchelli C A, Shen L and Xu Y 2011 Proximity algorithms for image models: denoising Inverse Problems 27 45009–38
[25] Moreau J-J 1962 Fonctions convexes duales et points proximaux dans un espace hilbertien C. R. Acad. Sci. Paris I 255 2897–99
[26] Nesterov Y 1983 A method of solving a convex programming problem with convergence rate O(1/k²) Sov. Math.—Dokl. 27 372–6
[27] Opial Z 1967 Weak convergence of the sequence of successive approximations for nonexpansive mappings Bull. Am. Math. Soc. 73 591–7
[28] Passty G B 1979 Ergodic convergence to a zero of the sum of monotone operators in Hilbert space J. Math. Anal. Appl. 72 383–90
[29] Rockafellar R T 1970 Convex Analysis (Princeton, NJ: Princeton University Press)
[30] Rudin L I, Osher S and Fatemi E 1992 Nonlinear total variation based noise removal algorithms Physica D 60 259–68
[31] Setzer S 2009 Split Bregman algorithm, Douglas–Rachford splitting and frame shrinkage SSVM '09: Proc. 2nd Int. Conf. on Scale Space and Variational Methods in Computer Vision vol 5567 pp 464–76
[32] Tseng P 2008 On accelerated proximal gradient methods for convex–concave optimization Preprint (pages.cs.wisc.edu/~brecht/cs72bdocs/Tseng.APG.pdf)
[33] Tseng P 2010 Approximation accuracy, gradient methods, and error bound for structured convex optimization Math. Program. 125 263–95
[34] Zhang X, Burger M, Bresson X and Osher S 2010 Bregmanized nonlocal regularization for deconvolution and sparse reconstruction SIAM J. Imaging Sci. 3 253–76
[35] Zhang X, Burger M and Osher S 2011 A unified primal–dual algorithm framework based on Bregman iteration J. Sci. Comput. 46 20–46
[36] Zhu M and Chan T F 2008 An efficient primal–dual hybrid gradient algorithm for total variation image restoration UCLA CAM Report 08-34

