Electronic Transactions on Numerical Analysis. Volume 48, pp. 310–328, 2018.
Copyright © 2018, Kent State University. ISSN 1068–9613. DOI: 10.1553/etna_vol48s310
A CONDITIONAL GRADIENT METHOD FOR PRIMAL-DUAL TOTAL VARIATION-BASED IMAGE DENOISING∗

ABDESLEM HAFID BENTBIB†, ABDERRAHMAN BOUHAMIDI‡, AND KARIM KREIT†

Abstract. In this paper, we consider the problem of image denoising by total variation regularization. We combine the conditional gradient method with the total variation regularization in the dual formulation to derive a new method for denoising images. The convergence of this method is proved. Some numerical examples are given to illustrate the effectiveness of the proposed method.

Key words. ill-posed problem, regularization, total variation, conditional gradient, image restoration, image denoising, convex programming

AMS subject classifications. 65F10, 65R30
1. Introduction. Image denoising is a fundamental research field in image processing. The purpose of image denoising is to recover an unknown true or original image û (noise-free image) from an observed image f that is contaminated by unknown noise η. In general, η is additive Gaussian white noise. The model equation of image denoising may be written as

    f(x) = û(x) + η(x),    x = (x, y) ∈ Ω,
where Ω is the domain of the image. It is assumed to be a convex, connected, and bounded Lipschitz open subset of R². One of the most popular regularization methods is Tikhonov regularization, based on least-squares techniques with a parameter and a regularization operator. Tikhonov regularization has been widely studied in the literature; see, e.g., [5, 6, 7] and the references therein. The drawback of Tikhonov regularization is the fact that it does not preserve sharp discontinuities in the image; see [11]. To overcome this difficulty one may use a technique based on the minimization of the total variation norm subject to a noise constraint. The total variation model is well known and was first introduced in image processing by Rudin, Osher, and Fatemi in [19]. The model is referred to as the ROF model. We consider the Banach space

    BV(Ω) = { u ∈ L¹(Ω) : TV(u) < ∞ }

of functions of bounded variation endowed with the norm ‖u‖_{BV(Ω)} = ‖u‖_{L¹} + TV(u), where

    TV(u) := sup { ∫_Ω ∇u · ϕ dx : ϕ ∈ C_c¹(Ω, R²), ‖ϕ‖_∞ ≤ 1 }
represents the total variation of u; see [1]. The notation ∇u stands for the gradient of u defined in a distributional sense. The dot stands for the Euclidean scalar product in R².

∗ Received November 14, 2017. Accepted May 16, 2018. Published online on August 28, 2018. Recommended by Fiorella Sgallari.
† Université Cadi Ayyad, Marrakesh, Morocco ({a.bentbib, karim.krei}@uca.ac.ma).
‡ Université du Littoral, L.M.P.A, 50 rue F. Buisson BP699, F-62228 Calais-Cedex, France ([email protected]).
If the gradient ∇u of u belongs to the space L¹(Ω, R²), then the total variation of u is given by the expression

    TV(u) = ∫_Ω |∇u(x)| dx,
where |·| stands for the Euclidean norm in R² and dx stands for the measure dxdy. The ROF model eliminates noise from images and preserves edges by minimizing the functional

(1.1)    min_{u ∈ BV(Ω)} TV(u)

subject to the constraint

(1.2)    ‖f − u‖_2² = σ²|Ω|,

where ‖·‖_2 denotes the 2-norm in L²(Ω), σ is the standard deviation of the noise η, and |Ω| is the measure of the domain Ω. Applying a Lagrange multiplier to (1.1) and (1.2) leads to the minimization problem

(1.3)    min_{u ∈ BV(Ω)} TV(u) + (λ/2)‖u − f‖_2²,
where λ > 0 is a Lagrange multiplier. As in Tikhonov regularization, the parameter λ may be viewed as a regularization parameter. The functional in (1.3) has a unique minimizer in the Banach space BV(Ω). A direct method for finding the minimizer of the functional (1.3) is to solve its associated Euler-Lagrange equation. Due to the presence of a highly nonlinear and non-differentiable term, an application of the direct method is not an easy task; see [11]. To avoid the difficulties of solving the Euler-Lagrange equation, Chan, Golub, and Mulet (CGM) were the first to propose in [11] a primal-dual method based on the duality principle, and they applied it to TV regularization for image restoration. In this paper, we propose to use the conditional gradient method based on the dual formulation to derive a new method for solving the problem (1.3). The conditional gradient algorithm, also called the Frank-Wolfe method, is well known and was introduced in 1956; see [14, 20]. Such a technique was successfully applied in [4] in the case of Tikhonov regularization. Many variants of gradient algorithms have been proposed for total variation in image reconstruction; see, for instance, [21, 22, 23]. To the best of our knowledge, the Frank-Wolfe algorithm has not been used previously in the literature in the case of primal-dual total variation-based image denoising. We will show how it is also possible to adapt the conditional gradient method to a TV-denoising problem. We will prove the convergence of this method, and we will give some numerical examples to illustrate the effectiveness of our proposed method. The paper is organized as follows: In Section 2, we recall the dual total variation formulation and its discrete form. In Section 3, we introduce our proposed method applied to the dual discrete formulation. The convergence of this method is established in Section 4. In Section 5, we show how to select the Lagrange multiplier parameter automatically. Section 6 is devoted to numerical tests illustrating the effectiveness of the proposed method. We also compare our method to state-of-the-art algorithms, namely Chambolle's projection algorithm, the split Bregman method, and a primal-dual hybrid gradient algorithm.
2. Dual total variation formulation. The dual formulation of the ROF model was first introduced in image restoration by Chan, Golub, and Mulet [11]. In this section, we recall briefly some well-known results on the dual formulation of the total variation problem (1.3); for more details we refer the reader to [8, 11, 22] and the references therein. It is standard in convex analysis (see, for instance, [1, 12]) that the TV(u) semi-norm in the minimization problem (1.3) is given in the following form:

    ∫_Ω |∇u(x)| dx = max_{ω ∈ C_c¹(Ω,R²), ‖ω‖_∞ ≤ 1} ∫_Ω ∇u(x)·ω(x) dx = max_{ω ∈ C_c¹(Ω,R²), ‖ω‖_∞ ≤ 1} ∫_Ω −u(x) ∇·ω(x) dx.
The function ω : Ω → R² is the dual variable and u the primal one. The notation u·v stands for the standard scalar product of two vectors u and v in R². The space C_c¹(Ω, R²) denotes the space of functions v : Ω → R² of class C¹ that are compactly supported on Ω. The gradient of u is denoted by ∇u, and ∇·ω is the divergence of ω. It is well known that the solution u^* of the minimization problem (1.3) is given by the expression

(2.1)    u^* = f + (1/λ) ∇·ω^*,

where the dual variable ω^* may be regarded as the solution of the minimization problem

(2.2)    ω^* = arg min_{ω ∈ K} ‖f + (1/λ) ∇·ω‖_2²

with K = { ω ∈ C_c¹(Ω, R²) : ‖ω‖_∞ ≤ 1 }. We recall that the norm ‖·‖_∞ is given for ω = (ω₁, ω₂) ∈ C_c¹(Ω, R²) by

    ‖ω‖_∞ = sup_{x ∈ Ω} |ω(x)|,

where |ω(x)| stands for the Euclidean norm of ω(x) in R². Solving the TV-regularization problem (1.3) is equivalent to solving the dual problem (2.1)–(2.2). The main advantage of the dual formulation (2.1)–(2.2) is that the objective function in (2.2) is quadratic. From now on, we consider the discretized formulation. The domain of the images is assumed to be a rectangle of size n × m. The space of the images is X = R^{n×m}, the space of matrices of size n × m. The space X is equipped with the classical inner product ⟨u|v⟩_X = tr(uᵀv), where u and v are two matrices in X, tr(Z) denotes the trace of the square matrix Z, and uᵀ is the transpose of the matrix u. The associated norm is the well-known Frobenius norm, denoted here by ‖u‖_X = √(⟨u|u⟩_X).
To define the discrete total variation, we recall some definitions and basic results from [8]. The image u is represented as a matrix in X, where u_{i,j}, the component of u, represents the value of u at the point indexed by (i, j). The discrete gradient operator, whose two
components at each point (i, j) are the two matrices ∇₁u and ∇₂u in X, is defined by their components (∇₁u)_{i,j} and (∇₂u)_{i,j} as follows:

    (∇₁u)_{i,j} = u_{i+1,j} − u_{i,j} if i < n, and (∇₁u)_{i,j} = 0 if i = n, for j = 1, ..., m,
    (∇₂u)_{i,j} = u_{i,j+1} − u_{i,j} if j < m, and (∇₂u)_{i,j} = 0 if j = m, for i = 1, ..., n.

We have ∇u = (∇₁u, ∇₂u), and the discrete gradient, also denoted by ∇, may be seen as an operator ∇ : X → Y, where Y = R^{n×m×2} = X × X. The space Y is endowed with the inner product and its associated norm given by

    ⟨p|q⟩_Y = ⟨p¹|q¹⟩_X + ⟨p²|q²⟩_X,    ‖p‖_Y = √(⟨p|p⟩_Y),

for all p = (p¹, p²) and q = (q¹, q²) in Y. The discrete total variation of u ∈ X is defined by

(2.3)    TV(u) = Σ_{1≤i≤n, 1≤j≤m} √( |(∇₁u)_{i,j}|² + |(∇₂u)_{i,j}|² ).
Therefore, the discrete total variation problem (1.3) is defined as

(2.4)    min_{u ∈ X} [ TV(u) + (λ/2)‖u − f‖_X² ],

where u and f are the discretizations of the related continuous variables. The discrete divergence operator div : Y → X, also denoted by ∇·, is defined as the negative adjoint of the gradient, div = −∇^*, that is, for any ω ∈ Y and u ∈ X,

(2.5)    ⟨−∇·ω | u⟩_X = ⟨ω | ∇u⟩_Y.
The discrete divergence is then (see [8])

(2.6)    (∇·ω)_{i,j} = { ω¹_{i,j} − ω¹_{i−1,j} if 1 < i < n;  ω¹_{i,j} if i = 1;  −ω¹_{i−1,j} if i = n }
                     + { ω²_{i,j} − ω²_{i,j−1} if 1 < j < m;  ω²_{i,j} if j = 1;  −ω²_{i,j−1} if j = m }.
Hence, the discrete form of the problem (2.2) may be written as

(2.7)    min_{ω ∈ K} ‖f + (1/λ) ∇·ω‖_X²,

where K is the discrete version of the closed convex subset K, denoted by the same symbol. We have

(2.8)    K = { ω = (ω¹, ω²) ∈ Y : √( |ω¹_{i,j}|² + |ω²_{i,j}|² ) ≤ 1, ∀i = 1, ..., n, ∀j = 1, ..., m }.

Thus, as in the continuous total variation setting, the solution u^* of the discrete total variation problem (2.4) is given by the expression

    u^* = f + (1/λ) ∇·ω^*,

where ω^* is the solution of the least-squares problem (2.7).
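The discrete operators above translate directly into code. The following sketch (in Python/NumPy rather than the MATLAB environment used by the authors; the function names and test sizes are our own) implements ∇, ∇·, and TV(u) according to (2.3) and (2.6), and checks the adjoint relation (2.5) on random data.

```python
import numpy as np

def grad(u):
    """Discrete gradient of an n-by-m image: the two components (nabla_1 u, nabla_2 u)."""
    d1 = np.zeros_like(u)
    d2 = np.zeros_like(u)
    d1[:-1, :] = u[1:, :] - u[:-1, :]   # (nabla_1 u)_{i,j} = u_{i+1,j} - u_{i,j}, zero in the last row
    d2[:, :-1] = u[:, 1:] - u[:, :-1]   # (nabla_2 u)_{i,j} = u_{i,j+1} - u_{i,j}, zero in the last column
    return d1, d2

def div(w1, w2):
    """Discrete divergence (2.6), the negative adjoint of the gradient, cf. (2.5)."""
    d = np.zeros_like(w1)
    d[0, :] = w1[0, :]
    d[1:-1, :] = w1[1:-1, :] - w1[:-2, :]
    d[-1, :] = -w1[-2, :]
    d[:, 0] += w2[:, 0]
    d[:, 1:-1] += w2[:, 1:-1] - w2[:, :-2]
    d[:, -1] += -w2[:, -2]
    return d

def tv(u):
    """Discrete total variation (2.3)."""
    d1, d2 = grad(u)
    return np.sum(np.sqrt(d1**2 + d2**2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u = rng.standard_normal((5, 7))
    w1, w2 = rng.standard_normal((5, 7)), rng.standard_normal((5, 7))
    d1, d2 = grad(u)
    lhs = np.sum(-div(w1, w2) * u)              # <-div w | u>_X
    rhs = np.sum(w1 * d1) + np.sum(w2 * d2)     # <w | grad u>_Y
    print(tv(u), abs(lhs - rhs))                # the difference is of the order of machine precision
```

The same two helpers reappear, in condensed form, in the sketches given later for the CGTV iteration.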
3. The conditional gradient total variation method. In this section, we describe the conditional gradient method for solving the convex constrained optimization problem (2.7), and we give some results for this method related to our problem. We consider the function

    F_λ : K → R,    ω ↦ F_λ(ω) = ‖f + (1/λ) ∇·ω‖_X².

Hence, the convex constrained minimization problem (2.7) may be rewritten as

(3.1)    min_{ω ∈ K} F_λ(ω).

Since K given by (2.8) is a compact set in Y and F_λ is continuous on K, the problem (3.1) has a solution ω_* in K, i.e., min_{ω ∈ K} F_λ(ω) = F_λ(ω_*).
We have the following result.

PROPOSITION 3.1. The function F_λ : K → R is Gâteaux differentiable. Its Gâteaux derivative is

(3.2)    F_λ'(ω) = −(2/λ) ∇( f + (1/λ) ∇·ω ),

and the Taylor expansion of F_λ at ω ∈ K is given for all H ∈ Y such that ω + H ∈ K by

(3.3)    F_λ(ω + H) = F_λ(ω) + ⟨F_λ'(ω) | H⟩_Y + o(‖H‖_Y),    where o(‖H‖_Y) ≤ (8/λ²)‖H‖_Y².

Proof. It is straightforward that

    F_λ(ω + H) = F_λ(ω) + (2/λ)⟨ f + (1/λ)∇·ω | ∇·H ⟩_X + (1/λ²)‖∇·H‖_X².

From the duality condition (2.5), we get

    (2/λ)⟨ f + (1/λ)∇·ω | ∇·H ⟩_X = ⟨ −(2/λ)∇( f + (1/λ)∇·ω ) | H ⟩_Y.

Now we can see that for all ω ∈ K, H ∈ Y, and t ≠ 0, we have

    lim_{t→0} [ F_λ(ω + tH) − F_λ(ω) ] / t = ⟨ −(2/λ)∇( f + (1/λ)∇·ω ) | H ⟩_Y,

which shows that the Gâteaux derivative of F_λ is given by (3.2). According to (2.6) and the fact that 2ab ≤ a² + b², we obtain

    ‖∇·H‖_X² ≤ Σ_{1≤i≤n, 1≤j≤m} ( |H¹_{i,j} − H¹_{i−1,j}| + |H²_{i,j} − H²_{i,j−1}| )² ≤ 8‖H‖_Y²,

which gives the bound on the remainder term in (3.3).
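Since F_λ is a quadratic functional of ω, formula (3.2) is easy to validate numerically: a symmetric difference quotient of F_λ along any direction H must coincide with ⟨F_λ'(ω) | H⟩_Y up to rounding. The sketch below is our own check (arbitrary sizes and λ), re-implementing the small grad/div helpers so that it runs on its own.

```python
import numpy as np

def grad(u):
    d1 = np.zeros_like(u); d1[:-1, :] = u[1:, :] - u[:-1, :]
    d2 = np.zeros_like(u); d2[:, :-1] = u[:, 1:] - u[:, :-1]
    return d1, d2

def div(w1, w2):
    d = np.zeros_like(w1)
    d[0, :] = w1[0, :]; d[1:-1, :] = w1[1:-1, :] - w1[:-2, :]; d[-1, :] = -w1[-2, :]
    d[:, 0] += w2[:, 0]; d[:, 1:-1] += w2[:, 1:-1] - w2[:, :-2]; d[:, -1] += -w2[:, -2]
    return d

lam = 0.05
rng = np.random.default_rng(1)
f = rng.standard_normal((8, 9))
w1, w2 = rng.standard_normal((8, 9)), rng.standard_normal((8, 9))   # a point omega in Y
H1, H2 = rng.standard_normal((8, 9)), rng.standard_normal((8, 9))   # a direction H in Y

F = lambda a, b: np.sum((f + div(a, b) / lam) ** 2)                 # F_lambda(omega) = ||f + (1/lam) div omega||_X^2
g1, g2 = grad(-(2.0 / lam) * (f + div(w1, w2) / lam))               # F_lambda'(omega), formula (3.2)

t = 0.5   # F_lambda is quadratic, so this central difference is exact up to rounding
central = (F(w1 + t * H1, w2 + t * H2) - F(w1 - t * H1, w2 - t * H2)) / (2 * t)
pairing = np.sum(g1 * H1) + np.sum(g2 * H2)                         # <F_lambda'(omega) | H>_Y
print(central, pairing)                                             # the two numbers agree
```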
The main idea of the conditional gradient method may be summarized as follows. Suppose that the iterate matrix ω_k ∈ K at step k is given. Then the search direction d_k is obtained by solving an appropriate linear programming problem. Let

    F_λ(ω) = F_λ(ω_k) + ⟨F_λ'(ω_k) | ω − ω_k⟩_Y + o(‖ω − ω_k‖_Y)

be the first-order Taylor expansion of F_λ at ω_k. Then the problem (3.1) is approximated at each step k by the following linear optimization problem:

(3.4)    min_{ω ∈ K} g_λ(ω),

where g_λ(ω) = F_λ(ω_k) + ⟨F_λ'(ω_k) | ω − ω_k⟩_Y. Since ω_k is fixed at this stage of the algorithm, the minimization problem (3.4) is equivalent to

(3.5)    min_{ω ∈ K} ⟨F_λ'(ω_k) | ω⟩_Y.

Let ω̂_k be a solution of the problem (3.5). Since K is a convex set, the line segment [ω̂_k, ω_k] connecting ω̂_k and ω_k lies entirely inside K, and (ω̂_k − ω_k) is a feasible direction. Thus, we can perform a line search over this line segment. That is, we solve the one-dimensional problem

(3.6)    min_{0 ≤ α ≤ 1} F_λ(ω_k + α(ω̂_k − ω_k)).
k = k + 1,
and repeat this process. The conditional gradient method is summarized by Algorithm 1. Algorithm 1: 1 2
Choose a tolerance tol, an initial matrix ω0 ∈ K, and set k = 0; Solve the minimization problem of a linear function over the feasible set K: min hFλ0 (ωk )|ωiY ,
ω∈K
3 4 5
(∗)
and let ω bk be a solution to problem (∗); Compute the value: ηk = hFλ0 (ωk )|b ωk − ωk iY ; If |ηk | < tol stop else continue; Solve the one-dimensional minimization problem min Fλ (ωk + α(b ωk − ωk )),
0≤α≤1
6
(∗∗)
and let αk∗ be a solution to problem (∗∗); Update ωk+1 = ωk + αk∗ (b ωk − ωk ), set k = k + 1 and go to Step 2.
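Algorithm 1 is the classical Frank-Wolfe template: a gradient evaluation, a linear minimization oracle over K, a stopping test on the gap η_k, and an exact line search on [0, 1]. A minimal generic sketch of this loop is given below; it is our own illustration in Python/NumPy, the oracle and line search are passed in as callables, and the quadratic toy problem over the Euclidean unit ball in the usage part is purely for demonstration and is not the TV problem of this paper.

```python
import numpy as np

def conditional_gradient(grad_F, lmo, line_search, w0, tol=1e-8, kmax=500):
    """Generic conditional gradient (Frank-Wolfe) loop following Algorithm 1.

    grad_F(w)        : gradient of the objective at w
    lmo(g)           : linear minimization oracle, a minimizer of <g, w> over the feasible set K
    line_search(w, d): a step length in [0, 1] minimizing F(w + alpha * d)
    """
    w = w0
    for _ in range(kmax):
        g = grad_F(w)
        w_hat = lmo(g)                            # Step 2: solve (*)
        eta = float(np.sum(g * (w_hat - w)))      # Step 3: gap eta_k (<= 0)
        if abs(eta) < tol:                        # Step 4: stopping test
            break
        alpha = line_search(w, w_hat - w)         # Step 5: line search on [0, 1]
        w = w + alpha * (w_hat - w)               # Step 6: update
    return w

if __name__ == "__main__":
    # Toy usage: minimize ||A w - b||^2 over the Euclidean unit ball {||w||_2 <= 1}.
    rng = np.random.default_rng(0)
    A, b = rng.standard_normal((20, 10)), 5.0 * rng.standard_normal(20)
    grad_F = lambda w: 2.0 * A.T @ (A @ w - b)
    lmo = lambda g: -g / np.linalg.norm(g)        # minimizer of <g, w> over the unit ball
    def line_search(w, d):
        den = 2.0 * np.dot(A @ d, A @ d)
        alpha = -np.dot(grad_F(w), d) / den if den > 0 else 1.0
        return float(np.clip(alpha, 0.0, 1.0))
    w = conditional_gradient(grad_F, lmo, line_search, np.zeros(10))
    print(np.linalg.norm(A @ w - b))
```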
Let P_K denote the projection onto the closed convex subset K of the Hilbert space Y. We recall that the projection P_K is not linear. The following lemma may be seen in a general context.

LEMMA 3.2. Let K be a closed, convex, and bounded subset of a Hilbert space Y such that 0_Y ∈ K. Then, for any y ∈ Y with y ≠ 0_Y and for all ω ∈ K, depending on the sign of ⟨y|ω⟩_Y, we have

    ⟨y | −P_K(ŷ)⟩_Y ≤ 0 ≤ ⟨y|ω⟩_Y    or    ⟨y | P_K(−ŷ)⟩_Y ≤ ⟨y|ω⟩_Y ≤ 0,

where ŷ = α y/‖y‖_Y and α is an upper bound for K, i.e., ‖ω‖_Y ≤ α for all ω ∈ K.

Proof. For all y ∈ Y, the projection P_K(y) onto the closed convex subset K is characterized by

    ⟨y − P_K(y) | ω − P_K(y)⟩_Y ≤ 0,    ∀ω ∈ K.

As 0_Y ∈ K, it follows that ⟨y − P_K(y) | −P_K(y)⟩_Y ≤ 0. Then ⟨y | P_K(y)⟩_Y ≥ ‖P_K(y)‖_Y². In the last inequality, replacing y by ŷ implies that for all y ∈ Y we have

    ⟨y | P_K(ŷ)⟩_Y ≥ (‖y‖_Y/α) ‖P_K(ŷ)‖_Y².

Multiplying the last inequality by −1, we obtain that for all ω ∈ K,

    ⟨y | −P_K(ŷ)⟩_Y ≤ −(‖y‖_Y/α) ‖P_K(ŷ)‖_Y² ≤ −‖ω‖_Y ‖y‖_Y ‖P_K(ŷ)‖_Y²/α².

Now, if ⟨y|ω⟩_Y ≥ 0, then using the Schwarz inequality, we obtain

    ⟨y | −P_K(ŷ)⟩_Y ≤ −‖ω‖_Y ‖y‖_Y ‖P_K(ŷ)‖_Y²/α² ≤ ⟨y|ω⟩_Y ‖P_K(ŷ)‖_Y²/α² ≤ ⟨y|ω⟩_Y.

On the contrary, if ⟨y|ω⟩_Y ≤ 0, then

    ⟨y | −P_K(ŷ)⟩_Y ≤ −‖ω‖_Y ‖y‖_Y ‖P_K(ŷ)‖_Y²/α² ≤ (−⟨y|ω⟩_Y) ‖P_K(ŷ)‖_Y²/α² ≤ ⟨−y|ω⟩_Y.
Replacing y by −y, we obtain ⟨y | P_K(−ŷ)⟩_Y ≤ ⟨y|ω⟩_Y.

Let us notice that in our context, the projection P_K onto the convex set K in (2.8) is given for all y = (y¹, y²) ∈ Y by P_K(y) = ω = (ω¹, ω²) with

    ω^k_{i,j} = y^k_{i,j} / max(1, |y_{i,j}|),    k = 1, 2,  i = 1, ..., n,  j = 1, ..., m,

where |y_{i,j}| = √( |y¹_{i,j}|² + |y²_{i,j}|² ). Then, in this case, P_K(−y) = −P_K(y). The following proposition provides solutions of the problems (3.5) and (3.6).

PROPOSITION 3.3.
1. At iteration k, a solution of the problem (3.5) is given by

    ω̂_k = P_K( −√(2nm) F_λ'(ω_k) / ‖F_λ'(ω_k)‖_Y ).
2. At iteration k, a solution of the problem (3.6) is given by

(3.7)    α_k^* = α_k if 0 ≤ α_k ≤ 1,    α_k^* = 1 if α_k > 1,

where

    α_k = −λ² ⟨F_λ'(ω_k) | H_k⟩_Y / ( 2 ‖∇·H_k‖_X² )    and    H_k = ω̂_k − ω_k.
Proof. 1. The proof of item 1 is an immediate consequence of Lemma 3.2. Indeed, the closed convex set K in (2.8) is bounded, and we have ‖ω‖_Y ≤ √(2nm) for all ω ∈ K. The convex set K contains 0_Y. Let y_k = F_λ'(ω_k) and ŷ_k = √(2nm) y_k/‖y_k‖_Y. For all ω ∈ K, by Lemma 3.2, we have

    ⟨y_k | ω⟩_Y ≥ ⟨y_k | P_K(−ŷ_k)⟩_Y.

2. According to (3.3) we have

    F_λ(ω_k + αH_k) = F_λ(ω_k) + ⟨F_λ'(ω_k) | αH_k⟩_Y + o(‖αH_k‖_Y)
                    = α² (1/λ²) ‖∇·H_k‖_X² + α ⟨F_λ'(ω_k) | H_k⟩_Y + ‖f + (1/λ)∇·ω_k‖_X²
                    = a_k α² + b_k α + c_k,

where H_k = ω̂_k − ω_k, a_k = (1/λ²)‖∇·H_k‖_X², b_k = ⟨F_λ'(ω_k) | H_k⟩_Y, and c_k = ‖f + (1/λ)∇·ω_k‖_X². It follows that the minimum of the quadratic one-dimensional problem min_α F_λ(ω_k + αH_k) is attained at

    α_k = −b_k/(2a_k) = −λ² ⟨F_λ'(ω_k) | H_k⟩_Y / ( 2 ‖∇·H_k‖_X² ).

Since ω̂_k is a solution at step k of the problem (3.5), η_k = ⟨F_λ'(ω_k) | H_k⟩_Y ≤ 0 and α_k ≥ 0. It follows that the solution of the problem (3.6) is given by (3.7).

We describe the Conditional Gradient Total Variation (CGTV) method in Algorithm 2.
Algorithm 2: The Conditional Gradient Total Variation (CGTV) Algorithm.
1. Input: a tolerance tol, an integer k_max, a value λ > 0, and e_k > tol;
2. Initialization: choose an initial ω_0 ∈ K, and set u_0 = f, k = 0;
3. while k < k_max and e_k > tol do
4.   Compute g_k = −(2/λ)∇u_k and ĝ_k = √(2nm) g_k/‖g_k‖_Y;
5.   Compute ω̂_k = P_K(−ĝ_k) and set H_k = ω̂_k − ω_k;
6.   Compute h_k = ∇·H_k and α_k = −(λ²/2) ⟨g_k | H_k⟩_Y / ‖h_k‖_X²;
7.   if α_k > 1 then α_k^* = 1; else α_k^* = α_k; end
8.   Update ω_{k+1} = ω_k + α_k^* H_k, d_{k+1} = ∇·ω_{k+1}, u_{k+1} = f + (1/λ) d_{k+1}, and set k = k + 1;
9.   Compute e_k = ‖u_{k+1} − u_k‖_X / ‖u_k‖_X;
10. end
Result: Return u_λ = u_k.
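A compact sketch of Algorithm 2 is given below (Python/NumPy, not the authors' implementation; the function names, the zero initialization ω_0 = 0 ∈ K, and the safeguards against degenerate denominators are our own choices). It combines the discrete operators of Section 2, the pointwise projection P_K, the gradient g_k = F_λ'(ω_k) from (3.2), and the step size (3.7).

```python
import numpy as np

def grad(u):
    d1 = np.zeros_like(u); d1[:-1, :] = u[1:, :] - u[:-1, :]
    d2 = np.zeros_like(u); d2[:, :-1] = u[:, 1:] - u[:, :-1]
    return d1, d2

def div(w1, w2):
    d = np.zeros_like(w1)
    d[0, :] = w1[0, :]; d[1:-1, :] = w1[1:-1, :] - w1[:-2, :]; d[-1, :] = -w1[-2, :]
    d[:, 0] += w2[:, 0]; d[:, 1:-1] += w2[:, 1:-1] - w2[:, :-2]; d[:, -1] += -w2[:, -2]
    return d

def proj_K(y1, y2):
    """Pointwise projection onto K in (2.8): divide each y_{i,j} by max(1, |y_{i,j}|)."""
    s = np.maximum(1.0, np.sqrt(y1**2 + y2**2))
    return y1 / s, y2 / s

def cgtv_denoise(f, lam, tol=1e-5, kmax=500):
    """Sketch of Algorithm 2 (CGTV) for the dual problem (2.7); returns the denoised image u_lambda."""
    n, m = f.shape
    w1, w2 = np.zeros_like(f), np.zeros_like(f)     # omega_0 = 0 lies in K
    u = f.copy()                                    # u_0 = f
    for _ in range(kmax):
        g1, g2 = grad(-(2.0 / lam) * u)             # g_k = F_lambda'(omega_k), eq. (3.2)
        gnorm = np.sqrt(np.sum(g1**2) + np.sum(g2**2))
        if gnorm == 0.0:
            break
        s = np.sqrt(2.0 * n * m) / gnorm
        wh1, wh2 = proj_K(-s * g1, -s * g2)         # omega_hat_k, Proposition 3.3, item 1
        H1, H2 = wh1 - w1, wh2 - w2
        h = div(H1, H2)
        hn2 = np.sum(h**2)
        if hn2 == 0.0:
            break
        alpha = -(lam**2 / 2.0) * (np.sum(g1 * H1) + np.sum(g2 * H2)) / hn2   # eq. (3.7)
        alpha = float(np.clip(alpha, 0.0, 1.0))
        w1, w2 = w1 + alpha * H1, w2 + alpha * H2
        u_new = f + div(w1, w2) / lam               # u_{k+1} = f + (1/lambda) div(omega_{k+1})
        e = np.linalg.norm(u_new - u) / max(np.linalg.norm(u), np.finfo(float).tiny)
        u = u_new
        if e < tol:
            break
    return u
```

For a noisy image `f` stored as a float array, a call such as `u = cgtv_denoise(f, lam=0.03)` reproduces the kind of experiment reported in Section 6, with λ either hand-tuned or supplied by the selection procedure of Section 5.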
4. Convergence. We state some results on the convergence of the sequence generated by Algorithm 2.

THEOREM 4.1. Let {η_k} be the sequence given by η_k = ⟨F_λ'(ω_k) | ω̂_k − ω_k⟩_Y, where ω̂_k is a solution at step k of the problem (3.5). Then lim_{k→∞} η_k = 0.

Proof. For any α ∈ [0, 1], we set γ_k(α) = ω_k + α(ω̂_k − ω_k). According to (3.3), we get

    F_λ(γ_k(α)) − F_λ(ω_k) = α ⟨F_λ'(ω_k) | ω̂_k − ω_k⟩_Y + o(α ‖ω̂_k − ω_k‖_Y),

where

    |o(α ‖ω̂_k − ω_k‖_Y)| ≤ (8α²/λ²) ‖ω̂_k − ω_k‖_Y² ≤ 8α²C²/λ²,

and C is the diameter of the compact set K, i.e., C = max_{z,w ∈ K} ‖z − w‖_Y ≤ 2√(2mn). Then we get, for all α ∈ [0, 1] and for all positive integers k, that

(4.1)    F_λ(γ_k(α)) − F_λ(ω_k) ≤ α η_k + 8α²C²/λ².
As α_k^* is a solution of the problem (3.6) and ω_{k+1} = γ_k(α_k^*), it is clear that, for all α ∈ [0, 1] and for all positive integers k,

(4.2)    F_λ(ω_{k+1}) ≤ F_λ(γ_k(α)).

In particular, for α = 0, we obtain for all positive integers k that F_λ(ω_{k+1}) ≤ F_λ(γ_k(0)) = F_λ(ω_k). We also have for all positive integers k that F_λ(ω_k) ≥ inf_{ω ∈ K} F_λ(ω). It follows that the sequence {F_λ(ω_k)} is monotonically decreasing and bounded from below; hence the sequence {F_λ(ω_k)} is convergent. Consequently,

(4.3)    lim_{k→∞} ( F_λ(ω_k) − F_λ(ω_{k+1}) ) = 0.

From (4.1) and (4.2), we have, for all α ∈ [0, 1] and for all positive integers k,

    F_λ(ω_{k+1}) − F_λ(ω_k) ≤ α η_k + 8α²C²/λ².

Since η_k = ⟨F_λ'(ω_k) | ω̂_k − ω_k⟩_Y ≤ 0, it holds that

    F_λ(ω_k) − F_λ(ω_{k+1}) ≥ −α η_k − 8α²C²/λ² = α|η_k| − 8α²C²/λ².

For all α ∈ (0, 1] and for all positive integers k this gives

(4.4)    0 ≤ |η_k| ≤ ( F_λ(ω_k) − F_λ(ω_{k+1}) ) / α + 8αC²/λ².
According to (4.3), if we take the limit in (4.4) for k → ∞, we obtain

    0 ≤ liminf_{k→∞} |η_k| ≤ limsup_{k→∞} |η_k| ≤ 8αC²/λ²,    ∀α ∈ (0, 1].

Now, we take the limit in the last inequality for α → 0, obtaining the desired result.

THEOREM 4.2. The sequence {ω_k} generated by Algorithm 2 is a minimizing sequence, i.e., lim_{k→∞} F_λ(ω_k) = min_{ω ∈ K} F_λ(ω).

Proof. For any α ∈ [0, 1], we set γ_k(α) = ω_k + α(ω̂_k − ω_k). From Algorithm 1, we have ω_{k+1} = ω_k + α_k^*(ω̂_k − ω_k) = γ_k(α_k^*), where α_k^* is a solution of the problem (3.6). From the convexity of the function F_λ, we have

    F_λ(ω_*) − F_λ(ω_k) ≥ ⟨F_λ'(ω_k) | ω_* − ω_k⟩_Y,

where ω_* ∈ K is a solution of the problem (3.1). It follows that

    0 ≤ F_λ(ω_k) − F_λ(ω_*) ≤ −⟨F_λ'(ω_k) | ω_* − ω_k⟩_Y ≤ −min_{ω ∈ K} ⟨F_λ'(ω_k) | ω − ω_k⟩_Y = −η_k = |η_k|.

Then, we have 0 ≤ F_λ(ω_k) − F_λ(ω_*) ≤ |η_k|. By Theorem 4.1, we deduce that lim_{k→∞} F_λ(ω_k) = F_λ(ω_*). This proves the theorem.
5. Selection of the parameter λ. In the case of Tikhonov regularization, there are two well-known techniques for an optimal selection of the parameter λ: generalized cross-validation [16] and the L-curve method [17]. Unfortunately, adapting these techniques to total variation seems to be a difficult task. Nevertheless, we can adapt a method proposed in [8] by constructing a sequence {λ_ℓ} that in the limit satisfies the standard deviation noise constraint (5.2). We assume that some statistical parameters of the additive Gaussian noise (mean, variance, ...) are known or at least can be estimated. As usual, we assume here that the variance σ_η² is estimated by the value σ² given by

    σ² = (1/(mn)) ‖f − f̄‖_X² = (1/(nm)) Σ_{1≤i≤n, 1≤j≤m} ( f_{i,j} − f̄ )²,

where f̄ = (1/(nm)) Σ_{1≤i≤n, 1≤j≤m} f_{i,j} is the mean value of the observed noisy image f. The discrete problem corresponding to (1.1)–(1.2) may be written as

(5.1)    min_{u ∈ X} TV(u),

subject to

(5.2)    ‖f − u‖_X² = mnσ²,
where TV(u) is now given in the discrete form (2.3). As in the continuous case, by introducing the Lagrange multiplier, the problem (5.1)–(5.2) is equivalent to the problem (2.4). The task is to find λ > 0 such that ‖f − u_λ‖_X² = mnσ², where u_λ is the solution of (2.4) corresponding to λ. According to [8], for λ > 0, we consider the function ϕ(λ) = ‖P_{λK} f‖_X. The problem is to find λ_* such that ϕ(λ_*) = σ√(mn). We recall that P_{λK} f is the projection of f onto the convex set λK. We have u_{λ_*} = f + (1/λ_*) ∇·ω_{λ_*} = f − P_{λ_*K} f. It follows that

    ϕ(λ_*) = ‖P_{λ_*K} f‖_X = ‖f − u_{λ_*}‖_X = σ√(mn) = ‖f − f̄‖_X.

Let us now generate a sequence {λ_ℓ} such that

(5.3)    λ_{ℓ+1} = ( ‖f − u_{λ_ℓ}‖_X / ‖f − f̄‖_X ) λ_ℓ,    ℓ ≥ 0,

where the initial guess is chosen such that λ_0 > 0. The property that lim_{ℓ→+∞} λ_ℓ = λ_* > 0 was proved in [8], and consequently

    lim_{ℓ→+∞} ‖f − u_{λ_ℓ}‖_X = ‖f − f̄‖_X = σ√(mn).
Based on the sequence (5.3), we may derive the following version of our proposed algorithm.

Algorithm 3: The Conditional Gradient Total Variation (CGTV) Algorithm with automatic selection of λ.
1. Input: a tolerance tol_1 and an integer ℓ_max;
2. Initialization: choose an initial value λ_0 > 0 and r_ℓ > tol_1, and set λ_ℓ = λ_0 and ℓ = 0;
3. while ℓ < ℓ_max and r_ℓ > tol_1 do
4.   Compute u_{λ_ℓ} by using Algorithm 2;
5.   Update λ_{ℓ+1} = ( ‖f − u_{λ_ℓ}‖_X / ‖f − f̄‖_X ) λ_ℓ;
6.   Compute r_ℓ = |λ_{ℓ+1} − λ_ℓ| and set ℓ = ℓ + 1;
7. end
Result: Return u_λ.
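In code, Algorithm 3 is a short fixed-point loop around any implementation of Algorithm 2. The sketch below is our own; the denoiser is passed in as a callable so that the update (5.3) stays independent of its implementation, and the default values λ_0 = 100, tol_1 = 5 × 10^-2, ℓ_max = 100 follow the settings used later in Section 6.2.

```python
import numpy as np

def select_lambda(f, denoiser, lam0=100.0, tol1=5e-2, lmax=100):
    """Automatic selection of lambda by the fixed-point update (5.3), following Algorithm 3.

    denoiser(f, lam) is expected to return u_lambda, e.g., one run of the CGTV method (Algorithm 2).
    """
    target = np.linalg.norm(f - f.mean())                 # ||f - f_bar||_X = sigma * sqrt(mn)
    lam, u = lam0, f
    for _ in range(lmax):
        u = denoiser(f, lam)                              # u_{lambda_l} by Algorithm 2
        lam_next = lam * np.linalg.norm(f - u) / target   # update (5.3)
        r = abs(lam_next - lam)
        lam = lam_next
        if r < tol1:
            break
    return u, lam
```

Combined with the earlier sketch of Algorithm 2, a call such as `u, lam = select_lambda(f, cgtv_denoise)` mimics the experiments of Section 6.2.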
FIG. 5.1. True image (left), noisy image with SNR = 2.5 and PSNR = 19.186 (center), and denoised image (right) with PSNR = 30.714, λ = 0.034, Re(u) = 7.158 × 10^-2.
FIG. 5.2. True image (left), noisy image with SNR = 2.5 and PSNR = 15.905 (center), and denoised image (right) with PSNR = 26.265, λ = 0.026, Re(u) = 8.193 × 10^-2.
FIG. 5.3. True image (left), noisy image with SNR = 2.5 and PSNR = 11.602 (center), and denoised image (right) with PSNR = 25.040, λ = 0.014, Re(u) = 1.013 × 10^-1.
FIG. 5.4. True image (left), noisy image with SNR = 2.5 and PSNR = 17.072 (center), and denoised image (right) with PSNR = 27.206, λ = 0.031, Re(u) = 9.164 × 10^-2.
6. Numerical results. In this section, we present three numerical tests that illustrate the effectiveness of our proposed method. We test our algorithm by taking known images as matrices of size n × m with grayscale pixel values in the range [0, d], where d = 255 is the maximum possible pixel value of the image. The true image is denoted by û. We generate a noisy image f = û + η, where η is a noise matrix whose entries are normally distributed with zero mean and with a variance chosen such that the signal-to-noise ratio
(SNR) has an appropriate value in dB. We recall that the SNR is defined by

    SNR = 10 log₁₀( σ_û² / σ_η² ),

where σ_η² and σ_û² are the variances of the noise and of the original image, respectively. Therefore, in our tests, we generate an observed noisy image f by

(6.1)    f = û + η₀ × σ_û × 10^(−SNR/20),

where η₀ is a random noise matrix with zero mean and variance equal to one. The performance of the proposed method is evaluated by computing the PSNR (peak signal-to-noise ratio) of the denoised (restored) image u = u_λ (for an appropriate λ), which is defined by

    PSNR(u) = 10 log₁₀( d² n m / ‖û − u‖_F² ).

To evaluate the precision of the estimates, the following relative error is also computed:

    Re(u) = ‖û − u‖_F / ‖û‖_F,

where ‖·‖_F is the Frobenius norm.
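For completeness, the noise model (6.1) and the two quality measures above are easy to script. The following helper functions are our own sketch (Python/NumPy; the MATLAB code actually used for the experiments is not shown in the paper).

```python
import numpy as np

def add_noise(u_true, snr_db, rng=None):
    """Additive Gaussian noise scaled as in (6.1) to reach a prescribed SNR in dB."""
    rng = np.random.default_rng() if rng is None else rng
    eta0 = rng.standard_normal(u_true.shape)           # zero mean, unit variance
    return u_true + eta0 * u_true.std() * 10.0 ** (-snr_db / 20.0)

def psnr(u_true, u, d=255.0):
    """Peak signal-to-noise ratio of u with respect to the true image."""
    return 10.0 * np.log10(d**2 * u_true.size / np.sum((u_true - u) ** 2))

def rel_err(u_true, u):
    """Relative error Re(u) in the Frobenius norm."""
    return np.linalg.norm(u_true - u) / np.linalg.norm(u_true)
```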
All our computations were carried out using MATLAB 12 on a 1.7 GHz Intel(R) Core i7 with 8 GB of RAM.

6.1. Examples of image denoising. In this section, we illustrate the effectiveness of our proposed algorithm by choosing four benchmark images: the “Lena”, “Boat”, and “Man” images of size 512 × 512 and the “Cat” image of size 500 × 750. Random Gaussian noise has been added to each image to obtain a noisy image with a specific SNR. The additive noise was constructed as in (6.1). The values of the SNR were SNR = 2.5, SNR = 5, and SNR = 7.5. Table 6.1 gives the SNR of the noise for each image as well as the PSNR of the corresponding noisy image. We have applied our algorithm to each image to obtain a denoised image u_λ with an appropriate value of λ for each image. The criterion for stopping Algorithm 2 was that the relative error between two successive iterates of the approximated primal variable is less than the tolerance tol = 10^-5, with a maximum number of iterations k_max = 500. In Table 6.1 we report the PSNR(u_λ) of the denoised image u_λ as well as the relative error Re(u_λ) and the CPU-time. Figures 5.1–5.4 display the true, noisy, and denoised images for each case, respectively. In Figure 6.1, we have plotted the curve λ → PSNR(u_λ), where λ is considered as a variable in the range [0, 0.5]. We observed that the curves have a maximum PSNR value at an optimal value λ = λ_opt located in the range [0, 0.5]. The optimal value λ_opt is the value of λ for which the best result, both in terms of the maximum PSNR and in terms of the relative error, was obtained. In this example, our proposed algorithm was run for each image with the optimal value λ_opt. The optimal value was obtained by hand: the parameter λ needs to be tuned for each noisy image to get an optimal value. Based on the tests reported in Table 6.1 and many more unreported tests, we remark that our proposed algorithm works very effectively for image denoising problems, both in terms of the PSNR and in terms of the relative error. It is very fast in terms of CPU-time. However, the determination of an optimal value of the parameter λ is still a serious problem. This is a general drawback of all total variation methods. In this example, we did not use any criterion for choosing a good parameter λ. We hand-tuned the optimal value of the parameter λ by choosing the value of λ corresponding to the maximum PSNR. We notice that such a technique is not exploitable in practice, since the PSNR is computed from the true image, which is unknown in general. In the following section, we use Algorithm 3 as a remedy: using the convergent sequence given by (5.3), we may obtain a good value of the parameter λ.
TABLE 6.1. The computational results for the optimal parameter λ with SNR = 2.5, SNR = 5, and SNR = 7.5.

Image   SNR   λ       PSNR(f)   PSNR(u_λ)   Re(u_λ)         CPU-time
Lena    2.5   0.034   19.1865   30.714      7.158 × 10^-2   1.74s
Lena    5     0.046   21.686    31.838      6.290 × 10^-2   1.25s
Lena    7.5   0.069   24.186    33.056      5.467 × 10^-2   0.86s
Boat    2.5   0.026   15.905    26.265      8.193 × 10^-2   1.47s
Boat    5     0.039   18.405    28.290      7.162 × 10^-2   1.04s
Boat    7.5   0.054   20.905    28.651      6.225 × 10^-2   0.63s
Cat     2.5   0.014   11.602    25.040      1.013 × 10^-1   4.17s
Cat     5     0.019   14.102    25.957      9.115 × 10^-2   2.83s
Cat     7.5   0.029   16.602    26.924      8.156 × 10^-2   2.47s
Man     2.5   0.031   17.072    27.206      9.164 × 10^-2   1.13s
Man     5     0.044   19.572    28.290      8.089 × 10^-2   1.14s
Man     7.5   0.064   22.072    29.492      7.044 × 10^-2   0.73s
6.2. Image denoising with an automatic selection of the parameter λ. In this section, we show the efficiency of Algorithm 3 for selecting a good value of the parameter λ. The criterion for stopping Algorithm 2 consists of the tolerance tol = 10^-4 and the maximum number of iterations k_max = 200, while the tolerance in Algorithm 3 is set to tol_1 = 5 × 10^-2 with the maximum number of iterations ℓ_max = 100 and the initial value λ_0 = 100. We use the same benchmark images as in the previous section. The results obtained with Algorithm 3 are summarized in Table 6.2. We observe that there are slight differences between the PSNR results given in Table 6.1 and Table 6.2. The value of λ is here obtained as the limit of the sequence given by Algorithm 3, while in the previous section the value of the parameter λ was hand-tuned to obtain a maximum PSNR. The determination of an algorithm for finding an optimal value of the parameter λ is still a serious problem. Nevertheless, the proposed selection method gives a procedure to find a parameter that provides a good PSNR value as well as a small relative error.

6.3. Comparisons. In this section, we compare numerically, under the same conditions, our algorithm, referred to as the CGTV algorithm, with three state-of-the-art algorithms developed for TV regularization: Chambolle's projection algorithm, referred to as the CPA algorithm, the split Bregman algorithm, referred to as the SpB algorithm, and a primal-dual hybrid gradient algorithm, referred to as the PDHG algorithm.
FIG. 6.1. The PSNR curves λ → PSNR(u_λ) for the images, from left to right: Lena, Boat, and Cat, with SNR = 2.5.
TABLE 6.2. The computational results for the automatic selection of the parameter λ by Algorithm 3.

Image   SNR   λ       PSNR(f)   PSNR(u_λ)   Re(u_λ)         CPU-time
Lena    2.5   0.022   19.186    30.663      7.200 × 10^-2   1.97s
Lena    5     0.020   21.689    31.771      6.338 × 10^-2   4.31s
Lena    7.5   0.017   24.186    32.425      5.878 × 10^-2   3.46s
Boat    2.5   0.015   15.905    26.313      8.148 × 10^-2   3.92s
Boat    5     0.014   18.405    27.169      7.384 × 10^-2   2.95s
Boat    7.5   0.012   20.904    27.620      7.009 × 10^-2   3.77s
Cat     2.5   0.010   11.602    24.897      1.029 × 10^-1   3.13s
Cat     5     0.009   14.102    25.887      9.190 × 10^-2   2.67s
Cat     7.5   0.008   16.602    26.757      8.786 × 10^-2   1.97s
Man     2.5   0.017   17.072    27.227      9.142 × 10^-2   3.56s
Man     5     0.016   19.572    28.044      8.321 × 10^-2   2.49s
Man     7.5   0.014   22.072    28.442      7.949 × 10^-2   4.45s
The CPA algorithm was introduced in [8] by Chambolle as a projection algorithm for solving the problem (1.3); see also [9, 10]. The main idea behind CPA is based on a dual formulation and is related to the work in [11]. To ensure convergence of CPA, it is shown in [8] that a parameter τ must satisfy the condition 0 < τ ≤ 1/8. In [2], Aujol revisited CPA and showed that CPA is in fact a particular case of an algorithm previously proposed by Bermùdez and Moreno in [3], which is an adaptation of Uzawa's algorithm to the problem (1.3). So, the range of convergence may be extended to 0 < τ < 1/4, and the best convergence is obtained for 0.24 ≤ τ ≤ 0.249. In our tests, the value of the parameter was fixed at τ = 0.245. The split Bregman (SpB) algorithm was introduced by Goldstein and Osher in [15]; see also [18]. The main idea behind this method is the use of the alternating direction method of multipliers adapted to L1 problems. The split Bregman method is used to solve a variety of L1-regularized optimization problems and is particularly effective for problems involving TV regularization. The primal-dual hybrid gradient algorithm (PDHG) was proposed by Zhu and Chan in [23] for a broader variety of convex optimization problems; see also [13]. PDHG is an example of a first-order method, since it requires only functional and gradient evaluations. The PDHG method has been widely used in image processing. For comparing the accuracy obtained by our proposed algorithm with that of the three algorithms, we have used the three benchmark images “Lena”, “Boat”, and “Man” of size 512 × 512 from the previous sections. We generated a noisy image from a true image û by using additive Gaussian noise with specific values of the SNR as in (6.1). We used three different values, SNR = 1, SNR = 3, and SNR = 7. In order to provide a fair and unified framework for comparison, we have carefully tuned the value of the parameter λ so that the PSNR of the denoised image corresponds to the maximum PSNR obtained by one of these methods. All the algorithms are endowed with the same convergence criterion, i.e., the iterations for all algorithms were terminated when the relative error between two successive iterates of the approximated primal variable is less than the tolerance tol = 10^-5 or when a maximum of 500 iterations has been performed. In Table 6.3, for each algorithm, we report the number of iterations (iter) reached, the relative error, the PSNR, as well as the CPU-time in seconds. In Figure 6.2, we plot the curves of the relative error for each algorithm corresponding to the different images for the values SNR = 1, SNR = 3, and SNR = 7. We observe from Figure 6.2 that CGTV converges more rapidly than the CPA, SpB, and PDHG algorithms. This may also be observed from Table 6.3 by comparing the number of
FIG. 6.2. Relative error curves of the CGTV, CPA, SpB, and PDHG algorithms with SNR = 1 (left), SNR = 3 (center), and SNR = 7 (right) for the “Lena” image (top), the “Boat” image (middle), and the “Man” image (bottom).
iterations reached for each algorithm and the CPU-time. From Table 6.3, it is apparent that in most cases our proposed algorithm restored the image with fewer iterations and in less CPU-time than the other three algorithms, with a comparable visual quality confirmed by the obtained PSNR values, which are in most cases close to the PSNR values obtained by the three other algorithms. We notice that the PSNR reached by the CPA, SpB, and PDHG algorithms is still slightly better than that obtained by CGTV. The slowest algorithm in our reported tests was the CPA, which in most cases reached the maximum number of iterations. These observations were confirmed in many other tests, not reported in this table, for different values of the SNR and of the parameter λ. There are some cases where our algorithm obtains the best PSNR with rapid convergence.
FIG. 6.3. From left to right and from top to bottom: original image, noisy image, and restored images by the CGTV, CPA, SpB, and PDHG algorithms for the Lena image with SNR = 1.
For instance, in the case of the Boat image with SNR = 1, the relative error and the PSNR obtained by CGTV were Re(u_λ^CGTV) = 8.920 × 10^-2 and PSNR(u_λ^CGTV) = 25.527 with CPU-time 4.70s, while the results obtained by CPA, SpB, and PDHG were

    Re(u_λ^CPA)  = 9.071 × 10^-2,    PSNR(u_λ^CPA)  = 25.382,
    Re(u_λ^SpB)  = 9.068 × 10^-2,    PSNR(u_λ^SpB)  = 25.384,
    Re(u_λ^PDHG) = 9.070 × 10^-2,    PSNR(u_λ^PDHG) = 25.381,

where u_λ^CGTV, u_λ^CPA, u_λ^SpB, and u_λ^PDHG denote the denoised images obtained by the CGTV, CPA, SpB, and PDHG algorithms, respectively. All four algorithms are very sensitive to the value of the parameter λ. In Figure 6.3, we display the corresponding noisy and denoised images obtained by the four algorithms for the “Lena” image with SNR = 1.
7. Conclusion. In this paper we have applied the conditional gradient method to the primal-dual formulation of the total variation problem to derive a new algorithm that is competitive both in precision and in convergence. We have proved the convergence of our algorithm, and we have tested it numerically, showing its effectiveness as a method for denoising images. The value of the Lagrange multiplier may be obtained as the limit of a convergent sequence. We have compared our proposed method to three state-of-the-art algorithms developed for TV regularization, namely Chambolle's projection algorithm, the split Bregman algorithm, and a primal-dual hybrid gradient algorithm. In terms of speed and convergence, our algorithm is competitive, yielding excellent quality of the restored image.

Acknowledgements. The authors would like to thank the anonymous referees for their valuable suggestions and comments, which improved the quality of this paper.
TABLE 6.3. The computational results for the CGTV, CPA, SpB, and PDHG algorithms.

Image   SNR   λ       PSNR(f)   Method   iter   Re(u)            PSNR(u)   CPU-time
Lena    1     0.035   17.686    CGTV     53     7.816 × 10^-2    29.951    6.18s
                                CPA      500    7.529 × 10^-2    30.275    35.31s
                                SpB      128    7.531 × 10^-2    30.273    37.10s
                                PDHG     52     7.529 × 10^-2    30.275    12.85s
Lena    3     0.045   19.686    CGTV     40     7.001 × 10^-2    30.907    5.22s
                                CPA      500    6.766 × 10^-2    31.204    40.96s
                                SpB      107    6.766 × 10^-2    31.203    34.91s
                                PDHG     47     6.766 × 10^-2    31.204    11.45s
Lena    7     0.079   23.686    CGTV     29     5.728 × 10^-2    32.651    3.06s
                                CPA      500    5.498 × 10^-2    33.007    27.68s
                                SpB      69     5.499 × 10^-2    33.004    18.96s
                                PDHG     34     5.497 × 10^-2    33.007    7.33s
Boat    1     0.020   14.405    CGTV     47     8.920 × 10^-2    25.527    4.70s
                                CPA      500    9.071 × 10^-2    25.382    26.85s
                                SpB      169    9.068 × 10^-2    25.384    48.84s
                                PDHG     68     9.070 × 10^-2    25.381    15.09s
Boat    3     0.035   16.405    CGTV     21     8.107 × 10^-2    26.357    2.73s
                                CPA      500    7.858 × 10^-2    26.627    35.52s
                                SpB      84     7.855 × 10^-2    26.631    24.34s
                                PDHG     38     7.9023 × 10^-2   26.627    9.17s
Boat    7     0.060   20.405    CGTV     13     6.496 × 10^-2    28.281    1.38s
                                CPA      500    6.268 × 10^-2    28.592    31.68s
                                SpB      57     6.272 × 10^-2    28.586    16.49s
                                PDHG     29     6.268 × 10^-2    28.591    6.59s
Man     1     0.030   15.572    CGTV     49     1.003 × 10^-1    26.425    5.02s
                                CPA      500    9.789 × 10^-2    26.633    34.29s
                                SpB      111    9.781 × 10^-2    26.640    30.88s
                                PDHG     46     9.789 × 10^-2    26.633    10.71s
Man     3     0.040   17.572    CGTV     47     9.077 × 10^-2    27.289    5.68s
                                CPA      500    8.841 × 10^-2    27.518    37.60s
                                SpB      87     8.840 × 10^-2    27.518    25.17s
                                PDHG     39     8.841 × 10^-2    27.517    9.40s
Man     7     0.065   21.572    CGTV     12     7.328 × 10^-2    29.149    2.08s
                                CPA      500    7.119 × 10^-2    29.399    28.94s
                                SpB      57     7.112 × 10^-2    29.408    16.36s
                                PDHG     29     7.119 × 10^-2    29.399    6.19s
REFERENCES

[1] H. Attouch, G. Buttazzo, and G. Michaille, Variational Analysis in Sobolev and BV Spaces: Applications to PDEs and Optimization, SIAM, Philadelphia, 2014.
[2] J. Aujol, Some first-order algorithms for total variation based image restoration, J. Math. Imaging Vision, 34 (2009), pp. 307–327.
[3] A. Bermùdez and M. Moreno, Duality methods for solving variational inequalities, Comput. Math. Appl., 7 (1981), pp. 43–58.
[4] A. Bouhamidi, R. Enkhbat, and K. Jbilou, Conditional gradient Tikhonov method for a convex optimization problem in image restoration, J. Comput. Appl. Math., 255 (2014), pp. 580–592.
[5] A. Bouhamidi and K. Jbilou, Sylvester Tikhonov-regularization methods in image restoration, J. Comput. Appl. Math., 206 (2007), pp. 86–98.
[6] D. Calvetti, B. Lewis, L. Reichel, and F. Sgallari, Tikhonov regularization with nonnegative constraints, Electron. Trans. Numer. Anal., 18 (2004), pp. 153–173. http://etna.ricam.oeaw.ac.at/vol.18.2004/pp153-173.dir/pp153-173.pdf
[7] D. Calvetti, S. Morigi, L. Reichel, and F. Sgallari, Tikhonov regularization and the L-curve for large discrete ill-posed problems, J. Comput. Appl. Math., 123 (2000), pp. 423–446.
[8] A. Chambolle, An algorithm for total variation minimization and applications, J. Math. Imaging Vision, 20 (2004), pp. 89–97.
[9] A. Chambolle, Total variation minimization and a class of binary MRF models, in Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR 05), A. Rangarajan, B. Vemuri, and A. L. Yuille, eds., Lecture Notes in Computer Science vol. 3757, Springer, Berlin, 2005, pp. 136–152.
[10] A. Chambolle and T. Pock, A first-order primal-dual algorithm for convex problems with applications to imaging, J. Math. Imaging Vision, 40 (2011), pp. 120–145.
[11] T. F. Chan, G. H. Golub, and P. Mulet, A nonlinear primal-dual method for total variation-based image restoration, SIAM J. Sci. Comput., 20 (1999), pp. 1964–1977.
[12] I. Ekeland and R. Témam, Convex Analysis and Variational Problems, SIAM, Philadelphia, 1999.
[13] E. Esser, X. Zhang, and T. F. Chan, A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science, SIAM J. Imaging Sci., 3 (2010), pp. 1015–1046.
[14] M. Frank and P. Wolfe, An algorithm for quadratic programming, Naval Res. Logist. Quart., 3 (1956), pp. 95–110.
[15] T. Goldstein and S. Osher, The split Bregman algorithm for L1 regularized problems, SIAM J. Imaging Sci., 2 (2009), pp. 323–343.
[16] G. H. Golub, M. Heath, and G. Wahba, Generalized cross-validation as a method for choosing a good ridge parameter, Technometrics, 21 (1979), pp. 215–223.
[17] P. C. Hansen, Analysis of discrete ill-posed problems by means of the L-curve, SIAM Rev., 34 (1992), pp. 561–580.
[18] S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin, An iterative regularization method for total variation-based image restoration, Multiscale Model. Simul., 4 (2005), pp. 460–489.
[19] L. I. Rudin, S. Osher, and E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D, 60 (1992), pp. 259–268.
[20] O. V. Vasiliev, Optimization Methods, World Federation Publishers, Atlanta, 1996.
[21] B. Zhang, Z. Zhu, and S. Li, A modified spectral conjugate gradient projection algorithm for total variation image restoration, Appl. Math. Lett., 27 (2014), pp. 26–35.
[22] M. Zhu, S. J. Wright, and T. F. Chan, Duality-based algorithms for total-variation-regularized image restoration, Comput. Optim. Appl., 47 (2010), pp. 377–400.
[23] M. Zhu and T. F. Chan, An efficient primal-dual hybrid gradient algorithm for total variation image restoration, CAM Report 08-34, UCLA Center for Applied Mathematics, Los Angeles, 2008.