A GLOBAL MINIMIZATION ALGORITHM FOR TIKHONOV FUNCTIONALS WITH p-CONVEX (p ≥ 2) PENALTY TERMS IN BANACH SPACES

MIN ZHONG AND WEI WANG
Abstract. We extend the globally convergent TIGRA method of [32] to the computation of a minimizer of Tikhonov-type functionals with p-convex (p ≥ 2) penalty terms Θ for nonlinear forward operators in Banach spaces. The penalty Θ is allowed to be non-smooth, so as to include Lp–L1 or Lp–TV (total variation) functionals, which are significant for reconstructing special features of solutions such as sparsity and discontinuities. The proposed TIGRA-Θ method uses a dual gradient descent method in the inner iteration and linearly decreases the regularization parameter in the outer iteration. We present a global convergence analysis for the algorithm under suitable parameter selections, and convergence rate results are provided under both a priori and a posteriori stopping rules. Two numerical examples, an autoconvolution problem and a parameter identification problem, are presented to illustrate the theoretical analysis and verify the effectiveness of the method.
1. Introduction

In this paper, we are interested in solving the nonlinear ill-posed operator equation

(1.1)    F(x) = y,
where F : X → Y is a nonlinear operator between Banach spaces X and Y. Instead of exact data y, we assume that only noisy data y^δ with noise level δ are given, satisfying ∥y^δ − y∥ ≤ δ. How to use y^δ to produce a stable and accurate reconstruction of the solution of equation (1.1) is a central topic, and regularization methods must be employed. The theory of regularization in Hilbert spaces is well developed; for a good overview we refer to [13, 14, 15, 29]. When the sought solution has special features such as sparsity, piecewise continuity or discontinuities, the regularization theories in Hilbert spaces are no longer applicable. Therefore, the computation of solutions of nonlinear operator equations in Banach spaces is becoming more and more important. To this end, several of the known regularization methods for nonlinear equations have been generalized. Tikhonov regularization may be the best known method. As a solution of (1.1), one takes a minimizer of the functional

(1.2)    Φα(x) := (1/r)∥F(x) − y^δ∥^r + αΘ(x),

where 1 < r < ∞ is fixed, α > 0 is the regularization parameter and Θ(x) is a general convex function on X. With different choices of Θ, the regularized

Date: June 6, 2016.
2010 Mathematics Subject Classification. 65J20, 47J06, 47A52, 49J40.
Key words and phrases. Inverse problems, Tikhonov regularization, global minimization, parameter choice rules, p-convex (p ≥ 2) penalty terms.
scheme (1.2) can be applied to different kinds of problems. For example, the total variation (TV) penalty term allows for the reconstruction of sharp edges in images [4, 11, 22, 35], whereas the L1 penalty is known to promote sparsity [12, 36]. For Tikhonov-type regularization, the first problem is how to choose an appropriate α. For 1 < r < ∞, α is usually chosen by the Morozov discrepancy principle (MDP) [1, 3] or a sequential discrepancy principle (SDP) [2]. However, this may not always be feasible due to the nonlinearity of the operator F; even when such an α exists, the computational cost is large, since a large number of candidate parameters have to be tested. An alternative is iterative regularization methods, in which the number of iteration steps plays the role of the regularization parameter and the iteration is terminated once the Morozov discrepancy principle is satisfied. In recent decades, iterative regularization methods have been widely used for nonlinear ill-posed problems in Banach spaces, e.g., the generalized Gauss-Newton method [6], extended Landweber and iteratively regularized Gauss-Newton methods [28], the Landweber method of Kaczmarz type [20], the nonstationary iterated Tikhonov method [22] and the Levenberg-Marquardt method [21]. However, it is worth noting that most iterative regularization methods are only locally convergent, under the assumption that the initial guess incorporates available information on the sought solution; this is usually not feasible in practice. A second question remains, however: for fixed α, how can one propose a globally convergent algorithm to compute the minimizer of Φα? When F is nonlinear, due to the non-convexity of ∥F(x) − y^δ∥^r, the functional Φα might have several (possibly local) minima. For example, in nonlinear seismic inversion, a perturbation of the velocity model and the high frequency components of the wavelet lead to plenty of local minima in the objective function [8].
These local minima impede the use of iterative schemes, since the iterates will usually converge to a local minimum unless the initial guess already incorporates some information on the sought solution, or the initial model is already close to a solution. Therefore, it is necessary and interesting to propose a globally convergent algorithm to find the global minimizer of Φα with an optimal parameter α. In the classical Hilbert space setting, several attempts have been made. In [23], the computation of the minimizer is embedded into a multilevel strategy: solve the problem at a coarse level, and use the obtained solution as a starting value for the iterative solution of the problem at a finer level. Moreover, in [24] optimal logarithmic and Hölder type convergence rates are proved under respective source conditions. On the other hand, in [31, 32] the TIGRA (TIkhonov-GRAdient) method was introduced, which combines Tikhonov regularization with a gradient method; its global convergence is guaranteed by the directional convexity of the Tikhonov functional in a neighborhood of a global minimizer. Application of TIGRA to the reconstruction of the activity function in single-photon-emission computed tomography shows its effectiveness. In [25, 26, 27], a so-called two-stage iterative process was introduced for searching for the global solution of irregular nonlinear operator equations. So far, to the authors' best knowledge, global optimization in Banach spaces remains a hard task. In [18], based on an analysis of the radius of convergence, a multilevel method with a steepest descent inner iteration was proposed for solving nonlinear operator equations in Banach spaces. In order to promote sparsity of the solution, the TIGRA algorithm was extended to the sequence space X = ℓp with p < 1 [34] and 1 < p ≤ 2 [38]. In this paper, we extend the TIGRA algorithm to general Banach spaces X and Y = Lr with r > 1, together with a p-convex penalty function Θ.
In theory, we will prove a so-called pre-directional convexity property of the functional Φα(x) with penalty Θ(x) in a neighborhood of a global minimizer xα. We call the resulting method the TIGRA-Θ algorithm, which proceeds as follows:
• Fix the starting value x̄0 ∈ X (which can be far away from the sought solution) and find α0 large enough such that x̄0 belongs to the pre-directional convexity region of Φα0, see Section 3.
• For fixed αj at the j-th level, the dual gradient descent iterates with penalty Θ converge to the minimizer of Φαj, provided that the step sizes, stopping rule and break criteria are suitably chosen, see Section 4.1.
• Once the inner iteration stops, decrease α by αj+1 = q̄αj; the constant q̄ is chosen to guarantee that the last iterate belongs to the pre-directional convexity region of Φαj+1, see Section 4.2. The whole process is terminated by an a priori or a posteriori rule, see Section 4.3.

In this algorithm, we use the dual gradient descent method as the inner iteration and linearly reduce the regularization parameter α in the outer iteration by a factor q̄ < 1. In the inner routine (j fixed), we prove that the iterates {xj,k} converge towards a global minimizer of Φαj. In the outer routine, we prove that the algorithm terminates in finitely many steps under a priori and a posteriori stopping rules, respectively. Convergence rates for the TIGRA-Θ method are provided as well. The paper is organized as follows. In Section 2, we introduce preliminaries on Banach spaces and convex analysis; some basic estimates are included as well. In Section 3, we prove a so-called pre-directional convexity property of the Tikhonov-type functional. In Section 4, the TIGRA-Θ algorithm is formulated, and the main theoretical results are presented. We ensure that there exists α0 sufficiently large such that any initial value x̄0 belongs to the pre-directional convexity region of Φα0(x), and prove that the inner iterates {xj,k}k≥0 converge to a global minimizer of Φαj.
When the inner routine terminates, we prove that the iterate xj,k*(j) at the j-th level belongs to the pre-directional convexity region of Φαj+1(x), so that it can be used as the starting point of the (j+1)-th level. Moreover, we prove that the algorithm terminates in finitely many steps, and convergence rates under both a priori and a posteriori stopping rules are provided. Finally, in Section 5 some numerical simulations are presented to illustrate the effectiveness of the method.

2. Preliminaries

2.1. Concepts and properties of Banach spaces and convex analysis. In this subsection, we introduce some necessary concepts and properties related to Banach spaces and convex analysis; we refer to [39] for more details. Let X be a Banach space and let X* denote its dual space. Given x ∈ X and ξ ∈ X*, we write ⟨ξ, x⟩ = ξ(x) for the duality pairing. Let Y be another Banach space and Y* its dual. The norms in X and Y are both denoted by ∥ · ∥ for simplicity's sake. Given a convex function Θ : X → (−∞, ∞], we use D(Θ) := {x ∈ X : Θ(x) < ∞} to denote its effective domain. We call Θ proper if D(Θ) ≠ ∅. Given x ∈ X, every element of

∂Θ(x) := {ξ ∈ X* : Θ(z) − Θ(x) − ⟨ξ, z − x⟩ ≥ 0 for all z ∈ X}

is called a subgradient of Θ at x, and we let D(∂Θ) := {x ∈ D(Θ) : ∂Θ(x) ≠ ∅}. For x ∈ D(∂Θ) and ξ ∈ ∂Θ(x), we define the Bregman distance induced by Θ at x in the direction ξ as

Dξ Θ(x̄, x) := Θ(x̄) − Θ(x) − ⟨ξ, x̄ − x⟩,   for all x̄ ∈ X.
A proper, convex function Θ : X → (−∞, ∞] is called uniformly convex if there exists a continuous function h : [0, ∞) → [0, ∞), with the property that h(t) = 0 ⇒
t = 0, such that

(2.1)    Θ(λx̄ + (1 − λ)x) + λ(1 − λ) h(∥x − x̄∥) ≤ λΘ(x̄) + (1 − λ)Θ(x)

is valid for all x̄, x ∈ X and λ ∈ (0, 1). If h(t) = c0 t^p for some c0 > 0 and p ≥ 2 in (2.1), then Θ is called p-convex. It can be shown that Θ is p-convex if and only if

(2.2)    Dξ Θ(x̄, x) ≥ c0 ∥x − x̄∥^p   for all x̄ ∈ X, x ∈ D(∂Θ), ξ ∈ ∂Θ(x).
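To make these definitions concrete, the following Python sketch (our illustration, not part of the analysis) evaluates the Bregman distance for the standard p-convex penalty Θ(x) = (1/p) Σ_i |x_i|^p on R^n and spot-checks inequality (2.2); the modulus c0 = 2^{1−p}/p used below is a known constant for this particular smooth example (our assumption, not taken from the paper).

```python
import numpy as np

p = 4                      # p-convexity index (p >= 2)
c0 = 2 ** (1 - p) / p      # a modulus valid for this example (assumption)

def theta(x):
    # Theta(x) = (1/p) sum_i |x_i|^p, a standard p-convex penalty
    return np.sum(np.abs(x) ** p) / p

def grad_theta(x):
    # gradient xi with xi_i = |x_i|^{p-1} sign(x_i)
    return np.abs(x) ** (p - 1) * np.sign(x)

def bregman(xbar, x):
    # D_xi Theta(xbar, x) = Theta(xbar) - Theta(x) - <xi, xbar - x>
    return theta(xbar) - theta(x) - grad_theta(x) @ (xbar - x)

x = np.array([1.0, -0.5, 2.0])
xbar = np.array([0.3, 1.0, -1.0])
d1, d2 = bregman(xbar, x), bregman(x, xbar)
print(d1, d2)   # both nonnegative, generally unequal (asymmetry)
print(d1 >= c0 * np.sum(np.abs(x - xbar) ** p))   # inequality (2.2)
```

Note that the Bregman distance is not symmetric in its arguments, which is why the analysis below always tracks the order of the two points.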
For a proper, lower semi-continuous, convex function Θ : X → (−∞, ∞], its Legendre–Fenchel conjugate is defined by

Θ*(ξ) := sup_{x∈X} { ⟨ξ, x⟩ − Θ(x) },   ξ ∈ X*.
It is well known that Θ* is also proper, lower semi-continuous, and convex. If X is reflexive, then

(2.3)    ξ ∈ ∂Θ(x) ⇐⇒ x ∈ ∂Θ*(ξ) ⇐⇒ Θ(x) + Θ*(ξ) = ⟨ξ, x⟩.

Moreover, if Θ is p-convex with p ≥ 2, then D(Θ*) = X*, and it follows from [39, Corollary 3.5.11] that Θ* is Fréchet differentiable and its gradient ∇Θ* : X* → X satisfies

(2.4)    ∥∇Θ*(ξ1) − ∇Θ*(ξ2)∥ ≤ ( ∥ξ1 − ξ2∥/(2c0) )^{1/(p−1)},   for all ξ1, ξ2 ∈ X*.

In addition,

(2.5)    x = ∇Θ*(ξ) ⇐⇒ x = arg min_{z∈X} { Θ(z) − ⟨ξ, z⟩ }.
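For separable penalties the map ∇Θ* in (2.5) can be computed componentwise. A hedged Python sketch, assuming the toy choice Θ(x) = ν Σ_i |x_i|^p/p + a Σ_i |x_i| on R^n (our construction): the componentwise argmin in (2.5) reduces to soft-thresholding by a followed by inversion of the power nonlinearity ν|z|^{p−1} sign(z).

```python
import numpy as np

p, nu, a = 4.0, 1.0, 0.5   # illustrative parameters (assumption)

def theta(x):
    return nu * np.sum(np.abs(x) ** p) / p + a * np.sum(np.abs(x))

def grad_theta_star(xi):
    # componentwise argmin of Theta(z) - <xi, z>, cf. (2.5):
    # soft-threshold by a, then invert w = nu |z|^{p-1} sign(z)
    w = np.sign(xi) * np.maximum(np.abs(xi) - a, 0.0)
    return np.sign(w) * (np.abs(w) / nu) ** (1.0 / (p - 1))

xi = np.array([2.0, -0.3, 0.8])
x = grad_theta_star(xi)
# x should minimize Theta(z) - <xi, z>; verify against random nearby points
rng = np.random.default_rng(0)
for z in x + 0.1 * rng.standard_normal((100, 3)):
    assert theta(x) - xi @ x <= theta(z) - xi @ z + 1e-12
print(x)   # middle component is thresholded to zero since |xi_1| <= a
```

This closed-form evaluation of ∇Θ* is what makes the dual gradient descent iteration of Section 4 cheap for penalties of the form (2.6) without a TV term.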
On a Banach space Y, we consider for 1 < r < ∞ the convex function x → ∥x∥^r/r. Its subgradient at x is given by

Jr(x) := { ξ ∈ Y* : ∥ξ∥ = ∥x∥^{r−1} and ⟨ξ, x⟩ = ∥x∥^r },

which gives the duality mapping Jr : Y → 2^{Y*} of Y with gauge function t → t^{r−1}. We call Y uniformly smooth if its modulus of smoothness

ρY(s) := sup{ ∥x̄ + x∥ + ∥x̄ − x∥ − 2 : ∥x̄∥ = 1, ∥x∥ ≤ s }

satisfies lim_{s↘0} ρY(s)/s = 0. There are many examples of uniformly smooth Banach spaces, e.g., the sequence spaces ℓ^r, Lebesgue spaces L^r, Sobolev spaces W^{k,r} and Besov spaces B^{s,r} with 1 < r < ∞. It is well known that a uniformly smooth space Y is reflexive, and every duality mapping Jr (1 < r < ∞) is single valued and uniformly continuous on bounded sets. For each 1 < r < ∞, we use

△r(x̄, x) := (1/r)∥x̄∥^r − (1/r)∥x∥^r − ⟨Jr(x), x̄ − x⟩,   x̄, x ∈ Y,

to denote the Bregman distance induced by the convex function x → ∥x∥^r/r. In many practical applications, proper, weakly lower semi-continuous, uniformly convex functions can be easily constructed. For instance, let X = L^p(Ω), where 2 ≤ p < ∞ and Ω is a bounded domain in R^d. Starting from the p-convex functional

Θ0(x) := ∫_Ω |x(ω)|^p dω,
we obtain a p-convex functional

(2.6)    Θ(x) := ν ∫_Ω |x(ω)|^p dω + a ∫_Ω |x(ω)| dω + b ∫_Ω |Dx|,

where ν > 0, a, b ≥ 0 and ∫_Ω |Dx| denotes the total variation of x over Ω. With the settings a = 1, b = 0 or a = 0, b = 1, this functional includes the Lp–L1 and Lp–TV penalties.
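As a discrete illustration (our construction, not taken from the paper's numerical section), the functional (2.6) on a uniform 1-D grid with spacing h can be evaluated as follows; note that a step and a ramp with the same endpoints have the same TV but different Lp terms.

```python
import numpy as np

def penalty(x, h, nu=1.0, a=0.0, b=1.0, p=2.0):
    # discrete version of (2.6) on a uniform 1-D grid with spacing h:
    # nu*int|x|^p + a*int|x| + b*TV(x), with the TV term taken as the
    # sum of absolute forward differences
    lp_term = nu * h * np.sum(np.abs(x) ** p)
    l1_term = a * h * np.sum(np.abs(x))
    tv_term = b * np.sum(np.abs(np.diff(x)))
    return lp_term + l1_term + tv_term

h = 1.0 / 4
step = np.array([0.0, 0.0, 1.0, 1.0, 1.0])    # one jump of height 1
ramp = np.array([0.0, 0.25, 0.5, 0.75, 1.0])  # same endpoints, gradual
# TV of both profiles is 1; the Lp parts differ
print(penalty(step, h, nu=0.0, b=1.0))  # 1.0
print(penalty(ramp, h, nu=0.0, b=1.0))  # 1.0
```

The TV term's indifference between sharp and gradual transitions is exactly what allows reconstructions with sharp edges, while the Lp term keeps the functional p-convex.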
2.2. Basic assumptions and estimates. We now recall equation (1.1), where F : X → Y is a nonlinear operator between Banach spaces X and Y. Throughout this paper, we always assume that X is reflexive, Y is uniformly smooth, F is weakly closed, and a solution of equation (1.1) exists. Moreover, we make the following assumption on the penalty functional Θ.

Assumption 2.1. The function Θ : X → (−∞, ∞] is proper, weakly lower semi-continuous and p-convex with p ≥ 2, i.e., (2.1) is valid with h(t) = c0 t^p for some c0 > 0. Moreover, assume 0 ∈ D(Θ) and that there exists a constant µ > 0 such that

(2.7)    (1/(pµ)) ∥x∥^p ≤ Θ(x),   x ∈ D(Θ).

In general (1.1) has many solutions; we define the Θ-minimizing solution x† by

(2.8)    Θ(x†) = min{ Θ(x) : x ∈ D(Θ), F(x) = y }.
When X is a reflexive Banach space, by the weak closedness of F, the weak lower semi-continuity of Θ and (2.7), it is straightforward to show that x† exists. Further assumptions on the weakly closed nonlinear operator F are necessary as well.

Assumption 2.2.
(i) F is Fréchet differentiable, with Fréchet derivative denoted by F′(x). The map x → F′(x) is Lipschitz continuous, i.e., there exists L > 0 such that

∥F′(x) − F′(z)∥ ≤ L∥x − z∥,   for all x, z ∈ X.

(ii) There exists a constant c > 0 such that

∥F(x) − F(z) − F′(z)(x − z)∥ := ∥RF(x, z)∥ ≤ c Dξ Θ(x, z)

holds for all x ∈ X, z ∈ D(∂Θ) and ξ ∈ ∂Θ(z). Note that by the Lipschitz continuity of F′(x), we also have

∥RF(x, z)∥ ≤ (L/2)∥x − z∥².

(iii) For a Θ-minimizing solution x†, ∂Θ(x†) ∩ Ran(F′(x†)*) ≠ ∅, and there exists ω ∈ Y* such that

ξ† := F′(x†)* ω ∈ ∂Θ(x†),   ∥ω∥ ≤ ρ,

where the constant ρ < min{ (3r)^{(1−r)/r}/(4c), (3r)^{(1−r)/r}/(2cϑ) }, with c from (ii) and ϑ from Theorem 3.1 below.
The following result confirms the uniqueness of x†.

Lemma 2.3. Let Θ satisfy Assumption 2.1 and let F satisfy Assumption 2.2. If x† is a Θ-minimizing solution, then x† is uniquely determined.

Proof. Assume there exists another Θ-minimizing solution x̂ satisfying (2.8). Then it follows from Assumption 2.2 (ii) that

∥F′(x†)(x̂ − x†)∥ = ∥RF(x̂, x†)∥ ≤ c Dξ† Θ(x̂, x†) = −c⟨ξ†, x̂ − x†⟩.

If ξ† = F′(x†)*ω = 0, then Dξ† Θ(x̂, x†) = 0, and we conclude from the p-convexity of Θ that x̂ = x†. Otherwise, by Assumption 2.2 (iii), we obtain

∥F′(x†)(x̂ − x†)∥ ≤ −c⟨ξ†, x̂ − x†⟩ = −c⟨ω, F′(x†)(x̂ − x†)⟩ ≤ c∥ω∥∥F′(x†)(x̂ − x†)∥ ≤ cρ∥F′(x†)(x̂ − x†)∥,

which forces F′(x†)(x̂ − x†) = 0 since cρ < 1; then Dξ† Θ(x̂, x†) = 0 and hence x̂ = x†. □
We now consider a minimizer xα of the Tikhonov-type functional

(2.9)    Φα(x) := (1/r)∥F(x) − y^δ∥^r + αΘ(x).

For α > 0, the existence of xα is guaranteed by the reflexivity of X and Y, the weak closedness of F, the weak lower semi-continuity of Θ and (2.7). However, xα might not be unique when F is nonlinear; we take xα to be one of the minimizers at this stage and prove its uniqueness later (see Proposition 2.8). The first order optimality condition gives

0 ∈ (1/α) F′(xα)* Jr(F(xα) − y^δ) + ∂Θ(xα).

Since Y is assumed to be uniformly smooth, Jr is single valued, and we denote by

(2.10)    ξα := −(1/α) F′(xα)* Jr(F(xα) − y^δ) ∈ ∂Θ(xα)

the corresponding subgradient of Θ at xα. The next two lemmas provide basic estimates which will be applied in the following theoretical analysis; similar results have been established in [17, 32, 38].

Lemma 2.4. Let Θ satisfy Assumption 2.1 and F satisfy Assumption 2.2. Let x† be the Θ-minimizing solution and let xα be a minimizer of Φα(x). There hold

∥F(xα) − y^δ∥^r ≤ 3δ^r + 2r(2ρα)^{r/(r−1)},
Dξ† Θ(xα, x†) ≤ (1/(1 − cρ)) ( 3δ^r/(2rα) + (2^r ρ^r α)^{1/(r−1)} ).

Proof. Since xα is a minimizer of Φα(x), we have

(2.11)    (1/r)∥F(xα) − y^δ∥^r + αΘ(xα) − αΘ(x†) ≤ (1/r)∥F(x†) − y^δ∥^r.

Adding the term −α⟨ξ†, xα − x†⟩ to both sides and applying Assumption 2.2 (ii), (iii), it follows that

(1/r)∥F(xα) − y^δ∥^r + α Dξ† Θ(xα, x†) ≤ (1/r)∥F(x†) − y^δ∥^r − α⟨ξ†, xα − x†⟩
≤ (1/r)δ^r + α⟨ω, RF(xα, x†)⟩ + α⟨ω, y − y^δ + y^δ − F(xα)⟩
≤ (1/r)δ^r + cαρ Dξ† Θ(xα, x†) + αρ∥F(xα) − y^δ∥ + αρδ.

The application of Young's inequality

ab ≤ (ϵ a^{p1})/p1 + b^{p2}/(p2 ϵ^{p2/p1})   (a, b ≥ 0, ϵ > 0, p1, p2 > 1 with 1/p1 + 1/p2 = 1)

with ϵ := 1/2, p1 = r, p2 = r/(r−1), b = ρα and a := ∥F(xα) − y^δ∥ or a := δ yields

αρ∥F(xα) − y^δ∥ ≤ (1/(2r))∥F(xα) − y^δ∥^r + (ρα)^{r/(r−1)}/(1/2)^{1/(r−1)},
ραδ ≤ (1/(2r))δ^r + (ρα)^{r/(r−1)}/(1/2)^{1/(r−1)}.

Therefore, since Assumption 2.2 (iii) guarantees that 1 − cρ > 0, the inequality

(1/(2r))∥F(xα) − y^δ∥^r + α(1 − cρ) Dξ† Θ(xα, x†) ≤ (3/(2r))δ^r + (2ρα)^{r/(r−1)}
implies both estimates. □
Remark 2.5. To simplify the discussion, in what follows we introduce the parameter s = (3r)^{(r−1)/r} ρ and consider the values α ≥ α*, where

(2.12)    α* := (1/(2ρ)) (3/r)^{(r−1)/r} δ^{r−1} =: s̄ δ^{r−1}

with s̄ := (1/(2ρ)) (3/r)^{(r−1)/r}. For such α, the first estimate in Lemma 2.4 implies

(2.13)    ∥F(xα) − y^δ∥ ≤ (2sα)^{1/(r−1)}.
Lemma 2.6 ([3]). Let Θ satisfy Assumption 2.1 and F satisfy Assumption 2.2. Choose α = α(δ, y^δ) according to the discrepancy principle

δ ≤ ∥F(xα) − y^δ∥ ≤ τδ,   τ > 1.

Then Dξ† Θ(xα, x†) = O(δ).

Proof. Since xα is a minimizer of Φα(x), by the discrepancy principle we have

(1/r)δ^r + αΘ(xα) ≤ Φα(xα) ≤ (1/r)∥F(x†) − y^δ∥^r + αΘ(x†) ≤ (1/r)δ^r + αΘ(x†),

which shows that Θ(xα) ≤ Θ(x†). Applying Assumption 2.2 (ii) and (iii), it follows that

Dξ† Θ(xα, x†) = Θ(xα) − Θ(x†) − ⟨F′(x†)*ω, xα − x†⟩ ≤ −⟨ω, F′(x†)(xα − x†)⟩
≤ ∥ω∥∥F(xα) − y∥ + ∥ω∥∥RF(xα, x†)∥ ≤ ρ(τ + 1)δ + ρc Dξ† Θ(xα, x†).

This implies

Dξ† Θ(xα, x†) ≤ ρ(τ + 1)δ/(1 − cρ). □
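For intuition about the discrepancy principle, the following toy Python sketch (entirely our construction) works in the classical linear Hilbert special case r = p = 2, Θ(x) = ∥x∥²/2, where xα is available in closed form; it shrinks α geometrically until the residual drops below τδ, in the spirit of the sequential discrepancy principle mentioned in the introduction.

```python
import numpy as np

# Classical linear Hilbert special case r = p = 2, Theta(x) = ||x||^2/2
# (illustration only; the nonlinear Banach setting has no closed form).
A = np.diag([1.0, 0.5, 0.1, 0.01])        # ill-conditioned forward map
x_true = np.array([1.0, -1.0, 2.0, 0.5])
delta = 1e-2
noise = delta * np.array([0.6, -0.8, 0.0, 0.0])   # ||noise|| = delta
y_delta = A @ x_true + noise

def x_alpha(alpha):
    # Tikhonov minimizer (A^T A + alpha I)^{-1} A^T y_delta
    return np.linalg.solve(A.T @ A + alpha * np.eye(4), A.T @ y_delta)

# shrink alpha geometrically until ||F(x_alpha) - y_delta|| <= tau * delta
tau, q, alpha = 2.0, 0.5, 1.0
while np.linalg.norm(A @ x_alpha(alpha) - y_delta) > tau * delta:
    alpha *= q
print(alpha, np.linalg.norm(x_alpha(alpha) - x_true))
```

Even in this toy setting, each candidate α requires solving a full minimization problem, which is the computational burden the TIGRA-Θ outer iteration is designed to amortize.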
Lemma 2.7. Let Θ satisfy Assumption 2.1 and F satisfy Assumption 2.2. Let xα be a minimizer of Φα(x) and let ξα be the subgradient of Θ at xα defined in (2.10). Then, for all α ≥ α* with α* as in (2.12), there hold

∥xα∥ ≤ A := ( (pµ/(rα*)) ∥F(0) − y^δ∥^r + pµ Θ(0) )^{1/p},
∥F′(xα)∥ ≤ K := LA + ∥F′(0)∥,
∥ξα∥ ≤ B := 2sK.

Proof. Since xα is a minimizer of Φα(x), we have

(2.14)    αΘ(xα) ≤ Φα(xα) ≤ Φα(0) = (1/r)∥F(0) − y^δ∥^r + αΘ(0).

Recalling (2.7) in Assumption 2.1, for α ≥ α*,

(1/(pµ)) ∥xα∥^p ≤ (1/(rα*)) ∥F(0) − y^δ∥^r + Θ(0)
implies the first estimate. The second estimate follows directly from Assumption 2.2 (i):

∥F′(xα)∥ ≤ ∥F′(xα) − F′(0)∥ + ∥F′(0)∥ ≤ L∥xα∥ + ∥F′(0)∥.

This, together with (2.13) and the definition of ξα in (2.10), yields the last estimate. □
In the following, we prove that the distance between the minimizers xα and xᾱ corresponding to nearby values α and ᾱ is of the order |α − ᾱ|^{1/(p−1)}, so that xα is continuous with respect to α. This result has significant implications for the update step in the outer routine, where we linearly decrease the parameter α by a factor q̄ < 1. Similar results can be found in [38, Proposition 2.9].

Proposition 2.8. Let Θ satisfy Assumption 2.1 and the nonlinear operator F satisfy Assumption 2.2. For each α ≥ α*, let xα be a minimizer of Φα(x). Then xα is continuous from the right with respect to α. More precisely, for α* ≤ ᾱ ≤ α ≤ ᾱ/q̄0 with q̄0 := 4sc/(1 + 4sc) < 1, we have

(2.15)    Dξα Θ(xᾱ, xα) ≤ c0 σ^{p/(p−1)} (α − ᾱ)^{p/(p−1)},    ∥xα − xᾱ∥ ≤ σ^{1/(p−1)} (α − ᾱ)^{1/(p−1)},

where σ := 4sK/((1 − 4cs) c0 α*).

Proof. Since xᾱ is a minimizer of Φᾱ(x), we have

Φᾱ(xᾱ) − Φᾱ(xα) = (1/r)∥F(xᾱ) − y^δ∥^r − (1/r)∥F(xα) − y^δ∥^r + ᾱ(Θ(xᾱ) − Θ(xα))
= △r(F(xᾱ) − y^δ, F(xα) − y^δ) + ⟨Jr(F(xα) − y^δ), F(xᾱ) − F(xα)⟩ + ᾱ Dξα Θ(xᾱ, xα) + ᾱ⟨ξα, xᾱ − xα⟩ ≤ 0,

and the non-negativity of △r implies

ᾱ Dξα Θ(xᾱ, xα) ≤ ⟨Jr(F(xα) − y^δ), F(xα) − F(xᾱ)⟩ − ᾱ⟨ξα, xᾱ − xα⟩.

By the non-negativity of the Bregman distance, the definition of ξα in (2.10), Assumption 2.2 (ii), Lemma 2.7 and (2.13), we have

ᾱ Dξα Θ(xᾱ, xα)
≤ ⟨Jr(F(xα) − y^δ), F(xα) − F(xᾱ)⟩ + (ᾱ/α)⟨F′(xα)* Jr(F(xα) − y^δ), xᾱ − xα⟩
= −⟨Jr(F(xα) − y^δ), RF(xᾱ, xα) + F′(xα)(xᾱ − xα)⟩ + (ᾱ/α)⟨Jr(F(xα) − y^δ), F′(xα)(xᾱ − xα)⟩
= −⟨Jr(F(xα) − y^δ), RF(xᾱ, xα)⟩ + (1 − ᾱ/α)⟨Jr(F(xα) − y^δ), F′(xα)(xα − xᾱ)⟩
≤ c∥F(xα) − y^δ∥^{r−1} Dξα Θ(xᾱ, xα) + (1 − ᾱ/α) K ∥xα − xᾱ∥ ∥F(xα) − y^δ∥^{r−1}
≤ 2csα Dξα Θ(xᾱ, xα) + 2sK(α − ᾱ)∥xα − xᾱ∥.

Consequently,

(ᾱ − 2csα) Dξα Θ(xᾱ, xα) ≤ 2sK(α − ᾱ)∥xα − xᾱ∥ ≤ 2sK(α − ᾱ) c0^{−1/p} Dξα Θ(xᾱ, xα)^{1/p}.

Recalling that α* ≤ ᾱ ≤ α ≤ ᾱ/q̄0, we have

Dξα Θ(xᾱ, xα)^{(p−1)/p} ≤ ( 4sK/((1 − 4cs)ᾱ) ) c0^{−1/p} (α − ᾱ) ≤ c0^{(p−1)/p} σ (α − ᾱ),

which yields the first estimate. The second estimate follows by applying the p-convexity of Θ. □
Remark 2.9. Proposition 2.8 is valid provided that 4cs = 4c(3r)^{(r−1)/r} ρ < 1, which is guaranteed by Assumption 2.2 (iii). Moreover, it is worth noting that the proposition implies the uniqueness of xα for any α ≥ α*.

3. The pre-directional convexity property of the Tikhonov-type functional

In this section, we investigate a convexity property of the Tikhonov-type functional Φα(x) in a region related to a global minimizer xα. To this end, we consider the functions

Φα(xα + th),   t ∈ R,

with h ∈ X, ∥h∥ = 1. As a generalization of earlier work in [38], we show that there exists a set Iα := {t : |t| < rα, xα + th ∈ D(∂Θ)} such that for t1, t2 ∈ Iα and η1 ∈ ∂Θ(xα + t1 h), the quantity

RΦα(xα + t2 h, xα + t1 h) := Φα(xα + t2 h) − Φα(xα + t1 h) − ⟨F′(xα + t1 h)* Jr(F(xα + t1 h) − y^δ) + αη1, (t2 − t1)h⟩

is nonnegative.

Theorem 3.1. Let Θ satisfy Assumption 2.1 and the nonlinear operator F satisfy Assumption 2.2. Let

(3.1)    γ := (1 − 2csϑ)/2 > 0,   where ϑ = 3^{r−2} for r ≥ 2 and ϑ = 1 for 1 < r ≤ 2.

Then, for all α ≥ α* and h ∈ X with ∥h∥ = 1, we have

RΦα(xα + t2 h, xα + t1 h) ≥ γα Dη1 Θ(xα + t2 h, xα + t1 h) ≥ 0

for all t1, t2 ∈ Iα = {t : |t| < rα, xα + th ∈ D(∂Θ)} and η1 ∈ ∂Θ(xα + t1 h), where

(3.2)    rα := (1 + √2)^{−1/(r−1)} min{ (2γα/(cϑK^{r−1}))^{1/(r−1)}, (2^{r−1}γα/(cϑL^{r−1}))^{1/(2(r−1))} }.

Proof. Let α ≥ α* and h ∈ X with ∥h∥ = 1 be fixed. For t1, t2 ∈ R, define xi = xα + ti h for i = 1, 2. Firstly, the identity

(1/r)∥F(x2) − y^δ∥^r = (1/r)∥F(x1) − y^δ∥^r + ⟨Jr(F(x1) − y^δ), F(x2) − F(x1)⟩ + △r(F(x2) − y^δ, F(x1) − y^δ)

and the non-negativity of △r yield

Φα(x2) = (1/r)∥F(x2) − y^δ∥^r + αΘ(x2)
= (1/r)∥F(x1) − y^δ∥^r + ⟨Jr(F(x1) − y^δ), F′(x1)(x2 − x1) + RF(x2, x1)⟩ + △r(F(x2) − y^δ, F(x1) − y^δ) + αΘ(x2)
≥ Φα(x1) + ⟨F′(x1)* Jr(F(x1) − y^δ) + αη1, x2 − x1⟩
(3.3)      + ⟨Jr(F(x1) − y^δ), RF(x2, x1)⟩ + α(Θ(x2) − Θ(x1) − ⟨η1, x2 − x1⟩).
Due to Assumption 2.2 (ii), we have ∥RF(x2, x1)∥ ≤ c Dη1 Θ(x2, x1),
and consequently,

(3.4)    RΦα(xα + t2 h, xα + t1 h) ≥ (α − c∥F(x1) − y^δ∥^{r−1}) Dη1 Θ(x2, x1).

The next step estimates the term ∥F(x1) − y^δ∥^{r−1} by Assumption 2.2 (i), inequality (2.13) and Lemma 2.7:

∥F(x1) − y^δ∥^{r−1} = ∥F(xα) − y^δ + F′(xα)(x1 − xα) + RF(x1, xα)∥^{r−1}
≤ ( ∥F(xα) − y^δ∥ + ∥F′(xα)∥∥x1 − xα∥ + ∥RF(x1, xα)∥ )^{r−1}
≤ ( (2sα)^{1/(r−1)} + K|t1| + (L/2)|t1|² )^{r−1}
(3.5)  ≤ ϑ( 2sα + K^{r−1}|t1|^{r−1} + 2^{1−r} L^{r−1} |t1|^{2r−2} ).

Plugging (3.5) into (3.4) and recalling the p-convexity of Θ, it follows that

RΦα(x2, x1) ≥ ( α − cϑ( 2sα + K^{r−1}|t1|^{r−1} + 2^{1−r} L^{r−1} |t1|^{2r−2} ) ) Dη1 Θ(x2, x1).

With γ = (1 − 2scϑ)/2 > 0, this estimate is equivalent to

(3.6)    RΦα(x2, x1) − γα Dη1 Θ(x2, x1) ≥ p(|t1|^{r−1}) Dη1 Θ(x2, x1),

where p(|t1|^{r−1}) := γα − cϑK^{r−1}|t1|^{r−1} − 2^{1−r} cϑL^{r−1} |t1|^{2r−2}. Finally, we claim that there exists rα > 0 such that p(|t1|^{r−1}) is nonnegative for |t1| ≤ rα. In fact, writing z := |t1|^{r−1}, the quadratic function

p(z) = −2^{1−r} cϑL^{r−1} z² − cϑK^{r−1} z + γα

has two roots of different signs,

z± = ( cϑK^{r−1} ∓ √( c²ϑ²K^{2(r−1)} + 2^{3−r} cϑγα L^{r−1} ) ) / ( −2^{2−r} cϑL^{r−1} ).

In addition, one can prove that |z+| < |z−|, and

(3.7)    |z+| ≥ 2γα/((1 + √2) cϑK^{r−1})   if 2^{3−r} cϑγα L^{r−1} ≤ c²ϑ²K^{2(r−1)},
         |z+| ≥ √( 2^{r−1} γα/(cϑL^{r−1}) )/(1 + √2)   if 2^{3−r} cϑγα L^{r−1} > c²ϑ²K^{2(r−1)},

so that

|z+| ≥ (1/(1 + √2)) min{ 2γα/(cϑK^{r−1}), √( 2^{r−1} γα/(cϑL^{r−1}) ) } := rα^{r−1}.

Therefore, (3.6) and the arbitrariness of t1 complete the proof. □
Remark 3.2. We explain the proof of inequality (3.5), i.e.,

(a + b + c)^ν ≤ ϑ( a^ν + b^ν + c^ν ),   a, b, c ≥ 0, ν > 0.

It is obvious that ϑ = 1 works for ν ∈ (0, 1]. For ν > 1, the convexity of f(x) = x^ν and Jensen's inequality give

( (a + b + c)/3 )^ν ≤ ( a^ν + b^ν + c^ν )/3,

and consequently (a + b + c)^ν ≤ 3^{ν−1}( a^ν + b^ν + c^ν ).
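This elementary inequality can be spot-checked numerically; a quick Python check (ours, with an arbitrary sample exponent) over a small grid:

```python
import itertools

# check (a+b+c)^nu <= 3^(nu-1) (a^nu + b^nu + c^nu) for a sample nu > 1
nu = 3.0
grid = [0.0, 0.5, 1.0, 2.0]
for a, b, c in itertools.product(grid, repeat=3):
    lhs = (a + b + c) ** nu
    rhs = 3 ** (nu - 1) * (a ** nu + b ** nu + c ** nu)
    assert lhs <= rhs + 1e-9
print("inequality verified on the sample grid")
```

Equality holds when a = b = c, which shows the constant 3^{ν−1} cannot be improved.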
Remark 3.3. We emphasize that, due to the additional requirement that xα + th belong to D(∂Θ), the set Iα in Theorem 3.1 might not be a connected interval, so it is not accurate to define the directional convexity of the functional Φα(x) as in [38]. We will instead call Φα(x) pre-directionally convex on

Brα(xα) := { x ∈ X : ∥x − xα∥ < rα, x ∈ D(∂Θ) },

i.e., for all x1, x2 ∈ Brα(xα) and η1 ∈ ∂Θ(x1), there holds

RΦα(x2, x1) ≥ γα Dη1 Θ(x2, x1) ≥ 0.

We also note that rα is independent of h and increases with respect to α.

4. The dual gradient descent method for the Tikhonov functional

In this section we propose the TIGRA-Θ algorithm to minimize the Tikhonov-type functional (2.9). The algorithm can be described as follows.

1. Given: q̄ ∈ (0, 1), α0, and the initial pair ξ̄0 and x̄0 = ∇Θ*(ξ̄0).
2. Initialize: j = 0, ξ0,0 = ξ̄0, x0,0 = x̄0.
3. In the inner iteration at level j, take a large enough integer k̂(j) (see Theorem 4.6). For k = 0, ..., k̂(j) − 1 do

   △ξj,k = F′(xj,k)* Jr(F(xj,k) − y^δ) + αj ξj,k,
   ξj,k+1 = ξj,k − βj,k △ξj,k,
   xj,k+1 = ∇Θ*(ξj,k+1),

   and let k*(j) be the minimum of k̂(j) and the first integer k such that ∥△ξj,k∥ ≤ Cj, where Cj will be specified in Lemma 4.8.
4. In the outer iteration, set xj+1,0 = xj,k*(j), ξj+1,0 = ξj,k*(j) and αj+1 = q̄αj; then increase j → j + 1 and go to step 3.
5. Terminate the algorithm by a suitable a priori or a posteriori stopping rule.
6. Take the output xj* = xj*,k*(j*) as an approximate solution.
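The steps above can be sketched in Python on a small toy problem. Everything in this sketch is our construction, not the paper's examples or parameter theory: r = 2 (so Jr is the identity on Y = R²), Θ(x) = Σ_i |x_i|^p/p with p = 4, a hand-picked mildly nonlinear forward map, a fixed step size in place of the theoretical choices (4.3)/(4.6), and a fixed inner iteration count in place of the stopping index k*(j).

```python
import numpy as np

p = 4.0
A = np.array([[1.0, 0.3], [0.2, 0.8]])

def F(x):                     # mildly nonlinear forward operator (toy)
    return A @ x + 0.1 * x ** 3

def F_prime(x):               # its Jacobian (Frechet derivative)
    return A + np.diag(0.3 * x ** 2)

def grad_theta_star(xi):      # x = grad Theta^*(xi): invert xi = |x|^{p-1} sgn x
    return np.sign(xi) * np.abs(xi) ** (1.0 / (p - 1))

x_true = np.array([0.8, -0.5])
y_delta = F(x_true)           # noise-free data for this illustration

alpha, q_bar = 1.0, 0.5       # outer update: alpha_{j+1} = q_bar * alpha_j
xi = np.zeros(2)              # dual start; x_0 = grad Theta^*(0) = 0
for j in range(12):           # outer loop over levels j
    for k in range(300):      # inner dual gradient descent at level j
        x = grad_theta_star(xi)
        d_xi = F_prime(x).T @ (F(x) - y_delta) + alpha * xi   # in ∂Φ_α(x)
        xi = xi - 0.2 * d_xi  # fixed step size (assumption)
    alpha *= q_bar
x_rec = grad_theta_star(xi)
print(x_rec)                  # moves toward x_true as alpha decreases
```

Note that the iteration is carried out entirely in the dual variable ξ; the primal iterate is only ever recovered through ∇Θ*, exactly as in step 3 above.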
We aim to prove that the TIGRA-Θ algorithm is well defined and globally convergent. To this end, several crucial questions must be addressed. First, we show that, for each fixed j, the iterates {xj,k}k≥0 converge to the minimizer xαj of Φαj(x) as k → ∞. Next, it is necessary to discuss how to determine k̂(j) and an appropriate Cj. Then, a stopping rule for the outer routine should be provided so that the whole algorithm terminates in finitely many steps. Finally, we discuss the convergence rate of the TIGRA-Θ algorithm. We start with the following two lemmas.

Lemma 4.1. For each α ≥ α* let dα = c0 rα^p. If x ∈ D(∂Θ) and ξ ∈ ∂Θ(x) satisfy Dξ Θ(xα, x) < dα, then x ∈ Brα(xα).

Proof. For x ∈ D(∂Θ), the p-convexity of Θ implies that

∥xα − x∥ ≤ ( Dξ Θ(xα, x)/c0 )^{1/p} < ( dα/c0 )^{1/p} = rα,

which shows the conclusion. □
In order to start our algorithm, we need to ensure that the initial value x̄0 satisfies Dξ̄0 Θ(xα0, x̄0) < dα0. The following result shows that this is possible provided that α0 is sufficiently large.
Lemma 4.2. Let Θ satisfy Assumption 2.1 and the nonlinear operator F satisfy Assumption 2.2. If there exist positive constants A0, B0 such that the initial pair ξ̄0 ∈ X* and x̄0 = ∇Θ*(ξ̄0) satisfy ∥ξ̄0∥ ≤ B0 and ∥x̄0∥ ≤ A0, then there exists α0 > α* large enough such that Dξ̄0 Θ(xα0, x̄0) < dα0, and consequently x̄0 ∈ Brα0(xα0).

Proof. First we mention that, due to the p-convexity of Θ, Θ* is Fréchet differentiable and D(Θ*) = X*; this guarantees that x̄0 ∈ D(∂Θ). In addition, recalling (2.14), the Bregman distance

Dξ̄0 Θ(xα, x̄0) ≤ Θ(xα) + ∥ξ̄0∥∥xα − x̄0∥ ≤ (1/(rα*))∥F(0) − y^δ∥^r + Θ(0) + ∥ξ̄0∥(∥xα∥ + ∥x̄0∥)

is bounded for α ≥ α*, whereas dα := c0 rα^p → ∞ as α → ∞ by the definition of rα. Thus, there exists a large α0 > α* such that Dξ̄0 Θ(xα0, x̄0) < dα0, and the proof is complete. □
4.1. Convergence analysis for the inner routine. In this subsection, we analyze the dual gradient descent method for a fixed regularization parameter αj > α*. We drop the subscript j for simplicity: for k ≥ 0, define

(4.1)    ξk+1 = ξk − βk( F′(xk)* Jr(F(xk) − y^δ) + αξk ) := ξk − βk △ξk,
         xk+1 = ∇Θ*(ξk+1),

where △ξk ∈ ∂Φα(xk). It should be mentioned again that, since D(Θ*) = X*, each xk belongs to D(∂Θ) automatically. In the following, we aim to prove that, for any starting pair ξ0 and x0 = ∇Θ*(ξ0) with Dξ0 Θ(xα, x0) < dα, the iterates {xk} converge to the minimizer xα of Φα as k → ∞. To this end, we need the following lemmas.

Lemma 4.3. Let Θ satisfy Assumption 2.1 and the nonlinear operator F satisfy Assumption 2.2. Assume that xk ∈ Brα(xα) and let γ = (1 − 2csϑ)/2 > 0. Then ⟨△ξk, xk − xα⟩ ≥ γα Dξk Θ(xα, xk) for all α ≥ α*.

Proof. Since xk ∈ Brα(xα), taking t1 = ∥xk − xα∥, t2 = 0 and h = (xk − xα)/∥xk − xα∥, it follows that t1, t2 ∈ Iα. The application of Theorem 3.1 with η1 = ξk yields
RΦα(xα, xk) ≥ γα Dξk Θ(xα, xk).

Therefore, the minimality of xα yields

⟨△ξk, xk − xα⟩ ≥ γα Dξk Θ(xα, xk) + Φα(xk) − Φα(xα) ≥ γα Dξk Θ(xα, xk),
which completes the proof.
Lemma 4.4. Let Θ satisfy Assumption 2.1 and the nonlinear operator F satisfy Assumption 2.2, and let ξk and xk be the iterate pair defined by (4.1) with step sizes βk > 0. Suppose that α* ≤ α ≤ α0, αβk′ < 1 and Dξk′ Θ(xα, xk′) < dα for all 0 ≤ k′ ≤ k. Then

(4.2)    ∥ξk∥ ≤ B̄,   ∥△ξk∥ ≤ κα,

where the positive constants B̄ and κα are independent of k.
Proof. By the definition of ξk we have

∥ξk∥ ≤ (1 − αβk−1)∥ξk−1∥ + βk−1 ∥F′(xk−1)* Jr(F(xk−1) − y^δ)∥.

Referring to Assumption 2.2 (i), (ii) and Lemma 2.7, it follows that

∥F′(xk−1)∥ ≤ ∥F′(xk−1) − F′(xα)∥ + ∥F′(xα)∥ < Lrα + K,
∥F(xk−1) − y^δ∥ ≤ ∥RF(xα, xk−1)∥ + ∥F′(xk−1)(xα − xk−1)∥ + ∥F(xα) − y^δ∥ ≤ c dα + (Lrα + K) rα + (2sα)^{1/(r−1)}.

Thus

∥F′(xk−1)* Jr(F(xk−1) − y^δ)∥ ≤ (Lrα + K)( c dα + (Lrα + K) rα + (2sα)^{1/(r−1)} )^{r−1} := να.

Therefore

∥ξk∥ ≤ (1 − αβk−1)∥ξk−1∥ + βk−1 να ≤ ··· ≤ ∏_{j=0}^{k−1}(1 − αβj) ∥ξ0∥ + (1/α) ∑_{j=0}^{k−1} αβj ∏_{i=j+1}^{k−1}(1 − αβi) να ≤ ∥ξ0∥ + να/α,

where the last inequality has been proved in [37]. Note that να/α is continuous with respect to α on [α*, α0]; hence there is a constant B̄ such that ∥ξk∥ ≤ B̄. Moreover, we have

∥△ξk∥ ≤ ∥F′(xk)* Jr(F(xk) − y^δ)∥ + α∥ξk∥ ≤ 2να + α∥ξ0∥ := κα.

The proof is complete. □
The next lemma provides a condition on the step size βk so that the Bregman distance between the minimizer xα and the iterates xk decays monotonically. Similar estimates can be found in [20].

Lemma 4.5. Under the conditions of Lemma 4.4, if the step size βk is chosen in (0, β̄k] with

(4.3)    β̄k = (2c0/κα^p) ( γα Dξk Θ(xα, xk) )^{p−1},

then Dξk+1 Θ(xα, xk+1) ≤ Dξk Θ(xα, xk).

Proof. The application of the iteration (4.1) and the properties of the Legendre–Fenchel conjugate yields

Dξk+1 Θ(xα, xk+1) − Dξk Θ(xα, xk)
= Θ(xk) − Θ(xk+1) + ⟨ξk, xα − xk⟩ − ⟨ξk+1, xα − xk+1⟩
= Θ*(ξk+1) − Θ*(ξk) − ⟨ξk+1 − ξk, xα⟩
= Θ*(ξk+1) − Θ*(ξk) − ⟨ξk+1 − ξk, xk⟩ + ⟨ξk+1 − ξk, xk − xα⟩
= Θ*(ξk+1) − Θ*(ξk) − ⟨ξk+1 − ξk, xk⟩ − βk⟨△ξk, xk − xα⟩.
Due to the p-convexity of Θ, we can use (2.4) to obtain (note that x_k = ∇Θ*(ξ_k))

Θ*(ξ_{k+1}) − Θ*(ξ_k) − ⟨ξ_{k+1} − ξ_k, ∇Θ*(ξ_k)⟩
  = ∫_0^1 ⟨ξ_{k+1} − ξ_k, ∇Θ*(ξ_k + t(ξ_{k+1} − ξ_k))⟩ dt − ⟨ξ_{k+1} − ξ_k, ∇Θ*(ξ_k)⟩
  = ∫_0^1 ⟨ξ_{k+1} − ξ_k, ∇Θ*(ξ_k + t(ξ_{k+1} − ξ_k)) − ∇Θ*(ξ_k)⟩ dt
  ≤ ∥ξ_{k+1} − ξ_k∥ ∫_0^1 ∥∇Θ*(ξ_k + t(ξ_{k+1} − ξ_k)) − ∇Θ*(ξ_k)∥ dt
  ≤ (2c_0)^{−1/(p−1)} ∥ξ_{k+1} − ξ_k∥^{p/(p−1)}.

Recalling Lemma 4.3 and Lemma 4.4, it follows that

(4.4)    D_{ξ_{k+1}}Θ(x_α, x_{k+1}) ≤ D_{ξ_k}Θ(x_α, x_k) + (2c_0)^{−1/(p−1)} β_k^{p/(p−1)} ∥∆ξ_k∥^{p/(p−1)} − β_k ⟨∆ξ_k, x_k − x_α⟩
         ≤ D_{ξ_k}Θ(x_α, x_k) + (2c_0)^{−1/(p−1)} β_k^{p/(p−1)} κ_α^{p/(p−1)} − γα β_k D_{ξ_k}Θ(x_α, x_k).

Therefore, the choice of the step size β_k ∈ (0, β̄_k] guarantees

(2c_0)^{−1/(p−1)} β_k^{p/(p−1)} ∥∆ξ_k∥^{p/(p−1)} − γα β_k D_{ξ_k}Θ(x_α, x_k) ≤ 0,

and the decrease property follows consequently.
The last theorem in this subsection provides a convergence result for the dual gradient descent method with any fixed α ≥ α_*. The idea comes from [7].

Theorem 4.6. Let Θ satisfy Assumption 2.1 and the nonlinear operator F satisfy Assumption 2.2. Let α ≥ α_* be a fixed parameter and assume that D_{ξ_0}Θ(x_α, x_0) < d_α. For any ϵ_aim ∈ (0, d_α), let ϵ be the unique positive root of the equation

(4.5)    P_α ϵ^p + ϵ − ϵ_aim = 0,    with P_α = 2c_0 (γα)^p / (2^p κ_α^p).

Let {x_k}_{k≥0} and {ξ_k}_{k≥0} be the iterates generated by (4.1) with the step sizes

(4.6)    β_k = 2c_0 (γαϵ)^{p−1} / (2^{p−1} κ_α^p),

and assume β_k α < 1. Let k̂ be the first integer such that

k̂ ≥ (d_α − ϵ_aim) / (P_α ϵ^p).

Then D_{ξ_k}Θ(x_α, x_k) < ϵ_aim for all k ≥ k̂.

Proof. In the first step we prove that, under the assumption D_{ξ_0}Θ(x_α, x_0) < d_α, all subsequent iterates satisfy D_{ξ_1}Θ(x_α, x_1) < d_α, D_{ξ_2}Θ(x_α, x_2) < d_α, ···. To this end, assume D_{ξ_i}Θ(x_α, x_i) < d_α for 0 ≤ i ≤ m for some arbitrary but fixed positive integer m; we aim to show that D_{ξ_{m+1}}Θ(x_α, x_{m+1}) < d_α. Two cases will be considered. If ϵ ≤ D_{ξ_m}Θ(x_α, x_m), then the step size β_m defined by (4.6) belongs to (0, β̄_m] with β̄_m = (2c_0/κ_α^p)(γα D_{ξ_m}Θ(x_α, x_m))^{p−1} in (4.3),
and hence we may use (4.4) to obtain

(4.7)    D_{ξ_{m+1}}Θ(x_α, x_{m+1}) ≤ D_{ξ_m}Θ(x_α, x_m) + 2c_0(γαϵ)^p/(2^p κ_α^p) − 2c_0(γαϵ)^p/(2^{p−1} κ_α^p)
         = D_{ξ_m}Θ(x_α, x_m) − 2c_0(γαϵ)^p/(2^p κ_α^p) = D_{ξ_m}Θ(x_α, x_m) − P_α ϵ^p
         < D_{ξ_m}Θ(x_α, x_m) < d_α.

On the other hand, if D_{ξ_m}Θ(x_α, x_m) < ϵ, we may use the estimate (4.7) again to derive that

D_{ξ_{m+1}}Θ(x_α, x_{m+1}) < D_{ξ_m}Θ(x_α, x_m) + 2c_0(γαϵ)^p/(2^p κ_α^p) < ϵ + P_α ϵ^p = ϵ_aim < d_α.

Next, we claim that once there exists an integer k such that D_{ξ_k}Θ(x_α, x_k) < ϵ_aim, i.e., the iterate x_k enters the Bregman ball of radius ϵ_aim around x_α, then the following iterates stay in that ball. By replacing m with k, this claim can be proved by considering the two cases ϵ ≤ D_{ξ_k}Θ(x_α, x_k) and D_{ξ_k}Θ(x_α, x_k) < ϵ as in the first step, yielding D_{ξ_{k+1}}Θ(x_α, x_{k+1}) < ϵ_aim. The remaining part can be obtained by induction.

Finally, we show that for the given k̂ there holds D_{ξ_k}Θ(x_α, x_k) < ϵ_aim for all k ≥ k̂. If this is not true, then D_{ξ_{k_0}}Θ(x_α, x_{k_0}) ≥ ϵ_aim > ϵ for some k_0 ≥ k̂. By the above claim we then have D_{ξ_k}Θ(x_α, x_k) ≥ ϵ_aim for 0 ≤ k ≤ k_0. Thus β_k ∈ (0, β̄_k] for 0 ≤ k ≤ k_0, and we may use (4.4) to derive that

D_{ξ_{k_0}}Θ(x_α, x_{k_0}) ≤ D_{ξ_{k_0−1}}Θ(x_α, x_{k_0−1}) − P_α ϵ^p ≤ ··· ≤ D_{ξ_0}Θ(x_α, x_0) − k_0 P_α ϵ^p < d_α − k_0 P_α ϵ^p.

By the choice of k̂ we have

k_0 ≥ k̂ ≥ (d_α − ϵ_aim) / (P_α ϵ^p).

Consequently, D_{ξ_{k_0}}Θ(x_α, x_{k_0}) < ϵ_aim, which is a contradiction.
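Computing the root ϵ of (4.5) and the index k̂ is straightforward in practice: the map ϵ ↦ P_αϵ^p + ϵ − ϵ_aim is strictly increasing, negative at 0 and positive at ϵ_aim, so bisection on (0, ϵ_aim) finds the unique positive root. A minimal sketch; the concrete values of P_α, p, d_α and ϵ_aim below are our own assumptions for illustration, not quantities from the experiments:

```python
import math

def inner_params(P_alpha, p, d_alpha, eps_aim):
    """Return the positive root eps of P*eps^p + eps - eps_aim = 0 and the index k_hat."""
    f = lambda e: P_alpha * e**p + e - eps_aim
    lo, hi = 0.0, eps_aim            # f(0) < 0 and f(eps_aim) > 0, so the root lies here
    for _ in range(100):             # plain bisection
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    eps = 0.5 * (lo + hi)
    # first integer with k_hat >= (d_alpha - eps_aim) / (P_alpha * eps^p)
    k_hat = math.ceil((d_alpha - eps_aim) / (P_alpha * eps**p))
    return eps, k_hat

eps, k_hat = inner_params(P_alpha=2.0, p=3, d_alpha=1.0, eps_aim=1.0 / 3)
```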
4.2. Updating the regularization parameters in outer iterations. We have shown that, for a fixed regularization parameter α_j ≥ α_*, the inner iterates converge to the minimizer x_{α_j} of Φ_{α_j}, provided that x_{j,0} and ξ_{j,0} satisfy D_{ξ_{j,0}}Θ(x_{α_j}, x_{j,0}) < d_{α_j}. This can actually be achieved when the updating factor q̄ and ϵ_aim are suitably chosen.

Lemma 4.7. Let Θ satisfy Assumption 2.1 and the nonlinear operator F satisfy Assumption 2.2. Let α_0 be large enough such that D_{ξ̄_0}Θ(x_{α_0}, x̄_0) < d_{α_0}. Assume that the updating coefficient q̄ is chosen in (q̄_0, 1) with

(4.8)    max{ c_0 σ^{p/(p−1)} α_0^{p/(p−1)} (1 − q̄)^{p/(p−1)}, (B + B̄) σ^{1/(p−1)} α_0^{1/(p−1)} (1 − q̄)^{1/(p−1)} } ≤ d_{α_*}/3,

where q̄_0 and σ are defined in Proposition 2.8, and B and B̄ are defined in Lemma 2.7 and Lemma 4.4, respectively. Then for α_{j+1} = q̄α_j ≥ α_* we have

D_{ξ_{j,k̂(j)}}Θ(x_{α_{j+1}}, x_{j,k̂(j)}) < d_{α_{j+1}},

where k̂(j) represents the integer k̂ in Theorem 4.6 with ϵ_aim = d_{α_{j+1}}/3.
Proof. The existence of such q̄ is guaranteed, since the left-hand side of (4.8) tends to zero as q̄ → 1. To show the desired estimate, we write

D_{ξ_{j,k̂(j)}}Θ(x_{α_{j+1}}, x_{j,k̂(j)}) = D_{ξ_{j,k̂(j)}}Θ(x_{α_j}, x_{j,k̂(j)}) + D_{ξ_{α_j}}Θ(x_{α_{j+1}}, x_{α_j}) + ⟨ξ_{α_j} − ξ_{j,k̂(j)}, x_{α_{j+1}} − x_{α_j}⟩.

The application of Theorem 4.6 with ϵ_aim = d_{α_{j+1}}/3 gives

D_{ξ_{j,k̂(j)}}Θ(x_{α_j}, x_{j,k̂(j)}) < d_{α_{j+1}}/3.

Using α_{j+1} = q̄α_j < α_j, the condition α_{j+1} ≥ α_* and the monotonicity of d_α with respect to α, it follows that

D_{ξ_{α_j}}Θ(x_{α_{j+1}}, x_{α_j}) < c_0 σ^{p/(p−1)} α_0^{p/(p−1)} (1 − q̄)^{p/(p−1)} ≤ d_{α_*}/3 ≤ d_{α_{j+1}}/3,

and, by Lemma 2.7, Lemma 4.4 and (4.8),

⟨ξ_{α_j} − ξ_{j,k̂(j)}, x_{α_{j+1}} − x_{α_j}⟩ ≤ (B + B̄) σ^{1/(p−1)} α_0^{1/(p−1)} (1 − q̄)^{1/(p−1)} ≤ d_{α_*}/3 ≤ d_{α_{j+1}}/3.

Combining the three estimates yields the assertion.

In practice, for fixed α_j ≥ α_*, let k̄(j) be the first integer such that ∥∆ξ_{j,k}∥ ≤ C_j with a constant C_j. This break criterion was utilized as a stopping criterion in [32, 38], where the term ∥∆ξ_{j,k}∥ can be proved to converge to 0 as k goes to infinity. However, due to the lack of this convergence property in our work, we can only use it as a break condition, in which the value of C_j should be carefully chosen to ensure D_{ξ_{j,k̄(j)}}Θ(x_{α_{j+1}}, x_{j,k̄(j)}) < d_{α_{j+1}}.

Lemma 4.8. Let Θ satisfy Assumption 2.1 and the nonlinear operator F satisfy Assumption 2.2. Let α_0 be chosen large enough such that D_{ξ̄_0}Θ(x_{α_0}, x̄_0) < d_{α_0}, and assume the updating coefficient q̄ is chosen in (q̄_0, 1) satisfying (4.8). Denote by k̄(j) the first integer k satisfying

(4.9)    ∥∆ξ_{j,k}∥ ≤ C_j.

Then, for

(4.10)    C_j ≤ c_0^{1/p} γα_j (d_{α_{j+1}}/3)^{(p−1)/p}

and α_{j+1} = q̄α_j ≥ α_*, we have D_{ξ_{j,k̄(j)}}Θ(x_{α_{j+1}}, x_{j,k̄(j)}) < d_{α_{j+1}}.
Proof. We write

D_{ξ_{j,k̄(j)}}Θ(x_{α_{j+1}}, x_{j,k̄(j)}) = D_{ξ_{j,k̄(j)}}Θ(x_{α_j}, x_{j,k̄(j)}) + D_{ξ_{α_j}}Θ(x_{α_{j+1}}, x_{α_j}) + ⟨ξ_{α_j} − ξ_{j,k̄(j)}, x_{α_{j+1}} − x_{α_j}⟩.

The last two terms on the right-hand side can be estimated in the same way as in the proof of Lemma 4.7; we only need to estimate the first term. The application of Lemma 4.3 and the p-convexity of Θ yield

c_0 ∥x_{j,k̄(j)} − x_{α_j}∥^p ≤ D_{ξ_{j,k̄(j)}}Θ(x_{α_j}, x_{j,k̄(j)}) ≤ (1/(γα_j)) ⟨∆ξ_{j,k̄(j)}, x_{j,k̄(j)} − x_{α_j}⟩ ≤ (1/(γα_j)) ∥∆ξ_{j,k̄(j)}∥ ∥x_{j,k̄(j)} − x_{α_j}∥.

Consequently,

∥x_{j,k̄(j)} − x_{α_j}∥^{p−1} ≤ (1/(c_0 γα_j)) ∥∆ξ_{j,k̄(j)}∥,

and thus

D_{ξ_{j,k̄(j)}}Θ(x_{α_j}, x_{j,k̄(j)}) ≤ ( 1/(c_0^{1/p} γα_j) )^{p/(p−1)} ∥∆ξ_{j,k̄(j)}∥^{p/(p−1)} ≤ d_{α_{j+1}}/3.
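The constant in (4.10) is chosen exactly so that the chain of estimates above closes: if ∥∆ξ_{j,k̄(j)}∥ ≤ C_j with C_j equal to the right-hand side of (4.10), the resulting Bregman-distance bound is precisely d_{α_{j+1}}/3. A quick check of this algebra, with arbitrary positive values chosen only for illustration:

```python
# arbitrary positive stand-ins for c_0, gamma, alpha_j, p, d_{alpha_{j+1}}
c0, gamma, alpha_j, p, d_next = 0.7, 0.9, 0.05, 3.0, 0.6

# break constant from (4.10)
Cj = c0**(1 / p) * gamma * alpha_j * (d_next / 3)**((p - 1) / p)

# Bregman-distance bound implied by the proof of Lemma 4.8
D_bound = (Cj / (c0**(1 / p) * gamma * alpha_j))**(p / (p - 1))
print(D_bound, d_next / 3)   # the two values coincide
```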
4.3. A priori and a posteriori stopping rules. As we have proved, for any fixed α_j ≥ α_* the inner iteration yields sequences of iterates {ξ_{j,k}}_{0≤k≤k*(j)}, {x_{j,k}}_{0≤k≤k*(j)} with k*(j) = min{k̂(j), k̄(j)}. In this subsection we focus on the choice strategy for the regularization parameter α_j. We will prove that the proposed TIGRA-Θ algorithm terminates after a finite number j* of outer iteration steps, and convergence results under the a priori and a posteriori rules will be provided, respectively. For the reader's convenience, we first collect below the values of the various constants appearing in the algorithm.

Assumption 4.9.
(i) The initial parameter α_0 is sufficiently large such that D_{ξ̄_0}Θ(x_{α_0}, x̄_0) < d_{α_0}, as in Lemma 4.1 and Lemma 4.2;
(ii) the updating factor q̄ ∈ (q̄_0, 1) satisfies (4.8), as in Lemma 4.7;
(iii) the step sizes β_{j,k} are chosen according to (4.6), as in Theorem 4.6, where ϵ is the positive root of equation (4.5) with ϵ_aim = d_{α_{j+1}}/3;
(iv) the constants C_j are chosen according to (4.10), as in Lemma 4.8;
(v) k*(j) is the minimum of k̂(j) in Lemma 4.7 and k̄(j) in Lemma 4.8.

Theorem 4.10 (A priori stopping rule). Let Θ satisfy Assumption 2.1 and the nonlinear operator F satisfy Assumption 2.2. Under Assumption 4.9, the iterates {x_{j,k}} and {ξ_{j,k}} generated by the TIGRA-Θ algorithm satisfy

(4.11)    D_{ξ_{j,k}}Θ(x_{α_j}, x_{j,k}) < d_{α_j}    for 0 ≤ k ≤ k*(j) and j = 0, 1, ···.

Moreover, let j* denote the first index such that

(4.12)    α_{j*} ≥ α_*    and    α_{j*+1} = q̄α_{j*} < α_*;

then the TIGRA-Θ algorithm terminates after a finite number of steps and converges to the sought solution with the rate

∥x_{j*,k*(j*)} − x†∥ = O(δ^{1/p}).
Proof. We first show (4.11), which is trivial when j = k = 0 by the choice of α_0. For any fixed j ≥ 0, assume that D_{ξ_{j,0}}Θ(x_{α_j}, x_{j,0}) < d_{α_j}. The application of Theorem 4.6 yields D_{ξ_{j,k}}Θ(x_{α_j}, x_{j,k}) < d_{α_j} for all 0 ≤ k ≤ k̂(j); in addition, by the choice of q̄, we may use Lemma 4.7 and Lemma 4.8 to conclude that D_{ξ_{j,k*(j)}}Θ(x_{α_{j+1}}, x_{j,k*(j)}) < d_{α_{j+1}}. Noting that x_{j+1,0} = x_{j,k*(j)}, an induction argument completes the proof of (4.11).

Next we show the convergence rate. Recalling Lemma 4.1 and (4.11), it follows that ∥x_{α_j} − x_{j,k*(j)}∥ < r_{α_j} for j = 0, 1, ···. Then the application of Lemma 2.4 yields

∥x_{j*,k*(j*)} − x†∥ ≤ ∥x_{j*,k*(j*)} − x_{α_{j*}}∥ + ∥x_{α_{j*}} − x†∥ < r_{α_{j*}} + ( (1/(c_0(1 − cρ))) ( 3δ^r/(2rα_{j*}) + 2(2ρα_{j*})^{1/(r−1)} ) )^{1/p}.

Since α_* ≤ α_{j*} < α_*/q̄ with α_* = s̄δ^{r−1} and s̄ defined in Remark 2.5, the right-hand side is of order O(δ^{1/p}), which gives the claimed rate.

Assumption 4.11.
(i)-(ii) The initial parameter α_0 and the updating factor q̄ ∈ (q̄_0, 1) are chosen as in Assumption 4.9 (i)-(ii), and q̄ additionally satisfies

(4.13)    c c_0 σ^{p/(p−1)} α_0^{p/(p−1)} (1 − q̄)^{p/(p−1)} + K σ^{1/(p−1)} α_0^{1/(p−1)} (1 − q̄)^{1/(p−1)} ≤ (τ − 2)δ

for some τ > 2;
(iii) the step sizes β_{j,k} are chosen according to (4.6), as in Theorem 4.6, where ϵ is the positive root of equation (4.5) with

ϵ_aim = min{ d_{α_{j+1}}/3, c_0 ( δ/(c_0^{1/p} c + L r_{α_j} + K) )^p, 1 };

(iv) the constants C_{α_j} are chosen to be

C_{α_j} = c_0^{1/p} γα_j ( min{ d_{α_{j+1}}/3, c_0 ( δ/(c_0^{1/p} c + L r_{α_j} + K) )^p, 1 } )^{(p−1)/p}.

Theorem 4.12 (A posteriori stopping rule). Let Θ satisfy Assumption 2.1 and the nonlinear operator F satisfy Assumption 2.2. Under Assumption 4.11, the iterates x_{j,k} generated by the TIGRA-Θ algorithm remain inside the region of convergence. Moreover, if j* is defined as the first index such that the discrepancy principle

∥F(x_{j*,k*(j*)}) − y^δ∥ ≤ τδ,    τ > 2,

holds,
then the TIGRA-Θ algorithm terminates after a finite number j* of outer iteration steps and converges to the sought solution with the rate

∥x_{j*,k*(j*)} − x†∥ = O(δ^{1/p}).

Proof. We first show that the TIGRA-Θ algorithm terminates after a finite number j* of outer iteration steps. Let j̄ denote the unique index such that α_{j̄} ≥ α_* and α_{j̄+1} = q̄α_{j̄} < α_*. For any fixed j ≤ j̄, if the break condition (4.9) is not satisfied, then, referring to Theorem 4.6, it follows that

(4.14)    D_{ξ_{j,k*(j)}}Θ(x_{α_j}, x_{j,k*(j)}) ≤ min{ c_0 ( δ/(c_0^{1/p} c + L r_{α_j} + K) )^p, 1 },

(4.15)    ∥x_{α_j} − x_{j,k*(j)}∥ ≤ min{ δ/(c_0^{1/p} c + L r_{α_j} + K), (1/c_0)^{1/p} }.

On the other hand, if the break condition (4.9) is satisfied, then similarly to the proof of Lemma 4.8 we obtain

D_{ξ_{j,k*(j)}}Θ(x_{α_j}, x_{j,k*(j)}) ≤ ( 1/(c_0^{1/p} γα_j) )^{p/(p−1)} ∥∆ξ_{j,k*(j)}∥^{p/(p−1)} ≤ ( C_{α_j}/(c_0^{1/p} γα_j) )^{p/(p−1)},

and, under the choice of C_{α_j}, the estimates (4.14) and (4.15) are still valid. Therefore, for fixed j ≤ j̄, referring to Assumption 2.2 (i), (ii) and Lemma 2.7, it follows that

(4.16)    ∥F(x_{α_j}) − F(x_{j,k*(j)})∥
  ≤ ∥F(x_{α_j}) − F(x_{j,k*(j)}) + F'(x_{j,k*(j)})(x_{α_j} − x_{j,k*(j)})∥ + ∥F'(x_{j,k*(j)})∥ ∥x_{α_j} − x_{j,k*(j)}∥
  ≤ c D_{ξ_{j,k*(j)}}Θ(x_{α_j}, x_{j,k*(j)}) + (L r_{α_j} + K) ∥x_{α_j} − x_{j,k*(j)}∥
  ≤ c D_{ξ_{j,k*(j)}}^{1/p}Θ(x_{α_j}, x_{j,k*(j)}) + (L r_{α_j} + K) ∥x_{α_j} − x_{j,k*(j)}∥
  ≤ c_0^{1/p} c δ/(c_0^{1/p} c + L r_{α_j} + K) + (L r_{α_j} + K) δ/(c_0^{1/p} c + L r_{α_j} + K) = δ.

Combining this with ∥F(x_{α_j}) − y^δ∥ ≤ (3δ^r + 2r(2ρα_j)^{r/(r−1)})^{1/r} in Lemma 2.4 and q̄α_{j̄} = α_{j̄+1} < α_* := s̄δ^{r−1}, we obtain

∥F(x_{j̄,k*(j̄)}) − y^δ∥ ≤ ∥F(x_{j̄,k*(j̄)}) − F(x_{α_{j̄}})∥ + ∥F(x_{α_{j̄}}) − y^δ∥
  ≤ δ + (3δ^r + 2r(2ρα_{j̄})^{r/(r−1)})^{1/r}
  ≤ δ + (3δ^r + 2r(2ρ α_*/q̄)^{r/(r−1)})^{1/r}
  = δ + (3δ^r + 2r(2ρ s̄δ^{r−1}/q̄)^{r/(r−1)})^{1/r} < τδ.

This means x_{j̄,k*(j̄)} satisfies the discrepancy principle; thus j* ≤ j̄ and α_{j*} ≥ α_*, and the algorithm stops after finitely many steps.

In order to prove the convergence rate, we will show that x_{α_{j*}} satisfies

δ ≤ ∥F(x_{α_{j*}}) − y^δ∥ ≤ (τ + 1)δ.
Note that for the last two steps,

(4.17)    ∥F(x_{j*−1,k*(j*−1)}) − y^δ∥ > τδ,    ∥F(x_{j*,k*(j*)}) − y^δ∥ ≤ τδ.

Hence, by (4.16),

τδ < ∥F(x_{j*−1,k*(j*−1)}) − y^δ∥ ≤ ∥F(x_{α_{j*−1}}) − y^δ∥ + ∥F(x_{j*−1,k*(j*−1)}) − F(x_{α_{j*−1}})∥ ≤ ∥F(x_{α_{j*−1}}) − y^δ∥ + δ,

that is, ∥F(x_{α_{j*−1}}) − y^δ∥ > (τ − 1)δ.

We first prove ∥F(x_{α_{j*}}) − y^δ∥ ≥ δ. By contradiction, assume ∥F(x_{α_{j*}}) − y^δ∥ < δ. Using Assumption 2.2 (ii) and Lemma 2.7 again, we have

(4.18)    (τ − 2)δ < ∥F(x_{α_{j*−1}}) − y^δ∥ − ∥F(x_{α_{j*}}) − y^δ∥
  ≤ ∥F(x_{α_{j*}}) − F(x_{α_{j*−1}})∥
  ≤ ∥R_F(x_{α_{j*}}, x_{α_{j*−1}})∥ + ∥F'(x_{α_{j*−1}})∥ ∥x_{α_{j*}} − x_{α_{j*−1}}∥
  ≤ c D_{ξ_{α_{j*−1}}}Θ(x_{α_{j*}}, x_{α_{j*−1}}) + K ∥x_{α_{j*}} − x_{α_{j*−1}}∥.

The application of Proposition 2.8 yields

D_{ξ_{α_{j*−1}}}Θ(x_{α_{j*}}, x_{α_{j*−1}}) ≤ c_0 σ^{p/(p−1)} (α_{j*−1} − α_{j*})^{p/(p−1)} ≤ c_0 σ^{p/(p−1)} α_0^{p/(p−1)} (1 − q̄)^{p/(p−1)}

and

∥x_{α_{j*}} − x_{α_{j*−1}}∥ ≤ σ^{1/(p−1)} α_0^{1/(p−1)} (1 − q̄)^{1/(p−1)}.

Thus, combining the estimates in (4.18), we obtain

(τ − 2)δ < c c_0 σ^{p/(p−1)} α_0^{p/(p−1)} (1 − q̄)^{p/(p−1)} + K σ^{1/(p−1)} α_0^{1/(p−1)} (1 − q̄)^{1/(p−1)},

which contradicts the choice of q̄ in (4.13). On the other hand, combining (4.16) and (4.17) yields

∥F(x_{α_{j*}}) − y^δ∥ ≤ ∥F(x_{j*,k*(j*)}) − y^δ∥ + ∥F(x_{j*,k*(j*)}) − F(x_{α_{j*}})∥ ≤ (τ + 1)δ.

Now Lemma 2.6 can be utilized, yielding ∥x_{α_{j*}} − x†∥ = O(δ^{1/p}). Combining (4.15) with ∥x_{j*,k*(j*)} − x_{α_{j*}}∥ = O(δ), the conclusion is obtained by

∥x_{j*,k*(j*)} − x†∥ ≤ ∥x_{j*,k*(j*)} − x_{α_{j*}}∥ + ∥x_{α_{j*}} − x†∥ = O(δ^{1/p}).
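To make the structure of the method concrete, the nested loops analyzed in this section (inner dual gradient descent (4.1), outer geometric decrease α_{j+1} = q̄α_j with warm start) can be sketched on a deliberately simple toy problem: a linear forward operator given by a matrix A, r = p = 2 and Θ = ½∥·∥², so that J_r and ∇Θ* are identities and the inner iteration reduces to gradient descent on ½∥Ax − y^δ∥² + (α/2)∥x∥². Every concrete number below (operator, step sizes, stopping level, iteration counts) is an illustrative assumption, not the paper's setting:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 20)) / 5.0          # toy linear forward operator F(x) = A x
x_true = rng.standard_normal(20)
delta = 1e-3
y_delta = A @ x_true + delta * rng.standard_normal(30) / np.sqrt(30)

L2 = np.linalg.norm(A, 2) ** 2                   # Lipschitz constant of x -> A^T(Ax - y)
alpha, q_bar, alpha_star = 10.0, 0.7, 1e-6       # alpha_0, updating factor, final level
xi = np.zeros(20)                                # xi_0; for Theta = 0.5||.||^2, x = xi

while alpha > alpha_star:                        # outer loop: alpha_{j+1} = q_bar * alpha_j
    beta = min(1.0 / (L2 + alpha), 0.99 / alpha) # step size, keeping beta * alpha < 1
    for _ in range(200):                         # inner dual gradient descent (4.1)
        x = xi                                   # x_k = grad Theta^*(xi_k) = xi_k here
        Delta_xi = A.T @ (A @ x - y_delta) + alpha * xi
        xi = xi - beta * Delta_xi                # warm start carries over to the next alpha
    alpha *= q_bar

rel_err = np.linalg.norm(xi - x_true) / np.linalg.norm(x_true)
print(rel_err)
```

With the quadratic penalty the toy loop drives the iterate toward the least-squares solution as α decreases; it illustrates only the algorithmic skeleton, not the non-smooth p-convex setting analyzed above.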
5. Numerical examples

In this section, we present numerical simulations to test the performance of our method by considering a nonlinear auto-convolution problem and a parameter identification problem.

5.1. The nonlinear auto-convolution problem. Our first example is the nonlinear ill-posed auto-convolution problem, where the forward operator is given by

(5.1)    F(x)(s) = ∫_0^s x(s − t) x(t) dt,    s ∈ [0, 1],    x ∈ L²[0, 1].

This problem has applications in spectroscopy [5] and stochastics [30]. The properties of the auto-convolution operator can be found in [16]. As an operator from L²[0, 1] → L²[0, 1], it has been shown in [3] that F is continuous and weakly sequentially closed on the domain

D_+ := {x ∈ L²[0, 1] : x(t) ≥ 0 a.e. t ∈ [0, 1]}.
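On a uniform grid the forward operator (5.1) can be discretized with a rectangle rule: the discrete auto-convolution is the first N entries of the full convolution of the sample vector with itself, scaled by the mesh size. A minimal sketch; the discretization details here are our own illustration, not necessarily those used in the experiments below:

```python
import numpy as np

def autoconv(x, h):
    """Rectangle-rule discretization of F(x)(s) = int_0^s x(s-t) x(t) dt."""
    return h * np.convolve(x, x)[: len(x)]

def autoconv_deriv(x, v, h):
    """Directional derivative F'(x)v = 2 x * v (truncated discrete convolution)."""
    return 2.0 * h * np.convolve(x, v)[: len(x)]

N = 100
h = 1.0 / N
s = np.linspace(0, 1, N, endpoint=False)
x = np.ones(N)
# for x == 1, F(x)(s) = s; the rectangle rule reproduces this up to O(h)
print(np.max(np.abs(autoconv(x, h) - s)))
```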
Moreover, the Fréchet derivative F'(x), given by

[F'(x)h](s) = 2[x ∗ h](s) = 2 ∫_0^s x(s − t) h(t) dt,    s ∈ [0, 1],    h ∈ L²[0, 1],

is Lipschitz continuous, and its adjoint is given by

F'(x)* h = 2(x ∗ h̃)˜,    with h̃(t) = h(1 − t).

For p = 2, applying the 2-convexity of Θ, the estimate

∥R_F(x, z)∥ ≤ (L/2)∥x − z∥² ≤ (L/(2c_0)) D_ξΘ(x, z)

holds for all x ∈ X, z ∈ X ∩ D(∂Θ) and ξ ∈ ∂Θ(z). Therefore, Assumption 2.2 (i)-(ii) can be fully verified; however, the condition in Assumption 2.2 (iii) is not known to hold. Nevertheless, the TIGRA-Θ algorithm performs well.

In the experiment, the exact solution x† is assumed to be piecewise constant, and we use y^δ satisfying ∥y^δ − y∥_{L²[0,1]} = δ with different noise levels to do the reconstruction. The functions are sampled at N = 100 equispaced points in [0, 1]. When applying the TIGRA-Θ algorithm, we choose

(5.2)    Θ(x) = (1/(2µ)) ∫_0^1 |x(t)|² dt + |x|_{TV},

where |x|_{TV} = ∑_{j=1}^{N−1} |x_{j+1} − x_j| denotes the discrete one-dimensional total variation of x. The implementation of the TIGRA-Θ algorithm requires solving the minimization problem

(5.3)    x = arg min_{z ∈ L²[0,1]} { Θ(z) − ⟨ξ, z⟩_{L²[0,1]} }

for arbitrary ξ ∈ L²[0, 1]. For Θ given in (5.2), this minimization problem is equivalent to

x = arg min_{z ∈ L²[0,1]} { (1/(2µ)) ∥z − µξ∥²_{L²[0,1]} + TV(z) },

which is a variational denoising problem [35]. In our computations, we use the fast iterative shrinkage-thresholding algorithm (FISTA) of [9, 10] to solve this minimization problem.

To verify the global convergence behavior of the TIGRA-Θ algorithm, for fixed µ = 5 we consider various randomly chosen initial guesses ξ̄_0 = γ × rand(N, 1); by choosing different amplification factors γ, some initial guesses can be far away from the sought solution. We choose the initial regularization parameter α_0 = 10⁴, and it is updated by the factor q̄ = 0.7. As suggested in Theorem 4.6, in the inner iteration for fixed j the dual gradient descent Algorithm 1 is utilized; the step-size parameter is selected by β_k = α_j/∥∆ξ_{j,k}∥², and in order to guarantee β_k α_j < 1 we add a correction factor, i.e., β̃_k = min{β_k, 1/α_j}. The inner routine is stopped by the break rule with C_j = 3α_j or by the maximum iteration number k_max = 40000. Although ∥∆ξ_k∥ cannot be proved to tend to zero as k → ∞ in the present work, the numerical tests suggest this behavior, since the break condition is triggered in every inner routine. The outer routine is terminated by the a posteriori choice rule, i.e., the discrepancy principle with τ = 1.05.

Table 1 records, for the different noise levels and initial values, the regularization parameter, the number of outer iterations j*, and the relative errors e* = ∥x_{j*,k*(j*)} − x†∥/∥x†∥. It is obvious that the results are not sensitive to the initial guesses; thus our algorithm is indeed globally convergent. In Figure 1 we plot the results of the TIGRA-Θ algorithm.
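The variational denoising problem above, min_z (1/(2µ))∥z − µξ∥² + TV(z), is equivalent (after multiplying by µ) to min_z ½∥z − b∥² + λ|z|_TV with b = µξ and λ = µ, and can be solved by FISTA applied to the dual of the TV term, in the spirit of [10]: the dual variable w lives on the grid edges, the primal solution is recovered as z = b − Dᵀw, and each FISTA step is a projected gradient step under the box constraint ∥w∥_∞ ≤ λ. The following is our own minimal 1D sketch under these assumptions, not the authors' code:

```python
import numpy as np

def tv_denoise_fista(b, lam, iters=500):
    """Solve min_z 0.5*||z-b||^2 + lam*sum|z_{i+1}-z_i| via FISTA on the dual."""
    D = lambda z: np.diff(z)                       # forward differences, length N-1
    Dt = lambda w: np.concatenate(([-w[0]], -np.diff(w), [w[-1]]))  # adjoint of D
    w = np.zeros(len(b) - 1)                       # dual variable on the grid edges
    v, t = w.copy(), 1.0
    L = 4.0                                        # Lipschitz bound for ||D D^T||
    for _ in range(iters):
        grad = D(Dt(v) - b)                        # gradient of 0.5*||D^T v - b||^2
        w_new = np.clip(v - grad / L, -lam, lam)   # projected gradient step
        t_new = 0.5 * (1 + np.sqrt(1 + 4 * t * t))
        v = w_new + ((t - 1) / t_new) * (w_new - w)  # FISTA momentum
        w, t = w_new, t_new
    return b - Dt(w)                               # primal solution z = b - D^T w

# demo: denoise a noisy piecewise-constant signal
b = np.concatenate([np.zeros(25), np.ones(25)]) + 0.05 * np.random.default_rng(2).standard_normal(50)
z = tv_denoise_fista(b, 0.1)
```

For a constant input the TV term is already zero, so the routine returns the input unchanged; for noisy data it trades a small quadratic misfit for a much smaller total variation.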
    δ    | ∥ξ̄_0∥ | running time | j* |    α_{j*}    |  e*_{j*}
  0.01   |    5   |    19.6s     | 46 | 1.07×10^{-3} | 0.114309
  0.01   |   50   |    21.7s     | 46 | 1.07×10^{-3} | 0.098860
  0.01   |  100   |    37.9s     | 47 | 7.49×10^{-4} | 0.124156
  0.01   |  500   |    20.3s     | 45 | 1.52×10^{-3} | 0.049882
  0.005  |    5   |    35.3s     | 48 | 5.24×10^{-4} | 0.025980
  0.005  |   50   |    27.5s     | 48 | 5.24×10^{-4} | 0.056078
  0.005  |  100   |    58.3s     | 49 | 3.67×10^{-4} | 0.046869
  0.005  |  500   |    2m15s     | 48 | 5.24×10^{-4} | 0.049882
  0.001  |    5   |    1m27s     | 53 | 8.81×10^{-5} | 0.005363
  0.001  |   50   |    1m19s     | 54 | 6.17×10^{-5} | 0.005694
  0.001  |  100   |    1m43s     | 53 | 8.81×10^{-5} | 0.008470
  0.001  |  500   |    2m39s     | 51 | 1.80×10^{-4} | 0.010283

Table 1. Results for the TIGRA-Θ algorithm with TV penalty, p = r = 2.
It is clear that the piecewise constant property of the sought solution is significantly reconstructed.

5.2. The parameter identification problem. We consider the identification of the parameter c in the boundary value problem

(5.4)    −∆u + cu = f in Ω;    u = g on ∂Ω,

from measurements of u in Ω, where Ω ⊂ R^d (d = 1, 2, 3) is a bounded domain with a Lipschitz boundary, f ∈ H^{−1}(Ω) and g ∈ H^{1/2}(∂Ω). For each c in

D = {c ∈ L²(Ω) : ∥c − ĉ∥_{L²(Ω)} ≤ ρ_0 for some ĉ ≥ 0, a.e.},

(5.4) has a unique solution u = u(c) ∈ H¹(Ω). By the embedding H¹(Ω) ↪ L^r(Ω), we can define the map F : D ⊂ L²(Ω) → L^r(Ω) by F(c) = u(c) for any 1 < r < ∞. Therefore, we consider the inverse problem of identifying c ∈ L²(Ω) from an L^r measurement of u. It has been shown in [19, 28] that the operator F is weakly closed. Moreover, F is Fréchet differentiable; the Fréchet derivative and its Banach space adjoint are given by

F'(c)h = −A(c)^{−1}(h F(c)),    h ∈ L²(Ω),
F'(c)*ω = −u(c) A(c)^{−1}ω,    ω ∈ L^{r'}(Ω),

where r' is the conjugate exponent of r, and A(c) : H² ∩ H¹_0 → L² is defined by A(c)u = −∆u + cu. Recall that in the space L^r(Ω) with r > 1, the duality mapping J_r : L^r(Ω) → L^{r'}(Ω) is given by

J_r(φ) := |φ|^{r−1} sign(φ),    φ ∈ L^r(Ω).

For the one-dimensional parameter identification problem, it can be proved that the Fréchet derivative of F is Lipschitz continuous; thus Assumption 2.2 (i) and (ii) can be verified. However, for the higher-dimensional problem (d = 2, 3), to the authors' best knowledge there is no reference providing a rigorous proof of the Lipschitz continuity of F'(c). In [33], the author proved that, if r = 2, Assumption 2.2 (ii) is locally valid.
[Figure 1 about here. Panels: (a) TV reconstruction with γ = 5, ∥x̄_0 − x†∥ ≈ 100 (53 iterations, 1m26s); (b) TV reconstruction with γ = 50, ∥x̄_0 − x†∥ ≈ 10³ (54 iterations, 1m19s); (c) TV reconstruction with γ = 100, ∥x̄_0 − x†∥ ≈ 3 × 10³ (53 iterations, 1m43s); (d) TV reconstruction with γ = 500, ∥x̄_0 − x†∥ ≈ 10⁴ (51 iterations, 2m39s). Each panel compares the current reconstruction x with the exact solution x†.]
Figure 1. Comparison between the exact solution and the TV reconstructions obtained by the TIGRA-Θ algorithm with various initial guesses, δ = 0.001.

In our numerical simulations, we first consider the one-dimensional problem with Ω = [0, 1], and the sought solution c† is piecewise constant, defined as

c†(x) = 1 if x ∈ [0.35, 0.45];    c†(x) = 0.5 if x ∈ [0.7, 0.8];    c†(x) = 0 else.

We assume u(c†) = 1 + 5x, and the noisy u^δ(c†) satisfies ∥u − u^δ∥_{L^r[0,1]} ≤ δ with the choice δ = 0.001. The functions are sampled at N = 100 equispaced points in [0, 1]. Besides the TV penalty term (5.2), we also test the L¹ penalty term, i.e.,

(5.5)    Θ(c) = (1/(2µ)) ∫_0^1 |c(x)|² dx + ∫_0^1 |c(x)| dx.

For such Θ, the solution of the minimization problem (5.3) has the explicit form

c(x) = µ sign(ξ(x)) max{|ξ(x)| − 1, 0},    x ∈ [0, 1].
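This explicit minimizer is pointwise soft-thresholding of ξ followed by scaling with µ; a one-line implementation and a spot check:

```python
import numpy as np

def l1_prox(xi, mu):
    """Minimizer of Theta(c) - <xi, c> for Theta in (5.5): c = mu*sign(xi)*max(|xi|-1, 0)."""
    return mu * np.sign(xi) * np.maximum(np.abs(xi) - 1.0, 0.0)

xi = np.array([-2.5, -0.5, 0.0, 0.3, 1.0, 4.0])
print(l1_prox(xi, 10.0))   # values of |xi| <= 1 are clipped to zero, promoting sparsity
```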
The parameters q̄, β_k, C_j, τ are chosen to be the same as in the first example, and we choose a smaller initial parameter α_0 = 100. For fixed µ = 10 and initial amplification factor γ = 5, we plot the reconstruction results, for different r, in Figure 2.

[Figure 2 about here. Panels: (a) L¹ reconstruction with r = 2, ∥c̄_0 − c†∥ ≈ 200 (42 iterations, 7.5s); (b) TV reconstruction with r = 2, ∥c̄_0 − c†∥ ≈ 250 (46 iterations, 4m56s); (c) TV reconstruction with r = 1.5, ∥c̄_0 − c†∥ ≈ 250 (37 iterations, 1m3s); (d) TV reconstruction with r = 1.2, ∥c̄_0 − c†∥ ≈ 250 (31 iterations, 1m37s). Each panel compares the current reconstruction c with the exact solution c†.]

Figure 2. Reconstructions obtained by the TIGRA-Θ method with p = 2, γ = 5 and δ = 0.001.

We finally consider the two-dimensional parameter identification problem (5.4) with Ω = [0, 1] × [0, 1]. The sought solution c† is defined as

c†(x, y) = 1 if (x − 0.3)² + (y − 0.7)² ≤ 0.2²;    c†(x, y) = 0.5 if (x, y) ∈ [0.6, 0.8] × [0.2, 0.5];    c†(x, y) = 0 else.

Assume u(c†) = x + y, and the noisy u^δ(c†) satisfies ∥u − u^δ∥_{L²(Ω)} ≤ δ with the choice δ = 0.001. The functions are sampled at N = 30 × 30 equispaced points in Ω. For fixed µ = 10, the reconstruction results are plotted in Figure 3, with both the L¹ and the TV penalties included.

6. Acknowledgement

The authors are grateful to two referees, one board member, Qinian Jin (Australian National University, Australia) and Shuai Lu (Fudan University, China) for valuable, helpful comments and discussions. The author M. Zhong was supported
by the National Natural Science Foundation of China (No. 11501102) and the Natural Science Foundation of Jiangsu Province (No. BK20150594). The author W. Wang was supported by the National Natural Science Foundation of China (No. 11401257) and the Natural Science Foundation of Zhejiang Province (No. LQ14A010013).

[Figure 3 about here. Panels: (a) exact solution; (b) L¹ reconstruction with γ = 5, ∥c̄_0 − c†∥ ≈ 600; (c) TV reconstruction with γ = 5, ∥c̄_0 − c†∥ ≈ 700; (d) TV reconstruction with γ = 10, ∥c̄_0 − c†∥ ≈ 8 × 10³.]

Figure 3. Reconstructions obtained by the TIGRA-Θ method, p = r = 2 and δ = 0.001.
References

[1] Anzengruber SW. The Discrepancy Principle for Tikhonov Regularization in Banach Spaces: Regularization Properties and Rates of Convergence. Saarbrücken: Südwestdeutscher Verlag für Hochschulschriften, 2012.
[2] Anzengruber SW, Hofmann B, Mathé P. Regularization properties of the discrepancy principle for Tikhonov regularization in Banach spaces. Appl. Anal. 93 (2014) 1382-1400.
[3] Anzengruber SW, Ramlau R. Morozov's discrepancy principle for Tikhonov-type functionals with nonlinear operators. Inverse Prob. 26 (2010) 025001.
[4] Acar R, Vogel CR. Analysis of bounded variation penalty methods for ill-posed problems. Inverse Problems 10 (1994) 1217-1229.
[5] Baumeister J. Deconvolution of appearance potential spectra, in Direct and Inverse Boundary Value Problems, Proceedings of the 1989 Oberwolfach Seminar, Lang, Frankfurt am Main, 1991, 1-13.
[6] Bakushinsky AB, Kokurin MY. Iterative Methods for Approximate Solution of Inverse Problems. Dordrecht: Kluwer, 2004.
[7] Bonesky T, Kazimierski KS, Maass P, Schöpfer F, Schuster T. Minimization of Tikhonov functionals in Banach spaces. Abstr. Appl. Anal. 2008 (2008) 1563-1569.
[8] Bunks C, Saleck FM, Zaleski S, Chavent G. Multiscale seismic waveform inversion. Geophysics 60 (1995) 1457-1473.
[9] Beck A, Teboulle M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci. 2 (2009) 183-202.
[10] Beck A, Teboulle M. Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Trans. Image Process. 18 (2009) 2419-2434.
[11] Chambolle A, Lions PL. Image recovery via total variation minimization and related problems. Numer. Math. 76 (1997) 167-188.
[12] Daubechies I, Defrise M, De Mol C. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math. 57 (2004) 1413-1457.
[13] Engl HW, Hanke M, Neubauer A. Regularization of Inverse Problems. Dordrecht: Kluwer, 1996.
[14] Engl HW, Kunisch K, Neubauer A. Convergence rates for Tikhonov regularization of nonlinear ill-posed problems. Inverse Problems 5 (1989) 523-540.
[15] Engl HW, Kunisch K, Neubauer A. Optimal a posteriori parameter choice for Tikhonov regularization for solving nonlinear ill-posed problems. SIAM J. Numer. Anal. 30 (1993) 1796-1883.
[16] Gorenflo R, Hofmann B. On autoconvolution and regularization. Inverse Prob. 10 (1994) 353-373.
[17] Hofmann B, Yamamoto M. On the interplay of source conditions and variational inequalities for nonlinear ill-posed problems. Applicable Analysis 89(11) (2010) 1705-1727.
[18] de Hoop MV, Qiu L, Scherzer O. An analysis of a multi-level projected steepest descent iteration for nonlinear inverse problems in Banach spaces subject to stability constraints. Numer. Math. 129 (2015) 127-148.
[19] Jin B, Maass P. Sparsity regularization for parameter identification problems. Inverse Prob. 28 (2012) 123001.
[20] Jin Q, Wang W. Landweber iteration of Kaczmarz type with general non-smooth convex penalty functionals. Inverse Prob. 29 (2013) 085011.
[21] Jin Q, Yang HQ. Levenberg-Marquardt method in Banach spaces with general convex regularization terms. Numer. Math., to appear.
[22] Jin Q, Zhong M. Nonstationary iterated Tikhonov regularization in Banach spaces with uniformly convex penalty terms. Numer. Math. 127 (2014) 485-513.
[23] Kaltenbacher B. Towards global convergence for strongly nonlinear ill-posed problems via a regularizing multilevel method. Numer. Funct. Anal. Optimiz. 27 (2006) 637-665.
[24] Kaltenbacher B. Convergence rates of a multilevel method for the regularization of nonlinear ill-posed problems. J. Integr. Equ. Appl. 20 (2008) 201-228.
[25] Kokurin MY. Convexity of the Tikhonov functional and iteratively regularized methods for solving irregular nonlinear operator equations. Comput. Math. Math. Phys. 50 (2010) 620-632.
[26] Kokurin MY. The global search in the Tikhonov scheme. Russ. Math. 54 (2010) 17-26.
[27] Kokurin MY. On sequential minimization of Tikhonov functionals in ill-posed problems with a priori information on solutions. J. Inverse Ill-Posed Prob. 18 (2011) 1031-1050.
[28] Kaltenbacher B, Schöpfer F, Schuster T. Iterative methods for nonlinear ill-posed problems in Banach spaces: convergence and applications to parameter identification problems. Inverse Problems 25 (2009) 065003.
[29] Louis AK. Inverse und schlecht gestellte Probleme. Stuttgart: Teubner, 1989.
[30] Richter M. Approximation of Gaussian Random Elements and Statistics. Stuttgart: Teubner, 1992.
[31] Ramlau R. A steepest descent algorithm for the global minimization of the Tikhonov functional. Inverse Prob. 18 (2002) 381-405.
[32] Ramlau R. TIGRA, an iterative algorithm for regularizing nonlinear ill-posed problems. Inverse Prob. 19 (2003) 433-465.
[33] Resmerita E, Scherzer O. Error estimates for non-quadratic regularization and the relation to enhancement. Inverse Prob. 22 (2006) 801-814.
[34] Ramlau R, Zarzer CA. On the minimization of a Tikhonov functional with a non-convex sparsity constraint. Electronic Transactions on Numerical Analysis 39 (2012) 476-507.
[35] Rudin LI, Osher S, Fatemi E. Nonlinear total variation based noise removal algorithms. Phys. D 60 (1992) 259-268.
[36] Tibshirani R. Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 (1996) 267-288.
[37] Scherzer O. A modified Landweber iteration for solving parameter estimation problems. Appl. Math. Optim. 38 (1998) 45-68.
[38] Wang W, Anzengruber SW, Ramlau R, Han B. A global minimization algorithm for Tikhonov functionals with sparsity constraints. Applicable Analysis 94 (2015) 580-611.
[39] Zălinescu C. Convex Analysis in General Vector Spaces. River Edge, NJ: World Scientific Publishing, 2002.

Department of Mathematics, Southeast University, Nanjing, Jiangsu 210096, China
E-mail address: [email protected]

College of Mathematics, Physics and Information Engineering, Jiaxing University, Jiaxing, Zhejiang 314001, China
E-mail address: weiwang [email protected]