
arXiv:1812.06579v1 [math.OC] 17 Dec 2018

A Unified Algorithmic Framework of Symmetric Gauss-Seidel Decomposition based Proximal ADMMs for Convex Composite Programming

Liang Chen∗   Defeng Sun†   Kim-Chuan Toh‡   Ning Zhang§

December 18, 2018

Abstract

This paper presents a fairly accessible generalization of several symmetric Gauss-Seidel decomposition based multi-block proximal alternating direction methods of multipliers (ADMMs) for convex composite optimization problems. The proposed method unifies and refines many constructive techniques that were separately developed to improve the computational efficiency of multi-block ADMM-type algorithms. Specifically, the majorized augmented Lagrangian functions, the indefinite proximal terms, the inexact symmetric Gauss-Seidel decomposition theorem, the tolerance criteria for approximately solving the subproblems, and the large dual step-lengths are all incorporated in one algorithmic framework, which we name sGS-imiPADMM. Given the popularity of convergent variants of multi-block ADMMs in recent years, especially for high-dimensional multi-block convex composite conic programming problems, the unification presented in this paper, together with the corresponding convergence results, may greatly facilitate the implementation of many multi-block ADMMs in various problem settings.

Keywords. Convex optimization, multi-block, alternating direction method of multipliers, symmetric Gauss-Seidel decomposition, majorization.

AMS Subject Classification. 90C25, 90C22, 90C06, 65K05.

1 Introduction

In this paper, we consider the following multi-block convex composite programming problem:

$$ \min_{x \in \mathcal{X},\, y \in \mathcal{Y}} \big\{\, p_1(x_1) + f(x_1, \ldots, x_m) + q_1(y_1) + g(y_1, \ldots, y_n) \;\big|\; A^* x + B^* y = c \,\big\}, \qquad (1.1) $$

∗College of Mathematics and Econometrics, Hunan University, Changsha, 410082, China ([email protected]), and Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong, China ([email protected]). This author was supported by the National Natural Science Foundation of China (11801158, 11871205) and the Fundamental Research Funds for Central Universities in China.
†Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Hong Kong ([email protected]). This author was supported by a start-up research grant from the Hong Kong Polytechnic University.
‡Department of Mathematics, and Institute of Operations Research and Analytics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore 119076 ([email protected]). This author was supported in part by the Ministry of Education, Singapore, Academic Research Fund (R-146-000-256-114).
§Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Hong Kong ([email protected]).


where $\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{Z}$ are three finite dimensional real Hilbert spaces, each endowed with an inner product $\langle\cdot,\cdot\rangle$ and its induced norm $\|\cdot\|$, and

- $\mathcal{X}$ can be decomposed as the Cartesian product of $\mathcal{X}_1, \ldots, \mathcal{X}_m$, which are finite dimensional real Hilbert spaces endowed with the inner product $\langle\cdot,\cdot\rangle$ inherited from $\mathcal{X}$ and its induced norm $\|\cdot\|$. Similarly, $\mathcal{Y} = \mathcal{Y}_1 \times \cdots \times \mathcal{Y}_n$. Based on such decompositions, one can write $x \in \mathcal{X}$ as $x = (x_1, \ldots, x_m)$ with $x_i \in \mathcal{X}_i$, $i = 1, \ldots, m$, and, similarly, $y = (y_1, \ldots, y_n)$;
- $p_1 : \mathcal{X}_1 \to (-\infty, \infty]$ and $q_1 : \mathcal{Y}_1 \to (-\infty, \infty]$ are two closed proper convex functions;
- $f : \mathcal{X} \to (-\infty, \infty)$ and $g : \mathcal{Y} \to (-\infty, \infty)$ are continuously differentiable convex functions with Lipschitz continuous gradients;
- $A^*$ and $B^*$ are the adjoints of the two given linear mappings $A : \mathcal{Z} \to \mathcal{X}$ and $B : \mathcal{Z} \to \mathcal{Y}$, respectively, and $c \in \mathcal{Z}$ is a given vector;
- for convenience, we define the two functions $p : \mathcal{X} \to (-\infty, \infty]$ and $q : \mathcal{Y} \to (-\infty, \infty]$ by $p(x) := p_1(x_1)$ for all $x \in \mathcal{X}$ and $q(y) := q_1(y_1)$ for all $y \in \mathcal{Y}$.

At first glance, one may view problem (1.1) as a 2-block separable convex optimization problem with coupled linear equality constraints. Consequently, the classic alternating direction method of multipliers (ADMM) [12, 13] and its contemporary variants, such as [8, 10], can be used for solving problem (1.1). For the classic 2-block ADMM, one may refer to [9, 14] for a history of the algorithm and to the recent note [3] for a thorough study of its convergence properties. In high-dimensional settings, however, it is usually not computationally economical to apply the 2-block ADMM and its variants directly to problem (1.1), since solving the subproblems at each ADMM iteration can be too expensive. This difficulty is made even more severe by the fact that ADMMs, being intrinsically first-order methods, tend to require a large number of outer iterations to compute even a moderately accurate approximate solution.
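To make the 2-block scheme concrete, the following minimal NumPy sketch applies the classic ADMM (x-minimization, y-minimization, dual ascent with step-length $\tau\sigma$) to a toy instance of (1.1) with $p_1 = q_1 = 0$, $f(x) = \frac12\|x-a\|^2$, $g(y) = \frac12\|y-b\|^2$ and $A^* = B^* = I$. The data $a, b, c$ and all parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
a, b, c = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(n)
sigma, tau = 1.0, 1.5          # dual step-length tau in (0, (1+sqrt(5))/2)

x = np.zeros(n); y = np.zeros(n); z = np.zeros(n)
for _ in range(200):
    # x-step: minimize 0.5||x-a||^2 + <z, x+y-c> + (sigma/2)||x+y-c||^2
    x = (a - z - sigma * (y - c)) / (1.0 + sigma)
    # y-step: same structure, with the freshly updated x
    y = (b - z - sigma * (x - c)) / (1.0 + sigma)
    # dual ascent with the large step-length tau*sigma
    z = z + tau * sigma * (x + y - c)

# closed-form solution of the toy problem, for comparison
x_star = a + (c - a - b) / 2.0
y_star = b + (c - a - b) / 2.0
assert np.allclose(x, x_star, atol=1e-8) and np.allclose(y, y_star, atol=1e-8)
```

Even on this trivially separable instance, each iteration alternates two single-block minimizations with one dual update, which is exactly the pattern the rest of the paper refines for the multi-block case.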
As a result, a further decomposition of the variables in problem (1.1) into easier subproblems, whenever possible, should be incorporated when designing ADMM-type methods for solving it. Unfortunately, even if the functions f and g in problem (1.1) are separable with respect to each subspace, say $\mathcal{X}_i$ and $\mathcal{Y}_j$, the naive extension of the classic ADMM to the multi-block case is not necessarily convergent [2]. How to address the aforementioned issues is the key reason why the algorithmic development, as well as the corresponding convergence analysis, of multi-block variants of the ADMM has been an important research topic in convex optimization. Of course, it is not reasonable to expect a general algorithmic framework that achieves sterling numerical performance on a wide variety of different classes of linearly constrained multi-block convex optimization problems. Thus, in this paper our focus is on model (1.1), which is already quite versatile, for the following two reasons. Firstly, this model is general enough to handle a large number of convex composite optimization models arising from both core convex optimization and practical applications [19, 4]. Secondly, the convergence of multi-block variants of the ADMM for solving problem (1.1) has been separately established in [25, 19, 18, 4, 30, 5], without sacrificing numerical performance when compared to the naively extended multi-block ADMM. The latter has long served as a benchmark for comparing new ADMM-type methods, since its impressive numerical performance has been well recognized in extensive numerical experiments, despite its lack of a theoretical convergence guarantee. Currently, this line of ADMMs has been applied to many concrete instances of problem (1.1); see, e.g., [29, 1, 7, 16, 28, 27, 11, 24, 26, 21], to name just a few. Motivated by the above exposition, in this paper we propose a unified multi-block ADMM for solving problem (1.1).
Our unified method is an inexact symmetric Gauss-Seidel (sGS) decomposition based majorized indefinite-proximal ADMM (sGS-imiPADMM). The purpose of this


study is to distill and synthesize all the practical techniques that were constructively exploited in the references mentioned above for the computational efficiency of multi-block ADMM-type algorithms. Specifically, our unified algorithm incorporates all of the following ingredients developed over the past few years:

- the inexact sGS decomposition theorem progressively developed in [25, 19, 20];
- the majorized augmented Lagrangian functions introduced in [18];
- the indefinite proximal terms studied in [18, 30, 5];
- the admissible stopping criteria for approximately solving the subproblems developed in [4], together with the large dual step-lengths for the classic ADMM [13].

We show that the proposed sGS-imiPADMM is globally convergent under very mild assumptions, and the resulting convergence theorem also improves those in the highly related references mentioned above. For instance, it refines the sGS-imsPADMM in [4] by replacing the extra condition [4, (5.26) in Theorem 5.1] used for establishing convergence with the weaker basic condition [4, (3.2)], which is imposed merely for the well-definedness of the algorithm. Moreover, compared with the recently developed sGS decomposition based majorized ADMM with indefinite proximal terms in [30], the problem setting for the sGS-imiPADMM is much more general, as the functions f and g here are not restricted to have the separable structures required in [30].

The rest of this paper is organized as follows. In section 2, we introduce some notation and recall the inexact sGS decomposition theorem, which plays an important role in the subsequent algorithmic design. In section 3, the sGS-imiPADMM algorithm for the multi-block problem (1.1) is formally proposed. Its global convergence theorem is established in section 4 via the convergence of an inexact majorized indefinite-proximal ADMM (imiPADMM), which will be introduced in section 4.2, in conjunction with the inexact sGS decomposition theorem.
Finally, we conclude the paper in section 5.

2 Notation and Preliminaries

2.1 Notation

Let $\mathcal{U}$ and $\mathcal{V}$ be two arbitrary finite dimensional real Hilbert spaces, each endowed with an inner product $\langle\cdot,\cdot\rangle$ and its induced norm $\|\cdot\|$. For any given linear map $H : \mathcal{U} \to \mathcal{V}$, we use $\|H\|$ to denote its spectral norm and $H^* : \mathcal{V} \to \mathcal{U}$ to denote its adjoint. If $\mathcal{U} = \mathcal{V}$ and $H$ is self-adjoint and positive semidefinite, then there exists a unique self-adjoint positive semidefinite linear operator $H^{\frac12} : \mathcal{U} \to \mathcal{U}$ such that $H^{\frac12} H^{\frac12} = H$. In this case, for any $u, v \in \mathcal{U}$, we define $\langle u, v\rangle_H := \langle u, Hv\rangle$ and $\|u\|_H := \sqrt{\langle u, Hu\rangle} = \|H^{\frac12} u\|$. For a closed proper convex function $\theta : \mathcal{U} \to (-\infty, +\infty]$, we write $\mathrm{dom}\,\theta$ and $\partial\theta$ for the effective domain and the subdifferential mapping of $\theta$, respectively.

Let $s > 0$ be a given integer such that $\mathcal{U}$ can be decomposed as the Cartesian product of $\mathcal{U}_1, \ldots, \mathcal{U}_s$, where each $\mathcal{U}_i$ is a finite dimensional real Hilbert space endowed with the inner product $\langle\cdot,\cdot\rangle$ inherited from $\mathcal{U}$ and its induced norm $\|\cdot\|$. Then, a self-adjoint positive semidefinite linear operator $H : \mathcal{U} \to \mathcal{U}$ can be symbolically decomposed as

$$ H = \begin{pmatrix} H_{11} & H_{12} & \cdots & H_{1s} \\ H_{12}^* & H_{22} & \cdots & H_{2s} \\ \vdots & \vdots & \ddots & \vdots \\ H_{1s}^* & H_{2s}^* & \cdots & H_{ss} \end{pmatrix}, \qquad (2.1) $$

where $H_{ij} : \mathcal{U}_j \to \mathcal{U}_i$, $i, j = 1, \ldots, s$ are linear maps and $H_{ii}$, $i = 1, \ldots, s$ are self-adjoint positive semidefinite linear operators. Based on (2.1), we use $H_d := \mathrm{Diag}(H_{11}, \ldots, H_{ss})$ to denote the block-diagonal part of $H$, and denote the symbolically strictly upper triangular part of $H$ by $H_u$, so that $H = H_d + H_u + H_u^*$. To simplify the notation, for any $u = (u_1, \ldots, u_s) \in \mathcal{U}$ and $i \in \{1, \ldots, s\}$, we denote $u_{\le i} := \{u_1, \ldots, u_i\}$ and $u_{\ge i} := \{u_i, \ldots, u_s\}$.

2.2 The inexact sGS decomposition theorem

We now briefly review the inexact block sGS decomposition theorem in [20], which is a generalization of the Schur complement based decomposition technique developed in [19]. Following the notation of the previous subsection, suppose that $\mathcal{U} = \mathcal{U}_1 \times \cdots \times \mathcal{U}_s$ and $H$ is symbolically decomposed as in (2.1). Let $\theta_1 : \mathcal{U}_1 \to (-\infty, \infty]$ be a given closed proper convex function, and let $b \in \mathcal{U}$ be a given vector. Define the convex quadratic function $h : \mathcal{U} \to (-\infty, \infty)$ by

$$ h(u) := \tfrac12 \langle u, Hu\rangle - \langle b, u\rangle, \qquad \forall u \in \mathcal{U}. $$

Let $\tilde\delta_i, \delta_i \in \mathcal{U}_i$, $i = 1, \ldots, s$ be given (error tolerance) vectors with $\tilde\delta_1 = \delta_1$. Under the assumption that $H_d$ is positive definite, we define

$$ d(\delta, \tilde\delta) := \delta + H_u H_d^{-1}(\delta - \tilde\delta) \quad \text{with} \quad \delta := (\delta_1, \ldots, \delta_s) \ \text{and} \ \tilde\delta := (\tilde\delta_1, \ldots, \tilde\delta_s). \qquad (2.2) $$

Suppose that $u^- \in \mathcal{U}$ is a given vector. Define

$$ \begin{cases} \tilde u_i := \arg\min_{u_i} \big\{\, \theta_1(u_1^-) + h(u_{\le i-1}^-, u_i, \tilde u_{\ge i+1}) - \langle \tilde\delta_i, u_i\rangle \,\big\}, & i = s, \ldots, 2,\\[2pt] u_1^+ := \arg\min_{u_1} \big\{\, \theta_1(u_1) + h(u_1, \tilde u_{\ge 2}) - \langle \delta_1, u_1\rangle \,\big\}, & \\[2pt] u_i^+ := \arg\min_{u_i} \big\{\, \theta_1(u_1^+) + h(u_{\le i-1}^+, u_i, \tilde u_{\ge i+1}) - \langle \delta_i, u_i\rangle \,\big\}, & i = 2, \ldots, s. \end{cases} \qquad (2.3) $$

Meanwhile, define the self-adjoint positive semidefinite linear operator $\mathrm{sGS}(H) : \mathcal{U} \to \mathcal{U}$ by

$$ \mathrm{sGS}(H) := H_u H_d^{-1} H_u^*. \qquad (2.4) $$

Now, consider the following convex composite quadratic optimization problem:

$$ \min_{u \in \mathcal{U}} \Big\{\, \theta_1(u_1) + h(u) + \tfrac12 \|u - u^-\|^2_{\mathrm{sGS}(H)} - \langle d(\delta, \tilde\delta), u\rangle \,\Big\}. \qquad (2.5) $$

The following sGS decomposition theorem, from [20, Theorem 1 & Proposition 1], reveals the equivalence between the sGS iteration (2.3) and the proximal minimization problem (2.5). This theorem is essential for the algorithmic development in this paper.

Theorem 2.1. Suppose that $H_d = \mathrm{Diag}(H_{11}, \ldots, H_{ss})$ is positive definite. Then

(i) both the iteration process in (2.3) and the linear operator $\mathrm{sGS}(H)$ defined by (2.4) are well-defined;

(ii) problem (2.5) is well-defined and admits a unique solution, which is exactly the vector $u^+$ generated by (2.3);

(iii) it holds that $\widehat H := H + \mathrm{sGS}(H) = (H_d + H_u) H_d^{-1} (H_d + H_u^*) \succ 0$;

(iv) the vector $d(\delta, \tilde\delta)$ defined by (2.2) satisfies

$$ \|\widehat H^{-\frac12} d(\delta, \tilde\delta)\| \le \|H_d^{-\frac12}(\delta - \tilde\delta)\| + \|H_d^{\frac12}(H_d + H_u)^{-1} \tilde\delta\|. $$
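Theorem 2.1 can be checked numerically. The sketch below, written for the exact case ($\theta_1 \equiv 0$ and all error tolerance vectors zero) with randomly generated data, runs the backward and forward sweeps of (2.3) and verifies that the result coincides with the unique solution of the proximal problem (2.5), as well as the factorization identity in part (iii). The block sizes and data are arbitrary assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
s, p = 3, 2                      # s blocks, each of size p
N = s * p
M = rng.standard_normal((N, N))
H = M @ M.T + N * np.eye(N)      # symmetric positive definite, so H_d > 0
b = rng.standard_normal(N)
u_minus = rng.standard_normal(N)

blk = [slice(i * p, (i + 1) * p) for i in range(s)]
Hd = np.zeros_like(H)
Hu = np.zeros_like(H)
for i in range(s):
    Hd[blk[i], blk[i]] = H[blk[i], blk[i]]
    for j in range(i + 1, s):
        Hu[blk[i], blk[j]] = H[blk[i], blk[j]]
assert np.allclose(H, Hd + Hu + Hu.T)   # H = H_d + H_u + H_u^*

def solve_block(i, u):
    """Exact minimization of h over block i, the other blocks of u held fixed."""
    rhs = b[blk[i]] - (H[blk[i], :] @ u - H[blk[i], blk[i]] @ u[blk[i]])
    return np.linalg.solve(H[blk[i], blk[i]], rhs)

# backward sweep (i = s, ..., 2), then forward sweep (i = 1, ..., s), in place
u = u_minus.copy()
for i in range(s - 1, 0, -1):
    u[blk[i]] = solve_block(i, u)
for i in range(s):
    u[blk[i]] = solve_block(i, u)

# Theorem 2.1(ii): the two sweeps solve (2.5) with the proximal term sGS(H)
sGS = Hu @ np.linalg.solve(Hd, Hu.T)
u_direct = np.linalg.solve(H + sGS, b + sGS @ u_minus)
assert np.allclose(u, u_direct)
# Theorem 2.1(iii): H + sGS(H) = (H_d + H_u) H_d^{-1} (H_d + H_u^*)
assert np.allclose(H + sGS, (Hd + Hu) @ np.linalg.solve(Hd, (Hd + Hu).T))
```

The in-place updates mirror the sweeps exactly: during the backward pass, blocks below the current one still hold $u^-$ while blocks above hold $\tilde u$, and the forward pass then overwrites blocks from the first to the last.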

3 An Inexact sGS Decomposition Based Majorized Indefinite-Proximal ADMM

In this section, we present the sGS-imiPADMM algorithm for solving problem (1.1). We first recall the majorization technique used in [18] and the indefinite proximal terms used in [5].

(T1) (Majorization technique) Since the two convex functions $f$ and $g$ in problem (1.1) are assumed to be continuously differentiable with Lipschitz continuous gradients, there exist two self-adjoint positive semidefinite linear operators $\widehat\Sigma_f : \mathcal{X} \to \mathcal{X}$ and $\widehat\Sigma_g : \mathcal{Y} \to \mathcal{Y}$ such that

$$ \begin{cases} f(x) \le \widehat f(x; x') := f(x') + \langle \nabla f(x'), x - x'\rangle + \tfrac12 \|x - x'\|^2_{\widehat\Sigma_f}, & \forall x, x' \in \mathcal{X},\\[2pt] g(y) \le \widehat g(y; y') := g(y') + \langle \nabla g(y'), y - y'\rangle + \tfrac12 \|y - y'\|^2_{\widehat\Sigma_g}, & \forall y, y' \in \mathcal{Y}. \end{cases} \qquad (3.1) $$

For any given $\sigma > 0$, the majorized augmented Lagrangian function associated with problem (1.1) is defined by

$$ \widehat{\mathcal L}_\sigma(x, y; (x', y', z')) := p(x) + \widehat f(x; x') + q(y) + \widehat g(y; y') + \langle z', A^* x + B^* y - c\rangle + \tfrac{\sigma}{2} \|A^* x + B^* y - c\|^2, $$

for all $(x, y) \in \mathcal{X} \times \mathcal{Y}$ and all $(x', y', z') \in \mathcal{X} \times \mathcal{Y} \times \mathcal{Z}$.
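As a quick numerical illustration of the majorization in (3.1), which is the standard descent-lemma bound for functions with Lipschitz continuous gradients, the sketch below takes $f(x) = \sum_i \log(1 + e^{x_i})$, whose gradient is $1/4$-Lipschitz, chooses $\widehat\Sigma_f = \frac14 I$, and checks the inequality at random point pairs. This specific $f$ is our own illustrative choice, not one used in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x):
    # smooth convex function with 1/4-Lipschitz gradient
    return np.sum(np.logaddexp(0.0, x))

def grad_f(x):
    return 1.0 / (1.0 + np.exp(-x))

L = 0.25  # Lipschitz modulus of grad f, so Sigma_f_hat = L * I satisfies (3.1)
for _ in range(100):
    x, xp = rng.standard_normal(8), rng.standard_normal(8)
    f_hat = f(xp) + grad_f(xp) @ (x - xp) + 0.5 * L * np.sum((x - xp) ** 2)
    assert f(x) <= f_hat + 1e-12   # the quadratic surrogate majorizes f
```

Any $\widehat\Sigma_f \succeq L\,I$ would also work in (3.1); the algorithm only needs some valid majorization, and tighter choices generally give better surrogate subproblems.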

(T2) (Indefinite proximal terms) In order to use the sGS decomposition theorem, we symbolically decompose the positive semidefinite linear operators $\widehat\Sigma_f$ and $\widehat\Sigma_g$ defined by (3.1) into the following form:

$$ \widehat\Sigma_f = \begin{pmatrix} (\widehat\Sigma_f)_{11} & (\widehat\Sigma_f)_{12} & \cdots & (\widehat\Sigma_f)_{1m}\\ (\widehat\Sigma_f)_{12}^* & (\widehat\Sigma_f)_{22} & \cdots & (\widehat\Sigma_f)_{2m}\\ \vdots & \vdots & \ddots & \vdots\\ (\widehat\Sigma_f)_{1m}^* & (\widehat\Sigma_f)_{2m}^* & \cdots & (\widehat\Sigma_f)_{mm} \end{pmatrix} \quad\text{and}\quad \widehat\Sigma_g = \begin{pmatrix} (\widehat\Sigma_g)_{11} & (\widehat\Sigma_g)_{12} & \cdots & (\widehat\Sigma_g)_{1n}\\ (\widehat\Sigma_g)_{12}^* & (\widehat\Sigma_g)_{22} & \cdots & (\widehat\Sigma_g)_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ (\widehat\Sigma_g)_{1n}^* & (\widehat\Sigma_g)_{2n}^* & \cdots & (\widehat\Sigma_g)_{nn} \end{pmatrix}, $$

which are consistent with the decompositions of $\mathcal{X}$ and $\mathcal{Y}$. Meanwhile, the linear operators $A$ and $B$ can also be decomposed as

$$ Az = (A_1 z, \ldots, A_m z) \quad\text{and}\quad Bz = (B_1 z, \ldots, B_n z), $$

where $A_i z \in \mathcal{X}_i$ and $B_j z \in \mathcal{Y}_j$ for all $i \in \{1, \ldots, m\}$, $j \in \{1, \ldots, n\}$, and $z \in \mathcal{Z}$. Then, we choose self-adjoint (not necessarily positive semidefinite) linear operators $\widetilde S_i : \mathcal{X}_i \to \mathcal{X}_i$, $i = 1, \ldots, m$ and $\widetilde T_j : \mathcal{Y}_j \to \mathcal{Y}_j$, $j = 1, \ldots, n$, such that

$$ \begin{cases} \tfrac12 (\widehat\Sigma_f)_{ii} + \sigma A_i A_i^* + \widetilde S_i \succ 0, & i = 1, \ldots, m,\\[2pt] \tfrac12 (\widehat\Sigma_g)_{jj} + \sigma B_j B_j^* + \widetilde T_j \succ 0, & j = 1, \ldots, n,\\[2pt] \widetilde S := \mathrm{Diag}(\widetilde S_1, \ldots, \widetilde S_m) \succeq -\tfrac12 \widehat\Sigma_f, & \\[2pt] \widetilde T := \mathrm{Diag}(\widetilde T_1, \ldots, \widetilde T_n) \succeq -\tfrac12 \widehat\Sigma_g. & \end{cases} \qquad (3.2) $$

Remark 3.1. Note that the linear operators $\widetilde S_1 : \mathcal{X}_1 \to \mathcal{X}_1$ and $\widetilde T_1 : \mathcal{Y}_1 \to \mathcal{Y}_1$ are chosen for the purpose of making the minimization subproblems involving $p_1$ and $q_1$ easier to solve. With appropriately chosen $\widetilde S_1$ and $\widetilde T_1$, we can assume that the following well-defined optimization problems

$$ \min_{x_1 \in \mathcal{X}_1} \Big\{\, p_1(x_1) + \tfrac12 \|x_1 - x_1'\|^2_{\widetilde M_{11}} \,\Big\} \quad\text{and}\quad \min_{y_1 \in \mathcal{Y}_1} \Big\{\, q_1(y_1) + \tfrac12 \|y_1 - y_1'\|^2_{\widetilde N_{11}} \,\Big\} $$

can be solved to sufficient accuracy, in the sense of returning approximate solutions with sufficiently small subgradients of the objective functions, for any given $x_1' \in \mathcal{X}_1$ and $y_1' \in \mathcal{Y}_1$. Here, and as will be formally defined in the next section, $\widetilde M_{11} := (\widehat\Sigma_f)_{11} + \sigma A_1 A_1^* + \widetilde S_1$ and $\widetilde N_{11} := (\widehat\Sigma_g)_{11} + \sigma B_1 B_1^* + \widetilde T_1$. Moreover, if $\widehat\Sigma_f$ and $\widehat\Sigma_g$ are block-diagonal, the condition (3.2) is exactly the one used in [30].
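For instance, if one assumes (purely for illustration; this choice is not made in the paper) that $p_1 = \lambda\|\cdot\|_1$ and that $\widetilde M_{11}$ reduces to a positive multiple $\rho$ of the identity, the first subproblem in Remark 3.1 has the closed-form soft-thresholding solution, and its exactness can be certified through the subgradient of the objective:

```python
import numpy as np

rng = np.random.default_rng(3)
lam, rho = 0.7, 2.0                  # hypothetical lambda and proximal weight
xp = rng.standard_normal(6)          # the given point x_1'

# closed-form minimizer of  lam*||x||_1 + (rho/2)*||x - xp||^2  (soft-thresholding)
x_star = np.sign(xp) * np.maximum(np.abs(xp) - lam / rho, 0.0)

# optimality: -rho*(x_star - xp) must lie in lam * subdifferential of ||.||_1 at x_star
g = -rho * (x_star - xp)
for gi, xi in zip(g, x_star):
    if xi != 0.0:
        assert abs(gi - lam * np.sign(xi)) < 1e-12   # unique subgradient off zero
    else:
        assert abs(gi) <= lam + 1e-12                # |g_i| <= lam at zero
```

When the subproblem lacks such a closed form, the tolerance criteria of the algorithm only require the returned point to admit a small subgradient of the same objective, which is precisely what the check above certifies in the exact case.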

Based on the above prerequisites, we now formally propose the promised sGS-imiPADMM as Algorithm 1 for solving the multi-block convex composite programming problem (1.1). Recall that the Karush-Kuhn-Tucker (KKT) system of problem (1.1) is given by

$$ 0 \in \partial p(x) + \nabla f(x) + Az, \qquad 0 \in \partial q(y) + \nabla g(y) + Bz, \qquad A^* x + B^* y = c. \qquad (3.3) $$

If $(\bar x, \bar y, \bar z) \in \mathcal{X} \times \mathcal{Y} \times \mathcal{Z}$ satisfies (3.3), from [23, Corollary 30.5.1] we know that $(\bar x, \bar y)$ is an optimal solution to problem (1.1) and $\bar z$ is an optimal solution to the dual of this problem. To simplify the notation, we denote the solution set of the KKT system (3.3) for problem (1.1) by $\mathcal{W}$. We now make the following assumption on problem (1.1) and Algorithm 1.

Assumption 3.1. Suppose that:

(i) the solution set $\mathcal{W}$ of the KKT system (3.3) of problem (1.1) is nonempty;

(ii) the self-adjoint positive semidefinite linear operators $\widehat\Sigma_f : \mathcal{X} \to \mathcal{X}$ and $\widehat\Sigma_g : \mathcal{Y} \to \mathcal{Y}$ are chosen such that (3.1) holds;

(iii) the self-adjoint linear operators $\widetilde S_i$, $i = 1, \ldots, m$ and $\widetilde T_j$, $j = 1, \ldots, n$ are chosen such that (3.2) holds.

Based on the above assumption, the global convergence of Algorithm 1 is given in the following theorem. The corresponding proof will be accomplished in section 4.

Theorem 3.1 (Convergence of sGS-imiPADMM). Suppose that Assumption 3.1 holds. Then the whole sequence $\{(x^k, y^k, z^k)\}$ generated by Algorithm 1 converges to a solution of the KKT system (3.3) of problem (1.1).

We end this section by comparing Algorithm 1 and its convergence theorem (Theorem 3.1) with its precursors in [19, 4, 30] for solving problem (1.1). Such a comparison clearly demonstrates where the algorithm in this paper originates from and how far the progress made in this paper reaches. The details of the comparison are presented in the following table.

Ref. \ Item    f & g                  Majorization   Proximal Terms   Inexact
[19]           separable, quadratic   no             semidefinite     no
[4]            -                      yes            semidefinite     yes
[30]           separable              yes            indefinite       no
This paper     -                      yes            indefinite       yes


Algorithm 1 (sGS-imiPADMM): An inexact sGS decomposition based majorized indefinite-proximal ADMM for solving problem (1.1).

Let $\tau \in (0, (1+\sqrt{5})/2)$ be the step-length and let $\{\tilde\varepsilon_k\}_{k \ge 0}$ be a summable sequence of nonnegative numbers. Let $(x^0, y^0, z^0) \in \mathrm{dom}\, p \times \mathrm{dom}\, q \times \mathcal{Z}$ be the initial point. For $k = 0, 1, \ldots$, perform the following steps:

Step 1a. (Backward GS sweep) For $i = m, \ldots, 2$, compute

$$ \tilde x_i^{k+1} \approx \arg\min_{x_i \in \mathcal{X}_i} \Big\{\, \widehat{\mathcal L}_\sigma\big((x_{\le i-1}^k, x_i, \tilde x_{\ge i+1}^{k+1}), y^k; (x^k, y^k, z^k)\big) + \tfrac12 \|x_i - x_i^k\|^2_{\widetilde S_i} \,\Big\} $$

such that there exists

$$ \tilde\delta_i^k \in \partial_{x_i} \widehat{\mathcal L}_\sigma\big((x_{\le i-1}^k, \tilde x_i^{k+1}, \tilde x_{\ge i+1}^{k+1}), y^k; (x^k, y^k, z^k)\big) + \widetilde S_i(\tilde x_i^{k+1} - x_i^k) \quad \text{with } \|\tilde\delta_i^k\| \le \tilde\varepsilon_k. $$

Step 1b. (Forward GS sweep) For $i = 1, \ldots, m$, compute

$$ x_i^{k+1} \approx \arg\min_{x_i \in \mathcal{X}_i} \Big\{\, \widehat{\mathcal L}_\sigma\big((x_{\le i-1}^{k+1}, x_i, \tilde x_{\ge i+1}^{k+1}), y^k; (x^k, y^k, z^k)\big) + \tfrac12 \|x_i - x_i^k\|^2_{\widetilde S_i} \,\Big\} $$

such that there exists

$$ \delta_i^k \in \partial_{x_i} \widehat{\mathcal L}_\sigma\big((x_{\le i-1}^{k+1}, x_i^{k+1}, \tilde x_{\ge i+1}^{k+1}), y^k; (x^k, y^k, z^k)\big) + \widetilde S_i(x_i^{k+1} - x_i^k) \quad \text{with } \|\delta_i^k\| \le \tilde\varepsilon_k. $$

Step 2a. (Backward GS sweep) For $j = n, \ldots, 2$, compute

$$ \tilde y_j^{k+1} \approx \arg\min_{y_j \in \mathcal{Y}_j} \Big\{\, \widehat{\mathcal L}_\sigma\big(x^{k+1}, (y_{\le j-1}^k, y_j, \tilde y_{\ge j+1}^{k+1}); (x^k, y^k, z^k)\big) + \tfrac12 \|y_j - y_j^k\|^2_{\widetilde T_j} \,\Big\} $$

such that there exists

$$ \tilde\gamma_j^k \in \partial_{y_j} \widehat{\mathcal L}_\sigma\big(x^{k+1}, (y_{\le j-1}^k, \tilde y_j^{k+1}, \tilde y_{\ge j+1}^{k+1}); (x^k, y^k, z^k)\big) + \widetilde T_j(\tilde y_j^{k+1} - y_j^k) \quad \text{with } \|\tilde\gamma_j^k\| \le \tilde\varepsilon_k. $$

Step 2b. (Forward GS sweep) For $j = 1, \ldots, n$, compute

$$ y_j^{k+1} \approx \arg\min_{y_j \in \mathcal{Y}_j} \Big\{\, \widehat{\mathcal L}_\sigma\big(x^{k+1}, (y_{\le j-1}^{k+1}, y_j, \tilde y_{\ge j+1}^{k+1}); (x^k, y^k, z^k)\big) + \tfrac12 \|y_j - y_j^k\|^2_{\widetilde T_j} \,\Big\} $$

such that there exists

$$ \gamma_j^k \in \partial_{y_j} \widehat{\mathcal L}_\sigma\big(x^{k+1}, (y_{\le j-1}^{k+1}, y_j^{k+1}, \tilde y_{\ge j+1}^{k+1}); (x^k, y^k, z^k)\big) + \widetilde T_j(y_j^{k+1} - y_j^k) \quad \text{with } \|\gamma_j^k\| \le \tilde\varepsilon_k. $$

Step 3. Compute $z^{k+1} := z^k + \tau\sigma(A^* x^{k+1} + B^* y^{k+1} - c)$.

Here, the column "f & g" indicates the additional conditions imposed on the functions f and g in problem (1.1), the column "Majorization" indicates whether the majorization technique was used, the column "Proximal Terms" shows whether the proximal terms used are positive semidefinite or indefinite, and the column "Inexact" shows whether the subproblems are allowed to be solved approximately. It is easy to conclude from the above table that Algorithm 1 proposed in this paper generalizes all those in [19, 4, 30]. This explains why we name the sGS-imiPADMM a unified algorithmic framework. It is also worthwhile to point out that even when the proximal terms in the sGS-imiPADMM are chosen to be positive semidefinite, the convergence theorem in this paper is sharper than that in [4]; see Remark 4.3 for the details.

4 Convergence Analysis

In this section, we prove Theorem 3.1 step by step. We first show how to apply the sGS decomposition theorem to reformulate the multi-block Algorithm 1 as an abstract 2-block ADMM-type algorithm. Then we establish the convergence properties of the latter and, as a consequence, prove Theorem 3.1.

4.1 Basic convergence results

Following the discussions in [4], we define two linear operators $\widetilde M : \mathcal{X} \to \mathcal{X}$ and $\widetilde N : \mathcal{Y} \to \mathcal{Y}$ as follows:

$$ \widetilde M := \widehat\Sigma_f + \sigma A A^* + \widetilde S \qquad\text{and}\qquad \widetilde N := \widehat\Sigma_g + \sigma B B^* + \widetilde T. \qquad (4.1) $$

Just like the decomposition of $H$ in (2.1), we can symbolically decompose $\widetilde M$ and $\widetilde N$ accordingly. We use $\widetilde M_d$ and $\widetilde N_d$ to denote the corresponding block-diagonal parts, and $\widetilde M_u$ and $\widetilde N_u$ to denote the strictly upper triangular parts, respectively. Consequently, $\widetilde M = \widetilde M_d + \widetilde M_u + \widetilde M_u^*$ and $\widetilde N = \widetilde N_d + \widetilde N_u + \widetilde N_u^*$. Next, we define the following linear operators:

$$ \begin{cases} S_{\mathrm{sGS}} := \widetilde S + \mathrm{sGS}(\widetilde M) = \mathrm{Diag}(\widetilde S_1, \ldots, \widetilde S_m) + \widetilde M_u \widetilde M_d^{-1} \widetilde M_u^*,\\[2pt] T_{\mathrm{sGS}} := \widetilde T + \mathrm{sGS}(\widetilde N) = \mathrm{Diag}(\widetilde T_1, \ldots, \widetilde T_n) + \widetilde N_u \widetilde N_d^{-1} \widetilde N_u^*. \end{cases} \qquad (4.2) $$

Based on the above preparations, we can obtain the following result.

Proposition 4.1. Suppose that (3.2) holds. Define for all $k \ge 0$,

$$ \tilde\delta^k := (\tilde\delta_1^k, \ldots, \tilde\delta_m^k), \quad \delta^k := (\delta_1^k, \ldots, \delta_m^k), \quad \tilde\gamma^k := (\tilde\gamma_1^k, \ldots, \tilde\gamma_n^k), \quad\text{and}\quad \gamma^k := (\gamma_1^k, \ldots, \gamma_n^k), $$

with the convention that $\tilde\delta_1^k := \delta_1^k$ and $\tilde\gamma_1^k := \gamma_1^k$. Then

(i) the sequences $\{(x^k, y^k, z^k)\}$, $\{\delta^k\}$, $\{\tilde\delta^k\}$, $\{\gamma^k\}$ and $\{\tilde\gamma^k\}$ generated by the sGS-imiPADMM are well-defined;

(ii) the linear operators $S_{\mathrm{sGS}}$ and $T_{\mathrm{sGS}}$ in (4.2) are well-defined, and

$$ M_{\mathrm{sGS}} := \widehat\Sigma_f + \sigma A A^* + S_{\mathrm{sGS}} \succ 0 \qquad\text{and}\qquad N_{\mathrm{sGS}} := \widehat\Sigma_g + \sigma B B^* + T_{\mathrm{sGS}} \succ 0; \qquad (4.3) $$

(iii) it holds that

$$ \begin{cases} d_x^k \in \partial_x \Big\{\, \widehat{\mathcal L}_\sigma\big(x, y^k; (x^k, y^k, z^k)\big) + \tfrac12 \|x - x^k\|^2_{S_{\mathrm{sGS}}} \,\Big\}\Big|_{x = x^{k+1}},\\[4pt] d_y^k \in \partial_y \Big\{\, \widehat{\mathcal L}_\sigma\big(x^{k+1}, y; (x^k, y^k, z^k)\big) + \tfrac12 \|y - y^k\|^2_{T_{\mathrm{sGS}}} \,\Big\}\Big|_{y = y^{k+1}}, \end{cases} $$

where

$$ d_x^k := \delta^k + \widetilde M_u \widetilde M_d^{-1}(\delta^k - \tilde\delta^k) \qquad\text{and}\qquad d_y^k := \gamma^k + \widetilde N_u \widetilde N_d^{-1}(\gamma^k - \tilde\gamma^k); $$

(iv) one has that $\|M_{\mathrm{sGS}}^{-\frac12} d_x^k\| \le \kappa\, \tilde\varepsilon_k$ and $\|N_{\mathrm{sGS}}^{-\frac12} d_y^k\| \le \kappa'\, \tilde\varepsilon_k$, where

$$ \begin{cases} \kappa := 2\sqrt{m-1}\, \|\widetilde M_d^{-\frac12}\| + \sqrt{m}\, \|\widetilde M_d^{\frac12}(\widetilde M_d + \widetilde M_u)^{-1}\|,\\[2pt] \kappa' := 2\sqrt{n-1}\, \|\widetilde N_d^{-\frac12}\| + \sqrt{n}\, \|\widetilde N_d^{\frac12}(\widetilde N_d + \widetilde N_u)^{-1}\|. \end{cases} \qquad (4.4) $$

Remark 4.1. It is easy to prove Proposition 4.1 via Theorem 2.1. We omit the detailed proof here since it is almost the same as that of [4, Proposition 3.1].

4.2 An inexact 2-block majorized indefinite-proximal ADMM

Based on Proposition 4.1 and previous efforts (see, e.g., [20, 4]), here we also view the sGS-imiPADMM as a 2-block ADMM-type algorithm applied to problem (1.1) with intelligently constructed proximal terms. For this purpose, we formally present the previously mentioned imiPADMM as follows.

Algorithm 2 (imiPADMM): An inexact majorized indefinite-proximal ADMM for solving problem (1.1).

Let $\tau \in (0, (1+\sqrt{5})/2)$ be the step-length and let $\{\varepsilon_k\}_{k \ge 0}$ be a summable sequence of nonnegative numbers. Choose the self-adjoint (not necessarily positive semidefinite) linear operators $S$ and $T$ such that

$$ S \succeq -\tfrac12 \widehat\Sigma_f, \qquad T \succeq -\tfrac12 \widehat\Sigma_g, \qquad \tfrac12 \widehat\Sigma_f + \sigma A A^* + S \succ 0 \qquad\text{and}\qquad \tfrac12 \widehat\Sigma_g + \sigma B B^* + T \succ 0. \qquad (4.5) $$

For $k = 0, 1, \ldots$, perform the following steps:

Step 1. Compute $x^{k+1}$ and $d_x^k \in \partial\psi_k(x^{k+1})$ such that $\|(\widehat\Sigma_f + \sigma A A^* + S)^{-\frac12} d_x^k\| \le \varepsilon_k$, where

$$ x^{k+1} \approx \bar x^{k+1} := \arg\min_{x \in \mathcal{X}} \Big\{\, \psi_k(x) := \widehat{\mathcal L}_\sigma\big(x, y^k; (x^k, y^k, z^k)\big) + \tfrac12 \|x - x^k\|^2_S \,\Big\}. \qquad (4.6) $$

Step 2. Compute $y^{k+1}$ and $d_y^k \in \partial\varphi_k(y^{k+1})$ such that $\|(\widehat\Sigma_g + \sigma B B^* + T)^{-\frac12} d_y^k\| \le \varepsilon_k$, where

$$ y^{k+1} \approx \bar y^{k+1} := \arg\min_{y \in \mathcal{Y}} \Big\{\, \varphi_k(y) := \widehat{\mathcal L}_\sigma\big(x^{k+1}, y; (x^k, y^k, z^k)\big) + \tfrac12 \|y - y^k\|^2_T \,\Big\}. \qquad (4.7) $$

Step 3. Compute $z^{k+1} := z^k + \tau\sigma(A^* x^{k+1} + B^* y^{k+1} - c)$.
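A minimal numerical sketch of this 2-block scheme on a toy instance ($p = q = 0$, $f(x) = \frac12\|x-a\|^2$, $g(y) = \frac12\|y-b\|^2$, $A^* = B^* = I$, with the exact majorization $\widehat\Sigma_f = \widehat\Sigma_g = I$) is given below. It uses semidefinite proximal terms $S = T = I$, perturbs the x-subproblem solution by a summable tolerance sequence $\varepsilon_k = 0.5^k$ to mimic the inexact solves, and checks convergence to the KKT point. All data and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
a, b, c = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(n)
sigma, tau, sS = 1.0, 1.3, 1.0       # sS: proximal weight, S = T = sS * I

x = np.zeros(n); y = np.zeros(n); z = np.zeros(n)
for k in range(500):
    eps_k = 0.5 ** k                  # summable tolerance sequence
    noise = rng.standard_normal(n)
    noise *= eps_k / max(np.linalg.norm(noise), 1.0)   # ||noise|| <= eps_k
    # Step 1: proximal x-subproblem, solved up to an eps_k-small perturbation
    x = (a - z - sigma * (y - c) + sS * x) / (1.0 + sigma + sS) + noise
    # Step 2: proximal y-subproblem (solved exactly here)
    y = (b - z - sigma * (x - c) + sS * y) / (1.0 + sigma + sS)
    # Step 3: dual update with step-length tau*sigma
    z = z + tau * sigma * (x + y - c)

x_star = a + (c - a - b) / 2.0       # analytic KKT solution of the toy problem
y_star = b + (c - a - b) / 2.0
assert np.allclose(x, x_star, atol=1e-6)
assert np.allclose(y, y_star, atol=1e-6)
```

Because the perturbations are summable, they do not destroy convergence, which is the essential point of the tolerance criteria in Steps 1 and 2.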

Now, we are ready to present the convergence theorem of the imiPADMM. The proof of this theorem is postponed to Appendix A.

Theorem 4.1 (Convergence of imiPADMM). Suppose that parts (i) and (ii) of Assumption 3.1 hold. Then the sequence $\{(x^k, y^k, z^k)\}$ generated by Algorithm 2 converges to a point in $\mathcal{W}$, i.e., the solution set of the KKT system (3.3) of problem (1.1).

Remark 4.2. Even though the purpose of the proposed Algorithm 2 is to derive the convergence properties of Algorithm 1, this 2-block ADMM-type algorithm itself is a very general extension of the classic ADMM that contains many contemporary practical techniques, including the original large dual step-lengths in [13], the positive semidefinite proximal terms in [10], the majorized augmented Lagrangian function and indefinite proximal terms in [18], and the error tolerance criteria in [4].

Remark 4.3. If both $S$ and $T$ are chosen as positive semidefinite linear operators, Algorithm 2 reduces to the imsPADMM in [4]. Moreover, if the subproblems (4.6) and (4.7) in Algorithm 2 are solved exactly, i.e., by restricting $\varepsilon_k \equiv 0$ for all $k \ge 0$, the imiPADMM here reduces to the majorized iPADMM proposed in [18]. However, in both [4] and [18], one requires¹

$$ \tfrac12 \Sigma_f + S + \sigma A A^* \succ 0 \qquad\text{and}\qquad \tfrac12 \Sigma_g + T + \sigma B B^* \succ 0, $$

where $\Sigma_f \preceq \widehat\Sigma_f$ and $\Sigma_g \preceq \widehat\Sigma_g$, and these conditions are in general stronger than the last two conditions in (4.5). Therefore, even for 2-block problems, Theorem 4.1 on the convergence of Algorithm 2 improves the convergence properties of the previously proposed imsPADMM and majorized iPADMM in [4] and [18], respectively. As a result, for the multi-block problem (1.1), Theorem 3.1 can also be used to sharpen the convergence properties of the sGS-imsPADMM in [4].

4.3 Convergence of the sGS-imiPADMM

Now we are ready to prove Theorem 3.1 for Algorithm 1, based on the connection between the sGS-imiPADMM for the multi-block problem (1.1) and the imiPADMM for the same problem viewed as a 2-block problem. The result given in the following proposition is used in the proof of Theorem 3.1. Its proof, which is directly based on [5, Theorem 4(i)], is given in Appendix B.

Proposition 4.2. Suppose that Assumption 3.1 holds. Let $S := S_{\mathrm{sGS}}$ and $T := T_{\mathrm{sGS}}$, where $S_{\mathrm{sGS}}$ and $T_{\mathrm{sGS}}$ are given in (4.2). Then (4.5) holds.

Now we are ready to prove Theorem 3.1.

Proof of Theorem 3.1. Let $S := S_{\mathrm{sGS}}$ and $T := T_{\mathrm{sGS}}$, where $S_{\mathrm{sGS}}$ and $T_{\mathrm{sGS}}$ are given in (4.2). Then by (4.3) one has that $\widehat\Sigma_f + \sigma A A^* + S = M_{\mathrm{sGS}}$ and $\widehat\Sigma_g + \sigma B B^* + T = N_{\mathrm{sGS}}$. Since Assumption 3.1 holds, Proposition 4.2 implies that (4.5) holds. Thus, by Proposition 4.1(iii), one has that $d_x^k \in \partial\psi_k(x^{k+1})$ and $d_y^k \in \partial\varphi_k(y^{k+1})$. Meanwhile, we can define the sequence $\{\varepsilon_k\}$ in Algorithm 2 by $\varepsilon_k := \max\{\kappa, \kappa'\}\, \tilde\varepsilon_k$ for all $k \ge 0$, which is summable because the sequence $\{\tilde\varepsilon_k\}$ used in Algorithm 1 to control the inexactness is summable, where $\kappa$ and $\kappa'$ are given in (4.4). Then, by Proposition 4.1(iv), one has that $\|M_{\mathrm{sGS}}^{-\frac12} d_x^k\| \le \varepsilon_k$ and $\|N_{\mathrm{sGS}}^{-\frac12} d_y^k\| \le \varepsilon_k$. Thus, since Assumption 3.1(iii) holds, the sequence $\{(x^k, y^k, z^k)\}$ generated by Algorithm 1 is exactly a sequence generated by Algorithm 2 with the specially constructed proximal terms $S = S_{\mathrm{sGS}}$ and $T = T_{\mathrm{sGS}}$. Consequently, since parts (i) and (ii) of Assumption 3.1 hold, Theorem 4.1 implies that Theorem 3.1 holds. This completes the proof.

5 Conclusions

In this paper, we have developed a unified algorithmic framework, the sGS-imiPADMM, for solving the multi-block convex composite programming problem (1.1). The proposed algorithm combines the merits of its various precursors by gathering the practical techniques developed for improving the efficiency of ADMM-type algorithms. The motivation behind such a unification is that these techniques, including majorization-type surrogates, the inexact symmetric Gauss-Seidel decomposition, indefinite proximal terms, inexact computation of the subproblems, and large dual step-lengths, have all been shown to be very useful in dealing with convex composite programming problems. We established the global convergence of the sGS-imiPADMM under very mild assumptions. We believe that the proposed algorithm can serve not only as a generalization of the existing algorithms, but also as a catalyst for enhancing the numerical performance of multi-block ADMM based solvers. We should mention that the linear convergence rate of the sGS-imiPADMM is not discussed in this paper, but one should be able to establish such results following the works in [15, 30] without much difficulty.

¹The precise definitions of Σf and Σg are given in (A.3) of the Appendix.

References

[1] S. Bai and H.-D. Qi, Tackling the flip ambiguity in wireless sensor network localization and beyond, Digit. Signal Process., 55, (2016), 85–97.
[2] C. Chen, B. He, Y. Ye, and X. Yuan, The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent, Math. Program., 155:1-2, (2016), 57–79.
[3] L. Chen, D.F. Sun, and K.-C. Toh, A note on the convergence of ADMM for linearly constrained convex optimization problems, Comput. Optim. Appl., 66, (2017), 327–343.
[4] L. Chen, D.F. Sun, and K.-C. Toh, An efficient inexact symmetric Gauss-Seidel based majorized ADMM for high-dimensional convex composite conic programming, Math. Program., 161:1-2, (2017), 237–270.
[5] L. Chen, X.D. Li, D.F. Sun, and K.-C. Toh, On the equivalence of inexact proximal ALM and ADMM for a class of convex composite programming, arXiv:1803.10803, (2018).
[6] F.H. Clarke, Optimization and Nonsmooth Analysis, Wiley, New York, 1983.
[7] C. Ding and H.-D. Qi, Convex optimization learning of faithful Euclidean distance representations in nonlinear dimensionality reduction, Math. Program., 164:1-2, (2017), 341–381.
[8] J. Eckstein, Some saddle-function splitting methods for convex programming, Optim. Methods Softw., 4:1, (1994), 75–83.
[9] J. Eckstein and W. Yao, Understanding the convergence of the alternating direction method of multipliers: Theoretical and computational perspectives, Pac. J. Optim., 11:4, (2015), 619–644.
[10] M. Fazel, T.K. Pong, D.F. Sun, and P. Tseng, Hankel matrix rank minimization with applications to system identification and realization, SIAM J. Matrix Anal. Appl., 34:3, (2013), 946–977.
[11] J.F.B. Ferreira, Y. Khoo, and A. Singer, Semidefinite programming approach for the quadratic assignment problem with a sparse graph, Comput. Optim. Appl., 69:3, (2018), 677–712.
[12] D. Gabay and B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite element approximation, Comput. Math. Appl., 2:1, (1976), 17–40.
[13] R. Glowinski and A. Marroco, Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité, d'une classe de problèmes de Dirichlet non linéaires, Revue française d'automatique, informatique, recherche opérationnelle, Analyse Numérique, 9:R2, (1975), 41–76.
[14] R. Glowinski, On alternating direction methods of multipliers: a historical perspective, in Modeling, Simulation and Optimization for Science and Technology, W. Fitzgibbon, Y.A. Kuznetsov, P. Neittaanmäki, O. Pironneau (eds.), Springer, Netherlands, 2014, pp. 59–82.
[15] D.R. Han, D.F. Sun, and L.W. Zhang, Linear rate convergence of the alternating direction method of multipliers for convex composite programming, Math. Oper. Res., 43:2, (2018), 622–637.
[16] X.Y. Lam, J.S. Marron, D.F. Sun, and K.-C. Toh, Fast algorithms for large-scale generalized distance weighted discrimination, J. Comput. Graph. Stat., 27:2, (2018), 1–12.
[17] C. Lemaréchal and C. Sagastizábal, Practical aspects of the Moreau-Yosida regularization: theoretical preliminaries, SIAM J. Optim., 7:2, (1997), 367–385.
[18] M. Li, D.F. Sun, and K.-C. Toh, A majorized ADMM with indefinite proximal terms for linearly constrained convex composite optimization, SIAM J. Optim., 26:2, (2016), 922–950.
[19] X.D. Li, D.F. Sun, and K.-C. Toh, A Schur complement based semi-proximal ADMM for convex quadratic conic programming and extensions, Math. Program., 155:1-2, (2016), 333–373.
[20] X.D. Li, D.F. Sun, and K.-C. Toh, A block symmetric Gauss-Seidel decomposition theorem for convex composite quadratic programming and its applications, Math. Program., online, (2018), doi: 10.1007/s10107-018-1247-7.
[21] X.D. Li, D.F. Sun, and K.-C. Toh, QSDPNAL: A two-phase augmented Lagrangian method for convex quadratic semidefinite programming, Math. Program. Comput., 10:4, (2018), 703–743.
[22] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Kluwer Academic Publishers, Boston, 2004.
[23] R.T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, 1970.
[24] X.L. Song, B. Chen, and B. Yu, An efficient duality-based approach for PDE-constrained sparse optimization, Comput. Optim. Appl., 69:2, (2018), 461–500.
[25] D.F. Sun, K.-C. Toh, and L.Q. Yang, A convergent 3-block semiproximal alternating direction method of multipliers for conic programming with 4-type constraints, SIAM J. Optim., 25:2, (2015), 882–915.
[26] D.F. Sun, K.-C. Toh, Y.C. Yuan, and X.Y. Zhao, SDPNAL+: A Matlab software for semidefinite programming with bound constraints (version 1.0), arXiv:1710.10604, (2017).
[27] B. Wang and H. Zou, Another look at distance-weighted discrimination, J. Royal Stat. Soc., B, 80:1, (2018), 177–198.
[28] Z. Yan, S.Y. Gao, and C.P. Teo, On the design of sparse but efficient structures in operations, Manage. Sci., 64:7, (2018), 3421–3445.
[29] L.Q. Yang, D.F. Sun, and K.-C. Toh, SDPNAL+: a majorized semismooth Newton-CG augmented Lagrangian method for semidefinite programming with nonnegative constraints, Math. Program. Comput., 7, (2015), 331–366.
[30] N. Zhang, J. Wu, and L.W. Zhang, A linearly convergent majorized ADMM with indefinite proximal terms for convex composite programming and its applications, arXiv:1706.01698v2, (2018).

A  Convergence Analysis of the imiPADMM

In this part, we prove Theorem 4.1. We begin by introducing the notation and definitions used throughout this section. Then we establish a key inequality from which the convergence of the algorithm follows. After that, we turn to the convergence of the imiPADMM.

A.1  Additional notation and preliminaries

Recall that each space involved is a finite-dimensional real Euclidean space endowed with an inner product ⟨·, ·⟩ and its induced norm ‖·‖, and that, for a self-adjoint positive semidefinite linear operator ℋ, we write ⟨u, v⟩_ℋ := ⟨u, ℋv⟩ and ‖u‖_ℋ := ⟨u, ℋu⟩^{1/2}. Then we have that
\[
\langle u, v\rangle_{\mathcal H}
= \tfrac{1}{2}\bigl(\|u\|_{\mathcal H}^{2} + \|v\|_{\mathcal H}^{2} - \|u - v\|_{\mathcal H}^{2}\bigr)
= \tfrac{1}{2}\bigl(\|u + v\|_{\mathcal H}^{2} - \|u\|_{\mathcal H}^{2} - \|v\|_{\mathcal H}^{2}\bigr).
\tag{A.1}
\]
The following lemma was given in [30, Lemma 3.2].
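As a quick numerical sanity check of the weighted polarization identity (A.1), the sketch below verifies it for a randomly generated self-adjoint positive definite operator ℋ; the dimension and the random data are purely illustrative assumptions, not objects from the paper.

```python
import numpy as np

# Verify identity (A.1) in an H-weighted inner product, with H a randomly
# generated symmetric positive definite matrix (illustrative stand-in).
rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n))
H = M @ M.T + n * np.eye(n)          # symmetric positive definite
u, v = rng.standard_normal(n), rng.standard_normal(n)

def ip(a, b):                         # <a, b>_H
    return a @ H @ b

def sq(a):                            # ||a||_H^2
    return ip(a, a)

lhs = ip(u, v)
mid = 0.5 * (sq(u) + sq(v) - sq(u - v))
rhs = 0.5 * (sq(u + v) - sq(u) - sq(v))
assert np.allclose(lhs, mid) and np.allclose(lhs, rhs)
print("identity (A.1) verified numerically")
```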

Lemma A.1. Let h : U → (−∞, +∞) be a smooth convex function for which there is a self-adjoint positive semidefinite linear operator Σ̂_h : U → U such that, for any given u′ ∈ U,
\[
h(u) \le h(u') + \langle \nabla h(u'),\, u - u'\rangle + \tfrac12\|u - u'\|_{\widehat\Sigma_h}^2,
\qquad \forall\, u \in \mathcal U.
\]
Then it holds that, for any given u′ ∈ U,
\[
\langle \nabla h(u) - \nabla h(u'),\, v - u'\rangle \ge -\tfrac14\|v - u\|_{\widehat\Sigma_h}^2,
\qquad \forall\, u, v \in \mathcal U.
\tag{A.2}
\]

We now turn to problem (1.1) and Algorithm 2. By the convexity of f and g, there exist two positive semidefinite linear operators Σ_f (⪯ Σ̂_f) and Σ_g (⪯ Σ̂_g) such that
\[
\begin{aligned}
f(x) &\ge f(x') + \langle \nabla f(x'),\, x - x'\rangle + \tfrac12\|x - x'\|_{\Sigma_f}^2,
&\qquad& \forall\, x, x' \in \mathcal X,\\
g(y) &\ge g(y') + \langle \nabla g(y'),\, y - y'\rangle + \tfrac12\|y - y'\|_{\Sigma_g}^2,
&& \forall\, y, y' \in \mathcal Y.
\end{aligned}
\tag{A.3}
\]

Since the sequence {ε_k} in Algorithm 2 is nonnegative and summable, we can define the two real numbers
\[
E := \sum_{k=0}^{\infty} \varepsilon_k
\qquad\text{and}\qquad
E' := \sum_{k=0}^{\infty} \varepsilon_k^2.
\]

Suppose that both (i) and (ii) in Assumption 3.1 hold. Then an infinite sequence {(x^k, y^k, z^k)} is generated by Algorithm 2, and there exist two sequences {x̄^k} and {ȳ^k} defined by (4.6) and (4.7), respectively. In this case, we define, for any k ≥ 0,
\[
\left\{
\begin{aligned}
r^k &:= \mathcal A^* x^k + \mathcal B^* y^k - c,
&\qquad
\bar r^k &:= \mathcal A^* \bar x^k + \mathcal B^* \bar y^k - c,\\
\tilde z^{k+1} &:= z^k + \sigma r^{k+1},
&
\bar z^{k+1} &:= z^k + \tau\sigma\, \bar r^{k+1},
\end{aligned}
\right.
\]
with the convention that x̄^0 = x^0 and ȳ^0 = y^0, where τ is the step-length used in Algorithm 2. Moreover, we define the following three constants:
\[
\left\{
\begin{aligned}
\alpha &:= \bigl(1 + \tau/\min\{1+\tau,\, 1+\tau^{-1}\}\bigr)/2,\\
\hat\alpha &:= 1 - \alpha\min\{\tau,\, \tau^{-1}\},\\
\beta &:= \min\{1,\, 1-\tau+\tau^{-1}\}\,\alpha - (1-\alpha)\tau.
\end{aligned}
\right.
\tag{A.4}
\]
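The sign properties of the constants in (A.4), which are asserted later in Lemma A.3, can be checked numerically over a grid of step-lengths τ ∈ (0, (1 + √5)/2); the grid and tolerances below are illustrative choices.

```python
import numpy as np

# Numerically check that the constants defined in (A.4) satisfy
# 0 < alpha < 1, 0 < alpha_hat < 1 and beta > 0 for
# tau in (0, (1 + sqrt(5))/2), as claimed later in Lemma A.3.
def constants(tau):
    alpha = (1.0 + tau / min(1.0 + tau, 1.0 + 1.0 / tau)) / 2.0
    alpha_hat = 1.0 - alpha * min(tau, 1.0 / tau)
    beta = min(1.0, 1.0 - tau + 1.0 / tau) * alpha - (1.0 - alpha) * tau
    return alpha, alpha_hat, beta

golden = (1.0 + np.sqrt(5.0)) / 2.0
for tau in np.linspace(1e-3, golden - 1e-3, 200):
    alpha, alpha_hat, beta = constants(tau)
    assert 0.0 < alpha < 1.0
    assert 0.0 < alpha_hat < 1.0
    assert beta > 0.0
print("sign conditions of (A.4) hold on the grid")
```

For example, the unit step-length τ = 1 gives α = 3/4, α̂ = 1/4 and β = 1/2.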

Based on the above definitions, we have the following result.

Proposition A.1. Suppose that both (i) and (ii) in Assumption 3.1 hold. Let {(x^k, y^k, z^k)} be the sequence generated by Algorithm 2, and let {x̄^k}, {ȳ^k} be the sequences defined by (4.6) and (4.7). Then, for any k ≥ 0, we have that
\[
\|x^{k+1} - \bar x^{k+1}\|_{\mathcal M} \le \varepsilon_k
\qquad\text{and}\qquad
\|y^{k+1} - \bar y^{k+1}\|_{\mathcal N} \le \bigl(1 + \sigma\|\mathcal N^{-\frac12}\mathcal B\mathcal A^*\mathcal M^{-\frac12}\|\bigr)\varepsilon_k,
\]
where
\[
\mathcal M := \widehat\Sigma_f + \sigma\mathcal A\mathcal A^* + \mathcal S \succ 0
\qquad\text{and}\qquad
\mathcal N := \widehat\Sigma_g + \sigma\mathcal B\mathcal B^* + \mathcal T \succ 0.
\tag{A.5}
\]

Proof. As a consequence of (4.5) in Algorithm 2, (A.5) holds. The remainder of the proof follows readily from a few properties of the Moreau–Yosida mappings in [17]; one can refer to [4, Proposition 3.1] and its proof for the details.

The following result on quasi-Fejér monotone sequences of real numbers will be used later.

Lemma A.2. Let {a_k}_{k≥0} be a nonnegative sequence of real numbers satisfying a_{k+1} ≤ a_k + ε_k for all k ≥ 0, where {ε_k}_{k≥0} is a nonnegative and summable sequence of real numbers. Then the quasi-Fejér monotone sequence {a_k} converges to a unique limit.
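The mechanism behind Lemma A.2 can be illustrated numerically: the shifted sequence c_k := a_k − Σ_{j<k} ε_j is nonincreasing and bounded below, hence convergent, which forces {a_k} itself to converge. The particular sequences below are illustrative choices satisfying the assumptions of the lemma.

```python
import itertools

# Illustration of Lemma A.2: a nonnegative sequence with
# a_{k+1} <= a_k + eps_k and summable {eps_k} converges, even if
# it is not monotone. The sequences here are illustrative.
K = 20000
eps = [1.0 / (k + 1) ** 2 for k in range(K)]
a = [1.0]
for k in range(K):
    # a non-monotone update that still obeys a_{k+1} <= a_k + eps_k
    step = eps[k] if k % 2 == 0 else -0.5 * eps[k]
    a.append(max(a[-1] + step, 0.0))

# c_k := a_k - sum_{j<k} eps_j is nonincreasing, as in the proof idea
partial = list(itertools.accumulate([0.0] + eps))
c = [a[k] - partial[k] for k in range(K + 1)]
assert all(c[k + 1] <= c[k] + 1e-15 for k in range(K))

# and the tail of {a_k} has essentially settled
tail = a[-1000:]
assert max(tail) - min(tail) < 1e-5
print("quasi-Fejer monotone sequence converged numerically")
```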

A.2  The key inequality

Now, we start to analyze the convergence of the imiPADMM by establishing some necessary results. Let τ be the dual step-length in Algorithm 2 and α be the constant defined in (A.4). We define the following two linear operators:
\[
\mathcal F := \tfrac12\widehat\Sigma_f + \mathcal S + \frac{(1-\alpha)\sigma}{2}\mathcal A\mathcal A^*
\qquad\text{and}\qquad
\mathcal G := \tfrac12\widehat\Sigma_g + \mathcal T + \min\{\tau,\, 1+\tau-\tau^2\}\,\alpha\sigma\,\mathcal B\mathcal B^*,
\tag{A.6}
\]
where Σ̂_f and Σ̂_g are given by (3.1).

Lemma A.3. Assume that (4.5) holds. For any τ ∈ (0, (1 + √5)/2), the constants α, α̂ and β defined by (A.4) satisfy 0 < α < 1, 0 < α̂ < 1 and β > 0. Meanwhile, the linear operators F and G defined by (A.6) are positive definite.

Proof. It is easy to see from (A.4) and the fact that τ ∈ (0, (1 + √5)/2) that 0 < α < 1, 0 < α̂ < 1 and β > 0. Also, it holds that ρ := min{τ, 1+τ−τ²} ∈ (0, 1], so that 0 < ρα < 1. Note that by (A.6) we have that
\[
\begin{aligned}
\mathcal F &= \frac{1-\alpha}{2}\Bigl(\tfrac12\widehat\Sigma_f + \mathcal S + \sigma\mathcal A\mathcal A^*\Bigr)
+ \frac{1+\alpha}{2}\Bigl(\tfrac12\widehat\Sigma_f + \mathcal S\Bigr),\\
\mathcal G &= \frac{\rho\alpha}{2}\Bigl(\tfrac12\widehat\Sigma_g + \mathcal T + \sigma\mathcal B\mathcal B^*\Bigr)
+ \frac{2-\rho\alpha}{2}\Bigl(\tfrac12\widehat\Sigma_g + \mathcal T\Bigr)
+ \frac{\rho\alpha}{2}\,\sigma\mathcal B\mathcal B^*.
\end{aligned}
\]
Hence, one can readily observe that F ≻ 0 and G ≻ 0 from (4.5). This completes the proof.
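The two operator splittings used in the proof of Lemma A.3 can be checked numerically. The sketch below uses randomly generated PSD matrices as stand-ins for Σ̂_f, Σ̂_g, S, T, AA* and BB* (these stand-ins, the dimension, and the values of σ and τ are illustrative assumptions only).

```python
import numpy as np

# Verify the splittings of F and G from the proof of Lemma A.3 with
# random PSD stand-ins for the operators involved (illustrative only).
rng = np.random.default_rng(1)
def psd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T

n, sigma, tau = 5, 0.7, 1.2
Sf, S, AAt = psd(n), psd(n), psd(n)   # stand-ins for Sigma_f_hat, S, A A*
Sg, T, BBt = psd(n), psd(n), psd(n)   # stand-ins for Sigma_g_hat, T, B B*

alpha = (1 + tau / min(1 + tau, 1 + 1 / tau)) / 2
rho = min(tau, 1 + tau - tau ** 2)    # rho in (0, 1] for admissible tau

F = 0.5 * Sf + S + 0.5 * (1 - alpha) * sigma * AAt
G = 0.5 * Sg + T + rho * alpha * sigma * BBt

F_split = 0.5 * (1 - alpha) * (0.5 * Sf + S + sigma * AAt) \
        + 0.5 * (1 + alpha) * (0.5 * Sf + S)
G_split = 0.5 * rho * alpha * (0.5 * Sg + T + sigma * BBt) \
        + 0.5 * (2 - rho * alpha) * (0.5 * Sg + T) \
        + 0.5 * rho * alpha * sigma * BBt

assert np.allclose(F, F_split) and np.allclose(G, G_split)
print("splittings in the proof of Lemma A.3 verified")
```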

Based on the previous results, one can obtain the following lemma, which is exactly the same as [4, Theorem 5.2]; hence we omit the corresponding proof.

Lemma A.4. Suppose that both (i) and (ii) in Assumption 3.1 hold. Then the infinite sequence {(x^k, y^k, z^k)} generated by Algorithm 2 satisfies, for all k ≥ 1,
\[
\begin{aligned}
&(1-\tau)\sigma\|r^{k+1}\|^2 + \sigma\|\mathcal A^* x^{k+1} + \mathcal B^* y^k - c\|^2
+ 2\alpha\langle d_y^{k-1} - d_y^k,\ y^k - y^{k+1}\rangle\\
\ge{}& \hat\alpha\sigma\bigl(\|r^{k+1}\|^2 - \|r^k\|^2\bigr) + \beta\sigma\|r^{k+1}\|^2
+ \|x^k - x^{k+1}\|_{\frac{(1-\alpha)\sigma}{2}\mathcal A\mathcal A^*}^2\\
&- \|y^{k-1} - y^k\|_{\alpha(\widehat\Sigma_g+\mathcal T)}^2
+ \|y^k - y^{k+1}\|_{\alpha(\widehat\Sigma_g+\mathcal T) + \min\{\tau,\,1+\tau-\tau^2\}\alpha\sigma\mathcal B\mathcal B^*}^2.
\end{aligned}
\tag{A.7}
\]

Next, we shall derive an inequality that is essential for establishing the global convergence of the imiPADMM.

Proposition A.2 (The key inequality). Suppose that both (i) and (ii) in Assumption 3.1 hold. For any w̄ := (x̄, ȳ, z̄) ∈ W, the sequence {(x^k, y^k, z^k)} generated by Algorithm 2 satisfies
\[
\begin{aligned}
&2\alpha\langle d_y^k - d_y^{k-1},\ y^k - y^{k+1}\rangle
- 2\langle d_x^k,\ x^{k+1} - \bar x\rangle - 2\langle d_y^k,\ y^{k+1} - \bar y\rangle\\
&\quad + \|x^k - x^{k+1}\|_{\mathcal F}^2 + \|y^k - y^{k+1}\|_{\mathcal G}^2 + \beta\sigma\|r^{k+1}\|^2
\le \phi_k(\bar w) - \phi_{k+1}(\bar w),
\qquad \forall\, k \ge 1,
\end{aligned}
\tag{A.8}
\]
where, for any (x, y, z) ∈ X × Y × Z and k ≥ 1,
\[
\begin{aligned}
\phi_k(x, y, z) :={}& \frac{1}{\tau\sigma}\|z - z^k\|^2
+ \|x - x^k\|_{\widehat\Sigma_f+\mathcal S}^2 + \|y - y^k\|_{\widehat\Sigma_g+\mathcal T}^2\\
&+ \sigma\|\mathcal A^* x + \mathcal B^* y^k - c\|^2 + \hat\alpha\sigma\|r^k\|^2
+ \alpha\|y^{k-1} - y^k\|_{\widehat\Sigma_g+\mathcal T}^2.
\end{aligned}
\tag{A.9}
\]

Proof. For any given (x, y, z) ∈ X × Y × Z, we define x_e := x − x̄, y_e := y − ȳ and z_e := z − z̄. From the definition of z̃^{k+1}, one has that
\[
z^k + \sigma(\mathcal A^* x^{k+1} + \mathcal B^* y^k - c)
= \tilde z^{k+1} + \sigma\mathcal B^*(y^k - y^{k+1}).
\]
Then, from Step 1 of Algorithm 2, we know that
\[
d_x^k - \nabla f(x^k) - \mathcal A\bigl(\tilde z^{k+1} + \sigma\mathcal B^*(y^k - y^{k+1})\bigr)
+ (\widehat\Sigma_f + \mathcal S)(x^k - x^{k+1}) \in \partial p(x^{k+1}).
\tag{A.10}
\]
Moreover, the convexity of p implies that
\[
p(\bar x) + \bigl\langle d_x^k - \nabla f(x^k) - \mathcal A\bigl(\tilde z^{k+1} + \sigma\mathcal B^*(y^k - y^{k+1})\bigr)
+ (\widehat\Sigma_f + \mathcal S)(x^k - x^{k+1}),\ x_e^{k+1}\bigr\rangle \ge p(x^{k+1}).
\tag{A.11}
\]
Applying a similar derivation to Step 2 of Algorithm 2, we can also get that
\[
q(\bar y) + \langle d_y^k,\, y_e^{k+1}\rangle - \langle \nabla g(y^k),\, y_e^{k+1}\rangle
- \langle \tilde z^{k+1},\, \mathcal B^* y_e^{k+1}\rangle
+ \bigl\langle (\widehat\Sigma_g + \mathcal T)(y^k - y^{k+1}),\ y_e^{k+1}\bigr\rangle \ge q(y^{k+1}).
\tag{A.12}
\]
By using (3.3) and the convexity of the functions p and q, we have
\[
p(x^{k+1}) - p(\bar x) + \langle \nabla f(\bar x) + \mathcal A\bar z,\ x_e^{k+1}\rangle \ge 0,
\qquad
q(y^{k+1}) - q(\bar y) + \langle \nabla g(\bar y) + \mathcal B\bar z,\ y_e^{k+1}\rangle \ge 0.
\tag{A.13}
\]

Finally, by summing (A.11), (A.12) and (A.13) together, we get
\[
\begin{aligned}
&-\langle \tilde z_e^{k+1},\ r^{k+1}\rangle
- \sigma\langle \mathcal B^*(y^k - y^{k+1}),\ \mathcal A^* x_e^{k+1}\rangle
+ \langle x_e^{k+1},\ x^k - x^{k+1}\rangle_{\widehat\Sigma_f+\mathcal S}\\
&\qquad + \langle d_x^k,\ x_e^{k+1}\rangle + \langle d_y^k,\ y_e^{k+1}\rangle
+ \langle y_e^{k+1},\ y^k - y^{k+1}\rangle_{\widehat\Sigma_g+\mathcal T}\\
&\ge \langle \nabla f(x^k) - \nabla f(\bar x),\ x_e^{k+1}\rangle
+ \langle \nabla g(y^k) - \nabla g(\bar y),\ y_e^{k+1}\rangle\\
&\ge -\tfrac14\|x^k - x^{k+1}\|_{\widehat\Sigma_f}^2 - \tfrac14\|y^k - y^{k+1}\|_{\widehat\Sigma_g}^2,
\end{aligned}
\tag{A.14}
\]

where the last inequality comes from Lemma A.1. Next, we estimate the left-hand side of (A.14). By using (A.1), we have that
\[
\begin{aligned}
\langle \mathcal B^*(y^k - y^{k+1}),\ \mathcal A^* x_e^{k+1}\rangle
&= \langle \mathcal B^* y_e^k - \mathcal B^* y_e^{k+1},\ r^{k+1} - \mathcal B^* y_e^{k+1}\rangle\\
&= \langle \mathcal B^* y_e^k - \mathcal B^* y_e^{k+1},\ r^{k+1}\rangle
- \tfrac12\bigl(\|\mathcal B^* y_e^k\|^2 - \|\mathcal B^* y_e^k - \mathcal B^* y_e^{k+1}\|^2 - \|\mathcal B^* y_e^{k+1}\|^2\bigr)\\
&= \tfrac12\bigl(\|\mathcal B^* y_e^{k+1}\|^2 + \|\mathcal A^* x^{k+1} + \mathcal B^* y^k - c\|^2
- \|\mathcal B^* y_e^k\|^2 - \|r^{k+1}\|^2\bigr).
\end{aligned}
\tag{A.15}
\]

Also, from (A.1) we know that
\[
\begin{aligned}
\langle x_e^{k+1},\ x^k - x^{k+1}\rangle_{\widehat\Sigma_f+\mathcal S}
&= \tfrac12\bigl(\|x_e^k\|_{\widehat\Sigma_f+\mathcal S}^2 - \|x_e^{k+1}\|_{\widehat\Sigma_f+\mathcal S}^2\bigr)
- \tfrac12\|x^k - x^{k+1}\|_{\widehat\Sigma_f+\mathcal S}^2,\\
\langle y_e^{k+1},\ y^k - y^{k+1}\rangle_{\widehat\Sigma_g+\mathcal T}
&= \tfrac12\bigl(\|y_e^k\|_{\widehat\Sigma_g+\mathcal T}^2 - \|y_e^{k+1}\|_{\widehat\Sigma_g+\mathcal T}^2\bigr)
- \tfrac12\|y^k - y^{k+1}\|_{\widehat\Sigma_g+\mathcal T}^2.
\end{aligned}
\tag{A.16}
\]

Moreover, by using the definition of {z̃^k} and (A.1), we know that
\[
\begin{aligned}
\langle r^{k+1},\ \tilde z_e^{k+1}\rangle
&= \langle r^{k+1},\ z_e^k + \sigma r^{k+1}\rangle
= \frac{1}{\tau\sigma}\langle z^{k+1} - z^k,\ z_e^k\rangle + \sigma\|r^{k+1}\|^2\\
&= \frac{1}{2\tau\sigma}\bigl(\|z_e^{k+1}\|^2 - \|z^{k+1} - z^k\|^2 - \|z_e^k\|^2\bigr) + \sigma\|r^{k+1}\|^2\\
&= \frac{1}{2\tau\sigma}\bigl(\|z_e^{k+1}\|^2 - \|z_e^k\|^2\bigr) + \frac{(2-\tau)\sigma}{2}\|r^{k+1}\|^2,
\end{aligned}
\tag{A.17}
\]
where z̃_e^{k+1} := z̃^{k+1} − z̄.
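The elementary identities (A.15) and (A.17) can be verified numerically. The sketch below builds a KKT-style point (x̄, ȳ) with A*x̄ + B*ȳ = c and checks both identities on random data; all matrices, vectors, and the values of σ and τ are illustrative stand-ins.

```python
import numpy as np

# Numerical verification of identities (A.15) and (A.17) on random data
# (illustrative stand-ins; A.T plays the adjoint A*, B.T plays B*).
rng = np.random.default_rng(2)
m1, m2, p = 4, 3, 5
A = rng.standard_normal((m1, p))
B = rng.standard_normal((m2, p))
sigma, tau = 0.9, 1.1

x_bar, y_bar = rng.standard_normal(m1), rng.standard_normal(m2)
c = A.T @ x_bar + B.T @ y_bar      # so that A* x_bar + B* y_bar - c = 0

xk1 = rng.standard_normal(m1)
yk, yk1 = rng.standard_normal(m2), rng.standard_normal(m2)
zk, z_bar = rng.standard_normal(p), rng.standard_normal(p)

r1 = A.T @ xk1 + B.T @ yk1 - c     # residual r^{k+1}
zk1 = zk + tau * sigma * r1        # multiplier update with step-length tau
ztil1 = zk + sigma * r1            # extrapolated multiplier z~^{k+1}
xe1, yek, yek1 = xk1 - x_bar, yk - y_bar, yk1 - y_bar
zek, zek1, ztil_e1 = zk - z_bar, zk1 - z_bar, ztil1 - z_bar

# identity (A.15)
lhs15 = (B.T @ (yk - yk1)) @ (A.T @ xe1)
rhs15 = 0.5 * (np.dot(B.T @ yek1, B.T @ yek1)
               + np.linalg.norm(A.T @ xk1 + B.T @ yk - c) ** 2
               - np.dot(B.T @ yek, B.T @ yek)
               - np.dot(r1, r1))
assert np.isclose(lhs15, rhs15)

# identity (A.17)
lhs17 = r1 @ ztil_e1
rhs17 = (np.dot(zek1, zek1) - np.dot(zek, zek)) / (2 * tau * sigma) \
        + (2 - tau) * sigma / 2 * np.dot(r1, r1)
assert np.isclose(lhs17, rhs17)
print("identities (A.15) and (A.17) verified")
```

Identity (A.16) is the standard weighted polarization identity already checked after (A.1).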

Thus, by substituting (A.15), (A.16) and (A.17) into (A.14), we obtain that
\[
\begin{aligned}
&\langle d_x^k,\ x_e^{k+1}\rangle + \langle d_y^k,\ y_e^{k+1}\rangle
+ \frac{1}{2\tau\sigma}\bigl(\|z_e^k\|^2 - \|z_e^{k+1}\|^2\bigr)
+ \frac{\sigma}{2}\bigl(\|\mathcal B^* y_e^k\|^2 - \|\mathcal B^* y_e^{k+1}\|^2\bigr)\\
&\qquad + \tfrac12\bigl(\|x_e^k\|_{\widehat\Sigma_f+\mathcal S}^2 + \|y_e^k\|_{\widehat\Sigma_g+\mathcal T}^2\bigr)
- \tfrac12\bigl(\|x_e^{k+1}\|_{\widehat\Sigma_f+\mathcal S}^2 + \|y_e^{k+1}\|_{\widehat\Sigma_g+\mathcal T}^2\bigr)\\
&\ge \tfrac12\|x^k - x^{k+1}\|_{\frac12\widehat\Sigma_f+\mathcal S}^2
+ \tfrac12\|y^k - y^{k+1}\|_{\frac12\widehat\Sigma_g+\mathcal T}^2
+ \frac{\sigma}{2}\|\mathcal A^* x^{k+1} + \mathcal B^* y^k - c\|^2
+ \frac{(1-\tau)\sigma}{2}\|r^{k+1}\|^2.
\end{aligned}
\tag{A.18}
\]

Note that, for any y ∈ Y, A*x̄ + B*y − c = B*y_e. Therefore, by applying (A.7) in Lemma A.4 to the right-hand side of (A.18) and using (A.9) together with (A.6), we know that (A.8) holds for all k ≥ 1. This completes the proof.

Remark A.1. The inequality (A.18) in the above proof is responsible for the improvement made in this paper over [4], in which the same problem as (1.1) was considered and the inequality serving the same purpose as (A.18) reads
\[
\begin{aligned}
&\langle d_x^k,\ x_e^{k+1}\rangle + \langle d_y^k,\ y_e^{k+1}\rangle
+ \frac{1}{2\tau\sigma}\bigl(\|z_e^k\|^2 - \|z_e^{k+1}\|^2\bigr)
+ \frac{\sigma}{2}\bigl(\|\mathcal B^* y_e^k\|^2 - \|\mathcal B^* y_e^{k+1}\|^2\bigr)\\
&\qquad + \tfrac12\bigl(\|x_e^k\|_{\widehat\Sigma_f+\mathcal S}^2 + \|y_e^k\|_{\widehat\Sigma_g+\mathcal T}^2\bigr)
- \tfrac12\bigl(\|x_e^{k+1}\|_{\widehat\Sigma_f+\mathcal S}^2 + \|y_e^{k+1}\|_{\widehat\Sigma_g+\mathcal T}^2\bigr)\\
&\ge \tfrac12\|x^k - x^{k+1}\|_{\boxed{\frac12\Sigma_f}+\mathcal S}^2
+ \tfrac12\|y^k - y^{k+1}\|_{\boxed{\frac12\Sigma_g}+\mathcal T}^2
+ \frac{\sigma}{2}\|\mathcal A^* x^{k+1} + \mathcal B^* y^k - c\|^2
+ \frac{(1-\tau)\sigma}{2}\|r^{k+1}\|^2,
\end{aligned}
\tag{A.19}
\]

where Σ_f and Σ_g are defined by (A.3). The difference between (A.19) and (A.18) is highlighted in a box. Since Σ̂_f ⪰ Σ_f and Σ̂_g ⪰ Σ_g, the inequality (A.18) is tighter than (A.19). Consequently, the inequality (A.8) looks the same as [4, (5.14) in Proposition 5.1], but one should notice that the definitions of F and G in this paper are different from those in [4], as can be seen from the following table.

    Ref. \ Item    F                                        G
    [4]            (1/2)Σ_f  + S + ((1−α)σ/2) AA*           (1/2)Σ_g  + T + min{τ, 1+τ−τ²} ασ BB*
    This paper     (1/2)Σ̂_f + S + ((1−α)σ/2) AA*           (1/2)Σ̂_g + T + min{τ, 1+τ−τ²} ασ BB*

A.3  Proof of Theorem 4.1

The proof of Theorem 4.1 can be obtained by using the newly defined F and G in (A.6) in place of those used in [4] and repeating the proof of [4, Theorem 5.1]. To make this paper more readable, we provide the proof of Theorem 4.1 here.

Proof. According to Assumption 3.1(i), the solution set W of the KKT system (3.3) of problem (1.1) is nonempty. Hence, we can choose a fixed w̄ := (x̄, ȳ, z̄) ∈ W, and define x_e := x − x̄, y_e := y − ȳ and z_e := z − z̄ for any given (x, y, z) ∈ X × Y × Z.

We first show that the sequence {(x^k, y^k, z^k)} is bounded. According to Lemma A.3, one has 0 < α < 1, 0 < α̂ < 1 and β > 0; moreover, the linear operators F and G defined in (A.6) are positive definite. Hence, it holds that
\[
\|y^k - y^{k+1}\|_{\mathcal G}^2 + 2\alpha\langle d_y^k - d_y^{k-1},\ y^k - y^{k+1}\rangle
= \|y^k - y^{k+1} + \alpha\,\mathcal G^{-1}(d_y^k - d_y^{k-1})\|_{\mathcal G}^2
- \alpha^2\|d_y^k - d_y^{k-1}\|_{\mathcal G^{-1}}^2.
\]
By substituting x̄^{k+1} and ȳ^{k+1} for x^{k+1} and y^{k+1} in (A.8) of Proposition A.2, we obtain that
\[
\phi_k(\bar w) - \bar\phi_{k+1}(\bar w) + \alpha^2\|d_y^{k-1}\|_{\mathcal G^{-1}}^2
\ge \|\bar x^{k+1} - x^k\|_{\mathcal F}^2 + \beta\sigma\|\bar r^{k+1}\|^2
+ \|\bar y^{k+1} - y^k + \alpha\,\mathcal G^{-1} d_y^{k-1}\|_{\mathcal G}^2,
\qquad \forall\, k \ge 1,
\tag{A.20}
\]
where, for any (x, y, z) ∈ X × Y × Z and k ≥ 1,
\[
\begin{aligned}
\bar\phi_k(x, y, z) :={}& \frac{1}{\tau\sigma}\|z - \bar z^k\|^2
+ \|x - \bar x^k\|_{\widehat\Sigma_f+\mathcal S}^2 + \|y - \bar y^k\|_{\widehat\Sigma_g+\mathcal T}^2\\
&+ \sigma\|\mathcal A^* x + \mathcal B^* \bar y^k - c\|^2 + \hat\alpha\sigma\|\bar r^k\|^2
+ \alpha\|y^{k-1} - \bar y^k\|_{\widehat\Sigma_g+\mathcal T}^2.
\end{aligned}
\]
Now, define the sequences {ξ^k} and {ξ̄^k} in Z × X × Y × Z × Y for k ≥ 1 by
\[
\left\{
\begin{aligned}
\xi^k &:= \Bigl(\tfrac{1}{\sqrt{\tau\sigma}}\, z_e^k,\ (\widehat\Sigma_f+\mathcal S)^{\frac12} x_e^k,\ \mathcal N^{\frac12} y_e^k,\ \sqrt{\hat\alpha\sigma}\, r^k,\ \sqrt{\alpha}\,(\widehat\Sigma_g+\mathcal T)^{\frac12}(y^{k-1} - y^k)\Bigr),\\
\bar\xi^k &:= \Bigl(\tfrac{1}{\sqrt{\tau\sigma}}\, \bar z_e^k,\ (\widehat\Sigma_f+\mathcal S)^{\frac12} \bar x_e^k,\ \mathcal N^{\frac12} \bar y_e^k,\ \sqrt{\hat\alpha\sigma}\, \bar r^k,\ \sqrt{\alpha}\,(\widehat\Sigma_g+\mathcal T)^{\frac12}(y^{k-1} - \bar y^k)\Bigr),
\end{aligned}
\right.
\]
where x̄_e^k := x̄^k − x̄, ȳ_e^k := ȳ^k − ȳ and z̄_e^k := z̄^k − z̄. Obviously, we have ‖ξ^k‖² = φ_k(w̄) and ‖ξ̄^k‖² = φ̄_k(w̄). This, together with (A.20), implies that ‖ξ̄^{k+1}‖² ≤ ‖ξ^k‖² + α²‖G^{−1/2} d_y^{k−1}‖². As a result, it holds that ‖ξ̄^{k+1}‖ ≤ ‖ξ^k‖ + α‖G^{−1/2} d_y^{k−1}‖. Consequently, one obtains that
\[
\|\xi^{k+1}\| \le \|\xi^k\| + \alpha\|\mathcal G^{-\frac12} d_y^{k-1}\| + \|\bar\xi^{k+1} - \xi^{k+1}\|.
\tag{A.21}
\]

Next, we estimate ‖ξ̄^{k+1} − ξ^{k+1}‖ in (A.21). From Lemma A.3 we know that α̂ + τ ∈ [1, 2], so that
\[
\frac{1}{\tau\sigma}\|\bar z^{k+1} - z^{k+1}\|^2 + \hat\alpha\sigma\|\bar r^{k+1} - r^{k+1}\|^2
= (\tau + \hat\alpha)\sigma\|\bar r^{k+1} - r^{k+1}\|^2
\le 2\sigma\|\mathcal A^*(\bar x^{k+1} - x^{k+1}) + \mathcal B^*(\bar y^{k+1} - y^{k+1})\|^2
\le 4\|\bar x^{k+1} - x^{k+1}\|_{\sigma\mathcal A\mathcal A^*}^2 + 4\|\bar y^{k+1} - y^{k+1}\|_{\sigma\mathcal B\mathcal B^*}^2.
\]
This, together with Proposition A.1, implies that
\[
\begin{aligned}
\|\bar\xi^{k+1} - \xi^{k+1}\|^2
\le{}& \|\bar x^{k+1} - x^{k+1}\|_{\widehat\Sigma_f+\mathcal S}^2
+ \|\bar y^{k+1} - y^{k+1}\|_{\mathcal N}^2
+ \|\bar y^{k+1} - y^{k+1}\|_{\widehat\Sigma_g+\mathcal T}^2\\
&+ 4\|\bar x^{k+1} - x^{k+1}\|_{\sigma\mathcal A\mathcal A^*}^2
+ 4\|\bar y^{k+1} - y^{k+1}\|_{\sigma\mathcal B\mathcal B^*}^2\\
\le{}& 5\bigl(\|\bar x^{k+1} - x^{k+1}\|_{\mathcal M}^2 + \|\bar y^{k+1} - y^{k+1}\|_{\mathcal N}^2\bigr)
\le \varrho^2\varepsilon_k^2,
\end{aligned}
\tag{A.22}
\]
where ϱ is the positive constant defined by
\[
\varrho := \sqrt{5\bigl(1 + (1 + \sigma\|\mathcal N^{-\frac12}\mathcal B\mathcal A^*\mathcal M^{-\frac12}\|)^2\bigr)}.
\]
On the other hand, from Proposition A.1, we know that ‖G^{−1/2} d_y^k‖ ≤ ‖G^{−1/2} N^{1/2}‖ ε_k. By using this fact together with (A.21) and (A.22), one can get
\[
\|\xi^{k+1}\| \le \|\xi^k\| + \varrho\,\varepsilon_k + \|\mathcal G^{-\frac12}\mathcal N^{\frac12}\|\,\varepsilon_{k-1}
\le \|\xi^1\| + \bigl(\varrho + \|\mathcal G^{-\frac12}\mathcal N^{\frac12}\|\bigr) E,
\tag{A.23}
\]

which implies that the sequence {ξ^k} is bounded. Hence, by (A.22), the sequence {ξ̄^k} is also bounded. From the definition of ξ^k, we know that the sequences {y^k}, {z^k}, {r^k} and {(Σ̂_f + S)^{1/2} x^k} are bounded. Thus, by the definition of r^k, the sequence {A*x^k} is also bounded, which, together with the definition of M and the fact that M ≻ 0, implies that {x^k} is bounded.

By (A.20), (A.22) and (A.23), we have that
\[
\begin{aligned}
&\sum_{k=1}^{\infty}\Bigl(\|\bar x^{k+1} - x^k\|_{\mathcal F}^2 + \beta\sigma\|\bar r^{k+1}\|^2
+ \|\bar y^{k+1} - y^k + \alpha\,\mathcal G^{-1}d_y^{k-1}\|_{\mathcal G}^2\Bigr)\\
\le{}& \sum_{k=1}^{\infty}\Bigl(\phi_k(\bar w) - \phi_{k+1}(\bar w)
+ \phi_{k+1}(\bar w) - \bar\phi_{k+1}(\bar w) + \alpha^2\|d_y^{k-1}\|_{\mathcal G^{-1}}^2\Bigr)\\
\le{}& \phi_1(\bar w) + \sum_{k=1}^{\infty}\|\xi^{k+1} - \bar\xi^{k+1}\|\bigl(\|\xi^{k+1}\| + \|\bar\xi^{k+1}\|\bigr)
+ \|\mathcal G^{-\frac12}\mathcal N^{\frac12}\|^2 E'\\
\le{}& \phi_1(\bar w) + \|\mathcal G^{-\frac12}\mathcal N^{\frac12}\|^2 E'
+ \varrho\,\max_{k\ge 1}\bigl\{\|\xi^{k+1}\| + \|\bar\xi^{k+1}\|\bigr\}\, E < \infty,
\end{aligned}
\tag{A.24}
\]
where we have used the fact that φ_k(w̄) − φ̄_k(w̄) ≤ ‖ξ^k − ξ̄^k‖(‖ξ^k‖ + ‖ξ̄^k‖). By (A.24), we know that lim_{k→∞} ‖x̄^{k+1} − x^k‖²_F = 0, lim_{k→∞} ‖ȳ^{k+1} − y^k + αG^{−1}d_y^{k−1}‖²_G = 0, and lim_{k→∞} ‖r̄^{k+1}‖² = 0. Then, by F ≻ 0 and G ≻ 0, we have that {x̄^{k+1} − x^k} → 0, {ȳ^{k+1} − y^k} → 0 and {r̄^k} → 0 as k → ∞ (for the second limit, note also that d_y^{k−1} → 0 since {ε_k} is summable). Also, since M ≻ 0 and N ≻ 0, we know from Proposition A.1 that {x̄^k − x^k} → 0 and {ȳ^k − y^k} → 0 as k → ∞. As a result, it holds that {x^k − x^{k+1}} → 0, {y^k − y^{k+1}} → 0, and {r^k} → 0 as k → ∞. Note that the sequence {(x^{k+1}, y^{k+1}, z^{k+1})} is bounded. Thus, it has a convergent subsequence {(x^{k_i+1}, y^{k_i+1}, z^{k_i+1})}, which converges to a point, say (x^∞, y^∞, z^∞) ∈ X × Y × Z.

We define two set-valued mappings F : X × Y × Z ⇉ X and G : X × Y × Z ⇉ Z by
\[
F(w) := \partial p(x) + \nabla f(x) + \mathcal A z
\qquad\text{and}\qquad
G(w) := \partial q(y) + \nabla g(y) + \mathcal B z,
\qquad \forall\, w = (x, y, z) \in \mathcal X \times \mathcal Y \times \mathcal Z.
\]

Note that, for any k ≥ 0,
\[
d_y^k - \nabla g(y^k) - \mathcal B\tilde z^{k+1} + (\widehat\Sigma_g + \mathcal T)(y^k - y^{k+1}) \in \partial q(y^{k+1}).
\tag{A.25}
\]
Since Assumption 3.1(ii) holds, by Clarke's Mean Value Theorem [6, Proposition 2.6.5], we know that for any k ≥ 1 there exist two self-adjoint linear operators 0 ⪯ P_x^k ⪯ Σ̂_f and 0 ⪯ P_y^k ⪯ Σ̂_g such that
\[
\nabla f(x^{k-1}) - \nabla f(x^k) = \mathcal P_x^k(x^{k-1} - x^k)
\qquad\text{and}\qquad
\nabla g(y^{k-1}) - \nabla g(y^k) = \mathcal P_y^k(y^{k-1} - y^k).
\tag{A.26}
\]
Note that (A.10) holds. Then, by (A.25) and (A.26), we know that for all k ≥ 1,
\[
\left\{
\begin{aligned}
&d_x^k - \mathcal P_x^{k+1}(x^k - x^{k+1}) + (\widehat\Sigma_f + \mathcal S)(x^k - x^{k+1})
+ (\tau - 1)\sigma\mathcal A r^{k+1} - \sigma\mathcal A\mathcal B^*(y^k - y^{k+1}) \in F(w^{k+1}),\\
&d_y^k - \mathcal P_y^{k+1}(y^k - y^{k+1}) + (\widehat\Sigma_g + \mathcal T)(y^k - y^{k+1})
+ (\tau - 1)\sigma\mathcal B r^{k+1} \in G(w^{k+1}).
\end{aligned}
\right.
\tag{A.27}
\]
By taking limits along {k_i} as i → ∞ in (A.27), we know that
\[
0 \in \partial p(x^\infty) + \nabla f(x^\infty) + \mathcal A z^\infty
\qquad\text{and}\qquad
0 \in \partial q(y^\infty) + \nabla g(y^\infty) + \mathcal B z^\infty,
\]
which, together with the fact that lim_{k→∞} r^k = 0, implies that (x^∞, y^∞, z^∞) ∈ W. Hence, (x^∞, y^∞) is a solution to problem (1.1) and z^∞ is a solution to the dual of problem (1.1).

By (A.23) and Lemma A.2, we know that the sequence {‖ξ^k‖} is convergent. Since all the previous discussions are valid for any w̄ ∈ W, we can take w̄ = (x^∞, y^∞, z^∞); in this case, ξ^{k_i} → 0 along the convergent subsequence, so that lim_{k→∞} ‖ξ^k‖ = 0. Thus, from the definition of {ξ^k}, we know that
\[
\lim_{k\to\infty} z^k = z^\infty,
\qquad
\lim_{k\to\infty} y^k = y^\infty
\qquad\text{and}\qquad
\lim_{k\to\infty} (\widehat\Sigma_f + \mathcal S)\, x^k = (\widehat\Sigma_f + \mathcal S)\, x^\infty.
\]
Since lim_{k→∞} r^k = 0, it holds that {A*x^k} → A*x^∞ as k → ∞. Consequently, from the definition of M and the fact that M ≻ 0, we can get lim_{k→∞} x^k = x^∞, which completes the proof.

B  The Proof of Proposition 4.2

Proof. We only need to prove one part of the result, i.e.,
\[
\mathcal S \succeq -\tfrac12\widehat\Sigma_f
\qquad\text{and}\qquad
\tfrac12\widehat\Sigma_f + \mathcal S + \sigma\mathcal A\mathcal A^* \succ 0
\tag{B.1}
\]
with S = S_sGS and M = M_sGS. Since sGS(M̃) ⪰ 0, we know from the definition of S and (3.2) that
\[
\mathcal S = {\rm Diag}(\widetilde{\mathcal S}_1, \ldots, \widetilde{\mathcal S}_m) + {\rm sGS}(\widetilde{\mathcal M})
\succeq {\rm Diag}(\widetilde{\mathcal S}_1, \ldots, \widetilde{\mathcal S}_m)
\succeq -\tfrac12\widehat\Sigma_f.
\]
We know from (3.2), (4.1) and (A.5) that
\[
\tfrac12\widehat\Sigma_f + \sigma\mathcal A\mathcal A^* + {\rm Diag}(\widetilde{\mathcal S}_1, \ldots, \widetilde{\mathcal S}_m) \succeq 0
\]
and
\[
\mathcal M = \tfrac12\widehat\Sigma_f
+ \Bigl(\tfrac12\widehat\Sigma_f + \sigma\mathcal A\mathcal A^*
+ {\rm Diag}(\widetilde{\mathcal S}_1, \ldots, \widetilde{\mathcal S}_m) + {\rm sGS}(\widetilde{\mathcal M})\Bigr) \succ 0.
\tag{B.2}
\]
Moreover, one has that
\[
{\rm sGS}(\widetilde{\mathcal M})
= (\widehat\Sigma_f + \sigma\mathcal A\mathcal A^*)_u
\Bigl(\bigl(\widehat\Sigma_f + \sigma\mathcal A\mathcal A^* + {\rm Diag}(\widetilde{\mathcal S}_1, \ldots, \widetilde{\mathcal S}_m)\bigr)_d\Bigr)^{-1}
(\widehat\Sigma_f + \sigma\mathcal A\mathcal A^*)_u^*.
\tag{B.3}
\]
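The structure of the sGS operator appearing in (B.2)–(B.3) can be illustrated numerically. For a self-adjoint Q split into a block-diagonal part Q_d ≻ 0, a strictly block-upper part Q_u, and its adjoint, one has sGS(Q) := Q_u Q_d^{−1} Q_u^*, and Q + sGS(Q) = (Q_d + Q_u) Q_d^{−1} (Q_d + Q_u)^* ⪰ 0 (cf. the sGS decomposition theorem of [20]); the block sizes and data below are illustrative choices.

```python
import numpy as np

# Illustrate the sGS operator: for self-adjoint Q = Q_d + Q_u + Q_u^*,
# sGS(Q) := Q_u Q_d^{-1} Q_u^*, and Q + sGS(Q) factors as
# (Q_d + Q_u) Q_d^{-1} (Q_d + Q_u)^*. Data here are illustrative.
rng = np.random.default_rng(3)
sizes = [2, 3, 2]
n = sum(sizes)
M = rng.standard_normal((n, n))
Q = M @ M.T + 0.1 * np.eye(n)      # symmetric positive definite

# split Q into block-diagonal and strictly block-upper parts
Qd = np.zeros_like(Q)
Qu = np.zeros_like(Q)
offs = np.cumsum([0] + sizes)
for i in range(len(sizes)):
    si = slice(offs[i], offs[i + 1])
    Qd[si, si] = Q[si, si]
    for j in range(i + 1, len(sizes)):
        sj = slice(offs[j], offs[j + 1])
        Qu[si, sj] = Q[si, sj]

sGS = Qu @ np.linalg.inv(Qd) @ Qu.T
lhs = Q + sGS
rhs = (Qd + Qu) @ np.linalg.inv(Qd) @ (Qd + Qu).T
assert np.allclose(lhs, rhs)
assert np.min(np.linalg.eigvalsh(sGS)) > -1e-10   # sGS(Q) is PSD
print("sGS decomposition identity verified")
```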

Suppose that ½Σ̂_f + S + σAA* is not positive definite. Then there exists a nonzero vector ν ∈ X such that
\[
\bigl\langle \nu,\ \bigl(\tfrac12\widehat\Sigma_f + \mathcal S + \sigma\mathcal A\mathcal A^*\bigr)\nu\bigr\rangle = 0.
\]
Therefore, by using (B.2) and (B.3), one has that
\[
\bigl\langle \nu,\ \bigl(\tfrac12\widehat\Sigma_f + \sigma\mathcal A\mathcal A^*
+ {\rm Diag}(\widetilde{\mathcal S}_1, \ldots, \widetilde{\mathcal S}_m)\bigr)\nu\bigr\rangle = 0
\qquad\text{and}\qquad
\bigl\langle \nu,\ {\rm sGS}(\widetilde{\mathcal M})\nu\bigr\rangle = 0.
\]
Meanwhile, according to (B.3), one has that (Σ̂_f + σAA*)_u^* ν = 0. Consequently, it holds that
\[
\begin{aligned}
0 &= \bigl\langle \nu,\ \bigl(\tfrac12\widehat\Sigma_f + \sigma\mathcal A\mathcal A^*
+ {\rm Diag}(\widetilde{\mathcal S}_1, \ldots, \widetilde{\mathcal S}_m)\bigr)\nu\bigr\rangle\\
&= \tfrac12\bigl\langle \nu,\ (\widehat\Sigma_f + \sigma\mathcal A\mathcal A^*)\nu\bigr\rangle
+ \bigl\langle \nu,\ \bigl(\tfrac12\sigma\mathcal A\mathcal A^*
+ {\rm Diag}(\widetilde{\mathcal S}_1, \ldots, \widetilde{\mathcal S}_m)\bigr)\nu\bigr\rangle\\
&= \tfrac12\bigl\langle \nu,\ \bigl(\widehat\Sigma_f + \sigma\mathcal A\mathcal A^*\bigr)_d\,\nu\bigr\rangle
+ \bigl\langle \nu,\ \bigl(\tfrac12\sigma\mathcal A\mathcal A^*
+ {\rm Diag}(\widetilde{\mathcal S}_1, \ldots, \widetilde{\mathcal S}_m)\bigr)\nu\bigr\rangle\\
&= \bigl\langle \nu,\ \bigl(\tfrac12(\widehat\Sigma_f)_d
+ {\rm Diag}(\widetilde{\mathcal S}_1, \ldots, \widetilde{\mathcal S}_m)\bigr)\nu\bigr\rangle
+ \tfrac12\sigma\bigl\langle \nu,\ (\mathcal A\mathcal A^*)_d\,\nu\bigr\rangle
+ \tfrac12\sigma\bigl\langle \nu,\ \mathcal A\mathcal A^*\nu\bigr\rangle,
\end{aligned}
\]
where the third equality follows from (Σ̂_f + σAA*)_u^* ν = 0. Note that S + ½Σ̂_f ⪰ 0, so all of its diagonal blocks are positive semidefinite. Hence, we know that
\[
\bigl\langle \nu,\ \bigl(\tfrac12(\widehat\Sigma_f)_d
+ {\rm Diag}(\widetilde{\mathcal S}_1, \ldots, \widetilde{\mathcal S}_m)\bigr)\nu\bigr\rangle = 0
\qquad\text{and}\qquad
\bigl\langle \nu,\ (\mathcal A\mathcal A^*)_d\,\nu\bigr\rangle = 0.
\]
Thus, we can get
\[
\bigl\langle \nu,\ \bigl(\tfrac12(\widehat\Sigma_f)_d + \sigma(\mathcal A\mathcal A^*)_d
+ {\rm Diag}(\widetilde{\mathcal S}_1, \ldots, \widetilde{\mathcal S}_m)\bigr)\nu\bigr\rangle = 0,
\]
which contradicts the assumption in (3.2) that ½(Σ̂_f)_{ii} + σA_iA_i^* + S̃_i ≻ 0 for all i = 1, ..., m. Therefore, we have ½Σ̂_f + S + σAA* ≻ 0, and this completes the proof.

