SCIENCE CHINA Mathematics Progress of Projects Supported by NSFC

May 2017 Vol. 60 No. 5: 759–776 doi: 10.1007/s11425-016-9010-x

ARTICLES

Optimality conditions for sparse nonlinear programming

PAN LiLi^{1,2,*}, XIU NaiHua^1 & FAN Jun^1

^1 Department of Mathematics, Beijing Jiaotong University, Beijing 100044, China;
^2 Department of Mathematics, Shandong University of Technology, Zibo 255049, China

Email: [email protected], [email protected], [email protected]

Received April 26, 2016; accepted September 22, 2016; published online February 23, 2017

Abstract  Sparse nonlinear programming (SNP) minimizes a general continuously differentiable function subject to sparsity, nonlinear equality and inequality constraints. We first define two restricted constraint qualifications and show how they can be applied to obtain the decomposition properties of the Fréchet, Mordukhovich and Clarke normal cones to the sparsity constrained feasible set. Based on these decomposition properties, we then present and analyze three classes of Karush-Kuhn-Tucker (KKT) conditions for the SNP. Finally, we establish the second-order necessary optimality condition and sufficient optimality condition for the SNP.

Keywords  sparse nonlinear programming, constraint qualification, normal cone, first-order optimality condition, second-order optimality condition

MSC(2010)  90C26, 90C30, 90C46

Citation: Pan L L, Xiu N H, Fan J. Optimality conditions for sparse nonlinear programming. Sci China Math, 2017, 60: 759–776, doi: 10.1007/s11425-016-9010-x

1  Introduction

In this paper, we consider the sparse nonlinear programming (SNP) problem in finite dimensional spaces, which is to minimize a general continuously differentiable function subject to sparsity, nonlinear equality and inequality constraints:

$$\min\ f(x) \quad \text{s.t.} \quad g(x) \leq 0, \quad h(x) = 0, \quad \|x\|_0 \leq s, \tag{1.1}$$

where $f : \mathbb{R}^n \to \mathbb{R}$, $g(x) = (g_1(x), \ldots, g_m(x))^T : \mathbb{R}^n \to \mathbb{R}^m$ and $h(x) = (h_1(x), \ldots, h_l(x))^T : \mathbb{R}^n \to \mathbb{R}^l$ are continuously differentiable or twice continuously differentiable functions, meaning that their Jacobians or Hessians exist and are continuous; $\|x\|_0$ counts the number of nonzero elements of the vector $x$, and $s < n$ is a positive integer. The SNP problem includes optimization models reformulated from well-known linear and nonlinear compressed sensing [5, 11, 14], variable selection in regression [19, 21], mixed-integer programs [8, 9], etc., and has extensive applications in signal and image processing, applied mathematics, statistics and computer science. Because the sparsity constrained set is a union of finitely many subspaces, the SNP problem is a mixed combinatorial optimization problem and hence is generally NP-hard, even if the objective function is convex and no equality or inequality constraints are imposed.

*Corresponding author

© Science China Press and Springer-Verlag Berlin Heidelberg 2017
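To make the constraint system in (1.1) concrete, the sketch below (illustrative code written for this rendition, not part of the paper; `g` and `h` are hypothetical stand-ins) checks feasibility numerically, with a tolerance standing in for exact zeros in the ℓ₀ count:

```python
def l0_norm(x, tol=1e-12):
    # ||x||_0: number of entries whose magnitude exceeds a numerical tolerance
    return sum(1 for xi in x if abs(xi) > tol)

def is_feasible(x, g, h, s, tol=1e-9):
    # Feasibility for (1.1): g(x) <= 0, h(x) = 0 and ||x||_0 <= s
    return (all(gi <= tol for gi in g(x))
            and all(abs(hi) <= tol for hi in h(x))
            and l0_norm(x) <= s)

# Hypothetical instance: one inequality, one equality, sparsity level s = 2
g = lambda x: [x[0] + x[1] + x[2] - 1.0]   # g(x) <= 0
h = lambda x: [x[0] - x[1]]                # h(x) = 0
print(is_feasible([0.3, 0.3, 0.0], g, h, s=2))  # True
print(is_feasible([0.3, 0.3, 0.2], g, h, s=2))  # False: three nonzero entries
```

The tolerance matters in practice: the set $S$ is closed but the ℓ₀ count is discontinuous, which is one source of the combinatorial difficulty discussed above.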



Generally speaking, continuous optimization theory is not directly applicable to a combinatorial optimization problem. However, the special structure of the sparsity constrained set makes the SNP problem an interesting research topic, and optimality conditions for some special cases of the SNP problem have been obtained. For example, Beck et al. [3] introduced and analyzed three kinds of first-order necessary optimality conditions (basic feasibility, L-stationarity and coordinate-wise optimality) for the SNP problem in the case of a single sparsity constraint. Later, Beck et al. [4] extended these results to the problem of minimizing a general continuously differentiable function over sparse and symmetric sets, which is also a special feasible set of the SNP problem. More recently, Lu [22] studied strong L-stationarity for optimization over sparse and symmetric sets as in [4]. For the SNP problem itself, Lu and Zhang [23] established a first-order necessary optimality condition by a subspace technique under the Robinson constraint qualification (CQ). Optimality conditions for some special cases of the SNP problem have also been derived through the tangent and normal cones to the feasible set. In [2], Bauschke et al. calculated the Bouligand tangent cone and the proximal and Mordukhovich normal cones to the sparsity constrained set. Pan et al. [27] established the concepts of N-stationarity and T-stationarity for the SNP problem with sparse and nonnegative constraints by using the Bouligand and Clarke tangent cones and the corresponding normal cones to the feasible set; they also gave second-order optimality conditions in terms of the Clarke tangent cone for the existence of local optimal solutions. Li et al. [20] presented first-order necessary conditions for the SNP problem with sparse and polyhedral constraints based on expressions of the Fréchet and Mordukhovich normal cones to the feasible set.
It is worth mentioning that the calculation of the normal cones, especially the Mordukhovich normal cone, to nonconvex sets is a challenging task; some results in this direction can be found in [1, 15, 17, 18, 26, 28, 30] and the references therein. Similarly, utilizing tangent and normal cones, Flegel et al. [16] derived two classes of optimality conditions for disjunctive programs in finite dimensional spaces and specialized these results to mathematical programs with equilibrium constraints. Song et al. [30] generalized these optimality results for disjunctive optimization to infinite dimensional spaces. By reformulating the SNP problem into a standard nonlinear program with complementarity-type continuous constraints, Červinka et al. [12] defined some problem-tailored CQs and showed how these CQs can be used to obtain optimality conditions for the reformulation of the SNP problem. Burdakov et al. [10] then introduced a mixed-integer relaxed formulation of the SNP problem, which is not equivalent to the original SNP problem in the sense of local minima, and investigated stationarity conditions for the relaxed program under the problem-tailored CQs.

Our questions are as follows: Are there new first-order necessary conditions for the general SNP problem? What are the second-order optimality conditions for the general SNP problem? In this paper, we give affirmative answers and make the following contributions:

1. By introducing the restricted linear independence constraint qualification (R-LICQ) and the restricted Mangasarian-Fromovitz constraint qualification (R-MFCQ) in the context of the SNP problem, we study the decomposition properties of the Fréchet, Mordukhovich and Clarke normal cones to the sparsity constrained feasible set (see Propositions 2.5, 2.11 and 2.12).

2. We introduce and analyze three kinds of KKT conditions for the SNP problem.
We then show the relationships among the various first-order optimality conditions (see Theorems 3.2 and 3.4-3.6).

3. We establish the second-order necessary condition and sufficient condition for the SNP problem in terms of the Bouligand tangent cone to the feasible set (see Theorems 4.1 and 4.2).

This paper is organized as follows. In Section 2, we give the decomposition properties of the normal cones. In Section 3, we consider the first-order optimality conditions. In Section 4, we explore the second-order optimality conditions. Concluding remarks are made in Section 5.

Our notation is standard. For $x \in \mathbb{R}^n$, $\|x\|_0 = |\mathrm{supp}(x)|$ with $\mathrm{supp}(x) \triangleq \{i = 1, \ldots, n : x_i \neq 0\}$, and $\|\cdot\|$ denotes the Euclidean norm, i.e., $\|x\| = \sqrt{x_1^2 + \cdots + x_n^2}$. $x_J \in \mathbb{R}^{|J|}$ denotes the subvector of $x$ indexed by the index set $J \subseteq \{1, \ldots, n\}$. $x' \xrightarrow{\Omega} x$ means that $x' \to x$ with $x' \in \Omega \subseteq \mathbb{R}^n$. For a cone $K \subseteq \mathbb{R}^n$, its polar cone is defined by $K^\circ \triangleq \{d \in \mathbb{R}^n : \langle d, x \rangle \leq 0,\ \forall\, x \in K\}$. For any $x \in \mathbb{R}^n$ and set $A \subseteq \mathbb{R}^n$, the projector


onto $A$ is $P_A(x) \triangleq \arg\min_{y \in A} \|x - y\|$.
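For the sparsity set $S = \{x : \|x\|_0 \leq s\}$ used throughout this paper, $P_S$ has a well-known closed form (see, e.g., [3]): keep the $s$ entries of largest magnitude and zero out the rest. A minimal sketch, written for this rendition, returning one element of the possibly set-valued projector:

```python
def project_onto_S(x, s):
    # One element of P_S(x) for S = {x : ||x||_0 <= s}: keep the s
    # largest-magnitude entries and zero the rest. P_S is set-valued when
    # magnitudes tie; sorting breaks such ties arbitrarily.
    keep = set(sorted(range(len(x)), key=lambda i: -abs(x[i]))[:s])
    return [xi if i in keep else 0.0 for i, xi in enumerate(x)]

print(project_onto_S([3.0, -1.0, 2.0, 0.5], 2))  # [3.0, 0.0, 2.0, 0.0]
```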

2  Decomposition properties of normal cones

We first review the definitions of several basic notions in variational analysis [25, 29], which will be used extensively throughout the paper.

Let $\Omega \subseteq \mathbb{R}^n$ be an arbitrary closed set and $x^* \in \Omega$. The Bouligand (also contingent or tangent) cone $T^B_\Omega(x^*)$ and the Clarke (also regular) tangent cone $T^C_\Omega(x^*)$ to $\Omega$ at $x^*$ are defined as

$$T^B_\Omega(x^*) \triangleq \limsup_{t \downarrow 0} \frac{\Omega - x^*}{t} = \Big\{ d \in \mathbb{R}^n : \exists\, \{x^k\} \xrightarrow{\Omega} x^* \text{ and } \{t_k\} \downarrow 0 \text{ such that } \lim_{k \to \infty} \frac{x^k - x^*}{t_k} = d \Big\}, \tag{2.1}$$

$$T^C_\Omega(x^*) \triangleq \liminf_{x \xrightarrow{\Omega} x^*,\ t \downarrow 0} \frac{\Omega - x}{t} = \Big\{ d \in \mathbb{R}^n : \forall\, \{x^k\} \xrightarrow{\Omega} x^* \text{ and } \{t_k\} \downarrow 0,\ \exists\, \{y^k\} \xrightarrow{\Omega} x^* \text{ such that } \lim_{k \to \infty} \frac{y^k - x^k}{t_k} = d \Big\}.$$

The Clarke tangent cone $T^C_\Omega(x^*)$ is a closed convex cone and $T^C_\Omega(x^*) \subseteq T^B_\Omega(x^*)$. A set $\Omega \subseteq \mathbb{R}^n$ is said to be locally closed at a point $x^*$ if $\Omega \cap V(x^*)$ is closed for some closed neighborhood $V(x^*)$ of $x^*$. When $\Omega$ is locally closed at $x^*$, one has

$$T^C_\Omega(x^*) = \liminf_{x \xrightarrow{\Omega} x^*} T^B_\Omega(x). \tag{2.2}$$

Furthermore, we use the Bouligand and Clarke tangent cones to define the Fréchet (also regular) and Clarke normal cones, respectively:

$$N^F_\Omega(x^*) \triangleq [T^B_\Omega(x^*)]^\circ, \qquad N^C_\Omega(x^*) \triangleq [T^C_\Omega(x^*)]^\circ.$$

The Mordukhovich (also limiting or basic) normal cone in $\mathbb{R}^n$ is defined as

$$N^M_\Omega(x^*) \triangleq \limsup_{x \xrightarrow{\Omega} x^*} N^F_\Omega(x). \tag{2.3}$$

It follows from [29, Proposition 6.5] that $N^F_\Omega(x^*) \subseteq N^M_\Omega(x^*)$. In contrast to $N^F_\Omega(x^*)$, the cone $N^M_\Omega(x^*)$ may be nonconvex.

Definition 2.1 (See [29, Definition 6.4]). A set $\Omega$ that is locally closed at $x^* \in \Omega$ and satisfies $N^M_\Omega(x^*) = N^F_\Omega(x^*)$ is called regular at $x^*$ in the sense of Clarke. In particular, if $\Omega$ is a closed convex set and $x^* \in \Omega$, then

$$N^F_\Omega(x^*) = N^M_\Omega(x^*) = N^C_\Omega(x^*) = \{ d \in \mathbb{R}^n : \langle d, x - x^* \rangle \leq 0,\ \forall\, x \in \Omega \}.$$

Next, we consider the calculation of the various tangent and normal cones to the union and intersection of finitely many nonempty closed sets $\Omega_1, \ldots, \Omega_k$. For $x^* \in \bigcup_{i=1}^k \Omega_i$ with $I_\Omega(x^*) \triangleq \{i = 1, \ldots, k : x^* \in \Omega_i\}$, we have

$$T^B_{\bigcup_{i=1}^k \Omega_i}(x^*) = \bigcup_{i \in I_\Omega(x^*)} T^B_{\Omega_i}(x^*), \tag{2.4}$$

$$N^F_{\bigcup_{i=1}^k \Omega_i}(x^*) \supseteq \bigcap_{i \in I_\Omega(x^*)} N^F_{\Omega_i}(x^*). \tag{2.5}$$

Furthermore, (2.5) holds with equality if all the sets $\Omega_i$ are convex (see [1, Proposition 3.1]).

Let $x^* \in \bigcap_{i=1}^k \Omega_i$. It holds that

$$T^B_{\bigcap_{i=1}^k \Omega_i}(x^*) \subseteq \bigcap_{i=1}^k T^B_{\Omega_i}(x^*), \qquad N^F_{\bigcap_{i=1}^k \Omega_i}(x^*) \supseteq \sum_{i=1}^k N^F_{\Omega_i}(x^*). \tag{2.6}$$

Under the condition that the only combination of vectors $v_i \in N^M_{\Omega_i}(x^*)$ with $\sum_{i=1}^k v_i = 0$ is $v_i = 0$ for all $i$, one also has

$$T^C_{\bigcap_{i=1}^k \Omega_i}(x^*) \supseteq \bigcap_{i=1}^k T^C_{\Omega_i}(x^*), \qquad N^M_{\bigcap_{i=1}^k \Omega_i}(x^*) \subseteq \sum_{i=1}^k N^M_{\Omega_i}(x^*). \tag{2.7}$$

If in addition every $\Omega_i$ is regular at $x^*$, then $\bigcap_{i=1}^k \Omega_i$ is regular at $x^*$ and

$$T^B_{\bigcap_{i=1}^k \Omega_i}(x^*) = \bigcap_{i=1}^k T^B_{\Omega_i}(x^*), \qquad N^M_{\bigcap_{i=1}^k \Omega_i}(x^*) = \sum_{i=1}^k N^M_{\Omega_i}(x^*), \tag{2.8}$$

see [29, Theorem 6.42]. The next lemma will be used in the sequel.

Lemma 2.2 (See [29, Corollary 11.25]). Let $K_1$ and $K_2$ be nonempty cones in $\mathbb{R}^n$. Then:
(i) $(K_1 \cup K_2)^\circ = (K_1 + K_2)^\circ = K_1^\circ \cap K_2^\circ$;
(ii) for $a_i \in \mathbb{R}^n$ ($i = 1, \ldots, m + l$), define the cone $K \triangleq \{d \in \mathbb{R}^n : \langle a_i, d \rangle \leq 0,\ i = 1, \ldots, m;\ \langle a_i, d \rangle = 0,\ i = m + 1, \ldots, m + l\}$. Its polar cone can be written in the form

$$K^\circ = \Big\{ \sum_{i=1}^m \lambda_i a_i + \sum_{i=m+1}^{m+l} \lambda_i a_i : \lambda_i \in \mathbb{R}_+,\ i = 1, \ldots, m;\ \lambda_i \in \mathbb{R},\ i = m + 1, \ldots, m + l \Big\}.$$

In particular, this implies that if $K_1$ and $K_2$ are polyhedral cones, then $(K_1 \cap K_2)^\circ = K_1^\circ + K_2^\circ$.

Now we begin to study the SNP problem (1.1), which is general: it includes the problems in [2-4, 20, 22, 27] and coincides with the one in [23]. Denote the feasible set of (1.1) as $Q \cap S$, where

$$Q \triangleq \{x \in \mathbb{R}^n : g(x) \leq 0,\ h(x) = 0\} \tag{2.9}$$

and $S \triangleq \{x \in \mathbb{R}^n : \|x\|_0 \leq s\}$. The set $S$ can be written as

$$S = \bigcup_{J \in \mathcal{J}} \mathbb{R}^n_J, \tag{2.10}$$

where $\mathcal{J} = \{J \subset \{1, 2, \ldots, n\} : |J| = s\}$, $\mathbb{R}^n_J = \mathrm{span}\{e_i : i \in J\}$ is the subspace of $\mathbb{R}^n$ spanned by $\{e_i : i \in J\}$, and $e_i$ is the vector in $\mathbb{R}^n$ whose $i$-th component is one and whose other components are zero. For $x^* \in Q \cap S$, denote $\Gamma^* \triangleq \mathrm{supp}(x^*)$, $\mathcal{J}^* \triangleq \{J \subset \{1, 2, \ldots, n\} : J \supseteq \Gamma^*,\ |J| = s\}$, $I(x^*) \triangleq \{i = 1, \ldots, m : g_i(x^*) = 0\}$ and $\nabla_{\Gamma^*} g(x^*) = (\nabla g(x^*))_{\Gamma^*}$.

We now review the tangent and normal cones to $S$. The Bouligand tangent cone and the Mordukhovich normal cone to $S$ can be found in [2, Theorems 3.9 and 3.15]. The Bouligand and Clarke tangent cones and the corresponding normal cones to $S$ were also characterized from their definitions in [27, Theorems 2.1 and 2.2].

Lemma 2.3 (See [2, Theorems 3.9 and 3.15] and [27, Theorems 2.1 and 2.2]). For any $x^* \in S$, we have

$$T^C_S(x^*) = \mathbb{R}^n_{\Gamma^*}, \qquad N^C_S(x^*) = \mathbb{R}^n_{\bar{\Gamma}^*},$$

$$T^B_S(x^*) = \begin{cases} \mathbb{R}^n_{\Gamma^*}, & \text{if } |\Gamma^*| = s, \\ \bigcup_{J \in \mathcal{J}^*} \mathbb{R}^n_J, & \text{if } |\Gamma^*| < s, \end{cases} \qquad N^F_S(x^*) = \begin{cases} \mathbb{R}^n_{\bar{\Gamma}^*}, & \text{if } |\Gamma^*| = s, \\ \{0\}, & \text{if } |\Gamma^*| < s, \end{cases}$$

$$N^M_S(x^*) = \begin{cases} \mathbb{R}^n_{\bar{\Gamma}^*}, & \text{if } |\Gamma^*| = s, \\ \bigcup_{J \in \mathcal{J}^*} \mathbb{R}^n_{\bar{J}}, & \text{if } |\Gamma^*| < s, \end{cases} \tag{2.11}$$

where $\bar{J}$ denotes the complement of the index set $J \subset \{1, 2, \ldots, n\}$.

In this paper, our main goal is to investigate the optimality conditions for (1.1) by means of the normal cones. Because the objective function $f$ is continuously differentiable, a well-known necessary optimality condition is (see [29, Theorem 6.12])

$$-\nabla f(x^*) \in N^F_{Q \cap S}(x^*). \tag{2.12}$$

Unfortunately, by (2.6) we only have the inclusion

$$N^F_{Q \cap S}(x^*) \supseteq N^F_Q(x^*) + N^F_S(x^*), \tag{2.13}$$

which turns out to be an equality if $Q$ and $S$ are regular, by (2.8). However, the regularity of $S$ at $x^*$ holds only if $\|x^*\|_0 = s$ and never holds if $\|x^*\|_0 < s$, by Lemma 2.3. This motivates us to explore the decomposition property $N^F_{Q \cap S}(x^*) = N^F_Q(x^*) + N^F_S(x^*)$, and thereby stronger optimality conditions, by developing constraint qualifications for the feasible set $Q \cap S$ of (1.1).

Definition 2.4. Let $x^* \in Q \cap S$ be feasible for (1.1).
(i) We say that the restricted linear independence constraint qualification (R-LICQ) holds at $x^*$ if
- when $\|x^*\|_0 = s$, the gradients $\nabla g_i(x^*)$, $i \in I(x^*)$, $\nabla h_j(x^*)$, $j = 1, \ldots, l$ are linearly independent;
- when $0 < \|x^*\|_0 < s$, the restricted gradients $\nabla_{\Gamma^*} g_i(x^*)$, $i \in I(x^*)$, $\nabla_{\Gamma^*} h_j(x^*)$, $j = 1, \ldots, l$ are linearly independent.
(ii) We say that the restricted Mangasarian-Fromovitz constraint qualification (R-MFCQ) holds at $x^*$ if
- when $\|x^*\|_0 = s$, $\nabla h_j(x^*)$, $j = 1, \ldots, l$ are linearly independent, and there is a $y \in \mathbb{R}^n$ such that $\langle \nabla g_i(x^*), y \rangle < 0$, $i \in I(x^*)$ and $\langle \nabla h_j(x^*), y \rangle = 0$, $j = 1, \ldots, l$;
- when $0 < \|x^*\|_0 < s$, $\nabla_{\Gamma^*} h_j(x^*)$, $j = 1, \ldots, l$ are linearly independent, and for any $J \in \mathcal{J}^*$ there is a $y \in \mathbb{R}^n_J$ such that $\langle \nabla g_i(x^*), y \rangle < 0$, $i \in I(x^*)$ and $\langle \nabla h_j(x^*), y \rangle = 0$, $j = 1, \ldots, l$.

It is clear that in the case $\|x^*\|_0 = s$, the R-LICQ and the R-MFCQ are the classical LICQ and MFCQ for $Q$, respectively, while in the case $0 < \|x^*\|_0 < s$ they are stronger than the classical LICQ and MFCQ. Note that the set $Q$ is regular at $x^*$ under the classical MFCQ by [7, Corollary 2.91]. Relying on the above restricted CQs, we have the following decomposition property of the Fréchet normal cone.

Proposition 2.5.

Let $x^* \in Q \cap S$ and let the R-LICQ hold at $x^*$. Then

$$N^F_{Q \cap S}(x^*) = N^F_Q(x^*) + N^F_S(x^*). \tag{2.14}$$

Proof. If $\|x^*\|_0 = s$, (2.14) holds by the regularity of $Q$ and $S$, so we only need to prove the case $0 < \|x^*\|_0 < s$. Notice that, in the notation of (2.10),

$$Q \cap S = Q \cap \Big( \bigcup_{|J| = s} \mathbb{R}^n_J \Big) = \bigcup_{|J| = s} (Q \cap \mathbb{R}^n_J) = \bigcup_{|J| = s} Q_J,$$

where $Q_J \triangleq Q \cap \mathbb{R}^n_J$. If the R-LICQ holds at $x^* \in Q \cap S$, then for any $J \in \mathcal{J}^*$ the gradients $\nabla_J g_i(x^*)$, $i \in I(x^*)$, $\nabla_J h_j(x^*)$, $j = 1, \ldots, l$ are linearly independent since $\Gamma^* \subseteq J$, so the R-MFCQ also holds. Hence the Bouligand and Clarke tangent cones to $Q$ at $x^*$ coincide (see [7, Corollary 2.91]) and

$$T^B_Q(x^*) = L_Q(x^*), \tag{2.15}$$

where $L_Q(x^*) \triangleq \{d \in \mathbb{R}^n : \langle \nabla g_i(x^*), d \rangle \leq 0,\ i \in I(x^*),\ \langle \nabla h_j(x^*), d \rangle = 0,\ j = 1, \ldots, l\}$. Then $Q$ is regular at $x^*$ by [29, Corollary 6.29], and $\mathbb{R}^n_J$ is obviously regular. From (2.8), we have for any $J \in \mathcal{J}^*$,

$$T^B_{Q_J}(x^*) = T^B_Q(x^*) \cap \mathbb{R}^n_J, \qquad N^F_{Q_J}(x^*) = N^F_Q(x^*) + \mathbb{R}^n_{\bar{J}}.$$


It follows from (2.4) and Lemma 2.3 that

$$T^B_{Q \cap S}(x^*) = \bigcup_{J \in \mathcal{J}^*} T^B_{Q_J}(x^*) = \bigcup_{J \in \mathcal{J}^*} \big( T^B_Q(x^*) \cap \mathbb{R}^n_J \big) \tag{2.16a}$$

$$= T^B_Q(x^*) \cap \Big( \bigcup_{J \in \mathcal{J}^*} \mathbb{R}^n_J \Big) = T^B_Q(x^*) \cap T^B_S(x^*). \tag{2.16b}$$

By (2.15) and Lemma 2.2(ii),

$$N^F_Q(x^*) = (L_Q(x^*))^\circ = \Big\{ \sum_{i \in I(x^*)} \lambda_i \nabla g_i(x^*) + \sum_{j=1}^l \mu_j \nabla h_j(x^*) : \lambda \in \mathbb{R}^{|I(x^*)|}_+,\ \mu \in \mathbb{R}^l \Big\}. \tag{2.17}$$

From (2.16a) and Lemma 2.2(i), it follows that

$$N^F_{Q \cap S}(x^*) = \Big( \bigcup_{J \in \mathcal{J}^*} \big( T^B_Q(x^*) \cap \mathbb{R}^n_J \big) \Big)^\circ = \bigcap_{J \in \mathcal{J}^*} \big( T^B_Q(x^*) \cap \mathbb{R}^n_J \big)^\circ = \bigcap_{J \in \mathcal{J}^*} \big( N^F_Q(x^*) + \mathbb{R}^n_{\bar{J}} \big). \tag{2.18}$$

Since $0 < \|x^*\|_0 < s$, we have $N^F_S(x^*) = \{0\}$ from Lemma 2.3. By (2.6), it holds that $N^F_{Q \cap S}(x^*) \supseteq N^F_Q(x^*) + N^F_S(x^*) = N^F_Q(x^*)$. We next prove $N^F_{Q \cap S}(x^*) \subseteq N^F_Q(x^*)$.

Rewrite $\mathcal{J}^* = \{J_1, \ldots, J_{t_0}\}$ with $t_0 \triangleq \binom{n - \|x^*\|_0}{s - \|x^*\|_0}$ being the combinatorial number. Then by (2.18), we have

$$N^F_{Q \cap S}(x^*) = \bigcap_{J \in \mathcal{J}^*} \big( N^F_Q(x^*) + \mathbb{R}^n_{\bar{J}} \big) = \bigcap_{k=1}^{t_0} \big( N^F_Q(x^*) + \mathbb{R}^n_{\bar{J}_k} \big). \tag{2.19}$$

Let $v \in N^F_{Q \cap S}(x^*)$. From (2.19), it follows that $v \in N^F_Q(x^*) + \mathbb{R}^n_{\bar{J}_k}$, $k = 1, \ldots, t_0$. Thus, by (2.17) there are $\lambda_i^{(k)} \geq 0$, $i \in I(x^*)$ and $\mu_j^{(k)}$, $j = 1, \ldots, l$, such that

$$v = \begin{pmatrix} v_{\Gamma^*} \\ v_{J_k \setminus \Gamma^*} \\ v_{\bar{J}_k} \end{pmatrix} = \begin{pmatrix} \sum_{i \in I(x^*)} \lambda_i^{(k)} \nabla_{\Gamma^*} g_i(x^*) + \sum_{j=1}^l \mu_j^{(k)} \nabla_{\Gamma^*} h_j(x^*) \\ \sum_{i \in I(x^*)} \lambda_i^{(k)} \nabla_{J_k \setminus \Gamma^*} g_i(x^*) + \sum_{j=1}^l \mu_j^{(k)} \nabla_{J_k \setminus \Gamma^*} h_j(x^*) \\ \sum_{i \in I(x^*)} \lambda_i^{(k)} \nabla_{\bar{J}_k} g_i(x^*) + \sum_{j=1}^l \mu_j^{(k)} \nabla_{\bar{J}_k} h_j(x^*) \end{pmatrix} + \xi^{(k)}, \tag{2.20}$$

where $\xi^{(k)} \in \mathbb{R}^n_{\bar{J}_k}$, so that $\xi^{(k)}_{\Gamma^*} = 0$ for all $k = 1, \ldots, t_0$. As $\bigcap_{k=1}^{t_0} J_k = \Gamma^*$, as $\nabla_{\Gamma^*} g_i(x^*)$, $i \in I(x^*)$, $\nabla_{\Gamma^*} h_j(x^*)$, $j = 1, \ldots, l$ are linearly independent by the R-LICQ, and as $v_i$ ($i \in \Gamma^*$) is the same for every $k = 1, 2, \ldots, t_0$, we obtain $\lambda_i^{(1)} = \cdots = \lambda_i^{(t_0)} \triangleq \lambda_i$, $i \in I(x^*)$ and $\mu_j^{(1)} = \cdots = \mu_j^{(t_0)} \triangleq \mu_j$, $j = 1, \ldots, l$. This implies $v_{\Gamma^*} = \sum_{i \in I(x^*)} \lambda_i \nabla_{\Gamma^*} g_i(x^*) + \sum_{j=1}^l \mu_j \nabla_{\Gamma^*} h_j(x^*)$. Since $\bigcap_{k=1}^{t_0} J_k = \Gamma^*$ and $\bigcup_{k=1}^{t_0} J_k = \{1, 2, \ldots, n\}$, for any index $r \notin \Gamma^*$ there is a $k_0$ such that $r \in J_{k_0} \setminus \Gamma^*$, i.e.,

$$v_r = \sum_{i \in I(x^*)} \lambda_i \nabla_r g_i(x^*) + \sum_{j=1}^l \mu_j \nabla_r h_j(x^*) + \xi_r^{(k_0)}.$$

Since $\xi^{(k_0)} \in \mathbb{R}^n_{\bar{J}_{k_0}}$, we have $\xi_r^{(k_0)} = 0$, and hence $v_r = \sum_{i \in I(x^*)} \lambda_i \nabla_r g_i(x^*) + \sum_{j=1}^l \mu_j \nabla_r h_j(x^*)$. In conclusion, $v = \sum_{i \in I(x^*)} \lambda_i \nabla g_i(x^*) + \sum_{j=1}^l \mu_j \nabla h_j(x^*) \in N^F_Q(x^*)$, which shows that (2.14) holds. This completes the proof.


Remark 2.6. From the proof of the above proposition, we can see that the R-MFCQ for $Q$ at $x^* \in Q \cap S$ is already enough to obtain the tangent-cone decomposition (2.16b), i.e.,

$$T^B_{Q \cap S}(x^*) = T^B_Q(x^*) \cap T^B_S(x^*).$$

At the same time, from (2.18) one easily verifies that, under the R-MFCQ for $Q$ at $x^* \in Q \cap S$,

$$N^F_{Q \cap S}(x^*) \subseteq N^M_Q(x^*) + N^M_S(x^*). \tag{2.21}$$

The next example shows that the decomposition property (2.14) of the Fréchet normal cone may fail when the R-LICQ is missing.

Example 2.7. Consider the sets

$$Q = \{x \in \mathbb{R}^3 : x_1 + x_2 + x_3 \leq 1,\ 2x_1 + x_2 + 3x_3 = 1\}, \qquad S = \{x \in \mathbb{R}^3 : \|x\|_0 \leq 2\},$$

and the point $x^* = (0, 1, 0)^T$ with $\mathcal{J}^* = \{\{2, 3\}, \{1, 2\}\}$. It is easy to verify that the R-LICQ is not satisfied at $x^*$. Furthermore,

$$N^F_Q(x^*) = \big\{ v = \lambda (1, 1, 1)^T + \mu (2, 1, 3)^T : \lambda \in \mathbb{R}_+,\ \mu \in \mathbb{R} \big\}, \qquad N^F_S(x^*) = \{0\}.$$

By (2.18), we have

$$N^F_{Q \cap S}(x^*) = \bigcap_{J \in \mathcal{J}^*} \big( N^F_Q(x^*) + \mathbb{R}^3_{\bar{J}} \big) = \big\{ (\lambda^{(1)} + 2\mu^{(1)},\ \lambda^{(1)} + \mu^{(1)},\ \lambda^{(1)} + 3\mu^{(1)})^T + \mathbb{R}^3_{\{1\}} : \lambda^{(1)} \in \mathbb{R}_+,\ \mu^{(1)} \in \mathbb{R} \big\} \cap \big\{ (\lambda^{(2)} + 2\mu^{(2)},\ \lambda^{(2)} + \mu^{(2)},\ \lambda^{(2)} + 3\mu^{(2)})^T + \mathbb{R}^3_{\{3\}} : \lambda^{(2)} \in \mathbb{R}_+,\ \mu^{(2)} \in \mathbb{R} \big\}.$$

Obviously, $v^* = (6, 4, 10)^T \in N^F_{Q \cap S}(x^*)$, taking $(\lambda^{(1)}, \mu^{(1)}) = (1, 3)$ and $(\lambda^{(2)}, \mu^{(2)}) = (2, 2)$, but $v^* \notin N^F_Q(x^*) + N^F_S(x^*)$.
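The arithmetic behind Example 2.7 can be checked mechanically. The sketch below (illustrative code written for this rendition) tests membership in $N^F_Q(x^*)$ by solving the first two coordinates for $(\lambda, \mu)$ and checking the third, confirming that $v^* = (6, 4, 10)^T$ admits no such representation:

```python
def in_NQ(v, tol=1e-9):
    # Membership in N^F_Q(x*) = { lam*(1,1,1)^T + mu*(2,1,3)^T : lam >= 0 }
    # from Example 2.7: the first two coordinates determine (lam, mu)
    # uniquely, then we check the sign of lam and the third coordinate.
    mu = v[0] - v[1]            # v1 - v2 = mu
    lam = 2.0 * v[1] - v[0]     # 2*v2 - v1 = lam
    return lam >= -tol and abs(v[2] - (lam + 3.0 * mu)) <= tol

print(in_NQ([6.0, 4.0, 10.0]))  # False: v* lies outside N^F_Q(x*)
print(in_NQ([7.0, 4.0, 10.0]))  # True: lam = 1, mu = 3
```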

It is interesting that, when the sign constraint $x \leq 0$ is added to $Q \cap S$, the decomposition property of the Fréchet normal cone still holds under the above R-LICQ.

Corollary 2.8. Let $x^* \in Q \cap \mathbb{R}^n_- \cap S$ and let the R-LICQ hold at $x^*$. Then $N^F_{Q \cap \mathbb{R}^n_- \cap S}(x^*) = N^F_Q(x^*) + N^F_{\mathbb{R}^n_-}(x^*) + N^F_S(x^*)$.

Proof. The equality holds obviously if $\|x^*\|_0 = s$ by the regularity of every constraint set. We only need to consider the case $0 < \|x^*\|_0 < s$, in which the claim reads $N^F_{Q \cap \mathbb{R}^n_- \cap S}(x^*) = N^F_Q(x^*) + N^F_{\mathbb{R}^n_-}(x^*)$ by Lemma 2.3. Noticing that $N^F_{\mathbb{R}^n_-}(x^*) = \mathbb{R}^n_+ \cap \mathbb{R}^n_{\bar{\Gamma}^*} \triangleq \mathbb{R}^+_{\bar{\Gamma}^*}$ and using the same argument as in Proposition 2.5, we have

$$N^F_{Q \cap \mathbb{R}^n_- \cap S}(x^*) = \bigcap_{k=1}^{t_0} \big( N^F_Q(x^*) + N^F_{\mathbb{R}^n_-}(x^*) + \mathbb{R}^n_{\bar{J}_k} \big) = \bigcap_{k=1}^{t_0} \big( N^F_Q(x^*) + \mathbb{R}^+_{\bar{\Gamma}^*} + \mathbb{R}^n_{\bar{J}_k} \big) = N^F_Q(x^*) + \mathbb{R}^+_{\bar{\Gamma}^*},$$

where the last equality is due to $\bar{J}_k \subset \bar{\Gamma}^*$ and $\mathbb{R}^+_{\bar{\Gamma}^*} + \mathbb{R}^n_{\bar{J}_k} = \{\sum_{i \in J_k \setminus \Gamma^*} \lambda_i e_i + \sum_{j \in \bar{J}_k} \mu_j e_j : \lambda_i \geq 0,\ i \in J_k \setminus \Gamma^*;\ \mu_j \in \mathbb{R},\ j \in \bar{J}_k\}$. The proof is completed.


In [22, Proposition 4.2], Lu studied the decomposition property of the Fréchet normal cone for the SNP problem whose constraints are sparse and symmetric sets. From Proposition 2.5 and Corollary 2.8, it is not hard to observe that these symmetric sets are special cases of the set $Q$ defined in (2.9) and naturally satisfy the R-LICQ. Thus, we immediately obtain the following result.

Corollary 2.9 (See [22, Proposition 4.2]). The decomposition property (2.14) holds if the set $Q$ is one of the following symmetric sets:

$$Q_1 \triangleq X_1 \times \cdots \times X_n, \qquad Q_2 \triangleq \{x \in \mathbb{R}^n : c^T x - d = 0\}, \qquad Q_3 \triangleq \{x \in \mathbb{R}^n : g(|x|) \leq 0\},$$

where $X_i$, $i = 1, \ldots, n$, are closed intervals in $\mathbb{R}$, $c \in \mathbb{R}^n$ with $c_i \neq 0$ for all $i$, $d \in \mathbb{R}$, and $g : \mathbb{R}^n_+ \to \mathbb{R}$ is a smooth increasing convex function with $g(0) < 0$ and $\nabla_{\mathrm{supp}(x)} g(x) \neq 0$ for any $0 \neq x \in \mathbb{R}^n_+$.

Furthermore, if $Q$ is an affine set in $\mathbb{R}^n$, we have the following corollary.

Corollary 2.10. Let $x^* \in Q \cap S$ with $Q = \{x \in \mathbb{R}^n : Ax = b\}$, where $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. If $\|x^*\|_0 = s$, then (2.14) holds when $\mathrm{rank}(A) = m$. If $0 < \|x^*\|_0 < s$, then (2.14) holds when $\mathrm{rank}(A_{\Gamma^*}) = m$, where $A_{\Gamma^*}$ is the submatrix consisting of the columns of $A$ indexed by $\Gamma^*$.

Note that in the case $0 < \|x^*\|_0 < s$, $\mathrm{rank}(A_{\Gamma^*}) = m$ implies $|\Gamma^*| \geq m$, which is in accordance with the transversality condition in [2], where a lower bound is imposed on the sparsity level $s$.

We now turn to the decomposition property of the Mordukhovich normal cone, which plays a distinguished role in the variational geometry of nonconvex sets [25, 29]. The computation of the Mordukhovich normal cone to a concrete nonconvex set is, however, rather complicated (see [1, 15, 17, 18, 20, 26, 28, 30]). Under the R-MFCQ or the R-LICQ, we have the following decomposition property.

Proposition 2.11.

Let $x^* \in Q \cap S$ and let the R-MFCQ hold at $x^*$. Then

$$N^M_{Q \cap S}(x^*) \subseteq N^M_Q(x^*) + N^M_S(x^*). \tag{2.22}$$

If the R-LICQ holds, then (2.22) holds as an equality.

Proof. If $\|x^*\|_0 = s$, (2.22) holds as an equality by the regularity of $Q$ and $S$ together with (2.8). We next prove the case $0 < \|x^*\|_0 < s$.

First assume that the R-MFCQ holds at $x^* \in Q \cap S$. Since $\nabla_{\Gamma^*} h_j(x^*)$ ($j = 1, \ldots, l$) are linearly independent, the vectors $\nabla h_j(x^*)$, $j = 1, \ldots, l$, $e_i$, $i \in \bar{J}$ ($J \in \mathcal{J}^*$) are also linearly independent. Indeed, suppose that there are $\lambda_j$, $j = 1, \ldots, l$ and $\mu_i$, $i \in \bar{J}$ such that $\sum_{j=1}^l \lambda_j \nabla h_j(x^*) + \sum_{i \in \bar{J}} \mu_i e_i = 0$, or

$$\sum_{j=1}^l \lambda_j \begin{pmatrix} \nabla_{\Gamma^*} h_j(x^*) \\ \nabla_{J \setminus \Gamma^*} h_j(x^*) \\ \nabla_{\bar{J}} h_j(x^*) \end{pmatrix} + \sum_{i \in \bar{J}} \mu_i \begin{pmatrix} 0_{\Gamma^*} \\ 0_{J \setminus \Gamma^*} \\ e_i^{n-s} \end{pmatrix} = \begin{pmatrix} 0_{\Gamma^*} \\ 0_{J \setminus \Gamma^*} \\ 0_{\bar{J}} \end{pmatrix},$$

where $e_i^{n-s}$ is the $i$-th coordinate vector of $\mathbb{R}^{n-s}$. From the linear independence of $\nabla_{\Gamma^*} h_j(x^*)$ ($j = 1, \ldots, l$), we obtain $\lambda_j = 0$ for all $j = 1, \ldots, l$; furthermore, $\mu_i = 0$ for all $i \in \bar{J}$ since the vectors $e_i^{n-s}$, $i \in \bar{J}$ are linearly independent. Note that $y \in \mathbb{R}^n_J$ is equivalent to $\langle e_i, y \rangle = 0$, $i \in \bar{J}$. Then the R-MFCQ means that $\nabla h_j(x^*)$, $j = 1, \ldots, l$, $e_i$, $i \in \bar{J}$ are linearly independent, and for any $J \in \mathcal{J}^*$ there is a $y \in \mathbb{R}^n$ such that

$$\langle \nabla g_i(x^*), y \rangle < 0, \quad i \in I(x^*), \qquad \langle \nabla h_j(x^*), y \rangle = 0, \quad j = 1, \ldots, l, \qquad \langle e_i, y \rangle = 0, \quad i \in \bar{J}.$$

By Motzkin's theorem, the existence of a solution to the above system is equivalent to the statement that the only vector $(\lambda, \mu, \nu) \in \mathbb{R}^{|I(x^*)|}_+ \times \mathbb{R}^l \times \mathbb{R}^{n-s}$ with $\sum_{i \in I(x^*)} \lambda_i \nabla g_i(x^*) + \sum_{j=1}^l \mu_j \nabla h_j(x^*) + \sum_{i \in \bar{J}} \nu_i e_i = 0$ is the zero vector, where $\sum_{i \in \bar{J}} \nu_i e_i \in N^M_S(x^*)$. Therefore, (2.22) holds due to [29, Theorem 6.14].

Now let the R-LICQ hold. We only need to prove $N^M_Q(x^*) + N^M_S(x^*) \subseteq N^M_{Q \cap S}(x^*)$.


From [29, Proposition 6.27], one has

$$N^M_{T^B_{Q \cap S}(x^*)}(0) \subseteq N^M_{Q \cap S}(x^*). \tag{2.23}$$

We show $N^M_Q(x^*) + N^M_S(x^*) \subseteq N^M_{T^B_{Q \cap S}(x^*)}(0)$. Since $T^B_{Q \cap S}(x^*) = \bigcup_{J \in \mathcal{J}^*} T^B_{Q_J}(x^*)$ is a finite union of polyhedral cones, by [18, (1)] we have

$$N^M_{T^B_{Q \cap S}(x^*)}(0) = N^F_{T^B_{Q \cap S}(x^*)}(0) \cup \Big( \bigcup_{d \in T^B_{Q \cap S}(x^*) \setminus \{0\}} N^F_{T^B_{Q \cap S}(x^*)}(d) \Big). \tag{2.24}$$

For any $J_k \in \mathcal{J}^* = \{J_1, \ldots, J_{t_0}\}$,

$$T^B_{Q_{J_k}}(x^*) = \{d \in \mathbb{R}^n_{J_k} : \langle \nabla g_i(x^*), d \rangle \leq 0,\ i \in I(x^*),\ \langle \nabla h_j(x^*), d \rangle = 0,\ j = 1, \ldots, l\} = T^B_Q(x^*) \cap \mathbb{R}^n_{J_k}, \tag{2.25}$$

$$T^B_{Q \cap S}(x^*) = \bigcup_{k=1}^{t_0} T^B_{Q_{J_k}}(x^*).$$

We assert that there is a $d^k \in \mathbb{R}^n_{J_k}$ satisfying

$$\langle \nabla g_i(x^*), d \rangle = 0, \quad i \in I(x^*), \qquad \langle \nabla h_j(x^*), d \rangle = 0, \quad j = 1, \ldots, l, \qquad d_i \neq 0, \quad i \in J_k \setminus \Gamma^*. \tag{2.26}$$

In fact, denote by $A = (A_{\Gamma^*}\ A_{J_k \setminus \Gamma^*})$ the $(|I(x^*)| + l) \times s$ matrix whose rows are $\nabla_{J_k} g_i(x^*)^T$, $i \in I(x^*)$ and $\nabla_{J_k} h_j(x^*)^T$, $j = 1, \ldots, l$. It suffices to show that the homogeneous linear equation

$$A d = (A_{\Gamma^*}\ A_{J_k \setminus \Gamma^*}) \begin{pmatrix} d_{\Gamma^*} \\ d_{J_k \setminus \Gamma^*} \end{pmatrix} = 0$$

has a solution with $d_i \neq 0$, $i \in J_k \setminus \Gamma^*$. By the R-LICQ, $\nabla_{\Gamma^*} g_i(x^*)$, $i \in I(x^*)$, $\nabla_{\Gamma^*} h_j(x^*)$, $j = 1, \ldots, l$ are linearly independent, so the components $d_{J_k \setminus \Gamma^*}$ serve as free variables and can all be chosen nonzero. That is to say, there exists a $d^k \in \mathbb{R}^n_{J_k}$ satisfying (2.26).

Therefore, for every $T^B_{Q_{J_k}}(x^*)$ there is a $d^k \in T^B_{Q_{J_k}}(x^*)$ with $d^k_i \neq 0$ for all $i \in J_k \setminus \Gamma^*$, which implies $d^k \notin T^B_{Q_{J_l}}(x^*)$ for any $l \neq k$; i.e., $d^k$ belongs to exactly one of the cones $T^B_{Q_{J_k}}(x^*)$, $k = 1, \ldots, t_0$. So we have $N^F_{T^B_{Q \cap S}(x^*)}(d^k) = N^F_{T^B_{Q_{J_k}}(x^*)}(d^k)$. Together with (2.25) and (2.26), we get

$$N^F_{T^B_{Q_{J_k}}(x^*)}(d^k) = N^F_{T^B_Q(x^*)}(d^k) + N^F_{\mathbb{R}^n_{J_k}}(d^k) = N^F_{T^B_Q(x^*)}(d^k) + \mathbb{R}^n_{\bar{J}_k} = N^F_{L_Q(x^*)}(d^k) + \mathbb{R}^n_{\bar{J}_k} = N^M_Q(x^*) + \mathbb{R}^n_{\bar{J}_k}, \tag{2.27}$$

where the first equality uses (2.25), the third uses (2.15), and the last holds because, by (2.26), every constraint defining $L_Q(x^*) = \{d \in \mathbb{R}^n : \langle \nabla g_i(x^*), d \rangle \leq 0,\ i \in I(x^*),\ \langle \nabla h_j(x^*), d \rangle = 0,\ j = 1, \ldots, l\}$ is active at $d^k$, so that $N^F_{L_Q(x^*)}(d^k) = (L_Q(x^*))^\circ = N^F_Q(x^*) = N^M_Q(x^*)$.

This implies that

$$\bigcup_{k=1}^{t_0} \big( N^M_Q(x^*) + \mathbb{R}^n_{\bar{J}_k} \big) \subseteq \bigcup_{d \in T^B_{Q \cap S}(x^*) \setminus \{0\}} N^F_{T^B_{Q \cap S}(x^*)}(d).$$

Combining the above relations, we obtain

$$N^M_Q(x^*) + N^M_S(x^*) \overset{(2.11)}{=} N^M_Q(x^*) + \bigcup_{k=1}^{t_0} \mathbb{R}^n_{\bar{J}_k} = \bigcup_{k=1}^{t_0} \big( N^M_Q(x^*) + \mathbb{R}^n_{\bar{J}_k} \big) \overset{(2.27)}{\subseteq} \bigcup_{d \in T^B_{Q \cap S}(x^*) \setminus \{0\}} N^F_{T^B_{Q \cap S}(x^*)}(d) \overset{(2.24)}{\subseteq} N^M_{T^B_{Q \cap S}(x^*)}(0) \overset{(2.23)}{\subseteq} N^M_{Q \cap S}(x^*),$$

which completes the proof.

Unlike the Fréchet normal cone, the Mordukhovich normal cone need not decompose when the sign constraint is added. For example, take $\mathbb{R}^3_- = \{x \in \mathbb{R}^3 : x \leq 0\}$, $S = \{x \in \mathbb{R}^3 : \|x\|_0 \leq 2\}$ and $x^* = (-1, 0, 0)^T$. One can easily verify

$$N^M_{\mathbb{R}^3_- \cap S}(x^*) = \{x \in \mathbb{R}^3 : x_1 = 0,\ x_2 \geq 0,\ x_3 \geq 0\} \cup \mathbb{R}^3_{\{2\}} \cup \mathbb{R}^3_{\{3\}} \subsetneq \{x \in \mathbb{R}^3 : x_1 = 0,\ x_2 \geq 0,\ x_3 \geq 0\} + \big( \mathbb{R}^3_{\{2\}} \cup \mathbb{R}^3_{\{3\}} \big) = N^M_{\mathbb{R}^3_-}(x^*) + N^M_S(x^*).$$

Actually, from [20, Theorem 3.3], one has $N^M_{\mathbb{R}^n_- \cap S}(x^*) = N^M_{\mathbb{R}^n_-}(x^*) \cup N^M_S(x^*)$.

For the Clarke normal cone, we have the next decomposition property.

Proposition 2.12. Let $x^* \in Q \cap S$ and let the R-MFCQ hold at $x^*$. Then

$$N^C_{Q \cap S}(x^*) \subseteq N^C_Q(x^*) + N^C_S(x^*). \tag{2.28}$$

Furthermore, if the R-LICQ holds, then (2.28) holds as an equality.

Proof. Using (2.7) and the same argument as for (2.22) in Proposition 2.11, under the R-MFCQ one has

$$T^C_{Q \cap S}(x^*) \supseteq T^C_Q(x^*) \cap T^C_S(x^*).$$

The inclusion (2.28) then follows from Lemma 2.2(ii) and the regularity of the Clarke tangent cone.

By [29, Theorem 6.28], $T^C_{Q \cap S}(x^*) = (N^M_{Q \cap S}(x^*))^\circ$, $T^C_Q(x^*) = (N^M_Q(x^*))^\circ$ and $T^C_S(x^*) = (N^M_S(x^*))^\circ$. Since the R-LICQ holds, from Proposition 2.11 and Lemma 2.2(i) we have

$$T^C_{Q \cap S}(x^*) = (N^M_{Q \cap S}(x^*))^\circ = (N^M_Q(x^*) + N^M_S(x^*))^\circ = (N^M_Q(x^*))^\circ \cap (N^M_S(x^*))^\circ = T^C_Q(x^*) \cap T^C_S(x^*).$$

By Lemma 2.2(ii) and the regularity of the Clarke tangent cone, the equality in (2.28) holds.

3  First-order optimality conditions

The Lagrangian function of the SNP problem (1.1) is defined as

$$L(x, \lambda, \mu) = f(x) + \langle \lambda, g(x) \rangle + \langle \mu, h(x) \rangle, \qquad \lambda \in \mathbb{R}^m_+, \quad \mu \in \mathbb{R}^l. \tag{3.1}$$

The gradient of $L(x, \lambda, \mu)$ with respect to the variable $x$ is

$$\nabla_x L(x, \lambda, \mu) = \nabla f(x) + \sum_{i=1}^m \lambda_i \nabla g_i(x) + \sum_{j=1}^l \mu_j \nabla h_j(x).$$

Utilizing [29, Theorem 6.12] and the decomposition properties of the normal cones introduced in the previous section, we can now define and analyze the first-order optimality conditions for (1.1).

Definition 3.1. The point $x^* \in \mathbb{R}^n$ is called a B-KKT (or M-KKT, or C-KKT) point of (1.1) if there exist $\lambda^* \in \mathbb{R}^m$ and $\mu^* \in \mathbb{R}^l$ such that

$$\begin{cases} -\nabla_x L(x^*, \lambda^*, \mu^*) \in N^F_S(x^*) \ (\text{or } N^M_S(x^*), \text{ or } N^C_S(x^*)), \\ \lambda_i^* \geq 0, \quad g_i(x^*) \leq 0, \quad \lambda_i^* g_i(x^*) = 0, \quad i = 1, \ldots, m, \\ h_j(x^*) = 0, \quad j = 1, \ldots, l, \\ \|x^*\|_0 \leq s. \end{cases}$$
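The stationarity part of Definition 3.1 is straightforward to test numerically once Lemma 2.3 is in hand. A minimal sketch for the B-KKT case (illustrative code written for this rendition; feasibility and complementarity are assumed to be checked separately):

```python
def is_bkkt_stationary(grad_L, x_star, s, tol=1e-9):
    # Stationarity test of Definition 3.1 for the B-KKT case: by Lemma 2.3,
    # N^F_S(x*) is {0} when ||x*||_0 < s, and is the subspace of vectors
    # vanishing on supp(x*) when ||x*||_0 = s.
    support = [i for i, xi in enumerate(x_star) if abs(xi) > tol]
    if len(support) < s:
        return all(abs(gi) <= tol for gi in grad_L)    # -grad_L must be 0
    return all(abs(grad_L[i]) <= tol for i in support)

print(is_bkkt_stationary([0.0, 0.0, 0.7], x_star=[1.0, 2.0, 0.0], s=2))  # True
print(is_bkkt_stationary([0.0, 0.0, 0.7], x_star=[1.0, 0.0, 0.0], s=2))  # False
```

The second call fails because the support has fewer than $s$ elements, so the full gradient of the Lagrangian must vanish.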


For $x^* \in Q \cap S$, since $N^F_S(x^*) \subseteq N^M_S(x^*) \subseteq N^C_S(x^*)$, the B-KKT conditions are stronger than the M-KKT conditions, and the M-KKT conditions are stronger than the C-KKT conditions.

We first analyze the B-KKT conditions. It is easy to observe that the B-KKT conditions coincide with the basic feasibility in [3, 4] and the $N^B$-stationarity in [27]. Furthermore, if we treat (1.1) as a disjunctive program, the B-KKT conditions correspond to strong stationarity in [16]. For $x^* \in Q \cap S$, we consider the following subproblem derived from the SNP problem:

$$\begin{aligned} \min_{d}\ & \langle \nabla f(x^*), d \rangle \\ \text{s.t.}\ & \langle \nabla g_i(x^*), d \rangle \leq 0, \quad i \in I(x^*), \\ & \langle \nabla h_j(x^*), d \rangle = 0, \quad j = 1, \ldots, l, \\ & d \in T^B_S(x^*). \end{aligned} \tag{3.2}$$

The next theorem characterizes the B-KKT conditions.

Theorem 3.2. Let $x^* \in Q \cap S$.
(i) Suppose the R-LICQ holds at $x^*$. Then $x^*$ is a B-KKT point of (1.1) if and only if $d^* = 0$ is a minimizer of (3.2).
(ii) Suppose that $x^*$ is a local minimizer of (1.1) and the R-LICQ holds at $x^*$. Then $x^*$ is a B-KKT point of (1.1).
(iii) Let $f$ and $g$ be convex functions and $h$ an affine function in (1.1), and suppose that $(x^*, \lambda^*, \mu^*)$ satisfies the B-KKT conditions of (1.1). If $\|x^*\|_0 < s$, then $x^*$ is a global minimizer of (1.1); otherwise, $x^*$ is a local minimizer of (1.1).

Proof. (i) By Definition 3.1 and Lemma 2.2(ii), $x^* \in Q \cap S$ being a B-KKT point of (1.1) is equivalent to $-\nabla f(x^*) \in (L_Q(x^*))^\circ + N^F_S(x^*)$. Under the R-LICQ,

$$(L_Q(x^*))^\circ + N^F_S(x^*) = N^F_Q(x^*) + N^F_S(x^*) = N^F_{Q \cap S}(x^*) = (T^B_{Q \cap S}(x^*))^\circ = (T^B_Q(x^*) \cap T^B_S(x^*))^\circ = (L_Q(x^*) \cap T^B_S(x^*))^\circ$$

due to Proposition 2.5 and (2.16b). Then, from the definition of the normal cone,

$$\langle \nabla f(x^*), d \rangle \geq 0, \quad \forall\, d \in L_Q(x^*) \cap T^B_S(x^*),$$

which is equivalent to $d^* = 0$ being a minimizer of (3.2).

(ii) Since the R-LICQ holds at $x^* \in Q \cap S$, from the optimality condition (2.12) and Proposition 2.5, we obtain

$$-\nabla f(x^*) \in N^F_{Q \cap S}(x^*) = N^F_Q(x^*) + N^F_S(x^*). \tag{3.3}$$

By (2.17), there exist $\lambda_i^* \geq 0$, $i \in I(x^*)$ and $\mu_j^* \in \mathbb{R}$, $j = 1, \ldots, l$ satisfying

$$-\nabla f(x^*) - \sum_{i \in I(x^*)} \lambda_i^* \nabla g_i(x^*) - \sum_{j=1}^l \mu_j^* \nabla h_j(x^*) \in N^F_S(x^*).$$

Letting $\lambda_i^* = 0$ for $i \notin I(x^*)$, we obtain the desired result.

(iii) Under the assumptions on $f$, $g$ and $h$, the Lagrangian $L(x, \lambda, \mu)$ is convex in $x$. It follows that

$$L(x, \lambda^*, \mu^*) \geq L(x^*, \lambda^*, \mu^*) + \langle \nabla_x L(x^*, \lambda^*, \mu^*), x - x^* \rangle, \quad \forall\, x \in Q \cap S. \tag{3.4}$$

Since $(x^*, \lambda^*, \mu^*)$ satisfies the B-KKT conditions, we have

$$L(x^*, \lambda^*, \mu^*) = f(x^*) + \langle \lambda^*, g(x^*) \rangle + \langle \mu^*, h(x^*) \rangle = f(x^*). \tag{3.5}$$

From $x \in Q \cap S$ and $\lambda^* \geq 0$, we derive

$$L(x, \lambda^*, \mu^*) = f(x) + \langle \lambda^*, g(x) \rangle + \langle \mu^*, h(x) \rangle \leq f(x). \tag{3.6}$$

(3.6)

770

Pan L L et al.

May 2017

Sci China Math

Vol. 60

No. 5

If $\|x^*\|_0 < s$, then $N^F_S(x^*) = \{0\}$ from Lemma 2.3, so $\nabla_x L(x^*, \lambda^*, \mu^*) = 0$. Together with (3.4)-(3.6), we have for all $x \in Q \cap S$,

$$f(x) \geq L(x, \lambda^*, \mu^*) \geq L(x^*, \lambda^*, \mu^*) + \langle \nabla_x L(x^*, \lambda^*, \mu^*), x - x^* \rangle = f(x^*).$$

If $\|x^*\|_0 = s$, then $N^F_S(x^*) = \mathbb{R}^n_{\bar{\Gamma}^*}$ from Lemma 2.3, so $-\nabla_x L(x^*, \lambda^*, \mu^*) \in \mathbb{R}^n_{\bar{\Gamma}^*}$. As $\|x^*\|_0 = s$, there is a small enough $\eta > 0$ such that

$$x \in N(x^*, \eta) \cap (Q \cap S) = N(x^*, \eta) \cap (Q \cap \mathbb{R}^n_{\Gamma^*}).$$

From $(\nabla_x L(x^*, \lambda^*, \mu^*))_{\Gamma^*} = 0$ and $(x - x^*)_{\bar{\Gamma}^*} = 0$, it follows that $\langle \nabla_x L(x^*, \lambda^*, \mu^*), x - x^* \rangle = 0$. Combining (3.4)-(3.6), we have $f(x) \geq f(x^*)$ for all $x \in N(x^*, \eta) \cap (Q \cap S)$. The proof is completed.

The following example shows that a local minimizer need not be a B-KKT point when the R-LICQ is missing.

Example 3.3.

Consider the program min x2 x1 − x2 = 0,

s.t.

x3 = 1, kxk0 6 2. It is easy to verify that x∗ = (0, 0, 1)T is the unique minimizer and the R-LICQ is missing in this program. µ∗ ∗ 1 We can observe that x is not a B-KKT point of the program. Indeed, for any ( µ∗ ) ∈ R2 , we have 2       0 1 0       ∗ ∗ ∗ ∗ F    −∇f (x ) = −  1  6= µ1  −1  + µ2  0   ∈ NQ (x ) + {0}, 0 0 1 where

      1 ∗ F NQ (x ) = µ1   −1    0





0



   

    + µ2  0  , µ1 , µ2 ∈ R .       1

Now we turn to the M-KKT conditions. If we treat (1.1) as a disjunctive program, the M-KKT point corresponds to the M-stationary point in [16]. Furthermore, we can also observe that x∗ ∈ Q ∩ S is a local minimizer of (1.1) if and only if, for any J ∈ J∗, x∗ is a local minimizer of the following problem:

    min  f(x)
    s.t. g_i(x) ≤ 0,  i = 1, . . . , m,
         h_j(x) = 0,  j = 1, . . . , l,           (3.7)
         x ∈ R^n_J.

Using this observation, Lu and Zhang [23] obtained the first-order condition for (3.7) under the Robinson CQ. The subproblem considered here is

    min  ⟨∇f(x∗), d⟩
    s.t. ⟨∇g_i(x∗), d⟩ ≤ 0,  i ∈ I(x∗),
         ⟨∇h_j(x∗), d⟩ = 0,  j = 1, . . . , l,    (3.8)
         d ∈ R^n_J,

where the index set J ∈ J∗ satisfies Q ∩ R^n_J ≠ ∅. The M-KKT conditions have the following properties.
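Because the feasible set of subproblem (3.8) is a cone, its optimal value is either 0 (so d∗ = 0 is a minimizer) or −∞, and the check reduces to a boundedness test for a linear program. A minimal numerical sketch (the helper name and toy instance are our own, not from the paper; no equality constraints for simplicity), using scipy:

```python
import numpy as np
from scipy.optimize import linprog

def is_mkkt_candidate(grad_f, G, J, n):
    """Check whether d* = 0 solves subproblem (3.8) for one index set J.

    G holds the gradients of the active inequality constraints as rows;
    coordinates outside J are pinned to 0 via bounds.  The feasible set is
    a cone, so the LP value is either 0 or -inf; an optimal status with
    value 0 means d* = 0 is a minimizer.  Illustrative sketch only.
    """
    bounds = [(None, None) if i in J else (0.0, 0.0) for i in range(n)]
    res = linprog(c=grad_f, A_ub=G, b_ub=np.zeros(len(G)), bounds=bounds,
                  method="highs")
    # Any non-optimal status (e.g., unbounded below) means d = 0 fails.
    return res.status == 0 and res.fun > -1e-9

grad_g = np.array([[-1.0, -1.0, 0.0]])           # one active inequality
print(is_mkkt_candidate([1.0, 1.0, 0.0], grad_g, {0, 1}, 3))   # True
print(is_mkkt_candidate([1.0, -2.0, 0.0], grad_g, {0, 1}, 3))  # False
```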


Theorem 3.4. Let the point x∗ ∈ Q ∩ S.
(i) x∗ is an M-KKT point of (1.1) if and only if d∗ = 0 is a minimizer of (3.8) for some index set J ∈ J∗ with Q ∩ R^n_J ≠ ∅.
(ii) Suppose that x∗ is a local minimizer of (1.1) and the R-MFCQ holds at x∗. Then x∗ is an M-KKT point of (1.1).
(iii) Let f and g be convex functions and h be an affine function in (1.1). Suppose that (x∗, λ∗, µ∗) satisfies the M-KKT conditions of (1.1). Then x∗ is a global minimizer of (1.1) restricted to R^n_J for some J ∈ J∗ with Q ∩ R^n_J ≠ ∅.

Proof. (i) If x∗ is an M-KKT point, then by definition −∇f(x∗) ∈ (L_Q(x∗))° + N^M_S(x∗). By Lemmas 2.3 and 2.2(ii), there is an index set J ∈ J∗ with Q ∩ R^n_J ≠ ∅ such that

    −∇f(x∗) ∈ (L_Q(x∗))° + R^n_{J̄} = (L_Q(x∗) ∩ R^n_J)°.

By the definition of the normal cone, ⟨∇f(x∗), d⟩ ≥ 0 for all d ∈ L_Q(x∗) ∩ R^n_J, which is equivalent to d∗ = 0 being a minimizer of (3.8).
(ii) Under the given assumptions, combining the optimality conditions (2.12) and (2.21), we obtain

    −∇f(x∗) ∈ N^F_{Q∩S}(x∗) ⊆ N^M_Q(x∗) + N^M_S(x∗) = (L_Q(x∗))° + N^M_S(x∗),

i.e., result (ii) holds.
(iii) Under the assumptions, (3.4)–(3.6) also hold here by the same arguments as in Theorem 3.2. Since (x∗, λ∗, µ∗) satisfies the M-KKT conditions, there is an index set J ∈ J∗ with Q ∩ R^n_J ≠ ∅ such that −∇_x L(x∗, λ∗, µ∗) ∈ R^n_{J̄}. So (∇_x L(x∗, λ∗, µ∗))_J = 0 and (x − x∗)_{J̄} = 0 for any x ∈ Q ∩ R^n_J. This implies that

    ⟨∇_x L(x∗, λ∗, µ∗), x − x∗⟩ = 0,  ∀ x ∈ Q ∩ R^n_J.

Combining (3.4)–(3.6), we have f(x) ≥ f(x∗) for all x ∈ Q ∩ R^n_J. The proof is completed.

Next, we consider the C-KKT conditions. Given x∗ ∈ Q ∩ S, we define the subproblem

    min  ⟨∇f(x∗), d⟩
    s.t. ⟨∇g_i(x∗), d⟩ ≤ 0,  i ∈ I(x∗),
         ⟨∇h_j(x∗), d⟩ = 0,  j = 1, . . . , l,    (3.9)
         d ∈ T^C_S(x∗).

Using the same proof technique as for Theorem 3.4, we can derive the characterizations of the C-KKT conditions.

Theorem 3.5. Let the point x∗ ∈ Q ∩ S.
(i) x∗ is a C-KKT point of (1.1) if and only if d∗ = 0 is a minimizer of (3.9).
(ii) Suppose that x∗ is a local minimizer of (1.1) and the R-MFCQ holds at x∗. Then x∗ is a C-KKT point of (1.1).
(iii) Let f and g be convex functions and h be an affine function in (1.1). Suppose that (x∗, λ∗, µ∗) satisfies the C-KKT conditions of (1.1). Then x∗ is a global minimizer of (1.1) restricted to R^n_{Γ∗}.

Proof. (i) By the definition of the C-KKT conditions and Lemmas 2.3 and 2.2(ii), x∗ being a C-KKT point is equivalent to

    −∇f(x∗) ∈ (L_Q(x∗))° + N^C_S(x∗) = (L_Q(x∗))° + R^n_{Γ̄∗} = (L_Q(x∗) ∩ R^n_{Γ∗})°.

By the definition of the normal cone, ⟨∇f(x∗), d⟩ ≥ 0 for all d ∈ L_Q(x∗) ∩ R^n_{Γ∗}, which is equivalent to d∗ = 0 being a minimizer of (3.9).
(ii) Result (ii) follows from Theorem 3.4(ii) and the fact that the M-KKT conditions are stronger than the C-KKT conditions.
(iii) Note that (3.4)–(3.6) also hold here by the same arguments. Since (x∗, λ∗, µ∗) satisfies the C-KKT conditions, we derive −∇_x L(x∗, λ∗, µ∗) ∈ R^n_{Γ̄∗}, which yields ⟨∇_x L(x∗, λ∗, µ∗), x − x∗⟩ = 0 for all x ∈ Q ∩ R^n_{Γ∗}. Combining (3.4)–(3.6), we have f(x) ≥ f(x∗) for all x ∈ Q ∩ R^n_{Γ∗}. The proof is completed.


At last, we discuss the relationship between the B-KKT point and the α-stationary point. For α > 0, the point x∗ ∈ Q ∩ S is called an α-stationary point of (1.1) if it satisfies the relation

    x∗ ∈ P_{Q∩S}(x∗ − α∇f(x∗)).    (3.10)

For the purely sparsity-constrained optimization problem, the α-stationary point was defined in [3] as the L-stationary point; when L = 1, it was called a fixed point in [6]. For the nonnegativity- and sparsity-constrained optimization problem, the α-stationary point was studied in [27]. Furthermore, for optimization problems over sparse symmetric sets, the α-stationary point was explored in [4] by relying on the orthogonal projection operator onto the feasible region. A strong α-stationary point was studied in [22] for the problem considered in [4].

Just as in the discussion in [4], if the objective function f is gradient Lipschitz continuous with constant L(f) and 0 < α < 1/L(f), then every local minimizer is an α-stationary point of (1.1). Note that the point x∗ ∈ Q ∩ S is an α-stationary point of (1.1) if and only if x∗ is a global minimizer of

    min  f̃(x) := (1/2)‖x − x∗ + α∇f(x∗)‖²
    s.t. g_i(x) ≤ 0,  i = 1, . . . , m,
         h_j(x) = 0,  j = 1, . . . , l,
         ‖x‖₀ ≤ s.

Applying Theorems 3.2(i) and 3.2(ii) to the above problem and noticing ∇f̃(x∗) = α∇f(x∗), we obtain that, under the R-LICQ, α-stationarity implies the B-KKT conditions. Together with the above results, we summarize the relationship of the first-order optimality conditions for (1.1).

Theorem 3.6. Suppose the objective function f is gradient Lipschitz continuous with constant L(f) > 0. Then the following relations for (1.1) hold:

    local minimizer of (1.1) ──(0 < α < 1/L(f))──▶ α-stationary point ──(R-LICQ)──▶ B-KKT point
    local minimizer of (1.1) ──────(R-LICQ)──────▶ B-KKT point ⇒ M-KKT point ⇒ C-KKT point.
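Relation (3.10) is easy to test numerically whenever the projection is computable. The sketch below (hypothetical helper names; it assumes Q = Rⁿ, i.e., only the sparsity constraint is present, in which case a projection onto S keeps the s entries of largest magnitude, cf. [3]):

```python
import numpy as np

def proj_sparse(z, s):
    """One projection of z onto S = {x : ||x||_0 <= s}: keep the s entries
    of largest magnitude (the projection may not be unique on ties)."""
    out = np.zeros_like(z)
    idx = np.argsort(np.abs(z))[-s:]
    out[idx] = z[idx]
    return out

def is_alpha_stationary(x, grad, s, alpha, tol=1e-10):
    """Check relation (3.10) in the purely sparsity-constrained case
    (an illustrative sketch assuming Q = R^n, i.e., no g/h constraints)."""
    return np.linalg.norm(x - proj_sparse(x - alpha * grad(x), s)) < tol

# Toy objective f(x) = 0.5*||x - b||^2, so grad f(x) = x - b.
b = np.array([3.0, 1.0, -2.0, 0.5])
grad = lambda x: x - b
s, alpha = 2, 0.5

x_star = proj_sparse(b, s)   # keeps the two largest entries of b in magnitude
print(is_alpha_stationary(x_star, grad, s, alpha))       # True
print(is_alpha_stationary(np.zeros(4), grad, s, alpha))  # False
```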

4  Second-order optimality conditions

In this section, we consider the second-order optimality conditions for (1.1) under the assumption that the functions f, g and h in (1.1) are twice continuously differentiable on Q ∩ S. The Hessian matrix of the Lagrangian function (3.1) is given by

    ∇²_xx L(x, λ, µ) = ∇²f(x) + Σ_{i=1}^{m} λ_i ∇²g_i(x) + Σ_{j=1}^{l} µ_j ∇²h_j(x).

Suppose that (x∗, λ∗, µ∗) ∈ R^n × R^m × R^l satisfies the B-KKT conditions. Denote Ĩ(x∗) := {i = 1, . . . , m : λ∗_i > 0}; thus Ĩ(x∗) ⊆ I(x∗). Let Q̃ := Q ∩ {x ∈ R^n : g_i(x) = 0, i ∈ Ĩ(x∗)}. Define the closed polyhedral cone

    L_Q̃(x∗) := {d ∈ R^n : ⟨∇g_i(x∗), d⟩ = 0, i ∈ Ĩ(x∗); ⟨∇g_i(x∗), d⟩ ≤ 0, i ∈ I(x∗) \ Ĩ(x∗);
                          ⟨∇h_j(x∗), d⟩ = 0, j = 1, . . . , l}.

Generally, T^B_Q̃(x∗) ⊆ L_Q̃(x∗). If the classical MFCQ holds for Q̃ at x∗, then T^B_Q̃(x∗) = L_Q̃(x∗). We next give the second-order optimality conditions for (1.1).
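The Lagrangian Hessian formula above can be sanity-checked on a toy instance (our own illustrative example, not from the paper) by comparing the term-by-term assembly against a finite-difference Hessian of L:

```python
import numpy as np

# Toy instance: n = 2, f(x) = x0^4 + x1^2, one inequality
# g(x) = x0^2 + x1^2 - 1 <= 0, one equality h(x) = x0 - x1 = 0.
f = lambda x: x[0]**4 + x[1]**2
g = lambda x: x[0]**2 + x[1]**2 - 1.0
h = lambda x: x[0] - x[1]
L = lambda x, lam, mu: f(x) + lam * g(x) + mu * h(x)

def hess_L(x, lam, mu):
    """Lagrangian Hessian assembled term by term, as in the formula above."""
    Hf = np.array([[12.0 * x[0]**2, 0.0], [0.0, 2.0]])
    Hg = 2.0 * np.eye(2)
    Hh = np.zeros((2, 2))
    return Hf + lam * Hg + mu * Hh

def fd_hessian(fun, x, eps=1e-5):
    """Central finite-difference Hessian, used only as a sanity check."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i, e_j = np.eye(n)[i] * eps, np.eye(n)[j] * eps
            H[i, j] = (fun(x + e_i + e_j) - fun(x + e_i - e_j)
                       - fun(x - e_i + e_j) + fun(x - e_i - e_j)) / (4 * eps**2)
    return H

x, lam, mu = np.array([0.3, -0.7]), 0.5, 1.2
err = np.abs(hess_L(x, lam, mu) - fd_hessian(lambda z: L(z, lam, mu), x)).max()
print(err)  # small: only finite-difference error remains
```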


Theorem 4.1 (Second-order necessary condition). Suppose the functions f, g and h in (1.1) are twice continuously differentiable. If the point x∗ ∈ Q ∩ S is a local minimizer of (1.1) and the R-LICQ holds at x∗, then
(i) there are λ∗ ∈ R^m and µ∗ ∈ R^l such that (x∗, λ∗, µ∗) satisfies the B-KKT conditions;
(ii) ⟨d, ∇²_xx L(x∗, λ∗, µ∗) d⟩ ≥ 0, ∀ d ∈ T^B_Q̃(x∗) ∩ T^B_S(x∗).

Proof. We only prove (ii), because (i) has been obtained in Theorem 3.2.
Under the R-LICQ, by Remark 2.6, we have T^B_Q̃(x∗) ∩ T^B_S(x∗) = T^B_{Q̃∩S}(x∗); take any d in this set. From the definition of the Bouligand tangent cone, there exist a sequence {x^k} ⊂ Q̃ ∩ S and a nonnegative sequence {t_k} such that t_k(x^k − x∗) → d and x^k → x∗. The twice continuous differentiability of f, g and h yields

    L(x^k, λ∗, µ∗) = L(x∗, λ∗, µ∗) + ⟨∇_x L(x∗, λ∗, µ∗), x^k − x∗⟩
                     + (1/2)⟨x^k − x∗, ∇²_xx L(x∗, λ∗, µ∗)(x^k − x∗)⟩ + o(‖x^k − x∗‖²).    (4.1)

For x^k ∈ Q̃ ∩ S, if i ∈ Ĩ(x∗), then g_i(x^k) = 0; otherwise λ∗_i = 0. Hence λ∗_i g_i(x^k) = 0 for i = 1, . . . , m, which yields

    L(x^k, λ∗, µ∗) = f(x^k) + ⟨λ∗, g(x^k)⟩ + ⟨µ∗, h(x^k)⟩ = f(x^k).

As (x∗, λ∗, µ∗) satisfies the B-KKT conditions, L(x∗, λ∗, µ∗) = f(x∗) and −∇_x L(x∗, λ∗, µ∗) ∈ N^F_S(x∗). Using the same argument as in Theorem 3.2, we get ⟨∇_x L(x∗, λ∗, µ∗), x^k − x∗⟩ = 0. It follows from (4.1) that

    f(x^k) = f(x∗) + (1/2)⟨x^k − x∗, ∇²_xx L(x∗, λ∗, µ∗)(x^k − x∗)⟩ + o(‖x^k − x∗‖²).    (4.2)

Since x^k ∈ Q̃ ∩ S and x^k → x∗, we have f(x^k) ≥ f(x∗) for sufficiently large k. By (4.2),

    (1/2)⟨x^k − x∗, ∇²_xx L(x∗, λ∗, µ∗)(x^k − x∗)⟩ + o(‖x^k − x∗‖²) ≥ 0

holds for sufficiently large k. Multiplying both sides of the above inequality by 2t_k² and letting k → ∞, from t_k(x^k − x∗) → d ∈ T^B_Q̃(x∗) ∩ T^B_S(x∗), we have ⟨d, ∇²_xx L(x∗, λ∗, µ∗) d⟩ ≥ 0. The proof is completed.

Now we turn to the second-order sufficient condition.

Theorem 4.2 (Second-order sufficient condition). Suppose the functions f, g and h in (1.1) are twice continuously differentiable. If (x∗, λ∗, µ∗) ∈ R^n × R^m × R^l satisfies the B-KKT conditions and

    ⟨d, ∇²_xx L(x∗, λ∗, µ∗) d⟩ > 0,  ∀ 0 ≠ d ∈ L_Q̃(x∗) ∩ T^B_S(x∗)    (4.3)

holds, then x∗ is a strict local minimizer of (1.1).

Proof. We argue by contradiction. Suppose that x∗ is not a strict local minimizer of (1.1). Then there exists a sequence {x^k} ⊂ (Q ∩ S) \ {x∗} satisfying x^k → x∗ and

    f(x^k) ≤ f(x∗),  k = 1, 2, . . . .

Let d^k := (x^k − x∗)/‖x^k − x∗‖. Then ‖d^k‖ = 1, so there exists a convergent subsequence of {d^k} whose limit point d satisfies ‖d‖ = 1. Without loss of generality, we assume d^k → d. Thus, by (2.1) and (2.6), we have d ∈ T^B_{Q∩S}(x∗) ⊆ T^B_Q(x∗) ∩ T^B_S(x∗).
We next prove d ∈ L_Q̃(x∗) ∩ T^B_S(x∗). From

    0 ≥ f(x^k) − f(x∗) = ⟨∇f(x∗), x^k − x∗⟩ + o(‖x^k − x∗‖),

we conclude that

    ⟨∇f(x∗), (x^k − x∗)/‖x^k − x∗‖⟩ + o(‖x^k − x∗‖)/‖x^k − x∗‖ ≤ 0.


Letting k → ∞, we obtain

    ⟨∇f(x∗), d⟩ ≤ 0.    (4.4)

Similarly, it follows from h_j(x^k) − h_j(x∗) = 0, j = 1, . . . , l, that

    ⟨∇h_j(x∗), d⟩ = 0,  j = 1, . . . , l.    (4.5)

Since g_i(x^k) − g_i(x∗) ≤ 0 for i ∈ I(x∗), it follows that

    ⟨∇g_i(x∗), d⟩ ≤ 0,  i ∈ I(x∗).    (4.6)

Furthermore, we assert that

    ⟨∇g_i(x∗), d⟩ = 0,  i ∈ Ĩ(x∗).    (4.7)

Indeed, suppose there exists i₀ ∈ Ĩ(x∗) ⊆ I(x∗) such that ⟨∇g_{i₀}(x∗), d⟩ < 0. By the B-KKT conditions and d ∈ T^B_S(x∗),

    ⟨−∇f(x∗) − Σ_{i∈I(x∗)} λ∗_i ∇g_i(x∗) − Σ_{j=1}^{l} µ∗_j ∇h_j(x∗), d⟩ ≤ 0.

Since i₀ ∈ Ĩ(x∗) implies λ∗_{i₀} > 0, we obtain

    ⟨∇f(x∗), d⟩ ≥ − Σ_{i∈I(x∗)} λ∗_i ⟨∇g_i(x∗), d⟩ − Σ_{j=1}^{l} µ∗_j ⟨∇h_j(x∗), d⟩ ≥ −λ∗_{i₀} ⟨∇g_{i₀}(x∗), d⟩ > 0,

which contradicts (4.4). From (4.4)–(4.7), we conclude that d ∈ L_Q̃(x∗), hence

    d ∈ L_Q̃(x∗) ∩ T^B_S(x∗).

Since λ∗_i ≥ 0 and g_i(x^k) ≤ 0 = g_i(x∗) for all i ∈ I(x∗), h_j(x^k) = 0 for all j = 1, . . . , l, and f(x^k) ≤ f(x∗), we have

    L(x^k, λ∗, µ∗) = f(x^k) + Σ_{i∈I(x∗)} λ∗_i g_i(x^k) + Σ_{j=1}^{l} µ∗_j h_j(x^k) ≤ f(x∗) = L(x∗, λ∗, µ∗).

Then it follows that

    0 ≥ L(x^k, λ∗, µ∗) − L(x∗, λ∗, µ∗)
      = ‖x^k − x∗‖ ⟨∇_x L(x∗, λ∗, µ∗), d^k⟩ + (1/2)‖x^k − x∗‖² ⟨d^k, ∇²_xx L(x∗, λ∗, µ∗) d^k⟩ + o(‖x^k − x∗‖²).    (4.8)

Because (x∗, λ∗, µ∗) satisfies the B-KKT conditions, −∇_x L(x∗, λ∗, µ∗) ∈ N^F_S(x∗). From the expression of the Bouligand normal cone to S: if ‖x∗‖₀ < s, then ∇_x L(x∗, λ∗, µ∗) = 0; if ‖x∗‖₀ = s, then −∇_x L(x∗, λ∗, µ∗) ∈ R^n_{Γ̄∗} and supp(x^k) = Γ∗ for sufficiently large k. In either case, for sufficiently large k,

    ⟨∇_x L(x∗, λ∗, µ∗), d^k⟩ = 0.

As x^k ≠ x∗, from (4.8) we derive that for sufficiently large k,

    ⟨d^k, ∇²_xx L(x∗, λ∗, µ∗) d^k⟩ + o(‖x^k − x∗‖²)/‖x^k − x∗‖² ≤ 0.

Letting k → ∞, we have

    ⟨d, ∇²_xx L(x∗, λ∗, µ∗) d⟩ ≤ 0

for the above 0 ≠ d ∈ L_Q̃(x∗) ∩ T^B_S(x∗), which contradicts (4.3). The proof is completed.
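Condition (4.3) can be checked numerically in a simplified setting. The sketch below (our own toy instance, not from the paper) assumes there are no g/h constraints, so L_Q̃(x∗) = Rⁿ, and ‖x∗‖₀ = s, so T^B_S(x∗) = R^n_{Γ∗}; then (4.3) reduces to positive definiteness of the principal submatrix of the Lagrangian Hessian indexed by Γ∗ = supp(x∗):

```python
import numpy as np

def sosc_sparsity_only(hess, x_star, s, tol=1e-10):
    """Check condition (4.3) in a simplified setting (illustrative only):
    no g/h constraints, so L_Qtilde(x*) = R^n, and ||x*||_0 = s, so
    T^B_S(x*) is the subspace of vectors supported on Gamma* = supp(x*).
    Then (4.3) holds iff the principal submatrix of the Hessian indexed
    by Gamma* is positive definite."""
    gamma = np.flatnonzero(x_star)
    assert len(gamma) == s, "sketch assumes a full-support point"
    H = hess[np.ix_(gamma, gamma)]
    return np.linalg.eigvalsh(H).min() > tol

# Toy Hessian: indefinite on R^3 but positive definite on coordinates {0, 2}.
H = np.array([[2.0,  0.0, 0.0],
              [0.0, -1.0, 0.0],
              [0.0,  0.0, 3.0]])
x_star = np.array([1.0, 0.0, -2.0])   # Gamma* = {0, 2}, s = 2
print(sosc_sparsity_only(H, x_star, 2))   # True
```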

5  Concluding remarks

This paper has explored the decomposition properties of three normal cones to the feasible set of sparse nonlinear programming, relying on the R-LICQ and the R-MFCQ. Using these decomposition properties, we have established the B-KKT, M-KKT and C-KKT conditions and shown the relationships among them for the SNP. Moreover, the second-order necessary and sufficient optimality conditions are also presented based on the Bouligand tangent cone. In contrast to the optimality results in the disjunctive optimization literature, our results are better suited to sparsity-constrained problems. Compared with complementarity-type continuous relaxation methods, our optimality conditions are stated for the original SNP problem. Besides their theoretical significance, the KKT characterizations via subproblems (3.2), (3.8) and (3.9) also provide some inspiration for algorithmic design. In the future, we will consider algorithms for sparse nonlinear programming, especially for the sparse principal component analysis (PCA) problem [13, 32], the nonnegative sparse PCA problem [31] and the sparse linear discriminant analysis (LDA) problem [24].

Acknowledgements  This work was supported by National Natural Science Foundation of China (Grant No. 11431002) and Shandong Province Natural Science Foundation (Grant No. ZR2016AM07). The authors thank two anonymous referees whose insightful comments helped to improve the quality of the paper.

References
1 Ban L, Mordukhovich B S, Song W. Lipschitzian stability of the parameterized variational inequalities over generalized polyhedron in reflexive Banach spaces. Nonlinear Anal, 2011, 74: 441–461
2 Bauschke H H, Luke D R, Phan H M, et al. Restricted normal cones and sparsity optimization with affine constraints. Found Comput Math, 2014, 14: 63–83
3 Beck A, Eldar Y. Sparsity constrained nonlinear optimization: Optimality conditions and algorithms. SIAM J Optim, 2013, 23: 1480–1509
4 Beck A, Hallak N. On the minimization over sparse symmetric sets: Projections, optimality conditions and algorithms. Math Oper Res, 2015, 41: 196–223
5 Blumensath T. Compressed sensing with nonlinear observations and related nonlinear optimisation problems. IEEE Trans Inform Theory, 2013, 59: 3466–3474
6 Blumensath T, Davies M E. Iterative thresholding for sparse approximations. J Fourier Anal Appl, 2008, 14: 626–654
7 Bonnans J F, Shapiro A. Perturbation Analysis of Optimization Problems. New York: Springer, 2000
8 Bourguignon S, Ninin J, Carfantan H, et al. Exact sparse approximation problems via mixed-integer programming: Formulations and computational performance. IEEE Trans Signal Proc, 2016, 64: 1405–1419
9 Burdakov O P, Kanzow C, Schwartz A. On a reformulation of mathematical programs with cardinality constraints. In: Advances in Global Optimization. Springer Proceedings in Mathematics & Statistics, vol. 95. New York: Springer, 2015, 3–14
10 Burdakov O P, Kanzow C, Schwartz A. Mathematical programs with cardinality constraints: Reformulation by complementarity-type constraints and a regularization method. SIAM J Optim, 2016, 26: 397–425
11 Candès E J, Tao T. Decoding by linear programming. IEEE Trans Inform Theory, 2005, 51: 4203–4215
12 Červinka M, Kanzow C, Schwartz A. Constraint qualifications and optimality conditions of cardinality-constrained optimization problems. Math Program, 2016, 160: 353–377
13 d'Aspremont A, Ghaoui L E, Jordan M I, et al. A direct formulation for sparse PCA using semidefinite programming. SIAM Rev, 2007, 49: 434–448
14 Donoho D L. Compressed sensing. IEEE Trans Inform Theory, 2006, 52: 1289–1306
15 Dontchev A D, Rockafellar R T. Characterization of strong regularity for variational inequalities over polyhedral convex sets. SIAM J Optim, 1996, 7: 1087–1105
16 Flegel M L, Kanzow C, Outrata J V. Optimality conditions for disjunctive programs with application to mathematical programs with equilibrium constraints. Set-Valued Anal, 2007, 15: 139–162
17 Henrion R, Mordukhovich B S, Nam N M. Second-order analysis of polyhedral systems in finite and infinite dimensions with applications to robust stability. SIAM J Optim, 2010, 20: 2199–2227
18 Henrion R, Outrata J V. On calculating the normal cone to a finite union of convex polyhedra. Optimization, 2008, 57: 57–78
19 Koh K, Kim S J, Boyd S. An interior-point method for large-scale l1-regularized logistic regression. J Mach Learn Res, 2007, 8: 1519–1555


20 Li X, Song W. The first-order necessary conditions for sparsity constrained optimization in finite dimensional spaces. J Oper Res Soc China, 2015, 3: 521–535
21 Liu J, Chen J, Ye J. Large-scale sparse logistic regression. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM Press, 2009, 547–556
22 Lu Z. Optimization over sparse symmetric sets via a nonmonotone projected gradient method. ArXiv:1509.08581, 2015
23 Lu Z, Zhang Y. Sparse approximation via penalty decomposition methods. SIAM J Optim, 2013, 23: 2448–2478
24 Moghaddam B, Weiss Y, Avidan S. Generalized spectral bounds for sparse LDA. In: Proceedings of the 23rd International Conference on Machine Learning. New York: ACM Press, 2006, 641–648
25 Mordukhovich B S. Variational Analysis and Generalized Differentiation I: Basic Theory, II: Applications. Berlin: Springer, 2006
26 Mordukhovich B S, Sarabi M E. Generalized differentiation of piecewise linear functions in second-order variational analysis. Nonlinear Anal, 2016, 132: 240–273
27 Pan L L, Xiu N H, Zhou S L. On solutions of sparsity constrained optimization. J Oper Res Soc China, 2015, 3: 421–439
28 Robinson S M. Some continuity properties of polyhedral multifunctions. Math Program Study, 1981, 14: 206–214
29 Rockafellar R T, Wets R J. Variational Analysis. Berlin: Springer, 1998
30 Song W, Wang Q. Optimality conditions for disjunctive optimization in reflexive Banach spaces. J Optim Theory Appl, 2015, 164: 436–454
31 Zass R, Shashua A. Nonnegative sparse PCA. In: Advances in Neural Information Processing Systems. Cambridge: MIT Press, 2006, 1561–1568
32 Zou H, Hastie T, Tibshirani R. Sparse principal component analysis. J Comput Graph Stat, 2006, 15: 265–286
