A New Look at Smoothing Newton Methods for Nonlinear Complementarity Problems and Box Constrained Variational Inequalities ^1

Liqun Qi^a, Defeng Sun^a and Guanglu Zhou^a

^a School of Mathematics, The University of New South Wales, Sydney 2052, Australia. Email: [email protected], [email protected] and [email protected]

Abstract

In this paper we take a new look at smoothing Newton methods for solving the nonlinear complementarity problem (NCP) and the box constrained variational inequalities (BVIs). Instead of using an infinite sequence of smoothing approximation functions, we use a single smoothing approximation function and Robinson's normal equation to reformulate the NCP and the BVIs as an equivalent nonsmooth equation H(u, x) = 0, where H: ℜ^{2n} → ℜ^{2n}, u ∈ ℜ^n is a parameter variable and x ∈ ℜ^n is the original variable. The central idea of our smoothing Newton methods is to construct a sequence {z^k = (u^k, x^k)} such that the mapping H(·) is continuously differentiable at each z^k, while it may be non-differentiable at the limiting point of {z^k}. We prove that the three most often used Gabriel-Moré smoothing functions generate strongly semismooth functions, which play a fundamental role in establishing the superlinear and quadratic convergence of our new smoothing Newton methods. We do not require any function value of F, or any value of its derivative, outside the feasible region, and at each step we only solve a linear system of equations; if we choose a certain smoothing function, only a reduced form of this system needs to be solved. Preliminary numerical results show that the proposed methods, for particularly chosen smoothing functions, are very promising.

Key words: variational inequalities, nonsmooth equations, smoothing approximation, smoothing Newton method, convergence.

^1 This work is supported by the Australian Research Council. (June 12, 1997; Revised August 11, 1998)

Preprint submitted to Elsevier Preprint

8 September 1998

1 Introduction

Consider the variational inequalities (VIs for abbreviation): Find y^* ∈ X such that

    (y − y^*)^T F(y^*) ≥ 0 for all y ∈ X,                                (1)

where X is a nonempty closed subset of ℜ^n and F: D → ℜ^n is continuously differentiable on some open set D, which contains X. In this paper, unless otherwise stated, we assume that

    X := {y ∈ ℜ^n | a ≤ y ≤ b},                                          (2)

where a ∈ {ℜ ∪ {−∞}}^n, b ∈ {ℜ ∪ {∞}}^n and a < b. Then (1) becomes the box constrained variational inequalities (BVIs for abbreviation). This assumption is not restrictive because if in (1) the set X is not of the form (2) but is represented by several equalities and inequalities, then under standard constraint qualifications [25] we can equivalently transform (1) into new VIs with a constraint set of the form (2), possibly with increased dimension (see, e.g., [54]). When X = ℜ^n_+, the VIs reduce to the nonlinear complementarity problem (NCP for abbreviation): Find y^* ∈ ℜ^n_+ such that

    F(y^*) ∈ ℜ^n_+  and  F(y^*)^T y^* = 0.                               (3)

It is well known (see, e.g., [25]) that solving (1) is equivalent to finding a root of the following equation:

    W(y) := y − Π_X(y − F(y)) = 0,                                       (4)

where for any x ∈ ℜ^n, Π_X(x) is the Euclidean projection of x onto X and X is a nonempty closed convex subset of ℜ^n, which is not necessarily of the form (2). It is also well known that if X is a closed convex subset of ℜ^n, then solving the VIs is equivalent to solving the following Robinson's normal equation

    E(x) := F(Π_X(x)) + x − Π_X(x) = 0                                   (5)

in the sense that if x^* ∈ ℜ^n is a solution of (5) then y^* := Π_X(x^*) is a solution of (1), and conversely if y^* is a solution of (1) then x^* := y^* − F(y^*) is a solution of (5) [49]. Both (4) and (5) are nonsmooth equations and have led to various generalized Newton methods. See [25,40,22,18] for a review of these methods.

By using the Gabriel-Moré smoothing function for Π_X(·), we can construct approximations for E(·):

    G(u, x) := F(p(u, x)) + x − p(u, x),    (u, x) ∈ ℜ^n × ℜ^n,          (6)

where for each i ∈ N := {1, 2, ..., n}, p_i(u, x) = q(u_i, a_i, b_i, x_i) and for any (µ, c, d, w) ∈ ℜ × ℜ ∪ {−∞} × ℜ ∪ {∞} × ℜ with c ≤ d, q(µ, c, d, w) is defined by

    q(µ, c, d, w) = φ(|µ|, c, d, w)    if µ ≠ 0,
                    Π_{[c,d]∩ℜ}(w)     if µ = 0,                          (7)

and φ(µ, c, d, w), (µ, w) ∈ ℜ_{++} × ℜ, is a Gabriel-Moré smoothing approximation function [23]; see Section 2 for the definition of φ(·). For example, for the NCP we can take the Chen-Harker-Kanzow-Smale (CHKS) smoothing NCP function [4,31,51]

    φ(µ, 0, ∞, w) = (√(w² + 4µ²) + w)/2,    (µ, w) ∈ ℜ_{++} × ℜ,

which is a special Gabriel-Moré smoothing function. In this paper, unless otherwise stated, we always assume that c ∈ ℜ ∪ {−∞}, d ∈ ℜ ∪ {∞} and c ≤ d. By Lemma 2.2 of [23], for any (µ, w) ∈ ℜ_{++} × ℜ, φ(µ, c, d, w) ∈ [c, d] ∩ ℜ, and so, for any (u, x) ∈ ℜ^n × ℜ^n,

    p(u, x) ∈ X.                                                          (8)

Then the mapping G(·) is defined on ℜ^{2n} while F(·) is only required to be defined on X, the feasible region. Note that we can also use p(·) to construct a class of approximation functions for W(·) defined in (4):

    V(u, y) := y − p(u, y − F(y)),    (u, y) ∈ ℜ^n × ℜ^n.                (9)

However, in order to make V(·) well defined on the whole space ℜ^{2n} one has to assume that F(·) is well defined on the whole space ℜ^n. This requirement on F is not satisfied for many NCPs and BVIs transformed from economic equilibrium problems [18]. Moreover, even if F is defined on the whole space ℜ^n, some important properties of F, like monotonicity, which hold on X, may not hold outside X. These observations lead us to focus on the approximation functions defined by (6) rather than (9). However, the techniques used here can be applied to (9) too. For the sake of convenience, let φ_{cd}: ℜ_{++} × ℜ → ℜ be defined by

    φ_{cd}(µ, w) := φ(µ, c, d, w),    (µ, w) ∈ ℜ_{++} × ℜ,               (10)

and for any given µ ∈ ℜ_{++}, let φ_{µcd}: ℜ → ℜ be defined by

    φ_{µcd}(w) := φ(µ, c, d, w),    w ∈ ℜ.                               (11)

Then, for any given µ ∈ ℜ_{++}, φ_{µcd}(·) is continuously differentiable at any w ∈ ℜ [23]. Moreover, for several of the most often used Gabriel-Moré smoothing functions it can be verified that φ_{cd}(·) is also continuously differentiable at any (µ, w) ∈ ℜ_{++} × ℜ. In this paper, we are interested in smoothing functions with this property, which we state as an assumption.

Assumption 1 The function φ_{cd}(·) is continuously differentiable at any (µ, w) ∈ ℜ_{++} × ℜ.

Let z := (u, x) ∈ ℜ^n × ℜ^n and define H: ℜ^{2n} → ℜ^{2n} by

    H(z) := [ u    ]
            [ G(z) ].                                                     (12)

Then it is easy to see that H is continuously differentiable at any z ∈ ℜ^n_{++} × ℜ^n if Assumption 1 is satisfied.

Recently, smoothing Newton methods have attracted a lot of attention in the literature, partially due to their superior numerical performance [2]; e.g., see [3-12,26,33,47,46,58] and references therein. Among them, the first globally and superlinearly (quadratically) convergent smoothing Newton method was proposed by Chen, Qi and Sun in [11], where the authors exploited a Jacobian consistency property and applied this property to an infinite sequence of smoothing approximation functions to get high-order convergent methods. They dealt with general box constrained variational inequalities. However, even for the NCP they had to assume that F is defined on the whole space ℜ^n. The result of [11] has been further investigated by Chen and Ye [12], still with the same requirement. On the contrary, here we avoid this requirement by making use of the mapping H(·) and, most importantly, we use only one smoothing approximation function instead of an infinite sequence of such functions. In this way we make our smoothing Newton methods much simpler.

It is worth pointing out that the Jacobian consistency property may not hold for the smoothing function (6) if F is not globally Lipschitz continuous, and there are no high-order convergent methods based on (6). Treating the smoothing parameter u as a free variable may restrict the updating rules for choosing u. However, by doing so we can provide a globally and superlinearly convergent method for solving H(z) = 0.

The idea of using (6) for solving the NCP was suggested by Chen, Harker and Pınar in [7]. Here we first study the smoothness and semismoothness properties of (6) with respect to u and x jointly, and then use these properties to obtain global and superlinear convergence results based on (12). Chen, Harker and Pınar also pointed out that by choosing smoothing functions with finite support, the resulting Newton equation has a reduced dimension. This property carries over to the generalized form (6).

There are few globally and superlinearly convergent methods in the literature that deal with the NCP and the BVIs while requiring F to be defined on X only and solving just one linear system of equations at each step. In [24], by combining a modified extragradient method [52] and a generalized Newton method, Han and Sun gave such an algorithm for solving pseudomonotone variational inequalities with X being a nonempty closed convex subset represented by several twice continuously differentiable inequalities. Very recently, Kanzow and Qi [34] designed a QP-free constrained Newton-type method for the BVIs, with the assumption b_i = ∞, i ∈ N, by combining an updated ε-active projected gradient direction and a modified Gauss-Newton direction. The result of Kanzow and Qi is based on the Fischer-Burmeister function [19], which recently has received a lot of attention in the fields of the NCP and the BVIs; e.g., see [14-16,20,22,28,29,35,55] and references therein. However, it is believed that the Newton-type direction is much better than either the extragradient direction or the projected gradient direction. In this paper, instead of resorting to some hybrid technique, at each step we use a single slightly modified Newton direction. This modification is crucial to the design of our algorithms.

The organization of this paper is as follows. In the next section we study some preliminary properties of smoothing functions. In Section 3 we prove that φ_{cd}(·) is strongly semismooth for several particularly chosen φ(·). In Section 4 we state the algorithm and prove several propositions related to the algorithm. In Section 5 we establish the global convergence of the algorithm. In Section 6 we study under what conditions the level sets of the merit function ψ(·) = ‖H(·)‖² are bounded. We analyze the superlinear and quadratic convergence properties of the algorithm in Section 7 and give preliminary numerical results in Section 8. Final conclusions are given in Section 9.

A word about our notation is in order. For a continuously differentiable function Φ: ℜ^m → ℜ^m, we denote the Jacobian of Φ at x ∈ ℜ^m by Φ′(x), whereas the transposed Jacobian is denoted by ∇Φ(x). ‖·‖ denotes the Euclidean norm. If W is an m × m matrix with entries W_{jk}, j, k = 1, ..., m, and J and K are index sets such that J, K ⊆ {1, ..., m}, we denote by W_{JK} the |J| × |K| sub-matrix of W consisting of the entries W_{jk}, j ∈ J, k ∈ K. If W_{JJ} is nonsingular, we denote by W/W_{JJ} the Schur complement of W_{JJ} in W, i.e., W/W_{JJ} := W_{KK} − W_{KJ} W_{JJ}^{-1} W_{JK}, where K = {1, ..., m}\J. If w is an m-vector, we denote by w_J the sub-vector with components j ∈ J.

2 Some Preliminaries

In this section we give some properties related to smoothing functions. In [10], Chen and Mangasarian introduced a class of smoothing approximation functions for max{0, w}, w ∈ ℜ. Gabriel and Moré [23] extended the Chen-Mangasarian smoothing approach to Π_{[c,d]∩ℜ}(w), w ∈ ℜ. Let ρ: ℜ → ℜ_+ be a density function, i.e., ρ(s) ≥ 0 and ∫_{−∞}^{∞} ρ(s) ds = 1, with a bounded absolute mean, that is,

    κ := ∫_{−∞}^{∞} |s| ρ(s) ds < ∞.                                     (13)

Recall that for any three numbers c ∈ ℜ ∪ {−∞}, d ∈ ℜ ∪ {∞} with c ≤ d and e ∈ ℜ, the median function mid(·) is defined by

    mid(c, d, e) = Π_{[c,d]∩ℜ}(e) = c    if e < c,
                                    e    if c ≤ e ≤ d,
                                    d    if d < e.

Then the Gabriel-Moré smoothing function φ(µ, c, d, w) for Π_{[c,d]∩ℜ}(w) [23] is defined by

    φ(µ, c, d, w) = ∫_{−∞}^{∞} mid(c, d, w − µs) ρ(s) ds,    (µ, w) ∈ ℜ_{++} × ℜ.    (14)

If c = −∞ and/or d = ∞, the value of φ is taken as the limit of φ as c → −∞ and/or d → ∞, correspondingly. For example, if c is finite and d = ∞, then

    φ(µ, c, ∞, w) = lim_{d′→∞} φ(µ, c, d′, w),    (µ, w) ∈ ℜ_{++} × ℜ.

Let supp(ρ) = {s ∈ ℜ : ρ(s) > 0}.
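Definition (14) can be checked numerically. The sketch below (our illustration, not the authors' code) evaluates φ(µ, c, d, w) by integrating mid(c, d, w − µs) against a density ρ and compares the result with the CHKS closed form quoted in Section 1; the use of scipy's quad routine is an assumption of the demo.

    import numpy as np
    from scipy.integrate import quad

    def mid(c, d, e):
        """Median / projection of e onto [c, d], the function mid(c,d,e) of the text."""
        return min(max(e, c), d)

    def phi(mu, c, d, w, rho):
        """phi(mu,c,d,w) = int_{-inf}^{inf} mid(c, d, w - mu*s) rho(s) ds, cf. (14)."""
        val, _ = quad(lambda s: mid(c, d, w - mu * s) * rho(s), -np.inf, np.inf)
        return val

    # CHKS density rho(s) = 2/(s^2 + 4)^{3/2}; for c = 0, d = +inf the smoothing
    # function has the closed form (sqrt(w^2 + 4 mu^2) + w)/2.
    rho_chks = lambda s: 2.0 / (s * s + 4.0) ** 1.5
    mu, w = 0.5, -0.3
    print(phi(mu, 0.0, np.inf, w, rho_chks))          # numerical value of (14)
    print((np.sqrt(w * w + 4 * mu * mu) + w) / 2.0)   # CHKS closed form

The two printed values agree up to quadrature error, which illustrates how each density ρ induces one concrete smoothing function.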

Lemma 1 [23, Lemma 2.3] For any given µ > 0, the mapping φ_{µcd}(·) is continuously differentiable with

    φ′_{µcd}(w) = ∫_{(w−d)/µ}^{(w−c)/µ} ρ(s) ds,

where φ_{µcd}(·) is defined by (11). In particular, φ′_{µcd}(w) ∈ [0, 1]. Furthermore, if supp(ρ) = ℜ and at least one of c and d is finite, then φ′_{µcd}(w) ∈ (0, 1).

Let q_{cd}: ℜ² → ℜ be defined by

    q_{cd}(µ, w) = q(µ, c, d, w),    (µ, w) ∈ ℜ²,                         (15)

where q(µ, c, d, w) is defined by (7). Then we have the following lemma.

Lemma 2 The mapping q_{cd}(·) defined by (15) is Lipschitz continuous on ℜ² with Lipschitz constant L := 2 max{1, κ}.

PROOF. Suppose that (µ₁, w₁) and (µ₂, w₂) are two arbitrary points of ℜ². Then, since the mapping mid(c, d, ·) is non-expansive, we have

    |q_{cd}(µ₁, w₁) − q_{cd}(µ₂, w₂)|
      = |∫_{−∞}^{∞} mid(c, d, w₁ − |µ₁|s) ρ(s) ds − ∫_{−∞}^{∞} mid(c, d, w₂ − |µ₂|s) ρ(s) ds|
      ≤ ∫_{−∞}^{∞} |mid(c, d, w₁ − |µ₁|s) − mid(c, d, w₂ − |µ₂|s)| ρ(s) ds
      ≤ ∫_{−∞}^{∞} |(w₁ − |µ₁|s) − (w₂ − |µ₂|s)| ρ(s) ds
      ≤ ∫_{−∞}^{∞} |w₁ − w₂| ρ(s) ds + ∫_{−∞}^{∞} |µ₁ − µ₂||s| ρ(s) ds
      = |w₁ − w₂| + κ|µ₁ − µ₂|
      ≤ 2 max{1, κ} ‖(µ₁, w₁) − (µ₂, w₂)‖,

which completes the proof of this lemma.

The following examples are the three most often used Gabriel-Moré smoothing functions in the literature.

Example 3 Neural Networks Smoothing Function The density function is

    ρ(s) = e^{−s}/(1 + e^{−s})².

We have κ = log 2, supp(ρ) = ℜ, and for any (µ, w) ∈ ℜ_{++} × ℜ the smoothing function is

    φ(µ, c, d, w) = d + µ ln(1 + e^{(c−w)/µ}) − µ ln(1 + e^{(d−w)/µ}).   (16)

It is easy to see that φ_{cd}(·) is continuously differentiable at any (µ, w) ∈ ℜ_{++} × ℜ, i.e., Assumption 1 is satisfied for this smoothing function. If c = 0 and d = ∞, then the smoothing function in (16) reduces to the neural networks smoothing plus function [9]:

    φ(µ, 0, ∞, w) = w + µ ln(1 + e^{−w/µ}),    (µ, w) ∈ ℜ_{++} × ℜ.      (17)

The latter has been shown to have superior smoothing properties in the global optimization work of Moré and Wu [39].

Example 4 Chen-Harker-Kanzow-Smale Smoothing Function The density function is

    ρ(s) = 2/(s² + 4)^{3/2}.

We have κ = 1, supp(ρ) = ℜ, and for any (µ, w) ∈ ℜ_{++} × ℜ the smoothing function is

    φ(µ, c, d, w) = (c + √((c − w)² + 4µ²))/2 + (d − √((d − w)² + 4µ²))/2.    (18)

Apparently, φ_{cd}(·) is continuously differentiable at any (µ, w) ∈ ℜ_{++} × ℜ, i.e., Assumption 1 is satisfied for this smoothing function. If c = 0 and d = ∞, then the smoothing function in (18) reduces to the CHKS smoothing NCP function:

    φ(µ, 0, ∞, w) = (√(w² + 4µ²) + w)/2.                                 (19)

Example 5 Uniform Smoothing Function The density function is

  1

 0

if −

1 2

≤s≤

otherwise

8

1 2

.

We have κ = 18 , supp(ρ) = [− 21 , 21 ] and for any (µ, w) ∈ ℜ++ × ℜ, the smoothing function 1 1 w (d − c) + (d + c)2 + (c2 − d2 ) µ 2 2µ if |w − c| < µ/2, |w − d| < µ/2 1 [w − µ/4 + d − (w − d)2 /µ] 2 if |w − d| < µ/2, w − c > µ/2 φ(µ, c, d, w) = .  1  2   [w + µ/4 + c + (w − c) /µ]   2      if |w − c| < µ/2, w − d < −µ/2                            

(20)

      mid(c, d, w)       otherwise

By direct computation, we can see that φcd (·) is continuously differentiable at any (µ, w) ∈ ℜ++ × ℜ, i.e., Assumption 1 is satisfied for this smoothing function. If 0 < µ ≤ d − c, then the smoothing function φ(·) has the following simple form: 1 [w − µ/4 + d − (w − d)2 /µ] if |w − d| < µ/2 2 1 φ(µ, c, d, w) = [w + µ/4 + c + (w − c)2 /µ] if |w − c| < µ/2 . (21)   2     mid(c, d, w) otherwise       

If c = 0 and d = ∞, the function in (20) is the Zang smoothing plus function [59]:

    φ(µ, 0, ∞, w) = 0                      if w ≤ −µ/2,
                    (1/(2µ))(w + µ/2)²     if |w| < µ/2,
                    w                      if w ≥ µ/2,
    (µ, w) ∈ ℜ_{++} × ℜ.                                                  (22)

Similar functions to (22) can be found in [42] and [46].
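For reference, the three smoothing functions above, and the function q of (7) built from them, can be transcribed directly into code; the following sketch (ours, not the authors' implementation) handles finite and infinite bounds through the limits stated earlier in this section.

    import numpy as np

    def mid(c, d, e):
        return min(max(e, c), d)

    def phi_nn(mu, c, d, w):
        """Neural networks smoothing function (16), with the c=-inf / d=+inf limits."""
        lo = mu * np.logaddexp(0.0, (c - w) / mu) if np.isfinite(c) else 0.0
        hi = d - mu * np.logaddexp(0.0, (d - w) / mu) if np.isfinite(d) else w
        return hi + lo

    def phi_chks(mu, c, d, w):
        """Chen-Harker-Kanzow-Smale smoothing function (18); each half tends to w/2
        as the corresponding bound goes to infinity."""
        lo = (c + np.sqrt((c - w) ** 2 + 4 * mu * mu)) / 2 if np.isfinite(c) else w / 2
        hi = (d - np.sqrt((d - w) ** 2 + 4 * mu * mu)) / 2 if np.isfinite(d) else w / 2
        return lo + hi

    def phi_unif(mu, c, d, w):
        """Uniform smoothing function (20); non-strict tests keep the boundary
        cases on the continuous closed-form branches."""
        near_c = np.isfinite(c) and abs(w - c) <= mu / 2
        near_d = np.isfinite(d) and abs(w - d) <= mu / 2
        if near_c and near_d:
            return (w / mu) * (d - c) + (d + c) / 2 + (c * c - d * d) / (2 * mu)
        if near_d:
            return 0.5 * (w - mu / 4 + d - (w - d) ** 2 / mu)
        if near_c:
            return 0.5 * (w + mu / 4 + c + (w - c) ** 2 / mu)
        return mid(c, d, w)

    def q(mu, c, d, w, phi=phi_chks):
        """q(mu,c,d,w) of (7): phi(|mu|,c,d,w) if mu != 0, projection onto [c,d] if mu = 0."""
        return phi(abs(mu), c, d, w) if mu != 0 else mid(c, d, w)

    print(phi_nn(0.1, 0, 1, 0.5), phi_chks(0.1, 0, 1, 0.5), phi_unif(0.1, 0, 1, 0.5))

All three values are close to mid(0, 1, 0.5) = 0.5 and converge to it as µ ↓ 0, which is the defining property of a smoothing approximation.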

Theorem 6 Suppose that Assumption 1 holds for a chosen smoothing function φ(µ, c, d, w), (µ, w) ∈ ℜ_{++} × ℜ. Then

(i) The mapping H(·) is continuously differentiable at any z = (u, x) ∈ ℜ^n_{++} × ℜ^n and

    H′(z) = [ I                     0                       ]
            [ (F′(p(z)) − I)D(u)    F′(p(z))C(x) + I − C(x) ],            (23)

where D(u) = diag{d_i(u), i ∈ N}, C(x) = diag{c_i(x), i ∈ N} with d_i(u) = ∂p_i(u, x)/∂u_i, c_i(x) = ∂p_i(u, x)/∂x_i, and c_i(x) ∈ [0, 1], i ∈ N.

(ii) Suppose that for some z ∈ ℜ^n_{++} × ℜ^n, F′(p(z)) is a P₀-matrix, i.e., its every principal minor is nonnegative. Then H′(z) is nonsingular if supp(ρ) = ℜ and for each i ∈ N, at least one of a_i and b_i is finite.

(iii) If for some z ∈ ℜ^n_{++} × ℜ^n, F′(p(z)) is a P-matrix, i.e., its every principal minor is positive, then H′(z) is nonsingular.

PROOF. (i) Since Assumption 1 is satisfied for φ(·), from the definition we know that H(·) is continuously differentiable at any z = (u, x) ∈ ℜ^n_{++} × ℜ^n. By direct computation we obtain (23). From Lemma 1 and the definition of p_i(·), c_i(x) ∈ [0, 1], i ∈ N.

(ii) Under the assumptions, from Lemma 1, c_i(x) ∈ (0, 1), i ∈ N. Then it is easy to see that F′(p(z))C(x) + I − C(x) is nonsingular under the assumption that F′(p(z)) is a P₀-matrix; see, e.g., [4, Theorem 3.3]. It then follows from (23) that H′(z) is also nonsingular.

(iii) The assumption that F′(p(z)) is a P-matrix and the fact that c_i(x) ∈ [0, 1], i ∈ N, ensure that F′(p(z))C(x) + I − C(x) is nonsingular; see, e.g., [7, Lemma 2]. So, H′(z) is nonsingular.

Since the smoothing functions defined by (16), (18) and (20) all satisfy Assumption 1, from Theorem 6 we have the following result.

Theorem 7 Suppose that the smoothing function φ(µ, c, d, w), (µ, w) ∈ ℜ_{++} × ℜ, is defined by either (16), (18) or (20). Then

(i) The mapping H is continuously differentiable at any z = (u, x) ∈ ℜ^n_{++} × ℜ^n and, if F′(p(z)) is a P-matrix, then H′(z) is nonsingular.

(ii) Suppose that φ(µ, c, d, w), (µ, w) ∈ ℜ_{++} × ℜ, is defined by either (16) or (18) (in each case supp(ρ) = ℜ) and that for each i ∈ N, at least one of a_i and b_i is finite. Then H′(z) is nonsingular if F′(p(z)) is a P₀-matrix at z = (u, x) ∈ ℜ^n_{++} × ℜ^n.
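The P-matrix and P₀-matrix conditions appearing in Theorems 6 and 7 can be verified for small matrices by enumerating principal minors. The brute-force check below is our illustration (exponential in n, so it is only sensible for small test problems):

    import numpy as np
    from itertools import combinations

    def is_P_matrix(M, strict=True, tol=1e-12):
        """P-matrix: every principal minor > 0 (strict=True);
        P0-matrix: every principal minor >= 0 (strict=False)."""
        n = M.shape[0]
        for k in range(1, n + 1):
            for idx in combinations(range(n), k):
                minor = np.linalg.det(M[np.ix_(idx, idx)])
                if (minor <= tol) if strict else (minor < -tol):
                    return False
        return True

    M = np.array([[2.0, -1.0], [0.0, 1.0]])
    print(is_P_matrix(M), is_P_matrix(M, strict=False))   # True True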

3 Semismoothness Properties

In order to design high-order convergent Newton methods we need the concept of semismoothness. Semismoothness was originally introduced by Mifflin [37] for functionals. Convex functions, smooth functions, and piecewise linear functions are examples of semismooth functions. The composition of semismooth functions is still a semismooth function [37]. In [48], Qi and Sun extended the definition of semismooth functions to Φ: ℜ^{m₁} → ℜ^{m₂}. A locally Lipschitz continuous vector valued function Φ: ℜ^{m₁} → ℜ^{m₂} has a generalized Jacobian ∂Φ(x) as in Clarke [13]. Φ is said to be semismooth at x ∈ ℜ^{m₁} if

    lim_{V ∈ ∂Φ(x+th′), h′→h, t↓0} {V h′}

exists for any h ∈ ℜ^{m₁}. It has been proved in [48] that Φ is semismooth at x if and only if all its component functions are. Also, Φ′(x; h), the directional derivative of Φ at x in the direction h, exists for any h ∈ ℜ^{m₁} if Φ is semismooth at x.

Lemma 8 [48] Suppose that Φ: ℜ^{m₁} → ℜ^{m₂} is a locally Lipschitzian function and semismooth at x. Then

(i) for any V ∈ ∂Φ(x + h), h → 0, V h − Φ′(x; h) = o(‖h‖);

(ii) for any h → 0, Φ(x + h) − Φ(x) − Φ′(x; h) = o(‖h‖).

The following lemma is extracted from Theorem 2.3 of [48].

Lemma 9 Suppose that Φ: ℜ^{m₁} → ℜ^{m₂} is a locally Lipschitzian function. Then the following two statements are equivalent:

(i) Φ(·) is semismooth at x.

(ii) For any V ∈ ∂Φ(x + h), h → 0, V h − Φ′(x; h) = o(‖h‖).

A stronger notion than semismoothness is strong semismoothness. Φ(·) is said to be strongly semismooth at x if Φ is semismooth at x and for any V ∈ ∂Φ(x + h), h → 0,

    V h − Φ′(x; h) = O(‖h‖²).

(Note that in [48] and [45] different names for strong semismoothness are used.) A function Φ is said to be a (strongly) semismooth function if it is (strongly) semismooth everywhere.

Recall from Lemma 2 that the function q_{cd}(·) defined by (15) is globally Lipschitz continuous on ℜ². Then, from Lemma 9 and the definition of strong semismoothness, we can prove that q_{cd}(·) is strongly semismooth at x ∈ ℜ² by verifying that for any V ∈ ∂q_{cd}(x + h), h → 0,

    V h − q′_{cd}(x; h) = O(‖h‖²).                                        (24)

The following proposition concerns the strong semismoothness of q_{cd} resulting from Examples 3-5. Its proof can be found in the Appendix.

Proposition 10 Suppose that the smoothing function φ(·) is the neural networks smoothing function defined by (16), or the Chen-Harker-Kanzow-Smale smoothing function defined by (18), or the uniform smoothing function defined by (20). Then the corresponding function q_{cd}: ℜ² → ℜ defined by (15) is a strongly semismooth function.

Theorem 11 Suppose that the smoothing function φ(µ, c, d, w), (µ, w) ∈ ℜ_{++} × ℜ, is defined by either (16), (18) or (20). Then

(i) H is semismooth at any z ∈ ℜ^{2n}, and

(ii) if for some point z ∈ ℜ^{2n}, F′ is Lipschitz continuous around p(z) ∈ ℜ^n, then H is strongly semismooth at z.

PROOF. (i) Since p is strongly semismooth at z if and only if its component functions p_i, i ∈ N, are, and the composition of strongly semismooth functions is a strongly semismooth function [21, Theorem 19], it follows from Proposition 10 that p is a strongly semismooth function. Hence, by making use of the fact that the composition of two semismooth functions is semismooth [37], for each i = 1, 2, ..., n, G_i is a semismooth function. Hence G, and so H, is a semismooth function.

(ii) Note that if F′ is Lipschitz continuous around p(z) ∈ ℜ^n, then F is strongly semismooth at p(z). Hence, by [21, Theorem 19], F(p(·)) is strongly semismooth at z because p is strongly semismooth at z too. Then G, and so H, is strongly semismooth at z.

4 Smoothing Newton Methods

Throughout the rest of this paper, unless otherwise stated, we assume that the smoothing function φ(·) satisfies Assumption 1. Choose ū ∈ ℜ^n_{++} and γ ∈ (0, 1) such that γ‖ū‖ < 1. Let z̄ := (ū, 0) ∈ ℜ^n × ℜ^n.

Define the merit function ψ: ℜ^{2n} → ℜ_+ by ψ(z) := ‖H(z)‖², define β: ℜ^{2n} → ℜ_+ by β(z) := γ min{1, ψ(z)}, and let

    Ω := {z = (u, x) ∈ ℜ^n × ℜ^n | u ≥ β(z)ū}.

Then, because β(z) ≤ γ < 1 for any z ∈ ℜ^{2n}, it follows that for any x ∈ ℜ^n, (ū, x) ∈ Ω.

Proposition 12 The following relations hold:

    H(z) = 0 ⟺ β(z) = 0 ⟺ H(z) = β(z)z̄.

PROOF. It follows from the definitions of H(·) and β(·) that H(z) = 0 ⟺ β(z) = 0 and β(z) = 0 ⟹ H(z) = β(z)z̄. So we only need to prove that H(z) = β(z)z̄ ⟹ β(z) = 0. This is an easy task because from H(z) = β(z)z̄ we have u = β(z)ū and G(z) = 0. Hence, from the definitions of ψ(·) and β(·), and the fact that γ‖ū‖ < 1, we get

    ψ(z) = ‖u‖² + ‖G(z)‖² = ‖u‖² = β(z)²‖ū‖² ≤ γ²‖ū‖² < 1.

Therefore,

    β(z) = γψ(z) = γβ(z)²‖ū‖².                                            (25)

If β(z) ≠ 0, it follows from (25) and the fact that β(z) ≤ γ that

    1 = γβ(z)‖ū‖² ≤ γ²‖ū‖²,

which contradicts the fact that γ‖ū‖ < 1. This contradiction completes our proof.

Algorithm 1

Step 0. Choose constants δ ∈ (0, 1) and σ ∈ (0, 1/2). Let u⁰ := ū, let x⁰ ∈ ℜ^n be an arbitrary point, and set k := 0.

Step 1. If H(z^k) = 0, stop. Otherwise, let β_k := β(z^k).

Step 2. Compute Δz^k := (Δu^k, Δx^k) ∈ ℜ^n × ℜ^n from

    H(z^k) + H′(z^k)Δz^k = β_k z̄.                                        (26)

Step 3. Let l_k be the smallest nonnegative integer l satisfying

    ψ(z^k + δ^l Δz^k) ≤ [1 − 2σ(1 − γ‖ū‖)δ^l] ψ(z^k).                    (27)

Define z^{k+1} := z^k + δ^{l_k} Δz^k.

Step 4. Replace k by k + 1 and go to Step 1.

Remark 13 (i) Since we have assumed that Assumption 1 is satisfied for the smoothing function φ used in the algorithm, H(·) is continuously differentiable at any z^k ∈ ℜ^n_{++} × ℜ^n.

(ii) From Theorem 6, for z^k ∈ ℜ^n_{++} × ℜ^n, if F′(p(z^k)) is a P-matrix, then H′(z^k) is nonsingular; and if supp(ρ) = ℜ and for each i ∈ N at least one of a_i and b_i is finite, then the condition that F′(p(z^k)) is a P₀-matrix is sufficient to guarantee that H′(z^k) is nonsingular.

(iii) We can solve equation (26) in the following way: Let Δu^k = −u^k + β_k ū. Solve

    [F′(p(z^k))C(x^k) + I − C(x^k)]Δx^k = −G(z^k) − [(F′(p(z^k)) − I)D(u^k)]Δu^k    (28)

to get Δx^k. Then Δz^k = (Δu^k, Δx^k). Equation (28) is an n-dimensional linear system. If we choose the uniform smoothing function φ(·) defined by (20), then solving (26) can be further simplified because in this case some or all of the diagonal entries of the diagonal matrix C(x^k) are zeros. For example, for the NCP, if x^k ≤ −u^k/2 componentwise, then from Lemma 1, c_i(x^k) = 0, i ∈ N, and so C(x^k) = 0; hence in this case Δx^k = −G(z^k) − [(F′(p(z^k)) − I)D(u^k)]Δu^k. Solving a linear equation of reduced form is very favourable. However, this is not without a price, because by choosing the smoothing function φ(·) defined by (20) we need stronger conditions to ensure the nonsingularity of H′(z^k) than by choosing the smoothing function φ(·) defined by either (16) or (18).

(iv) From the design of our algorithm we can see that the parameter u may not change until ψ(z) < 1. To make the parameter u change at each step, one could require the steplength δ^{l_k} < 1 in Step 3 for all k such that ψ(z^k) > 1, though this is not recommended in practical computation.
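As an illustration of how Steps 0-3 and Remark 13 (iii) fit together, here is a minimal sketch (ours, not the authors' Matlab implementation) of Algorithm 1 for an NCP with the CHKS smoothing function; the finite-difference Jacobian and the two-dimensional test mapping F are assumptions of the demo.

    import numpy as np

    def F(y):
        # Hypothetical test mapping (not one of the paper's test problems).
        return np.array([2*y[0] + y[1] - 1.0, y[0] + 3*y[1] - 2.0])

    def Fjac(y, h=1e-7):
        # Forward-difference Jacobian of F; an assumption of this demo.
        n, f0, J = y.size, F(y), np.empty((y.size, y.size))
        for j in range(n):
            e = np.zeros(n); e[j] = h
            J[:, j] = (F(y + e) - f0) / h
        return J

    def p_chks(u, x):
        # CHKS smoothing of the projection onto [0, inf): p = (sqrt(x^2+4u^2)+x)/2,
        # returned with its diagonal derivatives D = dp/du and C = dp/dx.
        r = np.sqrt(x*x + 4*u*u)
        return (r + x) / 2, (2*u) / r, (x / r + 1) / 2

    def psi(u, x):
        p, _, _ = p_chks(u, x)
        G = F(p) + x - p
        return u @ u + G @ G          # psi(z) = ||H(z)||^2

    n = 2
    ubar = np.full(n, 0.1)
    gamma = 0.2 * min(1.0, 1.0 / np.linalg.norm(ubar))   # ensures gamma*||ubar|| < 1
    delta, sigma = 0.5, 1e-4
    u, x = ubar.copy(), np.zeros(n)
    for k in range(50):
        val = psi(u, x)
        if val <= 1e-12:
            break
        beta = gamma * min(1.0, val)
        p, D, C = p_chks(u, x)
        G = F(p) + x - p
        J = Fjac(p)
        du = -u + beta * ubar                     # first block row of (26)
        A = J @ np.diag(C) + np.diag(1.0 - C)     # coefficient matrix of (28)
        rhs = -G - (J - np.eye(n)) @ (D * du)     # right-hand side of (28)
        dx = np.linalg.solve(A, rhs)
        t = 1.0                                   # backtracking line search (27)
        while t > 1e-12 and psi(u + t*du, x + t*dx) > \
                (1 - 2*sigma*(1 - gamma*np.linalg.norm(ubar))*t) * val:
            t *= delta
        u, x = u + t*du, x + t*dx
    print(k, x, psi(u, x))

For this F the NCP solution is y^* = (0.2, 0.6) with F(y^*) = 0, so x converges to (0.2, 0.6) while u is driven to zero at the rate β_k = γψ(z^k), exactly as the algorithm prescribes.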

Lemma 14 Suppose that Assumption 1 holds. If for some z̃ := (ũ, x̃) ∈ ℜ^n_{++} × ℜ^n, H′(z̃) is nonsingular, then there exist a closed neighbourhood N(z̃) of z̃ and a positive number ᾱ ∈ (0, 1] such that for any z = (u, x) ∈ N(z̃) and all α ∈ [0, ᾱ] we have u ∈ ℜ^n_{++}, H′(z) is invertible, and

    ψ(z + αΔz) ≤ [1 − 2σ(1 − γ‖ū‖)α] ψ(z).                               (29)

PROOF. Since H′(z̃) is invertible and ũ ∈ ℜ^n_{++}, there exists a closed neighbourhood N(z̃) of z̃ such that for any z = (u, x) ∈ N(z̃) we have u ∈ ℜ^n_{++} and H′(z) invertible. For any z ∈ N(z̃), let Δz = (Δu, Δx) ∈ ℜ^n × ℜ^n be the unique solution of the following equation:

    H(z) + H′(z)Δz = β(z)z̄,                                              (30)

and for any α ∈ [0, 1], define

    g_z(α) = G(z + αΔz) − G(z) − αG′(z)Δz.

From (30), for any z ∈ N(z̃), Δu = −u + β(z)ū. Then for all α ∈ [0, 1] and all z ∈ N(z̃),

    u + αΔu = (1 − α)u + αβ(z)ū ∈ ℜ^n_{++}.                              (31)

It follows from the Mean Value Theorem that

    g_z(α) = α ∫₀¹ [G′(z + θαΔz) − G′(z)]Δz dθ.

Since G′(·) is uniformly continuous on N(z̃) and Δz → Δz̃ as z → z̃, for all z ∈ N(z̃),

    lim_{α↓0} ‖g_z(α)‖/α = 0.

Then, from (31), (30) and the fact that β(z) ≤ γψ(z)^{1/2}, for all α ∈ [0, 1] and all z ∈ N(z̃), we have

    ‖u + αΔu‖² = ‖(1 − α)u + αβ(z)ū‖²
               = (1 − α)²‖u‖² + 2(1 − α)αβ(z)uᵀū + α²β(z)²‖ū‖²
               ≤ (1 − α)²‖u‖² + 2(1 − α)αβ(z)‖u‖‖ū‖ + α²β(z)²‖ū‖²
               ≤ (1 − α)²‖u‖² + 2αβ(z)‖u‖‖ū‖ + O(α²)
               ≤ (1 − α)²‖u‖² + 2αγψ(z)^{1/2}‖H(z)‖‖ū‖ + O(α²)
               = (1 − α)²‖u‖² + 2αγ‖ū‖ψ(z) + O(α²)                       (32)

and

    ‖G(z + αΔz)‖² = ‖G(z) + αG′(z)Δz + g_z(α)‖²
                  = ‖(1 − α)G(z) + g_z(α)‖²
                  = (1 − α)²‖G(z)‖² + 2(1 − α)G(z)ᵀg_z(α) + ‖g_z(α)‖²
                  = (1 − α)²‖G(z)‖² + o(α).                               (33)

It then follows from (32) and (33) that for all α ∈ [0, 1] and all z ∈ N(z̃) we have

    ψ(z + αΔz) = ‖H(z + αΔz)‖²
               = ‖u + αΔu‖² + ‖G(z + αΔz)‖²
               ≤ (1 − α)²‖u‖² + 2αγ‖ū‖ψ(z) + (1 − α)²‖G(z)‖² + o(α) + O(α²)
               = (1 − α)²ψ(z) + 2αγ‖ū‖ψ(z) + o(α)
               = (1 − 2α)ψ(z) + 2αγ‖ū‖ψ(z) + o(α)
               = [1 − 2(1 − γ‖ū‖)α]ψ(z) + o(α).                           (34)

Then from inequality (34) we can find a positive number ᾱ ∈ (0, 1] such that for all α ∈ [0, ᾱ] and all z ∈ N(z̃), (29) holds.

We can get the following result directly from Lemma 14.

Proposition 15 Suppose that Assumption 1 holds. For any k ≥ 0, if z^k ∈ ℜ^n_{++} × ℜ^n and H′(z^k) is nonsingular, then Algorithm 1 is well defined at the kth iteration and z^{k+1} ∈ ℜ^n_{++} × ℜ^n.

Proposition 16 Suppose that Assumption 1 holds. For each fixed k ≥ 0, if u^k ∈ ℜ^n_{++}, z^k ∈ Ω and H′(z^k) is nonsingular, then for any α ∈ [0, 1] such that

    ψ(z^k + αΔz^k) ≤ [1 − 2σ(1 − γ‖ū‖)α] ψ(z^k),                         (35)

it holds that z^k + αΔz^k ∈ Ω.

PROOF. We prove this proposition by considering the following two cases.

(i) ψ(z^k) > 1. Then β_k = γ. It therefore follows from z^k ∈ Ω and β(z) = γ min{1, ψ(z)} ≤ γ for any z ∈ ℜ^{2n} that for all α ∈ [0, 1] we have

    u^k + αΔu^k − β(z^k + αΔz^k)ū ≥ (1 − α)u^k + αβ_k ū − γū
                                  ≥ (1 − α)β_k ū + αβ_k ū − γū
                                  = (1 − α)γū + αγū − γū = 0.             (36)

(ii) ψ(z^k) ≤ 1. Then, for any α ∈ [0, 1] satisfying (35), we have

    ψ(z^k + αΔz^k) ≤ [1 − 2σ(1 − γ‖ū‖)α]ψ(z^k) ≤ 1.                      (37)

So, for any α ∈ [0, 1] satisfying (35), β(z^k + αΔz^k) = γψ(z^k + αΔz^k). Hence, again because z^k ∈ Ω, by using the first inequality in (37), for any α ∈ [0, 1] satisfying (35) we have

    u^k + αΔu^k − β(z^k + αΔz^k)ū
      = (1 − α)u^k + αβ_k ū − γψ(z^k + αΔz^k)ū
      ≥ (1 − α)β_k ū + αβ_k ū − γ[1 − 2σ(1 − γ‖ū‖)α]ψ(z^k)ū
      = β_k ū − γ[1 − 2σ(1 − γ‖ū‖)α]ψ(z^k)ū
      = γψ(z^k)ū − γ[1 − 2σ(1 − γ‖ū‖)α]ψ(z^k)ū
      = [2γσ(1 − γ‖ū‖)]αψ(z^k)ū ≥ 0.                                     (38)

Thus, by combining (36) and (38), we have proved that for all α ∈ [0, 1] satisfying (35), z^k + αΔz^k ∈ Ω. This completes our proof.

By combining Propositions 15 and 16, we have

Proposition 17 Suppose that Assumption 1 holds. For each fixed k ≥ 0, if u^k ∈ ℜ^n_{++}, z^k ∈ Ω and H′(z^k) is invertible, then u^{k+1} ∈ ℜ^n_{++} and z^{k+1} ∈ Ω.

Proposition 18 Suppose that Assumption 1 holds and that for every k ≥ 0 with u^k ∈ ℜ^n_{++} and z^k ∈ Ω, H′(z^k) is invertible. Then an infinite sequence {z^k} is generated by Algorithm 1 with u^k ∈ ℜ^n_{++} and z^k ∈ Ω.

PROOF. First, because z⁰ = (ū, x⁰) ∈ Ω, we have from Proposition 17 that z¹ is well defined, u¹ ∈ ℜ^n_{++} and z¹ ∈ Ω. Then, by repeatedly resorting to Proposition 17, we can prove that an infinite sequence {z^k} is generated with u^k ∈ ℜ^n_{++} and z^k ∈ Ω.

5 Global Convergence

In order to discuss the global convergence of Algorithm 1 we need the following assumption.

Assumption 2 (i) For every k ≥ 0, if u^k ∈ ℜ^n_{++} and z^k ∈ Ω, then H′(z^k) is nonsingular; and

(ii) for any accumulation point z^* = (u^*, x^*) of {z^k}, if u^* ∈ ℜ^n_{++} and z^* ∈ Ω, then H′(z^*) is nonsingular.

Theorem 19 Suppose that Assumptions 1 and 2 are satisfied. Then an infinite sequence {z^k} is generated by Algorithm 1 and each accumulation point z̃ of {z^k} is a solution of H(z) = 0.

PROOF. It follows from Proposition 18 and Assumption 2 that an infinite sequence {z^k} is generated such that {z^k} ⊂ Ω. From the design of Algorithm 1, ψ(z^{k+1}) < ψ(z^k) for all k ≥ 0. Hence the two sequences {ψ(z^k)} and {β(z^k)} are monotonically decreasing. Since ψ(z^k), β(z^k) ≥ 0 (k ≥ 0), there exist ψ̃, β̃ ≥ 0 such that ψ(z^k) → ψ̃ and β(z^k) → β̃ as k → ∞. If ψ̃ = 0 and {z^k} has an accumulation point z̃, then from the continuity of ψ(·) and β(·) we obtain ψ(z̃) = 0 and β(z̃) = 0, which is the desired result.

Suppose instead that ψ̃ > 0 and z̃ = (ũ, x̃) ∈ ℜ^n × ℜ^n is an accumulation point of {z^k}. By taking a subsequence if necessary, we may assume that {z^k} converges to z̃. It is easy to see that ψ̃ = ψ(z̃), β(z̃) = β̃ and z̃ ∈ Ω. Thus, from β(z̃) = γ min{1, ψ(z̃)} > 0 and z̃ ∈ Ω, we see that ũ ∈ ℜ^n_{++}. Then, from (ii) of Assumption 2, H′(z̃) exists and is invertible. Hence, from Lemma 14 there exist a closed neighbourhood N(z̃) of z̃ and a positive number ᾱ ∈ (0, 1] such that for any z = (u, x) ∈ N(z̃) and all α ∈ [0, ᾱ] we have u ∈ ℜ^n_{++}, H′(z) is invertible and (29) holds. Therefore, for a nonnegative integer l such that δ^l ∈ (0, ᾱ], we have

    ψ(z^k + δ^l Δz^k) ≤ [1 − 2σ(1 − γ‖ū‖)δ^l] ψ(z^k)

for all sufficiently large k. Then, for every sufficiently large k, we see that l_k ≤ l and hence δ^{l_k} ≥ δ^l. Then

    ψ(z^{k+1}) ≤ [1 − 2σ(1 − γ‖ū‖)δ^{l_k}] ψ(z^k) ≤ [1 − 2σ(1 − γ‖ū‖)δ^l] ψ(z^k)

for all sufficiently large k. This contradicts the fact that the sequence {ψ(z^k)} converges to ψ̃ > 0, and the proof is complete.

Theorem 20 Suppose that the smoothing function φ(·) is defined by either (16), (18) or (20). Then an infinite sequence {z^k} is generated by Algorithm 1 and each accumulation point z̃ of {z^k} is a solution of H(z) = 0 under one of the following two conditions:

(i) for each z = (u, x) ∈ Ω with u ∈ ℜ^n_{++}, F′(p(z)) is a P-matrix;

(ii) φ(·) is defined by either (16) or (18), for each i ∈ N at least one of a_i and b_i is finite, and for each z = (u, x) ∈ Ω with u ∈ ℜ^n_{++}, F′(p(z)) is a P₀-matrix.

PROOF. By using Theorems 7 and 19 directly, we get the results of this theorem.

6 Bounded Level Sets

In Section 5 we proved, under the assumptions of Theorem 19, that any accumulation point of {z^k} generated by Algorithm 1, if it exists, is a solution of H(z) = 0. An important question left unanswered is whether such an accumulation point exists at all. In this section we answer this question by investigating under what conditions the level sets of ψ(z) = ‖H(z)‖² are bounded. For this, let

    L(z⁰) = {z ∈ ℜ^{2n} | ψ(z) ≤ ψ(z⁰)}.

Theorem 21 If X is bounded, then L(z⁰) is bounded.

PROOF. Since X is bounded, it follows from (8) that ‖p(z)‖ is bounded for any z ∈ ℜ^{2n}. For the sake of contradiction, suppose that there exists a sequence {z^k = (u^k, x^k) ∈ ℜ^n × ℜ^n} such that z^k ∈ L(z⁰) and ‖z^k‖ → ∞. Apparently, since z^k ∈ L(z⁰), ‖u^k‖ ≤ ‖H(z^k)‖ ≤ ‖H(z⁰)‖. So, ‖x^k‖ → ∞. Hence, by using the fact that ‖p(u^k, x^k)‖ is bounded, we have

    ‖G(u^k, x^k)‖ = ‖F(p(u^k, x^k)) + x^k − p(u^k, x^k)‖ → ∞,

which contradicts z^k ∈ L(z⁰) because ‖H(z^k)‖ ≥ ‖G(z^k)‖.

Theorem 22 Suppose that F is a uniform P-function on X, i.e., there exists a positive number ν > 0 such that

    max_{i∈N} (y¹_i − y²_i)(F_i(y¹) − F_i(y²)) ≥ ν‖y¹ − y²‖²    for all y¹, y² ∈ X.    (39)

Then L(z⁰) is bounded.

PROOF. For the sake of contradiction, suppose that there exists a sequence {z^k = (u^k, x^k) ∈ ℜ^n × ℜ^n} such that z^k ∈ L(z⁰) and ‖z^k‖ → ∞. Since ‖u^k‖ is clearly bounded, ‖x^k‖ → ∞. It is easy to prove that for i ∈ N,

    |mid(a_i, b_i, x^k_i)| → ∞  ⟹  |x^k_i| → ∞ and |x^k_i − mid(a_i, b_i, x^k_i)| → 0.    (40)

From Lemma 2 and the definition of p(·), there exists a constant L′ > 0 such that

    |p_i(u^k, x^k) − mid(a_i, b_i, x^k_i)| ≤ L′|u^k_i|,    i ∈ N.        (41)

Define the index set J by J := {i | {p_i(u^k, x^k)} is unbounded, i ∈ N}. Then J ≠ ∅, because otherwise ‖G(z^k)‖ = ‖F(p(z^k)) + x^k − p(z^k)‖ → ∞. Let z̄^k = (ū^k, x̄^k) ∈ ℜ^n × ℜ^n be defined by

    ū^k_i = u^k_i if i ∉ J,      x̄^k_i = x^k_i if i ∉ J,      i ∈ N.
            0     if i ∈ J,              0     if i ∈ J,

Then

    p_i(z̄^k) = p_i(z^k)          if i ∉ J,      i ∈ N.
               mid(a_i, b_i, 0)   if i ∈ J,

Hence {‖p(z̄^k)‖} is bounded. Therefore, from (39), we have

    ν Σ_{i∈J} (p_i(z^k) − p_i(z̄^k))² = ν‖p(z^k) − p(z̄^k)‖²
      ≤ max_{i∈N} (p_i(z^k) − p_i(z̄^k))(F_i(p(z^k)) − F_i(p(z̄^k)))
      ≤ max_{i∈N} |p_i(z^k) − p_i(z̄^k)||F_i(p(z^k)) − F_i(p(z̄^k))|
      = max_{i∈J} |p_i(z^k) − p_i(z̄^k)||F_i(p(z^k)) − F_i(p(z̄^k))|
      ≤ √(Σ_{i∈J} (p_i(z^k) − p_i(z̄^k))²) max_{i∈J} |F_i(p(z^k)) − F_i(p(z̄^k))|.

Then max_{i∈J} |F_i(p(z^k)) − F_i(p(z̄^k))| → ∞ as k → ∞. Since {‖F(p(z̄^k))‖} is bounded, for each k there exists at least one i_k ∈ J such that |F_{i_k}(p(z^k))| → ∞. Since J has only a finite number of elements, by taking a subsequence if necessary we may assume that there exists an i ∈ J such that |F_i(p(z^k))| → ∞. Then, in view of (41), the definition of J and the boundedness of {‖u^k‖}, we have proved that there exists at least one i ∈ J such that

    |F_i(p(z^k))|, |p_i(z^k)|, |mid(a_i, b_i, x^k_i)| → ∞.

Hence, by (40), (41) and the boundedness of {‖u^k‖}, for such i ∈ J, |x^k_i − p_i(z^k)| is bounded. It then follows that for such i ∈ J, {|G_i(z^k)|} is unbounded. This is a contradiction because ‖H(z^k)‖ ≥ ‖G(z^k)‖. This contradiction shows that L(z⁰) is bounded.

There are several papers in the literature dealing with the bounded level sets issue for different merit functions by assuming that F is a uniform P-function on ℜ^n, i.e., that (39) holds for all y¹, y² ∈ ℜ^n; see, e.g., [16,28,14,11,32,55]. Here we only require (39) to hold on X.

7 Superlinear and Quadratic Convergence

Theorem 23 Suppose that Assumptions 1 and 2 are satisfied and z^* is an accumulation point of the infinite sequence {z^k} generated by Algorithm 1. Suppose that H is semismooth at z^* and that all V ∈ ∂H(z^*) are nonsingular. Then the whole sequence {z^k} converges to z^*,

    ‖z^{k+1} − z^*‖ = o(‖z^k − z^*‖)                                      (42)

and

    u^{k+1}_i = o(u^k_i),    i ∈ N.                                       (43)

Furthermore, if H is strongly semismooth at z^*, then

    ‖z^{k+1} − z^*‖ = O(‖z^k − z^*‖²)                                     (44)

and

    u^{k+1}_i = O((u^k_i)²),    i ∈ N.                                    (45)

PROOF. First, from Theorem 19, z^* is a solution of H(z) = 0. Then, from [48, Proposition 3.1], for all z^k sufficiently close to z^*, ‖H′(z^k)^{-1}‖ = O(1). Hence, under the assumption that H is semismooth (strongly semismooth, respectively) at z^*, from Lemma 8, for z^k sufficiently close to z^* we have

    ‖z^k + Δz^k − z^*‖ = ‖z^k + H′(z^k)^{-1}[−H(z^k) + β_k z̄] − z^*‖
      = O(‖H(z^k) − H(z^*) − H′(z^k)(z^k − z^*)‖ + β_k‖ū‖)
      = o(‖z^k − z^*‖) + O(ψ(z^k))  (= O(‖z^k − z^*‖²) + O(ψ(z^k))).      (46)

Then, because H is semismooth at z^*, H is locally Lipschitz continuous near z^*, so for all z^k close to z^*,

    ψ(z^k) = ‖H(z^k)‖² = O(‖z^k − z^*‖²).                                 (47)

Therefore, from (46) and (47), if H is semismooth (strongly semismooth, respectively) at z^*, for all z^k sufficiently close to z^*,

    ‖z^k + Δz^k − z^*‖ = o(‖z^k − z^*‖)  (= O(‖z^k − z^*‖²)).             (48)

By following the proof of Theorem 3.1 of [45], for all z^k sufficiently close to z^* we have

    ‖z^k − z^*‖ = O(‖H(z^k) − H(z^*)‖).                                   (49)

Hence, if H is semismooth (strongly semismooth, respectively) at z^*, for all z^k sufficiently close to z^* we have

    ψ(z^k + Δz^k) = ‖H(z^k + Δz^k)‖²
      = O(‖z^k + Δz^k − z^*‖²)
      = o(‖z^k − z^*‖²)  (= O(‖z^k − z^*‖⁴))
      = o(‖H(z^k) − H(z^*)‖²)  (= O(‖H(z^k) − H(z^*)‖⁴))
      = o(ψ(z^k))  (= O(ψ(z^k)²)).                                        (50)

Therefore, for all z^k sufficiently close to z^* we have z^{k+1} = z^k + Δz^k, which, together with (48), proves (42), and, if H is strongly semismooth at z^*, proves (44).

Next, from the definition of β_k and the fact that z^k → z^* as k → ∞, for all k sufficiently large, β_k = γψ(z^k) = γ‖H(z^k)‖². Also, because z^{k+1} = z^k + Δz^k for all k sufficiently large, we have u^{k+1} = u^k + Δu^k = β_k ū for all such k. Hence, for all k sufficiently large, u^{k+1} = γ‖H(z^k)‖²ū, which, together with (42), (47) and (49), gives

    lim_{k→∞} u^{k+1}_i/u^k_i = lim_{k→∞} ‖H(z^k)‖²/‖H(z^{k−1})‖²
      = lim_{k→∞} ‖H(z^k) − H(z^*)‖²/‖H(z^{k−1}) − H(z^*)‖² = 0,    i ∈ N.

This proves (43). If H is strongly semismooth at z^*, then from the above argument we can easily get (45). This completes our proof.

Next, we study under what conditions all the matrices V ∈ ∂H(z^*) are nonsingular at a solution point z^* = (u^*, x^*) ∈ ℜ^n × ℜ^n of H(z) = 0. Apparently, u^* = 0 and x^* is a solution of E(x) = 0, where E is defined in (5). For notational convenience we denote

    I := {i | a_i < x^*_i < b_i, i ∈ N},
    J := {i | x^*_i = a_i, i ∈ N} ∪ {i | x^*_i = b_i, i ∈ N},
    K := {i | x^*_i < a_i, i ∈ N} ∪ {i | x^*_i > b_i, i ∈ N}.

Then I ∪ J ∪ K = N.

By rearrangement we assume that ∇F(Π_X(x^*)) can be rewritten as

    ∇F(Π_X(x^*)) = [ ∇F(Π_X(x^*))_II  ∇F(Π_X(x^*))_IJ  ∇F(Π_X(x^*))_IK ]
                   [ ∇F(Π_X(x^*))_JI  ∇F(Π_X(x^*))_JJ  ∇F(Π_X(x^*))_JK ]
                   [ ∇F(Π_X(x^*))_KI  ∇F(Π_X(x^*))_KJ  ∇F(Π_X(x^*))_KK ].

The BVIs are said to be R-regular at x^* if ∇F(Π_X(x^*))_II is nonsingular and its Schur complement in the matrix

    [ ∇F(Π_X(x^*))_II  ∇F(Π_X(x^*))_IJ ]
    [ ∇F(Π_X(x^*))_JI  ∇F(Π_X(x^*))_JJ ]

is a P-matrix; see [50].

Proposition 24 Suppose that z^* = (u^*, x^*) ∈ ℜ^n × ℜ^n is a solution of H(z) = 0. If the BVIs are R-regular at x^*, then all V ∈ ∂H(z^*) are nonsingular.

PROOF. It is easy to see that for any V ∈ ∂H(z^*) there exists a W = (W_u, W_x) ∈ ∂G(z^*) with W_u, W_x ∈ ℜ^{n×n} such that

    V = [ I    0   ]
        [ W_u  W_x ].

Hence, proving that V is nonsingular is equivalent to proving that W_x is nonsingular. Recall that G(u, x) = F(p(u, x)) + x − p(u, x). Then, for any W = (W_u, W_x) ∈ ∂G(z^*) with W_u, W_x ∈ ℜ^{n×n} there exists a U = (U_u, U_x) ∈ ∂p(z^*) such that

    W_x = F′(p(z^*))U_x + I − U_x.

By the definition of p, we have ∂p₁(z^*) × ∂p₂(z^*) × ··· × ∂p_n(z^*) = ∂p(z^*). Then for each i ∈ N, the ith row of U satisfies U_i ∈ ∂p_i(z^*). Apparently, from the definition of p and Lemma 1, U_x = diag{(u_x)_i, i ∈ N}, where

    (u_x)_i = 1          if i ∈ I,
    (u_x)_i ∈ [0, 1]     if i ∈ J,
    (u_x)_i = 0          if i ∈ K.

Hence, for each W_x and each i ∈ J there exists λ_i ∈ [0, 1] such that

    (W_x^T)_i = ∇F(p(z^*))_i                        if i ∈ I,
                λ_i ∇F(p(z^*))_i + (1 − λ_i)e_i     if i ∈ J,
                e_i                                  if i ∈ K,

where e_i is the ith unit row vector of ℜ^n, i ∈ N. Then, by standard analysis (see, e.g., [16, Proposition 3.2]), we can prove that W_x^T, and so W_x, is nonsingular under the assumption of R-regularity (note that p(z^*) = Π_X(x^*)). Hence any V ∈ ∂H(z^*) is nonsingular, which completes our proof.

The following result follows from Theorem 23 and Proposition 24 directly.


Theorem 25 Suppose that Assumptions 1 and 2 are satisfied and z^* = (u^*, x^*) ∈ ℜ^n × ℜ^n is an accumulation point of the infinite sequence {z^k} generated by Algorithm 1. Suppose that H is semismooth at z^* and that the BVIs are R-regular at x^*. Then (42) and (43) in Theorem 23 hold. Furthermore, if H is strongly semismooth at z^*, then (44) and (45) in Theorem 23 hold.

By combining Theorems 11, 20 and 25 we can directly obtain the following result.

Theorem 26 Suppose that the smoothing function φ(·) is defined by either (16), (18) or (20). Then an infinite sequence {z^k} is generated by Algorithm 1 and each accumulation point z^* = (u^*, x^*) ∈ ℜ^n × ℜ^n of {z^k} is a solution of H(z) = 0 under one of the following two conditions:

(i) for each z = (u, x) ∈ Ω with u ∈ ℜ^n_{++}, F′(p(z)) is a P-matrix;

(ii) φ(·) is defined by either (16) or (18), for each i ∈ N at least one of a_i and b_i is finite, and for each z = (u, x) ∈ Ω with u ∈ ℜ^n_{++}, F′(p(z)) is a P₀-matrix.

Further, if R-regularity holds at x^*, then the whole sequence {z^k} converges to z^*, and (42) and (43) in Theorem 23 hold. Moreover, if F′ is Lipschitz continuous near Π_X(x^*), then (44) and (45) in Theorem 23 hold.

Corollary 27 Suppose that the smoothing function φ(·) is defined by either (16), (18) or (20). If F is a uniform P-function on X, then (i) a bounded infinite sequence {z^k} is generated by Algorithm 1 and the whole sequence {z^k} converges to the unique solution z^* = (u^*, x^*) ∈ ℜ^n × ℜ^n of H(z) = 0; (ii) (42) and (43) in Theorem 23 hold; and (iii) if F′ is Lipschitz continuous near Π_X(x^*), then (44) and (45) in Theorem 23 hold.

PROOF. It follows from (39) in Theorem 22 that for any y ∈ X and all h ∈ ℜ^n,

    max_{i∈N} h_i(F′(y)h)_i ≥ ν‖h‖²,

which, according to [38, Lemma 3.6], implies that F′(y) is a P-matrix. Then, for any z ∈ ℜ^{2n}, because p(z) ∈ X, F′(p(z)) is a P-matrix. Therefore, from Theorem 26, an infinite sequence {z^k} is generated by Algorithm 1 and each accumulation point z^* of {z^k} is a solution of H(z) = 0. Also, since F is a uniform P-function on X, from Theorem 22, L(z⁰), and so {z^k}, is bounded. Hence, there exists at least one accumulation point z^* = (u^*, x^*) ∈ ℜ^n × ℜ^n of {z^k} such that H(z^*) = 0. Since F′(Π_X(x^*)), and so ∇F(Π_X(x^*)), is a P-matrix, R-regularity holds at x^*. Hence, we obtain from Theorem 26 that the bounded sequence {z^k} converges to z^* and that (ii) and (iii) hold. Finally, since F is a uniform P-function, the BVIs have a unique solution y^* ∈ X (see, e.g., [25, Theorem 3.9]). Hence, the equation E(x) = 0 has a unique solution x^* = y^* − F(y^*), and so H(z) = 0 has a unique solution z^* = (0, x^*). This completes our proof.

8 Preliminary Numerical Results

In this section we present some numerical experiments for the nonmonotone line search version of Algorithm 1, in which Step 3 is replaced by:

Step 3′. Let l_k be the smallest nonnegative integer l satisfying

    z^k + δ^l Δz^k ∈ Ω                                                    (51)

and

    ψ(z^k + δ^l Δz^k) ≤ W − 2σ(1 − γ‖ū‖)δ^l ψ(z^k),                      (52)

where W is any value satisfying

    ψ(z^k) ≤ W ≤ max_{j=0,1,...,M_k} ψ(z^{k−j})

and the M_k are nonnegative integers, bounded above for all k, chosen so that no negative indices k − j occur. Define z^{k+1} := z^k + δ^{l_k} Δz^k.

Remark 28 (i) We choose a nonmonotone line search here because in most cases it increases the stability of algorithms. (ii) The requirement (51) guarantees the global convergence of the algorithm. This requirement automatically holds for our algorithm with a monotone line search; see Proposition 17. The consistency between (51) and (52) can be seen clearly from Propositions 15 and 16.

In the implementation we choose W as follows:

(1) Set W = ψ(z⁰) at the beginning of the algorithm.

(2) Keep the value of W fixed as long as

    ψ(z^k) ≤ min_{j=0,1,...,5} ψ(z^{k−j}).                                (53)

(3) If (53) is not satisfied at the kth iteration, set W = ψ(z^k).

For a detailed description of the above nonmonotone line search technique and its motivation, see [14].
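A minimal sketch (ours) of the reference-value rule (1)-(3) above: W starts at ψ(z⁰) and is reset to the current merit value whenever (53) fails; the five-iterate window follows (53), and the class name below is hypothetical.

    from collections import deque

    class NonmonotoneRef:
        """Reference value W for the nonmonotone test (52)."""
        def __init__(self, psi0, window=5):
            self.W = psi0
            self.hist = deque([psi0], maxlen=window + 1)  # recent psi(z^{k-j})
        def update(self, psi_k):
            if psi_k > min(self.hist):   # (53) violated: reset W, rule (3)
                self.W = psi_k
            self.hist.append(psi_k)
            return self.W

    ref = NonmonotoneRef(psi0=4.0)
    for v in [3.0, 2.5, 2.6, 1.0]:       # a made-up sequence of merit values
        print(ref.update(v))             # 4.0, 4.0, 2.6, 2.6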

The above algorithm was implemented in Matlab and run on a DEC Alpha Server 8200. Throughout the computational experiments, the parameters used in the algorithm were δ = 0.5, σ = 0.5 × 10−4, ū = (0.1, 0.1, ..., 0.1), and γ = 0.2 × min{1, 1/‖ū‖}. We used ψ(z) ≤ 10−12 as the stopping rule.

The numerical results are summarized in Tables 1-6 for the different smoothing functions and the different test problems. In Tables 1-6, Dim denotes the number of variables in the problem, Start. point denotes the starting point, Iter denotes the number of iterations, which is also equal to the number of Jacobian evaluations for the function F, NF denotes the number of function evaluations for the function F, and FF denotes the value of ψ at the final iterate.

In the following, we give a brief description of the test problems, where 0 is the vector of all zeros and e is the vector of all ones. The source reported for a problem is not necessarily the original one.

Problem 1. This is the Kojima-Shindo problem, see [41]. F(y) is not a P₀-function. This problem has two solutions: y¹ = (√6/2, 0, 0, 0.5) and y² = (1, 0, 3, 0). Starting points: (a) 0, (b) −e, (c) e − F(e).

Problem 2. This is a linear complementarity problem. See the first example of Jiang and Qi [28]. Starting points: (a) 0, (b) e.

Problem 3. This is a linear complementarity problem. See the second example of Jiang and Qi [28]. Starting points: (a) 0, (b) e.

Problem 4. This is the fourth example of Watson [56]. This problem represents the KKT conditions of a convex programming problem involving exponentials. The resulting F is monotone on the positive orthant but not even P₀ on ℜ^n. Starting points: (a) 0, (b) e.

Problem 5. This is a modification of the Mathiesen example of a Walrasian equilibrium model as suggested in [30]. F is not defined everywhere and does not belong to any known class of functions. Starting points: (a) 0 − F(0), (b) e − F(e), (c) e.

Problem 6. This is the Nash-Cournot production problem [41]. F is not twice continuously differentiable. F is a P-function on the strictly positive orthant. Starting points: (a) 0, (b) e, (c) 10e.

Problem 7. This is a Mathiesen equilibrium problem [36,41], in which F is not defined everywhere. Two sets of constants were used: (α, b₂, b₃) = (0.75, 1, 0.5) and (α, b₂, b₃) = (0.9, 5, 3). We use Problem 7a and Problem 7b to denote

this problem with these two sets of constants, respectively. Starting points: (a) e, (b) e/2.

Problem 8. This is the Kojima-Josephy problem, see [14]. F(x) is not a P₀-function. The problem has a unique solution which is not R-regular. Starting points: (a) −e, (b) e − F(e), (c) 0.

Problem 9. This is a problem arising from a spatial equilibrium model, see [41]. F is a P-function and the unique solution is R-regular. Starting points: (a) 0, (b) e.

Problem 10. This is a traffic equilibrium problem with elastic demand, see [41]. Starting points: (a) all components are 0 except x_1, x_2, x_3, x_10, x_11, x_20, x_21, x_22, x_29, x_30, x_40, x_45, which are 1, x_39, x_42, x_43, x_46, which are 7, x_41, x_47, x_48, x_50, which are 6, and x_44 and x_49, which are 10; (b) 0.

Problem 11. See Problem 9 of [57]. This is a linear complementarity problem for which Lemke's algorithm is known to run in exponential time. Starting points: (a) 0.

Problem 12. This is the third problem of Watson [56], which is a linear complementarity problem with F(x) = Mx + q. M is not even semimonotone and none of the standard algebraic techniques can solve it. Let q be the vector with −1 in the 8th coordinate and zeros elsewhere. The continuation method of [56] fails on this problem. Starting points: (a) 0.

Problem 13. See [1]. This is a linear variational inequality problem with lower and upper bounds. Here a = (0, ..., 0), b = (1, ..., 1). Starting points: (a) e, (b) −2e.

Problem 14. This problem is transformed from Problem 11 by adding lower and upper bounds to the constraint set. The resulting problem is a linear variational inequality problem with box constraints. Here we choose a = (−10, ..., −10), b = (0, ..., 0). Starting points: (a) 0, (b) e.

Problem 15. This problem is transformed from Problem 1 by adding lower and upper bounds to the constraint set. The resulting problem is a nonlinear variational inequality problem with box constraints. Here we choose a = (−10, ..., −10), b = (10, ..., 10). Starting points: (a) 0, (b) e, (c) 0 − F(0).

Problem 16. This is a nonlinear variational inequality problem with box constraints [52]. The mapping F is a polynomial operator. Here a = (0, ..., 0), b = (1, ..., 1). Starting points: (a) 0, (b) e.

Problem 17. This problem is transformed from a linear complementarity problem in [17] by adding lower and upper bounds to the constraint set. The resulting problem is a linear variational inequality problem with box constraints. Here we choose a = (−10, ..., −10), b = (−5, ..., −5). Starting points: (a) e.

The numerical results reported in Tables 1-6 show that the algorithms proposed in this paper, for the three chosen smoothing functions, work quite well for both nonlinear complementarity problems and box constrained variational inequalities. It was observed during our numerical experiments that the algorithms based on the neural networks smoothing function (16) and the Chen-Harker-Kanzow-Smale smoothing function (18), in particular the latter, have stronger numerical stability. However, when using the uniform smoothing function (20) we only solve a linear equation of reduced form per iteration. A possible compromise is to use the neural networks smoothing function (16) or the Chen-Harker-Kanzow-Smale smoothing function (18) in the first several iterations and then switch to the uniform smoothing function (20). We leave this as a future research topic.

Table 1
Results for the neural networks smoothing function (16)

Problem      Dim.     Start. point   Iter   NF   FF
Problem 1    4        a              5      8    4.7×10−20
             4        b              5      10   1.3×10−19
             4        c              4      6    2.7×10−20
Problem 2    10,000   a              5      6    1.1×10−21
             10,000   b              5      6    1.1×10−21
Problem 3    10,000   a              5      6    1.1×10−21
             10,000   b              5      6    1.1×10−21
Problem 4    5        a              19     20   1.4×10−23
             5        b              15     16   1.9×10−14
Problem 5    4        a              4      5    1.8×10−13
             4        b              5      6    3.4×10−14
             4        c              7      8    2.7×10−20
Problem 6    10       a              9      10   1.4×10−14
             10       b              7      8    1.3×10−20
             10       c              7      8    1.6×10−21
Problem 7a   4        a              7      9    9.2×10−20
             4        b              7      9    1.8×10−15
Problem 7b   4        a              fail
             4        b              7      8    1.9×10−16
Problem 8    4        a              6      9    1.1×10−25
             4        b              5      7    9.0×10−13
             4        c              > 50
Problem 9    42       a              11     22   6.7×10−18
             42       b              10     14   9.0×10−19

Table 2
Results for the neural networks smoothing function (16) (continued)

Problem      Dim.     Start. point   Iter   NF   FF
Problem 10   50       a              12     26   8.3×10−18
             50       b              16     42   1.4×10−14
Problem 11   1000     a              10     11   1.7×10−20
Problem 12   10       a              6      10   4.2×10−24
Problem 13   10,000   a              6      7    1.4×10−21
             10,000   b              5      6    1.1×10−21
Problem 14   1000     a              6      7    1.1×10−21
             1000     b              5      6    1.1×10−21
Problem 15   4        a              fail
             4        b              4      5    1.8×10−13
             4        c              6      7    3.9×10−13
Problem 16   10,000   a              6      8    1.1×10−21
             10,000   b              7      8    2.2×10−23
Problem 17   400      a              5      6    1.1×10−21

Table 3
Results for the CHKS smoothing function (18)

Problem      Dim.     Start. point   Iter   NF   FF
Problem 1    4        a              6      9    5.6×10−18
             4        b              6      11   3.3×10−23
             4        c              5      7    3.9×10−23
Problem 2    10,000   a              5      6    1.1×10−21
             10,000   b              5      6    1.1×10−21
Problem 3    10,000   a              5      6    1.1×10−21
             10,000   b              5      6    1.1×10−21
Problem 4    5        a              19     20   7.6×10−26
             5        b              16     17   2.8×10−24
Problem 5    4        a              5      7    8.8×10−15
             4        b              6      7    1.8×10−22
             4        c              8      9    2.7×10−20
Problem 6    10       a              10     11   5.0×10−15
             10       b              8      9    1.4×10−22
             10       c              7      8    1.6×10−13
Problem 7a   4        a              8      11   3.4×10−23
             4        b              7      10   4.2×10−13
Problem 7b   4        a              5      7    2.7×10−21
             4        b              4      6    1.1×10−15
Problem 8    4        a              6      9    4.2×10−16
             4        b              6      8    3.2×10−19
             4        c              > 50
Problem 9    42       a              15     28   8.2×10−26
             42       b              14     24   8.3×10−20

Table 4
Results for the CHKS smoothing function (18) (continued)

Problem      Dim.     Start. point   Iter   NF   FF
Problem 10   50       a              14     28   1.2×10−19
             50       b              17     41   5.5×10−15
Problem 11   1000     a              13     15   3.6×10−18
Problem 12   10       a              6      10   9.9×10−20
Problem 13   10,000   a              11     12   5.5×10−17
             10,000   b              13     14   9.7×10−26
Problem 14   1000     a              11     13   7.1×10−19
             1000     b              9      11   6.6×10−20
Problem 15   4        a              6      33   8.4×10−13
             4        b              4      5    5.0×10−13
             4        c              6      7    1.4×10−16
Problem 16   10,000   a              6      8    4.7×10−21
             10,000   b              7      9    4.8×10−17
Problem 17   400      a              5      6    1.1×10−21

Table 5
Results for the uniform smoothing function (20)

Problem      Dim.     Start. point   Iter   NF   FF
Problem 1    4        a              > 50
             4        b              6      10   9.8×10−20
             4        c              4      6    2.7×10−20
Problem 2    10,000   a              5      6    1.1×10−21
             10,000   b              5      6    1.1×10−21
Problem 3    10,000   a              5      6    1.1×10−21
             10,000   b              4      5    1.1×10−21
Problem 4    5        a              19     20   2.5×10−13
             5        b              15     16   9.9×10−17
Problem 5    4        a              4      5    1.3×10−17
             4        b              4      5    3.7×10−16
             4        c              9      58   2.7×10−20
Problem 6    10       a              12     13   1.9×10−25
             10       b              7      8    1.3×10−20
             10       c              7      8    1.6×10−21
Problem 7a   4        a              7      10   4.0×10−24
             4        b              6      9    4.0×10−14
Problem 7b   4        a              5      7    5.0×10−16
             4        b              4      6    3.8×10−24
Problem 8    4        a              5      8    4.0×10−13
             4        b              5      7    9.0×10−13
             4        c              > 50
Problem 9    42       a              10     18   9.9×10−14
             42       b              10     14   5.7×10−19

Table 6
Results for the uniform smoothing function (20) (continued)

Problem      Dim.     Start. point   Iter   NF   FF
Problem 10   50       a              11     25   5.8×10−16
             50       b              16     41   6.5×10−13
Problem 11   1000     a              11     12   1.1×10−21
Problem 12   10       a              6      15   4.7×10−19
Problem 13   10,000   a              8      10   9.7×10−26
             10,000   b              10     12   9.7×10−26
Problem 14   1000     a              11     12   1.1×10−21
             1000     b              4      5    1.1×10−21
Problem 15   4        a              fail
             4        b              4      5    5.0×10−13
             4        c              6      7    8.9×10−17
Problem 16   10,000   a              5      7    1.1×10−21
             10,000   b              7      9    2.5×10−15
Problem 17   400      a              4      5    1.1×10−21

9 Conclusions

In this paper we constructed a new class of smoothing Newton methods for solving nonlinear complementarity problems and box constrained variational inequalities by making use of the facts that the most often used smoothing functions are continuously differentiable jointly in µ and w at any (µ, w) ∈ ℜ_{++} × ℜ and that they lead to strongly semismooth functions on ℜ². The techniques provided in this paper can also be applied to smoothing Newton methods based on the mapping W(·) defined in (4). Numerical results showed that our algorithms worked very satisfactorily on the test problems. We expect that these algorithms can be used to solve practical large-scale problems efficiently.

In [27] Jiang proposed a smoothing method for solving nonlinear complementarity problems with P₀-functions. Jiang's approach shares some similarities with ours in the sense that the smoothing parameter is treated as a variable. For any a, b, ε ∈ ℜ, define

    φ_FB(a, b, ε) = √(a² + b² + ε²) − (a + b).                            (54)

This function is a smoothed form of the Fischer-Burmeister function and was first defined by Kanzow [31].

Jiang [27] proves that ψ_FB(·, ·, ·) := φ_FB(·, ·, ·)² is continuously differentiable on ℜ³. Define G: ℜ^{n+1} → ℜ^n by

    G_i(ε, x) := φ_FB(x_i, F_i(x), ε),    i = 1, 2, ..., n.               (55)

Jiang [27] provided a different form of H, which was defined by

    H(ε, x) := [ e^ε − 1  ]
               [ G(ε, x)  ].                                              (56)

An interesting property of H so defined is that for any ε > 0 and any λ ∈ (0, 1], ε + λΔε > 0 and ε + λΔε < ε, where d := (Δε, Δx) ∈ ℜ × ℜ^n is a solution of H(ε, x) + H′(ε, x)d = 0. Based on this observation, Jiang [27] designed a smoothing Newton method for solving the NCP under the assumption that F is a P₀-function. By using the continuous differentiability of ‖H‖² and the assumption that the search directions are bounded, which can be satisfied by assuming that F is a uniform P-function, Jiang proved global and local superlinear (quadratic) convergence

Dim.

Start. point

Iter

NF

FF

Problem 1

4

a

6

9

5.6×10−18

4

b

6

11

3.3×10−23

4

c

5

7

3.9×10−23

10,000

a

5

6

1.1×10−21

10,000

b

5

6

1.1×10−21

10,000

a

5

6

1.1×10−21

10,000

b

5

6

1.1×10−21

5

a

19

20

7.6×10−26

5

b

16

17

2.8×10−24

4

a

5

7

8.8×10−15

4

b

6

7

1.8×10−22

4

c

8

9

2.7×10−20

10

a

10

11

5.0×10−15

10

b

8

9

1.4×10−22

10

c

7

8

1.6×10−13

4

a

8

11

3.4×10−23

4

b

7

10

4.2×10−13

4

a

5

7

2.7×10−21

4

b

4

6

1.1×10−15

4

a

6

9

4.2×10−16

4

b

6

8

3.2×10−19

4

c

> 50

42

a

15

28

8.2×10−26

42

b

14

24

8.3×10−20

Problem 2

Problem 3

Problem 4

Problem 5

Problem 6

Problem 7a

Problem 7b

Problem 8

Problem 9

results of his method. It is noted that Jiang’s idea may be used to any smoothing function H where a smoothing parameter is involved if the following two conditions are satisfied: (i) The function kHk2 is continuously differentiable; (ii) The search directions are uniformly bounded. 33

Meanwhile, our approach needs neither condition (i) nor condition (ii). The key point is that we can control the smoothing parameter in such a way that it converges to zero neither too fast nor too slowly, by using a particularly designed Newton equation and a line search model. To see the advantages of our approach clearly, let us consider the NCP with a P₀-function. For this problem we proved that any accumulation point of our iteration sequence is a solution, while Jiang proved the same result under the additional condition (ii). Moreover, we require F to be a P₀-function on ℜ^n_+ only, instead of on ℜ^n.

After the announcement of this paper, our method was soon used to solve various regularized smoothing equations to obtain stronger global convergence results [53,43,60] and to solve extended vertical linear complementarity problems [44]. In this paper we treated the smoothing parameter u in the same manner as the original variable x, except that u is always kept positive. An interesting question is whether the updating rule for u used in this paper can be relaxed, or whether it is possible not to treat the parameter u as a free variable while superlinear convergence results can still be obtained for the problems considered here or in [53,43,60]. We leave these as further research topics.

Table 5
Results for the uniform smoothing function (20)

Problem      Dim.     Start. point   Iter   NF    FF
Problem 1    4        a              > 50   –     –
             4        b              6      10    9.8×10^-20
             4        c              4      6     2.7×10^-20
Problem 2    10,000   a              5      6     1.1×10^-21
             10,000   b              5      6     1.1×10^-21
Problem 3    10,000   a              5      6     1.1×10^-21
             10,000   b              4      5     1.1×10^-21
Problem 4    5        a              19     20    2.5×10^-13
             5        b              15     16    9.9×10^-17
Problem 5    4        a              4      5     1.3×10^-17
             4        b              4      5     3.7×10^-16
             4        c              9      58    2.7×10^-20
Problem 6    10       a              12     13    1.9×10^-25
             10       b              7      8     1.3×10^-20
             10       c              7      8     1.6×10^-21
Problem 7a   4        a              7      10    4.0×10^-24
             4        b              6      9     4.0×10^-14
Problem 7b   4        a              5      7     5.0×10^-16
             4        b              4      6     3.8×10^-24
Problem 8    4        a              5      8     4.0×10^-13
             4        b              5      7     9.0×10^-13
             4        c              > 50   –     –
Problem 9    42       a              10     18    9.9×10^-14
             42       b              10     14    5.7×10^-19

Acknowledgements

The authors would like to thank the associate editor, the two referees and Houduo Qi for their helpful comments.

Table 6
Results for the uniform smoothing function (20) (continued)

Problem      Dim.     Start. point   Iter   NF    FF
Problem 10   50       a              11     25    5.8×10^-16
             50       b              16     41    6.5×10^-13
Problem 11   1000     a              11     12    1.1×10^-21
Problem 12   10       a              6      15    4.7×10^-19
Problem 13   10,000   a              8      10    9.7×10^-26
             10,000   b              10     12    9.7×10^-26
Problem 14   1000     a              11     12    1.1×10^-21
             1000     b              4      5     1.1×10^-21
Problem 15   4        a              fail   –     –
             4        b              4      5     5.0×10^-13
             4        c              6      7     8.9×10^-17
Problem 16   10,000   a              5      7     1.1×10^-21
             10,000   b              7      9     2.5×10^-15
Problem 17   400      a              4      5     1.1×10^-21

References

[1] B.H. Ahn, Iterative methods for linear complementarity problem with upperbounds and lowerbounds, Mathematical Programming 26 (1983) 295–315.
[2] S.C. Billups, S.P. Dirkse and M.C. Ferris, A comparison of algorithms for large-scale mixed complementarity problems, Computational Optimization and Applications 7 (1997) 3–25.
[3] B. Chen and X. Chen, A global and local superlinear continuation-smoothing method for P0 + R0 and monotone NCP, SIAM Journal on Optimization, to appear.
[4] B. Chen and P.T. Harker, A non-interior-point continuation method for linear complementarity problems, SIAM Journal on Matrix Analysis and Applications 14 (1993) 1168–1190.
[5] B. Chen and P.T. Harker, A continuation method for monotone variational inequalities, Mathematical Programming 69 (1995) 237–253.
[6] B. Chen and P.T. Harker, Smooth approximations to nonlinear complementarity problems, SIAM Journal on Optimization 7 (1997) 403–420.
[7] B. Chen, P.T. Harker and M.C. Pınar, Continuation methods for nonlinear complementarity problems via normal maps, Preprint, Department of Management and Systems, Washington State University, Pullman, November 1995.
[8] B. Chen and N. Xiu, A global linear and local quadratic non-interior continuation method for nonlinear complementarity problems based on Chen-Mangasarian smoothing function, SIAM Journal on Optimization, to appear.
[9] C. Chen and O.L. Mangasarian, Smoothing methods for convex inequalities and linear complementarity problems, Mathematical Programming 71 (1995) 51–69.
[10] C. Chen and O.L. Mangasarian, A class of smoothing functions for nonlinear and mixed complementarity problems, Computational Optimization and Applications 5 (1996) 97–138.
[11] X. Chen, L. Qi and D. Sun, Global and superlinear convergence of the smoothing Newton method and its application to general box constrained variational inequalities, Mathematics of Computation 67 (1998) 519–540.
[12] X. Chen and Y. Ye, On homotopy-smoothing methods for variational inequalities, SIAM Journal on Control and Optimization, to appear.
[13] F.H. Clarke, Optimization and Nonsmooth Analysis (Wiley, New York, 1983).
[14] T. De Luca, F. Facchinei and C. Kanzow, A semismooth equation approach to the solution of nonlinear complementarity problems, Mathematical Programming 75 (1996) 407–439.
[15] F. Facchinei, A. Fischer and C. Kanzow, Regularity properties of a semismooth equation reformulation of variational inequalities, SIAM Journal on Optimization 8 (1998) 850–869.
[16] F. Facchinei and J. Soares, A new merit function for nonlinear complementarity problems and a related algorithm, SIAM Journal on Optimization 7 (1997) 225–247.
[17] Y. Fathi, Computational complexity of LCPs associated with positive definite matrices, Mathematical Programming 17 (1979) 335–344.
[18] M.C. Ferris and J.-S. Pang, Engineering and economic applications of complementarity problems, SIAM Review 39 (1997) 669–713.
[19] A. Fischer, A special Newton-type optimization method, Optimization 24 (1992) 269–284.
[20] A. Fischer, An NCP-function and its use for the solution of complementarity problems, in: D.-Z. Du, L. Qi and R. Womersley, eds., Recent Advances in Nonsmooth Optimization (World Scientific Publishers, New Jersey, 1995) 88–105.
[21] A. Fischer, Solution of monotone complementarity problems with locally Lipschitzian functions, Mathematical Programming 76 (1997) 513–532.
[22] M. Fukushima, Merit functions for variational inequality and complementarity problems, in: G. Di Pillo and F. Giannessi, eds., Nonlinear Optimization and Applications (Plenum Publishing Corporation, New York, 1996) 155–170.
[23] S.A. Gabriel and J.J. Moré, Smoothing of mixed complementarity problems, in: M.C. Ferris and J.-S. Pang, eds., Complementarity and Variational Problems: State of the Art (SIAM, Philadelphia, Pennsylvania, 1997) 105–116.
[24] J. Han and D. Sun, Newton-type methods for variational inequalities, in: Y. Yuan, ed., Advances in Nonlinear Programming (Kluwer, Boston, 1998) 105–118.
[25] P.T. Harker and J.-S. Pang, Finite-dimensional variational inequality and nonlinear complementarity problem: A survey of theory, algorithms and applications, Mathematical Programming 48 (1990) 161–220.
[26] K. Hotta and A. Yoshise, Global convergence of a class of non-interior-point algorithms using Chen-Harker-Kanzow functions for nonlinear complementarity problems, Discussion Paper Series No. 708, Institute of Policy and Planning Sciences, University of Tsukuba, Tsukuba, Ibaraki 305, Japan, December 1996.
[27] H. Jiang, Smoothed Fischer-Burmeister equation methods for the nonlinear complementarity problem, Preprint, Department of Mathematics and Statistics, The University of Melbourne, Victoria 3052, Australia, June 1997.
[28] H. Jiang and L. Qi, A new nonsmooth equation approach to nonlinear complementarity problems, SIAM Journal on Control and Optimization 35 (1997) 178–193.
[29] H. Jiang, L. Qi, X. Chen and D. Sun, Semismoothness and superlinear convergence in nonsmooth optimization and nonsmooth equations, in: G. Di Pillo and F. Giannessi, eds., Nonlinear Optimization and Applications (Plenum Publishing Corporation, New York, 1996) 197–212.
[30] C. Kanzow, Some equation-based methods for the nonlinear complementarity problem, Optimization Methods and Software 3 (1994) 327–340.
[31] C. Kanzow, Some noninterior continuation methods for linear complementarity problems, SIAM Journal on Matrix Analysis and Applications 17 (1996) 851–868.
[32] C. Kanzow and M. Fukushima, Theoretical and numerical investigation of the D-gap function for box constrained variational inequalities, Mathematical Programming, to appear.
[33] C. Kanzow and H. Jiang, A continuation method for (strongly) monotone variational inequalities, Mathematical Programming 81 (1998) 103–125.
[34] C. Kanzow and H.-D. Qi, A QP-free constrained Newton-type method for variational inequality problems, Mathematical Programming, to appear.
[35] Z.-Q. Luo and P. Tseng, A new class of merit functions for the nonlinear complementarity problem, in: M.C. Ferris and J.-S. Pang, eds., Complementarity and Variational Problems: State of the Art (SIAM, Philadelphia, Pennsylvania, 1997) 204–225.
[36] L. Mathiesen, An algorithm based on a sequence of linear complementarity problems applied to a Walrasian equilibrium model: An example, Mathematical Programming 37 (1987) 1–18.
[37] R. Mifflin, Semismooth and semiconvex functions in constrained optimization, SIAM Journal on Control and Optimization 15 (1977) 957–972.
[38] J.J. Moré, Coercivity conditions in nonlinear complementarity problems, SIAM Review 16 (1974) 1–16.
[39] J.J. Moré and Z. Wu, Smoothing techniques for macromolecular global optimization, Preprint MCS-P542-0995, Argonne National Laboratory, Argonne, Illinois, 1994.
[40] J.-S. Pang, Complementarity problems, in: R. Horst and P. Pardalos, eds., Handbook of Global Optimization (Kluwer, Boston, 1995) 271–338.
[41] J.-S. Pang and S.A. Gabriel, NE/SQP: A robust algorithm for the nonlinear complementarity problem, Mathematical Programming 60 (1993) 295–337.
[42] M.C. Pınar and S.A. Zenios, On smoothing exact penalty functions for constrained optimization, SIAM Journal on Optimization 4 (1994) 486–511.
[43] H.-D. Qi, A regularized smoothing Newton method for box constrained variational inequality problems with P0 functions, Preprint, Institute of Computational Mathematics and Scientific/Engineering Computing, Chinese Academy of Sciences, revised March 1998.
[44] H.-D. Qi and L.Z. Liao, A smoothing Newton method for extended vertical linear complementarity problems, Preprint, Institute of Computational Mathematics and Scientific/Engineering Computing, Chinese Academy of Sciences, revised April 1998.
[45] L. Qi, Convergence analysis of some algorithms for solving nonsmooth equations, Mathematics of Operations Research 18 (1993) 227–244.
[46] L. Qi and X. Chen, A globally convergent successive approximation method for severely nonsmooth equations, SIAM Journal on Control and Optimization 33 (1995) 402–418.
[47] L. Qi and D. Sun, Improving the convergence of non-interior point algorithms for nonlinear complementarity problems, Mathematics of Computation, to appear.
[48] L. Qi and J. Sun, A nonsmooth version of Newton's method, Mathematical Programming 58 (1993) 353–367.
[49] S.M. Robinson, Normal maps induced by linear transformation, Mathematics of Operations Research 17 (1992) 691–714.
[50] S.M. Robinson, Generalized equations, in: A. Bachem, M. Grötschel and B. Korte, eds., Mathematical Programming: The State of the Art (Springer-Verlag, Berlin, 1983) 346–347.
[51] S. Smale, Algorithms for solving equations, in: Proceedings of the International Congress of Mathematicians (Berkeley, California, 1986) 172–195.
[52] D. Sun, A class of iterative methods for solving nonlinear projection equations, Journal of Optimization Theory and Applications 91 (1996) 123–140.
[53] D. Sun, A regularization Newton method for solving nonlinear complementarity problems, Applied Mathematics and Optimization, to appear.
[54] D. Sun and J. Han, Newton and quasi-Newton methods for a class of nonsmooth equations and related problems, SIAM Journal on Optimization 7 (1997) 463–480.
[55] D. Sun and R.S. Womersley, A new unconstrained differentiable merit function for box constrained variational inequality problems and a damped Gauss-Newton method, SIAM Journal on Optimization, to appear.
[56] L.T. Watson, Solving the nonlinear complementarity problem by a homotopy method, SIAM Journal on Control and Optimization 17 (1979) 36–46.
[57] B. Xiao and P.T. Harker, A nonsmooth Newton method for variational inequalities, II: Numerical results, Mathematical Programming 65 (1994) 195–216.
[58] S. Xu, The global linear convergence of an infeasible non-interior path-following algorithm for complementarity problems with uniform P-functions, Preprint, Department of Mathematics, University of Washington, Seattle, WA 98195, December 1996.
[59] I. Zang, A smoothing-out technique for min–max optimization, Mathematical Programming 19 (1980) 61–71.
[60] G. Zhou, D. Sun and L. Qi, Numerical experiments for a class of squared smoothing Newton methods for box constrained variational inequality problems, in: M. Fukushima and L. Qi, eds., Reformulation: Nonsmooth, Piecewise Smooth, Semismooth and Smoothing Methods (Kluwer, Boston, 1998) 421–441.

A Appendix

Proof of Proposition 10: a. Suppose that the smoothing function φ(·) is the neural networks smoothing function defined by (16). From (16) and the definition of qcd(·), we can see that qcd(·) is at least twice continuously differentiable at any x ∈ ℜ2 with x1 ≠ 0. So, we only need to prove that (24) holds at any x = (0, x2) with x2 ∈ ℜ. After simple computations, for any (µ, w) ∈ ℜ2 with µ > 0, we have

∇qcd(µ, w) = ( t(µ, w), 1/(1 + e^{(c−w)/µ}) − 1/(1 + e^{(d−w)/µ}) )^T,    (A.1)

where

t(µ, w) = ln[1 + e^{(c−w)/µ}] + ((c − w)/µ) · 1/(1 + e^{(c−w)/µ})
          − ln[1 + e^{(d−w)/µ}] − ((d − w)/µ) · 1/(1 + e^{(d−w)/µ}) + (d − c)/µ,

and for any (µ, w) ∈ ℜ2 with µ < 0, we have

∇qcd(µ, w) = diag(−1, 1) ∇qcd(−µ, w).    (A.2)
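As a quick sanity check of (A.1)–(A.2), the snippet below compares the formula against central finite differences. The closed form used for qcd here, the neural-network smoothing of the projection mid(c, w, d), is our assumption of what (16) induces, written out only to make the check self-contained:

    import numpy as np

    c, d = -1.0, 2.0

    def q_cd(mu, w):
        # assumed closed form: q_cd(mu, w) -> mid(c, w, d) as mu -> 0
        m = abs(mu)
        return c + m * np.log1p(np.exp((w - c) / m)) - m * np.log1p(np.exp((w - d) / m))

    def grad_A1(mu, w):
        # (A.1) for mu > 0, extended to mu < 0 via (A.2)
        m = abs(mu)
        ec, ed = np.exp((c - w) / m), np.exp((d - w) / m)
        t = (np.log1p(ec) + ((c - w) / m) / (1 + ec)
             - np.log1p(ed) - ((d - w) / m) / (1 + ed) + (d - c) / m)
        g = np.array([t, 1 / (1 + ec) - 1 / (1 + ed)])
        if mu < 0:
            g[0] = -g[0]          # (A.2): diag(-1, 1)
        return g

    for mu, w in [(0.7, 0.3), (-0.5, 1.1), (0.2, -2.0)]:
        h = 1e-6
        fd = np.array([(q_cd(mu + h, w) - q_cd(mu - h, w)) / (2 * h),
                       (q_cd(mu, w + h) - q_cd(mu, w - h)) / (2 * h)])
        print(np.max(np.abs(grad_A1(mu, w) - fd)))   # differences ~ 1e-8 or smaller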

Suppose x = (0, x2) ∈ ℜ2 and h = (h1, h2) ∈ ℜ2. If h1 ≠ 0, then, since qcd is continuously differentiable at x + h, ∂qcd(x + h) = {∇qcd(x + h)^T}. We consider several cases:

(i) x2 < c. Suppose that h is small enough that x2 + th2 < c for all t ∈ [0, 1]. Then, from the definition,

q′cd(x; h) = lim_{t↓0} [qcd(x + th) − qcd(x)]/t = 0.

Then, for any V ∈ ∂qcd(x + h), h → 0 and h1 ≠ 0,

V h − q′cd(x; h) = ∇qcd(x + h)^T h − 0
                 = |h1| ln{1 + e^{[c−(x2+h2)]/|h1|}} − |h1| ln{1 + e^{[d−(x2+h2)]/|h1|}}
                   + (c − x2)/(1 + e^{[c−(x2+h2)]/|h1|}) − (d − x2)/(1 + e^{[d−(x2+h2)]/|h1|}) + (d − c)
                 = O(h1^2) = O(‖h‖^2),

and for any V ∈ ∂qcd(x + h), h = (0, h2) → 0,

V h − q′cd(x; h) = V1 · 0 + 0 · h2 − 0 = 0,

because in the latter case q′cd(x; h) = 0 and, for any V ∈ ∂qcd(0, x2 + h2) with x2 + h2 < c, V2 = 0.

(ii) c < x2 < d. Suppose that h is small enough that c < x2 + th2 < d for all t ∈ [0, 1]. Then, from the definition,

q′cd(x; h) = lim_{t↓0} [qcd(x + th) − qcd(x)]/t = h2.

Then, for any V ∈ ∂qcd(x + h), h → 0 and h1 ≠ 0,

V h − q′cd(x; h) = ∇qcd(x + h)^T h − h2 = O(h1^2) = O(‖h‖^2),

and for any V ∈ ∂qcd(x + h), h = (0, h2) → 0,

V h − q′cd(x; h) = V1 · 0 + 1 · h2 − h2 = 0,

because in the latter case, for any V ∈ ∂qcd(0, x2 + h2) with c < x2 + h2 < d, V2 = 1.

(iii) x2 = c. Suppose that h ∈ ℜ2 is sufficiently small that x2 + h2 < d. Then, from the definition, if h1 ≠ 0,

q′cd(x; h) = lim_{t↓0} [qcd(x + th) − qcd(x)]/t = |h1| ln(1 + e^{−h2/|h1|}) + h2,

and if h1 = 0,

q′cd(x; h) = max{0, h2}.

Then, for any V ∈ ∂qcd(x + h), h → 0 and h1 ≠ 0, substituting (A.1) (with µ = |h1| and w = c + h2) and simplifying,

V h − q′cd(x; h) = V1 h1 + V2 h2 − [ |h1| ln(1 + e^{−h2/|h1|}) + h2 ]
                 = −|h1| ln[1 + e^{(d−c−h2)/|h1|}] − (d − c)/(1 + e^{(d−c−h2)/|h1|}) + d − c − h2
                 = O(h1^2) = O(‖h‖^2),

where the last estimate holds because A := (d − c − h2)/|h1| → +∞ as h → 0, and |h1| ln(1 + e^A) = (d − c − h2) + |h1| ln(1 + e^{−A}), so that the whole expression is exponentially small in 1/|h1|,

and for any V ∈ ∂qcd(x + h), h = (0, h2) → 0,

V h − q′cd(x; h) = V1 · 0 + V2 · h2 − max{0, h2} = 0,

because in the latter case q′cd(x; h) = max{0, h2} and, for any V ∈ ∂qcd(0, x2 + h2) with x2 + h2 < d, V2 = 1 if h2 > 0 and V2 = 0 if h2 < 0.

(iv) d < x2. Suppose that h is small enough that d < x2 + th2 for all t ∈ [0, 1]. Then, from the definition,

q′cd(x; h) = lim_{t↓0} [qcd(x + th) − qcd(x)]/t = 0.

Then, for any V ∈ ∂qcd(x + h), h → 0 and h1 ≠ 0,

V h − q′cd(x; h) = ∇qcd(x + h)^T h − 0 = O(h1^2) = O(‖h‖^2),

and for any V ∈ ∂qcd(x + h), h = (0, h2) → 0,

V h − q′cd(x; h) = V1 · 0 + 0 · h2 − 0 = 0,

because in the latter case, for any V ∈ ∂qcd(0, x2 + h2) with x2 + h2 > d, V2 = 0.

(v) x2 = d. The proof of this case is similar to that of case (iii); we omit the details here.

Thus we have proved that qcd(·) is a strongly semismooth function.

b. Suppose that the smoothing function φ(·) is the Chen-Harker-Kanzow-Smale smoothing function defined by (18). By the definition, for any (µ, w) ∈ ℜ2,

qcd(µ, w) = [c + √((c − w)^2 + 4µ^2)]/2 + [d − √((d − w)^2 + 4µ^2)]/2.

It is easy to prove that √(ν^2 + µ^2) is a strongly semismooth function. Since the composition of strongly semismooth functions is still a strongly semismooth function [21, Theorem 19], the two functions [c + √((c − w)^2 + 4µ^2)]/2 and [d − √((d − w)^2 + 4µ^2)]/2 are strongly semismooth, and so qcd(·) is a strongly semismooth function.
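For completeness, here is the short standard argument behind the claim that f(ν, µ) := √(ν^2 + µ^2) = ‖(ν, µ)‖ is strongly semismooth; it is sketched in our notation and is not spelled out in the text above:

\[
f \in C^\infty(\Re^2 \setminus \{0\}), \qquad
f'(0; h) = \lim_{t \downarrow 0} \frac{\|th\|}{t} = \|h\|,
\]
\[
\partial f(h) = \{ h^T/\|h\| \} \ (h \neq 0)
\quad\Longrightarrow\quad
V h - f'(0; h) = \frac{h^T h}{\|h\|} - \|h\| = 0 = O(\|h\|^2);
\]

away from the origin, ∇f(x) = x/‖x‖ is locally Lipschitz, so (24) holds at every point.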

c. Suppose that the smoothing function φ(·) is the uniform smoothing function defined by (20). For the sake of simplicity we only prove the case c = 0 and d = ∞; the proof of the other cases is very similar. From (22) (note that we assume c = 0 and d = ∞) and the definition of qcd(·), we can see that qcd(·) is continuously differentiable at any (µ, w) ∈ ℜ2 with µ ≠ 0, and if µ > 0,

∇qcd(µ, w) = (0, 0)                             if w ≤ −µ/2,
             (1/8 − w^2/(2µ^2), w/µ + 1/2)      if |w| < µ/2,
             (0, 1)                             if w ≥ µ/2,

and if µ < 0,

∇qcd(µ, w) = diag(−1, 1) ∇qcd(−µ, w).

It is easy to verify that ∇qcd(·) is locally Lipschitz continuous around any (µ, w) ∈ ℜ2 with µ ≠ 0, and it is known that a function which is continuously differentiable at a certain point is strongly semismooth at this point if its derivative is locally Lipschitz continuous around this point. Hence qcd(·) is strongly semismooth at any (µ, w) ∈ ℜ2 with µ ≠ 0.
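To see where the middle branch of the gradient above comes from, note that for 0 < |w| < µ/2 the uniform smoothing with c = 0 and d = ∞ reduces to qcd(µ, w) = (w + µ/2)^2/(2µ) (our reading of (20) and (22), stated here only to make the computation self-contained), so that

\[
\frac{\partial q_{cd}}{\partial w} = \frac{w + \mu/2}{\mu} = \frac{w}{\mu} + \frac{1}{2},
\qquad
\frac{\partial q_{cd}}{\partial \mu} = \frac{w + \mu/2}{2\mu} - \frac{(w + \mu/2)^2}{2\mu^2}
= \frac{1}{2}\Big(\frac{w}{\mu} + \frac{1}{2}\Big)\Big(\frac{1}{2} - \frac{w}{\mu}\Big)
= \frac{1}{8} - \frac{w^2}{2\mu^2}.
\]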

Next, we prove that (24) holds at any x = (0, x2) ∈ ℜ2. We consider the following cases.

(i) x2 < 0. Then for any V ∈ ∂qcd(x + h), h → 0 and h1 ≠ 0,

V h − q′cd(x; h) = 0 · h1 + 0 · h2 − 0 = 0,

and for any V ∈ ∂qcd(x + h), h = (0, h2) → 0,

V h − q′cd(x; h) = V1 · 0 + 0 · h2 − 0 = 0.

(ii) x2 > 0. Then for any V ∈ ∂qcd(x + h), h → 0 and h1 ≠ 0,

V h − q′cd(x; h) = 0 · h1 + 1 · h2 − h2 = 0,

and for any V ∈ ∂qcd(x + h), h = (0, h2) → 0,

V h − q′cd(x; h) = V1 · 0 + 1 · h2 − h2 = 0.

(iii) x2 = 0. Then for any V ∈ ∂qcd(x + h), h → 0, h1 ≠ 0 and h2 ≤ −|h1|/2,

V h − q′cd(x; h) = 0 · h1 + 0 · h2 − 0 = 0;

for any V ∈ ∂qcd(x + h), h → 0, h1 ≠ 0 and h2 ≥ |h1|/2,

V h − q′cd(x; h) = 0 · h1 + 1 · h2 − h2 = 0;

for any V ∈ ∂qcd(x + h), h → 0, h1 ≠ 0 and |h2| < |h1|/2,

V h − q′cd(x; h) = (1/8 − h2^2/(2h1^2)) |h1| + (h2/|h1| + 1/2) h2 − (h2 + |h1|/2)^2/(2|h1|) = 0,

and for any V ∈ ∂qcd(x + h), h = (0, h2) → 0,

V h − q′cd(x; h) = V1 · 0 + V2 · h2 − max{0, h2} = max{0, h2} − max{0, h2} = 0,

because in the latter case q′cd(x; h) = max{0, h2} and, for any V ∈ ∂qcd(0, x2 + h2), V2 = 1 if h2 > 0 and V2 = 0 if h2 < 0. Then we have proved that qcd(·) is a strongly semismooth function.
