JOURNAL OF INDUSTRIAL AND MANAGEMENT OPTIMIZATION Volume 10, Number 2, April 2014
doi:10.3934/jimo.2014.10.637 pp. 637–663
SUBSTITUTION SECANT/FINITE DIFFERENCE METHOD TO LARGE SPARSE MINIMAX PROBLEMS
Junxiang Li¹, Yan Gao¹†, Tao Dai², Chunming Ye¹, Qiang Su³ and Jiazhen Huo³
¹ Business School, University of Shanghai for Science and Technology, Shanghai, 200093, China
² Glorious Sun School of Business and Management, Donghua University, Shanghai, 200051, China
³ School of Economics and Management, Tongji University, Shanghai, 200092, China
(Communicated by Nobuo Yamashita)

Abstract. We present a substitution secant/finite difference (SSFD) method for solving finite minimax optimization problems involving a number of functions whose Hessians are often sparse, i.e., populated primarily with zeros. By combining a substitution method, a secant method and a finite difference method, the gradient evaluations can be employed as efficiently as possible in forming quadratic approximations to the functions, and the resulting savings are larger than those obtained in large sparse unconstrained differentiable optimization. Without assuming strict complementarity or linear independence, local and global convergence are proven, and a q-superlinear convergence result and an r-convergence rate estimate show that the method has good convergence properties. A technique for handling Hessians that are not positive definite is given in order to solve nonconvex problems. Our numerical tests show that the algorithm is robust and quite effective, and that its performance is comparable to or better than that of other available algorithms.
1. Introduction. Consider the finite minimax optimization problems of the form
$$\min_{x \in \mathbb{R}^n} F(x), \tag{1}$$
where $F(x) = \max_{j \in I_m} f^j(x)$ and the functions $f^j : \mathbb{R}^n \to \mathbb{R}$, $j \in I_m$, $I_m = \{1, 2, \cdots, m\}$, are twice continuously differentiable. Many problems of interest in real-world applications can be modeled as finite minimax problems. This class of problems arises, for instance, in engineering design [29], control system design [30], portfolio optimization [2] and best polynomial approximation [6], as the solution of approximation problems, systems of nonlinear equations, nonlinear programming problems, multiobjective problems [17], or subproblems in semi-infinite minimax algorithms [31, 41]. Such problems also occur frequently in management science, economics and operations research, where the expected performance of a constructed model is optimized; examples include the staffing and shift scheduling problem in call centers [1, 38], the rostering problem in manufacturing [23] and path problems [44].

2010 Mathematics Subject Classification. Primary: 90C06, 90C30, 90C47, 49K35; Secondary: 65K10.
Key words and phrases. Minimax problem, nondifferentiable optimization, sparsity, partition, substitution, finite difference, secant method.
† Corresponding author.
This research was sponsored by the Programs of National Natural Science Foundation of China (71090404/71090400, 71102070, 11171221, 71271138, 71202065, 71103199, 71371140), Shanghai First-class Academic Discipline Projects (XTKX2012, S1201YLXK), the Innovation Program of Shanghai Municipal Education Commission (14YZ088, 14YZ089), Programs of National Training Foundation of University of Shanghai for Science and Technology (13XGM03, 12XSY10), Program of Doctoral Scientific Research Starting Foundation of University of Shanghai for Science and Technology (1D-10-303-002) and the Innovation Fund Project For Graduate and Undergraduate Student of Shanghai (JWCXSL1302, 201210252056, XJ2013120).

In this paper, we focus on the case in which the Hessians $H^j(x)$ of $f^j(x)$, $j \in I_m$, are often sparse, which allows special techniques to take advantage of the large number of zero elements, and the dimension $n$ is large, say, $10^4$. This may result from finely discretized semi-infinite minimax problems or optimal control problems.

The nondifferentiability of the objective function in (1) is a main challenge in solving minimax problems, as standard unconstrained optimization algorithms cannot be applied directly. Many approaches have been proposed to solve (1). Besides derivative-free optimization algorithms [11, 18], one approach converts Problem (1) into
$$\min v, \quad \text{s.t. } f^j(x) - v \le 0, \ j \in I_m, \tag{2}$$
a constrained differentiable optimization problem. After transforming (1) into the nonlinear programming problem (2), we can solve it by well-established methods such as feasible-directions algorithms [35], sequential quadratic programming type algorithms [9, 13, 14, 24, 45], active set methods [10], interior point methods [22, 26, 42] and conjugate gradient methods in conjunction with exact penalties and smoothing [48]. Another approach reduces Problem (1) to a standard unconstrained smooth optimization problem by using the exponential penalty function
$$F_p(x) = \frac{1}{p} \ln \sum_{j \in I_m} \exp[p f^j(x)] \tag{3}$$
producing a smooth approximation of $F(x)$ [12, 16, 19, 28, 36, 47, 49]. With this method, however, the smoothed problem becomes increasingly ill-conditioned as the approximation becomes more accurate. To control the ill-conditioning, an adaptive precision-parameter adjustment scheme was proposed in [36] to ensure that the smoothing precision parameter is kept small when far from a stationary solution, and is increased as the stationary solution is approached. Using the adaptive precision-parameter adjustment scheme of [36], [37] presents an active-set strategy for smoothing algorithms that tackles (1) with large $m$. Besides the above methods, some authors [32, 33, 34] treated Problem (1) directly as an unconstrained nondifferentiable optimization problem, since $F(x)$ is nondifferentiable. This approach obtains a descent direction $h^k$ at each iteration by minimizing, with respect to $h$, one of the following maximum functions:
$$F_H(x^k, h) = \max_{j \in I_m} \{ f^j(x^k) + h^T g^j(x^k) + (1/2) h^T H^j(x^k) h \} \tag{4}$$
or
$$F_B(x^k, h) = \max_{j \in I_m} \{ f^j(x^k) + h^T g^j(x^k) + (1/2) h^T (B^j)^k h \}, \tag{5}$$
where $g^j(x^k)$ are the gradients of the functions $f^j$ at $x^k$, and $(B^j)^k$, $j \in I_m$, are approximations to $H^j(x^k)$ with the same sparsity as the Hessians. E. Polak et al. [32] presented three descent algorithms which used a nonstandard search direction problem, solved (1) by means of a constrained Newton method, and used the results in [40] to establish quadratic convergence. Since the results in [40] depend on the Implicit Function Theorem, earlier proofs of superlinear convergence require an assumption of linear independence of the active gradients at all local minimizers, which is rather restrictive for large problems. In [33] a novel technique was used, and superlinear convergence of order 3/2 was obtained for a constrained Newton method without the linear independence requirement. S.T. Zhang [51] gave some quasi-Newton methods for solving (1) and proved their global and local superlinear convergence without strict complementarity.

However, there have been only a few proposals for the solution of large sparse minimax problems (see, e.g., [16, 22]). In [22] the authors used a feasible primal interior-point method to exploit the special structure of the minimax problem (1) with a large $m$ and compared it with a smoothing method. In [16] an inexact smoothing method using the function (3) was given to solve (1), and some implementation techniques taking advantage of the sparsity of the Hessians of the functions $f^j(x)$, $j \in I_m$, were considered.

A fundamental challenge in solving this class of problems is that, in the process of minimizing the function (4) or (5), the computation of the Hessians $H^j(x)$ or their approximations $B^j$ usually represents a major task. If the matrices are not available in analytic form, automatic differentiation techniques [8], symbolic differentiation [46] or finite differences [39] may be used to estimate them. In particular, the finite difference method is typically used to form the Hessians by evaluating $m(n + 1)$ gradients at every iteration when solving Problem (1).
When $n \gg m > 1$, the gradient evaluations are time-consuming, and hence saving gradient computations matters even more than for an unconstrained differentiable optimization problem, which needs only $n + 1$ gradient evaluations at every iteration [15, 25, 52]. In this paper, in order to reduce the computation of gradients in forming the sparse Hessians, we use the substitution secant/finite difference (SSFD) algorithm of [52], which was first given for an unconstrained differentiable optimization problem. As in [52], the SSFD method is a combination of a secant method [50], a finite difference method [25] and the substitution method [39], and it exploits the sparsity of the Hessians to save gradient evaluations as effectively as possible when the Hessians are calculated. The difference between the SSFD method in this paper and that in [52] is that, compared with the substitution method, we save at least $m$ gradient evaluations at each iteration here, instead of one gradient evaluation in [52]. Hence more computation is saved at each iteration than in [52], which is the motivation for this paper.

Because Problem (1) is a nondifferentiable optimization problem, it is not known in advance whether the SSFD method is convergent and efficient when applied to Problem (1). Hence we need to prove its global and local convergence and to estimate its rate of convergence. To keep the method robust, we also have to consider the case in which some of the objective functions are nonconvex. We will see in the following sections that the proofs are far more complex than in the smooth case [52], since they must account for the sparse structure of each Hessian $H^j(x)$, $j \in I_m$, at every iteration and for the nondifferentiability of the maximum function. In this paper, to distinguish the given method from the SSFD method of [52], it is called the SSFDMM method.

It uses the information already available at every iteration as efficiently as possible and is proven to be locally q-superlinearly convergent without the conditions of strict complementarity and linear independence, and also to be
globally convergent when the line search strategy is used. To ensure robustness of the SSFDMM method, we give a technique to keep the Hessian approximations positive definite. Our numerical results also show that, for some test problems, the SSFDMM algorithm is competitive, especially when the gradient evaluations are expensive or the dimensions of the Hessians are large.

In the paper, $\|\cdot\|_F$ denotes the Frobenius norm of a matrix, and $\|\cdot\|$ denotes the $l_2$ norm of a vector. We use "$\setminus$" to denote the subtraction of two sets, that is, $A \setminus B = \{w : w \in A, w \notin B\}$. We define $S(y, \delta) = \{x \in \mathbb{R}^n : \|x - y\| < \delta, \ y \in \mathbb{R}^n\}$ and $\bar{S}(y, \delta)$ to be its closure.

In Section 2, we describe the local SSFDMM method for solving the minimax problem and some of its properties. In Section 3, local q-superlinear convergence is proven and the r-convergence order is estimated. In Section 4 we present the global method and the proof of its superlinear convergence. In Section 5, the nonconvex case is handled. Section 6 provides some numerical results and comparisons, and our concluding comments and final observations are given in Section 7.

2. The local SSFDMM algorithm and some of its properties. To specify the sparsity of a given matrix $B^j$, $j \in I_m$, we use $M^j$ to denote the set of index pairs $(l, t)$, where $b^j_{lt}$ is a structural nonzero element of $B^j$, i.e., $M^j = \{(l, t) : b^j_{lt} \ne 0\}$. For a symmetric matrix $B^j$, if $(l, t) \in M^j$, then $(t, l) \in M^j$. Following Coleman and Moré [4, 5], we give some definitions concerning a partition of the columns of a matrix $B^j$:

Definition 2.1. A partition of the columns of $B^j$, $j \in I_m$, is a division of the columns into groups $c_1, c_2, \ldots, c_{p^j}$ such that each column belongs to one and only one group.

Definition 2.2. A partition of the columns such that columns in a given group do not have a nonzero element in the same row position is consistent with the direct determination of $B^j$, $j \in I_m$.
Note that, for convenience, $c_i$, $i = 1, 2, \cdots, p^j$, indicate both the sets of the columns and the sets of the indices of these columns. To study the properties of the SSFDMM algorithm, we make the following hypotheses:

Assumption 2.3. For all $j \in I_m$, the $f^j : \mathbb{R}^n \to \mathbb{R}$ in (1) are twice Lipschitz continuously differentiable, i.e., there exist $0 < \alpha_s^j < \infty$ such that
$$\|(H^j(x) - H^j(y)) e_s\| \le \alpha_s^j \|x - y\|, \quad \forall x, y \in D, \ j \in I_m. \tag{6}$$
Let $\alpha = \max_{j \in I_m} \big( \sum_{s=1}^n (\alpha_s^j)^2 \big)^{1/2}$; then
$$\|H^j(x) - H^j(y)\|_F \le \alpha \|x - y\|, \quad \forall x, y \in D, \tag{7}$$
where $D \subseteq \mathbb{R}^n$ is an open convex set.

Assumption 2.4. There exist constants $0 < u \le U < \infty$ such that for all $x \in \mathbb{R}^n$,
$$u \|h\|^2 \le h^T H^j(x) h \le U \|h\|^2, \quad \forall h \in \mathbb{R}^n, \ \forall j \in I_m. \tag{8}$$

Assumption 2.5. There exist constants $0 < u_1 \le U_1 < \infty$ such that
$$u_1 \|h\|^2 \le h^T B^j h \le U_1 \|h\|^2, \quad \forall h \in \mathbb{R}^n, \ \forall j \in I_m. \tag{9}$$
Since the functions $f^j(\cdot)$, $j \in I_m$, are strictly convex under Assumption 2.4, it follows that $F(\cdot)$ is also strictly convex, and hence has a unique minimizer $x^*$. To establish the local convergence and rate of convergence of the algorithm, we will need the following lemmas.

Lemma 2.6. [34] Suppose that Assumption 2.4 holds and $x^*$ is the unique minimizer of $F(\cdot)$. Then, for all $x \in \mathbb{R}^n$,
$$F(x) - F(x^*) \ge \frac{u}{2} \|x - x^*\|^2. \tag{10}$$

Lemma 2.7. [34] Suppose that Assumption 2.3 holds. Then for any $x, y \in \mathbb{R}^n$,
$$|F(x) - F_H(y, x - y)| \le (\alpha/6) \|x - y\|^3, \tag{11}$$
where $\alpha$ is defined by (7).

Lemma 2.8. Suppose that Assumptions 2.3 and 2.5 are satisfied. Let $\hat{\theta} : \mathbb{R}^n \to \mathbb{R}$ and $\hat{\phi} : \mathbb{R}^n \to \mathbb{R}^n$ be defined by
$$\hat{\theta}(x) = \min_{h \in \mathbb{R}^n} F_B(x, h) - F(x), \tag{12}$$
and
$$\hat{\phi}(x) = \arg\min_{h \in \mathbb{R}^n} F_B(x, h) - F(x). \tag{13}$$
Then,
(a) for all $x \in \mathbb{R}^n$, $dF(x, \hat{\phi}(x)) \le \hat{\theta}(x) \le 0$, where $dF(x, \hat{\phi}(x))$, the directional derivative of the function $F(\cdot)$ at a point $x$ in the direction $\hat{\phi}(x)$, is given by the formula
$$dF(x, \hat{\phi}(x)) = \max_{j \in \hat{I}_m(x)} (g^j(x))^T \hat{\phi}(x),$$
and $\hat{I}_m(x) = \{j \in I_m : f^j(x) = F(x)\}$;
(b) $x^*$ is a solution of (1) if and only if both $\hat{\theta}(x^*) = 0$ and $\hat{\phi}(x^*) = 0$;
(c) for all $x \in \mathbb{R}^n$ such that $x \ne x^*$, the unique minimizer of (1), $\hat{\theta}(x) < 0$.
Proof. (a) Let $w(h) = F_B(x, h) - F(x)$. Since $w(0) = 0$ and $\hat{\theta}(x) \le w(0)$, we obtain that $\hat{\theta}(x) \le 0$. For all $h \in \hat{\phi}(x)$, from (5), (12) and (13), we obtain that
$$\hat{\theta}(x) = \max_{j \in I_m} \{ f^j(x) - F(x) + h^T g^j(x) + (1/2) h^T B^j h \}.$$
Since $f^j(x) - F(x) = 0$ for every $j \in \hat{I}_m(x)$, and from Assumption 2.5, one has that
$$\hat{\theta}(x) \ge \max_{j \in \hat{I}_m(x)} \Big\{ h^T g^j(x) + \frac{1}{2} h^T B^j h \Big\} \ge \max_{j \in \hat{I}_m(x)} h^T g^j(x) = dF(x, h).$$
(b) On the one hand, since $x^*$ is a solution of (1), from Theorem 2.1.1 of [34] and (a) we have that
$$0 \le dF(x^*, \hat{\phi}(x^*)) \le \hat{\theta}(x^*) \le 0.$$
It means that $\hat{\theta}(x^*) = 0$ and, hence, that $\hat{\phi}(x^*) = 0$. On the other hand, let $\hat{x} \ne x^*$, but $\hat{\theta}(\hat{x}) = 0$ and $\hat{\phi}(\hat{x}) = 0$; then for all $h \in \mathbb{R}^n$,
$$\max_{j \in I_m} \Big\{ f^j(\hat{x}) - F(\hat{x}) + h^T g^j(\hat{x}) + \frac{1}{2} h^T B^j h \Big\} \ge 0.$$
Consequently, for some $\delta > 0$ and any $h \in S(0, \delta)$, we have that
$$\max_{j \in \hat{I}_m(\hat{x})} \Big\{ h^T g^j(\hat{x}) + \frac{1}{2} h^T B^j h \Big\} \ge 0.$$
From (9), we have
$$\frac{1}{2} U_1 \|h\|^2 + \max_{j \in \hat{I}_m(\hat{x})} h^T g^j(\hat{x}) \ge 0.$$
Therefore, for all $\lambda \in [0, 1]$ and $h \in S(0, \delta)$, i.e., $\lambda h \in S(0, \delta)$, we have
$$\frac{1}{2} U_1 \lambda^2 \|h\|^2 + \lambda \max_{j \in \hat{I}_m(\hat{x})} h^T g^j(\hat{x}) \ge 0,$$
from which we may conclude that $dF(\hat{x}, h) \ge 0$, $\forall h \in S(0, \delta)$. Since $dF(\hat{x}, \cdot)$ is positive homogeneous, it follows that $dF(\hat{x}, h) \ge 0$, $\forall h \in \mathbb{R}^n$, and hence that $\hat{x}$ is a minimizer of $F(x)$ according to Corollary 2.1.2 in [34], which contradicts the uniqueness of the solution of (1).
(c) It is clear that for all $x \in \mathbb{R}^n$ we have $\hat{\theta}(x) \le 0$, and by the uniqueness of the solution of (1), we obtain that for all $x \ne x^*$, $\hat{\theta}(x) < 0$.

Now the local SSFDMM algorithm can be stated as follows:

Algorithm 1. (the local SSFDMM method) Given a consistent partition of the columns of the lower triangular part of each Hessian, which divides the set $\{1, 2, \cdots, n\}$ into $p^j$ subsets $c_1^j, c_2^j, \cdots, c_{p^j}^j$, $j \in I_m$, and given $x^{-1}, x^0 \in \mathbb{R}^n$ such that $s_i^0 \equiv x_i^0 - x_i^{-1} \ne 0$, $i = 1, 2, \cdots, n$, at each step $k \ge 0$:
(1) Set
$$(q_i^j)^k = \sum_{l=1}^{i} \sum_{t \in c_l^j} s_t^k e_t, \quad i = 1, 2, \cdots, p^j - 1,$$
where $s_t^k = x_t^k - x_t^{k-1}$, $t = 1, 2, \ldots, n$.
(2) Compute $g^j(x^k - (q_i^j)^k)$, $i = 0, 1, \cdots, p^j - 1$, and set
$$(y_{i+1}^j)^k = g^j(x^k - (q_i^j)^k) - g^j(x^k - (q_{i+1}^j)^k), \tag{14}$$
where $(q_0^j)^k = 0$ and $g^j(x^k - (q_{p^j}^j)^k) = g^j(x^{k-1})$.
(3) If $(l, t) \in M^j$, $l \ge t$, $t \in c_i^j$, $i = 1, 2, \cdots, p^j$, and $s_t^k \ne 0$, then set
$$(b_{lt}^j)^k = \frac{e_l^T (y_i^j)^k}{s_t^k} - \sum_{w > l, \ w \in c_i^j} (b_{wl}^j)^k \frac{s_w^k}{s_t^k}, \tag{15}$$
and set $(b_{tl}^j)^k = (b_{lt}^j)^k$; otherwise set
$$(b_{lt}^j)^k = (b_{tl}^j)^k = (b_{lt}^j)^{k-1}, \quad t \in c_i^j, \ i = 1, 2, \cdots, p^j. \tag{16}$$
(4) Compute
$$x^{k+1} = \arg\min_{x \in \mathbb{R}^n} F_B(x^k, x - x^k). \tag{17}$$
(5) Replace $k$ by $k + 1$.

Note that by (14), to get $(B^j)^k$, $j \in I_m$, only $p^j$ gradient evaluations are needed at every iteration, instead of the $p^j + 1$ gradient evaluations computed in the substitution method of [39], since $g^j(x^{k-1})$ $(= g^j(x^k - (q_{p^j}^j)^k))$ has already been calculated at the previous iteration. Thus, in total, the number of gradient evaluations at each iteration is $m$ less than that for the substitution method. Obviously, if
$m = 1$ in (1), Problem (1) becomes an unconstrained differentiable optimization problem and hence the SSFDMM method reduces to the SSFD method of [52]. Hence we regard this work as an improvement of [39] and a development of [52].

In practice, if $|s_i^k|$ in step (3) is too close to zero, the cancellation errors become significant. Therefore, there should be a lower bound $\theta_i^k > 0$ for $|s_i^k|$. We suggest choosing
$$\theta_i^k = \sqrt{\Upsilon} \cdot \max\{\mathrm{typ}x_i, |x_i^k|\},$$
where $\Upsilon$ is the machine precision and $\mathrm{typ}x_i$ is a typical value of $x_i$ given by the user. Also, for the first step, one should choose $|s_i^0| \ge \theta_i^0$, $i = 1, 2, \cdots, n$, instead of $|s_i^0| \ne 0$, $i = 1, 2, \cdots, n$. We suggest choosing
$$x_i^{-1} = x_i^0 + \sqrt{\Upsilon} \cdot x_i^0, \quad x_i^0 \ne 0, \ i = 1, 2, \cdots, n.$$

Now, suppose that we have finished the $(k - 1)$th step of the iteration; then for $j \in I_m$ we have obtained $x^{k-1}$, $g^j(x^{k-1})$, $(B^j)^{k-1}$ and $x^k$. Let
$$(d_i^j)^k = \sum_{t \in c_i^j} s_t^k e_t, \quad i = 1, 2, \cdots, p^j,$$
$$(q_i^j)^k = \sum_{t=1}^{i} (d_t^j)^k, \quad i = 1, 2, \cdots, p^j, \quad (q_0^j)^k = 0,$$
and
$$(J_i^j)^k = \int_0^1 H^j(x^k - (q_i^j)^k + t (d_i^j)^k) \, dt, \quad i = 1, 2, \cdots, p^j. \tag{18}$$
Then, if $s_t^k \ne 0$, one has that
$$(J_i^j)^k e_t = (B^j)^k e_t, \tag{19}$$
where $t \in c_i^j$, $i = 1, 2, \cdots, p^j$.
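Steps (1)-(3) of Algorithm 1 can be illustrated for a single function $f^j$ with the following minimal Python sketch. It is not the authors' implementation: the function name `ssfd_hessian`, its arguments, and the tridiagonal quadratic test problem are assumptions made for illustration, and the sketch assumes that every component $s_t^k$ is nonzero, so the fallback (16) is never needed. For a quadratic function the gradient differences satisfy $(y_i^j)^k = H^j (d_i^j)^k$ exactly, so the sparse Hessian is recovered up to rounding with only $p^j = 2$ new gradient evaluations.

```python
import numpy as np

def ssfd_hessian(grad, x, x_prev, g_prev, groups, pattern):
    """Sketch of steps (1)-(3) of Algorithm 1 for one function (hypothetical helper).

    grad    : callable returning the gradient of the function
    x       : current iterate x^k;  x_prev : previous iterate x^{k-1}
    g_prev  : gradient at x_prev, reused from the previous iteration
    groups  : consistent partition of the columns of the lower triangle
    pattern : set of (l, t), l >= t, structural nonzeros of the Hessian
    Assumes every component of s = x - x_prev is nonzero.
    """
    n = x.size
    s = x - x_prev
    # Gradients at x - q_i, i = 0..p, with q_0 = 0 and x - q_p = x_prev.
    grads, q = [grad(x)], np.zeros(n)
    for c in groups[:-1]:
        q[c] = s[c]
        grads.append(grad(x - q))
    grads.append(g_prev)  # no new evaluation for the last point
    B = np.zeros((n, n))
    # Rows are processed bottom-up so that b_{wl}, w > l, is already known.
    for l in range(n - 1, -1, -1):
        for i, c in enumerate(groups):
            y = grads[i] - grads[i + 1]  # gradient difference for this group
            for t in c:
                if t <= l and (l, t) in pattern:
                    corr = sum(B[w, l] * s[w] for w in c if w > l)
                    B[l, t] = B[t, l] = (y[l] - corr) / s[t]
    return B

# Illustrative test problem: quadratic with a tridiagonal Hessian.
n = 6
H = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.arange(1.0, n + 1)
calls = []

def grad(z):
    calls.append(1)  # count gradient evaluations
    return H @ z + b

x_prev = np.linspace(-1.0, 1.0, n)
x = x_prev + np.array([0.3, -0.2, 0.5, 0.1, -0.4, 0.25])
g_prev = grad(x_prev)
calls.clear()
groups = [[0, 2, 4], [1, 3, 5]]  # columns in a group share no row of the lower triangle
pattern = {(l, l) for l in range(n)} | {(l, l - 1) for l in range(1, n)}
B = ssfd_hessian(grad, x, x_prev, g_prev, groups, pattern)
```

In this example `len(calls)` counts the gradient evaluations made inside the estimate; a plain substitution scheme that does not reuse `g_prev` would need one more.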
Lemma 2.9. Suppose that Assumption 2.5 is satisfied. Let the sequence $\{x^k\}$ be generated by Algorithm 1 and $x^*$ be the solution of (1). Then $x^k \to x^*$ if and only if $\hat{\phi}(x^k) \to 0$.

Proof. From (12) and (13), we have that
$$\hat{\theta}(x^k) = \max_{j \in I_m} \Big\{ f^j(x^k) + (g^j(x^k))^T \hat{\phi}(x^k) + \frac{1}{2} (\hat{\phi}(x^k))^T (B^j)^k \hat{\phi}(x^k) \Big\} - F(x^k) \tag{20}$$
$$\le \max_{j \in I_m} \Big\{ f^j(x^k) + (g^j(x^k))^T \hat{\phi}(x^k) + \frac{1}{2} U_1 \|\hat{\phi}(x^k)\|^2 \Big\} - F(x^k). \tag{21}$$
If $\|\hat{\phi}(x^k)\| \to 0$, then by (20), we obtain that $\hat{\theta}(x^k) \to 0$. Let
$$\theta_{U_1}(x) = \min_{h \in \mathbb{R}^n} \max_{j \in I_m} \Big\{ f^j(x) + h^T g^j(x) + \frac{1}{2} U_1 \|h\|^2 \Big\} - F(x). \tag{22}$$
From Theorem 2.1.6 of [34], we obtain that $\theta_{U_1}(x)$ is continuous and $\theta_{U_1}(x) \le 0$, and that $\theta_{U_1}(x) = 0$ if and only if $x$ is a local minimizer of $F(x)$. Therefore, by Assumption 2.5 and (21), we have that
$$\hat{\theta}(x^k) \le \theta_{U_1}(x^k) \le 0.$$
So from $\hat{\theta}(x^k) \to 0$, it follows that $\theta_{U_1}(x^k) \to 0$ and, by the uniqueness of the solution of (1), that $x^k \to x^*$. On the other hand, if $x^k \to x^*$, it follows clearly from (17) that $\hat{\phi}(x^k) \to 0$.
Lemma 2.10. [52] For all $j \in I_m$, assume that $H^j(x)$ satisfy Lipschitz condition (7), and that $\{x^t\}_{t=0}^k$ and $\{(B^j)^t\}_{t=0}^k$ are generated by Algorithm 1. Suppose that $\{x^t - (q_i^j)^t, \ i = 0, 1, \cdots, p^j\}_{t=0}^k \subset D$. If $s_i^t = 0$, $t = 1, 2, \cdots, k$, appears consecutively in at most $w$ steps for any specific $1 \le i \le n$, then for $k \ge w - 1$, one has
$$\|H^j(x^k) - (B^j)^k\|_F \le \alpha \sum_{t=k-w}^{k} \|x^t - x^{t-1}\|. \tag{23}$$
3. The local convergence of the algorithm. Our proof of convergence for Algorithm 1 will require the following technical results.

Lemma 3.1. Let $0 < \sigma < \frac{u}{14}$ and $\eta = \frac{u - 14\sigma}{8\alpha}$. Suppose that $t$ and $s$ satisfy $0 \le t, s \le \eta$ and
$$t^2 \le \frac{\alpha}{3u} [(s + t)^3 + s^3] + \frac{\sigma}{u} [(s + t)^2 + s^2]. \tag{24}$$
Then
$$t \le \frac{1}{2} s. \tag{25}$$

Proof. From (24), we have that
$$t^2 \le \frac{\alpha}{3u} (2s^3 + 3s^2 t + 3st^2 + t^3) + \frac{\sigma}{u} (2(s^2 + t^2) + s^2) \le \frac{\alpha}{3u} (2\eta s^2 + 3\eta s^2 + 3\eta t^2 + \eta t^2) + \frac{\sigma}{u} (3s^2 + 2t^2),$$
i.e.,
$$(3u - 4\alpha\eta - 6\sigma) t^2 \le (5\alpha\eta + 9\sigma) s^2. \tag{26}$$
Thus, (25) follows from (26) because $t \ge 0$, $s \ge 0$ and $3u - 4\alpha\eta - 6\sigma = 4(5\alpha\eta + 9\sigma) > 0$.

To prove the superlinear convergence of Algorithm 1, we first prove its linear convergence.

Theorem 3.2. Assume that $H^j(x)$, $j \in I_m$, satisfy Lipschitz condition (7). Let $\{x^k\}$ be generated by Algorithm 1 without any global strategy. Then there exist $\varepsilon > 0$ and $\delta > 0$ such that if $x^{-1} \in D$ and $x^0 \in D$ satisfy $\|x^0 - x^*\| \le \varepsilon$, $\|x^{-1} - x^0\| \le \delta$, then $\{x^k\}$ is well defined and converges q-linearly to $x^*$.

Proof. Since $x^* \in D$, we can choose $\varepsilon > 0$ and $\delta > 0$ so that $S(x^*, 2\varepsilon) \subset D$ and
$$0 < \sigma \equiv 2\alpha(\varepsilon + \delta) < \frac{u}{14}, \quad \beta\sigma < \frac{1}{3}, \quad \eta = \frac{u - 14\sigma}{8\alpha}, \quad \frac{\eta}{4} < \varepsilon < \frac{\eta}{2}, \quad 5\varepsilon < \delta,$$
where $u$ is as in Assumption 2.4 and $\beta > 0$ satisfies $\|(H^j(x^*))^{-1}\|_F < \beta$ for all $j \in I_m$. Let $\Omega_1 = \{i \in \{1, 2, \cdots, n\} : s_i^k \ne 0\}$ and $\Omega_2 = \{1, 2, \cdots, n\} \setminus \Omega_1$. From (16) and (19), we have
$$(B^j)^k = (B^j)^{k-1} \sum_{t \in \Omega_2} e_t e_t^T + \sum_{i=1}^{p^j} \sum_{t \in c_i^j \cap \Omega_1} (J_i^j)^k e_t e_t^T.$$
Therefore, one obtains that
$$\|H^j(x^*) - (B^j)^k\|_F^2 = \sum_{i \in \Omega_1} \|(H^j(x^*) - (B^j)^k) e_i\|^2 + \sum_{i \in \Omega_2} \|(H^j(x^*) - (B^j)^k) e_i\|^2$$
$$= \sum_{i=1}^{p^j} \sum_{l \in c_i^j \cap \Omega_1} \|(H^j(x^*) - (J_i^j)^k) e_l\|^2 + \sum_{i \in \Omega_2} \|(H^j(x^*) - (B^j)^{k-1}) e_i\|^2$$
$$= \sum_{i=1}^{p^j} \sum_{l \in c_i^j \cap \Omega_1} \Big\| \int_0^1 \big[ H^j(x^*) - H^j(x^k - (q_i^j)^k + t((q_i^j)^k - (q_{i-1}^j)^k)) \big] e_l \, dt \Big\|^2 + \sum_{i \in \Omega_2} \|(H^j(x^*) - (B^j)^{k-1}) e_i\|^2$$
$$\le \alpha^2 (\|x^* - x^k\| + \|x^k - x^{k-1}\|)^2 + \|H^j(x^*) - (B^j)^{k-1}\|_F^2$$
$$\le \alpha^2 (2\|x^* - x^k\| + \|x^* - x^{k-1}\|)^2 + \|H^j(x^*) - (B^j)^{k-1}\|_F^2.$$
Hence, one has that
$$\|H^j(x^*) - (B^j)^k\|_F \le \|H^j(x^*) - (B^j)^{k-1}\|_F + 3\alpha\sigma(x^k, x^{k-1}), \tag{27}$$
where $\sigma(x^k, x^{k-1}) = \max\{\|x^{k-1} - x^*\|, \|x^k - x^*\|\}$. Notice that by Lemma 2.6 in [34] and Lipschitz condition (7), one has
$$\|H^j(x^*) - (B^j)^0\|_F \le \|H^j(x^*) - H^j(x^0)\|_F + \|H^j(x^0) - (B^j)^0\|_F \le \alpha\|x^* - x^0\| + \alpha\|x^0 - x^{-1}\| \le \alpha(\varepsilon + \delta).$$
We next show, by induction on $k$, that
$$\|(B^j)^k - H^j(x^*)\|_F \le (2 - 2^{-k}) \alpha (\varepsilon + \delta), \quad j \in I_m, \tag{28}$$
and
$$\|x^{k+1} - x^*\| \le (1/2) \|x^k - x^*\|. \tag{29}$$
For $k = 0$, (28) holds obviously. The proof of (29) for $k = 0$ is similar to that of the following general case, so we omit it. Now suppose that (28) and (29) hold for $k = 0, 1, \cdots, i - 1$. We show that they also hold for $k = i$. By (27), (28) and (29), one has that
$$\|(B^j)^i - H^j(x^*)\|_F \le \|(B^j)^{i-1} - H^j(x^*)\|_F + 3\alpha\|x^{i-1} - x^*\| \le (2 - 2^{-(i-1)}) \alpha (\varepsilon + \delta) + 3\alpha\|x^{i-1} - x^*\|. \tag{30}$$
From (29) and $\|x^0 - x^*\| \le \varepsilon$, we have that
$$\|x^{i-1} - x^*\| \le 2^{-(i-1)} \|x^0 - x^*\| \le 2^{-(i-1)} \varepsilon. \tag{31}$$
Hence, since $5\varepsilon < \delta$, it follows that
$$\|(B^j)^i - H^j(x^*)\|_F \le (2 - 2^{-(i-1)}) \alpha (\varepsilon + \delta) + 3 \cdot 2^{-(i-1)} \alpha\varepsilon \le (2 - 2^{-(i-1)} + 2^{-i}) \alpha (\varepsilon + \delta) = (2 - 2^{-i}) \alpha (\varepsilon + \delta), \tag{32}$$
which shows that (28) holds for all $k$. Thus, from (32) and Assumption 2.3, for $j \in I_m$,
$$\|(H^j(x^*))^{-1} ((B^j)^i - H^j(x^*))\|_F \le \|(H^j(x^*))^{-1}\|_F \|(B^j)^i - H^j(x^*)\|_F \le \beta (2 - 2^{-i}) \alpha (\varepsilon + \delta) \le 2\beta\alpha(\varepsilon + \delta) < 1/3.$$
Hence, from the Von Neumann Theorem [50], there exists $((B^j)^i)^{-1}$, $j \in I_m$, such that
$$\|((B^j)^i)^{-1}\|_F \le \frac{\|(H^j(x^*))^{-1}\|_F}{1 - \|(H^j(x^*))^{-1} ((B^j)^i - H^j(x^*))\|_F} \le \frac{\beta}{1 - 1/3} = \frac{3\beta}{2},$$
which shows that $x^{i+1}$ is well defined. Making use of (10) and (17), we find that, for $k = 0, 1, 2, \cdots$,
$$\|x^{k+1} - x^*\|^2 \le \frac{2}{u} [F(x^{k+1}) - F(x^*)]$$
$$\le \frac{2}{u} \big[ F(x^{k+1}) - F_H(x^k, x^{k+1} - x^k) + F_H(x^k, x^* - x^k) - F(x^*) + F_H(x^k, x^{k+1} - x^k) - F_B(x^k, x^{k+1} - x^k) + F_B(x^k, x^* - x^k) - F_H(x^k, x^* - x^k) \big],$$
because $F_B(x^k, x^{k+1} - x^k) \le F_B(x^k, x^* - x^k)$ by construction of $x^{k+1}$ from (17). Thus, by (11), we obtain that
$$\|x^{k+1} - x^*\|^2 \le \frac{2}{u} \Big[ \frac{\alpha}{6} \|x^{k+1} - x^k\|^3 + \frac{\alpha}{6} \|x^k - x^*\|^3 + F_H(x^k, x^{k+1} - x^k) - F_B(x^k, x^{k+1} - x^k) + F_B(x^k, x^* - x^k) - F_H(x^k, x^* - x^k) \Big]. \tag{33}$$
Since, by (4), (5), (7) and (28),
$$F_H(x^k, x - x^k) - F_B(x^k, x - x^k) \le \max_{j \in I_m} \frac{1}{2} (x - x^k)^T (H^j(x^k) - (B^j)^k) (x - x^k) \tag{34}$$
$$\le \frac{1}{2} \max_{j \in I_m} \{ \|x - x^k\|^2 \|H^j(x^k) - (B^j)^k\|_F \} \tag{35}$$
$$\le \frac{1}{2} \|x - x^k\|^2 \max_{j \in I_m} \{ \|H^j(x^k) - H^j(x^*)\|_F + \|H^j(x^*) - (B^j)^k\|_F \}$$
$$\le \frac{\alpha}{2} [\|x^k - x^*\| + (2 - 2^{-k})(\varepsilon + \delta)] \|x - x^k\|^2, \tag{36}$$
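we have, for $k = i$, that
$$\|x^{i+1} - x^*\|^2 \le \frac{2}{u} \Big\{ \frac{\alpha}{6} \|x^{i+1} - x^i\|^3 + \frac{\alpha}{6} \|x^i - x^*\|^3 + \frac{\alpha}{2} [\|x^i - x^*\| + (2 - 2^{-i})(\varepsilon + \delta)] (\|x^{i+1} - x^i\|^2 + \|x^i - x^*\|^2) \Big\}$$
$$\le \frac{2}{u} \Big\{ \frac{\alpha}{6} \|x^{i+1} - x^i\|^3 + \frac{\alpha}{6} \|x^i - x^*\|^3 + \frac{\alpha}{2} [2^{-i}\varepsilon + (2 - 2^{-i})(\varepsilon + \delta)] (\|x^{i+1} - x^i\|^2 + \|x^i - x^*\|^2) \Big\}$$
$$\le \frac{2}{u} \Big\{ \frac{\alpha}{6} \|x^{i+1} - x^i\|^3 + \frac{\alpha}{6} \|x^i - x^*\|^3 + \alpha(\varepsilon + \delta) (\|x^{i+1} - x^i\|^2 + \|x^i - x^*\|^2) \Big\}$$
$$\le \frac{\alpha}{3u} \big[ (\|x^{i+1} - x^*\| + \|x^i - x^*\|)^3 + \|x^i - x^*\|^3 \big] + \frac{\sigma}{u} \big[ (\|x^{i+1} - x^*\| + \|x^i - x^*\|)^2 + \|x^i - x^*\|^2 \big].$$
From Lemma 2.9, there exists an $\epsilon_1 > 0$ such that if $\|x^i - x^*\| < \epsilon_1$, then
$$\|x^{i+1} - x^i\| = \|\hat{\phi}(x^i)\| < \eta/4.$$
Let $\epsilon = \min(\varepsilon, \epsilon_1)$. If $\|x^i - x^*\| < \epsilon$, then
$$\|x^{i+1} - x^*\| \le \|x^{i+1} - x^i\| + \|x^i - x^*\| < \eta/4 + \epsilon < \eta.$$
Thus, for $k = i$, (29) follows from Lemma 3.1. This shows that (29) holds for all $k$. By (29), one has that
$$\|x^{i+1} - (q_l^j)^{i+1} - x^*\| \le \|x^{i+1} - x^*\| + \|(q_l^j)^{i+1}\| \le \|x^{i+1} - x^*\| + \|x^{i+1} - x^i\| \le \varepsilon + \eta/4 < 2\varepsilon.$$
Thus, $\{x^k - (q_l^j)^k\}_{k=1}^{i+1} \subset S(x^*, 2\varepsilon) \subset D$, $l = 1, 2, \cdots, p^j$, $j \in I_m$. Hence, it follows from (28) and (29) that $\{x^k\}$ converges to $x^*$ at least q-linearly.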
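The induction step above rests on Lemma 3.1, whose proof hinges on the identity $3u - 4\alpha\eta - 6\sigma = 4(5\alpha\eta + 9\sigma)$ for $\eta = (u - 14\sigma)/(8\alpha)$. The following Python sketch is a numerical sanity check under assumed sample constants ($u = 1$, $\alpha = 2$, $\sigma = 0.05$ are illustrative values, not taken from the paper): it verifies the identity and tests, on random pairs $(s, t) \in [0, \eta]^2$ satisfying (24), that $t \le s/2$. It supplements the proof and does not replace it.

```python
import numpy as np

def lemma31_eta(u, alpha, sigma):
    """Return eta = (u - 14*sigma) / (8*alpha), checking 0 < sigma < u/14."""
    if not 0.0 < sigma < u / 14.0:
        raise ValueError("Lemma 3.1 requires 0 < sigma < u/14")
    return (u - 14.0 * sigma) / (8.0 * alpha)

def check_lemma31(u, alpha, sigma, samples=20000, seed=0):
    """Sample (s, t) in [0, eta]^2; among pairs satisfying (24), test t <= s/2."""
    eta = lemma31_eta(u, alpha, sigma)
    rng = np.random.default_rng(seed)
    s = rng.uniform(0.0, eta, samples)
    t = rng.uniform(0.0, eta, samples)
    # Right-hand side of inequality (24).
    rhs = (alpha / (3.0 * u)) * ((s + t) ** 3 + s ** 3) \
        + (sigma / u) * ((s + t) ** 2 + s ** 2)
    holds = t ** 2 <= rhs
    violated = np.any(t[holds] > 0.5 * s[holds] + 1e-12)
    return int(holds.sum()), not bool(violated)

# Illustrative constants (assumptions, not values from the paper).
u, alpha, sigma = 1.0, 2.0, 0.05
eta = lemma31_eta(u, alpha, sigma)
lhs_identity = 3.0 * u - 4.0 * alpha * eta - 6.0 * sigma
rhs_identity = 4.0 * (5.0 * alpha * eta + 9.0 * sigma)
n_hypothesis, conclusion_ok = check_lemma31(u, alpha, sigma)
```

Here `n_hypothesis` is the number of sampled pairs for which (24) holds, and `conclusion_ok` reports whether all of them satisfy (25).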
Now we prove that Algorithm 1 is superlinearly convergent under the same conditions as Theorem 3.2.

Theorem 3.3. Assume that $x^{-1}$, $x^0$, and $\{x^k\}$ satisfy the hypotheses of Theorem 3.2. Then the convergence is q-superlinear.

Proof. If, for all $1 \le i \le n$, $s_i^k = 0$ appears consecutively in at most $w$ steps, then it follows from (7), (23), (29), (33) and (35) that
$$\|x^{k+1} - x^*\|^2 \le \frac{2}{u} \Big[ \frac{\alpha}{6} \big[ (\|x^{k+1} - x^*\| + \|x^k - x^*\|)^3 + \|x^k - x^*\|^3 \big] + \frac{1}{2} \|H^j(x^k) - (B^j)^k\| (\|x^{k+1} - x^k\|^2 + \|x^* - x^k\|^2) \Big]$$
$$\le \frac{2}{u} \Big[ \frac{35\alpha}{48} \|x^k - x^*\|^3 + \frac{\alpha}{2} \Big( \sum_{i=k-w}^{k} \|x^i - x^{i-1}\| \Big) \frac{13}{4} \|x^k - x^*\|^2 \Big]$$
$$\le \frac{2}{u} \Big[ \frac{35\alpha}{48} \|x^k - x^*\| + \frac{13\alpha(w + 1)}{8} \|x^{k-w-1} - x^*\| \Big] \|x^k - x^*\|^2. \tag{37}$$
From Theorem 3.2, we know that $\lim_{k \to \infty} \|x^k - x^*\| = 0$. Hence we have that
$$\lim_{k \to \infty} \frac{\|x^{k+1} - x^*\|}{\|x^k - x^*\|} = 0,$$
i.e., the convergence is superlinear. Otherwise, there exist at least one integer $i \in \{1, 2, \cdots, n\}$ and one integer $k$ such that $s_i^{k+l} = 0$ for $l = 1, 2, \cdots, w + 1$. Let
$$A_2 = \{i \in \{1, 2, \cdots, n\} : \text{for any } k > 0, \text{ there exists at least one integer } w > k \text{ such that } s_i^w \ne 0\},$$
and let $A_1 = \{1, 2, \cdots, n\} \setminus A_2$. Then
$$(B^j)^k - H^j(x^*) = \sum_{i \in A_1} ((B^j)^k - H^j(x^*)) e_i e_i^T + \sum_{i \in A_2} ((B^j)^k - H^j(x^*)) e_i e_i^T.$$
From the definition of $A_1$, there exists a large integer $k_0$ such that $s_i^{k+1} = 0$ for all $i \in A_1$ and $k > k_0$. Therefore, one has that
$$\sum_{i \in A_1} ((B^j)^k - H^j(x^*)) e_i e_i^T (x^{k+1} - x^k) = 0, \tag{38}$$
for $k > k_0$. Furthermore, we have that
$$\sum_{i \in A_1} ((B^j)^k - H^j(x^k)) e_i e_i^T (x^* - x^k) = 0, \tag{39}$$
for $k > k_0$. Now we show that
$$\lim_{k \to \infty} \Big\| \sum_{i \in A_2} ((B^j)^k - H^j(x^*)) e_i e_i^T \Big\|_F = 0. \tag{40}$$
In fact, $\lim_{k \to \infty} \|x^k - x^*\| = 0$ implies that, given $\varepsilon > 0$, there exists an integer $K$ such that
$$\|x^k - x^*\| < \frac{\varepsilon}{3\alpha}, \quad \forall k > K.$$
By the definition of $A_2$, there exists an integer $K_1$, which depends on $K$, such that for every $i \in A_2$ there exists at least one integer $0 < t < K_1$ such that $s_i^{K+t} \ne 0$. Let $\bar{K} = K + K_1$. For $k > \bar{K}$ and $i \in A_2$, define
$$t(k, i) = \min\{t : s_i^{k-t} \ne 0\}.$$
Then $k - t(k, i) > \bar{K} - K_1 = K$. Let $i \in c_l^j$, $1 \le l \le p^j$. Then we have that
$$(B^j)^k e_i = (B^j)^{k-t(k,i)+1} e_i = (J_l^j)^{k-t(k,i)+1} e_i.$$
Thus, by Lipschitz condition (6), one has
$$\|((B^j)^k - H^j(x^*)) e_i\|^2 = \|((J_l^j)^{k-t(k,i)+1} - H^j(x^*)) e_i\|^2$$
$$= \Big\| \int_0^1 \big( H^j(x^{k-t(k,i)+1} - (q_l^j)^{k-t(k,i)+1} + t((q_l^j)^{k-t(k,i)+1} - (q_{l-1}^j)^{k-t(k,i)+1})) - H^j(x^*) \big) e_i \, dt \Big\|^2$$
$$\le \Big( \int_0^1 \alpha_i^j \|x^{k-t(k,i)+1} - (q_l^j)^{k-t(k,i)+1} + t((q_l^j)^{k-t(k,i)+1} - (q_{l-1}^j)^{k-t(k,i)+1}) - x^*\| \, dt \Big)^2$$
$$\le (\alpha_i^j)^2 \Big( \|x^{k-t(k,i)+1} - x^*\| + \frac{1}{2} \|(q_l^j)^{k-t(k,i)+1}\| + \frac{1}{2} \|(q_{l-1}^j)^{k-t(k,i)+1}\| \Big)^2$$
$$\le (\alpha_i^j)^2 (\|x^{k-t(k,i)+1} - x^*\| + \|x^{k-t(k,i)+1} - x^{k-t(k,i)}\|)^2$$
$$\le (\alpha_i^j)^2 (2\|x^{k-t(k,i)+1} - x^*\| + \|x^{k-t(k,i)} - x^*\|)^2$$
$$< (\alpha_i^j)^2 \Big( 2 \frac{\varepsilon}{3\alpha} + \frac{\varepsilon}{3\alpha} \Big)^2 = (\alpha_i^j)^2 \frac{\varepsilon^2}{\alpha^2}.$$
Then
$$\Big\| \sum_{i \in A_2} ((B^j)^k - H^j(x^*)) e_i e_i^T \Big\|_F^2 = \sum_{i \in A_2} \|((B^j)^k - H^j(x^*)) e_i\|^2$$
0 and $0 < \varepsilon < 1$ such that if $k \ge k_1$, then $\varepsilon_k \le \varepsilon < 1$, and
$$\varepsilon_{k_1+w+2}^2 \le \varepsilon_{k_1+w+1}^2 \sum_{i=0}^{w+1} \frac{1}{w+2} \varepsilon_{k_1+w-i+1} = \varepsilon_{k_1+w+1}^2 \frac{1}{w+2} \sum_{i=0}^{w+1} \varepsilon_{k_1+i} \le \varepsilon^3,$$
$$\varepsilon_{k_1+w+3}^2 \le \varepsilon_{k_1+w+2}^2 \sum_{i=0}^{w+1} \frac{1}{w+2} \varepsilon_{k_1+1+i} \le \varepsilon^4,$$
$$\cdots.$$
By induction on $i$, it follows easily from the above that
$$\varepsilon_{k_1+i} \le \varepsilon^{\mu_i}, \quad i = 0, 1, \cdots, \tag{45}$$
where
$$2\mu_{i+1} = 2\mu_i + \mu_{i-w-1}, \quad i = w + 1, w + 2, \cdots, \qquad \mu_0 = \mu_1 = \cdots = \mu_{w+1} = 1.$$
Since the largest positive root of (42) is $\tau > 1$, we have that
$$\mu_i \ge \gamma\tau^i, \quad i = 0, 1, \cdots, \tag{46}$$
where $\gamma = \tau^{-w-1}$. In fact, since $\tau > 1$, (46) must hold for $i = 0, 1, \cdots, w + 1$. Now suppose that (46) holds for all indices up to some $i \ge w + 1$; we show that it also holds for $i + 1$. Note that $\tau$ is the unique positive root of (42). One has by (42) that
$$\mu_{i+1} = \mu_i + \frac{1}{2} \mu_{i-w-1} \ge \gamma\tau^i + \frac{1}{2} \gamma\tau^{i-w-1} = \gamma\tau^{i+1} \Big( \tau^{-1} + \frac{1}{2} \tau^{-w-2} \Big) = \gamma\tau^{i+1},$$
which completes the induction step. Thus from (45), (46) and the definition of $\varepsilon_k$, it follows that
$$\|x^{k_1+i} - x^*\| = \varepsilon_{k_1+i} / ((w + 2) C_1) \le \varepsilon^{\gamma\tau^i} / ((w + 2) C_1).$$
Therefore
$$R_\tau(x^k) = \limsup_{i \to \infty} \|x^{k_1+i} - x^*\|^{1/\tau^{k_1+i}} \le \varepsilon^{\gamma/\tau^{k_1}} < 1.$$
By the definition of the r-convergence order [27] we obtain the desired result. In particular, if $s_i^k \ne 0$, $i = 1, 2, \cdots, n$, $k = 1, 2, \cdots$, that is, $w = 0$ in (42), then by solving the equation $2t^2 - 2t - 1 = 0$ and taking the root $\tau > 0$ we obtain the result (43).

4. The global convergence of the algorithm. We will present the globally stabilized SSFDMM method for the minimization of the maximum of a set of twice Lipschitz continuously differentiable, convex functions, i.e., for solving (1) under Assumptions 2.3-2.5. Stabilization is achieved by adding an Armijo-type step-size rule to the local SSFDMM method. The rate of convergence of the local SSFDMM method is preserved because, as we will show, near the solution of (1) the step size becomes unity, i.e., the global SSFDMM method reverts to the local SSFDMM method. Now the global SSFDMM method can be stated as follows:

Algorithm 2. (the global SSFDMM method) Given a consistent partition of the columns of the lower triangular part of each Hessian, which divides the set $\{1, 2, \cdots, n\}$ into $p^j$ subsets $c_1^j, c_2^j, \cdots, c_{p^j}^j$, $j \in I_m$, and given $x^{-1}, x^0 \in \mathbb{R}^n$ such that $s_i^0 \equiv x_i^0 - x_i^{-1} \ne 0$, $i = 1, 2, \cdots, n$. Set $\varrho, \rho \in (0, 1)$ and $S = \{1, \rho, \rho^2, \cdots\}$. At each iteration $k \ (\ge 0)$:
(1) Calculate $(B^j)^k$ by Algorithm 1.
(2) Compute $\hat\theta(x^k)$ and $h^k = \hat\phi(x^k)$, according to (12) and (13). If $\hat\theta(x^k) = 0$, then stop.
(3) Compute the step size
$$\lambda_k = \max\{\lambda \in S \mid F(x^k + \lambda h^k) - F(x^k) \leq \lambda \varrho \hat\theta(x^k)\}.$$
(4) Set $x^{k+1} = x^k + \lambda_k h^k$.
(5) Replace $k$ by $k + 1$.

First we show that Algorithm 2 is globally convergent.

Theorem 4.1. Suppose that Assumptions 2.3-2.5 hold, and that $x^*$ is the solution of Problem (1). Then any sequence $\{x^k\}_{k=0}^{\infty}$ generated by Algorithm 2 converges to $x^*$.

Proof. First, because of Assumption 2.4, the level sets of $F(\cdot)$ are bounded and, by construction in Step (3) of Algorithm 2, $F(x^{k+1}) < F(x^k)$. Hence any sequence $\{x^k\}_{k=0}^{\infty}$ generated by Algorithm 2 must have accumulation points. For the sake of contradiction, suppose that the sequence $\{x^k\}_{k=0}^{\infty}$ does not converge to $x^*$. Then there must exist some infinite subset $K$ such that $x^k \to \hat{x} \neq x^*$, $k \in K$. By the uniqueness of the solution and the continuity of $\theta_{U_1}(x)$ defined by (22), we conclude that $\theta_{U_1}(\hat{x}) < 0$ and that there exists a $k_0$ such that for all $k \geq k_0$, $k \in K$, one has
$$\hat\theta(x^k) \leq \theta_{U_1}(x^k) < \theta_{U_1}(\hat{x})/2. \tag{47}$$
From the second-order expansion of $F(x^k)$, for all $\lambda \in [0, 1]$, we have that
$$\begin{aligned}
F(x^k + \lambda\hat\phi(x^k)) - F(x^k)
&= \max_{j \in I_m} \Big\{ f^j(x^k) - F(x^k) + \lambda (g^j(x^k))^T \hat\phi(x^k) \\
&\qquad\quad + \lambda^2 \int_0^1 (1 - s)(\hat\phi(x^k))^T H^j(x^k + \lambda s \hat\phi(x^k)) \hat\phi(x^k)\, ds \Big\} \\
&\leq \lambda \max_{j \in I_m} \Big\{ f^j(x^k) - F(x^k) + (g^j(x^k))^T \hat\phi(x^k) + \frac{1}{2} (\hat\phi(x^k))^T (B^j)_k \hat\phi(x^k) \Big\} \\
&\qquad\quad + \frac{\lambda^2}{2} \tilde{U} \|\hat\phi(x^k)\|^2 - \frac{\lambda}{2} u_1 \|\hat\phi(x^k)\|^2 \\
&\leq \lambda \hat\theta(x^k) + \frac{1}{2} \lambda (\lambda \tilde{U} - u_1) \|\hat\phi(x^k)\|^2,
\end{aligned}$$
where $\tilde{U} = \max\{U, U_1\}$, because $\lambda \in [0, 1]$ and $f^j(x^k) \leq F(x^k)$. Therefore, if $\lambda \leq u_1/\tilde{U} \leq 1$, then
$$F(x^k + \lambda\hat\phi(x^k)) - F(x^k) \leq \lambda \hat\theta(x^k). \tag{48}$$
Consequently, for $\lambda \leq u_1/\tilde{U}$,
$$F(x^k + \lambda\hat\phi(x^k)) - F(x^k) - \lambda \varrho \hat\theta(x^k) \leq \lambda \hat\theta(x^k)(1 - \varrho), \tag{49}$$
which implies that $\lambda_k \geq \rho u_1/\tilde{U}$. Thus from (47) and (49), we obtain that
$$F(x^{k+1}) - F(x^k) \leq \frac{\rho \varrho u_1}{\tilde{U}} \hat\theta(x^k) \leq \frac{\rho \varrho u_1}{2\tilde{U}} \theta_{U_1}(\hat{x}), \quad \forall k \geq k_0,\ k \in K. \tag{50}$$
Combining (50) with the fact that the sequence $\{F(x^k)\}_{k=0}^{\infty}$ is monotonically decreasing, we conclude that $F(x^k) \to -\infty$ as $k \to \infty$, which is a contradiction, because the level sets of $F(\cdot)$ are bounded. Hence the theorem must be true.

Next we establish superlinear convergence.
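The step-size rule in Step (3) of Algorithm 2 can be sketched in Python as follows. This is only a minimal illustration, not the authors' Matlab implementation: the function name `armijo_step`, the cap `max_trials`, and the toy objective `F` are assumptions introduced here, and the values of $\hat\theta$ and $h$ passed in are assumed to have been computed beforehand from (12) and (13).

```python
def armijo_step(F, x, h, theta, varrho=0.01, rho=0.5, max_trials=60):
    """Largest lam in S = {1, rho, rho^2, ...} satisfying
    F(x + lam*h) - F(x) <= lam * varrho * theta   (theta <= 0).
    max_trials is a safeguard added for this sketch only."""
    Fx = F(x)
    lam = 1.0
    for _ in range(max_trials):
        trial = [xi + lam * hi for xi, hi in zip(x, h)]
        if F(trial) - Fx <= lam * varrho * theta:
            return lam
        lam *= rho
    raise RuntimeError("no acceptable step size found")


# Hypothetical toy minimax objective with m = 2 convex quadratics.
def F(x):
    return max((x[0] - 1.0) ** 2 + x[1] ** 2, (x[0] + 1.0) ** 2 + x[1] ** 2)


# At x = (2, 0) with B^j = H^j = 2I the search direction problem gives
# h = (-2, 0) and theta = -8, and the unit step is accepted.
print(armijo_step(F, [2.0, 0.0], [-2.0, 0.0], -8.0))   # 1.0
# A deliberately overlong direction forces two reductions of the step.
print(armijo_step(F, [2.0, 0.0], [-8.0, 0.0], -8.0))   # 0.25
```

In the first call the quadratic model is exact, so the unit step is accepted immediately, which is the behaviour that Theorem 4.2 establishes near the solution.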
Theorem 4.2. Suppose that Assumptions 2.3-2.5 hold and that $x^*$ is the solution of Problem (1). Then the sequence $\{x^k\}_{k=0}^{\infty}$ generated by Algorithm 2 converges q-superlinearly to $x^*$.

Proof. Since $\{x^k\}_{k=0}^{\infty}$ converges to $x^*$, we need only show that there exists a $k_1$ such that $\lambda_k = 1$ for all $k \geq k_1$, so that Algorithm 2 reduces to Algorithm 1 and Theorem 3.3 can be invoked. Now, it follows from Assumptions 2.3-2.5, (11) and (12) that
$$\begin{aligned}
\hat\theta(x^k) &= F_B(x^k, \hat\phi(x^k)) - F_H(x^k, \hat\phi(x^k)) + F_H(x^k, \hat\phi(x^k)) \\
&\quad - F(x^k + \hat\phi(x^k)) + F(x^k + \hat\phi(x^k)) - F(x^k) \\
&\geq F(x^k + \hat\phi(x^k)) - F(x^k) + F_B(x^k, \hat\phi(x^k)) - F_H(x^k, \hat\phi(x^k)) \\
&\quad - (\alpha/6)\|\hat\phi(x^k)\|^3.
\end{aligned} \tag{51}$$
Hence
$$\begin{aligned}
F(x^k + \hat\phi(x^k)) - F(x^k) \leq{} & \varrho\hat\theta(x^k) + \big[(1 - \varrho)\hat\theta(x^k) + F_H(x^k, \hat\phi(x^k)) \\
& - F_B(x^k, \hat\phi(x^k)) + (\alpha/6)\|\hat\phi(x^k)\|^3\big].
\end{aligned} \tag{52}$$
Next we establish a relationship between $\|\hat\theta(x)\|$ and $\|\hat\phi(x)\|$. Since $x + \hat\phi(x)$ is the minimizer of $F_B(x, \cdot)$, it follows that it satisfies the first-order optimality condition
$$0 \in \partial F_B(x, \hat\phi(x)). \tag{53}$$
For any integer $\kappa \geq 1$, let $\Sigma_\kappa = \{\mu \in R^\kappa \mid \sum_{j=1}^{\kappa} \mu^j = 1,\ \mu^j \geq 0\}$. Then it follows from (53), the definition of the generalized gradient $\partial F_B(x, \hat\phi(x))$ (see [3]), and the Carathéodory theorem [43], that there exists a multiplier $\mu \in \Sigma_m$ such that
$$0 = \sum_{j=1}^{m} \mu^j [g^j(x) + B^j \hat\phi(x)]. \tag{54}$$
Since the $\mu^j \geq 0$ in (54), it follows from Assumption 2.5 that the matrix $\sum_{j=1}^{m} \mu^j B^j$ is invertible and hence that
$$\hat\phi(x) = -\Big(\sum_{j=1}^{m} \mu^j B^j\Big)^{-1} \sum_{j=1}^{m} \mu^j g^j(x). \tag{55}$$
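As a small numerical illustration of (54) and (55), the following Python sketch recovers the search direction from a given multiplier vector and evaluates the optimality function as the multiplier-weighted sum of the model terms. It is a sketch under stated assumptions: the multiplier `mu` is taken as known (in practice it would come from solving the search direction problem), and the helper names `solve` and `direction_from_multipliers` and all numerical data are hypothetical.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting
    (dense, intended for small illustrative systems only)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def direction_from_multipliers(mu, f_tilde, grads, Bs):
    """h = -(sum_j mu_j B_j)^{-1} sum_j mu_j g_j, as in (55), and
    theta = sum_j mu_j f_tilde_j - (1/2) h^T (sum_j mu_j B_j) h,
    where f_tilde_j = f_j(x) - F(x)."""
    n = len(grads[0])
    Bmu = [[sum(m * B[i][k] for m, B in zip(mu, Bs)) for k in range(n)]
           for i in range(n)]
    gmu = [sum(m * g[i] for m, g in zip(mu, grads)) for i in range(n)]
    h = solve(Bmu, [-v for v in gmu])
    quad = sum(h[i] * Bmu[i][k] * h[k] for i in range(n) for k in range(n))
    theta = sum(m * ft for m, ft in zip(mu, f_tilde)) - 0.5 * quad
    return h, theta


# Hypothetical data: two quadratics (x1 -+ 1)^2 + x2^2 evaluated at x = (2, 0).
identity2 = [[2.0, 0.0], [0.0, 2.0]]
h, theta = direction_from_multipliers(
    mu=[0.5, 0.5], f_tilde=[-8.0, 0.0],
    grads=[[2.0, 0.0], [6.0, 0.0]], Bs=[identity2, identity2])
```

For this data the sketch returns a direction of (-2, 0) and an optimality value of -8, so the value is bounded above by $-(1/2) u_1 \|h\|^2 = -4$ with $u_1 = 2$.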
Furthermore, the following complementary slackness condition (see (2.1.7b), (2.1.7c) in [34]) is satisfied:
$$\hat\theta(x) = \sum_{j=1}^{m} \mu^j \big\{ (f^j(x) - F(x)) + (g^j(x))^T \hat\phi(x) + (1/2)(\hat\phi(x))^T B^j \hat\phi(x) \big\}. \tag{56}$$
Substituting for $\hat\phi(x)$ from (55) into (56), we find, in view of Assumption 2.5, that
$$\begin{aligned}
\hat\theta(x) &= \sum_{j=1}^{m} \mu^j (f^j(x) - F(x)) - (1/2)(\hat\phi(x))^T \Big(\sum_{j=1}^{m} \mu^j B^j\Big) \hat\phi(x) \\
&\leq -(1/2) u_1 \|\hat\phi(x)\|^2,
\end{aligned} \tag{57}$$
with the inequality in (57) following from the fact that $f^j(x) - F(x) \leq 0$ for all $j \in I_m$. Substituting for $\hat\theta(x)$ from (57) into (52), we find that
$$\begin{aligned}
F(x^k + \hat\phi(x^k)) - F(x^k) \leq{} & \varrho\hat\theta(x^k) - \big[u_1(1 - \varrho)/2 - (\alpha/6)\|\hat\phi(x^k)\|\big]\|\hat\phi(x^k)\|^2 \\
& + \big(F_H(x^k, \hat\phi(x^k)) - F_B(x^k, \hat\phi(x^k))\big).
\end{aligned} \tag{58}$$
Since $x^k \to x^*$ as $k \to \infty$, there exists $k \geq k_0$ such that, for $\epsilon < u_1(1 - \varrho)/2$, one has that $\|\hat\phi(x^k)\| < 3\epsilon/\alpha$ and, from Lemma 2.10, that
$$\|(B^j)_k - H^j(x^k)\| \leq \epsilon, \quad j \in I_m. \tag{59}$$
It follows from (35) that
$$F_H(x^k, \hat\phi(x^k)) - F_B(x^k, \hat\phi(x^k)) \leq (\epsilon/2)\|\hat\phi(x^k)\|^2. \tag{60}$$
Thus, from (58) and (60) there exists a $k_1\ (\geq k_0)$ such that, for all $k \geq k_1$,
$$F(x^k + \hat\phi(x^k)) - F(x^k) \leq \varrho\hat\theta(x^k) + \big[-u_1(1 - \varrho)/2 + \epsilon/2 + \epsilon/2\big]\|\hat\phi(x^k)\|^2 \leq \varrho\hat\theta(x^k), \tag{61}$$
i.e., that $\lambda_k = 1$. This completes our proof.

5. Handling nonconvex case. From the above analysis we know that the SSFDMM method with unit steps converges rapidly once it approaches a minimizer $x^*$. This simple algorithm is inadequate for general use, however, since it may fail to converge to a solution from remote starting points. Even if it does converge, its behavior may be erratic in regions where any of the $f^j(\cdot)$, $j \in I_m$, in (1) is nonconvex, because the search direction problem (12) may then become nonconvex. As a result, we can no longer guarantee descent at a nonstationary point, which makes it critical to design a practical algorithm. There are various modifications that ensure a convex search direction problem (see [32]). Most of them modify the approximate Hessian matrices $(B^j)_k$ by implicitly or explicitly choosing a modification $E_k^j$ so that the matrices $\tilde{B}_k^j = (B^j)_k + E_k^j$ are sufficiently positive definite. Besides requiring the modified matrices to be well conditioned, we also want the modification to be as small as possible, so that the second-order information in the Hessian or its approximation is preserved as far as possible. Although the eigenvalue modification strategy based on the eigenvalue decomposition of $(B^j)_k$ as in [32] is efficient, we do not carry out the spectral decomposition of the approximate Hessians, since this is generally too expensive to compute. A popular approach [7, 25] for modifying a Hessian matrix or its approximation that is not positive definite is to perform its Cholesky factorization, but to increase the diagonal elements encountered during the factorization (where necessary) to ensure that they are sufficiently positive.
This modified Cholesky approach not only guarantees that the modified Cholesky factors exist and are bounded relative to the norm of the actual Hessian or its approximation, but also leaves the Hessian or its approximation unmodified if it is sufficiently positive definite. Although the modified method first appeared in unconstrained differentiable optimization, we can apply it to the minimax problem (1) so that all Hessians of the functions $f^j(x)$, $j \in I_m$, are guaranteed to be sufficiently positive definite. We define the new optimality function $\hat\theta(\cdot)$, search direction map $\hat\phi(\cdot)$, and algorithm as follows:
$$\hat\theta(x) = \min_{h \in R^n} \max_{j \in I_m} \{\tilde{f}^j(x) + h^T g^j(x) + (1/2) h^T L^j D^j (L^j)^T h\}, \tag{62}$$
and
$$\hat\phi(x) = \arg\min_{h \in R^n} \max_{j \in I_m} \{\tilde{f}^j(x) + h^T g^j(x) + (1/2) h^T L^j D^j (L^j)^T h\}, \tag{63}$$
where $\tilde{f}^j(x) = f^j(x) - F(x)$.

Algorithm 3. Given a consistent partition of the columns of the lower triangular part of each Hessian $H^j(\cdot)$, $j \in I_m$, which divides the set $\{1, 2, \cdots, n\}$ into $p_j$ subsets $c_1^j, c_2^j, \cdots, c_{p_j}^j$, $j \in I_m$, and given $x^{-1}, x^0 \in R^n$ such that $s_i^0 \equiv x_i^0 - x_i^{-1} \neq 0$, $i = 1, 2, \cdots, n$. Set $\varrho, \rho \in (0, 1)$ and $S = \{1, \rho, \rho^2, \cdots\}$. At each iteration $k\ (\geq 0)$:
(1) Calculate $(B^j)_k$ by Algorithm 1.
(2) Modify $(B^j)_k = ((b_{is}^j)_k)$, $(i, s) \in M^j = \{(i, s) : b_{is}^j \neq 0\}$, during the course of the factorization to obtain its modified Cholesky factors $D_k^j$ and $L_k^j$ as follows:
(2.1) Set $\delta_k^j > 0$ and $\beta_k^j > 0$.
(2.2) Set
$$(\tilde{d}_k^j)_s = \max\Big\{\delta_k^j,\ \Big|(b_{ss}^j)_k - \sum_{r \in I_{s-1},\,(s,r) \in M^j} ((C_k^j)_{sr})^2 ((d_k^j)_r)^{-1}\Big|\Big\}.$$
(2.3) If $s = n$, then set $(d_k^j)_s = (\tilde{d}_k^j)_s$. Otherwise, set
$$(C_k^j)_{is} = (b_{is}^j)_k - \sum_{r \in I_{s-1},\,(s,r) \in M^j} (C_k^j)_{sr} (C_k^j)_{ir} ((d_k^j)_r)^{-1}, \quad i > s,\ (i, s) \in M^j,$$
and
$$(\Omega_k^j)_s = \max\{|(C_k^j)_{is}| : i > s,\ (i, s) \in M^j\}, \quad (d_k^j)_s = \max\{(\tilde{d}_k^j)_s,\ ((\Omega_k^j)_s)^2/(\beta_k^j)^2\}, \quad (l_k^j)_{is} = (C_k^j)_{is}/(d_k^j)_s,$$
where $D_k^j$ is a diagonal matrix with $\mathrm{diag}(D_k^j) = ((d_k^j)_1, (d_k^j)_2, \cdots, (d_k^j)_n)$, and $L_k^j = ((l_k^j)_{is})$ is a lower triangular matrix with unit diagonal elements.
(3) Compute $\hat\theta(x^k)$ and $h^k \in \hat\phi(x^k)$, according to (62) and (63). If $\hat\theta(x^k) = 0$, then stop.
(4) Compute the step size
$$\lambda_k = \max\{\lambda \in S \mid F(x^k + \lambda h^k) - F(x^k) \leq \lambda \varrho \hat\theta(x^k)\}.$$
(5) Set $x^{k+1} = x^k + \lambda_k h^k$.
(6) Replace $k$ by $k + 1$.

Algorithm 3 requires approximately $n^3/6$ arithmetic operations for every modification of $(B^j)_k$, $k > 0$ and $j \in I_m$. No additional storage is needed beyond the amount required to store $(B^j)_k$; the triangular factors $L_k^j$ and $D_k^j$, as well as the intermediate scalars $(C_k^j)_{is}$, can overwrite the elements of $(B^j)_k$. It is not difficult to see that the algorithm produces the Cholesky factorization of the modified matrix $(B^j)_k + E_k^j$, that is,
$$(B^j)_k + E_k^j = L_k^j D_k^j (L_k^j)^T, \tag{64}$$
where $E_k^j$ is a nonnegative diagonal matrix that is zero if $(B^j)_k$ is sufficiently positive definite. From an examination of the formulae for $(C_k^j)_{ss}$ (the quantity inside the absolute value in Step (2.2)) and $(d_k^j)_s$ in Algorithm 3, it is clear that the diagonal entries of $E_k^j$ are $(e_k^j)_s = (d_k^j)_s - (C_k^j)_{ss}$. It is also clear that incrementing $(C_k^j)_{ss}$ by $(e_k^j)_s$ in the factorization is equivalent to incrementing $(b_{ss}^j)_k$ by $(e_k^j)_s$ in the original data.

Let $\tilde{B}_k^j = L_k^j D_k^j (L_k^j)^T$. Then it follows easily from Algorithm 3 that there exist constants $0 < u_1 < U_1 < \infty$ such that
$$u_1 \|h\|^2 \leq h^T \tilde{B}_k^j h \leq U_1 \|h\|^2, \quad \forall h \in R^n \text{ and } j \in I_m. \tag{65}$$
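Step (2) of Algorithm 3 and the identity (64) can be checked numerically with the following Python sketch. It is a dense illustration only: it ignores the sparsity pattern $M^j$ (all entries are treated as belonging to the pattern), works on a single matrix, and the default values of `delta` and `beta`, like the names `modified_ldl` and `rebuild`, are assumptions made here rather than choices taken from the paper.

```python
def modified_ldl(B, delta=1e-8, beta=1.0):
    """Dense sketch of Steps (2.1)-(2.3) for one symmetric matrix B.

    Returns (L, d, e): L unit lower triangular, d the diagonal of D and
    e the diagonal of E, so that B + diag(e) = L * diag(d) * L^T.
    """
    n = len(B)
    L = [[1.0 if i == s else 0.0 for s in range(n)] for i in range(n)]
    C = [[0.0] * n for _ in range(n)]    # intermediate scalars C_is
    d = [0.0] * n
    e = [0.0] * n
    for s in range(n):
        # C_ss: the quantity inside |.| in Step (2.2)
        c_ss = B[s][s] - sum(C[s][r] ** 2 / d[r] for r in range(s))
        d_tilde = max(delta, abs(c_ss))
        omega = 0.0                      # remains 0 for the last column (s = n)
        for i in range(s + 1, n):
            C[i][s] = B[i][s] - sum(C[s][r] * C[i][r] / d[r] for r in range(s))
            omega = max(omega, abs(C[i][s]))
        d[s] = max(d_tilde, omega ** 2 / beta ** 2)
        for i in range(s + 1, n):
            L[i][s] = C[i][s] / d[s]
        e[s] = d[s] - c_ss               # diagonal entry of E
    return L, d, e


def rebuild(L, d):
    """Form L * diag(d) * L^T for checking (64)."""
    n = len(d)
    return [[sum(L[i][r] * d[r] * L[k][r] for r in range(n)) for k in range(n)]
            for i in range(n)]


# An indefinite matrix is shifted on the diagonal only ...
B_indef = [[1.0, 2.0], [2.0, 1.0]]
L1, d1, e1 = modified_ldl(B_indef)
# ... while a sufficiently positive definite one is left unchanged (E = 0).
B_pd = [[4.0, 1.0], [1.0, 3.0]]
L2, d2, e2 = modified_ldl(B_pd)
```

For `B_indef` the sketch produces $d = (4, 10^{-8})$ and a nonnegative diagonal shift $e = (3, 10^{-8})$, whereas for `B_pd` the shift is zero, in line with the remark following (64).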
Assumption 5.1. There exist $0 < u < U < \infty$ such that
$$u \|h\|^2 < h^T H^j(x^*) h < U \|h\|^2, \quad \forall h \in R^n \text{ and } j \in I_m, \tag{66}$$
where $x^*$ is the solution of Problem (1).

Lemma 5.2. Suppose that Assumption 2.3 is satisfied. Then $h$ in (62) is bounded on bounded subsets of $R^n$.

Proof. For any $h \in R^n$, we have from (65) the estimate
$$\max_{j \in I_m} \big\{\tilde{f}^j(x) + h^T g^j(x) + (1/2) h^T \tilde{B}^j h\big\} \geq -\max_{j \in I_m} \|g^j(x)\| \|h\| + (u_1/2)\|h\|^2. \tag{67}$$
Consequently, if
$$\|h\| > 2 \max_{j \in I_m} \|g^j(x)\|/u_1,$$
the left-hand side of (67) is positive. Since $\hat\theta(x) \leq 0$, it follows that the minimizing $h$ in (62) satisfies
$$\|h\| \leq 2 \max_{j \in I_m} \|g^j(x)\|/u_1,$$
which yields the desired result.

Theorems 5.3-5.5 state global convergence, local and global superlinear convergence results for Algorithm 3 in a similar manner to Theorems 4.1, 3.3 and 4.2. The proofs of these results, with the exception of that of Theorem 5.3, which differs significantly from the proof of Theorem 4.1, are similar to the proofs in Sections 3 and 4 and are omitted. The global convergence result is the same as in the convex case.

Theorem 5.3. Suppose that Assumptions 2.3 and 5.1 hold, and that $x^*$ is the solution of (1). Then the accumulation point of any sequence $\{x^k\}_{k=0}^{\infty}$ generated by Algorithm 3 is $x^*$.

Proof. First, because of Assumption 5.1, any sequence $\{x^k\}_{k=0}^{\infty}$ generated by Algorithm 3 has at least one accumulation point. For the sake of contradiction, suppose that $x^k \to \hat{x} \neq x^*$, $k \in K$, where $K$ is some infinite subset of the set $N$ of natural numbers. Let $\{h^k\}_{k=0}^{\infty}$ be the sequence of search directions produced by Algorithm 3. The proof proceeds by obtaining a lower bound on the step taken by the algorithm and then using this to show that, as a consequence, the sequence of costs must decrease to $-\infty$, which will contradict the continuity of $F(\cdot)$. From Lemma 5.2, we conclude that there exists some $\Gamma < \infty$ such that $\|h^k\| \leq \Gamma$, $\forall k \in K$. Furthermore, since the gradients are continuously differentiable, there exists $L < \infty$ such that each gradient is Lipschitz continuous with constant $L$ in some ball containing the points $x^k$ and $x^k + h^k$ for all $k \in K$. Let $\lambda \in (0, 1]$ and $k \in K$. This gives the following estimate:
$$\begin{aligned}
F(x^k + \lambda h^k) - F(x^k)
&= \max_{j \in I_m} \Big\{\tilde{f}^j(x^k) + \lambda (h^k)^T g^j(x^k) + \lambda \int_0^1 (g^j(x^k + s\lambda h^k) - g^j(x^k))^T h^k \, ds\Big\} \\
&\leq \max_{j \in I_m} \big\{\tilde{f}^j(x^k) + \lambda (h^k)^T g^j(x^k)\big\} + L\lambda^2 \|h^k\|^2 \\
&\leq \lambda \max_{j \in I_m} \Big\{\tilde{f}^j(x^k) + (h^k)^T g^j(x^k) + \frac{1}{2} (h^k)^T \tilde{B}_k^j h^k\Big\} + L\lambda^2 \|h^k\|^2 \\
&= \lambda\hat\theta(x^k) + L\lambda^2 \|h^k\|^2 \leq \lambda(\hat\theta(x^k) + L\lambda\Gamma^2).
\end{aligned} \tag{68}$$
Consequently, if
$$\lambda \leq -(1 - \varrho)\hat\theta(x^k)/(L\Gamma^2),$$
the Armijo stepsize rule is satisfied. It follows that the actual stepsize accepted satisfies
$$\lambda_k \geq -\rho(1 - \varrho)\hat\theta(x^k)/(L\Gamma^2), \quad \forall k \in K.$$
Since we have assumed that $\hat{x} \neq x^*$, we have $\theta_{U_1}(\hat{x}) < 0$, and there exist $\gamma_0 > 0$ and $k_0$ such that for all $k \geq k_0$, $k \in K$, we have that
$$\hat\theta(x^k) \leq \theta_{U_1}(x^k) < -\gamma_0. \tag{69}$$
For such $k$, we then have
$$F(x^{k+1}) - F(x^k) \leq \varrho\lambda_k\hat\theta(x^k) \leq -\varrho[\rho(1 - \varrho)\gamma_0/(L\Gamma^2)]\gamma_0, \tag{70}$$
which yields the desired contradiction.

Theorem 5.4. Suppose that Assumptions 2.3 and 5.1 are satisfied and let $x^*$ be the solution of (1). Then there exist $\delta, \varepsilon > 0$ such that if
$$\|x^0 - x^*\| \leq \varepsilon, \quad \|x^0 - x^{-1}\| \leq \delta,$$
then the sequence $\{x^k\}_{k=0}^{\infty}$ generated by Algorithm 3 without any global strategy is well defined and converges q-superlinearly to $x^*$.

Theorem 5.5. Suppose that Assumptions 2.3 and 5.1 are satisfied and let $\{x^k\}_{k=0}^{\infty}$ generated by Algorithm 3 be a sequence converging to $x^*$, the solution of Problem (1). Then the sequence $\{x^k\}_{k=0}^{\infty}$ converges q-superlinearly to $x^*$. Moreover, if for any $1 \leq i \leq n$, $s_i^k = 0$ appears consecutively in at most $w$ steps, then the r-convergence order of $\{x^k\}$ is not less than $\tau$, where $\tau$ is defined by (42).

6. Numerical results. In this section we compute some examples to test the efficiency of our algorithm. To obtain a search direction $h^k$ in every iteration, we compute (12) and (13) using the search direction method in [32], which is in practice a modified Newton's method, and compare it with recent smoothing methods (see [16, 36, 47]). The following techniques are used to form the Hessian approximations $(B^j)_k$ of the objective functions $f^j(x^k)$, $j \in I_m$, of Problem (1): the finite difference Newton method (FNMM), in which the sparsity of the Hessians is not considered, and three Hessian computation techniques taking advantage of sparsity: the direct method (DIRMM) [4, 39], the substitution method (SUBMM) [4, 39] and the SSFDMM method of this paper. For these methods we compare four key indices: the total number of iterations (NI), the number of function evaluations (NF), the number of gradient evaluations (NG) and the total computational time in seconds (TM). The technique of Algorithm 3 for handling the nonconvex case of Problem (1) is applied not only to the SSFDMM method but also to the other three compared methods. In the compared smoothing methods, a technique exploiting the sparsity of the Hessians to calculate the product of the Hessian and a vector (see [16]) is also used.

In Algorithm 3, we take $\varrho = 0.01$, $\rho = 0.5$. The stopping test we have used is $|\hat\theta| < 10^{-5}$, while the stopping test of the compared smoothing methods is $|g_p d| < 10^{-5}$, where $g_p$ is the gradient of the function $F_p$ in (3) and $d$ is the search direction obtained in each iteration. In the following examples, the Hessians all have a banded diagonal structure. If the Hessian has a tridiagonal, five-diagonal, seven-diagonal or nine-diagonal structure, the number of partitioning groups is $p_j = 3, 5, 7$ or $9$ for the DIRMM, and $p_j = 2, 3, 4$ or $5$ for the SUBMM and SSFDMM. All numerical experiments were performed in double precision in Matlab 6.5 on a Pentium(R) Dual T2370 @ 1.73GHz 796MHz PC with 1.75 GB RAM. The numerical results are shown in Tables 4.1-4.7, where $x^0$ is the starting point. The test examples (cf. [20, 21]) are as follows:

Example 6.1. $\min_{x \in R^n} \max_{j \in \{1, 2, \ldots, m\}} f^j(x)$, where $m = 3$ and
$$f^1(x) = \sum_{i=1}^{(n-2)/3} \big[(x_{j+1} - x_{j+2})^2 + (x_{j+3} - 1)^2 + (x_{j+4} - 1)^4 + (x_{j+5} - 1)^6\big], \quad j = 3(i - 1); \quad \text{(tridiagonal structure)}$$
$$f^2(x) = \sum_{i=1}^{(n-1)/4} \big[(x_{j+1} - x_{j+2})^2 + (x_{j+2} - x_{j+3})^2 + (x_{j+3} - x_{j+4})^4 + (x_{j+4} - x_{j+5})^4\big], \quad j = 4(i - 1); \quad \text{(tridiagonal structure)}$$
$$f^3(x) = \sum_{i=1}^{(n-2)/3} \big[(x_{j+1} - 1)^2 + (x_{j+2} - x_{j+3})^2 + (x_{j+4} - x_{j+5})^4\big], \quad j = 3(i - 1). \quad \text{(tridiagonal structure)}$$

Table 4.1. Comparison of the SSFDMM method with the other methods. Example 6.1 ($x_i^0 = 1.05$, $1 \leq i \leq n$, $n = 10^3$). Columns 2-6 refer to the search direction method and columns 7-11 to the smoothing method.

| Alg. | NI | NF | NG | TM | $\lvert\hat\theta\rvert$ | NI | NF | NG | TM | $\lvert g_p h\rvert$ |
|---|---|---|---|---|---|---|---|---|---|---|
| FNMM | 3 | 15 | 9009 | 2.328 | 1.891E-7 | 3 | 12 | 9009 | 1.968 | 2.195E-7 |
| DIRMM | 3 | 15 | 30 | 0.250 | 1.891E-7 | 3 | 12 | 30 | 0.063 | 2.195E-7 |
| SUBMM | 3 | 15 | 27 | 0.266 | 1.891E-7 | 3 | 12 | 27 | 0.094 | 2.195E-7 |
| SSFDMM | 3 | 18 | 21 | 0.231 | 2.908E-7 | 3 | 12 | 21 | 0.047 | 3.389E-7 |
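Table 4.2. Comparison of the SSFDMM method with the other methods. Example 6.1 ($x_i^0 = 1.05$, $1 \leq i \leq n$, $n = 10^4$). Columns 2-6 refer to the search direction method and columns 7-11 to the smoothing method.

| Alg. | NI | NF | NG | TM | $\lvert\hat\theta\rvert$ | NI | NF | NG | TM | $\lvert g_p h\rvert$ |
|---|---|---|---|---|---|---|---|---|---|---|
| FNMM | 3 | 15 | 90009 | 152.421 | 1.887E-7 | 3 | 12 | 90009 | 147.75 | 2.665E-6 |
| DIRMM | 3 | 15 | 30 | 3.844 | 1.887E-7 | 3 | 12 | 30 | 0.734 | 2.665E-6 |
| SUBMM | 3 | 15 | 27 | 4.094 | 2.294E-7 | 3 | 12 | 27 | 1.078 | 2.665E-6 |
| SSFDMM | 3 | 18 | 21 | 3.172 | 3.724E-6 | 3 | 12 | 21 | 0.525 | 2.916E-6 |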
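Table 4.3. Comparison of the SSFDMM method with the other methods. Example 6.1 ($x_i^0 = 1.05$, $1 \leq i \leq n$, $n = 10^5$). Columns 2-6 refer to the search direction method and columns 7-11 to the smoothing method. Only the time is reported for FNMM.

| Alg. | NI | NF | NG | TM | $\lvert\hat\theta\rvert$ | NI | NF | NG | TM | $\lvert g_p h\rvert$ |
|---|---|---|---|---|---|---|---|---|---|---|
| FNMM | | | | > 500 | | | | | > 500 | |
| DIRMM | 9 | 51 | 90 | 116.311 | 9.291E-6 | 13 | 48 | 130 | 38.093 | 4.846E-8 |
| SUBMM | 9 | 51 | 81 | 115.343 | 5.569E-6 | 17 | 69 | 153 | 81.344 | 4.764E-7 |
| SSFDMM | 9 | 54 | 57 | 97.275 | 6.413E-6 | 12 | 42 | 75 | 24.891 | 4.631E-6 |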
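To illustrate why a sparsity-exploiting method needs only a handful of gradient evaluations for a tridiagonal Hessian, the following Python sketch estimates such a Hessian from three gradient differences, which matches the group count $p_j = 3$ quoted above for the direct method. The grouping by column index modulo 3 is a standard Curtis-Powell-Reid style choice assumed here for illustration; it is not necessarily the consistent partition used by the authors, and the names `tridiagonal_hessian_fd` and `grad`, the test function and the step `eta` are hypothetical.

```python
def tridiagonal_hessian_fd(grad, x, eta=1e-6):
    """Estimate a tridiagonal Hessian with 3 extra gradient evaluations.

    Columns j with the same value of j mod 3 never share a nonzero row in a
    tridiagonal matrix, so they can be perturbed simultaneously."""
    n = len(x)
    g0 = grad(x)
    evaluations = 1
    H = [[0.0] * n for _ in range(n)]
    for r in range(3):
        # perturb every column of group r at once
        xp = [xi + (eta if j % 3 == r else 0.0) for j, xi in enumerate(x)]
        y = [a - b for a, b in zip(grad(xp), g0)]
        evaluations += 1
        for j in range(r, n, 3):
            for i in range(max(0, j - 1), min(n, j + 2)):
                H[i][j] = y[i] / eta
    return H, evaluations


# Hypothetical test function f(x) = sum_i x_i^2 + sum_i x_i x_{i+1},
# whose Hessian is tridiagonal with 2 on the diagonal and 1 next to it.
def grad(x):
    n = len(x)
    return [2.0 * x[i]
            + (x[i - 1] if i > 0 else 0.0)
            + (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


x0 = [0.1 * i for i in range(7)]
H, evaluations = tridiagonal_hessian_fd(grad, x0)
```

The number of gradient evaluations in this sketch is four regardless of $n$, whereas differencing one coordinate at a time would need $n + 1$.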
From Tables 4.1-4.3 we can see that in most cases the total number of iterations and the number of function evaluations are (approximately) equal for the four methods, but the number of gradient evaluations and the CPU time of the SSFDMM method are the smallest among the compared methods, whether the search direction method or the smoothing method is used.

Example 6.2. $\min_{x \in R^n} \max_{j \in \{1, 2, \ldots, m\}} f^j(x)$, where $m = 8$ and
$$f^1(x) = \sum_{i=2}^{n} \big[100(x_{i-1}^2 - x_i)^2 + (x_{i-1} - 1)^2\big]; \quad \text{(tridiagonal structure)}$$
$$f^2(x) = \sum_{i=1}^{n-1} \big[(x_i - 2)^4 + (x_i - 2)^2 x_{i+1}^2 + (x_{i+1} + 1)^2\big] + (x_n - 2)^4; \quad \text{(tridiagonal structure)}$$
$$\begin{aligned}
f^3(x) = \sum_{j=1}^{k} \big[&100(x_{i-1}^2 - x_i)^2 + (x_{i-1} - 1)^2 + 90(x_{i+1}^2 - x_{i+2})^2 + (x_{i+1} - 1)^2 \\
&+ 10(x_i + x_{i+2} - 2)^2 + (x_i - x_{i+2})^2/10\big], \quad i = 2j,\ k = (n - 2)/2; \quad \text{(five-diagonal structure)}
\end{aligned}$$
$$f^4(x) = \sum_{i=1}^{n} \big[(3 - 2x_i)x_i - x_{i-1} - 2x_{i+1} + 1\big]^2, \quad x_0 = x_{n+1} = 0; \quad \text{(five-diagonal structure)}$$
$$\begin{aligned}
f^5(x) = \sum_{i=2}^{n-1} \big[&2(x_i^2 - x_{i-1})^2 + (1 - x_i)^2 + 0.5(x_{i-1} + x_i + x_{i+1} - 3)^2\big] \\
&+ 2(x_n^2 - x_{n-1})^2 + (1 - x_n)^2; \quad \text{(five-diagonal structure)}
\end{aligned}$$
$$f^6(x) = \sum_{i=1}^{n} \big[x_i + (3/2 - x_i/3)x_i^2\big] - \sum_{i=1}^{n-1} x_i^2 x_{i+1} - \sum_{i=1}^{n-2} x_i^2 x_{i+2}; \quad \text{(five-diagonal structure)}$$
$$f^7(x) = \sum_{i=1}^{n} \big[e^{x_i} + (3/2 - x_i/3)x_i^2\big] - \sum_{i=1}^{n-1} x_i^2 x_{i+1} - \sum_{i=1}^{n-2} x_i^2 x_{i+2} - \sum_{i=1}^{n-3} x_i^2 x_{i+3}; \quad \text{(seven-diagonal structure)}$$
$$f^8(x) = \sum_{i=1}^{n} f_i^2(x), \quad f_i(x) = x_i(2 + 5x_i^2) + 1 - \sum_{j \in J_i} x_j(1 + x_j),$$
$$J_i = \{j : j \neq i,\ \max(1, i - m_l) \leq j \leq \min(n, i + m_u)\}, \quad m_l = 2,\ m_u = 2. \quad \text{(nine-diagonal structure)}$$
Table 4.4. Comparison of the SSFDMM method with the other methods. Example 6.2, $x^0 = (1.1, 1.14, 1.02, \ldots, 1.02)'$, $n = 6000$. Columns 2-6 refer to the search direction method and columns 7-11 to the smoothing method.

| Alg. | NI | NF | NG | TM | $\lvert\hat\theta\rvert$ | NI | NF | NG | TM | $\lvert g_p h\rvert$ |
|---|---|---|---|---|---|---|---|---|---|---|
| FNMM | 5 | 72 | 240040 | 369.469 | 1.05E-8 | 5 | 48 | 240040 | 392.827 | 2.02E-8 |
| DIRMM | 5 | 72 | 240 | 9.656 | 1.05E-8 | 5 | 48 | 240 | 4.156 | 2.02E-8 |
| SUBMM | 5 | 72 | 165 | 9.828 | 1.06E-8 | 5 | 48 | 165 | 3.687 | 2.02E-8 |
| SSFDMM | 4 | 64 | 100 | 7.984 | 6.08E-7 | 5 | 48 | 128 | 2.984 | 2.02E-8 |
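Table 4.5. Comparison of the SSFDMM method with the other methods using the smoothing method. Example 6.2, $x^0 = (1.1, 1.14, 1.02, \ldots, 1.02)'$, $n = 60000$. Only the time is reported for FNMM.

| Alg. | NI | NF | NG | TM | $\lvert g_p h\rvert$ |
|---|---|---|---|---|---|
| FNMM | | | | > 1000 | |
| DIRMM | 5 | 48 | 240 | 49.811 | 2.02E-8 |
| SUBMM | 5 | 48 | 165 | 51.541 | 2.02E-8 |
| SSFDMM | 5 | 48 | 128 | 38.437 | 2.02E-8 |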
From the above tables it can be seen that when the number $m$ of objective functions becomes large and the dimension $n$ of the variable $x$ is huge, the savings in gradient evaluations and CPU time of the SSFDMM method over the other compared methods become more pronounced, whether the search direction method or the smoothing method is used.

Example 6.3. $\min_{x \in R^n} \max_{j \in \{1, 2, \ldots, m\}} f^j(x)$, where $m = 2$ and
$$f^1(x) = \sum_{i=2}^{n} \big[(x_{i-1} - 3)^2 + (x_{i-1} - x_i)^2 + \exp(20(x_{i-1} - x_i))\big]; \quad \text{(tridiagonal structure)}$$
$$f^2(x) = \sum_{i=2}^{n} \Big[(x_{i-1}^2)^{x_i^2 + 1} + (x_i^2)^{x_{i-1}^2 + 1}\Big]. \quad \text{(tridiagonal structure)}$$
Table 4.6. Comparison of the SSFDMM method with the other methods using the search direction method. Example 6.3 ($x_i^0 = 0.02$, $1 \leq i \leq 1600$).

| Alg. | NI | NF | NG | TM | $\lvert\hat\theta\rvert$ |
|---|---|---|---|---|---|
| FNMM | 3 | 10 | 9606 | 43.734 | 4.104E-10 |
| DIRMM | 3 | 10 | 24 | 0.438 | 4.104E-10 |
| SUBMM | 4 | 46 | 24 | 0.594 | 4.105E-10 |
| SSFDMM | 3 | 12 | 14 | 0.369 | 4.661E-10 |
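Table 4.7. Comparison of the SSFDMM method with the other methods using the search direction method. Example 6.3 ($x_i^0 = 0.001$, $1 \leq i \leq 1.6 \times 10^6$). Only the time is reported for FNMM.

| Alg. | NI | NF | NG | TM | $\lvert\hat\theta\rvert$ |
|---|---|---|---|---|---|
| FNMM | | | | > 1000 | |
| DIRMM | 3 | 42 | 24 | 500.78 | 3.722E-8 |
| SUBMM | 4 | 78 | 24 | 711.52 | 3.721E-8 |
| SSFDMM | 2 | 8 | 10 | 305.64 | 5.589E-6 |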
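The NG entries reported for FNMM in Tables 4.1, 4.2, 4.4 and 4.6 are consistent with a simple count: one gradient at the current point plus $n$ differenced gradients for each of the $m$ functions in every iteration. This counting rule is an inference from the reported numbers, not a formula stated in the paper, and the function name `fnmm_gradient_evaluations` is hypothetical. A short Python check:

```python
def fnmm_gradient_evaluations(iterations, m, n):
    """Assumed count for a finite-difference Newton method that ignores
    sparsity: (n + 1) gradient evaluations per function per iteration."""
    return iterations * m * (n + 1)


# (NI, m, n) taken from the tables, compared with the reported NG values.
reported = {
    (3, 3, 10 ** 3): 9009,     # Table 4.1
    (3, 3, 10 ** 4): 90009,    # Table 4.2
    (5, 8, 6000): 240040,      # Table 4.4
    (3, 2, 1600): 9606,        # Table 4.6
}
matches = {key: fnmm_gradient_evaluations(*key) == ng
           for key, ng in reported.items()}
```

Under this rule the cost grows linearly with $n$, whereas the NG columns of the three sparsity-exploiting methods do not change between Tables 4.1 and 4.2.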
From the above tables it can be seen that, for large scale problems, the FNMM method, which does not take advantage of the sparsity of the Hessians, is inefficient. Tables 4.6-4.7 show again that when the dimension of the problems is very large the SSFDMM method is extremely effective.

7. Conclusions. We have presented the SSFDMM method for solving large sparse minimax problems (1). In these problems, the Hessians of the objective functions $f^j(x)$, $j \in I_m$, have a sparse structure and the dimension of the variable $x$ is large. The SSFDMM algorithm takes advantage of the symmetry and the partitioning properties of each Hessian, reduces the number of gradient evaluations as far as possible, and is globally and locally superlinearly convergent. To keep the SSFDMM method robust, a method for handling the nonconvex case was given. The tables show that, for most of the test problems, the number of gradient evaluations and the CPU time of the SSFDMM algorithm are usually the smallest, which indicates that applying the SSFD algorithm to large sparse minimax optimization is more efficient and promising than applying it to large sparse unconstrained differentiable optimization.

Acknowledgments. The authors are indebted to the reviewers and the editors for their constructive and helpful comments, which greatly improved the contents and exposition of this paper.

REFERENCES
[1] S. Bhulai, G. Koole and A. Pot, Simple methods for shift scheduling in multiskill call centers, Manufacturing & Service Operations Management, 10 (2008), 411–420.
[2] X. Cai, K. Teo, X. Yang and X. Zhou, Portfolio optimization under a minimax rule, Manag. Sci., 46 (2000), 957–972.
[3] F. H. Clarke, Optimization and Nonsmooth Analysis, Canadian Mathematical Society Series of Monographs and Advanced Texts, A Wiley-Interscience Publication, John Wiley & Sons, Inc., New York, 1983.
[4] T. F. Coleman and J. J. Moré, Estimation of sparse Hessian matrices and graph coloring problems, Mathematical Programming, 28 (1984), 243–270.
[5] T. F. Coleman and J. J. Moré, Software for estimation of sparse Hessian matrices, ACM Transactions on Mathematical Software, 11 (1985), 363–377.
[6] V. F. Demyanov and V. N. Malozemov, Introduction to Minimax, Translated from the Russian by D. Louvish, Halsted Press [John Wiley & Sons], New York-Toronto, Ont.; Israel Program for Scientific Translations, Jerusalem-London, 1974.
[7] P. Gill, W. Murray and M. H. Wright, Practical Optimization, Academic Press, Inc. [Harcourt Brace Jovanovich, Publishers], London-New York, 1981.
[8] A. Griewank and G. F. Corliss, Automatic Differentiation of Algorithms: Theory, Implementation, and Application, Proceedings of the First SIAM Workshop held in Breckenridge, Colorado, January 6–8, 1991, Edited by Andreas Griewank and George F. Corliss, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1991.
[9] S. P. Han, Variable-metric methods for minimizing a class of nondifferentiable functions, Mathematical Programming, 20 (1981), 1–13.
[10] D. L. Han, J. B. Jian and J. Li, On the accurate identification of active set for constrained minimax problems, Nonlinear Analysis, 74 (2011), 3022–3032.
[11] W. Hare and M. Macklem, Derivative-free optimization methods for finite minimax problems, Optimization Methods and Software, 28 (2013), 300–312.
[12] S. X. He and S. M. Zhou, A nonlinear augmented Lagrangian for constrained minimax problems, Applied Mathematics and Computation, 218 (2011), 4567–4579.
[13] J. B. Jian, R. Quan and X. L. Zhang, Feasible generalized monotone line search SQP algorithm for nonlinear minimax problems with inequality constraints, Journal of Computational and Applied Mathematics, 205 (2007), 406–429.
[14] J. B. Jian and M. T. Chao, A sequential quadratically constrained quadratic programming method for unconstrained minimax problems, J. Math. Anal. Appl., 362 (2010), 34–45.
[15] J. X. Li, L. M. Yan, S. D. Li and J. Z. Huo, Inexact trust region PGC method for large sparse unconstrained optimization, Computational Optimization and Applications, 51 (2012), 981–999.
[16] J. X. Li and J. Z. Huo, Inexact smoothing method for large sparse minimax optimization, Applied Mathematics and Computation, 218 (2011), 2750–2760.
[17] S. S. Liu and L. G. Papageorgiou, Multiobjective optimisation of production, distribution and capacity planning of global supply chains in the process industry, Omega, 41 (2013), 369–382.
[18] G. Liuzzi, S. Lucidi and M. Sciandrone, A derivative-free algorithm for linearly constrained finite minimax problems, SIAM J. Optim., 16 (2006), 1054–1075.
[19] X. S. Li, An entropy-based aggregate method for minimax optimization, Engineering Optimization, 18 (1992), 277–285.
[20] L. Lukšan and J. Vlček, Sparse and Partially Separable Test Problems for Unconstrained and Equality Constrained Optimization, Report V-767, Prague, ICS AS CR, 1999.
[21] L. Lukšan and J. Vlček, Test Problems for Nonsmooth Unconstrained and Linearly Constrained Optimization, Report V-798, Prague, ICS AS CR, 2000.
[22] L. Lukšan, C. Matonoha and J. Vlček, Primal Interior-Point Method for Large Sparse Minimax Optimization, Technical Report 941, Institute of Computer Science, Academy of Sciences of the Czech Republic, Prague, Czech Republic, 2005.
[23] B. Mor and G. Mosheiov, Minmax scheduling problems with common flow-allowance, Journal of the Operational Research Society, 63 (2012), 1284–1293.
[24] W. Murray and M. L. Overton, A projected Lagrangian algorithm for nonlinear minimax optimization, SIAM Journal on Scientific and Statistical Computing, 1 (1980), 345–370.
[25] J. Nocedal and S. J. Wright, Numerical Optimization, Springer, New York, 1999.
[26] E. Obasanjo, G. Tzallas-Regas and B. Rustem, An interior-point algorithm for nonlinear minimax problems, J. Optim. Theory Appl., 144 (2010), 291–318.
[27] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.
[28] E. Y. Pee and J. O. Royset, On solving large-scale finite minimax problems using exponential smoothing, J. Optim. Theory Appl., 148 (2011), 390–421.
[29] E. Polak, On the mathematical foundations of nondifferentiable optimization in engineering design, SIAM Rev., 29 (1987), 21–89.
[30] E. Polak, S. Salcudean and D. Q. Mayne, Adaptive control of ARMA plants using worst case design by semi-infinite optimization, IEEE Trans. Autom. Control, 32 (1987), 388–396.
[31] E. R. Panier and A. L. Tits, A globally convergent algorithm with adaptively refined discretization for semi-infinite optimization problems arising in engineering design, IEEE Trans. Autom. Control, 34 (1989), 903–908.
[32] E. Polak, D. Q. Mayne and J. E. Higgins, Superlinearly convergent algorithm for min-max problems, Journal of Optimization Theory and Applications, 69 (1991), 407–439.
[33] E. Polak, D. Q. Mayne and J. E. Higgins, On the extension of Newton's method to semi-infinite minimax problems, SIAM Journal on Control and Optimization, 30 (1992), 367–389.
[34] E. Polak, Optimization: Algorithms and Consistent Approximations, Applied Mathematical Sciences, 124, Springer-Verlag, New York, 1997.
[35] E. Polak, R. Trahan and D. Q. Mayne, Combined phase I-phase II methods of feasible directions, Mathematical Programming, 17 (1979), 61–73.
[36] E. Polak, J. O. Royset and R. S. Womersley, Algorithms with adaptive smoothing for finite minimax problems, Journal of Optimization Theory and Applications, 119 (2003), 459–484.
[37] E. Polak, R. S. Womersley and X. H. Yin, An algorithm based on active sets and smoothing for discretized semi-infinite minimax problems, J. Optim. Theory Appl., 138 (2008), 311–328.
[38] A. Pot, S. Bhulai and G. Koole, A simple staffing method for multiskill call centers, Manuf. Ser. Oper. Manage., 10 (2008), 421–428.
[39] M. J. D. Powell and Ph. L. Toint, On the estimation of sparse Hessian matrices, SIAM Journal on Numerical Analysis, 16 (1979), 1060–1074.
[40] S. M. Robinson, Perturbed Kuhn-Tucker points and rates of convergence for a class of nonlinear-programming algorithms, Mathematical Programming, 7 (1974), 1–16.
[41] J. O. Royset and E. Y. Pee, Rate of convergence analysis of discretization and smoothing algorithms for semiinfinite minimax problems, J. Optim. Theory Appl., 155 (2012), 855–882.
[42] J. F. Sturm and S. Zhang, A dual and interior-point approach to solve convex min-max problems, in Minimax and Applications, Nonconvex Optim. Appl., 4, Kluwer Acad. Publ., Dordrecht, 1995, 69–78.
[43] R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, New Jersey, 1970.
[44] S. Ruzika and M. Thiemann, Min-Max quickest path problems, Networks, 60 (2012), 253–258.
[45] F. S. Wang and Y. P. Wang, Nonmonotone algorithm for minimax optimization problems, Applied Mathematics and Computation, 217 (2011), 6296–6308.
[46] S. Wolfram, The Mathematica Book, Third edition, Wolfram Media, Inc., Champaign, IL; Cambridge University Press, Cambridge, 1996.
[47] S. Xu, Smoothing methods for minimax problems, Computational Optimization and Applications, 20 (2001), 267–279.
[48] F. Ye, H. Liu, S. Zhou and S. Liu, A smoothing trust-region Newton-CG method for minimax problem, Appl. Math. Comput., 199 (2008), 581–589.
[49] B. Yu, G. X. Liu and G. C. Feng, The aggregate homotopy methods for constrained sequential max-min problems, Northeastern Mathematical Journal, 19 (2003), 287–290.
[50] Y. X. Yuan and W. Y. Sun, Optimization Theory and Methods, Science Press, Beijing, China, 2001.
[51] S. T. Zhang and B. Yu, A globally convergent method for nonconvex generalized semi-infinite minimax problems, Numerical Mathematics A Journal of Chinese Universities, 27 (2005), 316–319.
[52] H. W. Zhang and J. X. Li, The substitution secant/finite difference method for large scale sparse unconstrained optimization, Acta Mathematicae Applicatae Sinica, English series, 21 (2005), 581–596.
Received May 2012; 1st revision February 2013; 2nd revision May 2013.