IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 17, NO. 6, NOVEMBER 2006
1471
Convergence of Neural Networks for Programming Problems via a Nonsmooth Łojasiewicz Inequality

Mauro Forti, Paolo Nistri, and Marc Quincampoix
Abstract—This paper considers a class of neural networks (NNs) for solving linear programming (LP) problems, convex quadratic programming (QP) problems, and nonconvex QP problems where an indefinite quadratic objective function is subject to a set of affine constraints. The NNs are characterized by constraint neurons modeled by ideal diodes with vertical segments in their characteristic, which make it possible to implement an exact penalty method. A new method is exploited to address convergence of trajectories, which is based on a nonsmooth Łojasiewicz inequality for the generalized gradient vector field describing the NN dynamics. The method makes it possible to prove that each forward trajectory of the NN has finite length and, as a consequence, converges toward a singleton. Furthermore, by means of a quantitative evaluation of the Łojasiewicz exponent at the equilibrium points, the following results on the convergence rate of trajectories are established: 1) for nonconvex QP problems, each trajectory is either exponentially convergent, or convergent in finite time, toward a singleton belonging to the set of constrained critical points; 2) for convex QP problems, the same result as in 1) holds; moreover, the singleton belongs to the set of global minimizers; and 3) for LP problems, each trajectory converges in finite time to a singleton belonging to the set of global minimizers. These results, which improve previous results obtained via the Lyapunov approach, are true independently of the nature of the set of equilibrium points, and in particular they hold even when the NN possesses infinitely many nonisolated equilibrium points.

Index Terms—Convergence in finite time, exponential convergence, generalized gradient, Łojasiewicz inequality, neural networks (NNs), programming problems.
I. INTRODUCTION

THE approach based on the use of analog neural networks (NNs) for solving nonlinear programming problems has received a great deal of attention in the last two decades; see, e.g., [1]–[10] and references therein. This approach is effective and particularly attractive in all those applications where it is of crucial importance to obtain the optimal solution in real time, as in some robotics, surveillance, and signal processing tasks. Recently, a generalized neural network (G-NN) for nonlinear programming problems has been proposed [11], which derives from a natural extension of the programming NN proposed by Kennedy and Chua [2]. Indeed, while the network in [2] was conceived for solving optimization problems with smooth twice continuously differentiable objective and constraint functions,

Manuscript received September 14, 2005; revised April 14, 2006.
M. Forti and P. Nistri are with the Dipartimento di Ingegneria dell'Informazione, Università di Siena, 53100 Siena, Italy (e-mail: [email protected]; [email protected]).
M. Quincampoix is with the Laboratoire de Mathématiques, Université de Bretagne Occidentale, F-29285 Brest Cedex, France (e-mail: [email protected]).
Digital Object Identifier 10.1109/TNN.2006.879775
G-NN is suitable for solving a much wider class of nonsmooth optimization problems where the objective and constraint functions are modeled by (possibly nondifferentiable) regular functions. G-NN is characterized by constraint neurons modeled by ideal diodes with a vertical segment in the conducting region of their characteristic, which can be thought of as being the limit as the slope tends to infinity of the neuron nonlinearities in [2]. The use of ideal diodes with vertical segments enables G-NN to implement an exact penalty method where the constrained critical points of the objective function are in a one-to-one correspondence with the equilibrium points of G-NN. Due to the presence of the set-valued nonlinearities modeling the constraint neurons, the dynamics of G-NN is mathematically described by a differential inclusion. The inclusion corresponds to the generalized gradient of a nondifferentiable energy function, which is the sum of the objective function and a nonsmooth barrier function defined by the constraints. In this paper, we consider the application of G-NN to linear programming (LP) and convex quadratic programming (QP) problems, and to nonconvex QP problems where a general indefinite quadratic objective function is subject to a set of affine constraints. By means of a new method, which is based on a nonsmooth Łojasiewicz inequality for generalized gradient vector fields, we will establish a number of results on trajectory convergence of G-NN that are stronger than those obtained in [11] by means of a Lyapunov-like method. First of all, it is shown by means of the Łojasiewicz inequality that for LP, convex QP, and nonconvex QP problems, each forward trajectory of G-NN has finite length and converges to a singleton, i.e., G-NN is a convergent system. This property is true even in the most general case where G-NN has infinitely many nonisolated equilibrium points (e.g., a manifold of equilibria). 
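To make the differential-inclusion picture concrete, the following sketch discretizes a subgradient flow of the form x' ∈ −∇f(x) − γ∂B(x), where B(x) = Σᵢ max(0, cᵢ'x − dᵢ) is a nonsmooth barrier built from affine constraints. All numeric data (A, b, C, d, γ, the step size) and the function names are our own illustrative choices, not taken from the paper, and the forward-Euler selection of a single subgradient is only a crude stand-in for the exact G-NN dynamics described by (3).

```python
import numpy as np

# Illustrative data (ours, not the paper's): minimize 0.5*x'Ax + b'x
# subject to Cx <= d; a convex QP whose constrained minimizer is (0, 1).
A = np.array([[2.0, 0.0], [0.0, 2.0]])
b = np.array([-2.0, -4.0])
C = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])  # x1+x2<=1, x1>=0, x2>=0
d = np.array([1.0, 0.0, 0.0])
gamma = 10.0  # finite penalty parameter, chosen "large enough"

def energy_subgradient(x):
    # One selection from the generalized gradient of
    # E(x) = 0.5 x'Ax + b'x + gamma * sum_i max(0, C_i x - d_i).
    g = A @ x + b
    violated = (C @ x - d) > 0.0
    return g + gamma * C[violated].sum(axis=0)

x = np.array([2.0, 2.0])  # infeasible initial condition
h = 1e-3                  # forward-Euler step for x' = -energy_subgradient(x)
for _ in range(20000):
    x = x - h * energy_subgradient(x)
print(np.round(x, 2))     # lands near the constrained minimizer (0, 1)
```

Starting from an infeasible point, the iterate is driven into the polyhedron and then slides along its boundary toward the constrained minimizer, up to O(hγ) chattering caused by the discretization.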
By contrast, we note that the results previously obtained in [11] only ensure that each trajectory of G-NN approaches the set of equilibria, i.e., G-NN is a quasi-convergent system. We stress that the property of convergence is much stronger than that of quasi-convergence. Indeed, it may happen that the trajectories of a quasi-convergent system approach the set of equilibrium points and yet display large-size nonvanishing oscillations. Examples of this kind have been given in [13, ex. 3, p. 14] and in [12, ex. 3] in the field of NNs. Of course, such an oscillatory behavior of the trajectories would be highly undesirable for a neural optimization solver, and it can be avoided only by proving the stronger property of convergence. Second, this paper addresses the issue of the convergence rate of the trajectories of G-NN, which is of crucial importance for a sound design of NNs aimed at solving optimization problems in real time. Specifically, this paper shows that it is possible to obtain a quantitative evaluation of the Łojasiewicz exponent at
1045-9227/$20.00 © 2006 IEEE
each equilibrium point of G-NN and, on this basis, it establishes the following results on convergence rate: 1) for nonconvex QP problems, it is shown that each trajectory of G-NN is either exponentially convergent or convergent in finite time toward a singleton belonging to the set of constrained critical points; 2) for convex QP problems, the same result as in 1) holds and, in addition, the singleton belongs to the set of global minimizers; and 3) for LP problems, it is shown that each trajectory of G-NN converges in finite time to a singleton that belongs to the set of global minimizers. Again, all these results are valid even when G-NN possesses infinitely many nonisolated equilibrium points. The obtained results improve those established in [11] where, apart from the case of LP problems, no estimate of the convergence rate of the trajectories of G-NN was obtained. The method employed here for addressing convergence extends the method used in [12] for a class of NNs defined by the (conventional) gradient of an analytic energy function to a class of dynamical systems defined by the generalized gradient of a nondifferentiable energy function. At the core of the analysis in this paper is the proof of two basic results (Theorems 1 and 2), which extend the standard Łojasiewicz inequality for gradient systems of analytic functions [14], [15] to the generalized gradient vector fields which describe the dynamics of G-NN for LP, convex QP, and nonconvex QP problems. The structure of this paper is briefly outlined as follows. In the remaining part of this section, we give some needed preliminary definitions and results. Section II describes the programming problems considered in this paper and the G-NN for solving these problems, while Section III gives some basic properties of the dynamics of G-NN. Section IV establishes a nonsmooth Łojasiewicz inequality for G-NN, and then Section V exploits this inequality to prove the main results on convergence for G-NN.
After some computer simulations (Section VI), this paper is concluded by collecting some final remarks in Section VII.

Notation: Given the column vectors x, y ∈ R^n, x'y is the scalar product of x and y, where the prime means the transpose, and ||x|| = (x'x)^(1/2) is the Euclidean norm of x. By span{v_1, ..., v_k}, we mean the linear subspace of R^n spanned by vectors v_1, ..., v_k. Suppose that A is a square matrix. We denote by A' the transpose of A, by ker(A) the kernel of A, and by ker^⊥(A) the linear subspace of R^n which is orthogonal to ker(A). By I, we denote the identity linear operator. Let Q be a subset of R^n. Then, int(Q) denotes the interior of Q in R^n and ∂Q denotes the boundary of Q in R^n; moreover, cl(Q) is the closure of Q, and co̅(Q) is the closure of the convex hull of Q. If x ∈ R^n and r > 0, B(x, r) is the ball with radius r about x. Finally, for any x ∈ R^n, dist(x, Q) is the distance of x from set Q.

A. Preliminaries

In this section, we report definitions and properties which are needed in the development. We refer the reader to [16]–[18] for a more thorough treatment.

1) Set-Valued Maps and Generalized Gradient: By F : R^n ⇉ R^n, we denote a set-valued map, i.e., a map that associates to any x ∈ R^n a subset F(x) ⊆ R^n. Suppose that for
each x ∈ R^n, F(x) is a nonempty closed subset of R^n, and that F is a bounded map in a neighborhood of each x. Then, F is said to be upper semicontinuous on R^n if and only if its graph {(x, y) : y ∈ F(x)} is a closed set. A function f : R^n → R is said to be Lipschitz near x ∈ R^n if there exist positive constants L and r, such that |f(y) − f(w)| ≤ L ||y − w|| for all y, w satisfying ||y − x|| < r and ||w − x|| < r. If f is Lipschitz near any point x ∈ R^n, then f is said to be locally Lipschitz in R^n. If f is locally Lipschitz in R^n, then f is differentiable for almost all (a.a.) x ∈ R^n (in the sense of Lebesgue measure). Moreover, for any x ∈ R^n the generalized gradient of f at point x can be defined as follows [18]:

∂f(x) = co̅ { lim_{k→∞} ∇f(x_k) : x_k → x, x_k ∉ Ω_f ∪ N }
where Ω_f is the set of points where f is not differentiable, and N is an arbitrary set with measure zero. It can be shown that ∂f is a set-valued map with nonempty compact convex values ∂f(x) ⊂ R^n. Suppose that f is locally Lipschitz near x. Then, f is said to be regular at x if for all directions v ∈ R^n there exists the usual right directional derivative

f'(x; v) = lim_{t→0+} [f(x + tv) − f(x)] / t
and we also have f'(x; v) = f°(x; v), where f°(x; v) is the generalized directional derivative of f at x in the direction v. Function f is said to be regular in R^n if it is regular for any x ∈ R^n. A function f : R^n → R is said to be semiconvex on R^n if, for any open bounded convex set Q ⊂ R^n, there exists c ≥ 0 such that the function f(x) + c ||x||^2 is convex in Q. If f is convex (or semiconvex) in R^n, then it is also locally Lipschitz and regular in R^n.

2) Tangent and Normal Cones: Suppose that K ⊂ R^n is a nonempty closed convex set. The tangent cone to K at x ∈ K is defined as

T_K(x) = cl { λ(y − x) : y ∈ K, λ ≥ 0 }
while the normal cone to K at x ∈ K is given by

N_K(x) = { p ∈ R^n : p'(y − x) ≤ 0 for all y ∈ K }.

Both T_K(x) and N_K(x) are nonempty closed convex cones in R^n. In particular, if x ∈ int(K), then T_K(x) = R^n and, therefore, N_K(x) = {0}. If K ⊂ R^n is a nonempty closed convex set and x ∈ R^n, then there exists a unique point π_K(x) ∈ K satisfying

||x − π_K(x)|| = min_{y ∈ K} ||x − y||
The operator π_K : R^n → K is called the projector of best approximation on K. In particular, we denote by m(F(x)) the vector of F(x) with the smallest norm.

II. CLASS OF PROGRAMMING PROBLEMS

Let us consider the programming problem (1) of minimizing a quadratic objective function, defined by a symmetric matrix, subject to a set of affine inequality constraints. If the matrix is indefinite, i.e., it has both positive and negative eigenvalues, then (1) is a nonconvex QP problem, while if the matrix is positive semidefinite, i.e., it has nonnegative eigenvalues, then (1) is a convex QP problem. If, in particular, the matrix vanishes, then (1) reduces to a (convex) LP problem. Henceforth, we always suppose that Assumption 1 holds.

Assumption 1: The feasibility region (2) defined by the affine constraints is a nonempty bounded (closed) convex polyhedron.

Let the set of global minimizers of problem (1) be the set of points where the objective function, constrained to the compact set (2), achieves its global minimum. The set of critical points of the objective function constrained to (2) is given, as in [17], by the points at which the negative gradient of the objective belongs to the normal cone to (2). From standard results, it is known that the set of global minimizers is contained in the set of constrained critical points; moreover, for convex QP or LP problems, the two sets coincide. While for LP and convex QP problems the set of global minimizers has a unique connected and convex component, for nonconvex QP problems the set of constrained critical points is characterized by multiple connected components, which are located at different values of the objective function. The importance of LP and convex QP problems is well known. Nonconvex QP problems are of great interest in several real-world engineering applications as well. In fact, nonconvex QP problems are known to be NP-hard, and due to their combinatorial structure they can model several classes of problems arising in planning, scheduling, optimal flow computation, and control; see, e.g., the review paper [19].

Consider the application of G-NN to the solution of problem (1). The dynamics of G-NN is governed by the generalized gradient system (3) [11]. Note that G-NN implements a penalty method with a nondifferentiable barrier function, and that the penalty coefficient plays the role of a (finite) penalty parameter. We will see that the value of this parameter is crucial in the main results of this paper.¹ In the actual implementation, it corresponds to the negative saturation level of the diode-like nonlinearities used in the realization of the constraint neurons of G-NN. By a solution of (3) on a given interval, we mean a function which is absolutely continuous on that interval and satisfies the differential inclusion (3) for a.a. t in the interval. An equilibrium point is a constant solution of (3). Clearly, a point is an equilibrium point of (3) if and only if it satisfies the corresponding algebraic inclusion. Note that the set of equilibrium points of (3) coincides with the set of unconstrained critical points of the energy function. Suppose that the quadratic term does not vanish. Then, the explicit equations satisfied by G-NN for convex and nonconvex QP problems are given by (4), where the index set appearing in (4) is the set of constraints that are violated at point x, and the energy function is given by (5).
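The LP/convex/nonconvex trichotomy described above can be checked numerically from the eigenvalues of the symmetric matrix defining the quadratic form. The helper below is our own illustration (names and thresholds are ours, not the paper's):

```python
import numpy as np

# Hypothetical helper (ours): classify problem (1) from the eigenvalues of
# the symmetric matrix of the quadratic form.
def classify(A, tol=1e-10):
    if np.allclose(A, 0.0, atol=tol):
        return "LP"                         # quadratic term vanishes
    ev = np.linalg.eigvalsh(A)              # A is assumed symmetric
    if ev.min() >= -tol:
        return "convex QP"                  # positive semidefinite
    if ev.max() > tol:
        return "nonconvex QP (indefinite)"  # eigenvalues of both signs
    return "concave quadratic"              # outside the trichotomy in the text

print(classify(np.zeros((2, 2))),
      classify(np.diag([1.0, 2.0])),
      classify(np.diag([1.0, -1.0])))
```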
If the quadratic term vanishes, we obtain the G-NN for an LP problem, given by (6), where the energy function is given by (7).

¹Indeed, when the penalty parameter is large enough, then, as shown in Property 3, the penalty method implemented by G-NN is exact, i.e., the constrained critical points of the objective function are in a one-to-one correspondence with the equilibrium points of G-NN [20]. See also [5] and [21] for a general discussion on the importance of the exact penalty method in constrained optimization.
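The exactness claim in footnote 1 can be seen in one dimension. In the following toy example (all numbers our own), the feasible problem is to minimize f(x) = 2x on {x ≥ 0}, with optimum x* = 0; once the penalty coefficient exceeds the slope of f, the minimizer of the nonsmooth penalized energy is exactly the constrained optimum, while for a smaller coefficient the penalized energy is unbounded below and the grid minimum escapes to the edge of the search window:

```python
import numpy as np

# 1-D toy LP (ours): minimize f(x) = 2x subject to x >= 0, optimum x* = 0.
# Nonsmooth exact-penalty energy: E(x) = 2x + gamma * max(0, -x).
xs = np.linspace(-5.0, 5.0, 100001)
results = {}
for gamma in (1.0, 10.0):
    E = 2.0 * xs + gamma * np.maximum(0.0, -xs)
    results[gamma] = xs[np.argmin(E)]
print(results)  # gamma=1: minimum escapes to the window edge;
                # gamma=10: minimizer at x* = 0 (up to grid resolution)
```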
It is pointed out that G-NN admits a simple electronic implementation in VLSI for convex or nonconvex QP problems, and LP problems [2], [11].
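Property 2 below interprets the flow, inside the feasible polyhedron, as a projected dynamical system: the velocity is the projection of the negative gradient onto the tangent cone to the polyhedron. For the special case of a box (our own simplification of a general polyhedron; the helper name is ours), that projection reduces to a componentwise clip:

```python
import numpy as np

# Assumed feasible set: the box [0, 1]^2 (a simple polyhedron of our choosing).
def proj_tangent_box(x, v, lo=0.0, hi=1.0, tol=1e-12):
    """Project velocity v onto the tangent cone to the box at the point x."""
    w = v.copy()
    at_lo = np.abs(x - lo) < tol
    at_hi = np.abs(x - hi) < tol
    w[at_lo] = np.maximum(w[at_lo], 0.0)  # no motion through the lower faces
    w[at_hi] = np.minimum(w[at_hi], 0.0)  # no motion through the upper faces
    return w

x = np.array([0.0, 0.5])   # point on the face x1 = 0
v = np.array([-3.0, 1.0])  # negative gradient pointing out of the box
print(proj_tangent_box(x, v))  # outward component removed: (0, 1)
```

At an interior point the clip is inactive and the flow coincides with the ordinary (sub)gradient flow, matching the interpretation given in Property 2.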
Let x_1 and x_2 be two solutions of (3); then, for a.a. t, the monotonicity estimate exploited in the uniqueness argument below holds.

III. ATTRACTIVITY AND INVARIANCE OF THE FEASIBILITY SET

In this section, we establish some basic results on the dynamics of (3), which are needed in the development. Let us fix any value of the penalty parameter satisfying the lower bound discussed in the following.
In Property 1, we address the existence, uniqueness, boundedness, and prolongability of the solution to a Cauchy problem associated with (3), with initial condition in a given compact set.

Property 1: For any initial time and any initial condition in the given compact set, there exists a unique solution of (3) starting at that condition, which is defined for all subsequent times. Moreover, if the penalty parameter is sufficiently large, then the following hold: 1) the solution remains in the compact set; 2) the solution reaches the feasibility region at a finite instant, and stays in it thereafter; 3) the feasibility region is positively invariant for the dynamics of (3).

Proof: From (4), we can readily verify that the right-hand side of (3) is a set-valued map with nonempty compact convex values. Moreover, since it is bounded in a neighborhood of each point, and its graph is a closed set, it follows that it is upper semicontinuous in R^n. Analogous properties hold for the map in (6). As a consequence, given any initial condition, there exists at least a local solution of (3) [16, Th. 3, p. 98]. This means that there exists an interval on which (3) is satisfied for a.a. t. Furthermore, from [17, Th. 10.1.6] it follows that the solution is prolongable to all subsequent times, since the right-hand side has a linear growth (see [17, def. 10.1.5]). We now prove that this is the unique solution to (3) for the given initial condition. To this end, note that the energy function is semiconvex on R^n; in fact, the quadratic form is semiconvex on R^n and the barrier term is convex on R^n, since it is the pointwise sum of convex functions (see [22]). Let c ≥ 0 be such that the function obtained by adding c ||x||^2 to the energy
is convex on . Therefore, valued map [16, Ch. 3], i.e., we have
for any and
, and for any
for any
, and for any
where and . By integrating between 0 and , we obtain
, for a.a.
and so by the Gronwall inequality we have that for any . 1) To show the result in point 1), we introduce the following : positive constants depending on (8) and (9) where
By the convexity of the function , we have that
for Thus, if
and
,
, whenever , then
. with
and
(10)
is a monotone set. for any , , be the solution of (3) starting from Let and consider ,
. Thus, we have
and
where for a.a. can be easily proved that, for
.
. By using (10), it , we have
see [11, Lemma 2(b)] for the details. Thus, for and so the result in point 1) is proved. any is convex in ; hence, it is regular in 2) The function . Since , , is absolutely continuous on , thus by Property 4 in Apany compact interval of and are differentiable pendix I we obtain that , and we have for a.a. for a.a.
for any
, namely for any , which contradicts (12). Therefore, for any and thus point 2) is proved. 3) The previous arguments also show that is positively invariant with respect to the dynamics of (3), i.e., the result stated in point 3) holds. Remarks: 1) From the proof of Property 1, it is seen that the results stated in points 1)–3) of the same property hold for any
Therefore, we obtain where the finite positive constants and are defined in (8) and (9), respectively. From (8), we easily obtain , where where
and , are such that
, for a.a. . . By the Schwarz inequality, and Suppose that the following inequality derived from (10):
is an eigenvalue of
(13)
for any , (9) Moreover, considering that . Hence, the results in Property 1 hold yields for any
we obtain
for a.a. thus have
(14)
such that
. For
, we
(11) for a.a. such that Suppose now that we have . By integrating (11) on
This implies that
reaches
. when , we have that
,
which gives a lower bound of practical applicability for the penalty parameter, and is valid for convex or nonconvex QP problems. For LP problems, the same bound as in (14) holds with the quadratic term absent.² 2) Property 1 improves the results in [11], where uniqueness of the solution was proved only in the case where both the objective function and the constraints are convex. Moreover, in that paper the issue of invariance for the feasibility set was not addressed. Property 2 makes it possible to obtain a simple geometrical interpretation of the dynamics of (3) within the invariant feasibility set. Property 2: Suppose that the penalty parameter is sufficiently large, and let x(t) be a solution of (3) that remains in the feasibility set. Then, for a.a. t, we have that
in finite time (15)
i.e., . for any Let us now show that contradiction, thus we assume that time. Let be such that
Furthermore, the function and we have a.a. . We argue by leaves in finite and let
(12) By integrating (11) on into account that have that
, and that
is differentiable for
hence, is nonincreasing for all . (see the proof of Property 1), Proof: When , can be viewed as a solution to the following and differential variational inequality (see [16, Ch. 5]):
, and taking , we
for a.a.
.
²For LP problems, a lower bound for the penalty parameter that is less restrictive than (14) has been obtained in [11, Property 5], namely a bound of the form ||b|| R / f̄, where f̄ is defined as a maximum over the ball of radius R. Note that this bound is independent of the constraint data.
In fact, accounting for (4) it is seen that
for a.a. for a.a.
hence,
satisfies
. It can be easily verified that if we have
, then
around a critical point of the energy in (7). Although the two inequalities are conceptually related, we find it useful to treat them separately in Sections IV-A and B. Then, in Section IV-C, we report some remarks illustrating the significance of the obtained inequalities.

A. Convex and Nonconvex QP Problems

Suppose that
, and let be a critical point of constrained to , i.e., . Let
also satisfies
for a.a. . In particular, since is a continuous and is a compact, convex subset of , then we can map in apply [16, Proposition 2, p. 266] and [16, Th. 2, p. 268] in order to obtain (15). semiconvex in , then it is Finally, note that being is absolutely continuous on any also regular in . Since , by Property 4 in Appendix I we compact interval of have that for a.a.
Thus, by choosing
we obtain
Property 2 implies that, within the feasibility set, (3) coincides with a projected dynamical system. Moreover, the unique solution of (3) coincides with the slow solution of the differential inclusion (3), i.e., the solution defined for a.a. t by the velocity with minimum norm in the right-hand side set. Finally, we observe that the time derivative of the energy along the solutions obeys the same formula as for a smooth gradient system. Property 3: If the penalty parameter is sufficiently large, then we have
Proof: The proof is the same as that of point 1) in [11, Th. 1]. Property 3 means that, when the penalty parameter is sufficiently large, then (3) implements an exact penalty method [20], where the critical points of the objective constrained to the feasibility set coincide with the equilibrium points of (3) in the feasibility set and, hence, with the unconstrained critical points of the energy in R^n.

IV. ŁOJASIEWICZ INEQUALITY

In this section, we consider G-NN (4), which is aimed at solving convex or nonconvex QP problems, and G-NN (6) for solving LP problems. The goal is to prove a Łojasiewicz inequality around a critical point of the energy in (5), and
We have seen in Property 3 that, if the penalty parameter is sufficiently large, then a critical point of the objective constrained to the feasibility set is also an unconstrained critical point of the energy in R^n. Theorem 1 provides a Łojasiewicz inequality for the energy around such a point. Theorem 1 (Łojasiewicz inequality for convex and nonconvex QP problems): Suppose that the quadratic term does not vanish, and consider the G-NN for solving convex or nonconvex QP problems in (4). If the penalty parameter is sufficiently large, then the following holds. Let x* be a critical point of the energy in R^n and, hence, a critical point of the objective in the feasibility set. Then, there exist constants such that the Łojasiewicz inequality (16) is satisfied for all x in a suitable neighborhood of x*. The exponent appearing in (16), which is said to be the Łojasiewicz exponent of the energy at point x*, is given explicitly according to three cases 1)–3), depending on the position of x* with respect to the polyhedron and on the associated cone conditions. Proof: The proof exploits a number of properties of the normal cone to points on the boundary of the polyhedron, which are collected for convenience in Lemma 1 of Appendix II. The following preliminary observations will be useful. i) From (15), we have
where ii) It is verified in Appendix III that given sufficiently close to , then
. , and
(17) where . . Then, there exists a neighborhood iii) Let that contains only as a critical point of , or othercontains critical points that belong wise conto the connected component of critical points of taining . In fact, the number of connected components of is finite. Moreover, since is constant on each connected component, the Łojasiewicz inequality (16) is sat, and any , at the points isfied for any
belonging to the connected component of containing . Therefore, to verify the Łojasiewicz inequality (16), it suffices to consider points in a sufficiently small neighborhood of such that , i.e., points . iv) Consider the function
where
and . On the basis of the previous considerations, to prove (16), it suffices to show that there exist and such that is a bounded function for all such that , i.e., for all . . Then, we have 1) Suppose that and . Let be such that ; hence, and for any . , it follows that and Moreover, being . Therefore so
and
it is easily seen that is unbounded in any neighbor, whenever . This concludes the hood proof of Case 1. 2) Suppose that and . Consider a sequence such that as . can be decomposed in a finite number The sequence , of subsequences, each one denoted for simplicity by for any having the property that . . Then, the following First, suppose that is possible. , hence a) Suppose that the sequence and , for any . Therefore, we obtain
Since it follows that for any
as we have
,
(18) for all Let
. and
and
. We have . If , then . Moreover,
while, if
then
. Therefore, we obtain
for all is an eigenvalue of Furthermore, since also obtain
, where . implies
Hence, the subsequence is bounded for any . , and that b) Suppose that the sequence , for any . We have , we
Since , it follows that . Therefore, from the properties of the projection on closed convex cones (see [16, p. 24]), we have for all , where is an eigenvalue of . Note that , since is a symmetric matrix such that . Substituting the obtained expressions in (18), we find that This yields
for all for any
. Therefore, is bounded in . Moreover, from
hence, the subsequence . c) Suppose that the sequence
is bounded for any
, for any , and
, and that . Since , it follows
that for any
is orthogonal to . Hence
, i.e.,
thus the sequence . Moreover, from
is bounded for any
and where , due to the standing . Let assumption , where and (see Lemma 1 in Appendix II), and consider the linear span . We have subspace
it follows that is an unbounded sequence, . whenever , Finally, consider the case where thus
hence
where is a linear operator, and where we have taken into account that , because . Note that , being . and As in the proof of 1), let . We have and , with , since . Then, we obtain
where
Since for any , if then we can proceed as in 2c) is bounded to conclude that the sequence for any , and it is unbounded for any . The same conclusion holds if , in fact, in such a case, we can proceed as in , this concludes 1). Since 2b) cannot occur if the proof of 2). 3) Finally, suppose that and . , let be For some sufficiently small , i.e., . Then such that
Since
is an eigenvalue of
Then, and any
. Furthermore, we have
, we also obtain that , therefore , and so
is bounded in , for some . This concludes the proof of 3).
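The role of the Łojasiewicz exponent can be seen on two scalar prototype energies (our own toy examples, not the paper's): for the smooth E(x) = x², the inequality |E(x) − E(0)|^θ ≤ c |E'(x)| holds with θ = 1/2, while for the nonsmooth E(x) = |x| it even holds with θ = 0, the regime associated with finite-time convergence in Section V. The ratios below are constant, so the inequality holds with a uniform c near the critical point:

```python
import numpy as np

# Two scalar prototype energies (our toy examples):
#   E(x) = x^2 : |E(x)-E(0)|^(1/2) / |E'(x)| is constant (theta = 1/2 works)
#   E(x) = |x| : |E(x)-E(0)|^0 / |dE(x)|     is constant (theta = 0 works)
xs = np.linspace(-1.0, 1.0, 2001)
xs = xs[np.abs(xs) > 1e-9]  # exclude the critical point itself

ratio_quad = np.abs(xs**2) ** 0.5 / np.abs(2.0 * xs)         # == 1/2
ratio_abs = np.abs(np.abs(xs)) ** 0.0 / np.abs(np.sign(xs))  # == 1
print(ratio_quad.max(), ratio_abs.max())
```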
B. LP Problems

Now, assume that the quadratic term vanishes, and let x* be a critical point of the objective constrained to the feasibility set. Also, let
where is an eigenvalue of In conclusion, we have shown that
If the penalty parameter is sufficiently large, then such a critical point is also an unconstrained critical point of the energy (Property 3). Theorem 2 gives a Łojasiewicz inequality for the energy around it. Theorem 2 (Łojasiewicz inequality for LP problems): Suppose that the penalty parameter is sufficiently large, and consider the G-NN for solving LP problems in (6). Then, the following holds. Let
be a critical point of in in . Then, there exist of
and, hence, a critical point and , such that
, and for any for all quence, the Łojasiewicz exponent of Proof: Consider the function
at
(19) . As a conseis .
(20), where the function and the constants are as indicated. As already noticed, to prove the Łojasiewicz inequality (19), it suffices to show that there exist suitable constants such that the function in (20) is bounded for all x sufficiently close to x*, with x ≠ x*. The following cases are possible. i) In the first case, we can proceed as in the proof of 3) of Theorem 1 to show that the function in (20) is bounded in a sufficiently small neighborhood. ii) In the remaining case, we first observe that the function is bounded on a suitable part of the neighborhood, for a sufficiently small radius. Finally, for the remaining points, we can argue as in the proof of 2b) of Theorem 1 to conclude that the function in (20) is bounded in a sufficiently small neighborhood.

C. Discussion

The next remarks are in order.

Remarks: 1) In the 1960s, Łojasiewicz proved the following fundamental inequality [14], [15]: Suppose that f is an analytic function in an open set of R^n, and let x* be a critical point of f, i.e., ∇f(x*) = 0. Then, there exist c > 0, θ ∈ [1/2, 1), and a neighborhood of x*, such that we have

|f(x) − f(x*)|^θ ≤ c ||∇f(x)||
for all x in that neighborhood. Theorems 1 and 2 extend the standard Łojasiewicz inequality for vector fields defined by the gradient of an analytic function to the classes of vector fields (4) and (6), which are defined by the generalized gradient of the nondifferentiable energy functions in (5) and (7), respectively. See also [23] for another nonsmooth version of the Łojasiewicz inequality. 2) Theorem 1 shows that for convex or nonconvex QP problems the Łojasiewicz exponent of the energy at the constrained
Fig. 1. Illustration of Theorem 1 for a 2-D polyhedral region P .
critical point x* depends on the position of x* with respect to the boundary of the polyhedron and on the position of a certain gradient vector with respect to the corresponding cone at x*. Note that if x* is not a vertex of the polyhedron, then, from Theorem 1, case 3) cannot occur; therefore, case 3) of Theorem 1 is possible only if x* is a vertex of the polyhedron. Fig. 1 gives an illustration of the result in Theorem 1 for a two-dimensional (2-D) polyhedron; four cases are considered, corresponding to the different possible positions of the critical point (in particular, whether or not it is a vertex) and to the three cases distinguished in Theorem 1. 3) According to Property 2, the trajectories of (4) and (6) are confined within the feasibility set for all large times. This is the reason why the Łojasiewicz inequalities in Theorems 1 and 2 consider only those points that belong to the feasibility set. 4) It can be easily seen that if a point is not a critical point of the energy, then the Łojasiewicz inequalities obtained in Theorems 1 and 2 do hold for any admissible exponent. 5) Theorems 1 and 2 give an exact estimate of the Łojasiewicz exponent for convex or nonconvex QP problems, and for LP problems. This is of theoretical interest also because, even in the standard setting of smooth analytic functions, it is usually hard to sharply evaluate the Łojasiewicz exponent.

V. EXPONENTIAL CONVERGENCE AND CONVERGENCE IN FINITE TIME

By exploiting the Łojasiewicz inequalities in Theorems 1 and 2, we are able to prove the following results on convergence for G-NN (4) and (6). First, consider (4). Theorem 3 holds.

Theorem 3: Suppose that the penalty parameter is sufficiently large, and consider G-NN (4) for solving convex or nonconvex QP problems. Fix any initial condition such
that
. If of (4) starting in at and converges to a singleton
, then any trajectory has finite length on as , i.e., we have
Now, exploiting (21) we want to show that given any such that for , then
and
(22)
Furthermore, if the Łojasiewicz exponent of at point is , then is exponentially convergent to , i.e., we have
for some and . If instead we have converges to in finite time, i.e., we have
is a nonincreasing function for Considering that , it suffices to consider the following three distinct cases. for . From (21), i) We have we thus obtain
, then ii) We have implies that Therefore, we obtain
for for all
. Then, (21) .
for some . When is positive–semidefinite, i.e., we consider G-NN (4) for solving convex QP problems, then the same result holds with replaced by . Proof: Let and let , , be the solution . As seen in Property 2, of (4) with initial condition such that for all . there exists Moreover, we have
and
for a.a. . and is continuous for , then Since the -limit set of is nonempty and it is contained in . is a point in the -limit set of , and Suppose that , with as , such consider a sequence that as . Since is continuous (see Property 2), we also have and nonincreasing for . and Our goal is to prove that we have that in addition . To this end, we make use of an argument analogous to that employed by Łojasiewicz in [24]. First of all observe that, on the basis of the Łojasiewicz inequality established in Theorem 1 and Remark 4 in Section IV-C, there , , and , such that exist
(21) for all
such that
.
iii) We have for . Then, (21) implies that , and Therefore, we have
and
for
, and , for some for
for .
Then
at point jasiewicz exponent of . Now, define for erwise
is either
or oth-
hence This concludes the verification of (22). To proceed, note that since we have , and , then there exists that
such for a.a. . On the basis of (23) and (21), we obtain
and
Now, we argue by contradiction to show that we have for all . In fact, suppose that for some , while for have Then, accounting for (22) we have
we .
for a.a.
. Hence
(24) for a.a. , where we have let . Now, consider the differential equation Then (25) which is a contradiction. for all Being
, we obtain from (22)
(23) has finite length on and, according to Therefore, a standard argument (see e.g., [12, p. 663]) it follows that the -limit set of is a singleton; hence, we have
We have , for . Indeed, if , then (24) and (25) yield . some , we obtain By solving (25) with
and, hence
Then, suppose that Let us prove that . Since the map is upper semicontinuous in , with nonempty compact convex values, it follows from [16, Th. 2, p. 310] that point is necessarily an equilibrium point of (4). Hence, from Property 3 . This completes the proof of the first part it follows that of the theorem. to . First Now, let us address the convergence rate of , then Theorem 1 implies that the Łonote that, since
; hence, (25) reduces to
Solving this equation, we obtain
for
1482
Hence, the result in the theorem follows by noting that
for all sufficiently large t. In Theorem 4, we address convergence of (6).

Theorem 4: Consider G-NN (6) for solving LP problems. Then, any trajectory x(·) of (6) starting in x0 ∈ ℝⁿ at t = 0 has finite length on [0, +∞) and converges in finite time to a singleton x̄ belonging to the set of global minimizers, i.e., we have

  x(t) = x̄ for all t ≥ t_f

for some t_f > 0.

Proof: To prove that x(t) → x̄ as t → +∞, we can proceed as in the proof of Theorem 3, and take into account that for a convex programming problem the set of constrained critical points coincides with the set of global minimizers. The convergence in finite time of x(t) to x̄ can be proved with the same argument as that used in the proof of Theorem 3 in the case θ ∈ [0, 1/2).

Remarks: 1) Theorems 3 and 4 mean that G-NNs (4) and (6) enjoy very strong convergence properties. Consider for instance (4). Theorem 3 actually states that, given any data defining the programming problem (1), and any polyhedral feasibility region K, any trajectory of (4) is exponentially convergent, or convergent in finite time, toward a singleton, and this is true even when (4) possesses infinitely many nonisolated equilibrium points. This means that the property of exponential convergence (or convergence in finite time) is absolutely stable for (4). Similarly, Theorem 4 implies that the property of convergence in finite time is absolutely stable for (6).

2) Theorem 3 gives a stronger result with respect to [11, Th. 4], which considered only the special case where the quadratic form is positive–definite and G-NN (4) has a unique equilibrium point. Moreover, no estimate of the convergence rate was obtained in that paper. Theorem 4 is stronger than [11, Th. 5]. In fact, in [11, Th. 5] it is only shown that the trajectories of G-NN (6) converge in finite time to the set of global minimizers; convergence of each trajectory to a singleton is not proved. An analogous conclusion holds with respect to the class of NNs for solving LP problems proposed in [5], where the trajectories have been shown to converge to the set of global minimizers in finite time, but not necessarily to a singleton. The reader is referred to [5], [25] for a review and a comparative study of other NN approaches for solving LP problems.

3) For convex QP and LP problems, it follows from Theorems 3 and 4 that G-NN is able to compute one of the global optimizers at an exponential rate, or in finite time. The situation is basically different for nonconvex QP problems.
Due to the gradient nature of G-NN, in the application to combinatorial nonconvex QP problems this NN is only able to perform a local optimal search, without guaranteeing the computation of the global optimum. We remark that in this paper we have been concerned basically with the issue of convergence of G-NN for nonconvex QP problems. The analysis of the quality of the local optimal solution computed by G-NN, and the question of how to improve the ability to search for the global optimum, go beyond the scope of the present work and are an interesting subject for future investigation.

4) In addition to the solution of combinatorial nonconvex QP problems, there is another potential field of application for G-NN (4) in the nonconvex case. Due to the fast convergence speed and the presence of multiple equilibrium points, G-NN (4) becomes attractive for implementing new classes of associative memories, or new real-time devices for image-processing tasks. In this sense, G-NN can be viewed as a generalization of Hopfield networks with bounded piecewise-linear neuron activations, or cellular NNs [26]. In fact, while the feasibility region of G-NN is a general convex polyhedral region, Hopfield NNs and cellular NNs are characterized by a feasibility region coinciding with a hypercube.
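The two convergence regimes established in Theorem 3 can be illustrated numerically. The following sketch (with hypothetical values of the exponent θ, the constant c, and the initial gap v0, which are not taken from the paper) evaluates the closed-form solution of the comparison equation v̇ = −(1/c²) v^{2θ}: for θ = 1/2 the bound decays exponentially, while for θ ∈ [0, 1/2) it vanishes at a finite extinction time.

```python
import math

def v(t, theta, c, v0):
    """Closed-form solution of the comparison ODE
    dv/dt = -(1/c**2) * v**(2*theta),  v(0) = v0 >= 0,
    which upper-bounds the energy gap along a trajectory."""
    if theta == 0.5:
        # 2*theta = 1: linear ODE, exponential decay
        return v0 * math.exp(-t / c**2)
    # 0 <= theta < 1/2: v**(1-2*theta) decreases linearly and
    # reaches zero at the finite extinction time t_f below
    s = v0**(1 - 2 * theta) - (1 - 2 * theta) * t / c**2
    return s**(1.0 / (1 - 2 * theta)) if s > 0 else 0.0

def extinction_time(theta, c, v0):
    """Finite extinction time t_f, valid for theta in [0, 1/2)."""
    return c**2 * v0**(1 - 2 * theta) / (1 - 2 * theta)
```

For instance, with the illustrative choice θ = 1/4, c = 1, v0 = 2, the bound is identically zero for all t beyond the extinction time 2√2, whereas for θ = 1/2 it is positive for every finite t.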
VI. SIMULATION RESULTS

We have considered the following programming problem with four variables and affine constraints, which has been adapted from an analogous problem in [27]: minimize
subject to
The constraints define a polyhedron K satisfying Assumption 1. Due to the bilinear quadratic terms in the objective function, this is a nonconvex QP problem. The penalty parameter has been chosen according to the estimates in (13) and (14), which involve the vertex of K whose distance from the origin is maximum. We have simulated the dynamical behavior of G-NN (4) by using the routine ode23s of MATLAB. Fig. 2(a) reports the time evolution of a trajectory of (4) starting at t = 0 from the initial condition given in the caption. It is seen that the trajectory converges in finite time to an equilibrium point x̄. Point x̄ is a vertex of K where E, constrained to K, reaches its global minimum. This behavior is in accordance with that predicted by Theorem 3 in
Fig. 2. (a) Trajectory of G-NN (4) with initial condition x(0) = (0.5, 4.0, 2.0, 3.0). (b) The same trajectory shown with expanded amplitude-time scales. (c) Trajectory of G-NN (4) with initial condition x(0) = (1.0, 0.5, 0.7, 0.5). (d) Trajectory with initial condition x(0) = (−1.0, 0.5, 0.7, 0.5).
the case θ ∈ [0, 1/2). Fig. 2(b) shows an expanded view of the first part of the transient behavior of the same trajectory. It can be verified that the trajectory, which starts outside K at t = 0, enters K in finite time and stays in K thereafter. In Fig. 2(c) we have reported the trajectory of (4) obtained for a different initial condition. The trajectory, which starts in K at t = 0, stays in K for all t ≥ 0 and converges in finite time to the same equilibrium point x̄. Finally, Fig. 2(d) depicts the time evolution of a trajectory of (4) starting from yet another initial condition at t = 0. The trajectory converges in finite time to a different equilibrium point, which is a vertex of K corresponding to a local minimum of E constrained to K. We have performed simulations of (4) for other convex and nonconvex QP problems, including cases where there are nonisolated equilibrium points for (4). The simulations have always agreed with the results on exponential convergence, or convergence in finite time, in Theorems 3 and 4.
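To give a flavor of exact-penalty gradient dynamics of the kind used by G-NN, the following minimal sketch simulates, by forward Euler, a hypothetical two-variable convex QP (not the four-variable test problem above): minimize (x1 − 2)² + (x2 − 2)² subject to x1 + x2 ≤ 1. The constrained minimizer is (0.5, 0.5) with Lagrange multiplier 3, so any penalty gain sigma > 3 makes the penalty exact; sigma = 10 and the step size are arbitrary illustrative choices.

```python
# Hypothetical toy problem (not from the paper): exact-penalty gradient flow
#   min (x1-2)^2 + (x2-2)^2   s.t.   x1 + x2 <= 1
# integrated by forward Euler; E(x) = f(x) + sigma * max(0, x1 + x2 - 1).

def gnn_step(x, h=1e-3, sigma=10.0):
    """One forward-Euler step of xdot = -g, with g a subgradient of E at x."""
    g = [2.0 * (x[0] - 2.0), 2.0 * (x[1] - 2.0)]  # gradient of f
    if x[0] + x[1] > 1.0:                          # constraint "neuron" active
        g[0] += sigma                              # add penalty subgradient
        g[1] += sigma
    return [x[0] - h * g[0], x[1] - h * g[1]]

def simulate(x0, steps=20000):
    """Integrate the penalized gradient flow from x0."""
    x = list(x0)
    for _ in range(steps):
        x = gnn_step(x)
    return x
```

Starting outside the constraint's active region, the discrete trajectory reaches the face x1 + x2 = 1 and then slides along it (with small chattering of order h·sigma) toward the constrained minimizer (0.5, 0.5), mimicking the sliding-mode behavior of the ideal-diode constraint neurons.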
VII. CONCLUSION

This paper has addressed trajectory convergence for a class of NNs aimed at solving LP and convex or nonconvex QP problems. The main result is that each trajectory of these networks is either exponentially convergent, or convergent in finite time, toward a singleton. Moreover, these convergence properties hold independently of the nature of the set of equilibrium points; hence, they hold even when the network possesses infinitely many nonisolated equilibrium points. The proof exploits a new method for convergence, based on the use of a nonsmooth Łojasiewicz inequality for the generalized gradient vector field defining the NN dynamics. The obtained results are of interest for the fast computation of the global optimal solution of convex LP and QP problems, and for the computation of local optimal solutions of combinatorial nonconvex QP problems. In the case of nonconvex QP problems, this paper has pointed out some issues for future investigation, such as the possibility of using such networks to implement new classes of neural associative memories or other neural devices for real-time signal processing.
APPENDIX I

For the sake of completeness, in the following we give a chain rule for computing the derivative of the composition of a regular function and an absolutely continuous function, which plays a key role in some results in this paper. The proof of the rule follows the lines of that of [28, Lemma 1] (see also [29, Th. 2.2]).

Property 4 (chain rule): Suppose that V : ℝⁿ → ℝ is regular in ℝⁿ and that x(·) : [0, +∞) → ℝⁿ is absolutely continuous on any compact interval of [0, +∞). Then, x(·) and V(x(·)) are differentiable for a.a. t ≥ 0, and we have

  (d/dt) V(x(t)) = ⟨ν, ẋ(t)⟩ for all ν ∈ ∂V(x(t)).   (26)

Proof: Function V(x(·)) is absolutely continuous on any compact interval of [0, +∞), since it is the composition of a locally Lipschitz function V and an absolutely continuous function x(·). Hence, x(·) and V(x(·)) are differentiable for a.a. t ≥ 0.

Suppose that x(·) and V(x(·)) are differentiable at t. Then, we have

  (d/dt) V(x(t)) = lim_{h→0} (V(x(t + h)) − V(x(t)))/h.

Indeed, being x(t + h) = x(t) + h ẋ(t) + o(h) as h → 0, and since V is locally Lipschitz, it follows that

  (d/dt) V(x(t)) = lim_{h→0} (V(x(t) + h ẋ(t)) − V(x(t)))/h.

Now, by letting h → 0⁺ in this limit, and considering that V is regular, we obtain

  (d/dt) V(x(t)) = V°(x(t); ẋ(t)) = max{⟨ν, ẋ(t)⟩ : ν ∈ ∂V(x(t))}

where we have taken into account the relation between the generalized directional derivative and the generalized gradient in [18, Th. 9.61]. Furthermore, by letting h → 0⁻ in the same limit, and considering again that V is regular, we obtain

  (d/dt) V(x(t)) = −V°(x(t); −ẋ(t)) = min{⟨ν, ẋ(t)⟩ : ν ∈ ∂V(x(t))}.

Therefore, we conclude that ⟨ν, ẋ(t)⟩ takes the same value for all ν ∈ ∂V(x(t)), and that this common value equals (d/dt)V(x(t)), which proves (26).

APPENDIX II

Lemma 1: Consider the polyhedron K given in (2), and let x ∈ K. Then, the following properties hold.

1) The normal cone to K at x is the convex cone generated by the external unit normal vectors of the constraints that are active at x.

2) Consider the (n − 1)-dimensional hyperplanes containing the faces of K, and denote the external unit normal vector to each of them accordingly. Then, point x belongs to the intersection of the hyperplanes corresponding to the constraints that are active at x.

3) The external unit normal vectors of the constraints that are active at x span ℝⁿ if and only if x is a vertex of K.
4) There exists a neighborhood of x such that, at every point of K belonging to that neighborhood, the set of active constraints is contained in the set of constraints that are active at x.

5) The tangent cone to K at x is the set of vectors v such that ⟨n_i, v⟩ ≤ 0 for any index i of a constraint that is active at x, where n_i denotes the corresponding external unit normal vector.

Proof: The property in 1) is a direct consequence of Theorem 4.1.10 and the Duality Theorem 1.1.8 in [17]. The remaining properties follow from [18, Th. 6.46].

APPENDIX III

To verify (17), it suffices to expand both sides and take into account the symmetry of the matrix defining the quadratic form.
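The chain rule in Property 4 can be checked numerically on a simple regular (convex) function. The sketch below uses V(x) = |x1| + |x2| along the smooth trajectory x(t) = (cos t, sin t); both the function and the trajectory are illustrative choices, not taken from the paper, and the check is valid at times where both coordinates are nonzero (so the generalized gradient is a singleton).

```python
import math

def V(x):
    # convex, hence Clarke-regular, nonsmooth function
    return abs(x[0]) + abs(x[1])

def traj(t):
    # smooth (hence absolutely continuous) test trajectory
    return (math.cos(t), math.sin(t))

def traj_dot(t):
    return (-math.sin(t), math.cos(t))

def chain_rule_value(t):
    """<nu, xdot(t)> for nu in the generalized gradient of V at x(t);
    away from the axes the gradient is the singleton (sign(x1), sign(x2))."""
    x, xd = traj(t), traj_dot(t)
    nu = (math.copysign(1.0, x[0]), math.copysign(1.0, x[1]))
    return nu[0] * xd[0] + nu[1] * xd[1]

def numeric_derivative(t, h=1e-6):
    """Central finite-difference approximation of (d/dt) V(x(t))."""
    return (V(traj(t + h)) - V(traj(t - h))) / (2.0 * h)
```

At, e.g., t = 0.5 and t = 2.0 the finite-difference derivative of V(x(t)) agrees with the inner product prescribed by (26) up to discretization error.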
ACKNOWLEDGMENT

The authors would like to thank the reviewers and the Associate Editor for their insightful and constructive comments, which helped to improve the presentation of the results in this paper.

REFERENCES

[1] D. W. Tank and J. J. Hopfield, "Simple 'neural' optimization networks: an A/D converter, signal decision circuit, and a linear programming circuit," IEEE Trans. Circuits Syst., vol. 33, no. 5, pp. 533–541, May 1986.
[2] M. P. Kennedy and L. O. Chua, "Neural networks for nonlinear programming," IEEE Trans. Circuits Syst., vol. 35, no. 5, pp. 554–562, May 1988.
[3] A. Rodríguez-Vázquez, R. Domínguez-Castro, A. Rueda, J. L. Huertas, and E. Sánchez-Sinencio, "Nonlinear switched-capacitor 'neural' networks for optimization problems," IEEE Trans. Circuits Syst., vol. 37, no. 3, pp. 384–398, Mar. 1990.
[4] A. Cichocki and R. Unbehauen, Neural Networks for Optimization and Signal Processing. Chichester, U.K.: Wiley, 1993.
[5] E. K. P. Chong, S. Hui, and S. H. Żak, "An analysis of a class of neural networks for solving linear programming problems," IEEE Trans. Autom. Control, vol. 44, no. 11, pp. 1995–2006, Nov. 1999.
[6] X. Liang and J. Wang, "A recurrent neural network for nonlinear optimization with a continuously differentiable objective function and bound constraints," IEEE Trans. Neural Netw., vol. 11, no. 6, pp. 1251–1262, Nov. 2000.
[7] X. Gao, L. Z. Liao, and W. Xue, "A neural network for a class of convex quadratic minimax problems with constraints," IEEE Trans. Neural Netw., vol. 15, no. 3, pp. 622–628, May 2004.
[8] H. Qi and L. Qi, "Deriving sufficient conditions for global asymptotic stability of delayed neural networks via nonsmooth analysis," IEEE Trans. Neural Netw., vol. 15, no. 1, pp. 99–109, Jan. 2004.
[9] L. V. Ferreira, E. Kaszkurewicz, and A. Bhaya, "Solving systems of linear equations via gradient systems with discontinuous right hand sides: application to LS-SVM," IEEE Trans. Neural Netw., vol. 16, no. 2, pp. 501–505, Mar. 2005.
[10] Y. Xia and J. Wang, "A recurrent neural network for solving nonlinear convex programs subject to linear constraints," IEEE Trans. Neural Netw., vol. 16, no. 2, pp. 379–386, Mar. 2005.
[11] M. Forti, P. Nistri, and M. Quincampoix, "Generalized neural network for nonsmooth nonlinear programming problems," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 9, pp. 1741–1754, Sep. 2004.
[12] M. Forti and A. Tesi, "Absolute stability of analytic neural networks: an approach based on finite trajectory length," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 12, pp. 2460–2469, Dec. 2004.
[13] J. Palis and W. De Melo, Geometric Theory of Dynamical Systems. Berlin, Germany: Springer-Verlag, 1982.
[14] S. Łojasiewicz, "Sur le problème de la division," Studia Math., vol. 18, pp. 87–136, 1959.
[15] S. Łojasiewicz, "Ensembles semi-analytiques," Institut des Hautes Études Scientifiques Notes, 1965.
[16] J. P. Aubin and A. Cellina, Differential Inclusions. Berlin, Germany: Springer-Verlag, 1984.
[17] J. P. Aubin and H. Frankowska, Set-Valued Analysis. Boston, MA: Birkhäuser, 1990.
[18] R. T. Rockafellar and R. J.-B. Wets, Variational Analysis. Berlin, Germany: Springer-Verlag, 1997.
[19] C. A. Floudas and V. Visweswaran, "Quadratic optimization," in Handbook of Global Optimization, R. Horst and P. M. Pardalos, Eds. London, U.K.: Kluwer, 1995.
[20] D. P. Bertsekas, "Necessary and sufficient conditions for a penalty method to be exact," Math. Programming, vol. 9, pp. 87–99, 1975.
[21] M. P. Glazos, S. Hui, and S. H. Żak, "Sliding modes in solving convex programming problems," SIAM J. Control Optim., vol. 36, pp. 680–697, Mar. 1998.
[22] I. Ekeland and R. Temam, Convex Analysis and Variational Problems. Amsterdam, The Netherlands: North-Holland/Elsevier, 1976.
[23] J. Bolte, A. Daniilidis, and A. Lewis, "The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems (condensed version)," J. Math. Anal. Appl., 2006 [Online]. Available: http://legacy.orie.cornell.edu/~aslewis/, submitted for publication.
[24] S. Łojasiewicz, "Sur les trajectoires du gradient d'une fonction analytique," in Seminari di Geometria 1982–1983. Bologna, Italy: Università di Bologna, Istituto di Geometria, Dipartimento di Matematica, 1984, pp. 115–117.
[25] S. H. Żak, V. Upatising, and S. Hui, "Solving linear programming problems with neural networks: a comparative study," IEEE Trans. Neural Netw., vol. 6, no. 1, pp. 94–104, Jan. 1995.
[26] L. O. Chua and L. Yang, "Cellular neural networks: theory," IEEE Trans. Circuits Syst., vol. 35, no. 10, pp. 1257–1272, Oct. 1988.
[27] V. Visweswaran and C. A. Floudas, "A global optimization algorithm (GOP) for certain classes of nonconvex NLPs—II. Application of theory and test problems," Comput. Chem. Eng., vol. 14, pp. 1419–1434, 1990.
[28] A. Bacciotti and F. Ceragioli, "Nonsmooth optimal regulation and discontinuous stabilization," Abstract Appl. Anal., vol. 2003, pp. 1159–1195, 2003.
[29] B. Paden and D. Shevitz, "Lyapunov stability theory of nonsmooth systems," IEEE Trans. Autom. Control, vol. 39, no. 9, pp. 1910–1914, Sep. 1994.

Mauro Forti received the degree in electronic engineering from the University of Florence, Florence, Italy, in 1988. From 1991 to 1998, he was an Assistant Professor in applied mathematics and network theory at the Electronic Engineering Department, the University of Florence. In 1998, he joined the Department of Information Engineering, the University of Siena, Siena, Italy, where he is currently a Professor of Electrical Engineering.
His main research interests are in the field of nonlinear circuits and systems, with emphasis on the qualitative analysis and stability of circuits modeling artificial neural networks. His research activity also includes some aspects of electromagnetic compatibility. Dr. Forti has been an Associate Editor of the IEEE TRANSACTIONS ON NEURAL NETWORKS since 2001. He has also served as an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: FUNDAMENTAL THEORY AND APPLICATIONS from 2002 to 2004.
Paolo Nistri was born in Rignano sull'Arno, Florence, Italy, in 1948. He received the Dr. degree in mathematics from the University of Florence, Florence, Italy, in 1972. From 1974 to 1979, he was an Assistant Professor at the University of Calabria, Cosenza, Italy. In 1979, he became a member of the Engineering Faculty, the University of Florence, where, from 1982 to 1998, he was an Associate Professor of Mathematical Analysis. Since 1998, he has been a member of the Engineering Faculty, the University of Siena, Siena, Italy, where he is a Full Professor of Mathematical Analysis. He is the author of around one hundred scientific publications, and he has been an organizer and coordinator of several international scientific activities. His main scientific interests are in the field of dynamical systems, mathematical control theory, differential inclusions, and topological methods. Dr. Nistri is an Associate Editor of international mathematical journals and acts as a referee for many of them.
Marc Quincampoix received the Ph.D. degree and the habilitation degree from University Paris-Dauphine, Paris, France, in 1991 and 1996, respectively. He was Maître de Conférences at the University of Tours, Tours, France, from 1991 to 1994, and at the University Paris-Dauphine from 1994 to 1996. Since 1996, he has been a Professor at the Department of Mathematics, the University of Brest, Brest, France. Since 2002, he has been a Coordinator of the European Network "Evolution Equations." His fields of interest include control theory, differential games, differential inclusions, and deterministic and stochastic viability.