IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 27, NO. 2, FEBRUARY 2016
A Generalized Hopfield Network for Nonsmooth Constrained Convex Optimization: Lie Derivative Approach

Chaojie Li, Xinghuo Yu, Fellow, IEEE, Tingwen Huang, Guo Chen, and Xing He
Abstract— This paper proposes a generalized Hopfield network for solving general constrained convex optimization problems. First, the existence and the uniqueness of solutions to the generalized Hopfield network in the Filippov sense are proved. Then, the Lie derivative is introduced to analyze the stability of the network using a differential inclusion. The optimality of the solution to the nonsmooth constrained optimization problems is shown to be guaranteed by the enhanced Fritz John conditions. The convergence rate of the generalized Hopfield network can be estimated by the second-order derivative of the energy function. The effectiveness of the proposed network is evaluated on several typical nonsmooth optimization problems and used to solve the hierarchical and distributed model predictive control four-tank benchmark.

Index Terms— Constraints, enhanced Fritz John conditions, Filippov solutions, global optimization, Hopfield network, Lie derivative.
I. INTRODUCTION

MANY practical applications are concerned with nonsmooth constrained optimization problems, which involve nondifferentiable objective functions, for example, compressed sensing [1], machine learning [2], information security [3], [4], smart grids [5], and controller design in system control [6], [7]. With advances in sensing technologies, wireless communications, and big data, industrial problems are becoming increasingly complicated. The real-time
Manuscript received August 14, 2014; revised July 7, 2015 and October 13, 2015; accepted October 22, 2015. Date of publication November 18, 2015; date of current version January 18, 2016. This work was supported in part by the National Priorities Research Program through the Qatar National Research Fund, a member of Qatar Foundation, under Grant NPRP 4-1162-1-181, in part by the Fundamental Research Funds through the Australian Research Council Discovery Scheme under Grant 140100544, in part by the Fundamental Research Funds for Central Universities under Project XDJK2014C118, in part by the Natural Science Foundation of China under Grant 61403313, and in part by the Natural Science Foundation Project through the China Sustainable Transportation Center, Chongqing, under Grant cstc2014jcyjA40014. (Corresponding author: Xing He.)
C. Li and X. Yu are with the School of Electrical and Computer Engineering, RMIT University, Melbourne, VIC 3000, Australia (e-mail: [email protected]; [email protected]).
T. Huang is with Texas A&M University at Qatar, Doha 23874, Qatar (e-mail: [email protected]).
G. Chen is with the School of Electrical and Information Engineering, The University of Sydney, Sydney, NSW 2006, Australia (e-mail: [email protected]).
X. He is with the School of Electronics and Information Engineering, Southwest University, Chongqing 400715, China (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TNNLS.2015.2496658
performance of optimization is very important for applications. At present, most mathematical programming solvers, such as CPLEX and GAMS, are based on numerical optimization techniques. These built-in soft computing algorithms may not solve the optimization problem in real time, especially for large-scale and nonconvex optimization problems. Circuit-based neural networks have the potential to solve these challenging problems in real time. Consequently, there is an urgent need for high-performance and low computational cost algorithms, such as the Hopfield networks [8]. Hopfield-type networks have been successfully applied to associative memory and to solving constrained optimization problems [9]. Many Hopfield-type networks have been developed for different convex optimization problems (see details in [10] and [11]). For instance, the primal–dual assignment network was designed in [12], and the related work on the assignment problem was described in [23]. Globally convergent optimization neural networks were proposed in [16]. Real-time semidefinite programming problems were solved in [15]. Projection neural networks were analyzed for the pseudomonotone variational inequality and the pseudoconvex optimization problem. For particular applications, the k-winners-take-all model was solved in [20] and [21] by using a neural network [13]. The scheduling shortest path routing problem was solved by a discrete-time neural network [17]. In an attempt to solve compressed sensing problems, the l1-norm estimation has been reformulated into the form of a neural network in [18], [19], and [22]. Typically, the sparse approximation problem was solved by introducing a locally competitive algorithm in [24]. For nonsmooth optimization problems, using Hopfield networks would introduce discontinuity, which has a negative impact on the convergence of the neural networks. Forti et al.
[32] proposed a novel neural network for solving nonsmooth nonlinear programming problems with inequality constraints in terms of the Filippov differential inclusion. Based on the first-order necessary conditions, they obtained a global optimal solution and evaluated the finite convergence time by the first-order gradient. Hopfield networks for solving nonsmooth nonlinear convex programming problems with linear equality and inequality constraints can be found in [36] where the exact penalty function was introduced to handle constraints. A general nonconvex optimization problem was solved by a nonsmooth neural network [37]. The smoothing approximate technique-based neural network [40]
2162-237X © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
was employed to solve a class of constrained non-Lipschitz optimization problems. It is noticeable that, in these works, the time derivative of the Lyapunov function is calculated solely according to the chain rule. A convergence analysis of neural networks with discontinuous activations was reported in [33] and [34] using the same technique. Applications of discontinuous neural networks to bilevel programming problems were investigated in [44] and [45]. However, for discontinuous systems, nonsmooth Lyapunov functions may not result in a stable trajectory, and counterexamples can be found in [31]. New techniques, such as the extended LaSalle's invariance principle [30], are required to theoretically guarantee the stability of neural networks governed by differential inclusions. On the other hand, it is crucial to evaluate the convergence performance of neural networks. The existing methods for estimating the finite-time convergence rate handle the objective function and its constraints separately due to the discontinuity of the penalty function, which adds conservatism to the evaluation of the convergence rate. Therefore, it is essential to develop a new approach that addresses the convergence analysis of neural networks by considering the penalized objective function as a whole. In this paper, a generalized Hopfield network is proposed for solving general constrained convex optimization problems. First, the existence and the uniqueness of solutions to the generalized Hopfield network in the Filippov sense are proved. Then, the Lie derivative is introduced to analyze the stability of the network using a differential inclusion. The optimality of the solution to the nonsmooth constrained optimization problems is shown to be guaranteed by the enhanced Fritz John conditions. It is shown that the convergence rate of the generalized Hopfield network can be estimated by the second-order derivative of the energy function.
The effectiveness of the proposed network is evaluated on several typical nonsmooth optimization problems, and used to solve the hierarchical and distributed (HD) model predictive control (MPC) four-tank benchmark.

The remainder of this paper is organized as follows. Section II gives the required preliminaries. In Section III, first, the model of the generalized Hopfield network is proposed. Second, the existence and the uniqueness of the Filippov solutions to the Hopfield network are discussed. Third, the optimality of the solution presented by the Hopfield network is obtained under certain conditions. In Section IV, three numerical examples and the HD-MPC four-tank benchmark are provided to illustrate the effectiveness and high performance of the generalized Hopfield network. Finally, the conclusion is drawn in Section V.

II. PROBLEM FORMULATION AND PRELIMINARIES

Before proceeding, some notations and definitions are introduced. Consider the following constrained optimization problem:

    min f(x)
    s.t. g(x) ≤ 0
         h(x) = 0    (1)

where x = (x_1, x_2, ..., x_n)^T ∈ R^n and f(x): R^n → R is a convex function but not necessarily differentiable. g(x) = (g_1(x), g_2(x), ..., g_m(x))^T: R^n → R^m is an m-dimensional vector-valued function in which the g_i (i = 1, 2, ..., m): R^n → R are convex and not required to be differentiable. h(x) = (a_1^T x − b_1, a_2^T x − b_2, ..., a_p^T x − b_p)^T (p ≤ n): R^n → R^p, where the vectors {a_j} (j = 1, 2, ..., p) are linearly independent. Denote

    G_i = 1 if g_i(x) > 0;  [0, 1] if g_i(x) = 0;  0 if g_i(x) < 0    (2)
    H_j = 1 if h_j(x) > 0;  [−1, 1] if h_j(x) = 0;  −1 if h_j(x) < 0    (3)

where i = 1, 2, ..., m and j = 1, 2, ..., p. Denote the feasible domain C of problem (1) by C = {x | g_1(x) ≤ 0, ..., g_m(x) ≤ 0} ∩ {x | h_1(x) = 0, ..., h_p(x) = 0}.

Definition 1: A function f: R^n → R is said to be locally Lipschitz at x ∈ R^n if there exist positive constants L and μ such that |f(y) − f(y′)| ≤ L‖y − y′‖ for all y, y′ ∈ B_n(x, μ), where | · | is the absolute value and ‖ · ‖ is the vector norm.

Definition 2: A function f is said to be locally Lipschitz on D ⊂ R^n if it is locally Lipschitz at x for all x ∈ D.

Definition 3: A function f: R^n → R is said to be regular at x ∈ R^n if, for all ν ∈ R^n, the right directional derivative of f at x in the direction of ν, denoted by f′(x; ν), exists and coincides with the generalized directional derivative of f at x in the direction of ν, denoted by f°(x; ν). Note that a continuously differentiable function at x is locally Lipschitz and regular at x. A locally Lipschitz function is differentiable almost everywhere in the sense of the Lebesgue measure.

Definition 4: If Ω_f denotes the set of points in R^n at which f fails to be differentiable, and S denotes any other set of measure zero, the generalized gradient of f is defined as

    ∂f(x) = co{ lim_{i→+∞} ∇f(x_i) | x_i → x, x_i ∉ S ∪ Ω_f }

where co denotes the convex closure.

Let Ln: 2^{R^n} → 2^{R^n} be the set-valued mapping that associates to each subset S of R^n the set of its least-norm elements Ln(S). If the set S is convex, then Ln(S) reduces to a singleton, which consists of the orthogonal projection of 0 onto S.

Definition 5: For a locally Lipschitz function f, the generalized gradient vector field Ln(∂f): R^n → R^n is defined as x ↦ Ln(∂f)(x) ≜ Ln(∂f(x)). In particular, −Ln(∂f)(x) is a direction of descent of f at x ∈ R^n.

Now, consider the differential equation

    ẋ = X(x(t))    (4)
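The least-norm generalized gradient of Definition 5 can be made concrete for a simple nonsmooth convex function. The sketch below is our own illustration (the choice f(x) = ‖x‖_1 and all code are assumptions, not from the paper): it computes the per-coordinate generalized-gradient box of the l1 norm and the least-norm element Ln(∂f)(x), whose negative is a descent direction.

```python
import numpy as np

def subgradient_box_l1(x):
    """Generalized gradient of f(x) = ||x||_1 at x: a per-coordinate
    interval [lo_k, hi_k] ({1} if x_k > 0, {-1} if x_k < 0, [-1, 1] at 0)."""
    lo = np.where(x > 0, 1.0, -1.0)
    hi = np.where(x < 0, -1.0, 1.0)
    return lo, hi

def least_norm_element(lo, hi):
    """Ln(S): orthogonal projection of 0 onto the box [lo, hi]."""
    return np.clip(0.0, lo, hi)

x = np.array([2.0, 0.0, -3.0])
lo, hi = subgradient_box_l1(x)
d = -least_norm_element(lo, hi)   # descent direction -Ln(∂f)(x)
```

Note that at the kink x_k = 0 the projection of 0 onto [−1, 1] is 0, so the descent direction leaves that coordinate untouched, as expected for a minimizing coordinate.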
where X: R^n → R^n is measurable and essentially locally bounded. If X: R^n → R^n is continuous, then for all x_0 ∈ R^n there exists a classical solution of (4) with the initial condition x(0) = x_0. Nevertheless, if X is discontinuous, classical solutions might not exist. Therefore, we focus on the solution of this equation in the Filippov sense.

Definition 6: For each x ∈ R^n, define the Filippov set-valued map K[X]: R^n → 2^{R^n} by

    K[X](x) ≜ ∩_{δ>0} ∩_{μ(S)=0} co{X(B_n(x, δ) \ S)}    (5)

where μ denotes the usual Lebesgue measure in R^n. A Filippov solution of (4) on an interval [t_0, t_1] ⊂ R is defined as a solution of the differential inclusion

    ẋ ∈ K[X](x)    (6)

where X: R^n → R^n is the discontinuous vector field.

In order to analyze the discontinuities and nonsmooth Lyapunov functions, we introduce the Lie derivative and the invariant set to depict the evolution of a function along the trajectories.

Definition 7: Given a locally Lipschitz function f: R^n → R and a set-valued map K[X]: R^n → 2^{R^n}, the set-valued Lie derivative L_{K[X]} f: R^n → 2^R of f with respect to X at x is defined as

    L_{K[X]} f(x) = {a ∈ R | ∃ν ∈ K[X](x) such that ζ · ν = a, ∀ζ ∈ ∂f(x)}.    (7)

Definition 8: If f is defined on a vector space, then the upper Dini derivative of f at x in the direction d is defined as

    D^+ f(x) = lim sup_{η→0^+} [f(x + ηd) − f(x)]/η.    (8)

By Definition 7, let f: R^n → R be locally Lipschitz and regular; then the derivative of f(t, x(t)) with respect to time t satisfies

    df(t, x(t))/dt ∈ L_{K[X]} f(t, x(t))    (9)

for almost every t on its domain. Note that the Lie derivative is more general than the traditional derivative. Therefore, we have

    D^+ f(t, x(t)) = max L_{K[X]} f(t, x(t)).    (10)

Definition 9: A critical point of f: R^n → R is a point x ∈ R^n such that 0 ∈ ∂f(x). According to this definition, if f is locally Lipschitz at x ∈ R^n, then x* ∈ R^n is a critical point when f attains a local minimum or maximum at x*.

Lemma 1 (Filippov Set-Valued Map of Nonsmooth Gradient [30]): If f: R^n → R is locally Lipschitz, then the Filippov set-valued map K[Ln(∂f)]: R^n → 2^{R^n} of the nonsmooth gradient of f is equal to the generalized gradient ∂f: R^n → 2^{R^n} of f, that is, for x ∈ R^n

    K[Ln(∂f)](x) = ∂f(x).    (11)

If the objective function of problem (1) is nonsmooth, the gradient must be defined as a set-valued map. This lemma establishes the set-valued map of the nonsmooth gradient in the sense of the Filippov solution.

Lemma 2 (Enhanced Fritz John Conditions): Let x* be a local minimum of problem (1). Then, there exist scalars μ*_0, μ*_i (i = 1, 2, ..., m), and λ*_j (j = 1, 2, ..., p) satisfying the following conditions.
1) 0 ∈ −[μ*_0 ∂f(x*) + Σ_{i=1}^m μ*_i ∂g_i(x*) + Σ_{j=1}^p λ*_j ∇h_j(x*)].
2) μ*_i ≥ 0 for all i = 1, 2, ..., m.
3) μ*_0, μ*_i (i = 1, 2, ..., m), and λ*_j (j = 1, 2, ..., p) are not all equal to 0.
4) If the set I ∪ J is nonempty, where I = {i | μ*_i ≠ 0} and J = {j | λ*_j ≠ 0}, then, given any neighborhood B of x* and any ε > 0, there is an x ∈ B such that

    f(x) < f(x*),  g_i(x) > 0 ∀i ∈ I,  λ*_j h_j(x) > 0 ∀j ∈ J
    g_i(x) ≤ εψ(x) ∀i ∉ I,  |h_j(x)| ≤ εψ(x) ∀j ∉ J

where ψ(x) = min{min{g_i(x) | i ∈ I}, min{|h_j(x)| | j ∈ J}}.

Proof: See the Appendix.

Remark 1: Compared with the Karush–Kuhn–Tucker (KKT) conditions, the enhanced Fritz John conditions are less restrictive and cover more classes of optimal points. For example, under the KKT conditions, the Lagrange multiplier on the gradient of the objective function cannot be zero. This is the main reason why the enhanced Fritz John conditions are employed to verify the optimality of the solution. Lemma 2 shows the existence of Lagrange multipliers, which bridges the classical constraint qualification and the exact penalty function. In particular, when both g_i(x) (i = 1, 2, ..., m) and h_j(x) (j = 1, 2, ..., p) are linear, the existence of Lagrange multipliers is undetermined by the classical Fritz John conditions [27]. Nevertheless, it can be guaranteed by the enhanced Fritz John conditions. Moreover, the existence of the exact penalty functions is ensured in this paper and is more general than the results in [32], [33], [37], and [39]–[41].
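Condition 1) of Lemma 2 can be checked numerically at a candidate minimizer. The toy instance below is our own example, not from the paper: for min |x_1 + 1| + |x_2 + 1| s.t. x_1 + x_2 + 2 = 0, the minimizer x* = (−1, −1) gives ∂f(x*) = [−1, 1]^2, and the code searches a grid of multipliers λ* for one that places a valid subgradient inside this box with μ*_0 = 1.

```python
import numpy as np

def fritz_john_holds(grad_h, mu0=1.0):
    """Check condition 1) of the enhanced Fritz John conditions for
    f(x) = |x1+1| + |x2+1| at x* = (-1,-1), where ∂f(x*) = [-1,1]^2:
    find lambda such that s = -(lambda * grad_h)/mu0 lies in the box,
    i.e., 0 ∈ mu0*∂f(x*) + lambda*grad_h."""
    for lam in np.linspace(-2.0, 2.0, 401):
        s = -(lam * grad_h) / mu0        # required subgradient of f
        if np.all(np.abs(s) <= 1.0):     # s inside the box [-1,1]^2?
            return True, lam
    return False, None

ok, lam = fritz_john_holds(np.array([1.0, 1.0]))
```

Any |λ*| ≤ 1 works here; in particular λ* = 0 with the zero subgradient, which matches the fact that x* is also the unconstrained minimizer of f.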
III. MODEL DESCRIPTION

This section describes a novel model for solving the nonsmooth constrained optimization problem (1). Consider the associated energy function

    E(x, y) = (1/2)(f(x) − y)^2 + (1/2) Σ_{i=1}^m g_i^+(x)^2 + (1/2) Σ_{j=1}^p |h_j(x)|^2    (12)

where g_i^+(x) = max{0, g_i(x)}.

Remark 2: The monotonicity of (f(x) − y)^2 depends on ∂f(x) for a fixed value of y. This means that (f(x) − y)^2
and f(x) share the same convexity as long as f(x) − y ≥ 0. On the other hand, (f(x) − y)^2 is also convex in y for fixed x. It is noticeable that the penalty terms g_i^+(x)^2 (i = 1, 2, ..., m) and |h_j(x)|^2 (j = 1, 2, ..., p) are convex with respect to x. Therefore, E(x, y): R^n × R → R is a continuous convex function. This condition plays a significant role in guaranteeing global optimality. Note that it still holds for the case of a nonsmooth convex function f(x).

When f(x) and g(x) are nonsmooth functions accompanied by equality constraints h(x), the nonsmooth gradient flow of E(x, y) does not admit the existence and uniqueness of classical solutions. By Lemma 1, the corresponding Hopfield network is governed by the differential inclusion K[−∂E](x, y)

    ẋ(t) ∈ −[(f(x(t)) − y(t)) ∂f(x(t)) + Σ_{i=1}^m G_i g_i^+(x(t)) ∂g_i(x(t)) + Σ_{j=1}^p H_j |h_j(x(t))| ∇h_j(x(t))]
    ẏ(t) = f(x(t)) − y(t)    (13)

where x(t): R → R^n and y(t): R → R are the time-dependent variables with initial values x(0) and y(0), corresponding to the variables x of the objective function f(x) and the augmented variable y. ∂f(x) and ∂g_i(x) are the generalized gradients, and ∇h_j(x) is the classical gradient. Note that the generalized gradients are dynamically weighted by the term f(x) − y. By Definition 9, the neurons x enforce the trajectories into the feasible region C and eventually attain an equilibrium of the neural network (13). Meanwhile, the neuron y serves as an indicator of optimal solutions for the energy function (12), and the limit of y is the global minimal value of f(x) for problem (1).

A. Existence and Uniqueness of Filippov Solutions

As for the Hopfield network (13), these vector fields are discontinuous, and therefore we shall consider their solutions in the Filippov sense.

Theorem 1: Suppose that problem (1) is solved by the Hopfield network (13).
Then, the network has a unique Filippov solution starting from any initial value (x(0), y(0)).

Proof: See the Appendix.

Note that if the Hopfield network is governed by a differential inclusion, it is essential to investigate the existence and uniqueness of the Filippov solution; otherwise, neither the solution nor its optimality can be guaranteed. This theorem shows that the convexity of the objective function and the constraints has a significant impact on the existence and uniqueness of the solution. Conversely, the Filippov solution may not exist for some nonconvex functions or constraints.

Remark 3: Since y is governed by the ordinary differential equation ẏ = f(x) − y, the Hopfield network (13) has a unique classical solution for y. In addition, f(x), g_i(x) (i = 1, 2, ..., m), and h_j(x)
(j = 1, 2, ..., p) are locally Lipschitz, which implies that ∂f(x), ∂g_i(x), and ∇h_j(x) are locally bounded. Hence, the differential inclusion K[−∂_x E](x, y) is one-sided Lipschitz, and the uniqueness of the Filippov solution for x is guaranteed [29].

B. Convergence Analysis

It is known that system stability is closely linked to convergence. A common approach to stability analysis is the Lyapunov stability theory. The energy function E(x, y) is a good candidate for the stability analysis using the nonsmooth Lyapunov stability theory [29], [30].

Theorem 2: Given any initial value (x(0), y(0)), the Hopfield network (13) is asymptotically stable with respect to ℰ, the set of its equilibrium points, such that

    lim_{t→+∞} E(x(t), y(t)) = 0.    (14)

Proof: See the Appendix.

The convergence property of the Hopfield network is given by Theorem 2, where the corresponding energy function E(x, y) converges to zero along the steepest descent direction with respect to t. However, this is a general characteristic of the Hopfield network; whether its equilibria lie in the feasible domain also needs to be considered. As is known, E(x, y) consists of f(x), g_i(x) (i = 1, 2, ..., m), and h_j(x) (j = 1, 2, ..., p). If E(x, y) converges, then f(x), g_i(x) (i = 1, 2, ..., m), and h_j(x) (j = 1, 2, ..., p) share this property. Denote the energy functions of the inequality and equality constraints as follows:

    E_i^g(x) = g_i^+(x)^2, i = 1, 2, ..., m    (15)
    E_j^h(x) = h_j(x)^2, j = 1, 2, ..., p.    (16)

The following theorems present the convergence properties for f(x), g_i(x) (i = 1, 2, ..., m), and h_j(x) (j = 1, 2, ..., p), respectively.

Theorem 3: Given the initial values (x(0), y(0)), all the constraints g_i(x(0)) (i = 1, 2, ..., m) and h_j(x(0)) (j = 1, 2, ..., p) converge to 0 as t → ∞, in the following sense.
1) There exists a scalar τ_i^g > 0 such that

    g_i(x(t)) = g_i(x(0)) e^{−τ_i^g t}, i = 1, 2, ..., m.    (17)

2) There exists a scalar τ_j^h > 0 such that

    h_j(x(t)) = h_j(x(0)) e^{−τ_j^h t}, j = 1, 2, ..., p.    (18)

Proof: See the Appendix.

Remark 4: The energy functions constructed from the inequality constraints g_i(x) (i = 1, 2, ..., m) and the equality constraints h_j(x) (j = 1, 2, ..., p) were analyzed separately. The analysis shows that both kinds of constraints converge to 0 as t → ∞. Note that the trajectories of g_i(x) (i = 1, 2, ..., m) and h_j(x) (j = 1, 2, ..., p) merely depict the tendency of the penalty function for problem (1), that is, for each active constraint, its value eventually tends to zero. Nevertheless, the real trajectory of x(t) and f(x) should
consider both constraints simultaneously when analyzing the energy function E(x, y).

Theorem 4: Given the energy function E(x, y) in (12), there exist some π_i^g (i = 1, 2, ..., m) and π_j^h (j = 1, 2, ..., p) such that

    df(x(t))/dt ∈ −{ ‖∂_x E(x, y)‖^2/(f(x) − y) + Σ_{i=1}^m G_i g_i^+(x) π_i^g g_i(x(0)) e^{−π_i^g t} + Σ_{j=1}^p H_j |h_j(x)| π_j^h h_j(x(0)) e^{−π_j^h t} }.

Then, all the trajectories of the Hopfield network (13) starting at an initial value x(0) ∉ C and y(0) converge toward the feasible set C as t tends to infinity, which implies that each equilibrium point of the Hopfield network lies in the feasible set C.

Proof: See the Appendix.

As discussed in Theorem 3, both energy functions have the capacity to converge. Thus, a trajectory of the Hopfield network may start from outside the feasible domain and eventually evolve into a point of the feasible domain. In other words, this theorem shows that each equilibrium x_e, as a part of (x_e, y_e) ∈ ℰ, lies in the feasible set C, which is the prerequisite for searching for the global solution of problem (1). Under certain conditions, the optimal solution can be obtained.

C. Optimality Analysis

In this section, E(x, y) is assumed to be convex, locally Lipschitz, and regular. Assume that there exists a global optimal solution (x*, y*) such that

    E(x*, y*) = 0.    (19)

Denote by x̄ the global optimal solution of problem (1) without any constraints, let ȳ = f(x̄) be the corresponding optimal value, and let ρ = (ȳ − y)/(f(x) − f(x̄)).

Theorem 5: For any initial value (x(0), y(0)) of the Hopfield network (13), if

    ρ_0 = (ȳ − y(0))/(f(x(0)) − f(x̄))    (20)

is sufficiently large, then the trajectory of the Hopfield network (13) converges to its equilibrium point (x*, y*) such that

    lim_{t→+∞} ‖x(t) − x*‖ = 0    (21)
    lim_{t→+∞} |y(t) − f(x*)| = 0    (22)

where x* ∈ M and M is the set of optimal solutions.

Proof: See the Appendix.

Remark 5: The condition under which the Hopfield network guarantees the optimality of the solution has been characterized, and it is easily satisfied. Note that the optimal solution x̄ of the unconstrained convex problem is readily determined by its subdifferential containing zero, i.e., 0 ∈ ∂f(x̄). Provided that the initial value y(0) is far less than the value f(x̄), ȳ − y(0) will be
sufficiently large, and f(x(0)) − f(x̄) is positive and bounded for any initial value x(0), which implies that ρ_0 is large enough to guarantee optimality.

D. Finite-Time Convergence With the Second-Order Derivative

The finite-time convergence of the Hopfield network was initially introduced in [32], which estimates the finite time through the first-order derivative of an energy function. Mathematically speaking, it is widely accepted that first-order information is insufficient to claim finite-time convergence. From the perspective of convex optimization, the Hessian matrix of a convex function plays a significant role in the analysis of a convergence rate. Correspondingly, we should take the second-order derivative into account. For many problems, however, we cannot afford to compute a Hessian matrix, or even store an n × n matrix. Moreover, the nonsmoothness of f(x), g_i(x), g_i^+(x), and |h_j(x)| complicates the computation. Therefore, a new methodology is needed to handle this task. Fortunately, because x and y converge simultaneously to the optimal solutions during the dynamical evolution, it is possible to concentrate solely on the dynamical behavior of y. In other words, if the finite-time convergence of the neuron y can be determined, the finite-time convergence of the neurons x is obtained, and ultimately that of the Hopfield network. Denote the second-order derivative of the energy function (12) with respect to y by ∂²E(x, y)/∂y².

Proposition 1: The following Hopfield network shares similar dynamical behavior with the Hopfield network (13) and solves problem (1):

    ẋ(t) ∈ −∂_x E(x, y)
    ẏ(t) ∈ −γ ∂_y E(x, y)    (23)

where γ is a positive scaling parameter that accelerates the convergence rate of the Hopfield network.

Proof: The proof is straightforward.
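A forward-Euler discretization gives a quick way to prototype the scaled network (23). The sketch below is our own illustration: the toy problem min |x| (no constraints, so E(x, y) = (f(x) − y)^2/2), the step size, γ, and y(0) are all assumed values chosen for numerical stability, not values from the paper.

```python
import numpy as np

# Forward-Euler sketch of the scaled network (23) for min f(x) = |x|,
# where E(x, y) = (f(x) - y)^2 / 2 and y(0) is chosen far below f(x(0))
# so that rho_0 in (20) is large. All parameters are illustrative.
def run_network(gamma, x0=3.0, y0=-100.0, dt=1e-3, steps=20000):
    x, y = x0, y0
    for _ in range(steps):
        s = np.sign(x)                   # a subgradient of |x|
        x -= dt * (abs(x) - y) * s       # x_dot ∈ -∂x E = -(f - y) ∂f
        y += dt * gamma * (abs(x) - y)   # y_dot = -γ ∂y E = γ (f - y)
    return x, y

x_end, y_end = run_network(gamma=5.0)
```

With y(0) = −100 the weight f − y stays positive throughout, x chatters into the kink at 0, and y rises toward the optimal value f(x*) = 0, illustrating the role of y as an indicator of the optimum.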
By the smoothness in y, the finite-time convergence theorem can be established for the Hopfield network to solve general convex optimization problems.

Theorem 6: Let ε > 0; x̃ is an ε-optimal solution of problem (1) if, for any x ∈ C, f(x) ≥ f(x̃) − ε. Given a required accuracy ε, the neurons of the Hopfield network (23) reach the ε-optimality of problem (1) before time T, where

    T = (f(x(0)) − y(0) − ε)/γ.    (24)

Proof: See the Appendix.

By considering (20), y(0) is far less than f(x̄). The upper bound T can be approximated by

    T ≈ ρ_0 (f(x(0)) − f(x̄))/γ.    (25)

Proposition 1 shows that γ is a controllable parameter for solving problem (1). Hence, the larger γ, the faster the convergence rate of the Hopfield network. On the other hand, y(0) is required to be
Fig. 1. Trajectories of x(t) on the contour.

Fig. 2. Trajectories of x(t) and y(t) in nonsmooth convex optimization problem (27).
small enough for the optimality, which may slow down the convergence speed.

IV. PERFORMANCE EVALUATION

In this section, we illustrate the performance of the Hopfield network (23) by numerical examples and one application.

A. Numerical Examples

First, the effectiveness of our results is demonstrated by three simple simulations.

Example 1: Consider the following nonsmooth convex optimization problem:

    min |x_1 + 1| + |x_2 + 1|
    s.t. x_1^2 + x_2^2 ≤ 3
         x_1 + x_2 + 2 = 0.    (26)

This simple problem has two variables and is used to illustrate the basic idea of the proposed Hopfield network. Obviously, the global optimal solution is located at the point (−1, −1). Ten initial values x(0) are randomly selected from −5 to 5, and y(0) = −5000. The results are given in Fig. 1, where each trajectory of (x_1, x_2) starts from an initial point and ultimately ends in the red circle. The contour is constructed in terms of x_1 and x_2 from −5 to 5, where the different levels are denoted by different colors. The phase graph depicts steepest-descent-like trajectories, and all the simulation results show that the Hopfield network (23) always globally converges to the optimal solution.

Example 2: Consider the nonsmooth convex optimization problem in [42]

    min (x_1 − x_2 + x_3)^2 + |x_1^2 + 2x_3| + |x_2 − 1| + 20(x_3 − 2)^2 + e^{−x_1−x_2−x_3}
    s.t. x_1^2 + 0.5e^{x_1−1} − 6x_2 − 4x_3 + 2 ≤ 0
         x_3^3 − 2 ≤ 0
         x_1 + 3x_2 + 0.2x_3 = 1
         x ∈ Ω = {x : x_i ≥ 0, i = 1, 2, 3}.    (27)

The global optimal solution is located at (0, 0.2493, 1.26), and the optimal value is 15.47. Ten initial values are randomly chosen within the interval [−5, 5], and y(0) = −2000. The system states of the Hopfield network are used to portray the procedure. The trajectories of the Hopfield network depicted on the negative interval [−0.001, 0] represent the initial values. The results show that the trajectories starting from different initial values x_i(0) (i = 1, 2, 3) all converge to the unique optimal solution, and y always converges to the global minimum of the objective function (see Fig. 2).

Example 3: Consider the following nonsmooth optimization problem [39]:

    min κ(A) = λ_max(A)/λ_min(A)
    s.t. A ∈ Ω    (28)

where A ∈ R^{d×d} is a symmetric matrix, Ω is a compact convex set, and λ_min(A) and λ_max(A) represent the minimum and maximum eigenvalues of the matrix A, respectively. The gradient of the objective function is computed by

    ∂κ(A) = λ_min(A)^{−1}(∂λ_max(A) − κ(A) ∂λ_min(A)).    (29)

Let

    A = [ a^T x + a_0, 0; 0, c^T x + c_0 ]    (30)

where x = (x_1, x_2, x_3, x_4), a = (−2, −1, 2, 0)^T, c = (1, −1, 2, 1)^T, a_0 = 4, c_0 = 2, and Ω = {x ∈ R^4 | 0 ≤ x ≤ 1}. Correspondingly, the condition number is defined as

    κ(A) = (a^T x + a_0)/(c^T x + c_0) if a^T x + a_0 ≥ c^T x + c_0;
           (c^T x + c_0)/(a^T x + a_0) if c^T x + c_0 > a^T x + a_0    (31)

which is a nonsmooth pseudoconvex optimization problem. Thus, the global optimal solution may not be unique.
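As a sanity check on Example 1, network (23) can be integrated directly with forward Euler and an arbitrary subgradient selection. The parameters below (step size, horizon, γ = 1, y(0) = −200) are our own illustrative choices, smaller in magnitude than those reported in the paper, so that the explicit discretization stays stable.

```python
import numpy as np

# Forward-Euler sketch of network (23) applied to Example 1:
# min |x1+1| + |x2+1|  s.t.  x1^2 + x2^2 <= 3,  x1 + x2 + 2 = 0.
# gamma, dt, the horizon, and y(0) are illustrative choices.
f = lambda x: abs(x[0] + 1.0) + abs(x[1] + 1.0)

x = np.array([2.0, 2.0])
y = -200.0
gamma, dt = 1.0, 1e-3
for _ in range(30000):
    sub_f = np.sign(x + 1.0)            # a subgradient of f
    g = x[0]**2 + x[1]**2 - 3.0         # inequality constraint value
    h = x[0] + x[1] + 2.0               # equality constraint value
    grad = (f(x) - y) * sub_f           # (f - y) ∂f term of -x_dot
    grad += max(g, 0.0) * 2.0 * x       # G g+ ∂g term (active if g > 0)
    grad += h * np.ones(2)              # H |h| ∇h term equals h ∇h
    x = x - dt * grad                   # x_dot ∈ -∂x E
    y = y + dt * gamma * (f(x) - y)     # y_dot = γ (f - y)
```

The trajectory settles near the known optimum (−1, −1) with y approaching the optimal value 0, consistent with Fig. 1.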
Fig. 3. Trajectories of x(t) and y(t) in the minimization of condition number.
For the matrix A, the gradient with respect to each variable x_i (i = 1, 2, 3, 4) is

    ∂κ(A)/∂x_i = [a_i(c^T x + c_0) − c_i(a^T x + a_0)]/(c^T x + c_0)^2 if a^T x + a_0 ≥ c^T x + c_0;
                 [c_i(a^T x + a_0) − a_i(c^T x + c_0)]/(a^T x + a_0)^2 if c^T x + c_0 > a^T x + a_0.    (32)

Substituting into (13) yields the Hopfield network in terms of each variable x_i (i = 1, 2, 3, 4)

    ẋ_i(t) ∈ −(κ(A) − y(t)) ∂κ(A)/∂x_i − max{0, x_i − 1} ∂[max{0, x_i − 1}]/∂x_i − max{0, −x_i} ∂[max{0, −x_i}]/∂x_i
    ẏ(t) = κ(A) − y(t).    (33)

After randomly starting from ten initial values within the interval [−10, 10] with y(0) = −2000, all the neurons of x converge to their own equilibrium points between 0 and 1, while all the neurons of y evolve into the optimal value y* = 1 of this problem (see Fig. 3). In all the above examples, we choose the parameter γ = 2000 for the Hopfield network (23). As mentioned, y can be considered an indicator of the global optimizer. Hence, the proposed Hopfield network can converge to global optimal solutions for general nonsmooth convex optimization problems. For comparison with the smoothing technique-based neural network in [40], ten initial points from [−10, 10] were chosen to simulate the experiments with μ(0) = 100. As shown in Fig. 4, the smoothing technique may not always help to seek the global optimizer. Although most cases could successfully achieve optimality, the accuracy of the results largely depends on the smoothing parameter μ.

Remark 6: The trajectory of y(t) starting from the same initial value will finally converge to the optimal value of problem (1). Once each x(t) evolves into the neighborhood
Fig. 4. (a) Trajectories of x(t) obtained by the method of [40]. (b) Corresponding trajectories of κ( A).
of x ∗ , the evolution of each y(t) would gather together due to the uniqueness of the Filippov solution to the Hopfield network (23). On the other hand, the finite-time convergence mechanism is mainly established by neuron y, namely, T = (( f (x(0)) − y(0) − )/γ ). The convergence time T can be approximated by (γ )−1 |y(0)| if ρ0 is sufficient large, which implies that a different initial value has almost the same T for the same y(0). That is why every y(t) synchronized (see details of y in Figs. 2 and 3). B. Application The MPC is widely applied to the industrial process control where the current control state can be obtained by solving a finite horizon open-loop optimal control problem. Mathematically, this finite horizon open-loop optimal control problem can be considered as a constrained optimization problem in each sampling period, where the current state of the plant is obtained as an initial state. The HD-MPC four-tank benchmark [43] will be considered here. As shown in Fig. 5, the four-tank plant system is governed by the following differential equations: a1 a3 γa 2gh 1 + 2gh 3 + qa (34) h˙1 = − S S S a2 a4 γb 2gh 2 + 2gh 4 + qb (35) h˙2 = − S S S a3 1 − γb qb h˙3 = − 2gh 3 + (36) S S a4 1 − γa h˙4 = − qa 2gh 4 + (37) S S
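The nonlinear tank dynamics (34)-(37) can be integrated directly to sanity-check the model before linearization. A minimal forward-Euler sketch follows; the physical constants below are illustrative placeholders, not the benchmark values of [43].

```python
import numpy as np

# Illustrative constants (NOT the HD-MPC benchmark values of [43])
g = 9.81                                    # gravitational acceleration [m/s^2]
S = 1.0                                     # tank cross section [m^2]
a = np.array([0.02, 0.02, 0.015, 0.015])    # discharge constants
ga, gb = 0.3, 0.3                           # three-way valve ratios
qa, qb = 0.05, 0.05                         # constant pump flows [m^3/s]

def hdot(h):
    """Right-hand side of (34)-(37)."""
    out = np.sqrt(2.0 * g * np.maximum(h, 0.0))   # sqrt(2 g h_i) outflow terms
    return np.array([
        (-a[0] * out[0] + a[2] * out[2] + ga * qa) / S,
        (-a[1] * out[1] + a[3] * out[3] + gb * qb) / S,
        (-a[2] * out[2] + (1.0 - gb) * qb) / S,
        (-a[3] * out[3] + (1.0 - ga) * qa) / S,
    ])

h = np.array([0.2, 0.2, 0.2, 0.2])   # initial water levels [m]
dt = 0.01
for _ in range(20_000):              # 200 s of simulated time
    h = h + dt * hdot(h)
```

With constant pump flows the levels settle to a steady state, around which the linearized discrete-time model (38) is obtained.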
where hi and ai, i = 1, 2, 3, 4, denote the water level and the discharge constant of tank i, respectively; qj and γj, j ∈ {a, b}, represent the flow and the ratio of the three-way valve of pump j, respectively; S is the cross section of each tank; and g is the gravitational acceleration.

Fig. 5. Diagram of the four-tank process.

Denote the deviation variables

xi = hi − hi⁰, i = 1, 2, 3, 4
uj = qj − qj⁰, j ∈ {a, b}
x = (x1, x2, x3, x4), u = (ua, ub), ν = (x1, x2).

Using the zero-order hold method with a sampling period of 5 s, the discrete-time model is obtained as

x(k + 1) = A x(k) + B u(k)
ν(k) = C x(k)    (38)

where

A = [ 0.9705  0       0.0207  0
      0       0.9663  0       0.0195
      0       0       0.9790  0
      0       0       0       0.9802 ]

B = [ 24.6291   0.5213
      −0.1967   32.7684
      0         49.4735
      −19.8011  0 ],    C = [ 1 0 0 0
                              0 1 0 0 ].

With the MPC scheme, x(k|k) = x(k) is the initial state at the kth time instant. An online finite horizon open-loop optimal control problem can be described as follows:

min_{u∈U} J(k, u) = (1/2) uᵀ P̄ u + ḡᵀ u
s.t. b̄l ≤ S̄ u + T̄ x(k|k) ≤ b̄u    (39)

where

S̄ = [ B          0          · · ·  0
      AB         B          · · ·  0
      ⋮          ⋮           ⋱     ⋮
      A^{N−1}B   A^{N−2}B   · · ·  B ],    T̄ = [ A; A²; · · · ; A^N ]

ḡ = (S̄ᵀ Q̄ T̄) x(k|k), P̄ = S̄ᵀ Q̄ S̄ + R̄, R̄ = diag{R, . . . , R}, Q̄ = diag{Q, . . . , Q}, U indicates the constraints imposed on the flow level, and b̄l and b̄u indicate the constraints imposed on flow and water level. Considering the terminal cost, the constrained finite horizon linear quadratic optimization problem can be rewritten as

min J(x, u) = Σ_{k=1}^{N−1} [(1/2) xkᵀ Q xk + (1/2) u_{k−1}ᵀ R u_{k−1}] + ρ‖xN‖₁
s.t. xk = A x_{k−1} + B u_{k−1}
     u̲ ≤ uk ≤ ū
     x̲ ≤ xk ≤ x̄
     x(0) = x0.    (40)

Let zk = [xkᵀ, u_{k−1}ᵀ]ᵀ; then (40) can be expressed as

min_{z∈Ω} f(z) = Σ_{k=1}^{N−1} (1/2) zkᵀ P zk + ρ‖zN‖₁
s.t. Sz = b    (41)

where z = [z1ᵀ, z2ᵀ, . . . , zNᵀ]ᵀ, Ω = Ω1 × Ω2 × · · · × ΩN = {z | [x̲ᵀ, u̲ᵀ]ᵀ ≤ zk ≤ [x̄ᵀ, ūᵀ]ᵀ}, P = [ Q 0; 0 R ], b = [(Ax0)ᵀ 0 · · · 0]ᵀ, and

S = [ I   −B  0   0   · · ·  0   0  0   0
      −A  0   I   −B  · · ·  0   0  0   0
      ⋮   ⋮   ⋮   ⋮    ⋱     ⋮   ⋮  ⋮   ⋮
      0   0   0   0   · · ·  −A  0  I   −B ].

By considering the gradient of f(z), we have

∂f(z)/∂zk = P zk,  k = 1, 2, . . . , N − 1
∂f(z)/∂zN = ρH    (42)

where H ∈ ∂‖zN‖₁. Therefore, the corresponding Hopfield network is given by

żk(t) ∈ −(f(z) − y(t)) ∂f(z)/∂zk − max{0, zk − z̄k} ∂[max{0, zk − z̄k}]/∂zk − max{0, z̲k − zk} ∂[max{0, z̲k − zk}]/∂zk − |Sz − b| Hz Sk
ẏ(t) = f(z) − y(t)    (43)

where Hz ∈ ∂(|Sz − b|), Sk is the sum of the kth column of the matrix S, and z̲k and z̄k are the lower and upper bounds, respectively. In the experiment, the initial state is x0 = [−0.4; 0.6; −0.3; 0.5]. The quadratic cost matrices are chosen as Q = I and R = I, and the parameter ρ of the l1-norm is 0.01. Then, at each time instant k = 1, 2, . . ., the proposed Hopfield network is employed to solve a sequence of constrained optimization problems with y(0) = −200.
The lower and upper bounds on the states and controls are

u̲ = (1/3600)[−1.63; −2], ū = (1/3600)[1.63; 2]
x̲ = [−0.45; −0.46; −0.45; −0.46]
x̄ = [0.71; 0.7; 0.65; 0.64].

The length of the prediction horizon is N = 20. For comparison with other methods, note that the nonautonomous parameter of [41] plays a significant role in the convergence rate of that neural network. Two parameter choices were used to compare performance: ε(t) = 2/(√(10t) + 4) for case 1 and ε(t) = 2/(√t + 4) for case 2. For each time slot, the quadratic optimization problem (41) was solved using the proposed neural network. The cost function values obtained by the different methods are presented in Fig. 6.

Fig. 6. Trajectories of f(z) and y(t) at k = 1.

The dashed lines are the trajectories of the two cases, which imply that an arbitrary choice of the parameter may not result in a fast convergence rate. As shown, the convergence rate of the proposed method is much faster than that of the method in [41]. Iteratively, we obtain the trajectories of the state variables xi (i = 1, 2, 3, 4) and the control signals ui (i = 1, 2) by our method, as shown in Fig. 7.

Fig. 7. (a) Iterative trajectories of xi with the length of prediction horizon N = 20. (b) Iterative trajectories of ui with the length of prediction horizon N = 20.

V. CONCLUSION

In this paper, we have proposed a novel Hopfield network for solving general nonsmooth constrained convex optimization problems. By means of discontinuous dynamical system theory, this Hopfield network has been formulated as a system governed by a differential inclusion. An augmented variable y is introduced as a penalization, which adaptively adjusts the weight of the gradients between the objective function and the penalty function. Conditions for the existence and uniqueness of the Filippov solution of the Hopfield network have been proved, and the Lie derivative has been used to analyze its convergence. Furthermore, optimality has been established in terms of the enhanced Fritz John conditions, and the finite-time convergence of the Hopfield network has been studied. The proposed Hopfield network can adaptively change the descent stepsize. Note that the traditional method has fixed penalty parameters, which are usually required to be greater than a certain threshold to ensure optimality; an unsuitable stepsize may then lead to unstable trajectories that fail to achieve optimality. Noticeably, a practical implementation of such neural networks must include safeguards that gradually increase the penalty parameters when the energy function of the constraints does not decrease fast enough, or when the trajectory is unstable. Our proposed approach avoids this problem.

APPENDIX

A. Proof of Lemma 2

Let a sequence of quadratic penalty functions for problem (1) be defined as

(k/2) Σ_{i=1}^{m} gi⁺(x)² + (k/2) Σ_{j=1}^{p} |hj(x)|², for each k = 1, 2, . . . , ∞.    (44)

We have a sequence of objective functions

min_{x∈X} V^k(x) ≡ (1/2)(f(x) − f(x*))² + (k/2) Σ_{i=1}^{m} gi⁺(x)² + (k/2) Σ_{j=1}^{p} |hj(x)|² + (1/2)‖x − x*‖²    (45)

where X = {x | ‖x − x*‖ ≤ ε} and gi⁺(x) = max{0, gi(x)}. Obviously, for all x ∈ X, f(x) − f(x*) ≥ 0. Thus, there exists some positive number V̄ such that V^k(x) ≤ V̄. There exists an optimal solution x^k minimizing the quadratic penalty function for each specific k, such that

V^k(x^k) ≤ (k/2) Σ_{i=1}^{m} gi⁺(x*)² + (k/2) Σ_{j=1}^{p} |hj(x*)|²    (46)
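The mechanism behind (44)-(46) can be seen on a one-dimensional toy problem of my own choosing (not the paper's): minimize f(x) = x² subject to h(x) = x − 1 = 0. The penalized minimizer has a closed form and approaches the constrained optimum x* = 1 as the penalty weight k grows.

```python
def penalty_minimizer(k):
    # argmin_x [ x**2 + (k/2)*(x - 1)**2 ]: setting 2x + k*(x - 1) = 0
    # gives x^k = k / (k + 2)
    return k / (k + 2.0)

for k in (1, 10, 100, 1000):
    xk = penalty_minimizer(k)
    print(k, xk, abs(xk - 1.0))   # constraint violation shrinks like 2/(k + 2)
```

The violation |h(x^k)| = 2/(k + 2) vanishes as k grows, mirroring (47) below, while x^k converges to x*.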
which implies that

lim_{k→∞} Σ_{i=1}^{m} gi⁺(x^k)² = 0;  lim_{k→∞} Σ_{j=1}^{p} |hj(x^k)|² = 0.    (47)

The limit point x̄ of x^k lies in the feasible domain and

(1/2)(f(x̄) − f(x*))² + ‖x̄ − x*‖² ≤ 0.    (48)

Therefore, ‖x̄ − x*‖² = 0, which indicates that the sequence x^k converges to x* as k → ∞. On the other hand, for k ≥ k̄, x^k lies in the feasible domain, and we have

∇V^k(x^k)ᵀ y ≥ 0    (49)

for y in the tangent cone of the feasible domain. One obtains

−{(f(x^k) − f(x*))∂f(x^k) + Σ_{i=1}^{m} ξi^k ∂gi(x^k) + Σ_{j=1}^{p} ζj^k ∇hj(x^k) + x^k − x*} ∈ NX(x^k)    (50)

where ξi^k = kGi gi⁺(x^k) and ζj^k = kHj|hj(x^k)|, and NX(x^k) is the normal cone of X at x^k. When f(x) is convex, NX(x*) = 0. By defining

δ^k = √((f(x^k) − f(x*))² + Σ_{i=1}^{m} (ξi^k)² + Σ_{j=1}^{p} (ζj^k)²)    (51)

we have

−{μ0^k ∂f(x^k) + Σ_{i=1}^{m} μi^k ∂gi(x^k) + Σ_{j=1}^{p} λj^k ∇hj(x^k) + (1/δ^k)(x^k − x*)} ∈ NX(x^k)    (52)

where μ0^k = (f(x^k) − f(x*))/δ^k, μi^k = ξi^k/δ^k, and λj^k = ζj^k/δ^k.

Note that (μ0^k)² + Σ_{i=1}^{m} (μi^k)² + Σ_{j=1}^{p} (λj^k)² = 1, so the sequences μ0^k, μi^k (i = 1, 2, . . . , m), and λj^k (j = 1, 2, . . . , p) are bounded and converge to μ0*, μi* (i = 1, 2, . . . , m), and λj* (j = 1, 2, . . . , p), respectively. It can be seen that conditions 1)-3) are satisfied. For condition 4), assume that some inequality and equality constraints are active. Thus, I ∪ J is nonempty; otherwise, μi^k (i = 1, 2, . . . , m) and λj^k (j = 1, 2, . . . , p) are all equal to 0. Since μi^k = ξi^k/δ^k and λj^k = ζj^k/δ^k, for sufficiently large k̄, we have μi* μi^k > 0 for all i ∈ I and λj* λj^k > 0 for all j ∈ J. Therefore, in the neighborhood of X,

μi^k ≤ min{min{μi^k | i ∈ I}, min{|λj^k| | j ∈ J}}  ∀i ∈ I
|λj^k| ≤ min{min{μi^k | i ∈ I}, min{|λj^k| | j ∈ J}}  ∀j ∈ J

which proves condition 4). The proof is completed.

B. Proof of Theorem 1

Since f(x), gi(x) (i = 1, 2, . . . , m), and hj(x) (j = 1, 2, . . . , p) are convex on Rⁿ, the multivalued mapping K[−∂E](x, y): Rⁿ × R → 2^{Rⁿ×R} is upper semicontinuous with nonempty, compact, convex values, and is locally bounded. Consequently, the existence of a Filippov solution to the Hopfield network (13) is guaranteed.

Denote z = (x, y). Let z and z̄ be Filippov solutions of the Hopfield network starting at (x(0), y(0)), and define ψ(z) = (1/2)‖z − z̄‖². Note that the convexity of E(x, y) implies that its subdifferential is a monotone operator. One has

ψ̇(z) = ⟨z − z̄, ż − ż̄⟩
      = ⟨x − x̄, ẋ − ẋ̄⟩ + ⟨y − ȳ, ẏ − ẏ̄⟩
      = ⟨x − x̄, −∂x E(x, y) + ∂x̄ E(x̄, ȳ)⟩ + ⟨y − ȳ, −∂y E(x, y) + ∂ȳ E(x̄, ȳ)⟩
      ≤ 0.

By considering the initial value ψ(z(0)) = 0, ψ(z) remains equal to zero. Thus, z = z̄ for any initial condition. Hence, the Hopfield network (13) has a unique Filippov solution starting from any initial value (x(0), y(0)).

C. Proof of Theorem 2

The energy function E(x, y): Rⁿ × R → R is a convex function composed of a locally Lipschitz continuous function and an absolutely continuous function. Then, dE(x(t), y(t))/dt exists almost everywhere. Correspondingly, the upper right-hand Dini derivative of E(x, y) with respect to t is given by

D⁺E(x(t), y(t)) = lim sup_{η→0⁺} [E(x(t) + ηẋ(t), y(t) + ηẏ(t)) − E(x(t), y(t))]/η.    (53)

By (10), we have

D⁺E(x(t), y(t)) = max L_{K[−∂E]}[E(x(t), y(t))]
                = max ⟨[ζx ζy], [νx νy]ᵀ⟩
                = max{−‖∂x E(x, y)‖² − ‖∂y E(x, y)‖²}    (54)

where ζx ∈ ∂x E(x, y), ζy ∈ ∂y E(x, y), νx ∈ K[−∂x E](x, y), and νy ∈ K[−∂y E](x, y). Note that ⟨ζx, νx⟩ may not exist, because f(x) and g(x) are nonsmooth functions with respect to x, and L_{K[−∂x E]}[E(x(t), y(t))] = ∅ at some x. On the other hand, ⟨ζy, νy⟩ always exists, because E(x(t), y(t)) is continuously differentiable with respect to y. By the results in [31, Th. 2],

D⁺E(x(t), y(t)) ≤ max{−‖∂y E(x, y)‖²}    (55)

and

inf_{(x,y)∈B\E} ‖∂y E(x, y)‖² = 0    (56)

where B is any neighborhood of the set of equilibrium points E. Therefore, any trajectory starting from an initial value (x(0), y(0)) ∉ E asymptotically converges to the set of equilibrium points E, such that

lim_{t→+∞} E(x(t), y(t)) = 0.    (57)

The proof is completed.

D. Proof of Theorem 3
The energy functions (15) and (16) are continuously differentiable. For any x(0) ∉ C, one has gi(x) > 0 and |h(x)| ≠ 0. We have

∂x Ei^g(x) = Gi gi⁺(x) ∂gi(x)    (58)
∂x Ej^h(x) = Hj |hj(x)| ∇hj(x).    (59)

Sequentially, the Lie derivatives of Ei^g(x) and Ej^h(x) are

L_{K[−∂x Ei^g]} Ei^g(x) = −‖Gi gi⁺(x) ∂gi(x)‖²    (60)
L_{K[−∂x Ej^h]} Ej^h(x) = −‖Hj |hj(x)| ∇hj(x)‖².    (61)

Substituting into (9) gives

dEi^g(x(t))/dt = −‖Gi gi⁺(x) ∂gi(x)‖²    (62)
dEj^h(x(t))/dt = −‖Hj |hj(x)| ∇hj(x)‖².    (63)

Hence, the derivatives of gi(x) and hj(x) with respect to t are obtained as

dgi(x(t))/dt ∈ −‖Gi gi⁺(x) ∂gi(x)‖² / (Gi gi⁺(x)) = −Gi ‖∂gi(x)‖² gi(x)    (64)
dhj(x(t))/dt ∈ −‖Hj |hj(x)| ∇hj(x)‖² / (Hj |hj(x)|) = −Hj ‖∇hj(x)‖² hj(x).    (65)

Thus, there exist τi^g and τj^h such that

τi^g = inf_{x∈B\C} Gi ‖∂gi(x)‖²,  τj^h = inf_{x∈B\C} Hj ‖∇hj(x)‖²

where B is any neighborhood of the feasible set C. Therefore, we have

gi(x(t)) = gi(x(0)) e^{−τi^g t}, i = 1, 2, . . . , m    (66)
hj(x(t)) = hj(x(0)) e^{−τj^h t}, j = 1, 2, . . . , p.    (67)

This implies that if τi^g and τj^h are positive numbers, then gi(t, x) and hj(t, x), together with the energy functions Ei^g(x) and Ej^h(x), converge to 0. The proof is completed.

E. Proof of Theorem 4

By Theorem 2, the Lie derivative of E(x, y) for neuron x is obtained as

L_{K[−∂x E]} E(x, y) = −‖∂x E(x, y)‖².    (68)

Thus, the derivative of f(x) is determined by

(f(x) − y) df(x(t))/dt ∈ −{‖∂x E(x, y)‖² − Σ_{i=1}^{m} Gi gi⁺(x) dgi(x)/dt − Σ_{j=1}^{p} Hj |hj(x)| dhj(x)/dt}.    (69)

Likewise, there exist some πi^g and πj^h such that, for the energy function E(x, y),

gi(x(t)) = gi(x(0)) e^{−πi^g t}, i = 1, 2, . . . , m
hj(x(t)) = hj(x(0)) e^{−πj^h t}, j = 1, 2, . . . , p.    (70)

Therefore, we have

(f(x) − y) df(x(t))/dt ∈ −{‖∂x E(x, y)‖² + Σ_{i=1}^{m} Gi gi⁺(x) πi^g gi(x(0)) e^{−πi^g t} + Σ_{j=1}^{p} Hj |hj(x)| πj^h hj(x(0)) e^{−πj^h t}}.    (71)

Although the terms gi(x) (i = 1, 2, . . . , m) and hj(x) (j = 1, 2, . . . , p) contribute to the increase of f(x), by Theorem 3, gi(x) and hj(x) vanish to 0 for each active constraint as t tends to infinity, although not monotonically. In addition, by Theorem 2, ‖∂x E(x, y)‖² converges no matter where the initial value (x(0), y(0)) lies. Then, f(x) converges once the right-hand side of (69) attains a balance between −‖∂x E(x, y)‖² and Σ_{i=1}^{m} Gi gi⁺(x)(dgi(x)/dt) + Σ_{j=1}^{p} Hj |hj(x)|(dhj(x)/dt). Meanwhile, the limit of each initial value x(0) will be attracted into the feasible domain C if x(0) ∉ C.

F. Proof of Theorem 5

Given any initial value x(0), by the definition of ρ, there exists a y(0) such that ρ0 is sufficiently large, satisfying

f(x(0)) − y(0) = (1 + ρ0)[f(x(0)) − f(x̌)]    (72)

where x̌ denotes the unconstrained minimizer of f, i.e., 0 ∈ ∂f(x̌).
This implies that, in the Hopfield network (13), (f(x(0)) − y(0))∂f(x(0)) significantly dominates Σ_{i=1}^{m} Gi gi⁺(x(0))∂gi(x(0)) + Σ_{j=1}^{p} Hj |hj(x(0))|∇hj(x(0)), whereas ẏ = f(x) − y makes a significant contribution to the convergence of y in the differential equation. The convergence of the solution (x, y) approximately follows the steepest descent direction of E(x, y) without the effect of the constraint penalties:

ẋ(t) ∈ −(f(x(t)) − y(t)) ∂f(x(t))
ẏ(t) = f(x(t)) − y(t).    (73)

Obviously, in this gradient flow, ‖x(t) − x̌‖ decreases fast toward the neighborhood of 0, while y swiftly increases in the direction of f(x̌). By Theorem 1, the Hopfield network (13) admits a unique solution for any initial point (x(0), y(0)), which implies that (x, y) rapidly converges toward (x̌, y̌). Meanwhile, f(x) declines during the same period. Nevertheless, f(x) never hits f(x̌) directly, and the trajectory of (x, y) does not converge to (x̌, y̌), because the effect of the constraints cannot be ignored.

Denote different time instants by tk (k = 1, 2, . . . , ∞) during the evolution of the Hopfield network. There exists t1 > 0 such that

f(x̌) ≤ f(x(t1)) ≤ f(x*)    (74)
f(x(t1)) − y(t1) > 0.    (75)

Since 0 ∈ ∂f(x̌), one has ξ approaching 0 in the sense of gradient descent, where ξ ∈ ∂f(x(t1)) and

0 ∈ −[(f(x(t1)) − y(t1))∂f(x(t1)) + Σ_{i=1}^{m} Gi gi⁺(x(t1))∂gi(x(t1)) + Σ_{j=1}^{p} Hj |hj(x(t1))|∇hj(x(t1))].    (76)

Thus, (x(t1), y(t1)) is not an equilibrium point of the Hopfield network (13). Note that f(x(t1)) − y(t1) > 0; therefore, Σ_{i=1}^{m} Gi gi⁺(x)∂gi(x) + Σ_{j=1}^{p} Hj |hj(x)|∇hj(x) gradually dominates (f(x) − y)∂f(x) during the dynamical evolution. From Theorem 4, f(x) tends to increase because of active constraints, i.e., g⁺(x) > 0 and |h(x)| ≠ 0. Then, there exists t2 > t1 such that

f(x(t1)) ≤ f(x(t2)) ≤ f(x*)    (77)
f(x(t2)) − y(t2) > 0    (78)

and

0 ∈ −[(f(x(t2)) − y(t2))∂f(x(t2)) + Σ_{i=1}^{m} Gi gi⁺(x(t2))∂gi(x(t2)) + Σ_{j=1}^{p} Hj |hj(x(t2))|∇hj(x(t2))].    (79)

After increasing, (f(x) − y)∂f(x) dominates again, so that f(x) decreases along with E(x, y). Since y is monotonically increasing with respect to t, the dynamical evolution of f(x) repeats this procedure until some tk > tk̄ for sufficiently large k such that

f(x(tk̄)) ≤ f(x(tk)) ≤ f(x*)    (80)
f(x(tk)) − y(tk) ≤ ε    (81)

where ε takes a value arbitrarily close to but greater than zero, k̄ represents the lower bound of k at which f(x(tk)) reaches ε-optimality, and

0 ∈ −[(f(x(tk)) − y(tk))∂f(x(tk)) + Σ_{i=1}^{m} Gi gi⁺(x(tk))∂gi(x(tk)) + Σ_{j=1}^{p} Hj |hj(x(tk))|∇hj(x(tk))].    (82)

Obviously, (x(tk), y(tk)) is in the neighborhood of (x*, y*) and ultimately converges to it. Moreover, f(x(tk)) approaches f(x*). On the other hand, if we denote

μ0^k = f(x(tk)) − y(tk)    (83)
μi^k = Gi gi⁺(x(tk)), i = 1, 2, . . . , m    (84)
λj^k = Hj |hj(x(tk))|, j = 1, 2, . . . , p    (85)

and substitute into (82), then

0 ∈ −[μ0^k ∂f(x(tk)) + Σ_{i=1}^{m} μi^k ∂gi(x(tk)) + Σ_{j=1}^{p} λj^k ∇hj(x(tk))]    (86)

satisfying condition 1) of Lemma 2. Correspondingly, μi^k ≥ 0 for i = 0, 1, 2, . . . , m, which shows that condition 2) is satisfied. μ0^k, μi^k (i = 1, 2, . . . , m), and λj^k (j = 1, 2, . . . , p) are not all equal to 0 because at least μ0^k > 0, which indicates condition 3). Let I = {i | μi^k ≠ 0} and J = {j | λj^k > 0}; one has

μi^k gi(x(tk)) > 0 ∀i ∈ I    (87)
λj^k hj(x(tk)) > 0 ∀j ∈ J    (88)
gi(x(tk)) ≤ ψ(x(tk)) ∀i ∈ I    (89)
|hj(x(tk))| ≤ ψ(x(tk)) ∀j ∈ J    (90)

where

ψ(x(tk)) = min{min{gi(x(tk)) | i ∈ I}, min{|hj(x(tk))| | j ∈ J}}.

Then, condition 4) holds. Thus, every neighborhood of the optimal solution must contain all x(tk) with sufficiently large k. Note that the limit of x(t) is x* in the feasible domain C. Therefore, x* is an optimal solution of problem (1) such that

lim_{t→+∞} ‖x(t) − x*‖ = 0    (91)
lim_{t→+∞} ‖y(t) − f(x*)‖ = 0.    (92)

The proof is completed.
G. Proof of Theorem 6

For the second-order derivative of y, one has

d²y/dt² = −γ (∂²E(x, y)/∂y²) · dy/dt = −γ dy/dt.    (93)

Denote w(t) = ẏ(t). Substituting into (93), we have

ẇ(t) = −γ w(t)    (94)
ẅ(t) = γ² w(t).    (95)

By Theorem 5, w(t) = f(x(t)) − y(t) is positive at the initial point and monotonically decreasing. Given a required accuracy ε > 0, we have

‖y − y*‖ ≤ ε    (96)

where y is an indicator of the global minimizer with ε-optimality. Thus, there exists a T* such that w(T*) = ε. For t ∈ [0, T*),

ẇ(t) ≤ −γε    (97)
ẅ(t) ≥ γ²ε.    (98)

Denote ẇ(t) = φ(t), which gives

φ(t) = φ(0) + ∫₀ᵗ (dφ(s)/ds) ds ≥ φ(0) + γ²εt.    (99)

On the other hand, for t ∈ [0, T*), φ(t) ≤ −γε, so that φ(0) + γ²εt ≤ −γε. Since φ(0) = ẇ(0) = −γw(0) by (94), this yields

t ≤ (w(0) − ε)/(γε).    (100)

Therefore, for any initial value w(0) = f(x(0)) − y(0), the finite time for the Hopfield network (23) to reach ε-optimality is upper bounded by

T = (f(x(0)) − y(0) − ε)/(γε).    (101)

The proof is completed.
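Since (94) gives w(t) = w(0)e^{−γt}, the exact time to reach w = ε is ln(w(0)/ε)/γ, and the bound (101) indeed dominates it because r − 1 ≥ ln r for r = w(0)/ε ≥ 1. A quick check with illustrative values (the first tuple mimics the experiments' γ = 2000 and a large negative y(0)):

```python
import math

def exact_time(w0, gamma, eps):
    # Solve w0 * exp(-gamma * T) = eps for T, using w' = -gamma * w from (94)
    return math.log(w0 / eps) / gamma

def bound_time(w0, gamma, eps):
    # Upper bound (101): T = (f(x(0)) - y(0) - eps) / (gamma * eps)
    return (w0 - eps) / (gamma * eps)

for w0, gamma, eps in [(2001.0, 2000.0, 1e-3), (10.0, 1.0, 0.1), (5.0, 0.5, 0.01)]:
    print(f"w0={w0}: exact {exact_time(w0, gamma, eps):.4f}"
          f" <= bound {bound_time(w0, gamma, eps):.4f}")
```

The bound is loose for small γε but remains finite for any initial value, which is the point of Theorem 6.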
R EFERENCES [1] D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006. [2] C. J. C. Burges, R. Ragno, and Q. V. Le, “Learning to rank with nonsmooth cost functions,” in Proc. Adv. Neural Inf. Process. Syst., vol. 19. Vancouver, BC, Canada, Dec. 2006, pp. 193–200. [3] S. Wen, Z. Zeng, T. Huang, and Y. Zhang, “Exponential adaptive lag synchronization of memristive neural networks via fuzzy method and applications in pseudorandom number generators,” IEEE Trans. Fuzzy Syst., vol. 22, no. 6, pp. 1704–1713, Dec. 2014. [4] S. Wen, Z. Zeng, T. Huang, Q. Meng, and W. Yao, “Lag synchronization of switched neural networks via neural activation function and applications in image encryption,” IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 7, pp. 1493–1502, Jul. 2015. [5] C. Li, X. Yu, W. Yu, T. Huang, and Z.-W. Liu, “Distributed eventtriggered scheme for economic dispatch in smart grids,” IEEE Trans. Ind. Informat., doi: 10.1109/TII.2015.2479558. [6] P. Apkarian and D. Noll, “Controller design via nonsmooth multidirectional search,” SIAM J. Control Optim., vol. 44, no. 6, pp. 1923–1949, 2006. [7] S. Wen, Z. Zeng, T. Huang, X. Yu, and M. Xiao, “New criteria of passivity analysis for fuzzy time-delay systems with parameter uncertainties,” IEEE Trans. Fuzzy Syst., doi: 10.1109/TFUZZ.2015.2417913. [8] J. J. Hopfield, “Neural networks and physical systems with emergent collective computational abilities,” Proc. Nat. Acad. Sci. USA, vol. 79, no. 8, pp. 2554–2558, 1982. [9] D. Tank and J. J. Hopfield, “Simple ‘neural’ optimization networks: An A/D converter, signal decision circuit, and a linear programming circuit,” IEEE Trans. Circuits Syst., vol. 33, no. 5, pp. 533–541, May 1986. [10] M. P. Kennedy and L. O. Chua, “Neural networks for nonlinear programming,” IEEE Trans. Circuits Syst., vol. 35, no. 5, pp. 554–562, May 1988.
[11] J. Wang, “Recurrent neural networks for computing pseudoinverses of rank-deficient matrices,” SIAM J. Sci. Comput., vol. 18, no. 5, pp. 1479–1493, 1997. [12] J. Wang and Y. Xia, “Analysis and design of primal–dual assignment networks,” IEEE Trans. Neural Netw., vol. 9, no. 1, pp. 183–194, Jan. 1998. [13] J. Wang, “Analysis and design of a k-winners-take-all model with a single state variable and the heaviside step activation function,” IEEE Trans. Neural Netw., vol. 21, no. 9, pp. 1496–1506, Sep. 2010. [14] X. Yu, M. O. Efe, and O. Kaynak, “A general backpropagation algorithm for feedforward neural networks learning,” IEEE Trans. Neural Netw., vol. 13, no. 1, pp. 251–254, Jan. 2002. [15] D. Jiang and J. Wang, “A recurrent neural network for real-time semidefinite programming,” IEEE Trans. Neural Netw., vol. 10, no. 1, pp. 81–93, Jan. 1999. [16] Y. Xia and J. Wang, “A general methodology for designing globally convergent optimization neural networks,” IEEE Trans. Neural Netw., vol. 9, no. 6, pp. 1331–1343, Nov. 1998. [17] Y. Xia and J. Wang, “A discrete-time recurrent neural network for shortest-path routing,” IEEE Trans. Autom. Control, vol. 45, no. 11, pp. 2129–2134, Nov. 2000. [18] Y. Xia, “A compact cooperative recurrent neural network for computing general constrained L 1 norm estimators,” IEEE Trans. Signal Process., vol. 57, no. 9, pp. 3693–3697, Sep. 2009. [19] Y. Xia, C. Sun, and W. X. Zheng, “Discrete-time neural network for fast solving large linear L 1 estimation problems and its application to image restoration,” IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 5, pp. 812–820, May 2012. [20] X. Hu and J. Wang, “Solving pseudomonotone variational inequalities and pseudoconvex optimization problems using the projection neural network,” IEEE Trans. Neural Netw., vol. 17, no. 6, pp. 1487–1499, Nov. 2006. [21] X. Hu and B. 
Zhang, “An alternative recurrent neural network for solving variational inequalities and related optimization problems,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 39, no. 6, pp. 1640–1645, Dec. 2009. [22] X. Hu, C. Sun, and B. Zhang, “Design of recurrent neural networks for solving constrained least absolute deviation problems,” IEEE Trans. Neural Netw., vol. 21, no. 7, pp. 1073–1086, Jul. 2010. [23] X. Hu and J. Wang, “Solving the assignment problem using continuoustime and discrete-time improved dual networks,” IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 5, pp. 821–827, May 2012. [24] A. Balavoine, J. Romberg, and C. J. Rozell, “Convergence and rate analysis of neural networks for sparse approximation,” IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 9, pp. 1377–1389, Sep. 2012. [25] F. H. Clarke, Optimization and Nonsmooth Analysis. New York, NY, USA: Wiley, 1983. [26] D. P. Bertsekas, Nonlinear Programming, 2nd ed. Belmont, MA, USA: Athena Scientific, 1999. [27] D. P. Bertsekas, A. E. Ozdaglar, and P. Tseng, “Enhanced Fritz John conditions for convex programming,” SIAM J. Optim., vol. 16, no. 3, pp. 766–797, 2006. [28] A. F. Filippov, “Differential equations with discontinuous righthand sides,” in Mathematics and Its Applications (Soviet Series). Boston, MA, USA: Kluwer, 1988. [29] M. Kunze, Non-Smooth Dynamical Systems (Lecture Notes in Mathematics), vol. 1744. Berlin, Germany: Springer-Verlag, 2000, p. 232. [30] J. Cortés, “Discontinuous dynamical systems: A tutorial on solutions, nonsmooth analysis, and stability,” IEEE Control Syst., vol. 28, no. 3, pp. 36–73, Jun. 2008. [31] A. Bacciotti and F. Ceragioli, “Stability and stabilization of discontinuous systems and nonsmooth Lyapunov functions,” ESAIM, Control Optim. Calculus Variat., vol. 4, pp. 361–376, Jun. 1999. [32] M. Forti, P. Nistri, and M. Quincampoix, “Generalized neural network for nonsmooth nonlinear programming problems,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 
9, pp. 1741–1754, Sep. 2004. [33] M. Forti, M. Grazzini, P. Nistri, and L. Pancioni, “Generalized Lyapunov approach for convergence of neural networks with discontinuous or non-Lipschitz activations,” Phys. D, Nonlinear Phenomena, vol. 214, no. 1, pp. 88–99, 2006. [34] A. Hosseini, J. Wang, and S. M. Hosseini, “A recurrent neural network for solving a class of generalized convex optimization problems,” Neural Netw., vol. 44, pp. 78–86, Aug. 2013.
[35] Q. Liu and J. Wang, “Two k-winners-take-all networks with discontinuous activation functions,” Neural Netw., vol. 21, nos. 2–3, pp. 406–413, Mar./Apr. 2008. [36] Q. Liu and J. Wang, “Finite-time convergent recurrent neural network with a hard-limiting activation function for constrained optimization with piecewise-linear objective functions,” IEEE Trans. Neural Netw., vol. 22, no. 4, pp. 601–613, Apr. 2011. [37] Q. Liu and J. Wang, “A one-layer recurrent neural network for constrained nonsmooth optimization,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 41, no. 5, pp. 1323–1333, Oct. 2011. [38] Q. Liu, Z. Guo, and J. Wang, “A one-layer recurrent neural network for constrained pseudoconvex optimization and its application for dynamic portfolio optimization,” Neural Netw., vol. 26, no. 1, pp. 99–109, Feb. 2012. [39] Q. Liu and J. Wang, “A one-layer projection neural network for nonsmooth optimization subject to linear equalities and bound constraints,” IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 5, pp. 812–824, May 2013. [40] W. Bian and X. Chen, “Smoothing neural network for constrained non-Lipschitz optimization with applications,” IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 3, pp. 399–411, Mar. 2012. [41] W. Bian and X. Xue, “Neural network for solving constrained convex optimization problems with global attractivity,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 60, no. 3, pp. 710–723, Mar. 2013. [42] L. Cheng, Z.-G. Hou, Y. Lin, M. Tan, W. C. Zhang, and F.-X. Wu, “Recurrent neural network for non-smooth convex optimization problems with application to the identification of genetic regulatory networks,” IEEE Trans. Neural Netw., vol. 22, no. 5, pp. 714–726, May 2011. [43] I. Alvarado et al., “A comparative analysis of distributed MPC techniques applied to the HD-MPC four-tank benchmark,” J. Process Control, vol. 21, no. 5, pp. 800–815, Jun. 2011. [44] X. He, C. Li, T. Huang, and C. 
Li, “Neural network for solving convex quadratic bilevel programming problems,” Neural Netw., vol. 51, pp. 17–25, Mar. 2014. [45] X. He, C. Li, T. Huang, C. Li, and J. Huang, “A recurrent neural network for solving bilevel linear programming problem,” IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 4, pp. 824–830, Apr. 2014. Chaojie Li received the B.Eng. degree in electronic science and technology and the M.Eng. degree in computer science from Chongqing University, Chongqing, China, in 2007 and 2011, respectively. He is currently pursuing the Ph.D. degree with RMIT University, Melbourne, VIC, Australia. His current research interests include distributed optimization and control in smart grid, neural networks, and their application.
Xinghuo Yu (M’92–SM’98–F’08) received the B.Eng. and M.Eng. degrees from the University of Science and Technology of China, Hefei, China, in 1982 and 1984, respectively, and the Ph.D. degree from Southeast University, Nanjing, China, in 1988. He is currently with RMIT University, Melbourne, VIC, Australia, as the Founding Director of the RMIT Platform Technologies Research Institute. His current research interests include variable structure and nonlinear control, complex and intelligent systems, and smart grids. Prof. Yu is a fellow of the Institution of Engineering and Technology (U.K.), the International Energy Foundation, Engineers Australia, the Australian Computer Society, and the Australian Institute of Company Directors. He received a number of awards and honors for his contributions, including the 2013 Dr.-Ing. Eugene Mittelmann Achievement Award of the IEEE Industrial Electronics Society and the 2012 IEEE Industrial Electronics Magazine Best Paper Award. He is an IEEE Distinguished Lecturer and the Vice President (Publications) of the IEEE Industrial Electronics Society.
Tingwen Huang received the B.S. degree from Southwest University, Chongqing, China, in 1990, the M.S. degree from Sichuan University, Chengdu, China, in 1993, and the Ph.D. degree from Texas A&M University, College Station, TX, USA, in 2002. He was a Visiting Assistant Professor with Texas A&M University. He joined Texas A&M University at Qatar (TAMUQ), Doha, Qatar, as an Assistant Professor in 2003, where he is currently a Professor. He has authored or co-authored over 100 refereed papers. His current research interests include neural networks, chaotic dynamical systems, complex networks, optimization, and control. His research is partially supported by the Qatar National Priorities Research Program. Dr. Huang was awarded Dean’s Fellow in 2014. He received the Faculty Research Excellence Award from TAMUQ in 2015. In addition, one of his projects for which he was the Lead PI received the Best Project by the Qatar National Research Fund in 2015.
Guo Chen received the B.E. and M.E. degrees in computer science and engineering from Chongqing University, Chongqing, China, in 2003 and 2006, respectively, and the Ph.D. degree in electrical engineering from The University of Queensland, Brisbane, QLD, Australia, in 2010. He was a Research Fellow with Australian National University, Canberra, ACT, Australia, and the University of Newcastle, Callaghan, NSW, Australia. He is currently a Research Fellow with the School of Electrical and Information Engineering, The University of Sydney, Sydney, NSW, Australia. His current research interests include optimization and control, complex network, dynamical systems, intelligent algorithms, and their applications in smart grid.
Xing He received the B.S. degree in mathematics and applied mathematics from the Department of Mathematics, Guizhou University, Guiyang, China, in 2009, and the Ph.D. degree in computer science and technology from Chongqing University, Chongqing, China, in 2013. He was a Research Assistant with Texas A&M University at Qatar, Doha, Qatar, from 2012 to 2013. He is currently an Associate Professor with the School of Electronics and Information Engineering, Southwest University, Chongqing. His current research interests include neural networks, bifurcation theory, optimization method, smart grid, and nonlinear dynamical system.