Constrained Optimization with the Hopfield-Lagrange Model

Jan van den Berg ([email protected])
Jan C. Bioch ([email protected])
Department of Computer Science, Erasmus University Rotterdam, room H4-13, P.O. Box 1738, 3000 DR Rotterdam, The Netherlands

Abstract

A generalized continuous Hopfield model using Lagrange multipliers, originally introduced in [10], is briefly discussed. A more complete treatment of the model can be found in [2].

1 Introduction

Since the appearance of Hopfield and Tank's article [6], many researchers have tried to solve combinatorial optimization problems using artificial neural networks. Here, we discuss the general problem:

$$\text{minimize } E(\vec{V}), \quad \text{subject to: } C_\alpha(\vec{V}) = 0, \quad \alpha = 1, \dots, m, \tag{1}$$

where $E(\vec{V})$ is the (energy) function to be minimized and the expressions $C_\alpha(\vec{V}) = 0$ are the constraints. Applying a continuous Hopfield-type neural network, the constraints can be treated in several ways. One approach uses a so-called `penalty method'. Then, the general problem (1) is converted into:

$$\text{minimize } E_P(\vec{V}) = E(\vec{V}) + \sum_{\alpha=1}^{m} c_\alpha C_\alpha(\vec{V}) + \sum_i \int_0^{V_i} g^{-1}(V)\,dV. \tag{2}$$

The influence of the sum-of-integrals term, which we call the `Hopfield term' and denote by $E_H(\vec{V})$, may be small (subsection 2.1). Ignoring this term, the energy function $E_P$ is a weighted sum of $m+1$ terms, and a difficulty arises in determining correct weights $c_\alpha$. The minimum of $E_P$ will be a compromise between fulfilling the constraints and minimizing the original target function. In a second approach, the features of the neural net are changed in such a way that some constraints are fulfilled automatically (e.g. [3, 9]). Another way consists of the introduction of a second layer [7]. Here, we take the approach that was pioneered by Platt and Barr [8]: the classical Lagrange multiplier method is used to convert constrained optimization problems into unconstrained extremization ones, and the values of the Lagrange multipliers are estimated using the Basic Differential Multiplier Method. In [10], Wacholder et al. applied this approach. Adding the term $E_H(\vec{V})$ to the energy function, they used:

$$E_{HL}(\vec{V}, \vec{\lambda}) = E(\vec{V}) + \sum_\alpha \lambda_\alpha C_\alpha(\vec{V}) + \sum_i \int_0^{V_i} g^{-1}(V)\,dV, \tag{3}$$

where $E(\vec{V}) + \sum_\alpha \lambda_\alpha C_\alpha(\vec{V})$ equals the energy function Platt and Barr employed. The corresponding motion equations are:

$$\dot{U}_i = -\frac{\partial E_{HL}}{\partial V_i} = -\frac{\partial E}{\partial V_i} - \sum_\alpha \lambda_\alpha \frac{\partial C_\alpha}{\partial V_i} - U_i, \tag{4}$$

$$\dot{\lambda}_\alpha = +\frac{\partial E_{HL}}{\partial \lambda_\alpha} = C_\alpha(\vec{V}), \tag{5}$$

where $V_i = g(U_i)$ and $g$, normally, is a sigmoid function. The $\lambda_\alpha$'s are the multipliers. Note that $U_i = \partial E_H / \partial V_i$. We have termed this model the Hopfield-Lagrange model.
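To make the model concrete, the following is a minimal numerical sketch (our own illustration, not the authors' implementation) that integrates the motion equations (4) and (5) with explicit Euler steps for a toy instance: a quadratic energy with a single linear constraint. The matrix Q, the gain beta and the step size are arbitrary choices.

    import numpy as np

    beta = 10.0   # sigmoid gain (arbitrary choice)
    dt = 0.01     # Euler step size (arbitrary choice)

    def g(U):
        # V_i = g(U_i): the usual sigmoid transfer function
        return 1.0 / (1.0 + np.exp(-beta * U))

    # Toy instance: E(V) = 0.5 * V^T Q V with one linear constraint sum(V) = 1
    Q = np.array([[1.0, 0.2], [0.2, 1.0]])

    def grad_E(V):
        return Q @ V

    def C(V):
        # constraint values C_alpha(V); here a single constraint
        return np.array([V.sum() - 1.0])

    def grad_C(V):
        # rows: dC_alpha / dV_i
        return np.ones((1, V.size))

    U = np.zeros(2)     # neuron inputs U_i
    lam = np.zeros(1)   # Lagrange multipliers lambda_alpha

    for _ in range(20000):
        V = g(U)
        # eq. (4): U'_i = -dE/dV_i - sum_a lambda_a dC_a/dV_i - U_i
        U_dot = -grad_E(V) - grad_C(V).T @ lam - U
        # eq. (5): lambda'_a = +C_a(V), i.e. gradient *ascent* in the multipliers
        lam_dot = C(V)
        U += dt * U_dot
        lam += dt * lam_dot

    V = g(U)
    print("V =", V, " constraint residual =", C(V))

Note the opposite signs in (4) and (5): the system performs gradient descent in the $U_i$ but gradient ascent in the $\lambda_\alpha$, so it seeks a saddle point of $E_{HL}$ rather than a minimum.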

Figure 1: $-\frac{1}{\beta} H(V_i)$ for various values of $\beta$ ($\beta = 10, 2, 1, 0.5, 0.33$).

Figure 2: $E$, $E_H$ and $E + E_H$ as a function of $V_i$.

2 Theoretical Results

2.1 The Hopfield Term

To simplify the discussion of the Hopfield term, consider the continuous Hopfield model without constraints. Now writing $E_{HL}(\vec{V}) = E(\vec{V}) + E_H(\vec{V})$, we try to understand the influence of $E_H(\vec{V})$. Taking $V = g(U) = (1 + e^{-\beta U})^{-1}$, one may write:

$$U = -\frac{1}{\beta} \ln\left(\frac{1-V}{V}\right) = g^{-1}(V). \tag{6}$$

Using standard calculus, one finds:

$$\int_0^{V_i} g^{-1}(V)\,dV = \frac{1}{\beta}\left[(1 - V_i)\ln(1 - V_i) + V_i \ln V_i\right] = -\frac{1}{\beta} H(V_i), \tag{7}$$

where $H(V_i)$ equals the well-known formula for the entropy of a binary source. For later use, we also record the motion equation of the i-th neuron in the continuous Hopfield model:

$$\frac{dU_i}{dt} = -\frac{U_i}{\tau} - \frac{\partial E}{\partial V_i}. \tag{8}$$

We make some mathematical observations¹ (see also figures 1 and 2):

- If $\beta \to \infty$ then $E_H \to 0$, as Hopfield also concluded [5].
- For finite values of $\beta$, $E_H$ has a contribution: (boundary) minima of $E$, which lie in a corner of the hypercube and for which all $\partial E/\partial V_i$ are finite, are displaced toward the interior. This is true for any finite value of $\beta$, because the derivative of $E_H$ for $V = 0$ or $V = 1$ equals $-\infty$ and $+\infty$ respectively; the smaller the value of $\beta$, the larger the displacement toward the interior. This is an attractive feature of the model: in a combinatorial optimization problem, solutions imply $V$-values equal to 0 or 1, and the Hopfield term is responsible for changing these values to $\epsilon$ and $1 - \epsilon$, so the corresponding $U$-values become finite.
- Because the state space of $\vec{V}$ is the whole $N$-dimensional hypercube $[0,1]^N$, it is possible that there are minima of $E$ lying in the interior of the hypercube. In that case, there will also be (generally small) displacements of these minima toward the interior.

There is some confusion about the Hopfield term $E_H$. Takefuji considers the corresponding `decay term' $U_i/\tau$ `harmful' [9]: "Hopfield gives the motion equation of the i-th neuron (Hopfield and Tank 1985): (...). Wilson and Pawley strongly criticised the Hopfield and Tank neural network through the Travelling Salesman Problem. Wilson and Pawley did not know what causes the problem. The use of the decay term ($-U_i/\tau$) increases the computational energy function E under some conditions instead of decreasing it." So, Takefuji suggests that the problems which Wilson and Pawley met [11] are caused by the decay term $U_i/\tau$.

¹ A physical explanation using `mean field theory' is also possible: introducing stochastic binary neurons and interpreting the $V_i$'s as their average values, we may write $E_{HL}(\vec{V}) = \langle E \rangle - TS$, where $\langle E \rangle$ is the average energy function, $T = 1/\beta$, and $S$ the entropy. This means that the energy $E_{HL}(\vec{V})$ corresponds to the `free energy' of the stochastic Hopfield network.
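As a quick numerical sanity check of identity (7) (ours, not part of the original paper; it assumes NumPy and SciPy are available, and the values of beta and V_i are arbitrary):

    import numpy as np
    from scipy.integrate import quad

    beta, Vi = 2.0, 0.7   # arbitrary test values

    def g_inv(V):
        # inverse sigmoid, eq. (6)
        return (1.0 / beta) * np.log(V / (1.0 - V))

    def H(V):
        # binary entropy
        return -(V * np.log(V) + (1.0 - V) * np.log(1.0 - V))

    lhs, _ = quad(g_inv, 0.0, Vi)   # the log singularity at 0 is integrable
    print(lhs, -H(Vi) / beta)       # both approximately -0.3054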

2.2 Why the decay term is not harmful

It is well known that equation (8) continuously decreases $E(\vec{V}) + E_H(\vec{V})$, because $\dot{E} + \dot{E}_H \le 0$: see [5]. Takefuji argues in the following way that, under some conditions, the energy $E$ alone may increase. Using equation (8) with $\tau = 1$ (which coincides with equation (4) if there are no constraints), one finds:

$$\dot{E} = \sum_i \frac{\partial E}{\partial V_i}\,\dot{V}_i = \sum_i (-\dot{U}_i - U_i)\,\dot{V}_i = -\sum_i (\dot{U}_i^2 + U_i\dot{U}_i)\,\frac{dV_i}{dU_i}. \tag{9}$$

Because $\frac{dV_i}{dU_i} > 0$, a necessary condition for an increase of $E$ is that there should be at least one $i$ such that:

$$\dot{U}_i^2 + U_i\dot{U}_i < 0 \iff -U_i < \dot{U}_i < 0 \ \vee\ 0 < \dot{U}_i < -U_i. \tag{10}$$

The two conditions on the right correspond precisely to a displacement of a solution toward the interior of the state space, namely to a minimum with a lower, respectively a higher, value of $V_i$. We shall prove the first one: the left inequality of $-U_i < \dot{U}_i < 0$ implies that $-U_i - \dot{U}_i < 0$. Using (8) (with $\tau = 1$), one finds:

$$\frac{\partial E}{\partial V_i} = -U_i - \dot{U}_i < 0, \quad \text{so } E \text{ (as a function of } V_i\text{) is decreasing.} \tag{11}$$

The right inequality of $-U_i < \dot{U}_i < 0$ implies that $-\dot{U}_i > 0$. Again using (8) and $U_i = \frac{\partial E_H}{\partial V_i}$, one finds:

$$\frac{\partial E}{\partial V_i} + \frac{\partial E_H}{\partial V_i} = \frac{\partial E}{\partial V_i} + U_i = -\dot{U}_i > 0, \quad \text{so the sum of } E \text{ and } E_H \text{ is increasing.} \tag{12}$$

The inequalities (11) and (12) together imply:

$$\frac{\partial E_H}{\partial V_i} > 0, \quad \text{so } E_H \text{ is increasing and therefore } V_i > 0.5. \tag{13}$$

Conditions (11) to (13) together imply a displacement of the minimum toward the interior, caused by the contribution of $E_H$ (see also figure 2). It is easy to prove that the converse also holds: a displacement of a solution to a smaller value of $V_i$, caused by the Hopfield term, implies $-U_i < \dot{U}_i < 0$. This completes the proof. Concluding, we notice that the displacement can be kept small by taking high values of $\beta$ (subsection 2.1). Therefore, the decay term should not be considered `harmful', nor as the cause of the trouble with the Hopfield and Tank neural network.
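The argument can be illustrated with a one-dimensional example of our own construction: take $E(V) = -V$, whose boundary minimum lies at $V = 1$. Under the dynamics (8) with $\tau = 1$, the state is pulled inward to $V = g(1)$; along the way $E$ itself increases while $E + E_H$ decreases monotonically, exactly as claimed. All numerical settings below are arbitrary.

    import numpy as np

    beta, dt = 2.0, 0.001
    g = lambda U: 1.0 / (1.0 + np.exp(-beta * U))
    E = lambda V: -V
    E_H = lambda V: (V * np.log(V) + (1.0 - V) * np.log(1.0 - V)) / beta  # = -H(V)/beta

    U = 5.0   # start near the corner V ~ 1
    prev_total = E(g(U)) + E_H(g(U))
    for _ in range(5000):
        U += dt * (-U + 1.0)   # eq. (8) with tau = 1: dU/dt = -U - dE/dV, and dE/dV = -1
        V = g(U)
        total = E(V) + E_H(V)
        assert total <= prev_total + 1e-9   # E + E_H never increases
        prev_total = total

    print("final V =", V, " compare g(1) =", g(1.0))   # E rose from about -1 to -V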

2.3 A Lyapunov function

We return to constrained optimization with energy function (3) and motion equations (4) and (5). Like Platt and Barr, we first combine the equations (4) and (5) into one second-order differential equation:

$$\ddot{U}_i = -\sum_j A_{ij}\,\frac{dV_j}{dU_j}\,\dot{U}_j - \dot{U}_i - \sum_\alpha C_\alpha\,\frac{\partial C_\alpha}{\partial V_i}, \tag{14}$$

where $A_{ij} = \frac{\partial^2 E}{\partial V_i \partial V_j} + \sum_\alpha \lambda_\alpha \frac{\partial^2 C_\alpha}{\partial V_i \partial V_j}$. We propose as a Lyapunov function of the system²:

$$E_L = \frac{1}{2}\sum_i \dot{U}_i^2 + \sum_{\alpha,i} \int C_\alpha\,\frac{\partial C_\alpha}{\partial V_i}\,dU_i. \tag{15}$$

We shall verify that (15) is indeed, under some conditions, a Lyapunov function:

$$\begin{aligned}
\dot{E}_L &= \sum_i \ddot{U}_i \dot{U}_i + \sum_{\alpha,i} C_\alpha\,\frac{\partial C_\alpha}{\partial V_i}\,\dot{U}_i \\
&= \sum_i \dot{U}_i \Bigl(-\sum_j A_{ij}\,\frac{dV_j}{dU_j}\,\dot{U}_j - \dot{U}_i - \sum_\alpha C_\alpha\,\frac{\partial C_\alpha}{\partial V_i}\Bigr) + \sum_{\alpha,i} C_\alpha\,\frac{\partial C_\alpha}{\partial V_i}\,\dot{U}_i \\
&= -\sum_{i,j} \dot{U}_i A_{ij}\,\frac{dV_j}{dU_j}\,\dot{U}_j - \sum_i \dot{U}_i^2 = -\sum_{i,j} \dot{U}_i B_{ij} \dot{U}_j,
\end{aligned} \tag{16}$$

where $B_{ij} = A_{ij}\,\frac{dV_j}{dU_j} + \delta_{ij}$, with $\delta_{ij}$ the Kronecker delta. So, the stability of the Hopfield-Lagrange model depends on the matrix $B$³. If $B$ is positive definite and $E_L$ is bounded below, the system will converge until $\forall i: \dot{U}_i = 0$. Analysing equation (4), we find that this will normally be the case when all $\lambda_\alpha$'s are constant, i.e. $\dot{\lambda}_\alpha = 0$. This can only be true when all constraints have been satisfied (see equation (5)).

² This Lyapunov function, equal to the sum of kinetic and potential energy, is a generalization of the one of Platt and Barr [8].
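To illustrate the stability test implied by (16), here is a sketch (ours, not from the paper) that forms the matrix $B$ at a given state for the toy quadratic instance used earlier and checks positive definiteness via the eigenvalues of its symmetric part, which suffices for $\dot{E}_L \le 0$ at that state. The state values are arbitrary.

    import numpy as np

    beta = 10.0
    g = lambda U: 1.0 / (1.0 + np.exp(-beta * U))

    Q = np.array([[1.0, 0.2], [0.2, 1.0]])   # E(V) = 0.5 V^T Q V
    hess_E = Q                               # d2E / dV_i dV_j
    hess_C = np.zeros((2, 2))                # the linear constraint has zero Hessian

    U = np.array([0.3, -0.1])                # some state along the trajectory
    lam = np.array([0.5])
    V = g(U)
    dV_dU = beta * V * (1.0 - V)             # derivative of the sigmoid

    A = hess_E + lam[0] * hess_C              # A_ij as defined below eq. (14)
    B = A * dV_dU[np.newaxis, :] + np.eye(2)  # B_ij = A_ij dV_j/dU_j + delta_ij

    # B need not be symmetric; positive definiteness of the symmetric part
    # suffices for u^T B u > 0 for every u != 0.
    sym = 0.5 * (B + B.T)
    print("positive definite:", bool(np.all(np.linalg.eigvalsh(sym) > 0)))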

Figure 3: A Weighted Matching solution for 32 points.

Figure 4: A solution of the TSP for 32 cities.

3 Experimental Results

First, we considered quadratic energy functions with linear constraints. The corresponding Lyapunov functions are monotonically decreasing. Experiments showed proper convergence to the constrained (somewhat displaced) minima. No scalability problems have been observed [2]. Secondly, we mention the results for the Weighted Matching Problem [4]. We tried several formulations of the constraints and succeeded using quadratic ones, making the problem:

$$\text{minimize } E = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} d_{ij}V_{ij}, \quad \text{subject to } \forall i: \tfrac{1}{2}\Bigl(\sum_{j=1}^{i-1} V_{ji} + \sum_{j=i+1}^{n} V_{ij} - 1\Bigr)^{2} = 0 \ \wedge\ \forall i,j: \tfrac{1}{2}V_{ij}(1 - V_{ij}) = 0. \tag{17}$$

$V_{ij}$ represents whether point $i$ is linked to point $j$ or not; $d_{ij}$ is the distance between points $i$ and $j$. One can verify that many elements of the matrix $B$ equal $\lambda_i \frac{dV_{ij}}{dU_{ij}}$ or $\lambda_j \frac{dV_{ij}}{dU_{ij}}$. These elements change dynamically, which makes the stability analysis hard. Nevertheless, the experiments showed results of high quality (figure 3). Also, the Crossbar Switch Scheduling Problem (CSSP) was tried, a problem also solved by Takefuji. It is a combinatorial problem without a target function to be minimized. We always found a feasible solution. In our approach, the CSSP can be seen as a special case of the Travelling Salesman Problem as tackled by Wacholder et al. [10]. Repeating their experiments, we encountered solutions of very poor quality. Therefore, we modified the formulation of the constraints as was done for the Weighted Matching Problem, with one multiplier for every single constraint. In all experiments we found proper convergence (an example is given in figure 4), and we tried to analyze why: after all, we claim that the use of quadratic constraints can `degenerate' the Hopfield-Lagrange method into a type of penalty method [2], where the penalty weights are calculated automatically.
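As an illustration of formulation (17), the sketch below (ours; the random points and all helper names are made up) builds the target $E$ and the quadratic constraint functions that the Hopfield-Lagrange dynamics would receive, one multiplier per constraint. Only the upper-triangular entries $V_{ij}$ with $i < j$ carry variables.

    import numpy as np

    n = 4
    rng = np.random.default_rng(0)
    pts = rng.random((n, 2))
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)

    def row_sums(V):
        # s_i = sum_{j<i} V_ji + sum_{j>i} V_ij, via a symmetric view of V
        S = np.triu(V, 1) + np.triu(V, 1).T
        return S.sum(axis=1)

    def E(V):
        # target: total length of the selected links
        return np.sum(np.triu(dist, 1) * np.triu(V, 1))

    def constraints(V):
        # per point i: 0.5 * (s_i - 1)^2 = 0  (each point in exactly one link)
        match = 0.5 * (row_sums(V) - 1.0) ** 2
        # per pair (i, j): 0.5 * V_ij (1 - V_ij) = 0  (integrality)
        integ = 0.5 * (V * (1.0 - V))[np.triu_indices(n, 1)]
        return np.concatenate([match, integ])

    V = np.full((n, n), 0.5)   # undecided initial state
    print("E =", E(V), " constraint values =", constraints(V))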

4 Conclusions

In this paper, a mathematical background has been given for the Hopfield-Lagrange model, including an explanation of the effect of the `Hopfield term'. In practice, the theory may be used to analyse stability. Experiments have shown that for (combinatorial) optimization problems the Hopfield-Lagrange model may yield solutions of high quality, although the results depend strongly on the (formulation of the) problem. In further research we want to analyse mathematically why some problem formulations behave well and others do not, among other things by inspection of the (deformation of the) corresponding energy landscapes. We also want to relate statistical physics (see the first footnote) to this analysis.

³ The matrix $A$ equals the one that Platt and Barr found.

References

[1] A.R. Bizzarri. Convergence properties of a modified Hopfield-Tank model. Biological Cybernetics, 64:293-300, 1991.
[2] J. van den Berg and J.C. Bioch. The power of the symbiosis between the Hopfield model and Lagrange multipliers in resolving constrained optimization problems. Submitted to: Transactions on Neural Networks, 1994.
[3] D.E. Van den Bout and T.K. Miller. Improving the performance of the Hopfield-Tank neural network through normalization and annealing. Biological Cybernetics, 62:129-139, 1989.
[4] J. Hertz, A. Krogh, and R.G. Palmer. Introduction to the Theory of Neural Computation. Addison-Wesley Publishing Company, The Advanced Book Program, 1991.
[5] J.J. Hopfield. Neurons with graded responses have collective computational properties like those of two-state neurons. Proceedings of the National Academy of Sciences, USA 81:3088-3092, 1984.
[6] J.J. Hopfield and D.W. Tank. "Neural" computation of decisions in optimization problems. Biological Cybernetics, 52:141-152, 1985.
[7] A. Joppe, H.R.A. Cardon, and J.C. Bioch. A neural network for solving the Travelling Salesman Problem on the basis of city adjacency in the tour. In Proceedings of the International Joint Conference on Neural Networks, pages 961-964, San Diego, June 1990.
[8] J.C. Platt and A.H. Barr. Constrained differential optimization. Proceedings of the IEEE 1987 NIPS Conference, 1988.
[9] Y. Takefuji. Neural Network Parallel Computing. Kluwer Academic Publishers, 1992.
[10] E. Wacholder, J. Han, and R.C. Mann. A neural network algorithm for the Traveling Salesman Problem. Biological Cybernetics, 61:11-19, 1989.
[11] G.V. Wilson and G.S. Pawley. On the stability of the Travelling Salesman Problem algorithm of Hopfield and Tank. Biological Cybernetics, 58:63-70, 1988.