
Neural Geometry for Constrained Optimization

Giansalvo Cirrincione¹, Maurizio Cirrincione², and Sabine Van Huffel¹

¹ ESAT-SISTA/COSIC, Katholieke Universiteit Leuven, Kardinaal Mercierlaan 94, B-3001 Leuven-Heverlee, Belgium. E-mail: [email protected]

² CERISEP-CNR, viale delle Scienze snc, 90128 Palermo, Italy. E-mail: [email protected]

Abstract — Many engineering problems require the on-line solution of constrained optimization problems. This paper proposes, as an original contribution, particular neural architectures, called neural solids, whose interconnections represent the required constraints. The first neural solid presented here is the neural triangle, whose learning law solves simple orthogonality problems. This solid is the basis for more complex architectures solving problems with orthogonality constraints: the paper presents, with examples, the neural decomposer, which can be used for computer vision applications, and the EXIN NSVD neural network, which yields the singular value decomposition of a given matrix. These neural networks are important not only because they give a fast solution of the optimization problem, but, above all, because they introduce a new technique which can be extended to more complex constrained optimization problems.

Keywords — Backpropagation, Features, Hopfield networks, Image Matching, Neural Networks, Occlusion Map, Pattern Recognition, Self Organization, Structure from Motion.

I. Introduction

Optimization problems requiring the minimization of a cost function can be solved in a parallel way by neural networks whose energy function is given by the cost function. If a constrained optimization is required, the constraints can be fulfilled by means of particular neural architectures. These structures can be combined with the neural networks which solve the unconstrained minimization. The geometry of the interconnections of these combinations is characteristic (neural solid) and can be exploited. Hence, the constrained minimization problem defines the structure of the solid. The adaptive weights represent the edges of some faces of the solid and converge to the minimum of the problem. In general, these faces represent the constraints of the minimization. The other edges represent input feeding. The neurons are the vertices of the solid. The learning law is generally a gradient flow of the ad hoc minimization error. The weights generally change according to simple

adaptive rules and resonate until convergence. The position of the edges/weights is meaningful for the learning rules and, therefore, justifies the definition of neural solids. All these neural geometric models can be considered as dual to the Hopfield network.

Sec. II deals with the orthogonality constraint, which implies a simple basic element of the neural solid called the neural triangle (resonator), because it is composed of three resonant neurons underlying a self-organizing learning. This triangle is able to solve elementary optimization problems such as the search for the nearest orthonormal matrix to a given one. In the remaining part of the paper, two more complex neural solids, which require neural triangles (orthogonality constraints), are proposed as examples of the way a constrained optimization can be implemented in a neural way. Sec. III introduces the neural prism, whose architecture is composed of neural triangles and their mutual connections. It can solve more complex optimization problems, such as the one whose solution yields the decomposition of a given matrix into the product of an orthogonal and an antisymmetric matrix (an important application is the estimation of the structure and motion parameters in computer vision by using the essential matrix [8], [4]). In this application, the neural prism is very fast, but sensitive to noise in the input. Sec. IV proposes a novel neural solid for the Singular Value Decomposition (SVD), called EXIN NSVD. The balancing properties of its neural triangles improve the estimation of the orthogonal matrices of the decomposition. It has been applied in computer vision [4].

II. The Neural Resonator (neural triangle)

The neural resonator (neural triangle for n = 3) finds the nearest orthogonal matrix to a given matrix according to the Frobenius norm. Given the matrix M ∈ ℝ^(n×n), the orthogonal matrix Q is sought as the minimizer of an error function E(t) made of four terms, where ν > 0 is a scaling factor for the constraints. The minimization of the first two terms implies the closeness of the solution to the given matrix (the first one works on the column space, the second one works on the row space), while the minimization of the other two terms implies the orthogonality in a balanced way, i.e. constraining both rows and columns of Q. The steepest descent technique for the minimization of the error function yields the following system of differential equations (learning law):

dq_ij(t)/dt = −µ ∂E(t)/∂q_ij(t) = µ [ ē_i(t) x_j(t) + e*_j(t) x_i(t) + ν ê_j(t) y_i(t) + ν ẽ_i(t) f_j(t) ]   (8)

where f_j(t) = Σ_{k=1..n} q_kj(t) x_k(t), y_i(t) = Σ_{k=1..n} q_ik(t) x_k(t) and the learning rate µ is positive. Consider the case n = 3 (space rotation Q): represent a triangle (fig. 2) where the vertices localize three neurons. The neuron k is identified by a couple of weight vectors: the first is the column vector q_k of the matrix Q and the second is the row vector r_k of the same matrix. The edges define couples of weights q_ij (elements of Q), where i is the number of the neuron touched by the tip of the weight vector and j is the number of the neuron touched by the application point of the weight vector. The q_ii weights are represented by feedbacks on the corresponding neurons. Hence, the triangle is a planar representation of the two weight vectors of every neuron.

Fig. 1. Preliminary network for the neural resonator.

If the inputs to the network are the coordinate vectors x_i, the error computations at every iteration become easy. Define now the coordinate vector for the iteration t as x_TN (TN = turn), i.e. i = TN (the iteration will be called TN). As a consequence of this choice for the inputs, every error vector implies, by means of eq. (8), a simple learning rule for the three neurons:

1. ē(t): at each iteration, the column m_TN of the matrix M = [m_1, m_2, m_3] is fed to the neuron TN, which changes its column vector q_TN as

∆q_TN = µ (m_TN − q_TN)   (9)

and so the learning law tries to minimize the column vector distances.

2. e*(t): at each iteration, the row n_TN of the matrix M (whose rows are n_1ᵀ, n_2ᵀ, n_3ᵀ) is fed to the neuron TN, which changes its row vector r_TN as

∆r_TN = µ (n_TN − r_TN)   (10)

and so the learning law tries to minimize the row vector distances.

3. ê(t): at the iteration TN the column vector q_TN of the neuron TN is updated by

∆q_TN = µ ν (1 − ‖q_TN‖₂²) q_TN   (11)

i.e. there is only a change in the modulus in order to normalize the vector; the column vector q_i (i ≠ TN) of the neuron i is updated by

∆q_i = −µ ν (q_TN, q_i) q_TN   (12)

where (q_TN, q_i) represents the scalar product between q_TN and q_i. Hence, the learning law tries to change q_i with an update in the direction of q_TN in order to orthogonalize the two vectors.

4. ẽ(t): at the iteration TN the row vector r_TN of the neuron TN is updated by

∆r_TN = µ ν (1 − ‖r_TN‖₂²) r_TN   (13)

i.e. there is only a change in the modulus in order to normalize the vector; the row vector r_i (i ≠ TN) of the neuron i is updated by

∆r_i = −µ ν (r_TN, r_i) r_TN   (14)

where (r_TN, r_i) represents the scalar product between r_TN and r_i. Hence, the learning law tries to change r_i with an update in the direction of r_TN in order to orthogonalize the two vectors.

In summary, at each learning step (iteration TN) a column and a row (with the same index) of the matrix M are fed to one neuron. The weight column and row vectors of this neuron change in order to approximate, respectively, the inputs and to normalize their norms. The weight vectors associated with the other two neurons try to move orthogonally to the corresponding weight vectors of the first neuron. At each epoch, every weight in the triangle has eight contributions (see [4] for the explanation). In every epoch all weights are treated in the same way, in order to converge to a solution with minimal Frobenius norm distance. Two implementations are possible:

1. on-line version: the M columns and rows are fed randomly; it is possible to update the weights before the end of the iteration for the subsequent learning rules;

2. batch version: an epoch can be interpreted as a simultaneous presentation of the whole matrix M to the triangle.

Acceleration techniques such as the Conjugate Gradient and the quasi-Newton algorithms [9], [4] can then be applied, but they are not needed because of the very fast convergence of the simple batch and on-line learning.

Theorem II.1 (special duality): The neural resonator is a kind of dual Hopfield network, but with a quartic energy function (the Hopfield energy is quadratic).

The neural resonator can be applied to refine matrices which are nearly orthogonal because of numerical inaccuracies in the algorithms for finding rotations, in order to obtain exact orthogonality. For this purpose, the best choice of initial conditions is the matrix M itself. The following simulation deals with the refinement of the matrix

[  0.2625   0.6192  −0.7309 ]
[  0.6849   0.3215   0.6027 ]   (15)
[  0.6616  −0.6207  −0.4441 ]

whose determinant is 0.9621. The neural triangle has learning rate µ(t) = 0.01 / t^0.1, ν = 2.1 and this matrix as initial conditions. Fig. 3 shows the results of the batch learning by plotting the determinant of the weight matrix Q. The computed orthogonal matrix Q, after convergence, is given by

[  0.2505   0.6732  −0.6957 ]
[  0.7207   0.3501   0.5983 ]   (16)
[  0.6464  −0.6513  −0.3975 ]

whose determinant is 1.0000.
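For readers who want to reproduce this behaviour, the following is a minimal NumPy sketch of the batch learning applied to the matrix (15); it assumes a fixed learning rate and a sequential application of rules (9)-(14), so the trajectory only approximates the one shown in Fig. 3.

```python
import numpy as np

def neural_triangle(M, mu=0.05, nu=2.1, epochs=400):
    """Batch refinement of M towards a nearby orthogonal matrix Q.

    Applies the closeness rules (9)-(10) and the normalization /
    orthogonalization rules (11)-(14) once per neuron TN per epoch.
    """
    n = M.shape[0]
    Q = M.copy()                              # initial conditions: Q(0) = M
    for _ in range(epochs):
        for TN in range(n):                   # one iteration per coordinate vector x_TN
            # rules (9)-(10): pull column/row TN towards the corresponding column/row of M
            Q[:, TN] += mu * (M[:, TN] - Q[:, TN])
            Q[TN, :] += mu * (M[TN, :] - Q[TN, :])
            q_TN, r_TN = Q[:, TN].copy(), Q[TN, :].copy()
            # rule (11): renormalize column TN; rule (12): orthogonalize the other columns
            Q[:, TN] += mu * nu * (1.0 - q_TN @ q_TN) * q_TN
            for i in range(n):
                if i != TN:
                    Q[:, i] -= mu * nu * (q_TN @ Q[:, i]) * q_TN
            # rule (13): renormalize row TN; rule (14): orthogonalize the other rows
            Q[TN, :] += mu * nu * (1.0 - r_TN @ r_TN) * r_TN
            for i in range(n):
                if i != TN:
                    Q[i, :] -= mu * nu * (r_TN @ Q[i, :]) * r_TN
    return Q

M = np.array([[0.2625,  0.6192, -0.7309],
              [0.6849,  0.3215,  0.6027],
              [0.6616, -0.6207, -0.4441]])    # matrix (15), det = 0.9621
Q = neural_triangle(M)
print(np.linalg.det(Q), np.linalg.norm(Q.T @ Q - np.eye(3)))   # det -> ~1, Q^T Q -> ~I
```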

Fig. 3. Plot of the determinant of the matrix Q given by the weights of the neural triangle in batch learning.

Note the fast convergence, even in the absence of acceleration techniques. Other results obtained by the neural triangle are shown in [4]. The neural resonator is important not only for the orthogonal refinement of a matrix, but, above all, because it is the basis for more complicated neural solids, since it represents a neural way of implementing the orthogonality constraint of an optimization (see its use in [4] for the solution of the orthogonal Procrustes problem [5]).

III. The Neural Decomposer (neural prism)

The neural decomposer is a neural solid which decomposes a matrix A ∈ ℝ^(3×3) into the product A = QS of an orthogonal matrix Q and an antisymmetric matrix S. The gradient-based learning laws for Q and S, eqs. (24) and (25), minimize an error function whose terms enforce both the closeness of QS to A and the two constraints; in these laws, µ > 0 is the learning rate, ρ > 0 scales the antisymmetry constraint, b = Sx, d = Ax, f = Qᵀx and y = Qx. Consider, as in the last section, as inputs to the network the coordinate vectors x_i and define in the same way both an iteration and an epoch. The component vector for the iteration t is x_TN (the iteration will be called TN). The learning rules then have simple expressions and are presented in relation to the position in the prism of the corresponding weights.

A. Upper Neural Triangle ∆S

The constraint on matrix S is fulfilled by the last two terms of the r.h.s. of (25). The learning rules for the weights touching the neuron TN are:

• weights pointing out of the neuron TN:

∆s_i,TN = −µ ρ (s_i,TN + s_TN,i)   (26)

• weights pointing into the neuron TN:

∆s_TN,j = −µ ρ (s_j,TN + s_TN,j)   (27)

• feedback for the neuron TN:

∆s_TN,TN = −2 µ ρ s_TN,TN   (28)

The other weights do not change at this iteration. Thus, at every epoch, the updating of every weight, caused by the antisymmetric constraint, is made of two contributions (the feedback updating must be counted twice). The learning rules are different from the neural triangle rules, but the planar triangle representation explains well the structure of the neural network.

The first two terms of (25) express the influence of the lower neural triangle, which is identified by the hatched gray arrows in fig. 4(a). This influence, in the form of an injection into the upper neurons, is given by the following rule:

∆s_TN = µ [ 2 Qᵀ a_TN − (Qᵀ Q + I) s_TN ]   (29)

i.e. only the column (pointing out) vector of the upper neuron TN updates its weights. This rule is made of two terms: the first is proportional to the input a_TN expressed in the basis of the column space of Q, the second is a kind of feedback (exact feedback for Q orthogonal). The case TN = 1 is shown in fig. 4(b): the connection is hatched gray and the black point in the center of the lower triangle represents the effect of the whole triangle; the dark textured upper neuron is the only neuron updating its column vector. In summary, for each epoch, the upper triangle weights have three updating contributions, two from the constraint and one from the injection.
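Since eqs. (24)-(25) are not reproduced above, the sketch below only exercises the antisymmetry constraint, i.e. rules (26)-(28), on an arbitrary matrix; the initial weights, µ and ρ are illustrative. It shows that, over the epochs, these updates drive S towards antisymmetry.

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.normal(size=(3, 3))          # illustrative initial upper-triangle weights
mu, rho = 0.1, 1.0                   # illustrative values

for epoch in range(200):
    for TN in range(3):              # one iteration per coordinate vector x_TN
        dS = np.zeros_like(S)
        for i in range(3):
            if i != TN:
                common = -mu * rho * (S[i, TN] + S[TN, i])
                dS[i, TN] += common               # rule (26): weight pointing out of TN
                dS[TN, i] += common               # rule (27): weight pointing into TN
        dS[TN, TN] = -2.0 * mu * rho * S[TN, TN]  # rule (28): feedback weight
        S += dS

print(np.linalg.norm(S + S.T))       # -> ~0: the symmetric part of S is annihilated
```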

B. Lower Neural Triangle ∆Q

The last two terms of (24) constrain Q to be orthogonal. They are similar to the last two terms of (8); hence, the corresponding learning rules are equal to eqs. (11)-(14) and, so, the updating of the lower triangle weights is the same as that of the neural triangle, except for the injections. Unlike the neural triangle, all three neurons have an updating contribution from the injection. Indeed, the updating for the column vector of the lower neuron j is given by:

∆q_j = µ { [ 2 s_j,TN − (q_j, a_TN) ] a_TN − s_j,TN Q s_TN }   (30)

Hence, the injections/interconnections are given by (see fig. 4(b)):

• For the lower neuron TN: the sum of quantities depending on the input a_TN, the lower triangle weights (feedback effect) and the column weight vector of the corresponding upper neuron TN; this last interconnection is represented by the solid gray arrow in fig. 4(b).

• For the other two lower neurons: the sum of quantities depending on the input a_TN, the lower triangle weights (feedback effect), and the column weight vector of the upper neuron TN; the influence of this weight vector and, in particular, of the weight component connecting the upper neuron TN with the upper neuron of the same index as the lower neuron here considered, is represented by the dotted black arrows for j = 2 and the solid hatched arrows for j = 3 in fig. 4(b).

C. Simulations for the Neural Prism

The simulations deal with the decomposition of the essential matrix (for details, see [4]). Given a pair of images taken from a mobile camera which translates and rotates (rigid motion), the essential matrix method computes a matrix G (essential) which contains the motion information. Then, the decomposition of this matrix yields the motion parameters (the translation can be known only in direction). The essential matrix G is defined as¹

G = h × R = (h × I) R = T R   (31)

where

T = [  0    −h₃    h₂ ]
    [  h₃    0    −h₁ ]
    [ −h₂    h₁    0  ]

h = [h₁ h₂ h₃]ᵀ is the translation unit vector, R is the rotation matrix and T is the antisymmetric matrix associated with h. A matrix G is decomposable iff its singular values are 1, 1 and 0. The neural prism solves the problem by using A = Gᵀ, Q = Rᵀ and S = Tᵀ = −T. The quasi-Newton BFGS acceleration technique is used [4]. One epoch corresponds to the presentation of one component vector x_i. This component vector is presented to the network for a certain number of consecutive epochs (component vector block); then, another component vector is presented to the network for the same block of epochs and, after this, the same is done for the last component vector. This group of three blocks is then repeated, changing the order of the component vectors, until convergence.

¹ The matrix v × A is defined by (v × a₁, v × a₂, v × a₃) for a vector v and a matrix A = (a₁, a₂, a₃).
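As a quick numerical check of the decomposability criterion stated above, the following NumPy sketch builds T = h × I as in eq. (31) from a randomly generated rotation R and unit vector h (illustrative data, not the values used in the simulations below) and verifies that the singular values of G = T R are approximately 1, 1 and 0.

```python
import numpy as np

def cross_matrix(h):
    """Antisymmetric matrix T = h x I, written out as in eq. (31)."""
    return np.array([[0.0, -h[2], h[1]],
                     [h[2], 0.0, -h[0]],
                     [-h[1], h[0], 0.0]])

# Illustrative data: a random rotation R and a random unit translation h.
rng = np.random.default_rng(1)
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(R) < 0:
    R[:, 0] *= -1.0                  # flip one column so that det(R) = +1 (a rotation)
h = rng.normal(size=3)
h /= np.linalg.norm(h)

G = cross_matrix(h) @ R              # essential matrix G = T R
print(np.linalg.svd(G, compute_uv=False))   # approximately (1, 1, 0): G is decomposable
```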

Fig. 4. (a): The structure of the neural prism: the solid gray arrows are the interconnection edges which represent the influence of the upper neural triangle on the lower neural triangle; the dashed gray arrows represent the inverse influence. (b): Analysis of the interconnections between the two neural triangles: here TN = 1. The textured neurons are the only ones which update their column vectors at this iteration. The solid gray arrow indicates the influence of the upper triangle on the lower triangle neuron TN. The dotted black and dashed black arrows represent the influence of the upper neuron TN on the lower neurons 2 and 3, respectively.

Fig. 5. Plot of the determinant of the matrix Q and of the translational distance for the BFGS neural prism in absence of additional noise.

Fig. 6. Plot of the determinant of the matrix Q and of the translational distance for the BFGS neural prism in the presence of additional noise.

This scheduling is justified by the fact that every different component vector presentation deteriorates the previous training. The block is always composed of ten epochs. In the simulations, ν = 1 and ρ = 1. The initial conditions are the identity matrix for Q and the null matrix for S. The first simulation decomposes the essential matrix

G = [ −0.5581   0.1229  −0.0820 ]
    [ −0.4255   0.8066   0.0406 ]   (32)
    [  0.6907   0.5607   0.2045 ]

In this case, h = [0.8165  −0.4082  0.4082]ᵀ and R is given by eq. (16). The quality of the decomposition is evaluated by using the determinant of Q and the translational distance t_d = ‖S − S*‖²_F, where S* is the desired matrix T. Fig. 5 shows the very accurate results for the BFGS acceleration technique. The peaks are due to the change of block in the training. In the second simulation, the matrix (32) is corrupted by additive Gaussian noise with zero mean and variance σ² = 0.01.
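A minimal NumPy sketch of this corruption is given below; the noise realization is illustrative, so its singular values differ from the ones reported next, but they depart from (1, 1, 0) in the same way.

```python
import numpy as np

G = np.array([[-0.5581, 0.1229, -0.0820],
              [-0.4255, 0.8066,  0.0406],
              [ 0.6907, 0.5607,  0.2045]])        # essential matrix (32)

rng = np.random.default_rng(2)                    # illustrative noise realization
G_noisy = G + rng.normal(loc=0.0, scale=0.1, size=(3, 3))   # zero mean, variance 0.01

print(np.linalg.svd(G, compute_uv=False))         # close to (1, 1, 0)
print(np.linalg.svd(G_noisy, compute_uv=False))   # no longer (1, 1, 0): not essential
```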

The singular values of the corrupted matrix are 1.0968, 0.9020 and 0.0526. Hence, it is no longer an essential matrix. Fig. 6 shows that the method is fast and the solution is still very accurate.

IV. A Neural Solid for the Singular Value Decomposition

Given the real matrix A ∈
