GRG for Nonconvex Programming Problem via Stochastic Perturbation

A. El Mouatasim∗, R. Ellaia



Abstract. In this paper, the global optimization of a nonconvex objective function under nonlinear differentiable constraints is studied. A GRG descent method with random perturbation is proposed and the global convergence of the algorithm is established. Numerical examples are also given for two problems: the octagon problem and a mixture problem.

Keywords: Nonconvex optimization, Stochastic perturbation, Constrained optimization, Generalized reduced gradient method.

1  Introduction

The general constrained optimization problem is to minimize a nonlinear function subject to nonlinear constraints:

\[
\begin{cases}
\text{Minimize:} & f(x)\\
\text{subject to:} & h(x) = 0,\\
 & x \ge 0,
\end{cases}
\tag{1}
\]

where h maps Rn to Rm, and the objective function f : Rn → R and the components hj : Rn → R, j = 1, ..., m, of h are twice continuously differentiable functions. The main technique considered in this work for solving constrained optimization problems is the generalized reduced gradient (GRG) method. We are mainly interested in the situation where, on the one hand, f is not convex and, on the other hand, the constraints are not linear; for the linearly constrained case see, for instance, [2] and [8]. If inequality constraints are present, one may introduce slack variables and incorporate them into the corresponding constraint functions in order to obtain the equality form (1).

The problem (1) can be numerically approached by using GRG methods, which generate a sequence {xk}k≥0, where x0 is an initial feasible point and, for each k ≥ 0, a new feasible point xk+1 is generated from xk by using an operator Qk (see section 3). Thus the iterations are given by:

\[
x_{k+1} = Q_k(x_k), \qquad k \ge 0. \tag{2}
\]

∗ Ecole Mohammadia d'Ingénieurs, LERMA, Avenue Ibn Sina BP 765, Agdal, Rabat, Maroc; [email protected], [email protected]


In order to prevent convergence to a local minimum, various modifications of these basic methods have been introduced in the literature: for instance, random perturbations of the projected gradient method in [22]; sequential convex programming in [14]; the generalized reduced gradient in [3], [5], [15] and [20]; sequential quadratic programming in [10]; nonlinear programming in [16]; stochastic perturbation of the projected variable metric method in [7]. Moreover, stochastic or evolutionary searches can be used, but these methods are usually considered expensive and not accurate: on the one hand, the flexibility introduced by randomization often implies a large number of evaluations of the objective function and, on the other hand, a pure random search generates many points which do not improve the value of the objective function. This remark has led some authors to introduce a controlled random search (see, for instance, [4], [2], [13], [17] and [21]). In such a method, the sequence {xk}k≥0 is replaced by a sequence of random vectors {Xk}k≥0 and the iterations are modified as follows:

\[
X_{k+1} = Q_k(X_k) + P_k, \qquad k \ge 0, \tag{3}
\]

where Pk is a suitable random variable, called the stochastic perturbation. The sequence {Pk}k≥0 goes to zero slowly enough in order to prevent convergence to a local minimum (see section 4). The notations are introduced in section 2, the GRG method is recalled in section 3 and the results of some numerical experiments are given in section 5.
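To fix ideas, the following minimal Python sketch illustrates the controlled random search iteration (3). It is an illustration under assumptions: the deterministic operator Qk is left abstract (the GRG step of section 3 plays this role in the paper), the Gaussian form and the decreasing scale xi(k) of the perturbation are assumptions made here for exposition (the precise choice is specified in section 4), and feasibility restoration is omitted.

import numpy as np

def perturbed_descent(x0, Q, f, xi, k_max, rng=None):
    # Iterations (3): X_{k+1} = Q_k(X_k) + P_k, with P_k = xi(k) * Z_k,
    # Z_k standard Gaussian (an assumption made for this sketch only).
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    best, f_best = x.copy(), f(x)
    for k in range(k_max):
        x = np.asarray(Q(x), dtype=float)              # deterministic descent step Q_k(X_k)
        x = x + xi(k) * rng.standard_normal(x.size)    # stochastic perturbation P_k
        fx = f(x)
        if fx < f_best:                                # keep the best point visited
            best, f_best = x.copy(), fx
    return best, f_best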

2  Notations and Assumptions

We denote by R the set of real numbers (−∞, +∞), by R+ the set of nonnegative real numbers [0, +∞), and by E = Rn the n-dimensional real Euclidean space. For x = (x1, x2, ..., xn) ∈ E, we denote by ‖x‖ = (x1² + ... + xn²)^{1/2} the Euclidean norm of x. A^t denotes the transpose of the matrix A. Furthermore, the gradient of f may be partitioned conformably as

\[
\nabla f(x)^t = \big( \nabla_B f(x)^t, \; \nabla_N f(x)^t \big),
\]

where ∇B f and ∇N f are the basic and nonbasic parts of the gradient of f, respectively.

For simplicity we also assume that the gradients of the constraint functions hj are linearly independent at every point x ≥ 0. This assumption, like the full-rank assumption on the matrix A in the linear case, ensures that we can choose a basis, express the basic variables in terms of the nonbasic variables, and thus reduce the problem; hence a similar, generalized reduced gradient procedure applies.

Let a feasible solution xk ≥ 0 with hj(xk) = 0 for all j be given. By assumption the Jacobian matrix of the constraints h(x) = (h1(x), ..., hm(x))^t has full rank at each x ≥ 0; for simplicity, at the point xk it will be denoted by A = Jh(xk), and we set b = Axk. We introduce basic and nonbasic variables according to

\[
A = [B, N], \qquad x^k = \begin{pmatrix} x_B^k \\ x_N^k \end{pmatrix}, \qquad x_N^k \ge 0, \quad x_B^k > 0, \tag{4}
\]


where the strict positivity x_B^k > 0 reflects the nondegeneracy assumption.

Let C = {x ∈ E | h(x) = 0, x ≥ 0}. The objective function is f : E → R and its lower bound on C is denoted by l∗:

\[
l^* = \min_{C} f .
\]

Let us introduce Cα = Sα ∩ C, where Sα = {x ∈ E | f(x) ≤ α}. We assume that

\[
f \ \text{is twice continuously differentiable on}\ E, \tag{5}
\]
\[
\forall \alpha > l^*: \ C_\alpha \ \text{is nonempty, closed and bounded}, \tag{6}
\]
\[
\forall \alpha > l^*: \ \operatorname{meas}(C_\alpha) > 0, \tag{7}
\]

where meas(Cα) is the measure of Cα. Since E is a finite dimensional space, assumption (6) is verified when C is bounded or f is coercive, i.e., f(x) → +∞ as ‖x‖ → +∞. Assumption (7) is verified when C contains a sequence of neighborhoods of an optimum point x∗ having strictly positive measure, i.e., when x∗ can be approximated by a sequence of points of the interior of C. We observe that assumptions (5)-(6) yield that

\[
C = \bigcup_{\alpha > l^*} C_\alpha ,
\]

i.e., ∀x ∈ C : ∃α > l∗ such that x ∈ Cα.

From (5)-(6), γ1 = sup{‖∇f(x)‖ : x ∈ Cα} < +∞, and then γ2 = sup{‖d‖ : x ∈ Cα} < +∞, where d is the direction of the reduced gradient method (see section 3). Thus,

\[
\beta(\alpha, \varepsilon) = \sup\{\, \|y - (x + \eta d)\| : (x, y) \in C_\alpha \times C_\alpha,\ 0 \le \eta \le \varepsilon \,\} < +\infty, \tag{8}
\]

where ε and η are positive real numbers.

3  Generalized Reduced Gradient (GRG)

The problem (1) is a global optimization problem with equality constraints and nonnegative variables; the reduced gradient method for linearly constrained problems can be generalized to this nonlinearly constrained case. The GRG method begins with a basis B and a feasible solution xk = (x_B^k, x_N^k) such that x_B^k > 0. The solution xk is not necessarily a basic solution, i.e., x_N^k does not have to be identically zero. Such a solution can be obtained, e.g., by the usual first-phase procedure of linear optimization. Using the basis B, from B x_B + N x_N = b we have x_B = B^{-1}b − B^{-1}N x_N, hence the basic variables x_B can be eliminated from the problem (1):

\[
\begin{cases}
\text{Minimize:} & f_N(x_N)\\
\text{subject to:} & B^{-1}b - B^{-1}N x_N \ge 0,\\
& x_N \ge 0,
\end{cases}
\tag{9}
\]

where f_N(x_N) = f(x) = f(B^{-1}b − B^{-1}N x_N, x_N).
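A minimal numpy sketch of this elimination, of the reduced gradient and of the maximal feasible step derived just below may help fix the notation; the splitting into basic and nonbasic index sets is assumed given, the rule used here for d_N is only one possible descent choice (it is not necessarily the rule adopted in the paper), and all names are chosen for exposition only.

import numpy as np

def reduced_quantities(A, grad_f, x, basic, nonbasic):
    # Linearized system A x = b at the current point, with A = [B, N] and
    # the nondegeneracy assumption x_B > 0 of section 2.
    B, N = A[:, basic], A[:, nonbasic]
    gB, gN = grad_f[basic], grad_f[nonbasic]

    # Reduced gradient: grad f_N = grad_N f - (B^{-1} N)^t grad_B f
    BinvN = np.linalg.solve(B, N)
    reduced_grad = gN - BinvN.T @ gB

    # One simple descent choice of d_N for this sketch: move against the
    # reduced gradient, but do not decrease nonbasic variables at their bound.
    dN = -reduced_grad
    dN[(x[nonbasic] <= 0.0) & (dN < 0.0)] = 0.0

    # d_B keeps d = (d_B, d_N) in the null space of A: d_B = -B^{-1} N d_N
    dB = -BinvN @ dN

    # Maximal step preserving x_B + eta * d_B >= 0
    xB, neg = x[basic], dB < 0.0
    eta_max = np.min(-xB[neg] / dB[neg]) if np.any(neg) else np.inf
    return reduced_grad, dB, dN, eta_max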


Using the notation

\[
\nabla f(x)^t = \big( \nabla_B f(x)^t, \; \nabla_N f(x)^t \big),
\]

the gradient of f_N, which is the so-called reduced gradient, can be expressed as

\[
\nabla f_N(x)^t = -\nabla_B f(x)^t\, B^{-1}N + \nabla_N f(x)^t .
\]

Now let us assume that the basis is nondegenerate, i.e., only the nonnegativity constraints x_N ≥ 0 might be active at the current iterate xk. Let the search direction be a vector d^t = (d_B^t, d_N^t) in the null space of the matrix A, defined by d_B = −B^{-1}N d_N and d_N ≥ 0. With this choice, the feasibility of xk + ηd is guaranteed as long as x_B^k + η d_B ≥ 0, i.e., as long as

\[
\eta \;\le\; \eta_{\max} \;=\; \min_{i \in B,\ d_i < 0} \left\{ \frac{-x_i^k}{d_i} \right\} .
\]

4  Stochastic Perturbation of GRG

[...]

Lemma 1 [...] there exists ν > 0 such that, for all θ ∈ (l∗, l∗ + ν],

\[
P(U_{k+1} < \theta \mid U_k \ge \theta) \;\ge\; \frac{\operatorname{meas}(C_\gamma - C_\theta)}{\xi_k^{\,n}}\, g_k\!\left(\frac{\beta(\gamma, \varepsilon)}{\xi_k}\right) \;>\; 0,
\]

where n = dim(E). Proof: See for instance [2] and [6]. □

The global convergence is a consequence of the following result, which follows from the Borel-Cantelli lemma (see, for instance, [17]):

Lemma 2 Let {Uk}k≥0 be a decreasing sequence, lower bounded by l∗. Then there exists U such that Uk → U for k → +∞. Assume that there exists ν > 0 such that, for any θ ∈ (l∗, l∗ + ν], there is a sequence of strictly positive real numbers {ck(θ)}k≥0 such that

\[
\forall k \ge 0: \quad P(U_{k+1} < \theta \mid U_k \ge \theta) \ge c_k(\theta) > 0, \qquad
\sum_{k=0}^{+\infty} c_k(\theta) = +\infty . \tag{20}
\]

Then U = l∗ almost surely. Proof: See, for instance, [18] and [17]. □

Theorem 4.1 Let γ = f(x0) and let ηk satisfy (15). Assume that x0 ∈ C, that the sequence {ξk}k≥0 is non increasing, and that

\[
\sum_{k=0}^{+\infty} g_k\!\left(\frac{\beta(\gamma, \varepsilon)}{\xi_k}\right) = +\infty . \tag{21}
\]

Then U = l∗ almost surely.


Proof: Let

\[
c_k(\theta) \;=\; \frac{\operatorname{meas}(C_\gamma - C_\theta)}{\xi_k^{\,n}}\, g_k\!\left(\frac{\beta(\gamma,\varepsilon)}{\xi_k}\right) \;>\; 0 . \tag{22}
\]

Since the sequence {ξk}k≥0 is non increasing,

\[
c_k(\theta) \;\ge\; \frac{\operatorname{meas}(C_\gamma - C_\theta)}{\xi_0^{\,n}}\, g_k\!\left(\frac{\beta(\gamma,\varepsilon)}{\xi_k}\right) \;>\; 0 .
\]

Thus, Eq. (21) shows that

\[
\sum_{k=0}^{+\infty} c_k(\theta) \;\ge\; \frac{\operatorname{meas}(C_\gamma - C_\theta)}{\xi_0^{\,n}} \sum_{k=0}^{+\infty} g_k\!\left(\frac{\beta(\gamma,\varepsilon)}{\xi_k}\right) \;=\; +\infty .
\]

Using Lemma 1 and Lemma 2 we have U = l∗ almost surely. □

Theorem 4.2 Let Zk satisfy (18) and let

\[
\xi_k = \sqrt{\frac{a}{\log(k+d)}}\,, \tag{23}
\]

where a > 0, d > 0 and k is the iteration number. If x0 ∈ C then, for a large enough, U = l∗ almost surely.

Proof: We have

\[
\varphi_k(Z) \;=\; \frac{1}{(\sigma\sqrt{2\pi})^{n}} \exp\!\left( -\frac{1}{2}\left\|\frac{Z}{\sigma}\right\|^{2} \right) \;=\; g_k(\|Z\|) \;>\; 0 .
\]

So,

\[
g_k\!\left(\frac{\beta(\gamma,\varepsilon)}{\xi_k}\right) \;=\; \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} (k+d)^{-\beta(\gamma,\varepsilon)^2/(2\sigma^2 a)} .
\]

For a such that 0 < β(γ, ε)²/(2σ²a) ≤ 1, the series in (21) diverges, and the conclusion follows from Theorem 4.1. □

5  Numerical Experiments

[...] random number generator of the FORTRAN library. We use ξk as in (23) with d = 2, i.e., ξk = (a/log(k + 2))^{1/2}. The results for ksto = 0, 250, 500 and 1000 are given in the tables below for each problem. The experiments were performed on a workstation with an HP Intel(R) Celeron(R) M processor at 1.30 GHz and 224 MB RAM. The row cpu gives the mean CPU time in seconds for one run.
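The interplay between the scale (23) and the tuning of a can be illustrated numerically; the following small Python sketch (an illustration only, with sigma and beta taken as hypothetical values, since neither is reported in the paper) computes ξk and the exponent of the factor (k + d)^(−β(γ,ε)²/(2σ²a)) appearing in the proof of Theorem 4.2, whose series diverges exactly when this exponent is at most 1.

import math

def xi(k, a=100.0, d=2.0):
    # Perturbation scale (23): xi_k = sqrt(a / log(k + d)).
    return math.sqrt(a / math.log(k + d))

def divergence_exponent(a, sigma, beta):
    # Exponent p such that g_k(beta/xi_k) behaves like (k + d)^(-p);
    # the series in (21) diverges when p <= 1, i.e. when a >= beta**2 / (2*sigma**2).
    return beta**2 / (2.0 * sigma**2 * a)

# Hypothetical values, for illustration only (not reported in the paper):
sigma, beta = 1.0, 5.0
for a in (1.0, 10.0, 100.0):
    p = divergence_exponent(a, sigma, beta)
    print("a =", a, " xi_10 =", round(xi(10, a), 3),
          " exponent p =", round(p, 3), " series diverges:", p <= 1.0)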

5.1  Octagon problem

Consider polygons in the plane with eight sides (octagons for short) and unit diameter. Which of them has maximum area? (See, for instance, [19] and [11].) Using QP and geometric reasoning, the optimal octagon is determined in [6, 10] to have an area about 2.8% larger than that of the regular octagon. Graham's conjecture states that the optimal octagon can be illustrated as in Figure 1, in which a solid line between two vertices indicates that the distance between these points is one. In 1996, Hansen et al. [1] formulated this question as the quadratically constrained quadratic optimization problem defining this configuration, which appears below. By symmetry, and without any loss of generality, the constraint x2 ≥ x3 is added to reduce the size of the feasible region.

A4 = (0, 1)  A5 = (x1 , y1 )

 C S  C S   C S  A2 = (x3 − x1 − x5 , y1 − y3 + y5 ) C S A6 = (x1 − x2 + x4 , y1 − y2 + y4 )  HH C S   H   HHC S   C  S H  C H HH    HS C   HSH C    H S  A1 = (x1 − x2 , y1 − y2 ) A 7 = (x3 − x1 , y1 − y3 ) C C 

A0 = (0, 0) Figure 1: Definition of variables for the configuration conjectured by Graham to have maximum area.


Minimize: −½{(x2 + x3 − 4x1)y1 + (3x1 − 2x3 + x5)y2 + (3x1 − 2x2 + x4)y3 + (x3 − 2x1)y4 + (x2 − 2x1)y5} + x1

subject to:
‖A0 − A1‖ ≤ 1: (x1 − x2)² + (y1 − y2)² + z1 = 1
‖A0 − A2‖ ≤ 1: (−x1 + x2 − x5)² + (y1 − y3 + y5)² + z2 = 1
‖A0 − A6‖ ≤ 1: (x1 − x2 + x4)² + (y1 − y2 + y4)² + z3 = 1
‖A0 − A7‖ ≤ 1: (−x1 + x3)² + (y1 − y3)² + z4 = 1
‖A1 − A2‖ ≤ 1: (2x1 − x2 − x3 + x5)² + (y2 − y3 + y5)² + z5 = 1
‖A1 − A3‖ ≤ 1: (2x1 − x2)² + y2² + z6 = 1
‖A1 − A4‖ ≤ 1: (x1 − x2)² + (y1 − y2 − 1)² + z7 = 1
‖A1 − A7‖ ≤ 1: (2x1 − x2 − x3)² + (−y2 + y3)² + z8 = 1
‖A2 − A3‖ ≤ 1: (x3 − x5)² + (−y3 + y5)² + z9 = 1
‖A2 − A4‖ ≤ 1: (−x1 + x3 − x5)² + (y1 − y3 + y5 − 1)² + z10 = 1
‖A2 − A5‖ ≤ 1: (2x1 + x3 − x5)² + (−y3 + y5)² + z11 = 1
‖A2 − A6‖ = 1: (2x1 − x2 − x3 + x4 + x5)² + (−y2 + y3 + y4 − y5)² = 1
‖A3 − A6‖ ≤ 1: (−2x1 + x2 − x4)² + (y2 − y4)² + z12 = 1
‖A4 − A6‖ ≤ 1: (x1 − x2 + x4)² + (y1 − y2 + y4 − 1)² + z13 = 1
‖A4 − A7‖ ≤ 1: (x1 − x3)² + (1 − y1 + y3)² + z14 = 1
‖A5 − A6‖ ≤ 1: (x2 − x4)² + (y2 − y4)² + z15 = 1
‖A5 − A7‖ ≤ 1: (2x1 − x3)² + y3² + z16 = 1
‖A6 − A1‖ ≤ 1: (2x1 − x2 − x3 + x4)² + (−y2 + y3 + y4)² + z17 = 1
x2 − x3 − z18 = 0;  xi² + yi² = 1, i = 1, ..., 5
x1 + z19 = 0.5;  xi + z18+i = 1, i = 2, ..., 5
0 ≤ xi, yi, i = 1, ..., 5;  0 ≤ zi, i = 1, ..., 23.

There are 33 variables and 62 constraints. We use a = 100, kmax = 600, and the starting point is chosen as an approximation of the regular octagon (i.e., x1⁰ = 0.325, x2⁰ = x3⁰ = 2x1⁰, x4⁰ = x5⁰ = 0.5 + x1⁰ and yi⁰ = √(1 − (xi⁰)²), i = 1, ..., 5). The Fortran code furnishes the following optimal solution (see also Table 1):

xopt = (0.2841, 0.6847, 0.6864, 0.8754, 0.9248),
yopt = (0.9595, 0.7322, 0.7305, 0.4883, 0.3868).

ksto     0        250      500      1000
fopt     0.7071   0.7262   0.7237   0.7271
ηeval    1        6217     26090    40589
kend     1        291      523      588
cpu      0        50.71    215.43   197.68

Table 1: Results for Octagon.

5.2  Mixture problem

In this example of a petrochemical mixture, we have four reservoirs: R1, R2, R3 and R4. The first two receive three distinct source products; their contents are then combined in the other two in order to create the desired mixtures. The question is to determine the


quantity of each product to buy in order to maximize the profit. The reservoir R1 receives two products of quality (e.g., sulfur content) 3 and 1, in quantities q0 and q1 (see, for instance, [9] and [12]). The quality of the mixture contained in R1 is s = (3q0 + q1)/(q0 + q1) ∈ [1, 3]. The reservoir R2 contains a product of the third source, of quality 2, in quantity q2. One wants to obtain in the reservoirs R3 and R4, of capacities 10 and 20, products of quality at most 2.5 and 1.5 respectively. Figure 2, where the variables x1 to x4 represent the quantities transferred, illustrates this situation. The unit prices of the products bought are respectively 60, 160 and 100; those of the finished products are 90 and 150. The difference between the purchase costs and the sales is therefore 60q0 + 160q1 + 100q2 − 90(x1 + x3) − 150(x2 + x4). The qualities of the final mixtures are (s x1 + 2x3)/(x1 + x3) and (s x2 + 2x4)/(x2 + x4).

The addition of the volume conservation constraints q0 + q1 = x1 + x2 and q2 = x3 + x4 permits the elimination of the variables q0, q1 and q2:

\[
q_0 = \frac{(s-1)(x_1+x_2)}{2}, \qquad q_1 = \frac{(3-s)(x_1+x_2)}{2}, \qquad q_2 = x_3 + x_4 .
\]
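As a consistency check of this elimination (a sketch only; the symbolic computation below simply substitutes q0, q1 and q2 into the cost difference), one can verify that the transformed objective used further on is recovered, with s playing the role of the variable x5.

import sympy as sp

x1, x2, x3, x4, s = sp.symbols('x1 x2 x3 x4 s')

# Eliminated purchase quantities (quality-3 product at price 60,
# quality-1 product at price 160, quality-2 product at price 100)
q0 = (s - 1) * (x1 + x2) / 2
q1 = (3 - s) * (x1 + x2) / 2
q2 = x3 + x4

# Purchase cost minus sales (prices 90 and 150 for the finished products)
objective = 60*q0 + 160*q1 + 100*q2 - 90*(x1 + x3) - 150*(x2 + x4)

print(sp.expand(objective))
# equals 120*x1 + 60*x2 + 10*x3 - 50*x4 - 50*s*x1 - 50*s*x2,
# i.e. the transformed objective below with s written as x5.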

[Figure 2: A problem of mixture. Reservoir R1 receives q0 (quality 3) and q1 (quality 1) and sends the quantity x1 to R3 and x2 to R4, both at quality s; reservoir R2 receives q2 (quality 2) and sends x3 to R3 and x4 to R4 at quality 2; R3 has capacity 10 and quality bound 2.5, R4 has capacity 20 and quality bound 1.5.]

The transformed mathematical model (the variable s is denoted x5) is as follows:

Minimize: 120x1 + 60x2 + 10x3 − 50x4 − 50x1x5 − 50x2x5
subject to:
2.5x1 + 0.5x3 − x1x5 − z1 = 0
1.5x2 − 0.5x4 − x2x5 − z2 = 0
x1 + x3 + z3 = 10
x2 + x4 + z4 = 20
x5 + z5 = 3
x5 − z6 = 1
0 ≤ xi, i = 1, ..., 5;  0 ≤ zi, i = 1, ..., 6.

There are 11 variables and 17 constraints. We use a = 0.05 and kmax = 75; the Fortran code furnishes the following optimal solution (see also Table 2):

xopt = (0, 10, 0, 10, 1).

ksto     0       250     500     1000
fopt     −307    −316    −333    −400
ηeval    2       10      21      5766
kend     3       4       7       75
cpu      0.00    0.00    0.00    0.14

Table 2: Results for Mixture.
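The reported solution can be checked directly against the transformed model; the following small sketch evaluates the objective and the slack variables implied by the equality constraints at xopt = (0, 10, 0, 10, 1) and recovers the value −400 of Table 2 (the individual slack values printed here are deduced from the equations, they are not reported in the paper).

def mixture_objective(x):
    x1, x2, x3, x4, x5 = x
    return 120*x1 + 60*x2 + 10*x3 - 50*x4 - 50*x1*x5 - 50*x2*x5

def mixture_slacks(x):
    # Slack variables z1, ..., z6 implied by the equality constraints of the model.
    x1, x2, x3, x4, x5 = x
    return [
        2.5*x1 + 0.5*x3 - x1*x5,   # z1: quality bound in R3
        1.5*x2 - 0.5*x4 - x2*x5,   # z2: quality bound in R4
        10 - (x1 + x3),            # z3: capacity of R3
        20 - (x2 + x4),            # z4: capacity of R4
        3 - x5,                    # z5: s <= 3
        x5 - 1,                    # z6: s >= 1
    ]

x_opt = (0.0, 10.0, 0.0, 10.0, 1.0)
print("objective:", mixture_objective(x_opt))                      # -400.0
print("slacks (all must be nonnegative):", mixture_slacks(x_opt))  # [0.0, 0.0, 10.0, 0.0, 2.0, 0.0]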


6  Concluding Remarks

We have presented a stochastic modification of the reduced gradient method for linear and nonlinear constraints, involving the adjunction of a stochastic perturbation. This approach leads to a stochastic descent method in which the deterministic sequence generated by the reduced gradient is replaced by a sequence of random variables, and we have established a global convergence result for this stochastic descent method. The numerical experiments show that the method is effective for nonconvex optimization problems satisfying the nondegeneracy assumption. The use of stochastic perturbations generally improves the results furnished by the reduced gradient. Here again, we observe that the adjunction of the stochastic perturbation improves the result, at the price of a larger number of evaluations of the objective function. The final points Xkend are very close or practically identical for ksto ≥ 250.

The main difficulty in the practical use of the stochastic perturbation is the tuning of the parameters. Even if the combination with the deterministic reduced gradient increases the robustness, the values of a and ksto have an important influence on the quality of the result. The random number generator may also influence aspects such as the number of evaluations. All the sequences {ξk}k≥0 tested have furnished analogous results, but for different sets of parameters.

References

[1] C. Audet, P. Hansen and S. Le Digabel, Exact Solution of Three Nonconvex Quadratic Programming Problems, Nonconvex Optim. Appl. 74, 2004, 25-45.

[2] M. Bouhadi, R. Ellaia and J.E. Souza de Cursi, Random Perturbations of the Projected Gradient for Linearly Constrained Problems, Nonconvex Optim. Appl. 54, 2001, 487-499.

[3] A.R. Conn, N. Gould and P.L. Toint, Methods for Nonlinear Constraints in Optimization Calculations, Clarendon Press, Inst. Math. Appl. Conf. Ser., New Ser. 63, 1997, 363-390.

[4] C.C.Y. Dorea, Stopping Rules for a Random Optimization Method, SIAM Journal on Control and Optimization, 28(4), 1990, 841-850.

[5] A.S. Drud, CONOPT - A Large-Scale GRG Code, ORSA J. Comput. 6, No.2, 1994, 207-216.

[6] A. El Mouatasim, R. Ellaia and J.E. Souza de Cursi, Stochastic Perturbation of Variable Metric Method for Unconstrained Nonconvex Optimization, to appear in Applied Mathematics and Computer Sciences, Vol. 16, No. 4, 2006.

[7] A. El Mouatasim, R. Ellaia and J.E. Souza de Cursi, Stochastic Perturbation of Projected Variable Metric Method for Nonconvex Nonsmooth Optimization Problems with Linear Constraints, submitted to RAIRO - Recherche Opérationnelle, 2005.


[8] R. Ellaia and A. El Mouatasim, Random Perturbation of Reduced Gradient Method for Global Optimization, to appear in Proceedings of MCO'04, Metz, France.

[9] L.R. Foulds, D. Haugland and K. Jornsten, A Bilinear Approach to the Pooling Problem, Optimization 24, 1992, 165-180.

[10] N.I.M. Gould and P.L. Toint, SQP Methods for Large-Scale Nonlinear Programming, System Modelling and Optimization. Methods, Theory and Applications, IFIP, Int. Fed. Inf. Process. 46, 2000, 149-178.

[11] R.L. Graham, The Largest Small Hexagon, Journal of Combinatorial Theory, Series A 18, 1975, 165-170.

[12] C.A. Haverly, Studies of the Behaviour of Recursion for the Pooling Problem, ACM SIGMAP Bulletin 25, 1996, 19-28.

[13] P. Kall, Solution Methods in Stochastic Programming, System Modelling and Optimization, Lect. Notes Control Inf. Sci. 197, 1994, 3-22.

[14] K.C. Kiwiel, Free-Steering Relaxation Methods for Problems with Strictly Convex Costs and Linear Constraints, Math. Oper. Res. 22, No.2, 1997, 326-349.

[15] L.S. Lasdon, A.D. Waren and M. Ratner, Design and Testing of a Generalized Reduced Gradient Code for Nonlinear Programming, ACM Trans. Math. Softw. 4, 1978, 34-50.

[16] J.M. Martinez, A Direct Search Method for Nonlinear Programming, ZAMM, Z. Angew. Math. Mech. 79, No.4, 1999, 267-276.

[17] M. Pogu and J.E. Souza de Cursi, Global Optimization by Random Perturbation of the Gradient Method with a Fixed Parameter, J. Glob. Optim. 5, No.2, 1994, 159-180.

[18] P. L'Ecuyer and R. Touzin, On the Deng-Lin Random Number Generators and Related Methods, Statistics and Computing, 14(1), 2003, 5-9.

[19] K. Reinhardt, Extremale Polygone gegebenen Durchmessers, Jber. Dtsch. Math.-Ver. 31, 1922, 251-270.

[20] Y. Smeers, Generalized Reduced Gradient Method as an Extension of Feasible Direction Methods, J. Optimization Theory Appl. 22, 1977, 209-226.

[21] P. Sadegh and J.C. Spall, Optimal Random Perturbations for Stochastic Approximation Using a Simultaneous Perturbation Gradient Approximation, IEEE Trans. Autom. Control 44, No.1, 1999, 231-232.

[22] J.E. Souza de Cursi, R. Ellaia and M. Bouhadi, Global Optimization Under Nonlinear Restrictions by Using Stochastic Perturbations of the Projected Gradient, Nonconvex Optim. Appl. 74, 2004, 541-563.