JNM: Journal of Natural Sciences and Mathematics, Vol. 2, No. 1, pp. 17-40 (June 2008 / Jumada Al-Akher 1429H)
© Qassim University Publications

Reduced Gradient Method and its Generalization via Stochastic Perturbation

Abdelkrim El Mouatasim
Department of Mathematics, Faculty of Sciences, Jazan University, B.P. 114, Jazan, Saudi Arabia.
E-mail: [email protected]
Abstract: In this paper, the global optimization of a nonconvex objective function under linear and nonlinear differentiable constraints is studied. A reduced gradient method and a generalized reduced gradient (GRG) descent method with random perturbations are proposed, and the global convergence of the algorithm is established. Numerical examples are also given for statistical, octagon, mixture, alkylation and pooling problems.

Keywords: Stochastic perturbation, constrained optimization, reduced gradient method and its generalization, nonconvex optimization.

Received for JNM on November 19, 2007.
1 Introduction
The general constrained optimization problem is to minimize a nonlinear function subject to linear or nonlinear constraints. Two equivalent formulations of this problem are useful for describing algorithms. They are

Minimize: f(x)
subject to: g_i(x) = 0, i = 1, ..., m_l
            g_i(x) ≤ 0, i = m_l + 1, ..., m
            ℓ ≤ x ≤ u        (1.1)

where f and each g_i : IR^n → IR are twice continuously differentiable functions and the lower- and upper-bound vectors, ℓ and u, may contain some infinite components; and

Minimize: f(x)
subject to: h(x) = 0
            x ≥ 0        (1.2)
where h maps IR^n into IR^m, the objective function f : IR^n → IR and the restrictions h_j : IR^n → IR, j = 1, ..., m (the components of h), are twice continuously differentiable functions. For the case of multi-objective optimization see [10].
The main technique proposed in this work for solving constrained optimization problems is the reduced gradient method and its generalization. We are mainly interested in the situation where, on the one hand, f is not convex and, on the other hand, the constraints are in general not linear; to get equality constraints one might think about introducing slack variables and incorporating them into the functions g_i. Problem (1.2) can be numerically approached by using the reduced gradient method or its generalizations, which generates a sequence {x_k}_{k≥0}, where x_0 is an initial feasible point and, for each k > 0, a new feasible point x_{k+1} is generated from x_k by using an operator Q_k (see Section 3). Thus the iterations are given by

k ≥ 0 : x_{k+1} = Q_k(x_k).        (1.3)
In order to prevent convergence to local minima, various modifications of these basic methods have been introduced in the literature. For instance, the reduced gradient method has been considered in [1]; the projected gradient method with random perturbations in [2]; sequential convex programming in [24]; the generalized reduced gradient in [5, 7, 25-26, 32, 34]; sequential quadratic programming in [15, 30]; nonlinear programming in [3, 20, 27]; and stochastic perturbation of the active set in [11]. Moreover, stochastic or evolutionary search can be introduced, but these methods are usually considered expensive and not accurate: on the one hand, the flexibility introduced by randomization often implies a large number of evaluations of the objective function and, on the other hand, pure random search generates many points which do not improve the value of the objective function. This remark has led some authors to introduce a controlled random search (see for instance [2, 4, 6, 9, 23, 28, 33]). In such a method, the sequence {x_k}_{k≥0} is replaced by a sequence of random vectors {X_k}_{k≥0} and the iterations are modified as follows:

k ≥ 0 : X_{k+1} = Q_k(X_k) + P_k        (1.4)
where Pk is a suitable random variable, called the stochastic perturbation. The sequence {Pk }k≥0 goes to zero slowly enough in order to prevent convergence to a local minimum (see Section 4). The reduced gradient method and its generalization are recalled in Section 3, while the notations are introduced in Section 2. The results of numerical experiments will be given in Section 5.
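To fix ideas, a minimal Python sketch of scheme (1.4) is given below. It is only an illustration and not the implementation used in this paper: the descent operator Q, the acceptance rule (the perturbation is kept only if it decreases f, so that f(X_k) stays decreasing), and the scale schedule ξ_k = sqrt(a/log(k+2)) (the schedule used later in Section 5) are assumptions of the sketch, and constraints are ignored for brevity.

```python
import numpy as np

def perturbed_iteration(f, Q, x0, a=1.0, n_iter=100, seed=0):
    """Minimal sketch of scheme (1.4): X_{k+1} = Q_k(X_k) + P_k."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for k in range(n_iter):
        x = Q(x)                                       # deterministic descent step Q_k(X_k)
        xi = np.sqrt(a / np.log(k + 2.0))              # perturbation scale, decreasing slowly
        trial = x + xi * rng.standard_normal(x.size)   # stochastic perturbation P_k
        if f(trial) < f(x):                            # accept only improvements of f
            x = trial
    return x
```

The essential point is that the perturbation scale decreases slowly enough for the process to keep a chance of escaping local minima at every stage.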
2 Notations and assumptions
We denote by IR the set of real numbers (−∞, +∞), by IR+ the set of positive real numbers [0, +∞), and by E = IR^n the n-dimensional real Euclidean space. For x = (x_1, x_2, ..., x_n)^t ∈ E, x^t denotes the transpose of x. We denote by ||x|| = √(x^t x) = (x_1² + ... + x_n²)^{1/2} the Euclidean norm of x.
The general optimization problem (1.1) is equivalent to the global optimization problem (1.2) with equality constraints and nonnegative variables.
Remark 2.1 The assumed conditions can be satisfied by using a two-phase method, since we can add artificial variables in the constraints.

• Linear case: h(x) = Ax − b, where A is an m × n matrix with rank m and b is an m-vector. Any m columns of A are linearly independent, and every extreme point of the feasible region has m strictly positive variables. We introduce basic and nonbasic variables according to

A = [B, N],   x = (x_B, x_N),   x_B > 0,   x_N ≥ 0,        (2.1)

by the nondegeneracy assumption. Furthermore, the gradient of f may be conformably partitioned as ∇f(x)^t = (∇_B f(x)^t, ∇_N f(x)^t), where ∇_B f and ∇_N f are the basic and nonbasic gradients of f, respectively.

• Nonlinear case: for simplicity we also assume that the gradients of the constraint functions h_j are linearly independent at every point x ≥ 0. This assumption, like the assumption of full rank of the matrix A in the linear case, ensures that we can choose a basis, express the basic variables in terms of the nonbasic variables and then reduce the problem. Hence a similar generalized reduced gradient procedure applies. Let a feasible solution x^k ≥ 0 with h_j(x^k) = 0 for all j be given. By assumption the Jacobian matrix of the constraints h(x) = (h_1(x), ..., h_m(x))^t has full rank at each x ≥ 0; for simplicity, at the point x^k it will be denoted by A_k = Jh(x^k), and we set b_k = A_k x^k. Then a construction similar to the linear case applies.

Let C = {x ∈ E | h(x) = 0, x ≥ 0}. The objective function is f : E → IR; its lower bound on C is denoted by l*: l* = min_C f. Let us introduce

C_α = S_α ∩ C;   S_α = {x ∈ E | f(x) ≤ α}.
We assume that

f is twice continuously differentiable on E;        (2.2)

∀α > l* : C_α is non-empty, closed and bounded;        (2.3)

∀α > l* : meas(C_α) > 0,        (2.4)

where meas(C_α) is the measure of C_α.
Since E is a finite dimensional space, assumption (2.3) is verified when C is bounded or f is coercive, i.e., lim_{||x||→+∞} f(x) = +∞. Assumption (2.4) is verified when C contains a sequence of neighborhoods of an optimum point x* having strictly positive measure, i.e., when x* can be approximated by a sequence of points of the interior of C. We observe that assumptions (2.2)-(2.3) yield

C = ∪_{α>l*} C_α,   i.e., ∀x ∈ C : ∃α > l* such that x ∈ C_α.

From (2.2) and (2.3) we get γ_1 = sup{||∇f(x)|| : x ∈ C_α} < +∞. Then γ_2 = sup{||d|| : x ∈ C_α} < +∞, where d is the direction of the reduced gradient method (see Section 3). Thus,

β(α, ε) = sup{||y − (x + ηd)|| : (x, y) ∈ C_α × C_α, 0 ≤ η ≤ ε} < +∞,        (2.5)

where ε, η are positive real numbers.
3 Reduced Gradient and its Generalization

3.1 Reduced Gradient Method
The reduced gradient method is closely related to the simplex method of linear programming in that the problem variables are partitioned into basic and nonbasic groups. From a theoretical viewpoint, the method can be shown to behave very much like the gradient projection method [1]. Some variants of the method applied to convex nonlinear problems with linear constraints are also referred to as the convex simplex method. Again we are dealing with the same basic descent algorithm structure and are interested in still other means for computing descent directions. We use the following form of the linearly constrained problem, with f(·) continuously differentiable on IR^n:

Minimize: f(x)
subject to: Ax = b
            x ≥ 0        (3.1)
where A is an m × n matrix with full row rank and b ∈ IR^m. The reduced gradient method begins with a basis B and a feasible solution x^k = (x^k_B, x^k_N) such that x^k_B > 0. The solution x is not necessarily a basic solution, i.e. x_N does not have to be identically zero. Such a solution can be obtained, e.g., by the usual first-phase procedure of linear optimization. Using the basis B, from Bx_B + Nx_N = b we have x_B = B^{-1}b − B^{-1}Nx_N; hence the basic variables x_B can be eliminated from problem (3.1):

Minimize: f_N(x_N)
subject to: B^{-1}b − B^{-1}Nx_N ≥ 0
            x_N ≥ 0        (3.2)
where f_N(x_N) = f(x) = f(B^{-1}b − B^{-1}Nx_N, x_N). Using the notation ∇f(x)^t = (∇_B f(x)^t, ∇_N f(x)^t), the gradient of f_N, which is the so-called reduced gradient, can be expressed as

∇f_N(x)^t = −∇_B f(x)^t B^{-1}N + ∇_N f(x)^t.

Now let us assume that the basis is nondegenerate, i.e. only the nonnegativity constraints x_N ≥ 0 might be active at the current iterate x^k. Let the search direction be a vector d^t = (d_B^t, d_N^t) in the null space of the matrix A, defined by d_B = −B^{-1}Nd_N and d_N ≥ 0. With this definition, the feasibility of x^k + ηd is guaranteed as long as x^k_B + ηd_B ≥ 0, i.e. as long as

η ≤ η_max = min{−x^k_i / d_i : i ∈ B, d_i < 0}.

Lemma 4.1 There exists ν > 0 such that

P(U_{k+1} < θ | U_k ≥ θ) ≥ (meas(C_γ − C_θ)/ξ_k^n) g_k(β(γ, ε)/ξ_k) > 0   ∀θ ∈ (l*, l* + ν],

where n = dim(E).

Proof. Let Ĉ_θ = {x ∈ C | f(x) < θ}, for θ ∈ (l*, l* + ν]. Since C_α ⊂ Ĉ_θ for l* < α < θ, it follows from (2.4) that Ĉ_θ is non-empty and has a strictly positive measure. If meas(C − Ĉ_θ) = 0 for any θ ∈ (l*, l* + ν], the result is immediate, since we have f(x) = l* on C. Let us assume that there exists ε > 0 such that meas(C − Ĉ_ε) > 0. For θ ∈ (l*, l* + ε], we have Ĉ_θ ⊂ Ĉ_ε and meas(C − Ĉ_θ) > 0. Then

P(X_k ∉ Ĉ_θ) = P(X_k ∈ C − Ĉ_θ) = ∫_{C−Ĉ_θ} P(X_k ∈ dx) > 0 for any θ ∈ (l*, l* + ε].

Since the sequence {U_i}_{i≥0} is decreasing, we have also

{X_i}_{i≥0} ⊂ C_γ.        (4.9)

Therefore

P(X_k ∉ Ĉ_θ) = P(X_k ∈ C_γ − Ĉ_θ) = ∫_{C_γ−Ĉ_θ} P(X_k ∈ dx) > 0 for any θ ∈ (l*, l* + ε].

Let θ ∈ (l*, l* + ε]; we have from (4.8)

P(U_{k+1} < θ | U_k ≥ θ) = P(X_{k+1} ∈ Ĉ_θ | X_i ∉ Ĉ_θ, i = 0, ..., k).

But (4.4) and (4.5) yield that

P(X_{k+1} ∈ Ĉ_θ | X_i ∉ Ĉ_θ, i = 0, ..., k) = P(X_{k+1} ∈ Ĉ_θ | X_k ∉ Ĉ_θ).

Thus

P(X_{k+1} ∈ Ĉ_θ | X_k ∉ Ĉ_θ) = P(X_{k+1} ∈ Ĉ_θ, X_k ∉ Ĉ_θ) / P(X_k ∉ Ĉ_θ).

Moreover,

P(X_{k+1} ∈ Ĉ_θ, X_k ∉ Ĉ_θ) = ∫_{C−Ĉ_θ} P(X_k ∈ dx) ∫_{Ĉ_θ} f_{k+1}(y | X_k = x) dy.

Using (4.9) we get

P(X_{k+1} ∈ Ĉ_θ, X_k ∉ Ĉ_θ) = ∫_{C_γ−Ĉ_θ} P(X_k ∈ dx) ∫_{Ĉ_θ} f_{k+1}(y | X_k = x) dy

and

P(X_{k+1} ∈ Ĉ_θ, X_k ∉ Ĉ_θ) ≥ inf_{x∈C_γ−Ĉ_θ} { ∫_{Ĉ_θ} f_{k+1}(y | X_k = x) dy } ∫_{C_γ−Ĉ_θ} P(X_k ∈ dx).

Thus

P(X_{k+1} ∈ Ĉ_θ | X_k ∉ Ĉ_θ) ≥ inf_{x∈C_γ−Ĉ_θ} { ∫_{Ĉ_θ} f_{k+1}(y | X_k = x) dy }.

Taking (4.5) into account, we have

P(X_{k+1} ∈ Ĉ_θ | X_k ∉ Ĉ_θ) ≥ (1/ξ_k^n) inf_{x∈C_γ−Ĉ_θ} { ∫_{Ĉ_θ} φ_k((y − Q_k(x))/ξ_k) dy }.

Note that (2.5) shows that ||y − Q_k(x)|| ≤ β(γ, ε) and (4.6) yields that

φ_k((y − Q_k(x))/ξ_k) ≥ g_k(β(γ, ε)/ξ_k).

Hence

P(X_{k+1} ∈ Ĉ_θ | X_k ∉ Ĉ_θ) ≥ (1/ξ_k^n) inf_{x∈C_γ−Ĉ_θ} ∫_{Ĉ_θ} g_k(β(γ, ε)/ξ_k) dy.

It follows that

P(X_{k+1} ∈ Ĉ_θ | X_k ∉ Ĉ_θ) ≥ (meas(C_γ − C_θ)/ξ_k^n) g_k(β(γ, ε)/ξ_k).
The global convergence is a consequence of the following result, which follows from the Borel-Cantelli lemma (see, for instance, [28]).
Lemma 4.2 Let {U_k}_{k≥0} be a decreasing sequence, lower bounded by l*. Then there exists U such that U_k → U for k → +∞. Assume that there exists ν > 0 such that, for any θ ∈ (l*, l* + ν], there is a sequence of strictly positive real numbers {c_k(θ)}_{k≥0} such that

∀k ≥ 0 : P(U_{k+1} < θ | U_k ≥ θ) ≥ c_k(θ) > 0;   Σ_{k=0}^{+∞} c_k(θ) = +∞.        (4.10)

Then U = l* almost surely.

For the proof we refer the reader to [21] or [28].

Theorem 1 Let γ = f(x_0) and η_k satisfy (4.4). Assume that x_0 ∈ C, the sequence ξ_k is non-increasing and

Σ_{k=0}^{+∞} g_k(β(γ, ε)/ξ_k) = +∞.        (4.11)

Then U = l* almost surely.

Proof. Let

c_k(θ) = (meas(C_γ − C_θ)/ξ_k^n) g_k(β(γ, ε)/ξ_k) > 0.        (4.12)

Since the sequence {ξ_k}_{k≥0} is non-increasing,

c_k(θ) ≥ (meas(C_γ − C_θ)/ξ_0^n) g_k(β(γ, ε)/ξ_k) > 0.

Thus, equation (4.11) shows that

Σ_{k=0}^{+∞} c_k(θ) ≥ (meas(C_γ − C_θ)/ξ_0^n) Σ_{k=0}^{+∞} g_k(β(γ, ε)/ξ_k) = +∞.

Using Lemma 4.1 and Lemma 4.2 we have U = l* almost surely.
Theorem 2 Let Z_k = 1_C(Z)Z, where Z is a random variable following N(0, σ Id) (σ > 0), and let

ξ_k = √(a / log(k + d)),        (4.13)

where a > 0, d > 0 and k is the iteration number. If x_0 ∈ C then, for a large enough, U = l* almost surely.

Proof. We have

φ_k(z) = (1/(σ√(2π)))^n exp(−||z||²/(2σ²)) = g_k(||z||) > 0.

So

g_k(β(γ, ε)/ξ_k) = (1/(σ√(2π)))^n (k + d)^{−β(γ,ε)²/(2σ²a)}.

For a such that

0 < β(γ, ε)²/(2σ²a) < 1,

the exponent of (k + d) is larger than −1, so the series diverges and we have

Σ_{k=0}^{∞} g_k(β(γ, ε)/ξ_k) = +∞,

and, from the preceding theorem, we have U = l* almost surely.
5 Numerical results
In order to apply the method presented in Section 4, we start at the initial value X_0 = x_0 ∈ C. At step k ≥ 0, X_k is known and X_{k+1} is determined. We denote by ksto the number of perturbations; the case ksto = 0 corresponds to the unperturbed descent method. We add slack variables z1, z2, ... to the inequality constraints in order to obtain equality constraints. We introduce a maximum iteration number kmax such that the iterations are stopped when k = kmax or x_k is a Kuhn-Tucker point. For each descent direction, the optimal η_k is calculated by unidimensional exhaustive search with a fixed step on the interval defined by (3.4). This procedure may imply a large number of calls of the objective function. The global number of function evaluations is ηeval and includes the evaluations for the unidimensional search. We denote by kend the value of k when the iterations are stopped (it corresponds to the number of evaluations of the gradient of f). The optimal value and optimal point are fopt and xopt, respectively. The perturbation is normally distributed and samples are generated by using the log-trigonometric generator and the standard random number generator of the FORTRAN library. We use ξ_k = √(a / log(k + 2)), where a > 0. The maximum number of iterations has been fixed at kmax = 100. The results for ksto = 0, 250, 500, 1000 and 2000 are given in the tables below for each problem. The experiments were performed on a workstation with an HP Intel(R) Celeron(R) M processor at 1.30 GHz and 224 MB RAM. The row cpu gives the mean CPU time in seconds for one run.
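As a complement to this description, the following Python sketch illustrates one iteration of the stochastically perturbed reduced gradient scheme for the linearly constrained problem (3.1). It is not the author's FORTRAN code: the Wolfe-type safeguard on the nonbasic direction, the 50-point grid for the exhaustive line search, the restriction of the perturbation to the nonbasic variables (with the basic ones recomputed from Ax = b) and the rejection of infeasible trial points are simplifying assumptions made for the illustration only.

```python
import numpy as np

def reduced_gradient_direction(grad, x, basic, nonbasic, A):
    """Reduced gradient r = grad_N - N^T B^{-T} grad_B and a direction d with A d = 0."""
    B, N = A[:, basic], A[:, nonbasic]
    r = grad[nonbasic] - N.T @ np.linalg.solve(B.T, grad[basic])
    d = np.zeros(A.shape[1])
    # One standard safeguard: do not decrease nonbasic variables sitting at zero.
    d[nonbasic] = np.where(r < 0, -r, -x[nonbasic] * r)
    d[basic] = -np.linalg.solve(B, N @ d[nonbasic])     # keeps A d = 0
    return d

def sprg_step(f, grad_f, A, b, x, basic, nonbasic, xi, n_samples, rng):
    """One iteration: reduced gradient descent step, then stochastic perturbation."""
    d = reduced_gradient_direction(grad_f(x), x, basic, nonbasic, A)
    # Exhaustive one-dimensional search with a fixed step on [0, eta_max].
    neg = d < -1e-12
    eta_max = np.min(-x[neg] / d[neg]) if neg.any() else 1.0
    x_new = min((x + eta * d for eta in np.linspace(0.0, eta_max, 50)), key=f)
    # Perturb the nonbasic variables, restore A x = b via the basic ones, and
    # keep the trial point only if it is feasible and improves f.
    B, N = A[:, basic], A[:, nonbasic]
    for _ in range(n_samples):
        trial = x_new.copy()
        trial[nonbasic] += xi * rng.standard_normal(len(nonbasic))
        trial[basic] = np.linalg.solve(B, b - N @ trial[nonbasic])
        if (trial >= 0).all() and f(trial) < f(x_new):
            x_new = trial
    return x_new
```

A full run would wrap sprg_step in a loop over k with ξ_k = √(a/log(k+2)), n_samples = ksto and the stopping tests (k = kmax or a Kuhn-Tucker point) described above, passing a generator such as np.random.default_rng() as rng.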
5.1 Statistical problem
The statistical problem of estimating a bounded mean with a minimax procedure is nonconvex and nonlinear; we will restrict ourselves to this problem (see, for instance, [16] and [17]). Thus, our problem is reduced to the following: for a suitable m_2 > m_1 ≈ 1.27 and for each m ∈ (m_1, m_2], find the coefficients a_1, a_2. In this case, the problem reduces to a global optimization problem with 5 unknowns (θ, a_1, a_2, a_3, b) and 3 equality constraints. This
problem is equivalent to the maximization of a convex combination of the several g_i functions without the constraint on θ (see for instance [14] and [22]). Using the equality a_1 + a_2 + a_3 = 1, one variable, a_1 for example, can be eliminated. The problem we eventually studied is the following:

Minimize: −{(1 − a_2 − a_3)g_1 + a_2 g_2 + a_3 g_3}
subject to: a_2 + a_3 + z_1 = 1
            a_i + z_i = 1, i = 2, 3
            b + z_4 = 1
            0 ≤ a_i, i = 2, 3;  0 ≤ b;  0 ≤ z_i, i = 1, ..., 4,

where

g_1(a_2, a_3, b) : δ(1)² = θ,
g_2(a_2, a_3, b) : bm exp(−bm) + Σ_{x=1}^{∞} (bm − δ(x))² (bm)^{x−1} exp(−bm)/x! = θ,
g_3(a_2, a_3, b) : m exp(−m) + Σ_{x=1}^{∞} (m − δ(x))² m^{x−1} exp(−m)/x! = θ,

and

δ(x) = m (a_2 b^x exp((1 − b)m) + a_3) / (a_2 b^{x−1} exp((1 − b)m) + a_3)   if x ≠ 0, 1,
δ(1) = m (a_2 b exp(−bm) + a_3 exp(−m)) / (1 − a_2 + a_3 + a_2 exp(−bm) + a_3 exp(−m)).

There are 7 variables and 10 constraints. We use a = 1 and m = 1.5; the Fortran code furnishes the following optimal solution (see also Table 1):

a_1* = 0.0958,  a_2* = 0.9042,  a_3* = 0.00000,  b* = 0.907159.
Table 1: Results for Statistics

ksto     0         250       500       1000      2000
fopt     −0.3401   −0.4022   −0.4030   −0.4044   −0.4044
ηeval    1         1183      4773      10743     17635
kend     2         7         12        13        11
cpu      0.00      37.16     134.72    288.91    491.86
5.2 Octagon problem
Consider polygons in the plane with eight sides (octagons) and unit diameter. Which one of them has maximum area? (see for instance [18, 31]). Using QP and geometric reasoning, the optimal octagon has been determined to have an area about 2.8% larger than that of the regular octagon. Graham's conjecture states that the optimal octagon can be illustrated as in Figure 1, in which a solid line between two vertices indicates that the distance between these points is one. In 1996, Hansen formulated this question as the quadratically constrained quadratic optimization problem defining this configuration, which appears below (see for instance [8]). By symmetry, and without loss of generality, the constraint x2 ≥ x3 is added to reduce the size of the feasible region.
Figure 1: Definition of variables for the configuration conjectured by Graham to have maximum area. The vertices are A0 = (0, 0), A1 = (x1 − x2, y1 − y2), A2 = (x3 − x1 − x5, y1 − y3 + y5), A3 = (−x1, y1), A4 = (0, 1), A5 = (x1, y1), A6 = (x1 − x2 + x4, y1 − y2 + y4), A7 = (x3 − x1, y1 − y3).
Minimize: −(1/2){(x2 + x3 − 4x1)y1 + (3x1 − 2x3 + x5)y2 + (3x1 − 2x2 + x4)y3 + (x3 − 2x1)y4 + (x2 − 2x1)y5} + x1
Subject to:
||A0 − A1|| ≤ 1 : (x1 − x2)² + (y1 − y2)² + z1 = 1
||A0 − A2|| ≤ 1 : (−x1 + x2 − x5)² + (y1 − y3 + y5)² + z2 = 1
||A0 − A6|| ≤ 1 : (x1 − x2 + x4)² + (y1 − y2 + y4)² + z3 = 1
||A0 − A7|| ≤ 1 : (−x1 + x3)² + (y1 − y3)² + z4 = 1
||A1 − A2|| ≤ 1 : (2x1 − x2 − x3 + x5)² + (y2 − y3 + y5)² + z5 = 1
||A1 − A3|| ≤ 1 : (2x1 + x2)² + y2² + z6 = 1
||A1 − A4|| ≤ 1 : (x1 − x2)² + (y1 − y2 − 1)² + z7 = 1
||A1 − A7|| ≤ 1 : (2x1 − x2 − x3)² + (−y2 + y3)² + z8 = 1
||A2 − A3|| ≤ 1 : (x3 − x5)² + (−y3 + y5)² + z9 = 1
||A2 − A4|| ≤ 1 : (−x1 + x3 − x5)² + (y1 − y3 + y5 − 1)² + z10 = 1
||A2 − A5|| ≤ 1 : (2x1 + x3 − x5)² + (−y3 + y5)² + z11 = 1
||A2 − A6|| = 1 : (2x1 − x2 − x3 + x4 + x5)² + (−y2 + y3 + y4 − y5)² = 1
||A3 − A6|| ≤ 1 : (−2x1 + x2 − x4)² + (y2 − y4)² + z12 = 1
||A4 − A6|| ≤ 1 : (x1 − x2 + x4)² + (y1 − y2 + y4 − 1)² + z13 = 1
||A4 − A7|| ≤ 1 : (x1 − x3)² + (1 − y1 + y3)² + z14 = 1
||A5 − A6|| ≤ 1 : (x2 − x4)² + (y2 − y4)² + z15 = 1
||A5 − A7|| ≤ 1 : (2x1 − x3)² + y3² + z16 = 1
||A6 − A1|| ≤ 1 : (2x1 − x2 − x3 + x4)² + (−y2 + y3 + y4)² + z17 = 1
x2 − x3 − z18 = 0
xi² + yi² = 1, i = 1, 2, 3, 4, 5
x1 + z19 = 0.5
xi + z{18+i} = 1, i = 2, 3, 4, 5
0 ≤ xi, yi, i = 1, 2, ..., 5;  0 ≤ zi, i = 1, 2, ..., 23.
There are 33 variables and 62 constraints. We use a = 5; the Fortran code furnishes the following optimal solution (see also Table 2):

xopt = (0.25682, 0.67357, 0.67060, 0.91187, 0.91342),
yopt = (0.96619, 0.73978, 0.74245, 0.41172, 0.40820).

Table 2: Results for Octagon

ksto     0           250        500         1000       2000
fopt     −0.726637   −0.72712   −0.727377   −0.72738   −0.72739
ηeval    2           526        1654        2634       4585
kend     3           11         12          10         15
cpu      0.00        1.48       6.70        10.67      12.53
5.3 Mixture problem
In this example of petrochemical mixture we have four reservoirs: R1, R2, R3 and R4. The first two receive three distinct source products; their contents are then combined in the other two reservoirs to create the desired blends. The question is to determine the quantity of each product to buy in order to maximize the profit. The reservoir R1 receives two products of quality (e.g., sulfur content) 3 and 1 in quantities q0 and q1 (see for instance [12, 19]). The quality of the mixture contained in R1 is s = (3q0 + q1)/(q0 + q1) ∈ (1, 3). The reservoir R2 contains a product of the third source, of quality 2, in quantity q2. One wants to obtain in the reservoirs R3 and R4, of capacities 10 and 20, products of quality at most 2.5 and 1.5, respectively. Figure 2, where the variables x1 to x4 represent the transferred quantities, illustrates this situation. The unit prices of the bought products are respectively 60, 160 and 100; those of the finished products are 90 and 150. The difference between the costs of purchase and sale is therefore 60q0 + 160q1 + 100q2 − 90(x1 + x3) − 150(x2 + x4). The qualities of the final blends are (sx1 + 2x3)/(x1 + x3) and (sx2 + 2x4)/(x2 + x4). The addition of the volume conservation constraints q0 + q1 = x1 + x2 and q2 = x3 + x4 permits the elimination of the variables q0, q1 and q2:

q0 = (s − 1)(x1 + x2)/2,   q1 = (3 − s)(x1 + x2)/2,   q2 = x3 + x4.
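Indeed, combining the definition of s with the volume balance gives 3q0 + q1 = s(x1 + x2) and q0 + q1 = x1 + x2; subtracting the second equation from the first yields 2q0 = (s − 1)(x1 + x2), which gives the expressions above.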
Figure 2: A problem of mixture. (R1 receives the quantities q0 and q1 of qualities 3 and 1; R2 receives the quantity q2 of quality 2; x1 and x2 flow from R1, of quality s, to R3 and R4, while x3 and x4 flow from R2, of quality 2, to R3 and R4; R3 has capacity 10 and quality limit 2.5, R4 has capacity 20 and quality limit 1.5.)
The transformed mathematical model (the variable s is denoted x5) is as follows:

Minimize: 120x1 + 60x2 + 10x3 − 50x4 − 50x1x5 − 50x2x5
Subject to:
2.5x1 + 0.5x3 − x1x5 − z1 = 0
1.5x2 − 0.5x4 − x2x5 − z2 = 0
x1 + x3 + z3 = 10
x2 + x4 + z4 = 20
x5 + z5 = 3,  x5 − z6 = 1
0 ≤ xi, i = 1, ..., 5;  0 ≤ zi, i = 1, ..., 6.

There are 11 variables and 17 constraints. We use a = 0.05; the Fortran code furnishes the following optimal solution (see also Table 3):

xopt = (0, 10, 0, 10, 1).

Table 3: Results for Mixture

ksto     0      250    500    1000   2000
fopt     −306   −353   −367   −374   −400
ηeval    2      28     51     105    28667
kend     3      7      7      9      101
cpu      0.00   0.02   0.36   1.27   38.53
We obtain the optimum value of the objective function (fopt = −400) for the Mixture problem with ksto = 2000.
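As a quick sanity check, independent of the original experiments, the reported point can be substituted into the model above; the short Python snippet below confirms that xopt = (0, 10, 0, 10, 1) is feasible and attains the value −400.

```python
# Reported optimal point for the mixture problem: (x1, x2, x3, x4, x5)
x1, x2, x3, x4, x5 = 0.0, 10.0, 0.0, 10.0, 1.0

objective = 120*x1 + 60*x2 + 10*x3 - 50*x4 - 50*x1*x5 - 50*x2*x5
quality_R3 = 2.5*x1 + 0.5*x3 - x1*x5      # must be >= 0 (slack z1)
quality_R4 = 1.5*x2 - 0.5*x4 - x2*x5      # must be >= 0 (slack z2)

assert objective == -400.0
assert quality_R3 >= 0 and quality_R4 >= 0
assert x1 + x3 <= 10 and x2 + x4 <= 20 and 1 <= x5 <= 3
print("f(x_opt) =", objective)            # -400.0
```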
5.4 Alkylation problem
The alkylation process is common in the petroleum industry; it is an important unit used in refineries to upgrade light olefins and isobutane into much more highly valued gasoline components. The alkylate product is usually blended into gasoline in order to improve its performance; see problem 5.3. A simplified process flow diagram of an alkylation process is shown in Figure 3 below. The process model seeks to determine the optimum set of operating conditions for the process, based on a mathematical model which allows the maximization of profit. The problem is formulated as a direct nonlinear programming model with mixed nonlinear inequality and equality constraints and a nonlinear profit function to be maximized (see for instance [3]). A typical reaction [29] is

(CH3)2C=CH2 (olefin feed) + (CH3)3CH (isobutane make-up)  --fresh acid-->  (CH3)2CHCH2C(CH3)3 (alkylate product).

As shown in Figure 3, an olefin feed (100% butene), a pure isobutane recycle and a 100% isobutane make-up stream are introduced in a reactor together with an acid
catalyst. The reactor product stream is then passed through a fractionator where the isobutane and the alkylate product are separated. The spent acid is also removed from the reactor.

Figure 3: Simplified alkylation process flow sheet. (Olefin feed, isobutane make-up and fresh acid enter the reactor; the reactor output goes to a fractionator, which returns the isobutane recycle and delivers the hydrocarbon and alkylate products; spent acid leaves the reactor.)

The notation used is shown in Table 4, along with the upper and lower bounds on each variable. The bounds represent economic, physical and performance constraints.
Table 4: Variables and their bounds

Symbol   Variable                                 Lower bound   Upper bound
x1       Olefin feed (barrels/day)                0             2000
x2       Isobutane recycle (barrels/day)          0             16000
x3       Acid addition rate (x1000 pounds/day)    0             120
x4       Alkylate yield (barrels/day)             0             5000
x5       Isobutane make-up (barrels/day)          0             2000
x6       Acid strength (wt.%)                     85            93
x7       Motor octane no.                         90            95
x8       External isobutane-olefin ratio          3             12
x9       Acid dilution factor                     1.2           4
x10      F-4 performance no.                      145           162
The external isobutane-to-olefin ratio, x8, was equal to the sum of the isobutane recycle, x2, and the isobutane make-up, x5, divided by the olefin feed, x1:

x8 = (x2 + x5)/x1.
The motor octane number, x7, was a function of the external isobutane-to-olefin ratio, x8, and the acid strength by weight percent, x6 (for the same reactor temperatures and acid strengths as for the alkylate yield x4):

x7 ≤ 86.35 + 1.098x8 − 0.038x8² + 0.325(x6 − 89).
The acid strength by weight percent, x6, could be derived from an equation that expresses the acid addition rate, x3, as a function of the alkylate yield, x4, the acid dilution factor, x9, and the acid strength by weight percent, x6 (the added acid was assumed to have a strength of 98%):

1000x3 = x4x6x9 / (98 − x6).

This equation can be rewritten as quadratic constraints by introducing an artificial variable x11:

x3 = x6x11   and   98000x11 − 1000x3 = x4x9.
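Indeed, substituting x3 = x6x11 into 1000x3(98 − x6) = x4x6x9 and cancelling x6 (which is bounded below by 85) gives 1000x11(98 − x6) = x4x9, i.e. 98000x11 − 1000x6x11 = x4x9, which is the second constraint since x6x11 = x3.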
The alkylate yield, x4, was a function of the olefin feed, x1, and the external isobutane-to-olefin ratio, x8. The relationship, determined by holding the reactor temperature between 80°F and 90°F and the reactor acid strength by weight percent between 85 and 93, was

x4 ≤ x1(1.12 + 0.13167x8 − 0.00667x8²).

By adding an artificial variable x12, one gets the equivalent quadratic constraints

x4 = x1x12   and   x12 ≤ 1.12 + 0.13167x8 − 0.00667x8².
The objective function is linear in the costs of the source products x1, x3 and x5 and in the cost of the recycled isobutane x2, and depends linearly on the product of the alkylate quantity and the octane rating, x4x7. The remaining constraints are bounds on the variables and linear relations. After elimination of variables, scaling and a
renumbering of the indices, one gets the following optimization problem:

Minimize: 614.88x1 − 171.5x2 − 6.3x1x4 + 4.27x1x5 − 3.5x2x5 + 10x3x6
Subject to:
−0.325x3 + x4 − 1.098x5 + 0.038x5² + z1 = 57.425
−0.13167x5 + x7 + 0.00667x5² + z2 = 1.12
12.2x1 − 10x2 + z3 = 200,  12.2x1 − 10x2 − z4 = 0.1
−10x2 + 12.2x1x5 − 10x2x5 + z5 = 1600,  −10x2 + 12.2x1x5 − 10x2x5 − z6 = 0.1
65.346x1 − 980x6 − 0.666x1x4 + 10x2x5 = 0
x1 − 1.22x1x7 + x2x7 = 0
x3x6 + z7 = 120
x1 + z8 = 32.786885,  x1 − z9 = 0.01
x5 + z10 = 12,  x5 − z11 = 3
x2 + z12 = 20,  x6 + z13 = 1.411765
x3 + z14 = 95,  x3 − z15 = 85
x4 + z16 = 95,  x4 − z17 = 92.66666
0.8196722 + z18 = x7
xi ≥ 0, i = 1, ..., 7;  zj ≥ 0, j = 1, ..., 18.

There are 25 variables and 38 restrictions, and we use a = 10 and kmax = 100. The SPRGM algorithm furnishes the following optimal solution (see also Table 5):

xopt = (30.4713, 20.0000, 90.9999, 94.1900, 10.4100, 1.0697, 1.7743).
Table 5: Results for Alkylation

ksto     0           250         500         1000        2000
fopt     −1161.626   −1176.192   −1176.191   −1176.186   −1176.193
ηeval    2           274         531         1019        2063
kend     3           14          14          14          14
cpu      0.00        0.97        1.95        3.86        7.86

5.5 Pooling problem
We present in Figure 4 (and Table 6) an example in which the intermediate pools are allowed to feed each other; see for instance [12]. The pooling problem is thus extended to the case where exiting blends of some intermediate pools are feeds of others. The proportion model of the generalized pooling problem is not a bilinear program, since the variables cannot be partitioned into two sets. Therefore, this formulation belongs to the class of quadratically constrained quadratic programs; cf. for instance [19].
Figure 4: A generalized pooling problem (feeds F1, F2, F3; intermediate pools P1, P2; blends B1, B2, B3; the arcs carry the flows x11, x32, y12, y21, y23 and the inter-pool flow).

Table 6: Characteristics of GP

Feed    Price $/bbl   Attribute quality   Max supply
F1      6             3                   18
F2      16            1                   18
F3      10            2                   18

Pool    Max capacity
P1      20
P2      20

Blend   Price $/bbl   Max demand   Attribute max
B1      9             10           2.5
B2      13            15           1.75
B3      14            20           1.5

A hybrid model for GP is given below, where v12 denotes the flow from P1 to P2 (all other variables are defined as before).
max_{q,t,v,w,x,y}  −6(x11 + y12 + v12 − q21(y12 + v12)) − 16q21(y12 + v12) − 10(x32 + y21 + y23 − v12) + 9(x11 + y21) + 13(y12 + x32) + 14y23
Subject to:
supply:     x11 + y12 + v12 − q21(y12 + v12) + z1 = 18
            q21(y12 + v12) + z2 = 18
            x32 + y21 + y23 − v12 + z3 = 18
demand:     x11 + y21 + z4 = 10
            y12 + x32 + z5 = 15
            y23 + z6 = 20
capacity:   y12 + v12 + z7 = 20
            y21 + y23 + z8 = 20
attribute:  (3 − 2q21)v12 + 2(y21 + y23 − v12) − t12(y21 + y23) = 0
            3x11 + t12y21 − 2.5(x11 + y21) + z9 = 0
            2x32 + (3 − 2q21)y12 − 1.75(x32 + y12) + z10 = 0
            t12 + z11 = 1.5
pos. flow:  y21 + y23 − v12 − z12 = 0
            q21 + z13 = 1
            q ≥ 0, t ≥ 0, v ≥ 0, w ≥ 0, x ≥ 0, y ≥ 0, zi ≥ 0, i = 1, ..., 13.
There are 21 variables and 35 restrictions, and we use a = 50. The SPRGM algorithm furnishes the following optimal solution (see also Table 7):

xopt = (x11, x32) = (0.1506, 2.6564);  yopt = (y12, y21, y23) = (0.7425, 0.0005, 19.9988);
vopt = v12 = 10.0013;  qopt = q21 = 0.9949;  topt = t12 = 1.5000.

Table 7: Results for Pooling

ksto     0        250      500      1000     2000
fopt     −26.15   −26.63   −26.71   −26.72   −26.73
ηeval    2        181      441      813      1114
kend     3        5        5        5        5
cpu      0.00     0.03     0.03     0.11     0.17

6 Concluding remarks
We have presented a stochastic modification of the reduced gradient method for linear and nonlinear constraints, involving the adjunction of a stochastic perturbation. This approach leads to a stochastic descent method where the deterministic sequence generated by the reduced gradient is replaced by a sequence of random variables. We have established a global convergence result for this class of stochastic descent methods. The numerical experiments show that our method is effective for nonconvex optimization problems verifying a nondegeneracy assumption. The use of stochastic perturbations generally improves the results furnished by the reduced gradient. Here again, we observe that the adjunction of the stochastic perturbation improves the result, at the price of a larger number of evaluations of the objective function. The final points X_kend are very close or practically identical for ksto ≥ 250. The main difficulty in the practical use of the stochastic perturbation is connected to the tuning of the parameters. Even if the combination with the deterministic reduced gradient increases the robustness, the values of a and ksto have an important influence on the quality of the result. The random number generator may also have an influence on aspects such as the number of evaluations. All the sequences {ξ_k}_{k≥0} tested have furnished analogous results, but for different sets of parameters.
References

[1] Beck, P., Lasdon, L. and Engquist, M. "A reduced gradient algorithm for nonlinear network problems". ACM Trans. Math. Softw., 9 (1983), 57-70.

[2] Bouhadi, M., Ellaia, R. and Souza de Cursi, J. E. "Random perturbations of the projected gradient for linearly constrained problems". Nonconvex Optim. Appl., 54 (2001), 487-499.

[3] Bracken, J. and McCormick, G. P. Selected Applications of Nonlinear Programming, New York-London-Sydney: John Wiley and Sons, Inc. (1968), XII, 110.

[4] Carson, Y. and Maria, A. "Simulation optimization: methods and applications", Proceedings of the Winter Simulation Conference, USA, 1997.

[5] Conn, A. R., Gould, N. and Toint, P. L. "Methods for nonlinear constraints in optimization calculations", Inst. Math. Appl. Conf. Ser., New Ser., 63 (1997), 363-390.

[6] Dorea, C. C. Y. "Stopping rules for a random optimization method". SIAM Journal on Control and Optimization, Vol. 28, No. 4 (1990), 841-850.

[7] Drud, A. S. "CONOPT - a large-scale GRG code". ORSA J. Comput., Vol. 6, No. 2 (1994), 207-216.

[8] El Mouatasim, A. Stochastic Perturbation in Global Optimization: Analyses and Application, PhD Thesis, Morocco: Mohammadia School of Engineers, 2007.

[9] El Mouatasim, A., Ellaia, R. and Souza de Cursi, J. E. "Random perturbation of variable metric method for unconstrained nonsmooth global optimization". International Journal of Applied Mathematics and Computer Science, Vol. 16, No. 4 (2006), 463-474.

[10] Ellaia, R., El Mouatasim, A., Banga, J. R. and Sendin, O. H. "NBI-RPRGM for multi-objective optimization design of bioprocesses", ESAIM: Proceedings, 20 (2007), 118-128.

[11] El Mouatasim, A., Ellaia, R. and Souza de Cursi, J. E. "Projected variable metric method for linear constrained nonsmooth global optimization via stochastic perturbation". Submitted to RAIRO Journal (2008).

[12] Foulds, L. R., Haugland, D. and Jornsten, K. "A bilinear approach to the pooling problem". Optimization, 24 (1992), 165-180.

[13] Floudas, C. A. and Pardalos, P. M. A Collection of Test Problems for Constrained Global Optimization Algorithms, Lecture Notes in Computer Science, 455, Berlin: Springer-Verlag, XIV, 180, 1990.

[14] Ghosh, M. N. "Uniform approximation of minimax point estimates". Ann. Math. Statist., 35 (1964), 1031-1047.

[15] Gould, N. I. M. and Toint, P. L. "SQP methods for large-scale nonlinear programming", in System Modelling and Optimization: Methods, Theory and Applications (19th IFIP TC7 Conference, Cambridge, GB, July 12-16, 1999), Boston: Kluwer Academic Publishers, IFIP 46, 149-178, 2000.

[16] Gourdin, E. Global Optimization Algorithms for the Construction of Least Favorable Priors and Minimax Estimators, Engineering Undergraduate Thesis, Evry, France: Institut d'Informatique d'Entreprise, 1989.

[17] Gourdin, E., Jaumard, B. and MacGibbon, B. "Global optimization decomposition methods for bounded parameter minimax evaluation", SIAM Journal on Scientific Computing, Vol. 15, No. 1 (1994), 16-35.

[18] Graham, R. L. "The largest small hexagon". Journal of Combinatorial Theory, Series A, 18 (1975), 165-170.

[19] Haverly, C. A. "Studies of the behaviour of recursion for the pooling problem", ACM SIGMAP Bulletin, 25 (1978), 19-28.

[20] Hock, W. and Schittkowski, K. Test Examples for Nonlinear Programming Codes, Lecture Notes in Economics and Mathematical Systems, 187, Berlin-Heidelberg-New York: Springer-Verlag, V, 178, 1981.

[21] L'Ecuyer, P. and Touzin, R. "On the Deng-Lin random number generators and related methods". Statistics and Computing, Vol. 14, No. 1 (2003), 5-9.

Johnstone, I. M. and MacGibbon, K. B. "Minimax estimation of a constrained Poisson vector", Ann. Stat., Vol. 20, No. 2 (1992), 807-831.

[22] Kall, P. "Solution methods in stochastic programming", Lect. Notes Control Inf. Sci., 197 (1994), 3-22.

[23] Kiwiel, K. C. "Free-steering relaxation methods for problems with strictly convex costs and linear constraints", Math. Oper. Res., Vol. 22, No. 2 (1997), 326-349.

[24] Lasdon, L. S., Waren, A. D. and Ratner, M. "Design and testing of a generalized reduced gradient code for nonlinear programming", ACM Trans. Math. Softw., 4 (1978), 34-50.

[25] Luenberger, D. G. Introduction to Linear and Nonlinear Programming, Addison-Wesley Publishing Company, XII, 356, 1973.

[26] Martinez, J. M. "A direct search method for nonlinear programming". ZAMM, Z. Angew. Math. Mech., Vol. 79, No. 4 (1999), 267-276.

Pogu, M. and Souza de Cursi, J. E. "Global optimization by random perturbation of the gradient method with a fixed parameter". J. Glob. Optim., Vol. 5, No. 2 (1994), 159-180.

[27] Pine, S. H. Organic Chemistry, New York-London: McGraw-Hill, 1987.

[28] Raber, U. Nonconvex All-Quadratic Global Optimization Problems: Solution Methods, Application and Related Topics, Dissertation, Universität Trier, 1999.

[29] Reinhardt, K. "Extremale Polygone gegebenen Durchmessers", Jber. Dtsch. Math.-Ver., 31 (1922), 251-270.

[30] Smeers, Y. "Generalized reduced gradient method as an extension of feasible direction methods", J. Optimization Theory Appl., 22 (1977), 209-226.

[31] Sadegh, P. and Spall, J. C. "Optimal random perturbations for stochastic approximation using a simultaneous perturbation gradient approximation", IEEE Trans. Autom. Control, Vol. 44, No. 1 (1999), 231-232.

[32] Stolpe, M. On Models and Methods for Global Optimization of Structural Topology, Doctoral Thesis, Stockholm, 2003.