A derivative-free algorithm for linearly constrained optimization problems
E. A. E. Gumma, M. H. A. Hashim & M. Montaz Ali
Computational Optimization and Applications: An International Journal, ISSN 0926-6003, Volume 57, Number 3. Comput Optim Appl (2014) 57:599-621. DOI 10.1007/s10589-013-9607-y
Received: 30 November 2011 / Published online: 3 October 2013 © Springer Science+Business Media New York 2013
Abstract  Based on the NEWUOA algorithm, a new derivative-free algorithm, named LCOBYQA, is developed. The main aim of the algorithm is to find a minimizer x* ∈ R^n of a nonlinear function, whose derivatives are unavailable, subject to linear inequality constraints. The algorithm is based on a model of the given function constructed from a set of interpolation points. LCOBYQA is iterative: at each iteration it constructs a quadratic approximation (model) of the objective function that satisfies interpolation conditions and leaves some freedom in the model. The remaining freedom is resolved by minimizing the Frobenius norm of the change to the second derivative matrix of the model. The model is then minimized within a trust-region subproblem, using a conjugate gradient method, to obtain a new iterate. At times the new iterate is found from a model iteration, designed to improve the geometry of the interpolation points. Numerical results are presented which show that LCOBYQA works well and is very competitive with available model-based derivative-free algorithms.

E.A.E. Gumma
Department of Mathematics, Faculty of Pure and Applied Sciences, International University of Africa, P.O. Box 2469, Khartoum, Sudan
e-mail: [email protected]

M.H.A. Hashim
Department of Applied Mathematics, Faculty of Mathematical Sciences, University of Khartoum, P.O. Box 321, Khartoum, Sudan
e-mail: [email protected]

M.M. Ali (corresponding author)
School of Computational and Applied Mathematics, University of the Witwatersrand, Johannesburg, South Africa
e-mail: [email protected]
Keywords Derivative-free optimization · Linearly constrained problem · Least Frobenius norm method · Trust-region subproblem · Conjugate gradient method
1 Introduction

Generally, in optimization problems, there is useful information in the derivatives of the function one wishes to optimize. For instance, the gradient and the Hessian matrix of the objective function characterize a local minimum of the function to be optimized. However, in many engineering problems the analytical expressions of the objective function and constraints are unavailable and their values are computed by means of complex black-box simulations. The derivatives of such problems are therefore unavailable, although we assume that they exist. Derivative-free methods are useful for solving these problems. In this paper we design an algorithm to solve the optimization problem

    min f(x),   x ∈ R^n,                                             (1)
    subject to  A^T x ≥ b,   A ∈ R^{n×m̂},  b ∈ R^{m̂},                (2)

where f(x) is a smooth nonlinear real-valued function whose gradient and Hessian are unavailable. The simple bounds

    l ≤ x ≤ u,   l, u ∈ R^n,                                          (3)

can be written as

    I x ≥ l,   −I x ≥ −u,                                             (4)

so we can set A = [I, −I] and b = [l, −u]^T, where I is the n × n identity matrix. In (3), if we let the components of l be very large negative numbers and the components of u be very large positive numbers, then we obtain an unconstrained optimization problem. Thus, our algorithm can also solve unconstrained and simple bound constrained problems.
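The reduction of simple bounds to the linear constraint format (2) is mechanical; the short sketch below (our own illustration in NumPy, with a hypothetical helper name) makes the correspondence of Eq. (4) explicit.

```python
import numpy as np

def bounds_to_linear_constraints(l, u):
    """Encode the simple bounds l <= x <= u of Eq. (3) in the form A^T x >= b
    of Eq. (2), with A = [I, -I] and b = [l, -u]; a direct transcription of Eq. (4)."""
    n = l.size
    A = np.hstack([np.eye(n), -np.eye(n)])   # n-by-2n, so A^T x = [x; -x]
    b = np.concatenate([l, -u])              # length 2n, b = [l; -u]
    return A, b
```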
Derivative-free methods can be classified into two classes: direct search methods and model-based methods. The algorithm presented in this paper belongs to the latter class. In the literature, several model-based derivative-free algorithms have been proposed. The first attempt to employ available objective function values f(x) for building a quadratic model by interpolation was made by Winfield [21, 22]. In 1994, Powell [14] proposed the COBYLA algorithm for constrained optimization without derivatives, in which the objective function and the constraints are approximated by linear multivariate models. Powell [15] also extended the idea of Winfield further by developing his UOBYQA algorithm for unconstrained optimization without derivatives, which uses quadratic interpolation models of the objective function. A related model-based algorithm (DFO), using quadratic Newton fundamental polynomials, was proposed by Conn, Scheinberg and Toint [3, 4]. In 2005, Vanden Berghen and Bersini [7] proposed a variant of the UOBYQA algorithm [15], named CONDOR, a derivative-free algorithm for unconstrained problems and problems with easy constraints. In 2006, Powell [17] proposed NEWUOA, a derivative-free algorithm for unconstrained optimization problems based on the least Frobenius norm method. An extension of NEWUOA to simple bound constrained problems, also based on the least Frobenius norm method, was proposed by Powell [19] and named BOBYQA.

Our strategy for solving problem (1)–(2) is based on approximating the objective function by a quadratic model. This technique is highly useful for obtaining a fast rate of convergence in derivative-free optimization, because it approximates the curvature of the objective function. Therefore, at each iteration, at a point x, we construct the quadratic model

    Q(x + p) = c + p^T g + (1/2) p^T G p.

We cannot define g = ∇f(x) and G = ∇²f(x), because, as mentioned earlier, the gradient and the Hessian matrix of f(x) are unavailable. Instead, the constant c, the vector g and the symmetric matrix G ∈ R^{n×n} are determined by imposing the interpolation conditions

    Q(x_i) = f(x_i),   i = 1, 2, …, s,                                (5)

where s = (1/2)(n + 1)(n + 2). In this case, the parameters of Q can be written as a linear system of equations in the coefficients of the model. If we choose the interpolation points so that the linear system is nonsingular, then the model Q is defined uniquely. On the other hand, the use of a full quadratic model limits the size of the problems that can be solved in practice. One of the methods that overcomes this drawback was proposed by Powell [16]. Powell's method constructs a quadratic model that satisfies the interpolation conditions and leaves some freedom in the model. The remaining freedom is taken up by minimizing the Frobenius norm of the second derivative matrix of the change to the model. This variational problem is expressed as the solution of an (m + n + 1) × (m + n + 1) system of linear equations, where m is the number of interpolation points, which satisfies n + 2 ≤ m ≤ (1/2)(n + 1)(n + 2).

The algorithm presented here (LCOBYQA) is based on the above principle of Powell and is thus an extension of Powell's algorithms NEWUOA and BOBYQA [17, 19]. The name LCOBYQA is an acronym for Linearly Constrained Optimization BY Quadratic Approximation. LCOBYQA is an iterative algorithm. A typical iteration generates a new vector of variables either by minimizing the quadratic model in a trust region subject to the linear constraints (the trust-region subproblem), or by a procedure that should improve the geometry of the interpolation points (the model iteration subproblem).

The rest of the paper is organized as follows. Since our algorithm is an extension of Powell's work, the latter is discussed in detail in Sect. 2. This makes it easy to describe our algorithm in Sect. 3. Several numerical results are presented and discussed in Sect. 4. The paper ends in Sect. 5 with a conclusion and an outlook on possible future research.
2 Least Frobenius norm method for updating quadratic models

Quadratic approximations of the objective function are highly useful for obtaining a fast rate of convergence in derivative-free algorithms, because usually some attention has been given to the curvature of the objective function. On the other hand, each full quadratic model has (1/2)(n + 1)(n + 2) independent parameters, and this limits the size of the problems that can be solved in practice. Therefore, Powell [17, 19] investigated the idea of constructing a suitable quadratic model from m interpolation points, where m is much less than (1/2)(n + 1)(n + 2). If m = 2n + 1, then there are enough data to define, before the first iteration, a quadratic model with a diagonal second derivative matrix. Specifically, on each iteration, the method of NEWUOA and BOBYQA constructs a quadratic model Q(x), x ∈ R^n, of the objective function f(x), which is required to satisfy the interpolation conditions

    Q(x_i) = f(x_i),   i = 1, 2, …, m,                                (6)

where m is prescribed by the user, the value m = 2n + 1 being typical, and where the positions of the different points x_i, i = 1, 2, …, m, are generated automatically. These conditions leave much freedom in Q, which is taken up when the model is updated by minimizing the Frobenius norm of the change to the second derivative matrix of Q. The success of the method of NEWUOA and BOBYQA is due to a well-known technique suggested by the symmetric Broyden method for updating ∇²Q when first derivatives of f(x) are available (see p. 73 of [6]). Let an old model Q_old be present, and let the new model Q_new be required to satisfy conditions (6), which leave some freedom in the parameters of Q_new. The technique takes up this freedom by minimizing the Frobenius norm of ∇²Q_new − ∇²Q_old. One reason for trying the symmetric Broyden method is that the calculation of Q_new from Q_old requires only O(n²) operations in the case m = 2n + 1, whereas O(n⁴) operations are needed if Q_new is defined completely by conditions (6) with m = (1/2)(n + 1)(n + 2), see [19]. The second reason is that if the objective function f(x) is quadratic, then the symmetric Broyden method has the property

    ‖∇²Q_new − ∇²f‖²_F = ‖∇²Q_old − ∇²f‖²_F − ‖∇²Q_new − ∇²Q_old‖²_F,   (7)

which suggests that the approximations ∇²Q ≈ ∇²f become more accurate as the iterations proceed [16].

2.1 The solution of the variational problem

Let the current quadratic model Q_old be present and have the form

    Q_old(x) = c_old + (x − x_0)^T g_old + (1/2)(x − x_0)^T G_old (x − x_0),   x ∈ R^n,   (8)

where x_0 is a fixed point. Let x_opt be the point that satisfies

    f(x_opt) = min{ f(x_i) : i = 1, 2, …, m },                        (9)

where x_i, i = 1, 2, …, m, are the interpolation points of Q_old. Each iteration of NEWUOA generates a new point x^+. The position of x_opt is central to the choice of x^+ in trust-region methods. Indeed, x^+ is calculated to be a sufficiently accurate estimate of the vector x ∈ R^n that solves the subproblem

    min Q_old(x),   subject to ‖x − x_opt‖ ≤ ρ,                       (10)
where ρ is adjusted automatically [17]. On some iterations x^+ may be generated in a different way that is intended to improve the accuracy of the quadratic model. Once x^+ is calculated, one of the old interpolation points, x_t say, is removed to make room for x^+. Thus, the interpolation points x^+_i, i = 1, 2, …, m, of Q_new are x^+ and m − 1 of the old points x_i. Let

    Q_new(x) = c^+ + (x − x_0)^T g^+ + (1/2)(x − x_0)^T G^+ (x − x_0),   x ∈ R^n,   (11)

be the new quadratic model that satisfies the conditions (6) at the new interpolation points, leaving some freedom in the parameters of Q_new. NEWUOA takes up this freedom by minimizing the Frobenius norm of ∇²D(x), where

    D(x) = Q_new(x) − Q_old(x) = c + (x − x_0)^T g + (1/2)(x − x_0)^T G (x − x_0),   x ∈ R^n,   (12)

subject to the conditions (6). This problem can be written as

    min (1/4)‖G^+ − G_old‖²_F = (1/4) Σ_{i=1}^{n} Σ_{j=1}^{n} G_{ij}²,   (13)

subject to

    c + (x^+_i − x_0)^T g + (1/2)(x^+_i − x_0)^T G (x^+_i − x_0) = [f(x^+_i) − Q_old(x^+_i)] δ_{it},
    i = 1, 2, …, m,                                                    (14)

where δ_{it} is the Kronecker delta. As reported in [16], the positions of the interpolation points x_i are required to have the following properties:

• A1: Let M be the space of quadratic polynomials from R^n to R that are zero at x_i, i = 1, 2, …, m. Then the dimension of M is s − m, where s = (1/2)(n + 1)(n + 2).
• A2: If q(x), x ∈ R^n, is any linear polynomial that is zero at x_i, i = 1, 2, …, m, then q is identically zero.

Problem (13) subject to (14) is a convex quadratic problem. Therefore, by the KKT conditions, there exist unique Lagrange multipliers λ_k, k = 1, 2, …, m, such that the first derivative of the expression

    L(c, g, G) = (1/4) Σ_{i=1}^{n} Σ_{j=1}^{n} G_{ij}²
                 − Σ_{k=1}^{m} λ_k [ c + (x^+_k − x_0)^T g + (1/2)(x^+_k − x_0)^T G (x^+_k − x_0) ],   (15)
with respect to the parameters of L is zero. This implies that

    Σ_{k=1}^{m} λ_k = 0,   Σ_{k=1}^{m} λ_k (x^+_k − x_0) = 0,          (16)

and

    G = Σ_{k=1}^{m} λ_k (x^+_k − x_0)(x^+_k − x_0)^T.                  (17)

The second part of Eq. (16) and the conditions (14) give rise to the square variational linear system

    W (λ; c; g) = (r; 0),   where   W = [ Λ   X^T
                                          X   O   ],                   (18)

W being an (m + n + 1) × (m + n + 1) matrix, Λ having the elements

    Λ_{ij} = (1/2) [ (x^+_i − x_0)^T (x^+_j − x_0) ]²,   i, j = 1, 2, …, m,   (19)

and X being the (n + 1) × m matrix

    X = [ 1              1              …   1
          x^+_1 − x_0    x^+_2 − x_0    …   x^+_m − x_0 ],             (20)

while r is a vector in R^m with components [f(x^+_i) − Q_old(x^+_i)] δ_{it}, i = 1, 2, …, m. Suppose that conditions A1 and A2 hold for the interpolation points x^+_i, i = 1, 2, …, m. Then the matrix X has full row rank, the solution of the system (18) gives the parameters of D, and the new model is given by

    Q_new(x) = Q_old(x) + D(x).                                        (21)
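For concreteness, a dense-linear-algebra sketch of solving the variational system (18) is given below. It assumes NumPy, uses our own function name, and solves the full system directly; the production NEWUOA/BOBYQA codes instead maintain the inverse matrix H of Sects. 2.2-2.3 and never factorize W on each iteration.

```python
import numpy as np

def least_frobenius_update(xpts, x0, rhs):
    """Solve system (18) for the least-Frobenius-norm change D to the model.
    xpts is an m-by-n array of interpolation points x_i^+, x0 the shift point,
    and rhs the m-vector with components [f(x_i^+) - Q_old(x_i^+)] * delta_it.
    Returns (lam, c, g); the Hessian of D is then
        G = sum_k lam[k] * (x_k^+ - x0)(x_k^+ - x0)^T   (Eq. (17))."""
    m, n = xpts.shape
    Y = xpts - x0                              # rows y_i = x_i^+ - x0
    Lam = 0.5 * (Y @ Y.T) ** 2                 # Eq. (19)
    X = np.vstack([np.ones(m), Y.T])           # Eq. (20), (n+1)-by-m
    W = np.zeros((m + n + 1, m + n + 1))
    W[:m, :m] = Lam
    W[:m, m:] = X.T
    W[m:, :m] = X
    sol = np.linalg.solve(W, np.concatenate([rhs, np.zeros(n + 1)]))
    lam, c, g = sol[:m], sol[m], sol[m + 1:]
    return lam, c, g
```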
2.2 The Lagrange functions of the interpolation points

As presented in [16], Lagrange polynomials play a fundamental role in preserving the nonsingularity of the system (18). The Lagrange polynomials of the interpolation problem at the interpolation points x_i, i = 1, 2, …, m, are quadratic polynomials ℓ_j(x), x ∈ R^n, j = 1, 2, …, m, that satisfy the conditions

    ℓ_j(x_i) = δ_{ij},   1 ≤ i, j ≤ m,                                 (22)

where δ_{ij} is the Kronecker delta. In order for these polynomials to be applicable to the variational system (18), the conditions A1 and A2 are retained on the positions of the interpolation points, and for each j = 1, 2, …, m, the remaining freedom in ℓ_j(x) is taken up by minimizing the Frobenius norm ‖∇²ℓ_j‖_F subject to the constraints in (22). Therefore, if the right hand side of the system (18) is replaced by the j-th coordinate vector in R^{m+n+1}, then the parameters of ℓ_j are defined by this system. Thus, if Q is the quadratic polynomial

    Q(x) = Σ_{j=1}^{m} f(x_j) ℓ_j(x),   x ∈ R^n,                       (23)

then its parameters satisfy Eq. (18). It follows from the nonsingularity of the system (18) that the expression (23) is the Lagrange form of the solution of the variational problem (18). Let

    H = [ Ω   Ξ^T
          Ξ   Υ   ] = W^{−1}

be the inverse of the matrix W of the system (18). The definition of ℓ_j, where j is any integer in {1, 2, …, m}, implies that the j-th column of H provides the parameters of ℓ_j. In particular, because of Eq. (17), ℓ_j has the second derivative matrix

    G_j = ∇²ℓ_j = Σ_{k=1}^{m} H_{kj} (x_k − x_0)(x_k − x_0)^T,   j = 1, 2, …, m.   (24)

Therefore, ℓ_j is the polynomial

    ℓ_j(x) = c_j + (x − x_0)^T g_j + (1/2)(x − x_0)^T G_j (x − x_0),   x ∈ R^n,   (25)

where c_j is equal to H_{m+1,j} and g_j is the vector with components H_{ij}, i = m + 2, m + 3, …, m + n + 1. Because the parameters of ℓ_j(x) depend on H, the elements of the matrix H are required to be available.

To discuss the relation between the polynomials ℓ_j(x), j = 1, 2, …, m, and the nonsingularity of the system (18), let x^+ be the new vector of variables that will replace one of the interpolation points x_i, i = 1, 2, …, m. When x^+ replaces x_t, t ∈ {1, 2, …, m}, the point x_t is dismissed, so the new interpolation points are the vectors

    x^+_i = x_i,   i ∈ {1, 2, …, m}\{t},      x^+_t = x^+.             (26)

One advantage of the Lagrange polynomials is that they provide a convenient way of maintaining the conditions A1 and A2 of Sect. 2.1, see [16]. These conditions are inherited by the new interpolation points if t is chosen so that ℓ_t(x^+) is nonzero. It can be seen that at least one of the values ℓ_j(x^+), j = 1, 2, …, m, is nonzero, because interpolation to a constant function yields

    Σ_{j=1}^{m} ℓ_j(x) = 1,   x ∈ R^n.                                 (27)
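Since every Lagrange polynomial is read off from a column of H, evaluating ℓ_j at a trial point is inexpensive. The following sketch (our own helper name, NumPy, 0-based indices) assembles ℓ_j from the j-th column of H as in Eqs. (24)-(25).

```python
import numpy as np

def lagrange_poly(H, xpts, x0, j):
    """Return a callable for the j-th Lagrange polynomial of Sect. 2.2.
    Its parameters come from column j of H = W^{-1}: c_j = H[m, j],
    g_j = H[m+1:, j], and nabla^2 ell_j = sum_k H[k, j] (x_k - x0)(x_k - x0)^T
    (Eq. (24)).  Indices are 0-based; an illustrative sketch."""
    m, n = xpts.shape
    c = H[m, j]
    g = H[m + 1:, j]
    Y = xpts - x0
    G = (Y.T * H[:m, j]) @ Y              # Eq. (24)
    def ell(x):
        d = x - x0
        return c + d @ g + 0.5 * d @ G @ d
    return ell
```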
Another advantage of the Lagrange polynomials is that they can be used to improve the accuracy of the quadratic model. In order to improve this accuracy, Powell has chosen an alternative to solving the trust-region subproblem (10). In this alternative, the interpolation point that is going to be replaced by x^+, namely x_t, is selected before the position of x^+ is chosen; x_t is often the element of the set {x_i : i = 1, …, m} that is furthest from the best point x_opt. Having picked the index t, the value of |ℓ_t(x^+)| is made relatively large by letting x^+ be an estimate of the vector x ∈ R^n that solves the model iteration subproblem

    max ℓ_t(x),   subject to ‖x − x_opt‖ ≤ Δ̄_k,                        (28)

where Δ̄_k is prescribed.

2.3 The procedure for updating the matrix H

In this subsection, a procedure for the calculation of H^+ from H is discussed. It so happens that the change (26) to the interpolation points causes the symmetric matrices W = H^{−1} and W^+ = (H^+)^{−1} to differ only in their t-th row and t-th column. Also, recall that W is not stored. Therefore, our formula for H^+ depends only on H and the vector w^+_t ∈ R^{m+n+1}, which is the t-th column of W^+. These data completely define H^+, because in theory the updating can be done by inverting H to give W. The availability of w^+_t allows the symmetric matrix W^+ to be formed from W, and H^+ is set to the inverse of W^+. This procedure provides excellent protection against the accumulation of computer rounding errors. As presented in [16], H^+ is given by

    H^+ = H + (1/σ_t) [ α_t (e_t − Hw)(e_t − Hw)^T − β_t H e_t e_t^T H
                        + τ_t ( H e_t (e_t − Hw)^T + (e_t − Hw) e_t^T H ) ],   (29)

where e_t is the t-th coordinate vector in R^{m+n+1}, w ∈ R^{m+n+1} is the vector with components

    w_i = (1/2) [ (x_i − x_0)^T (x^+ − x_0) ]²,   i = 1, 2, …, m,
    w_{m+1} = 1,   w_{m+1+i} = (x^+ − x_0)_i,   i = 1, 2, …, n,         (30)

and

    α_t = e_t^T H e_t,   τ_t = e_t^T H w,
    β_t = (e_t − Hw)^T w + η_t,   σ_t = α_t β_t + τ_t²,                 (31)

where

    η_t = (1/2) ‖x^+ − x_0‖⁴ − e_t^T w.                                 (32)

Once H^+ is constructed, the submatrices Ξ and Υ of H are overwritten by Ξ^+ and Υ^+ respectively, and the factorization of Ω^+ is stored instead of that of Ω. The purpose of the factorization is to reduce the damage from rounding errors to the identity
W = H^{−1}, which should hold at the beginning of each iteration, see [17]. Let the factorization Ω = V V^T be given; the new factorization of Ω^+ can be constructed by changing only one column of V, say the first column, see [19]. Specifically, the new first column has the form

    V^+_{i1} = σ_t^{−1/2} [ τ_t V_{i1} + ( e_t − e_opt − H(w − v) )_i V_{t1} ],   i = 1, 2, …, m,   (33)

where v ∈ R^{m+n+1} is defined by

    v_i = (1/2) [ (x_i − x_0)^T (x_opt − x_0) ]²,   i = 1, 2, …, m,
    v_{m+1} = 1,   v_{m+1+i} = (x_opt − x_0)_i,   i = 1, 2, …, n.        (34)

Here opt in e_opt is the index of the best point x_opt. The quantities σ_t and τ_t are defined by Eq. (31) and the vector w is defined by Eq. (30). For full details of these calculations, see [17, 19].
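The heart of Sect. 2.3 is the updating formula (29). The following is a plain dense transcription of that formula (our own function name and argument conventions, 0-based index t); the production codes apply it with Ω kept in the factored form (33)-(34) for numerical stability rather than operating on H as a whole.

```python
import numpy as np

def update_H(H, w, half_dist4, t):
    """Update H = W^{-1} by formula (29) when interpolation point x_t is
    replaced by x+.  Arguments:
      H           current inverse matrix, shape (m+n+1, m+n+1), symmetric
      w           the new t-th column of W, Eq. (30)
      half_dist4  the scalar 0.5*||x+ - x0||^4 appearing in Eqs. (31)-(32)
      t           0-based index of the point being replaced"""
    e_t = np.zeros(H.shape[0]); e_t[t] = 1.0
    Hw = H @ w
    alpha = H[t, t]                  # e_t^T H e_t
    tau = Hw[t]                      # e_t^T H w
    beta = half_dist4 - w @ Hw       # equals (e_t - Hw)^T w + eta_t of (31)-(32)
    sigma = alpha * beta + tau ** 2
    r = e_t - Hw
    return H + (alpha * np.outer(r, r)
                - beta * np.outer(H[:, t], H[:, t])
                + tau * (np.outer(H[:, t], r) + np.outer(r, H[:, t]))) / sigma
```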
3 Linearly constrained optimization by the quadratic approximation algorithm

In this section, we describe our algorithm, LCOBYQA, for solving problem (1) subject to the linear constraints (2). Both NEWUOA and LCOBYQA are based on the main idea of Powell, but our algorithm differs from NEWUOA in three main procedures, namely:

• the initial calculations procedure,
• the trust-region subproblem, and
• the geometry improvement subproblem (the model iteration subproblem).

3.1 Initial calculations

The LCOBYQA algorithm requires the initial point x_0, the coefficient matrix A^T of the linear constraints, the right hand side vector b of the constraints, the parameters ρ_beg, ρ_end, ρ_1 and Δ_1, which satisfy ρ_1 = Δ_1 = ρ_beg and ρ_beg > ρ_end, and an integer m = 2n + 1, where n is the number of variables. In order to construct the initial interpolation points, the point x_0 must be strictly feasible. If x_0 is not strictly feasible, the algorithm calls the phase one procedure of linear programming to provide a vertex point (see [12]), and then uses the following steps to construct a strictly feasible point. Let x′ be the vertex point generated by the phase one procedure. Suppose that {1, 2, …, l} is the index set of the active constraints at x′, i.e.,

    a_i^T x′ = b_i,   i = 1, 2, …, l.                                   (35)

For a direction d to be a nonbinding feasible direction with respect to constraint i, i = 1, 2, …, l, it must satisfy

    a_i^T d > 0,   i = 1, 2, …, l.                                      (36)
Let A_l = [a_1, …, a_l], so that A_l^T d > 0 characterizes a feasible direction d which is nonbinding with respect to constraints 1, 2, …, l. Such a direction can be found by solving the linear system

    A_l^T d = e,   e = [1, 1, …, 1]^T.                                  (37)

The candidate point x_0 is set to x′ + αd, 0 ≤ α ≤ 2Δ_1. Initially, we set α = 2Δ_1; if x_0 = x′ + αd is strictly feasible then we are done. Otherwise, we reduce α iteratively. These steps produce a strictly feasible point; for more details see [10]. The procedure that calculates the strictly feasible point x_0 is denoted by MOVE in the LCOBYQA algorithm, see Sect. 3.6. Once x_0 is constructed, the first m interpolation points are given by [17]:

    x_1 = x_0,   x_{i+1} = x_0 + ρ_beg e_i,   x_{n+1+i} = x_0 − ρ_beg e_i,   i = 1, 2, …, n,   (38)

where e_i is the i-th coordinate vector in R^n. If necessary, the values of ρ_beg, ρ_1 and Δ_1 are reduced to guarantee that all the initial points we construct are feasible, see [10]. Given x_0, we now consider the first quadratic model

    Q_1(x_0 + p_1) = Q_1(x_0) + p_1^T g_1(x_0) + (1/2) p_1^T G_1 p_1,   p_1 ∈ R^n,   (39)

subject to A^T(x_0 + p_1) ≥ b, that satisfies the interpolation conditions

    Q_1(x_i) = f(x_i),   i = 2, …, m,                                   (40)

where m = 2n + 1. The vector g_1 and the diagonal elements (G_1)_{ii}, i = 1, 2, …, n, of the diagonal matrix G_1 are given uniquely by the conditions (40). The initial calculations of LCOBYQA also set the initial matrix H = W^{−1} of the first model, where W occurs in the linear system (18). It is straightforward to derive the elements of W for the initial points x_i, i = 1, 2, …, m, but, as mentioned in Sect. 2.3, it is required to have the elements of Ξ and Υ explicitly, together with the factorization of Ω. Fortunately, the chosen positions of the initial interpolation points provide convenient formulae for all of these terms, see [17]. The first row of the initial (n + 1) × m matrix Ξ has the simple form

    Ξ_{1j} = δ_{1j},   j = 1, 2, …, m.                                  (41)

Further, for i = 2, 3, …, n + 1, the i-th row of the initial Ξ also has just two nonzero elements,

    Ξ_{ii} = (2ρ_beg)^{−1}   and   Ξ_{i,i+n} = −(2ρ_beg)^{−1}.          (42)

This completes the definition of Ξ for the initial interpolation points. Moreover, the initial (n + 1) × (n + 1) matrix Υ is identically zero.
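The initial geometry is simple enough to write down directly. The sketch below (our own helper, NumPy, 0-based indices) builds the 2n + 1 points of Eq. (38) and the initial Ξ of Eqs. (41)-(42); the feasibility checks and the possible reduction of ρ_beg mentioned above are omitted.

```python
import numpy as np

def initial_points_and_xi(x0, rho_beg):
    """Construct the m = 2n+1 initial interpolation points of Eq. (38) and the
    initial (n+1)-by-m matrix Xi of Eqs. (41)-(42)."""
    x0 = np.asarray(x0, dtype=float)
    n = x0.size
    m = 2 * n + 1
    pts = np.tile(x0, (m, 1))
    for i in range(n):
        pts[1 + i, i] += rho_beg          # x_{i+1} = x_0 + rho_beg * e_i
        pts[n + 1 + i, i] -= rho_beg      # x_{n+1+i} = x_0 - rho_beg * e_i
    Xi = np.zeros((n + 1, m))
    Xi[0, 0] = 1.0                        # Eq. (41): first row is delta_{1j}
    for i in range(1, n + 1):
        Xi[i, i] = 1.0 / (2.0 * rho_beg)        # Eq. (42)
        Xi[i, i + n] = -1.0 / (2.0 * rho_beg)
    return pts, Xi
```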
As mentioned in the last paragraph of Sect. 2.3, Ω is stored in factored form, which guarantees that the rank of Ω is at most m − n − 1; the factorization has the form

    Ω = Σ_{k=1}^{m−n−1} v_k v_k^T = V V^T,                              (43)

where the components of the initial vector v_k ∈ R^m, which is the k-th column of V, are given the values

    V_{1k} = −√2 ρ_beg^{−2},   V_{k+1,k} = (√2/2) ρ_beg^{−2},   V_{k+n+1,k} = (√2/2) ρ_beg^{−2},
    V_{jk} = 0 otherwise,   where 1 ≤ k ≤ n, j = 1, 2, …, m.            (44)

We see that each of these columns has just three nonzero elements.

Let x_opt be an interpolation point such that f(x_opt) is the least calculated value of f so far. Each trust-region iteration solves a quadratic subproblem at x_opt subject to the linear inequality constraints, using a version of the active set method for indefinite quadratic programming problems. However, this method requires the initial reduced Hessian matrix at x_opt to be positive definite. Therefore, if the reduced Hessian at x_opt is not positive definite, then artificial constraints are added by the algorithm to the initial working set (the set of active constraints at x_opt). These constraints involve artificial variables y_i and are of the form y_i ≥ (x_opt)_i or y_i ≤ (x_opt)_i. The purpose of the artificial constraints is to convert the reduced Hessian matrix at x_opt into a positive definite matrix [9]. As the iterations of the algorithm proceed, the artificial constraints are removed automatically.

3.2 The trust-region subproblem

Let x_opt be an interpolation point such that f(x_opt) is the least calculated value of f(x) so far. Assume that q constraints are active at x_opt, let Â^T denote the matrix whose rows correspond to the active constraints at x_opt, let b̂ be the corresponding right hand side vector, and let Î be the working set at x_opt, i.e. Î is the index set of the active constraints. The quadratic model at the k-th iteration is defined by

    Q_k(x_opt + p_k) = f(x_opt) + g_k^T p_k + (1/2) p_k^T G_k p_k,   p_k ∈ R^n,   (45)

its parameters being the vector g_k ∈ R^n and the n × n symmetric matrix G_k. The model Q_k has to satisfy the interpolation conditions (6). Once the quadratic model is constructed, LCOBYQA is directed to one of two iterations, namely the 'trust-region' iteration or the 'model' iteration. In each 'trust-region' iteration, a step p_k from x_opt is defined as the vector that solves

    min Q_k(x_opt + p_k) = Q_k(x_opt) + g_k^T p_k + (1/2) p_k^T G_k p_k,   p_k ∈ R^n,
    subject to A^T(x_opt + p_k) ≥ b,   ‖p_k‖ ≤ Δ_k.                     (46)
The solution of problem (46) is presented in detail in Sect. 3.4. If it happens that ‖p_k‖ < (1/2)ρ_k, then x_opt + p_k is considered to be sufficiently close to x_opt, and the algorithm switches to the 'model' iteration. Otherwise, the new function value f(x_opt + p_k) is calculated, ρ_{k+1} is set to ρ_k, and the ratio RATIO is calculated by the formula

    RATIO = [ f(x_opt) − f(x_opt + p_k) ] / [ Q_k(x_opt) − Q_k(x_opt + p_k) ].   (47)

A new trust-region radius Δ_{k+1} is chosen by the formula (see [19])

    Δ_{k+1} = min[ (1/2)Δ_k, ‖p_k‖ ],      if RATIO ≤ 0.1,
              max[ (1/2)Δ_k, ‖p_k‖ ],      if 0.1 < RATIO ≤ 0.7,
              max[ (1/2)Δ_k, 2‖p_k‖ ],     if RATIO > 0.7,               (48)

and the new point is defined by the equation

    x^+ = x_opt + p_k,   if f(x_opt + p_k) < f(x_opt),
          x_opt,         if f(x_opt + p_k) ≥ f(x_opt).                   (49)
If RATIO > 0.1, then we select the integer t, the index of the interpolation point that will be removed. The selection of t ∈ {1, 2, …, m} aims at a relatively large denominator |σ_t| = |α_t β_t + τ_t²|. Specifically, t is set to the integer in the set {1, 2, …, m}\{opt} that maximizes the weighted denominator

    max{ 1, ‖x_t − x_opt‖² / Δ_k² } [ H_tt ( (1/2)‖x^+ − x_0‖⁴ − w^T H w ) + ( e_t^T H w )² ],   (50)

where w is given by (30). For a justification of this choice, see [19]. Also, if RATIO > 0.1 then the model Q is updated using the formula (29) so that Q interpolates f(x) at x^+ instead of x_t. If x^+ minimizes f(x) with respect to a constraint whose index is in Î, then we calculate the Lagrange multipliers at x^+. If all Lagrange multipliers at x^+ are positive, then we terminate the algorithm; otherwise we delete the index of the constraint with the least Lagrange multiplier from the working set and update Â^T, b̂ and Î. If x^+ reaches the boundary of a constraint whose index is not in the working set, then we add this constraint to the working set and update Â^T, b̂ and Î. Finally, we set k = k + 1 and the algorithm performs another trust-region iteration.

3.3 The model iteration subproblem

If the RATIO calculated by (47) satisfies RATIO < 0.1, then the algorithm sets t to be an integer in {1, 2, …, m} such that x_t maximizes the distance DIST = ‖x_i − x_opt‖ over all i. If DIST ≥ 2Δ_k, then the geometry of the interpolation points needs to be improved; otherwise the trust region is shrunk. If the geometry of the interpolation points needs to be improved, then the algorithm invokes the 'model' iteration. The 'model' iteration tries to improve the geometry of the interpolation points by choosing a step p_k which solves problem (28) subject to A^T(x_opt + p_k) ≥ b and ‖p_k‖ ≤ Δ_k, i.e. it
maximizes |ℓ_t(x_opt + p_k)| subject to A^T(x_opt + p_k) ≥ b and ‖p_k‖ ≤ Δ_k. The solution of this problem is presented in Sect. 3.5. Once p_k is chosen, the model Q is updated using the formula (29) so that Q interpolates f(x) at x_opt + p_k instead of x_t. If f(x_opt + p_k) < f(x_opt), then x_opt is overwritten by x_opt + p_k. Finally, we set k = k + 1 and the algorithm performs another trust-region iteration. If the geometry of the interpolation points does not need to be improved, then the value of ρ_k is decreased from ρ_k to ρ_{k+1} by the following rule [19]:

    ρ_{k+1} = ρ_end,              if ρ_k ≤ 16 ρ_end,
              (ρ_k ρ_end)^{1/2},  if 16 ρ_end < ρ_k ≤ 250 ρ_end,
              0.1 ρ_k,            if ρ_k > 250 ρ_end.                    (51)
3.4 The solution of the trust-region subproblem

Consider the trust-region subproblem

    min Q_k(x_opt + p_k) = Q_k(x_opt) + g_k^T p_k + (1/2) p_k^T G_k p_k,   p_k ∈ R^n,
    subject to A^T(x_opt + p_k) ≥ b,   ‖p_k‖ ≤ Δ_k,

where the parameters g_k and G_k are given data. We use a null-space active-set version of the truncated conjugate gradient procedure (for indefinite quadratic programming problems) to solve this subproblem. For more details see [5, 8-10, 13]. The choice of the active set method in this work is motivated by the fact that if the correct working set at the solution is known a priori, then the solution of the linear equality constrained problem is also a solution of the linear inequality constrained problem. Assume that q constraints are active at x_opt, let Â^T denote the matrix whose rows correspond to the active constraints at x_opt, and let b̂ be the corresponding right hand side vector. Therefore, in order to solve (46), we solve the subproblem

    min Q_k(x_opt + p_k) = Q_k(x_opt) + g_k^T p_k + (1/2) p_k^T G_k p_k,   p_k ∈ R^n,
    subject to Â^T(x_opt + p_k) = b̂,   ‖p_k‖ ≤ Δ_k,                      (52)

where Â^T ∈ R^{q×n}, b̂ ∈ R^q and Â^T has full rank. Let Î be the index set of active constraints at x_opt (the working set at x_opt). Then any step p_k from a feasible point to any other feasible point must satisfy

    Â^T p_k = 0.                                                          (53)

Now, let Y and Z be n × q and n × (n − q) matrices, respectively, such that [Y : Z] is nonsingular, Â^T Y = I and Â^T Z = O; the columns of Z form a basis for the null space of Â^T.
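One convenient way to obtain such matrices Y and Z is from a QR factorization of Â; the sketch below (our own choice, not necessarily the construction used in LCOBYQA) satisfies Â^T Y = I and Â^T Z = 0 whenever Â has full column rank.

```python
import numpy as np

def null_space_basis(A_hat):
    """Given A_hat (n-by-q, columns are the active constraint normals a_i),
    return Y (n-by-q) with A_hat^T Y = I and Z (n-by-(n-q)) whose columns span
    the null space of A_hat^T, via a complete QR factorization."""
    n, q = A_hat.shape
    Q, R = np.linalg.qr(A_hat, mode='complete')   # A_hat = Q[:, :q] @ R[:q, :]
    Y = Q[:, :q] @ np.linalg.inv(R[:q, :]).T      # A_hat^T Y = R^T R^{-T} = I
    Z = Q[:, q:]                                  # A_hat^T Z = 0
    return Y, Z
```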
So, from Eq. (53), any feasible direction p_k can be written as p_k = Zy, where y is any vector in R^{n−q}. Therefore, any solution of Â^T x = b̂ is given by p_k = Y b̂ + Zy. Thus, problem (52) can be written as

    min_{y ∈ R^{n−q}} ψ(y) = (1/2) y^T Z^T G Z y + (g + G Y b̂)^T Z y + (1/2)(g + G Y b̂)^T Y b̂,
    subject to ‖y‖ ≤ Δ_r,                                                 (54)

where Δ_r = (Δ_k² − ‖Y b̂‖²)^{1/2}, see [7]. We observe that the constant term of (54) is independent of y, so we can rewrite this problem in the form

    min_{y ∈ R^{n−q}} ψ(y) = (1/2) y^T Z^T G Z y + (g + G Y b̂)^T Z y,   subject to ‖y‖ ≤ Δ_r.   (55)

Once the solution y* is calculated, we substitute

    p_k = Y b̂ + Z y*.                                                     (56)

We use an active set version of the truncated conjugate gradient method to find y*, see [2]. This method produces a piecewise linear path in R^{n−q}, beginning at the centre y_0 = 0 of the trust region {y : ‖y‖ ≤ Δ_r}. For j ≥ 1, y_j is generated by

    y_j = y_{j−1} + α s_j,   j = 1, 2, …, n − q,                          (57)

where s_j is given by

    s_j = −Z^T ∇Q_k(x_opt),                               j = 1,
          −Z^T ∇Q_k(x_opt + Z y_{j−1}) + β_j s_{j−1},     j ≥ 2,          (58)

and β_j = ‖Z^T ∇Q_k(x_opt + Z y_{j−1})‖² / ‖Z^T ∇Q_k(x_opt + Z y_{j−2})‖².

We now discuss the step length α in (57). For each iteration j, let α_Δ be the step along the direction s_j to the trust-region boundary, i.e. ‖y_{j−1} + α_Δ s_j‖ = Δ_r. Let α_ψ be such that the derivative of ψ(y_{j−1} + α_ψ s_j) with respect to α_ψ is zero, except that α_ψ is regarded as infinity if this derivative is negative for every α_ψ ≥ 0. Let α_c be given by

    α_c = min_{i ∉ Î} { [ b_i − a_i^T (Y b̂ + Z y_j) ] / [ a_i^T (Y b̂ + Z y_j) ] : a_i^T (Y b̂ + Z y_j) < 0 }.   (59)

We now define α = min{α_Δ, α_ψ, α_c}, see [19], and y_j is overwritten by y_{j−1} + α s_j. In the case α = α_Δ, the trust-region boundary has been reached, which completes the iteration of the conjugate gradient method. If α = α_c, the current line search is restricted by a constraint. Its index is added to Î so that the subsequent choice of x_opt + (Y b̂ + Z y_j) will remain on the boundary of the additional constraint.
At this stage Q(x_opt) − Q(x_opt + (Y b̂ + Z y_j)) is the total reduction in Q obtained so far, and the product ‖∇Q(x_opt + (Y b̂ + Z y_j))‖ Δ_k is likely to be an upper bound on any further reduction. Therefore, termination occurs if the condition

    ‖∇Q(x_opt + (Y b̂ + Z y_j))‖ Δ_k ≤ 0.01 [ Q(x_opt) − Q(x_opt + (Y b̂ + Z y_j)) ]   (60)

is achieved. Otherwise, the conjugate gradient procedure is restarted at the point x_opt + (Y b̂ + Z y_j) with s_j = −Z^T ∇Q(x_opt + (Y b̂ + Z y_j)) as the next search direction. In the remaining case α ≤ α_Δ, α ≤ α_c, α = α_ψ, the step α is a full projected conjugate gradient step without interference from any constraint, which gives a strict reduction in Q_k. If this reduction is at most the right hand side of (60), or if inequality (60) holds at the new point x_opt + (Y b̂ + Z y_j), then termination also occurs. If the new solution x_opt + (Y b̂ + Z y_j) does not satisfy (60), then we perform the alternative operation. The alternative is a line search from the new point along the direction s_j, which is chosen as the projected steepest descent direction −Z^T ∇Q(x_opt + (Y b̂ + Z y_j)) augmented by a multiple of the previous search direction, as given in (58). If x_opt + p_k minimizes f(x) with respect to the constraints in Î, then we calculate the Lagrange multipliers at x_opt + p_k. If the Lagrange multipliers are positive, then the algorithm is terminated; otherwise we remove the index of the constraint corresponding to the least Lagrange multiplier from the working set Î, see [10] for more details.
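The backbone of this procedure is a truncated conjugate gradient iteration restricted to the trust region. The sketch below shows only that backbone in the reduced space of (55), in a Steihaug-style form with our own function names; the constraint ratio test α_c of Eq. (59), the working-set updates and the restarts described above are deliberately left out.

```python
import numpy as np

def truncated_cg(g_red, hess_vec, delta_r, tol=1e-10, max_iter=None):
    """Truncated CG for the reduced subproblem (55):
        min_y 0.5 y^T B y + g_red^T y,  subject to ||y|| <= delta_r,
    where B = Z^T G Z is applied through hess_vec(v).  Only the trust-region
    truncation and the negative-curvature exit are illustrated."""
    y = np.zeros_like(g_red)
    r = -g_red.copy()                    # residual = -grad psi at y = 0
    s = r.copy()                         # first search direction, cf. Eq. (58)
    if max_iter is None:
        max_iter = g_red.size
    for _ in range(max_iter):
        Bs = hess_vec(s)
        sBs = s @ Bs
        if sBs <= 0.0:                   # negative curvature: go to the boundary
            return _to_boundary(y, s, delta_r)
        alpha = (r @ r) / sBs
        y_next = y + alpha * s
        if np.linalg.norm(y_next) >= delta_r:
            return _to_boundary(y, s, delta_r)
        r_next = r - alpha * Bs
        if np.linalg.norm(r_next) <= tol:
            return y_next
        beta = (r_next @ r_next) / (r @ r)   # the beta_j of Eq. (58)
        s = r_next + beta * s
        y, r = y_next, r_next
    return y

def _to_boundary(y, s, delta_r):
    """Step from y along s to the boundary ||y + t s|| = delta_r, t >= 0."""
    a, b, c = s @ s, 2.0 * (y @ s), y @ y - delta_r ** 2
    t = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return y + t * s
```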
3.5 The solution of the model iteration subproblem

If the RATIO of (47) satisfies RATIO < 0.1, then the algorithm tests whether the geometry of the interpolation points needs to be improved. We set t to be an integer in {1, 2, …, m} such that x_t maximizes the distance

    DIST = ‖x_i − x_opt‖,   i = 1, 2, …, m,                               (61)

where x_opt is the interpolation point such that f(x_opt) is the least calculated value of f(x) so far. If DIST ≥ 2Δ_k, then the procedure that improves the geometry of the interpolation points is invoked. This procedure first tests the condition

    σ = αβ + τ² ≤ (1/2) τ².                                               (62)

If condition (62) is true, then the subroutine RESCUE is invoked (see [19]). Otherwise the procedure replaces the current interpolation point x_t by a new point x^+, in order to improve the geometry of the interpolation points. This is done by using the Lagrange interpolation polynomials. The Lagrange interpolation polynomial is a quadratic polynomial ℓ_t(x), x ∈ R^n, that satisfies the Lagrange conditions (22), and the remaining degrees of freedom are used to minimize the Frobenius norm ‖∇²ℓ_t(x)‖_F. Therefore, ℓ_t is the quadratic function

    ℓ_t(x) = c + (x − x_0)^T g + (1/2) Σ_{k=1}^{m} λ_k [ (x − x_0)^T (x_k − x_0) ]²,   x ∈ R^n.   (63)
The parameters c, g and λ_k, k = 1, 2, …, m, are defined by the linear system (18), where the right hand side is the coordinate vector e_t ∈ R^{m+n+1}. Thus, the parameters of ℓ_t are the t-th column of the matrix H. Since, as mentioned in (31), τ_t = e_t^T H w = ℓ_t(x_opt + p_k), we expect a relatively large modulus of the denominator σ_t = α_t β_t + τ_t² to be beneficial when the formula (31) is applied. Therefore, when the geometry of the interpolation points needs to be improved, the point x_t is replaced by the point x^+ = x_opt + p_k, where the direction p_k solves the problem

    max ℓ_t(x_opt + p_k),   subject to ‖p_k‖ ≤ Δ_k,   Â^T(x_opt + p_k) ≥ b̂.   (64)

We adopt the following procedure to solve problem (64). It is reported in [18, 19] that x_opt + p_k is usually selected from one of the m − 1 line segments in R^n through x_opt and the other interpolation points. Let x_t be the point that will be removed in order to improve the interpolation set; then the direction p_k is chosen as p_k = α(x_t − x_opt), where α ∈ (0, 1) is such that |ℓ_t(x_opt + p_k)| is maximal. Clearly, |ℓ_t(x_opt + p_k)| is positive, since ℓ_t(x_t) = 1. The above choice of p_k guarantees that x^+ = x_opt + p_k is feasible, since the line segment between any two feasible points of a convex set (linear constraints) is feasible. We begin with α = 0.95 and reduce it iteratively, see [10] for more details.
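A minimal sketch of this line-segment search follows (our own helper name; the shrink factor and the stopping tolerance are illustrative, since the paper only states that α starts at 0.95 and is reduced iteratively).

```python
import numpy as np

def model_step(x_opt, x_t, ell_t, A, b, delta_k, alpha0=0.95, shrink=0.5):
    """Choose p_k = alpha * (x_t - x_opt) as in Sect. 3.5: start from alpha0 and
    shrink alpha, keeping the alpha whose trial point is feasible, lies inside
    the trust region, and gives the largest |ell_t|.  ell_t is a callable for
    the Lagrange polynomial (e.g. built from a column of H)."""
    alpha, best_p, best_val = alpha0, None, -np.inf
    while alpha > 1e-3:
        p = alpha * (x_t - x_opt)
        x_new = x_opt + p
        if np.all(A.T @ x_new >= b - 1e-12) and np.linalg.norm(p) <= delta_k:
            val = abs(ell_t(x_new))
            if val > best_val:
                best_p, best_val = p, val
        alpha *= shrink
    return best_p
```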
3.6 The LCOBYQA algorithm

We now summarize our algorithm, which solves unconstrained, simple bound constrained and linear inequality constrained derivative-free optimization problems. The summary is divided into nine steps, where each step refers to the relevant part of the material discussed in the previous sections.

• Step 1: The user of the algorithm supplies the following data: the initial point x_0, the parameters ρ_beg and ρ_end, the matrix A^T, and the vector b.

• Step 2 (Initialization): Set m = 2n + 1. If x_0 is not strictly feasible, invoke the procedures PHASE ONE and MOVE to generate a strictly feasible point, see Sect. 3.1. Set Δ_1 = ρ_beg, ρ_1 = ρ_beg, ρ_1 > ρ_end. Construct the initial interpolation points as shown in (38). Select a point x_opt from the interpolation points such that f(x_opt) is the least value of f(x). Select Î to be the initial working set at x_opt, set Â^T to be the coefficient matrix of the constraints in the working set, and set b̂ to be the corresponding right hand side vector. Construct the first quadratic model Q, set k = 1, set r = 0.

• Step 3:
  If ρ_k ≤ ρ_end, go to Step 9.
  Calculate the step p_k that minimizes problem (52).
  If ‖p_k‖ < (1/2)ρ_k:
    set r = 1;
    if the three most recent values of |f(x_opt) − Q(x_opt)| and ‖p_k‖ are small, go to Step 8;
    else reduce Δ_k to Δ_{k+1} using (48), set RATIO = −1, and go to Step 6.

• Step 4: Calculate f(x_opt + p_k) and RATIO by (47). Revise Δ_k by (48), subject to Δ_{k+1} ≥ ρ_k.
  If RATIO ≥ 0.1, select an integer t by (50), the index of the interpolation point that will be dropped;
  else set t = 0.

• Step 5:
  If t > 0:
    Update the model Q so that Q interpolates f(x) at x_opt + p_k instead of x_t.
    If x_opt + p_k minimizes f(x) with respect to the constraints in Î:
      if the Lagrange multipliers at x_opt + p_k are positive, go to Step 9;
      else delete the constraint with the least Lagrange multiplier from the working set and update Î, Â^T, b̂.
    If x_opt + p_k reaches the boundary of a constraint not in the working set, add this constraint to the working set and update Î, Â^T, b̂.
    If f(x_opt + p_k) < f(x_opt), overwrite x_opt by x_opt + p_k.
  If RATIO ≥ 0.1, set k = k + 1 and go to Step 3.
• Step 6: Select an integer t that maximizes the distance DIST = ‖x_t − x_opt‖.
  If DIST ≥ 2Δ_k:
    If condition (62) is true, invoke the subroutine RESCUE;
    else replace x_t by x_opt + p_k, where p_k solves problem (64), update the model Q so that Q interpolates f(x) at x_opt + p_k instead of x_t, and, if f(x_opt + p_k) < f(x_opt), overwrite x_opt by x_opt + p_k.
    Set k = k + 1 and go to Step 3.

• Step 7: If max[‖p_k‖, Δ_k] > ρ_k or RATIO > 0, set k = k + 1 and go to Step 3.

• Step 8: If ρ_k > ρ_end, reduce ρ_k by (51), reduce Δ_k by Δ_k = max[(1/2)ρ_k, ρ_{k+1}], set k = k + 1 and go to Step 3.

• Step 9: If r = 1, calculate f(x_opt + p_k); if f(x_opt + p_k) < f(x_opt), overwrite x_opt by x_opt + p_k. Output the final optimal point x_opt and stop.

4 Numerical results and discussion

In this section, we compare the performance of LCOBYQA with that of other available model-based derivative-free algorithms. The LCOBYQA algorithm is tested on unconstrained, simple bound constrained and linearly constrained problems. Most of the test functions are non-convex. We use the number of function evaluations (that each algorithm takes to solve a problem) and the final function values as the criteria for comparison. The numerical results discussed in this section were obtained on a Pentium-4, 2.0 GHz PC, and the code of the algorithm is a complete MATLAB implementation.

First, we compare LCOBYQA with UOBYQA [15] and COBYLA [14] using the unconstrained test problems ARWHEAD, BDQRTIC [3] and CHROSEN [20]. The codes are run in the same environment. The numbers of evaluations of f(x) for LCOBYQA, UOBYQA [15] and COBYLA [14] are reported in Table 1. Table 1 shows that our algorithm is attractive in comparison with its competitors, namely COBYLA [14] and UOBYQA [15], except for the results on the function BDQRTIC, where we observe that the performance of UOBYQA is superior to that of LCOBYQA.

Next, we compare LCOBYQA with CONDOR [7] and DFO [3, 4] using the test problems POWER, DQDRTIC and VARDIM [1].
Table 1  Comparative results between LCOBYQA, COBYLA and UOBYQA

                 Number of function evaluations       Final function values
NAME, n          LCOBYQA   COBYLA   UOBYQA            LCOBYQA       COBYLA       UOBYQA
ARWHEAD, 10      184       280      219               8.8374e-14    1.776e-15    3.7747e-13
ARWHEAD, 15      224       522      458               1.9387e-009   1.2854e-11   1.3624e-12
ARWHEAD, 20      368       678      837               2.9088e-010   3.5527e-15   2.03925e-12
ARWHEAD, 25      457       900      1320              1.5336e-11    8.8817e-16   2.9070e-12
BDQRTIC, 10      660       1106     434               11.8654       18.9344      11.8654
BDQRTIC, 15      1273      2323     843               23.6405       30.7017      23.6405
BDQRTIC, 20      1764      3616     1541              35.4091       42.4700      35.4906
BDQRTIC, 25      2369      5619     2302              47.1774       54.2384      47.1774
CHROSEN, 10      411       4661     505               7.7337e-010   3.2103e-15   1.4485e-12
CHROSEN, 15      701       6935     1204              1.0806e-010   1.9841e-15   4.5296e-13
CHROSEN, 20      977       8912     2034              9.2520e-009   1.5176e-15   5.8361e-13
CHROSEN, 25      1216      10861    2933              6.9363e-009   2.0238e-15   1.4764e-12
Table 2  Comparative results between LCOBYQA, CONDOR and DFO

                 Number of function evaluations       Final function values
NAME, n          LCOBYQA   CONDOR   DFO               LCOBYQA       CONDOR       DFO
DQDRTIC, 10      28        201      403               2.1777e-25    2.0929e-18   1.6260e-20
POWER, 10        28        550      206               6.6831e-26    9.5433e-7    2.0582e-7
VARDIM, 10       2517      2686     2061              3.15423e-10   2.177e-25    1.626e-20
We have run the codes in the same environment. Table 2 shows the number of function evaluations and the final function values for the POWER, DQDRTIC and VARDIM test functions. Table 2 shows that the performance of LCOBYQA is far superior to that of the other two algorithms. However, LCOBYQA does not perform as well as DFO on VARDIM.

LCOBYQA is also tested and compared with NEWUOA on different unconstrained test functions, namely ARWHEAD, CHROSEN, PENALTY1, PENALTY2 and PENALTY3 [1]. We have run LCOBYQA and NEWUOA in the same environment, using x_0 = 0. We set ρ_end = 10^{−6} in each case, while ρ_beg is given the values 0.5, 0.5, 1.0, 0.1 and 0.1 for ARWHEAD, CHROSEN, PENALTY1, PENALTY2 and PENALTY3, respectively. Table 3 shows the number of function evaluations of LCOBYQA and NEWUOA for the five test functions. A star in Table 3 indicates that the CPU time was very long, so the problem was not attempted. In this table, we observe that the performance of the two algorithms is similar. For example, the results of LCOBYQA on ARWHEAD are better than those of NEWUOA, but the results of NEWUOA on CHROSEN, PENALTY2 and PENALTY3 are better than those of LCOBYQA. The results are therefore comparable, in spite of the fact that NEWUOA was specifically designed for unconstrained problems.

LCOBYQA is also tested and compared with CONDOR [7], BOBYQA [19] and COBYLA [14] using test problems in [11] with simple bound constraints.
Table 3  Comparative results between LCOBYQA and NEWUOA

                  Number of function evaluations      Final function values
NAME, n           LCOBYQA    NEWUOA                   LCOBYQA        NEWUOA
ARWHEAD, 20       321        404                      1.7764e-015    3.8156e-12
ARWHEAD, 40       1107       1497                     2.3208e-012    1.7674e-11
ARWHEAD, 80       2491       3287                     5.2358e-013    2.6176e-11
ARWHEAD, 160      8453       8504                     3.2557e-008    1.6807e-10
CHROSEN, 20       818        845                      1.0806e-009    4.3624e-12
CHROSEN, 40       2042       1876                     1.7063e-008    5.4617e-11
CHROSEN, 80       4852       4314                     2.7119e-008    2.9747e-11
CHROSEN, 160      *          9875                     *              2.3506e-10
PENALTY1, 20      7507       7476                     1.7843e-004    1.5777e-4
PENALTY1, 40      16704      14370                    3.4791e-004    3.3925e-4
PENALTY1, 80      27407      32390                    5.2643e-004    7.1305e-4
PENALTY1, 160     *          72519                    *              1.4759e-3
PENALTY2, 20      1612       2443                     6.3457e+2      6.3457e+2
PENALTY2, 40      5354       2455                     5.5419e+004    5.5418e+4
PENALTY2, 80      13475      5703                     1.771e+008     1.7760e+8
PENALTY2, 160     *          *                        *              *
PENALTY3, 20      4337       3219                     3.2579e+2      3.630e+2
PENALTY3, 40      14221      16589                    1.4916e+3      1.5322e+3
PENALTY3, 80      36039      136902                   1.4916e+3      6.2844e+3
PENALTY3, 160     *          *                        *              *
The results are presented in Table 4. The numerical results of LCOBYQA reported in this table are encouraging: out of the 8 test cases, the results of LCOBYQA are superior to those of the other algorithms in 6 cases.

Finally, LCOBYQA is also compared with CONDOR and COBYLA using problems in [11] with linear constraints. The results for these test functions are given in Table 5. From this table we see that LCOBYQA performs very well compared with CONDOR and COBYLA for most of the test functions. LCOBYQA was also tested on other functions; results for those problems are reported in [10]. In our numerical experiments with LCOBYQA, it has been observed that it converges faster when the RESCUE procedure of BOBYQA [19] happens to be called early, during the first few iterations. This was the case for ARWHEAD, which LCOBYQA solved with up to 900 variables using a single processor, a remarkable result.
Table 4  Comparative results between LCOBYQA, CONDOR, BOBYQA and COBYLA

                 Number of function evaluations                Final function value
NAME, n          LCOBYQA  CONDOR  BOBYQA  COBYLA              LCOBYQA      CONDOR       BOBYQA       COBYLA
HS01, 2          51       142     222     36                  9.9003e-10   1.5560e-13   1.6081e-14   4.9982e-10
HS02, 2          9        36      62      53                  0.0504       4.9412       4.94122      5.0426e-2
HS03, 2          30       73      31      45                  1.3332e-33   2.8393e-34   -7.4113e-22  1.6331e-17
HS04, 2          12       54      21      18                  2.6667       2.6666       2.6667       2.6666
HS05, 2          273      34      33      77                  -1.91322     -1.9130      -1.3415      1.2283
HS38, 4          174      311     546     382                 2.22716e-13  2.8731e-10   7.8251e-13   7.8770e+0
HS45, 5          16       377     40      69                  1.0000       1.0000       1.0000       1.5000
HS110, 10        177      201     219     failed              -45.7784     -45.7785     -45.7785     failed
Table 5  Comparative results between LCOBYQA, CONDOR and COBYLA

                 Number of function evaluations       Final function value
NAME, n          LCOBYQA   CONDOR   COBYLA            LCOBYQA       CONDOR       COBYLA
HS09, 2          20        50       53                -0.5000       -0.5000      6.0220e-1
HS21, 2          13        59       48                -99.9600      -99.99       -99.9600
HS24, 2          13        79       71                -1.0000       -0.5774      -2.6214
HS28, 3          52        128      142               5.5114e-18    6.6151e-26   7.4213e-17
HS35, 3          84        130      46                0.1117        0.1111       -9.0000
HS36, 3          20        123      66                -3.3000e+3    -3.3000e+3   4.0692e-14
HS37, 3          113       69       46                -3.4560e+3    -3456        -3.4560e+3
HS44, 4          15        23       45                -15           -15          -15
HS45, 5          16        377      69                1.0000        1.0000       1.0000
HS48, 5          166       269      57                1.3471e-14    7.7073e-23   1.7392e-12
HS50, 5          107       260      95                1.0913e-12    1.0806e-25   2.7840e-16
HS51, 5          87        257      90                2.0866e-11    1.0806e-25   1.7392e-16
HS76, 4          16        21       45                -4.6818       -4.6818      -4.6818
5 Conclusion and future outlook

Based on Powell's algorithm NEWUOA, we have developed a new derivative-free algorithm, named LCOBYQA, to solve linearly constrained optimization problems. The algorithm is based on quadratic interpolation: it constructs a quadratic model of the objective function from a few data and uses the remaining freedom in the model to minimize the Frobenius norm of the Hessian matrix of the change to the model. In Sect. 4, we tested our algorithm (LCOBYQA) on various test problems, most of which are nonconvex. Tables 1–5 show that our algorithm is attractive in comparison with its competitors. The results obtained by the algorithm demonstrate its efficiency and show that it competes favourably against other model-based algorithms.

The work reported in this paper is a starting point for the solution of a large class of constrained optimization problems. In future research, we would like to extend our algorithm to handle quadratic, general nonlinear and difficult constraints (general nonlinear constraints which, like the objective, are expensive to evaluate and whose derivatives are not available). In order to handle quadratic and general constraints, we can use the augmented Lagrangian method, an exact penalty function, or the filter method. For the difficult constraints, the idea is to use interpolation to construct models of the constraints in the same way as the interpolation of the objective function. The code of the algorithm is a complete MATLAB package, with no calls to external, unavailable libraries, which makes the algorithm relatively fast. To increase the speed of the algorithm further, it would be interesting to exploit parallelism in the code.

Acknowledgements  Parts of this work were done during the first author's two visits to the African Institute for Mathematical Sciences (AIMS), South Africa. He is very grateful to all AIMS staff. Special thanks to Professors Fritz Hahne and Barry Green, the previous and current directors of AIMS, respectively, for their excellent hospitality and the facilities that supported the research. He would also like to thank Professor M.J.D. Powell, emeritus Professor at the Centre for Mathematical Sciences, Department of Applied Mathematics and Theoretical Physics, University of Cambridge, for his encouragement and help during the first author's Ph.D. work.
References

1. Buckley, A.G., Jennings, L.S.: Test functions for unconstrained minimization. Technical report CS-3, Dalhousie University, Canada (1989)
2. Conn, A., Gould, N., Toint, Ph.: Trust Region Methods. SIAM, Philadelphia (2000)
3. Conn, A., Scheinberg, K., Toint, Ph.: An algorithm using quadratic interpolation for unconstrained derivative-free optimization. In: Di Pillo, G., Giannessi, F. (eds.) Nonlinear Optimization and Applications, pp. 27–47. Plenum, New York (1996)
4. Conn, A., Scheinberg, K., Toint, Ph.: A derivative free optimization (DFO) algorithm. Report 98(11) (1998)
5. Dong, S.: Methods for Constrained Optimization. Springer, Berlin (2002)
6. Fletcher, R.: Practical Methods of Optimization, 2nd edn. Wiley, New York (1987)
7. Vanden Berghen, F., Bersini, H.: CONDOR, a new parallel, constrained extension of Powell's UOBYQA algorithm: experimental results and comparison with the DFO algorithm. J. Comput. Appl. Math. 181, 157–175 (2005)
8. Gay, D.M.: In: Lecture Notes in Mathematics, vol. 1066, pp. 74–105. Springer, Berlin (1984)
9. Gill, P.E., Murray, W., Wright, M.H.: Inertia-controlling methods for general quadratic programming. SIAM Rev. 33, 1–36 (1991)
10. Gumma, E.: A derivative-free algorithm for linearly constrained optimization problems. Ph.D. thesis, University of Khartoum, Sudan (2011)
11. Hock, W., Schittkowski, K.: Test Examples for Nonlinear Programming Codes. Lecture Notes in Economics and Mathematical Systems, vol. 187. Springer (1981)
12. Griva, I., Nash, S.G., Sofer, A.: Linear and Nonlinear Optimization. SIAM, Philadelphia (2009)
13. Ju, Z., Wang, C., Jiguo, J.: Combining trust region and line search algorithm for equality constrained optimization. Appl. Math. Comput. 14(1–2), 123–136 (2004)
14. Powell, M.J.D.: A direct search optimization method that models the objective and constraint functions by linear interpolation. In: Gomez, S., Hennart, J.P. (eds.) Advances in Optimization and Numerical Analysis, Proceedings of the Sixth Workshop on Optimization and Numerical Analysis, Oaxaca, Mexico, vol. 275, pp. 51–67 (1994)
15. Powell, M.J.D.: UOBYQA: unconstrained optimization by quadratic approximation. Math. Program. 92, 555–582 (2002)
16. Powell, M.J.D.: Least Frobenius norm updating of quadratic models that satisfy interpolation conditions. Math. Program. 100, 183–215 (2004)
17. Powell, M.J.D.: The NEWUOA software for unconstrained optimization without derivatives. In: Di Pillo, G., Roma, M. (eds.) Large-Scale Nonlinear Optimization. Springer, New York (2006)
18. Powell, M.J.D.: Developments of NEWUOA for minimization without derivatives. IMA J. Numer. Anal. 28, 649–664 (2008)
19. Powell, M.J.D.: The BOBYQA algorithm for bound constrained optimization without derivatives. Technical report 2009/NA06, CMS, University of Cambridge (2009)
20. Toint, Ph.: Some numerical results using a sparse matrix updating formula in unconstrained optimization. Math. Comput. 32, 839–859 (1978)
21. Winfield, D.: Function and functional optimization by interpolation in data tables. Ph.D. thesis, Harvard University, Cambridge, USA (1969)
22. Winfield, D.: Function minimization by interpolation in a data table. J. Inst. Math. Appl. 12, 339–347 (1973)