Journal of Computational Mathematics Vol.33, No.2, 2015, 179–190.
http://www.global-sci.org/jcm doi:10.4208/jcm.1411-m4519
A DIRECT SEARCH FRAME-BASED ADAPTIVE BARZILAI-BORWEIN METHOD*

Xiaowei Fang
College of Sciences, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China;
Department of Mathematics, Huzhou University, Huzhou 313000, China
Email: [email protected]

Qin Ni
College of Sciences, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
Email: [email protected]

* Received May 5, 2014 / Revised version received June 26, 2014 / Accepted November 26, 2014 / Published online March 13, 2015.

Abstract

This paper proposes a direct search frame-based adaptive Barzilai-Borwein (ABB) method for unconstrained minimization. The method is based on the framework of frame-based algorithms proposed by Coope and Price, but it uses the strategy of the ABB method and a rotational minimal positive basis to reduce the computational work at each iteration. Under some mild assumptions, the convergence of this approach is established. Through five hundred and twenty numerical tests using the CUTEr test problem library, we show that the proposed method is promising.

Mathematics subject classification: 90C56, 90C30, 65K05.
Key words: Direct search, Rotational minimal positive basis, Adaptive Barzilai-Borwein method.
1. Introduction

We consider the unconstrained optimization problem

    \min_{x \in R^n} f(x),

where the function $f : R^n \to R$ is assumed to be continuously differentiable on $R^n$ and derivative information is unavailable or unreliable. Direct search methods are a subset of derivative-free methods, which form one of the most important and challenging areas in computational science and engineering. In the 1950s, Box and Wilson [5] introduced a direct search method related to coordinate search, while Hooke and Jeeves [10] first used the term "direct search method". In the 1990s, Torczon [18,19] first established the convergence theory, which triggered the interest of the numerical optimization community. Following the work of Torczon, Audet and Dennis [2] proposed a general framework for direct search methods, and some classical and modern direct search methods are surveyed by Kolda et al. [14]. In the 2000s, Coope and Price [6,7] studied a class of direct search unconstrained optimization algorithms employing fragments of grids called frames, and proved convergence under some mild conditions. In 2004, Coope and Price [8] presented a direct search frame-based conjugate gradients algorithm (MAPRP for short). The algorithm performs a finite number of direct search conjugate
gradient steps and then resets. At each reset, the algorithm obtains estimates of the first and second gradients from the fixed maximal positive basis, and then obtains the next search direction by applying a modified PRP formula. Finally, a parabolic line search is used to locate a line local minimizer. Numerical results show that the algorithm is effective. An application of Coope-Price's direct search framework can be seen in [15], which combines Coope-Price's framework with recently developed descent conjugate gradient methods.

In 1988, Barzilai and Borwein [11] proposed the BB method, which uses the negative gradient as the search direction and calculates the step length from a secant equation. Two different secant equations lead to the large step length $\alpha^{LBB}$ and the small step length $\alpha^{SBB}$. In numerical experiments, the BB method achieved better performance and cheaper computation than the steepest descent method. Because of its simplicity and efficiency, the BB method has triggered a lot of research on gradient methods in recent decades, see, e.g., [3, 20, 21]. Among these, a particularly effective variant is the ABB method proposed by Zhou, Gao and Dai [3]. At every iteration, the ABB method chooses a large step size or a small step size adaptively. Extensive numerical experiments indicate that the ABB method surpasses the PRP method for many unconstrained optimization problems.

Motivated by the efficiency of the ABB method, we propose a new direct search method which combines the frame-based strategies and the ABB method. Because the ABB method only needs first-order gradient information, our method employs the minimal positive basis. At each iteration the minimal positive basis requires only n+1 function evaluations, whereas the maximal positive basis requires 2n function evaluations, so the work our new direct search method spends on the approximate gradient is about half of that of MAPRP. In addition, benefiting from the characteristics of the ABB method, we only need to compute the step length from $\alpha^{LBB}$ and $\alpha^{SBB}$, without any line search. Furthermore, we rotate the minimal positive basis according to the local topography of the objective function, which makes our method more effective in practice. The convergence is proved under some mild conditions, and numerical results show that our direct search method is promising.

This paper is organized as follows. In Section 2, we present some basic notions about frames and describe our direct search method. In Section 3, we prove the convergence of the proposed method. In Section 4, numerical results show the efficiency of the method derived in this paper compared to MAPRP [8] and Nelder-Mead [1]. Concluding remarks are given in Section 5. The default norm used in this paper is the Euclidean norm.
2. The New Direct Search Method

In order to introduce our method, we recall some concepts about positive bases, which can be found in [1].

Definition 2.1. The positive span of a set of vectors $\{v_1, \cdots, v_s\}$ in $R^n$ is the convex cone

    \{ v \in R^n : v = a_1 v_1 + \cdots + a_s v_s, \; a_i \ge 0, \; i = 1, \cdots, s \}.

A positive spanning set in $R^n$ is a set of vectors whose positive span is $R^n$.

Definition 2.2. A positive basis $V$ in $R^n$ is a set of vectors with the following two properties:
• (i) every vector in $R^n$ is a linear combination of the members of $V$, where all coefficients of the linear combination are non-negative; and
• (ii) no proper subset of $V$ satisfies (i).

It is easy to see that the cardinality of any positive basis $V$ satisfies $n+1 \le |V| \le 2n$. Two simple and well-known examples of positive bases are

    V_{max} = \{ v_1, \cdots, v_n, -v_1, \cdots, -v_n \}, \qquad V_{min} = \{ v_1, \cdots, v_n, -\sum_{i=1}^n v_i \},

where $\{v_1, \cdots, v_n\}$ is a basis for $R^n$; $V_{max}$ represents the maximal positive basis and $V_{min}$ the minimal positive basis.

The following definitions introduce the frame, the minimal frame and the quasi-minimal frame proposed by Coope and Price [6].

Definition 2.3. A frame is defined as $\Phi = \{x + hv : v \in V\}$, where $x \in R^n$ is the central point of the frame, $h > 0$ is the frame size, and $V$ is a positive basis in $R^n$. A frame $\Phi$ is a minimal frame if and only if

    f(x) \le f(y), \quad \forall y \in \Phi.

A frame $\Phi$ is a quasi-minimal frame if and only if

    f(x) \le f(y) + \epsilon, \quad \forall y \in \Phi,

where $\epsilon = h^{1+\mu}$ and $\mu$ is a positive constant; the corresponding point $x$ is called a quasi-minimal point.

We now design a new direct search method which combines the frame-based strategies and the ABB method. The method forms an estimate of the gradient at each iterate, which is used to build BB search directions. An inner-outer loop structure is adopted so that the approximate gradient is relatively accurate. At the same time, we use a new initialization strategy: the lowest point found in all previous iterations is chosen as the initial point of the outer loop, whereas in the general frame-based direct search method the initial point of the outer loop is the center point of the last frame. Let $x_k^j$ denote the $j$-th inner iteration point of the $k$-th outer iteration. We discuss the computation of the search direction, the step length and the rotation of the positive basis in detail below.

2.1. Search Direction

Choose a positive basis $V_k^j = \{v_1^{k,j}, \cdots, v_q^{k,j}\}$ $(q \ge n+1)$ corresponding to the iterate $x_k^j$. Denote

    v_l^{k,j} = ( v_{l1}^{k,j}, v_{l2}^{k,j}, \cdots, v_{ln}^{k,j} )^T, \quad l = 1, \cdots, q.
Consider the following linear model:

    m(x_k^j + s) = f(x_k^j) + \sum_{i=1}^n \tilde g_i s_i.
It is easy to see that this model, centered at $x_k^j$, interpolates $f$ at the points $x_k^j + h_k^j v_1^{k,j}, \cdots, x_k^j + h_k^j v_q^{k,j}$, where $h_k^j$ is the frame size corresponding to the iterate $x_k^j$. The coefficients can be determined by the $q$ interpolation conditions

    m(x_k^j + h_k^j v_l^{k,j}) = f(x_k^j + h_k^j v_l^{k,j}), \quad l = 1, \cdots, q.

Then we have

    f(x_k^j + h_k^j v_l^{k,j}) = f(x_k^j) + h_k^j \sum_{i=1}^n \tilde g_i v_{li}^{k,j}, \quad l = 1, \cdots, q,    (2.1)

which can be written in the form

    \begin{pmatrix} v_{11}^{k,j} & \cdots & v_{1n}^{k,j} \\ \vdots & \ddots & \vdots \\ v_{q1}^{k,j} & \cdots & v_{qn}^{k,j} \end{pmatrix} \begin{pmatrix} \tilde g_1 \\ \vdots \\ \tilde g_n \end{pmatrix} = \begin{pmatrix} \bar f_1 \\ \vdots \\ \bar f_q \end{pmatrix},

where

    \bar f_l = \frac{f(x_k^j + h_k^j v_l^{k,j}) - f(x_k^j)}{h_k^j}, \quad l = 1, \cdots, q.    (2.2)
If we choose the positive basis $V_k^j$ as $V_{min}$ with $v_i = e_i$ $(i = 1, \cdots, n)$, where $e_i$ is the $i$-th unit vector, then $\tilde g_i$ is given by

    \tilde g_i = \frac{1}{h_k^j} \left( f(x_k^j + h_k^j e_i) - \frac{ f(x_k^j - h_k^j e) + \sum_{l=1}^n f(x_k^j + h_k^j e_l) }{n+1} \right),    (2.3)

for $i = 1, \cdots, n$, where $e = \sum_{l=1}^n e_l$. For a given positive basis $V_k^j$, by (2.1) and (2.2) we obtain the approximate gradient

    g_k^j = (\tilde g_1, \cdots, \tilde g_n)^T,    (2.4)

and define the search direction

    p_k^j = -g_k^j.    (2.5)
2.2. Step Length

Compute a large step $(\alpha_k^j)^{LBB}$ and a small step $(\alpha_k^j)^{SBB}$ as follows:

    (\alpha_k^j)^{LBB} = \frac{ (s_k^{j-1})^T s_k^{j-1} }{ (s_k^{j-1})^T w_k^{j-1} }, \qquad (\alpha_k^j)^{SBB} = \frac{ (s_k^{j-1})^T w_k^{j-1} }{ (w_k^{j-1})^T w_k^{j-1} },

where $s_k^{j-1} = x_k^j - x_k^{j-1}$ and $w_k^{j-1} = g_k^j - g_k^{j-1}$. More details on the large step $(\alpha_k^j)^{LBB}$ and the small step $(\alpha_k^j)^{SBB}$ can be found in Barzilai and Borwein [11]. Zhou et al. [3] observed that the large step $(\alpha_k^j)^{LBB}$ is mainly used to produce a sufficient reduction, while the small step $(\alpha_k^j)^{SBB}$ is mainly used to produce a favorable descent direction for the next iteration; they therefore chose between the large step $(\alpha_k^j)^{LBB}$ and the small step $(\alpha_k^j)^{SBB}$ adaptively at each iteration.
Most BB algorithms adopt 1 as the initial step length. Because we work with an approximate gradient, this strategy is not suitable for our method, especially when the gap between the approximate gradient and the true gradient is large. Based on practical computation and the suggestion of [3], we obtain the step length $\alpha_k^j$ at each iteration according to the following strategy:

    \alpha_k^j = \begin{cases} 1 & \text{if } j = 0 \text{ and } \|g_k^j\|_2 \le \beta, \\ 1/\|g_k^j\|_2 & \text{if } j = 0 \text{ and } \|g_k^j\|_2 > \beta, \\ (\alpha_k^j)^{SBB} & \text{if } j \ge 1 \text{ and } (\alpha_k^j)^{SBB}/(\alpha_k^j)^{LBB} \le \kappa, \\ (\alpha_k^j)^{LBB} & \text{if } j \ge 1 \text{ and } (\alpha_k^j)^{SBB}/(\alpha_k^j)^{LBB} > \kappa, \end{cases}    (2.6)
where $\beta$ and $\kappa$ are positive constants. Given $\alpha_k^j$ and $p_k^j$, the next iterate $x_k^{j+1}$ is defined by

    x_k^{j+1} = x_k^j + \alpha_k^j p_k^j \quad (0 \le j \le \gamma - 1),

where $\gamma$ is a positive integer.
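A minimal Python sketch of the rule (2.6) is given below (an illustration under our own naming; the guard against a vanishing denominator is our addition and not part of the formula):

```python
import numpy as np

def abb_step(g, g_prev, x, x_prev, j, beta=10.0, kappa=0.1):
    """Step length alpha_k^j of (2.6) from current/previous iterates and
    approximate gradients; g_prev and x_prev are ignored when j == 0."""
    if j == 0:
        ng = np.linalg.norm(g)
        return 1.0 if ng <= beta else 1.0 / ng
    s = x - x_prev                      # s_k^{j-1}
    w = g - g_prev                      # w_k^{j-1}
    sw = s @ w
    if abs(sw) < 1e-16:                 # safeguard, not part of (2.6)
        return 1.0
    lbb = (s @ s) / sw                  # large BB step
    sbb = sw / (w @ w)                  # small BB step
    return sbb if sbb / lbb <= kappa else lbb
```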
2.3. Rotation of the positive basis

In [4], a set of orthogonal directions is rotated at each major step, so that at least one of the new directions conforms more closely to the local behavior of the function. We employ a similar idea to rotate the positive basis, so that the information gathered at descent points is used as much as possible. In the $k$-th outer loop, we choose the first positive basis $V_k^0$ and obtain the basis $V_k^{j+1}$ by rotating $V_k^j$ $(j = 0, 1, \cdots, \gamma-1)$. Let

    V_k^j = \{ v_1^{k,j}, \cdots, v_n^{k,j}, \cdots, v_q^{k,j} \},    (2.7)
where $\{v_1^{k,j}, \cdots, v_n^{k,j}\}$ is a basis for $R^n$. It is clear that every vector in $R^n$ is a linear combination of the members of $v_1^{k,j}, \cdots, v_n^{k,j}$. Denote

    \Delta x = x_k^{j+1} - x_k^j = (\Delta x_1, \cdots, \Delta x_n)^T,    (2.8)

where $\Delta x_i \in R$ $(i = 1, \cdots, n)$ describes the movement performed along the vector $v_i^{k,j}$ $(i = 1, \cdots, n)$ in the previous iteration.

When $f(x_k^{j+1}) < f(x_k^j)$, we rotate $V_k^j$. Firstly, we modify the first $n$ vectors in $V_k^j$:

    \bar v_i^{k,j+1} = \begin{cases} v_i^{k,j} & \text{if } \Delta x_i = 0, \\ \sum_{l=i}^n \Delta x_l v_l^{k,j} & \text{if } \Delta x_i \ne 0, \end{cases} \qquad i = 1, \cdots, n.    (2.9)

It has been proved (see [16]) that $\{\bar v_1^{k,j+1}, \cdots, \bar v_n^{k,j+1}\}$ is linearly independent. Secondly, we use the Gram-Schmidt orthogonalization technique to obtain an orthonormal basis

    \{ v_1^{k,j+1}, \cdots, v_n^{k,j+1} \},    (2.10)

where $v_1^{k,j+1}$ may be the most probable descent direction. Finally, we obtain

    V_k^{j+1} = \{ v_1^{k,j+1}, \cdots, v_n^{k,j+1}, v_{n+1}^{k,j+1}, \cdots, v_q^{k,j+1} \},    (2.11)
where $\{v_{n+1}^{k,j+1}, \cdots, v_q^{k,j+1}\}$ is constructed from $\{v_1^{k,j+1}, \cdots, v_n^{k,j+1}\}$ according to the same combination principle as $V_k^0$. For example, if $q = n + 1$, we can take

    V_k^{j+1} = \left\{ v_1^{k,j+1}, \cdots, v_n^{k,j+1}, -\sum_{i=1}^n v_i^{k,j+1} \right\}.    (2.12)
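The rotation (2.9)-(2.12) for the minimal-positive-basis case $q = n + 1$ could be sketched in Python as follows (an illustration with our own names, using classical Gram-Schmidt for (2.10)):

```python
import numpy as np

def rotate_basis(V, dx):
    """Rotate a minimal positive basis (rows of V are v_1,...,v_n,v_{n+1})
    using the last displacement dx = x_k^{j+1} - x_k^j, cf. (2.9)-(2.12)."""
    V = np.asarray(V, dtype=float)
    dx = np.asarray(dx, dtype=float)
    n = len(dx)
    B = V[:n]                                  # the first n rows form a basis of R^n
    Vbar = np.empty_like(B)
    for i in range(n):                         # step (2.9)
        Vbar[i] = B[i] if dx[i] == 0 else dx[i:] @ B[i:]
    # classical Gram-Schmidt orthonormalization, cf. (2.10)
    Vnew = np.empty_like(Vbar)
    for i in range(n):
        u = Vbar[i] - Vnew[:i].T @ (Vnew[:i] @ Vbar[i])
        Vnew[i] = u / np.linalg.norm(u)
    # append the same combination as in V_k^0, here -sum_i v_i, cf. (2.12)
    return np.vstack([Vnew, -Vnew.sum(axis=0)])
```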
In the following direct search frame-based adaptive Barzilai-Borwein algorithm, we generate a sequence of quasi-minimal iterates $\{z_m\}$, which is used in the convergence analysis of our algorithm.

Algorithm 2.1.
Step 0: Initializations. Choose an initial point $x_0^0 \in R^n$, an initial positive basis $V_0^0$ and an initial frame size $h_0^0$. Choose $\lambda > 1$, $\mu > 0$, $\beta \ge 10$, $\gamma \ge 1$, $1 > \kappa > 0$. Set $k = 0$, $j = 0$, $m = 0$.
Step 1: Check the stopping condition. If the stopping condition is not met, go to Step 2; otherwise output the lowest known point and stop.
Step 2: Build the frame. Create a frame $\Phi_k^j$ at the central point $x_k^j$ according to the frame size $h_k^j$ and $V_k^j$, and calculate the corresponding function values.
Step 3: Determination of the search direction. Evaluate the gradient estimate $g_k^j$ according to formulas (2.1), (2.2), (2.4), and evaluate the search direction $p_k^j$ using (2.5).
Step 4: Determination of the step length. Compute the step length $\alpha_k^j$ according to (2.6), and set $x_k^{j+1} = x_k^j + \alpha_k^j p_k^j$.
Step 5: Determination of the positive basis. If $f(x_k^{j+1}) < f(x_k^j)$, obtain $V_k^{j+1}$ according to (2.7)-(2.11); otherwise set $V_k^{j+1} = V_k^j$.
Step 6: Update some parameters. If the frame $\Phi_k^j$ is a quasi-minimal frame, set $h_k^{j+1} = h_k^j/\lambda$, $m = m + 1$, $z_m = x_k^j$; otherwise set $h_k^{j+1} = h_k^j$. In addition, increment $j$ by one.
Step 7: Check the condition of the inner loop. If $j < \gamma$, go to Step 1; otherwise set $j = 0$ and go to Step 8.
Step 8: Check the condition of the outer loop. Choose $x_{k+1}^0$ equal to the lowest known point. Set $V_{k+1}^0 = V_0^0$, $h_{k+1}^0 = h_k^\gamma$. In addition, increment $k$ by one and go to Step 1.

Remark 2.1.
(1) Steps 1 to 7 comprise the inner loop, while Steps 1 to 8 comprise the outer loop.
(2) At the first iteration of the outer loop, if $V_k^0 = V_{min}$ with $v_i^{k,0} = e_i$ $(i = 1, \cdots, n)$, then the gradient is estimated according to formula (2.3).
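The following self-contained Python sketch illustrates the inner-outer loop structure of Algorithm 2.1 under several simplifications that are ours, not the paper's: the basis is fixed to $V_{min}$ (Step 5 is omitted), the stopping test of Step 1 is replaced by fixed iteration counts, the quasi-minimal iterates $z_m$ are not recorded, and a small guard protects the BB denominators. It inlines the gradient estimate (2.3) and the step rule (2.6) shown above; running it simply prints the best point and value found within the fixed budget.

```python
import numpy as np

def frame_abb_sketch(f, x0, h0=1.0, lam=4.0, mu=0.5, beta=10.0,
                     gamma=20, kappa=0.1, n_outer=50):
    """Much-simplified sketch of Algorithm 2.1 with V = V_min fixed."""
    n = len(x0)
    h = h0
    best_x = np.asarray(x0, dtype=float)
    best_f = f(best_x)
    for k in range(n_outer):                      # outer loop
        x = best_x.copy()                         # Step 8: restart at lowest known point
        x_prev = g_prev = None
        for j in range(gamma):                    # inner loop
            fx = f(x)
            # Step 2: frame values on V_min = {e_1,...,e_n, -e}
            E = np.eye(n)
            f_plus = np.array([f(x + h * e) for e in E])
            f_minus = f(x - h * np.ones(n))
            # track the lowest known point (centre and frame points)
            pts = [x] + [x + h * e for e in E] + [x - h * np.ones(n)]
            vals = np.array([fx, *f_plus, f_minus])
            i_min = int(vals.argmin())
            if vals[i_min] < best_f:
                best_x, best_f = pts[i_min].copy(), vals[i_min]
            # Step 3: gradient estimate (2.3); the search direction is -g
            g = (f_plus - (f_minus + f_plus.sum()) / (n + 1)) / h
            # Step 4: step length (2.6), with a small safeguard (ours)
            if j == 0:
                ng = np.linalg.norm(g)
                alpha = 1.0 if ng <= beta else 1.0 / ng
            else:
                s, w = x - x_prev, g - g_prev
                sw = s @ w
                if abs(sw) < 1e-16 or w @ w < 1e-16:
                    alpha = 1.0
                else:
                    lbb, sbb = (s @ s) / sw, sw / (w @ w)
                    alpha = sbb if sbb / lbb <= kappa else lbb
            # Step 6: quasi-minimal frame test; shrink the frame size
            # (a full implementation would also record z_m = x_k^j here)
            if fx <= vals[1:].min() + h ** (1 + mu):
                h /= lam
            x_prev, g_prev = x, g
            x = x - alpha * g
    return best_x, best_f

if __name__ == "__main__":
    rosen = lambda z: 100.0 * (z[1] - z[0] ** 2) ** 2 + (1.0 - z[0]) ** 2
    print(frame_abb_sketch(rosen, [-1.2, 1.0]))
```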
3. Convergence Analysis

In order to prove the convergence of Algorithm 2.1, we make the following assumptions.

Assumption 3.1. The sequence of iterates $\{x_k^j\}$ produced by Algorithm 2.1 is bounded.

Assumption 3.2. $f$ is continuously differentiable.
Assumption 3.3. $\|v_l^{k,j}\| \le M$ for $l = 1, \cdots, q$, $j = 0, 1, \cdots, \gamma$ and $k = 0, 1, \cdots$, where $M$ is a positive constant and $v_l^{k,j}$ is the $l$-th vector in $V_k^j$.

From [6], we obtain the following lemma, whose proof is almost the same as that in [6] and is omitted.

Lemma 3.1. Let Assumptions 3.1 and 3.2 hold. Then:
(i) the sequence of function values $\{f(x_k^j)\}$ is bounded;
(ii) if $(g_k^j)^T v_l^{k,j} \ge 0$ for all $v_l^{k,j} \in V_k^j$, then $g_k^j = 0$.

Now we establish the following convergence properties of Algorithm 2.1.

Theorem 3.1. If the sequence of function values $\{f(x_k^j)\}$ is bounded, then the sequence $\{z_m\}$ is infinite.
Proof. Assume that $\{z_m\}$ is finite, let $z_M$ be the final quasi-minimal point and $z_M = x_{\bar k}^{\bar j}$. When the iterates move from $x_{\bar k}^{\bar j}$ to $x_{\bar k+1}^0$, Algorithm 2.1 executes $\gamma - \bar j$ inner iterations and one outer iteration; the corresponding sequence of points is

    x_{\bar k}^{\bar j} \to x_{\bar k}^{\bar j+1} \to \cdots \to x_{\bar k}^{\gamma} \to x_{\bar k+1}^{0}.    (3.1)

From Step 6 of Algorithm 2.1, we know that the frame $\Phi_{\bar k}^{\bar j}$ is the final quasi-minimal frame. Hence, the frame $\Phi_{\bar k+1}^{0}$ is not quasi-minimal. From Definition 2.3 it follows that there exists at least one vector $v_l^{\bar k+1,0} \in V_{\bar k+1}^{0}$ such that

    f(x_{\bar k+1}^0 + h_{\bar k+1}^0 v_l^{\bar k+1,0}) < f(x_{\bar k+1}^0) - \epsilon_{\bar k+1}^0,    (3.2)

where $\epsilon_{\bar k+1}^0 = (h_{\bar k+1}^0)^{1+\mu}$ $(\mu > 0)$ is a positive constant. By Step 8, we know that $x_{\bar k+2}^0$ is the lowest known point in the $(\bar k+1)$-th outer iteration. Hence, with (3.2) we have

    f(x_{\bar k+2}^0) \le f(x_{\bar k+1}^0 + h_{\bar k+1}^0 v_l^{\bar k+1,0}) < f(x_{\bar k+1}^0) - \epsilon_{\bar k+1}^0.    (3.3)

Also, from Step 8 and (3.3), we have

    f(x_{\bar k+r+1}^0) < f(x_{\bar k+r}^0) - \epsilon_{\bar k+r}^0 < f(x_{\bar k+r-1}^0) - \sum_{i=\bar k+r-1}^{\bar k+r} \epsilon_i^0 < \cdots < f(x_{\bar k+1}^0) - \sum_{i=\bar k+1}^{\bar k+r} \epsilon_i^0,    (3.4)

where $r$ is a positive integer.

Because the frame $\Phi_{\bar k}^{\bar j}$ is the final quasi-minimal frame, by Step 6 we know that the frame $\Phi_k^j$ is not quasi-minimal for $k = \bar k$, $j = \bar j+1, \cdots, \gamma$, or $k > \bar k$, and that $h_k^j$ is constant after the $\bar j$-th inner iteration of the $\bar k$-th outer iteration, i.e.,

    h_{\bar k}^{\bar j+1} = h_{\bar k}^{\bar j+2} = \cdots = h_{\bar k}^{\gamma} = h_{\bar k+1}^0 = \cdots = h_{\bar k+2}^0 = \cdots = h_{\bar k+r}^0,    (3.5)

    \epsilon_{\bar k}^{\bar j+1} = \epsilon_{\bar k}^{\bar j+2} = \cdots = \epsilon_{\bar k}^{\gamma} = \epsilon_{\bar k+1}^0 = \cdots = \epsilon_{\bar k+2}^0 = \cdots = \epsilon_{\bar k+r}^0.    (3.6)

By (3.4) and (3.6), we have

    f(x_{\bar k+r+1}^0) < f(x_{\bar k+1}^0) - \sum_{i=\bar k+1}^{\bar k+r} \epsilon_i^0 = f(x_{\bar k+1}^0) - r\epsilon_{\bar k+1}^0 = f(x_{\bar k+1}^0) - r\epsilon_{\bar k}^{\bar j+1}.    (3.7)

If we ignore the stopping condition and let $r \to +\infty$, then $f(x_{\bar k+r+1}^0) \to -\infty$, which contradicts the condition that $\{f(x_k^j)\}$ is bounded. The proof of this theorem is complete.
Theorem 3.2. Let Assumptions 3.1-3.3 hold. Then each cluster point of $\{z_m\}$ is a stationary point of $f$.

Proof. Let $z_\infty$ be an arbitrary cluster point of $\{z_m\}$ and let the subsequence $\{z_m\}_{m \in K}$ converge to $z_\infty$. Assume $z_{\hat m} \in \{z_m\}_{m \in K}$ and $z_{\hat m} = x_{\hat k}^{\hat j}$. By Taylor expansion and Assumption 3.3, we have

    f(x_{\hat k}^{\hat j} + h_{\hat k}^{\hat j} v_l^{\hat k,\hat j}) = f(x_{\hat k}^{\hat j}) + h_{\hat k}^{\hat j} g(\xi_{\hat k}^{\hat j})^T v_l^{\hat k,\hat j}
        = f(x_{\hat k}^{\hat j}) + h_{\hat k}^{\hat j} g(x_{\hat k}^{\hat j})^T v_l^{\hat k,\hat j} + h_{\hat k}^{\hat j} (g(\xi_{\hat k}^{\hat j}) - g(x_{\hat k}^{\hat j}))^T v_l^{\hat k,\hat j}
        \le f(x_{\hat k}^{\hat j}) + h_{\hat k}^{\hat j} g(x_{\hat k}^{\hat j})^T v_l^{\hat k,\hat j} + h_{\hat k}^{\hat j} \|g(\xi_{\hat k}^{\hat j}) - g(x_{\hat k}^{\hat j})\| M,    (3.8)

for all $v_l^{\hat k,\hat j} \in V_{\hat k}^{\hat j}$, where $v_l^{\hat k,\hat j}$ is the $l$-th vector of $V_{\hat k}^{\hat j}$, and $\xi_{\hat k}^{\hat j}$ is a convex combination of $x_{\hat k}^{\hat j}$ and $x_{\hat k}^{\hat j} + h_{\hat k}^{\hat j} v_l^{\hat k,\hat j}$. By Step 6 of Algorithm 2.1, we know that the frame $\Phi_{\hat k}^{\hat j}$ is a quasi-minimal frame. From Definition 2.3, we have

    f(x_{\hat k}^{\hat j} + h_{\hat k}^{\hat j} v_l^{\hat k,\hat j}) \ge f(x_{\hat k}^{\hat j}) - (h_{\hat k}^{\hat j})^{1+\mu}, \quad \forall v_l^{\hat k,\hat j} \in V_{\hat k}^{\hat j}.    (3.9)

Combining (3.8) and (3.9), we obtain

    g(x_{\hat k}^{\hat j})^T v_l^{\hat k,\hat j} \ge -\|g(\xi_{\hat k}^{\hat j}) - g(x_{\hat k}^{\hat j})\| M - (h_{\hat k}^{\hat j})^{\mu}, \quad \forall v_l^{\hat k,\hat j} \in V_{\hat k}^{\hat j}.    (3.10)

Because $z_{\hat m}$ is the $\hat m$-th iteration point of $\{z_m\}_K$, by Step 6 of Algorithm 2.1 we have

    h_{\hat k}^{\hat j} \le \frac{h_0^0}{\lambda^{\hat m}},

where $\lambda > 1$ is a positive constant. Let $\hat m \to +\infty$; then $z_{\hat m} \to z_\infty$, $h_{\hat k}^{\hat j} \to 0$ and $\hat k \to +\infty$. Since $\hat j < \gamma$, there exists an infinite subsequence such that $\hat j \equiv j_0$ and $g(\xi_{\hat k}^{j_0}) \to g(z_\infty)$. Combining this with (3.10), we have

    g(z_\infty)^T v_l^{\infty,j_0} \ge 0, \quad \forall v_l^{\infty,j_0} \in V_\infty^{j_0}.    (3.11)

Then $g(z_\infty) = 0$ follows from Lemma 3.1.
4. Numerical Experiments

In this section, we analyze the numerical test results of MAPRP [8], Nelder-Mead [1] and Algorithm 2.1. Our tests are performed on a PC with an Intel Core Duo CPU ([email protected], 3.60GHz) and 4 GB RAM, using MATLAB 7.12.0.

The performances of the algorithms are compared by the two profiles of [9] and [13]. Let $P$ be the set of benchmark problems and $S$ be the set of optimization solvers. The performance measure $t_{p,s}$ is the number of function evaluations required for problem $p \in P$ by solver $s \in S$ to obtain a function value satisfying

    f(x) \le f_L + \tau (f(x_0^0) - f_L),

where $x_0^0$ is the starting point of the test problem, $\tau > 0$ is a tolerance, and $f_L$ is the best function value achieved by any solver within $\mu_f$ function evaluations, $\mu_f$ being a positive integer.
Table 4.1: The information about the benchmark problem set.

problem                                      np   mp      problem                      np   mp
1.  Linear (full rank)                        9   45      27. Mancino                   5    5
2.  Linear (rank 1)                           7   35      28. Mancino                   8    8
3.  Linear (rank 1 with 0 columns & rows)     7   35      29. Mancino                  10   10
4.  Rosenbrock                                2    2      30. Mancino                  12   12
5.  Helical valley                            3    3      31. Penalty II                4    8
6.  Powell singular                           4    4      32. Penalty II                6   12
7.  Freudenstein and Roth                     2    2      33. Penalty II                8   16
8.  Bard                                      3   15      34. Penalty II               10   20
9.  Watson                                    6   31      35. Penalty II               12   24
10. Watson                                    9   31      36. Variably dimensioned      8   10
11. Watson                                   12   31      37. Variably dimensioned      9   11
12. Box 3-dimensional                         3   10      38. Variably dimensioned     10   12
13. Jennrich and Sampson                      2   10      39. Variably dimensioned     11   13
14. Brown and Dennis                          4   20      40. Variably dimensioned     12   14
15. Chebyquad                                 6    6      41. Broyden tridiagonal       6    6
16. Chebyquad                                 7    7      42. Broyden tridiagonal       7    7
17. Chebyquad                                 8    8      43. Broyden tridiagonal       8    8
18. Chebyquad                                 9    9      44. Broyden tridiagonal       9    9
19. Brown almost-linear                      10   10      45. Broyden tridiagonal      10   10
20. Bdqrtic                                   8    8      46. Broyden tridiagonal      11   11
21. Bdqrtic                                  10   12      47. Broyden tridiagonal      12   12
22. Bdqrtic                                  11   14      48. Broyden banded            4    4
23. Bdqrtic                                  12   16      49. Broyden banded            7    7
24. Cube                                      5    5      50. Broyden banded            9    9
25. Cube                                      6    6      51. Broyden banded           10   10
26. Cube                                      8    8      52. Broyden banded           12   12
The performance profile [9] of a solver $s \in S$ is defined as

    \rho_s(\alpha) = \frac{1}{|P|} \left| \left\{ p \in P : \frac{t_{p,s}}{\min\{t_{p,s'} : s' \in S\}} \le \alpha \right\} \right|.

The data profile [13] of a solver $s \in S$ is the fraction

    d_s(\alpha) = \frac{1}{|P|} \left| \left\{ p \in P : \frac{t_{p,s}}{n_p + 1} \le \alpha \right\} \right|,

where $n_p$ is the number of variables of $p \in P$.

The benchmark problem set $P$ in the experiments contains 52 least squares problems from the two test sets MGH [12] and CUTEr [17]. Table 4.1 shows some information about the test problems, where $n_p$ is the number of variables and $m_p$ is the number of components. In order to test the solvers from remote starting points, we let $x_0^0 = s_p x_s$, where $s_p$ is $1, 2, \cdots, 10$ and $x_s$ is the standard starting point of each problem, so a total of 520 problems are tested in this paper. In all problems, we have

    2 \le n_p \le 12, \quad 2 \le m_p \le 45, \quad p = 1, \cdots, 52.
Fig. 4.1. Data profiles of MAPRP, Nelder-Mead and Algorithm 2.1: fraction of problems solved versus the number of simplex gradients ν, for τ = 10^{-3} and τ = 10^{-7}.
In addition, 100 simplex gradients are taken as the maximum computational budget, so we set $\mu_f = 1300$. The parameters of our numerical experiments are as follows: $h_0^0 = 1$, $V_0^0 = V_{min}$, $v_i^{0,0} = e_i$ $(i = 1, \cdots, n_p)$, $\lambda = 4$, $\mu = 0.5$, $\beta = 10$, $\gamma = 10 n_p$, $\kappa = 0.1$.

In Fig. 4.1, we show the data profiles of Algorithm 2.1, MAPRP and Nelder-Mead. As we can see, Algorithm 2.1 gives better results than Nelder-Mead, since it solves a higher percentage of problems for all accuracies $\tau$, especially when $\tau = 10^{-7}$. It can also be seen that Algorithm 2.1 and MAPRP have very similar performance when $\tau = 10^{-3}$. Furthermore, when $\tau = 10^{-7}$, Algorithm 2.1 gives better results than MAPRP once the number of simplex gradients is larger than 15. In addition, when $\tau = 10^{-7}$, Nelder-Mead is comparable with MAPRP: MAPRP solves more problems within small numbers of function evaluations, while Nelder-Mead solves more problems when more evaluations are allowed.

The performance profiles of Algorithm 2.1, MAPRP and Nelder-Mead are reported in Fig. 4.2. When $\tau = 10^{-3}$, Algorithm 2.1 and MAPRP outperform Nelder-Mead, and the difference becomes significantly smaller as the performance ratio increases. When $\tau = 10^{-7}$, Algorithm 2.1 performs better than the other two solvers. For example, for the performance ratio $\alpha = 4$, Algorithm 2.1 solves about 75% of the test problems, while MAPRP and Nelder-Mead solve no more than 70%.
Fig. 4.2. Performance profiles of MAPRP, Nelder-Mead and Algorithm 2.1: fraction of problems solved versus the performance ratio α, for τ = 10^{-3} and τ = 10^{-7}.
5. Conclusion

The computational results presented in this paper show that applying the rotational minimal positive basis and the ABB method within the frame-based framework is quite competitive. The data profiles and performance profiles of the numerical results indicate that our algorithm often reduces the number of function evaluations required to reach a stationary point, and that it is superior to the Nelder-Mead and MAPRP algorithms.

Acknowledgments. This work was supported by the National Natural Science Foundation of China (11071117, 11274109) and the Natural Science Foundation of Jiangsu Province (BK20141409).
References

[1] A.R. Conn, K. Scheinberg and L.N. Vicente, Introduction to Derivative-Free Optimization, MPS-SIAM Series on Optimization, SIAM, Philadelphia, PA, 2008.
[2] C. Audet and J.E. Dennis Jr., Analysis of generalized pattern searches, SIAM J. Optim., 13:3 (2003), 889-903.
[3] B. Zhou, L. Gao and Y.H. Dai, Gradient methods with adaptive step-sizes, Computational Optimization and Applications, 35:1 (2006), 69-86.
[4] L. Grippo and F. Rinaldi, A class of derivative-free nonmonotone optimization algorithms employing coordinate rotations and gradient approximations, to appear in Computational Optimization and Applications, (2014), DOI 10.1007/s10589-014-9665-9.
[5] G.E.P. Box and K.B. Wilson, On the experimental attainment of optimum conditions, J. Roy. Statist. Soc. Ser. B., 13:1 (1951), 1-45.
[6] I.D. Coope and C.J. Price, Frame based methods for unconstrained optimization, J. Optim. Theory Appl., 107:2 (2000), 261-274.
[7] I.D. Coope and C.J. Price, On the convergence of grid-based methods for unconstrained optimization, SIAM J. Optim., 11:4 (2001), 859-869.
[8] I.D. Coope and C.J. Price, A direct search frame-based conjugate gradients method, J. Comput. Math., 22:4 (2004), 489-500.
[9] E.D. Dolan and J.J. Moré, Benchmarking optimization software with performance profiles, Mathematical Programming, 91:2 (2002), 201-213.
[10] R. Hooke and T.A. Jeeves, Direct search solution of numerical and statistical problems, J. Assoc. Comput. Mach., 8:2 (1961), 212-219.
[11] J. Barzilai and J.M. Borwein, Two-point step size gradient methods, IMA J. Numer. Anal., 8:1 (1988), 141-148.
[12] J.J. Moré, B.S. Garbow and K.E. Hillstrom, Testing unconstrained optimization software, ACM Trans. Math. Softw., 7:1 (1981), 17-41.
[13] J.J. Moré and S.M. Wild, Benchmarking derivative-free optimization algorithms, SIAM Journal on Optimization, 20:1 (2009), 172-191.
[14] T.G. Kolda, R.M. Lewis and V. Torczon, Optimization by direct search: New perspectives on some classical and modern methods, SIAM Review, 45:3 (2003), 385-482.
[15] Q. Liu, Two minimal positive bases based direct search conjugate gradient methods for computationally expensive functions, Numerical Algorithms, 58:4 (2011), 461-474.
[16] M.S. Bazaraa, H.D. Sherali and C.M. Shetty, Nonlinear Programming: Theory and Algorithms, 3rd edn., Wiley and Sons, USA, 2006.
[17] N.I.M. Gould, D. Orban and P.L. Toint, CUTEr and SifDec: A constrained and unconstrained testing environment revisited, ACM Transactions on Mathematical Software, 29:4 (2003), 373-394.
[18] V. Torczon, On the convergence of multidirectional search algorithms, SIAM J. Optim., 1:1 (1991), 123-145.
[19] V. Torczon, On the convergence of pattern search algorithms, SIAM J. Optim., 7:1 (1997), 1-25.
[20] Y.H. Dai, W.W. Hager, K. Schittkowski and H. Zhang, The cyclic Barzilai-Borwein method for unconstrained optimization, IMA J. Numer. Anal., 26:3 (2006), 604-627.
[21] Y. Xiao, Q. Wang and D. Wang, Notes on the Dai-Yuan-Yuan modified spectral gradient method, J. Comput. Appl. Math., 234:10 (2010), 2986-2992.