On the Convergence of Mixed Integer Pattern Search Algorithms
Charles Audet and J.E. Dennis Jr.
CRPC-TR99785, February 1999
Center for Research on Parallel Computation, Rice University, 6100 South Main Street, CRPC - MS 41, Houston, TX 77005
Submitted February 1999
On the Convergence of Mixed Integer Pattern Search Algorithms

Charles Audet    J.E. Dennis Jr.
[email protected]
Rice University, Department of Computational and Applied Mathematics, 6100 South Main Street - MS 134, Houston, Texas 77005-1892, USA

February 11, 1999

Abstract: The definition of pattern search methods for solving nonlinear unconstrained optimization problems is generalized here to include integer variables. We present a generalized pattern search algorithm that provides an accumulation point having a null gradient with respect to the continuous variables and being a local minimizer with respect to the discrete variables. This point is the limit of a subsequence of unsuccessful iterates whose corresponding mesh size parameters converge to zero. Furthermore, we present additional assumptions ensuring that these optimality conditions hold for any accumulation point, not only for those obtained as limits of unsuccessful iterates.

Key words: Pattern search algorithm, convergence analysis, unconstrained optimization, mixed integer programming.

Acknowledgments: Work of the first author was supported by NSERC (Natural Sciences and Engineering Research Council) fellowship PDF-207432-1998 and by CRPC (Center for Research on Parallel Computation). Work of the second author was supported by DOE DE-FG03-95ER25257, AFOSR F49620-98-1-0267, The Boeing Company, Sandia LG-4253, and CRPC CCR-9120008.
1 Introduction

Torczon [11] presented a general definition of an abstract pattern search method. The objective of the method is to minimize a continuously differentiable function $f : \mathbb{R}^n \to \mathbb{R}$ without any knowledge of its derivative. Torczon [11] shows that the method includes algorithms such as coordinate search with fixed step sizes, evolutionary operation using factorial design [2], the original pattern search algorithm [7], and the multidirectional search algorithm [6]. An achievement of [11] is to develop general convergence results for the method that subsume all the others. A survey of derivative free methods for unconstrained optimization can be found in Conn, Scheinberg and Toint [4]. The main result of [11] is that under mild assumptions, the sequence of iterates $(x_k)$ of $\mathbb{R}^n$ generated by any pattern search method satisfies
$$\liminf_{k \to \infty} \|\nabla f(x_k)\| = 0, \qquad (1)$$
without ever computing or explicitly approximating derivatives. At each iteration, the function is evaluated at trial points belonging to a discrete mesh surrounding the current iterate, in order to obtain one yielding a decrease in the objective function value. Lewis and Torczon [9] use positive basis theory to strengthen the result by roughly cutting in half the worst case number of trial points at each iteration, without affecting the convergence result. Lewis and Torczon [8] [10] extend pattern search algorithms and convergence theory to bound and linearly constrained minimization.

The main objective of the present paper is to further generalize the problem to be solved. We consider the problem of minimizing the function $f : \mathbb{R}^{n_c} \times \mathbb{Z}^{n_d} \to \mathbb{R}$, where the domain is partitioned into continuous and discrete variables and $n_c$ and $n_d$ are the dimensions of the corresponding spaces. The function $f$ is assumed to be continuously differentiable when the variables in $\mathbb{Z}^{n_d}$ are fixed. We present a general pattern search method that reduces to that of Torczon [11] when the dimension $n_d$ is fixed to zero. The iterates generated by the method are partitioned into continuous and discrete variables $x_k = (x_k^c, x_k^d)$, where $x_k^c \in \mathbb{R}^{n_c}$ and $x_k^d \in \mathbb{Z}^{n_d}$. A secondary objective of the paper is to slightly generalize the continuous variable part of the algorithm and to revise and shorten the arguments developed in [11] and in [9]. We first show how to obtain an accumulation point $\hat{x}$ of the sequence of iterates $(x_k)$ that satisfies
$$\nabla_c f(\hat{x}) = 0, \qquad (2)$$
where $\nabla_c f(x) \in \mathbb{R}^{n_c}$ denotes the gradient of $f$ with respect to the continuous variables $x^c$ while keeping the discrete ones $x^d$ fixed. This condition is equivalent to (1) when there are no discrete variables. However, we provide additional characteristics that the accumulation point $\hat{x}$ possesses. We then show that the same point $\hat{x} = (\hat{x}^c, \hat{x}^d)$ is a local minimizer with respect to the discrete variables, that is,
$$f(\hat{x}) \le f(\hat{x}^c, x^d) \quad \text{for all } x^d \in \mathcal{N}(\hat{x}^d), \qquad (3)$$
where the user-specified neighborhood $\mathcal{N}(\hat{x}^d) \subset \mathbb{Z}^{n_d}$ is described in Section 2.2. Condition (2) guarantees that the gradient with respect to the continuous variables is zero, and condition (3) ensures that the solution is a local optimum with respect to the discrete variables. We present additional assumptions that ensure that these conditions hold not only for the accumulation point $\hat{x}$, but for any accumulation point of the sequence of iterates. These assumptions are direct generalizations of those presented in [11], but again the proofs are different.

The paper is structured as follows. In the next section, we formally describe a general framework for mixed integer pattern search algorithms. In Section 3, we show the existence of a subsequence of iterates converging to an accumulation point that satisfies the optimality conditions (2) and (3). The key to obtaining this accumulation point lies in considering the unsuccessful iterations, i.e., the iterations where no trial point yielding a decrease in the objective function was obtained. Stronger convergence results are then derived under more restrictive assumptions.
2 Pattern search methods

The underlying structure of a pattern search algorithm is as follows. It is an iterative method that generates a sequence of feasible iterates whose objective function values are non-increasing. At any given iteration, the objective function is evaluated at a finite number of points on a mesh in order to find one that yields a decrease in the objective function value. Any iteration k of a pattern search method is initiated with the incumbent solution, i.e., the currently best found solution, as well as with an enumerable subset $M_k$ of the domain $\mathbb{R}^{n_c} \times \mathbb{Z}^{n_d}$. Construction of the mesh $M_k$ is formally described in Section 2.1.

The objective pursued during each iteration is to obtain a solution on a subset of the current mesh whose function value is strictly less than the incumbent value. Exploration of the mesh is conducted in one or two phases. First, a finite search, free of any rules imposed by the algorithm, is performed anywhere on the mesh. Any strategy can be used, as long as it searches finitely many points. If the first search does not succeed in improving the incumbent, the second phase is called: a potentially exhaustive search in a small neighborhood (intersected with the mesh) of the incumbent solution is performed. Rules for constructing the neighborhood are detailed in Section 2.2. The first phase (called the Search step) provides flexibility to the method and determines in practice the quality of
the solution. The second phase (called the Poll step) follows stricter rules and guarantees theoretical convergence. If a solution having an objective value less than the incumbent is found in either phase, then the iteration is declared successful. The incumbent solution is then updated, and the next iteration is initiated with a (possibly) coarser mesh and a (possibly) larger neighborhood around the newly found incumbent solution. Otherwise, the iteration is declared unsuccessful. The next iteration is initiated at the same incumbent solution, but with a finer mesh and a smaller neighborhood around the incumbent solution. A key property of the mesh exploration is that if an iteration is unsuccessful, then the current objective function value is less than or equal to all objective function values of the points in the current mesh neighborhood. In order to properly present the pattern search algorithm, we first detail in the following subsections the construction of the mesh and the neighborhood of the current iterate.
2.1 The mesh

At any given iteration k, the current mesh $M_k$ is a discrete set of points in the domain from which the algorithm selects the next iterate. The coarseness or fineness of the mesh is dictated by the strictly positive mesh size parameter $\Delta_k \in \mathbb{R}_+$. Both the mesh and the mesh size parameter are updated at every iteration. The mesh is the direct product of a finite union of lattices in $\mathbb{R}^{n_c}$ with the integer space $\mathbb{Z}^{n_d}$. The presentation of the lattices differs from that of Torczon [11], but the sets produced are equivalent. Consider a fixed basis matrix in $\mathbb{R}^{n_c \times n_c}$ and, for $\ell$ varying from 1 to a finite number $\ell_{\max}$, the generating matrices $C_\ell \in \mathbb{Z}^{n_c \times n_c}$; then define the pattern matrices $P_\ell \in \mathbb{R}^{n_c \times n_c}$ to be the products of the basis matrix with $C_\ell$. The continuous variables are chosen from one of the translated (by $x_k^c$) integer lattices
$$L_\ell(\Delta_k) = \{ x_k^c + \Delta_k P_\ell z : z \in \mathbb{Z}^{n_c} \}, \quad \text{for } \ell = 1, 2, \ldots, \ell_{\max}.$$
The continuous part $x_k^c$ of the current iterate belongs to each lattice regardless of the value of the parameter $\Delta_k$. The basis matrix is constant over all iterations. However, in practice, the generating matrices $C_\ell$ (and thus $P_\ell$) that define the lattices can be determined as the algorithm unfolds, as long as only a finite number of them is generated. Each of these lattices is enumerable, and the minimum distance between two distinct points is proportional to the mesh size parameter $\Delta_k$. When an iteration is successful, the continuous part of the next iterate is chosen in any of these lattices, and thus belongs to their union $M(\Delta_k) = \bigcup_{\ell=1}^{\ell_{\max}} L_\ell(\Delta_k)$; the discrete part is chosen in the integer lattice $\mathbb{Z}^{n_d}$. At iteration k, the current mesh is defined to be the direct product
$$M_k = M(\Delta_k) \times \mathbb{Z}^{n_d}.$$
The mesh is completely defined by the current iterate $x_k$ and the mesh size parameter $\Delta_k$. Whether the iteration is successful or not, the next iterate $x_{k+1}$ is selected in the mesh $M_k$. In the case where the Search step in the current mesh is unsuccessful, a second exploration phase must be conducted by the algorithm in a neighborhood surrounding the current iterate. The Poll step verifies whether the incumbent solution is a local minimizer on the current mesh neighborhood.
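To make the lattice construction above concrete, here is a small sketch (our illustration, not part of the paper) that enumerates a finite window of one translated lattice $L_\ell(\Delta_k) = \{x_k^c + \Delta_k P_\ell z : z \in \mathbb{Z}^{n_c}\}$; the basis matrix, the generating matrix $C_1$, the iterate $x_k^c$, and the mesh size $\Delta_k$ used below are arbitrary values assumed only for the example.

```python
import itertools
import numpy as np

def lattice_points(x_c, delta, P, radius=1):
    """Enumerate mesh points x_c + delta * P @ z for integer vectors z with
    entries in {-radius, ..., radius}, i.e., a finite window of L_ell(delta)."""
    n_c = len(x_c)
    pts = []
    for z in itertools.product(range(-radius, radius + 1), repeat=n_c):
        pts.append(x_c + delta * (P @ np.array(z, dtype=float)))
    return pts

# Hypothetical data: identity basis matrix and one integer generating matrix C_1.
basis_matrix = np.eye(2)                 # fixed basis matrix
C1 = np.array([[1, 0], [1, 2]])          # generating matrix C_1 (integer entries)
P1 = basis_matrix @ C1                   # pattern matrix P_1
x_c = np.array([0.5, -1.0])              # continuous part x_k^c of the iterate
delta = 0.25                             # mesh size parameter Delta_k

for point in lattice_points(x_c, delta, P1):
    print(point)
```

Note that $x_k^c$ itself is recovered at $z = 0$, and halving $\Delta_k$ halves the spacing between distinct lattice points, which is the refinement behavior exploited by the Mesh reduction step of Section 2.3.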
2.2 The neighborhood

The current mesh neighborhood $X_k$ of $x_k$ contains a finite number of points that must be explored (i.e., their function values are evaluated) by the algorithm before declaring the iteration unsuccessful. It is a subset of the current mesh $M_k$ and is obtained through the union of two smaller neighborhoods: one for the continuous variables, and the other for the discrete ones. Formally, it is defined as
$$X_k = X_k^c \cup X_k^d,$$
where $X_k^c \subset \mathbb{R}^{n_c} \times \{x_k^d\}$ and $X_k^d \subset \{x_k^c\} \times \mathbb{Z}^{n_d}$ are finite sets of points.

For the continuous variables, $X_k^c$ is defined through a finite number of positive bases on $\mathbb{R}^{n_c}$. A positive basis is a set of non-zero vectors in $\mathbb{R}^{n_c}$ whose non-negative linear combinations span $\mathbb{R}^{n_c}$, but such that no proper subset does so. Each positive basis contains at least $n_c + 1$ and at most $2 n_c$ vectors; these are referred to as minimal and maximal positive bases, respectively. The following key property of positive bases is used in this document (see Davis [5] for a characterization of positive bases). For any non-zero vector $a$ in $\mathbb{R}^{n_c}$ and positive basis $B$ on $\mathbb{R}^{n_c}$, there exists a vector $b$ of the basis $B$ such that
$$a^t b < 0. \qquad (4)$$
Let $\mathcal{B}$ be a finite set of positive bases on $\mathbb{R}^{n_c}$ such that every column $b$ of any positive basis is of the form $P_\ell z$ for some $z \in \mathbb{Z}^{n_c}$ and $1 \le \ell \le \ell_{\max}$. The $P_\ell$'s are the same matrices used to construct the lattices $L_\ell$. The set $\mathcal{B}$ is fixed throughout all iterations. The neighborhood of the continuous variables follows by scaling a basis $B$ of $\mathcal{B}$ by the mesh size parameter: the current mesh neighborhood $X_k^c$ at iterate $x_k$ is written
$$X_k^c = \{ x_k^c + \Delta_k b : b \in B \} \times \{ x_k^d \} \quad \text{for some } B \in \mathcal{B}.$$
This definition implies that $X_k^c$ is a subset of the current mesh $M_k$. Moreover, $X_k^c$ is constructed using a single positive basis chosen from a finite set, and thus there are only a finite number of such neighborhoods to choose from.
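As a concrete illustration (ours, not the authors'), the sketch below builds the continuous part of the poll set $X_k^c$ from a minimal positive basis of $\mathbb{R}^2$, namely $\{e_1, e_2, -e_1 - e_2\}$, scaled by the mesh size parameter; the basis, the iterate, and $\Delta_k$ are assumptions chosen for the example. The last two lines also check property (4) numerically for one arbitrary non-zero vector $a$.

```python
import numpy as np

def continuous_poll_set(x_c, x_d, delta, positive_basis):
    """Poll points (x_c + delta * b, x_d) for each direction b of the basis;
    the discrete part x_d is held fixed in X_k^c."""
    return [(x_c + delta * b, x_d) for b in positive_basis]

# A minimal positive basis of R^2 (n_c + 1 = 3 directions).
minimal_basis = [np.array([1.0, 0.0]),
                 np.array([0.0, 1.0]),
                 np.array([-1.0, -1.0])]

x_c = np.array([0.5, -1.0])   # continuous part of the current iterate (assumed)
x_d = (2, 0)                  # discrete part of the current iterate (assumed)
delta = 0.25                  # mesh size parameter Delta_k

for yc, yd in continuous_poll_set(x_c, x_d, delta, minimal_basis):
    print(yc, yd)

# Property (4): for any non-zero a there is a basis vector b with a.b < 0.
a = np.array([-3.0, 1.0])
print(min(float(a @ b) for b in minimal_basis))   # prints -3.0, strictly negative
```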
[Figure 1: Examples of discrete neighborhoods $\mathcal{N}(\hat{x})$ of $\hat{x}$. Three panels show, respectively, the one-norm neighborhood $\{d \ne \hat{x} : \|d - \hat{x}\|_1 \le 1\}$, the infinity-norm neighborhood $\{d \ne \hat{x} : \|d - \hat{x}\|_\infty \le 2\}$, and the weighted-norm neighborhood $\{d \ne \hat{x} : |d_1 - \hat{x}_1| + 2|d_2 - \hat{x}_2| \le 2\}$.]
The motivation for introducing positive bases for the continuous variables is that if the gradient $\nabla_c f$ of the function $f$ with respect to the continuous variables is non-zero, then at least one of the basis vectors defines a descent direction. The original work of Torczon [11] uses a maximal positive basis. It was later generalized in Lewis and Torczon [9] to any positive basis, thus reducing the minimum number of points in $X_k^c$ from $2 n_c$ to $n_c + 1$.

For the discrete variables, recall that the user provides the discrete neighborhood $\mathcal{N}$. It is not required to be derived from a positive basis. It is determined by the quality of the solution that one desires from the algorithm, thus defining the notion of "local optimality" one wishes to achieve with respect to the discrete variables. Indeed, a solution is said to be a local minimizer with respect to the discrete variables if $f(\hat{x}) \le f(\hat{x}^c, x^d)$ for all $x^d$ in the discrete neighborhood $\mathcal{N}(\hat{x}^d)$, where $\mathcal{N}(\hat{x}^d)$ is chosen by the user. Figure 1 illustrates three types of discrete neighborhoods by displaying the points of $\mathcal{N}(\hat{x})$ as circled dots. Under each panel is the formal definition of the neighborhood $\mathcal{N}(\hat{x})$. The first one contains the neighboring points whose one-norm distance from $\hat{x}$ is at most one. The second allows a distance of two in the infinity norm. The third one uses a weighted norm. The definition of a neighborhood is flexible enough that any finite set of integer points can be used to define $\mathcal{N}$.

The discrete neighborhood need not be represented through the translation of $\mathcal{N}(0)$ as above. Consider for example the Quadratic Assignment Problem, in which n facilities must be assigned to n locations. Each assignment may be represented using one of the n! permutations of the vector $(1, 2, \ldots, n)$. The neighborhood of an assignment x could be, for example, the set of permutations that differ from x in only two locations. Consider the instance with three facilities. It may be modeled with three discrete variables ($x^d \in \mathbb{Z}^3$). Not all the points of the integer lattice $\mathbb{Z}^3$ represent feasible assignments; only the permutations of $(1, 2, 3)$ are. Also, the ordering is not the classical one associated with a distance: with the neighborhood $\mathcal{N}(1, 2, 3) = \{(1, 3, 2), (3, 2, 1), (2, 1, 3)\}$, the assignment $(3, 2, 1)$ is nearer to $(1, 2, 3)$ than $(3, 1, 2)$ is.
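The two neighborhood styles discussed above can be sketched as follows (our illustration; the radius and the swap rule mirror the examples in the text but are otherwise assumptions): a translation-invariant one-norm neighborhood and the QAP-style swap neighborhood on permutations.

```python
import itertools

def one_norm_neighborhood(x_d, radius=1):
    """Integer points d != x_d with ||d - x_d||_1 <= radius
    (the first example of Figure 1, with radius 1)."""
    n = len(x_d)
    offsets = itertools.product(range(-radius, radius + 1), repeat=n)
    return [tuple(x + o for x, o in zip(x_d, off))
            for off in offsets
            if 0 < sum(abs(o) for o in off) <= radius]

def swap_neighborhood(assignment):
    """Permutations differing from `assignment` in exactly two positions
    (the Quadratic Assignment Problem neighborhood discussed above)."""
    n = len(assignment)
    neighbors = []
    for i in range(n):
        for j in range(i + 1, n):
            a = list(assignment)
            a[i], a[j] = a[j], a[i]
            neighbors.append(tuple(a))
    return neighbors

print(one_norm_neighborhood((2, 0)))     # 4 points at one-norm distance 1
print(swap_neighborhood((1, 2, 3)))      # [(2, 1, 3), (3, 2, 1), (1, 3, 2)]
```

Both functions return finite sets of integer points, as required of the user-specified neighborhood $\mathcal{N}$.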
The proof of convergence holds for any fixed discrete neighborhood $\mathcal{N}$. However, the larger the neighborhood is, the more function evaluations would be expected to be required to converge. Except possibly at a finite number of iterations, the discrete mesh neighborhood $X_k^d \subset M_k$ must satisfy $\{x^d : x \in X_k^d\} = \mathcal{N}(x_k^d)$. We will refer to this property by saying that the discrete component of the neighborhoods $X_k^d$ converges to $\mathcal{N}(x_k^d)$. This ensures that Condition (3) holds at accumulation points. Moreover, this allows finitely many redefinitions of $X_k^d$ to adjust the cost of a Poll step (see Section 2.3). For example, if the user defines $\mathcal{N}$ through the infinity norm (as in the second example of Figure 1), it might be worthwhile in the first few iterations to define $X_k^d$ through the one-norm (as in the first example of Figure 1). Then, once a solution satisfying the one-norm local optimality is obtained, the discrete mesh neighborhood $X_k^d$ may be updated to the infinity norm. Using the above notation, we can now present the generalized mixed integer pattern search algorithm.
2.3 The generalized mixed integer pattern search algorithm

Our presentation of the pattern search algorithm is closer to that of Booker et al. [3] than to that of Torczon [11]. Consider the given initial mesh $M_0$ with mesh size parameter $\Delta_0$ and initial point $x_0$ of $M_0$. Throughout the document, the following assumptions are made:

(A1) The level set $L(x_0) = \{x \in \mathbb{R}^{n_c} \times \mathbb{Z}^{n_d} : f(x) \le f(x_0)\}$ is compact.

(A2) $f$ is continuously differentiable over a neighborhood of $L(x_0)$ when the variables in $\mathbb{Z}^{n_d}$ are fixed, i.e., for any $x^d \in \mathbb{Z}^{n_d}$ the function $f_{x^d} : \mathbb{R}^{n_c} \to \mathbb{R}$ defined by $x^c \mapsto f(x^c, x^d)$ is continuously differentiable over a neighborhood of $\{x^c : (x^c, x^d) \in L(x_0)\}$.

At any iteration $k \ge 0$, the general rules for choosing $x_{k+1}$ in the current mesh $M_k$ and obtaining the next mesh size parameter $\Delta_{k+1}$ are as follows; a minimal sketch of this loop is given after this list.

1. Search step (in current mesh). Employ some finite strategy to obtain an $x_{k+1} \in M_k$ satisfying $f(x_{k+1}) < f(x_k)$. If such an $x_{k+1}$ is found, declare the Search step (as well as the iteration) successful, then expand the mesh at Step 3.

2. Poll step (in current neighborhood). This step is reached only if the Search step is unsuccessful. If $f(x_k) \le f(x)$ for every $x \in X_k$, then declare the Poll step (as well as the iteration) unsuccessful and shrink the mesh at Step 4. Otherwise, choose $x_{k+1} \in X_k$ to be a point such that $f(x_{k+1}) < f(x_k)$, declare the Poll step (as well as the iteration) successful, and expand the mesh at Step 3.

3. Mesh expansion (at successful iterations). Let $\Delta_{k+1} = \tau^{m_k^+} \Delta_k$ (for $\tau^{m_k^+} \ge 1$ defined below). Increase $k$, and initiate the next iteration at Step 1.

4. Mesh reduction (at unsuccessful iterations). Set $x_{k+1}$ to $x_k$ and let $\Delta_{k+1} = \tau^{m_k^-} \Delta_k$ (for $0 < \tau^{m_k^-} < 1$ defined below). Increase $k$, and initiate the next iteration at Step 1.

In the Search and Poll steps, the number of candidate points among which the next iterate can be chosen is finite, since it must belong to the intersection of the enumerable current mesh and the compact set $L(x_0)$. The parameters in the last two steps are the rational number $\tau > 1$ and the integers $m_k^+ \ge 0$ and $m_k^- \le -1$ (whose absolute values are bounded above by $m_{\max} \ge 0$). In [11], the mesh reduction exponent $m_k^-$ was fixed for all $k \ge 0$. This restriction is relaxed here without affecting the convergence results. We plan to exploit this flexibility in subsequent work to try to converge more quickly. The conditions on these parameters imply the simple decrease property used throughout the document: iteration $k$ is successful if and only if $f(x_{k+1}) < f(x_k)$, if and only if $\Delta_{k+1} \ge \Delta_k$, and if and only if $x_{k+1} \ne x_k$. Another important implication of the parameters' definition is that if the iteration is unsuccessful, then $f(x) \ge f(x_k)$ for all $x \in X_k$. Moreover, $\Delta_{k+1}$ is obtained by multiplying $\Delta_k$ by a finite positive or negative integer power of $\tau$. Therefore, for any $k \ge 0$, we can write
$$\Delta_k = \tau^{r_k} \Delta_0, \qquad (5)$$
for some $r_k$ belonging to $\mathbb{Z}$. Notice that the cost of the Poll step is expected to depend on the definition of $\mathcal{N}$. Thus, the user can pay more function evaluations for a stronger local integer solution by defining $\mathcal{N}$ to be a larger neighborhood.
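The control flow of Steps 1-4 can be summarized in a short sketch (ours, not the authors' implementation). It makes several simplifying assumptions: the Search step is skipped, the poll uses a fixed minimal positive basis together with a one-norm discrete neighborhood, $\tau = 2$ with $m_k^+ = 0$ and $m_k^- = -1$ at every iteration, and the objective below is hypothetical.

```python
import numpy as np
from itertools import product

def poll_set(x_c, x_d, delta, basis):
    """Mesh neighborhood X_k: continuous poll points x_c + delta*b plus the
    discrete one-norm neighborhood of x_d (continuous part held fixed)."""
    pts = [(x_c + delta * b, x_d) for b in basis]
    for off in product((-1, 0, 1), repeat=len(x_d)):
        if sum(map(abs, off)) == 1:
            pts.append((x_c, tuple(d + o for d, o in zip(x_d, off))))
    return pts

def pattern_search(f, x_c, x_d, delta=1.0, tau=2.0, tol=1e-6, max_iter=500):
    n_c = len(x_c)
    basis = [np.eye(n_c)[i] for i in range(n_c)] + [-np.ones(n_c)]  # minimal positive basis
    fx = f(x_c, x_d)
    for _ in range(max_iter):
        if delta < tol:
            break
        # Search step: omitted (any finite strategy on the mesh is allowed).
        # Poll step: look for simple decrease in the mesh neighborhood X_k.
        success = False
        for yc, yd in poll_set(x_c, x_d, delta, basis):
            fy = f(yc, yd)
            if fy < fx:                       # simple decrease
                x_c, x_d, fx, success = yc, yd, fy, True
                break
        # Mesh update: expansion with m_k^+ = 0 keeps delta; reduction uses m_k^- = -1.
        delta *= tau ** 0 if success else tau ** -1
    return x_c, x_d, fx

# Hypothetical objective: smooth in x_c for every fixed value of x_d.
def f(x_c, x_d):
    return (x_c[0] - x_d[0]) ** 2 + (x_c[1] + 0.5) ** 2 + abs(x_d[0] - 3) + x_d[1] ** 2

print(pattern_search(f, np.array([0.0, 0.0]), (0, 0)))
```

A run of this sketch illustrates the simple decrease property: the incumbent value never increases, the iterate changes only at successful iterations, and the mesh size parameter shrinks only when a full poll fails.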
3 Proof of convergence

This section contains the convergence proof for the general mixed integer pattern search algorithm. It contains three subsections that correspond to three important parts of the proof. We start by studying the behavior of the mesh size parameter $\Delta_k$. Our first important result is that $\liminf_{k \to +\infty} \Delta_k = 0$. Therefore, there is a subsequence of mesh size parameters that converges to zero. It follows that there is an infinite number of unsuccessful iterations, and thus, for any $\epsilon > 0$ there is an iterate with $\Delta_k < \epsilon$ for which no point in the mesh neighborhood $X_k$ yields a decrease in the objective function. Second, we analyze a converging subsequence of unsuccessful iterates whose mesh size parameters converge to zero. We show that any accumulation point of the subsequence satisfies the optimality conditions (2) and (3). By focusing on unsuccessful iterations, the same result for the continuous variables is shown using a shorter proof than in [11].
Finally, a last subsection is devoted to stronger results: additional assumptions guarantee that the optimality conditions hold for any accumulation point of the whole sequence of iterates.
3.1 Boundedness of the mesh size parameters

We prove here that there is a subsequence of mesh size parameters $\Delta_k$ that converges to zero. In order to do so, we first show that these parameters are bounded above by a constant, independent of the iteration number $k$.
Lemma 3.1 There exists a positive integer $r_{UB}$ such that $\Delta_k \le \tau^{r_{UB}} \Delta_0$ for any $k \ge 0$.

Proof: Let $\bar{\Delta}$ be a mesh size parameter large enough so that the union of lattices $M(\bar{\Delta})$ intersects the projection $\{x^c : x \in L(x_0)\}$ of the compact level set $L(x_0)$ on the continuous variable space only at the translation point $x_k^c$, i.e., for any $1 \le \ell \le \ell_{\max}$ and nonzero $z \in \mathbb{Z}^{n_c}$, if $x$ is in $L(x_0)$ then the point $x^c + \bar{\Delta} P_\ell z$ does not belong to the projection of $L(x_0)$ on the continuous variable space. Therefore, if at iteration $k$ the mesh size parameter $\Delta_k$ is greater than $\bar{\Delta}$ then
$$M_k \cap L(x_0) \subseteq \{x_k^c\} \times \mathbb{Z}^{n_d}.$$
Moreover, only a finite number of iterations will follow before the mesh size parameter drops below $\bar{\Delta}$. Indeed, the continuous part of all these iterates will necessarily be equal to $x_k^c$, and the discrete part of these iterates can only take a finite number of values because $L(x_0)$ is compact. Let $d_{\max}$ be the total number of distinct values that the discrete variables may take in the compact set $L(x_0)$. Therefore, there will be no more than $d_{\max}$ successful iterations before the mesh size parameter goes below $\bar{\Delta}$. Recall that the mesh expansion exponent $m_k^+$ is bounded above by $m_{\max}$. Let $r_{UB}$ be a large enough integer so that $\tau^{r_{UB}} \Delta_0 \ge \bar{\Delta} (\tau^{m_{\max}})^{d_{\max}}$. It follows that the mesh size parameter at any iteration will never exceed $\tau^{r_{UB}} \Delta_0$.

We now study the convergence behavior of the mesh size parameter. The proof of this result is essentially identical to that of Torczon [11].
Theorem 3.2 The mesh size parameters satisfy $\liminf_{k \to +\infty} \Delta_k = 0$.

Proof: Suppose by contradiction that there exists a negative integer $r_{LB}$ such that $0 < \tau^{r_{LB}} \Delta_0 \le \Delta_k$ for all $k \ge 0$. Equation (5) states that for every $k \ge 0$ there is $r_k \in \mathbb{Z}$ such that $\Delta_k = \tau^{r_k} \Delta_0$. Combining this with Lemma 3.1 implies that for any $k \ge 0$, $r_k$ takes its value among the integers of the bounded interval $[r_{LB}, r_{UB}]$. Therefore, $r_k$ and $\Delta_k$ can only take a finite number of values over all $k \ge 0$.
For any $k$, the continuous part of the next iterate $x_{k+1}^c$ belongs to a lattice $L_{\ell_k}(\Delta_k)$ where $1 \le \ell_k \le \ell_{\max}$, and therefore it can be written $x_k^c + \Delta_k P_{\ell_k} z_k$ for some $z_k \in \mathbb{Z}^{n_c}$. By substituting $\Delta_k = \tau^{r_k} \Delta_0$, it follows that for any integer $N$
$$x_N^c = x_0^c + \sum_{k=0}^{N-1} \Delta_k P_{\ell_k} z_k = x_0^c + \Delta_0 \sum_{k=0}^{N-1} \tau^{r_k} P_{\ell_k} z_k = x_0^c + \frac{p^{r_{LB}}}{q^{r_{UB}}} \Delta_0 \sum_{k=0}^{N-1} p^{r_k - r_{LB}} \, q^{r_{UB} - r_k} P_{\ell_k} z_k,$$
where $p$ and $q$ are relatively prime integers satisfying $\tau = \frac{p}{q}$. Since $r_{LB} \le r_k \le r_{UB}$, and since $P_{\ell_k} z_k$ equals the basis matrix applied to the integer vector $C_{\ell_k} z_k$, each term of this last sum is the basis matrix applied to the integer vector $p^{r_k - r_{LB}} q^{r_{UB} - r_k} C_{\ell_k} z_k$. It follows that the continuous part of all iterates lies on the translated integer lattice generated by $x_0^c$ and the columns of the basis matrix scaled by $\frac{p^{r_{LB}}}{q^{r_{UB}}} \Delta_0$. Moreover, the discrete part of all iterates lies on the integer lattice $\mathbb{Z}^{n_d}$. Therefore, since all iterates belong to the compact set $L(x_0)$, it follows that there is only a finite number of different iterates, and thus one of them must be visited infinitely many times. Simple decrease then ensures that the mesh size parameters converge to zero, which is a contradiction.
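As a small arithmetic check of the factoring used in the sum above (our own illustration with assumed values $p = 3$, $q = 2$, i.e., $\tau = 3/2$, and $r_{LB} = -2 \le r_k = 1 \le r_{UB} = 3$), both exponents $r_k - r_{LB}$ and $r_{UB} - r_k$ are non-negative and
$$\frac{p^{r_{LB}}}{q^{r_{UB}}} \, p^{r_k - r_{LB}} \, q^{r_{UB} - r_k} = \frac{3^{-2}}{2^{3}} \cdot 3^{3} \cdot 2^{2} = \frac{27 \cdot 4}{9 \cdot 8} = \frac{3}{2} = \tau^{r_k},$$
so $\tau^{r_k}$ is the common rational factor $\frac{p^{r_{LB}}}{q^{r_{UB}}}$ multiplied by the integer $p^{r_k - r_{LB}} q^{r_{UB} - r_k} = 108$.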
3.2 The main results

Torczon [11] shows that condition (1) holds, i.e., there exists an accumulation point $\hat{x}$ of the sequence of iterates such that $\nabla_c f(\hat{x}) = 0$. Through a shorter proof, we show a slightly stronger result: we show the existence of an accumulation point $\hat{x}$ of the sequence of unsuccessful iterates that satisfies the same condition. We also show that the same accumulation point $\hat{x}$ is a local minimizer with respect to the discrete variables, and thus it satisfies conditions (2) and (3).

Recall that iteration $k$ is unsuccessful if and only if $x_{k+1} = x_k$, which is equivalent to $\Delta_{k+1} < \Delta_k$. Thus, the number of unsuccessful iterations is infinite since $\liminf_{k \to +\infty} \Delta_k = 0$. Consider the indices of the unsuccessful iterations whose corresponding mesh size parameters go to zero. For any accumulation point of such a subsequence, there is an iterate $x_k$ arbitrarily close to it for which no neighboring point of $X_k$ yields descent. The following proposition details properties of an accumulation point $\hat{x}$ of the sequence of unsuccessful iterates.
Proposition 3.3 There is an $\hat{x} \in L(x_0)$ and a subset of indices of unsuccessful iterates $K \subseteq \{k : x_{k+1} = x_k\}$ such that
$$\lim_{k \in K} \Delta_k = 0, \qquad \lim_{k \in K} x_k = \hat{x}, \qquad \text{and} \qquad \lim_{k \in K} \nabla_c f(x_k) = \nabla_c f(\hat{x}).$$
Proof: Theorem 3.2 guarantees that $\liminf_{k \to +\infty} \Delta_k = 0$, thus there is an infinite subset of indices of unsuccessful iterations $K' \subseteq \{k : x_{k+1} = x_k\} = \{k : \Delta_{k+1} < \Delta_k\}$ such that the subsequence $(\Delta_k)_{k \in K'}$ converges to zero. Since all iterates $x_k$ lie in the compact set $L(x_0)$, we can extract an infinite subset $K \subseteq K'$ such that the subsequence $(x_k)_{k \in K}$ converges. Let $\hat{x}$ in $L(x_0)$ be the limit point of this subsequence. There exists an integer $N$ such that all discrete variables $x_k^d$ for $k \in K$, $k > N$, satisfy $x_k^d = \hat{x}^d$. Recall that the function $f$ is continuously differentiable on the compact set $L(x_0)$ when $x^d$ is fixed at $\hat{x}^d$. Therefore, since $(x_k^c)_{k \in K}$ converges to $\hat{x}^c$, it follows that the sequence $(\nabla_c f(x_k))_{k \in K}$ converges to $\nabla_c f(\hat{x}^c, \hat{x}^d) = \nabla_c f(\hat{x})$.

For the rest of this subsection, we assume that $\hat{x}$ and $K$ satisfy the conditions of Proposition 3.3. The main results can now be proved. We first consider the convergence of the continuous variables. The underlying idea of the proof by contradiction is that if $\nabla_c f(\hat{x})^t b_B < 0$ for some basis vector $b_B$ of any basis $B$ of $\mathcal{B}$, then any iterate $x_k$ sufficiently close to $\hat{x}$, and for which the mesh size parameter $\Delta_k$ is small, will be such that $\nabla_c f(x_k)^t b_B$ is less than a negative constant that depends on $\nabla_c f(\hat{x})^t b_B$. Thus, there will be a descent direction in the mesh neighborhood $X_k^c$. The iteration is therefore successful, contradicting the fact that $\hat{x}$ is a limit point of unsuccessful iterates.
Theorem 3.4 The accumulation point $\hat{x}$ satisfies $\nabla_c f(\hat{x}) = 0$.

Proof: Suppose that $\|\nabla_c f(\hat{x})\| \ne 0$. For any positive basis $B \in \mathcal{B}$, equation (4) guarantees that there exists a vector $b_B \in B$ satisfying $\nabla_c f(\hat{x})^t b_B < 0$. Continuous differentiability of $f$ when $x^d$ is fixed at $\hat{x}^d$ allows us to define $\epsilon_B > 0$ to be such that
$$\nabla_c f(w)^t b_B < \frac{\nabla_c f(\hat{x})^t b_B}{2} < 0 \quad \forall\, w^c \in B_{\epsilon_B}(\hat{x}^c) = \{w^c : \|w^c - \hat{x}^c\| < \epsilon_B\}, \; w^d = \hat{x}^d. \qquad (6)$$
Let $\epsilon = \min_{B \in \mathcal{B}} \epsilon_B$ be the smallest value of $\epsilon_B$ over all positive bases. Let $\beta = \max_{B \in \mathcal{B},\, b \in B} \|b\|$ be the norm of the largest positive basis vector. Since there are only a finite number of positive bases in $\mathcal{B}$, it follows that both $\epsilon$ and $\beta$ are finite strictly positive numbers independent of the iteration number $k$. Let $k \in K$ be large enough so that $x_k^c \in B_{\epsilon/2}(\hat{x}^c)$, $x_k^d = \hat{x}^d$, and $\Delta_k < \frac{\epsilon}{2\beta}$. The mean value theorem ensures that for the current basis $B$ of iteration $k$,
$$f(x_k^c + \Delta_k b_B, x_k^d) = f(x_k) + \Delta_k \nabla_c f(w)^t b_B$$
[Figure 2: Main result for continuous variables, showing the balls $B_{\epsilon/2}(\hat{x}^c) \subset B_{\epsilon}(\hat{x}^c)$ around $\hat{x}^c$ and the points $x_k^c$, $w^c$, and $x_k^c + \Delta_k b_B$.]
for some $w = (w^c, \hat{x}^d)$ where $w^c$ belongs to the line segment from $x_k^c$ to $x_k^c + \Delta_k b_B$, i.e., $w^c = x_k^c + \lambda \Delta_k b_B$ for some $\lambda$ in the interval $[0, 1]$ (see Figure 2). Since
$$\|w - \hat{x}\| \le \|w - x_k\| + \|x_k - \hat{x}\| = \lambda \Delta_k \|b_B\| + \|x_k - \hat{x}\| < \frac{\epsilon}{2\beta} \|b_B\| + \frac{\epsilon}{2} \le \epsilon,$$
it follows that (6) holds, that $\nabla_c f(w)^t b_B < 0$, and thus that $f(x_k^c + \Delta_k b_B, x_k^d) < f(x_k)$. This implies that iteration $k$ is successful and contradicts the fact that $k$ belongs to $K$, a subset of indices of unsuccessful iterations.

As shown in Audet [1] through a small example, this result cannot be strengthened to $\lim_{k \to \infty} \|\nabla_c f(x_k)\| = 0$, since there may be an accumulation point whose gradient norm is non-zero. It is also shown there that no second-order optimality conditions can be guaranteed. Next, we consider the discrete variables and show that $\hat{x}$ is a local optimal solution with respect to the discrete neighborhood.
Theorem 3.5 The accumulation point $\hat{x}$ satisfies $f(\hat{x}) \le f(\hat{x}^c, x^d)$ for all $x^d \in \mathcal{N}(\hat{x}^d)$.

Proof: Suppose by contradiction that there is a $d \in \mathcal{N}(\hat{x}^d)$ such that $f(\hat{x}) > f(\hat{x}^c, d)$. Continuity of the function $f$ with respect to the continuous variables guarantees the existence of an $\epsilon > 0$ such that if $x^c$ belongs to the ball $B_\epsilon(\hat{x}^c)$ centered at $\hat{x}^c$ of radius $\epsilon$, then $f(\hat{x}) > f(x^c, d)$. Recall that the discrete component of the neighborhoods $X_k^d$ converges to $\mathcal{N}(x_k^d)$. Together with Proposition 3.3, this ensures that we can select an $N > 0$ large enough so that if $k \in K$ is greater than $N$, then the iterate $x_k$ satisfies $x_k^c \in B_\epsilon(\hat{x}^c)$, $x_k^d = \hat{x}^d$, and $\{x^d : x \in X_k^d\} = \mathcal{N}(\hat{x}^d)$. Therefore, for $k \in K$ greater than $N$, the point $(x_k^c, d)$ belongs to $X_k^d$ and $x_k^c$ belongs to $B_\epsilon(\hat{x}^c)$. It follows that $f(x_k) \ge f(\hat{x}) > f(x_k^c, d)$. The rules of the Poll step then guarantee that $f(x_{k+1}) < f(x_k)$, and thus the $k$th iteration is successful. This contradicts the fact that $k$ belongs to $K \subseteq \{k : x_{k+1} = x_k\}$.
3.3 Stronger results

Stronger results are obtained under the following more restrictive assumptions. These are generalizations to the mixed integer case of the hypotheses appearing in [11].

(A3) The mesh size parameter satisfies $\lim_{k \to \infty} \Delta_k = 0$.

(A4) $f(x_{k+1}) \le f(x)$ for all $x$ in $X_k$.

(A5) $\|x_{k+1} - x_k\| \le \Delta_k h$, where $h > 0$ is a finite constant.

As observed in Torczon [11], Assumption (A3) can be realized by setting the mesh size increasing parameters $m_k^+$ to 0. This ensures that the $\Delta_k$ are non-increasing. Assumption (A4) requires an exhaustive Poll step, i.e., the function must be evaluated at every point of the neighborhood $X_k$. Finally, Assumption (A5) is verified by restricting the Search step to a ball around the current iterate of radius proportional to $\Delta_k$.

Even under these additional assumptions, Audet [1] shows that the number of accumulation points of the sequence of iterates can be infinite. However, we show that conditions (2) and (3) hold at any accumulation point $\tilde{x}$, and not only at those corresponding to unsuccessful iterations.

Before showing the stronger results, we introduce the following notation. For any iterate $x_k$ such that $\|\nabla_c f(x_k)\| \ne 0$, let $b_k$ denote the column of the current basis $B \in \mathcal{B}$ that yields the least value of $\nabla_c f(x_k)^t b_k$. Since $B$ is a positive basis, it follows by equation (4) that this value is negative, thus
$$0 > \nabla_c f(x_k)^t b_k \quad \text{and} \quad \nabla_c f(x_k)^t b_k \le \nabla_c f(x_k)^t b \qquad \forall\, b \in B \in \mathcal{B}. \qquad (7)$$
Moreover, define $w_k = (w_k^c, x_k^d)$ to be such that $w_k^c$ belongs to the line segment from $x_k^c$ to $x_k^c + \Delta_k b_k$ and satisfies the mean value theorem $f(x_k^c + \Delta_k b_k, x_k^d) = f(x_k) + \Delta_k \nabla_c f(w_k)^t b_k$. Finally, let $N$ be an integer such that $x_k^d = \tilde{x}^d$ for any $k > N$. Finiteness of $N$ is guaranteed by combining Assumptions (A3) and (A5). The first result of this section ensures a minimum decrease in the objective function value under precise conditions.
Proposition 3.6 For any $\gamma < 0$, there exist $\alpha > 0$ and $\delta > 0$, independent of the iteration number, such that all iterations $k > N$ satisfying $\nabla_c f(x_k)^t b_k \le \frac{\gamma}{2}$ and $\Delta_k < \delta$ also satisfy
$$f(x_{k+1}) \le f(x_k) - \alpha \|x_{k+1}^c - x_k^c\|.$$

Proof: For $\gamma < 0$, consider the set of iterates
$$X = \left\{ x_k : \nabla_c f(x_k)^t b_k \le \tfrac{\gamma}{2} \right\} \subseteq L(x_0),$$
where $b_k$ is the basis vector satisfying equation (7). If $X = \emptyset$ then the result is trivial since no iterate satisfies the conditions of the statement, so we may assume that $X \ne \emptyset$. Consider also the set
$$W = \left\{ w_k : \nabla_c f(w_k)^t b_k \ge \tfrac{\gamma}{4} \right\}.$$
Theorem 3.4 guarantees that $W \ne \emptyset$. The sets $X$ and $W$ are disjoint since $\gamma$ is negative. Moreover, since the function $f$ is assumed continuously differentiable with respect to the continuous variables over a neighborhood of $L(x_0)$, the distance $\mathrm{dist}(X, W)$ between these two sets is strictly positive. Define
$$\delta = \frac{\mathrm{dist}(X, W)}{2 \max\{\|b\| : b \in B \in \mathcal{B}\}}.$$
Then $\delta$ is a strictly positive finite number, independent of $k$. Consider an iteration $k > N$ that satisfies $\nabla_c f(x_k)^t b_k \le \frac{\gamma}{2}$ and $\Delta_k < \delta$ (if no such iteration exists, then the result is trivial). It follows that $x_k$ belongs to $X$ (by definition of $X$) and that $w_k$ does not belong to $W$ (since $\|w_k - x_k\| \le \Delta_k \|b_k\| < \frac{\mathrm{dist}(X, W)}{2}$), and thus $\nabla_c f(w_k)^t b_k < \frac{\gamma}{4}$. Combining this with Assumption (A4) implies that
$$f(x_{k+1}) \le f(x_k^c + \Delta_k b_k, x_k^d) = f(x_k) + \Delta_k \nabla_c f(w_k)^t b_k < f(x_k) + \Delta_k \frac{\gamma}{4}.$$
Assumption (A5) ensures that $\Delta_k \ge \frac{\|x_{k+1}^c - x_k^c\|}{h}$; thus setting $\alpha = \frac{-\gamma}{4h} > 0$ yields the result.

The proof of the stronger result for continuous variables differs significantly from that of Theorem 3.4. It cannot be assumed here that $\tilde{x}$ is obtained through the limit of unsuccessful iterations.
Theorem 3.7 Any accumulation point $\tilde{x}$ of the sequence of iterates $(x_k)$ satisfies $\nabla_c f(\tilde{x}) = 0$; thus $\lim_{k \to \infty} \|\nabla_c f(x_k)\| = 0$.

Proof: Suppose by contradiction that $\|\nabla_c f(\tilde{x})\| \ne 0$ for some accumulation point $\tilde{x}$ of the sequence of iterates $(x_k)$. For any positive basis $B \in \mathcal{B}$, equation (4) guarantees that there exists a vector $b_B \in B$ satisfying $\nabla_c f(\tilde{x})^t b_B < 0$. The negative value $\nabla_c f(\tilde{x})^t b_B$ depends on the positive basis $B$.
Set $\gamma = \frac{1}{2} \max\{\nabla_c f(\tilde{x})^t b_B : B \in \mathcal{B}\} < 0$ to be half the maximum of these negative values over all positive bases of the finite set $\mathcal{B}$. Continuous differentiability of $f$ over the compact set $L(x_0)$ allows us to define $\eta > 0$ to be such that for any pair of indices $k$ and $\ell$, if $\|x_\ell - x_k\| < \eta$ then
$$\left| \left( \nabla_c f(x_\ell) - \nabla_c f(x_k) \right)^t b_\ell \right| < \frac{-\gamma}{2}, \qquad (8)$$
where $b_\ell$ is the basis vector (chosen from the finitely many bases in $\mathcal{B}$) satisfying equation (7). Let $\alpha$ and $\delta$ be the positive parameters derived from Proposition 3.6 that depend only on $\gamma < 0$. Let $\ell > N$ be an index that satisfies the following three properties:
$$\nabla_c f(x_\ell)^t b_\ell < \gamma, \qquad \Delta_k < \delta \quad \text{and} \quad f(x_\ell) - f(x_k) < \alpha \eta \quad \text{for any } k \ge \ell.$$
Existence of such an $\ell$ is guaranteed by Assumption (A3), Assumption (A2), and the fact that the sequence $(f(x_k))$ is non-increasing and convergent. Define the index $k(\ell) = \min\{k > \ell : \nabla_c f(x_k)^t b_k > \frac{\gamma}{2}\}$, where $b_k$ is the basis vector satisfying equation (7). This index is finite, as Theorem 3.4 states that there is an accumulation point with zero gradient. Therefore, Proposition 3.6 guarantees that $f(x_k) - f(x_{k+1}) \ge \alpha \|x_{k+1}^c - x_k^c\|$ when $\ell \le k < k(\ell)$, and the definition of the index $k(\ell)$ implies that $\nabla_c f(x_{k(\ell)})^t b_{k(\ell)} > \frac{\gamma}{2}$. Combining all this, and writing out the telescoping sum, leads to
$$f(x_\ell) - f(x_{k(\ell)}) = \sum_{k=\ell}^{k(\ell)-1} \left( f(x_k) - f(x_{k+1}) \right) \ge \sum_{k=\ell}^{k(\ell)-1} \alpha \|x_{k+1}^c - x_k^c\| \ge \alpha \|x_\ell^c - x_{k(\ell)}^c\|.$$
The choice of $\ell$ implies that $f(x_\ell) - f(x_{k(\ell)}) < \alpha \eta$, and consequently $\|x_\ell^c - x_{k(\ell)}^c\| < \eta$. Moreover, since $\ell > N$, equation (8) holds and therefore
$$\nabla_c f(x_\ell)^t b_\ell = \nabla_c f(x_{k(\ell)})^t b_\ell + \left( \nabla_c f(x_\ell) - \nabla_c f(x_{k(\ell)}) \right)^t b_\ell > \nabla_c f(x_{k(\ell)})^t b_{k(\ell)} + \frac{\gamma}{2} > \frac{\gamma}{2} + \frac{\gamma}{2} = \gamma.$$
This contradicts the fact that the iterate $x_\ell$ satisfies $\nabla_c f(x_\ell)^t b_\ell < \gamma$.

The proof of the stronger result for discrete variables is very similar to that of Theorem 3.5. Only the strong decrease assumption (A4) is required to show the result; convergence of the mesh size parameter $\Delta_k$ to zero is not required.
Theorem 3.8 Any accumulation point $\tilde{x}$ of the sequence of iterates $(x_k)$ satisfies $f(\tilde{x}) \le f(\tilde{x}^c, x^d)$ for all $x^d \in \mathcal{N}(\tilde{x}^d)$.

Proof: Suppose by contradiction that there is a $d \in \mathcal{N}(\tilde{x}^d)$ such that $f(\tilde{x}) > f(\tilde{x}^c, d)$. Continuity of the function $f$ with respect to the continuous variables guarantees the existence of an $\epsilon > 0$ such that if $x^c$ belongs to the ball $B_\epsilon(\tilde{x}^c)$ of radius $\epsilon$ centered at $\tilde{x}^c$, then $f(\tilde{x}) > f(x^c, d)$. Recall that the discrete component of the neighborhoods $X_k^d$ converges to $\mathcal{N}(x_k^d)$. Since $\tilde{x}$ is an accumulation point, there is an iteration number $k$ large enough such that the iterate $x_k$ satisfies $x_k^c \in B_\epsilon(\tilde{x}^c)$, $x_k^d = \tilde{x}^d$, and $\{x^d : x \in X_k^d\} = \mathcal{N}(\tilde{x}^d)$. Therefore, the point $(x_k^c, d)$ belongs to $X_k^d$ and $x_k^c$ belongs to $B_\epsilon(\tilde{x}^c)$. It follows that $f(x_k) \ge f(\tilde{x}) > f(x_k^c, d)$. Assumption (A4) guarantees that $f(x_{k+1}) \le f(x_k^c, d)$, which is impossible since $(f(x_k))$ is bounded below by $f(\tilde{x})$.
References

[1] Audet C. (1998), "Convergence Results for Pattern Search Algorithms are Tight," TR98-24, Department of Computational & Applied Mathematics, Rice University, Houston, TX.

[2] Box G.E.P. (1957), "Evolutionary operation: A method for increasing industrial productivity," Appl. Statist. 6, 81-101.

[3] Booker A.J., Dennis J.E. Jr., Frank P.D., Serafini D.B., Torczon V. and Trosset M.W. (1998), "A Rigorous Framework for Optimization of Expensive Functions by Surrogates," CRPC TR98739-s, Rice University, Houston, TX. To appear in Structural Optimization.

[4] Conn A.R., Scheinberg K. and Toint Ph.L. (1997), "Recent progress in unconstrained nonlinear optimization without derivatives," Mathematical Programming 79, 397-414.

[5] Davis C. (1954), "Theory of positive linear dependence," American Journal of Mathematics 76, 733-746.

[6] Dennis J.E. Jr. and Torczon V. (1991), "Direct search methods on parallel machines," SIAM Journal on Optimization 1, 448-474.

[7] Hooke R. and Jeeves T.A. (1961), "Direct search solution of numerical and statistical problems," J. Assoc. Comput. Mach. 8, 212-229.

[8] Lewis R.M. and Torczon V. (1996), "Pattern search algorithms for bound constrained minimization," ICASE NASA Langley Research Center TR 96-20. To appear in SIAM Journal on Optimization.

[9] Lewis R.M. and Torczon V. (1996), "Rank ordering and positive bases in pattern search algorithms," ICASE NASA Langley Research Center TR 96-71.

[10] Lewis R.M. and Torczon V. (1998), "Pattern search methods for linearly constrained minimization," ICASE NASA Langley Research Center TR 98-3.

[11] Torczon V. (1997), "On the Convergence of Pattern Search Algorithms," SIAM Journal on Optimization, Vol. 7, No. 1, 1-25.