PATTERN SEARCH METHODS IN THE PRESENCE OF DEGENERACY

MARK A. ABRAMSON†, OLGA A. BREZHNEVA‡, AND J. E. DENNIS JR.§

Abstract. This paper deals with generalized pattern search (GPS) algorithms for linearly constrained optimization. At each iteration, the GPS algorithm generates a set of directions that conforms to the geometry of any nearby linear constraints. This set is then used to construct trial points to be evaluated during the iteration. In previous work, Lewis and Torczon developed a scheme for computing the conforming directions, but it assumed no degeneracy near the current iterate. The contribution of this paper is to provide a detailed algorithm for constructing the set of directions whether or not the constraints are degenerate. One difficulty in the degenerate case is in classifying constraints as redundant or nonredundant. We give a short survey of the main definitions and methods for treating redundancy and propose an approach to identify nonredundant ε-active constraints, which may be useful for other active set algorithms. We also introduce a new approach for handling nonredundant linearly dependent constraints, which maintains GPS convergence properties without significantly increasing computational cost. Some simple numerical tests illustrate the effectiveness of the algorithm. We conclude by briefly considering the extension of our ideas to nonlinear constraints with linearly dependent constraint gradients.

Key words. Pattern search, linearly constrained optimization, derivative-free optimization, degeneracy, redundancy, constraint classification

AMS subject classifications. 65K05, 49M30, 90C30, 90C56

1. Introduction. This paper continues the development of generalized pattern search (GPS) algorithms [4, 16] for linearly constrained optimization problems

    min_{x∈Ω} f(x),    (1.1)

where f : R^n → R ∪ {∞} may be discontinuous, and the feasible region is given by

    Ω = {x ∈ R^n : a_i^T x ≤ b_i, i ∈ I} = {x ∈ R^n : A^T x ≤ b},    (1.2)

where, for i ∈ I = {1, 2, . . . , |I|}, a_i ∈ R^n, b_i ∈ R, and A ∈ Q^{n×|I|} is a rational matrix. In [4, 16], the feasible region Ω is defined as Ω = {x ∈ R^n : ℓ ≤ Âx ≤ u}, where Â ∈ Q^{m×n} is a rational matrix, ℓ, u ∈ {R ∪ {±∞}}^m, and ℓ < u. As is evident, (1.2) reduces to the definition in [4, 16], where the rth row â_r^T of the matrix Â is equal to some ith row a_i^T of the matrix A^T with a coefficient of +1 or −1, and b_i = u_r or b_i = −ℓ_r.

We target the case when the function f(x) may be an expensive "black box" that provides few correct digits or may even fail to return a value at some feasible points x ∈ Ω. In this situation, the accurate approximation of derivatives is not likely to be practical. GPS algorithms rely on simple decrease in f(x); i.e., an iterate x_{k+1} ∈ Ω satisfying f(x_{k+1}) < f(x_k) is considered successful.

Lewis and Torczon [16] introduced and analyzed generalized pattern search for linearly constrained minimization problems. They proved that if the objective function is continuously differentiable and if the

∗ Date: August 19, 2005.
† Department of Mathematics and Statistics, Air Force Institute of Technology, AFIT/ENC, Building 641, 2950 Hobson Way, Wright-Patterson AFB, Ohio 45433 ([email protected], http://en.afit.edu/enc/Faculty/MAbramson/abramson.html).
‡ Department of Mathematics and Statistics, Miami University, 123 Bachelor Hall, Oxford, Ohio 45056 ([email protected]).
§ Computational and Applied Mathematics Department, Rice University, MS 134, 6100 Main Street, Houston, Texas 77005-1892 ([email protected], http://www.caam.rice.edu/~dennis). The research of this author was supported in part by AFOSR F49620-01-1-0013, the Boeing Company, Sandia CSRI, ExxonMobil, the LANL Computer Science (LACSI) contract 03891-99-23, by the Institute for Mathematics and its Applications with funds provided by the National Science Foundation, and by funds from the Ordway Endowment at the University of Minnesota.


set of directions that defines a local search is chosen properly with respect to the geometry of the boundary of the feasible region, then GPS has at least one limit point that is a Karush-Kuhn-Tucker point. By applying the Clarke nonsmooth calculus [11], Audet and Dennis [4] simplified the analysis in [16] and introduced a new hierarchy of convergence results for problems with varying degrees of nonsmoothness. Second-order behavior of GPS is studied in [2].

Generalized pattern search algorithms generate a sequence of iterates {x_k} in R^n with non-increasing objective function values. In linearly constrained optimization, the set of directions that defines the so-called poll step must conform to the geometry of the boundary of the feasible region. The key idea, first suggested by May in [18] and applied to GPS in [16], is to use as search directions the generators of cones polar to those generated by the normals of faces near the current iterate. Lewis and Torczon [16] presented an algorithm for constructing the set of generators in the nondegenerate case and left the degenerate case for future work. In their recent work, Kolda et al. [14] mentioned that the problem of degenerate constraints has been well studied in computational geometry and that solutions to the problem exist in [5, 6]. However, there are examples for which the method proposed in [5, 6] requires full enumeration, which can be cost-prohibitive. Price and Coope [21] gave, as an aside, a result that can be used for constructing a set of generators in the degenerate case. It follows from their result that, in order to construct a set of generators, it is sufficient to consider maximal linearly independent subsets of the active constraints. However, this approach implies enumeration of all possible linearly independent subsets of maximal rank and does not take into account properties of the problem that can help to reduce this enumeration. Price and Coope [21] outlined an algorithm for constructing frames, but it was not their point to work out details of the numerical implementation in the degenerate case.

The purpose of this paper is to give detailed consideration to GPS in the degenerate case in a way that is complementary to [4] and [16]. Our main result is a detailed algorithm for constructing the set of generators at a current GPS iterate in both the degenerate and nondegenerate cases. To construct the set of generators in the degenerate case, we identify the redundant and nonredundant active constraints and then use either a QR decomposition or a construction proposed in [16]. Classification of constraints as redundant or nonredundant is one of the main issues here, because it is sufficient to construct the set of generators only for the nonredundant constraints.

Several methods for classifying constraints exist. For example, there are deterministic algorithms [10, 13], probabilistic hit-and-run methods [7], and a probabilistic method based on an equivalence between the constraint classification problem and the problem of finding a feasible solution to a set covering problem [9]. A survey and comparison of strategies for classifying constraints are given in [9, 13]. Any of these approaches can be applied in the GPS framework to identify redundant and nonredundant constraints. However, in this paper, we propose a new projection approach to identify nonredundant constraints that is more suitable for GPS methods.
The projection method is similar to the hit-and-run algorithm [7], in which nonredundant constraints are searched for along random direction vectors from each point in a sequence of random interior points, but it differs in its use of a deterministic direction. The major advantage of the projection method for our application is that the number of direction vectors (in the terminology of the hit-and-run algorithm) is equal to the number of constraints that have to be identified, which for us is generally a small number. In the hit-and-run algorithm, this number is determined by a stopping criterion and can be large if many of the randomly generated directions do not detect a nonredundant constraint. Moreover, the formulas used in the projection method are simpler than those used for computing the intersection points of a direction vector with the hyperplanes in the hit-and-run algorithm. We should note also that the goal of hit-and-run is to detect all nonredundant constraints in a full system of linear inequalities; we use the projection method to detect the nonredundant constraints among only the active constraints, in the case when they are linearly dependent. As our numerical tests show, the projection method cheaply detects all, or almost all, nonredundant constraints. To classify constraints not detected by the projection method, we use another approach, outlined in [10].

As a result, we ensure that every active constraint is detected as either redundant or nonredundant. In the worst case, we may have linearly dependent, nonredundant constraints. We propose a general approach for handling this case with an accompanying convergence theorem, along with two specific instances that can be used effectively in practice. In the end, we briefly discuss the extension of our ideas to optimization problems with general nonlinear constraints that are linearly dependent at a solution. We do so by applying the projection method to a linearization of the constraints, and we argue that this is less costly than applying the approach of [10].

The organization of the paper is as follows. In the next section, we give a brief description of GPS as well as the convergence result for linearly constrained minimization, following papers by Audet and Dennis [4] and by Lewis and Torczon [16]. Section 3 is devoted to the topic of redundancy. In the first part of the section, we introduce a definition of the ε-active constraints and discuss some scaling issues. The second part of Section 3 contains essential definitions and results on redundancy [10, 13, 19, 25] that are required for our analysis. Then we propose our projection method to determine nonredundant constraints, and we briefly describe a more expensive follow-up approach to be applied if some constraints are not identified by the projection method. In Section 4, we give an algorithm for constructing the set of generators and discuss implementation details, including a new approach for handling nonredundant linearly dependent constraints in a rigorous way without significantly increasing computational cost. In Section 5, we consider the extension of our ideas to nonlinearly constrained problems. Section 6 is devoted to some concluding remarks.

Notation. R, Z, and N denote the sets of real numbers, integers, and nonnegative integers, respectively. For any finite set S, we may refer to the matrix S as the one whose columns are the elements of S. Similarly, for any matrix A, the notation a ∈ A means that a is a column of A.

2. Generalized pattern search algorithms. In this section, we briefly describe the class of GPS algorithms for linearly constrained minimization, along with the main convergence result. We follow papers by Audet and Dennis [4] and by Lewis and Torczon [16], and we refer the reader there for details of managing the mesh size ∆_k. Throughout, we will always use the ℓ_2 norm.

GPS algorithms can be applied either to the objective function f or to the barrier function f_Ω = f + ψ_Ω : R^n → R ∪ {+∞}, where ψ_Ω is the indicator function for Ω, which is zero on Ω and +∞ elsewhere. The value of f_Ω is +∞ at all points that are either infeasible or at which f is declared to be +∞. This barrier approach is probably as old as direct search methods themselves.

A GPS algorithm for linearly constrained optimization generates a sequence of iterates {x_k} in Ω. The current iterate x_k ∈ R^n is chosen from a finite number of points on a mesh, which is a discrete subset of R^n. At iteration k, the mesh is centered around the current mesh point (current iterate) x_k, and its fineness is parameterized through the mesh size parameter ∆_k > 0 as

    M_k = {x_k + ∆_k D z : z ∈ N^{n_D}},    (2.1)

where D is a finite matrix whose columns form a set of positive spanning directions in R^n, and n_D is the number of columns of D. At each iteration, some positive spanning matrix D_k, composed of columns of D, is used to construct the poll set

    P_k = {x_k + ∆_k d : d ∈ D_k}.    (2.2)

A two-dimensional mesh and poll set are illustrated in Figure 2.1. If x_k ∈ Ω is not near the boundary, then D_k is a positive spanning set for R^n [16]. If x_k ∈ Ω is near the boundary, the matrix D_k is constructed so that its columns d_j also span the cone of feasible directions at x_k and conform to the geometry of the boundary of Ω. Hence, the set D must be rich enough to contain generators


for the tangent cone T_Ω(x) = cl{μ(ω − x) : μ ≥ 0, ω ∈ Ω} for every x ∈ Ω. More formally, the sets D_k must satisfy the following definition.

Fig. 2.1. A mesh and poll set in R^2.

Definition 2.1. A rule for selecting the positive spanning sets D_k ⊆ D conforms to Ω for some ε > 0 if, at each iteration k and for each y in the boundary of Ω for which ‖y − x_k‖ < ε, T_Ω(y) is generated by a nonnegative linear combination of columns of D_k.

Each GPS iteration is divided into two phases: an optional search and a local poll. In each step, the barrier objective function is evaluated at a finite number of mesh points in an attempt to find one that yields a lower objective function value than the incumbent. We refer to such a point as an improved mesh point. If an improved mesh point is found, it becomes the incumbent, so that f(x_{k+1}) < f(x_k); the mesh size parameter is then either held constant or increased. In the search step, there is complete flexibility: any strategy may be used (including none), and the user's knowledge of the domain may be incorporated. If the search step fails to yield an improved mesh point, the poll step is invoked. In this second step, the barrier objective function is evaluated at points in the poll set P_k (i.e., neighboring mesh points) until an improved mesh point is found or until all the points in P_k have been evaluated. If both the search and poll steps fail to find an improved mesh point, then the incumbent is declared to be a mesh local optimizer and is retained as the incumbent, so that x_{k+1} = x_k; the mesh size parameter is then decreased. Figure 2.2 gives a description of a basic GPS algorithm.

We remind the reader that the normal cone N_Ω(x) to Ω at x is the nonnegative span of all the outwardly pointing constraint normals at x and can be written as the polar of the tangent cone: N_Ω(x) = {v ∈ R^n : v^T ω ≤ 0 for all ω ∈ T_Ω(x)}.

Assumptions. We make the following standard assumptions [4]:
A1: A function f_Ω and x_0 ∈ R^n (with f_Ω(x_0) < ∞) are available.
A2: The constraint matrix A is rational.
A3: All iterates {x_k} produced by the GPS algorithm lie in a compact set.

Under these assumptions, Torczon [22] showed that lim inf_{k→∞} ∆_k = 0, and Audet and Dennis [4] identified the following subsequences, for which the limit of ∆_k is zero.

Definition 2.2. A subsequence of mesh local optimizers {x_k}_{k∈K} (for some subset of indices K) is said to be a refining subsequence if {∆_k}_{k∈K} converges to zero.

Audet and Dennis [4] proved the following convergence results for GPS in the linearly constrained case, using only these assumptions.

• Initialization: Let x_0 be such that f_Ω(x_0) is finite. Let D be a positive spanning set, and let M_0 be the mesh on R^n defined by ∆_0 > 0 and D_0. Set the iteration counter k = 0.
• Search and poll step: Perform the search and possibly the poll step (or only part of them) until an improved mesh point x_{k+1} with the lowest f_Ω value so far is found on the mesh M_k defined by (2.1).
  – Optional search: Evaluate f_Ω on a finite subset of trial points on the mesh M_k defined by (2.1) (the strategy that gives the set of points is usually provided by the user; it must be finite, and the set can be empty).
  – Local poll: Evaluate f_Ω on the poll set defined in (2.2).
• Parameter update: If the search or the poll step produced an improved mesh point, i.e., a feasible iterate x_{k+1} ∈ M_k ∩ Ω for which f_Ω(x_{k+1}) < f_Ω(x_k), then update ∆_{k+1} ≥ ∆_k. Otherwise, f_Ω(x_k) ≤ f_Ω(x_k + ∆_k d) for all d ∈ D_k, so x_k is a mesh local optimizer; set x_{k+1} = x_k and update ∆_{k+1} < ∆_k. Increase k ← k + 1 and go back to the search and poll step.

Fig. 2.2. A simple GPS algorithm.
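For concreteness, the loop of Figure 2.2 can be sketched in a few lines of Python. This is a minimal illustration of the mesh/poll mechanics only; the names f_Omega, poll_directions, and search_points are our placeholders, not part of the paper or of any library, and a practical implementation would add the conforming-direction machinery of Section 4.

```python
import numpy as np

def gps(f_Omega, x0, poll_directions, search_points=None, delta0=1.0, tol=1e-8):
    """Minimal GPS loop in the spirit of Figure 2.2. f_Omega returns +inf at
    infeasible points (barrier approach); poll_directions(x, delta) yields the
    columns of D_k as vectors; search_points is an optional user search."""
    x, fx, delta = np.asarray(x0, dtype=float), f_Omega(x0), delta0
    while delta > tol:
        trials = list(search_points(x, delta)) if search_points else []
        trials += [x + delta * d for d in poll_directions(x, delta)]  # poll set (2.2)
        improved = next((y for y in trials if f_Omega(y) < fx), None)
        if improved is not None:
            x, fx = improved, f_Omega(improved)  # success: keep (or increase) delta
        else:
            delta *= 0.5                         # x is a mesh local optimizer
    return x, fx
```

The barrier function makes feasibility handling trivial: any trial point outside Ω simply evaluates to +∞ and can never become the incumbent.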

Lemma 2.3. Under Assumptions A1–A3, if x̂ is any limit of a refining subsequence, if d is any direction in D for which f at a poll step was evaluated for infinitely many iterates in the subsequence, and if f is Lipschitz near x̂, then the generalized directional derivative of f at x̂ in the direction d is nonnegative, i.e., f°(x̂; d) ≥ 0.

Theorem 2.4 (Convergence to a Karush-Kuhn-Tucker point [4]). Under Assumptions A1–A3, if f is strictly differentiable at a limit point x̂ of a refining subsequence, and if the rule for selecting positive spanning sets D_k ⊆ D conforms to Ω for some ε > 0, then ∇f(x̂)^T ω ≥ 0 for all ω ∈ T_Ω(x̂), and so −∇f(x̂) ∈ N_Ω(x̂). Thus, x̂ is a Karush-Kuhn-Tucker point.

The purpose of this paper is to provide an algorithm for constructing sets D_k that conform to the boundary of Ω. If the active constraints are linearly dependent, we apply strategies for the identification of redundant and nonredundant constraints, described in the next section, and then construct the sets D_k taking into account only the nonredundant constraints. We now pause to outline the main results concerning redundancy from mathematical programming; in Section 4, we resume consideration of GPS and strategies for constructing the sets D_k.

3. Redundancy. We now present some essential definitions and results concerning redundancy [7, 10, 13, 19, 25] that are required for our analysis. Then we propose our approach, the projection method, for determining the nonredundant constraints, and we briefly describe another approach that is applied if some constraints are not identified by the projection method.

We consider the feasible region Ω defined by (1.2) and refer to the inequality a_j^T x ≤ b_j as the jth constraint. The region represented by all but the jth constraint is given by

    Ω_j = {x ∈ R^n : a_i^T x ≤ b_i, i ∈ I\{j}},

where I\{j} is the set I with the element j removed.

The following definition is consistent with definitions given in [10, 13].

Definition 3.1 (Redundant constraint). The jth constraint a_j^T x ≤ b_j is redundant in the description of Ω if and only if Ω = Ω_j, and is (necessarily) nonredundant otherwise.

3.1. ε–active constraints. We next compare two definitions of ε–active constraints, replicated from [21] and [16], respectively, and discuss some associated scaling issues.

Definition 3.2 (e.g., [21]). Let some scalar ε > 0 be given and x_k ∈ Ω. The jth constraint is ε–active at x_k if

    0 ≤ b_j − a_j^T x_k ≤ ε.    (3.1)

Definition 3.3 (e.g., [16]). Let some scalar ε > 0 be given and x_k ∈ Ω. The jth constraint is ε–active at x_k if

    dist(x_k, H_j) ≤ ε,    (3.2)

where H_j = {x ∈ R^n : a_j^T x = b_j}, and dist(x_k, H_j) = min_{y∈H_j} ‖y − x_k‖ is the distance from x_k to the hyperplane H_j.

Clearly, the jth constraint can be made ε–active at x_k in the sense of Definition 3.2 by multiplying the inequality b_j − a_j^T x_k ≥ 0 by a sufficiently small number. On the other hand, this multiplication does not change the distance between the point x_k and the hyperplane H_j of Definition 3.3. In this paper, we prefer to use Definition 3.2, since it is easier to check than Definition 3.3. However, Definition 3.2 is proper only if we assume a preliminary scaling of the constraints, so that the following lemma applies.

Lemma 3.4. Let some scalar ε > 0 be given, x_k ∈ Ω, and ‖a_j‖ = 1 for all j ∈ I in (1.2). Then, for any j ∈ I, Definition 3.2 of the ε–active constraint is equivalent to Definition 3.3, and the projection P_j(x_k) of the point x_k onto the hyperplane H_j = {x ∈ R^n : a_j^T x = b_j} is given by

    P_j(x_k) = x_k + a_j(b_j − a_j^T x_k).    (3.3)

Proof. For any j ∈ I, the distance from x_k to the hyperplane H_j is given by

    dist(x_k, H_j) = |b_j − a_j^T x_k| / ‖a_j‖.    (3.4)

Hence, if ‖a_j‖ = 1 and x_k ∈ Ω, (3.1) is equivalent to (3.2). By definition of the projection of x_k onto H_j, ‖P_j(x_k) − x_k‖ = dist(x_k, H_j). Since x_k ∈ Ω and ‖a_j‖ = 1, it follows from (3.4) that dist(x_k, H_j) = b_j − a_j^T x_k and P_j(x_k) = x_k + a_j dist(x_k, H_j) = x_k + a_j(b_j − a_j^T x_k). Hence, (3.3) holds.

To satisfy the conditions of Lemma 3.4, we introduce the matrix Ā and vector b̄, scaled copies of A and b from (1.2), such that

    ā_i = a_i/‖a_i‖,  b̄_i = b_i/‖a_i‖,  i ∈ I.    (3.5)

Consequently, ‖ā_i‖ = 1 for all i ∈ I and Ω = {x ∈ R^n : A^T x ≤ b} = {x ∈ R^n : Ā^T x ≤ b̄} = {x ∈ R^n : ā_i^T x ≤ b̄_i, i ∈ I}. We then use Ā and b̄ to define the set of indices of the ε–active constraints as

    I(x_k, ε) = {i ∈ I : 0 ≤ b̄_i − ā_i^T x_k ≤ ε},    (3.6)

and we apply the projection method for detection of the nonredundant constraints (see Section 3.3.1 for more details). We refer to the set I(x_k, ε) as the working index set at the current iterate x_k. This paper also makes use of the regions given by

    Ω(x_k, ε) = {x ∈ R^n : a_i^T x ≤ b_i, i ∈ I(x_k, ε)},    (3.7)

and

    Ω_j(x_k, ε) = {x ∈ R^n : a_i^T x ≤ b_i, i ∈ I(x_k, ε)\{j}},  j ∈ I(x_k, ε).

Clearly, Ω ⊆ Ω(x_k, ε) ⊆ Ω_j(x_k, ε). Furthermore, since Ω ⊆ Ω(x_k, ε), if the jth constraint is redundant in the description of Ω(x_k, ε), it is also redundant in the description of Ω.
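To make the scaling (3.5) and the working set (3.6) concrete, here is a small numpy sketch. We store the constraints rowwise (row i holds ā_i^T), which transposes the notation of (1.2), and the helper names scale and working_set are ours, not the paper's.

```python
import numpy as np

def scale(A_mat, b):
    """Scaled copies (A-bar, b-bar) of (3.5): each row a_i^T of A_mat and
    the corresponding b_i are divided by ||a_i||, so scaled rows have unit norm."""
    norms = np.linalg.norm(A_mat, axis=1)
    return A_mat / norms[:, None], b / norms

def working_set(A_bar, b_bar, x, eps):
    """Working index set I(x, eps) of (3.6): indices i with
    0 <= b_i - a_i^T x <= eps.  For feasible x the slack is nonnegative,
    so only the upper test can exclude a constraint."""
    slack = b_bar - A_bar @ x
    return np.flatnonzero((slack >= 0.0) & (slack <= eps))
```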

#

Fig. 3.1. An illustration of ε-active and redundant constraints. Constraints 1, 2, and 3 are ε-active at the current iterate x and constraint 2 is redundant.

3.2. Redundancy in mathematical programming. We now give definitions and theorems consistent with the mathematical programming literature [7, 10, 13, 19, 25]. We begin with the following definitions, which can be found in [19, 25]. In the discussion that follows, we use notation consistent with that of Section 1 (see (1.2) and the discussion that follows it). Definition 3.5 (Polyhedron). A subset of Rn described by a finite set of linear constraints P = {x ∈ R : C T x ≤ d} is a polyhedron. n

Obviously, Ω given by (1.2) and Ω(xk , ε) given by (3.7) are polyhedra. Definition 3.6. The points x1 , . . . , xk ∈ Rn are affinely independent if the k − 1 directions x2 − x , . . . , xk −x1 are linearly independent, or alternatively, the k vectors (x1 , 1), . . . , (xk , 1) ∈ Rn+1 are linearly independent. 1

We will assume that Ω is full-dimensional, as defined below. Definition 3.7. The dimension of P , denoted dim(P ), is one less than the maximum number of affinely independent points in P . Then P ⊆ Rn is full-dimensional if and only if dim(P ) = n. Note that, if Ω were not full-dimensional, then a barrier GPS approach would not be a reasonable way to handle linear constraints because it would be difficult to find any trial in Ω. Since we assume Ω is 7

full-dimensional, this implies that its supersets Ω(xk , ε) and Ωj (xk , ε) are full-dimensional. Definition 3.8 (Valid inequality). An inequality cTj x ≤ dj is a valid inequality for P ⊆ Rn if cTj x ≤ dj for all x ∈ P . Definition 3.9 (Face and Facet). (i) F defines a face of the polyhedron P if F = {x ∈ P : cTj x = dj } for some valid inequality cTj x ≤ dj of P . F 6= ∅ is said to be a proper face of P if F 6= P . (ii) F is a facet of P if F is a face of P and dim(F ) = dim(P ) − 1. Definition 3.10 (Interior point). A point x ∈ P is called an interior point of P if C T x < d. We also need the following results from integer programming [25, pp. 142–144] and [19, pp. 85–92]. Proposition 3.11. [19, Corollary 2.5] A polyhedron is full-dimensional if and only if it has an interior point. Theorem 3.12. [25, Theorem 9.1] If P is a full-dimensional polyhedron, it has a unique minimal description P = {x ∈ Rn : cTi x ≤ di ,

i = 1, . . . , m},

where each inequality is unique to within a positive multiple. Corollary 3.13. [25, Proposition 9.2] If P is full-dimensional, a valid inequality cTj x ≤ dj is necessary in the description of P if and only if it defines a facet of P . Corollary 3.13 means that the following concepts are equivalent for Ω(xk , ε) defined in (3.7). • The jth inequality aTj x ≤ bj defines a facet of Ω(xk , ε). • The jth inequality aTj x ≤ bj is necessary (nonredundant) in description of Ω(xk , ε), or in other words, Ω(xk , ε) ( Ωj (xk , ε).

(3.8)

Our approach for identifying nonredundant constraints is based primarily on the following proposition. Proposition 3.14. Let a working index set I(xk , ε) be given. An inequality aTj x ≤ bj , j ∈ I(xk , ε), is nonredundant in the description of Ω(xk , ε) if and only if either I(xk , ε) = {j} or there exists x ¯ ∈ Rn such T T that aj x ¯ = bj and ai x ¯ < bi for all i ∈ I(xk , ε)\{j}. Proof. Since the case I(xk , ε) = {j} is trivial, we give the proof for the case when I(xk , ε)\{j} = 6 ∅. Necessity. Since the inequality aTj x ≤ bj is nonredundant, then, by (3.8), there exists x∗ ∈ Rn such that aTi x∗ ≤ bi for all i ∈ I(xk , ε)\{j}, and aTj x∗ > bj . By Proposition 3.11, there exists an interior point x ˆ ∈ Ω(xk , ε) such that aTi x ˆ < bi for all i ∈ I(xk , ε). Thus on the line between x∗ and x ˆ there is a point n T x ¯ ∈ R satisfying aj x ¯ = bj and aTi x ¯ < bi for all i ∈ I(xk , ε)\{j}. Sufficiency. Let x ˆ ∈ Ω(xk , ε) be an interior point, i.e., aTi x ˆ < bi for all i ∈ I(xk , ε). Since there exists n T x ¯ ∈ R such that aj x ¯ = bj and aTi x ¯ < bi for all i ∈ I(xk , ε)\{j}, then there exists δ > 0 such that x ˜=x ¯ + δ(¯ x−x ˆ) satisfies aTj x ˜ > bj and aTi x ˜ ≤ bi , i ∈ I(xk , ε)\{j}. Therefore, (3.8) holds, and by Definition 3.1, the jth constraint is nonredundant. Proposition 3.14 means that if the jth constraint, j ∈ I(xk , ε), is nonredundant, then there exists a feasible point x ¯ ∈ Ω(xk , ε) such that only this constraint holds with equality at x ¯. Our approach for identifying redundant constraints is based primarily on the following theorem [10]. Theorem 3.15. The jth constraint is redundant in system (1.2) if and only if the linear program, maximize aTj x,

subject to x ∈ Ωj ,

has an optimal solution x∗ such that aTj x∗ ≤ bj . 8
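Theorem 3.15 translates directly into code. The sketch below uses scipy's linprog, which is our choice of solver, not the authors'; since linprog minimizes, we maximize a_j^T x by minimizing its negative, and we drop linprog's default nonnegativity bounds. A production version would also distinguish unbounded LPs, which certify nonredundancy.

```python
import numpy as np
from scipy.optimize import linprog

def is_redundant(A_mat, b, j):
    """LP test of Theorem 3.15: maximize a_j^T x over Omega_j (all constraints
    except the jth); constraint j is redundant iff the maximum is <= b_j.
    A_mat stores the constraints rowwise (row i is a_i^T)."""
    keep = [i for i in range(len(b)) if i != j]
    res = linprog(c=-A_mat[j],                      # maximize a_j^T x
                  A_ub=A_mat[keep], b_ub=b[keep],
                  bounds=[(None, None)] * A_mat.shape[1],
                  method="highs")
    return res.status == 0 and -res.fun <= b[j]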

(3.9)

3.3. Approaches for identifying redundant and nonredundant constraints. We now outline two approaches for identifying redundancy in the constraint set: a projection method for identifying nonredundant constraints and a linear programming (LP) approach for identifying redundant ones. The LP approach, which is based on Theorem 3.15, is described in [10]. In Section 4, we will explain in more detail how these ideas are implemented in the class of GPS algorithm for linearly constrained problems, even in the presence of degeneracy. 3.3.1. A projection method. The main idea of the projection method we propose is the construction, ¯ < bi for all i ∈ I(xk , ε)\{j}. If such a point x ¯ exists, ¯ = bj and aTi x if possible, of a point x ¯ such that aTj x then by Proposition 3.14, the jth constraint is nonredundant. Recall that we defined in (3.5) a scaled copy A¯ of the matrix A and a scaled vector ¯b. We denote by Pj (xk ), the projection of xk ∈ Rn onto the hyperplane Hj = {x ∈ Rn : a ¯Tj x = ¯bj }. Assume that xk ∈ Ω. Then by (3.3) and by k¯ aj k = 1, Pj (xk ) = xk + a ¯j (¯bj − a ¯Tj xk ).

(3.10)

The following proposition is the main one for the projection method. Proposition 3.16. Let xk ∈ Ω and let a working index set I(xk , ε) be given. An inequality aTj x ≤ bj , j ∈ I(xk , ε), is nonredundant in the description of Ω(xk , ε) if a ¯Ti Pj (xk ) < ¯bi

for all

i ∈ I(xk , ε)\{j},

(3.11)

where Pj (xk ) is a projection of xk onto Hj . Proof. The proof follows from Proposition 3.14. Proposition 3.16 allows us to very quickly classify the jth constraint as nonredundant if (3.11) holds for all i ∈ I(xk , ε)\{j}, where Pj (xk ) in (3.11) is obtained from (3.10). The only drawback is that it identifies nonredundant constraints and not redundant ones. 3.3.2. The linear programming approach. If some constraints have not been identified by the projection method, we can apply another approach based on Theorem 3.15 to identify redundant and nonredundant constraints. It follows from Theorem 3.15 that all redundant and nonredundant constraints could be conclusively identified by solving n LP problems of the form given in (3.9). While doing so is clearly more expensive than the projection method given in Section 3.3.1, it could be accomplished during the initialization step of GPS (i.e., before the GPS iteration sequence begins), at a cost of solving n LP problems. This is possible because redundancy of linear constraints is independent of the location of the current iterate. However, the projection method could be advantageous when many linear constraints are present (which is often the case with redundant constraints), or when dealing with linear constraints formed by linearizing nonlinear ones. In the latter case, redundancy would depend upon location in the domain, since the linear constraints would change based on location. The book [13] describes different methods in the context of the LP approach. They include some very special propositions involving slack variables that simplify and reduce the computational cost of the numerical solution of the LP problem (3.9). We refer the reader to [13] for a more detailed discussion of these issues. 4. Construction of the set of generators. The purpose of this section is to provide a detailed algorithm for constructing the set of directions Dk introduced in Section 2, even in the presence of degenerate constraints. Let some scalar ε > 0 be given, and let a ¯Ti be the ith row of the matrix A¯T in (3.5). At the current iterate xk , we construct the working index set I(xk , ε) such that 0 ≤ ¯bi − a ¯Ti xk ≤ ε ⇐⇒ i ∈ I(xk , ε). 9

The last inequality means that every constraint that is active at xk or at some point near xk appears in I(xk , ε). In [4], the authors suggest not setting ε so small that ∆k is made small by approaching the boundary too closely before including conforming directions that allow the iterates to move along the boundary of Ω. A good discussion of how to choose ε can be found in [14]. Without loss of generality, we assume that I(xk , ε) = {1, . . . , m}, for m ≥ 2. This avoids more cumbersome notation, like I(xk , ε) = {i1 (xk , ε), . . . , im (xk , ε)}. Furthermore, we denote by Bk , the matrix whose columns are the columns of A corresponding to the indices I(xk , ε) = {1, . . . , m}; i.e., Bk = [a1 , . . . , am ].

(4.1)

4.1. Classification of degeneracy at the current iterate. Let the matrix Bk be defined by (4.1). At the current iterate xk , the matrix Bk satisfies one of the following conditions: • nondegenerate case: Bk has full rank; • degenerate redundant case: Bk does not have full rank and the nonredundant constraints are linearly independent; • degenerate nonredundant case: Bk does not have full rank and the nonredundant constraints are linearly dependent. The last condition is illustrated by following example provided to us by Charles Audet. Example 2. Suppose that the feasible region Ω (see (1.2)), shown in Figure 4.1, is defined by the following system of inequalities: x1 − 2x2 − 2x3 −2x1 + x2 − 2x3 −2x1 − 2x2 + x3 x1 x2 x3

≤ ≤ ≤ ≥ ≥ ≥

0 0 0 0 0 0

(4.2)

If xk ∈ R3 is near the origin, all six constraints are active, linearly dependent, and nonredundant. The matrix Bk is given as   1 −2 −2 −1 0 0 Bk =  −2 1 −2 0 −1 0  . −2 −2 1 0 0 −1

0

x1 xk

x3 x2

Fig. 4.1. Example 2. An illustration of the degenerate nonredundant case. 10

4.2. Set of generators. Following [16], we define the cone K(xk , ε) as the cone generated by the normals to the ε–active constraints, and K ◦ (xk , ε) as its polar: K ◦ (xk , ε) = {w ∈ Rn : aTi w ≤ 0 ∀i ∈ I(xk , ε)}.

(4.3)

This cone can also be expressed as a finitely generated cone [24]. To see this, first consider the following definition. Definition 4.1 (Set of generators). A set V = {v1 , . . . , vr } is called a set of generators of the cone K defined by (4.3) if the following conditions hold: 1. Every vector v ∈ K can be expressed as a nonnegative linear combination of vectors in V . 2. No proper subset of V satisfies 1. Thus, given Definition 4.1, we can express K ◦ (xk , ε) as K ◦ (xk , ε) = {w ∈ Rn : w =

r X

λj vj ,

λj ≥ 0, vj ∈ Rn ,

j = 1, . . . , r},

(4.4)

j=1

where the V = {v1 , . . . , vr } is the set of generators for K ◦ (xk , ε). The key idea, which was first suggested by May in [18] and applied to GPS in [16], is to include in Dk the generators of the cone K ◦ (xk , ε). Hence, the problem of construction of the set Dk reduces to the problem of constructing generators {v1 , . . . , vr } of the cone K ◦ (xk , ε) and then completing them to a positive spanning set for Rn . The following proposition means that it is sufficient to construct the set of generators only for nonredundant constraints. Proposition 4.2. Let I(xk , ε) be the set of indices of ε–active constraints at xk ∈ Rn . Let IN (xk , ε) ⊆ I(xk , ε) be the subset of indices of the nonredundant constraints that define Ω(xk , ε). Let the cone K ◦ (xk , ε) ◦ (xk , ε) be given by be defined by (4.3) and let the cone KN ◦ KN (xk , ε) = {w ∈ Rn : aTi w ≤ 0

∀i ∈ IN (xk , ε)}.

◦ (xk , ε), then it is also a set of generators for K ◦ (xk , ε). If {v1 , . . . , vp } is a set of generators for KN

Proof. The proof of this proposition follows from Corollary 3.13. Pattern search methods require that iterates lie on a rational lattice [16]. To ensure this, Lewis and Torczon [16] placed an additional requirement that the matrix of constraints AT in (1.2) is rational. Under this requirement, Lewis and Torczon [16] showed, in the following theorem, that it is always possible to find rational generators for the cones K ◦ (xk , ε), which, with the rational mesh size parameter ∆k , ensures that GPS iterates lie on a rational lattice. Theorem 4.3. If K is a cone with rational generators V , then there exists a set of rational generators for K ◦ . Moreover, for the case of linearly independent active constraints, Lewis and Torczon [16] proposed constructing the set of generators for all the cones K ◦ (xk , ε), 0 ≤ ε ≤ δ, as follows: Theorem 4.4. Suppose that for some δ, K(x, δ) has a linearly independent set of rational generators V . Let N be a rational positive basis for the null space of V T . Then, for any ε, 0 ≤ ε ≤ δ, a set of rational generators for K ◦ (x, ε) can be found among the columns of N , V (V T V )−1 , and −V (V T V )−1 . The matrix N can be constructed by taking columns of the matrices ±(I − V (V T V )−1 V T ) [16]. Recall that we use the scaled matrix A¯ defined in (3.5) to determine ε-active, redundant, and nonredundant constraints. Then we use the result stated in Theorem 4.4 together with rational columns of A, which correspond to the nonredundant and ε-active constraints, to obtain a set of rational generators. 11

A set of generators, which may be irrational in exact arithmetic, can also be found by using the QR factorization of the matrix V. The following corollary shows how to use the QR factorization of V to construct the generators for all the cones K°(x_k, ε), 0 ≤ ε ≤ δ. Recall that the full QR factorization of V can be represented as

    V = [Q_1 Q_2] [ R_1  R_2
                     0    0  ],    (4.5)

where R_1 is upper triangular and rank(R_1) = rank(V), the columns of Q_1 form an orthonormal basis for the space spanned by the columns of V, and the columns of Q_2 form an orthonormal basis for the null space of V^T.

Corollary 4.5. Suppose that for some δ, K(x, δ) has a linearly independent set of rational generators V. Then, for any ε, 0 ≤ ε ≤ δ, a set of generators for K°(x, ε) can be found among the columns of Q_2, Q_1 R_1 (R_1^T R_1)^{-1}, and −Q_1 R_1 (R_1^T R_1)^{-1}.

Proof. By substituting V = QR and using the properties of the matrices in the QR factorization, we obtain

    V(V^T V)^{-1} = QR((QR)^T (QR))^{-1} = QR(R^T Q^T Q R)^{-1} = QR(R^T R)^{-1}.    (4.6)

By applying Theorem 4.4 and taking into account that the columns of Q_2 span the null space of V^T, we obtain the statement of the corollary.
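In floating point, the construction of Corollary 4.5 is a few lines of numpy. The sketch below is ours and assumes V has full column rank (the linearly independent generators the corollary requires); ±Q_2 plays the role of the positive basis N for the null space of V^T in Theorem 4.4. It is an illustration only, not the rational-arithmetic construction that the convergence theory formally assumes.

```python
import numpy as np

def generators_qr(V):
    """Candidate generators for K°(x, eps) per Corollary 4.5.
    V is n x p (p <= n) with linearly independent columns."""
    n, p = V.shape
    Q, R = np.linalg.qr(V, mode="complete")     # full QR: Q is n x n, R is n x p
    Q1, R1 = Q[:, :p], R[:p, :p]                # V = Q1 @ R1, R1 upper triangular
    Q2 = Q[:, p:]                               # orthonormal basis of null(V^T)
    D1 = Q1 @ R1 @ np.linalg.inv(R1.T @ R1)     # equals V (V^T V)^{-1}
    return np.hstack([D1, Q2, -D1, -Q2])        # arrangement used in Section 4.4
```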

From the theoretical point of view, a set of generators obtained by using Corollary 4.5 may be irrational, since an implementation of the QR decomposition involves the calculation of square roots. This would violate the theoretical assumptions required for convergence of pattern search. However, since V is rational, both sides of (4.6) must also be rational; therefore, by Corollary 4.5, any generators with irrational elements would be found in the matrix Q_2. But in the degenerate case, Q_2 will often be empty, since it represents a positive spanning set for the null space of V^T, and most examples of degeneracy occur when the number of ε-active constraints exceeds the number of variables. Furthermore, since we use floating point arithmetic in practice, irrational generators would be represented by rational approximations. This has the effect of generating a slightly different cone, so convergence would still be ensured, but to a stationary point of a slightly different problem. However, the error incurred in representing an irrational number as a rational one is probably smaller than the typical roundoff error associated with LU factorization.

4.3. The nonredundant degenerate case. Perhaps the most difficult case to handle is the one in which the ε-active constraints at x_k are nonredundant but linearly dependent. This can happen, in particular, when there are more ε-active constraints than variables, as is the case in Example 2. The difficulty of this case lies in the fact that the number of directions required to generate the tangent cone can become large. Let S_k = {a_1, a_2, . . . , a_p} denote the set of vectors corresponding to the ε-active nonredundant constraints at x_k. Price and Coope [21] showed that, in order to construct D_k, it is sufficient to identify the tangent cone generators of all maximally linearly independent subsets of S_k. For S_k with r = rank(S_k), we can estimate the number s of these subsets by

    s = p!/(r!(p − r)!).    (4.7)

For instance, in Example 2 we have p = 6 and r = 3, so s = 6!/(3!3!) = 20.

Thus, in order to identify the entire set of tangent cone generators, we would have to consider s different sets of positive spanning directions, where s could become quite large. While there are some vertex enumeration techniques [5] (mentioned in [16]) that could be useful, we now present a more practical approach, first in general, and then followed by some specific instances that can be implemented in practice.

4.3.1. Partially conforming generator sets. In our approach, we choose a subset of r linearly independent elements of S_k and store them as columns of B_k. Based on the methods described in Section 3.3.1, we can construct a set of generators for the cone defined only by a subset of the constraints represented in S_k. Furthermore, we require B_k to change at each iteration so that, in the limit, each constraint that is active at the limit point x̂ has been used infinitely often in constructing directions. Since the set of tangent cone generators is finite, the ordering scheme for ensuring this is straightforward.

This approach is essentially equivalent to using all the tangent cone generators, except that the work is spread out over more than one iteration. The advantage is that it keeps the size of the poll set no larger than it would be in the nondegenerate case. The drawback is that we no longer have a full set of directions that conform to the geometry of Ω at each iteration, which is an important hypothesis in the statement of Theorem 2.4. The proof of Theorem 2.4, given in [4], relies on two crucial ideas, namely, Lemma 2.3 and the use of conforming directions. Under the proposed method of handling degenerate nonredundant constraints, Lemma 2.3 still applies, but Theorem 2.4 cannot be applied, since not all the tangent cone generators are used at each iteration. We introduce the following theorem, which establishes the same result as Theorem 2.4, but with a different hypothesis and a proof that is essentially identical (see [4]).

Theorem 4.6. Let x̂ ∈ Ω be the limit point of a refining subsequence {x_k}_{k∈K}. Under Assumptions A1–A3, if f is strictly differentiable at x̂ and all generators of the tangent cone T_Ω(x̂) are used infinitely often in K, then ∇f(x̂)^T ω ≥ 0 for all ω ∈ T_Ω(x̂), and so −∇f(x̂) ∈ N_Ω(x̂). Thus, x̂ is a Karush-Kuhn-Tucker point.

Proof. Lemma 2.3 and the strict differentiability of f at x̂ ensure that ∇f(x̂)^T d ≥ 0 for all d ∈ D ∩ T_Ω(x̂). Since D includes all the tangent cone generators, and each is used infinitely often in K, every ω ∈ T_Ω(x̂) can be represented as a nonnegative linear combination of elements of D ∩ T_Ω(x̂); thus, ∇f(x̂)^T ω ≥ 0 for all ω ∈ T_Ω(x̂). To complete the proof, we multiply both sides by −1 and conclude that −∇f(x̂) ∈ N_Ω(x̂).

4.3.2. Generating directions. While the new hypothesis of Theorem 4.6 is weaker and makes the result more general than Theorem 2.4, it is more difficult to enforce. The enumeration scheme mentioned above will ensure that the tangent cone generators get used infinitely often, but it cannot ensure that they get used infinitely often in the refining subsequence. Thus we make the following additional assumption.

A4: Any direction used infinitely often is also used infinitely often in any refining subsequence.

This assumption is actually quite mild, since a direction that does not satisfy A4 would always be successful (infinitely often) after a finite number of iterations. The following result establishes an important connection between the constraints and tangent cone generators.

Lemma 4.7. Let x̂ be the limit of a subsequence of GPS iterates, and let Ŝ and D̂ be the sets of active constraints and tangent cone generators, respectively, at x̂. If every constraint in Ŝ is used to form tangent cone generators infinitely often in the subsequence, then every tangent cone generator in D̂ is also used infinitely often in the same subsequence.

Proof. Let Ŝ_j, j = 1, . . . , s, be maximally linearly independent subsets of Ŝ such that Ŝ = ∪_{j=1}^{s} Ŝ_j. Furthermore, let D(Ŝ_j) denote the set of tangent cone generators produced by only the constraints in Ŝ_j. Price and Coope [21] show that D̂ ⊂ ∪_{j=1}^{s} D(Ŝ_j). Thus, if Ŝ_j is used infinitely often, then D(Ŝ_j) is used infinitely often, and if every Ŝ_j, j = 1, . . . , s, is used infinitely often, then every direction in D̂ is used infinitely often.

We now give two examples of approaches that generate directions satisfying the hypotheses of Theorem 4.6, followed by convergence theorems for each.

Random Selection: Randomly select (with uniform probability) r linearly independent ε-active constraints to form tangent cone generators.

Sequential Selection: Order the s subsets of r linearly independent elements of S_k as S_i, i = 1, . . . , s, and use subset S_j, j = 1 + (k mod s), at iteration k.

Theorem 4.8. Let x̂ be the limit of a refining subsequence {x_k}_{k∈K}, in which the set of nonredundant binding constraints at x̂ is linearly dependent. If search directions are obtained by Random Selection whenever the elements of S_k are linearly dependent, then with probability 1, all tangent cone generators at x̂ will be used infinitely often in K.

Proof. For any nonredundant active constraint at x̂, let P_k denote the probability that the constraint is randomly selected at iteration k. For sufficiently large k, the set S_k is fixed with p elements (corresponding to the active constraints at x̂), and P_k = r/p. The probability that the constraint is selected infinitely often in any infinite subsequence M of iterates (with k sufficiently large) is then

    1 − ∏_{k∈M} (1 − P_k) = 1 − ∏_{k∈M} (p − r)/p = 1.

The result then follows from Lemma 4.7.
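Both selection rules are straightforward to implement. A sketch of our own, with numpy's default random generator standing in for "uniform probability":

```python
import numpy as np
from itertools import combinations

def random_selection(S_k, r, rng=np.random.default_rng()):
    """Randomly choose r linearly independent eps-active constraint normals
    (columns of S_k), redrawing on the rare rank-deficient draw."""
    p = S_k.shape[1]
    while True:
        idx = rng.choice(p, size=r, replace=False)
        if np.linalg.matrix_rank(S_k[:, idx]) == r:
            return S_k[:, idx]

def sequential_selection(S_k, r, k):
    """Use subset S_j, j = 1 + (k mod s), from a fixed ordering of the
    linearly independent r-element subsets of the columns of S_k.
    (For illustration only: a real implementation would cache this
    enumeration rather than rebuild it each iteration.)"""
    subsets = [list(c) for c in combinations(range(S_k.shape[1]), r)
               if np.linalg.matrix_rank(S_k[:, list(c)]) == r]
    return S_k[:, subsets[k % len(subsets)]]
```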

Theorem 4.9. Let x̂ be the limit of a refining subsequence {x_k}_{k∈K}, in which the set of nonredundant binding constraints at x̂ is linearly dependent. Under Assumption A4, if search directions are obtained by Sequential Selection whenever the elements of S_k are linearly dependent, then all tangent cone generators at x̂ will be used infinitely often in K.

Proof. Since subset S_j, j = 1 + (k mod s), is used at iteration k, each S_j is used infinitely often in the iteration sequence. Furthermore, S_j ⊂ Ŝ for all sufficiently large k, where Ŝ is the set of active constraints at x̂. The result follows from Assumption A4 and Lemma 4.7.

In each of these two instances, something is sacrificed for the sake of implementation: either a weaker convergence result (Random Selection) or the additional Assumption A4 (Sequential Selection). However, if function evaluations are expensive, the alternative of identifying all the tangent cone generators at each iteration becomes intractable. Furthermore, this by no means exhausts the possibilities for choosing tangent cone generators when nonredundant constraints are linearly dependent. Considering that the projection method measures the distance to each constraint boundary, one promising alternative is to select the closest n − 1 constraints (with ties broken arbitrarily), plus one more constraint obtained by either random or sequential selection. The latter constraint allows the theory in the previous two theorems to hold, while offering an intelligent heuristic that favors constraints closer to the current iterate. Choosing the closest constraints is equivalent to reducing ε at each iteration so that fewer constraints are flagged as ε-active.

4.4. An algorithm for constructing the set of generators. In this section, we present an algorithm for constructing a set of generators for the cone K°(x_k, ε) at the current iterate x_k for a given parameter ε.

4.4.1. Comments on the algorithm. The algorithm consists of two main parts. In the first part, we determine the set of indices of the nonredundant ε-active constraints I_N(x_k, ε) ⊆ I(x_k, ε) and form the matrix B_N whose columns are the columns of A corresponding to the indices in I_N(x_k, ε). We use information about the set I_N(x_k, ε) from previous iterations of the GPS algorithm; namely, we put into the set I_N(x_k, ε) all indices that correspond to ε–active constraints at the current iterate and that were detected as indices of nonredundant constraints at previous iterations of the algorithm.

In the second part of the algorithm, we construct the set of generators D_k required by GPS and by Theorem 2.4. First, we try to identify the nonredundant active constraints. If the matrix B_k defined by (4.1) has full rank, then all ε-active constraints are nonredundant, I_N(x_k, ε) = I(x_k, ε), and B_N = B_k. If the matrix B_k does not have full rank and there are indices that have not been classified at previous iterations of the algorithm, we propose using two steps in succession.

The first strategy is intended to determine nonredundant constraints cheaply by applying the projection method described in Section 3.3.1. By Proposition 3.16, if the projection P_j(x_k) of the current iterate x_k onto the hyperplane H_j = {x ∈ R^n : ā_j^T x = b̄_j} is feasible, and only the jth constraint holds with equality at P_j(x_k), then the jth constraint is nonredundant, and we can put index j into the set I_N(x_k, ε). If some constraints have not been identified by the projection method, we can either apply the projection method with some other point x̃ ≠ x_k or apply the second strategy.

The second strategy is intended to classify redundant and nonredundant constraints among those that have not already been determined as nonredundant by the projection method. To identify each constraint, the LP approach of Theorem 3.15, outlined in [10], is applied. If the number of constraints to be identified is too large, we can skip this strategy and construct a set of generators using the set I_N(x_k, ε) obtained from the first strategy. Then, while performing the poll step, if we find some point x̄ = x_k + ∆d̄, where d̄ is some column of D_k, such that a_j^T x̄ > b_j and a_i^T x̄ ≤ b_i for all i ∈ I(x_k, ε)\{j}, we can conclude that Ω(x_k, ε) ⊊ Ω_j(x_k, ε). Hence, by Corollary 3.13, the jth constraint is nonredundant, and we add j to the set I_N(x_k, ε).

Once we have specified all redundant and nonredundant constraints, we compose the matrix B_N of those columns of A that correspond to nonredundant constraints. The rank of B_N can be determined by QR factorization. If B_N has full rank, then we construct the set of generators using QR or LU factorization. If B_N does not have full rank, we construct the set of generators from a set of linearly independent columns of B_N, and as the iteration sequence progresses, we invoke one of the methods described in Section 4.3 to ensure that all maximally linearly independent subsets get used infinitely often.

4.4.2. Algorithm. We denote the set of indices of the nonredundant ε-active constraints by I_N(x_k, ε). Thus, for j ∈ I(x_k, ε),
1. if j ∈ I_N(x_k, ε), then the inequality a_j^T x ≤ b_j is nonredundant; and
2. if j ∈ I(x_k, ε)\I_N(x_k, ε), then the inequality a_j^T x ≤ b_j is redundant.
We use I_N ⊆ I to denote the set of indices that have been detected as nonredundant at some iteration of the algorithm; thus, I_N = ∅ at the beginning of the algorithm. We denote the rational matrix in (1.2) by A^T and the scaled matrix defined in (3.5) by Ā^T. The matrix B_k is defined by (4.1) and is composed of the columns a_j of A with j ∈ I(x_k, ε), while the matrix B_N is composed of those columns of A whose indices are in the set I_N(x_k, ε). Thus, the columns of B_N are the vectors normal to the nonredundant constraints.

Algorithm for constructing the set of generators D_k.
Let the current iterate x_k ∈ R^n and a parameter ε > 0 be given.

% Part I: Constructing the set I_N(x_k, ε)
% Construct the working index set I(x_k, ε)
for i = 1 to |I|
    if 0 ≤ b̄_i − ā_i^T x_k ≤ ε
        I(x_k, ε) ← I(x_k, ε) ∪ {i}
        B_k ← [B_k, a_i]
    endif
endfor
if rank(B_k) = |I(x_k, ε)|          % all ε-active constraints are nonredundant
    I_N(x_k, ε) ← I(x_k, ε)
    B_N ← B_k
else
    % use information from previous iterations of the algorithm
    for each j ∈ I(x_k, ε) ∩ I_N
        I_N(x_k, ε) ← I_N(x_k, ε) ∪ {j}
        B_N ← [B_N, a_j]
    endfor
    % identification of the nonredundant and redundant constraints
    for each j ∈ I(x_k, ε)\I_N(x_k, ε)
        % first strategy: projection method
        P_j(x_k) = x_k + ā_j(b̄_j − ā_j^T x_k)          % see Lemma 3.4
        if ā_i^T P_j(x_k) < b̄_i for all i ∈ I\{j}
            I_N(x_k, ε) ← I_N(x_k, ε) ∪ {j}
            B_N ← [B_N, a_j]
            I_N ← I_N ∪ {j}
        else
            % second strategy: solve the LP problem (3.9) for x* (Theorem 3.15)
            if a_j^T x* ≤ b_j                           % the jth constraint is redundant
                remove a_j^T x ≤ b_j from Ω
                I ← I\{j}
                I(x_k, ε) ← I(x_k, ε)\{j}
            else                                        % the jth constraint is nonredundant
                I_N(x_k, ε) ← I_N(x_k, ε) ∪ {j}
                B_N ← [B_N, a_j]
                I_N ← I_N ∪ {j}
            endif
        endif
    endfor
endif

% Part II: Constructing the set of generators D_k
r = rank(B_N)
if r ≠ |I_N(x_k, ε)|                % degenerate case
    B_N ← B, where B is composed of r linearly independent columns of B_N
endif
V = B_N
D_1 ← V(V^T V)^{-1}
D_2 ← I − V(V^T V)^{-1}V^T
D = [D_1, D_2, −D_1, −D_2]

As discussed in Section 4.2, the construction of the directions in D can, in practice, be done using either LU decomposition, as suggested by Lewis and Torczon [16], or the more efficient QR factorization approach presented in Section 4.2. In the latter case, D_1 and D_2 are computed according to Corollary 4.5.

We should point out that, in practice, the choice of ε can have a significant effect on numerical performance. If the value is set too low, then the mesh size may become very small before appropriate conforming directions are generated. If this happens, the algorithm may then progress along a new conforming direction,

As discussed in Section 4.2, the construction of the directions in D can, in practice, be carried out using either LU decomposition, as suggested by Lewis and Torczon [16], or the more efficient QR factorization approach presented in Section 4.2; in the latter case, D1 and D2 are computed according to Corollary 4.5.

We should point out that, in practice, the choice of ε can have a significant effect on numerical performance. If the value is set too low, then the mesh size may become very small before appropriate conforming directions are generated. If this happens, the algorithm may then progress along a new conforming direction, but with a significantly reduced mesh size, resulting in many more function evaluations. On the other hand, too large a value may mark too many constraints as active. This could result in otherwise good directions being replaced by worse ones, and even in a false detection of degeneracy, resulting in additional unnecessary function evaluations.

4.4.3. Numerical Tests. To test the algorithm, we formed five problems with varying numbers of variables and redundant linear constraints, in order to assess the ability of our approach to correctly construct the set IN(xk, ε) of nonredundant constraints. In each case, we chose a trial point xk close to several of the constraints and tested the ability of our algorithm to identify the nonredundant ones. The test problems are described as follows:

Test Problem 1: Same as the problem given in (4.2), with a test value of xk = (0.1, 0.1, 0.1)^T.

Test Problem 2: The following problem with a test value of xk = (0.01, −0.01, −0.01, −0.00001, 0.01)^T:

    −x1 + x2              ≤ 0
    x1 + x2               ≤ 1
    x2 + x3 + x4          ≤ 0
    −x2 + x5              ≤ 5
    −x1 + x2              ≤ 0
    x3                    ≤ 0
    −0.8x1 + x2 + x3      ≤ 0.
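For illustration, Test Problem 2 can be encoded and passed to the build_generators sketch above. The matrix below is our transcription of the seven constraints (columns are the normals ai), and the value of ε is purely illustrative.

```python
import numpy as np

# Columns of A are the constraint normals of Test Problem 2 (A^T x <= b).
A = np.array([[-1.0, 1.0, 0.0,  0.0, -1.0, 0.0, -0.8],
              [ 1.0, 1.0, 1.0, -1.0,  1.0, 0.0,  1.0],
              [ 0.0, 0.0, 1.0,  0.0,  0.0, 1.0,  1.0],
              [ 0.0, 0.0, 1.0,  0.0,  0.0, 0.0,  0.0],
              [ 0.0, 0.0, 0.0,  1.0,  0.0, 0.0,  0.0]])
b = np.array([0.0, 1.0, 0.0, 5.0, 0.0, 0.0, 0.0])

# Scale so that each normal has unit norm, consistent with (3.5).
norms = np.linalg.norm(A, axis=0)
A_bar, b_bar = A / norms, b / norms

xk = np.array([0.01, -0.01, -0.01, -0.00001, 0.01])
D = build_generators(A_bar, b_bar, xk, eps=1e-1)   # eps chosen for illustration
```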

Test Problem 3: The following problem with a test value of xk = (0.01, 0.01, 0.01, 0.01, 0.01)^T:

    x1 − 2x2 − 2x3 + x4   ≤ 0
    −2x1 + x2 − 2x3       ≤ 0
    −2x1 − 2x2 + x3       ≤ 0
    −x1                   ≤ 0
    −x2                   ≤ 0
    −x3 − x5              ≤ 0
    −x4 − 0.1x5           ≤ 0.

Test Problem 4: Same as Test Problem 3, but with xk = (0.000001, 0.000001, 0.000001, 0.000001, 0.1)^T.

Test Problem 5: Same as Test Problem 3, but with xk = (0.001, 0.001, 0.001, 0.001, 0.001)^T.

We report results in Table 4.1, where each row corresponds to one of the five test problems (in the order presented), and where the number of variables is given in the first column. Columns 2 and 3 show the number of nonredundant and redundant constraints, respectively, their sum being the total number of constraints for each problem. The last two columns indicate how many of the constraints were identified as nonredundant, first by the projection method, and then by the LP approach whenever projection failed to identify all of the nonredundant ones. As the table shows, the projection method identifies most of the nonredundant constraints, and with the LP method as a backup, all of the constraints are correctly classified. With this approach in place, the number of GPS iterations required for a problem with no redundant constraints will be no different from that for a modified version of the same problem to which any number of additional redundant constraints has been added, since the algorithm detects and removes the redundant constraints at each iteration.

17

Table 4.1
Constructing the set IN(xk, ε) at the current iterate xk

                  Constraints            Detected as nonredundant
  Variables   Nonredundant  Redundant   by Projection  by LP approach
      3            6            0             6              0
      5            6            1             5              1
      5            7            0             6              1
      5            7            0             5              2
      5            7            0             6              1

Finally, we implemented the random selection and sequential selection approaches for handling linearly dependent, nonredundant ε-active constraints (see Section 4.3), added this code to the NOMADm software [1], and tested both approaches on the following problem:

Test Problem 6:

    min_x  −x1^2 − x2^2 − x3^2

    subject to
        4x1 + 4x2 − 3x3     ≥ 0
        16x1 + 8x2 − 9x3    ≥ 0
        24x1 + 8x2 − 11x3   ≥ 0
        8x1 − 24x2 + 23x3   ≥ 0
        8x1 + 8x2 − 13x3    ≤ 0
        24x1 + 8x2 − 25x3   ≤ 0
        24x1 − 8x2 − 21x3   ≤ 0
        x1 + 2x2 − x3       ≥ 0
        x1 + x2 + x3        ≤ 8.

Test Problem 6 has a degenerate global maximizer at the origin, which is not a local minimizer. We start the algorithm there, and in order to avoid stalling there, a set of n = 3 constraints must be chosen that generates a feasible descent direction. Since the algorithm does not evaluate the objective at infeasible points, and the cones of feasible directions and of descent directions coincide at this point, moving off the degenerate point always occurs at the second function evaluation. Thus, our measure of performance becomes the number of iterations, rather than function evaluations, required to move off the degenerate point. The number of iterations gives a measure of how many unsuccessful attempts the algorithm made at including a feasible descent direction in its set of poll directions.

For sequential selection, we paired the constraints in consecutive order; i.e., constraints 1 and 2, 1 and 3, 1 and 4, etc. The algorithm required 9 iterations to move off the degenerate point. For random selection, we performed 10 replications, which required the following numbers of iterations to move off the degenerate point: {4, 3, 4, 2, 2, 2, 5, 7, 2, 6}. We tested both approaches with default settings of ∆0 = 1, a mesh refinement strategy of ∆k+1 = ∆k/2, and an empty search step. The parameter ε was set to 10^−4 (although this had no bearing on performance, since we started at the degenerate point), and the QR factorization was used in constructing tangent cone generators.

Both direction selection approaches successfully moved off the degenerate point. A comparison between the two approaches is not particularly relevant, since the result for sequential selection is highly dependent on the order in which the constraints are expressed or chosen. In the worst case, since each iteration required 2n = 6 directions, consideration of between 12 and 54 directions was required to move off the degenerate point. However, had we attempted to compute all the directions during the first iteration, we might have had to consider, by (4.7), a worst-case 84 subsets of constraints, or 504 directions. As an aside, we also let the algorithm run to completion (termination tolerance of ∆k < 10^−8), and all 11 runs successfully converged to one of two local minimizers (at approximately (3.67, 0.221, 4.11)^T and (0.471, 3.76, 3.76)^T). Sequential selection required a total of 67 iterations and 78 function evaluations, while random selection required 55–91 iterations and 84–144 function evaluations.
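Both selection rules admit a simple generic sketch, shown below under our own (hypothetical) interface. For the small instances considered here, enumerating all linearly independent subsets is affordable; in general their number grows combinatorially (cf. (4.7)), so a production code would generate them lazily.

```python
import itertools
import random

import numpy as np

def independent_subsets(B, r):
    """All size-r column index subsets of B that are linearly independent."""
    m = B.shape[1]
    return [s for s in itertools.combinations(range(m), r)
            if np.linalg.matrix_rank(B[:, list(s)]) == r]

def pick_subset(B, r, k, rule="sequential"):
    """Choose the subset of constraint normals used at iteration k.

    'sequential' cycles through the subsets in a fixed order, and
    'random' draws one uniformly; either way, every maximal linearly
    independent subset is used infinitely often over the iteration
    sequence (with probability one in the random case), as the
    convergence theory of Section 4.3 requires.
    """
    subsets = independent_subsets(B, r)
    if rule == "sequential":
        return subsets[k % len(subsets)]
    return random.choice(subsets)
```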

We should point out that, for this problem, there are no real cost savings in terms of function evaluations, because the infeasible points were not evaluated. Had we chosen a problem in which the initial point was degenerate, but neither a minimizer nor a maximizer, then the number of function evaluations would become a factor. In that case, the selection of certain combinations of constraints would generate feasible non-descent directions, which would result in additional function evaluations at points with worse function values. But even without the cost savings, we still avoid the potentially intractable task of explicitly identifying all the tangent cone generators at each iteration via some vertex enumeration scheme.

5. Nonlinearly constrained minimization. The goal of this section is to illustrate how the projection approach proposed in this paper can also be effective for handling degeneracy in nonlinearly constrained optimization problems. In doing so, we should point out that our approach is different from those of [20] and [26] (and others cited in these papers). Both of those approaches use local information about the (twice continuously differentiable) objective and constraint functions to identify active constraints. Moreover, the focus in [26] is on distinguishing between strongly and weakly active constraints – the latter having Lagrange multiplier values of zero. In our case, we do not have multiplier values available, and even if we did, most direct search methods we might consider using, such as [12] and [17], can handle weakly active constraints transparently if the constraint gradients are linearly independent. We consider the nonlinearly constrained optimization problem

    min_{x ∈ Rn} f(x)   subject to   x ∈ Ω = {x ∈ Rn : ci(x) ≤ 0, i = 1, . . . , q}.     (5.1)

All constraint functions ci, i = 1, 2, . . . , q, are assumed to be continuously differentiable, but their gradients may not be available. The algorithm in [17] uses constraint gradients, while the one in [12] uses only approximations. Our intent is to be as general as possible, so that the ideas presented here might be extendable to both algorithms, as well as to other direct search methods. Similar to Section 3, the region defined by all but the jth constraint is given by Ωj = {x ∈ Rn : ci(x) ≤ 0, i ∈ I\{j}}, where I = {1, 2, . . . , q}. Additionally, for δ > 0, we define Uδ(x) = {y ∈ Rn : ‖y − x‖ ≤ δ}, and offer a definition of local redundancy (nonredundancy), in the sense that the constraints are locally nonredundant if they define the shape of the feasible region in some neighborhood of a point x ∈ Rn. This is illustrated in Figure 5.1.


Fig. 5.1. An illustration of a locally redundant constraint. Constraint 2 is locally redundant at xk .

Definition 5.1 (Locally redundant constraint). The jth constraint cj(x) ≤ 0 is locally redundant at x in the description of Ω if, for some δ > 0, Ω ∩ Uδ(x) = Ωj ∩ Uδ(x), and is locally nonredundant otherwise.

Our main interest is in the problem of constructing search directions that conform to the boundary of Ω. First, we define constraint j to be ε-active at x if −ε ≤ cj(x) ≤ 0. For the iterate xk at iteration k, we denote by I(xk, ε) the set of indices of the ε-active constraints at xk; namely,

    I(xk, ε) = {j = 1, 2, . . . , q : −ε ≤ cj(xk) ≤ 0},

and extend the following from similar definitions given in Section 3:

    Ω(xk, ε) = {x ∈ Rn : ci(x) ≤ 0, i ∈ I(xk, ε)},
    Ωj(xk, ε) = {x ∈ Rn : ci(x) ≤ 0, i ∈ I(xk, ε)\{j}},    j ∈ I(xk, ε).

If xk is close to the boundary of Ω, then the set of directions should contain generators for the tangent cone TΩ(xk) for boundary points near xk.

We assume that estimates ai^(k) of the gradients ∇ci(xk), i = 1, 2, . . . , q, are available. Thus, an estimate C^(k)(I(xk, ε)) of the tangent cone TΩ(xk) is given by

    C^(k)(I(xk, ε)) = {v ∈ Rn : v^T ai^(k) ≤ 0 for all i ∈ I(xk, ε)}.     (5.2)
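In code, assembling the data of the cone estimate (5.2) is straightforward once the constraint values and gradient estimates are available; the short sketch below assumes both are supplied by the caller (the names and interface are ours).

```python
import numpy as np

def eps_active_cone_data(c_vals, grad_est, eps):
    """Return the eps-active index set I(x_k, eps) and the matrix whose
    columns a_i^(k) define the cone estimate C^(k)(I(x_k, eps)) in (5.2).

    c_vals   : (q,) constraint values c_i(x_k)
    grad_est : (n, q) array whose columns are the gradient estimates a_i^(k)
    """
    I_act = [i for i in range(len(c_vals)) if -eps <= c_vals[i] <= 0.0]
    return I_act, grad_est[:, I_act]

# A direction v belongs to the cone estimate iff v^T a_i^(k) <= 0 for all
# eps-active i, which can be checked as  np.all(Ak.T @ v <= 0).
```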

By Definition 4.1, each of these cones can be expressed as the set of nonnegative linear combinations of a finite number of generators {vj}, j = 1, . . . , p, in Rn. One of the main assumptions in [12] is that, at each point x on the boundary of Ω, the gradients of the constraints active at x are linearly independent. By extending our ideas from the previous subsections to the nonlinear case, this assumption can be relaxed.

The next proposition is simply an application of Proposition 4.2 to the cone defined by the linearized constraints, except that the index sets apply to nonlinear constraints. As a consequence, it is sufficient to construct the set of generators for only the locally nonredundant constraints.

Proposition 5.2. Let IN(xk, ε) ⊆ I(xk, ε) be the subset of indices of the locally nonredundant constraints that define Ω(xk, ε). Let the cone C^(k)(I(xk, ε)) be defined by (5.2), and let the cone CN^(k) be given by CN^(k) = {v ∈ Rn : v^T ai^(k) ≤ 0 for all i ∈ IN(xk, ε)}. If {v1, . . . , vp} is a set of generators for CN^(k), then it is also a set of generators for C^(k)(I(xk, ε)).

Proof. This follows directly from Corollary 3.13.

To extend the projection approach described in Section 3.3.1 to the detection of locally nonredundant nonlinear constraints, we simply project onto a linearization of the constraint boundary, based on approximations to the constraint gradients at the current iterate; i.e., we project onto the hyperplane Hj = {v ∈ Rn : v^T aj^(k) = 0}. Scaling the constraints as in (3.5) and applying Lemma 3.4 yields a projection equation similar to (3.10); namely,

    Pj(xk) = xk + āj^(k) cj(xk),    āj^(k) = aj^(k) / ‖aj^(k)‖,    j = 1, 2, . . . , q.     (5.3)
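A sketch of the resulting local nonredundancy test, under an assumed interface in which the constraint vector c(·) can be evaluated and the gradient estimates are given, is as follows. Note that each test costs one extra constraint evaluation and no LP solve.

```python
import numpy as np

def locally_nonredundant(j, xk, c, grad_est, tol=1e-10):
    """Projection test (5.3) for local nonredundancy of constraint j.

    c        : callable returning the vector (c_1(x), ..., c_q(x))
    grad_est : (n, q) array whose columns are the estimates a_i^(k)

    Mirrors the linear-constraint test: project x_k onto the linearized
    boundary of constraint j, then require every other constraint to be
    strictly satisfied there.  True flags constraint j as locally
    nonredundant; False is inconclusive, in which case the LP approach
    would have to be applied at this iterate.
    """
    a_bar = grad_est[:, j] / np.linalg.norm(grad_est[:, j])
    P = xk + a_bar * c(xk)[j]                  # equation (5.3)
    cP = c(P)
    return all(cP[i] < -tol for i in range(len(cP)) if i != j)
```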

If the generators of CN^(k) at iteration k are linearly independent, then they are all included in the set of search directions for that iteration. Otherwise, the set of search directions includes a maximal linearly independent subset of the generators, selected in exactly the same manner as discussed in Section 4.3.

We omit a formal discussion of convergence, since any results would depend on the algorithm being used and on the details of its implementation. However, it appears safe to assume that any convergence results will require a certain degree of accuracy of the vectors aj^(k) as approximations to the constraint gradients ∇cj(xk). We view these ideas as a natural extension of those of Section 3.3.1, and one that can achieve significant cost savings over the LP approach. Recall from Section 3.3.2 that the expense of the LP approach for linearly constrained problems can be circumvented by performing it before the algorithm commences, since the redundancy of each constraint is independent of the location of the current iterate. This is not true for nonlinear constraints, however, in which case the LP approach would have to be performed at every iteration, which is considerably more expensive than projection.

6. Concluding remarks. This paper fills an important gap in the pattern search literature, complementing the previous work of Lewis and Torczon [16] by rigorously treating the case of degenerate linear constraints. We have introduced an inexpensive projection method for identifying nonredundant constraints, which, when used in conjunction with a linear programming approach as a backup, can cheaply assess the redundancy of each constraint, and thus aid pattern search in computing directions that conform to the boundary of the feasible region. For the case in which nonredundant ε-active constraints are linearly dependent, we avoid complete enumeration of tangent cone generators by including only a subset of them, and then changing them at each iteration, such that all are used infinitely often over the entire iteration sequence. We prove first-order convergence in this case, under an additional mild assumption. Finally, we have shown how our ideas can be extended to nonlinearly constrained optimization problems under similar degenerate conditions. Acknowledgements. This work was begun at the IMA, where Olga Brezhneva was a postdoctoral fellow, John Dennis was a long-term visitor, and Mark Abramson was a short-term visitor. We thank the IMA for providing such a fine atmosphere for collaboration. We also thank Charles Audet for many useful discussions and suggestions, and two anonymous referees for comments that have helped us improve the paper. The views expressed in this paper are those of the authors and do not reflect the official policy or position of the United States Air Force, Department of Defense, United States Government, or research sponsors.

REFERENCES

[1] M. Abramson, NOMADm Optimization Software, http://en.afit.edu/ENC/Faculty/MAbramson/NOMADm.html, 2003.
[2] M. Abramson, Second-order behavior of pattern search, SIAM J. Optim., to appear. Also available as Technical Report TR04-03, Rice University, Department of Computational Mathematics, 2004.
[3] C. Audet, Convergence results for pattern search algorithms are tight, Optim. Engin., 5 (2004), pp. 101–122.
[4] C. Audet and J. E. Dennis Jr., Analysis of generalized pattern searches, SIAM J. Optim., 13 (2003), pp. 889–903.
[5] D. M. Avis and K. Fukuda, A pivoting algorithm for convex hulls and vertex enumeration of arrangements and polyhedra, Discrete Comput. Geom., 8 (1992), pp. 295–313.
[6] D. M. Avis and K. Fukuda, Reverse search for enumeration, Discrete Applied Mathematics, 6 (1996), pp. 21–46.
[7] H. C. P. Berbee, C. G. E. Boender, A. H. G. R. Kan, C. L. Scheffer, R. L. Smith, and J. Telgen, Hit-and-run algorithms for the identification of nonredundant linear inequalities, Math. Program., 37 (1987), pp. 184–207.
[8] D. P. Bertsekas, Nonlinear programming, Athena Scientific, Belmont, MA, 1999.
[9] A. Boneh, S. Boneh, and R. J. Caron, Constraint classification in mathematical programming, Math. Program., 61 (1993), pp. 61–73.
[10] R. J. Caron, J. F. McDonald, and C. M. Ponic, A degenerate extreme point strategy for the classification of linear constraints as redundant or necessary, J. Optim. Theory Appl., 62 (1989), pp. 225–237.
[11] F. H. Clarke, Optimization and nonsmooth analysis, Classics in Applied Mathematics, Vol. 5, SIAM, Philadelphia, 1990.
[12] I. D. Coope, J. E. Dennis Jr., and C. J. Price, Direct search methods for nonlinearly constrained optimization using filters and frames, Optim. Engin., 5 (2004), pp. 123–144.
[13] M. H. Karwan, V. Lotfi, J. Telgen, and S. Zionts, Redundancy in mathematical programming, Springer-Verlag, Berlin, 1983.
[14] T. G. Kolda, R. M. Lewis, and V. Torczon, Optimization by direct search: new perspectives on some classical and modern methods, SIAM Review, 45 (2003), pp. 385–482.
[15] T. G. Kolda, R. M. Lewis, and V. Torczon, Stationarity results for generating set search for linearly constrained optimization, Technical Report SAND2003-8550, Sandia National Laboratories, Livermore, California, Oct. 2003.
[16] R. M. Lewis and V. Torczon, Pattern search methods for linearly constrained minimization, SIAM J. Optim., 10 (2000), pp. 917–941.
[17] S. Lucidi, M. Sciandrone, and P. Tseng, Objective-derivative-free methods for constrained optimization, Math. Program., 92 (2002), pp. 37–59.
[18] J. H. May, Linearly constrained nonlinear programming: a solution method that does not require analytic derivatives, PhD thesis, Yale University, December 1974.
[19] G. L. Nemhauser and L. A. Wolsey, Integer and combinatorial optimization, John Wiley & Sons, New York, 1988.
[20] C. Oberlin and S. J. Wright, Active constraint identification in nonlinear programming, Optimization Technical Report 05-01, Computer Sciences Department, University of Wisconsin-Madison, January 2005.

[21] C. J. Price and I. D. Coope, Frames and grids in unconstrained and linearly constrained optimization: a non-smooth approach, SIAM J. Optim., 14 (2004), pp. 415–438.
[22] V. Torczon, On the convergence of pattern search algorithms, SIAM J. Optim., 7 (1997), pp. 1–25.
[23] L. N. Trefethen and D. Bau, III, Numerical linear algebra, SIAM, Philadelphia, 1997.
[24] J. Van Tiel, Convex Analysis, John Wiley & Sons, New York, 1984.
[25] L. A. Wolsey, Integer programming, John Wiley & Sons, New York, 1998.
[26] S. J. Wright, Constraint identification and algorithm stabilization for degenerate nonlinear programs, Math. Program., Series B, 95 (2003), pp. 137–160.

