Comput. & Elect. Engng, Vol. 3, pp. 347-386. Pergamon Press, 1976. Printed in Great Britain

AN INTRODUCTION TO NONLINEAR PROGRAMMING--IV†
NUMERICAL METHODS FOR CONSTRAINED MINIMIZATION

H. W. SORENSON

Department of Applied Mechanics and Engineering Sciences, University of California, San Diego, La Jolla, CA 92093, U.S.A.

and

H. M. KOBLE
Systems Division, Navigation Systems Section, Jet Propulsion Laboratory, Pasadena, CA 91103, U.S.A.

(Received 17 September 1976)

Abstract--In this paper, we consider the problem of determining the numerical solution of constrained minimization problems. This discussion complements the theoretical development regarding the nature of the solution of the general nonlinear programming problem that was presented in Part I of this series of articles. As in our earlier discussion, the objective of this article is to review basic ideas and to illustrate the application of the ideas by describing specific computational algorithms. Thus, we discuss a variety of algorithms but not always in their greatest detail. References are provided for more detailed expositions and for generalizations and extensions of the basic algorithms. Attention is given first to the solution of problems with linear constraints. Then, approximation methods that reduce nonlinearly constrained problems to a sequence of linear programming problems are described. Finally, the discussion is completed by describing methods that reduce nonlinearly constrained problems to a sequence of unconstrained problems. Much of the discussion of specific algorithms draws upon the results presented in Parts II and III of this series.

1. AN OVERVIEW OF CONSTRAINED MINIMIZATION TECHNIQUES
This is the fourth in a series of tutorial articles which have reviewed, developed and discussed some of the basic results regarding the solution of parameter optimization (i.e. linear and nonlinear programming) problems. In this concluding article of the series, attention returns to the general formulation considered in Part I of the sequence. Thus, we consider the numerical solution of problems that are characterized in the following manner.‡

General nonlinear programming problem
Choose n variables y which will minimize a cost function

l = l(y)    (1.1)

subject to m equality constraints

f(y) = 0    (1.2)

and p inequality constraints

h(y) ≥ 0.    (1.3)
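To fix ideas, the problem data l, f and h of (1.1)-(1.3) can be encoded directly as functions. The following sketch uses an invented two-variable instance and a hypothetical tolerance simply to illustrate the structure of the problem and the meaning of feasibility.

```python
import numpy as np

# Hypothetical instance of (1.1)-(1.3): n = 2, m = 1, p = 2 (invented for illustration).
def l(y):                      # cost function (1.1)
    return (y[0] - 2.0)**2 + (y[1] - 1.0)**2

def f(y):                      # equality constraints (1.2), f(y) = 0
    return np.array([y[0] + y[1] - 3.0])

def h(y):                      # inequality constraints (1.3), h(y) >= 0
    return np.array([y[0], y[1]])

def is_feasible(y, tol=1e-8):
    """A point is feasible if it satisfies (1.2) to within tol and (1.3)."""
    return np.all(np.abs(f(y)) <= tol) and np.all(h(y) >= -tol)

y_trial = np.array([2.0, 1.0])
print(l(y_trial), is_feasible(y_trial))   # 0.0 True
```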

We specifically assume that m + p > 0 and at least one function in the problem definition is nonlinear. The other article that considered the general problem, Part I of the sequence, focused on theoretical topics. Primarily, this involved necessary conditions which a solution of (1.1)-(1.3) must satisfy in order for a minimum to be obtained. Now, the discussion will be directed to the examination of some of the search procedures that have been developed for numerically determining a solution vector. It, thereby, will complete the tutorial on numerical methods begun in Part II with the simplex algorithm for linear problems and continued in Part III with the various search procedures for unconstrained (i.e. m + p = 0) nonlinear problems.

†This work was supported by the Air Force Office of Scientific Research under Grant No. AFOSR-75-2839.
‡Throughout much of this article, it will not be necessary to make an explicit distinction between inequalities of the form (1.3) and any non-negativity conditions (i.e. y ≥ 0) which may be imposed on the variables.

Many of the basic tools and techniques used to solve constrained minimization problems have been developed in the preceding papers of this series. For a general discussion of the problem, one should refer to the introductory remarks in Part I. In addition, the Kuhn-Tucker conditions derived in Part I provide the necessary conditions that must be satisfied at a minimum. Further, nonlinear programming problems are sometimes solved by approximating them by linear problems and solving the resulting problem using linear programming methods as described in Part II. Finally, the gradient and direct search methods of Part III have been modified for use in constrained problems.

The introductory remarks in Part III regarding the general characteristics of search procedures are applicable to this more complex problem. However, the presence of constraints introduces nontrivial complications which must be dealt with in obtaining a solution. For example, in Part III any point y₀ could be used as the initial condition for a search procedure. With constraints present it becomes necessary to determine if a trial point satisfies the constraints. Obviously, a point must be feasible if it is to be a candidate for a solution of a given problem. Search techniques of two types have been developed. A search procedure is said to be an interior point technique if each trial point is a feasible point. On the other hand, if the trial points of a search procedure are not feasible but the procedure yields points which tend toward the admissible region, it is referred to as an exterior point technique.

Before proceeding, it is worth noting that it is possible to eliminate inequality constraints from further consideration by introducing slack variables, as in the discussion of linear programming in Part II. However, the use of slack variables increases the overall dimensionality of the problem and can cause computational difficulties that significantly complicate the actual solution of the problem. Unlike our earlier discussion of linear programming problems, a sizable portion of this discussion will deal explicitly with inequality constraints to the exclusion of equality constraints.

In trying to read the extensive literature on this subject, we believe it is helpful to observe that most of the methods which have been proposed fall into one of two general categories. We will call these categories primal (direct) solution methods and indirect solution methods.

Primal methods
A primal method basically involves the development of an iterative search procedure that works on the given constrained problem directly. The algorithms in question require (1) an initial estimate of the solution y₀ which satisfies all imposed constraint relations, i.e. is feasible, (2) a mechanism for generating new estimates y_k, k = 1, 2, ..., which decrease the cost function value as k increases and, in general, satisfy all constraints, and (3) a criterion for terminating the search.

The simplex algorithm for linear programs provides an example of a primal method. However, recall that the crux of the algorithm was based on the geometrical observation that the only possible candidates for the optimal solution (assuming uniqueness) were the extreme points (vertices) of the convex polytope which defines the feasible region. Once an initial extreme point was determined, the mechanism for updating the solution estimate consisted of moving to adjacent vertices in sequential fashion. Unfortunately, in nonlinear problems, the feasible region may not be a convex polytope and, irrespective of whether it is, local minima may lie on the boundary or in the interior. The pattern of search cannot follow the simplex philosophy but rather must move throughout the feasible region. In this respect, the pattern is more akin to the unconstrained minimization techniques of Part III. As noted above, the search must produce trial points that are feasible. Because of this requirement, the unconstrained methods of Part III are not directly applicable. We shall review primal methods in Section 2 of this paper.
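The three requirements listed above translate into a simple control loop. The sketch below is only schematic: the helpers initial_feasible_point, feasible_descent_direction and constrained_step_size are hypothetical placeholders for the machinery developed in the remainder of Section 2.

```python
import numpy as np

def primal_search(l, grad, initial_feasible_point,
                  feasible_descent_direction, constrained_step_size,
                  max_iter=100, tol=1e-6):
    """Schematic primal method: every iterate stays feasible and lowers the cost."""
    y = initial_feasible_point()                      # requirement (1): feasible y0
    for _ in range(max_iter):
        g = grad(y)
        p = feasible_descent_direction(y, g)          # requirement (2): usable, feasible p
        if p is None or np.linalg.norm(p) <= tol:     # requirement (3): termination test
            break
        k = constrained_step_size(y, p)               # step limited by the constraints
        y = y + k * p
    return y
```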

Indirect methods
Primal methods generally have shown their greatest utility in applications to problems with a nonlinear cost function but linear constraints. The various difficulties associated with the


theoretical development and digital computer implementation of primal search procedures for problems with nonlinear constraints have motivated the development of techniques based on an entirely different philosophy. These alternative, indirect approaches are predicated on the assumption that a linear program or unconstrained minimization problem is more amenable to direct solution because reliable and efficient software codes are widely available. A major concern in the formulation of an indirect method is to define a mechanism which transforms a given nonlinear, constrained problem into either a linear program or a nonlinear, unconstrained problem. The solution of the transformed problem will serve as a trial point in the search for the solution of the original problem. This trial point, which might not even be a feasible point of the problem, then is used to define a new transformation of the problem, which is then solved to obtain the next trial point. Thus, a sequence of approximating problems is defined and their solutions constitute a sequence of trial solutions of the actual problem of interest. Consequently, a major concern with indirect techniques is to define problem transformations such that the sequence of trial points converges to the solution of the nonlinear, constrained problem. As will be emphasized below, the sequence of linear programs or unconstrained minimization problems can become increasingly difficult because either (1) the number of variables or constraints of the linear program can increase at each stage and can impose prohibitive computational burdens or (2) the unconstrained minimization problem can become very ill-conditioned with the result that very poor convergence properties are observed.

Primal methods are discussed in Section 2 of this paper. Attention is restricted principally to problems having linear constraints. Zoutendijk's methods of feasible directions, Rosen's gradient projection method and variable metric projection methods are described. The discussion turns to indirect methods in Sections 3 and 4. In Section 3, methods involving linear approximations are considered. Among this class, we discuss cutting plane methods, the method of approximate programming and separable programming methods. We conclude our discussion in Section 4 by considering methods that obtain the solution of nonlinear, constrained problems by considering the solution of a sequence of unconstrained problems. To describe this class of indirect search procedures, we consider penalty function methods, dual methods and the method of multipliers.

The reader is reminded that these articles have been introductory in nature, and no attempt has been made to survey the considerable literature on each topic. Rather, the intent has been to make this literature more accessible and intelligible by citing important references and discussing, with the aid of several examples, the concepts which are most fundamental. Advanced topics such as convergence and convergence rate analysis, acceleration procedures, applications to large dimensional problems, etc. have been either briefly discussed or ignored altogether. For more extensive discussions, Refs. 1-11 are recommended.

2. PRIMAL METHODS OF SOLUTION
The algorithms that we discussed in Part III of this sequence of articles for solving unconstrained minimization problems are based on an iterative search procedure defined by

y_{i+1} = y_i + k_i p_i    (2.1)

where p, is the direction of search, k, is the step size and y,, y,+l are successive trial points of the iterative sequence. The selection of the search direction is based on the evaluation of the cost function at one or more points, possibly an evaluation of its gradient and perhaps some second derivative information. When the parameter optimization problem is modified to include one or more equality and/or inequality constraints, the choice of p, cannot be based on cost function information alone. In addition, the choice of the step size parameter k, should reflect the existence of the constraints. As a simple illustration, consider Fig. 1. Suppose that the initial trial point yo is feasible and is contained in the interior of the feasible region. The initial search direction po causes a reduction in cost and directs the search toward the constraint boundary. In the absence of the constraint, the step-size parameter k~ would probably be chosen to minimize the cost in the search direction. As is evident from the figure, this is not possible and ki must be chosen to retain feasibility. The new trial point yl represents the minimum of the cost function in the search direction subject to the constraint. At y, the use of an unconstrained minimization algorithm would probably cause the

Fig. 1. Unconstrained versus constrained iterative search.

search to be directed toward the unconstrained minimum U and out of the admissible region. Thus, at yl the search direction p, must be chosen to point into the admissible region or along the constraint boundary and toward the constrained minimum C. Clearly, for a nonlinear constraint boundary, a search procedure defined by (2.1) must be directed into the admissible region since an attempt to follow the boundary would result in infeasibility. Primal search methods deal directly with the nonlinear programming problem (1.1)-(1.3), and satisfy the recursion (2.1). Certainly, the design of a primal search algorithm must incorporate information about the constraint functions into the pattern of search. Typically, all trial points y, are required to be feasible. Thus, if the search is terminated prematurely, the last trial point, while not providing the minimum, will at least satisfy the constraints and serve as a useful approximation. In addition, the procedures are required to reduce the cost at each new trial point thereby inducing a form of stability on the search. It is possible to define primal methods for problems with either linear or nonlinear constraints. Numerical experience with these algorithms indicates that they are most effective for problems having only linear constraints. The indirect methods discussed in Sections 3 and 4 appear to be more useful in attempting to solve nonlinearly constrained problems. Consequently, we shall restrict our remarks in the discussion of primal methods to linearly constrained problems. The extension to nonlinear constraints is presented in the references provided throughout this section. General considerations for linearly constrained minimization problems are discussed in Section 2.1. Then, in Section 2.2 Zoutendijk's methods of feasible directions are discussed. Rosen's gradient projection method is presented in Section 2.3. Variable-metric projection methods with special emphasis on Goldfarb's method are discussed in Section 2.4. 2.1 Problems with linear constraints Let us consider the problem of minimizing a nonlinear cost function I subject to p linear inequality constraints Ay - b -> 0.

(2.2)

Note that nonnegativity constraints, if present, can be regarded as being included in (2.2). Also, equality constraints, say A₁y − b₁ = 0, are implicitly contained in (2.2) simply by writing the equivalent sets of inequality constraints

A₁y − b₁ ≥ 0
−A₁y + b₁ ≥ 0.

Thus, we can regard (2.2) as providing a completely general description of the constraint region. Suppose that the y* that minimizes l subject to (2.2) is to be found by a search procedure having the form (2.1). As in the discussion for unconstrained problems in Part III, we shall


impose the requirement that the search procedures are descent procedures in the sense that for all i, l(y_{i+1}) < l(y_i). In addition, we shall require that each trial point y_i satisfy the constraints (2.2). These y_i are said to be feasible points. Let us now consider restrictions that must be placed on the search directions p_i that insure that the procedure meets these requirements. Suppose that y_i is a feasible point and let the gradient at y_i be defined as

g_i = ∂l(y)/∂y |_{y=y_i}.

For p_i to be a feasible direction (i.e. a direction that can yield a new feasible point), one must have for some k_i > 0, A y_{i+1} − b ≥ 0. Thus,

(A y_i − b) + k_i A p_i ≥ 0.    (2.3)

If A_e represents the rows of A that are associated with active constraints, then for p_i to be a feasible direction, it is necessary and sufficient that

A_e p_i ≥ 0.    (2.4)

Thus, all p_i that are feasible directions satisfy (2.4). Let those p_i that cause a reduction in the cost for sufficiently small k_i be denoted as usable directions. Usable directions must satisfy the condition that

g_i^T p_i < 0    (2.5)

which follows from the Taylor series expansion. That is, for k_i sufficiently small

0 > l(y_{i+1}) − l(y_i) ≅ g_i^T (y_{i+1} − y_i) = k_i (g_i^T p_i).

The problem of defining a search direction p_i is seen to be related to the problem of finding usable feasible directions p_i that satisfy (2.4) and (2.5). We will consider the implication of these observations in discussing Zoutendijk's methods of feasible directions in Section 2.2.

Let us consider an interesting illustration of the effect of active constraints in the determination of a feasible search direction. Suppose that we consider active (or equality) constraints

A y − b = 0    (2.6)

and let us denote the jth row of A as a_j. Then, we have a_j^T y − b_j = 0. Each constraint in (2.6) defines an (n − 1)-dimensional hyperplane. Since the gradient of the constraint is

∂/∂y (a_j^T y − b_j) = a_j^T

the row vector a_j^T defines the normal to the hyperplane. The intersection of the m hyperplanes defined by (2.6) yields an (n − m)-dimensional linear manifold D. The vectors a_j define an m-dimensional linear subspace D̄, where D and D̄ are mutually orthogonal. Thus, any vector y can be expressed uniquely as

y = y_D + y_D̄    (2.7)

where y_D̄ ∈ D̄ and y_D is parallel to D (i.e. D does not contain the origin unless b = 0). This is depicted in Fig. 2 for a single constraint in a two-dimensional space.

Fig. 2. Decomposition of a vector y.

Let us consider Fig. 2 further. Suppose that we want to minimize a function l(y₁, y₂) subject to the constraint

a₁y₁ + a₂y₂ − b = a^T y − b = 0.

In other words, we want to determine y to minimize l in the linear manifold D. But the figure indicates that y can be decomposed according to (2.7). The representation of y as the direct sum of vectors in D and in a subspace parallel to D is accomplished by forming the orthogonal projection of y onto D̄ and onto its orthogonal complement. Thus, we want to find the vector y_D that minimizes l.

The preceding example illustrates the general result that the minimization of a cost function l subject to linear equality constraints can be accomplished as an unconstrained minimization in a linear subspace. When the matrix A has maximal rank m (m < n), the projection of an arbitrary vector y onto the subspace spanned by the rows of A, a₁^T, a₂^T, ..., a_m^T, is readily accomplished using the orthogonal projection matrix [12-14]

P̄_m ≜ A^T (A A^T)^{-1} A.    (2.8)

The projection onto the orthogonal complement D is accomplished by the orthogonal projection matrix (see Refs. 12-14 for additional discussion)

P_m = I − A^T (A A^T)^{-1} A.    (2.9)

Thus, an arbitrary vector y can be expressed as the sum of y_D and y_D̄, where

y_D = P_m y;    y_D̄ = P̄_m y.    (2.10)

Using P_m, an unconstrained search procedure can be accomplished in the subspace defined by the constraints. This observation forms the basis for the discussion in Sections 2.3 and 2.4 of Rosen's gradient projection method and the variable-metric projection method.
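The projection matrices (2.8) and (2.9) and the decomposition (2.10) are easily checked numerically. In the sketch below the constraint matrix is an invented example with m = 2 and n = 3.

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])          # m = 2 constraint normals (full row rank), n = 3
y = np.array([3.0, -1.0, 2.0])

AAT_inv = np.linalg.inv(A @ A.T)
P_bar = A.T @ AAT_inv @ A                # (2.8): projector onto the row space of A
P = np.eye(3) - P_bar                    # (2.9): projector onto the orthogonal complement

y_D = P @ y                              # (2.10): component parallel to the manifold D
y_D_bar = P_bar @ y

print(np.allclose(y, y_D + y_D_bar))     # True: y is recovered from the two parts
print(np.allclose(A @ y_D, 0.0))         # True: y_D satisfies the homogeneous constraints
```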


Before dealing with some specific techniques, one additional point must be emphasized. Space does not permit us to deal adequately with the topic of computing the step-size k_i. This calculation cannot be dismissed lightly because the convergence properties of many algorithms break down if the one-dimensional line search is done inaccurately. Unfortunately, there is usually a trade-off between a highly accurate minimization of l(y) along the search direction p_i and computation time. For information on step-size computations, we invite the reader to consult the general references listed in the Bibliography.

2.2 Zoutendijk's methods of feasible directions
Zoutendijk [11] has proposed a number of related methods for finding the vector y* that minimizes a cost function l subject to p linear inequality constraints (2.2). His methods provide a means for determining feasible, usable directions p_i that are obtained at each iterative step as the solution of a linear program. We shall describe the basic ideas behind his approach.

At each stage, the search direction p_i must satisfy (2.4)

A_e p_i ≥ 0

in order to be a feasible direction. To be usable, it must also satisfy the constraint (2.5) that g_i^T p_i < 0. Then, Zoutendijk suggests that to obtain the greatest local reduction in the cost, it is reasonable to choose p_i so that g_i^T p_i is minimized subject to (2.4). To insure a bounded solution, the magnitude of p_i must be restricted by an appropriate normalization condition. With this idea in mind, let us state the search procedure that is obtained.

The magnitude of the search direction p_i can be restricted in a variety of ways. For example, one can require that p_i satisfy the condition that

p_i^T p_i ≤ 1.    (2.11)

With this normalization, the direction-finding problem at y_i is to minimize g_i^T p_i subject to

A_e p_i ≥ 0,    p_i^T p_i ≤ 1.    (2.12)

This problem is depicted in Fig. 3 for a two-dimensional case. From the nature of the inner product, it is seen that the solution of this problem occurs for p_i with magnitude equal to one. Furthermore, the "best" direction p_i will maximize the angle between g_i and p_i, as is shown in Fig. 3. The problem defined by (2.12) contains a quadratic constraint p_i^T p_i ≤ 1. Zoutendijk has proposed a solution procedure which makes use of the Kuhn-Tucker conditions to reduce the problem to a linear program. Then, conventional computer codes can be used to obtain a solution. We shall not discuss the details here. The reader is directed to Zoutendijk's book [11].

Normalizations other than (2.11) are possible. For example, we could require p_i to be contained in a unit hypercube. That is, each component p_{ji} of p_i is required to satisfy |p_{ji}| ≤ 1 ... ≥ 0 and a normalization condition. A better choice would have been made if we observed that although a₂^T y_i > 0

the scalar product was sufficiently close to zero so that the constraint boundary a₂^T y = 0 was nearby. Zoutendijk [15], McCormick [16] and others have developed modifications to the previously described feasible direction methods which help to avoid the jamming phenomenon.

2.3 Rosen's gradient projection method [17]
The Zoutendijk methods attempt to find a "best" usable, feasible direction in the vicinity of a point generated by the iterative search. To do so, however, they require the solution of a parameter optimization problem (i.e. minimize g_i^T p_i subject to certain constraints). This can be a computationally expensive operation if the dimensionality of the problem is large.
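To make the direction-finding subproblem concrete, the following sketch computes a usable, feasible direction under the unit-hypercube normalization |p_{ji}| ≤ 1 mentioned in Section 2.2, in which case the subproblem is itself a linear program. The gradient and active-constraint data are invented, and the quadratic normalization (2.12) would instead require the Kuhn-Tucker reduction described by Zoutendijk.

```python
import numpy as np
from scipy.optimize import linprog

def direction_lp(g, Ae):
    """Minimize g^T p subject to Ae p >= 0 and |p_j| <= 1 (hypercube normalization)."""
    n = len(g)
    res = linprog(c=g,
                  A_ub=-Ae, b_ub=np.zeros(Ae.shape[0]),   # Ae p >= 0
                  bounds=[(-1.0, 1.0)] * n)               # |p_j| <= 1
    return res.x

g = np.array([1.0, -2.0])                  # gradient at the current feasible point (invented)
Ae = np.array([[0.0, 1.0]])                # one active constraint with normal (0, 1)
p = direction_lp(g, Ae)
print(p, g @ p < 0, Ae @ p >= 0)           # a usable (g^T p < 0) feasible (Ae p >= 0) direction
```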


As an alternative, we now consider a method due to Rosen which seeks a search direction that may not always be locally "best", but which nevertheless can be computed with far fewer arithmetic operations. The method was originally developed for linear constraints, and we will consider this case below. Later, it was extended to include the case where nonlinearities are present [18].

There is no loss of generality in restricting our attention to problem formulations wherein only inequality constraints are imposed. Rosen's procedure works only with the set of active constraints at the various feasible points generated by the search. Regardless of the status of the inequality constraints at a trial point, by definition all equality constraints are binding. Consequently, any procedure developed for active inequality constraints can be easily extended to include equalities. Therefore, we will again consider the problem of minimizing l subject to the linear inequality constraints (2.2). Also, we will assume again that an initial feasible point y₀ is available. If a feasible point does not exist, it can always be determined using Phase I of the simplex procedure described in Part II of this series.

Let us consider the determination of trial points according to the recursion (2.1)

y_{i+1} = y_i + k_i p_i,    k_i > 0

where p_i must be a feasible, usable direction (i.e. it must satisfy (2.4) and (2.5)). For unconstrained minimization problems, we considered the steepest descent search procedure in which the negative gradient, −g_i, was used as the search direction. For this problem, the negative gradient is seen to be a usable direction since g_i^T p_i = −g_i^T g_i < 0 and (2.5) is satisfied. If y_i is in the interior of the admissible region (i.e. no constraints are active), then −g_i must also be a feasible direction. If, on the other hand, y_i is on the boundary so that some constraints are active, then −g_i often will not represent a feasible direction. Rosen proposed a method using the idea of orthogonal projections (see Section 2.1) that provides a systematic transformation of the gradient in order to obtain a feasible direction.

Suppose at a trial point y_i that there are q active constraints. Let P_q be the matrix† that projects any vector onto the subspace defined by the active constraints. Then, if the gradient evaluated at y_i is projected onto the constraint region, it follows from the Kuhn-Tucker conditions (see Part I) that y_i is the minimizing solution if and only if

P_q g_i = 0    (2.19)

and

(A_e A_e^T)^{-1} A_e g_i ≥ 0.    (2.20)

The condition (2.19) follows by considering the Lagrangian for the problem

L = l − λ^T (A_e y − b_e)

where we only need to consider active constraints. Then, the stationary point condition L_y = 0 yields

g − A_e^T λ = 0.    (2.21)

†Following the discussion in Section 2.1, let A_e denote the matrix formed from the rows of A corresponding to active constraints. Then, the matrix P_q ≜ I − A_e^T (A_e A_e^T)^{-1} A_e projects any vector y onto the subspace defined by the active constraints.


Assuming that A_e has maximal rank, it follows that λ is given by

λ = (A_e A_e^T)^{-1} A_e g.    (2.22)

Using this, (2.21) can be rewritten as

[I − A_e^T (A_e A_e^T)^{-1} A_e] g = 0.

But from (2.9) and the definition of the projection matrix, this is precisely the condition (2.19). The Kuhn-Tucker conditions also require that the Lagrange multipliers must be nonnegative at the minimizing solution. From this result and (2.22), we have the condition (2.20).

Using (2.19) and (2.20) we are now in a position to define a computational procedure for determining the minimum of a cost function subject to linear inequality constraints. First, let us consider a trial point at which either or both of (2.19) and (2.20) are not satisfied. If we define a search direction p_i as

p_i ≜ −P_q g_i,    (2.23)

it follows that p_i must provide a feasible direction. This follows from the fact that P_q represents the projection matrix associated with the active constraints at y_i. Therefore p_i is contained in the subspace defined by the constraints. It is also true that p_i defines a usable direction. This is verified by writing

g_i^T p_i = (g_i^T + p_i^T − p_i^T) p_i.

But it follows from (2.22) and (2.23) that

g_i + p_i = A_e^T λ

and we see that

(g_i^T + p_i^T) p_i = λ^T A_e p_i = 0.

Consequently, it follows that

g_i^T p_i = −p_i^T p_i < 0

and the direction p_i given by (2.23) must be usable.

Consider a trial point y_i and suppose that either (2.19) or (2.20) is not satisfied. First, suppose that (2.19) is violated.

Case 1. P_q g_i ≠ 0.

This implies that the cost can be reduced within the linear manifold defined by the active constraints A_e. Thus, determine a new trial point according to

y_{i+1} = y_i − k_i P_q g_i.    (2.24)

The point y_{i+1} lies in D, and k_i is chosen so that either (1) the minimum is located in the search direction or (2) another constraint becomes active. Note that this step may cause one or more constraints to become active. If this occurs, the projection matrix must be redetermined at y_{i+1} before continuing the search. Rather than recomputing the matrix P_q and carrying out the required matrix inversion, it is convenient to compute the new projection matrix recursively from P_q. The ease with which the projection matrix can be redetermined represents one advantage of this computational scheme. For explicit discussion of this procedure, see Ref. [17].

Case 2. P_q g_i = 0 but λ_j < 0 for at least one j.

In this case, the cost can be reduced by causing the constraints associated with negative λ_j to become inactive. Choose one of the constraints and eliminate it from the projection matrix to obtain P_{q−1}. For example, the constraint for which ‖a_j‖ λ_j is most negative might be chosen.


Without loss of generality, suppose that a_q is selected. Then P_{q−1} a_q ≠ 0, since a_q is linearly independent of a₁, a₂, ..., a_{q−1}. Then

P_{q−1} g_i = λ_q P_{q−1} a_q ≠ 0.

In the new manifold D, the projection of the gradient is nonzero, so Case 1 can be repeated. Note that

a_q^T y_{i+1} − b_q = a_q^T y_i − b_q − k_i a_q^T P_{q−1} g_i = −k_i a_q^T P_{q−1} g_i = −k_i λ_q a_q^T P_{q−1} a_q ≥ 0

since k_i > 0 and λ_q < 0. Thus, the qth constraint, which was active at y_i, is not violated in going to y_{i+1}. In Case 2 it is seen that a new projection matrix P_{q−1} must be formed at y_i. Thus, in either case, the necessity for redetermining the projection matrix exists and provision for its computation is an important part of the procedure. In fact, the only circumstance for which this matrix does not require redetermination occurs in Case 1 when the minimum in the search direction occurs before a new constraint is encountered. Let us now summarize the details of Rosen's gradient projection search procedure.

Summary of Rosen's gradient projection procedure
1. To start the procedure, some feasible point y₁ must be determined. Since the constraints are linear, Phase I of the simplex procedure could be used. Compute g₁.
2. Suppose there are q active constraints at y₁ and the vectors, say a₁, a₂, ..., a_q, associated with these constraints are linearly independent. Form the projection matrix P_q and the matrix (A_e A_e^T)^{-1}.
3. Consider an arbitrary stage of the procedure, say i, and suppose that y_i, g_i, P_q, (A_e A_e^T)^{-1} are given. Then, we define the following general steps.
4. Compute P_q g_i and

λ = (A_e A_e^T)^{-1} A_e g_i.

5. If P_q g_i = 0 and λ ≥ 0, the solution has been found. Terminate the search.
6. If λ < 0, let p_i = g_i and go to 9. (All active constraints become inactive.)
7. If P_q g_i ≠ 0, let p_i = P_q g_i and go to 9.
8. If P_q g_i = 0 and some λ_j < 0, select that j for which λ_j is most negative and form P_{q−1}. Let p_i = P_{q−1} g_i and go to 9.
9. Determine k' as the smallest value of the step-size parameter that causes a previously inactive constraint to become active.
10. Determine k, 0 ≤ k ≤ k', that minimizes the cost function in the search direction. Let this be k_i.
11. Form y_{i+1} = y_i − k_i p_i and compute g_{i+1}.
12. If k_i = k', add the jth constraint to the set of active constraints by forming P_{q+1}. Set q = q + 1 and i = i + 1 and return to 4.
13. If k_i < k', no change in P_q is required. Let i = i + 1 and return to Step 4.
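The summary can be prototyped directly. The sketch below is a simplified rendering under stated assumptions: it recomputes the projection matrix from scratch rather than updating it recursively, replaces the line search of Step 10 with a coarse grid, and omits Step 6, so it illustrates the idea rather than reproducing the procedure of Ref. [17].

```python
import numpy as np

def rosen_step(l_fun, grad_l, A, b, y, tol=1e-9):
    """One simplified pass of Steps 4-11 for: minimize l(y) subject to A y >= b."""
    n = len(y)
    g = grad_l(y)
    Ae = A[np.abs(A @ y - b) <= tol]                  # active constraint rows
    if Ae.shape[0]:
        M = np.linalg.inv(Ae @ Ae.T)
        Pq = np.eye(n) - Ae.T @ M @ Ae                # projection matrix of Section 2.1
        lam = M @ Ae @ g                              # multiplier estimate, cf. (2.22)
    else:
        Pq, lam = np.eye(n), np.empty(0)

    p = Pq @ g                                        # Step 7: projected gradient
    if np.linalg.norm(p) <= tol:                      # Steps 5 and 8
        if lam.size == 0 or np.all(lam >= -tol):
            return y, True                            # Kuhn-Tucker point found
        Ae = np.delete(Ae, int(np.argmin(lam)), axis=0)
        if Ae.shape[0]:
            Pq = np.eye(n) - Ae.T @ np.linalg.inv(Ae @ Ae.T) @ Ae
        else:
            Pq = np.eye(n)
        p = Pq @ g

    k_max = np.inf                                    # Step 9: keep inactive constraints valid
    for a_i, b_i in zip(A, b):
        slope = a_i @ p
        if slope > tol:
            k_max = min(k_max, (a_i @ y - b_i) / slope)

    ks = np.linspace(0.0, min(k_max, 1e3), 50)        # Step 10: crude line-search stand-in
    k = ks[int(np.argmin([l_fun(y - kk * p) for kk in ks]))]
    return y - k * p, False                           # Step 11
```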


For linear programming problems, this procedure, when Step 6 is omitted, yields a search that is identical with that of the simplex procedure if y₁ is a basic feasible point. Thus, convergence in a finite number of steps can be assured for linear programming problems. For nonlinear cost functions, finite convergence cannot, as a rule, be guaranteed.

2.4 Variable metric methods for linear constraints
The Zoutendijk and Rosen procedures described in the two previous sections compute a search direction based on information concerning function derivatives up to order one. As such, we can consider them to be extensions to the linearly constrained case of the steepest descent method for unconstrained problems. In the discussion of Part III of this sequence of articles, it was pointed out that the steepest descent method generally exhibits slow convergence properties and, as a consequence, algorithms which make use of conjugate direction properties and information concerning the cost function Hessian matrix (or its inverse) have received considerable attention. It would seem natural to consider the extension of these procedures to the linearly constrained problem in the hope that the more favorable convergence properties might be retained.

As an illustration, let us consider variable metric or, as they are commonly known, quasi-Newton methods. In particular, recall from Part III the Davidon-Fletcher-Powell (D-F-P) algorithm. The method involved the following salient features:
1. The search direction at iteration i is determined according to

p_i = −H_i g_i    (2.25)

where H_i is the ith approximation to the deflection matrix and g_i is the gradient vector.
2. Given a point y_i, the search direction p_i, and a step length k_i determined by minimizing l(y) along p_i, we compute y_{i+1} = y_i + k_i p_i.
3. Given H_i, p_i, g_i, and the cost function gradient at y_{i+1}, g_{i+1}, the (i + 1)st deflection matrix is computed according to

H_{i+1} = H_i − (H_i γ_i γ_i^T H_i)/(γ_i^T H_i γ_i) + (k_i p_i p_i^T)/(p_i^T γ_i)    (2.26)

where

γ_i ≜ g_{i+1} − g_i.    (2.27)

Then, a new search direction p_{i+1} may be computed using (2.25) for i = i + 1.
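A direct transcription of the D-F-P recursion is sketched below. It is written in terms of the actual step s = y_{i+1} − y_i (= k_i p_i) and the gradient change of (2.27), which is one standard way of expressing the rank-two correction in (2.26).

```python
import numpy as np

def dfp_update(H, y, y_next, g, g_next):
    """Davidon-Fletcher-Powell deflection-matrix update, cf. (2.26)-(2.27)."""
    s = y_next - y                             # actual step, s = k_i p_i
    gamma = g_next - g                         # (2.27)
    Hg = H @ gamma
    return (H
            + np.outer(s, s) / (s @ gamma)     # rank-one term built from the step
            - np.outer(Hg, Hg) / (gamma @ Hg)) # removes the H gamma gamma^T H component

def dfp_direction(H, g):
    return -H @ g                              # (2.25)
```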

Extension to linear equalities
Consider briefly the extension of this algorithm to problems involving m linear equality constraints, i.e. minimize l(y) subject to

f(y) = A₁y − b₁ = 0.

If the iterative search procedure is to be feasible, we know from (2.4) that A₁p_i = 0. Davidon observed that if the initial deflection matrix H₀ were orthogonal to the constraint normals, i.e.

A₁H₀ = 0

and subsequent H_i matrices were computed according to (2.26), then

A₁H_i = 0,    i = 1, 2, ....    (2.28)

From (2.25), this means A₁p_i = −A₁H_i g_i = 0 and implies that the constraints will be satisfied at all trial points. Consequently, to modify the D-F-P algorithm for linear equality constraints we need to (1) choose H₀ such that A₁H₀ = 0 and (2) choose y₀ such that A₁y₀ = b₁ (i.e. to be feasible). To preserve the convergence properties of the D-F-P algorithm which apply in the unconstrained case, it is also necessary for H₀ to be positive definite for all non-zero vectors x such that A₁x = 0. A suitable initial approximation satisfying these requirements is to choose H₀ to be the projection matrix defined by (2.9). Recalling that all equality constraints are active at y₀, we let

H₀ = I − A₁^T [A₁A₁^T]^{-1} A₁    (2.29)

where it is assumed A₁ has full row rank. The matrix H₀ can also be used to generate y₀. We set

y₀ = H₀w + A₁^T [A₁A₁^T]^{-1} b₁    (2.30)

where w is an arbitrary n-dimensional vector. Notice that

A₁y₀ = A₁H₀w + A₁A₁^T [A₁A₁^T]^{-1} b₁.    (2.31)

However, A₁H₀ = 0 and any square matrix multiplied by its inverse is the identity matrix. Therefore, (2.31) reduces to

A₁y₀ = 0w + Ib₁ = b₁.
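The properties claimed for (2.29) and (2.30) are easy to verify numerically; the constraint data in the sketch below are invented for illustration.

```python
import numpy as np

A1 = np.array([[1.0, 1.0, 0.0],
               [0.0, 1.0, -1.0]])                    # m = 2 equality constraints A1 y = b1
b1 = np.array([2.0, 0.5])
n = A1.shape[1]

M = np.linalg.inv(A1 @ A1.T)
H0 = np.eye(n) - A1.T @ M @ A1                       # (2.29): initial deflection matrix
w = np.array([0.3, -1.2, 0.7])                       # arbitrary n-dimensional vector
y0 = H0 @ w + A1.T @ M @ b1                          # (2.30): feasible starting point

print(np.allclose(A1 @ H0, 0.0))                     # True, so A1 p_i = -A1 H_i g_i = 0
print(np.allclose(A1 @ y0, b1))                      # True, so y0 satisfies A1 y = b1
```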

Extension to inequality constraints Goldfarb[9] extended the D-F-P algorithm to problems involving linear inequality constraints by utilizing techniques similar to the projected gradient idea of Rosen's method. The procedure is stated immediately below. Goldfarb's projected conjugate gradient procedure I. To start the procedure, some feasible point y, must be determined. Since the constraints are linear, Phase I of the simplex procedure could be used. Compute g,. 2. Suppose there are q active constraints and the vectors a,, a2. . . . . a~ associated with these constraints are linearly independent. Form the projection matrix Pq and the matrix (AeAer) -~. To start the search, let H,' _a P~. 3. Suppose that y,, g,, Hq', Pq, (A,A,T) -' are given and follow the following general steps. 4. Compute H,'gl and

A = (A,Aer)-lAerg,. For convenience, suppose .,qeqq ~-an ~ Ad3ff,2, i = 1, 2, .. ., q - 1 where/3, is the ith diagonal element of (AeA,r) -t. Note that/3, > 0. 5. If Hq'g, = 0 and A >- 0, the minimum has been found. Terminate the search. 6. If IIH g, II> max{0, --l/2Aqfl~/2}, let p, = H,/g, and go to 8.


7. If IIHqg,II- 0 Dy>y where a is an n-dimensional vector, D is a (p - 1) × n matrix, y is a (p - 1) dimensional vector and h is a concave, nonlinear function. Conversion to the form defined by (3.3) is straightforward; e is defined as before and (3.9)

S ___a{x: h(y)-> 0, D y -> y, w -> ary}. &

Step 0. Define P₀ = {x: Dy ≥ γ, w ≥ α^T y}. Clearly S ⊂ P₀ since the one nonlinear constraint has been removed. Set k = 0.
Step 1. Solve the linear program (3.5). Call the solution x_k^T = (w_k, y_k^T).
Step 2. If y_k satisfies the constraint h(y_k) ≥ 0, terminate. Otherwise, go to Step 3.
Step 3. If h(y_k) < 0, linearize h(y) about y_k using a first-order Taylor-series expansion

ĥ(y) ≜ h(y_k) + (∂h/∂y)|_{y_k} [y − y_k].    (3.10)

Then, define†

H_k ≜ {y: ĥ(y) ≥ 0}.    (3.11)

Comparing with (3.6), it is clear that

â_k^T = (∂h/∂y)|_{y_k},    b̂_k = (∂h/∂y)|_{y_k} y_k − h(y_k).

Letting P_{k+1} = P_k ∩ H_k, the algorithm returns to Step 1, setting k = k + 1 therein. To illustrate the procedure, consider the following example problem due to Kelley.
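Before turning to the example, the loop below sketches the cutting-plane iteration for an invented instance with a linear cost and a single concave constraint, using a library linear-programming routine for Step 1. It illustrates the scheme only; the formulation (3.3)-(3.6) of the text additionally carries the cost into the auxiliary variable w.

```python
import numpy as np
from scipy.optimize import linprog

# Invented test problem: minimize -y1 - y2 subject to the concave constraint
# h(y) = 1 - y1**2 - y2**2 >= 0, with simple bounds standing in for D y >= gamma.
c = np.array([-1.0, -1.0])
h = lambda y: 1.0 - y[0]**2 - y[1]**2
grad_h = lambda y: np.array([-2.0 * y[0], -2.0 * y[1]])

A_cuts, b_cuts = [], []                         # accumulated half-spaces H_k
y_k = None
for k in range(25):
    res = linprog(c,
                  A_ub=np.array(A_cuts) if A_cuts else None,
                  b_ub=np.array(b_cuts) if b_cuts else None,
                  bounds=[(-2.0, 2.0)] * 2)     # Step 1: solve the current LP over P_k
    y_k = res.x
    if h(y_k) >= -1e-6:                         # Step 2: y_k already satisfies h(y) >= 0
        break
    a_hat = grad_h(y_k)                         # Step 3: linearize h about y_k, cf. (3.10)
    b_hat = a_hat @ y_k - h(y_k)
    A_cuts.append(-a_hat)                       # a_hat^T y >= b_hat  ->  -a_hat^T y <= -b_hat
    b_cuts.append(-b_hat)

print(y_k, h(y_k))                              # approaches (0.707, 0.707) on the boundary
```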

Example Minimize I -- y~ - y2 tThis definition of H~ satisfies the desired property yE~ HE because /~,,(yE)-0 also satisfies the constraint/*~(y)->O.


subject to h(yl, Y2) = 3yl 2 - 2yly2 + y22- 1 -0. Note that the optimal solution occurs at y , = (5.27~ \3.68]" To start the iterations, let y, = (~). At this feasible point, l(y0 = - 6 9 h~(y,) = 7 h2(y0 = 4 - e ~. The linearized problem is defined by, [(y) = - 8 y l - 4y2 - 25 /~l(y) = - 2 y l - 4y2 + 27 -> 0 /~2(y) = 0.28y, + y2 - 2.84 -> 0 /~3(y) = y, -> 0 /z,(Y) = y2 -> 0. The original problem and these linearized constraints are depicted in Fig. 7.

Fig. 7. MAP example.

To insure that the next trial point is not excessively far from y₁, we set δ₁^T = (0.5, 0.5), thereby imposing the constraints

|y − y₁| ≤ δ₁.

The solution of the resulting LP problem is found to be

ŷ^T = (4.5, 2.5)

where

l(ŷ) = −70.65,    h₁(ŷ) = 7.75,    h₂(ŷ) = 9.5 − e^(...) = 5.02.

Since ŷ is feasible for the original problem, we set y₂ = ŷ. Relinearizing about y₂ = ŷ, we obtain the problem defined by

l̂ = −7y₁ − 5y₂ − 36
ĥ₁ = 31.4 − 3y₁ − 4y₂ ≥ 0
ĥ₂ = 5.42 − 1.98y₁ + 1.5y₂ ≥ 0
ĥ₃ = y₁ ≥ 0
ĥ₄ = y₂ ≥ 0.

We decrease δ as well by setting δ₂^T = (0.4, 0.4). The solution of the new LP problem is

ŷ^T = (5.45, 3.58).

This is a reasonable approximation of the actual solution. The iterations could be continued until a more accurate solution is obtained.

The method of approximate programming provides a reasonable approach to the solution of a nonlinear programming problem. The behavior of the algorithm is often characterized by a sequence of small steps and a large number of iterations. Of course, the rate of convergence is strongly dependent upon the manner in which the step-size parameters δ_k are determined. It should be mentioned that in recent years, the MAP method has been revived by several authors. The papers of Beale [26] and Meyer [27] are recommended. It has come into wider use as


a tool for solving very large nonlinear problems. Buzby [28], for example, reports its use to solve problems with 2200 constraints and 200 nonlinear variables in computational times of a few hours. Also, the notion of bounding the range of the variables has been extracted from MAP by Marsten, Hogan and Blankenship [29] to enhance the performance of cutting plane procedures.

3.3 The method of separable programming [30]
A function f of n variables y^T = (y₁, y₂, ..., y_n) is said to be separable if it can be written as the sum of n functions of which each involves only one of the variables:

f(y) = f₁(y₁) + f₂(y₂) + ··· + f_n(y_n).    (3.15)

When the nonlinearities involve only separable functions, the approximation problem is simplified because each scalar function can be approximated separately. Let us consider a means of approximating a scalar function such as depicted in Fig. 8.

Fig. 8. Piecewise linear approximations for separable programming.

Suppose a scalar function of a scalar variable is approximated by defining a piecewise linear approximation. To obtain this representation, we define a grid of points y_i. Consider two adjacent grid points y_i and y_{i+1} and the associated function values f_i ≜ f(y_i); f_{i+1} ≜ f(y_{i+1}). The straight line connecting these points can be written as

f̂(y) = α_i f_i + α_{i+1} f_{i+1}

where 0 - O, 1 = 1, 2 . . . . . p t~l

(3.20)

t=l

The symbol hr, denotes the value of the ith term of the Ith constraint equation at the ]th grid point. The coefficients a,, satisfy the conditions introduced above. sI

O-0 if h~(y)< 0.

These choices for e and p can be shown to satisfy the properties required for (4.7) to be valid. Other suitable choices include

p_f(y) = Σ_{i=1}^{m} |f_i(y)|    (4.11)

p_h(y) = Σ_{i=1}^{p} [(h_i(y) − |h_i(y)|)/2]².    (4.12)


It must be emphasized that (4.8)-(4.12) represent convenient choices for e, p_f and p_h, but other definitions are possible. The appropriate choice of an exterior-penalty function is problem dependent. No "best" definitions of these functions are known. In addition, the convergence properties of any penalty function method depend not only upon the forms assigned to e, p_f and p_h but also upon the initial choice of r₁ and upon the manner in which the sequence {r_k} is prescribed. These and other questions related to the computational implementation of penalty function methods are discussed later in this section.
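As an illustration of the exterior-point idea, the sketch below minimizes a penalized cost of the form l(y) + (1/r_k)[p_f(y) + p_h(y)] for a decreasing sequence {r_k}, with p_f and p_h patterned after (4.11) and (4.12). The test problem, the 1/r_k weighting and the use of a derivative-free library minimizer are assumptions made for the example only.

```python
import numpy as np
from scipy.optimize import minimize

# Invented test problem: minimize l(y) = (y1 - 2)^2 + (y2 - 1)^2
# subject to f(y) = y1 + y2 - 2 = 0 and h(y) = y1 >= 0.
l = lambda y: (y[0] - 2.0)**2 + (y[1] - 1.0)**2
f = lambda y: np.array([y[0] + y[1] - 2.0])
h = lambda y: np.array([y[0]])

def p_f(y):                      # cf. (4.11): absolute equality-constraint violation
    return np.sum(np.abs(f(y)))

def p_h(y):                      # cf. (4.12): only violated inequalities are penalized
    hi = h(y)
    return np.sum(((hi - np.abs(hi)) / 2.0)**2)

y = np.zeros(2)                  # exterior methods may start from an infeasible point
for r in [1.0, 0.1, 0.01, 0.001]:                    # decreasing controlling parameters r_k
    T = lambda y: l(y) + (1.0 / r) * (p_f(y) + p_h(y))
    y = minimize(T, y, method="Nelder-Mead").x       # unconstrained subproblem
    print(r, y, np.abs(f(y))[0])                     # constraint violation shrinks with r
```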

Interior-point methods
Unlike the exterior-point methods described above, interior-point (or barrier) methods require that every trial point be feasible. The constraints are included such that a "barrier" is formed that prevents a search from departing from the admissible region. It is generally true that interior-point methods can be used only for problems involving inequality constraints. Thus, in this section, the problem of minimizing a cost function l subject only to inequality constraints h(y) ≥ 0 is considered. An interior-point penalty function can be defined in the following general manner. First, let R° represent the interior of the admissible region

R° = {y: h(y) > 0}.

Let d(r) be a scalar function of the scalar variable r with the property that if r₁ > r₂ > 0, then d(r₁) > d(r₂) > 0. Furthermore, if the sequence {r_k} vanishes (i.e. lim_{k→∞} r_k = 0), then lim_{k→∞} d(r_k) = 0. Next, let b(y) be a scalar function of y with the following properties: (1) b(y) is continuous in R° but is undefined on the boundary of the feasible region, (2) b(y) ≥ 0, and (3) if {y_k} is a sequence of points in R° converging to y∞ where h_i(y∞) = 0 for some i, then lim_{k→∞} b(y_k) = ∞. Property (3) indicates that as a constraint boundary is neared, the function b becomes unbounded. The cost function including the interior-point penalty function is written as

T(y, r) = l(y) + d(r) b(y).    (4.13)
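A small numerical rendering of (4.13) with the logarithmic choice of b(y) is given below; the test problem, the choice d(r) = r and the library minimizer are assumptions made for the illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Invented test problem: minimize l(y) = (y1 + 1)^2 + y2^2
# subject to h1(y) = y1 >= 0 and h2(y) = y2 - y1 >= 0 (solution at the vertex y = 0).
l = lambda y: (y[0] + 1.0)**2 + y[1]**2
h = lambda y: np.array([y[0], y[1] - y[0]])

def b(y):                                    # barrier term: finite only in the interior R0
    hy = h(y)
    return np.inf if np.any(hy <= 0.0) else -np.sum(np.log(hy))

y = np.array([1.0, 3.0])                     # strictly feasible starting point in R0
for r in [1.0, 0.1, 0.01, 0.001]:            # r_k -> 0, with d(r) = r
    T = lambda y: l(y) + r * b(y)            # composite cost (4.13)
    y = minimize(T, y, method="Nelder-Mead").x
    print(r, y, h(y).min())                  # iterates stay feasible, approach the boundary
```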

Any function that satisfies the conditions stated above for the interior-point penalty function can be used. Commonly, b is chosen either as

b(y) = Σ_{i=1}^{p} 1/h_i(y)

or as

b(y) = −Σ_{i=1}^{p} ln h_i(y).

The general procedure for solving the problem of minimizing l subject only to the inequality constraints h(y) ≥ 0 using an interior-point penalty method can be defined in the following manner.

Step 0. Choose a sequence of controlling parameter values {r_k}, k = 1, 2, ..., such that for each k

r_k > r_{k+1}

and

lim_{k→∞} r_k = 0

(e.g. r_k = 10^{−k}). Find an initial point y₀ ∈ R°.

Step 1. Solve the problem: minimize T(y, r_k), where T is now defined by (4.13). Call the solution y*(r_k) = y_k. Since an infinite penalty is


assigned to satisfying a constraint with equality, the resulting solution should be in R°. The discrete nature of most search procedures could permit a barrier to be "jumped", but the appropriate choice of r_k should make this possibility small. Nonetheless, no trial point should be accepted without verifying that it is in R°.

Step 2. Test for convergence. Typically, the same criteria discussed for exterior-point methods are used. If the test is not satisfied, return to Step 1 with k = k + 1 and start the new iterative search at the current solution point.

To start a search procedure using the interior-point penalty function, it is necessary to first define a feasible initial trial point. The interior-point penalty function can be used to generate this initial feasible point if no such point can be determined otherwise. Suppose that a trial point y₀ is available but that it violates some constraints. Without loss of generality, suppose that the first s constraints h₁(y), h₂(y), ..., h_s(y) are violated whereas the remaining p − s constraints are satisfied. That is, let

h_i(y₀) ≤ 0,    i = 1, ..., s
h_i(y₀) > 0,    i = s + 1, ..., p.

To determine a feasible point, the problem of maximizing Σ_{i=1}^{s} h_i(y) subject to the constraints h_i(y) > 0, i = s + 1, ..., p, is considered. An interior-point cost function is defined as

T(y, r_k) = −Σ_{i=1}^{s} h_i(y) + r_k Σ_{i=s+1}^{p} 1/h_i(y)

and a solution is determined for k = 1. If a constraint is satisfied at the solution y*, the problem is reformulated by including these constraints in the barrier function and the problem is restarted with k = k + 1, rk+, < rk. The procedure continues until all constraints are satisfied at which time a feasible point has been obtained for the original problem. If it is not possible to obtain a solution (i.e rk-->0) that leads to a negative cost (i.e. - E

h, 0}. Then, the following is true.

Convergence properties (1)

T(y$+~, rk+,)