College of Management, NCTU
Operations Research II, Spring 2009
Jin Y. Wang
Chap12 Nonlinear Programming

General Form of Nonlinear Programming Problems
Max f(x)
S.T. gi(x) ≤ bi for i = 1, …, m
     x ≥ 0
• No algorithm is available that will solve every specific problem fitting this format.

An Example – The Product-Mix Problem with Price Elasticity
• The amount of a product that can be sold has an inverse relationship to the price charged. That is, demand and price are related by a downward-sloping demand curve: the price p(x) at which x units can be sold decreases as x increases.
• The firm's profit from producing and selling x units is the sales revenue xp(x) minus the production costs. That is, P(x) = xp(x) – cx.
• If each of the firm's products has a similar profit function, say, Pj(xj) for producing and selling xj units of product j, then the overall objective function is
  f(x) = Σ_{j=1}^{n} Pj(xj),
a sum of nonlinear functions.
• Nonlinearities also may arise in the gi(x) constraint functions.
An Example – The Transportation Problem with Volume Discounts
• Determine an optimal plan for shipping goods from various sources to various destinations, given supply and demand constraints.
• In actuality, the shipping costs may not be fixed. Volume discounts sometimes are available for large shipments, which causes the shipping cost to be a piecewise linear function of the amount shipped.
Graphical Illustration of Nonlinear Programming Problems
Max Z = 3x1 + 5x2
S.T. x1 ≤ 4
     9x1² + 5x2² ≤ 216
     x1, x2 ≥ 0
• The optimal solution is no longer a CPF (corner-point feasible) solution. (Sometimes it is; sometimes it isn't.) But it still lies on the boundary of the feasible region.
  ➢ We no longer have the tremendous simplification used in LP of limiting the search for an optimal solution to just the CPF solutions.
• What if the constraints are linear but the objective function is not?
Max Z = 126x1 – 9x1² + 182x2 – 13x2²
S.T. x1 ≤ 4
     2x2 ≤ 12
     3x1 + 2x2 ≤ 18
     x1, x2 ≥ 0
• What if we change the objective function to 54x1 – 9x1² + 78x2 – 13x2²?
• The optimal solution then lies inside the feasible region.
• That means we cannot focus only on the boundary of the feasible region; we need to examine the entire feasible region.
• A further complication: a local optimum need not be a global optimum.
• Nonlinear programming algorithms generally are unable to distinguish between a local optimum and a global optimum.
• It is therefore desirable to know the conditions under which any local optimum is guaranteed to be a global optimum. If a nonlinear programming problem has no constraints, the objective function being concave (convex) guarantees that a local maximum (minimum) is a global maximum (minimum).
• What is a concave (convex) function?
• A function that is always "curving downward" (or not curving at all) is called a concave function.
• A function that is always "curving upward" (or not curving at all) is called a convex function.
• A function that curves upward in some places and downward in others is neither concave nor convex.
Definition of concave and convex functions of a single variable
• A function of a single variable f(x) is a convex function if, for each pair of values of x, say, x' and x'' (x' < x''),
  f[λx'' + (1 – λ)x'] ≤ λf(x'') + (1 – λ)f(x')
for all values of λ such that 0 < λ < 1.
• It is a strictly convex function if ≤ can be replaced by <. It is a concave (strictly concave) function if this statement holds with ≤ replaced by ≥ (by > for the strictly concave case).
• The geometric interpretation of concave and convex functions: the line segment joining any two points on the graph of a convex function never lies below the graph, and for a concave function it never lies above the graph.
How to judge whether a single-variable function is convex or concave
• Consider any function of a single variable f(x) that possesses a second derivative at all possible values of x. Then f(x) is
  ➢ convex if and only if d²f(x)/dx² ≥ 0 for all possible values of x;
  ➢ concave if and only if d²f(x)/dx² ≤ 0 for all possible values of x.
How to judge whether a two-variable function is convex or concave
• If the derivatives exist, the following table can be used to determine whether a two-variable function is convex or concave (each condition must hold for all possible values of x1 and x2).

  Quantity                                   | Convex | Concave
  (∂²f/∂x1²)(∂²f/∂x2²) – [∂²f/(∂x1∂x2)]²     | ≥ 0    | ≥ 0
  ∂²f/∂x1²                                   | ≥ 0    | ≤ 0
  ∂²f/∂x2²                                   | ≥ 0    | ≤ 0
• Example: f(x1, x2) = x1² – 2x1x2 + x2² (checked in the sketch below)
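As a quick illustration of how the table above can be applied (my own addition, not part of the original notes, and assuming sympy is available), the following Python sketch computes the three second-derivative quantities for this example.

```python
# Hedged sketch: checking the two-variable convexity test with sympy.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**2 - 2*x1*x2 + x2**2

fxx = sp.diff(f, x1, 2)                 # ∂²f/∂x1²   -> 2
fyy = sp.diff(f, x2, 2)                 # ∂²f/∂x2²   -> 2
fxy = sp.diff(f, x1, x2)                # ∂²f/∂x1∂x2 -> -2
det = sp.simplify(fxx*fyy - fxy**2)     # 2*2 - (-2)² = 0

print(fxx, fyy, fxy, det)
# All three quantities satisfy the "Convex" column (2 ≥ 0, 2 ≥ 0, 0 ≥ 0),
# so f is convex (but not strictly convex, since the determinant is 0).
```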
How to judge whether a function of more than two variables is convex or concave
• The sum of convex functions is a convex function, and the sum of concave functions is a concave function.
• Example: f(x1, x2, x3) = 4x1 – x1² – (x2 – x3)² = [4x1 – x1²] + [–(x2 – x3)²], a sum of two concave functions, so f is concave.
If there are constraints, then one more condition will provide the guarantee, namely, that the feasible region is a convex set.

Convex set
• A convex set is a collection of points such that, for each pair of points in the collection, the entire line segment joining these two points is also in the collection.
• In general, the feasible region for a nonlinear programming problem is a convex set whenever all the gi(x) (for the constraints gi(x) ≤ bi) are convex functions.
Max Z = 3x1 + 5x2
S.T. x1 ≤ 4
     9x1² + 5x2² ≤ 216
     x1, x2 ≥ 0
• What happens when just one of these gi(x) is a concave function instead?
Max Z = 3x1 + 5x2
S.T. x1 ≤ 4
     2x2 ≤ 14
     8x1 – x1² + 14x2 – x2² ≤ 49
     x1, x2 ≥ 0
  ➢ The feasible region is not a convex set.
  ➢ Under this circumstance, we cannot guarantee that a local maximum is a global maximum.

Condition for local maximum = global maximum (with gi(x) ≤ bi constraints)
• To guarantee that a local maximum is a global maximum for a nonlinear programming problem with constraints gi(x) ≤ bi and x ≥ 0, the objective function f(x) must be a concave function and each gi(x) must be a convex function.
• Such a problem is called a convex programming problem.

One-Variable Unconstrained Optimization
• The differentiable function f(x) to be maximized is concave.
• The necessary and sufficient condition for x = x* to be optimal (a global max) is df/dx = 0 at x = x*.
• It is usually not easy to solve the above equation analytically.
• The One-Dimensional Search Procedure:
  ➢ Finding a sequence of trial solutions that leads toward an optimal solution.
  ➢ Using the sign of the derivative to determine in which direction to move. A positive derivative indicates that x* is greater than the current x, and vice versa.
The Bisection Method
• Initialization: Select ε (error tolerance). Find an initial x̲ (lower bound on x*) and x̄ (upper bound on x*) by inspection. Set the initial trial solution x' = (x̲ + x̄)/2.
• Iteration:
  ➢ Evaluate df(x)/dx at x = x'.
  ➢ If df(x)/dx ≥ 0, reset x̲ = x'.
  ➢ If df(x)/dx ≤ 0, reset x̄ = x'.
  ➢ Select a new x' = (x̲ + x̄)/2.
• Stopping Rule: If x̄ – x̲ ≤ 2ε, so that the new x' must be within ε of x*, stop. Otherwise, perform another iteration.
• Example: Max f(x) = 12x – 3x⁴ – 2x⁶
  Iteration | df(x)/dx | x̲        | x̄       | New x'    | f(x')
  0         |          |          |         |           |
  1         |          |          |         |           |
  2         |          |          |         |           |
  3         | 4.09     | 0.75     | 1       | 0.875     | 7.8439
  4         | -2.19    | 0.75     | 0.875   | 0.8125    | 7.8672
  5         | 1.31     | 0.8125   | 0.875   | 0.84375   | 7.8829
  6         | -0.34    | 0.8125   | 0.84375 | 0.828125  | 7.8815
  7         | 0.51     | 0.828125 | 0.84375 | 0.8359375 | 7.8839
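The bisection iterations above are easy to reproduce in code. The following Python sketch is my own illustration, not part of the original notes; the starting bounds x̲ = 0, x̄ = 2 and the tolerance ε = 0.01 are assumptions consistent with the numbers in the table.

```python
# Hedged sketch of the bisection method for maximizing a concave f(x).

def f(x):
    return 12*x - 3*x**4 - 2*x**6

def df(x):
    return 12 - 12*x**3 - 12*x**5

def bisection_max(df, x_lo, x_hi, eps):
    while x_hi - x_lo > 2*eps:          # stopping rule: new x' is within eps of x*
        x_new = (x_lo + x_hi) / 2.0     # current trial solution
        if df(x_new) >= 0:              # positive derivative -> x* lies to the right
            x_lo = x_new
        else:                           # negative derivative -> x* lies to the left
            x_hi = x_new
    return (x_lo + x_hi) / 2.0

x_star = bisection_max(df, 0.0, 2.0, 0.01)
print(x_star, f(x_star))   # should approach x* ≈ 0.836 with f(x*) ≈ 7.88
```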
Newton's Method
• The bisection method converges slowly.
  ➢ It takes only the information of the first derivative into account.
• The basic idea is to approximate f(x) within the neighborhood of the current trial solution by a quadratic function and then to maximize (or minimize) the approximate function exactly to obtain the new trial solution.
• This approximating quadratic function is obtained by truncating the Taylor series after the second-derivative term:
  f(xi+1) ≈ f(xi) + f'(xi)(xi+1 – xi) + [f''(xi)/2](xi+1 – xi)²
• This quadratic function can be optimized in the usual way by setting its first derivative (with respect to xi+1) to zero and solving for xi+1. Thus,
  xi+1 = xi – f'(xi)/f''(xi).
• Stopping Rule: If |xi+1 – xi| ≤ ε, stop and output xi+1.
• Example: Max f(x) = 12x – 3x⁴ – 2x⁶ (same as the bisection example)
  ➢ xi+1 = xi – f'(xi)/f''(xi) = xi + (12 – 12xi³ – 12xi⁵)/(36xi² + 60xi⁴) = xi + (1 – xi³ – xi⁵)/(3xi² + 5xi⁴)
  ➢ Select ε = 0.00001, and choose x1 = 1.

  Iteration i | xi      | f(xi)  | f'(xi)  | f''(xi) | xi+1
  1           |         |        |         |         |
  2           |         |        |         |         |
  3           | 0.84003 | 7.8838 | -0.1325 | -55.279 | 0.83763
  4           | 0.83763 | 7.8839 | -0.0006 | -54.790 | 0.83762
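A small Python sketch of these Newton iterations (my own illustration; the starting point x1 = 1 and ε = 0.00001 follow the example above).

```python
# Hedged sketch of Newton's method for maximizing f(x) = 12x - 3x^4 - 2x^6.

def f(x):
    return 12*x - 3*x**4 - 2*x**6

def df(x):                      # first derivative
    return 12 - 12*x**3 - 12*x**5

def d2f(x):                     # second derivative
    return -36*x**2 - 60*x**4

def newton_max(x, eps=1e-5, max_iter=50):
    for _ in range(max_iter):
        x_next = x - df(x) / d2f(x)     # maximize the quadratic approximation
        if abs(x_next - x) <= eps:      # stopping rule
            return x_next
        x = x_next
    return x

x_star = newton_max(1.0)
print(x_star, f(x_star))   # converges to about 0.8376, f ≈ 7.8839
```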
Multivariable Unconstrained Optimization
• Usually, there is no analytical method for solving the system of equations given by setting the respective partial derivatives equal to zero.
• Thus, a numerical search procedure must be used.
The Gradient Search Procedure (for multivariable unconstrained maximization problems)
• The goal is to reach a point where all the partial derivatives are 0.
• A natural approach is to use the values of the partial derivatives to select the specific direction in which to move.
• The gradient at the point x = x' is ∇f(x) = (∂f/∂x1, ∂f/∂x2, …, ∂f/∂xn) evaluated at x = x'.
• The direction of the gradient is interpreted as the direction of the directed line segment from the origin to the point (∂f/∂x1, ∂f/∂x2, …, ∂f/∂xn), which is the direction of changing x that maximizes the rate of increase of f(x).
• However, normally it would not be practical to change x continuously in the direction of ∇f(x), because this series of changes would require continuously reevaluating the ∂f/∂xj and changing the direction of the path.
• A better approach is to keep moving in a fixed direction from the current trial solution, not stopping until f(x) stops increasing.
• That stopping point becomes the next trial solution, where the gradient is recalculated to determine the new direction in which to move.
  ➢ Reset x' = x' + t*∇f(x'), where t* is the positive value of t that maximizes f(x' + t∇f(x')).
• The iterations continue until ∇f(x) = 0 within a small tolerance ε.
Summary of the Gradient Search Procedure
• Initialization: Select ε and any initial trial solution x'. Go first to the stopping rule.
• Step 1: Express f(x' + t∇f(x')) as a function of t by setting xj = xj' + t(∂f/∂xj), with the partial derivative evaluated at x = x', for j = 1, 2, …, n, and then substituting these expressions into f(x).
• Step 2: Use the one-dimensional search procedure to find t = t* that maximizes f(x' + t∇f(x')) over t ≥ 0.
• Step 3: Reset x' = x' + t*∇f(x'). Then go to the stopping rule.
• Stopping Rule: Evaluate ∇f(x') at x = x'. Check whether |∂f/∂xj| ≤ ε for all j = 1, 2, …, n. If so, stop with the current x' as the desired approximation of an optimal solution x*. Otherwise, perform another iteration.

Example of multivariable unconstrained nonlinear programming
Max f(x) = 2x1x2 + 2x2 – x1² – 2x2²
  ∂f/∂x1 = 2x2 – 2x1,  ∂f/∂x2 = 2x1 + 2 – 4x2
We can verify that f(x) is concave (∂²f/∂x1² = -2 ≤ 0, ∂²f/∂x2² = -4 ≤ 0, and their product minus (∂²f/∂x1∂x2)² is 8 – 4 = 4 ≥ 0). Suppose we pick x' = (0, 0) as the initial trial solution. Then ∇f(0, 0) = (0, 2).
• Iteration 1: x = (0, 0) + t(0, 2) = (0, 2t), so f(x' + t∇f(x')) = f(0, 2t) = 4t – 8t². Maximizing over t ≥ 0 gives t* = 1/4, so the new trial solution is x' = (0, 1/2).
• Iteration 2: ∇f(0, 1/2) = (1, 0), so x = (0, 1/2) + t(1, 0) = (t, 1/2).
• Usually, we will use a table like the following for convenience.
  Iteration | x' | ∇f(x') | x' + t∇f(x') | f(x' + t∇f(x')) | t* | x' + t*∇f(x')
  1         |    |        |              |                 |    |
  2         |    |        |              |                 |    |
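For concreteness, here is a short Python sketch of the gradient search procedure applied to the example above (my own illustration, not part of the original notes). The exact line search is replaced by a simple golden-section search, and the step-size bound t in [0, 1] and tolerance ε = 0.01 are my own assumptions.

```python
# Hedged sketch of the gradient search procedure for
# f(x1, x2) = 2*x1*x2 + 2*x2 - x1**2 - 2*x2**2.
import numpy as np

def f(x):
    return 2*x[0]*x[1] + 2*x[1] - x[0]**2 - 2*x[1]**2

def grad(x):
    return np.array([2*x[1] - 2*x[0], 2*x[0] + 2 - 4*x[1]])

def line_search(phi, lo=0.0, hi=1.0, tol=1e-8):
    """Golden-section search maximizing a unimodal phi(t) on [lo, hi]."""
    gr = (np.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - gr * (b - a)
        d = a + gr * (b - a)
        if phi(c) > phi(d):
            b = d          # maximum lies in [a, d]
        else:
            a = c          # maximum lies in [c, b]
    return (a + b) / 2

def gradient_search(x, eps=0.01, max_iter=100):
    for _ in range(max_iter):
        g = grad(x)
        if np.all(np.abs(g) <= eps):                   # stopping rule
            return x
        t_star = line_search(lambda t: f(x + t*g))     # Step 2
        x = x + t_star * g                             # Step 3
    return x

x_star = gradient_search(np.array([0.0, 0.0]))
print(x_star, f(x_star))   # should approach the optimum (1, 1), where f = 1
```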
For minimization problems
• We move in the opposite direction; that is, x' = x' – t*∇f(x').
• The other change is that t = t* is the value that minimizes f(x' – t∇f(x')) over t ≥ 0.

Necessary and Sufficient Conditions for Optimality (Maximization)

  Problem                     | Necessary Condition          | Also Sufficient if:
  One-variable unconstrained  | df/dx = 0                    | f(x) concave
  Multivariable unconstrained | ∂f/∂xj = 0 (j = 1, 2, …, n)  | f(x) concave
  General constrained problem | KKT conditions               | f(x) concave and gi(x) convex

The Karush-Kuhn-Tucker (KKT) Conditions for Constrained Optimization
• Assume that f(x), g1(x), g2(x), …, gm(x) are differentiable functions. Then x* = (x1*, x2*, …, xn*) can be an optimal solution for the nonlinear programming problem only if there exist m numbers u1, u2, …, um such that all the following KKT conditions are satisfied:
(1) ∂f/∂xj – Σ_{i=1}^{m} ui ∂gi/∂xj ≤ 0, at x = x*, for j = 1, 2, …, n
(2) xj* (∂f/∂xj – Σ_{i=1}^{m} ui ∂gi/∂xj) = 0, at x = x*, for j = 1, 2, …, n
(3) gi(x*) – bi ≤ 0, for i = 1, 2, …, m
(4) ui [gi(x*) – bi] = 0, for i = 1, 2, …, m
(5) xj* ≥ 0, for j = 1, 2, …, n
(6) ui ≥ 0, for i = 1, 2, …, m

Corollary of KKT Theorem (Sufficient Conditions)
• Note that satisfying these conditions alone does not guarantee that the solution is optimal.
• Assume that f(x) is a concave function and that g1(x), g2(x), …, gm(x) are convex functions. Then x* = (x1*, x2*, …, xn*) is an optimal solution if and only if all the KKT conditions are satisfied.

An Example
Max f(x) = ln(x1 + 1) + x2
S.T. 2x1 + x2 ≤ 3
     x1, x2 ≥ 0
n = 2; m = 1; g1(x) = 2x1 + x2 is convex; f(x) is concave.
1. (j = 1)  1/(x1 + 1) – 2u1 ≤ 0
2. (j = 1)  x1[1/(x1 + 1) – 2u1] = 0
1. (j = 2)  1 – u1 ≤ 0
2. (j = 2)  x2(1 – u1) = 0
3. 2x1 + x2 – 3 ≤ 0
4. u1(2x1 + x2 – 3) = 0
5. x1 ≥ 0, x2 ≥ 0
6. u1 ≥ 0
• Therefore, x1 = 0, x2 = 3, and u1 = 1 satisfy all the KKT conditions, so the optimal solution is (x1, x2) = (0, 3) (checked numerically in the sketch below).

How to solve the KKT conditions
• Sorry, there is no easy way in general.
• In the above example, there are 8 combinations to try, depending on whether each of x1 (≥ 0), x2 (≥ 0), and u1 (≥ 0) is zero or strictly positive. Try each one until a consistent combination is found.
• What if there are many variables?
• Let's look at some easier (special) cases.
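A minimal Python sketch (my own addition, not part of the original notes) that numerically checks the six KKT conditions of the example above at the candidate point (x1, x2, u1) = (0, 3, 1).

```python
# Hedged sketch: checking the KKT conditions for
#   max ln(x1 + 1) + x2  s.t.  2*x1 + x2 <= 3,  x1, x2 >= 0
# at the candidate point (x1, x2, u1) = (0, 3, 1).

def kkt_holds(x1, x2, u1, tol=1e-9):
    d1 = 1.0 / (x1 + 1) - 2*u1          # ∂f/∂x1 - u1 ∂g1/∂x1
    d2 = 1.0 - u1                        # ∂f/∂x2 - u1 ∂g1/∂x2
    g = 2*x1 + x2 - 3                    # g1(x) - b1
    return (d1 <= tol and d2 <= tol                    # condition 1
            and abs(x1*d1) <= tol and abs(x2*d2) <= tol  # condition 2
            and g <= tol and abs(u1*g) <= tol            # conditions 3, 4
            and x1 >= -tol and x2 >= -tol and u1 >= -tol)  # conditions 5, 6

print(kkt_holds(0.0, 3.0, 1.0))   # True: (0, 3) with u1 = 1 satisfies all KKT conditions
```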
Quadratic Programming
Max f(x) = cx – (1/2)xᵀQx
S.T. Ax ≤ b
     x ≥ 0
• The objective function is f(x) = cx – (1/2)xᵀQx = Σ_{j=1}^{n} cj xj – (1/2) Σ_{i=1}^{n} Σ_{j=1}^{n} qij xi xj.
• The qij are the elements of Q. If i = j, then xixj = xj², so –(1/2)qjj is the coefficient of xj². If i ≠ j, then –(1/2)(qij xixj + qji xjxi) = –qij xixj, so –qij is the coefficient for the product of xi and xj (since qij = qji).
• An example:
  Max f(x1, x2) = 15x1 + 30x2 + 4x1x2 – 2x1² – 4x2²
  S.T. x1 + 2x2 ≤ 30
       x1, x2 ≥ 0
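To make the cx – (1/2)xᵀQx form concrete (my own illustration, not in the original notes, and assuming sympy is available): the example's data are c = (15, 30) and Q = [[4, -4], [-4, 8]]. The sketch below verifies symbolically that these reproduce the objective 15x1 + 30x2 + 4x1x2 – 2x1² – 4x2².

```python
# Hedged sketch: verifying Q and c for the quadratic programming example.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
x = sp.Matrix([x1, x2])
c = sp.Matrix([[15, 30]])              # row vector c
Q = sp.Matrix([[4, -4], [-4, 8]])      # symmetric matrix Q

f_matrix_form = (c*x - sp.Rational(1, 2) * x.T*Q*x)[0]
f_original    = 15*x1 + 30*x2 + 4*x1*x2 - 2*x1**2 - 4*x2**2

print(sp.simplify(f_matrix_form - f_original))   # 0, so the two forms agree
```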
• The KKT conditions for the above quadratic programming problem:
  1. (j = 1)  15 + 4x2 – 4x1 – u1 ≤ 0
  2. (j = 1)  x1(15 + 4x2 – 4x1 – u1) = 0
  1. (j = 2)  30 + 4x1 – 8x2 – 2u1 ≤ 0
  2. (j = 2)  x2(30 + 4x1 – 8x2 – 2u1) = 0
  3. x1 + 2x2 – 30 ≤ 0
  4. u1(x1 + 2x2 – 30) = 0
  5. x1 ≥ 0, x2 ≥ 0
  6. u1 ≥ 0
• Introduce slack variables (y1, y2, and v1) for condition 1 (j = 1), condition 1 (j = 2), and condition 3:
  1. (j = 1)  –4x1 + 4x2 – u1 + y1 = –15
  1. (j = 2)  4x1 – 8x2 – 2u1 + y2 = –30
  3.          x1 + 2x2 + v1 = 30
  Condition 2 (j = 1) can be reexpressed as
  2. (j = 1)  x1y1 = 0
  Similarly, we have
  2. (j = 2)  x2y2 = 0
  4.          u1v1 = 0
• For each of these pairs, (x1, y1), (x2, y2), and (u1, v1), the two variables are called complementary variables, because only one of them can be nonzero.
  ➢ Combine them into one constraint, x1y1 + x2y2 + u1v1 = 0, called the complementary constraint.
• Rewrite the whole set of conditions:
  4x1 – 4x2 + u1 – y1 = 15
  –4x1 + 8x2 + 2u1 – y2 = 30
  x1 + 2x2 + v1 = 30
  x1y1 + x2y2 + u1v1 = 0
  x1 ≥ 0, x2 ≥ 0, u1 ≥ 0, y1 ≥ 0, y2 ≥ 0, v1 ≥ 0
• Except for the complementary constraint, these are all linear constraints.
• For any quadratic programming problem, its KKT conditions have this form:
  Qx + Aᵀu – y = cᵀ
  Ax + v = b
  x ≥ 0, u ≥ 0, y ≥ 0, v ≥ 0
  xᵀy + uᵀv = 0
• Assume the objective function (of a quadratic programming problem) is concave and the constraints are convex (they are all linear).
• Then x is optimal if and only if there exist values of y, u, and v such that all four vectors together satisfy all these conditions.
• The original problem is thereby reduced to the equivalent problem of finding a feasible solution to these constraints.
• These constraints are really the constraints of an LP except for the complementary constraint. Why don't we just modify the Simplex Method?
The Modified Simplex Method
• The complementary constraint implies that it is not permissible for both complementary variables of any pair to be basic variables.
• The problem reduces to finding an initial BF solution to a linear programming problem that has these constraints, subject to this additional restriction on the identity of the basic variables.
• When cᵀ ≤ 0 (unlikely) and b ≥ 0, the initial solution is easy to find: x = 0, u = 0, y = –cᵀ, v = b.
• Otherwise, introduce an artificial variable into each of the equations where cj > 0 or bi < 0, in order to use these artificial variables as initial basic variables.
  ➢ This choice of initial basic variables sets x = 0 and u = 0 automatically, which satisfies the complementary constraint.
• Then, use phase 1 of the two-phase method to find a BF solution for the real problem.
  ➢ That is, apply the simplex method to (the zj being the artificial variables)
     Min Z = Σj zj
  subject to the linear programming constraints obtained from the KKT conditions, but with these artificial variables included.
  ➢ We still need to modify the simplex method to satisfy the complementary constraint.
• Restricted-Entry Rule:
  ➢ Exclude from consideration as the entering variable any nonbasic variable whose complementary variable already is a basic variable.
  ➢ Choose among the other nonbasic variables according to the usual criterion.
  ➢ This rule keeps the complementary constraint satisfied at all times.
• When an optimal solution x*, u*, y*, v*, z1 = 0, …, zn = 0 is obtained for the phase 1 problem, x* is the desired optimal solution of the original quadratic programming problem.
A Quadratic Programming Example
Max 15x1 + 30x2 + 4x1x2 – 2x1² – 4x2²
S.T. x1 + 2x2 ≤ 30
     x1, x2 ≥ 0
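The modified-simplex tableau work for this example is done in class. As a cross-check (my own addition, assuming scipy is available), the small sketch below solves the same quadratic program numerically with scipy.optimize.minimize.

```python
# Hedged sketch: solving the quadratic programming example numerically as a cross-check.
import numpy as np
from scipy.optimize import minimize

def neg_f(x):
    x1, x2 = x
    return -(15*x1 + 30*x2 + 4*x1*x2 - 2*x1**2 - 4*x2**2)   # negate to maximize

constraints = [{'type': 'ineq', 'fun': lambda x: 30 - x[0] - 2*x[1]}]  # x1 + 2x2 <= 30
bounds = [(0, None), (0, None)]                                        # x1, x2 >= 0

result = minimize(neg_f, x0=np.array([0.0, 0.0]), method='SLSQP',
                  bounds=bounds, constraints=constraints)
print(result.x, -result.fun)   # expect approximately x = (12, 9), objective = 270
```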
Constrained Optimization with Equality Constraints
• Consider the problem of finding the minimum or maximum of the function f(x), subject to the restriction that x must satisfy all the equations
  g1(x) = b1, …, gm(x) = bm
• Example: Max f(x1, x2) = x1² + 2x2
           S.T. g(x1, x2) = x1² + x2² = 1
• A classical method is the method of Lagrange multipliers.
  ➢ The Lagrangian function is h(x, λ) = f(x) – Σ_{i=1}^{m} λi[gi(x) – bi], where λ1, λ2, …, λm are called Lagrange multipliers.
• For feasible values of x, gi(x) – bi = 0 for all i, so h(x, λ) = f(x).
• The method reduces to analyzing h(x, λ) by the procedure for unconstrained optimization.
  ➢ Set all partial derivatives to zero:
     ∂h/∂xj = ∂f/∂xj – Σ_{i=1}^{m} λi ∂gi/∂xj = 0, for j = 1, 2, …, n
     ∂h/∂λi = –gi(x) + bi = 0, for i = 1, 2, …, m
  ➢ Notice that the last m equations are equivalent to the constraints in the original problem, so only feasible solutions are considered.
• Back to our example:
  ➢ h(x1, x2, λ) = x1² + 2x2 – λ(x1² + x2² – 1)
  ➢ ∂h/∂x1 = 2x1 – 2λx1 = 0
     ∂h/∂x2 = 2 – 2λx2 = 0
     ∂h/∂λ = –(x1² + x2² – 1) = 0
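The resulting system can also be solved symbolically. The sketch below is my own addition (assuming sympy is available); it solves the three equations and evaluates f at each stationary point so the constrained maximum can be identified by comparison.

```python
# Hedged sketch: solving the Lagrangian conditions for
#   max x1**2 + 2*x2  s.t.  x1**2 + x2**2 = 1.
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
f = x1**2 + 2*x2
h = f - lam*(x1**2 + x2**2 - 1)          # Lagrangian function

eqs = [sp.diff(h, v) for v in (x1, x2, lam)]      # ∂h/∂x1, ∂h/∂x2, ∂h/∂λ
solutions = sp.solve(eqs, [x1, x2, lam], dict=True)

for s in solutions:
    print(s, 'f =', f.subs(s))
# Comparing f over the stationary points identifies the constrained maximum.
```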
Other Types of Nonlinear Programming Problems
• Separable Programming
  ➢ It is a special case of convex programming with one additional assumption: the f(x) and gi(x) functions are separable functions.
  ➢ A separable function is a function in which each term involves just a single variable.
  ➢ Example: f(x1, x2) = 126x1 – 9x1² + 182x2 – 13x2² = f1(x1) + f2(x2), where f1(x1) = 126x1 – 9x1² and f2(x2) = 182x2 – 13x2².
  ➢ Such a problem can be closely approximated by a linear programming problem. Please refer to Section 12.8 for details.
• Geometric Programming
  ➢ The objective and the constraint functions take the form
     g(x) = Σ_{i=1}^{N} ci Pi(x), where Pi(x) = x1^{ai1} x2^{ai2} ··· xn^{ain} for i = 1, 2, …, N.
  ➢ When all the ci are strictly positive and the objective function is to be minimized, this geometric programming problem can be converted to a convex programming problem by setting xj = e^{yj}.
• Fractional Programming
  ➢ Suppose that the objective function is in the form of a (linear) fraction: Maximize f(x) = f1(x)/f2(x) = (cx + c0)/(dx + d0).
  ➢ Also assume that the constraints gi(x) are linear: Ax ≤ b, x ≥ 0.
  ➢ We can transform this to an equivalent problem of a standard type for which effective solution procedures are available.
  ➢ In particular, we can transform the problem to an equivalent linear programming problem by letting y = x/(dx + d0) and t = 1/(dx + d0), so that x = y/t.
  ➢ The original formulation is thereby transformed to the following linear programming problem (see the sketch below for a small numerical illustration):
     Max Z = cy + c0t
     S.T. Ay – bt ≤ 0
          dy + d0t = 1
          y, t ≥ 0
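As a small numerical illustration of this transformation (my own addition with made-up data, assuming scipy is available): maximize (2x1 + 3x2 + 1)/(x1 + x2 + 2) subject to x1 + x2 ≤ 4, x1, x2 ≥ 0.

```python
# Hedged sketch: the fractional-programming transformation on a small made-up instance:
#   maximize (2x1 + 3x2 + 1) / (x1 + x2 + 2)  s.t.  x1 + x2 <= 4,  x >= 0.
# Transformed variables: y = x*t and t = 1/(dx + d0).
import numpy as np
from scipy.optimize import linprog

c, c0 = np.array([2.0, 3.0]), 1.0     # numerator  cx + c0
d, d0 = np.array([1.0, 1.0]), 2.0     # denominator dx + d0
A, b = np.array([[1.0, 1.0]]), np.array([4.0])

# Decision vector is (y1, y2, t); linprog minimizes, so negate the objective.
obj = -np.concatenate([c, [c0]])                       # Max cy + c0*t
A_ub = np.hstack([A, -b.reshape(-1, 1)])               # Ay - b*t <= 0
b_ub = np.zeros(A.shape[0])
A_eq = np.concatenate([d, [d0]]).reshape(1, -1)        # dy + d0*t = 1
b_eq = np.array([1.0])

res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 3)
y, t = res.x[:2], res.x[2]
print(y / t, -res.fun)   # recovers x = y/t; expect roughly x = (0, 4), ratio = 13/6
```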