Optimization: Theory, Algorithms, Applications MSRI - Berkeley SAC, Nov/06 Henry Wolkowicz Department of Combinatorics & Optimization University of Waterloo
Outline Why are we here? (What is Optimization?) History of Optimization Main Players Most important Open Problems Different Areas for connections Resources/References
What is Optimization? Two quotes from Tjalling C. Koopmans, Nobel Memorial Lecture [22]: “best use of scarce resources”; “Mathematical Methods of Organizing and Planning of Production” [18]. (Kantorovich and Koopmans: joint winners of the 1975 Nobel Prize in Economics, “for their contributions to the theory of optimum allocation of resources”.)
History Virgil’s Aeneid 19 BCE, Legend of Carthage. Queen Dido’s Problem: the Queen fled to the African coast after her husband was killed; she begged King Jambas (local ruler) for land; he granted only as much as she could enclose within a bull’s hide; she sliced the hide into strips and used the strips to surround a large area. Optimal shape was ?
In three dimensions: soap bubbles and films are examples of minimal surface areas.
The Brachistochrone Problem cycloid or curve of fastest descent; stationary body starts at first point and passes down along curve to second point, under action of constant gravity, ignoring friction. Bernoulli (1696)/Calculus of Variations
Figure 1: Cycloid
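A minimal numerical sketch of the fastest-descent property, under assumptions not stated on the slide: the cycloid is parametrized as x = R(t - sin t), y = R(1 - cos t) with y measured downward, the body starts at rest at the cusp and ends at the lowest point (pi R, 2R), and g = 9.81. The function names are illustrative only. The descent time along the cycloid is compared with the time along the straight chord between the same two points.

```python
import math

def cycloid_descent_time(R, g=9.81, steps=10000):
    """Midpoint-rule estimate of the descent time along the cycloid
    x = R(t - sin t), y = R(1 - cos t), t in [0, pi], starting at rest."""
    h = math.pi / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        ds = R * math.sqrt(2.0 - 2.0 * math.cos(t)) * h   # arc-length element
        v = math.sqrt(2.0 * g * R * (1.0 - math.cos(t)))  # speed from energy conservation
        total += ds / v
    return total

def straight_line_descent_time(R, g=9.81):
    """Time to slide without friction down the chord from (0, 0) to (pi*R, 2R)."""
    length = R * math.sqrt(math.pi ** 2 + 4.0)
    accel = g * (2.0 * R) / length          # component of gravity along the chord
    return math.sqrt(2.0 * length / accel)  # from length = accel * time^2 / 2

R = 1.0
t_cycloid = cycloid_descent_time(R)
t_line = straight_line_descent_time(R)
```

In this setting the cycloid time has the closed form pi * sqrt(R/g), and the chord time is sqrt(pi^2 + 4) * sqrt(R/g), so the cycloid is faster even though it is the longer path.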
History of Math. Progr., 1991 [26] • remarkably short - rooted in applications
• 1940’s - driven by applications (war time moving men and machinery) • Dantzig (Pentagon - Stanford) and Kantorovich (Leningrad) • Others: Hitchcock, Koopmans, Arrow, Charnes, Gale, Goldman, Hoffman, Kuhn, von Neumann (game theory, duality, computers) etc...
Dantzig/Linear Programming, LP • Planning problems:
Assign 70 men to 70 jobs; $v_{ij}$ is the benefit of assigning man i to job j (Linear Assignment Problem, LAP)
but $70! > 10^{100}$ (a googol) • Dantzig visited Von Neumann on Oct 3, 1947, and learned about Farkas’ Lemma and duality (game theory) - SIMPLEX METHOD for LP
• Hotelling: But we all know the world is nonlinear ... Von Neumann: ... if linear application ... use it
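The size of the search space can be checked directly, and a brute-force solver makes the point concrete for tiny instances. This is a sketch only: the 3x3 benefit matrix and the function name `brute_force_lap` are made up for illustration, and enumeration is exactly the approach that fails at n = 70.

```python
import math
from itertools import permutations

# 70! exceeds a googol, so enumerating all assignments is hopeless at n = 70.
assert math.factorial(70) > 10**100

def brute_force_lap(v):
    """Return (best_value, best_assignment) maximizing sum_i v[i][perm[i]]."""
    n = len(v)
    best_value, best_perm = None, None
    for perm in permutations(range(n)):
        value = sum(v[i][perm[i]] for i in range(n))
        if best_value is None or value > best_value:
            best_value, best_perm = value, perm
    return best_value, best_perm

# Hypothetical benefit matrix: benefit[i][j] is the benefit of man i doing job j.
benefit = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
value, assignment = brute_force_lap(benefit)
```

Here `assignment[i]` is the job given to man i. The loop visits n! permutations, which is why a polynomial-time method is needed for realistic sizes.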
Unreasonable Success of Simplex LP: $\min\ c^T x$ s.t. $Ax = b,\ x \ge 0$. • Klee-Minty 1970: exponential time example for simplex method. But, linear time in practice. • SIAM 70s computer survey: 70% of (world) computer time spent on LP/simplex • Is LP in class P (easy) or class NP (hard)? • Russian mathematician Khachiyan 1978: LP algorithm based on ellipsoids/duality/inequalities showed LP is in P. (NYT frontpage stories/fables) • Hungarian method for LAP in $O(n^3)$ time, [23, 31]; BUT - still no known strongly polynomial method for general LP.
A First Meeting
Figure 2: George B. Dantzig and Leonid Khachiyan, meeting for the first time, February 1990, Asilomar, California, at the SIAM-organized workshop Progress in Mathematical Programming.
2005: Khachiyan died Apr 29 (age 52) Dantzig died May 13 (age 90)
Lagrange Multiplier Extensions NLP: MOTIVATED by LP Success e.g. [24] • [25] 1951: Kuhn-Tucker optimality conditions for nonlinear programming (NLP) [20] 1939: Karush, Masters Thesis, Math., Univ. Chicago (Same constraint qualification) • [17] 1948: Fritz John, Extremum problems with inequalities...
K-K-T Conditions NLP
$\min f(x)$ s.t. $g(x) \le 0,\ h(x) = 0$
CQ: Geometry (cone of tangents) coincides with algebra (linearization) (modern opt cond)
$\nabla f(x^*) + g'(x^*)\lambda^* + h'(x^*)\mu^* = 0,\ \lambda^* \ge 0$ (dual feas.)
$h(x^*) = 0,\ g(x^*) \le 0$ (primal feas.)
$g(x^*)^T \lambda^* = 0$ (compl. slack.)
Proof: Apply Farkas’ Lemma, 1902, to local linearization. (modern: use hyperplane separation theorem/S. Mazur’s geometric Hahn-Banach Theorem.)
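A small worked check of the conditions on a made-up problem (not from the talk): minimize f(x) = x1^2 + x2^2 subject to the single inequality g(x) = 1 - x1 - x2 <= 0, with no equality constraints. The candidate x* = (0.5, 0.5) with multiplier lambda* = 1 should make all four residuals vanish. The helper names are illustrative.

```python
# Hypothetical example: min x1^2 + x2^2  s.t.  g(x) = 1 - x1 - x2 <= 0.

def grad_f(x):
    return [2.0 * x[0], 2.0 * x[1]]

def g(x):
    return 1.0 - x[0] - x[1]

def grad_g(x):
    return [-1.0, -1.0]

def kkt_residuals(x, lam):
    """Return (stationarity, primal infeasibility, dual infeasibility, complementarity)."""
    gf, gg = grad_f(x), grad_g(x)
    stationarity = max(abs(gf[i] + lam * gg[i]) for i in range(2))
    primal_violation = max(g(x), 0.0)   # g(x) <= 0 required
    dual_violation = max(-lam, 0.0)     # lam >= 0 required
    complementarity = abs(lam * g(x))   # lam * g(x) = 0 required
    return stationarity, primal_violation, dual_violation, complementarity

x_star, lam_star = [0.5, 0.5], 1.0
residuals = kkt_residuals(x_star, lam_star)
```

At a feasible but non-optimal point such as (1, 1) with lambda = 0, the stationarity residual is 2, so the test distinguishes the minimizer from other feasible points.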
Further Extensions • Infinite (Cone) Programs, Duffin, 1956 [6]. • optimization with respect to partial orders, [15, 36, 16, 28, 14]. • Optimal Control (Pontryagin Maximum Principle) • Discrete/Combinatorial Optimization FIRST CHANCES - QUESTIONS? DISCUSSION?
Figure 3: Optimization Tree
Quasi-Newton Methods For Unconstrained Optimization: • Least Change Secant Methods, Variable Metric Methods: Davidon’59 [5] / Fletcher-Powell’63 [10] (DFP), and Broyden [4] / Fletcher [9] / Goldfarb [13] / Shanno [34] ’70 (BFGS). • rank-two updates of the Hessian approximation that maintain positive definiteness • But: automatic differentiation (Griewank-Corliss’91) differentiates the code efficiently.
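The rank-two update can be illustrated with the standard BFGS formula for the inverse-Hessian approximation, H_new = (I - rho s y^T) H (I - rho y s^T) + rho s s^T with rho = 1/(y^T s). The sketch below uses plain lists and made-up data (the step s and the matrix A are assumptions), and checks the secant equation H_new y = s.

```python
def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def bfgs_inverse_update(H, s, y):
    """BFGS update of a symmetric inverse-Hessian approximation H.

    Requires the curvature condition y^T s > 0, which keeps H_new positive
    definite whenever H is positive definite.
    """
    ys = dot(y, s)
    if ys <= 0:
        raise ValueError("curvature condition y^T s > 0 violated")
    rho = 1.0 / ys
    n = len(s)
    Hy = matvec(H, y)
    yHy = dot(y, Hy)
    # Expanded product: H - rho (Hy s^T + s (Hy)^T) + (rho^2 yHy + rho) s s^T
    return [[H[i][j] - rho * (Hy[i] * s[j] + s[i] * Hy[j])
             + (rho * rho * yHy + rho) * s[i] * s[j]
             for j in range(n)] for i in range(n)]

# Hypothetical data: for f(x) = 0.5 x^T A x, the gradient difference is y = A s.
A = [[3.0, 1.0], [1.0, 2.0]]
s = [1.0, -0.5]
y = matvec(A, s)
H0 = [[1.0, 0.0], [0.0, 1.0]]
H1 = bfgs_inverse_update(H0, s, y)
secant_gap = [a - b for a, b in zip(matvec(H1, y), s)]
```

The update changes H only on the span of s and Hy, which is what "rank-two" refers to.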
Power of Duality: Find optimal trajectory/control (for rocket)
$\mu_0 = \min J(u) = \tfrac12 \|u(t)\|^2 = \tfrac12 \int_{t_0}^{t_1} u^2(t)\,dt$
s.t. $\dot x(t) = A(t)x(t) + b(t)u(t),\quad x(t_0) = x_0,\quad x(t_1) \ge c$.
Using fundamental solution matrix $\Phi$:
$x(t_1) = \Phi(t_1,t_0)x(t_0) + \underbrace{\int_{t_0}^{t_1} \Phi(t_1,t)\,b(t)\,u(t)\,dt}_{\text{integral oper. } Ku}$
Duality cont... Convex Pgm
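$\min J(u) = \tfrac12 \|u(t)\|^2$ s.t. $Ku \ge d$ (with $d = c - \Phi(t_1,t_0)x_0$)
Lagrangian dual (best lower bound) is
$\mu_0 = \max_{\lambda \ge 0} \min_u \{ J(u) + \lambda^T (d - Ku) \} = \max_{\lambda \ge 0} \lambda^T Q \lambda + \lambda^T d$ (simple FIN. dim. QP)
where $Q = -\tfrac12 \int_{t_0}^{t_1} \Phi(t_1,t)\,b(t)\,b(t)^T \Phi(t_1,t)^T\,dt$ and $u^*(t) = \lambda_*^T \Phi(t_1,t)\,b(t)$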
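The same dual construction can be tried on a finite-dimensional stand-in, which is an assumption made purely for illustration: u is a vector in R^n, the operator K is a single row k^T, and the constraint is k^T u >= d. The inner minimization gives u = lambda k, the dual function is -(1/2) lambda^2 ||k||^2 + lambda d, and its maximizer over lambda >= 0 recovers the primal solution.

```python
def solve_dual(k, d):
    """Solve min 0.5*||u||^2 s.t. k^T u >= d through its one-dimensional dual."""
    kk = sum(ki * ki for ki in k)        # plays the role of K K^*
    q = -0.5 * kk                        # analogue of Q in the dual objective
    lam = max(0.0, d / kk)               # maximizer of q*lam^2 + lam*d over lam >= 0
    dual_value = q * lam * lam + lam * d
    u = [lam * ki for ki in k]           # u* = K^* lam, analogue of u*(t)
    primal_value = 0.5 * sum(ui * ui for ui in u)
    return lam, u, primal_value, dual_value

# Hypothetical data.
k = [1.0, 2.0, 2.0]
d = 3.0
lam, u, primal_value, dual_value = solve_dual(k, d)
```

With these numbers lambda* = 1/3 and both optimal values equal 0.5, so there is no duality gap, as expected for a convex program of this kind.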
Convex Analysis Lies behind results in Optimization • Classic ’70 text by Rockafellar (UofW, Seattle), [33]; • Nonsmooth Analysis: Clarke, Borwein (Smooth Variational Principle), Mordukhovich, Lewis • Variational Principles (powerful optimality conditions, extensions to nonconvex case)
Proving/Generating Theorems using Optimization
Spectral Decomposition Theorem, $A = A^T$: • $\min x^T A x$ s.t. $x^T x = 1$
Lagrangian is: $L(x, \lambda) = x^T A x + \lambda(1 - x^T x)$
stationarity: $\nabla L(x_1, \lambda) = 2Ax_1 - 2\lambda x_1 = 0$
min eig since obj.: $x_1^T A x_1 = \lambda x_1^T x_1 = \lambda \to \min$
Now add constraint $x^T x_1 = 0$, to get second eigen-pair etc...
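A tiny numerical illustration of the first step, on an assumed 2x2 example: the unit sphere is the unit circle, so the constrained minimum can be found by sampling angles. For A = [[2, 1], [1, 2]] the eigenvalues are 1 and 3, and the minimizer should satisfy A x1 = lambda x1 with lambda = 1.

```python
import math

def rayleigh(A, x):
    """x^T A x for a 2x2 matrix A."""
    Ax = [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]
    return x[0] * Ax[0] + x[1] * Ax[1]

def min_on_unit_circle(A, samples=3600):
    """Grid search for min x^T A x over x = (cos t, sin t), t in [0, pi)."""
    best = None
    for k in range(samples):
        theta = math.pi * k / samples
        x = (math.cos(theta), math.sin(theta))
        val = rayleigh(A, x)
        if best is None or val < best[0]:
            best = (val, x)
    return best

A = [[2.0, 1.0], [1.0, 2.0]]
lam_min, x1 = min_on_unit_circle(A)
# Stationarity residual A x1 - lam_min x1 should be (near) zero.
residual = [A[0][0] * x1[0] + A[0][1] * x1[1] - lam_min * x1[0],
            A[1][0] * x1[0] + A[1][1] * x1[1] - lam_min * x1[1]]
```

Only half the circle is sampled because x and -x give the same objective value.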
Proving/Generating Theorems using Optimization cont...
Eigenvalue Bounds, $A = A^T$:
$\min \lambda_1(A)$ s.t. $\sum_i \lambda_i(A) = \mathrm{trace}(A),\quad \sum_i \lambda_i^2(A) = \mathrm{trace}(A^2)$
Lagrangian is: L(....., stationarity: ∇L(.... = 0
Explicit solution: $m := \mathrm{trace}(A)/n$; $s^2 = \mathrm{trace}(A^2)/n - m^2$
$\lambda_{\min}(A) \le m + \sqrt{n-1}\,s$
Similarly, get upper/lower bounds for $\lambda_2(A)$ and other functions of the eigenvalues, e.g. [35].
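A sketch of how such trace bounds can be evaluated. The two-sided form used here, m - s sqrt(n-1) <= lambda_min <= m - s/sqrt(n-1) and m + s/sqrt(n-1) <= lambda_max <= m + s sqrt(n-1), is the standard trace-based bound and is an assumption about what the slide intends; the test matrix is made up and has the known eigenvalues 2 - sqrt(2), 2, 2 + sqrt(2).

```python
import math

def trace_bounds(A):
    """Bounds on extreme eigenvalues of symmetric A from trace(A) and trace(A^2)."""
    n = len(A)
    tr = sum(A[i][i] for i in range(n))
    tr2 = sum(A[i][j] * A[j][i] for i in range(n) for j in range(n))  # trace(A^2)
    m = tr / n
    s = math.sqrt(max(tr2 / n - m * m, 0.0))
    r = math.sqrt(n - 1)
    return {
        "m": m,
        "s": s,
        "lam_min_lower": m - s * r,
        "lam_min_upper": m - s / r,
        "lam_max_lower": m + s / r,
        "lam_max_upper": m + s * r,
    }

# Hypothetical example: tridiagonal matrix with eigenvalues 2 - sqrt(2), 2, 2 + sqrt(2).
A = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [0.0, 1.0, 2.0]]
bounds = trace_bounds(A)
```

The bounds need only two traces, so they cost O(n^2) operations and no eigenvalue computation.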
SUMT • Penalty and Barrier Methods (lost favour) • Frisch ’55 [11]; Sequential Unconstrained Minimization Techniques, Fiacco-McCormick, ’68 [8] • Penalize $\frac{1}{\mu}\|\text{equality constraints}\|^2$, $\frac{1}{\mu} \to \infty$; replace inequality constraints $g_k(x) \le 0$ by a smooth barrier, $-\mu \sum_k \log(-g_k(x))$, $\mu \downarrow 0$ • Solve sequence of simpler unconstrained problems
$\min_x B_\mu(x) = f(x) + \frac{1}{\mu}\|h(x)\|^2 - \mu \sum_k \log(-g_k(x))$
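A one-variable sketch of the barrier idea, on a made-up problem: minimize f(x) = x subject to g(x) = 1 - x <= 0, with no equality constraints. The sign convention is an assumption: the barrier term is taken as -mu log(-g(x)) = -mu log(x - 1), so that it grows to +infinity at the boundary. The barrier problem is solved by bisection on the derivative B'(x) = 1 - mu/(x - 1).

```python
def barrier_minimizer(mu, lo=1.0 + 1e-12, hi=10.0, iters=200):
    """Minimize B_mu(x) = x - mu*log(x - 1) on (1, 10) by bisection on B'(x)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if 1.0 - mu / (mid - 1.0) < 0.0:
            lo = mid   # derivative negative: the minimizer lies to the right
        else:
            hi = mid   # derivative nonnegative: the minimizer lies to the left
    return 0.5 * (lo + hi)

# Sequence of unconstrained problems with mu decreasing to 0.
path = [(mu, barrier_minimizer(mu)) for mu in (1.0, 0.1, 0.01, 0.001)]
```

For this problem the barrier minimizer is x(mu) = 1 + mu, so the iterates stay strictly feasible and approach the constrained solution x* = 1 as mu decreases.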
Methods using Lagrange Multipliers • Hestenes, Rockafellar, Fletcher, Powell, Conn-Gould-Toint (augmented Lagrangians combination of Lagrange and penalty methods), Gill-Murray-Wright (’81 [12], Stanford). • Sequential Quadratic Programming (SQP): Solve the Newton direction for the optimality conditions using quadratic approximations involving the Lagrangian function. SECOND CHANCES - QUESTIONS? DISCUSSION?
Interior Point Methods, LP • 1984: Karmarkar’84[19](Berkeley), interior point method to improve complexity (polynomial time) results. BUT: high efficiency claimed for practical problems! (NYT frontpage stories/fables) • Stanford gang of four (Gill-Murray-Wright-Saunders) connection to log-barrier methods (came back).
Interior Point Revolution • Kojima-Mizuno-Yoshise’89[21] elegant primal-dual path-following framework. Mehrotra’92[30] predictor-corrector speedup-stability. • OB1 Lustig-Marsten-Shanno, ’92 [29], legal battle with Bell Labs • CPLEX tool for LP - large scale - 15 million variable problems solved on a desktop. • Nesterov-Nemirovski, ’89 [32], extensions to convex problems, e.g. cone optimization problems, e.g. Semidefinite Programming
Semidefinite Programming • Elegant Theory, Efficient Algorithms, Many Applications • MAX-CUT: Undirected, weighted graph $G = (N, E)$, weights $W = (w_{ij})$. Cut (divide) the set of nodes N into two sets so that the sum of weights that are cut is maximized.
$p^* := \max\ \tfrac14 \sum_{ij} w_{ij}(1 - x_i x_j)$ s.t. $x_i \in \{\pm 1\},\ i = 1,\dots,n$ (equivalently, $x_i^2 = 1,\ i = 1,\dots,n$)
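The ±1 formulation can be checked by brute force on a small graph. The sketch assumes the sum runs over all ordered pairs (i, j), so each edge is counted twice and the factor 1/4 yields exactly the weight of the cut edges; the 5-cycle with unit weights is a made-up test instance whose maximum cut is 4.

```python
from itertools import product

def cut_value(W, x):
    """(1/4) * sum over ordered pairs (i, j) of w_ij * (1 - x_i x_j)."""
    n = len(W)
    return 0.25 * sum(W[i][j] * (1 - x[i] * x[j]) for i in range(n) for j in range(n))

def max_cut_brute_force(W):
    """Enumerate all sign vectors with x[0] = +1 (flipping every sign gives the same cut)."""
    n = len(W)
    best_val, best_x = None, None
    for rest in product((1, -1), repeat=n - 1):
        x = (1,) + rest
        val = cut_value(W, x)
        if best_val is None or val > best_val:
            best_val, best_x = val, x
    return best_val, best_x

# Hypothetical instance: 5-cycle with unit weights.
n = 5
W = [[0] * n for _ in range(n)]
for i in range(n):
    j = (i + 1) % n
    W[i][j] = W[j][i] = 1
p_star, x_star = max_cut_brute_force(W)
```

The enumeration visits 2^(n-1) sign vectors, which is the exponential cost that relaxations of this problem are meant to avoid.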
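SDP via Dual of Dual (writing the MAX-CUT objective as $x^T Q x$)
Lagrangian Dual: $d^* := \min_\lambda \max_x\ x^T Q x + \sum_i \lambda_i (1 - x_i^2) = \min_{Q - \mathrm{Diag}(\lambda) \preceq 0} e^T \lambda$
dual of MC: $\min e^T \lambda$ s.t. $\mathrm{Diag}(\lambda) - Z = Q,\ Z \succeq 0$
dual of dual of MC: $\max \mathrm{trace}\, QX$ s.t. $\mathrm{diag}(X) = e,\ X \succeq 0$
.878 performance guarantee, Goemans and Williamson (IBM Bay Area)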
SDP with CONDOR Solves QAP • QAP: Quadratic Assignment Problem: size n > 20 considered hard (compared to fast solutions for $n = 10^6$ for LAP). • important applications to e.g. VLSI design, massive parallelism (Blue Gene/IBM) • SDP bound in a branch and bound framework; using CONDOR (High Throughput Computing using free cycles worldwide) • Solves Nugent problem n = 30 (and others) for first time, [1].
SDP and Robust!! Optimization. Robust optimization: problem data known only within certain bounds. conic: $\max b^T y$ s.t. $\forall A \in U,\ c - A^T y \in K$. goal: find feasible solution acceptably close to optimal for data within the bounds. Applications e.g.: control theory; engineering design and finance; aircraft path planning; machine learning (robust classification, support vector machines, and kernel optimization); e.g. Ben-Tal [3]; El-Ghaoui (Berkeley) [7].
SDP and Hilbert’s 17th Problem, SOS. Hilbert, 1900: Given a multivariate polynomial that takes only non-negative values over the reals, can it be represented as a sum of squares of rational functions? Artin, 1927 - YES; (Gondard & Ribenboim, extension to symmetric matrices, 1974.) But: SOS polys ⊂ nonneg polys. With $m_l(z)$ the vector of monomials, $p(z)$ is SOS of polys iff $p(z) \equiv m_l(z)^T W m_l(z)$, $W \succeq 0$.
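A univariate illustration of the Gram-matrix characterization, on a made-up polynomial: p(z) = z^4 - 2z^2 + 1 = (z^2 - 1)^2 with monomial vector m(z) = (1, z, z^2). The matrix W = v v^T with v = (-1, 0, 1) is positive semidefinite by construction (it is rank one), and m(z)^T W m(z) should reproduce p(z).

```python
def monomials(z):
    """Monomial vector m(z) = (1, z, z^2)."""
    return [1.0, z, z * z]

def quad_form(W, m):
    """m^T W m for a 3x3 matrix W."""
    return sum(m[i] * W[i][j] * m[j] for i in range(3) for j in range(3))

def p(z):
    return z**4 - 2.0 * z**2 + 1.0

# W = v v^T is rank one and positive semidefinite; m(z)^T W m(z) = (v . m(z))^2.
v = [-1.0, 0.0, 1.0]
W = [[vi * vj for vj in v] for vi in v]

# Compare p(z) with the quadratic form on a grid of sample points in [-3, 3].
sample_points = [k / 10.0 - 3.0 for k in range(61)]
max_gap = max(abs(p(z) - quad_form(W, monomials(z))) for z in sample_points)
```

In general W is not unique, and searching for a positive semidefinite W that matches the coefficients of p is itself a semidefinite feasibility problem, which is the link to SDP.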
Lax’58 Conjecture is True. Lewis, Parrilo, Ramana’03, [27]: Hyperbolic polynomials in three variables are determinants of three symmetric matrices, and this is equivalent to the Helton-Vinnikov observation: A polynomial q on $\mathbb{R}^2$ is a real zero polynomial of degree d and satisfies q(0, 0) = 1 if and only if there exist matrices $B, C \in S^d$ such that q is given by $q(y, z) = \det(I + yB + zC)$.
SDP Open Problems; Exciting Area • Which problems can be formulated as SDPs? (Algebraic connections, BIRS/MSRI workshop on Positive Polynomials and Optimization, e.g. Parrilo, ex-postdoc Berkeley) • efficient/stable solutions, large scale problems • extension of SDP methodology: e.g. symmetric cones and relation to Jordan algebras; bilinear matrix inequalities. • SDP at UC Berkeley: leader is Laurent El Ghaoui; recent grad. Jiawang Nie (working on sensor localization) with co-advisors Jim Demmel and Bernd Sturmfels.
Outstanding Problems/Questions • Kepler (1611) Conjecture: close packing (cubic or hexagonal close packing, which have maximum density $\pi/(3\sqrt{2}) \approx 74.048\%$) is the densest possible sphere packing. Find densest (not necessarily periodic) packing of spheres - Kepler problem.
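The density figure can be reproduced from the geometry of cubic close packing. The sketch assumes the usual face-centred cubic unit cell: a cube of edge a containing 4 spheres that touch along a face diagonal, so that 4r = a sqrt(2).

```python
import math

# Face-centred cubic cell (assumed geometry): edge a, 4 spheres per cell,
# spheres touching along a face diagonal, hence 4 r = a * sqrt(2).
a = 1.0
r = a * math.sqrt(2.0) / 4.0
density_from_cell = 4 * (4.0 / 3.0) * math.pi * r**3 / a**3

# Closed form quoted for the Kepler conjecture.
density_closed_form = math.pi / (3.0 * math.sqrt(2.0))
```

Both expressions agree, giving a density of about 74.048%.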
Effective Certificates of Optimality? • Hales’ (1997) detailed plan; extensive use of computer calculations. Hales’ full proof in a series of papers totaling more than 250 pages (Cipra 1998). Proof relies extensively on: global optimization; linear programming; interval arithmetic. Computer files contain more than 3 gigabytes of storage, e.g. [2].
Hard Nonconvex Problems • e.g. protein folding - how do natural phenomena optimize? • Hubble telescope SDP - projection algorithm. • strongly polynomial LP algorithms; Hirsch conjecture (comb. diameter of d-polytope with n facets is bounded by $n - d$) • weather prediction uses a least squares min/opt problem for initial conditions • Massive Parallel Computing: optimize network, minimize heat - VLSI design; metrics for performance?
Discrete Optimization • Important applications: e.g. ministry of health all discrete opt problems • using continuous optimization relaxations within branch and bound methods • Gomory cutting planes came back and lie behind current success of solving large scale discrete problems (e.g. in CPLEX). • QAP problem is NP-hard, but still needs to be solved, i.e. worst case complexity and expected performance can differ drastically.
Connections with Optimization • Discrete and Continuous Optimization • Optimal Control Theory (space program, environment) • Medicine (molecular conformation, scheduling) • Politics (game theory) • Computer Science (massive parallelism, VLSI design, computer design, quantum computing) • Management Science and Engineering in general • Economics (government planning) • Statistics, e.g. machine learning
Resources/References • Optimization Frequently Asked Questions: Linear Programming FAQ Nonlinear Programming FAQ • NEOS: • e-optimization community: • Optimization Online:
