Optimization: Theory, Algorithms, Applications
MSRI - Berkeley SAC, Nov/06
Henry Wolkowicz
Department of Combinatorics & Optimization, University of Waterloo
Outline
• Why are we here? (What is Optimization?)
• History of Optimization
• Main Players
• Most Important Open Problems
• Different Areas for Connections
• Resources/References
What is Optimization?
Two quotes from Tjalling C. Koopmans, Nobel Memorial Lecture [22]:
• “best use of scarce resources”
• “Mathematical Methods of Organizing and Planning of Production” [18]
(Kantorovich and Koopmans: joint winners of the 1975 Nobel Prize in Economics, “for their contributions to the theory of optimum allocation of resources”)
History
Virgil's Aeneid, 19 BCE: the legend of Carthage. Queen Dido's problem: the queen fled to the African coast after her husband was killed; she begged Iarbas (the local ruler) for land; he granted only as much as she could enclose within a bull's hide; she sliced the hide into strips and used the strips to surround a large area. The optimal shape was? (A circle; a semicircle if the straight coastline serves as one boundary.)
In 3 dimensions: soap bubbles and films are examples of minimal surface areas.
The Brachistochrone Problem
The cycloid, or curve of fastest descent: a body starting at rest at the first point passes down along the curve to the second point, under the action of constant gravity, ignoring friction. Posed by Johann Bernoulli (1696); the starting point of the Calculus of Variations.
Figure 1: Cycloid
History of Math. Progr., 1991 [26]
• remarkably short history, rooted in applications
• 1940s: driven by applications (wartime movement of men and machinery)
• Dantzig (Pentagon, Stanford) and Kantorovich (Leningrad)
• Others: Hitchcock, Koopmans, Arrow, Charnes, Gale, Goldman, Hoffman, Kuhn, von Neumann (game theory, duality, computers), etc.
Dantzig/Linear Programming, LP
• Planning problems: assign 70 men to 70 jobs; v_ij = benefit of man i assigned to job j (Linear Assignment Problem, LAP); but 70! > 10^100 (a googol), so enumeration is hopeless.
• Dantzig visited von Neumann, Oct 3, 1947; learned about Farkas' Lemma and duality (from game theory); the SIMPLEX METHOD for LP.
• Hotelling: "But we all know the world is nonlinear ..." Von Neumann: "... if linear application ... use it."
Unreasonable Success of Simplex
LP: min c^T x s.t. Ax = b, x ≥ 0.
• Klee-Minty 1972: an exponential-time example for the simplex method. But: roughly linear time in practice.
• SIAM 1970s computer survey: 70% of (world) computer time was spent on LP/simplex.
• Is LP in class P (easy) or NP-hard?
• Russian mathematician Khachiyan, 1979: an LP algorithm based on ellipsoids/duality/inequalities showed that LP is in P. (NYT front-page stories/fables)
• The Hungarian method solves the LAP in O(n^3) time [23, 31]; BUT there is still no known strongly polynomial method for general LP.
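A minimal illustration (mine, not from the talk) of the standard-form LP above, using scipy.optimize.linprog with made-up data c, A, b; scipy's default HiGHS backend includes a dual-simplex code:

```python
import numpy as np
from scipy.optimize import linprog

# Standard-form LP: min c^T x  s.t.  A x = b, x >= 0
# (linprog's default variable bounds are already x >= 0).
c = np.array([1.0, 2.0, 0.0])        # made-up data for illustration
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])

res = linprog(c, A_eq=A, b_eq=b)     # default method: 'highs'
print(res.x, res.fun)                # expect x = (0, 0, 1), optimal value 0
```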
A First Meeting
Figure 2: George B. Dantzig and Leonid Khachiyan, meeting for the first time, February 1990, Asilomar, California, at the SIAM-organized workshop Progress in Mathematical Programming.
2005: Khachiyan died Apr 29 (age 52); Dantzig died May 13 (age 90).
Lagrange Multiplier Extensions
NLP: MOTIVATED by the success of LP, e.g. [24]
• [25] 1951: Kuhn-Tucker optimality conditions for nonlinear programming (NLP); [20] 1939: Karush, Master's thesis, Mathematics, Univ. of Chicago (same constraint qualification)
• [17] 1948: Fritz John, "Extremum problems with inequalities ..."
K-K-T Conditions
NLP: min f(x) s.t. g(x) ≤ 0, h(x) = 0
CQ (constraint qualification): the geometry (cone of tangents) coincides with the algebra (linearization). Modern optimality conditions:
∇f(x*) + g'(x*)^T λ* + h'(x*)^T µ* = 0, λ* ≥ 0 (dual feasibility)
h(x*) = 0, g(x*) ≤ 0 (primal feasibility)
g(x*)^T λ* = 0 (complementary slackness)
Proof: apply Farkas' Lemma (1902) to the local linearization. (Modern proof: use the hyperplane separation theorem / S. Mazur's geometric Hahn-Banach Theorem.) A small numerical check follows.
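A tiny numerical check of these conditions (a made-up example, not from the talk): project the point (1, 2) onto the half-plane x1 + x2 ≤ 1, recover the multiplier from stationarity, and verify feasibility and complementary slackness. Note SLSQP's 'ineq' convention is fun(x) ≥ 0, so we pass −g:

```python
import numpy as np
from scipy.optimize import minimize

# min (x1-1)^2 + (x2-2)^2  s.t.  g(x) = x1 + x2 - 1 <= 0; solution x* = (0, 1)
f = lambda x: (x[0] - 1)**2 + (x[1] - 2)**2
grad_f = lambda x: 2 * (x - np.array([1.0, 2.0]))
g = lambda x: x[0] + x[1] - 1
grad_g = np.array([1.0, 1.0])

res = minimize(f, [0.0, 0.0], method='SLSQP',
               constraints=[{'type': 'ineq', 'fun': lambda x: -g(x)}])
x = res.x

# Recover lam* from stationarity (here lam* = 2), then check the KKT system.
lam = -(grad_f(x) @ grad_g) / (grad_g @ grad_g)
print(np.allclose(grad_f(x) + lam * grad_g, 0, atol=1e-6),  # stationarity
      lam >= 0,                                             # dual feasibility
      g(x) <= 1e-6,                                         # primal feasibility
      abs(lam * g(x)) < 1e-6)                               # compl. slackness
```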
Further Extensions
• Infinite (Cone) Programs, Duffin, 1956 [6]
• Optimization with respect to partial orders [15, 36, 16, 28, 14]
• Optimal Control (Pontryagin Maximum Principle)
• Discrete/Combinatorial Optimization
FIRST CHANCES - QUESTIONS? DISCUSSION?
NEOS/Argonne/Solvers
Figure 3: Optimization Tree, neos.mcs.anl.gov
Quasi-Newton Methods
For Unconstrained Optimization:
• Least-change secant methods / variable metric methods: Davidon '59 [5] / Fletcher-Powell '63 [10] (DFP), and Broyden [4] / Fletcher [9] / Goldfarb [13] / Shanno [34], '70 (BFGS).
• Rank-two updates of the Hessian approximation; maintains positive definite Hessian approximations. (A sketch of the update appears below.)
• But: automatic differentiation (Griewank-Corliss '91) differentiates the code efficiently.
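A minimal BFGS sketch (mine, not a production code): the rank-two update of the inverse-Hessian approximation H with a simple Armijo backtracking line search, run on scipy's Rosenbrock test function. Real implementations use a Wolfe line search; here the update is simply skipped when the curvature condition s^T y > 0 fails, which keeps H positive definite:

```python
import numpy as np
from scipy.optimize import rosen, rosen_der

def bfgs(f, grad, x0, tol=1e-8, max_iter=500):
    """Minimal BFGS: rank-two updates of the inverse-Hessian approximation H."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    H = np.eye(n)                       # inverse-Hessian approximation, kept PD
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -H @ g                      # quasi-Newton search direction
        t = 1.0                         # Armijo backtracking line search
        while f(x + t * p) > f(x) + 1e-4 * t * (g @ p):
            t *= 0.5
        s = t * p
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g
        if s @ y > 1e-12:               # curvature condition: keep H PD
            rho = 1.0 / (s @ y)
            V = np.eye(n) - rho * np.outer(s, y)
            H = V @ H @ V.T + rho * np.outer(s, s)   # the rank-two BFGS update
        x, g = x_new, g_new
    return x

print(bfgs(rosen, rosen_der, [-1.2, 1.0]))   # converges to [1, 1]
```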
Power of Duality
Find the optimal trajectory/control (for a rocket):
µ0 = min J(u) = (1/2)‖u(t)‖² = (1/2) ∫_{t0}^{t1} u²(t) dt
s.t. ẋ(t) = A(t)x(t) + b(t)u(t), x(t0) = x0, x(t1) ≥ c.
Using the fundamental solution matrix Φ:
x(t1) = Φ(t1, t0) x(t0) + ∫_{t0}^{t1} Φ(t1, t) b(t) u(t) dt,
where the integral operator in the last term is denoted Ku.
Duality cont...
Convex program: min J(u) = (1/2)‖u(t)‖² s.t. Ku ≥ d
The Lagrangian dual (best lower bound) is
µ0 = max_{λ≥0} min_u { J(u) + λ^T (d − Ku) }
   = max_{λ≥0} λ^T Q λ + λ^T d, a simple FINITE-dimensional QP,
where Q = −(1/2) ∫_{t0}^{t1} Φ(t1, t) b(t) b(t)^T Φ(t1, t)^T dt and u*(t) = λ*^T Φ(t1, t) b(t).
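A sanity check of this finite-dimensional dual on the simplest instance I could pick (not from the talk): ẋ = u on [0, 1], so A ≡ 0, b ≡ 1, Φ ≡ 1, with x(0) = 0 and x(1) ≥ 1. The primal optimum is the constant control u* ≡ 1 with energy 1/2, and a grid search over λ ≥ 0 recovers the same value from the dual QP:

```python
import numpy as np

# Scalar instance: x' = u on [0, 1] (Phi ≡ 1), x(0) = 0, x(1) >= 1.
# Primal: min (1/2) ∫ u^2 dt; the minimizer is u* ≡ 1 with value 1/2.
T, d = 1.0, 1.0                 # horizon t1 - t0 and required increase c - x0
Q = -0.5 * T                    # Q = -(1/2) ∫ Phi b b^T Phi^T dt, a scalar here

lam = np.linspace(0.0, 5.0, 50001)
dual = Q * lam**2 + d * lam     # concave in lam; maximize over lam >= 0
k = np.argmax(dual)
print(lam[k], dual[k])          # -> lam* = 1, dual value 1/2 = primal value
```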
Convex Analysis
Lies behind results in Optimization:
• Classic '70 text by Rockafellar (UofW, Seattle) [33]
• Nonsmooth Analysis: Clarke, Borwein (Smooth Variational Principle), Mordukhovich, Lewis
• Variational Principles (powerful optimality conditions, extensions to the nonconvex case)
Proving/Generating Theorems using Optimization
Spectral Decomposition Theorem, A = A^T:
• min x^T A x s.t. x^T x = 1
The Lagrangian is L(x, λ) = x^T A x + λ(1 − x^T x);
stationarity: ∇L(x1, λ) = 2Ax1 − 2λx1 = 0.
This gives the minimum eigenvalue, since the objective is x1^T A x1 = λ x1^T x1 = λ → min.
Now add the constraint x^T x1 = 0 to get the second eigenpair, etc.
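A quick numerical version of this argument (my sketch): minimize the quadratic over the unit sphere with SLSQP and compare against numpy's eigensolver. Every eigenvector is a stationary point, so an unlucky start can land on a saddle; generically the smallest eigenvalue is returned:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                          # a random symmetric matrix

obj = lambda x: x @ A @ x                  # x^T A x
con = {'type': 'eq', 'fun': lambda x: x @ x - 1}
res = minimize(obj, rng.standard_normal(5), method='SLSQP', constraints=[con])

print(res.fun, np.linalg.eigh(A)[0][0])    # both ≈ the smallest eigenvalue
```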
Proving/Generating Theorems using Optimization cont...
Eigenvalue Bounds, A = A^T:
min λ1(A) s.t. Σ_i λi(A) = trace(A), Σ_i λi(A)² = trace(A²)
The Lagrangian is L(...); stationarity: ∇L(...) = 0.
Explicit solution: with m := trace(A)/n and s² := trace(A²)/n − m²,
m − s√(n−1) ≤ λmin(A) ≤ m − s/√(n−1).
Similarly, get upper/lower bounds for λ2(A) and other functions of the eigenvalues, e.g. [35].
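A numerical check (mine) of the two-sided trace bound for λmin on a random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((6, 6))
A = (B + B.T) / 2
n = A.shape[0]

m = np.trace(A) / n                         # mean of the eigenvalues
s = np.sqrt(np.trace(A @ A) / n - m**2)     # their standard deviation

lam_min = np.linalg.eigvalsh(A)[0]
lo, hi = m - s * np.sqrt(n - 1), m - s / np.sqrt(n - 1)
print(lo <= lam_min <= hi, (lo, lam_min, hi))
```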
SUMT
• Penalty and Barrier Methods (lost favour)
• Frisch '55 [11]; Sequential Unconstrained Minimization Techniques, Fiacco-McCormick '68 [8]
• Penalize the equality constraints by (1/µ)‖h(x)‖², 1/µ → ∞; replace the inequality constraints g(x) ≤ 0 by a smooth barrier, −µ Σ_k log(−gk(x)), µ ↓ 0
• Solve a sequence of simpler unconstrained problems (a sketch follows):
min_x Bµ(x) = f(x) + (1/µ)‖h(x)‖² − µ Σ_k log(−gk(x))
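A minimal SUMT/log-barrier sketch on a made-up toy problem with no equality constraints: minimize (x1+1)² + (x2+1)² over x ≥ 0, whose solution is the origin; each unconstrained subproblem is warm-started from the last while µ is driven to 0:

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem: min (x1+1)^2 + (x2+1)^2  s.t.  x >= 0; solution x* = (0, 0).
f = lambda x: (x[0] + 1)**2 + (x[1] + 1)**2

def barrier(x, mu):
    if np.any(x <= 0):
        return np.inf                 # outside the barrier's domain
    return f(x) - mu * np.sum(np.log(x))

x = np.array([1.0, 1.0])              # strictly feasible start
for mu in [1.0, 0.1, 0.01, 1e-3, 1e-4]:
    # warm-start each unconstrained subproblem from the previous solution
    x = minimize(barrier, x, args=(mu,), method='Nelder-Mead').x
    print(mu, x)                      # iterates approach (0, 0) as mu -> 0
```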
Methods using Lagrange Multipliers
• Hestenes, Rockafellar, Fletcher, Powell, Conn-Gould-Toint (augmented Lagrangians: a combination of Lagrange multiplier and penalty methods), Gill-Murray-Wright ('81 [12], Stanford).
• Sequential Quadratic Programming (SQP): solve for the Newton direction for the optimality conditions, using quadratic approximations involving the Lagrangian function; see the sketch below.
SECOND CHANCES - QUESTIONS? DISCUSSION?
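A minimal sketch of the Newton-on-the-KKT-system idea behind SQP, on a made-up equality-constrained toy problem, min x1² + x2² s.t. x1 + x2 = 1; since this is a QP, a single Newton step solves the KKT system exactly:

```python
import numpy as np

# min x1^2 + x2^2  s.t.  h(x) = x1 + x2 - 1 = 0;  x* = (0.5, 0.5), mu* = -1
grad_f = lambda x: 2 * x
hess_L = 2 * np.eye(2)            # Hessian of the Lagrangian (h is linear)
a = np.array([1.0, 1.0])          # gradient of h

x, mu = np.array([2.0, -1.0]), 0.0
for _ in range(3):
    KKT = np.block([[hess_L, a[:, None]],
                    [a[None, :], np.zeros((1, 1))]])
    rhs = -np.concatenate([grad_f(x) + mu * a, [a @ x - 1.0]])
    step = np.linalg.solve(KKT, rhs)     # Newton step on the KKT system
    x, mu = x + step[:2], mu + step[2]
print(x, mu)                       # -> [0.5 0.5] and multiplier -1.0
```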
Interior Point Methods, LP
• 1984: Karmarkar '84 [19] (Berkeley), an interior point method to improve the complexity (polynomial-time) results. BUT: high efficiency was claimed for practical problems! (NYT front-page stories/fables)
• Stanford gang of four (Gill-Murray-Wright-Saunders): the connection to log-barrier methods (which came back).
Interior Point Revolution
• Kojima-Mizuno-Yoshise '89 [21]: an elegant primal-dual path-following framework. Mehrotra '92 [30]: predictor-corrector speedup and stability.
• OB1, Lustig-Marsten-Shanno '92 [29]; legal battle with Bell Labs.
• CPLEX tool for LP: large scale; 15-million-variable problems solved on a desktop.
• Nesterov-Nemirovski '89 [32]: extensions to convex problems, e.g. cone optimization problems such as Semidefinite Programming.
Semidefinite Programming
• Elegant Theory, Efficient Algorithms, Many Applications
• MAX-CUT: given an undirected, weighted graph G = (N, E) with weights W = (wij), cut (divide) the node set N into two sets so that the total weight of the edges cut is maximized:
p* := max (1/4) Σ_ij wij (1 − xi xj)
s.t. xi ∈ {±1}, i = 1, ..., n (equivalently, xi² = 1, i = 1, ..., n)
A brute-force check on a tiny graph follows.
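For orientation (my made-up 4-node graph, not from the talk), brute-force enumeration of all 2^n sign vectors evaluates the objective above directly; this is only feasible for tiny n:

```python
import itertools
import numpy as np

# A small made-up weighted graph; W[i, j] is the weight of edge {i, j}.
W = np.array([[0, 1, 2, 0],
              [1, 0, 1, 1],
              [2, 1, 0, 3],
              [0, 1, 3, 0]], dtype=float)
n = W.shape[0]

best_val, best_x = -np.inf, None
for signs in itertools.product([-1, 1], repeat=n):
    x = np.array(signs)
    val = 0.25 * np.sum(W * (1 - np.outer(x, x)))   # the objective above
    if val > best_val:
        best_val, best_x = val, x
print(best_val, best_x)   # here: value 7.0 at x = ±(1, -1, -1, 1)
```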
SDP via Dual of Dual
Lagrangian dual:
d* := min_λ max_x { x^T Q x + Σ_i λi (1 − xi²) } = min { e^T λ : Q − Diag(λ) ⪯ 0 }
dual of MC: min e^T λ s.t. Diag(λ) − Z = Q, Z ⪰ 0
dual of dual of MC: max trace(QX) s.t. diag(X) = e, X ⪰ 0
0.878 performance guarantee: Goemans and Williamson (IBM Bay Area). A sketch of the relaxation with random-hyperplane rounding follows.
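A sketch of the SDP relaxation and Goemans-Williamson rounding (mine, assuming the cvxpy modeling package and its default SDP solver are available; W is the same made-up graph as above). The relaxation is written in the equivalent edge form max (1/4) Σ wij (1 − Xij):

```python
import numpy as np
import cvxpy as cp

W = np.array([[0, 1, 2, 0],
              [1, 0, 1, 1],
              [2, 1, 0, 3],
              [0, 1, 3, 0]], dtype=float)
n = W.shape[0]

# SDP relaxation: relax X = x x^T to X ⪰ 0 with diag(X) = e
X = cp.Variable((n, n), PSD=True)
sdp = cp.Problem(cp.Maximize(0.25 * cp.sum(cp.multiply(W, 1 - X))),
                 [cp.diag(X) == 1])
sdp.solve()
print("SDP upper bound:", sdp.value)

# Goemans-Williamson: factor X = V V^T, then cut with a random hyperplane
w, U = np.linalg.eigh(X.value)
V = U @ np.diag(np.sqrt(np.maximum(w, 0)))     # rows v_i satisfy v_i·v_j = X_ij
x = np.sign(V @ np.random.default_rng(0).standard_normal(n))
x[x == 0] = 1
print("rounded cut value:", 0.25 * np.sum(W * (1 - np.outer(x, x))))
```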
SDP with CONDOR Solves QAP
• QAP, the Quadratic Assignment Problem: sizes n > 20 are considered hard (compare the fast solutions for n = 10^6 for the LAP)
• Important applications, e.g. VLSI design, massive parallelism (Blue Gene/IBM)
• SDP bound in a branch and bound framework, using CONDOR (High Throughput Computing using free cycles worldwide), www.cs.wisc.edu/condor
• Solved the Nugent problem with n = 30 (and others) for the first time [1].
SDP and Robust!! Optimization
Robust optimization: the problem data are known only within certain bounds.
Conic form: max b^T y s.t. c − A^T y ∈ K for all A ∈ U.
Goal: find a feasible solution that is acceptably close to optimal for all data within the bounds.
Applications: control theory; engineering design and finance; aircraft path planning; machine learning (robust classification, support vector machines, kernel optimization); e.g. Ben-Tal [3]; El Ghaoui (Berkeley) [7]. A small sketch follows.
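A minimal robust-LP sketch (my made-up data, again using cvxpy): a single inequality a^T y ≤ c whose coefficient vector lies in a box |a − ā| ≤ δ; the worst case over the box gives the deterministic counterpart ā^T y + δ^T|y| ≤ c. A box on y keeps the toy bounded:

```python
import numpy as np
import cvxpy as cp

# Made-up nominal data with componentwise uncertainty |a - a_bar| <= delta.
b = np.array([1.0, 2.0])
a_bar = np.array([1.0, 1.0])
delta = np.array([0.2, 0.5])
c = 1.0

y = cp.Variable(2)
# a^T y <= c for every a in the box  <=>  a_bar^T y + delta^T |y| <= c
prob = cp.Problem(cp.Maximize(b @ y),
                  [a_bar @ y + delta @ cp.abs(y) <= c,
                   cp.abs(y) <= 1])          # bound y to keep the toy bounded
prob.solve()
print(y.value, prob.value)                   # robustly feasible optimum
```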
SDP and Hilbert's 17th Problem, SOS
Hilbert, 1900: given a multivariate polynomial that takes only non-negative values over the reals, can it be represented as a sum of squares of rational functions? Artin, 1927: YES. (Gondard & Ribenboim: extension to symmetric matrices, 1974.)
But: SOS polynomials ⊊ nonnegative polynomials. With ml(z) a vector of monomials,
p(z) is an SOS of polynomials iff p(z) ≡ ml(z)^T W ml(z) for some W ⪰ 0,
an SDP feasibility problem; a sketch follows.
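A univariate sketch of the SOS-as-SDP idea (my example, using cvxpy): test p(x) = x⁴ + 2x³ + 3x² + 2x + 1 with monomials m(x) = [1, x, x²]; matching the coefficients of p against m(x)^T W m(x) and asking for W ⪰ 0 is an SDP feasibility problem. Here it succeeds, since p(x) = (x² + x + 1)²:

```python
import cvxpy as cp

# Is p(x) = x^4 + 2x^3 + 3x^2 + 2x + 1 a sum of squares?
# Seek W ⪰ 0 with p(x) ≡ m(x)^T W m(x), m(x) = [1, x, x^2].
W = cp.Variable((3, 3), PSD=True)
constraints = [
    W[0, 0] == 1,                 # constant term
    2 * W[0, 1] == 2,             # coefficient of x
    2 * W[0, 2] + W[1, 1] == 3,   # coefficient of x^2
    2 * W[1, 2] == 2,             # coefficient of x^3
    W[2, 2] == 1,                 # coefficient of x^4
]
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()
print(prob.status)    # 'optimal' => feasible => p is SOS: p = (x^2 + x + 1)^2
```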
Lax '58 Conjecture is True
Lewis, Parrilo, Ramana '03 [27]: hyperbolic polynomials in three variables are determinants of three symmetric matrices. This is equivalent to the Helton-Vinnikov observation: a polynomial q on ℜ² is a real zero polynomial of degree d satisfying q(0, 0) = 1 if and only if there exist matrices B, C ∈ S^d such that q(y, z) = det(I + yB + zC).
SDP Open Problems; Exciting Area
• Which problems can be formulated as SDPs? (Algebraic connections; BIRS/MSRI workshops on Positive Polynomials and Optimization, e.g. Parrilo, ex-postdoc at Berkeley)
• Efficient/stable solution of large-scale problems
• Extensions of the SDP methodology: e.g. symmetric cones and the relation to Jordan algebras; bilinear matrix inequalities
• SDP at UC Berkeley: the leader is Laurent El Ghaoui; recent grad Jiawang Nie (working on sensor localization) with co-advisors Jim Demmel and Bernd Sturmfels.
Outstanding Problems/Questions
• Kepler (1611) Conjecture: close packing (cubic or hexagonal close packing, with maximum density π/(3√2) ≈ 74.048%) is the densest possible sphere packing. Find the densest (not necessarily periodic) packing of spheres: the Kepler problem.
Effective Certificates of Optimality?
• Hales' (1997) detailed plan, with extensive use of computer calculations. Hales' full proof appeared in a series of papers totaling more than 250 pages (Cipra 1998). The proof relies extensively on global optimization, linear programming, and interval arithmetic. The computer files require more than 3 gigabytes of storage, e.g. [2].
Hard Nonconvex Problems
• e.g. protein folding: how do natural phenomena optimize?
• Hubble telescope: SDP, projection algorithms.
• Strongly polynomial LP algorithms; the Hirsch conjecture (the combinatorial diameter of a d-polytope with n facets is bounded by n − d).
• Weather prediction uses a least-squares minimization problem for the initial conditions.
• Massive parallel computing: optimize the network, minimize heat (VLSI design); metrics for performance?
Discrete Optimization
• Important applications: e.g. ministry-of-health problems are all discrete optimization problems
• Continuous optimization relaxations are used within branch and bound methods
• Gomory cutting planes came back and lie behind the current success in solving large-scale discrete problems (e.g. in CPLEX).
• The QAP is NP-hard, but still needs to be solved; i.e. worst-case complexity and expected performance can differ drastically.
Connections with Optimization
• Discrete and Continuous Optimization
• Optimal Control Theory (space program, environment)
• Medicine (molecular conformation, scheduling)
• Politics (game theory)
• Computer Science (massive parallelism, VLSI design, computer design, quantum computing)
• Management Science and Engineering in general
• Economics (government planning)
• Statistics, e.g. machine learning
THIRD CHANCES - QUESTIONS? DISCUSSION?
Resources/References
• Optimization Frequently Asked Questions (Linear Programming FAQ, Nonlinear Programming FAQ): www-unix.mcs.anl.gov/otc/Guide/faq/
• NEOS: neos.mcs.anl.gov/neos/index.html
• e-optimization community: www.e-optimization.com/
• Optimization Online: www.optimization-online.org/
References
[1] K.M. ANSTREICHER, N.W. BRIXIUS, J.-P. GOUX, and J. LINDEROTH. Solving large quadratic assignment problems on computational grids. Math. Program., 91(3, Ser. A):563–588, 2002.
[2] D.H. BAILEY and J.M. BORWEIN. Experimental mathematics: recent developments and future outlook. In Mathematics unlimited—2001 and beyond, pages 51–66. Springer, Berlin, 2001. URL: users.cs.dal.ca/~jborwein/math-future.pdf.
[3] A. BEN-TAL and A.S. NEMIROVSKI. Robust convex optimization. Math. Oper. Res., 23(4):769–805, 1998.
[4] C.G. BROYDEN. The convergence of a class of double-rank minimization algorithms, Part I. IMA J. Appl. Math., 6:76–90, 1970.
[5] W.C. DAVIDON. Variable metric methods for minimization. Technical Report ANL-5990, Argonne National Labs, Argonne, IL, 1959.
[6] R.J. DUFFIN. Infinite programs. In A.W. Tucker, editor, Linear Inequalities and Related Systems, pages 157–170. Princeton University Press, Princeton, NJ, 1956.
[7] L. EL GHAOUI and G. CALAFIORE. Worst-case simulation of uncertain systems. In A. Garulli, A. Tesi, and A. Vicino, editors, Robustness in Identification and Control, Lecture Notes in Control and Information Sciences. Springer, 1999.
[8] A.V. FIACCO and G.P. McCORMICK. Nonlinear Programming: Sequential Unconstrained Minimization Techniques. Classics in Applied Mathematics. SIAM, Philadelphia, PA, USA, 1990.
[9] R. FLETCHER. A new approach to variable metric algorithms. Comput. J., 13:317–322, 1970.
[10] R. FLETCHER and M.J.D. POWELL. A rapidly convergent descent method for minimization. Comput. J., 6:163–168, 1963.
[11] K.R. FRISCH. The logarithmic potential method of convex programming. Technical report, Institute of Economics, Oslo University, Oslo, Norway, 1955.
[12] P.E. GILL, W. MURRAY, and M.H. WRIGHT. Practical Optimization. Academic Press, New York, London, Toronto, Sydney and San Francisco, 1981.
[13] D. GOLDFARB. A family of variable-metric methods derived by variational means. Math. Comp., 24:23–26, 1970.
[14] R.B. HOLMES. Geometric Functional Analysis and its Applications. Springer-Verlag, Berlin, 1975.
[15] J. JAHN. Mathematical Vector Optimization in Partially Ordered Linear Spaces. Peter Lang, Frankfurt am Main, 1986.
[16] G. JAMESON. Ordered Linear Spaces. Springer-Verlag, New York, 1970.
[17] F. JOHN. Extremum problems with inequalities as subsidiary conditions. In Studies and Essays, Courant Anniversary Volume, pages 187–204. Interscience, New York, 1948.
[18] L.V. KANTOROVICH. Mathematical methods of organizing and planning production. Management Sci., 6:366–422, 1959/1960.
[19] N.K. KARMARKAR. A new polynomial-time algorithm for linear programming. Combinatorica, 4:373–395, 1984.
[20] W. KARUSH. Minima of functions of several variables with inequalities as side constraints. Master's thesis, University of Chicago, Illinois, 1939.
[21] M. KOJIMA, S. MIZUNO, and A. YOSHISE. A primal-dual interior point algorithm for linear programming. In N. Megiddo, editor, Progress in Mathematical Programming: Interior Point and Related Methods, pages 29–47. Springer Verlag, New York, 1989.
[22] T.C. KOOPMANS. Concepts of optimality and their uses. Nobel Memorial Lecture, Yale University, 1975.
[23] H.W. KUHN. The Hungarian method for the assignment problem. Naval Res. Logist. Quart., 2:83–97, 1955.
[24] H.W. KUHN. Nonlinear programming: a historical view. In R.W. Cottle and C.E. Lemke, editors, Nonlinear Programming, pages 1–26, Providence, R.I., 1976. AMS.
[25] H.W. KUHN and A.W. TUCKER. Nonlinear programming. In Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, 1950, pages 481–492, Berkeley and Los Angeles, 1951. University of California Press.
[26] J.K. LENSTRA, A.H.G. RINNOOY KAN, and A. SCHRIJVER. History of Mathematical Programming: A Collection of Personal Reminiscences. CWI North-Holland, Amsterdam, 1991.
[27] A.S. LEWIS, P.A. PARRILO, and M.V. RAMANA. The Lax conjecture is true. Proc. Amer. Math. Soc., 133(9):2495–2499 (electronic), 2005.
[28] D.G. LUENBERGER. Optimization by Vector Space Methods. John Wiley, 1969.
[29] I.J. LUSTIG, R.E. MARSTEN, and D.F. SHANNO. On implementing Mehrotra's predictor-corrector interior point method for linear programming. SIAM J. Optim., 2(3):435–449, 1992.
[30] S. MEHROTRA. On the implementation of a primal-dual interior point method. SIAM J. Optim., 2(4):575–601, 1992.
[31] J. MUNKRES. Algorithms for the assignment and transportation problems. J. Soc. Indust. Appl. Math., 5:32–38, 1957.
[32] Y.E. NESTEROV and A.S. NEMIROVSKI. Interior Point Polynomial Algorithms in Convex Programming. SIAM Publications. SIAM, Philadelphia, USA, 1994.
[33] R.T. ROCKAFELLAR. Convex Analysis. Princeton University Press, Princeton, NJ, 1970.
[34] D.F. SHANNO. Conditioning of quasi-Newton methods for function minimization. Math. Comp., 24:647–657, 1970.
[35] H. WOLKOWICZ and G.P.H. STYAN. More bounds for eigenvalues using traces. Linear Algebra Appl., 31:1–17, 1980.
[36] Y.C. WONG and K.F. NG. Partially Ordered Topological Vector Spaces. Oxford Mathematical Monographs. Clarendon Press, Oxford, 1973.