Convex Optimization and Applications

Lin Xiao
Center for the Mathematics of Information, California Institute of Technology

Acknowledgment: Stephen Boyd (Stanford), Lieven Vandenberghe (UCLA), and their research groups

CS286a: Mathematics of Information seminar, 10/15/04

Two problems

polyhedron P described by linear inequalities, a_i^T x ≤ b_i, i = 1, ..., L


Problem 1: find minimum volume ellipsoid ⊇ P
Problem 2: find maximum volume ellipsoid ⊆ P

are these (computationally) difficult? or easy?


problem 1 is very difficult

• in practice
• in theory (NP-hard)

problem 2 is very easy

• in practice (readily solved on small computer)
• in theory (polynomial complexity)


Moral

very difficult and very easy problems can look quite similar

. . . unless we are trained to recognize the difference


Linear program (LP)

minimize    c^T x
subject to  a_i^T x ≤ b_i,  i = 1, ..., m

c, a_i ∈ R^n are parameters; x ∈ R^n is the variable

• easy to solve, in theory and practice

• can solve dense problems with n = 1000 variables, m = 10000 constraints easily; far larger for sparse or structured problems
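Not part of the talk, but as a concrete illustration, here is a minimal CVXPY sketch of a dense LP of this flavor; all data is synthetic and the solver is whatever CVXPY picks by default.

```python
# Minimal sketch: a random dense LP, minimize c^T x s.t. A x <= b, via CVXPY.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, m = 100, 1000                        # variables, constraints (small demo sizes)
A = rng.standard_normal((m, n))
x0 = rng.standard_normal(n)
b = A @ x0 + rng.uniform(0.1, 1.0, m)   # makes x0 strictly feasible
c = rng.standard_normal(n)

x = cp.Variable(n)
prob = cp.Problem(cp.Minimize(c @ x), [A @ x <= b])
prob.solve()
print(prob.status, prob.value)
```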


Polynomial minimization

minimize    p(x)

p is a polynomial of degree d; x ∈ R^n is the variable

• except for special cases (e.g., d = 2) this is a very difficult problem

• even sparse problems with size n = 20, d = 10 are essentially intractable

• all algorithms known to solve this problem require effort exponential in n


Moral

• a problem can appear∗ hard, but be easy
• a problem can appear∗ easy, but be hard

∗ if we are not trained to recognize them


What makes a problem easy or hard?

classical view:

• linear is easy
• nonlinear is hard(er)


What makes a problem easy or hard?

emerging (and correct) view:

". . . the great watershed in optimization isn't between linearity and nonlinearity, but convexity and nonconvexity."
— R. Rockafellar, SIAM Review 1993


Convex optimization

minimize    f_0(x)
subject to  f_1(x) ≤ 0, ..., f_m(x) ≤ 0
            Ax = b

x ∈ R^n is the optimization variable; f_i : R^n → R are convex:

f_i(λx + (1 − λ)y) ≤ λ f_i(x) + (1 − λ) f_i(y)   for all x, y, 0 ≤ λ ≤ 1

• includes least-squares, linear programming, maximum volume ellipsoid in polyhedron, and many others
• convex problems are fundamentally tractable


Example: Robust LP

minimize    c^T x
subject to  Prob(a_i^T x ≤ b_i) ≥ η,  i = 1, ..., m

coefficient vectors a_i IID, a_i ∼ N(ā_i, Σ_i); η is the required reliability

• for fixed x, a_i^T x is N(ā_i^T x, x^T Σ_i x)
• so for η = 50%, the robust LP reduces to the LP

  minimize    c^T x
  subject to  ā_i^T x ≤ b_i,  i = 1, ..., m

  and so is easily solved
• what about other values of η, e.g., η = 10%? η = 90%?


constraint Prob(a_i^T x ≤ b_i) ≥ η is equivalent to

ā_i^T x + Φ^{-1}(η) ||Σ_i^{1/2} x||_2 − b_i ≤ 0

Φ is the CDF of the unit Gaussian

is the LHS a convex function?


Hint

{x | Prob(a_i^T x ≤ b_i) ≥ η, i = 1, ..., m}

[figure: this feasible set shown for η = 10%, η = 50%, and η = 90%]


That’s right

robust LP with reliability η = 90% is convex, and very easily solved

robust LP with reliability η = 10% is not convex, and extremely difficult


Maximum volume ellipsoid in polyhedron

• polyhedron: P = {x | a_i^T x ≤ b_i, i = 1, ..., m}
• ellipsoid: E = {By + d | ||y|| ≤ 1}, with B = B^T ≻ 0


maximum volume E ⊆ P, as a convex problem in variables B, d:

maximize    log det B
subject to  B = B^T ≻ 0
            ||B a_i|| + a_i^T d ≤ b_i,  i = 1, ..., m
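A minimal CVXPY sketch of this maximum-volume inscribed ellipsoid problem, on a made-up 2-D polyhedron; cp.log_det and cp.norm express the objective and constraints above (illustration, not from the talk).

```python
# Maximum volume ellipsoid E = {B y + d : ||y|| <= 1} inside P = {x : A x <= b}.
import numpy as np
import cvxpy as cp

A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0], [1.0, 1.0]])
b = np.array([1.0, 1.0, 1.0, 1.0, 1.5])     # made-up polyhedron data
n = A.shape[1]

B = cp.Variable((n, n), PSD=True)           # B = B^T >= 0; optimum is positive definite
d = cp.Variable(n)
constraints = [cp.norm(B @ A[i], 2) + A[i] @ d <= b[i] for i in range(A.shape[0])]
prob = cp.Problem(cp.Maximize(cp.log_det(B)), constraints)
prob.solve()
print(B.value, d.value)
```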


Moral

• it's not easy to recognize convex functions and convex optimization problems
• huge benefit, though, when we do


Convex Analysis and Optimization

Convex analysis & optimization

nice properties of convex optimization problems known since 1960s

• local solutions are global

• duality theory, optimality conditions

• simple solution methods like alternating projections

convex analysis well developed by 1970s (Rockafellar)

• separating & supporting hyperplanes
• subgradient calculus


What's new (since 1990 or so)

• primal-dual interior-point (IP) methods
  extremely efficient, handle nonlinear large scale problems, polynomial-time complexity results, software implementations
• new standard problem classes
  generalizations of LP, with theory, algorithms, software
• extension to generalized inequalities
  semidefinite, cone programming


Applications and uses

• lots of applications
  control, combinatorial optimization, signal processing, circuit design, communications, machine learning . . .
• robust optimization
  robust versions of LP, least-squares, other problems
• relaxations and randomization
  provide bounds, heuristics for solving hard (e.g., combinatorial optimization) problems


Recent history

• 1984–97: interior-point methods for LP
  – 1984: Karmarkar's interior-point LP method
  – theory: Ye, Renegar, Kojima, Todd, Monteiro, Roos, . . .
  – practice: Wright, Mehrotra, Vanderbei, Shanno, Lustig, . . .
• 1988: Nesterov & Nemirovsky's self-concordance analysis
• 1989–: LMIs and semidefinite programming in control
• 1990–: semidefinite programming in combinatorial optimization
  Alizadeh, Goemans, Williamson, Lovász & Schrijver, Parrilo, . . .
• 1994: interior-point methods for nonlinear convex problems
  Nesterov & Nemirovsky, Overton, Todd, Ye, Sturm, . . .
• 1997–: robust optimization
  Ben Tal, Nemirovsky, El Ghaoui, . . .


New Standard Convex Problem Classes

Some new standard convex problem classes

• second-order cone program (SOCP)
• geometric program (GP) (and entropy problems)
• semidefinite program (SDP)

all these new problem classes have

• complete duality theory, similar to LP

• good algorithms, and robust, reliable software

• wide variety of new applications


Second-order cone program

second-order cone program (SOCP) has the form

minimize    c_0^T x
subject to  ||A_i x + b_i||_2 ≤ c_i^T x + d_i,  i = 1, ..., m

with variable x ∈ R^n

• includes LP and QP as special cases
• nondifferentiable when A_i x + b_i = 0
• new IP methods can solve (almost) as fast as LPs


Example: robust linear program

minimize    c^T x
subject to  Prob(a_i^T x ≤ b_i) ≥ η,  i = 1, ..., m

where a_i ∼ N(ā_i, Σ_i)

equivalent to

minimize    c^T x
subject to  ā_i^T x + Φ^{-1}(η) ||Σ_i^{1/2} x||_2 ≤ b_i,  i = 1, ..., m

where Φ is the (unit) normal CDF

robust LP is an SOCP for η ≥ 0.5 (since then Φ^{-1}(η) ≥ 0)
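Illustrative sketch (not from the talk): the SOCP above in CVXPY, with synthetic ā_i, Σ_i^{1/2}, and scipy.stats.norm.ppf supplying Φ^{-1}(η); the box constraint is added only to keep the toy instance bounded.

```python
# Robust LP as an SOCP: a_bar_i^T x + Phi^{-1}(eta) ||Sigma_i^{1/2} x||_2 <= b_i.
import numpy as np
import cvxpy as cp
from scipy.stats import norm

rng = np.random.default_rng(1)
n, m, eta = 5, 8, 0.9
a_bar = rng.standard_normal((m, n))
b = rng.uniform(1.0, 2.0, m)
c = rng.standard_normal(n)
Sigma_half = [np.diag(rng.uniform(0.1, 0.3, n)) for _ in range(m)]  # Sigma_i^{1/2}

kappa = norm.ppf(eta)   # Phi^{-1}(eta) >= 0 since eta >= 0.5, so each constraint is convex
x = cp.Variable(n)
cons = [a_bar[i] @ x + kappa * cp.norm(Sigma_half[i] @ x, 2) <= b[i] for i in range(m)]
cons.append(cp.norm(x, "inf") <= 10)      # box only to keep this toy problem bounded
prob = cp.Problem(cp.Minimize(c @ x), cons)
prob.solve()
print(prob.status, prob.value)
```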


Geometric program (GP)

log-sum-exp function:

lse(x) = log(e^{x_1} + · · · + e^{x_n})

. . . a smooth convex approximation of the max function

geometric program:

minimize    lse(A_0 x + b_0)
subject to  lse(A_i x + b_i) ≤ 0,  i = 1, ..., m

A_i ∈ R^{m_i×n}, b_i ∈ R^{m_i}; variable x ∈ R^n
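A minimal CVXPY sketch of a convex-form GP with synthetic A_i, b_i; cp.log_sum_exp is the lse function above (the zero row in A_0 and the −2 offsets in b_1 are only there to keep the toy instance bounded and feasible).

```python
# Convex-form GP: minimize lse(A0 x + b0) s.t. lse(A1 x + b1) <= 0.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
n = 4
A0 = rng.standard_normal((3, n)); b0 = rng.standard_normal(3)
A0[0, :] = 0.0; b0[0] = 0.0                    # zero row keeps the objective bounded below
A1 = rng.standard_normal((3, n)); b1 = -2.0 * np.ones(3)   # keeps x = 0 strictly feasible

x = cp.Variable(n)
objective = cp.Minimize(cp.log_sum_exp(A0 @ x + b0))
constraints = [cp.log_sum_exp(A1 @ x + b1) <= 0]
cp.Problem(objective, constraints).solve()
print(x.value)
```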


Posynomial form geometric program

x = (x_1, ..., x_n): vector of positive variables

a function f of the form

f(x) = Σ_{k=1}^{t} c_k x_1^{α_1k} x_2^{α_2k} · · · x_n^{α_nk}

with c_k ≥ 0, α_ik ∈ R, is called a posynomial

like a polynomial, but
• coefficients must be positive

• exponents can be fractional or negative


Posynomial form geometric program

posynomial form GP:

minimize    f_0(x)
subject to  f_i(x) ≤ 1,  i = 1, ..., m

f_i are posynomials; x_i > 0 are the variables

to convert to (convex form) GP, substitute x = e^y and express as

minimize    log f_0(e^y)
subject to  log f_i(e^y) ≤ 0,  i = 1, ..., m

objective and constraints then have the form lse(A_i y + b_i)


Entropy problems

unnormalized negative entropy is a convex function:

−entr(x) = Σ_{i=1}^{n} x_i log(x_i / 1^T x)

defined for x_i ≥ 0, 1^T x > 0

entropy problem:

minimize    −entr(A_0 x + b_0)
subject to  −entr(A_i x + b_i) ≤ 0,  i = 1, ..., m

A_i ∈ R^{m_i×n}, b_i ∈ R^{m_i}


Solving GPs (and entropy problems)

• GP and entropy problems are duals (if we solve one, we solve the other)
• new IP methods can solve large scale GPs (and entropy problems) almost as fast as LPs
• applications in many areas:
  – information theory, statistics
  – communications, wireless power control
  – digital and analog circuit design


Generalized inequalities

with a proper convex cone K ⊆ R^k we associate the generalized inequality

x ⪯_K y  ⟺  y − x ∈ K

convex optimization problem with generalized inequalities:

minimize    f_0(x)
subject to  f_1(x) ⪯_{K_1} 0, . . . , f_L(x) ⪯_{K_L} 0
            Ax = b

f_i : R^n → R^{k_i} are K_i-convex: for all x, y, 0 ≤ λ ≤ 1,

f_i(λx + (1 − λ)y) ⪯_{K_i} λ f_i(x) + (1 − λ) f_i(y)


Semidefinite program

semidefinite program (SDP):

minimize    c^T x
subject to  x_1 A_1 + · · · + x_n A_n ⪯ B

• B, A_i are symmetric matrices; variable is x ∈ R^n
• ⪯ is the matrix inequality; the constraint is a linear matrix inequality (LMI)
• SDP can be expressed as a convex problem, as

λ_max(x_1 A_1 + · · · + x_n A_n − B) ≤ 0

or handled directly as a cone problem
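Illustrative CVXPY sketch (synthetic symmetric matrices A_i, B): the LMI constraint above written with CVXPY's PSD constraint operator >>; the box on x is only there to keep the toy instance bounded.

```python
# SDP: minimize c^T x s.t. x_1 A_1 + ... + x_n A_n <= B (matrix inequality).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)
n, k = 3, 4                                    # n scalar variables, k x k matrices
sym = lambda M: (M + M.T) / 2
A = [sym(rng.standard_normal((k, k))) for _ in range(n)]
B = sym(rng.standard_normal((k, k))) + 5 * np.eye(k)   # x = 0 strictly feasible
c = rng.standard_normal(n)

x = cp.Variable(n)
lmi = B - sum(x[i] * A[i] for i in range(n)) >> 0       # the LMI, i.e. sum x_i A_i <= B
prob = cp.Problem(cp.Minimize(c @ x),
                  [lmi, cp.norm(x, "inf") <= 10])       # box only to keep the demo bounded
prob.solve()
print(prob.value)
```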


Semidefinite programming

• nearly complete duality theory, similar to LP
• interior-point algorithms that are efficient in theory & practice
• applications in many areas:
  – control theory
  – combinatorial optimization & graph theory
  – structural optimization
  – statistics
  – signal processing
  – circuit design
  – geometrical problems
  – communications and information theory
  – quantum computing
  – algebraic geometry
  – machine learning


Continuous time symmetric Markov chain on a graph

• connected graph G = (V, E), V = {1, ..., n}, no self-loops
• Markov process on V, with symmetric rate matrix Q
  – Q_ij = 0 for (i, j) ∉ E, i ≠ j
  – Q_ij ≥ 0 for (i, j) ∈ E
  – Q_ii = −Σ_{j≠i} Q_ij
• eigenvalues of Q ordered as 0 = λ_1(Q) > λ_2(Q) ≥ · · · ≥ λ_n(Q)
• state distribution given by π(t) = e^{tQ} π(0)
• distribution converges to uniform with rate determined by λ_2:

||π(t) − 1/n||_tv ≤ (√n / 2) e^{λ_2(Q) t}


Fastest mixing Markov chain (FMMC) problem

minimize    λ_2(Q)
subject to  Q1 = 0,  Q = Q^T
            Q_ij = 0 for (i, j) ∉ E, i ≠ j
            Q_ij ≥ 0 for (i, j) ∈ E
            Σ_{(i,j)∈E} d_ij^2 Q_ij ≤ 1

• variable is the matrix Q; problem data is the graph and the constants d_ij = d_ji on E
• need to add a constraint on the rates since λ_2 is homogeneous
• optimal Q gives fastest diffusion process on the graph (subject to the rate constraint)


Fast diffusion processes

[figure: analogous diffusion processes on a 6-node graph: an electrical network with conductances g_ij and a mechanical spring network with stiffnesses k_ij]


Swap objective and constraint

• λ_2(Q) and Σ_{(i,j)∈E} d_ij^2 Q_ij are homogeneous
• hence, can just as well minimize the weighted rate sum subject to a bound on λ_2(Q):

minimize    Σ_{(i,j)∈E} d_ij^2 Q_ij
subject to  Q1 = 0,  Q = Q^T
            Q_ij = 0 for (i, j) ∉ E, i ≠ j
            Q_ij ≥ 0 for (i, j) ∈ E
            λ_2(Q) ≤ −1


SDP formulation of FMMC

using Q1 = 0, we have

λ_2(Q) ≤ −1  ⟺  Q − 11^T/n ⪯ −I

so FMMC reduces to the SDP

minimize    Σ_{(i,j)∈E} d_ij^2 Q_ij
subject to  Q1 = 0,  Q = Q^T
            Q_ij = 0 for (i, j) ∉ E, i ≠ j
            Q_ij ≥ 0 for (i, j) ∈ E
            Q − 11^T/n ⪯ −I

hence: can solve efficiently, duality theory, . . .
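A minimal CVXPY sketch of this SDP on a small made-up path graph with unit d_ij (illustration only; a real instance would supply the actual graph and distances).

```python
# FMMC SDP: minimize sum d_ij^2 Q_ij s.t. Q1 = 0, support/sign constraints,
# and Q - 11^T/n <= -I (i.e. lambda_2(Q) <= -1).
import numpy as np
import cvxpy as cp

n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]          # a small path graph
d2 = {e: 1.0 for e in edges}                      # d_ij^2 (all ones here)

Q = cp.Variable((n, n), symmetric=True)
cons = [Q @ np.ones(n) == 0,                      # Q1 = 0
        Q - np.ones((n, n)) / n << -np.eye(n)]    # Q - 11^T/n <= -I
cons += [Q[i, j] >= 0 for (i, j) in edges]
cons += [Q[i, j] == 0 for i in range(n) for j in range(i + 1, n) if (i, j) not in edges]
obj = cp.Minimize(sum(d2[e] * Q[e] for e in edges))
prob = cp.Problem(obj, cons)
prob.solve()
print(prob.value)
print(np.round(Q.value, 3))
```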


Robust Optimization

Robust optimization problem

robust optimization problem:

minimize    max_{a∈A} f_0(a, x)
subject to  max_{a∈A} f_i(a, x) ≤ 0,  i = 1, ..., m

• x is optimization variable

• a is uncertain parameter

• A is parameter set (e.g., box, polyhedron, ellipsoid)

heuristics (detuning, sensitivity penalty term) have been used since beginning of optimization


Robust optimization via convex optimization

(El Ghaoui, Ben Tal & Nemirovsky, . . .)

• robust versions of LP, QP, SOCP problems, with ellipsoidal or polyhedral uncertainty, can be formulated as SDPs (or simpler)
• other robust problems (e.g., SDP) intractable, but there are good convex approximations


Robust least-squares

robust LS problem with uncertain parameters v_1, ..., v_p:

minimize    sup_{||v||≤1} ||(A_0 + v_1 A_1 + · · · + v_p A_p) x − b||_2

equivalent SDP (variables x, t_1, t_2):

minimize    t_1 + t_2
subject to  [ I        P(x)    q(x) ]
            [ P(x)^T   t_1 I   0    ]  ⪰ 0
            [ q(x)^T   0       t_2  ]

where P(x) = [A_1 x  A_2 x  · · ·  A_p x] ∈ R^{m×p},  q(x) = A_0 x − b

example: minimize sup_{||v||≤ρ} ||(A_0 + v_1 A_1 + v_2 A_2) x − b||_2

[figure: worst-case residual vs. ρ for the LS solution and the robust LS solution (||A_0|| = 10, ||A_1|| = ||A_2|| = 1)]


example: minimize sup_{||v||≤1} ||(A_0 + v_1 A_1 + v_2 A_2) x − b||_2

[figure: distribution of the residual ||(A_0 + v_1 A_1 + v_2 A_2) x − b||_2 for x_ls, x_tych, and x_rls, with v uniformly distributed]


Relaxations & Randomization

Relaxations & randomization

convex optimization is increasingly used

• to find good bounds for hard (i.e., nonconvex) problems, via relaxation
• as a heuristic for finding good suboptimal points, often via randomization


Example: Boolean least-squares

Boolean least-squares problem:

minimize    ||Ax − b||^2
subject to  x_i^2 = 1,  i = 1, ..., n

• basic problem in digital communications

• could check all 2^n possible values of x . . .

• an NP-hard problem, and very hard in practice

• many heuristics for approximate solution


Boolean least-squares as matrix problem

||Ax − b||^2 = x^T A^T A x − 2 b^T A x + b^T b
            = Tr(A^T A X) − 2 b^T A x + b^T b

where X = xx^T

hence can express BLS as

minimize    Tr(A^T A X) − 2 b^T A x + b^T b
subject to  X_ii = 1,  X ⪰ xx^T,  rank(X) = 1

. . . still a very hard problem


SDP relaxation for BLS

ignore the rank one constraint, and use

X ⪰ xx^T  ⟺  [ X    x ]
              [ x^T  1 ]  ⪰ 0

to obtain the SDP relaxation (with variables X, x)

minimize    Tr(A^T A X) − 2 b^T A x + b^T b
subject to  X_ii = 1,  [ X    x ]
                       [ x^T  1 ]  ⪰ 0

• optimal value of SDP gives a lower bound for BLS
• if the optimal matrix is rank one, we're done


Interpretation via randomization

• can think of variables X, x in the SDP relaxation as defining a normal distribution z ∼ N(x, X − xx^T), with E z_i^2 = 1
• SDP objective is E ||Az − b||^2

suggests a randomized method for BLS:

• find X⋆, x⋆ optimal for the SDP relaxation
• generate z from N(x⋆, X⋆ − x⋆ x⋆^T)
• take x = sgn(z) as an approximate solution of BLS

(can repeat many times and take the best one)
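Putting the last two slides together, a sketch (synthetic A, b; CVXPY for the SDP, NumPy for the rounding) of the relaxation plus randomized rounding; sizes, sample count, and the jitter term are illustrative choices.

```python
# SDP relaxation of Boolean least-squares, then randomized rounding via sgn(z).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(4)
m, n = 15, 10
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# relaxation over Z = [[X, x], [x^T, 1]] >= 0 with diag(X) = 1
Z = cp.Variable((n + 1, n + 1), PSD=True)
X, x = Z[:n, :n], Z[:n, n]
objective = cp.Minimize(cp.trace(A.T @ A @ X) - 2 * b @ A @ x + b @ b)
prob = cp.Problem(objective, [cp.diag(X) == 1, Z[n, n] == 1])
prob.solve()
lower_bound = prob.value

# randomized rounding: z ~ N(x*, X* - x* x*^T), candidate solution sgn(z)
xs, Xs = x.value, X.value
cov = Xs - np.outer(xs, xs) + 1e-9 * np.eye(n)   # tiny jitter for numerical PSD-ness
best = np.inf
for _ in range(100):
    xhat = np.sign(rng.multivariate_normal(xs, cov))
    best = min(best, np.linalg.norm(A @ xhat - b) ** 2)
print(lower_bound, best)      # SDP lower bound vs. best rounded objective
```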


Example

• (randomly chosen) parameters A ∈ R^{150×100}, b ∈ R^150
• x ∈ R^100, so the feasible set has 2^100 ≈ 10^30 points

LS approximate solution: minimize ||Ax − b|| s.t. ||x||^2 = n, then round;
yields objective 8.7% over the SDP relaxation bound

randomized method (using the SDP optimal distribution):
• best of 20 samples: 3.1% over SDP bound
• best of 1000 samples: 2.6% over SDP bound


[figure: histogram of ||Ax − b||/(SDP bound) over randomized samples, with the SDP bound and the LS solution marked]


Calculus of Convex Functions

Approach

• basic examples or atoms
• calculus rules or transformations that preserve convexity


Convex functions: Basic examples

• x^p for p ≥ 1 or p ≤ 0; −x^p for 0 ≤ p ≤ 1
• e^x, −log x, x log x
• x^T x; x^T x / y (for y > 0); (x^T x)^{1/2}
• ||x|| (any norm)
• max(x_1, ..., x_n), log(e^{x_1} + · · · + e^{x_n})
• −log Φ(x) (Φ is the Gaussian CDF)
• log det X^{-1} (for X ≻ 0)


Calculus rules

• convexity preserved under sums, nonnegative scaling
• if f cvx, then g(x) = f(Ax + b) cvx
• pointwise sup: if f_α cvx for each α ∈ A, then g(x) = sup_{α∈A} f_α(x) cvx
• minimization: if f(x, y) cvx, then g(x) = inf_y f(x, y) cvx
• composition rules: if h cvx & increasing, f cvx, then g(x) = h(f(x)) cvx
• perspective transformation: if f cvx, then g(x, t) = t f(x/t) cvx for t > 0

. . . and many, many others


More examples

• λ_max(X) (for X = X^T)
• f(x) = x_[1] + · · · + x_[k]  (sum of largest k elements of x)
• −Σ_{i=1}^{m} log(−f_i(x))  (on {x | f_i(x) < 0}; f_i cvx)
• −log Prob(x + z ∈ C)  (C convex, z ∼ N(0, Σ))
• x^T Y^{-1} x is cvx in (x, Y) for Y = Y^T ≻ 0


Duality

Lagrangian and dual function

primal problem:

minimize    f_0(x)
subject to  f_i(x) ≤ 0,  i = 1, ..., m

Lagrangian: L(x, λ) = f_0(x) + Σ_{i=1}^{m} λ_i f_i(x)

dual function: g(λ) = inf_x L(x, λ)


Lower bound property

• for any primal feasible x and any λ ≥ 0, we have g(λ) ≤ f_0(x)
• hence for any λ ≥ 0, g(λ) ≤ p⋆ (the optimal value of the primal problem)
• duality gap of x, λ is defined as f_0(x) − g(λ)
  (gap is nonnegative; bounds suboptimality of x)


Dual problem

find the best lower bound on p⋆:

maximize    g(λ)
subject to  λ_i ≥ 0,  i = 1, ..., m

p⋆ is the optimal value of the primal, and d⋆ is the optimal value of the dual

• weak duality: even when the primal is not convex, d⋆ ≤ p⋆
• for a convex primal problem, we have strong duality: d⋆ = p⋆ (provided a technical condition holds)


Example: linear program

primal:   minimize    c^T x
          subject to  Ax ⪯ b

dual:     maximize    −b^T λ
          subject to  A^T λ + c = 0,  λ ⪰ 0
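For reference (standard argument, not spelled out on the slide), the dual above follows in one step from the Lagrangian:

```latex
% standard derivation of the LP dual (requires amsmath)
\begin{align*}
L(x,\lambda) &= c^T x + \lambda^T (A x - b), \qquad \lambda \succeq 0,\\
g(\lambda) &= \inf_x L(x,\lambda)
  = \begin{cases} -b^T \lambda & \text{if } A^T \lambda + c = 0,\\
                  -\infty      & \text{otherwise,} \end{cases}\\
\text{dual:}\quad & \text{maximize } -b^T \lambda
  \ \text{ subject to } A^T \lambda + c = 0,\ \lambda \succeq 0.
\end{align*}
```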


Example: unconstrained geometric program

primal:   minimize  log ( Σ_{i=1}^{m} exp(a_i^T x + b_i) )

dual:     maximize    b^T ν − Σ_{i=1}^{m} ν_i log ν_i
          subject to  1^T ν = 1,  A^T ν = 0,  ν ⪰ 0

. . . an entropy maximization problem


Example: duality between FMMC and MVU

FMMC problem:

minimize    Σ_{(i,j)∈E} d_ij^2 Q_ij
subject to  Q1 = 0,  Q = Q^T
            Q_ij = 0 for (i, j) ∉ E, i ≠ j
            Q_ij ≥ 0 for (i, j) ∈ E
            Q − 11^T/n ⪯ −I

dual of FMMC problem:

maximize    Tr X
subject to  X_ii + X_jj − X_ij − X_ji ≤ d_ij^2,  (i, j) ∈ E
            X1 = 0,  X ⪰ 0


FMMC dual as maximum-variance unfolding

• use variables x_1, ..., x_n, with X = [x_1 · · · x_n]^T [x_1 · · · x_n] (i.e., X_ij = x_i^T x_j)
• the dual FMMC problem becomes

maximize    Σ_{i=1}^{n} ||x_i||^2
subject to  Σ_i x_i = 0
            ||x_i − x_j|| ≤ d_ij,  (i, j) ∈ E

• position n points in R^n to maximize variance, respecting local distance constraints, i.e., the maximum-variance unfolding problem
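A minimal CVXPY sketch of this dual in its Gram-matrix (SDP) form from the previous slide, on the same made-up path graph used in the FMMC sketch earlier.

```python
# Maximum-variance unfolding (dual of FMMC) in Gram-matrix form:
#   maximize Tr X  s.t.  X_ii + X_jj - 2 X_ij <= d_ij^2, X1 = 0, X >= 0.
import numpy as np
import cvxpy as cp

n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]          # same path graph as before
d2 = {e: 1.0 for e in edges}

X = cp.Variable((n, n), PSD=True)                 # Gram matrix, X_ij = x_i^T x_j
cons = [X @ np.ones(n) == 0]                      # sum_i x_i = 0
cons += [X[i, i] + X[j, j] - 2 * X[i, j] <= d2[(i, j)] for (i, j) in edges]
prob = cp.Problem(cp.Maximize(cp.trace(X)), cons)
prob.solve()
print(prob.value)   # by the strong duality claimed on the slides, equals the FMMC optimum
```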


Semidefinite embedding

• similar to semidefinite embedding for unsupervised learning of manifolds (which has distance equality constraints) (Weinberger & Saul 2003)
• surprise: fastest diffusion on a graph and max-variance unfolding are duals


Interior-Point Methods

Interior-point methods

• handle linear and nonlinear convex problems (Nesterov & Nemirovsky)
• based on Newton's method applied to 'barrier' functions that trap x in the interior of the feasible region (hence the name IP)
• worst-case complexity theory: # Newton steps ∼ √(problem size)
• in practice: # Newton steps between 20 & 50 (!) — over wide range of problem dimensions, type, and data
• 1000 variables, 10000 constraints feasible on PC; far larger if structure is exploited
• readily available (commercial and noncommercial) packages


Log barrier

for the convex problem

minimize    f_0(x)
subject to  f_i(x) ≤ 0,  i = 1, ..., m

we define the logarithmic barrier as

φ(x) = − Σ_{i=1}^{m} log(−f_i(x))

• φ is convex, smooth on the interior of the feasible set
• φ → ∞ as x approaches the boundary of the feasible set


Central path

the central path is the curve

x⋆(t) = argmin_x ( t f_0(x) + φ(x) ),   t ≥ 0

• x⋆(t) is strictly feasible, i.e., f_i(x⋆(t)) < 0
• x⋆(t) can be computed by, e.g., Newton's method
• intuition suggests x⋆(t) converges to optimal as t → ∞
• using duality can prove x⋆(t) is m/t-suboptimal


Central path & duality

from

∇_x ( t f_0(x⋆) + φ(x⋆) ) = t ∇f_0(x⋆) + Σ_{i=1}^{m} (1 / (−f_i(x⋆))) ∇f_i(x⋆) = 0

we find that

λ_i⋆(t) = 1 / (−t f_i(x⋆)),   i = 1, ..., m

is dual feasible, with g(λ⋆(t)) = f_0(x⋆) − m/t

• duality gap associated with the pair x⋆(t), λ⋆(t) is m/t
• hence, x⋆(t) is m/t-suboptimal


Example: central path for LP

x⋆(t) = argmin_x ( t c^T x − Σ_{i=1}^{6} log(b_i − a_i^T x) )

[figure: central path in a polyhedron with six constraints, from x⋆(0) through x⋆(10) toward x_opt, with the objective direction c shown]


Barrier method

a.k.a. path-following method

given strictly feasible x, t > 0, µ > 1
repeat
  1. compute x := x⋆(t) (using Newton's method, starting from x)
  2. exit if m/t < tol
  3. t := µt

duality gap reduced by µ each outer iteration
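A minimal NumPy sketch of this barrier method for an inequality-form LP (toy data; the Newton inner loop, backtracking parameters, and tolerances are illustrative choices, not the talk's).

```python
# Barrier method for: minimize c^T x s.t. A x <= b.
# Outer loop: t := mu * t; inner loop: Newton steps on t*c^T x - sum log(b - A x).
import numpy as np

def barrier_lp(c, A, b, x, t=1.0, mu=10.0, tol=1e-6):
    m = A.shape[0]
    while m / t >= tol:                          # stop when duality gap m/t is small
        for _ in range(50):                      # Newton's method for the centering step
            r = b - A @ x                        # slacks (positive while strictly feasible)
            grad = t * c + A.T @ (1.0 / r)
            hess = A.T @ np.diag(1.0 / r**2) @ A
            dx = -np.linalg.solve(hess, grad)
            decrement = -grad @ dx               # Newton decrement squared
            if decrement / 2 <= 1e-8:
                break
            s = 1.0                              # backtracking line search, stay feasible
            while np.min(b - A @ (x + s * dx)) <= 0:
                s *= 0.5
            while (t * c @ (x + s * dx) - np.sum(np.log(b - A @ (x + s * dx)))
                   > t * c @ x - np.sum(np.log(r)) - 0.25 * s * decrement):
                s *= 0.5
            x = x + s * dx
        t *= mu                                  # shrink the gap by a factor mu
    return x

# tiny example: minimize x1 + x2 over the box [0, 1]^2 (optimum at the origin)
A = np.array([[1., 0.], [0., 1.], [-1., 0.], [0., -1.]])
b = np.array([1., 1., 0., 0.])
print(barrier_lp(np.array([1., 1.]), A, b, x=np.array([0.5, 0.5])))
```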


Trade-off in choice of µ

large µ means
• fast duality gap reduction (fewer outer iterations), but
• many Newton steps to compute x⋆(t⁺) (more Newton steps per outer iteration)

total effort measured by total number of Newton steps


Typical example

GP with n = 50 variables, m = 100 constraints, m_i = 5

• wide range of µ works well
• very typical behavior (even for large m, n)

[figure: duality gap vs. Newton iterations for µ = 2, 50, 150]


Effect of µ

[figure: total Newton iterations vs. µ, for µ roughly from 2 to 200]

barrier method works well for µ in a large range


Typical convergence of IP method

[figure: duality gap vs. # Newton steps for an LP, GP, SOCP, and SDP, each with 100 variables]


Typical effort versus problem dimensions

• LPs with n variables, 2n constraints
• 100 instances for each of 20 problem sizes
• avg & std dev shown

[figure: Newton steps vs. n, for n from 10 to 1000 (log scale); steps stay roughly between 15 and 35]


Complexity analysis

• based on self-concordance (Nesterov & Nemirovsky, 1988)
• for any choice of µ, #steps is O(m log 1/ε), where ε is the final accuracy
• to optimize the complexity bound, can take µ = 1 + 1/√m, which yields #steps O(√m log 1/ε)
• in any case, IP methods work extremely well in practice


Computational effort per Newton step

• Newton step effort dominated by solving linear equations to find the primal-dual search direction
• equations inherit structure from the underlying problem (e.g., sparsity, symmetry, Toeplitz, circulant, Hankel, Kronecker)
• equations same as for a least-squares problem of similar size and structure

conclusion: we can solve a convex problem with about the same effort as solving 20–50 least-squares problems


Other interior-point methods

more sophisticated IP algorithms

• primal-dual, incomplete centering, infeasible start
• use same ideas, e.g., central path, log barrier
• readily available (commercial and noncommercial) packages

typical performance: 20–50 Newton steps (!) — over wide range of problem dimensions, problem type, and problem data


Exploiting structure

sparsity
• well developed, since late 1970s
• direct (sparse factorizations) and iterative methods (CG, LSQR)
• standard in general purpose LP, QP, GP, SOCP implementations
• can solve problems with 10^5, 10^6 variables, constraints (depending on sparsity pattern)

symmetry
• reduce number of variables
• reduce size of matrices (particularly helpful for SDP)


A return to subgradient-type methods

• very large-scale problems: 10^5 – 10^7 variables
• applications: medical imaging, shape design of mechanical structures, machine learning and data mining, . . .
• IPM out of consideration (even a single iteration is prohibitive); can use only first-order information (function values and subgradients)
• a reasonable algorithm should have
  – computational effort per iteration at most linear in the design dimension
  – potential to obtain (and willingness to accept) medium-accuracy solutions
  – error reduction factor essentially independent of problem dimension

Ben Tal & Nemirovsky, Nesterov, . . .


Conclusions

Conclusions

convex optimization

• theory fairly mature; practice has advanced tremendously in the last decade
• qualitatively different from general nonlinear programming
• cost only 30× more than least-squares, but far more expressive
• lots of applications still to be discovered


Some references

• Convex Optimization, Boyd & Vandenberghe, 2004
  www.stanford.edu/~boyd/cvxbook.html (pdf of full text on web)
• Introductory Lectures on Convex Optimization, Nesterov, 2003
• Lectures on Modern Convex Optimization, Ben Tal & Nemirovsky, 2001
• Interior-Point Polynomial Algorithms in Convex Programming, Nesterov & Nemirovsky, 1994
• Linear Matrix Inequalities in System and Control Theory, Boyd, El Ghaoui, Feron, & Balakrishnan, 1994
