Randomized Methods for Solving Convex Problems: Some Theory and Some Computational Experience

Robert M. Freund, MIT
(joint work with Alexandre Belloni)

October, 2007 (paper not yet available)

A Quotation “The final test of any theory is its capacity to solve the problems which originated it.”

1

A Quotation “The final test of any theory is its capacity to solve the problems which originated it.” –George B. Dantzig, opening sentence of Linear Programming and Extensions, 1963

2

In conversation at U.Florida, 1999 Don Ratliff: “Tell me something, do any of you people at MIT ever do anything practical?”

3

In conversation at U.Florida, 1999 Don Ratliff: “Tell me something, do any of you people at MIT ever do anything practical?” Rob Freund: “In theory, yes.”

4

Motivation and Scope of Talk

• Until recently, randomized methods have played mostly a minor role in theory and practice in continuous convex optimization

• Perhaps now is the time to explore the possible contributions of randomized algorithms in convex optimization

Herein we present:

• concepts of modern conic optimization and semidefinite programming (SDP)

• recent theoretical algorithm for convex optimization by Bertsimas/Vempala

• recent “practical” method for conic optimization using IPMs by Belloni/Freund

5

Preamble

We know how to do at least three types of random sampling:

1. compute a random vector on the unit sphere S^{d−1} ⊂ ℝ^d

2. compute a random point uniformly distributed on a convex body S, given an initial interior point v^0 ∈ S and a membership oracle for S

3. compute a random point norm-exponentially distributed on a convex set S, given an initial interior point v^0 ∈ S and a membership oracle for S

These and other random sampling methods lead to some very interesting methods for convex optimization

6

Uniform Vector on the Sphere

7
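A minimal sketch of sampling type 1 from the Preamble (illustrative Python/numpy code written for this transcript, not from the talk): normalizing a standard Gaussian vector yields a uniformly distributed point on S^{d−1}, because the Gaussian density is rotationally invariant.

import numpy as np

def uniform_on_sphere(d, rng=np.random.default_rng()):
    # Gaussian vectors are rotationally invariant, so the normalized
    # vector is uniformly distributed on the unit sphere S^{d-1}.
    g = rng.standard_normal(d)
    return g / np.linalg.norm(g)

x = uniform_on_sphere(5)
print(x, np.linalg.norm(x))   # the norm is 1 up to rounding error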

Uniform Vector on a Convex Body

8

Norm-Exponential Distribution on Convex Set S:  f(x) ∼ e^{−t‖x‖2} on S

9

Norm-Exponential Distribution on Convex Set S:  f(x) ∼ e^{−t‖x‖},  t ≫ 0

10

Norm-Exponential Distribution on Convex Set S:  f(x) ∼ e^{−t‖x‖},  t > 0

11

Norm-Exponential Distribution on Convex Set S:  f(x) ∼ e^{−t‖x‖},  t ≈ 0

12

Outline

1. Conic and Semi-Definite Optimization (SDP), and Interior-Point Methods

2. Solving Convex Programs by Random Walks (Bertsimas/Vempala)

3. Improved Performance of IPMs for Conic Optimization using Probabilistic ReNormalization (Belloni/Freund)

4. References

13

Conic and Semi-Definite Programming, and Interior-Point Methods

14

A Different View of Linear Programming (LP)

LP : minimize  c · x
     s.t.      a_i · x = b_i,  i = 1, . . . , m
               x ∈ ℝ^n_+

“c · x” means the linear function Σ_{j=1}^n c_j x_j

ℝ^n_+ := {x ∈ ℝ^n | x ≥ 0} is the nonnegative orthant. ℝ^n_+ is a convex cone

K is a convex cone if x, w ∈ K and α, β ≥ 0 ⇒ αx + βw ∈ K

15

LP is a Conic Program

LP : minimize  c · x
     s.t.      a_i · x = b_i,  i = 1, . . . , m
               x ∈ ℝ^n_+

“Minimize the linear function c · x, subject to the condition that x must solve m given equations a_i · x = b_i, i = 1, . . . , m, and that x must lie in the convex cone K = ℝ^n_+”

16

What is the LP Dual Problem?

LD : maximize  Σ_{i=1}^m y_i b_i
     s.t.      Σ_{i=1}^m y_i a_i + s = c
               s ∈ ℝ^n_+

For feasible solutions x of LP and (y, s) of LD, the duality gap is simply

c · x − Σ_{i=1}^m y_i b_i = ( c − Σ_{i=1}^m y_i a_i ) · x = s · x ≥ 0

17

LP Strong Duality

If LP and LD are feasible, then there exist x∗ and (y∗, s∗) feasible for the primal and dual, respectively, for which

c · x∗ − Σ_{i=1}^m y_i∗ b_i = s∗ · x∗ = 0

18

Interior-Point Method (IPM) Set-up for LP

OPTVAL := min_x  c^T x          =  max_{y,s}  b^T y
          s.t.   Ax = b            s.t.       A^T y + s = c
                 x ∈ ℝ^n_+                    s ∈ ℝ^n_+

f(x) := −Σ_{j=1}^n ln(x_j) is a barrier function for ℝ^n_+ = {x ∈ ℝ^n : x ≥ 0}

f(x) repels x from ∂ℝ^n_+ in a special way

Central path:

x(µ) := argmin_x  c^T x + µ f(x)
        s.t.      Ax = b
                  x > 0

19

IPM for LP, Central Path

Central path:

x(µ) := argmin_x  c^T x + µ ( −Σ_{j=1}^n ln(x_j) )
        s.t.      Ax = b
                  x > 0

20

IPM for LP, continued

x(µ) := argmin_x  c^T x + µ ( −Σ_{j=1}^n ln(x_j) )
        s.t.      Ax = b
                  x > 0

Optimality gap property of the central path:  c^T x(µ) − OPTVAL ≤ n · µ

Algorithm strategy: trace the central path x(µ) for a decreasing sequence of values of µ ↘ 0

21

IPM Strategy for LP

22

IPM for LP: Computational Reality

• 1985 - first IPM codes - 20-100 iterations on NETLIB suite, typically 35 iterations

• ∼1990 - Mehrotra predictor-corrector, 10-60 iterations on NETLIB suite, typically 25 iterations

• 1992-2007 - no further computational improvements

• Each IPM iteration is expensive to solve:

  [ D^k   A^T ] [ Δx ]   [ r_1 ]
  [ A     0   ] [ Δy ] = [ r_2 ]

• O(n^3) work per iteration; managing sparsity is important for success

23
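To make the linear algebra of one IPM iteration concrete, here is a hedged numpy sketch of a single damped Newton step for the barrier problem min c^T x + µ(−Σ_j ln x_j) s.t. Ax = b, using the block system shown above with D^k = µ·diag(1/x_j^2). The dense solve ignores the sparsity handling the slide emphasizes, and the function name and damping rule are illustrative choices, not the talk's implementation.

import numpy as np

def barrier_newton_step(A, b, c, x, mu):
    # One damped Newton step for:  min  c^T x - mu * sum(log x_j)  s.t.  Ax = b, x > 0
    m, n = A.shape
    D = mu / x**2                                   # Hessian diagonal of the barrier term
    r1 = -(c - mu / x)                              # minus the gradient of the objective
    r2 = b - A @ x                                  # primal residual (zero if Ax = b already)
    K = np.block([[np.diag(D), A.T],
                  [A, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([r1, r2]))
    dx = sol[:n]
    neg = dx < 0                                    # damp the step so that x stays positive
    alpha = 1.0 if not neg.any() else min(1.0, 0.99 * float(np.min(-x[neg] / dx[neg])))
    return x + alpha * dx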

IPM for LP: Complexity Theory

Let L denote the bit-length needed to encode the LP data d = (A, b, c).

Renegar 1986: The number of Newton steps needed to exactly solve an LP instance d = (A, b, c) is bounded above by O(√n L)

24

What is Semi-Definite Programming (SDP)?

• generalization of LP, emerged in early 1990’s as the most significant computationally tractable generalization of LP

• all relevant IPM machinery of LP easily translates to SDP

• in practice, SDPs solve in 10-60 iterations, typically ∼25 iterations

• independently discovered by Alizadeh and Nesterov-Nemirovskii

• applications of SDP are vast, encompassing such diverse areas as integer programming and control theory

25

Partial List of Applications of SDP

• LP, Convex QP, Convex QCQP

• tighter relaxations of IP (≤ 12% of optimality for MAXCUT)

• static structural (truss) design, dynamic truss design, antenna array filter design, other engineered systems problems

• control theory

• shape optimization, geometric design, volume optimization problems

• D-optimal experimental design, outlier identification, data mining, robust regression

• eigenvalue problems, matrix scaling/design

• sensor network localization

• optimization or near-optimization with large classes of non-convex polynomial constraints and objectives (Parrilo, Lasserre, SOS methods)

• robust optimization methods for standard LP, QCQP

26

The Semidefinite Cone

Let X be an n × n symmetric matrix

X is positive semi-definite if v^T X v ≥ 0 for all v ∈ ℝ^n

X is positive definite if v^T X v > 0 for all v ∈ ℝ^n, v ≠ 0

27

The Semidefinite Cone, continued

S^n denotes the set of symmetric n × n matrices
S^n_+ denotes the set of positive semi-definite n × n symmetric matrices
S^n_{++} denotes the set of positive definite n × n symmetric matrices

“X ⪰ 0” denotes that X is symmetric positive semi-definite
“X ⪰ Y” denotes that X − Y ⪰ 0
“X ≻ 0” denotes that X is symmetric positive definite, etc.

Remark: S^n_+ = {X ∈ S^n | X ⪰ 0} is a closed convex cone

28

Facts about Symmetric Matrices

• let λ_1(X), . . . , λ_n(X) denote the eigenvalues of X

• X ⪰ 0 if and only if λ_1(X), . . . , λ_n(X) ≥ 0

• X ≻ 0 if and only if λ_1(X), . . . , λ_n(X) > 0

• det(X) = Π_{j=1}^n λ_j(X)

29
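A small numerical illustration of these facts (the test matrix is an arbitrary choice made for this write-up, not from the slides):

import numpy as np

X = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])

lam = np.linalg.eigvalsh(X)            # eigenvalues of the symmetric matrix X
print("eigenvalues:", lam)
print("X is PSD:", bool(np.all(lam >= -1e-12)))
print("det(X) =", np.linalg.det(X), "= product of eigenvalues =", np.prod(lam))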

Semidefinite Program (SDP) Primal Problem

In SDP, X ⪰ 0 will be the primal decision variable. Think of X as:

• a matrix

• an array of n^2 components of the form (x_11, . . . , x_nn)

• an object (a “vector”) in the vector space S^n

All three different equivalent ways of looking at X will be useful

30

Semidefinite Program (SDP) Primal Problem, continued

Let X ∈ S^n. What will a linear function of X look like?

If C(X) is a linear function of X, then C(X) can be written as C • X, where

C • X := Σ_{i=1}^n Σ_{j=1}^n C_ij X_ij .

There is no loss of generality in assuming that the matrix C is also symmetric

31

Definition of SDP Primal Problem

SDP : minimize  C • X
      s.t.      A_i • X = b_i ,  i = 1, . . . , m,
                X ⪰ 0

“X ⪰ 0” is the same as “X ∈ S^n_+”

The data for SDP consists of the symmetric matrix C (which is the data for the objective function), the m symmetric matrices A_1, . . . , A_m, and the m-vector b, which form the m linear equations

32

An Example

A_1 = [ 1 0 1 ; 0 3 7 ; 1 7 5 ],   A_2 = [ 0 2 8 ; 2 6 0 ; 8 0 4 ],   b = [ 11 ; 19 ],   and   C = [ 1 2 3 ; 2 9 0 ; 3 0 7 ]

The variable X will be the 3 × 3 symmetric matrix:

X = [ x11 x12 x13 ; x21 x22 x23 ; x31 x32 x33 ],

SDP : minimize  x11 + 4x12 + 6x13 + 9x22 + 0x23 + 7x33
      s.t.      x11 + 0x12 + 2x13 + 3x22 + 14x23 + 5x33 = 11
                0x11 + 4x12 + 16x13 + 6x22 + 0x23 + 4x33 = 19

                X = [ x11 x12 x13 ; x21 x22 x23 ; x31 x32 x33 ] ⪰ 0

33

Example, continued

SDP : minimize  x11 + 4x12 + 6x13 + 9x22 + 0x23 + 7x33
      s.t.      x11 + 0x12 + 2x13 + 3x22 + 14x23 + 5x33 = 11
                0x11 + 4x12 + 16x13 + 6x22 + 0x23 + 4x33 = 19

                X = [ x11 x12 x13 ; x21 x22 x23 ; x31 x32 x33 ] ⪰ 0.

It may be helpful to think of “X ⪰ 0” as stating that each of the n eigenvalues of X must be nonnegative

34
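For readers who want to reproduce this small instance numerically, here is a hedged sketch using cvxpy (a modeling library assumed for this illustration; it is not mentioned in the talk). The data are the A_1, A_2, b, C of the example, and C • X is written as trace(CX), which equals Σ_ij C_ij X_ij since C is symmetric.

import numpy as np
import cvxpy as cp          # assumed available; any SDP-capable modeling tool would do

A1 = np.array([[1., 0., 1.], [0., 3., 7.], [1., 7., 5.]])
A2 = np.array([[0., 2., 8.], [2., 6., 0.], [8., 0., 4.]])
C  = np.array([[1., 2., 3.], [2., 9., 0.], [3., 0., 7.]])
b  = np.array([11., 19.])

X = cp.Variable((3, 3), symmetric=True)
constraints = [cp.trace(A1 @ X) == b[0],
               cp.trace(A2 @ X) == b[1],
               X >> 0]                              # X positive semidefinite
prob = cp.Problem(cp.Minimize(cp.trace(C @ X)), constraints)
prob.solve()
print("optimal value:", prob.value)
print("optimal X:\n", X.value)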

SDP Dual Problem

SDD : maximize  Σ_{i=1}^m y_i b_i
      s.t.      Σ_{i=1}^m y_i A_i + S = C
                S ⪰ 0.

Notice

S = C − Σ_{i=1}^m y_i A_i ⪰ 0

35

SDP Dual Problem, continued and so equivalently:

SDD : maximize  Σ_{i=1}^m y_i b_i
      s.t.      C − Σ_{i=1}^m y_i A_i ⪰ 0

It may be helpful to think of the constraints as stating that the entries of the positive semi-definite matrix S := C − Σ_{i=1}^m y_i A_i are linear functions of the m-vector y

36

Example, continued

A_1 = [ 1 0 1 ; 0 3 7 ; 1 7 5 ],   A_2 = [ 0 2 8 ; 2 6 0 ; 8 0 4 ],   b = [ 11 ; 19 ],   and   C = [ 1 2 3 ; 2 9 0 ; 3 0 7 ]

SDD : maximize  11y1 + 19y2
      s.t.      y1 [ 1 0 1 ; 0 3 7 ; 1 7 5 ] + y2 [ 0 2 8 ; 2 6 0 ; 8 0 4 ] + S = [ 1 2 3 ; 2 9 0 ; 3 0 7 ]

                S ⪰ 0

37

Example, continued

SDD : maximize  11y1 + 19y2
      s.t.      y1 [ 1 0 1 ; 0 3 7 ; 1 7 5 ] + y2 [ 0 2 8 ; 2 6 0 ; 8 0 4 ] + S = [ 1 2 3 ; 2 9 0 ; 3 0 7 ]

                S ⪰ 0

is the same as:

SDD : maximize  11y1 + 19y2

      s.t.  [ 1 − 1y1 − 0y2    2 − 0y1 − 2y2    3 − 1y1 − 8y2
              2 − 0y1 − 2y2    9 − 3y1 − 6y2    0 − 7y1 − 0y2
              3 − 1y1 − 8y2    0 − 7y1 − 0y2    7 − 5y1 − 4y2 ]  ⪰ 0

38

39

SDP Weak Duality

Weak Duality Theorem: Given a feasible solution X of SDP and a feasible solution (y, S) of SDD, the duality gap is

C • X − Σ_{i=1}^m y_i b_i = S • X ≥ 0 .

If

C • X − Σ_{i=1}^m y_i b_i = 0 ,

then X and (y, S) are each optimal solutions to SDP and SDD, respectively, and furthermore, SX = 0.

40

Strong Duality

Let z_P^∗ and z_D^∗ denote the optimal objective function values of SDP and SDD, respectively.

Strong Duality Theorem: Suppose that there exists a feasible solution X̂ of SDP such that X̂ ≻ 0, and that there exists a feasible solution (ŷ, Ŝ) of SDD such that Ŝ ≻ 0. Then both SDP and SDD attain their optimal values, and z_P^∗ = z_D^∗.

41

IPM Set-up for SDP

min_X  C • X
s.t.   A_i • X = b_i ,  i = 1, . . . , m
       X ⪰ 0

≅

max_{y,S}  Σ_{i=1}^m y_i b_i
s.t.       Σ_{i=1}^m y_i A_i + S = C
           S ⪰ 0

42

IPM for SDP, Central Path

Central path:

X(µ) := argmin_X  C • X − µ Σ_{j=1}^n ln(λ_j(X))
        s.t.      A_i • X = b_i ,  i = 1, . . . , m
                  X ≻ 0

Central path:

X(µ) := argmin_X  C • X − µ ln(det(X))
        s.t.      A_i • X = b_i ,  i = 1, . . . , m
                  X ≻ 0

43

IPM for SDP, Central Path, continued

Central path:

X(µ) := argmin_X  C • X − µ ln(det(X))
        s.t.      A_i • X = b_i ,  i = 1, . . . , m
                  X ≻ 0

44

IPM for SDP, Central Path, continued

Central path:

X(µ) := argmin_X  C • X − µ ln(det(X))
        s.t.      A_i • X = b_i ,  i = 1, . . . , m
                  X ≻ 0

Optimality gap property of the central path:  C • X(µ) − OPTVAL ≤ n · µ

Algorithm strategy: trace the central path X(µ) for a decreasing sequence of values of µ ↘ 0

45

IPM Strategy for SDP

46

IPM for SDP: Computational Reality

• 1991-94 - Alizadeh, Nesterov and Nemirovski - IPM theory for SDP

• 1996 - software for SOCP, SDP - 10-60 iterations on SDPLIB suite, typically ∼30 iterations

• Each IPM iteration is expensive to solve:

  [ H(x^k)   A^T ] [ Δx ]   [ r_1 ]
  [ A        0   ] [ Δy ] = [ r_2 ]

• O(n^6) work per iteration; managing sparsity and numerical stability are tougher bottlenecks

• most IPM computational research since 1996 has focused on work per iteration, sparsity, numerical stability, lower-order methods, etc.

47

Computing a Point x ∈ S by Random Walks Bertsimas and Vempala (2004)

48

Convex Body

A convex body is a compact convex set with nonempty interior

49

Notation

• B∞(x, r) is the box of radius 2r centered at x

• A positive definite matrix Σ induces the ellipsoidal norm  ‖v‖_Σ := √( v^T Σ^{−1} v )

50

Separation Oracle for a Convex Set

A separation oracle for a set P: given a point x, it either identifies that x ∈ P or outputs a vector d ≠ 0 satisfying d^T y ≤ d^T x for all y ∈ P

51

Computing a Point x ∈ S by Random Walks

S ⊂ ℝ^n is a convex set given by a Separation Oracle

The goal is to compute a point x ∈ S

Assume that B∞(v, r) ⊂ S ⊂ B∞(0, R) for some v, r, R

Assume that we know R

52

Cut through the Center of Mass

Cut through the Center of Mass: Let µ denote the center of mass of the convex body P. Any halfspace H whose bounding hyperplane passes through µ contains at most (1 − 1/e) of the volume of P

This implies Vol(P ∩ H) ≤ (1 − 1/e) Vol(P)

53

Exact Center of Mass Algorithm

Levin’s Algorithm (1965): Input: Separation Oracle for S, scalar R

Step 0. (Initialization) P ← B∞(0, R), µ ← 0
Step 1. (Oracle call) If µ ∈ S, stop. Else compute separator d
Step 2. (Compute Halfspace) H ← {x ∈ ℝ^n : d^T x ≤ d^T µ}
Step 3. (Cut P) P ← P ∩ H
Step 4. (Compute new center of mass) µ ← µ(P)
Step 5. (Repeat) Goto Step 1.

54

Complexity of Exact Center of Mass Algorithm

Assume that B∞(v, r) ⊂ S ⊂ B∞(0, R) for some v, r, R

Assume R is known

The algorithm will compute x ∈ S in at most ⌈2.2 n ln(R/r)⌉ iterations

Problem: computing µ(P) is #P-complete

The method appears to be worthless

55

Basic Idea of Probabilistic Algorithm

56

Computing a Point x ∈ S by Random Walks

B-V Algorithm: Input: Separation Oracle for S, scalar R

Step 0. (Initialization) P ← B∞(0, R), v̂ ← 0
Step 1. (Oracle call) If v̂ ∈ S, stop. Else compute separator d
Step 2. (Compute Halfspace) H ← {x ∈ ℝ^n : d^T x ≤ d^T v̂}
Step 3. (Cut P) P ← P ∩ H
Step 4. (Sample) Sample M random points v^1, . . . , v^M ∼ U(P)
Step 5. (Estimate Mean) v̂ ← (1/M) Σ_{i=1}^M v^i
Step 6. (Repeat) Goto Step 1.

How does the volume of P ∩ H decrease? How to choose the number of samples M?

57
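Below is a hedged Python sketch of the B-V scheme: the localizing body P is the box B∞(0, R) intersected with the cuts collected so far, approximately uniform points are drawn with a simple hit-and-run walk, their average serves as the center-of-mass estimate v̂, and the separation oracle is queried at v̂. The oracle interface, sample size M, walk length, iteration cap, and the ball example at the end are illustrative choices made for this write-up, not the values from the theorem.

import numpy as np

def hit_and_run(start, cuts, rhs, R, steps, rng):
    # Approximately uniform sample from P = {x : cuts @ x <= rhs, |x_i| <= R}.
    x = start.copy()
    n = len(x)
    box = np.vstack([np.eye(n), -np.eye(n)])
    A = np.vstack([cuts, box])
    d = np.concatenate([rhs, np.full(2 * n, R)])
    for _ in range(steps):
        u = rng.standard_normal(n)
        u /= np.linalg.norm(u)                       # random direction on the sphere
        au, slack = A @ u, d - A @ x                 # feasible chord: x + t*u, lo <= t <= hi
        hi = np.min(slack[au > 1e-12] / au[au > 1e-12])
        lo = np.max(slack[au < -1e-12] / au[au < -1e-12])
        x = x + rng.uniform(lo, hi) * u
    return x

def bv_find_point(oracle, n, R, M=50, walk=200, max_iter=100, rng=np.random.default_rng(0)):
    cuts, rhs = np.zeros((0, n)), np.zeros(0)
    v_hat = np.zeros(n)
    for _ in range(max_iter):
        inside, sep = oracle(v_hat)
        if inside:
            return v_hat
        cuts = np.vstack([cuts, sep])                # add the cut  sep^T x <= sep^T v_hat
        rhs = np.append(rhs, sep @ v_hat)
        pts = [hit_and_run(v_hat, cuts, rhs, R, walk, rng) for _ in range(M)]
        v_hat = np.mean(pts, axis=0)                 # estimated center of mass of P
    return None

# Example: S is the Euclidean ball of radius 0.1 centered at (0.5, ..., 0.5) in R^4.
center = 0.5 * np.ones(4)
oracle = lambda x: (np.linalg.norm(x - center) <= 0.1, x - center)
print(bv_find_point(oracle, n=4, R=2.0))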

The Uniform Distribution on S: Bounding Ellipsoids

Let X be a random vector uniformly distributed on a convex body S ⊂ ℝ^d. Let f_U(S) denote the uniform density function on S:

f_U(x) := 1/Vol(S) for x ∈ S,   f_U(x) := 0 for x ∉ S

µ := E[X]   and   Σ := E[(X − µ)(X − µ)^T]

Theorem:  B_Σ( µ , √((d + 2)/d) ) ⊂ S ⊂ B_Σ( µ , √(d(d + 2)) )

(This yields a d-rounding of S)

58

Covariance Matrix and Bounding Ellipsoids

59

Bounding the Decrease in Volume

Let µ and Σ denote the mean and covariance of the uniform distribution on P

µ is the center of mass of P

Define ‖w‖_Σ := √( w^T Σ^{−1} w )

v̂ is an approximation of µ

Theorem (Bertsimas/Vempala): If ‖v̂ − µ‖_Σ ≤ t, then any halfspace whose bounding hyperplane passes through v̂ contains at most (1 − 1/e + t) of the volume of P.

60
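A hedged sketch of the quantities used in this theorem: estimating µ and Σ from (approximately) uniform samples and evaluating the ellipsoidal norm ‖w‖_Σ. The box used as the body and the sample counts are stand-ins for illustration only.

import numpy as np

def ellipsoidal_norm(w, Sigma):
    # ||w||_Sigma = sqrt(w^T Sigma^{-1} w)
    return float(np.sqrt(w @ np.linalg.solve(Sigma, w)))

rng = np.random.default_rng(0)
samples = rng.uniform(-1.0, 1.0, size=(5000, 3))   # uniform points on a box, standing in for P
mu = samples.mean(axis=0)                          # estimate of the center of mass of P
Sigma = np.cov(samples, rowvar=False)              # estimate of the covariance matrix
v_hat = samples[:50].mean(axis=0)                  # a small-sample estimate of mu
print("||v_hat - mu||_Sigma =", ellipsoidal_norm(v_hat - mu, Sigma))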

Main Computational Complexity Result

Run the algorithm for K ≤ ⌈3n ln(R/r)⌉ iterations

where recall B∞(v, r) ⊂ S ⊂ B∞(0, R) for some v, r, R

Let 1 − δ be the desired probability of success of the algorithm

Theorem (Bertsimas/Vempala): Let M = ⌈838 n ln( 3n ln(R/r) / δ )⌉ be the number of sample points at each iteration. Then the algorithm will compute a point x ∈ S, and therefore stop, with probability at least 1 − δ

61

“High Probability”

Let T be the number of iterations needed to have success with probability p = 0.125

Probability of Success    Iteration Bound
0.9                       18T
0.99                      36T
0.999                     54T
0.9999                    72T

62

Improved Performance of IPMs using Probabilistic Re-Normalization Belloni and Freund (2007?)

63

Goal of this Work

• goal is to actually reduce the number of IPM iterations needed to solve conic convex optimization problems

• this is accomplished by computing “better” initial values with which to start an IPM

• enabling technology is a random walk on a related auxiliary convex set

64

Distribution on Convex Set S:  f(x) ∼ e^{−t‖x‖},  t ≫ 0

65

Distribution on Convex Set S:  f(x) ∼ e^{−t‖x‖},  t > 0

66

Distribution on Convex Set S:  f(x) ∼ e^{−t‖x‖},  t ≈ 0

67

The Hit-and-Run Algorithm X ∼ f (·) on S

68

The Hit-and-Run Algorithm X ∼ f (·) on S Let v 0 ∈ intS be given v k is current point in hit-and-run algorithm • choose d ∼ U (S n−1), the (n − 1)-sphere in 0 and κ∗ = 0 75

HSD Model Stopping Rule

(x, y, z, τ, κ, θ) is a (feasible) iterate of the HSD model

Compute the trial primal and dual values: (x̄, ȳ, z̄) := (x/τ, y/τ, z/τ)

Stop if:

2 ‖b − Ax̄‖∞ / (1 + ‖b‖∞)  +  2 ‖A^T ȳ + z̄ − c‖∞ / (1 + ‖c‖∞)  +  (c^T x̄ − b^T ȳ)_+ / OPTVAL  ≤  r_max

Typically set r_max = 10^{−8}

Re-write this rule as:

[ 2 ‖bτ − Ax‖∞ / (1 + ‖b‖∞)  +  2 ‖A^T y + z − cτ‖∞ / (1 + ‖c‖∞)  +  (c^T x − b^T y)_+ / OPTVAL ] / τ  ≤  r_max

76

HSD Model Stopping Rule, continued

(x, y, z, τ, κ, θ) is a (feasible) iterate of the HSD model

Compute the trial primal and dual values: (x̄, ȳ, z̄) := (x/τ, y/τ, z/τ)

Stop if:

[ 2 ‖bτ − Ax‖∞ / (1 + ‖b‖∞)  +  2 ‖A^T y + z − cτ‖∞ / (1 + ‖c‖∞)  +  (c^T x − b^T y)_+ / OPTVAL ] / τ  ≤  r_max

Redefine this slightly to:

RESID := ‖( bτ − Ax ,  A^T y + z − cτ ,  c^T x − b^T y + κ )‖_S / τ  ≤  r_max

where

‖(r_p, r_d, r_g)‖_S := 2 ‖r_p‖∞ / (1 + ‖b‖∞)  +  2 ‖r_d‖∞ / (1 + ‖c‖∞)  +  (r_g)_+ / OPTVAL

77
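A hedged numpy sketch of evaluating RESID for a given iterate, following the definition above (OPTVAL is treated as a known positive scalar, as on the slide; all names are illustrative):

import numpy as np

def resid(A, b, c, x, y, z, tau, kappa, optval):
    rp = b * tau - A @ x                         # primal residual
    rd = A.T @ y + z - c * tau                   # dual residual
    rg = c @ x - b @ y + kappa                   # gap residual
    norm_S = (2 * np.linalg.norm(rp, np.inf) / (1 + np.linalg.norm(b, np.inf))
              + 2 * np.linalg.norm(rd, np.inf) / (1 + np.linalg.norm(c, np.inf))
              + max(rg, 0.0) / optval)
    return norm_S / tau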

Equivalent Stopping Rule

Define the initial residual:

RESID^0 := ‖( bτ^0 − Ax^0 ,  A^T y^0 + z^0 − cτ^0 ,  (c^T x^0 − b^T y^0 + κ^0)_+ )‖_S / τ^0

Theorem: Assume that P and D are both feasible, and presume that at stopping: θ ≈ 0, κ ≈ 0, τ > 0, x̄ ≈ x^opt, and z̄ ≈ z^opt (optimal solutions of P/D). Then the stopping rule is equivalent to:

θ ⪅ r_max · ( 1 / RESID^0 ) · ( (x^0)^T z^0 + κ^0 τ^0 ) / ( (x^0)^T z^0 + κ^0 τ^0 + D ( ‖y^opt − y^0/τ^0‖_1 + ‖x^opt − x^0/τ^0‖_1 ) RESID^0 )

where D := ( 1 + max{ ‖b‖∞ , ‖c‖∞ } ) / 2

78

Equivalent Stopping Rule, continued

Equivalent Stopping Rule:

θ ⪅ r_max · ( 1 / RESID^0 ) · ( (x^0)^T z^0 + κ^0 τ^0 ) / ( (x^0)^T z^0 + κ^0 τ^0 + D ( ‖y^opt − y^0/τ^0‖_1 + ‖x^opt − x^0/τ^0‖_1 ) RESID^0 )

where D := ( 1 + max{ ‖b‖∞ , ‖c‖∞ } ) / 2

IPM iterations of the HSD model depend on:

(1) r_max, the desired feasibility/optimality tolerance

(2) RESID^0, the initial residual

(3) the convergence rate of θ ↘ 0 (translates to √ϑ for the theoretical algorithm with barrier function complexity value ϑ)

79

A Strategy for Reducing IPM Iterations

Simplified Stopping Rule:

θ ⪅ r_max · ( 1 / RESID^0 ) · ( (x^0)^T z^0 + κ^0 τ^0 ) / ( (x^0)^T z^0 + κ^0 τ^0 + D ( ‖y^opt − y^0/τ^0‖_1 + ‖x^opt − x^0/τ^0‖_1 ) RESID^0 )

where D := ( 1 + max{ ‖b‖∞ , ‖c‖∞ } ) / 2

Strategy: replace (x^0, y^0, z^0, τ^0, κ^0) with (x^1, y^1, z^1, τ^1, κ^1) to improve RESID^0 to RESID^1

80

The Auxiliary Problem

Given x^0 ∈ int C, z^0 ∈ int C∗, τ^0 > 0, κ^0 > 0, consider the auxiliary optimization problem:

AUX :  +∞ = max_{y̌,x̌,τ̌}  τ^0 + τ̌
       s.t.   Ax̌ − bτ̌ = 0
              z^0 − A^T y̌ + cτ̌ ∈ C∗
              κ^0 + b^T y̌ − c^T x̌ ≥ 0
              x^0 + x̌ ∈ C
              τ^0 + τ̌ ≥ 0

• (y̌, x̌, τ̌) = (0, 0, 0) is a strictly feasible solution of AUX

• The nontrivial rays of the feasible region of AUX are (y, x, τ) = (y^opt, x^opt, 1), where x^opt, y^opt are optimal solutions of P and D

81

The Auxiliary Problem, continued

AUX :  +∞ = max_{y̌,x̌,τ̌}  τ^0 + τ̌
       s.t.   Ax̌ − bτ̌ = 0
              z^0 − A^T y̌ + cτ̌ ∈ C∗
              κ^0 + b^T y̌ − c^T x̌ ≥ 0
              x^0 + x̌ ∈ C
              τ^0 + τ̌ ≥ 0

Theorem: Suppose (y̌, x̌, τ̌) is feasible for AUX with objective value at least U, and consider the assignment:

x^1 = (x^0 + x̌) / (τ^0 + τ̌)
y^1 = (y^0 + y̌) / (τ^0 + τ̌)
z^1 = (z^0 − A^T y̌ + cτ̌) / (τ^0 + τ̌)
τ^1 = 1
κ^1 = (κ^0 + b^T y̌ − c^T x̌) / (τ^0 + τ̌)

Then

RESID^1 = (τ^0 / U) RESID^0 .

82
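Once a feasible (y̌, x̌, τ̌) with objective value U = τ^0 + τ̌ is in hand, the assignment in the theorem is pure bookkeeping; a hedged sketch follows (the random walk producing (y̌, x̌, τ̌) is not shown, and the dense vector data are placeholders):

import numpy as np

def renormalize(A, b, c, x0, y0, z0, tau0, kappa0, y_chk, x_chk, tau_chk):
    denom = tau0 + tau_chk                       # = U, the AUX objective value attained
    x1 = (x0 + x_chk) / denom
    y1 = (y0 + y_chk) / denom
    z1 = (z0 - A.T @ y_chk + c * tau_chk) / denom
    tau1 = 1.0
    kappa1 = (kappa0 + b @ y_chk - c @ x_chk) / denom
    return x1, y1, z1, tau1, kappa1              # then RESID^1 = (tau0 / U) * RESID^0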

Strategy for Reducing RESID

Starting at (y̌, x̌, τ̌) = (0, 0, 0), perform a random walk on the feasible region of AUX to improve the objective function to some pre-set (or dynamically determined) value U. Output the final value of (y̌, x̌, τ̌) and set

x^1 = (x^0 + x̌) / (τ^0 + τ̌)
y^1 = (y^0 + y̌) / (τ^0 + τ̌)
z^1 = (z^0 − A^T y̌ + cτ̌) / (τ^0 + τ̌)
τ^1 = 1
κ^1 = (κ^0 + b^T y̌ − c^T x̌) / (τ^0 + τ̌)

Use (x^1, y^1, z^1, τ^1, κ^1) as the new given initial values in the HSD model.

RESID^1 = (τ^0 / U) RESID^0

83

Norm-Exponential Distribution on Convex Set S:  f(x) ∼ e^{−t‖x‖},  t ≫ 0

84

Norm-Exponential Distribution on Convex Set S:  f(x) ∼ e^{−t‖x‖},  t > 0

85

Norm-Exponential Distribution on Convex Set S:  f(x) ∼ e^{−t‖x‖},  t ≈ 0

86

Theoretical Complexity of Random Walk

Recall the property that the nontrivial rays of the feasible region of AUX are (y, x, τ) = (y^opt, x^opt, 1), where x^opt, y^opt are optimal solutions of P and D

Suppose P, D have unique optimal solutions. Define

v := (v_y, v_x, v_τ) := (y^opt, x^opt, 1) / ‖(y^opt, x^opt, 1)‖_2

Definition: Suppose P, D have unique optimal solutions. The ε-recession cone of AUX is defined as:

Q_ε = { u = (u_y, u_x, u_τ) : u^T v ≥ (1 − ε) ‖u‖_2 }

87

Theoretical Complexity of Random Walk, continued

Theorem: Suppose that P, D have unique optimal solutions. A feasible value of AUX with objective function value at least U can be computed with high probability in at most

O( n^{3.5} ln(n) ln( (R_ε + 2(U + R_ε)) ( ‖x^opt‖_2 + ‖y^opt‖_2 + 1 ) n / r ) )

hit-and-run steps, where

r is the Euclidean distance from the initial point to the boundary of the feasible region F of AUX

R_ε is the radius of a Euclidean ball with the property that F ⊂ B_2(0, R_ε) + Q_ε

ε ≤ 1 / ( 16 ( ‖x^opt‖_2^2 + ‖y^opt‖_2^2 + 1 ) )

88

Computational Experience

Computations on 50 “random” problems for each (m, n) pairing

Problems were pre-designed to be poorly conditioned

Dimensions        RESID                                SDPT3-HSD IPM Iterations
m      n          Original    After Re-Normalization   Original    After Re-Normalization
20     100        3215        33                       19.42       18.78
100    500        13876       74                       21.30       18.16
200    1000       31511       178                      19.74       16.14
100    5000       461233      2391                     32.86       20.00
200    5000       427073      2245                     31.78       18.18

Used SDPT3-HSD software

Numbers in table are arithmetic averages

89

Computational Experience, continued

Dimensions        SDPT3-HSD Running Time (seconds)     Random Walk
m      n          Original    After Re-Normalization   Running Time (seconds)
20     100        0.99        0.98                     0.90
100    500        3.62        3.08                     4.38
200    1000       14.89       12.20                    14.55
100    5000       44.43       27.66                    308.84
200    5000       125.21      73.28                    319.19

90

Computational Experience, continued

                  INITIAL RESIDUALS
Dimensions        Original                                After Re-Normalization
m      n          INIT_p^0    INIT_d^0    INIT_g^0        INIT_p^1    INIT_d^1    INIT_g^1
20     100        496         2119        600             5.7         27          -22.7
100    500        2908        4260        6708.2          28.1        42.2        3.6
200    1000       6831        5706        18974           63.9        53.7        60.1
100    5000       108980      12843       339410          577.7       67.9        1746
200    5000       89003       12800       325270          486.1       69.9        1689

91

Next Steps

• Heuristics/theory to improve practical performance of random walk

• More complexity theory

• More computation

92

Back-up Slides to Follow

93

Skew-Symmetric Feasibility Problem

Given a cone K and a skew-symmetric matrix M (M^T = −M)

Solve for (v, s):

SSFP :  M v + s = 0
        v ∈ K ,  s ∈ K∗

Properties of SSFP:

• self-alternative

• always has a non-trivial solution

• is ill-posed

94

Normalized SSFP and Image Set

Given (v^0, s^0) ∈ int K × int K∗, consider

NSSFP :  M v + s = 0
         (s^0)^T v + (v^0)^T s = 1
         v ∈ K ,  s ∈ K∗

Image set:  H := { M v + s : (s^0)^T v + (v^0)^T s = 1, v ∈ K, s ∈ K∗ }

Proposition: 0 ∈ ∂H

95

Image Set, Illustrated

96

Normalized SSFP and Image Set, continued

NSSFP :  M v + s = 0
         (s^0)^T v + (v^0)^T s = 1
         v ∈ K ,  s ∈ K∗

H := { M v + s : (s^0)^T v + (v^0)^T s = 1, v ∈ K, s ∈ K∗ }

H° := { v : v^T u ≤ 1 for all u ∈ H }

Proposition: H° = { v : s^0 − M^T v ∈ K∗ ,  v^0 − v ∈ K }

Proposition: −rec H° = { v : ∃ s satisfying M v + s = 0, v ∈ K, s ∈ K∗ }

97

Homogenized Conic Optimization as SSFP

Solve for (x, y, z, τ, κ):

HCOP :  Ax − bτ            = 0
        −A^T y + cτ − z    = 0
        b^T y − c^T x − κ  = 0

        y ∈ ℝ^m ,  x ∈ C ,  τ ≥ 0 ,  z ∈ C∗ ,  κ ≥ 0

98

Normalized SSFP and Image Set, continued

H := { M v + s : (s^0)^T v + (v^0)^T s = 1, v ∈ K, s ∈ K∗ }

H° = { v : s^0 − M^T v ∈ K∗ ,  v^0 − v ∈ K }

Proposition: If v ∈ int H°, then

( v^1 , s^1 ) := ( v^0 − v ,  s^0 − M^T v )

satisfies v^1 ∈ int K, s^1 ∈ int K∗, and M v^1 + s^1 = M v^0 + s^0.

99

Homogeneous Self-Dual (HSD) Model Embedding

Given initial values (x^0, y^0, z^0) satisfying x^0 ∈ int C, z^0 ∈ int C∗, τ^0 > 0, κ^0 > 0, θ^0 > 0, consider the homogeneous self-dual (HSD) embedding:

H :  VAL_H := min_{x,y,z,τ,κ,θ}  ᾱ θ

     s.t.   Ax                 − bτ   + b̄ θ          = 0
            −A^T y             + cτ   + c̄ θ   − z    = 0
            b^T y  − c^T x             + ḡ θ   − κ    = 0
            −b̄^T y − c̄^T x    − ḡ τ                  = −ᾱ

            x ∈ C ,  τ ≥ 0 ,  z ∈ C∗ ,  κ ≥ 0

where:

b̄ = ( bτ^0 − Ax^0 ) / θ^0        c̄ = ( A^T y^0 + z^0 − cτ^0 ) / θ^0

ḡ = ( c^T x^0 − b^T y^0 + κ^0 ) / θ^0        ᾱ = ( (z^0)^T x^0 + τ^0 κ^0 ) / θ^0

100
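A hedged sketch of computing the embedding data (b̄, c̄, ḡ, ᾱ) from the chosen starting values, exactly as defined above (function and variable names are illustrative):

import numpy as np

def hsd_data(A, b, c, x0, y0, z0, tau0, kappa0, theta0):
    b_bar = (b * tau0 - A @ x0) / theta0
    c_bar = (A.T @ y0 + z0 - c * tau0) / theta0
    g_bar = (c @ x0 - b @ y0 + kappa0) / theta0
    alpha_bar = (z0 @ x0 + tau0 * kappa0) / theta0
    return b_bar, c_bar, g_bar, alpha_bar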

Properties of the HSD Model, continued

H :  VAL_H := min_{x,y,z,τ,κ,θ}  ᾱ θ

     s.t.   Ax                 − bτ   + b̄ θ          = 0
            −A^T y             + cτ   + c̄ θ   − z    = 0
            b^T y  − c^T x             + ḡ θ   − κ    = 0
            −b̄^T y − c̄^T x    − ḡ τ                  = −ᾱ

            x ∈ C ,  τ ≥ 0 ,  z ∈ C∗ ,  κ ≥ 0

• Pre-multiplying by (y^T, x^T, τ, θ) and summing:  x^T z + τ κ = ᾱ θ

• Pre-multiplying by ((y^0)^T, (x^0)^T, τ^0, θ^0) and summing:  (z^0)^T x + (x^0)^T z + κ^0 τ + τ^0 κ = ᾱ θ^0 + ᾱ θ

101

Properties of the HSD Model, continued

Using the last property, we can re-write the HSD model as:

H :  VAL_H := min_{x,y,z,τ,κ,θ}  ᾱ θ

     s.t.   Ax                 − bτ   + b̄ θ          = 0
            −A^T y             + cτ   + c̄ θ   − z    = 0
            b^T y  − c^T x             + ḡ θ   − κ    = 0
            (z^0)^T x + (x^0)^T z + κ^0 τ + τ^0 κ − ᾱ θ = ᾱ θ^0

            x ∈ C ,  τ ≥ 0 ,  z ∈ C∗ ,  κ ≥ 0

102

Conditions are a Skew-Symmetric Conic System

Ax − bτ ...
−A^T y + cτ − z ...
b^T y − c^T x ...
−π ...

y ∈ ...
