Two properties of condition numbers for convex programs via implicitly defined barrier functions

Javier Peña*
Graduate School of Industrial Administration
Carnegie Mellon University
5000 Forbes Avenue, Pittsburgh, PA 15213-3890, USA
e-mail: [email protected]

October 9, 2000. Revised May 9, 2001.

* Supported by NSF Grant CCR-0092655.
Abstract

We study two issues on condition numbers for convex programs: one has to do with the growth of the condition numbers of the linear equations arising in interior-point algorithms; the other deals with solving conic systems and estimating their distance to infeasibility. These two issues share a common ground: the key tool for their development is a simple, novel perspective based on implicitly defined barrier functions. This tool has potential use in optimization beyond the context of condition numbers.
1 Introduction
We study two issues concerning condition numbers for convex programs. These two issues share a common ground: the key tool for their development is a simple, novel perspective based on implicitly defined barrier functions. This tool has potential use in optimization beyond the context of condition numbers.

The first issue concerns the potential explosive growth of the condition numbers of the linear equations arising in interior-point methods. The core of the computational work of most interior-point algorithms is the solution of a particular system of equations at each main iteration. It is known that as the optimal solution is approached, the system of equations may become ill-conditioned. This constitutes a critical numerical issue of interior-point methods. Two types of estimates on the growth of these condition numbers along the central path have been previously developed in [1, 10] and [12]. The two bounds are derived
and stated in fundamentally different ways. We show that these bounds are compatible: the bound in [12] corresponds to a weaker version of the bound in [1, 10], but applies to a larger class of problems. The relationship between the two bounds readily follows from an appropriate parametrization of the central path.

The second issue has to do with conic systems and their distance to infeasibility. We provide a generalization of the algorithm and results developed in [13] for computing solutions to a convex conic linear system. In [13] the authors propose an algorithm for computing solutions to a conic system

    Ax = b,  x ∈ C.

This problem is reformulated as an optimization problem to be solved via a suitable interior-point method. The main features of the algorithm are the following: aside from knowing that the conic system is feasible, no further information is needed; points obtained after a finite number of iterations yield both "backward" and "forward" approximate solutions to the system; last and most interesting, the condition numbers of the systems of equations to be solved throughout the algorithm are bounded above. The behavior and amount of work required by the algorithm depend on the condition number of the conic system, which is a measure of well-posedness of the particular problem instance. We generalize the algorithm and main results in [13] to general conic systems

    Ax − b ∈ C_Y,  x ∈ C_X.

In particular, when specialized to the case C_Y = {0}, we obtain new and more concise proofs of the key results in [13].

A common theme unifies these two issues: the fundamental tool in their development is a framework based on a particular class of implicitly defined functions. Specifically, we consider functions constructed via the following optimization scheme:

    f̄(y) = min f(x)
           s.t. Ax = y.
This implicitly defined function approach enables a succinct consolidation of several results regarding self-concordant functions, the central path of a convex program, and conic systems and their distance to infeasibility.

The paper is naturally divided into three main components. Section 2 develops an implicitly defined function framework. We establish closed-form expressions for the first and second derivatives of f̄ in terms of those of f. As an interesting consequence, we prove that if f is a self-concordant (barrier) function then so is f̄.
Section 3 deals with the growth of condition numbers in the equations arising in interior-point algorithms. Bounds on the growth of these condition numbers have been independently derived in [1, 10] and [12]. We show that the latter is a weaker version of the former, but applies to a larger class of problems. This relationship readily follows from an alternative objective value parametrization of the central path. The key feature of the objective value parametrization is its close connection with the usual parametrization of the central path (Proposition 3.2).

Section 4 discusses conic systems and their distance to infeasibility. We extend the development in [13] to a more general class of conic systems. The most interesting results in this section are the connections between the distance to infeasibility of a conic system and the smallest singular value of a particular matrix (see Propositions 4.2, 4.6, 4.7).
2 Differential properties of implicitly defined functions
Given a function f : D_f → ℝ with D_f ⊆ ℝ^n, and a linear map A : ℝ^n → ℝ^m, consider the implicitly defined function

    f̄(y) = min f(x)
           s.t. Ax = y.        (1)
In this section we derive closed-form expressions for the first- and second-order derivatives of the implicitly defined function f̄ in terms of the derivatives of f. We make the following technical assumptions throughout the paper.

Assumption 2.1 f is a C^2 strictly convex function, D_f is open, and f(x_k) → ∞ for any sequence {x_k} ⊆ D_f that converges to a point in ∂D_f.
Assumption 2.2 A is a surjective linear map, and for some x ∈ D_f (and hence for all) the set A^{-1}(A(x)) ∩ D_f is bounded.

If Assumptions 2.1 and 2.2 hold, then the function f̄ above is well defined for all y ∈ AD_f. Furthermore, by the first-order optimality conditions, for each y ∈ AD_f there exist unique points x(y) ∈ D_f and v(y) ∈ ℝ^m such that f̄(y) = f(x(y)) and g(x(y)) = A^T v(y). Here g(·) denotes the gradient of f.

Proposition 2.3 Let f̄(·), x(·), v(·) be defined on AD_f as above. Then

(i) x(·), v(·) are C^1.
(ii) For all y ∈ AD_f, the gradient ḡ and Hessian H̄ of f̄ satisfy

    ḡ(y) = v(y)  and  H̄(y)^{-1} = A H(x(y))^{-1} A^T,        (2)

where H(·) denotes the Hessian of f.

Proof. For any given y ∈ AD_f the point (x(y), v(y)) is the solution to the nonlinear system of equations

    F(x, v, y) := ( Ax − y , g(x) − A^T v ) = (0, 0).

Notice that

    F_{x,v}(x, v, y) = [ A      0
                         H(x)  −A^T ],

which is non-singular at (x(y), v(y), y) because H(x(y)) is positive definite and A is full-rank. Therefore, by the implicit function theorem (see, e.g., [17]), the functions x(·) and v(·) are well defined and C^1 on AD_f. This proves (i).

For (ii), assume y, y + h ∈ AD_f. For simplicity of notation let x, x⁺, v denote x(y), x(y + h), v(y) respectively. Hence we have

    f̄(y + h) − f̄(y) = f(x⁺) − f(x) = ⟨g(x), x⁺ − x⟩ + o(‖x⁺ − x‖).

But g(x) = A^T v, and o(‖x⁺ − x‖) = o(‖h‖) because x(·) is C^1; thus

    f̄(y + h) − f̄(y) = ⟨v, h⟩ + o(‖h‖),

i.e., ḡ(y) = v = v(y). On the other hand, by continuity of linear maps,

    H(x)^{-1} A^T (ḡ(y + h) − ḡ(y)) = H(x)^{-1}(g(x⁺) − g(x)) = x⁺ − x + o(‖h‖).

Hence A H(x)^{-1} A^T (ḡ(y + h) − ḡ(y)) = h + o(‖h‖), i.e.,

    H̄(y)^{-1} = A H(x(y))^{-1} A^T.        (3)

□
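The closed-form expressions (2) and (3) are easy to sanity-check numerically. The following Python sketch (an illustration added here, not part of the paper's development) takes f to be the logarithmic barrier of the box (0, 1)^n, so that Assumptions 2.1 and 2.2 hold, computes x(y) and v(y) by an equality-constrained Newton method, and compares finite-difference derivatives of f̄ with the formulas of Proposition 2.3. All names (solve_inner, fbar, etc.) are illustrative only.

    # Numerical check of Proposition 2.3, assuming f is the barrier of the box
    # (0,1)^n:  f(x) = -sum(log x_i) - sum(log(1-x_i)).
    import numpy as np

    def f(x):    return -np.sum(np.log(x)) - np.sum(np.log(1 - x))
    def grad(x): return -1.0 / x + 1.0 / (1 - x)
    def hess(x): return np.diag(1.0 / x**2 + 1.0 / (1 - x)**2)

    def solve_inner(A, y, x0, iters=50):
        """Newton's method for min f(x) s.t. Ax = y; returns x(y) and v(y)."""
        x = x0.copy()
        m, n = A.shape
        for _ in range(iters):
            H, g = hess(x), grad(x)
            K = np.block([[H, A.T], [A, np.zeros((m, m))]])
            step = np.linalg.solve(K, np.concatenate([-g, y - A @ x]))
            dx, lam = step[:n], step[n:]
            t = 1.0
            while np.any(x + t * dx <= 0) or np.any(x + t * dx >= 1):
                t *= 0.5
            x = x + t * dx
        return x, -lam          # sign convention: g(x(y)) = A^T v(y)

    rng = np.random.default_rng(0)
    n, m = 6, 2
    A = rng.standard_normal((m, n))
    x0 = 0.5 * np.ones(n)
    y = A @ x0

    x_y, v_y = solve_inner(A, y, x0)
    fbar = lambda z: f(solve_inner(A, z, x0)[0])
    gbar = lambda z: solve_inner(A, z, x0)[1]

    # Check gbar(y) = v(y) by central finite differences.
    eps = 1e-5
    gbar_fd = np.array([(fbar(y + eps * e) - fbar(y - eps * e)) / (2 * eps)
                        for e in np.eye(m)])
    print("max |gbar(y) - v(y)| :", np.abs(gbar_fd - v_y).max())

    # Check Hbar(y)^{-1} = A H(x(y))^{-1} A^T via finite differences of gbar.
    Hbar_fd = np.column_stack([(gbar(y + eps * e) - gbar(y - eps * e)) / (2 * eps)
                               for e in np.eye(m)])
    lhs = np.linalg.inv(Hbar_fd)
    rhs = A @ np.linalg.inv(hess(x_y)) @ A.T
    print("max |Hbar(y)^{-1} - A H^{-1} A^T| :", np.abs(lhs - rhs).max())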
Relying on Proposition 2.3 we can succinctly prove that if f is a self-concordant (ϑ-barrier) function then so is f̄. Let us recall the formal definitions of self-concordance and barrier functions (see [6, 16] for a detailed discussion). A strictly convex, C^3 function f : D_f → ℝ, D_f ⊆ ℝ^n, is self-concordant if for all x ∈ D_f, h ∈ ℝ^n,

    |f'''(x)[h, h, h]| ≤ 2 (f''(x)[h, h])^{3/2},        (4)
and f(x_k) → ∞ for any sequence {x_k} ⊆ D_f that converges to a point in ∂D_f. A self-concordant function f is a ϑ-barrier if for all x ∈ D_f

    ⟨g(x), H(x)^{-1} g(x)⟩ ≤ ϑ.

Proposition 2.4 Let f̄ be defined in terms of A, f as in (1). If f is a self-concordant (ϑ-barrier) function then so is f̄.

Proof. To prove the self-concordance of f̄ we only need to check that f̄ satisfies (4). Since f is a self-concordant function, it suffices to show that for any y ∈ D_{f̄} and h ∈ ℝ^m there exists w ∈ ℝ^n such that both

    f̄''(y)[h, h] = f''(x(y))[w, w]        (5)

and

    f̄'''(y)[h, h, h] = f'''(x(y))[w, w, w]        (6)

hold. Let x(·), v(·) be as in Proposition 2.3. Letting w = H(x(y))^{-1} A^T H̄(y) h, equation (5) readily follows from (3). On the other hand, applying the chain rule to (3) we get

    f̄'''(y)[h, h, h] = f'''(x(y))[w, w, z],

where z = x'(y)[h]. But by implicitly differentiating F(x(y), v(y), y) = 0, where

    F(x, v, y) := ( Ax − y , g(x) − A^T v ),

we get

    ( x'(y) , v'(y) ) = −F_{x,v}(x(y), v(y), y)^{-1} F_y(x(y), v(y), y).

It follows that

    x'(y) = H(x(y))^{-1} A^T (A H(x(y))^{-1} A^T)^{-1},
so

    z = x'(y)[h] = H(x(y))^{-1} A^T (A H(x(y))^{-1} A^T)^{-1} h = H(x(y))^{-1} A^T H̄(y) h = w.

Therefore (6) is proven.

For the second part of the proposition, assume f is a ϑ-barrier. Then from (2) it follows that for any given y ∈ D_{f̄}

    ⟨ḡ(y), H̄(y)^{-1} ḡ(y)⟩ = ⟨v(y), (A H(x(y))^{-1} A^T) v(y)⟩
                             = ⟨A^T v(y), H(x(y))^{-1} (A^T v(y))⟩
                             = ⟨g(x(y)), H(x(y))^{-1} g(x(y))⟩ ≤ ϑ.

Hence f̄ is a ϑ-barrier.        □
Remark 2.5 The self-concordance of f̄ had been established by Nesterov and Nemirovskii (see Proposition 5.1.5 in [6]). They actually proved it under slightly weaker assumptions. Their treatment is different from the one presented here. In particular, they do not provide, nor does their proof rely on, closed-form expressions for the derivatives of the implicitly defined function.

To conclude this section let us recall the following basic properties of barrier functions (see [6, 16]). The rest of our treatment will rely on these facts.

Proposition 2.6 If f_1 and f_2 are barrier functions with parameters ϑ_1 and ϑ_2 respectively, then f_1 + f_2 is a barrier function with domain D_{f_1} ∩ D_{f_2} and barrier parameter no greater than ϑ_1 + ϑ_2.

Proposition 2.7 Let f be a ϑ-barrier function and x ∈ D_f. Then

    {z : ⟨z − x, H(x)(z − x)⟩ ≤ 1} ⊆ D_f,        (7)

    {z ∈ D_f : ⟨z − x, g(x)⟩ ≥ 0} ⊆ {z : ⟨z − x, H(x)(z − x)⟩ ≤ 4ϑ + 1},        (8)

and, for any w ∈ D_f,

    ⟨g(x), w − x⟩ ≤ ϑ.        (9)

3 Equations arising in interior-point methods
We address the potential explosive growth of the condition numbers of the systems of equations arising in interior-point methods. The bulk of the computational work at each main iteration of a typical interior-point algorithm is the solution of such systems of linear equations. It is known that as the optimal solution is approached, these systems of equations may become ill-conditioned.
Estimates on these condition numbers along the central path have been previously developed in [1, 10] and [12] via substantially different approaches. The approach of the former applies only to semidefinite programming, whereas the latter applies to the larger class of self-scaled programs. In this section we show that these two estimates are indeed closely related: the bound in [12] corresponds to a weaker version of the bound in [1, 10], but applies to a larger class of problems. The key to establishing this connection is an alternative parametrization of the central path of a general convex program. This parametrization is interesting in its own right.
3.1 Objective value parametrization of the central path
In this section we show that the objective value serves as a natural parameter for the central path of a convex program. Consider the problem

    inf  ⟨c, x⟩
    s.t. Ax = b
         x ∈ D̄_f,        (10)

where D_f is the domain of a self-concordant barrier function f. The central path of (10) is defined as {x(µ) : µ > 0}, where x(µ) is the minimizer of the problem

    min  ⟨c, x⟩ + µ f(x)
    s.t. Ax = b.        (11)
Throughout this section we assume that {x : Ax = b, x ∈ D_f} ≠ ∅ and that (10) is bounded below, so that the central path exists. We also assume that c ∉ span(A^T), for otherwise the objective value in (10) is the same for any feasible point and hence the problem is uninteresting.

An alternative parametrization of the central path can be given as follows. Let V_inf be the optimal value of (10) and V_sup be the optimal value of the problem obtained by replacing 'inf' with 'sup' in (10) (V_sup may be +∞). For any given t ∈ (0, V_sup − V_inf), let x̃(t) be the minimizer of the problem

    min  f(x)
    s.t. ⟨c, x⟩ = t + V_inf
         Ax = b.        (12)
It is easy to see that the set {x̃(t) : t > 0} contains the central path {x(µ) : µ > 0} defined above. We shall actually establish a stronger connection between these two parametrizations. For that purpose we rely on the following technical lemma.
Lemma 3.1 Assume f : (0, b) → ℝ is a ϑ-barrier function with b ∈ (0, +∞]. Let t_f = argmin f(t) when b < ∞, and t_f = ∞ otherwise. Then

    −ϑ/t ≤ f'(t) ≤ 1/t_f − 1/t

for all t ∈ (0, t_f) (under the convention that 1/∞ = 0).

Proof. By (9), for any t ∈ (0, b), f'(t)(0 − t) ≤ ϑ, which readily yields the lower bound on f'(t). For the upper bound, first notice that by (7), f''(t) → ∞ as t ↓ 0. Furthermore, since f is self-concordant, −f'''(t) ≤ 2(f''(t))^{3/2} for all t ∈ (0, b). Therefore, for t ∈ (0, b),

    1/(f''(t))^{1/2} = −∫_0^t f'''(s)/(2(f''(s))^{3/2}) ds ≤ ∫_0^t ds = t.

To finish, notice that if t_f < ∞ then f'(t_f) = 0, and if t_f = ∞ then, by (9), lim_{t→∞} f'(t) = 0. Thus,

    −f'(t) = ∫_t^{t_f} f''(s) ds ≥ ∫_t^{t_f} (1/s²) ds = 1/t − 1/t_f.        □

We can now establish the following connection between the two parametrizations of the central path.

Proposition 3.2 Let t_max = sup{⟨c, x(µ)⟩ : µ > 0} − V_inf. Then {x(µ) : µ > 0} = {x̃(t) : 0 < t < t_max}. Furthermore, if x(µ) = x̃(t) then t/ϑ ≤ µ ≤ t(1 + t/(t_max − t)).

Proof. Given x(µ), take t = ⟨c, x(µ)⟩ − V_inf. Notice that 0 < t < t_max. Because f is strictly convex, it follows that x(µ) solves

    min  f(x)
    s.t. ⟨c, x⟩ = t + V_inf
         Ax = b,

i.e., x(µ) = x̃(t). Conversely, given t ∈ (0, t_max), take µ > 0 such that ⟨c, x(µ)⟩ = t + V_inf (such µ exists because the central path {x(µ) : µ > 0} is continuous by Proposition 2.3). Again strict convexity of f implies that x̃(t) = x(µ). Hence the two parametrizations yield the same set.
On the other hand, the first-order optimality conditions for (11) and (12) yield

    g(x(µ)) = −(1/µ) c + A^T y(µ)

and

    g(x̃(t)) = v(t) c + A^T w(t)

for some v(t) ∈ ℝ and y(µ), w(t) ∈ ℝ^m. In particular, if x(µ) = x̃(t) then v(t) = −1/µ because A is full row-rank and c ∉ span(A^T).

Let f̄ : (0, V_sup − V_inf) → ℝ be defined by f̄(t) := f(x̃(t)). By the chain rule,

    f̄'(t) = ⟨x̃'(t), g(x̃(t))⟩ = v(t)⟨c, x̃'(t)⟩ + ⟨w(t), A x̃'(t)⟩.

But (12) yields ⟨c, x̃'(t)⟩ = 1 and A x̃'(t) = 0, so f̄'(t) = v(t). By Proposition 2.4, the function f̃(y, t) := min{f(x) : Ax = y, ⟨c, x⟩ = t} is a ϑ-barrier. Thus f̄(t) := f(x̃(t)) = f̃(b, t + V_inf) is a ϑ-barrier function on (0, V_sup − V_inf), and by construction t_max = argmin f̄(t). Lemma 3.1 thus implies that for t ∈ (0, t_max)

    −ϑ/t ≤ f̄'(t) ≤ 1/t_max − 1/t.

Therefore x(µ) = x̃(t) implies

    −1/µ = v(t) = f̄'(t) ∈ [−ϑ/t, 1/t_max − 1/t],

i.e., t/ϑ ≤ µ ≤ t(1 + t/(t_max − t)).        □
Remark 3.3 The proof of Proposition 3.2 actually shows that x(µ) = x̃(t) ⇔ −1/µ = f̄'(t) for the function f̄(t) = f(x̃(t)). This is an interesting fact on its own. However, we will only use the weaker statement in Proposition 3.2 in this paper.
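As a concrete illustration of Proposition 3.2 and Remark 3.3 (a made-up toy instance, not from the paper), consider the linear program min x_1 subject to x_1 + x_2 = 1, x ≥ 0, with the 2-barrier f(x) = −ln x_1 − ln x_2, so that ϑ = 2, V_inf = 0, and t_max = 1/2. Here x̃(t) = (t, 1 − t), f̄(t) = −ln t − ln(1 − t), and x(µ) is available in closed form. The Python sketch below checks that −1/µ = f̄'(t) at t = ⟨c, x(µ)⟩ − V_inf and that µ lies between the bounds of Proposition 3.2.

    import numpy as np

    theta, t_max = 2.0, 0.5     # barrier parameter and t_max for this instance

    def x1_of_mu(mu):
        # x(mu) minimizes x1 + mu*(-log x1 - log x2) s.t. x1 + x2 = 1; its first
        # coordinate solves 1 - mu/x1 + mu/(1 - x1) = 0, a quadratic in x1.
        return ((1 + 2 * mu) - np.sqrt((1 + 2 * mu) ** 2 - 4 * mu)) / 2

    def fbar_prime(t):
        # x~(t) = (t, 1 - t), so fbar(t) = -log t - log(1 - t)
        return -1.0 / t + 1.0 / (1.0 - t)

    for mu in [1.0, 0.1, 0.01]:
        t = x1_of_mu(mu)                   # objective value of x(mu) (V_inf = 0)
        lower, upper = t / theta, t * (1 + t / (t_max - t))
        print(f"mu = {mu:6.3f}  t = {t:.6f}  -1/mu = {-1/mu:9.4f}  "
              f"fbar'(t) = {fbar_prime(t):9.4f}  bounds on mu: [{lower:.6f}, {upper:.6f}]")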
3.2 Self-scaled programs
Let us consider the (primal) program

    min  ⟨c, x⟩
    s.t. Ax = b
         x ∈ K,        (13)

where c ∈ ℝ^n, b ∈ ℝ^m, A ∈ L(ℝ^n, ℝ^m), and K ⊆ ℝ^n is a self-scaled cone (see [7, 8, 16] for a detailed discussion of self-scaled cones). Self-scaled programming includes as special cases standard linear programming, semidefinite
programming, and second-order cone programming. In their seminal work (see [7, 8]), Nesterov and Todd generalized primal-dual interior-point methods to this class of problems. Self-scaled cones are in particular self-dual, i.e., K* = K. Hence the dual problem associated to (13) is

    max  ⟨b, y⟩
    s.t. A^T y + s = c
         s ∈ K.        (14)

In order to apply interior-point methods, it is standard to make the following assumption on the primal-dual pair (13)-(14) above.
Assumption 3.4
(i) There exist primal and dual feasible points x and (y, s) respectively, such that x, s ∈ int(K).
(ii) The linear map A is onto.
(iii) A self-scaled barrier function f for K is available.

A key property of self-scaled cones is the existence of scaling points: given x, s ∈ int(K), a point w ∈ int(K) is a scaling point of x, s if H(w)x = s, where H denotes the Hessian of the self-scaled barrier function f. If K is a self-scaled cone, such a scaling point always exists and is unique for any pair x, s ∈ int(K) (cf. [7, 8]).
3.2.1 The central path

Assume f is a self-scaled barrier function for the cone K. The central paths of the problems (13) and (14) are {x(µ) : µ > 0} and {(y(µ), s(µ)) : µ > 0} respectively, where x(µ) is the minimizer of

    min  ⟨c, x⟩ + µ f(x)
    s.t. Ax = b,

and (y(µ), s(µ)) is the maximizer of

    max  ⟨b, y⟩ − µ f(s)
    s.t. A^T y + s = c.

Alternatively, both central paths can be characterized as the set of points {(x(µ), y(µ), s(µ)) : µ > 0}, where (x(µ), y(µ), s(µ)) is the minimizer of

    min  ⟨c, x⟩ − ⟨b, y⟩ + µ (f(x) + f(s))
    s.t. Ax = b
         A^T y + s = c.
In other words, {(x(µ), y(µ), s(µ)) : µ > 0} is the central path of the primal-dual problem

    min  ⟨c, x⟩ − ⟨b, y⟩
    s.t. Ax = b
         A^T y + s = c
         x, s ∈ K.        (15)

By strong duality, the optimal value of (15) is 0. Furthermore, it is known that for any point (x(µ), y(µ), s(µ)), the objective function in (15) satisfies ⟨c, x(µ)⟩ − ⟨b, y(µ)⟩ = µϑ, where ϑ is the barrier parameter of f. (For details, see [7, 8, 16].) Hence, for the primal-dual problem (15), this identity provides a direct connection between the usual parametrization of the central path and the objective-value parametrization. Let z denote a generic point (x, y, s). In the notation of Section 3.1, the point z(µ) corresponds to z̃(ϑµ). Proposition 3.2 is an extension of this fact to general central paths.

Incidentally, notice that even for self-scaled programs it is not necessarily the case that x(µ) and x̃(ϑµ) are the same point. Instead, we have the following weaker statement, which suffices for our purposes.

Proposition 3.5 For µ, t > 0 sufficiently small, if x(µ) = x̃(t) then t/ϑ ≤ µ ≤ 2t.

Proof. This readily follows from Proposition 3.2.        □

3.3 Equations arising in interior-point methods
For a given µ > 0, the point (x(µ), y(µ), s(µ)) can also be characterized as the solution of the following nonlinear system of equations (see [7, 8, 16]):

    µ g(x) + s = 0
    Ax = b
    A^T y + s = c.        (16)
One of the central concepts in extending primal-dual interior-point methods to self-scaled programs is the Nesterov-Todd direction, which we describe next. At a given iterate (x, y, s), we would like to generate an improving direction (dx, dy, ds) to aim for the point (x(µ), y(µ), s(µ)) on the central path. A natural attempt is to linearize (16). One of Nesterov and Todd's key ideas was the choice of a linearization that yields symmetry between x and s. The Nesterov-Todd direction is the solution to

    H(w) dx + ds = s + µ g(x)
    A dx         = 0
    A^T dy + ds  = 0,        (17)

where w ∈ int(K) is the scaling point of (x, s); i.e., w ∈ int(K) is the unique point such that H(w)x = s. The system of equations (17) can be simplified by eliminating the variables dx, ds, yielding the following smaller system:

    (A H(w)^{-1} A^T) dy = −A H(w)^{-1}(s + µ g(x)).

The solution of this linear system is the crux of the numerical work involved at each interior-point iteration. As the optimal solution is approached, the matrix A H(w)^{-1} A^T may become ill-conditioned due to the ill-conditioning of H(w)^{-1}. This is clearly critical from the numerical analysis point of view. The remaining part of this section discusses this issue.

Basic properties developed in [7, 8] show that at a point (x(µ), y(µ), s(µ)) on the central path, the scaling point w satisfies

    w = x(µ)/√µ,  and  H(w) = µ H(x(µ)).        (18)
For our discussion on the condition number of A H(w)^{-1} A^T, we shall restrict our attention to this case. By (18) we have κ(A H(w)^{-1} A^T) = κ(A H(x(µ))^{-1} A^T).

Some bounds on the growth of the condition number of A H(x)^{-1} A^T for x on the central path have been derived in [1, 10] and [12]. In [1, 10], the authors give estimates on the growth of κ(A H(x)^{-1} A^T) for the particular case of semidefinite programming, i.e., when K is the positive semidefinite cone. Their proof relies on a careful analysis of the structure of the clusters of eigenvalues of the operator A H(x)^{-1} A^T.

Proposition 3.6 (Alizadeh et al. [1, 10]) Suppose (13) is a semidefinite program, i.e., K is the positive semidefinite cone. If x = x(µ) is a point on the central path of (13) then

    κ(A H(x)^{-1} A^T) =
        Θ(1/µ²)  if the problem is dual degenerate;
        Θ(1/µ)   if the primal and dual solutions are unique, strict complementarity holds, and 0 ≠ r(r+1)/2 ≠ m;
        Θ(1)     in any other case.

Here r is the rank of the primal optimal solution to (13).

On the other hand, the following bound is proven in [12] using a different technique based on perturbation theory.

Proposition 3.7 Assume x is on the central path of (13) and let t = ⟨c, x⟩ − V_min, where V_min is the optimal value of (13). Then κ(A H(x)^{-1} A^T) = O(1/t²).
Notice that in Proposition 3.7 above x = x̃(t) in the notation introduced in Section 3.1. Consequently, Proposition 3.5 readily allows us to rephrase the latter bound in a way that naturally extends the bounds in Proposition 3.6.

Corollary 3.8 Assume x = x(µ) is on the central path of (13). Then κ(A H(x)^{-1} A^T) = O(1/µ²).

Remark 3.9
(i) The constants hidden in the O(·) expression in Proposition 3.7 and Corollary 3.8 depend on the conditioning of the primal and dual constraints and the parameter of the barrier function for K (see [12, Sect. 5]).
(ii) The bound in Corollary 3.8 extends the weakest of the bounds in Proposition 3.6. This is the best one can expect because the statement in Proposition 3.7 applies to self-scaled programs and hence does not use the particular structure of semidefinite programming; in particular, no distinction concerning degeneracy is made.
(iii) The bounds above hold for the scaling point w of (x, y, s) assuming (x, y, s) = (x(µ), y(µ), s(µ)) is a point on the central path. It is straightforward to show that the same bounds hold for the scaling points of (x, y, s) as long as (x, y, s) is in a local neighborhood of the central path. We conjecture that similar bounds can be derived for scaling points w of arbitrary interior points (x, y, s), such as those that arise in potential reduction algorithms.
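The following Python sketch (illustrative only; the data and the crude path-following routine are made up) specializes the discussion above to linear programming, where K is the nonnegative orthant, f(x) = −∑_i ln x_i, and H(x)^{-1} = diag(x_i²). It tracks κ(A H(x(µ))^{-1} A^T) along the central path — Corollary 3.8 guarantees this grows no faster than O(1/µ²), and on a generic nondegenerate instance it typically levels off — and it forms the reduced Nesterov-Todd system displayed above, using the fact that for the orthant the scaling point satisfies H(w)^{-1} = diag(x_i/s_i).

    import numpy as np

    rng = np.random.default_rng(1)
    m, n = 3, 8
    A = rng.standard_normal((m, n))
    x_feas = rng.uniform(0.5, 1.5, n)     # strictly feasible point defines b
    b = A @ x_feas
    c = rng.uniform(1.0, 2.0, n)          # c > 0, so the LP is bounded below on x >= 0

    def central_point(mu, x0, iters=80):
        """Crude damped Newton for x(mu) = argmin <c,x> + mu*f(x) s.t. Ax = b."""
        x = x0.copy()
        phi = lambda z: c @ z - mu * np.sum(np.log(z))
        for _ in range(iters):
            g = c - mu / x
            K = np.block([[np.diag(mu / x**2), A.T], [A, np.zeros((m, m))]])
            d = np.linalg.solve(K, np.concatenate([-g, b - A @ x]))[:n]
            t = 1.0
            while np.any(x + t * d <= 0) or phi(x + t * d) > phi(x) + 0.25 * t * (g @ d):
                t *= 0.5
                if t < 1e-14:
                    break
            x = x + t * d
        return x

    x = x_feas.copy()
    for mu in [1.0, 1e-1, 1e-2, 1e-3, 1e-4]:
        x = central_point(mu, x)
        # For the orthant, H(x)^{-1} = diag(x_i^2); by (18) this matrix has the
        # same condition number as A H(w)^{-1} A^T at the corresponding point.
        M = A @ np.diag(x**2) @ A.T
        print(f"mu = {mu:8.1e}   kappa(A H(x)^-1 A^T) = {np.linalg.cond(M):10.3e}")

    # Reduced Nesterov-Todd system at the last computed point, aiming at mu/10:
    s = mu / x                          # on the central path, s_i = mu / x_i
    D = np.diag(x / s)                  # = H(w)^{-1} for the orthant
    g = -1.0 / x                        # gradient of the log barrier
    dy = np.linalg.solve(A @ D @ A.T, -A @ D @ (s + (mu / 10) * g))
    print("Nesterov-Todd step dy for target mu/10:", dy)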
4 Conic systems and the distance to infeasibility

Consider the conic linear system

    Ax − b ∈ C_Y
    x ∈ C_X,        (19)

where A ∈ L(ℝ^n, ℝ^m), b ∈ ℝ^m, and C_X ⊆ ℝ^n, C_Y ⊆ ℝ^m are closed convex cones. In [13] the subclass of conic systems in "standard form"

    Ax = b
    x ∈ C        (20)

(i.e., the special case when C_Y = {0}) is studied. The authors propose and analyze an interior-point method for computing solutions to the conic system (20). Besides generating a solution to (20), the algorithm proposed in [13] has several nice features: early termination of the algorithm yields "backward" solutions to (20); once a threshold is crossed, accurate "forward" solutions to (20)
are obtained as well; the condition numbers of the equations arising in the algorithm grow in a controlled manner and remain uniformly bounded. (See [13] for a detailed discussion.) In this section we extend the algorithm and main results in [13] to conic systems in the more general form (19). Aside from its theoretical interest, there is practical motivation for studying this general form of conic systems. Some applications lead to constraints written in this fashion. For example, certain models in linear programming naturally lead to constraints of the form Ax ≤ b, x ≥ 0.

The subsequent discussion focuses on the aspects that extend the treatment in [13]. While this section is self-contained, we have aimed for conciseness. In particular, there is little overlap with [13]. The reader interested in a more complete and detailed picture can easily obtain one by reading [13] in addition to this section.
4.1 Reformulation

We reformulate the problem of finding a solution to (19) as an optimization problem to be solved via interior-point methods. First, by homogenizing and adding slack variables, we can rewrite (19) as

    Ax − tb = s
    x ∈ C_X, s ∈ C_Y, t > 0
    ‖(x, t)‖ ≤ 1.        (21)

Here ‖(x, t)‖² := ‖x‖² + t². Next, we introduce a relaxation variable y and a positive variable δ whose value bounds ‖y‖. We reformulate (19) as the following optimization problem (cf. [13, Sect. 3]):

    min  δ
    s.t. Ax − tb + y = s
         x ∈ C_X, s ∈ C_Y, t ≥ 0
         ‖(x, t)‖ ≤ 1
         ‖y‖ ≤ δ.        (22)
The goal of this formulation is to ultimately obtain a solution (x, t) to (21). The relaxation variable y allows us to gradually approach such a solution. To ensure that t stays away from zero, we shall take an interior-point approach to solving (22).

Assume f_{C_X}, f_{C_Y} are barrier functions for C_X, C_Y respectively. Let f(x, t) := f_{C_X}(x) − ln(t) − ln(1 − ‖x‖² − t²). Proposition 2.6 implies that f(x, t), f(x, t) + f_{C_Y}(s), and f(x, t) + f_{C_Y}(s) − ln(δ² − ‖y‖²) are barrier functions for the sets

    {(x, t) : x ∈ C_X, t ≥ 0, ‖(x, t)‖ ≤ 1},
    {(x, t, s) : x ∈ C_X, t ≥ 0, s ∈ C_Y, ‖(x, t)‖ ≤ 1},  and
    {(x, t, s, y) : x ∈ C_X, t ≥ 0, s ∈ C_Y, ‖(x, t)‖ ≤ 1, ‖y‖ ≤ δ}

respectively. We can then solve (22) via interior-point methods. A standard argument shows that if (19) is well-posed (i.e., has strictly feasible solutions) then the central path of (22) converges to the analytic center (x̄, t̄, s̄) of

    Ax − tb = s
    x ∈ C_X, s ∈ C_Y, t > 0
    ‖(x, t)‖ ≤ 1,        (23)

that is, the minimizer of f(x, t) + f_{C_Y}(s) in the subspace {(x, t, s) : Ax − tb = s}. In particular, the point x̄/t̄ is a strictly feasible solution to (19).

Remark 4.1 Throughout this section, the analytic center of a set of the form {x : Ax = b, x ∈ D} is defined as the minimizer of a barrier function for D – always clear from the context – over the affine subspace {x : Ax = b}.
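For concreteness, here is a minimal Python sketch (an illustration, not part of the paper) of the barrier used in the reformulation for the particular case C_X = ℝ^n_+, so that f_{C_X}(x) = −∑_i ln x_i and f(x, t) = −∑_i ln x_i − ln t − ln(1 − ‖x‖² − t²). By Proposition 2.6 its barrier parameter is at most n + 2. The gradient and Hessian formulas below are checked against finite differences.

    import numpy as np

    def barrier(x, t):
        """f(x,t) = -sum(log x) - log t - log(1 - ||x||^2 - t^2)."""
        r = 1.0 - x @ x - t * t
        return -np.sum(np.log(x)) - np.log(t) - np.log(r)

    def barrier_grad(x, t):
        r = 1.0 - x @ x - t * t
        u = np.concatenate([x, [t]])
        return np.concatenate([-1.0 / x, [-1.0 / t]]) + 2.0 * u / r

    def barrier_hess(x, t):
        r = 1.0 - x @ x - t * t
        u = np.concatenate([x, [t]])
        D = np.diag(np.concatenate([1.0 / x**2, [1.0 / t**2]]))
        return D + 2.0 * np.eye(len(u)) / r + 4.0 * np.outer(u, u) / r**2

    # Finite-difference check of the derivatives at an interior point.
    x, t = np.array([0.3, 0.2, 0.1]), 0.4
    g, H = barrier_grad(x, t), barrier_hess(x, t)
    eps = 1e-6
    for i in range(4):
        e = np.zeros(4); e[i] = eps
        fd_g = (barrier(x + e[:3], t + e[3]) - barrier(x - e[:3], t - e[3])) / (2 * eps)
        fd_H = (barrier_grad(x + e[:3], t + e[3]) - barrier_grad(x - e[:3], t - e[3])) / (2 * eps)
        assert abs(fd_g - g[i]) < 1e-5
        assert np.allclose(fd_H, H[:, i], atol=1e-4)
    print("derivative checks passed; barrier parameter is at most n + 2 =", len(x) + 2)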
4.2 Distance to infeasibility
The distance to infeasibility of the conic system (19) is a fundamental parameter in the analysis of our interior-point algorithm. In particular, there is a close connection between the analytic center (x̄, t̄, s̄) of (23) and the distance to infeasibility of the conic system (19), as Proposition 4.2 below states.

Let us recall the definition of the distance to infeasibility. The pair (A, b) is the data of the conic system (19). Letting A vary over ℝ^{m×n} and b vary over ℝ^m we obtain the data space. Endow the data space with a norm as follows: let ‖(A, b)‖ be the operator norm of [A  b], i.e.,

    ‖(A, b)‖ := ‖[A  b]‖ = max{‖Ax + tb‖ : ‖x‖² + t² ≤ 1}.

The distance to infeasibility of the conic system (19) is defined as the smallest perturbation of the data that yields an infeasible system, that is,

    dist((A, b), I) := inf{‖(∆A, ∆b)‖ : (A + ∆A, b + ∆b) ∈ I},

where I := {(A, b) : the system (19) is infeasible}. The distance to infeasibility yields a notion of condition number that has proven to be useful in convex optimization; in particular, it serves as a natural parameter for analyzing the behavior of interior-point methods (see, e.g., [3, 4, 9, 14]). For a detailed discussion of the distance to infeasibility see [11, 14].

Assuming (19) is well-conditioned (i.e., dist((A, b), I) > 0), the analytic center (x̄, t̄, s̄) exists and is related to the distance to infeasibility dist((A, b), I) as follows (cf. [13, Thm. 2]).
Proposition 4.2 Let (x̄, t̄, s̄) be the analytic center of

    Ax − tb = s
    x ∈ C_X, s ∈ C_Y, t > 0
    ‖(x, t)‖ ≤ 1.

Then

    dist((A, b), I)/(4ϑ + 1) ≤ √( λ_min( [A  −b] H(x̄, t̄)^{-1} [A  −b]^T + H_{C_Y}(s̄)^{-1} ) ) ≤ dist((A, b), I).

Here H, H_{C_Y} denote the Hessians of f, f_{C_Y} respectively, and ϑ denotes the barrier parameter of f + f_{C_Y}.

Proof. See Proposition 4.4.        □
Hence, in addition to a solution to (19), (x̄, t̄, s̄) yields an estimate of the distance to infeasibility of (19).

Before proving Proposition 4.2, observe that by a straightforward homogenization it is easy to see that there is no loss of generality in considering homogeneous problems Ax ∈ C_Y, x ∈ C_X (i.e., in assuming b = 0). In this case we write dist(A, I) instead of dist((A, 0), I). In this situation, the function f above can be simplified to f(x) := f_{C_X}(x) − ln(1 − ‖x‖²). This is now a barrier for {x ∈ C_X : ‖x‖ ≤ 1}.

We will crucially rely on the following key characterization of the distance to infeasibility (see [11, 14]).

Proposition 4.3 dist(A, I) = inf{‖y‖ : ∄ x s.t. Ax − y ∈ C_Y, ‖x‖ ≤ 1, x ∈ C_X}.

The right-hand side in this characterization of dist(A, I) can be interpreted as a generalization of the smallest singular value of the matrix A. Some intuition can be developed by considering the very special case C_X = ℝ^n, C_Y = {0}. In this case it is easy to see that dist(A, I) is the distance from A to the set of rank-deficient matrices. Relying on rank-one perturbations, it can be shown that this distance is precisely the smallest singular value of A, i.e., inf{‖y‖ : ∄ x s.t. Ax = y, ‖x‖ ≤ 1}. Indeed, Proposition 4.3 can be proven via rank-one perturbations, as the author showed in [11]. This rank-one paradigm has been further extended to more abstract and general contexts (see [2, 5]).
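The following Python sketch (purely illustrative) spells out this special case numerically: it verifies that a rank-one perturbation of norm σ_min(A) renders A rank deficient, and that a right-hand side of norm slightly larger than σ_min(A) cannot be reached by Ax with ‖x‖ ≤ 1, matching the characterization in Proposition 4.3 when C_X = ℝ^n and C_Y = {0}.

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((3, 5))
    U, sigma, Vt = np.linalg.svd(A)
    smin = sigma[-1]

    # The rank-one perturbation DeltaA = -smin * u_min v_min^T has spectral norm
    # smin and makes A + DeltaA rank deficient.
    DeltaA = -smin * np.outer(U[:, -1], Vt[-1, :])
    print("||DeltaA|| =", np.linalg.norm(DeltaA, 2), "  sigma_min(A) =", smin)
    print("sigma_min(A + DeltaA) =", np.linalg.svd(A + DeltaA, compute_uv=False)[-1])

    # The cheapest right-hand side y that cannot be reached by Ax with ||x|| <= 1
    # has norm sigma_min(A): take y just beyond sigma_min * u_min.
    y = (smin * 1.0001) * U[:, -1]
    x_min_norm = np.linalg.pinv(A) @ y     # least-norm solution of Ax = y
    print("||y|| =", np.linalg.norm(y), "  least-norm ||x|| =", np.linalg.norm(x_min_norm))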
We can now give the homogenized version of Proposition 4.2 above.

Proposition 4.4 Let (x̄, s̄) be the analytic center of

    Ax = s
    x ∈ C_X, s ∈ C_Y
    ‖x‖ ≤ 1.

Then

    dist(A, I)/(4ϑ + 1) ≤ √( λ_min( A H(x̄)^{-1} A^T + H_{C_Y}(s̄)^{-1} ) ) ≤ dist(A, I).

Here H, H_{C_Y} denote the Hessians of f, f_{C_Y} respectively, and ϑ denotes the barrier parameter of f + f_{C_Y}.

Proposition 4.4 follows from a more general result, namely Proposition 4.6 below. The latter deals with a generalization of the distance to infeasibility, where only a block of the data is subject to perturbations.
4.3 Restricted distance to infeasibility

Let A ∈ L(ℝ^n, ℝ^m) and let C ⊆ ℝ^n be a closed convex cone. Suppose that ℝ^n = X_1 × X_2, and consequently A = [A_1  A_2]. Consider perturbations to the conic system

    Ax = 0,  x ∈ C        (24)

where we only allow perturbations on the first block A_1. Define the corresponding restricted distance to infeasibility as

    dist_1(A, I) = inf{‖(∆A_1, ∆b)‖ : ∄ x ∈ C s.t. (A_1 + ∆A_1)x_1 + A_2 x_2 = ∆b}.

The following result (cf. [11, Corol. 4.6]) generalizes Proposition 4.3.

Proposition 4.5 dist_1(A, I) = inf{‖y‖ : ∄ x s.t. Ax = y, ‖x_1‖ ≤ 1, x ∈ C}.

Let us consider the natural modification of the construction described above: assume f_C is a barrier for C, and take f_1(x) = f_C(x) − ln(1 − ‖x_1‖²) as the barrier for the set {x ∈ C : ‖x_1‖ ≤ 1}. Now the analytic center of

    Ax = 0
    x ∈ C
    ‖x_1‖ ≤ 1        (25)

is related to the restricted distance dist_1(A, I) as follows.

Proposition 4.6 Assume A is such that {x : Ax = 0, x ∈ C, ‖x_1‖ ≤ 1} is nonempty and bounded. Let x̄ be the analytic center of (25). Then

    dist_1(A, I)/(4ϑ + 1) ≤ √( λ_min( A H_1(x̄)^{-1} A^T ) ) ≤ dist_1(A, I),

where H_1 is the Hessian of f_1, and ϑ is the barrier parameter of f_1.
Proof. By Proposition 2.4, the implicitly defined function f̄(y) := min{f_1(x) : Ax = y, x ∈ D_{f_1}} is a barrier function for the set D_{f̄} = A D_{f_1} = {y : ∃ x s.t. Ax = y, ‖x_1‖ ≤ 1, x ∈ C}. Proposition 2.7 applied to 0 ∈ D_{f̄} yields

    {z : ⟨z, H̄(0)z⟩ ≤ 1} ⊆ D_{f̄}.        (26)

Notice that

    {z : ‖z‖² ≤ λ_min(H̄(0)^{-1})} ⊆ {z : ⟨z, H̄(0)z⟩ ≤ 1}.        (27)

Hence, putting together (26) and (27) and applying Propositions 2.3 and 4.5, we get

    √( λ_min( A H_1(x̄)^{-1} A^T ) ) ≤ dist_1(A, I).

On the other hand, take z = αv, where v is a norm-one eigenvector of H̄(0) with eigenvalue λ_max(H̄(0)) = 1/λ_min(H̄(0)^{-1}); then

    ⟨z, H̄(0)z⟩ = α² / λ_min(H̄(0)^{-1}).

If α² > (4ϑ + 1)² λ_min(H̄(0)^{-1}) and the sign of α is chosen so that ⟨z, ḡ(0)⟩ ≥ 0, then Proposition 2.7 yields z ∉ D_{f̄}. Thus, applying Proposition 4.5 again, we get

    dist_1(A, I) ≤ ‖z‖ = |α|.

This holds for any α such that α² > (4ϑ + 1)² λ_min(H̄(0)^{-1}). Therefore,

    dist_1(A, I) ≤ (4ϑ + 1) √( λ_min( A H_1(x̄)^{-1} A^T ) ).        □

Proof of Proposition 4.4: Write the conic system Ax ∈ C_Y, x ∈ C_X as

    [A  −I] (x, s)^T = 0
    (x, s) ∈ C_X × C_Y.        (28)

Now dist(A, I) corresponds to the restricted distance to infeasibility of (28), where we only allow perturbations on the first block of [A  −I]. The result then readily follows from Proposition 4.6.        □
We can actually go a bit further. For a given δ > 0, consider the following relaxation of (25):

    Ax − y = 0
    x ∈ C
    ‖x_1‖ ≤ 1
    ‖y‖ ≤ δ.        (29)

Now the function f_1(x) + f_δ(y) := f_C(x) − ln(1 − ‖x_1‖²) − ln(δ² − ‖y‖²) is a barrier for the set {(x, y) : x ∈ C, ‖x_1‖ ≤ 1, ‖y‖ ≤ δ}. By Proposition 4.5, it is easy to see that

    dist_1(A, I) + δ = inf{‖y'‖ : ∄ (x, y) s.t. Ax − y = y', ‖x_1‖ ≤ 1, x ∈ C, ‖y‖ ≤ δ}.

Hence a straightforward modification of the proof of Proposition 4.6 yields the following result.

Proposition 4.7 Assume A is such that {(x, y) : Ax = y, x ∈ C, ‖x_1‖ ≤ 1, ‖y‖ ≤ δ} is nonempty and bounded. Let (x̄, ȳ) be the analytic center of (29). Then

    (dist_1(A, I) + δ)/(4ϑ + 1) ≤ √( λ_min( A H_1(x̄)^{-1} A^T + H_δ(ȳ)^{-1} ) ) ≤ dist_1(A, I) + δ.

Here H_δ(y) is the Hessian of −ln(δ² − ‖y‖²), and ϑ denotes the barrier parameter of f_1(x) − ln(δ² − ‖y‖²).
4.4 The algorithm

As in [13], we can apply an interior-point path-following algorithm (e.g., the barrier method) to follow the central path of

    min  δ
    s.t. Ax − y = s
         x ∈ C_X, s ∈ C_Y
         ‖x‖ ≤ 1
         ‖y‖ ≤ δ.        (30)

For an initial point, assume – as is the case for cones arising in practice – that the analytic centers x_0, s_0 of the sets {x : x ∈ C_X, ‖x‖ ≤ 1} and {s : s ∈ C_Y, ‖s‖ ≤ M} can be easily obtained. Then a standard argument on barrier functions (see, e.g., [13, Thm. 3]) shows that as long as M, δ_0 = Ω(‖A‖), the point (x_0, s_0, Ax_0 − s_0, δ_0) is near the central path of (30).

The crux of each iteration of the algorithm is the solution of a system of equations whose matrix is of the form

    A H(x)^{-1} A^T + H_{C_Y}(s)^{-1} + H_δ(y)^{-1}.
For a point (x̄, s̄, ȳ, δ) on the central path of (30), the smallest eigenvalue of this matrix can be bounded: in such a case (x̄, s̄, ȳ) is the analytic center of

    Ax − y = s
    x ∈ C_X, s ∈ C_Y
    ‖x‖ ≤ 1
    ‖y‖ ≤ δ,

and from Proposition 4.7 it follows that

    (dist(A, I) + δ)/(4ϑ + 1) ≤ √( λ_min( A H(x̄)^{-1} A^T + H_{C_Y}(s̄)^{-1} + H_δ(ȳ)^{-1} ) ) ≤ dist(A, I) + δ.        (31)

In particular, the smallest eigenvalue λ_min(A H(x̄)^{-1} A^T + H_{C_Y}(s̄)^{-1} + H_δ(ȳ)^{-1}) is bounded below. A straightforward argument shows that a similar bound holds for (x, y, δ) in a local neighborhood of the central path. In addition, the largest eigenvalue λ_max(A H(x)^{-1} A^T + H_{C_Y}(s)^{-1} + H_δ(y)^{-1}) is also bounded above, as the following proposition states.

Proposition 4.8 Assume C_Y is a closed, pointed cone. Let ∠ denote the maximal angle occurring between pairs of vectors in C_Y, that is,

    ∠ := sup{ arccos( ⟨u, v⟩ / (‖u‖ ‖v‖) ) : u, v ∈ C_Y \ {0} }.

Let (x, s, y) be such that x ∈ int(C_X), ‖x‖ < 1, s ∈ int(C_Y), ‖y‖ < δ, and Ax − s = y. Then

    λ_max( A H(x)^{-1} A^T + H_{C_Y}(s)^{-1} + H_δ(y)^{-1} ) ≤ (1 + max{1, tan(∠/2)}²) (‖A‖ + δ)².        (32)

Proof. See Section 4.5.        □
From the bounds (31) and (32) above, it follows that throughout the algorithm the condition number of the matrix A H(x)^{-1} A^T + H_{C_Y}(s)^{-1} + H_δ(y)^{-1} satisfies

    κ( A H(x)^{-1} A^T + H_{C_Y}(s)^{-1} + H_δ(y)^{-1} ) = O( ( (‖A‖ + δ) / (dist(A, I) + δ) )² ),        (33)

where we are including the barrier parameter ϑ in the O(·) expression. Moreover, arguments essentially identical to those in [13, Sect. 4] show that for a given iterate (x_k, s_k, y_k, δ_k) with δ_k = O(dist(A, I)), a suitable projection of (x_k, s_k) onto {(x, s) : Ax = s} yields a strictly interior solution to

    Ax ∈ C_Y
    x ∈ C_X.
The results above yield interesting numerical insight into this interior-point approach to solving conic systems, following the same spirit initiated in [13]. In numerical linear algebra it is well known that the condition number of a matrix influences the amount of computational effort needed to solve a system of equations. For instance, it determines the number of iterations that an iterative method, like the conjugate gradient method, takes to get within a certain neighborhood of the solution. The bound (33) yields a similar statement for the conic system Ax ∈ C_Y, x ∈ C_X. In this context the role of the condition number is played by ‖A‖/dist(A, I), which is usually interpreted as a general notion of condition number for conic systems.
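The following Python sketch (synthetic data, not from the paper) illustrates the numerical-linear-algebra fact invoked above: a plain conjugate gradient solver needs more iterations to reach a fixed relative residual as the condition number of the symmetric positive definite coefficient matrix grows.

    import numpy as np

    def cg_iterations(M, b, tol=1e-8, max_iter=10_000):
        """Plain conjugate gradient; returns the iteration count to reach tol."""
        x = np.zeros_like(b)
        r = b - M @ x
        p = r.copy()
        rs = r @ r
        for k in range(1, max_iter + 1):
            Mp = M @ p
            alpha = rs / (p @ Mp)
            x += alpha * p
            r -= alpha * Mp
            rs_new = r @ r
            if np.sqrt(rs_new) <= tol * np.linalg.norm(b):
                return k
            p = r + (rs_new / rs) * p
            rs = rs_new
        return max_iter

    rng = np.random.default_rng(3)
    n = 200
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    b = rng.standard_normal(n)
    for kappa in [1e1, 1e3, 1e5]:
        # symmetric positive definite matrix with prescribed condition number
        M = Q @ np.diag(np.geomspace(1.0, kappa, n)) @ Q.T
        print(f"kappa = {kappa:8.0e}   CG iterations = {cg_iterations(M, b)}")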
4.5 Bound on the largest eigenvalue

We conclude with the proof of Proposition 4.8, which relies on the following lemma.

Lemma 4.9 Assume C_Y is a closed, pointed cone, and let ∠ be defined as in Proposition 4.8. Then for any given s ∈ int(C_Y)

    ‖H_{C_Y}(s)^{-1}‖ ≤ ‖s‖² (max{1, tan(∠/2)})².

Proof. We claim that if w satisfies ⟨w, H_{C_Y}(s)w⟩ ≤ 1 then ‖w‖ ≤ ‖s‖ max{1, tan(∠/2)}. To see this, notice that by Proposition 2.7, if ⟨w, H_{C_Y}(s)w⟩ ≤ 1 then both s + w, s − w ∈ C_Y. From elementary trigonometry it follows that ‖w‖ ≤ ‖s‖ max{1, tan(∠/2)}. The bound on ‖H_{C_Y}(s)^{-1}‖ now follows from the identity

    ‖H_{C_Y}(s)^{-1}‖ = 1 / inf{⟨w, H_{C_Y}(s)w⟩ : ‖w‖ = 1}.        □

Proof of Proposition 4.8: We have

    λ_max( A H(x)^{-1} A^T + H_{C_Y}(s)^{-1} + H_δ(y)^{-1} ) ≤ ‖A‖² ‖H(x)^{-1}‖ + ‖H_{C_Y}(s)^{-1}‖ + ‖H_δ(y)^{-1}‖.        (34)

On the other hand, Proposition 2.7 and the boundedness of the domains D_f = {x : x ∈ int(C_X), ‖x‖ < 1} and D_{f_δ} = {y : ‖y‖ < δ} imply that

    ‖H(x)^{-1}‖ ≤ 1  and  ‖H_δ(y)^{-1}‖ ≤ δ².        (35)

Furthermore,

    ‖s‖ = ‖Ax − y‖ ≤ ‖A‖ + δ.        (36)

The bound on λ_max(A H(x)^{-1} A^T + H_{C_Y}(s)^{-1} + H_δ(y)^{-1}) now follows from (34), (35), (36), and Lemma 4.9.        □
References

[1] F. Alizadeh, J. Haeberly, and M. Overton, "Primal-Dual Interior-Point Methods for Semidefinite Programming: Convergence Rates, Stability and Numerical Results," SIAM Journal on Optimization 8 (1998) 746-768.

[2] A.L. Dontchev, A.S. Lewis, and R.T. Rockafellar, "The radius of metric regularity," submitted to Transactions of the AMS.

[3] R. Freund and J. Vera, "Some Characterizations and Properties of the 'Distance to Ill-posedness' and the Condition Measure of a Conic Linear System," Mathematical Programming 86 (1999) 225-260.

[4] R. Freund and J. Vera, "Condition-based Complexity of Convex Optimization in Conic Linear Form via the Ellipsoid Algorithm," SIAM Journal on Optimization 10 (1999) 155-176.

[5] A. Lewis, "Ill-conditioned convex processes and linear inequalities," Mathematics of Operations Research 24 (1999) 829-834.

[6] Y. Nesterov and A. Nemirovskii, Interior-Point Polynomial Algorithms in Convex Programming, SIAM, Philadelphia, 1994.

[7] Y. Nesterov and M.J. Todd, "Self-scaled barriers and interior-point methods for convex programming," Mathematics of Operations Research 22 (1997) 1-42.

[8] Y. Nesterov and M.J. Todd, "Primal-dual interior-point methods for self-scaled cones," SIAM Journal on Optimization 8 (1998) 324-364.

[9] M. Núñez and R. Freund, "Condition Measures and Properties of the Central Trajectory of a Linear Program," Mathematical Programming 83 (1998) 1-28.

[10] M. Overton, private communication.

[11] J. Peña, "Understanding the Geometry of Infeasible Perturbations of a Conic Linear System," SIAM Journal on Optimization 10 (2000) 534-550.

[12] J. Peña, "Conditioning of Convex Programs from a Primal-Dual Perspective," to appear in Mathematics of Operations Research.

[13] J. Peña and J. Renegar, "Computing Approximate Solutions for Conic Systems of Constraints," Mathematical Programming 87 (2000) 351-383.

[14] J. Renegar, "Linear Programming, Complexity Theory and Elementary Functional Analysis," Mathematical Programming 70 (1995) 279-351.

[15] J. Renegar, "Condition Numbers, the Barrier Method and the Conjugate Gradient Method," SIAM Journal on Optimization 6 (1996) 879-912.

[16] J. Renegar, A Mathematical View of Interior-Point Methods in Convex Optimization, SIAM, Philadelphia, 2000.

[17] W. Rudin, Principles of Mathematical Analysis, Third Edition, McGraw-Hill, 1976.