Unifying Condition Numbers for Linear Programming ∗

Dennis Cheung, Felipe Cucker
Department of Mathematics, City University of Hong Kong
83 Tat Chee Avenue, Kowloon, HONG KONG
e-mail: {50003110@plink,macucker@math}.cityu.edu.hk

Javier Peña †
Graduate School of Industrial Administration, Carnegie Mellon University
5000 Forbes Avenue, Pittsburgh, PA 15213-3890, USA
e-mail: [email protected]

∗ This work has been substantially funded by a grant from the Research Grants Council of the Hong Kong SAR (project number CityU 1085/02P).
† Supported by NSF grant CCR-0092655. This paper was written when Javier Peña was visiting City University of Hong Kong in Summer 2002.

Abstract. In recent years, several condition numbers were defined for a variety of linear programming problems based upon relative distances to ill-posedness. In this paper we provide a unifying view of some of these condition numbers. To do so, we introduce yet another linear programming problem and show that its distance to ill-posedness naturally captures the most commonly used distances to ill-posedness.

1 Introduction

1. Let A ∈ ℝ^{m×n}, b ∈ ℝ^m and c ∈ ℝ^n, with n ≥ m. A variety of linear programming problems are associated with the triple d = (A, b, c). For instance, the Primal Feasibility problem is

    Decide whether there exists x ∈ ℝ^n such that Ax = b, x ≥ 0,
    and if yes, find one such x,                                          (PF)

and the Dual Feasibility problem is

    Decide whether there exists y ∈ ℝ^m such that A^T y ≤ c,
    and if yes, find one such y.                                          (DF)

We denote by (PDF) the problem of deciding feasibility of both (PF) and (DF). Also, assuming both of them are feasible, the Optimization problem in Standard Primal-Dual Form is to compute points x* and y* (the optimizers) attaining the minimum and maximum in the pair

    min  c^T x            max  b^T y
    s.t. Ax = b           s.t. A^T y ≤ c                                  (OSPDF)
         x ≥ 0.

Note that in this case linear programming duality yields the equality c^T x* = b^T y*. The (slightly simpler) problem of computing this Optimal Value will be denoted by (OV). Another linear programming problem is the Homogeneous Feasibility problem, which does not depend on b and c and can be stated in the following primal-dual form:

    Decide which of the two systems
        Ax = 0, x ≥ 0      and      A^T y ≤ 0
    has non-trivial solutions, and find one such solution.                (HF)
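For concreteness (our illustration, not part of the paper's development), each of (PF) and (DF) can be decided with an off-the-shelf LP solver; the sketch below assumes scipy is available, and the function names are ours.

```python
# Feasibility checks for (PF) and (DF) via scipy's LP interface (a sketch).
import numpy as np
from scipy.optimize import linprog

def primal_feasible(A, b):
    """(PF): does Ax = b, x >= 0 have a solution?"""
    m, n = A.shape
    res = linprog(np.zeros(n), A_eq=A, b_eq=b,
                  bounds=[(0, None)] * n, method="highs")
    return res.status == 0          # 0 = optimal (feasible), 2 = infeasible

def dual_feasible(A, c):
    """(DF): does A^T y <= c have a solution?"""
    m, n = A.shape
    res = linprog(np.zeros(m), A_ub=A.T, b_ub=c,
                  bounds=[(None, None)] * m, method="highs")
    return res.status == 0
```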

2. A trend in recent research on these problems is the use of condition numbers in the study of the complexity and/or the round-off analysis of algorithms that solve them. A common way to define condition numbers, rooted in a result by Eckart and Young [9], consists in defining the condition of a data instance d as the inverse of its relative distance to the set of ill-posed data. To do so, a notion of ill-posedness needs to be at hand. Generally speaking, a data instance is ill-posed when arbitrarily small perturbations can yield a qualitative change in the output of the problem. Thus, for instance, a square matrix is ill-posed (w.r.t. linear equation solving) when it is singular; input data of a decisional problem are ill-posed when they lie in the boundary between the accepted and rejected inputs. The latter therefore defines ill-posedness of pairs (A, b) or (A, c) (for (PF) and (DF) respectively). Ill-posedness for (HF) is defined similarly. Clearly, a triple (A, b, c) is ill-posed for (PDF) when (A, b) or (A, c) is ill-posed (for (PF) or (DF) respectively). A triple (A, b, c) is ill-posed for (OV) when it is so for (PDF). Otherwise, small variations in the data will yield small variations in the optimal value (which we assume belongs to [−∞, +∞]). For (OSPDF) the situation is more delicate since the problem only makes sense when both (A, b) and (A, c) are feasible. Assuming this is the case, a triple is ill-posed when either it is ill-posed for (OV) (which means that small perturbations can destroy feasibility) or the pair (x*, y*) of optimizers is degenerate.

The above briefly describes ill-posedness for (PF), (DF), (PDF), (OV), (OSPDF) and (HF). A formal definition of these notions will be given in §2.2. To define conditioning from ill-posedness one needs a norm in the space of data. Let ‖ ‖ denote such a norm and ρ denote the distance to ill-posedness induced by ‖ ‖. Then, the condition number of a data instance d is given by

    ‖d‖ / ρ(d).

Condition numbers thus defined have been widely used in the analysis of algorithms. Thus, the distance ρ_P(A, b) is used in relation to (PF) [10, 12, 13, 18, 23, 25, 26], ρ_D(A, c) is used in relation to (DF) [12, 13, 18, 25], ϱ(d) is used in relation to (OSPDF) [2, 3], ρ(d) = min{ρ_P(A, b), ρ_D(A, c)} is used in relation to (OV) [24, 33] and ρ_h(A) is used in relation to (HF) [5]. These distances to ill-posedness differ from one another since the problems they are tailored for (beginning with their data) differ as well. The goal of this paper is to introduce yet another linear programming problem and to show that its associated distance to ill-posedness naturally captures the distances to ill-posedness of the problems mentioned before.

3. The following picture shows a schematic landscape of the input data space. The continuous line is the set of ill-posed triples for the problem (PDF). On the other hand, the ill-posed triples for (OSPDF) are those in the upper half ∨ of the continuous line together with those in the dotted lines (which are those having degenerate pairs of optimizers).

[Figure: schematic landscape of the data space, with regions labeled "(PF) feasible, (DF) feasible", "(PF) feasible, (DF) infeasible", and "(PF) infeasible, (DF) feasible".]

As we noted, the domains of the distances ρ and ϱ are different. While ρ(d) is defined for all triples d, ϱ(d) is so only for those triples in the upper third of the picture. And restricted to that upper third, it is clear that ϱ(d) ≤ ρ(d) and that this inequality may be strict. The goal of this paper is to show that there is a natural complementary partitioning problem (CPP) with input d (see §2.1) that satisfies the following. The distance ϱ̄(d) from d to the set of ill-posed triples (for problem (CPP)) satisfies that if any of (PF) or (DF) is infeasible then ϱ̄(d) = ρ(d) = min{ρ_P(A, b), ρ_D(A, c)}, and if both (PF) and (DF) are feasible, then ϱ̄(d) = ϱ(d). Furthermore, when either c or b is kept fixed to zero, the corresponding distance from d to the set of ill-posed triples (for problem (CPP)) coincides with ρ_P(A, b) and ρ_D(A, c) respectively. And when both b and c are kept fixed to zero, the corresponding distance from d to the set of ill-posed triples (for problem (CPP)) coincides with ρ_h(A). In this sense, the distance to ill-posedness of the partitioning problem (CPP) "unifies" the distances to ill-posedness ρ(d), ϱ(d), ρ_P(A, b), ρ_D(A, c), and ρ_h(A). The issue of the relations of ϱ̄ with other measures of condition is discussed in Section 6.

2 Statement of the result

2.1 A complementary partition problem

From the basic duality theory of linear programming, it is well known that the problem of finding optimizers for (OSPDF) may alternatively be formulated as the following system of linear inequalities:

        −c^T x + b^T y ≥ 0
    c   − A^T y        ≥ 0
    −b  + Ax           = 0
          x            ≥ 0.                                              (1)

By strong duality, (1) is infeasible precisely when either (PF) or (DF) is infeasible. Consider the homogenized version of (1):

           −c^T x + b^T y ≥ 0
    c x0   − A^T y        ≥ 0
    −b x0  + Ax           = 0
          (x0, x)         ≥ 0.                                            (2)

The homogeneous system (2) has a nice symmetry: it is the same as its alternative (dual) system. By adding slack variables, (2) can be written as

           −c^T x + b^T y − s0 = 0
    c x0   − A^T y − s         = 0
    −b x0  + Ax                = 0
                  x̄            ≥ 0
                  s̄            ≥ 0,                                       (3)

where x̄ and s̄ denote (x0, x) and (s0, s) respectively. The following theorem (essentially the classical Goldman–Tucker Theorem, see [14]) is crucial for our subsequent development.

Theorem 1  Let A, b, c and (3) be as above. There exists a unique partition B ∪ N = {0, 1, . . . , n} such that the following system of linear inequalities has a solution:

           −c^T x + b^T y − s0 = 0
    c x0   − A^T y − s         = 0
    −b x0  + Ax                = 0
    x̄_B > 0,   x̄_N = 0
    s̄_B = 0,   s̄_N > 0.

Proof.  See §3.3.  □

Remark 1  (1) Any solution of the system in Theorem 1 is called a strictly complementary solution of (3). (2) We will denote by B(d) the set B in Theorem 1 corresponding to the triple d when we want to emphasize its dependency on the data.

Theorem 1 suggests the following complementary partition problem:

    Compute B(d),                                                         (CPP)

which has a natural distance to ill-posedness given by

    ϱ̄(d) = inf{‖Δd‖ : B(d + Δd) ≠ B(d)}.
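Theorem 1 and the proof of Proposition 4 below suggest a direct, if naive, way of computing B(d) numerically: for each index j ∈ {0, 1, . . . , n}, maximize the j-th component of x̄ = (x0, x) over the homogeneous system (2); j lies in B(d) exactly when this maximum is positive. The following is a minimal sketch of this idea, assuming numpy and scipy are available; the function name and tolerance are ours, and in floating-point arithmetic the result is only reliable away from ill-posed instances.

```python
# Compute the complementary partition B(d), N(d) of Theorem 1 by solving one
# small LP per index of x̄ = (x0, x) over the self-dual homogeneous system (2).
import numpy as np
from scipy.optimize import linprog

def complementary_partition(A, b, c, tol=1e-9):
    """Return (B, N) with B ∪ N = {0, 1, ..., n} as in Theorem 1."""
    m, n = A.shape
    # Variables z = (x0, x, y).  The skew-symmetric matrix of system (2):
    #   M z = ( -c^T x + b^T y ,  c x0 - A^T y ,  -b x0 + A x ).
    M = np.block([
        [np.zeros((1, 1)), -c.reshape(1, n),  b.reshape(1, m)],
        [c.reshape(n, 1),   np.zeros((n, n)), -A.T            ],
        [-b.reshape(m, 1),  A,                 np.zeros((m, m))],
    ])
    ineq = M[: n + 1]              # rows required to be >= 0 (these are s0, s)
    eq   = M[n + 1:]               # rows required to be == 0
    # x0, x are nonnegative, y is free; cap each variable so every LP is bounded.
    bounds = [(0, 1)] * (n + 1) + [(-1, 1)] * m
    B, N = [], []
    for j in range(n + 1):
        obj = np.zeros(1 + n + m)
        obj[j] = -1.0              # linprog minimizes, so this maximizes z_j
        res = linprog(obj, A_ub=-ineq, b_ub=np.zeros(n + 1),
                      A_eq=eq, b_eq=np.zeros(m), bounds=bounds, method="highs")
        (B if res.status == 0 and -res.fun > tol else N).append(j)
    return B, N
```

By homogeneity of (2), capping every variable by 1 loses no generality, which is what keeps each auxiliary LP bounded.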

2.2 Other distances to ill-posedness

We now recall the definitions of the distances to ill-posedness ρ_P(A, b), ρ_D(A, c), ρ(d), ϱ(d), and ρ_h(A). Let F_P be the set of pairs (A, b) such that (PF) is feasible and Σ_P be the boundary of F_P. We define ρ_P(A, b) as the distance from (A, b) to Σ_P. Similarly, let F_D be the set of pairs (A, c) such that (DF) is feasible and Σ_D be the boundary of F_D. We define ρ_D(A, c) as the distance from (A, c) to Σ_D. Also, for a triple d = (A, b, c), we define ρ(d) = min{ρ_P(A, b), ρ_D(A, c)}.

Assume now that d is such that A is full row-rank and both (PF) and (DF) are feasible. Let B* be an optimal basis for d. We define ϱ(d) by

    ϱ(d) = inf{‖δd‖ : B* is not an optimal basis for d + δd}.

By convention, ϱ(d) = 0 if A is not full row-rank, or if (PF) or (DF) is infeasible.

Finally, let F_h be the set of matrices A ∈ ℝ^{m×n} such that Ax = 0, x ≥ 0 has non-trivial solutions and let Σ_h be the boundary of F_h. Then ρ_h(A) is defined by

    ρ_h(A) = inf{‖δA‖ : A + δA ∈ Σ_h}.

All these distances to ill-posedness induce condition numbers in the usual way:

    C_P(A, b) = ‖(A, b)‖ / ρ_P(A, b),    C_D(A, c) = ‖(A, c)‖ / ρ_D(A, c),    C(d) = ‖d‖ / ρ(d),

    C_h(A) = ‖A‖ / ρ_h(A),    and    K(d) = ‖d‖ / ϱ(d).

2.3 On norms

In all the above we have freely written expressions like ‖d‖ or ‖(A, b)‖ without specifying which norms are being considered. This will hold in particular for the statement of the Main Theorem. Actually, norms are considered in the following spaces: ℝ^{m×n} (e.g., A), ℝ^{m×(n+1)} (e.g., (A, b)), ℝ^{(m+1)×n} (e.g., (A, c)), and ℝ^{mn+m+n} (e.g., (A, b, c)). And, implicitly, all these four spaces and norms are present in the statement of the Main Theorem. The point we want to emphasize here is that the Main Theorem holds for an arbitrary choice of norms in these four spaces as long as some minimal compatibility conditions hold. More precisely, we require

    ‖A‖ ≤ ‖(A, b)‖, ‖(A, c)‖ ≤ ‖(A, b, c)‖,

and

    ‖(A, 0, c)‖ = ‖(A, c)‖,    ‖(A, b, 0)‖ = ‖(A, b)‖,    ‖(A, 0, 0)‖ = ‖(A, 0)‖ = ‖A‖,

where the 0 in ‖(A, 0)‖ is either in ℝ^m or in ℝ^n.
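As a concrete illustration (ours, not prescribed by the paper), the Frobenius norm on each of the four spaces satisfies all of these compatibility conditions (and similarly with (A, c) in place of (A, b)):

```latex
% Appending a block can only increase the Frobenius norm, and appending a zero
% block leaves it unchanged; hence all the compatibility conditions hold.
\[
  \|A\|_F \;\le\; \|(A,b)\|_F=\sqrt{\|A\|_F^2+\|b\|_2^2}\;\le\;\|(A,b,c)\|_F,
  \qquad
  \|(A,b,0)\|_F=\|(A,b)\|_F,\quad \|(A,0,0)\|_F=\|A\|_F .
\]
```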

2.4 Main theorem

Our main result is the following (recall the distances ρ_P(A, b), ρ_D(A, c), ϱ(d) and ρ_h(A) defined above).

Main Theorem  Let d = (A, b, c) ∈ ℝ^{m×n} × ℝ^m × ℝ^n.

(i) Both (PF) and (DF) are feasible if and only if 0 ∈ B(d). In addition:
    (a) If A is full row-rank and (OSPDF) has unique optimal primal and dual solutions then (OSPDF) has a unique optimal basis B*, B(d) = B* ∪ {0} and 0 < ϱ̄(d) = ϱ(d).
    (b) If A is not full row-rank or (OSPDF) has multiple optimal primal or dual solutions then 0 = ϱ̄(d) = ϱ(d).

(ii) Either (PF) or (DF) is infeasible if and only if 0 ∈ N(d). In addition:
    (a) If B(d) and N(d) \ {0} are both non-empty then
            0 = ϱ̄(d) = ρ_h(A) = ρ_P(A, b)   if (PF) is infeasible,
            0 = ϱ̄(d) = ρ_h(A) = ρ_D(A, c)   if (DF) is infeasible.
    (b) If B(d) is empty then (DF) is feasible, (PF) is infeasible, and 0 < ϱ̄(d) = ρ_P(A, b) ≤ ρ_h(A) ≤ ρ_D(A, c).
    (c) If N(d) = {0} then either (1) or (2) below holds:
        (1) (PF) is infeasible, A is not full row-rank, and 0 = ϱ̄(d) = ρ_P(A, b) = ρ_h(A) ≤ ρ_D(A, c).
        (2) (PF) is feasible, (DF) is infeasible, and 0 ≤ ϱ̄(d) = ρ_D(A, c) ≤ ρ_h(A) ≤ ρ_P(A, b).

(iii) The distances ρ_P, ρ_D, and ρ_h can be recovered as restricted versions of ϱ̄:
    (a) ρ_P(A, b) = inf{‖(ΔA, Δb)‖ : B(A + ΔA, b + Δb, 0) ≠ B(A, b, 0)}.
    (b) ρ_D(A, c) = inf{‖(ΔA, Δc)‖ : B(A + ΔA, 0, c + Δc) ≠ B(A, 0, c)}.
    (c) ρ_h(A) = inf{‖ΔA‖ : B(A + ΔA, 0, 0) ≠ B(A, 0, 0)}.

Remark 2  (1) The problems (PF), (DF), and (OSPDF) merge into a single problem: Given a triple d = (A, b, c), decide whether both (PF) and (DF) are feasible and, if yes, find optimizers. The quantity ϱ̄(d) captures the conditioning of both parts. If (PF) or (DF) is infeasible then ϱ̄(d) = ρ(d) = min{ρ_P(A, b), ρ_D(A, c)}. Else ϱ̄(d) = ϱ(d).

(2) From the Main Theorem it follows that if one of (PF) or (DF) is infeasible and well-posed, then the other is feasible and better posed.

(3) In (ii.c.2) we may have equality in the first inequality (i.e., 0 = ϱ̄(d)). For instance, take m = n = 1 and A = 0, b = 0, c = −1. Then a strictly complementary solution is (x0, x1, y, s0, s1) = (0, 1, 1, 1, 0) and we are in case (ii.c.2) with 0 = ϱ̄(d) = ρ_D(A, c) = ρ_P(A, b). Note also that in (ii.c.1) we may have ρ_D(A, c) > 0. An example is

    A = ( 1 −1 ; 1 −1 ),    b = (1, 2)^T,    and    c = (1, 1)^T.

(4) Each of the feasibility problems (PF) and (DF) is a special case of (CPP), obtained by taking c = 0 and b = 0 respectively. In these cases ρ_P(A, b) and ρ_D(A, c) correspond to restricted versions of the distance to ill-posedness ϱ̄, namely those obtained by considering perturbations that keep c fixed to zero and b fixed to zero respectively. Similarly, (HF) is a special case of (CPP), obtained by taking b = 0 and c = 0. In this case, ρ_h(A) corresponds to the restricted version of ϱ̄ obtained by considering perturbations that keep both b and c fixed to zero.
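As a quick numerical illustration (assuming the complementary_partition sketch from §2.1 is in scope), the one-dimensional instance of Remark 2(3) can be checked directly:

```python
# Checking Remark 2(3) with the sketch from §2.1 (hypothetical usage).
import numpy as np
A = np.array([[0.0]]); b = np.array([0.0]); c = np.array([-1.0])
print(complementary_partition(A, b, c))   # expected: ([1], [0]), i.e. B = {1}, N = {0}
```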

2.5 Characterizations of distances to ill-posedness

The problem of finding alternative characterizations of the several distances to ill-posedness above is interesting both from the theoretical and the practical points of view. In particular, our proof of the Main Theorem relies on some such characterizations. The search for characterizations of the distance to ill-posedness has received a great deal of attention and is still a subject of research. We next summarize some of the relevant results known to date and how they relate to the new distance to ill-posedness ϱ̄ introduced above.

For linear equation solving, the classical Eckart and Young identity [9] (see also [15, Thm. 6.5] for an extension) provides a simple characterization of the distance to singularity (the notion of ill-posedness for linear equation solving). This identity states that the distance to singularity of a given square matrix is equal to the reciprocal of the norm of its inverse. In particular, when the Euclidean norm is used, this identity states that the distance to singularity and the smallest singular value of a given square matrix are the same.

For the primal and dual feasibility problems (PF) and (DF), Renegar [25, Thm. 3.5] proves a characterization of ρ_P(A, b) and ρ_D(A, c) as the optimal values of certain optimization problems. His characterization actually applies to more general conic systems. Renegar's characterization can be seen as a natural generalization of the Eckart and Young identity.

In the case when both (PF) and (DF) are feasible, Cheung and Cucker [2, Thm. 3] prove a characterization of ϱ(d) as the minimum of the reciprocals of the norms of the inverses of certain submatrices of

    ( 0   −c^T )
    ( −b   A   )

determined by the optimal basis of d.

The characterizations above together with the Main Theorem yield a characterization of ϱ̄ as the solution of certain optimization problems. It is interesting to note that all these characterizations maintain the original flavor of the Eckart and Young identity.

There are also generalizations of Renegar's characterization of the distance to ill-posedness to more abstract settings. Using the elegant framework of convex processes, Lewis [16, 17] generalizes Renegar's characterization to a more abstract class of problems. Specifically, Lewis [16, Thm. 2.8] shows that the distance to non-surjectivity of a convex process equals the reciprocal of the norm of its inverse, which is again a natural generalization of the Eckart and Young identity. Along the same lines, Dontchev, Lewis, and Rockafellar [7] provide similar results to characterize the radius of metric regularity.

In the recent paper [22], Peña addresses the problem of characterizing the distance to ill-posedness for perturbations restricted to a particular block-structure, such as that determined by a sparsity pattern. The results in [22] yield yet a different type of generalization of Renegar's characterization of the distance to ill-posedness by addressing the problem for a restricted class of data perturbations. The distance to ill-posedness under block-structured perturbations is closely related to the structured singular value, which plays a central role in robust control and has been extensively studied in the robust control literature; see, e.g., [8, 19, 20, 35] and the references therein. It is also related to the componentwise distance to singularity as studied by Demmel [6], Rohn [27], and Rump [28], among others (see [22] for a more detailed discussion).

An appropriate study of the peculiar structure of the system (2) may yield an alternative characterization of ϱ̄ independently of its connections with ρ_P, ρ_D and ϱ. However, at this moment such a characterization is not known. This is not surprising, as it is only recently that the properties of the distance to ill-posedness under structured perturbations have been investigated. This interesting topic will be a matter of further research.

3 Preliminaries

3.1 Some connections between F_P, F_D and F_h and between ρ_P, ρ_D and ρ_h

The following proposition states some basic topological properties of F_h. In turn, it yields some connections between the sets F_P, F_D and F_h and between ρ_P, ρ_D and ρ_h.


Proposition 1
(a) int(F_h) = {A : {Ax : x ≥ 0} = ℝ^m} = {A : A is full row-rank and ∃x > 0 s.t. Ax = 0}.
(b) F_h^c = int(F_h^c) = {A : {A^T y + s : y ∈ ℝ^m, s ≥ 0} = ℝ^n} = {A : ∃y s.t. A^T y < 0}.

Proof.  Since all norms in ℝ^{m×n} are topologically equivalent, we can assume without loss of generality that ℝ^{m×n} is endowed with the Euclidean operator norm. Under this assumption both the first equality in (a) and the middle one in (b) readily follow from the following characterization of ρ_h(A) (see [25, Thm. 3.5] and [21, Cor. 4.6]). If A ∈ F_h then

    ρ_h(A) = inf{‖y‖ : y ∉ {Ax : x ≥ 0, ‖x‖ ≤ 1}};

and if A ∉ F_h then

    ρ_h(A) = inf{‖v‖ : v ∉ {A^T y + s : ‖y‖ ≤ 1, s ≥ 0}}.

The second equality in (a) and the third equality in (b) are straightforward. Finally, the identity F_h^c = int(F_h^c) follows from Gordan's Theorem.  □

Corollary 1
(a) If A ∈ int(F_h) then (A, b) ∈ F_P for all b ∈ ℝ^m and ρ_P(A, b) ≥ ρ_h(A) > 0.
(b) If A ∉ F_h then (A, c) ∈ F_D for all c ∈ ℝ^n and ρ_D(A, c) ≥ ρ_h(A) > 0.  □

There are more key connections between ρ_P, ρ_D and ρ_h. It is easy to see that if (A, b) ∈ F_P then, under the additional compatibility condition ‖(A, b)‖ = ‖[−b A]‖, the following identity holds:

    ρ_P(A, b) = ρ_h([−b A]),

where the latter is the distance to ill-posedness of the homogeneous system −b x0 + Ax = 0, x0, x ≥ 0. Also, for A ∈ F_h, we have ρ_P(A, 0) = ρ_h(A) and ρ_D(A, 0) = 0 (here the 0 is in ℝ^m, ℝ^n, and ℝ respectively). Likewise, for A ∉ F_h, we have ρ_D(A, 0) = ρ_h(A) and ρ_P(A, 0) = 0. On the other hand, we have the following inequalities whenever (PF) or (DF) is infeasible.

Proposition 2  Let (A, b, c) ∈ ℝ^{m×n} × ℝ^m × ℝ^n be given.

(a) If (A, b) ∉ F_P then ρ_P(A, b) ≤ ρ_h(A).

(b) If (A, c) ∉ F_D then ρ_D(A, c) ≤ ρ_h(A).

Proof.  (a) Note that by Corollary 1(a), A ∉ int(F_h). Hence

    ρ_h(A) = inf{‖A′ − A‖ : A′ ∈ F_h}.                                    (4)

Let A′ ∈ F_h be fixed. Then there exists x ≥ 0, x ≠ 0, such that A′x = 0. Without loss of generality assume ‖x‖ = 1. Let ε > 0 be fixed. Take ΔA := ε b x^T. It follows that (A′ + ΔA, b) ∈ F_P, as (1/ε) x solves (A′ + ΔA)x = b, x ≥ 0. Since ε > 0 is arbitrary, it follows that ρ_P(A, b) ≤ ‖A′ − A‖. This holds for any A′ ∈ F_h, so by (4) we get ρ_P(A, b) ≤ ρ_h(A).

(b) Use a dual argument. By Corollary 1(b), A ∈ F_h, and consequently

    ρ_h(A) = inf{‖A′ − A‖ : A′ ∉ F_h}.                                    (5)

Let A′ ∉ F_h be fixed. Then there exists y ≠ 0 such that (A′)^T y ≤ 0. Without loss of generality assume ‖y‖ = 1. Let ε > 0 be fixed. Take ΔA := ε y c^T. It follows that (A′ + ΔA, c) ∈ F_D, as (1/ε) y solves (A′ + ΔA)^T y ≤ c. Since ε > 0 is arbitrary, it follows that ρ_D(A, c) ≤ ‖A′ − A‖. This holds for any A′ ∉ F_h, so by (5) we get ρ_D(A, c) ≤ ρ_h(A).  □

3.2 A homogeneous complementarity partition problem

The following theorem, a special case of Theorem 1, easily yields a special case of the Main Theorem and will be useful in its proof.

Theorem 2  Let A be a given m by n matrix. There exists a unique partition B ∪ N = {1, 2, . . . , n} such that the following system of linear inequalities has a solution:

    A^T y + s = 0
    Ax = 0
    x_B > 0,   x_N = 0
    s_B = 0,   s_N > 0.

Proof.  See §3.3.  □

We denote the partition given by Theorem 2 by B_h(A), N_h(A). Note that

    N_h(A) = ∅  ⇔  Ax = 0, x ≥ 0 has strictly feasible solutions

and

    B_h(A) = ∅  ⇔  A^T y + s = 0, s ≥ 0 has strictly feasible solutions.

The following result is an immediate consequence of these equivalences.

Proposition 3  For A ∈ ℝ^{m×n}, ρ_h(A) = inf{‖ΔA‖ : B_h(A + ΔA) ≠ B_h(A)}. In addition, if B_h(A) ≠ ∅ and N_h(A) ≠ ∅ then ρ_h(A) = 0.  □
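A possible numerical shortcut (ours, assuming the complementary_partition sketch from §2.1 is in scope): the homogeneous partition B_h(A), N_h(A) can be read off by setting b = 0 and c = 0 and discarding the index 0, using the observation B(A, 0, 0) = B_h(A) ∪ {0} made at the end of the proof of the Main Theorem.

```python
# Homogeneous partition via the §2.1 sketch (hypothetical helper).
import numpy as np

def homogeneous_partition(A):
    m, n = A.shape
    B, N = complementary_partition(A, np.zeros(m), np.zeros(n))
    return [j for j in B if j != 0], [j for j in N if j != 0]
```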

3.3 Strict complementarity

Theorems 1 and 2 are elementary and probably apparent to any expert in optimization. Since our development relies so crucially on them, we provide a proof here. We show below that both Theorems 1 and 2 readily follow from Proposition 4, a slightly more general result. This proposition also describes what B and N are, thereby providing more insight. Proposition 4 is in turn proved via a suitable version of Farkas' Lemma.

We will use the following general form of Farkas' Lemma. Let K_1 and K_2 be closed cones, products of {0}, ℝ_+, and ℝ. Then

    Aw + b ∈ K_2,    w ∈ K_1

has a solution if and only if

    −A^T v ∈ K_1*,    −b^T v > 0,    v ∈ K_2*

has no solution. Here, for a closed cone K as above with ℓ-th component K[ℓ], K* denotes its dual, whose ℓ-th component is given by

    K*[ℓ] = ℝ     if K[ℓ] = {0},
    K*[ℓ] = {0}   if K[ℓ] = ℝ,
    K*[ℓ] = ℝ_+   if K[ℓ] = ℝ_+.
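For orientation (our own specialization, not spelled out in the text), taking K_1 = ℝ^n_+ and K_2 = {0_m} above, and replacing b by −b, recovers the classical form of Farkas' Lemma:

```latex
% K_1 = R^n_+ gives w >= 0 and K_1^* = R^n_+; K_2 = {0}^m gives Aw + b = 0 and
% K_2^* = R^m.  Replacing b by -b yields the classical alternative.
\[
  \exists\, x\ge 0:\ Ax=b
  \qquad\Longleftrightarrow\qquad
  \nexists\, v:\ A^{\mathsf T}v\le 0,\ \ b^{\mathsf T}v>0 .
\]
```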

Proposition 4  Let P ∈ ℝ^{n×n}, Q ∈ ℝ^{m×m} be skew-symmetric matrices and R ∈ ℝ^{n×m}. There exists a unique partition B ∪ N = {1, . . . , n} such that the following system has a solution:

    (Px + Ry)_B = 0
    (Px + Ry)_N > 0
    −R^T x + Qy = 0
    x_B > 0
    x_N = 0.

Proof.  Consider the system

    (  P    R ) (x)
    ( −R^T  Q ) (y)  ∈ ℝ^n_+ × {0_m},        x ≥ 0.                       (6)

Let M denote the (n + m) × (n + m) matrix in the system (6). Define

    B = { i ≤ n : there is a solution (x, y) of (6) with x_i > 0 }

and

    N = { j ≤ n : there is a solution (x, y) of (6) with (M(x, y))_j > 0 }.

We first show that B, N is a partition of {1, . . . , n}. Let i ∉ B and let M^[i] be the matrix obtained by removing the i-th column of M (a similar notation is used for x). Then, by homogeneity, there is no solution to (6) with x_i = 1, i.e.,

    M^[i] (x^[i], y) + M_i ∈ ℝ^n_+ × {0_m},    x^[i] ≥ 0

has no solution. By the version of Farkas' Lemma stated above,

    −(M^[i])^T (x, y) ∈ ℝ^{n−1}_+ × {0_m},    −M_i^T (x, y) > 0,    x ≥ 0

does have a solution or, equivalently, so does

    M (x, y) ∈ ℝ^n_+ × {0_m},    x ≥ 0,    (M(x, y))_i > 0,

and this shows that i ∈ N.

To show that B ∩ N = ∅ note that, since M is skew-symmetric, (x, y)^T M (x, y) = 0 for all (x, y) ∈ ℝ^{n+m}. If, in addition, (x, y) is a solution of (6) then, for all i = 1, . . . , n, x_i (M(x, y))_i = 0, because all the terms in (x, y)^T M (x, y) are non-negative. Thus, B ∩ N = ∅ (if i ∈ B ∩ N, the sum of a solution witnessing i ∈ B and one witnessing i ∈ N would be a solution of (6) violating this equality).

It only remains to show that the system

    (Px + Ry)_B = 0
    (Px + Ry)_N > 0
    −R^T x + Qy = 0
    x_B > 0
    x_N = 0

has a solution. Such a solution can be obtained by adding, for j = 1, . . . , n, a solution (x, y)^(j) of (6) as ensured by the definitions of B and N (according to whether j is in B or N).  □

Proofs of Theorems 1 and 2.  For Theorem 1, apply Proposition 4 taking

    P = ( 0   −c^T )        R = (  b^T )
        ( c    0   ) ,          ( −A^T ) ,      and    Q = 0,

which yields system (2). Theorem 2 follows in a similar manner taking P = 0, R = −A^T, and Q = 0.  □

Corollary 2  Let d = (A, b, c) ∈ ℝ^{m×n} × ℝ^m × ℝ^n. Then

(a) (A, b) ∈ F_P and (A, c) ∈ F_D if and only if 0 ∈ B(d).

(b) (A, b) ∈ F_P if and only if 0 ∈ B(A, b, 0). In addition:

    (A, b) ∈ int(F_P)  ⇔  −b x0 + Ax = 0, x0, x ≥ 0 has strictly feasible solutions and A is full row-rank
                       ⇔  N(A, b, 0) = ∅ and A is full row-rank,
and
    (A, b) ∈ int(F_P^c)  ⇔  ( b^T ; −A^T ) y − (s0 ; s) = 0, s0, s ≥ 0 has strictly feasible solutions
                         ⇔  B(A, b, 0) = ∅.

(c) (A, c) ∈ F_D if and only if 0 ∈ B(A, 0, c). In addition:

    (A, c) ∈ int(F_D)  ⇔  c x0 − A^T y − s = 0, x0, s ≥ 0 has strictly feasible solutions
                       ⇔  B(A, 0, c) = {0},
and
    (A, c) ∈ int(F_D^c)  ⇔  ( −c^T ; A ) x − (s0 ; 0) = 0, s0, x ≥ 0 has strictly feasible solutions and A is full row-rank
                         ⇔  N(A, 0, c) = {0} and A is full row-rank.

Proof.  (a) This part readily follows from linear programming duality and from the way B is defined in the proof of Proposition 4. (b, c) These parts are straightforward consequences of (a), Farkas' Lemma, and the definitions of B and N in the proof of Proposition 4.  □

The following relation between the complementarity partition and the homogeneous complementarity partition will be useful in the proof of the Main Theorem.

Proposition 5  Let d = (A, b, c) ∈ ℝ^{m×n} × ℝ^m × ℝ^n be given.

(a) If 0 ∈ N(d) then B(d) = B_h(A) and N(d) = N_h(A) ∪ {0}.

(b) If B(d) = ∅ then A ∉ F_h, (A, c) ∈ F_D, and (A, b) ∉ F_P.

(c) If N(d) = {0} then either (i) A ∈ Σ_h, or (ii) A ∈ int(F_h), (A, b) ∈ F_P, and (A, c) ∉ F_D.

Proof.  (a) This readily follows from Theorems 1 and 2. (b) By (a), B_h(A) = B(d) = ∅. Thus there exists y ∈ ℝ^m such that A^T y < 0. By Proposition 1(b), A ∈ int(F_h^c) = F_h^c and by Corollary 1(b), (A, c) ∈ F_D. Since 0 ∈ N(d) and (A, c) ∈ F_D, by Corollary 2(a) we must have (A, b) ∉ F_P. (c) Again by (a), N_h(A) = N(d) \ {0} = ∅. Thus there exists x ∈ ℝ^n, x > 0, such that Ax = 0, so A ∈ F_h. If A ∉ Σ_h then A ∈ int(F_h); hence by Corollary 1 (A, b) ∈ F_P, and by Corollary 2(a) we must have (A, c) ∉ F_D.  □

4 Proof of the Main Theorem

The first statements in both (i) and (ii) were proven in Corollary 2(a).

(i.a) It is known that (OSPDF) has unique optimal primal and optimal dual solutions if and only if there exists an optimal basis B* such that the corresponding optimizers (y*, x*) are non-degenerate. Put s* = c − A^T y*. Then, for i = 1, . . . , n,

    x*_i ≠ 0  ⇐⇒  s*_i = 0  ⇐⇒  i ∈ B*.

By the way B and N are defined in the proof of Proposition 4 this shows that B(d) = B* ∪ {0}. Furthermore, this also implies the uniqueness of the optimal basis B*.

The inequality ϱ(d) > 0 follows because A being full row-rank and the non-degeneracy of the optimizers (y*, x*) ensure that B* remains optimal with non-degenerate optimizers for slight perturbations of d. It also follows that ϱ̄(d) > 0, because B(d) = B* ∪ {0} holds as long as the optimizers associated with B* are non-degenerate. Finally, the equality ϱ̄(d) = ϱ(d) follows from (i.b) below.

(i.b) If A is not full row-rank then we can make (PF) infeasible with an arbitrarily small perturbation of b, thus changing B(d). This shows that ϱ̄(d) = ϱ(d) = 0. Hence assume A is full row-rank and (OSPDF) has either multiple optimal primal or multiple optimal dual solutions. Let B* be an optimal basis. It must be the case that B* ≠ B(d) \ {0}, for otherwise the optimal primal and dual solutions associated with B* would be unique. It is then easy to show that there exist d′ = (A, b + Δb′, c + Δc′) and d″ = (A, b + Δb″, c + Δc″) arbitrarily close to d and such that B* is the optimal basis for d′ with non-degenerate optimizers and B* is not an optimal basis for d″. In particular ϱ̄(d) ≤ ‖d − d′‖ and ϱ(d) ≤ ‖d − d″‖. Since both d′ and d″ can be taken arbitrarily close to d, we get ϱ̄(d) = ϱ(d) = 0.

(ii.a) By Proposition 5(a), B_h(A) = B(d) ≠ ∅ and N_h(A) = N(d) \ {0} ≠ ∅. Proposition 3 then yields ρ_h(A) = 0. Furthermore, since either (A, b) ∉ F_P or (A, c) ∉ F_D, Proposition 2 implies that min{ρ_P(A, b), ρ_D(A, c)} = ρ_h(A) = 0 and the minimum corresponds to the infeasible problem. Let ΔA be such that B_h(A + ΔA) ≠ B_h(A). We claim that B(A + ΔA, b, c) ≠ B(d). Here is a proof of the claim. If 0 ∈ B(A + ΔA, b, c) then there is nothing to prove, as 0 ∉ B(d). Hence assume 0 ∉ B(A + ΔA, b, c). By Proposition 5(a) again, B_h(A + ΔA) = B(A + ΔA, b, c). But B(d) = B_h(A) ≠ B_h(A + ΔA), and so this proves the claim. We thus conclude that ϱ̄(d) = 0 = ρ_h(A) = min{ρ_P(A, b), ρ_D(A, c)}.

(ii.b) By Proposition 5(a,b), B_h(A) = B(d) = ∅, and A ∉ F_h, (A, b) ∉ F_P, (A, c) ∈ F_D. Thus, by Corollary 1(b) and Proposition 2(a), ρ_P(A, b) ≤ ρ_h(A) ≤ ρ_D(A, c). Let Δd = (ΔA, Δb, Δc) be such that ‖Δd‖ < ρ_P(A, b). Then in particular ‖(ΔA, Δb)‖ ≤ ‖Δd‖ < ρ_P(A, b) and so (A + ΔA, b + Δb) ∉ F_P. Hence by part (i), 0 ∈ N(d + Δd), and by Proposition 5(a), B(d + Δd) = B_h(A + ΔA). But ‖ΔA‖ ≤ ‖Δd‖ < ρ_P(A, b) ≤ ρ_h(A), so B(d + Δd) = B_h(A + ΔA) = B_h(A) = B(d). This shows ϱ̄(d) ≥ ρ_P(A, b). For the reverse inequality, let (ΔA, Δb) be such that (A + ΔA, b + Δb) ∈ F_P. By Proposition 5(b) again, B(A + ΔA, b + Δb, c) ≠ ∅ = B(d). Since this holds for any (ΔA, Δb) such that (A + ΔA, b + Δb) ∈ F_P, we conclude that ϱ̄(d) ≤ ρ_P(A, b). To show 0 < ρ_P(A, b), first notice that, because B(d) = ∅, there exists y such that

    −b^T y < 0
     A^T y < 0.                                                           (7)

These inequalities are strict, so they continue to hold for slight perturbations of (A, b). Since (7) implies (A, b) ∉ F_P, it follows that ρ_P(A, b) > 0.

(ii.c) Consider the two cases given by Proposition 5(c).

Case I: A ∈ Σ_h. In this case ρ_h(A) = 0. Furthermore, since either (A, b) ∉ F_P or (A, c) ∉ F_D, Proposition 2 yields min{ρ_P(A, b), ρ_D(A, c)} = ρ_h(A). Proceeding as in (ii.a) we get ϱ̄(d) = 0 = ρ_h(A) = min{ρ_P(A, b), ρ_D(A, c)} and the minimum corresponds to the infeasible problem. If (A, b) ∈ F_P then (ii.c.2) holds. If, instead, (A, b) ∉ F_P then A cannot be full row-rank. This follows from the fact that, since B_h(A) = {1, . . . , n}, the system Ax = 0, x > 0, is feasible. If A were full row-rank then, by Proposition 1(a), we would have A ∈ int(F_h) and thus ρ_h(A) > 0. Therefore, (ii.c.1) holds.

Case II: A ∈ int(F_h), (A, b) ∈ F_P, and (A, c) ∉ F_D. In this case, by Corollary 1(a) and Proposition 2(b), we have ρ_D(A, c) ≤ ρ_h(A) ≤ ρ_P(A, b). Now proceed as in (ii.b). Let Δd = (ΔA, Δb, Δc) be such that ‖Δd‖ < ρ_D(A, c). Then in particular ‖(ΔA, Δc)‖ ≤ ‖Δd‖ < ρ_D(A, c) and so (A + ΔA, c + Δc) ∉ F_D. Hence by part (i), 0 ∈ N(d + Δd), and by Proposition 5(a), B(d + Δd) = B_h(A + ΔA). But ‖ΔA‖ ≤ ‖Δd‖ < ρ_D(A, c) ≤ ρ_h(A), so B(d + Δd) = B_h(A + ΔA) = B_h(A) = B(d). This shows ϱ̄(d) ≥ ρ_D(A, c). On the other hand, let (ΔA, Δc) be such that (A + ΔA, c + Δc) ∈ F_D. If (A + ΔA, b) ∈ F_P then Corollary 2(a) yields 0 ∈ B(A + ΔA, b, c + Δc), i.e., N(A + ΔA, b, c + Δc) ≠ N(d). If (A + ΔA, b) ∉ F_P then Corollary 1 yields A + ΔA ∉ int(F_h). Thus for any ε > 0 there exists ΔA′ with ‖ΔA′‖ < ε such that A + ΔA + ΔA′ ∉ F_h. Using Proposition 1(b) and the remark after Theorem 2, we deduce that N_h(A + ΔA + ΔA′) ≠ ∅. Thus, by Proposition 5(a) again, N(A + ΔA + ΔA′, b, c + Δc) ≠ N(d). In either case we have N(A + ΔA + ΔA′, b, c + Δc) ≠ N(d) for some ‖ΔA′‖ < ε. Since this holds for any (ΔA, Δc) such that (A + ΔA, c + Δc) ∈ F_D and any ε > 0, we finally get

    ϱ̄(d) = ρ_D(A, c) ≤ ρ_h(A) ≤ ρ_P(A, b),

and again (ii.c.2) holds.

(iii.a) If (A, b) ∈ Σ_P then from Corollary 2(b) it follows that arbitrarily small perturbations of (A, b) can make B(A, b, 0) change either to ∅ or to {0, 1, . . . , n}. Therefore, for arbitrary pairs (A, b),

    ρ_P(A, b) ≥ inf{‖(ΔA, Δb)‖ : B(A + ΔA, b + Δb, 0) ≠ B(A, b, 0)}.

On the other hand, if (A, b) ∈ F_P then 0 ∈ B(A, b, 0) and by Corollary 2(b)

    ρ_P(A, b) = inf{‖(ΔA, Δb)‖ : (A + ΔA, b + Δb) ∈ Σ_P}
              = inf{‖(ΔA, Δb)‖ : (A + ΔA, b + Δb) ∈ int(F_P^c)}
              = inf{‖(ΔA, Δb)‖ : B(A + ΔA, b + Δb, 0) = ∅}
              ≤ inf{‖(ΔA, Δb)‖ : B(A + ΔA, b + Δb, 0) ≠ B(A, b, 0)}.

Similarly, if (A, b) ∈ F_P^c then 0 ∉ B(A, b, 0) and by Corollary 2(b)

    ρ_P(A, b) = inf{‖(ΔA, Δb)‖ : (A + ΔA, b + Δb) ∈ Σ_P}
              = inf{‖(ΔA, Δb)‖ : (A + ΔA, b + Δb) ∈ int(F_P)}
              ≤ inf{‖(ΔA, Δb)‖ : B(A + ΔA, b + Δb, 0) = {0, 1, . . . , n}}
              ≤ inf{‖(ΔA, Δb)‖ : B(A + ΔA, b + Δb, 0) ≠ B(A, b, 0)}.

We thus conclude that ρ_P(A, b) = inf{‖(ΔA, Δb)‖ : B(A + ΔA, b + Δb, 0) ≠ B(A, b, 0)}.

(iii.b) This follows from Corollary 2(c), proceeding as in (iii.a).

(iii.c) This follows from (iii.a) and (iii.b). Alternatively, it also follows directly from Proposition 3 and the straightforward observation B(A, 0, 0) = B_h(A) ∪ {0}.  □

5 Other forms of linear programs

Perhaps due to the influence of the simplex method, most of the literature on linear programming deals with linear programs in standard form. While any linear program can be recast in such a form, some useful problem structure may be lost in the process. We presented our development above for linear programs in standard form. We note here that similar versions of the Main Theorem hold for any form of linear programming as long as an appropriate notion of optimal basis is defined.

As an example, we describe next how this is done for the Optimization problem in Symmetric Form

    min  c^T x            max  b^T y
    s.t. Ax ≥ b           s.t. A^T y ≤ c                                  (OSF)
         x ≥ 0                 y ≥ 0,

where, as above, A is an m × n matrix. In this case an optimal basis is a pair of sets R ⊆ {1, . . . , m} and B ⊆ {1, . . . , n} such that

• |R| = |B| and A_RB is invertible,
• x*_B = A_RB^{−1} b_R ≥ 0,
• y*_R = A_RB^{−T} c_B ≥ 0,
• A_MB x*_B ≥ b_M, and
• A_RN^T y*_R ≤ c_N.

Here M = {1, . . . , m} \ R, N = {1, . . . , n} \ B, and A_PQ denotes the matrix obtained by removing from A all the rows not in P and all the columns not in Q. As in the standard form case, an optimal basis automatically determines a pair (x*, y*) of optimizers: solve A_RB x_B = b_R and A_RB^T y_R = c_B to get x*_B and y*_R, and set x*_N = 0 and y*_M = 0. It immediately follows that c^T x* = b^T y*. A small numerical check of these conditions is sketched below.
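The following is our sketch of such a check (0-based indices, numpy assumed, tolerance ours; R and B taken non-empty), not code from the paper.

```python
# Verify the five optimal-basis conditions for the symmetric-form pair (OSF).
import numpy as np

def is_optimal_basis(A, b, c, R, B, tol=1e-9):
    """True if the 0-based, non-empty index lists R, B form an optimal basis."""
    m, n = A.shape
    R, B = list(R), list(B)
    M = [i for i in range(m) if i not in R]            # M = {1,...,m} \ R
    N = [j for j in range(n) if j not in B]            # N = {1,...,n} \ B
    if len(R) != len(B):
        return False
    ARB = A[np.ix_(R, B)]
    if np.linalg.matrix_rank(ARB) < len(B):            # A_RB must be invertible
        return False
    xB = np.linalg.solve(ARB, b[R])                    # x*_B = A_RB^{-1} b_R
    yR = np.linalg.solve(ARB.T, c[B])                  # y*_R = A_RB^{-T} c_B
    return (np.all(xB >= -tol) and np.all(yR >= -tol)
            and np.all(A[np.ix_(M, B)] @ xB >= b[M] - tol)      # A_MB x*_B >= b_M
            and np.all(A[np.ix_(R, N)].T @ yR <= c[N] + tol))   # A_RN^T y*_R <= c_N
```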

− c T x + bT y − − AT y − + Ax −

    x0 s where x and s denote and 0 respectively. x s

19

s0 s z yR , x B yM , x N zR , s B zM , s N

= 0 = 0 = 0 > 0 = 0 = 0 > 0, 

We use the notations B(d), R(d), N(d), M(d), as in the previous sections. In addition, to emphasize the similarity with the Main Theorem, we put 𝓑(d) := (B(d), R(d)) and 𝓝(d) := (N(d), M(d)). We also denote by (PF) the system Ax ≥ b, x ≥ 0, and by (DF) the system A^T y ≤ c, y ≥ 0. The distances ρ_P(A, b), ρ_D(A, c), ϱ(d), ρ_h(A) and ϱ̄(d) are then defined in the obvious way. In particular,

    ϱ̄(d) = inf{‖Δd‖ : 𝓑(d + Δd) ≠ 𝓑(d)}.

Using this notation, it is straightforward to prove the following version of our main result for the primal-dual pair in symmetric form.

Theorem 4  Let d = (A, b, c) ∈ ℝ^{m×n} × ℝ^m × ℝ^n.

(i) Both (PF) and (DF) are feasible if and only if 0 ∈ B(d). In this case,
    (a) If (OSF) has unique optimal primal and dual solutions then d has a unique optimal basis (B*, R*), and B(d) = B* ∪ {0}, R(d) = R*, and 0 < ϱ̄(d) = ϱ(d).
    (b) Otherwise 0 = ϱ̄(d) = ϱ(d).

(ii) Either (PF) or (DF) is infeasible if and only if 0 ∈ N(d). In this case,
    ϱ̄(d) = min{ρ_P(A, b), ρ_D(A, c)} ≤ ρ_h(A) ≤ max{ρ_P(A, b), ρ_D(A, c)},
    where the minimum corresponds to the infeasible problem.

(iii) The distances ρ_P, ρ_D, and ρ_h can be recovered as restricted versions of ϱ̄:
    (a) ρ_P(A, b) = inf{‖(ΔA, Δb)‖ : 𝓑(A + ΔA, b + Δb, 0) ≠ 𝓑(A, b, 0)}.
    (b) ρ_D(A, c) = inf{‖(ΔA, Δc)‖ : 𝓑(A + ΔA, 0, c + Δc) ≠ 𝓑(A, 0, c)}.
    (c) ρ_h(A) = inf{‖ΔA‖ : 𝓑(A + ΔA, 0, 0) ≠ 𝓑(A, 0, 0)}.  □

6 Final remarks

There are other ways to define condition measures, and there are reasons to do so. Note that when a data instance is ill-posed, its condition (in the sense above) is infinity. This is a desirable feature when the condition measure is used for round-off analysis since, in this case, arbitrarily small errors can have unbounded effects on the computed solution. It is not desirable, in contrast, when the condition measure is used in the complexity analysis of an algorithm that always terminates. In this case, one would like a condition measure which is always finite. Examples of such finite condition measures exist. For instance, for (HF), the condition measure σ(A) is defined in [34] and used in [31] for a complexity analysis.

Similarly, a complexity analysis is done in [32] in terms of the measure χ̄_A introduced in [29] and [30]. Both σ(A) and χ̄_A are finite for all matrices A. Other recent measures of condition for linear programming, C(A) and µ(A) (the latter only for the case in which Ax = 0, x ≥ 0 has non-trivial solutions), are defined in [1] and [11]. While these measures cannot be directly compared with K(d) = ‖d‖/ϱ(d), since they are tailored for a different problem, we note that they can be compared with C_h(A) (whose relationship with K(d) is known from the Main Theorem). Relationships between these different condition measures were established, among other papers, in [1, 11]. The following table, taken from [4], summarizes these comparisons. In this table, if the cell in row i and column j is "No", it means that the condition number in column j carries no upper-bound information about the condition number in row i.

              | Ch(A)                | C(A) | χ̄_A                | 1/σ(A)              | µ(A)
    Ch(A)     |                      | No   | No                  | No                  | No
    C(A)      | C(A) ≤ n Ch(A)       |      | No                  | No                  | No
    χ̄_A      | No                   | No   |                     | No                  | No
    1/σ(A)    | 1/σ(A) ≤ 1 + Ch(A)   | No   | 1/σ(A) ≤ χ̄_A + 1   |                     | 1/σ(A) = 1 + µ(A)
    µ(A)      | µ(A) ≤ Ch(A)         | No   | µ(A) ≤ χ̄_A         | µ(A) = 1/σ(A) − 1   |

References

[1] D. Cheung and F. Cucker. A new condition number for linear programming. Math. Program., 91:163–174, 2001.
[2] D. Cheung and F. Cucker. Solving linear programs with finite precision: I. Condition numbers and random programs. To appear in Math. Program., 2002.
[3] D. Cheung, F. Cucker, and J. Peña. Solving linear programs with finite precision: II. Algorithms. Manuscript in preparation, 2003.
[4] D. Cheung, F. Cucker, and Y. Ye. Linear programming and condition numbers under the real number computation model. In Ph. Ciarlet and F. Cucker, editors, Handbook of Numerical Analysis, volume XI. North-Holland, 2003.
[5] F. Cucker and J. Peña. A primal-dual algorithm for solving polyhedral conic systems with a finite-precision machine. SIAM Journal on Optimization, 12:522–554, 2002.
[6] J. Demmel. The componentwise distance to the nearest singular matrix. SIAM J. Matrix Anal. Appl., 13:10–19, 1992.
[7] A. Dontchev, A. Lewis, and T. Rockafellar. The radius of metric regularity. Trans. Amer. Math. Soc., 355:493–517, 2003.
[8] J. Doyle. Analysis of feedback systems with structured uncertainty. IEE Proceedings, 129:242–250, 1982.
[9] C. Eckart and G. Young. The approximation of one matrix by another of lower rank. Psychometrika, 1:211–218, 1936.
[10] M. Epelman and R.M. Freund. Condition number complexity of an elementary algorithm for computing a reliable solution of a conic linear system. Math. Program., 88:451–485, 2000.
[11] M. Epelman and R.M. Freund. A new condition measure, preconditioners, and relations between different measures of conditioning for conic linear systems. SIAM Journal on Optimization, 12:627–655, 2002.
[12] R.M. Freund and J.R. Vera. Condition-based complexity of convex optimization in conic linear form via the ellipsoid algorithm. SIAM Journal on Optimization, 10:155–176, 1999.
[13] R.M. Freund and J.R. Vera. Some characterizations and properties of the "distance to ill-posedness" and the condition measure of a conic linear system. Math. Program., 86:225–260, 1999.
[14] A. Goldman and A. Tucker. Theory of linear programming. In H. Kuhn and A. Tucker, editors, Linear Inequalities and Related Systems, volume 38 of Annals of Mathematical Studies, pages 53–97. Princeton Univ. Press, 1956.
[15] N. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, 1996.
[16] A. Lewis. Ill-conditioned convex processes and linear inequalities. Math. of Oper. Res., 24:829–834, 1999.
[17] A.S. Lewis. Ill-conditioned inclusions. Set-Valued Analysis, 9:375–381, 2001.
[18] M. Nunez and R.M. Freund. Condition measures and properties of the central trajectory of a linear program. Math. Program., 83:1–28, 1998.
[19] A. Packard and J. Doyle. The complex structured singular value. Automatica, 29:71–109, 1993.
[20] A. Packard and P. Pandey. Continuity properties of the real/complex structured singular value. IEEE Trans. on Automatic Control, 38:415–428, 1993.
[21] J. Peña. Understanding the geometry of infeasible perturbations of a conic linear system. SIAM Journal on Optimization, 10:534–550, 2000.
[22] J. Peña. A characterization of the distance to infeasibility under block-structured perturbations. To appear in Lin. Alg. Appl., 2003.
[23] J. Peña and J. Renegar. Computing approximate solutions for conic systems of constraints. Math. Program., 87:351–383, 2000.
[24] J. Renegar. Some perturbation theory for linear programming. Math. Program., 65:73–91, 1994.
[25] J. Renegar. Linear programming, complexity theory and elementary functional analysis. Math. Program., 70:279–351, 1995.
[26] J. Renegar. Condition numbers, the barrier method, and the conjugate-gradient method. SIAM Journal on Optimization, 6:879–912, 1996.
[27] J. Rohn. Systems of linear interval equations. Lin. Alg. Appl., 126:39–78, 1989.
[28] S. Rump. Ill-conditioned matrices are componentwise near to singularity. SIAM Review, 41:102–112, 1999.
[29] G.W. Stewart. On scaled projections and pseudoinverses. Linear Algebra and its Applications, 112:189–193, 1989.
[30] M.J. Todd. A Dantzig–Wolfe-like variant of Karmarkar's interior-point linear programming algorithm. Operations Research, 38:1006–1018, 1990.
[31] S.A. Vavasis and Y. Ye. Condition numbers for polyhedra with real number data. Oper. Res. Lett., 17:209–214, 1995.
[32] S.A. Vavasis and Y. Ye. A primal-dual interior point method whose running time depends only on the constraint matrix. Math. Program., 74:79–120, 1996.
[33] J.R. Vera. On the complexity of linear programming under finite precision arithmetic. Math. Program., 80:91–123, 1998.
[34] Y. Ye. Toward probabilistic analysis of interior-point algorithms for linear programming. Math. of Oper. Res., 19:38–52, 1994.
[35] K. Zhou, J. Doyle, and K. Glover. Robust and Optimal Control. Prentice Hall, 1996.
