Generating Disjunctive Cuts for Mixed Integer Programs — Doctoral Dissertation Michael Perregaard
[email protected]
Carnegie Mellon University Graduate School of Industrial Administration Schenley Park Pittsburgh, PA 15213 September 1, 2003
Abstract This report constitutes the Doctoral Dissertation of Michael Perregaard and is a collection of results on the efficient generation of disjunctive cuts for mixed integer programs. Disjunctive cuts are a very broad class of cuts for mixed integer programming. In general, any cut that can be derived from a disjunctive argument can be considered a disjunctive cut. Here we consider specifically cuts that are valid inequalities for some simple disjunctive relaxation of the mixed integer program. Such a relaxation can e.g. be obtained by relaxing the integrality condition on all but a single variable. The lift-and-project procedure developed in the early nineties is a systematic way to generate an optimal (in a specific sense) disjunctive cut for a given disjunctive relaxation. It involves solving a higher dimensional cut generating linear program (CGLP) and has been developed for the simplest possible disjunctions: those requiring that a single variable be either zero or one. In our work we consider the problem of efficiently generating disjunctive cuts for any given disjunction. That is, once we are presented with a disjunctive relaxation of a mixed integer program, how can we efficiently generate one or more cuts that cut off an optimal solution to the LP relaxation? This problem naturally falls into two cases: two-term disjunctions, such as those the original lift-and-project procedure was designed to solve, and more general multiple-term disjunctions. For two-term disjunctions we show how one can effectively reduce the CGLP, but the main result is a precise correspondence between the lift-and-project cuts obtained from the CGLP and simple disjunctive cuts from rows of the LP relaxation simplex tableau. The implication is that lift-and-project cuts from the high dimensional CGLP can be obtained directly from the LP relaxation. Furthermore, if integrality on all variables is considered, then this becomes a correspondence between strengthened lift-and-project cuts and Gomory's mixed integer cuts. Using this correspondence we present a procedure to efficiently generate an optimal mixed integer Gomory cut (optimal in the sense of the CGLP) through pivots in the simplex tableau of the LP relaxation. In the case of multiple-term disjunctions we present procedures that provide an optimal solution to the high dimensional CGLP by solving the cut problem in the original space, without recourse to the many auxiliary variables present in the CGLP. Finally, we propose a procedure that generates a set of facets of the convex hull of a given disjunctive set.
Contents

Overview                                                                    iii

1 Lift-and-Project                                                            1
  1.1 Introduction                                                            2
      Disjunctive programming                                                 2
      Two basic ideas                                                         3
  1.2 Compact Representation of the Convex Hull                               4
      Projection and polarity                                                 5
  1.3 Generating Cuts                                                         5
      Deepest cuts                                                            6
      Cut lifting                                                             7
      Cut strengthening                                                       7
      The overall cut generating procedure                                    8
  1.4 Variations on the Cut Generating LP                                     9
      Alternative normalizations                                              9
      Complementarity of solution components                                 11
      Reduced-size (CGLP)                                                    12
      Multiple cuts from a disjunction                                       15
  1.5 Computational Experience                                               15
      Results with a reduced-size CGLP                                       15
      Results with multiple cuts from a disjunction                          18

2 A Precise Correspondence                                                   19
  2.1 Introduction                                                           20
      2.1.1 Notation                                                         21
  2.2 Simple Disjunctive Cuts and Mixed Integer Gomory Cuts                  21
  2.3 Lift-and-Project Cuts                                                  22
  2.4 The Correspondence Between the Unstrengthened Cuts                     23
  2.5 The Correspondence Between the Strengthened Cuts                       28
  2.6 Bounds on the Number of Essential Cuts                                 30
  2.7 The Rank of P With Respect to Different Cuts                           31
  2.8 Solving (CGLP)k on the (LP) Simplex Tableau                            32
  2.9 Using Lift-and-Project to Strengthen Mixed Integer Gomory Cuts         43
  2.10 Computational Experience                                              45
      2.10.1 Technical Details                                               45
      2.10.2 A Computational Comparison With Classic Lift-and-Project        52
      2.10.3 Computational Experiments with Branch-and-Bound                 54
  2.11 Earlier Computational Results on Lift-and-Project Cuts in the Literature  62
      2.11.1 Balas, Ceria and Cornuéjols, 1996                               62
      2.11.2 Ceria and Pataki, 1998                                          66
      2.11.3 Conclusion                                                      69
  2.12 Analyzing the root node lift-and-project cuts                         70

3 Generating Cuts from Multiple-Term Disjunctions                            74
  3.1 Introduction                                                           75
  3.2 An Iterative Approach to Cut Generating                                76
  3.3 Generating adjacent extreme points                                     79
  3.4 How to generate a facet of conv(P_D) in n iterations                   80
  3.5 Cut Lifting                                                            82
  3.6 Computational testing                                                  83
  3.7 Conclusions                                                            84

4 Finding a Sufficient Set of Facets for a Disjunctive Program               88
  4.1 Introduction                                                           89
  4.2 Generating the facet-defining inequalities                             91
  4.3 Filtering the set of generated inequalities                            93
  4.4 Selecting the next inequality for processing                           94
  4.5 Selecting (ᾱ, β̄)                                                       96
  4.6 The procedure                                                          97
  4.7 A fast approximation                                                   98
      4.7.1 Modifying the procedure                                          99
      4.7.2 Tilting an inequality α^k x ≥ β^k                               100
      4.7.3 The modified procedure                                          101
      4.7.4 Finding an orthogonal inequality                                102
      4.7.5 Exploiting common constraints                                   105
      4.7.6 An Alternative Search Order                                     105
  4.8 Computational Experiments                                             106
  4.9 Necessary Improvements                                                112
  4.10 Conclusion                                                           113
Overview

The leading theme of this project is the efficient generation of lift-and-project cuts for mixed integer programming. This report contains four chapters. Chapter 1 is an introduction to lift-and-project and contains some of the background theory necessary to understand the results presented here. In addition, some new results are presented on normalizing the cut generating linear program (CGLP) and on exploiting the structure of a solution to the CGLP. In Chapter 2 the central result is an exact correspondence between bases of the CGLP and bases of the LP relaxation of the mixed integer program, which becomes a correspondence between the cuts derived from these bases. We exploit this correspondence to develop an algorithm that creates lift-and-project cuts directly from the LP relaxation, without recourse to the higher-dimensional CGLP. Computational results are presented that evaluate our implementation of this new approach. Chapters 1 and 2 concern disjunctive cuts obtained from a two-term disjunction. In Chapter 3 we present algorithms to efficiently generate cuts from multiple-term disjunctions. We present computational evidence against using the CGLP approach when generating cuts from disjunctions with four or more terms. For such larger disjunctions it is much better to use a decomposition approach that works in the original space of the mixed integer program. The first three chapters are all focused on creating a single cut from a given disjunction. In Chapter 4 we propose an algorithm to generate a set of facets of the convex hull of a given disjunctive set. This idea gets around the troublesome normalization constraint otherwise required in lift-and-project, and instead relies solely on a direction of optimization when creating the cuts.
Chapter 1 Lift-and-Project
Parts of the work in this chapter have been published as Lift-and-project for Mixed 0-1 programming: recent progress by E. Balas and M. Perregaard in Discrete Applied Mathematics 123 (2002), 129-154.
1.1 Introduction
The foundations of lift-and-project were laid in a July 1974 technical report on disjunctive programming, published 24 years later as an invited paper [4] with a foreword. For additional work on disjunctive programming in the seventies and eighties see [2, 3, 9, 13, 17, 18, 28, 29, 30, 34]. In particular, [2] contains a detailed account of the origins of the disjunctive approach and the relationship of disjunctive cuts to Gomory's mixed integer cut [27], intersection cuts [1] and others. Disjunctive programming received a new impetus in the early nineties from the work on matrix cones by Lovász and Schrijver [31], see also Sherali and Adams [33]. The version that led to the computational breakthroughs of the nineties is described in the two papers by Balas, Ceria and Cornuéjols [6, 7], the first of which discusses the cutting plane theory behind the approach, while the second deals with the branch-and-cut implementation and computational testing. Related recent developments are discussed in [8, 5, 14, 19, 20, 21, 35, 37, 38].
Disjunctive programming

Disjunctive programming is optimization over unions of polyhedra. While polyhedra are convex sets, their unions of course are not. The name reflects the fact that the objects investigated by this theory can be viewed as the solution sets of systems of linear inequalities joined by the logical operations of conjunction, negation (taking of complement) and disjunction, where the nonconvexity is due to the presence of disjunctions. Pure and mixed integer programs, in particular pure and mixed 0-1 programs, can be viewed as disjunctive programs; but the same is true of a host of other problems, like for instance the linear complementarity problem. Our focus will be on pure and mixed 0-1 programs, although all results are easily generalized to mixed integer programs. Our starting point is a mixed 0-1 program in the form

$$\min\ cx \quad \text{s.t.}\quad Ax \ge b,\ \ x \ge 0,\ \ x_j \in \{0,1\},\ j = 1,\ldots,p \tag{MIP}$$

where A is m × n, c and x are n-vectors, and b is an m-vector, for some p, 0 ≤ p ≤ n. We will assume that the system Ax ≥ b subsumes the inequalities x_j ≤ 1, j = 1, . . . , p, i.e. the last p inequalities of Ax ≥ b are −x_j ≥ −1, j = 1, . . . , p. The linear programming relaxation of (MIP) is

$$\min\{cx : x \in P\} \tag{LP}$$

where P := {x ∈ R^n_+ : Ax ≥ b}.
A disjunctive relaxation of (MIP) can be obtained by replacing the 0-1 condition with a valid disjunction of the form

$$\bigvee_{q \in Q} \left(D^q x \ge d^q\right)$$

where Q is a finite index set. The simplest example of such a disjunction would be to impose the 0-1 condition on a single variable x_k, as in (x_k ≤ 0) ∨ (x_k ≥ 1). Let

$$P_q := \{x \in \mathbb{R}^n : A^q x \ge b^q\}, \quad q \in Q,$$

be convex polyhedra, where

$$A^q := \begin{pmatrix} A\\ D^q \end{pmatrix}, \qquad b^q := \begin{pmatrix} b\\ d^q \end{pmatrix},$$

with (A^q, b^q) an m_q × (n + 1) matrix, q ∈ Q. Then the disjunctive set ∪_{q∈Q} P_q over which we wish to minimize the linear function cx can be expressed as

$$\left\{x \in \mathbb{R}^n : \bigvee_{q\in Q}(A^q x \ge b^q)\right\}, \tag{1.1}$$

which is its disjunctive normal form (a disjunction whose terms do not contain further disjunctions). The same disjunctive set can also be expressed as

$$\left\{x \in \mathbb{R}^n : Ax \ge b,\ \bigvee_{h\in Q_j}(d^h x \ge d^h_0),\ j = 1, \ldots, t\right\}, \tag{1.2}$$

which is its conjunctive normal form (a conjunction whose terms do not contain further conjunctions). Here (d^h, d^h_0) is an (n + 1)-vector for h ∈ Q_j, all j. The connection between (1.1) and (1.2) is that each term A^i x ≥ b^i of the disjunctive normal form (1.1) contains Ax ≥ b and exactly one inequality d^h x ≥ d^h_0 of each disjunction of (1.2) indexed by Q_j for j = 1, . . . , t, and that all distinct systems A^i x ≥ b^i with this property are present among the terms of (1.1). See [3] for details on how to go from (1.1) to (1.2) and from (1.2) to (1.1).
Two basic ideas

The lift-and-project approach relies mainly on the following two ideas, the first of which uses the disjunctive normal form (1.1), while the second one uses the conjunctive normal form (1.2):

1. There is a compact representation of the convex hull of a union of polyhedra in a higher dimensional space, which in turn can be projected back into the original space. The first step of this operation may be viewed as lifting, the second step, projection. As a result one obtains the convex hull in the original space.

2. A large class of disjunctive sets, called facial, can be convexified sequentially, i.e. their convex hull can be derived by imposing the disjunctions one at a time, generating each time the convex hull of the current set.
1.2 Compact Representation of the Convex Hull
Theorem 1.1 ([4]). Given polyhedra $P_q := \{x \in \mathbb{R}^n : A^q x \ge b^q\} \ne \emptyset$, q ∈ Q, the closed convex hull of $\cup_{q\in Q} P_q$ is the set of those x ∈ R^n for which there exist vectors $(y^q, y_0^q) \in \mathbb{R}^{n+1}$, q ∈ Q, satisfying

$$\begin{array}{rcl} x - \sum_{q\in Q} y^q & = & 0\\ A^q y^q - b^q y_0^q & \ge & 0, \quad q \in Q\\ y_0^q & \ge & 0, \quad q \in Q\\ \sum_{q\in Q} y_0^q & = & 1. \end{array} \tag{1.3}$$

In particular, denoting by $P_Q := \mathrm{conv}\,\cup_{q\in Q} P_q$ the closed convex hull of $\cup_{q\in Q} P_q$ and by $\mathcal{P}$ the set of vectors $(x, \{y^q, y_0^q\}_{q\in Q})$ satisfying (1.3),
(i) if x* is an extreme point of P_Q, then $(\bar{x}, \{\bar{y}^q, \bar{y}_0^q\}_{q\in Q})$ is an extreme point of $\mathcal{P}$, with $\bar{x} = x^*$, $(\bar{y}^k, \bar{y}_0^k) = (x^*, 1)$ for some k ∈ Q, and $(\bar{y}^q, \bar{y}_0^q) = (0, 0)$ for q ∈ Q \ {k}.

(ii) if $(\bar{x}, \{\bar{y}^q, \bar{y}_0^q\}_{q\in Q})$ is an extreme point of $\mathcal{P}$, then $\bar{y}^k = \bar{x} = x^*$ and $\bar{y}_0^k = 1$ for some k ∈ Q, $(\bar{y}^q, \bar{y}_0^q) = (0, 0)$, q ∈ Q \ {k}, and x* is an extreme point of P_Q.

Note that in this higher dimensional representation of P_Q, the number of variables and constraints is linear in the number |Q| of polyhedra in the union, and so is the number of facets of $\mathcal{P}$. Note also that in any basic solution of the linear system (1.3), $y_0^q \in \{0, 1\}$, q ∈ Q, automatically, without imposing this condition explicitly. Of course, if the set Q is itself exponential in the number of variables, then system (1.3) becomes unmanageably large. This is the case for instance if we impose simultaneously all p integrality conditions of the mixed 0-1 program (MIP), in which case we have a disjunction with 2^p terms, one for every p-component 0-1 point. But if we impose only disjunctions that yield a set Q of manageable size, then this representation becomes extremely useful (such an approach is facilitated by the sequential convexifiability of facial disjunctive sets, see below). In the special case of a disjunction of the form x_j ∈ {0, 1}, when |Q| = 2 and

$$P_j^0 := \{x \in \mathbb{R}^n_+ : Ax \ge b,\ x_j = 0\}, \qquad P_j^1 := \{x \in \mathbb{R}^n_+ : Ax \ge b,\ x_j = 1\},$$

$P_Q := \mathrm{conv}(P_j^0 \cup P_j^1)$ is the set of those x ∈ R^n for which there exist vectors $(y, y_0), (z, z_0) \in \mathbb{R}^{n+1}_+$ such that

$$\begin{array}{rcl} x - y - z & = & 0\\ Ay - by_0 & \ge & 0\\ -y_j & = & 0\\ Az - bz_0 & \ge & 0\\ z_j - z_0 & = & 0\\ y_0 + z_0 & = & 1 \end{array} \tag{3'}$$

Unlike the general system (1.3), the system (3′), in which |Q| = 2, is of quite manageable size.
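As an illustration of how the lifted representation can be used directly, the sketch below tests whether a given point belongs to conv(P_j^0 ∪ P_j^1) by solving (3′) as a pure feasibility LP with x fixed at that point. It is only a sketch under stated assumptions: the function name and the use of numpy/scipy.optimize.linprog are our own choices, and the nonnegativity of y and z carries the x ≥ 0 part of each term.

```python
import numpy as np
from scipy.optimize import linprog

def in_disjunctive_hull(A, b, j, x_point):
    """Feasibility test for the lifted system (3'): is x_point in conv(P_j^0 ∪ P_j^1)?
    Variable vector: (y_1..y_n, y0, z_1..z_n, z0), all nonnegative."""
    m, n = A.shape
    nv = 2 * n + 2
    y0, z0 = n, 2 * n + 1                      # positions of y0 and z0

    A_eq = np.zeros((n + 3, nv))
    b_eq = np.zeros(n + 3)
    A_eq[:n, :n] = np.eye(n)                   # y + z = x_point  (x fixed)
    A_eq[:n, n + 1:2 * n + 1] = np.eye(n)
    b_eq[:n] = x_point
    A_eq[n, j] = 1.0                           # y_j = 0
    A_eq[n + 1, n + 1 + j] = 1.0               # z_j - z0 = 0
    A_eq[n + 1, z0] = -1.0
    A_eq[n + 2, y0] = 1.0                      # y0 + z0 = 1
    A_eq[n + 2, z0] = 1.0
    b_eq[n + 2] = 1.0

    A_ub = np.zeros((2 * m, nv))               # A y - b y0 >= 0 and A z - b z0 >= 0,
    A_ub[:m, :n] = -A                          # written as "<= 0" rows for linprog
    A_ub[:m, y0] = b
    A_ub[m:, n + 1:2 * n + 1] = -A
    A_ub[m:, z0] = b

    res = linprog(np.zeros(nv), A_ub=A_ub, b_ub=np.zeros(2 * m),
                  A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * nv)
    return res.status == 0                     # feasible => x_point is in the hull
```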
Projection and polarity

In order to generate the convex hull P_Q, and more generally, to obtain valid inequalities (cutting planes) in the space of the original variables, we project $\mathcal{P}$ onto the x-space:

Theorem 1.2 ([4]). $\mathrm{Proj}_x(\mathcal{P}) = \{x \in \mathbb{R}^n : \alpha x \ge \beta \text{ for all } (\alpha, \beta) \in W_0\}$, where
$$W_0 := \{(\alpha, \beta) \in \mathbb{R}^{n+1} : \alpha = u^q A^q,\ \beta \le u^q b^q \text{ for some } u^q \ge 0,\ q \in Q\}.$$

The polyhedral cone W_0 used to project $\mathcal{P}$ can be shown to be the reverse polar cone $P_Q^*$ of P_Q, i.e. the cone of all valid inequalities for P_Q:

Theorem 1.3 ([4]). $P_Q^* := \{(\alpha, \beta) \in \mathbb{R}^{n+1} : \alpha x \ge \beta \text{ for all } x \in P_Q\} = \{(\alpha, \beta) \in \mathbb{R}^{n+1} : \alpha = u^q A^q,\ \beta \le u^q b^q \text{ for some } u^q \ge 0,\ q \in Q\}$.

To turn again to the special case of a disjunction of the form x_j ∈ {0, 1}, projecting the system (3′) onto the x-space yields the polyhedron P_Q whose reverse polar cone is

$$P_Q^* = \{(\alpha, \beta) \in \mathbb{R}^{n+1} : \alpha \ge uA - u_0 e_j,\ \alpha \ge vA + v_0 e_j,\ \beta \le ub,\ \beta \le vb + v_0,\ \text{for some } u, v \ge 0\}$$

(where e_j is the j-th unit vector). One of the main advantages of the higher dimensional representation is that in projecting it back we have an easy criterion to distinguish facets of P_Q from other valid inequalities.

Theorem 1.4 ([4]). Assume P_Q is full dimensional. The inequality αx ≥ β defines a facet of P_Q if and only if (α, β) is an extreme ray of the cone $P_Q^*$.
1.3 Generating Cuts
The implementation of the disjunctive programming approach into a practical 0-1 programming algorithm had to wait until the early '90s. It required not only the choice of a specific version of disjunctive cuts, but also a judicious combination of cutting with branching, made possible in turn by the discovery of an efficient procedure for lifting cuts generated in a subspace (for instance, at a node of the search tree) to be valid in the full space (i.e. throughout the search tree).
Deepest cuts

As mentioned earlier, if P_Q is full dimensional, then facets of P_Q correspond to extreme rays of the reverse polar cone $P_Q^*$. To generate such extreme rays, for each 0-1 variable x_j that is fractional at the linear programming optimum, we solve a linear program over a normalized version of the cone $P_Q^*$ corresponding to the disjunction x_j = 0 ∨ x_j = 1, with an objective function aimed at cutting off the linear programming optimum x̄ by as much as possible. This "cut generating linear program" for the j-th variable is of the form

$$\begin{array}{rll} \min & \alpha\bar{x} - \beta & \\ \text{s.t.} & \alpha - uA + u_0 e_j \ge 0 & \\ & \alpha - vA - v_0 e_j \ge 0 & \qquad\text{(CGLP)}_j\\ & -\beta + ub = 0 & \\ & -\beta + vb + v_0 = 0 & \\ & u, v \ge 0 & \end{array}$$

and (i) β ∈ {1, −1}, or (ii) $\sum_j |\alpha_j| \le 1$. For details, see [6, 7]. The normalization constraint (i) or (ii) has the purpose of turning the cone $P_D^*$ into a polyhedron. In case of (i) this is achieved by using −1 ≤ β ≤ 1. In case of (ii), by substituting $\alpha_j^+ - \alpha_j^-$ for $\alpha_j$, with $\alpha_j^+, \alpha_j^- \ge 0$ for all j. A third normalization, proposed later and used in computational experiments subsequent to [7], is

$$\text{(iii)}\quad \sum_i u_i + u_0 + \sum_i v_i + v_0 = 1.$$

The merits and demerits of various normalizations will be discussed later. Solving (CGLP)_j yields a cut αx ≥ β, where

$$\alpha_k = \begin{cases} \max\{ua^k, va^k\} & k \in N \setminus \{j\}\\ \max\{ua^j - u_0,\ va^j + v_0\} & k = j, \end{cases}$$

with $a^k$ the k-th column of A, and β = min{ub, vb + v_0}. This cut maximizes the amount β − αx̄ by which x̄ is cut off (with respect to the selected normalization). The experiments of [7] indicated that the most efficient way of generating cuts is to stop short of solving (CGLP) to optimality. This idea was implemented by ignoring those columns of (CGLP) associated with constraints of P not tight at the optimum, except for the lower and upper bounding constraints on the 0-1 variables.
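To make the mechanics concrete, the following sketch assembles (CGLP)_j with normalization (iii) as an ordinary LP and then recovers the cut coefficients from the multipliers via the max/min formulas above. It is only an illustration of the formulation, not the implementation used in this work: the function name, the variable ordering and the use of numpy/scipy.optimize.linprog are our own choices, and A, b are assumed to be the rows of Ax ≥ b (with the −x_j ≥ −1 rows included), as in the text.

```python
import numpy as np
from scipy.optimize import linprog

def lift_and_project_cut(A, b, j, x_bar):
    """Sketch of (CGLP)_j with normalization (iii).
    Variable order: (alpha[0:n], beta, u[0:m], u0, v[0:m], v0)."""
    m, n = A.shape
    nv = n + 1 + 2 * m + 2
    aj = slice(0, n); bj = n
    uj = slice(n + 1, n + 1 + m); u0 = n + 1 + m
    vj = slice(n + 2 + m, n + 2 + 2 * m); v0 = nv - 1

    c = np.zeros(nv); c[aj] = x_bar; c[bj] = -1.0           # min alpha.x_bar - beta

    # ">= 0" rows written as "<= 0" rows for linprog.
    A_ub = np.zeros((2 * n, nv)); b_ub = np.zeros(2 * n)
    A_ub[:n, aj] = -np.eye(n); A_ub[:n, uj] = A.T; A_ub[:n, u0] = -np.eye(n)[:, j]
    A_ub[n:, aj] = -np.eye(n); A_ub[n:, vj] = A.T; A_ub[n:, v0] = np.eye(n)[:, j]

    A_eq = np.zeros((3, nv)); b_eq = np.array([0.0, 0.0, 1.0])
    A_eq[0, bj] = -1.0; A_eq[0, uj] = b                      # -beta + u.b = 0
    A_eq[1, bj] = -1.0; A_eq[1, vj] = b; A_eq[1, v0] = 1.0   # -beta + v.b + v0 = 0
    A_eq[2, uj] = 1.0; A_eq[2, u0] = 1.0                     # normalization (iii)
    A_eq[2, vj] = 1.0; A_eq[2, v0] = 1.0

    bounds = [(None, None)] * (n + 1) + [(0, None)] * (2 * m + 2)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)

    u, u0v, v, v0v = res.x[uj], res.x[u0], res.x[vj], res.x[v0]
    # Recover the cut from the multipliers, as in the formulas above.
    alpha = np.maximum(u @ A, v @ A)
    alpha[j] = max(u @ A[:, j] - u0v, v @ A[:, j] + v0v)
    beta = min(u @ b, v @ b + v0v)
    return alpha, beta                                       # cut: alpha.x >= beta
```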
Cut lifting

In general, a cutting plane derived at a node of the search tree defined by a subset F_0 ∪ F_1 of the 0-1 variables, where F_0 and F_1 index those variables fixed at 0 and 1, respectively, is only valid at that node and its descendants in the tree (where the variables in F_0 ∪ F_1 remain fixed at their values). Such a cut can in principle be made valid at other nodes of the search tree, where the variables in F_0 ∪ F_1 are no longer fixed, by calculating appropriate values for the coefficients of these variables – a procedure called lifting. However, calculating such coefficients is in general a daunting task, which may require the solution of an integer program for every coefficient. One important advantage of the cuts discussed here is that the multipliers u, u_0, v, v_0 obtained along with the cut vector (α, β) by solving (CGLP)_j can be used to calculate by closed form expressions the coefficients α_h of the variables h ∈ F_0 ∪ F_1. While this possibility of calculating efficiently the coefficients of variables absent from a given subproblem (i.e. fixed at certain values) is crucial for making it possible to generate cuts during a branch-and-bound process that are valid throughout the search tree, its significance goes well beyond this aspect. Indeed, most columns of A corresponding to nonbasic components of x̄ typically play no role in determining the optimal solution of (CGLP)_j and could therefore be ignored. In other words, the cuts can be generated in a subspace involving only a subset of the variables, and then lifted to the full space. This is the procedure followed in [6, 7], where the subspace used is that of the variables indexed by some R ⊂ N such that R includes all the 0-1 variables that are fractional and all the continuous variables that are positive at the LP optimum. The lifting coefficients for the variables not in the subspace, which are all assumed to be at their lower bound, are then given by

$$\alpha_k := \max\{ua^k, va^k\}, \qquad k \in N \setminus R,$$

where u and v are the optimal vectors obtained by solving (CGLP)_j. These coefficients always yield a valid lifted inequality. If normalization (i) is used in (CGLP)_j, the resulting lifted cut is exactly the same as the one that would have been obtained by applying (CGLP)_j to the problem in the full space. If other normalizations are used, the resulting cut may differ in some coefficients and may therefore not be optimal for the full space (CGLP)_j.
Cut strengthening

The cut αx ≥ β derived from a disjunction of the form x_j ∈ {0, 1} can be strengthened by using the integrality conditions on variables other than x_j, as shown in [9] (see also §7 of [2]). Indeed, if x_k is such a variable, the coefficient α_k := max{ua^k, va^k} can be replaced by

$$\alpha_k' := \min\{ua^k + u_0\lceil m_k\rceil,\ va^k - v_0\lfloor m_k\rfloor\}, \quad\text{where}\quad m_k := \frac{va^k - ua^k}{u_0 + v_0}.$$

For a proof of this statement, see [2], [6] or [7]. The strengthening "works," i.e. produces an actual change in the coefficient, only if

$$|ua^k - va^k| \ge u_0 + v_0. \tag{1.4}$$
Indeed, if (1.4) does not hold, then either uak > vak and 0 ≥ mk > −1, or uak < vak and 0 ≤ mk < 1; in either case, αk′ = αk . Furthermore, the larger the difference |uak − vak |, the more room there is for strengthening the coefficient in question. This strengthening procedure can also be applied to cuts derived from disjunctions other than xj ∈ {0, 1}, including disjunctions with more than two terms. In the latter case, however, the closed form expression for the value mk used above has to be replaced by a procedure for calculating those values, whose complexity is linear in the number of terms in the disjunction (see [9] or [2] for details).
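As a small illustration, a direct transcription of the strengthening formula might look as follows; the function name and argument conventions are hypothetical (integer_cols is assumed to list the indices of the 0-1 variables, and A, u, u0, v, v0 come from a solved (CGLP)_j as above).

```python
import numpy as np

def strengthen_cut(alpha, u, u0, v, v0, A, integer_cols, j):
    """Strengthen the cut coefficients of the 0-1 variables other than x_j:
    alpha_k -> min{u.a_k + u0*ceil(m_k), v.a_k - v0*floor(m_k)},
    with m_k = (v.a_k - u.a_k) / (u0 + v0)."""
    alpha = alpha.copy()
    for k in integer_cols:
        if k == j:
            continue                        # the disjunction variable itself is unchanged
        ua, va = u @ A[:, k], v @ A[:, k]
        m_k = (va - ua) / (u0 + v0)
        alpha[k] = min(ua + u0 * np.ceil(m_k), va - v0 * np.floor(m_k))
    return alpha
```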
The overall cut generating procedure

Considering what we said about solving the cut generating LP in a subspace and then lifting the resulting cut to the full space and strengthening it, the actual cut generating procedure is not just "lift and project," but rather RLPLS, an acronym for

• RESTRICT the problem to a subspace defined from the LP optimum, and choose a disjunction;
• LIFT the disjunctive set to describe its convex hull in a higher dimensional space;
• PROJECT the polyhedron describing the convex hull onto the original (restricted) space, generating cuts;
• LIFT the cuts into the original full space;
• STRENGTHEN the lifted cuts.
1.4 Variations on the Cut Generating LP
The solution of the cut generating LP depends on two factors: the objective function and the normalization used. The choice of the former is dictated by the fact that the immediate goal is to cut off the LP optimum by as much as possible. The normalization is a different story.
Alternative normalizations

It was shown in [4] that if normalization (i) is used, then (CGLP) has a finite minimum if and only if x̄λ ∈ P_D for some λ ∈ R_+. This condition is satisfied for certain classes of problems, for instance set covering (for β = 1) and set packing (for β = −1), but not for others, and the absence of a finite minimum leads to complications. In case of normalizations (ii) or (iii), a different difficulty arises. If P_D is full-dimensional, then the inequality αx ≥ β defines a facet of P_D if and only if (α, β) is an extreme ray of the reverse polar cone $P_D^*$. If $P_D^*$ is truncated or intersected with a single hyperplane in (α, β)-space, then the extreme points of the resulting polyhedron correspond to extreme rays of $P_D^*$. But if $P_D^*$ is truncated or intersected by multiple hyperplanes, that will typically result in a polyhedron whose extreme points do not always correspond to extreme rays of $P_D^*$. This is exactly what happens in the case of normalizations (ii) and (iii). In the case of (ii), the constraint $\sum(|\alpha_j| : j \in N) \le 1$, which requires α to belong to an n-dimensional octahedron, is equivalent to imposing on α the 2^n inequalities $\delta^i\alpha \le 1$ for all $\delta^i \in \{1, -1\}^n$, which define the facets of the octahedron. In the case of (iii), the constraint $\sum_i u_i + u_0 + \sum_i v_i + v_0 = 1$ guarantees that (CGLP)_j will have a finite minimum for every nonnegative objective function. Since the multipliers u_i, v_i, i = 0, . . . , m, are all required to be nonnegative, the normalization (iii) bounds each multiplier; and since α is bounded from below by a linear combination of those multipliers, it follows that the objective function of (CGLP)_j is bounded from below for any nonnegative x̄. If (iii) is replaced by

$$\text{(iii}'\text{)}\quad \sum_i u_i + u_0 + \sum_i v_i + v_0 + \sum_k s_k + \sum_k t_k = 1,$$

where s_k ≥ 0, t_k ≥ 0, k = 1, . . . , n, are surplus variables used to bring (CGLP)_j to equality form, then (iii′) bounds $P_D^*$ in every direction. This follows because the surplus variables are now also required to be nonnegative, and thus α is also bounded from above by a linear combination of the multipliers. Although the higher dimensional cone is truncated by a single hyperplane through either (iii) or (iii′), the outcome in the (α, β)-subspace may correspond to a truncation of $P_D^*$ by multiple hyperplanes, and thus an extreme point of the resulting polyhedron may not correspond to an extreme ray of $P_D^*$. To avoid this difficulty, we propose another normalization, whose generic form is

(iv) αy = 1.

Let (CGLP)_y denote
the problem with this normalization. The advantage of normalization (iv) is that it intersects $P_D^*$ with a single hyperplane in the (α, β)-space and thus has the effect that every extreme point of the resulting polyhedron corresponds to an extreme ray of $P_D^*$. This does not imply that every extreme point of the higher-dimensional (CGLP)_y corresponds to an extreme ray of $P_D^*$; but it does imply that if the objective min(αx̄ − β) is bounded then there exists an optimal extreme point of (CGLP)_y which corresponds to an extreme ray of $P_D^*$.

Theorem 1.5. Let (CGLP)_y be feasible. Then it has a finite minimum if and only if x̄ + yλ ∈ P_D for some λ ∈ R.

Proof. If (CGLP)_y is unbounded in the direction of minimization, then there exists $(\tilde\alpha, \tilde\beta) \in P_Q^*$ with $\bar{x}^T\tilde\alpha < \tilde\beta$ and $y^T\tilde\alpha = 0$. But then $(\bar{x} + y\lambda)^T\tilde\alpha < \tilde\beta$ for all λ ∈ R, hence x̄ + yλ ∉ P_D.

Conversely, if x̄ + yλ ∉ P_D for all λ ∈ R, there exists $(\hat\alpha, \hat\beta) \in \mathbb{R}^{n+1}$ such that $\hat\alpha x \ge \hat\beta$ for all x ∈ P_D and $\hat\alpha(\bar{x} + y\lambda) < \hat\beta$ for all λ ∈ R, i.e. $\hat\alpha y = 0$ and $\hat\alpha\bar{x} < \hat\beta$. But then $(\hat\alpha, \hat\beta)$ is a direction of unboundedness for (CGLP)_y.

Theorem 1.6. If (CGLP)_y has an optimal solution $(\tilde\alpha, \tilde\beta)$, then

$$\tilde\beta - \bar{x}^T\tilde\alpha = \lambda^* := \min\{\lambda : \bar{x} + y\lambda \in P_D\}, \quad\text{and}\quad (\bar{x} + y\lambda^*)^T\tilde\alpha = \tilde\beta.$$

Proof. Let $(\tilde\alpha, \tilde\beta)$ be an optimal solution to (CGLP)_y, and define λ* := min{λ : x̄ + yλ ∈ P_D}. Since $\tilde\alpha y \ne 0$, there exists $\lambda^0 \in \mathbb{R}$ such that $(\bar{x} + y\lambda^0)^T\tilde\alpha = \tilde\beta$. Further, $\tilde\beta - \bar{x}^T\tilde\alpha = y^T\tilde\alpha\,\lambda^0 = \lambda^0$. We claim that $\lambda^0 = \lambda^*$. For suppose $\lambda^0 > \lambda^*$. Then

$$(\bar{x} + y\lambda^*)^T\tilde\alpha - \tilde\beta = \bar{x}^T\tilde\alpha - \tilde\beta + y^T\tilde\alpha\,\lambda^* = -\lambda^0 + \lambda^* \ (\text{since } y^T\tilde\alpha = 1) < 0,$$

i.e. the point x̄ + yλ* violates the inequality $\tilde\alpha x \ge \tilde\beta$, contradicting x̄ + yλ* ∈ P_D. Now suppose $\lambda^0 < \lambda^*$. Then there exists a hyperplane $\bar\alpha x = \bar\beta$ such that $\bar\alpha x \ge \bar\beta$ for all x ∈ P_Q, $\bar\alpha(\bar{x} + y\lambda^*) = \bar\beta$, and $\bar\alpha(\bar{x} + y\lambda^0) < \bar\beta$. Further, since $\bar\alpha y\lambda^0 < \bar\alpha y\lambda^*$ and $\lambda^0 < \lambda^*$, it follows that $\bar\alpha y > 0$ and so w.l.o.g. we may assume that $(\bar\alpha, \bar\beta)$ is scaled so as to make $\bar\alpha y = 1$. But then

$$\bar\alpha\bar{x} - \bar\beta < -\bar\alpha y\lambda^0 = -\lambda^0 = \tilde\alpha\bar{x} - \tilde\beta,$$

contradicting the optimality of $(\tilde\alpha, \tilde\beta)$ for (CGLP)_y.

Corollary 1.7. Let y := x* − x̄ for some x* ∈ P_Q. Then (CGLP)_y has an optimal solution $(\tilde\alpha, \tilde\beta)$ such that (i) $\tilde\alpha\bar{x} < \tilde\beta$; and (ii) $\tilde\alpha x = \tilde\beta$ is the supporting hyperplane of P_Q that intersects the line segment (x̄, x*] at the point closest to x*.

Theorem 1.6 and Corollary 1.7 are illustrated in Figure 1.1.
[Figure 1.1: y = x* − x̄. The figure shows P_Q in the (x_1, x_2)-plane, the point x̄, the direction y toward x* ∈ P_Q, and the supporting hyperplane $\tilde\alpha(\bar{x} + y\lambda^*) = \tilde\beta$ at the point x̄ + yλ* where the segment enters P_Q.]
Complementarity of solution components

Consider the linear program (CGLP)_j used to generate a cut from the disjunction x_j = 0 or x_j = 1. (CGLP)_j has a set of trivial solutions corresponding to the constraint set Ax ≥ b. Assuming that normalization (iii) is used, set $\bar{u}_i = \bar{v}_i = \tfrac{1}{2}$ for some i ∈ M, $\bar{u}_h = \bar{v}_h = 0$ for all h ∈ M \ {i} and $\bar{u}_0 = \bar{v}_0 = 0$. Then $(\bar\alpha, \bar\beta, \bar{u}, \bar{u}_0, \bar{v}, \bar{v}_0)$ is a solution to (CGLP)_j with α = a_i and β = b_i, i.e., the coefficient vector of the i-th constraint of Ax ≥ b. We call a basic solution nontrivial if it is not of this type.

Theorem 1.8. Any nontrivial basic solution w := (α, β, u, u_0, v, v_0) to (CGLP)_j satisfies u · v = 0.

Proof. We assume that (CGLP)_j uses normalization (iii). An analogous reasoning proves the other cases. Let A have rows a_h, h ∈ M, let w̄ be a basic solution, and suppose $\bar{u}_i\bar{v}_i > 0$ for some i ∈ M. W.l.o.g., assume $0 < \bar{u}_i \le \bar{v}_i$. Define

$$\hat{u}_h := \begin{cases} 0 & h = i\\ \rho\bar{u}_h & h \in M\setminus\{i\}\end{cases}, \qquad \hat{v}_h := \begin{cases} \rho(\bar{v}_h - \bar{u}_h) & h = i\\ \rho\bar{v}_h & h \in M\setminus\{i\}\end{cases},$$
$$\hat{u}_0 := \rho\bar{u}_0, \quad \hat{v}_0 := \rho\bar{v}_0, \quad \hat\alpha := \rho(\bar\alpha - \bar{u}_i a_i), \quad \hat\beta := \rho(\bar\beta - \bar{u}_i b_i),$$

with $\rho := 1/(1 - 2\bar{u}_i)$, and

$$\tilde{u}_h := \begin{cases} 2\sigma\bar{u}_h & h = i\\ \sigma\bar{u}_h & h \in M\setminus\{i\}\end{cases}, \qquad \tilde{v}_h := \begin{cases} \sigma(\bar{v}_h + \bar{u}_h) & h = i\\ \sigma\bar{v}_h & h \in M\setminus\{i\}\end{cases},$$
$$\tilde{u}_0 := \sigma\bar{u}_0, \quad \tilde{v}_0 := \sigma\bar{v}_0, \quad \tilde\alpha := \sigma(\bar\alpha + \bar{u}_i a_i), \quad \tilde\beta := \sigma(\bar\beta + \bar{u}_i b_i),$$

with $\sigma := 1/(1 + 2\bar{u}_i)$. Then ŵ and w̃ are both feasible. But since w̄ is nontrivial, ŵ ≠ 0 ≠ w̃, and it is easily verified that $\tfrac{1}{2\rho}\hat{w} + \tfrac{1}{2\sigma}\tilde{w} = \bar{w}$, which contradicts the assumption that w̄ is basic.
The complementarity property shown in Theorem 1.8 means that while the two variables u0 , v0 associated with the inequalities xj ≤ 0 and xj ≥ 1 may both be (and typically are) positive at the optimum, the pair (ui , vi ) associated with the i-th inequality of Ax ≥ b is complementary for every i: at most one of the two variables can be positive. This is a consequence of the intuitively plausible fact, that a given inequality of Ax ≥ b can be profitably added with a positive multiplier either to one term of the disjunction, or to the other, but not to both. In fact, in addition to the complementarity of the pairs (ui, vi ), typically both members of many pairs are 0. In other words, some of the inequalities of Ax ≥ b do not contribute to the improvement of the cut, whichever term of the disjunction they are added to. This has led us to a search for criteria by which to decide for each inequality of Ax ≥ b, whether it should be added to the first term of the disjunction with multiplier ui , or to the second term with the multiplier vi , or not included at all in the cut generating LP.
Reduced-size (CGLP)

After some experimentation with several different criteria, we concluded that the best indicator of the usefulness of the presence of an inequality of Ax ≥ b, x ≥ 0, in one term or the other of the disjunction is to be found in the optimal simplex tableau of the linear program min{cx : x ∈ P}. Namely, suppose we want to build the cut generating LP for the disjunction x_k ∈ {0, 1}, where x̄_k is a fractional component of the LP optimum x̄. Let the row of the optimal simplex tableau associated with x_k be

$$x_k = \bar{a}_{k0} - \sum_{j\in J}\bar{a}_{kj}x_j, \tag{1.5}$$

where J is the index set of nonbasic variables and 0 < ā_k0 = x̄_k < 1, and the nonbasic variables are all at their lower bound. Next we restrict the system Ax ≥ b, x ≥ 0 to the subspace obtained by removing all nonbasic structural variables and all constraints that are not binding at the LP optimum (hence all basic surplus variables). (Here structural variables are the components of x, whereas surplus variables stand for the components of s = Ax − b.) We are then left with only the basic structural variables and the nonbasic surplus variables, and can write the resulting system as

$$Bx_B - s_M = b_M, \qquad x_B, s_M \ge 0.$$

Note that B has |M| columns and |M| rows and is nonsingular. Multiplying with $B^{-1}$ yields $x_B = B^{-1}b_M + B^{-1}s_M$, a system whose row corresponding to x_k is

$$x_k = B_k^{-1}b_M + B_k^{-1}s_M,$$

where $B_k^{-1}$ is row k of $B^{-1}$. This is just another way of writing the equation that remains after we remove from (1.5) the nonbasic structural variables, and replace the notation x_j with s_j for the surplus variables:

$$x_k = \bar{a}_{k0} + \sum_{i\in M}(-\bar{a}_{ki})s_i,$$

where $\bar{a}_{k0} = \bar{x}_k = B_k^{-1}b_M$, and for i ∈ M, $\bar{a}_{ki} = -B_{ki}^{-1}$, with $B_{ki}^{-1}$ the i-th component of $B_k^{-1}$. The simplest disjunctive cut derived from the condition x_k ≤ 0 ∨ x_k ≥ 1, namely the intersection cut from the pair of halfspaces 0 ≤ x_k ≤ 1, is known to be (see [2]) $\pi s_M \ge \pi_0$, where

$$\pi_0 = \bar{x}_k(1 - \bar{x}_k) \quad\text{and for } i \in M,\quad \pi_i := \max\{\pi_i^1, \pi_i^2\}, \quad\text{with}\quad \pi_i^1 := (\bar{x}_k - 1)B_{ki}^{-1}, \qquad \pi_i^2 := \bar{x}_k B_{ki}^{-1}.$$
We wish to construct a basic solution of (CGLP)_k whose (α, β)-component yields the cut αx ≥ β obtained by expressing $\pi s_M \ge \pi_0$ in terms of x. For this purpose, we write α = (α_B, α_R), where α_B stands for the components associated with the columns of B, and α_R for the components that have been removed. We now define

$$\alpha_B := \pi B, \quad \beta := \pi_0 + \pi b_M, \quad u := \pi - \pi^1, \quad v := \pi - \pi^2, \quad u_0 := 1 - \bar{x}_k, \quad v_0 := \bar{x}_k. \tag{1.6}$$
Theorem 1.9. The vector w := (α_B, β, u, u_0, v, v_0) defined by (1.6) is a basic feasible solution of (CGLP)_k with the normalization u_0 + v_0 = 1.

Proof. Since α_R has been removed, the expression α − uA + u_0e_k reduces to α_B − uB + u_0e_k, with e_k the unit vector in the subspace of α_B. We then have

$$\alpha_B - uB + u_0e_k = \pi B - (\pi - \pi^1)B + (1 - \bar{x}_k)e_k = \pi^1 B + (1 - \bar{x}_k)e_k = (\bar{x}_k - 1)B_k^{-1}B + (1 - \bar{x}_k)e_k = 0,$$

since $B_k^{-1}B = e_k$. Next,

$$\alpha_B - vB - v_0e_k = \pi B - (\pi - \pi^2)B - \bar{x}_ke_k = \pi^2 B - \bar{x}_ke_k = \bar{x}_kB_k^{-1}B - \bar{x}_ke_k = 0.$$

Further,

$$-\beta + ub_M = -\pi_0 - \pi b_M + (\pi - \pi^1)b_M = -\bar{x}_k(1 - \bar{x}_k) - \pi^1 b_M = -\bar{x}_k(1 - \bar{x}_k) + (1 - \bar{x}_k)B_k^{-1}b_M = 0,$$

since $B_k^{-1}b_M = \bar{x}_k$. Also,

$$-\beta + vb_M + v_0 = -\pi_0 - \pi b_M + (\pi - \pi^2)b_M + \bar{x}_k = -\bar{x}_k(1 - \bar{x}_k) - \pi^2 b_M + \bar{x}_k = -\bar{x}_k(1 - \bar{x}_k) - \bar{x}_kB_k^{-1}b_M + \bar{x}_k = 0.$$

Finally, $u_0 + v_0 = (1 - \bar{x}_k) + \bar{x}_k = 1$. Furthermore, (1.6) implies that u, v ≥ 0. This proves that w is feasible. To see that it is basic, note that there are 2|M| + 2 constraints satisfied at equality, and the same number of nonnegative variables, whose coefficient vectors are linearly independent.

Using the basic solution (1.6), we construct the associated simplex tableau of (CGLP)_k, and among the nonbasic variables u_i, v_i we keep only those with negative reduced cost, while removing the others. Our interpretation that these are the only variables likely to improve the cut (in terms of the chosen objective) is more than borne out by our computational experience: as shown in the computational section of this chapter, the cuts obtained from this smaller (CGLP) tend to be just as strong as those obtained from the full-fledged problem. In Chapter 2, Theorem 2.9, we show how these reduced costs can be computed from the (LP) simplex tableau without ever creating (CGLP)_k.

Our approach for constructing a starting solution for (CGLP)_k highlights the connection between this lift-and-project cut and the mixed integer Gomory cut, which in this case (since the nonbasic 0-1 variables have been removed) is identical to the intersection cut from the pair of half-spaces 0 ≤ x_k ≤ 1. Thus the lift-and-project cut can be viewed as a generalization of the mixed integer Gomory cut $\pi s_M \ge \pi_0$, where the generalization consists in optimally combining each of the inequalities $\pi^1 s_M \ge \pi_0$ and $\pi^2 s_M \ge \pi_0$ with some of the constraints of P before taking the component-wise maximum of π^1 and π^2. This relationship with Gomory's mixed integer cuts hinted at here will be fully explored in Chapter 2 and forms the basis of a new approach towards creating lift-and-project cuts.
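The construction (1.6) is easy to state procedurally. The sketch below computes the intersection cut π from row k of B^{-1} and assembles the starting basic solution; it is only illustrative (in an actual implementation row k of B^{-1} would come from the LP factorization rather than from an explicit inverse), and the function name and argument conventions are our own.

```python
import numpy as np

def cglp_starting_solution(B, b_M, k):
    """Sketch of the starting basic solution (1.6) for the reduced (CGLP)_k.
    B is the nonsingular |M| x |M| submatrix of binding constraints, b_M the
    corresponding right-hand side, and k the position of x_k among the basic
    structural variables (so row k of B^{-1} is the tableau row of x_k)."""
    B_inv_k = np.linalg.inv(B)[k]            # row k of B^{-1}
    x_k = B_inv_k @ b_M                      # = \bar{a}_{k0}, assumed in (0, 1)

    pi1 = (x_k - 1.0) * B_inv_k              # cut vector from the term x_k <= 0
    pi2 = x_k * B_inv_k                      # cut vector from the term x_k >= 1
    pi = np.maximum(pi1, pi2)                # intersection cut coefficients
    pi0 = x_k * (1.0 - x_k)

    alpha_B = pi @ B
    beta = pi0 + pi @ b_M
    u, v = pi - pi1, pi - pi2                # complementary: u_i * v_i = 0
    u0, v0 = 1.0 - x_k, x_k                  # normalization u0 + v0 = 1
    return alpha_B, beta, u, u0, v, v0
```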
Multiple cuts from a disjunction

There are several ways of deriving more than one cut from a given disjunction. The approach proposed in [5] was to generate several facets of P_Q containing its optimal extreme point x^opt. This approach asks for the calculation of x^opt (recall, we are talking about a disjunction with two terms, not too expensive to solve), to be used to generate n facets of P_Q containing x^opt. The way to accomplish this is to replace the objective function of (CGLP)_j by min (x^opt)^T α − β, which results in a linear program whose optimal solutions (α, β, u, u_0, v, v_0) yield all the valid inequalities αx ≥ β (including those that define facets of P_Q) satisfied at equality by x^opt (Theorem 1.1 of [5]). Thus, having obtained one such optimal solution, one may generate all the others by pivoting in columns with zero reduced cost. In theory this is a way of generating all the facets of P_Q that contain x^opt. In practice, the massive degeneracy that is typically present in the optimal tableau of this problem makes the procedure of finding alternative optima with the relevant (α, β)-components computationally rather expensive. An alternative way of generating multiple cuts from the same disjunction is to explore near-optimal solutions to (CGLP)_j by forcing to 0 some component of (u, v) positive in the optimal tableau. This has been explored by Ceria and Pataki [19]; we have also tried it, with results slightly better than those obtained with the previous approach. Finally, a third way, which we found considerably more useful than either of these two, is the following. Having found an optimal solution to (CGLP)_j, we go back to the optimal simplex tableau of min{cx : x ∈ P}, and generate all adjacent solutions to x̄ obtainable by a single pivot: let these solutions be x^1, . . . , x^k. We then use each one of them in turn to replace x̄ in the objective function of (CGLP)_j. This yields reasonably good results, to be discussed in the computational section. The ideas here are all rooted in the cut generating linear program. In Chapter 4 we propose a radically different idea for generating multiple cuts from a single disjunction. Instead of solving a set of CGLPs, it requires solving each term of the disjunction and exploits these solutions to create a set of cuts.
1.5 Computational Experience
Results with a reduced-size CGLP

We have extensively tested the reduced-size (CGLP) constructed from the optimal simplex tableau for min{cx : x ∈ P} by using the complementarity of (u, v), as described in section 1.4. In the experiment summarized in Table 1.1, we chose 13 of the harder MIPLIB problems [15], and for each instance we solved, for each 0-1 variable fractional at the LP optimum, a (CGLP) (with normalization (iii)) not restricted to a subspace, in two versions: the full size (CGLP) and the reduced size (CGLP). The results are shown in Table 1.1. Every problem was preprocessed by CPLEX before generating cuts.

            Optimum of the    Lower bound after     Avg. number of        Avg. number of
            preprocessed LP   adding cuts           columns in CGLP       pivots in CGLP
Problem                       Full      Reduced     Full      Reduced     Full      Reduced
p0548       3126              5713      5709          820       25.62      305.6      0.18
p2756       2703              2880      2880         1984        5.39      313.4      0.08
pp08a       2748              2103      2103          875       19.26      490.8      5.33
vpm2        10.27             11.05     11.02         873       37.18      374.1      4.50
10-teams    897.0             904.0     904.0        1528      709.9       584.2    258.7
danoint     62.64             62.65     62.66        1989     1230.0       637.4    619.3
misc07      1415              1415      1415          763      154.0       203.9     46.2
pk1         0                 0         0             198       21.67       48.0     17.0
seymour     267.8             271.1     270.8       13022     1936.7      2338.0    158.4
vpm1        16.43             17.01     17.01         893       54.64      388.0      5.93
mod010      6532              6533      6543         1315      258.4       542.6    121.6
l152lav     4656              4659      4659          874      186.4       366.7     92.8
set1ch      30427             35174     35174        2768       45.0      1515.8      6.43

Table 1.1: Computational results with the full size and the reduced (CGLP).

It is clear from the table that the reduced (CGLP), while generating cuts whose strength – as measured by the lower bounds they provide – is fully equal to that of the cuts generated by the full (CGLP), requires a computational effort that is several times smaller.

In Table 1.2 we report on the total running time when solving each of the 29 problems used in the paper [7]. The column labelled "MIPO" contains the results of using the cut generator by Ceria and Pataki [19], whereas all the other columns use our cut generator. Here "Full" refers to the full formulation of the cut LP, "Tight" is the formulation where we only include constraints that are tight at the LP relaxation optimum, and "Bounds" includes the constraints of "Tight" plus all the lower and upper bounds (as in the MIPO cut generator). The column "Reduced" contains the results of using reduced cost computations to reduce the size of the (CGLP). In all our tests we use an advanced basis as given by Theorems 1.9 and 1.6. In the last row of Table 1.2 we give the average running time relative to the MIPO cut generator, where a run that was not completed within one hour is counted as 3600 seconds towards this average.
Problem         MIPO      Full      Tight     Bounds    Reduced
air04           *         3203.80   3295.89   *         *
air05           2837.53   *         *         3136.02   *
bm23            3.86      1.50      1.48      1.48      1.58
c-fat200-1      200.63    108.00    204.59    130.86    88.68
egout           0.66      0.27      0.18      0.23      0.20
fxch.3          15.23     12.75     18.41     28.35     12.34
genova6xs       107.98    87.75     40.02     137.14    102.14
l152lav         366.23    331.91    375.30    330.23    170.31
lseu            8.05      9.44      9.70      15.23     13.04
misc05          19.80     13.76     14.58     15.76     13.17
misc07          *         *         3527.23   *         3191.89
mod008          21.53     19.05     42.26     19.47     18.80
mod010          5.25      34.32     11.35     24.99     40.06
modglob         197.20    974.41    1483.29   674.47    276.33
p0033           1.38      0.58      0.84      0.69      0.79
p0201           10.19     13.33     12.09     8.70      16.08
p0282           10.76     12.94     20.20     17.97     11.99
p0291           0.72      0.57      0.32      0.35      0.34
p0548           7.47      5.39      4.28      4.18      4.53
p2756           45.76     22.14     26.14     21.11     23.11
rgn             15.64     11.94     13.31     11.90     3.45
san200-0.9-3    319.22    928.75    3435.39   224.55    1495.94
scpc2s          154.52    149.89    122.40    125.22    141.93
set1al          11.40     6.97      6.83      6.96      6.71
stein45         2292.55   2089.00   1572.15   1768.88   2193.24
tsp43           106.80    104.19    98.17     199.51    310.80
utrans.2        10.34     4.47      12.53     5.74      4.65
utrans.3        26.37     18.15     23.23     19.16     13.47
vpm1            532.58    22.47     24.64     7.67      14.59
Avg. to MIPO    1.00      0.82      0.90      0.79      0.76

Table 1.2: Total running time. Problems marked with * were not solved within one hour.

From looking at Table 1.2 it is clear that there is quite a spread in the solution times, also between the "MIPO" column and the "Bounds" column, even though the reduced (CGLP) should be identical in these two cases. We can only assume that the difference is due to implementation details. But overall we observe a saving in using
the reduced cut LPs with a starting basis. When reducing the (CGLP) based on reduced cost computations, we save about 25% of the work compared against MIPO. Because we use strong branching (the default setting), the time actually spent generating cuts accounts on almost all problems for less than 50% of the total running time, and on several problems for less than 25%.
Results with multiple cuts from a disjunction

In a computational experiment meant to test the efficiency of generating multiple cuts from the same disjunction, we ran two versions of MIPO on a set of MIPLIB problems. Both versions used the same formula for calculating at the root node the cutting frequency. The first version used the standard approach of generating one cut from each disjunction, whereas the second version generated q + 1 cuts from each disjunction, solving each (CGLP)_j for the objective functions min{αx* − β}, with x* = x̄, x^1, . . . , x^q, where x̄ is the optimal solution to min{cx : x ∈ P}, and x^1, . . . , x^q are extreme points of P adjacent to x̄, obtainable by one pivot in the simplex tableau associated with x̄ (q was limited to at most 0.5 times the number of basic variables). The outcome is shown in Table 1.3 for the problems that required at least 250 seconds to solve.

                  Time (CPU sec)          Search tree nodes
Problem           version 1  version 2    version 1  version 2
c-fat-200-1         413        113           49         31
pp08CUTS            358        299         2891       1743
san200-0.9-3       2358       1751          495        609
stein45            1335       1789        20761      24711
vpm1                295        190         8811       4153
vpm2                432        399         8901       2877
10teams            1490          *          259          *
* Time or memory limit exceeded.

Table 1.3: Results with one cut versus multiple cuts per disjunction.

As can be seen, the extra work spent on generating more cuts pays off more often than not: the computing time is smaller in version 2 in five out of the seven instances.
Chapter 2 A Precise Correspondence
Parts of the work in this chapter have been published as A Precise Correspondence Between Lift-and-Project Cuts, Simple Disjunctive Cuts, and Mixed Integer Gomory Cuts for 0-1 Programming by E. Balas and M. Perregaard in Mathematical Programming, Ser. B 94 (2003), 221-245.
2.1 Introduction
Cutting planes for integer programs, pure or mixed, 0-1 or general, have a 40-some-year history. Gomory's mixed integer cuts were proposed in the early sixties [27]. In the late sixties, intersection cuts made their appearance [1]. Soon they developed into disjunctive cuts, of which a variety were proposed in the seventies. A subclass of the latter were revived in the early nineties under the name of lift-and-project cuts [6], and were implemented in branch and cut algorithms [7], a framework which proved to be particularly fruitful. Gomory's mixed integer cuts are also closely related to disjunctive cuts. Specifically, it has been shown (in e.g. [32]) that they are a special case of disjunctive cuts from a two-sided disjunction, also called split cuts [23]. Here we give a precise characterization of the connection between lift-and-project cuts and the earlier cuts in this literature. This correspondence has theoretical and practical consequences. On the theoretical side, it provides new bounds on the number of essential cuts in the elementary closure, and on the rank of the standard relaxation of a mixed 0-1 polyhedron with respect to various families of cuts. On the practical side, it makes it possible to solve the cut generating linear program of the lift-and-project procedure on the simplex tableau of the standard LP relaxation, without explicit recourse to the expanded formulation. The algorithm that does this can also be interpreted as a procedure for systematically improving a mixed integer Gomory cut from the optimal simplex tableau through a sequence of pivots that combine the terms of the disjunction applied to the cut row with other rows of the same tableau in a specific way.

Next we outline the structure of this chapter. Section 2.2 describes simple disjunctive cuts and mixed integer Gomory cuts, while section 2.3 does the same thing for lift-and-project cuts. Section 2.4 establishes the precise correspondence between lift-and-project cuts and simple disjunctive cuts, while section 2.5 does the same for the strengthened version of these cuts, which includes the mixed integer Gomory cuts. Section 2.6 derives some bounds on the number of undominated disjunctive cuts, while section 2.7 gives bounds on the rank of the linear programming polyhedron with respect to various families of cuts. Section 2.8 describes the algorithm for solving the cut generating linear program implicitly, through appropriate pivots in the simplex tableau of the LP relaxation. Section 2.9 interprets the algorithm of section 2.8 as a method for improving mixed integer Gomory cuts in a systematic fashion. Finally, section 2.10 presents computational experience with this approach.
2.1.1 Notation
We consider the mixed integer 0-1 program in the form

$$\min\ cx \quad \text{s.t.}\quad Ax \ge b,\ \ x \ge 0,\ \ x_j \in \{0,1\},\ j = 1,\ldots,p \tag{MIP}$$

where A is (m + p) × n, c and x are n-vectors, and b is an (m + p)-vector for some p, 0 ≤ p ≤ n. We will assume that the system Ax ≥ b subsumes the inequalities x_j ≤ 1, j = 1, . . . , p, i.e. the last p inequalities of Ax ≥ b are −x_j ≥ −1, j = 1, . . . , p. The linear programming relaxation of (MIP) is

$$\min\{cx : x \in P\}, \tag{LP}$$

where P := {x ∈ R^n_+ : Ax ≥ b}.

We will sometimes denote the constraint set defining P by Ãx ≥ b̃, where $\tilde{A} := \binom{A}{I}$ and $\tilde{b} := \binom{b}{0}$ have m + p + n rows. The vector x̄ will denote an optimal solution to (LP). To simplify notation we will often write x_J to mean the subvector of the components of x indexed by the index set J.
2.2 Simple Disjunctive Cuts and Mixed Integer Gomory Cuts
Consider the simplex tableau associated with the optimal solution x̄ to (LP), and let the row associated with basic variable x_k be

$$x_k = \bar{a}_{k0} - \sum_{j\in J}\bar{a}_{kj}s_j \tag{2.1}$$

where J is the index set of nonbasic variables, and 0 < ā_k0 < 1. The intersection cut [1] from the convex set {x ∈ R^n : 0 ≤ x_k ≤ 1}, also known as the simple disjunctive cut from the condition x_k ≤ 0 ∨ x_k ≥ 1 applied to (2.1), is $\pi s_J \ge \pi_0$, where

$$\pi_0 := \bar{a}_{k0}(1 - \bar{a}_{k0}) \quad\text{and}\quad \pi_j := \max\{\bar{a}_{kj}(1 - \bar{a}_{k0}),\ -\bar{a}_{kj}\bar{a}_{k0}\},\quad j \in J. \tag{2.2}$$

Note that the cut $\pi s_J \ge \pi_0$ derived from x_k ≤ 0 ∨ x_k ≥ 1 depends on the nonbasic set J in terms of which x_k is expressed. Different sets J give rise to different cuts derived from the same disjunction x_k ≤ 0 ∨ x_k ≥ 1.

When p ≥ 1, the simple disjunctive cut $\pi s_J \ge \pi_0$ can be strengthened [2, 9] by replacing π with π̄, defined as

$$\bar{\pi}_j := \begin{cases} \min\{f_{kj}(1 - \bar{a}_{k0}),\ (1 - f_{kj})\bar{a}_{k0}\} & j \in J \cap \{1, \ldots, p\}\\ \pi_j & j \in J \setminus \{1, \ldots, p\} \end{cases} \tag{2.3}$$

with $f_{kj} := \bar{a}_{kj} - \lfloor\bar{a}_{kj}\rfloor$. The strengthened simple disjunctive cut $\bar{\pi}s_J \ge \pi_0$ is the same as the mixed integer Gomory cut [27]. The mixed integer Gomory cut, when applied to a pure 0-1 program, dominates the fractional Gomory cut for pure integer programs [26].
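In computational terms, (2.2) and (2.3) amount to a single pass over the nonbasic entries of the tableau row. The following sketch is one possible transcription; the function name and the convention that column indices below p correspond to the 0-1 variables are assumptions made for the example, not part of the text.

```python
import numpy as np

def simple_disjunctive_cut(a_row, a_k0, J, p, strengthen=True):
    """Cut pi * s_J >= pi0 from x_k <= 0 or x_k >= 1 applied to the tableau row
    x_k = a_k0 - sum_{j in J} a_row[j] * s_j, following (2.2); with strengthen=True
    the 0-1 nonbasic columns (indices < p) get the coefficient (2.3), which yields
    the mixed integer Gomory cut."""
    pi0 = a_k0 * (1.0 - a_k0)
    pi = {}
    for j in J:
        a_kj = a_row[j]
        pi[j] = max(a_kj * (1.0 - a_k0), -a_kj * a_k0)              # (2.2)
        if strengthen and j < p:                                    # j indexes a 0-1 variable
            f_kj = a_kj - np.floor(a_kj)
            pi[j] = min(f_kj * (1.0 - a_k0), (1.0 - f_kj) * a_k0)   # (2.3)
    return pi, pi0
```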
2.3 Lift-and-Project Cuts
Lift-and-project cuts [6] are a special class of disjunctive cuts [2, 4], obtained from a disjunction of the form

$$\left(\begin{array}{r}\tilde{A}x \ge \tilde{b}\\ -x_k \ge 0\end{array}\right)\ \vee\ \left(\begin{array}{r}\tilde{A}x \ge \tilde{b}\\ x_k \ge 1\end{array}\right) \tag{4}_k$$

for some k ∈ {1, . . . , p} such that 0 < x̄_k < 1. A lift-and-project cut αx ≥ β from this disjunction is obtained by solving the cut generating linear program (see [6])

$$\begin{array}{rll} \min & \alpha\bar{x} - \beta & \\ \text{s.t.} & \alpha - uA + u_0 e_k \ge 0 & \\ & \alpha - vA - v_0 e_k \ge 0 & \\ & -\beta + ub = 0 & \qquad\text{(CGLP)}_k\\ & -\beta + vb + v_0 = 0 & \\ & \sum_{i=1}^{m+p} u_i + u_0 + \sum_{i=1}^{m+p} v_i + v_0 = 1 & \\ & u, u_0, v, v_0 \ge 0, & \end{array}$$

where e_k is the k-th unit vector. The last equation of (CGLP)_k is a normalization constraint, meant to truncate the polyhedral cone defined by the remaining inequalities. The objective function of (CGLP)_k is chosen so as to maximize the amount by which x̄ is cut off (see [6]). While an optimal solution to (CGLP)_k yields a deepest cut in this sense, any solution to the constraint set of (CGLP)_k yields a member of the family of lift-and-project cuts. However, since cuts corresponding to nonbasic solutions are dominated by those corresponding to basic solutions, we will only be interested in the latter.

Since the components of (α, β) are unconstrained in sign, they can be eliminated and (CGLP)_k can be solved solely in terms of the variables (u, u_0, v, v_0). Given any basic solution (u, u_0, v, v_0) to this reduced system, the (α, β) component of the
corresponding basic solution to (CGLP)_k, and hence the coefficient vector of the cut αx ≥ β, is defined by β := ub = vb + v_0, and

$$\alpha_j := \begin{cases} \max\{ua^j, va^j\} & j \ne k\\ \max\{ua^k - u_0,\ va^k + v_0\} & j = k, \end{cases} \tag{2.5}$$

where a^j denotes the j-th column of A. The lift-and-project cuts αx ≥ β defined this way are derived from the disjunction on the 0-1 variable x_k considered in the form (4)_k. The integrality conditions on x_j, j ∈ {1, . . . , p} \ {k}, can be used to strengthen these cuts [2, 4, 5], and α can be replaced by ᾱ, where

$$\bar{\alpha}_j := \begin{cases} \min\{ua^j + u_0\lceil m_j\rceil,\ va^j - v_0\lfloor m_j\rfloor\}, & j \in \{1, \ldots, p\} \setminus \{k\},\\ \alpha_j, & j \in \{k\} \cup \{p + 1, \ldots, n\}, \end{cases} \tag{2.6}$$

with

$$m_j := \frac{va^j - ua^j}{u_0 + v_0}. \tag{2.7}$$

We will call αx ≥ β defined by (2.5) an (unstrengthened) lift-and-project cut, and the cut ᾱx ≥ β defined by (2.6), (2.7) a strengthened lift-and-project cut.

Next we establish a precise connection between the lift-and-project cuts (strengthened lift-and-project cuts) on the one hand, and simple disjunctive cuts (strengthened simple disjunctive cuts or mixed integer Gomory cuts) on the other. A first step in this direction was taken in [6] where it was shown how one can obtain a feasible solution to (CGLP) from the basis inverse corresponding to an optimal basic solution to (LP).
2.4 The Correspondence Between the Unstrengthened Cuts
First, we introduce surplus variables into the inequalities of (CGLP)_k and rewrite the constraint set as

$$\begin{array}{rcl} \alpha - u\tilde{A} + u_0 e_k & = & 0\\ \alpha - v\tilde{A} - v_0 e_k & = & 0\\ -\beta + u\tilde{b} & = & 0\\ -\beta + v\tilde{b} + v_0 & = & 0\\ \sum_{i=1}^{m+p+n} u_i + u_0 + \sum_{i=1}^{m+p+n} v_i + v_0 & = & 1\\ u, u_0, v, v_0 & \ge & 0, \end{array} \tag{2.8}$$

where the vectors u, v now have among their components the surplus variables.
Lemma 2.1. In any basic solution to (2.8) that yields an inequality αx ≥ β not dominated by the constraints of (LP), both u_0 and v_0 are positive.

Proof. If u_0 = 0, then α = uÃ, β = ub̃; and if v_0 = 0, then α = vÃ, β = vb̃. In either case, αx ≥ β is a nonnegative linear combination of the inequalities of Ãx ≥ b̃.

Since the components of (α, β) are unrestricted in sign, we may assume w.l.o.g. that they are all basic. We then have

Lemma 2.2. Let $(\bar\alpha, \bar\beta, \bar{u}, \bar{u}_0, \bar{v}, \bar{v}_0)$ be a basic solution to (2.8), with $\bar{u}_0, \bar{v}_0 > 0$ and all components of $(\bar\alpha, \bar\beta)$ basic. Further, let the basic components of ū and v̄ be indexed by M_1 and M_2, respectively. Then M_1 ∩ M_2 = ∅, |M_1 ∪ M_2| = n, and the n × n submatrix Â of Ã whose rows are indexed by M_1 ∪ M_2 is nonsingular.

Proof. Removing from (2.8) the nonbasic variables and subscripting the basic components of u and v by M_1 and M_2, respectively, we get

$$\begin{array}{rcl} \alpha - u_{M_1}\tilde{A}_{M_1} + u_0 e_k & = & 0\\ \alpha - v_{M_2}\tilde{A}_{M_2} - v_0 e_k & = & 0\\ -\beta + u_{M_1}\tilde{b}_{M_1} & = & 0\\ -\beta + v_{M_2}\tilde{b}_{M_2} + v_0 & = & 0\\ u_{M_1}\mathbf{1}_{|M_1|} + u_0 + v_{M_2}\mathbf{1}_{|M_2|} + v_0 & = & 1 \end{array} \tag{2.9}$$

Eliminating the variables α, β, unrestricted in sign, we obtain the system

$$\begin{array}{rcl} (u_{M_1}, -v_{M_2})\begin{pmatrix}\tilde{A}_{M_1}\\ \tilde{A}_{M_2}\end{pmatrix} - (u_0 + v_0)e_k & = & 0\\ (u_{M_1}, -v_{M_2})\begin{pmatrix}\tilde{b}_{M_1}\\ \tilde{b}_{M_2}\end{pmatrix} - v_0 & = & 0\\ u_{M_1}\mathbf{1}_{|M_1|} + v_{M_2}\mathbf{1}_{|M_2|} + u_0 + v_0 & = & 1 \end{array} \tag{2.10}$$

of n + 2 equations, of which $(\bar{u}_{M_1}, \bar{u}_0, \bar{v}_{M_2}, \bar{v}_0)$ is the unique solution. Since the number of variables, like that of the constraints, is n + 2, it follows that |M_1| + |M_2| = n. Now suppose that

$$\hat{A} := \begin{pmatrix}\tilde{A}_{M_1}\\ \tilde{A}_{M_2}\end{pmatrix}$$

is singular. Then there exists a vector $(u^*_{M_1}, -v^*_{M_2})$ such that $(u^*_{M_1}, -v^*_{M_2})\hat{A} = 0$ and $u^*_{M_1}\mathbf{1}_{|M_1|} + v^*_{M_2}\mathbf{1}_{|M_2|} = 1$. By setting

$$v^*_0 = (u^*_{M_1}, -v^*_{M_2})\begin{pmatrix}\tilde{b}_{M_1}\\ \tilde{b}_{M_2}\end{pmatrix}, \qquad u^*_0 = -v^*_0,$$

we obtain a solution $(u^*_{M_1}, u^*_0, v^*_{M_2}, v^*_0)$ to (2.10). By assumption ū_0 > 0 and v̄_0 > 0, hence the solution $(u^*_{M_1}, u^*_0, v^*_{M_2}, v^*_0)$ differs from $(\bar{u}_{M_1}, \bar{u}_0, \bar{v}_{M_2}, \bar{v}_0)$. But this contradicts that $(\bar{u}_{M_1}, \bar{u}_0, \bar{v}_{M_2}, \bar{v}_0)$ is the unique solution to (2.10), which proves that Â is nonsingular. If M_1 ∩ M_2 ≠ ∅ then Â will be singular, hence M_1 ∩ M_2 = ∅ and |M_1 ∪ M_2| = n.
Now define J := M_1 ∪ M_2, and consider the system obtained from Ãx ≥ b̃ by replacing the n inequalities indexed by J with equalities, i.e. by setting the corresponding surplus variables to 0. Since the submatrix Â of Ã whose rows are indexed by J is nonsingular, these n equations define a basic solution, with an associated simplex tableau whose nonbasic variables are indexed by J. Recall that in the (CGLP)_k solution that served as our starting point, J := M_1 ∪ M_2 was the index set of the basic components of (u, v). Writing b̂ for the subvector of b̃ corresponding to Â, and s_J for the surplus variables indexed by J, we have

$$\hat{A}x - s_J = \hat{b} \quad\text{or}\quad x = \hat{A}^{-1}\hat{b} + \hat{A}^{-1}s_J. \tag{2.11}$$

Here some components of s_J may be surplus variables in an inequality of the form x_j ≥ 0. Such a variable is of course equal to, and therefore can be replaced by, x_j itself. If that is done, then the row of (2.11) corresponding to x_k (a basic variable, since k ∉ J) can be written as

$$x_k = \bar{a}_{k0} - \sum_{j\in J}\bar{a}_{kj}s_j \tag{2.12}$$

where $\bar{a}_{k0} = e_k\hat{A}^{-1}\hat{b}$ and $\bar{a}_{kj} = -(\hat{A}^{-1})_{kj}$. Notice that (2.12) is the same as (2.1). Note that this basic solution (and the associated simplex tableau) need not be feasible, either in the primal or in the dual sense. On the other hand, it has the following property.

Lemma 2.3. 0 < ā_k0 < 1.

Proof. From (2.10),

$$(u_{M_1}, -v_{M_2}) = (u_0 + v_0)e_k\hat{A}^{-1}, \qquad (u_0 + v_0)e_k\hat{A}^{-1}\hat{b} = v_0, \tag{2.13}$$

and since u_0 > 0, v_0 > 0,

$$0 < \bar{a}_{k0} = e_k\hat{A}^{-1}\hat{b} = \frac{v_0}{u_0 + v_0} < 1.$$

Theorem 4A. Let αx ≥ β be the lift-and-project cut associated with a basic solution (α, β, u, u_0, v, v_0) to (CGLP)_k, with u_0, v_0 > 0, all components of α, β basic, and the basic components of u and v indexed by M_1 and M_2, respectively. Let $\pi s_J \ge \pi_0$ be the simple disjunctive cut from the disjunction x_k ≤ 0 ∨ x_k ≥ 1 applied to (2.12) with J := M_1 ∪ M_2. Then $\pi s_J \ge \pi_0$ is equivalent to αx ≥ β.
Proof. From Lemma 2.2 we have that the matrix Aˆ = A˜J is nonsingular, so (2.12) is well-defined. As stated in section 2.2, the cut πsJ ≥ π0 from the disjunction xk ≤ 0 ∨ xk ≥ 1 applied to (2.12) is defined by π0 := a ¯k0 (1 − a ¯k0 )
(2.14)
where a ¯k0 = ek Aˆ−1ˆb, and πj := max{πj1 , πj2 }, j ∈ J, where πj1 := a ¯kj (1 − a ¯k0 ) = −(Aˆ−1 )kj (1 − a¯k0 ),
πj2 := −¯ akj a ¯k0 = (Aˆ−1 )kj a ¯k0 .
(2.15)
The equivalence of πsJ ≥ π0 to the cut αx ≥ β corresponding to the basic solution w = (α, β, u, u0, v, v0 ) of (CGLP)k is obtained by showing that θβ = π0 + πˆb, θvJ = π − π 2 , θv0 = a¯k0 .
ˆ θα = π A, θuJ = π − π 1 , θu0 = 1 − a¯k0 ,
(2.16)
for some θ > 0. This we do by showing that α, β, u, u0, v, v0 as defined by (2.16) satisfies (2.9). Indeed, using (2.14) and (2.15), θ(α − uJ Aˆ + u0 ek ) = = θ(−β + uJ ˆb) = = = ˆ θ(α − vJ A − v0 ek ) = = ˆ θ(−β + vJ b + v0 ) = = =
π Aˆ − (π − π 1 )Aˆ + (1 − a ¯k0 )ek = π 1 Aˆ + (1 − a¯k0 )ek ˆ −a −ek Aˆ−1 A(1 ¯k0 ) + (1 − a ¯k0 )ek = 0 1 −(π0 + πˆb) + (π − π )ˆb = −π0 − π 1ˆb ¯k0 ) −¯ ak0 (1 − a ¯k0 ) + ek Aˆ−1ˆb(1 − a −¯ ak0 (1 − a ¯k0 ) + a ¯k0 (1 − a ¯k0 ) = 0 2 ˆ ˆ π A − (π − π )A − a ¯k0 ek = π 2 Aˆ − a ¯k0 ek −1 ˆ ˆ ek A A¯ ak0 − a ¯k0 )ek = 0 ˆ −(π0 + π b) + (π − π 2 )ˆb + a ¯k0 = −π0 − π 2ˆb + a¯k0 −¯ ak0 (1 − a ¯k0 ) − ek Aˆ−1ˆb¯ ak0 + a ¯k0 −¯ ak0 (1 − a ¯k0 ) − a ¯k0 a ¯k0 + a ¯k0 = 0.
We further have that θuj = πj −
πj1
θvj = πj −
πj2
=
max{πj1 , πj2 }
=
max{πj1 , πj2 }
−
πj1
−
πj2
=
πj2 − πj1 if j ∈ M1 0 if j ∈ M2
=
0 if j ∈ M1 πj1 − πj2 if j ∈ M2
and
so u and v as defined by (2.16) is zero for every component not in M1 and M2 respectively. Finally, if we choose θ in (2.16) such that the normalization constraint uM1 1M1 + u0 + vM2 1M2 + v0 = 1 26
is satisfied, then we have that α, β, u, u0, v, v0 as defined by (2.16) satisfies the system (2.9). Since w is a basic solution and therefore is the unique solution to (2.9) then it must be as defined by (2.16). ˆ ≥ (π0 + πˆb). Substituting for x The cut θαx ≥ θβ defined by (2.16) is (π A)x using (2.11) we obtain the cut πsJ ≥ π0 , which shows the equivalence. Theorem 4A has the following converse. Theorem 4B. Let Aˆ be any n×n nonsingular submatrix of A˜ and ˆb the corresponding subvector of ˜b, such that 0 < ek Aˆ−1ˆb < 1, ˆ ˆb). Further, let πsJ ≥ π0 be the simple disjunctive and let J be the row index set of (A, cut obtained from the disjunction xk ≤ 0 ∨ xk ≥ 1 applied to the expression of xk in terms of the nonbasic variables indexed by J. Further, let (M1 , M2 ) be any partition of J such that j ∈ M1 if πj1 < πj2 (i.e. a ¯kj < 0) and j ∈ M2 if πj1 > πj2 (i.e. a ¯kj > 0), 1 2 where πj , πj are defined by (2.15). Now let αx ≥ β be the lift-and-project cut corresponding to the basic solution to (CGLP)k in which all components of α, β are basic, both u0 and v0 are positive, and the basic components of u and v are indexed by M1 and M2 , respectively. Then αx ≥ β is equivalent to πsJ ≥ π0 . Proof. First we show that the choice of basic variables in the Theorem is well-defined, i.e., that they form a basis. We proceed as in the proof of Lemma 2.2 by first eliminating all variables chosen to be nonbasic, from the system (2.8), which results ˜ A ˆ in the system (2.9). If we further eliminate α and β we obtain (2.10). Here ( A˜M1 ) = A, M2
which is nonsingular. From this it follows that (2.10) has a unique solution and hence (2.9) also has a unique solution. Therefore, the choice of basic variables forms a basis. Next we show the equivalence. Consider the solution (α, β, u, u0, v, v0 ) defined by (2.16). Following the proof of Theorem 4A we have that this solution satisfies (2.9) for a certain θ and that αx ≥ β is equivalent to the cut πsJ ≥ π0 . We have shown above that (2.9) has a unique solution for the choice of basis given in Theorem 4B, hence this basic solution must be as defined by (2.16). Note that in spite of the close correspondence that Theorems 4A and 4B establish between bases of (CGLP)k and those of (LP), this correspondence is in general not one to one. Of course, when πj1 6= πj2 for all j ∈ J, then the partition (M1 , M2 ) of J is unique. But when πj1 = πj2 for some j ∈ J, which is only possible when πj1 = πj2 = 0 (since πj1 and πj2 , when nonzero, are of opposite signs), then the corresponding index j can be assigned to either M1 or M2 , each assignment yielding a different basis for (CGLP)k . However, although the two bases are different, the associated basic solutions to (CGLP)k are the same, since the components of u and v corresponding to an index j ∈ J with πj1 = πj2 = 0 are 0, and hence the pivot in (CGLP)k that takes
27
one basis into the other is degenerate; i.e., the change of bases does not produce a change of solutions. The system (2.8) consists of a cone given by the homogeneous constraints truncated by a single hyperplane - the normalization constraint. Since (2.8) is bounded the extreme points of (2.8) are in one-to-one correspondence with the extreme rays of the non-normalized cone. Hence the relationship in Theorems 4A and 4B can also be interpreted as one between basic solutions to (LP) and extreme rays of the cone defined by the homogenous system of (2.8). Next we show that the correspondence between the cuts αx ≥ β and πsJ ≥ π0 established in Theorem 4A, 4B carries over to the strengthened version of these cuts.
2.5
The Correspondence Between the Strengthened Cuts
Theorem 2.5. Theorems 4A and 4B remain valid if the inequalities αx ≥ β and πsJ ≥ π0 are replaced by the strengthened lift-and-project cut αx ¯ ≥ β defined by (2.6), (2.7), and the mixed integer Gomory cut (or strengthened simple disjunctive cut) π ¯ sJ ≥ π0 defined by (2.3), respectively. Proof. The only coefficients of the cut πsJ ≥ π0 that can possibly be strengthened are those πj such that j ∈ J1 := J ∩ {1, . . . , p}, i.e. xj is a structural integer-constrained variable, nonbasic in the simplex tableau (2.11), (2.12). We claim that these are precisely the indices of the coefficients of the cut αx ≥ β that can be strengthened. Indeed, by substituting for the surplus variables of the constraints Ax ≥ b that are tight, πsJ ≥ π0 can be written as ¯ − ¯b) ≥ π0 πN ∩J sN ∩J + πS∩J (Ax or
(πN ∩J + πS∩J A¯N ∩J )xN ∩J + πS∩J A¯N ∩I xN ∩I ≥ π0 + πS∩J ¯b.
¯ ≥ ¯b Here N is the index set of structural variables, S that of surplus variables, Ax ˆ ˆ is the subsystem of Ax ≥ b indexed by S ∩ J, while J and I index the nonbasic and basic variables, respectively, in the simplex tableau (2.11), (2.12). The components of pi are πj := max{πj1 , πj2 }, where πj1 := a¯kj (1 − a ¯k0 ), with π0 := a¯k0 (1 − a ¯k0 ). On the other hand, writing α we have αN ∩J αN ∩I β
πj2 := −¯ akj a ¯k0 ,
:= (αN ∩J , αN ∩I ), A¯ := (A¯N ∩J , A¯N ∩I ), from (2.16) = πN ∩J + πS∩J A¯N ∩J = πS∩J A¯N ∩I , = π0 + πS∩J ¯b. 28
(2.17)
Here we assume that the solution (α, β, u, u0, v, v0 ) given by (2.16) is scaled such that θ = 1 for ease of notation. Also, α = max{α1 , α2 }, where max is the component-wise maximum, and, α1 = uA − u0 ek = α − uN , α2 = vA + v0 ek = α − vN ,
(2.18)
where uN and vN are the components of u and v, respectively, associated with the constraints xj ≥ 0, j ∈ N. Now for i ∈ N ∩ I, i.e. for the structural variables xi basic in (2.11), (2.12), the corresponding components of u and v are 0, i.e. ui = vi = 0, since by the choice of J it contains the index of every positive ui and vi . Thus we have that 1 2 αN ∩I = αN ∩I = αN ∩I ,
and therefore the components of α corresponding to variables basic in (2.11),(2.12) cannot be strengthened. Consider now the components of α corresponding to the variables xj nonbasic in (2.11),(2.12). For j ∈ N ∩ J we have αj1 = αj − uj (from (2.18) = πj + (πS∩J A¯N ∩J )j − (πj − πj1 ) = πj1 + ρj
(from (2.17) and (2.16))
and αj2 = αj − vj = πj2 + ρj where ρj is the j-th column of πB∩J A¯N ∩J . Thus αj = max{αj1, αj2 } = max{πj1 , πj2 } + ρj and the strengthened coefficient is α ¯ j = min{αj1 + u0⌈mj ⌉, αj2 − v0 ⌊mj ⌋}, where
α2 −α1
mj := uj0 +v0j = πj2 − πj1 since u0 + v0 = 1 − a ¯k0 + a ¯k0 = 1. Note that the definition of mj does not depend on the scaling of (α, β, u, u0, v, v0 ). Consequently, α ¯ j = min{πj1 + u0 ⌈mj ⌉, πj2 − v0 ⌊mj ⌋} + ρj = min{¯ akj (1 − a¯k0 ) + (1 − a ¯k0 )⌈¯ akj ⌉, −¯ akj a¯k0 − a¯k0 ⌊¯ akj ⌋} + ρj = min{fkj (1 − a¯k0 ), (1 − fkj )¯ ak0 } + ρj . Thus the coefficients α ¯ j are the same as the coefficients π ¯j expressed in terms of the structural variables. 29
2.6
Bounds on the Number of Essential Cuts
The correspondences established in Theorems 4A, 4B and 5 allow us to derive some new bounds on the number of undominated cuts of each type. Indeed, since every valid inequality for {x ∈ P : (xk ≤ 0) ∨ (xk ≥ 1)} is dominated by some liftand-project cut corresponding to a basic feasible solution of (CGLP)k , the number of undominated valid inequalities is bounded by the number of bases of (CGLP)k , which in turn cannot exceed 2(m + p + n + 1) + n + 1 # variables , = 2n + 3 # constraints a rather weak bound. But from Theorems 4A/4B it follows that a much tighter upper bound is also available, namely the number of ways to choose a subset of n variables to be nonbasic in a simplex tableau where xk is basic, that is, the number of subsets J of cardinality n of the set {1, . . . , m + p + n} \ {k}: Corollary 2.6. The number of facets of the polyhedron Pk := conv {x ∈ P : x satisfies (4)k } is bounded by
m+p+n−1 . n p \ Pk of P with respect to the lift-and-project operThus the elementary closure k=1 ation has at most p m+p+n−1 facets. n Similarly, the number of undominated simple disjunctive cuts obtainable by applying the disjunction xk ≤ 0 ∨ xk ≥ 1 to the expression of xk in terms of any other variables is bounded by m+p+n−1 , n and the number of cuts of this type for all k ∈ {1, . . . , p} is consequently at most m+p+n−1 p . n If we try to extend these bounds to strengthened lift-and-project cuts, or to strengthened simple disjunctive cuts, we run into the problem that the extension is only valid if we restrict ourselves to strengthened cuts derived from basic solutions. But this is not satisfactory, since although any unstrengthened cut in either class is dominated by some unstrengthened cut corresponding to a basic solution, the same is not true of the strengthened cuts: a strengthened cut from a nonbasic solution need not be dominated by any strengthened cut from a basic solution: counterexamples are easy to produce. On the other hand, the correspondences established in Theorems 4A/4B and 2.5 have an important consequence on the rank of the LP relaxation of a 0-1 mixed integer program with respect to the various families of cuts examined here. 30
2.7
The Rank of P With Respect to Different Cuts
Let P again denote the feasible region of the LP relaxation, as defined in section 2.1. It is well known that in the case of a pure 0-1 program, i.e. when p = n, the rank of P with respect to the family of (pure integer) fractional Gomory cuts can be strictly greater than n [24]. We now show that by contrast, the rank of P with respect to the family of mixed integer Gomory cuts is at most p. We first recall the definition of rank. We say that P has rank k with respect to a certain family of cuts (or with respect to a certain cut generating procedure) if k is the smallest integer such that, starting with P and applying the cut generating procedure recursively k times, yields the convex hull of 0-1 points in P . Theorem 2.7. The rank of P with respect to each of the following families of cuts is at most p, the number of 0-1 variables: (a) unstrengthened lift-and-project cuts; (b) simple disjunctive cuts; (c) strengthened lift-and-project cuts; (d) mixed integer Gomory cuts or, equivalently, strengthened simple disjunctive cuts. ˜ ≥ ˜b}, and Proof. Denote, as before, P := {x ∈ Rn : Ax PD := conv {x ∈ P : xj ∈ {0, 1}, j = 1, . . . , p} . From the basic result of disjunctive programming on sequential convexification [2, 4], if we define P0 := P and for j = 1, . . . , p, P j := conv {P j−1 ∩ {x ∈ Rn : xj ∈ {0, 1}}), then P p = PD . Since P j can be obtained from P j−1 by unstrengthened lift-and-project cuts, this implies (a). From Theorems 4A/4B, at any iteration j of the above procedure, each lift-andproject cut used to generate P j , corresponding to some basic solution of (CGLP)j , can also be obtained as a simple disjunctive cut associated with some nonbasic set J, with |J| = n. Hence the whole sequential convexification procedure can be stated in terms of simple disjunctive cuts rather than lift-and-project cuts, which implies (b). Turning now to strengthened lift-and-project cuts, if at each iteration j of the above procedure we use strengthened rather than unstrengthened lift-and-project 31
cuts corresponding to basic solutions of (CGLP)j , we obtain a set P˜ j instead of P j , with P˜ j ⊆ P j . Clearly, using the same recursion as above, we end up with P˜ p = PD . This proves (c). Finally, since every strengthened lift-and-project cut corresponding to a basic solution of (CGLP)j is equivalent to a mixed integer Gomory cut derived from the row corresponding to xj of a simplex tableau with a certain nonbasic set J with |J| = n (Theorem 2.5), the procedure discussed under (c) can be restated as an equivalent procedure in terms of mixed integer Gomory cuts, which proves (d). In [22] it was shown that the bound established in Theorem 2.7 is tight for the mixed integer Gomory cuts, by providing a class of examples with rank p. We now turn to the computational implications of Theorems 4A/4B and 2.5.
2.8
Solving (CGLP)k on the (LP) Simplex Tableau
The major practical consequence of the correspondence established in Theorems 4A/4B is that the cut generating linear program (CGLP)k need not be formulated and solved explicitly; instead, the procedure for solving it can be mimicked on the linear programming relaxation (LP) of the original mixed 0-1 problem. Apart from the fact that this replaces a large linear program with a smaller one, it also substantially reduces the number of pivots for the following reason. A basic solution to (LP) associated with a nonbasic set J corresponds to a set of basic solutions to (CGLP)k having u0 > 0, v0 > 0, all components of (α, β) basic, and ui , vj basic for some i ∈ M1 , and j ∈ M2 , respectively, such that M1 ∪M2 = J. The various solutions to (CGLP)k that correspond to the basic solution to (LP) associated with J differ among themselves by the partition of J into M1 and M2 . These solutions can be obtained from each other by degenerate pivots in (CGLP)k . Thus a single pivot in (LP), which replaces the set J with some J ′ that differs from J in a single element, may correspond to several pivots in (CGLP)k , which together change the set M1 ∪ M2 by a single element, but shift one or more elements from M1 to M2 and vice-versa. We will now describe the procedure that mimics on (LP) the optimization of (CGLP)k . We start with the simple disjunctive cut πsJ ≥ π0 derived from the optimal simplex tableau (2.11) by applying the disjunction xk ≤ 0 ∨ xk ≥ 1 to the expression X xk = a ¯k0 − a¯kj sj . (12) j∈J
As mentioned before, the coefficients of this cut are π0 := (1 − a¯k0 )¯ ak0
and πj := max{πj1 , πj2 }, 32
j∈J
with πj1 := (1 − a¯k0 )¯ akj , πj2 := −¯ ak0 a ¯kj . We know that the lift-and-project cut αx ≥ β equivalent to πsj ≥ π0 corresponds to the basic solution of (CGLP)k defined by (16). We wish to obtain the lift-andproject cut corresponding to an optimal solution to (CGLP)k by performing the improving pivots in (LP). We start by examining a pivot on an element a¯ij , i 6= k, of the simplex tableau (2.11) for (LP). The effect of such a pivot is to add to the cut row (12) the i-th row multiplied by γj := −¯ akj /¯ aij , and hence to replace (12) by X xk = a¯k0 + γj a ¯i0 − (¯ akh + γj a ¯ih )sh − γj xi (12′ ) h∈J\{j}
Now if 0 < a ¯k0 + γj a ¯i0 < 1, i.e. if (−¯ ak0 /¯ ak0 ) < γj < ((1 − a ¯k0 )/¯ ai0 ), then we ′ can apply the disjunction xk ≤ 0 ∨ xk ≥ 1 to (12 ) instead of (12), to obtain a cut π γ sJ γ ≥ π0γ , where J γ := (J \ {j}) ∪ {i} and si denotes xi . The question is how to choose the pivot element a ¯ij in order to make π γ sJ γ ≥ π0γ a stronger cut than πsJ ≥ π0 , in fact as strong as possible. This choice involves two elements. First, we choose a row i, some multiple of which is to be added to row k; second, we choose a column in row i, which sets the sign and size of the multiplier. Note that we can pivot on any nonzero a ¯ij since we do not restrict ourselves to feasible bases. As to the first choice, pivoting in row i, i.e. pivoting the variable xi out of the basis, corresponds in the simplex tableau for (CGLP)k to pivoting into the basis one of the variables ui or vi . Clearly, such a pivot is an improving one in terms of the objective function of (CGLP)k only if either ui or vi have a negative reduced cost. Below we give the expressions for rui and rvi , the reduced costs of ui and vi respectively, in terms of the coefficients a¯kj and a ¯ij , j ∈ J ∪ {0}. As to the second choice, one can identify the index j ∈ J such that pivoting on a ¯ij maximizes the improvement in the strength of the cut, by first maximizing the improvement over all j ∈ J with γj = −¯ aij /¯ aij > 0, then over all j ∈ J with γj < 0 and choosing the larger of the two improvements. In the following we will consider a basic solution (x, s) to (LP) in which the basic and non-basic structural variables are indexed respectively by B and R, and the basic and non-basic surplus variables are indexed respectively by P and Q. We will index the surplus variables by 1, . . . , m+p, and the structural variables by m+p+1, . . . , m+ p + n. With this indexing we have a direct correspondence between the variables of ˜ ≥ ˜b. Note, however, that the latter system (LP) and the surplus variables of Ax contains the extra surplus variables from the rows xj − sj = 0,
j = m + p + 1, . . . , m + p + n
(2.19)
which of course are equal to the corresponding structural variables. Using this indexing we will let a ¯ij denote the coefficient for non-basic variable j in the row for basic variable i, in the simplex tableau of (LP) corresponding to the 33
solution (x, s). Further, we will let a ¯i0 denote the right-hand side of the simplex ˜ tableau row with basic variable i, and A˜i the i-th row of A. Lemma 2.8. Set J = Q ∪ R. The coefficients a ¯ij for i = 1, . . . , m + p + n, j ∈ J, and the right-hand sides a ¯i0 for i = 1, . . . , m + p + n satisfy a¯ij = −(A˜i A˜−1 J )j −1˜ ˜ ˜ a¯i0 = Ai AJ bJ − ˜bi
(2.20) (2.21)
Proof. From Lemma 2 we know that A˜J is invertible. Let us first consider the basis matrix for (LP) and its inverse. We can write the nontrivial constraints of (LP) as AB + AR Q xB Q xR − sQ = bQ B AP xB − sP + AR = bP P xR where A = (AB AR ). The basis matrix, E, for (LP) and its inverse, E −1 , are B −1 (AB 0 AQ 0 −1 Q) E= E = B −1 AB −I AB P (AQ ) P −I ˜ ≥ ˜b indexed by J are The constraints of Ax R AB = bQ Q xB + AQ xR − sQ xR − sR = 0
where sR are those surplus variables of (2.19) corresponding to xR . We can thus write A˜J and its inverse, A˜−1 J as B B −1 B −1 R R (A ) −(A ) A A A −1 Q Q Q Q Q A˜J = A˜J = 0 I 0 I There are four cases to consider for the coefficients of the simplex tableau for (LP); index i can be either that of a structural variable (i ∈ B) or that of a surplus variable (i ∈ P ), and likewise for index j (j ∈ R or j ∈ Q). Case 1: i and j both index a surplus variable (i ∈ P, j ∈ Q). We obtain a ¯ij by premultiplying the j-th non-basic column, which in this case is the column for surplus variable j, by E −1 and taking the i-th component, as −ej B −1 B B −1 −1 = −(AB a ¯ij = (0 ei )E P (AQ ) )ij = −(Ai (AQ ) )j , 0 B where AB i is the i-th row of A . To show (2.20), note that for this case we have B R A˜i = (Ai Ai ) and the j-th component of −A˜i A˜−1 J becomes e e j j B −1 B B −1 B R −1 −1 = −AB ¯ij . = −(Ai Ai )A˜J −A˜i A˜J i (AQ ) ej = −(Ai (AQ ) )j = a 0 0
34
Case 2: i indexes a surplus variable and j indexes a structural variable (i ∈ P, j ∈ R). R Again A˜i = (AB i Ai ), and R B −1 R R −1 AQ = AB a ¯ij = (0 ei )E i (AQ ) (AQ )j − Aij AR P j B −1 R R B R ˜−1 0 −1 0 ˜ ˜ = AB ¯ij . = −(Ai Ai )AJ −Ai AJ i (AQ ) (AQ )j − Aij = a ej ej Case 3: i indexes a structural variable and j indexes a surplus variable (i ∈ B, j ∈ Q). Here A˜i = (ei 0), and −1 −1 −ej = −((AB a ¯ij = (ei 0)E Q ) )ij 0 −1 ej −1 ej −1 ˜ ˜ ˜ −Ai AJ = −(ei 0)AJ = −((AB ¯ij . Q ) )ij = a 0 0 Case 4: i and j both index a structural variable (i ∈ B, j ∈ R). Again A˜i = (ei 0), and R −1 AQ −1 R a¯ij = (ei 0)E = ((AB Q ) AQ )ij R AP j 0 0 −1 R = ((AB ¯ij . = −(ei 0)A˜−1 −A˜i A˜−1 Q ) AQ )ij = a J J ej ej For the right-hand side a ¯i0 there are only two cases, depending on whether i indexes a structural or a surplus variable. Case 1: i indexes a surplus (i ∈ P ). Then B −1 −1 bQ = AB a ¯i0 = (0 ei )E i (AQ ) bQ − bi bP To show (2.21), note that ˜bi = bi and ˜ B −1 bQ R ˜−1 bQ B −1 ˜ ˜ ˜ − bi = (Ai Ai )AJ Ai AJ ˜ − bi = AB ¯i0 i (AQ ) bQ − bi = a 0 bR Case 2: i indexes a structural variable (i ∈ B). Then −1 bQ −1 = ((AB a ¯i0 = (ei 0)E Q ) bQ )i , bP 35
and since ˜bi = 0, ˜ −1 −1 bQ −1 bQ ˜ ˜ ˜ ˜ − 0 = ((AB ¯i0 . − bi = (ei 0)AJ Ai AJ ˜ Q ) bQ )i = a 0 bR Theorem 2.9. Let (α, β, u, u0, v, v0 ) be a basic, feasible solution to (8) with u0 , v0 > 0, all components of α, β basic, and the basic components of u and v indexed by M1 and ˜ ≥ ˜b corresponding to the solution M2 , respectively. Let s¯ be the surplus variables of Ax x¯. The reduced costs of ui and vi for i 6∈ J ∪{k} in this basic solution are, respectively −σ+τ −a ¯i0 (1 − x¯k ) + s¯i −σ−τ +a ¯i0 (1 − x¯k )
ru i = rvi = where σ = τ
P
= σ
j∈M2
a ¯kj s¯j −¯ ak0 (1−¯ xk ) P |¯ a | kj j∈J
(2.22)
1+
P
j∈M1
a ¯ij − σ
P
j∈M2
a ¯ij +
P
j∈M2
a ¯ij s¯j
Proof. If we restrict (8) to the basic variables plus ui and vi , and eliminate α, β, we obtain the system uM1 A˜M1 + ui A˜i − u0 ek = vM2 A˜M2 + vi A˜i + v0 ek = vM2 ˜bM2 + vi˜bi + v0 uM1 ˜bM1 + ui˜bi P P j∈M1 uj + ui + u0 + j∈M2 vj + vi + v0 = 1
The first two equations can be rewritten
(uM1 , −vM2 )A˜J + (ui − vi )A˜i = (u0 + v0 )ek (uM1 , −vM2 )˜bJ + (ui − vi )˜bi = v0 From Lemma 2 we know that A˜J is invertible, so (uM1 , −vM2 ) = (u0 + v0 )ek A˜−1 − (ui − vi )A˜i A˜−1 J J ˜bJ − (ui − vi )(A˜i A˜−1˜bJ − ˜bi ) v0 = (u0 + v0 )ek A˜−1 J J
(2.23)
˜ ˜−1 Now, using Lemma 8 we can identify the expressions −(ek A˜−1 J )j and −(Ai AJ )j with the coefficients a ¯kj (since A˜k = ek ) and a ¯ij in the simplex tableau of (LP) for the basic solution with variables indexed by J being non-basic. Likewise we can identify the ˜ ˜ ˜ ˜−1˜ expressions ek A˜−1 ¯k0 (since J bJ and Ai AJ bJ − bi with the right-hand side constants a ˜bk = 0) and a ¯i0 of the simplex tableau. With this substitution we have that uj = −(u0 + v0 )¯ akj + (ui − vi )¯ aij for j ∈ M1 vj = (u0 + v0 )¯ akj − (ui − vi )¯ aij for j ∈ M2 v0 = (u0 + v0 )¯ ak0 − (ui − vi )¯ ai0 36
(2.24)
We can now write the normalization constraint as X X vj + ui + vi + u0 + v0 uj + 1= j∈M2
j∈M1
(substituting uj and vj from (2.24)) X (−(u0 + v0 )¯ akj + (ui − vi )¯ aij ) = j∈M1
+
X
((u0 + v0 )¯ akj − (ui − vi )¯ aij ) + ui + vi + u0 + v0
j∈M2
= (u0 + v0 ) − + (ui − vi )
X
a ¯kj +
X
j∈M1
j∈M2
X
X
a ¯ij −
a¯kj + 1 a¯ij
j∈M2
j∈M1
!
!
+ ui + vi .
Since (2.24) is satisfied for the current basic solution with ui = vi = 0 and since uM1 , vM2 ≥ 0, it follows from (2.24) that j ∈ M1 ⇒ a ¯kj ≤ 0 and j ∈ M2 ⇒ a ¯kj ≥ 0, so X X X a ¯kj = |¯ akj | a ¯kj + − We thus have
u0 + v0 =
j∈J
j∈M2
j∈M1
1 − (ui − vi )
P
a¯ij − P
j∈M1
1+
j∈J
P
j∈M2
a ¯ij − ui − vi
|¯ akj |
We can now write the objective function of (CGLP)k in terms of ui and vi as α¯ x − β = vM (A˜M x¯ − ˜bM ) + vi (A˜i x¯ − ˜bi ) + v0 (ek x¯ − 1) 2
2
2
(use that s¯M2 = A˜M2 x¯ − ˜bM2 and s¯i = A˜i x¯ − ˜bi ) = vM2 s¯M2 + vi s¯i + v0 (¯ xk − 1) (substitute for vM2 and v0 using (2.24)) X X a ¯ij s¯j + vi s¯i a ¯kj s¯j − (ui − vi ) = (u0 + v0 ) j∈M2
j∈M2
+ (u0 + v0 )¯ ak0 (¯ xk − 1) − (ui − vi )¯ ai0 (¯ xk − 1) ! X a ¯kj s¯j − a¯k0 (1 − x¯k ) = (u0 + v0 ) j∈M2
+ (ui − vi ) −
X
j∈M2
37
a ¯ij s¯j + a ¯i0 (1 − x¯k )
!
+ vi s¯i
(2.25)
If we substitute for (u0 + v0 ) from (2.25) and use the definition of σ we obtain P P P ¯ij + σ j∈M2 a ¯ij − σ − j∈M2 a ¯ij s¯j + a ¯i0 (1 − x ¯k ) α¯ x − β = σ + ui −σ j∈M1 a P P P ¯ij − σ j∈M2 a ¯ij − σ + j∈M2 a ¯ij s¯j − a ¯i0 (1 − x ¯k ) + s¯i . + vi +σ j∈M1 a (2.26)
Setting
τ := σ
X
a ¯ij − σ
j∈M1
X
j∈M2
a¯ij +
X
a ¯ij s¯j
j∈M2
the equation (2.26) reduces to α¯ x − β = σ + ui (−σ − τ + a ¯i0 (1 − x¯k )) + vi (−σ + τ − a ¯i0 (1 − x¯k ) + s¯i ) from which we can then read the reduced costs rui and rvi as the coefficients of ui and vi . Theorem 2.10. The pivot column in row i of the (LP) simplex tableau that is most improving with respect to the cut from row k, is indexed by that l∗ ∈ J that minimizes f + (γl ) if a ¯kl a¯il < 0 or f − (γl ) if a¯kl a ¯il > 0, over all l ∈ J, where γl := − a¯a¯klil and for any γ, P akj , −γ¯ aij }¯ sj − a ¯k0 (1 − x¯k ) + γ¯ ai0 x¯k j∈J max{¯ P f + (γ) = 1 + |γ| + j∈J |¯ akj + γ¯ aij | and
f − (γ) =
P
j∈J
max{¯ akj + γ¯ aij , 0}¯ sj − a ¯k0 (1 − x¯k ) − γ¯ ai0 (1 − x¯k ) P 1 + |γ| + j∈J |¯ akj + γ¯ aij |
Proof. Consider row k and row i of the (LP) simplex tableau X X xk + a ¯kj sJ = a¯k0 xi + a ¯ij sJ = a ¯i0 . j∈J
j∈J
If we add row i to row k with weight γ ∈ R we obtain the composite row X xk + γxi + (¯ akj + γ¯ aij )sJ = a¯k0 + γ¯ ai0 .
(2.27)
j∈J
A pivot on column l in row i has the effect of adding γl = − a¯a¯klil times row i to row k. We want to identify the column l such that the simple disjunctive cut, π γ sJ ≥ π0γ we derive from the composite row (2.27) with γ = γl minimizes π γ s¯J − π0γ .
38
For any γ ∈ R, the simple disjunctive cut from the disjunction xk ≤ 0 ∨ xk ≥ 0 applied to (2.27) has coefficients (see Section 2.2) πiγ = max{(1 − a ¯k0 − γ¯ ai0 )γ, −(¯ ak0 + γ¯ ai0 )γ} γ πj = max{(1 − a ¯k0 − γ¯ ai0 )(¯ akj + γ¯ aij ), −(¯ ak0 + γ¯ ai0 )(¯ akj + γ¯ aij )} for j ∈ J γ π0 = (1 − a ¯k0 − γ¯ ai0 )(¯ ak0 + γ¯ ai0 ) We can eliminate xi by subtracting πiγ times row i from the cut πiγ xi + π γ sJ ≥ π0γ . The result depends on the sign of γ, since πiγ = (1 − a ¯k0 − γ¯ ai0 )γ if γ > 0, and γ πi = −(¯ ak0 + γ¯ ai0 )γ if γ < 0. γ>0 :
πjγ+ = max{(1 − a¯k0 − γ¯ ai0 )¯ akj , −(¯ ak0 + γ¯ ai0 )¯ akj − γ¯ aij } = −(¯ ak0 + γ¯ ai0 )¯ akj + max{¯ akj , −γ¯ aij } π0γ+ = (1 − a ¯k0 − γ¯ ai0 )¯ ak0
γ 0}, M2 = J \ M1 , and rvi with M1 = {j ∈ J : a ¯kj < 0 ∨ (¯ akj = 0 ∧ a¯ij < 0}, M2 = J \ M1 of ui , vi , corresponding to each row i 6= k of the simplex tableau of (LP) according to (2.22).
Step 2
Let i∗ be a row with rui∗ < 0 or rvi∗ < 0. If no such row exists, go to Step 5.
Step 3
Identify the most improving pivot column j∗ in row i∗ by minimizing f + (γj ) over all j ∈ J with γj > 0 and f − (γj ) over all j ∈ J with γj < 0 and choosing the more negative of these two values.
Step 4
Pivot on a ¯i∗ j∗ and go to Step 1.
Step 5
If row k has no 0 entries, stop. Otherwise perturb row k by replacing every 0 entry by εt for some small ε and t = 1, 2, . . . (different for each entry). Go to step 1.
Figure 2.1: Algorithm for generating an optimal lift-and-project cut from the simplex tableau of the LP relaxation.
41
When we compute the reduced costs in Step 1, we create a partition (M1 , M2 ) of J according to Theorem 4B. When a ¯kj = 0 for some j ∈ J we are free to choose whether to assign j to M1 or M2 . By assigning such a j to M1 if a ¯ij > 0 and to M2 otherwise, we make sure there exists a non-degenerate pivot in row i that improves the cut. This is because with such a choice of M1 and M2 it is possible to increase ui by a small amount in (2.24) without driving any of the uj and vj negative. Equivalently for rvi . Hence, when step 2 does not find a negative reduced cost, there is no improving pivot in the simplex tableau of (LP). In order to explain the role of the perturbation (Step 5), we need to examine in more detail the connection between pivots in the (LP) simplex tableau (the small tableau) and pivots in the simplex tableau of (CGLP) (the large tableau). The set J defines a unique basis B of the small tableau, which corresponds to as many bases of the large tableau as there are partitions (M1 , M2 ) of J satisfying the requirements of feasibility (i.e. j ∈ M1 if a ¯kj < 0 and j ∈ M2 if a ¯kj > 0). Now let us assume a certain partition (M1 , M2 ) satisfying these requirements, corresponding to a feasible e of the large tableau. A pivot in row i of the small tableau replaces B with an basis B adjacent basis B ′ , but it may change the signs of many entries of row k, resulting in a change of the set of candidates for inclusion into M1 or M2 . Thus any of the partitions (M1′ , M2′ ) available after such a pivot may differ from the earlier partition (M1 , M2 ) e ′ of (CGLP) corresponding to the by several elements, which means that the basis B e by several columns, i.e. chosen partition (M1′ , M2′ ) will differ from the earlier basis B e through several pivots in (CGLP). would be obtainable from B When the algorithm comes to a point where Step 2 finds no row i 6= k with rui < 0 or rvi < 0, i.e. all the reduced costs of the large tableau are nonnegative, we could conclude that the solution is optimal if the reduced costs had all been calculated with respect to the same basis of the large tableau, i.e. with respect to the same partition (M1 , M2 ). However, this is not the case, since the attempt to find a pivot that improves the cut from row k as much as possible makes us use a different partition (M1 , M2 ) for every row i, as explained above. While in the absence of 0 entries in row k the partition (M1 , M2 ) is unique (the same for all i), the presence of 0’s in row k allows us to use different partitions for different rows, thereby gaining in efficiency. When all the reduced costs calculated in this way are nonnegative, then in order to make sure that the cut is optimal, we must recalculate the reduced costs from a unique basis of the large tableau, i.e. a unique partition (M1 , M2 ). This is what the perturbation in Step 5 accomplishes, by eliminating the 0 entries in row k. Since the perturbation is cumbersome and slows down the algorithm, in practice we run it without step 5, stopping when all reduced costs are nonnegative. Experience shows that the cuts obtained this way are on the average of roughly the same strength as those obtained by solving explicitly the (CGLP) (see section 9 for details).
42
2.9
Using Lift-and-Project to Strengthen Mixed Integer Gomory Cuts
The algorithm of section 2.10 for finding an optimal lift-and-project cut through a sequence of pivots in the simplex tableau of (LP) can also be interpreted as an algorithm for strengthening (improving) through a sequence of pivots a mixed integer Gomory cut derived from a row of the (LP) simplex tableau. The first pivot in this sequence results in the replacement of the mixed integer Gomory (MIG) cut from the row associated with xk in the optimal simplex tableau of (LP) (briefly row k) with the MIG cut from the same row k of another simplex tableau (not necessarily feasible), the one resulting from the pivot. The new cut is guaranteed to be more violated by the optimal LP solution x¯ than was the previous cut. Each subsequent pivot results again in the replacement of the MIG cut from row k of the current tableau with a MIG cut from row k of a new tableau, with a guaranteed improvement of the cut. This algorithm is essentially an exact version of the heuristic procedure for improving mixed integer Gomory cuts described in [2] (Example 3.1, p. 9-11). The nature of this improvement is best understood by viewing the MIG cut as a simple disjunctive cut, and considering the strengthening of the disjunction – a dichotomy between two inequalities – through the addition of multiples of other inequalities to either term, before actually taking the disjunction. Here is a brief illustration of what this strengthening procedure means, on an example small enough for the purpose, yet hard enough for standard cuts, the Steiner triple problem with 15 variables and 35 constraints (problem stein15 P of [15]). P To mitigate the effects of symmetry, we replaced the objective function of xj by jxj . j
j
The linear programming optimum is
x¯ = (1, 1, 1, 1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0, 0, 0, 0, 0) with a value of 35. The integer optimum is x∗ = (1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0), with value 45. Generating one mixed integer Gomory cut from each of the five fractional variables and solving the resulting linear program yields a solution x1 of value 39. Iterating this procedure 10 times, each time generating one MIG cut from each of the fractional variables of the current solution and solving the resulting linear program, yields the solution x10 = (0.97, 1, 1, 0.93, 1, 0.73, 0.64, 0.33, 0.61, 0.86, 0.67, 0, 0.35, 0.55, 0.32) with a value of 42.73. 43
But if instead of using the five MIG cuts as they are, we first improve them by our pivoting algorithm, then use the five improved cuts in place of the original ones, we get a solution x˜1 of value 41.41. If we then iterate this procedure 10 times, using every time the improved cuts in pace of the original ones, we obtain the solution x˜10 = (1, 1, 1, 1, 1, 1, 1, 0.94, 0.72, 0.29, 0, 0, 0, 0, 0) with a value of 44.85. The difference between x10 and x˜10 is striking. However, even more striking are the details. Lack of space limits us to discussing the first out of the ten iterations of the above procedure. Here is how the improving pivots affect the amount of violation, defined as β − α¯ x for the cut αx ≥ β normalized as in section 2.3, and the distance, meaning the Euclidean distance of x¯ from the cut hyperplane: Distance 0.1443 0.2835
Cut from x6 : original MIG optimal (after 3 pivots)
Violation 0.0441 0.0833
Cut from x7 : original MIG optimal (after 1 pivot)
0.0625 0.0714
0.1768 0.2085
Cut from x8 : original MIG optimal (after 1 pivot)
0.0577 0.0833
0.2023 0.2835
Cut from x9 : original MIG optimal (after 3 pivots)
0.0500 0.0833
0.1744 0.2887
Cut from x10 : original MIG optimal (after 4 pivots)
0.0500 0.0833
0.1744 0.2835
In the process of strengthening the 5 MIG cuts in the first iteration, 12 new cuts are generated. If, instead of replacing the original MIG cuts with the improved ones, we keep all the cuts generated and solve the problem with all 17 cuts (the 5 initial ones plus the 12 improved ones), we get exactly the same solution x˜1 as with the 5 final improved cuts only: the original MIG cuts as well as the intermediate cuts resulting from the improving pivots (except for the last one) are made redundant by the 5 final improved cuts. A similar behavior is exhibited on the problems stein27 (with 27 variables and 117 constraints) and stein45 (45 variables, 330 constraints). Since the improved MIG cuts resulting from our algorithm are equivalent to the (strengthened) optimal lift-and-project cuts, these findings corroborate those of [6]. The practical question that remains to be answered, is the following: does the gain in the quality of the cuts justify the computational effort for improving them? This 44
can only be established experimentally, and it pretty much defines the next task in this area of research. One last comment. Since the algorithm described here starts with a MIG cut from the optimal simplex tableau and stops with a MIG cut from another (usually infeasible) simplex tableau, one may ask what is the role of lift-and-project theory in this process? The answer is that it provides the guidance needed to get from the first MIG cut to the final one. It provides the tools, in the form of the reduced costs from the (CGLP) tableau of the auxiliary variables ui and vi , for identifying a pivot that is guaranteed to improve the cut, if one exists. Over the last three and a half decades there have been numerous attempts to improve mixed integer Gomory cuts by deriving them from tableau rows combined in different ways, but none of these attempts has succeeded in defining a procedure that is guaranteed to find an improved cut when one exists. The lift-and-project approach has done just that.
2.10
Computational Experience
We have over the past pages developed an algorithm as an alternative to solving the higher-dimensional CGLP. This leaves the essential question: how does it perform in practice? Therefore, we went ahead and implemented this algorithm and ran some experiments to get a practical comparison against the lift-and-project cuts, and to gauge the effectiveness of using our cuts in a branch-and-bound setting. First, we will go into more detail on the technical aspects of how we have implemented the algorithm in Figure 2.1.
2.10.1
Technical Details
The efficiency of any implementation of the algorithm in Figure 2.1 efficiently there are certain depends very much on how the reduced costs of Theorem 2.9 are calculated and how the function in Theorem 2.10 is minimized. Calculating those expressions directly is not necessarily the best way to do it. Below we describe some considerations necessary to implement our algorithm. Using explicit lower and upper bounds on variables During our derivation of the theory on program of the form min s.t.
the preceeding pages we assumed a linear cx ˜ ≥ ˜b Ax
where the x variables are unrestricted and if there are any bounds on the variables ˜ ≥ ˜b. In practice then they must be included as explicit constraints in the system Ax most variables have both lower and upper bounds and the lower bounds are not necessarily zero. It is inefficient to explicitly store such bounds as constraints which 45
is why modern LP solvers does not include them in the constraint set but instead remembers what the bounds on the variables are. Then when a variable is made nonbasic, the solver flags what bound the variable is nonbasic at. Fortunately, it is very easy to move between a simplex tableau with bounded variables and one where the bounds are included as constraints. A lower bound lj on a variable xj leads to a constraint xj ≥ lj . In our theory from the previous sections we assumed that only the surplus variables could be non-basic since all the structural variables are unrestricted. If xj is non-basic in a row of the simplex tableau with implicit bounds as in xi + a¯i1 s1 + . . . + a ¯ij xj + . . . = a ¯i0 we can replace it using xj − sj = lj , where sj is a surplus variable, to obtain xi + a ¯i1 s1 + . . . + a¯ij sj + . . . = a¯i0 − a ¯ij lj ˜ ≥ ˜b with sj non-basic which is now a row of the equivalent simplex tableau for Ax and xj basic. ˜ ≥ ˜b the constraint Likewise, if xj has an upper bound dj we will include in Ax −xj ≥ −dj . If we have a row of the simplex tableau with xj non-basic, but now at its upper bound dj xi + a¯i1 s1 + . . . + a ¯ij xj + . . . = a ¯i0 we can again replace it by introducing a surplus variable sj in −xj − sj = −dj , to obtain xi + a ¯i1 s1 + . . . − a¯ij sj + . . . = a¯i0 − a ¯ij dj We can generalize this if we let L and D denote the sets of non-basic structural variables at their lower and upper bounds, respectively. Consider a row of the simplex tableau with implicit bounds X X xi + a ¯ij xj + a ¯ij xj = a¯i0 j∈L
j∈U
Here we assume that any non-basic surplus variables are included in the set L since they are non-basic at a lower bound of zero. If we perform the substitutions outlined above for all non-basic variables, we obtain an equivalent row of the simplex tableau ˜ ≥ ˜b with non-basic surplus indexed by J = L ∪ D, of the form for Ax X X X X xi + a ¯ij sj + (−¯ aij )sj = a ¯i0 − a ¯ij lj − a¯ij dj j∈L
j∈D
j∈L
j∈D
Therefore, if our LP solver is set up to work with implicit bounds then we should set a ¯ij if j ∈ L (non-basic at lower bound) ′ a ¯ij = −¯ aij if j ∈ D (non-basic at upper bound) 46
and a¯′i0 = a ¯i0 −
X
a ¯ij lj −
j∈L
a ¯′ij ,
X
a ¯ij dj
j∈D
a ¯′i0
such that j ∈ J = L ∪ D, and becomes the coefficients we should use in Theorem 2.9 and Theorem 2.10. This covers the case when the non-basic variables have bounds, but what about the basic variable? In our theory we assumed that we always pivot a surplus variable into the basis in place of another surplus variable. If we are not storing the bounds as explicit constraints in the system then we need to be able to pivot a structural variable out of the basis. Suppose the surplus sli that we pivot out of the basis is the surplus of a lower bound constraint xi − sli = li , then the effect of pivot sli out of the basis is to set sli = 0 and therefore set xi = li . This is equivalent to making xi non-basic at its lower bound. Conversely, if the variable sdj we are pivoting out of the basis is the surplus of an upper bound constraint −xi − sdj = −di then we are essentially making xi non-basic at its upper bound. Consider the row of the simplex tableau with implicit bounds in which xi is basic: X a ¯′j sj = a xi + ¯′i0 j∈J
Here we have already substituted out the non-basic structural variables as described above. If xi has both a lower bound li and an upper bound di we will have two ˜ ≥ ˜b, one with sl basic and one with corresponding rows of the simplex tableau for Ax i d si basic: P sli + j∈J a ¯′ij sj = a ¯′i0 − li P (2.29) ¯′i0 a′ij )sj = di − a sdi + j∈J (−¯ The reduced costs rul j and rul j we calculate in Theorem 2.9 for sli are the reduced costs for pivoting xi out of the basis to its lower bound, and the reduced costs rudj and rudj we calculate for sdi are the reduced costs for making xi go to its upper bound. The two rows in (2.29) are very similar so it should come as no surprise that once we have calculated one pair of reduced costs, the other pair follows with almost no ¯i0 by a ¯′i0 − li . For the effort. For the lower bound costs we replace a¯ij by a ¯′ij and a upper bound costs we replace a¯ij by −¯ a′ij and a ¯i0 by di − a ¯′i0 . Therefore, all four reduced costs can be computed as: rul i rvl i rudi rvdi
= = = =
−σ+τ −σ−τ −σ−τ −σ+τ
where τ =σ
X
j∈M1
− (¯ a′i0 − li )(1 − x¯k ) + (¯ xi − li ) ′ + (¯ ai0 − li )(1 − x¯k ) − (di − a ¯′i0 )(1 − x¯k ) + (di − x¯i ) i + (d − a ¯′i0 )(1 − x¯k )
a¯′ij − σ
X
j∈M2
47
a ¯′ij +
X
j∈M2
a ¯′ij s¯′j
(2.30)
(2.31)
and s¯′j = x¯j − lj if xj is non-basic at its lower bound lj or dj − barxj if xj is non-basic at its upper bound dj . In principle, whenever a formula calls for the a surplus sj for a lower or upper bound constraint, we replace it with the applicable one of xj − lj or dj − xj . Computing the Reduced Costs Although the expression for τ in (2.30) for the reduced costs does not look simple, it is linear in the simplex tableau coefficients from the row i. This makes it relatively easy to compute, provided the partition (M1 , M2 ) is given. Suppose AB and AJ are the basic, respectively non-basic columns of A in a solution to the LP relaxation Ax = b (x denotes both structural and surplus variables here). The expression for τ involves computing a sum of the form X zj a¯ij (2.32) j∈J
Such a sum can be computed as X X −1 −1 zj a ¯ij = zj (A−1 B AJ )ij = z(AB AJ )i = (zAB )AJi j∈J
(2.33)
j∈J
This requires computing z¯ = zA−1 B only once, and then for each row i compute the vector product z¯AJi , where AJi is the i’th row of the non-basic columns. Hence computing the reduced cost for either ui or vi requires post-multiplying one vector by the basis inverse and an inner product for each row. This is in theory comparable to the amount of work usually needed to price out all the columns in a regular linear program. Calculating reduced costs in the presence of zero coefficients A complication arises if a¯kj = 0 for some j ∈ J. In that case the decision of whether to include j in M1 or M2 depends on the sign of a ¯ij . This corresponds in (CGLP )k to a basis that is primal degenerate where both ui and vi are zero so we are free to choose which must be basic. If we want to be guaranteed that a pivot on a calculated negative reduced cost leads to a better cut, then we have to use the partition in Step 1 of our algorithm, i.e., place j in M1 if a ¯ij > 0 and otherwise in M2 when computing ru i . Therefore, we have to make a distinction between the non-basic columns with a ¯kj 6= 0 and those with a ¯kj = 0, that is, split J into the two subsets J ∗ := {j ∈ J |a ¯kj 6= 0} and J 0 := {j ∈ J | a¯kj = 0}. We can then write the sum (2.32) for τ as X X zj a ¯ij zj a ¯ij + j∈J ∗
j∈J 0
48
L152LAV row 1 0
-0.0001
-0.0002
-0.0003
-0.6
-0.4
-0.2
0
0.2
0.4
0.6
Figure 2.2: CGLP objective as a function of γ for the pivots on the first row of the optimal simplex tableau of the problem instance L152LAV. Pivots on both lower and upper bounds included.
The first sum we can still compute quickly using the trick in (2.33). The second sum, on the other hand, we compute directly by calculating each entry a ¯ij of the simplex tableau for every column with a¯kj 6= 0. This will make the calculation of the reduced costs expensive, but will also guarantee that there is a pivot on row i that improves our cut if and only if one of the reduced costs is zero. Identifying the nonbasic variable to enter the basis The other major task in our algorithm is Step 3; identifying which non-basic variable should enter the basis in our chosen row i∗ . Since we consider a single row, we can afford to compute all the tableau coefficients for this row. Once we have these constants, the functions f + (γ) and f − (γ) we have to evaluate in Theorem 2.10 become two piece-wise linear functions divided by each other. A sample of this function is provided in Figure 2.2. Each point corresponds to one possible pivot in this row. Here the minimum of −0.0003675 is attained at γ = 0.180. A single pivot to this basis would require at least 77 pivots in (CGLP)k to accomplish the equivalent change in the (CGLP)k basis. We describe here a procedure for finding the minimum of f + (γ). The procedure for f − (γ) is very similar and requires only minor modifications. 49
Since f + (γ) consists of a piece-wise linear function divided by another piece-wise linear function, we will write it as f + (γ) =
p + γq r + γs
Note that the sign of the first derivative qr − ps d + q(r + γs) − (p + γq)s f = = 2 dγ (r + γs) (r + γs)2 is given by the sign of qr − ps which does not depend on γ. Hence any minimum is a ¯ attained when γ equals a ratio − a¯kj for some j. ij The procedure is as follows: Step 1 Sort all columns with a negative ratio 0≤− Step 2 Set
and l = 0.
a ¯kj a ¯ij
in non-increasing order such that
a ¯kj1 a ¯kj a¯kjm ≤ − 2 ≤ ···− a¯ij1 a ¯ij2 a¯ijm
P p0 = ¯kj s¯j − a ¯k0 (1 − x¯k ) a ¯kj >0 a P P q0 = − a¯kj 7200 0.90 251.18 408.50 >7200 30.35 >7200 9.56 99.66 >7200 >7200 1.08 1.42 0.01 0.20 >7200 0.66 0.94 0.35 0.07 0.25 0.28 0.53 0.45 0.05 >7200 0.22 20.73 0.19 >7200 >7200 >7200 >7200
XPgom >7200 0.93 356.76 409.32 >7200 46.14 >7200 6.60 90.41 >7200 >7200 1.13 1.51 0.03 0.20 >7200 0.69 0.99 0.30 0.05 0.25 0.27 0.58 0.64 0.09 >7200 0.24 13.15 0.47 >7200 >7200 >7200 >7200
Solution time (sec.) CGLP2 CGLP10 PIV2 135.94 538.24 41.85 1.08 1.09 0.94 423.63 1273.66 855.05 743.03 726.19 393.83 >7200 >7200 >7200 39.20 38.55 40.10 >7200 >7200 >7200 14.46 15.38 10.12 105.66 107.65 107.71 >7200 >7200 >7200 >7200 >7200 >7200 5.11 32.71 2.11 9.41 44.08 2.73 0.03 0.03 0.03 3.72 4.52 1.24 >7200 >7200 >7200 4.47 31.46 2.61 1.45 1.49 1.48 0.39 0.56 0.37 0.09 0.10 0.07 0.64 0.67 0.41 0.63 0.69 0.33 2.05 15.98 1.17 2.40 21.56 1.26 0.25 1.33 0.11 >7200 >7200 >7200 0.76 3.83 0.37 25.49 63.47 79.18 0.43 1.70 0.23 >7200 >7200 >7200 >7200 >7200 >7200 >7200 >7200 >7200 4726.02 5001.32 3670.28
PIV10 41.81 0.94 458.31 303.92 >7200 38.65 >7200 14.60 113.69 >7200 >7200 14.61 7.33 0.03 1.31 >7200 13.35 4.96 0.29 0.08 0.43 0.33 12.84 9.92 0.45 >7200 0.71 49.00 1.00 >7200 >7200 >7200 >7200
XPcov (14.29) 1 746 3990 (11.59) 95530 (89.43) 10977 15313 (0.01) (43.26) 794 48 1 768 (3.56) 297 80 5862 6 13 12 107 109 15 (40.03) 34 7008 848 (100.00) (100.00) (51.14) (96.49)
XPgom (71.43) 1 1137 3990 (11.59) 104153 (91.07) 6175 15528 (0.01) (43.18) 632 40 1 623 (3.56) 297 101 4430 6 3 5 117 167 171 (38.15) 23 3621 1910 (100.00) (100.00) (50.08) (87.80)
Nodes / (Final gap (%)) CGLP2 CGLP10 PIV2 756 1145 1033 1 1 1 649 481 2010 5658 2822 3533 (16.22) (16.22) (18.54) 96868 102633 98151 (91.27) NA (90.98) 11273 7853 8656 17242 19609 17435 0 0 0 (32.31) (33.22) (28.76) 638 557 438 45 36 42 1 1 1 12646 8923 4657 (13.16) (13.16) (5.72) 426 463 435 100 99 74 5573 5852 5610 6 6 6 1 1 1 3 3 5 92 42 139 56 39 55 29 23 19 (52.71) (47.29) (58.43) 25 19 12 5548 7052 16721 704 2436 547 (100.00) (100.00) (100.00) (100.00) (100.00) (100.00) (50.39) (50.30) (51.47) 1589557 1636138 1462918
PIV10 75 1 1179 2489 (23.17) 99193 (97.49) 11682 17652 0 (30.60) 263 63 1 4875 (6.90) 530 189 3832 6 1 5 27 33 66 (54.97) 16 5347 2231 (100.00) (100.00) (50.68) (87.47)
Table 2.5: Solution time with a 2 hour time limit and number of nodes (or final lower bound in parenthesis if the run did not finish).
Problem
61
misc03 misc06 misc07 mitre mkc mod008 mod010 mod011 modglob noswot nw04 p0033 p0201 p0282 p0548 p2756 pk1 pp08a pp08aCUTS qiu qnet1 qnet1 o rentacar rgn rout set1ch seymour stein27 stein45 swath vpm1 vpm2
XPcov 0.66 0.30 89.01 1.33 >7200 1.21 2.14 321.26 0.38 >7200 >7200 0.03 1.01 0.21 0.17 1.09 553.27 2.08 1.99 465.73 0.88 0.78 3.44 0.08 >7200 >7200 >7200 1.09 45.57 >7200 0.07 3.43
XPgom 0.79 0.21 107.34 1.36 >7200 1.56 0.44 324.01 0.30 >7200 >7200 0.02 0.92 0.27 0.15 0.83 530.66 2.26 2.02 486.99 1.19 1.01 3.41 0.08 >7200 >7200 >7200 1.11 41.90 >7200 0.07 5.73
Solution time (sec.) CGLP2 CGLP10 1.53 4.13 1.30 1.83 153.88 181.39 2.03 1.79 >7200 >7200 1.27 2.71 3.80 27.74 343.43 342.70 1.34 6.92 >7200 >7200 121.01 120.38 0.12 0.43 2.00 9.21 1.22 5.51 0.58 0.79 2.14 5.40 871.56 1279.30 4.44 18.65 6.07 26.95 466.39 749.55 10.71 21.98 10.85 63.38 3.22 3.26 0.10 0.10 >7200 >7200 >7200 >7200 >7200 >7200 1.44 2.10 79.83 68.27 >7200 >7200 0.11 0.08 8.58 31.26
PIV2 1.12 0.42 141.26 1.43 >7200 0.29 0.81 342.88 0.45 >7200 110.36 0.04 1.37 0.81 0.45 1.21 601.56 4.13 5.55 846.90 4.60 4.06 3.40 0.08 >7200 >7200 >7200 1.43 52.84 >7200 0.09 4.21
PIV10 3.47 0.55 190.48 1.54 >7200 0.59 0.98 355.02 0.76 >7200 117.66 0.14 7.94 3.48 0.50 4.60 644.08 15.93 22.88 2196.38 30.82 31.61 3.49 0.11 >7200 >7200 >7200 1.43 58.72 >7200 0.10 12.48
XPcov 741 87 60403 1 0 2907 835 3012 201 (100.00) NA 91 1139 66 73 396 548211 1119 825 37409 31 31 14 1 (49.88) (28.24) (14.24) 4738 78520 NA 6 3553
XPgom 813 9 69094 1 (22.41) 5254 1 2683 114 (100.00) NA 32 805 277 23 175 532178 1304 704 34707 33 49 14 1 (48.89) (62.41) (24.60) 4561 70479 NA 6 5870
Nodes / (Final gap (%) CGLP2 CGLP10 PIV2 714 649 928 9 3 5 59766 55197 67892 1 1 1 (6.18) (7.07) (14.85) 2771 9046 575 299 307 1 2460 2460 2460 77 224 204 (100.00) (100.00) (100.00) 344 344 344 39 122 29 796 591 810 124 102 36 22 1 1 568 327 158 658286 876860 550300 1396 2281 1324 752 1011 1022 21418 24916 39148 50 1 11 68 33 18 19 19 18 1 1 1 (48.76) (46.60) (47.90) (54.08) (75.61) (54.74) (18.45) (26.77) (24.01) 4954 5413 4701 107040 96644 72966 NA NA NA 6 6 6 6751 10460 2895
PIV10 478 3 49659 1 (35.32) 1482 1 2460 208 (100.00) 344 33 850 508 1 403 526074 1560 1410 34704 9 23 18 1 (45.14) (92.82) (32.40) 4531 81708 NA 6 4107
Table 2.6: Solution time with a 2 hour time limit and number of nodes (or final lower bound in parenthesis if the run did not finish). (continued)
Problems solved Solution time - Average Solution time - Geo. Mean Nodes - Average Nodes - Geo. Mean Cutting time - Average Cutting time - Geo. Mean Root gap closed - Average
XPcov 47 49.46 3.37 18871.68 326.75 7.80 1.42 16.92
XPgom 47 52.03 3.39 18559.96 262.03 8.11 1.48 34.75
CGLP2 50 71.54 5.54 21829.17 293.38 26.50 3.33 39.77
CGLP10 50 111.53 10.84 26502.68 253.30 82.68 8.86 48.06
PIV2 50 74.95 4.38 19235.28 206.16 16.89 2.27 44.18
PIV10 49 98.61 7.32 18296.15 223.58 41.17 4.90 54.65
Table 2.7: Summary of results. is currently implemented using the optimizer library interface in XPRESS and does as such not have access to the internal data structures which necessarily degrades performance. It is also not quite fair to compare cutting time of the internal Gomory cuts against our cuts since many of the internal Gomory cuts are rejected even before they are fully created.
2.11
Earlier Computational Results on Lift-andProject Cuts in the Litterature.
The apparent weak computational strength of the lift-and-project cuts in the previous section is in stark contrast to the positive results reported on in the litterature, in particular in [7] and later in [19]. In the following we attempt to analyze these existing results with an aim towards explaining the small effect we experienced with lift-and-project.
2.11.1
Balas, Ceria and Cornu´ ejols, 1996
The paper [7] was the first with a computational study of the benefits of using lift-andproject cuts with branch-and-cut for solving mixed integer problems. The authors showed how using lift-and-project cuts helped them create an efficient solver that performed as well, or better, than the commercial solvers available at the time. Since 1996 the commercial solvers have improved significantly, and some of the improvements are due to the use of cutting planes before the branch-and-bound search is started. The question is whether lift-and-project can perform as well with a stateof-the-art solver as it did in 1996. This paper also provided the first comparison between the use of lift-and-project cuts and Gomory’s mixed integer cuts. Since todays commercial solvers apply these cuts, it is worth a second look. We will focus on the comparison performed in Table 4 of [7], where Gomory’s mixed integer cuts (labelled ”Not Reoptimized”) are compared against lift-and-project 62
Not Reoptimized
Problem BM23 CTN2 EGOUT FXCH3 MISC05 MODGLOB MOD008 P0033 P0201 P0282 P0291 P2756 SCPC2S SET1AL TSP43 VPM1
Semireoptimized
Fully Reoptimized
# of Cuts
Gap Closed
CPU Time
# of Cuts
Gap Closed
CPU Time
# of Cuts
Gap Closed
CPU Time
58 329 133 244 170 443 39 51 512 206 97 184 556 278 423 345
27.9% 73.6% 99.0% 44.3% 7.7% 67.0% 33.5% 68.9% 24.6% 23.4% 90.5% 3.3% 12.2% 99.9% 55.6% 23.6%
4 92 20 40 30 2,254 6 4 471 15 7 42 534 68 308 89
109 325 93 294 197 477 115 136 548 346 122 448 1,465 261 380 301
36.5% 93.4% 100% 76.7% 17.2% 87.8% 52.1% 74.6% 63.5% 94.7% 98.5% 97.6% 47.5% 99.9% 55.6% 72.7%
5 141 12 134 58 1,412 8 7 408 71 5 80 10,551 99 257 95
103 334 77 291 244 412 106 123 654 340 122 498 1,132 257 709 300
39.2% 95.5% 100% 88.8% 12.4% 96.6% 43.0% 72.9% 59.8% 94.1% 98.8% 90.3% 60.5% 99.9% 62.2% 72.3%
8 240 15 205 284 14,498 10 7 1,714 125 8 163 15,171 113 2,782 211
Table 2.8: Comparison of Three Types of Cuts. Original data from [7] using CPLEX 2.1. cuts from a relaxed cut generating linear program (CGLP) (labelled ”Semireoptimized”) and from a full CGLP (labelled ”Fully reoptimized”). The difference is in which of the constraints of the mixed 0/1 program are included in the CGLP. In the first formulation only constraints with non-basic slack are included. In the second formulation all bounds are added, and in the final formulation all constraints and bounds are included. This Table 4 is reproduced here as Table 2.8. In each test 10 rounds of cuts were generated, with one cut for each fractional variable. The numbers reported is the total number of cuts, the percentage the gap between the LP solution and the optimal integer solution was closed and the CPU time (in seconds) to generate the cuts. These computational test were done on a HP720 workstation, using CPLEX version 2.1 to solve all LPs. This table shows a clear improvement in using Liftand-Project cuts (”Semireoptimized” and ”Fully reoptimized”) over Gomory’s mixed integer cuts (”Not Reoptimized”). It is interesting to observe how these results change when using a state-of-theart solver. For comparison we will use version 14.05 of the XPRESS optimization library and our own implementation of the lift-and-project cuts, utilizing the callback functionality of XPRESS. These lift-and-project cuts are generated in a similar fashion to those in Table 2.8, by solving an α-normalized CGLP in a subspace followed by lifting and strengthening of the solution cut. The main difference between XPRESS 14.05 and CPLEX 2.1, as far as the tightening of the LP relaxation is concerned, is that XPRESS applies up to 20 rounds of 63
Not Reoptimized
Problem BM23 CTN2 EGOUT FXCH3 MISC05 MODGLOB MOD008 P0033 P0201 P0282 P0291 P2756 SCPC2S SET1AL TSP43 VPM1
Semireoptimized
Fully Reoptimized
# of Cuts
Gap Closed
CPU Time
# of Cuts
Gap Closed
CPU Time
# of Cuts
Gap Closed
CPU Time
27 104 37 95 82 66 14 39 117 60 30 228 125 206 135 55
28.63% 45.41% 98.50% 41.33% 22.71% 44.60% 22.90% 41.44% 43.14% 12.06% 89.64% 97.94% 11.05% 99.98% 85.11% 48.82%
0.14 1.51 0.30 1.11 1.25 1.67 0.17 0.21 3.16 0.62 0.21 4.87 13.85 2.84 10.48 0.62
42 142 59 108 87 0 34 43 259 130 48 297 626 205 118 66
39.51% 78.49% 100.00% 76.67% 48.56% 0.00% 40.49% 43.38% 24.70% 93.58% 97.81% 98.86% 45.78% 99.96% 30.78% 71.96%
0.31 18.50 0.46 6.16 6.30 0.44 0.45 0.45 24.63 2.45 0.57 12.13 248.54 9.00 30.29 1.94
42 114 34 96 85 0 32 46 295 126 38 309 654 205 125 96
39.58% 91.25% 100.00% 82.82% 17.18% 0.00% 46.13% 43.38% 48.12% 94.51% 98.31% 99.43% 62.86% 99.97% 27.78% 55.15%
0.45 34.15 0.47 14.74 23.31 0.47 0.42 0.42 162.24 3.36 0.75 21.25 2348.99 8.95 63.22 3.27
Table 2.9: Comparison of Three Types of Cuts. Using XPRESS 14.05 without ”cover” cuts. various classes of cuts, referred to as ”cover” cuts in XPRESS, that are all derived from one or a few of the LP constraints (lifted cover cuts, knapsack cuts, mixed integer rounding cuts, etc.). On most problems these cuts alone will close the gap significantly. In tables 2.9 and 2.10 we report on the new results obtained using XPRESS 14.05, with and without the ”cover” cuts, respectively. These tests were done on a 1.5GHz Pentium-4 computer. For the problem MODGLOB, all of the lift-and-project cuts were rejected for having too large variation in the cut coefficients. Such cuts are discarded to avoid making the linear program unstable. Without the ”cover” cuts, the gap closed for the three variations are similar to the original numbers reported in [7]. Once we add the ”cover” cuts the numbers change drastically. Except for three problems, these cuts close the gap significantly, and in 9 out of the 16 instances the gap is closed by more than 90%. It should also be noted that for all of the problems examined here, the time to solve each 0-1 problem to optimality (using XPRESS with default settings) is less than the time to generate the 10 rounds of lift-and-project cuts. Thus, the efficient branch-and-bound code of XPRESS is faster at closing the integrality gap than liftand-project cuts, for these small and easy problems. The conclusion from this must be that for easy problems it is best not to generate lift-and-project cuts, due to the effectiveness of cheap cuts like XPRESS’ ”cover”, in
64
Cov. Cuts
Problem BM23 CTN2 EGOUT FXCH3 MISC05 MODGLOB MOD008 P0033 P0201 P0282 P0291 P2756 SCPC2S SET1AL TSP43 VPM1
Not Reoptimized
Semireoptimized
Fully Reoptimized
Gap Closed
# of Cuts
Gap Closed
CPU Time
# of Cuts
Gap Closed
CPU Time
# of Cuts
Gap Closed
CPU Time
17.88% 99.00% 100.00% 91.80% 50.78% 94.05% 22.48% 45.36% 0.00% 97.76% 97.25% 98.37% 0.00% 99.95% 0.00% 100.00%
28 110 0 77 24 14 32 30 58 59 38 21 125 6 135 0
25.24% 100.00% 100.00% 94.29% 50.78% 95.48% 45.50% 76.05% 28.43% 98.42% 99.74% 99.58% 11.05% 99.97% 85.11% 100.00%
0.15 2.56 0.00 2.09 1.01 0.38 0.58 0.21 2.49 1.94 0.48 1.31 13.71 0.56 10.52 0.00
55 35 0 114 65 1 59 60 263 182 67 42 626 8 118 0
40.30% 100.00% 100.00% 96.51% 51.83% 94.09% 52.69% 96.67% 34.66% 99.06% 100.00% 99.73% 45.78% 99.97% 30.78% 100.00%
0.41 2.13 0.00 14.09 6.32 2.25 0.82 0.52 24.78 8.08 0.60 3.82 252.61 1.23 30.50 0.00
61 56 0 104 38 1 66 66 290 226 78 37 654 8 125 0
45.98% 100.00% 100.00% 98.71% 53.06% 94.09% 60.29% 89.32% 45.85% 98.90% 100.00% 99.79% 62.86% 99.97% 27.78% 100.00%
0.75 4.85 0.00 36.77 10.06 2.02 0.98 0.64 136.90 32.84 0.72 5.43 2380.67 1.47 63.46 0.00
Table 2.10: Comparison of Three Types of Cuts. Using XPRESS 14.05 with ”cover” cuts. combination with an efficient branch-and-bound code. The tables do show a clear trend that 10 rounds of lift-and-project cuts is able to close more of the integrality gap than 10 rounds of Gomory’s mixed integer cuts. It is still plausible that on hard mixed integer problems several rounds of lift-and-project cuts can speed up the solution time.
65
2.11.2
Ceria and Pataki, 1998
A more resent computational study of lift-and-project is the paper [19]. The authors concentrate on a subset of the MIPLIB3 library of mixed integer linear programs, that they have identified as ”hard” problems. By their definition, a problem is classified as ”hard” if the problem is not solved by CPLEX version 5.0 within 1000 seconds on a 167MHz Sun Enterprise 4000 server. The main differences between the lift-and-project implementation in [19] and the original one in [7] are: Variable selection The integer variables used in [7] to construct the lift-and-project CGLP from were selected as those whose fractional part was closest to 0.5. In [19] a lot more effort is spent on identifying variables that will lead to strong cuts. A candidate list of the 150 variables whose fractional part is closest to 0.5 is created. From this candidate list the 50 most promising variables are selected using the CPLEX strongbranch procedure. Normalization The normalization constriant imposed in [19] bounds the sum of the auxilliary variables in the CGLP (normalization (iii) of chapter 1). In [7] the normalization bounded the 1-norm on the cut coefficients. Extra cuts By zeroing out certain non-zero multipliers in the optimal CGLP solution and then reoptimizing, the code in [19] generates additional cuts from each CGLP. Branch-and-Cut vs. Cut-and-Branch In [19] the problems are solved by Cutand-Branch, i.e., the mixed integer program is first strengthened by several rounds of cuts and then the strengthened formulation is solved by pure branchand-bound. The original lift-and-project solver [7], on the other hand, solves mixed integer programs by branch-and-cut, where a round of cuts is generated for every k’th search node. k is a problem dependent parameter from 1 to 32. In Table 2 of [19] (reproduced here as Table 2.11) the authors compare the effect of strengthening the mixed integer programs with 0, 2 or 5 rounds of lift-and-project cuts before solving the resulting strengthened mixed integer program using CPLEX 5.0 with its default settings. These 17 problems are divided into three groups depending on whether the lift-and-project cuts 1) reduced the run time significantly, 2) had no significant impact on the run time, or 3) resulted in worse run time. In Table 2.12 we attempt to recreate the comparison of Table 2.11 but using version 14.05 of XPRESS and our own implementation of lift-and-project. These tests were carried out on a 2GHz Pentium-4 computer. Since XPRESS creates mixed integer Gomory cuts, we ran XPRESS both without and with Gomory cuts. Next we ran XPRESS without Gomory cuts but with either two or five rounds of lift-and-project cuts. In all the tests we used default settings which lets XPRESS create up to 20 66
Problem 10 teams air04 air05 gesa2 gesa2 o modglob pp08a pp08aCUTS vpm2 harp2 misc07 p6000 qiu arki001 mod011 pk1 rout
0 rounds Time Nodes 5404 2265 2401 146 1728 326 9919 86522 12495 111264 * * * * 50791 1517658 8138 481972 14804 57350 2950 15378 1115 2911 35290 27458 6994 21814 22344 18935 3903 130413 19467 133075
2 rounds Time Nodes 1274 306 1536 110 1411 141 3407 22601 4123 28241 10033 267015 1924 47275 277 3801 1911 63282 10686 28477 2910 12910 1213 2896 15389 10280 18440 68476 63481 240990 5094 122728 26542 155478
5 rounds Time Nodes 5747 1034 5084 120 4099 213 1721 6464 668 4739 435 5623 178 1470 134 607 974 18267 13377 31342 4133 14880 805 1254 27691 15239 13642 12536 * * 6960 150243 40902 190531
Table 2.11: Computational results for cut-and-branch using CPLEX 5.0 The authors do not describe the reason behind the lack of the numbers marked * rounds of ”cover” cuts. These cuts are created before any Gomory or lift-and-project cuts. The problem p6000 is no longer part of the MIPLIB test set and was therefore excluded from our comparison. The problem pp08aCUTS did not differ significantly from pp08a and is excluded as well. The main differences in how lift-and-projects cuts are generated in [19] and in our code are: Variable selection We first create a candidate list of Gomory cuts from the 150 most fractional variables, and among these I select the 50 deepest cuts (as measured by the Euclidian distance from the cutting plane to the LP relaxation solution). The basic variables from which these 50 Gomory cuts were created are then used to generate the lift-and-project cuts. Extra cuts Only one cut is created for each CGLP in our implementation. Out of the 9 problems in Table 2.11 for which lift-and-project cuts was shown to work well in [19], 6 of those are now very easy to solve thanks to the ”cover” cuts. Only 10teams remains as a problem for which lift-and-project helps significantly, although this problem can also be solved quickly by generating a few extra rounds of Gomory cuts.
67
Problem
10teams air04 air05 gesa2 gesa2 o modglob pp08a vpm2 harp2 misc07 qiu arki001 mod011 pk1 rout
XPRESS no Gom cuts Time Nodes * * 207.28 746 340.36 3,990 0.34 13 0.31 12 0.44 343 1.80 1,823 2.98 3,553 * * 75.27 60,403 308.55 31,794 * * 281.23 16,597 512.89 548,211 * *
XPRESS w/Gom cuts Time Nodes * * 298.25 1,137 341.06 3,990 0.34 3 0.28 5 0.36 127 1.80 1,823 5.83 7,374 * * 90.55 69,094 429.00 38,428 * * 281.72 16,597 492.31 532,178 * *
Lift-and-project 2 rounds Time Nodes 73.77 445 241.24 253 283.23 2,254 0.59 1 0.58 3 1.86 77 3.86 1,170 6.95 6,837 * * 117.73 63,486 334.38 21,108 * * 197.91 5,538 598.42 555,733 * *
Lift-and-project 5 rounds Time Nodes 166.88 306 519.69 261 596.08 4,201 0.55 1 0.58 3 4.47 336 6.05 1,071 12.45 7,225 * * 121.63 58,034 474.92 19,632 * * 239.58 5,119 665.89 617,420 * *
Table 2.12: Computational results for cut-and-branch using XPRESS 14.05. A time limit of 1800 seconds was imposed. The hard hard problems Out of the 66 problems in MIPLIB3, Ceria and Pataki encountered five problems they could not solve without or with (at most 5 rounds of) lift-and-project cuts. These are danoint, dano3mip, noswot, set1ch and seymour. With the combination of the XPRESS ”cover” cuts and lift-and-project cuts, the formerly very hard problem set1ch now becomes a very easy problem. As Ceria and Pataki observes, five rounds of lift-and-project cuts is not sufficient for this problem, but if we add 15 rounds of cuts, the gap is closed almost completely. The seymour problem on the other hand remains hard to solve. Figure 2.3 plots the progress of the LP relaxation with either half an hour of lift-and-project cuts (either from the CGLP or by pivoting), half an hour of branch-and-bound or Gomory cuts (no further improvement was possible with Gomory cuts after 25 rounds/50 seconds). For this problem lift-and-project cuts does help to close the integrality gap better than a comparative amount of branch-and-bound search, but there is still a sizeable gap remaining to the optimal solution value of 423. Only recently [25] was this value shown to be optimal by using a combination of lift-and-project cuts and strong parallel computers. The noswot problem is special in that it contains a constraint that restrict the LP objective to be no less than the integer optimum of -43. Thus, no amount of cutting will be able to tighten the LP objective value for this problem. It is more of a test of 68
Seymour
411
branch-and-bound Gomory cuts Pivots CGLP
410
409
Objective
408
407
406
405
404
403 0
200
400
600
800 1000 Time (secs)
1200
1400
1600
1800
Figure 2.3: LP objective versus time for classic and pivot based lift-and-project cuts, Gomory cuts and branch-and-bound.
how fast the primal heuristic of the solver can find the optimal solution. danoint and dano3mip remains very hard problems for which none of the cuts appear able to close the gap by any significant amount.
2.11.3
Conclusion
We have seen how the presence of simple cuts such as lifted cover cuts, knapsack cuts, mixed integer rounding cuts, etc. are sufficient to turn many former hard problems into very easy problems. Among the problem instances that Balas et al. [7] studied in 1996, all of them can today be solved in less time than it takes to create lift-andproject cuts for them. Thus those instances should no longer be considered when studying lift-and-project cuts. In their study, Ceria and Pataki [19] focused on the MIPLIB3 [15] public set of problem instances. They identified nine instances where lift-and-project cuts worked well on, but if the cheap cuts are applied first, six of those instances becomes very easy. out of the remaining three, lift-and-project cuts does improve performance significantly on one of them. Besides these nine instances, they also identified five very hard instances they could not solve with any method. One, the seymour instance was only solved recently [25] due in part to lift-and-project cuts. Another instance, set1ch, is easily solved only if several rounds of lift-and-project cuts are created first. For the remaining three very hard instances, either it is a problem of finding the integer solution (noswot) or the problem appears unsolvable (danoint and dano3mip). If we can conclude anything from these experiments it is that there is a lack of good public problem instances for experimentation. The lift-and-project cuts no longer 69
provide the clear-cut improvements due to competition from cheaper cuts, but there are still instances where lift-and-project cuts can reduce solution time drastically. Unfortunately, the amount of public “interesting” problem instances available is too small to draw any conclusion.
2.12
Analyzing the root node lift-and-project cuts.
Here we attempt to examine how come we often does not observe a reducting in the number of branch-and-bound nodes even though we tighten the root bound. What might happen is that the root cuts are ”locallized” to the top nodes of the search tree. Once we have branched a few times the cuts might no longer be tight and therefore not affect any further branching choices. In such a case the cuts only affect the initial branching choices, but it is not clear that adding cuts should results in better choices. This is the motivation behind the following experiments. We create five rounds of cuts using our pivot based algorithm within XPRESS and then measure how often the cuts are tight on the nodes of the subsequent branch-and-bound search. We measure the angle of the cuts to the objective function, since a cut that is almost parallel to the objective function will almost certainly stop being tight after a few branches. The column named ”Gap closed” show how much of the gap between the LP relaxation value (including XPRESS ”cover” cuts) and the optimal solution value is closed using our cuts. A value of 1.0 means that the gap was closed completely. The column ”Iterations” is the average number of pivots we performed to create the cuts. The results of these experiments on the MIPLIB set of problems are tabulated in Table 2.13. The column labelled ”Fraction tight cuts” is the average fraction of the root cuts that were tight at the nodes of the branch-and-bound search. For each node we counted how many cuts had a non-basic slack in the LP solution and it is this total count divided by the number of nodes and number of created cuts that give ”Fraction tight cuts”. Finally, the column ”Ratio nodes” is calculated as the number of nodes required to solve the problem with cuts over the number of nodes required by XPRcov . These numbers unfortunately does not quite comfirm our idea that it is when the cuts are tight on most nodes that we get a reduction in the number of nodes. bell3a, fiber, fixnet6 and p2756 are all problems where we close a significant amount of the gap and where many of the cuts remain tight throughout the branch-and-bound search. But each of these problems require more nodes to solve if we add our cuts. In Figure 2.4 we plot the number of nodes and the fraction of cuts that are tight at the various depths of the branch and bound search for two problems. As we have mentioned before, lift-and-project cuts seem to work particularly well on the seymour problem, and this is also corroborated by this graph. Not only do we manage to raise the root bound with our cuts, but those cuts remain tight through the branch-andbound search. Note that we ran this experiment with a time limit of 30 minutes, which is why these graphs only show a truncated search. For the mas76 problem the 70
71
Problem 10teams air03 air04 air05 arki001 bell3a bell5 blend2 cap6000 dano3mip danoint dcmulti dsbmip egout enigma fast0507 fiber fixnet6 flugpl gen gesa2 gesa2 o gesa3 gesa3 o gt2 harp2 khb05250 l152lav lseu markshare1 markshare2 mas74 mas76
Iterations 3.608 0.000 3.232 4.184 0.619 0.636 1.700 18.967 4.800 0.710 18.865 3.351 5.583 0.000 1.000 3.106 6.233 0.933 0.000 0.000 2.000 1.250 3.182 6.038 3.400 4.412 0.778 0.019 3.136 1.917 4.875 3.167 2.636
Angle 78.440 * 63.013 52.625 85.031 81.202 82.065 73.236 58.951 89.861 88.650 78.272 89.868 * 77.976 29.744 69.366 73.177 59.328 * 89.623 89.798 70.998 88.631 67.543 83.482 53.688 66.738 46.810 68.365 73.985 80.990 81.823
Density 0.953 * 0.893 0.913 0.146 0.178 0.154 0.547 0.937 0.456 0.866 0.262 0.111 * 0.650 0.968 0.300 0.565 0.453 * 0.021 0.012 0.137 0.096 0.642 0.481 0.451 0.780 0.785 0.725 0.736 0.998 0.996
Fraction tight cuts 0.031 1.000 0.158 0.097 0.420 0.653 0.238 0.164 0.003 0.203 0.141 0.429 0.533 1.000 0.001 0.187 0.373 0.397 0.009 0.764 0.885 0.852 0.478 0.705 0.216 0.192 0.731 0.037 0.219 0.000 0.000 0.000 0.004
Gap closed 1.000 1.000 0.346 0.137 0.243 0.418 0.923 0.035 0.490 0.002 0.003 0.795 1.000 1.000 1.000 0.063 0.480 0.359 0.128 1.000 1.000 0.657 0.742 0.920 0.640 0.422 0.926 0.486 0.375 1.000 1.000 0.063 0.067
Ratio nodes 0.008 1.000 2.728 1.166 1.038 1.064 1.153
0.212 1.313 1.000 6.348 2.205 2.363 0.654 1.000 0.077 0.417 0.327 0.303 4.400 0.471 0.896 2.631
Problem misc03 misc06 misc07 mitre mkc mod008 mod010 mod011 modglob noswot nw04 p0033 p0201 p0282 p0548 p2756 pk1 pp08a pp08aCUTS qiu qnet1 qnet1 o rentacar rgn rout set1ch seymour stein27 stein45 swath vpm1 vpm2
Iterations 1.063 4.455 2.357 2.000 1.256 6.909 0.000 0.000 0.211 3.365 8.963 0.235 1.875 9.673 18.333 16.500 1.176 14.081 16.655 20.000 10.337 4.930 0.000 0.000 12.071 0.763 0.226 0.048 1.629 0.372 0.000 4.093
Angle 50.036 69.170 57.555 88.123 69.389 18.442 68.576 * 69.799 82.199 73.252 67.598 44.847 40.909 84.935 78.438 87.455 58.264 49.686 51.942 78.323 76.654 88.741 * 78.640 82.275 62.327 38.093 43.498 68.902 * 72.716
Density 0.577 0.078 0.498 0.015 0.232 0.943 0.525 * 0.109 0.331 0.684 0.515 0.703 0.274 0.339 0.291 0.653 0.534 0.741 0.700 0.682 0.777 0.295 * 0.482 0.031 0.513 0.795 0.762 0.302 * 0.290
Table 2.13: Statistics about the created lift-and-project cuts.
Fraction tight cuts 0.081 0.864 0.043 0.985 0.156 0.065 0.082 0.481 0.662 0.235 0.938 0.479 0.091 0.275 1.000 0.889 0.000 0.529 0.398 0.000 0.495 0.337 0.639 0.750 0.126 0.465 0.373 0.014 0.000 0.088 0.807 0.343
Gap closed 0.221 1.014 0.047 1.000 0.371 0.902 1.000 0.000 0.397 1.000 1.000 0.967 0.453 0.575 1.000 0.851 0.000 0.382 0.332 0.096 0.980 0.815 0.181 0.000 0.099 0.827 0.206 0.000 0.000 0.064 1.000 0.360
Ratio nodes 0.645 0.034 0.822 1.000 0.510 0.001 0.817 1.035
0.363 0.583 7.697 0.014 1.018 0.960 1.394 1.709 0.928 0.226 1.258 1.286 1.000
0.956 1.041 1.060 1.000 1.156
cuts quickly stop being tight and during the vast majority of the branch-and-bound search, none of the cuts have any effect. It is therefore surprising that we were able to solve this problem with lift-and-project cuts but not without. This must be the case where the cuts lead to better branching choices at the top nodes which affects all of the subsequent choices.
72
seymour
80
0.8 nodes tight cuts
70
0.7
60
50 Nodes
0.5 40 0.4 30
Fraction of cuts tight
0.6
0.3 20
0.2
10
0 0
10
20
30
40
50 Depth
60
70
80
90
0.1 100
mas76
90000
1 nodes tight cuts
80000 0.8
70000
Nodes
0.6 50000
40000 0.4
Fraction of cuts tight
60000
30000
20000
0.2
10000
0
0 0
10
20
30 Depth
40
50
60
Figure 2.4: Number of nodes processed and the fraction of the root cuts tight at each node at the various depth of the branch and bound search for the seymour and the mas76 problems.
73
Chapter 3 Generating Cuts from Multiple-Term Disjunctions
The work in this chapter has been published as Generating cuts from Multiple-Term Disjunctions by E. Balas and M. Perregaard in K. Aardal and B. Gerards (editors), Proceedings of IPCO VIII, Lecture Notes in Computer Science 2081 (2001), 348-360.
74
3.1
Introduction
We have seen in Chapter 1 that the traditional lift-and-project approach formulates and solves a block-angular higher dimensional Cut Generating Linear Program (CGLP). Each block of the CGLP corresponds to the constraints of one term in the underlying disjunction. In the standard case there are two blocks, but if a cut is generated by the lift-and-project approach from imposing the 0-1 condition on e.g. three 0-1 variables, there are 8 ways to assign 0-1 values to the variables and hence the underlying disjunction will have 8 terms. In principle, the dimension of the CGLP grows as 2k , where k is the number of 0-1 variables on which the 0-1 condition is imposed. Thus, as the complexity of the disjunctions used for cut generation increases, the time to solve a CGLP quickly becomes prohibitive. In Chapter 2 we have shown how lift-and-project cuts from a two-term disjunction can be solved directly in the simplex tableau of the LP relaxation, thereby bypassing the whole higher-dimensional linear program. This result, however, only applies to cuts from two-term disjunctions. The focus of the present chapter is on cuts derived from stronger disjunctions, typically involving at least 3-4 integer variables. We cannot get around the exponential growth in generating optimal disjunctive cuts as we impose the 0-1 condition on more variables, because the problem in itself is N P-hard. However, as we claim to show here, there is a more efficient way to generate cuts from stronger disjunctions, than by solving the corresponding CGLP in the standard fashion. We start again with a mixed integer program of the form min cx s.t. Ax ≥ b xj ∈ Z for j ∈ I
(MIP)
where A is an m×n matrix and here we assume that Ax ≥ b subsumes the constraints x ≥ 0 (if present). The LP-relaxation of MIP is min cx s.t. Ax ≥ b
(LP)
To generate one or more cuts we will consider a disjunctive relaxation of MIP of the form min cx s.t. Ax (DP) W ≥bq q D x ≥ d q∈Q
where each term of the disjunction imposes integrality on several xj , j ∈ I, and no feasible solution to MIP is excluded. We will use PIP , PLP and PDP to denote the feasible regions of MIP, LP and DP, respectively, and we have PIP ⊂ PDP ⊂ PLP . We will further let xIP , xLP and xDP denote an optimal solution to each of the three programs. 75
Our objective is to determine one or more cutting planes of the form αx ≥ β which are valid for PDP but cut off parts of PLP , and in particular the point xLP . From Chapter 1 we have that an inequality αx ≥ β is valid for PDP (see also [2, 4]) if and only if there exist non-negative multipliers uq and v q for each term q ∈ Q such that α = uq A + v q D q β = u q b + v q dq ∀q ∈ Q (3.1) uq ≥ 0, v q ≥ 0 In order to generate cuts, we truncate the higher-dimensional cone defined by (3.1) with the normalization constraint αy = 1, where y ∈ Rn is the constant vector introduced in Chapter 1. To obtain a cut that is maximally violated by the point xLP , we minimize the objective function αxLP − β over the above set of constraints. The result is the lift-and-project Cut Generating Linear Program: min αxLP − β s.t. α = uq A + v q D q β = u q b + v q dq ∀q ∈ Q (CGLP)Q q q u ≥ 0, v ≥ 0 αy = 1 W If we replace the generic disjunction q∈Q D q x ≥ dq with a simple two-term disjunction (xk ≤ 0) ∨ (xk ≥ 0) we recover the CGLP used in [6, 7] (although with the normalization constraint introduced in Chapter 1).
3.2
An Iterative Approach to Cut Generating
An alternative formulation of the cut generating linear program is to define it as optimization over the reverse polar cone of PDP , which after normalization yields the problem min αxLP − β s.t. αx ≥ β ∀x ∈ PDP (3.2) αy = 1 In principle we have here a constraint in α and β for each x ∈ PDP , an infinite set. However, conv(PDP ) is defined by its set of extreme points and extreme rays, so only constraints of (3.2) associated with these are necessary. Thus (3.2) is equivalent to min αxLP − β s.t. αxi ≥ β αxi ≥ 0 αy = 1
76
∀xi ∈ S ∀xi ∈ R
(3.3)
where S and R are the sets of extreme points and (directions of) extreme rays of conv(PDP ). Although we do not know S and R a priori, we can solve (3.3) iteratively and generate the extreme points and rays as needed. In other words, we propose is solve (3.3) by row generation. Suppose we have S1 ⊂ S and R1 ⊂ R. If we solve min αxLP − β s.t. αxi − β ≥ 0 ∀xi ∈ S1 αxi ≥ 0 ∀xi ∈ R1 αy = 1
(3.4)
we obtain an inequality α1 x ≥ β 1 valid for all extreme points in S1 and all extreme rays in R1 , but not necessarily for those in S and R. To check if α1 x ≥ β 1 is valid for all of conv(PDP ), we can solve a linear program of the form min α1 x s.t. x ∈ conv(PDP )
(3.5)
If (3.5) is bounded and x1 is an optimal, extreme solution with α1 x1 < β 1 , then x1 is a point from S \ S1 that violates the current inequality α1 x ≥ β 1 . Hence we can replace S1 by the larger set S2 = S1 ∪ {x1 }. If (3.5) is unbounded then we can find an extreme ray of (3.5) with a direction vector x1 such that α1 x1 < 0, which we can add to R1 to obtain a larger set R2 = R1 ∪ {x1 }. We then repeat this with the new sets S2 and R2 to obtain a new inequality α2 x ≥ β 2 , and keep repeating until in some iteration k we obtain an optimal solution xk to (3.5) which satisfies αk x ≥ β k for the last solution (αk , β k ) to (3.4). This solution demonstrates that we have found a valid inequality. Since the xi we obtain from solving (3.5) are extreme points or rays of PDP , the finiteness of S and R guarantees that the process will terminate. The procedure is outlined in Figure 3.1. In the following we will refer to the problem in Step 2 as the master problem and to the problem in Step 3 as the separation problem. The procedure described here is isomorphic to applying Benders’ decomposition to (CGLP)Q . So far we have not considered the possibility that the master problem in Step 2 of Figure 3.1 could be unbounded. This is where we need a certain property of the normalization αy = 1. If we choose y = xLP − x∗ , where x∗ ∈ conv(S1 ), then the master problem will always be bounded (see [11]). The iterative procedure of Figure 3.1 can be modified in its Step 3 as follows. Instead of adding to the master problem the inequality corresponding to the extreme point xk that minimizes αk x, i.e. violates the inequality αk x ≥ β k by a maximum amount, we add all the violating extreme points or rays encountered in solving the separation problem. We call this version 2. Since the separation problem is usually solved by solving an LP over each term of the disjunction, version 2 of the iterative procedure does not require more time to solve the separation problem. On the 77
Step 1
Let k = 1, R1 ⊂ R and S1 ⊂ S with S1 6= ∅
Step 2
Let (αk , β k ) be an optimal solution to the master problem: min αxLP − β s.t. αxi − β ≥ 0 ∀xi ∈ Sk αxi ≥ 0 ∀xi ∈ Rk αy = 1
Step 3
Solve the separation problem: min αk x s.t. Ax W ≥bq q q∈Q D x ≥ d
If the problem is bounded, let xk be an optimal solution. If αk xk ≥ β k then go to Step 4. Otherwise, set Sk+1 = Sk ∪ {xk } and Rk+1 = Rk . If the problem is unbounded, let xk be the direction vector of an extreme ray satisfying αk xk < 0. Set Rk+1 = Rk ∪ {xk } and Sk+1 = Sk . Set k ← k + 1 and repeat from Step 2. Step 4
The inequality αk x ≥ β k is a valid inequality for PDP . Stop.
Figure 3.1: Iterative procedure for generating a valid inequality: version 1.
78
other hand, it builds up faster the master problem, but it also creates one with a larger number of constraints. On balance, version 2 seems better (see the section on computational results).
3.3
Generating adjacent extreme points
In this section we consider one approach towards reducing the number of extreme points we need to consider in the separation problem. A reasonable constraint to impose on the cut is to require it to be tight at xDP , an optimal solution to (DP). Suppose we impose this restriction, i.e. add to (3.4) the equation αxDP − β = 0, and (αk , β k ) is the solution to the master problem at iteration k. Then either αk x ≥ β k is a valid inequality for PDP or there exists a vertex adjacent to xDP (or possibly an extreme ray incident with xDP ) in conv(PDP ) which violates αk x ≥ β k . This is an immediate consequence of the convexity of conv(PDP ). It follows from this observation that when searching for a violating extreme point (or ray) in the separation problem of Figure 3.1 we only need to consider extreme points adjacent to xDP (or extreme rays incident with xDP ) in conv(PDP ). We now turn to the problem of identifying the extreme points adjacent to xDP . Consider the disjunctive cone CxDP defined by CxDP = {(x, x0 ) ∈ Rn × R+ | W Ax′ + (AxDP − b)x′0 ≥ 0 q ′ q DP − dq )x′0 ≥ 0 } q∈Q D x + (D x
This cone is obtained from PDP by first translating PDP by −xDP such that xDP is translated into the origin, and then homogenizing the translated polyhedron. The following Theorem, which we state without proof here, gives the desired property. Let cone(CxDP ) be the conical hull (positive hull) of CxDP . Theorem 3.1. Let C be the projection of cone(CxDP ) onto the x-space. Then the extreme rays of the convex cone C are in one-to-one correspondence with the edges of conv(PDP ) incident with xDP . Since any vertex adjacent to xDP in conv(PDP ) by definition shares an edge incident with xDP , the immediate result of this theorem is that we only need to consider the extreme rays of C. The relationship between rays (x′ , x′0 ) of CxDP and points or rays x of PDP is ( ′ x + xDP if x′0 > 0 x′0 x= x′ if x′0 = 0 Let αk x ≥ β k be the current iterate from our procedure. To check if there is a violating point adjacent to xDP , we first need to translate and homogenize the inequality, in accordance with what was done to obtain CxDP . The translation results in the inequality αk x′ ≥ β k − αk xDP , but since we imposed on (3.4) the constraint αxDP = β, 79
the righthand side becomes zero, and the coefficient for x′0 after homogenizing will also be zero; hence we obtain the inequality αk x′ ≥ 0. We can thus state Proposition 3.2. Let x ∈ PDP . αk x < β k if and only if αk x′ < 0 for (x′ , x′0 ) = (x − xDP , 1) ∈ CxDP . To obtain a violating vertex of conv(PDP ) adjacent to xDP we solve the following disjunctive program: min αx ˜ ′ ′ DP s.t. Ax − b)x′0 ≥ 0 W + (Ax q ′ q DP − dq )x′0 ≥ 0 q∈Q D x + (D x k ′ α x = −1 x′0 ≥ 0
(3.6)
We impose the equation αk x′ = −1 both to truncate the cone CxDP and to restrict the feasible set to those solutions that satisfy the condition of Proposition 3.2. Any solution to problem (3.6) corresponds to a violating point in PDP , but to obtain a violating extreme point, we minimize an objective over this set. If we choose α ˜ such ′ that αx ˜ ≥ 0 is valid for CxDP then the problem (3.6) will be bounded. To guarantee that the solution we obtain corresponds to a vertex of PDP adjacent to xDP , we must first project out x′0 , according to Theorem 3.1. This can be done by e.g. applying the Fourier-Motzkin projection method. The size of the resulting set of constraints will depend on the number of constraints present in the system D q x′ ≥ dq , but since in most cases of interest D q x′ ≥ d2 can be replaced with a single constraint, the cost of projecting out x′0 is typically not high.
3.4
How to generate a facet of conv(PDP ) in n iterations
For the iterative procedure in Figure 3.1, we do not have a bound on the number of iterations required to obtain a valid inequality, except the trivial bound which is the total number extreme points and extreme rays of conv(PDP ). In this section we present a method which will find a facet-defining inequality for conv(PDP ) in a number of iterations that only depends on the dimension of the problem. When we do this we can no longer guarantee that the resulting inequality will be optimal in (3.2). The basic idea is to start with an inequality that is known to be valid and supporting for PDP . Finding such an inequality should not pose a problem. Using xDP we can easily give such an inequality: cx ≥ cxDP . Then through a sequence of rotations we turn this inequality into a facet-defining inequality for PDP . Each rotation will be chosen such that the new cut will be tight at one more vertex of PDP than the previous cut. 80
x2 a1 H1 P2
x1
H2 x3 H3
P1
a2
Figure 3.2: Example showing how an initial supporting hyperplane H1 is rotated through H2 into a facet-defining hyperplane H3 . An illustration is provided in Figure 3.2. This figure presents two polyhedra, P1 and P2 , whose union is PDP . Our initial plane is H1 which supports PDP only at the point x1 . The first rotation we perform is around the axis a1 through x1 which rotates H1 into H2 . Now H2 is a plane touching PDP at the two points x1 and x2 . Finally, we rotate H2 around the axis a2 through the points x1 and x2 . This brings us to the final plane H3 , which is tight at x1 , x2 and x3 , a maximum independent set on a facet of conv(PDP ). The idea of hyperplane rotation is implemented by performing a linear transformation, in which the current inequality αk x ≥ β k is combined with some target inequality ˜ Thus, we want to find a maximal γ such that (αk + γ α)x ˜ is αx ˜ ≥ β. ˜ ≥ (β k + γ β) a valid inequality for PDP . Suppose Sk is the set of extreme points of conv(PDP ) ˜ such that the inequality αx for which αk x ≥ β k is tight. If we choose (α, ˜ β) ˜ ≥ β˜ is also tight at Sk then the resulting inequality must be tight at Sk . If we further ensure that αx ˜ ≥ β˜ is invalid for PDP then there is a finite maximal γ for which ˜ is valid for PDP . (αk + γ α)x ˜ ≥ (β k + γ β) It can be shown that the maximum value γ ∗ of γ is given by the optimal objective value of the disjunctive program γ ∗ = min αk x − β k x0 s.t. Ax W − bxq0 ≥ 0 q q∈Q D x − d x0 ≥ 0 x0 ≥ 0 ˜ 0 = −1 α ˜ x − βx 81
(3.7)
Step 1
Let α1 x ≥ β 1 be a valid inequality for PDP tight for x1 ∈ PDP . Set S1 = {x1 } and k = 1.
Step 2
Choose a target inequality αx ˜ ≥ β˜ tight for Sk and not valid for PDP .
Step 3
Solve (3.7) to obtain γ ∗ and a point xk .
Step 4
˜ and Sk+1 = Sk ∪ Set (αk+1, β k+1) = (αk , β k ) + γ ∗ (α, ˜ β) k {x }. Increment k ← k + 1.
Step 5
If k = n stop, otherwise repeat from Step 2.
Figure 3.3: Procedure to obtain a facet-defining inequality for PDP in n iterations.
′
The optimal solution (x′ , x′0 ) we obtain from solving (3.7) defines a point x = xx′ ∈ 0 PDP (or a ray of PDP if x′0 = 0) affinely independent of Sk , for which the new inequality is tight. We are now able to present an outline of a procedure that finds a facet-defining inequality of conv(PDP ) in n iterations. This procedure is given in Figure 3.3. For simplicity we have assumed that PDP is bounded and thus omitted the possibility of extreme rays. The inequality α ˜ x ≥ β˜ should be chosen as one that is “deeper” than αk x ≥ β k with respect to xLP , the point we want to cut off. If we do this and if the initial inequality α1 x ≥ β 1 already cuts off xLP then we are guaranteed that the above procedure produces a facet-defining inequality that also cuts off xLP .
3.5
Cut Lifting
An important ingredient of the lift-and-project method is that of working in a subspace (ref. to Chapter 1 or [6, 7]). If a variable is at its lower or upper bound in the optimal solution xLP to the LP-relaxation, it can be ignored for the purpose of cut generation. Thus cuts are generated in a subspace and are then lifted to a cut that is valid for the full space by computing the coefficients of the missing variables. These coefficients are computed using the multipliers {uq }q∈Q that satisfy the constraints of (CGLP)Q for the subspace cut coefficients. The procedures featured in Figures 1 and 3 do not cover the cut lifting aspect and thus do not specify how to compute these multipliers; but once we have determined the cut αx ≥ β, we can fix the value of α and β in (CGLP). This will decouple the constraints and leave |Q| independent linear equality problems from which the multipliers associated with (α, β) are easy
82
to calculate. One potential problem with working in a subspace is the choice of the latter: if we restrict the space too much, the feasible region may become empty. To avoid this, we require that each term of the disjunction be non-empty in the subspace. In our testing to be discussed in the next section, we used the smallest subspace that contains the nonzero components of the optimal solution from each of the separate linear programs of the disjunction. This was easy to implement, since the method we used to solve the disjunctive programs of Step 3 in the procedures of Figures 1 and 3 was to solve a linear program over each term of the disjunction and retain the best solution found.
3.6
Computational testing
To test the ideas presented in this paper experimentally, we need specific disjunctive relaxations of (MIP). We are mainly interested in comparing the effect on the various methods of an increase in the number of terms in the disjunction. There are many ways to create a disjunction involving multiple 0-1 variables. The simplest one is to assign all possible values to a fixed number, k, of 0-1 variables, thus creating a disjunction with 2k terms. However, a little thinking and experimenting shows that this way is not the best. Instead, we use a partial branch and bound procedure with no pruning except for infeasibility, to generate a search tree with a fixed number, k, of leaves. The union of subproblems corresponding to these k leaves is then guaranteed to contain the feasible solutions to (MIP). Therefore the disjunction whose terms correspond to the leaves of this partial search tree is a valid one, although the number of variables whose values are fixed at 0 or 1 in the different terms of the disjunction need not be the same. The branch-and-bound procedure used here is a simple one whose only purpose is to provide us with a disjunction of a certain size. As a branching rule, we branch on an integer constrained variable whose fractional part is closest to 21 . For node selection we use the best-first rule. This search strategy will quickly grow a sufficiently large set of leaf nodes for our disjunction. Further, by using best-first search we also ensure a strong disjunction with respect to the objective function. For each problem instance we generate a round of up to 50 cuts, each from a disjunction coming from a search tree initiated by first branching on a different 0-1 variable fractional in the LP solution. The cuts themselves are generated by five different methods, each using the same disjunctions: 1. By using the simplex method to solve (CGLP)Q in the higher dimensional space; 2. By using the iterative procedure of Figure 3.1, version 1; 3. By using the iterative procedure of Figure 3.1, version 2; 4. By using the iterative procedure that generates only extreme points adjacent to xDP ; 83
5. By using the n-step procedure of Figure 3.3 to find a facet defining inequality. These procedures have been implemented in C on a SUN Ultra 60 with a 360 MHz Ultra SPARC-II processor and 512 MB of memory. To solve the linear programs that arise, CPLEX version 6.60 was used. The test set for our experiments consisted of a set of 14 pure or mixed 0-1 problems from the MIPLIB library of mixed integer programs [15]. The main purpose of our computational testing was to compare the proposed procedures with each other and with the standard procedure of solving (CGLP)Q from the point of view of their sensitivity to the number of terms in the disjunctions from which the cuts are generated. A first comparison, shown in Table 1, features the total time required to generate up to 50 cuts for each of the 14 test problems, (a) by solving the higher dimensional (CGLP)Q as a standard linear program (using CPLEX), and (b) by using the iterative procedure of Figure 1, version 2 (which adds to the master problem all the violators found in Step 3). These numbers are compared for disjunctions with 2, 4, 8 and 16 terms, with the outcome that the times in column (b) are worse than those in column (a) for disjunctions with 2 terms (|Q| = 2), roughly equal or slightly worse for |Q| = 4, considerably better for |Q| = 8, and vastly better for |Q| = 16. For the method featured in column (b), the total computing time grows roughly linearly with |Q|: in about half of the 14 problems, the growth is slightly less than linear, and in the other half it is slightly more than linear. The numbers in column (a) grow much faster, which is understandable in light of the fact that the number of variables and constraints of (CGLP)Q increases with |Q|. Figure 4 shows 5 graphs featuring the behavior of the 5 procedures listed above, as a function of |Q|, the number of terms in the disjunction. On the horizontal axis we represent |Q|, on the vertical axis the total time needed to generate up to 50 cuts, normalized by setting to 1 the time needed by procedure 1 (solving (CGLP)Q directly) for the case |Q| = 8. Graph 3 of Figure 4 corroborates what we said above concerning version 2 of the iterative procedure, whose performance is featured in the columns (b) of Table 1: namely, total time grows roughly linearly with |Q|. Also, graph 1 illustrates the much faster growth of the total time needed for solving (CGLP)Q directly, featured in the columns (a) of Table 1.
3.7
Conclusions
We have described several methods for generating cuts for pure or mixed 0-1 programs (MIP) from more complex disjunctions than the standard dichotomy (xj ≤ 0) ∨ (xj ≥ 1). These methods solve the cut generating linear program iteratively, in the space of the original MIP. For the classical dichotomy, these procedures are inferior to the standard lift-and-project method which solves a linear program in a higher dimensional 84
|Q| = 2 |Q| = 4 |Q| = 8 |Q| = 16 (a) (b) (a) (b) (a) (b) (a) (b) BM21 0.07 0.21 0.32 0.40 1.91 0.85 7.85 1.52 EGOUT 0.08 0.16 0.23 0.25 0.82 0.46 5.76 0.82 FXCH.3 0.52 0.80 1.53 1.50 9.86 2.54 65.19 4.01 LSEU 0.07 0.14 0.23 0.27 0.97 0.56 6.33 1.47 MISC05 1.13 1.82 5.15 5.07 52.25 13.72 658.88 33.00 MOD008 0.09 0.13 0.19 0.33 0.60 0.67 2.69 1.54 P0033 0.07 0.13 0.19 0.24 0.78 0.57 2.89 1.14 P0201 1.40 2.09 8.12 4.80 88.26 10.67 609.17 26.45 P0282 0.85 1.90 2.02 4.51 8.94 17.75 86.24 44.36 P0548 2.77 11.85 7.94 9.00 37.75 20.51 276.18 44.25 STEIN45 36.42 148.38 99.71 157.86 280.71 159.04 1082.78 222.32 UTRANS.2 0.50 0.81 1.55 1.50 10.26 3.53 111.06 13.65 UTRANS.3 0.78 1.26 2.71 2.29 26.47 4.93 173.20 9.31 VPM1 0.45 0.79 1.29 1.62 11.38 2.97 81.29 6.82 (a) Solving (CGLP) (b) Using the Iterative Method of Figure 1, version 2 Table 3.1: Total time for up to 50 cuts
85
50
45
45
40
40
35
35
30
30
normalized time
normalized time
50
25 20
20
15
15
10
10
5
5
0 5
10
15 20 terms in disjunction
25
0
30
5
10
15
20
25
2. Iterative algorithm of Figure 1, version 1. 50
45
45
40
40
35
35
30
30
normalized time
50
25 20
25 20
15
15
10
10
5
5
0 5
10
15
20
25
0
30
5
terms in disjunction
3. Iterative algorithm of Figure 1, version 2.
10
15 20 terms in disjunction
25
45 40 35 30 25 20 15 10 5 0 5
10
15
20
25
30
terms in disjunction
5. Algorithm of Figure 3.
Figure 3.4: Scaled running time versus number of terms in the disjunction
86
30
4. Iterative algorithm with adjacent point generation.
50
normalized time
30
terms in disjunction
1. Solving (CGLP)Q directly.
normalized time
25
space; but for generating cuts from disjunctions with more than 4 terms, i.e. involving 3 or more variables, at least one of the proposed methods is definitely superior to the standard one. This opens up for further research problems like: modifying the procedures to generate multiple cuts from the more complex disjunctions studied here, identifying those multiple-term disjunctions most likely to provide stronger cuts, analyzing the behavior of these cuts as compared to the ones generated from the standard two-term disjunction.
87
Chapter 4 Finding a Sufficient Set of Facets for a Disjunctive Program
88
4.1
Introduction
In this chapter we shift our focus from generating a single cut from a disjunctive relaxation to that of generating a set of cuts. One of our main obstacles with liftand-project is with the choice of a good normalization and objective function over which to optimize a single cut. Our main goal with these cuts is to tighten the LP relaxation of a mixed integer program in the direction of the objective, which is a goal that is not easily translated into an objective for a single cut. In a sense, this is a very bad goal to aim for with a single cut, since an “optimal” cut can be obtained simply by taking the objective function min cx and an optimal solution x¯ to create the “optimal” cut cx ≥ c¯ x. In chapter 1 we looked at various normalization constraints that in each of their own way puts a ranking on the cuts such that we can seach for one that is “optimal”. Thus, the choice of a normalization constraint defines what cut is optimal, which is not necessarily the cut that will work best at tightening the LP relaxation in the direction of the objective function. This brings us to the goal we are aiming for in this chapter; to develop a method for generating a sufficient set of facet-defining inequalities from a chosen disjunctive relaxation, such that these inequalities, when added to the LP relaxation, will tighten it as much as possible in the direction of the objective. Thus, we desire the LP relaxation with cuts to have the property that an optimal solution to the disjunctive program will also be optimal for the tightened LP relaxation. Note that his informal definition does not depend on any normalization constraint but only on the disjunction and a direction of optimzation. The optimization objective merely serves to define what facet-defining inequalities should be generated. Since our underlying application is that of generating cuts for a mixed integer program, the disjunction is used to define a collection of valid inequalities for the mixed integer program and the optimization objective restricts these inequalities to those most relevant when attempting to solve it. Conversely, one can consider the situation where a few branches have been performed for a mixed integer program, but where it is not clear that these branches are particularly better than other choices. If there is a single clear choice of branching variable then it is most likely best to go ahead and branch on it instead of generating cuts. But in the case where we are faced with many seemingly equally promising branching variables, we perform a couple of branches but are not satisfied with the progress. Instead of giving up and dropping all the effort put into branching, we can create a set of cuts as suggested above to capture the work done so far in tightening the LP relaxation objective, and start anew with a fresh branch. Such an approach could be used to e.g. perform some probing branches, where we do not necessarily want to continue with the initial choices, so we capture the branching work done by a set of cuts instead of losing it completely, before probing again. Below we give a more formal definition of this set of cuts. The ideas presented here are not particular to a two-term disjunction as in Chapter 2, but applies equally 89
well to a general disjunction as in Chapter 3. Consider a given disjunctive relaxation (DP) of the mixed integer program (MIP), that is, the disjunctive program min cx s.t. Ax W ≥bq q q∈Q D x ≥ d
(DP)
and, with the notation of chapter 3, we let xDP be an optimal solution to (DP). Let PDP be the convex hull of feasible solutions to (DP), i.e., PDP = conv{x ∈ Rn | Ax W ≥bq q } q∈Q D x ≥ d
Our aim is thus to find a set of facets, F x ≥ f , of PDP such that xDP is optimal in min cx s.t. Ax ≥ b Fx ≥ f The main idea behind generating these facets is fairly simple. Let αx ≥ β be an inequality such that the following linear program is a relaxation of (DP): min cx s.t. Ax ≥ b αx ≥ β
(4.1)
Let α ¯ x ≥ β¯ be another inequality, not equivalent to αx ≥ β. Now construct two new ¯ as follows inequalities, valid for (DP), from (α, β) and (α, ¯ β), α 1 = α + λ1 α ¯ 1 1¯ β = β+λ β
α 2 = α − λ2 α ¯ 2 2¯ β = β−λ β
where λ1 , λ2 ≥ 0. Together, these two new inequalities dominate the original inequality αx ≥ β in every direction. Therefore, if we substitute those for αx ≥ β in (4.1) we obtain the formulation min cx s.t. Ax ≥ b (4.2) α1 x ≥ β 1 α2 x ≥ β 2 which is at least as strong as (4.1). ¯ such that both αx If we are careful to select (α, ¯ β) ¯ ≥ β¯ and −¯ αx ≥ −β¯ are invalid 1 2 1 for (DP) then there exists maximal λ and λ for which α x ≥ β 1 and α2 x ≥ β 2 are valid for (DP). For such maximal λ1 and λ2 , the new inequalities will define faces of 90
PDP of at least the same dimension as that of αx ≥ β, and higher if λ1 and λ2 are non-zero. If we repeat this process with (α1 , β 1) and (α2 , β 2), we will eventually end up with a set of facet-defining inequalities F x ≥ f for PD , such that min cx s.t. Ax ≥ b Fx ≥ f
(4.3)
is a stronger formulation than (4.1).
4.2
Generating the facet-defining inequalities
As introduced in the previous section, we start with the LP relaxation of (DP) min cx s.t. Ax ≥ b
(LP)
plus one inequality, α0 x ≥ β 0 , where α0 = c β 0 = cxDP We will use K to denote the index set of generated and active inequalities, so initially K = {0}. For each inequality k ∈ K, we will let E k denote the set of extreme points of PDP for which we know that the inequality αk x ≥ β k is tight at, i.e., if i ∈ E k then αk xi = β k for the extreme point xi . We will further assume that PDP is full dimensional. Otherwise we would also have to determine the affine hull of PDP . It should be easy to extend the results presented here to a PDP that is not full dimensional, by also storing a set of hyperplanes containing PDP , as they are found. Finally, we will assume that PDP is bounded. This is again to not clutter the description here with unnecessary detail. If PDP is unbounded then we will not only have to keep track of extreme points but also extreme rays. For a general step of the procedure, consider an inequality αk x ≥ β k for some k ∈ K, where |E k | < n. If |E k | = n for every k ∈ K then all the inequalities in K are already facet-defining for PDP and we can stop here since these inequalities cannot be tightened any further. Let α ¯ x ≥ β¯ be any inequality not equivalent to αk x ≥ β k , such that αx ¯ i = β¯ for k every i ∈ E . The problem we need to solve is max λ s.t. (αk + λα)x ¯ ≥ β k + λβ¯ ∀x ∈ PDP
91
(4.4)
Instead of solving this program directly it is much simpler to solve the following dual program min αk x − β k x0 s.t. Ax W − bxq0 ≥ 0 q (4.5) q∈Q D x − d x0 ≥ 0 x0 ≥ 0 ¯ 0 = −1 α ¯ x − βx That (4.5) is indeed the dual to (4.4) is established in the following theorem. Theorem 4.1. Let αk x ≥ β k be a valid inequality for PDP and let α ¯ x ≥ β¯ be any inequality. Then (a) (4.4) is unbounded if and only if (4.5) is infeasible which holds if and only if αx ¯ ≥ β¯ is valid for PDP , (b) if (4.5) is feasible then it is also bounded. Further, if λ∗ is an optimal solution to (4.4) and (x∗ , x∗0 ) is an optimal solution to (4.5) then λ∗ = αk x∗ − β k x∗0 . ¯ Proof. For part (a) observe that if (4.4) is unbounded then (αk + λα)x ¯ ≥ (β k + λβ) 1 k 1 k ¯ In the is valid for PDP for any lambda ≥ 0, and hence so is ( λ α + α ¯ )x ≥ ( λ β + β). ¯ limit λ → ∞ we get that α ¯ x ≥ β is valid for PDP . Note that the constraint set of (4.5) is the homogenization of the constraint set (DP) defining PDP , plus the single ¯ 0 = −1. Therefore, a nonzero point (x, x0 ) is feasible in (4.5) only if equality α ¯ x − βx x is a feasible ray of PDP (if x0 = 0) or x10 x is a feasible point of PDP (if x0 > 0). Since ¯ 0 ≥ 0 in we observed that αx ¯ ≥ β¯ is a valid inequality for PDP we must have αx ¯ − βx ¯ 0 = −1, hence (4.5) must be either case. But this conflicts with the equation αx ¯ − βx infeasible. If (4.4) is bounded then αx ¯ ≥ β¯ can not be valid for PDP by the previous observation. Thus, there exists a point x′ ∈ PDP such that αx ¯ ′ < β¯ or a ray x′ of PDP such that α ¯ x′ < 0. Then (x′ , x′0 ), with x′0 = 1 or 0 respectively, and any scalar multiple of it ¯ ′ < 0 and hence satisfies the homogeneous constraints of (4.5). Furthermore, α ¯ x′ − βx 0 there exists a non-zero multiple of (x′ , x′0 ) that satisfies all the constraints of (4.5), which proves its feasibility. To show that it is also bounded, suppose it is not. Then there exists a feasible (x, x0 ) such that αk x < β k x0 . But this leads to a contradiction with αk x ≥ β k being a valid inequality for PDP . Now, assume that (4.5) is feasible and bounded and thus also (4.4) is feasible and bounded. Let (x∗ , x∗0 ) be an optimal solution to (4.5) and let λ∗ be the optimal ¯ ∗ = −1 we have ¯ ∗ − βx solution to (4.4). Suppose that αk x∗ − β k x∗0 < λ∗ . Since αx 0 k ∗ k ∗ ∗ ∗ ∗ k ∗ ∗ k ∗¯ ∗ ¯ ) or (α + λ α β)x that α x − β x0 < −λ (αx ¯ − βx ¯ )x − (β + λ 0 < 0. Since 0 ∗ ∗ ∗ (x , x0 ) satisfies the homogeneous constraints of (4.5) then either x is a feasible ray of PDP (if x∗0 = 0) or x1∗ x∗ is a feasible point of PDP . In either case it leads to a 0 ¯ for all x ∈ PDP . Hence we must have contradiction with (αk + λ∗ α)x ¯ ≥ (β k + λ∗ β) αk x∗ − β k x∗0 ≥ λ∗ . 92
Since (4.4) is bounded there either exists an x′ ∈ PDP such that (αk + λ∗ α)x ¯ ′= ∗¯ ′ ′ k ∗ ′ (β + λ β) and αx ¯ < β¯ or a feasible ray x of PDP such that (α + λ α)x ¯ = 0 and ′ ′ k ∗ ′ k ∗¯ ∗ ¯ αx ¯ < 0. If αx ¯ ≥ β or (α + λ α)x ¯ > (β + λ β) for all x ∈ PDP then λ could not be ∗ maximal since λ + ε would also be feasible for some small ε > 0. Then γ(x′ , x′0 ) with ¯ ′ ). x′ = 1 or 0 respectively, satisfies all the constraints of (4.5) with γ = −1/(αx ¯ ′ − βx 0 ¯ ′ , the objective value of this solution is Since (αk + λ∗ α)x ¯ ′ = (β k + λ∗ β)x 0 ¯ ′ ) = λ∗ γ(αk x′ − β k x′ ) = −γλ∗ (αx ¯ ′ − βx k
0
Therefore, αk x∗ − β k x∗0 ≤ γ(αk x′ − β k x′ ) = λ∗ , which together with the previous result proves that αk x∗ − β k x∗0 = λ∗ . Let (x1 , x10 ) be an optimal solution to (4.5). The optimal λ1 = αk x1 − β k x10 . The new inequality αk+1 x ≥ β k+1 with αk+1 = αk + λ1 α ¯ k+1 k 1¯ = β +λ β β 1
will be tight at the new extreme point xi = xx1 plus all the extreme points of E k , 0 because both αx ≥ β and αx ¯ ≥ β¯ are tight at all the points of E k . Hence we can set E k+1 = E k ∪ {i}. We now set α ¯ = −αk+1 and β¯ = −β k+1 , and reoptimize (4.5). The new problem (4.5) will be feasible because PDP is full-dimensional. Let (x2 , x20 ) be the new optimal solution to (4.5) and set λ2 = αk x2 − β k x20 . We obtain the second inequality αk+2x ≥ β k+2 as αk+2 = αk + λ2 α ¯ k+2 k 2¯ β = β +λ β This inequality is again tight at the extreme points of E k and the new extreme point 2 xi+1 = xx2 , so we set E k+2 = E k ∪ {i + 1}. 0 Finally, we replace αk x ≥ β k by the two new inequalities by setting K = K \ {k} ∪ {k + 1, k + 2}. This describes how to construct the two inequalites to replace an existing inequality αk x ≥ β k . In the next section we look at how we can delete existing inequalities, such that instead of repeating the above procedure up to 2n times we only have to repeat it at most n2 times.
4.3
Filtering the set of generated inequalities
Each time we repeat the procedure above, we strengthen the LP formulation. This is doing more than we are aiming to do. All we want is a sufficient set of facets F x ≥ f such that any optimal solution to (DP) is also optimal for min cx s.t. Ax ≥ b Fx ≥ f 93
This means that we only have to keep enough of the constraints of K to guarantee this. How do we determine when it is possible to remove a constraint from K? Consider solving the LP min cx s.t. Ax ≥ b (4.6) αk x ≥ β k ∀k ∈ K This program will of course have the same objective value as (DP) because of how K was constructed. Let (y, {z k }k∈K ) be the dual variables in an optimal solution to (4.6). If the dual value for a constraint is zero in an optimal, feasible solution, then that constraint can be removed from the problem and the solution will remain both optimal and feasible. Therefore, if zk = 0 for some k ∈ K then the constraint αk x ≥ β k can be removed from from (4.6) without changing the optimality of the current solution. Recall that our objective is to create a set of facet-defining cuts K such that (4.6) has the same objective value as (DP), and removing cuts k from K with zk = 0 does not change this. So after having optimized (4.6) we keep a constraint αk x ≥ β k if and only if z k > 0, i.e., we set K = {k ∈ K | z k > 0}. Because of the property of basic solutions this implies that the size of K will never exceed n. This in itself is not enough to guarantee that we only have to process at most n2 inequalies. In the next section we examine how this bound is guaranteed.
4.4 Selecting the next inequality for processing
So far we have not considered which inequality k ∈ K to select next to break into two new inequalities. As we will show here, this choice has a large impact on how much work we have to do.

Suppose we adopt a scheme that always chooses a k ∈ K with largest set size |E^k| next. Let us consider the worst case behavior, where we cannot reduce the size of K below n. Let F(m, n) denote the number of times we have to process an inequality in this scheme if we start with K = {k}, where m = n − |E^k| and n is the dimension of the problem. After splitting α^k x ≥ β^k we obtain two new inequalities α^{k+1} x ≥ β^{k+1} and α^{k+2} x ≥ β^{k+2}, with |E^{k+1}| = |E^{k+2}| = |E^k| + 1. If m = 1 then after one split of inequality k we obtain two facets, so F(1, n) = 1. On the other hand, if n = 1 we will always throw one of the two generated inequalities away, so F(m, 1) = m.

So suppose m > 1 and n > 1. Since both inequalities k + 1 and k + 2 have the same set size |E^{k+1}| = |E^{k+2}|, we can arbitrarily pick the second one to process next. In our scheme of always choosing the one with largest set size, this means that we will never touch inequality k + 1 until inequality k + 2 has been fully developed into a set of facets. Since the size of K can never exceed n, the number of inequalities in K derived from inequality k + 2 can never exceed n − 1, because inequality k + 1 will always be present in K (in the worst case). This is equivalent to the situation where we start with just the inequality k + 2 in an (n − 1)-dimensional space. The work for this is F(m − 1, n − 1). Not until inequality k + 2 has been processed to completion do we touch inequality k + 1. Once we start working on this inequality, the worst case is that all the inequalities derived from inequality k + 2 are lost, hence the worst case work for inequality k + 1 is F(m − 1, n). The total (worst case) work is thus

    F(m, n) = F(m − 1, n − 1) + F(m − 1, n)

This is an exponential function, and the first entries of this function are given in Table 4.1.

    m \ n    1    2    3    4    5    6    7
      1      1    1    1    1    1    1    1
      2      2    2    2    2    2    2    2
      3      3    4    4    4    4    4    4
      4      4    7    8    8    8    8    8
      5      5   11   15   16   16   16   16
      6      6   16   26   31   32   32   32
      7      7   22   42   57   63   64   64
      8      8   29   64   99  120  127  128

    Table 4.1: The worst-case work function F.

The table clearly shows that the asymptotic behavior of the function F(m, n) is 2^n. Notice that for the overall process the worst-case work is F(n, n), which we can extrapolate from the table to be F(n, n) = 2^{n−1}. Hence even if we delete constraints as suggested in the previous section, we are not guaranteed to reduce the amount of work needed.

A much better selection scheme is to always choose an inequality k ∈ K with smallest |E^k|. As before, the size of K will never exceed n, but now all the inequalities in K differ in |E^k| by at most one. Whenever we process an inequality k ∈ K with largest |E^k|, the worst that can happen is that one of the two new inequalities is thrown away. Therefore, the sum

    G(K) = Σ_{k∈K} |E^k|
will increase by at least one. Hence after at most n^2 iterations, G(K) ≥ n^2, which, since |K| ≤ n and |E^k| ≤ n, means that |E^k| = n for all k ∈ K. In the scheme where we always select an inequality k ∈ K with smallest |E^k|, we have thus shown that we have to process an inequality at most n^2 times.

This is of course worst case behavior. We have been assuming that we filter the constraints each time we split one constraint into two new constraints, but it does not have to be done this often. Since we always select an inequality with smallest |E^k|, we will never consider inequalities of higher
set size |E^k|. In particular, when we break an inequality with |E^k| tight points, the two new inequalities will be tight at |E^k| + 1 points. Therefore, only when we first choose an inequality with a higher set size is it necessary to filter the cuts. Hence, in practice, we only have to filter the cuts once for every set size, i.e., at most n times.

In the early stages, the program (4.6) will have many optimal solutions, hence there is a choice of which optimal solution should be found. We should try to guide the optimization so that the number of non-zero dual variables z_k is minimized. Thus, to further reduce the amount of work, we can solve the dual problem

    min  Σ_{k∈K} z_k
    s.t. yA + Σ_{k∈K} z_k α^k = c
         yb + Σ_{k∈K} z_k β^k = c x_DP                          (4.7)
         y ≥ 0,  z_k ≥ 0  ∀ k ∈ K

This dual attempts to minimize the sum of the dual variables for the inequalities in K, while requiring the solution to be optimal in (4.6).
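As a quick sanity check on the worst-case analysis above, the recursion F(m, n) = F(m−1, n−1) + F(m−1, n) with the boundary values F(1, n) = 1 and F(m, 1) = m can be evaluated directly; the following small Python sketch (illustrative only) reproduces Table 4.1 and the diagonal values F(n, n) = 2^{n−1}.

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def F(m, n):
        """Worst-case number of splits starting from one inequality that is
        missing m = n - |E^k| tight points in dimension n (Section 4.4)."""
        if m == 1:
            return 1
        if n == 1:
            return m
        return F(m - 1, n - 1) + F(m - 1, n)

    for m in range(1, 9):                      # rows of Table 4.1
        print(m, [F(m, n) for n in range(1, 8)])
    assert all(F(n, n) == 2 ** (n - 1) for n in range(1, 8))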
4.5 Selecting (ᾱ, β̄)
So far we have not specified how to choose the inequality ᾱx ≥ β̄ used to modify (or "tilt") our current inequality α^k x ≥ β^k. We assume that α^k x ≥ β^k is tight at a set E^k of extreme points of P_DP, and we want the tilted inequality to remain tight at these points, hence we obtain the natural requirement

1. ᾱx^i = β̄ for all x^i ∈ E^k.

Furthermore, the program (4.5) must be feasible. According to Theorem 4.1, (4.5) is feasible iff ᾱx ≥ β̄ is not valid for P_DP, hence we obtain the requirement

2. ᾱx ≥ β̄ is not a valid inequality for P_DP.

The second requirement is the harder one to satisfy for a general P_DP, since there is no known quick procedure for generating an invalid inequality for an arbitrary polyhedron. If |E^k| < n and P_DP is full dimensional, then we know that there exists an inequality ᾱx ≥ β̄ satisfying 1 and 2. This is because dim(P_DP) = n > |E^k|, and therefore there exists a hyperplane containing all of E^k but not some other point of P_DP.

One approach to finding (ᾱ, β̄) satisfying conditions 1 and 2 is to guess one that satisfies only condition 1. If the program (4.5) is then found to be infeasible, then by Theorem 4.1 ᾱx ≥ β̄ is a valid inequality for P_DP. In that case we replace (ᾱ, β̄) by (−ᾱ, −β̄). The program (4.5) with the new (ᾱ, β̄) can be infeasible only if P_DP is contained in the hyperplane ᾱx = β̄. We assumed that P_DP is full dimensional, so at least one of these two programs must be feasible.

Alternatively, if we know some point x′ ∈ P_DP that does not satisfy α^k x ≥ β^k at equality, then it is sufficient to require that ᾱx′ < β̄, or equivalently that ᾱx′ − β̄ = −1, in addition to condition 1. Hence (ᾱ, β̄) will be the solution to a system of |E^k| + 1 linear equations.
4.6 The procedure
Here is an outline of the complete procedure:

Step 0  Let x^0 = x_DP be an optimal solution to (DP). Set α^0 = c, β^0 = cx^0 and E^0 = {0}. Set K = {0}. Finally, set kmax = 0.

Step 1  Choose k ∈ K such that |E^k| = min{|E^l| : l ∈ K}. If |E^k| = n then STOP.

Step 2  Select (ᾱ, β̄) such that ᾱx^i = β̄ for all i ∈ E^k and ᾱx ≥ β̄ is not valid for P_DP.

Step 3a Let x^{i_1} be an optimal solution to

    λ^1 = min  α^k x − β^k x_0
          s.t. Ax − b x_0 ≥ 0
               ∨_{q∈Q} (D^q x − d^q x_0 ≥ 0)
               ᾱx − β̄x_0 = −1

Set
    α^{kmax+1} = α^k + λ^1 ᾱ
    β^{kmax+1} = β^k + λ^1 β̄
    E^{kmax+1} = E^k ∪ {i_1}
Step 3b Let x^{i_2} be an optimal solution to

    λ^2 = min  α^k x − β^k x_0
          s.t. Ax − b x_0 ≥ 0
               ∨_{q∈Q} (D^q x − d^q x_0 ≥ 0)
               −α^{kmax+1} x + β^{kmax+1} x_0 = −1

Set
    α^{kmax+2} = α^k − λ^2 α^{kmax+1}
    β^{kmax+2} = β^k − λ^2 β^{kmax+1}
    E^{kmax+2} = E^k ∪ {i_2}
Step 3c Set K ← K \ {k} ∪ {kmax + 1, kmax + 2} and kmax ← kmax + 2.

Step 4  Let (y, {z_k}_{k∈K}) be an optimal solution to the LP

    min  Σ_{k∈K} z_k
    s.t. yA + Σ_{k∈K} z_k α^k = c
         yb + Σ_{k∈K} z_k β^k = c x_DP
         y ≥ 0,  z_k ≥ 0  ∀ k ∈ K

Set K ← {k ∈ K | z_k > 0}. Repeat from Step 1.
What we have not touched upon in this chapter is how to solve the disjunctive programs in Step 3a and Step 3b. These programs are exactly the same problems that arise in Section 3.4 and can be solved by a similar technique.

Finally, we have not gone into detail on how to choose (ᾱ, β̄), except to state the constraints it must satisfy. It is possible that a smart choice of (ᾱ, β̄) can be used to further reduce the number of iterations of the above procedure.

Another complication arises when P_DP is not full dimensional. When we reach |E^k| = dim(P_DP) in Step 1, E^k contains a maximal set of affinely independent points for a facet, and we must stop. It is preferable if the cuts resulting from the procedure are basic solutions to the corresponding cut generating linear program (CGLP). This is especially true if the cuts are generated in a subspace: if the subspace cut corresponds to a basic solution to the subspace CGLP, then the lifted cut will also correspond to a basic solution to the full space CGLP.
4.7 A fast approximation
There are two issues with the procedure as given above that can make it impractical to implement. First, it requires solving a full disjunctive program in each iteration, and with a bound of O(n^2) such programs this becomes a substantial amount of work. Second, when P_DP is not full dimensional, complications arise from ensuring that the cuts correspond to basic solutions to the CGLP. In this section we consider relaxing the linear programs that arise in the disjunction to address these two issues.

Consider again the disjunctive program

    min  cx
    s.t. Ax ≥ b                                                (DP)
         ∨_{q∈Q} (D^q x ≥ d^q)
For each q ∈ Q, let x^q be an optimal basic solution to

    min  cx
    s.t. Ax ≥ b                                                (4.8)
         D^q x ≥ d^q
Let A^q x ≥ b^q be the collection of rows from Ax ≥ b and D^q x ≥ d^q which have nonbasic surplus in the basis corresponding to x^q. Note that A^q is invertible and that x^q is also an optimal solution to the relaxed linear program

    min  cx
    s.t. A^q x ≥ b^q

Using this relaxation for each q ∈ Q, the disjunctive program is then relaxed to

    min  cx
    s.t. ∨_{q∈Q} (A^q x ≥ b^q)                                  (DP′)
Since each polyhedron {x ∈ R^n | A^q x ≥ b^q} is full dimensional, it follows that P_DP′, defined as

    P_DP′ = cl conv{x ∈ R^n | ∨_{q∈Q} A^q x ≥ b^q},

is also full dimensional. We now apply our procedure to (DP′) instead of (DP). The result will be a set of facets Fx ≥ f of P_DP′. Because of the choice of (A^q, b^q), the inequalities Fx ≥ f will still satisfy our main property, namely that x_DP is an optimal solution to

    min  cx
    s.t. Ax ≥ b
         Fx ≥ f

Because P_DP′ is full dimensional, each inequality in Fx ≥ f is uniquely defined (up to a scalar) by its set of n affinely independent tight points, hence each inequality corresponds to a basic solution to the CGLP.
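A sketch of how the relaxation (DP′) can be assembled term by term is given below. The helper solve_lp_basic is hypothetical; it is assumed to return an optimal basic solution of (4.8) together with the indices of the rows whose surplus variables are nonbasic in that basis.

    import numpy as np

    def term_relaxation(A, b, D_q, d_q, c, solve_lp_basic):
        """Build the cone relaxation A^q x >= b^q of one disjunctive term."""
        M = np.vstack([A, D_q])
        r = np.concatenate([b, d_q])
        x_q, tight = solve_lp_basic(c, M, r)   # min cx s.t. Mx >= r, as in (4.8)
        A_q, b_q = M[tight], r[tight]          # the n tight rows; A_q is square
        return x_q, A_q, b_q

    # the relaxed disjunctive program (DP') is then
    #   min cx  s.t.  OR over q in Q of ( A^q x >= b^q )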
4.7.1 Modifying the procedure
Consider any inequality αx ≥ β. If we introduce explicit surplus variables into each constraint set as A^q x − s^q = b^q, then we can write x = (A^q)^{-1} b^q + (A^q)^{-1} s^q. Thus αx ≥ β can be rewritten in terms of the surplus variables as α(A^q)^{-1} s^q ≥ β − α(A^q)^{-1} b^q. Since the feasible region for s^q is just R^n_+, the inequality αx ≥ β is implied by the constraint set A^q x ≥ b^q if and only if

    α(A^q)^{-1} ≥ 0
    β − α(A^q)^{-1} b^q ≤ 0

If we now consider one step of the algorithm, we have a valid inequality α^k x ≥ β^k and add λ times an invalid inequality ᾱx ≥ β̄. The problem of finding the maximum λ such that (α^k + λᾱ)x ≥ β^k + λβ̄ remains valid can now be solved easily, since this inequality is valid if and only if λ satisfies

    (α^k + λᾱ)(A^q)^{-1} ≥ 0
    (β^k + λβ̄) − (α^k + λᾱ)(A^q)^{-1} b^q ≤ 0                  (4.9)

for each q ∈ Q.
Set

    π^{q,k} = α^k (A^q)^{-1}
    π_0^{q,k} = β^k − α^k (A^q)^{-1} b^q                        for q ∈ Q,

which we can also write as

    α^k = π^{q,k} A^q
    β^k = π^{q,k} b^q + π_0^{q,k}                               for q ∈ Q.
From here we immediately observe that the π^{q,k} are the multipliers in the CGLP. Hence, by this approach we have the CGLP multipliers readily available and can therefore easily lift the inequality α^k x ≥ β^k if it is generated in a subspace.
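A minimal sketch of this multiplier recovery (using a linear solve rather than forming (A^q)^{-1} explicitly), together with the validity test α(A^q)^{-1} ≥ 0, β − α(A^q)^{-1} b^q ≤ 0 from above:

    import numpy as np

    def cglp_multipliers(alpha, beta, A_q, b_q):
        """pi = alpha (A^q)^{-1},  pi0 = beta - alpha (A^q)^{-1} b^q."""
        pi = np.linalg.solve(A_q.T, alpha)   # solves  pi A^q = alpha
        pi0 = beta - pi @ b_q
        return pi, pi0

    def implied_by_term(alpha, beta, A_q, b_q, tol=1e-9):
        """True when alpha x >= beta is implied by the cone A^q x >= b^q."""
        pi, pi0 = cglp_multipliers(alpha, beta, A_q, b_q)
        return bool(np.all(pi >= -tol) and pi0 <= tol)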
4.7.2 Tilting an inequality α^k x ≥ β^k
The remaining problem is how to choose the tilting inequality ᾱx ≥ β̄. We require this inequality to satisfy two properties:

1. It is tight at every point of P_DP′ ∩ {x ∈ R^n | α^k x = β^k}, and

2. the set {λ ∈ R | (α^k + λᾱ)x ≥ β^k + λβ̄ is valid for P_DP′} is bounded both from above and below.

The first condition guarantees that when we tilt the inequality α^k x ≥ β^k, the new inequality remains tight at all points α^k x ≥ β^k is tight at. The second condition then guarantees that when we tilt it in either direction, we are guaranteed to hit a new point at which the tilted inequality is tight.

These properties are best expressed in the space of surplus variables. As before, let

    π^{q,k} = α^k (A^q)^{-1}
    π_0^{q,k} = β^k − α^k (A^q)^{-1} b^q

If π_i^{q,k} = 0 for some i ∈ {1, . . . , n}, then the hyperplane defined by π^{q,k} s^q = π_0^{q,k} is parallel to the feasible extreme ray e_i (in the space of surplus variables), or equivalently, α^k x ≥ β^k is tight for the extreme ray given by the i-th column of (A^q)^{-1}. Likewise, if π_0^{q,k} = 0, then the inequality π^{q,k} s^q ≥ π_0^{q,k} is tight at the feasible point 0, and therefore α^k x ≥ β^k is tight at x^q. Therefore, we require that any coefficient π_i^{q,k} or π_0^{q,k} which is zero remains zero when tilting.

The tilting is best considered when expressed in the space of the surplus variables s^q̄ for some fixed q̄ ∈ Q. In this space the tilting inequality can be written as π̄^{q̄,k} s^q̄ ≥ π̄_0^{q̄,k}. Since s^q̄ = A^q̄ x − b^q̄, this inequality can be written in the x-space as ᾱx ≥ β̄, where

    ᾱ = π̄^{q̄,k} A^q̄
    β̄ = π̄^{q̄,k} b^q̄ + π̄_0^{q̄,k}

We can again restate the inequality ᾱx ≥ β̄ in the space of surplus variables s^q for any q ∈ Q, using x = (A^q)^{-1} s^q + (A^q)^{-1} b^q, as an inequality π̄^{q,k} s^q ≥ π̄_0^{q,k}, where

    π̄^{q,k} = ᾱ(A^q)^{-1} = π̄^{q̄,k} A^q̄ (A^q)^{-1}
    π̄_0^{q,k} = β̄ − ᾱ(A^q)^{-1} b^q = π̄_0^{q̄,k} + π̄^{q̄,k} (b^q̄ − A^q̄ (A^q)^{-1} b^q)

The requirement that a coefficient π_i^{q,k} or π_0^{q,k} which is zero remains zero after tilting can now be written

    (π̄^{q̄,k} A^q̄ (A^q)^{-1})_i = 0                             ∀ i, q : π_i^{q,k} = 0
    π̄^{q̄,k} (b^q̄ − A^q̄ (A^q)^{-1} b^q) + π̄_0^{q̄,k} = 0          ∀ q : π_0^{q,k} = 0          (4.10)
We can write the equation system (4.10) as (V, V_0) · (π̄^{q̄,k}, π̄_0^{q̄,k}) = 0. This system of equations defines a linear subspace from which (π̄^{q̄,k}, π̄_0^{q̄,k}) must be chosen. Note that (π^{q̄,k}, π_0^{q̄,k}) by construction lies in this subspace. Hence, when the dimension of the subspace is one, we can conclude that the equivalent inequality α^k x ≥ β^k defines a facet of P_DP′, since any other inequality tight at P_DP′ ∩ {x ∈ R^n | α^k x = β^k} can differ from α^k x ≥ β^k by at most a scalar multiple.

With a given tilting direction (π̄^{q̄,k}, π̄_0^{q̄,k}) we can now solve the problem of finding a maximum λ_max and a minimum λ_min such that (π^{q̄,k} + λπ̄^{q̄,k}) s^q̄ ≥ π_0^{q̄,k} + λπ̄_0^{q̄,k} defines a valid inequality for P_DP′. The constraints we must satisfy are π^{q,k} + λπ̄^{q,k} ≥ 0 and π_0^{q,k} + λπ̄_0^{q,k} ≤ 0 for each q ∈ Q, that is, λ must satisfy

    λ (π̄^{q̄,k} A^q̄ (A^q)^{-1}) ≥ −π^{q,k}
    λ (π̄_0^{q̄,k} + π̄^{q̄,k} (b^q̄ − A^q̄ (A^q)^{-1} b^q)) ≤ −π_0^{q,k}

for each q ∈ Q. When α^k x ≥ β^k does not define a facet, the subspace given by {(π̄^{q̄,k}, π̄_0^{q̄,k}) ∈ R^{n+1} | (V, V_0) · (π̄^{q̄,k}, π̄_0^{q̄,k}) = 0} has dimension at least two, i.e., rank(V, V_0) ≤ n − 1. Thus we are free to choose (π̄^{q̄,k}, π̄_0^{q̄,k}) such that at least one component is strictly positive and at least one component is strictly negative. With this choice, there will be at least one constraint that limits how negative λ can become, and another that restricts how positive λ can become, and thus λ is guaranteed to be bounded.
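The bounds λ_min and λ_max are obtained by a simple ratio test over all terms. A minimal sketch, where pi[q] and pi0[q] are the multipliers of the current cut and pibar[q], pibar0[q] those of the tilting direction, all expressed in the surplus space of term q (lists of numpy arrays and floats; names are for illustration only):

    import numpy as np

    def tilt_range(pi, pi0, pibar, pibar0, tol=1e-12):
        """Largest interval [lam_min, lam_max] for which
        (pi[q] + lam*pibar[q], pi0[q] + lam*pibar0[q]) stays valid for every q."""
        lam_min, lam_max = -np.inf, np.inf
        for p, p0, d, d0 in zip(pi, pi0, pibar, pibar0):
            # componentwise requirement:  p_i + lam * d_i >= 0
            if np.any(d > tol):
                lam_min = max(lam_min, np.max(-p[d > tol] / d[d > tol]))
            if np.any(d < -tol):
                lam_max = min(lam_max, np.min(-p[d < -tol] / d[d < -tol]))
            # right-hand side requirement:  p0 + lam * d0 <= 0
            if d0 > tol:
                lam_max = min(lam_max, -p0 / d0)
            elif d0 < -tol:
                lam_min = max(lam_min, -p0 / d0)
        return lam_min, lam_max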
4.7.3 The modified procedure
We are now in a position to state the modified and faster procedure for finding a set of valid facet-defining inequalities Fx ≥ f for P_DP′.

Step 0  Let x^q be an optimal solution to min{cx | Ax ≥ b, D^q x ≥ d^q} for each q ∈ Q, and let A^q x ≥ b^q denote the subset of the constraints Ax ≥ b, D^q x ≥ d^q with nonbasic surplus in this solution. Choose q̄ ∈ Q such that x^q̄ is optimal for (DP′). Set (π^{q̄,0}, π_0^{q̄,0}) = (c(A^q̄)^{-1}, 0) and set K = {0}. Finally, set kmax = 0.

Step 1  Choose k ∈ K such that the coefficient matrix (V, V_0) of (4.10) corresponding to (π^{q̄,k}, π_0^{q̄,k}) has minimal rank among all elements of K. If this rank is n then STOP.

Step 2  Select (π̄^{q̄}, π̄_0^{q̄}) such that (V, V_0) · (π̄^{q̄}, π̄_0^{q̄}) = 0 and such that at least one component is negative and at least one component is positive.

Step 3a Find the maximal λ_max and minimal λ_min that satisfy

    λ (π̄^{q̄} A^q̄ (A^q)^{-1}) ≥ −π^{q,k}
    λ (π̄_0^{q̄} + π̄^{q̄} (b^q̄ − A^q̄ (A^q)^{-1} b^q)) ≤ −π_0^{q,k}      for all q ∈ Q

Set (π^{q̄,kmax+1}, π_0^{q̄,kmax+1}) = (π^{q̄,k}, π_0^{q̄,k}) + λ_max (π̄^{q̄}, π̄_0^{q̄}) and (π^{q̄,kmax+2}, π_0^{q̄,kmax+2}) = (π^{q̄,k}, π_0^{q̄,k}) + λ_min (π̄^{q̄}, π̄_0^{q̄}).
Step 3b Update K ← (K \ {k}) ∪ {kmax + 1, kmax + 2} and set kmax ← kmax + 2.

Step 4  If the minimal rank of (4.10) over all inequalities indexed by K has not increased, repeat from Step 1.

Step 5  Let (y, {z_k}_{k∈K}) be an optimal solution to the LP

    min  Σ_{k∈K} z_k
    s.t. yA + Σ_{k∈K} z_k α^k = c
         yb + Σ_{k∈K} z_k β^k = c x_DP
         y ≥ 0,  z_k ≥ 0  ∀ k ∈ K

Set K ← {k ∈ K | z_k > 0}. Repeat from Step 1.

The coefficient matrix (V, V_0) given by (4.10) serves the same purpose here as the set E^k of extreme points does in the procedure of Section 4.6. Thus, to avoid recomputing (V, V_0) every time in Step 1, we should keep track of this set for each k ∈ K. In Step 3, when we tilt an inequality, we then only need to update the corresponding (V, V_0) by considering any coefficient π_i^q, i = 0, . . . , n, that has become zero as a result of the tilting. Note that, by construction, the polyhedron P_DP′ has at most n|Q| extreme rays and |Q| extreme points, hence we do not need to store (V, V_0) explicitly for each inequality, but only the indices of the extreme rays and points present in the set.

In the initial step we choose q̄ to be a term of the disjunction containing an optimal solution to (DP). This is not required, but with this choice we already have one zero component in (π^{q̄,k}, π_0^{q̄,k}). The more zeros we have in (π^{q̄,k}, π_0^{q̄,k}), the easier it is to solve (V, V_0) · (π̄^{q̄}, π̄_0^{q̄}) = 0, because each zero in (π^{q̄,k}, π_0^{q̄,k}) corresponds to a unit vector in (V, V_0).
4.7.4 Finding an orthogonal inequality
The computationally most expensive part of the modified procedure, as outlined in the previous section, is solving the system

    (V, V_0) · (π̄^{q̄}, π̄_0^{q̄}) = 0                             (4.11)

of Step 2. Here a row (v^i, v_0^i) of (V, V_0) represents either an extreme ray v^i (when v_0^i = 0) or an extreme point v^i (when v_0^i = 1). The purpose of solving the system is to find a new tilting inequality π̄^{q̄} s^q̄ ≥ π̄_0^{q̄} that is tight at all these extreme rays and points. In our case, any non-zero solution with a positive and a negative component will do, as it represents a valid and bounded tilting direction.

As we tilt an inequality and make it tight at more extreme rays or points, the effect on our system (V, V_0) is that of adding more rows to it. Thus the homogeneous system we need to solve is a collection of equalities that grows with each iteration.
Figure 4.1: The homogeneous system in upper triangular form.
Solving this system from scratch in each iteration is far too expensive.¹ Since the system is homogeneous, we can transform it to an upper triangular form by doing forward substitutions, i.e., create a system where all coefficients are zero below the diagonal (refer to Figure 4.1). Then, when it is time to solve it, we only need to assign a positive and a negative value to two variables to the right of the diagonal and perform back substitutions. Calculating the back substitutions is fairly cheap, and it can be made more efficient for a sparse system if we store only the non-zero coefficients of V and V_0.

Unfortunately, we do not have a single system of equations to which we add more and more rows. Each time we perform our tilting operation on an inequality k in Steps 3a and 3b, we create two new inequalities and therefore two new sets of equality constraints. Thus from our initial system (V^k, V_0^k) we create two new systems (V^{k+1}, V_0^{k+1}) and (V^{k+2}, V_0^{k+2}), each consisting of the constraints (V^k, V_0^k) plus one new row.

We can visualize the set of homogeneous equations as a binary tree, where the constraint set at any particular node is given by the constraint associated with the node itself plus those of all its ancestors. This is illustrated in Figure 4.2. On the left hand side we have the tree representing the inequalities, where the two child inequalities are obtained by performing our tilting operations on the parent inequality. On the right hand side we obtain an identically structured tree which instead has the homogeneous equalities as nodes.
¹ In an early implementation of the modified procedure we did try to solve the homogeneous system independently in each iteration. The cost was prohibitive, and the run time to generate the cuts would often be more than that required to solve the mixed integer program to optimality.
Figure 4.2: The homogeneous system stored as a binary tree with linked lists through the non-zero coefficients.

If we, for example, want to split on inequality six next, then we need to find a (π, π_0) that satisfies the homogeneous constraints v^1 π + v_0^1 π_0 = 0, v^2 π + v_0^2 π_0 = 0, v^3 π + v_0^3 π_0 = 0 and v^6 π + v_0^6 π_0 = 0.

We will maintain the homogeneous equations from a leaf node to the root as an upper triangular system. Notice that as we add a new homogeneous equation at the bottom of the tree, we only have to modify this new equation to maintain the upper triangular property; none of the existing equations in the tree have to be changed. This makes adding new equalities to the tree very easy. Furthermore, to make the procedure efficient we should store only the non-zero coefficients. In order to keep track of them, we store them as a linked list for each row (needed for maintaining the upper triangular property when adding a new row) and as a linked list for each column, starting at the leaves and pointing back to the previous non-zero coefficient towards the root node (needed when solving a system).
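A small sketch of the incremental triangularization, using dense numpy rows for clarity (the actual implementation stores only the non-zero coefficients in the linked lists described above): each new homogeneous equation is eliminated against the pivots already on its root path, and a nonzero solution is recovered by back substitution. The sign assignment of the free coordinates is a simplification; it yields components of both signs in the generic case only.

    import numpy as np

    def add_equation(pivot_rows, pivot_cols, new_row, tol=1e-9):
        """Eliminate new_row against the pivots on the path to the root so the
        collected system stays triangular; only the new row is modified."""
        r = np.asarray(new_row, dtype=float).copy()
        for j, u in zip(pivot_cols, pivot_rows):
            if abs(r[j]) > tol:
                r -= (r[j] / u[j]) * u
        lead = np.flatnonzero(np.abs(r) > tol)
        if lead.size:                                  # row carries new information
            pivot_rows.append(r)
            pivot_cols.append(int(lead[0]))

    def nonzero_solution(pivot_rows, pivot_cols, n):
        """Back substitution: fix the free coordinates with alternating signs,
        then solve for the pivot coordinates, newest equation first."""
        x = np.zeros(n)
        free = [j for j in range(n) if j not in pivot_cols]
        for idx, j in enumerate(free):
            x[j] = 1.0 if idx % 2 == 0 else -1.0
        for u, j in zip(reversed(pivot_rows), reversed(pivot_cols)):
            x[j] = -(u @ x) / u[j]                     # x[j] is still 0 here
        return x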
4.7.5 Exploiting common constraints
Suppose we have a constraint a_i x ≥ b_i that appears in the tight set A^q x ≥ b^q for all q ∈ Q. Then this constraint is facet-defining for each of the cones {x ∈ R^n | A^q x ≥ b^q}, and must therefore also be facet-defining for the closed convex hull of the union of these cones, i.e., for P_DP′. If this constraint a_i x ≥ b_i further satisfies the constraint set (4.10) (after transformation into the space of surplus variables for q̄), then we can use a_i x ≥ b_i as the tilting inequality ᾱx ≥ β̄.

We already know that a_i x ≥ b_i is valid and facet-defining for P_DP′. Therefore we only need to find the maximum λ such that α^{k+1} x ≥ β^{k+1}, with (α^{k+1}, β^{k+1}) = (α^k, β^k) − λ(a_i, b_i), is valid for P_DP′. The result is that we can write α^k x ≥ β^k as a positive combination of α^{k+1} x ≥ β^{k+1} and a_i x ≥ b_i, so we have essentially split α^k x ≥ β^k into the two cuts α^{k+1} x ≥ β^{k+1} and a_i x ≥ b_i. But since a_i x ≥ b_i is an existing inequality of Ax ≥ b, we can immediately throw it away and continue with α^{k+1} x ≥ β^{k+1}.

In our modified procedure above, we should therefore always choose as our tilting constraint (π̄^{q̄}, π̄_0^{q̄}) one that corresponds to a constraint a_i x ≥ b_i common to all A^q x ≥ b^q for q ∈ Q, if one exists that satisfies (V, V_0) · (π̄^{q̄}, π̄_0^{q̄}) = 0. If there are k common constraints, then all of these constraints must necessarily be linearly independent, since each A^q is non-singular by design. Hence during each of our first k iterations we must be able to pick one of these common constraints. The result is that we are left with a single cut which is tight at at least k linearly independent points. Only at this point do we have to start splitting the cut. The effect should be a huge saving in the number of splits we have to perform.
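Identifying the common constraints is straightforward once the tight sets of the individual terms are known; a minimal sketch, assuming the row indexing of the term_relaxation sketch above, where the rows of Ax ≥ b come first:

    def common_constraints(tight_rows_by_term, n_rows_A):
        """Indices of rows of Ax >= b (indices < n_rows_A) that belong to the
        tight set A^q x >= b^q of every term q."""
        common = set.intersection(*(set(rows) for rows in tight_rows_by_term))
        return sorted(i for i in common if i < n_rows_A)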
4.7.6 An alternative search order
The argument of the previous section can be generalized to consider any facet of P_DP′ instead of the more easily identifiable common constraints. Our basic problem is that of finding a set of inequalities Fx ≥ f such that we can write a given inequality cx ≥ c_0 as (uF + vA)x ≥ uf + vb with c = uF + vA and c_0 ≥ uf + vb for some u ≥ 0, v ≥ 0, where Fx ≥ f are facet-defining for P_DP′. If we consider the reverse polar cone of P_DP′, i.e., the set C_DP′ = {(α, β) | αx ≥ β for all x ∈ P_DP′}, then (c, c_0) is a ray of C_DP′ and (F, f) will be extreme rays of C_DP′. Thus we are asking for a set of extreme rays of C_DP′ such that (c, c_0) is in their span (together with (A, b)). This is closely related to the problem of, given a point x in a polyhedron P, finding a set of dim(P) + 1 extreme points of P such that x lies in their convex hull.

In the previous section we found the extreme rays of C_DP′ one by one, as the common constraints of A^q x ≥ b^q over all q ∈ Q, and in each iteration effectively restricted C_DP′ to the face of C_DP′ in which we must find the remaining extreme rays (through the constraints (4.10)).

This concept should be clearer from the illustration in Figure 4.3. On the left we have our standard method, which starts with our "cut" x and shoots out in two
directions (equivalent to tilting in the two opposite directions when we minimize and maximize λ in Section 4.7.3). Where we hit the faces of the polyhedron will be our next two cuts, and we restrict ourselves to the two faces by adding the necessary constraints to (4.10). In the next iteration we again shoot out in two opposite directions within the faces from the two cuts, resulting in our final set of extreme "cuts".

On the right-hand side of Figure 4.3 we have the alternative dive method. Here we first attempt to find an extreme point, starting from our "cut" x. Each time we hit a face, we shoot out in a new direction within that face, and eventually (within at most dim(P) such searches) we will have found an extreme point, x^1. We then fix this extreme point and search for the remaining set of points whose convex hull will contain x. We do this by shooting from x^1 through x to the opposite face, which provides us with a new point x^2. Note that x is already a convex combination of x^1 and x^2, and if we can find a set of extreme points with x^2 in their convex hull, we are done. Hence we have reduced our initial problem to an equivalent smaller problem in the face containing x^2. We recursively repeat the search for an extreme point here until we have the desired set of extreme points.

The common constraints are a special case of this diving, in that we can easily identify some extreme points. Once we have exploited the common constraints we can start the search for regular facets of P_DP′ by shooting out in a random direction, as illustrated in Figure 4.3. If our problem contains n variables and k common constraints, then after exploiting all common constraints we are left with the problem of finding n − k facet-defining cuts. In a d-dimensional cone it requires at most d tilts to find a facet-defining cut. The total number of tilts we have to perform is therefore bounded by

    k + (n − k) + (n − k − 1) + · · · + 2 + 1 = O(k + (n − k)^2).
4.8 Computational Experiments
We have implemented the modified procedure of Section 4.7.3 together with the enhancement of exploiting common constraints of Section 4.7.5. The implementation includes the alternative of Section 4.7.6 of diving for facet-defining cuts instead of the regular splitting. It uses the optimizer library of XPRESS version 14.05 to set up and solve the linear programs that arise.

Our main obstacle in testing these cuts is that we need a good disjunction, one that raises the LP relaxation objective (we consider minimization problems). If the disjunctive relaxation (DP) has exactly the same optimal objective value as the LP relaxation of (MIP), then the empty set of constraints (F, f) = (0, 0) trivially solves our problem. We will use branch-and-bound as in Chapter 3 (Section 3.6) to create our disjunction, but we use strong branching to select the branching variable, instead of simply looking at the fractional value, in order to get a disjunction that raises the optimal objective value. The disjunction will, as in Chapter 3, be constructed as the disjunction induced by all the leaf nodes after branching on the problem a few times. For testing, we create disjunctions with 2, 4 or 8 terms in this fashion.
Figure 4.3: Split versus dive search order.
For our test set we use the MIPLIB [15] library of mixed integer problems, from which we have removed those instances that are either too big for our implementation to handle or where it was impossible to raise the objective with our branching scheme. For comparison we create a single round of cuts using our lift-and-project procedure described in Chapter 2. All of the tests were run on a personal computer with a 1.6 GHz Pentium 4 processor and 512 MB of RAM.

The test results are tabulated in Tables 4.2, 4.3 and 4.4. The first table contains the times to generate the cuts for a 2-, 4- or 8-term disjunction, using either the modified splitting procedure or the alternative diving procedure. The last column contains the times to generate a single round of lift-and-project cuts. Table 4.3 lists the corresponding final number of cuts created. Finally, in Table 4.4 we show the percentage of the gap between the optimal LP relaxation solution and the mixed integer solution that was closed by the cuts.

Our implementation is unfortunately not as numerically stable as we could have wished for. The runs that failed to complete are marked with "N/A" in the tables. What happens is that each time we tilt a cut, round-off errors from the computations accumulate, and in some of the larger problems this can unfortunately lead to unexpected results, such as the filter LP (cf. Section 4.3) becoming slightly infeasible. Currently we cannot counter these accumulated round-off errors until we have the facet-defining cuts with their full sets of tight points. At that stage we can recalculate the cuts from the tight points and thus obtain much more accurate cuts.

The single biggest effect on the cutting time comes from the identification and exploitation of common constraints. Without this improvement, a problem like p0548 would require about 50 times as much time, which is why we do not even consider running without it. The majority of the work in our splitting procedure is in filtering the cuts at each level; each time we filter, we solve a linear program that is equivalent to the LP relaxation of the original mixed integer program. For the diving method, on the other hand, we do not have to (and cannot) filter the cuts, but here the run time clearly exhibits the quadratic asymptotic growth in the number of cuts we have to split.

As an illustration of the two methods, we plot the number of cuts processed at the various levels (numbers of tight points) for the fiber problem in Figure 4.4. For the 2-, 4- and 8-term disjunctions there are respectively 1026, 991 and 942 common constraints in this 1054-variable problem. Thus we were able to find respectively 97%, 94% and 89% of the required tight points before we had to do any splitting. This is a major saving in time. Since the area under the graphs is proportional to the number of cuts we have to process, we clearly see the quadratic growth of the diving method, and equally clearly the large benefit of filtering in the splitting method. Without the common constraints we would have been unable to create cuts with the diving method in any reasonable amount of time.

The cost of these cuts is still quite high, although for some problems like seymour the times are comparable to lift-and-project. But we have to remember here that the lift-and-project cuts are created as one cut from each of many disjunctions, whereas we have only considered a single disjunction for each problem here. What is more interesting is that, with the relatively low number of cuts, we are often able to raise the LP objective as much as with a full round of lift-and-project cuts, and often surpass it. danoint, l152lav, mod008, p0282, p0548, qnet1 and rout are all problems where our small set of cuts from a simple two-term disjunction raises the objective higher than a round of lift-and-project cuts.

    Problem      Split-2  Split-4  Split-8   Dive-2  Dive-4  Dive-8   L-and-P
    arki001         2.03     3.99      N/A      N/A     N/A     N/A      3.09
    bell3a          0.15     0.23     0.41     0.09    0.15    0.29      0.03
    bell5           0.11     0.19     0.46     0.12    0.16    0.41      0.02
    blend2          0.44      N/A      N/A      N/A     N/A     N/A      0.05
    danoint         1.40     3.06     8.06     1.18    3.15    7.50      0.75
    dcmulti         1.37      N/A      N/A      N/A     N/A     N/A      0.25
    dsbmip          1.50     4.57     7.47     1.54    4.38    7.11      1.78
    egout           0.06     0.10     0.23     0.04    0.08    0.17      0.00
    fiber           3.65     8.91    18.22     0.68    3.38   14.71      0.09
    fixnet6        16.47    22.05    31.03     4.00   12.26   32.53      0.79
    flugpl          0.03     0.06     0.12     0.03    0.06    0.10      0.00
    gen             0.59     2.05     5.03     0.26    1.01    3.15      0.14
    gesa2           2.75     6.36    15.56     0.88    2.87     N/A      0.89
    gesa2_o         2.36     5.94    15.97     0.81    2.56     N/A      1.12
    gesa3_o         1.42     3.40     7.21     0.81    1.99    4.94      3.38
    gt2             0.13     0.21     0.37      N/A     N/A     N/A      0.01
    harp2           2.45      N/A      N/A      N/A     N/A     N/A      0.11
    khb05250        1.06     2.58    10.58     0.57    1.20    6.55      0.03
    l152lav        14.97    37.29    58.36     2.30    9.07   34.35      0.20
    lseu            0.04     0.09     0.17     0.04    0.07    0.15      0.01
    mas74           0.14     0.35     0.64     0.07     N/A     N/A      0.02
    mas76           0.08     0.25     0.47     0.06    0.13    0.30      0.01
    misc03          0.07     0.19     1.25     0.06    0.18    0.95      0.02
    misc06          1.97     5.48      N/A      N/A     N/A     N/A      0.38
    mod008          0.08     0.25     0.68     0.06    0.12    0.38      0.00
    mod010          8.73    35.61    57.93     2.46    6.81   20.84      0.18
    modglob         0.32     0.96     1.68     0.17     N/A     N/A      0.07
    p0033           0.03     0.07     0.14     0.03    0.06    0.12      0.00
    p0201           0.07     0.80     1.54     0.08    0.54    1.17      0.09
    p0282           0.19      N/A      N/A      N/A     N/A     N/A      0.01
    p0548           2.38      N/A      N/A      N/A     N/A     N/A      0.09
    p2756           3.09     7.23      N/A      N/A     N/A     N/A      0.40
    pp08a           0.15     0.34     0.80     0.12    0.30    0.72      0.04
    pp08aCUTS       0.30     0.84     1.75     0.23    0.64    1.49      0.32
    qiu             0.92      N/A      N/A      N/A     N/A     N/A      7.74
    qnet1          17.20      N/A      N/A      N/A     N/A     N/A      0.97
    qnet1_o         2.75     4.21    80.45     0.73    1.61     N/A      0.16
    rentacar      152.88      N/A      N/A      N/A     N/A     N/A      0.13
    rout            0.81     1.48     3.28     0.40    0.91    2.14      0.69
    set1ch          0.62     1.92     3.40     0.36    1.02    2.44      0.76
    seymour         4.55    32.00    67.88     8.79     N/A     N/A    354.24
    vpm1            0.24     0.59     1.25     0.06    0.13    0.28      0.02
    vpm2            0.16     0.51     0.98     0.10    0.42    0.79      0.08

    Table 4.2: Cutting time (includes branch-and-bound time to find the disjunction). Split-k and Dive-k denote the splitting and diving procedures on a k-term disjunction; L-and-P is a single round of lift-and-project cuts.

    Problem      Split-2  Split-4  Split-8   Dive-2  Dive-4  Dive-8   L-and-P
    arki001            1        2      N/A      N/A     N/A     N/A        87
    bell3a             5        6        8        8      12      14        24
    bell5              8        9       12        8      10      16        25
    blend2             1      N/A      N/A      N/A     N/A     N/A         6
    danoint            1        1        1        1       1       1        52
    dcmulti            3      N/A      N/A      N/A     N/A     N/A        49
    dsbmip             6        6        6        6       6       6        56
    egout              5        4        9        5       3      11         8
    fiber              5        7       11        4      12      18        44
    fixnet6            2       12       11       11      24      25        12
    flugpl             2        4        5        2       3       8         9
    gen                2        3       15        1      11      15        23
    gesa2              3        9       13        4      12     N/A        42
    gesa2_o            2        9       12        4      11     N/A        73
    gesa3_o            1        4        7        1       4       8       100
    gt2                9        3        5      N/A     N/A     N/A        11
    harp2              1      N/A      N/A      N/A     N/A     N/A        28
    khb05250           2        4        5        2       4      10        19
    l152lav            2        6        5        4      10      15        53
    lseu               2        2        3        2       4       5         7
    mas74              3        4        8        2     N/A     N/A        12
    mas76              1        4        4        1       3       6        11
    misc03             0        0        4        0       0       4        16
    misc06             1        4      N/A      N/A     N/A     N/A        14
    mod008             1        3        4        2       2       7         5
    mod010             1        4        3        2       2       5        41
    modglob            2        4        5        2     N/A     N/A        30
    p0033              1        4        4        1       3       6         9
    p0201              0        4        4        0       5       8        22
    p0282              7      N/A      N/A      N/A     N/A     N/A        24
    p0548             11      N/A      N/A      N/A     N/A     N/A        45
    p2756              2        2      N/A      N/A     N/A     N/A        54
    pp08a              3        5        8        3       5       8        51
    pp08aCUTS          1        3        4        1       7       7        46
    qiu                0      N/A      N/A      N/A     N/A     N/A        36
    qnet1              2      N/A      N/A      N/A     N/A     N/A        47
    qnet1_o            2        3        5        2       3     N/A        11
    rentacar           5      N/A      N/A      N/A     N/A     N/A         9
    rout               1        2        5        2       6       7        40
    set1ch             1        5        7        1       8      10       129
    seymour            2        5       13        2     N/A     N/A       580
    vpm1               1        2        3        1       2       3        15
    vpm2               2        7       11        3       8      16        29

    Table 4.3: Final number of cuts.

    Problem      Split-2  Split-4  Split-8   Dive-2  Dive-4  Dive-8   L-and-P
    arki001          3.5      0.0      N/A      N/A     N/A     N/A      29.3
    bell3a          27.3     45.2     57.2     27.3    45.2    57.2      51.7
    bell5           84.7     85.8     91.0     84.7    85.8    91.0      85.4
    blend2          19.7      N/A      N/A      N/A     N/A     N/A      16.2
    danoint          1.7      1.7      1.7      1.7     1.7     1.7       1.0
    dcmulti         21.6      N/A      N/A      N/A     N/A     N/A      58.6
    dsbmip           0.0      0.0      0.0      0.0     0.0     0.0       0.0
    egout           23.4     34.6     60.3     23.4    34.6    60.3      31.9
    fiber            2.8      5.3      9.2      2.8     5.3     9.2      69.6
    fixnet6         19.2     32.4     37.6     19.2    32.4    37.6      28.2
    flugpl           7.0     10.2     14.7      7.0    10.2    14.7      11.7
    gen              8.6     16.3     24.3      8.6    16.3    24.3      33.3
    gesa2            4.9     14.3     21.4      4.9    14.3     N/A      21.4
    gesa2_o          4.6     13.5     20.7      4.6    13.5     N/A      28.7
    gesa3_o          7.0     13.4     19.4      7.0    13.4    19.4      72.9
    gt2             61.6     63.9     65.2      N/A     N/A     N/A      71.0
    harp2            2.1      N/A      N/A      N/A     N/A     N/A      21.2
    khb05250        15.2     27.7     37.8     15.2    27.7    37.8      75.2
    l152lav          7.9     17.7     33.0      7.9    17.7    33.0       2.0
    lseu             0.8      1.4      7.2      0.8     1.4     7.2      29.1
    mas74            3.1      6.9      9.0      3.1     N/A     N/A       4.6
    mas76            1.9      3.2      5.9      1.9     3.2     5.9       3.3
    misc03           0.0      0.0      7.2      0.0     0.0     7.2       0.0
    misc06          22.7     40.7      N/A      N/A     N/A     N/A      33.3
    mod008           1.5      2.3     16.5      1.5     2.3    16.5       1.4
    mod010          22.5     30.3     47.1     22.5    30.3    47.1      74.9
    modglob          7.2     12.6     18.4      7.2     N/A     N/A      21.3
    p0033            5.0     35.3     35.3      5.0    35.3    35.3       8.9
    p0201            0.0     11.3     21.1      0.0    11.3    21.1       0.0
    p0282           26.1      N/A      N/A      N/A     N/A     N/A       3.2
    p0548           76.7      N/A      N/A      N/A     N/A     N/A      46.4
    p2756            0.1      0.1      N/A      N/A     N/A     N/A      43.1
    pp08a            5.1      9.4     13.5      5.1     9.4    13.5      57.9
    pp08aCUTS        4.4      7.9     13.5      4.4     7.9    13.5      28.7
    qiu              0.0      N/A      N/A      N/A     N/A     N/A       2.2
    qnet1           21.3      N/A      N/A      N/A     N/A     N/A      17.8
    qnet1_o         12.0     21.0     32.6     12.0    21.0     N/A      49.1
    rentacar        24.2      N/A      N/A      N/A     N/A     N/A      15.2
    rout             2.4      4.1      6.4      2.4     4.1     6.4       2.0
    set1ch           0.8      2.0      3.2      0.8     2.0     3.2      28.3
    seymour          2.6      4.5      7.1      2.6     N/A     N/A       9.8
    vpm1             4.7      6.1     11.2      4.7     6.1    11.2      29.9
    vpm2             3.2     19.6     24.9      3.2    19.6    24.9      16.0

    Table 4.4: Percent of gap closed between LP and integer solution from the cuts.

Figure 4.4: Number of cuts processed as a function of depth (number of tight points) for the fiber problem, for the splitting and diving methods on 2-, 4- and 8-term disjunctions.
4.9 Necessary Improvements
Although, as we have seen, the identification of common constraints is able to significantly reduce the cost of generating our sufficient set of facets, the cost is still large compared to alternative cuts such as lift-and-project cuts. One of the main advantages that lift-and-project cuts possess is that they can be created in a subspace. Such a subspace is constructed by fixing, and thereby ignoring, variables that are at a common bound in both terms of the disjunction. In other words, we have here a similar situation where the presence of common constraints (in this case bounds) allows us to reduce the amount of work. The difference is that the common bounds can be used to completely eliminate a large part of the variables, which are thus removed from the cutting procedure. In our case, the common constraints reduce the number of iterations, but not the dimension of our problem; the filter LP remains unchanged, and the homogeneous system of equalities we have to solve when constructing a tilted cut also includes all the points found from the common constraints.
If we could somehow eliminate the common constraints completely from the cutting phase, as with lift-and-project cuts, we should see a drastic improvement in time and stability.

Another, albeit less important, advantage that lift-and-project cuts currently hold is that they can be strengthened (cf. Chapter 1). Strengthening our set of cuts is essentially equivalent to improving the branching rule when building the disjunction. This is perhaps a weakness of our cuts, since their quality depends directly on our ability to pick good branches.
4.10 Conclusion
We have here presented a method for generating a set of facets for a disjunctive program. Although the implementation we currently have is somewhat unstable and rather slow compared to lift-and-project cuts, it demonstrates that with a few well-constructed cuts from a single disjunction, it is often possible to tighten the LP relaxation of a mixed integer program at least as well as with a full round of lift-and-project cuts.
Bibliography

[1] E. Balas, Intersection Cuts – A New Type of Cutting Planes for Integer Programming, Operations Research 19 (1971), 19-39.
[2] E. Balas, Disjunctive Programming, Annals of Discrete Mathematics 5 (1979), 3-51.
[3] E. Balas, Disjunctive Programming and a Hierarchy of Relaxations for Discrete Optimization Problems, SIAM Journal on Algebraic and Discrete Methods 6 (1985), 466-485.
[4] E. Balas, Disjunctive Programming: Properties of the Convex Hull of Feasible Points, Discrete Applied Mathematics 89 (1998), 1-44.
[5] E. Balas, A Modified Lift-and-Project Procedure, Mathematical Programming 79 (1997), 19-31.
[6] E. Balas, S. Ceria and G. Cornuéjols, A Lift-and-Project Cutting Plane Algorithm for Mixed 0-1 Programs, Mathematical Programming 58 (1993), 295-324.
[7] E. Balas, S. Ceria and G. Cornuéjols, Mixed 0-1 Programming by Lift-and-Project in a Branch-and-Cut Framework, Management Science 42 (1996), 1229-1246.
[8] E. Balas, S. Ceria, G. Cornuéjols and G. Pataki, Polyhedral Methods for the Maximum Clique Problem, in D. Johnson and M. Trick (editors), "Clique, Coloring and Satisfiability: The Second DIMACS Challenge", The American Mathematical Society, Providence, RI, 1996, 11-27.
[9] E. Balas and R. Jeroslow, Strengthening Cuts for Mixed Integer Programs, European Journal of Operations Research 4 (1980), 224-234.
[10] E. Balas and M. Perregaard, Generating Cuts from Multiple-Term Disjunctions, in K. Aardal and B. Gerards (editors), Proceedings of IPCO VIII, Lecture Notes in Computer Science 2081 (2001), 348-360.
[11] E. Balas and M. Perregaard, Lift-and-Project for Mixed 0-1 Programming: Recent Progress, Discrete Applied Mathematics 123 (2002), 129-154.
[12] E. Balas and M. Perregaard, A Precise Correspondence Between Lift-and-Project Cuts, Simple Disjunctive Cuts, and Mixed Integer Gomory Cuts for 0-1 Programming, Mathematical Programming, Ser. B 94 (2003), 221-245.
[13] E. Balas, J. Tama and J. Tind, Sequential Convexification in Reverse Convex and Disjunctive Programming, Mathematical Programming 44 (1989), 337-350.
[14] N. Beaumont, An Algorithm for Disjunctive Programming, European Journal of Operational Research 48 (1990), 362-371.
[15] R.E. Bixby, S. Ceria, C.M. McZeal and M.W.P. Savelsbergh, An Updated Mixed Integer Programming Library: MIPLIB 3.0, http://www.caam.rice.edu/~bixby/miplib/miplib.html.
[16] R. Bixby, W. Cook, A. Cox and E. Lee, Computational Experience with Parallel Mixed Integer Programming in a Distributed Environment, Annals of Operations Research 90 (1999), 19-45.
[17] C. Blair, Two Rules for Deducing Valid Inequalities for 0-1 Programs, SIAM Journal of Applied Mathematics 31 (1976), 614-617.
[18] C. Blair, Facial Disjunctive Programs and Sequences of Cutting Planes, Discrete Applied Mathematics 2 (1980), 173-180.
[19] S. Ceria and G. Pataki, Solving Integer and Disjunctive Programs by Lift-and-Project, in R.E. Bixby, E.A. Boyd and R.Z. Rios-Mercado (editors), Proceedings of IPCO VI, Lecture Notes in Computer Science 1412 (1998), 271-283.
[20] S. Ceria and J. Soares, Disjunctive Cuts for Mixed 0-1 Programming: Duality and Lifting, GSB, Columbia University, 1997.
[21] S. Ceria and J. Soares, Convex Programming for Disjunctive Optimization, GSB, Columbia University, 1997.
[22] G. Cornuéjols and Y. Li, On the Rank of Mixed 0,1 Polyhedra, in K. Aardal et al. (editors), "Integer Programming and Combinatorial Optimization", Proceedings of IPCO 8, Lecture Notes in Computer Science 2081 (2001), 71-77.
[23] W. Cook, R. Kannan and A.J. Schrijver, Chvátal Closures for Mixed Integer Programming Problems, Mathematical Programming 47 (1990), 155-174.
[24] F. Eisenbrand and A. Schulz, Bounds on the Chvátal Rank of Polytopes in the 0-1 Cube, in G. Cornuéjols et al. (editors), "Integer Programming and Combinatorial Optimization", Proceedings of IPCO 7, Lecture Notes in Computer Science 1610 (1999), 137-150.
[25] M.C. Ferris, G. Pataki and S. Schmieta, Solving the Seymour Problem, Optima 66 (2001), 2-6.
[26] R. Gomory, Outline of an Algorithm for Integer Solutions to Linear Programs, Bulletin of the American Mathematical Society 64 (1958), 275-278.
[27] R. Gomory, An Algorithm for the Mixed Integer Problem, Technical Report RM-2597, The RAND Corporation, 1960.
[28] R.G. Jeroslow, Cutting Plane Theory: Disjunctive Methods, Annals of Discrete Mathematics 1 (1977), 293-330.
[29] R.G. Jeroslow, Representability in Mixed Integer Programming I: Characterization Results, Discrete Applied Mathematics 17 (1987), 223-243.
[30] R.G. Jeroslow, Logic Based Decision Support: Mixed Integer Model Formulation, Annals of Discrete Mathematics 40 (1989).
[31] L. Lovász and A. Schrijver, Cones of Matrices and Set Functions and 0-1 Optimization, SIAM Journal of Optimization 1 (1991), 166-190.
[32] G.L. Nemhauser and L.A. Wolsey, A Recursive Procedure to Generate All Cuts for 0-1 Mixed Integer Programs, Mathematical Programming 46 (1990), 379-390.
[33] H. Sherali and W. Adams, A Hierarchy of Relaxations Between the Continuous and Convex Hull Representations for Zero-One Programming Problems, SIAM Journal on Discrete Mathematics 3 (1990), 411-430.
[34] H. Sherali and C. Shetty, Optimization with Disjunctive Constraints, Lecture Notes in Economics and Mathematical Systems 181, Springer, 1980.
[35] R.A. Stubbs and S. Mehrotra, A Branch-and-Cut Method for 0-1 Mixed Convex Programming, Department of Industrial Engineering, Northwestern University, 1996.
[36] S. Thienel, ABACUS: A Branch-and-Cut System, Doctoral Dissertation, Faculty of Mathematics and the Natural Sciences, University of Cologne, 1995.
[37] M. Turkay and I.E. Grossmann, Disjunctive Programming Techniques for the Optimization of Process Systems with Discontinuous Investment Costs-Multiple Size Regions, Industrial Engineering Chemical Research 35 (1996), 2611-2623.
[38] H.P. Williams, An Alternative Explanation of Disjunctive Formulations, European Journal of Operational Research 72 (1994), 200-203.