Path-Following Methods for Linear Programming. Author: Clovis C. Gonzaga. Source: SIAM Review, Vol. 34, No. 2 (June 1992), pp. 167-224. Published by: Society for Industrial and Applied Mathematics. Stable URL: http://www.jstor.org/stable/2132853
SIAM REVIEW, Vol. 34, No. 2, pp. 167-224, June 1992. © 1992 Society for Industrial and Applied Mathematics.

PATH-FOLLOWING METHODS FOR LINEAR PROGRAMMING*

CLOVIS C. GONZAGA†

Abstract. In this paper a unified treatment of algorithms for linear programming methods based on the central path is described. This path is a curve along which the cost decreases, and that always stays far from the boundary of the feasible set. Several parameterizations of this curve are described in primal and primal-dual problems, and it is shown how different algorithms are obtained by following the curve using different parameterizations. Polynomial algorithms are obtained by following the curve approximately, and this concept becomes precise by using explicit rules for measuring the proximity of a point in relation to the central path.

Key words. linear programming, interior point methods, path-following algorithms

AMS(MOS) subject classification. 49D
1. Introduction. In this paper we study a family of algorithms for solving the linear programming problem

(P)  minimize $c^T x$  subject to $Ax = b$, $x \ge 0$,

where $c \in \mathbb{R}^n$, $b \in \mathbb{R}^m$, and $A$ is a full-rank $m \times n$ matrix, $n > m$. We assume that the feasible region

$S = \{x \in \mathbb{R}^n \mid Ax = b,\ x \ge 0\}$

is bounded and has a nonempty relative interior given by

$S^0 = \{x \in \mathbb{R}^n \mid Ax = b,\ x > 0\}.$
The linear programming problem was first solved by Dantzig [14] forty years ago. The simplex method developed by him is still the most widely used algorithm, and it will possibly remain so in the future. Although the simplex method is efficient and elegant, it does not possess a property that became more and more charming in the last two decades: polynomial complexity. In fact, a problem devised by Klee and Minty [60] forced the simplex method to execute a number of arithmetical operations that grew exponentially with the number of variables of the problem, attaching to the method an exponential worst-case complexity.

The question of whether a polynomial algorithm for the linear programming problem exists was answered in 1978 by Khachiyan [58], [59]. He applied the ellipsoidal method of Shor [102] and Yudin and Nemirovskii [123] to the linear programming problem and proved a polynomial bound on the number of arithmetical operations needed to find an optimal solution. The bound, $O(n^4 L)$, depends on a number $L$, the length of the input (total number of bits used in the description of the problem data), which is somewhat frustrating. The existence of a "strongly polynomial" algorithm, i.e., a method with a complexity bound based only on the number of variables and constraints, is still a difficult open problem. The method raised an enormous enthusiasm, and had a great impact

*Received by the editors September 4, 1990; accepted for publication (in revised form) May 10, 1991.
†COPPE, Federal University of Rio de Janeiro, C. Postal 68511, 21945 Rio de Janeiro, RJ, Brazil (email: gonzaga@brlncc.bitnet).
on the theory of complexity, but unfortunately the practical implementations have been irremediably inefficient.

For comprehensive studies of these two approaches, see for instance Dantzig [15], Schrijver [100], and Goldfarb and Todd [32]. Complexity issues are discussed in Shamir [101], Megiddo [70], [71], Bland, Goldfarb, and Todd [12], Borgwardt [13], and Tardos [107].

In 1984, Karmarkar [55] published his algorithm, which not only had a polynomial complexity bound of $O(n^{3.5} L)$ operations, lower than Khachiyan's, but was announced as more efficient than the simplex method. There was initially much discussion about this claim, but now it is clear that well-coded versions of the new methodology are very efficient, especially when the problem size increases above some thousands of variables. Karmarkar's algorithm is essentially different from the simplex method in that it evolves through the (relative) interior of the feasible set, instead of following a sequence of vertices as does the simplex method. Karmarkar's algorithm has a flavor of nonlinear programming, in contrast with the combinatorial gait of the simplex method.

Karmarkar's algorithm in its original form needed a special formulation of the linear programming problem, and relied on the knowledge of the value of an optimal solution, or a process for generating efficient lower bounds for it. Soon standard-form variants were developed by Anstreicher [4], Gay [24], Gonzaga [34], Steger [106], and Ye and Kojima [121], and an efficient method for generating lower bounds for the optimal cost was devised by Todd and Burrell [108]. Another approach for finding lower bounds was developed by Anstreicher [3].

A thorough simplification of Karmarkar's algorithm reproduces the algorithm due to Dikin [16], [17], which now received the name of "affine-scaling." This method will be briefly discussed in §3.2. Karmarkar's algorithm, its variants and implementations, are discussed in Goldfarb and Todd [32]. We describe a variant of Karmarkar's algorithm in §3.6.
Our concern starts from the fact that Karmarkar's algorithm performs well by avoiding the boundary of the feasible set. And it does this with the help of a classical resource, first used in optimization by Frisch [22] in 1955: the logarithmic barrier function

$x \in \mathbb{R}^n,\ x > 0\ \mapsto\ p(x) = -\sum_{i=1}^{n} \log x_i.$
This function grows indefinitely near the boundary of the feasible set S, and can be used as a penalty attached to those points. Combining $p(\cdot)$ with the objective makes points near the boundary expensive, and forces any minimization algorithm to avoid them.

A question is then naturally raised: how far from the boundary should one stay? A successful answer was given through the definition of the analytic center of a polytope by Sonnevend [103], the unique point that minimizes the barrier function. A well-behaved curve is formed by the analytic centers of all the constant-cost slices of the feasible set in (P): the central path. This is the subject of this paper. This path is a region with some very attractive primal-dual properties, and provides an answer to our question: try to stay near the central path. Renegar did so in 1986 [96], and obtained the first path-following algorithm, with a complexity lower than that of Karmarkar's method in terms of number of iterations ($O(\sqrt{n}L)$ against $O(nL)$). Renegar's approach was based on Huard's method of centers [50].

Soon afterwards Vaidya [111], refining Renegar's results, and Gonzaga [36], following a penalty function approach, described algorithms with an overall complexity of $O(n^3 L)$ arithmetical operations, a limit that is still standing. Simultaneously, Kojima,
Mizuno, and Yoshise [65] developed a primal-dual path-following method, which was soon to be reduced to that low complexity by the same authors [64] and by Monteiro and Adler [89]. A fourth approach, based on Karmarkar's potential function, appeared later, first in Ye [118] and then in Freund [19], and in Gonzaga [41].

Only proven complexity results were cited in the brief historical account above. An amazing fact has been found out by Anstreicher [6]: the classical barrier function method (SUMT) developed for nonlinear programming by Fiacco and McCormick [18], exactly as implemented in 1968, solves linear and quadratic programs in $O(\sqrt{n}L \log L)$ iterations.

The field of interior point methods has been extremely active in the last few years. Over a hundred papers were written, developing the four approaches for linear programming, extending them to convex quadratic programming, to linear complementarity problems, and to convex nonlinear programming. Path-following methods, which started as short-steps algorithms with nice theoretical properties, evolved into practical large-steps methods.

The purpose of this paper is to describe a unified treatment of central path algorithms, and to show how one arrives naturally at the four approaches commented on above (methods of centers, penalty function methods, potential reduction methods, and primal-dual methods). We shall see that the good properties of points near the central path are intimately associated to nice primal-dual properties at these points, and this will provide the unifying concepts for the whole theory. And not surprisingly, we shall in the end be able to abandon the central path and work directly with primal-dual properties, while keeping all the nice properties of path-following methods.

Proofs will be given only for some results. We hope to achieve the goal of providing a complete treatment of one approach (penalty function methods), and to pave the way for straightforward analyses of the other ones. We shall restrain ourselves to linear programming, and we do not intend to make a survey of the field in this paper: it should be considered as a tutorial on the basic techniques.
Organization of the paper. The next section is a rather informal overview of the geometrical aspects of the methods. Section 3 describes the linear programming problem and the main tools used in interior point methods, including a variant of Karmarkar's algorithm with the Todd-Burrell lower bound updating procedure. Section 4 describes the central path and conceptual path-following algorithms, which assume that exact points on the path are computed by an oracle. The treatment stresses the similarities among several approaches and ends with a complexity theorem for conceptual algorithms. Sections 5 and 6 discuss nearly central points and centralization algorithms, allowing the construction of computationally implementable algorithms, in which only points near the central path are allowed. The specialization of these algorithms to the various approaches (penalty function methods, methods of centers, potential reduction methods, and primal-dual methods) is described in detail in §§7, 8, and 9. Section 10 has references to topics not covered in this paper and to extensions of the approach to nonlinear problems.

Notation. We shall work with column vectors and matrices denoted, respectively, by lowercase and uppercase letters. Different vectors will be denoted by superindices; subindices will denote components of a vector. These are some special conventions: For a vector like $x, x^k, z$, the corresponding uppercase symbols $X, X_k, Z$ will denote the diagonal matrix formed by the vector's components. Given a vector $x \in \mathbb{R}^n$, the notation $x^{-1}$ will be used for the vector with components $x_i^{-1}$, $i = 1, \dots, n$. The letter $e$ will denote the vector of ones, $e = [1 \cdots 1]^T$, with dimension indicated by the context.
For future reference, here is a listing of the main symbols used in the text:

$e = [1 \cdots 1]^T$: vector of ones.
$X = \mathrm{diag}(x_1, \dots, x_n)$.
$\mathbb{R}^n_+$, $\mathbb{R}^n_{++}$: nonnegative and positive vectors in $\mathbb{R}^n$.
$x, w, z, A, b, c$: variables and data for (P) and (D) (§3.1).
$y, \bar{w}, \bar{z}, \bar{A}, \bar{b}, \bar{c}$: variables and data for scaled problems (§3.2).
$x^*, w^*, z^*$: optimal solutions for (P) and (D) (§3.1).
$v^* = c^T x^*$: optimal value (§3.1).
$S$, $S^0$: feasible set for (P) and its relative interior (§3.1).
$Z$, $Z^0$: set of feasible dual slacks and its relative interior (§3.1).
$P_M$, $\bar{P}_M$: projection matrices into $\mathcal{N}(M)$ and its orthogonal complement (§3.1).
$r^p = Pr$: projection of $r$ into the subspace given by the context (§3.1).
$p(\cdot)$: barrier function (§3.3).
$p \mapsto x(p)$: generic parameterization of the central path (§4.2).
$p \mapsto z(p)$: generic parameterization of the dual central path (§4.2).
$p \mapsto v(p)$: dual cost (lower bound) associated to $p$ (§4.2).
$\alpha$: penalty multiplier (18).
$K$: upper bound for the cost (19).
$v$: lower bound for the cost (20).
$f_\alpha(\cdot)$, $f_K(\cdot)$, $f_v(\cdot)$: penalized, center and potential functions (§3.4).
$q$: fixed multiplier, $q > n$ (§3.4).
$\chi$: analytic center of $S$ (§3.3).
$\bar{h}(x,p) = X^{-1} h(x,p)$: SSD direction from $e$ in scaled space (22).
$h(x,p)$: SSD direction for $f_p(\cdot)$ from $x$ (23).
$\delta(x,p) = \|\bar{h}(x,p)\|$: proximity measure from $x$ to $x(p)$ (40).
$v(p)$, $z(p)$, $\Delta(p)$: lower bound, dual slacks, and gap associated to central points (§4.1).
$v(x,p)$, $z(x,p)$, $\Delta(x,p)$: lower bound, dual slacks, and gap associated to nearly central points (42).
$\tilde{v}(x)$, $\tilde{z}(x)$, $\tilde{\Delta}(x)$: best guesses for lower bound, dual slacks, and gap at $x$ (§3.5).

2. An overview of central path methods. This section discusses very informally geometrical aspects of the central path. Precise definitions, properties, and references will be given later.

Let us start by observing the barrier function

$x \in \mathbb{R}^n,\ x > 0\ \mapsto\ p(x) = -\sum_{i=1}^{n} \log x_i = -\log \prod_{i=1}^{n} x_i.$
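As a concrete illustration (ours, not from the paper), the barrier can be evaluated on a segment of the positive orthant; it blows up as the boundary is approached. The function name `barrier` is an assumption for this sketch.

```python
import numpy as np

def barrier(x):
    """Logarithmic barrier p(x) = -sum_i log(x_i), defined for x > 0."""
    x = np.asarray(x, dtype=float)
    assert np.all(x > 0), "the barrier is only defined on the positive orthant"
    return -np.sum(np.log(x))

# p grows without bound as any component approaches zero:
for t in [0.5, 0.1, 1e-3, 1e-6]:
    print(t, barrier([t, 1.0 - t]))  # points on the segment x1 + x2 = 1
```

On this one-dimensional polytope the minimizer is the midpoint, which is exactly its analytic center.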
This function penalizes variables that approach zero, and hence penalizes points near the boundary of S. The unique minimizer of $p(\cdot)$ in $S^0$ is the analytic center of S, and it coincides with the point that maximizes the product of the variables in S. Figure 2.1 illustrates the center for a simple problem.

FIG. 2.1. Level curves and values for the product of variables in S.

Much of the paper will be dedicated to showing that Newton-Raphson's method can be adapted to the determination of a center with a prescribed precision, with excellent theoretical and practical performance. For the time being, let us assume that an exact solution is at hand.

The main idea behind all interior point methods is that one should try to decrease the cost and simultaneously move away from the boundary. As is natural to do in the face of competing objectives, we shall examine combinations of cost and barrier function, in a traditional construction known as the internal penalized function:

$\alpha \in \mathbb{R},\ x \in S^0\ \mapsto\ f_\alpha(x) = \alpha c^T x + p(x).$
This function was extensively studied by Fiacco and McCormick in their book [18], and described since then in all nonlinear programming textbooks.

Now associate to each value of the parameter $\alpha$ a central point $x(\alpha)$ uniquely defined by

(1)  $x(\alpha) = \operatorname{argmin}_{x \in S^0} f_\alpha(x).$

The curve $\alpha \in \mathbb{R} \mapsto x(\alpha)$ is the central path for the linear programming problem (P). The curve is smooth and has the important property that as $\alpha$ increases it converges to an optimal solution of (P).

There are several different descriptions of the central path, as we shall see below. Each description corresponds to a different parameterization of the curve. One of them has a simple geometrical interpretation: consider a central point

$x(\alpha) = \operatorname{argmin}_{x \in S^0} \{\alpha c^T x + p(x)\}.$

This point obviously solves the problem obtained by constraining the cost to its value at $x(\alpha)$, that is,

$x(\alpha) = \operatorname{argmin}_{x \in S^0} \{p(x) \mid c^T x = c^T x(\alpha)\}.$

This problem describes the analytic center of a constant-cost slice of the original feasible set S, and this is illustrated by Fig. 2.2.

Path-following algorithms follow the central path. Let $p \in (\omega^-, \omega^+) \mapsto x(p)$ be a parameterization of the central path, with $\omega^+ > \omega^-$, $\omega^+$ possibly infinite. All algorithms follow the model below.

ALGORITHM 2.1. Conceptual path-following: given $x^0 \in S^0$, $p_0 \in (\omega^-, \omega^+)$, with $x^0 = x(p_0)$.
FIG. 2.2. The central points are the analytic centers of the constant-cost slices of S.
k := 0.
REPEAT
    Choose $p_{k+1} > p_k$.
    Call an internal minimization algorithm to find $x^{k+1} := x(p_{k+1})$.
    k := k + 1.
UNTIL convergence.
As it is, the model simply generates a sequence of independent central points. Actual algorithms will depend on the parameterization, the initialization (choice of $p_0$ and $x^0$), and, what is more important, the criterion for updating the parameter. The parameterization and the updating rule characterize different algorithms. As an example, the penalty function approach uses the parameterization in (1) and updates the parameter by $\alpha_{k+1} := (1 + \rho)\alpha_k$, where $\rho$ is a positive constant.
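For a tiny assumed instance (not from the paper) the central points can be computed in closed form: for min $x_1$ subject to $x_1 + x_2 = 1$, $x \ge 0$, the first-order condition for (1) reduces to $\alpha x_1^2 - (\alpha + 2)x_1 + 1 = 0$, whose smaller root is $x_1(\alpha)$. The sketch below follows the penalty update $\alpha_{k+1} = (1+\rho)\alpha_k$ with the illustrative choice $\rho = 1$:

```python
import math

def central_point(alpha):
    """x1 on the central path of: min x1  s.t.  x1 + x2 = 1, x >= 0.
    Smaller root of alpha*x1**2 - (alpha + 2)*x1 + 1 = 0, which lies in (0, 1/2)."""
    return ((alpha + 2) - math.sqrt(alpha**2 + 4)) / (2 * alpha)

rho, alpha = 1.0, 1.0
path = []
for _ in range(20):
    path.append(central_point(alpha))
    alpha *= 1 + rho   # penalty-function parameter update

# The cost c^T x = x1 decreases monotonically toward the optimal value 0.
print(path[0], path[-1])
```

For $\alpha \to 0$ the root tends to $1/2$, the analytic center; as $\alpha \to \infty$ it behaves like $1/\alpha$, converging to the optimal vertex.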
The internal algorithm is essentially the same for all methods. Finding a central point is a minimization problem with a nonlinear objective function with a simple Hessian matrix, and a natural choice is the algorithm of Newton-Raphson. We shall use in all cases an algorithm to be discussed in the next section, called scaling-steepest descent (SSD). This method is in some cases exactly equivalent to Newton-Raphson. The crucial point to be discussed in relation to the internal algorithm is its stopping rule.

It is impossible to find a central point exactly in finite time, and we want to construct polynomial algorithms. Also, from the practical point of view, the internal algorithm should be terminated as soon as possible. We must then renounce the determination of central points, and work "near" the central path. Precise criteria must be defined for considering a point "near" a central point, and this will be the object of §5.

Assuming that we do have a good criterion for measuring proximity, our model can be improved a little, as follows. Figure 2.3 illustrates the behavior of algorithms in this model.
FIG. 2.3. Short and large step path-following algorithms.
ALGORITHM 2.2. Implementable path-following: given $x^0 \in S^0$, $p_0 \in (\omega^-, \omega^+)$, with $x^0$ near $x(p_0)$.
k := 0.
REPEAT
    Choose $p_{k+1} > p_k$.
    Call an internal minimization algorithm to find $x^{k+1}$ near $x(p_{k+1})$. The algorithm starts at $x^k$.
    k := k + 1.
UNTIL convergence.
Figure 2.3 shows two possible combinations of parameter updates and proximity criteria. In the first case, a short-step update forces the algorithm to trace the path, so that all points generated are near the path. In this case, the internal algorithm usually executes exactly one iteration per iteration of the main algorithm. In the second case, the parameter is updated by large steps, and several iterations of the internal algorithm are needed to approach the central point corresponding to $p_{k+1}$.

3. Tools and non-path-following methods. This section establishes the main facts and definitions that compose the language of interior point methods in general.

3.1. The linear programming problem. The linear programming problem (P) was already stated in §1. We chose the format with equality constraints and nonnegative variables, but the equivalent format with inequality constraints and unrestricted variables could have been chosen as well. In fact, there are simple rules for "translating" results from one formulation into the other (see Gonzaga [35]). The extension to inequality constraints of the barrier function and of the notion of analytic center is straightforward by using slacks.

We assume as above that the constraint matrix $A$ is full-rank, and that the feasible set S is bounded with nonempty relative interior $S^0$. These assumptions can be relaxed: we only need a bounded optimal set for most results, but this generalization affects the simplicity of the results.

We shall also assume that an initial interior point $x^0 \in S^0$ is at hand. This assumption is in practice replaced by an initialization procedure that modifies the problem. A typical procedure is a big-M method, like the one discussed in Adler, Karmarkar, Resende, and Veiga [2].

With these hypotheses, the problem has an optimal solution $x^*$ (not necessarily unique), and the optimal value of the problem will be denoted by

(2)  $v^* = c^T x^*.$
Figure 3.1 illustrates problem (P). The figure shows the projection of $c$ into $\mathcal{N}(A)$, the null space of $A$. Projected vectors will play an important role in interior point methods, and we shall take a little space to review the concept of projection.
FIG. 3.1. The linear programming problem.
Two subspaces of $\mathbb{R}^n$ are associated to the linear transformation represented by $A$: the null space $\mathcal{N}(A) = \{x \in \mathbb{R}^n \mid Ax = 0\}$, and its orthogonal complement, the range space of $A^T$, defined by $\mathcal{R}(A^T) = \{x \in \mathbb{R}^n \mid x = A^T w,\ w \in \mathbb{R}^m\}$. Any vector $d \in \mathbb{R}^n$ can be uniquely decomposed as $d = d^p + \bar{d}^p$, where $d^p \in \mathcal{N}(A)$ and $\bar{d}^p \in \mathcal{R}(A^T)$. $d^p$ and $\bar{d}^p$ are, respectively, the projections of $d$ into $\mathcal{N}(A)$ and into its orthogonal complement.

Since the projection is linear, it can be represented by a matrix operator $P_A$, such that $d^p = P_A d$. The orthogonal complement will be $\bar{d}^p = \bar{P}_A d$, where $\bar{P}_A = I - P_A$. If $A$ is a full-rank matrix, then there is a closed formula for the projection matrix:

(3)  $P_A = I - A^T (A A^T)^{-1} A.$

The projection of $d$ into $\mathcal{N}(A)$ is the point in $\mathcal{N}(A)$ with smallest Euclidean distance to $d$. This is actually the most usual definition of projection:

(4)  $d^p = \operatorname{argmin}\{\|x - d\| \mid x \in \mathcal{N}(A)\}.$

Similar statements are associated to the orthogonal complement:

(5)  $\bar{d}^p = \operatorname{argmin}\{\|z\| \mid z = d - A^T w,\ w \in \mathbb{R}^m\},$

(6)  $\|\bar{d}^p\| = \min\{\|d - A^T w\| \mid w \in \mathbb{R}^m\}.$
The optimal set for a problem (P) does not change if we replace the objective by $c^{pT} x$. The importance of the projected cost $c^p$ should be clear: it provides the steepest ascent direction for the cost from an interior point. Given any differentiable function $f: S^0 \to \mathbb{R}$ and an interior point $x$, the steepest descent direction for $f(\cdot)$ from $x$ is $-P_A \nabla f(x)$.

Remark on notation. Given any matrix $M$, the projection matrix into $\mathcal{N}(M)$ will be denoted by $P_M$. Whenever no confusion is possible, we use the simplified notation $P$ for the projection matrix, and then the projection of a vector $r$ will be denoted by $r^p = Pr = P_M r$.

Dual problem. The dual problem associated to (P) is

(D)  maximize $b^T w$  subject to $A^T w + z = c$, $z \ge 0.$

The variables $z \in \mathbb{R}^n$ are called dual slacks. Under our hypotheses, (D) has an optimal solution $(w^*, z^*)$ (not necessarily unique), and $b^T w^* = v^*$.

The duality gap. Given any pair $(x, z)$, where $x \in S$ and $(w, z)$ is feasible for (D) for some $w \in \mathbb{R}^m$,

$x^T z = c^T x - b^T w.$
This is a well-known fact, which can be proved by direct substitution of $z = c - A^T w$. Note that optimality is equivalent to $x^T z = 0$: this is the theorem of complementary slacks.

The dual problem seems to have too many variables. In fact, the variables $w$ can be eliminated, leading to a very convenient symmetrical primal-dual pair. This has been thoroughly studied by Todd and Ye [110], and we use here a very simple reduction procedure.

LEMMA 3.1. $z \in \mathbb{R}^n$ is a feasible dual slack for (D) if and only if $z \ge 0$ and $P_A z = P_A c$.

Proof. Consider a vector $z \ge 0$. Then $z$ is a feasible dual slack if and only if for some $w \in \mathbb{R}^m$,

$c - z = A^T w.$

But $c - z$ can be decomposed in a unique way as $c - z = P_A(c - z) + A^T \bar{w}$, and it follows from the comparison of the two expressions above that $P_A(c - z) = P_A c - P_A z = 0$, completing the proof. □
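Lemma 3.1 gives a feasibility test for dual slacks that never mentions $w$; a small numerical illustration with assumed data (not from the paper):

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0]])   # one equality constraint, n = 3 (assumed data)
c = np.array([3.0, 1.0, 2.0])

P_A = np.eye(3) - A.T @ np.linalg.solve(A @ A.T, A)

# Build a dual-feasible slack directly: z = c - A^T w, with w chosen so z >= 0
w = np.array([0.5])
z = c - A.T @ w
print(np.all(z >= 0), np.allclose(P_A @ z, P_A @ c))   # both hold: z is a dual slack

# A nonnegative vector failing the projection test is not a dual slack:
z_bad = np.array([1.0, 0.0, 0.0])
print(np.allclose(P_A @ z_bad, P_A @ c))
```

The test works because $z$ is a dual slack exactly when $c - z$ lies in $\mathcal{R}(A^T)$, i.e., when its null-space component vanishes.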
This lemma is very interesting: it provides a simple rule for testing the feasibility of a dual slack. Some conclusions can be obtained now.

Given any point $x \in S$, an equivalent dual problem (in the sense that the objective differs by a constant at all feasible points) can be written as

(7)  minimize $x^T z$  subject to $P_A z = P_A c$, $z \ge 0.$
Here the objective is the duality gap, and the optimal value is $x^T z^* = c^T x - v^*$.

Similarly, the primal problem can also be modified: its objective can be replaced by $c^{pT} x$ as we saw above, or by $z^T x$ for any feasible dual slack $z$. The equivalent primal problem will be:

(8)  minimize $z^T x$  subject to $Ax = b$, $x \ge 0.$

Notation. The dual feasible set for (7) and its relative interior will be defined as

$Z = \{z \in \mathbb{R}^n \mid P_A z = P_A c,\ z \ge 0\},$
$Z^0 = \{z \in \mathbb{R}^n \mid P_A z = P_A c,\ z > 0\}.$
3.2. The scaling-steepest descent algorithm. A scaling transformation on problem (P) is a change of variables $x = Dy$, where $D$ is a positive diagonal matrix. Given a point $x^0 \in S^0$, scaling about $x^0$ is the scaling transformation $x = X_0 y$, where, according to our notational convention, $X_0 = \mathrm{diag}(x^0_1, \dots, x^0_n)$. The linear programming problem scaled about $x^0$ will be

(SP)  minimize $\bar{c}^T y$  subject to $\bar{A} y = b$, $y \ge 0,$

where $\bar{A} = A X_0$, $\bar{c} = X_0 c$ are obtained by substitution of $x := X_0 y$ in (P). The point $x^0$ is transported to $e$, the vector of ones.

Scaling affects dual variables in a simple way.

LEMMA 3.2. $(w, z)$ is a feasible dual solution for (P) if and only if $(w, X_0 z)$ is a feasible dual solution for (SP).

Proof. $(w, z)$ is feasible for (P) if and only if

$A^T w + z = c, \quad z \ge 0.$

Multiplying by the positive diagonal matrix $X_0$,

$(A X_0)^T w + X_0 z = X_0 c, \quad X_0 z \ge 0.$

This characterizes a dual feasible pair $(w, X_0 z)$ for (SP), completing the proof. □

Remark on notation. The primal variables in scaled problems will be either $y$ or $\bar{x}$. All other entities associated to the scaled problem will be indicated by a bar.

There are several reasons why scaling is very useful. It obviously does not change problem (P), and so it is in principle innocuous. The first reason why we shall always work with scaled problems is that it simplifies the expressions in most of the theory and procedures to be studied, yielding very clear formulas.

The second reason is that scaling does affect the steepest descent direction. And it does so in a clever way, as we show now. Consider a generalization of problem (P) for a differentiable objective $f(\cdot)$:

minimize $f(x)$,
and a point $x^0 \in S^0$. The steepest descent direction was studied by Cauchy around 1840: it is the direction that solves the minimization of the linear approximation of $f(\cdot)$ about $x^0$ over a ball centered at $x^0$,

(9)  minimize $\{\nabla f(x^0)^T d \mid \|d\| \le \delta,\ d \in \mathcal{N}(A)\}.$

The optimal solution, as a consequence of the Cauchy-Schwarz inequality, is always a multiple of

(10)  $h = -P_A \nabla f(x^0).$

The steepest descent direction may be very inefficient in constrained problems, as we illustrate in Fig. 3.2 for a very simple problem, with $S = \mathbb{R}^2_+$. The steepest descent computation is actually what is known today as a trust-region minimization: a simple objective (the linear approximation) is minimized in a simple region (a ball) to obtain a hint on the behavior of the function around the point. A ball is an obvious choice for the trust region because it is easy (all one needs is a projection) and democratic (no directions are favored).

The presence of positivity constraints spoils the second advantage, and motivates the search for an easy region capable of reflecting the shape of the region of interest more precisely. The easiest large shape available is the largest possible simple ellipsoid in the positive orthant, shown in Fig. 3.2. The ellipsoid, with axes parallel to the coordinate axes (and hence simple), provides a large trust region when intersected with S.
FIG. 3.2. Trust region minimization in Cauchy and SSD algorithms.
Scaling the problem about $x^0$ deforms this ellipsoid into a ball centered at $e$, and hence the solution of the trust region minimization is obtained by scaling followed by the projection of the resulting gradient vector (see Fig. 3.3).

This analysis results in a general first-order interior trust region minimization algorithm that can in principle be used for any continuously differentiable objective function. This algorithm will be called scaling-steepest descent (SSD), and will be the minimization method used in most of this paper (primal-dual algorithms will use a slightly different scaling).

ALGORITHM 3.3 (SSD): given $x^0 \in S^0$, $f: S^0 \to \mathbb{R}$ continuously differentiable.
k := 0.
REPEAT
    Scaling: $\bar{A} := A X_k$, $\bar{g} := X_k \nabla f(x^k)$.
    Direction: $\bar{h} := -P_{\bar{A}} \bar{g}$.
    Line search: $y := e + \lambda \bar{h}$, $y > 0$.
    Scaling: $x^{k+1} := X_k y$.
    k := k + 1.
UNTIL convergence.

FIG. 3.3. Affine-scaling trust regions.
The scaling transformation transports $x^k$ to the vector $e$. The direction $\bar{h}$ minimizes the linear approximation of $y \mapsto \bar{f}(y) = f(X_k y)$ in a ball (corresponding to the largest simple ellipsoid in the original space). The line search along $\bar{h}$ is not specified here: it is usually an approximate minimization of $f(\cdot)$ along $\bar{h}$, perhaps with a heuristic procedure to avoid the boundary (not needed if the barrier function is present).

An amazingly efficient algorithm for linear programming is obtained by the direct application of SSD to the original problem (P). This is the method known as affine-scaling, first proposed by Dikin in 1967 [16]. Dikin always took a step of length one in the line search, i.e., $\lambda = 1/\|\bar{h}\|$. Other researchers used large steps, a fixed percentage (above 95 percent) of the maximum possible step length in the positive orthant. Like in any interior point algorithm, the computational work is concentrated in the projection operation needed in each iteration.

The affine-scaling algorithm is naturally obtained as a simplified variant of Karmarkar's algorithm, and was rediscovered along this path by several authors: Barnes [9], Vanderbei, Meketon, and Freedman [115]. The algorithm is globally convergent for problems with no primal degeneration, as Dikin proved in 1974 [17] for unit step lengths. His proof was improved and clarified by Vanderbei and Lagarias [114], and extended to large steps by Gonzaga [39]. The method has been successfully implemented by many groups, like Adler, Karmarkar, Resende, and Veiga [2] and Monma and Morton [84]. See Goldfarb and Todd [32] for a discussion of implementations.

The search direction. We wrote the SSD algorithm using an explicit scaling operation at each iteration. This is not needed, since scaling was only a method for the
trust region minimization; the search direction can be explicitly expressed in the original space, and it is easy to see that it is given by

(11)  $h = X_k \bar{h} = -X_k P_{A X_k} X_k \nabla f(x^k).$

The SSD algorithm can be written directly in the original space.

ALGORITHM 3.4 (SSD): given $x^0 \in S^0$, $f: S^0 \to \mathbb{R}$ continuously differentiable.
k := 0.
REPEAT
    Direction: $h := -X_k P_{A X_k} X_k \nabla f(x^k)$.
    Line search: $x^{k+1} := x^k + \lambda h$, $x^{k+1} > 0$.
    k := k + 1.
UNTIL convergence.

Since the use of these explicit expressions tends to produce hard-to-read mathematics, we shall frequently do scaling and work in the transformed space.

The affine-scaling iterations, for an example, are illustrated in Fig. 3.3. The ellipsoidal trust regions are shown in the original space: they correspond at each point to the intersection of S and the largest simple ellipsoid in $\mathbb{R}^n_+$ centered at the point. Here we took unit step lengths.
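Algorithm 3.4 with $f(x) = c^T x$ (so $\nabla f(x) = c$) is exactly the affine-scaling method discussed above. The sketch below is illustrative and not from the paper: it uses the large-step rule (95 percent of the maximum step) on an assumed two-variable problem.

```python
import numpy as np

def affine_scaling(A, c, x, iters=10, frac=0.95):
    """Affine-scaling: SSD (Algorithm 3.4) applied to f(x) = c^T x.
    Direction (11): h = -X_k P_{A X_k} X_k c."""
    n = len(x)
    for _ in range(iters):
        Xk = np.diag(x)
        AX = A @ Xk
        P = np.eye(n) - AX.T @ np.linalg.solve(AX @ AX.T, AX)
        h = -Xk @ (P @ (Xk @ c))
        # Step a fixed fraction of the largest step keeping x > 0
        # (assumes some h_i < 0, i.e., the iterate is not yet optimal).
        neg = h < 0
        lam = frac * np.min(-x[neg] / h[neg])
        x = x + lam * h
    return x

# min x1  s.t.  x1 + x2 = 1, x >= 0, started at the analytic center:
A = np.array([[1.0, 1.0]])
c = np.array([1.0, 0.0])
x = affine_scaling(A, c, np.array([0.5, 0.5]))
print(x)  # x1 shrinks toward 0: iterates approach the optimal vertex (0, 1)
```

Since $h$ lies in $\mathcal{N}(A)$, every iterate remains feasible; only the positivity constraints limit the step.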
forpractical At thistime,themostusefulcenteris theanalytic center,definedbySonnevend centerofS is theuniquepointgivenby [103]:The analytic (12)
X = argminp(x). xESO
of theanalytic center(centering) dependson a goodunderstanding Approaching willbe The notational conventions thebarrier function. explainedin theintroduction Thebarrier function usedinthestudy ofitsproperties. by p: ffn -* fRis defined
$p(x) = -\sum_{i=1}^{n} \log x_i,$

and has derivatives

(13)  $\nabla p(x) = -x^{-1}, \quad \nabla^2 p(x) = X^{-2},$

(14)  $\nabla p(e) = -e, \quad \nabla^2 p(e) = I.$
Convexity. Since $\nabla^2 p(x)$ is positive definite in $S^0$, $p(\cdot)$ is strictly convex. Besides this, $p(x)$ grows indefinitely as $x$ approaches the boundary of S, and thus the analytic center is well defined.

Effect of scaling. Consider a positive diagonal matrix $D$. We have

$p(Dx) = p(x) - \sum_{i=1}^{n} \log d_i.$

Given two points $x^1, x^2 > 0$,

$p(Dx^2) - p(Dx^1) = p(x^2) - p(x^1),$

and hence scaling operations do not affect variations of $p(\cdot)$.

Here we see another reason for the use of scaling: while not affecting variations of the barrier function, scaling yields extremely easy derivatives. Still more striking is the fact that at $e$ the Hessian matrix is the identity, with the consequence that the steepest descent direction from $e$ coincides with the Newton-Raphson direction. Hence, the scaling-steepest descent algorithm and Newton-Raphson's method with line searches coincide for the barrier function.

Linear approximations around $e$. In our study of the efficiency of algorithms we shall use linear approximations of the barrier function. At this point we establish a bound on the error of the linear approximation around $e$. We begin by listing some results on the logarithm function around 1.

LEMMA 3.5. Let $\lambda \in (-1, 1)$ be given. Then
(15) λ ≥ log(1 + λ) ≥ λ - (λ²/2)·(1/(1 - |λ|)).

Proof. The first inequality is a direct consequence of the concavity of the logarithm. The second inequality was proved by Karmarkar [55], by developing the logarithm in Taylor's series:

log(1 + λ) = λ - λ²/2 + λ³/3 - λ⁴/4 + ···
           ≥ λ - (λ²/2)(1 + |λ| + |λ|² + ···)
           = λ - (λ²/2)·(1/(1 - |λ|)),

and the proof is complete. □
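The bounds in (15) are easy to verify numerically (a small sketch of ours, not from the paper):

```python
import math

def log_bounds(lam):
    """Bounds (15): lam >= log(1+lam) >= lam - (lam^2/2)/(1-|lam|),
    valid for lam in (-1, 1)."""
    upper = lam
    lower = lam - (lam * lam / 2.0) / (1.0 - abs(lam))
    return lower, upper

# Check the sandwich at several points of (-1, 1).
for lam in [-0.9, -0.5, -0.1, 0.0, 0.1, 0.5, 0.9]:
    lo, hi = log_bounds(lam)
    assert lo <= math.log(1.0 + lam) <= hi
```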
Variation of the barrier function around e.
LEMMA 3.6. Consider a vector d ∈ ℝⁿ such that ‖d‖∞ < 1. Then

(16) p(e + d) ≥ ∇p(e)ᵀd = -eᵀd,

(17) p(e + d) ≤ ∇p(e)ᵀd + (‖d‖²/2)·(1/(1 - ‖d‖∞)) = -eᵀd + (‖d‖²/2)·(1/(1 - ‖d‖∞)).

Proof. We have

p(e + d) = -∑_{i=1}^{n} log(1 + dᵢ).

Since dᵢ ∈ (-1, 1) by hypothesis, it is enough to extend the properties (15). The extension is straightforward by adding the inequalities for i = 1 to n. □

Centering. Since the SSD algorithm coincides with Newton-Raphson's method with line searches for the minimization of the barrier function, it is natural to conclude that either method must be efficient for the determination of the analytic center. The resulting algorithm is indeed efficient both in theory and in practice: it is the only method used in the literature. Its complexity was studied by Vaidya [112], and will be revised in §6.

3.4. Auxiliary functions. Given a point in S⁰, our task is obtaining a better point with respect to two goals: cost improvement and centering. As is natural when two objectives are present, we take combinations of them.

Following this reasoning, different auxiliary functions are constructed, each one leading to a different family of algorithms. Each auxiliary function uses a parameter that weights in some way the importance given to each of the two objectives. Each auxiliary function will be associated to one parameterization of the central path, as we shall see in detail in the next section. Here we simply list the functions used in primal methods (primal-dual methods will be examined separately).

(i) The penalized function (Frisch [22], Fiacco and McCormick [18])-parameter α, associated to a duality gap: for x ∈ S⁰,
(18) f_α(x) = αcᵀx + p(x).
(ii) The center function (Huard [50], Renegar [96])-parameter K, upper bound to the optimal cost; q ≥ n constant: for x ∈ S⁰ such that cᵀx < K,

(19) f_K(x) = -q log(K - cᵀx) + p(x).
(iii) The potential function (Karmarkar [55])-parameter v, lower bound for the optimal cost; q ≥ n constant: for x ∈ S⁰,

(20) f_v(x) = q log(cᵀx - v) + p(x).
In the notation used for these functions we sacrificed formal precision for simplicity: the actual function is singled out by the symbol used for the parameter. We shall dedicate much effort to each of these functions ahead. At this point we want to make some comments on their similarities.
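For reference, the three auxiliary functions (18)-(20) can be written down directly. The sketch below (our own naming and data, not the paper's) also checks numerically that a suitable value of α makes the gradients of f_α and f_K agree at e, the fact made precise in (21) below:

```python
import numpy as np

def barrier(x):
    return -np.sum(np.log(x))

def f_penalized(x, c, alpha):
    """(18): f_alpha(x) = alpha * c'x + p(x)."""
    return alpha * (c @ x) + barrier(x)

def f_center(x, c, K, q):
    """(19): f_K(x) = -q log(K - c'x) + p(x), defined where c'x < K."""
    return -q * np.log(K - c @ x) + barrier(x)

def f_potential(x, c, v, q):
    """(20): f_v(x) = q log(c'x - v) + p(x), defined where c'x > v."""
    return q * np.log(c @ x - v) + barrier(x)

def num_grad(f, x, h=1e-6):
    """Central finite-difference gradient, for checking formulas."""
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x); d[i] = h
        g[i] = (f(x + d) - f(x - d)) / (2.0 * h)
    return g

n = 4
c = np.array([1.0, 2.0, 0.5, 1.5])
e = np.ones(n)
q, K = float(n), 10.0
alpha = q / (K - c @ e)          # matching parameter value, cf. (21)
g1 = num_grad(lambda y: f_penalized(y, c, alpha), e)
g2 = num_grad(lambda y: f_center(y, c, K, q), e)
assert np.allclose(g1, g2, atol=1e-5)
```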
All auxiliary functions have as second term the barrier function, responsible for avoiding the boundary. The first term involves the cost, and the parameter weights both terms. In (i), increasing α increases the importance of the cost term; in (ii), decreasing K increases -log(K - cᵀx); in (iii) the same effect is obtained by increasing v.

Still more interesting is a comparison of the gradients of these functions at e (e will result from a scaling operation in algorithms), respectively,

αc̄ - e,    (q/(K - c̄ᵀe))c̄ - e,    (q/(c̄ᵀe - v))c̄ - e.
The steepest descent directions at e are all combinations of two vectors: -P_A c̄ and P_A e, called respectively the cost-reduction direction and the centering direction. This is actually true for most existent interior point methods: the search directions used by the algorithms are combinations of the cost-reduction and centering directions (for scaled problems).

Another interesting conclusion is that given K or v, it is straightforward to find a value of α such that ∇f_α(e) coincides with, respectively, ∇f_K(e) or ∇f_v(e):

(21) α = q/(K - c̄ᵀe),    α = q/(c̄ᵀe - v).
Descent directions for the auxiliary functions. Most interior algorithms apply the SSD algorithm to the auxiliary functions with a fixed value of the parameter. Let us introduce some notation for these descent directions.

Given a point x ∈ S⁰, the descent directions in the original space will be denoted h(x, p), where p is a parameter among α, K, v. The corresponding directions in scaled space will be denoted h̄(x, p). We have

(22) h̄(x, p) = -P_{AX}X∇f_p(x),

(23) h(x, p) = Xh̄(x, p).
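The directions (22)-(23) can be sketched numerically; the projection onto the null space of AX is computed here by least squares (helper names and data are ours, a minimal sketch rather than a production implementation):

```python
import numpy as np

def null_space_projection(M, g):
    """P_M g = g - M^T (M M^T)^{-1} M g: project g onto the null space of M.
    Solved via least squares for numerical robustness."""
    y, *_ = np.linalg.lstsq(M.T, g, rcond=None)
    return g - M.T @ y

def ssd_direction(A, x, grad_f):
    """Directions (22)-(23): hbar = -P_{AX} X grad_f in scaled space,
    and h = X hbar back in the original space."""
    X = np.diag(x)
    hbar = -null_space_projection(A @ X, X @ grad_f)
    return X @ hbar, hbar

# Example: descent direction for the penalized function f_alpha,
# whose gradient is alpha*c - x^{-1}.
A = np.array([[1.0, 1.0, 1.0, 1.0]])
c = np.array([2.0, 1.0, 3.0, 1.5])
x = np.array([0.4, 0.3, 0.2, 0.1])
alpha = 1.0
h, hbar = ssd_direction(A, x, alpha * c - 1.0 / x)
assert np.allclose(A @ h, 0.0)   # h is a feasible direction: A h = 0
```

Since h = Xh̄ with h̄ in the null space of AX, the step keeps Ax constant, and hᵀ∇f_α(x) = -‖h̄‖² ≤ 0, so it is indeed a descent direction.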
Now let us discuss the relationship between the SSD direction h(x, p) and the Newton-Raphson step (NR step). The main result is for the penalized function: the directions coincide.

LEMMA 3.7. Consider the function f_α(·) for a fixed α > 0. The NR step from x coincides with the SSD direction, given by
(i) h̄(e, α) = -P_A∇f_α(e) = -αc_P + e_P;
(ii) h(x, α) = -XP_{AX}X∇f_α(x) = -XP_{AX}(αXc - e).

Proof. Assume initially that x = e. Then the quadratic approximation of f_α(·) about e has derivatives given by

∇f_α(e + h) ≈ ∇f_α(e) + ∇²f_α(e)h = ∇f_α(e) + h,

since ∇²f_α(e) = ∇²p(e) = I. The NR step corresponds to the minimizer of the quadratic approximation, obtained by setting the projected gradient to zero:

P_A∇f_α(e) + h̄(e, α) = 0,

completing the proof of (i).

To prove (ii), note that the NR algorithm is scale invariant: the quadratic approximation of a function does not depend on the metric of the space (in contrast
with the norm-dependent steepest descent direction). We conclude that for an arbitrary x ∈ S⁰, the NR step in scaled space will be computed as in (i) in that space,

h̄(x, α) = -P_{AX}X∇f_α(x),

completing the proof. □

For the other auxiliary functions, the Hessian matrix is influenced by the first term, and SSD is no longer equivalent to NR. But it can be interpreted as a quasi-Newton method in the following sense.

A quasi-Newton method minimizes at each iteration a quadratic model of the function, which may differ from the Taylor expansion:

f_p(x + h) ≈ f_p(x) + ∇f_p(x)ᵀh + ½hᵀEh.

The SSD algorithm uses E = ∇²p(x) instead of E = ∇²f_p(x). We lose the contribution of the second derivatives of the first term of the functions. This contribution is null in the penalized function, and the methods coincide. For the potential function, the first term contributes a rank-one negative semidefinite matrix that may destroy the positive definiteness of the Hessian matrix, and thus is ignored. For the center function, the equivalence can be reestablished by a problem transformation to be described in §8.

3.5. Guessing a dual slack and a lower bound. The parameter used by the potential function f_v(·) must be a lower bound to the value v̄ of an optimal solution. We now describe a procedure for guessing a feasible dual slack, and consequently a lower bound for v̄. This procedure was presented in [40], and gives the same bounds as the methods developed by Todd and Burrell [108], and by de Ghellinck and Vial [25] using projective geometry. The dual slacks generated by it have the same format as the ones used in all existent primal potential reduction methods.

Given any feasible point x ∈ S, the procedure associates to it a lower bound v(x) ≥ -∞. If v(x) = -∞, then the procedure fails, and no dual slack is generated. The usefulness of the procedure will be ensured by the fact (to be seen in §5.2) that a good lower bound will always be generated at points on or near the central path.

Suppose initially that e ∈ S.
From Lemma 3.1, we deduce that a vector z ∈ ℝⁿ is a feasible dual slack if and only if z ≥ 0 and z = c_P + γ, where γ ⊥ 𝒩(A). Our guess consists in trying to find a "very positive" vector γ ⊥ 𝒩(A) and adding it to c_P to obtain a nonnegative vector. The ideal guess would be γ proportional to e, but in general e is not orthogonal to 𝒩(A). We try γ proportional to e - e_P.

Let us define the vector ū ⊥ 𝒩(A) given by

(24) ū = (e - e_P)/‖e - e_P‖².
If for some μ ∈ ℝ, c_P - μū ≥ 0, then z = c_P - μū is a feasible dual slack. The duality gap associated to the primal-dual pair (e, z) will be

Δ = eᵀz = c_Pᵀe - μ,

since ūᵀe = 1, as is easy to see because (e - e_P)ᵀ(e - e_P) = (e - e_P)ᵀe. Now v = cᵀe - Δ is a lower bound for v̄. The best lower bound will correspond to the maximum admissible value for μ.
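The guessing procedure above (for the case e ∈ S) is short enough to sketch in code. This is our own illustration, assuming e is primal feasible; the example data are arbitrary:

```python
import numpy as np

def null_space_projection(M, g):
    """Orthogonal projection of g onto the null space of M."""
    y, *_ = np.linalg.lstsq(M.T, g, rcond=None)
    return g - M.T @ y

def guess_lower_bound(A, c):
    """Guess a dual slack z = c_P - mu*u >= 0 with the maximum admissible mu,
    and return (z, lower bound c'e - gap).  Returns None when the procedure
    fails (no admissible mu, i.e. v(x) = -infinity)."""
    n = c.size
    e = np.ones(n)
    c_P = null_space_projection(A, c)
    e_P = null_space_projection(A, e)
    u = (e - e_P) / np.dot(e - e_P, e - e_P)   # vector (24); u'e = 1
    pos = u > 1e-12
    if not np.any(pos):
        return None
    mu = np.min(c_P[pos] / u[pos])             # largest mu with z >= 0 on u > 0
    z = c_P - mu * u
    if np.any(z < -1e-12):
        return None                            # some other component went negative
    gap = e @ z                                # duality gap = c_P'e - mu
    return z, c @ e - gap                      # dual slack and lower bound

# Example: minimize c'x s.t. sum(x) = 4, x >= 0, where e is feasible.
A = np.array([[1.0, 1.0, 1.0, 1.0]])
c = np.array([2.0, 1.0, 3.0, 1.5])
z, lb = guess_lower_bound(A, c)
assert np.all(z >= -1e-12)
```

In this example the bound is even tight: the problem's optimum is 4·min(c), and the procedure recovers it exactly.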
We can now formalize the procedure, associating to each x ∈ S a lower bound v(x) ∈ [-∞, v̄] obtained by the procedure above after scaling the problem about x:

(25) Δ(x) = inf{ eᵀP_{AX}c̄ - μ | P_{AX}c̄ - μū ≥ 0 },    v(x) = cᵀx - Δ(x),
with the convention that inf ∅ = +∞. If Δ(x) < +∞, then the procedure defines the dual feasible slack

(26) z(x) = X⁻¹(P_{AX}c̄ - μ̂ū),

where μ̂ is the minimizer in (25).

3.6. Non-path-following variants of Karmarkar's algorithm. Karmarkar's original algorithm [55] is based on the potential function with q = n. It assumes that the optimal cost value v̄ is known, and uses this parameter value from the beginning. Since the optimal value is seldom available, Karmarkar proposed the use of lower bounds to v̄, and an updating procedure. Updating procedures were soon improved in the references cited in §3.5.

Karmarkar's algorithm is not simply the SSD algorithm applied to the potential function. For completeness, we now present a very brief description of its mechanics.

First, assume that the primal problem (P) is stated in the format

minimize cᵀx
subject to
A′x = 0,
aᵀx = 1,
x ≥ 0.
Obtaining this format from (P) is straightforward with the introduction of an extra variable. Let q = n in the potential function (20) and assume initially that v̄ = 0.

The resulting potential function is f₀(x) = n log cᵀx + p(x). It is zero-degree homogeneous, i.e., for any x > 0, λ > 0, f₀(λx) = f₀(x). This means that given any point x > 0 such that A′x = 0 and aᵀx > 0 but aᵀx ≠ 1, the point x/aᵀx is feasible and has the same potential value (since aᵀ(x/aᵀx) = 1). Thus the following scheme can be used:
(i) Drop the constraint aᵀx = 1;
(ii) Use SSD to find x^k such that f₀(x^k) [...]

[...] Renegar found the clever property that the center is more than halfway toward the optimal solutions, that is, cᵀx(K) < (v̄ + K)/2.
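The zero-degree homogeneity of f₀ noted above is easy to confirm numerically (a small sketch of ours; the data are arbitrary):

```python
import numpy as np

def f0(x, c):
    """Karmarkar potential with q = n and v = 0:
    f_0(x) = n*log(c'x) - sum_i log(x_i)."""
    n = x.size
    return n * np.log(c @ x) - np.sum(np.log(x))

c = np.array([2.0, 1.0, 3.0, 1.5])
x = np.array([0.4, 0.3, 0.2, 0.1])
# f_0(lambda*x) = f_0(x): the n*log(lambda) terms cancel exactly.
for lam in [0.5, 2.0, 10.0]:
    assert abs(f0(lam * x, c) - f0(x, c)) < 1e-10
```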
Again, the characterization of central points is well defined, since f_K(x) is strictly convex and grows indefinitely near the boundary of the restricted feasible region. The central point x(K) associated to K > v̄ is uniquely determined by the condition

(32) P_{AX}( (q/(K - cᵀx)) Xc - e ) = 0.
Consider the features of this parameterization. The first column of Table 1 describes x = x(K), and let us look closely at the duality gap for the case in which q = n. We have

Δ(K) = K - cᵀx(K).

The duality gap equals the slack in the extra restriction cᵀx < K. This shows that indeed cᵀx is lower than halfway between v̄ and K, and provides plenty of room to decrease K while keeping the constraint cᵀx < K inactive. There is still more room if q > n.

The conceptual Algorithm 2.1 is specialized by specifying the update rule:

Set K^{k+1} := βK^k + (1 - β)cᵀx^k.

The duality gap at x(K^{k+1}) will be such that Δ(K^{k+1}) > βΔ(K^k), since Δ(K^{k+1}) = (n/q)(K^{k+1} - cᵀx^{k+1}) and cᵀx^{k+1} < cᵀx^k. This characterizes the method as "optimistic," since the actual gap reduction is smaller than the desired reduction. Note that this does not destroy convergence, since K - v̄ has a sound decrease at each iteration, as we show now.

LEMMA 4.4. Consider the center function f_K(·) with q ≥ n and the central point
x = x(K) for K > v̄. Define r = n/q and

K′ = βK + (1 - β)cᵀx.

Then

(33) (K′ - v̄)/(K - v̄) ≤ (r + β)/(1 + r) ≤ (1 + β)/2,

and the gap is related to K - v̄ by

(34) Δ(K) ≥ r(K - v̄)/(1 + r).
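To see the arithmetic of the lemma at work, take the extreme cost value cᵀx = v̄ + r(K - v̄)/(1 + r) permitted by the gap relation Δ(K) ≥ r(K - v̄)/(1 + r); the numbers below are arbitrary illustrative choices of ours:

```python
# Illustrative check of the contraction bound in Lemma 4.4.
r, beta = 0.5, 0.8            # r = n/q <= 1; update parameter beta in (0, 1)
K, v = 10.0, 2.0              # current parameter K and optimal value v (K > v)
ctx = v + (r / (1.0 + r)) * (K - v)     # extreme c'x allowed by the gap bound
K_new = beta * K + (1.0 - beta) * ctx   # the update rule K' = beta*K + (1-beta)*c'x
ratio = (K_new - v) / (K - v)
assert ratio <= (r + beta) / (1.0 + r) + 1e-12       # first inequality in (33)
assert (r + beta) / (1.0 + r) <= (1.0 + beta) / 2.0  # second inequality (r <= 1)
```

At this extreme point the first inequality of (33) holds with equality, so the contraction factor (r + β)/(1 + r) cannot be improved without further assumptions.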