Mathematical Programming 26 (1983) 153-171 North-Holland Publishing Company

OPTIMAL APPROXIMATION OF SPARSE HESSIANS AND ITS EQUIVALENCE TO A GRAPH COLORING PROBLEM

S. Thomas McCORMICK

Department of Operations Research, Stanford University, Stanford, CA 94305, U.S.A.

Received 4 December 1981
Revised manuscript received 11 June 1982

We consider the problem of approximating the Hessian matrix of a smooth non-linear function using a minimum number of gradient evaluations, particularly in the case that the Hessian has a known, fixed sparsity pattern. We study the class of Direct Methods for this problem, and propose two new ways of classifying Direct Methods. Examples are given that show the relationships among optimal methods from each class. The problem of finding a non-overlapping direct cover is shown to be equivalent to a generalized graph coloring problem, the distance-2 graph coloring problem. A theorem is proved showing that the general distance-k graph coloring problem is NP-Complete for all fixed k ≥ 2, and hence that the optimal non-overlapping direct cover problem is also NP-Complete. Some worst-case bounds on the performance of a simple coloring heuristic are given. An appendix proves a well-known folklore result, which gives lower bounds on the number of gradient evaluations needed in any possible approximation method.

Key words: Sparse Hessian Approximation, Graph Coloring Heuristics, Elimination Methods, NP-Complete, Direct Methods.

This research was partially supported by the Department of Energy Contract AM03-76SF00326, PA#DE-AT03-76ER72018; Army Research Office Contract DAA29-79-C-0110; Office of Naval Research Contract N00014-74-C-0267; National Science Foundation Grants MCS76-81259, MCS79260099 and ECS-8012974.

1. Introduction

The problem of interest is the approximation of the Hessian matrix of a smooth nonlinear function F: R^n → R. In many circumstances, it is difficult or even impossible to evaluate the Hessian of F from its exact representation. Under these conditions, an approximation to the Hessian can be computed using finite differences of the gradient. When the Hessian is a dense matrix, this approximation is usually obtained by differencing the gradient along the coordinate vectors, and hence requires n evaluations of the gradient (which is the minimum possible number; see Appendix 1). However, if the Hessian has a fixed sparsity pattern at every point (i.e., certain elements are known to be zero), the Hessian may be approximated with a smaller number of gradient evaluations by differencing along specially selected sets of vectors. For example, consider the following sparsity pattern, in which 0 stands for a known zero and 1 stands for a possible non-zero of the Hessian:

    [1 0 1]
    [0 1 1]
    [1 1 1]

If the gradient is differenced along the directions (1, 1, 0)^T and (0, 0, 1)^T, the Hessian may be approximated with only two additional gradient evaluations. When n is large and the proportion of zero elements is high, the number of gradient evaluations needed to approximate the Hessian may be only a small fraction of n. This result is particularly useful in numerical optimization algorithms.

Let g(x) denote the gradient of F and H(x) denote the Hessian. Assume that g(x^0) is known and that we wish to approximate H(x^0) by evaluating g(x^0 + h d_l), l = 1, ..., k, for some step size h and a set of k difference vectors {d_l}. For each l and sufficiently small h, we obtain n approximate linear equations

    h H(x^0) d_l ≈ g(x^0 + h d_l) − g(x^0),        (1)

so that there are a total of nk equations. Note that many of the components of d_l and H(x^0) are usually zero. Schemes for evaluating a Hessian approximation have been divided into three categories, depending on the complexity of the subsystem of (1) that must be solved for the unknown elements of the Hessian (see e.g. [2]). Direct Methods correspond to a diagonal subsystem; Substitution Methods correspond to a triangular subsystem; and Elimination Methods correspond simply to a nonsingular subsystem. There is a tradeoff here: as we move from Direct Methods to Elimination Methods, we are less restricted and thus expect fewer evaluations to be required, but we lose ease of approximation and possibly numerical stability. Once a class of methods has been selected, the problem is to choose a specific method that minimizes k for a given sparsity pattern, without requiring too much effort to determine the vectors {d_l}. This paper is concerned primarily with a partial solution to this problem for direct methods. (The solution for elimination methods is well known, but seems never to have been published. Appendix 1 gives a proof of the theorem behind the solution in this case.) Section 2 gives a new classification for direct methods, Section 3 reduces one of these classes to a graph coloring problem and shows that problem to be NP-Complete, Section 4 gives some heuristic results for the same class, and Section 5 points out possible future avenues of research.
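To make the mechanics concrete, the following is a minimal numerical sketch (ours, not from the paper; the function names and the quadratic test function are illustrative assumptions). It forms one difference vector per group of a direct cover, applies (1), reads off every Hessian entry whose column is alone in its row within some group, and fills in the remainder by symmetry:

```python
import numpy as np

# Sketch (ours): estimate a sparse Hessian with one gradient difference per
# group of a direct cover, as in equation (1).  Entries are read off directly
# wherever a group has exactly one column with an unknown in that row.

def approx_hessian(grad, x0, groups, sparsity, h=1e-6):
    n = x0.size
    H = np.zeros((n, n))
    known = np.zeros((n, n), dtype=bool)
    g0 = grad(x0)
    for S in groups:                       # one extra gradient evaluation each
        d = np.zeros(n)
        d[list(S)] = 1.0                   # d_l = sum of e_j over j in S_l
        y = (grad(x0 + h * d) - g0) / h    # y ~ H(x0) d_l, by (1)
        for i in range(n):
            cols = [j for j in S if sparsity[i, j]]
            if len(cols) == 1:             # column is alone in row i: read off
                H[i, cols[0]] = y[i]
                known[i, cols[0]] = True
    H[~known] = H.T[~known]                # fill the rest by symmetry h_ij = h_ji
    return H

# The 3x3 pattern above (0-indexed), with F(x) = x1^2 + x2^2 + x3^2 + x1*x3 + x2*x3:
sparsity = np.array([[1, 0, 1],
                     [0, 1, 1],
                     [1, 1, 1]], dtype=bool)
grad = lambda x: np.array([2*x[0] + x[2],
                           2*x[1] + x[2],
                           x[0] + x[1] + 2*x[2]])
print(approx_hessian(grad, np.zeros(3), [{0, 1}, {2}], sparsity))
# -> approximately [[2 0 1], [0 2 1], [1 1 2]] from two gradient differences
```

Note that this sketch covers only the direct reads plus plain symmetric fill-in; the sequential resolution of overlaps by subtraction, used by the SeqDC and SimDC classes defined below, is omitted.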

2. Classifying direct methods

Let H denote the Hessian of F at the point x^0. Any element of H that is not known to be zero is called an unknown. An illuminating interpretation of a direct method is to regard the non-zero components of a given d_l as specifying a subset S_l of the columns of H; S_l is called the l-th group of columns. By a slight abuse of notation, a column index j is said to belong to S_l when its column does. When two columns belonging to S_l both have an unknown in row i, there is said to be an overlap in S_l in row i. By definition of a direct method, the family of subsets {S_l} must satisfy the Direct Cover Property (DC):

(DC) For each unknown h_ij, there must be at least one S_l containing column j such that column j is the only column in S_l that has an unknown in row i.

Any family of subsets of columns satisfying (DC) is called a direct cover for H, and naturally gives a scheme for approximating H. That is, if e_j is the j-th unit vector, differencing along

    d_l = Σ_{j ∈ S_l} e_j,    l = 1, ..., k,

is the scheme associated with the family {S_l}. The problem of interest is thus that of finding a minimum cardinality direct cover for a given H (an optimal direct cover). Since it is difficult to find a general optimal direct cover, the problem is often approached heuristically by restricting the acceptable direct covers and attempting to choose an optimal or near-optimal direct cover from the restricted set. We suggest a new classification scheme for types of permitted direct covers. From most to least restrictive, the categories are:

(1) Non-Overlap Direct Covers (NDC): No overlap may occur within any group of columns, i.e., every group has at most one unknown in each row. The best-known heuristic, the CPR method, belongs to this class (see Curtis, Powell and Reid [3]).

(2) Sequentially Overlapping Direct Covers (SeqDC): A less restricted class may be defined by observing that overlap within a group of an ordered direct cover does not violate the direct cover property if the values of overlapping unknowns can be resolved using preceding groups. In a SeqDC, columns in group l are allowed to overlap in row m only if column m belongs to some group k with k < l. This definition implies that (DC) is satisfied: consider an unknown h_ij and let k = min{l | either i or j belongs to group l} (group k is called the minimum index group of h_ij); then h_ij cannot overlap with any other unknown in its row in group k. Powell and Toint [8] propose a heuristic of this class.

(3) Simultaneously Overlapping Direct Covers (SimDC): This is the most general class of direct covers. Any kind of overlap is allowed, as long as (DC) is not violated. In particular, an unknown may overlap in its row even in its minimum index group as long as the overlap is resolved in some succeeding group. Thapa's 'New Direct Method' [9] falls in this class, though he adds several other restrictions.

Define c^XDC to be the number of groups needed by an optimal direct cover of type XDC. Clearly NDC ⊂ SeqDC ⊂ SimDC, which implies that c^NDC ≥ c^SeqDC ≥ c^SimDC. These inclusions and inequalities can be strict, as the following examples show. Consider

    [1 1 1 0 1]
    [1 1 1 0 1]
    [1 1 1 1 1]
    [0 0 1 1 0]
    [1 1 1 0 1]

Since every column overlaps with every other column, an NDC must use at least five groups (and of course, five suffice). But {{3}, {2, 4}, {1}, {5}} is a SeqDC of cardinality 4 < 5. Now consider (from [8]):

    [1 0 1 0 0 0]
    [0 1 1 1 1 0]
    [1 1 1 0 1 0]
    [0 1 0 1 0 0]
    [0 1 1 0 1 1]
    [0 0 0 0 1 1]

It is easy to see that any SeqDC requires at least four groups, but {{1, 5}, {2, 6}, {3, 4}} is a SimDC of size only three.

The above discussion glossed over whether a column can belong to more than one group. This consideration leads to an independent classification scheme for direct covers into:

(1) Partitioned Direct Covers (prefix P): These require the direct cover to be a partition of {1, 2, ..., n}, i.e., every column must belong to exactly one group.

(2) General Direct Covers (prefix G): These allow either columns that belong to no group, or columns that belong to more than one group.

All heuristics proposed so far known to this author restrict themselves to partitioned direct covers. When h_ii is an unknown, column i must belong to at least one group, for otherwise h_ii would not be determined. Since h_ii is an unknown for all i in most unconstrained problems, it seems natural to restrict our attention to direct covers in which each column belongs to at least one group. However, sometimes the optimal direct cover is larger under this restriction. For example, consider an NDC for

    [0 1 1]
    [1 1 0]
    [1 0 1]

Since all three pairs of columns overlap, a PNDC must use three groups; however, {{2}, {3}} is a valid GNDC of smaller size. But such problems rarely occur in unconstrained problems, and we shall henceforth consider only direct covers in which every column belongs to at least one group.

From the remark in the definition of SeqDC that the definition of a SeqDC implies (DC), it is easy to see that in any GSeqDC we can delete a column from all groups in which it appears except for its minimum index group, without violating (DC) (i.e., since any h_ij is always determined directly by some column in that column's minimum index group, any later occurrences of that column are superfluous). Thus in the SeqDC case, and so also in the NDC case, it suffices to consider only partitioned direct covers. But, unfortunately, PSimDCs are not optimal in the class of GSimDCs. Consider

    [1 0 0 1 1 1 1]
    [0 1 1 0 1 1 0]
    [0 1 1 1 0 0 1]
    [1 0 1 1 1 0 0]        (2)
    [1 1 0 1 1 0 0]
    [1 1 0 0 0 1 0]
    [1 0 1 0 0 0 1]

Laborious calculations verify that any PSimDC must have more than four groups. However, {{1, 2}, {1, 3}, {4, 6}, {5, 7}} is a GSimDC of size four in which column 1 appears in two different groups. (The matrix (2) is the smallest possible such example in terms of number of columns.)

There is no efficient way known at present to find an optimal direct cover from any of these classes, though heuristics that appear to work very well in practice are known. The first heuristics studied were designed to find NDCs, largely because this is the natural model which was known for approximating sparse Jacobians (see [2]). Powell and Toint [8] noted that any NDC heuristic can be converted into a SeqDC heuristic just by relaxing the non-overlapping condition to apply only to unknowns that have not yet been determined, so as to take advantage of the symmetry of the Hessian. Finally, Thapa [9] realized that in some cases a SimDC would do better than a SeqDC, and proposed a heuristic for SimDCs. Thapa's New Direct Method appears to offer little gain relative to the extra effort necessary to implement it. However, Powell and Toint's modification of the Curtis, Powell and Reid method adds almost no overhead and can achieve substantial improvements in some cases, and so appears to be the best current choice for actual practice.

The next section shows that finding an optimal NDC is very difficult. Though it would be interesting to analyze the more useful class, SeqDC, NDC seems to be more amenable to analysis. The two classes are similar enough that the results in Section 3 are considerable evidence that finding an optimal SeqDC is also a very hard problem.
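The non-overlap condition is the easiest of the three classes to check mechanically. Here is a small sketch (ours; `is_ndc` is a hypothetical helper, not from the paper) that tests whether a proposed column partition is an NDC:

```python
import numpy as np

# Sketch (ours): a partition of the columns is an NDC exactly when no group
# has two columns with unknowns in the same row.

def is_ndc(sparsity, groups):
    for S in groups:
        cols = sorted(S)
        # per row, count how many columns of this group have an unknown there
        counts = sparsity[:, cols].sum(axis=1)
        if (counts > 1).any():            # an overlap within the group
            return False
    return True

# The 3x3 arrow pattern from the introduction (0-indexed columns):
P = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 1]], dtype=int)

print(is_ndc(P, [{0, 1}, {2}]))    # False: columns 0 and 1 overlap in row 2
print(is_ndc(P, [{0}, {1}, {2}]))  # True: singleton groups never overlap
```

For that pattern the two-group cover of the introduction fails the NDC test even though it determines the Hessian; it is the exploitation of symmetry, permitted by the SeqDC class, that allows fewer than three groups there.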

3. An equivalent graph coloring problem; NP-completeness

We show that the problem of finding an optimal NDC (which can be assumed to be partitioned) is equivalent to a graph coloring problem, which is then shown to be NP-Complete. Our notation and terminology for graphs follow that of Bondy and Murty [1].

Our way of writing a sparsity pattern as a (0, 1)-matrix, call it S, could just as well be interpreted as the vertex-vertex incidence matrix of an undirected graph. That is, the symmetry of the sparsity pattern is reflected in the undirectedness of the graph. A partition of the columns of the sparsity pattern naturally induces a partition of the vertices of the associated graph; the vertex partition can be considered to be some sort of coloring on the graph. Two columns in the same group, i.e. two vertices of the same color, cannot 'overlap'. Column i overlaps column j if s_ki = s_kj = 1 for some row k, i.e., if vertex i and vertex j are both adjacent to vertex k. Thus the restriction on our graph coloring is that no two vertices of the same color can have a common neighbor. If distance from vertex i to vertex j in the graph is measured by 'minimum number of edges in any path between i and j', then we require that any two vertices of the same color must be more than two units apart. In the usual Graph Coloring Problem, we require that any two vertices of the same color be more than one unit apart. This leads to defining a proper distance-k coloring of a graph G to be a partition of the vertices of G into classes (colors) so that any two vertices of the same color are more than k units apart. Then we want to solve the Distance-k Graph Coloring Problem (DkGCP) on a graph G: find a proper distance-k coloring of G in the minimum possible number of colors. Then the usual Graph Coloring Problem (GCP) is D1GCP, and the optimal NDC problem is equivalent to D2GCP. We shall use this equivalence to show that the optimal NDC problem is NP-Complete by showing that D2GCP is NP-Complete; in fact, we shall show the stronger result that DkGCP is NP-Complete for any fixed k ≥ 2.

First we review the definition of NP-Completeness (see Garey and Johnson [4]). The fundamental NP-Complete problem is the Satisfiability Problem (SAT), which we use in a slightly simpler, but equivalent form:

3-Satisfiability (3SAT): Given a set of atoms u_1, u_2, ..., u_n, we get the set of literals L = {u_1, ū_1, u_2, ū_2, ..., u_n, ū_n}. Let C = {C_1, C_2, ..., C_m} be a set of 3-clauses drawn from L, that is, each C_s ⊆ L and |C_s| = 3. Is there a truth assignment T: {u_1, ..., u_n} → {true, false} such that each C_s contains at least one u_i with T(u_i) = true or at least one ū_i with T(u_i) = false?

The set of clauses is really an abstraction of a logical formula: imagine the clauses as parenthesized subformulae whose literals are connected by 'or', with all the clauses connected with 'and'. Then a satisfying truth assignment makes the whole formula true.
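As a small worked instance (ours, not from the paper), take n = 4 and m = 2:

```latex
% Example 3SAT instance (ours): n = 4 atoms, m = 2 clauses.
C_1 = \{u_1, \bar{u}_2, u_3\}, \qquad C_2 = \{\bar{u}_1, u_2, u_4\}.
% The assignment T(u_1) = T(u_2) = \text{true}, T(u_3) = T(u_4) = \text{false}
% satisfies it: C_1 contains u_1 with T(u_1) = \text{true}, and C_2 contains
% u_2 with T(u_2) = \text{true}.
```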


[Figure 1: the global structure of G_k(P).]

3SAT has been shown to be 'at least as hard as' a large class of hard problems. Thus, if 3SAT can be encoded into any other problem X, then X inherits the 'at least as hard as' property and is called NP-Complete. Since GCP is well known to be NP-Complete [7], it is also possible to show that a problem X is NP-Complete by showing that GCP can be encoded into X. This approach might seem to be more natural for DkGCP, but it appears to be very difficult. As we shall see in Section 4, DkGCP encodes more naturally into GCP than vice-versa.

In order to encode 3SAT into DkGCP, DkGCP must be recast as a decision problem. As is standard with optimization problems, we re-phrase DkGCP to 'Is there a distance-k coloring that uses p or fewer colors?' In a slight abuse of notation, we shall let DkGCP refer to both the optimization problem and the related decision problem. Our encoding is a generalization of the one found in Karp's original proof [7] of the NP-Completeness of GCP.

The theorem about the encoding requires the exclusion of the case in which a clause contains both an atom and its negation. But such clauses are always trivially satisfied, so we henceforth understand '3SAT' to mean '3-Satisfiability without trivial clauses'. Also, we assume (without loss of generality) that n ≥ 4. This will allow us to choose, for any clause C_s, an atom u_l with u_l ∉ C_s and ū_l ∉ C_s.

Given a 3SAT problem P, we construct from it a decision problem on a graph G_k(P). If P has atoms u_1, u_2, ..., u_n and clauses C_1, C_2, ..., C_m, let h = ⌈k/2⌉ and p = 2nk + m(k−1). Let V and E denote the vertices and edges of G_k(P), and define them by

    V = { u_i, ū_i,                          i = 1, ..., n    (literal vertices)
          F_i^r, T_i^r,  r = 1, ..., k,      i = 1, ..., n    (false vertices, true vertices)
          C_s^r,  r = 0, ..., k−1,           s = 1, ..., m    (clause vertices)
          I_s^r,  r = 1, ..., k−1,           s = 1, ..., m }  (intermediate vertices)


    E = { {u_i, ū_i}                                              all i               (u_i, ū_i get different colors)
          {F_i^r, F_j^r}, {F_i^r, T_j^r}, {T_i^r, T_j^r}          all r, all i ≠ j    (all F's and T's get different colors)
          {F_i^r, F_i^{r+1}}, {T_i^r, T_i^{r+1}}                  all r
          {I_s^r, I_s^{r+1}}                                      all r
          {I_s^{k−1}, C_s^0}                                      all s               (C_s^0 gets a different color than its literals)
          {I_s^1, u_i} if u_i ∈ C_s,  {I_s^1, ū_i} if ū_i ∈ C_s
          {u_i, F_j^k}, {u_i, T_j^k}, {ū_i, F_j^k}, {ū_i, T_j^k}  all i ≠ j           (u_i, ū_i can only be F_i^1 or T_i^1)
          {C_s^r, C_s^{r+1}}                                      all r               (C_s^r, r > 0, different from each other and from the F's and T's)
          {C_s^r, F_i^{r+1}}, {C_s^r, T_i^{r+1}}                  r ≥ h, all i
          {C_s^r, C_t^r}                                          r ≥ h, all s ≠ t
          {C_s^0, T_i^1}                                          all i }

together with additional edges between the C_s^{h−1} or C_s^h and the F_i^h and T_i^h, depending on the parity of k (all s, k odd; all s, k even), which ensure that C_s^0 can only be an F^1 color of its literals.

Note that we consider G_k(P) only for k ≥ 2, implying that h + 1 ≤ k, so h + 1 makes sense as a superscript for the F's and the T's. The global structure of G_k(P) is pictured in Fig. 1. We need three propositions about the structure of a proper distance-k coloring of G_k(P).

Proposition 1. The vertices F_i^r, T_i^r (r = 1, ..., k) and C_s^r (r = 1, ..., k−1), i = 1, ..., n, s = 1, ..., m, must all have different colors, thus using up all p colors.

Proof. Consider the length at most k paths

    X_i^r − X_i^{r+1} − ⋯ − X_i^q − Y_j^q,    X, Y ∈ {F, T},  r ≤ q,

which demonstrate that all F's and T's must be different colors. Now consider the length k−1 paths

    C_s^1 − C_s^2 − ⋯ − C_s^h − F_i^{h+1} − F_i^{h+2} − ⋯ − F_i^k,
    C_s^1 − C_s^2 − ⋯ − C_s^h − T_i^{h+1} − T_i^{h+2} − ⋯ − T_i^k,        (3)

which show that no C_s^q, 0 < q ≤ h, can be any F_i^r or T_i^r color, r > h. The length at most k−1 paths

    C_s^q − F_i^{q+1} − F_i^{q+2} − ⋯ − F_i^k   and   C_s^q − T_i^{q+1} − ⋯ − T_i^k,    q ≥ h,

show that no C_s^q, h ≤ q < k, can be any F_i^r or T_i^r color, r > h. Let l be an index such that u_l ∉ C_s and ū_l ∉ C_s, and consider the length k−1 paths

    C_s^1 − C_s^2 − ⋯ − C_s^{h−1} − T_l^h − T_l^{h−1} − ⋯ − T_l^r − T_i^r   (k odd),
    C_s^1 − C_s^2 − ⋯ − C_s^h − T_l^h − T_l^{h−1} − ⋯ − T_l^r − T_i^r   (k even),        (4)

which show that no C_s^q, 0 < q < h, can be any T_i^r color, r ≤ h. The length k paths

    C_s^1 − C_s^2 − ⋯ − C_s^{h−1} − F_l^h − F_l^{h−1} − ⋯ − F_l^r − F_i^r   (k odd),
    C_s^1 − C_s^2 − ⋯ − C_s^h − F_l^h − F_l^{h−1} − ⋯ − F_l^r − F_i^r   (k even),        (5)

show that no C_s^q, 0 < q < h, can be any F_i^r color, r ≤ h. The length k paths

    C_s^q − C_s^{q−1} − ⋯ − C_s^{h−1} − F_l^h − F_l^{h−1} − ⋯ − F_l^r − F_i^r   (and similarly with T's)

show that no C_s^q, h ≤ q < k, can be any F_i^r or T_i^r color, r ≤ h. The length k−1 paths

    C_s^r − C_s^{r+1} − ⋯ − C_s^{h−1} − T_l^h − C_t^{h−1} − C_t^{h−2} − ⋯ − C_t^q   (k odd),
    C_s^r − C_s^{r+1} − ⋯ − C_s^h − T_l^h − C_t^h − C_t^{h−1} − ⋯ − C_t^q   (k even),        (6)

show that no C_s^r can be the same color as any C_t^q, 0 < q, r < h. Finally, the length k−1 path

    C_s^1 − C_s^2 − ⋯ − C_s^h − C_t^h − C_t^{h+1} − ⋯ − C_t^{k−1}        (7)

shows that no C_s^q can be the same color as any C_t^r, h ≤ q, r < k. Since the F_i^r, T_i^r and C_s^q, q > 0, use up all the colors, we subsequently refer to the colors by these vertex names.

Proposition 2. Vertices u_i and ū_i must be colored F_i^1 and T_i^1 in some order, i = 1, ..., n.

Proof. Let j ≠ i and consider the length k paths

    u_i − F_j^k − F_j^{k−1} − ⋯ − F_j^1,
    u_i − F_j^k − C_t^{k−1} − C_t^{k−2} − ⋯ − C_t^1,        (8)
    u_i − T_j^k − T_j^{k−1} − ⋯ − T_j^1,

which show that u_i and ū_i cannot be any color other than F_i^1 and T_i^1. Also, u_i and ū_i certainly cannot be the same color.

Thus a proper distance-k coloring of G_k(P) induces a truth assignment on the literals.

Proposition 3. If the literals in clause C_s have indices a, b, and c, then C_s^0 must be colored F_a^1, F_b^1, or F_c^1, s = 1, ..., m.

Proof. We can add C_s^0 to the beginning of the paths in (3), (4), (5), (6) and (7), thus excluding all colors except the F_i^1 from C_s^0. If l is an index such that u_l ∉ C_s and ū_l ∉ C_s, then we can drop the edge F_l^r − F_i^r from (5) and add C_s^0 to the beginning to show that C_s^0 cannot be any F_l^1 color either.

Now we are ready to state and prove the NP-Completeness theorem, which then immediately implies that finding an optimal NDC is NP-Complete. This theorem is a symmetric version of a result in Section 3 of Coleman and Moré [2].

Theorem 1. For fixed k ≥ 2, DkGCP is NP-Complete.

Proof. Since the size of G_k(P) is a polynomial in m and n, it is clear that the above reduction of 3SAT to DkGCP can be carried out in polynomial time. We must show that there is a satisfying truth assignment for the 3SAT problem P if and only if the graph G_k(P) has a proper distance-k coloring in p or fewer colors.

First suppose that G_k(P) is properly distance-k colored. If l_a, l_b, and l_c are the literals contained in C_s, then the length k paths

    C_s^0 − I_s^{k−1} − I_s^{k−2} − ⋯ − I_s^1 − l_x,    x = a, b, c,

show that C_s^0 cannot be the same color as any of l_a, l_b, or l_c. But C_s^0 must be colored F_a^1, F_b^1, or F_c^1 by Proposition 3. By Proposition 2, each l_x is colored either F_x^1 or T_x^1, so each clause must contain at least one true literal under the truth assignment induced by the proper coloring, i.e., the clauses are satisfiable.

Now we need only show that G_k(P) can always be colored in p or fewer colors if P is satisfiable. Let τ be some satisfying truth assignment for C_1, C_2, ..., C_m. First color the F_i^r's, T_i^r's and C_s^r's, r > 0, as decreed by Proposition 1. Color u_i with T_i^1 if τ(u_i) = true, color u_i with F_i^1 otherwise; color ū_i with the complementary color. Each C_s has at least one true literal, say l_x; color C_s^0 with color F_x^1. Finally, color I_s^r with C_{s+1}^r, r = 1, ..., k−1, where the subscript on C is interpreted modulo m.

We now show that this coloring is proper. The colors F_i^r, T_i^r, 1 < r ≤ k, each appear on only one vertex and so are proper. Color C_{s+1}^r appears on exactly two vertices, itself and I_s^r. A shortest possible path between these vertices in G_k(P) is

    I_s^r − I_s^{r+1} − ⋯ − I_s^{k−1} − C_s^0 − C_s^1 − ⋯ − C_s^h − C_{s+1}^h − ⋯ − C_{s+1}^r

and is of length k+1. This is a shortest path because at least k edges must be used to get from layer I^r to layer C^r, and one extra edge must be used to get from an F or a T to a C. Also, any alternative path between these vertices that goes through a C_t has at least k + 2h edges, because of the difference in subscripts and because the C's do not interconnect for r < h; thus color C_{s+1}^r is proper.

Color T_i^1 also appears on exactly two vertices, itself and one of u_i or ū_i. A shortest possible path between these vertices is the third one in (8) with T_i^1 added at the end. For j ≠ i, at least k edges must be used to get from the u layer to the T^1 layer, and an extra edge is necessary to go from an i vertex to a j vertex. Any other path between these vertices through the I's uses at least k + 2h − 1 edges (see Fig. 1), so T_i^1 is proper.

Finally, F_i^1 can appear in three places: on itself, on u_i or ū_i, and on any number of C_s^0 whose clauses contain either u_i or ū_i. As with u_i or ū_i and T_i^1 above, u_i or ū_i and F_i^1 do not cause a conflict. Some shortest possible paths between F_i^1 and any C_s^0 are those in (5) with C_s^0 added to the beginning. Again, at least k edges are necessary to go from the C^0 layer to the F^1 layer, and an extra edge is necessary to go from an l vertex to an i vertex. In Fig. 1 we see that any other path between these vertices through the I's uses at least 2k edges, so no C_s^0, F_i^1 pair causes a conflict. Between a u_i or ū_i and a C_s^0, some shortest possible paths are

    u_i − I_s^1 − I_s^2 − ⋯ − I_s^{k−1} − C_s^0   and   u_i − F_j^k − C_s^{k−1} − C_s^{k−2} − ⋯ − C_s^0,

of lengths k and k+1 respectively. The first cannot exist because of the truth assignment and because there are no trivial clauses. Once again, the second must use k edges going from layer u to layer C^0, and an extra edge going from an F or a T to a C, so no u_i or ū_i, C_s^0 pair conflicts. Finally, a shortest possible path between C_s^0 and C_t^0 is (6) with C_s^0 added to the beginning and C_t^0 added to the end, of length k+1. In Fig. 1 we see that any other path between these vertices through the I's uses at least 2k edges, so no F_i^1 color conflicts. Thus the coloring is proper, and the theorem is proved.

4. Heuristics for finding non-overlapping direct covers

Theorem 1 is, unfortunately, a negative result, since it implies that finding an optimal NDC is very hard. On a more positive note, much work has been done on finding near-optimal, polynomial-time, heuristic algorithms for NP-Complete problems (see [4, Chapter 6]). In the present case, the most obvious heuristic approach is to reduce D2GCP to GCP and then apply known heuristic results on GCP to the reduced graph.

Given a graph G = (V, E), define D2(G) (the distance-2 completion of G) to be the graph on the same vertex set V, and with edges E′ = {{i, j} | i and j are distance 2 or less apart in G}. Then it is easy to verify that a coloring of V is a proper distance-2 coloring of G if and only if it is a proper (distance-1) coloring of D2(G) (note that this reduction also implies that D1GCP is NP-Complete). If there were a 'good' heuristic for GCP, then we could compose it with D2(·) to obtain a 'good' heuristic for D2GCP. Coleman and Moré [2, Section 4] give a good overview of the present state of the art in GCP heuristics, which is not 'good'. In fact, if cH(G) denotes the number of colors used by the best known heuristic on graph G, and χ(G) denotes the optimal number of colors necessary for G (its chromatic number), then in the worst case

    max_G cH(G)/χ(G) = O(n),        (9)

where the max is taken over graphs G with n vertices

(this best heuristic and the bound (9) are due to Johnson [6]). Two facts mitigate the unpleasantness of (9). First, the range of D2(·) does not include all graphs, and hence a better bound than (9) can be obtained for D2GCP. Second, average-case results have been obtained for GCP heuristics that are considerably better than (9).

To improve on (9) for D2GCP, consider the specific heuristic called the distance-2 sequential algorithm (D2SA). Define N(i) = {j | j ≠ i and j is distance 2 or less from vertex i}. D2SA assigns to vertex i the color min{c | no j ∈ N(i), j < i, is colored c}, for i = 1, ..., |V|. That is, D2SA assigns vertex i the smallest color not conflicting with those already assigned. (This is just the distance-2 version of the best known GCP heuristic, the sequential algorithm, which is called the CPR method in its applications to approximating sparse Jacobians; see Curtis et al. [3].) Let cS(G) denote the number of colors used by D2SA when applied to G.

In order to obtain bounds on cS(G), we require two definitions. The maximum degree of G, Δ(G), is defined as

    Δ(G) = max_i |{j | {i, j} ∈ E(G)}|.

The distance-2 chromatic number of G, χ2(G), is defined as the optimal number of colors in a proper distance-2 coloring of G, i.e.

    χ2(G) = min{k | G has a proper distance-2 coloring with k colors}.
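A minimal Python sketch of D2SA (ours; the paper gives no code), together with the distance-2 completion D2(·) read off the same neighborhoods:

```python
from itertools import count

# Sketch (ours) of the distance-2 sequential algorithm (D2SA).
# G is an adjacency list: G[i] = set of neighbors of vertex i, vertices 0..n-1.

def d2sa(G):
    n = len(G)
    # N(i): all vertices within distance 2 of i, excluding i itself
    N = [set(G[i]).union(*(G[j] for j in G[i])) - {i} for i in range(n)]
    color = [None] * n
    for i in range(n):                    # color vertices in index order
        used = {color[j] for j in N[i] if j < i}
        color[i] = next(c for c in count(1) if c not in used)
    return color                          # c^S(G) = max(color)

def d2_completion(G):
    # adjacency list of D2(G): join every vertex to its distance-<=2 neighbors
    return [set(G[i]) | {k for j in G[i] for k in G[j] if k != i}
            for i in range(len(G))]

# Example: a path on 4 vertices.  Any two of the first three vertices share a
# neighbor, so chi_2 = 3, and D2SA also uses 3 colors here.
P4 = [{1}, {0, 2}, {1, 3}, {2}]
print(d2sa(P4))  # [1, 2, 3, 1]
```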

The following theorem bounds χ2(G) and cS(G) in terms of Δ(G), and a corollary improves (9) for D2SA:

Theorem 2. Let d = Δ(G). Then

    d + 1 ≤ χ2(G) ≤ cS(G) ≤ d² + 1        (10)

for all graphs G.

Proof. Let i be a vertex incident to exactly d edges, and note that i and its d nearest neighbors must all be different colors in a proper distance-2 coloring; this proves the lower bound in (10). The second inequality in (10) is trivial. To prove the upper bound in (10), note that for any vertex i, |N(i)| ≤ d + d(d − 1) = d². Suppose that D2SA assigns color l to vertex i; by definition of D2SA, this can happen only if at least one vertex of each color 1, ..., l − 1 is in N(i). Thus, if i were assigned color l > d² + 1, then |N(i)| ≥ d² + 1, a contradiction. (This proof is essentially a constructive proof of Corollary 8.2.1 in Bondy and Murty [1].)

Corollary 1. For all n ≥ 1,

    max_G cS(G)/χ2(G) ≤ √(n−1) + 1 = O(√n),        (11)

where the max is taken over graphs G with n vertices.

Proof. Clearly, cS(G)
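The proof text breaks off above; a sketch (ours) of how (11) would follow from Theorem 2, assuming the proof's first step is the trivial bound cS(G) ≤ n:

```latex
% Sketch (ours): deriving (11) from (10) and the trivial bound c^S(G) \le n.
% Write d = \Delta(G) and t = \sqrt{n-1}.
\begin{align*}
d \ge t &: \quad \frac{c^S(G)}{\chi_2(G)} \le \frac{n}{d+1}
            \le \frac{t^2 + 1}{t + 1} \le t + 1,\\
d \le t &: \quad \frac{c^S(G)}{\chi_2(G)} \le \frac{d^2 + 1}{d + 1}
            \le d + 1 \le t + 1,
\end{align*}
% so in either case c^S(G)/\chi_2(G) \le \sqrt{n-1} + 1 = O(\sqrt{n}).
```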
