Approximation and Hardness Results for Label Cut and Related ...

2 downloads 1096 Views 362KB Size Report
Emails: [email protected], linqing@ios.ac.cn, angsheng@ios.ac.cn. ...... Multicriteria Approximation of Network Design and Network Upgrade Problems.
Approximation and Hardness Results for Label Cut and Related Problems Peng Zhang



Linqing Tang



Wenbo Zhao



Jin-Yi Cai



Angsheng Li



Abstract We investigate a natural combinatorial optimization problem called the Label Cut problem. Given an input graph G with a source s and a sink t, the edges of G are classified into different categories, represented by a set of labels. The labels may also have weights. We want to pick a subset of labels of minimum cardinality (or minimum total weight), such that the removal of all edges with these labels disconnects s and t. We give the first non-trivial approximation √ and hardness results for the Label Cut problem. Firstly, we present an O( m)-approximation algorithm for the Label Cut problem, where m is the number of edges in the input graph. 1−1/ log logc n n Secondly, we show that it is NP-hard to approximate Label Cut within 2log for any constant c < 1/2, where n is the input length of the problem. Thirdly, our techniques can be applied to other previously considered optimization problems. In particular we show that the Minimum Label Path problem has the same approximation hardness as that of Label Cut, simultaneously improving and unifying two known hardness results for this problem which were previously the best (but incomparable due to different complexity assumptions).

1

Introduction

In many graph optimization problems, it is natural to associate edges with labels (or colors) which partition the set of edges into categories. There are several classical optimization problems considered under this model, such as the Minimum Label Spanning Tree problem [3, 6, 14, 19] and the Minimum Label s-t Path problem (Label Path, for short) [4, 14]. In this paper we consider the Minimum Label s-t Cut problem (Label Cut, for short), in which we are asked to pick labels at minimum total cost to disconnect a source s and a sink t by removing edges from the graph whose labels are picked. The Label Cut problem is a natural generalization of the classical Minimum s-t Cut problem, in the sense that every edge in the latter has a unique label. We give the first non-trivial approximation algorithm as well as prove hardness results for the Label Cut problem. This problem is also a generalization of the Minimum Set Cover or Minimum Hitting Set problems. In fact it can be considered as a version of the Minimum Hitting Set problem where the collection of sets to be “hit” is implicitly represented (and could be exponentially many). This problem is sufficiently natural that it can appear in many contexts. We came across this problem from the work of Jha, Sheyner and Wing [15], and of Sheyner, Haines, Jha, Lippmann, ∗

State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, P.O. Box 8718, Beijing, 100080, China. Emails: [email protected], [email protected], [email protected]. Supported by NSFC grants No. 60325206 and No. 60310213. Corresponding authors: Linqing Tang, Peng Zhang. † Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA. Email: [email protected]. ‡ Computer Sciences Department, University of Wisconsin, Madison, WI 53706. USA. Email: [email protected]. Supported by NSF CCF-0511679.

1

and Wing [23, 22] in computer security, in particular on intrusion detection and on generation and analysis of Attack Graphs¶ . In this application, an attack graph G has nodes representing various states, and directed edges representing state transitions and are labeled by possible “atomic attacks”. A pair of special nodes s and t are also given representing the initial state and the success state (for the intruder). To disable an “atomic attack” incurs some cost (a unit or a wighted cost). Then the computational task is to find a subset of “atomic attacks” of minimum cardinality (or of minimum total weight), such that the removal of all edges labeled by these “atomic attacks” disconnects s and t. This is precisely the Minimum Label Cut problem. Formally, an instance of the Minimum Label Cut problem consists of a graph G = (V, E) (directed or undirected) with one source node s ∈ V and one sink node t ∈ V , and a label (or color) set L = {1, 2, . . . , q}. Each edge e ∈ E has a unique label l(e) ∈ L, but different edges may have the same label. Each label l ∈ L has a nonnegative weight w(l). A subset L′ ⊆ L of labels is called a label cut if the removal of all edges of labels in L′ disconnects s and t. The problem asks to find a label cut L′ such that the total weight of L′ is minimized.

1.1

Related Works

In [15] it was observed that Label Cut is NP-hard by reducing Minimum Hitting Set to it. There is a simple duality between Minimum Hitting Set and Minimum Set Cover. It is well known that Minimum Set Cover has a greedy polynomial time (1 + loge |U |)-approximation algorithm, where |U | is the size of the underlying set. The same algorithm can be translated to Hitting Set by duality. In [15] the authors express the Label Cut problem as an implicit Minimum Hitting Set problem, and then translate the Set Cover algorithm to get an approximation algorithm. This implicit Minimum Hitting Set problem is as follows: Let D = {L′ | L′ is the set of labels appearing on a path from s and t}. This is the set to hit. A set of labels is a label cut iff it intersects every L′ ∈ D. The approximation algorithm for Set Cover translates to an approximation algorithm for Label Cut with approximation guarantee of 1 + ln |D|. However, as observed by Jha, Sheyner and Wing [15], |D| is typically exponentially larger than the input size (number of edges in G). Since the removal of all edges is certainly a feasible solution to Label Cut, the worst-case approximation ratio 1 + ln |D| is useless. (There is an additional problem. This concerns the need to compute in polynomial time the next “greedy” step in the approximation algorithm for Set Cover. For implicit Minimum Hitting Set, we would need to find the next label which “hits” the most remaining paths from s to t, and it is not known how to do this in general. The authors in [15] consider special cases where this is feasible.) We mention another problem, dual to Label Cut, called the Label Path problem. Given an edge-labeled graph, the Label Path problem finds an s-t path with minimum total number of labels on the path. Label Path is NP-hard by a reduction from 3SAT. Broersma et. al. [4] devised two exact exponential-time algorithms for the Label Path problem, with respective running times of O(n · min{q d , 2q }) and O(n2 q!), where d is the s-t distance. Through an approximation preserving reduction from the Red-Blue Set Cover problem [5], Wirth [24, Theorem 2.16] proved that unless 1−ǫ NP ⊆ DTIME(npolylog(n) ) Label Path can not be approximated within ratio O(2log n ). Very √ recently, Hassin, Monnot and Segev [14] gave an O( n)-approximation algorithm for the weighted Label Path problem, and an approximation hardness result O(logk n) (for any fixed k ≥ 1) for the problem assuming P 6= NP. Notice that the two hardness results for Label Path are incomparable. Minimum s-t Cut is solvable in polynomial time. On the other hand, a generalization called the Multicut problem, asks for edge removals to disconnect a given set of source-sink pairs at ¶

We are indebted to Jeannette Wing who gave an overview talk on their interesting work at Tsinghua University.

2

minimum cost, is NP-hard even for trees [13]. The relationship between the Multicut problem and the Shortest Path problem is very similar to that between Label Cut and Label Path. Multicut can √ be approximated within a factor of O( n) in directed graphs [11], a factor of O(log k) in undirected graphs [12] (where k is the number of source-sink pairs), and a factor of 2 in trees [13]. It seems that to prove approximation hardness of Multicut is difficult, and the only known hardness results of Multicut are logarithmic in directed graphs [8] and a constant factor in undirected graphs [7, 17].

1.2

Our Results

In this paper, we give the first non-trivial approximation and hardness results for Label Cut. Firstly, √ we give a polynomial time O( m)-approximation algorithm, where m is the number of edges in the input graph. The idea of the algorithm is that we first make the graph sparse by picking labels at appropriately inexpensive cost, and when the graph is sufficiently sparse we directly compute a minimum s-t cut in the resulting graph. The solution to the problem is then the set of labels picked in the first stage and the labels in the minimum s-t cut computed in the second stage. We use ideas and results of [14, 18]. We also extend the algorithm to deal with related cut problems in edge-classified graphs, such as Label Multicut, Label Multiway Cut, and Label k-Cut. Secondly, on approximation hardness, we show that Label Cut can not be approximated within 1−1/ log logc n log n for any constant c < 1/2 unless P = NP, where n is the input length of the 2 problem. We prove this by a gap-preserving reduction from the Label Cover problem, for which Dinur and Safra [10] proved such a hardness result. The proof is ultimately based on a reduction from the PCP theorem. Our method of the reduction also gives the same approximation hardness for the Minimum Label Path problem as that of Label Cut, thus simultaneously improving and unifying two known hardness results [14, 24] for this problem which were previously the best (but incomparable due to different complexity assumptions as discussed in Section 1.1). The proof of hardness for Label Cut and Label Path relies on two ingredients. Firstly we 1−1/ log logc n n (under complexity assumption P 6= NP) is show that the approximation hardness 2log preserved even for a special type of instances of Label Cover, in which the two parts of an input bipartite graph have the same number of vertices. Secondly, we show that the mapping relation between labels in the Label Cover problem can be represented by some sub-structures in the Label Cut (resp. Label Path) problems, and thus the cut requirement in Label Cut (resp. the connectivity requirement in Label Path) is equivalent to the edge-covering requirement in Label Cover.

2

√ An O( m)-Approximation Algorithm for Label Cut

Minimum s-t Cut can be solved in P. If a graph G does not have many edges, it also has an s-t cut without many edges. Consequently it also has a small label cut. Our plan is to successively remove edge sets which are relatively large, yet defined by small label sets. When the graph becomes sufficiently sparse we can directly remove the minimum s-t cut and its corresponding label set. To do this we consider the Budgeted Maximum Coverage problem (BMCP, for short). We are given a ground set U , a collection S of subsets of U , and a budget B. Each subset Si ∈ S has a nonnegative weight. The problem is to find a sub-collection S ′ ⊆ S such that the total weight of S ′ is at most B and the number of elements in U covered by S ′ is maximized. This problem can be approximated in polynomial time within a factor of 1 − 1e [18]. Now we give a detailed outline of our approximation algorithm for Label Cut. Denote by P w(L′ ) = l∈L′ w(l) the total weight of labels from a label subset L′ ⊆ L. Let m = |E| and L∗ ⊆ L be an optimal solution to Label Cut. We aim for an m1/2 -approximation of the optimal solution,

3

i.e., we want to output a label cut L′ ⊆ L such that w(L′ ) ≤ m1/2 w(L∗ ). We start with an estimate ∆ on the optimum OPT = w(L∗ ) of the Label Cut instance, such that OPT ≤ ∆ < 2 · OPT. We try a sequence of ∆ as guesses (see Theorem 2.1). Suppose we have a correct ∆. Starting with H0 = G, we will perform a sequence of edge removal operations. This creates graphs H0 , H1 , . . . , Hm′ , where Hi+1 is obtained from Hi by the removal of some subset of edges. Each step i also produces a subset of labels Li . The final output is the union of all these Li . At step i, we are given Hi . Keeping only edges in Hi with label weight > ∆ (and delete others) defines a subgraph Hi′ . Suppose that G is undirected, then we find connected components of Hi′ . For directed G we find strongly connected components of Hi′ . Then we merge the nodes from Hi within each (strongly) connected component defined above to obtain Hi∗ . Edges within a component disappear. Other edges remain; in particular between the merged nodes there may be multiple edges in Hi∗ . All edges retain their labels, and all label weights in Hi∗ are ≤ ∆. If OPT ≤ ∆, then vertices s and t remain distinct in Hi∗ ; otherwise an s-t path using only edges of weight > ∆ exists which has no intersection with the cut of total weight OPT ≤ ∆. Now we use a Maxflow-Mincut algorithm to find the minimum cut size c∗i = |mincutHi∗ (s, t)| of Hi∗ . Here each edge in Hi∗ counts as one, and multiple edges count with multiplicity. If c∗i ≤ m1/2 , then we find a minimum cut of Hi∗ , and use all labels on this cut. This is our Li , and with this we terminate with m′ = i. Note that all edges of Hi of labels in Li also form a cut in Hi . Now suppose c∗i > m1/2 . We consider the following instance of BMCP where the ground set is the set of all edges E(Hi ) of the graph Hi , and the collection of subsets S in BMCP consists of all Sl = {e ∈ E(Hi ) | l(e) = l} for l ∈ L, and where the weight of Sl is w(l). We apply the polynomial time approximation algorithm [18] on this instance for budget ∆ to get a sub-collection S ′ ⊆ S. S Then we use all labels coming from S ′ as our Li in this case. We claim that the edge set l∈Li Sl corresponding to Li has cardinality at least 21 m1/2 (see Lemma 2.1). This gives the approximation algorithm for Label Cut, as presented by Algorithm A below. For any subset E ′ ⊆ E, let L(E ′ ) denote the set of all labels in E ′ . 1. 2.

3. 4. 5.

Algorithm A let i ← 0, H0 ← G. Define H0′ and H0∗ as described above. while |mincutHi∗ (s, t)| > m1/2 a. Construct a BMCP instance I ′ as follows: the ground set U = E(Hi ), the collection S consists of Sl = {e ∈ E(Hi ) | l(e) = l} with weight w(l) for every l ∈ L, and the budget B = ∆. b. Approximately solve I ′ and let Ei be the set of covered edges. ′ ∗ c. let Li ← L(Ei ), Hi+1 ← Hi \ Ei . Define Hi+1 and Hi+1 as described above. d. let i ← i + 1. endwhile let Li ← L(mincutHi∗ (s, t)). S return L′ = i Li .

Let m′ be the number of iterations of step 2, i.e., m′ equals the value of i when step 2 terminates. Lemma 2.1 For Algorithm A, if OPT ≤ ∆, then m′ < 2m1/2 . Proof: Suppose that an optimal solution to the Label Cut instance I is L∗ ⊆ L with total weight OPT = w(L∗ ) ≤ ∆, and that E ∗ ⊆ E(G) is the set of edges corresponding to the label subset L∗ . In each iteration of step 2, E ∗ ∩ E(Hi∗ ) is a feasible solution to the BMCP instance I ′ , since the total weight of labels in E ∗ ∩ E(Hi∗ ) is at most w(L∗ ), w(L(E ∗ ∩ E(Hi ))) ≤ w(L∗ ) ≤ ∆. 4

A crucial observation is that E ∗ ∩ E(Hi∗ ) ⊆ E ∗ ∩ E(Hi ) is also an s-t cut in Hi∗ . This is because any s-t path in Hi∗ can be extended to an s-t path in Hi by appending only edges of label weight > ∆. The path in Hi must intersect E ∗ ∩ E(Hi ), but all edges in E ∗ have label weight ≤ ∆, therefore the s-t path in Hi∗ must intersect E ∗ ∩ E(Hi∗ ). So, we know that |E ∗ ∩ E(Hi∗ )| ≥ |mincutHi∗ (s, t)| > m1/2 . By the work of [18], we know that Ei from 2.b. covers at least (1 − 1e )OPTBMCP (I ′ ) edges, where OPTBMCP (I ′ ) is the optimum of instance I ′ . So in each iteration of step 2 at least

1 1 1 (1 − )OPTBMCP (I ′ ) ≥ (1 − )|E ∗ ∩ E(Hi∗ )| > m1/2 e e 2 edges are removed from Hi . Since there are at most m edges in total in graph H0 , it follows that the number of iterations m′ < 2m1/2 . Theorem 2.1 The Label Cut problem can be approximated within a factor of O(m1/2 ) in polynomial time, where m is the number of edges in the input graph of the problem. Proof: Let wmin denote the minimum (nonzero) label weight. Then we know wmin ≤ OPT ≤ w(L). We try Algorithm A with ∆ = 2k · wmin for every k = 0, . . . , ⌈log2 w(L) wmin ⌉, where ∆ is used as an estimate of OPT. Let k be such that 2k−1 · wmin < OPT ≤ 2k · wmin . Then for the guess ∆ = 2k · wmin , we have OPT ≤ ∆ < 2 · OPT. With this ∆, we know that the number of iterations m′ < 2m1/2 by Lemma 2.1. Moreover, when step 2 terminates, |mincutH ∗ ′ (s, t)| is at most m1/2 , implying that m

∗ 1/2 ∆. w(Lm′ ) ≤ |mincutH ∗ ′ (s, t)| · max{w(l): l ∈ L(E(Hm ′ ))} ≤ m m

Thus the total weight of the output L′ is at most m′ ∆ + w(Lm′ ) ≤ 2m1/2 · ∆ + m1/2 · ∆ = O(m1/2 ) · OPT. We note that, from the label set Lm′ derived from mincutH ∗ ′ (s, t), all edges of labels from Lm′ m in Hm′ do constitute an s-t cut in Hm′ , and therefore the output L′ is an s-t cut in G. Finally, for every k, we run Algorithm A. (We never run beyond m′ ≥ 2m1/2 , and verify each time s and t remain distinct in the merging operations; if these are unsatisfied then we go to the next ∆.) Among all tries with different ∆ we pick the label subset with minimum total weight as our final solution. Our method here can be extended to deal with other cut problems in the edge-classified graph model, such as Label Multicut, Label Multiway Cut and Label k-Cut e.t.c. (we call them generally Label γ-Cut problems below). Algorithm A can be generalized to deal with each of them by replacing the minimum s-t cut computation in step 2 with a known approximation algorithm for the corresponding (NP-hard) unlabeled problem. If the approximation ratio of the algorithm for the unlabeled problem is ρ, we can show that the approximation ratio of the modified algorithm A is O((mρ)1/2 ). For the problems Multicut, Multiway Cut and k-Cut in undirected graphs, we know that ρ are O(log k) [12], 1.3438 [16], and 2 [21], respectively. The proof of Theorem 2.2 is omitted here due to space limit and is given in the Appendix. 5

Theorem 2.2 The Label γ-Cut problem can be approximated within factor of O((mρ)1/2 ) in polynomial time, where m is the number of edges in the input graph and ρ is the approximation ratio of the existing approximation algorithm for γ-Cut.

3

Approximation Hardness for Label Cut

In this section we prove that it is NP-hard to approximate the Minimum Label Cut problem 1−o(1) n . We do this by reducing the Minimum Label Cover problem to Label Cut. to within 2log The Minimum Label Cover problem (Label Cover, for short) was implicit in [20] and first formally defined in [1]. Label Cover is one of the six canonical intractable problems in terms of approximation introduced by Arora and Lund [2]. The current best known approximation hardness result for Label Cover is due to Dinur and Safra [10], which states that the problem is NP-hard to approximate 1−1/ log logc n n for any constant c < 1/2. within 2log In the Minimum Label Cover problem, we are given a bipartite graph G = (U, V, E) where E ⊆ U × V , two sets of possible labels B1 and B2 for vertices in U and V respectively, and a relation Π ⊆ E × B1 × B2 that consists of admissible pairs of labels for each edge e ∈ E. A labeling is a pair of functions (P1 , P2 ) that assigns a non-empty set of labels to every vertex in U ∪ V , where P1 : U → 2B1 and P2 : V → 2B2 . It is said to cover an edge e = (u, v) (where u ∈ U and v ∈ V ) if for every label b ∈ P2 (v) there is some label a ∈ P1 (u) such that (e, a, b) ∈ Π. The l1 -cost of a labeling is the l1 norm of the vector (|P1 (u1 )|, . . . , |P1 (u|U | )|), that is, the total number P u∈U |P1 (u)| of labels assigned to vertices in U by function P1 counted with multiplicities. A total-cover (a solution) of G is a labeling that covers all the edges in G. The problem is to find a total-cover with minimal l1 -cost. We always implicitly assume that the only instances we shall consider for Minimum Label Cover are such that a total-cover exists (see [2], p403, line −8). 1−o(1) n for Label Cover via a Dinur and Safra [10] proved the approximation hardness result 2log reduction from the following version of the PCP Theorem. Theorem 3.1 ([9]) Let Ψ = {ψ1 , . . . , ψn } be a system of local-tests over variables X = {x1 , . . . , xn′ } such that each local-test depends on D = log logc n variables (for any constant c < 1/2), and each 1−1/O(D) n ). It is NP-hard to distinguish between variable ranges over a field F where |F| = O(2log the following two cases: Yes: There is an assignment to the variables such that all ψ1 , . . . , ψn are satisfied. 2 fraction of the ψi ’s. No: No assignment can satisfy more than |F| Define gc (n) = 2log

1−1/ log logc n

n.

Dinur and Safra proved that

Theorem 3.2 ([10]) Let Ψ be a local-test system defined in Theorem 3.1. Then an instance I = ((U, V, E), B1 , B2 , Π) of Label Cover can be constructed from Ψ in polynomial time such that for any constant c < 1/2, (Soundness) Ψ is satisfied =⇒ OPT(I) = |U |, and (Completeness) No assignment can satisfy more than

2 fraction of Ψ =⇒ OPT(I) > g|U |, |F|

where g = gc (|I|) and OPT(I) denotes the optimum of Label Cover on the instance I. We shall prove the approximation hardness of Label Cut via a gap-preserving reduction from Label Cover. For succinctness, denote by E(v) ⊆ E all edges that are incident to v, and by 6

N (v) ⊆ U all endpoints of edges in E(v) that lie in U , i.e., N (v) is the neighborhood of vertex v. Without loss of generality we assume |N (v)| ≥ 1 for all v ∈ V . Similarly every u is in some S neighborhood N (v), i.e., U = v∈V N (v). Let us begin with a simple lemma about labels in B2 . Lemma 3.1 For any vertex v ∈ V and label b ∈ B2 , if there is a vertex u ∈ N (v) such that for all a ∈ B1 , ((u, v), a, b) 6∈ Π, then vertex v can not be assigned with label b by any solution. Proof: The proof is straightforward. If vertex v is associated with label b by some solution (P1 , P2 ), since (P1 , P2 ) must be a total-cover, we know that every vertex u ∈ N (v) has a mapping ((u, v), a, b) ∈ Π for some a ∈ B1 by the cover condition, contradicting the condition in the lemma. If a label b ∈ B2 satisfies the condition in Lemma 3.1, we say this label b is excluded (by the vertex v). A label b is non-excluded (by the vertex v) iff for all u ∈ N (v) there exists some label a ∈ B1 , such that ((u, v), a, b) ∈ Π. Since we only consider instances where a total-cover exists, we must have for every v, it has a non-empty set of non-excluded labels. The following lemma gives a transformation on the instances of Label Cover, which is useful for the gap-preserving reduction from Label Cover to Label Cut. Lemma 3.2 Without loss of generality, we may assume that instances of Label Cover satisfy |U | = |V |. Proof: Take any instance I = ((U, V, E), B1 , B2 , Π) of Label Cover. We apply a transformation on I, which was originally introduced in [2, p. 413] (for the maximum version of Label Cover), to obtain an instance I ′ = ((U ′ , V ′ , E ′ ), B1 , B2 , Π′ ) of Label Cover. U ′ consists of |V | copies of U , and V ′ consists of |U | copies of V . It is obvious that |U ′ | = |V ′ | for the new instance I ′ . The edge set E ′ in I ′ consists of copies of E between each new pair of copies of U and V . The relation Π′ is constructed by duplicating Π accordingly. If I has a labeling (P1 , P2 ) with l1 -cost equal to |U |, then I ′ has a labeling with l1 -cost |U ′ |, since we can get a labeling for I ′ by duplicating (P1 , P2 ). Conversely, if every labeling for I has l1 -cost > g|U |, then every labeling for I ′ has l1 -cost > g|U ′ |. Suppose not. If I ′ has a labeling (P1′ , P2′ ) with l1 -cost ≤ g|U ′ |, then we can get a labeling for each copy of U and V by restricting (P1′ , P2′ ) on that copy. Picking the copy of U with the minimum l1 -cost, together with any copy of V gives us a solution to I with l1 -cost ≤ g|U |. That is, for instances I and I ′ we have OPT(I) = |U | =⇒ OPT(I ′ ) = |U ′ |,

OPT(I) > g|U | =⇒ OPT(I ′ ) > g|U ′ |. By adjusting the arbitrary constant c < 1/2, g = gc (n) can absorb any polynomial increase in n. The Lemma follows. Lemma 3.3 There is a polynomial-time gap-preserving reduction τ from Label Cover to Label Cut that transforms any instance I = ((U, V, E), B1 , B2 , Π) of Label Cover to an instance I ′ = ((V ′ , E ′ ), s, t, L) of Label Cut, such that, OPTCOVER (I) = |U | =⇒ OPTCUT (I ′ ) ≤ |U | + |V |,

OPTCOVER (I) > g|U | =⇒ OPTCUT (I ′ ) > g|U | + |V |, where OPTCOVER () (resp. OPTCUT ()) returns the optimum of any instance of Label Cover (resp. Label Cut). 7

Figure 1: Gadget Gv,b for vertex v ∈ V and label b ∈ B2

Figure 2: Gadget Gv for vertex v ∈ V Proof: Given an instance I = ((U, V, E), B1 , B2 , Π) of Label Cover, the instance I ′ = (G′ , s, t, L) of Label Cut is constructed as follows. For every vertex v and non-excluded label b, there is a gadget Gv,b , as shown in Figure 1. Gv,b consists of a series of parallel chains, each of which corresponds to an edge (u, v) ∈ E(v). For such an edge (u, v), suppose that the set of labels mapped to b ∈ B2 by Π is {a1 , a2 , . . . , ap } ⊆ B1 , i.e., this is the subset of labels a ∈ B1 such that ((u, v), a, b) ∈ Π. (Since b is a non-excluded label by v, for every neighbor u ∈ N (v), this number p ≥ 1. This p depends on v, b and u. Note also that |N (v)| ≥ 1 as noted earlier.) Then in Gv,b the chain corresponding to edge (u, v) contains p consecutive diamonds, with the ith diamond (1 ≤ i ≤ p) having label (u, ai ) for its two top edges, and label (v, b) for its two bottom edges (see Figure 1). All other chains are constructed in the same manner, one chain for each u ∈ N (v). The dashed lines (not belonging to the graph) at the two ends of Gv,b mean that the ends of each chain are merged. Next, a gadget Gv for every vertex v ∈ V is constructed by linking several gadgets Gv,b together, where each gadget Gv,b corresponds to a non-excluded label b by v (see Figure 2.) Note that only Gv,b for non-excluded labels b for the vertex v appear in Gv . Its number is at least one but ≤ |B2 |. (In Figure 2 for notational simplicity we assume that for the given example of Gv , all |B2 | labels are non-excluded labels for v and are denoted as 1, 2, . . . , |B2 |.) Finally, graph G′ = (V ′ , E ′ ) consists of a series of parallel gadgets Gv for every vertex v ∈ V . All the heads of gadgets Gv are merged into one vertex, that is, the source vertex s, meanwhile all the tails of gadgets Gv are merged into one vertex, that is, the sink vertex t. The label set L contains all labels of the forms (u, a) and (v, b) used in the construction. Every label in L has unit weight. Since each diamond of G′ corresponds to a tuple of Π, we know that I ′ can be constructed in polynomial time. Suppose (P1∗ , P2∗ ) is an optimal solution to the Label Cover instance I whose l1 -cost is |U |. 8

For any v ∈ V , since P2∗ (v) is non-empty, there exists some b ∈ P2∗ (v). This b is obviously a non-excluded label, by the definition of a solution in Label Cover. So (v, b) is a label in L. Now we pick the label (v, b). Next, for all u ∈ U , pick all labels (u, a) for a ∈ P1∗ (u). Since the l1 -cost of (P1∗ , P2∗ ) is |U |, this counts |P1∗ (u)| for different u ∈ U with multiplicity, it follows that we have picked a total of |U | + |V | labels from L. We claim that removing all edges with the picked labels disconnects s and t in G′ . This is equivalent to disconnecting every Gv . Fix any v ∈ V . We consider the picked label (v, b), and show that the gadget Gv,b is disconnected. This in turn is equivalent to disconnecting every chain in Gv,b , one chain for each u ∈ N (v). Consider such a chain defined by a particular u ∈ N (v). Since all edges labeled (v, b) have been removed, we just need to prove that at least one diamond has been completely removed. Suppose there are p diamonds, where the upper path is labeled by (u, a1 ), . . . , (u, ap ), and these ai are those satisfing ((u, v), ai , b) ∈ Π. (see Figure 1). Since (P1∗ , P2∗ ) is a total-cover, for some 1 ≤ i ≤ p, ai ∈ P1∗ (u). But we did pick labels (u, a), for all a ∈ P1∗ (u). It follows that OPTCUT (I ′ ) ≤ |U | + |V |. Now we prove the second part of the lemma by its contrapositive. Suppose the optimum of the Label Cut instance I ′ is ≤ g|U | + |V |, and an optimal solution is L∗ ⊆ L. For clarity let us call a label in L of the form (u, a) a u-label, and a label of the form (v, b) a v-label. Denote by nul (resp. nvl ) the number of u-labels (resp. v-labels) in L∗ . Since G′ consists of |V | subgraphs Gv in parallel, the removal of edges with labels in L∗ must disconnect each Gv . By construction of Gv as a chain of Gv,b , it must disconnect some Gv,b , and therefore must pick at least one v-label (v, b) for every v ∈ V . This defines our assignment P2 on V , and nvl ≥ |V |. Next we define P1 on U as follows: P1 (u) = {a | (u, a) is a picked u-label in L∗ }. Since nvl ≥ |V |, it follows that nul ≤ g|U |. Since L∗ disconnects Gv,b , for every u ∈ N (v), L∗ must include at least one u-label (u, a) such S that ((u, v), a, b) ∈ Π. In particular P1 (u) is non-empty. Since U = v∈V N (v), this is true for all u ∈ U . This also shows that (P1 , P2 ) is a solution to the Label Cover instance I. Since nul ≤ g|U |, we know that the l1 -cost of (P1 , P2 ) is ≤ g|U |. This completes the proof of the contrapositive. Theorem 3.3 For any constant c < 1/2, the Label Cut problem can not be approximated within 1−1/ log logc n n in polynomial time unless P = NP, where n is the input length of the problem. 2log Proof: Given any Label Cover instance I, first we apply the transformation presented in Lemma 3.2 to get an instance Ib = ((U, V, E), B1 , B2 , Π) of Label Cover such that |U | = |V |. Denote by m the length of the Label Cover instance I, and by n the length of the Label Cut instance b By Lemma 3.3, the I ′ = (G′ , s, t, L) generated by the reduction τ in Lemma 3.3 on instance I. gap between the two cases in the lemma is at least g/2, where g = gc (m). By Theorems 3.1 and 3.2, it is NP-hard to approximate Label Cover within ratio gc (m) for any constant c < 1/2. Since essentially the edges (together with their labels) in G′ reflect the relation Π, we know that n ≤ mk for some positive constant k, i.e., m ≥ n1/k . So we have that it is NP-hard to approximate Label Cut (instances generated by the reduction τ of Lemma 3.3) within ratio gc (m)/2 for any constant c < 1/2. This implies the approximation hardness gc′ (n) of Label Cut for any constant c′ < 1/2 under the complexity assumption P 6= NP (the coefficient 1/2 and exponential 1/k can be absorbed in the exponent of the hardness gc′ (n)).

4

Approximation Hardness for Label Path

The Minimum Label Path problem has been well studied before [24, 14]. The instance of the problem is the same as that of Label Cut. The goal of the problem is to find an s-t path P with minimum total weight of labels that appear on P . We show that the approximation hardness of 1−1/ log log c n n for any constant c < 1/2 under the complexity assumption P 6= NP. Label Path is 2log 9

This simultaneously improves and unifies two known hardness results for this problem which were previously the best (but incomparable due to different complexity assumptions). The previous results were (a) it is NP-hard to approximate Label Path within factor logk n for any fixed k ≥ 1 1−ǫ [14], and (b) Label Path can not be approximated within O(2log n ) for any constant ǫ > 0 unless NP ⊆ DTIME(npolylog(n) ) [24]. In both of these two hardness results, n denotes the number of vertices in the input graph. We prove the approximation hardness for Label Path via a gap-preserving reduction from Label Cover that is similar to that from Label Cover to Label Cut given in Section 3. The reduction uses two types of gadgets G′v,b and G′v , which are transformations of the two gadgets Gv,b and Gv (see Figures 1 and 2) used for Label Cut. According to the construction of the gadgets G′v,b and G′v and the designation method of labels to edges in the gadgets, the connectivity requirement in the Label Path instance generated by the reduction is equivalent to the edge-covering requirement of the given Label Cover instance. Again, the proof (including the reduction and the gadgets G′v,b and G′v ) of Theorem 4.1 is omitted here and is given in the Appendix. Theorem 4.1 For any constant c < 1/2, the Label Path problem can not be approximated within 1−1/ log logc n n in polynomial time unless P = NP, where n is the input length of the problem. 2log

5

Conclusions

We introduce and study the Minimum Label Cut problem in the edge-classified graph model, which is the labeled (or colored) version of the classical minimum s-t cut problem. We present a √ polynomial time approximation algorithm for Label Cut with performance ratio O( m), where m is the number of edges in the input graph. We show that unless P = NP Label Cut can not be 1−1/ log logc n n for any constant c < 1/2, where approximated in polynomial time within a factor of 2log n is the length of the instance of Label Cut, via a gap-preserving reduction from the Minimum Label Cover problem. Our method of the reduction also gives a stronger approximation hardness result for the Minimum Label Path problem, improving and unifying the best previously known hardness results [14, 24] for this problem. It remains an interesting problem to improve the approximation algorithm as well as to improve approximation hardness results (say, nǫ -hardness) for Label Cut and the related problems.

References [1] S. Arora, L. Babai, J. Stern, Z. Sweedyk. The hardness of approximate optima in lattices, codes, and systems of linear equations. In Proceedings of the 34th IEEE Symposium on Foundations of Computer Science, pages 724–733, 1993. Journal version: Journal of Computer and System Sciences. 54(2):317–331, 1997. [2] S. Arora, C. Lund. Hardness of Approximation. In: D. Hochbaum (editor), Approximation Algorithms for NP-hard Problems, PWS Publishing Company, 399–446, 1997. [3] H. Broersma, X. Li. Spanning trees with many or few colors in edge-colored graphs. Discussiones Mathematicae Graph Theory, 17(2):259–269, 1997. [4] H. Broersma, X. Li, G. Woeginger, S. Zhang. Paths and cycles in colored graphs. Australasian Journal on Combinatorics, 31:299–311, 2005.

10

[5] R. Carr, S. Doddi, G. Konjevod, M. Marathe. On the red-blue set cover problem. In Proceedings of the 11th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 345–353, 2000. [6] R.-S. Chang, S.-J. Leu. The minimum labeling spanning trees. Information Processing Letters, 63(5):277–282, 1997. [7] S. Chawla, R. Krauthgamer, R. Kumar, Y. Rabani. D. Sivakumar. On the hardness of approximating Multicut and Sparsest-Cut. In Proceedings of IEEE Conference on Computational Complexity, pages 144–153, 2005. [8] J. Chuzhoy, S. Khanna. Hardness of cut problems in directed graphs. In Proceedings of the 38th Annual ACM Symposium on Theory of Computing, pages 527–536, 2006. [9] I. Dinur, E. Fischer, G. Kindler, R. Raz, S. Safra. PCP characterizations of NP: towards a polynomially-small error-probability. In Proceedings of the 31st Annual ACM Symposium on Theory of Computing, pages 29–40, 1999. [10] I. Dinur, S. Safra. On the hardness of approximating Label Cover. Information Processing Letters 89(5):247–254, 2004. [11] A. Gupta. Improved results for directed multicut. In Proceedings of the 14th Annual ACMSIAM Symposium on Discrete Algorithms, pages 454–455, 2003. [12] N. Garg, V. Vazirani, M. Yannakakis. Approximate max-flow min-(multi)cut theorems and their applications. SIAM Journal on Computing, 25:235–251, 1996. [13] N. Garg, V. Vazirani, M. Yannakakis. Primal-dual approximation algorithms for integral flow and multicut in trees. Algorithmica, 18:3–20, 1997. [14] R. Hassin, J. Monnot, D. Segev. Approximation algorithms and hardness results for labeled connectivity problems. Journal of Combinatorial Optimization, 14(4):437–453, 2007. [15] S. Jha, O. Sheyner, J.M. Wing. Two formal analyses of attack graphs. In Proceedings of the 15th IEEE Computer Security Foundations Workshop, pages 49–63, Nova Scotia, Canada, June 2002. [16] D. Karger, P. Klein, C. Stein, M. Thorup, N. Young. Rounding algorithms for a geometric embedding of minimum multiway cut. In Proceedings of the 29th Annual ACM Symposium on Theory of Computing, pages 668–678, 1999. [17] S. Khot, N. Vishnoi. The unique games conjecture, integrality gap for cut problems and the embeddability of negative type metrics into l1 . In Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science, pages 53–62, 2005. [18] S. Khuller, A. Moss, J. Naor. The budgeted maximum coverage problem. Information Processing Letters, 70(1):39–45, 1999. [19] S. Krumke, H. Wirth. On the minimum label spanning tree problem. Information Processing Letters, 66(2):81–85, 1998. [20] C. Lund, M. Yannakakis. On the hardness of approximating minimization problems. Journal of the ACM, 41(5):960–981, 1994. 11

[21] H. Saran, V. Vazirani. Finding k-cuts within twice the optimal. SIAM Journal on Computing, 24:101–108, 1995. [22] O. Sheyner and J.M. Wing, Tools for Generating and Analyzing Attack Graphs. Proceedings of Workshop on Formal Methods for Components and Objects, 2004, pp. 344–371. [23] O. Sheyner, J. Haines, S. Jha, R. Lippmann, and J.M. Wing, Automated Generation and Analysis of Attack Graphs, Proceedings of the IEEE Symposium on Security and Privacy, Oakland, CA, May 2002. [24] H. Wirth. Multicriteria Approximation of Network Design and Network Upgrade Problems. PhD thesis, Department of Computer Science, W¨ urzburg University, 2001.

Appendix Proof of Theorem 2.2 Proof: We modify Algorithm A such that it is suitable to deal with Label γ-Cut by replacing the minimum s-t cut computation in step 2 by an existing ρ-approximation algorithm (call it algorithm B), for the corresponding unlabeled γ-Cut problem. Denote by ci the capacity (i.e., the number of edges in our setting) of the γ-cut returned by algorithm B on graph Hi∗ . Step 2 of Algorithm A is replaced by 2. while ci = |B(Hi∗ )| > (mρ)1/2 . Denote by c∗i the optimum of γ-Cut on the instance Hi∗ . Since algorithm B returns a ρapproximation on Hi∗ , we know that ρ · c∗i ≥ ci > (mρ)1/2 , i.e., we have c∗i > (m/ρ)1/2 . Recall that E ∗ ⊆ E(G) is the set of edges corresponding to an optimum solution L∗ ⊆ L of Label γ-Cut. Since E ∗ ∩E(Hi∗ ) is a γ-cut for Hi∗ , we have |E ∗ ∩E(Hi∗ )| ≥ c∗i > (m/ρ)1/2 . Moreover, since E ∗ ∩ E(Hi∗ ) is also a feasible solution to the BMCP instance I ′ , for the edge subset Ei we have 1 1 1 |Ei | ≥ (1 − )OPTBMCP (I ′ ) ≥ (1 − )|E ∗ ∩ E(Hi∗ )| > (m/ρ)1/2 . e e 2 So we know that the number of iterations of step 2 of the modified Algorithm A is at most m 1 1/2 2 (m/ρ)

= 2(mρ)1/2 .

By the guess technique used in Theorem 2.1, there is a try of the modified Algorithm A whose output satisfies that [

w(

i

Li ) ≤ 2(mρ)1/2 · ∆ + (mρ)1/2 · ∆ = O((mρ)1/2 ) · OPT,

completing the proof of the theorem.

12

Figure 3: Gadget G′v,b for vertex v ∈ V and label b ∈ B2

Proof of Theorem 4.1 Proof: We prove the theorem via a gap-preserving reduction τ ′ which transforms any Label Cover instance I = ((U, V, E), B1 , B2 , Π) to a Label Path instance I ′ = ((V ′ , E ′ ), s, t, L) in polynomial time. First we give the reduction τ ′ . Given an instance I = ((U, V, E), B1 , B2 , Π) of Label Cover, the instance I ′ = (G′ , s, t, L) of Label Path is constructed as follows. For every vertex v ∈ V and non-excluded label b, there is a gadget G′v,b (see Figure 3). G′v,b consists of a series of consecutive subgraphs, with each of them corresponding to an edge (u, v) ∈ E(v). For such an edge (u, v) ∈ E(v), suppose that the set of labels mapped to b ∈ B2 by Π is {a1 , a2 , . . . , ap } ⊆ B1 . (Since b is a non-excluded label by v, we know that for every neighbor u ∈ N (v), this number p (depends on v, b and u) is at least 1. Note also that |N (v)| ≥ 1 as discussed in Section 3.) Then the subgraph corresponding to edge (u, v) contains p parallel paths of length 2 (for clarity, we call each such path a rung), with the ith rung (1 ≤ i ≤ p) having labels (u, ai ) and (v, b) for its two consecutive edges (see Figure 3). All other subgraphs are constructed in the same manner, one subgraph for each u ∈ N (v). The dashed lines (not belonging to the graph) at the two ends of the paths mean that for each subgraph the ends of the paths are merged. Next, a gadget G′v for every vertex v ∈ V is constructed by stacking up several gadgets G′v,b with each of them corresponding to a non-excluded label b by v (see Figure 4). Note that only G′v,b for non-excluded labels b for the vertex v appear in G′v . Its number is at least one but ≤ |B2 |. (For notational simplicity we assume that for the given example of G′v in Figure 4 all |B2 | labels are non-excluded labels for v.) Finally, the graph G′ = (V ′ , E ′ ) consists of a series of consecutive gadgets G′v (in arbitrary order) for every vertex v ∈ V . The left end of the first gadget G′v in G′ denotes the source vertex s, meanwhile the right end of the last gadget G′v in G′ denotes the sink vertex t. The label set L contains all u-labels and v-labels used in the construction. Every label in L has unit weight. Since each rung of G′ corresponds to a tuple of Π, I ′ can be constructed in polynomial time. For the Label Cover instance I and the Label Path instance I ′ generated by the reduction τ ′ , we claim that OPTCOVER (I) = |U | =⇒ OPTPATH (I ′ ) ≤ |U | + |V |,

OPTCOVER (I) > g|U | =⇒ OPTPATH (I ′ ) > g|U | + |V |, where OPTPATH () returns the optimum of any instance of Label Path. Suppose that (P1∗ , P2∗ ) is an optimal solution to the Label Cover instance I with l1 -cost |U |. We define a label subset L′ for the Label Path instance I ′ as follows. For every v ∈ V , since P2∗ (v) 6= ∅, we pick some b ∈ P2∗ (v) to include (v, b) in L′ . These are all the v-labels in L′ . For u-labels we 13

Figure 4: Gadget G′v for vertex v ∈ V include all (u, a) in L′ for every u ∈ U and a ∈ P1∗ (u). Since the l1 -cost of labeling (P1∗ , P2∗ ) is |U |, P1∗ assigns only one label to every vertex u ∈ U , (as every u is in some N (v), P1∗ assigns at least one, and hence, exactly one label to u), and we have picked also one label (v, b) in L′ for every v ∈ V . Note also that for any v ∈ V , the label b we picked for v is obviously a non-excluded label, by the definition of a solution in Label Cover, and hence (v, b) is a label in L. Consider any vertex v ∈ V and a non-excluded label b by v. For every u ∈ N (v) and label a ∈ B1 such that ((u, v), a, b) ∈ Π, the label (u, a) is assigned to some edge in the gadget G′v,b by its construction. So we know that every u-label we put in L′ is actually also in the label set L. That is, L′ is a subset of L with size of |U | + |V |. Then fix any vertex v ∈ V . We consider the picked label (v, b), and show that L′ contains a path passing through the gadget G′v,b . (Note that we view a path as the set of edges that appear on it, and we say that L′ contains an edge subset E ′ if L′ contains L(E ′ ).) Recall that G′v,b consists of a series of consecutive subgraphs, one subgraph for each u ∈ N (v). Consider the subgraph for a particular u ∈ N (v). Suppose that there are p rungs in the subgraph, where the first edges of all such p rungs are labeled with (u, a1 ), . . . , (u, ap ), and these ai are those satisfying ((u, v), ai , b) ∈ Π (see Figure 3). Since (P1∗ , P2∗ ) is a total-cover, for some 1 ≤ i ≤ p, ai ∈ P1∗ (u). But we did pick labels (u, a) for all a ∈ P1∗ (u). So L′ must contain a whole rung for the considered subgraph corresponding to u for G′v,b . Since this holds for every subgraph corresponding to u ∈ N (v) in G′v,b , we know that L′ contains a path in G′v,b from its left end to its right end. This gives a path passing through G′v (See Figure 4). These paths for every gadget G′v constitute an s-t path of G′ , that is, L′ is a feasible solution to instance I ′ . Since |L′ | = |U | + |V |, we know that OPTPATH (I ′ ) ≤ |U | + |V |. For the second part of the claim we prove its contrapositive. Suppose that P ∗ is an optimal s-t path of G′ with L(P ∗ ) = L∗ and |L∗ | ≤ g|U | + |V |. Denote by nul (resp. nvl ) the number of u-labels (resp. v-labels) in L∗ . Since G′ consists of |V | consecutive subgraphs G′v , P ∗ must traverse each G′v . By the construction of G′v as a stack of G′v,b , it must traverse at least one G′v,b , and therefore L∗ must pick at least one v-label (v, b) for every v ∈ V . This defines our assignment P2 on V , and nvl ≥ |V |. Next we define P1 on U as follows: P1 (u) = {a | (u, a) is a picked u-label in L∗ }. Since nvl ≥ |V |, it follows that nul ≤ g|U |. Since for every v, P ∗ traverses some G′v,b of G′v , for every u ∈ N (v), L∗ must include at least one u-label (u, a) such that ((u, v), a, b) ∈ Π. In particular S P1 (u) is non-empty. Since U = v∈V N (v), this is true for all u ∈ U . It follows that (P1 , P2 ) is a solution to the Label Cover instance I. Since nul ≤ g|U |, we know that the l1 -cost of (P1 , P2 ) is ≤ g|U |. This completes the proof of the contrapositive. we know that it By an argument similar to that of Theorem 3.3, we know that it is NP-hard to approximate Label Path within ratio gc (n) for any constant c < 1/2, where n denotes the input 14

length of Label Path. The proof of the theorem is concluded.

15