A Matching-Related Property of Bipartite Graphs With Applications in Signal Processing

Epameinondas Fritzilas (a), Martin Milanič (b), Jérôme Monnot (c) and Yasmin A. Rios-Solis (d)

(a) Faculty of Technology, Bielefeld University, Bielefeld, Germany, efritzil#cebitec.uni-bielefeld.de
(b) FAMNIT and PINT, University of Primorska, Koper, Slovenia, martin.milanic#upr.si
(c) LAMSADE, Université Paris-Dauphine, Paris Cedex 16, France, monnot#lamsade.dauphine.fr
(d) Graduate Program of Systems Engineering, UANL, Monterrey, México, yasmin#yalma.fime.uanl.mx

Abstract

A bipartite graph G = (L, R; E) is said to be identifiable if for every vertex v ∈ L, the subgraph induced by its non-neighbors has a matching of cardinality |L| − 1. This definition arises in the context of low-rank matrix factorization. Motivated by signal processing applications, in this paper we (i) propose the robustness of identifiability with respect to edge modifications as a polynomially computable measure for evaluating how strongly a bipartite graph possesses the property of identifiability, and (ii) introduce three problems that deal with finding identifiable subgraphs, and study their complexity.

Keywords: bipartite matching, complexity, combinatorial optimization, source-sensor network

1 Introduction

A matching in a graph is a subset of pairwise disjoint edges. A bipartite graph G = (L, R; E) with at least one edge is called identifiable if for every vertex in L, the subgraph induced by its anti-neighborhood has a matching of cardinality |L| − 1. As shown in [5], the concept of identifiable bipartite graphs arises naturally in the context of low-rank matrix factorization, particularly in the area of signal processing: Suppose we have a set L of signal sources and a set R of sensors, each measuring a linear mixture of the source signals over time. The exact values of the mixing coefficients are unknown. A bipartite graph G = (L, R; E) is given that specifies which sensors measure which sources; such a graph can often be inferred a priori, independently of the exact values of the mixing coefficients. Given the sensor measurements over k discrete time points, our task is to infer the source signals over the k time points and the non-zero mixing coefficients.

Formally, given an |R| × k matrix Y = (y_it), where y_it is the signal measured at sensor i at time point t, we wish to express Y as a product Y = AX of an |R| × |L| matrix A = (a_ij), where a_ij is the mixing coefficient between sensor i and signal source j, and an |L| × k matrix X = (x_jt) that contains the source-signal intensities at different time points. Moreover, we require that the matrix A satisfies the constraints imposed by G (denoted by A ⊳ G), that is, (j, i) ∉ E implies a_ij = 0. We assume that |L| ≤ min(|R|, k) (low-rank factorization). Since an exact low-rank factorization is impossible in general, we seek a matrix pair (A, X) such that A ⊳ G and an appropriately chosen error measure ‖Y − AX‖ (e.g., the Frobenius norm) is minimized.

Now, let us suppose that we have obtained a feasible solution (Â, X̂) to this Zero-Constrained Approximate Matrix Factorization (ZCAMF) problem. Many other pairs (A, X) may exist that satisfy AX = ÂX̂ and A ⊳ G, and are therefore indistinguishable from (Â, X̂) in terms of how well they approximate Y. However, the level set ℒ = {(A, X) : AX = ÂX̂, A ⊳ G} of all such pairs is in general restricted by the bipartite graph G. In particular, if G is identifiable, and under some mild full-rank assumptions on the two factors Â and X̂, which are likely to be satisfied for numbers that correspond to physical quantities,¹ the elements of the level set are unique up to diagonal scaling [5]. That is, for every (A, X) ∈ ℒ, there exists an invertible diagonal matrix D such that A = ÂD and X = D⁻¹X̂. Identifiable bipartite graphs will thus find applications in every area where the ZCAMF problem is relevant. In computational biology, for example, ZCAMF appears in at least two contexts: in determining the activities of transcription factors from transcriptome data [9,3], and in microarray analysis [14].

¹ The matrix X̂ must have full row-rank and certain submatrices of Â must have full column-ranks.

Our Contributions

In the first part of the paper (Section 2), we define the robustness of identifiable bipartite graphs with respect to edge additions and deletions. This is useful in applications where the sets of sensors and sources are known exactly, but the structure of the bipartite network is predicted with some uncertainty (see, e.g., [9]). In practice, the robustness of bipartite graphs can be used as a measure to select, among different sensor designs, the one that gives rise to the “most identifiable” network. We show that robustness is computable in polynomial time. In particular, we show that the robustness of a bipartite graph G is equal to the minimum of the surpluses of certain subgraphs of G, plus one. Based on two classical results on bipartite matchings, we give an efficient algorithm for computing the surplus of a general bipartite graph. Then, using the properties of the surplus function, we also develop a polytime algorithm for computing a tight set.

In the second part of the paper (Section 3), we introduce and study several variants of the following problem: Given a bipartite graph G = (L, R; E), can we delete some vertices D ⊂ L together with their neighbors so that the resulting graph is identifiable? This problem arises when the source-sensor graph G is not identifiable, but we would still like to be able to measure a subset J = L \ D of the signal sources in an “identifiable” way. If we omit the sources in D from our measurements, then we can use only the sensors that do not measure any source in D; this justifies why, together with each vertex in D, we must also delete all its neighbors.
We identify several polynomially solvable cases, and show the hardness of the version of the problem in which we would like to keep as few vertices from L as possible so that the resulting graph is identifiable. This latter problem arises in the context of ZCAMF when the number of time samples k is limited.

Notations and Definitions

All graphs considered are undirected, finite and loopless, but may contain parallel edges. To avoid trivialities, we only consider graphs with at least one edge. For a graph G, we denote by V(G) the vertex set of G and by E(G) its edge (multi)set. A bipartite graph is a triple (L, R; E) where (L ∪ R, E) is a graph such that E ⊆ L × R. Following the standard graph-theoretic terminology (see, e.g., [4]), we denote by ν(G) the matching number of a graph G, i.e., the maximum cardinality of a matching in G. For a graph G = (V, E) and a subset of vertices X ⊆ V, N_G(X) denotes the neighborhood of X, i.e., the set of all vertices in V \ X that have a neighbor in X. For a vertex x ∈ V, we write N_G(x) for N_G({x}), and denote the degree of x by d_G(x) = |N_G(x)|. The anti-neighborhood of x ∈ V is the set V(G) \ (N_G(x) ∪ {x}). In N_G(X), N_G(x), d_G(x), we shall omit the subscript G if the graph is clear from the context. For a bipartite graph G = (L, R; E) and vertex sets X ⊆ L, Y ⊆ R, we denote by G[X, Y] the subgraph of G induced by X ∪ Y. Moreover, for a vertex x ∈ L, we use G_x as a shortcut for G[L \ {x}, R \ N(x)]. The surplus of a set X ⊆ L is defined as σ(X) = |N(X)| − |X|. The surplus of the whole graph is defined as the minimum surplus over all non-empty sets: σ(G) = min_{∅≠X⊆L} σ(X). A set X ⊆ L is called tight if σ(X) = σ(G).
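The definitions of identifiability and surplus can be checked directly on small instances. The following Python fragment is a minimal sketch, not part of the original development: it assumes that a bipartite graph is stored as a dictionary mapping each vertex of L to the set of its neighbors in R, and the function names are hypothetical. The matching routine is the simple augmenting-path search, and the surplus is computed by brute force over all non-empty subsets of L, so it is only meant for very small graphs.

```python
from itertools import combinations


def max_matching_size(adj):
    """nu(G) for a bipartite graph given as {left vertex: set of right neighbours}."""
    match_of = {}  # right vertex -> left vertex currently matched to it

    def augment(u, seen):
        # Depth-first search for an augmenting path starting at left vertex u.
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in match_of or augment(match_of[v], seen):
                match_of[v] = u
                return True
        return False

    return sum(augment(u, set()) for u in adj)


def anti_neighbourhood_graph(adj, x):
    """G_x = G[L \\ {x}, R \\ N(x)]."""
    return {u: adj[u] - adj[x] for u in adj if u != x}


def is_identifiable(adj):
    """G has an edge and nu(G_x) = |L| - 1 for every x in L."""
    has_edge = any(adj.values())
    return has_edge and all(
        max_matching_size(anti_neighbourhood_graph(adj, x)) == len(adj) - 1
        for x in adj
    )


def surplus_bruteforce(adj):
    """sigma(G): minimum of |N(X)| - |X| over all non-empty X (exponential time)."""
    left = list(adj)
    return min(
        len(set().union(*(adj[u] for u in X))) - len(X)
        for r in range(1, len(left) + 1)
        for X in combinations(left, r)
    )


# Two sources, each with a private sensor: identifiable.
print(is_identifiable({"a": {1}, "b": {2}}))
# Sensor 1 sees both sources and a has no private sensor: not identifiable.
print(is_identifiable({"a": {1}, "b": {1, 2}}))
```

In the second example, deleting b together with its neighbors leaves a without any sensor, so ν(G_b) = 0 < |L| − 1.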

2 Robustness of Identifiable Graphs

In many applications (e.g., [9]), the sets of sensors and sources are known exactly, but the structure of the bipartite graph is predicted with some statistical method. Such a prediction involves some uncertainty, i.e., some connections that have been predicted as significant may not exist in reality and, vice versa, some existing connections may have been missed by the prediction. In this setting, a natural question arises: How many prediction mistakes can a given bipartite graph tolerate, before it loses the property of identifiability? This quantity is defined below as the robustness of an identifiable graph. In practice, the robustness can be used in order to select among different sensor designs the one that gives rise to the “most identifiable” network.

Definition 2.1 Let G = (L, R; E) be an identifiable graph. For a vertex x ∈ L, we define its robustness, denoted by ρ(x), as the minimum number of edge modifications (additions and/or deletions) that are required to destroy the identifiability of x, i.e., to make ν(G_x) < |L| − 1. The robustness of the whole graph is ρ(G) = min_{x∈L} ρ(x).

Remark 2.2 The following construction shows that there exist graphs of arbitrarily high robustness: For two integers n > k ≥ 1, let G_{n,k} = (L, R; E) be the graph given by L = {1, . . . , n}, R = {I ⊆ L : |I| = k} and (i, I) ∈ E iff i ∈ I. Using Proposition 2.3 below, it can be verified that ρ(G_{n,k}) = $\binom{n-2}{k-1}$.

In the rest of this section, we prove that the robustness of a given identifiable graph G, as well as a minimum-sized set of edge modifications required to destroy its identifiability, can be computed in polynomial time. First, we show that computing the robustness reduces to computing the (non-negative) surpluses of the graphs G_x, for all x ∈ L.

Proposition 2.3 For every x ∈ L, ρ(x) = σ(G_x) + 1.

Proof. Consider a vertex x ∈ L. Adding (deleting) an edge incident to x, say (x, y), results in deleting (adding) the vertex y in G_x and this can only decrease (increase) ν(G_x). Adding (deleting) an edge (z, y), where z ≠ x and y ∈ N(x), has no influence on G_x. Finally, adding (deleting) an edge (z, y), where z ≠ x and y ∉ N(x), results in adding (deleting) the edge (z, y) in G_x and this can only increase (decrease) ν(G_x). Therefore, in order to decrease ν(G_x), we must either add edges that are incident to x or delete edges that are incident to some non-neighbor of x. Since both operations are allowed and have the same cost, we can safely focus only on edge additions, because they correspond to vertex deletions in G_x. More specifically, ρ(x) equals the minimum number of vertices that we must delete from R \ N(x), such that the matching number of the remaining subgraph of G_x becomes less than |L| − 1. By Hall’s Marriage Theorem [6], this can only be achieved by choosing a nonempty set Y ⊆ L \ {x} and deleting |N_{G_x}(Y)| − |Y| + 1 vertices from N_{G_x}(Y). Therefore, ρ(x) = min_{∅≠Y⊆L\{x}} {|N_{G_x}(Y)| − |Y| + 1} = σ(G_x) + 1. □

The above proof also shows that, in order to destroy the identifiability of x ∈ L with a minimum number of edge modifications, it is enough to find a tight set X in G_x and then add to G all the edges {(x, y) : y ∈ N_{G_x}(X)}. Motivated by the computation of robustness, in the following sections we present polynomial-time algorithms for computing the surplus and finding a tight set in a bipartite graph G = (L, R; E).
These results apply to arbitrary bipartite graphs and are interesting also outside the context of robustness, especially when the input graph G has σ(G) ≥ 0. In this case, G has an L-perfect matching and the following problem arises in any application that must guarantee robust matchings: Find a minimum subset of R whose deletion leaves G without an L-perfect matching. For the solution it is enough to find a tight set X ⊆ L and then delete an arbitrary set of σ(X) + 1 vertices from the neighborhood of X.
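Proposition 2.3 reduces the robustness to surplus computations on the graphs G_x, which can be illustrated on a small instance of the construction from Remark 2.2. The following Python fragment is a sketch under stated assumptions: the dictionary representation (each vertex of L mapped to its set of neighbors in R) and the function names are hypothetical, the input is assumed to be identifiable with |L| ≥ 2, and the surplus is computed by brute force rather than by the polynomial algorithms of the next sections.

```python
from itertools import combinations


def surplus(adj):
    """Brute-force sigma(G): minimum of |N(X)| - |X| over non-empty X (exponential)."""
    left = list(adj)
    return min(
        len(set().union(*(adj[u] for u in X))) - len(X)
        for r in range(1, len(left) + 1)
        for X in combinations(left, r)
    )


def robustness(adj):
    """rho(G) = min over x of sigma(G_x) + 1, following Proposition 2.3."""

    def g_x(x):
        # G_x = G[L \ {x}, R \ N(x)]
        return {u: adj[u] - adj[x] for u in adj if u != x}

    return min(surplus(g_x(x)) + 1 for x in adj)


def subsets_graph(n, k):
    """G_{n,k}: vertex i of L = {1..n} is adjacent to every k-subset I with i in I."""
    right = [frozenset(c) for c in combinations(range(1, n + 1), k)]
    return {i: {I for I in right if i in I} for i in range(1, n + 1)}


print(robustness(subsets_graph(5, 2)))
```

For n = 5 and k = 2, each graph G_x has four vertices on the left and six on the right; the singletons and the full set attain the minimum surplus 2, so the computed robustness is 3.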

2.1 Computing the Surplus

Algorithm 1 below computes the surplus of a bipartite graph. Its correctness is based on the following result (Theorems 1.3.1 and 1.3.6 in [10]).

Lemma 2.4 Let G = (L, R; E) be a bipartite graph. If σ(G) < 0, then σ(G) = ν(G) − |L|. If σ(G) ≥ 0, then σ(G) equals the largest integer s satisfying the following property, for every x ∈ L: if we add s new vertices to L and connect them to all neighbors of x, the resulting graph has non-negative surplus.

For the algorithm’s implementation we use a well-known theorem by Berge, a result of central importance for matching algorithms [1]. Berge’s theorem holds for arbitrary graphs and states that a matching M in a graph G is maximum if and only if G has no M-augmenting path, i.e., a path whose edges alternate between matched and unmatched and whose endpoints are unmatched. If there exists an M-augmenting path P, then a matching M′ larger than M can be immediately obtained by replacing in M the matched edges along this path with the unmatched ones. In formulae, M′ = M △ E(P) where △ denotes the symmetric difference operator.

Algorithm 1 Computation of the surplus in G = (L, R; E)
 1: Compute a maximum matching M in G.
 2: if |M| = |L| (i.e., σ(G) ≥ 0) then
 3:   for all x ∈ L do
 4:     s_x ← 0, M_x ← M, G_x ← G
 5:     repeat
 6:       G_x ← (G_x with new vertex v∗ connected to all neighbors of x)
 7:       if G_x has an M_x-augmenting path P then
 8:         M_x ← M_x △ E(P) and s_x ← s_x + 1
 9:       else exit the repeat loop
10:       end if
11:     end repeat
12:   end for
13:   σ(G) ← min_{x∈L} s_x
14: else
15:   σ(G) ← |M| − |L|
16: end if
17: return σ(G)

Proposition 2.5 Algorithm 1 computes the surplus of a bipartite graph G = (L, R; E). It can be implemented so that it runs in time O(|E|(√(|L| + |R|) + |L| + |E|)) (which is O(|E|²) if G has no isolated vertices).

Proof. By Hall’s Theorem, the augmented graph G_x (line 6) has non-negative surplus if and only if there exists a matching that covers all vertices of L_x ∪ {v∗}, where L_x denotes the left side of G_x before v∗ is added. By Berge’s Theorem this happens if and only if there exists an M_x-augmenting path (which has to start at v∗). Finally, the correctness of the algorithm follows from Lemma 2.4. It remains to analyze the running time. The computation of a maximum matching in line 1 can be done in O(|E|√(|L| + |R|)) time, using, e.g., the algorithm of Hopcroft and Karp [7]. For each x ∈ L, the internal repeat-loop can be executed up to d(x) times and, therefore, |E(G_x)| ≤ |E| + (d(x))². In line 7, checking if G_x has an M_x-augmenting path and if yes, finding one, can be done as follows: First we orient all unmatched edges in G_x from L to R and all matched edges from R to L and then we look for a directed path from v∗ to an unmatched vertex. This can be done with breadth-first search in O(|E(G_x)|) time. In line 8, the symmetric difference can be computed in O(|E(G_x)|) time. So, the total running time is in O(|E|√(|L| + |R|) + Σ_{x∈L} (|E| + (d(x))²)) ⊆ O(|E|√(|L| + |R|) + |L||E| + |E|²). The last inclusion follows from Σ_{x∈L} (d(x))² ≤ (Σ_{x∈L} d(x))² = |E|². □

2.2 Finding a Tight Set

In this section, we show that we can find a tight set in G = (L, R; E) by using an algorithm for surplus computation as a black-box routine in a greedy fashion (see Algorithm 2). For L′ ⊆ L, we denote by G − L′ the subgraph induced by (L \ L′) ∪ R and we write G − x as a shortcut for G − {x}. The neighborhood of any set X ⊆ L \ L′ is the same in G and in G − L′, and, therefore, for all X ⊆ L \ L′, σ_{G−L′}(X) = σ_G(X). So, we will omit the index and write σ(X) := σ_{G−L′}(X) = σ_G(X). To prove the correctness of Algorithm 2 we need the following simple observations.

Proposition 2.6 Consider the graph G = (L, R; E) and a set L′ ⊂ L.
(i) σ(G − L′) ≥ σ(G).
(ii) If σ(G − L′) = σ(G), then, for all L′′ ⊆ L′, any tight set of G − L′′ is also a tight set of G. In particular, it follows that σ(G − L′′) = σ(G).
(iii) Consider x ∈ L such that σ(G − x) = σ(G). Then, any tight set of G − x is a tight set of G.
(iv) σ(G − x) > σ(G) for all x ∈ L, if and only if L is the only tight set of G.

Proof.
(i) σ(G − L′) = min_{X⊆L\L′} σ(X) ≥ min_{X⊆L} σ(X) = σ(G). The inequality follows from the fact that, on the right-hand side, we are minimizing the same objective function over a larger ground set.
(ii) Let X∗ ⊆ L \ L′′ be a tight set of G − L′′. So, we have σ(X∗) = min_{X⊆L\L′′} σ(X) ≤ min_{X⊆L\L′} σ(X) = min_{X⊆L} σ(X). The inequality follows from the fact that L \ L′′ ⊇ L \ L′ and the last equality follows from the hypothesis σ(G − L′) = σ(G). Since σ(X∗) ≥ σ(G) by the definition of σ(G), we finally get σ(X∗) = min_{X⊆L} σ(X), i.e., X∗ is tight in G.
(iii) This is a special case of (ii) for L′′ = L′ = {x}.
(iv) Forward: For the sake of contradiction, assume that G has a tight set Y ⊂ L. Then, Y is also tight in G − x, where x ∈ L \ Y. That is, σ(Y) = σ(G) = σ(G − x), which contradicts the hypothesis that σ(G − x) > σ(G). Reverse: For the sake of contradiction, assume that there exists x ∈ L with σ(G − x) = σ(G) (due to (i), it cannot be σ(G − x) < σ(G)). Then, there exists a set Y ⊆ L \ {x} that is tight in both G − x and G, and this contradicts the hypothesis that L is the only tight set. □

Algorithm 2 below computes a tight set in the graph G = (L, R; E). Its total running time is of the order O(|L||E|(√(|L| + |R|) + |L| + |E|)), since it requires |L| + 1 computations of surplus. Its correctness is established by Theorem 2.8.

Algorithm 2 Computation of a tight set in G = (L, R; E)
1: X ← L
2: for all x ∈ L do
3:   if σ(G − x) = σ(G) then
4:     G ← G − x and X ← X \ {x}
5:   end if
6: end for
7: return X

Lemma 2.7 Let H = (L_H, R_H; E_H) be a bipartite graph. Let v∗ ∈ L_H such that σ(H − v∗) > σ(H). If L′ = {v_1, . . . , v_i} ⊂ L_H and σ(H − {v_1, . . . , v_{j−1}}) = σ(H − {v_1, . . . , v_j}) for all j = 1, . . . , i, then σ((H − L′) − v∗) > σ(H − L′).

Proof. Assume that the lemma fails. From assertion (i) of Proposition 2.6, we have σ((H − L′) − v∗) = σ(H − L′). By hypothesis, σ(H − L′) = σ(H − {v_1, . . . , v_i}) = · · · = σ(H). Then, σ((H − L′) − v∗) = σ(H). Using assertion (ii) of Proposition 2.6, we get that σ(H − v∗) = σ(H), a contradiction with the hypothesis. □

Theorem 2.8 The set X returned by Algorithm 2 is tight.

Proof. If X = L, the statement follows from assertion (iv) of Proposition 2.6. So assume that L \ X = {v_{i_1}, . . . , v_{i_k}} for some k ≥ 1, and this is the order in which these vertices are deleted by the algorithm. Denote by G′ the subgraph induced by X ∪ R. Then, it is enough to show the following:

Claim. For every x ∈ X, σ(G′ − x) > σ(G′).

Assume that the claim holds. Then, assertion (iv) of Proposition 2.6 implies that X is the only tight set of G′. Applying inductively assertion (iii) of Proposition 2.6 on the vertices deleted by the algorithm (in reverse order), we deduce that X is a tight set of G.

Proof of Claim. Let x ∈ X. Let v_{i_j} for j ≤ k be the last vertex deleted by the algorithm before x is encountered by the algorithm (or j = 0 if there is no such vertex). If j = k, the inequality of the claim follows by the algorithm’s rule. So, assume j < k. Let G′′ = G − {v_{i_1}, . . . , v_{i_j}}. The algorithm’s rule implies that σ(G′′ − x) > σ(G′′). Now, we apply Lemma 2.7 with H = G′′, v∗ = x and L′ = {v_{i_{j+1}}, . . . , v_{i_k}}. We have G′′ − L′ = G′ (the subgraph induced by X ∪ R) and σ(G′ − x) = σ((G′′ − L′) − x) > σ(G′′ − L′) = σ(G′), proving the inequality σ(G′ − x) > σ(G′). This completes the proof of the claim and with it the proof of the theorem. □

Remark 2.9 The surplus and a tight set can also be computed by minimizing |L| submodular functions of the form f_x(X) = σ_{G_x}(X) (over all X ⊆ L \ {x}), for all x ∈ L. For instance, using as a black-box the algorithm for submodular function minimization by Iwata [8], or the one by Orlin [11], we can compute the surplus and find a tight set in time O(|L|⁵|E| log |R|) or O(|L|⁶|E|), respectively. If |R| or |E| are significantly bigger than |L|, then this approach is faster than the simple algorithms proposed above.²
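Algorithms 1 and 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the graph is stored as a dictionary mapping each vertex of L to its set of neighbors in R (with L non-empty), the function names and the tuple labels used for the added vertices v∗ are hypothetical, and the maximum matching is found with the simple depth-first augmenting-path search instead of the algorithm of Hopcroft and Karp, so the running time of Proposition 2.5 is not attained.

```python
def _augment(adj, match_of, u, seen):
    """Search for an augmenting path from left vertex u; flip it if one is found."""
    for v in adj[u]:
        if v in seen:
            continue
        seen.add(v)
        if v not in match_of or _augment(adj, match_of, match_of[v], seen):
            match_of[v] = u
            return True
    return False


def maximum_matching(adj):
    """Return a maximum matching as a dict {right vertex: left vertex}."""
    match_of = {}
    for u in adj:
        _augment(adj, match_of, u, set())
    return match_of


def surplus(adj):
    """Algorithm 1: sigma(G) for G given as {left vertex: set of right neighbours}."""
    match_of = maximum_matching(adj)
    if len(match_of) < len(adj):          # sigma(G) < 0, first case of Lemma 2.4
        return len(match_of) - len(adj)
    best = None
    for x in adj:
        ext_adj, ext_match, s_x = dict(adj), dict(match_of), 0
        while True:
            clone = ("clone", x, s_x)     # hypothetical label for the new vertex v*
            ext_adj[clone] = adj[x]
            if not _augment(ext_adj, ext_match, clone, set()):
                break                     # no augmenting path: s_x is final
            s_x += 1
        best = s_x if best is None else min(best, s_x)
    return best


def tight_set(adj):
    """Algorithm 2: greedily delete vertices whose removal keeps the surplus."""
    target = surplus(adj)
    current = dict(adj)
    for x in list(adj):
        rest = {u: nbrs for u, nbrs in current.items() if u != x}
        # An empty left side has no non-empty subsets, so it is never accepted.
        if rest and surplus(rest) == target:
            current = rest
    return set(current)


example = {"a": {1}, "b": {1}, "c": {2, 3, 4}}
print(surplus(example), tight_set(example))
```

In the example, the sources a and b compete for the single sensor 1, so the set {a, b} has surplus −1 and is the only tight set; the greedy loop keeps a and b and deletes c.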

² The approach with submodular function minimization can also be used to compute the weighted surplus σ_w(G) = min_{∅≠X⊆L} (w(N(X)) − w(X)), where each vertex v is assigned a positive weight w(v) and w(S) = Σ_{v∈S} w(v), for all S ⊆ L ∪ R.

3 Finding Identifiable Subgraphs

In this section, we focus on three problems that are all variants of the following generic task: Given a source-sensor network, we want to find subsets of sources

that can be measured independently from the other sources in an identifiable way. Let us consider, for example, a scenario where the source-sensor graph G = (L, R; E) is not identifiable, and, due to limited budget, it is also not possible to fix this problem by augmenting R with more sensors. Then, if we still want the source-sensor network to be of any use, we must find a subset of sources J ⊂ L that can be measured independently from L\J in an identifiable way. Can we verify if such a set exists at all? Can we find a largest such set? Notice that isolating a subset J ⊂ L creates a complication: We can use only the sensors whose neighborhood is completely contained in J, because all other sensors measure signal mixtures from sources that we are not including in our ZCAMF computation. This fact motivates the following definition. Definition 3.1 Let us consider G = (L, R; E). We will call a set J ⊆ L separable if there exists a set I ⊆ R such that N(I) = J. For J ⊆ L, the family of sets whose neighborhood equals J is closed under union. Therefore, if J is separable, then there is a unique non-empty maximal set I ⊆ R with N(I) = J; we will denote this set with s(J). In other words, a set J ⊆ L is separable if and only if J = N(R \ N(L \ J)) and, in this case, s(J) = R \ N(L \ J). Definition 3.2 Let G = (L, R; E) and J ⊆ L. We will call J nicely separable, if it is separable and, moreover, the subgraph G[J, s(J)] is identifiable. In terms of nicely separable sets, the above motivating discussion can be formalized in the following two problems (SRC stands for “source” and SEL for “selection”): Problem 3.3 SRC-SEL Given a bipartite graph G = (L, R; E), is there a nicely separable set J ⊆ L? Problem 3.4 MAX-SRC-SEL Given a bipartite graph G = (L, R; E), find a nicely separable set J ⊆ L of maximum cardinality. Another application that requires the selection of subsets of sources arises in the context of ZCAMF when the number of time samples is limited. 
As stated in the introductory section, given a solution (Â, X̂) of ZCAMF, the elements of the level set ℒ = {(A, X) : AX = ÂX̂, A ⊳ G} differ only by diagonal scaling, if the source-sensor graph is identifiable and some full-rank conditions on Â and X̂ also hold [5]. In particular, X̂ must have full row-rank and a necessary condition for this is that there are at most as many signal sources as time samples. Now, consider a scenario where we are given a source-sensor

graph G = (L, R; E), but, due to limited budget, we cannot afford to take measurements at more than k time samples. Can we isolate a nicely separable set J of at most k sources? This question motivates the following problem.

Problem 3.5 MIN-SRC-SEL Given a bipartite graph G = (L, R; E), find a nicely separable set J ⊆ L of minimum cardinality.

We now investigate the complexity of the three problems defined above. In particular, in Sections 3.1, 3.2 and 3.3 we give polynomial solutions for three special cases: (i) G is a tree, (ii) d(x) ≤ 2 for every x ∈ L and (iii) d(x) ≤ 2 for every x ∈ R. Finally, in Section 3.4 we show that MIN-SRC-SEL is in general APX-hard. Notice that for a polynomial solution to any of these three problems, it suffices to operate separately on each connected component of the input graph. Therefore, in the rest of this section we assume that the input graph is connected. The polynomial results are summarized in the following theorem.

Theorem 3.6 Let G = (L, R; E) be a connected bipartite graph.
(i) If G is a tree, then SRC-SEL, MAX-SRC-SEL and MIN-SRC-SEL are polynomially solvable.
(ii) If d(x) ≤ 2 for all x ∈ L, then SRC-SEL, MAX-SRC-SEL and MIN-SRC-SEL are polynomially solvable.
(iii) If d(y) ≤ 2 for all y ∈ R, then SRC-SEL is polynomially solvable.

The following Sections 3.1, 3.2 and 3.3 are devoted to proving the three parts of Theorem 3.6.

3.1 Polynomial results for trees

First, we need to show two useful properties of nicely separable sets.

Lemma 3.7 Let G′ = (V′, E′) be a subgraph of G = (L, R; E) induced by J ∪ s(J) where J ⊆ L is a nicely separable set. The following properties hold:
(i) For all x, y ∈ J, x ≠ y, we have N_{G′}(x) ⊈ N_{G′}(y).
(ii) For every cycle C in G′, there is x ∈ J ∩ V(C) such that d_{G′}(x) ≥ 3.

Proof.
(i) For the sake of contradiction, assume that N_{G′}(x) ⊆ N_{G′}(y) for some x, y ∈ J, x ≠ y. Then, x would be isolated in the subgraph obtained from G′ after deleting y and its neighbors, and this is a contradiction.

(ii) For the sake of contradiction, assume that there is a cycle C in G′ such that ∀x ∈ J ∩ V(C) we have d_{G′}(x) = 2. Let x_0 ∈ J ∩ V(C) and consider the subgraph G′′ of G′ induced by (J \ {x_0}) ∪ (s(J) \ N_{G′}(x_0)). We get |N_{G′′}((J ∩ V(C)) \ {x_0})| = |J ∩ V(C)| − 2 and then |N_{G′′}((J ∩ V(C)) \ {x_0})| < |(J ∩ V(C)) \ {x_0}|. By Hall’s Marriage Theorem, there does not exist a matching of G′′ saturating all vertices of J \ {x_0}, and this is a contradiction. □

Lemma 3.8 Let T = (L, R; E) be a tree. We have the following:
(i) T is identifiable iff either |L| = 1 or for every x ∈ L, d_T(x) ≥ 2.
(ii) SRC-SEL is feasible on T iff there exists a y ∈ R with d_T(y) = 1.

Proof.
(i) The case |L| = 1 is clear; so, let us suppose |L| ≥ 2. Now, assume that T is identifiable. Assertion (i) of Lemma 3.7 with G′ = T implies ∀x ∈ L, d_T(x) ≥ 2 since T is connected. Conversely, assume that ∀x ∈ L, d_T(x) ≥ 2 and let us prove that T is identifiable. We need to use the following claim:

Claim. If T′ = (L′, R′; E′) is a tree such that there is at most one leaf in L′, then T′ has a matching M′ saturating L′.

Assume that the claim holds, let x ∈ L and consider the subtrees T_1, . . . , T_p obtained when we delete {x} ∪ N_T(x) from T. Each T_i satisfies the hypothesis of the claim, hence T is identifiable.

Proof of Claim. The proof is by induction on |L′|. If |L′| = 1, it is obvious. So, assume that the result holds for any tree with |L′| = k ≥ 1 and such that at most one leaf of T′ is in L′, and let us prove that the result holds for trees with |L′| = k + 1. So, let T′ = (L′, R′; E′) be a tree with |L′| = k + 1 and such that at most one leaf x_0 of T′ is in L′. Root T′ at x_0 (if x_0 does not exist, select any vertex of L′ as root) and let y be a leaf of T′ maximizing d_{T′}(x_0, y), i.e., the length of the path from x_0 to y in T′. By hypothesis, y ∈ R′ (actually, all leaves of T′ are in R′ except possibly the root) and let x_1 ∈ L′ be the neighbor of y in T′. When we delete x_1 from T′, we obtain a subtree T′′ and some isolated vertices in R′ since d_{T′}(x_0, y) = max{d_{T′}(x_0, z) : z ∈ L′ ∪ R′}. The tree T′′ = (L′′, R′′; E′′) has k vertices in L′′ and satisfies the inductive hypothesis. Thus, in T′′ there is a matching M′′ saturating all vertices in L′′. By setting M′ = M′′ ∪ {(x_1, y)}, we obtain the expected result.

(ii) The reverse direction is clear. For the forward direction, let J ⊆ L be a nicely separable set in a tree T. Then, the graph G′ = T[J, s(J)] is a collection of disjoint trees T′_1 = (J_1, s(J_1); E_1), . . . , T′_p = (J_p, s(J_p); E_p) such that each T′_i is identifiable. From (i) of Lemma 3.8, each tree T′_i has at least one leaf y_i ∈ s(J_i) (actually, when |J_i| ≥ 2 all the leaves of T′_i are in s(J_i)). By construction of s(J), we must have d_T(y_i) = 1. □

Now, we are ready to prove part (i) of Theorem 3.6. The solution to MIN-SRC-SEL (and to SRC-SEL, if there exists one) is N(y), where y is the vertex from Lemma 3.8. For MAX-SRC-SEL, a simple algorithmic solution is given by Algorithm 3. This algorithm is obviously polynomial and the solution returned is a forest which is an optimal solution of MAX-SRC-SEL. This follows from assertion (i) of Lemma 3.7 and Lemma 3.8.

Algorithm 3 SST(input: a tree T = (L, R; E) with at least one leaf in R)
 1: if |L| = 1 then
 2:   return T
 3: else
 4:   if ∀x ∈ L, d_T(x) ≥ 2 then
 5:     return T
 6:   else
 7:     Let x be a vertex in L of degree 1.
 8:     Let T_1, . . . , T_k be the connected components (trees) of T − ({x} ∪ N_T(x)).
 9:     For every T_i such that T_i has at least one leaf in R let Q_i = SST(T_i).
10:     return ∪_i Q_i.
11:   end if
12: end if
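Definitions 3.1 and 3.2 and the statement of Lemma 3.8 (ii) can be checked by brute force on small trees. The following Python fragment is a sketch under stated assumptions: the graph is stored as a dictionary mapping each vertex of L to its set of neighbors in R, there are no isolated vertices in R (so that s(J) = N(J) \ N(L \ J)), and all function names are hypothetical. It enumerates all subsets of L and therefore takes exponential time; it is not the polynomial procedure of Algorithm 3.

```python
from itertools import combinations


def max_matching_size(adj):
    """Size of a maximum matching in {left vertex: set of right neighbours}."""
    match_of = {}

    def augment(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in match_of or augment(match_of[v], seen):
                match_of[v] = u
                return True
        return False

    return sum(augment(u, set()) for u in adj)


def is_identifiable(adj):
    """G has an edge and nu(G_x) = |L| - 1 for every x in L."""
    return any(adj.values()) and all(
        max_matching_size({u: adj[u] - adj[x] for u in adj if u != x})
        == len(adj) - 1
        for x in adj
    )


def s_of(adj, J):
    """s(J) = R \\ N(L \\ J), assuming no isolated vertices in R."""
    outside = set().union(*(adj[u] for u in adj if u not in J))
    inside = set().union(*(adj[u] for u in J))
    return inside - outside


def is_nicely_separable(adj, J):
    J = set(J)
    sensors = s_of(adj, J)
    sub = {u: adj[u] & sensors for u in J}          # G[J, s(J)]
    separable = bool(J) and all(sub[u] for u in J)  # J = N(s(J))
    return separable and is_identifiable(sub)


def nicely_separable_sets(adj):
    """All nicely separable sets, by exhaustive enumeration."""
    left = list(adj)
    return [
        set(J)
        for r in range(1, len(left) + 1)
        for J in combinations(left, r)
        if is_nicely_separable(adj, J)
    ]


# A tree: sensor y0 measures x1, x2, x3, and the leaf sensor y1 measures only x1.
star = {"x1": {"y0", "y1"}, "x2": {"y0"}, "x3": {"y0"}}
print(nicely_separable_sets(star))
```

In this tree the only vertex of R with degree 1 is y1, and the only nicely separable set is N(y1) = {x1}, in agreement with Lemma 3.8 (ii). If the leaf y1 is removed, no vertex of R has degree 1 and the enumeration returns no set.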

3.2 Polynomial results for bipartite graphs with max_{x∈L} d(x) ≤ 2

Lemma 3.9 Let G = (L, R; E) be a connected bipartite graph such that d_G(x) ≤ 2 for all x ∈ L. We have the following:
(i) G is identifiable iff G is an identifiable tree (see also Lemma 3.8).
(ii) SRC-SEL is feasible on G iff there exists a y ∈ R with d_G(y) = 1.

Proof.
(i) The reverse direction is clear. For the forward direction we apply (ii) of Lemma 3.7 with G′ = G (i.e., J = L and s(J) = R), from which we deduce that G is acyclic since ∀x ∈ L, d_G(x) ≤ 2. The result follows.
(ii) The proof is similar to the proof of (ii) of Lemma 3.8. □

It follows that SRC-SEL is polynomial on connected bipartite graphs such that d_G(x) ≤ 2 for all x ∈ L. The fact that MAX-SRC-SEL and MIN-SRC-SEL are polynomial on such graphs follows immediately from the following characterization of nicely separable sets for these graphs.

Proposition 3.10 Let G = (L, R; E) be a connected bipartite graph such that d_G(x) ≤ 2 for all x ∈ L, and let J ⊆ L be a separable set of G (we denote by G′ the subgraph of G induced by J ∪ s(J)). Then, J is a nicely separable set of G iff either J = L (i.e., G′ = G) and G is identifiable (see Lemma 3.9 for a characterization of such graphs) or G′ is an induced matching and d_G(y) = 1 for all y ∈ s(J).

Proof. One direction is clear. Let J ⊆ L be a nicely separable set of G. If J = L, the result follows from Lemma 3.9. So, assume J ≠ L (and then s(J) ≠ R). Assume that G′ has p ≥ 1 connected components G′_1 = (J_1, s(J_1); E_1), . . . , G′_p = (J_p, s(J_p); E_p). Let us prove that |s(J_i)| = 1 for every i = 1, . . . , p (in this case, we get |J_i| = 1 and then, each G′_i is reduced to an edge). By contradiction, assume that there exists i ∈ {1, . . . , p} with |J_i| ≥ 2. The subgraph G′_i is identifiable. Then, using (i) of Lemma 3.9 we get that d_{G′_i}(x) = 2 for every x ∈ J_i; thus, on the one hand ∀x ∈ J_i, d_{G′_i}(x) = d_G(x). On the other hand, since G is connected there are x_1 ∈ L \ J_i and y ∈ s(J_i) such that (x_1, y) ∈ E. In this case, x_1 must belong to J_i, and this is a contradiction. □

3.3 Polynomial results for bipartite graphs with max_{y∈R} d(y) ≤ 2

In this section we show part (iii) of Theorem 3.6: SRC-SEL is polynomial for connected bipartite graphs G = (L, R; E) such that d(y) ≤ 2 for all y ∈ R.
If there is a vertex y ∈ R such that d(y) = 1, then the singleton N(y) forms a nicely separable set. Thus, we assume from now on that d(y) = 2 for all y ∈ R. First, we show that the SRC-SEL problem, when restricted to such graphs, can be formulated in terms of (not necessarily bipartite) graphs. We say that a graph H with at least one edge is identifiable if for all v ∈ V (H), every component of the graph H − v contains a cycle; equivalently, no component of H − v is a tree. This definition is motivated by the following result.

Proposition 3.11 Let G = (L, R; E) be a bipartite graph such that dG (y) = 2 for all y ∈ R. Let H denote the graph such that V (H) = L, and its multiset of edges is given by E(H) = {NG (y) : y ∈ R}. Then, a subset J ⊆ L is nicely separable iff the subgraph of H induced by J is identifiable. Proof. First, suppose that J ⊆ L is nicely separable. Let H ′ denote the subgraph of H induced by J, and let v ∈ J. Then, the subgraph H ′ − v coincides with the graph (J − {v}, E(v)) where E(v) = {N(u) : u ∈ s(J)\N(v)}. Since J is a nicely separable set, the graph G[J − {v}, s(J) \ N(v)] contains a matching M of size |J| − 1. For every x ∈ J \ {v}, let m(x) denote the other endpoint of the edge in M covering x. In the graph H ′ − v, the set N(m(x)) defines an edge incident with x. Let Ev = {N(m(x)) : x ∈ J \ {v}} denote the multiset of all these edges. We claim that the graph Fv = (J \ {v}, Ev ) defines a spanning subgraph of H ′ − v such that every component of Fv contains a cycle. Consider the orientation E˜v of Ev , obtained by orienting each edge N(m(x)) away from x. Since M is a matching in G[J − {v}, s(J) \ N(v)], this orientation is ˜v ). well-defined. Furthermore, each vertex has out-degree one in (J \ {v}, E This implies that Fv contains no acyclic connected components (an acyclic component would necessarily contain a sink, that is, a vertex of out-degree zero). In other words, every connected component of Fv contains a cycle. Since Fv is a subgraph of H ′ − v, every connected component of H ′ − v contains a cycle, thus H ′ is identifiable. The converse direction can be proved similarly. Suppose that the subgraph of H induced by J (call it H ′ ) is identifiable. First, observe that, since H ′ contains an edge and no isolated vertices, J is separable. By definition of identifiabiliy, for every v ∈ J, every component of the subgraph H ′ − v contains a cycle. Let Fv denote a spanning subgraph of H ′ − v every connected component of which contains precisely one cycle. 
Let Ev denote the edge set of Fv. Fix an orientation Ẽv of Ev such that each vertex has out-degree one in (J \ {v}, Ẽv): such an orientation can be obtained, for example, by orienting, in every connected component K of Fv, the edges of the unique cycle of K in one of the two directions along the cycle, and orienting all the other edges of K toward that cycle. By construction, the set {(x, m(x)) : x ∈ J \ {v}}, where m(x) is the element of R corresponding to the outgoing edge from x in Ẽv, forms a matching of size |J| − 1 in the graph G[J \ {v}, s(J) \ N(v)]. Therefore, J ⊆ L is a nicely separable set in G. □

By Proposition 3.11, it suffices to show that one can determine in polynomial time whether a given graph H contains an induced identifiable subgraph. First, observe that, since the identifiable graphs are closed under edge addition, H contains an induced identifiable subgraph if and only if H contains an identifiable subgraph. Fig. 1 below shows five minimal identifiable graphs: for every i ∈ {1, ..., 5}, the graph Hi is identifiable while no proper subgraph of it is identifiable. It can be readily verified that the identifiable graphs are closed under edge subdivision; therefore, all subdivisions of any of the five graphs from Fig. 1 are also identifiable. It turns out that the presence of a subdivision of one of these graphs is not only a sufficient condition for the presence of an identifiable subgraph, but also a necessary one.

Lemma 3.12 A graph H contains an identifiable subgraph iff it contains a subgraph isomorphic to a subdivision of one of the graphs depicted in Fig. 1.

[Figure with five panels: H1, H2, H3, H4, H5]

Fig. 1. Minimal identifiable graphs
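The graph-theoretic notion of identifiability used in this subsection (no component of H − v is a tree, for every vertex v) can be checked directly on small multigraphs. The following is a minimal sketch, not taken from the paper: the function names are hypothetical, parallel edges are treated as a cycle of length two (as the multiset E(H) of Proposition 3.11 suggests), and the test for "contains a cycle" uses the standard fact that a connected multigraph has a cycle iff it has at least as many edges as vertices.

```python
from collections import defaultdict
from itertools import combinations


def every_component_has_cycle(vertices, edges):
    """Return True iff no connected component of the multigraph is a tree."""
    parent = {v: v for v in vertices}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, w in edges:
        parent[find(u)] = find(w)

    n_vertices = defaultdict(int)
    n_edges = defaultdict(int)
    for v in vertices:
        n_vertices[find(v)] += 1
    for u, _ in edges:
        n_edges[find(u)] += 1
    # A connected multigraph contains a cycle iff it has at least as many
    # edges as vertices (an isolated vertex counts as a tree).
    return all(n_edges[root] >= n_vertices[root] for root in n_vertices)


def is_identifiable_graph(vertices, edges):
    """Check that H has an edge and every component of H - v has a cycle."""
    vertices, edges = list(vertices), list(edges)
    if not edges:
        return False  # the definition requires at least one edge
    for v in vertices:
        rest = [u for u in vertices if u != v]
        kept = [e for e in edges if v not in e]
        if not every_component_has_cycle(rest, kept):
            return False
    return True


def graph_from_bipartite(r_neighbourhoods):
    """Edge multiset of H, assuming every R-vertex has exactly two neighbours."""
    return [tuple(sorted(nbrs)) for nbrs in r_neighbourhoods]


k4 = list(combinations(range(4), 2))
triangle = [(0, 1), (1, 2), (0, 2)]
print(is_identifiable_graph(range(4), k4))            # True
print(is_identifiable_graph(range(3), triangle))      # False
print(is_identifiable_graph(range(3), triangle * 2))  # True
```

The complete graph on four vertices and the triangle with doubled edges pass the check, while a simple triangle fails because deleting any vertex leaves a single edge, which is a tree. These examples are illustrations only and are not claimed to coincide with the graphs of Fig. 1.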

Finally, we observe that testing for the presence of a subdivision of Hi in a given graph reduces to solving polynomially many instances of the Disjoint Paths problem with k = |E(Hi)|, which is a polynomially solvable task, as shown by Robertson and Seymour:

Theorem 3.13 ([13]) For every positive integer k, there is a polynomial-time algorithm that solves the following "Disjoint Paths" problem: Given a graph H and pairs (s1, t1), ..., (sk, tk), do there exist paths P1, ..., Pk, pairwise internally disjoint, such that Pi joins si and ti (1 ≤ i ≤ k)?

Our proof of Lemma 3.12 will rely on the following results on the structure of 2-connected graphs (see Exercise 5.1.4 and Proposition 9.5 in [2]).

Proposition 3.14 ([2]) Let H = (V, E) be a 2-connected graph. Then:
(i) If X and Y are two sets of vertices of H, each of cardinality at least two, then there exist in H two disjoint (X, Y)-paths.
(ii) Let x be a vertex of H, and let Y ⊆ V \ {x} be a set with |Y| ≥ 2. Then there exist two internally disjoint (x, Y)-paths whose terminal vertices are distinct.

Proof (Lemma 3.12). It is enough to show that every identifiable graph contains a subgraph isomorphic to a subdivision of one of the graphs depicted in Fig. 1. Suppose not, and let H be an identifiable graph that contains no subdivision of H1, H2, H3, H4 or H5. Without loss of generality, we may assume that H is connected.

Claim 1: H is 2-connected.
Suppose not. Then, H contains at least two end blocks (cf. Exercise 5.2.4 in [2]). An end block of H is a block of H that corresponds to a leaf of the block tree of H. The block tree of H is the bipartite graph B(H) with bipartition (B, S), where B is the set of blocks of H and S the set of cut vertices of H, a block B and a cut vertex v being adjacent in B(H) if and only if B contains v. Let B1 and B2 be two distinct end blocks of H, and let v1 and v2 be the respective neighbors of B1 and B2 in B(H). Furthermore, for i = 1, 2, let Ci be a cycle in Bi − vi; such a cycle exists because Bi − vi is a component of H − vi and H is identifiable. For i = 1, 2, since Bi is 2-connected, Proposition 3.14(ii) implies that Bi contains two internally vertex-disjoint (vi, Ci)-paths, say Pi and Qi, whose terminal vertices are distinct. However, the four paths P1, Q1, P2, Q2 together with the cycles C1 and C2 and a shortest (v1, v2)-path in H form a subdivision of either H4 or H5, depending on whether v1 = v2 or not; a contradiction.

Claim 2: Every two cycles in H have a vertex in common.
Suppose not, and let C1 and C2 be two vertex-disjoint cycles in H. By Proposition 3.14(i), since H is 2-connected, there exist in H two vertex-disjoint (C1, C2)-paths. These two paths, together with C1 and C2, form a subdivision of H3; a contradiction.

Now, fix an arbitrary chordless cycle C in H and a vertex v ∈ V(H) outside C. By Proposition 3.14(ii), since H is 2-connected, there exist two internally disjoint (v, C)-paths P, Q, whose terminal vertices are distinct. Let x and y denote the terminal vertices of P and Q on C. Moreover, let P1 and P2 denote the two (x, y)-paths in C, and P3 the (x, y)-path obtained as the union of P and Q. Then:

Claim 3: Every chordless cycle C′ in H − x contains y.
Suppose not, and let C′ be a chordless cycle in H − x such that y ∉ V(C′). Since every two cycles in H have a vertex in common, C′ has a vertex in common with each of the cycles C, P1 ∪ P3, P2 ∪ P3. Then, C′ has a vertex in common with each of int(P1) ∪ int(P2), int(P1) ∪ int(P3) and int(P2) ∪ int(P3), where int(Pi) denotes the set of internal vertices of Pi. Without loss of generality, we may assume that V(C′) ∩ int(P1) ≠ ∅ and V(C′) ∩ int(P2) ≠ ∅. Since the subgraph of H − {x, y} induced by V(C′) ∪ int(P1) ∪ int(P2) is connected, there exists a path in H − {x, y} connecting an internal vertex of P1 to an internal vertex of P2. But now, a subdivision of H1 can be easily found in H; a contradiction.

Claim 4: Every chordless cycle C′ in H − x satisfies V(C′) ∩ V(C ∪ P3) = {y}.
Suppose not, and let C′ be a chordless cycle in H − x such that (V(C′) ∩ V(C ∪ P3)) \ {y} ≠ ∅. Then, there exists a (y, V(C ∪ P3))-path P in H − x such that no internal vertex of P belongs to V(C ∪ P3). Let z denote the terminal vertex of P on C ∪ P3. Let C′′ be a chordless cycle in H − y. By symmetry with Claim 3, x ∈ V(C′′). Let Ĉ denote the cycle obtained from P and the shortest (z, y)-path on C ∪ P3. Since C′′ and Ĉ have a vertex in common, there exists an (x, V(Ĉ))-path Q in H − y. However, this implies that H contains either a subdivision of H2 (if Q attaches to C through z), or a subdivision of H1 (otherwise); in either case, we have a contradiction.

We are now ready to complete the proof of Lemma 3.12. Let C′ be a cycle in H − x, and let C′′ be a cycle in H − y. Then, V(C′) ∩ V(C ∪ P3) = {y} and V(C′′) ∩ V(C ∪ P3) = {x}. Using the fact that C′ and C′′ have a vertex in common and to avoid a subdivision of H1 in H, we conclude that C′ and C′′ have precisely one vertex in common (say z); moreover, the cycles C, C′ and C′′ are pairwise edge-disjoint. However, these three cycles now form a subdivision of H2 in H; a contradiction. □

3.4 Finding a Minimum Nicely Separable Set is Hard

Since we do not know the complexity status of the SRC-SEL problem for general bipartite graphs, the complexity of the MIN-SRC-SEL problem is an interesting question. It turns out that even for identifiable graphs, MIN-SRC-SEL does not admit a PTAS, unless P = NP.

Theorem 3.15 MIN-SRC-SEL is APX-hard, even for identifiable graphs.

Proof. We prove the hardness with a reduction from the following APX-hard problem, called 3-HITTING-SET [12]: Given a family ℱ of subsets of size 3 of a finite set S such that each element of S is covered by at most 3 sets, find a set S′ ⊆ S of minimum cardinality that intersects all members of ℱ.
Let I = (ℱ, S) be an instance of 3-HITTING-SET with S = {1, ..., n} and ℱ = {T1, ..., Tm}. Moreover, we assume that m ≥ 3. We construct an identifiable bipartite graph G = (L, R; E) of size |L| = 2m + n and |R| = 2m² − m + 3n; an example for the construction is illustrated in Fig. 2. The vertex set L consists of three parts: X = {x1, ..., xm}, Y = {y1, ..., ym} and Z = {z1, ..., zn}. The vertex set R consists of m + 4 parts: A = {a1, ..., an}, B1, ..., Bm (all parts B1, ..., Bm have size 2m − 2), C = {c1, ..., cm}, D = {d1, ..., dn}, F = {f1, ..., fn}. The edges of G are defined as follows: For all j ∈ {1, ..., m}, N(xj) = R \ ({ai : i ∈ Tj} ∪ Bj ∪ D) and N(yj) = R \ (Bj ∪ {cj} ∪ F). Finally, for all j ∈ {1, ..., n}, N(zj) = {aj, dj, fj}.

In terms of the graph G, 3-HITTING-SET is equivalent to asking: Find a minimum-sized set I′ ⊆ A such that each vertex from X has at least one non-neighbor in I′. Let us denote by opt(I) and opt′(G) respectively the optimal values of 3-HITTING-SET on I and of MIN-SRC-SEL on G. Below, we will prove that opt′(G) = 2m + opt(I), which establishes the reduction of 3-HITTING-SET to MIN-SRC-SEL.

To show the claim, let us first assume that I′ = {ai1, ..., ait} ⊆ A is such that each vertex from X has at least one non-neighbor in I′. Let J = X ∪ Y ∪ {zi1, ..., zit}. Then, J is separable, with s(J) = {ai1, ..., ait} ∪ B1 ∪ ... ∪ Bm ∪ C ∪ {di1, ..., dit} ∪ {fi1, ..., fit}. Clearly, |J| = 2m + |I′|; moreover, we show that J is nicely separable, by verifying that G[J, s(J)] is identifiable. By assumption, for all j ∈ {1, ..., m}, xj has a non-neighbor in I′; let us call it a(xj). For any j ∈ {1, ..., m}, we consider the subgraph Gxj and demonstrate the existence of a matching that covers all vertices of J \ {xj}: all vertices of (X ∪ Y) \ {xj, yj} are matched inside part Bj, yj is matched to a(xj) and {zi1, ..., zit} are matched inside {di1, ..., dit}. Similarly, for any j ∈ {1, ..., m}, there exists a matching in the subgraph Gyj that covers all vertices of J \ {yj}: all vertices of (X ∪ Y) \ {xj, yj} are matched inside part Bj, xj is matched to cj and {zi1, ..., zit} are matched inside {fi1, ..., fit}. Finally, for any z ∈ {zi1, ..., zit}, there exists a matching in the subgraph Gz that covers all vertices of J \ {z}: all vertices of X ∪ Y are matched inside parts B1 and B2 and {zi1, ..., zit} \ {z} are matched inside {di1, ..., dit}. So, we conclude that G[J, s(J)] is identifiable, which shows that opt′(G) ≤ opt(I) + 2m.

To show the other inequality, we need Lemma 3.16 that is given below. Fix a nicely separable set J ⊆ L of minimum cardinality, so that |J| = opt′(G). From Lemma 3.16, J ⊇ X ∪ Y. Let t = |J| − 2m; thus, we can write J = X ∪ Y ∪ {zi1, ..., zit}. Then, s(J) = {ai1, ..., ait} ∪ B1 ∪ ... ∪ Bm ∪ C ∪ {di1, ..., dit} ∪ {fi1, ..., fit}. The fact that G[J, s(J)] is identifiable implies that each vertex of J has at least |J| − 1 = 2m + t − 1 non-neighbors in s(J). But each vertex of X has exactly 2m + t − 2 non-neighbors in B1 ∪ ... ∪ Bm ∪ C ∪ {di1, ..., dit} ∪ {fi1, ..., fit}. Therefore, each vertex of X must have at least one non-neighbor in I′ := {ai1, ..., ait}. Thus, opt(I) ≤ t = opt′(G) − 2m.

The above proof also shows that from any approximate solution of MIN-SRC-SEL on G with value apx′(G) we can get, in polynomial time, an approximate hitting set on I with value apx(I) such that apx(I) = apx′(G) − 2m. So, we get apx(I) − opt(I) = apx′(G) − opt′(G) and opt′(G) = 2m + opt(I) ≤ 7 opt(I) (since each element x ∈ S is covered by at most 3 sets in I, we get opt(I) ≥ m/3). Thus, the reduction is an L-reduction and the result follows. □

Lemma 3.16 If J ⊆ L is a nicely separable set in G, then J ⊇ X ∪ Y.

Proof. First we show the following statements:

∀j ∈ {1, ..., m} : yj ∈ J ⇒ J ⊇ X \ {xj}    (1)

∀j ∈ {1, ..., m} : xj ∈ J ⇒ J ⊇ Y \ {yj}    (2)

Because J is nicely separable, there exists a set s(J) ⊆ R such that N(s(J)) = J and G[J, s(J)] is identifiable. Let us assume that yj ∈ J; then the identifiability of G[J, s(J)] implies that s(J) contains at least one non-neighbor of yj. But every non-neighbor of yj is connected to all vertices of X \ {xj}. Therefore, N(s(J)) ⊇ X \ {xj} and (1) follows. Similarly, we can show (2).

Now, we show that if J contains any vertex from X ∪ Y, then it contains them all. Here, we make use of the assumption that m ≥ 3. Without loss of generality we assume that x1 ∈ J and from (1) and (2) we get the following sequence of implications:

x1 ∈ J ⇒ J ⊇ Y \ {y1} ⇒ y2 ∈ J
y2 ∈ J ⇒ J ⊇ X \ {x2} ⇒ x3 ∈ J
x3 ∈ J ⇒ J ⊇ Y \ {y3} ⇒ y1 ∈ J
y1 ∈ J ⇒ J ⊇ X \ {x1} ⇒ x2 ∈ J

So, it follows that J ⊇ X ∪ Y.

To complete the proof, it remains to show that J contains at least one vertex from X ∪ Y. For the sake of contradiction, suppose that there is a nicely separable set J ⊆ Z. The identifiability of G[J, s(J)] implies that for each x ∈ J, s(J) contains at least one vertex that is connected to x. But for every y ∈ Z, every vertex that is connected to y is also connected to a vertex from X ∪ Y. Therefore, N(s(J)) ∩ (X ∪ Y) ≠ ∅ and this implies that J ∩ (X ∪ Y) ≠ ∅, which is a contradiction. □
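For reference, the two inequalities behind the L-reduction invoked at the end of the proof of Theorem 3.15 can be written out as a short worked derivation. The constant names α and β are introduced here only for illustration (they follow the usual definition of an L-reduction in the sense of [12]); the bound m ≤ 3 opt(I) holds because each element of S lies in at most 3 sets, so a hitting set of size opt(I) can hit at most 3 opt(I) of the m sets.

```latex
\begin{align*}
\mathrm{opt}'(G) &= 2m + \mathrm{opt}(I)
   \le 6\,\mathrm{opt}(I) + \mathrm{opt}(I) = 7\,\mathrm{opt}(I)
   && (\alpha = 7,\ \text{using } m \le 3\,\mathrm{opt}(I)),\\
\mathrm{apx}(I) - \mathrm{opt}(I)
   &= \bigl(\mathrm{apx}'(G) - 2m\bigr) - \bigl(\mathrm{opt}'(G) - 2m\bigr)
   = \mathrm{apx}'(G) - \mathrm{opt}'(G)
   && (\beta = 1).
\end{align*}
```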

References

[1] Berge, C., Two theorems in graph theory, Proc. Natl. Acad. Sci. USA 43 (1957), pp. 842–844.
[2] Bondy, J. A. and U. S. R. Murty, "Graph Theory," Graduate Texts in Mathematics 244, Springer, New York, 2008.
[3] Boscolo, R., C. Sabatti, J. Liao and V. Roychowdhury, A generalized framework for network component analysis, IEEE Trans. Comp. Biol. Bioinf. 2 (2005), pp. 289–301.
[4] Diestel, R., "Graph Theory, Third Edition," Springer, 2005.
[5] Fritzilas, E., Y. Rios-Solis and S. Rahmann, Structural identifiability in low-rank matrix factorization, in: Proceedings of COCOON'08, LNCS 5092, 2008, pp. 140–148. Extended version: http://www.cebitec.uni-bielefeld.de/~efritzil/papers/IdentifiableGraphs.pdf.
[6] Hall, P., On representatives of subsets, J. London Math. Society 10 (1935), pp. 26–30.
[7] Hopcroft, J. and R. Karp, An n^{5/2} algorithm for maximum matchings in bipartite graphs, SIAM J. Comput. 2 (1973), pp. 225–231.
[8] Iwata, S., A faster scaling algorithm for minimizing submodular functions, SIAM J. Comput. 32 (2003), pp. 833–840.
[9] Liao, J., R. Boscolo, Y.-L. Yang, L. Tran, C. Sabatti and V. Roychowdhury, Network component analysis: reconstruction of regulatory signals in biological systems, Proc. Natl. Acad. Sci. USA 100 (2003), pp. 15522–15527.
[10] Lovász, L. and M. Plummer, "Matching Theory," North-Holland, 1986.
[11] Orlin, J. B., A faster strongly polynomial time algorithm for submodular function minimization, in: Proceedings of IPCO'07, LNCS 4513, 2007, pp. 240–251.
[12] Papadimitriou, C. and M. Yannakakis, Optimization, approximation, and complexity classes, J. Comput. Syst. Sci. 43 (1991), pp. 425–440.
[13] Robertson, N. and P. Seymour, Graph minors. XIII. The disjoint paths problem, J. Comb. Theory, Ser. B 63 (1995), pp. 65–110.
[14] Wang, H., E. Hubbell, J. Hu, G. Mei, M. Cline, G. Lu, T. Clark, M. Siani-Rose, M. Ares, D. Kulp and D. Haussler, Gene structure-based splice variant deconvolution using a microarray platform, Bioinformatics 19 (2003), pp. i315–i322.

S = {1, ..., 6},  ℱ = {{2, 4, 5}, {2, 3, 6}, {1, 5, 6}}  ⟹  G =

                    X           Y                Z
                 x1 x2 x3    y1 y2 y3    z1 z2 z3 z4 z5 z6
A    a1           1  1  0     1  1  1     1  0  0  0  0  0
     a2           0  0  1     1  1  1     0  1  0  0  0  0
     a3           1  0  1     1  1  1     0  0  1  0  0  0
     a4           0  1  1     1  1  1     0  0  0  1  0  0
     a5           0  1  0     1  1  1     0  0  0  0  1  0
     a6           1  0  0     1  1  1     0  0  0  0  0  1
B1   (4 rows)     0  1  1     0  1  1     0  0  0  0  0  0
B2   (4 rows)     1  0  1     1  0  1     0  0  0  0  0  0
B3   (4 rows)     1  1  0     1  1  0     0  0  0  0  0  0
C    c1           1  1  1     0  1  1     0  0  0  0  0  0
     c2           1  1  1     1  0  1     0  0  0  0  0  0
     c3           1  1  1     1  1  0     0  0  0  0  0  0
D    d1           0  0  0     1  1  1     1  0  0  0  0  0
     d2           0  0  0     1  1  1     0  1  0  0  0  0
     d3           0  0  0     1  1  1     0  0  1  0  0  0
     d4           0  0  0     1  1  1     0  0  0  1  0  0
     d5           0  0  0     1  1  1     0  0  0  0  1  0
     d6           0  0  0     1  1  1     0  0  0  0  0  1
F    f1           1  1  1     0  0  0     1  0  0  0  0  0
     f2           1  1  1     0  0  0     0  1  0  0  0  0
     f3           1  1  1     0  0  0     0  0  1  0  0  0
     f4           1  1  1     0  0  0     0  0  0  1  0  0
     f5           1  1  1     0  0  0     0  0  0  0  1  0
     f6           1  1  1     0  0  0     0  0  0  0  0  1

Fig. 2. An example for the reduction of 3-HITTING-SET to MIN-SRC-SEL. We represent the bipartite graph G = (L, R; E) as a 0/1 matrix such that the columns correspond to part L, the rows correspond to part R and the ones correspond to the edges in E. Each of the parts B1, B2, B3 consists of four identical rows, shown once.
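The construction used in the proof of Theorem 3.15 can be reproduced on the instance of Fig. 2 and checked by brute force. The sketch below is illustrative only and rests on stated assumptions: all function and vertex names are hypothetical, and s(J) is assumed to be the set of vertices of R whose whole neighborhood lies in J, with J separable when N(s(J)) = J (this reading is consistent with how s(J) is used in the proofs above, but the definition itself is not part of this section). Matchings are found with a simple augmenting-path search.

```python
from itertools import combinations


def build_reduction(n, family):
    """Neighbourhoods N(y) for y in R, following the proof of Theorem 3.15."""
    m = len(family)
    X = [f"x{j}" for j in range(1, m + 1)]
    Y = [f"y{j}" for j in range(1, m + 1)]
    Z = [f"z{i}" for i in range(1, n + 1)]
    nbrs = {}
    for i in range(1, n + 1):
        hit = {f"x{j}" for j, T in enumerate(family, 1) if i in T}
        nbrs[f"a{i}"] = (set(X) - hit) | set(Y) | {f"z{i}"}
        nbrs[f"d{i}"] = set(Y) | {f"z{i}"}
        nbrs[f"f{i}"] = set(X) | {f"z{i}"}
    for j in range(1, m + 1):
        for k in range(1, 2 * m - 1):  # |B_j| = 2m - 2
            nbrs[f"b{j}_{k}"] = set(X + Y) - {f"x{j}", f"y{j}"}
        nbrs[f"c{j}"] = set(X + Y) - {f"y{j}"}
    return X, Y, Z, nbrs


def saturates(left, adj):
    """True iff some matching covers every vertex of `left`."""
    match = {}  # R-vertex -> matched L-vertex

    def augment(u, seen):
        for y in adj[u]:
            if y not in seen:
                seen.add(y)
                if y not in match or augment(match[y], seen):
                    match[y] = u
                    return True
        return False

    return all(augment(u, set()) for u in left)


def is_nicely_separable(J, nbrs):
    J = set(J)
    # Assumed definition: s(J) = R-vertices whose neighbourhood lies in J.
    s_J = [y for y, N in nbrs.items() if N <= J]
    if not J or set().union(*(nbrs[y] for y in s_J)) != J:
        return False  # J is not separable
    for v in J:
        allowed = [y for y in s_J if v not in nbrs[y]]
        adj = {u: [y for y in allowed if u in nbrs[y]] for u in J if u != v}
        if not saturates(list(adj), adj):
            return False
    return True


def min_nicely_separable(L, nbrs):
    """Smallest nicely separable subset of L, by exhaustive search."""
    for size in range(1, len(L) + 1):
        for J in combinations(L, size):
            if is_nicely_separable(J, nbrs):
                return set(J)
    return None


family = [{2, 4, 5}, {2, 3, 6}, {1, 5, 6}]
X, Y, Z, nbrs = build_reduction(6, family)
best = min_nicely_separable(X + Y + Z, nbrs)
print(len(X + Y + Z), len(nbrs), len(best))
```

For this instance m = 3 and n = 6, so the sizes should agree with |L| = 2m + n = 12 and |R| = 2m² − m + 3n = 33. A minimum hitting set has two elements (for example {1, 2}), so the theorem predicts a smallest nicely separable set of size 2m + 2 = 8 that contains X ∪ Y, as Lemma 3.16 requires. Exhaustive search is exponential in |L| and is meant only as a sanity check on small examples.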