Combinatorial Data Fusion Examples and Formalism Complexity of Combinatorial Data Fusion Practical Data Fusion Algorithms
The Combinatorial Data Fusion Problem
GRAPH CUT PROBLEMS FOR BIG DATA
R. W. R. Darling¹, David G. Harris², Dev R. Phulara¹, John A. Proos³
¹ National Security Agency, USA
² University of Maryland, Department of Computer Science
³ Tutte Institute for Mathematics & Computing, Canada
June 12, 2017
Darling, Harris, Phulara, & Proos
CANADAM 2017
Outline
1. Combinatorial Data Fusion Examples and Formalism
2. Complexity of Combinatorial Data Fusion
3. Practical Data Fusion Algorithms
How Data Fusion Problems Arise
A connected edge-weighted graph (V, E0, w) arises from one data source. A second data source introduces an independence system S on V, characterized by its circuits F, called forbidden sets. Task: find E1 ⊂ E0 such that the weight of E0 \ E1 is minimum and no component of the graph (V, E1) contains any forbidden set.
Connected Edge-Weighted Graph G := (V, E0) with w : E0 → (0, ∞)
[Figure]
Forbidden Sets of Vertices
[Figure] Forbidden: F1 := {v1, v2, v3} and F2 := {v6, v9}
Forbidden Sets in Detail
Forbidden sets F: a collection of subsets of the graph's vertex set V. Call (V, F) the forbidden hypergraph.
Rule 1: F contains no singletons and no edges of the graph.
Rule 2: No F ∈ F may be a proper subset of another F′ ∈ F.
In the example, the vertex set was V := {v1, v2, ..., v11} and the forbidden sets were F1 := {v1, v2, v3} and F2 := {v6, v9}.
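Both rules can be checked mechanically. The Python sketch below is only an illustration: the function name `validate_forbidden_hypergraph` and the input conventions (vertex labels as strings, graph edges as pairs, forbidden sets as Python sets) are assumptions, not part of the original material. Since the example graph's edges appear only in the figure, any edge list passed to it is hypothetical.

```python
from itertools import combinations

def validate_forbidden_hypergraph(vertices, edges, forbidden):
    """Return a list of rule violations; an empty list means the family is valid.

    vertices:  iterable of vertex labels
    edges:     iterable of 2-element graph edges
    forbidden: iterable of forbidden sets (iterables of vertices)
    """
    vertex_set = set(vertices)
    edge_set = {frozenset(e) for e in edges}
    family = [frozenset(f) for f in forbidden]
    problems = []
    for f in family:
        if not f <= vertex_set:
            problems.append(f"{sorted(f)} uses vertices outside V")
        # Rule 1: no singletons and no edges of the graph.
        if len(f) <= 1:
            problems.append(f"{sorted(f)} is a singleton or empty")
        elif f in edge_set:
            problems.append(f"{sorted(f)} is an edge of the graph")
    # Rule 2: no forbidden set may be a proper subset of another one.
    for f, g in combinations(family, 2):
        if f < g or g < f:
            small, big = (f, g) if f < g else (g, f)
            problems.append(f"{sorted(small)} is a proper subset of {sorted(big)}")
    return problems
```

With the slide's F1 and F2 and a hypothetical edge {v1, v4}, the function returns an empty list; adding {v1, v2} both as an edge and as a forbidden set violates Rule 1 and Rule 2 at once.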
What Role Do Forbidden Sets Play?
Suppose the weighted graph is G := (V, E0) with w : E0 → (0, ∞). Given an edge subset E1 ⊂ E0, the subgraph (V, E1) partitions V into components V1, V2, ..., Vd. The edge subset E1 is feasible if no component Vj contains any forbidden set. Seek a feasible edge subset of maximum total weight; since the total weight of E0 is fixed, this is the same as minimizing the weight of the deleted edges E0 \ E1.
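The feasibility test and the objective can be made concrete with a short Python sketch. It assumes edges are given as a dict from vertex pairs to weights, uses a union-find structure to label the components of (V, E1), and finds the optimum by exhaustive search over edge subsets. That search is exponential in |E0| and is meant only for toy instances; all names here are hypothetical.

```python
from itertools import combinations

def components(vertices, kept_edges):
    """Label every vertex with a representative of its component in (V, E1)."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in kept_edges:
        parent[find(u)] = find(v)  # union the two components
    return {v: find(v) for v in vertices}

def is_feasible(vertices, kept_edges, forbidden):
    """E1 is feasible if every forbidden set meets at least two components."""
    comp = components(vertices, kept_edges)
    return all(len({comp[v] for v in f}) > 1 for f in forbidden)

def best_feasible_subset(vertices, weights, forbidden):
    """Exhaustive search for a maximum-weight feasible E1 (2^m subsets; toy sizes only)."""
    edges = list(weights)
    best, best_weight = None, float("-inf")
    for k in range(len(edges), -1, -1):
        for kept in combinations(edges, k):
            total = sum(weights[e] for e in kept)
            if total > best_weight and is_feasible(vertices, kept, forbidden):
                best, best_weight = set(kept), total
    return best, best_weight
```

On the path a, b, c with w({a, b}) = 3, w({b, c}) = 1 and forbidden set {a, c}, the search keeps {a, b} (weight 3) and deletes {b, c}.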
(V, E1): No Component Contains a Forbidden Set
[Figure] Dashed edges are removed; solid edges remain.
Semi-supervised Learning as Data Fusion Problem
Client-Server Graph:
VERTICES: Some vertices are clients, some are servers, and others are labels.
EDGES: Some edges are transactions between clients and servers; others tag a server with a label.
Multiway Cut:
FORBIDDEN: All pairs of distinct labels.
OUTCOME: Each component has exactly one label vertex.
Multiway Cut Visualization
[Figure] Left nodes: clients. Right nodes: servers. Color: server tag. Edges: transactions.
Last Picture Should Look Like . . .
Cut by color.
Conversion to Combinatorial Data Fusion
Extra nodes: one new node per color.
Each color tag becomes an edge from the server to the corresponding color node.
Forbidden sets: all pairs of color nodes.
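A minimal Python sketch of this conversion, assuming transactions arrive as a dict from (client, server) pairs to weights and each server carries a single color tag. The weight of the new server-to-color edges is not specified on the slide, so `tag_weight` is a hypothetical parameter, as are the function name and the `("color", c)` node labels.

```python
from itertools import combinations

def to_data_fusion(transactions, server_tags, tag_weight=1.0):
    """Build (edges, forbidden) for a client-server multiway-cut instance.

    transactions: dict mapping (client, server) -> weight
    server_tags:  dict mapping server -> color tag
    tag_weight:   weight of each server-to-color edge (assumed, not from the slides)
    """
    edges = dict(transactions)
    color_nodes = {}
    for server, color in server_tags.items():
        node = ("color", color)          # one extra vertex per color
        color_nodes[color] = node
        edges[(server, node)] = tag_weight
    # Forbidden sets: every pair of distinct color nodes.
    forbidden = [frozenset(pair)
                 for pair in combinations(sorted(color_nodes.values()), 2)]
    return edges, forbidden
```

With two servers tagged "red" and "blue", the result has one extra edge per server and a single forbidden pair {("color", "blue"), ("color", "red")}; with k distinct colors there are k(k − 1)/2 forbidden pairs.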
Name vs. Metadata Type of Data Fusion Problem
Entity Similarity Graph:
VERTICES: Vertices are named entities. Vertex v is tagged with a metadata vector x_v and a name ℓ(v).
EDGES: Edge weights w({v, v′}) reflect name-based similarity between ℓ(v) and ℓ(v′).
Metadata Compatibility:
FORBIDDEN: We say F ⊂ V is forbidden if the concatenated vector $\bigoplus_{v \in F} x_v$ fails to satisfy some monotone system of linear inequalities.
OUTCOME: Each component consists of similarly named entities with compatible metadata.
Other Ways Forbidden Sets Arise
BALANCED CUTS: Assign a price p(v) to each vertex v. Edge weight w({u, v}) measures similarity between u and v. Call F ⊂ V forbidden if
$$\sum_{v \in F} p(v) > P_1$$
for some price limit P1. Goal: group vertices into clusters of similar vertices, with a bound on each cluster's total price.
LOGIC PROBLEMS: Logical variables x1, ..., xn and Boolean functions f1, ..., fm. There is a vertex for each possible value of each xi and each fj. For example, {x1 = 0, x4 = 0, x9 = 1, f1 = 0} is forbidden if f1 = 1 when (x1, x4, x9) = (0, 0, 1).
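Both constructions are easy to state in code. The Python sketch below is a hypothetical illustration: vertices of the logic encoding are represented as (variable, value) pairs, Boolean functions as Python callables returning 0 or 1, and the balanced-cut check assumes nonnegative prices, which the slide does not state explicitly. Only the function-consistency sets described on the slide are generated.

```python
from itertools import product

def balanced_partition_ok(components, price, limit):
    """Balanced cuts: True if no component contains a forbidden set.

    Assumes nonnegative prices p(v). Then a component contains some set with
    total price > limit exactly when its own total price exceeds limit.
    """
    return all(sum(price[v] for v in comp) <= limit for comp in components)

def logic_forbidden_sets(name, inputs, func):
    """Logic problems: forbidden sets encoding f = func(inputs).

    Vertices are (variable, value) pairs such as ("x1", 0) or ("f1", 1).
    For each input assignment, the input values together with the
    wrong output value of f form a forbidden set.
    """
    forbidden = []
    for values in product((0, 1), repeat=len(inputs)):
        wrong_output = 1 - func(*values)
        vertex_set = {(x, a) for x, a in zip(inputs, values)}
        vertex_set.add((name, wrong_output))
        forbidden.append(frozenset(vertex_set))
    return forbidden
```

For instance, with the hypothetical choice f1 = x1 ∨ x4 ∨ x9, the assignment (0, 0, 1) yields exactly the slide's forbidden set {x1 = 0, x4 = 0, x9 = 1, f1 = 0}.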
EQUIVALENT: Coloring the Forbidden Hypergraph
Given (V, E0, w) and the forbidden hypergraph (V, F):
1. Let d ≥ 2. A hypergraph coloring of (V, F) is a mapping χ : V → {1, 2, ..., d} such that no hyperedge in F is monochromatic.
2. Find a d and a hypergraph coloring χ for which the total weight
$$\sum_{\{u,v\} \in E_0,\ \chi(u) \neq \chi(v)} w(\{u, v\})$$
of edges with differently colored endpoints is minimum.
3. This is equivalent to the combinatorial data fusion problem (CDFP): colors ↔ components of (V, E1).
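The coloring formulation can be checked on toy inputs by brute force. In this Python sketch (hypothetical names; edges as a dict from vertex pairs to weights), d is fixed at |V|, which is enough colors to realize any partition of V; the search is exponential and purely illustrative.

```python
from itertools import product

def cut_weight(chi, weights):
    """Total weight of edges whose endpoints receive different colors."""
    return sum(w for (u, v), w in weights.items() if chi[u] != chi[v])

def is_hypergraph_coloring(chi, forbidden):
    """No forbidden set (hyperedge) may be monochromatic."""
    return all(len({chi[v] for v in f}) > 1 for f in forbidden)

def best_coloring(vertices, weights, forbidden):
    """Exhaustive search over colorings with d = |V| colors (toy sizes only)."""
    vertices = list(vertices)
    d = len(vertices)
    best, best_cost = None, float("inf")
    for colors in product(range(d), repeat=len(vertices)):
        chi = dict(zip(vertices, colors))
        if is_hypergraph_coloring(chi, forbidden):
            cost = cut_weight(chi, weights)
            if cost < best_cost:
                best, best_cost = chi, cost
    return best, best_cost
```

On the path a, b, c with w({a, b}) = 3, w({b, c}) = 1 and forbidden set {a, c}, the cheapest valid coloring gives a and b one color and c another, at cut weight 1. That is the total weight 4 minus the best kept weight 3, as the equivalence predicts.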
EQUIVALENT: Optimum Hypergraph Matching
Given (V, E0, w) and the independence system S on V whose circuits are the forbidden sets:
1. A hypergraph matching M of S is a collection of pairwise disjoint hyperedges in S.
2. Define the weight w(U) of U ∈ S as the total weight of the subgraph of G induced by U.
3. Seek a perfect matching¹ M that maximizes the sum of the weights of its hyperedges:
$$w(M) := \sum_{U \in M} w(U).$$
4. Equivalent to CDFP: hyperedges in an optimum matching ↔ components of (V, E1).
¹ Since {v} ∈ S for all v ∈ V by assumption, and all singletons have weight 0, any matching can be extended to a perfect matching of equal weight.
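A small Python sketch of the matching weight w(M), assuming edges are given as a dict from vertex pairs to weights (all names hypothetical). It checks that the hyperedges are pairwise disjoint and, if forbidden sets are supplied, that none of them contains a forbidden set (so each lies in S); it then pads M with weight-0 singletons, as the footnote describes.

```python
def induced_weight(U, weights):
    """w(U): total weight of the edges of G with both endpoints in U."""
    U = set(U)
    return sum(w for (u, v), w in weights.items() if u in U and v in U)

def matching_weight(matching, weights, vertices=(), forbidden=()):
    """Return (w(M), parts), where parts is M extended to a perfect matching.

    Raises ValueError if two hyperedges overlap or a hyperedge contains a
    forbidden set. Vertices not covered by M become weight-0 singletons.
    """
    parts = [set(U) for U in matching]
    covered = set()
    for U in parts:
        if covered & U:
            raise ValueError("hyperedges of a matching must be pairwise disjoint")
        if any(set(F) <= U for F in forbidden):
            raise ValueError("hyperedge contains a forbidden set, so it is not in S")
        covered |= U
    parts += [{v} for v in vertices if v not in covered]
    return sum(induced_weight(U, weights) for U in parts), parts
```

For the path a, b, c with weights 3 and 1 and forbidden set {a, c}, the matching {{a, b}} extends to {{a, b}, {c}} with weight 3, matching the components of the optimal E1.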
How Difficult Is Combinatorial Data Fusion?
Linear-time, polynomial-time, and NP-hard special cases exist. The authors do not know a classification of CDFP instances among these categories of difficulty. NAE-SAT and Maximum Weight Stable Set (for vertex-weighted graphs) can be converted to CDFP. The most-studied special cases are Multicut and Multiway Cut (next slide). An integer linear program formulation is given later.
Complexity When All Forbidden Sets Have Size 2
Multiway Cut: Fix S ⊂ V. The forbidden sets are all pairs {s, s′} with s, s′ ∈ S and s ≠ s′. NP-hard if |S| > 2. A randomized 3/2-approximation exists.
Multicut: The forbidden sets are {s_i, t_i}, 1 ≤ i ≤ k. NP-hard if k ≥ 2. A 4 log(k + 1)-approximation exists. If the Unique Games Conjecture holds, no constant-factor approximation exists unless P = NP.
REFERENCES
1. David P. Williamson & David B. Shmoys, The Design of Approximation Algorithms. Cambridge Univ. Press, 2011.
2. Vijay V. Vazirani, Approximation Algorithms. Springer, Berlin, 2003.
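The slide cites approximation results without algorithms. To illustrate how such a guarantee arises, here is a Python sketch of the classical isolating-cut heuristic for Multiway Cut. This is a different, simpler technique than the randomized 3/2-approximation mentioned on the slide, and it guarantees a factor of 2 − 2/k for k = |S| terminals. The sketch assumes vertices are integers 0..n−1 and edges are given as a dict of weights; max flow uses a plain Edmonds-Karp search on a dense matrix, so it suits only small graphs.

```python
from collections import deque

def max_flow_min_cut(n, cap, s, t):
    """Edmonds-Karp max flow on an n x n capacity matrix.

    Returns (max-flow value, set of vertices on the source side of a min cut).
    """
    flow = [[0.0] * n for _ in range(n)]
    total = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path in the residual graph.
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] - flow[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            # No augmenting path: residual-reachable vertices form the source side.
            return total, {v for v in range(n) if parent[v] != -1}
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v] - flow[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck  # skew symmetry handles undirected edges
            v = u
        total += bottleneck

def isolating_cut_multiway(n, edges, terminals):
    """Isolating-cut heuristic for Multiway Cut on vertices 0..n-1.

    For each terminal s, compute a minimum cut separating s from all other
    terminals (merged through a super-sink). Return the union of these cuts
    minus the heaviest one, together with its total weight.
    """
    sink = n
    cuts = []
    for s in terminals:
        cap = [[0.0] * (n + 1) for _ in range(n + 1)]
        for (u, v), w in edges.items():
            cap[u][v] += w
            cap[v][u] += w
        for t in terminals:
            if t != s:
                cap[t][sink] = cap[sink][t] = float("inf")
        value, side = max_flow_min_cut(n + 1, cap, s, sink)
        cut = {(u, v) for (u, v) in edges if (u in side) != (v in side)}
        cuts.append((value, cut))
    cuts.sort(key=lambda pair: pair[0])
    removed = set().union(*(cut for _, cut in cuts[:-1]))
    return removed, sum(edges[e] for e in removed)
```

Dropping the heaviest isolating cut is safe: every other terminal is already cut off from the rest, so the remaining terminal is separated as well. On a star with center 0 and terminal edges of weight 1, 2, 3, the heuristic deletes the two lightest edges (total 3), which is optimal here.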
Combinatorial Data Fusion on Trees
Suppose T = (V, E, w) is a weighted tree.
Multiway Cut: becomes a matroid problem². A linear-time algorithm is already known.
Arbitrary forbidden sets: F := {F1, F2, ..., Fb}. Say E := {e1, e2, ..., em}. Let a_{i,j} = 1 if removing edge e_j disconnects F_i in T, and a_{i,j} = 0 otherwise.
1. Set Cover integer linear program:
$$\min \sum_{j=1}^{m} w(e_j)\, x_j \quad \text{subject to} \quad \sum_{j=1}^{m} a_{i,j}\, x_j \ge 1 \ \ (i = 1, 2, \ldots, b), \qquad x_j \in \{0, 1\}.$$
2. Optimum: delete e_j if x_j = 1.
² Proved in our 2017 paper.
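The matrix a_{i,j} and the resulting set-cover instance can be sketched in Python as follows. Removing a tree edge splits the tree in two, so e_j disconnects F_i exactly when F_i has vertices on both sides. The names are hypothetical, and the set cover is solved here by brute force over edge subsets, which only suits tiny trees; in practice an ILP solver would take its place.

```python
from itertools import combinations

def side_after_removal(adj, edge):
    """Vertices still reachable from edge[0] once `edge` is deleted from the tree."""
    a, b = edge
    seen, stack = {a}, [a]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if {u, v} == {a, b} or v in seen:
                continue  # skip the deleted edge and visited vertices
            seen.add(v)
            stack.append(v)
    return seen

def disconnection_matrix(tree_edges, forbidden):
    """a[i][j] = 1 if removing edge e_j splits forbidden set F_i in the tree."""
    adj = {}
    for u, v in tree_edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    sides = [side_after_removal(adj, e) for e in tree_edges]
    matrix = []
    for F in forbidden:
        F = set(F)
        matrix.append([int(0 < len(F & side) < len(F)) for side in sides])
    return matrix

def tree_set_cover(tree_edges, weights, forbidden):
    """Minimum-weight set of edges covering every row of a (brute force)."""
    a = disconnection_matrix(tree_edges, forbidden)
    m = len(tree_edges)
    best, best_cost = None, float("inf")
    for k in range(m + 1):
        for chosen in combinations(range(m), k):
            if all(any(row[j] for j in chosen) for row in a):
                cost = sum(weights[j] for j in chosen)
                if cost < best_cost:
                    best, best_cost = [tree_edges[j] for j in chosen], cost
    return best, best_cost
```

On the path 1, 2, 3, 4 with edge weights 5, 1, 4 and forbidden sets {1, 3} and {2, 4}, both rows are covered by the middle edge alone, so the optimum deletes (2, 3) at cost 1.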
Data Fusion on Trees in General
Say G = (V, E0, w) is a tree and F ⊂ V is a forbidden set. Seek the set EF ⊂ E0 of all edges whose individual removal disconnects F.
Algorithm for Finding Edges Which Disconnect F
Add an auxiliary vertex v0 connected to each v ∈ F. The desired edges of E0 are those that belong to the 2-core of this augmented graph.
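The 2-core construction can be sketched by peeling: add the auxiliary vertex, then repeatedly delete vertices of degree less than 2. The surviving tree vertices span the minimal subtree containing F, and removing any edge of that subtree disconnects F. This is a minimal sketch assuming the tree is given as a list of `(u, v)` tuples; the function name `disconnecting_edges` and the `object()` sentinel used for v0 are hypothetical choices. A singleton F yields no edges, as expected, since a single vertex cannot be disconnected.

```python
from collections import defaultdict

def disconnecting_edges(tree_edges, forb):
    """Tree edges whose removal disconnects forb, via the 2-core of tree + auxiliary vertex."""
    aux = object()                      # sentinel standing in for the auxiliary vertex v0
    adj = defaultdict(set)
    for u, v in tree_edges:
        adj[u].add(v)
        adj[v].add(u)
    for f in forb:                      # join v0 to every vertex of the forbidden set
        adj[aux].add(f)
        adj[f].add(aux)
    # Standard k-core peeling with k = 2: remove vertices of degree < 2 until none remain.
    stack = [v for v in adj if len(adj[v]) < 2]
    removed = set()
    while stack:
        v = stack.pop()
        if v in removed:
            continue
        removed.add(v)
        for w in adj[v]:
            if w not in removed:
                adj[w].discard(v)
                if len(adj[w]) < 2:
                    stack.append(w)
    core = {v for v in adj if v not in removed}
    # The 2-core is an induced subgraph, so keep tree edges with both endpoints in it.
    return {(u, v) for u, v in tree_edges if u in core and v in core}
```

Peeling touches each vertex and edge a bounded number of times, so the sketch runs in time linear in the size of the tree.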
Combinatorial Data Fusion as Integer LP
Seek an optimum d-coloring $\chi$ of the forbidden hypergraph $(V, \mathcal{F})$.
Notation:
1. $x_i^\alpha \in \{0, 1\}$ for $1 \le i \le n$ and $1 \le \alpha \le d$. Here $x_i^\alpha = 1$ means that vertex $v_i$ is assigned color $\chi(v_i) = \alpha$.
2. Edges $E := \{e_j\}$, weights $\{w_j\}$, and $y_j \in \{0, 1\}$ for $1 \le j \le m$. Here $y_j = 1$ means that edge $e_j$ is deleted.
Objective: minimize $\sum_{j=1}^{m} w_j y_j$.
Constraints:
1. Every vertex has exactly one color: $\sum_{\alpha=1}^{d} x_i^\alpha = 1$, $1 \le i \le n$.
2. Delete an edge if its endpoints have different colors: for $e_j = \{v_r, v_s\}$, $y_j \ge \pm(x_r^\alpha - x_s^\alpha)$, i.e. $y_j \ge |x_r^\alpha - x_s^\alpha|$, for $1 \le \alpha \le d$.
3. A forbidden set cannot be monochromatic: $\sum_{i : v_i \in F} x_i^\alpha \le |F| - 1$, $1 \le \alpha \le d$, $\forall F \in \mathcal{F}$.
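To make the ILP concrete on very small instances, the sketch below enumerates every d-coloring χ instead of calling an ILP solver; this substitution is an assumption for illustration only, and its cost grows as d^n. Constraint 1 holds automatically because χ assigns one color per vertex, constraint 3 filters out colorings, and, assuming nonnegative weights, the optimal y_j is 1 exactly when the endpoints of e_j differ in color, which is the smallest value constraint 2 allows. Vertices are indexed from 0 rather than 1, and `best_coloring` is a hypothetical name.

```python
from itertools import product

def best_coloring(n, edges, weights, forbidden, d):
    """Exact optimum of the coloring ILP for tiny n by enumerating all d^n colorings."""
    best = None
    for chi in product(range(d), repeat=n):    # chi[i] is the color of vertex i
        # constraint 3: no forbidden set may be monochromatic
        if any(len({chi[i] for i in F}) == 1 for F in forbidden):
            continue
        # constraint 2 at the optimum: y_j = 1 exactly when the endpoints differ in color
        y = [int(chi[r] != chi[s]) for (r, s) in edges]
        cost = sum(w * yj for w, yj in zip(weights, y))
        if best is None or cost < best[0]:
            best = (cost, chi, y)
    return best
```

On the path 0-1-2 with weights 1 and 5 and forbidden set {0, 2}, the optimum deletes only the cheap edge (0,1).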
Three Scales of Problem Needing Algorithms
Moderate: G may have fewer than 100 vertices and fewer than 20 cycles. Tree width may be low. Disconnecting a few vertices from the rest of the graph may leave the giant component free of forbidden sets.
Large: G has up to 100K vertices and is nearly disconnected into clusters whose sizes follow a power law. Each cluster has a few high-degree vertices and many low-degree vertices. There are thousands of special nodes T ⊂ V. Solve Multiway Cut for T, meaning that every element of T must lie in a different component.
Mega: Like the large case, but with up to 100M vertices. |T| is in the hundreds.
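Checking a candidate cut is the inner loop of any heuristic at the Large and Mega scales. A minimal sketch, assuming edges are `(u, v)` tuples and that `deleted` uses the same orientation, verifies the Multiway Cut condition with a union-find (disjoint-set) structure, which runs in near-linear time in the number of edges. The helper `is_multiway_cut` is hypothetical and is not part of the authors' code.

```python
def is_multiway_cut(vertices, edges, deleted, terminals):
    """True if, after deleting the edges in `deleted`, every terminal lies in its own component."""
    parent = {v: v for v in vertices}

    def find(v):
        # iterative find with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        if (u, v) not in deleted:           # union the endpoints of every kept edge
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
    roots = [find(t) for t in terminals]
    return len(set(roots)) == len(roots)    # distinct roots means distinct components
```

The same union-find pass also yields the component sizes, which is one way to inspect the power-law cluster structure described for the Large case.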
Exact Algorithms for Moderate Case
Adam Logan (Tutte Institute) has an exact algorithm for graphs with ∼100 vertices, ∼50 cycles, and ∼10 forbidden sets. It makes multiple calls to an exact ILP solver in Python.
1. Associate a 0-1 variable with each edge of the graph, indicating whether the edge is included.
2. If the included edges give some component that contains a forbidden set F, find a set of edges EF that forms an approximate Steiner tree for the vertices in F.
3. Add the lazy constraint "do not include all edges in EF" for the next iteration, and repeat.
Logan reports that this terminates in about a minute.
Mark Velednitsky (U. C. Berkeley) reports³ an efficient branch-and-bound algorithm for Multiway Cut.
³ Personal communication.
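The iteration described above can be sketched as follows. This is not Logan's code: the exact ILP solver is replaced by a hypothetical brute-force `solve_master` over deletion sets (usable only for a handful of edges), and the approximate Steiner tree is taken, as an assumption, to be the union of breadth-first-search paths from one vertex of F to the others inside its component. Each forbidden set is assumed to have at least two vertices. Every cut added is violated by the current solution, so no cut repeats and the loop terminates.

```python
from collections import deque
from itertools import combinations

def components(vertices, edges):
    """Label each vertex with a representative of its component (BFS)."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    comp = {}
    for s in vertices:
        if s in comp:
            continue
        comp[s] = s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in comp:
                    comp[w] = s
                    queue.append(w)
    return comp

def steiner_edges(kept, forb):
    """Approximate Steiner tree: union of BFS-tree paths from one vertex of forb."""
    adj = {}
    for e in kept:
        u, v = e
        adj.setdefault(u, []).append((v, e))
        adj.setdefault(v, []).append((u, e))
    root = next(iter(forb))
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w, e in adj.get(u, []):
            if w not in parent:
                parent[w] = (u, e)
                queue.append(w)
    tree = set()
    for f in forb:                       # walk each forbidden vertex back to the root
        while parent[f] is not None:
            f, e = parent[f]
            tree.add(e)
    return frozenset(tree)

def solve_master(edges, weight, cuts):
    """Stand-in for an exact ILP solver: cheapest deletion set meeting every lazy cut."""
    best = None
    for r in range(len(edges) + 1):
        for deleted in combinations(edges, r):
            d = set(deleted)
            # lazy constraint: not every edge of a cut may be kept
            if all(cut & d for cut in cuts):
                cost = sum(weight[e] for e in d)
                if best is None or cost < best[0]:
                    best = (cost, d)
    return best

def lazy_fusion(vertices, weight, forbidden):
    """Re-solve, find forbidden sets inside one component, add their Steiner cuts, repeat."""
    edges = list(weight)
    cuts = []
    while True:
        cost, deleted = solve_master(edges, weight, cuts)
        kept = [e for e in edges if e not in deleted]
        comp = components(vertices, kept)
        violated = [F for F in forbidden if len({comp[v] for v in F}) == 1]
        if not violated:
            return cost, deleted
        cuts.extend(steiner_edges(kept, F) for F in violated)
```

In practice the master problem would go to a real ILP solver with the cuts added as constraints; the brute force only keeps the sketch self-contained.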
LP Relaxations for Large Case
The authors are developing an LP relaxation of the integer LP for optimum d-coloring of the forbidden hypergraph described above. As in the classical LP relaxation of Multiway Cut, the vertices of the graph will be embedded in the d-dimensional simplex (points of R^d with nonnegative coordinates summing to 1), then partitioned into d clusters, with the goal that no forbidden set falls entirely within one cluster. A constant-factor approximation seems unlikely, because of the negative result for Multicut; the approximation ratio may depend on the logarithm of the number of forbidden sets.
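For the partitioning step, the classical Multiway Cut relaxation is usually rounded with the Călinescu–Karloff–Rabani threshold scheme: draw one random threshold θ and a random order of the colors, let each color in turn claim the unassigned vertices whose coordinate for that color is at least θ, and give the remaining vertices the last color. The sketch below applies that scheme to fractional colorings x[v] in the simplex. Using CKR rounding here is an assumption for illustration, not the authors' method; it does not enforce the forbidden-set constraint, so the sketch simply reports which forbidden sets end up monochromatic. The name `ckr_round` is hypothetical.

```python
import random

def ckr_round(x, d, forbidden, rng=None):
    """Round simplex points x[v] (length-d tuples summing to 1) to colors 0..d-1.

    Returns the coloring and the list of forbidden sets left monochromatic.
    """
    rng = rng or random.Random()
    theta = 1.0 - rng.random()          # threshold, uniform on (0, 1]
    order = list(range(d))
    rng.shuffle(order)                  # random order in which colors claim vertices
    color = {}
    for alpha in order[:-1]:
        for v, xv in x.items():
            if v not in color and xv[alpha] >= theta:
                color[v] = alpha
    for v in x:                         # unclaimed vertices take the last color
        color.setdefault(v, order[-1])
    monochromatic = [F for F in forbidden if len({color[v] for v in F}) == 1]
    return color, monochromatic
```

Vertices already sitting at a corner of the simplex keep that corner's color, while fractional vertices are split at random; a full algorithm would still need a repair step for any monochromatic forbidden sets returned.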
Greedy Algorithms for Mega Case?
1. The optimum hypergraph coloring formulation leads to a natural greedy algorithm, which starts with one color and adds an extra color whenever needed to complete the coloring of the vertices in some forbidden set. An uncolored vertex v receives color α if the edges joining v to its neighbors of color α have high total weight.
2. The hypergraph matching formulation leads to another natural greedy algorithm: start with all singletons in the matching, and allow sets in the matching to merge greedily when their union is still an independent set.
More ideas are welcome.
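The matching-style greedy algorithm can be sketched in a Kruskal-like way. The slides do not fix a merge order, so this sketch assumes edges are processed from heaviest to lightest, and two clusters merge only if their union contains no forbidden set (that is, the union is independent). Edges left between different clusters are the deleted edges. The name `greedy_merge` is hypothetical; edges are assumed to be `(u, v)` tuples and forbidden sets Python sets.

```python
def greedy_merge(vertices, weight, forbidden):
    """Merge clusters along heaviest edges first, refusing merges that swallow a forbidden set.

    weight maps (u, v) edge tuples to nonnegative weights; forbidden is a list of sets.
    Returns (total weight of deleted edges, set of deleted edges).
    """
    cluster = {v: {v} for v in vertices}     # vertex -> shared set object of its cluster
    for (u, v), _w in sorted(weight.items(), key=lambda item: -item[1]):
        cu, cv = cluster[u], cluster[v]
        if cu is cv:
            continue                         # already in the same cluster
        merged = cu | cv
        if any(F <= merged for F in forbidden):
            continue                         # union would not be independent
        if len(cu) < len(cv):
            cu, cv = cv, cu                  # absorb the smaller cluster into the larger
        cu |= cv                             # in-place union keeps the shared object
        for x in cv:
            cluster[x] = cu
    deleted = {e for e in weight if cluster[e[0]] is not cluster[e[1]]}
    return sum(weight[e] for e in deleted), deleted
```

Absorbing the smaller cluster keeps relabeling cheap, but the independence test scans every forbidden set on each merge; at the Mega scale one would index forbidden sets by vertex instead.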
More details
Preprint: R. W. R. Darling, David G. Harris, Dev R. Phulara, John A. Proos, "The Combinatorial Data Fusion Problem I", in preparation.
Coding:
1. Experimental coding at scale based on the JGraphT Java library.
2. Moderate case in Python with pulp or cvxopt.
Collaboration: Interested in hearing suggestions of open-source data for algorithm testing.
Questions
QUESTIONS?
Darling, Harris, Phulara, & Proos
CANADAM 2017