Combinatorial Data Fusion Examples and Formalism Complexity of Combinatorial Data Fusion Practical Data Fusion Algorithms

The Combinatorial Data Fusion Problem
GRAPH CUT PROBLEMS FOR BIG DATA

R. W. R. Darling¹, David G. Harris², Dev R. Phulara¹, John A. Proos³

¹ National Security Agency, USA
² University of Maryland, Department of Computer Science
³ Tutte Institute for Mathematics & Computing, Canada

June 12, 2017

Darling, Harris, Phulara, & Proos

CANADAM 2017


Outline

1. Combinatorial Data Fusion Examples and Formalism
2. Complexity of Combinatorial Data Fusion
3. Practical Data Fusion Algorithms


How Data Fusion Problems Arise

A connected edge-weighted graph (V, E0, w) arises from one data source. A second data source introduces an independence system S on V, characterized by its circuits F, called forbidden sets. Task: find E1 ⊂ E0 such that no component of (V, E1) contains any forbidden set and the total weight of the deleted edges E0 \ E1 is minimum.


Connected Edge-weighted Graph G := (V, E0) with w : E0 → (0, ∞)


Forbidden Sets of Vertices

Forbidden: F1 := {v1, v2, v3} and F2 := {v6, v9}


Forbidden Sets in Detail

Forbidden sets F: a collection of subsets of the graph's vertex set V. Call (V, F) the forbidden hypergraph.
Rule 1: F contains no singletons and no edges of the graph.
Rule 2: No F ∈ F may be a proper subset of another F′ ∈ F.
The vertex set here was V := {v1, v2, ..., v11}. The forbidden sets here were F1 := {v1, v2, v3} and F2 := {v6, v9}.
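Rules 1 and 2 can be checked mechanically. The Python sketch below is an illustration under stated assumptions, not the authors' code: `check_forbidden_hypergraph` is a hypothetical helper, vertices are plain string labels, and because the figure's edges are not recoverable from the text, the edge list in the example is made up.

```python
from itertools import combinations


def check_forbidden_hypergraph(vertices, edges, forbidden):
    """Return a list of violations of Rules 1 and 2 (empty if none).

    vertices:  iterable of vertex labels
    edges:     iterable of (u, v) pairs, the edges of the graph
    forbidden: iterable of vertex sets, the family F
    """
    V = set(vertices)
    E = {frozenset(e) for e in edges}
    F = [frozenset(f) for f in forbidden]
    problems = []
    for f in F:
        if not f <= V:
            problems.append(f"{sorted(f)} is not a subset of V")
        # Rule 1: no singletons and no edges of the graph.
        if len(f) == 1:
            problems.append(f"{sorted(f)} is a singleton")
        if f in E:
            problems.append(f"{sorted(f)} is an edge of the graph")
    # Rule 2: no forbidden set is a proper subset of another.
    for f, g in combinations(F, 2):
        if f < g or g < f:
            problems.append(f"{sorted(f)} and {sorted(g)} are nested")
    return problems


# Vertex set and forbidden sets from the slide; the edges are hypothetical.
V = [f"v{i}" for i in range(1, 12)]
edges = [("v1", "v2"), ("v2", "v3"), ("v6", "v7"), ("v7", "v9")]
F = [{"v1", "v2", "v3"}, {"v6", "v9"}]
```

With these inputs the slide's family passes both rules, while a family such as [{v1, v2}, {v1, v2, v3}] is rejected twice: {v1, v2} is a (hypothetical) graph edge, and it is nested inside {v1, v2, v3}.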


What Role Do Forbidden Sets Play?

Suppose the weighted graph is G := (V, E0) with w : E0 → (0, ∞). Given an edge subset E1 ⊂ E0, the subgraph (V, E1) partitions V into components V1, V2, ..., Vd. The edge subset E1 is feasible if no component Vj contains any forbidden set. Seek a feasible edge subset of maximum total weight; equivalently, minimize the weight of the deleted edges E0 \ E1.
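To make the feasibility rule concrete, here is a minimal Python sketch, assuming edges are given as (u, v, weight) triples. It computes components with union-find, tests each forbidden set against each component, and finds an optimum by trying every edge subset, which is only practical for toy graphs. The function names and the four-vertex path example are hypothetical.

```python
from itertools import combinations


def components(vertices, edges):
    """Connected components of (V, edges), computed with union-find."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    groups = {}
    for v in vertices:
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())


def is_feasible(vertices, kept_edges, forbidden):
    """E1 is feasible if no component of (V, E1) contains a forbidden set."""
    comps = components(vertices, kept_edges)
    return not any(set(f) <= c for c in comps for f in forbidden)


def best_feasible_subset(vertices, weighted_edges, forbidden):
    """Exhaustive search for a maximum-weight feasible E1 (tiny inputs only)."""
    best, best_w = None, float("-inf")
    for r in range(len(weighted_edges) + 1):
        for subset in combinations(weighted_edges, r):
            w = sum(wt for _, _, wt in subset)
            kept = [(u, v) for u, v, _ in subset]
            if w > best_w and is_feasible(vertices, kept, forbidden):
                best, best_w = subset, w
    return best, best_w


# Hypothetical example: path a-b-c-d with forbidden set {a, c}.
V = ["a", "b", "c", "d"]
E0 = [("a", "b", 3.0), ("b", "c", 1.0), ("c", "d", 2.0)]
E1, weight = best_feasible_subset(V, E0, [{"a", "c"}])
```

On this example the optimum keeps {a, b} and {c, d} (total weight 5) and deletes the weight-1 edge {b, c}, the cheapest way to put a and c in different components.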


(V, E1): No Component Contains a Forbidden Set

Dashed edges are removed. Solid edges remain.


Semi-supervised Learning as Data Fusion Problem

Client-Server Graph:
VERTICES: Some vertices may be clients, some may be servers, and others may be labels.
EDGES: Some edges are transactions between clients and servers; others tag a server with a label.

Multiway Cut:
FORBIDDEN: All pairs of distinct labels.
OUTCOME: Each component has exactly one label vertex.


Multiway Cut Visualization

Left nodes: clients. Right nodes: servers. Color: server tag. Edges: transactions.


Last Picture Should Look Like . . .

Cut by color.


Conversion to Combinatorial Data Fusion

Extra nodes: colors. Each color tag becomes an edge from the server to a new color node. Forbidden sets: pairs of color nodes.
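A minimal sketch of this conversion in Python, assuming each server carries at most one color tag and that every tag edge gets the same weight `tag_weight` (the slides do not specify tag weights; a large value discourages cutting a server away from its color). The function name and the sample transactions are hypothetical.

```python
from itertools import combinations


def multiway_cut_instance(transactions, server_color, tag_weight):
    """Convert a tagged client-server graph into a CDFP instance.

    transactions: list of (client, server, weight) edges
    server_color: dict mapping each tagged server to its color
    tag_weight:   weight given to every server-color edge (an assumption;
                  the slides do not say how tag edges are weighted)
    Returns (vertices, weighted_edges, forbidden).
    """
    edges = list(transactions)
    color_nodes = set()
    for server, color in server_color.items():
        node = ("color", color)  # one new vertex per color
        color_nodes.add(node)
        edges.append((server, node, tag_weight))
    vertices = {u for u, _, _ in edges} | {v for _, v, _ in edges}
    # Forbidden sets: every pair of distinct color nodes.
    forbidden = [set(pair) for pair in combinations(sorted(color_nodes), 2)]
    return vertices, edges, forbidden


transactions = [("c1", "s1", 2.0), ("c1", "s2", 1.0), ("c2", "s2", 3.0)]
V, E, F = multiway_cut_instance(transactions, {"s1": "red", "s2": "blue"}, 10.0)
```

Color nodes are never adjacent to each other, so the resulting forbidden pairs respect Rule 1.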


Name vs. Metadata Type of Data Fusion Problem

Entity Similarity Graph:
VERTICES: Vertices are named entities. Vertex v is tagged with a metadata vector x_v and a name ℓ(v).
EDGES: Edge weights w({v, v′}) reflect name-based similarity between ℓ(v) and ℓ(v′).

Metadata Compatibility:
FORBIDDEN: We say F ⊂ V is forbidden if the concatenated vector $\bigoplus_{v \in F} x_v$ fails to satisfy some monotone system of linear inequalities.
OUTCOME: Each component consists of similarly named entities with compatible metadata.


Other Ways Forbidden Sets Arise

BALANCED CUTS: Assign price p(v) to vertex v. Edge weight w({u, v}) measures similarity between u and v. Call F ⊂ V forbidden if $\sum_{v \in F} p(v) > P_1$ for a given price limit $P_1$. Group vertices into similar clusters with a price bound.

LOGIC PROBLEMS: Logical variables x1, ..., xn. Boolean functions f1, ..., fm. One vertex for each possible value of an xi or fj. The set {x1 = 0, x4 = 0, x9 = 1, f1 = 0} is forbidden if f1 = 1 when (x1, x4, x9) = (0, 0, 1).
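Both constructions can be sketched in a few lines of Python. The helpers below are hypothetical, and the balanced-cut shortcut assumes nonnegative prices: under that assumption a component contains a forbidden set exactly when its own total price exceeds P1, so a solver never has to enumerate the forbidden sets. For logic problems, vertices are modeled as (name, value) pairs, and the function f1 in the example is an assumed OR of its inputs.

```python
from itertools import product


def exceeds_price_limit(component, price, limit):
    """Balanced cuts: does this component contain a forbidden set?

    Assuming nonnegative prices, a component contains some F with
    sum(p(v) for v in F) > limit exactly when its own total price exceeds
    limit, so the forbidden sets never have to be listed explicitly.
    """
    return sum(price[v] for v in component) > limit


def logic_forbidden_sets(fname, f, args):
    """Logic problems: forbidden sets contributed by one Boolean function.

    Vertices are (name, value) pairs such as ("x1", 0) or ("f1", 1).
    fname: name of the function, e.g. "f1"
    f:     Python callable taking len(args) bits and returning 0 or 1
    args:  names of the variables f depends on, e.g. ("x1", "x4", "x9")
    Each input assignment together with the wrong output value is forbidden.
    """
    forbidden = []
    for bits in product((0, 1), repeat=len(args)):
        wrong = 1 - f(*bits)
        forbidden.append(set(zip(args, bits)) | {(fname, wrong)})
    return forbidden


price = {"a": 2, "b": 3, "c": 4}
# Hypothetical f1 = x1 OR x4 OR x9, so f1 = 1 at (0, 0, 1).
fs = logic_forbidden_sets("f1", lambda a, b, c: a | b | c, ("x1", "x4", "x9"))
```

The set {x1 = 0, x4 = 0, x9 = 1, f1 = 0} from the slide appears in `fs` as {("x1", 0), ("x4", 0), ("x9", 1), ("f1", 0)}.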


EQUIVALENT: Coloring the Forbidden Hypergraph

Given (V, E0, w) and forbidden hypergraph (V, F):

1. Let d ≥ 2. A hypergraph coloring of (V, F) is a mapping χ : V → {1, 2, ..., d} such that no hyperedge in F is monochromatic.
2. Find a d and a hypergraph coloring χ for which the total weight
$$\sum_{\{u,v\} \in E_0,\; \chi(u) \neq \chi(v)} w(\{u, v\})$$
of edges with differently colored endpoints is minimum.
3. Equivalent to CDFP. Colors ↔ components of (V, E1).
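The equivalence can be checked on a toy instance with a brute-force Python sketch (exponential, for illustration only; the function names and the example graph are hypothetical). It scores a coloring by the weight of bichromatic edges, rejects colorings that leave a forbidden set monochromatic, and recovers E1 as the edges whose endpoints share a color.

```python
from itertools import product


def coloring_cost(weighted_edges, chi, forbidden):
    """Weight of edges whose endpoints get different colors under chi,
    or None if chi leaves some forbidden set monochromatic."""
    for f in forbidden:
        if len({chi[v] for v in f}) == 1:
            return None  # not a hypergraph coloring
    return sum(w for u, v, w in weighted_edges if chi[u] != chi[v])


def best_coloring(vertices, weighted_edges, forbidden):
    """Exhaustive search over colorings with d = |V| colors (tiny inputs).

    d = |V| suffices because (V, E1) never has more than |V| components.
    """
    vs = list(vertices)
    best_cost, best_chi = float("inf"), None
    for colors in product(range(len(vs)), repeat=len(vs)):
        chi = dict(zip(vs, colors))
        cost = coloring_cost(weighted_edges, chi, forbidden)
        if cost is not None and cost < best_cost:
            best_cost, best_chi = cost, chi
    return best_cost, best_chi


V = ["a", "b", "c", "d"]
E0 = [("a", "b", 3.0), ("b", "c", 1.0), ("c", "d", 2.0)]
cost, chi = best_coloring(V, E0, [{"a", "c"}])
# E1 = edges whose endpoints share a color.
E1 = [(u, v) for u, v, _ in E0 if chi[u] == chi[v]]
```

The minimum bichromatic weight is 1, which equals the total weight 6 minus the best kept weight 5, as the equivalence predicts.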


EQUIVALENT: Optimum Hypergraph Matching

Given (V, E0, w) and an independence system S on V whose circuits are the forbidden sets:

1. A hypergraph matching M of S is a collection of pairwise disjoint hyperedges in S.
2. Define the weight w(U) of U ∈ S as the total weight of the subgraph of G induced by U.
3. Seek a perfect matching¹ M which maximizes the sum of the weights of the matching hyperedges: $w(M) := \sum_{U \in M} w(U)$.
4. Equivalent to CDFP. Hyperedges in an optimum matching ↔ components of (V, E1).

¹ Since {v} ∈ S for all v ∈ V by assumption, and all singletons have weight 0, any matching can be extended to a perfect matching of equal weight.
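Why this matches CDFP: when M is a perfect matching, every edge of E0 either lies inside one hyperedge of M or joins two different hyperedges, so w(M) equals w(E0) minus the weight of the crossing edges. Maximizing w(M) is therefore the same as minimizing the weight of E0 \ E1. A small Python sketch of that identity follows; the helper names and the example are hypothetical.

```python
def induced_weight(U, weighted_edges):
    """w(U): total weight of the edges of G with both endpoints in U."""
    return sum(w for u, v, w in weighted_edges if u in U and v in U)


def matching_weight(parts, weighted_edges):
    """w(M) for a collection M of pairwise disjoint vertex sets."""
    return sum(induced_weight(U, weighted_edges) for U in parts)


def crossing_weight(parts, weighted_edges):
    """Weight of edges whose endpoints lie in different parts."""
    where = {v: i for i, U in enumerate(parts) for v in U}
    return sum(w for u, v, w in weighted_edges if where[u] != where[v])


E0 = [("a", "b", 3.0), ("b", "c", 1.0), ("c", "d", 2.0)]
M = [{"a", "b"}, {"c", "d"}]  # a perfect matching of S when {a, c} is forbidden
```

Here w(M) = 5 and the crossing weight is 1, summing to the total edge weight 6.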


How Difficult Is Combinatorial Data Fusion?

Linear-time, polynomial-time, and NP-hard subcases exist. The authors do not know how to triage CDFP instances among these categories of difficulty. NAE-SAT and Maximum Weight Stable Set (for vertex-weighted graphs) can be converted to CDFP. The most-studied subproblems are Multicut and Multiway Cut (next slide). An Integer Linear Program formulation is given below.


Complexity When All Forbidden Sets Have Size 2

Multiway Cut: Fix S ⊂ V. Forbidden sets are all pairs {s, s′} with s, s′ ∈ S and s ≠ s′. NP-hard if |S| > 2. A randomized 3/2-approximation exists.

Multicut: Forbidden sets $\{s_i, t_i\}$, $1 \le i \le k$. NP-hard if k ≥ 3 (the case k = 2 is solvable in polynomial time). A 4 log(k + 1)-approximation exists. If the Unique Games Conjecture holds, no constant-factor approximation exists unless P = NP.

REFERENCES
1. David P. Williamson & David B. Shmoys, The Design of Approximation Algorithms, Cambridge Univ. Press, 2011.
2. Vijay V. Vazirani, Approximation Algorithms, Springer, Berlin, 2003.


Combinatorial Data Fusion on Trees

Suppose T = (V, E, w) is a weighted tree.

Multiway Cut: becomes a matroid problem². A linear-time algorithm is already known.

Arbitrary Forbidden Sets: F := {F1, F2, ..., Fb}. Say E := {e1, e2, ..., em}. Let $a_{i,j} = 1$ if removing edge $e_j$ disconnects $F_i$ in T, and $a_{i,j} = 0$ otherwise. Because paths in a tree are unique, $F_i$ stays inside one component after deleting an edge set D exactly when no edge of D disconnects $F_i$, so feasibility is a covering condition.

1. Set Cover: integer linear program
$$\min \sum_{j=1}^{m} w(e_j)\, x_j \quad \text{subject to} \quad \sum_{j=1}^{m} a_{i,j}\, x_j \ge 1,\ \ i = 1, 2, \ldots, b; \qquad x_j \in \{0, 1\}.$$
2. Optimum is: delete $e_j$ if $x_j = 1$.

² Proved in our 2017 paper.
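A brute-force Python sketch of the tree case, assuming the tree is given as a list of (u, v) edges. It computes a_{i,j} by deleting e_j and checking whether F_i has vertices on both sides, then solves the set-cover program by trying every edge subset. Exhaustive search is a stand-in for a real ILP solver and is only practical for tiny trees; all names and the example are hypothetical.

```python
from itertools import combinations


def side_of_cut(tree_edges, cut_edge):
    """Vertices reachable from cut_edge[0] after deleting cut_edge from the tree."""
    adj = {}
    for u, v in tree_edges:
        if (u, v) == cut_edge:
            continue
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    start = cut_edge[0]
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj.get(x, []):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def incidence_matrix(tree_edges, forbidden):
    """a[i][j] = 1 if removing edge e_j disconnects F_i in the tree."""
    sides = [side_of_cut(tree_edges, e) for e in tree_edges]
    return [[1 if (set(F) & S) and (set(F) - S) else 0 for S in sides]
            for F in forbidden]


def min_weight_cover(weights, a):
    """Brute-force the set-cover ILP: min sum w_j x_j s.t. sum_j a_ij x_j >= 1."""
    m = len(weights)
    best_cost, best_J = float("inf"), None
    for r in range(m + 1):
        for J in combinations(range(m), r):
            if all(any(row[j] for j in J) for row in a):
                cost = sum(weights[j] for j in J)
                if cost < best_cost:
                    best_cost, best_J = cost, set(J)
    return best_cost, best_J


tree = [("a", "b"), ("b", "c"), ("c", "d")]
a = incidence_matrix(tree, [{"a", "c"}, {"b", "d"}])
cost, deleted = min_weight_cover([3.0, 1.0, 2.0], a)
```

In the example, deleting the single edge {b, c} (index 1) separates both forbidden pairs at cost 1.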


Data Fusion on Trees in General
Suppose $G = (V, E_0, w)$ is a tree and $F \subset V$ is a forbidden set. Seek the set $E_F \subset E_0$ of edges $e$ such that removing $e$ alone disconnects $F$.

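Algorithm for Finding Edges Which Disconnect F
Join an auxiliary vertex $v_0$ to each $v \in F$. The desired edges are exactly the edges of $E_0$ that lie in the 2-core of this augmented graph: repeatedly pruning vertices of degree at most 1 removes every branch of the tree containing no vertex of $F$, leaving $v_0$ and the smallest subtree spanning $F$, whose edges are precisely those that separate $F$ (for $|F| \ge 2$).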
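
A minimal Python sketch of the peeling computation, assuming the tree is given as a list of vertex pairs and $F$ as a set of vertices; `disconnecting_edges` is a hypothetical name. It attaches a fresh sentinel vertex standing for $v_0$ to every vertex of $F$, repeatedly deletes vertices of degree at most 1, and returns the tree edges whose endpoints both survive.

```python
from collections import defaultdict

def disconnecting_edges(tree_edges, F):
    """Return the tree edges whose removal disconnects F, via the 2-core of T plus v0.

    `tree_edges` is a list of (u, v) pairs forming a tree; F is a set of its vertices.
    """
    v0 = object()  # auxiliary vertex, guaranteed distinct from every tree vertex
    adj = defaultdict(set)
    for u, v in tree_edges:
        adj[u].add(v)
        adj[v].add(u)
    for v in F:
        adj[v0].add(v)
        adj[v].add(v0)
    # Peel: repeatedly delete vertices of degree <= 1; the survivors form the 2-core.
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    removed = set()
    queue = [v for v, d in deg.items() if d <= 1]
    while queue:
        u = queue.pop()
        if u in removed:
            continue
        removed.add(u)
        for w in adj[u]:
            if w not in removed:
                deg[w] -= 1
                if deg[w] <= 1:
                    queue.append(w)
    # In a tree, the edges between surviving vertices are exactly the spanning subtree of F.
    return [(u, v) for u, v in tree_edges if u not in removed and v not in removed]
```

Each vertex is removed at most once and each edge is inspected a constant number of times, so the peeling runs in time linear in the size of the tree.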

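Combinatorial Data Fusion as Integer LP
Seek an optimum d-coloring χ of the forbidden hypergraph $(V, \mathcal{F})$: no forbidden set is monochromatic, and the total weight of edges whose endpoints receive different colors is as small as possible.

Notation:
1. $x_i^\alpha \in \{0, 1\}$ for $1 \le i \le n$ and $1 \le \alpha \le d$. Here $x_i^\alpha = 1$ means that vertex $v_i$ is assigned color $\chi(v_i) = \alpha$.
2. Edges $E := \{e_j\}$, weights $\{w_j\}$, and $y_j \in \{0, 1\}$ for $1 \le j \le m$. Here $y_j = 1$ means that edge $e_j$ is deleted.

Objective: minimize $\sum_{j=1}^{m} w_j y_j$.

Constraints:
1. Every vertex has exactly one color: $\sum_{\alpha=1}^{d} x_i^\alpha = 1$ for $1 \le i \le n$.
2. Delete an edge if its endpoints have different colors: for $e_j = \{v_r, v_s\}$, $y_j \ge x_r^\alpha - x_s^\alpha$ and $y_j \ge x_s^\alpha - x_r^\alpha$ for $1 \le \alpha \le d$.
3. A forbidden set cannot be monochromatic: $\sum_{i : v_i \in F} x_i^\alpha \le |F| - 1$ for $1 \le \alpha \le d$ and all $F \in \mathcal{F}$.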
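
For very small instances the integer program can be checked by brute force. The Python sketch below is illustrative only: exhaustive enumeration stands in for an ILP solver, and `optimum_coloring` is a hypothetical name. It ranges over all $d^n$ color assignments, rejects those leaving a forbidden set monochromatic (constraint 3), and, assuming nonnegative weights, charges $w_j$ exactly when the endpoints of $e_j$ differ, i.e. it sets each $y_j$ to the smallest value constraint 2 allows.

```python
from itertools import product

def optimum_coloring(n, edges, weights, forbidden, d):
    """Exhaustively minimise sum_j w_j y_j over d-colorings with no monochromatic forbidden set.

    Vertices are 0..n-1; edges[j] = (r, s); y_j = 1 exactly when chi(v_r) != chi(v_s).
    Returns (cost, coloring) or None if no feasible coloring exists.
    """
    best = None
    for chi in product(range(d), repeat=n):
        # Constraint 3: every forbidden set must use at least two colors.
        if any(len({chi[i] for i in F}) == 1 for F in forbidden):
            continue
        # Constraint 2 at its smallest feasible y_j: pay for an edge only when its endpoints differ.
        cost = sum(w for (r, s), w in zip(edges, weights) if chi[r] != chi[s])
        if best is None or cost < best[0]:
            best = (cost, chi)
    return best
```

On a 4-cycle with edge weights 1, 5, 1, 5 and the forbidden set formed by two opposite vertices, the optimum cuts the two weight-1 edges, one on each side of the cycle, for a cost of 2.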

Three Scales of Problem Needing Algorithms
Moderate: G may have fewer than 100 vertices and fewer than 20 cycles, and its treewidth may be low. Disconnecting a few vertices from the rest of the graph may leave the giant component free of forbidden sets.
Large: G has up to 100K vertices, nearly disconnected into clusters whose sizes follow a power law. Each cluster has a few high-degree vertices and many low-degree vertices. There are thousands of special nodes T ⊂ V. Solve Multiway Cut for T, meaning that every element of T must end up in a different component.
Mega: like the large case, but with up to 100M vertices; |T| is in the hundreds.

Exact Algorithms for Moderate Case
Adam Logan (Tutte Institute) has an exact algorithm for graphs with ∼100 vertices, ∼50 cycles, and ∼10 forbidden sets. It makes multiple calls to an exact ILP solver in Python. Associate a 0-1 variable with each edge of the graph, indicating whether the edge is included. If the included edges give a component which contains a forbidden set F, find a set of edges $E_F$ in that component forming an approximate Steiner tree for the vertices in F. Add the lazy constraint "do not include all edges in $E_F$" for the next iteration, and repeat. Logan reports that this terminates in about a minute. Mark Velednitsky (U. C. Berkeley) reports³ an efficient branch-and-bound algorithm for Multiway Cut.

³ Personal communication.
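
The constraint-generation loop described above can be sketched in Python as follows. This is not Logan's code: to stay self-contained, an exhaustive search over edge subsets stands in for the exact ILP solver, and the approximate Steiner tree $E_F$ is taken to be the union of breadth-first-search paths (within the kept edges) from one vertex of F to the others, which is one simple choice among many. Each generated constraint says "delete at least one edge of $E_F$"; it is valid because keeping all of $E_F$ keeps F connected. All helper names are hypothetical, and forbidden sets are assumed to have at least two vertices.

```python
from collections import deque
from itertools import combinations

def bfs_parents(adj, root):
    """Breadth-first search tree from root: maps each reached vertex to (parent, edge index)."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w, j in adj[u]:
            if w not in parent:
                parent[w] = (u, j)
                queue.append(w)
    return parent

def violated_steiner_set(vertices, edges, deleted, forbidden):
    """Edge indices of a tree joining some forbidden set that lies in one component, else None."""
    adj = {v: [] for v in vertices}
    for j, (u, v) in enumerate(edges):
        if j not in deleted:
            adj[u].append((v, j))
            adj[v].append((u, j))
    for F in forbidden:
        root, *rest = list(F)
        parent = bfs_parents(adj, root)
        if all(v in parent for v in rest):
            steiner = set()
            for v in rest:  # union of BFS-tree paths from each v back to root
                while parent[v] is not None:
                    v, j = parent[v]
                    steiner.add(j)
            return steiner
    return None

def lazy_constraint_solve(vertices, edges, weights, forbidden):
    """Minimum-weight edge deletion so no component contains a forbidden set (tiny inputs only)."""
    constraints = []  # each entry: edge indices, at least one of which must be deleted
    m = len(edges)
    while True:
        # Stand-in for the exact ILP solve: cheapest deletion set hitting every constraint so far.
        best = min(
            (set(c) for k in range(m + 1) for c in combinations(range(m), k)
             if all(s & set(c) for s in constraints)),
            key=lambda c: sum(weights[j] for j in c),
        )
        steiner = violated_steiner_set(vertices, edges, best, forbidden)
        if steiner is None:  # feasible for the full problem, hence optimal
            return sum(weights[j] for j in best), best
        constraints.append(steiner)  # violated by `best`, so the loop cannot repeat itself
```

On a 4-cycle 0-1-2-3 with edge weights 1, 5, 1, 5 and forbidden set {0, 2}, the loop adds two constraints, one per side of the cycle, before returning the two weight-1 edges. Since every relaxed solution is a lower bound and the returned one is feasible, the result is exact.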

LP Relaxations for Large Case

The authors are developing an LP relaxation of the integer LP for optimum d-coloring of the forbidden hypergraph described above. As in the classical LP relaxation of Multiway Cut, the vertices of the graph will be embedded in the simplex $\{x \in \mathbb{R}^d : x \ge 0,\ \sum_{\alpha} x^\alpha = 1\}$ and then partitioned into d clusters, with the goal that no forbidden set falls entirely within one cluster. A constant-factor approximation seems unlikely, because of the negative result for Multicut; a factor depending on the logarithm of the number of forbidden sets may intrude.
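
The partitioning step can be illustrated with the threshold rounding of Călinescu, Karloff and Rabani for the classical Multiway Cut relaxation; this is a swapped-in technique, not the authors' (still in development) method. Given a point $x_v$ in the simplex for each vertex, draw one threshold uniformly from [0, 1) and a random order of the d colors; each color in turn claims every still-unassigned vertex with $x_v^\alpha$ at least the threshold, and the last color takes whatever remains. The sketch enforces nothing about forbidden sets, which is exactly where a new relaxation must do more. The name `threshold_round` is hypothetical.

```python
import random

def threshold_round(x, rng=random):
    """Partition vertices into d clusters from simplex coordinates (CKR-style threshold rounding).

    x maps each vertex to a length-d list of nonnegative reals summing to 1.
    Returns a dict vertex -> cluster index in range(d).
    """
    d = len(next(iter(x.values())))
    theta = rng.random()          # one shared threshold in [0, 1)
    order = list(range(d))
    rng.shuffle(order)            # random order in which the colors claim vertices
    label = {}
    for alpha in order[:-1]:
        for v, coords in x.items():
            if v not in label and coords[alpha] >= theta:
                label[v] = alpha
    for v in x:                   # the last color in the order takes every remaining vertex
        label.setdefault(v, order[-1])
    return label
```

If the LP solution is already integral (each $x_v$ a corner of the simplex), the rounding returns that same coloring for any positive threshold, so all of the randomness is spent on fractional vertices.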

Greedy Algorithms for Mega Case?
The optimum hypergraph coloring formulation leads to a natural greedy algorithm which starts with one color and adds an extra color whenever one is needed to finish coloring the vertices of some forbidden set without making it monochromatic. An uncolored vertex v receives color α if the edges joining v to vertices of color α have high total weight.
The hypergraph matching formulation leads to another natural greedy algorithm: start with all singletons in the matching, and allow sets in the matching to merge greedily whenever their union is still an independent set.
More ideas are welcome.
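
Both greedy ideas can be sketched in a few lines of Python. The following choices are assumptions of this sketch, not details from the talk: the coloring heuristic processes vertices in a caller-supplied order, the merging heuristic scans edges from heaviest to lightest (in the manner of Kruskal's algorithm), and a set counts as independent exactly when it contains no forbidden set. In both cases the deleted edges are those joining different colors or clusters. All names are hypothetical.

```python
from collections import defaultdict

def _completes_forbidden(v, a, color, forbidden):
    """True if giving v color a would make some forbidden set containing v monochromatic."""
    return any(v in F and all(u == v or color.get(u) == a for u in F) for F in forbidden)

def greedy_coloring(order, edges, forbidden):
    """Coloring heuristic over vertices in `order` (forbidden sets assumed to have >= 2 vertices).

    `edges` maps frozenset({u, v}) -> weight. Each vertex takes the allowed color with the
    largest total weight of edges to already-colored neighbours of that color; a new color
    is opened only when every existing color would complete a monochromatic forbidden set.
    """
    nbr = defaultdict(dict)
    for e, w in edges.items():
        u, v = tuple(e)
        nbr[u][v] = w
        nbr[v][u] = w
    color, n_colors = {}, 1
    for v in order:
        allowed = [a for a in range(n_colors) if not _completes_forbidden(v, a, color, forbidden)]
        if not allowed:
            allowed = [n_colors]  # open an extra color
            n_colors += 1
        color[v] = max(allowed, key=lambda a: sum(w for u, w in nbr[v].items() if color.get(u) == a))
    return color

def greedy_merge(vertices, edges, forbidden):
    """Merging heuristic: start from singletons and scan edges heaviest first, merging the
    two clusters an edge joins whenever their union contains no forbidden set."""
    cluster = {v: frozenset([v]) for v in vertices}
    for e, _w in sorted(edges.items(), key=lambda item: -item[1]):
        u, v = tuple(e)
        if cluster[u] == cluster[v]:
            continue
        union = cluster[u] | cluster[v]
        if any(F <= union for F in forbidden):
            continue  # the merged set would not be independent
        for x in union:
            cluster[x] = union
    return set(cluster.values())
```

On the path a-b-c plus the edge c-d, with weights 5, 1, 3 and forbidden set {a, c}, both heuristics end up cutting only the light edge b-c.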

More details

Preprint: R. W. R. Darling, David G. Harris, Dev R. Phulara, John A. Proos, The Combinatorial Data Fusion Problem I, in preparation.

Coding:
1. Experimental coding at scale based on the JGraphT Java library.
2. Moderate case in Python with pulp or cvxopt.

Collaboration: We are interested in hearing suggestions of open-source data for algorithm testing.

Questions

QUESTIONS?
