The Trustworthy Pal: Controlling the False Discovery Rate in Boolean Matrix Factorization

Sibylle Hess, Katharina Morik and Nico Piatkowski

Given a Binary Data Matrix...

Depending on the data domain, one could ask for
▶ groups of users who like the same set of movies,
▶ groups of patients having a similar set of gene mutations,
▶ groups of customers buying a similar set of items.

Finding a Factorization

Solve $\min_{X,Y} |D - Y \odot X^\top|$ for binary matrices X and Y.

[Figure: schematic of the approximation D ≈ Y ⊙ X⊤]
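The objective above can be made concrete with a minimal pure-Python sketch. It assumes that D is an m × n list of 0/1 rows, that Y is m × r and X is n × r, and that |·| counts the entries in which D and the Boolean product differ. The function names `boolean_product` and `reconstruction_error` are illustrative and not taken from the slides.

```python
def boolean_product(Y, X):
    """Boolean product of Y (m x r) with the transpose of X (n x r).

    Entry (j, i) is 1 iff at least one factor s has Y[j][s] = X[i][s] = 1.
    """
    return [[int(any(ys and xs for ys, xs in zip(y_row, x_row))) for x_row in X]
            for y_row in Y]


def reconstruction_error(D, Y, X):
    """Number of entries in which D and the Boolean product differ."""
    R = boolean_product(Y, X)
    return sum(d != r for d_row, r_row in zip(D, R) for d, r in zip(d_row, r_row))


# Toy example (made-up data): 3 rows, 4 columns, rank 2.
Y = [[1, 0], [1, 1], [0, 1]]
X = [[1, 0], [1, 0], [0, 1], [0, 1]]
D = [[1, 1, 0, 0],
     [1, 1, 1, 1],
     [0, 0, 1, 0]]
error = reconstruction_error(D, Y, X)
```

In this toy example the product reproduces D except for the last entry of the third row, so the error counts a single mismatch.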

The False Discovery Rate (FDR)

“Boy called wolf once – Type 1 error
Boy called wolf twice – Type 2 error”

The FDR is controlled at level q if
$$\mathbb{E}\left(\frac{v}{r}\right) \le q,$$
where v is the number of false alarms and r is the number of alarms in total.

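False Discoveries and BMF

Given a factorization of rank r, define
$$Z_s = \begin{cases} 1 & \text{if } Y_{\cdot s} X_{\cdot s}^\top \text{ covers (mostly) noise} \\ 0 & \text{if } Y_{\cdot s} X_{\cdot s}^\top \text{ is a part of the model} \end{cases}$$
The FDR is controlled at level q if $P(Z_s = 1) < q$ for every $1 \le s \le r$:
$$\mathbb{E}\left(\frac{v}{r}\right) = \frac{1}{r}\sum_{s=1}^{r} P(Z_s = 1) < q$$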
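As a small numeric illustration of this identity, the sketch below averages per-factor probabilities P(Z_s = 1) to obtain the expected false discovery proportion. The probabilities are made up for the example, and the helper names `expected_fdr` and `controlled_at` are hypothetical.

```python
def expected_fdr(false_probs):
    """Return E(v / r) = (1 / r) * sum_s P(Z_s = 1) for a rank-r factorization."""
    r = len(false_probs)
    if r == 0:
        raise ValueError("the factorization needs at least one factor")
    return sum(false_probs) / r


def controlled_at(false_probs, q):
    """Sufficient condition from the slide: every P(Z_s = 1) is below q."""
    return all(prob < q for prob in false_probs)


# Made-up probabilities for a rank-3 factorization.
probs = [0.01, 0.02, 0.03]
fdr = expected_fdr(probs)
ok = controlled_at(probs, q=0.05)
```

Since the mean of values below q is itself below q, the per-factor condition is enough to control the FDR.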

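Properties of Outer Products

If the binary matrices solve $(X, Y) \in \arg\min |D - Y \odot X^\top|$, then any outer product $A = D \circ Y_{\cdot s} X_{\cdot s}^\top$ has a high density
$$\delta = \frac{Y_{\cdot s}^\top D X_{\cdot s}}{|X_{\cdot s}||Y_{\cdot s}|} \ge \frac{1}{2}$$
and a high coherence
$$\eta = \max_{1 \le i \ne k \le n} \langle A_{\cdot i}, A_{\cdot k} \rangle > \delta |Y_{\cdot s}| \frac{\delta |X_{\cdot s}| - 1}{|X_{\cdot s}| - 1}.$$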
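The two quantities can be computed directly from their definitions. The following is a minimal pure-Python sketch, assuming D is given as a list of 0/1 rows and a tile is described by indicator vectors x (columns) and y (rows); the function names and the toy data are illustrative only.

```python
def tile_density(D, x, y):
    """delta = y^T D x / (|x| |y|): fraction of ones of D inside the tile."""
    ones = sum(D[j][i]
               for j, yj in enumerate(y) if yj
               for i, xi in enumerate(x) if xi)
    return ones / (sum(x) * sum(y))


def tile_coherence(D, x, y):
    """eta = max over column pairs i != k of <A_i, A_k>, with A = D restricted to the tile.

    Columns outside the tile are zero in A, so only tile columns are compared.
    """
    rows = [j for j, yj in enumerate(y) if yj]
    cols = [i for i, xi in enumerate(x) if xi]
    best = 0
    for pos, i in enumerate(cols):
        for k in cols[pos + 1:]:
            best = max(best, sum(D[j][i] * D[j][k] for j in rows))
    return best


# Toy data (made up): a 3 x 3 tile with one missing entry, plus one stray one.
D = [[1, 1, 1, 0],
     [1, 1, 0, 0],
     [1, 1, 1, 0],
     [0, 0, 0, 1]]
x = [1, 1, 1, 0]
y = [1, 1, 1, 0]
delta = tile_density(D, x, y)   # 8 ones among 9 tile entries
eta = tile_coherence(D, x, y)   # the first two columns share 3 ones
```

For this tile, δ|y|(δ|x| − 1)/(|x| − 1) = 20/9, which is indeed smaller than η = 3.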

Theorem (Density Bound)

Suppose N is an m × n Bernoulli matrix with parameter p. The probability that a δ-dense tile of size |x| ≥ a and |y| ≥ b exists is no larger than
$$\binom{n}{a}\binom{m}{b} \exp\left(-2ab(\delta - p)^2\right). \tag{1}$$

Proof (sketch): Hoeffding’s inequality yields
$$P\left(\frac{y^\top N x}{|x||y|} \ge \delta\right) = P\left(\frac{1}{ab}\sum_{i,j} x_i y_j N_{ji} - p \ge \delta - p\right) \le \exp\left(-2ab(\delta - p)^2\right).$$
The union bound yields the final result.
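To get a feeling for bound (1), it can be evaluated numerically. The sketch below works in log space because the binomial coefficients grow quickly. It returns the trivial bound 1 whenever δ ≤ p, where the one-sided Hoeffding argument gives no information, and it caps the result at 1. The function name `density_bound` and the sample values are assumptions for illustration.

```python
import math


def density_bound(n, m, a, b, delta, p):
    """Evaluate C(n, a) * C(m, b) * exp(-2ab(delta - p)^2), capped at 1."""
    if delta <= p:
        return 1.0  # the Hoeffding step needs delta > p
    log_bound = (math.log(math.comb(n, a))
                 + math.log(math.comb(m, b))
                 - 2.0 * a * b * (delta - p) ** 2)
    return math.exp(min(log_bound, 0.0))


# Illustrative values: a large dense tile versus a tiny one in a 800 x 1000 matrix.
large_tile = density_bound(n=1000, m=800, a=40, b=160, delta=0.9, p=0.1)
tiny_tile = density_bound(n=1000, m=800, a=2, b=2, delta=0.9, p=0.1)
```

For the large tile the exponential term dominates the binomial coefficients and the bound is numerically zero, whereas for the 2 × 2 tile the union bound is vacuous and the function returns 1.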

Theorem (Coherence Bound)

Let N be an m × n Bernoulli matrix with parameter p and let $\mu > p^2$. The function value of η satisfies $\eta\left((1/\sqrt{m})N\right) \ge \mu$ with probability no larger than
$$\frac{n(n-1)}{2} \exp\left(-\frac{3}{2} m \frac{(\mu - p^2)^2}{2p^2 + \mu}\right). \tag{2}$$

Proof (sketch): The Bernstein inequality yields
$$P\left(\langle N_{\cdot i}, N_{\cdot k} \rangle \ge m\mu\right) \le \exp\left(-\frac{3}{2} m \frac{(\mu - p^2)^2}{2p^2 + \mu}\right).$$
The union bound yields the final result.
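Bound (2) can be evaluated in the same way. In the sketch below, the hypothetical parameter `num_pairs` defaults to 1, which gives the single-pair Bernstein bound from the proof sketch; passing n(n − 1)/2 gives the union bound of the theorem. The function returns 1 whenever µ ≤ p², where the bound does not apply.

```python
import math


def coherence_bound(m, mu, p, num_pairs=1):
    """Evaluate num_pairs * exp(-(3/2) m (mu - p^2)^2 / (2 p^2 + mu)), capped at 1."""
    if mu <= p * p:
        return 1.0  # the theorem requires mu > p^2
    exponent = 1.5 * m * (mu - p * p) ** 2 / (2.0 * p * p + mu)
    log_bound = math.log(num_pairs) - exponent
    return math.exp(min(log_bound, 0.0))


# Illustrative values for an 800 x 1000 noise matrix with p = 0.1.
n, m, p = 1000, 800, 0.1
pairs = n * (n - 1) // 2
far_above = coherence_bound(m, mu=0.2, p=p, num_pairs=pairs)
barely_above = coherence_bound(m, mu=0.02, p=p, num_pairs=pairs)
```

With µ = 0.2 the exponent outweighs the number of column pairs by a wide margin, while for µ = 0.02, only slightly above p² = 0.01, the union bound is vacuous and the result is 1.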

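Applying the Bounds

Given $Y \odot X^\top \approx D$, calculate for $1 \le s \le r$ the density $\delta_s$ and coherence $\eta_s$. Toss $(X_{\cdot s}, Y_{\cdot s})$ if all of the following bounds are larger than q.

Corollary
$$P(Z_s = 1) \le \binom{n}{|X_{\cdot s}|}\binom{m}{|Y_{\cdot s}|} \exp\left(-2|X_{\cdot s}||Y_{\cdot s}|(\delta_s - p)^2\right)$$
$$P(Z_s = 1) \le \exp\left(-\frac{3}{2} m \frac{(\eta_s/m - p^2)^2}{2p^2 + \eta_s/m}\right)$$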
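The filtering rule can be sketched as follows. The code assumes the noise parameter p is known or has been estimated, and that each factor is summarized by its sizes |X·s| and |Y·s|, its density δs and its coherence ηs. The dictionary layout, function names and sample numbers are made up for illustration and are not the authors' implementation.

```python
import math


def density_bound(n, m, size_x, size_y, delta, p):
    """First corollary bound, capped at 1 (trivial when delta <= p)."""
    if delta <= p:
        return 1.0
    log_bound = (math.log(math.comb(n, size_x))
                 + math.log(math.comb(m, size_y))
                 - 2.0 * size_x * size_y * (delta - p) ** 2)
    return math.exp(min(log_bound, 0.0))


def coherence_bound(m, eta, p):
    """Second corollary bound with mu = eta / m (trivial when mu <= p^2)."""
    mu = eta / m
    if mu <= p * p:
        return 1.0
    return math.exp(-1.5 * m * (mu - p * p) ** 2 / (2.0 * p * p + mu))


def keep_factor(n, m, size_x, size_y, delta, eta, p, q):
    """Keep the factor unless all bounds on P(Z_s = 1) are larger than q."""
    bounds = (density_bound(n, m, size_x, size_y, delta, p),
              coherence_bound(m, eta, p))
    return min(bounds) <= q


# Made-up factor statistics for an 800 x 1000 data matrix.
factors = [
    {"size_x": 40, "size_y": 160, "delta": 0.9, "eta": 140},  # large, dense tile
    {"size_x": 3, "size_y": 4, "delta": 0.8, "eta": 4},       # tiny tile
]
kept = [f for f in factors if keep_factor(n=1000, m=800, p=0.1, q=0.01, **f)]
```

Only the large tile survives here: for the tiny one both bounds are vacuous, so it cannot be distinguished from noise at level q = 0.01.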

How Good are These Bounds?

[Figure: density bound and coherence bound, plotted over |X·s|/n (horizontal axis, 0 to 0.04) and |Y·s|/m (vertical axis, 0 to 0.2); panels for (n, m) = (1000, 800) and (n, m) = (1600, 500)]

Synthetic Experiments in PalTiling

[Figure: F (vertical axis, 0.4 to 1) and r (vertical axis, 0 to 30) plotted against p± [%] (horizontal axis, 0 to 20) for TrustPal dens, TrustPal coh and Primp; panels for (n, m) = (1600, 500) and (n, m) = (1000, 800)]

Take Home
▶ Establishing quality guarantees for unsupervised tasks is pretty interesting,
▶ concentration inequalities (e.g., the Hoeffding bound) are powerful stuff,
▶ the binary nature of BMF enables satisfying solutions to problems which are much harder in fuzzier tasks like NMF.