The Trustworthy Pal: Controlling the False Discovery Rate in Boolean Matrix Factorization

Sibylle Hess, Katharina Morik and Nico Piatkowski

Given a Binary Data Matrix...

Depending on the data domain, one could ask for

▶ groups of users who like the same set of movies,
▶ groups of patients having a similar set of gene mutations,
▶ groups of customers buying a similar set of items.

Finding a Factorization

Solve
\[ \min_{X,Y} \; |D - Y \odot X^\top| \]
for binary matrices X and Y.
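A minimal sketch of this objective, assuming that |·| counts the entries in which D and the Boolean product Y ⊙ X⊤ disagree, and that D is m × n, Y is m × r and X is n × r. The function names and the toy matrices are illustrative and not taken from the slides.

```python
def boolean_product(Y, X):
    """Boolean product Y ⊙ X^T: entry (j, i) is 1 if some factor s has Y[j][s] = X[i][s] = 1."""
    r = len(Y[0])
    return [[int(any(Y[j][s] and X[i][s] for s in range(r)))
             for i in range(len(X))]
            for j in range(len(Y))]


def reconstruction_error(D, Y, X):
    """Number of entries where D and Y ⊙ X^T differ (assumed meaning of |D - Y ⊙ X^T|)."""
    B = boolean_product(Y, X)
    return sum(d != b for row_d, row_b in zip(D, B) for d, b in zip(row_d, row_b))


# Toy data (hypothetical): a 3 x 4 matrix covered exactly by two tiles.
D = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 1, 1, 1]]
Y = [[1, 0], [1, 0], [0, 1]]          # m x r
X = [[1, 0], [1, 1], [0, 1], [0, 1]]  # n x r

print(reconstruction_error(D, Y, X))
```

Each column pair (X·s, Y·s) describes one tile; the Boolean product takes the logical OR over all tiles, so overlapping tiles do not add up beyond 1.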











The False Discovery Rate (FDR)

Boy called wolf once – Type 1 error
Boy called wolf twice – Type 2 error

The FDR is controlled at level q if
\[ \mathbb{E}\left(\frac{v}{r}\right) \le q, \]
where v is the number of false alarms and r is the total number of alarms.



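False Discoveries and BMF

Given a factorization of rank r, define
\[ Z_s = \begin{cases} 1 & \text{if } Y_{\cdot s} X_{\cdot s}^\top \text{ covers (mostly) noise} \\ 0 & \text{if } Y_{\cdot s} X_{\cdot s}^\top \text{ is a part of the model} \end{cases} \]
The FDR is controlled at level q if P(Z_s = 1) < q:
\[ \mathbb{E}\left(\frac{v}{r}\right) = \frac{1}{r} \sum_{s=1}^{r} P(Z_s = 1) \]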
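The step behind this identity can be spelled out as follows, assuming that the number of false alarms is the sum of the indicators, v = Z_1 + ... + Z_r (an assumption that matches the definition of Z_s but is not stated explicitly on the slide). Linearity of expectation then gives

\[
\mathbb{E}\left(\frac{v}{r}\right)
= \frac{1}{r} \sum_{s=1}^{r} \mathbb{E}(Z_s)
= \frac{1}{r} \sum_{s=1}^{r} P(Z_s = 1)
< \frac{1}{r} \cdot r q = q .
\]

So bounding each P(Z_s = 1) individually by q is sufficient to control the FDR at level q.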

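Properties of Outer Products

If the binary matrices solve
\[ (X, Y) \in \arg\min |D - Y \odot X^\top| \]
then any outer product A = D ◦ Y_{·s} X_{·s}^⊤ has a high density
\[ \delta = \frac{Y_{\cdot s}^\top D X_{\cdot s}}{|X_{\cdot s}|\,|Y_{\cdot s}|} \ge \frac{1}{2} \]
and a high coherence
\[ \eta = \max_{1 \le i \ne k \le n} \langle A_{\cdot i}, A_{\cdot k} \rangle > \delta\,|Y_{\cdot s}|\,\frac{\delta |X_{\cdot s}| - 1}{|X_{\cdot s}| - 1} \]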
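To make the two quantities concrete, here is a small sketch that computes the density δ and the coherence η of one tile. It assumes D is given as a list of m rows of length n, and that y and x are 0/1 indicator vectors playing the roles of Y·s and X·s; the function name `tile_stats` is made up for this illustration.

```python
def tile_stats(D, y, x):
    """Density and coherence of the tile A = D ∘ (y x^T) for 0/1 vectors y (rows) and x (columns)."""
    rows = [j for j, v in enumerate(y) if v]
    cols = [i for i, v in enumerate(x) if v]

    # Density: fraction of ones of D inside the tile, y^T D x / (|x| |y|).
    ones = sum(D[j][i] for j in rows for i in cols)
    density = ones / (len(rows) * len(cols))

    # Coherence: largest inner product between two distinct columns of A.
    # Columns outside the tile are zero, so only pairs of tile columns matter.
    coherence = max(
        (sum(D[j][i] * D[j][k] for j in rows)
         for i in cols for k in cols if i < k),
        default=0,
    )
    return density, coherence


# Toy data (hypothetical): the tile covers rows 0-1 and columns 0-1.
D = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 1, 1, 1]]
delta, eta = tile_stats(D, y=[1, 1, 0], x=[1, 1, 0, 0])
print(delta, eta)
```

For a tile that is completely filled with ones, the density is 1 and the coherence equals the number of rows in the tile.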

Theorem (Density Bound)

Suppose N is an m × n Bernoulli matrix with parameter p. The probability that a δ-dense tile of size |x| ≥ a and |y| ≥ b exists is no larger than
\[ \binom{n}{a} \binom{m}{b} \exp\left(-2ab(\delta - p)^2\right). \tag{1} \]

Proof (sketch): Hoeffding's inequality yields
\[ P\left( \frac{y^\top N x}{|x||y|} \ge \delta \right) = P\left( \frac{1}{ab} \sum_{i,j} x_i y_j N_{ji} - p \ge \delta - p \right) \le \exp\left(-2ab(\delta - p)^2\right). \]
The union bound yields the final result.
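Evaluating bound (1) directly overflows quickly, because the binomial coefficients are huge for realistic n and m. The sketch below works in log space instead. The helper names are hypothetical, and two choices are additions of this sketch rather than part of the theorem: the result is capped at 1, and the bound is treated as vacuous when δ ≤ p.

```python
import math


def log_binom(n, k):
    """Natural logarithm of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def density_bound(n, m, a, b, delta, p):
    """Bound (1): C(n, a) * C(m, b) * exp(-2ab(delta - p)^2), capped at 1."""
    if delta <= p:
        return 1.0  # assumption of this sketch: no information when delta <= p
    log_bound = log_binom(n, a) + log_binom(m, b) - 2 * a * b * (delta - p) ** 2
    return math.exp(min(log_bound, 0.0))


# A tiny tile is uninformative, a large dense tile is very unlikely to be noise.
print(density_bound(1000, 800, 1, 1, 0.9, 0.1))
print(density_bound(1000, 800, 100, 100, 0.9, 0.1))
```

The exponent shrinks with the tile area ab while the binomial terms grow only with the side lengths, so the bound becomes useful once the tile is both large and clearly denser than the noise level p.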

Theorem (Coherence Bound)

Let N be an m × n Bernoulli matrix with parameter p and let µ > p². The function value of η satisfies η((1/√m) N) ≥ µ with probability no larger than
\[ \frac{n(n-1)}{2} \exp\left( -\frac{3}{2}\, m\, \frac{(\mu - p^2)^2}{2p^2 + \mu} \right). \tag{2} \]

Proof (sketch): The Bernstein inequality yields
\[ P\left( \langle N_{\cdot i}, N_{\cdot k} \rangle \ge m\mu \right) \le \exp\left( -\frac{3}{2}\, m\, \frac{(\mu - p^2)^2}{2p^2 + \mu} \right). \]
The union bound yields the final result.
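Bound (2) can be evaluated in the same way. The sketch below is only an illustration of the formula: the function name is made up, and capping the value at 1 and returning 1 for µ ≤ p² are choices of this sketch, reflecting that the theorem requires µ > p².

```python
import math


def coherence_bound(n, m, mu, p):
    """Bound (2): n(n-1)/2 * exp(-(3/2) * m * (mu - p^2)^2 / (2p^2 + mu)), capped at 1."""
    if mu <= p ** 2:
        return 1.0  # the theorem only covers mu > p^2
    exponent = 1.5 * m * (mu - p ** 2) ** 2 / (2 * p ** 2 + mu)
    log_bound = math.log(n * (n - 1) / 2) - exponent
    return math.exp(min(log_bound, 0.0))


# Normalized coherence far above p^2 is very unlikely in pure Bernoulli noise.
print(coherence_bound(1000, 800, 0.5, 0.1))
```

The factor n(n − 1)/2 is the number of column pairs covered by the union bound; the exponent grows linearly in the number of rows m.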

Applying the Bounds

Given Y ⊙ X⊤ ≈ D, calculate for 1 ≤ s ≤ r the density δ_s and coherence η_s. Toss (X·s, Y·s) if all of the following bounds are larger than q.

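Corollary

\[ P(Z_s = 1) \le \binom{n}{|X_{\cdot s}|} \binom{m}{|Y_{\cdot s}|} \exp\left( -2\,|X_{\cdot s}|\,|Y_{\cdot s}|\,(\delta_s - p)^2 \right) \]
\[ P(Z_s = 1) \le \exp\left( -\frac{3}{2}\, m\, \frac{(\eta_s/m - p^2)^2}{2p^2 + \eta_s/m} \right) \]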
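The decision rule can be sketched as a small filter: a tile is kept if at least one of the two corollary bounds is at most q, and tossed otherwise. The function name and signature are hypothetical, the noise parameter p is assumed to be known, and treating a bound as vacuous (equal to 1) when δ_s ≤ p or η_s/m ≤ p² is an assumption of this sketch. The comparison is done in log space to avoid overflow.

```python
import math


def log_binom(n, k):
    """Natural logarithm of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def keep_tile(n, m, size_x, size_y, delta, eta, p, q):
    """True if at least one corollary bound on P(Z_s = 1) is at most q (0 < q < 1)."""
    # Density bound of the corollary; log(1) = 0 when delta <= p (sketch assumption).
    log_dens = 0.0
    if delta > p:
        log_dens = (log_binom(n, size_x) + log_binom(m, size_y)
                    - 2 * size_x * size_y * (delta - p) ** 2)

    # Coherence bound of the corollary, evaluated at mu = eta / m.
    mu = eta / m
    log_coh = 0.0
    if mu > p ** 2:
        log_coh = -1.5 * m * (mu - p ** 2) ** 2 / (2 * p ** 2 + mu)

    return min(log_dens, log_coh) <= math.log(q)


# A large, dense tile passes; a 2 x 2 tile with low coherence is tossed.
print(keep_tile(1000, 800, 100, 100, delta=0.9, eta=90, p=0.1, q=0.05))
print(keep_tile(1000, 800, 2, 2, delta=0.6, eta=1, p=0.1, q=0.05))
```

Using the smaller of the two bounds matches the rule on the slide: a tile is only discarded when neither bound certifies P(Z_s = 1) ≤ q.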

How Good are These Bounds?

[Figure: density bound and coherence bound, plotted over |X·s|/n and |Y·s|/m, for (n, m) = (1000, 800) and (n, m) = (1600, 500).]

Synthetic Experiments in PalTiling

[Figure: F and r plotted against p± [%] for (n, m) = (1600, 500) and (n, m) = (1000, 800); compared methods: TrustPal dens, TrustPal coh, Primp.]

Take Home

▶ Establishing quality guarantees for unsupervised tasks is pretty interesting.
▶ Concentration inequalities (e.g., the Hoeffding bound) are powerful stuff.
▶ The binary nature of BMF enables satisfying solutions to problems which are much harder in fuzzier tasks like NMF.