Scalable multiscale density estimation

arXiv:1410.7692v1 [stat.ME] 28 Oct 2014

Ye Wang, Duke University

Antonio Canale, Università degli Studi di Torino e Collegio Carlo Alberto

Abstract

Although Bayesian density estimation using discrete mixtures performs well in modest dimensions, it lacks statistical and computational scalability in high-dimensional multivariate settings. To combat the curse of dimensionality, it is necessary to assume that the data are concentrated near a lower-dimensional subspace. However, Bayesian methods for learning this subspace along with the density of the data scale poorly computationally. To solve this problem, we propose an empirical Bayes approach, which estimates a multiscale dictionary using geometric multiresolution analysis in a first stage. We use this dictionary within a multiscale mixture model, which allows uncertainty in component allocation, mixture weights and scaling factors over a binary tree. A computational algorithm is proposed, which scales efficiently to massive-dimensional problems. We provide some theoretical support for this geometric density estimation (GEODE) method, and illustrate its performance through simulated and real data examples.

1 Introduction

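Let $y_i = (y_{i1}, \ldots, y_{iD})^T$, for $i = 1, \ldots, n$, be a sample from an unknown distribution having support in a subset of [...]

[...] $p\{d_\infty(\Omega_{s,h}, \Omega^d_{s,h}) > \epsilon\} < \dfrac{6ba^d}{\epsilon(1-a)}$ for $d > 2\log\{b/(1-a)\}/\log(1/a)$, where $d_\infty(\Omega_{s,h}, \Omega^d_{s,h})$ is defined as $\|\Omega_{s,h} - \Omega^d_{s,h}\|_\infty$, $\|A\|_\infty$ denotes the maximum absolute row sum of the matrix $A$, $b = E(\sigma_s^2)$ and $a = E(1/\tau_{s,h,1})$.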


Figure 7: A 4-level binary tree decomposition of a parabola using METIS, with the black rectangles denoting the second-level cells, the red denoting the third-level cells and the green denoting the leaf cells.

Proof. With a slight abuse of notation, we write $u_{s,h,k}$ as $u$ and let $A = \prod_{m=1}^{k} \tau_{s,h,m}$. Let $\Delta^d = \Psi\Sigma_{s,h}\Psi^T - \Psi^d\Sigma^d_{s,h}(\Psi^d)^T$, $\Delta^d = \{a^d_{i,j}\}$ and $\Psi = \{\psi_{i,j}\}$. Clearly, $d_\infty(\Omega_{s,h}, \Omega^d_{s,h}) = \max_{1 \le i,j \le D} |a^d_{i,j}|$, and $a^d_{i,j} = \sum_{k=d+1}^{D} \alpha_k^2 \psi_{i,k}\psi_{j,k}$. By the Cauchy-Schwarz inequality,

$$
\Big|\sum_{k=d+1}^{D} \alpha_k^2 \psi_{i,k}\psi_{j,k}\Big| \le \max_{1 \le m \le D}\Big(\sum_{k=d+1}^{D} \alpha_k^2 \psi_{m,k}^2\Big).
$$

Since $\Psi$ is orthonormal, we have $\psi_{i,j}^2 \le 1$ for any $i$ and $j$. Hence

$$
d_\infty(\Omega_{s,h}, \Omega^d_{s,h}) \le \sum_{k=d+1}^{D} \alpha_k^2.
$$
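As a quick numerical sanity check of this bound, the following Python sketch builds a small orthonormal matrix and compares the largest entry of the truncation error with the discarded tail sum. It assumes, for illustration only, that $\Sigma_{s,h} = \mathrm{diag}(\alpha_1^2, \ldots, \alpha_D^2)$ and that the truncated factors keep the first $d$ columns and entries. The Householder reflection, the dimensions and the function names are hypothetical choices, not part of the method.

```python
import random

def householder(v):
    """Return the orthonormal matrix I - 2 v v^T / (v^T v) as nested lists."""
    n = len(v)
    vv = sum(x * x for x in v)
    return [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vv
             for j in range(n)] for i in range(n)]

def truncation_gap(psi, alpha_sq, d):
    """Largest |a_ij| of Delta^d = sum_{k>d} alpha_k^2 psi[:, k] psi[:, k]^T."""
    D = len(psi)
    return max(
        abs(sum(alpha_sq[k] * psi[i][k] * psi[j][k] for k in range(d, D)))
        for i in range(D) for j in range(D)
    )

random.seed(0)
D, d = 8, 3  # hypothetical ambient and retained dimensions
psi = householder([random.gauss(0.0, 1.0) for _ in range(D)])
# Assumed diagonal entries alpha_k^2, sorted in decreasing order.
alpha_sq = sorted((random.random() for _ in range(D)), reverse=True)

gap = truncation_gap(psi, alpha_sq, d)   # max_{i,j} |a^d_{ij}|
tail = sum(alpha_sq[d:])                 # sum_{k=d+1}^{D} alpha_k^2
print(gap, tail, gap <= tail)
```

Because every entry of an orthonormal matrix has absolute value at most 1, the comparison holds for any choice of `psi` and `alpha_sq`, not only for the random draw used here.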

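For a fixed $\epsilon > 0$, by Chebyshev's inequality (in its first-moment, Markov form),

$$
\begin{aligned}
p\{d_\infty(\Omega_{s,h}, \Omega^d_{s,h}) \le \epsilon\}
&\ge p\Big(\sum_{k=d+1}^{D} \alpha_k^2 \le \epsilon\Big) \\
&= E\Big[p\Big(\sum_{k=d+1}^{D} \alpha_k^2 \le \epsilon \,\Big|\, \tau\Big)\Big] \\
&= 1 - E\Big[p\Big(\sum_{k=d+1}^{D} \alpha_k^2 > \epsilon \,\Big|\, \tau\Big)\Big] \\
&\ge 1 - E\Big[\frac{E\big(\sum_{k=d+1}^{D} \alpha_k^2 \mid \tau\big)}{\epsilon}\Big].
\end{aligned}
$$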

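By design we have $u \sim \mathrm{Ga}_{(0,1)}(A + 1, 1)$, and $u$ and $\sigma_s^2$ are conditionally independent; hence

$$
E\Big[\Big(\frac{1}{u} - 1\Big)\sigma_s^2 \,\Big|\, \tau\Big] = E\Big[\Big(\frac{1}{u} - 1\Big) \,\Big|\, \tau\Big]\, E(\sigma_s^2).
$$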


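Then we have

$$
\begin{aligned}
E\Big[\Big(\frac{1}{u} - 1\Big) \,\Big|\, \tau\Big]
&= \frac{\int_0^1 (1/u - 1)\,\frac{u^A e^{-u}}{\Gamma(A+1)}\,du}{\int_0^1 \frac{u^A e^{-u}}{\Gamma(A+1)}\,du}
= \frac{\int_0^1 u^{A-1} e^{-u}\,du}{\int_0^1 u^A e^{-u}\,du} - 1 \\
&= \frac{\frac{1}{A} u^A e^{-u}\big|_0^1 + \int_0^1 \frac{1}{A} u^A e^{-u}\,du}{\int_0^1 u^A e^{-u}\,du} - 1 \\
&= \frac{e^{-1}}{A \int_0^1 u^A e^{-u}\,du} - 1 + \frac{1}{A}.
\end{aligned}
$$

Let $\gamma(s, x) = \int_0^x t^{s-1} e^{-t}\,dt$ be the lower incomplete Gamma function. Note that

$$
\begin{aligned}
A\gamma(A+1, 1)
&= \frac{A}{A+1} u^{A+1} e^{-u}\Big|_0^1 + \frac{A}{A+1}\gamma(A+2, 1) \\
&= \frac{A}{A+1} e^{-1} + \frac{A}{A+1}\frac{1}{A+2} e^{-1} + \frac{A}{A+1}\frac{1}{A+2}\gamma(A+3, 1) \\
&= \lim_{K \to \infty}\Big\{\sum_{k=1}^{K} \frac{\Gamma(A+1)^2}{\Gamma(A)\Gamma(A+k+1)} e^{-1} + A\Gamma(A+1) F(1; A+K, 1)\Big\} \\
&= \sum_{k=1}^{\infty} \frac{\Gamma(A+1)^2}{\Gamma(A)\Gamma(A+k+1)} e^{-1} \\
&= \sum_{k=1}^{\infty} \frac{A}{(A+1)(A+2)\cdots(A+k)} e^{-1},
\end{aligned}
$$

where $F(x; a, b)$ is the cdf of $\mathrm{Ga}(a, b)$ and $\lim_{a \to \infty} F(1; a, 1) = 0$. Furthermore we have

$$
\sum_{k=1}^{\infty} \frac{\Gamma(A+1)^2}{\Gamma(A)\Gamma(A+k+1)} = \sum_{k=1}^{\infty} \frac{A}{(A+1)(A+2)\cdots(A+k)} \ge \frac{1}{2},
$$

and

$$
1 - \sum_{k=1}^{\infty} \frac{\Gamma(A+1)^2}{\Gamma(A)\Gamma(A+k+1)} \le 1 - \frac{A}{A+1} \le \frac{1}{A}.
$$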
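The series identity for $A\gamma(A+1, 1)$ and the two bounds on the series can be checked numerically. The sketch below is illustrative only: it approximates $\gamma(A+1, 1)$ by composite Simpson quadrature and compares $A\gamma(A+1, 1)$ with $e^{-1}\sum_k A/\{(A+1)\cdots(A+k)\}$ for a few values $A \ge 1$. The function names, grid size, number of series terms and values of $A$ are assumptions made for this example.

```python
import math

def lower_gamma(s, x=1.0, n=20000):
    """Composite Simpson value of gamma(s, x) = int_0^x t^(s-1) e^(-t) dt, s >= 1."""
    h = x / n  # n must be even for Simpson's rule
    f = lambda t: t ** (s - 1) * math.exp(-t)
    total = f(0.0) + f(x)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0

def series_sum(A, terms=200):
    """Partial sum of sum_{k>=1} A / ((A+1)(A+2)...(A+k))."""
    total, term = 0.0, A
    for k in range(1, terms + 1):
        term /= (A + k)   # term is now A / ((A+1)...(A+k))
        total += term
    return total

for A in (1, 2.5, 10, 50):
    lhs = A * lower_gamma(A + 1)
    s = series_sum(A)
    rhs = math.exp(-1.0) * s
    # Columns: A, quadrature value, series value, lower bound, upper bound
    print(A, lhs, rhs, s >= 0.5, 1.0 - s <= 1.0 / A)
```

The lower bound of 1/2 comes from the first term $A/(A+1)$ alone, which is why the check is restricted to $A \ge 1$.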

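Thus we have

$$
\begin{aligned}
\frac{e^{-1}}{A \int_0^1 u^A e^{-u}\,du} - 1 + \frac{1}{A}
&= \frac{1}{\sum_{k=1}^{\infty} \frac{\Gamma(A+1)^2}{\Gamma(A)\Gamma(A+k+1)}} - 1 + \frac{1}{A} \\
&= \frac{1 - \sum_{k=1}^{\infty} \frac{\Gamma(A+1)^2}{\Gamma(A)\Gamma(A+k+1)}}{\sum_{k=1}^{\infty} \frac{\Gamma(A+1)^2}{\Gamma(A)\Gamma(A+k+1)}} + \frac{1}{A} \\
&\le \frac{1/A}{1/2} + \frac{1}{A} \\
&= \frac{3}{A}.
\end{aligned}
$$

Hence $E[(1/u - 1) \mid \tau] \le 3/(\prod_{m=1}^{k} \tau_{s,h,m})$. Based on this inequality, we have

$$
\sum_{k=d+1}^{D} E\, E\Big[\Big(\frac{1}{u_{s,h,k}} - 1\Big)\sigma_s^2 \,\Big|\, \tau\Big]
\le E\Big[\sum_{k=d+1}^{D} \frac{3}{\prod_{m=1}^{k} \tau_{s,h,m}}\Big] E(\sigma_s^2)
= \sum_{k=d+1}^{D} 3ba^k
\le \frac{3ba^d}{1-a}.
$$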
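The bound $E[(1/u - 1) \mid \tau] \le 3/A$ can also be checked directly by integrating against the truncated gamma density. The following Python sketch is a rough check under stated assumptions: it uses a midpoint rule on $(0, 1)$, a handful of integer values of $A$, and hypothetical function names. It compares the directly computed expectation with the closed form $e^{-1}/\{A\int_0^1 u^A e^{-u}du\} - 1 + 1/A$ and with $3/A$.

```python
import math

def truncated_gamma_moment(A, power, n=20000):
    """Midpoint-rule value of int_0^1 u^power * u^A * exp(-u) du."""
    h = 1.0 / n
    return h * sum(((i + 0.5) * h) ** (A + power) * math.exp(-(i + 0.5) * h)
                   for i in range(n))

def expected_inverse_minus_one(A):
    """E(1/u - 1) for u ~ Ga(A + 1, 1) truncated to (0, 1)."""
    return truncated_gamma_moment(A, -1) / truncated_gamma_moment(A, 0) - 1.0

for A in (1, 2, 5, 10, 20):
    value = expected_inverse_minus_one(A)
    closed_form = (math.exp(-1.0) / (A * truncated_gamma_moment(A, 0))
                   - 1.0 + 1.0 / A)
    # Columns: A, direct expectation, closed form, whether the 3/A bound holds
    print(A, round(value, 6), round(closed_form, 6), value <= 3.0 / A)
```

The normalising constant $\Gamma(A+1)$ cancels in the ratio, so the sketch never evaluates it.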

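Here $b = E(\sigma_s^2)$ and $a = E(1/\tau_{s,h,1})$. Note that $\tau_{s,h,m} \sim \mathrm{Exp}_{[1,\infty)}(\lambda)$, thus $a < 1$. By Fubini's theorem,

$$
E\, E\Big(\sum_{k=d+1}^{\infty} \alpha_k^2 \,\Big|\, \tau\Big) = \sum_{k=d+1}^{\infty} E\, E\Big[\Big(\frac{1}{u_{s,h,k}} - 1\Big)\sigma_s^2 \,\Big|\, \tau\Big].
$$

Now use the inequality $1 - x/2 > \exp(-x)$ for $0 < x \le 1.5$ to get

$$
p\{d_\infty(\Omega_{s,h}, \Omega^d_{s,h}) \le \epsilon\} \ge \exp\Big\{\frac{-6ba^d}{\epsilon(1-a)}\Big\}
$$

if $d > 2\log\{b/(1-a)\}/\log(1/a)$. Hence,

$$
p\{d_\infty(\Omega_{s,h}, \Omega^d_{s,h}) > \epsilon\} \le 1 - \exp\Big\{\frac{-6ba^d}{\epsilon(1-a)}\Big\} \le \frac{6ba^d}{\epsilon(1-a)},
$$

since $6ba^d/\{\epsilon(1-a)\} < 1$.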
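To get a feel for how fast the final bound decays in $d$, the Python sketch below evaluates $a = E(1/\tau_{s,h,1})$ and the right-hand side $6ba^d/\{\epsilon(1-a)\}$. It assumes that $\mathrm{Exp}_{[1,\infty)}(\lambda)$ denotes a rate-$\lambda$ exponential truncated to $[1, \infty)$, which by memorylessness has the same law as $1 + \mathrm{Exp}(\lambda)$. The values of $\lambda$, $b$ and $\epsilon$ and all function names are hypothetical. The threshold it reports is simply the smallest $d$ that makes the displayed bound less than 1, found by direct search rather than from the condition on $d$ stated in the text.

```python
import math

def mean_inverse_tau(lam, n=200000):
    """Midpoint-rule value of a = E(1/tau) for tau = 1 + Exp(lam).

    The substitution v = exp(-lam * (tau - 1)) maps the expectation
    to int_0^1 dv / (1 - log(v) / lam).
    """
    h = 1.0 / n
    return h * sum(1.0 / (1.0 - math.log((i + 0.5) * h) / lam)
                   for i in range(n))

def tail_bound(d, a, b, eps):
    """Right-hand side 6 b a^d / (eps (1 - a)) of the final inequality."""
    return 6.0 * b * a ** d / (eps * (1.0 - a))

def smallest_d(a, b, eps):
    """Smallest d >= 0 for which tail_bound is below 1."""
    d = 0
    while tail_bound(d, a, b, eps) >= 1.0:
        d += 1
    return d

a = mean_inverse_tau(lam=1.0)   # assumed rate lambda = 1
b, eps = 1.0, 0.1               # hypothetical E(sigma_s^2) and epsilon
d_min = smallest_d(a, b, eps)
print(a, d_min, tail_bound(d_min, a, b, eps))
```

Since $a < 1$, the bound shrinks geometrically in $d$, so the search terminates after a number of steps that grows only logarithmically in $b/\epsilon$.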

Theorem 4. Let

$$
f^L(y_i) = \sum_{s=1}^{L} \sum_{h=1}^{2^s} \tilde{\pi}_{s,h}\, N_D(y_i; \mu_{s,h}, \Phi_{s,h}\Sigma_{s,h}\Phi_{s,h}^T + \sigma_s^2 I)
$$

denote the approximation at scale $L$, let $P(B) = \int_B f(y_i)\,dy$ and $P^L(B) = \int_B f^L(y_i)\,dy$, for all $B \subset$ [...]
