Very Sparse Random Projections
Ping Li, Trevor Hastie and Kenneth Church [KDD '06]
Presented by: Aditya Menon (UCSD)
March 4, 2009
Dimensionality reduction

In dimensionality reduction, we want to embed points into a lower-dimensional space such that some measure of interest is preserved. In this talk, we are interested in preserving pairwise Euclidean distances in a specific sense:
The problem

Suppose we have $n$ data points $x_1, \ldots, x_n$, where $x_i \in \mathbb{R}^d$. Given some $k \ll d$ and $\epsilon > 0$, find points $y_1, \ldots, y_n$, where $y_i \in \mathbb{R}^k$, such that for all $i, j$,
$$(1 - \epsilon)\|x_i - x_j\|^2 \leq \|y_i - y_j\|^2 \leq (1 + \epsilon)\|x_i - x_j\|^2$$
Let us call the $y_i$'s an $\epsilon$-embedding of the input. How do we find this embedding?
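To make the definition concrete, here is a minimal Python sketch (function and variable names are illustrative, not from the paper) that checks whether a candidate embedding is an $\epsilon$-embedding:

```python
import numpy as np

def is_eps_embedding(X, Y, eps):
    """Check whether Y is an eps-embedding of X: every squared pairwise
    Euclidean distance must be preserved up to a (1 +/- eps) factor."""
    n = X.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            d_x = np.sum((X[i] - X[j]) ** 2)   # original squared distance
            d_y = np.sum((Y[i] - Y[j]) ** 2)   # embedded squared distance
            if not ((1 - eps) * d_x <= d_y <= (1 + eps) * d_x):
                return False
    return True
```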
The solution: PCA?
Simply applying PCA does not necessarily solve this problem. Recall that PCA focuses on the following global minimization:
$$\min \sum_{i=1}^{n} \|x_i - y_i\|^2$$
where $y_i = \sum_{j=1}^{k} (x_i \cdot \hat{u}^{(j)})\, \hat{u}^{(j)}$ lies on a linear subspace defined by the basis vectors $\hat{u}^{(j)}$.
- This does not necessarily mean that local pairwise distances are preserved.
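A quick numpy sketch of this projection via the SVD (purely illustrative; the data and $k$ are made up, and centering is ignored for simplicity): it minimizes the total reconstruction error, yet an individual pairwise distance can still be badly distorted.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))      # n = 100 points in d = 50 dimensions
k = 5

# Top-k basis vectors u^(j) from the SVD of X
_, _, Vt = np.linalg.svd(X, full_matrices=False)
U_hat = Vt[:k]                      # rows are the basis vectors u^(j)

# y_i = sum_j (x_i . u^(j)) u^(j): the rank-k PCA reconstruction
Y = X @ U_hat.T @ U_hat

# PCA minimizes the total reconstruction error ...
print("total reconstruction error:", np.sum((X - Y) ** 2))
# ... but a particular pairwise distance may shrink considerably
print(np.sum((X[0] - X[1]) ** 2), np.sum((Y[0] - Y[1]) ** 2))
```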
So do we have reason to believe such an embedding is easy to find?
Theory: the Johnson-Lindenstrauss lemma
A classic result of Johnson and Lindenstrauss guarantees the following:
The Johnson-Lindenstrauss lemma [3]
Any $n$ points in $\mathbb{R}^d$ can be $\epsilon$-embedded into $\mathbb{R}^{O(\log n / \epsilon^2)}$.

Note: the reduced dimension does not depend on $d$, but instead on $n$.

What's the problem with the lemma? It's non-constructive.
Question: Is there an easy way to find this mapping?
Construction through randomization

It turns out we can find an $\epsilon$-approximate embedding through randomization.
- Same guarantee, only probabilistic.

All we need is a random transformation through a Gaussian:
Theorem ([2])
Let the rows of $X \in \mathbb{R}^{n \times d}$ denote the points $x_1, \ldots, x_n$. Let $R \in \mathbb{R}^{d \times k}$ be a matrix whose entries are iid $N(0, 1)$, and let $y_i$ denote the $i$th row of the matrix $\frac{1}{\sqrt{k}} X R$. Then, if $k \geq \frac{4 \log n}{\epsilon^2/2 - \epsilon^3/3}$, with high probability,
$$(1 - \epsilon)\|x_i - x_j\|^2 \leq \|y_i - y_j\|^2 \leq (1 + \epsilon)\|x_i - x_j\|^2$$
We call this operation a random projection.
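A minimal numpy sketch of this construction (variable names are illustrative; the choice of $k$ follows the bound in the theorem above):

```python
import numpy as np

def gaussian_random_projection(X, eps, rng=None):
    """Project the rows of X into k dimensions with a dense N(0,1) matrix,
    with k chosen from the Johnson-Lindenstrauss bound above."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    k = int(np.ceil(4 * np.log(n) / (eps**2 / 2 - eps**3 / 3)))
    R = rng.standard_normal((d, k))      # entries iid N(0, 1)
    return (X @ R) / np.sqrt(k)          # y_i is the i-th row

# Example: 1000 points in 10,000 dimensions, eps = 0.5
X = np.random.default_rng(0).normal(size=(1000, 10_000))
Y = gaussian_random_projection(X, eps=0.5)
print(Y.shape)   # (1000, k), with k independent of d
```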
Why does this projection work?

It is not difficult to show the following fact:
Theorem
Let $x_i, x_j$ and $y_i, y_j$ be defined as before. Then
$$E[\|y_i - y_j\|^2] = \|x_i - x_j\|^2, \qquad \mathrm{Var}[\|y_i - y_j\|^2] = \frac{2}{k}\,\|x_i - x_j\|^4$$
In expectation, distances are preserved by this mapping. The variance is sufficiently small that certain concentration bounds let us derive a high-probability guarantee for all distances. We will prove the expectation part of this theorem.
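As a sanity check (purely illustrative, not from the paper), both moments can be verified empirically by re-sampling the projection matrix many times:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 200, 20
x_i, x_j = rng.normal(size=d), rng.normal(size=d)
true_dist = np.sum((x_i - x_j) ** 2)

# Re-sample R and record the projected squared distance each time
samples = []
for _ in range(10_000):
    R = rng.standard_normal((d, k))
    y_i, y_j = (x_i @ R) / np.sqrt(k), (x_j @ R) / np.sqrt(k)
    samples.append(np.sum((y_i - y_j) ** 2))
samples = np.array(samples)

print(samples.mean(), true_dist)                 # ~ ||x_i - x_j||^2
print(samples.var(), 2 * true_dist ** 2 / k)     # ~ (2/k) ||x_i - x_j||^4
```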
Preservation of distance: proof

First, recall that by definition
$$y_i = \frac{1}{\sqrt{k}}\, x_i R = \frac{1}{\sqrt{k}} \big( x_i \cdot r_1, \;\ldots,\; x_i \cdot r_k \big)$$
where $r_j$ is the $j$th column of $R$. So (focusing on $x_1, y_1$ for simplicity),
$$E[\|y_1\|^2] = \frac{1}{k} \sum_j E[(x_1 \cdot r_j)^2] \quad \text{by linearity of expectation}$$
$$= \frac{1}{k} \sum_j \left( \sum_i E[x_{1i}^2 r_{ij}^2] + 2 \sum_{i < i'} E[r_{ij} r_{i'j} x_{1i} x_{1i'}] \right)$$
Since the entries of $R$ are iid $N(0,1)$, we have $E[r_{ij}^2] = 1$ and the cross terms vanish, so $E[\|y_1\|^2] = \|x_1\|^2$. Applying the same argument to the difference $x_1 - x_2$ gives $E[\|y_1 - y_2\|^2] = \|x_1 - x_2\|^2$.

The law of large numbers: if $\bar{X}_n$ denotes the mean of $n$ iid random variables with mean $\mu$, then
$$\Pr\Big[\lim_{n \to \infty} \bar{X}_n = \mu\Big] = 1$$
That is, with probability 1, the mean of the variables approaches the true mean. The proof is out of scope for this talk! But now we can prove the theorem...
Asymptotic behaviour of distribution: proof

Let us look at the extra term in the variance, and show that it tends to zero:
$$\begin{aligned}
(s-3)\,\frac{\sum_j (x_{1j}-x_{2j})^4}{\|x_1-x_2\|^4}
&= \frac{s-3}{d}\cdot\frac{\big(\sum_j (x_{1j}-x_{2j})^4\big)/d}{\big(\|x_1-x_2\|^2/d\big)^2} \\
&\to \frac{s-3}{d}\cdot\frac{E[(x_{1j}-x_{2j})^4]}{\big(E[(x_{1j}-x_{2j})^2]\big)^2} \\
&\to 0
\end{aligned}$$
where the second line is by the strong law of large numbers. The rate of convergence is $O(\sqrt{(s-3)/d})$.
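A small illustrative check of this rate (the iid data and parameter choices below are my own, not from the paper): the extra term shrinks roughly like $d^{-1/2}$ when $s = \sqrt{d}$, the choice discussed on the next slide.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in [100, 10_000, 1_000_000]:
    s = np.sqrt(d)                                # s = sqrt(d)
    u = rng.normal(size=d) - rng.normal(size=d)   # u plays the role of x_1 - x_2
    extra = (s - 3) * np.sum(u ** 4) / np.sum(u ** 2) ** 2
    print(d, extra)   # decays roughly like d^{-1/2}
```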
Implications thus far
The very sparse distribution preserves distances in expectation. The variance of the estimator tends to that of the normal distribution at the rate $O(\sqrt{(s-3)/d})$.

Consider $s = \sqrt{d}$, e.g.:
- This converges to the normal estimator at the rate $O(d^{-1/4})$.
- When $d$ is large, this rate is extremely quick.
- Considerable speedup over the normal/Achlioptas distributions.

So, we have a theoretically sound way to do dimensionality reduction in $O(n\sqrt{d}\,k)$ time.
- In fact, even $s = d/\log d$ will work, but with higher error.
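For concreteness, here is a small numpy sketch of a very sparse projection. The entry distribution $\sqrt{s}\cdot\{+1 \text{ w.p. } 1/(2s),\ 0 \text{ w.p. } 1-1/s,\ -1 \text{ w.p. } 1/(2s)\}$ is the one used by Li et al.; the wrapper code and parameter choices are illustrative.

```python
import numpy as np

def very_sparse_projection(X, k, s, rng=None):
    """Project rows of X with a very sparse random matrix whose entries are
    sqrt(s) * {+1 w.p. 1/(2s), 0 w.p. 1 - 1/s, -1 w.p. 1/(2s)}."""
    rng = np.random.default_rng() if rng is None else rng
    d = X.shape[1]
    signs = rng.choice([1.0, 0.0, -1.0], size=(d, k),
                       p=[1 / (2 * s), 1 - 1 / s, 1 / (2 * s)])
    R = np.sqrt(s) * signs               # only ~ d*k/s entries are non-zero
    return (X @ R) / np.sqrt(k)

# s = sqrt(d): roughly sqrt(d) non-zeros per column of R
X = np.random.default_rng(0).normal(size=(1000, 4096))
Y = very_sparse_projection(X, k=200, s=np.sqrt(4096))
print(Y.shape)
```

Only a $1/s$ fraction of $R$ is non-zero, which (with sparse matrix arithmetic) is where the $O(n\sqrt{d}\,k)$ running time comes from.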
Application: wireless sensor network [4]
Say we have $d$ sensors, each of which generates some value $x_i$; combined, these describe a vector $(x_1, \ldots, x_d)$.

Problem: We'd like to query a few (say $k$) of these sensors, and receive a good approximation $(y_1, \ldots, y_k)$ to the true observations.
- It is not unreasonable that the observations are tightly compressible.
- We would like to do this without global coordination.
Is there a simple solution?
Application: wireless sensor network
Can use sparse projections with $s = d/\log d$:
1. Each sensor $i$ generates $r_{i1}, \ldots, r_{id}$: on average, $\log d$ non-zero values.
2. For each non-zero $r_{ij}$, it requests the observed value $x_j$ and so computes $x \cdot r_i$.
3. Picking these dot-products for any $k$ sensors gives us a good approximation.
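A toy, centralized simulation of this protocol (all names and parameter choices are illustrative; the real scheme in [4] runs distributed across the sensors):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4096
x = rng.normal(size=d)                   # x[j] is the observation at sensor j
s = d / np.log(d)

def sensor_row():
    """Each sensor i draws its own very sparse row r_i (~ log d non-zeros)."""
    signs = rng.choice([1.0, 0.0, -1.0], size=d,
                       p=[1 / (2 * s), 1 - 1 / s, 1 / (2 * s)])
    return np.sqrt(s) * signs

# Query k sensors; each only needs x[j] for its non-zero r_ij
k = 200
projections = []
for _ in range(k):
    r = sensor_row()
    nz = np.flatnonzero(r)               # the ~ log d sensors it must contact
    projections.append(r[nz] @ x[nz])
y = np.array(projections) / np.sqrt(k)

# The k dot-products approximately preserve the squared norm of x
print(np.sum(y ** 2), np.sum(x ** 2))
```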
Application: wireless sensor network
Questions?
References

[1] Dimitris Achlioptas. Database-friendly random projections. In PODS '01: Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 274–281, New York, NY, USA, 2001. ACM.

[2] Sanjoy Dasgupta and Anupam Gupta. An elementary proof of the Johnson-Lindenstrauss lemma. Technical Report TR-99-006, International Computer Science Institute, Berkeley, California, 1999.

[3] W. B. Johnson and J. Lindenstrauss. Extensions of Lipschitz maps into a Hilbert space. In Conference in Modern Analysis and Probability (1982, Yale University), volume 26 of Contemporary Mathematics, pages 189–206. AMS, 1984.

[4] Wei Wang, Minos Garofalakis, and Kannan Ramchandran. Distributed sparse random projections for refinable approximation. In IPSN '07: Proceedings of the 6th International Conference on Information Processing in Sensor Networks, pages 331–339, New York, NY, USA, 2007. ACM.