The Random Projection Method∗
Edo Liberty†
September 25, 2007
1 Introduction
We start by giving a short proof of the Johnson-Lindenstrauss lemma due to P. Indyk and R. Motwani. We continue by showing an application of random projections to fast rank-k matrix approximation, due to Papadimitriou, Raghavan, Tamaki, and Vempala.
2 Random Projection
In the next 20 minutes we will prove the Johnson-Lindenstrauss (JL) lemma. We use a construction and proof derivation by Indyk and Motwani which are simpler than the original proof. Let us state the JL lemma:

Lemma 2.1. A set of $n$ points $u_1, \ldots, u_n$ in $R^d$ can be projected down to $v_1, \ldots, v_n$ in $R^k$ such that all pairwise distances are preserved:
\[
(1-\epsilon)\|u_i - u_j\|^2 \le \|v_i - v_j\|^2 \le (1+\epsilon)\|u_i - u_j\|^2
\]
if $k > \frac{9 \ln n}{\epsilon^2 - \epsilon^3}$ and $0 \le \epsilon \le 1/2$.
In order to see this we first need to consider the probability that one vector $u$ is distorted by more than $\epsilon$. Our projecting matrix $R$ will be a random $d \times k$ matrix s.t. $R(i,j)$ are drawn i.i.d. from $N(0,1)$ (see footnote 1). We will see that if we set
\[
v = \frac{1}{\sqrt{k}} R^T u
\]
then
\[
E(\|v\|^2) = \|u\|^2 ,
\]
and with probability $\Pr_{success} \ge 1 - 2e^{-(\epsilon^2-\epsilon^3)k/4}$,
\[
(1-\epsilon)\|u\|^2 \le \|v\|^2 \le (1+\epsilon)\|u\|^2 .
\]
∗ Chosen chapters from DIMACS vol. 65 by Santosh S. Vempala.
† Yale University, New Haven CT ([email protected]). Supported by AFOSR and NGA.
1 Normal distribution with mean 0 and variance 1.
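To make the construction concrete, here is a minimal numpy sketch (not part of the original notes; dimensions are arbitrary) of projecting one vector and measuring its squared-length distortion:

import numpy as np

# Sketch: project a single vector u in R^d down to R^k with
# v = (1/sqrt(k)) R^T u, where R has i.i.d. N(0,1) entries, and
# measure the squared-length distortion ||v||^2 / ||u||^2.

rng = np.random.default_rng(0)
d, k = 10_000, 1_000

u = rng.standard_normal(d)          # an arbitrary vector in R^d
R = rng.standard_normal((d, k))     # random d x k projection matrix
v = (R.T @ u) / np.sqrt(k)          # v = (1/sqrt(k)) R^T u

distortion = np.dot(v, v) / np.dot(u, u)
print(f"||v||^2 / ||u||^2 = {distortion:.4f}")   # close to 1 with high probability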
2.1 Calculating $E(\|v\|^2)$
Since $v = \frac{1}{\sqrt{k}} R^T u$ we can calculate the expectation $E(\|v\|^2)$:
\begin{align*}
E(\|v\|^2) &= E\left[ \sum_{i=1}^{k} \left( \frac{1}{\sqrt{k}} \sum_{j=1}^{d} R(i,j)\, u(j) \right)^2 \right] \\
&= \sum_{i=1}^{k} \frac{1}{k} \, E\left[ \left( \sum_{j=1}^{d} R(i,j)\, u(j) \right)^2 \right] \\
&= \frac{1}{k} \sum_{i=1}^{k} \sum_{j=1}^{d} E\left(R(i,j)^2\right) u(j)^2 \\
&= \frac{1}{k} \sum_{i=1}^{k} \sum_{j=1}^{d} u(j)^2 \\
&= \|u\|^2 .
\end{align*}
The cross terms vanish since the entries $R(i,j)$ are independent with mean zero, and $E(R(i,j)^2) = 1$.
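As a sanity check of the computation above, one can estimate $E(\|v\|^2)$ empirically by averaging over many draws of $R$; a small sketch (illustrative sizes, assumed setup):

import numpy as np

# Sketch: estimate E(||v||^2) by averaging over many independent draws of R
# and compare to ||u||^2.

rng = np.random.default_rng(1)
d, k, trials = 200, 20, 5_000

u = rng.standard_normal(d)
sq_norms = []
for _ in range(trials):
    R = rng.standard_normal((d, k))
    v = (R.T @ u) / np.sqrt(k)
    sq_norms.append(np.dot(v, v))

print("E(||v||^2) estimate:", np.mean(sq_norms))
print("||u||^2            :", np.dot(u, u))   # the two should agree closely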
2.2 Bounding the "stretching" probability
In order to bound the distortion probability we define
\[
x_j = \frac{1}{\|u\|} \langle R_j, u \rangle
\]
and
\[
x = \frac{k\|v\|^2}{\|u\|^2} = \sum_{j=1}^{k} \frac{(R_j^T u)^2}{\|u\|^2} = \sum_{j=1}^{k} x_j^2 .
\]
The convenience of these definitions will become clear in the next few steps. We now turn to calculate the probability
\begin{align*}
\Pr\left[\|v\|^2 \ge (1+\epsilon)\|u\|^2\right] &= \Pr\left[x \ge (1+\epsilon)k\right] \\
&= \Pr\left[e^{\lambda x} \ge e^{\lambda(1+\epsilon)k}\right] \\
&\le \frac{E(e^{\lambda x})}{e^{\lambda(1+\epsilon)k}} \\
&= \frac{\prod_{j=1}^{k} E(e^{\lambda x_j^2})}{e^{\lambda(1+\epsilon)k}} \\
&= \left( \frac{E(e^{\lambda x_1^2})}{e^{\lambda(1+\epsilon)}} \right)^{k} ,
\end{align*}
where the inequality is Markov's inequality and the product form uses the independence of the $x_j$.
Here we are faced with calculating $E(e^{\lambda x_1^2})$. Using the fact that $x_1$ itself is drawn from $N(0,1)$ (it is a sum of independent Gaussians, with variance $\|u\|^2/\|u\|^2 = 1$) we get that:
\begin{align*}
E(e^{\lambda x_1^2}) &= \int_{-\infty}^{\infty} e^{\lambda t^2} \frac{1}{\sqrt{2\pi}} e^{-\frac{t^2}{2}} \, dt \\
&= \frac{1}{\sqrt{1-2\lambda}} \int_{-\infty}^{\infty} \frac{\sqrt{1-2\lambda}}{\sqrt{2\pi}} e^{-\frac{t^2(1-2\lambda)}{2}} \, dt \\
&= \frac{1}{\sqrt{1-2\lambda}} \qquad \text{for } \lambda < 1/2 .
\end{align*}
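A quick Monte Carlo check of this moment-generating-function identity (a sketch with arbitrary illustrative values of $\lambda$, not from the notes):

import numpy as np

# Sketch: check numerically that E[exp(lambda * X^2)] = 1 / sqrt(1 - 2*lambda)
# for X ~ N(0,1) and lambda < 1/2 (small lambda values keep the variance finite).

rng = np.random.default_rng(2)
x = rng.standard_normal(2_000_000)

for lam in (0.05, 0.1, 0.2):
    mc = np.mean(np.exp(lam * x**2))               # Monte Carlo estimate
    closed_form = 1.0 / np.sqrt(1.0 - 2.0 * lam)
    print(f"lambda={lam:.2f}: MC = {mc:.4f}, closed form = {closed_form:.4f}")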
Inserting $E(e^{\lambda x_1^2}) = \frac{1}{\sqrt{1-2\lambda}}$ into the probability equation we get:
\[
\Pr\left[\|v\|^2 \ge (1+\epsilon)\|u\|^2\right] \le \left( \frac{e^{-2(1+\epsilon)\lambda}}{1-2\lambda} \right)^{k/2} .
\]
Substituting $\lambda = \frac{\epsilon}{2(1+\epsilon)}$ we get:
\[
\Pr\left[\|v\|^2 \ge (1+\epsilon)\|u\|^2\right] \le \left( (1+\epsilon)\, e^{-\epsilon} \right)^{k/2} .
\]
Finally, using that $1+\epsilon < e^{\epsilon - (\epsilon^2-\epsilon^3)/2}$ we obtain:
\[
\Pr\left[\|v\|^2 \ge (1+\epsilon)\|u\|^2\right] \le e^{-(\epsilon^2-\epsilon^3)k/4} .
\]
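To get a feeling for this bound, the following sketch (illustrative values only) tabulates $e^{-(\epsilon^2-\epsilon^3)k/4}$ for a few choices of $\epsilon$ and $k$:

import numpy as np

# Sketch: evaluate the single-vector failure bound exp(-(eps^2 - eps^3) k / 4)
# to see how large k must be before the probability becomes small.

for eps in (0.1, 0.2, 0.5):
    for k in (100, 500, 2000):
        bound = np.exp(-(eps**2 - eps**3) * k / 4)
        print(f"eps={eps:.1f}, k={k:4d}: Pr[||v||^2 > (1+eps)||u||^2] <= {bound:.2e}")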
2.3 Bounding the other direction

\begin{align*}
\Pr\left[\|v\|^2 \le (1-\epsilon)\|u\|^2\right] &= \Pr\left[x \le (1-\epsilon)k\right] \\
&= \Pr\left[e^{-\lambda x} \ge e^{-\lambda(1-\epsilon)k}\right] \\
&\le \left( \frac{E(e^{-\lambda x_1^2})}{e^{-\lambda(1-\epsilon)}} \right)^{k} \\
&= \left( \frac{e^{2(1-\epsilon)\lambda}}{1+2\lambda} \right)^{k/2} , \qquad \lambda = \frac{\epsilon}{2(1-\epsilon)} \\
&\le \left( (1-\epsilon)\, e^{\epsilon} \right)^{k/2} \\
&\le e^{-(\epsilon^2-\epsilon^3)k/4} .
\end{align*}

Achieving for one point:
\[
(1-\epsilon)\|u\|^2 \le \|v\|^2 \le (1+\epsilon)\|u\|^2
\]
with probability $\Pr \ge 1 - 2e^{-(\epsilon^2-\epsilon^3)k/4}$.

2.4 Putting it all together
In the case we have $n$ points we have to preserve $O(n^2)$ distances, which are all the pairwise distances between points. We can fail with probability less than a constant, say $\Pr_{fail} < 1/2$. We use a simple union bound:
\begin{align*}
n^2 \cdot 2e^{-(\epsilon^2-\epsilon^3)k/4} &< 1/2 \\
(\epsilon^2-\epsilon^3)k/4 &> 2\ln(2n) \\
k &> \frac{9\ln(n)}{\epsilon^2-\epsilon^3} \qquad \text{for } n > 16 .
\end{align*}
Finally, since the failure probability is smaller than 1/2, we can repeat until success; the expected number of repetitions is constant.
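The whole lemma can be exercised end-to-end; the sketch below (assumed setup, random test points) picks $k$ just above $9\ln(n)/(\epsilon^2-\epsilon^3)$ and verifies every pairwise distance:

import numpy as np
from itertools import combinations

# Sketch: choose k as in the lemma, project n random points from R^d to R^k,
# and verify that every pairwise squared distance is preserved up to (1 +- eps).

rng = np.random.default_rng(3)
n, d, eps = 50, 5_000, 0.5
k = int(np.ceil(9 * np.log(n) / (eps**2 - eps**3))) + 1

U = rng.standard_normal((n, d))            # n points, one per row
R = rng.standard_normal((d, k))
V = (U @ R) / np.sqrt(k)                   # projected points, rows are v_i

ok = True
for i, j in combinations(range(n), 2):
    du = np.sum((U[i] - U[j])**2)
    dv = np.sum((V[i] - V[j])**2)
    ok &= (1 - eps) * du <= dv <= (1 + eps) * du
print(f"k = {k}, all pairwise distances preserved: {ok}")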
3 Fast low rank approximation
The results from the last section can be applied to accelerating low rank approximation of matrices. An optimal low rank approximation can be computed from the SVD of $A$ in $O(mn^2)$. Using random projections we show how to achieve an "almost optimal" low rank approximation in $O(mn\log(n))$. We will go over a two step algorithm, suggested by Papadimitriou, Raghavan, Tamaki, and Vempala. First, we use random projections to find a matrix $B$ which is "much smaller" than $A$, but still shares most of its right singular space. Then, we compute the SVD of $B$ and project $A$ onto $B$'s $k$ top right singular vectors.
3.1 Introduction
A low rank approximation of an $m \times n$ ($m \ge n$) matrix $A$ is another matrix $A_k$ such that:

1. The rank of $A_k$ is at most $k$.
2. $\|A - A_k\|_{norm}$ is minimized.

It is well known that for both the $l_2$ and Frobenius norms
\[
A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T ,
\]
where the singular value decomposition (SVD) of $A$ is
\[
A = USV^T = \sum_{i=1}^{n} \sigma_i u_i v_i^T .
\]
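For concreteness, a small numpy sketch (illustrative shapes, not from the notes) of the optimal $A_k$ obtained by truncating the SVD:

import numpy as np

# Sketch: the optimal rank-k approximation A_k obtained from the SVD of A
# by keeping the k largest singular values and vectors.

rng = np.random.default_rng(4)
m, n, k = 300, 100, 10
A = rng.standard_normal((m, n))

U, S, Vt = np.linalg.svd(A, full_matrices=False)   # A = U diag(S) Vt
A_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]        # A_k = sum_{i<=k} sigma_i u_i v_i^T

print("||A - A_k||_F^2     =", np.linalg.norm(A - A_k, "fro")**2)
print("sum_{i>k} sigma_i^2 =", np.sum(S[k:]**2))   # the two quantities coincide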
The algorithm: Let $R$ be an $m \times \ell$ matrix such that $R(i,j)$ are drawn i.i.d. from $N(0,1)$, and let $\ell \ge \frac{c\log(n)}{\epsilon^2}$.

1. Compute $B = \frac{1}{\sqrt{\ell}} R^T A$.
2. Compute the SVD of $B$, $B = \sum_{i=1}^{\ell} \lambda_i a_i b_i^T$.
3. Return: $\tilde{A}_k \leftarrow A \left( \sum_{i=1}^{k} b_i b_i^T \right)$.
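A possible numpy rendering of this two-step algorithm is sketched below; the constant $c$, the choice of $\epsilon$, and the test matrix are illustrative assumptions, not the authors' code:

import numpy as np

# Sketch of the two-step algorithm: project A with a random Gaussian matrix,
# take the SVD of the small matrix B, and project A onto the span of B's
# top k right singular vectors.

def randomized_low_rank(A, k, eps=0.5, c=4.0, rng=None):
    rng = np.random.default_rng(rng)
    m, n = A.shape
    ell = max(k, int(np.ceil(c * np.log(n) / eps**2)))   # ell >= c log(n) / eps^2
    R = rng.standard_normal((m, ell))                    # m x ell, i.i.d. N(0,1)
    B = (R.T @ A) / np.sqrt(ell)                         # B = (1/sqrt(ell)) R^T A
    _, _, Bt = np.linalg.svd(B, full_matrices=False)     # rows of Bt are b_i^T
    Bk = Bt[:k, :]                                       # top k right singular vectors of B
    return A @ Bk.T @ Bk                                 # A_tilde_k = A (sum_i b_i b_i^T)

rng = np.random.default_rng(5)
A = rng.standard_normal((500, 200)) @ rng.standard_normal((200, 200))
A_tilde = randomized_low_rank(A, k=10)
print("||A - A_tilde_k||_F =", np.linalg.norm(A - A_tilde, "fro"))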
4 Proof of the algorithm
Since $A_k$ is the optimal solution we cannot hope to do better than $\|A - A_k\|_F^2$, yet we show that we do not do much worse:
\[
\|A - \tilde{A}_k\|_F^2 \le \|A - A_k\|_F^2 + 2\epsilon\|A_k\|_F^2 .
\]
Reminder:
\[
A = \sum_{i=1}^{n} \sigma_i u_i v_i^T , \quad
A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T , \quad
B = \sum_{i=1}^{\ell} \lambda_i a_i b_i^T , \quad
\tilde{A}_k = A \left( \sum_{i=1}^{k} b_i b_i^T \right) .
\]
Since the $b_i$'s are orthonormal (extend $b_1, \ldots, b_\ell$ to an orthonormal basis $b_1, \ldots, b_n$ of $R^n$):
\begin{align*}
\|A - \tilde{A}_k\|_F^2 &= \sum_{i=1}^{n} \|(A - \tilde{A}_k) b_i\|^2 \\
&= \sum_{i=1}^{n} \left\| A b_i - A \left( \sum_{j=1}^{k} b_j b_j^T \right) b_i \right\|^2 \\
&= \sum_{i=k+1}^{n} \|A b_i\|^2 \\
&= \|A\|_F^2 - \sum_{i=1}^{k} \|A b_i\|^2 .
\end{align*}
On the other hand:
\[
\|A - A_k\|_F^2 = \|A\|_F^2 - \|A_k\|_F^2 .
\]
We obtain that:
\[
\|A - \tilde{A}_k\|_F^2 = \|A - A_k\|_F^2 + \left( \|A_k\|_F^2 - \sum_{i=1}^{k} \|A b_i\|^2 \right) .
\]
We now need to show that $\|A_k\|_F^2$ is not much larger than $\sum_{i=1}^{k} \|A b_i\|^2$. We start by giving a lower bound on $\sum_{i=1}^{k} \|A b_i\|^2$:
\begin{align*}
\sum_{i=1}^{k} \lambda_i^2 &= \sum_{i=1}^{k} \|B b_i\|^2 \\
&= \sum_{i=1}^{k} \left\| \frac{1}{\sqrt{\ell}} R^T (A b_i) \right\|^2 \\
&\le (1+\epsilon) \sum_{i=1}^{k} \|A b_i\|^2 , \quad \text{with high probability (the length-preservation bound of Section 2 applied to the vectors } A b_i \text{),}
\end{align*}
and so we obtain
\[
\sum_{i=1}^{k} \|A b_i\|^2 \ge \frac{1}{1+\epsilon} \sum_{i=1}^{k} \lambda_i^2 .
\]
We now need to bound the term $\sum_{i=1}^{k} \lambda_i^2$ from below:
\begin{align*}
\sum_{i=1}^{k} \lambda_i^2 &\ge \sum_{i=1}^{k} v_i^T B^T B v_i \quad \text{(the } \lambda_i^2 \text{ are the top eigenvalues of } B^T B \text{ and the } v_i \text{ are orthonormal)} \\
&= \frac{1}{\ell} \sum_{i=1}^{k} v_i^T A^T R R^T A v_i \\
&= \sum_{i=1}^{k} \sigma_i^2 \left\| \frac{1}{\sqrt{\ell}} R^T u_i \right\|^2 \\
&\ge \sum_{i=1}^{k} (1-\epsilon)\sigma_i^2 , \quad \text{again with high probability (the same bound applied to the } u_i \text{).}
\end{align*}
Combining the inequalities with the fact that $\sum_{i=1}^{k} \sigma_i^2 = \|A_k\|_F^2$ we get:
\[
\sum_{i=1}^{k} \|A b_i\|^2 \ \ge\ \frac{1-\epsilon}{1+\epsilon} \|A_k\|_F^2 \ \ge\ (1-2\epsilon)\|A_k\|_F^2 .
\]
Finally:
\[
\|A - \tilde{A}_k\|_F^2 \le \|A - A_k\|_F^2 + 2\epsilon\|A_k\|_F^2 .
\]
Notice that we need to preserve the length of at most $2n$ vectors. This gives us a success probability of $\Pr_{success} \ge 1 - 4n e^{-(\epsilon^2-\epsilon^3)\ell/4}$, which is constant for $\ell = O\left(\frac{\log(n)}{\epsilon^2}\right)$.
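The bound can be checked empirically; the sketch below (illustrative sizes, an assumed test matrix with decaying spectrum, and an arbitrary constant in place of $c$) compares the randomized error against $\|A-A_k\|_F^2 + 2\epsilon\|A_k\|_F^2$:

import numpy as np

# Sketch: compare the randomized approximation error against the bound
# ||A - A_k||_F^2 + 2*eps*||A_k||_F^2 on a test matrix with known SVD.

rng = np.random.default_rng(6)
m, n, k, eps = 400, 150, 10, 0.5
ell = int(np.ceil(4 * np.log(n) / eps**2))            # illustrative choice of ell

# Test matrix with rapidly decaying singular values.
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
S = 2.0 ** -np.arange(n)
A = U @ np.diag(S) @ V.T

# Optimal rank-k approximation A_k (from the SVD of A).
A_k = U[:, :k] @ np.diag(S[:k]) @ V[:, :k].T

# Randomized approximation A_tilde_k.
R = rng.standard_normal((m, ell))
B = (R.T @ A) / np.sqrt(ell)
_, _, Bt = np.linalg.svd(B, full_matrices=False)
A_tilde = A @ Bt[:k, :].T @ Bt[:k, :]

err_rand = np.linalg.norm(A - A_tilde, "fro") ** 2
err_opt = np.linalg.norm(A - A_k, "fro") ** 2
bound = err_opt + 2 * eps * np.linalg.norm(A_k, "fro") ** 2
print(f"randomized error {err_rand:.3e} <= bound {bound:.3e}: {err_rand <= bound}")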
4.1 Computational savings for full matrices

1. Computing the matrix $B$ is $O(mn\ell)$.
2. Computing the SVD of $B$ is $O(m\ell^2)$.
3. Here $\ell \ge \frac{c\log(n)}{\epsilon^2}$.
Hence the total running time is
\[
O\left( \frac{1}{\epsilon^2}\, mn\log(n) \right) ,
\]
as compared to the straightforward SVD which is $O(mn^2)$.
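A rough timing sketch (sizes and $\ell$ chosen arbitrarily; actual speedups depend on the underlying BLAS/LAPACK) comparing a full SVD with the dominant steps of the randomized pipeline:

import time
import numpy as np

# Sketch: time a full SVD of A against forming B = R^T A / sqrt(ell) and
# taking the SVD of the much smaller B.

rng = np.random.default_rng(7)
m, n, ell = 4000, 2000, 300

A = rng.standard_normal((m, n))

t0 = time.perf_counter()
np.linalg.svd(A, full_matrices=False)            # O(m n^2)
t_full = time.perf_counter() - t0

t0 = time.perf_counter()
R = rng.standard_normal((m, ell))
B = (R.T @ A) / np.sqrt(ell)                     # O(m n ell)
np.linalg.svd(B, full_matrices=False)            # SVD of the small ell x n matrix
t_rand = time.perf_counter() - t0

print(f"full SVD: {t_full:.2f}s, randomized: {t_rand:.2f}s")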
5 Summary

• We showed that a random matrix can be used to project points in $R^d$ to $R^k$ in a length preserving way (with high probability), with $k = O\left(\frac{\log(n)}{\epsilon^2}\right)$.
• We used this fact to accelerate low rank approximations of $m \times n$ matrices from $O(mn^2)$ to $O(mn\log(n))$.
6 Important message

Petros Drineas, Rafi Ostrovsky and Yuval Rabani are organizing a research/reading group on algorithmic aspects of k-means clustering. The first meeting will take place this Friday, Sep 28, at 10:30am, in the IPAM lobby. The topics are related and somewhat in the same spirit as this talk. A list of the papers to be discussed during the semester was sent to you by email. The first paper read will be Kanungo et al.