Reading Group ML / AI

Prof. Christian Bauckhage

outline

PageRank

Google’s PageRank equation

summary

Google’s search engine in a nutshell

offline: keep track of the Web graph and rank its nodes

online: determine the nodes / pages which are relevant to a query and return the results ordered by their page ranks

note

in the following, we will work with column-stochastic matrices; the literature, however, typically discusses row-stochastic matrices

general idea

consider the adjacency matrix $A$ of the Web graph

$$A_{ij} = \begin{cases} 1 & \text{if page } i \text{ links to page } j \\ 0 & \text{otherwise} \end{cases}$$

turn $A$ into a stochastic or Markov matrix $P$ where

$$P_{ij} = \frac{A_{ij}}{\sum_l A_{lj}}$$

determine the stationary vector of $P$, i.e. determine the vector $\pi$ for which

$$P \pi = \pi$$
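a minimal sketch of the normalization step above, in plain Python. The 3 x 3 matrix `A`, the helper names `column_normalize` and `matvec`, and the candidate vector `pi` are illustrative assumptions, not part of the slides; the sketch also assumes every column of `A` has at least one nonzero entry.

```python
def column_normalize(A):
    """Return P with P[i][j] = A[i][j] / sum_l A[l][j], so each column sums to 1."""
    n = len(A)
    col_sums = [sum(A[l][j] for l in range(n)) for j in range(n)]
    if any(s == 0 for s in col_sums):
        # assumption of this sketch: no all-zero columns
        raise ValueError("every column of A needs at least one nonzero entry")
    return [[A[i][j] / col_sums[j] for j in range(n)] for i in range(n)]


def matvec(M, x):
    """Matrix-vector product M x for a list-of-rows matrix."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


# hypothetical toy adjacency matrix (illustration only)
A = [[0, 1, 1],
     [1, 0, 0],
     [1, 1, 0]]

P = column_normalize(A)

# candidate stationary vector for this toy example, worked out by hand
pi = [4 / 9, 2 / 9, 3 / 9]

# P pi should reproduce pi up to rounding
residual = max(abs(a - b) for a, b in zip(matvec(P, pi), pi))
print(P)
print(residual)
```

for this toy matrix the residual is at rounding level, i.e. `pi` is an eigenvector of `P` for eigenvalue 1 whose entries sum to 1.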

observe

π is the eigenvector of eigenvalue 1 of P

π is a stochastic vector

$$0 \le \pi_i \le 1 \quad \text{and} \quad \sum_i \pi_i = 1$$

entry $\pi_i$ of $\pi$ indicates the page rank of page $i$, i.e. if $\pi_i > \pi_j$, then page $i$ will rank higher in a result list than page $j$

problem(s)

the Web graph is a directed graph

there are many dangling nodes, i.e. nodes with no outgoing links

therefore, P is in general reducible, i.e. P is not guaranteed to have a unique stationary vector

solution

add a stochastic perturbation or teleportation matrix

$$H = \alpha P + (1 - \alpha) \, v \mathbf{1}^T$$

$v$ is called the personalization vector; it was originally set to $v = \frac{1}{n} \mathbf{1}$

$0 < \alpha < 1$ is called the damping factor; it was originally set to $\alpha = 0.85$
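a small sketch of how the teleportation term enters entry by entry, assuming a column-stochastic `P` given as a list of rows. Since $(v \mathbf{1}^T)_{ij} = v_i$, each entry is $H_{ij} = \alpha P_{ij} + (1 - \alpha) v_i$. The function name `google_matrix` and the toy matrix are hypothetical.

```python
def google_matrix(P, alpha=0.85, v=None):
    """Return H = alpha * P + (1 - alpha) * v 1^T as a list of rows."""
    n = len(P)
    if v is None:
        v = [1.0 / n] * n  # uniform personalization vector v = (1/n) 1
    # (v 1^T)[i][j] = v[i], so the teleportation term is constant along each row
    return [[alpha * P[i][j] + (1 - alpha) * v[i] for j in range(n)]
            for i in range(n)]


# hypothetical column-stochastic toy matrix (illustration only)
P = [[0.0, 0.5, 1.0],
     [0.5, 0.0, 0.0],
     [0.5, 0.5, 0.0]]

H = google_matrix(P)
col_sums = [sum(H[i][j] for i in range(3)) for j in range(3)]
print(H)
print(col_sums)
```

in this example every entry of `H` is strictly positive and every column still sums to 1 up to rounding, which is the property the perturbation is meant to provide.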

observe

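to compute the stationary vector of the humungous matrix $H$, one uses the power method

$$\pi = \lim_{k \to \infty} H^k \pi_0$$

or, recursively

$$\begin{aligned}
\pi_{k+1} &= H \pi_k \\
&= \alpha P \pi_k + (1 - \alpha) \, v \mathbf{1}^T \pi_k \\
&= \alpha P \pi_k + (1 - \alpha) \, v
\end{aligned}$$

where the last step uses $\mathbf{1}^T \pi_k = 1$, since $\pi_k$ is a stochastic vector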
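the recursion above can be sketched in a few lines of plain Python. This is an illustrative sketch only: the name `pagerank_power`, the stopping rule (1-norm change below `tol`), the start vector $\pi_0 = v$ and the toy matrix are assumptions, and a Web-scale implementation would use sparse data structures rather than nested lists.

```python
def pagerank_power(P, alpha=0.85, v=None, tol=1e-12, max_iter=1000):
    """Iterate pi_{k+1} = alpha * P pi_k + (1 - alpha) * v until the change is small."""
    n = len(P)
    if v is None:
        v = [1.0 / n] * n  # uniform personalization vector
    pi = list(v)           # pi_0: any stochastic start vector works
    for _ in range(max_iter):
        nxt = [alpha * sum(P[i][j] * pi[j] for j in range(n)) + (1 - alpha) * v[i]
               for i in range(n)]
        change = sum(abs(a - b) for a, b in zip(nxt, pi))
        pi = nxt
        if change < tol:   # assumed stopping rule: 1-norm of the update
            break
    return pi


# hypothetical column-stochastic toy matrix (illustration only)
P = [[0.0, 0.5, 1.0],
     [0.5, 0.0, 0.0],
     [0.5, 0.5, 0.0]]

pi = pagerank_power(P)
print(pi)
```

note that the matrix $H$ is never formed explicitly; each step only needs one product with $P$ plus the constant vector $(1 - \alpha) v$.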

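observe

$$\begin{aligned}
\pi_1 &= \alpha P \pi_0 + (1 - \alpha) v \\
\pi_2 &= \alpha P \pi_1 + (1 - \alpha) v \\
&= \alpha P \bigl[ \alpha P \pi_0 + (1 - \alpha) v \bigr] + (1 - \alpha) v \\
&= \alpha^2 P^2 \pi_0 + \alpha P (1 - \alpha) v + (1 - \alpha) v \\
\pi_3 &= \alpha P \pi_2 + (1 - \alpha) v \\
&= \alpha P \bigl[ \alpha^2 P^2 \pi_0 + \alpha P (1 - \alpha) v + (1 - \alpha) v \bigr] + (1 - \alpha) v \\
&= \alpha^3 P^3 \pi_0 + \alpha^2 P^2 (1 - \alpha) v + \alpha P (1 - \alpha) v + (1 - \alpha) v \\
&\;\;\vdots
\end{aligned}$$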

in general

$$\pi_k = (\alpha P)^k \pi_0 + (1 - \alpha) \sum_{i=0}^{k-1} (\alpha P)^i \, v$$

question

what about $\pi = \pi_\infty = \lim_{k \to \infty} \pi_k$ ?

answer

let’s see ...

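observe

$0 < \alpha < 1$ and, since $P$ is stochastic, its spectral radius is $\rho(P) = 1$ according to the Perron-Frobenius theorem; we therefore have

$$\lim_{k \to \infty} (\alpha P)^k = 0$$

and

$$\lim_{k \to \infty} \sum_{i=0}^{k-1} (\alpha P)^i = \bigl[ I - \alpha P \bigr]^{-1}$$

⇒ page rank equation

$$\pi = (1 - \alpha) \bigl[ I - \alpha P \bigr]^{-1} v$$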
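for a small graph, the closed form $\pi = (1 - \alpha) [I - \alpha P]^{-1} v$ can be checked numerically by solving the linear system $(I - \alpha P) \pi = (1 - \alpha) v$ instead of forming the inverse. The sketch below is an assumption-laden illustration: the helper names `solve` and `pagerank_closed_form`, the dense Gaussian elimination and the toy matrix are not from the slides, and a dense solve is only sensible for tiny examples.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [b_i] for row, b_i in zip(M, b)]  # augmented matrix [M | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def pagerank_closed_form(P, alpha=0.85, v=None):
    """Return pi solving (I - alpha * P) pi = (1 - alpha) * v."""
    n = len(P)
    if v is None:
        v = [1.0 / n] * n  # uniform personalization vector
    M = [[(1.0 if i == j else 0.0) - alpha * P[i][j] for j in range(n)]
         for i in range(n)]
    b = [(1 - alpha) * v_i for v_i in v]
    return solve(M, b)


# hypothetical column-stochastic toy matrix (illustration only)
P = [[0.0, 0.5, 1.0],
     [0.5, 0.0, 0.0],
     [0.5, 0.5, 0.0]]

pi = pagerank_closed_form(P)
print(pi)
```

the resulting vector sums to 1 and satisfies $\pi = \alpha P \pi + (1 - \alpha) v$ up to rounding, i.e. it is the fixed point that the power iteration converges to.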

summary

we now know about

Google’s PageRank