Reading Group ML / AI Prof. Christian Bauckhage
outline PageRank
recap
what is the intuition behind Google’s PageRank ?
summary
recap
a graph $G = (V, E)$
$V = \{v_1, v_2, \ldots, v_{12}\}$
$E \subseteq V \times V$
[figure: example graph $G$ with nodes numbered 1 to 12]
recap
adjacency matrix
\[
A_{ij} =
\begin{cases}
1 & \text{if } (v_i, v_j) \in E \\
0 & \text{otherwise}
\end{cases}
\]
Markov matrix
\[
P_{ij} = \frac{A_{ij}}{\sum_l A_{lj}}
\]
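example
here, we have
\[
A = \left( \begin{array}{cccccccccccc}
0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 & 1 & 1 & 1 & 0 & 0 & 1 \\
0 & 1 & 1 & 0 & 1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 1 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 & 1 & 1 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0
\end{array} \right)
\]
the values listed for $P$ are the rows of $A$ divided by their sums; since $A$ is symmetric here, row $j$ of this listing is column $j$ of $P_{ij} = A_{ij} / \sum_l A_{lj}$, that is
\[
P^\top = \left( \begin{array}{cccccccccccc}
0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.25 & 0.25 & 0.00 & 0.00 & 0.25 & 0.00 & 0.25 \\
0.00 & 0.00 & 0.33 & 0.33 & 0.33 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\
0.00 & 0.14 & 0.00 & 0.14 & 0.00 & 0.14 & 0.14 & 0.14 & 0.14 & 0.00 & 0.00 & 0.14 \\
0.00 & 0.20 & 0.20 & 0.00 & 0.20 & 0.20 & 0.00 & 0.20 & 0.00 & 0.00 & 0.00 & 0.00 \\
0.00 & 0.50 & 0.00 & 0.50 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\
0.14 & 0.00 & 0.14 & 0.14 & 0.00 & 0.00 & 0.00 & 0.14 & 0.14 & 0.14 & 0.00 & 0.14 \\
0.33 & 0.00 & 0.33 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.33 & 0.00 \\
0.00 & 0.00 & 0.17 & 0.17 & 0.00 & 0.17 & 0.00 & 0.00 & 0.17 & 0.00 & 0.17 & 0.17 \\
0.00 & 0.00 & 0.33 & 0.00 & 0.00 & 0.33 & 0.00 & 0.33 & 0.00 & 0.00 & 0.00 & 0.00 \\
0.50 & 0.00 & 0.00 & 0.00 & 0.00 & 0.50 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \\
0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.50 & 0.50 & 0.00 & 0.00 & 0.00 & 0.00 \\
0.25 & 0.00 & 0.25 & 0.00 & 0.00 & 0.25 & 0.00 & 0.25 & 0.00 & 0.00 & 0.00 & 0.00
\end{array} \right)
\]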
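The example matrices can be reproduced from the neighbor lists of the graph. The following is a minimal sketch in plain Python, assuming the neighbor lists read off the rows of $A$; the names `NEIGHBORS`, `adjacency_matrix` and `markov_matrix` are illustrative and not part of the slides.

```python
# Neighbor lists of the 12-node example graph, read off the rows of A
# (node labels are 1-based, as on the slides).
NEIGHBORS = {
    1: [6, 7, 10, 12],
    2: [3, 4, 5],
    3: [2, 4, 6, 7, 8, 9, 12],
    4: [2, 3, 5, 6, 8],
    5: [2, 4],
    6: [1, 3, 4, 8, 9, 10, 12],
    7: [1, 3, 11],
    8: [3, 4, 6, 9, 11, 12],
    9: [3, 6, 8],
    10: [1, 6],
    11: [7, 8],
    12: [1, 3, 6, 8],
}


def adjacency_matrix(neighbors):
    """Return A with A[i][j] = 1 if (v_{i+1}, v_{j+1}) is an edge, else 0."""
    n = len(neighbors)
    A = [[0] * n for _ in range(n)]
    for i, nbrs in neighbors.items():
        for j in nbrs:
            A[i - 1][j - 1] = 1
    return A


def markov_matrix(A):
    """Return P with P[i][j] = A[i][j] / sum_l A[l][j], so columns sum to 1."""
    n = len(A)
    col_sums = [sum(A[l][j] for l in range(n)) for j in range(n)]
    return [[A[i][j] / col_sums[j] for j in range(n)] for i in range(n)]


A = adjacency_matrix(NEIGHBORS)
P = markov_matrix(A)

# Column 5 of P (0-based index 4): where a walker sitting at v5 moves next.
print([round(P[i][4], 2) for i in range(12)])
```

Because the graph is undirected, $A$ is symmetric and column $j$ of $P$ equals row $j$ of $A$ divided by the degree of $v_j$.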
recap
page rank matrix
\[
H = \alpha P + (1 - \alpha)\, v \mathbf{1}^\top
\]
page rank vector
\[
\pi = \lim_{k \to \infty} H^k \pi_0 = (1 - \alpha) \bigl[ I - \alpha P \bigr]^{-1} v
\]
where originally $\alpha = 0.85$ and $v = \frac{1}{n} \mathbf{1}$
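The limit $\lim_{k \to \infty} H^k \pi_0$ can be approximated by repeated multiplication. The following is a minimal sketch in plain Python for the 12-node example graph, assuming $\alpha = 0.85$ and uniform $v$; the function name `pagerank`, the tolerance and the iteration cap are illustrative choices. It uses $H \pi = \alpha P \pi + (1 - \alpha) v$, which holds whenever the entries of $\pi$ sum to one.

```python
# Neighbor lists of the 12-node example graph (1-based node labels).
NEIGHBORS = {
    1: [6, 7, 10, 12],
    2: [3, 4, 5],
    3: [2, 4, 6, 7, 8, 9, 12],
    4: [2, 3, 5, 6, 8],
    5: [2, 4],
    6: [1, 3, 4, 8, 9, 10, 12],
    7: [1, 3, 11],
    8: [3, 4, 6, 9, 11, 12],
    9: [3, 6, 8],
    10: [1, 6],
    11: [7, 8],
    12: [1, 3, 6, 8],
}


def pagerank(neighbors, alpha=0.85, tol=1e-12, max_iter=1000):
    """Iterate pi <- alpha * P pi + (1 - alpha) * v with uniform v."""
    nodes = sorted(neighbors)
    n = len(nodes)
    v = {i: 1.0 / n for i in nodes}
    pi = dict(v)  # any probability vector works as a starting point
    for _ in range(max_iter):
        new = {}
        for i in nodes:
            # (P pi)_i = sum over neighbors j of pi_j / deg(j); this form
            # relies on the graph being undirected (A symmetric).
            walk = sum(pi[j] / len(neighbors[j]) for j in neighbors[i])
            new[i] = alpha * walk + (1 - alpha) * v[i]
        delta = max(abs(new[i] - pi[i]) for i in nodes)
        pi = new
        if delta < tol:
            break
    return pi


ranks = pagerank(NEIGHBORS)
for node, rank in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"v{node}: {rank:.3f}")
```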
what is the intuition behind Google’s PageRank ?
a random walk on G
[figure: snapshots of a random walk on the example graph $G$ at times $t = 0, 1, \ldots, 7$]
question
assume that, at $t = 0$, the walker starts in node $v_i$; where will he likely be after $t = 1, 2, \ldots, k$ steps ?
answer
let’s see . . .
observe
we simply consider the Markov chain $\pi_t = P \pi_{t-1}$ with initial state distribution $\pi_0 = e_i \in \mathbb{R}^n$
observe
unrolling the recursion, we find
\[
\begin{aligned}
\pi_1 &= P \pi_0 \\
\pi_2 &= P \pi_1 = P P \pi_0 \\
\pi_3 &= P \pi_2 = P P P \pi_0 \\
&\;\;\vdots
\end{aligned}
\qquad \Leftrightarrow \qquad \pi_k = P^k \pi_0
\]
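The unrolled recursion translates directly into a loop. The following is a minimal sketch in plain Python for the 12-node example graph with the walker starting in $v_5$; the helper names `step` and `distribution_after` are illustrative.

```python
# Neighbor lists of the 12-node example graph (1-based node labels).
NEIGHBORS = {
    1: [6, 7, 10, 12],
    2: [3, 4, 5],
    3: [2, 4, 6, 7, 8, 9, 12],
    4: [2, 3, 5, 6, 8],
    5: [2, 4],
    6: [1, 3, 4, 8, 9, 10, 12],
    7: [1, 3, 11],
    8: [3, 4, 6, 9, 11, 12],
    9: [3, 6, 8],
    10: [1, 6],
    11: [7, 8],
    12: [1, 3, 6, 8],
}


def step(pi, neighbors):
    """One step pi_t = P pi_{t-1} on an undirected graph."""
    return {
        i: sum(pi[j] / len(neighbors[j]) for j in neighbors[i])
        for i in neighbors
    }


def distribution_after(k, start, neighbors):
    """Return pi_k = P^k pi_0 for pi_0 = e_start."""
    pi = {i: 1.0 if i == start else 0.0 for i in neighbors}
    for _ in range(k):
        pi = step(pi, neighbors)
    return pi


for k in range(4):
    pi_k = distribution_after(k, start=5, neighbors=NEIGHBORS)
    print(k, [round(pi_k[i], 2) for i in sorted(pi_k)])
```

Applying `step` $k$ times costs $k$ matrix-vector products, which is cheaper than forming the matrix power $P^k$ explicitly.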
observe
in our example, the walker starts in $v_5$ and we have (row $t$ lists the entries of $\pi_t$ for the nodes $v_1, \ldots, v_{12}$)

 t    v1    v2    v3    v4    v5    v6    v7    v8    v9   v10   v11   v12
 0  0.00  0.00  0.00  0.00  1.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
 1  0.00  0.50  0.00  0.50  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
 2  0.00  0.10  0.27  0.17  0.27  0.10  0.00  0.10  0.00  0.00  0.00  0.00
 3  0.01  0.20  0.10  0.24  0.07  0.09  0.04  0.09  0.07  0.01  0.02  0.07
 4  0.05  0.09  0.20  0.14  0.12  0.13  0.03  0.12  0.04  0.02  0.03  0.04
 5  0.05  0.11  0.13  0.16  0.06  0.12  0.05  0.11  0.07  0.03  0.03  0.08
 6  0.07  0.08  0.17  0.12  0.07  0.14  0.04  0.12  0.06  0.03  0.04  0.07
 7  0.07  0.08  0.14  0.12  0.05  0.14  0.06  0.12  0.06  0.04  0.04  0.08
 8  0.08  0.07  0.15  0.11  0.05  0.14  0.05  0.12  0.06  0.04  0.04  0.08
 9  0.08  0.07  0.14  0.11  0.05  0.14  0.06  0.12  0.06  0.04  0.04  0.08
10  0.08  0.07  0.15  0.11  0.05  0.14  0.06  0.12  0.06  0.04  0.04  0.08
11  0.08  0.07  0.15  0.11  0.04  0.14  0.06  0.12  0.06  0.04  0.04  0.08
12  0.08  0.06  0.15  0.11  0.04  0.15  0.06  0.12  0.06  0.04  0.04  0.08
13  0.08  0.06  0.15  0.11  0.04  0.15  0.06  0.12  0.06  0.04  0.04  0.08
14  0.08  0.06  0.15  0.10  0.04  0.15  0.06  0.12  0.06  0.04  0.04  0.08
15  0.08  0.06  0.15  0.10  0.04  0.15  0.06  0.12  0.06  0.04  0.04  0.08
observe
if the walker starts in $v_3$, we have (row $t$ lists the entries of $\pi_t$ for the nodes $v_1, \ldots, v_{12}$)

 t    v1    v2    v3    v4    v5    v6    v7    v8    v9   v10   v11   v12
 0  0.00  0.00  1.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
 1  0.00  0.14  0.00  0.14  0.00  0.14  0.14  0.14  0.14  0.00  0.00  0.14
 2  0.10  0.03  0.25  0.09  0.08  0.14  0.00  0.13  0.04  0.02  0.07  0.04
 3  0.04  0.09  0.10  0.12  0.03  0.14  0.10  0.14  0.08  0.05  0.02  0.10
 4  0.10  0.05  0.18  0.10  0.06  0.15  0.03  0.12  0.06  0.03  0.06  0.07
 5  0.06  0.07  0.13  0.11  0.04  0.14  0.08  0.13  0.07  0.05  0.03  0.09
 6  0.09  0.06  0.16  0.10  0.05  0.15  0.05  0.12  0.06  0.04  0.05  0.08
 7  0.07  0.07  0.14  0.11  0.04  0.14  0.07  0.13  0.06  0.04  0.04  0.09
 8  0.09  0.06  0.15  0.10  0.04  0.15  0.06  0.12  0.06  0.04  0.04  0.08
 9  0.08  0.06  0.14  0.11  0.04  0.15  0.07  0.13  0.06  0.04  0.04  0.09
10  0.09  0.06  0.15  0.10  0.04  0.15  0.06  0.12  0.06  0.04  0.04  0.08
11  0.08  0.06  0.14  0.10  0.04  0.15  0.06  0.13  0.06  0.04  0.04  0.08
observe
both processes
\[
\pi = \lim_{k \to \infty} P^k \pi_0
\]
converge to the same stationary distribution $\pi = P \pi$
yet, the process starting in $\pi_0 = e_3$ converges slightly faster than the one starting in $\pi_0 = e_5$
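Convergence of the two processes can be tracked numerically. The following is a minimal sketch in plain Python; it relies on a standard fact that is not stated on the slides, namely that the stationary distribution of a random walk on a connected undirected graph is proportional to the node degrees. The total variation distance is one possible measure of closeness, chosen here for illustration.

```python
# Neighbor lists of the 12-node example graph (1-based node labels).
NEIGHBORS = {
    1: [6, 7, 10, 12],
    2: [3, 4, 5],
    3: [2, 4, 6, 7, 8, 9, 12],
    4: [2, 3, 5, 6, 8],
    5: [2, 4],
    6: [1, 3, 4, 8, 9, 10, 12],
    7: [1, 3, 11],
    8: [3, 4, 6, 9, 11, 12],
    9: [3, 6, 8],
    10: [1, 6],
    11: [7, 8],
    12: [1, 3, 6, 8],
}


def step(pi, neighbors):
    """One step pi_t = P pi_{t-1} on an undirected graph."""
    return {
        i: sum(pi[j] / len(neighbors[j]) for j in neighbors[i])
        for i in neighbors
    }


def total_variation(p, q):
    """Half the L1 distance between two distributions over the same nodes."""
    return 0.5 * sum(abs(p[i] - q[i]) for i in p)


# Assumption (standard result): stationary probabilities are degree / sum of degrees.
total_degree = sum(len(nbrs) for nbrs in NEIGHBORS.values())
stationary = {i: len(nbrs) / total_degree for i, nbrs in NEIGHBORS.items()}


def distances(start, steps):
    """Distances of pi_0, ..., pi_steps to the stationary distribution."""
    pi = {i: 1.0 if i == start else 0.0 for i in NEIGHBORS}
    out = []
    for _ in range(steps + 1):
        out.append(total_variation(pi, stationary))
        pi = step(pi, NEIGHBORS)
    return out


from_v3 = distances(3, 15)
from_v5 = distances(5, 15)
for t, (d3, d5) in enumerate(zip(from_v3, from_v5)):
    print(f"t={t:2d}  start v3: {d3:.3f}  start v5: {d5:.3f}")
```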
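note
if a Markov chain is time-homogeneous, its transition matrix $P$ is the same in each step
if a time-homogeneous Markov chain is irreducible and aperiodic, its stationary distribution $\pi$ is unique
$\Leftrightarrow$ if a time-homogeneous Markov chain is irreducible and aperiodic, we have
\[
\lim_{k \to \infty} P^k = \pi \mathbf{1}^\top
\]
$\Leftrightarrow$
\[
\pi = \lim_{k \to \infty} P^k \pi_0 = \bigl[ \pi \mathbf{1}^\top \bigr] \pi_0 = \pi \bigl( \mathbf{1}^\top \pi_0 \bigr) = \pi
\]
where the last step uses $\mathbf{1}^\top \pi_0 = 1$, since $\pi_0$ is a probability distribution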
note
there is a crucial difference between where a random walker likely is and where a random walker really is
at time $t$, the entries $\pi_i$ of $\pi_t$ denote probabilities for the walker to be in node $v_i$
however, at time $t$, the walker can be in only one node $v_j$
we may think of this node as being randomly chosen or drawn according to the probabilities in $\pi_t$
to express this, we write $v_j \sim \pi_t$
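The difference between the distribution $\pi_t$ and a single draw $v_j \sim \pi_t$ can be illustrated by simulation. The following is a minimal sketch in plain Python: each simulated walker ends up in exactly one node, and only the frequencies over many walkers approximate $\pi_t$. The seed, the number of walkers and the helper name `walk` are arbitrary illustrative choices.

```python
import random
from collections import Counter

# Neighbor lists of the 12-node example graph (1-based node labels).
NEIGHBORS = {
    1: [6, 7, 10, 12],
    2: [3, 4, 5],
    3: [2, 4, 6, 7, 8, 9, 12],
    4: [2, 3, 5, 6, 8],
    5: [2, 4],
    6: [1, 3, 4, 8, 9, 10, 12],
    7: [1, 3, 11],
    8: [3, 4, 6, 9, 11, 12],
    9: [3, 6, 8],
    10: [1, 6],
    11: [7, 8],
    12: [1, 3, 6, 8],
}


def walk(start, steps, rng):
    """Return the single node a walker occupies after `steps` random moves."""
    node = start
    for _ in range(steps):
        node = rng.choice(NEIGHBORS[node])  # move to a uniformly chosen neighbor
    return node


rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
num_walkers = 20000
counts = Counter(walk(5, 2, rng) for _ in range(num_walkers))

# Empirical frequencies over many walkers approximate pi_2 for pi_0 = e_5.
empirical = {i: counts[i] / num_walkers for i in NEIGHBORS}
print({i: round(p, 2) for i, p in empirical.items()})
```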
question
OK, so we know about the random walk $\pi_t = P \pi_{t-1}$, but what about the page rank process
\[
\pi_t = \bigl[ \alpha P + (1 - \alpha)\, v \mathbf{1}^\top \bigr] \pi_{t-1} = \alpha P \pi_{t-1} + (1 - \alpha)\, v
\]
answer
let’s see . . .
observe
the page rank process models a random surfer
at each time $t$, the surfer is on a Web page and either randomly clicks a link leading to another page or enters a random URL to get to an unrelated page
$\Leftrightarrow$
with probability $\alpha$, the surfer continues the random walk at $v_j \sim \pi_t = P \pi_{t-1}$
with probability $1 - \alpha$, the surfer restarts the random walk at a new page $v_j \sim v$
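The two cases above can be simulated directly. The following is a minimal sketch in plain Python of a single long surf on the 12-node example graph, assuming $\alpha = 0.85$ and a uniform restart distribution $v$; the function name `surf`, the seed and the number of steps are illustrative choices. The visit frequencies approximate the page rank vector.

```python
import random
from collections import Counter

# Neighbor lists of the 12-node example graph (1-based node labels).
NEIGHBORS = {
    1: [6, 7, 10, 12],
    2: [3, 4, 5],
    3: [2, 4, 6, 7, 8, 9, 12],
    4: [2, 3, 5, 6, 8],
    5: [2, 4],
    6: [1, 3, 4, 8, 9, 10, 12],
    7: [1, 3, 11],
    8: [3, 4, 6, 9, 11, 12],
    9: [3, 6, 8],
    10: [1, 6],
    11: [7, 8],
    12: [1, 3, 6, 8],
}


def surf(num_steps, alpha=0.85, seed=0):
    """Simulate a random surfer and return the visit frequency of each node."""
    rng = random.Random(seed)
    nodes = sorted(NEIGHBORS)
    node = rng.choice(nodes)  # arbitrary starting page
    visits = Counter()
    for _ in range(num_steps):
        if rng.random() < alpha:
            node = rng.choice(NEIGHBORS[node])  # follow a random link
        else:
            node = rng.choice(nodes)  # restart at a page drawn from uniform v
        visits[node] += 1
    return {i: visits[i] / num_steps for i in nodes}


frequencies = surf(200_000)
for node, freq in sorted(frequencies.items(), key=lambda item: -item[1]):
    print(f"v{node}: {freq:.3f}")
```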
observe
the entries $\pi_i$ of the page rank vector
\[
\pi = (1 - \alpha) \bigl[ I - \alpha P \bigr]^{-1} v
\]
thus simply state how likely it is that a surfer who performs an infinitely long random walk with restarts ends up on page $v_i$
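The closed form can be recovered from the fixed point of the page rank process. A short derivation, assuming $0 < \alpha < 1$ (so that $I - \alpha P$ is invertible, because the column-stochastic matrix $P$ has spectral radius 1) and $\mathbf{1}^\top \pi = 1$:

```latex
\begin{align*}
\pi &= \alpha P \pi + (1 - \alpha)\, v \\
\pi - \alpha P \pi &= (1 - \alpha)\, v \\
\bigl[ I - \alpha P \bigr] \pi &= (1 - \alpha)\, v \\
\pi &= (1 - \alpha) \bigl[ I - \alpha P \bigr]^{-1} v
\end{align*}
```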
summary
we now know about
the intuition behind Google’s PageRank