Graph-based Algorithms for Information Retrieval and Natural ...

HLT/NAACL 2006


Example graph G = (V, E):
V = {a, b, c, d, e}
E = {(a,b), (a,c), (a,d), (b,e), (c,d), (c,e), (d,e)}
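A minimal Python sketch of one common representation, an adjacency list, for the example graph, with node degrees read off from it. It assumes the edges are undirected, which the example does not state. The names `EDGES`, `adjacency` and `degrees` are illustrative only.

```python
from collections import defaultdict

# Edge list of the example graph, treated here as undirected (an assumption).
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "e"),
         ("c", "d"), ("c", "e"), ("d", "e")]

def adjacency(edges):
    """Build an undirected adjacency list: each edge is stored in both directions."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def degrees(adj):
    """Degree of a node = number of distinct neighbours."""
    return {node: len(nbrs) for node, nbrs in adj.items()}

adj = adjacency(EDGES)
deg = degrees(adj)
print(sorted(deg.items()))  # [('a', 3), ('b', 2), ('c', 3), ('d', 3), ('e', 3)]
# Handshake lemma: the degrees sum to twice the number of edges.
assert sum(deg.values()) == 2 * len(EDGES)
```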




[Figure: example graphs G2 and G3, including a directed graph whose nodes are labelled with their in-degree and out-degree, e.g. "in: 1, out: 1", "in: 1, out: 2", "in: 1, out: 0"]
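The edges of the directed graphs in the figure cannot be recovered, so this sketch uses a hypothetical edge list: the earlier example edges read as u -> v. It computes each node's in-degree (edges arriving) and out-degree (edges leaving). The `DIRECTED_EDGES` list and the `in_out_degrees` helper are illustrative assumptions.

```python
from collections import Counter

# Hypothetical directed edge list (the earlier example edges read as u -> v).
DIRECTED_EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "e"),
                  ("c", "d"), ("c", "e"), ("d", "e")]

def in_out_degrees(edges):
    """Return {node: (in_degree, out_degree)} for a directed edge list."""
    out_deg = Counter(u for u, _ in edges)   # edges leaving each node
    in_deg = Counter(v for _, v in edges)    # edges arriving at each node
    nodes = set(out_deg) | set(in_deg)
    # Counter returns 0 for missing keys, so sources and sinks are handled.
    return {n: (in_deg[n], out_deg[n]) for n in sorted(nodes)}

degs = in_out_degrees(DIRECTED_EDGES)
for node, (din, dout) in degs.items():
    print(f"{node}: in: {din}, out: {dout}")
```

In this hypothetical graph, node a is a source (in: 0) and node e is a sink (out: 0).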


Laplace's equation: $\nabla^2 u = u_{xx} + u_{yy} = 0$
Heat (diffusion) equation: $k\nabla^2 u = u_t$
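The steady state of the heat equation ($u_t = 0$) satisfies Laplace's equation. A standard discrete way to reach it is Jacobi relaxation with the five-point finite-difference stencil, which repeatedly replaces each free grid value by the mean of its four neighbours. This is the same averaging property that reappears below as harmonic functions on graphs. The sketch assumes a square grid whose border values are held fixed. The function name and parameters are illustrative.

```python
import numpy as np

def solve_laplace_jacobi(u0, fixed, tol=1e-10, max_iter=100_000):
    """Jacobi relaxation for the discrete Laplace equation on a 2-D grid.

    u0    -- initial grid; cells where `fixed` is True hold boundary values
    fixed -- boolean mask of cells that are never updated (must include the border)
    """
    u = np.asarray(u0, dtype=float).copy()
    for _ in range(max_iter):
        # Five-point stencil: mean of the four neighbours.
        neighbour_mean = 0.25 * (np.roll(u, 1, axis=0) + np.roll(u, -1, axis=0)
                                 + np.roll(u, 1, axis=1) + np.roll(u, -1, axis=1))
        # np.roll wraps around, but wrapped values only land on border cells,
        # which are fixed, so interior updates are unaffected.
        new = np.where(fixed, u, neighbour_mean)
        if np.max(np.abs(new - u)) < tol:
            return new
        u = new
    return u

# Boundary values u(x, y) = x; x is harmonic, so the interior should converge to x.
n = 11
X = np.tile(np.linspace(0.0, 1.0, n), (n, 1))   # X[i, j] = j / (n - 1)
border = np.zeros((n, n), dtype=bool)
border[0, :] = border[-1, :] = border[:, 0] = border[:, -1] = True
u = solve_laplace_jacobi(np.where(border, X, 0.0), border)
print(np.max(np.abs(u - X)))  # small residual error
```

Each Jacobi sweep equals one explicit Euler step of the heat equation with $k\,\Delta t / h^2 = 1/4$. Iterating to convergence therefore follows the diffusion to its steady state.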


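Eigenvalues and eigenvectors: a nonzero vector v is an eigenvector of a square matrix A, with eigenvalue λ, if
$A v = \lambda v$
The eigenvalues are the roots of the characteristic equation
$\det(A - \lambda I) = 0$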


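Example:
$A = \begin{pmatrix} -1 & 3 \\ 2 & 0 \end{pmatrix}, \qquad A - \lambda I = \begin{pmatrix} -1-\lambda & 3 \\ 2 & -\lambda \end{pmatrix}$
$\det(A - \lambda I) = (-1-\lambda)(-\lambda) - 3 \cdot 2 = \lambda^2 + \lambda - 6 = (\lambda + 3)(\lambda - 2) = 0$, so $\lambda_1 = 2$ and $\lambda_2 = -3$.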
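A quick numerical check of the worked example, sketched with NumPy. `np.roots` finds the roots of the characteristic polynomial $\lambda^2 + \lambda - 6$. `np.linalg.eig` computes the eigenpairs of A directly, and each pair is then checked against $A v = \lambda v$.

```python
import numpy as np

# The 2x2 example matrix.
A = np.array([[-1.0, 3.0],
              [ 2.0, 0.0]])

# Roots of the characteristic polynomial det(A - lambda I) = lambda^2 + lambda - 6.
char_roots = np.roots([1.0, 1.0, -6.0])

# General eigen-solver: eigenvalues plus unit-norm eigenvectors stored as columns.
eigvals, eigvecs = np.linalg.eig(A)

for lam, v in zip(eigvals, eigvecs.T):
    # Each pair should satisfy A v = lambda v up to floating-point error.
    assert np.allclose(A @ v, lam * v)

print(np.sort(eigvals))     # approximately [-3.  2.]
print(np.sort(char_roots))  # approximately [-3.  2.]
```

Up to scaling, the eigenvectors are (1, 1) for λ = 2 and (3, −2) for λ = −3.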

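Random walks and electrical networks (example from Doyle and Snell 2000)
Each edge has resistance $R_{xy}$ and conductance $C_{xy} = 1/R_{xy}$. With $C_x = \sum_y C_{xy}$, the random walk moves from x to y with probability
$P_{xy} = \frac{C_{xy}}{C_x}$
[Figure: network on nodes a, b, c, d with resistances R_ac = R_ad = R_bc = 1 and R_bd = R_cd = 0.5]
Transition matrix P:
       a      b      c      d
a      0      0      1/2    1/2
b      0      0      1/3    2/3
c      1/4    1/4    0      1/2
d      1/5    2/5    2/5    0
The corresponding conductances are C_ac = C_ad = C_bc = 1 and C_bd = C_cd = 2, so C_a = 2, C_b = 3, C_c = 4, C_d = 5.
The stationary distribution satisfies $w = P^{T} w$. Here $w = \left(\tfrac{2}{14}, \tfrac{3}{14}, \tfrac{4}{14}, \tfrac{5}{14}\right)$ for (a, b, c, d), i.e. $w_x = C_x / \sum_y C_y$.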
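A small exact-arithmetic sketch of the example. It builds $P_{xy} = C_{xy}/C_x$ from edge conductances and checks that $w_x = C_x / \sum_y C_y$ satisfies $w = P^T w$. The conductance values are the ones implied by the transition matrix. The `CONDUCTANCE` dictionary and the `conductance` helper are illustrative names.

```python
from fractions import Fraction

# Edge conductances implied by the example's transition matrix.
CONDUCTANCE = {("a", "c"): 1, ("a", "d"): 1, ("b", "c"): 1, ("b", "d"): 2, ("c", "d"): 2}
NODES = ["a", "b", "c", "d"]

def conductance(x, y):
    """Symmetric conductance C_xy; 0 when x and y are not joined."""
    return CONDUCTANCE.get((x, y), CONDUCTANCE.get((y, x), 0))

# C_x = sum_y C_xy and P_xy = C_xy / C_x, as exact fractions.
C = {x: sum(conductance(x, y) for y in NODES) for x in NODES}
P = {x: {y: Fraction(conductance(x, y), C[x]) for y in NODES} for x in NODES}

# Candidate stationary distribution w_x = C_x / sum_y C_y.
total = sum(C.values())
w = {x: Fraction(C[x], total) for x in NODES}

# Verify w = P^T w, i.e. w_y = sum_x w_x P_xy for every node y.
for y in NODES:
    assert sum(w[x] * P[x][y] for x in NODES) == w[y]

print({y: str(p) for y, p in P["c"].items()})  # {'a': '1/4', 'b': '1/4', 'c': '0', 'd': '1/2'}
print({x: str(p) for x, p in w.items()})       # {'a': '1/7', 'b': '3/14', 'c': '2/7', 'd': '5/14'}
```

`Fraction` reduces automatically, so 2/14 prints as 1/7 and 4/14 as 2/7. The check works because $\sum_x C_x P_{xy} = \sum_x C_{xy} = C_y$.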

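Voltages and currents: put a unit voltage at a ($v_a = 1$) and ground b ($v_b = 0$). Ohm's law gives the current from x to y as
$i_{xy} = \frac{v_x - v_y}{R_{xy}} = (v_x - v_y)\,C_{xy}$
Kirchhoff's current law requires $\sum_y i_{xy} = 0$ for $x \neq a, b$. Substituting the first equation into the second gives $C_x v_x = \sum_y C_{xy} v_y$. The voltage is therefore harmonic at every node other than a and b:
$v_x = \sum_y \frac{C_{xy}}{C_x}\, v_y = \sum_y P_{xy}\, v_y$
For the example network:
$v_a = 1, \quad v_b = 0, \quad v_c = \frac{1}{4} + \frac{1}{2} v_d = \frac{7}{16}, \quad v_d = \frac{1}{5} + \frac{2}{5} v_c = \frac{3}{8}$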
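The two harmonic conditions at c and d form a 2x2 linear system. This sketch assembles it from Kirchhoff's law, $C_x v_x - \sum_{y\ \text{interior}} C_{xy} v_y = \sum_{y \in \{a,b\}} C_{xy} v_y$, and solves it exactly with Cramer's rule. It then checks that the currents balance at c and d. The conductances are those implied by the transition matrix. All helper names are illustrative.

```python
from fractions import Fraction

# Conductances implied by the example's transition matrix.
CONDUCTANCE = {("a", "c"): 1, ("a", "d"): 1, ("b", "c"): 1, ("b", "d"): 2, ("c", "d"): 2}
NODES = ["a", "b", "c", "d"]

def conductance(x, y):
    """Symmetric conductance C_xy; 0 when x and y are not joined."""
    return CONDUCTANCE.get((x, y), CONDUCTANCE.get((y, x), 0))

C = {x: sum(conductance(x, y) for y in NODES) for x in NODES}
boundary = {"a": Fraction(1), "b": Fraction(0)}   # unit voltage at a, b grounded
interior = ["c", "d"]

# Kirchhoff at interior x: C_x v_x - sum_{y interior} C_xy v_y = sum_{y boundary} C_xy v_y
M = [[Fraction(C[x]) if x == y else Fraction(-conductance(x, y)) for y in interior]
     for x in interior]
rhs = [sum(conductance(x, y) * vb for y, vb in boundary.items()) for x in interior]

# Two unknowns, so Cramer's rule gives the exact solution.
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
v = dict(boundary)
v["c"] = (rhs[0] * M[1][1] - M[0][1] * rhs[1]) / det
v["d"] = (M[0][0] * rhs[1] - rhs[0] * M[1][0]) / det
print(v["c"], v["d"])  # 7/16 3/8

# Currents i_xy = (v_x - v_y) C_xy balance at each interior node,
# equivalently v_x = sum_y P_xy v_y there.
for x in interior:
    assert sum((v[x] - v[y]) * conductance(x, y) for y in NODES) == 0
    assert v[x] == sum(Fraction(conductance(x, y), C[x]) * v[y] for y in NODES)
```

Larger networks would need a general linear solver in place of Cramer's rule. The equations themselves stay the same.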

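Energy: a current $i_{xy}$ through a resistance $R_{xy}$ dissipates $i_{xy}^2 R_{xy}$, so the total energy dissipation is
$E = \frac{1}{2}\sum_{x,y} i_{xy}^2 R_{xy}$
(the factor 1/2 compensates for counting each edge as both (x, y) and (y, x)).
A flow j from a to b satisfies $j_{xy} = -j_{yx}$ and $\sum_y j_{xy} = 0$ for $x \neq a, b$. For any function w on the nodes and any flow j from a to b, with $j_a = \sum_y j_{ay}$,
$(w_a - w_b)\,j_a = \frac{1}{2}\sum_{x,y} (w_x - w_y)\,j_{xy}$
Take w = v and j = i, and write $v_a - v_b = i_a R_{\text{eff}}$, where $R_{\text{eff}}$ is the effective resistance between a and b. This gives
$i_a^2 R_{\text{eff}} = \frac{1}{2}\sum_{x,y} i_{xy}^2 R_{xy} = E$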
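A numerical check of the energy identity on the example network, using the solved voltages $v_c = 7/16$ and $v_d = 3/8$. The sketch computes the currents, the dissipated energy E, the total current $i_a$ leaving a, and $R_{\text{eff}} = (v_a - v_b)/i_a$. It then confirms $E = i_a^2 R_{\text{eff}}$ and the general flow identity for an arbitrary node function w. The conductances and the choice of w are illustrative.

```python
from fractions import Fraction

# Conductances and solved voltages of the example (v_a = 1, v_b = 0).
CONDUCTANCE = {("a", "c"): 1, ("a", "d"): 1, ("b", "c"): 1, ("b", "d"): 2, ("c", "d"): 2}
v = {"a": Fraction(1), "b": Fraction(0), "c": Fraction(7, 16), "d": Fraction(3, 8)}

# Current on every ordered pair of joined nodes: i_xy = (v_x - v_y) C_xy and i_yx = -i_xy.
i = {}
for (x, y), cxy in CONDUCTANCE.items():
    i[(x, y)] = (v[x] - v[y]) * cxy
    i[(y, x)] = -i[(x, y)]

def resistance(x, y):
    """R_xy = 1 / C_xy for a pair of joined nodes."""
    cxy = CONDUCTANCE.get((x, y)) or CONDUCTANCE[(y, x)]
    return Fraction(1, cxy)

# Total energy dissipation E = 1/2 * sum over ordered pairs of i_xy^2 R_xy.
energy = Fraction(1, 2) * sum(cur ** 2 * resistance(x, y) for (x, y), cur in i.items())

# Current leaving a, and effective resistance R_eff = (v_a - v_b) / i_a.
i_a = sum(cur for (x, _), cur in i.items() if x == "a")
r_eff = (v["a"] - v["b"]) / i_a
print(energy, i_a, r_eff)  # 19/16 19/16 16/19
assert energy == i_a ** 2 * r_eff

# The flow identity holds for any node function w; this w is arbitrary.
w = {"a": 3, "b": -1, "c": 5, "d": 2}
lhs = (w["a"] - w["b"]) * i_a
rhs = Fraction(1, 2) * sum((w[x] - w[y]) * cur for (x, y), cur in i.items())
assert lhs == rhs
```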

[Figure: a graph G (V, E) and its transition matrix]


[Figure: PageRank of nodes 1–8 at t = 0 and t = 1]
This graph G has a second graph G′ (not drawn) superimposed on it. G′ is a uniform transition graph, in which the walk can jump from any node to any node with equal probability.
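A sketch of superimposing a uniform transition graph G′ on a row-stochastic matrix G, as the weighted mix d·G + (1 − d)·G′. The source does not give the weight. The default d = 0.85 is an assumption, a value commonly used for PageRank. The 4-node directed cycle is a hypothetical example. On G alone the walk is periodic and never settles from a point-mass start. The mixed matrix has all entries positive, so the walk converges.

```python
import numpy as np

def mix_with_uniform(G, d=0.85):
    """Return d * G + (1 - d) * U, where U has every entry equal to 1/N.

    G must be row-stochastic; d = 0.85 is an assumed weight, not from the source.
    """
    G = np.asarray(G, dtype=float)
    n = G.shape[0]
    U = np.full((n, n), 1.0 / n)   # uniform transition graph G'
    return d * G + (1.0 - d) * U

# Hypothetical directed cycle 0 -> 1 -> 2 -> 3 -> 0 (G[i, i+1 mod 4] = 1).
G = np.roll(np.eye(4), 1, axis=1)
M = mix_with_uniform(G)

# Start both walks at node 0 and apply the transpose 200 times.
p_G = p_M = np.array([1.0, 0.0, 0.0, 0.0])
for _ in range(200):
    p_G, p_M = G.T @ p_G, M.T @ p_M
print(p_G)           # [1. 0. 0. 0.]  (back at node 0: the cycle has period 4)
print(p_M.round(3))  # [0.25 0.25 0.25 0.25]
```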

Solution for the stationary distribution:
$p = G^{T} p \quad\Longleftrightarrow\quad (I - G^{T})\,p = 0$
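The homogeneous system $(I - G^T)p = 0$ has only the trivial solution unless one constraint is added. A common direct approach is to replace one of its (linearly dependent) equations by the normalisation $\sum_i p_i = 1$ and call a linear solver. This sketch assumes the stationary distribution is unique, as it is for an irreducible G. The 3-node matrix is a hypothetical example.

```python
import numpy as np

def stationary_direct(G):
    """Solve (I - G^T) p = 0 together with sum(p) = 1.

    The n equations of (I - G^T) p = 0 are linearly dependent, so the last one
    is replaced by the normalisation constraint. Assumes a unique solution.
    """
    G = np.asarray(G, dtype=float)
    n = G.shape[0]
    A = np.eye(n) - G.T
    A[-1, :] = 1.0          # last equation becomes sum(p) = 1
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

# Hypothetical row-stochastic transition matrix.
G = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
p = stationary_direct(G)
print(p)  # approximately [0.444 0.333 0.222], i.e. (4/9, 3/9, 2/9)
```

A direct solve costs $O(n^3)$. This is why iterative methods are preferred for large graphs.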

function PowerStatDist(G):
begin
    p(0) = u;   (or p(0) = [1,0,…0])
    i = 1;
    repeat
        p(i) = G^T p(i-1);
        L = ||p(i) - p(i-1)||_1;
        i = i + 1;
    until L < ε
    return p(i-1)
end
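A minimal Python rendering of PowerStatDist with NumPy. G is a row-stochastic matrix. The default start vector is the uniform vector, and the loop stops once the L1 change between iterates drops below `eps`. The `max_iter` cap is an added safeguard that is not in the pseudocode. The 3-node matrix is a hypothetical example.

```python
import numpy as np

def power_stat_dist(G, eps=1e-10, max_iter=10_000, p0=None):
    """Power iteration for the stationary distribution of a row-stochastic G.

    p0 defaults to the uniform vector; max_iter is a safety cap (an addition).
    """
    G = np.asarray(G, dtype=float)
    n = G.shape[0]
    p = np.full(n, 1.0 / n) if p0 is None else np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        p_next = G.T @ p                   # p(i) = G^T p(i-1)
        delta = np.abs(p_next - p).sum()   # L = ||p(i) - p(i-1)||_1
        p = p_next
        if delta < eps:                    # until L < eps
            break
    return p

# Hypothetical transition matrix (irreducible and aperiodic, so the iteration converges).
G = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
p_uniform_start = power_stat_dist(G)
p_point_start = power_stat_dist(G, p0=[1.0, 0.0, 0.0])
print(p_uniform_start.round(4))  # [0.4444 0.3333 0.2222]
```

Both starting vectors from the pseudocode reach the same limit. On a periodic graph the loop would not converge, which is one reason to superimpose the uniform graph G′.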

[Figure: PageRank of nodes 1–8 at t = 0, t = 1 and t = 10]


[Figures: two examples from Zhu et al. 2003]

Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29. Mr. Vinken is chairman of Elsevier N.V. , the Dutch publishing group. Rudolph Agnew , 55 years old and former chairman of Consolidated Gold Fields PLC , was named a nonexecutive director of this British industrial conglomerate. A form of asbestos once used to make Kent cigarette filters has caused a high percentage of cancer deaths among a group of workers exposed to it more than 30 years ago , researchers reported . The asbestos fiber , crocidolite , is unusually resilient once it enters the lungs , with even brief exposures to it causing symptoms that show up decades later , researchers said . Lorillard Inc. , the unit of New York-based Loews Corp. that makes Kent cigarettes , stopped using crocidolite in its Micronite cigarette filters in 1956 . Although preliminary findings were reported more than a year ago , the latest results appear in today 's New England Journal of Medicine , a forum likely to bring new attention to the problem .

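Prepositional-phrase attachment tuples from the sentences above, with their attachment label (V = attaches to the verb, N = attaches to the noun):
Label  Verb        Noun 1          Preposition  Noun 2
V      x02_join    x01_board       x0_as        x11_director
N      x02_is      x01_chairman    x0_of        x11_entitynam
N      x02_name    x01_director    x0_of        x11_conglomer
N      x02_caus    x01_percentag   x0_of        x11_death
V      x02_us      x01_crocidolit  x0_in        x11_filter
V      x02_bring   x01_attent      x0_to        x11_problem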
