
Structured Composition of Semantic Vectors Stephen Wu Division of Biomedical Statistics and Informatics Mayo Clinic

January 13, 2011 | IWCS


Outline

1. Introduction
   - Overview
   - Related Work
2. Structured Vectorial Semantics
   - Vector Composition
   - Semantically-annotated Parsing
   - Distributed Semantics in SVS
3. Evaluation
   - Model Fit
   - Parsing
   - Speed Performance


Big Picture

Distributed semantic vector composition (e.g., vectors such as (.5, .1, .2) for "the" and (.1, .1, .1) for "engineers") + (syntactic) parsing = Structured Vectorial Semantics (SVS): every constituent in the parse tree carries a semantic vector.

[Figure: the parse tree S → NP VP for "the engineers pulled off ...", with NP → DT ("the") NN ("engineers"); in the SVS version each node is paired with a vector, e.g. NP: (.1, .2, .1), DT: (.1, .2, .1), NN: (.5, .1, .1).]

Weaknesses of Distributed Semantic Models

1. No compositionality.
   Ex 1: "Patient is a 48-year-old male with no significant past medical history complaining of abdominal pain."

2. Bag-of-words independence assumption.
   Ex 2a: "Significant improvement of health outcomes followed the drastic overhaul of surgical pre-operation procedure."
   Ex 2b: "Significant overhaul of surgical pre-operation procedure followed the drastic improvement of health outcomes."

⇒ Structured Vectorial Semantics (SVS)

Vector Composition Background

General definition (Mitchell & Lapata '08):

  e_γ = f( e_α, e_β, M, L )

where e_γ is the target vector, e_α and e_β are the source vectors, M is syntax, and L is knowledge.

Simple instantiations:

  Add:  e_γ[i] = e_α[i] + e_β[i]
  Mult: e_γ[i] = e_α[i] · e_β[i]

Related work: syntactic context (Kintsch '01); predicate–argument structure and selectional preferences (Erk & Padó '08); language models (Mitchell & Lapata '09); matrices (Rudolph & Giesbrecht '10).
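The Add and Mult instantiations can be sketched in a few lines of NumPy; the vectors here reuse the deck's later example values for "the" and "engineers" purely for illustration.

```python
import numpy as np

# Additive and multiplicative composition (Mitchell & Lapata '08 baselines).
e_alpha = np.array([0.1, 0.2, 0.1])   # e.g., vector for "the"
e_beta  = np.array([0.5, 0.1, 0.1])   # e.g., vector for "engineers"

add  = e_alpha + e_beta   # Add:  e_gamma[i] = e_alpha[i] + e_beta[i]
mult = e_alpha * e_beta   # Mult: e_gamma[i] = e_alpha[i] * e_beta[i]

print(add)    # elementwise sum
print(mult)   # elementwise product
```

Mult tends to act as feature intersection (dimensions survive only if both sources are nonzero), while Add acts as union.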

Semantically-annotated Parsing

Three lines of related work, each motivating an SVS instantiation (example sentence: "the engineers pulled off an engineering trick"):

- Headword lexicalization (Charniak '97): one-word semantics plus subcategorization; each constituent is annotated with its headword (NP: i_engineers, VP: i_pulled, NN: i_trick, ...). ⇒ Headword-lexicalized SVS.
- Latent annotations (Matsuzaki et al. '05): learned subcategorizations and clustered semantics; each constituent carries a latent annotation (S[e], NP[e], VBD[e], ...). ⇒ Relationally-clustered SVS.
- Semantic parsing with logical forms: constituents carry logical forms, e.g. S: pulled(e_gr, trick(e_grng)), VBD: pulled(x,y), NN: trick(e_grng). ⇒ Logical-interpretation SVS.

[Figure: the parse tree S → NP (DT "the", NN "engineers") VP (VBD PRT "pulled off", NP "an engineering trick"), shown three times — annotated with headwords, with latent annotations [e], and with logical forms.]


SVS Composition Components

Concept dimensions (shared by all vectors and matrices below): i_u "unknown", i_k "known", i_p "people".

Word vectors in context (e):

  e_α = (.1, .2, .1)  for "the",        e_α[i_α] = P(the | lci_α)
  e_β = (.5, .1, .1)  for "engineers",  e_β[i_β] = P(engineers | lci_β)

Relation matrices (L), where L_{γ×ι}(l_ι)[i_γ, i_ι] = P(i_ι | i_γ, l_ι):

  L_{γ×α}(l_MOD) = [ .6 .2 .2 ; .2 .5 .3 ; .1 .2 .7 ]
  L_{γ×β}(l_ID)  = identity

Syntactic vector (m) — not purely syntactic, since m[i_γ] = P(lci_γ → lc_α lc_β):

  m(l_MOD NP → l_MOD DT  l_ID NN) = (.2, .3, .4)

These are the components of the composition e_γ = f(e_α, e_β, M, L).
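A minimal NumPy sketch of one composition step from these components, using the slide's example numbers. The extraction scrambled the matrix's row/column labels, so the ordering assumed here is a guess; with it, the middle entry of e_γ comes out 0.0045 where the slide reports 0.0042.

```python
import numpy as np

# One SVS composition step over concept dimensions (i_u, i_k, i_p),
# using the slide's example numbers (row/column order assumed, see above).
e_alpha = np.array([0.1, 0.2, 0.1])      # "the":       P(the | lci_alpha)
e_beta  = np.array([0.5, 0.1, 0.1])      # "engineers": P(engineers | lci_beta)
L_mod = np.array([[0.6, 0.2, 0.2],
                  [0.2, 0.5, 0.3],
                  [0.1, 0.2, 0.7]])      # L_{gamma x alpha}(l_MOD)
L_id  = np.eye(3)                        # L_{gamma x beta}(l_ID): identity
m = np.array([0.2, 0.3, 0.4])            # m(l_MOD NP -> l_MOD DT  l_ID NN)

# e_gamma = m ⊙ (L_{gamma x alpha} e_alpha) ⊙ (L_{gamma x beta} e_beta)
e_gamma = m * (L_mod @ e_alpha) * (L_id @ e_beta)
print(e_gamma)                           # ~ [0.012  0.0045 0.0048]
```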

SVS Composition Equation

Composing "the engineers ...": in the tree, e_ǫ at (l_MOD)S dominates e_γ at (l_MOD)NP, whose children are e_α at (l_MOD)DT over "the" and e_β at (l_ID)NN over "engineers"; "pulled off ..." is covered by the VP.

  e_γ = f( e_α, e_β, M, L )
      = m ⊙ ( L_{γ×α} e_α ) ⊙ ( L_{γ×β} e_β )

With the example numbers (dimensions i_u, i_k, i_p):

  e_γ = (.2, .3, .4) ⊙ ( [ .6 .2 .2 ; .2 .5 .3 ; .1 .2 .7 ] · (.1, .2, .1) ) ⊙ ( I · (.5, .1, .1) )
      = (0.0120, 0.0042, 0.0048)

Which context should be used, and how do we choose between alternatives? ⇒ Consider the dual problem of parsing!

Dual Problem: Parsing

Each entry of the composed vector expands into standard probability equations:

  e_γ[i_γ] = [ m ⊙ ( L_{γ×α} e_α ) ⊙ ( L_{γ×β} e_β ) ][i_γ]

  = P(lci_γ → lc_α lc_β) · Σ_{i_α} P(i_α | i_γ, l_α) · P(x_α | lci_α) · Σ_{i_β} P(i_β | i_γ, l_β) · P(x_β | lci_β)

  = Σ_{i_α} Σ_{i_β} P(lci_γ → lc_α lc_β) · P(i_α | i_γ, l_α) · P(x_α | lci_α) · P(i_β | i_γ, l_β) · P(x_β | lci_β)

  = Σ_{i_α} Σ_{i_β} P(lci_γ → lc_α lc_β) · P(i_α | i_γ, l_α) · P(i_β | i_γ, l_β) · P(x_α | lci_α) · P(x_β | lci_β)

  = Σ_{i_α} Σ_{i_β} P(lci_γ → lci_α lci_β) · P(x_α | lci_α) · P(x_β | lci_β)

With semantic labels l and concepts i folded into the symbols, these are standard parsing equations.
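The first and last lines of the derivation can be checked numerically: the sketch below compares the vectorized form against the explicit double sum over i_α and i_β, with random (unnormalized) values standing in for the probability models.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3                              # number of concept dimensions
m   = rng.random(n)                # m[i_g]      ~ P(lci_g -> lc_a lc_b)
L_a = rng.random((n, n))           # L_a[i_g,i_a] ~ P(i_a | i_g, l_a)
L_b = rng.random((n, n))           # L_b[i_g,i_b] ~ P(i_b | i_g, l_b)
e_a = rng.random(n)                # e_a[i_a]    ~ P(x_a | lci_a)
e_b = rng.random(n)                # e_b[i_b]    ~ P(x_b | lci_b)

# Vector form: e_gamma = m ⊙ (L_a e_a) ⊙ (L_b e_b)
vec = m * (L_a @ e_a) * (L_b @ e_b)

# Summation form: e_gamma[i_g] = sum over i_a, i_b of the factored terms
summ = np.array([
    sum(m[g] * L_a[g, a] * e_a[a] * L_b[g, b] * e_b[b]
        for a in range(n) for b in range(n))
    for g in range(n)
])

assert np.allclose(vec, summ)      # the two forms agree entry-by-entry
```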

Most Likely Tree

Comparing vectors via aᵀ:

  P(x_γ, lc_γ) = Σ_{i_γ} P(lci_γ) · P(x_γ | lci_γ) = a_γᵀ · e_γ

Best vector:

  P_{θVit(G)}(x_γ | lc ẽ_γ) := ⟦ ẽ_γ = argmax_{lc e_ι} a_ιᵀ e_ι ⟧

- Implied tree
- Similar at the root
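A toy sketch of the aᵀ·e scoring and argmax above; the candidate analyses and all numbers are hypothetical.

```python
import numpy as np

# Score each candidate analysis: P(x, lc) = sum_i P(lci)·P(x|lci) = aᵀ·e.
a = np.array([0.5, 0.3, 0.2])                 # a[i] = P(lci), prior over concepts
candidates = {                                 # hypothetical e-vectors
    "NP-analysis": np.array([0.012, 0.0042, 0.0048]),
    "VP-analysis": np.array([0.001, 0.0100, 0.0020]),
}
scores = {name: float(a @ e) for name, e in candidates.items()}
best = max(scores, key=scores.get)             # argmax over analyses
print(best, scores[best])
```

The best vector's analysis is the implied tree; the argmax is taken once at the root.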

SVS Probability Models

  Syntactic model:    m(lc_γ → lc_α lc_β)[i_γ, i_γ] = P_{θM}(lci_γ → lc_α lc_β)
  Semantic model:     L_{γ×ι}(l_ι)[i_γ, i_ι] = P_{θL}(i_ι | i_γ, l_ι)
  Preterminal model:  e_γ[i_γ] = P_{θP-Vit(G)}(x_γ | lci_γ), for preterminal γ
  Root const. model:  a_ǫᵀ[i_ǫ] = P_{πGǫ}(lci_ǫ)
  Any const. model:   a_γᵀ[i_γ] = P_{πG}(lci_γ)

Different instantiations of these models yield different SVS variants.

Relationally-clustered Headwords

Headword lexicalization: e is a one-hot indicator over the headword vocabulary (i_aardvark, ..., i_engineers, ..., i_zygote), e.g. e = (0, 1, ..., 0)ᵀ with the 1 at i_engineers.

Relational clusters: e = (p_1, ..., p_|e|)ᵀ is a distribution over a small set of clusters (i_cluster1, ..., i_cluster|e|), learned with the Inside–Outside algorithm (EM).

[Figure: the tree for "the engineers" shown twice — left with headword annotations (l_MOD)NP: i_engineers, (l_MOD)DT: i_the, (l_ID)NN: i_engineers; right with cluster annotations i_cluster1, i_cluster2, i_cluster3; an arrow labeled "Inside–Outside Algorithm (EM)" maps the first to the second.]
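The move from one-hot headwords to cluster distributions can be sketched as a projection; the vocabulary and the P(cluster | headword) matrix below are entirely hypothetical stand-ins for what EM would learn.

```python
import numpy as np

# Headword lexicalization: one-hot e over a (toy) headword vocabulary.
vocab = ["aardvark", "engineers", "zygote"]
e_lex = np.zeros(len(vocab))
e_lex[vocab.index("engineers")] = 1.0          # e = (0, 1, 0)

# Relational clustering: project headwords onto a few soft clusters.
# Rows = headwords, columns = clusters; hypothetical P(cluster | headword).
P_cluster_given_word = np.array([[0.7, 0.2, 0.1],
                                 [0.1, 0.8, 0.1],
                                 [0.3, 0.3, 0.4]])
e_clust = e_lex @ P_cluster_given_word         # distribution over clusters
print(e_clust)
```

The clustered vector is low-dimensional (|clusters| ≪ |vocabulary|), which is what makes the relation matrices L tractable.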


Model Fit Evaluation

Setup: WSJ Sections 02–21 for training, Section 23 for testing; trees are
binarized and subcategorized; non-syntactic (semantic) information is added.

Quantitative fit, measured as perplexity (how well the model explains the
language) on Section 23 with 'unk' + 'num' preprocessing:

    Model                      Perplexity
    syntax-only baseline       428.94
    rel'n clust. 1khw→005e     371.76
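As a reminder of how perplexity numbers like these are computed, a minimal sketch; the per-word log-probabilities here are made-up, not the model's:

```python
def perplexity(log2_probs):
    """Perplexity = 2 ** (negative mean per-word log2 probability)."""
    return 2.0 ** (-sum(log2_probs) / len(log2_probs))

# Toy example with assumed per-word log2 probabilities from some model:
print(perplexity([-8.0, -8.0, -8.0, -8.0]))  # 256.0
```

A lower perplexity means the model assigns higher probability to the held-out text, i.e. explains the language better.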

EM-learned Relational Clusters

Clusters in syntactic context (plural nouns):

  Cluster i0 'money':     unk 0.431, cents 0.135, shares 0.084, yen 0.036,
                          sales 0.025, points 0.023, marks 0.018, francs 0.018,
                          tons 0.013, people 0.012
  Cluster i1 'people':    officials 0.145, unk 0.141, years 0.132, shares 0.093,
                          prices 0.061, people 0.050, stocks 0.032, sales 0.027,
                          executives 0.024, analysts 0.018
  Cluster i2 'companies': unk 0.248, markets 0.056, companies 0.036, issues 0.035,
                          firms 0.033, banks 0.030, loans 0.025, investors 0.024,
                          contracts 0.022, stocks 0.021
  Cluster i5 'time':      years 0.25, months 0.19, unk 0.18, days 0.12,
                          weeks 0.06, points 0.03, companies 0.02, hours 0.02,
                          people 0.01, units 0.01

EM-learned Relational Clusters

Clusters in syntactic context (past-tense verbs):

  Cluster i1 'announcement':      unk 0.362, was 0.173, reported 0.097,
                                  posted 0.036, earned 0.029, filed 0.024,
                                  were 0.022, had 0.020, told 0.013,
                                  approved 0.013
  Cluster i5 'change in value':   rose 0.137, fell 0.124, unk 0.116,
                                  gained 0.063, dropped 0.051, attributed 0.051,
                                  jumped 0.046, added 0.041, lost 0.039,
                                  advanced 0.022
  Cluster i7 'change possession': unk 0.381, had 0.065, was 0.062, took 0.036,
                                  bought 0.027, completed 0.025, received 0.024,
                                  were 0.023, got 0.018, made 0.018,
                                  acquired 0.016

WSJ Parsing Accuracy and Relational Clusters

Are distributed semantics better? (Section 23, length < 40 words)

    Model                        LR      LP      F
    syntax-only baseline         83.32   83.83   83.57
    headword-lex. 10hw           83.10   83.61   83.35
    headword-lex. 50hw           83.09   83.40   83.24
    rel'n clust. 50hw 10clust    83.67   84.13   83.90

Are more clusters better? (Section 23, length < 40 words)

    Model                        LR      LP      F
    baseline, 1 clust            83.34   83.90   83.62
    1000hw  5 clust, avg         83.85   84.23   84.04
    1000hw 10 clust, avg         84.04   84.40   84.21
    1000hw 15 clust, avg         84.15   84.38   84.26
    1000hw 20 clust, avg         84.21   84.42   84.31

Parsing Speed with Vectors

Extra operations make SVS slower, but vectorization improves speed.
Runtime is still O(n^3) in sentence length n; only the constant
coefficient changes:

    un-vectorized    0.66505
    vectorized       0.00267

[Figure: Average Parsing Time (s) vs. Sentence Length (0–40 words),
comparing non-vectorized and vectorized parsers; efficient matrix
operations make the vectorized parser dramatically faster.]
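The looped-versus-vectorized contrast can be illustrated on a toy composition step; the matrix M and child vector below are random stand-ins, not the paper's learned models:

```python
import numpy as np

K = 4                              # hypothetical number of relational clusters
rng = np.random.default_rng(0)
M = rng.random((K, K))             # assumed cluster-to-cluster composition weights
e_child = rng.random(K)            # child constituent's semantic vector

# Un-vectorized: explicit per-cluster loops (large constant factor).
e_loop = np.zeros(K)
for i in range(K):
    for j in range(K):
        e_loop[i] += M[i, j] * e_child[j]

# Vectorized: one matrix-vector product, same result, far fewer Python ops.
e_vec = M @ e_child

assert np.allclose(e_loop, e_vec)
```

Both variants do the same O(K^2) arithmetic per composition, so the asymptotic parsing cost is unchanged; batching the arithmetic into matrix operations is what shrinks the constant.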

Conclusion: Structured Vectorial Semantics

Addressing weaknesses of vector-space semantics:
  - No compositionality ← phrasal semantics
  - Bag-of-words        ← syntactic context

Relational-clustering SVS:
  - Distributed semantics + latent-annotation parsing
  - Broad coverage

Evaluation:
  - Perplexity reduction
  - Qualitatively coherent clusters
  - Mild parsing gains
  - Tractability

Thank you! [email protected]


Inside–Outside Algorithm (EM)

E-step: estimate posteriors over semantic annotations for each annotated
rule,

    P(i_γ, i_α, i_β | lc_γ, lc_α, lc_β)
        = P_θOut(lci_γ, lch_ε − lch_γ) · P_θIns(lch_γ | lci_γ) / P(lch_ε)

then weight these estimates against the real data:

    P̂(lci_γ, lci_α, lci_β)
        = P(i_γ, i_α, i_β | lc_γ, lc_α, lc_β) · P̂(lc_γ, lc_α, lc_β)

M-step: re-estimate grammar rules by frequency count over the imagined
(latent) annotations:

    P_θM(lci_η → lc_η0, lc_η1) ← Σ_{i_η0, i_η1} P̂(lci_η, lci_η0, lci_η1)
                                 / Σ_{lci_η0, lci_η1} P̂(lci_η, lci_η0, lci_η1)

    P_θL(i_η0 | i_η; l_η0) ← Σ_{cl_η, c_η0, lci_η1} P̂(lci_η, lci_η0, lci_η1)
                             / Σ_{cl_η, ci_η0, lci_η1} P̂(lci_η, lci_η0, lci_η1)

    P_θH(h_η | lci_η) ← P̂(lci_η, −, −) / Σ_{h_η} P̂(lci_η, −, −)
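The M-step's frequency-count normalization can be sketched on toy expected counts; the counts and annotated labels below are invented for illustration, not E-step output from the paper's models:

```python
from collections import defaultdict

# Assumed E-step output: expected counts for annotated rules, keyed as
# (parent, left child, right child) with (category, cluster) labels.
expected = {
    (("NP", 0), ("DT", 1), ("NN", 2)): 3.0,
    (("NP", 0), ("DT", 1), ("NN", 3)): 1.0,
    (("NP", 1), ("DT", 2), ("NN", 2)): 2.0,
}

# M-step: normalize each rule's expected count by the total expected
# count of its annotated parent, giving a conditional rule probability.
parent_total = defaultdict(float)
for (parent, _, _), count in expected.items():
    parent_total[parent] += count

rule_prob = {rule: count / parent_total[rule[0]]
             for rule, count in expected.items()}

print(rule_prob[(("NP", 0), ("DT", 1), ("NN", 2))])  # 3.0 / 4.0 = 0.75
```

For each annotated parent, the resulting rule probabilities sum to one, which is exactly what the normalization in the M-step formulas guarantees.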

Relational Clustering SVS: Five Models to Train

    Syntactic model      P_θM(lci_γ → lc_α lc_β)       estimated in EM
    Semantic model       P_θL(i_ι | i_γ, l_ι)          estimated in EM
    Preterminal model    P_θP-Vit(G)(x_γ | lci_γ)      backed off from EM
    Root const. model    P_πGε(lci_ε)                  byproduct of EM
    Any const. model     P_πG(lci_γ)                   byproduct of EM

Preterminal model (back off to syntax when x_η is outside the known
headword set H):

    P_θP-Vit(G)(x_η | lci_η) = P̂_θH(x_η | lci_η)                            if x_η ∈ H
                             = P_θP-Vit(G)(x_η | c_η) · P̂_θH(unk | lci_η)   if x_η ∉ H

Root const. model:   P_πGε(lci_ε) =def P̂_θOut(lci_ε, lch_ε − lch_ε)
Any const. model:    P_πG(lci_η)  =def Σ_{lci_η0, lci_η1} P̂(lci_η, lci_η0, lci_η1)
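The preterminal back-off can be sketched with invented probability tables; the headword set H, the tables, and the labels below are hypothetical stand-ins, not the trained models:

```python
H = {"engineers", "cents", "officials"}        # assumed known-headword set
p_H = {("engineers", "NN:i1"): 0.02,           # assumed P_H(x | lci) estimates
       ("unk", "NN:i1"): 0.40}
p_syntax = {("widgets", "NN"): 0.001}          # assumed syntax-only back-off

def preterminal(x, lci, c):
    """Known words use the annotated model directly; unknown words back
    off to the syntax-only model, scaled by the annotated 'unk' mass."""
    if x in H:
        return p_H.get((x, lci), 0.0)
    return p_syntax.get((x, c), 0.0) * p_H[("unk", lci)]

print(preterminal("engineers", "NN:i1", "NN"))  # 0.02
print(preterminal("widgets", "NN:i1", "NN"))    # 0.001 * 0.40 = 0.0004
```

This is the two-case definition above: in-vocabulary words are scored by the semantically annotated model, and out-of-vocabulary words inherit the annotation through the 'unk' probability mass.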
