Structured Composition of Semantic Vectors
Stephen Wu
Division of Biomedical Statistics and Informatics, Mayo Clinic
January 13, 2011 | IWCS
Outline

1. Introduction: Overview; Related Work
2. Structured Vectorial Semantics: Vector Composition; Semantically-annotated Parsing; Distributed Semantics in SVS
3. Evaluation: Model Fit; Parsing; Speed Performance
Big Picture

Distributed Semantic Vector Composition
+ (Syntactic) Parsing
= Structured Vectorial Semantics (SVS)

[Figure: the parse tree [S [NP [DT the] [NN engineers]] pulled off ...] with a semantic vector attached to each node, e.g. e = (.1 .2 .1) at NP and at DT "the", and e = (.5 .1 .1) at NN "engineers"; vector composition and syntactic parsing are carried out in one structure.]
Weaknesses of Distributed Semantic Models

1. No compositionality
   Ex 1: Patient is a 48-year old male with no significant past medical history complaining of abdominal pain.
2. Bag-of-words independence assumption
   Ex 2a: Significant improvement of health outcomes followed the drastic overhaul of surgical pre-operation procedure.
   Ex 2b: Significant overhaul of surgical pre-operation procedure followed the drastic improvement of health outcomes.

⇒ Structured Vectorial Semantics (SVS)
Vector Composition Background

General definition (Mitchell & Lapata ’08):
\[
\underbrace{e_\gamma}_{\text{target vector}} = f(\ \underbrace{e_\alpha}_{\text{source 1}},\ \underbrace{e_\beta}_{\text{source 2}},\ \underbrace{M}_{\text{syntax}},\ \underbrace{L}_{\text{knowledge}}\ )
\]
Simple instantiations:
\[
\text{Add: } e_\gamma[i] = e_\alpha[i] + e_\beta[i] \qquad \text{Mult: } e_\gamma[i] = e_\alpha[i] \cdot e_\beta[i]
\]
Composition informed by syntactic context:
  predicate–argument (Kintsch ’01); selectional preferences (Erk & Padó ’08); language models (Mitchell & Lapata ’09); matrices (Rudolph & Giesbrecht ’10)
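As a concrete reference point, here is a minimal sketch of the Add and Mult baselines; the toy vectors and variable names are illustrative, not from the original experiments:

    import numpy as np

    # Toy context vectors for two words over an arbitrary semantic space.
    e_alpha = np.array([0.1, 0.2, 0.1])   # e.g. "the"
    e_beta  = np.array([0.5, 0.1, 0.1])   # e.g. "engineers"

    # Mitchell & Lapata '08 baselines: elementwise addition and multiplication.
    e_add  = e_alpha + e_beta    # Add:  e_gamma[i] = e_alpha[i] + e_beta[i]
    e_mult = e_alpha * e_beta    # Mult: e_gamma[i] = e_alpha[i] * e_beta[i]

    print(e_add)   # ~[0.6  0.3  0.2 ]
    print(e_mult)  # ~[0.05 0.02 0.01]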
Semantically-annotated Parsing

Headword Lexicalization (Charniak ’97): one-word semantics, subcategorization ⇒ headword-lexicalized SVS
Latent Annotations (Matsuzaki et al. ’05): learned subcategories, clustered semantics ⇒ relationally-clustered SVS
Semantic parsing: logical forms ⇒ logical-interpretation SVS

[Figure: the parse tree of "the engineers pulled off an engineering trick", annotated three ways: (a) with headwords, e.g. NP:i_engineers over DT:i_the and NN:i_engineers; (b) with latent annotations, e.g. S[e], NP[e], DT[e], NN[e]; (c) with logical forms, e.g. S: pulled(egr, trick(egrng)), NP: egr, VP: pulled(x, trick(egrng)), with DT nodes contributing no content (-).]
SVS Composition Components

Concepts: i_u (unknown), i_k (known), i_p (people).

Word vectors in context (e):
\[
e_\alpha = \begin{bmatrix} .1 \\ .2 \\ .1 \end{bmatrix} = P(\text{the} \mid lci_\alpha),
\qquad
e_\beta = \begin{bmatrix} .5 \\ .1 \\ .1 \end{bmatrix} = P(\text{engineers} \mid lci_\beta)
\]
Relation matrices (L):
\[
L_{\gamma\times\alpha}(l_{\mathrm{Mod}}) = \begin{bmatrix} .6 & .2 & .2 \\ .2 & .5 & .3 \\ .1 & .2 & .7 \end{bmatrix} = P(i_\gamma \mid i_\alpha, l_\alpha),
\qquad
L_{\gamma\times\beta}(l_{\mathrm{Id}}) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = P(i_\gamma \mid i_\beta, l_\beta)
\]
Syntactic vector (m), not purely syntactic:
\[
m(l_{\mathrm{Mod}}\mathrm{NP} \rightarrow l_{\mathrm{Mod}}\mathrm{DT}\ \ l_{\mathrm{Id}}\mathrm{NN}) = \begin{bmatrix} .2 \\ .3 \\ .4 \end{bmatrix} = P(lci_\gamma \rightarrow lc_\alpha\, lc_\beta)
\]
These components instantiate the general form \( e_\gamma = f(e_\alpha, e_\beta, M, L) \).
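A minimal sketch of these components as numpy arrays; the values are the toy numbers from the slide, and the ordering of the concept axis (i_u, i_k, i_p) is an assumption:

    import numpy as np

    # Concept axis assumed as (i_u unknown, i_k known, i_p people).
    e_alpha = np.array([0.1, 0.2, 0.1])   # P(the | lci_alpha)
    e_beta  = np.array([0.5, 0.1, 0.1])   # P(engineers | lci_beta)

    # Relation matrices: rows index i_gamma, columns index the child concept.
    L_mod = np.array([[0.6, 0.2, 0.2],
                      [0.2, 0.5, 0.3],
                      [0.1, 0.2, 0.7]])   # P(i_gamma | i_alpha, l_alpha)
    L_id  = np.eye(3)                     # P(i_gamma | i_beta, l_beta): identity

    # Syntactic vector for the rule l_Mod NP -> l_Mod DT  l_Id NN.
    m = np.array([0.2, 0.3, 0.4])         # P(lci_gamma -> lc_alpha lc_beta)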
SVS Composition Equation

Composing "the engineers ..." at the NP node (e_γ), with children e_α at (l_Mod)DT "the" and e_β at (l_Id)NN "engineers":
\[
e_\gamma = f(e_\alpha, e_\beta, M, L) = m \odot (L_{\gamma\times\alpha}\, e_\alpha) \odot (L_{\gamma\times\beta}\, e_\beta)
\]
\[
\underbrace{\begin{bmatrix} 0.0120 \\ 0.0042 \\ 0.0048 \end{bmatrix}}_{e_\gamma}
= \underbrace{\begin{bmatrix} .2 \\ .3 \\ .4 \end{bmatrix}}_{m}
\odot\ \underbrace{\begin{bmatrix} .6 & .2 & .2 \\ .2 & .5 & .3 \\ .1 & .2 & .7 \end{bmatrix}}_{L_{\gamma\times\alpha}(l_{\mathrm{Mod}})}
\underbrace{\begin{bmatrix} .1 \\ .2 \\ .1 \end{bmatrix}}_{e_\alpha}
\odot\ \underbrace{\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}}_{L_{\gamma\times\beta}(l_{\mathrm{Id}})}
\underbrace{\begin{bmatrix} .5 \\ .1 \\ .1 \end{bmatrix}}_{e_\beta}
\]
What context? Choose between? ⇒ Consider the dual problem of parsing!
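The composition itself is one line of matrix and elementwise products. A self-contained sketch (⊙ is elementwise multiplication; the concept ordering is an assumption, so treat the printed numbers as illustrative):

    import numpy as np

    e_alpha = np.array([0.1, 0.2, 0.1])                  # "the"
    e_beta  = np.array([0.5, 0.1, 0.1])                  # "engineers"
    L_mod   = np.array([[0.6, 0.2, 0.2],
                        [0.2, 0.5, 0.3],
                        [0.1, 0.2, 0.7]])
    L_id    = np.eye(3)
    m       = np.array([0.2, 0.3, 0.4])

    # e_gamma = m ⊙ (L_mod e_alpha) ⊙ (L_id e_beta);
    # '*' is elementwise, '@' is the matrix-vector product.
    e_gamma = m * (L_mod @ e_alpha) * (L_id @ e_beta)
    print(e_gamma)  # ~[0.012 0.0045 0.0048]; the slide reports 0.0042 for the
                    # middle entry, which may reflect a different axis ordering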
Dual Problem: Parsing

Entry by entry, the composed vector expands into familiar parsing probabilities:
\[
\begin{aligned}
e_\gamma &= m \odot (L_{\gamma\times\alpha}\, e_\alpha) \odot (L_{\gamma\times\beta}\, e_\beta) \\
&= P(lci_\gamma \rightarrow lc_\alpha\, lc_\beta) \cdot \sum_{i_\alpha} P(i_\alpha \mid i_\gamma, l_\alpha) \, P(x_\alpha \mid lci_\alpha) \cdot \sum_{i_\beta} P(i_\beta \mid i_\gamma, l_\beta) \, P(x_\beta \mid lci_\beta) \\
&= \sum_{i_\alpha} \sum_{i_\beta} P(lci_\gamma \rightarrow lc_\alpha\, lc_\beta) \, P(i_\alpha \mid i_\gamma, l_\alpha) \, P(x_\alpha \mid lci_\alpha) \, P(i_\beta \mid i_\gamma, l_\beta) \, P(x_\beta \mid lci_\beta) \\
&= \sum_{i_\alpha} \sum_{i_\beta} \underbrace{P(lci_\gamma \rightarrow lc_\alpha\, lc_\beta) \, P(i_\alpha \mid i_\gamma, l_\alpha) \, P(i_\beta \mid i_\gamma, l_\beta)}_{P(lci_\gamma \rightarrow lci_\alpha\, lci_\beta)} \cdot P(x_\alpha \mid lci_\alpha) \, P(x_\beta \mid lci_\beta) \\
&= \sum_{i_\alpha} \sum_{i_\beta} P(lci_\gamma \rightarrow lci_\alpha\, lci_\beta) \, P(x_\alpha \mid lci_\alpha) \, P(x_\beta \mid lci_\beta)
\end{aligned}
\]
With semantic labels l and concepts i, these are standard parsing equations.
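A small sketch of this equivalence: one entry of e_γ computed by the explicit double sum, checked against the vectorized form (toy numbers as before; all names are illustrative):

    import numpy as np

    e_alpha = np.array([0.1, 0.2, 0.1])
    e_beta  = np.array([0.5, 0.1, 0.1])
    L_a = np.array([[0.6, 0.2, 0.2],
                    [0.2, 0.5, 0.3],
                    [0.1, 0.2, 0.7]])   # P(i_alpha | i_gamma, l_alpha), rows = i_gamma
    L_b = np.eye(3)                     # P(i_beta  | i_gamma, l_beta)
    m   = np.array([0.2, 0.3, 0.4])     # P(lci_gamma -> lc_alpha lc_beta)

    # Vectorized composition.
    e_gamma = m * (L_a @ e_alpha) * (L_b @ e_beta)

    # Explicit double sum over child concepts for each parent concept i_gamma.
    n = len(m)
    e_gamma_sum = np.zeros(n)
    for ig in range(n):
        for ia in range(n):
            for ib in range(n):
                e_gamma_sum[ig] += (m[ig] * L_a[ig, ia] * e_alpha[ia]
                                          * L_b[ig, ib] * e_beta[ib])

    assert np.allclose(e_gamma, e_gamma_sum)  # same result either way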
Most Likely Tree

Compare vectors via a prior vector \( a^{\mathsf{T}} \):
\[
P(x_\gamma, lc_\gamma) = \sum_{i_\gamma} P(lci_\gamma) \cdot P(x_\gamma \mid lci_\gamma) = a^{\mathsf{T}}_\gamma \cdot e_\gamma
\]
Best vector:
\[
P_{\theta\mathrm{Vit}(G)}(x_\gamma \mid lce_\gamma) \overset{\mathrm{def}}{=} \left\llbracket\, e_\gamma = \operatorname*{arg\,max}_{lce_\iota}\ a^{\mathsf{T}}_\iota e_\iota \,\right\rrbracket
\]
Implied tree
Similar at root
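A sketch of the Viterbi-style choice: score each candidate vector by its dot product with the prior a and keep the argmax. The candidate set and all values here are invented for illustration:

    import numpy as np

    a = np.array([0.3, 0.5, 0.2])   # prior over concepts, a[i] = P(lci)

    # Hypothetical candidate vectors e_iota for competing analyses of one span.
    candidates = {
        "NP -> DT NN": np.array([0.0120, 0.0045, 0.0048]),
        "NP -> NP NN": np.array([0.0030, 0.0020, 0.0010]),
    }

    # P(x, lc) = a^T e for each candidate; the Viterbi choice keeps the best.
    best = max(candidates, key=lambda k: a @ candidates[k])
    print(best, a @ candidates[best])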
SVS Probability Models

Syntactic model:    \( m(lc_\gamma \rightarrow lc_\alpha\, lc_\beta)[i_\gamma, i_\gamma] = P_{\theta M}(lci_\gamma \rightarrow lc_\alpha\, lc_\beta) \)
Semantic model:     \( L_{\gamma\times\iota}(l_\iota)[i_\gamma, i_\iota] = P_{\theta L}(i_\iota \mid i_\gamma, l_\iota) \)
Preterminal model:  \( e_\gamma[i_\gamma] = P_{\theta\mathrm{P\text{-}Vit}(G)}(x_\gamma \mid lci_\gamma) \), for preterminal \( \gamma \)
Root const. model:  \( a^{\mathsf{T}}_\epsilon[i_\epsilon] = P_{\pi G \epsilon}(lci_\epsilon) \)
Any const. model:   \( a^{\mathsf{T}}_\gamma[i_\gamma] = P_{\pi G}(lci_\gamma) \)

Different instantiations of these models yield the different SVS variants.
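Note that the syntactic model places the rule probability on the diagonal of m only ([i_γ, i_γ]), which is why m acts elementwise in the composition. A minimal sketch under that reading (names illustrative):

    import numpy as np

    # Rule probability for lci_gamma -> lc_alpha lc_beta, one value per i_gamma.
    p_rule = np.array([0.2, 0.3, 0.4])

    # As a matrix, m(...)[i, i] = P_thetaM(...): diagonal, zero elsewhere.
    M = np.diag(p_rule)

    v = np.array([0.5, 0.1, 0.1])
    # Multiplying by a diagonal matrix equals an elementwise product.
    assert np.allclose(M @ v, p_rule * v)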
Relationally-clustered Headwords

Headword lexicalization uses a one-hot vector over headwords; relational clustering replaces it with a distribution over a small number of clusters:
\[
e = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}
\begin{matrix} i_{\text{aardvark}} \\ i_{\text{engineers}} \\ \vdots \\ i_{\text{zygote}} \end{matrix}
\qquad\longrightarrow\qquad
e = \begin{bmatrix} p_1 \\ \vdots \\ p_{|e|} \end{bmatrix}
\begin{matrix} i_{\text{cluster}_1} \\ \vdots \\ i_{\text{cluster}_{|e|}} \end{matrix}
\]
[Figure: for "the engineers", the headword-annotated tree (l_Mod)NP:i_engineers over (l_Mod)DT:i_the and (l_Id)NN:i_engineers maps, via the Inside–Outside Algorithm (EM), to the cluster-annotated tree (l_Mod)NP:i_cluster1 over (l_Mod)DT:i_cluster2 and (l_Id)NN:i_cluster3.]
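A tiny sketch of the two representations; the vocabulary and cluster probabilities are invented for illustration:

    import numpy as np

    # Headword lexicalization: one-hot over a (tiny) headword vocabulary.
    headwords = ["aardvark", "engineers", "zygote"]
    e_headword = np.zeros(len(headwords))
    e_headword[headwords.index("engineers")] = 1.0   # i_engineers

    # Relational clustering: the same word as a distribution over few clusters.
    e_clustered = np.array([0.7, 0.2, 0.1])          # p_1..p_|e| over i_cluster1..3
    assert np.isclose(e_clustered.sum(), 1.0)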
Model Fit Evaluation

Setup: WSJ Sections 02–21 for training, Section 23 for testing; binarized, subcategorized; non-syntactic information added.

Quantitative fit, measured by perplexity (how well the models explain the language), on Section 23 with ‘unk’+‘num’:

                                Perplexity
  syntax-only baseline            428.94
  rel'n clust. 1k hw → 005e       371.76
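For reference, a minimal sketch of how perplexity is computed from a model's per-word probabilities; the probabilities here are invented:

    import math

    # Model probabilities assigned to each test word (toy values).
    word_probs = [0.01, 0.002, 0.05, 0.008]

    # Perplexity = exp of the average negative log-probability per word;
    # lower means the model explains the text better.
    avg_nll = -sum(math.log(p) for p in word_probs) / len(word_probs)
    print(math.exp(avg_nll))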
EM-learned Relational Clusters: Clusters in Syntactic Context (plural nouns)

  Cluster i0 'money':     unk 0.431, cents 0.135, shares 0.084, yen 0.036, sales 0.025, points 0.023, marks 0.018, francs 0.018, tons 0.013, people 0.012
  Cluster i1 'people':    officials 0.145, unk 0.141, years 0.132, shares 0.093, prices 0.061, people 0.050, stocks 0.032, sales 0.027, executives 0.024, analysts 0.018
  Cluster i2 'companies': unk 0.248, markets 0.056, companies 0.036, issues 0.035, firms 0.033, banks 0.030, loans 0.025, investors 0.024, contracts 0.022, stocks 0.021
  Cluster i5 'time':      years 0.25, months 0.19, unk 0.18, days 0.12, weeks 0.06, points 0.03, companies 0.02, hours 0.02, people 0.01, units 0.01
EM-learned Relational Clusters: Clusters in Syntactic Context (past-tense verbs)

  Cluster i1 'announcement':      unk 0.362, was 0.173, reported 0.097, posted 0.036, earned 0.029, filed 0.024, were 0.022, had 0.020, told 0.013, approved 0.013
  Cluster i5 'change in value':   rose 0.137, fell 0.124, unk 0.116, gained 0.063, dropped 0.051, attributed 0.051, jumped 0.046, added 0.041, lost 0.039, advanced 0.022
  Cluster i7 'change possession': unk 0.381, had 0.065, was 0.062, took 0.036, bought 0.027, completed 0.025, received 0.024, were 0.023, got 0.018, made 0.018, acquired 0.016
WSJ Parsing Accuracy and Relational Clusters

Are distributed semantics better? (Sec. 23, length < 40 wds)

                                LR     LP     F
  syntax-only baseline          83.32  83.83  83.57
  headword-lex. 10 hw           83.10  83.61  83.35
  headword-lex. 50 hw           83.09  83.40  83.24
  rel'n clust. 50 hw, 10 clust  83.67  84.13  83.90

Are more clusters better? (Sec. 23, length < 40 wds)

                                LR     LP     F
  baseline, 1 clust             83.34  83.90  83.62
  1000 hw, 5 clust (avg)        83.85  84.23  84.04
  1000 hw, 10 clust (avg)       84.04  84.40  84.21
  1000 hw, 15 clust (avg)       84.15  84.38  84.26
  1000 hw, 20 clust (avg)       84.21  84.42  84.31
Parsing Speed with Vectors

SVS adds extra operations, so a naive implementation is slower; vectorization recovers the speed. Runtime remains O(n^3) in sentence length, but the constant factor drops from 0.66505 (un-vectorized) to 0.00267 (vectorized) through efficient matrix operations.

[Figure: average parsing time (s, 0–500) vs. sentence length (0–40), non-vectorized vs. vectorized.]
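To illustrate the kind of constant-factor gap vectorization buys, here is a generic demonstration (not the paper's parser) of the same composition done with explicit loops vs. matrix operations:

    import time
    import numpy as np

    n = 200  # number of concepts, inflated so the difference is visible
    m = np.random.rand(n)
    L_a, L_b = np.random.rand(n, n), np.random.rand(n, n)
    e_a, e_b = np.random.rand(n), np.random.rand(n)

    # Un-vectorized: explicit loops over concept indices.
    t0 = time.perf_counter()
    e_loop = np.zeros(n)
    for ig in range(n):
        sa = sum(L_a[ig, ia] * e_a[ia] for ia in range(n))
        sb = sum(L_b[ig, ib] * e_b[ib] for ib in range(n))
        e_loop[ig] = m[ig] * sa * sb
    t_loop = time.perf_counter() - t0

    # Vectorized: the same composition as matrix and elementwise products.
    t0 = time.perf_counter()
    e_vec = m * (L_a @ e_a) * (L_b @ e_b)
    t_vec = time.perf_counter() - t0

    assert np.allclose(e_loop, e_vec)
    print(f"loop: {t_loop:.6f}s  vectorized: {t_vec:.6f}s")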
Conclusion: Structured Vectorial Semantics

Addressing weaknesses:
  No compositionality ← phrasal semantics
  Bag-of-words ← context

Relational-clustering SVS: distributed semantics + latent-annotation parsing; broad-coverage.

Evaluation: perplexity reduction, qualitatively sensible clusters, mild parsing gains, tractability.
Thank you!
[email protected]
Inside–Outside Algorithm (EM)

E-step (imagine annotations): estimate the posterior of an annotated rule,
\[
\hat{P}(i_\gamma, i_\alpha, i_\beta \mid lc_\gamma, lc_\alpha, lc_\beta)
= \frac{\hat{P}_{\theta\mathrm{Out}}(lci_\gamma,\ lch_\epsilon{-}lch_\gamma) \cdot \hat{P}_{\theta\mathrm{Ins}}(lch_\gamma \mid lci_\gamma)}{\hat{P}(lch_\epsilon)}
\]
then weight against real data:
\[
\tilde{P}(lci_\gamma, lci_\alpha, lci_\beta) = \hat{P}(i_\gamma, i_\alpha, i_\beta \mid lc_\gamma, lc_\alpha, lc_\beta) \cdot P(lc_\gamma, lc_\alpha, lc_\beta)
\]
M-step (frequency counts): estimate grammar rules,
\[
P_{\theta M}(lci_\eta \rightarrow lc_{\eta 0}\, lc_{\eta 1}) \leftarrow
\frac{\sum_{i_{\eta 0},\, i_{\eta 1}} \tilde{P}(lci_\eta, lci_{\eta 0}, lci_{\eta 1})}
     {\sum_{lci_{\eta 0},\, lci_{\eta 1}} \tilde{P}(lci_\eta, lci_{\eta 0}, lci_{\eta 1})}
\qquad \text{(numerator sums out latent annotations)}
\]
\[
P_{\theta L}(i_{\eta 0} \mid i_\eta; l_{\eta 0}) \leftarrow
\frac{\sum_{cl_\eta,\, c_{\eta 0},\, cli_{\eta 1}} \tilde{P}(lci_\eta, lci_{\eta 0}, lci_{\eta 1})}
     {\sum_{cl_\eta,\, ci_{\eta 0},\, cli_{\eta 1}} \tilde{P}(lci_\eta, lci_{\eta 0}, lci_{\eta 1})}
\]
\[
P_{\theta H}(h_\eta \mid lci_\eta) \leftarrow
\frac{\tilde{P}(lci_\eta, -, -)}{\sum_{h_\eta} \tilde{P}(lci_\eta, -, -)}
\]
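A schematic sketch of the M-step's frequency-count normalization, heavily simplified: the expected counts stand in for E-step output rather than being computed by inside–outside, and all shapes and names are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    n_rules, n = 4, 3   # child-category pairs; latent annotations per category

    # Expected counts from the E-step, P~(lci_eta, lci_eta0, lci_eta1),
    # indexed [rule r = (lc_eta0, lc_eta1), i_eta, i_eta0, i_eta1].
    counts = rng.random((n_rules, n, n, n))

    # M-step for P_thetaM(lci_eta -> lc_eta0 lc_eta1): the numerator sums out
    # the children's latent annotations; the denominator also sums over the
    # child-category pairs, so columns normalize per parent annotation i_eta.
    num = counts.sum(axis=(2, 3))   # shape (n_rules, n)
    den = num.sum(axis=0)           # shape (n,)
    p_rule = num / den              # P(rule | i_eta)
    assert np.allclose(p_rule.sum(axis=0), 1.0)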
Relational Clustering SVS

Five SVS models to train:
  Syntactic model     \( P_{\theta M}(lci_\gamma \rightarrow lc_\alpha\, lc_\beta) \)            estimated in EM
  Semantic model      \( P_{\theta L}(i_\iota \mid i_\gamma, l_\iota) \)                         estimated in EM
  Preterminal model   \( P_{\theta\mathrm{P\text{-}Vit}(G)}(x_\gamma \mid lci_\gamma) \)         backed off from EM
  Root const. model   \( P_{\pi G \epsilon}(lci_\epsilon) \)                                     byproduct of EM
  Any const. model    \( P_{\pi G}(lci_\gamma) \)                                                byproduct of EM

Preterminal model (back-off through ‘unk’ for out-of-vocabulary words):
\[
P_{\theta\mathrm{P\text{-}Vit}(G)}(x_\eta \mid lci_\eta) =
\begin{cases}
\hat{P}_{\theta H}(x_\eta \mid lci_\eta) & x_\eta \in H \\
P_{\theta\mathrm{P\text{-}Vit}(G)}(x_\eta \mid c_\eta) \cdot \hat{P}_{\theta H}(\mathrm{unk} \mid lci_\eta) & x_\eta \notin H
\end{cases}
\]
Root and any-constituent models:
\[
P_{\pi G \epsilon}(lci_\epsilon) \overset{\mathrm{def}}{=} \hat{P}_{\theta\mathrm{Out}}(lci_\epsilon,\ lch_\epsilon{-}lch_\epsilon),
\qquad
P_{\pi G}(lci_\eta) \overset{\mathrm{def}}{=} \sum_{lci_{\eta 0},\, lci_{\eta 1}} \tilde{P}(lci_\eta, lci_{\eta 0}, lci_{\eta 1})
\]
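A small sketch of the preterminal back-off logic; the vocabulary and all probabilities are invented for illustration:

    # Known-headword probabilities P_thetaH(x | lci) for one annotated category.
    p_known = {"engineers": 0.02, "trick": 0.01}
    p_unk_given_lci = 0.05     # P_thetaH(unk | lci)
    p_word_given_c = 0.001     # P_thetaP-Vit(G)(x | c), category-only back-off

    def p_preterminal(word):
        # In-vocabulary words use the EM-estimated headword model directly;
        # out-of-vocabulary words back off through the 'unk' mass.
        if word in p_known:                      # x in H
            return p_known[word]
        return p_word_given_c * p_unk_given_lci  # x not in H

    print(p_preterminal("engineers"), p_preterminal("flibbertigibbet"))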