N. Query. Use MSO as query language. A query is an MSO-sentence. Example: NPakk. NPdat. £. ¡£ ¢¤¢. Vakk. Vdat. £.
Uwe M¨onnich, Frank Morawietz, and Stephan Kepser
E BERHARD K ARLS U NIVERSITÄT T ÜBINGEN
A Regular Query for Context-Sensitive Relations um,frank,kepser @sfs.uni-tuebingen.de http://tcl.sfs.uni-tuebingen.de
Seminar f¨ur Sprachwissenschaft Theoretical Computational Linguistics Group SFB 441: Linguistic Data Structures University of T¨ubingen
Philadelphia, 12. December 2001 – p.1
Fundamental Problem
E BERHARD K ARLS U NIVERSITÄT T ÜBINGEN
Skylla (Lack of Expressive Power) and Charybdis (Undecidability)
Philadelphia, 12. December 2001 – p.2
Core XML (Dan Suciu: “The longest definition (500 pp) of regular tree
E BERHARD K ARLS U NIVERSITÄT T ÜBINGEN
grammars”.) S
Fritz l¨ ost das Problem
NP
VP
PN
V
Fritz
löst
NP D
N
das
Problem
Philadelphia, 12. December 2001 – p.3
Context-Sensitive Relations
E BERHARD K ARLS U NIVERSITÄT T ÜBINGEN
Cross-Serial Dependencies: (. . . wil) mer de maa em chind lönd hälffe schwüme Swiss-German: an bm cn d m —Non-CF Cannot be queried by regular means. Monadic Second Order logic (MSO) as query language is not powerful enough.
Philadelphia, 12. December 2001 – p.4
Context-free Tree Grammars
Generalizations of context-free (string) grammars.
xn ➝ t xn .
Rules have the form F x1 t is a tree with variables x1
E BERHARD K ARLS U NIVERSITÄT T ÜBINGEN
Needed to describe context-sensitive relations.
Rewrite a non-terminal F with complete tree t . Examples: TAG, Minimalist Grammars (E. Stabler). Not necessary to write a new grammar for every query.
Philadelphia, 12. December 2001 – p.5
NPakk NPdat
Use MSO as query language. A query is an MSO-sentence. Example:
E BERHARD K ARLS U NIVERSITÄT T ÜBINGEN
Query
Vakk Vdat
In praxi, use tree automaton representing the MSO-sentence. Result: set of candidate trees, all the trees we actually search for are in it, but also a lot of garbage.
Philadelphia, 12. December 2001 – p.6
E BERHARD K ARLS U NIVERSITÄT T ÜBINGEN
Overview
Derivation Trees (MSO, RTG, FSTA)
Mildly Context-Sensitive Structures
Philadelphia, 12. December 2001 – p.7
FT
in
g
Derivation Trees (MSO, RTG, FSTA)
LI
E BERHARD K ARLS U NIVERSITÄT T ÜBINGEN
Overview
Mildly Context-Sensitive Structures
Philadelphia, 12. December 2001 – p.7
Derivation Trees (MSO, RTG, FSTA)
g in FT LI
n
tio
uc
sd
n ra -T
SO
M
E BERHARD K ARLS U NIVERSITÄT T ÜBINGEN
Overview
Mildly Context-Sensitive Structures
Philadelphia, 12. December 2001 – p.7
Lifting the Grammar
E BERHARD K ARLS U NIVERSITÄT T ÜBINGEN
Is a simple primitive recursive function Makes control structure visible All function symbols become constants. Resulting grammar is regular. Therfore expressible by an MSO-formula, and There is a tree automaton for the lifted grammar.
Philadelphia, 12. December 2001 – p.8
Lifting the Candidate trees
E BERHARD K ARLS U NIVERSITÄT T ÜBINGEN
Similar to lifting a grammar. All internal nodes become leaves. For each candidate tree, there is a small finite set of lifted trees.
Philadelphia, 12. December 2001 – p.9
Example of a Lifted Tree c
π4
b
c
π3
π1
π2
π2
c d
π1
a
ε
c
ε
π4
c
c
π3
c
π2
c
π1
c
a
π3
c
π4
a
a
d
c
c
E BERHARD K ARLS U NIVERSITÄT T ÜBINGEN
c
b
ε
c
ε
Philadelphia, 12. December 2001 – p.10
Lifted Query We have E BERHARD K ARLS U NIVERSITÄT T ÜBINGEN
a tree automaton representing the lifted grammar, and a set of lifted candidate trees. Run the tree automaton on the set of lifted candidate trees. Result: Solution set of lifted trees. But: Lifted Trees are unreadable, not in the format of trees in the treebank.
Philadelphia, 12. December 2001 – p.11
Reconstruction of the Intended Structures
E BERHARD K ARLS U NIVERSITÄT T ÜBINGEN
The intended tree is present in the lifted tree, but hidden. Have to read the intended tree off the lifted tree. Technically: Define dominance and precedence relation of the intended tree on the basis of the dominance and precedence and the control structure (composition and projection) of the lifted tree. Possible with MSO-definable Transductions
Philadelphia, 12. December 2001 – p.12
Reconstruction of the Intended Structures c c
π1
π3
c π4
E BERHARD K ARLS U NIVERSITÄT T ÜBINGEN
c
π4
π2
c
c
π4
c
ε
d
c π3
π3
c
ε
c π2
c a
π1
π2
b
c c a
a
b
ε
c
ε
intended dominance
d
immediate dominance
a
π1
Philadelphia, 12. December 2001 – p.13
Intended Query Result
E BERHARD K ARLS U NIVERSITÄT T ÜBINGEN
Context-free tree grammar for describing context-sensitive relations MSO query on the original tree bank ➪ set of candidate trees Lift grammar and candidate trees ➪ automaton and set of lifted candidate trees Run automaton representing lifted grammar on lifted candidate trees ➪ set of lifted solution trees Reconstruct intended trees via MSO-transduction ➪ set of intended solution trees
Philadelphia, 12. December 2001 – p.14
Effectiveness
E BERHARD K ARLS U NIVERSITÄT T ÜBINGEN
MSO decidable on trees MSO undecidable on graphs MSO + one binary relation undecidable on trees MSO ➝ tree automata hyperexponential “Normalized” MSO ➝ tree automata exponential Lifting and transduction require linear time
Philadelphia, 12. December 2001 – p.15