Efficient finite-state algorithms for the application of local grammars
Javier M. Sastre-Martínez (1, 2, 3)
Ph.D. thesis supervised by Mikel L. Forcada (2) and Eric Laporte (1)
(1) LIGM, Université Paris-Est
(2) Grup Transducens, DLSI, Universitat d’Alacant
(3) iTEAM, Universitat Politècnica de València
Javier Sastre (Univs. Paris-Est & Alacant), Ph.D. public defense, 11th July 2011
Outline
1 Background
    Local grammars
    Lexicon grammar + local grammars = natural language parsing
    The MovistarBot: an industrial natural language application
2 Motivation & goal
3 Hierarchy of finite-state machines
4 Efficient algorithms of application of local grammars
5 Algorithm optimizations
    Efficient management of sets
    Efficient management of sequences
6 Empirical tests
    Experimental conditions
    Experimental results
7 Conclusion
Background
Local grammars
Local grammars (Gross, 1997)
  Describe sets of meaningful sequences in natural languages (NLs)
  Handcrafted or semi-automatically built
  More control on the results than statistical methods
  Formalism: recursive transition networks (RTNs, Woods, 1970) with output, taking a set of lexical masks as input alphabet
    Lexical masks: predicates representing sets of lexical units complying with some morphosyntactic and/or semantic criteria
    RTNs ≡ context-free grammars (CFGs) ≡ push-down automata
    RTNs + “unification output” ≡ lexical-functional grammars
  RTNs are more compact than CFGs, hence...
  ...more efficient algorithms of application (Woods, 1969)
  Very intuitive graphical representation
[Figure: Example of local grammar I (excerpt)]
[Figure: Example of local grammar II (excerpt)]
Lexicon grammar + local grammars = natural language parsing
Lexicon grammar (Gross, 1996)
  First empirical method for the exhaustive description of the syntax of NLs (according to Laporte, 2005)
  Classes of syntactic structures of sentences (Leclère, 2002)...
  ...but taking into account irregularities within each class due to the use of specific predicative elements!
  A lexicon grammar table per class: a matrix of predicative elements × differential properties
  Syntactic structures described in the table’s documentation
  Lexicon grammar of French: one of the richest linguistic resources for French (72000 entries, starting with Gross, 1975)
  Can be semi-automatically transformed into local grammars for NL parsing (Roche, 1993; Constant, 2003a), though...
  ...it must first be transformed into some exploitable format
  ...which is not a negligible task (e.g.: see Tolone, 2011)
[Figure: Example of lexicon grammar table (excerpt)]
Red area: possibility to use auxiliary verbs avoir and être
Visualized with HOOP (Sastre, 2006a; Sastre, 2006b): http://hoop.univ-mlv.fr
The MovistarBot: an industrial natural language application
The MovistarBot
  Conversational agent created by Telefónica I+D
  Text-based communication in Spanish through MSN Messenger
  Sells mobile services:
    sending text & multimedia messages
    searching & downloading games, photos and music
    searching & subscribing to alerts
    providing information about products and offers
  Initially based on AIML (Wallace, 2004):
    Simple formalism based on XML
    Less powerful than regular expressions
  Extended with local grammars for boosting the recognition of request sentences (Sastre et al., 2009)
Motivation & goal
Goal: design algorithms of application of local grammars that are faster than the algorithms currently in use:
  Unitex (Paumier et al., 2009): top-down depth-first
  Outilex (Blanc and Constant, 2006): Earley-like
  Intex/NooJ (Silberztein, 1998; Silberztein, 2007): unknown
    not open-source
    conceived for research on linguistics, but not on algorithmics
Other classic algorithms are not so straightforwardly adaptable:
  LR (Knuth, 1965):
    deterministic & non-ambiguous grammars only; not NL grammars
    requires a table having an entry per possible input symbol, but input alphabets of local grammars are potentially infinite
  Tomita, 1987: LR extension, solves the 1st problem but not the 2nd
  CYK (Cocke and Schwartz, 1970; Younger, 1967; Kasami, 1965): Chomsky normal form; grammar broken into binary pieces
Outilex’s Earley-like algorithm can be further improved
Hierarchy of finite-state machines
Why a hierarchy?
Different problems to solve → different machines to use...
...but not so different
  Common features, properties and generic procedures
  Complex machines (and associated algorithms) easier to understand as extensions of simpler ones
Hierarchy of finite-state machines:
  Factors out common features, properties and generic procedures
  Incremental definition of machines
  Incremental construction of proofs
  Incremental definition of their respective algorithms
The hierarchy I: finite-state machines (FSMs)
  A virtual base class
  Common features & properties to every machine
  A set of input symbols, states, transitions & transition labels
  Some states are initial
  Some states are final
  A transition (q_s, ξ, q_t) brings the machine from q_s to q_t depending on ξ and the current context of execution (e.g.: next input)
  Classic representation equivalent to that of local grammars
[Figure: generic FSM with states q0, qh, qi, qj, qk, ql, qm, qn and transitions labelled ξ0, ξi, ξj, ξk]
The hierarchy II: finite-state automata (FSAs)
  Transitions either consume one input symbol or none (ε)
  Compact lexicon representation (Revuz, 1992; Daciuk et al., 2000; Carrasco and Forcada, 2002; Daciuk et al., 2005)
  Factor out prefixes & suffixes
[Figure: FSA with states q0 to q6 and transitions labelled f, y, o, l, r, u, k, ε and s, accepting folk/s, fork/s, four/s, yolk/s, york/s, your/s]
The hierarchy III: retrieval trees (tries)
  FSAs having a tree-like structure (Fredkin, 1960)
  Factor out prefixes but not suffixes
  Useful property: each state corresponds to a unique prefix
[Figure: trie for folk/s, fork/s, four/s, yolk/s, york/s, your/s; each state is labelled with the prefix it represents (ε, f, fo, fol, folk, folks, ..., you, your, yours)]
The hierarchy IV: recursive transition networks (RTNs)
  FSAs extended with recursive calls (Woods, 1970)
  Compact grammar representation: factor out infixes as well
  Modular grammars (Gross, 1999; Friburger, 2002; Constant, 2003b; Jung, 2005; Yannacopoulou, 2005; Voyatzi, 2006; Laporte, 2007...)
  A determiner (DET) followed by a noun (N) is a noun phrase (NP), e.g.: the machine
  An NP followed by a prepositional phrase (PP) is another NP, e.g.: the machine with calls
[Figure: RTN for NP with states q_NP0 to q_NP3 and call transitions {q_DET0}, {q_N0}, {q_NP0} and {q_PP0}]
The hierarchy V: input alphabets
  Letter machines: transitions may consume one specific symbol
  Lexical machines: transitions may consume any word complying with a set of morphosyntactic and/or semantic restrictions
  Input alphabet of predicates (e.g.: lexical masks) better suited for NL grammars (van Noord and Gerdemann, 2001)
  Difference affects implementation but not theory
  Letter machines are conceptually simpler
  Therefore, hierarchy described in terms of letter machines
  Guidelines given for the implementation of lexical machines
The hierarchy VI: machines with output
Finite-state transducers (FSTs) & RTNs with different kinds of output:
  Blackboards (FSTBOs & RTNBOs) → generic output
  Strings (FSTSOs & RTNSOs) → to translate or to insert metadata
  Weights (WFSMs) → implement heuristics for ambiguity resolution
  Unification (UFSMs) → ease the representation of long distance phenomena, subcategorization and free-permutable constituents
  Composite output (FSMCOs) → combined solution
[Figure: NP RTN with string output, states q_NP0 to q_NP5, with two ε-transitions that insert output around the recognized NP; the output labels were lost in extraction]
The hierarchy VII: filtered-popping RTNs (FPRTNs)
  Also called filtered-popping networks (FPNs)
  Efficient representation of the outputs generated by applying an RTN with output to an input sequence (Sastre, 2009)
  FPN = RTN + map κ of states to keys
  Pop transitions cannot be taken unless keys of final and return states match (they are filtered)
  FPN paths represent translations performed by the RTN
  Keys are indexes to the input symbols consumed by the RTN
  Keys give the correspondence between FPN paths and input segments
  Pop transitions are filtered in order to ensure that connected FPN paths correspond to translations of connected input segments
The hierarchy VII: filtered-popping RTNs (example)
[Figure: an RTNSO with states q0 to q9, transitions a:(, a:[, b:x, b:y, c:), c:] and calls to q6, and the FPN with states r0 to r9 obtained by translating the input abc; FPN states are annotated with keys 0 to 3]
Red pop transitions are forbidden:
  (r7, r5, r5) would allow skipping the translation of c
  (r9, r3, r3) would allow two consecutive translations of c
Efficient algorithms of application of local grammars
Formal description of machine behaviour I
  Application of machines in terms of execution states (ESs) x ∈ X
  ES = last machine state reached (q ∈ Q) + additional data representing the algorithm’s context of execution
  Exact definition depends on the machine and the algorithm
  Examples of ESs for top-down breadth-first and depth-first algorithms:
    FSAs: q, last state reached
    FSTBOs: (q, b), last state reached + output generated up to q
    RTNs: (q, π), last state reached + stack of return states
    RTNBOs: (q, b, π), combination of FSTBOs and RTNs
  Algorithms for FSAs generalized to other machines by treating ESs as FSA states
  Non-deterministic machines ⇒ multiple ESs for the same input
  Management of sets of ESs (SESs) V_i rather than single ESs
Formal description of machine behaviour II
  X_I: initial SES (for an RTNBO: initial state, empty blackboard and empty stack)
  X_F: final SES (for an RTNBO: final state, some blackboard, empty stack)
  D(V): set of ESs directly ε-reachable from V, that is, reachable from any ES of V through a transition that does not consume input
  Cε(V): set of ESs ε-reachable from V, that is, through zero, one or more ε-transitions
  Δ(V, σ): set of ESs directly reachable from V by consuming σ
  Δ*(V, σ1...σn): set of ESs reachable from V by consuming σ1...σn
  L(A): language accepted by machine A
  τ(A): language of translations generated by machine A
ε-closure
Generic computation of the ε-closure à la van Noord, 2000

Algorithm 1 eclosure(V)    ⊲ Cε(V)
  enqueue every ES in V
  while there are enqueued ESs do
    dequeue next ES x_s
    for each x_t ∈ D({x_s}) do
      if x_t ∉ V then
        add x_t to V and enqueue it
      end if
    end for
  end while
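The queue-based closure can be sketched in Python as follows. This is only an illustrative sketch: the callback `direct_eps` (returning the ESs directly ε-reachable from a single ES) and the toy transition map `EPS` are assumptions made for the example, not part of the thesis software.

```python
from collections import deque

def eclosure(initial, direct_eps):
    """Return the ε-closure of the set of execution states `initial`.

    `direct_eps(x)` is a hypothetical callback yielding the ESs reachable
    from the single ES `x` through one transition that consumes no input.
    """
    closure = set(initial)
    queue = deque(closure)          # enqueue every ES of the initial set
    while queue:
        source = queue.popleft()    # dequeue next ES
        for target in direct_eps(source):
            if target not in closure:
                closure.add(target)
                queue.append(target)
    return closure

# Toy machine with ε-transitions only (assumed data); here an ES is a state.
EPS = {"q0": ["q1", "q2"], "q1": ["q3"], "q2": ["q3"], "q3": ["q1"]}

closure_q0 = eclosure({"q0"}, lambda q: EPS.get(q, []))
closure_q3 = eclosure({"q3"}, lambda q: EPS.get(q, []))
```

Each ES is enqueued at most once, so the ε-cycle between q1 and q3 does not cause an infinite loop as long as the set of reachable ESs is finite.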
Breadth-first & depth-first application
Generic breadth-first application à la Sastre and Forcada, 2009

Algorithm 2 translate_string(σ1...σn)    ⊲ τ(σ1...σn)
  V_0 = Cε(X_I)
  for i = 0 to n − 1 do
    if V_i = ∅ then
      return ∅
    end if
    V_{i+1} = Cε(Δ(V_i, σ_{i+1}))
  end for
  return the set of blackboards of the ESs in V_n ∩ X_F

Unitex’s depth-first algorithm produces the same ESs but in depth-first order
  No use of SESs: follow a single path & backtrack at forks
  May process the same ES twice, but managing SESs is expensive
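A minimal Python sketch of the breadth-first translator, assuming a toy RTN with string output modelled on the bracket machine of the trace slides: ESs are triples (state, output, stack of return states). The tables `CONSUME`, `EPSILON` and `CALLS` are a hypothetical encoding chosen for this example only.

```python
from collections import deque

INITIAL, FINAL = "q0", "q5"
# (state, input symbol) -> [(output, target state)]
CONSUME = {
    ("q0", "a"): [("(", "q1"), ("[", "q3")],
    ("q2", "b"): [(")", "q5")],
    ("q4", "b"): [("]", "q5")],
}
# state -> [(output, target state)] for ε-transitions
EPSILON = {"q0": [("x", "q5")]}
# state -> [(called state, return state)] for call transitions
CALLS = {"q1": [("q0", "q2")], "q3": [("q0", "q4")]}

def direct_eps(es):
    """ESs reachable from `es` without consuming input: ε, call and pop."""
    state, out, stack = es
    for o, target in EPSILON.get(state, []):
        yield (target, out + o, stack)
    for called, ret in CALLS.get(state, []):
        yield (called, out, stack + (ret,))
    if state == FINAL and stack:
        yield (stack[-1], out, stack[:-1])

def eclosure(ess):
    closure = set(ess)
    queue = deque(closure)
    while queue:
        for target in direct_eps(queue.popleft()):
            if target not in closure:
                closure.add(target)
                queue.append(target)
    return closure

def translate_string(symbols):
    v = eclosure({(INITIAL, "", ())})
    for symbol in symbols:
        if not v:
            return set()
        v = eclosure({(target, out + o, stack)
                      for (state, out, stack) in v
                      for (o, target) in CONSUME.get((state, symbol), [])})
    # Final ESs: final state reached with an empty stack.
    return {out for (state, out, stack) in v if state == FINAL and not stack}

translations = translate_string("aabb")
```

Note that this sketch would loop forever on a left-recursive grammar, since every call pushes a new return state and therefore creates a new ES; the toy machine above has no left-recursive call.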
Pseudo-determinization
  Reduces the number of ESs reachable from other ESs
  Apply the machine for every input sequence it may consume
  Take SESs as the new states of the machine
  Problem: contrary to FSAs, machines with output may not be determinizable
  Solution: interpret the machines as FSAs by taking transition labels as mere input symbols
  Not a full determinization, but removes some structures that may lead to infinite loops
  Other problematic structures do not make sense for NL grammars
Pseudo-minimization
  Minimization reduces the size of the machine
  Pseudo-minimization à la van de Snepscheut, 1985
  There are more efficient minimization algorithms (Hopcroft et al., 2000), but we focus on algorithms of application: minimize once, apply to multiple sentences
  Simple procedure: reverse, pseudo-determinize, reverse, pseudo-determinize
  Reverse machine: produces reversed translations of reversed sequences
  Basically consists of swapping the initial and final sets of states and reversing the transitions
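The reverse/determinize/reverse/determinize procedure can be sketched for plain letter FSAs without ε-transitions; this double-reversal scheme is also commonly known as Brzozowski's algorithm. The tuple encoding `(initials, finals, delta)` and all function names are assumptions made for this sketch; for pseudo-minimization of machines with output, whole transition labels would play the role of the symbols.

```python
def reverse(fsa):
    """Swap initial and final states and reverse every transition."""
    initials, finals, delta = fsa
    rdelta = {}
    for (src, sym), targets in delta.items():
        for tgt in targets:
            rdelta.setdefault((tgt, sym), set()).add(src)
    return (set(finals), set(initials), rdelta)

def determinize(fsa):
    """Subset construction restricted to reachable, non-empty subsets."""
    initials, finals, delta = fsa
    finals = set(finals)
    symbols = {sym for (_, sym) in delta}
    start = frozenset(initials)
    seen, todo, ddelta = {start}, [start], {}
    while todo:
        subset = todo.pop()
        for sym in symbols:
            target = frozenset(t for q in subset
                               for t in delta.get((q, sym), ()))
            if target:
                ddelta[(subset, sym)] = {target}
                if target not in seen:
                    seen.add(target)
                    todo.append(target)
    return ({start}, {s for s in seen if s & finals}, ddelta)

def minimize(fsa):
    return determinize(reverse(determinize(reverse(fsa))))

def trie_automaton(words):
    """Build a trie-shaped FSA (state 0 is the root) accepting `words`."""
    delta, finals, count = {}, set(), 1
    for word in words:
        state = 0
        for ch in word:
            if (state, ch) not in delta:
                delta[(state, ch)] = {count}
                count += 1
            state = next(iter(delta[(state, ch)]))
        finals.add(state)
    return ({0}, finals, delta)

def count_states(fsa):
    initials, finals, delta = fsa
    states = set(initials) | set(finals)
    for (src, _), targets in delta.items():
        states.add(src)
        states |= targets
    return len(states)

def accepts(fsa, word):
    initials, finals, delta = fsa
    current = set(initials)
    for ch in word:
        current = {t for q in current for t in delta.get((q, ch), ())}
    return bool(current & set(finals))

WORDS = [stem + suffix for stem in ("folk", "fork", "four",
                                    "yolk", "york", "your")
         for suffix in ("", "s")]
trie = trie_automaton(WORDS)
minimal = minimize(trie)
```

On the lexicon of the FSA and trie slides, the trie factors out prefixes only, whereas the minimized automaton also shares the common suffixes.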
Flattening of RTNs
  Recursively replace call transitions by copies of the called substructures (up to n recursion levels)
  Equivalent to function inlining in C++
  Accelerates the machine application, but...
  ...size may increase exponentially w.r.t. n
  Complete flattening of RTNs results in FSAs
  Complete flattening of RTNs with output results in FSTs
  Complete flattening only possible for non-recursive RTNs, but...
  ...natural languages are recursive (in theory)
Earley-like acceptor
  Contrary to FSAs, RTNs can factor out infixes
  Breadth-first & depth-first treat RTNs as FSAs: take ESs with a stack as FSA states
    Problem 1: computation of common infixes is not factored out (exponential worst-case cost)
    Problem 2: left-recursive calls lead to infinite loops
  Earley-like: as breadth-first but without stacks
    Exploration of call transitions is paused
    Calls to the same set of states are initiated only once
    Paused explorations are resumed each time the call they depend on is completed
  Both problems solved
  Polynomial worst-case cost (n^3, without output generation)
Earley-like translator
  Outilex’s “trivial” extension of Earley’s algorithm for RTNs with output (see Sastre and Forcada, 2009):
    extend ESs with the blackboards generated from the last initiated call up to reaching the ES
    upon call completion, resume explorations with the combination of the pre-call and in-call blackboards
  Problem: ESs cloned due to different outputs to generate
  Indeed, implicit computation of pre-call × in-call blackboards
  RTN generating an exponential number of outputs w.r.t. the input length ⇒ exponential worst-case cost
  Example: unresolved prepositional phrase attachments
    The boy saw the man: 2^0 interpretations
    The boy saw the man with the telescope: 2^1 interpretations
    The boy saw the man with the telescope in the garden: 2^2 interpretations
Translation into FPNs
  Compute the set of outputs as an FPN accepting them instead of extending ESs with blackboards (Sastre, 2009)
  Earley acceptor ESs become FPN states
  Call transitions between ESs become FPN call transitions
  RTN infixes also factored out in the FPN
  No more ES cloning: create FPN transitions consuming the outputs
  Polynomial worst-case cost (n^3) even for grammars generating an exponential number of outputs w.r.t. the input length
FPN pruning & language generation
  FPN pruning: remove useless substructures due to input misinterpretations (Sastre et al., 2009)
  Prune before generating the language of outputs, if needed (the effective list of outputs, Sastre et al., 2009)
  Again, exponential worst-case cost, but...
  ...no time wasted computing translations of misinterpreted input segments
Blackboard set processing of FPNs
  Blackboard set processing (BSP): efficient generation of the language of an FPN
    Traverse the FPN by following a topological sort
    Avoid multiple explorations of FPN transitions
  Output FPNs are a kind of “acyclic” RTNs
  Topological sort possible as for PERT networks (Kahn, 1962), though there are no calls in PERT networks!
  Redefinition of topological sort for FPNs:
    Topological sort within call substructures as for PERT networks
    Initialization of call substructures in arbitrary order, but...
    ...return states must wait for every call completion they depend on
[Figure: example FPN with states r0 to r7 annotated with keys 0 to 3, calls to r2 and transitions labelled A and B]
Computing an FPN’s top-ranked blackboard in time n^3
  Extension of blackboard set processing for FPNs with weighted output
  Inspired by the dynamic programming algorithm for the computation of the edit distance between two strings (Wagner and Fischer, 1974)
    Traverse the FPN by following a topological sort
    Annotate at each state the maximum weight that can be generated up to reaching it and the corresponding last transition
    Traverse backwards the succession of last transitions and build the corresponding top blackboard
  Finally, a polynomial worst-case cost algorithm even for RTNs generating an exponentially increasing number of outputs
  Unification machines may produce incompatible feature structures
    Top blackboard might be illegal
    Compute the top non-illegal blackboard in time n^3? (future work)
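The annotate-then-backtrack idea can be sketched on a plain weighted acyclic graph, traversed in topological order with Kahn's algorithm. This is a deliberate simplification: call and pop transitions and the key filtering of FPNs are left out, and the edge list `EDGES` and the function `best_path` are assumptions made for the example.

```python
from collections import deque

def best_path(edges, source, target):
    """Return (max weight, output) of the best path, or None if unreachable.

    `edges` is a list of (src, dst, weight, output) tuples of an acyclic graph.
    """
    successors, indegree = {}, {}
    for src, dst, weight, output in edges:
        successors.setdefault(src, []).append((dst, weight, output))
        indegree[dst] = indegree.get(dst, 0) + 1
        indegree.setdefault(src, 0)
    best = {source: 0}   # maximum weight found up to each state
    back = {}            # last transition of the best path to each state
    ready = deque(n for n, d in indegree.items() if d == 0)
    while ready:
        node = ready.popleft()
        for dst, weight, output in successors.get(node, []):
            if node in best and (dst not in best
                                 or best[node] + weight > best[dst]):
                best[dst] = best[node] + weight
                back[dst] = (node, output)
            indegree[dst] -= 1
            if indegree[dst] == 0:   # every predecessor has been processed
                ready.append(dst)
    if target not in best:
        return None
    outputs, node = [], target
    while node != source:            # follow the last transitions backwards
        node, output = back[node]
        outputs.append(output)
    return best[target], "".join(reversed(outputs))

# Toy graph encoding four outputs: (x), (x], [x) and [x].
EDGES = [
    ("r0", "r1", 1, "("), ("r0", "r1", 2, "["),
    ("r1", "r2", 0, "x"),
    ("r2", "r3", 1, ")"), ("r2", "r3", 3, "]"),
]
top = best_path(EDGES, "r0", "r3")
```

Each transition is examined once, so the cost grows with the number of transitions rather than with the number of paths, which is what keeps the computation polynomial when the number of outputs is exponential.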
Algorithm optimizations
Efficient management of sets
Why should we care about set management?
  Algorithms make an intensive use of set data structures:
    Construction of sets of execution states (SESs)
    Construction of sets of outputs
  Self-balancing binary search trees (BSTs) are an efficient option
    Element searches have a logarithmic cost, but...
    ...addition & removal cost increased due to rebalancing
    Worst case: successive additions in direct or reverse order
[Figure: adding 2 and then 3 to a BST containing 1 forces a rebalance; a balanced BST holding 1 to 7 is shown alongside]
Red-black trees
  Addition of elements in random order tends to keep balance
  Strict balancing unnecessary
  GNU’s C++ Standard Template Library implements red-black trees (following Cormen et al., 2001): “half”-balanced BSTs
  Good compromise between balanced & unbalanced BSTs, but...
  ...once an FPN is built, further processing does not require element additions or searches ⇒ rebalances are unnecessary
[Figure: trees obtained by adding the elements 1 to 6 in random order and in direct order]
Double-linked red-black trees
  Red-black tree + double-linked list = double-linked red-black tree
  Once no more elements are to be added or searched, the structure can be treated as a mere double-linked list
  Faster element removal without rebalancing or even maintaining the tree structure
  Unexpected (but good) side effects (Das et al., 2008):
    Faster access to neighbour elements → faster element addition
    Faster sequential traversal → faster set deletion
[Figure: removing element 4 from a double-linked red-black tree holding 1 to 7; the remaining elements stay linked as a list]
Efficient management of sequences
Why should we care about sequence management?
  Algorithms generating output sequences or using stacks make an intensive use of sequence copies and comparisons:
    Compare a sequence when adding it to a set of ESs or outputs
    Copy α when building β as ασ (appending σ to the output)
    Copy π when building π′ as πq_r (pushing return state q_r)
    Copy π when popping return state q_r from π′
  Cost proportional to the sequence lengths
  Recall: each trie state corresponds to a unique string
  Sequences can be reduced to integer numbers: use the pointers to the nodes of a trie as identifiers
Retrieval trees for string management
  Build the trie as needed and retrieve the pointers (red arrows)
  Append σ to α: follow the pointer to α, search/insert the child ασ & return its pointer
  Remove σ from ασ: follow the pointer to ασ & return the parent pointer
  Operations on sequences reduced to pointer copies & comparisons
  Logarithmic worst-case cost instead of linear
[Figure: a trie holding in, of, on and to, and the step-by-step construction of the nodes for "of" (ε·o = o, o·f = of) and "on" (o·n = on), with red arrows showing the returned pointers]
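A minimal Python sketch of the idea, assuming a hypothetical `TrieNode` class: a sequence is represented by a reference to its trie node, so appending, popping and comparing sequences never copies the sequence itself. Children are stored here in a Python dict (expected constant-time lookup), which differs from the logarithmic-cost structure mentioned on the slide.

```python
class TrieNode:
    """A trie node; each node identifies the unique sequence leading to it."""
    __slots__ = ("parent", "symbol", "children")

    def __init__(self, parent=None, symbol=None):
        self.parent = parent
        self.symbol = symbol
        self.children = {}

    def append(self, symbol):
        """Return the node for this sequence extended with `symbol`."""
        child = self.children.get(symbol)
        if child is None:
            child = TrieNode(self, symbol)
            self.children[symbol] = child
        return child

    def pop(self):
        """Return the node for this sequence without its last symbol."""
        return self.parent

    def to_list(self):
        """Rebuild the explicit sequence (only needed for final output)."""
        symbols, node = [], self
        while node.parent is not None:
            symbols.append(node.symbol)
            node = node.parent
        symbols.reverse()
        return symbols

root = TrieNode()                      # the empty sequence ε
of_1 = root.append("o").append("f")    # build "of"
of_2 = root.append("o").append("f")    # build "of" again: same node
on = root.append("o").append("n")      # shares the prefix "o" with "of"
```

Two sequences are equal exactly when their nodes are the same object, so sets of ESs can store and compare node references instead of whole outputs or stacks.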
Empirical tests
Experimental conditions
The MovistarBot grammar & testing corpus
  Used for comparing the performance of the different algorithms and optimizations
  Translates Spanish sentences requesting mobile services into commands that an AIML chatterbot can easily understand
  The grammar (two versions):
    Pseudo-minimized version: 1359 states & 3141 transitions
    Flattened & pseudo-minimized version: 5504 states & 31702 transitions
  The corpus:
    168 sentences
    6.9 interpretations per sentence (average)
    10.1 words per sentence (average)
    4.1 characters per word (average)
Experimental results
Speedups w.r.t. Unitex’s depth-first algorithm
Grammar flattening (before pseudo-minimization): [1.43, 5.05]

Algorithm (pseudo-minimized grammar)               Speedup
FPN top blackboard                                 2.12
FPN blackboard set processing                      1.74
Optimized Earley                                   1.64
Outilex’s Earley                                   1.48
Optimized depth-first                              1.15
Unitex’s depth-first                               1
Optimized breadth-first                            0.68

Algorithm (flattened & pseudo-minimized grammar)   Speedup
FPN top blackboard                                 1.45
Optimized depth-first                              1.16
FPN blackboard set processing                      1.15
Unitex’s depth-first                               1
Optimized breadth-first                            0.76
Optimized Earley                                   0.72
Outilex’s Earley                                   0.69

Optimization            Pseudo-minimized   Flattened & pseudo-minimized
Set management          [1.02, 1.37]       [1.02, 1.12]
Sequence management     [1.14, 1.43]       [1.11, 1.37]
Both                    [1.30, 1.64]       [1.12, 1.45]
What about large coverage NL grammars?
  Speedups of new algorithms can be expected to be greater for large coverage NL grammars
    Main difference between new algorithms and Unitex’s and Outilex’s algorithms: more efficient treatment of non-determinism and ambiguity
    These factors are greater in large coverage NL grammars
  Furthermore, speedup of FPN top blackboard expected to increase exponentially w.r.t. ambiguity and non-determinism since...
  ...it has a polynomial worst-case cost instead of exponential
Conclusion
Grammar flattening: best optimization (when possible)
Faster algorithms of application of local grammars
  FPN top blackboard: the fastest for both MovistarBot grammars
  FPN blackboard set processing: faster than Unitex’s & Outilex’s algorithms
  Flattened grammar ⇒ Unitex’s algorithm faster than Outilex’s
A polynomial worst-case cost algorithm instead of exponential
  New algorithms treat ambiguity and non-determinism more efficiently
  Therefore even better results expected for larger NL grammars
Optimizations applicable to parsing algorithms in general:
  Efficient management of sets with double-linked red-black trees
  Efficient management of sequences with retrieval trees
A family of finite-state machines and algorithms of application
  A theoretical framework providing the tools for future extensions
Future work
Multiple proposals for further continuing this work (extensive list in the thesis)
  Algorithm enhancements
    Better strategies for the management of sets and sequences
    Parallelization: concurrent traversal of transitions
  Grammar optimizations
    Grammar filtering according to the sentence to apply (Boullier and Sagot, 2007)
    Flattening initial fragments of grammar paths (prefix overlay transducers, Marschner, 2007)
  Additional functionalities
    Efficient support of unification grammars (problem of the illegal top-blackboard)
    Tolerating errors (approximate string matching)
Acknowledgements
  Université Paris-Est, Ministère de l’Éducation Nationale, de la Recherche et de la Technologie & Centre National de la Recherche Scientifique: contrat d’engagement en qualité d’allocataire de recherche No 15198-2004
  Universitat d’Alacant: grant numbers INV05-40 & VIGROB-127
  Spanish Government: grant number TIC20033-080681-C02-02
  Universitat Politècnica de València, Instituto de Telecomunicaciones y Aplicaciones Multimedia & Telefónica I+D: Project “Tecnologías disruptivas para servicios avanzados en movilidad”, Ref. 48566/1
Traces
Breadth-first acceptor trace
RTN: q0 --a--> q1 --{q0}--> q2 --b--> q5;  q0 --a--> q3 --{q0}--> q4 --b--> q5;  q0 --ε--> q5
ESs are pairs (state, stack of return states); λ is the empty stack
  X_I: (q0, λ)
  ∪ Cε(X_I): (q5, λ)
  Δ(V0, a): (q1, λ), (q3, λ)
  ∪ Cε(Δ(V0, a)): (q0, q2), (q0, q4), (q5, q2), (q5, q4), (q2, λ), (q4, λ)
  Δ(V1, a): (q1, q2), (q3, q2), (q1, q4), (q3, q4)
  ∪ Cε(Δ(V1, a)): (q0, q2q2), (q0, q2q4), (q0, q4q2), (q0, q4q4), (q5, q2q2), (q5, q2q4), (q5, q4q2), (q5, q4q4), (q2, q2), (q4, q2), (q2, q4), (q4, q4)
  Δ(V2, b): (q5, q2), (q5, q4)
  ∪ Cε(Δ(V2, b)): (q2, λ), (q4, λ)
  Δ(V3, b): (q5, λ)
  ∪ Cε(Δ(V3, b)): no further ESs
Breadth-first translator trace
RTN: q0 --a:(--> q1 --{q0}--> q2 --b:)--> q5; q0 --a:[--> q3 --{q0}--> q4 --b:]--> q5; q0 --ε:x--> q5 ({q0} is a call to q0)
Input: aabb. Execution states are triples (state, output, stack); λ is the empty stack.
X_I: (q0, ε, λ)
∪ Cε(X_I): (q5, x, λ)
Δ(V0, a): (q1, (, λ) (q3, [, λ)
∪ Cε(Δ(V0, a)): (q0, (, q2) (q0, [, q4) (q5, (x, q2) (q5, [x, q4) (q2, (x, λ) (q4, [x, λ)
Δ(V1, a): (q1, ((, q2) (q3, ([, q2) (q1, [(, q4) (q3, [[, q4)
∪ Cε(Δ(V1, a)): (q0, ((, q2q2) (q0, ([, q2q4) (q0, [(, q4q2) (q0, [[, q4q4) (q5, ((x, q2q2) (q5, ([x, q2q4) (q5, [(x, q4q2) (q5, [[x, q4q4) (q2, ((x, q2) (q4, ([x, q2) (q2, [(x, q4) (q4, [[x, q4)
Δ(V2, b): (q5, ((x), q2) (q5, ([x], q2) (q5, [(x), q4) (q5, [[x], q4)
∪ Cε(Δ(V2, b)): (q2, ((x), λ) (q2, ([x], λ) (q4, [(x), λ) (q4, [[x], λ)
Δ(V3, b): (q5, ((x)), λ) (q5, ([x]), λ) (q5, [(x)], λ) (q5, [[x]], λ)
∪ Cε(Δ(V3, b)): no additional states
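The translator trace extends each execution state with the output generated so far. The sketch below shows one way to compute it, assuming the transition tables reconstructed from the trace; it is not the thesis code, and all names (CONSUME, EPSILON, CALL, closure, delta, translate) are hypothetical. As in any breadth-first exploration of this kind, it presumes a grammar without left recursion.

```python
from collections import deque

# RTN with output, reconstructed from the trace (an assumption).
CONSUME = {("q0", "a"): [("q1", "("), ("q3", "[")],
           ("q2", "b"): [("q5", ")")],
           ("q4", "b"): [("q5", "]")]}
EPSILON = {"q0": [("q5", "x")]}                      # ε-transition emitting x
CALL = {"q1": [("q0", "q2")], "q3": [("q0", "q4")]}  # state -> (called, return)
FINAL = {"q5"}
INITIAL = "q0"


def closure(states):
    """Add every (state, output, stack) reachable without consuming input."""
    seen = set(states)
    queue = deque(states)
    while queue:
        q, out, stack = queue.popleft()
        # ε-moves append their output symbol to the partial output.
        successors = [(r, out + o, stack) for r, o in EPSILON.get(q, [])]
        # Calls push the return state and leave the output untouched.
        successors += [(called, out, stack + (ret,))
                       for called, ret in CALL.get(q, [])]
        # Pops resume at the return state, keeping the accumulated output.
        if q in FINAL and stack:
            successors.append((stack[-1], out, stack[:-1]))
        for es in successors:
            if es not in seen:
                seen.add(es)
                queue.append(es)
    return seen


def delta(states, symbol):
    """Consume one input symbol, appending the transition's output."""
    return {(r, out + o, stack) for q, out, stack in states
            for r, o in CONSUME.get((q, symbol), [])}


def translate(word):
    """Return the set of outputs of all accepting interpretations."""
    v = closure({(INITIAL, "", ())})
    for symbol in word:
        v = closure(delta(v, symbol))
    return {out for q, out, stack in v if q in FINAL and not stack}
```

For the input aabb, translate returns the four bracketings of the last trace row. The number of execution states doubles with each additional a, because every partial output is kept as a separate state.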
Earley acceptor trace
RTN: q0 --a--> q1 --{q0}--> q2 --b--> q5; q0 --a--> q3 --{q0}--> q4 --b--> q5; q0 --ε--> q5 ({q0} is a call to q0)
Input: aabb.
X_I: (q0, λ, {q0}, 0)
∪ Cε(X_I): (q5, λ, {q0}, 0)
Δ(V0, a): (q1, λ, {q0}, 0) (q3, λ, {q0}, 0)
∪ Cε(Δ(V0, a)): (q0, λ, {q0}, 1) (q5, λ, {q0}, 1) (q2, λ, {q0}, 0) (q4, λ, {q0}, 0)
Δ(V1, a): (q1, λ, {q0}, 1) (q3, λ, {q0}, 1)
∪ Cε(Δ(V1, a)): (q0, λ, {q0}, 2) (q5, λ, {q0}, 2) (q2, λ, {q0}, 1) (q4, λ, {q0}, 1)
Δ(V2, b): (q5, λ, {q0}, 1)
∪ Cε(Δ(V2, b)): (q2, λ, {q0}, 0) (q4, λ, {q0}, 0)
Δ(V3, b): (q5, λ, {q0}, 0)
∪ Cε(Δ(V3, b)): no additional states
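An Earley-style acceptor for the same RTN can be sketched as follows. This is an illustration under stated assumptions rather than the algorithm of the thesis: items are simplified to triples (state, called state, start index), which drops the second component of the slide's 4-tuples (always λ in this example), and the transition tables and all function names are hypothetical.

```python
# RTN reconstructed from the trace (an assumption, not the author's code).
CONSUME = {("q0", "a"): ["q1", "q3"], ("q2", "b"): ["q5"], ("q4", "b"): ["q5"]}
EPSILON = {"q0": ["q5"]}
CALL = {"q1": [("q0", "q2")], "q3": [("q0", "q4")]}  # state -> (called, return)
FINAL = {"q5"}
INITIAL = "q0"


def close(chart, k):
    """Saturate chart[k] with ε-moves, predictions and completions."""
    agenda = list(chart[k])

    def add(item):
        if item not in chart[k]:
            chart[k].add(item)
            agenda.append(item)

    while agenda:
        q, head, start = agenda.pop()
        for r in EPSILON.get(q, []):
            add((r, head, start))
        for called, ret in CALL.get(q, []):
            # Predict: start exploring the called state at position k.
            add((called, called, k))
            # If that call was already completed without consuming input,
            # resume the caller immediately.
            if any(f in FINAL and h == called and s == k
                   for f, h, s in list(chart[k])):
                add((ret, head, start))
        if q in FINAL:
            # Complete: resume every item that called `head` at `start`.
            for p, h2, s2 in list(chart[start]):
                for called, ret in CALL.get(p, []):
                    if called == head:
                        add((ret, h2, s2))


def earley_trace(word):
    """Return the list of item sets V0 ... Vn for the given input."""
    chart = [set() for _ in range(len(word) + 1)]
    chart[0].add((INITIAL, INITIAL, 0))
    close(chart, 0)
    for k, symbol in enumerate(word, start=1):
        for q, head, start in chart[k - 1]:
            for r in CONSUME.get((q, symbol), []):
                chart[k].add((r, head, start))
        close(chart, k)
    return chart


def accepts(word):
    """Accept if the last set holds a completed top-level item."""
    return any(q in FINAL and head == INITIAL and start == 0
               for q, head, start in earley_trace(word)[-1])
```

Unlike the breadth-first acceptor, no stack is stored: a completed call looks up its callers in the set where the call started, so the number of items per set stays bounded by the number of states times the number of input positions.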
Earley translator trace
RTN: q0 --a:(--> q1 --{q0}--> q2 --b:)--> q5; q0 --a:[--> q3 --{q0}--> q4 --b:]--> q5; q0 --ε:x--> q5 ({q0} is a call to q0)
Input: aabb.
X_I: (q0, ε, λ, {q0}, 0)
∪ Cε(X_I): (q5, x, λ, {q0}, 0)
Δ(V0, a): (q1, (, λ, {q0}, 0) (q3, [, λ, {q0}, 0)
∪ Cε(Δ(V0, a)): (q0, ε, λ, {q0}, 1) (q5, x, λ, {q0}, 1) (q2, (x, λ, {q0}, 0) (q4, [x, λ, {q0}, 0)
Δ(V1, a): (q1, (, λ, {q0}, 1) (q3, [, λ, {q0}, 1)
∪ Cε(Δ(V1, a)): (q0, ε, λ, {q0}, 2) (q5, x, λ, {q0}, 2) (q2, (x, λ, {q0}, 1) (q4, [x, λ, {q0}, 1)
Δ(V2, b): (q5, (x), λ, {q0}, 1) (q5, [x], λ, {q0}, 1)
∪ Cε(Δ(V2, b)): (q2, ((x), λ, {q0}, 0) (q4, [(x), λ, {q0}, 0) (q2, ([x], λ, {q0}, 0) (q4, [[x], λ, {q0}, 0)
Δ(V3, b): (q5, ((x)), λ, {q0}, 0) (q5, [(x)], λ, {q0}, 0) (q5, ([x]), λ, {q0}, 0) (q5, [[x]], λ, {q0}, 0)
∪ Cε(Δ(V3, b)): no additional states
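The Earley translator trace can be approximated by adding an output string to each Earley item. The following is a sketch under assumptions, not the author's implementation: items are 4-tuples (state, output, called state, start index), the transition tables are reconstructed from the trace, and every name is hypothetical. A completed call concatenates its own output to the output of the item that issued the call.

```python
# RTN with output, reconstructed from the trace (an assumption).
CONSUME = {("q0", "a"): [("q1", "("), ("q3", "[")],
           ("q2", "b"): [("q5", ")")],
           ("q4", "b"): [("q5", "]")]}
EPSILON = {"q0": [("q5", "x")]}
CALL = {"q1": [("q0", "q2")], "q3": [("q0", "q4")]}  # state -> (called, return)
FINAL = {"q5"}
INITIAL = "q0"


def close(chart, k):
    """Saturate chart[k] with ε-moves, predictions and completions."""
    agenda = list(chart[k])

    def add(item):
        if item not in chart[k]:
            chart[k].add(item)
            agenda.append(item)

    while agenda:
        q, out, head, start = agenda.pop()
        for r, o in EPSILON.get(q, []):
            add((r, out + o, head, start))
        for called, ret in CALL.get(q, []):
            # Predict: the called state starts at k with empty output.
            add((called, "", called, k))
            # Resume at once over calls already completed at position k.
            for f, o2, h, s in list(chart[k]):
                if f in FINAL and h == called and s == k:
                    add((ret, out + o2, head, start))
        if q in FINAL:
            # Complete: append this call's output to each caller's output.
            for p, o_p, h2, s2 in list(chart[start]):
                for called, ret in CALL.get(p, []):
                    if called == head:
                        add((ret, o_p + out, h2, s2))


def earley_trace(word):
    """Return the list of item sets V0 ... Vn for the given input."""
    chart = [set() for _ in range(len(word) + 1)]
    chart[0].add((INITIAL, "", INITIAL, 0))
    close(chart, 0)
    for k, symbol in enumerate(word, start=1):
        for q, out, head, start in chart[k - 1]:
            for r, o in CONSUME.get((q, symbol), []):
                chart[k].add((r, out + o, head, start))
        close(chart, k)
    return chart


def translate(word):
    """Return the outputs of all completed top-level items."""
    return {out for q, out, head, start in earley_trace(word)[-1]
            if q in FINAL and head == INITIAL and start == 0}
```

Partial outputs are shared between calls here: the item (q5, x) started at position 1 is computed once and reused by both callers, whereas the breadth-first translator carries one copy per stack.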
References
Blanc, O. and Constant, M. (2006). Outilex, a linguistic platform for text processing. In Interactive Presentation Session of Coling-ACL06, pages 73–76, Morristown, NJ, USA. Association for Computational Linguistics.
Boullier, P. and Sagot, B. (2007). Are very large context-free grammars tractable? In Proceedings of the 10th International Workshop on Parsing Technologies (IWPT 07), Prague, Czech Republic.
Carrasco, R. C. and Forcada, M. L. (2002). Incremental construction and maintenance of minimal finite-state automata. Computational Linguistics, 28(2):207–216.
Cocke, J. and Schwartz, J. T. (1970). Programming languages and their compilers: Preliminary notes. Technical report, Courant Institute of Mathematical Sciences, New York University, New York.
Constant, M. (2003a). Converting linguistic systems of relational matrices into finite-state transducers. In Proceedings of the EACL Workshop on Finite-State Methods in Natural Language Processing, pages 75–82, Budapest.
Constant, M. (2003b). Grammaires locales pour l’analyse automatique de textes : Méthodes de construction et outils de gestion. PhD thesis, Université de Marne la Vallée.
Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. (2001). Introduction to algorithms. MIT Press, Cambridge, Massachusetts, 2nd edition.
Daciuk, J., Maurel, D., and Savary, A. (2005). Dynamic perfect hashing with finite-state automata. In Kłopotek, M. A., Wierzchoń, S., and Trojanowski, K., editors, Intelligent Information Processing and Web Mining, volume 31 of Advances in Soft Computing, pages 169–178. Springer Berlin / Heidelberg.
Daciuk, J., Mihov, S., Watson, B. W., and Watson, R. E. (2000). Incremental construction of minimal acyclic finite-state automata. Computational Linguistics, 26(1):3–16.
Das, D., Valluri, M., Wong, M., and Cambly, C. (2008). Speeding up STL set/map usage in C++ applications. In Kounev, S., Gorton, I., and Sachs, K., editors, Performance Evaluation: Metrics, Models and Benchmarks, volume 5119 of Lecture Notes in Computer Science, pages 314–321. Springer-Verlag.
Fredkin, E. (1960). Trie memory. Communications of the ACM, 3(9):490–499.
Friburger, N. (2002). Reconnaissance automatique de noms propres: Application à la classification automatique des textes journalistiques. PhD thesis, Université de Tours.
Gross, M. (1975). Méthodes en syntaxe. Hermann, Paris.
Gross, M. (1996). Lexicon-grammar. In Brown, K. and Miller, J., editors, Concise Encyclopedia of Syntactic Theories, pages 224–259. Pergamon Press, Oxford.
Gross, M. (1997). The construction of local grammars. In Roche, E. and Schabes, Y., editors, Finite State Language Processing, pages 329–352. MIT Press, Cambridge, MA, USA.
Gross, M. (1999). Lemmatization of compound tenses in English. Lingvisticæ Investigationes, 22:71–122.
Hopcroft, J. E., Motwani, R., and Ullman, J. D. (2000). Introduction to automata theory, languages, and computation. Addison-Wesley, 2nd edition.
Jung, E.-J. (2005). Grammaire des adverbes de durée et de date en coréen. PhD thesis, Université de Marne-la-Vallée.
Kahn, A. B. (1962). Topological sorting of large networks. Communications of the ACM, 5(11):558–562.
Kasami, T. (1965). An efficient recognition and syntax analysis algorithm for context free languages. Scientific Report AFCRL-65-758, Air Force Cambridge Research Laboratory, Bedford, Massachusetts.
Knuth, D. E. (1965). On the translation of languages from left to right. Information and Control, 8(6):607–639.
Laporte, E. (2005). In memoriam Maurice Gross. Archives of Control Sciences, 15(3):257–278. Special issue on Human Language Technologies as a challenge for Computer Science and Linguistics. Part I. (2nd Language and Technology Conference).
Laporte, E. (2007). Evaluation of a grammar of French determiners. In 27th Congress of the Brazilian Society of Computation (SBC’07), pages 1625–1634. Workshop on Information Technology and Human Language (TIL).
Leclère, C. (2002). Organization of the lexicon-grammar of French verbs. Lingvisticæ Investigationes, 25(1):29–48.
Marschner, C. (2007). Efficiently matching with local grammars using prefix overlay transducers. In Holub, J. and Ždárek, J., editors, Implementation and Application of Automata, volume 4783 of Lecture Notes in Computer Science, pages 314–316. Springer-Verlag.
Paumier, S., Nakamura, T., and Voyatzi, S. (2009). UNITEX, a corpus processing system with multi-lingual linguistic resources. In eLexicography in the 21st century: new challenges, new applications (eLEX’09), pages 173–175.
Revuz, D. (1992). Minimisation of acyclic deterministic automata in linear time. Theoretical Computer Science, 92(1):181–189.
Roche, E. (1993). Une représentation par automate fini des textes et des propriétés transformationnelles des verbes. Lingvisticæ Investigationes, 17(1):189–222.
Sastre, J. M. (2006a). Computer tools for the management of lexicon-grammar databases. In Proceedings of TALN’06, pages 600–608, Leuven, Belgium.
Sastre, J. M. (2006b). HOOP: a Hyper-Object Oriented Platform for the management of linguistic databases. Presentation in 25th Lexis and Grammar Conference, Palermo, Italy, September 6-9. Abstract available for download at http://www-igm.univ-mlv.fr/~sastre/publications/sastre06b.zip.
Sastre, J. M. (2009). Efficient parsing using filtered-popping recursive transition networks. In Maneth, S., editor, Implementation and Application of Automata, volume 5642 of Lecture Notes in Computer Science, pages 241–244. Springer-Verlag.
Sastre, J. M. and Forcada, M. L. (2009). Efficient parsing using recursive transition networks with output. In Vetulani, Z. and Uszkoreit, H., editors, Human Language Technology. Challenges of the Information Society, volume 5603 of Lecture Notes in Artificial Intelligence, pages 192–204. Springer-Verlag. Extended version.
Sastre, J. M., Sastre, J., and García, J. (2009). Boosting a chatterbot understanding with a weighted filtered-popping network parser. In Vetulani, Z., editor, Proceedings of the 4th Language & Technology Conference (LTC’09), pages 74–78, Poznań, Poland. Wydawnictwo Poznańskie Sp. z o.o.
Silberztein, M. D. (1998). INTEX: An integrated FST toolbox. In Wood, D. and Yu, S., editors, Automata Implementation, volume 1436 of Lecture Notes in Computer Science, pages 185–197. Springer Berlin / Heidelberg.
Silberztein, M. D. (2007). An alternative approach to tagging. In Kedad, Z., Lammari, N., Métais, E., Meziane, F., and Rezgui, Y., editors, Natural Language Processing and Information Systems, volume 4592 of Lecture Notes in Computer Science, pages 1–11. Springer Berlin / Heidelberg.
Tolone, E. (2011). Analyse syntaxique à l’aide des tables du Lexique-Grammaire du français. PhD thesis, Université Paris-Est.
Tomita, M. (1987). An efficient augmented-context-free parsing algorithm. Computational Linguistics, 13(1-2):31–46.
van de Snepscheut, J. L. A. (1985). Trace Theory and VLSI Design, volume 200 of Lecture Notes in Computer Science. Springer-Verlag. PhD thesis, Eindhoven University of Technology.
van Noord, G. (2000). Treatment of epsilon moves in subset construction. Computational Linguistics, 26(1):61–76.
van Noord, G. and Gerdemann, D. (2001). Finite state transducers with predicates and identities. Grammars, 4(3):263–286.
Voyatzi, S. (2006). Description morpho-syntaxique et sémantique des adverbes figés de phrase en vue d’un système d’analyse automatique des textes grecs. PhD thesis, Université de Marne-la-Vallée.
Wagner, R. A. and Fischer, M. J. (1974). The string-to-string correction problem. Journal of the ACM, 21(1):168–173.
Wallace, R. (2004). The elements of AIML style. ALICE AI Foundation.
Woods, W. A. (1969). Augmented transition networks for natural language analysis. Technical Report CS-1, Harvard Computation Laboratory.
Woods, W. A. (1970). Transition network grammars for natural language analysis. Communications of the ACM, 13(10):591–606.
Yannacopoulou, A. (2005). Le lexique-grammaire des verbes du grec moderne – Les constructions transitives locatives standard. PhD thesis, Université de Marne-la-Vallée.
Younger, D. H. (1967). Recognition and parsing of context-free languages in time n³. Information and Control, 10(2):189–208.