Efficient Parsing Using Filtered-Popping Recursive Transition Networks

Javier M. Sastre-Martínez

Laboratoire d'Informatique de l'Institut Gaspard Monge, Université Paris-Est, F-77454 Marne-la-Vallée Cedex 2, France
Grup Transducens, Departament de Llenguatges i Sistemes Informàtics, Universitat d'Alacant, E-03071 Alacant, Spain
[email protected]
Abstract. We present here filtered-popping recursive transition networks (FPRTNs), a special breed of RTNs, and an efficient parsing algorithm based on recursive transition networks with string output (RTNSOs) which constructs the set of parses of a potentially ambiguous sentence as a FPRTN in polynomial time. By constructing a FPRTN rather than an explicit parse enumeration, we avoid the exponential explosion that arises when the number of parses increases exponentially w.r.t. the input length. The algorithm is compatible with the grammars that can be manually developed with the Intex and Unitex systems.
1 Introduction
This paper describes filtered-popping recursive transition networks (FPRTNs), an extension of recursive transition networks (RTNs) [1] which serves as a compressed representation of a potentially exponential set of sequences, and gives the modifications to perform on the Earley-like algorithm for RTNs with string output (RTNSOs) given in [2] for building a FPRTN recognizing the language of translations of a given input sequence in polynomial time. If RTNSOs represent grammars where transition output labels are tags bounding sentence compounds, then the algorithm computes the set of parses of a given sentence. Extending Earley's algorithm [3] for output generation raises its asymptotic cost from polynomial to exponential due to cases where the number of outputs increases exponentially w.r.t. the input length; for instance, sentences with unresolved prepositional phrase (PP) attachments [4] have an exponentially large number of parses w.r.t. the number of PPs (e.g.: the girl saw the monkey with the telescope under the tree). RTNs with output are used by both the Intex [5] and Unitex [6] systems in order to represent natural language grammars.
2 Recursive Transition Networks
S. Maneth (Ed.): CIAA 2009, LNCS 5642, pp. 241–244, 2009. © Springer-Verlag Berlin Heidelberg 2009

Given the definition of RTNSO in [2], we define a RTN R = (Q, Σ, δ, QI, F) by removing the output alphabet Γ and by removing the output labels of

– translating and inserting transitions, which become consuming transitions δ(qs, σ) → qt, that is, just read input symbol σ, and
– deleting and ε2-transitions, which become explicit ε-transitions δ(qs, ε) → qt, that is, do not read or write symbols.

Call, push and pop transition definitions are not modified since they define no output. We obtain RTN execution states (ESs) x = (q, π) ∈ (Q × Q∗) by suppressing outputs from RTNSO ESs, x representing the fact of reaching a state q after generating a stack π of return states. Δ, the extension of the transition function δ for a set of execution states (SES) V and input symbol σ, becomes

Δ(V, σ) = {(qt, π) : qt ∈ δ(qs, σ) ∧ (qs, π) ∈ V} ,   (1)
and ε-moves adding elements to the ε-closure are redefined as follows:

– explicit ε-transitions: add (qt, π) for each (qs, π) in the ε-closure and for each qt ∈ Q such that qt ∈ δ(qs, ε);
– push transitions: add (qc, πqt) for each (qs, π) in the ε-closure and for each qc, qt ∈ Q such that qt ∈ δ(qs, qc);
– pop transitions: add (qr, π) for each (qf, πqr) in the ε-closure such that qf ∈ F.

The initial SES is redefined as XI = QI × {λ}, that is, recognition starts from an initial state without having started any call, and the acceptance SES as XF = F × {λ}, that is, recognition ends once an acceptance state is reached without uncompleted calls. Δ∗, the extension of Δ for input sequences, is not modified except for the use of the redefined Δ and ε-closure functions. We define the language of a RTN A instead of the language of translations as

L(A) = {w ∈ Σ∗ : Δ∗(XI, w) ∩ XF ≠ ∅} .   (2)
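As a concrete illustration of the redefined Δ, ε-closure and L(A) above, the following Python sketch runs an RTN over an input word. The dict-based encoding of δ (separate tables for consuming, ε and call transitions) and the toy aⁿbⁿ network are our own illustrative assumptions, not the paper's notation; note also that this naive ε-closure need not terminate on RTNs whose calls can loop without consuming input, which is one motivation for the Earley-like processing of section 4.

```python
# Sketch of RTN recognition per the definitions above. An execution state
# (ES) is a pair (q, pi): current state q and a tuple pi of return states.
# The split transition tables (consume/eps/calls) are an encoding assumption.

def eps_closure(ess, eps, calls, final):
    """Close a SES under explicit epsilon, push and pop moves."""
    todo, closed = list(ess), set(ess)
    while todo:
        q, pi = todo.pop()
        new = [(qt, pi) for qt in eps.get(q, ())]                 # explicit eps
        new += [(qc, pi + (qt,)) for qc, qt in calls.get(q, ())]  # push
        if q in final and pi:                                     # pop
            new.append((pi[-1], pi[:-1]))
        for es in new:
            if es not in closed:
                closed.add(es)
                todo.append(es)
    return closed

def recognizes(word, consume, eps, calls, initial, final):
    """Delta* over word, starting from X_I = Q_I x {lambda}."""
    v = eps_closure({(q, ()) for q in initial}, eps, calls, final)
    for sym in word:
        v = {(qt, pi) for (qs, pi) in v                # Delta(V, sigma)
             for qt in consume.get((qs, sym), ())}
        v = eps_closure(v, eps, calls, final)
    # accept iff some ES lies in X_F = F x {lambda}
    return any(q in final and not pi for (q, pi) in v)

# Toy RTN for { a^n b^n : n >= 1 }: q0 -a-> q1, q1 calls q0 returning to q2,
# q1 -b-> q3 and q2 -b-> q3, with q3 the only acceptance state.
CONSUME = {('q0', 'a'): ('q1',), ('q1', 'b'): ('q3',), ('q2', 'b'): ('q3',)}
CALLS = {'q1': (('q0', 'q2'),)}
```

Running `recognizes("aabb", CONSUME, {}, CALLS, {'q0'}, {'q3'})` exercises the push on the recursive call and the pop back to q2 once the inner aⁿbⁿ segment is consumed.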
3 Filtered-Popping Recursive Transition Networks
A FPRTN (Q, K, Σ, δ, κ, QI, F) is a RTN extended with a finite set of keys K and a function κ : Q → K that maps states to keys in K. FPRTNs behave as RTNs except for pop transitions: bringing the machine from an acceptance state qs to a popped state qr is only possible if κ(qs) = κ(qr); we say pop transitions are filtered.
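The filtering condition can be isolated in a few lines. The sketch below, with illustrative names throughout, shows the only behavioural difference w.r.t. a plain RTN: a pop from an acceptance state q to the stacked return state qr goes through only when κ(q) = κ(qr).

```python
# FPRTN pop filtering: keys must agree between the acceptance state being
# left and the return state being popped. All names are illustrative.

def try_pop(es, final, kappa):
    """Return the ES reached by a filtered pop, or None if it is blocked."""
    q, pi = es
    if q not in final or not pi:
        return None                 # no pop available at all
    qr = pi[-1]
    if kappa[q] != kappa[qr]:       # the FPRTN filter on pop transitions
        return None
    return (qr, pi[:-1])
```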
4 Language of a RTN via Earley-Like Processing
We define the Earley-like computation of the acceptance/rejection of an input sequence by a RTN by suppressing the outputs from the Earley-like processing for RTNSOs given in [2]. ESs become 4-tuples (qs, qc, qh, j) ∈ Q × (Q ∪ {λ}) × Q × IN, the Δ function, analogous to Earley's "scanner", becomes

Δ(V, σ) = {(qt, λ, qh, j) : qt ∈ δ(qs, σ) ∧ (qs, λ, qh, j) ∈ V}   (3)

and the ε-moves adding ESs to the ε-closure are redefined as follows:
– explicit ε-transitions: add (qt, λ, qh, j) for each (qs, λ, qh, j) in the ε-closure of Vk and for each qt such that qt ∈ δ(qs, ε);
– push transitions: analogously to Earley's "predictor", add (qt, qc, qh, j) and (qc, λ, qc, k) for each (qs, λ, qh, j) in the ε-closure of Vk and for each qc and qt such that qt ∈ δ(qs, qc); (qt, qc, qh, j) is the paused ES waiting for qc's call completion and (qc, λ, qc, k) is the active ES initiating the call;
– pop transitions: analogously to Earley's "completer", for each (qf, λ, qc, j) such that qf ∈ F (the ESs completing the call to qc) and for each (qr, qc, qh, i) ∈ Vj (the paused ESs depending on the call to qc), retroactively add (qr, λ, qh, i) to the ε-closure of Vk (we resume these paused ESs).

Retroactive call completion is explained in [2], an approach based on the management of deletable non-terminals for CFGs explained in [7]. The initial SES is redefined as XI = {(qs, λ, qs, 0) : qs ∈ QI}, that is, the ESs initiating a call to each initial state, and the acceptance SES as XF = F × {λ} × QI × {0}, that is, the ESs triggering a pop from an initial call. Δ∗ and L are not modified w.r.t. section 2 except for the use of the sets and functions redefined here.
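The scanner and the ε-moves above assemble into a chart-style recognizer. The sketch below is an illustrative Python rendering under our own encoding assumptions (items (q, qc, qh, j) with qc = None standing for λ, split transition tables as in section 2); retroactive completion is handled by rechecking Vk whenever a paused ES is added to it.

```python
# Earley-like RTN recognizer over items (q, qc, qh, j): q is the current
# state, qc the awaited callee (None for active items), qh the state the
# enclosing call started at, j the input position where it started.

def earley_recognizes(word, consume, eps, calls, initial, final):
    n = len(word)
    V = [set() for _ in range(n + 1)]

    def close(k):
        todo = list(V[k])
        while todo:
            q, qc, qh, j = todo.pop()
            if qc is None:                      # active item
                new = [(qt, None, qh, j) for qt in eps.get(q, ())]
                for c, qt in calls.get(q, ()):  # "predictor"
                    new += [(qt, c, qh, j), (c, None, c, k)]
                if q in final:                  # "completer": call to qh
                    new += [(qr, None, gh, i)   # started at j is done
                            for (qr, c, gh, i) in list(V[j]) if c == qh]
            else:                               # paused item: retroactive
                new = [(q, None, qh, j)         # epsilon-span completion
                       for (qf, c, h, j2) in list(V[k])
                       if c is None and h == qc and j2 == k and qf in final]
            for it in new:
                if it not in V[k]:
                    V[k].add(it)
                    todo.append(it)

    V[0] = {(q, None, q, 0) for q in initial}   # X_I
    close(0)
    for k, sym in enumerate(word, start=1):
        V[k] = {(qt, None, qh, j)               # "scanner"
                for (qs, qc, qh, j) in V[k - 1] if qc is None
                for qt in consume.get((qs, sym), ())}
        close(k)
    # X_F: active final items popping the initial call started at 0
    return any(qc is None and q in final and qh in initial and j == 0
               for (q, qc, qh, j) in V[n])

# Same toy a^n b^n RTN as before, reused here as a running example.
CONSUME = {('q0', 'a'): ('q1',), ('q1', 'b'): ('q3',), ('q2', 'b'): ('q3',)}
CALLS = {'q1': (('q0', 'q2'),)}
```

Unlike the naive closure of section 2, the per-position chart bounds the number of items by a polynomial in the input length, since return information is factored into (qh, j) pairs instead of explicit stacks.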
5 Translating a String into a FPRTN
We give here the modifications to perform on the Earley-like algorithm in [2] for the generation of a FPRTN A′ = (Q′, K, Σ′, δ′, κ, Q′I, F′) from a RTNSO A = (Q, Σ, Γ, δ, QI, F) and input σ1 . . . σl, where Σ′ = Γ, K = {0, . . . , l} and, given a path p within A′ having r and r′ as start and end states, p consumes a possible translation of σκ(r)+1 . . . σκ(r′) (see Fig. 1). First of all, we obtain a RTN Earley-like algorithm from the one for RTNSOs in [2] by suppressing outputs, as shown in the equations above. Then we insert the following instructions for the construction of the FPRTN:
Fig. 1. At the left, an ambiguous RTNSO, and at the right, an FPRTN recognizing the language of translations of abc for this RTNSO. Boxes contain the key of the state they are attached to. FPRTN push and pop transitions are explicitly represented as dotted and thick arrows, respectively. Only pop transitions corresponding to connected input segments are allowed: pop transitions from r7 to r5 and from r9 to r3 are forbidden since the former skips the translation of c and the latter translates c twice.
– create states rI ∈ Q′I with κ(rI) = 0 and rF ∈ F′ with κ(rF) = l,
– for each active ES xt to be added to a SES Vk, create a non-initial state rt ∈ Q′ with κ(rt) = k, create the map entry ζs(k, xt) → rt and add rt to F′ iff xt represents having reached an RTNSO acceptance state,
– for each xt ∈ XI, add transition δ′(rI, ζs(0, xt)) → rF,
– for each paused ES xp ∈ Vk derived from an active source ES xs ∈ Vk due to a call transition with xc ∈ Vk as the active ES initiating the call, create the map entries ζs(k, xp) → ζs(k, xs) and ζc(k, xp) → ζs(k, xc),
– let xs ∈ Vj be the active ES that xt ∈ Vk is derived from; if the derivation is due to a non-call RTNSO transition generating g ∈ Γ ∪ {ε}, then add transition δ′(ζs(j, xs), g) → ζs(k, xt); otherwise, if it is due to a call completion resuming a paused ES xp ∈ Vi, then add transition δ′(ζs(i, xp), ζc(i, xp)) → ζs(k, xt).
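The bookkeeping in the list above can be made concrete with a small helper class. This is a schematic fragment under our own naming assumptions: it shows only the ζs/ζc map management, the key assignment κ(rt) = k and the insertion of output transitions, not the full Earley-like driver that feeds it.

```python
import itertools

# Schematic bookkeeping for the FPRTN construction: one fresh state per
# (position k, active ES), keyed by k via kappa. Names are illustrative.

class FPRTNBuilder:
    def __init__(self):
        self._fresh = itertools.count()
        self.zeta_s = {}        # (k, ES) -> FPRTN state
        self.zeta_c = {}        # (k, paused ES) -> state of the callee's ES
        self.kappa = {}         # FPRTN state -> key in {0, ..., l}
        self.final = set()      # F'
        self.delta = set()      # FPRTN transitions (source, label, target)

    def state_for(self, k, es, accepting):
        """zeta_s(k, es): create the state on first use, with key k."""
        if (k, es) not in self.zeta_s:
            r = next(self._fresh)
            self.zeta_s[(k, es)] = r
            self.kappa[r] = k
            if accepting:       # es has reached an RTNSO acceptance state
                self.final.add(r)
        return self.zeta_s[(k, es)]

    def on_output(self, j, src_es, g, k, tgt_es, accepting):
        """A non-call RTNSO transition generating g becomes an FPRTN
        transition from zeta_s(j, src_es) to zeta_s(k, tgt_es)."""
        self.delta.add((self.zeta_s[(j, src_es)], g,
                        self.state_for(k, tgt_es, accepting)))
```

Since states are memoized on (k, es), derivations that reconverge on the same ES share one FPRTN state, which is what keeps the output machine polynomial even when it encodes exponentially many translations.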
6 Empirical Tests
The algorithm has been tested on the same exponential RTNSO translator and under the same conditions as in [2], section 6. The measured times are just twice those of the acceptor-only Earley algorithm (see Fig. 2 of [2]), hence keeping a linear cost instead of an exponential one for this case.
7 Future Work
We are currently studying probabilistic pruning methods for weighted FPRTNs in order to compute the highest-ranked outputs in polynomial time.

Acknowledgments. This work has been partially supported by the Spanish Government (grant number TIN2006–15071–C03–01), by the Universitat d'Alacant (grant number INV05–40), by the MENRT and by the CNRS. We thank Profs. E. Laporte and M. Forcada for their useful comments.
References

1. Woods, W.A.: Transition network grammars for natural language analysis. Commun. ACM 13(10), 591–606 (1970)
2. Sastre, J.M., Forcada, M.L.: Efficient parsing using recursive transition networks with string output. LNCS (LNAI), vol. 5603. Springer, Heidelberg (in press)
3. Earley, J.: An efficient context-free parsing algorithm. Commun. ACM 13(2), 94–102 (1970)
4. Ratnaparkhi, A.: Statistical models for unsupervised prepositional phrase attachment. In: COLING-ACL 1998, Morristown, NJ, USA, ACL, pp. 1079–1085 (1998)
5. Silberztein, M.D.: Dictionnaires électroniques et analyse automatique de textes. Le système INTEX. Masson, Paris (1993)
6. Paumier, S.: Unitex 1.2 User Manual. Université de Marne-la-Vallée (2006)
7. Aycock, J., Horspool, N.: Practical Earley Parsing. The Computer Journal 45(6), 620–630 (2002)