Time reduction of stochastic parsing with stochastic context-free grammars *
J.A. Sánchez and J.M. Benedí
Depto. Sistemas Informáticos y Computación, Universidad Politécnica de Valencia
Camino de Vera s/n, 46022 Valencia (Spain)
e-mail: {jandreu,jbenedi}@dsic.upv.es
Abstract. This paper proposes an approach to reduce the stochastic parsing time with stochastic context-free grammars. The basic idea consists of storing a set of precomputed problems. These precomputed problems are obtained off line from a training corpus or they are computed on line from a test corpus. In this work, experiments with the UPenn Treebank are reported in order to show the performance of both alternatives.
* This work has been partially supported by the Spanish MCyT under contract (TIC2002/04103C03-03) and by Agencia Valenciana de Ciencia y Tecnología under contract GRUPOS03/031.

1 Introduction

Stochastic Context-Free Grammars (SCFGs) are an important specification formalism that is frequently used in Syntactic Pattern Recognition. SCFGs have been widely used to characterize the probabilistic modeling of language in Computational Linguistics [3, 9, 1], Speech Recognition and Understanding [6], and Biological Sequence Analysis [8]. An important advantage of this formalism is its capability to model the long-term dependencies established between the different parts of a sentence, together with the possibility of incorporating stochastic information, which allows for an adequate modeling of the variability phenomena that are always present in complex problems. A notable obstacle to using these models is the time complexity of the stochastic parsing algorithms that handle them and of the algorithms that are used for the probabilistic estimation of the models from a training corpus.

Most of the well-known parsing algorithms are based on the Earley algorithm for SCFGs in General Format [9] or on the Cocke-Younger-Kasami (CYK) algorithm for SCFGs in Chomsky Normal Form (CNF) [6]. One of these algorithms for SCFGs in CNF is the inside algorithm [4], which allows us to compute the probability of a string given an SCFG by using a Dynamic Programming scheme. The inside algorithm has a time complexity of $O(n^3)$ for a string of length $n$.

There are theoretical works that attempt to improve this time complexity. In [10], a version of the CYK algorithm was proposed whose time complexity is $O(M(n))$, where $M(n)$ is the time complexity of the product of two matrices of dimension $n$. The best known algorithm for multiplying two matrices of dimension $n$ is described in [2], and its time complexity is $O(n^{2.38})$. A similar parsing algorithm could be considered for SCFGs by adequately modifying the inside algorithm. However, the large implicit constant associated with this matrix product algorithm could make the modified parsing algorithm worthwhile only for long strings.

Given these drawbacks, other improvement alternatives should be considered. In this work, we propose a simple technique that allows us to reduce computation time, especially for short strings. The basic idea consists of storing a set of precomputed problems associated with short strings. The set of problems can be chosen from a training corpus or it can be composed on line from a test set. In this work, we explore these two proposals and we report the results of experiments on the UPenn Treebank in order to show the performance of both alternatives.
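The basic idea can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the class name `SubproblemStore`, the parameter `max_len`, and the callable `solve` (a stand-in for any procedure that computes the solution of a subproblem, for instance the inside probabilities of a substring) are hypothetical names introduced here and do not come from the original work.

```python
from typing import Callable, Dict, Iterable, Sequence, Tuple

# Hypothetical encoding of a solved subproblem: non-terminal -> probability.
Solution = Dict[str, float]


class SubproblemStore:
    """Hash table of precomputed subproblems, keyed by the substring itself."""

    def __init__(self, solve: Callable[[Tuple[str, ...]], Solution], max_len: int):
        self.solve = solve        # assumed solver for one substring
        self.max_len = max_len    # only strings up to this length are stored
        self.table: Dict[Tuple[str, ...], Solution] = {}
        self.hits = 0
        self.misses = 0

    def preload(self, corpus: Iterable[Sequence[str]]) -> None:
        """Off-line variant: store every short substring of a training corpus."""
        for sentence in corpus:
            n = len(sentence)
            for i in range(n):
                for j in range(i + 1, min(n, i + self.max_len) + 1):
                    key = tuple(sentence[i:j])
                    if key not in self.table:
                        self.table[key] = self.solve(key)

    def get(self, substring: Sequence[str]) -> Solution:
        """On-line variant: reuse a stored solution, or solve and store it."""
        key = tuple(substring)
        if key in self.table:
            self.hits += 1
            return self.table[key]
        self.misses += 1
        solution = self.solve(key)
        if len(key) <= self.max_len:
            self.table[key] = solution
        return solution
```

Calling `preload` on a training corpus corresponds to the off-line alternative, while calling only `get` during parsing fills the table on line from the test set. Hashing a tuple of words takes time proportional to its length, so a lookup is not free, which is one reason the sketch restricts the table to short strings.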
2 Definitions

A Context-Free Grammar (CFG) $G$ is a four-tuple $(N, \Sigma, P, S)$, where $N$ is a finite set of non-terminal symbols, $\Sigma$ is a finite set of terminal symbols, $P$ is a finite set of rules, and $S$ is the initial symbol. A CFG is in Chomsky Normal Form (CNF) if the rules are of the form $A \to BC$ or $A \to a$ ($A, B, C \in N$ and $a \in \Sigma$). A Stochastic Context-Free Grammar (SCFG) $G_s$ is defined as a CFG in which each rule has a probability of application associated to it such that

$$\forall A \in N: \quad \sum_{B,C \in N} \Pr(A \to BC) + \sum_{a \in \Sigma} \Pr(A \to a) = 1.$$

We define the probability of the derivation $d_x$ of the string $x$, $\Pr_{G_s}(x, d_x)$, as the product of the probability application function of all the rules used in the derivation $d_x$. We define the probability of the string $x$ as

$$\Pr_{G_s}(x) = \sum_{\forall d_x} \Pr_{G_s}(x, d_x).$$

An important problem is the calculation of the probability of a string. For SCFGs in CNF, there are different parsing algorithms that are based on the CYK algorithm. We describe one of them below.

The inside algorithm [4] allows us to compute the probability of a string by defining $e(A\langle i, i+l\rangle) = \Pr_{G_s}(A \stackrel{*}{\Rightarrow} x_i \cdots x_{i+l})$, $0 \le l < n$, as the probability of the substring $x_i \ldots x_{i+l}$ being generated from $A$. This probability can be efficiently computed for a string of size $n$ with the following Dynamic Programming scheme for all $A \in N$:

$$
\begin{aligned}
e(A\langle i, i\rangle) &= \Pr(A \to x_i), && 1 \le i \le n, \\
e(A\langle i, i+l\rangle) &= \sum_{\substack{B,C \in N: \\ (A \to BC) \in P}} \; \sum_{k=i}^{i+l-1} \Pr(A \to BC)\, e(B\langle i, k\rangle)\, e(C\langle k+1, i+l\rangle), && 1 \le l < n,\; 1 \le i \le n-l.
\end{aligned}
\tag{1}
$$
In this way, $\Pr_{G_s}(x) = e(S\langle 1, n\rangle)$.

First, we analyze the time complexity of the inside algorithm from expression (1). In the next section, we explain how the computation time can be improved. Note that the inner loop in the inside algorithm comprises two products and one addition. Suppose that we denote the cost of the two products and the addition by $a$. Then, the total amount of operations is:

$$\sum_{l=1}^{n-1} \sum_{i=1}^{n-l} \sum_{k=i}^{i+l-1} a|P| = \frac{n^3 - 3n^2 + 2n}{3}\, a|P|. \tag{2}$$
Consequently, the time complexity of the inside algorithm is $O(n^3 |P|)$.
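As an illustration of the Dynamic Programming scheme in expression (1), the following Python sketch fills the table $e$ bottom-up. The function name `inside_probability`, the dictionary encoding of the rules (`unary` and `binary`), and the 0-based indexing are assumptions made for this example and are not part of the original formulation.

```python
from collections import defaultdict


def inside_probability(words, unary, binary, start="S"):
    """Probability of `words` under an SCFG in CNF (illustrative sketch).

    unary:  dict mapping (A, a)    -> Pr(A -> a)     (assumed encoding)
    binary: dict mapping (A, B, C) -> Pr(A -> B C)   (assumed encoding)
    """
    n = len(words)
    # e[(i, j)][A] = probability that A generates words[i..j], 0-based inclusive.
    e = defaultdict(lambda: defaultdict(float))

    # Base case of expression (1): substrings of a single symbol.
    for i, word in enumerate(words):
        for (A, a), p in unary.items():
            if a == word:
                e[(i, i)][A] += p

    # Recursive case: l is the span offset, i the start, k the split point.
    for l in range(1, n):
        for i in range(0, n - l):
            j = i + l
            cell = e[(i, j)]
            for (A, B, C), p in binary.items():
                for k in range(i, j):
                    cell[A] += p * e[(i, k)][B] * e[(k + 1, j)][C]

    # The probability of the whole string is the entry for the initial symbol.
    return e[(0, n - 1)][start]
```

For example, with the toy grammar `S -> S S` (probability 0.4) and `S -> a` (probability 0.6), the call `inside_probability(["a", "a"], {("S", "a"): 0.6}, {("S", "S", "S"): 0.4})` multiplies 0.4 by 0.6 twice. The three nested loops over `l`, `i` and `k`, together with the loop over the binary rules, are where the cubic dependence on $n$ and the linear dependence on $|P|$ come from.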
3 Time reduction of the stochastic parsing algorithm

Here, we explain how to reduce the time required for stochastic parsing. We state the improvement that can be obtained and, finally, we explain the main disadvantage of the proposal. Note that in expression (1), each substring is a subproblem in the Dynamic Programming scheme. One possible way to reduce computations in expression (1) consists of precomputing all the problems. In this case, expression (1) can be computed by consulting such precomputed problems. It should be pointed out that with this proposal, the efficient search of a precomputed problem becomes a serious problem. In order to carry out this search efficiently, we have used hash tables. By using this data structure, the search can be done in time linear in the length of the subproblem. If the time complexity of looking for a subproblem of length $l$ is $l$ times $c$, where $c$ is the implicit constant associated to the search in the hash table, then (2) becomes $c f(n)$, where $f(n) = (n^3 - 3n^2 + 2n)/3$. Note that for real tasks it is reasonable to think that c