Natural Language Engineering 1 (2): 147-161

© 1995 Cambridge University Press


Best parse parsing with Earley's and Inside algorithms on probabilistic RTN

YOUNG S. HAN and KEY-SUN CHOI
Korea Advanced Institute of Science and Technology, Taejon, Korea

Abstract

Inside parsing is a best parse parsing method based on the Inside algorithm that is often used in estimating the probabilistic parameters of stochastic context free grammars. It gives a best parse in $O(N^3G^2)$ time, where $N$ is the input size and $G$ is the grammar size. The Earley algorithm can be made to return best parses with the same complexity in $N$. By way of experiments, we show that Inside parsing can be more efficient than Earley parsing given a sufficiently large grammar and sufficiently short input sentences. For instance, Inside parsing is better with sentences of 16 or fewer words for a grammar containing 429 states. In practice, parsing can be made efficient by employing the two methods selectively. The redundancy of the Inside algorithm can be reduced by top-down filtering using the chart produced by the Earley algorithm, which is useful in training the probabilistic parameters of a grammar. Extensive experiments on the Penn Tree corpus show that the efficiency of Inside computation can be improved by up to 55%.

1 Introduction

Probabilistic approaches suggested to cope with the uncertainty arising at various levels of natural language processing are exemplified by Hidden Markov models (Charniak et al., 1993), stochastic CFGs (Jelinek et al., 1990; Lari and Young, 1990), and Bayesian models (Han and Choi, 1993; Charniak and Goldman, 1993). Hidden Markov models have been successful for the problems of regular languages, particularly in part-of-speech tagging and speech processing. For context free languages, probability acquisition algorithms are established for many diverse parsing methods. Some probabilistic frameworks are the LR parser (Wright, 1990), Generalized LR (Briscoe and Carroll, 1993), Tree Adjoining Grammar (Schabes, 1992), Link grammar (Lafferty et al., 1992), Dependency grammar (Glenn and Charniak, 1992), and general CFG (Kupiec, 1991; Han and Choi, 1993). One of the reasons to associate probabilities with grammars is to retrieve the most probable analyses that can generate a given input sentence. Carroll and Briscoe (1992) suggested a heuristic technique for best-first search on a parse forest,


while a Viterbi-like algorithm for unpacking parse forests to retrieve the n best analyses was introduced by Wright et al. (1991). Though there have been many studies on probabilistic parsing methods, not much is known about the performance of best parse parsing in realistic implementations.

In this paper, we introduce the Earley and Inside algorithms as best parse parsers on probabilistic RTN, and show that the two parsing methods are ideal to work together, complementing each other's defects. The Inside algorithm, better known as a method to train probabilistic parameters, becomes a parser that returns the n best analyses when the summations in the algorithm are replaced by maximizations. Alternatively, Inside parsing can be said to be a Viterbi algorithm applied to the probabilistic RTN. The Inside parser runs in $O(N^3G^2)$ time, where $N$ is the input size and $G$ is the grammar size. The Earley algorithm has the same complexity in $N$ (Aho and Ullman, 1972), but it turns out that the Earley algorithm is less sensitive to $N$ while the Inside parser is less sensitive to $G$. According to our experiments, given a grammar of 429 states, the Inside algorithm outperformed the Earley algorithm with sentences whose length is less than 16 on the average.

An important observation is that as the grammar size grows, Inside parsing will become more efficient than Earley's, making Earley parsing good only for longer sentences. On the other hand, the performance of the Inside algorithm degrades as the sentence size grows. It is then reasonable to use the two parsing methods selectively according to the sentence length (a minimal dispatch sketch is given at the end of this introduction).

The computation of an Inside probability (or parsing) can be made to run faster with the aid of the Earley algorithm. After Earley parsing is applied, subsequent Inside parsing works on the chart produced by the Earley parser, so that the number of inside computations of the Inside parsing may be reduced. This combined method is suggestive particularly for the problem of reestimating the probabilistic parameters of grammars.

The implementation of the Earley algorithm for the experiments employed a reachability test in constructing each Earley list to reduce redundant items, but no other tricks were used to enhance the speed. Inside parsing also used a chart to store computed Insides so as to avoid repeating the same computations. The insights from our experiments should lead to more practical designs for a wide range of language processing applications. Because Inside parsing is natural for best parse parsing and parsers for probabilistic grammars may well be best parse parsing methods, we compared the Earley and Inside algorithms as best parse parsers, but the conclusions from the experiments should also apply to the plain versions of the algorithms, since best parse parsing does not cost more time in chart based parsing.

In the next section, a brief explanation of our underlying representation, probabilistic RTN, is given, followed by the definition of Inside parsing. In Section 3, the Earley algorithm for probabilistic RTN is described. In Section 4, empirical results to support the performance characteristics of the Earley and Inside parsers are presented, and an efficient method of Inside computation that uses the result of Earley parsing is explained in Section 5. A conclusion is given in Section 6.
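As a minimal sketch of the selective strategy, the dispatcher below (entirely our own illustration; the function and type names are hypothetical) applies the breakpoints reported in Sections 4 and 6, namely networks of more than 200 states and sentences of fewer than 15 words:

```c
typedef enum { USE_INSIDE, USE_EARLEY } ParserChoice;

/* Hypothetical dispatcher: pick the parser from the network size and
   the sentence length, using the breakpoints reported in this paper
   (Inside parsing wins on large networks and short sentences). */
ParserChoice choose_parser(int n_states, int sentence_len)
{
    if (n_states > 200 && sentence_len < 15)
        return USE_INSIDE;
    return USE_EARLEY;
}
```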

Fig. 1. Illustration of PRTN. A parse is composed of dark-headed transitions. (The figure pairs a small CFG, with rules NP → art AP noun, NP → AP noun, NP → noun, AP → adj AP, and AP → adj, with the equivalent PRTN.)

2 Probabilistic RTN and Inside parsing

In this section, we first describe probabilistic RTN (PRTN) and define Inside parsing for PRTN after explaining the Inside algorithm. PRTN is an RTN with probabilistic transitions and probabilistic word selections at some transitions (Han and Choi, 1994). A PRTN consists of possibly many subnetworks, each of which corresponds to a nonterminal. Transitions are typed according to the grammatical role they play. Nonterminal transitions lead to subnetworks, each terminal transition defines its own lexical table, and return transitions lead back to higher subnetworks. There are also empty transitions, which are like terminal transitions without lexical tables. There is one start state, denoted by $\mathcal{S}$, and one final state, denoted by $\mathcal{F}$. Conceptually, each subnetwork can be pictured as an independent network. Subnetworks are, however, interconnected through nonterminal and return transitions. A parse or an analysis is a sequence of transitions of the four types (dark-headed transitions in Figure 1).

The probability of a parse (a sequence of transitions) is simply the product of the probabilities of the transitions and of the words defined on the transitions that compose the parse. The probability of an input sentence is, then, the sum over the possibly many ambiguous parses that can generate the sentence. Other major tasks carried out on probabilistic grammars are to assign values to the probabilistic parameters and to retrieve one or more parses with the highest scores from among the ambiguous parses. The parameter reestimation algorithm for PRTN is explained in Han and Choi (1994), and the best parse problem is the topic of this paper.
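As a concrete illustration of this machinery, the following is a minimal sketch of how a PRTN and its four transition types might be laid out in C (the implementation language used in Section 4); every type and field name here is an illustrative choice of this sketch, not part of the paper's notation.

```c
/* The four transition types of a PRTN (see Figure 1). */
typedef enum {
    T_TERMINAL,     /* consumes a word; carries its own lexical table   */
    T_NONTERMINAL,  /* leads into a subnetwork                          */
    T_EMPTY,        /* like a terminal transition, but consumes nothing */
    T_POP           /* return transition back to the parent subnetwork  */
} TransType;

typedef struct {
    TransType type;
    int       from, to;  /* state indices; for T_NONTERMINAL, `to` is the
                            start state of the called subnetwork         */
    int       ret;       /* for T_NONTERMINAL: the state v reached after
                            the called subnetwork returns                */
    double    a;         /* transition probability a_ij                  */
    double   *lex;       /* for T_TERMINAL: b(ij, w), indexed by word id */
} Trans;

typedef struct {
    int    n_states;
    int    start, final; /* the unique start and final states S and F    */
    int    n_trans;
    Trans *trans;
} PRTN;

/* The probability of a parse is the product of the probabilities of its
   transitions and of the words selected on its terminal transitions.   */
double parse_prob(const Trans **seq, const int *words, int n_trans)
{
    double p = 1.0;
    int w = 0;
    for (int i = 0; i < n_trans; i++) {
        p *= seq[i]->a;
        if (seq[i]->type == T_TERMINAL)
            p *= seq[i]->lex[words[w++]];
    }
    return p;
}
```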

The best parse parsing method to be introduced is based on the Inside algorithm, which is used to compute the probability that a fragment of an input sentence is generated by a fragment of a given PRTN. In fact, the Inside algorithm works in exactly the same manner as the Viterbi algorithm; thus Inside parsing is simply a Viterbi algorithm applied to probabilistic RTN or other stochastic grammars.

Fig. 2. Illustration of Inside probability.

In the following, let us first describe the Inside algorithm. An input sentence of length $N$ is denoted by $w = w_1 w_2 \cdots w_N$.

Then the Inside probability is defined as follows.

Definition 1
The Inside probability of state $i$, denoted by $P_I(i)_{s\sim t}$, is the probability that the subnetwork of state $i$ generates the string $W_{s\sim t}$, positioned from $s$ to $t$, starting at state $i$, given a network $\lambda$.

A more constructive description is as follows.

$$P_I(i)_{s\sim t} \;=\; \sum_{k} a_{ik}\, b(ik, W_s)\, P_I(k)_{s+1\sim t} \;+\; \sum_{j} \sum_{r=s}^{t} a_{ij}\, P_I(j)_{s\sim r}\, P_I(v)_{r+1\sim t} \qquad (1)$$

Here $ik$ is a terminal transition and $ij$ is a nonterminal transition of the subnetwork state $i$ belongs to. The probability of transition $ik$ is denoted by $a_{ik}$. State $u$ is the final state of the subnetwork state $j$ belongs to, and state $v$ is one of the states the return transitions of $j$'s subnetwork lead to. The subnetwork of state $i$ must be the same as that of state $v$. $b(ik, W_s)$ is the probability of the $s$th input word as defined on the transition $ik$. Figure 2 gives a pictorial description of the Inside probability. After the last word is processed, the final state of the subnetwork of state $i$ should be reached. If the last word is the final word of an input sentence, as in $P_I(i)_{s\sim N}$, the final state must be $\mathcal{F}$ for the parsing to be successful. The base case is

$$P_I(i)_{t+1\sim t} = \begin{cases} 1 & \text{if } i \text{ is the final state of the subnet of } i,\\ 0 & \text{otherwise.} \end{cases}$$


In computing $P_I(i)_{s\sim t}$, there are two types of quantities to be summed. When the immediate transition $ij$ is of terminal type, the transition probability $a_{ij}$ and the probability of the $s$th word as defined at the transition, $b(ij, W_s)$, are multiplied together with the Inside probability of the rest of the input sentence, $W_{s+1\sim t}$. When the immediate transition is of nonterminal type, the probability that the subnetwork initiated by the nonterminal produces sentence fragments of various lengths within $W_{s\sim t}$ is the sum of Inside probabilities. When $ij$ is a pop transition, the Inside probability is zero, because a subnetwork cannot generate any word starting at its final state. The probability of an input sentence can then be computed using the Inside probability as

$$P(w \mid \lambda) = P_I(\mathcal{S})_{1\sim N}.$$

More specifically, the Inside algorithm can be made into a parser by replacing the summations in equation (1) by maximizations as follows:

$$P_{IB}(i)_{s\sim t} \;=\; \max\Big( \max_{k}\big( a_{ik}\, b(ik, W_s)\, P_{IB}(k)_{s+1\sim t} \big),\; \max_{j}\, \max_{r=s}^{t}\big( a_{ij}\, P_{IB}(j)_{s\sim r}\, P_{IB}(v)_{r+1\sim t} \big) \Big) \qquad (2)$$

$P_{IB}(i)_{s\sim t}$ is the probability of the best analysis that covers $W_{s\sim t}$ with the network fragment from state $i$ to the final state of $i$'s subnetwork. The parse analyses can be retrieved by concatenating the partial results along the Inside recursions. There can be numerous analyses, but only a best analysis needs to be stored at each maximization. Let $A_{IB}(i)_{s\sim t}$ denote the output analysis of $P_{IB}(i)_{s\sim t}$. Best parse Inside parsing is then defined as follows:

$$A_{IB}(i)_{s\sim t} \;=\; \operatorname{argmax}\Big( \operatorname{argmax}_{k}\big( a_{ik}\, b(ik, W_s)\, P_{IB}(k)_{s+1\sim t} \big),\; \operatorname{argmax}_{j}\, \operatorname{argmax}_{r=s}^{t}\big( a_{ij}\, P_{IB}(j)_{s\sim r}\, P_{IB}(v)_{r+1\sim t} \big) \Big) \qquad (3)$$

where each argmax records the transition chosen at the maximization together with the partial analyses it combines.
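The recursion of equations (2) and (3) translates directly into a memoized procedure. The following is a minimal C sketch, assuming the simplified PRTN structures sketched in Section 2 (the array bounds, the names, and the single global final state are simplifications of this sketch; empty transitions and left-recursive cycles are not handled):

```c
#define MAXN 40    /* max sentence length (illustrative bound) */
#define MAXS 256   /* max number of network states             */

typedef enum { T_TERMINAL, T_NONTERMINAL, T_EMPTY, T_POP } TransType;
typedef struct { TransType type; int from, to, ret; double a, *lex; } Trans;
typedef struct { int n_states, start, final, n_trans; Trans *trans; } PRTN;

static double memo[MAXS][MAXN + 2][MAXN + 1];
static char   seen[MAXS][MAXN + 2][MAXN + 1];

/* Best-parse Inside probability P_IB(i)_{s~t} of equation (2): maximize
   over terminal transitions ik (consume word s) and nonterminal
   transitions ij (split W_{s~t} at every r).  `words` is 1-indexed. */
static double p_ib(const PRTN *g, int i, int s, int t, const int *words)
{
    if (s > t)  /* empty span: 1 only at the final state of i's subnet
                   (simplified here to the single global final state)  */
        return (i == g->final) ? 1.0 : 0.0;
    if (seen[i][s][t])
        return memo[i][s][t];
    seen[i][s][t] = 1;

    double best = 0.0;
    for (int x = 0; x < g->n_trans; x++) {
        const Trans *tr = &g->trans[x];
        if (tr->from != i) continue;
        if (tr->type == T_TERMINAL) {
            double p = tr->a * tr->lex[words[s]]
                     * p_ib(g, tr->to, s + 1, t, words);
            if (p > best) best = p;
        } else if (tr->type == T_NONTERMINAL) {
            for (int r = s; r <= t; r++) {      /* split point of eq. (2) */
                double in = p_ib(g, tr->to, s, r, words);
                if (in == 0.0) continue;        /* compute the second Inside
                                                   only when the first one
                                                   succeeds (see Section 4) */
                double p = tr->a * in * p_ib(g, tr->ret, r + 1, t, words);
                if (p > best) best = p;
            }
        }
        /* T_POP contributes nothing: a subnetwork cannot generate a
           word starting at its final state.                            */
    }
    return memo[i][s][t] = best;
}
```

The probability of the best parse of a whole sentence would then be obtained as `p_ib(g, g->start, 1, N, words)`; the analysis itself follows by also recording, at each maximization, which transition and split point won, as in equation (3).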

The cubic complexity of Inside parsing can easily be seen by observing that there are three variable positions ($s$, $r$, $t$) that determine a composition of the span from $s$ to $t$ when the next transition is of nonterminal type. Consequently, computing all the Insides takes cubic complexity in the size of an input sentence.

3 Earley parsing on probabilistic RTN

In this section, we describe Earley best parse parsing modified for PRTN. Earley parsing is of interest because we would first like to see how efficiently the parser runs in comparison to Inside parsing, and because the chart produced by the Earley algorithm can be used to reduce the Inside computations. There have been attempts to apply the Earley algorithm to RTN (Woods, 1970; Kochut, 1983). The difference between our variation of the Earley algorithm and the previous ones lies in the content of the items of the parse lists. Our variation uses RTN states in the items, so the algorithm works on fragments of the network. Previous variations, on the other hand, used production rules in the items.


The algorithm starts by obtaining the set of terminal transitions that are reachable from the start state $\mathcal{S}$ with respect to the first word of the given sentence. The next step is to expand the subnetworks that the terminal transitions belong to. The expansion of a subnetwork is accomplished in the following three cases.

Case 1. When the immediate transition is of terminal type, the transition is advanced if the next word is defined on the transition.

Case 2. When two items in a subnetwork are adjacent, a concatenated item is produced.

Case 3. When the immediate transition is of pop type and the current subnetwork is completed, a transition in the parent subnetwork is advanced. In this way, the network updates are propagated into higher subnetworks.

The above steps are synchronized by each word of an input sentence. The recognition of an input sentence is accomplished by building an item list for each word, which corresponds to the construction of a parse list in the Earley algorithm. Refer to figure 3 for the description of the Earley algorithm for PRTN. In the algorithm, the empty transitions are also taken into account. An item is a 5-tuple:

([stateL, stateR], i, prob, ptr).

[stateL, stateR] indicates a fragment of a subnetwork from stateL to stateR, and i is a sentence position. An item belonging to list j indicates that the network fragment from stateL to stateR generates the sentence fragment $W_{i+1\sim j}$. prob keeps the highest probability among the item sequences inducing the current item. Pointers are used to make it easy to keep track of the item developments that are used to retrieve analyses.

The complexity of the algorithm is the same as that of the Earley algorithm. In the case of unambiguous grammars, the complexity is $O(N^2)$. If the grammar is ambiguous, there can be duplicate items in a list, and additional checks must be made for every candidate item to prevent duplication. To detect a duplicate, we have to look over the whole list, which is the source of the $O(N^3)$ complexity (Aho and Ullman, 1972). Items that do not lead to the next word are discarded, which is based on the computation of FOLLOW sets (Aho and Ullman, 1972). No other measure to improve the speed is taken, such as an efficient list search. It is possible to obtain further efficiency by using the best first search method (Allen, 1994).
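As an illustration, the item and the duplicate check might look as follows in C; this is our own sketch (the names and the list layout are not from the paper), shown to make the source of the $O(N^3)$ term concrete:

```c
/* An Earley item ([stateL, stateR], i, prob, ptr): the network fragment
   from stateL to stateR generates the sentence fragment W_{i+1~j},
   where j is the index of the list the item belongs to. */
typedef struct Item {
    int          stateL, stateR;
    int          i;        /* starting sentence position            */
    double       prob;     /* highest probability inducing the item */
    struct Item *ptr;      /* back-pointer for retrieving analyses  */
    struct Item *next;     /* chaining within one list H_j          */
} Item;

typedef struct { Item *head; } List;   /* one parse list H_j */

/* Duplicate check: scanning the whole list is what makes ambiguous
   grammars cost O(N^3) (Aho and Ullman, 1972).  Keeps the item with
   the higher probability, as best parse parsing requires. */
int add_item(List *h, Item *cand)
{
    for (Item *it = h->head; it; it = it->next)
        if (it->stateL == cand->stateL && it->stateR == cand->stateR
            && it->i == cand->i) {
            if (cand->prob > it->prob) {   /* better derivation found */
                it->prob = cand->prob;
                it->ptr  = cand->ptr;
            }
            return 0;                      /* duplicate: not added    */
        }
    cand->next = h->head;
    h->head = cand;
    return 1;
}
```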

4 Earley versus Inside parsing

In this section, we show by way of experiments that Inside parsing can be more efficient than Earley parsing under certain conditions. The goal of the experiments is to examine how Inside and Earley parsing perform in terms of network size and sentence length, and to determine the conditions under which each parser works better than the other.

An item: ([state, state], j)
Input: a PRTN and a sentence $W_1 W_2 \cdots W_T$
Output: parse lists $H_0, H_1, \ldots, H_T$

1. Construct $H_0$.
   a′. Add ([$\mathcal{S}$, $\mathcal{S}$], 0) to $H_0$. Repeat the following until no new item can be added to $H_0$.
   a. If pq is an empty transition and ([a, p], 0) is in $H_0$, add ([p, q], 0) to $H_0$.
   b. If ([p, q], 0) and ([q, b], 0) are in $H_0$, where p and b are in the same subnet, add ([p, b], 0) to $H_0$.
   d. If there is an item ([a, b], 0), where a is the start state of a subnet and bp is nonterminal, add ([p, p], 0)'s to $H_0$.

2. Construct $H_j$, provided that $H_0, H_1, \ldots, H_{j-1}$ have been constructed.
   a′. Add ([p, q], j−1)'s to $H_j$ if $b(pq, W_j) > 0$, where ([a, p], i)'s are in $H_{j-1}$ and a is a start state or a = p. Repeat the following steps until no new items can be added.
   a. If pq is an empty transition and ([a, p], i) is in $H_j$, add ([p, q], j) to $H_j$.
   b. If ([p, q], i) and ([q, b], k) are in $H_k$ and $H_j$ respectively, where p and b are in the same subnet, add ([p, b], i) to $H_j$.
   c. If ([p, q], k) is in $H_j$, where p is the start state and q a final state of a subnet, add ([a, b], k)'s to $H_j$, where qb is of pop type and there is an item ([s, a], i) in list k with s a start state.
   d. If there is an item ([a, b], i), where a is the start state of a subnetwork and bp is nonterminal, add ([p, p], j)'s to $H_j$.

Fig. 3. Earley PRTN algorithm. Basically the algorithm depicts a process of expanding parses. Step a′ expands the parse by a terminal transition, step a by an empty transition, step b by concatenation, and step c advances the parse into a higher subnetwork. The step d's collect the reachable states from the current state.

All the experiments were done on a Sparc 10 workstation and measured in cpu seconds. The parsers are coded in the C language. Sample sentences for constructing PRTN and for parsing were selected from the Wall Street Journal trees of the Penn Tree corpus. There are four test sets, as shown in Table 1. The networks used in the various tests were constructed by extracting CFG rules from the sample sets and transforming the rules into networks. The transitions and states are merged as long as the equivalence of the network with the CFG rules is not violated, so that sentences may be generated by the constructed network if and only if the CFG rules generate them. Once the network is prepared, the parameters are estimated. When the parse trees of the training sentences are known, counting the frequencies of the transitions used in the trees is sufficient to determine the values of the parameters. A separate network is constructed from each test set, and the same sentences are used in testing parsing on the network, since our concern here is not to see how well the grammar covers unseen sentences, but to see how efficiently the parsers run.
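In symbols, writing $C(\cdot)$ for frequency counts over the training trees, the estimates would be the usual relative frequencies (our formulation of the counting step just described):

$$\hat{a}_{ij} = \frac{C(ij)}{\sum_{k} C(ik)}, \qquad \hat{b}(ij, w) = \frac{C(ij, w)}{C(ij)},$$

where $C(ij)$ is the number of times transition $ij$ is used in the training trees, and $C(ij, w)$ the number of times word $w$ is selected on terminal transition $ij$.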

Table 1. Test sets of sentences

           Size    Acquisition method
set I      25      random selection of sentences with 30 or less words
set II     50      random selection of sentences with 30 or less words
set III    100     random selection of sentences with 20 or less words
set IV     1545    all sentences with 10 or less words in the test corpus

Table 2. Networks used in the testing

               Number of states (CFG rules)    Number of sentences used
network I      101 (119)                       test set I
network II     121 (178)                       test set II
network III    155 (264)                       100 sentences with 30 or less words
network IV     222 (406)                       200 sentences with 30 or less words
network V      362 (678)                       500 sentences with 30 or less words
network VI     429 (927)                       1000 sentences with 30 or less words
network VII    123 (206)                       test set III

Table 2 shows the composition of the networks used in the testing, differing in the number of states. Figure 4 reveals an interesting property of a network grammar: the number of network states tends to saturate, while the number of CFG rules continues to grow in line with the number of sentences. In fact, the number of rules continued to grow linearly up to the last sentence of the Wall Street Journal trees as we extracted CFG rules.

As is shown in Figure 5, the performance of the Earley algorithm rapidly deteriorates as the network size increases, though once the network size is fixed, the Earley algorithm will outdo Inside parsing with sufficiently long sentences. Figures 5 and 6 show that the Earley parser is better than the Inside parser when sentences are sufficiently long. The opposite holds as the grammar size increases: Inside parsing became more and more competitive with larger grammars. Figure 7 shows a partial result from testing test set II on network VI. In the figure, Inside parsing performs better for sentences up to a certain length limit.

It should be noted that the implementation of Inside parsing itself reduces a good deal of the Inside computations by way of some obvious tricks; for example, the second Inside of the nonterminal case in equation (2) is tried only when the first Inside is successful. Also, computed Insides are maintained in a table to avoid repeating the same computation. In fact, an implementation of Inside parsing without these tricks is scarcely practical, and may not have any merit over Earley parsing.

It is not hard to understand what makes the Earley and Inside algorithms behave in rather different ways. The following explains the difference between the two algorithms.

Fig. 4. Network size versus sample size. The growth of network size saturates as the sample size increases. Test set IV is used in 9 different sizes.

• Higher sensitivity of the Inside algorithm to the sentence length: the Inside algorithm tests whether each candidate nonterminal generates each possible fragment of a sentence. The source of Earley's cubic complexity, on the other hand, is the search of other lists that, after the reachability test, mostly contain valid items; thus the search acts mostly on valid search space.

• Higher sensitivity of the Earley algorithm to the grammar size: in constructing a list, the Earley algorithm extends existing items to cover the breadth of the grammar. More rules or states mean more choices derivable from an existing item. The Inside algorithm, by cutting the second inside computation, avoids exploring the whole space of choices, lessening the search space significantly. As a result, the effect of the cutting acts more strongly on the network size than on the sentence length.

If the grammar size becomes sufficiently large, the Inside algorithm will be better than the Earley algorithm (see Figure 8). Figure 8 shows that the Inside algorithm ran faster for sentences of 16 or fewer words on a grammar that is not big enough for practical applications. The implication may be more significant given that the Inside algorithm may take the place of the Earley algorithm as an efficient parser for large scale applications with grammars constructed from many thousands of sentences of non-trivial length.

The success ratio of the best parse being a correct parse in Inside parsing, over the 1545 sentences with 10 or less words, is about 82%, where the grammar was also constructed from the same test set. For 100 unseen sentences, the precision of the best parses was about 71%. In a strong sense, for a sentence to be successfully parsed, not only the grammatical sequence but also the words must be defined appropriately within the network, because each terminal transition has its own dictionary.

Fig. 5. Network size and parsing methods. Average time over the 5 sentences with 10 or less words in test set I.

Fig. 6. Network size and parsing methods. Average time over the 5 sentences with between 26 and 30 words in test set I.

It should be noted, however, that the success ratio is heavily dependent on the syntactic complexity of the unseen sentences. In our case, the unseen sentences were arbitrarily generated by the grammar so that the parsing would always succeed. Considering the ambiguity of the constructed grammar, the success ratio is reasonable.

Fig. 7. Sentence length and parsing methods. Test set II was run on network VI. The figure shows only a partial view of the results.

Fig. 8. Performance breakpoints between the Inside and Earley parsers. Inside parsing tends to work better than Earley's as the network size increases. Test networks I-IV are used.

In the case of Inside parsing, the estimated time cost for sentences of more than 20 words on networks of more than 300 states can easily exceed 1000 seconds, which is higher than Briscoe and Carroll reported (Briscoe and Carroll, 1993). The discrepancy can be attributed to the observation that Inside parsing tests every possible combination of sentence fragments for each candidate parse fragment, which is critically sensitive to the sentence length. On the other hand, Briscoe's unification based GLR (Briscoe and Carroll, 1993) is said to work on more information, in addition to the proven efficiency of GLR parsing control. A means of improving Inside parsing, outlined in the next section, is thus worthwhile.

5 Efficient computation of Inside probability

The complexity of the Inside computation is attributed to the excessive number of Insides to compute. Though the actual Insides count far fewer than the cubic order of the input length and grammar size, computing more than 2000 Insides will take more than a minute on a Sparc 10 workstation. The Earley algorithm as introduced in the previous section produces a chart of nonredundant items that participate in forming valid parses. Valid items are identified by tracing from the last item ([$\mathcal{S}$, $\mathcal{F}$], 0) in the $N$th list after the parse lists are completed. Once invalid items are filtered out, each of the remaining items is interpreted as a chart item. For an Earley item ([i, j], s−1, K) belonging to list t, the corresponding chart item takes the following form:

([i, j], s, t, K)

The item says that the network fragment between states i and j generates the sentence fragment between the $s$th and $t$th words. If a unit nonterminal transition between i and j generates the sentence fragment, the type of the nonterminal transition (for example, ADJP, NP) is also recorded in the chart item. Since there can be more than one nonterminal between two adjacent states, K maintains a list of nonterminal symbols. This situation takes place only when step c of the algorithm in Figure 3 is applied. The nonterminal symbols in K supply more restrictive information for composing Insides when the next transition is of nonterminal type in the Inside parsing. Efficient Inside probability computation composes the recursive Insides according to the chart produced by Earley parsing. The parsing algorithm of equation (3) is modified as follows.

$$P_{IB}(i)_{s\sim t} \;=\; \max\Big( \max_{k}\big( \chi_1(i,k,s)\, a_{ik}\, b(ik, W_s)\, P_{IB}(k)_{s+1\sim t} \big),\; \max_{j}\, \max_{r=s}^{t}\big( \chi_2(i,j,s,r,\tau_{ij})\, a_{ij}\, P_{IB}(j)_{s\sim r}\, P_{IB}(v)_{r+1\sim t} \big) \Big) \qquad (4)$$

where $\chi_1$ and $\chi_2$ are 0/1 filters evaluated on the chart.

Obviously, a heavier computational cost is required when the next transition at state $i$ is of nonterminal type than when it is of terminal type. Instead of the blind composition of $i$, $j$, $s$, $r$ that entails the computation of $P_{IB}(j)_{s\sim r}$ and $P_{IB}(v)_{r+1\sim t}$, only those compositions that are valid are selected from the chart and applied to the recursive computations. For terminal transitions, $\chi_1(i,k,s)$ examines whether there is a chart item confirming that the transition generates the $s$th word. For nonterminal transitions, $\chi_2(i,j,s,r,\tau_{ij})$ checks whether there is a chart item saying that the transition $ij$ with tag $\tau_{ij}$ generates the sentence fragment $W_{s\sim r}$.
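As a sketch, the two checks reduce to lookups in a chart of ([i, j], s, t, K) items; the layout and the names below are our own illustration, not the paper's implementation:

```c
#define MAXTAGS 8

/* Chart item ([i, j], s, t, K): the network fragment i..j generates the
   words from position s to t; K lists the nonterminal tags recorded
   when more than one nonterminal spans the same pair of states. */
typedef struct ChartItem {
    int i, j, s, t;
    int tags[MAXTAGS];        /* nonterminal tags, e.g. NP, ADJP */
    int n_tags;
    struct ChartItem *next;
} ChartItem;

/* chi_1(i, k, s): is there a chart item confirming that terminal
   transition ik generates the s-th word? */
int chi1(const ChartItem *chart, int i, int k, int s)
{
    for (const ChartItem *c = chart; c; c = c->next)
        if (c->i == i && c->j == k && c->s == s && c->t == s)
            return 1;
    return 0;
}

/* chi_2(i, j, s, r, tag): is there a chart item saying that nonterminal
   transition ij with tag tau_ij generates the fragment W_{s~r}? */
int chi2(const ChartItem *chart, int i, int j, int s, int r, int tag)
{
    for (const ChartItem *c = chart; c; c = c->next) {
        if (c->i != i || c->j != j || c->s != s || c->t != r)
            continue;
        for (int n = 0; n < c->n_tags; n++)
            if (c->tags[n] == tag)
                return 1;
    }
    return 0;
}
```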

Fig. 9. Earley, Inside, and Efficient Inside parsing with test set III on network VII. Efficient Inside parsing saved 46% of the cost of Inside parsing.

The experiments show that the use of the chart produced by the Earley parser reduces the number of Inside computations by 36% when tested on network VII with test set III (see Figure 9). The saved Insides more than covered the cost of Earley parsing and gave a net performance gain of up to 46% in time. If the Inside algorithm is used only for those sentences on which the Inside algorithm is better than Earley's, the gain becomes 55%. The improvement should be understood bearing in mind that the grammar is highly ambiguous, for the grammar was not carefully crafted but was the result of semi-automatic processing.

6 Conclusion

Computing a best parse is one of the major problems with probabilistic grammars, and is important in practice as well as in theory. Theoretically, obtaining a best parse in the sense of the parse probability costs an order of cubic complexity in the size of input sentences for probabilistic CFGs. In practice the parsing cost is usually high. Though there have been reports on heuristic approaches, little is known about the precise nature of the performance.

We chose the Inside algorithm, redefined it as a parser, and compared its performance with that of the Earley parser to arrive at the conclusion that the Inside parser can be better than Earley's under some conditions. We also suggested that the computation of Inside probability be made efficient by using the result of Earley parsing. In summary, the following conclusions can be derived from our work.

1. Inside parsing can be more efficient than Earley's when the network size is sufficiently large but the sentence length is sufficiently small. The sufficiency is determined by the sensitivity of the Earley and Inside parsers to the grammar size and the sentence size of a given problem, respectively.

2. The new method for computing the Inside probabilities needed in reestimating the probabilistic parameters of probabilistic grammars is shown to reduce the time cost by up to 55%.

If the length of input sentences is less than 15 and the network contains more than 200 states, it is very likely that Inside parsing that returns best parses is more efficient than Earley's. Given this conclusion, some practical systems may be built allowing more choice of methods to select best parses. Certainly the results of our work are not limited to PRTN, and thus can be carefully exported to other grammar representations.

References

Aho, Alfred V., and Ullman, Jeffrey D. (1972) The Theory of Parsing, Translation, and Compiling, vol. I. New Jersey: Prentice Hall.
Allen, J. (1994) Natural Language Understanding. 2nd edition. Benjamin Cummings.
Briscoe, T., and Carroll, J. (1993) Generalized probabilistic LR parsing of natural language (corpora) with unification-based grammars. Computational Linguistics 19(1): 25-57.
Carroll, J., and Briscoe, T. (1992) Probabilistic normalization and unpacking of packed parse forests for unification-based grammars. In Proceedings, AAAI Fall Symposium Series: Probabilistic Approaches to Natural Language. Cambridge. Pp. 33-8.
Charniak, E., and Goldman, R. (1993) A Bayesian model of plan recognition. Artificial Intelligence 64(1): 53-79.
Charniak, E., Hendrickson, C., Jacobson, N., and Perkowitz, M. (1993) Equations for part-of-speech tagging. In Proceedings, AAAI Conference.
Glenn, C., and Charniak, E. (1992) Learning probabilistic dependency grammars from labelled texts. In Proceedings, AAAI Fall Symposium Series: Probabilistic Approaches to Natural Language. Cambridge. Pp. 25-32.
Han, Young S., and Choi, Key-Sun. (1993) Lexical concept acquisition from collocation map. In Proceedings, a workshop of SIGLEX: Acquisition of Lexical Knowledge from Text. Ohio. Pp. 22-31.


Han, Young S., and Choi, Key-Sun. (1994) A reestimation algorithm for probabilistic transition network. In Proceedings of COLING. Kyoto. Pp. 859-64.
Jelinek, F., Lafferty, J. D., and Mercer, R. L. (1990) Basic methods of probabilistic context free grammars. IBM RC 16374. IBM Continuous Speech Recognition Group.
Kochut, K. (1983) Towards the elastic ATN implementation. In L. Bolc (ed.), The Design of Interpreters, Compilers, and Editors for ATN. New York: Springer-Verlag. Pp. 175-214.
Kupiec, J. (1991) A trellis-based algorithm for estimating the parameters of a hidden stochastic context-free grammar. In Proceedings, Speech and Natural Language Workshop, sponsored by DARPA. Pacific Grove. Pp. 241-6.
Lafferty, J., Sleator, D., and Temperley, D. (1992) Grammatical trigrams: a probabilistic model of link grammar. In Proceedings, AAAI Fall Symposium Series: Probabilistic Approaches to Natural Language. Cambridge. Pp. 89-97.
Lari, K., and Young, S. J. (1990) The estimation of stochastic context-free grammars using the Inside-Outside algorithm. Computer Speech and Language 4: 35-56.
Schabes, Y. (1992) Stochastic lexicalized tree-adjoining grammars. In Proceedings, the 15th International Conference on Computational Linguistics.
Woods, W. A. (1970) Transition network grammars for natural language analysis. Communications of the ACM 13(10): 591-606.
Wright, J. H. (1990) LR parsing of probabilistic grammars with input uncertainty for speech recognition. Computer Speech and Language 4: 297-323.
Wright, J., Wrigley, E., and Sharman, R. (1991) Adaptive probabilistic generalized LR parsing. In Proceedings, 2nd International Workshop on Parsing Technologies, Cancun, Mexico. Pp. 154-63.