Generative Connectionist Parsing with Dynamic Neural Networks

Christel Kemke
Department of Computer Science
562 Machray Hall, University of Manitoba
Winnipeg, Manitoba, R3T 2N2, Canada

[email protected]

Abstract

Dynamic Neural Networks (DNN) are a new approach within the Neural Network (NN) paradigm, based on the dynamic construction of Neural Networks during the processing of an input. This methodology allows the representation and processing of recursively defined structures and circumvents the problems that traditional, fixed-size Neural Networks have with processing input structures of unknown, arbitrary size. The DNN methodology has been employed in the so-called 'Hybrid Connectionist Parsing' (HCP) approach, which comprises an incremental, dynamic, on-line generation of Neural Network parse trees. In this paper, we describe the general principles of the HCP method and some of its specific Neural Network related features. We also discuss the use of a modified HCP with respect to robust, fault-tolerant parsing of ungrammatical inputs.

1 Introduction

The HCP method of parsing with Dynamic Neural Networks was inspired by the early work of Jordan Pollack on word-sense disambiguation with Neural Networks (Pollack 1987; Waltz and Pollack 1985). In his system, Pollack used a traditional chart parser as a front-end to the disambiguation network, and one aim of the initial work on the HCP method was to transfer the concept of chart parsing, or traditional parsing in general, into Neural Networks. In the early stages of Connectionist NLP (CNLP), some researchers attempted to model or emulate well-known parsing methods with Neural Networks by simulating stacks, pointers, etc. (see e.g. Schnelle (1988) for the Earley parser and Fanty (1985) for the CYK parser). A more recent trend in CNLP is the use of Recurrent Neural Networks (RNNs) (Elman 1990; Reilly and Sharkey 1988, 1992; Sharkey 1992; Wermter and Weber 1997; Jain 1989; Jain and Waibel 1990). RNNs have a certain ability to represent prior inputs in the hidden units of the network, with feedback connections to the current input representation, and can thus deal with inputs of arbitrary size, although only in a limited way, since they cannot fully represent recursive structures.

2 Dynamic Neural Networks

We suggest an alternative approach to RNNs within the Neural Network paradigm, which addresses the problem of processing and representing time-dependent input structures of unknown size in a more suitable manner. The general concept is called Dynamic Neural Networks (DNN) and is based on the idea of a dynamic construction of a Neural Network. DNNs employ representations of grammar rules or recursive functions in so-called 'mini-networks'. These mini-networks are successively composed into larger NNs in a generate-and-merge process, which is dynamically triggered by the occurrence of external inputs or by fully activated nodes within a generated network. Due to the generative character of DNNs, the processing and representation of inputs is not limited by a fixed-size, relatively rigid network architecture but reflects the generative, constructive paradigm which is inherent to grammar-based descriptions of natural languages, for example. The methodology is in these respects related to the parsing method developed by Kempen and Vosse (1989, 2000).

3 The Hybrid Connectionist Parsing Approach

The general idea of DNNs has been employed in the 'Hybrid Connectionist Parsing' (HCP) method. The basic concept of processing in the HCP approach is as follows: A mini-network represents a rule of a CFG in the following form: it is a two-layer network with one root node and one or more son-nodes. The root node corresponds to the LHS of the rule and the son-nodes correspond to the items on the RHS (cf. Figure 1). Each node receives a symbol marker, which represents the respective symbol (syntactic category, lexical entry, etc.) of the grammar. Connections in a mini-network are directed from each son-node to the single root- or father-node. The weight of a connection in such a mini-network is 1/n, where n is the number of son-nodes of the mini-network (symbols on the RHS of the rule).



Figure 1. Mini-network for the general grammar rule NT ← A1 … An, with father-node NT, son-nodes A1, …, An, and connection weight 1/n for each connection from Ai to NT, i = 1, …, n.

In the HCP approach, each unit or node in the network represents a syntactic category or a lexical input item. This is reflected in a symbol marker associated with each node. The units are in general simple threshold units with threshold 1.0, a linear input function, and a threshold-dependent output function. The activation value generally ranges between 0 and 1 and represents the state or degree of recognition of the respective syntactic item (cf. Figure 2).
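As a concrete illustration, the following minimal Python sketch (class and method names are illustrative assumptions, not the paper's code) models a mini-network with connection weights 1/n and threshold units with threshold 1.0:

# Minimal sketch of a mini-network for a CFG rule NT <- A1 ... An.
# All names here are illustrative, not taken from the paper.

class MiniNetwork:
    THRESHOLD = 1.0          # threshold of the simple threshold units

    def __init__(self, lhs, rhs):
        self.father = lhs                    # symbol marker of the father-node
        self.sons = list(rhs)                # symbol markers of the son-nodes
        self.weight = 1.0 / len(rhs)         # connection weight 1/n
        self.activations = [0.0] * len(rhs)  # degree of recognition per son

    def activate_son(self, i, value=1.0):
        """Mark son-node i as (fully) recognized."""
        self.activations[i] = value

    def father_activation(self):
        """Linear input function: weighted sum of the son activations."""
        return self.weight * sum(self.activations)

    def fully_activated(self):
        """Threshold-dependent output: only a fully activated father-node
        can trigger further mini-networks or be merged upward."""
        return self.father_activation() >= self.THRESHOLD

# Example: NP <- det noun. Recognizing only 'det' yields activation 0.5
# (medium recognition); recognizing both items fully activates the NP.
np = MiniNetwork("NP", ["det", "noun"])
np.activate_son(0)                       # 'det' recognized
print(np.father_activation())            # 0.5
np.activate_son(1)                       # 'noun' recognized
print(np.fully_activated())              # True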

3.1 Processing in the HCP Approach

Processing of an input in the HCP approach comprises the following steps (cf. Figure 2):

1. the instantiation or generation of mini-networks, triggered by fully activated nodes (a lexical input node or the father-node of an already constructed partial NN-parse-tree);
2. the successive merging of mini-networks and partial NN-parse-trees by unifying a fully activated father-node of a sub-network with a matching (leftmost) open son-node of a partial NN-parse-tree;
3. the transfer of activation in the resulting new network.

Step 1 exploits parallelism, since all possible derivations/rule applications are generated at this time. Step 2 ensures strict LR parsing due to the constraint that a binding occurs only to the leftmost open son-node. This step also ensures the correctness of the parsing method as a deterministic parser, since a structure has to be fully recognized (i.e., have a fully activated father-node) in order to be connected to the existing initial partial parse-tree. A sketch of this loop is given below.
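The following heavily simplified Python sketch (the grammar, lexicon, and dict-based node format are assumptions for this example, not the paper's implementation) strings the three steps together for the fragment "the man" of Figure 2:

# Illustrative sketch of the three HCP processing steps: instantiate
# mini-networks, merge at the leftmost open son-node, propagate activation.

GRAMMAR = {"NP": ["det", "noun"], "S": ["NP", "VP"]}   # CFG rules
LEXICON = {"the": "det", "man": "noun"}

def new_node(symbol, sons=None):
    # a node carries a symbol marker, son-nodes, and an activation in [0, 1]
    return {"sym": symbol, "sons": sons or [], "act": 0.0}

def leftmost_open_son(tree, symbol):
    """Find the leftmost open (not fully activated) son-node; strict LR
    parsing allows a binding only at this node, so no further search."""
    for son in tree["sons"]:
        if son["act"] < 1.0:
            if not son["sons"]:
                return son if son["sym"] == symbol else None
            return leftmost_open_son(son, symbol)
    return None

def merge(tree, sub):
    """Step 2: unify a fully activated father-node of a sub-network with
    the matching leftmost open son-node of the partial parse tree."""
    slot = leftmost_open_son(tree, sub["sym"])
    if slot is not None:
        slot["sons"], slot["act"] = sub["sons"], sub["act"]
        return True
    return False

def propagate(node):
    """Step 3: activation transfer with weights 1/n and threshold 1.0."""
    if node["sons"]:
        node["act"] = sum(propagate(s) for s in node["sons"]) / len(node["sons"])
    return node["act"]

def parse(words):
    """Heavily simplified generate-and-merge loop: a single partial parse
    tree, no parallel hypotheses, mini-networks triggered on the first word."""
    tree = None
    for word in words:
        leaf = new_node(LEXICON[word])
        leaf["act"] = 1.0                          # fully activated input node
        if tree is None:                           # Step 1: generation
            for lhs, rhs in GRAMMAR.items():
                if rhs[0] == leaf["sym"]:
                    tree = new_node(lhs, [new_node(s) for s in rhs])
                    break
        if tree is not None:
            merge(tree, leaf)                      # Step 2: merging
            propagate(tree)                        # Step 3: activation
    return tree

print(parse(["the", "man"])["act"])                # 1.0: NP fully activated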

Figure 2. Merging of a generated NN-parse tree for "the man" and a new mini-network for S ← NP VP, triggered by the fully activated NP father-node of the initial parse tree. (Node shading in the figure indicates full activation a = 1.0, medium activation a = 0.5, or no activation a = 0.0.)

3.2 The Standard HCP

The HCP approach to parsing is called 'hybrid' since, on the one hand, it employs a Neural Network representation of the parse tree and uses special NN features like weighted connections and activation transfer in the parsing process, and, on the other hand, it is based on traditional symbolic concepts of parsing and grammars. Due to the dynamic construction of NN parse trees, the HCP approach circumvents the typical problems of NN parsers and enables full recursion. The standard HCP approach as described above is equivalent to deterministic bottom-up parsing. In addition, it integrates typical NN features like parallelism, weighted connections, and graded activation. In the next section, a special NN-based mechanism for the parallel processing of competing grammatical rules is described.

3.2.1 Competing Derivations

The technique to ensure the correct selection of a derivation among competing right-hand sides (RHSs) of rules with the same left-hand side (LHS) is based on the concept of mutual inhibition, similar to winner-take-all networks. In order to integrate this mutual inhibition into the parsing process, one strategy is to introduce 'hypothesis nodes' for the competing derivations, together with small inhibitory connections, which implement an inhibition of invalid hypothesis nodes. This leads to more complex mini-networks, which provide a combined representation of the different RHSs for the same LHS (see Figure 3).

Figure 3. Representation of competing RHSs for the same LHS in one mini-network. h1 stands for VP ← V, h2 for VP ← V NP, h3 for VP ← V PP, h4 for VP ← V NP PP. (The connection weights shown in the figure are 1.0, 0.5, 0.333, and -0.2.)

This complex mini-network is generated on the basis of the following rules:

• For each competing rule, define a hypothesis node.
• Provide a positive connection from each RHS item to the related hypothesis node, with connection weight 1/n, where n is the number of items on this RHS.
• Add a negative connection, e.g. -0.1, from any RHS item to a hypothesis node if the RHS item is not contained in the RHS of the rule represented by this hypothesis node.

In Figure 3, the connection weights for the RHS items V and NP to the hidden hypothesis node h2, which represents the rule VP ← V NP, equal 0.5 (1/2 for 2 RHS items). An inhibitory connection is introduced, for example, from NP to h1, since h1 represents the rule VP ← V and NP is not on the RHS of this rule. We can show easily that this implementation ensures a correct, parallel processing of competing rule applications: A hypothesis node (h-node) becomes fully activated if and only if the sum of its inputs is equal to or larger than 1.0. This means that all items on the RHS of the rule represented by this h-node must be fully activated with 1.0, since only n * 1/n * 1.0 (number of RHS items * connection weight * activation of each RHS item) yields the full activation 1.0 of the h-node. In addition, no item which is not on the RHS of this rule can be activated, since otherwise the h-node would receive negative input and thus no longer be fully activated. A minimal sketch of this computation follows.
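The following Python sketch reproduces this computation for the VP rules of Figure 3; the inhibition value -0.1 is taken from the text above, while the data representation itself is an illustrative assumption:

# Hypothesis-node mechanism: weight 1/n from each item on a rule's RHS,
# and a small inhibitory weight from every item not on that RHS.

RULES = {                       # competing RHSs for the same LHS (VP)
    "h1": ["V"],
    "h2": ["V", "NP"],
    "h3": ["V", "PP"],
    "h4": ["V", "NP", "PP"],
}
ITEMS = ["V", "NP", "PP"]
INHIBITION = -0.1

def h_activation(h, active):
    """Weighted input of hypothesis node h, given the set of fully
    activated RHS items; full activation requires input >= 1.0."""
    rhs = RULES[h]
    pos = sum(1.0 / len(rhs) for item in rhs if item in active)
    neg = sum(INHIBITION for item in ITEMS
              if item not in rhs and item in active)
    return pos + neg

active = {"V", "NP"}            # V and NP fully recognized, no PP
for h in RULES:
    a = h_activation(h, active)
    print(h, round(a, 3), a >= 1.0)
# h1 gets 1.0 - 0.1 = 0.9 (inhibited by NP), h2 gets exactly 1.0,
# h3 gets 0.5 - 0.1 = 0.4, h4 gets 0.667: only h2 is fully activated.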

4 Towards Robust Parsing with the HCP

A major issue of current investigations is the use of the HCP method for robust parsing, i.e. the parsing of ungrammatical structures (cf. also Kemke 2000). Ungrammatical structures are frequently observed in spontaneous speech. Thus, any integrated speech and language processing system has to provide mechanisms for a syntactic analysis which take care of deviations from the standard grammatical constructs used in the analysis of written text. Some modifications of the standard HCP have been suggested for dealing with phenomena which are typical for spontaneous spoken dialogues, like repetitions, corrections, insertions, reversions of substructures, and (occasionally) omissions. In general, these speech phenomena involve ungrammaticalities which become obvious during the parsing process and might lead to a complete parser failure. In order to deal with this kind of speech phenomena, the HCP method has to detect an ungrammaticality and then either tolerate, ignore, or repair it.

4.1 Ungrammaticality and the Standard HCP

In the HCP method, indications for the occurrence of an ungrammatical structure are the following two cases:

1. a (partial) parse tree has an open son-node which does not become fully activated, or
2. a father-node is fully activated but the related partial parse tree cannot be integrated into another initial parse tree.

It can be shown that in any other case the parsing process proceeds without problems and produces a syntactic analysis according to the given context-free grammar (cf. Kemke 2000). A minimal sketch of this detection step is given below.
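As an illustration, the following Python sketch (reusing the dict-based node format assumed in the earlier sketch; the function names are illustrative) checks for the two indications:

# Two ungrammaticality indicators: (1) an open son-node that never
# becomes fully activated; (2) a fully activated father-node whose
# partial parse tree could not be merged into the initial parse tree.

def has_open_son(node):
    """Indicator 1: some son-node stays below full activation 1.0."""
    if not node["sons"]:
        return node["act"] < 1.0
    return any(has_open_son(son) for son in node["sons"])

def detect_ungrammaticality(initial_tree, pending_trees, end_of_input):
    """Collect indications once the input is consumed; pending_trees are
    partial parse trees that could not be merged (indicator 2)."""
    problems = []
    if end_of_input and has_open_son(initial_tree):
        problems.append(("open-son", initial_tree))
    for ppt in pending_trees:
        if ppt["act"] >= 1.0:
            problems.append(("unmergeable", ppt))
    return problems

# tiny example: an NP whose noun son-node never got activated
tree = {"sym": "NP", "act": 0.5,
        "sons": [{"sym": "det", "act": 1.0, "sons": []},
                 {"sym": "noun", "act": 0.0, "sons": []}]}
print(detect_ungrammaticality(tree, pending_trees=[], end_of_input=True))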

4.2 Modifications of the Standard HCP

A general formal analysis of distortions of strings, together with an informal analysis of transcribed corpora of spoken dialogues¹, preceded the investigation of the HCP approach with respect to the processing of spontaneous spoken natural language. These studies resulted in the suggestion of several modifications of the standard deterministic and correct HCP method, which allow the tolerance of speech-specific phenomena, like repetitions and corrections of utterances and parts thereof, and which might also enable the repair of ungrammatical constructs (see section 4.2.4). The HCP method can be modified with respect to the processing of these speech-related grammatical abnormalities on the following levels: changes to the merging process (sentence ordering, violation of LR parsing); changes to the activation transfer function (incomplete structures); changes to the weight setting (incomplete structures, 'cognitive' human parsing); and overlapping of competing partial parse trees (repetitions and corrections, see 4.2.4).

4.2.1 Changes to the merging process

This involves in particular the possibility of later or earlier merging of partial parse trees: a parse tree which cannot be merged immediately can be kept and, if possible, inserted in an appropriate place later during processing. This modification relaxes strict LR parsing and takes care of variations in the sentence structure.

4.2.2 Changes to the activation transfer function

One possibility to allow for incomplete structures is to change the activation transfer function such that activation is passed on even when the sum of inputs is lower than the threshold value. Since this modification would violate the correctness of the parsing method, it requires careful examination and has not yet been investigated further. A possible form of such a relaxed output function is sketched below.
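The following fragment is illustrative only; the tolerance parameter and its value are assumptions, not taken from the paper, and the modification itself is described above as untested:

def output(net_input, threshold=1.0, tolerance=0.8):
    # Pass activation on once the weighted input reaches a fraction
    # (tolerance) of the threshold, so a near-complete structure such
    # as an NP missing its determiner can still activate its father-node.
    return 1.0 if net_input >= threshold * tolerance else net_input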


¹ Transcripts of speech dialogues prepared in the SFB 360, 'Situated Communicating Agents', University of Bielefeld, Germany.

4.2.3 Changes to the weight setting

An adaptation of the connection weight setting, by selecting weights which reflect the 'relevancy' of a syntactic structure (son-node of a mini-network) instead of choosing equal weights for all son-nodes of a mini-network, can take care of incomplete structures and inputs. The connection weights of 'irrelevant' or 'unnecessary' items can be set very low or to 0, and the weights of 'mandatory' or 'necessary' items (son-nodes of a mini-network) have to be set close or equal to the threshold value. Then the recognition of a sub-structure depends on the appearance of the necessary items, and further items can be absorbed but are not crucial for the further processing. Example: For the rule NP ← det noun, set the connection weight in the corresponding mini-network for the det connection to 0 and for the noun connection to 1. Thus, a determiner (det) can be absorbed by this mini-network but does not have to be recognized. A noun is mandatory and has to be recognized to fully activate the father-node (assuming the threshold, the full activation, and the output values are 1).
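The following small Python sketch (an illustrative representation, not the paper's code) reproduces this example:

# Relevancy-based weights for NP <- det noun, following the example
# above: weight 0 for the optional determiner, weight 1 (the threshold)
# for the mandatory noun, instead of equal weights 0.5.

WEIGHTS = {"det": 0.0, "noun": 1.0}
THRESHOLD = 1.0

def np_father_activation(recognized):
    """Weighted input of the NP father-node given the recognized items."""
    return sum(WEIGHTS[item] for item in recognized)

print(np_father_activation({"noun"}) >= THRESHOLD)          # True: noun alone suffices
print(np_father_activation({"det", "noun"}) >= THRESHOLD)   # True: det is absorbed
print(np_father_activation({"det"}) >= THRESHOLD)           # False: noun is mandatory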

4.2.4 Overlapping of partial parse trees

Corrections and repetitions in spontaneous speech are characterized in the HCP parsing process by the appearance of a second, similar (partial) parse-tree, which competes for a binding to the same son-node of the first, initial parse-tree. Often, one of these parse-trees is incomplete, and thus a partial parse tree (PPT). Example: "I went to – to the shop." This utterance has an incomplete VP, i.e. a not fully activated VP-node, due to an incomplete PP-structure in the first parse-tree. This PP appears again as a complete PP-structure in the correction/repetition part and forms a second partial parse-tree. Both structures (the incomplete PP 'to' and the complete PP 'to the shop') compete for binding to the PP son-node in the VP. One solution for this phenomenon is to ignore and over-read the incomplete PP. The more general solution is to overlap and combine both sub-trees in order to yield a complete, repaired, grammatically correct sub-structure. The essence of this graph-overlapping method is to compare and combine two (partial) parse trees, which were identified based on the detection of ungrammaticalities in the HCP as described above, and to find a suitable overlap based on matches of symbol markers and structures. This problem can be seen as a graph matching problem for marked graphs, where the markers associated with nodes represent syntactic categories. The comparison method developed and used so far is constrained to the examination of directed acyclic graphs, i.e. directed trees. A sample input, with comparison results and resulting output, is shown in Figure 4 for a noun phrase completion and in Figure 5 for the correction and completion of a prepositional phrase inside a verb phrase; a minimal sketch of the overlap step follows Figure 5.

----------------------INPUT----------------------
peter takes the red --- the blue block
--------------------------------------------------
1. partial parse tree: S [NP(PR-peter) VP(V-takes) NP(DET-the ADJ-red NIL)]
2. partial parse tree: [NP(DET-the ADJ-blue N-block)]
--------------------------------------------------
MATCH NP
MATCH DET
MATCH the
MATCH ADJ
NO MATCH 1) red 2) blue
NO MATCH 1) NIL 2) N-block
----------------------OUTPUT---------------------
peter takes the blue block
--------------------------------------------------
S [NP(PR-peter) VP(V-takes) NP(DET-the ADJ-blue N-block)]
--------------------------------------------------

Figure 4. Sample comparison and overlap of two partial parse trees. Shown are the input sentence, the compared syntactic structures, the comparison results for syntactic categories and words, and the output sentence and structure resulting from overlapping the parse trees. The example shows the match and overlap of an incomplete and a complete noun phrase, yielding a complete, corrected noun phrase.

----------------------INPUT----------------------
the screw is over --- is in the block
--------------------------------------------------
MATCH VP
MATCH V
MATCH is
MATCH PP
MATCH P
NO MATCH 1) over 2) in
NO MATCH 1) NIL 2) DET-the N-block
----------------------OUTPUT---------------------
the screw is in the block
--------------------------------------------------

Figure 5. The example shows the match and overlap of an incomplete and a complete verb phrase, including a false start of a contained prepositional phrase in the first partial parse tree.
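To illustrate the overlap step, here is a minimal Python sketch; the tuple-based tree format and the rule that the later (correcting) tree wins at mismatches are simplifying assumptions for this example, not the paper's implementation:

# Comparison and overlap of two marked (partial) parse trees with the
# same root category, as in Figure 4: matching symbol markers are kept,
# and at a mismatch the later (correcting) tree wins.

def overlap(old, new):
    """Combine two marked trees; a node is (category, word_or_sons)."""
    if old is None:
        return new                       # completion: 'NIL' vs. N-block
    old_cat, old_val = old
    new_cat, new_val = new
    if old_cat != new_cat:
        return new                       # no match: the correction wins
    if isinstance(new_val, list):        # inner node: recurse over sons
        padded = old_val + [None] * (len(new_val) - len(old_val))
        return (new_cat, [overlap(o, n) for o, n in zip(padded, new_val)])
    return (new_cat, new_val)            # leaf: keep the corrected word

old_np = ("NP", [("DET", "the"), ("ADJ", "red")])            # incomplete
new_np = ("NP", [("DET", "the"), ("ADJ", "blue"), ("N", "block")])
print(overlap(old_np, new_np))
# ('NP', [('DET', 'the'), ('ADJ', 'blue'), ('N', 'block')])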

The method works well for corrections and repetitions on the phrase level or higher, and in some cases for false starts, as a first set of tests has shown. So far, the method is in agreement with the hypothesis that corrections in spoken language often take place on the level of phrases (cf. e.g. Jurafsky and Martin 2000; Hindle 1983). One problem for the HCP arises if the structure to be parsed can be represented as a grammatical utterance, even though it is actually a kind of ungrammatical expression. Example: "The red blocks --- take the red block.", which could be parsed as NP (the red blocks) and VP (take the red block). A proper processing of this kind of phenomenon could be achieved using prosodic information, semantic knowledge (e.g. that, in the context of this utterance, a block can't take a block), or a set of communication or conversational rules (e.g. that the repetition of more than one word is unlikely). An application to a larger set of transcribed spontaneous spoken dialogues still requires further work, in particular an integration of the various modified mechanisms of the HCP as described above. This might involve some difficulties, since the relaxation of several constraints, which are in general implied in traditional deterministic parsing, might lead to an over-acceptance of input sentences, or to repaired constructs which do not reflect the intention of the speaker.

5 Conclusion

In this paper, the basic method of the Hybrid Connectionist Parsing (HCP) approach was described. As a special NN-related feature of the HCP, a mechanism for the parallel processing of competing grammatical rules was shown, based on a complex mini-network with inhibitory connections which represents the competing rules. Various modifications of the HCP towards a robust and fault-tolerant parser of ungrammatical utterances were discussed, which were developed based on the analysis of transcribed spontaneous speech. In particular, the method of partial parse tree matching and overlapping was introduced as a special graph-matching problem. The method works well for corrections and repetitions, and in some cases for false starts, whenever a second sub-tree can be overlapped with a first parse tree on the basis of a common root node within the trees. An application to a larger set of transcribed spontaneous spoken dialogues still requires an integration of the various techniques of the modified HCP.

Acknowledgement

Thanks to Venkatesh Manian for an implementation of the tree-matching and overlapping mechanism, and to my former students Habibatou Kone and Christoph Schommer for first implementations and tests of the HCP method.

References

Elman, J. L.: Finding Structure in Time. Cognitive Science, vol. 14, 1990, pp. 179-211.

Fanty, M.: Context-Free Parsing in Connectionist Networks. Technical Report TR 174, Computer Science Department, University of Rochester, Rochester, NY, 1985.

Heeman, Peter A., Kyung-ho Loken-Kim and James F. Allen: Combining the Detection and Correction of Speech Repairs. ICSLP-96.

Hemphill, Charles T., John J. Godfrey and George R. Doddington: The ATIS Spoken Language Systems Pilot Corpus. Proc. of the Speech and Natural Language Workshop, Hidden Valley, PA, 1990, pp. 96-101.

Hindle, D.: Deterministic Parsing of Syntactic Non-fluencies. ACL-83, Cambridge, MA, pp. 123-128.

Jain, A. N.: A Connectionist Architecture for Sequential Symbolic Domains. Technical Report CMU-CS-89-187, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 1989.

Jain, A. N. and A. H. Waibel: Incremental Parsing by Modular Recurrent Connectionist Networks. In D. S. Touretzky (ed.): Advances in Neural Information Processing Systems 2, Morgan Kaufmann, San Mateo, CA, 1990.

Jurafsky, D. and J. H. Martin: Speech and Language Processing. Prentice-Hall, 2000.

Kemke, C.: Konnektionistische Modelle in der Sprachverarbeitung (Connectionist Models for Speech and Language Processing). Cuvillier-Verlag, Goettingen, Germany, 2000.

Kemke, C.: Connectionist Parsing with Dynamic Neural Networks - or: "Can Neural Networks Make Chomsky Happy?". Technical Report, Computing Research Laboratory, New Mexico State University, 2001 (in press).

Kemke, C.: Graph Matching for Robust Natural Language Processing. Article in preparation.

Kemke, C.: Parsing Neural Networks - Combining Symbolic and Connectionist Approaches. Proc. International Conference on Neural Information Processing ICONIP'94, Seoul, Korea, October 1994. Also TR-94-021, ICSI, Berkeley, CA, 1994.

Kemke, C.: A Hybrid Approach to Natural Language Parsing. In von der Malsburg, von Seelen, Vorbrueggen, Sendhoff (eds.): Artificial Neural Networks, Proc. ICANN'96, Bochum, Germany, July 1996, pp. 875-880.

Kemke, C. and H. Kone: INCOPA - An Incremental Connectionist Parser. Proc. World Congress on Neural Networks, Portland, Oregon, 1993, vol. 3, pp. 41-44.

Kemke, C. and C. Schommer: PAPADEUS - Parallel Parsing of Ambiguous Sentences. Proc. World Congress on Neural Networks, Portland, Oregon, 1993, vol. 3, pp. 79-82.

Kempen, G. and T. Vosse: Incremental Syntactic Tree Formation in Human Sentence Processing: A Cognitive Architecture Based on Activation Decay and Simulated Annealing. Connection Science, vol. 1, 1989, pp. 273-290.

Kone, H.: INKOPA - Ein inkrementeller konnektionistischer Parser fuer natuerliche Sprache (INKOPA - An Incremental Connectionist Parser for Natural Language). Master's Thesis, Computer Science Department, University of the Saarland, 1993.

Pollack, J. B.: On Connectionist Models of Natural Language Processing. Technical Report MCCS-87-100, Computing Research Laboratory, New Mexico State University, 1987.

Reilly, R. and N. E. Sharkey (eds.): Connectionist Approaches to Languages. North-Holland, Amsterdam, 1988.

Reilly, R. and N. E. Sharkey (eds.): Connectionist Approaches to Natural Language Processing. Lawrence Erlbaum, Hillsdale, 1992.

Schnelle, H.: A Net-Linguistic "Earley" Chart-Parser. In R. Reilly and N. E. Sharkey (eds.): Connectionist Approaches to Languages, North-Holland, Amsterdam, 1988.

Schommer, C.: PAPADEUS - Ein inkrementeller konnektionistischer Parser mit einer parallelen Disambiguierungskomponente (PAPADEUS - An Incremental Connectionist Parser with a Parallel Disambiguation Component). Master's Thesis, Computer Science Department, University of the Saarland, 1993.

Sharkey, N.: Connectionist Natural Language Processing. Intellect, Oxford, England, 1992.

Vosse, T. and G. Kempen: Syntactic Structure Assembly in Human Parsing: A Computational Model Based on Competitive Inhibition and a Lexicalist Grammar. Cognition, no. 75, 2000, pp. 105-143.

Waltz, D. L. and J. B. Pollack: Massively Parallel Parsing: A Strongly Interactive Model of Natural Language Interpretation. Cognitive Science, vol. 9, no. 1, 1985, pp. 51-74.

Wermter, S. and V. Weber: SCREEN: Learning a Flat Syntactic and Semantic Spoken Language Analysis Using Artificial Neural Networks. Journal of Artificial Intelligence Research, vol. 6, 1997, pp. 35-85.