GIT-CC-95/10

SYNTAX-SEMANTICS INTERACTION IN SENTENCE UNDERSTANDING

Kavi Mahesh
[email protected]

A THESIS
Presented to The Academic Faculty
In Partial Fulfillment of the Requirements for the Degree
Doctor of Philosophy in Computer Science

Georgia Institute of Technology
March, 1995

Copyright © 1995 by Kavi Mahesh
All Rights Reserved
SYNTAX-SEMANTICS INTERACTION IN SENTENCE UNDERSTANDING
Approved:

Kurt P. Eiselt, Chair
Susan Bovair
Jennifer K. Holbrook
Ashwin Ram
J. Spencer Rugaber
Date approved by Chair
DEDICATION
To my wife Vani and sister KN
ACKNOWLEDGMENTS

I am most deeply indebted to my advisor, Kurt Eiselt. He has helped me in every step through my graduate career ever since I walked into his office in my very first quarter to sign up for a special-topic course. Most of all, he guided my search through the maze of language processing literature and led me into the right topic for my dissertation, one that I have enjoyed working on every minute since, at a time when I was just about to get past my coursework and qualifying examinations. He has continually shared with me his vision of how we understand language and inspired my work all through the years. He has patiently taught me not to write papers in the "mystery story" approach and has generously lent me all his style guides and writing manuals. He has encouraged me and helped me tremendously in publishing parts of this work in a variety of first-rate conferences and workshops. He has not hesitated even to spend the bulk of his discretionary funds for my travel expenses for attending conferences. He has taught me how to teach and has proved his trust in me by giving me the opportunity to teach a graduate class all by myself, though he could well have come back and resumed teaching the class towards the middle of that quarter. It is impossible to count the number of letters he has so well written for me. He has never complained no matter how many times I walked into his office in a day. He has been my advisor, mentor, friend, and my guru in the true sense of the word.

Ashwin Ram has been a lot more than a committee member to me. He has always shown keen interest, has come to my every presentation, has asked all the right questions, and has given me invaluable ideas that have helped me shape my work and expand my comprehension of the field. I have learned most from being his teaching assistant. He even patiently taught me how to use LaTeX and other graduate school "requirements." He has always been a source of excitement and enthusiasm in the NLR meetings and the AI group in general.

Ashok Goel has been a source of ideas and, through numerous discussions in personal meetings, in his group meetings, and in the corridors late in the nights, has taught me how to think about issues in AI other than language processing. I owe to him whatever little I understand about functional reasoning and design. He has never hesitated to include me in his group of students though I have asked many stupid questions and should have really been an "outsider." He has supported me to work on the KA project knowing very well that I was rather too preoccupied with completing my thesis to afford the frequent context switches that the project demanded. His ideas on the KA project have expanded my horizons far beyond sentence understanding.

Jennifer Holbrook has been a source of ideas and has been our gateway to psycholinguistics. She has posed challenging problems and always shown complete confidence in me by entrusting me with the job of developing her ideas about human language processing into a working computational model. She has read everything I have written in great detail and given me detailed comments ranging from simple grammar corrections and style improvements to the most insightful theoretical remarks. Her comments on the psycholinguistics chapter have been particularly useful.

Susan Bovair has served on my committee since the beginning and has provided many a useful criticism from a psychologist's perspective, especially on Chapters 3 and 10 of this thesis. She has also been an integral part of the NLR group and the many research games we have played in that group.

Spencer Rugaber agreed to join the committee towards the very end and read and commented on the thesis at short notice. His comments from the point of view of a computer scientist outside the area of artificial intelligence were truly valuable.
Other members of the Natural Language and Reasoning (NLR) interdisciplinary research group have served as an irreplaceable source of ideas and motivation for language related research. Together they have taught me how to think about language from the perspectives of linguistics, social science, education, literature, culture, and communication. Charles Bazerman was truly instrumental in giving me a broad education on the study of language from diverse perspectives. He has patiently listened to my naive ideas from the computational perspective and worked with me in turning his ideas from rhetorics into a research proposal for building a computer tool to teach good writing skills. William Evans insists on believing that he learned something about natural language processing from me. He has introduced me to many interesting pieces of work in communication science and bibliometric studies. Wendy Newstetter deserves a special mention for bringing her functional perspective on linguistics to the group. She has also taught us a whole lot about the teaching of language skills and about education and evaluation in general. My fellow students in the group, Andy Edmonds, Anthony Francis, Lucy Gibson, Kenneth Moorman, Justin Peterson, and James Riechel, have added tremendously to the excitement in the group. Of course, the group would not have existed but for the time, effort, and interdisciplinary interest shown by the other faculty members of the group, Susan Bovair, Kurt Eiselt and Ashwin Ram. Thank you all for making it such a fun and unique group to belong to.

Other faculty in the cognitive sciences at Georgia Tech have contributed significantly in shaping my interests and views. In particular, I would like to acknowledge the interest and encouragement from Janet Kolodner, Dorrit Billman, Lawrence Barsalou, Tony Simon, Daryl Lawton, and Ron Arkin. Special thanks to Janet Kolodner for the many ways in which she has provided generous financial and other forms of support for this work. Richard Billington has never complained about the numerous software and machine problems that I have taken to him over the years. He has always been there to help me with my problems, be they with Lisp, with networks, or the good old Symbolics. Ed Anderson and Kathy Ball taught me how to be a teaching assistant in the early times.

Justin Peterson has contributed enormously to this work. He has always treated me like the smaller of us "language brothers" and has helped me shape my ideas and present them better. He has pointed me to the right literature, taught me many things about syntax, and made me think in the right directions. Working on the KA project was especially enjoyable because of Justin.

Graduate school would never have been the same without Sambasiva Bhatta, Eleni Stroulia, and Murali Shankar. We entered the program together and learned many tricks of graduate school survival together. Thanks especially to Sam and Eleni for the many entertaining conversations in the "AI lounge" and all the help outside school too. I especially enjoyed the gossip sessions. Other members of the AI group including Michael Cox, Anthony Francis, Andres Gomez, and Kenneth Moorman have added their part to make the place more fun and interesting. Kenneth Moorman and James Riechel have patiently worked with COMPERE, in spite of the many bugs it had and the inordinate delays before I even attempted to fix the bugs. Thank you guys for believing that I had written a program that you could actually use.
Members of the administrative staff in the College of Computing have been most helpful. My special thanks to Susan Haglund, Tempo Tinch, and Angelie Alford for the many favors and for letting me walk through their offices to Kurt's chamber all the time. The AI secretaries, first Jeannie Terrell and then Allyana Ziolko, have so cheerfully helped us in every way. I was supported in part during the course of this research by a research grant from Northern Telecom.

My special thanks to H. Venkateswaran for being such a caring friend. Friends in Atlanta outside the school, especially Manju, Suresh, and Sampath, have provided invaluable company and entertainment that has made my life as a graduate student so much easier to go through. Special thanks to all the members of the Kannada association for making me part of their own team and
for the great friendship and the many dinners.

Last but by no means the least, my wife and best friend, Vani, has contributed in no small measure to the success of this thesis. She has endured all my frustrations and stresses during thesis writing, rewriting, and job hunting without a complaint and has shown unlimited faith and confidence in me. This thesis would not have been possible but for her encouragement and support. My parents have contributed from the start by instilling in me the values of higher education and learning and have supported me in every way. Apart from my parents, the one person who has supported me in every way possible throughout my life is my sister Nagarathna (KN). She has selflessly provided me with all the moral, social, and financial support throughout, and made my graduate study possible. All through my years away from home, she has never stopped writing letters regularly to keep me informed of all the ongoings back home and has never allowed me to feel left out of the family. The contribution of my brother KRM, both when he was here and back home, is no less either. I dedicate this thesis to Vani and KN.
CONTENTS

DEDICATION
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
SUMMARY

Chapters

I. SENTENCE UNDERSTANDING
  1.1 Natural Language Understanding
  1.2 Sentence Understanding: The Task
  1.3 Sentence Understanding: The Questions
  1.4 Previous Solutions
    1.4.1 The Modularity Debate
  1.5 Our Hypothesis
  1.6 Our Solution: An Overview
    1.6.1 Implementation and Evaluation
  1.7 COMPERE in Action: An Example
  1.8 Contributions
  1.9 Organization of the Thesis

II. PRINCIPLES, CLAIMS, AND ASSUMPTIONS
  2.1 Principles
  2.2 Questions
  2.3 Claims
  2.4 Predictions
  2.5 Assumptions

III. PSYCHOLINGUISTIC THEORY OF SENTENCE UNDERSTANDING
  3.1 Overview of Psycholinguistic Studies
  3.2 Immediate Interaction Studies
    3.2.1 Additional Support for Immediate Integration
  3.3 First Analysis Studies
    3.3.1 Additional Support for Syntactic Autonomy
  3.4 Studies of Error Recovery
    3.4.1 Lexical and Pragmatic Error Recovery
    3.4.2 Syntactic Error Recovery
    3.4.3 Unified Theory of Error Recovery
  3.5 Studies of Resource Constraints
    3.5.1 Other Multiple Analysis Models
  3.6 Studies of Aphasia
  3.7 Analysis: Evidence for Claims
    3.7.1 Combining Functional Independence and Integrated Processing
    3.7.2 Implications of Error Recovery Studies
    3.7.3 Implications of Resource Constraints
    3.7.4 Modularity Debate: Reconciliation
  3.8 Discussion and Summary

IV. FUNCTIONAL THEORY OF SENTENCE UNDERSTANDING
  4.1 Functional Constraints on Natural Language Understanding
    4.1.1 Incremental Interpretation
    4.1.2 Determinism
    4.1.3 Error Recovery
    4.1.4 Functional Independence
  4.2 The Sentence Understanding Task
    4.2.1 The Input
    4.2.2 The Output
    4.2.3 Problems in Sentence Understanding
    4.2.4 Ambiguity
      4.2.4.1 A Typology of Ambiguities
    4.2.5 Knowledge Sources Used in Language Understanding
    4.2.6 Types of Knowledge
  4.3 Task Decomposition
    4.3.1 Incremental Communication between Subtasks
    4.3.2 Example

V. SYNTAX-SEMANTICS COMMUNICATION AND SENTENCE PROCESSING ARCHITECTURES
  5.1 The Need for Communication
    5.1.1 Functional Motivation: Ambiguity
    5.1.2 Cognitive Constraints
    5.1.3 Generativity
      5.1.3.1 Integrated Representations
  5.2 The Nature of the Communication
    5.2.1 Interaction between Knowledge Sources within Syntax
      5.2.1.1 Types of Top-Down Guidance to Bottom-Up Parsing
        5.2.1.1.1 Syntactic Expectation
        5.2.1.1.2 Semantic and Conceptual Preference
        5.2.1.1.3 Structural Preference
      5.2.1.2 The Control of Parsing
    5.2.2 The Need for an Arbitration Mechanism
    5.2.3 Translation is No Good
    5.2.4 The Alternative: Intermediate Roles
  5.3 Sentence Processing Architectures
    5.3.1 Architectural Dimensions
    5.3.2 Sequential Architecture
    5.3.3 Integrated Architectures
    5.3.4 Parallel Architectures
  5.4 Architectures and Syntax-Semantics Communication
    5.4.1 COMPERE's Architecture
    5.4.2 A Unified Process?
    5.4.3 Issues in Syntax-Semantics Communication
  5.5 Related Models of Sentence Understanding
    5.5.1 Sequential Models
    5.5.2 Integrated Models
    5.5.3 Spreading Activation Models
    5.5.4 Other Recent Models
    5.5.5 Summary

VI. THE THEORY OF PARSING: WHEN TO COMMUNICATE WITH SEMANTICS
  6.1 Introduction
  6.2 Combining Bottom-Up and Top-Down Parsing
    6.2.1 The Need to Combine Bottom-Up and Top-Down Methods
    6.2.2 Parsing Strategies
    6.2.3 Sources of Justification for Syntactic Commitments
    6.2.4 Two Constraints on Human Parsers
      6.2.4.1 Local Ambiguities
      6.2.4.2 Memory Requirements
  6.3 Some Preliminaries
    6.3.1 Spectrum of Parsing Strategies
      6.3.1.1 Arc Enumeration Strategies
    6.3.2 Required versus Optional Constituents
  6.4 Why Not Bottom-Up or Top-Down?
    6.4.1 Other Reasons for Mixing Bottom-Up and Top-Down Strategies
  6.5 Problems with Left-Corner Parsing
  6.6 Head-Signaled Left-Corner Parsing (HSLC Parsing)
    6.6.1 Space Requirements of HSLC
    6.6.2 Local Ambiguities in HSLC
  6.7 Related Parsers
    6.7.1 Head-Corner Parsing
    6.7.2 Lookahead Parsers
    6.7.3 Categorial Grammars
  6.8 On the Nature of Syntactic Preferences
  6.9 Summary

VII. THE THEORY OF SEMANTICS: HOW TO COMMUNICATE WITH SYNTAX
  7.1 Semantics
    7.1.1 Different Views of Semantics
    7.1.2 Elements of Linguistic Semantics
    7.1.3 Semantic Roles
      7.1.3.1 Thematic Roles
      7.1.3.2 Extended Set of Semantic Roles
    7.1.4 Intermediate Roles
      7.1.4.1 Linking Theory and Thematic Hierarchies
  7.2 Semantic Processing
    7.2.1 Role Assignment
      7.2.1.1 Syntactic Guidance for Semantics
      7.2.1.2 Role Assignment as Parsing
    7.2.2 Role Emergence
      7.2.2.1 Syntax-Semantics Consistency and Correspondence
    7.2.3 Independence of Semantics
  7.3 Arbitration and Conflict Resolution

VIII. COMPERE: THE SENTENCE UNDERSTANDING SYSTEM
  8.1 Knowledge Representation
    8.1.1 Lexical Knowledge
    8.1.2 Syntactic Knowledge
      8.1.2.1 Preconditions and Expectations in Syntax
      8.1.2.2 Indexing for Bottom-Up Parsing
      8.1.2.3 Required Units and the Head
      8.1.2.4 COMPERE's Representation of Syntactic Rules
      8.1.2.5 The Head of a PP
    8.1.3 Conceptual Knowledge
    8.1.4 Role Knowledge
      8.1.4.1 Intermediate Roles
    8.1.5 Other Knowledge
  8.2 Working Memory
    8.2.1 Representing Proposed Attachments
  8.3 Sentence Processing Methods
    8.3.1 Implementation of HSLC Parsing Algorithm
      8.3.1.1 Deciding When to Make Attachments
      8.3.1.2 Identifying Possible Parents
      8.3.1.3 Proposing Attachments
      8.3.1.4 Selecting from Proposed Attachments
      8.3.1.5 Making Attachments
    8.3.2 Implementation of Role Assignment
      8.3.2.1 Generating Semantic Attachment Proposals
      8.3.2.2 Processing Left Roles
      8.3.2.3 Semantic Preference Levels
      8.3.2.4 Selecting and Making Role Attachments
      8.3.2.5 Constraints on Role Assignment
    8.3.3 Arbitration
      8.3.3.1 Implementing the Arbitration Algorithm
    8.3.4 Resolving Lexical Semantic Ambiguities
    8.3.5 Retention and Elimination of Alternatives
    8.3.6 Implementation of Error Recovery Algorithms
      8.3.6.1 Composition Failure
      8.3.6.2 Incompleteness Failure
      8.3.6.3 Recovery Induced Errors
  8.4 COMPERE, the Program

IX. PERFORMANCE ANALYSIS AND EVALUATION
  9.1 Validation of the COMPERE Program
    9.1.1 Simple Sentences
    9.1.2 Relative Clauses
    9.1.3 Structural Ambiguities
      9.1.3.1 PP Attachment Ambiguities
      9.1.3.2 Phrase-Boundary Ambiguities
    9.1.4 Multiple Ambiguities: A Challenge
    9.1.5 Claims Revisited
      9.1.5.1 Syntactic Coverage
      9.1.5.2 Semantic Coverage
      9.1.5.3 Claim 1: Integrated Processing with Independence
      9.1.5.4 Claim 1a: Functional Independence
      9.1.5.5 Claim 2: Synchronizing Syntactic-Semantic Compositions
      9.1.5.6 Claim 3: Error Recovery
      9.1.5.7 Claim 4: Syntax-Semantics Interaction through Arbitrator
  9.2 Comparative Analysis with Other Architectures
    9.2.1 Sequential Architectures
      9.2.1.1 Semantics-First Architecture
    9.2.2 Integrated Architectures
    9.2.3 Uncontrolled Parallel Architectures
  9.3 Formal Analysis of COMPERE
    9.3.1 Some Intuitions
    9.3.2 COMPERE as an Automaton
      9.3.2.1 Operations in the Automaton
    9.3.3 A Simple Cost Metric
      9.3.3.1 The Cost Metric
      9.3.3.2 Computing the Size of the Parse Forest
    9.3.4 Cost Metric: Assumptions
    9.3.5 Cost Metric: Validity
      9.3.5.1 Cost of Center Embedding
    9.3.6 The Cost of Parsing Decisions
      9.3.6.1 Performance Analysis: Simple Sentences
      9.3.6.2 Performance Analysis: PP Attachment Ambiguity
      9.3.6.3 Performance Analysis: Modifiers in the Prefix
      9.3.6.4 Performance Analysis: A Complex Example
    9.3.7 Performance Tradeoffs in Sentence Processing: A Formal Analysis
      9.3.7.1 Tradeoffs in Prediction
      9.3.7.2 Tradeoffs in Left-Corner Projection
      9.3.7.3 Tradeoffs in Eager Reduction
    9.3.8 Empirical Factors

X. DISCUSSION
  10.1 Psychological Predictions from COMPERE
    10.1.1 The Predictions
      10.1.1.1 Interactive Error Recovery and COMPERE's Architecture
      10.1.1.2 Error Recovery and Retention
    10.1.2 A Sketch of an Experiment
    10.1.3 A Third Prediction
  10.2 History of COMPERE
  10.3 Limitations of COMPERE
    10.3.1 Cognitive Accuracy
      10.3.1.1 Resource Limits
      10.3.1.2 Flexible Parsing: Eager HSLC
    10.3.2 Other Limitations
    10.3.3 Limitations of the Current COMPERE Program
  10.4 Other Directions for Future Work

XI. CONCLUSIONS
  11.1 Issues Addressed
  11.2 Contributions
  11.3 Conclusion
LIST OF TABLES

4.1 Semantics After Each Word of Sentence 19.
5.1 Architectural Space of Sentence Understanders.
5.2 Syntax-Semantics Communication in Various Architectures.
5.3 A Comparative Summary of Sentence Understanding Models.
6.1 Space Requirements and Local Ambiguities of Parsing Strategies.
8.1 Preference Levels in Syntax.
8.2 Preference Levels in Semantics.
9.1 Operator Preconditions in COMPERE.
9.2 Cost Metric Calculation: An Example.
9.3 Cost of One Center Embedding.
9.4 Cost of Two Center Embeddings.
9.5 Performance Comparison: A Simple Sentence.
9.6 Performance Comparison: A PP Attachment Ambiguity.
9.7 Performance Comparison: Unambiguous PP Attachment.
9.8 Performance Comparison: Effect of Modifiers in the Prefix.
9.9 Performance Comparison: A Complex Noun Group.
9.10 Performance Comparison: A Complex Example.
LIST OF FIGURES

1.1 Architecture of COMPERE.
1.2 Garden Path: Main-Clause Interpretation.
1.3 Garden Path: Reduced Relative Clause.
1.4 Reduced Relative Clause: No Garden Path.
3.1 Main-Clause Interpretation.
3.2 Reduced Relative Clause.
3.3 Minimal Prepositional Attachment.
3.4 Non-Minimal Prepositional Attachment.
3.5 Right Association.
4.1 Task Decomposition and Information Flow in Sentence Understanding.
4.2 Syntactic Structures After Each Word of Sentence 19 (Part 1).
4.3 Syntactic Structures After Each Word of Sentence 19 (Part 2).
4.4 Syntactic Structures After Each Word of Sentence 19 (Part 3).
5.1 Sequential Architecture.
5.2 Integrated Architecture.
5.3 Various Sentence Processing Architectures.
5.4 Forster's Levels of Processing Model.
6.1 Top-Down Parsing.
6.2 Bottom-Up Parsing.
6.3 Left-Corner Parsing.
6.4 Head-Driven Parsing.
6.5 Arc-Standard Left-Corner Parsing.
6.6 Sentence Branching Structures.
6.7 Arc-Eager Left-Corner Parsing Is Too Eager.
6.8 Arc-Standard Left-Corner Parsing Is Too Circumspect.
6.9 Relative Positions of Parsing Strategies.
6.10 The HSLC Algorithm.
6.11 An Illustration of Data Structures in HSLC Parsing.
6.12 HSLC: Head-Signaled Left-Corner Parsing.
6.13 A Right Branching Construct.
7.1 Syntax-Guided Role Assignment.
7.2 Role Assignments Guided by Syntactic Attachments.
8.1 Representational Unit: A Node.
8.2 Representation of Lexical Knowledge.
8.3 A Simple Grammar.
8.4 Typical Structure of a Phrase `xP'.
8.5 Representation of a Simple Grammar.
8.6 Representation of Conceptual Knowledge.
8.7 Representation of Intermediate Role Knowledge.
8.8 COMPERE's Working Memory.
8.9 Ways of Making Syntactic Attachments.
8.10 Pseudocode for Proposing Syntactic Attachments.
8.11 Processing a Left Role.
8.12 The Arbitration Algorithm.
8.13 Arbitration: A Benign Situation.
8.14 Arbitration: A Conflict Situation.
8.15 Error Recovery: Composition Failure.
8.16 Error Recovery: Incompleteness Failure.
9.1 A Simple Sentence.
9.2 An AUX/V Ambiguity Sentence.
9.3 A Passive-Voice Sentence.
9.4 A Reduced-Relative Garden-Path Sentence.
9.5 A Relative-Clause Sentence.
9.6 Another Relative-Clause Sentence.
9.7 Yet Another Relative-Clause Sentence.
9.8 A Reduced-Relative With Semantic Bias.
9.9 The Complete Sentence With Semantic Bias.
9.10 A Center-Embedded Sentence.
9.11 Resolving a Structural Ambiguity Also Resolves a Lexical Ambiguity.
9.12 Recovering from the Garden Path "Unresolves" the Lexical Ambiguity.
9.13 Syntax Violates Semantic Bias.
9.14 An Unambiguous PP Attachment.
9.15 A PP-Attachment Ambiguity.
9.16 ADJ Attachment Ambiguity.
9.17 A Local Phrase-Boundary Ambiguity.
9.18 Multiple Lexical-Category Ambiguities.
9.19 Sequential Architecture.
9.20 Interaction Between Syntax and Semantics in the MOPTRANS Parser.
9.21 Interaction Between Syntax and Semantics in COMPERE.
9.22 The Sentence Interpretation Automaton.
9.23 The Contents of the Stack: An Example.
9.24 Tradeoffs in Left-Corner Projection and Prediction.
9.25 Tradeoffs in Eager Reduction: Cost of Reduction at Left Corner.
9.26 Tradeoffs in Eager Reduction: Cost of Delayed Head Composition.
9.27 Tradeoffs in Eager Reduction: Cost of Delayed Postfix Composition.
10.1 Multiple Ambiguities and Composite Errors.
SUMMARY

Natural language is the primary mode of human communication. Developing a complete and well-specified computational model of language understanding is a difficult problem. Understanding a natural language sentence requires the application of many types of knowledge, such as syntactic, semantic, and conceptual knowledge, to resolve the many types of ambiguities that abound in natural language. Most unresolved issues in both psychological and computational modeling of sentence understanding concern when each of the various types of knowledge should be applied in processing a sentence and how the different types of knowledge should be integrated to select unique interpretations of sentences.

In this work, we have developed a model of sentence understanding called COMPERE (Cognitive Model of Parsing and Error Recovery). Our model was built on the hypothesis that a sentence processor has an architecture with separate representations of the different types of knowledge but a single unified process that integrates the different types of knowledge. We have shown that such an architecture addresses the modularity debate by demonstrating how the same sentence processor can produce seemingly modular behaviors in some situations and interactive behaviors in other situations. We have also shown how the unified arbitrating process can not only resolve both syntactic and semantic, lexical and structural, ambiguities, but can also recover from its errors in both syntactic and semantic ambiguity resolution. The unified process can also explain the temporal dependencies in syntax-semantics interactions. It shows how certain decisions are made early and others delayed until further information becomes available.

We have developed a parsing algorithm called Head-Signaled Left-Corner parsing to identify the time course of the points in a sentence where decisions are to be made. This algorithm decides when to make a commitment and when to delay a syntactic attachment. We have also developed a simple arbitration algorithm for combining information coming from multiple knowledge sources and for resolving any conflicts between them. In addition, we have developed a uniform representation of syntactic and semantic interpretations using what are called intermediate roles. These intermediate roles not only aid the dynamic integration of knowledge types by the unified arbitrator, they also provide a declarative record of the intermediate decisions made in syntax-semantics interactions to enable the processor to recover from its errors through repair rather than complete reprocessing.

We present a theoretical framework for formal analyses of the performance of sentence processors in various situations. These analyses indicate that the HSLC parsing algorithm, along with incremental interactions between syntax and semantics controlled by the unified arbitrator, reduces the amount of local ambiguity and the working memory requirements in processing a sentence. We also present certain psychological predictions made by the COMPERE model. We conclude from this study that our model of sentence understanding, with its unified process
applied independently to multiple knowledge sources, provides an answer to the modularity debate and explains, better than other possible architectures, how and why the human sentence processor produces the wide variety of behaviors that it does.
CHAPTER I

SENTENCE UNDERSTANDING

The strongest argument of those advocating a semantics-driven syntax analysis is the ability of people to interpret sentences from semantic clues in the face of syntactic errors or missing information... An analogous argument, however, can be made in the other direction -- people can also use syntactic rules when semantics is lacking... Ultimately, we want an analyzer which can work from partial information of either kind, and research in this direction is to be welcomed...
R. Grishman, 1986, p. 12.
1.1 Natural Language Understanding

A natural language such as English is the primary mode of human communication. Advances in technology such as the introduction of electronic devices and networks have opened up new media for communication, which is, nevertheless, primarily in natural language. As processing speeds and storage capacities of modern computers multiply rapidly, the use and availability of natural language texts in on-line communication and storage are burgeoning. However, current techniques for processing such texts based on simple string matching are proving to be increasingly insufficient. This has opened up an ever expanding range of applications for text processing requiring deeper analyses of the texts in terms of the syntax and semantics of natural languages. There is an immediate need to develop robust, flexible, and efficient techniques for processing texts as words, sentences, and other larger units rather than merely as sequences of characters with little syntax or semantics.

Effective communication through natural language has been possible so far only through the language processor in the human brain. Psychological investigations of human language processing using a variety of intricate techniques have led to considerable insight into the working of the human language processor. Yet, attempts to build psychologically real computational models of language processing face considerable obstacles in translating the constraints put forth by psycholinguistic theories into the representations and processes that constitute the computational model. Natural languages, unlike the specification languages that we have designed to program machines, have evolved to permit high degrees of ambiguity, redundancy, and vagueness. Although some of these features of natural language are crucial for the flexibility and effectiveness of natural language communication, they have also hampered efforts to build artificial language processors.

The proliferation of ambiguities in natural language processing, together with the flexibility and ease with which human language understanders deal with them, has captured the attention of researchers in artificial intelligence (AI), linguistics, psycholinguistics, and other cognitive sciences. Researchers trying to build cognitive models of language understanding have been puzzled by the architecture of the language understander which can produce the diverse behaviors in ambiguity resolution observed in human language understanding, such as early commitment (Carpenter and Just, 1988; Frazier, 1987; Wanner and Maratsos, 1978), garden paths (e.g., Crain and Steedman,
1985), delayed decisions (e.g., Stowe, 1991), parsing breakdown (e.g., Lewis, 1992), and error recovery (e.g., Carpenter and Daneman, 1981; Eiselt, 1989; Eiselt and Holbrook, 1991).

Ambiguities are encountered throughout language analysis. For example, natural language sentences are full of ambiguities in word meanings, in assigning unique syntactic structures to sentences, and in possible ways of combining individual word meanings. In order to arrive at a unique interpretation for a sentence, these ambiguities must be resolved. If left unresolved, ambiguities lead to a combinatorial explosion of possible interpretations for a given input. A model of natural language understanding that simply charts out every possible interpretation without making any selection is not only an implausible account of human language comprehension, given the limited working memory capacities of the human brain, but is also soon bogged down by the large number of possibilities and unable to provide useful output for any computational task involving natural language input.

Resolving the ambiguities requires that the language processor employ a variety of different types of knowledge. Ambiguities can be resolved using knowledge of individual words, their meanings and other information about a word that one finds in a dictionary or a lexicon, knowledge of a grammar for the language (called syntactic knowledge), knowledge of possible semantic relationships between classes of meanings, as well as conceptual knowledge of the particular context in which the natural language communication is taking place, of the domain of discourse, and so on.[1]

The majority of unresolved issues in theories of sentence processing or ambiguity resolution are concerned with the questions of when each of the various types of knowledge should be applied in processing a sentence and how the different types of knowledge should be integrated to select a unique interpretation for a given sentence. From a computational point of view, it is important to find good answers to these questions because natural language processors need to be able to apply the right types of knowledge at the right times to reduce local ambiguity and produce unique interpretations without demanding unreasonable amounts of complete and specific knowledge (of the domain or context, for example). These issues are equally important from the point of view of psychological modeling of human sentence processing. How should the model apply each type of knowledge, and how and when should it make decisions using that knowledge, in order for the model to explain the variety of human behaviors in sentence processing that are documented in the psycholinguistic literature?
1.2 Sentence Understanding: The Task

In this thesis, we address the problem of resolving different types of ambiguities in processing a natural language sentence. We focus on the problem of interpreting individual sentences, ignoring higher level issues such as discourse processing, inference and reasoning, or other non-linguistic, contextual phenomena. The input to a sentence understander is a sentence (complete or not, grammatical or otherwise) in natural language. A sentence can be characterized as a linear sequence of words in the language. The output desired from a sentence understander must include the events, objects, properties of objects, and the thematic role relationships between the events and the objects in the sentence. In addition, it may also be desirable to include the syntactic parse structure of the sentence in the output. A fundamental problem in mapping the input to the output is the high degree of ambiguity in natural languages. Several types of knowledge, such as syntactic and semantic knowledge, can be used to resolve ambiguities and identify unique mappings from the input to the desired output.

[Footnote 1: Semantic knowledge is the knowledge of literal, decontextualized, grammatical meaning (see Chapter 7 and Frawley (1992)). Conceptual knowledge, on the other hand, is any knowledge about the world regardless of whether or in what form it appears in a language.]
The task of sentence understanding, the problem of ambiguity, and the use of knowledge to resolve ambiguities are further discussed in Chapter 4 of this thesis.
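To make the input/output description above concrete, here is a small illustrative sketch of how such structures might be written down. It is given in Python for readability (COMPERE itself was implemented in Lisp), and all class and field names are our own illustrative choices, not COMPERE's actual representation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# The input: a sentence as a linear sequence of words.
Sentence = List[str]

@dataclass
class RoleBinding:
    """A thematic or other semantic role relating an event to an object."""
    role: str                      # e.g., "Agent", "Location"
    filler: "Interpretation"

@dataclass
class Interpretation:
    """Desired output: events, objects, their properties, and the
    role relationships between them (optionally with the parse tree)."""
    concept: str                               # e.g., "TEACH", "Officer"
    properties: Dict[str, str] = field(default_factory=dict)
    roles: List[RoleBinding] = field(default_factory=list)
    parse: Optional[object] = None             # syntactic parse structure, if kept

# One possible output for "The officers taught at the academy":
officers = Interpretation("Officer")
academy = Interpretation("Academy")
teach = Interpretation("TEACH", roles=[RoleBinding("Agent", officers),
                                       RoleBinding("Location", academy)])
```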
1.3 Sentence Understanding: The Questions

In this thesis, we are concerned primarily with the questions of when each of the various types of knowledge (such as syntactic and semantic knowledge) should be applied in processing a natural language sentence and how they should be integrated to select the most appropriate interpretations from the set of all possible interpretations of a sentence. Our answers to these questions are derived from both psycholinguistic results from a variety of studies (reviewed in Chapter 3) and computational considerations of sentence processing architectures (Chapters 5 and 9). More specifically, we ask the following questions regarding the use of knowledge to derive unique interpretations of sentences and provide answers to them based on both computational and psychological motivations.

1. How are the different types of knowledge integrated in sentence processing?[2] How are conflicts, if any, among them resolved?

2. When are decisions made to resolve an ambiguity by selecting from the set of possible interpretations? When should the sentence processor attempt to make a decision in order to minimize local ambiguity?

3. What happens when previously made decisions lead to an error? How does the sentence processor recover from different types of errors?

We know that a sentence processor must use its syntactic, semantic, lexical, conceptual, and other knowledge in order to resolve the ambiguities in sentences. How are these different types of knowledge represented and applied in sentence processing? For example, are they integrated a priori in various combinations, or are they represented and applied separately but integrated dynamically during sentence interpretation? What happens when different types of knowledge are in conflict with each other when it comes to determining a unique interpretation for a sentence?

In sentence processing, the time (or location in the sentence) at which the processor attempts to resolve an ambiguity makes a considerable difference in the amount of local ambiguity that the processor must struggle with. For example, if the processor does not resolve an ambiguity even after it has the information necessary to resolve it, it will be dealing with more ambiguity than it would have had it resolved the ambiguity using the available information. On the other hand, if the processor attempts to make a decision even before it has the knowledge that enables it to do so, it suffers from one of two problems. Either it increases the amount of ambiguity by considering each of the alternatives at that ambiguity and, being unable to select one, adds the ambiguity to the current interpretation, or it makes a decision that is not justified by any piece of knowledge and runs into possible errors later. If the processor attempts to resolve an ambiguity exactly at the point when the necessary information becomes available, it neither carries along ambiguities longer than necessary nor adds to the ambiguity by considering alternative possibilities that it has no information to choose from.

In spite of applying a variety of types of knowledge, and in spite of applying them at the right time, the sentence processor is bound to run into errors. This is because information that is necessary to resolve an ambiguity convincingly may become available only much later in processing a sentence. The processor is forced to make the best decision given the knowledge that is currently available to it, either for lack of resources or because of the need to make early commitments and produce incremental interpretations in order to avoid combinatorial multiplications of ambiguities. What happens when an error is detected? How does the sentence processor recover from the error and change the interpretation by switching to another interpretation or repairing the current interpretation?

[Footnote 2: One could ask a more basic question, namely, what are the different types of knowledge that play a role in language understanding. We do not raise such a question in this thesis. Instead, we assume that knowledge sources commonly employed in linguistics, psycholinguistics, computational linguistics, and natural language processing, such as syntax and semantics, are essential for resolving ambiguities in sentences. See an introductory textbook such as the one by Allen (1987) or Grishman (1986) for a good explanation of the role of syntax and semantics in natural language understanding.]
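The following Python fragment is a minimal sketch of the decision-timing policy discussed above: commit when the available knowledge clearly prefers one alternative, delay when it does not, and retain the rejected alternatives so that a later error can be repaired. The function and the preference scores are illustrative assumptions of ours, not part of COMPERE's actual algorithms (those are presented in Chapters 6 through 8).

```python
def decide(alternatives, preferences, commitments, retained):
    """Commit when the available knowledge clearly prefers one alternative;
    otherwise delay and carry the local ambiguity forward.
    `preferences` maps each alternative to a numeric preference score."""
    ranked = sorted(alternatives, key=lambda a: preferences.get(a, 0), reverse=True)
    if len(ranked) == 1 or preferences.get(ranked[0], 0) > preferences.get(ranked[1], 0):
        commitments.append(ranked[0])   # a justified early commitment
        retained.extend(ranked[1:])     # keep the losers in case recovery is needed
    else:
        retained.extend(ranked)         # nothing distinguishes them yet: delay

# Example: an ambiguity where one reading is weakly preferred.
commitments, retained = [], []
decide(["past-tense", "past-participle"],
       {"past-tense": 2, "past-participle": 1},
       commitments, retained)
print(commitments, retained)   # ['past-tense'] ['past-participle']
```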
1.4 Previous Solutions

A number of models of sentence understanding, both psychological or cognitive models and computational models, have been proposed to address some of the above questions. Some of these models are not complete models of sentence processing; they address only a subtask of sentence processing such as syntactic processing, resolving lexical ambiguities, or resolving pronoun references. Others have assumed a simple linear architecture in which one type of knowledge, say syntactic knowledge, is applied first, and then other types of knowledge, such as semantic knowledge, are applied in a later stage to resolve the remaining ambiguities. A third class of models is based on an approach known as integrated processing. Previous implementations of these models have integrated different types of knowledge a priori in their representations.[3] This is achieved by directly encoding commonly occurring or expected combinations of different types of knowledge so that they can be directly matched to parts of a natural language text. As such, these models were only capable of dealing with expected kinds of texts from known domains. There are also models of sentence interpretation that are incompletely or only vaguely specified and do not count as computational models. For example, one can claim that a sentence processor has one spreading activation network for each type of knowledge and that the networks interact with one another by exchange of activation between them. A specification at this level does not tell us how or whether this model can produce human-like behaviors and unique interpretations of sentences, and does not answer the questions above. In general, models of sentence processing have either accounted primarily for the application of one type of knowledge to resolve certain types of ambiguities or, from a cognitive point of view, have only explained certain kinds of behaviors observed in human sentence processing. For example, there are many models that deal only with the use of syntactic knowledge to resolve syntactic ambiguities, and there are models that account only for immediate interactions between syntax and semantics in human sentence processing.

[Footnote 3: See Chapter 5 for further discussion of the distinction between the integrated processing principle and previous attempts at implementing it by integrating the representations of different types of knowledge.]

1.4.1 The Modularity Debate

Psycholinguistic studies have shown strong evidence for two broad kinds of behaviors in human sentence processing that appear to be incompatible with each other. The sentence processor shows modular behaviors in some situations, where only some types of knowledge seem to have been applied to make decisions, delaying the application of other types of knowledge. In other behaviors, known as integrated or interactive behaviors, the sentence processor seems to have made decisions by immediately integrating different types of knowledge. These two views have produced a whole body of psychological experiments supporting each view and computational models explaining each set of behaviors.

At the center of research on language understanding is this ongoing modularity debate: whether language understanding faculties such as syntax and semantics have a modular architecture or whether they interact and are integrated to different degrees (Crain and Steedman, 1985; Fodor, 1987; Frazier, 1987; Marslen-Wilson and Tyler, 1987; Tanenhaus, Dell, and Carlson, 1987). Psycholinguistic studies have found evidence for modularity (Clifton and Ferreira, 1987; Fodor, 1983; Frazier, 1987) as well as for interaction between different faculties of language processing (e.g., Crain and Steedman, 1985; Tyler and Marslen-Wilson, 1977). Many computational models have been proposed that are good at explaining one or the other kind of behavior. However, few computational models of language understanding have even attempted to explain all the different behaviors satisfactorily and resolve the debate.

A satisfactory resolution of the modularity debate in language understanding would bring far-reaching benefits to the multiple disciplines that the issue has brought together, apart from its inherent value in revealing the nature of some parts of human cognition. The focus in psycholinguistics, for example, would change from looking for evidence for different architectures of the language understander to finding out more about the "right" architecture. Natural language understanding programs in AI, which have not seen much success either in modeling human behavior or in producing general and flexible language understanding capabilities, would get a head start if they were built on the "right" architecture. These programs could then form the core of a variety of language processing systems for applications such as information extraction and knowledge acquisition, information retrieval, or machine translation. Current natural language processing systems have inflexible architectures with little portability across tasks or domains beyond syntactic processing.
1.5 Our Hypothesis

We began this work with a hypothesis that arose partly from psycholinguistic evidence about error recovery behaviors, partly from recent studies showing the influence of certain cognitive factors affecting the modularity issue, and partly from our desire to build a computational model that would be the first true implementation of the integrated processing principle. The hypothesis was that we could answer the above questions satisfactorily if we designed an architecture of the sentence processor that kept knowledge sources separate but unified concurrent syntactic and semantic processes into a single process that controls all aspects of ambiguity resolution and error recovery (Holbrook, Eiselt, and Mahesh, 1992; Eiselt, Mahesh, and Holbrook, 1993).

Recent work in psycholinguistics has shown that the ways in which the human sentence processor recovers from its errors in lexical semantic ambiguity resolution (e.g., Eiselt, 1989; Holbrook, 1989) are very similar to those in structural syntactic ambiguities (Stowe, 1991). The amount of resources such as working memory available to the sentence processor has been introduced into the literature recently to explain why human sentence processing exhibits both modular and interactive behaviors (e.g., Carpenter and Just, 1988; King and Just, 1991; Stowe, 1991). The integrated processing principle (Birnbaum, 1986; see also Chapter 2), namely, that every type of knowledge available to the sentence processor is integrated as soon as possible, was never implemented in its true spirit. Integrated models had assumed that integration was guided by stereotypical situations in a domain and was to happen in the representation before processing began, thus preventing the use of parts of the existing knowledge when a new situation did not match one of the encoded integrations.

Our hypothesis showed a clear promise of building a model that treated all types of knowledge on an equal footing and integrated any of them that are available dynamically during sentence processing. A model based on our hypothesis could also explain the similarities in syntactic and semantic error recovery. By keeping knowledge sources separate and unifying the process, it had the potential for explaining how the human sentence processor produces seemingly modular behaviors in some situations and seemingly integrated behaviors in other situations. In addition, the unified process could provide a flexible architecture that could include new factors such as working memory limits to explain the change in strategies between modular and integrated processing in terms of changes in sentence processing situations.
1.6 Our Solution: An Overview

Based on the above hypothesis, we have developed a model of sentence understanding called COMPERE (Cognitive Model of Parsing and Error Recovery)[4] that answers the three questions raised earlier in Section 1.3. COMPERE is based on both psychological results and computational analyses of sentence processing. The three questions and COMPERE's answers to them are:

1. How are different types of knowledge integrated in sentence processing? We claim that it is possible to keep knowledge sources independent of one another and yet integrate information from all of them dynamically during processing (Claim 1 in Chapter 2). Different types of knowledge are represented separately from one another so that they can be applied independently of one another. They are integrated incrementally during sentence interpretation by a single unified process that controls the interactions between the different knowledge sources and arbitrates between them to resolve any conflicts.[5] Thus, in COMPERE, knowledge sources are independent but processing is unified through a single arbitrating process.

2. When should a sentence processor attempt to resolve an ambiguity? We claim that local ambiguity and the requirements of working memory can be minimized in sentence interpretation by synchronizing the performance of syntactic compositions and semantic compositions (Claim 2 in Chapter 2). COMPERE synchronizes syntactic and semantic compositions to minimize local ambiguity. If semantic information is not yet available, syntactic compositions are delayed until such time as semantic information becomes available. At that point, both syntactic and semantic compositions are attempted together and carried out by the unified arbitrating process.

3. How can a sentence processor recover from different types of errors it may make? A uniform representation of syntactic and semantic interpretations using what we call intermediate roles is used to keep a declarative record of syntax-semantics interactions during sentence understanding. We claim that a sentence processor, using such a record, can recover from its errors in both structural and lexical ambiguity resolution without reprocessing the input sentence completely or exhaustively searching the space of possible interpretations (Claim 3 in Chapter 2). The arbitrating process coordinates recovery from syntactic as well as semantic errors using intermediate role representations to establish correspondences and maintain consistency between syntactic and semantic interpretations at all times.

We also claim that sentence interpretation requires controlled, incremental interaction between syntax and semantics and that a unified arbitrator is a sufficient mechanism for modeling such interactions (Claim 4 in Chapter 2).

COMPERE takes a written sentence in English as input and produces a syntactic structure and a semantic representation of the sentence as output. The semantic representation consists of the appropriate meanings of the words connected to each other through thematic and other semantic roles. COMPERE produces this output as follows. It reads the words in the sentence in left to right order, processing each word and producing an incremental interpretation after each word. For each word, it accesses its lexical entry or entries and decides whether it should try to compose the word at this time with the current syntactic and semantic interpretations of the part of the input preceding the word. This decision is made according to COMPERE's parsing algorithm. Syntactic compositions are suggested by a parsing algorithm that is a hybrid of left-corner and head-driven parsing methods. This new parsing algorithm is called Head-Signaled Left-Corner Parsing (Mahesh, 1994a; 1994b). Semantic compositions are suggested by a process that tries to compose the primitive roles of two syntactic units to form what we call intermediate roles. These intermediate roles ultimately become thematic and other semantic roles that constitute the semantic interpretation of the sentence. Assignment of intermediate roles takes a variety of constraints into account, including those arising from syntactic context, knowledge of semantic roles, and conceptual knowledge.

COMPERE selects among the proposed syntactic and semantic alternatives by arbitrating between their preferences. If a clear selection is possible, it makes the corresponding syntactic and semantic compositions and retains the unselected alternatives for possible later use in recovering from errors. If there is a conflict between syntactic and semantic preferences, it delays the decision until further information helps the arbitrator resolve the conflict. When a composition is performed, COMPERE decides whether the resulting composite unit should be composed with other units at this time. This decision is again made according to its parsing algorithm. Essentially, if a composition results in a completed syntactic or semantic unit, the completed unit is composed with other units.[6] If no syntactic or semantic composition is possible between the new unit and the previous parts of the sentence, an error has occurred. COMPERE tries to recover from the error by selecting a retained previous alternative and repairing the current interpretations. It switches to the new interpretation by repairing the syntactic and semantic interpretations in consistent ways without having to completely reprocess the input.

Figure 1.1 shows the architecture of the COMPERE sentence processor that carries out the above process. It is particularly interesting to note that syntactic and semantic knowledge are shown represented separately. Knowledge from these two sources is processed by similar syntactic and semantic processes and integrated dynamically by the unified arbitrator (see Chapter 5 for further details of the architecture). The unified arbitrator is the single process that controls the interactions between syntax and semantics and makes arbitrated decisions based on all knowledge available to COMPERE. The unified process also keeps a record of the intermediate decisions made in syntax-semantics interactions. This record, maintained in the form of trees of intermediate roles, enables the unified process to carry out consistent error recovery operations across syntax and semantics. The unified arbitrator, in effect, integrates information arising from its multiple knowledge sources and selects the interpretations that are best overall. It resolves syntactic and semantic ambiguities and recovers from errors made in resolving those ambiguities.

[Footnote 4: A compere is one who introduces and interlinks items of an entertainment. -- Chambers Twentieth Century Dictionary.]

[Footnote 5: Producing incremental interpretations is not a new idea among models of sentence comprehension; it has been a key feature of several previous models (for example, Birnbaum, 1986; Jurafsky, 1991). Hence we consider incremental processing to be a principle (see Chapter 2) on which our model is based rather than a claim we are making.]

[Footnote 6: A unit is considered complete as soon as it has acquired its head child. See Chapter 6 for more information on this matter.]
[Figure 1.1: Architecture of COMPERE. Components shown: Words, Lexical Access, Lexicon, Syntactic Processing, Syntactic Knowledge, Semantic Processing, Semantic and Conceptual Knowledge, and the Unified Arbitrator, producing Syntactic Parse Trees and Semantic Roles.]
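The description above can be summarized as a word-by-word control loop. The Python sketch below is our own schematic rendering of that loop for illustration only; COMPERE itself was written in Lisp, and names such as propose_attachments, propose_roles, and the arbitrator methods stand in for the HSLC parser, role assignment, and arbitration components specified in Chapters 6 through 8.

```python
def understand(sentence, lexicon, parser, semantics, arbitrator, state):
    """Schematic word-by-word control loop for incremental interpretation.
    Every argument is an assumed component; only the control flow is shown."""
    retained = []                              # unselected alternatives kept for repair
    for word in sentence:
        entries = lexicon.lookup(word)                       # lexical access
        syn = parser.propose_attachments(entries, state)     # syntactic (HSLC) proposals
        sem = semantics.propose_roles(entries, state)        # intermediate-role proposals

        if not syn and not sem:
            # No composition is possible: an error has occurred. Recover by
            # selecting a retained alternative and repairing the interpretation.
            state = arbitrator.repair(retained, state)
            syn = parser.propose_attachments(entries, state)
            sem = semantics.propose_roles(entries, state)

        choice = arbitrator.select(syn, sem)   # arbitrate between the preference sets
        if choice is None:
            continue                           # syntax and semantics conflict: delay
        retained.extend(a for a in syn + sem if a is not choice)
        state = choice.apply(state)            # make the syntactic and semantic composition
    return state
```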
1.6.1 Implementation and Evaluation
COMPERE has been implemented and tested with a variety of sentences. The algorithms that constitute COMPERE's sentence processing are described in Chapters 6 and 7, while the implementation of the COMPERE system is illustrated in Chapter 8. Being a model of incremental sentence interpretation, COMPERE not only produces the desired output at the end of a sentence, but it also produces appropriate incremental interpretations after each word in the sentence.

COMPERE has been evaluated in several ways. First, its "proof of concept" was demonstrated by testing COMPERE with a variety of sentence structures with various combinations of syntactic and semantic ambiguities. Chapters 8 and 9 include a number of these examples. These examples show that the model has a broad coverage of natural language syntax and semantics (see Chapter 9). COMPERE has also been integrated with a discourse processor (or story understander) to show that the output of sentence understanding produced by it is in fact useful for discourse level language understanding (see Chapter 8). Chapter 9 also includes a comparative analysis of COMPERE with other models in terms of the claims we are making in this thesis. Chapter 10 includes some of the significant limitations of COMPERE and suggestions for improving the model in several ways. In addition, a theoretical framework based on automata for analyzing the performance of COMPERE and other models of sentence understanding is developed in Chapter 9. Using this framework,
several sample analyses are presented to illustrate the computational advantages of COMPERE's architecture and algorithms. Furthermore, the psychological plausibility of COMPERE as a model of human sentence understanding is analyzed in Chapter 10, where two concrete psychological predictions made by COMPERE are presented along with a preliminary sketch of an experiment to verify the predictions. Some of COMPERE's limitations and their effects on the cognitive plausibility of COMPERE are also discussed in Chapter 10. Below, we consider a pair of example sentences and show COMPERE's behaviors in processing them.
1.7 COMPERE in Action: An Example

Consider, for example, the sentence

(1) The officers taught at the academy were very demanding.

As COMPERE reads the first two words in this sentence, it builds a syntactic structure of a noun phrase (NP) and assigns the meaning of the noun "officers" the role of Subject of the sentence. At the next word, "taught," a verb, the sentence is syntactically ambiguous, since there is no distinction between its past-tense form and its past-participle form. This ambiguity is detected from the lexical entry of the word, which shows two possible subcategories for the word. In the simple past reading of "taught," it would be the main verb, with the corresponding interpretation that "the officers taught somebody else at the academy." On the other hand, if "taught" is read as a verb in past-participle form, it would be the verb in a relative clause, a reduced relative since there is no relative pronoun (such as "who") marking the clause. The reduced relative would correspond to the meaning "the officers who were taught by somebody else at the academy..." The sentence processor does not have the necessary information at this point to decide for sure which of the two is the appropriate interpretation.

Of the two interpretations, though there is no semantic preference for or against either one (since "officers" might equally well be teachers or be taught by someone else), there is a syntactic preference for the simple past category, since COMPERE is expecting to see a verb phrase to complete the sentence. The reduced relative structure would simply modify the subject noun, leaving COMPERE still waiting to see a main verb phrase. The main-clause interpretation, on the other hand, meets the current expectation for a main verb and completes the sentence. Using this syntactic preference, COMPERE selects the main-clause interpretation and proceeds to read the following words. It may be noted that COMPERE is an incremental processor and hence strives to present the most appropriate syntactic and semantic interpretations incrementally after each word, rather than present all possible interpretations at intermediate points and defer all ambiguity resolution until the end of a clause or sentence. The resulting interpretation at the end of the phrase "at the academy" is shown in Figure 1.2. The figure shows the syntactic parse tree as well as the tree of semantic roles assigned to the sentence.

[Figure 1.2: Garden Path: Main-Clause Interpretation. Parse tree: S -> NP (the officers) + VP (taught + PP (at the academy)); semantic roles: Event: TEACH with Agent: Officer and Location: Academy, built from intermediate roles such as Subject: Officer, Active-Subject: Officer, and AT-NP: Academy.]

This example shows how COMPERE resolves ambiguities with whatever information is available to it at that point, rather than pursue all possible interpretations until the disambiguating point where all information necessary to resolve the ambiguity becomes available. Such a "best first" strategy often leads to errors. We continue this example now to show how COMPERE deals with its errors and recovers from them to arrive at correct interpretations.

COMPERE continues to process the sentence and encounters another verb, "were." It finds that there is no way to attach or compose this word with the preceding interpretation since it already has a main verb. An error has occurred, since the new word cannot be attached to the current interpretation and COMPERE has not yet come to the end of the input sentence. COMPERE's unified process examines previous alternatives that it had retained and finds an alternative that would have attached the previous verb phrase (VP) in a reduced relative clause attached to the NP "the officers." Since this alternative would leave the expectation for a main verb
[Figure 1.2: Garden Path: Main-Clause Interpretation. The figure shows the syntactic parse tree for "The officers taught at the academy," with "taught" as the main verb, and the corresponding semantic role tree (Event: TEACH; Agent: Officer; Location: Academy).]

Since this alternative would leave the expectation for a main verb open, to be filled by the current word "were," COMPERE selects this alternative and recovers from the error by repairing the syntactic and semantic structures shown in Figure 1.2. It detaches the VP from the sentence structure S and reattaches it to the NP via a reduced relative clause. While doing this, COMPERE's unified process makes sure that corresponding changes are made to the semantic interpretations using the intermediate roles that are part of the semantic role trees. The resulting interpretation of the sentence after the word "were" is shown in Figure 1.3. COMPERE continues to process the remaining words in the sentence to arrive at a complete interpretation.

This example shows how COMPERE produces incremental interpretations without waiting for complete knowledge before resolving an ambiguity. In this case, the decision led to an error as further structure of the sentence was revealed to the processor. COMPERE was able to recover from the error by repairing the erroneous interpretations. This behavior, in which an initial choice leads to a dead end and forces the sentence processor to go back, reprocess earlier decisions, and switch to a new interpretation or a new path, is called a garden-path behavior. The error occurred because no semantic information was available to the processor to help resolve the syntactic ambiguity when it was first encountered. This can be viewed as a modular behavior since the decision to select the main-clause interpretation was made based only on syntactic information, without any influence of semantic or other types of knowledge.

COMPERE is also capable of producing interactive behaviors. To see this, consider sentence (2), obtained from (1) above by changing the subject noun "officers" to "courses." The two sentences are identical except for the difference in the subject noun.
[Figure 1.3: Garden Path: Reduced Relative Clause. The repaired parse tree attaches "taught at the academy" as a reduced relative clause modifying the NP "the officers," leaving "were" as the main verb; the corresponding semantic role tree shows Event: TEACH, Experiencer: Officer, Event: BE, and Location: Academy.]
(2)
The courses taught at the academy were very demanding.

COMPERE has conceptual information to the effect that only animate entities can teach and that courses are not animate entities. When COMPERE encounters the structural ambiguity at the word "taught" this time, there are again two possible interpretations with a slight syntactic preference for the main-clause interpretation, the one selected in dealing with sentence (1) above. However, there is a strong semantic bias against this choice since it would make "courses" the agent of teaching, thereby violating the conceptual constraint. By arbitrating between these preferences, COMPERE's unified process eliminates the main-clause reading of "taught" right away. Only the reduced relative clause interpretation is pursued, so there is no error and no garden-path effect. The output from COMPERE for this sentence is shown in Figure 1.4.

This example shows how COMPERE can produce interactive behaviors by integrating syntactic, semantic, and conceptual information incrementally. Such integration was able to avoid the error and the garden path in this sentence, even though the sentence is identical to the previous one except for the difference in the subject noun. Further examples of COMPERE's behaviors when presented with sentences containing various combinations of syntactic and semantic ambiguities will be presented throughout the thesis, especially in Chapters 8, 9, and 10.
[Figure 1.4: Reduced Relative Clause: No Garden Path. Syntactic parse tree and semantic role tree for sentence (2), with "taught at the academy" attached directly as a reduced relative clause modifying "the courses" (Event: TEACH; Theme: Course; Event: BE; Location: Academy).]
1.8 Contributions

COMPERE was built on the hypothesis that keeping the knowledge sources separate and integrating them incrementally during sentence processing would explain how both modular and interactive behaviors could be produced by the same sentence processor. COMPERE has been tested with a variety of sentences and has shown that it can produce behaviors that have been characterized in the literature both as modular and as interactive. In addition to demonstrating the feasibility of the hypothesis in a computational model of sentence understanding, this work has made several other contributions:

- It has developed a new parsing algorithm called Head-Signaled Left-Corner parsing that minimizes local ambiguity by producing syntactic compositions at the right times in processing a sentence.

- It has developed a theoretical framework using automata for analyzing and comparing the performance of sentence interpreters in the presence of syntactic and semantic ambiguities.

- It has extended the notion of thematic and semantic roles into what we call intermediate roles to provide a uniform representation of syntactic and semantic interpretations. The use of these uniform representations has been demonstrated in arbitration between syntax and semantics and in error recovery operations.
- It has also shown how knowledge of different types can be combined during sentence interpretation using the unified process and the uniform representation of all intermediate roles.

Moreover, the model has a flexible architecture because of its unified process and can be extended to accommodate new factors, such as working memory and other resource limits, to account for new psychological data. As a sentence processor, COMPERE produces literal, linguistic interpretations of sentences, not specific to any domain or conceptual framework. As such, it can be integrated with a variety of other reasoning systems with their own task-specific and domain-specific ontologies. Such integration is especially feasible given the flexible architecture of COMPERE, in which an additional source of knowledge or preference can easily be accommodated in the arbitration process by adding constraints on the assignment of intermediate roles. Such constraints can also be modified dynamically by a reasoning system to model context-specific behaviors.
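Because Head-Signaled Left-Corner parsing is only named here (it is developed in Chapter 6), the following sketch shows plain backtracking left-corner recognition, the family of algorithms it builds on: each phrase is recognized by scanning its leftmost daughter bottom-up and then predicting the remaining daughters top-down. The toy grammar, lexicon, and function names are assumptions for illustration and do not include the head-signaling mechanism itself.

```python
# Plain left-corner recognition over a toy grammar; an illustrative assumption,
# not the Head-Signaled Left-Corner algorithm of Chapter 6.

GRAMMAR = [("S", ("NP", "VP")), ("NP", ("ART", "N")),
           ("VP", ("V", "NP")), ("VP", ("V", "PP")), ("PP", ("P", "NP"))]
LEXICON = {"the": "ART", "officers": "N", "academy": "N", "taught": "V", "at": "P"}

def recognize(goal, words):
    """Return the words left over after recognizing `goal`, or None on failure."""
    if not words or words[0] not in LEXICON:
        return None
    return project(LEXICON[words[0]], goal, words[1:])

def project(cat, goal, rest):
    """Project the left corner `cat` up toward `goal`, predicting sisters top-down."""
    if cat == goal:
        return rest
    for lhs, rhs in GRAMMAR:
        if rhs[0] == cat:                    # `cat` is the left corner of this rule
            remaining = rest
            for sister in rhs[1:]:           # predict and recognize the sisters
                remaining = recognize(sister, remaining)
                if remaining is None:
                    break
            else:
                result = project(lhs, goal, remaining)
                if result is not None:
                    return result
    return None

words = "the officers taught at the academy".split()
print(recognize("S", words) == [])           # True: recognized as a sentence
```

A left-corner strategy of this kind is attractive for incremental interpretation because each word immediately projects structure that can be handed to semantics; head signaling, as the contribution above states, further controls when such syntactic compositions are performed.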
1.9 Organization of the Thesis

The next chapter is a compendium of the principles on which this work is based, the questions raised in translating these principles into a computational model, the claims and predictions made by this thesis, and some of the assumptions made in this work. Chapter 3 reviews psycholinguistic literature in support of or against the claims made in Chapter 2. This review builds up toward a psycholinguistic theory of sentence processing that is implemented in COMPERE. The following chapter examines sentence processing from a computational point of view and builds a functional theory of sentence understanding. Chapter 5 presents an analysis of different sentence processing architectures and models in terms of the kinds of communication between syntactic and semantic processing that they afford. By comparing this with the kinds of interactions deemed necessary both by the psycholinguistic model and by the functional model, this chapter explains why COMPERE has a controlled parallel architecture with a unified arbitrating process. In addition, Chapter 5 also presents a survey of contemporary computational models of sentence understanding.

The next two chapters present COMPERE's theories of syntactic and semantic processing. Chapter 6 presents the new parsing algorithm called Head-Signaled Left-Corner parsing and compares it with a spectrum of other syntactic parsing algorithms. Chapter 7 describes the theory of linguistic semantics that we have employed in this work. The chapter introduces the concept of intermediate roles and describes semantic analysis as a process of role assignment. It also explains arbitration in the unified process using intermediate roles. Chapter 8 is a presentation of the COMPERE sentence understanding system. It describes in detail all the representations and algorithms that constitute COMPERE and illustrates its working with a number of examples. This chapter also includes a detailed description of the repair mechanisms used to implement syntactic and semantic error recovery.

Chapter 9 concerns the evaluation of this work. It begins by validating the COMPERE model by illustrating its working with a wide range of example sentences that show that COMPERE in fact meets its claims while not deviating from the principles on which it was built. It then presents comparative analyses of its architecture with other possible architectures. Finally, Chapter 9 presents a detailed formal analysis of COMPERE, and of incremental sentence processors in general, in terms of an enhanced push-down automaton. This analysis provides a theoretical framework within which some of the tradeoffs in the design of the COMPERE model are analyzed. Chapter 10 discusses a range of issues that came up in this work, including the psychological predictions made by COMPERE, the history of its development, some of the limitations of the model, and future directions to pursue. Finally, Chapter 11 concludes this thesis by stating the issues addressed, the contributions made to different language-related sciences, and some of the ideas resulting from this work. With the understanding that a reader might have more
interest in some parts of an interdisciplinary work of this kind than in others, the main chapters of this thesis (Chapters 3 through 9) have been written to be relatively self-contained. The author hopes that the reader will be able to read some of these chapters out of sequence.
CHAPTER II
PRINCIPLES, CLAIMS, AND ASSUMPTIONS

In this chapter, we state some well-known principles of natural language processing that the COMPERE model subscribes to. We then raise questions that come up in designing a model of sentence interpretation to meet the principles, and state the claims made by this thesis to answer those questions. We also state some of the assumptions we make in this work. The chapters in the rest of the thesis delve into the computational and psychological motivations for making these claims, the design and implementation of the COMPERE model to meet the principles and the claims, and the analysis and evaluation of the performance of the model.
2.1 Principles

The COMPERE model builds on previous work in both computational and psychological modeling of natural language processing. Some of the lessons learned from that previous work are summarized here in terms of several principles. This set of principles forms the foundation on top of which we make the claims for COMPERE.
Principle 1. Eager Selection: The language processor strives to select a unique interpretation whenever the information necessary to do so is available to it.
The language processor selects from the set of interpretations possible for a given input. It does not simply produce all possible interpretations and let an external agent select the best. Selection (or ambiguity resolution) is a part of the job of language understanding. Many connectionist models of language comprehension (e.g., McClelland and Kawamoto, 1986) as well as parsing algorithms such as Earley's (1970) algorithm (which is considered an efficient combination of bottom-up and top-down parsing) violate this principle. They simply present a set of possible interpretations and let an external agent pick a suitable interpretation. The principle of eager selection (sometimes also called Early Commitment) is close to the First Analysis principle of Frazier (1987), which states that the language processor always selects one interpretation in the first analysis based on syntactic criteria alone. However, the eager selection principle is less stringent; it does not require that the processor pursue and present exactly one interpretation at all times (as required by some models such as the NL-Soar model (Lewis, 1993a; Lewis, 1993b)). It allows multiple interpretations to be pursued and presented when the information to select from among them is not available (yet). It merely requires that the processor reduce the set of possible interpretations to a proper subset whenever the information necessary to do so is available to it. In this sense, this principle demands the quality of "eagerness" from the language processor.
Principle 2. Incremental Interpretation: The language processor produces incremental interpretations.
Psychological evidence for incremental interpretation is presented in Chapter 3. From a computational point of view, functional motivations for incremental interpretation are presented in Chapter 4. Together with the eager selection principle above, the principle of incrementality not only demands that the language processor select as unambiguous an interpretation as possible, it also requires the processor to do so incrementally (after each word, for instance) instead of only at the end of the input (the end of the sentence, for example). This principle has also been called the Principle of Incremental Comprehension by Crocker (1993), the On-Line Principle by Jurafsky (1991), and Immediacy of Interpretation by Just and Carpenter (1980, 1987). It was also a primary objective behind the development of language understanding programs such as IPP (Lebowitz, 1983).
Principle 3. Integrated Processing: The language processor applies a piece of knowledge of any type (syntactic, semantic, conceptual, lexical, and so on) as soon as it is available (Birnbaum, 1986; Schank, Lebowitz, and Birnbaum, 1980).
Incremental selection requires integrated processing. If a piece of knowledge were available but the language processor did not use it, the processor either could not reduce the set of possible interpretations with that knowledge at all, or could not do so as early as possible. Thus incremental selection precludes the processor from applying just one or a few types of knowledge initially and applying other types of information only later in processing the input. Apart from the functional motivation for integrated processing coming from the requirements of incremental selection (see Chapter 4), there is a large body of evidence for integrated processing in the psycholinguistic literature. A number of studies have shown that the human language understander shows immediate effects of syntactic preferences, of semantic feedback, of conceptual priming, of the context, of referential success or failure, and so on (see Chapter 3). For example, the same principle is stated differently as the Immediate Semantic Decision Hypothesis in some psycholinguistic studies (e.g., Stowe, 1991). It may also be noted that, given the goal of incremental selection, and the fact that all types of knowledge can help reduce ambiguity in language processing, the integrated processing principle is merely a restatement of the principle of rationality (Newell, 1981).
Principle 4. Functional Independence: The language processor is able to apply a piece of knowledge of any type at a point independently of whether or not other types of knowledge are available (or accessible) at that point.
Functional independence (e.g., Caramazza and Berndt, 1978; Eiselt, 1989) between different knowledge sources is entailed by the integrated processing principle. As long as there are situations in which some kinds of knowledge are useful for making rational decisions while other knowledge sources are unavailable or unusable for some reason, integrated processing requires that the useful ones be applicable, and hence independent of those that are not. Since such situations do occur in natural language processing, integrated processing entails the independent applicability of each knowledge source.
Principle 5. Determinism: The language processor does not make any commitment (such as a selection) unless it has a piece of knowledge that supports the commitment.
In other words, the language processor does not make random selections. This is the counterpart of integrated processing. The language processor not only applies every piece of knowledge that is available to it, it also does not make any bindings or selections when it has no information substantiating such decisions. For example, it does not select randomly from a set of possible interpretations, nor does it merely select the first of a list of possible interpretations. Thus, while the selection principle requires the processor to make commitments, the determinism principle requires that it do so only when the selection is justified by some piece of knowledge. Determinism might require delaying decisions when the necessary information is not yet available. For example, this principle precludes the use of a top-down parser which makes initial commitments without any information in support of those commitments and later backtracks until its blind commitments turn out to be the right ones. Such a top-down parser is used in models of language processing based on the Augmented Transition Network (ATN) model (e.g., the LUNAR system by Woods, 1973).

The five principles above dictate that a model of language processing must make commitments to particular interpretations, make those commitments incrementally in processing the input, make commitments only when there is some knowledge that justifies them, make the commitments as soon as a supporting piece of knowledge is available, and be able to use any type of knowledge independently of other types. Given these five principles guiding the design of a language processing model, there is no need to introduce a less precise "Strong Competence" principle (e.g., Bresnan and Kaplan, 1982), which states that the steps taken by the language processor mirror human behavior in language comprehension. The five principles together ensure that a model based on them is not just a linguistic or analytical model of language comprehension, but a cognitive model of the process of language comprehension.
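One way to see how these principles jointly constrain a single disambiguation step is the small sketch below (illustrative only; the scoring interface and all names are assumptions, not COMPERE's design). Every available knowledge source is consulted (integrated processing), a source with nothing to say is simply skipped (functional independence), the candidate set is reduced whenever some source justifies the reduction (eager, incremental selection), and no candidate is ever discarded without such justification (determinism).

```python
# Illustrative sketch of a disambiguation step constrained by the five
# principles above; the scoring interface is an assumption, not COMPERE's.

def disambiguate(candidates, knowledge_sources):
    """Reduce `candidates` only as far as the available knowledge justifies."""
    surviving = list(candidates)
    for source in knowledge_sources:                 # integrated processing
        scores = {c: source(c) for c in surviving}
        informative = {c: s for c, s in scores.items() if s is not None}
        if not informative:                          # source is silent here:
            continue                                 # functional independence
        best = max(informative.values())
        pruned = [c for c in surviving if informative.get(c, best) >= best]
        if 0 < len(pruned) < len(surviving):         # eager selection, but only
            surviving = pruned                       # when justified (determinism)
    return surviving

# Toy usage: two readings of "taught"; syntax has a preference, semantics is silent.
syntax = {"main-clause": 1, "reduced-relative": 0}.get
semantics = lambda reading: None
print(disambiguate(["main-clause", "reduced-relative"], [syntax, semantics]))
# ['main-clause']
```

A processor that also follows Claim 3 below would retain the pruned alternatives for possible later repair rather than discarding them outright.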
2.2 Questions

Given the five principles above, the questions that remain in the design of a model of sentence comprehension are:

Question 1: How do we design the architecture of the model so that both the integrated processing and the functional independence principles may be met?

Question 2: How do we design the temporal scheduling of commitments, decisions, and selections of interpretations so as to strike a balance between determinism and eager, incremental selection? How do we schedule the application of different types of knowledge to maintain a balance between determinism and eager, incremental commitment?

Question 3: What happens when determinism fails and there is an error from which the processor must recover? What should the processor do when an earlier decision which was justified at that time leads to an error given new information?

Question 4: What happens when there are conflicts in integration? How does the processor resolve conflicts between different types of knowledge in selecting a unique interpretation?
2.3 Claims

In this thesis, we make the following claims to answer these questions.
Claim 1: It is possible to keep knowledge sources independent of one another and yet integrate all evidence dynamically during processing through a unified arbitrator.

Claim 2: Local ambiguity and the requirements of working memory can be minimized in sentence interpretation by synchronizing the performance of syntactic compositions and semantic compositions.

Claim 3: A sentence processor can recover from its errors in both structural and lexical ambiguity resolution without completely reprocessing the input sentence or exhaustively searching the space of possible interpretations.

Claim 4: Sentence interpretation requires controlled, incremental interaction between syntax and semantics.
2.4 Predictions

In addition to making the above claims, the COMPERE model makes predictions regarding the way the human language processor works. A discussion of these predictions as well as a sketch of an experiment to test them can be found in Chapter 10 of this thesis.
Prediction 1. Interactive Ambiguity Resolution: Resolving a structural ambiguity has an immediate effect on resolving an associated lexical ambiguity, and vice versa.

Prediction 2. Interactive Error Recovery: Recovering from an error in resolving a structural ambiguity has an immediate effect on previous decisions made in an associated lexical ambiguity, and vice versa.
2.5 Assumptions

As in any exploratory work, the COMPERE model of sentence understanding has been designed by assuming several things. Some of our assumptions are made so commonly in language processing research, or are so intuitively obvious, that they rarely get mentioned explicitly. Below is an extensive set of assumptions we have made:

1. Language processing can be modeled independently of other information processing tasks. In particular, human language understanding can be studied independently of other cognitive faculties, even including language generation. This is a convenient assumption made almost always in language processing research.[1]

2. We can build a model of native, adult language understanding without also explaining developmental language acquisition. However, it may be noted that our model and the claims we make about sentence understanding do not contradict what is known about language acquisition in any significant way.

3. Sentence understanding is a valid subtask of language processing (see Chapter 4). We will, however, attempt to design the model so that it can be extended to include the effects of the context of a sentence and to understand more than single sentences in isolation (as described in Chapter 10).

[1] For an example of an investigation of the relationships between language processing and other tasks such as design, problem solving, and knowledge acquisition, see the work on the KA project (Peterson, Mahesh, and Goel, 1994; Mahesh, Peterson, Goel, and Eiselt, 1994).
4. We can ignore phonology, morphology, other speech processing issues, stress and accent, diagram understanding, and so on, and still make important contributions to the field of natural language understanding.

5. We can study just one natural language, and in particular, English. Though English is very different from many languages, for instance in its relentless reliance on word order information (e.g., Bates, Wulfeck, and MacWhinney, 1991), we believe that the overall process and architecture of language processing is the same across languages. Differences, if any, would be in the relative importance of sources of information and in the strategies used for extracting the information from linguistic markings in the text. Though we are not doing cross-linguistic studies, we believe that the flexible and general architecture of our model would accommodate the structures and processes of other languages quite well.

6. Sentence understanding happens the same way no matter what overall goal is being pursued. This, perhaps, is one of the most significant assumptions we are making, though it is not an uncommon one. What it says is that the methods that are applied and the way the task of sentence processing is carried out are invariant over the kinds of higher-level language processing goals (Ram, 1991), such as whether the text is being processed to acquire knowledge, or for pleasure, or to answer questions, and so on. Certain differences in sentence processing do exist between language processing situations with different goals. For instance, one might skim through the text quickly or read in depth to various degrees. However, we are working with the assumption that the architecture of the sentence processor can be determined by considering primarily the routine behavior in sentence processing that is common to all those situations. This in fact is the reason why we are dealing with architectural theories as opposed to only functional theories. A functional theory would be useful to develop as an abstract theory if there were different ways of decomposing the task and different methods of solving them. In the case of sentence processing, with the above assumption, human behavior suggests that only one kind of decomposition of the task with only one kind of architecture (the one with controlled parallel interaction between independent modules; see Chapter 5) is feasible. A theory of sentence understanding which is merely functional and which does not make architectural commitments is thus not as interesting or useful. An important piece of human evidence that supports this assumption comes from the observation that processing at any level cannot be turned off without inflicting neurological damage to the human brain. Syntax, for instance, gets processed automatically all the time, whether or not the current task (such as answering a particular question) requires it. There is no choice as to the kinds of processing the sentence processor can do at any time; it must always do the kinds of processing shown in our model (though the results of some of them may be discounted or overruled by others at times).

7. People normally read English from left to right and do not reprocess surface-level representations repeatedly by eye movement unless they are stuck in a "garden path." This has been supported by experimental studies such as reading time and eye-movement studies (e.g., Carpenter and Daneman, 1981; Carpenter and Just, 1988; Frazier and Rayner, 1982; Rayner, 1978).
We also assume that reprocessing is not done by indiscriminately and repeatedly traversing some kind of internal buffer which stores parts of the sentence being read.

8. We do not need structural transformations in syntax (at least not for our present purposes). We are assuming, as others have done before (e.g., Jurafsky, 1991), that constructs such as questions, for which transformations have been proposed, can be handled in semantics (and in
the interaction between syntax and semantics) without having to resort to transformations in syntax.

Other assumptions made with reference to the formal computational analysis of the COMPERE model are listed in Chapter 9. Limitations of the model and directions for future improvements are discussed in Chapter 10. We now begin the main body of the thesis by examining first the psycholinguistic and then the computational relevance of the claims stated in this chapter.
CHAPTER III
PSYCHOLINGUISTIC THEORY OF SENTENCE UNDERSTANDING

3.1 Overview of Psycholinguistic Studies

Natural languages are replete with ambiguities. Understanding a natural language sentence is all about coming up with possible interpretations in each of the different faculties of language processing, and choosing one or more that are most preferred in the current context, by integrating information from various knowledge sources. The temporal aspects of when each type of information is used and when certain commitments are made constitute the bulk of the unresolved issues in modeling human sentence interpretation. A central example of a temporal issue in sentence processing is whether both syntactic and semantic knowledge are applied immediately or whether semantic knowledge is applied only after syntactic decisions have already been made. A majority of psycholinguistic studies of human sentence processing address these temporal issues of knowledge application, attempting to support one or the other side of the modularity debate. The modularity debate (see Chapter 1), essentially, is the debate over whether certain decisions, such as those in syntactic analysis, are shielded from the effects of semantic and other contextual types of information or whether they are subject to immediate effects of such types of information.

On one side of the modularity debate, there is a whole body of work that has shown that a variety of knowledge sources such as semantics, reference, and context in fact have an immediate effect on human parsing decisions. These studies, which we shall collectively refer to as the "immediate interaction" studies, explain their results by concluding that the human sentence processor exploits any type of information as soon as it can in resolving ambiguities. On the other side of the modularity debate, there is an equally large body of work, which we shall collectively refer to as the "first analysis" studies, that has shown that the sentence processor computes an initial interpretation based only on certain sources of information (mainly morphology and syntax), leaving other types of knowledge such as semantic and contextual information for application later on in processing the sentence. While no one would argue that the human sentence processor computes a purely syntactic interpretation until the end of a lengthy sentence and applies semantic knowledge only at the end, "first analysis" studies show that many phenomena in human sentence processing can be explained by assuming that the processor carries out a first analysis based only on syntactic information until certain intermediate points such as clause boundaries.

In addition to these two main areas of work in psycholinguistic studies of human sentence comprehension, we look at three other important groups of studies, more recent ones, that view the temporal issues from new angles and provide new answers to the modularity debate. The first of these is the set of relatively few studies of error recovery in the human sentence processor. Human language understanders at times make errors in their decisions at ambiguous points since information sufficient to make the right decision may not be available yet at those points. However, information coming later on in the sentence might alter the preferences for possible interpretations and result in the detection of errors in previous decisions. People are capable not only of resolving
ambiguities but also of recovering from such errors in many cases. These ambiguities and errors occur at all levels of language processing, including the lexical, syntactic, and semantic levels. By investigating behaviors in recovering from errors at different levels, these studies show how the human sentence processor employs similar mechanisms in different error situations. A second, related set of recent studies focuses on the causes of many of the errors and explains the errors in terms of limits on the working memory capacities of individual human processors. By introducing resource limits as a cognitive factor affecting the temporal issues in language processing, these studies provide other possible answers to the modularity debate. The last body of work includes studies of language behavior under aphasia, monolingual (both in English and in other languages) as well as multilingual. In this work, we touch upon this rather controversial and as yet unresolved set of results from aphasic studies merely to point out that there are no glaring contradictions between the results of these studies and the claims we are making in our model.

While immediate interaction and first analysis studies are on opposite sides of the modularity debate, providing answers that contradict each other, the three sets of studies above (error recovery, resource limits, and aphasia) together support a new way of looking at the modularity debate. Instead of hypothesizing either that syntax and semantics are separate modules or that they are completely integrated together, these results have led us to believe that certain aspects of both syntax and semantics are integrated while certain others are independent of one another. Error recovery studies support the view that syntactic and semantic processes are unified and follow a single control structure. This view is also in agreement with the findings of cross-linguistic aphasia studies. Resource limit studies support the view that interactions between different knowledge sources are dependent on and constrained by the availability of working memory resources. In turn, the results of resource limit studies point out the need for controlling the interactions between syntax and semantics in accordance with the availability of resources. Together these studies support our view of the modularity debate and our answers to the temporal issues in human sentence processing embodied in the COMPERE model, namely, that syntactic and semantic knowledge sources are independent of one another but their influences and interactions in sentence comprehension are controlled by a single, unified arbitrating process (see the claims listed in Chapter 2). This unified arbitrator is capable of explaining how resource constraints, if available in a model, can be applied to produce apparently modular and interactive behaviors, and how consistent recoveries from different types of errors are possible.

In this chapter, we present brief reviews of the five areas of psycholinguistic studies mentioned above: immediate interaction, first analysis, error recovery, resource limit, and aphasic studies. We then show that these bodies of work provide evidence for the COMPERE model and describe how they support the claims we are making in this thesis (see Chapter 2). Following the intricacies of the psycholinguistic experiments and results presented in this review might require familiarity with several experimental techniques as well as the types of sentences commonly used in these studies.
Interested readers may refer to an introductory text on psycholinguistics such as the one by Carroll (1986; see also Gernsbacher, 1994). However, it should be possible to follow the results of the experiments and their implications for a model of sentence comprehension without getting into the intricate details of the experiments and the materials they use.
3.2 Immediate Interaction Studies

Immediate interaction of semantic and contextual information in structural ambiguity resolution has been demonstrated in the well-known experimental studies of Tyler and Marslen-Wilson (1977), Crain and Steedman (1985), Holmes, Stowe, and Cupples (1989), and Taraban and McClelland (1988).

Tyler and Marslen-Wilson (1977) used Adjective-Verb ambiguous word pairs such as "landing planes" in (1)[1]

(1a) If you walk too near the runway, landing planes...
(1b) If you've been trained as a pilot, landing planes...

with a naming task to test whether syntax behaves autonomously until the boundary of a clause. Their experiments showed that semantic information affects the choice of syntactic structure even before the end of a clause. When a probe inappropriate with respect to the context of the incomplete clause was used (for example, "is" for sentence (1a)), the naming latency was found to be longer than when an appropriate probe was used ("are" for (1a)). This shows that subjects had already used prior semantic context to choose one syntactic structure over another before the clause was complete.

[1] Both sentences (1a) and (1b) can have either of the two continuations: "are likely to hurt you" and "is a routine task." For each sentence, one of the two continuations is semantically more coherent than the other.

Crain and Steedman (1985) used sentences with complement and relative clauses to show that semantic and referential context information can steer the sentence processor towards one or the other kind of syntactic structure. For instance, they used sentence pairs such as those in (2).

(2a) The teachers taught by the Berlitz method passed the test.
(2b) The children taught by the Berlitz method passed the test.

Sentences such as (2b) were judged grammatical by the subjects significantly more often than those of type (2a). The semantic distinction between the two sentences, namely that a teacher is more likely to teach whereas a child is more likely to be taught, explains the difference in grammaticality judgement. Their experiments showed that such semantic and contextual interaction happens well before any sentential or clause boundary is encountered. They argued that garden pathing is a contextual phenomenon and can both be prevented and induced by the context in which a sentence is processed. For example, sentences of type (2a) were judged ungrammatical because the semantic bias in them steered the subjects away from the relative clause structure, leading to a garden path when the main verb "passed" was encountered. Such a garden path was avoided in sentences of type (2b), where the semantic bias steered the subjects towards the relative clause structure, which also happened to be the correct structure for the sentences.

Cupples and Stowe (Stowe, 1991) used reduced-relative sentences with animate and inanimate subjects as in (3) and (4)

(3) The officers taught at the academy were very demanding.
(4) The courses taught at the academy were very demanding.

and measured word-by-word reading times in a self-paced reading task to establish that semantic information such as the animacy of the subject influences the immediate assignment of syntactic structure. In sentence (3), the animate subject is equally plausible as the agent or experiencer of teaching, while in (4), the inanimate subject is plausible only as the theme. When the reading times for these sentences were compared with those for unambiguous control sentences (i.e., those with the marker "that were" for the relative clause), the reading times at the main verb phrase ("were very demanding") showed a large and significant increase over those at other points (Stowe, 1991). The reading times for sentences like (4) did not show any such difference. This showed that the semantic bias in (4) must have immediately disambiguated the syntactic structure of the sentence, thereby avoiding any reading difficulties at the main verb phrase.

Taraban and McClelland (1988) investigated the effects of sentential context preceding a prepositional phrase on resolving attachment ambiguities and showed garden-pathing effects in both
directions based on differences in context. They used both minimal and nonminimal structures as the preferred ones in the contexts they created. An example of the kind of sentences they used is shown in (5).

(5a) The reporter exposed corruption in government.
(5b) The reporter exposed corruption in the article.

They did not find any evidence for autonomous syntactic decisions unaffected by semantic context. Instead, they found a significant interaction with sentential context. For instance, when the context predicted high attachment, VP attachment was easier (as in (5b), where "in the article" modifies "exposed"), but when the context predicted nominal attachment, attachment to the NP was easier instead (as in (5a), where "in government" modifies "corruption"). They were able to show garden pathing in both directions based on differences in context, thereby providing clear evidence against the autonomy of syntax.

Carpenter and Just (1988) found evidence for immediate semantic processing of a word by measuring word-by-word reading times. They found that the gaze duration on a semantically anomalous word was longer than that on a semantically appropriate word which is otherwise similar to the inappropriate word (in terms of its length, frequency, and syntactic fit to the rest of the sentence). This indicates that semantic processing must be going on even as the eye is fixated on the word. Altmann, Garnham, and Dennis (1992) have also concluded that context does influence the initial decisions of the syntactic processor. Using ambiguous relative/complement sentences, they measured eye movements to show that pragmatic context can help the sentence processor avoid syntactic garden paths. This study is especially noteworthy since it is one of the first to employ eye movement measurement, a technique that has been used extensively to demonstrate the autonomy of syntactic processing (see below).

There has been a whole series of newer studies confirming that every type of knowledge one can think of has an immediate influence on resolving syntactic ambiguities. For example, Spivey-Knowlton and Tanenhaus (1994; see also Spivey-Knowlton, 1992) used eye tracking techniques to show that both discourse and semantic context have an immediate effect on syntactic processing. Ferstl (1994) has also shown effects of context on the location of prepositional phrase attachment, replicating the results of Taraban and McClelland (1988). Similarly, Britt, Gabrys, and Perfetti (1993) have identified conditions under which discourse contexts have an effect on resolving prepositional phrase attachments. Pearlmutter, Daugherty, MacDonald, and Seidenberg (1994) found an explanation for syntactic processing behaviors using contextual biases and frequency effects. Pearlmutter and MacDonald (1992) have also shown that plausibility has an effect on syntactic processing. Burgess and Lund (1994) have argued that syntactic processing is a confluence of multiple constraints. This account was also supported by experimental results showing parafoveal[2] and semantic effects on syntactic ambiguity resolution (Burgess, Tanenhaus, and Hoffman, 1994). Recent studies have pointed towards other factors that determine when a syntactic decision is made. Apart from the influence of semantic, discourse, and referential contexts, the frequency of use of different meanings can play a role in resolving syntactic ambiguities (Pearlmutter, Daugherty, MacDonald, and Seidenberg, 1994). Stevenson (1994) has argued for effects of recency on similar grounds.

[2] The term parafoveal refers to a stimulus that is not on the fovea (the "center" of the retina), but near enough to it to be visible and resolvable.

A study of the location of prepositional phrase attachment by Ferstl (1994) showed that the location depends on the context. She found evidence for delayed attachment when the noun phrase within the PP had a compound noun. Attachment effects were observed only after the noun filler. However, when an adjective was introduced in the noun phrase, early attachment was
noticed. Similarly, when prior context mentioned one of the modifiers in the compound noun phrase, early attachment was observed. Attachment was delayed when the context did not refer to any of the modifiers in the noun phrase. This study clearly demonstrated that the strategy employed by the human sentence processor for delaying an attachment is heavily context dependent (see also the discussion on "Eager HSLC" in Chapter 10).
3.2.1 Additional Support for Immediate Integration
Apart from the experimental evidence presented above, one can provide intuitive arguments to support both views on the modularity issue. We examine such additional evidence for immediate integration of syntactic and semantic information in this subsection.

Since the purpose of language is its communicative function and not syntactic grammaticality judgment, intuitively, the human language processor should use the meaning of the text as early as possible in making any decision during the course of language understanding (Crain and Steedman, 1985; Tyler and Marslen-Wilson, 1977). If so, the human sentence understander must forever be trying to integrate information arising from knowledge sources at different levels in order to come up with a single interpretation that makes the best sense overall. This goal of integrating the preferences of every level results in immediate interaction between the different levels, such as between syntax and semantics.

In processing some kinds of sentences, people exhibit behaviors that show they exploit immediate feedback from semantics and pragmatics in resolving structural ambiguities in syntax. People garden path on reduced-relative ambiguous structures such as the one in sentence (3).
(3)
The officers taught at the academy were very demanding.

However, sentence (4), which has the same surface structure as (3), does not result in garden-path behavior (Stowe, 1991).
(4)
The courses taught at the academy were very demanding.

Since there is no difference in the apparent structure of the two sentences, the human sentence processor must be using some information about the meaning of the words "officers" and "courses" in relation to the meaning of "taught" and other words in the sentence to choose the main-clause interpretation in (3) and the relative-clause structure in (4) while reading the first verb "taught" in the two sentences. For instance, such higher-level information might be the knowledge that only animate objects can be actors of teaching.

Moreover, natural languages appear to allow a high degree of local syntactic ambiguity (Crain and Steedman, 1985). If the sentence processor supports interaction between syntax and higher levels, and if it produces fully interpreted semantic entities corresponding to incomplete fragments of the sentence, the context in which these entities are evaluated can be a powerful source of redundancy. Such redundancy between information arising from knowledge sources at different levels enables the sentence processor to handle a degree of local syntactic ambiguity which in purely syntactic processing terms would be intolerable. In other words, combining preferences from different levels as early as possible might be imposing more constraints and shrinking the set of feasible interpretations rather than creating an information overload on the real-time processor.

According to the integrated processing principle (Chapter 2), every kind of knowledge available to the sentence processor must be applied at the earliest opportunity in making decisions while processing a sentence (Birnbaum, 1986; Birnbaum, 1991; Schank, Lebowitz, and Birnbaum, 1980; see also Carpenter and Just, 1988). Integrated processing has been interpreted to imply that different kinds of knowledge such as syntactic, semantic, and conceptual knowledge must be integrated a priori, thereby making the use of one dependent on others (Birnbaum and Selfridge,
1981; Lebowitz, 1983; Lehnert, Dyer, Johnson, Yang, and Harley, 1983; Ram, 1989; Riesbeck and Martin, 1986a; Riesbeck and Martin, 1986b). In this work, we claim that knowledge sources can be represented independently of each other to retain functional independence between them (see also Lytinen, 1987). However, information arising from the different sources is integrated during the processing of a sentence so that decisions are still made by applying all the kinds of knowledge available at every decision point. Information about the sentence coming from each source should be considered as early as possible, but the sources themselves should not be integrated a priori into a monolithic representation.
3.3 First Analysis Studies

A number of models have argued that the human sentence processor selects a single syntactic interpretation in the first analysis. These first analysis (or rather single analysis) studies have shown that syntactic decisions are autonomous, not subject to immediate effects of semantic or pragmatic biases. Forster (1979) used a matching task with pairs of sentences of various degrees of plausibility and grammaticality to argue that there is a distinct syntactic level at which decisions are not affected by higher semantic levels. Many experiments have used eye-movement studies to demonstrate that initial parsing decisions are made using only syntactic preferences, such as minimal attachment, irrespective of the semantic and discourse context (Clifton and Ferreira, 1987; Frazier and Rayner, 1982; Rayner, Garrod, and Perfetti, 1992). The minimal attachment principle states that the parser selects that syntactic structure which is simpler or minimal relative to the other. This, for instance, can explain the preference for a main-clause analysis (shown in Figure 3.1) over a reduced-relative structure (shown in Figure 3.2) in sentence (6) below.
(6)
The editor played the tape agreed the story was big.

Another example of minimal attachment can be seen in prepositional attachment ambiguities such as in sentence (7) below, where people initially prefer the simpler VP-attachment of "on the cart" (shown in Figure 3.3) over the complex NP shown in Figure 3.4. By constructing sets of sentences, some of which are minimal but others are not, and embedding them in contexts that bias against the minimal attachment, eye movements can be recorded to show that minimal structures were selected initially even when the sentence required a non-minimal attachment and the context favored the non-minimal attachment. Minimal attachment explains the initial preference of syntax for a structure that may not be compatible with the semantic and contextual bias that may be present. Minimal attachment is used to explain the garden-path behaviors that are shown to be present in spite of the semantic biases against them in a variety of experiments. For example, Frazier and Rayner (1982) used sentences like those in (7) and showed that eye movements are disrupted in the disambiguating regions (italicized in (7)) of the non-minimal attachment sentences, indicating that a minimal attachment must have been selected which must be revised in the disambiguating region to force the non-minimal structure.

(7a) Sam loaded the boxes on the cart before lunch.
(7b) Sam loaded the boxes on the cart onto the van.

In another experiment, Rayner, Carlson, and Frazier (1983) presented sentences like those in (8).

(8a) The kids played all the albums on the stereo before they went to bed.
(8b) The kids played all the albums on the shelf before they went to bed.
[Figure 3.1: Main-Clause Interpretation. Parse tree for "The editor played the tape" with "played" as the main verb.]

[Figure 3.2: Reduced Relative Clause. Parse tree in which "played the tape" is a reduced relative clause modifying "the editor" and "agreed" is the main verb.]

[Figure 3.3: Minimal Prepositional Attachment. Parse tree for "Sam loaded the boxes on the cart" with the PP "on the cart" attached to the VP.]

[Figure 3.4: Non-Minimal Prepositional Attachment. Parse tree with the PP "on the cart" attached inside the complex NP "the boxes on the cart."]
Pragmatic (or world) knowledge indicates that the (italicized) PP "on the stereo/shelf" is attached minimally to the VP in (8a) but non-minimally to the NP in (8b) (because music albums can be played on stereos but not on shelves). This would suggest that the application of such pragmatic knowledge should eliminate any difficulty with the non-minimal structure in (8b). However, the experiment showed that the difficulty did not go away in spite of the pragmatic knowledge. This clearly indicates that initial syntactic selections are made without the help of pragmatic knowledge of the type described above. Detailed analyses of the above experiments can also be found in the review by Clifton and Ferreira (1987).

Ferreira and Clifton (1985) showed that referential context also does not help in avoiding reading difficulties with syntactic garden paths. For example, in sentences such as (6) and (7), they showed that readers exhibit longer reading times in the disambiguating regions of reduced relatives such as (6) and complex NP sentences such as (7b). Even in contexts where there were two editors (one of whom had been played a tape) or two sets of boxes (and it was asserted that one set had been placed on a cart), longer reading times were observed in sentences such as (6) and (7b).

(6) The editor played the tape agreed the story was big.
(9) The editor played the tape and agreed the story was big.
(7a) Sam loaded the boxes on the cart before lunch.
(7b) Sam loaded the boxes on the cart onto the van.

In a more recent study, Rayner, Garrod, and Perfetti (1992) showed that discourse effects are delayed in resolving syntactic ambiguities. They used sentences with a syntactically ambiguous prepositional phrase attachment or a syntactically ambiguous reduced relative clause. They embedded the sentences in discourse contexts that biased the subjects towards or against the minimal attachment and measured eye movements. Their results showed that subjects garden pathed even when there was a biasing context. They concluded that discourse affects only the later stages of processing, the initial syntactic decisions being made based only on structural principles.
3.3.1 Additional Support for Syntactic Autonomy
Apart from the experimental evidence presented above, one can provide intuitive arguments to support an autonomous syntactic module. We review some such additional evidence in this subsection.

The term functional independence is used to describe that feature of the architecture which supports interaction between the different levels without sacrificing the independence between them. It allows each faculty to function if other faculties fail to provide useful information (Caramazza and Berndt, 1978; Eiselt, 1989). If, for some reason, a knowledge source fails to provide any information for making a decision, information arising from other knowledge sources can still be applied to make the best decision based on available information. Functional independence need not necessarily mean neurological independence, according to which there are separate parts of the brain corresponding to the different functions of language processing. There are several reasons for hypothesizing a functionally independent cognitive architecture for the sentence processor.

Intuitively, human sentence-understanding behavior shows the independence of syntax and semantics. To illustrate this point, one can argue that, for instance, people can judge grammaticality independent of meaning, as in sentences like (10) which make little sense if any (Chomsky, 1957).

(10) Colorless green ideas sleep furiously.

On the other hand, people can put up with imperfect syntax and get the meaning out of ungrammatical strings of words. For instance, even in the (nearly) total absence of syntax as in (11), we still get some meaning out of the text.
(11a) skid crash hospital (Winograd, 1973)
(11b) fire match arson hotel (Charniak, 1983)

Further support for syntactic autonomy comes from the complexity of interaction and that of integrating preferences from several higher levels in making on-line decisions. The speed and automaticity of decisions at the lower levels of sentence processing seem to suggest a fast, autonomous, automatic process for tasks such as syntactic analysis (Fodor, 1983; Fodor, 1987). Fodor (1983) called these "input systems," which share a set of characteristics different from "central systems" (such as that input systems are domain specific, mandatory in operation, fast, and informationally encapsulated). Syntactic processing matches this view of an input system better than other language processes such as semantics and pragmatics. Syntax must thus be an autonomous input system whose decisions are shielded from the influence of semantic and pragmatic biases.

Lack of independence of syntax also leads to difficulties in accounting for syntactic generalizations across different semantic entities. An example of a syntactic generalization exhibited by the language processor in a variety of constructs is the minimal attachment heuristic (Frazier, 1987; Kimball, 1973), which subsumes a class of preferences for a syntactically minimal interpretation. A preference for a main-clause analysis rather than a reduced-relative analysis in (1), a preference for a simple-direct-object (rather than a sentential-complement) analysis of the ambiguous phrase in (12), and a preference for NP (rather than sentential) conjunction of the ambiguous phrase in (13) can all be explained by the syntactic generalization of minimal attachment. A variety of semantic and referential preferences using the specific concepts and contexts in the above sentences would have to be employed to explain the same set of behaviors if the language processor were unable to impose syntactic preferences independently.

(12) Mary knew the answer {by heart | was incorrect}.
(13) John kissed Mary and her sister {too | laughed}.

There are structural preferences other than minimal attachment, such as right association (Kimball, 1973), which seem to work in sentences where minimal attachment fails. Right association or late closure says that the incoming constituent is attached to the current structure rather than to a previous structure higher up in the parse tree. For instance, in sentence (14),
(14) I saw the man with the horse.

right association would attach the PP to the NP (as shown in Figure 3.5) and not to the VP as minimal attachment would have it. In this sentence, the minimal interpretation would be turned down by semantic criteria (since a horse cannot be used as an instrument for seeing), and the structure preferred by right association turns out to be the correct parse. Right association can thus be used as an explanation of how the syntactic processor makes autonomous decisions in sentences such as (14) above, where minimal attachment does not yield the right structure.
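To make the two structural preferences concrete, the following is a minimal illustrative sketch in Python (with hypothetical node counts and recency values of our own choosing, not taken from any of the models cited above): minimal attachment prefers the candidate attachment site that adds the fewest new nodes to the tree, while right association (late closure) prefers the most recently processed site.

```python
# Illustrative sketch (not from this thesis): two structural preference
# heuristics applied to the candidate attachment sites for an incoming
# constituent such as the PP "with the horse" in sentence (14).

from dataclasses import dataclass

@dataclass
class AttachmentSite:
    label: str        # e.g., "VP(saw)" or "NP(the man)"
    new_nodes: int    # hypothetical count of nodes added by attaching here
    recency: int      # position of the site's rightmost word (higher = more recent)

def minimal_attachment(sites):
    """Prefer the attachment that adds the fewest new nodes to the tree."""
    return min(sites, key=lambda s: s.new_nodes)

def right_association(sites):
    """Late closure: prefer the most recently processed (lowest, rightmost) site."""
    return max(sites, key=lambda s: s.recency)

if __name__ == "__main__":
    # Hypothetical values for "I saw the man with the horse."
    sites = [AttachmentSite("VP(saw)", new_nodes=1, recency=2),
             AttachmentSite("NP(the man)", new_nodes=2, recency=4)]
    print(minimal_attachment(sites).label)   # VP(saw)      -- instrumental reading
    print(right_association(sites).label)    # NP(the man)  -- modifier reading
```

The two heuristics pick different sites for (14); only the site chosen by right association survives the semantic criterion that a horse cannot be an instrument of seeing.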
3.4 Studies of Error Recovery

Though it is well known that the human sentence processor makes errors at times in processing a sentence, there have been relatively few studies of its mechanisms for recovering from such errors (e.g., Burgess and Simpson, 1988; Carpenter and Daneman, 1981; Eiselt and Holbrook, 1991; Eiselt, 1989; Holbrook, Eiselt, and Mahesh, 1992; Holbrook, Eiselt, Granger, and Matthei, 1988). Nevertheless, error recovery is a vital aspect of the human language understander and its study has the potential of unlocking some of the mystery of human language behaviors.
[Figure 3.5: Right Association. A parse tree for sentence (14), "I saw the man with the horse," in which the prepositional phrase "with the horse" is attached to the noun phrase "the man" rather than to the verb phrase.]
Language understanders have to make early commitments in resolving ambiguities so as to cope efficiently with the vagaries of natural languages. As a result, language understanders commit errors at times in making decisions at ambiguous points, since sufficient information may not yet be available at those points. Information coming later in the sentence might alter their preferences for different interpretations. People are capable not only of resolving ambiguities but also of recovering from such errors in many cases (Carpenter and Daneman, 1981; Holbrook, 1989). Not all errors in resolving an ambiguity lead to a garden path. People can often recover locally from the errors they make when later input reveals them.
3.4.1 Lexical and Pragmatic Error Recovery
Carpenter and Daneman (1981) used garden-path passages with semantically ambiguous words (such as (15) below) to study lexical retrieval and error recovery in reading. They examined eye fixations on individual words and used the gaze durations as a measure of the amount of processing done at each word. They found evidence in their eye-movement data that people do not simply reprocess the texts when they detect an error they have committed. For example, in text (15), they might initially assume that "tears" meant Cinderella was crying because she could not go to the dance, but when they see the disambiguating word "dress," they realize that their interpretation of "tears" was in error and must correct the error. However, when they correct such errors, they do not simply reread the previous parts of the text sequentially; instead, they regress backwards selectively, perhaps using a number of heuristics for error recovery. For example, they might first reexamine alternative interpretations of the inconsistent word, or might reexamine earlier words that were also ambiguous and see if an alternative interpretation of those words might correct the error.
(15) Cinderella was sad because she couldn't go to the dance that night. There were big tears in her dress.

The ATLAST model (Eiselt, 1989) proposed a Conditional Retention Mechanism (Holbrook, Eiselt, Granger, and Matthei, 1988) for error recovery in resolving lexical and pragmatic ambiguities without complete reprocessing of the text. According to this mechanism, the sentence processor selects the best interpretation in the current context and makes an early commitment when possible. However, it does not discard the alternative interpretations; it retains them for possible later use. The retained alternatives are deactivated for the present but are not completely inactive. If later text proves an earlier decision wrong, the sentence processor reactivates the retained alternatives to reevaluate the decision. It then switches to the interpretation that is now best with respect to the new information as well as the earlier information. The computational model ATLAST also demonstrated that the same mechanism could account for recovery from errors in resolving pragmatic ambiguities (Eiselt, 1989).

The theory of conditional retention and error recovery in lexical processing without rereading the text was verified by experimental studies (Eiselt and Holbrook, 1991; Holbrook, 1989). Holbrook (1989) found evidence for the retention of unselected meanings of ambiguous words when the words were embedded in longer texts instead of being at the end of a text. Eiselt and Holbrook (1991) presented more evidence for conditional retention from a study that employed a binary forced-choice task to distinguish between the two hypotheses of conditional retention and active suppression3 (e.g., Seidenberg, Tanenhaus, Leiman, and Bienkowski, 1982) of unselected meanings. They used sentences with ambiguous words (such as "bat" in text (16) below) embedded in either consistent contexts or conflicting contexts (i.e., where the preceding context either biased the reader towards or against the meaning suggested by the following disambiguating region). Readers were forced to make a selection between two words (not in the text, such as "CAVE PITCH" in text (16) below) at some point, indicating which one is closer to the text. Data from this experiment showed clear evidence for conditional retention as opposed to active suppression (Eiselt and Holbrook, 1991).

3 Active suppression is the hypothesis that unselected meanings of an ambiguous word are purposely suppressed from further consideration and are not available to the sentence processor for later use in error recovery.
(16) Mary realized that she had examined the wrong bat. She took it back and got
         CAVE   PITCH
     one that was male.

Though the two explanations of error recovery given above, namely, error recovery by reprocessing the text (albeit selectively) and recovery without reprocessing, appear to be different strategies, they can in fact be unified into a single mechanism. Such a unified view of lexical error recovery (Eiselt and Holbrook, 1991) accounts for both mechanisms by introducing an additional factor, namely, the distance between the ambiguous word and the disambiguating word(s). When the distance is small, selective reprocessing may be the choice, but when it is large (or spans a clause or sentence boundary), conditional retention and recovery without reprocessing seem to provide a better explanation of error recovery behavior.
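The following is a minimal Python sketch of the general idea of conditional retention; it is our own illustration under simplified assumptions, not the ATLAST implementation. The best interpretation is selected, the unselected alternatives are retained in a deactivated state, and an error is recovered from by switching to a retained alternative rather than by reprocessing the text.

```python
# Minimal sketch (not the ATLAST implementation) of conditional retention:
# commit to the best interpretation, keep the rest deactivated, and reactivate
# a retained alternative if later input contradicts the earlier choice.

class ConditionalRetention:
    def __init__(self):
        self.selected = None      # currently active interpretation
        self.retained = []        # deactivated, but not discarded, alternatives

    def commit(self, interpretations, score):
        """Select the best-scoring interpretation; retain the rest."""
        ranked = sorted(interpretations, key=score, reverse=True)
        self.selected, self.retained = ranked[0], ranked[1:]
        return self.selected

    def recover(self, consistent_with_context):
        """If later input contradicts the selection, switch to a retained
        alternative instead of reprocessing the text."""
        if consistent_with_context(self.selected):
            return self.selected
        for alt in self.retained:
            if consistent_with_context(alt):
                self.retained.remove(alt)
                self.retained.append(self.selected)
                self.selected = alt
                return alt
        return None   # no retained alternative fits; reprocessing would be needed

cr = ConditionalRetention()
# Suppose the preceding context biased the reader towards the sporting sense of "bat":
cr.commit(["bat/club", "bat/animal"], score=lambda m: 1.0 if m == "bat/club" else 0.5)
# "... one that was male" fits only the animal sense, so the retained sense is reactivated:
print(cr.recover(lambda m: m == "bat/animal"))   # -> "bat/animal"
```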
3.4.2 Syntactic Error Recovery
Studies of structural ambiguity resolution in syntax have shown error recovery behaviors similar to those in semantics described above. The experiments of Stowe and colleagues (Holmes, Stowe, and Cupples, 1989; see also Stowe, 1991) showed evidence for delayed decisions and error recoveries in syntactic ambiguity resolution. They used reduced relative sentences and showed that semantic features of the subject noun, such as animacy, have an immediate effect on syntactic ambiguity resolution (replicating the results of the immediate interaction studies discussed earlier). For example, sentence (3) below is a garden-path sentence while (4) is not, because of the difference in their subject nouns (vis-a-vis their ability to become agents of the "teach" event). However, they also conducted "loading" experiments in which they increased the load on the human sentence processor by increasing the length of the intervening relative clause (as in (17), for example). They found reading difficulties similar to garden-path effects even when the subject noun biased the reader towards the relative clause (and hence away from the garden path). For example, sentence (17a) produced garden-path effects while (17b) did not. However, sentence (17c), which is a longer version of (17b), showed difficulties similar to those seen in (17a). Stowe explained these results by proposing a limited delayed decision model in which syntactic decisions are delayed and alternatives retained (much as in the conditional retention theory above) until resource limits preclude continued retention. This suggests that the sentence processor, when pressed for resources to hold on to lengthy ambiguous interpretations, makes a decision based only on syntactic information, disregarding any semantic bias against the syntactic preference.

(3) The officers taught at the academy were very demanding.
(4) The courses taught at the academy were very demanding.
(17a) The reporter saw the woman was not very happy.
(17b) The student realized the answer was not clear.
(17c) The inspector realized the mistake which he'd made had quickly been corrected.

In another study of recovery from syntactic garden paths, Ferreira and Henderson (1991) conducted a series of five experiments using sentences with direct object/complement ambiguities (such
as in sentences (18a) and (18b) below) and grammaticality judgments. By varying the length of the ambiguous noun phrase, they found that recovery from a garden path was more difficult with longer NPs. They also found that this difficulty was not attributable to just the length of the phrases. In fact, ambiguous phrases made longer by the addition of prenominal modifiers did not cause increased difficulties at all. Postnominal modifiers, on the other hand, such as prepositional phrases and relative clauses, were hard for the parser to reanalyze. They concluded that the ease of error recovery from syntactic garden paths is determined by the distance between the head of the ambiguous phrase and the disambiguating word(s).

(18a) Because Bill drinks wine beer is never kept in the house.
(18b) Because Bill drinks wine is never kept in the house.
3.4.3 Unified Theory of Error Recovery
All the above studies of error recovery have the following findings in common:

- Some information about unselected choices is retained at an ambiguity.
- This information is used to recover from errors easily if the disambiguating region is "close" to the ambiguity where the error occurred.
- There is a limit on how long the information is retained. For example, some of the retained information, such as the surface form that led to the retained choices, may be lost when the limit is reached. Other information, such as semantic alternatives, may be available for error recovery even after the surface form is lost, say at a clause boundary (Jarvella, 1971; Eiselt and Holbrook, 1991).

Another striking aspect of the above account is the similarity between recovery behavior in syntactic and semantic errors (Holbrook, Eiselt, and Mahesh, 1992). For example, both the ATLAST model and Stowe's limited delay model proposed an early commitment where possible. Both models had the capability to pursue multiple interpretations in parallel when the ambiguity forced it. Both models explained error recovery as an operation of switching to an alternative interpretation retained in parallel by the sentence processor. Conditional retention modeled resource constraints on the processor just as limited delay did. Finally, both models made decisions by integrating the preferences from syntax and semantics. The unified account of error recovery outlined above (Eiselt and Holbrook, 1991) covers both syntactic and semantic error recoveries. It also introduced the resource factor as an explanation of when the human sentence processor recovers from an error easily and when it experiences difficulty and shows a garden-path effect. We now explore studies of resource constraints in more detail and see how resource constraints offer a different explanation for the duality between interactive and modular processing of syntax and semantics in human sentence understanding.
3.5 Studies of Resource Constraints

Apart from the studies of Stowe and colleagues reported above that demonstrated the effects of resource constraints when the human sentence processor is "loaded," a series of studies by Just, Carpenter, and colleagues has investigated the limitations of working memory and their effects on language comprehension. Carpenter and Just (1988) summarize several of their studies in which they found that people attempt to "digest" as much information as possible immediately to
minimize the working memory required to hold on to partial products. Using working memory capacity limits in their model, they explained a preference for immediate semantic processing as opposed to a "wait-and-see" strategy. They also found significant individual differences in capacities, indicating that some readers (or readers in some situations) may be processing language in a more modular fashion than others (or at other times). In one set of experiments on the effects of working memory capacity on lexical ambiguity resolution, Miyake, Just, and Carpenter (1993) found that subjects with high working memory capacities ("high-span readers") showed little effect of ambiguity on encountering the disambiguating region of sentences like (19) below that have a lexical ambiguity with one highly frequent meaning. They showed little difficulty in disambiguation irrespective of whether the more frequent (as in (19a)) or the less frequent meaning (as in (19b)) was the correct one in the sentence, showing that they must have maintained both meanings until the disambiguating region. Subjects with low working memory capacities4 ("low-span readers"), on the other hand, showed a large ambiguity effect when the disambiguating words favored the less frequent meaning (as in (19b)). This suggests that their capacity limits did not enable them to maintain both interpretations until disambiguation was possible. In another experiment, Miyake, Just, and Carpenter (1993) also showed that subjects with intermediate working memory capacities ("mid-span readers") had difficulty when the disambiguating region appeared much after the ambiguity in the text, providing further evidence for the effects of working memory on lexical disambiguation.

(19a) Since Ken really liked the boxer, he took a bus to the nearest sports arena to see the match.
(19b) Since Ken really liked the boxer, he took a bus to the nearest pet store to buy the animal.

In other experiments, King and Just (1991) studied the effects of working memory capacity on syntactic processing. Using object relative sentences such as (20) below, they found that low-span readers not only took longer to read the verbs of the object relative sentence, but their comprehension was also less accurate than that of high-capacity readers.
(20) The reporter that the senator attacked admitted the error.

Further evidence for working memory constraints on syntactic ambiguity resolution comes from the experiments of MacDonald, Just, and Carpenter (1992). They used reduced relative sentences such as (21) below and showed, using word-by-word reading times, that all subjects do additional processing upon encountering the syntactic ambiguity at the verb (namely, whether the verb is the main verb of the sentence or whether it starts a reduced relative clause), suggesting that they generate both interpretations at the ambiguity. Further, they showed that readers with higher working memory capacities were able to maintain both interpretations for a longer period of time than low-span readers. As a result, high-span readers experience less difficulty than low-span readers in disambiguating the syntactic attachment of the verb when they encounter the disambiguating words in the sentence. The experiments showed that low-span readers had a much higher error rate than high-span readers for sentences (like (21b)) with a greater distance between the ambiguous verb and the disambiguation (than for (21a) below).

(21a) The experienced soldiers told about attacks conducted the midnight raid.
(21b) The experienced soldiers told about surprise enemy guerilla attacks conducted the midnight raid.

4 The working memory capacity of a subject can be assessed by a task such as the Reading Span task (Daneman and Carpenter, 1980).
Based on the evidence from these experiments, Just and Carpenter (1992) proposed a model of working memory and language processing in which individual differences in working memory capacity are captured by variations in the total amount of activation available in an instance of the model. They proposed such activation (and hence capacity) limits as an explanation of syntactic modularity. Readers with greater capacities maintain multiple interpretations and allow semantic and pragmatic information to play a role in resolving syntactic ambiguities, while those with lower working memory capacities are forced to make syntactic decisions early, resulting in syntactic behaviors that appear to be informationally encapsulated. Carpenter, Miyake, and Just (1994) also offered working-memory-based explanations for behavioral evidence from language processing with aphasia and aging.
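As a rough illustration of how an activation (capacity) limit can produce both interactive and modular-looking behavior, consider the following sketch. The cost and capacity numbers are hypothetical and the code is our own simplification, not the Just and Carpenter model itself.

```python
# Illustrative sketch (not the Just and Carpenter model): a fixed activation
# budget limits how long multiple interpretations of an ambiguity can be held
# in parallel. When the budget is exceeded, all but the preferred
# interpretation are dropped, producing "low-span" garden-path behavior.

def process(words, ambiguity_cost, capacity):
    """Maintain alternative interpretations word by word until either the
    disambiguating word arrives or the activation budget runs out."""
    maintained = []          # alternative interpretations currently held
    used = 0.0               # activation spent on holding the extra alternatives
    for word, alternatives, disambiguates in words:
        if alternatives:
            maintained = list(alternatives)          # ambiguity encountered
        used += ambiguity_cost * max(len(maintained) - 1, 0)
        if used > capacity and len(maintained) > 1:
            maintained = maintained[:1]              # forced early commitment
        if disambiguates and disambiguates in maintained:
            maintained = [disambiguates]             # smooth disambiguation
        elif disambiguates:
            return "garden path / recovery needed"
    return maintained[0]

# Schematic rendering of sentence (19b): "boxer" is ambiguous, "animal"
# disambiguates towards the less frequent (dog) meaning several words later.
sentence = [("boxer", ["fighter", "dog"], None),
            ("...", [], None), ("...", [], None), ("...", [], None),
            ("animal", [], "dog")]
print(process(sentence, ambiguity_cost=1.0, capacity=5.0))   # high span: "dog"
print(process(sentence, ambiguity_cost=1.0, capacity=2.0))   # low span: garden path
```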
3.5.1 Other Multiple Analysis Models
Models of sentence processing based on working memory constraints are similar to other multiple analysis models in that they allow more than one syntactic interpretation to be active at any time. Difficulties with strict first-analysis models in explaining a variety of behaviors have led to the development of several models that have resorted to multiple syntactic representations for various reasons. For example, Kurtzman (1985) proposed a model in which multiple syntactic interpretations of an ambiguous sentence were maintained until pragmatic information guided the choice of one. Gorrell (1987) proposed a similar model in which syntactic complexity was used instead of pragmatic information to decide which interpretation to select. In Tanenhaus and Carlson's (1989) model, multiple syntactic interpretations were permitted and were a result of accessing lexical information about alternative thematic structures of verbs with ambiguous argument structures. Earlier models such as the one by Bever, Garrett, and Hurtig (1973) employed a purely syntactic criterion, based on their experimental findings in a fragment completion task, to decide when to maintain multiple interpretations; they proposed that multiple representations are maintained until the clause boundary, at which point the human sentence processor makes a commitment to a single representation and discards the alternative syntactic representations. Models based on working memory constraints differ from these models in that the criteria for deciding how and which interpretations are maintained, and for how long, are derived from resource constraints (MacDonald, Just, and Carpenter, 1992). As such, these models apply the same criteria to decide when to retain alternative syntactic as well as semantic interpretations. (Chapter 9 presents a computational formalism that allows one to model the resource requirements of multiple syntactic as well as semantic interpretations. Chapter 10 discusses the difficulties in implementing a numerical resource limit in a computational model of sentence processing.)
3.6 Studies of Aphasia

Behavioral studies with aphasic subjects have traditionally been used as a source of evidence for the separability of syntax from semantics in human sentence processing (Caramazza and Berndt, 1978). People who have suffered focal neural damage in Broca's area, Broca's aphasics, exhibit impaired syntactic processing abilities. Those with damage in a different part, Wernicke's aphasics, appear to retain syntactic processing abilities but have trouble producing coherent semantic content. Studies of language production and comprehension in aphasic people have led to the conclusion that although processing at different levels may interact, the levels are functionally and neurologically independent and can be selectively affected by damage to the brain. While the above studies were mostly done with English-speaking aphasics, some recent cross-linguistic studies of aphasics have challenged the above claim for syntactic modularity. Cross-linguistic studies have not shown such a clear distinction in the kind of breakdown in language
processing behavior occurring as a result of focal neural damage to different parts of the brain (Bates, Wulfeck, and MacWhinney, 1991). Nevertheless, systematic differences between patient groups such as Broca's and Wernicke's aphasics are still observed in these cross-linguistic studies. In order to account for these differences, a model of sentence comprehension must keep syntax and semantics functionally independent of one another. Without such functional independence, syntax and semantics would be inseparable and would not show systematic differences in impairments due to neural damage. Cross-linguistic evidence argues against neurological, not functional, independence between the different faculties of language processing.
3.7 Analysis: Evidence for Claims

Having reviewed the five areas of psycholinguistic results, we now turn to the principles and claims presented in Chapter 2 and analyze the evidence the results provide for and against the claims. We then summarize the analysis and lead the reader towards COMPERE's unified-arbitrator model of sentence understanding. It may be noted first that the evidence outlined in the previous sections supports the five principles listed in Chapter 2 on which the COMPERE model is based. For example, immediate interaction, first analysis, error recovery, and resource limit studies all suggest that the human sentence processor attempts to select one or more interpretations from the set of all possible interpretations (principle of eager selection). Interaction, error recovery, and resource constraint studies show strong evidence to support the view that the sentence processor produces interpretations incrementally (principle of incremental interpretation). Interaction and resource constraint studies show evidence for the immediate influence of a variety of semantic and pragmatic knowledge sources on syntactic processing (principle of integrated processing). First analysis, error recovery, resource constraint, and aphasia studies lend support to the functional separability of syntactic and semantic processing (principle of functional independence). Finally, the entire body of work in psycholinguistics shows that the decisions made by the human sentence processor can be readily attributed to the influence of various types of knowledge or to cognitive constraints such as memory limitations; the available evidence does not support the view that the human sentence processor makes its decisions completely arbitrarily (principle of determinism).
3.7.1 Combining Functional Independence and Integrated Processing
Returning to the modularity debate introduced in Chapter 1 and described in greater detail at the beginning of this chapter, it is now clear that there is strong psycholinguistic evidence for both the modular and the non-modular views of syntactic processing, from first analysis and immediate interaction studies, respectively. How then are we to design a model of human sentence processing that can account for both bodies of evidence? Based on the findings of the error recovery and resource constraint studies outlined above, we claim that integration and functional independence can coexist in a model that separates syntactic and semantic knowledge representations but unifies their processing through a single arbitrator (Claim 1 in Chapter 2). Alternatives to this claim (such as completely separate processing, or completely integrated representations and processing) and their disadvantages are discussed in Chapter 5. We will also show in the rest of this thesis that the COMPERE model based on this claim can explain the types of error recovery behaviors found in the studies reported above. Moreover, such a model with independent knowledge sources but integrated processing can also explain the effects of resource limitations on sentence processing described above. The arbitrating process could take resource limitations into account and decide whether and how much semantic and other contextual sources of information should influence syntactic decisions. The issue of implementing a resource constraint in the model is addressed
further in Chapter 10.
3.7.2 Implications of Error Recovery Studies
Since the sentence processor is capable of recovering from errors in lexical as well as structural ambiguity resolution, and since the error recovery behavior seems to be the same in both kinds of ambiguities, we argue that the simplest theory which accounts for this behavior is one in which there is a single unified method that can recover from errors at different levels of sentence processing. In this thesis, we demonstrate the feasibility of this claim (Claim 3 in Chapter 2) with the COMPERE model. Evidence from error recovery lends support to COMPERE's claim that sentence understanding is a unified process of arbitration between multiple sources of knowledge. Moreover, it shows how to develop an actual computational model out of a sketch of a theory of syntactic processing and structural ambiguity resolution put forward by Stowe (1991).
3.7.3 Implications of Resource Constraints
Recent studies on the influence of working memory capacity on sentence comprehension have brought to light the important role played by working memory limits in determining the workings of the human sentence processor. This is a new factor in the modularity debate. Experiments reviewed above have shown that the sentence processor shows immediate interaction between syntax and semantics whenever it has the resources to support such interaction, and that it shows autonomous syntactic behaviors whenever it is pressed to a point where it no longer has sufficient resources to carry out incremental interactions with higher-level processes. This has been shown both within an individual, by increasing the load on the processor with long intermediate clauses, and across subjects with differing working memory capacities, as already described. Limitations on working memory force the human language processor to immediately "digest" partial interpretations to minimize storage requirements rather than using a "wait-and-see" strategy (Carpenter and Just, 1988). In other words, conserving working memory resources is a strong motivation for the human sentence processor to avoid delays in resolving ambiguities and to resort to eager commitments. Working memory capacity governs the length of delays in making decisions. For example, "low-span" readers (i.e., people with less available working memory capacity) are able to pursue multiple interpretations for less time than "high-span" readers. Working memory capacity also offers an explanation for modularity effects (as mentioned earlier). When higher capacities are available, more interactions with semantic and pragmatic information and processes are supported. When capacities are limited, interactions are cut down and decisions are made autonomously, using syntactic information alone, for instance.

Evidence from resource constraint studies also lends support to COMPERE's claim that syntax-semantics interactions in sentence processing are controlled by a unified arbitrating process (Claim 4 in Chapter 2; also see Chapter 5 for a detailed account of sentence processing architectures). These studies tell us that the sentence processor must take resource factors into account when combining syntactic and semantic preferences and selecting interpretations. Since resource limitations apply to the combined resource requirements of syntactic and semantic processing, these decisions must be made by a process that has control over both syntax and semantics. For example, there must be a controlling process that decides (however implicitly), based on resource availabilities, whether syntactic and semantic preferences must be evaluated against each other and combined, or whether syntax must be allowed to override semantic preferences so that a selection can be made with minimal resource requirements.
3.7.4 Modularity Debate: Reconciliation
Resource constraint studies point out the possibility that modular and interactive effects were both observed in first analysis and immediate interaction studies (respectively) because the human sentence processor behaves in either fashion depending on the "resource situation": the working memory capacity of the individual subject, the amount of resources required for processing a given sentence, any additional load on the resources at the time of processing the sentence, and any other factors affecting the effective use of available resources. This observation leads us to conclude that reconciliation in the modularity debate perhaps does not come from one of the two explanations being incorrect or one of the two sets of experiments being inconclusive; rather, it might come from the hypothesis that syntactic processing in the human sentence processor is autonomous in certain situations and is not in others. Given this hypothesis, one needs to identify what those two situations are and how to build a model that allows sufficient control over syntax-semantics interactions so that it can explain both types of situations in sentence understanding.

The two situations above, autonomous and interactive syntactic processing, correspond to those that require more resources than are available and those that require less (than or equal to the amount of resources available). Exactly what amount of resources is available at a point is not easy to determine, given significant individual differences across subjects. Further, our understanding of the computational properties of sentence processing (especially semantic and contextual processing) is not yet mature enough to enable us to gauge precisely the amount of resources it takes to make a decision at a point in processing a sentence (see Chapter 9 for a definition of such a metric for measuring resource requirements and Chapter 10 for a discussion of difficulties in implementing resource constraints in a computational model).

Given the new explanation that the human sentence processor is modular in some situations and not modular in others, our computational model must be able to explain both kinds of behaviors. That is, our model must adhere to both the integrated processing principle and the functional independence principle. Our model must have an architecture that can support both integrated processing and functional independence by allowing different kinds of knowledge to be integrated early in sentence processing and, at the same time, ensuring that they are applicable independently of one another. We claim that integrated processing does not preclude functional independence. A number of people have (implicitly) assumed that integrated processing implies integrated representations. For instance, models such as CA and IPP (Birnbaum and Selfridge, 1981; Lebowitz, 1983) and the more recent model by Jurafsky (1992) have no separable representation of syntactic knowledge (see Chapter 5 for a detailed survey of these and other models). The integrated processing principle implies that the human sentence understander does not have a sequential architecture and gives semantics and world knowledge their proper role in sentence understanding. However, without functional independence between syntax and semantics, integrated models cannot maintain the proper role of syntax. The role of syntax in language understanding is to determine the hierarchical and (left-to-right) order relationships between parts of a sentence.
These relationships are important cues in assigning semantic roles to the parts. A model gives syntax its proper role if such relationships are applied to make decisions in resolving an ambiguity, though such information may not be necessary or sufficient in every case to assign the proper roles. Syntax was discounted in integrated processing models which attempted to fill the proper role of syntax with semantic or conceptual expectations (Birnbaum and Selfridge, 1981; Lebowitz, 1983; Lehnert, Dyer, Johnson, Yang, and Harley, 1983; Lytinen, 1987). However, "the use of such a criterion in parsing is bound often to be unreliable, since the reason for having syntax at all is presumably that real events frequently contradict such expectations" (Crain and Steedman, 1985). In previous implementations of integrated processing, information available right away from syntax could not be utilized in making sentence processing decisions. For instance, Lytinen's semantics-first model of language processing called MOPTRANS (Lytinen, 1986; see also Chapter 5) used
animacy cues to determine the actor of a sentence, even though syntax would tell right away (and perhaps more convincingly in many cases) who the actor was. By retaining functional independence, we can give its proper role to each kind of knowledge that goes into sentence understanding. We can make the best use of each kind of knowledge available and thereby extend the scope of the model in terms of the kinds of sentences it can process and for which it can explain human behavior. The reconciliation between integrated processing and functional independence led us to our first claim (Claim 1 in Chapter 2): the sentence processor can keep its knowledge sources separate and thereby functionally independent from one another, but can integrate the information coming from all the sources immediately and incrementally during sentence processing. Such a design raises the question, "At what points in a sentence do syntax and semantics interact?" Psycholinguistic evidence described earlier has shown that the points depend on the semantic and referential context (Ferstl, 1994; Pearlmutter, Daugherty, MacDonald, and Seidenberg, 1994; Taraban and McClelland, 1988). In this thesis, we present an algorithm derived from computational considerations and illustrate that synchronizing the points at which syntactic and semantic compositions are made gives a close approximation to the actual sequence of attachment points observed in human sentence processing (Claim 2 in Chapter 2). Chapter 6 presents the algorithm. A computational analysis of the behavior of this algorithm and the costs and benefits of the sequence of points it provides are discussed in Chapter 9. Ways of improving the algorithm to take advantage of context effects and provide an even better explanation of psycholinguistic results are discussed in Chapter 10 of this thesis.
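The fragment below illustrates only the general idea behind Claim 2, namely that a syntactic attachment and the corresponding semantic role binding are made at the same point in the sentence. It is not the Head-Signaled Left-Corner Parsing algorithm of Chapter 6, and the category-to-role table is a hypothetical placeholder.

```python
# Sketch of synchronized composition (assumptions ours): each syntactic
# attachment immediately triggers the corresponding semantic composition, so
# partial meanings are available as soon as partial structures are built.

def attach(syntactic_node, parent, role_map, meaning):
    """Perform a syntactic attachment and, at the same point in the sentence,
    the corresponding semantic role binding."""
    parent["children"].append(syntactic_node)                  # syntactic composition
    role = role_map.get((parent["cat"], syntactic_node["cat"]))
    if role is not None:                                       # semantic composition,
        meaning[role] = syntactic_node["sense"]                # synchronized with syntax
    return parent, meaning

# Hypothetical fragments for "The man saw the ship."
vp = {"cat": "VP", "children": [], "sense": "see-event"}
np = {"cat": "NP", "children": [], "sense": "ship"}
roles = {("VP", "NP"): "theme"}
meaning = {"event": "see-event"}
attach(np, vp, roles, meaning)
print(meaning)   # {'event': 'see-event', 'theme': 'ship'}
```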
3.8 Discussion and Summary

From the above discussions on temporal dependencies in sentence interpretation, we can conclude the following about the architecture of the sentence processor:
- The sentence processor must integrate information from multiple sources to make decisions. For example, it might have to integrate any or all of syntactic, semantic, pragmatic, reference, discourse, frequency, and recency information.
- The sentence processor need not make every decision immediately. It can decide to delay a decision and pursue multiple interpretations in parallel. The amount of delay permitted is determined by available resources such as working memory. A decision must be made when resource limits are reached.
- Not all types of information need be integrated all the time. If resources are scarce, certain types of information (semantic and contextual information, for instance) may be disregarded.

Given these requirements on the sentence processor, it is clear that decisions cannot be made by any one source of information such as syntax. Since certain types of information may be disregarded entirely in some decisions, there must be a common process that exerts overall control over the decisions. Only such a process could account for the influence of cognitive factors such as working memory capacity. This unified arbitrating process performs the following functions:

- Decides when to make a commitment. For example, it decides whether to delay a decision or whether to terminate a delay and force a decision.
- Combines information arising from multiple sources. It selects the alternative that is best overall given the various preferences from the different sources.
- Decides which types of information to consider for combining and making decisions. For example, it decides whether to look only at syntax or to include semantic and pragmatic preferences as well.
- Monitors resource limits and availabilities. When resources are scarce, it might force a commitment to reduce the resource requirements.
- Coordinates the various processes, such as syntax and semantics, running in parallel in order to synchronize them. For instance, it might delay a syntactic commitment until certain semantic information is available.
- Coordinates the various separate processes to ensure consistency between them. For example, during an error recovery, it makes sure that the resulting interpretations are always coherent by making syntax and semantics carry out operations that correspond to each other.

It must be noted that the above description of the arbitrating process and the control structure of the sentence processor is an abstract description of the computational architecture that is necessary to explain all the psychological results presented in this chapter. Whether this abstract process is realized in the form of a separate processor controlling the other processes or is more "automatic" (in the sense of a set of interacting connectionist networks, for instance) is an implementation issue. However, uncontrolled interaction between separate processes, as in certain connectionist network models (Cottrell, 1985; Waltz and Pollack, 1984 and 1985), does not seem to be capable of producing the kind of ambiguity resolution behaviors observed in limited delayed decisions and error recoveries. The sentence processor needs a control structure monitoring the interaction between the different faculties of sentence processing. Interaction between syntactic and semantic processing appears to be not only intense and continual, but also controlled. Such an overall process of sentence interpretation is best described as one involving a unified arbitrator coordinating the processing of each type of knowledge. (See Chapter 5 for a survey of uncontrolled and controlled parallel as well as other architectures for sentence processing.) The arbitrator controls the amount of delay for a decision and thereby determines the balance between eager commitments and delayed commitments. This is achieved by synchronizing (in terms of location or word number in the sentence) syntactic and semantic decisions (Claim 2 in Chapter 2) to make eager commitments in certain situations and delayed commitments in certain other situations. An algorithm which defines these strategies is the Head-Signaled Left-Corner Parsing algorithm (described in Chapters 6 and 8 and also discussed in Chapter 10).

In summary, current psychological evidence leads us to believe that the sentence processor has these characteristics:

- It has functionally independent faculties.
- The different faculties interact at every opportunity. Preferences from different faculties for alternative interpretations are integrated to select one that is best overall.
- It delays disambiguation decisions when information from different faculties is in conflict or when there is insufficient information for making a decision; the sentence processor then pursues multiple interpretations in parallel.
- The limited resources that the sentence processor has at its disposal place limits on possible interactions and delays.
- The sentence processor can often recover from erroneous decisions in resolving different ambiguities.

In our view of sentence processing, knowledge sources are kept separate from one another but processing is unified in a single control structure for syntactic and semantic processing. This single unified process carries out ambiguity resolution and error recovery in both syntax and semantics by integrating information generated from the different knowledge sources. Whether and how much of this integration is possible at a point in processing a sentence depends on (i) the amount of resources it takes to generate those types of information by accessing and processing the knowledge structures in the knowledge sources, (ii) the amount of resources it takes to consider the many alternatives and select the best among them, and (iii) the amount of resources currently available to the processor. The model can produce modular behaviors as well as interactive behaviors depending on the sufficiency of available resources for syntax-semantics interaction.5 Thus, this model not only reconciles integrated processing with functional independence, it also explains how resource constraints govern the degree to which the human sentence processor is interactive or modular and why error recovery behaviors in syntax and semantics appear to be the same.
5 See Chapter 10 for a discussion of resource limits and their influence on syntax-semantics interaction vis-a-vis COMPERE's implementation.
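To make the abstract arbitration process concrete, the following is a minimal sketch of a single decision point; it is our own illustration, not COMPERE's implementation, and the preference functions and cost parameters are hypothetical.

```python
# Sketch of a unified arbitrator (not COMPERE's implementation): it combines
# preferences from separate knowledge sources, decides whether to commit or
# delay, and falls back to a syntax-only decision when resources run out.

def arbitrate(alternatives, syntactic_pref, semantic_pref, resources_left,
              semantic_cost=1.0, retention_cost=1.0):
    """Decide, at one ambiguous point, whether to commit or delay, and which
    preferences to consult, given the resources currently available."""
    use_semantics = resources_left >= semantic_cost      # integrate only if affordable
    def score(alt):
        return syntactic_pref(alt) + (semantic_pref(alt) if use_semantics else 0.0)
    ranked = sorted(alternatives, key=score, reverse=True)
    tied = len(ranked) > 1 and score(ranked[0]) == score(ranked[1])
    can_delay = resources_left >= retention_cost * (len(ranked) - 1)
    if tied and can_delay:
        return "delay", ranked          # pursue the alternatives in parallel
    return "commit", ranked             # ranked[0] is selected; the rest are retained

# The reduced-relative ambiguity at "taught" in sentences (3)/(4), with a
# hypothetical semantic (animacy) preference pulling against the syntactic one:
syn = lambda a: 1.0 if a == "main-verb" else 0.0
sem = lambda a: 1.0 if a == "reduced-relative" else 0.0
print(arbitrate(["main-verb", "reduced-relative"], syn, sem, resources_left=3.0))
# ('delay', ...)  -- resources permit syntax-semantics interaction
print(arbitrate(["main-verb", "reduced-relative"], syn, sem, resources_left=0.5))
# ('commit', ...) -- syntax-only, eager decision under scarce resources
```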
CHAPTER IV
FUNCTIONAL THEORY OF SENTENCE UNDERSTANDING

In this chapter, we present a functional theory of natural language understanding. This theory looks at language understanding as an information processing task and specifies it in terms of its input and output, the types of knowledge used in converting the input information to the output information, the methods used, and the subtasks that result from a decomposition of the task. We use such an analysis of the language understanding task, independently of the psychological motivations, to show the need for incremental interpretation and communication between syntactic and semantic analyses in a model of sentence processing.
4.1 Functional Constraints on Natural Language Understanding

Apart from analyzing the task, methods, and knowledge, a functional theory must, more importantly, impose certain constraints on a model based on the function or purpose that the task serves in the real world. Though such constraints must really derive from the functional requirements of the situation or task for which natural language comprehension is being performed, we can look at the functional constraints imposed by a generic task such as interactive communication in natural language between a pair of agents. In the case of such language use, the function or purpose of natural language is to assist intelligent agents in communicating with other agents either in person using speech or indirectly through a written text (a more recent invention than speech). In order for this communication to work effectively, the generator and the comprehender of natural language must share a common set of conventions or a common body of knowledge (such as the syntax and semantics of the language, for instance) so that neither needs a complete and unambiguous specification of the meaning being conveyed (as would be the case in most programming languages, for instance).

The functional analysis presented in this chapter has much in common with the work on lexical functional grammars (LFG) (Bresnan and Kaplan, 1982; Sells, 1985). In both analyses of language understanding, representations and processing mechanisms are motivated by the communicative function of language and the contributions that various elements of a language make to that function. For example, it is not important in such formalisms that each piece of knowledge be strictly classifiable as either syntactic or semantic. Rather, the function that the piece of knowledge contributes to the overall function of natural language is the basis for including that piece of knowledge in a functional model of language understanding. There is an important distinction, however, between the LFG-based approach and our model of sentence understanding, COMPERE, in that our model does not rely on the lexicon as heavily as LFG-based models do. For example, LFG-based models integrate different types of knowledge through the lexicon, while COMPERE does the integration dynamically during sentence understanding through its arbitrating process. We elaborate further on this distinction in Chapter 5. It may also be noted that many models based on LFG are largely models of syntactic processing.
4.1.1 Incremental Interpretation
For communication in natural language to be effective, it is necessary that the comprehender arrive at incremental interpretations as soon as possible and that the interpretations that the comprehender arrives at be justified to a reasonable extent. Incremental comprehension is especially important in speech understanding, for otherwise some of the information in the real-time speech input would simply be lost from the communication. Incremental interpretation is nevertheless important even in text comprehension because it allows the writer to assume that the reader will use the immediate context of the preceding text to correctly interpret the text that immediately follows. Without this feature, the writer would have to put more information in the following text to make it comprehensible in the absence of the context that could have been provided by the meaning of the preceding text. For example, if the writer used an ambiguous word (or a bigger unit such as a phrase) towards the beginning of a sentence, an incremental understander might disambiguate the word right away when possible. If this is what the writer expects to happen, she can assume that the disambiguated meaning of the word will be part of the context in which the rest of the sentence will be understood. Then she can use another ambiguous or otherwise incompletely specified meaning in the rest of the sentence and assume that the context will help convey the intended complete and correct meaning. If the interpretation were not incremental, the writer would be forced to provide more complete information in the latter part of the sentence to compensate for the lack of richer context available when that part is being interpreted. This need to provide additional information makes the communication less efficient and might also introduce additional ambiguities. Worse, if the writer assumes that the interpretation is incremental but in the reader it is not, the reader might misinterpret the latter part of the sentence, leading to ineffective communication.

In other words, natural language allows effective means of communication between agents by allowing for a high degree of ambiguity, ellipsis, and vagueness. Without these virtues, communication would be an arduous task for the agent at the source end, as arduous as communicating with today's machines through programming. At the receiving end of natural language communication, the agent must be able to deal with ambiguities, vagueness, and ellipsis for the communication to work. The language comprehender must produce incremental interpretations and make commitments early to reduce the degree of ambiguity. Without such incremental comprehension, natural language communication would get bogged down by the large number of possible interpretations of a piece of text and fail to be effective. For instance, even a sentence such as the one below, which is easily comprehensible to the human language understander, an incremental processor, would severely overload a syntax-first, non-incremental processor by generating over one hundred syntactic interpretations of the sentence (Jacobs, 1992; Jacobs et al., 1990).
(1) A form of asbestos once used to make Kent cigarette filters has caused a high percentage of cancer deaths among a group of workers exposed to it more than 30 years ago, researchers reported.

This sentence is full of ambiguities (e.g., whether it was 30 years ago that the workers were exposed to asbestos or whether they died 30 years ago). While syntactic analysis does reduce the number of possible compositions or word meanings, it cannot provide unique compositions in ambiguous sentences such as the one above. If the processor tries to analyze syntax first for the whole sentence (or a whole clause, etc.) and only then produce semantic interpretations, it would have to deal with hundreds of possible parses for such sentences. An incremental interpreter, on the other hand, could bring the meanings of already interpreted parts to bear and rule out many of the possible syntactic compositions in the semantic context. In order to reduce the number of possible interpretations to a manageable size, the sentence understander must produce incremental interpretations.
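A back-of-the-envelope calculation illustrates why a non-incremental, syntax-first processor faces so many parses. The sketch below is our own illustration (the figure of over one hundred interpretations for sentence (1) is due to Jacobs and colleagues, not to this calculation); it simply counts the binary attachment structures possible over n ambiguous attachment points, which grow as the Catalan numbers.

```python
# Illustrative only: counting how quickly attachment ambiguity compounds when
# no interpretation is ruled out until the whole sentence has been parsed.

from math import comb

def catalan(n):
    """Number of distinct binary bracketings (attachment structures) over n points."""
    return comb(2 * n, n) // (n + 1)

for n in range(1, 8):
    print(n, catalan(n))
# 1 1, 2 2, 3 5, 4 14, 5 42, 6 132, 7 429 -- well past one hundred structures by
# six or seven ambiguous attachment points, unless earlier attachments are
# resolved incrementally, in context, as the sentence is read.
```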
4.1.2 Determinism
In general, the incremental interpretations that the comprehender generates must be justified by some body of knowledge shared between the generator and the comprehender. In speech processing, if this goal is not met, the result is a failure in communication, or at least the communication becomes less effective. Communication through speech tends to be highly interactive in a conversation and relies on the listener arriving at the same interpretation that the speaker intends most of the time. Even in text understanding, if the reader arrives at unjustified interpretations and keeps backtracking and switching interpretations at will, the writer has no control over what the reader reads from the text, and as such the goal of communication fails. Especially since the author has no interactive control over the interpretation that the reader derives from a text, the author must rely on the reader arriving only at justified interpretations that would be the same as the ones that the writer intended, given that the two agents share a common body of linguistic and world knowledge.
4.1.3 Error Recovery
Even if the language processor is deterministic and makes commitments only when they are justified by some piece of knowledge, an incremental language processor that makes early commitments in processing a sentence is bound to fail at times and make errors. For natural language communication to be effective, the processor must be able to recover from many of those errors without having to start afresh and reprocess the input. Communication fails if the receiving agent demands fresh input every time it makes a mistake.
4.1.4 Functional Independence
Yet another functionally derivable constraint is the independence between the different types of knowledge used in language comprehension. In natural language communication, there are situations where the agent has to comprehend the input without having the complete knowledge required for the comprehension. To do this, the agent must be able to apply the incomplete knowledge effectively to reduce the ambiguity (i.e., narrow down the set of possible interpretations) as far as possible with the available knowledge. In other words, whatever knowledge is available in a situation must be applicable without demanding other knowledge that is unavailable. Since it is easy to construct natural language inputs in which any of the many types of knowledge is unavailable, it follows that each type of knowledge must be applicable independently of the others. This is referred to as the functional independence of the different knowledge sources. In summary, for effective communication in natural language, it is desirable that the language comprehension system be incremental, not make wild, unjustified commitments, be able to recover from a majority of its errors without prohibitive penalties, and have functionally independent knowledge sources. (See the principles of eager selection, incremental interpretation, determinism, and functional independence in Chapter 2.)
4.2 The Sentence Understanding Task

The input of the language understanding task is the natural language text itself (given the assumptions we made earlier in Chapter 2 about the modalities of input). This input is naturally divided into individual sentences. Though intersentential interactions, such as reference and ellipsis, are certainly an important aspect of language understanding, sentence boundaries do demarcate a valid subtask, the task of sentence understanding, within language understanding. The types of knowledge that are used in analyzing an individual sentence (such as syntactic knowledge) are quite different from the kinds of knowledge that come into play in intersentential analysis (such
as knowledge of discourse structures). In the analysis that follows, we only look at the sentence understanding subtask and ignore all aspects of discourse analysis. However, the computational architecture we are proposing is designed with the expectation that new types of information will have to be taken into account as the model evolves in the future to include discourse processing.
4.2.1 The Input
The input to a sentence understander is a sentence (complete or not, grammatical or otherwise) in natural language. Though sentences can be of different types, such as interrogative or exclamatory, we only consider declarative sentences in this thesis. As such, we do not pay attention to punctuation either. The input is thus a sentence which can be characterized as a linear sequence of words in the natural language. The sentence may contain rich structure, such as relative clauses and prepositional adjuncts, as well as a rich meaning, perhaps with category and semantic ambiguities in word meanings.
4.2.2 The Output
The output of the language understanding task is much less well defined than the input. In the absence of a situation, a task for which natural language processing is being carried out, the output of language understanding can only be said to have those kinds of information that are deemed useful in a large variety of tasks. Exactly what these kinds of information are depends on the theory of meaning or the theory of semantics employed, a matter that will be discussed later in this thesis (see Chapter 7). At this time, though, we can say that we have adopted a minimalist approach in specifying the components of the output of language understanding. The output that our model produces has the essential components of the outputs demanded by most situating tasks.1 In addition, our analysis of the language understanding task attempts to ensure that enhancing the output of the task does not negate the commitments made by our model of sentence understanding. As we shall see later in our discussion of semantics, we simply adopt an event-oriented semantics and define the output of sentence understanding to be a set of events, objects, their properties, and the thematic role relationships between the events and the objects. This semantics does not attempt to model other components of meaning such as temporal and spatial relationships, aspect, and modality (e.g., Frawley, 1992).
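The following sketch shows one possible rendering of such an event-oriented output for a simple sentence; the data structures and role names are illustrative placeholders, not COMPERE's internal representation.

```python
# Illustrative sketch (names ours, not COMPERE's internal format) of an
# event-oriented output: events, objects, their properties, and the thematic
# role relationships between them.

from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Entity:                      # an object mentioned in the sentence
    concept: str
    properties: Dict[str, str] = field(default_factory=dict)

@dataclass
class Event:                       # an event with thematic role bindings
    concept: str
    roles: Dict[str, Entity] = field(default_factory=dict)

# One possible output for "The man saw the ship."
man = Entity("man", {"definite": "yes"})
ship = Entity("ship", {"definite": "yes"})
saw = Event("see-event", {"experiencer": man, "theme": ship})
print(saw.roles["experiencer"].concept)   # man
```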
4.2.3 Problems in Sentence Understanding
Why is the sentence understanding task hard, given that it is simply a task of mapping the input sentence to an output representation of its meaning? The reason is that it is a many-to-many mapping. The mapping is neither unique nor unambiguous. Moreover, parts of what needs to be in the output are completely left out of the input information. Nor is everything that is in the input relevant to the output. These features of natural language lead to its classical problems of ambiguity, incompleteness (e.g., ellipsis), and vagueness (or indirectness). Successful mapping can only be achieved by using a variety of knowledge, the application and interactions of which can be fairly complex. Among the problems listed above, incompleteness and indirectness tend to be bigger problems in discourse analysis than in individual sentence analysis. Since the problem addressed by this thesis is sentence understanding, we focus on the problem of ambiguity in the following functional analysis.

1 It may be noted that the output produced by our implementation of COMPERE includes syntactic parse trees for the input sentence in addition to the objects, events, and thematic roles.
4.2.4 Ambiguity
Ambiguity is the problem of one input mapping to more than one output. For example, a word might map to more than one meaning, or there may be more than one way of composing a sequence of words to form a whole meaning. Ambiguities can be resolved by applying many types of knowledge. Knowledge helps to reduce the ambiguity by eliminating one or more of the possible mappings for a given input. However, there is no single type of knowledge that always resolves a particular type of ambiguity. For instance, knowledge of the grammar of the language does help in reducing ambiguities in word meanings but does not always do so by itself. Other types of knowledge, such as the semantic context, may be necessary at other times to resolve the same kind of ambiguity in word meanings. It is this need for different types of knowledge at different times and in different situations that makes the problem of sentence processing a hard one. Ambiguity resolution in light of the above functional constraints requires complex interactions between the different types of knowledge available to the sentence understander. Before we look at the types of knowledge and how they are used in resolving ambiguities, let us look at the different classes of ambiguities at the sentence level.2
4.2.4.1 A Typology of Ambiguities
Ambiguities in a sentence fall into two broad categories: local ambiguities and global ambiguities. Local ambiguities are multiple mappings that appear in the middle of a sentence but vanish when the following part of the sentence is analyzed. The information that becomes available after reading the latter part of a sentence eliminates all but one of the mappings for the initial part, thereby eliminating the local ambiguity. For example, the sentence
(2)
The bar was frequented by the gangsters.
has a local ambiguity at "The bar," where the word "bar" can mean several things such as a metal rod or a place where drinks are served. This local ambiguity is resolved by the rest of the sentence since metallic rods are not frequented by people but drinking places are. A global ambiguity is one that remains even after analyzing the entire sentence. For instance, the sentence
(3)
The bar looks new.
is globally ambiguous without additional context since we cannot tell in a null context whether it is a metallic rod or a drinking place that looked new. Ambiguities can be differentiated along two other dimensions: lexical versus structural and syntactic versus semantic. Lexical ambiguities are those that arise from multiple meanings for words in the lexicon. Lexical syntactic ambiguities (or word category ambiguities) arise when a word has more than one syntactic category (with corresponding differences in meaning). For instance, the word "saw" in the sentence
(4)
The man saw the ship.
is ambiguous. It can either be the past tense form of the verb "see" or it can be the noun "saw" (meaning the tool that carpenters use).3 Of course, in this sentence, this is a local ambiguity since only the verb form is grammatical in its syntactic context (following the noun phrase "The man" and preceding the other noun phrase "the ship").
Footnote 2: Ambiguities at higher levels, in discourse and literary studies, have been given an interesting classification into seven types by Empson (1956).
Footnote 3: In fact, there is a third possibility. "Saw" can be the infinitive form of the verb "saw," meaning to cut with a saw. This is also a local ambiguity since the singular noun "man" does not agree with this inflection of the verb.
A lexical ambiguity can also be a subcategory ambiguity when a word has a particular syntactic category but many possible subcategories and corresponding meanings for the same inflected form. For instance, the word "taught" in the sentence
(5)
The officers taught at the academy.
is a verb in either its simple past form or its past participle form, with corresponding differences in meanings, namely, "the officers taught someone else" or "the officers were taught by someone else," respectively. Lexical semantic ambiguities arise when a word has more than one meaning for the same syntactic category. For instance, the word "bat" in the sentence
(6)
The bats are here.
is ambiguous. It can either mean the noun referring to the instrument used for hitting the ball in a ball game or the noun meaning the flying mammals.4 This lexical semantic ambiguity is a global ambiguity here since the isolated sentence above does not tell us how to choose between the two meanings of "bats." Lexical semantic ambiguities are also called word sense ambiguities.
Footnote 4: The word "bats" is also ambiguous in its category since it can also be a verb, meaning the action of using a bat.
Structural ambiguities arise from multiple ways of composing the word meanings in a sentence. Many structural ambiguities may, however, arise from underlying lexical ambiguities. Structural syntactic ambiguities, or simply structural ambiguities, are those that result from multiple possible compositions of the words in the sentence based on syntactic knowledge. Syntactic analysis results in multiple assignments of hierarchical structure when there are structural ambiguities. For example, the sentence
(7)
The man saw the ship with the telescope.
has a global structural ambiguity in that it is not clear whether the meaning of the phrase "with the telescope" should be composed with the meaning of "the ship" or the meaning of "saw." In syntactic terms, it is not clear whether to attach the prepositional phrase "with the telescope" to the noun phrase "the ship" or the verb phrase "saw the ship" in the hierarchical syntactic structure of the sentence. In this sentence, the structural syntactic ambiguity is a global ambiguity, and it does not arise from any underlying lexical ambiguity. In this sense, it is a purely structural syntactic ambiguity. Also, the ambiguity in structure does make a difference in meaning in the above sentence: was it a ship with a telescope on board that the man saw, or did he see the ship by using the telescope as an instrument? Structural syntactic ambiguities frequently do not make any difference in meaning. For example, in the sentence taken from an earlier example (sentence (1) above),
(8)
... a high percentage of cancer deaths among a group ...
there is a structural ambiguity in attaching the prepositional phrase "among a group" to "a high percentage" or to "cancer deaths." However, this is a "senseless" or "spurious" ambiguity since the two attachments result in the same final semantic composition: deaths in the group were caused by cancer and the percentage of such deaths was high. Eliminating such spurious ambiguities is a major problem in building models of sentence understanding (Jacobs, 1992; Jacobs et al., 1990). The above example is also a global ambiguity. Structural syntactic ambiguities can also be local. For instance, in the sentence
(9)
They are green apples.
there is a local structural ambiguity after the word "green" since the adjective "green" might either modify the verb, as in
(10)
They are green.
or it might start a new noun phrase which is completed by "apples" in sentence (9) above. This local structural ambiguity is important because it raises the question of whether the meaning of "green" is to be composed with the meaning of "are" or with the meaning of "apples." Structural semantic ambiguities are those that arise when a sentence with a unique syntactic structure has multiple possible semantic compositions. For example, consider the two sentences below:
(11) The officers were taught by the faculty.
(12) The courses were taught by the faculty.
These two sentences have the same syntactic structure. However, the thematic roles assigned to the subject nouns are different in the two sentences: "The officers" were the experiencers of teaching whereas "The courses" were the themes of teaching (see Chapter 7 for a description of these and other thematic roles).5 Another example of a purely semantic structural ambiguity can be found in
(13)
The hat was hidden by three o'clock.
Did "three o'clock" hide the hat or did someone hide the hat before "three o'clock"? In this example, however, there is a strong semantic bias for the latter semantic composition.
Footnote 5: One might argue that in a semantics based on thematic roles, it is strange to call this a structural semantic ambiguity since there is no apparent distinction in the hierarchical structures (either syntactic or semantic) between the two meanings. However, that is simply a consequence of using such a semantics. Structural differences do show up for ambiguities of the above kind in a more compositional semantics such as the one proposed by Peterson and Billman (1994).
Another way to look at ambiguities is in terms of the types of knowledge (described below) that help resolve the particular ambiguity. For instance, structural syntactic ambiguities can sometimes be resolved based only on syntactic preferences, sometimes using semantic biases, and sometimes using a more global context of the natural language communication. Similarly, word sense ambiguities (or lexical semantic ambiguities) can be resolved using knowledge of recencies and frequencies of word meaning usage in the lexicon or in a corpus, using syntactic knowledge, or using semantic preferences. The part of the sentence shown below
(14)
The officers taught at the academy ......
has a structural syntactic ambiguity, the ambiguity being whether "taught at the academy" should be composed with "The officers" to mean "the officers who were taught at the academy by someone else" or whether "The officers" should be composed with "taught" to mean "The officers taught someone else at the academy." This ambiguity arises from the subcategory ambiguity in the word "taught" as discussed earlier and from the possible omission of the relative clause markers "who were." The ambiguity can be resolved using purely syntactic preferences for the second of the two structural compositions mentioned above (see the section on an example at the end of this chapter for further explanation). Sentence (15) below has a structural ambiguity similar to the one in sentence (7) above, the ambiguity being whether they saw the ship with the horse on it or whether they used the horse as an instrument for seeing. This ambiguity can, however, be resolved using semantic (or, rather, conceptual) knowledge that horses do not belong to the class of objects that can be used as optical instruments for seeing.
(15) The officers saw the ship with the horse.
(16) The officers saw the ship with the telescope.
In sentence (16) above, however, the structural ambiguity can only be resolved (when the determinism criterion is taken into account so that only justified commitments can be made) using a more global context that tells us whether the telescope was on the ship or next to the officer. In the case of word sense ambiguities such as "bugs" (whether they mean insects or electronic microphones for spying purposes), sometimes one can choose between the meanings using the fact that the insect meaning is used much more frequently in a certain domain and is hence likely to be the appropriate meaning. In other cases the ambiguity may be resolved using syntactic and semantic preferences, such as in
(17)
The bugs moved into the lounge ...
Here, the "bugs" must have been insects since only insects go well with the syntactic composition that means the bugs moved themselves into the lounge (and not that the bugs were moved by someone else into the lounge). It may be noted that though syntactic preferences provided the basic disambiguating criterion here, the understander must have employed additional semantic (or conceptual) knowledge (that only insects can move themselves and microphones cannot) in order to resolve the ambiguity. Such complex interactions between different types of knowledge are what make sentence understanding a hard task to model. The above example has both a lexical semantic ambiguity and a structural syntactic ambiguity. Combinations of ambiguities lead to combinatorial explosions proposing hundreds of possible interpretations unless the sentence understander applies all available types of knowledge at the right times to minimize the effects of ambiguity. The above classification excludes another kind of ambiguity in composing word meanings: the type of ambiguity that arises in noun groups. For example, how do we compose the meanings of "tea" and the other noun in "tea spoon," in "tea cup," and in "tea time"? This problem is not addressed here because the structure and semantics of language do not help us in resolving these ambiguities. They are resolved largely using the global context of the communication and other domain-specific pieces of knowledge. However, such ambiguities are a major problem in language understanding (see Halliday and Martin, 1993, on the problems of nominalization).
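To keep the typology concrete, the sketch below (in Python, not part of the thesis) records a few of the example ambiguities above as data, classified along the three dimensions just described: local versus global, lexical versus structural, and syntactic versus semantic. The class and field names are illustrative only.

    # A minimal sketch recording the ambiguity typology of Section 4.2.4.1 as data,
    # using the thesis's own example sentences. The representation is hypothetical.
    from dataclasses import dataclass

    @dataclass
    class Ambiguity:
        sentence: str   # the (possibly partial) example sentence
        trigger: str    # word or phrase where the ambiguity arises
        scope: str      # "local" or "global"
        source: str     # "lexical" or "structural"
        level: str      # "syntactic" or "semantic"
        readings: tuple # the competing mappings

    EXAMPLES = [
        Ambiguity("The bar was frequented by the gangsters.", "bar",
                  "local", "lexical", "semantic",
                  ("metal rod", "drinking place")),
        Ambiguity("The man saw the ship.", "saw",
                  "local", "lexical", "syntactic",
                  ("verb: past of 'see'", "noun: carpenter's tool")),
        Ambiguity("The man saw the ship with the telescope.", "with the telescope",
                  "global", "structural", "syntactic",
                  ("attach PP to 'the ship'", "attach PP to 'saw the ship'")),
        Ambiguity("The hat was hidden by three o'clock.", "by three o'clock",
                  "global", "structural", "semantic",
                  ("'three o'clock' as the hider", "hidden before three o'clock")),
    ]

    def by_dimension(scope=None, source=None, level=None):
        """Filter the catalogued ambiguities along any of the three dimensions."""
        return [a for a in EXAMPLES
                if (scope is None or a.scope == scope)
                and (source is None or a.source == source)
                and (level is None or a.level == level)]

    if __name__ == "__main__":
        for a in by_dimension(source="structural"):
            print(a.sentence, "->", a.readings)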
4.2.5 Knowledge Sources Used in Language Understanding
Several kinds of knowledge, both language-specific and extra-linguistic, are necessary to map the input sentence to the output representation of its meaning, resolving any ambiguities in the mapping. A natural language is a set of conventions for communicating meaning between the generator and the comprehender. Types of linguistic knowledge are merely distinct types of regularities that one can observe in a natural language. A sentence in a natural language can be looked at as a structured collection of cues to the meaning intended to be communicated. The regularities in the cues and how the cues map to meanings differ from language to language. As such, the discussion of knowledge sources here is somewhat specific to English and its close relatives.
4.2.6 Types of Knowledge
Word meanings: Meaning that is communicated in natural language can be viewed as a conceptual graph structure with some components of meaning forming the nodes and various relationships between them forming the links. Components of the meaning being conveyed are encoded in the
sentence in the form of content words such as nouns and verbs. Words constitute the basic units of meaning in a natural language. However, words do not have single meanings; many words have multiple meanings, resulting in ambiguities. The mapping from words to meanings is by no means one-to-one.
Word category information: Apart from the meaning of the word, a word also contains a variety of other cues such as the syntactic (or grammatical) category of the word (also known as the part of speech) and the subcategories for the particular inflection of the word indicating such linguistic cues as tense, number, gender, case, person, and so on. Knowledge of these cues, some of which is sometimes made part of the grammar or syntactic knowledge, constitutes an essential type of knowledge employed in sentence interpretation.
Syntactic structure or grammar: The sequential order and grouping of words in a sentence serves as an important cue for determining the relationships between the meanings of words.6 The task of sentence understanding involves not only the selection of word meanings but also combining or composing them into bigger connected units leading to the meaning of the sentence. Knowledge of the syntactic structure of a natural language provides cues to this composition, identifying which words combine with which others and how the meanings of those composite units combine with other such units. Such knowledge saves us from having to try out every possible combination of word meanings in a sentence. Syntactic knowledge helps us transform the simple sequence of words in a sentence into a hierarchical grouping of combinations of word meanings. For instance, in a sentence starting with an adjective, a noun, and then a verb, such as (18) below, syntax tells us that the meaning of the adjective should be composed with the meaning of the noun, which in turn should be composed with the meaning of the verb. This, for instance, tells us that we need not consider a composition of the meanings of the adjective and the verb for the following sentence.
(18)
The young lady kissed ......
Footnote 6: Linearity in natural language communication is in fact imposed by the human vocal apparatus. We must communicate our conceptual graphs of meanings by uttering a linear sequence of words. A natural language provides the mechanisms required for communication to work effectively using linearized forms of a complex conceptual graph. Some of those mechanisms are the syntax and semantics of the natural language.
A slightly different type of knowledge, known as structural knowledge, can also be considered part of syntactic knowledge. This is the knowledge of preferences for particular syntactic structures over other related structures. For instance, given an ambiguity between two possible syntactic structures, there may be a preference for a minimal structure over a non-minimal one. Preferences of this kind are known to help the language processor resolve ambiguities that are otherwise left unresolved (Frazier, 1987). Examples of the use of structural preferences can be found in attachment ambiguities such as with prepositional phrases, conjunctions, and so on (see Chapter 3).
Semantic knowledge: Semantics can be defined as comprising those aspects of meaning that are marked in the surface form of language. As such, semantic knowledge is the knowledge of how the structural compositions marked by syntax map to relationships in the meanings of the parts. This, however, does not mean that all the knowledge that may be needed to map a particular instance of a syntactic form to its meaning is semantic knowledge. Many other extra-linguistic types of knowledge may very well be necessary to disambiguate the mapping. Semantic knowledge merely identifies the relationships between the grammatical forms of utterances and how different equivalence classes of meanings combine with each other. While linguists have identified several distinct components of meaning expressible in natural language (e.g., Frawley, 1992), we focus our attention on events, objects, their properties (to a certain extent), and the thematic roles that link events and objects together. Given this semantics, knowledge of semantics is the knowledge of how
grammatical roles identified by syntax, such as subject and object, map to corresponding thematic roles such as agent and theme. For example, the piece of knowledge that tells us that the meaning of the noun following the preposition "with" combines with the meaning of a previous verb (or noun) by taking the role of an instrument (or a collateral role such as a co-theme) is a piece of semantic knowledge. This part of the meaning of a sentence is marked explicitly in the surface form of the sentence by using the preposition "with" and by ordering the words appropriately. However, the piece of knowledge that tells us whether a particular noun in a "with" prepositional phrase can be an instrument for a particular event denoted by a verb is not semantic knowledge since it is extra-linguistic; this piece of knowledge is not marked in the surface form of the sentence in any manner whatsoever and falls under conceptual knowledge.
Conceptual knowledge: Conceptual knowledge is the knowledge of events and objects (and other components of meaning) irrespective of how or whether they are expressed in natural language. For instance, the knowledge that a telescope can be used as an instrument for seeing but a horse cannot be is a piece of conceptual knowledge that perhaps has its origins in knowledge of a particular world, or in an agent's experience with a domain. This piece of knowledge is not marked in the surface form of a natural language sentence (such as sentence (15)).7 The sentence interpreter needs this piece of knowledge nonetheless to resolve the ambiguity between whether the horse was accompanying the object that was seen or whether it was used as an instrument to see the object, as in sentence (15).
(15) The officer saw the ship with the horse.
Lexical and non-lexical knowledge: One can look at a completely orthogonal partitioning of
linguistic (and other) knowledge and divide all knowledge used in natural language understanding into lexical and non-lexical knowledge. Lexical knowledge is the knowledge that is specific to a particular word, that is, knowledge that is found in the lexical entry of the word.8 This word-specific lexical knowledge can be any type of knowledge and typically includes syntactic category information, meaning(s) of the word(s), subcategory information, subcategorization information, thematic grids (Carlson and Tanenhaus, 1988), and so on. Non-lexical knowledge is knowledge, mostly syntactic and semantic, that is not specific to a particular word. For instance, knowledge of a grammar, or of how a thematic role combines with others, typically falls under such general knowledge. The piece of semantic knowledge described above is not specific to any particular word and is not associated with any particular lexical entry. It may be noted that in some models (e.g., CA; see Birnbaum and Selfridge, 1981) and in some grammatical theories like lexical-functional grammars or LFGs (Bresnan and Kaplan, 1982; see also Sells, 1985), different types of non-lexical knowledge are lexicalized by associating appropriate instances of them with each lexical entry. While this may be a useful way of combining different types of knowledge, it not only introduces a lot of redundancy between lexical entries of words in the same equivalence classes but also violates the principle of functional independence between different types of knowledge.
Footnote 7: This does not mean that such conceptual knowledge cannot be expressed in a natural language! It, as well as most everything else, can be expressed in a natural language. However, in a sentence such as (15), such knowledge is required to map the input to the intended meaning, yet the knowledge is not marked explicitly in the sentence in any manner.
Footnote 8: Actually, the word as the basic lexical unit is a somewhat loose characterization since sometimes we use phrasal lexicons where commonly occurring phrases have lexical knowledge associated with them.
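The separation between these types of knowledge can be made concrete with a small sketch. The Python fragment below is illustrative only; the names and entries are hypothetical simplifications, not COMPERE's representations. It keeps lexical entries, grammar rules, structural preferences, semantic mappings, and conceptual constraints in separate structures, in the spirit of the functional independence argued for above.

    # Hypothetical sketch: each type of knowledge from Section 4.2.6 kept separate.

    # Lexical knowledge: word-specific entries bundling categories, senses,
    # subcategory information, and thematic grids.
    LEXICON = {
        "bugs": {
            "noun": {"senses": ["insect", "microphone"], "number": "plural"},
            "verb": {"senses": ["use-a-bat"], "person": 3, "tense": "present"},
        },
        "saw": {
            "verb": {"senses": ["see-event"], "tense": "past",
                     "thematic_grid": ["agent", "theme", "instrument?"]},
            "noun": {"senses": ["carpenter-tool"], "number": "singular"},
        },
    }

    # Non-lexical syntactic knowledge: a few phrase-structure rules.
    GRAMMAR = [("S", ["NP", "VP"]), ("NP", ["Det", "N"]),
               ("VP", ["V", "NP"]), ("VP", ["V", "NP", "PP"]),
               ("PP", ["Prep", "NP"])]

    # Structural preferences, stated independently of particular words.
    STRUCTURAL_PREFERENCES = ["prefer-expected-structure", "minimal-attachment"]

    # Semantic knowledge: how surface-marked relations map to thematic roles.
    SEMANTIC_MAPPINGS = {
        ("subject", "active"): "agent-or-experiencer",
        ("subject", "passive"): "theme-or-experiencer",
        ("pp", "with"): "instrument-or-co-theme",
    }

    # Conceptual knowledge: constraints not marked in the surface form at all.
    CONCEPTUAL_CONSTRAINTS = {
        ("see-event", "instrument"): {"telescope"},     # a horse cannot fill this role
        ("move-event", "agent"): {"insect", "person"},  # a microphone cannot move itself
    }

    def can_fill(event, role, concept):
        """Check a conceptual constraint on a candidate role filler."""
        allowed = CONCEPTUAL_CONSTRAINTS.get((event, role))
        return allowed is None or concept in allowed

    print(can_fill("see-event", "instrument", "horse"))      # False
    print(can_fill("see-event", "instrument", "telescope"))  # True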
4.3 Task Decomposition
The input to the sentence comprehension task is a set of words with meanings, along with a variety of linguistic information marked by such things as word order, punctuation, function words such as articles, particles, and prepositions, and various inflections on the words for marking number, gender, person, tense, causality, and so on. The task of sentence comprehension is to take this input and map it into a desired unique composition of unique meanings of the words in the sentence. Hence, one can decompose this task into the subtasks of word meaning selection and word meaning composition. Another way to arrive at a task decomposition for the sentence comprehension task is to look at it in terms of the types of knowledge used in the task. Since language comprehension requires the use of several different types of knowledge, the task of comprehension can be decomposed into subtasks each corresponding to the application of a particular type of knowledge in mapping the input to the output. This mode of task decomposition is particularly useful given the additional constraint of functional independence of each type of knowledge relative to others. The main subtasks that result from such a decomposition are lexical access, syntactic processing, and semantic processing. We will now see how these tasks are related to the main tasks of word meaning selection and composition.
Word meaning selection is the task of taking a word as input and returning a unique meaning for the word. Since words can have many syntactic categories and each of those categories may have many meanings, the task of word meaning selection can be decomposed into the tasks of category selection and word sense disambiguation, apart from the obvious task of lexical entry retrieval.
Lexical entry retrieval is the task of taking a word and retrieving from the lexicon one or more lexical entries that are indexed by the word (or its root form if the word's morphology is analyzed). The knowledge used in this task is lexical knowledge, or the lexicon itself. However, this task may also use knowledge of the morphology of the language if words are analyzed morphologically for their roots and inflections.
Word category selection can be done either by proposing the default, most preferred (or recent or frequent) category for that word or by selecting the category that best fits the current syntactic context of the word in the sentence. For instance, of the several possible categories for a word, the one that was already being expected in the prior syntactic context might be the one that is selected. Thus the task of word category selection has the subtask of syntactic context determination, or simply syntactic processing. Depending on the method used, word category selection might use recency and frequency knowledge or syntactic knowledge.
Word sense disambiguation is the task of taking a word with a particular grammatical category and selecting one of the many possible meanings for the word. Again, while this may sometimes be accomplished by defaults, relative frequencies, and recencies, the method we are interested in here is that of selecting the meanings that are acceptable in the current semantic context. This means we need to determine the current semantic structure in order to test which word sense fits the semantic context well. Thus word sense disambiguation requires the task of semantic processing.
Word sense disambiguation might use recency and frequency knowledge, or semantic or conceptual knowledge of which (equivalence classes of) meanings can be composed with which others and how the composition occurs.9
Footnote 9: In this work, for the most part we do not consider methods based on recencies or frequencies or collocations (see Chapter 10 for a brief discussion, however). While such methods can accomplish a number of the subtasks described here, we are interested in a more parsimonious and generative theory that can explain how sentence interpretation occurs without relying on a particular domain or being trained on a particular corpus. For the same reason, we do not attempt a connectionist implementation of our model either. However, we do acknowledge that such methods are sometimes necessary in sentence interpretation, and as such we may not be able to give a complete account of sentence interpretation without resorting to such methods at some point.
Word meaning composition is the task of taking a set of chosen meanings, one for each content
word in a sentence, and producing a composite meaning for the whole sentence. In order to avoid having to try out every possible combination of word meanings, meaning composition must be done according to the preferences of syntactic and semantic knowledge. Applying syntactic knowledge to perform composition involves the determination of the syntactic structure of the sequence of words in the sentence, a subtask that was also required for word category selection above. Similarly, applying semantic knowledge to the composition task requires one to build the semantic structure of the sentence that corresponds to its syntactic structure in useful ways (Peterson and Billman, 1994), a subtask that was also required for word sense disambiguation above.
Syntactic structure assignment, or syntactic analysis for short, takes a linear sequence of syntactic categories for the words in the sentence and assigns a hierarchical structure to the sequence. This hierarchy identifies which particular word meanings should be combined with which others. In addition, it also incorporates closed-class words in the sentence, such as prepositions, which suggest particular ways of composing word meanings. Syntactic analysis uses syntactic knowledge or knowledge of the grammar of the language. However, syntactic analysis cannot in general be completed using only grammatical knowledge because of ambiguities in the grammar or in the sentence. The assignment of unique structure is often dependent on semantic, conceptual, or other analyses.
Semantic structure assignment, or semantic analysis for short, takes a partial hierarchy of word meanings determined by syntactic analysis and assigns a semantic structure to it. That is, it identifies the semantic primitive (or the semantic relationship or role) through which two meanings are related. The result is a composition of the meanings of the words through the linkage provided by the roles assigned by semantics to each composition. This task is accomplished by the application of semantic knowledge, which tells us how different semantic relationships or roles combine with one another as well as what classes of meanings assume what roles in the semantics. Another extra-linguistic type of knowledge that is often used in semantic analysis is conceptual knowledge of the preferences of particular meanings (or concepts) with regard to how they combine with other meanings in a particular role. Again, semantic analysis is dependent on the compositions suggested by the hierarchy that syntax assigns. Without such syntactic guidance, semantic analysis would be highly ambiguous, trying out every possible combination of word meanings, except perhaps in overly restricting contexts. Semantic analysis would be unnecessary if syntactic hierarchies had a one-to-one mapping to semantic roles denoting the relationships between different meanings. In such a language, one could directly use syntactic compositions to perform conceptual analysis of the meanings of a sentence. However, natural languages (including heavily inflected and case-marked ones) do not have such a direct mapping between syntactic structures and the conceptual relationships among the components of the meaning of a sentence (see, for example, Covington, 1990). Semantic analysis can be viewed as bridging the gap between the two, resolving certain ambiguities in the mapping from syntactic relationships between words to conceptual relationships between their meanings.
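The decomposition just described can be summarized schematically. The following Python sketch is hypothetical (the function bodies are placeholders rather than working analyzers, and all names and the toy lexicon are illustrative); it names one function per subtask and shows the flow of information among them.

    # A minimal, hypothetical sketch of the task decomposition of Section 4.3.

    def lexical_entry_retrieval(word, lexicon):
        """Retrieve the lexical entries indexed by the word (or its root form)."""
        return lexicon.get(word, {})

    def word_category_selection(entries, syntactic_context):
        """Pick the category that best fits the current syntactic expectations."""
        for category in entries:
            if category in syntactic_context.get("expected_categories", ()):
                return category
        return next(iter(entries), None)   # fall back on a default category

    def word_sense_disambiguation(entries, category, semantic_context):
        """Keep only the senses compatible with the current semantic structure;
        if none can be ruled out yet, carry all of them forward (delayed decision)."""
        senses = entries.get(category, {}).get("senses", [])
        accepts = semantic_context.get("accepts", lambda s: True)
        compatible = [s for s in senses if accepts(s)]
        return compatible or senses

    def syntactic_analysis(category, syntactic_context):
        """Extend the hierarchical structure and update expectations (placeholder)."""
        syntactic_context.setdefault("structure", []).append(category)
        return syntactic_context

    def semantic_analysis(senses, syntactic_context, semantic_context):
        """Compose the chosen senses using the hierarchy proposed by syntax (placeholder)."""
        semantic_context.setdefault("compositions", []).append(senses)
        return semantic_context

    def understand(words, lexicon):
        """Word meaning selection and composition, interleaved word by word."""
        syn, sem = {"expected_categories": ("Det", "N", "V")}, {}
        for word in words:
            entries = lexical_entry_retrieval(word, lexicon)
            category = word_category_selection(entries, syn)
            syn = syntactic_analysis(category, syn)
            senses = word_sense_disambiguation(entries, category, sem)
            sem = semantic_analysis(senses, syn, sem)
        return syn, sem

    # Tiny demonstration with a hypothetical two-word lexicon.
    LEXICON = {"the": {"Det": {"senses": ["definite"]}},
               "bugs": {"N": {"senses": ["insect", "microphone"]}}}
    print(understand(["the", "bugs"], LEXICON))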
In the above analysis, we have ignored other tasks that might assist the execution of many of the subtasks listed. For instance, discourse analysis and reference resolution can also be shown to assist word meaning selection and word meaning composition. However, in this analysis, we exclude the role played by these tasks for the sake of simplicity and focus. It may also be noted that we are ignoring a few other tasks that are part of sentence comprehension. An example is the inference and interpolation of gaps or unspecified parts of the meaning of a sentence. It may also be noted that this is not a strictly hierarchical task decomposition since many subtasks do not exist merely in service of the parent task. The dependencies between these tasks are largely mutual. For instance, the tasks of syntactic structure assignment and semantic structure assignment are mutually dependent and, in a sense, subtasks of each other. The task decomposition structure is shown in Figure 4.1. It may be noted that the term "discourse analysis" is used in Figure 4.1 to refer to all subtasks of natural language understanding other than sentence understanding.10
[Figure 4.1: Task Decomposition and Information Flow in Sentence Understanding. The diagram decomposes Natural Language Understanding into Sentence Understanding and Discourse Analysis; Sentence Understanding into Word Meaning Selection and Word Meaning Composition; Word Meaning Selection into Lexical Entry Retrieval, Word Category Selection, and Word Sense Disambiguation; and Word Meaning Composition into Syntactic Analysis and Semantic Analysis, with information-flow links among the subtasks.]
We should also make it clear at this point that the dependencies between these tasks do not suggest a temporal sequencing of the execution of these tasks (or of the modules that accomplish those tasks in a computational implementation), especially since the dependencies are mutual. For instance, because we said semantic analysis uses syntactic structure, it does not follow that semantic analysis is performed after syntactic analysis has been completed. The importance of not drawing such conclusions about the architecture of the sentence comprehension system cannot be overemphasized.
Footnote 10: For an example of a model of discourse understanding that includes explanation and other reasoning processes, see Moorman and Ram (1994).
4.3.1 Incremental Communication between Subtasks
It can be seen from the task decomposition above that syntactic and semantic analyses are needed for both word meaning selection and word meaning composition. However, it turns out that neither syntactic nor semantic analysis can be done independently of the other because of ambiguities. Syntactic analysis needs, among other things, the feedback that semantic analysis can provide in order to assign a unique hierarchical structure to a sentence. Semantic analysis, on the other hand, needs the guidance of syntax so as not to have to try out every possible composition of the set of word meanings in the sentence. Because of the problem of ambiguity and the consequent combinatorial explosion of the number of possible interpretations of a sentence, syntactic and semantic analyses must communicate with each other incrementally and provide information to each other that will help reduce the number of interpretations being considered by the other. Incremental communication between syntax and semantics can produce incremental interpretations of a sentence while adequately handling the different kinds of ambiguities in a natural language. We will analyze the issue of syntax-semantics communication and what it has to say about the control of processing in sentence understanding in greater detail in the next chapter (Chapter 5).
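A schematic rendering of this incremental communication is given below. In this hypothetical Python sketch, syntax proposes candidate attachments for each new word, semantics feeds back by pruning the compositions it cannot accept, and when the two conflict the processor carries the alternatives forward (a delayed decision). The propose and filter functions are stand-ins, not the thesis's algorithms.

    # Hypothetical sketch of word-by-word syntax-semantics communication.
    def incremental_interpretation(words, propose_attachments, semantically_acceptable):
        interpretations = [()]                 # each interpretation is a tuple of commitments
        for position, word in enumerate(words):
            candidates = []
            for partial in interpretations:
                # Syntax: propose ways of attaching the new word to each partial structure.
                for attachment in propose_attachments(partial, word, position):
                    candidates.append(partial + (attachment,))
            # Semantics: feed back, pruning compositions it cannot accept.
            pruned = [c for c in candidates if semantically_acceptable(c)]
            # If semantics rules out everything, keep the syntactic candidates and
            # defer the decision (cf. delayed decisions in Chapter 3).
            interpretations = pruned or candidates
        return interpretations

    if __name__ == "__main__":
        # Tiny demonstration with dummy propose/filter functions.
        def propose(partial, word, pos):
            return ["attach:%s@%d" % (word, pos)]   # one dummy attachment per word
        def acceptable(commitments):
            return True                             # accept everything in the demo
        print(incremental_interpretation(["the", "bugs", "moved"], propose, acceptable))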
4.3.2 Example
To illustrate the above tasks and subtasks, consider the sentence
(19)
The bugs moved into the lounge were found quickly.
The input in this example contains the following information:
- The content words "bugs," "moved," "lounge," "found," and "quickly"
- Word order, which specifies which word precedes and is followed by which others
- The word "The" suggesting a definite reference (not necessarily already existing in the discourse context, as in the case of the bugs in this sentence, for instance) for "bugs" and for "lounge"
- The preposition "into" suggesting a destination location role for the noun "lounge"
- The auxiliary "were" denoting past tense, plurality, and passivity
- The inflections on the words, such as the "s" on "bugs," the "ed" on "moved," and the "ly" on "quickly," suggesting plurality for the noun category of "bugs," third person present tense for the verb category of "bugs," past tense or past participle for the verb "moved," and the adverb form of "quick."
Given this information, the task of understanding this sentence is to choose the appropriate meanings of the words "bugs," "moved," "lounge," "found," and "quickly" and compose those meanings to form a meaning for the entire sentence. Consider the task of word meaning selection for the word "bugs." Lexical entry retrieval for the word "bugs" tells us that the word has two possible syntactic categories: noun and verb. In order to perform word category selection, we need to determine the syntactic structure of the sentence up to that point. Syntactic analysis tells us that after seeing the determiner "The," we are expecting a noun to follow, not a verb (see Figure 4.2). Using this syntactic context, we can select the noun category for "bugs." Word sense disambiguation is now the task of choosing between the two possible meanings of the noun form of "bugs": insect bugs and electronic microphones (see Table 4.1). Though the insect meaning may be more common than the microphone meaning in many contexts, such frequency information does not always correctly resolve the word sense ambiguity. At this point in sentence
understanding, we do not have the semantic information to perform word sense disambiguation. Our best bet is to postpone that task and move ahead in reading the sentence. That is, we carry along both meanings of the noun "bugs" as we read the next word(s). The word "moved" has a verb entry in the lexicon with the corresponding meaning of the event move. We now try to compose the meanings of the words "bugs" and "moved." Syntactically there are two ways of composing these two parts of the sentence (the noun phrase "the bugs" and the verb phrase "moved"). We can attach the verb directly as the main verb of the sentence or through a reduced relative clause to the noun phrase. Semantically, either meaning of "bugs" is compatible with the relative clause attachment, the corresponding semantic composition being that of a theme role for the meanings of "bugs" in the move event. However, only the insect meaning of "bugs" is compatible with the direct attachment since only insect "bugs" are in the class of animate objects which can move themselves (i.e., be agents of move events, as per a piece of conceptual knowledge). This ambiguity is resolved in favor of the direct attachment since there is a syntactic preference for the attachment which completes the sentence structure by filling the expectation for a verb phrase at that point in the sentence. As a result of this choice, the microphone meaning of "bugs" is deactivated since it is incompatible with the composition selected by syntactic analysis. The analysis proceeds further to produce the composite meaning of the first part of the sentence, "The bugs moved into the lounge." However, at this point, there is no way of composing the meaning of the next word "were" with the existing sentence structure. Assuming that the sentence is syntactically correct (i.e., grammatical), we can hypothesize that an erroneous decision has been made. Going back and looking at previous decisions, we find that we decided to compose "bugs" and "moved" by making "moved" the main verb. We can identify that this is the error by noting that we do not currently have a way of attaching the new verb phrase containing "were" and that the decision to make "moved" the main verb was another attachment of a verb phrase (with the verb "moved"). Perhaps if we made a different decision at that earlier, related choice, there would be room left in the syntactic structures to compose the meaning of the phrase starting with "were." We do this by repairing the previous syntactic structures. We attach "moved" (and the meaning of "into the lounge," which has already been composed with the meaning of "moved") as a reduced relative clause to "bugs" and thereby recover from the error. We now have room for composing the meaning of "were found" with the earlier meanings. In this process of error recovery, while removing the previous attachment of "moved" with "bugs," we reconsider our associated decision of deactivating the microphone meaning of "bugs." Since there is no longer a reason to rule out the microphone meaning, as either type of "bugs" can be the theme of being moved by someone else, we now bring back the deactivated meaning of "bugs." Thus, we reintroduce the ambiguity in the meaning of "bugs" and undo the word sense disambiguation performed earlier.11 This example not only shows the need for the error recovery subtask, it also shows some of the complexities in the interactions between these tasks.
Much of this would not be necessary, but for the functionally (and cognitively) motivated constraint of incremental interpretation. Figures 4.2, 4.3, and 4.4 and Table 4.1 show the intermediate results of processing this example after every word. The Figures show the syntactic structures only and the Table shows the corresponding meanings and the semantic roles assigned to those meanings.
Footnote 11: From this example, we can also derive a psychological prediction generated by COMPERE: a word sense ambiguity, such as in the "bugs" example, remains unresolved at the end of the sentence, even though it was resolved temporarily until the resolution was found to be in error. See Chapter 10 for a discussion of COMPERE's predictions.
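The repair step in this walkthrough can be sketched schematically. The Python fragment below is a hypothetical illustration whose data structures are not COMPERE's: when "were" cannot be attached, the earlier main-verb commitment for "moved" is repaired into a reduced relative clause attachment and the deactivated microphone meaning of "bugs" is reactivated, reintroducing the word sense ambiguity.

    # Hypothetical sketch of the error-recovery step for sentence (19).
    state = {
        "attachments": {"moved": "main-verb"},      # commitment made after "moved"
        "senses": {"bugs": ["insect"]},             # "microphone" was deactivated
        "deactivated": {"bugs": ["microphone"]},
        "roles": {"bugs": ("Agent", "move"), "lounge": ("Location", "move")},
    }

    def attachable(word, state):
        """The word 'were' starts a new verb phrase; there is no room for it while
        'moved' occupies the main-verb slot."""
        return not (word == "were" and state["attachments"].get("moved") == "main-verb")

    def repair(state):
        """Reattach 'moved' as a reduced relative clause and reconsider the
        word-sense decision that depended on the old attachment."""
        state["attachments"]["moved"] = "reduced-relative-clause"
        # Either kind of "bugs" can be the theme of being moved by someone else,
        # so the microphone sense comes back and the ambiguity is reintroduced.
        state["senses"]["bugs"] += state["deactivated"].pop("bugs", [])
        state["roles"]["bugs"] = ("Theme", "move")
        return state

    if not attachable("were", state):
        state = repair(state)

    print(state["senses"]["bugs"])        # ['insect', 'microphone']
    print(state["attachments"]["moved"])  # 'reduced-relative-clause'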
[Figure 4.2: Syntactic Structures After Each Word of Sentence 19 (Part 1). Parse trees (a) through (e) show the syntactic structure after each of the first five words, "The," "bugs," "moved," "into," and "the."]
[Figure 4.3: Syntactic Structures After Each Word of Sentence 19 (Part 2). Parse trees (f) and (g) show the structure after "lounge" and after "were," with "moved into the lounge" reattached as a reduced relative clause.]
[Figure 4.4: Syntactic Structures After Each Word of Sentence 19 (Part 3). Parse trees (h) and (i) show the structure after "found" and after "quickly."]
Table 4.1: Semantics After Each Word of Sentence 19.

  Part  After word  Semantics
  (a)   "The"
  (b)   "bugs"      insect, microphone
  (c)   "moved"     insect: Agent of move: Event
  (d)   "into"      -- same as above --
  (e)   "the"       -- same as above --
  (f)   "lounge"    -- above -- and lounge: Location of move: Event
  (g)   "were"      insect, microphone: Theme of move: Event; lounge: Location of move: Event
  (h)   "found"     insect, microphone: Theme of find: Event; -- same as above --
  (i)   "quickly"   -- same as above --
CHAPTER V
SYNTAX-SEMANTICS COMMUNICATION AND SENTENCE PROCESSING ARCHITECTURES

What really seems to be going on is a coordinated process in which a variety of syntactic and semantic information can be relevant, and in which the hearer takes advantage of whatever is more useful in understanding a given part of a sentence.
T. Winograd, 1973

In the previous chapter, we identified the subtasks of syntactic and semantic analyses and showed how they are necessary for both word meaning selection and word meaning composition. We also noted that, in general, neither syntactic nor semantic analysis can be completed without useful information provided by the other. In this chapter, we examine further the issue of syntax-semantics communication in incremental sentence interpretation. We begin by noting reasons other than functional analysis for proposing such incremental communication and analyze the nature of the communication. Then we investigate different sentence processing architectures and analyze the kinds of communication that different architectures afford. Finally, we arrive at a particular architecture for enabling incremental communication between syntax and semantics.
5.1 The Need for Communication
There are several reasons for proposing a sentence processing architecture in which syntactic and semantic processing communicate with each other incrementally during sentence comprehension.
5.1.1 Functional Motivation: Ambiguity
From a functional point of view, the mapping from the input sentence to its syntactic structure or to its semantic representation is underconstrained and highly ambiguous. The subtask of semantic (or word meaning) composition in the absence of syntactic guidance is largely combinatorial. Similarly, the assignment of unique syntactic structure is often impossible without semantic feedback that helps constrain the possible structures. Parsing programs get bogged down by the multiplicity of possible syntactic structures when processing a sentence in a natural language. The parsing strategies, left to themselves, would encounter an enormous number of syntactic ambiguities even in sentences that human understanders do not have much difficulty with. For example, as already seen in Chapter 4, the sentence below leads a syntactic parser that does not use semantic feedback to over one hundred possible syntactic structures (Jacobs et al., 1990).1
(1)
A form of asbestos once used to make Kent cigarette filters has caused a high percentage of cancer deaths among a group of workers exposed to it more than 30 years ago, researchers reported.
Syntactic processing needs external help to eliminate many of the alternative parse structures early on. Information enabling such early commitment can come from the results of semantic and conceptual analyses. In order for such information to be used to reduce the complexity of syntactic processing, syntax and semantics must communicate early on and incrementally. Semantic feedback is an important source of information that a parser could use to deal with local ambiguities in syntax. However, it is difficult to devise a systematic communication mechanism for interactive syntax and semantics. The focus of investigation in natural language processing research has moved away from the issue of semantic feedback primarily because of the difficulty of getting the communication between syntax and semantics to work in a clean and systematic manner. Nonetheless, it is unquestionable that semantics does in fact provide useful information which, when fed back to syntax, helps eliminate many an alternative syntactic structure. Since natural languages are replete with ambiguities at all levels, it seems intuitive that a processing architecture with incremental interaction between the levels of syntax and semantics, one which makes the best and immediate use of both syntactic and semantic information to eliminate many alternatives, would win over either a syntax-first or a semantics-first mechanism.
Footnote 1: One could argue that modern computers may not get bogged down by a few hundred alternatives. While it is true that computing speed ameliorates the problem to a certain extent, the problem of selecting one interpretation from a set has more to it than just the number of alternatives in the set. If there are five points with ambiguous attachments in a sentence, for example, with two alternatives at each ambiguity, an incremental processor could apply locally available knowledge at each ambiguous point to make a selection then and there. For example, a syntactic preference may resolve the first ambiguity, a semantic constraint might resolve the second, and so on. If decisions are deferred until the end, on the other hand, the processor must weigh and combine the evidence provided by the various pieces of information against (various combinations of) each other in order to select the alternative that is best given all those pieces of information. The additional knowledge required to combine pieces of information arising from disparate sources of knowledge (such as syntax and semantics) is not available in practice. See the discussion on the arbitration algorithm in Chapters 7 and 8 for further details.
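The arithmetic behind the footnote's point is simple and worth making explicit. With k ambiguity points and two alternatives at each, deferring every decision leaves 2^k complete interpretations to be weighed against one another at the end, whereas an incremental processor makes only k local decisions, each with locally available knowledge. The following lines (not from the thesis) just spell out the numbers for k = 5:

    # Deferred versus incremental decision making at 5 two-way ambiguity points.
    k = 5
    deferred = 2 ** k      # 32 complete interpretations to compare at the end
    incremental = k        # 5 local choices made as the ambiguities arise
    print(deferred, incremental)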
5.1.2 Cognitive Constraints
As seen in Chapter 3, experimental results in psycholinguistics have demonstrated a number of phenomena such as immediate semantic influence on syntactic garden paths, error recovery, delayed decisions, and early commitment. These behaviors can only be explained by an architecture that permits incremental interaction between syntax and semantics. In order to account for the effect of context on the presence or absence of garden-path effects in reduced relative-clause sentences such as the pair (2) and (3) below, the sentence processor must not only combine top-down and bottom-up influences on sentence comprehension arising from syntactic sources of knowledge in the parsing algorithm, but also combine information arising from semantic and conceptual analyses with syntactic preferences. A cognitive model of human sentence processing cannot do without incremental interaction between syntactic and semantic analyses.
(2) The officers taught at the academy were very demanding.
(3) The courses taught at the academy were very demanding.
5.1.3 Generativity
In addition to the other functional and cognitive constraints, we want our model of sentence comprehension to be computationally minimal. That is, it should perform its task with the fewest types and least amount of knowledge and with the minimal set of processing mechanisms that will enable sentence comprehension. Such a minimal processor can be said to have a high degree of generativity since it can deal with a variety of input sentences using only a concise representation of the requisite knowledge. Maintaining such generativity entails independence between different sources of knowledge, as described below. Combining knowledge obtained from these separate sources dynamically during sentence interpretation in turn requires incremental communication between syntax and semantics. In order to illustrate the need for independent representation of different types of knowledge and their dynamic integration during sentence interpretation by means of incremental communication between syntactic and semantic processes, let us consider the nature of the different types of knowledge and examine an alternative to the above solution that would not require incremental communication between separable syntactic and semantic processes.
5.1.3.1 Integrated Representations
Any type of linguistic knowledge is a set of regularities in the world. For instance, syntactic knowledge is a set of regularities in the structure of natural language sentences. Semantic knowledge is the set of regularities in the relationships between structure and meaning in a natural language. Conceptual knowledge is the set of regularities in the objects, events, and other components of meaning in the world, irrespective of the linguistic forms they take. By independently representing and processing each type of regularity (i.e., knowledge) used in language understanding, we can maximize the generativity of the language processor. Otherwise, the processor would be specific to some particular sublanguage in some domain. For example, one can combine the regularities a priori for a particular domain or a particular sublanguage in the form of a semantic grammar or some other integrated construction. But by combining two kinds of regularities in predetermined ways, the language processor loses certain other combinations of the independent regularities. If, on the other hand, every possible combination were included in the integrated representation, then of course the resulting search space would be just as large and there would be no advantage to combining the different types of knowledge a priori. Prior integration of syntactic and semantic knowledge to suit a domain and sublanguage does result in a loss of generativity for the language processor. For example, one piece of syntactic knowledge tells us that, given a subject noun phrase followed by a verb that has a subcategory ambiguity, being in either a simple past tense form or a past participle form, we prefer the simple past form since this would satisfy the expectation for a main verb and thereby complete the sentence structure. In fact, this knowledge can simply be stated as a syntactic preference for an expected structure over one that would leave the expectation unfulfilled. A piece of semantic knowledge tells us that subject noun phrases in active voice sentences take up agent roles if the event represented by the verb accepts the subject noun as an agent. A piece of conceptual knowledge tells us that for a verb such as "taught," the agent must be an adult human being. We could either represent these three pieces of knowledge independently as stated above or combine them into an integrated representation for the particular situation of the verb "taught." We could then have one integrated piece of knowledge that tells us: "if the subject noun of 'taught' in an active voice sentence is an adult, then prefer the simple past tense form of 'taught'; otherwise, prefer the past participle form." If we do that, how could the sentence processor understand sentence (4) below?
(4)
The children taught at the academy.
In this sentence, the conceptual knowledge is violated and "children" does take up the agent role though it does not represent adult animate entities. Since there is a strong syntactic preference for assigning a complete syntactic structure to a sentence by the time the last word of the sentence has been processed, the conceptual preference for a non-agent role for "children" must be disregarded, for otherwise we would be left with an incomplete sentence structure still waiting to see a main verb. We would be excluding this particular composition of the meanings involved unless we enumerate every possible integration of the above pieces of knowledge. If we represent them as independent pieces of knowledge, we could combine them during processing not only to account for the presence or absence of garden-path effects in the two sentences (2) and (3) above, but also to understand the above sentence (4) by combining them in a different way (this time disregarding the conceptual knowledge, since the syntactic structure of the above sentence overrides the conceptual preference). This example shows that there are rules, such as the one here that says syntactic preferences overrule conceptual ones under certain conditions (such as at the end of a sentence), and therefore the processor must be able to separate the syntactic and conceptual pieces of knowledge. If they had been integrated a priori, there would be no way of separating them out and excluding the conceptual constraint in favor of the syntactic one. (See Chapter 10 for a discussion of the psycholinguistic implications of how COMPERE deals with sentence (4) above.) Once we decide to represent different types of knowledge independently of one another, there must be a way of combining them dynamically during sentence processing in order to adhere to the integrated processing principle and use all types of available information as soon as possible in making sentence processing decisions. This dynamic combination cannot be deferred until the end and must be incremental, as required by the other functional and cognitive reasons above.
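The combination at processing time can be sketched as follows. In this hypothetical Python fragment (not COMPERE's implementation), the three pieces of knowledge about "taught" are represented as separate functions and are consulted together only when a decision is made, so that the end-of-sentence syntactic preference can override the conceptual preference for a sentence like (4).

    # Hypothetical sketch: three independently represented pieces of knowledge
    # about "taught," combined dynamically rather than integrated a priori.

    def syntactic_preference(at_sentence_end):
        # Prefer the reading that completes the sentence structure (main verb,
        # simple past) over one that leaves the expectation for a verb unfulfilled.
        return "simple-past-main-verb" if at_sentence_end else None

    def semantic_rule(voice):
        # Subject noun phrases of active-voice sentences take the agent role
        # if the event accepts the subject noun as an agent.
        return "subject-as-agent" if voice == "active" else "subject-as-theme"

    def conceptual_preference(subject_concept):
        # The agent of a teach event should be an adult human being.
        return subject_concept in {"adult"}

    def interpret(subject_concept, voice, at_sentence_end):
        syntactic = syntactic_preference(at_sentence_end)
        conceptual_ok = conceptual_preference(subject_concept)
        if syntactic == "simple-past-main-verb":
            # At the end of the sentence the syntactic preference wins, even when
            # the conceptual preference is violated (as with "children").
            role = semantic_rule(voice)
            return ("main-verb reading", role,
                    "conceptual preference satisfied" if conceptual_ok
                    else "conceptual preference violated")
        # Otherwise let the conceptual preference push toward the reduced relative.
        return ("reduced-relative reading", "subject-as-theme", "by conceptual preference")

    print(interpret("adult", "active", at_sentence_end=True))
    print(interpret("child", "active", at_sentence_end=True))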
5.2 The Nature of the Communication
We now examine the nature of the interactions between different types of knowledge and see why we need an arbitrating process to control the interactions. We have already seen that syntax-semantics communication cannot be substituted for by an a priori integration of syntactic and semantic knowledge. What form does this communication then take? We begin by analyzing the content of the communication by looking at the different knowledge sources that contribute to decision making in sentence interpretation. First, we consider how word category information and non-lexical syntactic knowledge (in the form of a grammar) interact with each other. Then we look at the interaction of word meaning and other semantic and conceptual knowledge with syntactic knowledge and processing. We then argue that, in order to address the functional and cognitive motivations for incremental communication between syntax and semantics, the computational model must employ a single arbitration process that can integrate independently proposed syntactic and semantic interpretations to resolve ambiguities.
5.2.1 Interaction between Knowledge Sources within Syntax
Syntactic processing employs both knowledge of the word, in terms of its category and other information arising from its inflections, and knowledge of a grammar for the phrases, clauses, and sentences in the natural language. The interactions between these sources of information, in terms of when each piece of knowledge should be applied and when and how they influence the process of syntactic structure assignment, determine the parsing algorithm to be employed for syntactic analysis. In order to meet the cognitive and functional constraints on the sentence processor, it turns out that we can neither start with word information alone and worry about the grammar later (i.e., bottom-up processing), nor start with the grammar alone and look at word information in the input at a later time (i.e., top-down processing). The sentence processor must employ a combination of bottom-up and top-down parsing to assign syntactic structures to sentences while minimizing the number of different assignments possible (see Chapter 6). Steedman (1989) claims that standard theories of parsing using phrase-structure rules are incompatible with incremental processing, given the requirement that the steps taken by the parser be psychologically real. Data-driven models of language comprehension with bottom-up strategies appear to be compatible with incremental semantic interpretation. Incremental comprehension can be viewed as a bottom-up strategy that interprets each successively larger constituent as it is built from the next
word in the sentence. Though it seems possible to apply an incremental bottom-up strategy to phrase-structure rules, the resulting process will not be incremental since the bottom-up parser waits until it has seen every daughter of a constituent before interpreting it. Phrase-structure rules can be applied to carry out an incremental interpretation only if sentences have a left-branching structure; however, this is not common among natural languages, including English (Steedman, 1989). A top-down strategy, on the other hand, would force the processor to commit to a whole constituent before analyzing the parts of the whole. Making such commitments when the processor does not have the necessary information results in unwarranted backtracking. What we need is a bottom-up parser combined with top-down guidance that can make early commitments before actually seeing every part of a constituent so that semantic interpretation can be incremental. Such early commitments may be made by employing top-down influence from a variety of types of knowledge.
5.2.1.1 Types of Top-Down Guidance to Bottom-Up Parsing
Information providing top-down guidance to the parser can be of three types: syntactic expectations arising from grammatical information about the categories involved (Steedman, 1989), general structural principles (Frazier, 1989), and feedback from semantics, reference and discourse interpretation (see, for example, Crain and Steedman, 1985; Taraban and McClelland, 1988).
5.2.1.1.1 Syntactic Expectation: Grammatical information tells the processor about the arguments that must follow before the current constituent can be complete (and hence grammatical). For instance, after seeing a noun phrase, a verb must follow for the sentence to be complete. We can say that the processor can expect to see a verb phrase at this point. The parser can use such syntactic expectations (also called predictions) to make early commitments at syntactic ambiguities. This grammatical preference for expected structures results in the same behavior as expected by the minimal attachment principle and explains garden-path behavior in reduced relative clause sentences such as (2).
(2) The officers taught at the academy were very demanding.
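As an illustration, here is a minimal sketch (in Python, with an illustrative toy grammar that is not taken from the thesis) of how such predictions can be derived from phrase-structure rules: once some prefix of a rule's right-hand side has been recognized, the remaining symbols become expectations that the parser can commit to early.

    GRAMMAR = {
        "S":  [["NP", "VP"]],
        "NP": [["Det", "N"], ["NP", "RelClause"]],
        "VP": [["V", "NP"], ["V", "PP"]],
    }

    def expectations(category, seen):
        # Return the daughters still expected after `seen` daughters of `category`.
        predictions = []
        for rhs in GRAMMAR.get(category, []):
            if rhs[:len(seen)] == seen:              # rule is consistent with the input so far
                predictions.append(rhs[len(seen):])  # the rest of the rule is predicted
        return predictions

    # After an initial NP, the grammar predicts a VP; a parser that prefers this
    # expectation commits to the main-clause reading and is garden-pathed by
    # reduced relatives such as sentence (2).
    print(expectations("S", ["NP"]))   # [['VP']]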
5.2.1.1.2 Semantic and Conceptual Preference: One way to introduce extra-grammatical top-down influence on attachment ambiguities is through incremental semantic interaction. Semantic and discourse processes feed back to syntactic processing and exert preferences for some attachments over others (e.g., Crain and Steedman, 1985; Stowe, 1991). For instance, semantic feedback can tell the processor that the prepositional phrase (PP) in sentence (5) must be attached to the noun phrase "the soldier" and not the verb "saw," since a horse cannot be used as an instrument for seeing.
(5) The officer saw the soldier with the horse.
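The following is a minimal sketch of this kind of selectional feedback (the lexicon entries and function names are illustrative assumptions, not COMPERE's actual representations): a preference attached to the verb's instrument role rules out "horse" as a filler, so the PP is attached to the noun phrase instead.

    # Selectional preference on the instrument role of "see" (illustrative only).
    SELECTIONAL_PREFERENCES = {
        "see": {"instrument": lambda noun: noun in {"telescope", "binoculars"}},
    }

    def attach_pp(verb, pp_object):
        # Decide whether "with <pp_object>" modifies the verb or the preceding noun phrase.
        instrument_ok = SELECTIONAL_PREFERENCES[verb]["instrument"](pp_object)
        return "VP attachment (instrument)" if instrument_ok else "NP attachment (modifier)"

    print(attach_pp("see", "horse"))      # NP attachment (modifier)
    print(attach_pp("see", "telescope"))  # VP attachment (instrument)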
5.2.1.1.3 Structural Preference: The parser must also be able to exert purely structural preferences that are independent of the categories and lexical items involved in the ambiguous parts of the sentence. Examples of such preferences are right association and minimal attachment (Frazier, 1987). Such syntactic generalizations allow the syntactic processor to make early commitments for optional adjuncts as well (which were not expected), thereby explaining several syntactic phenomena (Frazier, 1989). There is psychological evidence from structural ambiguity resolution which demonstrates the need for this form of top-down influence. For example, Stowe (1991; see also Blackwell and Bates, 1994) has shown that the human sentence processor delays a decision (i.e., does not make an early commitment) when there is a conflict between syntactic and semantic preferences at a structural ambiguity (i.e., when top-down influences from grammar or structural principles and those from semantic feedback contradict each other). Experiments showed that people continue to delay the decision and pursue multiple interpretations until they reach resource limits. At the limit, they make a choice based only on syntactic preferences. This result would be left unexplained but for the presence of a top-down influence of structural preferences. Other psychological studies of syntactic ambiguity resolution have also found certain limitations on the role of semantic and contextual information in resolving syntactic ambiguities, and hence the need for purely syntactic preferences to explain syntactic disambiguation under those limitations (King and Just, 1991; MacDonald, Just, and Carpenter, 1992; see also the review of psycholinguistic literature in Chapter 3).
5.2.1.2 The Control of Parsing
Top-down guidance from syntactic expectations and structural preferences can be integrated with a bottom-up control of parsing by employing a form of left-corner parsing (Abney and Johnson, 1991). We have developed a variant of left-corner parsing by adding to it the virtues of head-driven parsing. The resulting mechanism, which we call Head-Signaled Left Corner Parsing, produces the right sequence of syntactic commitments to account for a variety of data. The parser has been implemented in the COMPERE model. The parsing algorithm and the theory of parsing are further described in Chapter 6 and elsewhere (Mahesh, 1994b).
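To make the control structure concrete, here is a minimal sketch of plain (textbook) left-corner recognition; it is not the Head-Signaled variant developed in this thesis, and the grammar and lexicon are illustrative assumptions. Each word is scanned bottom-up, projected to a parent category via a rule whose left corner it matches, and the rule's remaining daughters become top-down predictions that license early commitments.

    GRAMMAR = [
        ("S",  ("NP", "VP")),
        ("NP", ("Det", "N")),
        ("VP", ("V", "NP")),
    ]
    LEXICON = {"the": "Det", "officer": "N", "saw": "V", "soldier": "N"}

    def parse(goal, words):
        # Scan the next word bottom-up and try to connect it to the goal category.
        if words and words[0] in LEXICON:
            yield from complete(LEXICON[words[0]], words[1:], goal)

    def complete(cat, words, goal):
        if cat == goal:                      # the bottom-up constituent satisfies the prediction
            yield words
        for parent, rhs in GRAMMAR:
            if rhs[0] == cat:                # cat is the left corner of this rule:
                for rest in parse_seq(rhs[1:], words):       # predict the remaining daughters
                    yield from complete(parent, rest, goal)  # then project the parent upward

    def parse_seq(goals, words):
        if not goals:
            yield words
        else:
            for rest in parse(goals[0], words):
                yield from parse_seq(goals[1:], rest)

    sentence = "the officer saw the soldier".split()
    print(any(rest == [] for rest in parse("S", sentence)))   # True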
5.2.2 The Need for an Arbitration Mechanism
While top-down and bottom-up influences on sentence comprehension arising from syntactic sources of knowledge can be combined in the parsing algorithm mentioned above, the sentence processor must also combine information coming from semantic and conceptual analyses with syntactic interpretations. As already seen, this communication between syntax and semantics is necessary to account for evidence from interaction, delayed decisions, early commitment, and error recovery. The left-corner parsing algorithm merely identifies the points when the communication ought to occur, but it does not tell us how the communication is handled or how conflicts are resolved in the best interests of the functional and cognitive constraints on the behavior of the sentence interpreter. One way to do this, as mentioned earlier, is to integrate the representations of the different types of knowledge a priori, as in semantic grammars and what are called grammatical constructions in certain other models (e.g., Jurafsky, 1991). Such an approach suffers from reduced generativity and other disadvantages. In order to avoid losing generativity (see Section 5.1.3 earlier), the sentence processor must keep the knowledge sources separate and introduce an arbitration mechanism that dynamically combines information arising from independent syntactic and semantic sources, resolving any conflicts that might arise in the process.

The arbitrator needs to combine information coming from independent syntactic and semantic sources which talk in different terms. Syntax describes its interpretations in terms of grammatical relations such as subject and object roles, while semantics talks in terms of thematic roles.2 The arbitrator must establish correspondences between syntactic and semantic interpretations (and between the decisions made by the two).

2 For purposes of illustration and simplicity, we are employing thematic-role assignment using selectional preferences from the lexicon as our theory of semantics. Our approach, however, is not limited to thematic roles. For an example of a more structured theory of semantics, see Peterson and Billman (1994).
5.2.3 Translation is No Good
In functional terms, syntactic and semantic sources of information are capable of independently proposing (or generating) alternative interpretations of a sentence. However, in order to reduce the number of alternatives they must consider, and to make commitments to interpretations early enough, they must influence each other's decisions. The nature of the communication between syntax and semantics can either take the form of a message-passing architecture or can be achieved through a common arbitrator. Since syntax and semantics are knowledge of different kinds of regularities, there must exist a translating mechanism for the communication to work. The translator bridges the communication gap between the languages of syntactic and semantic attachments and preferences (coming from the corresponding regularities) by establishing correspondences between the different representations and between decisions in syntax and semantics. For instance, the translator may translate the syntactic decision of attaching the next word as the main verb of the sentence to the corresponding semantic decision of assigning the agent role to the subject noun in the event represented by the verb. Or, in the other direction, the translator might translate a semantic preference to its corresponding syntactic decision. If the semantic preference is for not allowing the next word to be the filler for the instrument role in an event, the corresponding syntactic decision may be that the prepositional phrase in which the word is a noun cannot be attached to the verb phrase corresponding to that event.

There are two problems with this approach. First, it is a procedural solution with well-known disadvantages over a comparable declarative solution.3 Second, during error recovery in incremental sentence comprehension, changes made in syntax and semantics must correspond to each other in order to ensure consistency of the resulting interpretation. This is problematic if the only representation of the correspondence is in the translation procedures, since correspondence information is not present in the representations that need to be manipulated by error recovery mechanisms. Also, the only kind of recovery possible is through reprocessing, because there are no representations of intermediate decisions to repair or backtrack to. For example, consider the sentence
(6) The bugs moved into the lounge.

The semantic representation corresponding to the main-verb interpretation of "moved" must have the insect meaning of "bug" in the agent role of the move event, with the destination location role being filled by the "lounge." However, if the sentence continues on to the following,
(7) The bugs moved into the lounge were found quickly.

the previous syntactic structure must be repaired to reinterpret the sentence so that the "bugs" are now in the theme role of the move event (as well as in the find event). During this error recovery, we should also bring back the microphone or electronic spying device meaning of "bugs," since that can no longer be ruled out in the theme role of move. In other words, while we repair the syntactic structure to remove the verb "move" from the main sentence structure, moving it to a reduced relative clause, we have to do the following corresponding things in semantics: replace the agent thematic role of "bugs" with the new theme role, and bring back the previously deactivated microphone meaning of "bugs." If the sentence processor merely employed a translating procedure to map syntax to semantics, it would have to practically reinterpret the sentence during error recovery, since there was no record of which syntactic commitments corresponded to which semantic decisions and vice versa. There would be no record of decision points intermediate between the syntactic structures and their thematic roles that could be used to backtrack to so as not to redo the entire job of sentence interpretation. The correspondence between syntax and semantics established through the translation procedure would not be represented anywhere for use in later error recovery.

3 For a discussion of procedural versus declarative knowledge representation, see Barr and Feigenbaum, 1981.
5.2.4 The Alternative: Intermediate Roles
An alternative to this is a solution in which the gap is bridged via intermediate decision points in the mapping process and their declarative representations. These intermediate points connect syntactic and semantic decisions and interpretations in a network of roles and enable not only translation but also error recovery and other coherent and consistent manipulations of the syntactic and semantic interpretations of a sentence. This view of bridging the gap between syntactic and semantic structures through intermediate roles leads to a uniform representation of syntactic and semantic knowledge (and of sentence interpretations) in terms of a spectrum of roles that range from grammatical roles such as NP, subject, object, VP, and so on, all the way to thematic roles such as event, agent, and theme. Such a uniform representation of syntactic and semantic knowledge enables syntactic and semantic interpretations to be integrated through intermediate roles and syntactic and semantic processing to be integrated through a unified arbitrator. The representational primitive in the uniform representation is a node that specifies part-of and has-part relations to other nodes, preconditions on these relations, and expectations that could be generated from such relations. The intermediate roles, the uniform representation, and their use in syntax-semantics arbitration are further described in Chapters 7 and 8 and by Mahesh and Eiselt (1994).

In the rest of this chapter, we deal with the issue of computational architectures that permit the kind of incremental communication between syntax and semantics described above. We first analyze the spectrum of possible configurations of syntactic and semantic knowledge and processes and derive the particular architecture that is best suited to incremental communication between them through an arbitrator.
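The sketch below shows one way such a representational primitive might look (the field names and the example node are assumptions for illustration, not COMPERE's actual data structures): a single node type carries part-of and has-part relations, preconditions on those relations, and expectations derived from the unfilled has-part relations, so that grammatical roles and thematic roles can be instances of the same type.

    from dataclasses import dataclass, field
    from typing import Callable, Dict, List

    @dataclass
    class RoleNode:
        name: str                                             # e.g. "subject", "VP", "agent"
        part_of: List[str] = field(default_factory=list)      # larger roles this node can fill
        has_parts: List[str] = field(default_factory=list)    # roles this node is composed of
        preconditions: Dict[str, Callable[[str], bool]] = field(default_factory=dict)

        def expectations(self, filled: List[str]) -> List[str]:
            # Expectations are derived from the has-part relations not yet filled.
            return [p for p in self.has_parts if p not in filled]

    # An intermediate role linking the grammatical and thematic ends of the spectrum:
    # a subject is part of a sentence and may map onto the agent of the verb's event.
    subject = RoleNode("subject", part_of=["S", "agent"], has_parts=["NP"])
    print(subject.expectations(filled=[]))   # ['NP']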
5.3 Sentence Processing Architectures

There have been many computational architectures proposed in various models of sentence understanding. In order to be able to compare them and to see what kind of communication each architecture affords or does not afford, let us try to map the different dimensions along which architectures of sentence processors could differ from one another. The architecture of a sentence processor has the following architectural elements:

Knowledge Sources: One or more representations of different kinds of knowledge, including those built while processing sentences.

Processes: One or more processes, where a process is a set of operators on different kinds of knowledge which are embedded in a control structure. Control structures are made of control elements such as sequence, recursion, coroutines, and so on.

Note that we do not talk about processors. A processor is an element of the implementation, not an architectural element (Marr, 1982). However, we sometimes do refer to the entire architecture as the "language processor." Architectures connect their architectural elements together in various ways, such as in series, in parallel, and so on, and in turn enable or disable different kinds of communication between the processes.
5.3.1 Architectural Dimensions
Sentence processing architectures can differ from one another in the following features:
Language analyses: A language analysis is a combination of one or more knowledge sources and a process for applying the knowledge to accomplish a particular subtask of the language understanding task. For example, lexical, syntactic, and semantic analyses are distinct language analyses in most theories. The terms levels and faculties are sometimes used to refer to the same elements as the analyses above. Note, however, that levels usually refer to stages, while analyses are not the same as stages. There is no implication of sequentiality of the different analyses. Sequential models place the analyses in a sequence of modules; parallel models place them in parallel; integrated models combine all the different analyses into one module; others might combine them in various modules as described below. Nevertheless, sentence processing architectures do differ in terms of what kinds of analyses they include in their model of sentence interpretation.

Nature of decomposition: The overall task of sentence understanding can be decomposed into modules either along the lines of language analyses such as syntax, semantics, and conceptual analyses, or orthogonally into modules comprised of similar processes running in each of the analyses. As an example of an orthogonal decomposition, one can divide language understanding into three modules: access, integration, and selection, any of which may be going on as part of a lexical, syntactic, semantic, or conceptual analysis.

Configuration: The modules thus obtained by a decomposition can possibly be arranged in one of three configurations: in a sequence of stages as in sequential models, in a single module as in integrated models, or else in various parallel configurations as in the case of certain interactive models.

Nature of interaction: Interaction between the modules of an architecture can happen either through a shared representation (e.g., a working memory or a blackboard acting as another knowledge source) that is understood by all units participating in the interaction, or it can happen between the control structures of the processes in the different modules by transfer of control or by exchange of messages, which may have to be translated if there is no shared representation.

Modularity: When a task is decomposed, the resulting units can not only be arranged in different ways, they can also interact with each other in various ways. The amount and nature of interaction between the different modules can be described in terms of different kinds of modularity (Fodor, 1987; Marslen-Wilson and Tyler, 1987; Tanenhaus, Dell, and Carlson, 1987):

- Representational modularity, also known as information encapsulation, means that each module has its own exclusive representation of a kind of knowledge that is not accessible to any other module. Knowledge sources are exclusive to processes. It is a strong form of modularity where interaction could not happen through shared representations.

- Process modularity means that each module has its own process whose control structure does not interact with any other process. The only form of interaction possible is through a common representation such as a blackboard (see below).

- Functional modularity means that the modules can interact in any way as long as they are functionally independent, that is, one can perform its function if the other fails to run. Parts of a module may be functionally dependent on other parts of the same module, but are independent of other modules. Functional modularity permits interaction either through common representations (such as in a blackboard architecture) or message passing between the control structures of the modules. Functional modularity permits considerable freedom with respect to the kind of interactions possible.

Thus, a module is an exclusive knowledge source plus a process in the first kind of modularity, an isolated process in the second notion of modularity, and does not have a well-defined meaning in a functionally independent architecture. For instance, if there are independent representations of knowledge sources with a single unified process operating on them all, there is neither just one module nor many different modules. Functional independence does not preclude sharing of knowledge sources or processes, as long as it does not render the accomplishment of one function inevitably dependent on another.

Grain size or intimacy of interaction: To say that two modules interact during processing does not carry much meaning until we specify the grain size of the interval of interaction. For instance, if syntax and semantics interact only at the end of a sentence, that is not particularly interactive. On the other hand, if they interact after each word, that would make a highly interactive parallel model. It is important to note that an architecture can be described differently at different grain sizes. For instance, what looks like a parallel architecture at the size of a clause (because of interaction going on at a grain size finer than a clause) might actually be sequential at a smaller grain size. This is particularly true in the case of the interaction between syntax and semantics. The lexical entry of a word must be accessed before any significant syntactic or semantic analyses can begin. Similarly, syntactic processing of a word resulting in its binding to a phrase structure has to happen at least partially before any semantic processing can begin. If not, valuable information that syntax can provide to semantics would not be exploited by semantic processing, which could be wasteful. However, not all syntactic processing for a word need be done before semantics can take off. In short, it is sometimes desirable to have sequentiality at a fine grain size but independence and parallelism at coarser grain sizes. It must be noted that even though reducing the grain size of interaction by cascading the modules in a sequential model brings it closer to a parallel model, the two are still fundamentally different, since the cascaded model allows interaction between modules in only one direction. Parallel models are built to permit interactions between modules in both directions.

The different positions that could be taken by sentence processing models along each of these architectural dimensions are shown in Table 5.1.
Table 5.1: Architectural Space of Sentence Understanders.

  Dimension                      Possible Positions
  1 Nature of Decomposition      by analyses, orthogonal, none
  2 Configuration                serial, parallel, integrated
  3 Nature of Interaction        uncontrolled, message passing, blackboard, controlled, none
  4 Type of Modularity           representational, process, functional, none
  5 Grain Size of Interaction    more than once per word, once per word, per phrase, clause, sentence
From a psycholinguistic point of view, there have been many computational models proposed to account for human language understanding capabilities. Some sentence processing models are better at explaining certain kinds of behavior than others. For instance, some models assume autonomy of the syntactic level and hence cannot explain interactive behaviors (e.g., Forster, 1979). Some models explain only some subtask of the task of sentence understanding by focusing on only one aspect of language processing (i.e., at most one language analysis in the case of many models, e.g., Marcus, 1980; Wilks, 1975). For instance, there are models of lexical ambiguity resolution, of syntactic parsing, of pronoun reference in relative clauses, and so on. In this work we have built a complete model of sentence processing, one that analyzes the lexical information, syntax, and semantics, of a sentence to arrive at the best interpretation possible.4 The role of a computational model is much more than just producing the same behaviors that people do and showing the computational feasibility of a theory. A model is supposed to explain computationally how a sentence processor could arrive at an interpretation. A model could point to parts of a behavior that had not been noted in human performance. A model might lead to the generation of hypotheses about particular details of the sentence processor's architecture which could lead to testable predictions and to the design of further experiments, computational or psychological, theoretical or empirical. We now describe a number of possible points along the above dimensions of sentence processing architectures and analyze them in relation to the kinds of communication between syntax and semantics that they enable.
5.3.2 Sequential Architecture
Models of sentence understanding can be classified as sequential, integrated, or parallel models depending on how their knowledge representations and different processes relate to each other. In a sequential architecture, a "lower-level" process does not get any feedback from a "higher-level" process. Each level receives the output of processing at the previous lower level and sends out its output to its next higher level. Traditionally, the task of language understanding has been decomposed into the analyses of syntax, semantics, and pragmatics, which are arranged in a syntax-first sequential architecture as shown in Figure 5.1. Such a sequential model has the advantage of accounting for the fast, autonomous processing at syntactic and other earlier stages. As stated earlier, sequential models can improve the efficiency and effectiveness of their sentence processing by cascading the different modules, that is, by reducing the grain size of the "interaction," which is in this case merely the passing on of the results of processing at a lower level to a higher level. However, this is still a one-way communication, and the processing at the lower levels cannot take advantage of results or knowledge available through other analyses being carried out at higher levels.
5.3.3 Integrated Architectures
The integrated processing principle (Birnbaum, 1986; Birnbaum, 1989; Schank et al., 1980) states that the language processor applies syntactic, semantic, and other kinds of knowledge at the earliest opportunity in processing a piece of text (see Chapter 2). However, integrated models of language understanding (Figure 5.2) assume more than just this; they employ integrated representations of knowledge, not just the integration of the information provided by the representations during processing. They do not retain independence in the use of the different kinds of knowledge involved in linguistic competence. As a result, integrated models do not account for functionally independent behaviors in language understanding. Though the principle itself is called the integrated processing principle, the integration has always been modeled in the representation itself.
4 However, we do not consider phonology, morphology, and so on; also, there is no model of pragmatic or discourse processing or reference resolution in the present model.
[Figure 5.1: Sequential Architecture. Input feeds syntax, which feeds semantics, which feeds pragmatics.]

[Figure 5.2: Integrated Architecture. Input text is processed by an integrated processor operating on integrated knowledge to produce output decisions.]
The only example of a model that integrated independently represented syntactic and semantic knowledge dynamically during processing is Lytinen's (1984; 1987) MOPTRANS model, which was a sequential semantics-first model. In fact, as we have seen earlier in Chapter 2, the integrated processing principle entails functional independence between the different knowledge sources. However, previous implementations of integrated processing did not support functional independence, with the exception of MOPTRANS, which was a sequential model. It is in this sense that we claim COMPERE to be a true implementation of the integrated processing principle.

Another class of integrated models also uses integrated representations of all the different kinds of knowledge, represented in a monolithic knowledge base of integrated constructs, sometimes called grammatical constructions (Jurafsky, 1991; 1992). These models differ from other integrated models mentioned above in detailing an explicit decomposition of the integrated process into access, selection, and integration of alternative interpretations (see Figure 5.3(d) below). This is an example of a decomposition of the language processing task that is orthogonal to standard analyses such as syntax, semantics, and pragmatics. Sentence interpretation in such an architecture can be decomposed into the following subtasks: accessing lexical entries for a word; proposing feasible ways of integrating the word into current interpretations (there being no distinct syntactic and semantic integrations or compositions); and selecting the most preferred interpretation(s). Models with orthogonal decompositions differ from one another in their architectural details, such as whether they propose all possible interpretations and then select from them, or propose and select, or reject, one at a time, and so on.
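As an illustration of this orthogonal decomposition, here is a minimal sketch (the lexicon and preference function are invented for the example and are not from any of the models cited): the same access-integrate-select cycle is repeated for every word, regardless of whether the knowledge consulted is lexical, syntactic, or semantic.

    def access(word, lexicon):
        # Retrieve all stored entries (senses or constructions) for the word.
        return lexicon.get(word, [])

    def integrate(entries, interpretations):
        # Propose every feasible way of combining each entry with each current interpretation.
        return [interp + [entry] for interp in interpretations for entry in entries]

    def select(candidates, score):
        # Keep only the most preferred interpretation(s).
        best = max(score(c) for c in candidates)
        return [c for c in candidates if score(c) == best]

    def understand(words, lexicon, score):
        interpretations = [[]]
        for word in words:
            interpretations = select(integrate(access(word, lexicon), interpretations), score)
        return interpretations

    lexicon = {"bugs": ["insect", "microphone"], "moved": ["move-event"]}
    print(understand(["bugs", "moved"], lexicon, score=lambda c: -c.count("microphone")))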
5.3.4 Parallel Architectures
Parallel architectures retain independent representations of different kinds of knowledge and yet support on-line interaction between them without integrating them so much that they are no longer independent to any degree. Integration of information happens during the processing of a sentence. Parallel models can adhere to the integrated processing principle without having to sacrifice the independence between the uses of different kinds of knowledge. Syntax and semantics can be placed in parallel in several different ways (Figure 5.3). Different parallel configurations yield different degrees of control on the communication between syntax and semantics.

Uncontrolled Interaction: The simplest parallel configuration is one in which syntactic and semantic processes interact with each other continuously without any monitoring, arbitration, or even translation. If the two processes shared the same language to represent information and the same operations to manipulate the information, then they could interact with each other in this fashion. This has been the mechanism used in certain connectionist models such as the one by Waltz and Pollack (1985). However, it remains to be seen whether an uncontrolled interaction between processing mechanisms such as spreading activation can actually model the variety of interactions between syntax and semantics observed in human sentence processing (e.g., Frazier, 1987; Marslen-Wilson and Tyler, 1987; Stowe, 1991). Even from a computational point of view, it is neither clear that such processing mechanisms can deal with the complexities of linguistic structure nor that they can effectively reduce the degree of local ambiguity encountered by a sentence interpreter.

Interaction through a Translator: Another parallel configuration between syntax and semantics is one where there is a translator between the syntactic and semantic processes. This gives syntax and semantics the freedom to use their own internal representations and processes. In other words, this architecture does permit representational modularity between the modules. The processes interact with each other through messages that they pass to each other through the bidirectional translator. This mode of communication, again, does not provide any control on the interaction. For instance, if syntax and semantics have conflicting preferences for alternative interpretations, there is no way of imposing a preference on one, or of otherwise changing the strategy based, for instance, on limits on resources available to the sentence processor. As already discussed (in Section 5.2.3 above), further problems with translation without supervisory control include the lack of a guarantee that the two processes produce interpretations that are consistent with each other, especially during and after an error recovery operation, and the lack of a declarative representation of intermediate decisions for use in error recovery.

Interaction through a Blackboard: In a blackboard configuration, there is no direct communication between syntax and semantics. The two processes communicate with each other by writing onto and reading from a common representation called the blackboard (e.g., Erman, Hayes-Roth, Lesser, and Reddy, 1980; Nii, 1989; Reddy, Erman, and Neely, 1973). This configuration does not guarantee that communication actually occurs, because each process might very well disregard what the other wrote. By a proper design of the representations on the blackboard, however, it may be possible to guarantee consistency between the things that the different processes write onto the blackboard (by keeping the contents of the blackboard internally consistent at all times). However, it still does not give the sentence processor any control over the resolution of conflicts between syntactic and semantic preferences. Moreover, since what one process writes can be overwritten by another, there is no record of intermediate decisions, with a resulting difficulty in implementing error recovery.

Controlled Interaction through an Arbitrator: In an arbitrator configuration, syntactic and semantic processes perform their analyses and send their results (such as alternative interpretations and preferences for each of them) to the central arbitrator. The arbitrator controls the interactions between the processes by performing the following functions (a minimal sketch follows this list):

- It synchronizes the processes by starting and stopping their operations to match each other's operations.
- It provides feedback from one process to the other by conveying to each process the alternative that was finally selected by the arbitrator.
- It resolves any conflicts by selecting the alternative that is best overall according to its arbitration algorithm (see Chapters 7 and 8 for the algorithm).
- It retains alternatives not presently selected, for possible use during later error recovery.
- It translates between the representations used by the two processes by mapping one to the other.
- However, it does not access any syntactic or semantic knowledge on its own; it simply integrates the syntactic and semantic information that is provided to it by the independent processes.
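The sketch below illustrates the arbitration step for the PP-attachment ambiguity in sentence (5); the class, field names, and numeric preferences are assumptions for illustration, not COMPERE's actual algorithm (which is given in Chapters 7 and 8). Corresponding syntactic and semantic alternatives are paired through a shared intermediate role, their preferences are combined, conflicts are resolved by choosing the best overall pair, and unselected pairs are retained for error recovery.

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class Alternative:
        label: str          # e.g. "attach PP to VP" or "fill instrument role"
        preference: float   # the proposing process's own preference
        maps_to: str        # intermediate role linking syntactic and semantic decisions

    class Arbitrator:
        def __init__(self):
            self.retained: List[Tuple] = []   # unselected alternatives, kept for error recovery

        def arbitrate(self, syn_alts: List[Alternative], sem_alts: List[Alternative]):
            # Pair up corresponding alternatives via the shared intermediate role
            # and combine the preferences of the two independent processes.
            combined = [(s, m, s.preference + m.preference)
                        for s in syn_alts for m in sem_alts if s.maps_to == m.maps_to]
            best = max(combined, key=lambda pair: pair[2])              # resolve conflicts
            self.retained += [c for c in combined if c is not best]     # retain the rest
            return best[0], best[1]   # fed back to syntax and semantics respectively

    # Sentence (5): syntax mildly prefers VP attachment (minimal attachment), while
    # semantics strongly disprefers "horse" as an instrument of seeing.
    syn = [Alternative("attach PP to VP", 0.6, "instrument"),
           Alternative("attach PP to NP", 0.4, "modifier")]
    sem = [Alternative("horse fills instrument of see", 0.1, "instrument"),
           Alternative("horse modifies soldier", 0.8, "modifier")]
    print(Arbitrator().arbitrate(syn, sem))   # the NP-attachment / modifier pair wins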
5.4 Architectures and Syntax-Semantics Communication

After looking at the nature of syntax-semantics communication and the different architectures possible, it is obvious why we are proposing a parallel architecture with controlled interaction between syntax and semantics, the control function being performed by an arbitrating process. To summarize the arguments about architectures, we present the different architectures discussed above pictorially in Figure 5.3 and list the kinds of communication they afford in Table 5.2 below.
5.4.1 COMPERE's Architecture
The question we would like to address at this point is what architectural configuration affords the kind of syntax-semantics interaction through arbitration that we have described earlier in this chapter. It is obvious, for instance, that the architecture cannot be a sequential one with a one-way flow of information.
[Figure 5.3: Various Sentence Processing Architectures. (a) Sequential: Syntax-first; (b) Sequential: Semantics-first; (c) Integrated; (d) Integrated: Orthogonal Decomposition; (e) Parallel: Uncontrolled Interaction; (f) Parallel: Translator; (g) Parallel: Blackboard; (h) Parallel: Controlled Interaction.]
Table 5.2: Syntax-Semantics Communication in Various Architectures.

  Architecture                                Syntax-Semantics Communication
  (a) Sequential: Syntax-first                One-way from syntax to semantics.
  (b) Sequential: Semantics-first             One-way from semantics to syntax.
  (c) Integrated                              Predetermined and packaged.
  (d) Integrated: Orthogonal decomposition    Prepackaged.
  (e) Parallel: Uncontrolled Interaction      Two-way; no consistency; no control.
  (f) Parallel: Translator                    Two-way; no consistency; no control.
  (g) Parallel: Blackboard                    Two-way possible; consistency possible; no control.
  (h) Parallel: Controlled Interaction        Two-way, controlled, and consistent communication.
There must be a single process, the arbitrating process, that moderates the communication. To this end, COMPERE has a controlled parallel architecture with syntax and semantics controlled by a unified arbitrator. It is clear from the discussion above that sequential architectures are not suitable, since they do not permit communication in both directions between syntax and semantics. Integrated architectures, on the other hand, have syntax and semantics too tightly coupled in the representations themselves. They do not preserve the independence between syntax and semantics that is required for both functional and cognitive reasons. For the reasons described above, of all the parallel configurations, the controlled architecture with the arbitrator provides the right type of control over the communication necessary to model the sentence processing behaviors (from a cognitive point of view) and to reduce local ambiguity and produce the most preferred interpretation (from a functional point of view).
5.4.2 A Unified Process?
One question that we might ask about the controlled parallel architecture is what subtasks are solved by the individual syntactic and semantic processes and what subtasks by the arbitrator. A related question is whether the arbitrating process itself is the only process or whether it exists in addition to syntax-specific and semantics-specific processes. For instance, could the arbitrator directly access both types of knowledge, eliminating the need for the separate processes altogether? In other words, can syntactic and semantic processing be completely unified while keeping the knowledge sources independent of each other? The answer to this question depends directly on the very nature of syntactic and semantic processing and the knowledge they use. Since there is reason to believe that syntax and semantics are different at least to the extent that syntax is about linguistic structure and semantics is not (at least not about the structure of the surface form of language), there could be two separate processes. However, this question ultimately boils down to whether the two processes are the same or not. Since a process is a control structure plus a set of operations embedded into the control structure, this question is really asking whether syntactic and semantic processing can be performed using the same control structure and the same set of operations.

The design of COMPERE has been motivated strongly by the desire to unify syntax and semantics in this sense (Holbrook, Eiselt, and Mahesh, 1992). We have taken, for several important reasons, the approach of hypothesizing a single process and then identifying small differences in the ways by which different analyses use their knowledge to come up with their proposals for attachment and preferences for alternative attachments. We have tried to put language processing together using as few assumptions of separate processes as possible while keeping in mind the significant claims of modular theories (Fodor, 1983; Frazier, 1987; see also the review in Chapter 3). An obvious reason for taking this approach is parsimony. Why start with many different processes if we can do with just one? Since the overall process of sentence understanding is all about resolving ambiguities by selecting the best from a set of interpretations and making amends for any erroneous decisions along the way, it is reasonable to propose a single unified process that carries out the overall job of sentence understanding. The differences may lie in the operations used on the different kinds of knowledge that result in the proposed interpretations and the various preferences for them.

Apart from parsimony, there is psycholinguistic evidence that points towards a common process (see Chapter 3). For example, Stowe's work (1991) showed that syntactic analysis as it happens in human sentence processing in the presence of ambiguities is itself pretty much the same process as the semantic analysis process exemplified by the one proposed in ATLAST (Eiselt, 1989). For instance, both access multiple structures in parallel, can pursue multiple interpretations in parallel, can switch from one interpretation to another, and can interact with other analyses of language processing. See the review in Chapter 3 for other psycholinguistic results that support the unified-process model (and also, Burgess and Lund, 1994; Burgess, Tanenhaus, and Hoffman, 1994; Eiselt and Holbrook, 1991; MacDonald, Just, and Carpenter, 1992; Taraban and McClelland, 1988). In fact, we have developed a uniform representation of grammatical knowledge in syntax and of role knowledge in semantics so that the same kinds of operations can be applied to both (Mahesh and Eiselt, 1994). Moreover, the same control structure, that of the unified arbitrator, is used for both syntactic and semantic analyses. In addition to the common control structure provided by the arbitration algorithm, the two processes of syntax and semantics themselves have similar control structures dictated by the head-signaled left corner parsing algorithm (Mahesh, 1994a; Mahesh, 1994b). The same type of parsing algorithm is applied both to assign syntactic structure and to assign thematic and intermediate roles in semantics. However, syntactic and semantic knowledge are inherently different kinds of regularities in the language. As such, though some of the same operations such as instantiation, selection, composition, and so on are applied in both syntactic and semantic processing, some operations have slightly different or specialized instantiations for syntax and semantics. This matter will be illustrated in further detail in Chapter 8.

The requirement of functional independence has an important bearing on the possibility of a completely unified sentence processor. Given that the architecture must have one arbitrating process, can it be the only process? Can we look at the sentence processor as having just one integrated process? If the sentence processor is indeed inseparable in this fashion, it could not retain the functional independence of syntax and semantics.
Functional independence entails the separability of syntax and semantics to a certain extent. As such, there must be distinguishable parts of the control structure of the integrated arbitrating process where it is doing syntactic processing and consulting syntactic knowledge, and other parts where the integrated process is doing semantic processing and consulting semantic knowledge. This separation need not be one in time; the two processes might very well be synchronous, but they are distinguishable from one another and, more importantly, independent of one another, though they may contribute to each other and cooperate with each other whenever possible. Even if they used an identical set of operations, there must still be distinct parts of the control structure which can be labeled syntactic processing and other parts that can be called semantic processing.

Integrated understanders have always assumed that the different types of knowledge are integrated a priori. As such, one inseparable integrated process has been a good architecture to model such an integrated understander. Because we want to model the functional independence between syntax and semantics and because we want to model incremental communication between them through an arbitrating process, we must subscribe to an architecture where syntactic and semantic processes are talking to each other through a unified arbitrating process while doing their own analyses in parallel. The arbitrating mechanism tells the two processes when to communicate with each other, what to communicate about, and how to do the arbitration. Such an architecture has an element of orthogonal decomposition, since there is a greater degree of decomposition between the independent syntactic and semantic processes but a common, orthogonal, arbitrating process that controls and integrates the two independent processes.

Another point of interest here is whether a cascaded sequential architecture is essentially equivalent in its functionality to a controlled parallel architecture. In a cascaded syntax-first model with a fine grain size of interaction, the results of a small amount of syntactic processing are sent to the semantic processor for "filtering" out the semantically infeasible (or less preferred) syntactic alternatives. As such, it may seem as though syntax and semantics are communicating early on to avoid wasteful syntactic processing of fruitless alternatives. However, the decision that semantics has made to rule out certain alternatives proposed by syntax must somehow be communicated back to syntax in order to prevent it from continuing to pursue those alternatives. This can be done either by adding a blackboard to the cascaded sequential architecture, or by introducing two-way communication. In either case, semantics would be in control, since it gets to make its decisions later in the process. There would be no scope for syntax, for instance, to overrule semantics, or for any kind of negotiation or arbitration, unless the semantics module really does some of the syntactic processing too and makes both syntactic and semantic decisions. In such a case, the second module is really not a semantic module, but more like a combination of an arbitrator or a filter and a semantic processor. In fact, many integrated models and cascaded models are implemented this way (e.g., Charniak, 1983; Eiselt, 1989; Woods, 1980).

This point about augmenting a cascaded sequential architecture opens up the possibility of a new kind of architecture, one where syntax is subservient to semantics or vice versa. Many systems have been implemented this way. For instance, a semantics-first system might in reality be a semantic (or conceptual) analyzer which consults a subservient syntactic processor of limited competence. Similarly, one can think of an essentially syntactic analyzer with a subservient semantic analyzer that is used as a consultant by the syntactic processor when there is a syntactic ambiguity. In such a syntax-dominated system, semantics could never suggest syntactically unacceptable interpretations or overrule syntax in other ways (which may be necessary for dealing with "weakly ungrammatical" sentences, for instance).5 COMPERE has an architecture in which both syntax and semantics are "first-class citizens," neither being subservient to the other. This feature, along with the control over interaction provided by the arbitrating process, gives COMPERE the flexibility necessary to model different strategies in different sentence understanding situations (i.e., sets of possible interpretations with different combinations of syntactic and semantic preferences for them, conflicting or otherwise).

5 A sentence where semantic or contextual influences override syntax to force, for example, an interpretation that has a relatively low syntactic preference.
5.4.3 Issues in Syntax-Semantics Communication
We have seen that syntactic and semantic processing in our model of incremental sentence comprehension communicate with each other through a single arbitrating mechanism and declarative intermediate representations that facilitate error recovery. Given these constraints, the three main issues in syntax-semantics communication that the model must address can be summarized as below:
When to communicate: At what points in time (in terms of the word position in the input sentence) should syntax and semantics communicate? When should the arbitrator arbitrate between syntax and semantics? The answer to this question determines how we mix bottom-up and top-down processing in the parsing algorithm. This issue is further addressed in Chapter 6.

What to communicate about: What are the contents and representations of syntactic interpretations, semantic interpretations, and the intermediate roles? What information is used to arbitrate between syntax and semantics? This issue is addressed further in Chapter 7.

How to arbitrate: How does the arbitrator decide? What does it do when there are conflicts between the preferences of syntax and semantics? How does error recovery happen? How long are alternatives retained? These questions are also addressed in Chapter 7 (and further elaborated upon in Chapter 8 as part of describing the COMPERE system).
5.5 Related Models of Sentence Understanding

In this section, we present an overview of current models of sentence understanding, trying to map an entire range of influential models that have been proposed until now. We restrict this survey in general to only those models that have been implemented on a computer. Certain cognitive models that are only described in abstract terms were described earlier in Chapter 3 (e.g., Stowe's (1991) model) and are not part of this survey.
5.5.1 Sequential Models
A typical example of a sequential model is Forster's Levels of Processing Model (1979), shown in Figure 5.4. In this model, a syntactic processor takes the results of a lexical processor as input and sends its output to a message processor. There is no feedback to lower levels of processing. For example, semantic processing in the message processor has no effect on the syntactic processor. However, all three processes send their results to a common problem solver that combines their decisions using general conceptual knowledge. This module, called the GPS in Figure 5.4, is ultimately responsible for making all decisions, and its algorithms are left unspecified in this model.

Many early models were sequential models based on the augmented transition network (ATN) formalism. These models employed an ATN parser to perform syntactic analysis and fed the output of the ATN to a semantic processor. Examples of ATN-based systems such as LUNAR (Woods, 1970; 1973) and SHRDLU (Winograd, 1973) are well-known landmarks in the early history of artificial intelligence and computational modeling of natural language processing. Though some models including SHRDLU interleaved syntactic and semantic processing, the architecture was still sequential, with semantic processing depending on correct syntactic processing. More than one ATN network can be cascaded to reduce the grain size of syntax-semantics interactions. In a cascaded ATN (Woods, 1980), intermediate outputs from the first ATN are sent to the second ATN network for immediate semantic analysis. However, the interaction is still one-way from syntax to semantics.

An interestingly different sequential model, with semantics preceding syntax, was proposed by Lytinen (1986) and implemented in the MOPTRANS program (see Chapter 9 for further discussion of this model). In this model, syntactic and semantic knowledge are represented independently. Semantic compositions of word meanings are first proposed. Syntax is used only to verify the semantic proposals. While this model certainly permits semantic preferences to influence the choice of syntactic interpretations, it makes syntax subservient to semantics. As a result, the model may have to search through many semantic combinations before finding one that syntax permits. Other problems with the semantics-first architecture in modeling human sentence processing are discussed further in Chapter 9.
[Figure 5.4: Forster's Levels of Processing Model: input features feed a lexical processor (consulting the lexicon), a syntactic processor, and a message processor in sequence, with all three feeding the GPS, which uses general conceptual knowledge to produce the decision output.]

An interesting class of sequential models used capacity limitations to shape the course of syntactic processing. The two influential models of this kind are the Sausage Machine (Frazier and Fodor, 1978) and PARSIFAL (Marcus, 1980). The Sausage Machine was a two-stage processor where both stages processed syntax and semantics. However, the first stage was limited to windows of 5 to 7 words at a time. The result of such local processing was sent to the second stage, which assigned the final syntactic and semantic structures. PARSIFAL also proposed a limited look-ahead parser, attempting to account for psychological data on the limitations of the human parser by using limited lookahead in conjunction with deterministic processing. Both models lacked functional independence between syntax and semantics and were essentially syntax-first, since semantic analysis depended on proper syntactic analysis of a sentence.
5.5.2 Integrated Models
A fundamental idea behind all integrated models is the integrated processing principle: namely, that all knowledge sources must be put to use immediately in language understanding (Birnbaum, 1986; Schank, Lebowitz, and Birnbaum, 1980). Semantic and conceptual knowledge, for instance, should not have to wait until after syntactic analysis to influence the selection of interpretations. Integrated models, as a rule, do not have a separate syntactic processor. Early models built in this paradigm attempted integration of multiple types of knowledge in the lexical entries of words. A good example of a model of this kind is the CA model (Birnbaum and Selfridge, 1981). Lexical entries of head words contained rules about what to expect before and after the word in various situations and how to combine the meanings of surrounding words with those of the head words. Because of the complete reliance on lexical packaging of all types of knowledge, these models suffered from lack of functional independence and problems of scalability. Small and Rieger (1982) built a model called the Word Expert Parser which used a similar technique. Each word was an expert that knew exactly how to combine its meaning with those of other words in the surrounding context.
Later models based on integrated processing were based on skimming the text to index into stored knowledge structures and using the knowledge structures to guide language understanding (DeJong, 1982). The first model of this kind, called SAM, employed stereotypical temporal sequences of events packaged in structures called scripts to guide understanding (Cullingford, 1978; Schank and Abelson, 1977). Using scripts for stereotypical situations such as events in a restaurant, SAM could make inferences and fill in gaps left unspecified in the input text. However, the model functioned well only when presented with texts that were about stereotypical situations for which a script was available to the understander. Another model, FRUMP (DeJong, 1979), also used scripts to skim through newswire stories in the domain of terrorism. The strict dependency on scripts was rectified in later models which had more flexible knowledge structures, such as MOPs (memory organization packets) in IPP (Lebowitz, 1983). These models also addressed broader cognitive functions such as storage and retrieval of knowledge structures in memory, explanation, and learning by adapting existing knowledge structures. This line of integrated models moved away from sentence understanding and towards discourse (or story) understanding by relying on large knowledge structures that packaged many types of knowledge about typical situations in a domain. For example, the BORIS model (Lehnert, Dyer, Johnson, Yang, and Harley, 1983) proposed many types of elaborate knowledge structures in order to accomplish a "deep understanding" of a story by making many inferences from the text. The amount of domain-specific knowledge required for understanding each text seriously hampered the scalability of models like BORIS. More recent models have proposed specialized knowledge structures such as explanation patterns (Schank, 1986). AQUA is a model of language understanding that views understanding as a process of maintaining an agenda of questions and providing explanations to answer the understander's internal questions by retrieving and adapting stored explanation patterns (Ram, 1989; 1991).

An integrated model based on the hypothesis that parsing is no different from accessing knowledge structures in memory was built in DMAP (Riesbeck and Martin, 1986a; Riesbeck and Martin, 1986b). In DMAP (Direct Memory Access Parsing), entire phrases were attached to representations of their meaning in a semantic-network-based integrated representation of syntactic and semantic knowledge. DMAP was also limited to texts from a domain in which the system had all the necessary knowledge structures.

Many other models, such as Jurafsky's SAL (1991; 1992), are also integrated models with integrated representations of all the different kinds of knowledge represented in a monolithic knowledge base of integrated constructs called grammatical constructions. Jurafsky's model differs from other integrated models mentioned above in detailing an explicit decomposition of the integrated process into distinct phases called access, selection, and integration of alternative interpretations (see Figure 5.3(d) above). This is an example of a decomposition of the language processing task that is orthogonal to standard analyses such as syntax, semantics, and pragmatics. A related type of integrated model was proposed by Wilks (1975) based on the use of stored conceptual knowledge in the form of templates.
Wilks proposed that these templates store constraints on fillers that act as preferences rather than as requirements. These templates also included syntactic ordering information and as such were packages that combined syntactic, semantic, and conceptual knowledge in an integrated representation.

A different kind of integrated representation is employed in models based on lexical functional grammars (LFG) (Bresnan, 1982; Sells, 1985), where the integration of syntactic and semantic knowledge is encoded a priori in individual lexical entries. In this formalism, there are separate syntactic and semantic processes (typically arranged in a syntax-first sequential architecture), but the mapping from syntactic structures to semantic structures is precomputed and enumerated in the lexicon. Models based on LFG, such as the Diana and Mikrokosmos analyzers for machine translation (Meyer, Onyshkevych, and Carlson, 1990; Nirenburg, Carbonell, Tomita, and Goodman, 1992; Nirenburg and Levin, 1992; Onyshkevych and Nirenburg, 1994), are typically sequential models using integrated representations.
5.5.3 Spreading Activation Models
In parallel models with uncontrolled interactions, syntax and semantics run simultaneously in parallel and interact with each other continuously. Since there is no process controlling the interactions between the parallel processes, the two processes must share the same representation language so that they can influence each other through interaction. Models of this type have been built using connectionist networks, with activation and inhibition between nodes being the means of interaction. For example, Waltz and Pollack (1985) built a connectionist model where syntactic and semantic decisions were made in separate networks, but the networks exchanged activation with each other and settled on a combined interpretation at the end. A similar connectionist model based on activation and inhibition between nodes in a network, called the Competition Model, has been proposed by Bates and MacWhinney (1991). A fundamental problem with these models is the inability of their processing mechanism, spreading activation, to deal with syntactic processing. In spite of recent advances in connectionist networks, it is yet to be demonstrated that a connectionist network can process the complex syntax of natural languages. Yet, a uniform processing mechanism such as spreading activation is necessary for a parallel model to rely on uncontrolled interaction. Current models such as the one by Waltz and Pollack above assume that a syntactic parse produced by a different processing mechanism is available as input to the model.6

Another class of models has used a symbolic equivalent of spreading activation called marker passing in a semantic network to build models of semantic analysis and lexical ambiguity resolution. Among these models, Charniak's (1983) model and Eiselt's ATLAST (Eiselt, 1989) model are particularly interesting to us, since they based their models on the principle of functional independence between syntax and semantics. In order to make semantics independent of syntax, Charniak proposed a semantic processor that used marker passing to initially propose semantic alternatives without being influenced by a syntactic processor (that was running in parallel to the marker passer). There was a third module that combined syntactic preferences with the semantic alternatives to select the combined interpretation. It was interesting to note that semantic feedback was permitted from the semantic marker passer to syntax, but syntax was forbidden from influencing the first stage of semantic processing.

ATLAST (Eiselt, 1989) was a model of lexical, semantic, and pragmatic ambiguity resolution as well as error recovery, also using marker passing. ATLAST divided both syntax and semantics into three different modules called the Capsulizer, the Proposer, and the Filter. The Capsulizer was an initial stage that packaged the input sentence into local syntactic units and sent them incrementally to the Proposer. The Proposer was a marker passer quite akin to the one in Charniak's model. The third module, the Filter, combined syntactic and semantic preferences to arrive at the final interpretation. The most important aspect of this model was its ability to recover from its errors without completely reprocessing the sentence, by conditionally retaining previously unselected alternatives (Eiselt, 1989; Eiselt and Holbrook, 1991). ATLAST was one of the very few models that highlighted the importance of error recovery in shaping the design of a sentence understander.
ATLAST's drawback was its limited syntactic capabilities (limited to simple subject-object-verb single-clause sentences), realized in a simple ATN parser that did not interact in any interesting way with the semantic analyzer. Nevertheless, ATLAST had the most influence on
6. More connectionist models have been proposed in the recent past than we can afford to include in this survey. For a sample of interesting models, the reader is referred to the following authors: Cottrell (1985), Cottrell and Small (1983), Miikkulainen and Dyer (1991), Stevenson (1994), and St. John and McClelland (1990).
the design of the COMPERE model, which was born in an attempt to rectify the problems with ATLAST. Blackboard models are also parallel models but typically do not use spreading activation for processing. A good example of a blackboard model is the HEARSAY-II speech understanding model (Erman, Hayes-Roth, Lesser, and Reddy, 1980). This model employed production rules and had syntax, semantics, and other processes interacting through a shared blackboard. However, the blackboard model did not produce syntactic and semantic interpretations in parallel. In HEARSAY-II, semantic interpretation did not start until the entire parse tree had been built by the syntactic processor. As a result, the model neither followed the integrated processing principle nor made semantics independent of syntax. READER (Thibadeau, Just, and Carpenter, 1982) was a hybrid connectionist and production system model of sentence comprehension. Production rules governed the transfer of activation across nodes in a network. The model was built to closely match psychological data on eye fixations in reading that the builders of the model had collected in their experiments (see Chapter 3 for details). READER focused on syntactic analysis of sentences and accounted for ambiguities in dealing with embedded clauses. While READER appears to maintain functional independence between syntax and semantics, upon closer examination it is clear that it does not, since it has some productions that do some of both syntactic and semantic processing. On the other hand, READER is capable of applying both types of knowledge immediately in processing a sentence. It is not clear whether it is capable of recovering from errors (see Eiselt, 1989). READER has been extended in a new model called CC READER (Just and Carpenter, 1992), which is a Capacity Constrained READER. By enforcing a limit on the total amount of activation in the network, this model can account for a variety of data on differing working memory capacities and their effects on ambiguity resolution and other phenomena in human sentence processing (see Chapter 3 for more details). Though READER and CC READER (with their control of spreading activation through production rules) appear to have the controlled parallel architecture necessary for dealing with syntax-semantics interactions, the parallel processes are largely uncontrolled. The only controlling force in the model is the capacity limit. In this sense, the model's architecture is very similar to that of connectionist models such as Waltz and Pollack's (1985), and it remains to be seen if such limited control over syntax-semantics interactions can account for the variety of ambiguities and error recovery seen in sentence understanding and handled by the COMPERE model (see Chapters 8 and 9 for examples).
5.5.4 Other Recent Models
NL-SOAR (Lehman, Lewis, and Newell, 1991; Lewis, 1992; 1993a; 1993b) is a model of sentence understanding based on the SOAR architecture for production systems and learning by chunking (Laird, Newell, and Rosenbloom, 1987). NL-SOAR used architectural constraints from SOAR and a large set of language processing rules to model a range of syntactic phenomena in human sentence processing including structural ambiguity resolution, garden paths, and parsing breakdown (such as in center-embedded sentences). The model also showed a gradual transition in behavior from deliberative reasoning to immediate recognition. This was made possible by the chunking mechanism in SOAR, which produced chunks by combining all the productions that were involved in selecting a particular interpretation for a sentence. A drawback of this approach was the gradual loss of functional independence between the different knowledge sources. Though syntactic and semantic productions were encoded separately to begin with, the chunking process combined them to produce bigger monolithic units applicable to specific types of sentences. Thus, NL-SOAR starts with separate representations of different types of knowledge but gradually builds its own integrated representations. Another drawback of using the SOAR architecture was its serial nature, with the consequence that NL-SOAR could pursue only one interpretation of a sentence at any time.
Moreover, since productions could contain any type of knowledge, one could encode productions that are rather specific to a particular type of sentence. In fact, chunking would produce such productions that would be applicable in sentences that represent particular combinations of ambiguities and syntactic and semantic contexts. Cardie and Lehnert (1991) have extended a conceptual analyzer (such as CA described earlier) to handle complex syntactic constructs such as embedded clauses. They show that the conceptual parser can correctly interpret the complex syntactic constructs without a separate syntactic grammar or explicit parse tree representations. This is accomplished by a mechanism called the lexically-indexed control kernel (LICK), which is essentially a method for dynamically creating a copy of the conceptual parsing mechanism for each embedded clause. A model called ABSITY that had separate modules for syntax and semantics was developed by Hirst (1988). ABSITY had separate representations of syntactic and semantic knowledge. A syntactic parser similar to PARSIFAL (Marcus, 1980) ran in tandem with a semantic analyzer based on Montague semantics. This model was able to resolve both structural syntactic and lexical semantic ambiguities and produced incremental interpretations. Syntax was provided with semantic feedback so that syntax and semantics could influence each other incrementally. Lexical disambiguation was made possible by the use of individual processes or demons for each word, called Polaroid Words, that interacted with each other to select the meaning most appropriate to the context. However, semantic analysis was dependent on the parser producing correct syntactic interpretations. This was a consequence of the requirement of strict correspondence between pieces of syntactic and semantic knowledge for syntax-semantics interaction to work in Hirst's model. Rules for semantic composition had to be paired with corresponding syntactic rules for the tandem design to work. A noticeable characteristic of Hirst's model was the use of separate mechanisms for solving each subproblem in sentence understanding. The model used a parser based on a capacity limit for syntax, a set of continuously interacting processes through marker passing for resolving word sense ambiguities, a Semantic Enquiry Desk for semantic feedback to syntax, and strict correspondence between syntactic and semantic knowledge for ensuring consistency of incremental interpretations. This characteristic is inherited completely by a more recent model by McRoy and Hirst (1990), which has more modules than Hirst's original model and appears to be as heterogeneous as its predecessor. This enhanced model is organized in a race-based architecture which simulates syntax and semantics running in parallel by associating time costs with each operation. The model is able to resolve a variety of ambiguities by simply selecting whichever alternative minimizes the time cost (hence the name "race-based"). The model employed a Sausage-Machine-like (Frazier and Fodor, 1978) two-stage parser for syntactic processing and ABSITY for semantic analysis. The two were put together through an "Attachment Processor" and a set of grammar application routines. The attachment processor also consulted three other sets of routines, called the grammar consultant routines, knowledge base routines, and argument consultant routines, resulting in a highly complex and heterogeneous model.
5.5.5 Summary
We conclude this section with a comparative summary of contemporary models of natural language (sentence) understanding in terms of the following criteria:
F.I.: Functional independence between syntax and semantics: whether syntactic and semantic knowledge can each be applied independently of the other.
I.P.: Integrated processing: whether both syntactic and semantic criteria are taken into account immediately in processing.
E.R.: Error recovery: whether the model has the ability to recover from its syntactic and semantic errors.
2-Way: 2-way interaction: whether syntax and semantics can influence each other incrementally; this does not apply to models where either there is no separate syntactic processing or where syntactic and semantic knowledge are integrated a priori in integrated representations (such models are given an "N/A" filler in the table).
Arch.: The type of architecture: for the sake of brevity, the word "sequential" is dropped from "syntax-first sequential," "semantics-first sequential," and "cascaded sequential" architectures. It may also be noted that all models that do not have a separate syntactic processor are noted as having integrated architectures. Conceptual processors such as SAM and FRUMP are said to have integrated architectures in the table below though the processing they do is typically not considered integrated processing.
Table 5.3 shows this summary in roughly chronological order of a major publication of the models. It must be noted here that some models shown in the table, especially SAM, FRUMP, IPP, BORIS, and AQUA, are not considered models of sentence understanding. They are included here both for the sake of completeness and to map the course taken by work in computational modeling of natural language understanding in the last two decades. It can be seen from this table that COMPERE is the first model to meet the requirements of both functional independence and integrated processing, to be capable of syntactic and semantic error recovery, and to support two-way incremental interaction between syntax and semantics. The following chapters will illustrate how COMPERE accomplishes all of these.
Table 5.3: A Comparative Summary of Sentence Understanding Models. Model LUNAR (Woods, 1970) SHRDLU (Winograd, 1973) Preference Semantics (Wilks, 1975) Sausage Machine (Frazier & Fodor, 1978) SAM (Cullingford, 1978) FRUMP (DeJong, 1979) Cascaded ATN (Woods, 1980) HEARSAY-II (Erman et al., 1980) PARSIFAL (Marcus, 1980) CA (Birnbaum & Selfridge, 1981) Word Expert Parser (Small & Rieger, 1982) READER (Thibadeau, Just, & Carpenter, 1982) Marker passing (Charniak, 1983) IPP (Lebowitz, 1983) BORIS (Dyer, 1983) MOPTRANS (Lytinen, 1984) Massively parallel (Waltz & Pollack, 1985) DMAP (Riesbeck & Martin, 1986) ABSITY (Hirst, 1988) ATLAST (Eiselt, 1989) AQUA (Ram, 1989) Race-based parsing (McRoy & Hirst, 1990) Competition Model (Bates & MacWhinney, 1991) SAL (Jurafsky, 1991) CIRCUS/LICKS (Cardie & Lehnert, 1991) NL-SOAR (Lehman & Lewis, 1991) CC READER (Just & Carpenter, 1992) COMPERE (Mahesh & Eiselt, 1993)
F.I. no no no
I.P. no no yes
E.R. no no no
2-Way no no N/A
Arch. Syn-first Syn-first Integrated
no
no
no
yes
Cascaded
no no no no
no no no no
no no no no
N/A N/A yes no
Integrated Integrated Cascaded Blackboard
no no
no no yes no
no N/A
Sequential Integrated
no
yes no
N/A
Integrated
no
yes possibly
N/A
yes
yes no
no
Uncontrolled parallel Parallel
no no no yes
yes yes yes no
no no no no
N/A N/A no yes
no
yes no
N/A
Integrated Integrated Sem-first Uncontrolled parallel Integrated
no no no yes yes In semantics no yes no no no no
yes no N/A yes
Cascaded Parallel Integrated Cascaded
no
no
yes
no no
yes no yes no
N/A N/A
Uncontrolled parallel Integrated Integrated
no
yes in syntax
yes
Integrated
no
yes possibly
N/A
Uncontrolled parallel Controlled parallel
no
yes yes yes
yes
CHAPTER VI
THE THEORY OF PARSING: WHEN TO COMMUNICATE WITH SEMANTICS
The general picture of the human sentence processing mechanism that emerges is of a fast, efficient, highly interactive system in which parsing is inextricably linked to interpretation. Although there is a tendency to consider the two processes as separate, it is becoming increasingly apparent that the distinction has little empirical value when applied to the resolution of local syntactic ambiguity by the human sentence processor. G. Altmann, A. Garnham, and Y. Dennis, 1992, p. 706.
In this chapter COMPERE's theory of syntactic parsing is presented. Among all the possible parsing strategies, a particular variant of left-corner parsing is proposed as the parsing strategy to be employed for communicating with semantics at the right times. Also presented is an organization of syntactic knowledge that is suitable for the parsing algorithm. The first section presents an overview of the chapter.
6.1 Introduction
A sentence is an ordered sequence of words in a language. Words in the sentence provide a sequence of individual meanings. The sentence processor needs to select appropriate meanings for words and combine or compose these meanings to form the meaning of the sentence as a whole. Syntactic knowledge or grammar tells the sentence processor which meanings to select and attempt to compose, avoiding an exhaustive search for all possible compositions of all word meanings. Syntactic analysis is the process of identifying the compositions licensed by the grammar of the language. It adds hierarchy to the left-to-right order of the words in a sentence and makes it a hierarchical left-to-right order. As we move each level up in the hierarchy, it tells us what units at the lower levels combine with what other units to form bigger meanings. Thus, given a sentence and a grammar for the natural language, the task of syntactic analysis is to output a parse tree (or a parse forest, if there is no single tree for the sentence) that identifies which units need to be composed with which others and in what left-to-right order at each level in the tree. Semantic processing makes use of these recommendations and those from other sources of knowledge and attempts to make compositions and arrive at a sentence meaning. Since neither grammatical nor semantic knowledge guarantees unique compositions of word meanings all the time, ambiguities of composition (as distinct from ambiguities of selection such as lexical ambiguities) exist in both syntax and semantics. That is, there are often multiple ways of composing word meanings according to syntactic or semantic knowledge. In order to deal with the ambiguities and meet the set of functional and cognitive constraints on its design, COMPERE's process of syntactic analysis should communicate with the processing of semantics so as to produce the right sequence of decisions and commitments in sentence processing. Syntactic analysis should not only communicate incrementally with semantics, it should do so at the right points or times in processing a
sentence.1 The times at which the communication occurs are determined by the usefulness of the communication between syntax and semantics.2 Communication is useful if it eliminates certain alternatives and leads to a reduction in the number of choices being considered by syntax and semantics. Syntax and semantics should interact only at those times when one can provide some information to the other to help reduce the number of choices being considered. This can happen only at or after syntactic analysis has reached a decision point at which semantics could have information to make communication useful. For instance, the parser should not interact with the semantic analyzer until it has completed analyzing a unit that carries some part of the meaning of the sentence, such as a content word. Only then can semantics provide useful feedback, perhaps using selectional preferences for fillers of thematic roles. An alternative is to eliminate the question of deciding when to communicate by proposing integrated representations of grammatical and semantic knowledge, as in semantic grammars or in the various integrated understanders described in the previous chapter. Such an approach loses heavily on the generativity of the language analyzer and would work only in a limited, familiar domain. The choice of points in syntactic analysis at which to communicate with semantics is determined by the points at which syntactic analysis makes commitments to structure (i.e., commitments to particular compositions of words). This is in turn determined by the choice of parsing algorithm employed for performing syntactic analysis. A parsing strategy must be designed that communicates with semantics precisely at those points at which semantics begins to have the necessary information to provide helpful feedback. For this purpose, pure bottom-up parsing turns out to be too circumspect since it waits until the end of a phrase to posit any attachment. Pure top-down parsing, on the other hand, is too eager and makes its commitments too early for semantics to say anything about those attachments. The resulting backtracking in a top-down parser complicates the issue of semantic interaction significantly. A combination strategy called Left Corner (LC) parsing is a good middle ground, making expectations for required constituents from the leftmost unit of a phrase, but waiting to see the left corner before committing to a bigger syntactic unit. In LC parsing, the leftmost child (i.e., the left corner) of a phrase is analyzed bottom-up, the phrase is projected upward from the leftmost child, and other children of the phrase are projected top-down from the phrase. While LC parsing defines when to project top-down, it does not tell us when to make attachments. That is, it does not tell when to attempt to attach the phrase projected from its left corner to higher-level syntactic units. Should it be done immediately after the phrase has been formed from its left corner, or after the phrase is complete with all its children, or at some intermediate point? Since ambiguities arise in making attachments and since semantics could help resolve such ambiguities, the points at which semantics can help the syntactic parser determine when the parser should attempt to make such attachments. LC parsing defines a range of parsing strategies in the spectrum of parsing algorithms along the "eagerness" dimension (Abney and Johnson, 1991).
1. Time here refers to points in the sentence (e.g., after word 2 from the left), not the micro-level time in processing a particular word. The latter demands a finer level of analysis than COMPERE attempts and falls into modularity arguments such as those given by Fodor (1983).
2. Semantics in this chapter refers to semantic as well as conceptual knowledge.
The two ends of this dimension are purely bottom-up (i.e., most circumspect) and purely top-down (i.e., most eager) parsers. Different LC parsers result from the choice of arc enumeration strategies employed in enumerating the nodes in a parse tree. In Arc Eager LC (AELC) Parsing, a node in the parse tree is linked to its parent without waiting to see all of its children. Arc Standard LC (ASLC) Parsing, on the other hand, waits for all the children before making attachments. While this distinction vanishes for pure bottom-up or top-down parsing, it makes a big difference for LC Parsing.
In this chapter, an intermediate point in the LC parsing spectrum between the ASLC and AELC strategies, called Head-Signaled Left-Corner Parsing (HSLC), is presented. The proposed point produces the right sequence of decisions for incremental interaction with semantics. In this strategy, a node is linked to its parent as soon as all the required children of the node are analyzed, without waiting for other optional children to the right. The required units are predefined syntactically for each phrase; they are not necessarily the same as the semantic head of the phrase (e.g., N is the required unit for NP, V for VP, and NP for PP). HSLC makes the parser wait for required units before interacting with semantics but does not wait for optional adjuncts, such as PP adjuncts to NPs or VPs.
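As a concrete illustration of this announce rule, the following sketch (written in Python purely for exposition, not COMPERE's actual implementation) checks whether a phrase may be attached and reported to semantics; the table of required units simply encodes the examples given above (N for NP, V for VP, NP for PP), and the representation of partially built children as (category, completed) pairs is an assumption made only for the illustration.

# A minimal sketch of the head-signaled announce rule.
# REQUIRED lists the syntactically required children assumed for each
# phrase type, taken from the examples in the text; everything else is
# treated as an optional adjunct.
REQUIRED = {"NP": ["N"], "VP": ["V"], "PP": ["NP"]}

def ready_to_signal_semantics(phrase_label, children_so_far):
    """True once every required child of the phrase has been analyzed,
    even if optional adjuncts (e.g., a PP to the right) are still open."""
    seen = [cat for cat, complete in children_so_far if complete]
    return all(req in seen for req in REQUIRED.get(phrase_label, []))

# An NP whose determiner and noun are in, but whose trailing PP adjunct
# is still being parsed, may already be attached and sent to semantics:
print(ready_to_signal_semantics("NP", [("DET", True), ("N", True), ("PP", False)]))  # True
# A PP whose object NP has not yet been completed must wait:
print(ready_to_signal_semantics("PP", [("PREP", True), ("NP", False)]))              # False

Under such a rule, the parser announces a phrase as soon as its required units are in place while leaving room for optional adjuncts to be attached later.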
6.2 Combining Bottom-Up and Top-Down Parsing
In this section, we present the arguments for the assertion that incremental interpretation of natural language sentences cannot be modeled by pure bottom-up or top-down parsers. We then present the functional and cognitive motivations for combining bottom-up and top-down methods by employing various left-corner parsers. First we describe the need to combine bottom-up and top-down methods. Then we analyze the types of knowledge that allow a parser to include a top-down influence and make an early commitment.
6.2.1 The Need to Combine Bottom-Up and Top-Down Methods
Data-driven models of language comprehension with bottom-up strategies are compatible with incremental semantic interpretation. Incremental comprehension is best described by a bottom-up strategy that interprets each successively larger constituent as it is built from the next word in the sentence. A top-down strategy would force the parser to commit to a whole constituent before analyzing the parts of the whole. Making such commitments when the processor does not have the necessary information results in unwarranted backtracking. Such behavior would not only be wasteful, it would also not conform to human behavior for the same input sentence. Several models built on the augmented transition network (ATN) mechanism, a top-down parsing method, are of this character (e.g., Winograd, 1973). Standard theories of syntax using phrase-structure rules are incompatible with incremental processing given the requirement that the steps taken by the parser be psychologically real (Steedman, 1989). They are based on the notion that a constituent has been parsed and can be semantically interpreted when all its parts have been processed. For instance, the rule VP → V NP can be applied to rewrite the sequence V NP into the single constituent VP only when we have seen the entire sequence. Only after such rewriting can the processor apply the corresponding semantic operations to interpret the VP constituent. Though it is possible to apply a bottom-up strategy to phrase-structure rules, the resulting process will not be incremental since the bottom-up parser waits until it has seen every daughter of a constituent before interpreting it. Phrase-structure rules can be applied to carry out an incremental interpretation only if sentences have a left-branching structure; this, however, is not true of a majority of natural languages (Steedman, 1989). Most natural languages are full of right-branching structures since they are SVO (Subject Verb Object) or SOV languages but not VOS. This leads to spurious ambiguities unless the parser can perform some form of partial analysis (Frazier, 1987). What is needed is a bottom-up parser with top-down guidance that can make early commitments before actually seeing every part of a constituent so that semantic interpretation can be incremental. Such early commitments may be made by employing top-down influence from a variety of types of knowledge.
6.2.2 Parsing Strategies
Different parsing algorithms differ in the order in which they traverse the tree structures representing the compositions of words as they build those structures. The criterion used in selecting the right parsing algorithm for COMPERE is one of making maximal commitments with justifiable bases. That is, at a particular point in the sentence, we would want COMPERE to make the maximal set of commitments so as to minimize the number of alternatives being pursued at that point, as long as those commitments are justified by some piece of knowledge. We would want to avoid making unjustified commitments that lead to unwarranted backtracking, in order to meet the functionally and psychologically derived constraints on the time course of sentence processing decisions. Most work on parsing strategies has been formalized based only on syntactic considerations, as though attachments were made using only lexical categories and grammar rules. However, other knowledge sources such as semantic and conceptual constraints and structural preferences also play a role in making syntactic attachments. When these other sources are considered, bottom-up and top-down parsing strategies can be explained in terms of the types of knowledge used to make attachments. In bottom-up parsing, the parent node comes after all its children. Information that is obtainable from the children nodes is used to generate the parent node. Such information is information that is overt in the children nodes (i.e., available directly from the lexical entries of the terminal nodes in the subtrees and from the categories of the nonterminals in the subtrees). If any piece of information that is not overt in the children is used, the parser could perhaps generate the parent before seeing all the children and their descendants, resulting in a strategy that is no longer bottom-up. A bottom-up parser is an extremely circumspect one that makes commitments using only information that it sees in the input. In top-down parsing, the parent comes before the children. Obviously, the parent could not have come from overt information available from the children. It must have been generated from some other source of information, the most common being grammatical predictions. A pure top-down parser is an extremely eager parser that generates interpretations without waiting for overt information from the input. One could combine the two strategies to arrive at a left-corner parsing strategy (Aho and Ullman, 1972). In this case, the parser starts out with overt information in the input to generate the left corner of a phrase. From then on, it uses a top-down strategy in combination with the bottom-up one to make predictions using information that is not overtly available from the part of the input that is rooted under the node currently being processed. More precisely, at any point in parsing, the parser is trying to make attachments between a child_i and a parent_j. Parent_j might have many children, ..., child_{i-2}, child_{i-1}, child_i, child_{i+1}, .... The subtree rooted at child_i is called subtree_i and the segment of the sentence spanned by this subtree is called segment_i. In bottom-up parsing, parent_j does not exist until all of its children are enumerated. Bottom-up parsing can only use the information in child_i and its siblings in order to combine the siblings to form the parent node parent_j. This information must come from the corresponding subtrees and segments. If any information that is coming either from parent_j or some other part of a parse tree outside the above subtrees, or is coming from an outside source such as semantic feedback, is used by the parser to make attachment decisions, the resulting strategy is top-down parsing.
6.2.3 Sources of Justification for Syntactic Commitments
Information arising from the input sentence and the lexical entries for the words in the sentence could be used by a bottom-up parser to select and compose meanings of words. Such cues present in the input sentence can be of the following types:
Syntactic Category: The syntactic category of a word (e.g., noun, verb, etc.) is used to compose the word with other words already seen to form a grammatical composite, or parent, syntactic unit.
Subcategory Information: Subcategory information (e.g., number, gender, tense, etc.) is used to prefer certain compositions over others.
Predicate Argument Structure or Licensing Information: If this information is available from the lexicon (e.g., for a verb), it is used to exert preferences over alternatives.
Word Meaning: The meaning(s) of a word constitute the elements from which the meaning of the sentence is composed.
Other Cues: A variety of other information in the input, such as punctuation and intonation patterns, could be used by a sentence processor, but is not modeled by COMPERE for the sake of simplicity.
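The following sketch bundles these bottom-up cues into a single lexical entry record; the field names and the sample entry for "saw" are illustrative assumptions rather than COMPERE's lexicon format, and Python is used only for exposition.

# A minimal sketch of a lexical entry carrying the bottom-up cues listed above.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class LexicalEntry:
    word: str
    category: str                                              # syntactic category (noun, verb, ...)
    subcategory: Dict[str, str] = field(default_factory=dict)  # number, gender, tense, ...
    arg_structure: List[str] = field(default_factory=list)     # licensed arguments, if known
    meanings: List[str] = field(default_factory=list)          # candidate word senses

# A hypothetical entry for the verb "saw" used in sentence (1) below:
saw = LexicalEntry(
    word="saw",
    category="verb",
    subcategory={"tense": "past"},
    arg_structure=["agent", "object", "instrument"],
    meanings=["perceive-visually", "cut-with-a-saw"],
)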
As already discussed in Chapter 5, information providing top-down guidance to the processor can be of three types: grammatical information about the categories involved leading to syntactic expectations, general structural preferences such as minimal attachment (Frazier, 1989), and feedback from semantics, reference, and discourse interpretation. Syntactic predictions cannot be made based on the children alone. This information is available only in the parent nodes. A bottom-up parser could not use the prediction for a category in its attachment decisions since the predicted node is bound to be outside the set of subtrees that have thus far been built from the input. For instance, it is known that after enumerating an NP node, a VP node must follow for the sentence to be completed. This information, even if obtainable from the category information for the already enumerated NP node, is of no use to a bottom-up parser since the expected VP node will always be outside the set of subtrees constructed bottom-up from the input. It could never be used by a bottom-up parser to prefer a VP attachment to S over one to a reduced relative clause, for instance, since the information that there is an expected VP is coming from the S node and is therefore unavailable to a pure bottom-up parser. The bottom-up parser would not be able to use syntactic predictions immediately to select among possible categories of the next word when a particular category is expected. The parser must also be able to exert purely structural preferences that are independent of the categories and lexical items involved in the ambiguous parts of the sentence. Examples of such a preference are right association and minimal attachment (Frazier, 1987). Such syntactic generalizations allow the syntactic processor to make early commitments for optional adjuncts as well (which were not expected), thereby explaining several syntactic phenomena (Frazier, 1989). Semantic preferences are also not overtly present in the input. They are often not part of the lexical entries of words in the spanned segments. A preference such as that for animate entities for the Agent role is a piece of general knowledge, often not specific to a lexical entry. In sentence (1) below, even if the lexical entry for the verb "saw" tells us that its instrument role can only be filled by an object that can act as an optical instrument, it takes some semantic processing to figure out that the "horse" is filling the instrument role if the PP is attached to the VP. Conceptual preferences usually also come from some amount of inference from meanings that are not all in the segments spanned by the children nodes. As such, a bottom-up parser would not be able to make use of semantic feedback to prune the parse forest early on. Preferences leading to top-down parsing decisions may also come from analyses of reference. However, certain pieces of conceptual knowledge may be directly encoded as lexical preferences and may be usable without further inferencing.
(1) The officer saw the man with the horse.
Justifications arising from semantic and conceptual preferences are handled by means of incremental communication between syntax and semantics. Those arising from syntactic expectations and structural preferences are integrated with bottom-up syntactic information by the syntactic parsing algorithm. The parsing algorithm determines when and how such information is used to make justified early commitments and thereby eliminate certain alternatives from immediate consideration.
6.2.4 Two Constraints on Human Parsers
Models of human language processing must take into account two constraining factors arising from both the nature of natural languages and the nature of the human processing system: local ambiguities and memory requirements. Natural languages are full of local ambiguities that use memory by making the processor hold on to multiple intermediate interpretations. However, the human processing system has a limited memory capacity. A cognitive model of parsing must therefore devise a parsing algorithm that keeps memory requirements to a minimum by reducing the amount of local ambiguity it must deal with. For example, the human parser is unable to comprehend deeply center-embedded constructions while being able to parse left- and right-branching structures with ease. Using this empirical evidence, Abney and Johnson (1991) have shown that in order to minimize both space requirements and local ambiguities in an incremental parser, the parsing strategy can be neither bottom-up nor top-down. They showed that an intermediate, uniform, syntax-directed parsing strategy known as left-corner parsing has the right properties along the dimensions of both local ambiguity and minimum space requirements. However, a pure left-corner parsing strategy does not build the syntactic structure in an order suitable for incremental semantic comprehension. We present an alternative parsing strategy, head-signaled left-corner parsing, that retains the local ambiguity and memory requirement virtues of left-corner parsing and integrates them with the virtues of head-driven parsing that are compatible with incremental semantic comprehension.3
6.2.4.1 Local Ambiguities
Local ambiguities are not inherent in a natural language sentence. They are the consequences of choosing a parsing strategy. Both top-down and bottom-up strategies lead to extreme positions along the local ambiguity scale. A top-down strategy is too eager and hence results in unfounded local ambiguities, whereas a bottom-up strategy is too circumspect and leads to increases in space requirements as well as fruitless explorations at lexical ambiguities. Left-corner parsing falls in an intermediate slot on the eagerness scale as well as on the storage requirements scale (Abney and Johnson, 1991), presenting itself as a highly promising candidate for computational psycholinguistics.4 However, it is not quite the right parsing strategy for incremental semantic interpretation where semantic interpretation is "syntax-guided" whenever possible. A variant of left-corner parsing, "head-signaled left-corner parsing," augments left-corner parsing with the notion of a head, as in "head-driven parsing," to optimize space requirements, minimize local ambiguities, and enable orderly incremental interaction with semantics. However, this strategy turns out to be different from head-corner parsing as well.
3. While COMPERE's head-signaled left-corner parsing has the virtue of keeping memory requirements to a minimum, no limit has been placed on available memory in the implementation. Reasons for not having an explicit numerical limit are discussed in Chapter 10.
4. The term computational psycholinguistics, as it is used here, refers to that discipline which focuses on building computational models of human language processing that are psycholinguistically valid, plausible, or predictive. The same term has been used by others in the context of connectionist models of human sentence processing.
The changes that a parser makes to the current interpretations after reading word w_i and before reading word w_{i+1} are called the ith parsing increment. A local ambiguity is said to exist at word w_i when there is more than one possible ith parsing increment for the parser to make. Since the increments that the parser makes at a point i depend on the parsing strategy being employed, local ambiguities are a result of the choice of parsing strategies. Local ambiguities cost time or space depending on whether the parser chooses to make a first analysis and then backtrack or to pursue multiple interpretations in parallel. In general, the more eager the parsing strategy, that is, the earlier it commits to an interpretation, the more errors it encounters. Such errors cost both time to recover from them and space to keep track of states at previous decision points. The less eager, or the more circumspect, the parsing strategy, the more space the parser needs in order to keep all the alternatives at hand before it can finally commit to some of them, but it may not take more time if computation can be parallel.5 Global ambiguities are those that remain unresolved even after applying all the knowledge available to the parser and all that is derivable from an entire sentence. Global ambiguities are independent of parsing strategies.
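The definition can be stated as a one-line test, sketched below in Python for illustration only; the string labels for the candidate increments are hypothetical stand-ins for whatever increment representation a given parser uses.

def has_local_ambiguity(possible_increments):
    """A local ambiguity exists at word w_i exactly when the chosen parsing
    strategy allows more than one distinct ith parsing increment."""
    return len(possible_increments) > 1

# After "The officer saw the man with ..." an eager strategy already offers
# two increments (attach the PP to the NP or to the VP), so a local
# ambiguity exists at the preposition under that strategy:
print(has_local_ambiguity({"attach PP to NP", "attach PP to VP"}))   # True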
6.2.4.2 Memory Requirements
Architectural models of human language processing have been shaped by the two taxing demands of minimizing space requirements and the amount of local ambiguity encountered by the parser. The model must use finite amounts of working storage; this amount must be very minimal considering the limitations on human short-term or working memory. Furthermore, memory limitations should predict parsing breakdown. A classic example of this last requirement is seen in the case of center-embedded constructions such as "The rat the cat the dog chased bit ate the cheese." The parser must require greater amounts of memory on this sentence than on easily comprehensible left- or right-branching constructs. In their position paper on parsing strategies for psycholinguistic modeling, Abney and Johnson (1991) have shown that neither a top-down nor a bottom-up parser predicts a breakdown due to memory overload in center-embedded sentences. In fact, each of them requires greater memory to parse one or the other of left- and right-branching constructs than center-embedded constructs. A parser needs space to keep its partial interpretations of initial substrings of a sentence. Any parser must require only a finite amount of space for all the inputs it can parse correctly. In what follows, only the space requirements of syntax are considered. This does not mean that semantic interpretation does not require space; it does. Nevertheless, this is a valid assumption for the analyses of parsing strategies below since syntactic space requirements alone are sufficient to rule out certain parsing strategies. (It is also not clear how the space requirements of semantics are shared between working memory and long-term memory.) It is also assumed that the human parser constructs a parse tree incrementally. However, it is not required that the entire parse tree be accessible at any time. Parts of the trees may be "lost" as parsing proceeds. For instance, as bigger trees are built by composing smaller trees, parts of the trees in the "interior" (i.e., not on the right-hand boundary of the new composite tree) may be thrown away. The parser must, however, compute the syntactic structures and the structural relations between the constituents of a sentence in some form equivalent to the parse tree. Any such syntactic analysis is assumed to take the same amount of space as does building and using a parse tree. The space requirement of a parser is not the space required to represent a complete parse tree. Nor is it the number of nodes that are not yet attached to a parent node. The space required at any point in parsing a sentence is the number of nodes that the parser might have to refer to
5. An example of a parsing architecture designed to address the problem of local ambiguities is the Marcus parser (Marcus, 1980).
at a later point, either to attach the node to a parent or to attach a child to the node (Abney and Johnson, 1991). This is called the parser's undifferentiated space requirement to clarify the point that we do not concern ourselves with whether the required nodes are all in working memory or are shuffled between working memory and long-term memory. An example of a parsing architecture designed to minimize space requirements is the "first analysis" model of Frazier (1987) built around the minimal attachment hypothesis. Abney and Johnson (1991) define space requirements as the size of the set of all nodes that the parser may need to refer to later in the parse. This set is the set of all nodes whose parent has not been identified, and all nodes that are missing one or more children. This characterization is problematic since we never know whether a node has acquired all its children until later in the parse. This is a consequence of the fact that natural languages allow optional children such as prepositional adjuncts. An alternative formulation of the space requirement is given below.
Space Requirement: The space required at any point in parsing is the total number of nodes on all the right-hand boundaries of all the parse trees at that point.
Right-Hand Boundary: A node in a parse tree is on the right-hand boundary of the tree if it is the root of the tree or if it is the rightmost child of its parent and its parent is on the right-hand boundary of the tree.
Note that there may be more than one parse tree at a point if different parts of the sentence have different subtrees that have not been attached to each other yet. Note also that the nodes on the right-hand boundaries are the ones that the parser may have to refer to, since attachments can only be made to those nodes. This analysis is, however, approximate since it does not account for interior nodes (i.e., those that are not on the right-hand boundaries), if any, that need to be referred to during error recovery. This analysis does, however, account for the space required to keep track of syntactic expectations (i.e., predictions); an expected node can simply be treated as a separate tree with just a root node. The space requirement as defined above may be greater than that defined by Abney and Johnson if some nodes on the right-hand boundary are known grammatically not to take any more children. This is not common since almost any grammar allows adjuncts such as prepositional phrases and adverbs to be added freely to the right of a phrase.
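The alternative formulation can be computed directly, as in the following sketch (an illustration of the definition, not the thesis's implementation), which assumes parse trees represented as nested (label, children) pairs and treats an expected node as a single-node tree.

def right_hand_boundary(tree):
    """The root plus, recursively, the right-hand boundary of its rightmost child."""
    label, children = tree
    if not children:
        return [label]
    return [label] + right_hand_boundary(children[-1])

def space_requirement(forest):
    """Total number of nodes on the right-hand boundaries of all current
    (possibly unattached) parse trees, including bare expected nodes."""
    return sum(len(right_hand_boundary(tree)) for tree in forest)

# One partial tree plus one expected (predicted) node:
forest = [("S", [("NP", [("DET", []), ("N", [])])]),   # boundary: S, NP, N
          ("VP", [])]                                   # an expected VP with no children yet
print(space_requirement(forest))   # 3 + 1 = 4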
6.3 Some Preliminaries
This section introduces several concepts, distinctions, and assumptions necessary for making the points in the rest of this chapter. Before analyzing different parsing strategies, it is necessary to define certain terms and state certain assumptions.6 A fundamental assumption underlying this work is the principle of incremental interpretation, stated in Chapter 2. When applied to the syntactic parser, this principle can be restated as follows: The parser must provide a maximal, partial interpretation for all initial substrings of a sentence (Crocker, 1993). This principle entails maximal use of lexical, grammatical, and other knowledge sources and, together with the need to reduce errors, entails the principle of integrated processing (Birnbaum, 1986; also stated in Chapter 2). Neither principle requires the parser to select just one interpretation at all times. They do not demand a "first analysis" model (Frazier, 1987); the parser may delay
6. This and the following section derive heavily from the ideas discussed by Abney and Johnson. Also, several figures that follow are based on those by Abney and Johnson (1991).
decisions at times and pursue multiple interpretations (Stowe, 1991) depending on the sufficiency of available knowledge for choosing an interpretation.7
6.3.1 Spectrum of Parsing Strategies
A parsing strategy is a way of enumerating the nodes in a parse tree. Conversely, every enumeration of the nodes in a tree is a different parsing strategy. A uniform parsing strategy is one which uses a repeating pattern for enumeration throughout the tree. There are three basic parsing strategies that correspond to the three basic patterns of tree traversal: pre-order, post-order, and in-order traversal.
Top-Down Parsing: A parsing strategy is top-down if a node is enumerated before any of its children are. This results in a pre-order traversal of the tree.
Bottom-Up Parsing: A parsing strategy is bottom-up if a node is enumerated after all its children and their descendants are. This results in a post-order traversal of the tree. Left-Corner Parsing: A parsing strategy is left-corner if a node is enumerated immediately after its leftmost child, along with its descendants, is but before any other child or any of its descendants is. This results in an in-order traversal of the tree. More precisely (see also Nijholt, 1980), the left corner of a production is the leftmost symbol (terminal or nonterminal) on the right side. A left-corner parse of a sentence is the sequence of productions used at the interior nodes of a parse tree in which all nodes have been ordered as follows. If a node n has p direct descendants n_1, n_2, ..., n_p, then all nodes in the subtree with root n_1 precede n. Node n precedes all its other descendants. The descendants of n_2 precede those of n_3, which precede those of n_4, and so forth. Roughly speaking, in left-corner parsing the left corner of a production is recognized bottom-up and the remainder of the production is recognized top-down (Aho and Ullman, 1972). Bottom-up, left-corner, and top-down parsing are three points on a spectrum of what are called uniform syntax-directed parsing strategies. A syntax-directed parsing strategy is one where there is an announce point in every grammar rule such that the children that precede the announce point, and all their descendants, are enumerated first, then the parent node is enumerated, and then the children after the announce point are enumerated, as are all their descendants (Abney and Johnson, 1991; Nijholt, 1980). A syntax-directed strategy is uniform if the announce point is the same for all rules. The announce point is zero (before the first child) for top-down parsing, one (the left corner) for left-corner parsing, and n (after the rightmost child) for bottom-up parsing. Every syntax-directed strategy is also a left-to-right strategy (i.e., preceding nonterminal nodes are enumerated before succeeding nonterminals). Figures 6.1, 6.2, and 6.3 show enumerations of a parse tree by top-down, bottom-up, and left-corner strategies respectively. These three parsing
7. On a different note, however, the integrated processing principle entails independence between the different knowledge sources (see the principle of functional independence in Chapter 2). As long as there are situations in which some kinds of knowledge are useful for making rational decisions but other knowledge sources are unavailable or unusable for some reason, integrated processing requires that the ones that are useful be independent of those that are not. Since such situations do occur in human language processing, integrated processing entails the independent applicability of each knowledge source.
strategies correlate with the three classes of languages and grammars: LL, LR, and LC(k) (Nijholt, 1980).
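The notion of a uniform announce point can be made concrete with a small sketch; the tree representation as nested (label, children) pairs and the Python rendering are assumptions made only for illustration, but the three special cases reproduce the pre-order, in-order, and post-order enumerations described above.

def enumerate_nodes(node, k, out=None):
    """Enumerate nodes under a uniform syntax-directed strategy with
    announce point k: the first k children (with their descendants) are
    enumerated, then the parent, then the remaining children.
    k = 0 is top-down, k = 1 is left-corner, and a k no smaller than the
    number of children is bottom-up."""
    if out is None:
        out = []
    label, children = node
    split = min(k, len(children))
    for child in children[:split]:      # children before the announce point
        enumerate_nodes(child, k, out)
    out.append(label)                    # announce (enumerate) the parent
    for child in children[split:]:       # children after the announce point
        enumerate_nodes(child, k, out)
    return out

# Example tree for S -> NP VP, NP -> DET N, VP -> V NP:
tree = ("S", [("NP", [("DET", []), ("N", [])]),
              ("VP", [("V", []), ("NP", [("DET", []), ("N", [])])])])

print(enumerate_nodes(tree, 0))    # top-down: each parent before any of its children
print(enumerate_nodes(tree, 1))    # left-corner: each parent right after its leftmost child's subtree
print(enumerate_nodes(tree, 99))   # bottom-up: each parent after all of its children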
Figure 6.1: Top-Down Parsing.
There are non-syntax-directed parsing strategies such as Head-driven parsing. In head-driven parsing, every phrase has a distinguished child called the head. A parent node is enumerated immediately after its head is enumerated. Not all of the subtree rooted at the head need be enumerated before the parent can be. Thus head-driven parsing is not syntax-directed. We shall see the advantages of head-driven parsing for incremental semantic interpretation below. Figure 6.4 shows an enumeration of a parse tree by head-driven parsing. Head-driven parsing is generally applied to government binding theories of parsing and as a result, the parse tree in Figure 6.4 is not typical of those produced by head-driven parsers.
6.3.1.1 Arc Enumeration Strategies
In any of the strategies above, one can make a distinction between an arc-eager strategy and an arc-standard strategy, with consequent differences in space requirements and local ambiguity.
Arc Eager Strategy: A parser is said to be arc eager if it attaches a pair of nodes as soon as both nodes are available, regardless of whether their children have been enumerated or not.
Arc Standard Strategy: A parser is said to be arc standard if it either attaches two enumerated nodes when none of the subtree under the nodes has been enumerated, or when all of the subtrees rooted at the nodes have been enumerated.
These two strategies are identical for a top-down or bottom-up parsing strategy. However, they are not identical for an intermediate strategy such as left-corner parsing. An arc-eager strategy may lead to a reduction in space requirements but never requires more space than its arc-standard counterpart. This reduction is accompanied by an increase in local ambiguity for the arc-eager
Figure 6.2: Bottom-Up Parsing.
Figure 6.3: Left-Corner Parsing.
strategy. Figure 6.3 showed an arc-eager left-corner enumeration. Figure 6.5 shows a corresponding arc-standard left-corner parse.
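The difference between the two arc enumeration strategies can be reduced to a single attachment test, sketched below for illustration only; the boolean summary of the child's state is a simplification, and the arc-standard case of attaching before any of the subtree has been enumerated (the purely top-down case) is omitted here.

def may_attach(strategy, child_subtree_fully_enumerated):
    """Whether an already-enumerated child may be attached to an
    already-enumerated parent under the given arc enumeration strategy."""
    if strategy == "arc-eager":
        return True                                 # attach as soon as both nodes exist
    if strategy == "arc-standard":
        return child_subtree_fully_enumerated       # wait for the whole subtree under the child
    raise ValueError(strategy)

print(may_attach("arc-eager", False))     # True: attach a PP node before its object NP is finished
print(may_attach("arc-standard", False))  # False: wait until the PP subtree is complete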
Figure 6.4: Head-Driven Parsing.
Figure 6.5: Arc-Standard Left-Corner Parsing.
6.3.2 Required versus Optional Constituents
Syntactic knowledge is a set of generalizations about the acceptable sentence structures in a natural language. Each generalization, or rule of grammar, relates at least three constituent structural
units. Its general form is: Parent → child-leftmost, child2, child3, ..., child-rightmost. This rule says that the specified children, in the specified order, make up the said parent. In other words, the said parent can be expanded to the specified children in that order. Natural languages allow many ways of making up a parent by appending or inserting optional constituents to a parent structure. In other words, each syntactic structure has at least one required unit and zero or more optional units. This distinction between optional and required units is a very important one when it comes to syntactic processing, especially when syntactic processing is supposed to be interacting incrementally with semantic processing. A phrase has two kinds of constituents: those that are required for the phrase to be complete and those that are optional adjuncts. Optional constituents can be to the left of the required parts (e.g., an adjective in an NP) or to the right of an already completed phrase (e.g., a PP attached to an NP). One of the required constituents of a phrase is called the head of the phrase. The head provides the core meaning of the phrase; the other required and optional constituents modify the meaning of the head. For example, N is the head of an NP, V that of a VP, and VP that of S (in an event-centered semantics as described in Chapter 7). The head as defined above is actually what could be called the semantic head, since what is considered a head in head-driven parsing techniques may be based only on syntactic considerations. For instance, in some theories, the determiner is the head of an NP (Abney, 1989). In this work we need to assume that the grammar formalism is such that the semantic head is the same as the syntactic head. According to COMPERE, the head of a PP is the NP in it (and recursively, the head of the NP, i.e., the N), not the preposition. This is obvious when we consider the semantic function of the words involved. A preposition modifies the meaning of the noun by specifying the relation between the noun and the event or another noun in the sentence. A sentence has nouns and verbs. It also has two other types of constituents: those that modify the meanings of nouns and verbs and those that mark the relationships between different nouns and verbs. These markers appear in the form of different linguistic cues. Word order and prepositions are the two most used markers in English. Syntactically, prepositions happen to appear to the left of the PPs. This does not make the preposition the semantic head of the PP; the head of the PP is the noun in its NP. A PP with a prep-noun and no preposition is an example of a grammatical construct where there is no need for an overt preposition, showing that the semantic head is the noun. It may also be noted that the semantic head of a phrase is always an open-class word.
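A grammar rule annotated with the required/optional distinction and a head can be represented as in the following sketch; the named-tuple encoding and the sample NP rule with an optional adjective and PP adjunct are illustrative assumptions, not COMPERE's grammar formalism.

from collections import namedtuple

Constituent = namedtuple("Constituent", "category required head")
Rule = namedtuple("Rule", "parent children")

# NP -> (ADJ) N (PP): the noun is required and is the (semantic) head;
# the adjective to the left and the PP adjunct to the right are optional.
np_rule = Rule("NP", [Constituent("ADJ", required=False, head=False),
                      Constituent("N",   required=True,  head=True),
                      Constituent("PP",  required=False, head=False)])

def required_children(rule):
    return [c.category for c in rule.children if c.required]

print(required_children(np_rule))   # ['N']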
6.4 Why Not Bottom-Up or Top-Down?
Abney and Johnson (1991) have shown that bottom-up parsing takes more space for right-branching constructions than center-embedded constructions. Top-down parsing takes more for left-branching constructions than center-embedded ones. Given that human parsing breaks down on center-embedded sentences but not on right- or left-branching ones, neither of these parsing strategies is good for modeling human sentence processing. Left-corner parsing has the right properties in terms of space requirements for different branching structures of sentences (shown graphically in Figure 6.6). These results are summarized in Table 6.1.
6.4.1 Other Reasons for Mixing Bottom-Up and Top-Down Strategies
Intuitively, the principle of incremental comprehension entails maximal use of both lexical and grammatical knowledge sources (Crocker, 1993). It is a small step from the principle of incremental
Figure 6.6: Sentence Branching Structures (left-branching, center-embedded, right-branching).
Table 6.1: Space Requirements and Local Ambiguities of Parsing Strategies. n is the number of terminal nodes or words in the sentence.

Parser                     Left-Branching   Center-Embedded   Right-Branching   Local Ambiguity
Bottom-Up                  3                n/2               n                 Low (but high for lexical ambiguities)
Top-Down                   n                n/2               2                 High
Arc-Eager Left-Corner      2                n/2               3                 Low
Arc-Standard Left-Corner   2                n/2               n                 Low
interpretation to the need for bottom-up processing. In other words, a partial interpretation can be constructed from the input substring in a bottom-up process. It is in making it more complete that one needs to throw in top-down strategies. Consider an incremental semantic interpreter that works hand-in-hand with the syntactic parser. That is, the syntactic parser sends information on every attachment it makes to the semantic interpreter. The semantic interpreter, thus guided by syntax, tries to find corresponding semantic relationships between the entities attached syntactically. Further, it provides feedback to the syntactic parser regarding the semantic feasibility of the proposed syntactic attachment. In fact, the syntactic parser consults the semantic processor before making an attachment so that the semantic processor can advise syntax as to the semantic preferences for the attachment. An alternative to this architecture (that will not be embraced here) is one where the semantic interpreter is not guided by syntax. Rather, the two act independently but communicate their decisions to each other to negotiate a settlement. This violates the integrated processing principle since the semantic interpreter may do some computation that could have been avoided if only it had followed the guidance available from syntactic information. (Such wasteful semantic processing can also be seen in semantics-first parsers.) It may be noted that integrated processing simply requires that decisions be made using both syntactic and semantic preferences, not that semantic processing be done excluding syntactic guidance. Neither syntactic nor semantic processing need be autonomous. The sentence processor may employ a highly interactive process that is syntax-first at the "sub-word level" (so called because there may be syntax-semantics interactions more than once in processing a single word). In computational linguistics, all parsers built since the 1970s have shown that we need a good combination of bottom-up and top-down methods (Berwick, 1993). One reason for this involves empty categories.8 Empty categories cannot be posited by a bottom-up parser early enough to be able to account for psychological data on the processing of long-distance dependencies (Crocker, 1993). Psychological evidence from reading-time difficulties in parsing head-final constructs in German has shown that arguments are attached to a hypothesized node before its head occurs in the input string (Bader and Lasser, 1993). This shows that the German parser could not have been purely bottom-up.9 In the case of computational models, even those models which appear to be bottom-up, such as categorial parsers (Steedman, 1989), in fact use operations such as type raising,
8. An empty category is typically an empty (or missing) NP that has moved to a different position as a result of a syntactic transformation (Sells, 1985).
9. See our assumptions in Chapter 2 on the overall process of language comprehension across languages (assumption 5 in Section 2.5).
equivalent to those that license arguments before actually seeing the head.10 Categorial grammars make parsing more bottom-up by expanding or laying out different types of knowledge that would have been used for top-down predictions or licensing in a system of categories and by lexicalizing the categories. If one wants to avoid the consequent enormities of lexical representations, one has to capture generalizations across categories in more compact representations which are not lexical any more. As a result, one has to move toward a more top-down parsing strategy.
6.5 Problems with Left-Corner Parsing

Though left-corner parsing is a combination of bottom-up and top-down parsing strategies, it still has problems with incremental communication with semantics. Consider the enumeration of a parse tree with an optional adjunct by arc-eager left-corner parsing (Figure 6.7). There is a local ambiguity at the point of attaching the PP: its parent could be either the NP or the VP. This parsing strategy, being eager to make commitments, proceeds to deal with this local ambiguity as soon as it sees the preposition. However, semantics is unable to help syntax resolve this local attachment ambiguity, since the head of the NP in the PP is as yet unknown. The meaning of the prepositional object is unavailable for the semantic interpreter to decide which of the two attachments is semantically feasible.11 If only the parser had waited until it had seen the head, it could have sought the help of semantics in resolving this ambiguity. It could have avoided wasteful processing of parallel interpretations, or it could have avoided some errors and backtracking which would have resulted if the parser tried a first analysis derived from a structural preference.

An arc-standard left-corner parser would perhaps fare better. Consider the same tree enumerated by such a parser (Figure 6.8). It can be seen that the arc-standard left-corner parser violates the principle of incremental interpretation. It does not provide a maximal partial semantic interpretation where it could have if only it had been a little less circumspect. For instance, after seeing the V, the event described by the V could have been semantically related to the subject NP by building the link between the VP and the S (Figure 6.8). This parser, however, does not permit that until after the direct object of the V has been composed with the V. In other words, link number 27 between the VP and the S in Figure 6.8 is built only after those between the PP and the NP/VP (number 25) and between the object NP and the VP (number 26).

From the above analysis, it is fairly obvious that what we need for an incremental semantic interpreter that minimizes local ambiguities is a parsing strategy that is intermediate between an arc-eager and an arc-standard left-corner parser. It is also obvious that it is the head of a phrase that defines the announce point for interaction with semantics. The head marks the point at which semantics can provide useful advice for syntactic decision making. As seen above, if we take the eager extreme and attempt to communicate all the time, semantics will be unable to help before the head and unable to alleviate the increase in local ambiguity resulting from early commitment (Figure 6.7 above).

11 One can argue that there may be other sources of information, such as statistical information from a corpus, lexical subcategorizations of the verb for particular prepositions, or contextual biases, that would enable a justified early commitment immediately after the preposition. See the discussion on flexible parsing in Chapter 10 (Section 10.3) for possible improvements to the parsing algorithm to account for these sources of information.
Figure 6.7: Arc-Eager Left-Corner Parsing Is Too Eager.
6.6 Head-Signaled Left-Corner Parsing (HSLC Parsing)

Left-corner parsing provides a neat way to combine bottom-up and top-down strategies: generate the left branch bottom-up, go up a level, and then project top-down to the other branches. Left-corner parsing is, however, an incomplete specification of a parsing strategy, unlike pure bottom-up or top-down parsing. An arc-eager left-corner strategy turns out to be rather too eager when incremental semantic interpretation is considered. An arc-standard left-corner strategy, on the other hand, happens to be too circumspect. Between these two ends, left-corner strategies define a range of parsing strategies that all lie between bottom-up and top-down strategies. The point within this range defined by the semantic head of the phrase is the left-corner parsing strategy that has the right computational properties, in terms of space requirements and local ambiguities, for incremental semantic interpretation. This is called Head-Signaled Left-Corner Parsing. (See Figure 6.9 for the positions of different parsers on a scale from bottom-up to top-down parsing.) The HSLC algorithm (Figure 6.10) says:
Generate the parent from the left branch. Always attach a node further up after seeing its head. Do not wait for other constituents after the head to make attachments.
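The difference between the strategies can be reduced to the question of when a node is announced as ready for attachment further up. The following minimal sketch (illustrative only; COMPERE's actual data structures appear in Figure 6.10 and Chapter 8) states the three announce conditions:

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class PhraseNode:
        children: List["PhraseNode"] = field(default_factory=list)
        head: Optional["PhraseNode"] = None
        complete: bool = False        # set once all children have been parsed

    def ready_arc_eager(node: PhraseNode) -> bool:
        # Arc-eager: announce as soon as the left corner has been built.
        return len(node.children) >= 1

    def ready_arc_standard(node: PhraseNode) -> bool:
        # Arc-standard: announce only after every child has been parsed.
        return node.complete

    def ready_hslc(node: PhraseNode) -> bool:
        # HSLC: announce as soon as the head child has been parsed;
        # do not wait for the constituents that follow the head.
        return node.head is not None and node.head.complete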
Figure 6.8: Arc-Standard Left-Corner Parsing Is Too Circumspect.

Figure 6.10 below shows the HSLC parsing algorithm. Further details of implementing this algorithm are described in Chapter 8. Storage of expectations is also described in the formal analysis in Chapter 9. Figure 6.11 illustrates the data structures described in Figure 6.10 graphically. The enumeration produced by this strategy is shown in Figure 6.12. This is different from pure head-driven parsing in that it permits the generation of a parent from just its left branch, whether or not this branch is the head of the parent node. Head-driven parsing would behave like a bottom-up parser, waiting until later, if the left branch were not a head. The eagerness licensed by the left-corner strategy in such a situation is valuable in the following way. It permits the generation of the parent and the associated predictions immediately after the left branch. These expectations can be used to resolve local structural and lexical ambiguities in the right branches to follow. For instance, such a preference for an expected structure helps produce Minimal Attachment behavior without having to count the number of nodes in the tree.

Consider a PP attachment ambiguity and the tree-traversal labelings produced by the different left-corner parsers shown in Figures 6.7, 6.8, and 6.12. It can be seen from Figure 6.7 that AELC attempts to attach the PP to the VP or the NP even before the noun in the PP has been seen. At this point, semantics cannot provide useful feedback, since it has no information on the filler of the thematic role to evaluate against the known selectional preferences for that role. Thus AELC is too eager for interactive semantics. ASLC, on the other hand, does not attempt to attach the VP to the S until the very end (Figure 6.8). Thus even the thematic role of the subject NP remains unresolved until the very end. ASLC is too circumspect for interactive semantics. HSLC, in contrast, attempts to make attachments at the right time for interaction with semantics (Figure 6.12).
[Scale from bottom-up to top-down parsing, locating pure bottom-up, the left-corner family (arc-standard/circumspect, head-driven, head-signaled (HSLC), arc-eager/eager), and pure top-down.]
Figure 6.9: Relative Positions of Parsing Strategies.

At first sight, it might appear that HSLC parsing is not quite a neat strategy, since it says one has to wait for the semantic head to make attachments. However, it is more uniform than arc-eager left-corner parsing in the face of attachment ambiguities. Left-corner parsing requires you to wait until every child of the left branch has been enumerated, but the arc-eager strategy requires you to proceed to make attachments for the right branches without waiting for the descendants of those right branches. That is, the parser waits even on optional adjuncts on the left branch but does not wait even for required descendants on the right branches. An arc-standard strategy would result in far greater uniformity but, as we have seen earlier, arc-standard is too bottom-up for incremental interpretation. HSLC, on the other hand, requires you to wait until the head is parsed before making further attachments on any branch. It is left-corner only in that the parent is generated from the left branch no matter what. It might also be argued that waiting for the semantic head kills the independence of syntax. However, HSLC does not require the parser to wait for particular meanings or lexical strings; it simply requires waiting until the head has been parsed. Heads are grammatically defined syntactic entities, since they are defined for each phrase in a language irrespective of particular lexical items.
6.6.1 Space Requirements of HSLC
As seen in Table 6.1 earlier, all left-corner parsers require constant space for left-branching constructs and unbounded space for center-embedded constructs, as desired. However, arc-standard left-corner parsing requires unbounded space for right-branching constructs as well, since it is rather circumspect.
Data Structures:
    Tree: an n-ary tree of Nodes.
    Node: a record with
        Children: an ordered list of children nodes.
        Parent: a parent node.
        Expects: a node that is expected by this node.
        Expected-by: a node that expects this node.
        Lexical-entry: the lexical entry of the word from which this node was created.
        Head: the head child node for the phrase this node represents.
        Complete-p: is this node (phrase) complete?

Algorithm HSLC:
    Given a grammar and an empty set as the initial forest of parse trees,
    For each word,
        Add a new node T_w to the current forest of trees {T_i} for each category in the lexical entry of the word; mark T_w as a complete subtree;
        Repeat until there are no more complete trees that can be attached to other trees,
            Propose attachments for a complete subtree T_i: to a T_j that is expecting T_i, or to a T_j as an optional constituent, or to a new T_k to be created if T_i can be the left corner (leftmost child) of T_k;
            Select an attachment by consulting semantics (see the Role-Assign algorithm in Chapter 7) and arbitrating (see Algorithm Arbitrate in Chapter 8), and attach;
            If a new T_k was created, add it to the forest, and make expectations for the required units of T_k;
            If a T_i in the forest has just seen its head, mark T_i as a complete subtree.

Figure 6.10: The HSLC Algorithm.
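A direct, if simplified, rendering of the data structures and the outer loop of Figure 6.10 might look as follows. This is a sketch under several assumptions: the grammar object, the proposal objects, and the choose_attachment function (standing in for Role-Assign plus Arbitrate) are hypothetical, and apply() is assumed to remove an attached subtree from the top level of the forest.

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class Node:
        category: str
        children: List["Node"] = field(default_factory=list)
        parent: Optional["Node"] = None
        expects: Optional["Node"] = None        # node that is expected by this node
        expected_by: Optional["Node"] = None    # node that expects this node
        lexical_entry: Optional[dict] = None
        head: Optional["Node"] = None
        complete: bool = False

    def hslc(words, lexicon, grammar, choose_attachment):
        """Skeleton of the HSLC loop: lexicon maps a word to its categories;
        grammar.propose_attachments(node, forest) yields candidate attachments."""
        forest: List[Node] = []
        for word in words:
            for category in lexicon[word]:
                forest.append(Node(category=category,
                                   lexical_entry={"word": word},
                                   complete=True))
            while True:
                proposals = [p for n in forest if n.complete
                             for p in grammar.propose_attachments(n, forest)]
                if not proposals:
                    break
                chosen = choose_attachment(proposals)   # consult semantics, arbitrate
                new_parent = chosen.apply(forest)        # attach; may create a parent
                if new_parent is not None:
                    forest.append(new_parent)            # post its expectations here
                for n in forest:
                    if n.head is not None and n.head.complete:
                        n.complete = True                # head seen: announce the node
        return forest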
As HSLC lies in between arc-eager and arc-standard, we must verify that HSLC does not require unbounded space on right-branching constructs. Consider the right-branching construct shown in Figure 6.13. The worst-case situation is the one where R_1 is the head of R_0, ..., R_{i+1} is the head of R_i, ..., and R_n is the head of R_{n-1}. If any of the L_i is a head, HSLC, unlike arc-standard LC, guarantees that all attachments above that level are done, thereby eliminating the space required to hold all the incomplete constituents above that level. Similarly, if any of the R_i is an optional constituent of R_{i-1}, HSLC does not wait for anything below that level, thereby limiting space requirements. Moreover, it may be noted that if there are more siblings at any level, say if there are optional constituents to the left of the head at that level,
it is not a problem for HSLC, since it always generates the parent from the left corner and keeps attaching the right branches at that level to the parent.

Figure 6.11: An Illustration of Data Structures in HSLC Parsing. [(a) Constituents of a phrase xP: the left corner, optional modifiers, the head, required units (arguments), and optional adjuncts. (b) A forest of parse trees: complete and incomplete trees linked by Expects/Expected-by pointers.]

The worst-case, right-branching situation described above does not occur in English, or in any other natural language. This is formalized in the following lemma.
Finite Wait Lemma: If a natural language has a right-branching construct such as the one shown in Figure 6.13, then either some of the L_i (or some other sibling M_i between L_i and R_i) is a head of R_{i-1} and the maximum distance |i - k| between two such consecutive heads L_i (or M_i) and L_k (or M_k) is a small constant (say 2 or 3), or such constructs in the language cause the human language processor to break down (just as in the case of center-embedded constructs).
The human language processor fails to process center-embedded constructs beyond a certain level of embedding because such constructs require the parser to hold on to a number of incomplete structures. The worst-case right-branching construct above also requires the parser to hold on to a large number of constructs. The finite wait lemma essentially says that either such constructs
do not occur in natural languages, or they do cause the same kind of breakdown as do center-embedded sentences. Natural languages require only a finite wait, in terms of the number of units the parser needs to hold on to before they provide the necessary heads, or else they contain pathological right-branching constructs in addition to center-embedded ones. The worst-case right-branching sentence is one where phrases keep being opened but are all closed only later, when the head of the rightmost phrase is seen. A more common right-branching construct is one where many of the rightmost branches are optional constituents, as in nested PPs, where alternate rightmost nodes are optionals (PPs) and heads (NPs). Note that a bottom-up or an arc-standard left-corner parser would fail equally badly on such constructs. HSLC would be the same as arc-eager left-corner parsing for a language that is uniformly head-first, and the same as arc-standard left-corner parsing for a language that is uniformly head-final. HSLC is better than either for a "head-middle" or a mixed head-first and head-final language. The above discussion shows that a uniformly head-final language is not a problem for HSLC except in the worst-case situation. According to the lemma above, either such situations do not occur or they cause the human parser to break down as well.12

Figure 6.12: HSLC: Head-Signaled Left-Corner Parsing.

12 Such constructs are possible in certain case-marked, head-final languages. However, the sentential complements in those sentences appear before the verb (i.e., the head), and thus the sentences become center-embedded ones, not worst-case right-branching constructs.
[A right-branching tree: each node R_{i-1} has children L_i and R_i, for i = 1, ..., n.]
Figure 6.13: A Right Branching Construct.
6.6.2 Local Ambiguities in HSLC
We have seen that optional constructs before the head are not a problem for HSLC. This is because HSLC, being a left-corner parser, always generates the parent from the leftmost branch. As a result, the parent is available while attaching the right branches and can help resolve local ambiguities by applying top-down strategies. Optional constructs that come after the required parts sometimes lead to attachment ambiguities, such as PP attachment ambiguities (Figures 6.7, 6.8, and 6.12). Since HSLC attempts to make such attachments only after the semantic heads of the adjuncts, such as the PP, have been processed, it can take the advice of semantics regarding the choice between attachment points.13 Thus, it can avoid wasteful processing of multiple attachments, or a first analysis based on a uniform structural preference followed by backtracking. HSLC provides an enumeration of the structure of a sentence that is optimal for incremental semantic interpretation according to the principles of incremental interpretation and integrated processing. HSLC also provides top-down predictions for the right branches. Such predictions can be used to eliminate certain ambiguities that arise in bottom-up parsing. For instance, lexical ambiguities can be resolved by choosing the interpretation that meets the existing top-down prediction. Thus, HSLC provides the right mix of early commitment and delayed decisions, a good middle ground in the spectrum of parsing strategies in terms of local ambiguities as well.
6.7 Related Parsers

There are several parsing strategies that are close to HSLC. In this section, some of the significant ones are compared with HSLC. See Chapter 9 for a detailed and formal comparative analysis of local ambiguities and space requirements in different parsing algorithms.

13 Here we assume that semantic evaluation of proposed syntactic attachments can provide useful information for resolving such attachment ambiguities.
6.7.1 Head-Corner Parsing
There is a different combination of the ideas of left-corner and head-driven parsing strategies called head-corner parsing. In head-corner parsing, "head" is substituted for "left" by sacrificing the left-to-right processing of the words in a sentence. Instead, the parser looks for a seed word which becomes the head of the phrase. It then uses the head to consume other parts of the phrase to the left and to the right of the head. Thus, a head is selected by searching the input string and is then used to make predictions about the rest of the constituents. The advantage of head-corner parsing is that it allows parsing of scrambled and discontinuous constituents. Semantics can be simple and compositional even when grammars permit non-concatenative constructs (van Noord, 1991). In HSLC, on the other hand, predictions are made from the leftmost constituent as well as from heads. Such a strategy is suitable for a heavily word-order-driven language such as English. A head-corner parser would require a careful selection of heads. Moreover, it violates the principle of incremental interpretation by arbitrary amounts. For instance, given a sentence, it waits past the subject NP until it sees a verb, because the verb licenses the subject NP on its left. It remains to be seen whether there are other variants of head-corner parsers that do not suffer from these problems. If a head-corner parser did a left-to-right parse but waited for the heads, it would be the same as a head-driven parser.

There are certain other generalizations of left-corner parsing (Nederhof, 1993), either using a lookahead feature or using a non-unitary definition of the left corner. In the latter approach, the announce point is greater than one; that is, the parser waits for a certain number of initial nodes (more than one) before switching to a top-down strategy. It remains to be seen whether a parser has been built that equates this number with the position of the head rather than having a uniform announce point across all rules in the grammar.
6.7.2 Lookahead Parsers
Since information that comes later in a sentence can often help disambiguate an earlier local ambiguity, a parser might benefit by looking ahead before making a commitment. An unlimited lookahead would of course violate the constraints on memory requirements and left-to-right incremental processing. On the other hand, if the lookahead is limited to a maximum of some fixed number of words (e.g., Marcus, 1980), there is no guarantee that the lookahead is useful. This is a consequence of the fact that natural languages allow optional constructs and thus might delay the disambiguating information for arbitrarily long stretches. HSLC can keep track of incomplete structures and use top-down projections from the left corner to eliminate alternatives, and thus already has the power that looking ahead can sometimes provide.
6.7.3 Categorial Grammars
The most radical proposals of categorial grammars (e.g., Steedman, 1987 and 1989) purport completely bottom-up processing and hence cannot account for a host of evidence from human sentence-processing behavior. Other approaches to categorial grammars combine semantic information to various degrees in their categories. This not only takes away the independence between syntax and semantics, it also makes processing more lexically based. Such an account will have difficulties capturing the generalizations observable in syntactic processing, apart from not being parsimonious. COMPERE tries to capture generalizations as far as possible by representing argument structures separately from the major syntactic categories and by putting the categorial information in hierarchies. For instance, the representation of a verb only tells us how it can be part of a VP, but not how that VP can combine with an NP to form an S. By not precompiling such information we restore
each type of information to the level in the hierarchy that enables the most generalizations to be captured.
6.8 On the Nature of Syntactic Preferences

Syntactic preferences have been considered to result from structural criteria such as high or low attachment or minimality of the number of nodes in a structure (Frazier, 1987). Thus the sentence processor is said to choose the minimal structure, or the one with right association, and so on. Many recent psycholinguistic studies (Crain and Steedman, 1985; Pearlmutter and MacDonald, 1992; Spivey-Knowlton, 1992; Taraban and McClelland, 1988; Trueswell and Tanenhaus, 1992) have used various forms of contextual bias to demonstrate that the sentence processor does not have any pervasive preference for minimality of the size of a structure or for high or low attachment. They have been able to induce and avoid processing difficulties at will by manipulating semantic context. This is not to say that the sentence processor does not have any kind of syntactic preference. Of course it does have preferences for one or another structure, which become evident especially when semantics does not have a preference for either interpretation, as in sentence (2).
(2)
The officers taught at the academy were very demanding.

Stowe (1991) has proposed an alternative explanation for such structural preferences. The sentence processor has a pervasive goal to complete an incomplete item at any level of processing. In syntax, it has a goal to complete the syntactic structure of a unit such as a phrase, clause, or sentence. The sentence processor prefers the alternative that helps complete the current structure (called the Syntactic Default, which is also the expected unit at that point) over one that adds an optional constituent and leaves the incompleteness intact. Such behavior can sometimes be described as minimal attachment. For instance, in sentence (2) a VP is required to complete the sentence after seeing "The officers." Since the main-clause interpretation helps complete this requirement and the relative-clause interpretation does not, the main-clause structure is selected, as semantics did not have a bias towards either interpretation at that point. A second kind of preference is one where the sentence processor prefers to attach a new constituent to a more recent unit, or to a previous unit lower along the right-hand boundary of a previous tree. Such a preference, called Right Association, has already been incorporated into the HSLC algorithm.
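Both preferences can be stated as a simple ranking over candidate attachment sites. The sketch below is illustrative only (the Site record and its fields are invented here; COMPERE's actual preference values appear in Chapter 8): sites that complete an expected unit (the Syntactic Default) are preferred, and recency breaks ties (Right Association).

    from collections import namedtuple

    # completes_expected: the attachment supplies a required (expected) unit.
    # recency: larger for more recently opened nodes along the right boundary.
    Site = namedtuple("Site", ["node", "completes_expected", "recency"])

    def rank_attachment_sites(sites):
        """Best site first: Syntactic Default, then Right Association."""
        return sorted(sites,
                      key=lambda s: (s.completes_expected, s.recency),
                      reverse=True)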
6.9 Summary

Computational arguments based on space requirements and local ambiguities were used to argue that there is a parsing strategy in the range defined by left-corner parsing that has the right set of properties for modeling human sentence-processing behavior. Under an assumption of incremental semantic interpretation, it was shown that the two extremes of left-corner parsing, defined by the arc-eager and arc-standard strategies, are both unsuitable for the human parser. It was shown how one could infuse a left-corner parser with the head-drivenness of a head-driven parser to define a middle point in left-corner parsing. The resulting parsing strategy was shown to possess the right space-requirement and local-ambiguity properties. It was called a head-signaled left-corner strategy, since the pathways of parsing are laid out by the left-corner strategy, with the head-driven strategy providing the "stop and go" signals at the crossroads of parsing decisions.
CHAPTER VII
THE THEORY OF SEMANTICS: HOW TO COMMUNICATE WITH SYNTAX

People have things to talk about only by virtue of having represented them.
R. Jackendoff, 1983.
In this chapter, we present the theory of natural language semantics used by the COMPERE model and describe the arbitration process to show how semantic processing communicates with syntax. In this context, we also present the theory of intermediate roles as an extension of the idea of thematic roles and show how intermediate roles help in syntax-semantics interactions during error recovery. The theory of semantics presented here is based largely on the unified view of linguistic semantics of Frawley (1992).
7.1 Semantics

The semantics of a natural language is about linguistic meaning. Frawley (1992) defines linguistic semantics as the study of literal, decontextualized, grammatical meaning. It is literal since it does not concern itself with inferences that could be made from the literal meaning. It is decontextualized since it does not concern itself with the effects of context. And it is grammatical since it is only concerned with those elements of meaning that are reflected in the grammatical or syntactic structure of natural language. The idea behind employing such a linguistic semantics in COMPERE is that it is not only more precise than a non-literal (i.e., inferential, metaphoric, implied, and so on), contextual, or conceptual (as opposed to grammatical) theory of semantics, but it also provides an output representation that is of potential use to many situated reasoners with their own non-linguistic (or extra-linguistic), contextual, or conceptual ontologies and knowledge representations. The linguistic semantics employed in COMPERE serves as a precise, useful level of meaning representation that captures the content that is common to, and essential for, other more contextual and conceptual representations of meaning, whether in relation to language processing or to other reasoning situations.
7.1.1 Different Views of Semantics
There have been at least five different approaches to meaning. These views, though not completely incompatible with each other, differ in what they consider the contents of semantic representations to be. The different approaches are:

1. Meaning as Reference: Meaning, in this view, is what the linguistic expression refers to or points to in the world. This view runs into philosophical questions such as how meaning can vary when the reference is constant, or how meaning can exist when the referent does not exist in the world.
2. Meaning as Logical Form: In this formal semantics, the contents of semantic representations are well-formed logical formulas. Thus, semantic representations are subject to logical truth conditions, to inferences licensed by the logic, and to procedures for transforming the representations. This logical view of semantics differs from linguistic semantics in that it is free of the content of a representation, and it enforces a strictly categorical view which is overly restrictive for linguistic semantics.

3. Meaning as Context and Use: In this view, the meaning of an expression is its context and its function in that context. This essentially equates pragmatics with semantics by not distinguishing between the set of possible linguistic meanings for an expression and the one meaning from that set that is appropriate in the context. See Frawley (1992) for a detailed discussion of the problems with equating semantics with pragmatics.

4. Meaning as Culture: Similar to the view of meaning as context and use, in this view meaning is ultimately determined by the culture. In other words, there is no meaning inherent in a linguistic expression, and the contents of semantic representations are the very culture of a society.

5. Meaning as Conceptual Structure: This view equates linguistic semantics with non-linguistic or conceptual semantics (e.g., Jackendoff, 1983). While there is a large overlap between conceptual representations and semantic representations, one also finds numerous incompatibilities between the two. In other words, what can be conceptualized is not always the same as what can be represented semantically. For instance, certain categories in linguistic semantics are precise and exclusive, unlike conceptual categories, which are fuzzy.

Frawley (1992) argues that the first four approaches converge on conceptual structure. In this Unified View of Linguistic Semantics, the meaning of a linguistic expression is a reference to a conceptual construct in a mentally projected world of gradient, not strict, categories, and it precedes contextually or culturally appropriate meanings.
7.1.2 Elements of Linguistic Semantics
In this section, we give a brief overview of the various elements that constitute the semantics of a sentence. What follows is essentially a content theory of the linguistic semantics employed by COMPERE. For further details on each of the elements, the reader is referred to Frawley (1992). The various elements are:

1. Entities or Objects: The things in the conceptual space that are temporally stable and typically appear as nouns.

2. Events: The temporally sensitive things in the conceptual space that are typically encoded as verbs.

3. Thematic Roles: Thematic roles are the relationships between entities and events. They are the semantics of the roles entities play in events. Thematic roles represent the participant, causal, spatial, or purposive roles that objects (and other events) play in events. Thematic roles are based on the fundamental notion of predication.

4. Spatial Roles: The spatial relationships between objects.

5. Aspect: The nontemporal, internal contour of an event. Aspect is the way an event is distributed through the time frame in which the event occurs.

6. Tense and Time: The semantics of how a language encodes time.

7. Modality and Negation: Modality is the element of semantics that is concerned with the factual status of a statement. For instance, this involves the negation, possibility, and obligational entailments of statements.

8. Modification: This final element of linguistic semantics encompasses all relationships between objects that can be called properties, qualities, or modifiers of conceptual entities.

In the current COMPERE model, we have focused only on objects, events, and thematic roles. Certain elements of modification have been captured using the generic State role, as described below. However, incorporating the other elements above is not expected to cause any conceptual problem in COMPERE, since its representations and processes have been designed with the entire set of semantic elements in mind. For instance, one could add the semantics of spatial relationships by adding the necessary spatial roles and the intermediate roles that link them to the spatial prepositions. Such an enhancement is particularly feasible because of a uniform representation of all semantic elements in the form of what we call Semantic Roles. Though one can distinguish between four classes of events, namely acts, states, causes, and motions, such distinctions are not made in the current implementation of COMPERE. Acts and motions are treated uniformly as events, causes (i.e., relationships between two events) are not dealt with, and states are treated as semantic roles that represent any property of an object (i.e., any modifier).
7.1.3 Semantic Roles
All eight of the above elements of semantic representation are uniformly represented in COMPERE in terms of what we call semantic roles. This notion of semantic roles is an extension of thematic roles viewed as predication, in which semantic roles form a set of predicates that can represent any of the eight elements of semantic representation. Such a uniform representation (Mahesh and Eiselt, 1994) enables COMPERE to apply the same set of operations, such as those carried out during error recovery, to all semantic elements. Furthermore, semantic roles themselves are extended to syntactic and other predicates, leading to the notion of what we call intermediate roles.
7.1.3.1 Thematic Roles
A standard set of thematic roles includes at least the following roles (Allen, 1987; Frawley, 1992):

1. Agent: The object that caused the event to happen.

2. Instrument: The force or tool used in causing the event.

3. Experiencer: The person (a particular class of objects) who is involved in perception or in a psychological state.

4. Beneficiary: The person for whom some act is done.

5. Theme: The object that was affected by the event.

6. Location: The site of an event (sometimes also used to denote the source or destination of a motion event).

7. Co-Agent: A secondary agent in an event.

8. Co-Theme: A secondary theme in an event.
There are several other thematic roles distinguishable from the above roles. For example, author roles are distinct from agents, themes from patients, and sources and goals from locatives. There are also certain purposive and causal roles such as reason, purpose, enablement, and so on, that are not included in the current instantiation of COMPERE.
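For concreteness, the standard set above can be written down as a simple enumeration; the additional roles just mentioned (author, patient, source, goal, reason, purpose, enablement) would extend it. The enumeration is ours, not a listing from the COMPERE program.

    from enum import Enum, auto

    class ThematicRole(Enum):
        AGENT = auto()
        INSTRUMENT = auto()
        EXPERIENCER = auto()
        BENEFICIARY = auto()
        THEME = auto()
        LOCATION = auto()
        CO_AGENT = auto()
        CO_THEME = auto()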
7.1.3.2 Extended Set of Semantic Roles
All semantic relationships can be viewed as one semantic element filling a role in another element. For instance, certain causal relationships between entities and events are traditionally viewed in this manner as thematic roles. In COMPERE, we extend the notion of thematic roles to what we call semantic roles by treating all the semantic elements except objects as semantic roles that are filled by objects.1 This uniform, extended set of roles enables all semantic processing to be treated as role assignment. Role assignment is accomplished in COMPERE through the syntax-semantics mapping provided by what are called intermediate roles below.

1 Events are also considered semantic roles, since an event might fill a causal or temporal (etc.) role in another event, or in a bigger knowledge structure such as a script (Schank and Abelson, 1977), for instance.
7.1.4 Intermediate Roles
Syntactic analysis identifies the grammatical roles that syntactic units play in relation to other units. These grammatical roles can be viewed as a set of syntactic predicates. For example, a Subject role is a predicate that states whether an NP is the subject of a sentence S. We extend this notion further to representations of other linguistic cues as well. For instance, the voice of a sentence is represented in terms of two predicates called the Active-Voice role and the Passive-Voice role. It may be noted that some of these roles are encoded distributedly in the language. For example, there may not be a single word in a sentence that determines whether the sentence is in active or passive voice.

Different semantic roles combine with each other in constrained ways to form other roles. These combined roles are various combinations of the individual roles. For instance, an Active-Subject role is a conjunction of an Active-Voice role and a Subject role. Such role combinations finally map into the semantic roles that constitute the elements of semantic representation. This entire set of roles, including semantic roles, grammatical roles, other linguistic roles, and their intermediate combinations, is called intermediate roles. (See Chapter 8 for further elaboration on the representation of intermediate roles.)

Intermediate roles enable semantic processing to be reduced to role assignment. Word meaning composition (see Chapter 4) can be viewed as a process of role assignment in which various intermediate roles combine with each other under known constraints to form other intermediate roles and finally result in a thematic or semantic role assignment. In addition, intermediate roles provide a uniform representation not only of syntactic and semantic representations but also of the intermediate choices and commitments in syntax-semantics interactions during sentence processing. With such a uniform representation, a declarative record can be kept of the incremental interactions between syntax and semantics. Such a record is crucial for modeling error-recovery behaviors in sentence interpretation (see below and Chapter 8).

In a natural language such as English, there is no direct mapping from grammatical roles to thematic roles. The mapping from syntax to semantics is determined by a number of other kinds of information, such as individual word meanings, other categories specific to lexical entries, conceptual constraints derived from non-linguistic conceptual or world knowledge, linguistic cues such as voice, and so on. This is also true of other natural languages, such as heavily inflected and case-marked languages, where the mappings from case endings to thematic roles are as ambiguous and multi-valued as in English. Thus, neither grammatical roles nor morphological cases can be equated with thematic roles.

Intermediate roles are also different from another formalism known as "macroroles" (Foley and Van Valin, 1984; Frawley, 1992). Macroroles are just classes or categories of thematic roles, with each class defining a hierarchy of thematic roles. Macroroles are not assigned in semantic processing and do not form the content of semantic representations. A particular macrorole, unlike an intermediate role, is not a well-defined predicate. Macroroles were proposed to give a uniform account of the process of mapping grammatical roles to thematic roles. However, they do not capture the content of the representations involved in the mapping process.
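The combination of intermediate roles can be pictured as rule application over role labels. The following sketch is a simplified illustration only: the particular mappings shown (for example, Active-Subject combining with an Event to yield Agent) are common defaults, whereas in COMPERE the outcome also depends on word meanings, selectional constraints, and the other cues represented in Chapter 8.

    # Hypothetical table of intermediate-role combinations.  Each entry reads:
    # composing the two roles on the left yields the role on the right.
    COMBINATION_RULES = {
        ("Subject", "Active-Voice"): "Active-Subject",
        ("Subject", "Passive-Voice"): "Passive-Subject",
        ("Active-Subject", "Event"): "Agent",    # e.g., agent of an active clause
        ("Passive-Subject", "Event"): "Theme",   # e.g., theme of a passive clause
    }

    def combine(role_a: str, role_b: str):
        """Return the emergent role for a pair of intermediate roles, if any."""
        return (COMBINATION_RULES.get((role_a, role_b))
                or COMBINATION_RULES.get((role_b, role_a)))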
7.1.4.1 Linking Theory and Thematic Hierarchies
Linking theories are attempts to provide concise and uniform accounts of how thematic roles map to surface or grammatical roles. These theories do not provide hard mapping rules but rather describe tendencies or likelihoods of certain thematic roles being mapped to certain grammatical roles. They are all based on various thematic hierarchies, that is, orderings of thematic roles that are used to determine which role is more likely than the others in the hierarchy to map to a particular grammatical role. See Frawley (1992) for a description of several thematic hierarchies. Hierarchies are simplified views of the mapping process which transforms grammatical roles to thematic roles. These hierarchies are essentially compilations of the many types of knowledge that constrain and define the mapping. However, none of this knowledge is represented in the hierarchy. For example, the hierarchy might embody the influence of the voice of the sentence, but this information does not appear in the simple linear hierarchies of thematic roles. Intermediate roles are more elaborate representations that make the knowledge involved explicit. Moreover, thematic hierarchies are generic, whereas intermediate roles leave room for including lexically specific hierarchies. For example, the conceptual meaning of a verb might specify its particular preferences for some thematic roles over others. Such event-specific hierarchies can be used in COMPERE to constrain the assignment of intermediate roles. Intermediate roles also provide an apparatus that works for much more than just thematic roles. Thematic hierarchies have been proposed only for thematic roles from the standard set. Intermediate roles apply the same principle to all semantic roles and help disambiguate other role assignments as well. Intermediate roles also have advantages in arbitration and error recovery, as explained below.
7.2 Semantic Processing

A functional analysis of sentence understanding (Chapter 4) reveals two main tasks, word sense disambiguation and word meaning composition, that are accomplished in part by semantic processing. Given the notions of semantic and intermediate roles, these two tasks can be achieved by the method of role assignment as follows.

Word Sense Disambiguation: Whenever a role assignment is made, the word meanings associated with the roles are checked against any constraints on that role assignment. Any meanings that are not compatible with the assignment are deactivated and retained for possible later use in error recovery (see Chapter 8 for further details and examples; also see Eiselt, 1989). Thus, only those meanings that are compatible with the role assignment are selected. The process of role assignment is described in further detail below.

Word Meaning Composition: Word meanings are composed by filling one word meaning into a role of another meaning. For instance, many thematic roles help compose the meanings of objects with that of an event, modifier roles may compose the meanings of two objects, and so on.
This composition is performed using syntactic guidance as described below to constrain the search in the role assignment process.
7.2.1 Role Assignment
In COMPERE's theory of sentence processing, role assignment is the primary operation in semantic processing. Portions of the input sentence are assigned primitive roles by lexical information. Open-class words get their roles from their syntactic category. For example, nouns get an NP-role and verbs get an event-role. Closed-class words such as prepositions get roles specific to the individual prepositions from their lexical entries. For example, "with" gets a with-role and "at" gets an at-role. It may be noted here that the operation of role assignment described here is similar to that in formalisms based on lexical functional grammar (LFG) (Bresnan and Kaplan, 1982; Sells, 1985). However, as already pointed out in Chapters 4 and 5, role assignment in our model is determined dynamically during sentence understanding, not selected from previously enumerated syntax-semantics mappings in the lexicon. These primitive roles of words are assigned to bigger syntactic units by syntactic analysis. Each bigger syntactic unit acquires its role from its head child (see Chapters 6 and 8). For example, the NP-role of a head noun is propagated up to its parent NP structure, and the event-role of a verb is propagated up to its parent VP and perhaps further to the sentence structure S.
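A minimal sketch of this primitive role assignment follows (the names and the toy lexicon are ours, not COMPERE's). Open-class words receive a role from their category, closed-class words receive a word-specific role from their lexical entry, and a phrase simply inherits the role of its head child.

    OPEN_CLASS_ROLES = {"N": "NP-Role", "V": "Event-Role"}

    def primitive_role(word, category, lexicon):
        """Role of a single word, as described above."""
        if category in OPEN_CLASS_ROLES:
            return OPEN_CLASS_ROLES[category]
        return lexicon[word]["role"]             # e.g., "with" -> WITH-Role

    def phrase_role(head_child_role):
        """A syntactic unit acquires the role of its head child."""
        return head_child_role

    toy_lexicon = {"with": {"role": "WITH-Role"}, "at": {"role": "AT-Role"}}
    assert primitive_role("academy", "N", toy_lexicon) == "NP-Role"
    assert primitive_role("taught", "V", toy_lexicon) == "Event-Role"
    assert primitive_role("at", "Prep", toy_lexicon) == "AT-Role"
    assert phrase_role(primitive_role("academy", "N", toy_lexicon)) == "NP-Role"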
7.2.1.1 Syntactic Guidance for Semantics
During this propagation from a head child to its parent, the roles are also composed with any roles of the other children of the parent. For instance, if there was a syntactic modifier before the head, the role of the modifier, if any, is composed with the role of the head child, which now becomes the composite role of the parent unit. This composition also results in a composition of the word meanings of the modifier and the head. For example, the meaning and role of a prepositional phrase attached to a noun phrase are composed with the meaning and role of the head noun of the noun phrase. Chapter 8 will show representations and examples of such role composition through syntactic guidance.

Syntactic compositions do not always lead to unique semantic roles. Syntactic compositions guide semantic processing by suggesting which roles should be composed together, thereby avoiding an exhaustive search of all possible word meaning compositions of the words in a sentence. Other types of knowledge must be used to figure out how the suggested pair of roles should be composed. Such knowledge might be of a variety of types, such as conceptual constraints on role assignments in the form of selectional restrictions, or other linguistic cues. These types of knowledge are accommodated in COMPERE by uniformly treating them all as constraints on assigning intermediate roles. Chapter 8 shows the representations of intermediate roles, the classes of constraints on assigning intermediate roles, and several examples.
7.2.1.2 Role Assignment as Parsing
Given that COMPERE treats semantic processing as role assignment and composition, and that it uniformly treats all types of knowledge that affect semantic processing as constraints on assigning intermediate roles, semantic processing in COMPERE can be viewed as parsing. Individual words are assigned primitive semantic categories (i.e., roles), and these roles are composed to form other roles. Ultimately, we arrive at a role tree, a hierarchy similar to a parse tree, rooted at the role of the head meaning of the sentence, with the other word meanings in the sentence hanging off roles in the interior of the tree.
This process of "semantic parsing" is itself very similar to syntactic parsing. Intermediate roles form a grammar of rules where each rule identifies a particular role composition, just as a phrase structure rule describes a syntactic composition. Semantic rules also specify what constraints must be met in order for the rule to be applicable. For example, a rule might require that a test on the class of the word meaning of the child role satisfy any selectional constraints specified in the word meaning associated with the parent role (see Chapter 8). The algorithm used in COMPERE to carry out this semantic parsing (Figure 7.1) is a simple head-driven parsing algorithm very similar to the HSLC parser applied to syntactic processing (Chapter 6). The only difference is that no left-corner projection is done in the case of semantic parsing, because there is no particular advantage to doing so in semantics, for instance in terms of reducing ambiguity in role assignment. (See Chapter 9 for an analysis of the cost of left-corner projection and the associated predictions.) The algorithm for role assignment using syntactic guidance shown in Figure 7.1 is very similar to the HSLC parsing algorithm described earlier in Chapter 6. Further details of implementing the role assignment algorithm are described in Chapter 8. Figure 7.2 shows a sequence of steps in making role assignments that correspond to a syntactic attachment of a PP to a VP. The figure illustrates both the forest of syntactic parse trees and the forest of role trees at the beginning of processing the noun in the PP, after the NP is attached to the PP, and at the end, when the PP is attached to the VP.
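The three stages of Figure 7.2 can be traced with a toy composition table. The role labels follow the figure; the compose function itself is only a stand-in for the Role-Assign algorithm of Figure 7.1.

    # Toy table of role compositions for "... taught at the academy".
    RULES = {
        ("AT-Role", "NP-Role"): "AT-NP-Role",          # Prep + NP inside the PP
        ("Event-Role", "AT-NP-Role"): "Location-Role", # VP + PP: location of event
    }

    def compose(role_a, role_b):
        return RULES.get((role_a, role_b))

    # Stage 1: before the NP attaches to the PP, the roles are still separate.
    stage1 = ("Event-Role", "AT-Role", "NP-Role")
    # Stage 2: the NP attaches to the PP.
    stage2 = compose("AT-Role", "NP-Role")
    # Stage 3: the PP attaches to the VP.
    stage3 = compose("Event-Role", stage2)
    assert (stage2, stage3) == ("AT-NP-Role", "Location-Role")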
7.2.2 Role Emergence
It can be seen from the above description that syntactic and semantic processes are very similar to each other in COMPERE. This not only agrees with the psycholinguistic motivations for a unified process (stated in Chapter 3) but also aids the processes of arbitration and error recovery in sentence processing. The concept of intermediate roles enables us to build a seamless bridge between syntax and semantics and to view all processes in sentence interpretation as a process of role emergence. As each type of knowledge or linguistic cue in a sentence is applied to perform word meaning selection and word meaning composition (Chapter 4), intermediate roles are assigned or composed. Every such assignment of intermediate roles can be viewed as a refinement in predication that ultimately leads to the predicates that constitute the content of the semantic representation of the sentence. Sentence processing is an incremental process in which individual word meanings start with primitive roles and, as each piece of knowledge is applied to refine the role assignment, the roles emerge into more specific roles, finally providing a role tree that represents not only the final semantic roles in the interpretation but also the intermediate steps and decisions in the process. Since both syntactic and semantic processing are viewed as role assignment, arbitration (see below) is merely the task of choosing the best role assignment given the different possible ways the roles could emerge into other roles. Error recovery in either syntax or semantics is a process of undoing certain role assignments and making consistent and coherent reassignments of roles.

This view of sentence processing as incremental role emergence from a variety of linguistic cues seems to abstract away from the particular feature of English that makes syntactic processing in English so heavily influenced by word order. The theory of intermediate roles and role emergence seems to be equally well applicable, for instance, to a heavily inflected, word-order-free language in which morphological case endings play a much bigger role in determining word meaning composition than word position or ordering.
7.2.2.1 Syntax-Semantics Consistency and Correspondence
It is important in sentence processing to ensure that there is consistency between syntactic and semantic interpretations at all times. The interpretation that syntax selects must correspond to the one that semantics selects and vice versa. In a sentence processor where semantics is
guided by syntax, that is, where semantics only makes those role assignments that correspond to syntactic compositions, consistency of initial interpretations can be maintained by selecting only those interpretations that are suggested by syntax and are also compatible with the corresponding semantic role assignments. However, it is also important to ensure consistency between syntactic and semantic representations during and after an error-recovery operation. During error recovery, certain commitments made either in syntax or in semantics must be undone. Such deletion of a role assignment in syntax must be followed by a recovery operation that performs corresponding deletions or changes in semantics, and vice versa. Otherwise, the sentence processor could end up with inconsistent syntactic and semantic interpretations of a sentence.

Data Structures:
    Role Tree: an n-ary tree of Roles.
    Role: a record with
        Children: an ordered list of children roles.
        Parent: a parent role.
        Head: the head child role for this role.
        Complete-p: is this role complete?

Algorithm Role-Assign:
    Given a pair of syntactic nodes to be semantically composed,
    Add a new primitive role R_w to the current forest of role trees {R_i} for each of the two nodes; mark R_w as a complete subtree;
    Repeat until the R_w's for the two syntactic nodes are linked to one another, or there are no more complete trees that can be attached to other trees,
        Propose attachments for a complete subtree R_i, using rules for intermediate-role attachment in COMPERE's semantic knowledge and by checking selectional constraints on the meanings of the two syntactic nodes: to an R_j as a required constituent, or to an R_j as an optional constituent, or to a new R_k to be created if R_i can be the left corner (leftmost child) of R_k;
        Select an attachment (see Algorithm Arbitrate in Chapter 8) and attach;
        Deactivate any meanings that do not satisfy the constraints on the selected role attachment;
        If a new R_k was created, add it to the forest;
        If an R_i in the forest has just seen its head, mark R_i as a complete subtree.

Figure 7.1: Syntax-Guided Role Assignment.

In COMPERE, consistency before, during, and after error recovery is maintained with the
help of intermediate roles. Every syntactic unit (or role, as we may call it) in a parse tree has links to the semantic roles that it corresponds to, and vice versa.

Figure 7.2: Role Assignments Guided by Syntactic Attachments. [For the phrase "at the academy," the figure shows the forest of syntactic parse trees alongside the forest of role trees at three stages: the Event-Role, AT-Role, and NP-Role before the NP is attached to the PP; the AT-NP-Role after the NP is attached to the PP; and the Location-Role after the PP is attached to the VP.]

These links are maintained during role assignment and composition operations. The links are used to determine which parts of the other representation are affected when a change is made to a part of a syntactic or semantic representation of the input. In addition, the intermediate roles, which are accessible through the links in the role trees, provide a declarative representation of every decision that was made during role assignment. Using these representations of intermediate roles, COMPERE can figure out what changes in one representation are necessitated by a change made to the other representation. Such correspondence is established using intermediate roles by examining the constraints whose satisfaction led to a particular intermediate role being assigned. Trees of intermediate roles provide
a declarative record of the interactions between syntax and semantics without which a sentence processor is forced to abandon the other representation completely and carry out a syntax-semantics translation from scratch to ensure consistency. Intermediate roles enable COMPERE to repair its interpretations instead of having to rebuild them by reprocessing the input. The next chapter (Chapter 8) explores these issues in more concrete terms after introducing the actual representations employed by COMPERE.
7.2.3 Independence of Semantics
The functional independence principle (Chapters 2 and 3) requires that semantic knowledge be applicable even when corresponding syntactic knowledge may not be available. In the above account of syntactic guidance to semantics, semantic knowledge was applied only to test those semantic role assignments that were suggested by syntactic compositions. In order to comply with the independence principle, the sentence processor must maintain the independence of semantics. COMPERE indeed allows for such independence by including methods for semantic role assignment that operate independently of whether the corresponding syntactic role assignments exist. Semantic role assignment, which was described as a parsing process above, can be applied either to a pair of roles suggested by (i.e., corresponding to) a syntactic composition, or to every possible pair of currently assigned roles. The sentence processor is capable of independently proposing word meaning compositions based only on semantic considerations. It appears that such independently proposed semantic compositions are necessary when syntax fails to provide guidance to semantics. This might happen, for instance, when the input is ungrammatical given the syntactic knowledge of the sentence processor. However, when syntactic guidance is available, semantic processing must make use of it and avoid a wasteful search of semantic role assignments that will eventually be found to be incompatible with syntactic role assignments.

In the semantics-first sentence processing architecture MOPTRANS (Lytinen, 1987), for example, semantic independence is carried to a limit. MOPTRANS selects a semantically feasible word meaning composition first, and then tries to establish correspondence with possible syntactic compositions by exhaustively looking through phrase structure rules for one that corresponds to the independently chosen semantic composition (see Chapter 9). Intermediate roles in COMPERE provide an incremental mechanism for maintaining correspondence between syntax and semantics that avoids such exhaustive enumeration. At the same time, independent applicability of syntactic and semantic knowledge is also maintained.2

2 This leads to an interesting question, namely, whether intermediate roles provide a reversible mapping from syntactic representations to semantic representations; that is, given an independently proposed semantic role assignment, can the sentence processor trace through the trees of intermediate roles to suggest corresponding syntactic compositions? The derivability of grammatical roles from thematic roles is a controversial issue and is considered untenable by many (e.g., Frawley, 1992; Rosen, 1984). Linking theories and the thematic hierarchies mentioned above are attempts to capture the tendencies in the mapping from thematic roles to grammatical roles. Further analysis is necessary before we can answer the question of the reversibility of the intermediate-role mechanism presented here.
7.3 Arbitration and Conflict Resolution

When there is an ambiguity, it is possible that syntactic and semantic knowledge will assign different preferences to the various interpretations, leading to conflicts between syntax and semantics. The arbitration process makes the selection between alternative interpretations (or role assignments or compositions) and resolves any conflicts. Given the role-emergence view of sentence processing, arbitration is the process of selecting between different intermediate role assignments. As described above, intermediate roles provide the mechanisms for the arbitrator to figure out correspondences between the various syntactic and semantic role assignments. The task of the arbitrator is to
select a subset of combinations of corresponding syntactic and semantic role assignments from the set of possible combinations of corresponding assignments. The arbitrator must select a single combination whenever possible. Since syntactic and semantic role assignments are proposed using their respective knowledge, and since the arbitrator examines both before it selects an interpretation, it effectively combines or integrates different types of information in sentence processing. The intermediate roles provide a uniform representation for reasoning about, carrying out, and recording the integration of knowledge from multiple sources.

Information of different kinds (for instance, syntactic and semantic preferences) can be combined either as continuous numerical values or as discrete qualitative values. Since there is no known way to determine how to combine real-number preferences of different kinds (such as whether a syntactic preference of 2.3 is stronger or weaker than a semantic preference of 3.2), nor is there a way to figure out what those real numbers must be in a particular situation, we have adopted discrete qualitative values for preferences in COMPERE (see Chapter 8 for details). However, even while combining discrete values, we run into the problem of comparing values for preferences of different types, such as whether a level 3 preference in syntax is higher or lower than a level 2 preference in semantics. COMPERE's arbitration algorithm avoids this problem of combining dissimilar values by merely ranking preferences of each kind and choosing the interpretation that has the highest preference in both syntax and semantics. When there is a real conflict (i.e., when the syntactically preferred interpretation is rejected by semantics and vice versa), the arbitrator simply delays the selection (see Chapters 3 and 8) and pursues multiple interpretations until some new information enables it to resolve the conflict. The arbitration algorithm for combining discrete values is presented in Chapter 8, since we need to see certain details of COMPERE's representations before arbitration can be presented as an algorithm.

It may be noted that the arbitrator is the unified process in COMPERE that carries out both ambiguity resolution and error recovery. Since it selects among alternatives, it clearly resolves ambiguities. In addition, it also coordinates the changes in syntactic and semantic representations that take place during error recovery, since recovery from an error is accomplished in COMPERE largely by re-examining the alternative interpretations in the light of the new information that led to the detection of the error. It is important to note that this process of arbitration is neither syntactic processing nor semantic processing. It is the unified process that is common to both syntactic and semantic decision making.
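The policy just described can be summarized in a few lines of code. This is only a sketch of the idea (the full algorithm, with its discrete preference levels, is given in Chapter 8): preferences are ranked within each kind, never added across kinds, and a genuine conflict causes the decision to be delayed.

    def arbitrate(interpretations, syntactic_rank, semantic_rank):
        """interpretations: candidate interpretations (non-empty list).
        syntactic_rank / semantic_rank: functions returning a rank within
        their own kind of preference (higher is better).  Returns one winner,
        or the candidates to keep pursuing when syntax and semantics conflict."""
        best_syn = max(interpretations, key=syntactic_rank)
        best_sem = max(interpretations, key=semantic_rank)
        if best_syn is best_sem:
            return best_syn          # agreement: commit to a single interpretation
        # Real conflict: delay the selection and pursue both interpretations
        # until new input (or error recovery) resolves the conflict.
        return [best_syn, best_sem]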
CHAPTER VIII
COMPERE: THE SENTENCE UNDERSTANDING SYSTEM
A third class of models consists of models in which syntactic cues and information derived from the content of the sentence are used together to guide online attachment and role assignment processes. ... Models of this type have a number of attractive properties. ... The drawback of these models is that most are quite incomplete or underspecified. It is a matter of ongoing research to develop computationally sufficient models of this type.
R. Taraban and J. McClelland, 1988, pp. 624-625.
In this chapter the knowledge representations and the algorithms used to implement COMPERE are presented, along with examples of both the representations and the workings of COMPERE. The goal of this chapter is to describe the knowledge and the algorithms in sufficient detail to facilitate replication of the results. [Footnote 1: Complete listings of all the knowledge in the COMPERE program are not included here. Representative pieces are shown, together with pointers to typical enumerations of such knowledge, which can be used to replicate the results produced by COMPERE.]
8.1 Knowledge Representation
COMPERE represents syntactic and semantic knowledge separately but uniformly using a representational primitive called a node. A node is a structured representation of all the information pertaining to a syntactic or semantic category. A link, represented as a slot-filler pair in the node, specifies a parent category or concept of which the node can be a part, together with the conditions under which it can be bound to the parent and the expectations that must be fulfilled should the node be bound to that parent. More specifically, the elements of the uniform representation are units called nodes comprised of (a) part-whole relations to other nodes ("part-of" and "has-part" links), (b) preconditions on these relations, and (c) expectations that could be generated from such relations. The representations are to be read as "the parts can be linked to the wholes when the preconditions are met and, if so, the corresponding expectations can be generated at that point." The structure of a node is shown in Figure 8.1. In addition, nodes in either syntactic or semantic knowledge may have links to corresponding nodes in the other type of knowledge so that the sentence processor can build on-line interpretations of the input sentence in which each syntactic unit has a corresponding representation of its thematic role and the word meaning (the role filler) and vice versa.
COMPERE's conceptual knowledge of objects and events in the world and the relationships between them is also represented using the same kinds of nodes and links, except that in this case the links from a node are really "has-part" links rather than "part-of" links. The links point to thematic roles that can be attached to the concept, along with any preconditions and preferences on the role fillers. Semantic role knowledge acts as a bridge between syntactic and conceptual knowledge, forming a continuum of kinds of knowledge from the purely syntactic knowledge of the surface structure of the language to deep conceptual knowledge of the world.
[Figure 8.1: Representational Unit: A Node. The figure shows a node with a repeated group of slots, Part-of, Preconditions, and Expectations, one group for each parent the node can attach to.]
In addition, there is a lexicon as well as certain other minor heuristic and control knowledge that is part of the process. Before we take a look at COMPERE's unified arbitration process for sentence interpretation, let us see the different kinds of knowledge and their representation.
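As a rough illustration only (not COMPERE's actual code, which is not reproduced in this chapter), a node of this kind might be encoded as follows in Python; the class and field names are hypothetical.

from dataclasses import dataclass, field

@dataclass
class PartOfLink:
    """One way this node can attach to a parent category or concept."""
    parent: str                                           # name of the parent category/concept
    preconditions: list = field(default_factory=list)     # e.g. MUST-PRECEDE, MUST-BE tests
    expectations: list = field(default_factory=list)      # e.g. MUST-FOLLOW units to expect

@dataclass
class Node:
    """A syntactic or semantic category with its part-of links."""
    name: str
    part_of: list = field(default_factory=list)           # PartOfLink instances
    corresponding: list = field(default_factory=list)     # links into the other kind of knowledge

# Example: an NP node that can be part of an S when followed by a VP.
np = Node("NP", part_of=[PartOfLink("S", preconditions=["(Must-Precede Nil)"],
                                    expectations=["(Must-Follow VP)"])])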
8.1.1 Lexical Knowledge
Each word has a set of one or more lexical entries in the lexicon, one for each category of the word when the word has a category ambiguity. Each lexical entry is a frame and includes, among other things, a set of one or more meanings for the word (for that particular category). Thus an individual lexical entry has one syntactic category but might point to several conceptual meanings for the word. [Footnote 2: This representation is a simplification. Sometimes, different meanings for the same category may be associated with different subcategories or subcategorizations (i.e., predicate-argument structures). For example, the word "saw" has a verb category and a noun category. The verb category has two meanings, one being the past-tense form of "see" and the other the infinitive form of "to saw." These two verb forms have different subcategories (the past-tense form being valid for all three persons and all numbers, while the infinitive form is valid for all forms except 3S, the third person singular). In order to capture this information, the lexical representation must be enhanced so as to have separate lexical entries for each subcategory or for each meaning. This point also highlights the artificiality of certain linguistic distinctions, such as that between categories and subcategories.] Other information stored in a lexical entry includes:
- The word itself.
- The syntactic category of the word.
- Subcategory information: this might include the number and person of a noun, the tense of a verb, and so on.
- The meaning(s) of the word (for that category).
- The role structure that corresponds to the word (mostly for closed-class words like prepositions).
Sample lexical entries are shown in Figure 8.2. [Footnote 3: It may be noted that the lexical representation described here is strikingly similar to those used in LFG (lexical functional grammar) formalisms (Bresnan and Kaplan, 1982; Sells, 1985). However, unlike in LFG, COMPERE does not require lexicalized semantic knowledge. COMPERE does not employ syntax-semantics mappings defined a priori and stored in the lexical entry of each word. COMPERE instead uses general, non-lexical semantic knowledge and applies an arbitration algorithm on a uniform representation based on intermediate roles to determine the mappings from syntax to semantics dynamically.] The representations ensure that it is not possible to force a disambiguation of a word by directly specifying a particular category of the word in the sentence. Also, at present there is no morphological analyzer in COMPERE. As a result, each inflection of a word must have its own lexical entry (or entries, if it has a category ambiguity).
[Figure 8.2: Representation of Lexical Knowledge. The figure shows sample frame-style lexical entries, including entries for "saw" (a noun entry with meaning SAW-TOOL, and a verb entry with subcategories (Past) and (Inf) and meanings SEE and SAW-CUT) and for "at" (a preposition entry with subcategory NIL, meaning NIL, and role AT-Role).]
8.1.2 Syntactic Knowledge
Syntactic knowledge is mainly comprised of phrase-structure rules. These rules tell the sentence processor how different parts of the sentence can be composed with each other to form phrases, clauses, and entire sentences. However, the rules in COMPERE's syntactic knowledge are reversed from the standard representation of phrase-structure rules. The rules are indexed differently so that they can be accessed efficiently by a bottom-up or a left-corner parser. The rules also have conditions on subcategories and other information attached to them, and thereby extend the grammar beyond Context-Free Grammars to capture the context sensitivity of natural languages.
8.1.2.1 Preconditions and Expectations in Syntax
A phrase-structure rule is represented in COMPERE using two kinds of relations between grammatical categories. The first kind is an expectation relation. It is a relationship among at least three categories or three nodes in the grammar representation. An expectation relation says that a constituent of a category, such as NP, can be a part of a parent constituent, such as S, iff it is followed by another unit, such as VP, which is a required unit as described below. The expectation relation in this example is stored in the NP node in the representation and says that an NP can be a part of a sentence structure, S, if it is followed by a VP. In this case the language processor expects a verb phrase, VP, to follow the NP, since it is required to complete the S structure. Thus, we can define syntactic expectation as follows.
Syntactic Expectation: A syntactic constituent is expected at a particular point iff the constituent is necessary to complete the currently incomplete structure.
The second kind of link is a precondition relation. It can also be called a satisfaction relationship because when it is processed successfully, a previously made expectation is satisfied by the addition of a new constituent. A ternary precondition relationship connects three nodes and says that a category, such as VP, can be a part of a parent category, such as S, iff it is preceded by a unit, such as NP, that expected this category. The expecting unit must precede the expected unit immediately to its left, unless otherwise specified in the grammar representation. That is, the precedence specified in an expectation relation must make the expecting and the expected unit siblings under the parent category and contiguous with each other in the sentence. We can define contiguity as follows.
Contiguity: Two syntactic units X and Y are said to have a contiguous span if the first word in the part of the sentence spanned by Y follows the last word spanned by X immediately to the right.
If X is a child of a parent node Z in a parse tree, then a new node Y can be added as the next child of Z only if X and Y are contiguous in the sentence. Expectation relations are represented using MUST-FOLLOW lists and precondition relations using MUST-PRECEDE lists in COMPERE's syntactic knowledge as described below. Syntactic contiguity is represented using the `word counter' described below in the section on the working memory of COMPERE.
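To make the contiguity condition concrete, here is a minimal sketch under stated assumptions (hypothetical names; spans recorded as (start, end) word indices with the end exclusive, roughly in the spirit of the word counter and spans described later in the working-memory section):

def contiguous(span_x, span_y):
    """True if the unit spanning span_y starts exactly where span_x ends.

    Spans are (start, end) word positions with end exclusive, so a unit
    covering words 0 and 1 has the span (0, 2).
    """
    return span_y[0] == span_x[1]

def can_add_sibling(child_spans, new_span):
    """A new child may follow the last attached child only if their spans are contiguous."""
    return not child_spans or contiguous(child_spans[-1], new_span)

# "The boy" (0, 2) followed by "saw" (2, 3) is contiguous; (0, 2) and (3, 4) is not.
assert contiguous((0, 2), (2, 3)) and not contiguous((0, 2), (3, 4))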
8.1.2.2 Indexing for Bottom-Up Parsing
Phrase-structure rules can either be represented as a set without further organization or indexing, or they can be grouped in various indexing schemes to facilitate their access. Consider a grammar such as the one in Figure 8.3, which has several possible expansions for each nonterminal.

S → NP VP
S → VP
NP → N
NP → Det N
NP → Det N PP
VP → V
VP → V PP
PP → Prep NP
Figure 8.3: A Simple Grammar.

The set of rules in Figure 8.3 has been grouped so that rules with the same nonterminal on the left-hand side are together. Such an organization would be useful for a top-down parser, which would in fact start with those nonterminals on the left-hand side and try to expand them according to the rules. However, for a bottom-up parser, or for a left-corner parser, this organization is not convenient, especially if the parser is required to evaluate all the possible attachments suggested by the various rules at the same time. For instance, given the sentence

(1) The boy saw the girl.

after parsing the first word "The," there are two rules that can be applied. However, in order to figure this out, the parser must scan through the entire set of rules, no matter what is on the left sides of the rules, to select those that start with a Det on the right side. If the rules had been organized differently, using an indexing scheme more suitable to bottom-up parsing, then perhaps the parser could easily pick all such rules. Such an indexing scheme has been devised in the grammar representation employed in COMPERE.

Before we see the representation, we must observe a second reason why the above representation of grammar rules is not suitable for COMPERE. After the first word of the above sentence, there are two rules that are applicable. Which one should the parser apply? A top-down parser would in fact encounter such an ambiguity. However, this is purely an ambiguity in the grammar, not in the language. The sentence itself is not ambiguous after the first word. We do not want our parsers to introduce two interpretations of the sentence at this point. In order to avoid this, we need an enhanced representation of the rules that will allow us to distinguish between two kinds of units on the right-hand sides of rules: those that are required parts of the phrases and those that are optional adjuncts. In the above case, after the Det, the N would be the required constituent and the PP would be the optional adjunct. With this richer representation, we could combine the two rules into one composite rule and thereby eliminate the spurious ambiguity at this point. The representation of grammar rules must be such that it minimizes the ambiguities that are merely a consequence of the grammar representation and use. The representation must distinguish between required and optional units on the right-hand sides of rules.

We note that if we tried to eliminate this spurious ambiguity by altering the rules for NP using left-recursive rules as follows,

NP → Det N
NP → NP PP

we would only be pushing the problem one step further. Now there is no ambiguity after a determiner. However, after seeing the following noun, should we hypothesize a PP every time, or should we treat it as a complete NP and try to attach it to some other parent unit such as an S or a VP or a PP? This is again a spurious local ambiguity introduced by a representation that fails to distinguish between required and optional constituents of phrases. The ambiguity in attaching the Det to the NP arose from the partitioning of syntactic knowledge given by the above set of rules. The rule representation broke up the knowledge of the grammar into different ways of expanding noun phrases without regard to the distinction between required and optional parts of the noun phrase. Had that distinction been used as the partitioning criterion instead of the various ways of expanding a complete noun phrase, a different representation would have resulted that would have eliminated the above ambiguity.
Syntactic knowledge is a set of complex relations between syntactic units. These relations cannot be broken down into binary relations between units. [Footnote 4: What is sometimes referred to as a "binary rule" (Jensen, 1993) is actually a ternary relation, since it has two constituents on one side of the rule and one on the other side.] Each rule in the grammar represents a subgraph of the complex graph, with n-ary relationships between units, that represents the complete grammar. The rules must partition the grammar into smaller subgraphs such that the partitioning introduces the least amount of local ambiguity into the representation. In summary, rules in the grammar must be organized such that those with the same subsequences on the right-hand side are grouped together, and they must be combined so that those that differ only in whether or not they allow optional constituents somewhere on the right-hand side form one composite rule.
In COMPERE, we have developed such a representation in which, essentially, there is a set of composite rules for each grammatical category or phrase. An individual composite rule represents all the knowledge about the child unit being a part of a particular parent unit. This might include many ways of attaching the child to the parent under different syntactic contexts. The different composite rules for a category tell how a unit of that category can be attached to parent units of different kinds. Using the above grammar (Figure 8.3) as an example, we now have one composite rule for a Det being a child of an NP. This rule states that the Det can be a part of the NP provided there is nothing in the NP that precedes the Det and there is an N that follows the Det. There is no ambiguity as to whether there is a PP after the N or not; we are not even concerned with that aspect at this point in parsing. Each rule only states the MUST-PRECEDE and the MUST-FOLLOW conditions for that child to be a part of the parent syntactic unit. Optional adjuncts are handled through the rules for the adjuncts themselves, which state how the adjunct can be attached to the ends of various parent phrases. For instance, a composite rule for PP tells how it can be attached to an already complete NP as an adjunct.
[Figure 8.4: Typical Structure of a Phrase 'xP'. The figure shows a phrase consisting of, from left to right, a Left-Corner, an Optional Modifier, the Head, a Required Unit (Argument), and an Optional Adjunct.]
8.1.2.3 Required Units and the Head
A phrase has four kinds of units (Figure 8.4, also shown earlier in Chapter 6):
Required units: Required units are those that must be seen for the syntactic unit to be complete. Conversely, a syntactic unit that has instantiated all its required units is said to be complete. A syntactic unit that is yet to acquire all its required children is said to be open or incomplete. A complete syntactic unit that might take an optional child at a later time is said to be accessible. For instance, N is a required unit for NP, and V for VP. There is no unique required unit or sequence of required units for a phrase. A phrase may have more than one set of alternative required units that can make it complete. For instance, an NP might simply have a Pronoun as a required unit and no N at all. Associated with each sequence of required units, there may be other optional units that can be legally added to the phrase.
Optional units: Other optional units may often be added to a complete phrase. An example of an optional unit added to the right of a complete phrase is a PP that can be attached to the end of an NP or a VP. Optional units can also be added to the left of or in between required units, as in the case of adjectives or adverbs, respectively. We must make sure that the representation of the grammar and the parsing algorithm do not introduce spurious ambiguities due to optional units. The parser should not expect to see an optional unit; however, if it sees one, it should be able to parse it correctly.
The head: Each syntactic unit (other than a leaf unit) has a particular required unit called the head. The parent syntactic unit derives its primary meaning from the meaning of its head child. The head may be in any position in the phrase for different phrases. Also, a phrase may have more than one alternative head, one for each possible sequence of required units. A phrase that has instantiated its head is said to be head-complete. A head-complete unit may or may not be complete, depending on whether or not there are required units to the right of the head (subcategorized by the head, for instance). A head-complete syntactic unit has its meaning available for semantic processing; one that is not head-complete has not yet acquired its meaning. It may be noted that the head is a syntactically or grammatically defined unit for each phrase and does not depend on particular meanings of words or particular lexical items in a sentence.
The left corner: The leftmost child unit on the right-hand side of a phrase-structure rule is called the left corner. The left corner may be the head of the phrase, a required unit of the phrase, an optional unit to the left of the required units of the phrase, or none of the above. Indeed, some left corners are neither strictly required units nor optional units in the sense described above. A phrase may have several possible left corners. The phrase is not required to start with any one of those left corners, but once it does, there is a corresponding set of required units and a corresponding head. For example, the subject NP in a sentence is neither required in every sentence nor optional in every sentence; it was not expected, but once it is seen, it stays and creates other requirements for completion. If such units are not treated as left corners but as required units, then there would be grammar-induced spurious local ambiguities that the parser would suffer from.
8.1.2.4 COMPERE's Representation of Syntactic Rules
Using the notion of required, optional, and head units to partition syntactic knowledge, and the bottom-up indexing scheme for phrase-structure rules in which rules with the same child unit on the right-hand side (not necessarily just left corners) are grouped together, rules are represented in COMPERE as follows. For each category we represent:
The name of the category and a set of possible parent syntactic categories for the category. For each parent category, we have one composite rule which specifies one or more of the following:
Preconditions: A MUST-PRECEDE constraint specifies a sequence of types of previous units that must immediately precede the child unit before the child can be attached to the parent. For an optional attachment to the right end of an already complete phrase, in order to avoid spurious ambiguities and a spurious multiplicity of rules, the representation allows us to specify the parent phrase itself as the MUST-PRECEDE condition. This condition is interpreted to mean that the child can be attached to any complete parent phrase of the specified type. For optional units on the left or in the middle of required units, one could devise similar shorthands (regular expressions, for example) that could make the representation more parsimonious. However, there is no particular advantage to doing so, since these optional units do not introduce any ambiguity in parsing, given left-corner parsing and the left-to-right order of processing the words in a sentence. A MUST-BE constraint states particular requirements for attaching the child to the parent, based, for instance, on subcategory information. For example, for a VP to be attached to a reduced relative clause, the V in the VP must have the participle subcategory. MUST-BE constraints could be enhanced to take care of agreement preferences such as number and person agreement between the subject noun and the verb.
Expectations: A MUST-FOLLOW constraint specifies a sequence of units of particular types that must follow (i.e., that are required) for the phrase to be complete. As such, MUST-FOLLOW constraints only specify required constituents. This information creates expectations for the parser. Only required units that must follow are expected.
Other Actions: Other procedural attachments can also be made to syntactic rules. These attachments specify tiny actions that need to be performed when the child is attached to the parent unit. For instance, a test can be attached to the VP-to-S attachment rule that checks and notes the voice of the sentence. These attachments are useful for utilizing other linguistic cues, such as voice, that are not specified locally in one particular word and so cannot be included in word (sub)category information. These MUST-BE constraints and procedural attachments enhance the grammar beyond CFG languages without making the grammar combinatorially large. [Footnote 5: See Grishman (1986) for a discussion of adding context-sensitive powers to a Context-Free Grammar.]
It may be noted that the above representation gives a set of composite rules for all the syntactic knowledge about a particular syntactic category. This is the entire set of composite rules for that syntactic unit. It is the subgraph of the grammar for that type of syntactic unit, with only certain links, represented in particular directions, as guided by the indexing scheme. Figure 8.5 shows a representation of the simple grammar (of Figure 8.3) employed in the above discussion. A more complete grammar, quite similar to the one in COMPERE, is offered by Allen (1987). In addition to the above types of constraints on child-parent attachments, there is additional syntactic knowledge that represents the following two pieces of information (listed after Figure 8.5):
[Figure 8.5: Representation of a Simple Grammar. The figure shows the composite-rule nodes for the grammar of Figure 8.3: Det is part of NP with (Must-Precede Nil) and (Must-Follow N); N is part of NP with (Must-Precede Nil) or (Must-Precede Det); V is part of VP with (Must-Precede Nil); NP is part of PP with (Must-Precede PREP), and part of S with (Must-Precede Nil) and (Must-Follow VP); VP is part of S with (Must-Precede NP) or (Must-Precede Nil); PP is part of NP with (Must-Precede NP) and part of VP with (Must-Precede VP); PREP is part of PP with (Must-Precede Nil) and (Must-Follow NP).]
- the set of possible head units for each syntactic phrase, and
- the primitive role structure that corresponds to the syntactic unit.
For example, NPs have an NP-Role (also known sometimes as the Thing role) and the Verb has the Event-Role. (See Chapter 7 for a discussion of semantic roles.)
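A minimal sketch of how the composite rules of Figure 8.5 might be written down as data, indexed by child category as a bottom-up or left-corner parser would look them up; the field names are hypothetical, and MUST-BE constraints and procedural actions are omitted.

# Composite rules indexed by the child category. Field names are illustrative only.
SYNTACTIC_RULES = {
    "Det":  [{"parent": "NP", "must_precede": None, "must_follow": ["N"]}],
    "N":    [{"parent": "NP", "must_precede": None, "must_follow": []},
             {"parent": "NP", "must_precede": ["Det"], "must_follow": []}],
    "V":    [{"parent": "VP", "must_precede": None, "must_follow": []}],
    "NP":   [{"parent": "S",  "must_precede": None, "must_follow": ["VP"]},
             {"parent": "PP", "must_precede": ["PREP"], "must_follow": []}],
    "VP":   [{"parent": "S",  "must_precede": ["NP"], "must_follow": []},
             {"parent": "S",  "must_precede": None, "must_follow": []}],
    "PP":   [{"parent": "NP", "must_precede": ["NP"], "must_follow": []},   # adjunct to a complete NP
             {"parent": "VP", "must_precede": ["VP"], "must_follow": []}],  # adjunct to a complete VP
    "PREP": [{"parent": "PP", "must_precede": None, "must_follow": ["NP"]}],
}

def rules_for(child_category):
    """All composite rules telling how a unit of this category can attach upward."""
    return SYNTACTIC_RULES.get(child_category, [])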
8.1.2.5 The Head of a PP
The notion of the head of a phrase needs further elaboration. In order to see the difference between heads as defined above and those used in other theories (e.g., Abney, 1989), let us consider the prepositional phrase. In most parsing theories, the head of a phrase is defined using purely syntactic considerations without regard to meaning. For instance, the head of a phrase is that which projects the rest of the phrase and is unique to each phrase. This definition seems to work well for English but perhaps not so well for other languages, such as case-marked languages. Even in English, according to this definition, the head of a prepositional phrase is the preposition. This particular phrase helps us illustrate the problem with this approach and how COMPERE is different. In COMPERE, the head of a PP is the NP, whose head in turn is its Noun. One could object to this on purely syntactic grounds, since the noun cannot be the head of both an NP and a PP. However, it is the Noun that gives meaning to both the NP and the PP. The preposition is only a linguistic marker used to denote the role played by the word meaning of the noun in the event. Languages have many ways of specifying the roles played by noun meanings. When a language like English uses word order to indicate roles, we get NPs directly attached to S or VP. However, when there are many roles, there are at most three that can be specified using word order alone. Prepositions are a mechanism for specifying the roles of other nouns. The nouns are still central to the meaning of the PP. In fact, in many other languages, prepositions or their equivalents come after the nouns, not before them as in English. In a case-marked language, case markers that follow the nouns are used to mark the semantic roles of nouns. Another interesting piece of evidence points to the notion of the noun being the head of a PP. This is the phenomenon of a Prep-noun such as "home" in
(2) John took the book home.
This shows that there can be a PP without a preposition, but there can never be a PP without a noun, except in anaphoric situations where clauses end with a preposition. Since the notion of a head comes from considerations of meaning, and since the HSLC parsing algorithm is defined using the head of a phrase to determine when to perform parsing actions, COMPERE as a parser does not have an existence as a purely syntactic parser. However, COMPERE was meant to be a model of sentence interpretation, not just parsing, and that is the way it should be.
8.1.3 Conceptual Knowledge
Content words such as nouns and verbs have conceptual meanings quite independently of the linguistic forms the words take. These meanings are represented in conceptual units that state how the concept relates to other concepts. Concepts are viewed as nodes in multiple hierarchies, each hierarchy being a generalization-specialization structure between concepts. A conceptual node specifies its relations to other concepts by stating the selectional preferences for classes of concepts that can fill particular role relationships with the concept. A selectional preference for a role is stated as a MUST-BE constraint in terms of the class of concepts to which the filler of that role must belong. For example, the word "teach" has the conceptual meaning of the event teach. This concept has selectional preferences on which other concepts can fill roles in the teach
event (Figure 8.6). For instance, fillers of agent roles in this event must be animate concepts, and fillers of theme roles must be abstract concepts in the teach event. Objects fall into different hierarchies of concepts which tell us whether a particular concept is animate or not, whether it is an abstract entity or a physical object, and so on. A sample hierarchy is shown in Figure 8.6(b). COMPERE's conceptual knowledge was built only for the purpose of demonstration and included only those concepts and distinctions between them that were necessary for the example sentences that COMPERE was tested with (see Chapter 9 for several examples).
[Figure 8.6: Representation of Conceptual Knowledge. (a) Lexicalized Conceptual Knowledge: the TEACH event with an Agent-Role constrained by (Must-Be Animate), an Experiencer-Role, and a Theme-Role constrained by (Must-Be Abstract-Object). (b) Non-Lexical Concept Hierarchies (or Graphs): a hierarchy under "thing" that distinguishes, for example, living from lifeless objects, animate beings (human: adult, child, officer; other animals such as horse) from plants, and abstract objects (course) from physical objects (optical instrument, telescope).]
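A rough sketch of how such lexicalized selectional preferences and the non-lexical hierarchy might be encoded, based on Figure 8.6; the names and structure are illustrative only.

# Non-lexical concept hierarchy: child concept -> parent concept (a fragment of Figure 8.6b).
ISA = {
    "human": "animate", "other-animal": "animate", "animate": "living",
    "plant": "living", "living": "thing", "lifeless-object": "thing",
    "abstract-object": "lifeless-object", "physical-object": "lifeless-object",
    "course": "abstract-object", "telescope": "physical-object",
    "adult": "human", "child": "human",
}

def is_a(concept, klass):
    """True if concept falls under klass somewhere in the generalization hierarchy."""
    while concept is not None:
        if concept == klass:
            return True
        concept = ISA.get(concept)
    return False

# Lexicalized conceptual knowledge: selectional preferences on role fillers (Figure 8.6a).
TEACH = {"agent-role": ("Must-Be", "animate"),
         "experiencer-role": None,
         "theme-role": ("Must-Be", "abstract-object")}

def satisfies(event, role, filler_concept):
    """Check a MUST-BE selectional preference for a role filler, if one is stated."""
    constraint = event.get(role)
    return constraint is None or is_a(filler_concept, constraint[1])

assert satisfies(TEACH, "theme-role", "course") and not satisfies(TEACH, "agent-role", "course")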
Though conceptual units state selectional preferences for role fillers, there is no completeness requirement that they specify every role that can ever be attached to a concept. Optional roles such as a location role can be attached to any event, for instance. Once again, a distinction is made between required units and optional units, and the representations are designed to distinguish between the two. The required roles, such as the agent, theme, and experiencer roles of a particular event, form the collection of roles that are typically present for the event. However, this is not a strict requirement constraint as in syntax. Some of the roles mentioned in the conceptual event may not be present in a sentence and may have to be inferred by means of inter-sentential and extra-linguistic inferences, something that COMPERE does not do. For instance, the above event might only have an agent and an experiencer, and so on. It is assumed that the grammaticality of the requirement or absence of a role is handled in syntax by the corresponding parts of the grammar.
It may also be noted that events (and concepts in general) can occur in various combinations with the other concepts that they relate to. Such variety is realized in linguistic forms through various predicate-argument structures and subcategorizations. However, all such combinations of roles are not explicitly represented in COMPERE's conceptual knowledge. Predicate-argument or subcategorization information, which can be represented using thematic grids in conceptual knowledge (Tanenhaus, Garnsey, and Boland, 1991), is rather verbose and enumerative. The use of such information is assumed to be taken care of by the combined use of subcategory information in the lexicon, subcategory constraints on the grammar rules (in the form of MUST-BE constraints on the syntactic rules), and selectional preferences for conceptual role fillers. It may also be noted that the conceptual knowledge of selectional constraints described above is lexicalized. That is, these selectional constraints are specific to particular word meanings and can be accessed directly using the lexical entry of the word. COMPERE, however, has additional non-lexicalized conceptual knowledge in the form of the concept hierarchies mentioned above.
8.1.4 Role Knowledge
Semantic analysis in COMPERE is based on thematic roles. A standard set of thematic roles is extended by introducing intermediate roles that link syntactic structures to their thematic roles. These intermediate roles help maintain declarative representations of the sentence interpretation process in all its stages. Each role is linked to the syntactic structure (i.e., the node in the parse tree that corresponds to the role) on one side and to the conceptual meaning (i.e., the word meaning that is the filler for the role) on the other. The representation of roles is by design very similar in content and form to the representation of syntactic knowledge. A role type has a node that represents all the rules that govern the specialization of a role of that type into more specialized roles (by becoming a child of the specialized parent role). These rules state the parent role to which the child role will be attached, along with the constraints on that attachment. The constraints, preferences, and procedural attachments that go with role attachments are the following:
Syntactic guidance: Certain constraints on role assignment inspect the local syntactic context to seek the guidance of syntax in composing meanings through role attachments. For instance, role assignment might depend on whether the parent unit is a VP or an NP, or on whether the NP is a subject, a direct object, or an indirect object.
Other linguistic guidance: Other sources of linguistic information also help in role attachments. For example, some role specializations check the voice of the sentence or the clause to determine the right attachment.
Role context: Other roles present in the current role hierarchies also influence role assignment. These are handled much like the corresponding constraints in syntax, namely, using MUST-PRECEDE constraints. Just as in syntax, some MUST-PRECEDE conditions refer to the parent role itself, indicating that the child role can be added to the parent role as long as that role is complete. Roles too have required and optional constituents. However, which subset of roles is actually required for a parent role is often determined by the conceptual meaning of the phrases, such as which type of event is being talked about, and by the syntactic argument structure in the surface form. Optional roles such as LOCATION and CO-THEME, however, can be added to many roles without being required by the corresponding conceptual knowledge.
Conceptual constraints: The particular concept participating in the parent role influences the particular roles that can be attached as child roles. Such knowledge, in the form of MUST-BE constraints, helps disambiguate between ambiguous roles such as THEME, EXPERIENCER, and so on. Such disambiguation is not well marked in linguistic surface form and as such cannot be captured adequately by distinctions in syntactic structure alone. Sometimes conceptual knowledge also helps delay role specialization until the requisite knowledge for disambiguation becomes available. For instance, a subject role is prevented from becoming specialized into an agent role until there is an actual event role that accepts or rejects the subject noun as a filler for the agent role.
Combined factors: Certain combinations of the above also help in deciding the right way to specialize a role in order to attach it to a parent role. For instance, a combination of syntactic and role contexts provides the right constraints for attaching co-agent and co-theme roles.
8.1.4.1 Intermediate Roles
Intermediate roles are structures that are neither purely syntactic nor purely semantic. They combine lexical, syntactic, semantic, and conceptual information to represent states in sentence interpretation by capturing a set of related commitments made by the sentence processor at that point in interpreting a sentence. Intermediate roles are declarative representations of sentence processing decisions that are of great value in recovering from an error. An example of an intermediate role is the VP-WITH-NP-Role. This role is the role for the head noun of a prepositional phrase which has the preposition "with" and has been attached to a VP as an adjunct. This role is not quite the same as an Instrument role, because at this point the sentence processor has not yet verified that the conceptual meaning of the VP, the event, accepts the meaning of the head noun in the PP as a valid instrument in the event. Whether this noun becomes the instrument or not, the sentence processor is equipped to deal with any eventuality since it has a complete record of all the decisions it has made: namely, that the noun initially got an NP-Role, which became a WITH-NP-Role when the NP was attached to the PP with a "with" preposition, which in turn became a VP-WITH-NP-Role when the PP itself was attached to the VP. Without such a representation of these intermediate decisions, the sentence processor would have to start role assignment afresh every time its attempt to assign a particular role did not succeed. A graphical representation of thematic roles, together with many intermediate roles currently in COMPERE, is shown in Figure 8.7. In this figure, nodes are roles; links show how roles emerge into parent roles either by themselves or by combining with other roles. The MUST-PRECEDE and MUST-BE preconditions on the links (not shown in the figure) ensure that only appropriate roles emerge in a given syntactic and semantic context. The set of roles shown in Figure 8.7 is sufficient to cover all the example sentences for which COMPERE's output is described in Chapter 9.
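A rough sketch of the role-emergence chain just described, with hypothetical names and conditions; this is illustrative only and is not COMPERE's role graph (in particular, the options listed for the final emergence step are assumptions).

# Each entry says: a role of this type may emerge into the listed parent role when the
# stated (simplified) context holds. Names and conditions are illustrative.
ROLE_EMERGENCE = {
    "NP-Role":         [("WITH-NP-Role",    "NP attached to a PP headed by 'with'")],
    "WITH-NP-Role":    [("VP-WITH-NP-Role", "the 'with' PP attached to a VP as an adjunct")],
    "VP-WITH-NP-Role": [("INSTRUMENT",      "the event accepts the noun meaning as an instrument"),
                        ("CO-AGENT",        "the event accepts the noun meaning as a co-agent")],
}

def emergence_chain(start_role):
    """Trace one possible chain of intermediate roles from a starting role."""
    chain, role = [start_role], start_role
    while role in ROLE_EMERGENCE:
        role = ROLE_EMERGENCE[role][0][0]      # follow the first (preferred) option
        chain.append(role)
    return chain

print(emergence_chain("NP-Role"))
# ['NP-Role', 'WITH-NP-Role', 'VP-WITH-NP-Role', 'INSTRUMENT']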
[Figure 8.7: Representation of Intermediate Role Knowledge. The role graph includes the thematic roles EVENT, AGENT, THEME, EXPERIENCER, INSTRUMENT, LOCATION, BENEFICIARY, and STATE, and intermediate roles such as NP, SUBJ-NP, OBJ-NP, ACTIVE-SUBJ, PASSIVE-SUBJ, NON-AGENT-SUBJ, D-OBJ, I-OBJ, CO-AGENT, CO-THEME, WITH-NP, NP-WITH, VP-WITH, BY-NP, PASSIVE-BY-NP, FOR-NP, AT-NP, INTO-NP, ADJ-VP-Modifier, and ADJ, together with nodes for the markers BY, FOR, WITH, AT, and INTO.]
8.1.5 Other Knowledge
COMPERE also has a few other types of heuristic knowledge that it uses to reduce ambiguity and assign proper syntactic and semantic structures to a sentence. It has purely structural preferences in assigning syntactic structure. For instance, it has a right-association preference for an optional adjunct, which means that it prefers to attach an optional adjunct such as a PP to a lower syntactic unit when there are two or more such units of the same type. It also has a minimal-attachment preference for syntactic structure assignment. However, its minimal-attachment behavior is a mere consequence of another heuristic that it employs, namely the completion preference (see below). Essentially, given the choice between attaching a structure as a required unit and as an optional unit, it prefers the required-unit attachment over the optional one.
The second type of heuristic knowledge is the completion preference. The sentence processor has a quest for completing all incomplete structures. In syntax, it keeps track of all the required units that have not yet been seen for a particular open phrase and prefers an interpretation of the following words that would complete the phrase. As noted above, this preference for completing phrases provides an explanation of minimal-attachment behaviors without relying on an explicit count of the number of nodes in a parse tree. This notion of completeness is directly related to COMPERE's account of expectations. Only required units that are necessary to complete a structure are expected. This is a good heuristic that COMPERE uses to exploit the power of expectations to reduce ambiguity while not wasting its efforts expecting unnecessary entities that may or may not appear in the sentence. The notion of completion is useful not only as a disambiguation preference; it also guides the sentence processor in making its commitments at the right points in processing a sentence. As described later in the parsing algorithm, the processor makes further attachments upwards only when the structure is complete, that is, when it has acquired all its required units. [Footnote 6: This is an approximation and will be clarified later in the parsing algorithm. Briefly, a structure is attached to its parent as soon as it is head-complete (i.e., it has seen its head unit); further, the attachment is refined every time it acquires new units, whether required or optional, to the right of the head.]
In addition, COMPERE also has certain other heuristic preferences. For instance, it honors phrase and clause boundaries and knows that it must not cross them arbitrarily while making syntactic attachments. It also knows that the sentence structure itself, the S, has a special status in the sense that there can be only one of those in a sentence, and that it must strive to create an S structure if one does not exist yet, and so on. It has similar preferences for an event role in semantics. It also has completion preferences for roles in semantics when they have required units. COMPERE also knows that two syntactic structures can be attached as adjacent siblings to a parent only if their sentence loci or projections are contiguous (i.e., the first word spanned by the later unit immediately follows the last word spanned by the earlier sibling). All these smaller pieces of knowledge are represented procedurally in the algorithms that implement COMPERE.
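A rough sketch of how two of these heuristics (the completion preference and right association) might interact when ranking competing attachment proposals; the scoring scheme is hypothetical and is not COMPERE's preference table, which appears as Table 8.1 in the next chapter section.

def rank_attachment(proposal):
    """Order proposals: attachments that complete an expectation beat optional ones,
    and more recent (more rightward) attachment sites beat earlier ones."""
    completes_expectation = 1 if proposal["satisfies_expectation"] else 0
    recency = proposal["site_word_position"]   # right association: prefer later sites
    return (completes_expectation, recency)

proposals = [
    {"name": "attach PP to higher VP", "satisfies_expectation": False, "site_word_position": 2},
    {"name": "attach PP to lower NP",  "satisfies_expectation": False, "site_word_position": 4},
]
best = max(proposals, key=rank_attachment)
print(best["name"])   # "attach PP to lower NP" -- the right-association choice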
8.2 Working Memory
COMPERE maintains a working memory to keep track of the interpretation(s) it has built and to be able to access the syntactic and semantic interpretations for the initial parts of the sentence while processing the later parts. This working memory (Figure 8.8) consists of the following structures:
Parse Trees: A set of partial parse trees for the sentence, some of which will eventually become part of the parse tree for the entire sentence that represents the syntactic interpretation of the sentence. Each parse tree comes with the additional constraint that only its right-hand boundary is accessible for attachment or error recovery.
[Figure 8.8: COMPERE's Working Memory. The figure shows the working memory part way through the sentence "The bugs moved into ...": partial parse trees (each node annotated with the span of words it covers), an expectation for an NP on the right boundary of the PP, the last seen structure, role trees (an Event-Role for MOVE with an Agent-Role reached through NP-Role, Subject-Role, and Active-Subject, and an INTO-Role), word meanings (INSECT-BUG active, with MICROPHONE-BUG retained as the Agent-Role's semantic alternative), a word counter of 4, a voice flag set to ACTIVE, an empty list of embedded clauses, a syntactic alternative ("VP attached to Rel-S as parent"), and an ignore-semantics flag set to NO. Not all links between parse tree nodes, role tree nodes, and word meanings are shown.]
Also, the trees are stored in left-to-right order, and the ones behind the currently incomplete clause are not accessible for attachment. COMPERE also keeps track of the last seen structure, which is the most recent syntactic structure processed, since this is the one that is most likely to be accessed in the near future. Nodes in parse trees also have a representation of the window of words they span in the sentence. This is a tuple which stands for the starting and ending word number of the portion of the sentence spanned by the syntactic unit.
Role Trees: A similar set of role hierarchies is also accessible, either directly or through the parse trees. Nodes in either set of trees also have connections to the conceptual meanings of the words in them.
Embedded Clauses: The semantic roles and conceptual meanings of embedded clauses are also kept accessible since they constitute the final interpretation of the sentence along with the interpretation of the main clause. However, embedded clauses are not available for attachment or structural modifications unless they happen to be on the right-hand boundary of the currently open or accessible syntactic structure. Distinctions between the semantic representations of main clauses and embedded clauses (see Woods, 1975, for a discussion of these distinctions) are represented by making the meaning of the main clause available at the S node and the meanings of other clauses available separately through this list of embedded clauses.
Syntactic expectations: The parse trees also have expected units hanging to the right of the nodes, which are the expecting siblings, on their right-hand boundaries. These nodes are used to attach incoming words to expected parent units in preference over other attachments. Expectations serve to fulfill the quest for completing incomplete units. Expected units are instantiated at the time the expectation is generated. Expected units are linked to the parse trees through "horizontal" links between sibling units, from the expecting unit to the expected unit. These horizontal expectation links have certain advantages over the alternative of attaching the expected units to the parent units. They are distinct from any possible syntactic attachment links (i.e., they are never part of the parse tree), thereby eliminating any ambiguity between expectation links and other attachment links. Further, the same syntactic unit can be expected by different left corners in a particular parent phrase. For example, whether we see an Auxiliary or an Adverb, we expect to see a Verb. If the expected Verb were linked to the parent VP, it would not be known whether it was expected by an Aux left corner or an Adverb left corner. This could of course be verified by traversing the tree locally, but having the information explicitly represented helps in semantic role assignment and also in error recovery.
Syntactic alternatives: A record of syntactic alternatives for previous attachment decisions is kept in order to switch interpretations or otherwise repair structures during error recovery. A syntactic alternative is a set consisting of the alternative attachment proposal and the child structure for which the attachment was proposed. Alternatives that are no longer accessible, because their child nodes are no longer on the right-hand boundaries, are removed from this list.
Semantic alternatives: Word meanings that have been deactivated due to conceptual selectional constraints for particular role assignments are retained as alternative meanings. These alternative word meanings, accessible either through the role structures or the parse trees, are reactivated during error recovery should there no longer be any reason to deactivate them given the revised role assignments.
Word Counter: The current word position in the sentence. This is used to check the spans of syntactic structures in terms of the subsequence or window of words that they span. Checks for contiguity and overlap of the sentence loci of syntactic structures are of great value in syntactic structure assignment, especially in the wake of structural and categorial ambiguities.
Voice flag: A register that indicates whether the currently open clause has been found to be in active voice or passive voice. This is a useful way of handling non-local (or non-CFG) effects in sentence processing. Similar registers may be needed for dealing with agreement issues. An associated register is used in determining the voice of an embedded relative clause. This flag notes that the VP in the relative clause has the potential of being in passive voice (i.e., is in participle form, with or without an auxiliary form of "be"). This register later determines whether the clause is really in passive voice or not, depending on whether or not there is an auxiliary, a relative pronoun, or both. [Footnote 7: For a thorough illustration of determining the voice of a relative clause and of thematic role assignment in relative clauses, see the examples in Chapter 9.]
Ignore semantics: A third flag is used to note whether semantic preferences, including selectional preferences, should be completely disregarded until the flag is reset. This is necessary for error recovery during end-of-sentence processing (see below).
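A minimal sketch of a container holding the working-memory structures just listed; the class and field names are hypothetical and the field types are guesses.

from dataclasses import dataclass, field

@dataclass
class WorkingMemory:
    """Illustrative container for the working-memory structures described above."""
    parse_trees: list = field(default_factory=list)        # partial parse trees, left to right
    last_seen_structure: object = None                      # most recently processed node
    role_trees: list = field(default_factory=list)          # role hierarchies
    embedded_clauses: list = field(default_factory=list)    # meanings of embedded clauses
    syntactic_alternatives: list = field(default_factory=list)
    semantic_alternatives: list = field(default_factory=list)
    word_counter: int = 0                                    # current word position
    voice_flag: str = "UNKNOWN"                              # ACTIVE / PASSIVE once determined
    potential_passive: bool = False                          # associated relative-clause register
    ignore_semantics: bool = False

    def advance_word(self):
        """Move the word counter forward after a word is read."""
        self.word_counter += 1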
8.2.1 Representing Proposed Attachments
Since COMPERE arbitrates between several proposed attachments, both syntactic and semantic, before making an attachment, it needs to hold on to proposed attachments in the working memory. Though these proposals often include a variety of information about the proposed attachment, they are transient and are thrown out in most cases as soon as the arbitrator selects another proposal. Proposals that are alternatives to the selected ones are retained in the lists of alternatives only when needed for later use in error recovery. Even when they are added to this list, they are discarded once the nodes involved become inaccessible by moving to interior parts of parse trees. (See the algorithms below for a clear analysis of when a proposal is thrown out and when it is retained.) A syntactic attachment proposal has the following information specified:
Parent-category-name: The syntactic category of the proposed parent node.
Child-category-name: The syntactic category of the child unit being attached. This information could easily be obtained from the Child-str below, but is included anyway for the sake of uniformity.
Parent-str: The parent structure, if it already exists in the parse forest.
Child-str: The child structure being attached.
Preference-value: The preference value assigned to the proposed attachment.
Passivity: Whether the child structure being attached is in passive voice. This slot is used to determine the voice of relative clauses, since the voice depends not only on the verb subcategories within the VP but also on the presence or absence of the relative pronoun.
Lexical-entry: The lexical entry of the word according to which the attachment is being attempted. When the word has a category ambiguity, this identifies the particular lexical entry for the Child-category-name and provides access to the necessary subcategory information.
Must-precede: The syntactic category(ies) of the units that must precede the child-str for attachment to the parent-str.
Must-follow: The syntactic category(ies) of the units that can be expected at this time if the child-str is attached to the parent-str.
Must-be: The constraints on the subcategories of the words in the child-str for the proposed attachment. For example, the V in the VP must be in participle mode, and so on.
Actions: The actions (procedural attachments) that must be carried out at the time of attachment. This feature is used sparingly, mainly to set the voice register and so on.
A semantic attachment proposal has the following types of information:
Parent-category-name: The role category of the proposed parent role.
Child-role-str: The child role structure being attached.
Parent-role-str: The existing parent role structure to which the attachment is being made.
Preference-value: The preference value assigned to the proposed attachment.
Must-precede: The role category(ies) of the units that must precede the child-role-str for attachment to the parent-role-str.
Must-follow: The role category(ies) of the units that can be expected at this time if the child-role-str is attached to the parent-role-str.
Actions: The actions (procedural attachments) that must be carried out for the proposed attachment. These procedures are used mainly to specify tests (for non-CFG aspects) that implement the role preconditions (described below) on the links of the role graph in Figure 8.7.
Directly-p: Whether the role attachment is direct, or whether there is an intermediate role (to be created) through which the child and parent roles are to be attached.
Middle-role: The intermediate role in the middle, if the attachment is not direct.
Active-meanings: The meanings of the child structure that should remain part of the current interpretation after the attachment.
Alternative-meanings: The meanings of the child structure that should be deactivated when this attachment is made; these meanings are the alternatives that may be used at a later time for error recovery purposes.
Role-on-left: The role on the left sibling of the child role when attached. (See the section on processing left roles below for a description of how this information is used.)
It may be noted that COMPERE's arbitrator passes a syntactic attachment proposal on to semantics and combines the corresponding semantic proposal(s) with the syntactic one to form a complex "proposal ensemble." Its use in arbitration is described below.
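As data structures, the two kinds of proposals and the combined "proposal ensemble" might look roughly like this; this is a sketch only, with the slot names taken from the text above but the field types guessed.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SyntacticProposal:
    """Illustrative record of a proposed syntactic attachment."""
    parent_category_name: str
    child_category_name: str
    child_str: object
    parent_str: Optional[object] = None       # None if the parent is to be newly projected
    preference_value: int = 0
    passivity: bool = False
    lexical_entry: Optional[dict] = None
    must_precede: list = field(default_factory=list)
    must_follow: list = field(default_factory=list)
    must_be: list = field(default_factory=list)
    actions: list = field(default_factory=list)

@dataclass
class SemanticProposal:
    """Illustrative record of a proposed semantic (role) attachment."""
    parent_category_name: str
    child_role_str: object
    parent_role_str: object
    preference_value: int = 0
    must_precede: list = field(default_factory=list)
    must_follow: list = field(default_factory=list)
    actions: list = field(default_factory=list)
    directly_p: bool = True
    middle_role: Optional[str] = None
    active_meanings: list = field(default_factory=list)
    alternative_meanings: list = field(default_factory=list)
    role_on_left: Optional[str] = None

@dataclass
class ProposalEnsemble:
    """A syntactic proposal paired with the semantic proposal(s) it gives rise to."""
    syntactic: SyntacticProposal
    semantic: list = field(default_factory=list)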
8.3 Sentence Processing Methods
Having seen the representations, both of knowledge and of intermediate results in the working memory, we now turn to the methods and algorithms used to process the sentence using the representations. We first focus on syntactic processing and describe the implementation of the HSLC parsing algorithm.
8.3.1 Implementation of HSLC Parsing Algorithm
The important methods in the HSLC parsing algorithm are (i) deciding when to attempt to attach a syntactic unit to a parent unit, (ii) proposing possible attachments, (iii) selecting a subset of the proposed attachments, and (iv) making the attachment(s). This enumeration does not include other minor steps such as creating instances for word categories, and so on. As discussed in Chapter 6, a parsing algorithm is determined essentially by the choices of when to make an attachment and when and whether or not to project from a parent syntactic unit. A top-down parser projects at the start and has no need to make attachments at all, since it always projects everything from the top. A bottom-up parser, on the other hand, makes an attachment at the end of the phrase. In other words, the announce point (see Chapter 6 or Abney and Johnson, 1991) is always zero for top-down parsers and is always at the end of the phrase for bottom-up parsers, irrespective of whether there are required or optional constituents in the phrase. In HSLC, the announce point is the same as the head and is hence different for different phrases, depending on the position of the head and any optional modifiers before the head. Since HSLC is a left-corner parsing algorithm, there is no question as to when projection is done: a left corner is always used to project to a parent unit and to make any expectations for required units of the parent. The expectations thus created undergo minor adjustments, in their starting sentence loci for instance, when optional modifiers are seen between an expecting unit, like a left corner, and the expected required units.
8.3.1.1 Deciding When to Make Attachments
HSLC says that any constituent that has just received its head must be attached to its parent unit(s). In other words, a unit is attached to a parent unit as soon as the unit is head-complete. This is done by maintaining for each unit its completion status, which is either True or a list of required units still to be acquired before it becomes complete. Leaf units for word categories are their own heads and hence are attached immediately upon instantiation.
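A minimal sketch of this attachment trigger and the completion status, with hypothetical field names and a toy unit record:

def completion_status(acquired, required):
    """Return True if all required units have been acquired, else the list still missing.
    A unit whose status is True is complete; one that has acquired its head is head-complete."""
    missing = [unit for unit in required if unit not in acquired]
    return True if not missing else missing

def should_attach(unit):
    """HSLC trigger (sketch): attempt upward attachment once the unit has its head."""
    return unit["head"] in unit["acquired"]

np = {"category": "NP", "head": "N", "acquired": ["Det", "N"], "required": ["N"]}
assert should_attach(np) and completion_status(np["acquired"], np["required"]) is True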
8.3.1.2 Identifying Possible Parents
Possible categories for parent units are obtained by looking at the phrase-structure rules for the current child category. These composite rules, represented as described above, tell us the types of parent nodes and any constraints on the attachment, such as MUST-PRECEDE and MUST-BE constraints. Once the types of parents are identified, the parser needs to propose attachments by locating parent units existing in the parse trees or by proposing new parent units to be created.
8.3.1.3 Proposing Attachments
The right-hand boundaries of parse trees are used to propose ways of attaching a given syntactic unit to an existing parent unit or to one that needs to be freshly instantiated. There are four ways of proposing attachments (Figure 8.9; see also the pseudocode algorithm in Figure 8.10):
Attaching to satisfy an expectation: It is possible that another syntactic unit previously seen has created an expectation for the current unit. In such a case, the current unit is attached to the parent of the expecting unit, to the right of the expecting unit as its sibling. While it is possible to merge this case with one of the other cases below (since if the expecting unit is on the right-hand boundary, so will be its parent), this type of attachment is checked first to help in disambiguation. By explicitly representing expectation links, we can catch expected attachments and treat them preferentially over other attachments, thereby reducing the ambiguity in structure assignment.
Attaching to an expected unit: It is also possible that the parent unit was expected but not the current child unit. [Footnote 8: A parent unit may be expected but not its head child. This is done to keep the expectations general and their representation parsimonious. Recall that phrases may have more than one head depending on the expansion they take. For instance, a preposition generates an expectation for an NP but not specifically for a noun. This is because an NP might have a noun for a head or might have a pronoun. Rather than generate two alternative expectations, one for a noun and another for a pronoun, we simply generate an expectation for an NP, leaving the rest to later processing.] In such a case, the child unit is attached to the expected parent unit, leading it towards completion. An optional unit or a left corner can also be added to an expected parent unit.
Attaching to a unit on the right-hand boundary: A syntactic unit may be attached to a parent unit that is on the right-hand boundary of the tree. Since there was no expectation in this case for a unit of this type, this unit must be an optional unit for the parent unit. If the optional unit is to the left of some of the required units of the parent unit, the sentence processor will have to readjust the expectations for those required units while making the attachment, since the sentence loci of the expected units will now be pushed to the right by the intervening optional unit. An example of this is adding a PP as an adjunct to a subject NP. In this case, the NP's expectation for a VP will have to be pushed to the right by an amount equal to the length of the PP.
Attaching by projecting a new parent unit: If there is no suitable parent unit already existing and there is no objection to creating a new parent unit (such as a violation of incomplete phrase or clause boundaries), then the child unit can lead to the creation of a new parent unit. The child unit will be the first or leftmost child of the parent unit, or its left corner.
It may be noted that the child unit being attached might itself have children and might have just turned head-complete. In other words, the child unit might actually be one of the partial parse trees in COMPERE's working memory. It may also be noted that contiguity in sentence locus is always checked before any attachment is made. Any two siblings must always be contiguous in the sentence, and parents should accurately reflect the sentence loci of their children. In locating an existing parent unit for attaching a child unit, the right-hand boundaries of parse trees are searched from leaf to root upwards and from the most recent (in terms of the word counter) partial tree to the very first tree, as long as the tree is accessible. Trees "behind" an incomplete tree become inaccessible so as to honor phrase/clause boundaries (i.e., to avoid links crossing each other in the final parse tree). This search ordering results in a right-association preference for attachments in the absence of other stronger preferences such as the completion preference. Such right-association behavior can be seen in COMPERE in the case of attaching optional adjuncts to a phrase, for instance.
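The search order just described can be sketched as follows, building on the syn-unit structure above. The accessibility test here is deliberately crude; COMPERE's actual treatment of phrase and clause boundaries is richer than an incomplete-tree check.

  (defun right-hand-boundary (tree)
    "Collect the nodes on the right-hand boundary of TREE, leaf first, root last."
    (let ((path '()))
      (loop for node = tree then (car (last (syn-unit-children node)))
            while node
            do (push node path))
      path))

  (defun candidate-attachment-sites (parse-forest)
    "Gather possible attachment sites from the accessible right-hand boundaries:
  most recent partial tree first, and leaf-to-root within each tree, which yields
  the right-association preference described above. Once an incomplete tree has
  been searched, the trees behind it are treated as inaccessible."
    (loop for tree in (reverse parse-forest)   ; PARSE-FOREST assumed oldest-first
          append (right-hand-boundary tree)
          until (not (head-complete-p tree))))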
8.3.1.4 Selecting from Proposed Attachments
Proposed attachments are ranked based on preference levels for different types of attachments. The different types of attachment proposals and their preference levels are shown in Table 8.1. Selecting based on these preferences is the task of the arbitration algorithm and is described as part of the description of that algorithm below.
8. A parent unit may be expected but not its head child. This is done to keep the expectations general and their representation parsimonious. Recall that phrases may have more than one head depending on the expansion they take. For instance, a preposition generates an expectation for an NP but not specifically for a noun. This is because an NP might have a noun for a head or might have a pronoun. Rather than generate two alternative expectations, one for a noun and another for a pronoun, we simply generate an expectation for an NP, leaving the rest to later processing.
Figure 8.9: Ways of Making Syntactic Attachments. [Schematic parse trees with sentence loci and expectation links; panels: (a) Attaching to Satisfy an Expectation; (b) Attaching to an Expected Parent; (c) Attaching an Optional Adjunct to an Existing Parent; (d) Projecting a New Parent from a Left Corner. Diagrams not reproduced.]
Given a current syntactic unit to be attached and a parse forest,
  For each possible parent,
    For each tree in the parse forest, from the rightmost to the leftmost,
      For each node on the right-hand boundary of the tree,
        Stop if crossing an incomplete phrase boundary or a clause boundary;
        If the node is expecting the current unit, select its parent;
        else if the node is expecting a unit of the same type as the possible parent, select the node;
        else if the current unit can be attached as an optional child to the node on the right-hand boundary, select the node;
  If all else fails,
    If the current unit has no MUST-PRECEDE, create a new node by projecting from the current unit as the left corner of the new parent unit.

Figure 8.10: Pseudocode for Proposing Syntactic Attachments.
Table 8.1: Preference Levels in Syntax.

Type of attachment                                       Preference Level
Attaching to satisfy an expectation                      5
Attaching to an expected parent as a required child      5
Attaching to an existing parent as an optional child     4
Attaching to an expected parent as an optional child     3
Creating a new parent unit                               0
There are two modifications made to the above levels of preference. First, if a sentence structure, S, has not been formed yet and there is a proposal to create one, that proposal gets an increment of 4 in its preference value. The idea is to prefer starting an S structure so that a complete sentence structure may be formed, rather than adding to nonessential optional units; the S structure is, in a sense, the expected structure to start with. Secondly, in addition to the preference levels shown in Table 8.1, any preference level is increased by 1 if there are procedural attachments associated with the attachment proposal. This is a heuristic that says that if there was a test or other action associated with the proposal and the test or action did not lead to a rejection of the proposal (i.e., some additional information was used by the attached procedure and the additional information corroborated the proposal), the preference level for the proposal should increase by a small amount.
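The preference levels of Table 8.1, together with the two adjustments just described, amount to a small scoring function. The following sketch is a plausible rendering under assumed encodings; the proposal record and its type keywords are inventions of this sketch, not COMPERE's representation.

  (defstruct attachment-proposal
    type               ; one of the keywords in the CASE form below
    parent-category)   ; category of the (possibly new) parent unit

  (defun syntactic-preference (proposal &key s-started-p procedure-corroborated-p)
    "Return the syntactic preference level for PROPOSAL. S-STARTED-P says whether
  an S structure has already been formed; PROCEDURE-CORROBORATED-P says whether a
  procedural attachment ran and did not reject the proposal."
    (let ((level (case (attachment-proposal-type proposal)
                   (:satisfy-expectation        5)
                   (:expected-parent-required   5)
                   (:existing-parent-optional   4)
                   (:expected-parent-optional   3)
                   (:new-parent                 0)
                   (t                           0))))
      ;; Prefer starting an S structure when none exists yet.
      (when (and (eq (attachment-proposal-type proposal) :new-parent)
                 (eq (attachment-proposal-parent-category proposal) 's)
                 (not s-started-p))
        (incf level 4))
      ;; Small bonus when an attached procedure corroborated the proposal.
      (when procedure-corroborated-p
        (incf level 1))
      level))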
8.3.1.5 Making Attachments
As the selected attachments are made, relevant structures in the working memory are updated to reflect the changes in the syntactic structures. For example, the following updates are made to the working memory so that it is always up to date:
Meanings: If the child is the head of the parent, then the meaning(s) of the child unit are propagated up to the parent. The meanings are now accessible from both the child and the parent. The corresponding roles are also propagated upward to the parent. For instance, the event role and the meaning of the V are propagated up to the VP node, and then to the S node if the VP is the main verb of the sentence.

Sentence Loci: The increase in the span of the parents as a result of the attachment is reflected by updating the sentence loci of the parent unit as well as those of all its ancestors (which are all on the accessible right-hand boundary).

To-be-completed lists: If the child unit is a required unit of the parent, the to-be-completed list of the parent unit is updated to reflect the fact that one of the required units was seen.

"Propagate Completion Up": If the child unit is the head of the parent unit, then the parent has just become head-complete. As such, the parent itself should now be attached to units further up, if possible. This process, called "propagate completion up" in COMPERE's jargon, is applied recursively upward as far as possible so as to obtain the most complete incremental interpretation at any point in processing a sentence. The only thing that can stop this upward propagation is an intermediate node that is not head-complete. Head-completeness is the license for attachment and the green signal for the upward propagation of processing.
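The "propagate completion up" step is essentially a loop that keeps attaching upward while head-completeness holds. A sketch, reusing head-complete-p from the earlier sketch and taking the attachment machinery as a functional argument (TRY-ATTACH is assumed to attach a unit to some parent and return that parent, or NIL if no attachment could be made):

  (defun propagate-completion-up (unit try-attach)
    "Attach UNIT upward as far as possible: each newly head-complete parent is in
  turn attached, so the most complete incremental interpretation is available
  after every word. A parent that is not yet head-complete stops the propagation."
    (loop for current = unit then parent
          for parent = (funcall try-attach current)
          while (and parent (head-complete-p parent))
          finally (return current)))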
8.3.2 Implementation of Role Assignment
Semantic analysis in COMPERE is essentially a process of assigning a semantic role, such as a thematic role or other modifier roles of various kinds (Frawley, 1992), to each content word in the sentence. Word meanings in a sentence can be assigned semantic roles either using syntactic guidance, so that role assignments correspond to syntactic compositions of words proposed by the HSLC parsing algorithm, or independently using only semantic and conceptual knowledge. We focus only on the first kind of role assignment in the discussion below. While COMPERE can propose role assignments independently of syntactic attachments, such unguided role assignment is nothing more than an exhaustive search of every pair of unattached roles and meanings for possible attachments, which might come in handy while processing ungrammatical input, for instance. Implementing role assignment involves the following steps: (i) deciding when to make role attachments, (ii) generating semantic attachment proposals, (iii) selecting among the proposed role assignments, and (iv) making the assignments. Deciding when to make role assignments is already taken care of by the HSLC parsing algorithm. Role attachments are attempted whenever a head-complete syntactic unit is being attached to a parent unit. The only other time COMPERE might want to attempt role attachments is when there is no composition suggested by syntax, such as when the input is ungrammatical. In the following discussion, we will only concern ourselves
with the former situation of making role assignments that correspond to syntactic attachment proposals. However, there is a second place where role attachments have to be carried out. This phenomenon, known as "left-role processing" in COMPERE, is explained below. Given a set of proposed syntactic attachments, COMPERE evaluates the semantic feasibility of each of those attachments. This results in the generation of one or more possible semantic attachment proposals corresponding to the syntactic attachment proposals. The arbitration algorithm selects from this set of pairs of syntactic and semantic attachment proposals a subset that the sentence processor actually pursues by making the corresponding attachments. We discuss each of these steps in further detail below.
8.3.2.1 Generating Semantic Attachment Proposals
Given a syntactic attachment proposal, COMPERE generates corresponding semantic attachment proposals by attempting to attach the semantic roles of the two syntactic structures either directly or through intermediate roles. For each of these links, it also checks any MUST-BE or other conceptual constraints on the fillers of the roles. These tests are done for each pair of child and parent roles. If the parent has no role at the time of syntactic attachment, then it gets the role of the child if the child is the head child; otherwise the parent remains without a role. Given a pair of roles to be attached, COMPERE employs a parsing algorithm very similar to the HSLC parsing algorithm to generate attachment proposals. In the case of a direct attachment of a child role to a parent role, this process is the equivalent of attaching a syntactic child to an existing parent unit. When a new, perhaps intermediate, parent role is created, the process is similar to what happens in syntax when a new parent structure is created from a left corner of that phrase. Roles can also be attached to satisfy expectations, as in the case of PP roles where the preposition's role expects an NP-Role. However, unlike syntax, there is little to be gained from prediction and projection here, since structural ambiguities in composition are taken care of by HSLC in syntax. Since roles are composed of at most two children roles (see Figure 8.7), the algorithm used to "parse" role sequences is much simpler than HSLC. It does not employ left-corner projection or prediction. It simply carries out a purely bottom-up process until a satisfactory (according to conceptual MUST-BE constraints and any syntactic correspondence constraints on role links) attachment to the parent role is found. In summary, generating semantic attachment proposals given a syntactic attachment proposal comprises the following steps:

Examine whether the role is complete: a role is complete only after it has acquired all its required children (a head-driven strategy); the event role is never complete and is never attached to any roles higher up (Figure 8.7).

Consider all possible parents mentioned in the composite "phrase-structure rule" for the child role to be attached. If there are MUST-FOLLOW conditions for attachment to a parent, delay the attachment until the MUST-FOLLOW is satisfied (a head-driven strategy).

If all the MUST-PRECEDE conditions, conceptual constraints on the role filler, and any correspondence constraints are satisfied, then propose attachment to the parent. MUST-PRECEDE constraints are checked by looking to see if the existing parent role has the child(ren) role(s) of the types specified in the constraint. Conceptual constraints on role fillers are checked by seeing if the word meaning attached to the child role is of the type specified in the constraint. Such conceptual constraints themselves are obtained by looking up the slots of the conceptual meaning attached to the parent role (i.e., the meaning of the parent syntactic phrase specifies the constraints on the fillers of roles; the meanings of the child syntactic phrase, which become such fillers, must meet those constraints). The process of checking correspondence constraints is described below.

Find an existing parent role, or propose that a new role of that type be created to be the parent of the child role. Note that finding the existing role does not involve any search of the role trees or the parse trees; the role can always be obtained from a local search of the syntactic child, the syntactic parent, or the syntactic left child in the case of left-role processing.

Repeat this process until the child role is linked, perhaps through intermediate roles, to the role of the parent syntactic structure.

These steps essentially constitute the role assignment method in COMPERE's semantic analysis. The algorithm for generating semantic role assignments was presented earlier in Chapter 7.
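The conceptual MUST-BE test mentioned above is essentially a check that a proposed role filler falls under a required concept class. A sketch follows, with the concept hierarchy represented as a simple single-parent alist; this toy representation is an assumption of the sketch, as COMPERE's conceptual knowledge is organized into richer hierarchies.

  (defparameter *concept-parents*
    '((officer . animate-object)
      (course  . abstract-object)
      (animate-object . physical-object))
    "Toy fragment of a concept hierarchy: each concept maps to its parent.")

  (defun must-be-satisfied-p (filler-meaning constraint-class
                              &optional (hierarchy *concept-parents*))
    "Does FILLER-MEANING fall under CONSTRAINT-CLASS in the hierarchy?
  A missing constraint is trivially satisfied."
    (or (null constraint-class)
        (loop for concept = filler-meaning
                then (cdr (assoc concept hierarchy))
              while concept
              thereis (eq concept constraint-class))))

  ;; For example, (must-be-satisfied-p 'officer 'animate-object) is true,
  ;; while (must-be-satisfied-p 'course 'animate-object) is false, which is the
  ;; bias that later distinguishes sentences (3) and (4) below.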
8.3.2.2 Processing Left Roles
Often when a child syntactic node is attached to an existing parent syntactic node, roles not attached to either node have to be processed. To see this scenario in more precise terms, consider a syntactic node Syn1 with a child, a left corner, Syn2. Suppose also that Syn2 is not the head of Syn1. Therefore, Syn1 has neither a meaning nor a role at this time. However, Syn2 might have a meaning and an intermediate role waiting to be composed with the meaning of Syn1. Suppose now that a new child Syn3, the head of Syn1, is attached to Syn1 (Figure 8.11). As a result, Syn1 is going to acquire a meaning and a role from its head, Syn3. Before we make this attachment between Syn1 and Syn3, we have to verify the semantic feasibility of making such an attachment and let the arbitration algorithm decide whether it is wise to make this commitment at this time. Since making the syntactic link between Syn1 and Syn3 essentially results in the composition of the meanings of Syn2 and Syn3, we need to evaluate the feasibility of attaching the intermediate role of Syn2 to the new role of Syn1 (which is the same as the role of Syn3). Thus it is also necessary to process the roles on the left link, that is, the link between the parent syntactic structure Syn1 and its child Syn2 to the left of the child Syn3. This is because the parent will acquire its meaning from the new child if the child is its head, and hence the relationships between this newly acquired meaning and the children on the left should be investigated to determine the feasibility of attaching the current child to the parent syntactic structure.9

This scenario is a typical example of immediate semantic feedback to syntactic processing, which helps resolve structural ambiguities in syntax. For example, the semantic feasibility of the agenthood of a subject noun might help syntax determine whether the verb should be the main verb of a sentence or be part of a reduced relative clause modifying the subject noun. This can explain, for example, why sentence (3) below is not a garden-path sentence while sentence (4) is.

(3) The courses taught at the academy were very demanding.
(4) The officers taught at the academy were very demanding.

It may also be noted here that checking the semantic feasibility of a syntactic link as described above at times involves a chain of intermediate roles for making the role attachments. For instance, the NP-role of a subject noun might have to be specialized all the way to the AGENT-role (Figure 8.7) in order to determine the feasibility of a VP-to-S link by processing the NP-to-S left link. This serves to show that, in general, there is no one-to-one correspondence between the individual operations on parse trees and role trees. A single composition operation on parse trees might involve building an entire tree of semantic roles, and vice versa.

Checking the semantic feasibility of a syntactic link involves examining the semantic consequences of making the link. Since the syntactic link has not been made yet, because the arbitrator evaluates all proposed compositions in parallel before making any attachment, this process poses a problem in implementing COMPERE. The solution employed in COMPERE involves making the syntactic attachment temporarily to evaluate the semantic consequences of the attachment and then removing the syntactic link to let the arbitrator decide which link to build. Such temporary syntactic links may be needed, for instance, to see if correspondence constraints on role links are satisfied by the proposed syntactic link.

9. Details of the algorithm for left-role processing are not presented here. The context in which such processing is necessary was explained above and illustrated in Figure 8.11. Processing the left role itself follows the same role assignment algorithm presented earlier in Chapter 7.

Figure 8.11: Processing a Left Role. [Before-and-after diagrams showing Syn1 with left child Syn2 (carrying Meaning-Syn2 and Intermediate-Role-Syn2) and head child Syn3 (carrying Meaning-Syn3 and Role-Syn3); the legend distinguishes links from syntactic nodes to semantic roles and from semantic roles to conceptual meanings, syntactic expectation links, syntactic links on the right boundary, semantic-role-to-semantic-role links, and inaccessible syntactic links. Diagrams not reproduced.]
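The temporary-link trick described in the last paragraph is naturally expressed with unwind-protect. The following macro is a sketch over the syn-unit structure introduced earlier; COMPERE's own bookkeeping for making and undoing an attachment involves more state (loci, roles, meanings) than is shown here.

  (defun attach-child (parent child)
    "Minimal attachment: record the parent/child links."
    (setf (syn-unit-parent child) parent)
    (setf (syn-unit-children parent)
          (append (syn-unit-children parent) (list child))))

  (defun detach-child (parent child)
    "Undo ATTACH-CHILD."
    (setf (syn-unit-parent child) nil)
    (setf (syn-unit-children parent)
          (remove child (syn-unit-children parent))))

  (defmacro with-temporary-attachment ((child parent) &body body)
    "Attach CHILD under PARENT, evaluate BODY (e.g., semantic feasibility tests),
  and remove the link again, whatever BODY does."
    (let ((c (gensym "CHILD")) (p (gensym "PARENT")))
      `(let ((,c ,child) (,p ,parent))
         (attach-child ,p ,c)
         (unwind-protect (progn ,@body)
           (detach-child ,p ,c)))))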
8.3.2.3 Semantic Preference Levels
The arbitration algorithm selects from among the proposed attachments by means of a set of preference levels for semantic attachments. Proposed role attachments are assigned different preference levels according to the semantic and conceptual constraints they satisfied or violated. Semantic preferences come from a combination of syntactic correspondence constraints, conceptual selectional constraints, and semantic role attachment constraints. These constraints may be MUST-PRECEDE constraints on role attachment (as in the roles of a PP, for example), constraints based on correspondence with syntactic compositions (such as whether the attachment is to an NP or a VP, and so on), MUST-BE constraints on role filler classes coming from conceptual knowledge, or constraints based on other linguistic knowledge such as whether the attachment is in an active- or passive-voice construct.
Table 8.2: Preference Levels in Semantics.

Semantic constraints                         Preference Level
Constraints not satisfied                    -1
No meanings to check constraints with        1/3
No constraints exist                         2/3
Constraints satisfied                        Positive integer
Table 8.2 shows the four different levels of semantic preference. The least preference exists when a constraint is violated; an arbitrary value of -1 is assigned to this preference level. The next higher preference exists when one or both of the words being composed do not have any meanings to check semantic constraints against. This is assigned a positive value close to zero, namely 1/3. The third preference level is assigned when word meanings are available but there are no known constraints to check against. This is assigned a slightly higher value, 2/3, the rationale being that we have meanings and the meanings do not violate any constraints known to the sentence processor. In the previous case, there might have been some constraints known, but there were no role fillers to check the constraints against; if meanings were acquired at a later time, some of the known constraints might be violated. Hence, that case is assigned a lower level of preference than the one where there are no constraints to be checked. The highest level of preference exists when known constraints are in fact satisfied by the word meanings. The number assigned to this level is a positive integer, and hence greater than the value of any of the previous levels, the exact value depending on the actual constraints that were satisfied.

The numbers above have no greater significance than partitioning the scale of preferences into discrete ranges that can be compared against each other. For example, 2/3 is greater than 1/3, but their ratio is of no significance whatsoever. These values were chosen because two values were needed between zero and one; negative values are below zero and indicate a failure to meet semantic constraints; positive integers greater than or equal to one indicate that constraints were satisfied, their actual value giving an idea of how many were satisfied and what weight the satisfied constraints carried. The two intermediate values stand for the inability to check existing constraints (1/3) or the absence of any semantic constraints (2/3). There has been no need to "fine tune" any of these numbers during the implementation of COMPERE.

It may be noted that the constraints in Table 8.2 may be conceptual (MUST-BE), role composition (MUST-PRECEDE), or syntactic correspondence constraints. They are all given equal weights when they are satisfied. This amounts to a heuristic that the more constraints satisfied, the higher the preference for the proposed attachment. However, when any one constraint is violated, the preference falls to the lowest level (-1) and might result in a rejection of the proposal. Thus, essentially, semantic preferences have two levels corresponding to whether the constraints are satisfied or not, along with two intermediate levels, one for the absence of any constraints at all and the other for the absence of sufficient information from the input to check against known constraints.
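The scale of Table 8.2 maps directly onto Common Lisp's exact rationals, so the two intermediate values can be represented as 1/3 and 2/3 without any floating-point fuzz. The argument encoding in the sketch below is an assumption, not COMPERE's interface.

  (defun semantic-preference (constraints filler-meanings violated-p satisfied-weight)
    "Return the semantic preference level for a proposed role attachment.
  CONSTRAINTS are the applicable constraints, FILLER-MEANINGS the meanings
  available to check them against, VIOLATED-P is true if any constraint failed,
  and SATISFIED-WEIGHT is a positive integer summarizing how many constraints
  were satisfied and with what weight."
    (cond (violated-p              -1)    ; some constraint was violated
          ((null filler-meanings)  1/3)   ; nothing to check the constraints against
          ((null constraints)      2/3)   ; meanings available, but no constraints known
          (t (max 1 satisfied-weight))))  ; constraints satisfied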
8.3.2.4 Selecting and Making Role Attachments
Just as in the case of the selection of syntactic attachments, semantic selection is a unified task requiring the arbitration of syntactic and semantic proposals and preferences. This job is done by the unified arbitration algorithm, which is described below. Once an attachment is selected, making the role attachment is essentially a bookkeeping operation and is pretty much the same process as making syntactic attachments.
8.3.2.5 Constraints on Role Assignment
Given that semantics addresses the relationships between linguistic structure and conceptual meaning, some semantic constraints lean towards the syntactic aspects of semantic relationships and others more towards the conceptual aspects. Constraints used to determine semantic role assignment are of three basic kinds:

Syntactic Correspondence Constraints: The assignment of a semantic role must correspond to the syntactic compositions in order to ensure consistency between the syntactic and semantic interpretations of a sentence. Syntactic guidance to semantics is implemented in COMPERE via a set of correspondence constraints on role links (Figure 8.7). These constraints enforce tests on the local syntactic context to make sure that the syntactic structures in the context license the role attachment being proposed. These constraints include:

- Subject-p: Is the NP a syntactic subject NP (i.e., an NP that is the first child of an S or a Rel-S)?
- Object-p: Is the NP a syntactic object NP (i.e., is the NP the child of a VP)?
- Direct-Object-p: Is the NP a syntactic direct object NP (i.e., is the NP the last of the object NPs in the VP, that is, the first if there is only one and the second if there are two)?
- Indirect-Object-p: Is the NP a syntactic indirect object NP (i.e., is the NP the first of two NPs in the VP)?
- NP-Parent-p: Is the syntactic parent node an NP?
- VP-Parent-p: Is the syntactic parent node a VP?
- Co-agent-role-p: Is the parent an NP that already has an agent role? Note that this is a conjunction of an NP-Parent-p constraint and a check for the presence of a particular role.
- Co-theme-role-p: Is the parent an NP that already has a theme role? This too is the conjunction of an NP-Parent-p constraint and a check for the presence of a particular role.

Role Constraints: These are constraints from the "role grammar" whose "phrase-structure rules" say which roles must precede or follow which other roles in a parent role structure (Figure 8.7).

- MUST-PRECEDE constraint: Is the preceding child role of the existing parent role of the appropriate kind?
- MUST-FOLLOW constraint: Is this child role followed by a role of the appropriate kind?
- Event-Exists-p: Is there an event role that could take the role? This is a check for the presence of an event role. It helps delay the unwarranted attachment of a subject noun to an agent role before the verb is seen, for example.

Conceptual Constraints: These MUST-BE constraints are the selectional restrictions on role fillers. They insist that any role fillers must be meanings of a particular kind, such as an animate object or other classes of objects (Figure 8.6(b)).

- Satisfies-slot: Does the role filler meet any MUST-BE constraints on the fillers of the role as specified in the conceptual knowledge for the parent role's role filler (Figure 8.6(a))?
- Event-Affords-p: Does the event role accept the current role, and does the role filler satisfy any MUST-BE constraints? This is a combination of a role constraint and a Satisfies-slot constraint. Checking this constraint might actually involve several other role attachments if the child role is only indirectly attached to the event role before the Satisfies-slot constraints can be checked. For example, given a subject-NP role and an event role, one has to first specialize the subject-NP role to an active-subject role (Figure 8.7), and so on, before checking whether the word meaning of the NP satisfies any selectional MUST-BE constraints on the agent role for the event.
- Satisfies-role-slot: Does the role filler meet any constraints on role fillers as per semantic (role) knowledge? This constraint is not used in the current implementation of COMPERE. However, it allows one to generalize certain lexical knowledge. One can specify, using a Satisfies-role-slot constraint, a constraint for every filler of a particular role without having to repeat the information in the conceptual entry for every word whose meaning accepts the role.

Other Linguistic Constraints: Other linguistic cues provide additional constraints on role assignment. We will only deal with the voice of a sentence here. Similar constraints can also be imposed from the agreement of word inflections for number, person, gender, case, and so on across the different parts of a sentence.

- Active-voice-p: Is the clause in active voice?
- Passive-voice-p: Is the clause in passive voice?10

It may be noted that many of the constraints above take the role being processed, the child role or the parent role of the proposed attachment, as an argument and use it, as well as the corresponding syntactic structures, to access information from the local syntactic and role context. It may also be noted that the order in which the possible parent roles and the associated constraints for attaching to them are checked is important. Such an ordering is used to implement the negation of certain constraints. An example is in the constraints for specializing an active-subject role to a non-agent-subject role (Figure 8.7). An active-subject role should be specialized to a non-agent-subject role only when the current event does not afford this role becoming an agent role (i.e., only if the subject noun cannot be an agent role when the sentence is in active voice can it become an experiencer or an instrument role).

10. Passive voice is detected as follows: A clause is in passive voice if it has its verb in the participle form and has an auxiliary form of "be" ("is," "was," "were," and so on). A clause is also in passive voice if it is in the participle verb form without the auxiliary but is a relative clause without a relative pronoun. A clause is in active voice when it is not in passive voice.
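A few of the constraint tests listed above are easy to phrase over the syn-unit sketch introduced earlier; the predicates below are illustrative only and do not reproduce COMPERE's actual tests or argument conventions.

  (defun subject-np-p (np)
    "Syntactic correspondence test: is NP the first child of an S or a Rel-S?"
    (let ((parent (syn-unit-parent np)))
      (and parent
           (member (syn-unit-category parent) '(s rel-s))
           (eq np (first (syn-unit-children parent))))))

  (defun object-np-p (np)
    "Is NP a syntactic object NP, i.e., a child of a VP?"
    (let ((parent (syn-unit-parent np)))
      (and parent (eq (syn-unit-category parent) 'vp))))

  (defun passive-clause-p (verb-form has-be-auxiliary-p relative-clause-p has-relative-pronoun-p)
    "The detection rule of footnote 10: a participle verb with a 'be' auxiliary,
  or a participle in a relative clause that lacks a relative pronoun."
    (and (eq verb-form :participle)
         (or has-be-auxiliary-p
             (and relative-clause-p (not has-relative-pronoun-p)))))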
8.3.3 Arbitration
Having generated the semantic attachment proposals (or rejections, for that matter) corresponding to the syntactic attachment proposals, COMPERE now arbitrates between the different syntactic and semantic alternatives. Arbitration is really a task of combining information arising from disparate sources of knowledge. The arbitrator must combine the syntactic preferences for alternative interpretations arising from the syntactic knowledge sources with the semantic preferences originating from semantic and conceptual knowledge. Unless these distinct knowledge sources are always benign, the arbitrator must weigh the preferences against each other in order to select the alternative that is "best" overall. A fundamental problem in doing this is determining a projection from one scale of preferences to another. Only if the two scales are merged can the preferences be combined using arithmetic operations.11 How do we know, for instance, whether a level 2 preference in syntax is higher or lower than a level 1 in semantics? Since these preferences embody conceptual knowledge as well, such a projection could perhaps only be derived from analyses of a corpus of texts or of a domain and the application task in which sentence interpretation is situated. The arbitration algorithm presented below is a simpler one in which this problem of projecting one scale of preference to line it up with another is avoided altogether. As a result of avoiding having to combine the information explicitly, the arbitrator might appear to be slightly biased, favoring syntax a bit more than semantics, but this has not been a problem at all in COMPERE. The arbitration algorithm described below is also well suited to explaining the psychological data on ambiguity resolution and error recovery (Holbrook, Eiselt, and Mahesh, 1992; Mahesh and Eiselt, 1994; Stowe, 1991).

While mathematical approaches to arbitration using real-numbered preferences and various theories of probability have been proposed for integrating information from multiple knowledge sources, such precision has not been necessary for COMPERE to produce acceptable sentence processing behaviors. As noted previously, COMPERE's goal is not to match any individual body of psychological data point for point; its real purpose is to design and implement the computational machinery it takes to produce a variety of sentence processing behaviors. However, it is duly noted that the arbitration algorithm presented here could be an approximation to one using real-numbered preferences. A propitious aspect of using only discrete levels of preference is that COMPERE does not have to deal with the question of where those real numbers originate or how they could be acquired or learned.

11. It may be noted that certain connectionist networks do precisely this by having a merged single scale of activation levels to start with, so that arbitration is in a sense automatic.
It may also be noted here that the computational complexity of the algorithm below is not a concern of much import, since rarely do we come across examples where the number of either syntactic or semantic alternatives is more than a handful. It is also true that the algorithm could be implemented efficiently, by building sorted lists of attachment proposals incrementally for instance, if necessary.
8.3.3.1 Implementing the Arbitration Algorithm
A proposal complex here is a potential syntactic attachment, a syntactic preference for it, the corresponding semantic role attachments, and the semantic preferences for them. Such a proposal complex not only specifies how to attach a syntactic child unit to a parent unit, but also specifies one or more semantic role attachments, through intermediate roles if any, that correspond to the syntactic attachment. The given proposal complexes are first sorted in decreasing order of the preference levels of the syntactic attachments. They are then partitioned into equivalence classes where the members of a class have the same syntactic preference level. After this, each equivalence class is sorted in decreasing order using the highest preference for any one of the semantic attachment proposals associated with each complex. Using this doubly sorted sequence of proposal complexes, an equivalence class is picked that has the highest syntactic preference and a positive semantic preference. If there is no such class, then the first class (i.e., the one with the highest syntactic preference) is picked. Finally, from this equivalence class, only those proposal complexes are picked that have the highest semantic preference levels for any one of the semantic attachment proposals in them. In other words, we simply select the most preferred alternative that both syntax and semantics agree on. The arbitration algorithm is shown in Figure 8.12.

It may be noted that this algorithm implicitly allows delayed decisions. Whenever semantics has equal preferences (whether positive, zero, or negative), it allows the selection of all the alternatives with equal preference, so that they are all pursued until disambiguated later by new information. If semantics has a positive preference for at least one syntactic proposal, such a proposal is selected. If semantics has no preference for or against any syntactic attachment, the most preferred syntactic attachment is selected. If, on the other hand, semantics rejects all syntactic proposals, one or more of the most preferred syntactic proposals are still selected, thereby allowing syntax to override semantics to produce an interpretation instead of none at all. It may also be noted that when semantics has a negative preference for a highly preferred syntactic proposal and a positive preference for a syntactically less preferred proposal, the arbitration algorithm selects the latter proposal and goes ahead with it. Implications of this decision in terms of cognitive accuracy, as well as reasons for not selecting both (and delaying the decision) in such circumstances, are discussed in Chapter 10.
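Expressed in Common Lisp, the selection just described is a handful of lines. The sketch below assumes the record layout of Figure 8.12 (shown next) and is illustrative rather than a transcription of COMPERE's code; in particular, ties are kept rather than broken, which is what permits the delayed decisions discussed above.

  (defstruct proposal-complex
    nodes       ; the two syntactic nodes to be composed
    syn-pref    ; syntactic preference level
    roles       ; role pairs being semantically composed
    sem-prefs)  ; semantic preference levels for those role pairs

  (defun best-sem-pref (pc)
    (if (proposal-complex-sem-prefs pc)
        (reduce #'max (proposal-complex-sem-prefs pc))
        0))

  (defun arbitrate (proposals)
    "Select the proposal complexes to pursue: within the highest syntactic level
  that has some positive semantic preference (or the highest syntactic level
  outright, if semantics rejects everything), keep every proposal tied for the
  best semantic preference."
    (when proposals
      (let* ((sorted (sort (copy-list proposals) #'> :key #'proposal-complex-syn-pref))
             (classes (loop with classes = '()
                            for pc in sorted
                            if (and classes
                                    (= (proposal-complex-syn-pref pc)
                                       (proposal-complex-syn-pref (caar classes))))
                              do (push pc (car classes))
                            else
                              do (push (list pc) classes)
                            finally (return (nreverse classes))))
             (chosen (or (find-if (lambda (class)
                                    (some (lambda (pc) (plusp (best-sem-pref pc))) class))
                                  classes)
                         (first classes)))
             (best (loop for pc in chosen maximize (best-sem-pref pc))))
        (remove-if-not (lambda (pc) (= (best-sem-pref pc) best)) chosen))))

  ;; With the preferences of Figure 8.13 (both proposals: semantic preference 3),
  ;; ARBITRATE keeps the main-verb attachment (syntactic preference 6); with those
  ;; of Figure 8.14 (main verb: -1, reduced relative: 3), it keeps the reduced
  ;; relative attachment, as described in the examples that follow.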
Data Structures:
  Proposal Complex: a record with
    Nodes: the two nodes to be syntactically composed.
    Syn-Pref: the syntactic preference level.
    Roles: the role pairs being semantically composed.
    Sem-Pref: the semantic preference levels for the role pairs.

Algorithm Arbitrate:
  Given a set of feasible compositions {P}, where each P is a proposal complex:
    Sort the Ps in decreasing order of Syn-Pref;
    Partition the sorted list into equivalence classes E of Ps with equal Syn-Pref;
    For each partition E,
      Sort the Ps in E in decreasing order of the maximum Sem-Pref in a P;
    Select the first E with a positive maximum Sem-Pref;
    Else, if no such E exists, select the first partition E1 (the syntactically most preferred);
    Select from the chosen partition E those Ps that have the maximum Sem-Pref among the Ps in that E.

Figure 8.12: The Arbitration Algorithm.

(4) The officers taught at the academy were very demanding.

Consider, for example, sentence (4) above and the situation depicted in Figure 8.13, which shows the syntactic and semantic preferences at the ambiguity in composing the verb "taught" with the subject NP "The officers." It is clear that in this case there is a strong syntactic preference for the main-verb attachment. Since semantics has equal preferences for either attachment, arbitration is trivial in this case and the main-verb attachment is selected.

(3) The courses taught at the academy were very demanding.

Considering sentence (3) above, on the other hand, we see (Figure 8.14) that though the syntactic preferences remain the same as in the previous example (Figure 8.13), semantics now has a negative preference for the main-verb attachment (i.e., semantics rejects this attachment proposal, since "courses" cannot be the agents of "taught"). Thus, in this case, the reduced relative attachment is selected by the arbitrator.
8.3.4 Resolving Lexical Semantic Ambiguities
The above methods for proposing and selecting syntactic and role attachments explain how COMPERE resolves lexical category ambiguities and structural ambiguities. Apart from these syntactic ambiguities, the methods can also resolve ambiguities in role assignment (i.e., structural semantic ambiguities). However, they do not tell us how COMPERE resolves lexical semantic (or word sense) ambiguities. These ambiguities do not lend themselves to being construed as a choice among some form of attachments.12 As such, separate methods are necessary to resolve lexical semantic ambiguities. COMPERE has the following additional methods, which work in conjunction with the above selection methods to choose contextually appropriate word meanings.
Deactivate Meanings: This method is called whenever a role attachment is made. It deactivates any word meanings that are not compatible with the role attachment being made. Any meaning that violates one or more selectional preferences, such as conceptual MUST-BE constraints on role fillers, is taken out of the set of conceptual meanings for the structure being attached. These meanings are retained as alternative meanings for their potential recall during later error recovery processes.

Reactivate Retained Meanings: This method is applied during an error recovery operation. It reexamines any deactivated meaning in the new context and brings back any meaning that is now appropriate because the selectional constraint that eliminated it in the first place is not present in the new context. If there are new selectional constraints, on the other hand, this method might use the services of Deactivate Meanings to remove some of the previously active meanings. Thus, COMPERE has the capacity to switch conceptual meanings appropriately as the syntactic or role structure of the interpretation changes.

12. Though it is conceivable that in a semantic network or other spreading-activation model of word sense disambiguation, selecting between word senses can be construed as a choice between links (or paths) in the semantic network.

Figure 8.13: Arbitration: A Benign Situation. [For "The officers taught ...": Attachment Proposal 1, the main-verb attachment, has Syntactic Preference 6 and Semantic Preference 3; Attachment Proposal 2, the reduced relative attachment, has Syntactic Preference 1 and Semantic Preference 3. Diagrams not reproduced.]

Figure 8.14: Arbitration: A Conflict Situation. [For "The courses taught ...": Attachment Proposal 1, the main-verb attachment, has Syntactic Preference 6 and Semantic Preference -1; Attachment Proposal 2, the reduced relative attachment, has Syntactic Preference 1 and Semantic Preference 3. Diagrams not reproduced.]
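The two methods above are essentially set bookkeeping over a structure's active and retained meanings. A sketch follows; the two-list representation and the compatibility test passed in as a function are assumptions of this sketch.

  (defun deactivate-meanings (active retained compatible-p)
    "Move any active meaning that is incompatible with the role attachment just
  made (as judged by COMPATIBLE-P) into the retained set, so that it can be
  recalled during later error recovery. Returns the new active and retained sets."
    (values (remove-if-not compatible-p active)
            (append (remove-if compatible-p active) retained)))

  (defun reactivate-retained-meanings (active retained compatible-p)
    "On error recovery, bring back any retained meaning that the new context no
  longer rules out; meanings that still fail stay retained."
    (values (append active (remove-if-not compatible-p retained))
            (remove-if compatible-p retained)))

  ;; In the "bugs" example of the error recovery discussion below, recovery
  ;; changes the role of the subject noun to a theme role, so a call to
  ;; REACTIVATE-RETAINED-MEANINGS restores the microphone sense alongside the
  ;; insect sense.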
8.3.5 Retention and Elimination of Alternatives
In order to enable recovery from errors (as opposed to just reprocessing the input), COMPERE retains unselected alternative interpretations at every point. However, indiscriminate retention could soon become prohibitive, with hundreds of alternatives retained for some sentences. COMPERE uses a simple heuristic to decide when to retain an alternative and when it can safely discard an unselected proposal. The heuristic is: retain only accessible alternatives. Only alternatives for attachments that are currently accessible, by being on the right-hand boundary of a parse tree, are retained. As a subtree moves to the interior of a tree, the alternatives associated with nodes on that subtree are discarded. This greatly reduces the number of alternatives retained for error recovery. The rule is based on the hypothesis that not only is a right-hand boundary the only part accessible for making attachments, it is also the only part accessible for repairing the parse forest during error recovery. This hypothesis has not been contradicted by any sentence COMPERE has been tested with.
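The retention heuristic itself is a one-line filter once the accessible right-hand-boundary nodes are known. The pair representation of an alternative below is an assumption of this sketch.

  (defun prune-retained-alternatives (alternatives accessible-nodes)
    "Discard retained alternatives whose attachment site is no longer on an
  accessible right-hand boundary. Each alternative is assumed to be a cons of
  (site-node . proposal)."
    (remove-if-not (lambda (alternative)
                     (member (car alternative) accessible-nodes))
                   alternatives))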
8.3.6 Implementation of Error Recovery Algorithms
In its attempts to produce incremental interpretations and also to be deterministic to the extent possible, COMPERE sometimes commits errors in both syntactic and semantic interpretation.13 However, it has the ability to recover from errors in both syntax and semantics. There are several specific error recovery methods that it employs for reinterpreting the sentence without reprocessing the entire sentence. Though the essential mechanism for error recovery is obviously backtracking to a previously unselected alternative, COMPERE's error recovery methods are more than plain chronological backtracking. The alternatives are stored in the working memory, as already mentioned. Error recovery methods pick out the appropriate alternatives from this memory using the information available from the error situation, without having to search through unrelated alternatives. Error situations and the corresponding recovery methods can be of three basic kinds.

13. These errors do not occur because of incorrect algorithms or bugs in COMPERE's implementation. They are a result of the fact that COMPERE makes its incremental decisions based on the information it has access to at the time of the decision, but sometimes information available later in processing a sentence proves a previous decision incorrect.

8.3.6.1 Composition Failure

When a new word meaning cannot be composed with the existing interpretations either syntactically or semantically, we have a composition failure. Assuming that the sentence itself is always correct, COMPERE looks at its alternatives to see if reinterpreting the initial part of the sentence using one of the retained alternatives results in a new interpretation with which the new word meaning can be composed successfully. Since the new structure that could not be composed is available in this type of error, COMPERE searches the memory of alternatives for precisely one that involves an alternative composition of a previous structure of the same kind. The reasoning behind this heuristic is that perhaps the previous structure has taken away the place which should have been left open for attaching the new structure. This heuristic search process is not the same as either chronological or dependency-directed backtracking. It does not go back and reexamine every decision that was made in processing the sentence. Furthermore, although it resembles dependency-directed backtracking, it is not dependency directed, since there is no explicit knowledge of a causal dependency between the previous decision and the present error.
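The heuristic search over retained alternatives can be pictured as a lookup keyed on the category of the structure that failed to compose. The plist encoding of an alternative below is illustrative, not COMPERE's representation.

  (defun find-recovery-alternative (failed-category retained-alternatives)
    "Find a retained alternative that re-attaches a previous structure of
  FAILED-CATEGORY, on the hunch that that structure occupies the place the new
  structure needs."
    (find failed-category retained-alternatives
          :key (lambda (alternative) (getf alternative :moved-category))))

  ;; For sentence (5) below, the failed structure is the VP built around "were",
  ;; so the search returns the alternative that re-attaches the previous VP:
  ;; (find-recovery-alternative 'vp
  ;;   '((:moved-category pp :proposal attach-pp-to-s)
  ;;     (:moved-category vp :proposal attach-previous-vp-to-rel-s)
  ;;     (:moved-category np :proposal attach-subject-np-to-rel-s)))
  ;; => (:MOVED-CATEGORY VP :PROPOSAL ATTACH-PREVIOUS-VP-TO-REL-S)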
(5) The bugs moved into the lounge were found quickly.

For example, consider sentence (5) above and the error situation upon reading the word "were" shown in the top half of Figure 8.15. This is a composition failure: there is no way to compose the meaning of "were" with the previous part of the sentence. At this point, there are three retained syntactic alternatives (shown in the figure). COMPERE does not examine or pursue each one of these. From the present error situation, it knows that the error occurred because of a lack of a place to attach the new VP. Hence it looks for alternatives for attaching previous VPs, and thus selects alternative 2 (in Figure 8.15). COMPERE then repairs the parse forest as follows:

- It removes the existing VP-to-S connection.
- It detaches the corresponding role assignments and removes the corresponding intermediate roles. For example, it removes the Agent role for the subject noun and retracts the role of the subject noun back to the Subject-NP role.
- It "unpropagates" meanings. For example, it removes the move meaning and its event role from the S node and makes them point back to the now unattached VP node.
- It reattaches the VP through the relative clause structure suggested by the retained alternative proposal.
- It performs the corresponding role assignments. For example, the subject noun now gets a theme role in the move event.
- It then attaches the new VP (with "were") to the now open, expected position in S.
- Finally, it performs the role assignments suggested by the new attachment. For example, S now has a new find event in which the subject noun has a theme role.

The resulting correct interpretation is shown in the bottom half of Figure 8.15. It may be noted here that the error recovery process described above involved only repair and reinterpretation, without a complete reprocessing of the input sentence. For instance, the prepositional phrase and its role in its parent VP were never modified in any manner during this recovery.

Figure 8.15: Error Recovery: Composition Failure. [Top half: the error situation; "were" cannot be composed with the rest; the retained alternatives are 1. Attach PP to S, 2. Attach previous VP to a Rel-S, and 3. Attach subject NP to a Rel-S. Bottom half: successful composition after Retained Alternative 2 was selected to repair the tree. Diagrams not reproduced.]
8.3.6.2 Incompleteness Failure
When a syntactic or semantic structure remains incomplete (at the end of the sentence, for example, when there are no more words to be processed), the result is an incompleteness failure. In such a situation, COMPERE analyzes the required units for the incomplete sentence or phrase to be completed, which are also the expected units at that point, and picks out alternatives from its working memory that involve compositions of structures which are still on the right boundary, and hence are accessible at this time, and that are of the same type as the required unit(s). COMPERE explores such alternatives to see if reinterpreting them removes the error by producing complete structures. Here again, the same heuristic is employed to search the retained alternatives, although the error situation is different from the previous one.
(6) The children taught at the academy.

For example, consider sentence (6) above. As shown in the top half of Figure 8.16, COMPERE has pursued a reduced relative interpretation until the end of the sentence, since there is a conceptual constraint in teach that does not allow "children" to fill an agent role in it. In this interpretation, "children" fill an experiencer role in the event. However, at the end of the sentence there is an incompleteness error, since the sentence structure is incomplete: the S structure is still waiting for its head VP. The alternatives retained at this point are shown in the figure. COMPERE selects among these and repairs the structure to obtain a complete S structure as follows:

- It identifies the structures on the right-hand boundary that are incomplete and gathers their expected units (i.e., the required units which would complete them).
- Using a search similar to the one described in the case of composition failure above, it selects a retained alternative that would release a previous structure so that it could meet the expectation and complete the incomplete structures.
- It repairs the parse forest by removing the previous attachment and making the attachment suggested in the retained alternative.
- It performs appropriate role assignment operations to reflect the changes made during repair.

The resulting complete structure is shown in the bottom half of Figure 8.16. In this interpretation, the "children" fill an agent role in the teach event. It may be noted that the error occurred in the first place because of a conceptual selectional constraint that said "children" cannot fill the agent role of the event. As such, COMPERE must discount this constraint during the repair process, for otherwise the same constraint would prevent it from building the complete structure. COMPERE accomplishes this by telling its arbitrator to disregard semantic constraints and to go ahead considering only syntactic preferences. Such a change in strategy, necessary to model the type of error recovery behavior illustrated in this example, is possible in COMPERE because its architecture has a unified arbitrator. The arbitrator provides the necessary control over syntax-semantics interaction to enable recovery from such conflicts and errors. Once again, error recovery was accomplished through repair and reinterpretation, not reprocessing of the input sentence.

Figure 8.16: Error Recovery: Incompleteness Failure. [Top half: the error situation; the expected VP is never seen; the retained alternatives are 1. Attach PP to Rel-S, 2. Attach PP to NP, 3. Attach VP to S, and 4. Attach NP to a Rel-S. Bottom half: successful completion after Retained Alternative 3 was selected to repair the tree. Diagrams not reproduced.]
8.3.6.3 Recovery Induced Errors
It is also possible that recovering from one error might cause a decision made previously to be revoked or altered. For instance, reinterpreting a syntactic structure might annul a previous word sense disambiguation. In sentence (5) above (Figure 8.15), for example, at the time of the composition failure the word "bugs" had been disambiguated to mean only an insect, the microphone meaning being deactivated and retained as an alternative. After the repair and the recovery from the error, however, there is no longer a reason to reject the latter meaning of "bugs." COMPERE reexamines the active and deactivated meanings of the structures involved in error recovery, which are also the structures whose roles are reassigned, to see their current status under the new constraints coming from the new role assignments. This process is done concurrently with the role (re)assignment process to ensure consistency between the syntactic and semantic interpretations at all times. For example, in the "bugs" example, it turns out that the insect meaning is still valid in the new interpretation. However, the microphone meaning is also compatible with the new role assignment, where the "bugs" are now assigned a theme role instead of the original agent role in the move event. The microphone meaning is therefore brought back from the retained state to the active state. Thus, a previous decision which had resolved the word sense ambiguity is now revoked and the word sense ambiguity is "unresolved" as a result of the error recovery process.14 Having a unified arbitration process helps once again by providing the control over the interactions between syntax and semantics needed to ensure that the two are always consistent.

Though COMPERE is capable of recovering from the types of errors described above, the program (i.e., its "state") does not change in any manner after it encounters an error and recovers from it. For example, it behaves the same way over and over again, making the same errors and recovering from them, when presented with a sequence of sentences that lead the processor to similar errors. It appears that COMPERE differs from human sentence processors in this respect, since the human processor appears to be at least aware of difficulties encountered in recovering from certain errors. However, in order to model such an "awareness" in a program, having a single process acting as a unified arbitrator is a definite advantage: the single process can keep a record of the additional decisions and arbitrations it had to perform in order to recover from the error. Such records can be used to change the behavior of the program in subsequent processing to model the psychological effects of encountering and recovering from errors.

14. Resolving an ambiguity here refers to the operation of selecting a proper subset from the set of possible interpretations at an ambiguity. Consequently, the term "unresolving" may be used to refer to the operation of switching back to the bigger set of possible interpretations from the previously selected subset.
8.4 COMPERE, the Program

COMPERE, the computational model of sentence understanding, has been implemented and tested with a variety of sentences containing syntactic and semantic ambiguities. The program has been written in Common Lisp with the Common Lisp Object System (CLOS). A graphical user interface to COMPERE has also been developed using the Common Lisp Interface Manager (CLIM). The interface allows the user to run COMPERE on a sentence either by typing in a new sentence or by selecting one from a set of sentences off a menu. While running, COMPERE displays the words being processed, the current syntactic output (i.e., parse trees), and the current semantic output (role bindings shown in the intermediate-role trees) in separate windows. A key feature of the interface is that it allows the user to stop COMPERE after each word. This allows a clear graphical demonstration of incremental processing capabilities, garden-path effects, and error recoveries. Of course, COMPERE can also be run without the graphical interface. COMPERE's implementation is very portable; it can be run on any computing platform that supports Common Lisp.

The program is designed so that syntactic processing can be separated from semantic processing. By setting a single flag, COMPERE can be made to run as a syntactic parser only, or as a semantic processor that interprets directly from the lexical entries of words without depending on a syntactic parse of the input sentence. This feature allows one to see clearly the functional independence between syntax and semantics in COMPERE.

COMPERE's program is about 9000 lines of Common Lisp code, including a grammar, a small lexicon, and a limited amount of semantic knowledge. The lexicon has now been extended to include over 600 English words. COMPERE's grammar has 31 categories (or nonterminals) with 1 to 8 "rules" per category, covering a variety of phrase, clause, and sentence structures including prepositional phrases, relative clauses, and so on. COMPERE's semantics has 34 intermediate roles with 1 to 7 "rules" per role. Its conceptual knowledge has over 500 concepts in several hierarchies. COMPERE runs fairly fast, taking less than a second per word on average.
COMPERE has been tested with 20 to 25 types of sentences with various syntactic constructs and various combinations of syntactic and semantic ambiguities, both lexical and structural. COMPERE has also been integrated with the ISAAC story understanding system and tested with a couple of complete stories, which are short science fiction stories (Moorman and Ram, 1994). In addition, COMPERE has been partially integrated with a design comprehension system called KA (Mahesh, Peterson, Goel, and Eiselt, 1994; Peterson, Mahesh, and Goel, 1994) and tested with several sentences from texts describing the design of simple physical devices.
CHAPTER IX

PERFORMANCE ANALYSIS AND EVALUATION

A wide range of language-processing strategies was employed by the top-scoring systems, indicating that many natural language-processing techniques provide a viable foundation for sophisticated text analysis. Further evaluation is needed to produce a more detailed assessment of the relative merits of specific technologies and establish true performance limits for automated information extraction.

W. Lehnert and B. Sundheim, 1991

In this chapter, we present a comparative analysis of the performance of COMPERE as a sentence processor. First, COMPERE is shown to produce the desired interpretations for a variety of sentences. In order to show this, we illustrate the feasibility of COMPERE as a sentence processor with a small set of sentences that covers a range of phenomena in sentence interpretation and show that COMPERE in fact adheres to the functional and cognitive principles it was based on while processing the sentences. After this, we perform comparative analyses of COMPERE with other models of sentence interpretation. Finally, we provide a formal analysis of COMPERE as an automaton. This view of COMPERE gives us the formal machinery to talk about the empirical factors involved in an analysis of the performance of a sentence interpreter vis-a-vis many features of its design. The formal analysis will elicit the tradeoffs inherent in the choices made in the design of a sentence interpreter and enable us to compare the performance of different sentence interpreters as well as attribute the tradeoffs to features of their design.
9.1 Validation of the COMPERE Program

The first issue in evaluating the claims made in this thesis is to establish the feasibility of the sentence processing model by showing that the COMPERE system, as described in Chapter 8 and implemented in the COMPERE program, can in fact interpret a variety of sentences producing the desired behaviors. For this purpose, we present COMPERE's behaviors for a set of interesting sentences and analyze those behaviors, showing that COMPERE produced them while adhering to the functional and cognitive constraints on the model outlined in Chapter 2.
9.1.1 Simple Sentences

(1) The officers taught at the academy.

COMPERE's output for this simple sentence with an unambiguous prepositional adjunct is shown in Figure 9.1.

Figure 9.1: A Simple Sentence. [Parse tree with the role bindings teach: AGENT: officer, LOCATION: academy. Diagram not reproduced.]

(2) The officers were at the academy.

This sentence has an Auxiliary/Verb ambiguity as a result of the lexical syntactic ambiguity in the word "were," whose category can be either an auxiliary or a verb. This local ambiguity is resolved at the next word, which starts a new prepositional phrase after the VP. The output for this sentence is shown in Figure 9.2.

Figure 9.2: An AUX/V Ambiguity Sentence. [Parse tree with the role bindings be: AGENT: officer, LOCATION: academy. Diagram not reproduced.]
(3) The officers were taught at the academy.
This sentence is similar to the previous one except that the local category ambiguity is resolved here in favor of the auxiliary when the following verb "taught" is processed. The output is shown in Figure 9.3. In all the examples above, "academy" is assigned to a location role without requiring the verb meaning (i.e., the event) to subcategorize the optional location role. It may also be noted that the sentence is in passive voice and COMPERE was able to correctly assign the appropriate thematic role to the subject noun. This was made possible by the intermediate roles (not shown in the figures in this chapter) without having to explicitly perform a passive transformation in syntax.
9.1.2 Relative Clauses
(4) The officers taught at the academy were very demanding.
This sentence is a syntactic garden-path reduced relative. It has a subcategory ambiguity in "taught," which can be either the simple past or the past participle. As a result, there is a structural ambiguity between the main-verb and the reduced-relative structures. COMPERE resolves this ambiguity in favor of the main-verb structure, garden-paths as a result, and recovers from the error to assign the final structure shown in Figure 9.4. The intermediate interpretations are not shown in this figure; however, the intermediate results are the same as for sentence (1) above.
Figure 9.2: An AUX/V Ambiguity Sentence.
(5) The officers who taught at the academy were very demanding.
This sentence differs from the previous one in that it has an explicit marker, the relative pronoun "who," to signal the relative clause. As such, there is no ambiguity and no error committed in processing this sentence. However, this sentence, along with the other relative-clause sentences, shows COMPERE's ability to deal with non-trivial syntax (i.e., structures more complex than a simple subject-verb-object structure). COMPERE can correctly assign appropriate thematic roles to the subject noun in both the main and the relative clauses. COMPERE's output for this sentence is shown in Figure 9.5.
(6) The officers who were taught at the academy were very demanding.
In this variation, the relative clause is in the passive voice, pushing the role of the subject noun in the relative clause (in the teach event) to an experiencer role. The corresponding output structures are shown in Figure 9.6.
(7) The officers who taught the men at the academy were very demanding.
In this variation, the relative clause is transitive and back in active voice, and it has a direct object, which was left out in all the previous examples. The resulting structures are shown in Figure 9.7.
Figure 9.3: A Passive-Voice Sentence.
(8) The courses taught at the academy were very demanding.
This sentence looks very similar to sentence (4) but results in very different behaviors. A semantic bias, resulting from the conceptual knowledge about courses and teaching regarding the (in)animacy of courses, acts against the main-verb interpretation. Because of this semantic effect of the selectional preference, COMPERE directly pursues the syntactic alternative of a reduced relative clause, as shown in Figure 9.8. There is no error as a result; the garden path is avoided, leading to the final interpretation in Figure 9.9.
(9) The officers the man taught were very demanding.
To further illustrate COMPERE's ability to deal with the nuances of syntax, this sentence shows a subject NP in the reduced relative clause with its gap in the direct object position. In other words, this is a center-embedded sentence with one level of embedding. COMPERE's correct interpretations are shown in Figure 9.10.
(10) The bugs moved into the new lounge were found quickly.
This sentence has both a lexical semantic and a structural ambiguity. The structural ambiguity is of the same kind as in the previous examples (a main-verb/reduced-relative ambiguity, as in sentence (4), arising from the subcategory ambiguity in "moved"). The lexical ambiguity is a semantic one: "bugs" can mean either an insect or a microphone. In addition, resolving one resolves the other ambiguity. This interaction can be seen in the interpretations shown in Figures 9.11 and 9.12. In Figure 9.11, COMPERE has selected the main-verb interpretation and, as a consequence of the selectional preferences of move, deactivated the microphone meaning of "bugs." In Figure 9.12, COMPERE has recovered from the error, switching to the reduced-relative interpretation, and reintroduced the microphone meaning since the new thematic role for "bugs" does not have the restrictive selectional constraint. This sentence serves to show how COMPERE can deal with the interactions between syntactic and semantic decisions in a sentence that has multiple ambiguities.
Figure 9.4: A Reduced-Relative Garden-Path Sentence.
(11) The children taught at the academy.
This simple sentence has an interesting anomaly.1 We used the selectional preference and the resulting semantic bias to explain how COMPERE avoids the garden path in sentence (8) above. The same selectional preference is violated in this example. Syntax overrides the semantic bias and forces "children" to be the agent of teaching. COMPERE pursues the reduced-relative interpretation initially, since at that point there is a semantic bias for that interpretation. However, at the end of the sentence, the prevailing syntactic structure is unacceptable; an incomplete S structure is still waiting to see the main verb. This forces COMPERE to carry out an error recovery and reinterpret the sentence by discounting the semantic bias.2
1 Let us assume that there is a semantic bias against children being the agents of teaching, i.e., only adult, animate entities can be teachers.
Figure 9.5: A Relative-Clause Sentence.
9.1.3 Structural Ambiguities
We have already seen one type of structural ambiguity which results from an underlying lexical ambiguity. In order to cover COMPERE's abilities to deal with structural ambiguities in general, we present a few other examples.
9.1.3.1 PP Attachment Ambiguities
(12) I saw the man with the horse.
This sentence has a purely structural ambiguity in that the PP "with the horse" can modify either the NP "the man" or the verb "saw." In this example, however, only the NP attachment is permissible given COMPERE's selectional preferences from its conceptual knowledge of see and horse. As a result, COMPERE resolves the ambiguity and produces the interpretation shown in Figure 9.14.3
2 See Chapter 10 for a discussion of the psychological aspects of COMPERE's behavior for this sentence and other ways of dealing with this anomaly.
3 There is another possible interpretation where the PP modifies the S node and, as a result, "the horse" is the co-agent of seeing along with "I." We ignore this possibility since it is common to this and the next example, which we are trying to compare and contrast against each other. If we include the S attachment, the resulting ambiguity would be a global one and cannot be resolved by COMPERE in the null context in which it processes these sentences.
Figure 9.6: Another Relative-Clause Sentence.
(13) I saw the man with the telescope.
This is similar to the previous sentence except that the PP-attachment ambiguity is a global one in this sentence, unlike in the previous one, where it was just a local ambiguity until "horse" was seen. As a consequence, COMPERE produces both possible interpretations of this sentence, with the instrument role as well as the co-theme role. The results are shown in Figure 9.15. The intermediate roles, not shown in the figure, provide an unambiguous representation of the correspondence between the two possible syntactic attachments and the two role assignments.
9.1.3.2 Phrase-Boundary Ambiguities
Another type of structural ambiguity results in a phrase-boundary ambiguity (i.e., an ambiguity in whether one or more words belong at the end of the previous phrase or at the start of a new phrase). Here too there is no underlying lexical ambiguity. This is illustrated in the next two examples where an adjective can modify either the preceding verb or the following noun in the direct object NP. COMPERE deals with this ambiguity by implicitly delaying the decision until the next word disambiguates the situation. The delay is implicit because COMPERE does not choose between delaying or not delaying the decision. It merely considers both possibilities and
makes all attachments that are permissible in the current context. In doing so, it turns out that the adjective can be attached to the verb but can also be the left corner of a new noun phrase. Since the two choices do not lead immediately to a composition with the same preceding phrase, COMPERE does not have to make a selection between the two immediately. Moreover, it does not have any information at this point to make the selection. Hence COMPERE simply pursues both possibilities and resolves the ambiguity at a later time, when the necessary information is available.
Figure 9.7: Yet Another Relative-Clause Sentence.
(14) The man was demanding.
In this example, the local ambiguity is resolved in favor of the VP attachment since there is no noun following the adjective. The resulting interpretation is shown in Figure 9.16.
(15) The man taught demanding courses.
In this case, however, the ambiguity must be resolved in favor of the NP attachment for the adjective and the VP attachment must be removed. The resulting output is shown in Figure 9.17. These examples serve to show that COMPERE has the repertoire of abilities to deal with the variety of ambiguities and the interactions between them. It can make early commitments, it can recover from its errors and switch as well as repair interpretations, and it can also delay a decision and pursue multiple interpretations in parallel.
Figure 9.8: A Reduced-Relative With Semantic Bias.
9.1.4 Multiple Ambiguities: A Challenge
(16) The large can can hold the water.
As one final example, consider the sentence above, which has multiple ambiguities. The word "can" has a three-way lexical category ambiguity since it can be either a noun, an auxiliary, or a verb. Similarly, "hold" can be either a noun or a verb.4 This sequence of three ambiguous words results in a plethora of syntactic possibilities. However, amazingly, all the ambiguities are local and, finally, there is only one correct interpretation, which COMPERE produces. This is the one shown in Figure 9.18.
9.1.5 Claims Revisited
We will now summarize briefly how the above illustrations show that COMPERE meets its claims by demonstrating that its theory of sentence interpretation is feasible given the set of functional and cognitive constraints it is based on. It must be noted, however, that there is no reason to believe that COMPERE's theory is applicable only to the set of sentences above. Before we discuss each of the claims, we state briefly the breadth of COMPERE's coverage of sentences in terms of both syntax and semantics.
4 It is also possible that "large" is a noun instead of an adjective. We ignore this additional possibility in this illustration.
Figure 9.9: The Complete Sentence With Semantic Bias.
9.1.5.1 Syntactic Coverage
The sentences above exhibit a variety of syntactic features such as relative clauses (sentences (4) through (10)), prepositional phrases (sentences (1) through (8), (12), and (13)), passive forms (sentences (3), (4), (6), (8), and (10)), category ambiguities such as AUX/V ambiguities (sentences (2) through (8), and (16)), structural ambiguities such as main-verb/reduced-relative ambiguities (sentences (1), (4), and (8)), PP-attachment ambiguities (sentences (12) and (13)), phrase-boundary ambiguities (sentences (14) and (15)), and transitive and intransitive verb structures (sentences (1) through (8)). Since it is possible for COMPERE's program to analyze the syntax of these sentences and produce the parse trees shown above, we can say that the representation and processing method we have proposed work for the syntactic analysis of a large subset of English sentences. Moreover, COMPERE not only analyzes the syntax of these sentences and resolves the syntactic ambiguities therein, but it can also recover from errors and correct the syntactic structures it has built, either by switching to a retained interpretation (sentences (4) and (10)) or by repairing the structure (sentence (11)).
Figure 9.10: A Center-Embedded Sentence.
9.1.5.2 Semantic Coverage
The sentences above show COMPERE's ability to assign appropriate thematic roles in a variety of sentences. In particular, the examples show the effects of syntactic structure (sentences (1) through (16)), word meanings and selectional preferences such as those due to animacy (sentences (4), (8), (10), (12), and (13)), word subcategorizations (sentences (4), (8), and (10)), passive voice (sentences (3), (4), (6), (8), and (10)), and embedded clauses and gaps (sentences (4) through (10)) on the assignment of thematic roles. The above examples show the part played by each of these kinds of information in role assignment. Roles can be said to emerge from all these kinds of information, which span a structure-to-meaning spectrum starting from surface syntactic form, through semantic role structures, to conceptual representations. The roles played by the parts of a sentence in the events it describes start out as primitive roles, such as the NP role, originating from the lexical categories and items, and then become more specialized as each new kind of information brings some evidence to bear, resulting finally in the thematic roles shown in the figures above. We now consider each of the four claims made by COMPERE and state how COMPERE's program meets each claim while providing the wide syntactic and semantic coverage noted above.
9.1.5.3 Claim 1: Integrated Processing with Independence
COMPERE has separate representations of syntactic and semantic knowledge which can be applied independently of each other. However, it is an integrated processor since it combines both types of knowledge as soon as it can. This is illustrated in the above examples by the immediate effects of syntactic decisions on semantic decisions and vice versa. The above sentences show how syntax helps in semantic processing (sentences (1) through (16), and especially sentences (10) and (11)), as well as how semantic constraints help resolve syntactic ambiguities (especially sentences (8), (11), and (12)). The program can process the sentences and produce the correct output shown above, both at the end of the sentence and incrementally after each word, by combining syntactic and semantic knowledge that are represented independently of each other. As such, the program has demonstrated that COMPERE meets the claim that a sentence processor can be an integrated processor, complying with the integrated-processing principle, while retaining functional independence between different types of knowledge.
Figure 9.11: Resolving a Structural Ambiguity Also Resolves a Lexical Ambiguity.
9.1.5.4 Claim 1a: Functional Independence
How do we know that syntactic and semantic knowledge are in fact functionally independent of each other in COMPERE? We must show that each can be applied whether or not the other is available in a particular situation. This can easily be done in the COMPERE program through simple lesions performed by setting software switches that determine whether a type of knowledge, and the corresponding processing, is applied or not. Moreover, COMPERE also has the ability to disregard a type of knowledge when a situation requires it. For instance, in recovering from the error at the end of sentence (11), COMPERE deliberately discounts the semantic constraint against making "children" the subject of "taught." This is a clear demonstration of the functional independence between the two types of knowledge. COMPERE's program has been designed so that we can temporarily make one or more kinds of knowledge unavailable to see if the other kinds can still be applied to make the best decisions with the available knowledge. It may be noted here that functional independence is not a claim new to our theory; we have simply maintained it in our model.
Figure 9.12: Recovering from the Garden Path "Unresolves" the Lexical Ambiguity.
9.1.5.5 Claim 2: Synchronizing Syntactic-Semantic Compositions
This claim can only be validated by showing that synchronizing the points at which syntactic and semantic compositions are performed does minimize the total cost of sentence interpretation. This will be done in the section below on the formal analysis of the performance of sentence interpreters.
Figure 9.13: Syntax Violates Semantic Bias.
9.1.5.6 Claim 3: Error Recovery
Syntactic error recovery in structural ambiguity resolution is exemplified in sentence (4). This example shows that COMPERE can make the structural changes necessary for syntactic error recovery as well as make the corresponding changes in semantics, so that the incremental interpretations are internally consistent at all times. Sentence (10) shows that COMPERE is also capable of recovering from semantic errors. The decision to deactivate the microphone meaning was an error in this case; COMPERE recovered from this error and brought the deactivated meaning back into the interpretation. As stated earlier, this example also has a structural ambiguity that leads to an error recovery in syntax similar to the one in sentence (4). COMPERE is able to carry out both error recoveries and account for the interactions between the two.
9.1.5.7 Claim 4: Syntax-Semantics Interaction through Arbitrator
COMPERE employs a single arbitrating algorithm to control the interaction between syntax and semantics and to resolve any conflicts between their preferences. The above examples show not only that such an architecture supports interaction rich enough to enable immediate influence of semantic decisions on syntactic processing and vice versa, but also that it provides sufficient control over the interaction to account for the error recovery phenomena illustrated in sentences (4), (10), and (11). While this is evidence that COMPERE's arbitration architecture is a sufficient model of syntax-semantics interaction in sentence interpretation, the next section will argue for the necessity of the architecture by comparing it to other possible architectures in light of the above examples. It may be emphasized here that these claims form a coherent set and together define the COMPERE model as we conceived of it to begin with. While the COMPERE program needs to be "engineered" to make it a usable software product (for example, in the ways enumerated in Chapter 10), we believe that the groundwork we have laid will enable one to easily carry out such engineering to scale the model up.
Figure 9.14: An Unambiguous PP Attachment.
9.2 Comparative Analysis with Other Architectures
In this section we reconsider the different architectures described in Chapter 5 and show why they cannot produce some of the behaviors that we saw in the examples above.
9.2.1 Sequential Architectures
It is fairly obvious why sequential architectures cannot produce some of the outputs shown in the above examples, especially when intermediate outputs are considered. For example, a sequential syntax-first architecture (Figure 9.19, reproduced here from Chapter 5) could never produce the incremental outputs for sentences such as (8) and (10). Even when the grain size of interaction is reduced, say to a word-by-word level, in a cascaded architecture, the interaction is still one-way.
For instance, a sequential syntax-first model could never show the effects of semantic decisions on syntactic processing, and vice versa for a sequential semantics-first model. The only way around this is to transfer some of the syntactic decision making to the second module, semantics, or to transfer some of the decision making in semantics to yet a third module. By doing that, processing in the second module can have an effect on the decisions originally made in the first module in a pure sequential model. In any case, the last module, which makes all the decisions, will start looking more and more like an arbitrator as more power is transferred to it.
Figure 9.15: A PP-Attachment Ambiguity.
9.2.1.1 Semantics-First Architecture
A sequential semantics-first architecture would be able to produce the above outputs, just as any other exhaustive-search algorithm would, but not the same behaviors. A semantics-first approach, such as the MOPTRANS Parser shown in Figure 9.20 (Lytinen, 1987), attempts to make semantic compositions early and defers the application of syntactic knowledge until after the best semantic "connection" is chosen. As a result, it explores many syntactically impossible connections. For example, even in a simple sentence in passive voice such as sentence (3), a semantics-first sequential parser would first consider the possibility that the subject noun is the agent of the verb and only later realize that such an interpretation is not licensed by syntax. After searching through other possible semantic connections, it finally arrives at the correct interpretation shown above. Furthermore, in more complex examples such as sentence (10), it is not even clear that a semantics-first parser would produce the garden-path behavior. Even if choosing the "best" semantic connection in Figure 9.20 is interpreted as "choose the connection that resolves lexical semantic ambiguities the most," it does not always guarantee the strong garden-path effect of syntactic preferences seen in sentences such as (4) and (10) (for example, when more word meanings are compatible with the main-clause interpretation than with the reduced-relative interpretation). It is also not clear how a semantics-driven approach could show the syntax-driven error detection and recovery effects seen in sentences such as (4), (10), and (11). Consider sentence (11), for instance. According to the semantics-first algorithm in Figure 9.20, the error with the reduced-relative interpretation is never even detected, let alone corrected, at the end of the sentence. This is a direct result of the subservient role of syntax, which can only act as a verifier.
One might be concerned that COMPERE suffers from a similar weakness, and say that semantics is acting as a verifier in COMPERE (Figure 9.21). However, this is not a valid criticism, for two reasons. First, semantics in COMPERE can propose connections independently. It is given an opportunity to do so only when syntax fails to propose any connections, for otherwise semantics, lacking the guidance of syntax, would propose many unacceptable connections and lead to the same wastefulness as in the semantics-first approach analyzed above. Second, there is evidence to show that when there is a real conflict between syntactic and semantic preferences, it is syntax that dominates and overrides semantics. This evidence comes from psychological experiments such as those of Stowe (1991) as well as from purely functional reasoning (along the lines of Chapter 4). From a functional point of view, a natural language must allow us to communicate new semantic connections between word meanings in order to convey new ideas. As a consequence, a natural language must provide a mechanism, namely syntax, that can mark the sentence such that our prior knowledge and preferences for semantic connections can be overridden and the new meaning conveyed to the receiver (Crain and Steedman, 1985). On the other hand, since the syntax of a natural language presumably covers any construct that one may need to convey any known or new meaning, there is no functional motivation for semantics to override syntax.5
A sequential architecture, whether syntax-first or semantics-first, explores the space of possible interpretations as dictated by only one source of information (either syntax only or semantics only). As such, whenever there is more than one syntactic possibility and only one of them is semantically acceptable, or vice versa, sequential parsers waste a lot of effort exploring alternatives that are unacceptable given the knowledge that they consider later on in processing. COMPERE (Figure 9.21), on the other hand, considers all available knowledge before arbitrating to make a decision.
5 One might say that ungrammatical sentences (or non-sentences) are examples where one might want semantics to override syntax. However, sentences that are ungrammatical are by definition syntactically unacceptable. I doubt that there is a syntactically acceptable sentence where the intended meaning involves a semantic connection that is incompatible with the accepted syntactic interpretation. An implausible example (due to Woods, personal communication) is "I saw the man with the telescope that you asked me to find," where the intended meaning requires the PP "with the telescope" to modify the verb "saw" and the following relative clause "that you asked me to find" to modify the object noun "the man." It is questionable whether such usage, which entails syntactic links criss-crossing each other in the parse tree, is permitted in English.
Figure 9.16: ADJ Attachment Ambiguity.
Figure 9.17: A Local Phrase-Boundary Ambiguity.
Figure 9.18: Multiple Lexical-Category Ambiguities.
9.2.2 Integrated Architectures
Integrated architectures have trouble accounting for the variety of syntactic structures, such as the variety of relative clauses, shown in sentences (1) through (16). In particular, they are unable to show any effect that requires the processor to separate and disregard one type of knowledge and use another (e.g., sentence (11)).
9.2.3 Uncontrolled Parallel Architectures
In an uncontrolled parallel architecture (including the one with a translator, or a blackboard architecture as in Figure 5.3 in Chapter 5), difficulties will be encountered whenever the desired behavior requires some control of the interaction between syntax and semantics. Such control may be needed to ensure consistency between syntactic and semantic interpretations during error recovery (sentences (4) and (10)). Control is also necessary to arbitrate and use one type of information in preference to another (sentence (11)).
Thus, we see that syntax and semantics (i) cannot be arranged sequentially in either order, (ii) cannot be integrated completely by integrating their knowledge representations a priori, (iii) must run together in parallel, and (iv) must have an arbitrator controlling their interactions to produce the right interpretation and the right behavior while producing intermediate outputs. We now turn to the third part of this chapter, where we present a formal analysis of a sentence processor as an automaton to address Claim 2 above.
Figure 9.19: Sequential Architecture.
9.3 Formal Analysis of COMPERE
There have been many previous analyses of parsers as push-down automata of various kinds (Johnson-Laird, 1983; Resnik, 1992; Aho and Ullman, 1972). These formal models provide certain measures, such as the stack size, that enable one to compare and formally evaluate different parsing algorithms. However, one debilitating feature of these models is that their measures, such as the stack size, take only the syntactic complexity of language into account, without regard to meaning or to the complexity associated with ambiguities in meanings. Worse yet, some analyses do not even consider ambiguities of any kind. Syntactic complexity is introduced by lexical and structural syntactic ambiguities, or ambiguities in the grammar as some people refer to them. In order to perform a meaningful evaluation of COMPERE, we desire a formal analysis that takes not only such syntactic complexity but also semantic complexity into account in defining a measure to be used as a yardstick to grade different sentence interpreters, not parsers, against each other. By semantic complexity we mean such factors as the costs of lexical semantic ambiguities, of holding on to individual meanings until they are composed with others, and so on.
Our goal here is really to define a measure of ambiguity in a sentence. While it is easier to conceive of such a measure for global syntactic ambiguities in terms of the size of the parse forest, our goal is to include semantic ambiguities and, in particular, local ambiguities in the sentence. In order to account for local ambiguities, the metric must compute costs after each word in the sentence and accumulate these local amounts in some manner to determine the overall measure of ambiguity in a sentence. Since local ambiguities are a result of the order in which parsing operations are performed, the cost metric will show different amounts for different algorithms, thereby letting us compare and evaluate the algorithms against each other.
It may be noted that the metric being developed here is not a direct measure of any aspect of computation, either computational complexity in terms of the number of primitive operations, or
the amount of computational resources required, such as working memory or processing time. It is simply a measure that increases with an increase in the number of lexical or structural, syntactic or semantic, or local or global ambiguities. Nevertheless, it serves us well in evaluating and comparing sentence interpreters, since factors such as whether they use a type of information, or when they use it, make a difference in the amount of local ambiguity that the interpreter has to wrestle with and hence a difference in the above measure. The desired measure of the performance of sentence interpretation must take at least the following costs into account:6
- The cost of keeping around the parts of the syntactic structure of a sentence that must be accessed at a later point.
- The cost of lexical syntactic ambiguities or category ambiguities.
- The cost of structural syntactic ambiguities, taking care of "packing" and "sharing" of partial structures while the sentence interpreter is pursuing multiple syntactic interpretations in parallel.
- The cost of holding on to individual word meanings before they are composed with other meanings, and the cost of holding on to sentence meanings and the meanings of any embedded clauses.
- The difference in cost between holding on to two individual meanings and holding on to their composite meaning, where one meaning is a role filler in the other.
- The cost of lexical semantic ambiguities.
- The cost of structural semantic ambiguities, such as the cost of holding on to multiple role assignments for a role filler when it is composed with another meaning.7
- The cost of making and holding on to expectations.
6 There may also be interactive effects among the costs listed here, so that the total cost when multiple ambiguities are present is more than the sum of the costs listed. An analysis of such interactive effects is beyond the scope of this dissertation.
7 In the analysis below, we assume that meaning compositions always result in one unambiguous role assignment. We ignore the cost of holding on to role assignments and hence the cost of any structural semantic ambiguities.
Figure 9.20: Interaction Between Syntax and Semantics in the MOPTRANS Parser.
Figure 9.21: Interaction Between Syntax and Semantics in COMPERE.
9.3.1 Some Intuitions
In developing such a cost measure, it helps to think of the syntactic parse tree as a set of nodes and pipes. Whenever a link is added between two nodes in the parse tree, we are adding a pipe for meanings to flow through. The meaning at the lower node (farther from the root of the tree) flows through the pipe to the upper node (closer to the root) as soon as both the meaning of the lower node and the pipe are available. As meanings flow through the pipes, they compose with each other as soon as they arrive at the same node. This might happen, for instance, when the meaning of a left child is already waiting at the parent node and the meaning of a right child arrives later at the parent node. Thus, we view parse trees as "pipework" that defines how meanings should compose with each other and finally arrive at the sentence node, or at other roots of embedded clauses. Viewing parse trees in this light tells us exactly when we need to hold on to parts of parse trees and when we need not. We need to hold on to a pipe precisely at those times when a meaning must flow through it at a later time. We can throw away all other pipes. Consequently, we can throw away all syntactic nodes that are not currently roots and do not have any pipes connected to them. As a corollary, the nodes and pipes on the right-hand boundary of a tree should always be retained, since optional adjuncts can (almost) always be added to the right of a tree and require meanings to flow up to the nodes on the right-hand boundary.
Another piece of intuition allows us to model the cost of lexical ambiguities in just the same way as syntactic complexity is modeled, in terms of the size of the parse forest or the size of a stack in the automaton. We can view a sentence as a set of meanings together with a variety of linguistic markers, including word order, inflections, closed-class words, and so on, that inform the interpreter of the intended relationships among the meanings. We can then view the sentence interpreter as applying a "grammar" for all the linguistic markers to map the configuration of meanings and markers that we call a sentence to a representation of the relationships between the meanings. In this view, the meanings of words are the terminals of the grammar, and the various inflected and otherwise marked forms of the words are the non-terminals of the grammar. We can extend this notion to ambiguous words and treat them as non-terminals in this "grammar." When there is a lexical syntactic or a lexical semantic ambiguity, the sentence has given us a non-terminal instead of a terminal (i.e., a unique meaning), which must hence be expanded to get the terminals. This expansion allows us to represent the cost of lexical and semantic ambiguities in terms of the size of the parse forest. It is by using this intuition that we add a "meaning set" to the stacks of the automaton below and also include the size of this set, along with the total size of the parse forest (including the trees introduced for every category of an ambiguous word), in the total cost metric. However, it must be mentioned that this notion of lexical ambiguities as non-terminals in a grammar is meant only for purposes of performance analysis and should not be viewed as what might actually be going on during sentence interpretation.
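As a rough illustration only (hypothetical structure and function names; this is not COMPERE's code), the nodes-and-pipes intuition can be sketched in Common Lisp as follows: a meaning waiting at a child node flows up to its parent as soon as the link (pipe) to the parent exists.

```lisp
;; Minimal sketch of the "nodes and pipes" intuition; names are illustrative.
(defstruct pnode
  label               ; syntactic category, e.g. NP or VP
  parent              ; the node this one's meaning flows up to, or NIL
  (meanings '()))     ; meanings currently waiting at this node

(defun flow-meaning (child)
  "Move CHILD's waiting meanings up the pipe to its parent node.
Once a meaning has flowed, the child and its link need not be retained."
  (let ((parent (pnode-parent child)))
    (when (and parent (pnode-meanings child))
      ;; Meanings arriving at the same node simply accumulate here;
      ;; COMPERE would compose them by binding a role instead.
      (setf (pnode-meanings parent)
            (append (pnode-meanings child) (pnode-meanings parent)))
      (setf (pnode-meanings child) '()))))
```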
9.3.2 COMPERE as an Automaton
A sentence interpreter will now be described formally as a push-down automaton with an "enhanced graph-structured stack." A graph-structured stack is a stack whose elements are connected in a set of graph structures instead of forming a simple linear stack of elements. The automaton (Figure 9.22) has a graph-structured stack (hereafter referred to as the main stack) whose elements are partial parse trees. Only the nodes on the leafless right-hand boundaries of the trees are elements of the stack. A leafless right-hand boundary is the right-hand boundary of a tree excluding the rightmost leaf and the link to that leaf from its parent. The graph-structured stack is a partitioning of the sequence of all the right-hand boundaries in the parse forest so that the entire right-hand boundary of the tree on the top of the stack is currently accessible, instead of just the root of that tree, which would be the only element on the top of the stack in a simple linear stack.
The graph-structured stack is enhanced by two augmentations. There is a second stack of expectations. The expectation stack is a linear stack of syntactic nodes predicted at the current time. Nodes on this stack have pointers to and from nodes on the main stack. The expectation stack enables the automaton to model prediction operations and account for their cost. The automaton also has a third stack of meanings. This is really a set and not a stack, since all elements may be accessed at any time. The meaning set is a list of meanings (or meaning compositions). Elements of this list also have pointers to and from nodes on the main stack. The meaning set enables the automaton to model semantics, however minimally, by accounting for the costs of word meanings, their ambiguities, and composition operations.
Figure 9.22: The Sentence Interpretation Automaton.
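To make the stack organization above concrete, here is a minimal Common Lisp sketch (structure and slot names are assumptions for illustration, not COMPERE's actual representation) of the automaton's state: a main stack of partial trees whose leafless right-hand boundaries are the accessible elements, a linear expectation stack, and a meaning set.

```lisp
;; Hypothetical sketch of the automaton's state; names are illustrative only.
(defstruct ptree
  root                 ; category label of the root, e.g. S or NP
  right-boundary       ; accessible nodes on the leafless right-hand boundary
  pending-links)       ; links through which meanings are yet to flow

(defstruct interp-automaton
  (main-stack '())     ; graph-structured stack of partial PTREEs
  (expectations '())   ; linear stack of predicted syntactic nodes
  (meaning-set '()))   ; set of meanings and meaning compositions
```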
9.3.2.1 Operations in the Automaton
The automaton has the following fundamental operations:
1. Push expectation: Add an expected node to the expectation stack. This involves adding pointers to and from the node on the main stack that generated the syntactic prediction. If the prediction is satisfied in the future, the expected node will become a right sibling of the expecting node currently on the main stack.
2. Pop expectation: Remove an expected node from the expectation stack. This operation might be used when an expectation is no longer valid because the sentence took a different turn than what was expected, or it may be followed by a Push-tree operation that adds the popped node to the main stack, when the expected unit is actually seen in the sentence.
3. Push tree: Add a tree (which may be just a root node) to the top of the stack. This operation is carried out for each category of the next word being processed. The leafless right-hand boundary of this tree becomes the new accessible part of the stack.
4. Pop and compose trees: Remove the top of the stack and compose that tree with the new top of the stack. This is done by adding a link from the root of the popped tree to one of the nodes on the right-hand boundary of the tree that is the new top of the stack. The modified or extended right-hand boundary of the new top of the stack is now the accessible part of the stack.
5. Pop tree: Remove a tree from the main stack and discard it. This could be done when the tree is no longer supported by the next word(s) and there is an alternative tree for the same span of the sentence which is supported by the left context and by the next word(s).
6. Bring node or link to right-hand boundary: This is an implicit operation that marks previously unmarked nodes or links to announce that they are now on the right-hand boundary of a tree and hence are accessible if their tree is on the top of the stack. It is performed as a result of a push-tree or a pop-and-compose-trees operation.
7. Remove node or link from right-hand boundary: This is also an implicit operation and is the inverse of the above bring operation. It too is performed as a consequence of a push-tree or a pop-and-compose-trees operation.
8. Add meaning: Add a meaning to the meaning set. This new meaning has links to and from one or more nodes on the main stack. Some of these nodes may not be on the right-hand boundaries of the syntactic trees on the main stack, such as when the element is the meaning of an embedded clause in the sentence. The contents of this set at any time constitute the current meaning(s) of the part of the sentence that has been processed by the automaton. A meaning, once pushed onto the set, stays there unless it is composed with another meaning or otherwise thrown out as an inappropriate meaning.
9. Delete and compose meanings: Remove a meaning from the meaning set and compose it with another meaning in the set, which continues to be a member of the set after the composition. The first meaning, which was deleted, is assumed to be accessible through the second meaning and is no longer a direct member of the meaning set.
10. Delete meaning: Remove a meaning from the meaning set when it is not a feasible meaning for the sentence, and discard it.8
8 The deleted meaning is actually retained by COMPERE for future error recovery. However, the automaton does not model error recovery, as noted in the section below on assumptions.
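Purely as an illustration (hypothetical helper names, building on the state sketch above; COMPERE's actual operations are not reproduced here), a few of these operations might be written as follows:

```lisp
;; Hypothetical sketches of three of the operations, acting on the
;; INTERP-AUTOMATON structure sketched earlier.
(defun push-expectation (automaton expected-node)
  "Operation 1: push a predicted syntactic node onto the expectation stack."
  (push expected-node (interp-automaton-expectations automaton)))

(defun add-meaning (automaton meaning)
  "Operation 8: add a word meaning (or composition) to the meaning set."
  (pushnew meaning (interp-automaton-meaning-set automaton) :test #'equal))

(defun delete-and-compose-meanings (automaton filler host compose-fn)
  "Operation 9: remove FILLER from the set and replace HOST by their composition."
  (let ((composed (funcall compose-fn host filler)))
    (setf (interp-automaton-meaning-set automaton)
          (cons composed
                (remove-if (lambda (m) (or (equal m filler) (equal m host)))
                           (interp-automaton-meaning-set automaton))))
    composed))
```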
The above description of the sentence interpretation automaton specifies the stack representations and the fundamental operations on the stacks, but it does not specify the finite-state network (Figure 9.22) that determines when each operation is performed. It is in this aspect that different sentence interpretation models differ from one another. A complete specification of the preconditions for each operation would constitute a specification of all the algorithms in the sentence interpreter. Table 9.1 specifies the preconditions, but only in the form of brief summaries, for performing each of the operations according to the design choices embodied in COMPERE. Other models have other, similar preconditions.9
Table 9.1: Operator Preconditions in COMPERE.
Push expectation: When an accessible node on the top of the main stack makes a definite prediction for the expected unit.
Pop expectation: When the expected unit or its left-corner child has been placed on the top of the main stack.
Push tree: For each category of a new word, or when the left corner of the tree is on the top of the main stack.
Pop and compose trees: When the topmost and the next trees on the main stack can be composed and the tree that is going to be the child is head-complete.
Pop tree: When the tree is no longer supported by the current word and there is an alternative tree for the same span of the sentence that is supported by the left context and the current word.
Bring to right-hand boundary: When the node or link happens to come to the new right-hand boundary of a tree on the main stack.
Remove from right-hand boundary: When the node or link is no longer on the right-hand boundary of a tree on the main stack.
Add meaning: For each lexical meaning of a word.
Delete and compose meanings: When the corresponding trees on the main stack have just been composed.
Delete meaning: When the meaning is not appropriate for the corresponding tree on the main stack and there is an alternative meaning that is appropriate.
9 One might wonder whether it is possible at all not to have any preconditions for certain operators and to attempt those operations continually. For example, a model might try the composition operation all the time on pairs of elements. Such a design is conceivable and seems to lead to an uncontrolled interaction in the model, as described in Chapter 5.
The automaton described above does not model the processes of error recovery in sentence interpretation. For instance, the elements retained by the interpreter, either as syntactic or as semantic alternatives, are not part of the stacks. The costs of error recovery operations are also not considered in this analysis. This simplification is justified by the observation that error recovery and its costs are common to all the models we are comparing with each other and hence can be factored out of the analysis. On a related note, semantic roles are not explicitly represented in the automaton's stacks. It is simply assumed that whenever two meanings are composed there is a suitable unambiguous role that one meaning fills in the other. No additional cost associated with the role binding is taken into account in this analysis. Thus, as mentioned earlier, this analysis ignores the cost of any structural semantic ambiguities. The reader should perhaps also be cautioned that the formal analysis of COMPERE as an automaton is to be taken only as an analysis tool. There is no implication here either that COMPERE runs the same way as this automaton or that the human sentence interpreter is an automaton of this kind.
9.3.3 A Simple Cost Metric
A first attempt at a cost metric might be simply to add up the number of separate meanings being held on to after each word in the sentence. While this measure clearly shows the advantage of making an early commitment instead of delaying a semantic composition, it does not account for the cost of syntactic processing, nor for the cost of structural ambiguities, especially local structural ambiguities. For instance, it does not include the cost of a prediction or that of a reduction done too early, as in AELC parsing, for example.
9.3.3.1 The Cost Metric
The cost metric for processing a sentence using a particular sentence interpretation algorithm is determined by adding the cost after each word. The cost after each word is the sum of the following values:
- The size of the parse forest (i.e., the size of the graph-structured main stack).
- The number of expected syntactic units present (i.e., the size of the expectation stack).
- The number of meaning compositions present (i.e., the size of the meaning set).
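As a hedged illustration (hypothetical function names; a sketch, not COMPERE's code), the metric amounts to summing these three sizes after each word and then summing over the words of the sentence:

```lisp
;; Sketch of the cost metric: a per-word cost and its accumulation.
(defun cost-after-word (forest-size expectation-count meaning-count)
  "Cost charged after one word: parse-forest size + expectations + meanings."
  (+ forest-size expectation-count meaning-count))

(defun sentence-cost (per-word-costs)
  "Total cost metric for a sentence: the sum of the per-word costs."
  (reduce #'+ per-word-costs))
```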
9.3.3.2 Computing the Size of the Parse Forest
The size of a parse forest is defined as the sum of the sizes of the trees in the forest. The size of a tree is the number of nodes on the right-hand boundary of the tree, excluding the leaf node when it is not itself the root, plus the number of links (called pipes above) on the right-hand boundary, excluding the link to an excluded leaf node, through which meanings are yet to flow. The reason for excluding the leaf node is that the meaning of the leaf node, when it has a parent node (i.e., when it is not the root), will already have flowed to the parent node. Any new word that gets added to the same parent can be added without accessing the leaf node, and so the leaf node and the link to it are excluded from the cost (see the assumptions below). We also exclude links through which meanings have already flowed, since such meanings are supposed to be accessible from both the child and the parent ends of the link. This becomes obvious given the fact that each node in a parse tree points to its current meaning. Consider a simple sentence as an example:
(17) The boy drove the car.
Table 9.2: Cost Metric Calculation: An Example.
Word     Stack Size   Expectation Cost   Meanings Cost   Total Cost
The          1              1                 0               2
boy          2              1                 1               4
drove        2              0                 1               3
the          3              1                 1               5
car.         3              0                 1               4
Total                                                        18
Figure 9.23 shows the contents of the augmented graph-structured stack after each word in the above sentence. Table 9.2 shows the corresponding cost calculations after each word using the HSLC algorithm of COMPERE (i.e., using the operator preconditions shown in Table 9.1 above). The characterization of the sentence interpreter as an automaton enables us to use the cost metric defined above as a yardstick to compare the performance of various parsing algorithms and various architectural configurations of syntactic and semantic processing. Because we can describe different sentence interpreters in terms of the same fundamental operations above, we can factor out the differences between the models in terms of the preconditions under which they perform the operations, and analyze how the cost metric changes under various manipulations of the preconditions. We present such an analysis below, which will lead us to the tradeoffs inherent in deciding when to perform various operations. These tradeoffs package empirical factors that depend on the distribution of various kinds of ambiguities in input sentences. Thus, by performing this formal analysis, we will be able to characterize the classes of input texts (in terms of the empirical factors) for which a particular sentence interpreter performs well.
Figure 9.23: The Contents of the Stack: An Example.
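As a quick, standalone check of the arithmetic in Table 9.2 (a sketch; the row data are taken directly from the table), summing the three components per word and then across words reproduces the total of 18:

```lisp
;; Per-word (stack-size expectation-cost meanings-cost) rows from Table 9.2.
(defparameter *table-9-2-rows*
  '(("The"   1 1 0)
    ("boy"   2 1 1)
    ("drove" 2 0 1)
    ("the"   3 1 1)
    ("car."  3 0 1)))

;; Per-word totals are 2, 4, 3, 5, and 4; their sum is the sentence cost.
(reduce #'+ *table-9-2-rows*
        :key (lambda (row) (apply #'+ (rest row))))   ; => 18
```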
9.3.4 Cost Metric: Assumptions
To our knowledge, no previously proposed metric attempts to count the cost of both syntactic and semantic ambiguities. While the analysis being attempted here is an ambitious one, it also makes a number of assumptions, some of which are more drastic than others. Some of these assumptions are:
1. All units have the same cost. A syntactic node, a syntactic link, a meaning element, and an expected node all have a unit cost in the cost metric.
2. A composite meaning of any size or complexity has only a unit cost in the cost metric. Thus, composing meanings reduces the cost of holding on to them.
3. Meanings flow through syntactic links as soon as the links are established. Thus, the meanings of modifiers in the prefix (i.e., before the head) of a phrase are already waiting at the parent node to be combined with the meaning of the head or with other following prefix modifiers. Hence we need not count the cost of the rightmost leaf nodes of the prefix in the size of the parse forest. For the same reason, only right-hand boundaries (excluding the leaf) are counted in the cost metric.
4. The cost of all syntactic nodes that can potentially acquire another child, or that are the roots of trees (and hence can become children of nodes in other trees), is always included in the metric. The entire right-hand boundary (excluding the leaf) is used to compute the cost even though some nodes on the boundary may never be modified again in a particular sentence.
5. Parts of syntactic trees not on the right-hand boundary may be thrown away and should not be included in the cost of the trees. Such "left branches" do not participate even in error recovery processes and need not be retained at all.
6. Those syntactic links through which meanings might have to flow at a later time should be retained and counted in the cost metric. This means that not all links on right-hand boundaries need be included in the cost; only those through which the meanings of
heads of the children nodes have not yet flowed.
7. A subtree that is part of two different trees is counted only once, assuming that the graph-structured stack exploits packing and sharing.
8. A head, as well as its parent (and recursively so on if the parent is the head of its own parent), has links to and from the meaning element for the head. These multiple links enable us to get to the meaning of any part of a sentence without having to keep the syntactic trees around. They also allow us to exclude from the cost the links between the above nodes once the meanings have already flowed through them.
9. There is a cost for prediction. The cost is simply the number of expected syntactic nodes, as long as they are valid expectations.
10. There is no cost for links between elements in the main stack and those in the expectation stack or the meaning set.
11. Multiple word meanings cost as many units as there are meanings, until the meanings are composed with other meanings.
12. Multiple unresolved meanings do not cost anything when they are already composed with other meanings. In other words, all composite meanings cost just one unit even when there are ambiguities in some of their role fillers.10
13. It is sufficient to count costs at the end of processing each word; counting the costs after each operation during the processing of a word does not change the balance in the comparative analyses of different models.
14. We need not consider the costs of error recovery and of retention of alternatives. These costs are common to all models, assuming they can actually recover from their errors, and do not affect the following analyses.
15. Certain closed-class elements carry "meaning," and this must be added to the cost metric. For example, for a PP with just the preposition processed, it is not sufficient to know that it is a PP. The interpreter must know which particular preposition generated the PP, for otherwise it cannot figure out which is the right attachment for the PP. Hence, the identity of the preposition itself, its "meaning," must be propagated to the parent PP unit and must add an additional unit to the cost metric. Similar costs might have to be accounted for in an analysis that takes the morphological breakdown of word inflections and prefixes into account.
10 Though this could be changed to add up the costs of any alternative meanings in role fillers without significantly affecting any of the analyses below.
9.3.5 Cost Metric: Validity
In order to illustrate the validity of the cost metric outlined above, we observe that the cost metric:
- increases with an increase in lexical category ambiguities;
- increases with an increase in lexical semantic ambiguities;
- increases with an increase in structural syntactic ambiguities;
- shows an orders-of-magnitude increase with center-embedded sentences;
- increases, but not linearly, with the length of intermediate clauses; and
- after each successive word, does not increase monotonically with the position of the word in the sentence.
However, it may be noted that the cost metric increases linearly with the length of a right-branching construct. Perhaps the metric itself is not at fault; instead, perhaps the automaton should be modified to truncate the right-hand boundary to a small finite length from the leaf upwards. Without a well-justified number, it is of course hard to introduce such a number into the automaton. The number could well be an empirical parameter or constant.
The cost metric increases with lexical category ambiguities because a new node is introduced for each category of the word. Though many of these category ambiguities may be eliminated immediately using predictions from the left context, more than one category must be considered and pursued in many cases, thereby adding to the size of the parse forest and hence to the cost metric. Lexical semantic ambiguities introduce more meanings to be added to the meaning set unless all but one can be eliminated immediately using semantic or conceptual knowledge. Structural syntactic ambiguities, such as PP-attachment or reduced-relative-clause ambiguities, introduce more possible configurations of trees in the parse forest. Even when packing and sharing are used to avoid duplicate subtrees, structural syntactic ambiguities introduce additional links, thereby adding to the cost metric. Example sentences with all these types of ambiguities are analyzed below, showing the associated costs.
Since the cost after each word is non-zero, the cost metric increases with an increase in the length of the sentence. However, the increase in the measure is not directly proportional to the length of the sentence. On a related note, the cost after each word does not increase monotonically. It often decreases after a word, sometimes drastically, because syntactic and semantic compositions reduce the size of the right-hand boundaries of trees or reduce the number of elements in the meaning set (see the examples below for illustrations). This is not surprising, since the metric is not really a measure of the apparent size of the input sentence; it is truly a cumulative measure of the amount of ambiguity, both local and global, present in the sentence and hence does not vary directly with the number of words in the sentence.
9.3.5.1 Cost of Center Embedding

Consider the simple sentence above:
(17)
The boy drove the car.

The cost for this sentence is 18 as shown in Table 9.2. Let us successively introduce center embeddings in this sentence and determine the cost measures after each embedding to see whether the cost metric as defined above reflects the degree of difficulty experienced by the human sentence interpreter in dealing with such sentences. Table 9.3 shows the performance analysis of HSLC for the center-embedded sentence:
(18)
The boy the girl loved drove the car.

Table 9.3: Cost of One Center Embedding.

  Word    HSLC
  The       2
  boy       4
  the       6
  girl      8
  loved     8
  drove     4
  the       6
  car.      5
  Total    43

If we now introduce a second center embedding as in:
(19)
The boy the girl the teacher praised loved drove the car.

the cost metric for the sentence using the HSLC algorithm increases enormously, as shown in Table 9.4. The cost went up from 18 for the simple sentence to 43 for one center embedding to 81 for two center embeddings. This rate of increase (measured, for example, by the average cost per word in a sentence) is not seen when the length of the sentence is otherwise extended by comparable amounts through right- or left-branching constructs (see other examples below). Table 9.4 also shows the costs for the head-driven and bottom-up algorithms.11 The absolute cost for the head-driven algorithm is, interestingly, not as drastic as that for HSLC.12 However, the increase in cost for the head-driven parser as each center embedding is introduced into the sentence is as drastic as for the HSLC algorithm: it increases from 14 for the simple sentence (see Table 9.5 below) to 30 for a sentence with a single center embedding (not shown) to 53 for two center embeddings (Table 9.4).
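The rate of increase can be made concrete by computing the average cost per word from the totals just cited (18, 43, and 81 for HSLC; 14, 30, and 53 for the head-driven parser) over sentences of 5, 8, and 11 words. The short script below only restates numbers already given in the tables; the variable names are ours.

    # Average cost per word as center embeddings are added
    # (totals from Tables 9.2, 9.3, and 9.4).
    totals = {
        "HSLC":        {0: 18, 1: 43, 2: 81},   # embeddings -> total cost
        "head-driven": {0: 14, 1: 30, 2: 53},
    }
    words = {0: 5, 1: 8, 2: 11}                  # words in each sentence

    for algo, by_embedding in totals.items():
        for k, total in by_embedding.items():
            print(f"{algo}, {k} embedding(s): {total / words[k]:.2f} cost per word")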
9.3.6 The Cost of Parsing Decisions
We are now ready to look at the analyses of the performance of different models of sentence interpretation for a variety of sentences. We restrict our analyses to models which share the following features:
- Produce incremental interpretations after each word.
- Have incremental interaction between syntax and semantics.
- Output thematic role assignments for a sentence.
- Build syntactic structures. Those models that leave most sentence processing decisions to empirical factors such as the availability of the right conceptual knowledge (for example, conceptual information processors like Birnbaum and Selfridge's (1981) CA) are not amenable to a formal analysis of the kind described here.
- Do not perform conceptual inferences or other extra-linguistic reasoning.
- Have a well-specified algorithm for the control of processing. Models which are not computationally well-specified, such as those that merely say "it all happens in a spreading-activation network," cannot be analyzed as outlined here.
- Do not assume an all-powerful oracle or homunculus that always nondeterministically selects the right interpretation at any point in sentence processing.

In addition, the analysis does not include many of the aspects that models like COMPERE do not deal with. These include phonetic and morphological analysis, discourse, reference, and context analyses and effects, and so on. However, the cost metric developed here could be extended without much difficulty to include the cost of reference analysis and so on.

Below, we present the costs of several sentence processors built by incorporating different parsing algorithms along the parsing spectrum of eagerness into incremental syntax-semantics architectures such as the one in COMPERE. The idea here is to show examples and lay the groundwork for the analyses below of the tradeoffs in performing the operations of the automaton earlier or later than COMPERE performs them. Certain extremes of the parsing spectrum, such as a completely top-down algorithm or a syntax-first architecture with no incremental semantic interaction, can also be analyzed to show that they are not viable. We shall not, however, deal with such cases in the examples below.

11 In this and other performance analyses in this chapter, we have assumed a grammar that covers just the sentence being analyzed. For instance, when analyzing the simple sentence, we did not consider the possibility of a relative clause adjunct to a noun phrase. This does not make a difference for AELC, HSLC, or head-driven algorithms since they compose the words at or before the head of the phrase. However, it could make a difference for ASLC and bottom-up parsers depending on whether they produce multiple interpretations in parallel (like chart parsers do) or wait for the next word to confirm before producing a composite structure. In any case, if we considered a grammar with a broader coverage, the performance of these two algorithms could be worse.

12 One could form a hypothesis from this observation that there is a correlation between a cost metric of this kind and the overall capacity of the human sentence interpreter. Given such a hypothesis, one could argue that head-driven parsing is not a likely candidate for the human parser because it does not show enormous costs for center-embedded sentences that are known to cause the human sentence interpreter to break down.

Table 9.4: Cost of Two Center Embeddings.

  Word      HSLC  Head-Driven  Bottom-Up
  The         2        1           1
  boy         4        2           3
  the         6        3           4
  girl        8        4           6
  the        10        5           7
  teacher    12        6           9
  praised    12        8          11
  loved       9        7          11
  drove       5        5           9
  the         7        6          10
  car.        6        6           6
  Total      81       53          77
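For reference in the comparisons that follow, the announce points on this spectrum of eagerness can be summarized in a small lookup table. This is only a sketch; the strings restate where each algorithm first attempts attachment and composition, as described in this chapter.

    # Announce point: where each algorithm first attempts to attach/compose a phrase.
    # (This restates the spectrum of eagerness discussed in the text.)
    ANNOUNCE_POINT = {
        "AELC":        "left corner (arc-eager left-corner: attach immediately after the left corner)",
        "HSLC":        "head (head-signaled left-corner: attach once the head is seen)",
        "head-driven": "head (no left-corner projection or prediction)",
        "ASLC":        "end of phrase (arc-standard left-corner)",
        "bottom-up":   "end of phrase (no predictions, no early composition)",
    }

    for algo, point in ANNOUNCE_POINT.items():
        print(f"{algo:12s} -> {point}")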
9.3.6.1 Performance Analysis: Simple Sentences
The tables below show the word-after-word and final costs of processing sample sentences using the three left-corner parsing algorithms (AELC, HSLC, and ASLC) as well as pure bottom-up and head-driven models. The actual contents of the stack for each case are not shown due to the extreme length of such details.

Table 9.5 shows a performance analysis of the above algorithms when they process a simple subject-verb-object sentence. The comparison with the bottom-up parser is "unfair" since that is the only algorithm among the ones analyzed which does not produce incremental interpretations (apart from ASLC, which can also be non-incremental to significant lengths at times). For instance, all other models (except ASLC) compose the meanings of the subject noun and the verb immediately after the verb; the bottom-up model (as well as ASLC) waits until the end of the sentence to do that. When a simple sentence with no ambiguity is analyzed,13 the head-driven parser comes out as the best, other than the unacceptable bottom-up parser. The main reason for the increased cost for the left-corner parsers is the cost of making predictions. The left-corner projection and the consequent generation of expectations is a mere burden in a sentence like this since there is no ambiguity that prediction could help eliminate. It may also be noted that ASLC suffers the most because it accumulates the costs of prediction but also delays composition until the end of the sentence, being the most bottom-up of the left-corner parsers. The difference in the metric between AELC and HSLC is mainly due to AELC's eager linking of the determiner to the VP, through the object NP, even before the following object noun is processed. AELC has to hold on to this extra "pipe" until the noun is processed.

13 In this sentence, there is no ambiguity of the kinds we are dealing with. There still might be a referential ambiguity and so on in this sentence.
Table 9.5: Performance Comparison: A Simple Sentence.

  Word    AELC  HSLC  ASLC  Head-Driven  Bottom-Up
  The       2     2     2        1           1
  boy       4     4     4        2           2
  drove     3     3     5        3           4
  the       6     5     7        4           5
  car.      4     4     4        4           4
  Total    19    18    22       14          16
9.3.6.2 Performance Analysis: PP Attachment Ambiguity
Table 9.6 shows the performance of the various algorithms when a sentence with a PP attachment ambiguity is presented. Here the ambiguity is a structural syntactic ambiguity regarding whether the PP should be attached to the object NP or to the VP. This is also a local ambiguity since there is presumably conceptual knowledge that disambiguates this sentence in favor of the NP attachment. Once again, HSLC comes out as the best among the left-corner parsers, the difference between HSLC and the other two LC parsers being even more marked in this example. It is interesting to note that bottom-up parsing, which does not produce incremental interpretations at all and waits until the very end of the PP even to compose the subject noun and verb meanings, turns out to be rather expensive in this right-branching sentence. Head-driven parsing has the least cost once again. As in the previous example of a simple sentence, prediction in LC parsers does not buy us anything since there is no ambiguity in the prefix (i.e., the part before the head) of a phrase.

In this analysis, we have also assumed that the head of the PP is the noun in the NP. If we were to take the preposition to be the head of the PP, the head-driven algorithm would behave much like AELC, trying to attach the PP immediately after the preposition. In fact, the head-driven algorithm would become as expensive as HSLC if the preposition is considered to be the head of the PP.

This analysis did not consider the category ambiguity in the word "saw," which could be either a noun or a verb. We assumed that it is only a verb. This was done primarily to exclude additional factors and focus on the tradeoffs in eager versus late attachment of an adjunct phrase. However, if we considered the category ambiguity in "saw," left-corner parsers would fare better than the others because of their prediction of a verb phrase from the left-corner projection before processing the word "saw." It may also be noted that we ignored the apparent lexical semantic ambiguity in the verb "saw," namely whether it is about seeing or about cutting with a saw. This ambiguity goes with a subcategory ambiguity in the word: the seeing reading is the past-tense form of "see," while the cutting reading is a present-tense form available only for first or second person subjects, singular or plural, or for the third person plural, but not for the third person singular. Any of the algorithms could therefore use agreement checking mechanisms to disambiguate easily in this case.
Table 9.6: Performance Comparison: A PP Attachment Ambiguity.

  Word     AELC  HSLC  ASLC  Head-Driven  Bottom-Up
  The        2     2     2        1           1
  boy        4     4     4        2           2
  saw        3     3     5        3           4
  the        6     5     7        4           5
  man        4     4     7        4           7
  with       9     7    10        6           9
  the       11     8    11        7          10
  horse.     6     6     6        6           6
  Total     45    39    52       33          44
In the above analysis of prepositional phrase attachment, if the attachment is unambiguous then the performance analysis looks as shown in Table 9.7. It can be seen from the table that HSLC does better than both AELC and ASLC even when the attachment is unambiguous. The reason for AELC's higher cost is that it makes the syntactic attachment between the PP and the VP too early (i.e., immediately after the preposition) for meanings to flow through the links, and hence it must hold on to the links longer. HSLC does better than this by exploiting the head-drivenness of the head-driven algorithm.
Table 9.7: Performance Comparison: Unambiguous PP Attachment.

  Word      AELC  HSLC  ASLC  Head-Driven  Bottom-Up
  The         2     2     2        1           1
  girl        4     4     4        2           2
  went        3     3     5        3           4
  into        7     6     8        5           6
  the         8     7     9        6           7
  lounge.     5     5     5        5           5
  Total      29    27    33       22          25
9.3.6.3 Performance Analysis: Modifiers in the Prefix
From the above analyses, it appears that the head-driven parser provides the best performance of all the models considered. However, this picture changes when we consider the effect of modifiers in the prefix, that is, before the head. When modifiers are present on the left of the head, left-corner parsers compose them immediately using their left-corner projection mechanism to reduce costs. We will see from the example below that HSLC does better than the head-driven parser as well as the other left-corner parsers by exploiting both the eagerness of the left-corner mechanism and the circumspection of the head-driven mechanism. Table 9.8 shows the performance analysis for a sentence with modifiers in the prefix of the subject and object noun phrases. This example shows that as more and more modifying meanings appear on the left of the head, the cost of the head-driven algorithm increases sharply since it does not process the semantics of the modifiers (i.e., does not compose their meanings with each other even when they can be so composed) until the head. HSLC employs its left-corner projection to compose the modifiers immediately to yield the best performance among all the algorithms considered. In this analysis, we have assumed that the bracketing of the sentence is

[The [[light green] parrot]] uttered [an [[exceptionally long] [original sentence]]].
In other words, the meanings of "light" and "green" can be composed before the noun "parrot" is seen; the meanings of "exceptionally" and "long" can be composed with each other immediately, but the resulting composite meaning does not combine with the meaning of "original." The meaning of "original" must be held separately until the noun "sentence" is processed. If the modifiers were such that each new modifier could be composed with the previous ones immediately, the difference in performance between HSLC and the head-driven parser would be even greater, the head-driven algorithm performing even worse. The contrast between this example and the previous ones shows the tradeoff between the cost of prediction in left-corner parsers and the cost of not predicting and delaying the semantic processing of prefix modifiers in the head-driven parser. While it is obvious that the numbers in any one of these tables should not be taken too seriously, say, to select an algorithm, these numbers and the underlying cost metric provide a language for analyzing and expressing the tradeoffs in the timing of the interactions between syntactic and semantic processing in sentence interpretation. They also help us illustrate the advantages that HSLC has to offer by combining the best of both left-corner and head-driven parsers.

Table 9.8: Performance Comparison: Effect of Modifiers in the Prefix.

  Word           AELC  HSLC  ASLC  Head-Driven  Bottom-Up
  The              2     2     2       1            1
  light            3     3     3       3            3
  green            3     3     3       5            5
  parrot           4     4     4       2            2
  uttered          3     3     5       3            4
  an               6     5     7       4            5
  exceptionally    7     6     8       6            7
  long             7     6     8       8            9
  original         8     7     9      10           11
  sentence.        4     4     4       4            4
  Total           47    43    53      46           51

Table 9.8 highlights the places where head-driven parsing loses to HSLC. These are precisely the places where a left-corner projection enables an early semantic composition before the head to reduce the cost. A similar situation occurs in noun groups, which occur very frequently in certain kinds of technical texts. An example taken from a local network posting for a colloquium shows some extreme cases (some difficult noun phrases have been highlighted):14

An Architecture for Design Decision Support Systems. Command and control software design support represents a new, but interesting, aspect of the human machine systems engineering problem. This presentation will describe an architecture for a design decision support directed at collecting and disseminating command and control software design knowledge as well as facilitating the construction of plausible solutions to design problems. The goal of this research effort is to formulate, demonstrate, and evaluate concepts which assist a designer in unfamiliar, yet commonly occurring, domain-specific design situations.

Consider one such noun group, "the human machine systems engineering problem," for example. Left-corner projection enables incremental composition of noun meanings. Head-driven parsing waits until the head noun, "problem," before performing any composition. However, there is an additional complexity in analyzing this example, namely, that any of the nouns can be the head noun when it is first seen, or it can be a noun modifier for a following head noun. Parsers could either assume that the noun is a head and go ahead with the compositions, incrementally recovering as each new noun is encountered, or they could simply wait until the phrase ends without a doubt. Taking this issue into account, the performance analysis for just this phrase is shown in Table 9.9 for three kinds of parsers. Parser A is a left-corner parser such as HSLC with its projection and prediction overheads. Parser B is a parser without left-corner projection or prediction, such as a head-driven parser. Parser C is a pure bottom-up parser that waits until the end to make any composition at all. The difference between B and C is that B is incremental, assuming that each noun replaces the previous one as the head of the phrase, while C assumes that a noun is a head only when it is certain that there are no more nouns following. It can be seen from the table that while the measure remains constant after the first noun for parsers A and B, it keeps increasing for parser C since it keeps waiting for the last noun before making any composition. As a result, this less eager parser accumulates costs higher than even those of parser A with its heavy penalty for early projection (of S from the NP) and prediction (of a VP). Head-driven parsers may be implemented as either a parser B or a parser C. This analysis shows the real benefits of word-by-word incrementality and the costs of other related decisions in the design of a parser.

14 A talk in the series of colloquia in cognitive science at Georgia Tech by John Morris on July 15, 1994.
Table 9.9: Performance Comparison: A Complex Noun Group.

  Word          A    B    C
  The           2    1    1
  human         4    2    3
  machine       4    2    5
  systems       4    2    7
  engineering   4    2    9
  problem.      4    2    2
  Total        22   11   27
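The difference between parsers B and C on a long noun group can be sketched as follows. Under the simplifying assumption that the only quantity tracked is the number of noun meanings held uncomposed (ignoring the determiner, projection, and prediction costs that Table 9.9 also counts), an incremental parser that treats each new noun as the provisional head holds a single composite meaning, while a parser that waits until the end holds one meaning per noun seen so far. The helper names are ours.

    def uncomposed_meanings_incremental(k: int) -> int:
        """Parser B style: each new noun is composed with the running head meaning,
        so only one (composite) meaning is held after the k-th noun."""
        return 1 if k >= 1 else 0

    def uncomposed_meanings_wait(k: int, n: int) -> int:
        """Parser C style: composition waits until the last of the n nouns,
        so k separate noun meanings are held until then."""
        return 1 if k == n else k

    n = 5  # e.g., "human machine systems engineering problem"
    for k in range(1, n + 1):
        print(k, uncomposed_meanings_incremental(k), uncomposed_meanings_wait(k, n))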
It may be noted that a purely left-branching construct, on the other hand, does not pose a problem to any of the algorithms analyzed here. For instance, the above example may be contrasted with a left-branching chain of possessives such as "my cousin's aunt's dog's tail" where every noun is the head of its NP. Since every word after the first is a head and also the rightmost unit of the phrase (which is what makes it a left-branching construct), all the parsers in the tables above would compose them immediately, resulting in more or less the same cost metric for each algorithm, discounting the additional cost that left-corner parsers incur in repeatedly trying to project from the NP to the S and in repeatedly expecting to see a VP.
9.3.6.4 Performance Analysis: A Complex Example
As a final example, let us consider a sentence that has both lexical and structural ambiguities and one that leads to a garden path and subsequent error recovery. Table 9.10 shows the performance analysis for such a sentence. Again, the head-driven parser performs a little better than HSLC because of the cost of prediction in HSLC, which is not put to much use here since there are not many modifiers in the prefix of this sentence. However, such small differences in numbers should not be taken too literally since the performance metric attributes equal costs to syntactic nodes, links, and meaning units. Head-driven parsing reduces its cost by not creating a few syntactic nodes and links early, but HSLC and AELC help reduce the number of separate meanings yet to be composed. On the other hand, it can be seen that as the sentence complexity increases, the difference in performance measures between these algorithms and the highly circumspect ones such as ASLC and bottom-up parsers grows enormously. Also, though AELC has a consistently, but only slightly, higher cost than HSLC, the advantage that HSLC offers by delaying compositions until the head might be much more significant than is apparent from the tables shown here. The metric attributes a significant portion of its cost to merely holding on to intermediate interpretations in benign situations with no ambiguities. However, most of the difference in cost between HSLC and AELC comes from ambiguities in attachment that HSLC avoids by delaying the attachment until such time as semantic (or conceptual) information can help resolve it. Moreover, if such ambiguities are of higher degrees (i.e., more than two attachments possible per ambiguity), then AELC would suffer by greater amounts.
Table 9.10: Performance Comparison: A Complex Example.

  Word      AELC  HSLC  ASLC  Head-Driven  Bottom-Up
  The         2     2     2        1           1
  bugs        5     5     5        3           3
  moved       3     3     6        3           5
  into        7     6     9        5           7
  the         9     7    10        6           8
  new        10     8    11        8          10
  lounge      5     5     5        5           5
  were        7     6    14        5          11
  found       4     4    12        4          13
  quickly.    4     4     4        4           4
  Total      56    50    78       44          67
9.3.7 Performance Tradeoffs in Sentence Processing: A Formal Analysis
Below we present formal analyses of the cost tradeoffs depending on when certain operations of the automaton are performed. In particular, we will concern ourselves with three important scenarios:

1. The costs and benefits of prediction: the consequences of performing push-expectation operations and whether those operations enable useful pop-tree operations.

2. The costs and benefits of eagerness, or the lack of it: the effects of performing pop-and-compose-trees at different announce points in the phrase, such as immediately after the left corner, immediately after the head, or at the end of the phrase.

3. The costs and benefits of projection: the effects of a push-tree operation of a parent node from a left corner and the consequent pop-and-compose-trees operations.
9.3.7.1 Tradeoffs in Prediction
The simplest tradeoff is in prediction. It costs a significant amount to make and hold on to a prediction for a syntactic unit. However, having a prediction can help us eliminate some choices in a following ambiguity, resulting in a saving when the ambiguity is encountered. In this analysis we assume a prediction strategy similar to COMPERE's where only required units are predicted. If optional units are also predicted, prediction can be very expensive, leading to excessive cost measures.

The cost of prediction is one unit for the predicted unit added to the expectation stack. This cost adds to the overall measure as long as the predicted unit stays on the expectation stack. That is, the cost of prediction is equal to the number of words between the place where the prediction was generated, including the word at that place, and the place where it is either satisfied or is rejected and thrown out of the expectation stack. For example, a prediction for a noun is generated at the determiner of a noun phrase. This prediction might stay through any intermediate adjectives or other allowed modifiers between the determiner and the noun. The cost of prediction in this case is one for the determiner plus one each for any modifier words before the noun.

Suppose the word that meets the prediction has a category ambiguity. Prediction helps eliminate all but the predicted category. It also helps avoid having to add the meanings of those eliminated categories to the meaning set. Hence, the reduction in cost resulting from a prediction is the number of categories of the word minus one, plus the total number of meanings of all of the rejected categories of the word. For example, in the sentence15
(20)
The large can . . .

the cost of predicting a noun after the determiner is 2 units; the corresponding benefit is 3 units: one for the auxiliary category of "can," one for the verb category of "can," and another for the meaning of the verb "can." In fact, the benefit is even higher due to the elimination of subsequent costs of pursuing some of the alternatives rejected by using the prediction. For instance, the auxiliary category of "can," when pursued, leads to the expectation of a following verb. Moreover, there could be further benefits of prediction when one of the rejected categories could lead to a syntactic structure that is far more complex and hence leads to much bigger parse forests. An example of this would be the prediction of a verb phrase after the subject noun phrase of a sentence. This could help eliminate a reduced relative clause structure, following a verb that has a subcategory ambiguity between simple past tense and past participle forms, which is much more complex than a simple main-verb structure (see sentences (4) and (10) at the beginning of this chapter). The cost of prediction would be wasted only when the entire phrase leading to the prediction for a required child is itself rejected. Otherwise, since only required units are predicted, the predictions must be met, given that the sentence is grammatical. However, if the prediction is met much later in the sentence due to intervening clauses, the cost of prediction would also be correspondingly higher.

In the above example, one could argue that there was no need for an explicit prediction. The left context could just as well be used to eliminate the auxiliary and verb categories of "can" by a parser that does not make any predictions. However, the same argument does not hold in the case of other predictions where the left context does not unambiguously select a single category of a following ambiguous word. For example, in the reduced-relative clause ambiguity mentioned above, the NP in the left context could either be followed by a main verb phrase or by a reduced relative clause and then a main verb phrase. However, a parser like HSLC generates a prediction for a main verb using the knowledge that it is required to complete the S structure. This prediction is used to eliminate the expensive reduced relative structure, unless the main verb structure is rejected by conceptual constraints. Thus the benefits of prediction really come from the knowledge of required and optional units.

15 Assuming that "large" can only be an adjective, not a noun by itself.
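The accounting in this subsection can be written down directly. In the sketch below (the helper names are ours), the cost of a prediction is the number of words it spends on the expectation stack, and its benefit at the word that meets it is the number of eliminated categories plus the meanings of those rejected categories; the printed example reproduces the 2-unit cost and 3-unit benefit computed above for "The large can . . .".

    def prediction_cost(words_until_satisfied: int) -> int:
        """One unit per word from the word that generated the prediction (inclusive)
        up to, but not including, the word that satisfies or rejects it."""
        return words_until_satisfied

    def prediction_benefit(num_categories: int, meanings_of_rejected: int) -> int:
        """Categories eliminated (all but the predicted one) plus the meanings of
        those rejected categories, which never enter the meaning set."""
        return (num_categories - 1) + meanings_of_rejected

    # "The large can ...": the noun prediction lives through "The" and "large" (2 units);
    # it eliminates the auxiliary and verb readings of "can" plus the verb meaning (3 units).
    print(prediction_cost(2), prediction_benefit(3, 1))   # -> 2 3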
9.3.7.2 Tradeoffs in Left-Corner Projection
Projecting the parent from the left corner enables the sentence interpreter to compose the words in the prefix syntactically and semantically without waiting for the head. The cost of projecting is the cost of creating the parent node early. The benefit of projection is the reduction in cost resulting from the immediate composition of the words in the prefix between the left corner and the head.
[Figure 9.24: Tradeoffs in Left-Corner Projection and Prediction. Panel (a) shows the wait-till-the-head strategy, in which the n prefix words, with m meanings each, remain uncomposed until the head, for a cost of (m + 1)(n(n + 1)/2 + 1). Panel (b) shows projection and prediction in prefix processing, in which the projected parent, the predicted head, and m composite meanings in all give a cost of (m + 2)(n + 1).]
Let there be n words before the head in a phrase. Out of these, some words may have meanings and some more than one meaning when they have lexical semantic ambiguities. They may also have lexical category ambiguities, but, for the present, we will ignore this complexity.16 Let us simply assume that the n words have an average of m meanings each. The cost of left-corner projection is n: one parent unit projected early and held on to for the next n words before the head makes the phrase head-complete. Thus the total cost metric after the head, with projection and immediate composition, would be (m + 1)(n + 1).17 If no composition was performed, the total cost would have been (Figure 9.24(a)) (m + 1)(n(n + 1)/2 + 1). A simple analysis shows that the two expressions above equal each other when n is 1. Thus, even when n is 2 and m is so low that it is close to 0, the cost is lower with projection. In general, it appears that the cost with projection is significantly less than that without any projection and incremental composition. However, this analysis is rather simple and ignores several other factors. We will consider them in the following paragraphs.

One reason to project from the left corner is to be able to predict the required constituents from the parent. As we already noted, prediction also adds to the overall cost, but could reduce the overall cost as well, due to immediate disambiguation using the predicted categories. If prediction is combined with projection, the cost with projection and prediction (Figure 9.24(b)) turns out to be (m + 2)(n + 1), assuming that one predicted node is added to the expectation stack at the left corner and is satisfied by the head after the first n words. With this expression, and with m equal to 1, whenever n is greater than 2 the cost with prediction and projection is still less than the cost without incremental composition. In fact, if n is more than 3, projection and prediction together are less expensive than delaying the composition for any value of m. Similarly, when n equals 3, any value of m more than a third (i.e., at least one of the three prefix words having at least one meaning) makes prediction and projection the less expensive option; when n equals 2, m must be 2 or more for prediction and projection to be beneficial; and when n equals just 1, the incremental alternative is more expensive by 1 unit for any value of m.

The above analysis did not consider the savings that prediction could make possible by way of immediate resolution of category ambiguities. When such benefits are considered, the incremental option becomes even more profitable. However, this analysis also does not consider additional costs of projection when there are certain types of ambiguities in the grammar. When the left corner is ambiguous in determining its parent, more than one parent phrase might have to be projected from the left corner and more than one prediction made (one or more for each of the projected phrases). When this happens, projection and prediction could turn out to be more expensive than the above analysis showed.

16 As discussed before, these category ambiguities could lead to additional costs of predictions and alternative trees in the parse forest if they are not eliminated using immediate composition.

17 This assumes that meanings can always be composed with previous ones so that there are never more than m meanings on average that we need to hold on to.
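The break-even claims above are easy to check numerically. The sketch below (function names ours) evaluates the three cost expressions from Figure 9.24 (waiting until the head, projecting with immediate composition, and projecting with one prediction) for a few values of n and m.

    def cost_wait_till_head(n: int, m: float) -> float:
        """No prefix composition: (m + 1)(n(n + 1)/2 + 1), as in Figure 9.24(a)."""
        return (m + 1) * (n * (n + 1) / 2 + 1)

    def cost_projection(n: int, m: float) -> float:
        """Left-corner projection with immediate composition: (m + 1)(n + 1)."""
        return (m + 1) * (n + 1)

    def cost_projection_and_prediction(n: int, m: float) -> float:
        """Projection plus one prediction held until the head: (m + 2)(n + 1)."""
        return (m + 2) * (n + 1)

    for n in (1, 2, 3, 4):
        for m in (0.5, 1, 2):
            print(n, m,
                  cost_wait_till_head(n, m),
                  cost_projection(n, m),
                  cost_projection_and_prediction(n, m))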
9.3.7.3 Tradeoffs in Eager Reduction
There are performance tradeoffs in performing a pop-and-compose-trees operation early or late. We can consider different situations in terms of the announce points at which the operation is performed by the sentence interpreting automaton. The announce point may be 1 (i.e., the left corner, as in AELC), may equal the head (as in HSLC and head-driven parsers), or may be the end of the phrase (as in ASLC and bottom-up parsers). We shall now analyze the costs and benefits of having the announce point at 1 (or, in general, before the head), at the head, and beyond the head.

When the announce point is at the left corner (i.e., equal to 1), attachment will be attempted even before the head, as shown in Figure 9.25, except in special cases where the left corner is itself the head. An eager attachment at the left corner results in an additional cost equal to the number of different attachments possible multiplied by the length of the prefix of the phrase. This is because as many new links as there are possible attachments are added eagerly at the announce point. However, since the head has not been seen yet, these links need to be retained until the head is processed (i.e., throughout the length of the prefix of the phrase) so that the meaning of the head can flow through one or more of these links, as appropriate. This is true unless there is some other knowledge that disambiguates the attachment even before the head is seen (for example, the knowledge that a particular preposition in a PP only modifies a verb, not an object noun, no matter what the head noun in the NP of the PP is).

[Figure 9.25: Tradeoffs in Eager Reduction: Cost of Reduction at Left Corner. A projected parent with its left corner and an expected head has two candidate attachments; eager attachment at announce point 1 adds a cost equal to the number of possible attachments (here 2).]

When the announce point is at the head, we get minimal costs, as seen in the examples with COMPERE at the beginning of this chapter. However, when the announce point is pushed beyond the head, additional costs could accrue from two sources: delayed composition of head meanings with prefix meanings, and delayed composition of postfix (adjunct or argument) meanings with the head meaning. As shown in Figure 9.26, attaching immediately after the head is processed, and thereby composing the meaning of the head with the meanings of the prefix phrases, reduces the cost by at least the number of meanings in the prefix phrases. These meanings are considered part of the composite meaning arising from the head and are not counted separately in the cost metric. Additional savings could result if the right-hand boundary resulting from such an attachment is shorter than the prevailing right-hand boundary at that point. Similarly, Figure 9.27 shows the situation where a postfix adjunct is not composed immediately with the head. Such a delay results in an additional cost equal to the number of meanings of the postfix phrase. If attached immediately, those meanings would be composed with the head meaning, thereby reducing the cost.
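The three announce-point scenarios can be restated as simple extra-cost estimates, as a sketch under the assumptions just described (the function names are ours): eager attachment at the left corner pays for every candidate attachment link over the whole prefix, while pushing the announce point beyond the head pays for every prefix or postfix meaning left uncomposed with the head.

    def extra_cost_eager_attachment(num_attachments: int, prefix_length: int) -> int:
        """Announce point at the left corner: candidate attachment links are added
        early and retained until the head is processed."""
        return num_attachments * prefix_length

    def extra_cost_late_head_composition(prefix_meanings: int) -> int:
        """Announce point beyond the head: prefix meanings are not folded into the
        head's composite meaning and keep being counted separately."""
        return prefix_meanings

    def extra_cost_late_postfix_composition(postfix_meanings: int) -> int:
        """Announce point beyond the head: postfix (adjunct or argument) meanings
        are not composed with the head meaning."""
        return postfix_meanings

    # e.g., two candidate attachments held over a three-word prefix
    print(extra_cost_eager_attachment(2, 3))  # -> 6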
[Figure 9.26: Tradeoffs in Eager Reduction: Cost of Delayed Head Composition. Composing the head meaning with the prefix meanings reduces the cost from 6 before the attachment to 3 after it; if the announce point is beyond the head, the cost is much higher.]
9.3.8 Empirical Factors
From the above analysis, we can identify certain empirical factors that affect the performance of sentence interpreters and hence the cost metric. Some of them are:

1. The average number of category ambiguities for a word in the text (not the average in the lexicon).

2. The average number of meanings per category of a word in the text.

3. The average length of intermediate modifiers between a left corner and the predicted required unit (i.e., the length of a prefix).

4. The percentage of times when the head is the left corner of a phrase.

5. The percentage of times when the head is the rightmost child of a phrase.

6. The average number of phrases for which a unit can be a left corner in the grammar.

7. The percentage of postfix children that can be unambiguously predicted by the grammar.

8. The percentage of times when an ambiguous attachment can be disambiguated from just the left corner or from parts of the prefix even before the head is seen.
[Figure 9.27: Tradeoffs in Eager Reduction: Cost of Delayed Postfix Composition. Composing a postfix meaning with the head meaning reduces the cost from 6 before the attachment to 4 after it; if the announce point is beyond the head, the cost is much higher.]

The above formal analysis and the cost metric provide a framework for carrying out empirical studies of sentence interpreters. No previous framework took semantics and the cost of ambiguities into account. The cost metric developed here is clearly a measure of the amount of local ambiguity that a sentence interpreter has to wrestle with. We have also identified several empirical factors that affect the cost measure. However, it is impossible to devise a mathematical function from the set of all empirical factors to the cost metric. There are too many empirical variables, and their contribution to the overall cost (or the overall amount of local ambiguity encountered) depends on a number of other factors, such as the position and syntactic context in which the ambiguity occurs in the sentence. Nonetheless, this framework allows us to articulate in concrete terms the tradeoffs in altering individual design decisions such as the ones analyzed above. However, it is hard to combine all the tradeoffs for the different design decisions in a single analysis. Many other, possibly unknown or unaccounted, factors as well as complex interactions between design decisions preclude such a complete mathematical analysis. Though it appears that the empirical factors noted above define classes of sentences, it is also not easy to classify a sentence as one where only certain empirical factors play a role. Most sentences have many kinds of ambiguities with many associated empirical factors. Moreover, when we consider entire texts, they rarely have sentences of the same kind.

Based on the design decisions analyzed above, our COMPERE model seems to have selected a very reasonable set of design choices. Whether this set is really optimal for a particular class of texts can only be evaluated by an actual empirical study. Such a study requires that the COMPERE program actually be able to process the whole corpus of texts and hence requires a substantial amount of engineering before the study can be conducted. Such a study will also require us to build other programs with different design choices and engineer them too, to compare their performance with that of COMPERE. The formal analysis above provides a framework for evaluating some of the costs and tradeoffs in the design of sentence interpreters without the expensive process of actual empirical comparisons.
CHAPTER X

DISCUSSION

Discussion, n. A method of confirming others in their errors. (A. Bierce, 1911, The Devil's Dictionary.)

In this chapter, we address several significant issues that have come up in the design and implementation of the COMPERE model. But first, we present the predictions made by the COMPERE model and describe how the predictions could be verified.
10.1 Psychological Predictions from COMPERE

The value of a computational model of some aspect of human cognition goes far beyond a mere simulation of psychological data. A good computational model must have some predictive power. It must make concrete predictions and suggest psychological experiments to verify the predictions. COMPERE indeed makes such predictions. Its predictions are concerned with the interactions between syntactic and semantic decisions when several ambiguities are present in each other's vicinity in the same sentence. Sentences having more than one type of ambiguity, where there may be complex interactions between the choices in the different ambiguities, are difficult to design and control in experimental studies of human sentence comprehension. However, work on COMPERE suggests that examining such interactions could reveal and confirm interesting details about the human sentence processor. In particular, the second prediction below could serve as a challenging test of theories of conditional retention for error recovery (Eiselt, 1989; Holbrook, 1989).
10.1.1 The Predictions
COMPERE makes at least the following two predictions. When a sentence has both a structural syntactic ambiguity and a lexical semantic ambiguity:

Prediction 1: Interactive Ambiguity Resolution. Resolving a structural ambiguity has an immediate effect on resolving an associated lexical ambiguity, and vice versa.

Prediction 2: Interactive Error Recovery. Recovering from an error in resolving a structural ambiguity has an immediate effect on previous decisions made in an associated lexical ambiguity, and vice versa.

These predictions are central to COMPERE's claims and crucial to its applicability as a cognitive model of sentence comprehension. The predictions follow directly from COMPERE's architecture with parallel syntactic and semantic processing and an arbitrating controller (see Chapter 5). Since syntactic and semantic preferences are considered together and arbitrated by the unified process, a decision that resolves one ambiguity also affects the other ambiguity. Such a decision is made by selecting an interpretation from all the alternatives possible at that point. Thus, selecting an interpretation to resolve the first ambiguity might automatically resolve the second ambiguity as well. For example, in sentence (1) below,
(1)
The bugs moved into the new lounge were found quickly.

resolving the structural ambiguity at "moved" in favor of the main-clause interpretation automatically resolves the lexical semantic ambiguity in "bugs" in favor of the insect meaning. This is because, of the two meanings of "bugs," only the insect meaning is compatible with the chosen main-clause interpretation of "moved": only insects can be agents of move, since microphones cannot move themselves. Similarly, if there were a strong contextual bias for the microphone meaning of "bugs" and if the lexical semantic ambiguity had been resolved immediately at the word "bugs," then the structural ambiguity at "moved" would be resolved automatically and immediately in favor of the reduced relative interpretation. Previous work has already demonstrated that the context in which a sentence is being processed has an immediate effect on ambiguity resolution (e.g., Crain and Steedman, 1985; Tyler and Marslen-Wilson, 1977; also see Chapter 3). The interactive resolution prediction above extends this view into dynamic situations where the change in context produced by one decision in sentence processing has an immediate effect on resolving other ambiguities.
10.1.1.1 Interactive Error Recovery and COMPERE's Architecture
The interactive error recovery prediction is not only central to COMPERE's controlled parallel architecture; only such an architecture could have made the prediction. Other architectures do not posit a single controlling process. Without such a process, they would not be able to predict that changes to the syntactic interpretation during an error recovery have immediate consequences for previously resolved associated ambiguities in semantics, and vice versa. A sequential syntax-first architecture, for instance, could not have predicted that an error recovery in a lexical semantic ambiguity would have any effect at all on an associated syntactic decision. An integrated model, where the knowledge types are integrated a priori, does not account for the types of dynamic decisions in syntax and semantics and the complex interactions between decisions at different levels modeled in COMPERE and predicted above. Even parallel models without a controlling process (i.e., without an arbitrator) could not have predicted the interactions in error recovery when there are multiple ambiguities. They do not guarantee that interactions occur between syntax and semantics, since such models cannot explain the control and coordination necessary to ensure consistency between syntax and semantics when there are multiple types of ambiguities and errors in resolving them. A more detailed discussion of the types of interactions between syntax and semantics possible in different sentence processing architectures was presented earlier in Chapter 5.
10.1.1.2 Error Recovery and Retention
The second prediction, Interactive Error Recovery, is more interesting since it tests the retention theory of error recovery (Eiselt, 1989; Holbrook, 1989; also see Chapter 3). According to the retention theory, unselected alternatives are retained for use in error recovery. Error recovery is carried out by examining retained alternatives and switching from the current, erroneous interpretation to a retained interpretation. Consider a situation such as in sentence (1) above, where a previous decision had resolved two different ambiguities. If that previous decision later turns out to be erroneous with respect to one of the ambiguities, as it does in sentence (1), then when the processor recovers from the error by switching to a retained alternative for that ambiguity, the other ambiguity must also be affected. Alternatives retained from the second ambiguity must also be examined and switched to if necessary. Otherwise, the interpretation resulting from the error recovery could be inconsistent or otherwise inaccurate.
(1)
The bugs moved into the new lounge were found quickly.

For example, in sentence (1) (see Figure 10.1 for an illustration), upon encountering the word "were," the processor detects an error in syntax since there is no way to compose this word with the previous syntactic structures. In resolving this error by switching to a reduced relative interpretation for the preceding clause, if the processor does not make any change to the status of the lexical semantic ambiguity of "bugs," the resulting interpretation would be incorrect. Insects and microphones can both be themes of move and must both be counted as possible meanings of "bugs" at this time. There is no longer a reason to rule out the second meaning since the "bugs" are not the agents of moving any more. When the processor reexamines the retained alternatives of the lexical ambiguity, three situations can result. In the case of sentence (1), the lexical ambiguity is "unresolved" since both meanings are brought back into the interpretation.1 Depending on the selectional constraints present, the ambiguity could also have been resolved differently. That is, a different meaning could have been chosen and the current one deactivated and retained, or there could be no change if the changes made for structural error recovery were orthogonal to the selectional constraints for the lexical ambiguity. Similar arguments can be made where a previously resolved lexical ambiguity (resolved by the preceding context, for instance) leads to an error and an associated structural ambiguity is affected by the error recovery. For example, the chosen meaning of the subject noun could have ruled out the main-clause interpretation, which might have to be reconsidered if the lexical disambiguation was in error. COMPERE is capable of producing correct interpretations when presented with sentences like (1) above, which have complex interactions between syntactic and semantic ambiguities.

The interactive error recovery prediction is a challenging test of the retention theory since it demands that unselected alternatives be retained in both types of ambiguity resolution: those that are resolved by new information provided by the sentence and those that are resolved automatically as a consequence of resolving other ambiguities. It also suggests an experiment to test the theory since, if retention is not capable of meeting the requirements posed by this prediction, observable errors in interpretation must occur after error recovery. We present a preliminary sketch of such an experiment below.
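The interaction can be illustrated with a toy sketch of retention-based recovery. This is our own simplification, not COMPERE's implementation: each ambiguity keeps its unselected alternatives, and a recovery that switches one ambiguity to a retained alternative re-opens any other ambiguity whose earlier resolution depended on the rejected choice.

    # Toy sketch: two linked ambiguities for sentence (1).
    # Structural: main-clause vs. reduced-relative reading at "moved".
    # Lexical: insect vs. microphone meaning of "bugs".
    COMPATIBLE = {
        ("main-clause", "insect"): True,        # insects can move themselves
        ("main-clause", "microphone"): False,   # microphones cannot be agents of "move"
        ("reduced-relative", "insect"): True,   # both can be themes of "move"
        ("reduced-relative", "microphone"): True,
    }

    def resolve(structural: set, lexical: set, chosen_structure: str):
        """Selecting a structural reading filters the lexical alternatives;
        the unselected alternatives are retained, not discarded."""
        retained_struct = structural - {chosen_structure}
        kept_lexical = {m for m in lexical if COMPATIBLE[(chosen_structure, m)]}
        retained_lexical = lexical - kept_lexical
        return chosen_structure, kept_lexical, retained_struct, retained_lexical

    def recover(retained_struct: set, lexical_all: set):
        """Error at 'were': switch to a retained structural reading and re-open
        the lexical ambiguity under the new reading."""
        new_structure = retained_struct.pop()
        reopened = {m for m in lexical_all if COMPATIBLE[(new_structure, m)]}
        return new_structure, reopened

    struct = {"main-clause", "reduced-relative"}
    lex = {"insect", "microphone"}
    chosen, kept, ret_s, ret_l = resolve(struct, lex, "main-clause")
    print(chosen, kept)                 # main-clause {'insect'}
    print(recover(ret_s, lex))          # ('reduced-relative', {'insect', 'microphone'})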
10.1.2 A Sketch of an Experiment
An experiment to test the above predictions requires the design of materials such as sentence (1) above that have two ambiguities, one syntactic and the other semantic. The two ambiguities should be such that one alternative of the first ambiguity is compatible with all alternatives of the second, but the second alternative of the first ambiguity is compatible with only one of the alternatives of the second ambiguity. In sentence (1) above, the insect meaning is compatible with both syntactic interpretations while the microphone meaning is compatible only with the reduced-relative interpretation. The sentence must also be such that there is sufficient distance between the point where the two ambiguities are first resolved (the Resolution Point in Figure 10.1) and the point where the error and the consequent recovery occur (the Error Point in Figure 10.1).2 This window, between "moved" and "were" in sentence (1), is called the Resolution Window in Figure 10.1. There should also be some words following the error point (e.g., after the word "were" in sentence (1) above), which form the Recovery Window in Figure 10.1. This will allow us to test for the current interpretations after the initial resolution but before the error, as well as after the error but before the end of the sentence. The recovery window is essential since the processor resolves many ambiguities at the end of the sentence based on available information (Bever, Garrett, and Hurtig, 1973; see also Holbrook, 1989).

1 Resolving an ambiguity here refers to the operation of selecting a proper subset from the set of possible interpretations at an ambiguity. Consequently, the term "unresolving" may be used to refer to the operation of switching back to the bigger set of possible interpretations from the previously selected subset.

2 The distance, that is, the length of the intervening phrase or clause, must be sufficient (about 600 msec; see Seidenberg, Tanenhaus, Leiman, and Bienkowski, 1982) for us to employ experimental techniques, such as lexical decision or naming, to determine which interpretations are active at that point.
[Figure 10.1: Multiple Ambiguities and Composite Errors. The sentence "The bugs moved into the new lounge were found quickly." is annotated with its two ambiguities, the Resolution Point at "moved," the Error Point at "were," the Resolution Window between them, and the Recovery Window following "were."]

The experimental technique employed must show the effects of processing loads and use such effects to test for current interpretations in the two windows mentioned above. After the resolution point, the experiment must show that both ambiguities have been resolved. This can be done, when the resolution window is sufficiently long, by showing evidence that one interpretation is active while the other is significantly less active. At a later point in the sentence, in the recovery window, we can conduct similar tests to show which interpretations are active. We can also use different types of sentences, if necessary, so that each of the three scenarios mentioned above ("unresolving," resolving differently, and no change) occurs. This experiment can produce the following outcomes:

- One possibility is that after the resolution point, all interpretations are equally active. If this happens, then the ambiguities were never resolved and not much can be said about our predictions from such a result.
- If only one ambiguity is resolved and the other is not resolved, though it could have been given the available information, then the interactive resolution prediction is not supported.
- If both ambiguities are resolved after the resolution point, though the input sentence only had explicit information sufficient to resolve one ambiguity, then the interactive resolution prediction is correct.
- If the interactive resolution prediction is correct, but the tests done after the error point reveal that the automatically resolved ambiguity was not properly corrected during error recovery, then we can conclude that the interactive recovery prediction is not supported.
- If after error recovery both ambiguities are properly resolved and the syntactic and semantic interpretations are consistent, then the interactive recovery prediction is supported.
We do not expect an outcome where the interactive resolution prediction is not supported but the error recovery prediction is supported. If both predictions are not supported, then this experiment neither supports nor contradicts retention or error recovery. If the resolution prediction is supported but the recovery prediction is not, then retention theory loses ground. We can then say that the human sentence processor might employ retention to successfully recover from errors when there are single ambiguities, but that it cannot extend these mechanisms to situations where there are multiple interacting ambiguities at different levels. On the other hand, if both predictions are supported, then retention theory must be accurate, since we would have demonstrated that the processor is capable of recovering not only from single errors when there is just one ambiguity but also when there are multiple ambiguities and composite errors. It must be noted, however, that this is a very rough and preliminary sketch of an experimental design. Considerable analysis and effort are needed to design the right type and a sufficient number of sentences for use in the experiment.3 Further, there are many variables in the complex situations described above which multiply the possibility of complex interactions and confounds.

3 We are currently working with a psycholinguist, Dr. Jennifer Holbrook, on a more concrete experimental design.
10.1.3 A Third Prediction
A third possible prediction made by COMPERE is that a lexical ambiguity such as the one in the word "bugs" in sentence (1) above remains unresolved at the end of the sentence, even though it might have been resolved temporarily until an error recovery reintroduced the ambiguity. It would be interesting to investigate this further and see whether people leave such an ambiguity unresolved, or whether they show some bias toward the original interpretation, the one that was in error, since they already made that choice once very recently. Further analysis of this prediction and an extension to the above experiment to test it are beyond the scope of this dissertation.
10.2 History of COMPERE

The history of the development of the COMPERE model constitutes an interesting case history in cognitive science research. We started out with intuitions and largely cognitive motivations. In trying to build a computational model that could explain a wider range of psycholinguistic results than previous models, we formed the hypothesis that knowledge sources must be kept separate in a sentence processor but that processing across levels must be unified (see Chapter 1). When we actually started working out the architecture of a computational model that could accommodate independent knowledge representations and a common control structure carrying out both syntactic and semantic processing, we raised several architectural issues (see Chapter 5). However, our focus was still very much on cognitive modeling and not on the computational properties of the model. At a later stage in building the model, having implemented a prototype unified process and applied it to carry out mostly syntactic analysis, the author began to abstract away the details of the program to write down the formal algorithm behind it. It then became clear that we had rediscovered left-corner parsing (see Chapter 6) in COMPERE's program. A finer analysis showed that the algorithm in COMPERE was not the same as either arc-eager or arc-standard left-corner parsers. It was also not the same as head-driven or head-corner parsing. It was found to be a unique combination of left-corner and head-driven parsing algorithms (Mahesh, 1994b). This new parsing algorithm was found to have interesting computational properties for incremental sentence processing with interactive syntax and semantics (see Chapters 6 and 9). Further along the way of building the COMPERE program, the theory of intermediate roles was introduced, first intuitively and later refined into a well-defined concept with distinct advantages for uniform representation of syntactic and semantic interpretations and for error recovery operations (Mahesh and Eiselt, 1994; also see Chapters 7 and 8). Once again, a review of the literature showed links to existing linguistic formalisms such as linking theory, macroroles, and thematic hierarchies (see Chapter 7 and Frawley, 1992). It was clear that the mechanism of intermediate roles developed in this thesis was not the same as those previous theories. Finally, it was shown that COMPERE, as a computational model, is able to make concrete and seemingly testable predictions about the human sentence processor. This completed a rare cycle in this piece of research in cognitive science, where a cognitive model developed into an interesting computational model and the computational model was able to make interesting psychological predictions. We strongly believe that a model such as COMPERE, motivated from both psychological and computational points of view, is able to provide a better constrained and more plausible account of sentence understanding phenomena than one that is based only on psychological or only on computational considerations.
10.3 Limitations of COMPERE

Like any other model, COMPERE has several limitations. Some of these limitations are merely those of the current implementation of the model; these will be discussed later in this section. First, we look at the limitations of COMPERE as a model of sentence understanding.
10.3.1 Cognitive Accuracy
While it should be emphasized that it was never a goal of this thesis to simulate exactly any particular set of psycholinguistic data, COMPERE as a cognitive model could be a better model, closer to recent psychological results, but for the following two limitations: lack of additional constraints from resource limits and the approximation introduced by the parsing algorithm. Let us examine these limitations and their implications in terms of cognitive accuracy.
10.3.1.1 Resource Limits
COMPERE's architecture with its uni ed arbitrator was designed to include constraints from resource limits in making ambiguity resolution and error recovery decisions. However, resource limits have not been included in the model for two fundamental reasons: How do we know how much the limits are? What are the numbers that de ne when a resource limit has been reached? There is no justi cation to add an arbitrarily chosen number and claim that the model considers resource limits. Limits on human working memory are dicult to transduce into computational resource limits. How do we measure the amount of resources it takes to process something in sentence understanding? There has been no known method of counting the amount of working memory or other resources it takes to maintain and process intermediate interpretations, especially when semantic representations are included. Chapter 9 provided a preliminary analysis of a proposed metric for measuring resource requirements. Given these two challenges, we can only claim that because COMPERE has a exible architecture with a uni ed process controlling the interactions between syntax and semantics, it is equipped to include resource limits in its decisions once we gure out what those limits are and how to measure resource requirements. This limitation also illustrates the distinction between a well-speci ed computational model and a vaguely or incompletely articulated model. For example, Stowe's model (1991), which admittedly was never advertised as a computational model, proposed a change in strategies between modular
and interactive behaviors based on limits on the amount of delay in making decisions permitted by resource limits. Such a model can account for human behavior in sentence processing that is highly interactive under normal circumstances but switches to modular behavior, where only syntactic information is used to make decisions, when the processor is pressed for resources by an increase in the load of processing a sentence. However, as a true computational model, COMPERE cannot implement such a theory because of the two difficulties in modeling memory limits mentioned above. One consequence of this difference between the cognitive model put forth by Stowe (1991) and a computational instantiation of COMPERE can be seen in the now familiar examples.

(2a) The officers taught at the academy were very demanding.
(2b) The courses taught at the academy were very demanding.
(2c) The children taught at the academy.
We have seen previously that sentence (2a) results in garden-path behavior because the syntactic ambiguity at "taught" is resolved incorrectly at that point using syntactic preferences, since there is no semantic bias in this example. However, no significant difficulties are expected in processing either sentence (2b) or (2c). Given a semantic (or, rather, conceptual) bias that only adult, animate entities can be teachers, we see that neither "courses" nor "children" are fit to be agents of teaching. Thus the main-clause interpretation should not be pursued in either sentence. While this strategy can explain why sentence (2b) is not a garden path and is much easier to process than sentence (2a), it is unable to explain why sentence (2c) is not a strong garden path like sentence (2a). In COMPERE, the reduced relative is selected for both (2b) and (2c), which results in an error at the end of the sentence in (2c). While COMPERE is able to recover from the error (see Chapter 8) and ultimately produce the correct interpretation for sentence (2c), one where syntax overrules the semantic bias to allow "children" to be the agents of teaching, COMPERE is unable to explain the difference in difficulty between sentences (2a) and (2c). Stowe's model explained this difference by claiming that the structural ambiguity at "taught" in sentences (2b) and (2c) is not resolved immediately, since syntax and semantics have conflicting preferences at this point. The decision is delayed and both interpretations are pursued until a later point when resource limits force a decision to be made. Sentences (2b) and (2c) presumably do not reach resource limits, and hence the sentence processor can proceed with the reduced-relative interpretation in (2b) and the main-clause interpretation in (2c). Given the above limitation on modeling resource limits, COMPERE is unable to implement these types of resource-controlled delay strategies. We strongly believe that there is not much merit in introducing arbitrary numbers into the COMPERE program and fine-tuning them to fit the model accurately to any one piece of data collected from one set of experiments. Future developments in distributed models of computation might provide us with new ways of modeling factors such as resource limits.
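To make the conceptual bias discussed above concrete, the following toy sketch shows the kind of selectional check that rules out "courses" and "children" as agents of teaching. The concept hierarchy, function names, and values are invented for illustration; this is not COMPERE's actual knowledge representation or code.

# Illustrative sketch (not COMPERE's actual code) of a conceptual bias check of the
# kind discussed above, where only adult, animate entities are fit agents of teaching.
# The toy hierarchy and the function names here are hypothetical.

CONCEPT_PARENTS = {
    "officer": "adult",
    "adult": "animate",
    "child": "animate",
    "course": "inanimate",
}

def is_a(concept, ancestor):
    """Walk the toy hierarchy upward to test concept membership."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = CONCEPT_PARENTS.get(concept)
    return False

def fit_as_agent_of_teaching(concept):
    """Selectional constraint: agents of teaching must be adult and animate."""
    return is_a(concept, "adult") and is_a(concept, "animate")

for noun in ("officer", "course", "child"):
    reading = "main clause" if fit_as_agent_of_teaching(noun) else "reduced relative"
    print(f"{noun!r}: prefer {reading} reading at 'taught'")

With such a check, the reduced-relative reading is selected for both (2b) and (2c), which is exactly why COMPERE must later recover from an error in (2c) but cannot, by itself, explain why (2c) is less disruptive than (2a).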
10.3.1.2 Flexible Parsing: Eager HSLC
Though the Head-Signaled Left-Corner parsing algorithm enables COMPERE to make syntactic and semantic compositions at a better sequence of points in the sentence than other algorithms such as bottom-up, top-down, or other left-corner parsers, it still leaves room for improvement. HSLC is somewhat conservative and at times delays commitments more than one would desire for reasons of better cognitive accuracy. When an adjunct such as a prepositional phrase (PP) is to be attached, HSLC delays the attachment until the head of the PP (i.e., the NP) is processed. However, it may sometimes be possible to attach the PP uniquely even before the head noun in the PP. For example, Ferstl (1994; see also Carpenter and Just, 1988; Holbrook, 1989) has found that the location of PP
attachment in the human sentence processor is dependent on the context. It can be before or after the head, depending on whether meanings in the prefix (i.e., before the head) were mentioned in prior context (see Chapter 3). Early attachment may also be made possible by recency and frequency information (e.g., Pearlmutter, Daugherty, MacDonald, and Seidenberg, 1994). When early attachment is possible, Arc-Eager Left-Corner (AELC) parsing gives a better account of the attachment position than HSLC (see Chapter 6). In fact, if attachment occurs between the left corner and the head, a strategy in between AELC and HSLC will be necessary to produce such behavior. There may be still other sources of information, such as statistical information from a corpus, lexical subcategorizations of the verb for particular prepositions, or other contextual biases, that would enable a justified early attachment immediately after the preposition. Such a source of information could be used by COMPERE's unified arbitrator mechanism if it is available from the lexical entries of the words. COMPERE's HSLC algorithm, however, would not make the early commitment even when such information is available. COMPERE is thus conservative and makes commitments late in certain cases; on the other hand, it does not make unjustified early commitments in other cases. A possible improvement to COMPERE is to check whether a preference is available immediately after the preposition, say, and make a commitment if it is. That is, COMPERE would switch strategies between AELC and HSLC depending on the availability of outside information. There may also be situations where COMPERE might have to wait after the head, to see some other required unit to the right of the head, before being able to commit to an attachment. In this case, COMPERE might be improved to switch to a strategy closer to ASLC by delaying the attachment. The quantitative questions of how often such an early commitment can be justified and how often the commitment leads to an error can only be answered by an empirical study over a corpus of sentences. It may also be noted that the source of lexicalized preferences is left unspecified in COMPERE; such preferences can come from conceptual knowledge of slot-filler preferences, from statistical collocation data, or from other licensing preferences. Even if other sources of information permit earlier commitments at times, communication with semantics (where semantics is the process of selecting and composing meanings of words to form composite meanings) should be delayed until a content word with meaning has been processed. Based on this analysis, we can propose a new parsing algorithm that is flexible and changes strategies between AELC and HSLC. We call such an algorithm "Eager HSLC" to indicate that it is HSLC, but is eager and makes earlier attachments when possible. In other words, Eager HSLC keeps attempting an attachment and makes one as soon as a unique attachment is possible. If the attachment is ambiguous, it delays the attachment until the head of the adjunct. It must be noted, however, that according to the performance analysis presented in Chapter 9, an early attachment, even when it is unambiguous, adds to the cost, since the syntactic link added early on must be retained until the meaning of the head flows through the link. This is the additional cost of making the early attachment. There may, however, be benefits from the early attachment if it can help resolve any ambiguities within the adjunct.
We can also expand the flexibility of this new algorithm (i.e., widen the range of parsing algorithms within which the flexible algorithm can switch strategies) towards ASLC by relaxing the definition of the head of a phrase. If we allow the head to include certain required arguments that follow the head word, the parsing algorithm can be made to wait until the head word and those required units are processed before making attachments. This results in an algorithm that can take any position in the spectrum of parsing strategies (see Chapter 6), from AELC all the way to ASLC, depending on the context.
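The decision rule behind the proposed Eager HSLC can be summarized in a small sketch. This is a hypothetical illustration of the strategy switching described above, not an implemented part of COMPERE; the function and argument names are invented.

# Hypothetical sketch of the "Eager HSLC" decision rule proposed above.
# It is not COMPERE's implementation; the data structures are invented for illustration.

def eager_hslc_attach(candidate_sites, head_seen, required_units_pending):
    """
    Decide whether to attach an adjunct (e.g., a PP) now or to delay.

    candidate_sites        -- attachment sites currently licensed by syntax,
                              lexical subcategorization, context, etc.
    head_seen              -- True once the head of the adjunct has been processed
    required_units_pending -- True if required units to the right of the head
                              are still unprocessed (pushes the strategy toward ASLC)
    """
    if len(candidate_sites) == 1:
        # Eager (AELC-like) commitment: unique attachment, attach immediately,
        # even before the head of the adjunct.
        return ("attach", candidate_sites[0])
    if not head_seen or required_units_pending:
        # HSLC/ASLC-like behavior: ambiguous or under-informed, so delay.
        return ("delay", None)
    # Head (and any required units) processed but still ambiguous:
    # hand the decision to the arbitrator's preferences.
    return ("arbitrate", candidate_sites)

print(eager_hslc_attach(["VP"], head_seen=False, required_units_pending=False))
print(eager_hslc_attach(["VP", "NP"], head_seen=False, required_units_pending=False))
print(eager_hslc_attach(["VP", "NP"], head_seen=True, required_units_pending=False))

As noted above, even a unique early attachment is not free: the syntactic link committed to early must be retained until the meaning of the head flows through it, so the eager strategy trades this cost against the chance of resolving ambiguities inside the adjunct sooner.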
10.3.2 Other Limitations
Many of the assumptions made in this work (see Chapter 2) can be construed as limitations of COMPERE. However, they would all be limitations in the functionality of the task modeled by COMPERE, not limitations of the model itself. For instance, COMPERE does not model inference or any non-linguistic reasoning. But the goal of this thesis was to build a model of sentence processing that produced linguistic meanings of sentences, not to build a model that would produce conceptual interpretations inferred from linguistic meanings as appropriate to the situation of a non-linguistic task or domain. Limitations of the model that are issues in sentence processing are more interesting, however. The one that comes to mind first is reference resolution. Though resolving references often involves intersentential interactions, thereby making it a part of the discourse processing problem, references may also need to be resolved across different phrases or clauses within a sentence. COMPERE should be extended to deal with such reference resolution. Another drawback of the current model is that the same knowledge is applied to a particular sentence processing situation no matter in what context the situation occurs. This may be changed to allow the context to have an effect on sentence processing. For example, the preferences for different interpretations could be changed by the context so that the same sentence produces different behaviors in COMPERE depending on the context. This may be accomplished either by allowing the context (i.e., a discourse processor or a non-linguistic reasoning program that determines what the context is) to communicate directly with the arbitrator, or by allowing it to modify the preferences directly at the time of lexical access, depending on whether the contextual preferences are available for individual word meanings (i.e., concepts) or for bigger linguistic units.
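The second option, letting context modify preferences at the time of lexical access, might look roughly like the following sketch. The function, the word senses, and the numeric weights are all hypothetical and are used only to illustrate the idea; they are not part of COMPERE.

# Hypothetical illustration of letting context re-weight interpretation preferences
# at lexical access time, as suggested above; none of these names are COMPERE's.

def apply_context_bias(preferences, context_bias):
    """Scale the default preference of each candidate interpretation by any
    bias the current discourse or reasoning context supplies for it."""
    return {
        interpretation: weight * context_bias.get(interpretation, 1.0)
        for interpretation, weight in preferences.items()
    }

# Default (context-free) preferences for an ambiguous word, e.g., "bank".
defaults = {"bank/financial-institution": 0.7, "bank/river-edge": 0.3}

# A discourse about fishing might boost the river-edge sense.
fishing_context = {"bank/river-edge": 3.0}

biased = apply_context_bias(defaults, fishing_context)
best = max(biased, key=biased.get)
print(biased, "->", best)

The same hook could equally well be attached to the arbitrator itself, which corresponds to the first option mentioned above.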
10.3.3 Limitations of the Current COMPERE Program
The current implementation of COMPERE has several limitations. Many of the limitations listed below would require a large engineering effort to remedy. The lexicon is currently very limited: COMPERE knows about 600 words, of which less than half have their meanings encoded. Extending the syntactic knowledge in the lexicon will be less difficult given the availability of on-line lexicons. However, adding meanings to all the words will require a substantial amount of work, since the corresponding meaning representations and conceptual knowledge, such as selectional constraints and concept hierarchies (see Chapter 8), will also have to be encoded. The grammar is limited to a certain extent. For example, COMPERE's grammar cannot deal with a relative clause hanging off a prepositional phrase, as in sentence (3) below.
(3) The academy at which the officers taught is full of demanding courses.
COMPERE's semantic knowledge is also considerably limited at this time. Its knowledge of semantic roles and intermediate roles has been confined to thematic roles, ignoring other semantic elements such as spatial roles, modifiers, tense, time, and aspect. For example, only a handful of prepositions have their primitive and intermediate roles represented. A substantial amount of work is needed to define and encode all the mappings from closed-class elements such as prepositions to all the elements of linguistic semantics. For instance, most of the elements in Frawley (1992) must be encoded in COMPERE's formalisms and representations. This, being linguistic knowledge, must be added to COMPERE before it can be integrated with discourse and reasoning systems.
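To make the scale of this effort concrete, the sketch below shows the kind of information an extended lexical entry would need to carry. The format, field names, and values are invented for illustration and are not COMPERE's actual lexicon representation.

# Invented illustration of the kinds of information a fuller lexicon would need
# to record for open-class and closed-class words; this is not COMPERE's actual
# lexicon format, and the field names and values are hypothetical.

lexicon = {
    "taught": {
        "category": "verb",
        "concept": "teach",
        # Selectional constraints on the fillers of the concept's thematic roles.
        "role_constraints": {"agent": ["adult", "animate"]},
    },
    "at": {
        "category": "preposition",
        # Closed-class items map to intermediate roles rather than to concepts.
        "intermediate_roles": ["location", "time-point"],
    },
    "academy": {
        "category": "noun",
        "concept": "school",
    },
}

# Example lookup: the constraints a verb imposes on its agent.
print(lexicon["taught"]["role_constraints"]["agent"])   # ['adult', 'animate']

Encoding entries of this kind for every content word, together with the concept hierarchies and constraints they refer to, is the bulk of the engineering effort described above.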
10.4 Other Directions for Future Work
Although we have been exploring and suggesting directions for future research throughout the discussion of limitations above, there are other places, not directly related to those limitations, where further work could be fruitful. Some of them are listed below.
At present, COMPERE does not model parsing breakdown. For instance, it does not experience much difficulty in processing center-embedded sentences. We have to find a good method of imposing resource limitations so that we can model parsing breakdown behavior.

It may not be necessary to build an explicit representation of a syntactic parse tree for a sentence. Intuitively, it seems possible to use syntactic cues to influence thematic role assignment without always building a parse tree. However, it is not clear how we can do less work in syntax and yet achieve the same end result. Further research on intermediate roles and the uniform representation of syntactic and semantic knowledge might yield better ways of representing syntactic interpretations. Such a representation might lead to an even smoother integration of syntax and semantics in sentence processing.

Another area that requires further investigation concerns the hypothesis that intermediate roles are powerful enough to capture syntactic transformations. COMPERE has already shown that the mechanism of intermediate roles is sufficient to deal with passive transformations. It remains to be seen if intermediate roles are powerful enough to represent the transformation from surface to deep syntactic structure in general.

At some point, we would like to extend COMPERE beyond sentence understanding. Some of this work is already under way in two separate projects, in which COMPERE is being integrated with ISAAC, a discourse processor for understanding science fiction stories (Moorman and Ram, 1994), and with KA, a non-linguistic reasoner that acquires knowledge of the design of physical devices by reading natural language descriptions of the devices (Mahesh, Peterson, Goel, and Eiselt, 1994; Peterson, Mahesh, and Goel, 1994). However, in both these projects COMPERE is currently more of a front-end to the system than an integrated part of the whole system: COMPERE's output is being used in these systems, but the reasoning systems are not in any way influencing sentence interpretation. Such influence can be achieved by working out methods for allowing the reasoning situation or context to affect the preferences in COMPERE, as outlined above. Integrating COMPERE's sentence processing model into other reasoning systems has potential use in a variety of applications such as information retrieval, information extraction and knowledge acquisition, conceptual information processing, and machine translation.

COMPERE's theory has only been tested with English. The model of sentence processing can be tested with other natural languages. This would enable COMPERE to serve as a computational model for applications in machine translation. It is particularly interesting given the uniform representation of syntactic and semantic interpretations in COMPERE using intermediate roles, which are independent of the strong influence of word order as a linguistic cue in English.

Further evaluation of the COMPERE model can be carried out by a combination of additional formal analysis and evaluations based on empirical factors and tradeoffs, such as the ones discussed in Chapter 9. However, these empirical tests first require that the implementation be engineered to deal with a large number of sentences. Empirical results, when combined with formal models of performance analysis, will not only tell how well the model did in practice,
but also why it did well on certain sentences and not so well on others. For instance, one could evaluate the Eager HSLC algorithm proposed above in this manner.
CHAPTER XI
CONCLUSIONS

...the control structure will necessarily need to be highly dynamic, using and exploiting information from syntax, semantics, and context, wherever it is best defined at a given moment, to lead the system to the appropriate interpretation.
J. Allen, 1989

Our model of sentence understanding, COMPERE, has demonstrated that its computational architecture, with a unified process and independent knowledge sources, can explain how a sentence processor could produce modular behaviors in some situations and interactive behaviors in others. COMPERE can process a sentence at the lexical, syntactic, and semantic levels; it can resolve both lexical and structural ambiguities, whether syntactic or semantic; and it can recover from errors in resolving both syntactic and semantic ambiguities. Its unified arbitrator, its Head-Signaled Left-Corner parsing algorithm, and its uniform representation of syntactic and semantic interpretations using Intermediate Roles have shown distinct advantages both in computational terms and in modeling a wide range of psycholinguistic findings. In addition to modeling known behaviors, COMPERE has generated predictions about new situations in sentence processing as well as several insightful ideas about the interactions between syntax and semantics in sentence understanding.
11.1 Issues Addressed
In this work, we have addressed several issues of significance to the study of natural language understanding:

How does the language processor integrate different kinds of knowledge to lead to the appropriate interpretation of a sentence? We have shown how the application of a single unified process to the multiple sources of knowledge can result in a smooth integration of knowledge from the different sources. We have also appealed to the integrated processing principle and shown that such interaction can happen incrementally, at the earliest opportunities in sentence processing.

How does the sentence processor integrate information from multiple sources as often and as soon as possible and yet retain the ability to use them independently? This, in essence, is the modularity debate in the study of human language processing. We have shown that by keeping the knowledge sources independent of one another and unifying the process, we can support early and frequent interaction between the uses of the different knowledge sources without sacrificing the functional independence between the different types of knowledge.

How does the language understander cope with the variety of ambiguities in natural languages? How are decisions made and conflicts resolved to select unique interpretations of sentences? We have shown that the unified arbitrator can resolve different kinds of ambiguities without having to do an exhaustive search or much wasteful backtracking.
How does the language understander recover from errors it makes along the way due to the lack of sufficient information at different points in sentence processing? Our model maintains an on-line interpretation of any portion of the input that is the best with respect to all the information available at that point in processing. Our model also retains information about alternative interpretations so that it can switch to one of the retained interpretations, or otherwise repair the erroneous interpretation, when later information proves the current one wrong (a toy sketch of this retain-and-switch strategy is given below). We have shown that this method, proposed originally for semantic and pragmatic error recovery (e.g., Eiselt, 1989), is applicable to syntactic error recovery as well.

When does the sentence processor jump to a decision and when does it delay a commitment? What is the time course of decisions made by the human sentence processor in processing different sentences? We have developed the HSLC parsing algorithm, which produces a sequence of decisions that is closer to the desired time course than the ones produced by other parsers. We have also shown how the unified arbitrator is capable of delaying decisions, and provided an architecture that could accommodate additional cognitive factors such as working memory limitations.
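As an illustration of the retain-and-switch idea, the following toy sketch keeps the best-scoring interpretation as the current one, retains the alternatives, and switches when later evidence rejects the current choice. The class, the candidate names, and the scores are hypothetical; this is not COMPERE's actual code.

# Toy sketch (not COMPERE's code) of the retain-and-switch idea described above:
# commit to the best current interpretation but keep the alternatives around
# so that a later contradiction triggers a switch rather than a full reparse.

class InterpretationStore:
    def __init__(self, candidates):
        # candidates: list of (interpretation, score) pairs from the arbitrator.
        ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
        self.current = ranked[0][0]
        self.retained = [c[0] for c in ranked[1:]]

    def incorporate(self, evidence_rejects):
        """If new evidence rejects the current interpretation, switch to the
        best retained alternative instead of backtracking from scratch."""
        if evidence_rejects(self.current) and self.retained:
            self.current = self.retained.pop(0)
        return self.current

store = InterpretationStore([("main-clause", 0.6), ("reduced-relative", 0.4)])
# Later input (e.g., the verb "were") rules out the main-clause reading:
print(store.incorporate(lambda interp: interp == "main-clause"))  # -> reduced-relative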
11.2 Contributions
This work has made contributions to several areas of research related to language processing. The contributions, along with the areas of language study involved, are listed below.
The architecture of the sentence processor, with its unified process and separate knowledge sources, is a contribution to the multiple disciplines of natural language processing and psycholinguistics. It provides a new architecture for building natural language processing systems that can exploit any piece of available knowledge without demanding complete knowledge of any one type. In psycholinguistics, it provides a well-specified computational model that demonstrates how the human sentence processor could produce both modular and interactive behaviors in a single architecture. It also contributes to artificial intelligence a blackboard architecture with an additional arbitrating process that monitors and controls the interactions occurring through the blackboard.

The HSLC parsing algorithm, a unique combination of left-corner and head-driven parsing algorithms, is a contribution to computational linguistics and natural language processing.

The concept of intermediate roles and their use in uniformly representing syntactic and semantic interpretations is a contribution to both computational linguistics and natural language processing.

The error recovery and repair methods demonstrated in COMPERE are contributions to natural language processing, artificial intelligence, and psycholinguistics.

The theoretical framework of an "enhanced graph-structured pushdown automaton" described in Chapter 9, which enables formal analyses of the performance of sentence processors, is a contribution to computational linguistics and to computer science in general.

The psychological predictions (see Chapter 10) are potential contributions to psycholinguistics.

The history of COMPERE (see Chapter 10) illustrated how our cognitive motivations based on psychological observations led to a cognitive model that exhibited interesting computational properties, which in turn led to cognitive predictions and suggested psychological
experiments. This methodological cycle serves as an interesting illustration of the benefits of interdisciplinary studies in the cognitive sciences.

The notion of the emergence of roles (see Chapter 7) from primitive roles initiated from the lexicon, through progressive refinement via intermediate roles using a variety of different types of linguistic cues and knowledge, is a characterization of semantic processing that could prove particularly useful in studies of bilingual (or multilingual) language processing and machine translation.
11.3 Conclusion
Building a complete computational model of sentence understanding is a difficult, unsolved problem. The task of sentence understanding requires a variety of different types of knowledge, not all of which can be assumed to be available in all situations. When and how the different types of knowledge are integrated during sentence interpretation makes a big difference in the amount of local ambiguity that the processor must struggle with. In this thesis, we have presented our computational model of sentence understanding, which has a flexible architecture that allows any and all available knowledge to be exploited, at the right points during sentence understanding, to produce the best interpretations derivable from a sentence and the knowledge available. We have presented the algorithms from which it is built, shown examples of its workings, and discussed both the cognitive and computational motivations for the model and the implications of its claims. We conclude from this study that our initial hypothesis, that a sentence processor has a unified process applied independently to multiple knowledge sources, provides an answer to the modularity debate and explains, better than other possible architectures, how and why the human sentence processor produces the wide variety of behaviors that it does.
BIBLIOGRAPHY

Abney, S. P. (1989). A computational model of human parsing. Journal of Psycholinguistic Research, 18(1):129-144.
Abney, S. P. and Johnson, M. (1991). Memory requirements and local ambiguities of parsing strategies. Journal of Psycholinguistic Research, 20(3):233-250.
Aho, A. V. and Ullman, J. D. (1972). The Theory of Parsing, Translation, and Compiling: Volume 1: Parsing. Prentice Hall.
Allen, J. (1987). Natural Language Understanding. The Benjamin/Cummings Publishing Company, Inc.
Allen, J. (1989). Natural language understanding, Section G: Conclusion. In Barr, A., Cohen, P. R., and Feigenbaum, E. A., editors, The Handbook of Artificial Intelligence, Volume IV, pages 238-239. Addison-Wesley Publishing Company, Inc.
Altmann, G. T. M., Garnham, A., and Dennis, Y. (1992). Avoiding the garden path: Eye movements in context. Journal of Memory and Language, 31:685-712.
Bader, M. and Lasser, I. (1993). In defense of top-down parsing: Evidence from German. In Proceedings of the Sixth Annual CUNY Sentence Processing Conference, Amherst, MA, Mar 18-20, 1993.
Barr, A. and Feigenbaum, E. A. (1981). The Handbook of Artificial Intelligence, Volume 1. Addison-Wesley Publishing Company.
Bates, E., Wulfeck, B., and MacWhinney, B. (1991). Cross-linguistic research in aphasia: An overview. Brain and Language, 41(2):123-148.
Berwick, R. C. (1993). Cartesian parsing. Talk presented at the Sixth Annual CUNY Sentence Processing Conference, Amherst, MA, Mar 18-20, 1993.
Bever, T., Garrett, M., and Hurtig, R. (1973). The interaction of perceptual processes and ambiguous sentences. Memory and Cognition, 1:277-286.
Bierce, A. (1911). The Devil's Dictionary. Dover Publications, Inc.
Birnbaum, L. (1986). Integrated Processing in Planning and Understanding. PhD thesis, Yale University, Department of Computer Science, New Haven, CT. Research Report #489.
Birnbaum, L. (1989). A critical look at the foundations of autonomous syntactic analysis. In Proceedings of the Eleventh Annual Conference of the Cognitive Science Society, pages 99-106. Cognitive Science Society.
Birnbaum, L. (1991). Three critical essays on language and representation: A critical look at the foundations of autonomous syntactic analysis. Tech Report 18, Institute for Learning Sciences, Northwestern University.
Birnbaum, L. and Selfridge, M. (1981). Conceptual analysis of natural language. In Schank, R. and Riesbeck, C., editors, Inside Computer Understanding, pages 318-353. Lawrence Erlbaum Associates.
Blackwell, A. and Bates, E. (1994). Inducing agrammatic profiles in normals. In Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society, pages 45-50. Hillsdale, NJ: Lawrence Erlbaum.
Bresnan, J. and Kaplan, R. (1982). Introduction: Grammars as mental representations of language. In Bresnan, J., editor, The Mental Representation of Grammatical Relations. MIT Press.
Britt, M., Gabrys, G., and Perfetti, C. (1993). A restricted interactive model of parsing. In Proceedings of the Fifteenth Annual Conference of the Cognitive Science Society, pages 260-265. Lawrence Erlbaum Associates.
Burgess, C. and Lund, K. (1994). Multiple constraints in syntactic ambiguity resolution: A connectionist account of psycholinguistic data. In Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society, pages 90-95. Lawrence Erlbaum Associates.
Burgess, C. and Simpson, G. B. (1988). Neuropsychology of lexical ambiguity resolution: The contribution of divided visual field studies. In Small, S. L., Cottrell, G. W., and Tanenhaus, M. K., editors, Lexical Ambiguity Resolution: Perspectives from Psycholinguistics, Neuropsychology, and Artificial Intelligence, pages 411-430. Morgan Kaufmann Publishers.
Burgess, C., Tanenhaus, M., and Hoffman, M. (1994). Parafoveal and semantic effects on syntactic ambiguity resolution. In Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society, pages 96-99. Lawrence Erlbaum Associates.
Caramazza, A. and Berndt, R. S. (1978). Semantic and syntactic processes in aphasia: A review of the literature. Psychological Bulletin, 85:898-918.
Cardie, C. and Lehnert, W. (1991). A cognitively plausible approach to understanding complex syntax. In Proceedings of the Ninth National Conference on Artificial Intelligence, pages 117-124. Morgan Kaufmann.
Carlson, G. N. and Tanenhaus, M. K. (1988). Thematic roles and language comprehension. In Wilkins, W., editor, Syntax and Semantics, Vol. 21: Thematic Relations. Academic Press.
Carpenter, P. A. and Daneman, M. (1981). Lexical retrieval and error recovery in reading: A model based on eye fixations. Journal of Verbal Learning and Verbal Behavior, 20:137-160.
Carpenter, P. A. and Just, M. A. (1988). The role of working memory in language comprehension. In Klahr, D. and Kotovsky, K., editors, Complex Information Processing: The Impact of Herbert A. Simon. Erlbaum.
Carpenter, P. A., Miyake, A., and Just, M. A. (1994). Working memory constraints in comprehension: Evidence from individual differences, aphasia, and aging. In Gernsbacher, M., editor, Handbook of Psycholinguistics. Academic Press.
Carroll, D. (1986). Psychology of Language. Monterey, CA: Brooks/Cole.
Charniak, E. (1983). Passing markers: A theory of contextual influence in language comprehension. Cognitive Science, 7:171-190.
Chomsky, N. (1957). Syntactic Structures. Mouton.
Clifton, C. and Ferreira, F. (1987). Modularity in sentence comprehension. In Garfield, J. L., editor, Modularity in Knowledge Representation and Natural-Language Understanding. MIT Press.
Cottrell, G. and Small, S. (1983). A connectionist scheme for modelling word sense disambiguation. Cognition and Brain Theory, 6:89-120.
Cottrell, G. W. (1985). Connectionist parsing. In Proceedings of the Seventh Annual Conference of the Cognitive Science Society, Irvine, CA, pages 201-211.
Covington, M. A. (1990). A dependency parser for variable-word-order languages. Research Report AI-1990-01, Artificial Intelligence Programs, The University of Georgia, Athens, GA.
Crain, S. and Steedman, M. (1985). On not being led up the garden path: The use of context by the psychological syntax processor. In Dowty, D. R., Karttunen, L., and Zwicky, A. M., editors, Natural Language Parsing: Psychological, Computational, and Theoretical Perspectives. Cambridge University Press.
Crocker, M. W. (1993). Properties of the principle-based sentence processor. In Proceedings of the Fifteenth Annual Conference of the Cognitive Science Society, pages 371-376. Lawrence Erlbaum Associates.
Cullingford, R. (1978). Script application: Computer understanding of newspaper stories. PhD thesis, Yale University, Department of Computer Science, New Haven, CT. Research Report #116.
Daneman, M. and Carpenter, P. (1980). Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior, 19:450-466.
DeJong, G. (1979). Skimming stories in real time: An experiment in integrated understanding. PhD thesis, Yale University, Department of Computer Science, New Haven, CT. Research Report #158.
DeJong, G. (1982). An overview of the FRUMP system. In Lehnert, W. G. and Ringle, M. H., editors, Strategies for Natural Language Processing. Lawrence Erlbaum.
Earley, J. (1970). An efficient context-free parsing algorithm. Communications of the ACM, 13(2):94-102.
Eiselt, K., Mahesh, K., and Holbrook, J. (1993). Having your cake and eating it too: Autonomy and interaction in a model of sentence processing. In Proceedings of the Eleventh National Conference on Artificial Intelligence (AAAI-93), pages 380-385. AAAI Press and The MIT Press.
Eiselt, K. P. (1989). Inference Processing and Error Recovery in Sentence Understanding. PhD thesis, University of California, Irvine, CA. Tech. Report 89-24.
Eiselt, K. P. and Holbrook, J. K. (1991). Toward a unified theory of lexical error recovery. In Proceedings of the Thirteenth Annual Conference of the Cognitive Science Society. Cognitive Science Society.
Empson, W. (1956). Seven Types of Ambiguity. Chatto and Windus, London.
Erman, L., Hayes-Roth, F., Lesser, V., and Reddy, D. (1980). The Hearsay-II speech understanding system: Integrating knowledge to resolve uncertainty. ACM Computing Surveys, 12:213-253.
Ferreira, F. and Clifton, C. (1985). The independence of syntactic processing. Journal of Memory and Language, 25:348-368.
Ferreira, F. and Henderson, J. M. (1991). Recovery from misanalyses of garden-path sentences. Journal of Memory and Language, 30:725-745.
Ferstl, E. (1994). Context effects in syntactic ambiguity resolution: The location of prepositional phrase attachment. In Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society, pages 295-300. Lawrence Erlbaum Associates.
Fodor, J. A. (1983). The Modularity of Mind. The MIT Press, Cambridge, MA.
Fodor, J. A. (1987). Modules, frames, fridgeons, sleeping dogs, and the music of the spheres. In Garfield, J. L., editor, Modularity in Knowledge Representation and Natural-Language Understanding. MIT Press.
Foley, W. and van Valin, R. (1984). Functional Syntax and Universal Grammar. Cambridge University Press.
Forster, K. I. (1979). Levels of processing and the structure of the language processor. In Cooper, W. E. and Walker, E. C. T., editors, Sentence Processing: Psycholinguistic Studies Presented to Merrill Garrett. Lawrence Erlbaum Associates.
Frawley, W. (1992). Linguistic Semantics. Lawrence Erlbaum Associates.
Frazier, L. (1987). Theories of sentence processing. In Garfield, J. L., editor, Modularity in Knowledge Representation and Natural Language Understanding. MIT Press.
Frazier, L. (1989). Against lexical generation of syntax. In Marslen-Wilson, W., editor, Lexical Representation and Process. MIT Press.
Frazier, L. and Fodor, J. D. (1978). The sausage machine: A new two-stage parsing model. Cognition, 6:291-325.
Frazier, L. and Rayner, K. (1982). Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences. Cognitive Psychology, 14:178-210.
Gernsbacher, M. (1994). Handbook of Psycholinguistics. Academic Press.
Gorrell, P. (1987). Studies of human syntactic processing: Ranked-parallel versus serial models. PhD thesis, University of Connecticut. Unpublished Ph.D. dissertation.
Grishman, R. (1986). Computational Linguistics: An Introduction. Cambridge University Press.
Halliday, M. and Martin, J. (1993). Writing Science: Literacy and Discursive Power. University of Pittsburgh Press.
Hirst, G. (1988). Semantic interpretation and ambiguity. Artificial Intelligence, 34:131-177.
Holbrook, J. K. (1989). Studies of inference retention in lexical ambiguity resolution. PhD thesis, School of Social Sciences, University of California, Irvine.
Holbrook, J. K., Eiselt, K. P., Granger, R. H., and Matthei, E. H. (1988). (Almost) never letting go: Inference retention during text understanding. In Small, S. L., Cottrell, G. W., and Tanenhaus, M. K., editors, Lexical Ambiguity Resolution: Perspectives from Psycholinguistics, Neuropsychology, and Artificial Intelligence, pages 383-409. Morgan Kaufmann Publishers.
Holbrook, J. K., Eiselt, K. P., and Mahesh, K. (1992). A unified process model of syntactic and semantic error recovery in sentence understanding. In Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society, pages 195-200. Cognitive Science Society.
Holmes, V. M., Stowe, L., and Cupples, L. (1989). Lexical expectations in parsing complement-verb sentences. Journal of Memory and Language, 28:668-689.
Jackendoff, R. (1983). Semantics and Cognition. The MIT Press.
Jacobs, P. S. (1992). Parsing run amok: Relation-driven control for text analysis. In Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI-92), pages 315-321.
Jacobs, P. S., Krupka, G. R., McRoy, S. W., Rau, L. F., Sondheimer, N. K., and Zernik, U. (1990). Generic text processing: A progress report. In Proceedings of the DARPA Speech and Natural Language Workshop, pages 359-364. Morgan Kaufmann Publishers.
Jarvella, R. (1971). Syntactic processing of connected speech. Journal of Verbal Learning and Verbal Behavior, 10:409-416.
Jensen, K. (1993). PEG: The PLNLP English grammar. In Jensen, K., Heidorn, G. E., and Richardson, S. D., editors, Natural Language Processing: The PLNLP Approach, pages 29-45. Kluwer Academic Publishers. Chapter 3.
Johnson-Laird, P. (1983). Mental Models: Towards a Cognitive Science of Language, Inference, and Consciousness. Harvard University Press.
Jurafsky, D. (1991). An on-line model of human sentence interpretation. In Proceedings of the Thirteenth Annual Conference of the Cognitive Science Society, pages 449-454. Cognitive Science Society.
Jurafsky, D. (1992). An on-line computational model of human sentence interpretation. In Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI-92), pages 302-308.
Just, M. A. and Carpenter, P. (1980). A theory of reading: From eye fixations to comprehension. Psychological Review, 87:329-354.
Just, M. A. and Carpenter, P. (1987). The Psychology of Reading and Language Comprehension. Allyn and Bacon, Newton, MA.
Just, M. A. and Carpenter, P. (1992). A capacity theory of comprehension: Individual differences in working memory. Psychological Review, 99(1):122-149.
Kimball, J. (1973). Seven principles of surface structure parsing. Cognition, 2:15-47.
King, J. and Just, M. A. (1991). Individual differences in syntactic processing: The role of working memory. Journal of Memory and Language, 30:580-602.
Kurtzman, H. S. (1985). Studies in Syntactic Ambiguity Resolution. PhD thesis, Massachusetts Institute of Technology.
Laird, J., Newell, A., and Rosenbloom, P. (1987). Soar: An architecture for general intelligence. Artificial Intelligence, 33:1-64.
Lebowitz, M. (1983). Memory-based parsing. Artificial Intelligence, 21:363-404.
Lehman, J. F., Lewis, R. L., and Newell, A. (1991). Integrating knowledge sources in language comprehension. In Proceedings of the Thirteenth Annual Conference of the Cognitive Science Society, pages 461-466.
Lehnert, W. and Sundheim, B. (1991). A performance evaluation of text-analysis technologies. AI Magazine, 12(3):81-94.
Lehnert, W. G., Dyer, M. G., Johnson, P. N., Yang, C. J., and Harley, S. (1983). BORIS: An experiment in in-depth understanding of narratives. Artificial Intelligence, 20(1):15-62.
Lewis, R. L. (1992). A computational theory of human sentence comprehension. PhD Thesis Proposal, School of Computer Science, Carnegie Mellon University.
Lewis, R. L. (1993a). An architecturally-based theory of human sentence comprehension. In Proceedings of the Fifteenth Annual Conference of the Cognitive Science Society, pages 108-113. Lawrence Erlbaum Associates.
Lewis, R. L. (1993b). An architecturally-based theory of human sentence comprehension. PhD thesis, Carnegie Mellon University, Computer Science Department, Pittsburgh, PA. Tech. Report CMU-CS-93-226.
Lytinen, S. L. (1984). The Organization of Knowledge in a Multi-lingual Integrated Parser. PhD thesis, Yale University, Department of Computer Science, New Haven, CT. Research Report #340.
Lytinen, S. L. (1986). Dynamically combining syntax and semantics in natural language processing. In Proceedings of the Fifth National Conference on Artificial Intelligence, pages 574-578.
Lytinen, S. L. (1987). Integrating syntax and semantics. In Nirenburg, S., editor, Machine Translation: Theoretical and Methodological Issues, pages 302-316. Cambridge University Press.
MacDonald, M. C., Just, M. A., and Carpenter, P. A. (1992). Working memory constraints on the processing of syntactic ambiguity. Cognitive Psychology, 24:56-98.
Mahesh, K. (1994a). Building a parser that can afford to interact with semantics. In Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), page 1473. AAAI Press and The MIT Press.
Mahesh, K. (1994b). Reaping the benefits of interactive syntax and semantics. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, pages 310-312. ACL and Morgan Kaufmann.
Mahesh, K. and Eiselt, K. (1994). Uniform representations for syntax-semantics arbitration. In Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society, pages 589-594. Hillsdale, NJ: Lawrence Erlbaum.
Mahesh, K., Peterson, J., Goel, A., and Eiselt, K. (1994). KA: Integrating design problem solving and natural language understanding. In Working Notes from the AAAI Spring Symposium "Active NLP: Natural Language Understanding in Integrated Systems", Stanford University, California. Also available as a Technical Report from the American Association for Artificial Intelligence.
Marcus, M. (1980). A Theory of Syntactic Recognition for Natural Language. MIT Press, Cambridge, MA.
Marr, D. (1982). Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. W. H. Freeman, San Francisco.
Marslen-Wilson, W. and Tyler, L. K. (1987). Against modularity. In Garfield, J. L., editor, Modularity in Knowledge Representation and Natural-Language Understanding. MIT Press.
McClelland, J. L. and Kawamoto, H. (1986). Mechanisms of sentence processing: Assigning roles to constituents. In McClelland, J. L., Rumelhart, D. E., and the PDP Research Group, editors, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 2: Psychological and Biological Models. MIT Press.
McRoy, S. W. and Hirst, G. (1990). Race-based parsing and syntactic disambiguation. Cognitive Science, 14:313-353.
Meyer, I., Onyshkevych, B., and Carlson, L. (1990). Lexicographic principles and design for knowledge-based machine translation. Technical Report CMU-CMT-90-118, Center for Machine Translation, Carnegie Mellon University, Pittsburgh, PA.
Miikkulainen, R. and Dyer, M. G. (1991). Natural language processing with modular PDP networks and distributed lexicon. Cognitive Science, 15(3):343-399.
Miyake, A., Just, M. A., and Carpenter, P. A. (1993). Working memory constraints on the resolution of lexical ambiguity: Maintaining multiple interpretations in neutral contexts. Journal of Memory and Language, 32.
Moorman, K. and Ram, A. (1994). Integrating creativity and reading: A functional approach. In Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society, pages 646-651. Hillsdale, NJ: Lawrence Erlbaum.
Nederhof, M. J. (1993). Generalized left-corner parsing. In Sixth Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference, Utrecht, The Netherlands.
Newell, A. (1981). The knowledge level. AI Magazine, 2:1-20. Presidential Address, American Association for Artificial Intelligence (AAAI-80), Stanford University, 19 Aug 1980.
Nii, H. P. (1989). Blackboard systems. In Barr, A., Cohen, P. R., and Feigenbaum, E. A., editors, The Handbook of Artificial Intelligence, Volume IV, pages 1-82. Addison-Wesley Publishing Company.
Nijholt, A. (1980). Context-Free Grammars: Covers, Normal Forms, and Parsing. Springer Verlag.
Nirenburg, S., Carbonell, J., Tomita, M., and Goodman, K. (1992). Machine Translation: A Knowledge-Based Approach. Morgan Kaufmann Publishers, San Mateo, CA.
Nirenburg, S. and Levin, L. (1992). Syntax-driven and ontology-driven lexical semantics. In Pustejovsky, J. and Bergler, S., editors, Lexical Semantics and Knowledge Representation. Springer Verlag, Heidelberg. Proceedings of the First SIGLEX Workshop, Berkeley, CA, June 1991.
Onyshkevych, B. and Nirenburg, S. (1994). The lexicon in the scheme of KBMT things. Tech Report MCCS-94-277, Computing Research Laboratory, New Mexico State University, Las Cruces, New Mexico.
Pearlmutter, N., Daugherty, K., MacDonald, M., and Seidenberg, M. (1994). Modeling the use of frequency and contextual biases in sentence processing. In Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society, pages 699-704. Lawrence Erlbaum Associates.
Pearlmutter, N. J. and MacDonald, M. C. (1992). Plausibility and syntactic ambiguity resolution. In Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society, pages 498-503. Lawrence Erlbaum Associates.
Peterson, J. and Billman, D. (1994). Correspondences between syntactic form and meaning: From anarchy to hierarchy. In Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society, pages 705-710. Lawrence Erlbaum Associates.
Peterson, J., Mahesh, K., and Goel, A. (1994). Situating natural language understanding within experience-based design. Tech Report GIT-CC-94/18, College of Computing, Georgia Institute of Technology, Atlanta, GA. To appear in the International Journal of Human-Computer Studies.
Ram, A. (1989). Question-driven understanding: An integrated theory of story understanding, memory and learning. PhD thesis, Yale University, New Haven, CT. Research Report #710.
Ram, A. (1991). A theory of questions and question asking. The Journal of the Learning Sciences, 1:273-318.
Rayner, K. (1978). Eye movements in reading and information processing. Psychological Bulletin, 85:618-660.
Rayner, K., Carlson, M., and Frazier, L. (1983). The interaction of syntax and semantics during sentence processing: Eye movements in the analysis of semantically biased sentences. Journal of Verbal Learning and Verbal Behavior, 22:358-374.
Rayner, K., Garrod, S., and Perfetti, C. A. (1992). Discourse influences during parsing are delayed. Cognition, 45:109-139.
Reddy, D., Erman, L., and Neely, R. (1973). A model and a system for machine recognition of speech. IEEE Transactions on Audio and Electroacoustics, AU-21:229-238.
Resnik, P. (1992). Left-corner parsing and psychological plausibility. In Proceedings of the Fourteenth International Conference on Computational Linguistics (COLING '92).
Riesbeck, C. K. and Martin, C. E. (1986a). Direct memory access parsing. In Kolodner, J. L. and Riesbeck, C. K., editors, Experience, Memory, and Reasoning, pages 209-226. Lawrence Erlbaum, Hillsdale, NJ.
Riesbeck, C. K. and Martin, C. E. (1986b). Towards completely integrated parsing and inferencing. In Proceedings of the Eighth Annual Conference of the Cognitive Science Society, pages 381-387. Cognitive Science Society.
Rosen, C. (1984). The interface between semantic roles and initial grammatical relations. In Perlmutter, D. and Rosen, C., editors, Studies in Relational Grammar, 2, pages 38-77. University of Chicago Press.
Schank, R. (1986). Explanation Patterns: Understanding Mechanically and Creatively. Lawrence Erlbaum.
Schank, R. C. and Abelson, R. P. (1977). Scripts, Plans, Goals, and Understanding. Lawrence Erlbaum Associates, Hillsdale, NJ.
Schank, R. C., Lebowitz, M., and Birnbaum, L. (1980). An integrated understander. American Journal of Computational Linguistics, 6(1):13-30.
Seidenberg, M. S., Tanenhaus, M. K., Leiman, J. M., and Bienkowski, M. (1982). Automatic access of the meanings of ambiguous words in context: Some limitations of knowledge-based processing. Cognitive Psychology, 14:489-537.
Sells, P. (1985). Lectures on Contemporary Syntactic Theories. Center for the Study of Language and Information, Stanford University.
Small, S. L. and Rieger, C. (1982). Parsing and comprehending with word experts. In Lehnert, W. G. and Ringle, M. H., editors, Strategies for Natural Language Processing. Lawrence Erlbaum.
Spivey-Knowlton, M. and Tanenhaus, M. (1994). Immediate effects of discourse and semantic context in syntactic processing: Evidence from eye-tracking. In Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society, pages 812-817. Lawrence Erlbaum Associates.
Spivey-Knowlton, M. J. (1992). Another context effect in sentence processing: Implications for the principle of referential support. In Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society, pages 486-491. Lawrence Erlbaum Associates.
Steedman, M. (1987). Combinatory grammars and human sentence processing. In Garfield, J. L., editor, Modularity in Knowledge Representation and Natural Language Understanding. MIT Press.
Steedman, M. J. (1989). Grammar, interpretation, and processing from the lexicon. In Marslen-Wilson, W., editor, Lexical Representation and Process. MIT Press.
Stevenson, S. (1994). A unified model of preference and recovery mechanisms in human parsing. In Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society, pages 824-829. Lawrence Erlbaum Associates.
St. John, M. F. and McClelland, J. L. (1990). Learning and applying contextual constraints in sentence comprehension. Artificial Intelligence, 46:217-257.
Stowe, L. A. (1991). Ambiguity resolution: Behavioral evidence for a delay. In Proceedings of the Thirteenth Annual Conference of the Cognitive Science Society, pages 257-262. Cognitive Science Society.
Tanenhaus, M., Garnsey, S., and Boland, J. (1991). Combinatory lexical information and language comprehension. In Altmann, G., editor, Cognitive Models of Speech Processing: Psycholinguistic and Computational Perspectives. MIT Press.
Tanenhaus, M. K. and Carlson, G. N. (1989). Lexical structure and language comprehension. In Marslen-Wilson, W., editor, Lexical Representation and Process. MIT Press.
Tanenhaus, M. K., Dell, G. S., and Carlson, G. (1987). Context effects in lexical processing: A connectionist approach to modularity. In Garfield, J. L., editor, Modularity in Knowledge Representation and Natural-Language Understanding. MIT Press.
Taraban, R. and McClelland, J. L. (1988). Constituent attachment and thematic role assignment in sentence processing: Influences of content-based expectations. Journal of Memory and Language, 27:597-632.
Thibadeau, R., Just, M., and Carpenter, P. (1982). A model of the time course and content of reading. Cognitive Science, 6:157-203.
Trueswell, J. C. and Tanenhaus, M. K. (1992). Consulting temporal context during sentence comprehension: Evidence from the monitoring of eye movements in reading. In Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society, pages 492-497. Lawrence Erlbaum Associates.
Tyler, L. K. and Marslen-Wilson, W. D. (1977). The on-line effects of semantic context on syntactic processing. Journal of Verbal Learning and Verbal Behavior, 16:683-692.
van Noord, G. (1991). Head corner parsing for discontinuous constituency. In Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, pages 114-121.
Waltz, D. L. and Pollack, J. B. (1984). Phenomenologically plausible parsing. In Proceedings of AAAI-84, pages 335-339.
Waltz, D. L. and Pollack, J. B. (1985). Massively parallel parsing: A strongly interactive model of natural language interpretation. Cognitive Science, 9:51-74.
Wanner, E. and Maratsos, M. (1978). An ATN approach to comprehension. In Halle, M., Bresnan, J., and Miller, G. A., editors, Linguistic Theory and Psychological Reality, pages 119-161. MIT Press.
Wilks, Y. (1975). A preferential, pattern-seeking, semantics for natural language inference. Artificial Intelligence, 6(1):53-74.
Winograd, T. (1973). A procedural model of language understanding. In Schank, R. C. and Colby, K. M., editors, Computer Models of Thought and Language, pages 152-186. W. H. Freeman.
Woods, W. (1975). What's in a link: Foundations for semantic networks. In Bobrow, D. and Collins, A., editors, Representation and Understanding: Studies in Cognitive Science. New York: Academic Press.
Woods, W. (1980). Cascaded ATN grammars. American Journal of Computational Linguistics, 6(1):1-12.
Woods, W. A. (1970). Transition network grammars for natural language analysis. Communications of the ACM, 13:591-606. Also reprinted in Readings in Natural Language Processing, Grosz, Jones, and Webber (eds.), Morgan Kaufmann Publishers, 1986.
Woods, W. A. (1973). An experimental parsing system for transition network grammars. In Rustin, R., editor, Natural Language Processing. Algorithmics Press.
VITA

The author is currently a research scientist in the Computing Research Laboratory at New Mexico State University. From September 1989 to November 1994, he was a graduate student and also a Graduate Teaching and Research Assistant in the College of Computing at the Georgia Institute of Technology. He obtained his MS in Information and Computer Science from the Georgia Institute of Technology in December 1991. Before coming to Georgia Tech, he was a Scientific Officer in the Department of Computer Science and Automation at the Indian Institute of Science in Bangalore, India. In India, he obtained a Master of Technology degree in Computer Science and Engineering from the Indian Institute of Technology, Bombay, in 1989, and a Bachelor of Engineering degree from Bangalore University in 1987. He was born in Bangalore, India, on April 12, 1965. The author's research interests include natural language processing, computational linguistics, syntax-semantics interaction in sentence understanding, parsing, knowledge representation for natural language processing, information retrieval, and the cognitive science of language. At New Mexico State University, he is currently working on ontology acquisition and semantic analysis for machine translation systems. The author's other interests include photography, traveling, Sanskrit literature, and learning different languages.

The author's present address is:
Kavi Mahesh
Computing Research Laboratory
Box 30001, Dept. 3CRL
New Mexico State University
Las Cruces, NM 88003-0001 USA
(505) 646-5466
[email protected]