Constraints in Computational Semantics
Stephen Beale, Sergei Nirenburg and Evelyne Viegas
Computing Research Laboratory, Box 30001, Dept. 3CRL
New Mexico State University, Las Cruces, NM 88003-0001 USA
sb,sergei, [email protected]
tel: 505-646-5757  fax: 505-646-6218
January 28, 1998
Abstract
Research reported in this paper a) extends the familiar notions of constraints and preferences in computational semantic analysis and generation; b) adapts constraint satisfaction techniques to the requirements of natural language processing; and c) combines i) large-scale static knowledge sources (grammars, ontologies and lexicons) with ii) processing algorithms and iii) an advanced control architecture, which guarantees optimal results in near linear-time. The integration of the above has facilitated the implementation of a semantics-based machine translation system between Spanish, English and Chinese. This paper analyzes constraints and preferences in computational semantics.
1 Introduction
Research reported in this paper a) extends the familiar notions of constraints and preferences in computational semantic analysis and generation ((Wilks, 1975); (Hirst, 1984)); b) adapts constraint satisfaction techniques (see, for instance, (Tsang, 1993) for a summary) to the requirements of natural language processing; and c) integrates i) large-scale static knowledge sources (grammars (local-refs), ontologies (local-refs) and lexicons) with ii) processing algorithms (local-refs) and iii) an advanced control architecture (local-refs), which guarantees optimal results1 in near linear-time. The result is a semantics-based machine translation system (local-refs) which translates between Spanish, English and Chinese. This paper analyzes constraints and preferences in computational semantics. First, we briefly describe differences in treating constraints and preferences. Next, we discuss the kinds of constraints and preferences, using examples from both analysis and generation. We also present an overview of the control architecture developed to track and apply these natural language constraints. We conclude by examining how previous researchers have utilized constraints in natural language processing, pointing out differences and similarities with our own work.
1 Optimality is judged modulo the correctness of the preference settings in static knowledge sources.
Variables and domains: A in {0,1,2,3}; B in {0,1,3,4}; C in {0,1,2,3,4}. Binary constraints: A > B, A < C, B < C.
Figure 1: A Constraint Satisfaction Problem
2 Constraints and Preferences
From the point of view of processing, constraints and preferences are vastly different creatures. Consider Figure 1. The constraint satisfaction problem (CSP) shown there has three variables, with the domain of possible values listed for each; each constraint is a binary constraint. In order to find all possible solutions to this problem, an exhaustive search would need to examine 4 x 4 x 5 = 80 possible solutions. CSP techniques such as "arc consistency" can substantially reduce the search space without risk of deleting correct answers. For example, the value 0 can safely be removed from the domain of A, since with the assignment A=0, no values of B remain that would satisfy the constraint A > B. Arc consistency reduces the search space of the original problem to 3 x 2 x 3 = 18. An algorithm that dynamically implements arc consistency while processing can reduce the search space further. For example, when the algorithm begins considering assignments with A=1, arc consistency techniques can eliminate the assignment B=1 (because of the constraint A > B). When the assignment A=2 is considered, the assignment C=2 can be automatically removed from consideration (because of the constraint A < C). Combining the static and dynamic CSP techniques allows all nine solutions to be found without having to examine any incorrect solutions. If only one "correct" solution is required, it can be found, in this case, with no traditional search at all.
The notion of what is "correct" is important. If all that is needed is a solution which meets all constraints, CSP techniques can be used within a basic search algorithm to accomplish the task. Backtracking in this search algorithm would only occur if the dynamic application of CSP techniques produced a complex elimination of all of a variable's possible domain values (for example, choosing a particular assignment for A eliminated several values for B, which in turn eliminated all of C's values). On the other hand, in problems of this type, there is often some secondary "optimality" condition that needs to be applied once all "consistent" answers are discovered. For example, if in Figure 1 an additional optimality constraint of "find the answer with the highest sum" were added, the answer (3,1,4) would be chosen. In such a case, all consistent answers must be found first; then the optimality constraint is applied to each to determine the best answer.
A much different situation is represented in Figure 2. The constraint graph is the same as in Figure 1; that is, all the same variables exist and the same pairs of variables are connected. In this case, however, the "constraints" cannot be represented as boolean functions. The tables in Figure 2 represent preferences between variables2. It is not so important how such tables are generated; in fact, we can assume an oracle that returns the preference values.
2 See (local-ref) for a comparison of constraint and preference problems with linear and non-linear equations, and the parallels between our approach to solving preference problems and Nonserial Dynamic Programming (see, for example, (Bertele and Brioschi, 1972)) to solving non-linear equations.
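To make the mechanics concrete, the following is a minimal Python sketch (ours, not part of the system described in this paper) of AC-3-style arc consistency applied to the problem in Figure 1; the data layout and function names are our own illustrative choices.

from collections import deque

domains = {"A": {0, 1, 2, 3}, "B": {0, 1, 3, 4}, "C": {0, 1, 2, 3, 4}}
constraints = {
    ("A", "B"): lambda a, b: a > b,
    ("B", "A"): lambda b, a: a > b,
    ("A", "C"): lambda a, c: a < c,
    ("C", "A"): lambda c, a: a < c,
    ("B", "C"): lambda b, c: b < c,
    ("C", "B"): lambda c, b: b < c,
}

def revise(x, y):
    # Remove values of x that no value of y supports; report whether x changed.
    test = constraints[(x, y)]
    removed = {vx for vx in domains[x]
               if not any(test(vx, vy) for vy in domains[y])}
    domains[x] -= removed
    return bool(removed)

def ac3():
    queue = deque(constraints)            # every directed arc
    while queue:
        x, y = queue.popleft()
        if revise(x, y):                  # x's domain shrank, so
            for (u, v) in constraints:    # recheck arcs pointing at x
                if v == x and u != y:
                    queue.append((u, v))

ac3()
print(domains)  # {'A': {1, 2, 3}, 'B': {0, 1}, 'C': {2, 3, 4}}: 3 x 2 x 3 = 18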
Preferences (A,B): (0,0): .12, (0,1): .32, (0,3): .22, (0,4): .41, (1,0): .2, (1,1): .3, (1,3): .22, (1,4): .9, (2,0): .3, (2,1): .32, (2,3): .77, (2,4): .8, (3,0): .22, (3,1): .4, (3,3): .45, (3,4): .77
Preferences (B,C): (0,0): .9, (0,1): .85, (0,2): .23, (0,3): .45, (0,4): .1, (1,0): .8, (1,1): .7, (1,2): .55, (1,3): .71, (1,4): .33, (3,0): .77, (3,1): .5, (3,2): .66, (3,3): .12, (3,4): .33, (4,0): .6, (4,1): .5, (4,2): .85, (4,3): .25, (4,4): .05
Preferences (A,C): (0,0): .9, (0,1): .85, (0,2): .8, (0,3): .65, (0,4): .1, (1,0): .9, (1,1): .7, (1,2): .33, (1,3): .71, (1,4): .33, (2,0): .72, (2,1): .48, (2,2): .33, (2,3): .12, (2,4): .33, (3,0): .8, (3,1): .5, (3,2): .81, (3,3): .2, (3,4): .1
Figure 2: A Preference Satisfaction Problem
The critical point is that in such a problem straightforward CSP techniques are no longer of use. Looking at the table of preferences for (A,B), one might propose eliminating the assignment B=0, since in combination with variable A it always receives a low score. This local observation, however, does not license such a removal on a global level, since preferences with other variables might compensate. In this case, for example, the assignment B=0 appears to be optimal when combined with variable C. The bottom line is that a search algorithm would need to examine all 80 combinations exhaustively in order to guarantee an optimal answer. Looking ahead, this paper overviews a control mechanism we call Hunter-Gatherer (HG) that solves natural language preference problems in near linear-time, while still guaranteeing optimality.
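The contrast with the previous sketch can be made concrete. Below is an illustrative brute-force search over the preference version (again ours, not the paper's); it assumes preferences combine by summation (any monotonic combination would make the same point) and abridges the tables of Figure 2.

from itertools import product

domains = {"A": [0, 1, 2, 3], "B": [0, 1, 3, 4], "C": [0, 1, 2, 3, 4]}
pref = {("A", "B"): {(0, 0): 0.12, (0, 1): 0.32},  # ... full tables as in Figure 2
        ("B", "C"): {(0, 0): 0.90},
        ("A", "C"): {(0, 0): 0.90}}

def score(a, b, c):
    # Sum the pairwise preferences; .get(..., 0) guards the abridged tables.
    return (pref[("A", "B")].get((a, b), 0)
            + pref[("B", "C")].get((b, c), 0)
            + pref[("A", "C")].get((a, c), 0))

# No value can be pruned locally, so a naive search scores all 80 combinations.
best = max(product(domains["A"], domains["B"], domains["C"]),
           key=lambda t: score(*t))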
3 Constraints and Preferences in Computational Semantics
Computational semantic processing must deal with both constraints and preferences. Often we are able to assign clear-cut, "hard" constraints to certain aspects of a problem. For example, when generating "John went to X," we can safely posit a hard constraint that X cannot, syntactically, in English, be a clause. On the other hand, we cannot state unequivocally that the meaning of the direct object of the English word "say" should be constrained to people (or, well, also parrots). In myriad manifestations of metonymy, this selectional restriction is violated, making the above constraint, in effect, a preference.
3.1 Constraints in Semantic Analysis
Unlike many approaches to natural language analysis, we address many of the issues needed for practical computational semantics, over a realistic-size lexicon. In addition to word sense disambiguation (which is a newly active area in computational linguistics), we also address semantic dependency structure (for example, discovering case relations between events and their constituents, inferring relationships in noun-noun compounds, etc.), discourse structure, modalities, speaker attitudes, stylistics, coreference, etc. Disambiguation is the process of finding the best answer for all of these types of information for a particular language input. The main task, therefore, clearly becomes that of being able to combine evidence from many sources efficiently. In our environment, much of the knowledge for carrying out disambiguation resides in the lexicon. A typical analysis lexicon entry is shown in Figure 3. Two types of constraints are specified; they are discussed in turn below.

Adquirir
  sense 1: syntax expected: subj VAR1; root "adquirir"; obj VAR2
           semantic analysis: ACQUIRE (agent VAR1, theme VAR2)
  sense 2: syntax expected: subj VAR1; root "adquirir"; obj VAR2
           semantic analysis: LEARN (agent VAR1, theme VAR2)
Figure 3: Analysis Lexicon Entry - Simplified

subj (root IBM); root adquirir; obj (root Apple)
Figure 4: Example Input Syntax

3.1.1 Syntactic Binding Constraints
The input to semantic analysis is a candidate syntactic parse, for example Figure 4. This input syntax is compared to the syntax-expected zone of the lexicon entries. If any non-optional structures are missing, the sense can be rejected. In this case, both senses of adquirir expect a subject and an object. This is a "hard" constraint, since a missing non-optional structure eliminates the sense from consideration3. Each variable in the lexicon entry is bound to the correct portion of the input. Comparing the first sense in Figure 3 with the input in Figure 4, VAR1 will be bound to IBM and VAR2 to Apple.
3 Even here, though, there is room for a "softer" view (as is the case for every "hard" constraint described in this paper). To improve robustness, we can view these as binding preferences with very low scores.
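A minimal sketch of this binding step, assuming both the parse and the syntax-expected zone are flat attribute-value structures (the dict shapes and the name match_sense are our own, not the system's actual representation):

SENSE_1 = {"root": "adquirir", "subj": "VAR1", "obj": "VAR2"}
PARSE = {"root": "adquirir", "subj": "IBM", "obj": "Apple"}

def match_sense(expected, parse):
    # Return variable bindings, or None if a non-optional slot is missing.
    bindings = {}
    for slot, value in expected.items():
        if slot not in parse:
            return None                    # hard constraint: sense rejected
        if value.startswith("VAR"):
            bindings[value] = parse[slot]  # bind VAR1, VAR2, ... to the input
        elif value != parse[slot]:
            return None
    return bindings

print(match_sense(SENSE_1, PARSE))  # {'VAR1': 'IBM', 'VAR2': 'Apple'}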
3.1.2 Semantic Constraints
In the general case, lexicon entries will contain several word senses. Word sense disambiguation in semantic analysis involves picking the set of senses that maximizes the global preferences (in combination with other aspects of semantic analysis). Individual preference values are obtained by comparing the semantic values of the possible word senses of the variables to the expected semantic value. Expected semantic values are found in the ontology (local-ref)4. For example, for the first sense of adquirir, VAR1 is the agent of an acquire event. Looking up acquire in the ontology, we find that its agent should be a human. Likewise, its theme should be an object. For the second sense of adquirir, learn, the agent should also be a human, but the theme should be information. Figure 5 summarizes the semantic preferences retrieved from the lexicon, along with the actual values returned for the example. Notice that only one of the preferences receives a (perfect) score of 1.0 (indicating an exact match through direct inheritance in the ontology). The others receive scores between 0 and 1.0. The preference values are returned by an ontological search algorithm (local-ref). The values reflect how well a particular preference can be fulfilled by examining metonymic links. For example, the preference that IBM, a corporation, be a human receives a score of 0.95. This relatively high score is returned because the ontological search algorithm determined that a corporation can have humans as employees or presidents. On the other hand, the constraint from sense 2 of adquirir that Apple, also a corporation, be information receives a low score, since only indirect links between corporations and information could be found.
4 Semantic expectations can also be placed directly in the lexicon entry to override ontological values.
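The following sketch mimics the scores of Figure 5 with a stand-in for the ontological search; the ONTOLOGY and METONYMY tables, the 0.25 fallback and the function onto_score are our own illustrative choices, not the actual onto-search algorithm.

ONTOLOGY = {"CORPORATION": {"is-a": "OBJECT"}}
METONYMY = {("CORPORATION", "HUMAN"): 0.95}   # e.g. employees, presidents

def onto_score(actual, expected):
    if actual == expected or ONTOLOGY.get(actual, {}).get("is-a") == expected:
        return 1.0                                 # direct match / inheritance
    return METONYMY.get((actual, expected), 0.25)  # weak indirect link otherwise

SENSES = {"ACQUIRE": {"agent": "HUMAN", "theme": "OBJECT"},
          "LEARN":   {"agent": "HUMAN", "theme": "INFORMATION"}}
fillers = {"agent": "CORPORATION", "theme": "CORPORATION"}  # IBM, Apple

for sense, expect in SENSES.items():
    for role, filler in fillers.items():
        print(sense, role, onto_score(filler, expect[role]))
# ACQUIRE agent 0.95, ACQUIRE theme 1.0, LEARN agent 0.95, LEARN theme 0.25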
adquirir {ACQUIRE, LEARN};  IBM {CORPORATION};  Apple {CORPORATION}
ACQUIRE: AGENT = HUMAN, THEME = OBJECT;  agent (ACQUIRE, CORP): 0.95;  theme (ACQUIRE, CORP): 1.0
LEARN: AGENT = HUMAN, THEME = INFORMATION;  agent (LEARN, CORP): 0.95;  theme (LEARN, CORP): 0.25
Figure 5: Semantic Preference Values

3.1.3 Other Sources of Constraints
The other aspects of semantic analysis give rise to other sources of constraints and preferences. We employ several microtheories in areas such as coreference analysis, discourse structuring and studies of tense and related syntactic and lexical items to discover temporal, aspectual and modal information. A description of each is beyond the scope of this paper; suffice it to say that each can give rise to various constraints and preferences, which are incorporated and treated on an equal footing with the syntactic and semantic constraints described above.

3.2 Constraints in Text Generation
The tasks in text generation mirror those in semantic analysis. Lexical choice is the analog of word sense disambiguation. Coreference, temporal, aspectual and modal information must be realized in the target language instead of uncovered. As mentioned above, realization tends to be an "easier" task because many of the decisions can be made with "hard" constraints, and the preferences that do exist tend to affect only the degree of naturalness, rather than correctness as in analysis. (local-ref) contains a detailed discussion of constraints in our text generator; below we summarize with an emphasis on the variety and types of constraints and preferences.
3.2.1 Semantic Binding Constraints
Semantic binding constraints are completely analogous to the syntactic binding constraints used in analysis. They eliminate (or severely penalize) generation choices that require input semantics that are not present. Binding constraints interact with the semantic matching constraints described below.
3.2.2 Grammatical Constraints
An example of a grammatical constraint is shown in Figure 6. A lexicon entry can specify grammatical constraints on the realization of any of the variables in it. One possible syntactic realization for assertive-act is shown; it requires its VAR2 to be realized as a clause. This particular entry allows the system to produce "John said that Bill went to the store" but not "John said that Bill."

ASSERTIVE-ACT (agent VAR1, theme VAR2)
realization: subj VAR1; root SAY; comp (root THAT; obj VAR2 (clause))
CONSTRAINT PRODUCED: VAR2 : clause; binding DIVIDE-31 : clause
Figure 6: Grammatical Constraints
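A minimal sketch of checking such a constraint against candidate realizations; the dict shapes and the name satisfies are illustrative, not the system's actual representation.

ENTRY = {"head": "ASSERTIVE-ACT",
         "realization": {"root": "SAY", "comp": "THAT"},
         "constraints": [("VAR2", "clause")]}

def satisfies(entry, candidate_realizations):
    # candidate_realizations maps variables to their syntactic categories.
    return all(candidate_realizations.get(var) == category
               for var, category in entry["constraints"])

print(satisfies(ENTRY, {"VAR2": "clause"}))  # True:  "... that Bill went to the store"
print(satisfies(ENTRY, {"VAR2": "np"}))      # False: "John said that Bill"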
3.2.3 Collocational Constraints
Figure 7 illustrates the familiar notion of collocational constraints. In this case, the different realizations of location usually correspond to the semantic type of the object. Collocations can be used to override the default. The cooccurrence zone of the stock-market entry simply states that if it is used as the range of a location relation, then the location relation should be introduced with "on." This produces an English collocation such as "the stock is sold on the stock market" as opposed to the less natural "... sold at the stock market." Notice that no additional work on collocations needs to be performed beyond the declarative knowledge encoding. The constraint-based control architecture will identify and assign preferences to collocations.

LOCATION realization (default, by object semantics): in-pp (container); on-pp (surface); at-pp (place)
STOCK-MARKET collocations: LOCATION (ON) -> PREFER "on"
Figure 7: Collocational Constraints
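A sketch of the override logic, under the assumption that defaults are indexed by the object's semantic type and collocations live in the object's entry; the names and shapes here are ours.

DEFAULT_PP = {"container": "in", "surface": "on", "place": "at"}
LEXICON = {"STOCK-MARKET": {"sem": "place", "collocations": {"LOCATION": "on"}},
           "STORE": {"sem": "place"}}

def location_preposition(obj):
    # A collocational override, if present, beats the semantic-type default.
    entry = LEXICON[obj]
    colloc = entry.get("collocations", {}).get("LOCATION")
    return colloc or DEFAULT_PP[entry["sem"]]

print(location_preposition("STOCK-MARKET"))  # "on": sold on the stock market
print(location_preposition("STORE"))         # "at": default for a place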
3.2.4 Clause Combination Constraints
Various kinds of constraints arise when clauses are combined to form complex sentences. The strategies for clause combination can come from two main sources: 1) directly, from a lexicon entry associated with an input. For example, a discourse relation such as concession might directly set up the syntax to produce a sentence structure such as "Although Jim admired her reasoning, he rejected her thesis." Verbs which take complement clauses as arguments also set up the complex sentence structure and impose grammatical constraints (if present) on the individual clause realizations: "John said that he went to the store" or "John likes to hear himself speak." 2) indirectly, from a language-specific source of clause combination techniques (such as relative clause formation or coordination in English).

3.2.5 Semantic Matching Constraints
Matching constraints take into account the fact that, first of all, certain lexicon entries may match multiple elements of the input structure and, secondly, that the matches that do occur may be imperfect or incomplete. In general, the semantic matcher keeps track of which lexicon entries cover which parts of the input, which require other plans to be used with them, and which have some sort of semantic mismatch with the input. The following sums up the types of mismatches that might be present, each of which receives a different penalty (penalties are tracked by the control mechanism and help determine which combination of realizations is optimal); a sketch of how such penalties might be combined follows the list:
- slots present in the input that are missing in the lexicon entry -> undergeneration penalty; plan the missing slot separately
- extra slots in the lexicon entry -> overgeneration penalty
- slot filler mismatch (different, or more/less specific):
  - constant filler values: HUMAN (age 13-19) - "teenager" vs. HUMAN (age 12-16) - "âge bête"
  - concept fillers: HUMAN (origin FRANCE) vs. HUMAN (origin EUROPE)
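The sketch promised above. It assumes simple additive penalties with illustrative weights, and collapses the more/less-specific distinction into a single filler-mismatch penalty; none of these choices are the system's actual settings.

UNDER, OVER, FILLER = 0.3, 0.2, 0.1   # hypothetical penalty weights

def match_penalty(input_slots, entry_slots):
    penalty = 0.0
    for slot, filler in input_slots.items():
        if slot not in entry_slots:
            penalty += UNDER              # undergeneration: plan slot separately
        elif entry_slots[slot] != filler:
            penalty += FILLER             # filler mismatch (more/less specific)
    penalty += OVER * len(set(entry_slots) - set(input_slots))  # overgeneration
    return penalty

# HUMAN (age 13-19) "teenager" matched against an input HUMAN (age 12-16):
print(match_penalty({"age": "12-16"}, {"age": "13-19"}))  # 0.1: filler mismatch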
4 Efficiently Processing Constraints and Preferences
We utilize an efficient, constraint-directed control architecture called Hunter-Gatherer (HG). (local-ref) overviews how it enables semantic analysis to be performed in near linear-time; its use in generation is quite similar. (local-ref) describes the architecture in detail. The key to the efficiency of the constraint-based planner Hunter-Gatherer is its ability to identify constraints and partition the overall problem into relatively independent subproblems. These subproblems are tackled independently, and the results are combined using solution synthesis techniques. This "divide-and-conquer" methodology substantially reduces the number of combinations that have to be tested, while always guaranteeing an optimal answer.
There are two distinct types of constraints that HG deals with. The first are hard binary constraints. If such a constraint is not met while considering a candidate combination of realizations, that combination is simply eliminated; HG utilizes traditional constraint satisfaction techniques to keep track of hard constraints. Soft constraints, on the other hand, set up preferences that can be overridden. These "fuzzy" constraints cannot be handled by straightforward constraint satisfaction techniques; however, the partitioning methodology combined with simple branch-and-bound methods allows HG to determine optimal local combinations. The following sums up the advantages Hunter-Gatherer has for text generation: a) its knowledge is fully declarative (this is also allowed by unification processors such as that used in (Elhadad et al., 1997), but HG adds the benefits of speed and of "fuzzy" constraint processing); b) it allows "exhaustive" enumeration of local combinations; c) it eliminates the need to make early decisions; d) it accommodates interacting constraints, and accepts constraints from any source, while still utilizing modular, declarative knowledge; e) it guarantees optimal answers (as measured by preferences); f) it is very fast (near linear-time).
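A toy sketch of the partition-and-synthesize idea, assuming a problem already partitioned so that B and C interact only through the shared variable A; the preference functions and all names here are stand-ins, not HG's actual machinery.

def solve_subproblem(shared_values, local_domain, prefer):
    # For each value of the shared variable, keep only the best local assignment.
    return {a: max(((v, prefer(a, v)) for v in local_domain),
                   key=lambda pair: pair[1])
            for a in shared_values}

A, B, C = [1, 2, 3], [0, 1], [2, 3, 4]
best_b = solve_subproblem(A, B, lambda a, b: 0.9 if b < a else 0.0)
best_c = solve_subproblem(A, C, lambda a, c: 0.8 if a < c else 0.0)
# Synthesis: pick the A that maximizes the sum of its best local scores.
best_a = max(A, key=lambda a: best_b[a][1] + best_c[a][1])
# Cost: |A|*(|B|+|C|) local checks instead of |A|*|B|*|C| global ones,
# while the per-subproblem maxima preserve optimality of the synthesized answer.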
5 Constraints in Natural Language Processing
Recently, several approaches to NLP have started to use the terminology of constraints overtly (for example, (Elhadad et al., 1997) for unification-based constraint processing and (Kilger, 1997) in the incremental generation area). However, the major tradition of constraint-based NLP, at least in the area of meaning treatment, ascends to the AI of the 1970s. Unfortunately, over the last decade this area has not been as fashionable as other approaches to NLP, so it is best to discuss our work in comparison with selected seminal earlier contributions. We restrict our comments to two contributions whose work is directly related to our own. The first is Wilks's Preference Semantics (PS) (Wilks, 1975). The second researcher we discuss is (Hirst, 1984).
PS has a twofold interpretation of preferences. First, for all intents and purposes, it equates the term with selectional restrictions, which are the primary knowledge source used in disambiguation. We lean toward a much more inclusive interpretation of what a preference is. Our work aims at a wide variety of goals that go beyond word sense disambiguation, and since the knowledge sources and processes we employ are more varied, we need to utilize preferences from many sources, not only selectional restrictions. Secondly, PS defines a preference as a boolean (yes/no) value. This is in contrast to our view of a preference as a multi-valued (perhaps continuous) score reflecting probability in context. The "preference" in PS lies more in the way selectional restrictions are processed: even though preference values are boolean, a single "no" value does not rule out a particular reading. Disambiguation occurs at the sentence level, with the reading receiving the most "yes" votes winning. These, then, are the primary differences between our approaches. We argue that preferences are not boolean values, but rather varying shades of probability (as determined by the knowledge sources) given a particular context. Combination of evidence occurs not by conducting a poll, but by assessing the acceptability of combinations of these multi-valued parameters. It should be pointed out that, even though PS uses a binary system of (what we would term) constraints, straightforward CSP techniques could not be used in this approach because the constraints are not "hard"; that is, a single "no" does not rule out a reading. In this sense, HG could be used as an efficient control mechanism for implementing PS.
Hirst's ABSITY is similar to our work in that it uses various sources of knowledge and processing, each of which we utilize in some form as well (along with others ABSITY does not). There are two main points to make, however. First, ABSITY's knowledge sources return constraints that are binary and definitive in nature: either the knowledge source can make a definite decision, or it cannot. Secondly, ABSITY does
not, as best we could understand, combine evidence from various sources, but rather applies them sequentially (and individually) until one of them can reach a decision. The two main sources of constraints in ABSITY come from marker passing and selectional restrictions. Marker passing (Quillian, 1969) attempts to find semantic associations between two words or concepts by searching paths in an ontology. Hirst gives the example of disambiguating taxi (the yellow-car sense or the ground-motion event of airplanes) in the sentence "The plane taxied to the terminal" by discovering the obvious semantic connections between the intended senses of plane, terminal and taxi. The relevant point for this paper is how Hirst utilizes this information. Marker passing is used as one of the first disambiguation mechanisms. If the marker passing mechanism returns a score that is above some threshold, then ABSITY treats this as a "yes" answer and disambiguates accordingly (without referencing any other knowledge). We use a mechanism subsuming the functionality of marker passing in onto-search (local-ref). There are two main differences. The first is that onto-search seeks out semantic connections in the context of the input sentence and associated semantic expectations. Unrestricted, uncontextualized marker passing, we feel, is only a weak method for semantic inference (among other problems; see (local-ref) for additional comments). Secondly, Hirst uses the results, when they pass a threshold, as a "hard" constraint, and one that, on its own, has the power to disambiguate. In contrast, we view the results of onto-search as a measure of probability and combine this measurement with other evidence. Concerning Hirst's use of selectional restrictions, a similar argument applies. His selectional restrictions are identical in spirit to the PS preferences: either the preference is met or it is not. And, like marker passing, these selectional constraints are able to rule out senses independently. This last point leads nicely into what we feel is a good summary of the advantages of our approach: emphasis on combination
of evidence. Although ABSITY uses a variety of sources of evidence, it does not allow them to interact. The system must implicitly decide which source of evidence is strongest and consult that first, trusting it alone if it returns an answer. If it cannot, the next strongest piece of evidence is considered, and so on. HG, on the other hand, allows us to admit evidence from multiple sources simultaneously. We can still rank the relative reliability of each source and can take into account the confidence that source has in its answer, but the point is that we can do all that using a combination of sources. If a strong source has a weak answer and a weak source has a strong answer, the weak source may influence the final answer the most. This combination of evidence was necessitated by our expanded view of the computational semantic problem; that is, we go beyond trying to disambiguate word senses (or simple lexical choice in generation) to the other aspects of the problem mentioned above. The control mechanism we utilize in HG has enabled such an approach by allowing us to focus on the knowledge sources while freeing us from concerns about how the knowledge will be applied.
References
U. Bertele and F. Brioschi. 1972. Nonserial Dynamic Programming. Academic Press, New York.
M. Elhadad, J. Robin, and K. McKeown. 1997. Floating constraints in lexical choice. Computational Linguistics, 23(2):195-239.
G. Hirst. 1984. Semantic Interpretation against Ambiguity. Ph.D. Diss., Brown University.
A. Kilger. 1997. Microplanning in Verbmobil as a constraint-satisfaction problem. In DFKI Workshop on Generation, pages 47-53, Saarbrücken.
M. R. Quillian. 1969. The teachable language comprehender: A simulation program and theory of language. Communications of the ACM, 12:459-476.
E. Tsang. 1993. Foundations of Constraint Satisfaction. Academic Press, London.
Y. Wilks. 1975. A preferential, pattern-seeking semantics for natural language inference. Artificial Intelligence, 6:53-74.