World Modeling for NLP Lynn Carlson* Sergei Nirenburg CMU-CMT-90-121
November 6, 1990
Copyright © 1990, Carnegie Mellon University, Center for Machine Translation. All Rights Reserved.
*Visiting Researcher, U.S. Department of Defense and Department of Linguistics, Georgetown University
1. Why Does One Need a World Model?
In formal semantics, one of the most widely accepted methodologies is that of model-theoretic semantics, in which syntactically correct utterances in a language are given semantic interpretation in terms of truth values with respect to a certain model of reality (in Montague semantics, a "possible world"). Such models are in practice never constructed in detail but rather delineated through a typically underspecifying set of constraints. In order to build large and useful natural language processing systems, one has to go further and actually commit oneself to a detailed version of a "constructed reality" (Jackendoff's term).

In computational applications, interpreting the meanings of textual units is feasible only in the presence of a detailed world model whose elements are linked (either directly or indirectly, individually or in combinations) to the various textual units by an is-a-meaning-of link. World model elements themselves will be densely interconnected through a large set of well-defined "ontological" links. These links are recognized properties of entities in the world and are themselves part of the world model. By combining properties and their corresponding values in named sets, the world modeler builds descriptions of complex objects and processes in a compositional fashion, using as few basic primitive concepts as possible. Of course, the quest for a very small basic set of ontological primitives may become the central goal of such research. From the point of view of breadth of coverage and usability of the ontology, such a pursuit can often be detrimental. Methodologically, it is preferable to allow a guarded proliferation of primitives.

The basic top-level ontological classification in our system divides all concepts into free-standing entities and properties.
Of course, viewed from another angle, properties are also free-standing entities, because it is as necessary to describe their ontological nature as it is to describe the nature of other entities. This duality is an unavoidable feature of ontology building (unless one is prepared to declare that "computer" belongs in our world models whereas "something being owned by somebody" does not). When viewed as free-standing entities, properties (whose semantics is relational in nature) are described in terms of constraints on the entity classes that they can relate. When viewed as dimensions along which other entities are distinguished, they are listed (together with their values; see the description in Section 2 below) as components of the definitions of these entities. Constraints on world model elements and their cooccurrence will serve as heuristics on the cooccurrence of lexical and other meanings in the text, thus facilitating both natural language understanding and generation.

The world model forms the substrate for building computational lexicons, which contain static, "typical" interpretations of the meanings of various lexical units in a language in terms of the world model. Once the lexicon is created, the processes of semantico-pragmatic analysis or lexical, rhetorical and pragmatic generation can be developed. The crucial knowledge structure in these tasks is the representation of textual meaning (in machine translation usually called interlingua text). The textual meaning is the outcome of the analysis process and the input to the generation process. Propositional meanings (defined in the lexicon in terms of links to the underlying world model) trickle down to the text meaning representation as instances of world model entities, sometimes somewhat modified. These are typically the meanings of open-class lexical items, that is, most nouns, verbs and adjectives and many adverbs.
Non-propositional meanings do not have ontological connections and are recorded in the lexicon in terms of properties and values defined in a particular text meaning representation. Meanings of closed-class lexical items are typically non-propositional. As an example of a class of non-propositional meanings, consider speaker attitudes: for instance, the speaker's evaluation of an event or object, or the level of the speaker's expectation of an event occurring. Information about the internal representation of such meanings can be carried by the choice of lexical units or syntactic constructions. In the former case, this means that the lexicon has to include information about speaker attitude in its entries. This information does not have a direct connection with
the ontological model.1 In the DIONYSUS project at CMU the text meaning representation language is called TAMERLAN (see Nirenburg and Defrise, 1991a, 1991b). The structure of the lexicon, used both in the DIANA analyzer and the DIOGENES generator, is described in Meyer et al. (1990). For simplicity, we will cumulatively refer to the programs and data associated with world modeling in our environment as the ONTOS system. ONTOS consists of a) a constraint language in which world model elements are specified; b) an ontology, a set of the most general world concepts applicable across subject domains; c) a set of domain models, each of which is rooted in the common ontology; and d) an intelligent knowledge acquisition interface.

This paper describes our approach to world modeling. We will not concentrate on the peculiarities of the knowledge acquisition interface but rather discuss the ontological and representational issues and choices.
2. The Format of the ONTOS Constraint Language
Elements of the domain model in the ONTOS environment are formulated in terms of the syntax of the knowledge representation language FRAMEKIT (Nyberg, 1988). A knowledge base in FRAMEKIT takes the form of a collection of frames. A frame is a named set of slots. A slot is a named set of facets. A facet is a named set of views, and a view is a named set of fillers. A filler can be any symbol or a Lisp function call.

The above is the basic set of constraints and features of FRAMEKIT. Though the language actually specifies some extensions to this basic expressive power, it remains, by design, quite general and semantically underspecified. The actual interpretation and typing of its basic entities is supposed to take place in a particular application, in our case, world modeling. This strategy is deliberate and differs from that of many knowledge representation languages and environments, in which the basic representation language is made much more expressive at the expense of relative unwieldiness, difficulty in learning and some format-related constraints on application development.

In the ONTOS environment, FRAMEKIT frames are used to represent concepts. A concept is the basic building block of our ontology. Examples of concepts are house, four-wheel-drive-car, voluntary-olfactory-event or specific-gravity. FRAMEKIT slots are interpreted as a subset of concepts called properties. In practice, some properties recorded in concepts do not relate to the domain model but rather serve as administrative and explanatory material for the human users. The FRAMEKIT views are used to mask parts of the information from certain users. This mechanism is useful when working with very large domain models serving more than one domain, so that some information can be presented differently if intended for, say, a chemist as opposed to a software engineer.
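As a rough illustration, the containment hierarchy just described (frame, slot, facet, filler) can be rendered in Python as nested mappings. This is only a sketch: FRAMEKIT itself is a Lisp system, the helper name is invented, and views are omitted for brevity.

```python
# Sketch of the FRAMEKIT containment hierarchy described above: a frame is
# a named set of slots, a slot a named set of facets, and a facet a named
# set of fillers.  (Illustrative only; FRAMEKIT itself is a Lisp system.)

TABLE = {
    "MADE-OF": {                      # slot (a property)
        "default": ["*wood"],         # facet -> fillers
        "sem": ["*solid"],
        "salience": [0.7],
    },
}

def facet_fillers(frame, slot, facet):
    """Return the fillers stored under frame -> slot -> facet, or []."""
    return frame.get(slot, {}).get(facet, [])

print(facet_fillers(TABLE, "MADE-OF", "default"))   # ['*wood']
```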
FRAMEKIT fillers are constrained to be: a) names of atomic elements of the ontology and the domain model(s); b) expressions referring to modified elements of the ontology or domain model(s); c) collections of the above (specified in any of a number of legal ways); d) demons or lambda expressions; e) special-purpose symbols and strings used by the knowledge acquirers. A more detailed description of all these filler types is given in Section 3 below.

The FRAMEKIT facets are used in ONTOS to refer to the status of the values of various properties. The ONTOS constraint language specifies the following standard facets: value, default, sem and salience.2 The semantics of these facets is as follows. Fillers of the value facet are actual values of the property referred to by the slot name. The default facet lists the most typical value(s) (if any) for the given property of the given concept. In a given slot the above two facets are in complementary distribution: if an actual value

[Footnote 1: See Meyer et al., 1990, for a discussion and examples of lexicon entries carrying both propositional and non-propositional meanings.]
[Footnote 2: Another standard facet, measuring-unit, is applicable only to a subset of slots and is discussed separately in Section 3.6.]
is found, there is no need for a default; while if a value is not (or cannot be) determined, the default will be used. We will return to the uses of these two facets later.

The fillers of the sem facet are semantic constraints on the fillers of the value and default facets. The sem facet fillers are akin in status to selectional restrictions and thus refer to ontological or domain model concepts (and their subclasses) from which the fillers of the value and default facets must be selected.

The idea of salience has to do with the relative importance of various properties of a concept to its identity. Thus, for a table to be a table, it is more important to have a horizontal top than to have four legs. Therefore, the salience facet of the SPATIAL-ORIENTATION property of the TABLETOP concept (which stands in the PART-OF relation to the TABLE concept) will have a high value, whereas the value in the salience facet of the CARDINALITY property of the SET concept corresponding to the legs of this table will be relatively low.3 The salience property is similar to the importance values used in the lexicon of the natural language generation system DIOGENES-88 (Nirenburg et al., 1988; Nirenburg and Nyberg, 1989) to support lexical selection.

An example slot with a detailed view of facets follows.

(TABLE ...
  (MADE-OF
    (default *wood)
    (sem (*solid @ (hardness (> 0.5))))
    (salience 0.7))
  ... )

The above means that we expect tables to be made of wood but will not be quite surprised if they are made of another solid material with a hardness greater than the middle point on the hardness scale.4 Note the complexity of the sem facet filler. The at-sign (@) means that what follows it is an additional constraint on the concept preceding it. The value of salience for this slot is above the middle point of the [0,1] continuum, meaning that this property is judged to be an important identity property of the concept TABLE.
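The complementary distribution of the value and default facets amounts to a simple lookup rule: use the actual value when one is known, otherwise fall back on the typical value. A minimal sketch, under our own (assumed) frame layout and function name:

```python
# Sketch of the value/default complementary distribution described above:
# if an actual value is found it is used; otherwise the default (the most
# typical value, if any) is returned.  Frame layout and names are assumed.

def effective_filler(frame, slot):
    facets = frame.get(slot, {})
    if "value" in facets:             # an actual value is known
        return facets["value"]
    return facets.get("default")      # fall back on the typical value

TABLE = {"MADE-OF": {"default": "*wood", "sem": "*solid", "salience": 0.7}}
JOHNS_TABLE = {"MADE-OF": {"value": "*marble"}}

print(effective_filler(TABLE, "MADE-OF"))        # *wood (no value; default)
print(effective_filler(JOHNS_TABLE, "MADE-OF"))  # *marble (actual value wins)
```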
The legal format of a filler in the ONTOS constraint language is as follows. It can be a string, a symbol, a number, or a (numerical or symbolic) range. Strings are typically used as fillers (of value facets) of "service" slots representing user-oriented, non-ontological properties of a concept, such as DEFINITION or TIME-STAMP. A symbol in a filler can be an ontological concept. This signifies that the actual filler can be either the concept in question or any of the concepts that are defined as its subclasses. However, if the symbol is not an ontological concept, this does not necessarily mean that it is illegal. In addition to concept names and special keywords (such as facet names, etc.), we also allow symbolic value sets as legal primitives in a world model. For instance, we can introduce symbolic values for the various colors (red, blue, green, etc.) as legal values of the property COLOR, instead of defining any of the above color values as separate concepts (which is conceptually difficult) or linking together all objects having the same (uninterpreted!) value of COLOR.

Numbers, numerical ranges (such as the value for the salience facet above) and symbolic ranges (e.g., APRIL–JUNE) are also legal fillers in the ONTOS constraint language. Note that symbolic ranges are only meaningful for ordered value sets and that for numerical range values one can locally specify a measuring unit. If no measuring unit is specified locally, the system will use the (default) unit listed in the definition of each scalar attribute in the ontology. In the DIONYSUS project, the syntactic convention is

[Footnote 3: A more detailed explanation of the format and representation of sets (collections) in ONTOS, as well as the semantics of fillers, is given in Section 4.3.]
[Footnote 4: See Section 3.6 for a discussion of the treatment of scalar attributes in ONTOS.]
to prepend an ampersand (&) to symbolic value set members in order to distinguish them from ontological entity names, which are marked by an asterisk (*).

There are two special fillers, nil and none. The former means that the user has not specified a filler and no filler could be inherited. The latter means that there can be no filler, and the user (or the system) has overtly specified this. For instance, if for a certain property in a certain concept no default filler can be found (that is, many potential fillers are equally probable), then the user will have to enter none as the filler of this default facet. If, instead, the user does not specify anything, the system will understand that the work is somehow incomplete.
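The nil/none distinction maps naturally onto the difference between an absent entry and an explicit sentinel. The following sketch uses invented names to make the contrast concrete:

```python
# Sketch of the nil/none distinction described above.  "nil" (the user has
# not specified a filler, and none was inherited) is modelled as the facet
# simply being absent; "none" (no filler is possible, stated overtly) is an
# explicit sentinel value.  All names here are our own.

NONE = "none"   # overt "no filler possible"

def default_status(frame, slot):
    facets = frame.get(slot, {})
    if "default" not in facets:
        return "nil: unspecified; acquisition work is incomplete"
    if facets["default"] == NONE:
        return "none: no default exists (all fillers equally probable)"
    return "default is " + facets["default"]

CHAIR = {"MADE-OF": {"default": NONE}}   # user overtly said: no default
LAMP = {"MADE-OF": {}}                   # user said nothing yet

print(default_status(CHAIR, "MADE-OF"))
print(default_status(LAMP, "MADE-OF"))
```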
3. Basic Ontological Modeling Choices
We have established the difference between free-standing entities and properties. Among the free-standing entities the most immediate distinction seems to be that between object-type entities existing primarily in space and process-type entities existing primarily in time. So, it would be a fair approximation to say that the set of all concepts is divided into objects, processes and properties.5 In technical terms, then, we introduce the concepts OBJECT, EVENT and PROPERTY and make them SUBCLASSES of the concept ALL, which serves as the root of the network. (In the examples below we do not list the service slots.)

(ALL
  (SUBCLASSES (value *property *object *event)))

The SUBCLASSES property and its inverse, IS-A, are the major classifying relations in the world model. In our ontology each relation has an inverse defined for it. This is done for reasons of convenience.

A collection of values may be a conjunctive set (e.g., (and OBJECT EVENT PROPERTY)), a disjunctive set (e.g., (or OBJECT EVENT PROPERTY)), or a disjunctive power set, in which the values can be any subset of the set of fillers; the default semantics of value collections is determined for each facet type. If the facet is value, the set is conjunctive. However, the situation is more complicated if a collection (a set) of fillers is given for a sem facet. In this case, the set may be understood as either a disjunctive set or a disjunctive power set, depending on the actual slot in question. For example, if the slot is HAS-AS-PART or COLOR, the fillers of the sem facet must be understood as forming a disjunctive power set. Thus, (HAS-AS-PART (sem (table chair desk lamp))) in the concept ROOM means that a room can have any combination (a disjunctive power set) of the listed components. However, in the case of NUMBER-OF-WHEELS (see example below), the fillers of the sem facet form a purely disjunctive set. The default, salience and measuring-unit facets cannot contain a set of fillers.
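Since every relation has a defined inverse (as SUBCLASSES has IS-A), an acquisition tool can record both directions of a link in one step. A minimal sketch, with assumed data layout and function names:

```python
# Sketch of maintaining a relation and its inverse in step, as described
# above for SUBCLASSES and IS-A.  Data layout and names are our own.

INVERSE = {"SUBCLASSES": "IS-A", "IS-A": "SUBCLASSES"}

ontology = {}  # concept name -> {relation name -> set of concept names}

def add_link(frm, relation, to):
    """Record frm --relation--> to and the inverse link automatically."""
    ontology.setdefault(frm, {}).setdefault(relation, set()).add(to)
    inv = INVERSE[relation]
    ontology.setdefault(to, {}).setdefault(inv, set()).add(frm)

# Build the root of the network from the ALL frame above.
for child in ("*property", "*object", "*event"):
    add_link("*all", "SUBCLASSES", child)

print(sorted(ontology["*event"]["IS-A"]))   # ['*all']
```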
In the example below, the concept MOTOR-VEHICLE is shown to be a subclass of both VEHICLE and COMPLEX-MECHANISM, while the number of wheels is recorded as any one of a finite set of allowable values (more can be added if necessary, but the point is that the set is finite and enumerable), with a default value of 4.

[Footnote 5: We try to avoid getting involved in the longstanding philosophical debate concerning metaphysics. Since our task is empirical computational metaphysics, we simply avoid any claims about the universality of our particular approach to world modeling. Moreover, we readily accept the possibility of very different ontologies and domain models. The potential for coexistence of some such models is a separate interesting question. The lowest common denominator for our world modeling is the capability of using the same constraint language and the same interactive acquisition tool.]
(MOTOR-VEHICLE
  (IS-A (value *VEHICLE *COMPLEX-MECHANISM))
  (NUMBER-OF-WHEELS
    (sem 3 4 6 8 10 14 18)
    (default 4))
  ... )

Properties are described with the help of two special slots, DOMAIN and RANGE, for instance:

(IS-A
  (DOMAIN (sem *all))
  (RANGE (sem *all)))

DOMAIN and RANGE are special properties that apply to other properties and specify the beginning and end points, respectively, of the links that the properties represent. A careful reader will immediately detect a vicious circle here: if DOMAIN and RANGE are properties themselves and at the same time are used to describe the semantics of other properties, we end up defining them in terms of themselves. We recognize this state of affairs and treat DOMAIN and RANGE as undefined primitive atomic entities.

When a filler is the name of an ontological concept X, its semantics is "X and all entities that are descendants of X in the ontology." The filler ALL in both slots of the IS-A frame stands for any concept in the ontology, since ALL is the general root concept. The appearance of this filler means that the knowledge acquirer could not think of any more meaningful constraint (or, in fact, that there isn't any!).

In order to describe the various concepts in our ontology, we must first present some essential properties in terms of which many concepts are represented. Based on the semantic constraint on the filler of the RANGE slot in a property, we first classify properties into two large classes, ATTRIBUTE and RELATION. Relations have references to concepts in their RANGE slots; attributes, references to values from (numerical or symbolic) value sets.

Turning to the semantics of properties, we observe that an object can typically have identifiable parts or constitute a part of some other object, can belong to somebody, can be at a specifiable location and can have the time of its coming into existence specified.
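The convention that a concept filler X licenses "X and all its descendants" amounts to a reachability test over IS-A links. A sketch, using a toy hierarchy of our own:

```python
# Sketch of the filler semantics described above: a filler naming concept X
# licenses X and every descendant of X in the ontology.  The toy IS-A
# hierarchy below is invented for illustration.

IS_A = {                        # child -> parents
    "*vehicle": ["*object"],
    "*motor-vehicle": ["*vehicle"],
    "*object": ["*all"],
}

def satisfies(candidate, constraint):
    """True if candidate is the constraint concept or a descendant of it."""
    if candidate == constraint or constraint == "*all":
        return True
    return any(satisfies(p, constraint) for p in IS_A.get(candidate, []))

print(satisfies("*motor-vehicle", "*vehicle"))   # True
print(satisfies("*vehicle", "*motor-vehicle"))   # False
```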
Next, we observe that events can have identifiable component events or can be a component of another event, can take place at specifiable times and/or in specifiable locations, can be caused by another event and cause another event, can have effects and can be instantiatable only provided certain conditions are met. This brings us to the following basic formats of the frames for OBJECT and EVENT (for now we exclude their SUBCLASSES slots):

(OBJECT
  (IS-A        (value *all))
  (HAS-AS-PART (sem *object))
  (PART-OF     (sem *object))
  (BELONGS-TO  (sem *human *organization))
  (AGE         (sem (> 0)) (measuring-unit *year))
  (LOCATION    (sem *place)))

(EVENT
  (IS-A         (value *all))
  (SUBEVENTS    (sem *event))
  (SUBEVENT-OF  (sem *event))
  (TIME         (sem (> 0)) (measuring-unit *second))
  (LOCATION     (sem *place))
  (CAUSED-BY    (sem *event))
  (CAUSES       (sem *event))
  (PRECONDITION (sem *event))
  (EFFECT       (sem *event)))
At this point let us observe that an object can be known to be a typical participant in some types of events, and events can have typical participants associated with them. Participants in events can be of several kinds; consider the intuitively perceptible differences between the participant status of the nominal concepts in I took a train from New York to Chicago, which is the linguistic realization of an instance of a MOVE event.

Numerous studies of the differences among event participant types and the typical constraints that can be imposed on them have been undertaken in linguistics and artificial intelligence over the past two decades. The first influential description of these phenomena in linguistics is the case grammar theory of Fillmore (1968). Since then, case grammar has had a major impact on both theoretical and computational linguistics, and has found its way, in varying forms, into many AI programs.6 In case grammar, a case relation (or case role, or simply case) is a semantic role that an argument (typically, a noun) can have when it is associated with a particular predicate (typically, a verb). While many linguistic theories of case have been proposed, all of them have in common two primary goals: 1) to provide an adequate semantic description of the verbs of a given language, and 2) to offer a universal approach to sentence semantics.7 Although various case grammar theories differ in the total number of cases they allow and in the exact inventory of cases, they all claim that the set of cases should be 1) small in number and 2) universal across languages.

The application of case grammar theory to AI programming has entailed some changes to the basic premise that the number of cases be small. The notion of a case relation has been extended to cover a wider range of information relevant to events, such as purpose, precondition, time, cause, result, etc.
In case grammar theory, such notions are regarded as modalities of the verb; however, in AI their inclusion in the case representation of events has been found necessary for inference making. Case relations in the ONTOS system contrast with case relations in case grammar systems (and many AI programs) in one fundamental way: as a language-independent representation of objects, events and their

[Footnote 6: An overview and comparison of several theories of case grammar in linguistics can be found in Cook 1989; reviews of case systems as they are used in natural language processing are given in, for example, Bruce 1975 and Winograd 1983.]
[Footnote 7: This definition follows Cook (1989, ix).]
properties, ONTOS does not describe the predicate-argument structure of the verbs of a particular language.8 Therefore, ontological case relations can be thought of as conceptual roles typically associated with events and objects, rather than semantic roles associated with verbs and nouns. An event might include, for example, an agent role, for the entity performing the action associated with the event; a location role, specifying where the event takes place; a theme role, for the entity that undergoes a change of state or location; and so on.

Case roles are treated as relations in ONTOS, which means that they must have a DOMAIN and a RANGE. The DOMAIN specifies the types of frames in which a particular CASE-ROLE can occur as a slot, while the RANGE specifies what the fillers of the slot can be. An example of a case role frame is given below:

(AGENT
  (IS-A           (value *case-role))
  (DOMAIN         (sem *event))
  (RELATION-RANGE (sem *animal *force)
                  (default *intentional-agent))
  (INVERSE        (sem *agent-of))
  (DEFINITION     (value "the entity that causes or is responsible for an action")))

The IS-A slot defines AGENT as a CASE-ROLE in the ontology. The DOMAIN slot indicates that EVENT frames are the only frames which may have an AGENT slot in them. That is, EVENT concepts have agents, but other types of concepts do not. The sem facet of the RELATION-RANGE slot tells us that the types of entities which can be agents are ANIMAL or FORCE concepts (and their descendants); the default facet indicates that we typically expect intentional agents to act as agents in most events.9

The basic inventory of case roles presently being used in the ontology is given below, with a brief definition for each role. We do not claim that our set of case roles is final. In particular applications this list may be modified to meet additional requirements, as necessary. In fact, below we suggest possible enhancements to the basic list:
AGENT: the entity that causes or is responsible for an action

THEME: the entity whose state or location is being described, or whose state or location changes; or, the entity that is affected by an action.10
EXPERIENCER: the entity that undergoes psychological experience (perception, cognition, emotion)

BENEFICIARY: the entity that benefits from an action

INSTRUMENT: the object that is used to accomplish an action

[Footnote 8: An example of a language-independent representation of meaning in AI is Schank's conceptual dependency theory (see Schank 1975).]
[Footnote 9: For treatment of apparent violations of these constraints in sentences such as The White House issued a statement, see the discussion of metonymical processing in Section 4.2 below.]
[Footnote 10: Some systems distinguish THEME from PATIENT, the former being used to describe the state or location (or change in state or location) of an entity, and the latter being used to describe the entity affected by an action.]
LOCATION: the place where an event takes place or where an object exists SOURCE: A point of origin for actions and processes, the SOURCE can be either a location or an entity. For CHANGE-OF-LOCATION events, the SOURCE is the location from which a theme moves. For TRANSFER-POSSESSION events, the SOURCE is the entity which relinquishes possession of a theme.
GOAL: An endpoint for actions and processes, the GOAL can be either a location or an entity. For CHANGE-OF-LOCATION events, the GOAL represents the location to which a theme moves. For TRANSFER-POSSESSION events, the GOAL is the entity which acquires possession of a theme.
PATH: the route along which an entity (a theme) travels
As a potential first extension, we could specify CO-AGENT (or ACCOMPANIMENT) and CO-THEME roles, for example. A CO-AGENT would designate the entity that accompanies or assists an AGENT in carrying out an action; it is a necessary role for events involving cooperative actions where a single AGENT is logically impossible: for example, marrying (in the sense of 'A and B got married'), pooling efforts, merging (in the sense of 'Company A and Company B merged'), swapping or trading (in the sense of 'A and B swapped/traded places'). Similarly, a CO-THEME would represent the entity whose state or location changes, or is being described, in conjunction with, or relative to, another entity. Examples involving state or location changes are mixing or merging things together (in the sense of 'John mixed A and/with B'), and swapping or exchanging things (in the sense of 'Mary exchanged A and/for B').11 Examples involving state descriptions are comparisons of two (or more) items, e.g., identity, equality, difference, etc. (as in 'A is equal to B' or 'A is larger than B' or 'Lines A, B and C form a triangle').

In addition to the above list of case roles, other relations have been created in the ontology to describe "modifying" properties of events, namely, spatiotemporal relations and conditions.12 Spatiotemporal relations specify the general time and location of an event. Conditions specify the causal and intentional structure of events in terms of PRECONDITIONs and POSTCONDITIONs, and include relations such as PURPOSE, CAUSE, EFFECT and PRESUPPOSITION. (See a discussion of the complex event structure in Section 3.3 below.)
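Putting the case-role inventory to use, an analyzer can check a proposed event instance against the RANGE constraints of each role. The sketch below hard-codes a toy fragment of such a check for the MOVE event realized by "I took a train from New York to Chicago"; the role table, class assignments, and one plausible role assignment for the sentence are all our own assumptions.

```python
# Sketch of checking an event instance against case-role RANGE constraints
# like those in the AGENT frame above.  The role table, the class table and
# the role assignment for the example sentence are invented for illustration.

ROLE_RANGE = {                  # case role -> allowed filler classes
    "AGENT": {"*animal", "*force"},
    "THEME": {"*object"},
    "SOURCE": {"*place", "*object"},
    "GOAL": {"*place", "*object"},
}

CLASS_OF = {"*speaker": "*animal", "*train": "*object",
            "*new-york": "*place", "*chicago": "*place"}

def range_violations(event):
    """Return the roles whose fillers fall outside their RANGE constraint."""
    return [role for role, filler in event.items()
            if role in ROLE_RANGE and CLASS_OF[filler] not in ROLE_RANGE[role]]

move = {"AGENT": "*speaker", "THEME": "*train",
        "SOURCE": "*new-york", "GOAL": "*chicago"}

print(range_violations(move))   # []  -- every role filler is in range
```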
3.1. Top-Level Object and Event Subtrees
Now that we have developed a list of typical participant roles for events, let us consider how we might further develop the ontological subtrees for EVENT and OBJECT. Since events turn out to be somewhat more difficult to classify than objects, let us first turn our attention to the object hierarchy.

We recognize a basic distinction between object-type entities that exist in the physical world, such as a rock, a chair, or a puppy; in the mental world, such as an idea, a collection, or information; and in

[Footnote 11: Some case systems use the GOAL role for the CO-THEME or CO-AGENT examples which involve prepositional phrases.]
[Footnote 12: The distinction between distinguishing roles and modifying roles is not at all obvious. Following the tradition of most case grammar models, we have reserved the concept of CASE-ROLE for the types of roles that, in case grammar systems, relate arguments to predicates (sometimes called "propositional" or "inner" roles), and classify other types of roles, such as PURPOSE or TIME (called "modal" or "outer" roles), as conditions or spatiotemporal relations in the ontology. A discussion of how different systems differentiate between distinguishing and modifying roles is found in Bruce (1975).]
the social world, such as a government or an organization. For such entities, we introduce the concepts PHYSICAL-OBJECT, MENTAL-OBJECT, and SOCIAL-OBJECT, respectively.

Within the physical world, we perceive a fundamental distinction between discrete entities, e.g., a table, a human being, a castle, and mass-like substances, e.g., water, sand, salt. A third component of the physical world is the existence of places, such as in the city, near the park or under the chair. We therefore divide PHYSICAL-OBJECT into three subclasses: SEPARABLE-ENTITY, MATERIAL, and PLACE. Separable entities are further divided into animate and inanimate objects.

The social world contains objects which are the constructs of organized societies: geopolitical entities, including nations, cities, and states; and organizations, including service corporations, political parties, professional organizations, and so on. All of the above entities become, in fact, concepts in our ontology.

In the mental world, we recognize a basic division into ABSTRACT-OBJECTs, such as principles, ideas, or beliefs, and REPRESENTATIONAL-OBJECTs, including MATHEMATICAL-OBJECTs, such as set, variable, or number; LANGUAGE-RELATED-OBJECTs, such as word, text, or sentence; ICONs and PICTORIAL-OBJECTs. At this point, we have a top-level OBJECT hierarchy that appears as in Figure 1.

Figure 1: A subnetwork for objects.

Turning to events, we find that while certain events can be clearly represented as either physical, mental or social, more often than not we will find that it is difficult to categorize an event as a single type of event, or that we will need to view an event as a series of related component events. Examples of typical physical events include CHANGE-LOCATION, which would subsume the types of activities conveyed by English words such as move and walk, and APPLY-FORCE (cf. insert or throw, etc.).
MENTAL-EVENTs include COGNITIVE-EVENTs, which can be active, as in thinking or deciding, or passive, as in remembering, forgetting, or understanding, depending on the presence or absence of volition (which is made manifest through the active events having an AGENT case role). Two other subclasses of mental events, namely PERCEPTUAL-EVENTs and COMMUNICATIVE-EVENTs, illustrate the problems any ontology builder (and, in fact, any taxonomy organizer) will have with multiple classification options. Perceptual events are classified as both mental and physical events, while communicative events are classified as both mental and social events. Detailed descriptions of the PERCEPTUAL-EVENT hierarchy and the SPEECH-ACT subtree of the COMMUNICATIVE-EVENT hierarchy will be given in Sections 3.2 and 4.1 below, respectively. Figure 2 illustrates a portion of the top-level event hierarchy.

We would also like to stress that the DIONYSUS ontology operates not just with graphical symbols but also with complete symbolic representations. Figure 3 illustrates the symbolic representation of two ontological concepts.
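Multiple classification, such as PERCEPTUAL-EVENT falling under both PHYSICAL-EVENT and MENTAL-EVENT, simply means that a concept may have several IS-A parents, so ancestor computation must follow every parent link. A sketch over the concepts named above:

```python
# Sketch of the multiple classification described above: PERCEPTUAL-EVENT
# is an IS-A child of both PHYSICAL-EVENT and MENTAL-EVENT, so computing
# ancestors must follow all parent links.  Data layout is assumed.

IS_A = {
    "*perceptual-event": ["*physical-event", "*mental-event"],
    "*communicative-event": ["*mental-event", "*social-event"],
    "*physical-event": ["*event"],
    "*mental-event": ["*event"],
    "*social-event": ["*event"],
    "*event": ["*all"],
}

def ancestors(concept):
    """All concepts reachable upward from concept via IS-A links."""
    found = set()
    for parent in IS_A.get(concept, []):
        found.add(parent)
        found |= ancestors(parent)
    return found

print("*physical-event" in ancestors("*perceptual-event"))  # True
print("*mental-event" in ancestors("*perceptual-event"))    # True
```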
Figure 2: A subnetwork for events.
Figure 3: Sample symbolic representations of ontological concepts.
3.2. Perceptual Events and Attributes
We have described a top-level division of our ontology into objects, events and properties, where events and objects are further divided into subclasses corresponding to the physical, mental and social worlds, while properties are classified as either relations or attributes. Assuming this basic top-level division, we now consider how we might extend the hierarchy in order to provide an ontological treatment of the field of perception. At this point we will introduce some extra detail of the decision-making process as regards concept identification and the delineation of conceptual boundaries.

In order to guide the process of concept acquisition and to provide a basis for the creation of language-independent concepts, we first consider some cross-linguistic evidence concerning words related to perception in various languages. A rich source of cross-linguistic evidence was found in a survey conducted by Viberg (1984) of the verbs of perception in 53 different languages. The results of this survey pointed to a fundamental distinction in all of the languages examined between the passive experience of sense perception and the deliberate act (or activity) of perception (p. 123). This distinction is reflected in English by the existence of such verb pairs as see/look, hear/listen, feel/touch.13 Another distinction which was borne out in the languages of his survey has to do with the difference between experiencer-based verbs of perception, in which the entity undergoing the perceptual experience or activity is the subject of the sentence, and source-based (or phenomenon-based) verbs of perception, which take the object experienced as the subject (p. 124). These fundamental, cross-linguistic distinctions can be classified as follows:

1. Experiencer-based verbs
   (a) activity (consciously controlled by agent); examples: John looked at the roses in the garden. Pete smelled the meat to see if it was fresh.
   (b) experience (uncontrolled state); examples: Jane heard music coming from the auditorium. Susie smelled smoke in the kitchen.

2. Source-based (or phenomenon-based) verbs (stative verbs; copulative or attributive expressions); examples: The flower smells fragrant. The soup tastes salty.

Using the results of this survey as a guideline, a fundamental distinction was made in the ontology between voluntary and involuntary perception. A VOLUNTARY-PERCEPTUAL-EVENT is defined as an agent-controlled, conscious activity involving one of the five sense modalities. An INVOLUNTARY-PERCEPTUAL-EVENT is defined as a state or experience involving one of the five sense modalities, which is not under the conscious control of an agent.14 The ontological concepts VOLUNTARY-PERCEPTUAL-EVENT and INVOLUNTARY-PERCEPTUAL-EVENT are subclasses of PERCEPTUAL-EVENT, which is classified as both a PHYSICAL-EVENT and a MENTAL-EVENT, reflecting the intuitive notion that perception involves both physical and mental properties.

The difference between VOLUNTARY-PERCEPTUAL-EVENT and INVOLUNTARY-PERCEPTUAL-EVENT is captured in the ontology by a difference in the assignment of case roles for each of these types of events.
Not every language has a lexical distinction between the experience and activity types of perception for each sense modality. English, for example, uses the verb smell for both the activity and the experience of smelling. 14
Viberg (1984, p. 123) refers to this as an “inchoative achievement”
A VOLUNTARY-PERCEPTUAL-EVENT requires an AGENT slot, to represent the entity that performs the act of perception. In contrast, an INVOLUNTARY-PERCEPTUAL-EVENT requires an EXPERIENCER slot, for the entity that experiences the uncontrolled state of perception; moreover, the AGENT slot must be blocked from occurring in this frame, to account for the fact that an INVOLUNTARY-PERCEPTUAL-EVENT is not an agent-controlled activity. These ontological constraints are illustrated below for the concepts VOLUNTARY-VISUAL-EVENT and INVOLUNTARY-VISUAL-EVENT, which correspond, respectively, to the English verbs look and see.

(VOLUNTARY-VISUAL-EVENT
  (AGENT      (sem *animal))
  (THEME      (sem *physical-object))
  (INSTRUMENT (sem *vision-organ)))

(INVOLUNTARY-VISUAL-EVENT
  (EXPERIENCER (sem *animal))
  (THEME       (sem *physical-object))
  (INSTRUMENT  (sem *vision-organ)))

The subtrees for VOLUNTARY-PERCEPTUAL-EVENT and INVOLUNTARY-PERCEPTUAL-EVENT account for the types of verbs that Viberg classifies as experiencer-based; they are illustrated in Figure 4. However, we need another ontological distinction to capture the attributive perceptual phenomenon of source-based verbs. For this, we created the concept of PERCEPTUAL-ATTRIBUTE. In a sentence like The soup tastes salty (to me), the verb taste would be linked to the concept GUSTATORY-ATTRIBUTE in the ontology:

(GUSTATORY-ATTRIBUTE
  (DOMAIN          (sem *physical-object))
  (ATTRIBUTE-RANGE ())
  (EXPERIENCER     (sem *animal)))

In the example cited above, the DOMAIN slot would be filled by the soup, the EXPERIENCER slot by me. For the range slot of this type of attribute, we offer two options for the fillers:

1) Define and use a set of values from a symbolic value set. For taste, GUSTATORY-ATTRIBUTE could, for example, be given the following set as a filler: {&salty, &sweet, &sour, &spicy, &bitter}. As discussed in Section 2 above, these symbolic fillers do not reference ontological concepts.[15]

2) Define a set of numerical range fillers by introducing a "scientific" approach. For instance, for COLOR, we could use the Munsell scale to measure colors using numerical scales for hue, value and chroma. Presumably, some set of numerical criteria could be established for other perceptual categories: for example, chemical measurements for smell or taste; measurements for sound in terms of harmonics, timbre, and so on.
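The AGENT/EXPERIENCER contrast described above can be sketched operationally. The following is a minimal Python rendering of ours, not part of ONTOS; the dictionary layout and helper names are invented for illustration:

```python
# Sketch of the case-role contrast: a VOLUNTARY event frame carries an
# AGENT slot, an INVOLUNTARY one an EXPERIENCER slot, and the frame
# builder refuses a role that is blocked for the concept.

VOLUNTARY_VISUAL_EVENT = {
    "AGENT": {"sem": "*animal"},
    "THEME": {"sem": "*physical-object"},
    "INSTRUMENT": {"sem": "*vision-organ"},
}

INVOLUNTARY_VISUAL_EVENT = {
    "EXPERIENCER": {"sem": "*animal"},
    "THEME": {"sem": "*physical-object"},
    "INSTRUMENT": {"sem": "*vision-organ"},
}

# AGENT is blocked here: an involuntary event is not agent-controlled.
BLOCKED = {"INVOLUNTARY-VISUAL-EVENT": {"AGENT"}}

def fill_role(concept_name, frame, role, filler):
    """Attach a filler to a case role, refusing blocked or missing roles."""
    if role in BLOCKED.get(concept_name, set()):
        raise ValueError(f"{role} is blocked for {concept_name}")
    if role not in frame:
        raise KeyError(f"{concept_name} has no {role} slot")
    return {**frame, role: {**frame[role], "value": filler}}

see = fill_role("INVOLUNTARY-VISUAL-EVENT", INVOLUNTARY_VISUAL_EVENT,
                "EXPERIENCER", "%mary-1")
```

Attempting to fill the AGENT role of the involuntary frame would raise an error, mirroring the blocking constraint stated in the text.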
In addition to positing perceptual events and attributes for the ontology, concepts were needed to account for the physiological sense of perception, i.e., a faculty which is part of an animate being, and for the sensory organ which is the instrument for a particular faculty of perception. For this, we created the ontological subtrees PERCEPTUAL-SENSE and SENSORY-ORGAN, each of which has five children, one for each sense modality.

[15] The use of symbolic fillers is simply a way of limiting the grain size of the ontology to a specified level of detail. Should the need for a more detailed level of description arise, symbolic values such as salty or red may be redefined as ontological concepts with their own set of properties.

Figure 4: A subnetwork for perceptual events.

In order to illustrate the use of the above ontological distinctions for perception, we describe below how a highly polysemous English word like smell may be linked to one of four concepts in the ontology: VOLUNTARY-OLFACTORY-EVENT, INVOLUNTARY-OLFACTORY-EVENT, OLFACTORY-ATTRIBUTE, and OLFACTORY-SENSE. We present the lexical mapping components of lexicon entries for five senses of smell and give examples of their use. The lexical mapping connects a word sense with its propositional meaning.[16]

(smell-v1
  (lexical-mapping
    (VOLUNTARY-OLFACTORY-EVENT
      (AGENT (sem *animal))
      (THEME (sem *physical-object)))))

Example: John smelled the milk to see if it had spoiled.

(smell-v2
  (lexical-mapping
    (INVOLUNTARY-OLFACTORY-EVENT
      (EXPERIENCER (sem *animal))
      (THEME (sem *physical-object)))))

Example: Mary smelled smoke when she walked into the kitchen.

(smell-v3
  (lexical-mapping
    (OLFACTORY-ATTRIBUTE
      (DOMAIN (sem *physical-object))
      (ATTRIBUTE-RANGE (sem ))
      (EXPERIENCER (sem *animal)))))

Example: The cake smells like chocolate.

(smell-n1
  (lexical-mapping
    (OLFACTORY-SENSE
      (PART-OF (sem *animal)))))

Example: That golden retriever has a keen sense of smell.

[16] For a detailed discussion of the lexis-ontology mapping in the DIONYSUS project, see Meyer et al. 1990. The indexing of lexical entries shown in the examples is as follows: a headword (e.g., smell) is followed by a hyphen, an indication of the part of speech, and a 'one-up' number for a particular word sense for that part of speech.
(smell-n2
  (lexical-mapping
    (VOLUNTARY-OLFACTORY-EVENT
      (AGENT (sem *animal))
      (THEME (sem *physical-object)))))

Example: The coffee shop owner had a smell of the coffee beans.

Notice that smell-n2 and smell-v1 correspond to the same ontological concept, in spite of the difference in part of speech.
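The many-to-one character of the lexical mapping can be made concrete with a small sketch (a Python rendering of ours; the table simply restates the five entries above):

```python
# Lexicon-to-ontology mapping for the five senses of "smell": several
# word senses, possibly of different parts of speech, may point at the
# same ontological concept.

LEXICON = {
    "smell-v1": "VOLUNTARY-OLFACTORY-EVENT",
    "smell-v2": "INVOLUNTARY-OLFACTORY-EVENT",
    "smell-v3": "OLFACTORY-ATTRIBUTE",
    "smell-n1": "OLFACTORY-SENSE",
    "smell-n2": "VOLUNTARY-OLFACTORY-EVENT",
}

def senses_for_concept(concept):
    """All word senses whose lexical mapping points at the given concept."""
    return sorted(s for s, c in LEXICON.items() if c == concept)

# smell-v1 and smell-n2 share a concept despite differing in part of speech
assert senses_for_concept("VOLUNTARY-OLFACTORY-EVENT") == ["smell-n2", "smell-v1"]
```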
3.3.
Complex Events
Many events in the ontology are best understood as named sets of component events. These are complex events. In what follows we describe our approach to representing complex events.[17] The frame for a complex action includes, in addition to the slots found in the representation of simple actions, a slot for subevents. An example follows.[18]

(TEACH
  (IS-A          (value *communicative-event))
  (AGENT         (sem *intentional-agent)
                 (default *teacher))
  (THEME         (sem *knowledge))
  (GOAL          (sem *intentional-agent)
                 (default *student))
  (LOCATION      (default (#spatial-in-1)))
  (TIME          (default nil))
  (PRECONDITION  (default (and #teach-know-1 (not #teach-know-2))))
  (POSTCONDITION (default #teach-know-2))
  (SUBEVENTS     (component-events #teach-describe #teach-request-info #teach-answer)
                 (component-relations #domain-temporal-after-1 #domain-temporal-after-2)))
The representation of a complex action is typically a set of frames, not a single frame. We constrain the meaning to teaching of humans by humans; this is why the agent and the goal are both intentional agents. The filler for the location slot is a relative location marker for which we allow specification through the instance of a relation; similarly, the relative temporal relations among the component actions of teach are described in terms of special relation instances. Conditions and components are described through references to particular instances of certain events (in this case, three different instances of know and an instance each of describe, request-info and answer). These component actions are not just any instances of their respective types but instances specifically constrained in the ways necessary for the description of the main event, teach. So, even though their descriptions contain a reference to their class (through the instance-of relation), their major "allegiance" is to the concept in whose definition they appear, in this example, teach. These we will call ontological instances, and in the representation we will prepend them with a pound sign (#). Note that the introduction of ontological instances parallels the further constraining of filler values with the at-sign (see the example in Section 2 above). The difference between the two approaches is that a modified concept can be referred to lexically if it is introduced as an ontological instance, whereas *solid @ (hardness > 0.5) can be referred to only as a value (that is, as "the filler of the sem facet of the made-of slot of *table").

[17] The status of complex events is similar to that of scripts and plans in the sense of Schank and Abelson (1977).

[18] We use the role of goal to designate the recipient of the imparted knowledge.

(TEACH-KNOW-1
  (INSTANCE-OF (value *know))
  (EXPERIENCER (value *teach.agent.sem))
  (THEME       (value *teach.theme.sem)))

(TEACH-KNOW-2
  (INSTANCE-OF (value *know))
  (EXPERIENCER (value *teach.goal.sem))
  (THEME       (value *teach.theme.sem)))

(TEACH-DESCRIBE
  (INSTANCE-OF (value *describe))
  (AGENT       (value *teach.agent.sem))
  (THEME       (value *teach.theme.sem))
  (GOAL        (value *teach.goal.sem)))

(TEACH-REQUEST-INFO
  (INSTANCE-OF (value *request-info))
  (AGENT       (value *teach.goal.sem))
  (THEME       (value *teach.theme.sem))
  (GOAL        (value *teach.agent.sem)))

(TEACH-ANSWER
  (INSTANCE-OF (value *answer))
  (AGENT       (value *teach.agent.sem))
  (THEME       (value *teach-request-info.theme.sem))
  (GOAL        (value *teach.goal.sem)))
The above examples introduce yet another feature of the ONTOS constraint language: paths. The notation frame.slot[.facet] means "the filler of [the given facet of] the given slot in the given frame." Thus, *teach.agent.sem means the filler of the sem facet of the agent slot of the frame teach; as a result, the experiencer of teach-know-1 is coreferential with the agent of the main event, teach.

(SPATIAL-IN-1
  (DOMAIN *teach)
  (RANGE  *classroom))

(DOMAIN-TEMPORAL-AFTER-1
  (RELATION-VALUE 0.2)
  (DOMAIN *describe)
  (RANGE  *request-info))

(DOMAIN-TEMPORAL-AFTER-2
  (RELATION-VALUE 0.2)
  (DOMAIN *request-info)
  (RANGE  *answer))

The above relations are shared by the TAMERLAN text meaning representation language and the ONTOS constraint language. This is one of several manifestations of the mixed character of the complex event descriptions: they are both ontological entities and representations of actual processes. In other words, they are "canned" episodes. In a nutshell, the entire example says that:

a) teaching is a communicative event (which, in turn, is both a mental and a social event), typically performed by teachers;
b) those taught are typically students;
c) there are things that are taught, and they are subclasses of the concept KNOWLEDGE;
d) the teaching typically occurs in classrooms;
e) the precondition of teaching is typically that the teacher knows the material and the students don't;
f) the default postcondition is that the students know the material;
g) the component events in teaching include the teacher's describing the material to the students, the students' questions about the material to the teacher, and the teacher's answers to these questions.
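The path notation can be read mechanically; here is a hedged Python sketch of ours (frame contents abridged, function name invented) of how such a path might be resolved:

```python
# Resolving an ONTOS-style path (frame.slot.facet) against a frame store.

FRAMES = {
    "*teach": {
        "AGENT": {"sem": "*intentional-agent", "default": "*teacher"},
        "THEME": {"sem": "*knowledge"},
        "GOAL":  {"sem": "*intentional-agent", "default": "*student"},
    },
}

def resolve(path):
    """Resolve 'frame.slot.facet' to the filler of that facet."""
    frame, slot, facet = path.split(".")
    return FRAMES[frame][slot.upper()][facet]

# the experiencer of teach-know-1 is coreferential with teach's agent:
assert resolve("*teach.agent.sem") == "*intentional-agent"
```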
3.4.
Classes and Instances
So far in this paper we have for the most part discussed the properties, behavior and representation of classes of entities in the world. The information that we recorded in the ontology thus related to all objects, properties, computers, vehicles, etc., not to any particular instance of any such class. In reality, reasoning about entities in the world always involves dealing with tokens of ontological and domain-model types. For instance, in natural language processing, open-class lexical items typically refer to tokens (instances) of entity types (classes). Only in generic senses (e.g., The tiger is a ferocious animal) does the information apply to the whole class of entities (in the example, the class of all tigers or the ontological concept tiger). In most cases concept instances are "fleeting," that is, they are included in text meaning representations only to be discarded when the processing is finished. As we saw in the description of complex actions, there are some instances which are non-fleeting. The ontological instances, however, are not instances in the complete sense of the word. To explain why this is so, let us consider a situation in which a complex action is instantiated during the processing of a natural language text. The connection between this instance (say, %teach-8) and its corresponding class, the complex event *teach, is maintained.[19] Then, if some other action is instantiated, a special process in the analyzer will check whether this new instantiation matches one of the component event specifications in the above complex action. This feature is necessary to support correct meaning representation. To summarize, ontological instances help the analyzer check the actual connections among various events referred to in a text.

[19] To represent an instance of an ontological concept like *teach, we prepend the symbol % to the concept name and affix a numerical index to the end of the name, e.g., %teach-8.
Natural language texts, however, habitually include references to instances that are neither ontological nor fleeting. These are the so-called “remembered instances” such as “Paris,” “John Kennedy,” “The Washington Post,” “IBM” or “the 1990 Stanley Cup Finals.” These are instances of objects and events which we remember even after having processed the texts in which they appeared. The knowledge base of remembered instances can be quite large in any practical system of machine translation or natural language processing. In some cases the semantics of value sets in the representation of classes and their corresponding instances is different. For example, in the description of the concept ROOM the fillers of the HAS-AS-PART slot form a disjunctive power set, and are thus included in the sem facet, since any subset of the objects listed as potentially capable of being in a room can be in an actual room. In the description of an instance of a room the fillers of the HAS-AS-PART slot are listed in the value facet, since they must be understood as members of a conjunctive set, namely, all and only the objects (more precisely, object instances) that are in this particular room.
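The contrast between the disjunctive sem reading in a class and the conjunctive value reading in an instance can be sketched as follows (a Python rendering of ours; the ROOM fillers are invented examples):

```python
# In the class ROOM, the sem facet of HAS-AS-PART licenses any subset of
# the listed object types; in a room instance, the value facet enumerates
# all and only the object instances actually in that room.

ROOM_CLASS = {"HAS-AS-PART": {"sem": {"*bed", "*chair", "*table", "*lamp"}}}
ROOM_INSTANCE = {"HAS-AS-PART": {"value": {"%bed-3", "%lamp-7"}}}

def admissible_part(room_class, part_type):
    """Class-level, disjunctive check: may this type occur in a room?"""
    return part_type in room_class["HAS-AS-PART"]["sem"]

def parts_of(room_instance):
    """Instance-level, conjunctive reading: the parts of this very room."""
    return room_instance["HAS-AS-PART"]["value"]

assert admissible_part(ROOM_CLASS, "*chair")
assert parts_of(ROOM_INSTANCE) == {"%bed-3", "%lamp-7"}
```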
3.5.
Inheritance
Concepts in the ontology are typically characterized by a large number of properties. The reader will have already observed, however, that many properties and even many complete property-value pairs are shared among some concepts. Indeed, our approach to building an ontology can be said to resemble preparing knowledge for a computer program which would play the “Twenty Questions” game. The question “Is it an animal, plant or mineral?” can be interpreted in computational ontology terms as traversal of a link from the root of a concept hierarchy to its children, which, for the purposes of the game will be labeled ANIMAL, PLANT and MINERAL. In reality, the concepts in the ontology are stored in a heterarchy or a plex structure (or a lattice) — a hierarchy in which a node can have more than one parent. The property which forms the backbone of this structure is IS-A (and its inverse, SUBCLASSES). It is very important to organize the domain model as a network of densely interconnected concepts. This is beneficial for search and retrieval as well as for update efficiency. Additionally, the greater the number of associations among concepts the better the support of reasoning activities on the basis of a domain model — since more opportunities exist for connecting any two concepts together and thus, for instance, making sense of their cooccurrence in a text. The IS-A heterarchy allows one to store locally in a node only the property-value pairs that are different from those in its parent(s). The semantics of inheritance is such that if no local mention is made of a property, it is assumed that the value is the same as in a node’s parent (or a more distant ancestor, up to the root of the entire ontology, if the value in question is not set in the frame for the immediate parent). 
It is the responsibility of the ontology builder to make sure that no ambiguities arise from multiple inheritance — in other words, that the same property with incompatible values is not inherited by a node from different parents of that node. Not all property values can be inherited. For instance, inheritance is blocked, for obvious (though different) reasons, for the fillers of the IS-A, SUBCLASSES, INSTANCES, INSTANCE-OF, DEFINITION and COMMENTS slots in any concept. Note also that not all facets of a property (slot) are inherited. Inheritance occurs for the contents of the value and sem facets, but not for those of default or salience.
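The inheritance regime just described (local values first, then ancestors, with certain slots never inherited) can be sketched as follows (our Python rendering; the miniature ontology is invented):

```python
# Property lookup over an IS-A heterarchy: a value stored locally wins;
# otherwise each parent is consulted in turn; slots such as DEFINITION
# are never inherited.

NON_INHERITED = {"IS-A", "SUBCLASSES", "INSTANCES", "INSTANCE-OF",
                 "DEFINITION", "COMMENTS"}

ONTOLOGY = {
    "*object":    {"IS-A": [], "STATE-OF-MATTER": "&solid"},
    "*furniture": {"IS-A": ["*object"]},
    "*table":     {"IS-A": ["*furniture"], "DEFINITION": "a table"},
}

def inherited_value(concept, prop):
    """Return the locally stored value, or the nearest ancestor's value."""
    frame = ONTOLOGY[concept]
    if prop in frame:
        return frame[prop]          # local value shadows the parents'
    if prop in NON_INHERITED:
        return None                 # e.g., DEFINITION never trickles down
    for parent in frame["IS-A"]:    # multiple parents are possible
        found = inherited_value(parent, prop)
        if found is not None:
            return found
    return None

assert inherited_value("*table", "STATE-OF-MATTER") == "&solid"
assert inherited_value("*furniture", "DEFINITION") is None
```

A fuller rendering would also restrict inheritance to the value and sem facets, as the text specifies; the sketch collapses facets for brevity.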
3.6.
Scalar Attributes and Measuring Units
At this point we would like to give a detailed description of an additional kind of property in the ontology, the scalar attributes. They are important as a group not only from the ontological point of view but also because of the peculiarities of their formalization. We described above a division of properties into two classes: RELATIONs, which reference concepts in their range slots, and ATTRIBUTEs, which reference values from numerical or symbolic value sets. ATTRIBUTEs, in turn, can be divided into two general types, based on what they can reference in their range slots. The class LITERAL-ATTRIBUTE designates attributive properties for which a symbolic value is specified, for example, (ORGANIC-ATTRIBUTE (sem &organic &inorganic)). The class SCALAR-ATTRIBUTE is reserved for attributive properties whose value can be numerically measured on an appropriate scale, for example, area = 12 square feet, or age = 50 years.

In order to provide an adequate ontological treatment of SCALAR-ATTRIBUTEs, several factors must be considered. First of all, consider some properties of physical objects, such as temperature, age, or volume. In order to state a numerical value (or range of values) for each of these properties, a particular class of measuring units must be specified. We can measure temperature in kelvins or degrees Fahrenheit; age in years, hours, or seconds; volume in cubic meters or cubic inches; and so on. A second observation is that any property which can be measured on a scale can also be described linguistically in relative terms. We can say The temperature is 90 degrees today or It is very hot today. Similarly, That man is very tall or That man is 6 feet, 6 inches tall. In the ONTOS system, we have devised a method for relating various scalar attributes to their corresponding measuring units, which allows us to convert scalar information into a standard format for interlingua representation.
It also provides a way to associate relative information, such as very hot, with an appropriate numerical interpretation for the object being described. Consider the ontological subtree SCALAR-PHYSICAL-OBJECT-ATTRIBUTE, which includes subclasses for TEMPERATURE, VELOCITY, WEIGHT, SIZE, and so on. Corresponding to each of these concepts is a particular class of MEASURING-UNITs, which appear under the node REPRESENTATIONAL-OBJECT in the ontology (see Figure 5). For TEMPERATURE, the related class of measuring units is THERMOMETRIC-UNIT, which has subclasses DEGREE-F, DEGREE-C, and KELVIN. Below, we will see how slots are used to relate a particular scalar attribute to the appropriate class of measuring units, and how conversion between different units within a class takes place.

Every scalar attribute has an ATTRIBUTE-RANGE slot, which is used to express information about the range of numerical values a particular scalar attribute can have. An ATTRIBUTE-RANGE has two facets, which are specified as follows:

(ATTRIBUTE-RANGE
  (sem <absolute constraint on numerical values>)
  (measuring-unit <standard unit of measure>))

The sem facet is used to indicate an absolute constraint on what numerical values the range can have. For example, it is impossible to have a negative value for age, so the sem facet must specify an absolute constraint of (> 0). The second facet, measuring-unit, is used to indicate a standard unit of measure for the scalar attribute in which the attribute range appears, since an absolute constraint can only be interpreted with respect to a specific unit of measure. For each class of measuring units in the ontology, we designate one unit as the standard measure for that class. For example, in the case of the concept AGE, the standard unit is SECOND.[20] Some examples of the ATTRIBUTE-RANGE slot for different scalar attributes are given below:

(TEMPERATURE
  (ATTRIBUTE-RANGE (sem (>= 0)) (measuring-unit *kelvin)))

(LINEAR-SIZE
  (ATTRIBUTE-RANGE (sem (> 0)) (measuring-unit *meter)))

(AGE
  (ATTRIBUTE-RANGE (sem (> 0)) (measuring-unit *second)))

Additionally, we must specify in the DOMAIN slot of every SCALAR-ATTRIBUTE the types of concepts for which the attribute is appropriate. The DOMAIN for a SCALAR-PHYSICAL-OBJECT-ATTRIBUTE (e.g., TEMPERATURE, LINEAR-SIZE, etc.) will be PHYSICAL-OBJECT. For a SCALAR-OBJECT-ATTRIBUTE (e.g., AGE), the DOMAIN is OBJECT. Below, we show the relevant slots from the frame for AGE. Compare the ATTRIBUTE-RANGE slot for the concept AGE with the slot AGE, when it appears in an OBJECT concept such as COFFEE:

(AGE
  (ATTRIBUTE-RANGE (sem (> 0)) (measuring-unit *second))
  (MEASURED-IN     (sem *temporal-unit))
  (DOMAIN          (sem *object)))

(COFFEE
  (AGE (sem (> 0))
       (default (0 4))
       (measuring-unit *hour)))

When a scalar attribute appears as a slot in an OBJECT concept, such as COFFEE, it has three facets. The sem facet represents the absolute constraint on what values the attribute can have. The default facet expresses a typical or expected range of values for the age of the object being described, while the measuring-unit facet selects an appropriate measuring unit from the class TEMPORAL-UNIT. We represent the AGE of coffee (that is, the hot beverage, not the bean) in hours, whereas the AGE of a human would be expressed in years, for example. The slot MEASURED-IN in the AGE frame specifies TEMPORAL-UNIT as the appropriate class of MEASURING-UNIT for AGE. We represent the AGE of COFFEE (i.e., the beverage) as having a default range value of 0 - 4 hours; this range is defeasible, i.e., can be overridden, as long as the absolute constraint is not violated. Default range values may be used to associate lexical items like new, fresh, old, etc., which represent relative information, with typical numerical values for the age of particular objects in the ontology. The same principle can be applied to other scalar attributes as well, allowing us to make inferences about typical or expected values. For example, fresh-brewed coffee can be represented as a percentage (say 10%) of the default range value of 4 hours, that is, 0.4 hours (or 24 minutes), whereas fresh produce might be 10% of a default range of 7 days, i.e., 0.7 days (or 16.8 hours). While exact values for default ranges of scalar attributes and percentages associated with particular lexical items are subject to refinement, the mechanism needed to support this type of inference making is already in use in the DIANA system.

[20] We originally considered specifying the appropriate class of measuring units (e.g., THERMOMETRIC-UNIT) in the measuring-unit facet of the attribute range, rather than the standard measure (e.g., KELVIN), but opted for the latter because the absolute range may not be identical for all members of the class. In the case of THERMOMETRIC-UNIT, for example, the absolute constraint (>= 0) is only true if the unit of measure is KELVIN.

Finally, we demonstrate how information is converted from one unit of measure to another, enabling us to establish standard units of measure for the interlingua text. Consider the frames for the concepts THERMOMETRIC-UNIT and DEGREE-F:

(THERMOMETRIC-UNIT
  (IS-A               (value *measuring-unit))
  (SUBCLASSES         (value *kelvin *degree-c *degree-f))
  (MEASURED-BY        (sem *thermometer))
  (MEASURING-UNIT-FOR (sem *temperature))
  (STANDARD-MEASURE   (value *kelvin)))

(DEGREE-F
  (IS-A (value *thermometric-unit))
  (CONVERSION-TO-STANDARD
    (value (lambda (X) (+ 273.15 (* (- X 32) 5/9))))))

The slots MEASURED-BY, MEASURING-UNIT-FOR and STANDARD-MEASURE will all be inherited by the concept DEGREE-F. Additionally, the frame DEGREE-F contains a CONVERSION-TO-STANDARD slot, whose filler is a lambda expression which, when invoked, converts temperatures expressed in degrees Fahrenheit into the standard measure, KELVIN. Below we illustrate the facets of the TEMPERATURE slot for HUMAN:

(HUMAN
  ....
  (TEMPERATURE
    (sem (96 106))
    (default 98.6)
    (measuring-unit *degree-f)))
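The conversion and constraint machinery above can be sketched in Python (our rendering, not ONTOS code; the registry and helper names are invented, and only the temperature case plus the default-range inference for fresh are shown):

```python
# CONVERSION-TO-STANDARD as plain functions, plus the default-range
# inference for relative terms like "fresh" described in the text.

TO_STANDARD = {
    "*kelvin":   lambda x: x,                              # already standard
    "*degree-c": lambda x: x + 273.15,
    "*degree-f": lambda x: (x - 32) * 5.0 / 9.0 + 273.15,  # the lambda above
}

def to_kelvin(value, unit):
    """Convert a temperature into the standard measure, KELVIN."""
    return TO_STANDARD[unit](value)

def satisfies_absolute_constraint(kelvin_value):
    """TEMPERATURE's sem facet, (>= 0), interpreted in kelvins."""
    return kelvin_value >= 0

def typical_fresh_value(default_upper, fraction=0.10):
    """'Fresh' as a percentage of the default range, e.g., 10% of 4 hours."""
    return default_upper * fraction

body_temp = to_kelvin(98.6, "*degree-f")   # normal human body temperature
assert abs(body_temp - 310.15) < 1e-9
assert satisfies_absolute_constraint(body_temp)
assert abs(typical_fresh_value(4) - 0.4) < 1e-9   # fresh-brewed coffee, in hours
```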
Figure 5 illustrates a portion of the subnetworks for SCALAR-ATTRIBUTEs and MEASURING-UNITs.

Figure 5: Scalar attributes and measuring units.

3.7.
Variability in Network Building

In constructing a world model, it is possible to come up with quite different organizational strategies, depending on what criteria we use for dividing things up. For example, how do we determine when a particular feature should be the criterion for dividing a category into subcategories, rather than simply being expressed as a property of that category in a slot? We will illustrate the problem by discussing some options we faced when building the MATERIAL subtree of our PHYSICAL-OBJECT hierarchy.

One way of dividing up materials into subclasses would be based on whether they are in a solid, liquid or gaseous state of matter. Or, we could consider a division into subclasses for inorganic or organic materials. We might even want to divide materials into subclasses according to whether they are man-made or natural; metallic or nonmetallic; etc. Each of these choices would affect the decisions that have to be made at lower levels of the subnetwork. For example, if a choice is made to divide the MATERIAL subtree into the subclasses SOLID-MATERIAL, LIQUID-MATERIAL, and GASEOUS-MATERIAL, each of these could in turn be subdivided according to whether something is organic or inorganic. Using this approach, SOLID-MATERIAL, for example, would be subdivided into INORGANIC-SOLID-MATERIAL (e.g., metal) and ORGANIC-SOLID-MATERIAL (e.g., wood), resulting in a tree structure that looks like the one in Figure 6.

Figure 6: A subnetwork for materials.

Alternatively, the subclasses for SOLID-, LIQUID-, and GASEOUS-MATERIALs could be divided according to whether something is man-made or natural. Then, for example, the SOLID-MATERIAL subtree would have subclasses MAN-MADE-SOLID-MATERIAL (e.g., concrete) and NATURAL-SOLID-MATERIAL (e.g., rock), as in Figure 7.

Figure 7: An alternative subnetwork for materials.
On the other hand, if the MATERIAL subtree were first divided into ORGANIC-MATERIAL and INORGANIC-MATERIAL, then at the next level of the hierarchy we could have divisions into, for example, SOLID-INORGANIC-MATERIAL (e.g., metal), LIQUID-INORGANIC-MATERIAL (e.g., water), and GASEOUS-INORGANIC-MATERIAL (e.g., oxygen), or alternatively, into MAN-MADE-INORGANIC-MATERIAL (e.g., steel) and NATURAL-INORGANIC-MATERIAL (e.g., mineral), as illustrated in Figures 8 and 9, respectively.
Figure 8: A third possible subnetwork for materials.
Figure 9: A fourth possible subnetwork for materials.

A completely different approach — and the one which we have currently adopted — would be to divide the MATERIAL subtree using some other criteria and treat the distinctions outlined above as properties (slots). For example, we could have categories which group things together based on their source or origin, such as EARTH-MATERIAL, MARINE-SUBSTANCE, PLANT-DERIVED-SUBSTANCE, SYNTHETICS, etc., or on their use, such as CLEANSING-MATERIAL, PAVING-MATERIAL, PIGMENT, etc. Then properties could be assigned to indicate whether something is man-made or natural; organic or inorganic; metallic or non-metallic; solid, liquid, or gas; and so forth. Using this approach, a frame for ROCK might include the following slots:
(ROCK
  (IS-A               (sem *earth-material))
  (STATE-OF-MATTER    (value *solid))
  (ORGANIC-ATTRIBUTE  (value &inorganic))
  (METALLIC-ATTRIBUTE (sem &metallic &non-metallic))
  (MAN-MADE-ATTRIBUTE (value &natural)))
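Such a frame can be queried slot by slot; a minimal Python sketch of ours (invented helper names) follows:

```python
# Slot-based inference: instead of decoding a compound class name such as
# SOLID-INORGANIC-MATERIAL, a reasoner inspects the slot filler directly.

ROCK = {
    "IS-A": "*earth-material",
    "STATE-OF-MATTER": "*solid",
    "ORGANIC-ATTRIBUTE": "&inorganic",
    "MAN-MADE-ATTRIBUTE": "&natural",
}

def is_solid(concept):
    """One slot lookup, rather than string-matching a class name."""
    return concept.get("STATE-OF-MATTER") == "*solid"

def is_natural(concept):
    return concept.get("MAN-MADE-ATTRIBUTE") == "&natural"

assert is_solid(ROCK) and is_natural(ROCK)
```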
One advantage of this approach is that it is easier to make inferences by checking the filler of a slot: to see if something is a solid, for example, one checks the filler of the STATE-OF-MATTER slot, rather than trying to find that information buried in a class name such as SOLID-INORGANIC-MATERIAL. Including this type of information in slots allows for efficient representation and inheritance of features, and eliminates the kind of duplication across subtrees that would necessarily occur in the sample trees illustrated in Figures 6 - 9. For example, using the approach illustrated in Figures 6 and 7, we would have to subdivide gaseous and liquid materials in a manner similar to that of solid materials. The approach we have opted for avoids another problem as well, namely, that of class names growing too long and unwieldy as one proceeds down a subtree: SOLID-INORGANIC-MAN-MADE-NONMETALLIC-MATERIAL is simply unmanageable as a concept name.

Figure 10: Static knowledge sources in a comprehensive knowledge-based NLP system. (The figure connects SOURCE TEXT and TARGET TEXT through the ANALYZER and GENERATOR, which draw on the ONTOLOGY, the LEXICON, and TEXT MEANING REPRESENTATION FRAGMENTS; its legend distinguishes choice heuristics based on world knowledge, entry instantiation, and choice heuristics based on contextual knowledge, including pragmatics and discourse.)
4.
Ontology as a Component of Knowledge Support for NLP
Ontology building has never been a totally independent project in our environment. World knowledge in DIONYSUS has been acquired with the express purpose of using it in a natural language processing system. Knowledge representation requirements of such a system include, in addition to ontology specification, a representation for a lexicon entry and a language for recording the meaning of input text. The interaction between these static knowledge sources is illustrated in Figure 10. In our work, we have observed that having a well-developed textual meaning representation language as well as a detailed format for a dictionary entry eases requirements on the size and complexity of an ontological model. In this section we will illustrate the give-and-take between the lexicon, the text meaning representation language and the ontology in accounting for several sample phenomena in DIONYSUS.
4.1.
Speech Acts
We have seen several examples of the factors that must be considered when designing a language-independent ontology. Here we will examine another corner of the event hierarchy, namely, the subtree SPEECH-ACT. We will see how a very general ontological classification of basic speech act types, in conjunction with a constrained lexicon-ontology mapping and a link to certain TAMERLAN attitudes, can provide an adequate account of a large variety of speech act verbs in natural language.

Simply put, a speech act can be thought of as the performance of certain acts in uttering a sentence. The identification of such acts was first made by Austin in a series of lectures entitled How to Do Things with Words (1962), in which he outlined three types of acts which are performed simultaneously when making an utterance (quoted from Levinson 1983, p. 236):

1. locutionary act: the utterance of a sentence with determinate sense and reference

2. illocutionary act: the making of a statement, offer, promise, etc. in uttering a sentence, by virtue of the conventional force associated with it (or with its explicit performative paraphrase)[21]

3. perlocutionary act: the bringing about of effects on the audience by means of uttering the sentence, such effects being special to the circumstances of utterance

We have classified speech acts under the subtree COMMUNICATIVE-EVENT in the DIONYSUS ontology. The subtree SPEECH-ACT is in turn divided into the subclasses LOCUTIONARY-ACT and ILLOCUTIONARY-ACT. Since the perlocutionary effects of an utterance are not necessarily predictable from the utterance itself, perlocutionary acts are represented as the postconditions associated with a particular speech act event (see Section 3.3 above on complex events for a description of the POSTCONDITION slot).
Illocutionary acts are subdivided into five types, following Leech (1983): ASSERTIVE-ACT, ROGATIVE-ACT, EXPRESSIVE-ACT, DIRECTIVE-ACT, and COMMISSIVE-ACT.22 They are defined as follows: An ASSERTIVE-ACT refers to an act performed in asserting something, as exemplified by the English verbs state, report, announce, declare, pronounce or assert. A ROGATIVE-ACT describes an act performed in asking something, as in the English verbs inquire, query, interrogate, question, or ask. An EXPRESSIVE-ACT is an act which expresses an attitude of some sort, as illustrated by the English verbs thank, apologize, excuse, praise, denounce, blame or congratulate. A DIRECTIVE-ACT refers to an act performed in giving orders, as exemplified by the English verbs suggest, urge, command, demand, or order.
21 In practice, the term speech act has come to be synonymous with that of illocutionary act, and different theories of speech acts have focused their attention on categorizing the illocutionary verbs of a language in terms of a set of illocutionary acts.
22 Other classification schemas have been proposed. See, for example, Austin (1962) or Searle (1969 and 1979).
A COMMISSIVE-ACT is an act performed in offering something, as shown by the English verbs promise, vow, threaten, offer, guarantee or ensure.

Figure 11: Speech acts.

Figure 11 illustrates the SPEECH-ACT subtree in the DIONYSUS ontology. The separation of ontologically defined illocutionary act types from the actual illocutionary verbs of a language has several advantages. First of all, as Leech (1983, Chapter 9) observes, many problems arise from the attempt to categorize illocutionary verbs into illocutionary act types. Many of these verbs are polysemous, and can fall into more than one category. His examples are the verbs advise, suggest and tell, which can be used as either assertive or directive acts:
Assertive
1. I advised them that the building would close in 10 minutes.
2. Mary suggested that the plan was favorable to management.
3. Jerry told them that Doug would be late.
Directive
1. I advised them to take Route 40.
2. The teacher suggested that we type our term papers.
3. Aunt Helen told us to clean our rooms.

Because individual lexical entries in the DIONYSUS lexicon represent a single word sense of an orthographic word, we have no difficulty in, for example, linking advise-v1 with the ontological concept ASSERTIVE-ACT, and advise-v2 with DIRECTIVE-ACT. Moreover, another concern that Leech points out with classification schemas like Searle's becomes nonproblematic given our approach. In Searle's system, each illocutionary act type has a semantic description, but is also associated with a specific syntactic subcategorization pattern. As Leech rightly observes, this association of the act type with specific syntactic pattern information does not always hold. For example, Searle's category of expressive verbs is described as normally occurring in the construction: "S VERB (prep) (O) (prep) Xn, where (prep) is an optional preposition, and where Xn is an abstract noun phrase or a gerundive phrase; e.g.: apologize, commiserate, congratulate, pardon, thank" (p. 206). Leech cites the verb greet as an example of a member of this category which does not follow this subcategorization pattern. In the DIONYSUS lexicon, we specify syntactic subcategorization information in a special zone of a lexical entry, called SYN-STRUC. Therefore, the difference in subcategorization pattern between the first and second set of examples above, or the difference in subcategorization between greet and apologize, is stipulated for each lexical entry, and is independent of the linking to the ontology, which is specified in the SEM zone of the entry.

An important part of a speech act is the degree of force which it entails. Among assertive acts, to state something conveys a lesser degree of force than to assert something, for example. Similarly, a rogative act of inquiring is not as forceful as one of interrogating, while directive acts of suggesting, urging, or ordering represent increasing degrees of force. Therefore, the concept ILLOCUTIONARY-ACT will have a DEGREE-OF-FORCE slot, whose filler is a value on the numerical interval [0, 1]. In the lexicon, all illocutionary verbs will be linked to a specific ILLOCUTIONARY-ACT in the ontology, and the force conveyed by a particular verb will be specified in the DEGREE-OF-FORCE slot, as a decimal value on the interval [0, 1], where 0 represents a low degree of force, 0.5 represents a neutral value, and 1.0 represents a high degree of force.
Another aspect of a speech act may be an attitude conveyed in uttering a sentence. For example, different expressive acts convey various types of attitudes on the part of the speaker making the utterance. Thanking and congratulating express positive attitudes, while blaming and denouncing convey negative ones. In the DIONYSUS system, we represent speaker attitudes as TAMERLAN constructs, by-passing the ontology (see Nirenburg and Defrise 1991a, 1991b).23 Therefore, the SEM zone of a lexical entry may indicate a particular TAMERLAN attitude associated with the entry, in addition to the ontological concept with which it is linked. By constraining the DEGREE-OF-FORCE slot for a particular verb and associating with it a specific TAMERLAN attitude when necessary, we are able to classify a wide variety of speech act verbs in various natural languages using a limited set of ontological concepts, while still capturing gradations of meaning associated with different speech act verbs. This approach also provides us with an effective solution to the long-standing confusion between illocutionary verb and illocutionary act type.
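The interplay of act type, degree of force, and attitude described above can be sketched as follows. This is a minimal illustration in Python, not the actual DIONYSUS lexicon format: the entry shape and all of the force and attitude values are assumptions made for demonstration, and only the concept names and word-sense naming convention (advise-v1, advise-v2) follow the discussion above.

```python
# Hypothetical lexicon fragment: each entry is one word sense, linked to an
# ontological illocutionary act type, with an assumed degree-of-force value
# on [0, 1] and, when needed, an assumed attitude label.
LEXICON = {
    "state-v1":       {"sem": "ASSERTIVE-ACT",  "degree-of-force": 0.4},
    "assert-v1":      {"sem": "ASSERTIVE-ACT",  "degree-of-force": 0.7},
    "inquire-v1":     {"sem": "ROGATIVE-ACT",   "degree-of-force": 0.3},
    "interrogate-v1": {"sem": "ROGATIVE-ACT",   "degree-of-force": 0.9},
    "thank-v1":       {"sem": "EXPRESSIVE-ACT", "degree-of-force": 0.5,
                       "attitude": "evaluative/positive"},
    "denounce-v1":    {"sem": "EXPRESSIVE-ACT", "degree-of-force": 0.8,
                       "attitude": "evaluative/negative"},
    # Polysemous 'advise': one entry per sense, each with its own act type.
    "advise-v1":      {"sem": "ASSERTIVE-ACT",  "degree-of-force": 0.5},
    "advise-v2":      {"sem": "DIRECTIVE-ACT",  "degree-of-force": 0.4},
}

def act_type(word_sense):
    """Return the ontological act type linked from the SEM zone of an entry."""
    return LEXICON[word_sense]["sem"]
```

The point of the sketch is that many verbs share a single ontological concept and differ only in their force and attitude specifications, which is what keeps the set of ontological concepts small.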
4.2. Relations and Metonymic Processing
In processing natural language text, there are many instances where language is used in a way that can be said to violate or relax ontological constraints. For example, when we say That farm sells fresh produce, or The bank lowered its lending rate, we are using the name of an organization to refer to the people who are actually in charge of the activities being described. Similarly, if we say We need some more hands to finish this job, we are using a body part to refer to the fact that we need more workers. Such usage of language is referred to as metonymy.

In the DIONYSUS project, the processing of metonymy is an integral part of semantic interpretation. This allows us to avoid recording certain types of information in the ontology, because it can be retrieved and put into the text meaning representation by the special metonymy processing rules. During semantic processing, if an input sentence has a putative violation of a constraint in a slot, the system will check to see if a certain type of metonymy is involved. Below, we list several types of metonymy we treat in the DIONYSUS project (the types were selected from Lakoff and Johnson, 1980).

One type of metonymy is part-for-whole: We need some good heads to solve this problem. After the system encounters a violation of constraints on the agent of solve, the metonymy treatment module will select HAS-AS-PART (or its inverse, PART-OF) as a candidate for relaxation. Suppose that the default agent of solve is HUMAN (a reasonable assumption). The system will check the content of the HAS-AS-PART slot for HUMAN and find that it includes HEAD. As a result, the part-for-whole type of metonymy is accepted as a candidate reading for the sentence.

Other types of metonymy include: producer-for-product (My Saab needs an oil change), checked with the help of the PRODUCED-BY relation; organization-for-people-involved (The union went on strike), determined through the MEMBER-OF relation; place-for-institution (The White House vetoed the bill), checked with the help of the LOCATION relation; controlled-for-controller (The red sports car ran into me in the parking lot), determined through the OPERATED-BY relation; and place-for-event (Three Mile Island and Chernobyl make us nervous about future nuclear disasters), identified through the LOCATION relation.

23 Of course, more detail is required. For one thing, the attitudes associated with different speech acts must be refined and interwoven with presuppositions, expectations, and postconditions on events. As we develop our model further, these will be represented as conditions on complex events.
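The relaxation step can be sketched as a simple lookup over candidate relations. The following Python sketch is illustrative, not the DIONYSUS implementation: the ontology fragment and the shape of the rule table are assumptions, and only the relation names (HAS-AS-PART, PRODUCED-BY, MEMBER-OF, LOCATION, OPERATED-BY) and the metonymy type names come from the discussion above.

```python
# Assumed miniature ontology fragment: concept -> {relation -> set of fillers}.
ONTOLOGY = {
    "HUMAN": {"HAS-AS-PART": {"HEAD", "HAND", "ARM"}},
}

# Each metonymy type names the relation used to relax a violated constraint.
METONYMY_RULES = {
    "part-for-whole":                   "HAS-AS-PART",
    "producer-for-product":             "PRODUCED-BY",
    "organization-for-people-involved": "MEMBER-OF",
    "place-for-institution":            "LOCATION",
    "controlled-for-controller":        "OPERATED-BY",
    "place-for-event":                  "LOCATION",
}

def candidate_metonymies(expected_concept, filler_concept):
    """After a constraint violation (e.g. the agent of 'solve' should be
    HUMAN but the text offers HEAD), return the metonymy types under which
    the actual filler can be linked to the expected concept."""
    candidates = []
    frame = ONTOLOGY.get(expected_concept, {})
    for metonymy, relation in METONYMY_RULES.items():
        if filler_concept in frame.get(relation, set()):
            candidates.append(metonymy)
    return candidates
```

For the sentence We need some good heads to solve this problem, the call `candidate_metonymies("HUMAN", "HEAD")` finds HEAD in the HAS-AS-PART slot of HUMAN, so part-for-whole is accepted as a candidate reading.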
4.3. Collections and Quantities
The notion of a SET is needed ontologically as an abstract construct (a mathematical object). However, we handle collections of objects in the world through our semantic processing component, representing them as TAMERLAN constructs, and thus don't require special treatment of them in the ontology.24 We do not want to create ontological concepts for collections of different types of objects or quantities of things in the world. In order to support this type of processing, we only need to define the general concept of SET and introduce two subclasses of sets, NON-ENUMERATED-COLLECTION and ENUMERATED-COLLECTION (see examples below).

So, for example, when a natural language processing system based on our static knowledge sources obtains a pride of lions as input, we treat this particular word sense of pride (the one which collocates with lion) as a trigger for the instantiation, in the TAMERLAN text, of a TAMERLAN construct SET, whose element type will be an instance of the ontological concept LION. Since the number of lions in the pride was not mentioned, we cannot instantiate a value for the cardinality of this particular set. However, if the input were five lions, the TAMERLAN representation would have included a set whose element type was LION and whose cardinality was 5. In the representation below, the symbol (%%) indicates a generic instance of an ontological concept, used to represent set elements that are not referred to individually.25

(make-frame %NON-ENUMERATED-COLLECTION-1
  (ELEMENT (value %%lion)))

Handling quantities of materials is a task related to handling collections of separable entities. The general question here is how to deal with inputs of the form 'X of Y', where X represents a quantity and Y represents one of the two types of separable entities described above. Say we have a bottle of water. Water will be a type of material in the ontology. A 'bottleful' of water, unlike 'water' in general, is, in fact, a separable entity.
Therefore, the system will instantiate the following in the TAMERLAN text:

(make-frame %SEPARABLE-ENTITY-1
  (MADE-OF (value *water))
  (QUANTITY (default 1)
            (measuring-unit *liter)))

24 This makes the ontology almost twice as concise (almost any single separable entity can appear in a collection; therefore, if we want separate concepts for the latter, we will have to proliferate concept names). Note that Dahlgren et al., 1990 did not make this choice, and ended up with a lot of extra luggage.
25 See Nirenburg and Defrise 1991b for details of the notational conventions in TAMERLAN.
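The two instantiation patterns just illustrated can also be sketched programmatically. The Python rendering below is a hypothetical stand-in for TAMERLAN frame construction: the function names and the dictionary layout are illustrative assumptions, and treating the counted case (five lions) as the same collection type is also an assumption; only the slot and concept names mirror the frames above.

```python
# Illustrative sketch of instantiating TAMERLAN constructs for collections
# ('a pride of lions', 'five lions') and quantities ('a bottle of water').
# Dictionaries stand in for frames; slot names mirror the examples in the text.

def collection_frame(element_concept, cardinality=None):
    """Build a set frame; cardinality stays unset when the input
    (e.g. 'a pride of lions') does not mention a number."""
    frame = {"frame": "NON-ENUMERATED-COLLECTION",
             # %% marks a generic instance of an ontological concept
             "ELEMENT": "%%" + element_concept.lower()}
    if cardinality is not None:
        frame["CARDINALITY"] = cardinality
    return frame

def quantity_frame(material, amount, unit):
    """Build a separable-entity frame for a quantity-of-material input,
    e.g. 'a bottle of water'."""
    return {"frame": "SEPARABLE-ENTITY",
            "MADE-OF": "*" + material,
            "QUANTITY": {"default": amount,
                         "measuring-unit": "*" + unit}}

# 'a pride of lions' (cardinality unknown) vs. 'five lions' (cardinality 5)
pride = collection_frame("LION")
five_lions = collection_frame("LION", cardinality=5)
bottle = quantity_frame("water", 1, "liter")
```

The design point is the same as in the text: the ontology supplies only SET and its two subclasses, while the particulars of each collection or quantity live in the instantiated frames.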
5. Conclusions
This document is a first attempt to discuss the many difficult representation and interpretation problems which we have encountered in our work on world modeling. We hope that this material will help those who would like to use our experience in designing and acquiring their own world models. To facilitate the knowledge acquisition task, we recommend that these readers consult several companion documents: the FrameKit manual (Nyberg, 1988), the Ontos Reference Manual (Monarch, 1988), the DIONYSUS lexicon technical report (Meyer et al., 1990) and the specification of TAMERLAN in Nirenburg and Defrise, 1991a. The authors will be glad to receive your comments and suggestions. We also promise to continually extend and improve this ontology representation document as the work on DIONYSUS progresses and modifications are made to its components.
6. Acknowledgements
We would like to thank all the current and past members of the DIONYSUS project — Ralf Brown, Ted Gibson, Todd Kaufmann, John Leavitt, Ingrid Meyer, Eric Nyberg and Boyan Onyshkevych. Thanks also to Irene Nirenburg and Ken Goodman.
7. References
Austin, J. L. 1962. How to Do Things with Words. Cambridge, MA: Harvard University Press.

Bruce, Bertram. 1975. Case Systems for Natural Language. Artificial Intelligence 6: 293-326.

Cook, Walter, S.J. 1989. Case Grammar. Washington, D.C.: Georgetown University Press.

Dahlgren, Kathleen, Joyce McDowell and Edward P. Stabler, Jr. 1990. Knowledge Representation for Commonsense Reasoning with Text. Computational Linguistics, 15: 149-170.

Fillmore, Charles. 1968. The Case for Case. In Emmon Bach and Robert Harms, eds., Universals in Linguistic Theory. New York: Holt, Rinehart, and Winston. 1-88.

Goodman, Kenneth, and Sergei Nirenburg, eds. 1989. KBMT-89 Project Report. Center for Machine Translation, Carnegie Mellon University.

Goodman, Kenneth, and Sergei Nirenburg, eds. (forthcoming) KBMT-89: A Case Study in Knowledge-Based Machine Translation (working title). Los Altos, CA: Morgan Kaufmann.

Jackendoff, Ray. 1983. Semantics and Cognition. Cambridge, MA: MIT Press.

Lakoff, George and Mark Johnson. 1980. Metaphors We Live By. Chicago: University of Chicago Press.

Leech, Geoffrey. 1983. Principles of Pragmatics. London: Longman.
Levinson, Stephen. 1983. Pragmatics. Cambridge, England: Cambridge University Press.

Meyer, Ingrid, Boyan Onyshkevych, and Lynn Carlson. 1990. Lexicographic Principles and Design for Knowledge-Based Machine Translation. Technical Report CMU-CMT-90-118, Center for Machine Translation, Carnegie Mellon University.

Nirenburg, Sergei. 1989. Lexicographic Support for Knowledge-Based Machine Translation. Literary and Linguistic Computing, Vol. 4, No. 3.

Nirenburg, Sergei. 1989. Lexicons for Computer Programs and Lexicons for People. Proceedings of the 5th Annual Conference of the University of Waterloo Centre for the New Oxford English Dictionary, pp. 43-65.

Nirenburg, Sergei. 1989. Knowledge-Based Machine Translation. Machine Translation, Vol. 4, No. 1, March 1989, pp. 5-24.

Nirenburg, Sergei and Christine Defrise. 1991a. Practical Computational Linguistics. In R. Johnson and M. Rosner, eds., Computational Linguistics and Formal Semantics. Cambridge: Cambridge University Press.

Nirenburg, Sergei and Christine Defrise. 1991b. Aspects of Text Meaning. In J. Pustejovsky, ed., Semantics and the Lexicon. Dordrecht, the Netherlands: Kluwer.

Nirenburg, Sergei, Ira Monarch, Todd Kaufmann, Irene Nirenburg, and Jaime Carbonell. 1988. Acquisition of Very Large Knowledge Bases: Methodology, Tools, and Applications. Technical Report CMU-CMT-88-108, Center for Machine Translation, Carnegie Mellon University.

Nirenburg, Sergei and Victor Raskin. 1987. The Subworld Concept Lexicon and the Lexicon Management System. Computational Linguistics, 13: 276-89.

Nirenburg, Sergei, and Kenneth Goodman. 1990. Treatment of Meaning in MT Systems. Proceedings of the Third International Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages (Austin, Texas, June 11-13, 1990). Linguistic Research Center, University of Texas at Austin.

Nirenburg, Sergei, Jaime Carbonell, Masaru Tomita and Kenneth Goodman.
(forthcoming) Machine Translation: The Knowledge-Based Approach. Los Altos, CA: Morgan Kaufmann.

Schank, Roger, ed. 1975. Conceptual Information Processing. New York: North Holland.

Schank, Roger and Robert Abelson. 1977. Scripts, Plans, Goals and Understanding. Hillsdale, NJ: Lawrence Erlbaum.

Searle, John R. 1969. Speech Acts: An Essay in the Philosophy of Language. Cambridge, England: Cambridge University Press.

Searle, John R. 1979. Expression and Meaning. Cambridge, England: Cambridge University Press.

Viberg, Åke. 1984. The Verbs of Perception: a Typological Study. In Brian Butterworth, Bernard Comrie and Östen Dahl, eds., Explanations for Language Universals. New York: Mouton.

Winograd, Terry. 1983. Language as a Cognitive Process. Volume I: Syntax. Reading, MA: Addison-Wesley.