Coping with the Intrinsic and Deictic Uses of Spatial Prepositions

Elisabeth André, Guido Bosch, Gerd Herzog, Thomas Rist
SFB 314, Project VITRA, Universität des Saarlandes
D-66041 Saarbrücken, Germany

Abstract
This paper¹ deals with the definition of a computational semantics for spatial relations between objects in a scene. In particular, we examine the effects of the observer's position on the computation of spatial relations, the use of path prepositions, and degrees of applicability for spatial relations. Representational prerequisites and the computational analysis of spatial relations are investigated in the German dialogue system CITYTOUR.
This paper appeared in: K. Jorrand and L. Sgurev (eds.), Artificial Intelligence II: Methodology, Systems, Applications, pp. 375–382. Amsterdam: North-Holland, 1987.
¹ This report describes work done in the project VITRA, which is part of the SFB 314 Research Program of the Science Foundation (DFG) on AI and Knowledge-Based Systems.
1 Introduction
The development of knowledge-based systems for natural language descriptions of image sequences includes the problem of defining a computational semantics for natural language expressions that describe spatial relations between physical objects. Such expressions usually contain prepositions that relate one object (the subject) to one or more others (the reference objects). In linguistics, various approaches to the semantics of prepositions have been presented (cf. Bennett [1975], Herskovits [1980] or Hawkins [1984]). Aspects of a computational semantics of spatial relations are examined by Boggess [1978]. Simplified versions of static spatial relations have already been implemented in systems such as NAOS (Neumann [1984]), HAM-RPM (von Hahn et al. [1980]) and SWYSS (Hußmann and Schefe [1984]). A computational model of memory for spatial relations has been studied in the system MERCATOR (Davis [1984]). The research reported here forms the basis for the implementation of CITYTOUR (André et al. [1985]), a German question-answering system that simulates aspects of a fictitious sightseeing tour through an interesting part of a particular city. Fig. 1 shows an example of the domain of discourse: part of a city with static objects (e.g. buildings) and dynamic objects (e.g. pedestrians), seen from a bird's-eye view. A special dynamic object is the sightseeing bus, which is graphically represented as a large dot. The questioner is assumed to be sitting in this bus, so the conversational partner is part of the scene under discussion, and CITYTOUR's answers can take the current position of the observer into account. This is one of the distinguishing features of CITYTOUR.
2 Geometrical Representation
CITYTOUR handles dynamic and static objects projected onto a two-dimensional plane. Dynamic objects (e.g. pedestrians) are represented in a simplified way as centroids. The trajectory of a dynamic object is represented as a list of pairs

$$((P_{t_1}\; t_1)\; (P_{t_2}\; t_2)\; \ldots\; (P_{t_i}\; t_i)\; \ldots),$$

where $P_{t_i}$ denotes the position of the object at time $t_i$ on the underlying discrete time axis (cf. Fig. 2). An important prerequisite for processing path prepositions (such as along) is that static objects are not represented simply as centroids. In CITYTOUR the representation of a static object consists of the following features (cf. Fig. 3):
- centroid
- closed polygon
- prominent front (cf. the bold edge of the polygon in Fig. 3)
- delineative rectangle

The delineative rectangle is used to compute the basic relations in front of, to the left of, to the right of, and behind. Its computation depends on the observer's position and is explained in the next section.

Figure 1: The basic windows of the system CITYTOUR
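The representation described above can be sketched as plain data structures. The following Python sketch is illustrative only and is not the original FUZZY/ZetaLISP implementation: the class and field names are hypothetical, the prominent front is assumed to be given as a pair of polygon vertex indices, and the centroid is approximated by the mean of the polygon's vertices rather than the area centroid. The delineative rectangle is left out because it depends on the observer's position.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class DynamicObject:
    """A moving object reduced to its centroid (hypothetical class)."""
    name: str
    # Trajectory ((P_t1 t1) (P_t2 t2) ...) stored as (position, discrete time) pairs.
    trajectory: List[Tuple[Point, int]]

    def position_at(self, t: int) -> Point:
        """Return P_t, the position at discrete time t."""
        for point, ti in self.trajectory:
            if ti == t:
                return point
        raise KeyError(f"{self.name} has no position at t={t}")


@dataclass
class StaticObject:
    """A static object with a closed polygon and a prominent front (hypothetical class)."""
    name: str
    polygon: List[Point]    # vertices of the closed polygon, counter-clockwise (assumed)
    front: Tuple[int, int]  # indices of the two vertices bounding the prominent front edge

    @property
    def centroid(self) -> Point:
        # Simplification: mean of the vertices, not the area centroid.
        n = len(self.polygon)
        return (sum(x for x, _ in self.polygon) / n,
                sum(y for _, y in self.polygon) / n)

    @property
    def front_edge(self) -> Tuple[Point, Point]:
        i, j = self.front
        return self.polygon[i], self.polygon[j]
```

For example, a rectangular church with vertices (0, 0), (4, 0), (4, 2), (0, 2) and front=(0, 1) has the centroid (2.0, 1.0) and the prominent front along its lower edge.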
3 Observer's Position
In CITYTOUR we have implemented relations such as at or along, which are independent of the observer's position, as well as relations such as in front of or to the right of, for which the observer's position must be taken into account. Linguists (e.g. Vandeloise [1984] or Wunderlich [1985]) have examined the factors (direction of movement, line of sight, etc.) that influence the choice of a reference system for spatial relations. In CITYTOUR the point of reference is given either by the observer's position (deictic use of the relation) or by the prominent front of the reference object (intrinsic use).
Figure 2: The trajectory of a dynamic object
Figure 3: The elements of the representation for a static object

Let us now examine the difference between the deictic and the intrinsic use of a spatial relation in the example dialogue shown in Fig. 1. The first question “Liegt die Post neben der Kirche?” (Is the post office beside the church?) yields the answer “Ja, die Post befindet sich recht gut neben der Kirche.” (Yes, the post office is beside the church.), because the prominent front of the reference object Kirche (cf. the bold edges in Fig. 1) determines the system's answer. To the question “Liegt die Post hinter der Kirche?” (Is the post office behind the church?) the system gives the negative answer “Nein, das kann man nicht sagen.” (No, that's not the case.). If the deictic expression “von hier aus” (from here) is added to the last question, CITYTOUR considers the observer's position (sitting in the bus) instead of the prominent front of the reference object and answers “Ja, die Post befindet sich recht gut hinter der Kirche von hier aus.” (Yes, the post office is behind the church from here.).

For the computation of the basic relations in front of, to the left of, to the right of, and behind, we use the delineative rectangle of the reference object. Each edge of the rectangle is associated with one of the half-planes F, L, R or B (cf. Figs. 4 and 5), in which the corresponding relation is applicable. Fig. 4 illustrates the construction of the delineative rectangle based on the observer's position. We assume that the observer fixes the reference object with a monocular view (cf. Saile [1984, pp. 64–81]). The line of sight is given by the bisector of the angle between the tangents T1 and T2 from the observer to the object. The perpendicular to the line of sight that touches the object then determines the front edge of the delineative rectangle. For non-deictic uses, which CITYTOUR treats as the unmarked case, we hypothetically assume that the observer is located in front of the prominent side of the object. Thus the rectangle in Fig. 5 is oriented by the object's prominent front. The delineative rectangle is an appropriate representation of the reference object for the analysis of spatial relations only if the subject lies outside the rectangle; otherwise the polygonal representation has to be used (cf. André et al. [1985]).

Figure 4: The delineative rectangle based on the observer's position
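The construction of the delineative rectangle and its half-planes can be sketched in Python as follows. This is a sketch under stated assumptions, not CITYTOUR's code: all names are hypothetical, the observer is assumed to lie outside the object with the object subtending less than 180°, and polygon vertices are assumed to be in counter-clockwise order. The text fixes only the front edge, so taking the back, left and right edges as the remaining extremes of the object in the line-of-sight frame is our own assumption. Left and right are taken from the viewpoint of the (real or hypothetical) observer; whether they should be mirrored for intrinsic uses is not settled by the text.

```python
import math
from dataclasses import dataclass
from typing import List, Set, Tuple

Point = Tuple[float, float]


@dataclass
class DelineativeRectangle:
    """Rectangle in the frame given by the unit line of sight (vx, vy); hypothetical type."""
    vx: float
    vy: float
    front: float  # smallest extent along the line of sight (the edge facing the viewer)
    back: float
    left: float   # smallest extent along the viewer's right-hand axis
    right: float


def deictic_sight(observer: Point, polygon: List[Point]) -> Point:
    """Unit line of sight: bisector of the tangents T1, T2 from the observer to the object."""
    ox, oy = observer
    cx = sum(x for x, _ in polygon) / len(polygon)
    cy = sum(y for _, y in polygon) / len(polygon)
    ref = math.atan2(cy - oy, cx - ox)  # rough direction towards the object
    rel = []
    for x, y in polygon:
        a = math.atan2(y - oy, x - ox) - ref
        rel.append(math.atan2(math.sin(a), math.cos(a)))  # wrap into (-pi, pi]
    mid = ref + (min(rel) + max(rel)) / 2  # the extreme angles correspond to T1 and T2
    return math.cos(mid), math.sin(mid)


def intrinsic_sight(polygon: List[Point], front: Tuple[int, int]) -> Point:
    """Line of sight of a hypothetical observer facing the prominent front (CCW polygon)."""
    (ax, ay), (bx, by) = polygon[front[0]], polygon[front[1]]
    dx, dy = bx - ax, by - ay
    n = math.hypot(dx, dy)
    return -dy / n, dx / n  # inward normal: opposite of the outward normal (dy, -dx)


def delineative_rectangle(polygon: List[Point], sight: Point) -> DelineativeRectangle:
    vx, vy = sight
    fwd = [x * vx + y * vy for x, y in polygon]  # coordinate along the line of sight
    lat = [x * vy - y * vx for x, y in polygon]  # coordinate along the viewer's right
    return DelineativeRectangle(vx, vy, min(fwd), max(fwd), min(lat), max(lat))


def half_planes(rect: DelineativeRectangle, p: Point) -> Set[str]:
    """Half-planes F, L, R, B containing p; an empty set means p lies inside the rectangle."""
    f = p[0] * rect.vx + p[1] * rect.vy
    s = p[0] * rect.vy - p[1] * rect.vx
    regions = set()
    if f < rect.front:
        regions.add("F")
    if f > rect.back:
        regions.add("B")
    if s < rect.left:
        regions.add("L")
    if s > rect.right:
        regions.add("R")
    return regions
```

For an observer at (2, -10) looking at a rectangular church with corners (0, 0) and (4, 2), the line of sight points along +y, the front edge lies at y = 0, and the point (-1, 1) falls only into the half-plane L. An empty result signals the case in which the polygon, not the rectangle, must be consulted.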
Figure 5: The delineative rectangle orientated by the object's prominent front
4 A Computational Semantics for the Relation 'vorbei' (past)
In addition to some static spatial relations (e.g. in front of, behind, to the left of, to the right of, between) and simple dynamic relations (e.g. move to the front of, move to the back of, move to the left of, move to the right of), we have defined a computational semantics for the path prepositions entlang (along) and vorbei (past). In this section the relation vorbei (past) is discussed as an example. Usually vorbei (past) is combined with one of the relations behind, in front of, to the left of or to the right of, as in “the man is passing to the left of the church”, so the observer's position has to be taken into account in the computation (cf. Fig. 6). The following definition describes a predicate function that decides whether a dynamic object is passing in front of a static object.

Definition: A dynamic object is passing in front of a static object $\iff$ there is a sub-trajectory $((P_{begin}\; t_{begin})\; \ldots\; (P_{end}\; t_{end}))$ of the dynamic object that satisfies the following conditions with respect to the four half-planes F, L, R and B determined by the delineative rectangle, which is computed according to either the deictic (cf. Fig. 6) or the intrinsic (cf. Fig. 7) use of the relation:

1. $(P_{begin} \in R \wedge P_{end} \in L) \vee (P_{begin} \in L \wedge P_{end} \in R)$, i.e., the dynamic object passes the static object from right to left or vice versa.
2. $\forall t_i \in\, ]t_{begin}, t_{end}[\, : (P_{t_i} \notin L \cup R) \wedge (P_{t_i} \in F)$, i.e., between $t_{begin}$ and $t_{end}$ the dynamic object is exactly in front of the static object.
3. The distance between the delineative rectangle of the static object and the dynamic object does not exceed an object-dependent threshold.

Condition (2) guarantees that the minimal time interval for which the sub-trajectory satisfies the predicate is found. We call this interval the core interval of the relation. Analogous definitions for the remaining cases rechts vorbei, links vorbei and hinten vorbei (passing to the right of, passing to the left of, passing behind) are easy to formulate.

Figure 6: The dynamic object passes to the left of the church from the observer's viewpoint
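The definition of vorne vorbei can be sketched as a scan over the trajectory. In the Python sketch below (hypothetical names, not the original implementation), the half-plane test and the distance to the delineative rectangle are passed in as functions, so the same predicate can serve the deictic and the intrinsic use. The helpers box_regions and box_distance are hypothetical and assume a rectangle already expressed in viewer coordinates (x to the viewer's right, y along the line of sight); applying the threshold to every point of the sub-trajectory is our reading of condition 3.

```python
import math
from typing import Callable, List, Optional, Set, Tuple

Point = Tuple[float, float]
Trajectory = List[Tuple[Point, int]]  # ((P_t1 t1) (P_t2 t2) ...)


def box_regions(left: float, right: float, front: float, back: float) -> Callable[[Point], Set[str]]:
    """Hypothetical helper: half-planes F, L, R, B of a rectangle in viewer coordinates."""
    def regions(p: Point) -> Set[str]:
        x, y = p
        r = set()
        if y < front:
            r.add("F")
        if y > back:
            r.add("B")
        if x < left:
            r.add("L")
        if x > right:
            r.add("R")
        return r
    return regions


def box_distance(left: float, right: float, front: float, back: float) -> Callable[[Point], float]:
    """Hypothetical helper: Euclidean distance from a point to the rectangle."""
    def dist(p: Point) -> float:
        dx = max(left - p[0], 0.0, p[0] - right)
        dy = max(front - p[1], 0.0, p[1] - back)
        return math.hypot(dx, dy)
    return dist


def passes_in_front(traj: Trajectory,
                    regions: Callable[[Point], Set[str]],
                    dist: Callable[[Point], float],
                    threshold: float) -> Optional[Tuple[int, int]]:
    """Return the core interval (t_begin, t_end) of 'vorne vorbei', or None."""
    n = len(traj)
    for i in range(n):
        p_begin, t_begin = traj[i]
        start = regions(p_begin)
        for side, other in (("R", "L"), ("L", "R")):
            if side not in start:
                continue
            # Condition 2: intermediate points lie in F but in neither L nor R.
            j = i + 1
            while j < n:
                r = regions(traj[j][0])
                if "F" in r and not (r & {"L", "R"}):
                    j += 1
                else:
                    break
            # Condition 1: the first point leaving that strip must lie on the opposite side.
            if j < n and other in regions(traj[j][0]):
                # Condition 3: every point stays within the object-dependent threshold.
                if all(dist(p) <= threshold for p, _ in traj[i:j + 1]):
                    return t_begin, traj[j][1]
    return None
```

With the rectangle [0, 4] × [0, 2] and a threshold of 3, a pedestrian walking along y = -1 from x = 6 to x = -3 yields the core interval (0, 3), whereas the same walk along y = 3 (behind the object) yields None. Because the scan starts from the last point in R before the strip in front of the object, the interval returned is the minimal one that condition 2 is meant to guarantee.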
Figure 7: The dynamic object passes behind the church relative to the prominent front
5 Degrees of Applicability for Spatial Predicates
The predicate function defined in the last section can only decide whether or not the predicate vorne vorbei (to pass in front of) is applicable. In natural language use, the meaning of prepositions can be modified by so-called linguistic hedges (cf. Lakoff [1973]) such as directly or more or less. To generate such hedges, CITYTOUR uses degrees of applicability, which express the extent to which a spatial preposition is applicable. If a given object configuration can be described by several spatial relations, the degree of applicability is used to select the most appropriate preposition for verbalization. In Section 3 we presented the delineative rectangle of static objects for the computation of the four basic prepositions (behind, to the left of, to the right of, in front of). One of these prepositions is applicable if the subject is located in the corresponding half-plane of the reference object. For these relations, the degrees of applicability can be determined by partitioning each half-plane into regions of equal degree of applicability, so that a linguistic hedge can be associated with each region. In CITYTOUR this partition depends on the size of the reference object and on its delineative rectangle. Other aspects, such as the exact shape of the objects or the influence of adjacent or nearby objects, are not considered. Fig. 8 shows a partitioned half-plane of the reference object in which the degree of applicability decreases with the distance from the object. At present, degrees of applicability are implemented only for static predicates.

Figure 8: The partitioned left half-plane of the church with the associated expressions in German
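The partition into regions of equal degree of applicability, and its use for choosing among competing prepositions, can be sketched as follows. Everything numeric here is hypothetical: the band limits (multiples of the object's size), the degrees and the English hedges are illustrative values, not those used in CITYTOUR (whose Fig. 8 uses German expressions). Weighting each candidate relation by how squarely the subject lies in its half-plane is our own device for letting the degree decide between overlapping relations at the corners of the rectangle. The rectangle is given in viewer coordinates (x to the viewer's right, y along the line of sight).

```python
import math
from typing import Dict, Optional, Tuple

RELATIONS: Dict[str, str] = {"F": "in front of", "L": "to the left of",
                             "R": "to the right of", "B": "behind"}

# Hypothetical partition: (max distance as a multiple of object size, degree).
BANDS = [(0.5, 1.0), (1.0, 0.7), (2.0, 0.4)]
# Hypothetical mapping from degree to an English hedge ("" means the plain preposition).
HEDGES = [(0.9, "directly"), (0.6, ""), (0.3, "more or less")]


def band_degree(d: float, size: float) -> float:
    """Degree of the distance band containing d; 0.0 beyond the last band."""
    for limit, degree in BANDS:
        if d <= limit * size:
            return degree
    return 0.0


def best_description(p: Tuple[float, float],
                     rect: Tuple[float, float, float, float]) -> Optional[Tuple[str, float, str]]:
    """Pick the relation with the highest degree; rect = (left, right, front, back)."""
    x, y = p
    left, right, front, back = rect
    dx = max(left - x, 0.0, x - right)
    dy = max(front - y, 0.0, y - back)
    d = math.hypot(dx, dy)
    if d == 0.0:
        return None  # subject inside the rectangle: the polygon would be needed
    size = max(right - left, back - front)
    # How far p lies beyond each edge (<= 0 means p is not in that half-plane).
    depth = {"F": front - y, "B": y - back, "L": left - x, "R": x - right}
    scores = {r: band_degree(d, size) * max(depth[r], 0.0) / d for r in RELATIONS}
    rel = max(scores, key=scores.get)
    for threshold, hedge in HEDGES:
        if scores[rel] >= threshold:
            return RELATIONS[rel], scores[rel], f"{hedge} {RELATIONS[rel]}".strip()
    return None  # too far away or too oblique for any relation
```

For the rectangle (0, 4, 0, 2), the point (2, -1) is described as "directly in front of", the point (-6, 1) as "more or less to the left of", and the corner point (-3, -1) as plainly "to the left of", since it lies more squarely in L than in F.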
6 Concluding Remarks
The basic elements of our representation of physical objects are the polygon, the centroid, and the prominent front. Our analysis has shown that a less complex representation of objects is insufficient for natural language scene descriptions. The representation also includes delineative rectangles as a special feature; they are used to compute the four basic relations in front of, to the left of, to the right of, and behind in both their intrinsic and deictic uses. For the domain of CITYTOUR, computing the predicate vorbei (past) with the delineative rectangle not only yields reasonable results but also reduces computational expense. For other relations, such as the path preposition entlang (along) (cf. André et al. [1986]), the exact shape of the reference object has to be considered; in this case CITYTOUR uses the polygonal representation to compute the applicability.
7 Future Work
In the CITYTOUR domain, our research focuses on the recognition and verbalization of static and dynamic spatial relations between objects in a time-varying scene. Another interesting aspect of the verbalization of visual information is the recognition of higher-level motion concepts and their natural language description. This aspect is being investigated in a second domain of discourse for the project VITRA, called SOCCER. In this domain, natural language descriptions are to be generated for image sequences from soccer games that are up to a few minutes long.
Important characteristics of SOCCER, in contrast to CITYTOUR, include:
- In addition to spatial relations between static or dynamic subjects and static reference objects, spatial relations between two or more dynamic objects will be verbalized as well.
- Besides spatial relations, temporal relations between movements (such as simultaneity and succession) will also be verbalized.
- Whereas CITYTOUR deals only with complete trajectories of moving objects, in SOCCER movements will be recognized and verbalized (in the present tense) while they are still in progress. For this purpose, expectation-driven heuristics and thus additional knowledge sources (e.g. scripts, cf. Retz-Schmidt [1985]) are required.
- Apart from visual motion concepts, SOCCER will also use non-visual concepts such as intentions and plans to select an adequate natural language description of observed movements, i.e. the same trajectory may be mapped onto completely different motion descriptions, depending on the intention imputed to the actor.
8 Technical Notes
The CITYTOUR system is implemented on a SYMBOLICS 3600 LISP machine. The program is written in FUZZY (LeFaivre [1977]) and ZetaLISP. The average response time for a question is about five seconds.
Acknowledgements
We would like to thank Prof. Dr. Wolfgang Wahlster and Gudula Retz-Schmidt for their insightful comments and suggestions throughout the development and the writing of this paper.
References
E. André, G. Bosch, G. Herzog, T. Rist. CITYTOUR – Ein natürlichsprachliches Anfragesystem zur Evaluierung räumlicher Präpositionen. Abschlußbericht zum Fortgeschrittenenpraktikum Prof. Dr. W. Wahlster, Wintersemester 1984/85, Fachbereich Informatik, Univ. des Saarlandes, 1985.
E. André, G. Bosch, G. Herzog, T. Rist. Characterizing Trajectories of Moving Objects Using Natural Language Path Descriptions. In: Proc. of the 7th ECAI, vol. 2, pp. 1–8, Brighton, UK, 1986.
D. G. Bennett. Spatial and Temporal Uses of English Prepositions. An Essay in Stratificational Semantics. Longman, London, 1975.
L. C. Boggess. Computational Interpretation of English Spatial Prepositions. Technical Report T-75, Coordinated Science Laboratory, Univ. of Illinois, 1978.
E. Davis. Representing and Acquiring Geographic Knowledge. Ph.D. thesis, Computer Science Department, Yale Univ., New Haven, CT, 1984.
B. W. Hawkins. The Semantics of English Spatial Prepositions. Ph.D. thesis, Univ. of California, San Diego, CA, 1984. Available as Paper A 142, Trier: L.A.U.T., 1985.
A. Herskovits. On the Spatial Uses of Prepositions. In: Proc. of the 18th ACL, pp. 1–5, Philadelphia, PA, 1980.
M. Hußmann, P. Schefe. The Design of SWYSS, a Dialogue System for Scene Analysis. In: L. Bolc, ed., Natural Language Communication with Pictorial Information Systems, pp. 143–201, Hanser/McMillan, München, 1984.
G. Lakoff. Hedges: A Study in Meaning Criteria and the Logic of Fuzzy Concepts. Journal of Philosophical Logic, 2, 458–508, 1973.
R. A. LeFaivre. FUZZY Reference Manual. 1977.
B. Neumann. Natural Language Description of Time-Varying Scenes. Report 105, Fachbereich Informatik, Univ. Hamburg, 1984.
G. Retz-Schmidt. Script Based Generation and Evaluation of Expectations in Traffic Scenes. In: H. Stoyan, ed., GWAI-85. 9th German Workshop on Artificial Intelligence, pp. 197–203, Springer, Berlin, Heidelberg, 1985.
G. Saile. Sprache und Handlung. Vieweg, Braunschweig, 1984.
C. Vandeloise. Description of Space in French. Ph.D. thesis, Univ. of California, San Diego, CA, 1984.
W. von Hahn, W. Hoeppner, A. Jameson, W. Wahlster. The Anatomy of the Natural Language Dialogue System HAM-RPM. In: L. Bolc, ed., Natural Language Based Computer Systems, pp. 119–253, Hanser/McMillan, München, 1980.
D. Wunderlich. Raumkonzepte. Zur Semantik der lokalen Präpositionen. In: T. T. Ballmer, R. Posener, eds., Nach-Chomskysche Linguistik, pp. 340–351, de Gruyter, Berlin, New York, 1985.