Modelling Spatial Inferences in Text Understanding

1 downloads 0 Views 181KB Size Report
To handle texts like the one above, we want to be able a) to distinguish deictic and intrinsic ... axis and the z-axis to the above/below axis of the body.
Modelling Spatial Inferences in Text Understanding

Ute Schmid, Sylvia Wiebrock and Fritz Wysotzki Methods of Arti cial Intelligence Department of Computer Science, Technical University of Berlin FR 5{8, Franklinstr. 28/29, D 10587 Berlin email: fschmid,wiebrock,[email protected] Abstract

Text understanding in a spatial domain is often seen as the construction of a mental model. In this paper we present the rst prototype of a program that realizes some of the characteristics claimed for spatial reasoning processes in mental models. Our work is based on homogeneous coordinate systems and transformation matrices. This means that a relation between two objects A and B is represented by the transformation matrix that maps the coordinate system of A onto that of B, and by constraints on the parameters of the matrix. We argue that this approach is more exible and more suitable for modelling spatial relations than qualitative approaches. While the psychological questions concerning mental models are largely skipped over, the advantages and problems of our program are discussed in more detail.

1 Introduction When reading a text describing the arrangement of objects, humans construct a mental model (Garnham 1981; Glenberg and Langston 1992) representing the situation. This mental model is \structurally identical" (Johnson-Laird 1983) to the described situation in the sense that the relations holding in the situation can be inferred from the model. In our approach, the mental model is represented by a graph. The nodes correspond to the objects, while the arcs are labelled with the relations. A sample text describing spatial con gurations { similar to the texts used in experiments by Franklin and Tversky (1990) and Hornig et al. (1996) { is1 : (1) Torsten is standing in the kitchen. (2) The refrigerator is standing on the left of Torsten. (3) The bowl is standing on the refrigerator { Sentences for the other three basic directions { (4) Torsten turns left. ... In this paper, we will deal with inferences in the static situation described by (1) { (3) only, and not with those involving motions (e.g. (4)). To handle texts like the one above, we want to be able a) to distinguish deictic and intrinsic uses of spatial propositions, b) to allow for a reorientation and/or movement of objects, and c) to infer spatial relations after reorientation of the protagonist. Therefore, objects are supplied with a coordinate system. The x-axis corresponds to the left/right axis, the y-axis to the front/back  This research was supported by the Deutsche Forschungsgemeinschaft (DFG) in the project \Modellierung von Inferenzen in Mentalen Modellen" (Wy 20/2{1) within the priority program on spatial cognition (\Raumkognition"). 1 We omit introduction, ller sentences and object descriptions. In our experiments (H ornig et al. 1996), we are using german texts.

1

axis and the z-axis to the above/below axis of the body. The orientation of these axes is arbitrary for objects without intrinsic axes. To describe relations, we follow an approach that is used in robotics (Ambler and Popplestone 1975). A relation between two objects A and B is described by the 4  4 matrix corresponding to the rotation and translation needed to map the coordinate system of A onto that of B. For the restricted case where only rotations around the vertical (z ) axis are allowed, this matrix has the form: 0 )1 0 cos  ? sin  0 1 (x(A;B A;B ) C B  R y C with R = @ sin  cos  0 A T =B @ (zA;B) A 0 0 1 0 0 0 1 In Fig. 1, the meaning of the paramA B eters of the transformation matrix is shown. y We use extended objects. At present, two x A.r forms are possible: cuboids with dimensions width for the x-axis, depth for the B.d y-axis, and height for the z-axis, and cylinders with dimensions radius for the x- and x y-axes and height for the z-axis. We don't y ∆(A,B) θ y have default objects, i.e. in general there (A,B) may be bounds on the extension of an ob∆x B.w ject, but we don't know the exact size. The origin of the coordinate system is positioned at the geometric center of the object. The dimensions are always measured from the origin of the coordinate system Figure 1: Relatum A and localized object B. along the corresponding axis. That means that the parameters denote half the size of the object. Note that even in a situation where all objects are standing on the ground, the (zA;B) values will be non-null whenever objects have di erent heights. Spatial relations like right of are de ned by constraints on the parameters of the transformation matrix. To nd psychologically valid de nitions for spatial relations is an open problem. Though Franklin, Henkel, and Zangas A B (1995) and Vorwerg and Rickheit (1998) report overlapping, non-symmetric regions in front of di erent sizes for the four basic directions, we are not sure whether these results are applicable to text understanding. y left right For a discussion of the in uence of relax y tive size and orientation on the so-called x \acceptance areas" see, e.g. Hernandez behind (1994), Gapp (1994) or Gapp (1995). In our model, we usually know neither the exact sizes, nor the exact distances and y angles of objects. Therefore it is hard to include these results in our de nitions. B is right of A x Other results from Zimmer et al. (1998) C is left of B support the theory that the four base reC lations are nearly exclusively used for the prototypical positions (only along the axes). Figure 2: A possible set of relations. Thus, at present we simplify the de nitions and use the relations shown in Fig. 2. B is right of A if and only if the smallest bounding box containing B (shown with dotted lines in the picture) is completely included in the right region of A. Such simpli cations are common in arti cial intelligence solutions. We plan to include more realistic de nitions in the future, but concentrate on other aspects of the model at the moment. For the sample texts considered so far, these de nitions are sucient. 2

x

room y

wall2

ceiling room.d wall1

wall2

wall3

wall4

y x

floor

y wall3

x

wall1 x

room

Subgraph for room

y room.w

Constraints: wall_i.w = ceiling.h = floor.h = 0 floor.w = ceiling.w = room.w floor.d = ceiling.d = room.d wall_i.h = room.h wall1.d = wall3.d = room.d wall2.d = wall4.d = room.w

wall4

y

x Bird’s eye view of room

Figure 3: Default constellation for room object(s) Intrinsic spatial relations are de ned by a set of (in)equations using the parameters of the matrix and the dimensions of the two objects. As any movement (and rotation) can be described by a transformation matrix, the relation between two objects can easily be updated. It is also possible to introduce a third parameter (the current reference frame, see Claus et al. (1998) for a discussion) and compute the relation between A and B for a given perspective V in case of a deictic interpretation of the relation. Currently, we always assume that an intrinsic reading is intended. In contrast to qualitative spatial reasoning (e.g. Hernandez 1993), where it is not possible to compute the relation between B and A, given only the intrinsic relation between A and B, we only need to invert the matrix to compute the transformation. As can be seen in Fig. 2, this transformation need not correspond to a de ned relation. In the example, we cannot label the arc from C to A with the name of a relation. This means, the arcs in our graphs will always be labelled with a transformation matrix and may also be labelled with the name(s) of one or more de ned relations. In the following, we will use the example text to explain the algorithm we have implemented. The psychological motivation for our program, and the unsolved questions with respect to the construction of and the access to a mental model are discussed in Claus et al. (1998). Recent results can also be found in Hornig et al. (1998).

2 Description of the Program When started, the program reads the de nition of objects and prede ned relations from two les. An object description contains the name of an object class (e.g. person ), the form (cylinder ), the names of the dimensions (r and h ) and optionally upper and lower bounds on the possible extensions. The graph is initialized with a subgraph representing a room object2 (see Fig. 3). When sentence (1) is read, a node for Torsten is created and an arc from Torsten to the room node is introduced which is labelled with is in room and the generic transformation matrix. The constraints for the is in room relation are stored in the constraint table. The transformation matrix T (room,To) describes the rotation of the axes and the translation of the origin of the room coordinate system necessary to identify both coordinate systems. When this matrix is multiplied with a vector representing a location of an object A (respectively its origin) in Torsten's coordinate system, the result is the location of A with respect to the room coordinate system. Therefore the arc is pointed from Torsten to the room node. In Fig. 3 the default graph for a room node and 2

At present, room objects are the only structured objects we consider.

3

the default layout of a room is shown. Neither the room object nor any of its sub-objects have intrinsic front or right sides. The coordinate system of room is positioned parallel to its walls. The coordinate systems of the walls are positioned in such a way that the relation is in room corresponds to a negative (xWall;A) value for any object A and any wall. x When sentence (2) is read, a node for the refrigerator is created, the arc for the wall2 y mentioned relation left of (Torsten, refrigerator) is introduced, and the arc for the relation is in room (refrigerator ) is introy refrigerator y duced (as part of our background knowly wall3 x edge). The transformation matrix and x y wall1 x room constraints for the is in room (refrigerator ) x y x arc are computed by multiplying the maTorsten trices for T (room,To) and T (To,re) and propagating the constraints. In addition, y wall4 the generic constraints for the is in room x relation are stored (they only state that an object must be completely inside the Figure 4: Possible situation for sentences (1) and (2) room). A possible situation for these two sentences is shown in Fig. 4. As the walls are not distinguished, we have chosen to orient Torsten so that his coordinate system is parallel to the room coordinate system. This corresponds to the transformation matrix 0 ;To) 1 1 0 0 x(room B 0 1 0 y(room;To) CC T (room, To) = B @ 0 0 1 To:h ? room:h A 0 0 0 1 with the constraints for the is in room relation ?(room:w ? To:r)  (xroom;To)  room:w ? To:r and ?(room:d ? To:r)  (yroom;To)  room:d ? To:r The refrigerator has an intrinsic front (presumably where the door is) and is oriented with its back side against the wall. The transformation matrices for left of (Torsten, refrigerator ) and is in room (refrigerator) are 0 ;re) 1 0 1 0 (To x(To;re) C B ?1 0 0 y T (To, re) = B @ 0 0 1 re:h ? To:h CA 0 0 0 1 with the constraints ;re)  ?(To:r + re:d) (To x ;re) j ? To:r ? re:d) + (To:r ? re:w)] = ?[(j(To x ;re) ? re:d ? re:w)  (To;re)  (?(To;re) ? re:d ? re:w) ?(?(To x y x

0 0 B ? 1 ( room, re ) T =B @

and with the constraints

1 0 0 0 0 0

(xroom;re) ?(room:w ? re:d)  (xroom;re) (yroom;re) ;re) ?(room:d ? re:w)  (room y 4

1

;re) 0 (room x(room;re) CC 0 y 1 re:h ? room:h A 0 1

= (xroom;To) + (xTo;re)  room:w ? 2To:r ? re:d = (yroom;To) + (yTo;re)  (room:d ? re:w)

To get these (in)equations, we use the fact that T (room,re) = T (room,To)  T (To,re) . Thus we get the generic constraints for the is in room relation updated with the upper bound on (xTo;re) plus the equations for (xroom;To) and (yroom;To). The constraints on (yTo;re) contain the variable (xTo;re) and are not used to build further constraints for (yroom;re). This strategy of immediately updating wall1 the is in room arcs corresponds to the selection of the room coordinate system as a Torsten is_in_room global reference frame. Though we retain wall2 left all arcs given by the propositions, we always compute the position relative to the refrigerator ceiling room floor room. To use the coordinate system of the is_in_room protagonist (e.g. Torsten) would be cogon wall3 nitively more plausible, but as yet we have bowl no conclusive results about the building is_in_room and updating of mental models and have wall4 opted for the most ecient (from a computer science point of view) strategy. Figure 5: The graph after three sentences. Suppose that after sentence (3), we have the graph in Fig. 5. In this graph all and only the relations given in the text plus the implicit is in room arcs are represented. Suppose further that we were asked the relationship between Torsten and the bowl. To infer the relation between Torsten and bowl, we have to nd a directed path from bowl to Torsten. In the gure there is exactly one such path, but { as stated above { we can easily compute the inverse relation, i.e. invert the arc, by computing the inverse matrix. These inverted arcs are not introduced into the graph, because we want to keep the number of arcs as small as possible, but can be used for the inference. Now we get four possible paths: bowl { refrigerator { Torsten, bowl { room { Torsten?1 , bowl { refrigerator { room { Torsten?1 , and bowl { room { refrigerator?1 { Torsten, where the raised ?1 denotes inverting the preceding arc. The default relation is in room introduced for every object ensures that the graph is always connected. Therefore we will always nd at least one path. Which path is chosen for the inference depends on the search strategy. From a computational point of view, we might prefer to omit the search altogether and take the path via the room node which is always a shortest path. This is the default strategy we have implemented at present.3 If we have found a path, we can multiply the transformation matrices at the arcs. We get T (Torsten,bowl)

T (room,Torsten)

=

0 BB ?01 @



=

?1

 T (room,bowl) 0 ;To) 1 1 0 0 ?x(room BB 0 1 0 ?y(room;To) CC @ 0 0 1 ?(To:h ? room:h) A

=

0 BB ?01 @

1 0 0 0 0 0

1 0 0 0 0 0

0 0 0

1

1

;bowl) 0 (room x(room;bowl) CC 0 y 1 2re:h + bowl:h ? room:h A 0 1

1

(room;bowl) ? (room;To) 0 (room x x CC 0 y ;bowl) ? y(room;To) 1 2re:h + bowl:h ? room:h ? (To:h ? room:h) A 0 1

After simplifying the expressions as far as possible and propagating the constraints for the parameters in the matrix, we have to match the constraints for the transformation matrix against

Recent results described in Hornig et al. (1998) suggest that the initial position of the protagonist, e.g. Torsten may serve as global reference frame. This would correspond to another organization of the graph. 3

5

the de nitions of all prede ned relations. For the example above, it can easily be proved that left of(Torsten, bowl ) holds (and none of the other relations, if we only provide the set shown in Fig. 2 and on ). For the natural path bowl { refrigerator { Torsten, the computations would be even simpler. In general, this task is by no means trivial, because we can get nonlinear equations involving trigonometric expressions. As shown above, a possible simpli cation of the problem is the use of defaults. Currently, we restrict the orientations of objects to multiples of 90 around the z-axis. This means that the coordinate systems of all objects become parallel or orthogonal. This makes the inferences easier without too much loss of expressive power. We are at present working on the problem of nding heuristics to prove trigonometric (in)equations for arbitrary rotations around the z-axis. We are also investigating the application of machine learning algorithms for constraint solving (see Geibel et al. 1998).

3 Related Work We have described above a quantitative approach to spatial reasoning. Typically, this approach is used when both the size and the positions of the objects are known. In applications, as for example the virtual oce environment of Jording and Wachsmuth (1996), where a robot can move objects to support computer aided design tasks, the use of quantitative methods is mandatory. While our focus is on the inference of spatial relations, in their scenario the emphasis is on changing perspective (involving such inferences) and on the interpretation of spatial expressions. The early work from Waltz and Boggess (1979) is also similar to our approach, but relies heavily on defaults, which greatly simpli es the inference process. Compared to these works we face two additional problems: a) our (in)equations are not strictly numerical, but contain variables for object extensions and distances. This means we often have incomparable expressions in constraints on the same variable which makes constraint propagation and constraint solving more dicult, and b) in the general case those (in)equations will also contain trigonometric functions. Most papers on spatial reasoning favour qualitative calculi. Mukerjee and Mittal (1995) argue that spatial information is usually non-quantitative and under determined. Therefore, a coarser qualitative calculus may often suce for the task at hand. In their work, they compare di erent calculi for qualitative homogeneous coordinate transformations, i.e. all matrix entries are elements of f?1; 0; 1g. Though they additionally make use of topological information, the composition of relations yields a high degree of uncertainty in the general case. There are only nine possible sectors for objects in two dimensions, and the extension of objects (non-overlapping) cannot be formulated. On the other hand, the inference process is very ecient and Mukerjee and Mittal suggest the use of such calculi in restricted domains or in hybrid reasoning systems. More typical for qualitative spatial reasoning is the work of Hernandez (Hernandez 1994; Hernandez 1993). In his calculus, a relation between two objects is determined by an orientation (e.g. front-right), topological information (e.g. touching) and the current frame of reference. His 2-D model allows the speci cation of di erent levels of granularity for the orientation. Despite the computational advantages of qualitative reasoning, we feel the lack of expressive power { (relative) size of objects, and distances cannot be expressed { justi es to investigate the more involved quantitative methods described above.

4 Discussion and Outlook In this paper we have focussed on the program and have largely ignored the psychological questions concerning the mental model. These questions are discussed in Claus et al. (1998). The model as presented above is implemented as a prototype SPACE/0 (Wiebrock et al. 1998). In this prototype, we only consider intrinsic uses of spatial expressions. The graph we build contains the explicitly given relations plus the implicit is in room arcs. After inference (including constraint propagation) these arcs contain all available information about an object's position in the room. Therefore the most ecient inference strategy is to use the path via the room node. This corresponds to a mixed strategy of localizing all objects with respect to the global reference frame of the room and still retaining the arcs for the explicitly given relations. As this example shows, 6

our (implemented) mental model is not equivalent to a single perspective but allows to compute di erent perspectives, if necessary. For the sample text above, another plausible strategy would be to assume a protagonist's perspective and use the protagonist's coordinate system as global reference frame. We have preferred the room reference frame because a reorientation of the protagonist would necessitate the recomputation of all adjacent arcs. Therefore we tried to keep the number of those arcs as small as possible. In the example above we never had to consider more than one mental model. When using relation de nitions with disjunctions (i.e. for the relation beside ), we may be forced to construct several pairwise incompatible models. Another case is discussed in Wysotzki et al. (1997). In this paper, disjoint models have to be constructed for the sentence \The oven stands at the wall." As there are four possible walls, the sentence is ambiguous. At encoding time it cannot be decided which wall is meant. The situation is disambiguated with the next sentence of the text, and three of the models can then be discarded. In the future we plan to parameterize the model so that di erent strategies both for the introduction of arcs { at encoding time { and the inferences at access time can be modelled.

References Ambler, A. P. and R. J. Popplestone (1975). Inferring the Positions of Bodies from Speci ed Spatial Relationships. Arti cial Intelligence 6, 157{174. Claus, B., K. Eyferth, C. Gips, R. Hornig, U. Schmid, S. Wiebrock, and F. Wysotzki (1998). Reference Frames for Spatial Inferences in Text Comprehension. In C. Freksa, C. Habel, and K. F. Wender (Hrsg.), Spatial Cognition - An Interdisciplinary Approach to Representing and Processing Spatial Knowledge, LNAI 1404. Springer Verlag. http://ki.cs.tuberlin.de/~sppraum/Papers/Trier97/trier.ps. Franklin, N., L. A. Henkel, and T. Zangas (1995). Parsing surrounding space into regions. Memory and Cognition 23 (4), 397{407. Franklin, N. and B. Tversky (1990). Searching imagined environments. Journal of Experimental Psychology: General 119, 63{76. Gapp, K.-P. (1994). Basic Meanings of Spatial Relations: Computation and Evaluation in 3D Space. In Proceedings of the 12th National Conference on Arti cial Intelligence (AAAI-94), Seattle, Washington, USA, S. 1393{1398. Also published as SFB 314 (VITRA), Bericht Nr. 101, April 1994. Gapp, K.-P. (1995). An Empirically Validated Model for Computing Spatial Relations. In Proceedings of the 19th German Conference on Arti cial Intelligence (KI-95), Bielefeld, Germany, September 1995. Also published as SFB 314 (VITRA), Bericht Nr. 118, Juli 1995. Garnham, A. (1981). Mental Models as Representation of Text. Memory and Cognition 9, 560{ 565. Geibel, P., C. Gips, S. Wiebrock, and F. Wysotzki (1998). Learning Spatial Relations with CAL5 and TRITOP. Technical Report Fachberichte des Fachbereichs Informatik, Report No. 98-7, TU Berlin. In preparation. Glenberg, A. M. and W. E. Langston (1992). Comprehension of Illustrated Text: Pictures Help To Build mental Models. Journal of Memory and Language 31, 129{151. Hernandez, D. (1993). Maintaining Qualitative Spatial Knowledge. In A. U. Frank and I. Campari (Hrsg.), Spatial Information Theory. A Theoretical Basis for GIS. European Conference, COSIT'93. Hernandez, D. (1994). Qualitative Representation of Spatial Knowledge. LNAI 804. Springer. Hornig, R., B. Claus, D. Durstewitz, E. Fricke, U. Schmid, and K. Eyferth (1996). Object access in mental models under di erent perspectives induced by linguistic expressions. Technical Report KIT-Report 126, KIT Research Group, Technische Universitat Berlin. Hornig, R., B. Claus, and K. Eyferth (1998). In Search for an Overall Organizing Principle in Spatial Mental Models: A Question of Inference. This volume. 7

Johnson-Laird, P. N. (1983). Mental Models: Towards a Cognitive Science of Language, Inference and Consciousness. Cambridge: Cambridge University Press. Jording, T. and I. Wachsmuth (1996). An Antropomorphic Agent for the Use of Spatial Language. In Proceedings of ECAI'96-Workshop on Representation and Processing of Spatial Expressions, S. 41{53. Mukerjee, A. and N. Mittal (1995). A qualitative representation of frame-transformation motions in 3-dimensional space. In Proceedings of the IEEE Conference on Robotics & Automation. Vorwerg, C. and G. Rickheit (1998). Typicality e ects in the Categorization of Spatial Relations. In C. Freksa, C. Habel, and K. F. Wender (Hrsg.), Spatial Cognition - An Interdisciplinary Approach to Representation and Processing of Spatial Knowledge, LNAI 1404, S. 203{222. Springer Verlag. Waltz, D. and L. Boggess (1979). Visual analog representations for natural language understanding. In Proc. of the 6th IJCAI, S. 926{934. Wiebrock, S., C. Gips, U. Schmid, and F. Wysotzki (1998). SPACE/0 { Documentation of a Program for Spatial Inference. In preparation. Wysotzki, F., U. Schmid, and E. Heymann (1997). Modellierung raumlicher Inferenzen durch Graphen mit symbolischen und numerischen Constraints. In C. Umbach, M. Grabski, and R. Hornig (Hrsg.), Perspektive in Sprache und Raum. Deutscher Universitatsverlag, Wiesbaden. Zimmer, H., H. Speiser, J. Braus, A. Blocher, and E. Stopp (1998). The Use of Locative Expressions in Dependence of the Spatial Relation between Target and Reference Object in Two Dimensional Layouts. In C. Freksa, C. Habel, and K. F. Wender (Hrsg.), Spatial Cognition - An Interdisciplinary Approach to Representing and Processing Spatial Knowledge, LNAI 1404, S. 223{240. Springer Verlag.

8