Spatial language and dialogue: A multimodal ...

Spatial language and dialogue: A multimodal perspective

Jana Holsanova
Cognitive Science Department, Lund University, Sweden

Abstract

Reference frames, spatial relations and perspective-taking are studied in two types of conversation: (a) a face-to-face conversation among friends containing description, spontaneous drawing, pointing and gesturing, and (b) a task-oriented conversation (a variation of a reference task) where the partners do not see each other and where the task – to copy a figurative or abstract picture – could only be carried out verbally. The observations and preliminary results are connected to theories about alignment in dialogue.

1. Face-to-face conversation

When describing an unknown visual environment to physically co-present listeners, speakers sometimes spontaneously use drawing. Holsanova (2001) studied such descriptions, which were embedded in a casual and amusing conversation among friends (see also Holmqvist and Holsanova 1997). The main question is: How is the joint focus of attention created in a face-to-face conversation where the partners share the perceptual space? The study shows that in the referring process, apart from verbal and prosodic means, non-verbal means and actions (drawing, laughing, nodding, pointing, gazing and gesturing) also played an important role. Let us look at an example where the speaker is describing spatial relations between objects and suddenly decides to change perspective and adjust the presentation according to the listeners' verbal and non-verbal feedback. After A has given an overview of a whole bathroom (see Drawing (a)), he continues with the description of some details (see Extract 1).

Drawing (a). THE BATHROOM

Extract 1

637(A) 638(A) 639(A) 640(B) 641(A) 642(B)

…1.53 we have= usually … a=eh …4.00 vanity they call it there, …1.04 where . the washbasin is built in’ …mhm’ but it’s a …1.09 a .. piece,…a part of the washbasin, …0.62 mhm’

643(A) 644(A) 645(A) 646(B) 647(A)

… and it sits .. on the counter itself, … which means that if the water overflows’ then you have to try to .. force it over the rim, back again, …1.00 which is very natural’ .. for us’ …1.09 so … the washbasin is actually resting upon it,

Since the relation ‘upon’ cannot be seen in the original bird’s-eye perspective and the listeners may have trouble imagining it, the speaker introduces a change of perspective. On the basis of the listeners’ facial expressions and the missing feedback, A decides that a further explanation of the position of the washbasin is necessary before he can move on with his description. Therefore, he chooses to draw this spatial relation from the side (see Drawing (b) and Extract 2). The verbal description is synchronised with the drawing process (the accentuated adverbials up, down, up are spoken simultaneously with the corresponding curves in the sketch).

Drawing (b). THE VANITY

Extract 2

648(A) 649(B) 650(A) 651(C) 652(A) 653(A) 654(A) 655(A) 656(A) 657(A) 658(B) 659(A) 660(A) 661(A) 662(A) 663(A) 664(A) 665(A) 666(B) 667(A) 668(B) 669(A) 670(B)

… so if you have a … cross-section here, … m[hm] [here we] have … the very counter’ mhm’ …then we have the washbasin, it goes up here like this’ and down, …1.67 and up, …1.24 of course when the water already has come over here’ then it won’t go back again, … m[hm] [y]ou have to force it over, .. and these here’ …1.04 eh=if the caulking under here is not entirely new and perfect’ … then of course the water leaks in under .. under here, and… when … the water leaks in un/ .. under’ … and there is wood under here’ after a while then certain small funny things are formed’ [mhm’] ... which are not/ hn … ha ha [ha] … which means that you have to .. replace the whole … piece of furniture, [ha ha ha]

In sum, in this face-to-face conversation where interlocutors are visible to each other, spatial relations can be described and explained both verbally and non-verbally. The drawing has been used:

• as a complement to the spoken language description (explicative function),
• as a support for visualisation (illustrative function),
• as a way of underlining the talk (expressive function),
• as a container of referents (demonstrative function),
• as a sort of external memory and information source when references were unclear,
• as a representation of the whole bathroom construction problem.

By pointing at the drawing, the interlocutors could locate objects in a shared visual world, and they could both follow and influence the gaze direction and gaze shifts of the other interlocutors (Goodwin 2003). Pointing and gaze thus contribute to creating mutual spatial orientation and a joint visual and attentional focus (Tomasello 1999). The interlocutors were also able to evaluate how the addressee was responding to the action, which prevented misunderstanding and enhanced the process of comprehension. The deictic expressions used by the interlocutors were complemented by non-verbal actions (drawing, pointing, gestures), which disambiguated possible referents. A number of co-operative activities (Clark 1996) contributed to establishing a base for mutual understanding: the interlocutors were seeking and giving feedback, they were elaborating terminology interactively, etc. (Schober and Brennan, in press).

2. Task-oriented conversation

The question now is: How is the joint focus of attention in a referring process created in a situation where the partners cannot see each other and therefore cannot use information from non-verbal means of communication? It is known that interlocutors are more efficient when they can both see and hear each other than when they can only hear one another (Doherty-Sneddon et al. 1997). How do interlocutors explain spatial relations on the basis of verbal description only? To answer this question, we collected data from task-oriented conversations. As Schober and Brennan (in press) note, task-oriented conversations have several advantages: the intentions are constrained by the task, there are a number of objective measures of task performance (e.g. similarity between the original picture and the copy, time elapsed), and they allow mental processes to be investigated.

This kind of reference task was a two-party conversation with a side-participant observing the interaction. In our version, the interlocutors were separated by a barrier and could not see each other. One person had access to visual and spatial information (a picture) that the other person needed. The task – to copy the picture – could only be carried out verbally, through conversation and co-operation. At the end of the session, we asked all participants to reflect on their interaction (how they had experienced it, what had been positive, what had been problematic etc.). The data consist of 5 hours of video recordings, and the pictures involved are either schematic or figurative.

When describing a picture for a person who is supposed to re-draw it, the process of establishing spatial relations and frames of reference becomes even more critical. The choice of perspective is another important factor contributing to successful task performance. Let us look at some preliminary observations from the data.

In the next extract, the interlocutors have just started to communicate about a figurative picture that is to be drawn by B (a cowboy holding a horse and a gun). To locate a figure and describe spatial relations between parts of the figure, interlocutors can use either prepositions (at, on, in, across, in front of, on top of, near, between, parallel to) or verbs (support, hold, lean, approach) (Tversky and Lee 1998). In our data, the interlocutors often add metaphors as well, as in the next example:

Extract 3

8(A) 9(A) 10(B) 11(B) 12(A) 13(B) 14(A) 15(B) 16(A) 15(B) 16(A) 17(B) 18(A) 19(B) 20(A) 21(B) 22(B) 23(B)

mm he stands and holds a gun about like a baby, the gun also lies in his arms, in which way does it lean, is it from left down, to right up’ the gun’ yes, eh … do you understand what I mean’ yes, does it lean in this’ direction or in the other direction, it rests upon his … left arm’ and under his right arm, okay, mm … but you must think of yes, that it is inverted, eh eh, his his left, . well let’s see,’ eh . something like that there’ and then a little bit there,]

Another observation concerns the creation of reference frames. Speakers who describe locations to interacting partners are aware of differences in perspective (Schober 1993). They are able to see things from the partner’s point of view and adapt to each other. In our data, A had problems with the ‘right’ and ‘left’ distinction. To avoid future referring problems, she created an alternative reference frame in the course of the conversation. She started using parts of the environment (the camera and the observer, called L) instead of the left/right distinction: the cowboy is looking from here from my side towards the camera. Her partner B became aware of A’s perceptual and descriptive preferences and adjusted to this reference system by reformulating her own utterances from A’s perspective and by systematically producing tailored descriptions (Schober and Brennan, in press): horse’s leg closest to the camera; on L’s side, sorry, from your perspective on the camera’s side, under his right… oh the arm closest to the camera. Both partners continued using these multiple informative and innovative references throughout the conversation (Brennan and Clark 1996) – they even created a compound noun: the old man’s camera leg.

When describing spatial relations and perspectives in the schematic picture (specifically, a flat drawing), the interlocutors also used real-world scenarios: “Imagine that you exit the toilet and directly to your left is a bench.” Natural borders such as the sides of the paper, horizontal and vertical axes and bodily measures were used as reference frames (cf. Tversky and Lee 1998). Figures were often located relative to other reference figures. Some interlocutors even described objects by virtually taking away some of the drawn objects, relating to the space and objects that were left in the picture (if x would not have been in front of y then…). Conversations consisted of a large number of repetitions and reformulations.
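The switch from a left/right frame to a landmark-based frame (the camera and the observer L) can be illustrated with a small sketch. It is only an illustration under stated assumptions, not a model used in the study: the names `Side`, `figure_to_viewer` and `landmark_description` are hypothetical, the drawn figure is assumed to face the viewer, and the room layout (camera on the viewer's left, L on the viewer's right) is invented for the example rather than taken from the recordings.

```python
from enum import Enum


class Side(Enum):
    LEFT = "left"
    RIGHT = "right"


def opposite(side: Side) -> Side:
    """Return the other side."""
    return Side.RIGHT if side is Side.LEFT else Side.LEFT


def figure_to_viewer(side: Side, figure_faces_viewer: bool = True) -> Side:
    """Convert the figure's own left/right into the viewer's left/right.

    A figure facing the viewer has its left on the viewer's right,
    which is the inversion that makes 'his left' ambiguous.
    """
    return opposite(side) if figure_faces_viewer else side


# ASSUMPTION (hypothetical layout): which landmark lies on which side of the
# viewer. Both partners can locate the landmarks, so the frame is shared.
LANDMARK_ON_VIEWER_SIDE = {Side.LEFT: "the camera", Side.RIGHT: "L"}


def landmark_description(part: str, side: Side, intrinsic: bool) -> str:
    """Rephrase a left/right reference in terms of the nearest landmark.

    intrinsic=True  -> 'side' is given from the figure's own perspective
    intrinsic=False -> 'side' is given from the viewer's perspective
    """
    viewer_side = figure_to_viewer(side) if intrinsic else side
    return f"the {part} closest to {LANDMARK_ON_VIEWER_SIDE[viewer_side]}"


# "his right arm" (figure's perspective) versus "the arm on the right"
# (viewer's perspective) end up as two different landmark descriptions.
print(landmark_description("arm", Side.RIGHT, intrinsic=True))
print(landmark_description("arm", Side.RIGHT, intrinsic=False))
```

The point of the sketch is that the same word (right) maps to different locations depending on whose perspective is taken, whereas the landmark description stays fixed for both partners once the layout is known.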
A common strategy that contributed to joint attention (Tomasello 1999) was the following: A described a new object, B reformulated it in relation to the other drawn objects or asked for clarification, A expressed agreement or disagreement or answered the questions, and B started drawing and reformulated it again. These steps of description were typically clustered into a larger (event) unit connected by a thematic aspect and consisting of verbal and non-verbal actions, a so-called attentional superfocus (Holsanova 2001). The proportion of initiatives from the describers’ and the drawers’ side to offer or require spatial information differed across the various pairs of interlocutors. Compared to face-to-face conversation, everything had to be formulated, verified and confirmed verbally. Since the interlocutors could not see each other’s facial expressions, gaze directions and gestures, they had to explicitly signal uncertainty, doubt, accomplishment of an action, turn-taking etc. verbally.

Although our observations are only preliminary and a thorough data analysis remains to be done, they can still be interpreted in terms of the theories on adaptation, adjustment and alignment. Whereas it is not very probable that the interlocutors at each point in conversation model each other’s mental states and mutual knowledge (Clark and Marshall 1981), it seems more plausible that they rely on conceptual pacts (Brennan and Clark 1996), i.e. more global, flexible agreements on how to conceptualize objects and spatial relations. The collected data and the examples presented here also lend support to Pickering and Garrod’s (2003) theory, which states that dialogue is mostly based on low-level alignment and implicit common ground. It is only when these mechanisms fail that we engage in a more sophisticated way of modelling the interlocutors’ mental states in order to repair the misaligned representations.

References

Brennan, S. and Clark, H. H. (1996). Conceptual pacts and lexical choice in conversation. Journal of Experimental Psychology: Learning, Memory and Cognition, 22, 1482-1493.
Clark, H. H. (1996). Using language. Cambridge: Cambridge University Press.
Clark, H. H. and Brennan, S. E. (1991). Grounding in communication. In L. B. Resnick, J. M. Levine and S. D. Teasley (Eds.), Perspectives on socially shared cognition (pp. 127-149). Washington, DC: American Psychological Association.
Clark, H. H. and Wilkes-Gibbs, D. (1986). Referring as a collaborative process. Cognition, 22, 1-39.
Clark, H. H. and Marshall, C. R. (1981). Definite reference and mutual knowledge. In A. K. Joshi, B. L. Webber and I. A. Sag (Eds.), Elements of discourse understanding (pp. 10-63). Cambridge: Cambridge University Press.
Doherty-Sneddon et al. (1997). Face-to-face and video-mediated communication: A comparison of dialogue structure and task performance. Journal of Experimental Psychology: Applied, 3, 105-125.
Garrod, S. and Anderson, A. (1987). Saying what you mean in dialogue: A study in conceptual and semantic co-ordination. Cognition, 27, 181-218.
Garrod, S. and Doherty, G. (1994). Conversation, co-ordination and convention: An empirical investigation of how groups establish linguistic conventions. Cognition, 53, 181-215.
Goodwin, C. (2003). Pointing as a situated practice. In S. Kita (Ed.), Pointing: Where language, culture and cognition meet (pp. 217-241). Lawrence Erlbaum Associates.
Holmqvist, K. and Holsanova, J. (1997). Reconstruction of focus movements in spoken discourse. In W. Liebert, G. Redeker and L. Waugh (Eds.), Discourse and perspective in cognitive linguistics (pp. 223-246). Amsterdam: Benjamins.
Holsanova, J. (2001). Spoken language descriptions and spontaneous drawing, Chapter V. In Picture viewing and picture description: Two windows on the mind. Doctoral dissertation. Lund University Cognitive Studies 83, 148-184.
Pickering, M. J. and Garrod, S. (2003). Toward a mechanistic psychology of dialogue. Behavioral and Brain Sciences. Cambridge University Press.
Schober, M. F. and Brennan, S. E. (in press). Processes of interactive spoken discourse: The role of the partner. In A. C. Graesser, M. A. Gernsbacher and S. R. Goldman (Eds.), Handbook of discourse processes. Hillsdale, NJ: Lawrence Erlbaum.
Schober, M. F. (1993). Spatial perspective-taking in conversation. Cognition, 47, 1-24.
Tversky, B. and Lee, P. U. (1998). How space structures language. In C. Freksa, C. Habel and K. Wender (Eds.), Spatial cognition (pp. 157-176). Lecture Notes in Computer Science 1404. Berlin: Springer.
