Towards Learning by Interacting

Britta Wrede, Katharina J. Rohlfing, Marc Hanheide, and Gerhard Sagerer

Bielefeld University, Applied Computer Science, and Institute for Cognition and Robotics (CoR-Lab), 33501 Bielefeld, Germany

Abstract. Traditional robotics has treated the question of learning as a one-way process: learning algorithms, especially imitation learning approaches, are based on observing and analysing the environment before carrying out the ‘learned’ action. In such scenarios the learning situation is restricted to uni-directional communication. However, insights into the process of infant learning increasingly reveal that interaction plays a major role in transferring relevant information to the learner. For example, in learning situations the interactive setting has the potential to highlight parts of an action by linguistic and non-linguistic features and thus to guide the attention of the learner to those aspects that the tutor deems relevant given her current state of mind. We argue that learning necessarily needs to be embedded in an interactive situation and that developmental robotics, in order to model engagement in interaction, needs to take the communicative context into account. We further propose that such an approach necessarily needs to take three aspects into account: (1) multi-modal integration at all processing levels, (2) derivation of top-down strategies from bottom-up processes, and (3) integration of single modules into an interactive system in order to facilitate the first two desiderata.

1 Introduction

For many years robotic research has aimed at developing robots that are able to learn from observation of or interaction with their environment. In the domain of cognitive robotics there is broad agreement that embodiment is a foundation for any kind of interactive or simply active learning [7,38]. Consequently, many approaches have been motivated by findings from infant development, implicitly implying that their goal is to model human cognition. However, most approaches have either applied offline learning to simply enable online recognition or have treated learning and recognition as rather distinct processes. Other approaches exploit the embodiment of the system by enabling the robot to interact with the environment. However, only rarely has the social and cognitive knowledge of the tutor been taken into consideration, although it is widely accepted that a social learning situation itself comprises particular information that facilitates the learning process and increases its effectiveness. We therefore argue in this paper that learning – with a focus on the acquisition of linguistic as well as manipulation-oriented capabilities – needs to be embedded in an interactive situation and that a robotic system, in order to develop and learn over time, needs to be able to make use of the interactive situation by exploiting the closed loop with the human tutor.

This view on learning entails that we need new system architectures that facilitate multi-modal fusion in a much more integrated way and that allow offline and online processing approaches to be combined. One important step in this direction is to enable the development of top-down strategies from bottom-up processes at various levels. We are convinced that cognition-inspired learning paradigms are crucial for developing brain-like intelligence. Learning by interaction goes beyond common supervised or unsupervised strategies by taking into account wider feedback and assessments for the learning process. We argue that this entails modeling mechanisms that have the power to generalise and thus to develop through experience. One such mechanism is the process of joint attention, which is seen as a necessary capability for facilitating learning. We address this mechanism in Section 3 by reviewing findings on the role of joint attention in parent-infant interactions and approaches to model attention in different robot applications. Another mechanism, which is closely related to joint attention but appears to be even more powerful, is the way interaction is managed. This entails on the one hand issues of turn-taking, that is, at what point in time each participant can make a contribution. However, it also relates to feedback mechanisms that signal to the interaction partner that her message has been received, processed and understood – a process commonly referred to as grounding. We present different approaches to grounding in Human-Robot Interaction (HRI) in the following section and analyse their capability of scaling to a potentially open-ended learning situation. Finally, we discuss the implications that these desiderata of an interactive learning system have for further development and research.

2 Background

Traditionally, the ability to learn has been investigated in terms of information storage and representational formats. Fogel criticizes the metaphor of signal and response that is conveyed that way. Instead of “top-down models of action control” [9], recent research on learning promotes the dynamics of interacting systems and their co-regulation. In this vein, our argument starts from the observation that infants need interaction with caregivers in order to develop. The question of how interaction influences the learning process has concerned many researchers. The phonetic acquisition process, for example, has long been assumed to rely purely on statistical analysis of the auditory input, as specified in the Perceptual Magnet Theory [22]. However, more recently it has been observed that while very young infants are able to re-learn non-native phonemic contrasts in interactive situations, they will not do so when confronted with the same signals via video in a uni-directional, non-interactive situation [21].

These findings indicate that already very young infants make use of information arising from interactive situations. Kuhl [21] suggests two aspects that might help infants to learn in interactive situations: on the one hand, she argues that in direct interaction attention and arousal of the infants are higher than in video situations. These factors are known to affect learning in a wide variety of domains [31] and might thus help the infants to learn the non-native contrasts. On the other hand, she argues that the mechanism of joint attention, which is only present in live interactions but not in video settings, might provide the infants with relevant additional information that finally helps them to distinguish between relevant phonemic contrasts.

However, how in detail can joint attention help to learn things that an infant would not learn in a non-interactive situation? And what exactly is the additional information that is provided in interactive situations but not in a video session? One answer is that in an interactive situation the teacher can guide the learner's attention to relevant aspects of the entity that is to be learned and can thus establish joint attention in order to achieve a common ground (or mutual understanding). Most interestingly, strategies to guide attention are highly multi-modal and make use of the temporal relationship between information in the different modalities. There is empirical evidence showing that infants are more likely to attend to a visual event which is highlighted by speech (e.g. [26]). This suggests that in young infants language plays the role of one of many possible perceptual cues. Gogate et al. [10] applied the principle of the intersensory redundancy theory to early acquisition of communication and investigated the speech signal as a help for preverbal infants to remember what they see. The authors found that moving an object in synchrony with a label facilitated long-term memory for syllable-object relations in infants as young as 7 months. By providing redundant sensory information (movement and label), selective attention was affected [10]. Redundant information can include temporal synchrony, spatial co-location, shared rhythm, tempo, and intensity shifts. Gogate and Bahrick argue for the following reference mechanism in learning words (see also [39]): young infants relate words and objects by detecting intersensory redundancy across the senses in multimodal events. One type of perceptual information can be the sound as it is provided by a person when she or he speaks. Temporal synchrony and movement seem to constitute two perceptual cues that are ecologically valid: when observing mothers interacting with their children, Gogate et al. [10] found that mothers use temporal synchrony to highlight novel word-referent relations, i.e. they introduced a novel label to their children and moved the new object synchronously with the new label. According to the results obtained in different age groups of children, this phenomenon – observed also cross-culturally [11] – seems to be very prominent in preverbal children.

The idea that the presence of a sound signal might help infants to attend to particular units within the action stream was originally proposed and termed “acoustic packaging” by Hirsh-Pasek and Golinkoff [14]. Hollich and colleagues [15] report on literature that considerably supports the notion that infants appear to rely quite heavily on the acoustic system in the phase of perceptual segmentation and extraction. While this segmentation and categorization is going on, Hirsh-Pasek and Golinkoff [14] argue that children can use this “acoustic packaging” to achieve a linkage between sounds and events (see also [39]).

Interaction thus establishes a situation where joint attention can continuously be established through the extensive use of multi-modal cues, which facilitate learning as specified by the multi-modal redundancy hypothesis. However, in order to yield learning effects, bottom-up strategies – as often applied for attentional processes – necessarily need to be combined with top-down processes. For example, parent-infant interaction can be seen as a combination of both processes, with the infant using mainly bottom-up strategies in order to make sense of a situation, whereas the care-taker provides top-down strategies to guide the infant's attention. In this interlinked process, bottom-up and top-down processes are intimately combined and form a dynamic mechanism that provides more and more differentiated learning (and attention) strategies from which the infant can select.

In order to model such a complex dynamic process on a robotic system, single modules, as developed in current state-of-the-art research, need to be incrementally integrated into a complex system. Far from being trivial, such complex integration tasks have been shown to necessarily address the fundamental question of a cognitive architecture and have the potential to reveal the synergetic effects arising from the combination of single capabilities.
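
To make the “acoustic packaging” idea more concrete, the following is a minimal, illustrative sketch (not taken from the cited studies) of how synchrony between speech and object motion could be detected computationally: short-time energy envelopes of an audio and a motion signal are correlated in a sliding window, and sustained high correlation is treated as one audio-visually packaged segment. All function names, window sizes and the threshold are assumptions made for this toy example.

```python
import numpy as np

def energy_envelope(signal, frame_len):
    """Short-time energy of a 1-D signal, one value per frame."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    return (frames ** 2).mean(axis=1)

def acoustic_packages(audio, motion, frame_len=160, corr_win=10, threshold=0.6):
    """Return frame intervals where speech energy and object motion co-vary.

    audio:  1-D array of speech samples
    motion: 1-D array of object displacement sampled at the same rate
    A sliding-window correlation above `threshold` is taken as evidence of
    audio-visual synchrony, i.e. one 'acoustic package' in the sense of [14].
    """
    a = energy_envelope(np.asarray(audio, dtype=float), frame_len)
    m = energy_envelope(np.asarray(motion, dtype=float), frame_len)
    packages, start = [], None
    for t in range(len(a) - corr_win):
        wa, wm = a[t:t + corr_win], m[t:t + corr_win]
        r = np.corrcoef(wa, wm)[0, 1] if wa.std() > 0 and wm.std() > 0 else 0.0
        if r > threshold and start is None:
            start = t                      # synchrony begins
        elif r <= threshold and start is not None:
            packages.append((start, t))    # synchrony ends: one package found
            start = None
    if start is not None:
        packages.append((start, len(a) - corr_win))
    return packages
```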

3 Joint Attention and Learning

In order to answer the question of what is to be learned, we believe that joint attention is particularly relevant. Drawing on adult-infant studies, this section discusses why and how joint attention can be considered a cornerstone for learning by interacting in robotics.

3.1 Joint Attention in Parent-Infant Interaction

Whether memorizing a face, learning how to switch the light on or just perceiving an object that is on a table, all of this is based on attention processes. It is our attention that reduces the enormous amount of information that is available to us through our senses, memory and other cognitive processes to a limited amount of salient information. Attention decides which selected information will be further processed by conscious and automatic mental processes [35]. In social learning, which has attracted the interest of scholars not only from developmental psychology and cognitive science but also from robotics (e.g. [5]), attention processes are not only controlled by a single system and its reactions to the environment but are also influenced by interaction partners. In studies with newborns, it has been shown that babies are tuned to faces and will more likely follow geometric patterns that resemble faces than other, similar geometric patterns [18]. This preference is discussed as an adaptive orientation mechanism that ensures that infants will learn the most relevant social stimuli in their environment [4].

The fact that a social partner can influence attention allows the interacting agents to simultaneously engage with one and the same external thing and form a kind of mental focus [1], which is called joint attention. In the literature, two manifestations of joint attention have been identified: the ability to follow an interlocutor's direction of gaze and the ability to follow an interlocutor's pointing gesture. Interestingly, recent research has suggested that the ability of joint attention might develop from infants' bottom-up skills and preferences. For example, Hood et al. [16] as well as Farroni and her colleagues [8] could show under specific experimental conditions that infants as young as 4 months display a rudimentary ability to follow somebody's gaze. Farroni and her colleagues further specified that infants show this behavior because they are cued by the motion of the pupils rather than by the eye gaze itself. Thus, when there was no apparent movement of the pupils and the averted gaze was presented statically, there was no evidence of 4-month-old infants following the gaze shift. These data reveal that the direction of the eyes per se does not have an effect in directing infants' attention. Motion or perceptual saliency [12] seems to be an important cue for infants, but Farroni and her colleagues [8] suggest that it may become less important during development (see also [32]). For example, while young infants' gaze-following depends on seeing the adult's head movement rather than the final head pose [27], older infants use the static head pose to infer the direction of attention in a top-down manner. For pointing gestures, it has similarly been shown that infants as young as 4.5 months respond to the direction of movement, no matter whether the movement is made by a pointing finger or by the back of the hand. Around 6.5 months, however, infants start to be sensitive to the orientation of the finger, which is a crucial component of a pointing gesture [32]. Therefore, it is plausible to argue that early in their development children are attracted by perceptual saliency but increasingly come to attend to social cues [12].

3.2 Modeling Joint Attention in HRI

Models of joint attention in robotics, in contrast, tend to be either purely bottom-up, where the interactive character of joint attention is often neglected, or model-driven. Most approaches only work under highly restricted constraints and are generally not intended to adapt their attentional behavior to the interaction partner's behavior. Attempts to model human performance with respect to attentional processes in non-interactive situations have resulted in several implementations of saliency-based attention models [6,17,13]. In these approaches attention is reduced to a focus point in a visual scene for each time frame. While the results are interesting when compared to human performance [34], they provide little help with respect to the question of how attention can support learning in interactive situations. However, they serve as an important prerequisite for modeling joint attention. A brief overview relating infant development to robots' attention models with respect to joint attention is given in [19].
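
As an illustration of what such saliency-based attention models compute, here is a deliberately simplified, Itti-Koch-style sketch in the spirit of [17]; the published models use multi-scale pyramids, orientation channels and normalization schemes that are omitted here, so this should be read as a toy approximation rather than the actual algorithm. The input `camera_frame` in the usage comment is assumed to be an H×W×3 uint8 image.

```python
import numpy as np
from scipy import ndimage

def saliency_focus(rgb):
    """Return (row, col) of the most salient pixel of an RGB frame.

    A toy center-surround scheme: intensity and red-green / blue-yellow
    opponency maps are each smoothed at a fine and a coarse scale; the
    absolute difference approximates center-surround contrast, and the
    channel maps are summed into one saliency map.
    """
    img = rgb.astype(float) / 255.0
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    channels = [
        (r + g + b) / 3.0,    # intensity
        r - g,                # red-green opponency
        b - (r + g) / 2.0,    # blue-yellow opponency
    ]
    saliency = np.zeros(img.shape[:2])
    for c in channels:
        center = ndimage.gaussian_filter(c, sigma=2)
        surround = ndimage.gaussian_filter(c, sigma=8)
        saliency += np.abs(center - surround)
    return np.unravel_index(np.argmax(saliency), saliency.shape)

# Usage: focus = saliency_focus(camera_frame)  # one focus point per frame
```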

Integrating bottom-up strategies for attention into a socially interactive robot system has been shown to yield an active search behavior of the robot with the goal of satisfying internal, top-down driven needs [2]. However, in this approach the top-down needs were pre-defined, as was their relation to the behaviors and visual perceptions. A learning process based on the bottom-up processing results does not take place. Bottom-up strategies for learning joint attention have been successfully demonstrated on a humanoid robot: in [28,29] results have been reported from a robot that learned to associate visual cues of the human partner's head movements with the movements of its own head through a saliency-based reward mechanism. Similarly, a robot can learn to find the face and the eyes of a human interaction partner in order to imitate head movement behavior based on bottom-up strategies for vision [33]. These learning algorithms can be interpreted as learning joint attention in an interaction situation solely through bottom-up features. However, again, no processes have been foreseen that take advantage of the learned associations in order to establish top-down strategies. Yet, more recently, results have been reported suggesting that semantically relevant aspects of actions as well as preferences for social cues might be found through bottom-up strategies [30] and might thus serve as a bootstrap mechanism for learning top-down strategies. In sum, however, it can be stated that bottom-up (or even mixed) approaches to joint attention do not take verbal interaction, and thus explicitly specified semantics, into account, which would be necessary in order to bootstrap linguistic learning.

On the other hand, robots that are able to interact with humans at a higher, linguistic level and make use of multi-modal information tend to be based on almost purely model-driven joint attention. Such approaches track an object of interest (OOI) that has either been specified by a deliberative process or has come into the focus of attention through bottom-up saliency detection ([2]), and make use of it in order to resolve references from the communication with the user. While these approaches allow meaningful, grounding-based interactions, they provide only limited means to integrate bottom-up strategies for joint attention in order to adapt the attention strategies to new situations and learn qualitatively new concepts, as would be necessary e.g. for language learning.
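
The following sketch is not the method of [28,29], but a hypothetical, minimal reinforcement scheme that captures the idea described above: the robot associates the observed head pose of the partner with its own gaze shift and reinforces pairings that lead to salient views. The discretization into three directions, the learning rate and the sensor functions named in the final comments are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Q[(observed_head_pose, own_gaze_shift)] -> expected saliency reward.
# Poses and shifts are discretized directions (hypothetical labels).
DIRECTIONS = ["left", "center", "right"]
Q = defaultdict(float)
ALPHA, EPSILON = 0.1, 0.2

def choose_gaze_shift(observed_pose):
    """Epsilon-greedy choice of where to shift the robot's own gaze."""
    if random.random() < EPSILON:
        return random.choice(DIRECTIONS)
    return max(DIRECTIONS, key=lambda a: Q[(observed_pose, a)])

def update(observed_pose, gaze_shift, saliency_reward):
    """Reinforce the cue-action pairing with the saliency found after the shift."""
    key = (observed_pose, gaze_shift)
    Q[key] += ALPHA * (saliency_reward - Q[key])

# One interaction step (sensing and saliency functions are assumed to exist):
#   pose = estimate_partner_head_pose(frame)
#   shift = choose_gaze_shift(pose)
#   reward = saliency_at_new_view(execute_gaze_shift(shift))
#   update(pose, shift, reward)
```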

3.3 Conclusion

Thus, while theories on infant development discuss the possibility that higher-level attentional processes, and thus top-down strategies such as joint attention, may develop through bottom-up, saliency-based strategies being applied to meaningful interactive communication, computational models of joint attention mainly focus on one strategy exclusively. Also, the importance of multi-modal information tends to be neglected in these approaches. However, theories about the interplay of different modalities for learning, such as the intersensory redundancy hypothesis, suggest that it is the synchrony of information in different modalities that helps to make sense of the enormous amount of information that reaches the infant's brain through its sensory system. This means that attentional processes necessarily need to be able to take multi-modal information into account and make use of it in order to find relevant segments over time in the signal.

Yet, the modeling of such multi-modal approaches is still in its infancy. The reason for this is that synchronisation and the handling of different processing rates, as they are common for e.g. visual vs. audio processing, are non-trivial tasks that need to be solved beforehand. In order to achieve such an integration, an adequate system architecture is needed that enables (1) early as well as late fusion, (2) the building of new concepts and processing strategies, and (3) the feeding back of such top-down information from higher to lower levels. In other words, taking multi-modality seriously means that the architecture needs to be adapted to these requirements.
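
To illustrate the synchronization problem mentioned above, the sketch below shows one possible (assumed, not prescribed) way of aligning modality streams that arrive at different rates: every percept carries a timestamp from a shared clock, and fusion pairs the percepts that are closest in time within a tolerance. The class names and the skew tolerance are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Percept:
    timestamp: float   # seconds on a clock shared by all modules
    modality: str      # e.g. "audio", "vision"
    data: object

@dataclass
class StreamBuffer:
    """Timestamped percepts of one modality, kept in temporal order."""
    items: list = field(default_factory=list)

    def add(self, percept: Percept):
        self.items.append(percept)
        self.items.sort(key=lambda p: p.timestamp)

    def nearest(self, t: float, max_skew: float = 0.1):
        """Percept closest to time t, or None if none lies within max_skew seconds."""
        if not self.items:
            return None
        best = min(self.items, key=lambda p: abs(p.timestamp - t))
        return best if abs(best.timestamp - t) <= max_skew else None

def fuse(audio: StreamBuffer, vision: StreamBuffer, t: float):
    """Pair the audio and vision percepts closest to time t (early fusion)."""
    a, v = audio.nearest(t), vision.nearest(t)
    return (a, v) if a is not None and v is not None else None
```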

4 Interaction Management and Learning

While joint attention is a process that helps to guide the attention of each interaction partner to a common focus, it does not provide a means to communicate about the common object of interest. What is thus needed is a mechanism that facilitates the exchange of information by providing means of feedback and rules of interactivity. One might even argue that such a mechanism is actually a more general kind of joint attention, as it also needs to make sure that both interaction partners are focussing on the same thing while at the same time providing a means for exchanging information about this thing.

4.1 Interaction Management in Parent-Infant Interaction

A prerequisite for the development of communication is the ability to participate cooperatively in shared discourse ([20,25]). Dialogue as a behavioral and communicational form seems to be acquired [20] in the sense that a child has to learn how to time her or his own behavior in order to exchange with a social partner and how to coordinate communication when objects have to be handled or manipulated in parallel. The foundations for these skills seem to be acquired very early and in situations that seemingly have little to do with vocal communication. In a study in which mothers and their newborns were observed during feeding time, Kaye [20] analyzed burst-pause patterns and their development. A burst was characterized as several successive sucks of the baby. When a baby paused the feeding, her or his mother seemed to interpret this behavior as flagging and was therefore inclined to stimulate the baby's behavior by jiggling the baby. Interestingly, however, jiggling – as the analysis showed – does not cause another burst immediately. Instead, the crucial signal was the pause after the jiggling, which seemed to let the baby resume the sucking. In the course of the interaction development, the jiggling of the mothers became shorter. After two weeks, mothers seemed to change their behavior from ’jiggle’ to ’jiggle-stop’. Kaye [20] interprets the results in terms of the origin of a dialogue: both the mother and the child learned to anticipate one another's behavior in terms of a symmetry or contingency, meaning that when somebody is doing something, the other will wait. That is the rule of turn-taking, and it is learned because certain interactive sequences (that seem to occur naturally) become more frequent.

Infants seem to be sensitive to such contingencies and regularities in behavior. According to Masataka [25], when exposed to contingent stimulation, children are positively affected in their motivation insofar as their performance on a subsequent task is facilitated, whereas their performance is impaired when they have been exposed to non-contingent stimulation beforehand. Young infants enjoy contingent interaction even in the absence of a social partner, for example with a mobile robot [37]. Csibra and Gergely [4] believe that these early interactions serve "an ultimately epistemic function: identifying teachers and teaching situations and practicing this process". This way, infants are tuned to specific types of social learning by a specific type of communication. In this type of social learning, generalizable knowledge is conveyed that is valid beyond the ongoing situation [4]. The authors rest their assumption on further preferences of young infants for tutoring situations. Accordingly, infants prefer not only verbal behavior that is specifically addressed to them (known as Motherese or Child-Directed Speech) but also nonverbal behavior such as object demonstration that is directed towards them (known as Motionese).

4.2 Modeling Interaction Management in HRI

Taking these findings into account, one may argue that joint attention serves as a precursor of and necessary ingredient for meaningful interaction. In addition to joint attention, in order to exchange information there needs to be a mechanism that enables a bi-directional interaction in which information given by one partner is confirmed by the other. This means that this mechanism not only needs to make sure that both partners are attending to the same object but also that they can exchange information about it. In this sense one might interpret joint attention as a subset, or precursor, of interaction. In HRI such a mechanism is generally modeled as a dialog module which manages interaction in a top-down way. It is often integrated into the system's architecture at the highest, deliberative level, which is closely related to the planning processes. Thus, the techniques used to model interaction tend to be closely intertwined with the system's architecture. Standard approaches to dialog modeling in HRI have for a long time been state-based, where the internal system states are augmented by dialog states, with each dialog step being represented as a new state. In such an approach the dialog is driven by the goal of satisfying the need for information depending on the system state. For example, if a piece of information crucial for task execution is missing from an instruction, this is modeled explicitly by a state representing the system state and the dialog state. This way, each potential interaction has to be modeled a priori, and learning of new interaction steps is very difficult if not impossible.
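
A toy example of such a state-based dialog model is sketched below; the states and events are invented for illustration and do not correspond to any particular system. It shows why every interaction step, including the clarification for a missing object referent, has to be enumerated a priori in the transition table.

```python
# Toy state-based dialog manager for an "object pick-up" instruction.
# Every dialog step is an explicit state; missing task information
# (here: which object) forces a pre-modeled clarification state.

STATES = {
    "IDLE":        {"instruction": "CHECK_SLOTS"},
    "CHECK_SLOTS": {"complete": "EXECUTE", "missing_object": "ASK_OBJECT"},
    "ASK_OBJECT":  {"object_named": "EXECUTE", "no_answer": "ASK_OBJECT"},
    "EXECUTE":     {"done": "IDLE"},
}

def next_state(state: str, event: str) -> str:
    """Follow the hand-coded transition table; unknown events keep the state."""
    return STATES.get(state, {}).get(event, state)

# Example run: the instruction "pick it up" arrives without an object referent.
s = "IDLE"
for event in ["instruction", "missing_object", "object_named", "done"]:
    s = next_state(s, event)
    print(event, "->", s)
# instruction -> CHECK_SLOTS, missing_object -> ASK_OBJECT,
# object_named -> EXECUTE, done -> IDLE
```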

In order to generalise over such states and learn better and new associations between system and dialog states, a range of machine-learning approaches to spoken dialog systems exists [23]. While such approaches are reported to generalise to a certain degree to unseen events, they are nevertheless tied to the existing system states on the one hand and require a high-level analysis of the spoken utterances in terms of dialog or speech acts on the other. Learning is thus only possible within a very limited range; for example, new speech acts cannot be learned. Also, these approaches are limited to uni-modal, speech-based interaction systems such as telephone service systems – indicating that dealing with more complicated or even dynamic states, as they are common in embodied and situated systems such as robots, might be difficult for this approach.

A relatively new approach in HRI is to base the interaction management on a grounding process [24]. Grounding [3] is a well-known concept that has been used at length to describe and analyse human-human interaction on the one hand and has been adapted to human-machine interaction on the other. Basically, grounding describes the interactive process of establishing mutual understanding between two interaction partners. This process can be described in terms of adjacency pairs of utterances, with a so-called presentation initiating such a pair and an acceptance signaling understanding by the listener. If the listener does not understand, he or she will initiate a clarification sequence, which has to be finished before the initial utterance can be answered. While this concept is intuitively clear, it raises several questions with respect to the sequencing of such contributions and the termination of the grounding process. Accordingly, there are only very few implementations making use of this principle. However, those applications that make use of a grounding-based interaction management tend to be more flexible when the system is changed and also allow for much more mixed-initiative interaction [24]. Understanding is thereby often operationalized as meeting an expectation. For example, if the system asks a question it will expect an answer with a specific type of information. If the answer does not contain this type of information, the expectation is not met and the question is thus not grounded.
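
The following sketch illustrates the presentation/acceptance and expectation idea in a strongly simplified form; it is not the model of [24], and the class and method names are assumptions. A presentation opens an adjacency pair with an expected type of answer; a response of that type grounds the contribution, anything else opens a clarification sub-sequence that must be resolved first.

```python
from dataclasses import dataclass

@dataclass
class Presentation:
    content: str
    expected_type: str   # type of information that would ground this contribution

class GroundingStack:
    """Adjacency pairs as a stack: unanswered presentations wait on top; a
    response that meets the expectation grounds and pops the pair, anything
    else pushes a clarification sub-sequence on top of it."""

    def __init__(self):
        self.pending = []

    def present(self, content, expected_type):
        self.pending.append(Presentation(content, expected_type))

    def respond(self, response_type, content):
        if not self.pending:
            return "no open contribution"
        top = self.pending[-1]
        if response_type == top.expected_type:
            self.pending.pop()          # expectation met: contribution grounded
            return f"grounded: {top.content}"
        # expectation not met: open a clarification pair before the old one
        self.present(f"clarify: {top.content}", top.expected_type)
        return "clarification initiated"

# Usage: the system asks a question and expects an answer of type "color".
g = GroundingStack()
g.present("Which color is the object?", expected_type="color")
print(g.respond("color", "red"))        # -> grounded: Which color is the object?
```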

4.3 Conclusion

Thus, dialog modeling in HRI is highly model-driven, as it pertains to higher levels of communication. However, when taking findings from parent-infant interaction into account, it becomes obvious that the principle of grounding might also be applicable to lower or even non-linguistic levels of interaction. While current models of grounding require linguistic analyses of the spoken utterances, it is nevertheless possible to extend this principle to non-linguistic levels of interaction. We argue that grounding may serve as a general mechanism for learning higher-level linguistic interaction through non-linguistic, physical interaction. However, in order for such a system to evolve, three prerequisites need to be met: (1) The mechanism needs to be able to process multi-modal information in a way that allows the information to be synchronized at different levels in order to draw conclusions about co-occurring events and thus to segment the input into meaningful units. Multi-modality must thus not only occur at a single point in the overall system – as is often the case – but has to be available at all processing levels. (2) The mechanism needs to provide a possibility to develop top-down strategies from bottom-up data analysis. Also, a mechanism is required that enables bottom-up and top-down processes at the same time. For example, turn-taking and feedback strategies derived from non-verbal interaction need to be able to feed back into the bottom-up processes involved in the processing of verbal input. These two desiderata entail (3) that the overall system has to be highly integrated. By this we mean that, on the one hand, many modules operating at different levels and focussing on different modalities need to work together. On the other hand, this means that a coherent architecture needs to be established that enables all components to have access to multi-modal information and to feed information back to other processes.
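
As an illustration of requirement (3), the sketch below shows one conceivable (hypothetical) integration pattern: a minimal publish/subscribe hub through which any module can post timestamped events and subscribe to events of other modules, so that multi-modal information and top-down expectations can reach all processing levels. The component names and event types are invented for this example.

```python
from collections import defaultdict

class Hub:
    """Minimal publish/subscribe hub: every module can post timestamped,
    typed events and subscribe to any event type, so multi-modal information
    and top-down feedback are available at all processing levels."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, event_type, callback):
        self.subscribers[event_type].append(callback)

    def publish(self, event_type, timestamp, payload):
        for callback in self.subscribers[event_type]:
            callback(timestamp, payload)

hub = Hub()

# A low-level speech module publishes word hypotheses ...
hub.subscribe("word_hypothesis",
              lambda t, w: print(f"{t:.2f}s attention bias towards '{w}'"))

# ... and a high-level interaction manager feeds an expectation back down,
# which perception modules can use to bias their bottom-up processing.
hub.subscribe("expectation",
              lambda t, e: print(f"{t:.2f}s top-down expectation: {e}"))

hub.publish("word_hypothesis", 1.20, "ball")
hub.publish("expectation", 1.25, {"type": "object_reference", "label": "ball"})
```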

5 Conclusion

We have argued that the interaction situation is a necessary prerequisite for enabling developmental learning. In this context, learning cannot only be seen as the task of assigning symbols to sensorial patterns, but should be modeled as a continuous process driven by social and interactive cues. Hence, we are not advocating any specific method in machine learning but rather calling for a learning paradigm that combines aspects of supervised and reinforcement learning, emphasizing the role of an interaction partner playing the tutor role. This view indeed has an impact on the selection of any particular underlying methodology or algorithm. However, as interaction is inherently a process characterized by multi-modality and mixed top-down and bottom-up schemes, a systemic approach that is functionally highly interlinked is needed. Thus, when taking the interaction situation in a learning scenario into consideration, a complex perception-action system needs to be developed in order to enable learning. From a software engineering and architecture perspective such a close coupling is always considered a particular challenge that needs attention in further research. This is reflected by the growing attention that cognitive architectures have received in recent years (e.g. [36]). Furthermore, as we are talking about interactive systems, demands regarding ’responsiveness’ and ’liveness’ need to be taken into account during the selection of particular methods. As interaction is always live and cannot be stored or processed offline, the system needs to be ’responsive’, that is, it needs to be able to react within a short enough time span for the user to perceive the interaction as ongoing.

In order to decompose the general problem of interactive learning, we have identified in this paper two major interaction building blocks, namely joint attention and interaction management, that have the potential to evolve through repeated interaction cycles from basic physical interactions to interactions at a linguistic level. To implement systems that actually learn by interaction, existing models of attention in computational systems need to grow towards comprehensive, systemic approaches fusing bottom-up and top-down mechanisms. They have to exploit multi-modality in their perceptual and expressive capabilities. And they demand a systemic view that allows for close inter-operation of the basic capabilities.

References

1. Baldwin, D.A.: Understanding the link between joint attention and language. In: Moore, C., Dunham, P. (eds.) Joint Attention: Its Origins and Role in Development, pp. 131–158. Lawrence Erlbaum, Mahwah (1995)
2. Breazeal, C., Scassellati, B.: A context-dependent attention system for a social robot. In: IJCAI 1999: Proc. Int. Joint Conf. on Artificial Intelligence, pp. 1146–1153. Morgan Kaufmann, San Francisco (1999)
3. Clark, H.H.: Arenas of Language Use. University of Chicago Press (1992)
4. Csibra, G., Gergely, G.: Social learning and social cognition: The case for pedagogy. In: Johnson, M., Munakata, Y. (eds.) Processes of Change in Brain and Cognitive Development. Attention and Performance XXI. Oxford University Press, Oxford (2005)
5. Dautenhahn, K., Nehaniv, C. (eds.): Imitation and Social Learning in Robots, Humans and Animals: Behavioural, Social and Communicative Dimensions. Cambridge University Press, Cambridge (2007)
6. Driscoll, J., Peters, R., Cave, K.: A visual attention network for a humanoid robot
7. Duffy, B., Joue, G.: Intelligent robots: The question of embodiment. In: Proc. of the Brain-Machine Workshop (2000)
8. Farroni, T., Johnson, M.H., Brockbank, M., Simion, F.: Infants’ use of gaze direction to cue attention: The importance of perceived motion. Visual Cognition 7, 705–718 (2000)
9. Fogel, A., Garvey, A.: Alive communication. Infant Behavior and Development 30, 251–257 (2007)
10. Gogate, L.J., Bahrick, L.E.: Intersensory redundancy and 7-month-old infants’ memory for arbitrary syllable-object relations. Infancy 2, 219–231 (2001)
11. Gogate, L.J., Prince, C.: Is ”multimodal motherese” universal? In: The X International Congress of the International Association for the Study of Child Language (IASCL), Berlin, Germany, July 25–29 (2005)
12. Golinkoff, R.M., Hirsh-Pasek, K.: Baby wordsmith. Current Directions in Psychological Science 15, 30–33 (2006)
13. Hashimoto, S.: Humanoid robots in Waseda University – Hadaly-2 and WABIAN. In: IARP First International Workshop on Humanoid and Human Friendly Robotics, Tsukuba, Japan (1998)
14. Hirsh-Pasek, K., Golinkoff, R.M.: The Origins of Grammar: Evidence from Early Language Comprehension. MIT Press, Cambridge (1996)
15. Hollich, G., Hirsh-Pasek, K., Tucker, M.L., Golinkoff, R.M.: The change is afoot: Emergentist thinking in language acquisition. In: Anderson, C., Finnemann, N.O.E., Christiansen, P.V. (eds.) Downward Causation, pp. 143–178. Aarhus University Press (2000)
16. Hood, B.M., Willen, J.D., Driver, J.: Adult’s eyes trigger shifts of visual attention in human infants. Psychological Science 9, 131–134 (1998)
17. Itti, L., Koch, C., Niebur, E.: A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 20(11), 1254–1259 (1998)

18. Johnson, M.H., Morton, J.: Biology and Cognitive Development: The Case of Face Recognition. Blackwell Scientific Publications, Oxford (1991)
19. Kaplan, F., Hafner, V.V.: The challenges of joint attention. In: Proceedings of the Fourth International Workshop on Epigenetic Robotics, pp. 67–74 (2004)
20. Kaye, K.: Toward the origin of dialogue. In: Schaffer, H.R. (ed.) Studies in Mother-Infant Interaction, pp. 89–119. Academic Press, London (1977)
21. Kuhl, P.: Is speech learning gated by the social brain? Developmental Science 10(1), 110–120 (2007)
22. Kuhl, P.K.: Human adults and human infants show a ‘perceptual magnet effect’ for prototypes of speech categories, monkeys do not. Percept. Psychophys. 50, 93–107 (1991)
23. Lemon, O., Pietquin, O.: Machine learning for spoken dialog systems. In: INTERSPEECH 2007, Antwerp, Belgium, pp. 2685–2688 (2007)
24. Li, S.: Multi-modal Interaction Management for a Robot Companion. PhD thesis, Bielefeld University, Bielefeld (2007)
25. Masataka, N.: The Onset of Language. Cambridge University Press, Cambridge (2003)
26. McCartney, J.S., Panneton, R.: Four-month-olds’ discrimination of voice changes in multimodal displays as a function of discrimination protocol. Infancy 7(2), 163–182 (2005)
27. Moore, C., Angelopoulos, M., Bennett, P.: The role of movement in the development of joint visual attention. Infant Behavior and Development 2, 109–129 (1997)
28. Nagai, Y.: Joint attention development in infant-like robot based on head movement imitation. In: Proc. Int. Symposium on Imitation in Animals and Artifacts (AISB 2005), pp. 87–96 (2005)
29. Nagai, Y., Hosoda, K., Asada, M.: Joint attention emerges through bootstrap learning. In: Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS 2003), pp. 168–173 (2003)
30. Nagai, Y., Rohlfing, K.J.: Computational analysis of motionese: What can infants learn from parental actions? In: Proc. Int. Conf. on Infant Studies (ICIS 2008) (March 2008)
31. Posner, M.I. (ed.): Cognitive Neuroscience of Attention. Guilford, New York (2004)
32. Rohlfing, K.J.: Cognitive foundations. In: Rickheit, G., Strohner, H. (eds.) Handbook of Applied Linguistics: Communicative Competence of the Individual, vol. 1. Mouton de Gruyter, Berlin (2008)
33. Scassellati, B.: Imitation and mechanisms of shared attention: A developmental structure for building social skills. In: Proc. Autonomous Agents 1998 Workshop ‘Agents in Interaction – Acquiring Competence through Imitation’, Minneapolis, MN (August 1998)
34. Shic, F., Scassellati, B.: A behavioral analysis of computational models of visual attention. Int. Journal of Computer Vision 73(2), 159–177 (2007)
35. Sternberg, R.J.: Cognitive Psychology. Wadsworth Publishing (2005)
36. Sun, R.: Desiderata for cognitive architectures. Philosophical Psychology 17(3), 341–373 (2004)
37. Watson, J.S.: Smiling, cooing, and ”the game”. Merrill-Palmer Quarterly 18(4), 323–339 (1972)
38. Ziemke, T.: What’s that thing called embodiment? In: Proceedings of the 25th Annual Meeting of the Cognitive Science Society (July/August 2003)
39. Zukow-Goldring, P.: Assisted imitation: Affordances, effectivities, and the mirror system in early language development. In: Arbib, M.A. (ed.) Action to Language via the Mirror Neuron System. Cambridge University Press, Cambridge (2006)
