Cogn Process (2012) 13 (Suppl 1):S113–S116 DOI 10.1007/s10339-012-0471-y
SHORT REPORT
Modeling body state-dependent multisensory integration

Martin V. Butz · Anna Belardinelli · Stephan Ehrenfeld
Published online: 18 July 2012
© Marta Olivetti Belardinelli and Springer-Verlag 2012
Abstract The brain often integrates multisensory sources of information in a way that is close to optimal according to Bayesian principles. Since sensory modalities are grounded in different, body-relative frames of reference, multisensory integration requires accurate transformations of information. We have shown experimentally, for example, that a rotating tactile stimulus on the palm of the right hand can influence the judgment of ambiguously rotating visual displays. Most significantly, this influence depended on the palm orientation: when facing upwards, a clockwise rotation on the palm yielded a clockwise visual judgment bias; when facing downwards, the same clockwise rotation yielded a counterclockwise bias. Thus, tactile rotation cues biased visual rotation judgment in a head-centered reference frame. Recently, we have generated a modular, multimodal arm model that is able to mimic aspects of such experiments. The model co-represents the state of an arm in several modalities, including a proprioceptive, joint-angle modality as well as head-centered orientation and location modalities. Each modality represents each limb or joint separately. Sensory information from the different modalities is exchanged via local forward and inverse kinematic mappings. Also, re-afferent sensory feedback is anticipated and integrated via Kalman filtering. Information across modalities is integrated probabilistically via Bayesian plausibility estimates, continuously maintaining a consistent global arm state estimation. This architecture is thus able to model the described effect of posture-dependent motion cue integration: tactile and proprioceptive sensory information may yield top-down biases on visual processing. Equally, such information may influence top-down visual attention, expecting particular arm-dependent motion patterns. Current research implements such effects on visual processing and attention.

Keywords Sensorimotor integration · Multi-modular arm model · Top-down visual attention

M. V. Butz (✉) · A. Belardinelli · S. Ehrenfeld
Department of Computer Science, University of Tübingen, Tübingen, Germany
e-mail: [email protected]

M. V. Butz
Department of Psychology, University of Tübingen, Tübingen, Germany
Introduction

An increasing body of research indicates that the brain does not represent the environment for its own sake, but rather "pro-presents" it for effective, goal-oriented body control. In fact, the brain autonomously learns to encode the body and objects with modular population codes in various frames of reference. Mainly in the parietal cortex, peripersonal spaces, that is, body surface-relative encodings, can be found, which, for example, encode body-relative object locations (Holmes and Spence 2004). In motor cortical areas, body representations are typically posture-encoded, while in adjacent regions also ethologically relevant, goal-oriented behavioral encodings can be found (Graziano 2006). Such "pro-presentations" may be termed sensorimotor correlation codes, encoding motor-dependent changes of sensory stimuli not only for forward predictions but also for the effective inverse initiation and control of behavior (Wolpert and Kawato 1998). The ideomotor principle proposes how these sensory-motor codes may be learned: starting with random motor babbling, learning and behavior soon become increasingly
goal-driven by curiosity and other motivational mechanisms (Herbart 1825; Stock and Stock 2004). To organize the large number of sensory-motor codes for flexible, goal-directed behavior control, the brain tends to strongly modularize the encodings in interactive and hierarchical networks. These are grounded in particular sensory modalities, which are then integrated and converted into multisensory and sensorimotor correlation codes and finally into motor output signals.

To exchange and integrate information across modules and modalities, our brain thus inevitably needs a mechanism apt to exchange and integrate different sensory sources of information, each of which may be encoded in a different reference frame. Assuming that sensory cue integration occurs based on Bayesian principles (Doya et al. 2007), transformations across sensory modalities must take into account the body-relative configuration of the involved modalities. For example, it has been known for almost 20 years that tactile information is transformed into a head-centered frame of reference, so that the same figures or letters drawn on different body surfaces can be perceived as straight or reversed depending both on the position of the stimulated body part and on the surface orientation with respect to "embodied head axes" (Oldfield and Phillips 1983; Sekiyama 1991). Thus, tactile information for letter perception is transformed into a visual frame of reference. In general, it appears that cue integration is strongly dependent on the current body posture and the consequent relative sensor orientations. This insight can be critical for modularly modeling limbs and sensors that continuously maintain a state-dependent representation of the body in a biologically plausible and robust way. In this paper, we characterize such interactions, propose how to model them, and consider their behavioral and cognitive implications.

In our work, we have shown that tactile motion stimuli on the palm of the hand can bias visual motion judgments depending on the positioning of the hand, in a highly versatile way and fully sub-consciously. Meanwhile, we have developed a modular arm model, which exchanges information probabilistically and interactively. The model integrates different sources of sensory information across modalities, thus yielding high robustness against sensor failures and noise. In the future, we plan to also integrate motion feature maps into a camera-centered model of visual attention, so as to provide a further modality for self-motion estimation and, at the same time, for selectively focusing on it or on the scene.
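To make this integration principle concrete, the following minimal sketch (in Python) fuses a visual and a tactile rotation cue by precision weighting, after first remapping the tactile cue from a hand-centered into a head-centered frame according to palm orientation. The function names and noise values are our own illustrative choices, not part of any published model.

def tactile_to_head_frame(rotation_hand, palm_up):
    # A palm facing downwards reverses the apparent rotation direction in a
    # head-centered frame (clockwise on the skin appears counterclockwise
    # from above); a palm facing upwards leaves it unchanged.
    return rotation_hand if palm_up else -rotation_hand

def fuse_gaussian_cues(mu_vis, var_vis, mu_tac, var_tac):
    # Precision-weighted (Bayes-optimal) fusion of two Gaussian cues.
    w_vis = (1.0 / var_vis) / (1.0 / var_vis + 1.0 / var_tac)
    mu = w_vis * mu_vis + (1.0 - w_vis) * mu_tac
    var = 1.0 / (1.0 / var_vis + 1.0 / var_tac)
    return mu, var

# Ambiguous visual rotation (mean 0) fused with a clockwise tactile cue (+1):
for palm_up in (True, False):
    mu_tac = tactile_to_head_frame(+1.0, palm_up)
    mu, _ = fuse_gaussian_cues(0.0, 1.0, mu_tac, 4.0)
    print("palm up" if palm_up else "palm down", round(mu, 2))

With an ambiguous visual cue, the fused estimate is biased clockwise when the palm faces upwards and counterclockwise when it faces downwards, mirroring the assimilative and reversed biases described in the next section.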
Posture-dependent remapping of motion

Motion perception and estimation are crucial for detecting and understanding the events that unfold around our body
and in interaction with our body. While investigations of how tactile stimuli interact with other sensory modalities have been conducted with various approaches (Holmes and Spence 2004), the relevance of the position and orientation of the stimulated body part has only been investigated to a limited extent in the past. Three experiments were conducted to assess whether tactile motion can influence visual motion perception, how this interaction depends on the orientation of the stimulated body surface, and how this integration unfolds in time (Butz et al. 2010). Participants experienced a tactile rotating stimulus on the right hand and monitored whether the stimulus switched direction during a trial. Concurrently, participants had to judge the rotation direction of a visual motion display.

In the first experiment, tactile stimulation was administered on the palm or the back of the hand. Analyses of the proportion of clockwise rotations reported in visually ambiguous trials showed that tactile and visual cues integrated, with the former biasing the perception of the latter when the palm was stimulated. In the second experiment, the palm of the hand was always stimulated, but it was oriented either upwards or downwards (Fig. 1, left). Consequently, the same hand-relative tactile stimulus appeared clockwise or counterclockwise in a head-centered frame of reference. Results indeed showed an assimilative visual bias when the hand was stimulated from above and a reversed bias when it was stimulated from below. Finally, in the third experiment, the time of tactile stimulation onset was varied. The same bias was observed only when tactile stimulation was applied during the display of visual motion, but not when it was applied only afterwards. Moreover, when the tactile rotation changed direction during the exposure, the direction applied at visual onset determined the bias. Thus, motion cue integration occurred in a very immediate way. The results thus show that tactile motion integrated with visual motion depending on body posture.
Redundant mapping in a modular arm model

Computational models of motor planning propose modular and distributed representations of body limbs (Butz et al. 2007; Vaughan et al. 2006). Redundant sensory readings, such as proprioception and vision, are grounded in multiple frames of reference. Humans maintain internal statistical representations of both sensory uncertainty and prior knowledge about a task (Körding and Wolpert 2004). We devised a modular modality frames model (MMF), which is able to integrate and maintain sensory and anticipatory information across frames of reference in a kinematic arm chain by means of Kalman filtering and
local forward and inverse mappings (Ehrenfeld and Butz 2011). The hierarchical architecture relies on different spatial representations for each limb in different modality frames: a global location space, a global orientation space, a local orientation space, and a local posture space (Fig. 1, right). Global spaces share origin and orientation for every limb (upper arm, forearm, and hand), while local spaces depend on the orientation of the previous limb. Thus, the model considers a system composed of twelve modular modality frames (three limbs, each represented in four modalities) describing the state of an arm with up to nine degrees of freedom (DOF).

Fig. 1 Left: setup of the second experiment, with tactile stimulation administered on the palm facing upwards or downwards. Right: dependency graph for the transformations from posture to location (solid) and vice versa (dashed), relating the local posture (LP), local orientation (LO), global orientation (GO), and global end-point location (GL) frames at the shoulder, elbow, and wrist.

To cope probabilistically with inaccurate movements and noisy sensory readings, different sources of information are combined iteratively. First, information from each modular modality frame is combined. Second, movement information is added in each frame. Finally, sensory measurements are included. Information combination is accomplished by a principled Bayesian approach based on individual probabilistic Gaussian state estimates. The transformations of information are realized by means of hard-coded forward and inverse kinematic mappings and their derivatives. For the combination of sensory information with the predicted state estimations, Kalman-filtering principles are applied.

Tests were carried out for goal-directed behavior in simulation. The results confirmed high noise robustness, the continuous maintenance of a consistent arm state estimation, and the flexible and adaptive identification of faulty or unexpectedly noisy information due to the available redundant sensory information. While the MMF model thus flexibly integrates information from different modalities, motion cue integration has not been considered so far.
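The combination steps can be illustrated with a minimal one-dimensional sketch in Python. The class and function names are our own and the noise values are arbitrary, so this is an illustration of the principle rather than the published implementation: redundant estimates already transformed into a common frame are fused by precision weighting, an efference-copy-based prediction is added, and a noisy sensory measurement is then incorporated via a standard Kalman update.

from dataclasses import dataclass

@dataclass
class Gaussian:
    mu: float    # state estimate (e.g., a joint angle or limb orientation)
    var: float   # its uncertainty (variance)

def fuse(estimates):
    # Step 1: combine redundant estimates from different modality frames
    # (already transformed into a common frame) by precision weighting.
    precision = sum(1.0 / e.var for e in estimates)
    mu = sum(e.mu / e.var for e in estimates) / precision
    return Gaussian(mu, 1.0 / precision)

def predict(state, motor_delta, process_noise):
    # Step 2: add movement information (efference copy) to the estimate.
    return Gaussian(state.mu + motor_delta, state.var + process_noise)

def update(state, measurement):
    # Step 3: Kalman-filter update with a noisy sensory measurement.
    gain = state.var / (state.var + measurement.var)
    mu = state.mu + gain * (measurement.mu - state.mu)
    var = (1.0 - gain) * state.var
    return Gaussian(mu, var)

# Example: two frames report a shoulder angle; one is unexpectedly noisy.
prior = fuse([Gaussian(0.50, 0.01), Gaussian(0.80, 0.50)])
predicted = predict(prior, motor_delta=0.10, process_noise=0.005)
posterior = update(predicted, Gaussian(0.62, 0.02))
print(round(posterior.mu, 3), round(posterior.var, 4))

Because the fusion is precision-weighted, an unexpectedly noisy frame contributes little to the combined estimate, which is the basis of the fault identification mentioned above.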
Attention for visual and proprioceptive motion cue integration

Merging proprioceptive and visual information constitutes a crucial step for actively learning sensorimotor
contingencies as well as for visual servoing of robotic manipulators. Some approaches have exploited the mutual information or correlation between visual and motor information to let a robot learn to detect its own arm without knowledge of the arm's kinematics or appearance (Kemp and Edsinger 2006; Saegusa et al. 2010). Conversely, if the appearance of the arm is assumed to be known, the arm can be localized and tracked to improve arm state estimations. This was done recently by using a virtual representation of the arm for iteratively aligning its posture estimation based on comparisons between the visually rendered pose and its actual appearance in the camera image (Gratal et al. 2011). This approach is particularly interesting because the 3D CAD model could also be used to simulate tactile stimulations and their posture-dependent appearance in a visual frame of reference, thus modeling the motion cue integration discussed above. Nevertheless, these applications must rely on attentional selection of the region where the effector of interest is to be found. With respect to own-body perception, this may happen completely bottom-up: the attentive module encodes salient motion correlating with executed arm movements. In the servoing scenario, top-down attention locates the limb to be tracked with respect to the target.

An object-based saliency map was obtained by spatiotemporal filtering in Wischnewski et al. (2010) for target-driven visual search in dynamic scenes. Motion features are encoded as energy features, extracted by filtering a video sequence with a Gabor filter bank along two spatiotemporal dimensions (horizontally and vertically in time). This corresponds to computing the responses of Reichardt-like motion detectors, akin to motion-sensitive cells in V1 (Adelson and Bergen 1985). Directional feature maps are obtained by selecting filters relative to coherent motion (diagonally oriented) and computing features relating to rightward, leftward, downward, or upward motion. Bottom-up saliency is computed with a center-surround mechanism that enhances locations moving differently from their surround, while promoting maps entailing few local maxima. Proto-object candidates are subsequently extracted by clustering regions of consistent motion energy and direction. Top-down biases may be induced by transforming effector motion estimations from the global location and orientation spaces of the MMF model into a camera view space, promoting expected motion directions and scales in the saliency map. While arm motion itself may consequently be filtered out of the saliency map, additional motion cues will elicit attention. The cue integration observed in the psychological experiments can then be accomplished via Bayesian decision theory.
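The intended top-down bias can be sketched as follows (in Python). This is our own simplification, not the cited implementation: the function names and top-down gains are hypothetical, directional motion-energy maps are assumed to have been computed already, and a plain difference-of-Gaussians stands in for the center-surround operator.

import numpy as np
from scipy.ndimage import gaussian_filter

def center_surround(feature_map, sigma_center=2.0, sigma_surround=8.0):
    # Difference-of-Gaussians approximation of a center-surround operator:
    # locations whose motion energy differs from their surround are enhanced.
    cs = gaussian_filter(feature_map, sigma_center) - gaussian_filter(feature_map, sigma_surround)
    return np.clip(cs, 0.0, None)

def motion_saliency(energy_maps, topdown_gains=None):
    # energy_maps: direction -> 2D motion-energy map ("right", "left", ...).
    # topdown_gains: direction -> gain, e.g., derived from the expected arm
    # motion direction projected from the MMF model into the camera view.
    saliency = np.zeros_like(next(iter(energy_maps.values())))
    for direction, energy in energy_maps.items():
        gain = 1.0 if topdown_gains is None else topdown_gains.get(direction, 1.0)
        saliency += gain * center_surround(energy)
    return saliency

# Toy example: a rightward-moving patch. Down-weighting the direction of the
# expected self-generated arm motion largely removes it from the saliency map.
maps = {d: np.zeros((64, 64)) for d in ("right", "left", "up", "down")}
maps["right"][30:34, 30:40] = 1.0
print(motion_saliency(maps).sum())                  # bottom-up only
print(motion_saliency(maps, {"right": 0.1}).sum())  # with top-down bias

Down-weighting the direction that matches the expected self-generated arm motion suppresses the corresponding saliency, whereas unexpected motion remains salient.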
Conclusions

For modeling posture-dependent multisensory integration, modular modality frame representations of the body are necessary. Moreover, motion cue information needs to be transferable based on approximations of the body surfaces. Finally, attention is necessary to accomplish selective cue integration. We believe that the proposed model will not only offer advancements in robotics, but will also generate interesting predictions on how the brain processes multisensory information, consequently modeling the mentioned cue integration phenomena as well as hypothesizing many other testable posture-dependent multisensory and sensorimotor interactions.

Conflict of interest This supplement was not sponsored by outside commercial interests. It was funded entirely by ECONA, Via dei Marsi, 78, 00185 Roma, Italy.
References

Adelson EH, Bergen JR (1985) Spatiotemporal energy models for the perception of motion. J Opt Soc Am A 2(2):284–299
Butz MV, Herbort O, Hoffmann J (2007) Exploiting redundancy for flexible behavior: unsupervised learning in a modular sensorimotor control architecture. Psychol Rev 114:1015–1046
Butz MV, Thomaschke R, Linhardt MJ, Herbort O (2010) Remapping motion across modalities: tactile rotations influence visual motion judgments. Exp Brain Res 207:1–11
Doya K, Ishii S, Pouget A, Rao RPN (2007) Bayesian brain: probabilistic approaches to neural coding. The MIT Press, Cambridge
Ehrenfeld S, Butz MV (2011) A modular, redundant, multi-frame of reference representation for kinematic chains. In: Proceedings of the IEEE ICRA 2011, pp 141–147
Gratal X, Romero J, Kragic D (2011) Virtual visual servoing for real-time robot pose estimation. In: Proceedings of the 18th IFAC world congress
Graziano MSA (2006) The organization of behavioral repertoire in motor cortex. Annu Rev Neurosci 29:105–134
Herbart JF (1825) Psychologie als Wissenschaft neu gegründet auf Erfahrung, Metaphysik und Mathematik. Zweiter, analytischer Teil. August Wilhelm Unzer, Königsberg
Holmes NP, Spence C (2004) The body schema and multisensory representation(s) of peripersonal space. Cogn Process 5:94–105
Kemp CC, Edsinger A (2006) What can I control?: the development of visual categories for a robot's body and the world that it influences. In: Proceedings of the fifth international conference on development and learning
Körding KP, Wolpert DM (2004) Bayesian integration in sensorimotor learning. Nature 427:244–247
Oldfield SR, Phillips JR (1983) The spatial characteristics of tactile form perception. Perception 12:615–626
Saegusa R, Metta G, Sandini G (2010) Own body perception based on visuomotor correlation. In: Proceedings of the IEEE IROS 2010, pp 1044–1051
Sekiyama K (1991) Importance of head axes in perception of cutaneous patterns drawn on vertical body surfaces. Percept Psychophys 49:481–492
Stock A, Stock C (2004) A short history of ideo-motor action. Psychol Res 68:176–188
Vaughan J, Rosenbaum DA, Meulenbroek RGJ (2006) Modeling reaching and manipulating in 2- and 3-D workspaces: the posture-based model. In: Proceedings of the fifth international conference on learning and development, pp 1–6
Wischnewski M, Belardinelli A, Schneider WX, Steil JJ (2010) Where to look next? Combining static and dynamic proto-objects in a TVA-based model of visual attention. Cogn Comput 2(4):326–343
Wolpert DM, Kawato M (1998) Multiple paired forward and inverse models for motor control. Neural Netw 11:1317–1329