30 Optimal Multisensory Integration: Assumptions and Limits

Marc O. Ernst

The brain receives more or less noisy information about the environment from all the different sensory modalities, including vision, touch, and audition. To efficiently interact with the environment, this noisy information must eventually converge in the brain in order to form a reliable and accurate multimodal percept. Several recent studies have shown that multisensory estimates from within and across the senses are integrated in a statistically optimal fashion (Alais & Burr, 2004; Ernst & Banks, 2002; Ghahramani, Wolpert, & Jordan, 1997; Helbig & Ernst, 2007; Hillis, Watt, Landy, & Banks, 2004; Knill & Saunders, 2003). That is, multisensory information is weighted according to its reliability (inverse variance), leading to an integrated percept that is more reliable (i.e., less noisy) than each of its constituents. This model of optimal integration (Cochran, 1937) makes three major modeling assumptions: (1) the individual perceptual estimates are unbiased; that is, they veridically represent the physical property to be estimated; (2) the noise distributions of the individual perceptual estimates are independent; and (3) the noise distributions underlying the perceptual estimates are Gaussian. In this chapter, I discuss in what way relaxing each of these assumptions leads to interesting predictions about multisensory perception. (1) If the sensory estimates may be biased (i.e., the estimates may be inaccurate), the brain should take this into account by forming an estimate of the bias. Doing so leads to a weaker form of coupling between the signals and, ultimately, to their independence. (2) If the noise distributions are no longer independent, we show that the benefit of integration, which is a reduction in variance, can be diminished. Furthermore, relaxing the independence assumption may lead to a situation in which the weights assigned to the sensory estimates may even become negative. (3) If the probability distributions describing the integration process are no longer assumed to be strictly Gaussian, one can model not only integration but also the breakdown of integration with increasing discrepancy between the sensory signals.

Optimal Integration of Multisensory Information

For estimating a specific environmental property, such as the size of an object in the world, SW, there are often multiple sources of sensory information available. For example, an object's size can be estimated from sensory signals that are derived from vision and touch, SV and SH. Models of optimal sensory integration typically assume that these signals are unbiased (accurate) (i.e., SV = SH), with normally distributed noise sources that are independent, a situation in which sensory integration is beneficial (Landy, Maloney, Johnston, & Young, 1995). Figure 30.1 illustrates the optimal mechanism of sensory combination given these assumptions and given that the goal is to compute a minimum-variance estimate. The likelihood functions represent two independent estimates of size, the visual size estimate ŜV and the haptic size estimate ŜH, based on sensory measurements z = (zV, zH) that are corrupted by noise ε = (εV, εH) with standard deviations σV and σH (i.e., zi = Si + εi, where the index i refers to the different sensory modalities, V and H). Given this model, the integrated multisensory estimate ŜVH is a weighted average of the individual sensory estimates with weights wV and wH that sum to unity (Cochran, 1937):

$$\hat{S}_{VH} = w_V \hat{S}_V + w_H \hat{S}_H, \quad \text{where } w_V + w_H = 1 \qquad (30.1)$$

To achieve optimal performance, the chosen weights need to be proportional to the reliability (precision) r, which is defined as the inverse of the signal variance:

$$w_j = \frac{r_j}{\sum_i r_i}, \quad \text{with } r_i = \frac{1}{\sigma_i^2} \qquad (30.2)$$

The signal that provides more reliable information in a given situation is given a higher weight and so has a greater influence on the final percept. In the example shown in figure 30.1, visual information about the size of the object is four times more reliable than the haptic information.

Figure 30.1  Illustration of the likelihood functions of the individual visual and haptic size estimates ŜV and ŜH and the combined visual-haptic size estimate ŜVH, which is a weighted average according to equation 30.1. The variance associated with the visual-haptic distribution is less than that of either of the two individual estimates (equation 30.3). (Adapted from Ernst & Banks, 2002.)

Therefore, the combined estimate (the weighted sum) is "closer" to the visual estimate than to the haptic one (in the present example the visual weight is 0.8 according to equation 30.2). In other circumstances, where the haptic modality might provide the more reliable estimate, the situation would be reversed. Given this weighting scheme, the benefit of integration is that the variance of the combined estimate from vision and touch is less than that of either of the individual estimates that are fed into the integration process. Therefore, the combined estimate arising from integration of multiple redundant sources of independent information shows greater reliability and diminished effects of noise. Mathematically, this is expressed by the combined reliability r being the sum of the individual reliabilities:

$$r = \sum_i r_i \qquad (30.3)$$
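To make the weighting scheme of equations 30.1 to 30.3 concrete, here is a minimal numerical sketch in Python/NumPy. The particular estimates and noise levels are illustrative assumptions rather than values from any experiment, but the choice σV:σH = 1:2 reproduces the 4:1 reliability ratio and the visual weight of 0.8 from the example of figure 30.1.

```python
import numpy as np

def fuse(estimates, sigmas):
    """Reliability-weighted average of unbiased, independent estimates
    (equations 30.1-30.3): weights are proportional to the reliabilities
    r_i = 1/sigma_i^2, and the combined reliability is their sum."""
    estimates = np.asarray(estimates, dtype=float)
    r = 1.0 / np.asarray(sigmas, dtype=float) ** 2  # reliabilities (eq. 30.2)
    w = r / r.sum()                                 # normalized weights
    s_hat = w @ estimates                           # weighted average (eq. 30.1)
    sigma_vh = np.sqrt(1.0 / r.sum())               # combined noise (eq. 30.3)
    return s_hat, w, sigma_vh

# Illustrative visual and haptic size estimates (arbitrary units):
s_hat, w, sigma_vh = fuse(estimates=[5.0, 5.6], sigmas=[1.0, 2.0])
print(w)         # [0.8 0.2] -- vision is four times more reliable
print(s_hat)     # 5.12 -- "closer" to the visual estimate
print(sigma_vh)  # ~0.894 -- less noisy than either estimate alone
```

The combined standard deviation (about 0.89) is smaller than that of the better of the two estimates alone, which is precisely the benefit of integration described above.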

Given the assumptions stated above—all redundant estimates are unbiased, and their noise sources are Gaussian distributed and independent—this integration scheme can be considered statistically optimal because it provides the lowest possible variance of the combined estimate. Thus, this form of sensory combination is the best way to reduce uncertainty. Many empirical studies have recently tested different aspects of this integration scheme (e.g., Bresciani & Ernst, 2007; Drewing & Ernst, 2006; Ghahramani, Wolpert, & Jordan, 1997; van Beers, Sittig, & Denier van der Gon, 1998, 1999). Ernst and Banks (2002) demonstrated that humans actually integrate sensory information derived from vision and touch in such a statistically optimal fashion. It has further been shown that this finding of optimality also holds across and within other sensory modalities, for example, vision and audition (e.g., Alais & Burr, 2004), perspective and binocular disparity for visual slant (Hillis, Watt, Landy, & Banks, 2004; Knill & Saunders, 2003), or different texture cues to the spatial location of an image boundary (Landy & Kojima, 2001). Thus, weighted averaging of sensory information appears to be a general strategy employed by the perceptual system to decrease the detrimental effects of noise present in the sensory estimates.

Despite the success of this relatively simple model, there are several experimental findings that do not conform to the model predictions (e.g., Bresciani, Dammeier, & Ernst, 2006; Roach, Heron, & McGraw, 2006; Rosas, Wagemans, Ernst, & Wichmann, 2005; Shams, Ma, & Beierholm, 2005). A likely reason for this failure is that some of the modeling assumptions are not fulfilled. In the following I discuss the consequences of relaxing each of the three major modeling assumptions—(1) the redundant sensory signals are all unbiased, (2) their noise distributions are independent, and (3) their noises are Gaussian distributed—and I show that relaxing these assumptions leads to interesting predictions, most of which have not yet been rigorously tested in behavioral experiments.

Potentially Biased Estimators

To start, I will discuss some of the consequences for multisensory integration that can occur when the assumption of unbiased sensory signals is relaxed, that is, when the signals are allowed to be inaccurate as well. Formally, we may define a biased sensory signal Si as a signal that is derived from the world attribute SWi within a sensory channel i and that contains some added bias Bi: Si = SWi + Bi. For simplicity we here consider additive biases only, and we assume that the world attribute as well as the current bias is unknown to the perceptual system, so they have to be inferred. There are many situations that may lead to biases in sensory signals, that is, that may cause the sensory signals to be inaccurate. Examples of sources of inaccuracies in signals are muscle fatigue, variance in grip posture when touching an object, or effects of temperature or humidity that affect sound propagation and thus auditory estimates, to name just a few. Other examples include estimates that are derived from noisy sensory signals through integration, such as position estimates based on information from the vestibular organ, which senses only acceleration. Integrating noisy acceleration signals is likely to introduce biases, which can take the form of a random walk. Figure 30.2 illustrates some examples of processes that might affect the accuracy of visual and haptic size signals.

Figure 30.2  Examples of potential biases in visual-haptic size signals caused by variation in grip posture, muscle fatigue, or variations in the fitting of glasses (left). Schematic illustration of the distribution of unbiased and of potentially biased visual-haptic signals (right).

The top left panel shows sensory signals (SV and SH) that are accurate with respect to the world attribute SW; that is, they contain no bias. The other panels show three examples of SV and SH signals that are inaccurate and that contain an additional bias B, such as muscle fatigue and the variation in grip posture, which may affect the haptic size estimates, or slight changes in the fitting of glasses, which may have noticeable effects on the visual size estimates.

Whether an isolated sensory signal is biased or not cannot be known from just observing the perceptual outcome of this one signal. In other words, a sensory signal does not carry information about the bias it may contain. This is in contrast to the precision of a sensory signal (i.e., its reliability), which may be determined online by, for example, averaging over multiple detectors using a neural population code (e.g., Ma, Beck, Latham, & Pouget, 2006; Pouget, Deneve, & Duhamel, 2002). This would provide an average measurement zi with variance σi². Alternatively, averaging the signal over a short period of time and observing the signal fluctuations (assuming the sensed world property stays constant or varies only predictably) would also provide the perceptual system with a measurement zi together with its variance. This is like the speedometer needle of an old VW Beetle that jitters around some average value: from the jitter one can easily tell the mean and the precision of the speed measurement provided by the speedometer. However, whether the speedometer signal is inaccurate is not apparent from the measurement itself. That is, the signal may unknowingly contain a bias (e.g., due to the car having smaller or bigger wheels compared to when it was calibrated) and thus may be off by some amount. Even though it is impossible to determine the bias from the observation of a single, isolated signal, the bias may become apparent as the manifestation of an error when one tries to use the signal for interaction with the environment (e.g., in the example of the VW Beetle, you may arrive too early or too late at a meeting even though you were constantly driving 50 km/hr according to the speedometer reading and should have been on time). The bias may also become apparent when one compares estimates of two signals against each other that are derived from the same object or event SW, that is, that are redundant. That is, if two VW Beetles go 50 km/hr according to their speedometer readings, but one is slightly faster than the other, one may infer that at least one of the two speedometer signals is inaccurate (which of the two is off, however, remains unknown). Such a situation of comparing estimates of two signals exists when one is trying to integrate two redundant estimates that are derived from the same object or event. In what follows I discuss how best to integrate two such estimates assuming that the signals to be measured may potentially be biased while at the same time they contain noise and are thus imprecise. The determination of a discrepancy between estimates of two redundant signals caused by some bias in either of the two signals is complicated by the existence of noise in the estimates. That is, when one compares two noisy estimates, there will always be a slight discrepancy between them, even when there is no bias present, because noisy estimates will never agree completely.
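The speedometer example can be made concrete: a signal's precision is recoverable from its own fluctuations, but its bias is not. The sketch below uses a standard online mean/variance tracker (Welford's algorithm), offered here as one possible implementation with made-up numbers, not as a claim about the brain's mechanism.

```python
import numpy as np

class OnlineEstimate:
    """Running mean and variance of a jittering measurement
    (Welford's algorithm)."""

    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, z):
        self.n += 1
        delta = z - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (z - self.mean)

    @property
    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")

rng = np.random.default_rng(0)
est = OnlineEstimate()
# True speed 50 km/hr plus an (undetectable) +3 km/hr calibration bias,
# with measurement noise of standard deviation 2 km/hr (made-up values):
for z in 53.0 + 2.0 * rng.standard_normal(500):
    est.update(z)
print(est.mean, est.variance)
# ~53 and ~4: the precision (variance) is recoverable from the jitter,
# but nothing in the signal itself reveals the 3 km/hr bias.
```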

Therefore, if such a discrepancy between noisy estimates is detected, there will always be uncertainty as to whether this discrepancy is caused by noise (i.e., imprecision) or by a bias in the signals (i.e., inaccuracy). If the discrepancy were due to noise, the perceptual system may want to benefit from integration, because this way some of the noise can be averaged out, leading to more precise perceptual estimates. In contrast, if the discrepancy were due to inaccuracy, integrating the estimates by forming a weighted average would induce a cost, because the integrated perceptual estimate would be biased as well. Thus, determining the cause of the discrepancy is a classical credit-assignment problem, and the goal is to best balance the benefit of integration in terms of increased precision against the potential cost of inaccuracy. Because there is uncertainty as to whether the signals and the estimates derived from these signals are inaccurate or imprecise, we have to rely on a probabilistic framework to solve this problem. This is provided by the Bayesian approach. Because the signals may be inaccurate, the ultimate goal is to get an estimate of the bias B̂i contained in the noisy sensory estimate of the signal Ŝi, which can then be subtracted in order to get an estimate of the world property ŜWi = Ŝi − B̂i. As stated above, for a perceptual process that does not involve interactions, this is impossible to achieve from the observation of a single, isolated signal. Therefore, we have to consider the comparison between at least two (redundant) sensory estimates, which leads to a discrepancy estimate D̂MLE = ŜVMLE − ŜHMLE. The index MLE (maximum likelihood estimate) refers to the estimates as inferred directly from the sensory signals. This discrepancy between the signal estimates D̂MLE is the result of both the difference in the biases of the signals and the difference caused by the noise in the sensory estimates. In a first step, the perceptual system has to discount in the best possible way the effect of noise in the signal estimates in order to arrive at a more precise estimate of the sensory signals ŜiMAP and thus at a more precise estimate of the discrepancy between those signals, D̂MAP = ŜVMAP − ŜHMAP = B̂V − B̂H. This can be done in the Bayesian framework by integrating prior knowledge about the statistics of the sensory signals. The index MAP (maximum a posteriori) refers to the signal estimates after this integration process, which combines the sensory estimates as defined by likelihood functions with prior knowledge according to Bayes's rule. Once we have the best possible estimate of the discrepancy due to inaccuracy, in a second step we attribute this discrepancy to the bias contained in one or the other signal. This is yet another credit-assignment problem. In what follows, I further describe this two-step process.

Let me focus on the first step and see how the perceptual system might determine how much of the discrepancy D̂MLE between two sensory estimates ŜiMLE can be attributed to the presence of noise and how much of it might be due to a difference in bias. We assume that a single world property is sensed by vision and touch, SW = (SWV, SWH), such that SWV = SWH. The sensory signals S = (SV, SH) derived from this world property might potentially be biased, B = (BV, BH), such that S = SW + B. Let z = (zV, zH) be the sensory measurements derived from S. Both the visual and haptic measurements are corrupted by independent Gaussian noise with variance σi², so zi = Si + εi. For now we assume the noises to be independent, so the correlation ρ is zero. With these assumptions, the joint likelihood function takes the form of a Gaussian density function:

$$p(z \mid S) = p(z_V, z_H \mid S_V, S_H) = \mathcal{N}(S, \Sigma_z) = \frac{1}{N_1}\, e^{-\frac{1}{2}\left[\frac{(z_V - S_V)^2}{\sigma_V^2} + \frac{(z_H - S_H)^2}{\sigma_H^2}\right]}, \quad \text{with } \Sigma_z = \begin{pmatrix} \sigma_V^2 & 0 \\ 0 & \sigma_H^2 \end{pmatrix}, \qquad (30.4)$$

which is a bivariate normal distribution with mean S and covariance matrix Σz (left column in figure 30.3).

N1 is just a normalization factor. Here the maximum likelihood estimate derived from this distribution, ŜMLE = (ŜVMLE, ŜHMLE), corresponds to its mean. The discrepancy estimate between the visual and haptic estimates, D̂MLE = ŜVMLE − ŜHMLE, contains both noise and bias. In order to resolve this ambiguity between noise and bias, we have to rely on prior knowledge about the probability of co-occurrence of the (biased) signals, p(SV, SH). To begin, we assume that the perceptual system has already acquired a priori knowledge about the probability of jointly encountering a combination of sensory signals S. This knowledge is represented in the prior p(SV, SH). If the signals were all unbiased at all times, and they were always derived from the same object or event such that SWV = SWH—that is, they are redundant—the distribution of joint signals would coincide with the unity line. The prior, reflecting this distribution, would thus be an impulse function aligned with the unity line (figure 30.3, upper row). This can be approximated by:



$$p(S_V, S_H) = \frac{1}{N_2}\, e^{-\frac{1}{2}\frac{(S_V - S_H)^2}{\sigma_x^2}}, \quad \text{with } \sigma_x \to 0 \qquad (30.5)$$

N2 is again just a normalization constant. Integrating the likelihood function with such an impulse prior according to Bayes's rule, p(SV, SH | zV, zH) ∝ p(zV, zH | SV, SH) p(SV, SH), results in a posterior that also resides on the identity line (as the probability of the impulse prior is zero elsewhere).

Figure 30.3  The combination of visual and haptic measurements with different prior distributions. (Left column) Likelihood functions resulting from noise with standard deviation σV twice as large as σH; × marks the maximum-likelihood estimate (MLE) of the sensory signals, ŜMLE = (ŜVMLE, ŜHMLE). (Middle column) Prior distributions aligned with the positive diagonal (i.e., SWV = SWH) with different variances σx². (Top) Impulse prior, σx² = 0. (Middle) Intermediate coupling prior, 0 < σx² < ∞. (Bottom) Flat prior, σx² → ∞. (Right column) Posterior distributions, which are the product of the likelihood and prior distributions according to Bayes's rule. • marks the maximum a posteriori (MAP) estimate, ŜMAP = (ŜVMAP, ŜHMAP). Black arrows correspond to the shift in the MAP estimate relative to the MLE estimate. The orientation α of the arrows corresponds to the weighting of the ŜV and ŜH estimates. The length of the arrows indicates the degree of coupling, from fusion to independence.

The final estimate of the signals is then given by the maximum of the posterior, ŜMAP (the MAP estimate). According to equation 30.2, the weighting of the signals is given by the relative inverse variances σV² and σH² of the sensory estimates ŜMLE. Such a weighting corresponds to the direction in which the MAP estimate is shifted relative to the MLE estimate, which is indicated by the direction α of the arrow in figure 30.3 (Bresciani et al., 2006):

$$\alpha = \arctan\!\left(\frac{\sigma_H^2}{\sigma_V^2}\right) = \arctan\!\left(\frac{w_V}{1 - w_V}\right) \qquad (30.6)$$

Thus, this situation corresponds exactly to what has been discussed earlier as the standard model of integration, which assumed unbiased estimators. We call this case "fusion."
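This first inference step is easy to reproduce numerically. The sketch below evaluates the joint likelihood (equation 30.4) and the coupling prior (equation 30.5) in log space on a grid and reads off the MAP estimate; the measurements, noise levels, and grid are illustrative choices, not values from the chapter. Sweeping σx shows the three regimes of figure 30.3: fusion, coupling, and independence.

```python
import numpy as np

# Grid over the joint (S_V, S_H) size space (illustrative units)
s = np.linspace(0.0, 10.0, 401)
SV, SH = np.meshgrid(s, s, indexing="ij")

def map_estimate(zV, zH, sigV, sigH, sigX):
    """MAP estimate of (S_V, S_H): Gaussian likelihood (eq. 30.4) times
    a coupling prior along the identity line (eq. 30.5)."""
    log_lik = -0.5 * ((zV - SV) ** 2 / sigV ** 2
                      + (zH - SH) ** 2 / sigH ** 2)
    log_prior = -0.5 * (SV - SH) ** 2 / sigX ** 2
    i, j = np.unravel_index(np.argmax(log_lik + log_prior), SV.shape)
    return s[i], s[j]

zV, zH = 4.0, 6.0      # discrepant visual and haptic measurements
sigV, sigH = 1.0, 2.0  # vision four times more reliable than touch

for sigX in (1e-6, 1.0, 1e9):
    print(sigX, map_estimate(zV, zH, sigV, sigH, sigX))
# impulse prior       -> (4.4, 4.4): complete fusion
# intermediate prior  -> ~(4.33, 4.67): coupling, the discrepancy is
#                        only partly resolved
# (nearly) flat prior -> (4.0, 6.0): independence, MAP = MLE

# Direction of the shift (eq. 30.6), independent of sigma_x:
print(np.degrees(np.arctan(sigH ** 2 / sigV ** 2)))  # ~76 degrees
```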

We now turn to the case of potentially inaccurate signals. If the signals are potentially inaccurate (biased), this will be reflected in the prior p(SV, SH) encoding the co-occurrence statistics of the signals. If we assume additive biases for the visual and haptic signals (S = SW + B) with a joint distribution of biases that is again Gaussian (and independent of SW), and we further assume that all the signals are derived from the same world property such that SWV = SWH, the prior will take the same form as already expressed in equation 30.5. However, σx² in this case corresponds to the variance of the joint distribution of biases, which is different from zero. Figure 30.2 provides some examples of visual and haptic size signals (SV, SH) that might be encountered jointly when we try to estimate the size of an object SW. Figure 30.2 also shows the joint distribution of signals that may result from such situations, both for the biased and the unbiased cases. The point here is that the variance in the joint distribution, and hence the variance of the prior learned from these signals, is affected by the variability due to inaccuracy of the two signals: the more likely it is that the signals contain a bias, the higher the variance of the prior.

Since this prior refers to the probability of co-occurrence of certain signals, in earlier work this prior p(SV, SH) has also been referred to as the "coupling prior" (Bresciani et al., 2006; Ernst, 2005, 2007). Accordingly, we call cases relying on this prior "coupling" (figure 30.3, middle row). Realistically, all cases of multisensory integration, such as size estimation from vision and touch, fall into this category (Ernst, 2005; Hillis, Ernst, Banks, & Landy, 2002). This is because there is always some probability that the signals are inaccurate due to external or internal factors, such as muscle fatigue, optical distortion, or other environmental or bodily influences. With such a prior at hand, we can resolve the ambiguity about whether the discrepancy D̂MLE between the estimates is due to noise or bias. The higher the variance of the prior σx², the more likely it is that the discrepancy is caused by biases. This can be formalized again by using Bayes's rule, combining the joint likelihood function obtained from the sensory measurements with the prior knowledge about the co-occurrence statistics of the signals. This gives rise to a final estimate of the sensory signals, ŜMAP = (ŜVMAP, ŜHMAP), based on the posterior distribution. Such a posterior estimate balances the benefit of reduced variance against the cost of a potential bias in the estimate. Note that this step does not yet provide an estimate of the world property SW = (SWV, SWH) or the biases B = (BV, BH). However, it provides the best possible estimate of the sensory signals ŜMAP = (ŜVMAP, ŜHMAP) and of the discrepancy resulting from the bias, D̂MAP = ŜVMAP − ŜHMAP = B̂VMAP − B̂HMAP. How we estimate SW and B is discussed later in this section, after a discussion of some of the properties of this integration process, which accounts for the accuracy as well as the precision of the signals. Due to integration according to Bayes's rule, the posterior estimate ŜMAP is shifted with respect to the likelihood estimate ŜMLE. This shift is highlighted by the arrow in figure 30.3, right column. As previously, the direction α in which the estimate is shifted is solely determined by the variances σV² and σH² of the likelihood function (equation 30.6), independent of the variance σx² of the prior. In contrast, the length L of the arrow, which indicates the strength of the integration, is determined by a weighting of the likelihood function and the prior (conditioned on the line through the MLE and MAP estimates, i.e., in direction α), with the weight:

$$L = \frac{\sigma_{\text{likelihood}}^2(\alpha)}{\sigma_{\text{likelihood}}^2(\alpha) + \sigma_{\text{prior}}^2(\alpha)} \qquad (30.7)$$

If σx² = 0, all the weight is on the prior, because there is absolute certainty that the signals are unbiased and all the discrepancy between the estimates is due to noise. Consequently, the estimates are fused completely (figure 30.3, upper row), which—as noted earlier—corresponds to the standard model of integration where we had assumed unbiased signals.

If there is some probability that the signals may contain a bias, σx² will have a nonzero value, such that 0 < σx² < ∞ (figure 30.3, middle row). The higher the value of σx², the lower the weight of the prior, because the discrepancy between the signals is then more likely caused by a bias than by noise in the estimates. The MAP estimate resulting from this weighting scheme provides the best balance between precision and accuracy of the sensory estimates. The remaining difference in the MAP estimates, D̂MAP = ŜVMAP − ŜHMAP, corresponds to the best current estimate of the actual discrepancy D between the sensory signals. The predictions of this model, regarding both bias and variance, have been confirmed by an experimental study of perceived numerosity of visual and haptic events (Bresciani et al., 2006). In the extreme case when σx² → ∞, the result is a flat prior (figure 30.3, lower row). This indicates that the signals are unrelated; there will be no weight given to the prior, and thus the signals will be kept independent. In this case the MAP estimates based on the posterior are identical to the MLE estimates based on the likelihood function.

In the first stage, just discussed, we described how an observer can use Bayes's rule to calculate a posterior distribution of the sensory signals given the noisy measurements, p(SV, SH | zV, zH), taking into account prior knowledge of the co-occurrence statistics between the signals, p(SV, SH). This provided us with the best possible estimate of the sensory signals ŜMAP and of the discrepancy D̂MAP between them. However, how much of this discrepancy is due to bias in the visual or haptic signals is still unknown. More importantly, we still have no estimate of the world property SW. Thus, I now describe how the observer can use prior knowledge of the likely inaccuracy in each modality, p(BV, BH), along with the current estimate of the discrepancy between the sensory signals (D̂MAP = ŜVMAP − ŜHMAP), to solve this next credit-assignment problem: What portion of D̂MAP can be attributed to a bias in vision, and what portion to a bias in touch? Once we know the bias B̂MAP = (B̂VMAP, B̂HMAP), we can subtract it from the estimate of the signal to arrive at the best estimate of the world property: ŜW = ŜMAP − B̂MAP. Figure 30.4 (plate 33) illustrates this two-stage process. After the signals are integrated in the first stage, as described above, we are left with the discrepancy estimate D̂MAP. If we plot this estimate in a space of visual versus haptic bias, we end up with a diagonal line at distance D̂MAP from the origin (figure 30.4 [plate 33], lower left panel).

Figure 30.4 (plate 33)  Two-step process of estimating the visual and haptic signals, the bias contained in those signals, and the world property ŜW. In the first step the noisy sensory estimates are combined with prior knowledge about the co-occurrence statistics of the signals using Bayes's rule in order to derive an estimate of the sensory signals and of the discrepancy between them (indicated in red). In the second step the ambiguous discrepancy estimate is combined with prior knowledge about the distribution of visual and haptic biases, again using Bayes's rule, in order to get an estimate of the visual and haptic biases that contribute to the discrepancy (indicated in blue). This estimate of bias can then be subtracted from the signal estimate in order to derive an estimate of the world property, ŜW = ŜMAP − B̂MAP (gray arrow).

This line corresponds to all possible combinations of visual and haptic biases that sum to D̂MAP. Thus, this likelihood function represents the ambiguity in the bias estimate. Next we want to know how much of this discrepancy is due to a bias in vision and how much is due to a bias in touch. As there is no information about the bias directly in the signals, we again have to rely on prior knowledge to solve this credit-assignment problem. The prior of choice represents the a priori probability of bias, p(BV, BH), in either of the two sensory signals. This prior may have been learned in the past from interactions with the environment, during which such biases could be detected. We assume this distribution of biases to be centered at zero and to be Gaussian shaped with variances independent of S. This is illustrated in figure 30.4 (plate 33) for a case in which the haptic signals have a higher probability of containing a bias than the visual signals, and thus the variance of the prior along the haptic dimension is greater. Combining this prior with the likelihood function according to Bayes's rule leads to a bias estimate B̂MAP = (B̂VMAP, B̂HMAP) for both the visual and haptic signals. Now that we have an estimate of the signal, ŜMAP, and an estimate of the bias, B̂MAP, we can easily derive the best estimate of the world property by subtracting the bias from the signal estimate (figure 30.4 [plate 33], gray arrow): ŜW = ŜMAP − B̂MAP. This is the best estimate of a world property, such as the size of an object, derived from potentially biased, noisy estimates when we assume the world property is identical in both sensory channels (SWV = SWH). To recalibrate the perceptual system, that is, to adaptively correct for the perceptual distortions due to inaccuracy, the bias estimate may also be used to iteratively update the prior in the following time steps. How the bias estimate may be used to recalibrate the perceptual system is discussed by Ernst and Di Luca (2011).
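Under the Gaussian assumptions used here, this second step has a simple closed form: the constraint BV − BH = D̂MAP defines a line in bias space, and the MAP solution is the point on that line that is most probable under the zero-mean bias prior. A minimal sketch follows; the function and all numbers are illustrative choices of my own, not values from the chapter.

```python
import numpy as np

def attribute_bias(d_map, sigma_bV, sigma_bH):
    """Split the discrepancy D_MAP = B_V - B_H into a visual and a haptic
    bias, given independent zero-mean Gaussian bias priors. The MAP
    solution maximizes the prior subject to B_V - B_H = D_MAP, so the
    modality that is a priori more bias-prone absorbs more of it."""
    vV, vH = sigma_bV ** 2, sigma_bH ** 2
    bV = d_map * vV / (vV + vH)
    bH = -d_map * vH / (vV + vH)
    return bV, bH

s_map = np.array([4.5, 4.3])          # (S_V, S_H) first-stage estimates
d_map = s_map[0] - s_map[1]           # remaining discrepancy: 0.2

# Haptics assumed a priori more bias-prone than vision:
bV, bH = attribute_bias(d_map, sigma_bV=0.5, sigma_bH=1.0)
s_world = s_map - np.array([bV, bH])  # S_W = S_MAP - B_MAP

print(bV, bH)    # 0.04, -0.16: touch absorbs most of the discrepancy
print(s_world)   # [4.46, 4.46]: a single, bias-corrected world estimate
```

Note that the two corrected components agree by construction, since the full discrepancy has been attributed to the biases.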

Estimating a world property from redundant signals that are noisy and potentially inaccurate was the task we set out to model here. However, if the task is slightly altered and observers are asked to report either the visual or the haptic signal instead of the world property, their answers to those two questions may differ. They may answer ŜVMAP when asked to report the visual signal and ŜHMAP when asked for the haptic signal—answers that will differ by D̂MAP. There is ample experimental evidence that observers' answers do indeed differ when they are explicitly asked for one or the other signal (e.g., Bresciani et al., 2006; Ernst, 2005; Shams, Ma, & Beierholm, 2005). However, to date there are no rigorous empirical tests of whether signal accuracy affects the integration process when estimating a world property in the way we have discussed here. A recent study by Girshick and Banks (2009), in which they investigated the integration process while deliberately adding large signal biases, may be interpreted in this way. With small discrepancies they found weighted averaging according to the reliability of the signals. With large discrepancies, however, the integration process breaks down (as discussed in a later section). When subjects are still asked to report a single estimate of the object property, the effects of the signal accuracy should become apparent, because then the estimate should be based more on the relative accuracy and less on the relative reliability of the signals. Interestingly, Girshick and Banks (2009) found that with large discrepancies between the signals the interaction between the estimates could no longer be described by a weighting scheme based on the relative reliability of the signals. Instead, with large discrepancies, apparently random weights were observed. Although the interpretation provided by Girshick and Banks (2009) is slightly different, it may well be that those weights were based on the relative signal accuracy instead of the relative signal reliability. This would be a first hint that the accuracy of the signals affects the perceptual integration process. However, at this point this is still speculative. The race is still on.

The coupling prior described thus far represents the co-occurrence statistics of sensory signals p(SV, SH) that are potentially biased, S = SW + B, assuming SWV = SWH. Thus, there is no uncertainty about the property SW that is estimated, and the signals are assumed to be fully redundant (i.e., SWV = SWH) (see also Ernst & Bülthoff, 2004; Knill & Pouget, 2004; Witten & Knudsen, 2005). Consequently, the variance σx² of the prior is determined only by the joint distribution of biases. However, we may easily use the same framework with a prior that describes the co-occurrence statistics of sensory signals in general, relaxing the assumption that the signals are fully redundant. The prior may then reflect variance due to bias—but also other variations affecting the co-occurrence statistics of the signals. Consider, for example, the size of an object and the pitch of a sound it may produce: the co-occurrence statistics of these two signals show a systematic correlation, because bigger objects usually produce lower-pitched sounds. Yet there is variance in the co-occurrence statistics between those signals, because other factors, such as the density of the material, also affect this correlation. Still, these co-occurrence statistics, when represented in a prior, can be used to improve the joint perception of these signals by reducing the variance of the estimates through integration.

Staying with the example of estimating visual and haptic size, but relaxing the assumption of redundancy (SWV ≠ SWH), we may still find cases in which the visual and haptic signals are correlated. The correlation in the signals in this case results from a correlation of the world properties SWV and SWH, although they are not equal. Consider, for example, a visual and a haptic estimate that are not derived from the same location on the object but from locations within close proximity Δx. Because the world tends to vary smoothly over space (and time), signals derived in close proximity on an object will still be correlated. This is illustrated in figure 30.9 (see below). Again, we may use this correlation to improve perception through integration of the signals with a prior that represents these statistics (see also Ernst & Di Luca, 2011). As the correlation between the signals decreases with the spatial (or temporal) separation from which they are derived, we can also use this framework to model the breakdown of integration. This is discussed in more detail in the section on non-Gaussian probability distributions.

Correlated Noise Sources

In this section I discuss some of the consequences for multisensory integration that occur when we relax the assumption that the noise distributions associated with the different sensory estimates are independent. To describe the integration process with correlated noise distributions, we rely on the same Bayesian framework introduced in the last section. This framework helps to illustrate the effects of correlated noise in a very transparent fashion. In this section we restrict ourselves to cases of complete fusion using the impulse prior. Still, the framework can easily be generalized to cases of incomplete fusion ("coupling") as well. Furthermore, for now we restrict ourselves to cases of positive correlation. Figure 30.5 shows three examples of visual and haptic signals containing different amounts of noise in the two channels (σV:σH = 1:1, 2:1, 3:1). The noise distributions are either uncorrelated (left panels) or correlated with ρ = 0.45 (right panels). The correlation in the noise distorts the joint probability distribution according to:

$$p(z \mid S) = \frac{1}{N_3}\, e^{-\frac{1}{2(1-\rho^2)}\left[\frac{(z_V - S_V)^2}{\sigma_V^2} + \frac{(z_H - S_H)^2}{\sigma_H^2} - \frac{2\rho (z_V - S_V)(z_H - S_H)}{\sigma_V \sigma_H}\right]} \qquad (30.8)$$

N3 is again just a normalization constant. The weighting between the sensory estimates is indicated by the direction of the arrow in figure 30.5. Given correlated noise distributions, the optimal weight assigned to the estimates is given by (Oruç, Maloney, & Landy, 2003):

$$w_V = \frac{r_V'}{r_V' + r_H'}, \quad \text{with } r_V' = r_V - \rho\sqrt{r_V r_H} \;\text{ and }\; r_H' = r_H - \rho\sqrt{r_V r_H} \qquad (30.9)$$

Figure 30.5  Examples of uncorrelated (left) and positively correlated (right) noise distributions with correlation ρ = 0.45. Each row illustrates an example with a different σV/σH ratio. The weighting between the estimates is indicated by the direction of the arrow. With a 1:1 ratio the weighting will always be 0.5, independent of the correlation. With a higher ratio, the weight assigned to the less reliable signal decreases with the correlation (difference between the gray and black arrows in the posterior column of the correlated examples), up to the point where the weights even become negative (lower row). The spread of the posterior distribution indicates the remaining noise in the combined estimate. With negative weights it can happen that the combined estimate is more reliable when the noise distributions are correlated (lower row).

If the correlation is zero, equation 30.9 reduces to equation 30.2. In case the reliabilities of the individual signal estimates are equal (rV = rH), the weight will always be 0.5, irrespective of the correlation (direction of the arrow α = 45°). This is shown in the top row of figure 30.5. However, if the reliabilities of the estimates differ, the optimal weight assigned to the less reliable estimate will decrease with the amount of positive correlation (figure 30.5, posterior column, difference between gray and black arrows), up to a point where the optimal weights may even become negative (figure 30.5, bottom row). With increasing difference between the reliabilities of the estimates, less correlation is needed in order to reach the tipping point at which the optimal weights become negative.

As discussed above, when estimates with uncorrelated noises are integrated, the reliability will always increase (equation 30.3). If the noise distributions of the estimates are positively correlated, there is still a benefit from integration in the form of increased reliability. However, this benefit is smaller than in the uncorrelated case, and it becomes smaller the higher the correlation, if the weights are constrained to be nonnegative and to sum to 1 (Oruç et al., 2003):

$$r = \frac{r_V + r_H - 2\rho\sqrt{r_V r_H}}{1 - \rho^2} \qquad (30.10)$$

With negative weights, however, it is possible to find combinations of reliabilities (rV, rH) together with some correlation ρ such that the reliability of the integrated estimate exceeds that achievable with uncorrelated noise sources (and the same reliabilities). To date, however, to the best of our knowledge there is no report in the literature claiming to have found negative weights for the linear weighted integration of sensory estimates.
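Equations 30.9 and 30.10 are easy to evaluate directly. The sketch below does so for the three noise ratios shown in figure 30.5; the helper function and its name are my own, but the numbers follow from the equations, and with σV:σH = 3:1 and ρ = 0.45 the visual weight comes out at about −0.05, the negative weight shown in the bottom row of the figure.

```python
import numpy as np

def correlated_fusion(sigV, sigH, rho):
    """Optimal weights (eq. 30.9) and combined reliability (eq. 30.10)
    for two estimates with correlated Gaussian noise."""
    rV, rH = 1.0 / sigV ** 2, 1.0 / sigH ** 2
    c = rho * np.sqrt(rV * rH)
    wV = (rV - c) / ((rV - c) + (rH - c))       # eq. 30.9
    r = (rV + rH - 2.0 * c) / (1.0 - rho ** 2)  # eq. 30.10
    return wV, 1.0 - wV, r

# The three sigma ratios of figure 30.5, with rho = 0.45:
for sigV in (1.0, 2.0, 3.0):
    print(sigV, correlated_fusion(sigV, 1.0, rho=0.45))
# 1:1 -> wV = 0.5 regardless of the correlation
# 2:1 -> wV ~ 0.03 (vs. 0.2 without correlation): strongly down-weighted
# 3:1 -> wV ~ -0.05: beyond the tipping point, the weight turns negative
```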

Despite the lack of reports about negative weights in the literature, correlations between the noise distributions of different sources of sensory information seem, in general, very plausible. Noise in sensory estimates is largely generated by the stochastic processes of neuronal transduction and transmission. If some of the neuronal transmission or transduction process is shared between two sensory signals, their noise distributions may become correlated. For example, visually estimating slant from different texture signals, such as perspective projections and the density gradient of texture elements, may involve noise distributions that are correlated, because much of the neuronal processing is shared among these signals. Oruç et al. (2003) investigated the effects of correlated noise on the integration of sensory estimates and found first evidence that visual texture cues to slant may indeed contain correlated noise distributions. When considering the integration of information across the sensory modalities, such as the integration of size estimates derived from vision and touch, it seems reasonable to assume that the noise distributions of these estimates are largely independent, because much of the neuronal processing (transduction and transmission) between those multisensory signals is not shared. If, however, a correlation between the noise distributions is introduced by common neuronal processing mechanisms, this correlation will always be positive. It is more difficult to conceive of situations in which the noise distributions may become negatively correlated. Still, at least from a theoretical point of view, it is interesting to consider these cases of negative correlation as well. If the noise distributions were negatively correlated, compared to the uncorrelated case the weights of the estimates would always tend more toward 0.5, and the reliability of the integrated estimate would always be increased, because the negatively correlated noises cancel each other when the estimates are averaged during integration (figure 30.6). If the noises between two estimates were perfectly anticorrelated, averaging would result in a noise-free estimate.
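A quick simulation makes the cancellation argument concrete (the noise model and all numbers are illustrative): averaging two unit-variance estimates whose noises have correlation ρ yields a combined variance of (1 + ρ)/2, which vanishes for perfectly anticorrelated noise.

```python
import numpy as np

rng = np.random.default_rng(1)
n, true_size = 100_000, 5.0

for rho in (0.0, -0.5, -1.0):
    # Construct two noise streams with the desired correlation
    e1 = rng.standard_normal(n)
    e2 = rho * e1 + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)
    combined = true_size + 0.5 * (e1 + e2)  # equal-weight average
    print(rho, round(combined.var(), 4))
# rho =  0.0 -> ~0.50: the usual halving of the variance
# rho = -0.5 -> ~0.25: anticorrelated noise partly cancels
# rho = -1.0 -> ~0.00: perfect anticorrelation, a noise-free average
```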

Non-Gaussian Probability Distributions and the Breakdown of Integration

The last of the three assumptions considered in this chapter concerns the shape of the probability distributions, which in the standard model of integration are all assumed to be Gaussian. When two Gaussian-shaped probability distributions are multiplied, as implied in the standard model of integration, the result will again be a Gaussian-shaped distribution.

Figure 30.6  Three examples of multisensory integration with a negative correlation between the noise distributions of the estimates (ρ = −0.45). If the reliabilities of the estimates are equal (upper row), the weight will continue to be 0.5; however, with increasing negative correlation the reliability of the combined estimate will increase. If the reliabilities of the two estimates are unequal (lower two rows), with increasing negative correlation the weights will tend more toward 0.5. Still, the reliability of the integrated estimate increases with higher negative correlation.

Interestingly, such a product will be independent of the magnitude of the discrepancy (D̂ = Ŝi − Ŝj) between the different distributions. This is illustrated in figure 30.7, left column. Multiplying two Gaussian distributions, here the prior and the likelihood, will result in a posterior that will always lie between those two. Thus, the weighting between the different sources of information described by Gaussian-shaped probability distributions, and the effects on reliability, are all independent of the size of the discrepancy D̂ between those distributions. That is, no matter how big the discrepancy, the interaction between the different information sources will persist. Consequently, because the signals do not become independent with increasing discrepancy, it is impossible to model the breakdown of integration using Gaussian-shaped probability distributions. In order to behave robustly, however, integration should break down—that is, signals should become independent—when the discrepancies between the signals increase (e.g., Gepshtein, Burge, Banks, & Ernst, 2005; Landy et al., 1995). Recently, it was suggested that the Gaussian assumption could be relaxed to account for this possibility (e.g., Girshick & Banks, 2009; Knill, 2007a, 2007b; Natarajan, Murray, Shams, & Zemel, 2009; Roach, Heron, & McGraw, 2006; Sato, Toyoizumi, & Aihara, 2007; Shams & Beierholm, 2010). In particular, Roach et al. (2006) introduced "heavy tails" to the Gaussian distribution of the coupling prior.

On the right side of figure 30.7 this is illustrated for a one-dimensional cross section along the negative diagonal of the visual-haptic signal space. Introducing heavy tails transforms the prior in a very sensible way: close to the center the prior by and large keeps its Gaussian shape with a reasonable variance; far from the center the prior does not approach zero probability, as a Gaussian would, but maintains a nonzero probability. In essence, Roach et al. (2006) suggest a mixture of Gaussians using a linear combination of a flat coupling prior, which is used for modeling independence (figure 30.3, lower row), and an intermediate coupling prior, which is used for modeling coupling or fusion (figure 30.3, middle and upper rows). As a result, the system continues to behave as it did without the long tails when the discrepancies are reasonably small, since the central Gaussian part of the prior plays the dominant role. In figure 30.7A this can be seen when the posterior distributions are compared between the left and right columns, which are virtually identical. For larger discrepancies, however, such a prior ensures that the process converges toward segregation because of the increased influence of the flat part of the prior (figure 30.7B,C). When the discrepancy is sufficiently large, as illustrated in figure 30.7D, the flat part of the prior will dominate, and the signals will be close to independent—that is, the likelihood and posterior distributions will become virtually identical.
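The effect of the heavy tails can be verified in a few lines. The sketch below uses a one-dimensional cross section along the conflict dimension and a mixture prior consisting of a central Gaussian plus a small constant ("flat") component; all widths and mixture values are illustrative assumptions, not fitted values from Roach et al. (2006).

```python
import numpy as np

d = np.linspace(-20.0, 20.0, 4001)  # conflict axis (1D cross section)
dx = d[1] - d[0]

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

def posterior_mean(d_hat, sig_lik=1.0, sig_prior=2.0, tail=1e-3):
    """Posterior over the conflict given a measured discrepancy d_hat,
    with a heavy-tailed (Gaussian + flat) coupling prior."""
    post = gauss(d, d_hat, sig_lik) * (gauss(d, 0.0, sig_prior) + tail)
    post /= post.sum() * dx                 # normalize on the grid
    return (d * post).sum() * dx            # posterior mean

for d_hat in (1.0, 3.0, 6.0, 12.0):
    print(d_hat, round(posterior_mean(d_hat), 2))
# A small conflict (d_hat = 1) is pulled toward zero, to ~0.8 (fusion);
# the large conflict (d_hat = 12) stays at ~12.0 (segregation). A pure
# Gaussian prior would instead keep shrinking every conflict by the
# same fixed proportion (here 0.8 * d_hat), however large it is.
```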

Figure 30.7  Integration of prior and likelihood into the posterior using Bayes's rule. The left column shows the integration scheme assuming all-Gaussian distributions. The depicted situation corresponds to a cross section along the negative diagonal of the situation described in figure 30.3. The shape of the posterior is independent of the discrepancy D̂, and it represents a constant weighting between likelihood and prior. The right column shows the same integration scheme with a slightly modified prior containing "heavy tails." With a small discrepancy, the central part of the prior dominates, and the resulting posterior is virtually identical to the situation in which the distributions were all Gaussian, thus representing a relative weighting between likelihood and prior (A). With large discrepancies, however, the product is dominated by the heavy tails of the prior, leading to a posterior that signifies independence (D). That is, the likelihood function and the posterior will become virtually identical.

In summary, relaxing the Gaussian assumption and replacing the Gaussian with a Gaussian-like distribution containing heavy tails makes it possible to model a smooth transition from fusion to segregation with increasing discrepancy between the probability distributions (figure 30.7, right column). However, the transition is not always smooth like that. With a different relative spread of the distributions it is possible to find conditions under which the posterior distribution becomes bimodal (figure 30.8). Still, with small discrepancies the integration process is dominated by the central part of the prior, signifying fusion, whereas with large discrepancies the prior loses its influence, demonstrating independence. When the posterior distribution is bimodal, the question naturally arises as to what perceptual estimate corresponds to such a distribution. Most often it is assumed that the perceptual estimate corresponds to the maximum of the posterior distribution (the maximum a posteriori or MAP estimate). However, with a bimodal posterior this seems unlikely, because there are multiple maxima. In addition, the MAP estimate will be discontinuous with increasing discrepancy, jumping when suddenly one mode exceeds the other (transition between figure 30.8B and C)—an effect that has not been empirically observed so far for perceptual estimates. Thus, in order to guarantee a smooth transition in the perceptual estimates with increasing separation, it is more likely that the perceptual estimates are based on the mean or the median of the posterior distribution (Natarajan et al., 2009). However, at present this is speculation, and we do not know how the perceptual estimate is determined from the probabilistic computations that may be performed by the brain.
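The difference between the two readout rules is easy to demonstrate. In the sketch below, a narrower central prior than in the previous example makes the posterior bimodal for intermediate discrepancies; all parameter values are again illustrative. The argmax readout jumps abruptly once the distal mode overtakes the central one, whereas the posterior mean changes smoothly with the discrepancy.

```python
import numpy as np

x = np.linspace(-10.0, 20.0, 6001)
dx = x[1] - x[0]

def gauss(v, mu, sigma):
    return np.exp(-0.5 * ((v - mu) / sigma) ** 2)

def readouts(d_hat, sig_lik=1.0, sig_prior=0.7, tail=0.02):
    """MAP (argmax) vs. mean readout of a possibly bimodal posterior."""
    post = gauss(x, d_hat, sig_lik) * (gauss(x, 0.0, sig_prior) + tail)
    post /= post.sum() * dx
    return x[np.argmax(post)], (x * post).sum() * dx

for d_hat in np.arange(0.0, 6.5, 1.0):
    map_est, mean_est = readouts(d_hat)
    print(f"d_hat={d_hat:3.1f}  MAP={map_est:5.2f}  mean={mean_est:5.2f}")
# The MAP estimate jumps from ~1 to ~4 between d_hat = 3 and 4, when
# the distal mode takes over; the posterior mean shows no such jump.
```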

What does such a prior with heavy tails represent? In general, a prior should represent the statistics of the signals derived from the environment. Thus, a prior with heavy tails represents the a priori knowledge that the sensory signals are not strictly Gaussian distributed but that there is some finite probability of encountering signals far away from the center as well. Let me illustrate this with the following example. We assume that under most circumstances, objects and the environment tend to change in their properties over space and time in a smooth, continuous way rather than in a discontinuous and chaotic manner. Thus, despite spatial or temporal discrepancies, generally there will still be a correlation between the multisensory signals. This correlation implies that despite some spatial and temporal separation there is still redundancy in the multisensory signals. This redundancy should be used by the brain to improve its estimates. Examples of distributions of spatially discrepant SV and SH size signals are depicted in figure 30.9. With increasing spatial separation Δx, this correlation between the signals becomes weaker and weaker (figure 30.9, left vs. right column), until, finally, the co-occurrence statistics of signals derived from vision and touch result in a flat distribution. Still, there is some remaining probability of jointly encountering visual and haptic signals even when the separation becomes large, which may take the form of a Gaussian-like prior with heavy tails.

Figure 30.8  Integration of the likelihood function with a heavy-tailed prior using Bayes’s rule. This example shows a condition in which the posterior distribution becomes bimodal for certain discrepancies between likelihood and prior. At small discrepancies the central part of the prior dominates, which leads to integrated signals (A). At large discrepancies the long tails of the prior dominate, so the signals will become independent (D).

Figure 30.9  Examples of visual and haptic signals derived in close proximity on an object. Despite the spatial separation at which the signals are derived on the object, generally there will still be a correlation between those signals as the object, and the world in general, tends to vary smoothly. The larger the separation between the origins of the signals, the weaker the correlation will be in general (left column vs. right column). The lower panels show the co-occurrence statistics for signals derived in close proximity and with a larger separation.

We can now take this concept and extend it to model the breakdown of integration across multiple dimensions of space Δx, time Δt, and intersensory conflict D = SV − SH (figure 30.10). The left column of figure 30.10 shows the likelihood function, which is identical for all five situations depicted, because the sensory measurements are assumed to be identical in all cases. The signals contain an intersensory conflict of D = SV − SH.

The change in the co-occurrence statistics between the visual and haptic signals with increasing spatial or temporal separation is reflected in the prior. For Δx = 0 (Δt = 0) such a prior resembles a central Gaussian with intermediate variance (analogous to figure 30.3, middle row) that also has heavy tails, to account for the breakdown of integration with increasing intersensory conflict in the size estimates. This is highlighted in the panel of the prior by the one-dimensional cross section along the negative diagonal (a Gaussian-like distribution with heavy tails). With increasing spatial or temporal discordance (Δx ≠ 0 or Δt ≠ 0), the variance of the central part of the prior increases. This is because the prior probability of encountering combinations of SV and SH signals for which the discrepancy D = SV − SH is large increases with the spatial or temporal separation, Δx or Δt, respectively (cf. figure 30.9). With large separation the co-occurrence statistics will become flat, and the prior along the dimension of separation will also contain heavy tails. This is again highlighted by the one-dimensional cross section through the prior, now along the dimension of separation. For each level of separation the posterior represents the combination of the likelihood function with such a prior using Bayes's rule. Because of the heavy tails, the influence of the prior on the likelihood function is largest at an intermediate value of separation (Δx = ±5). This can be seen from the posterior being most different from the likelihood function, the maximum of which is marked by the small crosses. With a larger separation (e.g., Δx = ±10), when the central part of the prior increases in spread, the influence of the prior decreases, such that the signals become more and more independent. Without separation (Δx = 0), when the central part of the prior shows minimal spread, the intersensory conflict becomes noticeable. In this case the posterior distribution becomes bimodal because of the heavy tails, with more energy in the distal mode. Thus, without separation the posterior again becomes more similar to the likelihood function, which is a sign of the breakdown of integration. This is reasonable because, given the narrow spread of the prior, it is unlikely that signals with such a relatively large discrepancy (D = SV − SH) are derived from the same object or event, so they are kept independent.

The decrease of the interaction between the multisensory signals with discrepancy or separation between the signals has been termed the window of integration. Interestingly, for the example depicted in figure 30.10, this window takes the shape of a "w." That is, the integration effects are strongest at an intermediate separation, they decrease as expected with large separations, and they dip without separation. The dip is caused by the intersensory conflict between the signals, which becomes noticeable when the spread of the prior centrally is narrow and the conflict is sufficiently large.

[Figure 30.10 appears here: five rows for Δx = −10, −5, 0, 5, 10 (Δt = −200, −100, 0, 100, 200), with columns showing the likelihood, the prior (with heavy tails, also shown as one-dimensional cross sections), and the posterior; the axes are visual size and haptic size.]

Figure 30.10  Robust estimation, that is, the breakdown of integration with spatial (temporal) separation Δx (Δt) and with intersensory conflict D = SV − SH. For each example with different separation Δx (Δt) the same signals are measured; thus, the likelihood functions are all identical. The coupling prior is assumed to be of Gaussian shape with heavy tails along both the dimension of intersensory conflict and the dimension of spatial (temporal) separation. This is indicated by the one-dimensional cross sections. The variance of the central part of the prior increases with increasing spatial (temporal) separation, reflecting the decrease in correlation between the two signals (figure 30.9). The posterior shows the result of combining likelihood function and prior according to Bayes’s rule. For convenience, the location of the maximum of the likelihood function is also marked by small crosses in the panel of the posterior. The influence of the prior on the likelihood function is largest for the intermediate cases (Δx = ±5). This is a sign of multisensory integration. With a larger separation (Δx = ±10) the prior loses its influence. Without separation, Δx = 0, when the spread of the prior at the center becomes narrow, the intersensory conflict becomes noticeable. Here the flat part of the prior causes the posterior to become bimodal with more energy in the distal mode. Thus, interestingly, in this example without separation the signals become more independent again.


Under most conditions, however, when the intersensory conflicts are small or the spread of the prior is larger, the window of integration will be "u"-shaped. A "u"-shaped function for the window of integration, unlike a "w"-shaped one, has been reported frequently in the literature (e.g., Bresciani et al., 2005; Jack & Thurlow, 1973; Jackson, 1953; Radeau & Bertelson, 1987; Shams, Kamitani, & Shimojo, 2002; van Wassenhove, Grant, & Poeppel, 2007; Warren & Cleaves, 1971; Witkin, Wapner, & Leventhal, 1952). Whether the window comes out "u"- or "w"-shaped in the present model can be checked directly, as the sketch below illustrates.
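The sketch below uses a one-dimensional version of the model, defined along the conflict dimension D only, and computes an integration index: the fraction by which the posterior pulls the perceived conflict toward zero (1 = complete fusion, 0 = complete segregation). As before, all parameter values are illustrative assumptions:

```python
import numpy as np

D = np.linspace(-15, 15, 601)       # grid over the conflict D = SV - SH
Dz = 4.0                            # assumed measured conflict
sig_like = np.sqrt(2.0)             # SD of the conflict estimate (two cues)

def integration_index(dx, sigma0=0.5, slope=0.4, tail=0.05):
    """Fraction by which the posterior mean of D is pulled toward zero."""
    like = np.exp(-0.5 * ((D - Dz) / sig_like) ** 2)
    sigma_c = sigma0 + slope * abs(dx)              # central spread widens
    prior = (1 - tail) * np.exp(-0.5 * (D / sigma_c) ** 2) + tail
    post = like * prior
    post /= post.sum()
    return 1 - abs((D * post).sum()) / Dz

for dx in [-10, -5, 0, 5, 10]:
    print(f"dx = {dx:+3d}: integration index = {integration_index(dx):.2f}")
```

With a conflict this large the index dips at Δx = 0, peaks at the intermediate separations, and falls off again at the largest ones, tracing out the "w"; shrinking the conflict Dz or widening the central prior turns the same index into a single central peak, the familiar "u"-shaped window.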


Recently, a few other approaches have been proposed to model this elusive aspect of the robustness, or breakdown, of integration (for a recent review see Shams & Beierholm, 2010). I will discuss two of the more prominent proposals in this direction, both of which closely resemble the proposal presented here. The first proposes that the likelihood function is a mixture of Gaussians (Knill, 2007b) to explain the breakdown of integration, whereas the second formalizes the concept of causal inference to achieve the same purpose (Körding et al., 2007). Both approaches model the transition from fusion to segregation successfully; however, both relate to special cases and specific scenarios for which they might be considered optimal. The mixture-of-Gaussians approach (Knill, 2007b) refers to a specific scenario in which a texture cue to slant is modeled by a likelihood function composed of a central Gaussian with heavy tails. This proposal resembles what we have discussed earlier in this chapter, except that the heavy tails are added to the probability distribution of one of the sensory estimates (i.e., the likelihood function) and not to a coupling prior. The primary argument is that for texture to be a useful signal, we must make prior assumptions about the isotropy of the texture, and these assumptions will occasionally fail. This provides a suitable justification for the use of heavy tails. The argument, however, is specific to the texture signal and therefore cannot easily be extended to other within- or crossmodal sensory signals.
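The effect of such a heavy-tailed likelihood on cue weighting can be sketched in a few lines. The mixture parameters below, and the labeling of the two cues as "stereo" and "texture," are illustrative assumptions and not values taken from Knill (2007b):

```python
import numpy as np

s = np.linspace(-20, 20, 2001)      # grid over slant (arbitrary units)

def mog_likelihood(z, sigma, tail_sigma=8.0, tail_w=0.1):
    """Heavy-tailed likelihood: a central Gaussian mixed with a much
    broader one (a Knill-style mixture of Gaussians)."""
    core = np.exp(-0.5 * ((s - z) / sigma) ** 2) / sigma
    tails = np.exp(-0.5 * ((s - z) / tail_sigma) ** 2) / tail_sigma
    return (1 - tail_w) * core + tail_w * tails

for conflict in [1.0, 2.0, 4.0, 8.0]:
    like_stereo = np.exp(-0.5 * (s - 0.0) ** 2)     # stereo cue at 0, SD 1
    posterior = like_stereo * mog_likelihood(conflict, sigma=1.0)
    posterior /= posterior.sum()
    mean = (s * posterior).sum()
    w_tex = mean / conflict          # effective weight of the texture cue
    print(f"conflict = {conflict}: estimate = {mean:+.2f}, "
          f"texture weight = {w_tex:.2f}")
```

At small conflicts the two equally noisy cues are weighted almost equally, but as the conflict grows the heavy tail takes over and the effective weight of the discrepant texture cue collapses toward zero, which is exactly the robust behavior described above.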


The second proposal attempts to formalize the concept of causal inference to model why integration breaks down with highly discrepant information (Körding et al., 2007). The proposal has the same intuitive basis that we have been referring to throughout: segregation at large discrepancies, integration when there is no apparent discrepancy. This model, however, concentrates on the causal-attribution aspect of combining different signals. Two signals could either have one common cause, if they are generated by the same object or event, or they may have different causes, when generated by different objects or events. In the former case the signals should be integrated; in the latter case they should be kept apart. The model takes into account a prior probability pcommon of whether a common source or separate sources exist for a given set of multisensory signals. pcommon = 1 corresponds to perfect knowledge that there is a common source and thus to complete fusion. As before, complete fusion can be described by a coupling prior in the form of an impulse prior with σx² = 0. pcommon = 0 corresponds to complete knowledge that there are two independent sources and thus to complete segregation, which was previously described by a flat coupling prior with σx² → ∞. Whenever two sensory signals are detected, in general there will be some probability pcommon of a common cause and some probability 1 − pcommon of independent causes. This probability depends on many factors, such as temporal delays, visual experience, and context (Körding et al., 2007), so it is not easy to predict. In any case, however, it will lead to a weighted combination of the two priors for complete fusion and complete segregation and will thus in essence be analogous to a coupling prior that has the form of an impulse prior with flat, heavy tails (Körding et al., 2007, supplementary information). In this sense, the causal-inference model is a special case of the model described above. It does not allow for variance in the prior describing the common cause (i.e., the impulse prior), because, just like the standard model of integration, the causal-inference model assumes that all sensory signals with a common cause are perfectly correlated (i.e., their co-occurrence statistics obey a correlation of 1) and accurate (i.e., the sensory estimates are assumed to be unbiased). Because it does not consider a weaker correlation between the co-occurring signals (i.e., the situation illustrated in figure 30.9), and because it does not take into account the (in)accuracy of the signals (i.e., the situation in figure 30.2), this causal-inference model does not optimally balance the benefits and costs of multisensory integration, which are reduced variance and potential biases, respectively.
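A compact sketch of the causal-inference computation for a single pair of measurements is given below. It uses the closed-form likelihoods of the two causal structures under Gaussian assumptions and model averaging of the two conditional estimates (cf. Körding et al., 2007); all parameter values are illustrative:

```python
import numpy as np

def causal_inference(zV, zH, sigV=1.0, sigH=1.0, sig_prior=5.0, p_common=0.5):
    """Model-averaged visual estimate under causal inference.
    Source locations are assumed drawn from N(0, sig_prior^2)."""
    varV, varH, varP = sigV**2, sigH**2, sig_prior**2

    # Likelihood of the measurements given one common cause (C = 1),
    # with the shared source location integrated out analytically:
    var1 = varV * varH + varV * varP + varH * varP
    L1 = np.exp(-0.5 * ((zV - zH)**2 * varP + zV**2 * varH + zH**2 * varV)
                / var1) / (2 * np.pi * np.sqrt(var1))

    # Likelihood given two independent causes (C = 2):
    L2 = (np.exp(-0.5 * zV**2 / (varV + varP)) / np.sqrt(2 * np.pi * (varV + varP)) *
          np.exp(-0.5 * zH**2 / (varH + varP)) / np.sqrt(2 * np.pi * (varH + varP)))

    # Posterior probability of a common cause:
    post_c1 = L1 * p_common / (L1 * p_common + L2 * (1 - p_common))

    # Optimal estimates under each causal structure:
    s_fused = (zV / varV + zH / varH) / (1 / varV + 1 / varH + 1 / varP)
    s_visual = (zV / varV) / (1 / varV + 1 / varP)

    # Model averaging: weight the two estimates by the causal posterior.
    return post_c1 * s_fused + (1 - post_c1) * s_visual, post_c1

for conflict in [0.5, 2.0, 6.0]:
    estimate, pc = causal_inference(zV=conflict, zH=0.0)
    print(f"conflict = {conflict}: p(common) = {pc:.2f}, "
          f"visual estimate = {estimate:+.2f}")
```

As the conflict grows, the posterior probability of a common cause drops, and the visual estimate shifts from the fused value back toward the unimodal one; the effective prior on the conflict is, in effect, the impulse-plus-flat-tails coupling prior discussed above.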

Despite these conceptual differences, all of the models are mathematically very similar, so at present it is still unclear which of them best describes the breakdown of integration with discordant signals in human perception. Empirical evidence is therefore needed to distinguish and validate the different models.

Concluding Remarks and Some Open Questions

Many recent studies have shown that multisensory integration, that is, the integration of several redundant sources of information derived from different sensory modalities, is "optimal": the brain forms a weighted average of the different sources of information, with weights proportional to each source's precision, in order to increase the precision of the final integrated perceptual estimate. This standard model of integration makes several assumptions that, in general, may not be warranted. Three of the major assumptions are that the signals are redundant and unbiased, that the noise distributions associated with the estimates are independent, and that the probability distributions used for describing the estimation process are all Gaussian-shaped. Here I have discussed, one by one, how relaxing these assumptions can influence the integration process. This led to interesting predictions, most of which have not yet been tested rigorously in empirical studies. Some of these predictions are the following. When there is only a weak coupling between the multisensory signals, the estimate of the world property should be influenced not only by the relative variance of the estimates but also by their relative accuracy (a two-step estimation process). As a result, the effect of relative accuracy on the final perceptual estimate should be more noticeable the weaker the coupling is; that is, it should be pronounced during the breakdown of integration with increasing discrepancy between the signals. Another prediction is that negative weights should be observed when the noise distributions of the sensory estimates are positively correlated and their individual precisions differ substantially (see the sketch below). Finally, there is the prediction that, with a larger discrepancy between the signals, integration should break down. The probability distributions describing the perceptual estimates may then become bimodal, which may lead to nonmonotonic effects in the perceptual estimates as the discrepancy between the signals increases; thus, the window of integration might become "w"-shaped under some specific conditions.
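The negative-weight prediction can be read off directly from the minimum-variance weights for two estimates with correlated errors (cf. Oruç, Maloney, & Landy, 2003). The brief sketch below uses illustrative noise parameters:

```python
def optimal_weights(sigV, sigH, rho):
    """Minimum-variance weights for two estimates whose errors have
    correlation rho; the weights sum to one but need not lie in [0, 1]."""
    cov = rho * sigV * sigH
    wV = (sigH**2 - cov) / (sigV**2 + sigH**2 - 2 * cov)
    return wV, 1 - wV

# Strong positive correlation plus unequal precisions -> a negative weight:
for rho in [0.0, 0.5, 0.9]:
    wV, wH = optimal_weights(sigV=2.0, sigH=1.0, rho=rho)
    print(f"rho = {rho}: wV = {wV:+.2f}, wH = {wH:+.2f}")
```

With ρ = 0.9 the less precise visual estimate receives a weight of about −0.57: the optimal strategy uses it to cancel the correlated part of the haptic noise rather than simply discarding it.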


By relaxing the different modeling assumptions, it is certainly possible to model a wider range of phenomena, such as incomplete fusion or the breakdown of integration with discrepancy between the signals; the model overall thus becomes more realistic. Still, to date all modeling efforts in this direction concentrate on very simplified situations (e.g., an isolated object that can be seen and felt), which are still far removed from the perception of natural scenes. Natural scenes are cluttered with a wealth of multisensory signals available simultaneously, and all of these signals change continuously in a highly dynamic fashion. So there are still many open questions concerning our understanding of multisensory integration and its modeling. Some of these questions concern the relationship between the precision and accuracy of the signals and how these quantities are acquired and represented in the brain. This relates to the questions of how prior knowledge in general is acquired, how accurately such prior knowledge represents the statistics of the environment, and whether there are heuristics that help in updating such priors. Furthermore, the model discussed here provides only a static snapshot of the highly dynamic environment, so it would be very interesting to include some of the dynamic aspects of the environment in models of multisensory integration as well. Additionally, the models have to be extended to more than just two signals in order to account for the richness of the environment. More complete models, however, may quickly become computationally very expensive. This raises the question of whether integration behavior would still be optimal under more complex and naturalistic conditions or whether the brain would fall back on a computationally less expensive strategy, such as using heuristics or constraints. In any case, despite the significant progress made in recent years concerning our understanding of human multisensory perception, we are still far away from a complete model of such complex behavior. Whether we will ever get there is yet another open question.

Acknowledgments I thank the European Commission (EU Grant THE, IST-2009–248587), the Bernstein Center for Computational Neuroscience, Tübingen (BMBF; FKZ: 01GQ1002), and the Max Planck Society for support.

Notes


1.  Variables with a hat always denote noisy sensory estimates, whereas variables without a hat represent signals from which the sensory estimates are derived.


2. Reliability and precision are used interchangeably throughout the chapter.

References

Alais, D., & Burr, D. (2004). The ventriloquist effect results from near optimal crossmodal integration. Current Biology, 14, 257–262.
Bresciani, J.-P., Dammeier, F., & Ernst, M. O. (2006). Vision and touch are automatically integrated for the perception of sequences of events. Journal of Vision, 6, 554–564.
Bresciani, J.-P., & Ernst, M. O. (2007). Signal reliability modulates auditory-tactile integration for event counting. Neuroreport, 18, 1157–1167.
Bresciani, J.-P., Ernst, M. O., Drewing, K., Bouyer, G., Maury, V., & Kheddar, A. (2005). Feeling what you hear: auditory signals can modulate tactile tap perception. Experimental Brain Research, 162, 172–180.
Cochran, W. G. (1937). Problems arising in the analysis of a series of similar experiments. Journal of the Royal Statistical Society, 4(Supplement), 102–118.
Drewing, K., & Ernst, M. O. (2006). Integration of force and position cues for shape perception through active touch. Brain Research, 1078, 92–100.
Ernst, M. O. (2005). A Bayesian view on multimodal cue integration. In G. Knoblich, I. Thornton, M. Grosjean, & M. Shiffrar (Eds.), Human body perception from the inside out (pp. 105–131). New York: Oxford University Press.
Ernst, M. O. (2007). Learning to integrate arbitrary signals from vision and touch. Journal of Vision, 7(5):7, 1–14.
Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415, 429–433.
Ernst, M. O., & Bülthoff, H. H. (2004). Merging the senses into a robust percept. Trends in Cognitive Sciences, 8, 162–169.
Ernst, M. O., & Di Luca, M. (2011). Multisensory perception: from integration to remapping. In J. Trommershäuser, K. Körding, & M. S. Landy (Eds.), Sensory cue integration (pp. 224–250). New York: Oxford University Press.
Gepshtein, S., Burge, J., Banks, M. S., & Ernst, M. O. (2005). Optimal combination of vision and touch has a limited spatial range. Journal of Vision, 5, 1013–1023.
Ghahramani, Z., Wolpert, D. M., & Jordan, M. I. (1997). Computational models of sensorimotor integration. In P. G. Morasso & V. Sanguineti (Eds.), Self-organization, computational maps and motor control (pp. 117–147). Amsterdam: Elsevier Press.
Girshick, A. R., & Banks, M. S. (2009). Probabilistic combination of slant information: weighted averaging and robustness as optimal percepts. Journal of Vision, 9, 1–20.
Helbig, H. B., & Ernst, M. O. (2007). Optimal integration of shape information from vision and touch. Experimental Brain Research, 179, 595–606.
Hillis, J. M., Ernst, M. O., Banks, M. S., & Landy, M. S. (2002). Combining sensory information: mandatory fusion within, but not between, senses. Science, 298, 1627–1630.
Hillis, J. M., Watt, S., Landy, M. S., & Banks, M. S. (2004). Slant from texture and disparity cues: optimal cue combination. Journal of Vision, 4, 1–24.
Jack, C. E., & Thurlow, W. R. (1973). Effects of degree of visual association and angle of displacement on the "ventriloquism" effect. Perceptual and Motor Skills, 37, 967–979.
Jackson, C. V. (1953). Visual factors in auditory localization. Quarterly Journal of Experimental Psychology, 5, 52–65.
Knill, D. C. (2007a). Learning Bayesian priors for depth perception. Journal of Vision, 7(8):13, 1–20.
Knill, D. C. (2007b). Robust cue integration: a Bayesian model and evidence from cue-conflict studies with stereoscopic and figure cues to slant. Journal of Vision, 7(7):5, 1–24.
Knill, D. C., & Pouget, A. (2004). The Bayesian brain: the role of uncertainty in neural coding and computation. Trends in Neurosciences, 27, 712–719.
Knill, D. C., & Saunders, J. A. (2003). Do humans optimally integrate stereo and texture information for judgments of surface slant? Vision Research, 43, 2539–2558.
Körding, K. P., Beierholm, U., Ma, W. J., Quartz, S., Tenenbaum, J. B., & Shams, L. (2007). Causal inference in multisensory perception. PLoS One, 2, e943.
Landy, M. S., & Kojima, H. (2001). Ideal cue combination for localizing texture-defined edges. Journal of the Optical Society of America A, 18, 2307–2320.
Landy, M. S., Maloney, L. T., Johnston, E. B., & Young, M. J. (1995). Measurement and modeling of depth cue combination: in defense of weak fusion. Vision Research, 35, 389–412.
Ma, W. J., Beck, J. M., Latham, P. E., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nature Neuroscience, 9, 1432–1438.
Natarajan, R., Murray, I., Shams, L., & Zemel, R. S. (2009). Characterizing response behavior in multisensory perception with conflicting cues. In D. Koller et al. (Eds.), Advances in neural information processing systems 21 (pp. 1153–1160). Cambridge, MA: MIT Press.
Oruç, I., Maloney, L. T., & Landy, M. S. (2003). Weighted linear cue combination with possibly correlated error. Vision Research, 43, 2451–2468.
Pouget, A., Deneve, S., & Duhamel, J. (2002). A computational perspective on the neural basis of multisensory spatial representations. Nature Reviews Neuroscience, 3, 741–747.
Radeau, M., & Bertelson, P. (1987). Auditory-visual interaction and the timing of inputs: Thomas (1941) revisited. Psychological Research, 49, 17–22.
Roach, N., Heron, J., & McGraw, P. (2006). Resolving multisensory conflict: a strategy for balancing the costs and benefits of audio-visual integration. Proceedings of the Royal Society B: Biological Sciences, 273, 2159–2168.
Rosas, P., Wagemans, J., Ernst, M. O., & Wichmann, F. A. (2005). Texture and haptic cues in slant discrimination: reliability-based cue weighting without statistically optimal cue combination. Journal of the Optical Society of America A, 22, 801–809.
Sato, Y., Toyoizumi, T., & Aihara, K. (2007). Bayesian inference explains perception of unity and ventriloquism aftereffect: identification of common sources of audiovisual stimuli. Neural Computation, 19, 3335–3355.
Shams, L., & Beierholm, U. R. (2010). Causal inference in perception. Trends in Cognitive Sciences, 14, 425–432.
Shams, L., Kamitani, Y., & Shimojo, S. (2002). Visual illusion induced by sound. Cognitive Brain Research, 14, 147–152.
Shams, L., Ma, W. J., & Beierholm, U. (2005). Sound-induced flash illusion as an optimal percept. Neuroreport, 16, 1923–1927.
van Beers, R. J., Sittig, A. C., & Denier van der Gon, J. J. (1998). The precision of proprioceptive position sense. Experimental Brain Research, 122, 367–377.
van Beers, R. J., Sittig, A. C., & Denier van der Gon, J. J. (1999). Integration of proprioceptive and visual position information: an experimentally supported model. Journal of Neurophysiology, 81, 1355–1364.
van Wassenhove, V., Grant, K., & Poeppel, D. (2007). Temporal window of integration in auditory-visual speech perception. Neuropsychologia, 45, 598–607.
Warren, D. H., & Cleaves, W. T. (1971). Visual-proprioceptive interaction under large amounts of conflict. Journal of Experimental Psychology, 90, 206–214.
Witkin, H. A., Wapner, S., & Leventhal, T. J. (1952). Sound localization with conflicting visual and auditory cues. Journal of Experimental Psychology, 43, 58–67.
Witten, I. B., & Knudsen, E. I. (2005). Why seeing is believing: merging auditory and visual worlds. Neuron, 48, 489–496.
