How automatic are crossmodal correspondences?

Consciousness and Cognition 22 (2013) 245–260

Consciousness and Cognition journal homepage: www.elsevier.com/locate/concog

Review

How automatic are crossmodal correspondences?

Charles Spence a,*, Ophelia Deroy b

a Crossmodal Research Laboratory, Department of Experimental Psychology, University of Oxford, Oxford, UK
b Centre for the Study of the Senses, University of London, London, UK

article info

Article history:
Received 18 August 2012
Available online 29 January 2013

Keywords:
Crossmodal correspondence
Automaticity
Strategic
Voluntary
Stimulus-driven
Synaesthesia

abstract

The last couple of years have seen a rapid growth of interest (especially amongst cognitive psychologists, cognitive neuroscientists, and developmental researchers) in the study of crossmodal correspondences – the tendency for our brains (not to mention the brains of other species) to preferentially associate certain features or dimensions of stimuli across the senses. By now, robust empirical evidence supports the existence of numerous crossmodal correspondences, affecting people's performance across a wide range of psychological tasks – in everything from the redundant target effect paradigm through to studies of the Implicit Association Test, and from speeded discrimination/classification tasks through to unspeeded spatial localisation and temporal order judgment tasks. However, one question that has yet to receive a satisfactory answer is whether crossmodal correspondences automatically affect people's performance (in all, or at least in a subset of tasks), as opposed to reflecting more of a strategic, or top-down, phenomenon. Here, we review the latest research on the topic of crossmodal correspondences to have addressed this issue. We argue that answering the question will require researchers to be more precise in terms of defining what exactly automaticity entails. Furthermore, one's answer to the automaticity question may also hinge on the answer to a second question: Namely, whether crossmodal correspondences are all 'of a kind', or whether instead there may be several different kinds of crossmodal mapping (e.g., statistical, structural, and semantic). Different answers to the automaticity question may then be revealed depending on the type of correspondence under consideration. We make a number of suggestions for future research that might help to determine just how automatic crossmodal correspondences really are.

© 2013 Elsevier Inc. All rights reserved.

Contents

1. Introduction ........ 246
2. Automaticity: Defining features ........ 248
3. Reviewing the evidence concerning the automaticity of crossmodal correspondences ........ 250
   3.1. Goal independence and intentionality ........ 250
   3.2. The problem of stimulus salience ........ 251
   3.3. Speed ........ 253
        3.3.1. The cognitive neuroscience of crossmodal correspondences ........ 254
4. On automaticity and different kinds of crossmodal correspondence ........ 255
5. Conclusions ........ 256

* Corresponding author. Address: Crossmodal Research Laboratory, Department of Experimental Psychology, University of Oxford, South Parks Road, Oxford OX1 3UD, UK. Fax: +44 1865 310447. E-mail address: [email protected] (C. Spence).
1053-8100/$ - see front matter © 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.concog.2012.12.006


   5.1. Closing comments: Where do we stand with respect to the notion of automaticity? ........ 257
Acknowledgment ........ 258
References ........ 258

1. Introduction

The term "crossmodal correspondences" is but one of a range of terms that has been used over the years by researchers in order to refer to our brain's tendency to systematically associate certain features or dimensions of stimuli across the senses (see Marks, 2004; Spence, 2011, for reviews). Crossmodal correspondences have now been documented between many different pairs of stimulus dimensions: So, for example, auditory pitch has been shown to map onto visual elevation (see Ben-Artzi & Marks, 1995; Bernstein & Edelstein, 1971; Evans & Treisman, 2010; Melara & O'Brien, 1987; Miller, 1991; Patching & Quinlan, 2002; Proctor & Cho, 2006; Rusconi, Kwan, Giordano, Umiltà, & Butterworth, 2006), brightness and lightness (Hubbard, 1996; Ludwig, Adachi, & Matsuzawa, 2011; Marks, 1987; Martino & Marks, 1999; Melara, 1989; Mondloch & Maurer, 2004), size (Bien, ten Oever, Goebel, & Sack, 2012; Evans & Treisman, 2010; Gallace & Spence, 2006; Mondloch & Maurer, 2004; Parise & Spence, 2009, 2012), angularity of shape (Marks, 1987; Parise & Spence, in press), direction of movement (Clark & Brownell, 1976; Maeda, Kanai, & Shimojo, 2004; Sadaghiani, Maier, & Noppeney, 2009), and even spatial frequency (Evans & Treisman, 2010; Heron, Roach, Hanson, McGraw, & Whitaker, 2012). The majority of the studies of crossmodal correspondences that have been published to date have involved the presentation of auditory and visual stimuli. That said, similar crossmodal correspondences also exist between auditory pitch and the elevation of tactile stimuli (Occelli, Spence, & Zampini, 2009), not to mention the size of objects experienced haptically (Walker & Smith, 1985),1 and between tastes/odours and the angularity of visual stimuli or the pitch of auditory stimuli (Belkin, Martin, Kemp, & Gilbert, 1997; Crisinel & Spence, 2010, 2011, 2012; Deroy & Valentin, 2011; Hanson-Vaux, Crisinel, & Spence, 2013; see Deroy, Crisinel, & Spence, in press; Spence & Ngo, 2012a, for reviews).
One important question in the area of crossmodal correspondences research that has yet to be answered convincingly concerns whether such correspondences affect performance (in tasks involving, for example, participants having to make speeded responses) in an automatic manner, or whether instead they affect performance in more of a strategic manner, emerging only as a function of the specific task demands and instructions imposed on the participant by the experimenter. Addressing the issue of the automaticity of crossmodal correspondences means, however, breaking the notion of automaticity down into a number of distinct sub-components (see Section 2) and then trying to make sense of the apparently contradictory results that have been published in the area recently (see Section 3). This exercise will further help to draw attention to the differences that exist between synaesthesia and crossmodal correspondences (see Section 4), while acknowledging that, given that there certainly are various types of crossmodal correspondence, one's answer to the automaticity question may well vary as a function of the type of correspondence under consideration. That said, the review of the literature relevant to the automaticity claim outlined here leads to the generation of a number of specific hypotheses that deserve further testing in future research on crossmodal correspondences (see Section 5). The original evidence that prompted researchers to make the automaticity claim came from the many speeded classification studies demonstrating that the speeded discrimination of target stimuli in one modality (e.g., discriminating larger vs. smaller circles, for visual stimuli presented on a monitor) was affected by the presentation of a completely task-irrelevant auditory stimulus that varied randomly on a trial-by-trial basis between high and low pitch (see Marks, 2004; Spence, 2011, for reviews).
However, the suggested automaticity of crossmodal correspondences has been questioned by a series of negative results from studies that have sometimes failed to show any difference in behaviour between those conditions in which congruent vs. incongruent pairs of visual and auditory stimuli have been presented (see also Chiou & Rich, 2012a; Heron et al., 2012; Klapetek, Ngo, & Spence, 2012; Klein, Brennan, D'Aloisio, D'Entremont, & Gilani, 1987; Klein, Brennan, & Gilani, 1987; Sweeny, Guzman-Martinez, Ortega, Grabowecky, & Suzuki, 2012). Explaining why such differences between studies have been obtained represents a worthwhile endeavour: And, what is more, in answering the question of the degree of automaticity of crossmodal correspondences, two further related questions also come to the fore, as detailed below. The first question concerns the link between crossmodal correspondences and other phenomena such as coloured-hearing synaesthesia,2 where the presence, or experience, of a stimulus in one modality (for instance, audition) induces a conscious concurrent in another, unstimulated modality (for instance, vision). Crossmodal 'mappings' or 'correspondences' between say, pitch and brightness can, at first, sometimes appear just as surprising as synaesthesia. In particular, it may not always be immediately obvious whether (or that) they are tracking, or picking-up on, some statistical regularity of the environment (see Spence & Deroy, 2012). The initially unexplainable nature of at least certain crossmodal correspondences has led to their being

1 Here it is worth noting that auditory stimuli tend to be assigned to specific elevations even in the absence of any stimuli being presented in another sensory modality (e.g., see Cabrera & Morimoto, 2007; Pedley & Harper, 1959; Pratt, 1930; Roffler & Butler, 1968; Trimble, 1934). The matching of auditory pitch to elevation has also been demonstrated under those conditions in which the participants have to respond to (i.e., discriminate) a centrally-presented visual target by pressing one of two vertically-arrayed buttons, while the pitch of an accessory sound is varied (see Keller & Koch, 2006).
2 Canonical cases of synaesthesia include such examples as coloured-hearing, tasted shapes, etc. (see Ward, 2012, for a recent review).


described by a handful of researchers as synaesthetic correspondences (e.g., Crisinel & Spence, 2010; Walker et al., 2010), or even, on occasion, to their being subsumed under the heading of synaesthesia proper (see Martino & Marks, 2001; Rader & Tellegen, 1987; Rudmin & Cappelli, 1983; see Deroy & Spence, in press, for a discussion). Elsewhere, we have argued against this growing tendency to place crossmodal correspondences and synaesthesia on one and the same continuum, together with the related assumption that both phenomena can be explained by the same underlying neural mechanisms (e.g., Bien et al., 2012; Chiou & Rich, 2012a; Ward, Huckstep, & Tsakanikos, 2006). A key difference here concerns the fact that crossmodal correspondences are not necessarily associated with a conscious sensory concurrent (though see also Spence & Deroy, 2013). However, using this characteristic to distinguish between the two phenomena soon becomes complicated: First, because of the practical difficulty associated with deciding how to assess the occurrence of a conscious sensory concurrent or with sorting conscious cases of mental imagery guided by crossmodal correspondences from the supposedly distinct synaesthetic cases (Spence & Deroy, 2013); And, second, because of controversies surrounding the possibility of unconscious synaesthesia in certain difficult cases (e.g., see Cohen Kadosh & Terhune, 2011; Deroy & Spence, submitted for publication, for a discussion). Another important place to look, then, in order to try and distinguish synaesthesia from crossmodal correspondences concerns the automaticity of the process that, in both cases, ties together the two sensory stimuli. 
In the case of synaesthesia (at least in sensory as opposed to conceptual forms of synaesthesia), the processing of the synaesthetic concurrent is largely involuntary (with only a limited degree of control over the presence/experience of the concurrent being reported by some synaesthetes, see Rich, Bradshaw, & Mattingley, 2005; Rich & Mattingley, 2003; though see Price & Mattingley, in press), at least once the inducer has been attended to (e.g., Mattingley, Payne, & Rich, 2006; Rich & Mattingley, 2003, 2010; Sagiv, Heer, & Robertson, 2006; Ward, 2012, pp. 321–322). This is where deciding on the automaticity or involuntariness of the occurrence of crossmodal correspondences matters: For should crossmodal correspondences turn out not to be automatic, that would provide additional grounds for distinguishing them from synaesthesia. Answering the automaticity question turns out to be important not only when it comes to trying to distinguish crossmodal correspondences from canonical cases of synaesthesia, but also because it may help researchers to assess how crossmodal correspondences fit more generally into the framework of multisensory integration research (e.g., Bremner, Lewkowicz, & Spence, 2012; Calvert, Spence, & Stein, 2004; Stein, 2012). The idea that crossmodal correspondences may influence multisensory integration, and that their effect is, or at least can be, perceptual rather than necessarily just decisional (e.g., possibly reflecting some sort of response bias) in nature, is a relatively recent one (see Gallace & Spence, 2006; Parise & Spence, 2009; Sweeny et al., 2012). That said, it makes sense in those cases in which a perceptual effect of crossmodal correspondences has been demonstrated in the laboratory to ask whether or not the multisensory integration of the component signals is influenced by the focus of a participant's attention.
However, little is currently known about this topic, and the investigation is made all the more complicated by the large number of correspondences that have been reported to date, and which perhaps belong to different kinds (see Deroy et al., in press; Sadaghiani et al., 2009; Spence, 2011). What is more, a wide variety of tasks has been used to test each phenomenon (see Spence, 2011, for a review).3 Here, we suggest narrowing the discussion down to audiovisual correspondences, which are the best documented to date, and which perhaps represent the most likely place to find automaticity in crossmodal correspondences.4 One reason for considering audiovisual correspondences as representing one of the best potential candidates for being ‘automatic’ is that they have been suggested to operate as ‘coupling priors’ in Bayesian Decision Theory models of multisensory integration (see Ernst, 2007; Spence, 2011). Many coupling priors are considered to operate in an automatic manner: Helbig and Ernst (2008), for example, have demonstrated that the integration of visual and haptic shape information is unaffected by the focus of a participant’s attention to a specific sensory modality by means of an attentional load manipulation. This result then suggests some degree of automaticity for this particular form of multisensory integration.5 Another popular example of multisensory integration that is seemingly immune to spatial, or modality-based, attentional manipulations is the audiovisual ventriloquism effect (see Bertelson, Vroomen, de Gelder, & Driver, 2000; Vroomen, Bertelson, & de Gelder, 2001; though see Fairhall & Macaluso, 2009; Röder & Büchel, 2009). The kinds of multisensory integration that underlie these two effects therefore appear to operate in a fairly automatic manner, at least in the sense of their not being intentional, nor under an observer’s conscious control. 
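The idea of a coupling prior (Ernst, 2007) can be made concrete with a small numerical sketch. Under standard Gaussian assumptions, a prior on the discrepancy between the sources behind two cues determines where the system sits on the continuum from full fusion to complete independence. The function and all of its numbers below are a hypothetical illustration, not taken from any of the studies reviewed here.

```python
# Illustrative sketch of a "coupling prior" (cf. Ernst, 2007) for two
# Gaussian cues. All values are invented for illustration.

def couple(s_v, s_a, var_v, var_a, var_c):
    """MAP estimates of the two underlying sources, given measurements
    s_v and s_a (likelihood variances var_v, var_a) and a Gaussian prior
    N(0, var_c) on the difference between the sources.
    var_c -> 0   : mandatory fusion (estimates collapse together)
    var_c -> inf : full independence (each cue stands alone)"""
    a, b = 1.0 / var_v, 1.0 / var_a                 # cue reliabilities
    c = 0.0 if var_c == float("inf") else 1.0 / var_c
    det = a * b + a * c + b * c                     # determinant of the 2x2 system
    x_v = ((b + c) * a * s_v + c * b * s_a) / det
    x_a = (c * a * s_v + (a + c) * b * s_a) / det
    return x_v, x_a

# Visual cue at 0, auditory cue at 10, vision twice as reliable.
print(couple(0.0, 10.0, 1.0, 2.0, float("inf")))  # independent: (0.0, 10.0)
print(couple(0.0, 10.0, 1.0, 2.0, 1e-9))          # fused near the reliability-weighted mean
print(couple(0.0, 10.0, 1.0, 2.0, 4.0))           # partial coupling: estimates attract each other
```

As the coupling variance shrinks, both estimates converge on the standard reliability-weighted average of maximum-likelihood integration, which is the sense in which such a prior can make integration look "automatic".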
By contrast, the audiovisual integration at stake in the McGurk effect is modulated by variations in cognitive load (see Alsius, Navarra, Campbell, & Soto-Faraco, 2005; Alsius, Navarra, & Soto-Faraco, 2007, for an exception; see Navarra, Alsius, Soto-Faraco, & Spence, 2009, for a review). Such mixed results suggest a range of degrees of automaticity in which to locate these crossmodal correspondences which can also act as 'coupling priors'. In other words, it is important to see whether these crossmodal correspondences exhibit a different degree or kind of automaticity than other factors that are known to

3 Of course, the same problem raises its head for those researchers who are interested in assessing the automaticity of synaesthesia (Blake, Palmeri, Marois, & Kim, 2005; Esterman, Verstynen, Ivry, & Robertson, 2006; Lupiáñez & Callejas, 2006; Treisman, 2005).
4 Note here also that the majority of synaesthesia researchers who have attempted to tackle the automaticity question have tended to focus on just a few specific cases (or kinds of synaesthesia), normally those cases involving a visual concurrent (e.g., Blake et al., 2005; Esterman et al., 2006; Lupiáñez & Callejas, 2006; Treisman, 2005).
5 We would argue that the matching (or integration) of shape information across modalities should be considered as an example of amodal stimulus matching, rather than as an example of crossmodal correspondence (though not every researcher necessarily makes such a distinction, see Maurer & Mondloch, 2005).


influence the way in which our brains combine sensory cues. In the next section, we highlight the way in which researchers have broken the notion of automaticity down into a number of distinct sub-components (or criteria).

2. Automaticity: Defining features

Recent studies have delivered seemingly contradictory evidence concerning the automaticity of crossmodal correspondences (see Chiou & Rich, 2012a; Evans & Treisman, 2010; Klapetek et al., 2012; Parise & Spence, 2012; Peiffer-Smadja, 2010). These studies have utilised a variety of different experimental paradigms, including speeded classification (Evans & Treisman, 2010), exogenous spatial attentional cuing (Chiou & Rich, 2012a; Mossbridge, Grabowecky, & Suzuki, 2011), visual search (Klapetek et al., 2012), and a simplified variant of the Implicit Association Test (IAT; Parise & Spence, 2012; Peiffer-Smadja, 2010). Making comparison of these results somewhat harder, these studies have also tested several different audiovisual crossmodal correspondences, and have used a variety of stimuli (see Table 1). The authors of these various studies have come to conclusions that, at least on the face of it, appear to be mutually inconsistent: Chiou and Rich (2012a), for instance, have argued recently that crossmodal correspondences are not automatic in the sense that they are 'primarily mediated by cognitive processes after initial sensory encoding' and occur at a 'relatively late stage of voluntary attention orienting' (see Chiou & Rich, 2012a, p. 339). Similarly, Klapetek et al. (2012, p. 1161) have suggested that the crossmodal correspondence between auditory pitch and visual brightness operates "at a more strategic (i.e., rather than at an automatic or involuntary) level." By contrast, Evans and Treisman (2010), Parise and Spence (2012), and Peiffer-Smadja (2010) all argue that the available evidence suggests that crossmodal correspondences are automatic.
Evans and Treisman, for example, suggest that crossmodal correspondences ‘‘happen in an automatic fashion’’ (Evans & Treisman, 2010, p. 1) and later in their article state that ‘‘They are certainly automatic and independent of attention.’’ (Evans & Treisman, 2010, p. 10). Meanwhile, Parise and Spence point to the fact that crossmodal correspondences influenced even the fastest of their participants’ discrimination responses (i.e., those occurring within 400 ms of stimulus onset) and suggest that such evidence is at least consistent with claims regarding the automaticity of crossmodal correspondences. At this point, one might ask which of the above criteria should apply to evaluating these claims regarding the automaticity of crossmodal correspondences. This is by no means a simple question to answer given the widely-publicised difficulty associated with any attempt to define automaticity and drawing a clear line between those processes that are automatic and those that are not (e.g., Bargh, 1992, 1994; Logan, 1985; MacLeod & Dunbar, 1988; Schneider, Dumais, & Shiffrin, 1984; Shiffrin, 1988; see Moors & De Houwer, 2006, for a review). As Moors and De Houwer (2006, p. 297) succinctly put it: ‘‘Despite its central nature, there is no consensus about what automaticity means.’’ Following on from Moors and De Houwer (2006), though, it seems both theoretically and pragmatically more appropriate not to try and choose between the various criteria that have been put forward by researchers over the years, but rather to consider automaticity as an umbrella term which encompasses distinct features (or sometimes sets of closely related features). According to Moors and De Houwer, there are four, non-overlapping diagnostic features: the goal-independence criterion; the non-conscious criterion; the load-insensitivity criterion; and the speed criterion (cf. Santangelo & Spence, 2008; Treisman, 2005). 
One obvious advantage of this pluralist approach is to stress that these features are assessed separately and that specific experimental protocols usually only establish or measure one aspect or part of automaticity. Another theoretical advantage (although one might call it a challenge) is to raise the question regarding the way in which these various features relate to one another, whether they recommend the breaking of automaticity into degrees or distinct kinds, and how these degrees or features relate to the degree or kind of control that can be exerted on a certain process. Leaving these larger issues aside, here we are interested in determining whether crossmodal correspondences might satisfy all, or a subset, of these criteria, or satisfy them to varying degrees. The first of the criteria (the goal-independence criterion) eliminates goal-dependent or strategic processes from being categorised as automatic. A goal-directed process can be defined as one in which a person engages with the intention of pursuing a particular goal and over which s/he will exert a degree of control in terms of whether or not that goal is achieved. An automatic qua goal-independent process, then, has to be non-intentional: An individual cannot voluntarily prevent an automatic process from taking place. It also has to be out of the individual’s cognitive control. Note here that these two aspects can come apart: That is, a process can be automatic in the sense that an individual cannot prevent it from occurring but s/he may still be able to exert some control over it once it has started (cf. Mattingley, 2009; Ward, 2012). In terms of empirical testing, the intentionality of a process can be assessed by demonstrating that it only occurs when participants are instructed (or decide) to engage in a certain task. Assessing the controlled vs. uncontrolled character of a process is rather more complicated: For one thing, no process seems to be totally uncontrollable. 
Even classic involuntary responses, such as, for example, the knee-jerk, turn out to be under at least some degree of voluntary control (Matthews, 1991). Methodologically, then, the most appropriate solution here is, by default, to consider a process as non-controlled unless there are clear signs that the process cannot be completed without monitoring or evidence of control (see Moors & De Houwer, 2006, for further discussion of this point). The non-conscious criterion at first looks to be closely related to the goal-independent criterion, at least in the sense that by intentional and controlled, one often means consciously intentional and under conscious control. However, there are reasons to believe that unconscious control and unconscious decisions also exist, and hence to try and draw a distinction

Table 1
Summary of recent studies relevant to assessing claims regarding the automaticity of crossmodal correspondences (see text for details). Each entry lists: study; crossmodal correspondence with auditory pitch; task; whether the crossmodal correspondence affected performance; auditory pure tone stimuli (and their duration).

Klein, Brennan, D'Aloisio, et al. (1987) and Klein, Brennan, and Gilani (1987): visual elevation; speeded detection; No; rising (700–1200 Hz) vs. declining (900–400 Hz), 250 ms.
Evans and Treisman (2010): visual elevation, visual size, visual spatial frequency; speeded classification; Yes; 1000 vs. 1500 Hz, 120 ms.
Mossbridge et al. (2011): visual elevation; Go/No Go; Yes; two ascending tones (300–450 Hz or 450–600 Hz) vs. two descending tones (450–300 Hz or 600–450 Hz).
Chiou and Rich (2012a): visual elevation; speeded detection; Yes/No/Yes (across experiments); 300 vs. 1500 Hz, 200 ms; 300 vs. 400 Hz, 200 ms; 100 vs. 900 Hz and 900 vs. 1700 Hz, 200 ms; pure tone, 250 vs. 2500 Hz, 50 ms.
Fernández-Prieto et al. (2012): visual elevation; speeded detection; Yes; rising (200–700 Hz) vs. falling (700–200 Hz) tone, 200 ms.
Klapetek et al. (2012): visual brightness; speeded visual search; Yes; 250 vs. 2000 Hz, 60 ms.
Parise and Spence (2012): visual size, visual angularity; speeded IAT; Yes/Yes; 300 vs. 4500 Hz, 300 ms.
Heron et al. (2012): visual spatial frequency; speeded detection (asynchrony); No; 500 vs. 2000 Hz, 20 ms.

between the consciousness and goal-dependent aspects of automaticity. There are at least two ways in which to assess the non-conscious character of a given process: In a strong version, the non-conscious character comes from demonstrating that a process is pre-attentive. The mere presence of the stimulus (or target) is sufficient to start the process, while awareness of its presence is unnecessary. This can, for example, be tested by looking for pop-out effects in visual search paradigms (Mattingley, 2009; Treisman, 2005; Ward, Jonas, Dienes, & Seth, 2010; though see Mack & Rock, 1998). In a weaker version, at stake in a variety of tasks such as the IAT (Greenwald, McGhee, & Schwartz, 1998) and its variants (e.g., Demattè, Sanabria, & Spence, 2007; Parise & Spence, 2012; Peiffer-Smadja, 2010), all that is needed for a process to count as non-conscious is for it to occur without the participant’s conscious volition or control once the stimulus/target has been attended to (cf. Chen, Yeh, & Spence, 2011). Indeed, this seems to be very much the reasoning behind Evans and Treisman’s (2010) assertion that the various audiovisual correspondences that they studied were automatic. Their claim was that since the crossmodal correspondence between the auditory and visual stimuli affected participants’ performance even when their presence was completely irrelevant to a participant’s task (that is, they occurred without monitoring, to use Tzelgov’s, 1997, terminology) it therefore meant that they were unconscious and non-intentional. According to the load-insensitivity criterion, a process is automatic if it is not hindered when the simultaneous information load goes up – such as, for example, when the perceptual load of a participant’s task is increased (Lavie, 2005). 
This is usually assessed by means of performance in dual-task interference paradigms: For example, by investigating whether performance in one task is affected by varying the perceptual resources that simultaneously need to be allocated by a participant to a second task (Alsius et al., 2005, 2007; Eramudugolla, Kamke, Soto-Faraco, & Mattingley, 2011; Helbig & Ernst, 2008; Santangelo & Spence, 2008; Spence, 2010). According to the fourth and final criterion, the speed criterion, a process is more likely to be automatic if it can be demonstrated that it affects the very earliest stages of information processing. Hackley (1993), for example, suggests that information processing is strongly automatic until 15 ms for audition and about 80 ms for vision (see Moors & De Houwer, 2006, for further discussion of this criterion). In summary, the evidence reviewed in this section supports two conclusions: (1) Researchers disagree about whether crossmodal correspondences are automatic or not; and (2) While difficult to define, four criteria (goal-independence, non-conscious, load-insensitivity, and speed) seem critical to evaluating claims that a particular cognitive process, or phenomenon, is automatic. That said, the speed criterion appears to be the weakest of the four criteria, and so should perhaps be weighted as somewhat less important than the others when it comes to assessing the automaticity of a given cognitive process. With these various criteria in mind, we can now turn to the empirical evidence concerning audiovisual crossmodal correspondences. As mentioned already, we will focus on those correspondences that have been postulated to be statistical in origin. The studies reviewed here have put forward somewhat different conclusions regarding the automaticity question (in part, or so we will try to show below, because the researchers concerned have been using the term 'automaticity' to mean rather different things).


3. Reviewing the evidence concerning the automaticity of crossmodal correspondences

3.1. Goal independence and intentionality

One crossmodal correspondence that has attracted perhaps more research interest than any other is the crossmodal mapping of auditory pitch onto visual elevation (see Ben-Artzi & Marks, 1995; Bernstein & Edelstein, 1971; Evans & Treisman, 2010; Klein, Brennan, D'Aloisio, et al., 1987; Klein, Brennan, & Gilani, 1987; Melara & O'Brien, 1987; Miller, 1991; Patching & Quinlan, 2002; Pedley & Harper, 1959). Specifically, people appear to associate higher-pitched sounds with higher visual stimuli and lower-pitched sounds with lower visual stimuli. People have also been shown to associate auditory stimuli with an ascending pitch with higher positions than auditory stimuli whose pitch descends (e.g., Fernández-Prieto, Vera-Constán, García-Morera, & Navarra, 2012; Mossbridge et al., 2011).

A recent study relevant to the question of the automaticity of crossmodal correspondences investigated this mapping between auditory pitch and visual elevation. In particular, Chiou and Rich (2012a) conducted a series of exogenous spatial crossmodal cuing studies (see Spence, 2010, for a review) in which the participants had to make simple speeded detection responses to a series of visual targets presented randomly from either above or below a central fixation point. Shortly before the presentation of each target, an auditory cue was presented from the centre of the display. The idea was that if higher-pitched sounds are associated with higher spatial locations, then an exogenous crossmodal spatial cuing effect might be observed (such that participants would detect upper targets more rapidly following the presentation of a higher-pitched than a lower-pitched auditory cue). The results (see Fig. 1) revealed that participants responded significantly more rapidly to the upper visual targets following the presentation of the higher-pitched auditory cue (1500 Hz), whereas lower visual targets were detected more rapidly following the presentation of lower-pitched auditory cues (300 Hz).

Previous research has revealed that exogenous spatial cuing effects usually dissipate within a few hundred milliseconds of the presentation of a lateralised auditory cue (see Spence, 2010; Spence, McDonald, & Driver, 2004, for reviews). Interestingly, however, Chiou and Rich's (2012a) data did not reveal a significant interaction between Congruency and SOA. This null result led the authors to suggest that the effect of spatial attention on participants' performance was just as large across the whole range of SOAs tested. That said, closer inspection of their data (see Fig. 1) suggests that there might have been a different pattern of results (namely, the absence of a spatial cuing effect) at the shortest SOA tested (0 ms). Nevertheless, these results do provide some of the first empirical evidence that the crossmodal mapping of pitch to elevation can affect the spatial allocation of a participant's exogenous attention.

Similar results have since been reported by Fernández-Prieto et al. (2012) in a study in which a rising (from 200 to 700 Hz) or falling (from 700 to 200 Hz) pitch auditory cue was presented for 200 ms prior to a visual target located in one of four positions arranged in a square around fixation (see also Mossbridge et al., 2011). While the 4 ms spatial cuing effect failed to reach statistical significance at the shorter SOA (400 ms) tested in this study, it did at the longer interval (550 ms; the cuing effect was approximately 7 ms at this SOA).
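The cuing effects reported in these studies are simply mean RT differences between crossmodally incongruent and congruent trials, computed separately at each SOA. As a minimal illustration of such an analysis (the trial counts, effect size, and column names below are fabricated for the sketch, not taken from any of the studies discussed):

```python
import numpy as np
import pandas as pd

# Hypothetical single-participant trial data: SOA (ms), cue-target
# congruency, and reaction time (ms). A real analysis would compute
# per-participant means before running inferential statistics.
rng = np.random.default_rng(0)
n = 400
trials = pd.DataFrame({
    "soa": rng.choice([100, 300, 500, 700, 900], size=n),
    "congruent": rng.choice([True, False], size=n),
})
# Simulate a small (~8 ms) congruency benefit on top of RT noise.
trials["rt"] = 350 + rng.normal(0, 40, size=n) - 8 * trials["congruent"]

# Cuing effect at each SOA = mean incongruent RT - mean congruent RT.
means = trials.groupby(["soa", "congruent"])["rt"].mean().unstack()
cuing_effect = means[False] - means[True]
print(cuing_effect.round(1))
```

Testing whether such an effect varies across SOAs is what the Congruency × SOA interaction mentioned above assesses.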
Once again, the participants in this study had to make a speeded simple detection response.6

In another of Chiou and Rich's (2012a) experiments, the pitch of the auditory cue was made informative with regard to the likely elevation of the visual target. In particular, a low-pitched (250 Hz) auditory cue predicted that the visual target would be presented from the upper target location on the majority (80%) of trials, whereas the presentation of the high-pitched tone (2500 Hz) predicted that the target would likely appear in the lower position instead. The participants were told about the meaning of the cue (and the probabilities concerned). The results (see Fig. 2) revealed that participants' attention was directed to the likely (rather than the crossmodally corresponding) target location, though this reversal of the spatial cuing effect (as compared to that reported in Chiou and Rich's other experiments) took some time to emerge. Chiou and Rich argued that the latter results demonstrated that the triggering of the crossmodal correspondence between auditory pitch and visual elevation was under the participants' voluntary attentional control, and hence not 'automatic'. In the context of the present review, we would add here that this pattern of results also implies that the processing of crossmodal correspondences is, to some degree, goal-dependent. However, it could be argued that there are potentially two influences on participants' performance: One is the endogenous (or voluntary) shift of attention that is triggered by the informative cue; the other is an exogenous (or stimulus-driven) shift of attention elicited by the natural crossmodal correspondence between relative pitch and elevation.

When endogenous and exogenous attention pull in opposite directions, as in Chiou and Rich's (2012a) experiment, it may simply be that the stronger of the two (apparently the endogenous effect) overrides the other (the exogenous effect) in terms of determining where a participant's attention is ultimately allocated (see Chica, Sanabria, Lupiáñez, & Spence, 2007). Crucially, though, such a result should not be taken to demonstrate that the exogenous, or natural, crossmodal mapping does not exert any influence on a participant's behaviour. In order to support such a claim, one would further need to demonstrate that there was no difference in the time-course and magnitude of spatial cuing effects under those conditions in

6 One unfortunate limitation with all of Chiou and Rich's (2012a) studies (as well as with similar studies reported by Fernández-Prieto et al. (2012), Klein, Brennan, D'Aloisio, et al. (1987), and Klein, Brennan, and Gilani (1987)) is that the use of a speeded simple detection response paradigm (even one with catch trials) means that it is impossible to rule out a criterion-shifting explanation of any spatial cuing effects that were observed (see Spence & Driver, 1997). Hence, one cannot know for sure whether any crossmodal cuing effects reported in these studies were decisional or perceptual in nature (cf. Mossbridge et al., 2011).


Fig. 1. Results of Chiou and Rich’s (2012a; Experiment 1) crossmodal cuing study. Mean reaction times (RTs) are plotted as a function of the SOA and crossmodal correspondence between the pitch of the centrally-presented auditory cue and the position of the visual target (congruent = open symbols; incongruent = filled symbols). (A) Null crossmodal spatial cuing effect observed when the visual targets were presented to the left or right of fixation; (B) Significant crossmodal spatial cuing effect observed when the visual targets were presented from above or below fixation instead. Error bars represent 1 SEM. (Reprinted with permission from Chiou and Rich (2012a, Fig. 2).)

which the participants' exogenous and endogenous attention pulls in the same vs. opposite direction (see Klein, Brennan, D'Aloisio, et al., 1987; Klein, Brennan, & Gilani, 1987). Should there be such a difference between these conditions, it could perhaps be accounted for in terms of a short-lasting exogenous cuing effect triggered by the presentation of the cue that is then overridden by the top-down (or strategic) allocation of a participant's spatial attention. Therefore, until such empirical data have been collected, the question of whether or not some automatic coding of crossmodal correspondence takes place remains unresolved. Indeed, in the only study we know of to have compared performance in these two conditions, participants did find it easier to direct their attention when the informative value of a rising or falling pitch tone (regarding the likelihood of a visual target appearing above or below fixation) matched the natural crossmodal mapping than when the two were put into opposition (Juckes & Klein, unpublished; Klein & Juckes, 1989), as in Chiou and Rich's (2012a) study.

In conclusion, the evidence reviewed in this section suggests, but does not unequivocally support, the claim that the audiovisual crossmodal correspondence between auditory pitch and visual elevation is goal-dependent. In the next section, we continue to evaluate the goal-independence criterion in light of the problem of stimulus salience.

3.2. The problem of stimulus salience

Klapetek et al. (2012) investigated whether the pitch of an auditory cue would affect participants' performance in a version of the 'pip-and-pop' visual search task (Ngo & Spence, 2010; Van der Burg, Olivers, Bronkhorst, & Theeuwes, 2008). Participants in this study had to search for a horizontal or vertical target bar amongst displays containing 23, 35, or 47 tilted (22° from the horizontal or vertical) distractor items (see Fig. 3).
The brightness of the target and distractors changed frequently (between light and dark grey, against a mid-grey background) in a seemingly random manner during

Fig. 2. Results of Chiou and Rich's (2012a; Experiment 4) study of the effects of endogenous attentional orienting and crossmodal correspondence on the magnitude of spatial cuing effects observed in the vertical dimension. Mean RTs are plotted as a function of the SOA and crossmodal correspondence between the elevation of the visual target and the pitch of the centrally-presented auditory cue (crossmodally congruent = open symbols; crossmodally incongruent = filled symbols). (A) The results when the pitch of the centrally-presented auditory cue was non-predictive with regard to the location (upper vs. lower) of the visual target (essentially replicating the results shown in Fig. 1B). (B) The results when the pitch of the auditory cue was made predictive of the opposite location (opposite in the sense of contradicting the natural crossmodal mapping). Error bars represent 1 SEM. (Reprinted with permission from Chiou and Rich (2012a, Fig. 5).)


the course of each trial. The brightness of the visual target switched between light and dark grey, or vice versa, once every second or so. On two thirds of the trials, a task-irrelevant sound with an alternating pitch was presented in time with the changing lightness of the visual target. The auditory stimulus could either be crossmodally congruent with the visual target (the lower-pitched 250 Hz sound synchronised with the presentation of the darker target, and the higher-pitched 2000 Hz tone presented in synchrony with the brighter target) or incongruent (with the mapping reversed). On the remaining third of trials, no auditory cue was presented. The participants had to make a speeded manual discrimination response regarding the orientation of the visual target (i.e., the changing brightness of the target was irrelevant to their task). The results demonstrated that the presentation of the auditory cue resulted in a significant facilitation of participants' target discrimination latencies (though, interestingly, no change in the search slope). Crucially, however, this crossmodal benefit was not modulated by the crossmodal correspondence between the pitch of the sound and the brightness of the visual target (see Fig. 4A).

In a second experiment, however, Klapetek et al. (2012) investigated whether informing the participants about the crossmodal correspondence that existed between the stimuli presented in the auditory and visual modalities, together with blocking the presentation of the crossmodally congruent and incongruent trials (as compared to the random presentation of congruent and incongruent trials in their Experiment 1), would affect the pattern of results that was obtained. Interestingly, under such conditions, the crossmodal correspondence between the auditory cue and the visual target exerted a significant modulatory effect on participants' performance (see Fig. 4B).
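The 'search slope' referred to here is the slope of the function relating search RT to the number of items in the display: a sound can speed responses overall (lowering the intercept) without making search any more efficient per item (leaving the slope unchanged). A sketch of how such slopes are estimated, using fabricated mean RTs rather than Klapetek et al.'s actual data:

```python
import numpy as np

# Hypothetical mean RTs (ms) at each set size for two conditions.
# Both series are built with the same 20 ms/item slope but different
# intercepts, mimicking an overall facilitation without a slope change.
set_sizes = np.array([24, 36, 48])
rt_no_sound = np.array([1450.0, 1690.0, 1930.0])
rt_with_sound = np.array([1330.0, 1570.0, 1810.0])

for label, rts in [("no sound", rt_no_sound), ("with sound", rt_with_sound)]:
    # Least-squares line: RT = slope * set_size + intercept.
    slope, intercept = np.polyfit(set_sizes, rts, deg=1)
    print(f"{label}: slope = {slope:.1f} ms/item, intercept = {intercept:.0f} ms")
```

With these fabricated values, both conditions yield a 20 ms/item slope while the intercepts differ by 120 ms, which is the pattern described above.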
Note, here, that exactly the same experimental stimuli were used in both of Klapetek et al.'s studies; all that changed were the instructions given to the participants and the blocking of the trial types. Taken together, Klapetek et al.'s (2012) results therefore provide further evidence (this time from a visual search task) that the crossmodal correspondence between auditory pitch and visual brightness isn't solely stimulus-driven. This, however, still does not demonstrate that the process underlying the effect of crossmodal correspondence is conscious, controlled, and/or goal-dependent. Another way in which to interpret the influence of explicit instruction on participants' performance is that it simply serves to make a certain correspondence more salient to the participant (cf. Chiou & Rich, 2012a).

Here, there is potentially an analogy with what happens when people look at images such as the famous black-and-white picture of the Dalmatian dog hidden amongst a ground of leaves (see Life Magazine, 19th February, 1965, p. 120; Marr, 1982, p. 101, Fig. 3.1; or Ahissar & Hochstein, 2004, for other examples). Initially, many people will fail to recognise the dog. However, once they have 'seen' it, then whenever they subsequently look at the picture they will seemingly 'automatically' see the animal. In a sense, then, the viewer's response to the picture is not purely stimulus-driven, since information, or time spent inspecting the figure, can change the way in which they process/respond to it. However, once in that perceptual state, the dog is seen automatically, and awareness of its presence cannot be suppressed voluntarily.

Although perceptual salience can be modulated by top-down processes (Theeuwes, 2010), its core comes from bottom-up, stimulus-driven signals indicating that a certain location or element in a display is sufficiently different from its surroundings to be worthy of attention (though see also Awh, Belopolsky, & Theeuwes, 2012).
It is important, then, to try to understand which differences in the audiovisual context make the corresponding dimension salient – especially given the multidimensional nature of both the auditory and the visual percepts. Relevant here are the results of another of Chiou and Rich's (2012a) experiments, in which they demonstrated that if the magnitude of the frequency difference between the high- and low-pitched auditory cues (presented, once again, in the context of an exogenous spatial cuing study) was reduced from 1200 Hz down to only 100 Hz, then the crossmodal correspondence between the pitch of the auditory cue

Fig. 3. Visual search display with 36 stimuli used in Klapetek et al.’s (2012) studies investigating the effect of the crossmodal correspondence between auditory pitch and the lightness of a visual target in a variant of the ‘pip-and-pop’ visual search task. The visual target (a vertically-oriented bar) is highlighted by a dotted yellow circle (not present in the actual experiment). (Figure reprinted with permission from Klapetek et al. (2012, Fig. 1).) (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)


Fig. 4. RTs as a function of sound (congruent, incongruent, no sound) and set size (24, 36, 48) in Klapetek et al.’s (2012) recent study. (A) The crossmodal correspondence between the changing pitch of the auditory stimulus and the changing brightness of the visual target was varied on a trial-by-trial basis in Experiment 1, but failed to modulate participants’ performance on the visual search task. (B) However, in a second experiment, in which the congruent and incongruent trials were blocked, and where the participants were also informed about the crossmodal correspondence at the start of each block of trials, a significant effect of crossmodal correspondence was observed. The error bars represent the standard errors of the means for each combination of the two factors. (Figure reprinted with permission from Klapetek et al. (2012, Fig. 2).)

and the elevation of the visual target no longer influenced participants' behavioural responses. That is, the effect of the congruent/incongruent crossmodal mapping between auditory pitch and visual elevation disappeared across the whole range of SOAs (from 100 to 900 ms) that were tested.

Chiou and Rich (2012a) also reported that whether the auditory cue directed the participants' attention to the upper or lower visual field depended on stimulus context, or rather, on the range of stimuli that were presented within a block of trials. Thus, while the presentation of a 900 Hz pure tone led to an upward shift of participants' spatial attention when the other tone in a block of trials was 100 Hz, it gave rise to a borderline-significant trend toward a downward shift of attention when the very same tone was presented amongst 1700 Hz tones instead. Such results confirm previous suggestions that the effects of crossmodal correspondence on a participant's behaviour appear to be determined in a relative rather than an absolute manner (see Deroy & Spence, in press; Gallace & Spence, 2006; Marks, 1987; Pedley & Harper, 1959; Spence, 2011; though see also Guzman-Martinez, Ortega, Grabowecky, Mossbridge, & Suzuki, 2012, for evidence that at least certain crossmodal correspondences may show an absolute mapping).7

Now, one can legitimately ask whether these results really do show that the crossmodal shift of spatial attention elicited by the crossmodal correspondence between the pitch of the auditory cue and the elevation of the subsequently-presented visual target is 'underpinned by voluntary attention' (Chiou & Rich, 2012a, p. 348).8 Here we would like to argue that it is not altogether clear what necessary implications the demonstration that an effect is relative and/or context-dependent has in terms of determining whether or not it is voluntary. It could be argued that these are somewhat orthogonal debates (see also Sperber, 2005).

What Chiou and Rich's results more minimally demonstrate is, once again, that the crossmodally corresponding dimensions need to be salient to the participant, and that this salience is determined (in part) by the context of the experiment.

In summary, then, although the research reviewed in this section (and the last) appears to show that the processing of crossmodal correspondences is goal-dependent, and only operates in a strategic, top-down manner, the role of instruction can be more minimally interpreted as a way in which to make the dimensions on which the crossmodal correspondence operates perceptually salient to the participant: Once those dimensions have been made salient, the crossmodal correspondence might then operate in a manner that is both automatic and goal-independent.

3.3. Speed

Recently, Parise and Spence (2012) reported a series of five experiments in which they tested various examples of crossmodal correspondences (including several different examples of sound symbolism) between auditory and visual stimuli. The participants in these experiments had to make speeded manual discrimination responses to a random sequence of unimodal auditory and visual target stimuli. So, for example, in one experiment, the participants had to press one key in response to the presentation of the smaller of two target circles and to the higher-pitched of two tones, while pressing the other response key whenever they were presented with either the larger circle or the tone with the lower pitch (see Fig. 5). Meanwhile, in

7 Here one might wonder whether it is the general context that matters, or rather, the context of the transition from the sound presented on the immediately preceding trial (cf. Spence, Nicholls, & Driver, 2001). It is certainly possible to imagine a study of crossmodal correspondences in which a low-, medium-, and high-pitched auditory cue were to be presented randomly on each trial. One could then isolate those trials on which the medium-pitched sound was presented, and look at whether that tone behaves like a high-pitched sound if the tone on the immediately preceding trial was lower in pitch, but like a low-pitched sound if the auditory cue on the immediately preceding trial was high-pitched instead.

8 In full, Chiou and Rich's (2012a, p. 348) argument is as follows: ''The finding that contextual relative pitch determines the direction of attention shifts implies that substantial mental processes after early auditory encoding mediate the pitch cuing effect. This leads us to suspect, despite pitch causing a shift in attention when it is non-predictive, the effect may be underpinned by volitional attention.''


other blocks of experimental trials, the mapping of the auditory and visual stimuli to the two response keys was reversed (so that one response key was associated with larger circles and higher-pitched tones while the other response key was associated with smaller circles and lower-pitched tones). The participants in this study were instructed to respond as rapidly and accurately as possible.

The results of all five of Parise and Spence's (2012) experiments revealed a significant compatibility effect, with faster (and somewhat more accurate) responses being recorded in those putatively congruent blocks of trials than in those blocks of trials that were expected (according to the findings of previous research) to be crossmodally incongruent instead. What is more, these crossmodal correspondence effects were present in even the fastest of participants' responses (as revealed by a bin analysis showing that the crossmodal correspondence between the auditory and visual stimuli even influenced those responses that were elicited within 400 ms of stimulus onset). This was the result that led Parise and Spence to argue against a strategic account of such results, and instead, like many before them, to suggest that crossmodal correspondences affect performance in a seemingly automatic manner.

While one can certainly question whether the results of the IAT (and its variants) necessarily provide evidence of the implicit processing of the crossmodal correspondence between the auditory and visual stimuli that the participant is instructed to respond to (e.g., see De Houwer, Teige-Mocigemba, Spruyt, & Moors, 2009), Parise and Spence's (2012) results are, in the context of the present review, nevertheless still interesting for several reasons.
First, because the modulation of participants' performance that resulted from the crossmodal correspondence between the stimuli associated with a particular response key occurred under conditions in which only a single unimodal target stimulus was presented on each trial. Such results therefore demonstrate that crossmodal congruency can impact on participants' performance when an explanation of the effect in terms of a bias in selective attention (to one or other stimulus; Marks, 2004) can effectively be ruled out (since only a single stimulus was presented on each trial, and hence there was no stimulus-driven competition for a participant's attention). Second, Parise and Spence's results also rule out an account of crossmodal correspondences solely in terms of multisensory integration. To be clear, since only a single unisensory stimulus was presented on each trial, there was presumably no opportunity for multisensory integration to have influenced participants' performance in any of Parise and Spence's experiments (see Ngo & Spence, 2012; Spence & Ngo, 2012b).

However, a potential problem arises when taking Parise and Spence's (2012) results to support the automaticity of crossmodal correspondences. They based their assertion regarding automaticity on the fact that the influence of crossmodal correspondences was evident in even the fastest of their participants' behavioural responses. 'Fast' in this case, though, actually meant manual responses that were initiated within 400 ms of the onset of the target. Now, while such results are certainly consistent with the speed criterion of automaticity, it has to be said that they do not provide particularly strong support for it; after all, a lot happens within the first few hundred milliseconds of information processing (cf. Fiebelkorn, Foxe, & Molholm, 2010, 2012; Horowitz, Wolfe, Alvarez, Cohen, & Kuzmova, 2009).
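A bin analysis of the kind described above divides each condition's RT distribution into quantile bins and computes the congruency effect separately within each bin, so that one can check whether the effect is already present among the very fastest responses. A schematic version (with simulated rather than real data; a 20 ms distributional shift between conditions is built in by construction):

```python
import numpy as np

rng = np.random.default_rng(1)

def bin_means(rts, n_bins=5):
    """Mean RT within each quantile bin of a single condition."""
    edges = np.quantile(rts, np.linspace(0, 1, n_bins + 1))
    return np.array([rts[(rts >= lo) & (rts <= hi)].mean()
                     for lo, hi in zip(edges[:-1], edges[1:])])

# Simulate a 20 ms congruency benefit that shifts the whole RT
# distribution, rather than merely slowing the slowest responses.
congruent = rng.normal(450, 80, 2000)
incongruent = rng.normal(470, 80, 2000)

# Congruency effect computed bin by bin, fastest bin first.
effect_per_bin = bin_means(incongruent) - bin_means(congruent)
print(np.round(effect_per_bin, 1))
```

Because the simulated effect is a pure shift, it shows up in every bin, including the fastest; a purely strategic slowing confined to long RTs would instead produce an effect that grows across bins.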
Of course, the speed criterion is only one aspect of automaticity, and many very fast effects can be voluntary/optional. That said, much stronger evidence concerning how early in information processing crossmodal correspondences exert their effect could potentially come from neuroimaging studies. It is to the results of these that we turn next.

3.3.1. The cognitive neuroscience of crossmodal correspondences

Over the last couple of years or so, researchers have started to investigate crossmodal correspondences using various different cognitive neuroscience techniques (e.g., Bien et al., 2012; Kovic, Plunkett, & Westermann, 2010; Peiffer-Smadja, 2010;

Fig. 5. The stimulus–response mapping used and the results obtained in one of Parise and Spence’s (2012) recent studies using a variant of the IAT to assess the consequences of varying the stimulus–response mapping between auditory pitch and visual size. The results demonstrate that participants found it easier to pair crossmodally corresponding stimuli with the same response key (e.g., the large circle with the lower-pitched sound). (This figure is reprinted with permission from Parise and Spence (in press, Fig. 3a and b).)


Seo et al., 2010; see also Nahm, Tranel, Damasio, & Damasio, 1993). The results of such research hold the potential to provide more robust evidence regarding the time-course of crossmodal correspondences (especially if the earliest neuronal responses were to be affected by the crossmodal congruency between auditory and visual stimuli). So what do the data show?

Sadaghiani et al. (2009) reported an fMRI study in which a rising/falling pitch sound was presented while participants were trying to determine whether an ambiguous visual motion display was drifting upwards or downwards. The crossmodal correspondence between the auditory pitch change and the direction of visual motion modulated neural activity both in visual motion areas (specifically, the left human motion complex; hMT+/V5+) and in higher areas (specifically, the right intraparietal sulcus). That said, it was unclear from the results of this study whether this modulation of activity in the visual motion areas resulted from feed-forward vs. feedback interactions. Hence, although certainly intriguing, Sadaghiani et al.'s results do not provide any strong evidence regarding just how 'early' the influence of crossmodal correspondences can be picked up in neural information processing. A similar limitation also affects the interpretation of Peiffer-Smadja's (2010) fMRI study, in which the author looked at the crossmodal correspondence between speech sounds and angularity.

More interesting here are the results of a combined TMS, EEG, and psychophysics study reported by Bien et al. (2012), in which the participants were simultaneously presented with a small or large circle and either a lower- or higher-pitched sound. The lateral separation between the auditory and visual stimuli was varied, and participants had to try and judge whether the sound was presented to the left or to the right of the visual stimulus (this study was based on an earlier study by Parise & Spence, 2009).
On the basis of this combined ERP and repetitive TMS study, the authors concluded that the first neural signs associated with a distinction between congruent and incongruent stimulus pairings started around 250 ms after stimulus onset in the right intraparietal sulcus (and were identified with the parietal P2).9 Note that such a temporal signature would not normally qualify a process as being fast. Here it should also be noted that the two other published ERP studies of crossmodal correspondences that we are aware of, although studying very different crossmodal correspondences, have come to roughly the same conclusion – namely, that the effect of crossmodal correspondence (matching vs. mismatching) on ERPs first emerges around 150–200 ms after stimulus onset (see Kovic et al., 2010; Seo et al., 2010). While Kovic et al. described their ERP results as showing an 'early' multisensory effect, it is worth noting that many other examples of ERP differences between matching and mismatching pairs of auditory and visual stimuli have been documented much earlier in neural information processing (e.g., at 40 ms in Giard & Peronnet, 1999; see also Molholm, Ritter, Javitt, & Foxe, 2004; Molholm et al., 2002). The involvement of the Lateral Occipital Complex (LOC), the angular gyrus (located within the temporal–parietal–occipital, TPO, region; Ramachandran & Hubbard, 2003), and even prefrontal cortex (in the bouba-kiki effect; Peiffer-Smadja, 2010) in other crossmodal correspondence studies also points to an effect that does not modulate the very earliest stages of neural information processing (see also Fiebelkorn et al., 2010, 2012). At present, it is unclear to what extent any differences in the patterns of neural activation should be attributed to differences in the tasks used or to the correspondences investigated in these recent studies.
In conclusion, although crossmodal correspondences can affect the fastest of a participant's behavioural responses, their influence does not necessarily appear to impact on the earliest of their neural responses, and hence should not be taken as providing strong support for the speed criterion. Note, though, that, contrary to other criteria such as the conscious criterion, which has a clear-cut satisfaction condition (a process has to be either conscious or not), the notions of 'early' and 'fast' are relative, so that the conclusion should be that crossmodal correspondences are not among the earliest influences on human information processing. Furthermore, given the lack of a clear-cut satisfaction condition on the speed criterion, this criterion should perhaps be weighted somewhat less heavily than the others when it comes to assessing the automaticity or otherwise of a cognitive process.

4. On automaticity and different kinds of crossmodal correspondence

As has become clear over the course of this review, the fact that no single answer emerges from these studies concerning the automaticity of crossmodal correspondences is a sign that there is no simple answer to this question. Besides differences in the kind of criteria and standard of automaticity one chooses to focus on, and differences in the experimental task used to test it, notable differences can be introduced by the selection of a particular crossmodal correspondence for testing.
First, when it comes to comparing studies that apparently focus on the same crossmodal relation, and that have utilised seemingly identical tasks, as is the case, for example, for the crossmodal correspondence between pitch and elevation (Chiou & Rich, 2012a; Klein, Brennan, D'Aloisio et al., 1987; Klein, Brennan, & Gilani, 1987), there could be subtle differences attributable to the particular stimuli (and, more importantly, the range of stimuli) that researchers happen to have used (see Table 1), which bear on the salience of the relevant dimensions. Pushing this hypothesis one step further, there might even be a difference between the crossmodal correspondence holding between pure tones and elevation (Chiou & Rich, 2012a) and, say, that holding between rising/descending sounds and elevation (Fernández-Prieto et al., 2012; Jeschonek, Pauen, & Babocsai, in press; Mossbridge et al., 2011), or even rising/descending visual stimuli.

9 TMS over parietal cortex disrupts the elicitation of the concurrent, at least in certain forms of synaesthesia (see Esterman et al., 2006; Muggleton, Tsakanikos, Walsh, & Ward, 2007). Note, though, that the areas that are critically involved in crossmodal correspondences and synaesthesia appear to be different, with TMS over the right parieto-occipital (PO) junction, but not left or right IPS or left PO, knocking out the synaesthetic concurrent in Muggleton et al.'s study.


Pushing this one step further, it is important to stress why the question of there being different kinds of crossmodal correspondences matters. Simply as a methodological caution, it would seem sensible to remember that the conclusions drawn concerning the crossmodal correspondence between, say, auditory pitch and visual brightness by Klapetek et al. (2012) should not necessarily be taken to show that another crossmodal correspondence, such as, for instance, the correspondence between pitch and elevation (Chiou & Rich, 2012a), need necessarily operate in exactly the same manner.

Now, the argument for resisting the tendency to generalise from one type of crossmodal relation to another (even if that seems to be what is done in many discussions of the automaticity of synaesthesia) comes from the more substantial reason that there are differences in the origin and internalisation of the crossmodal correspondences. According to Spence (2011; see also Deroy et al., in press; Sadaghiani et al., 2009), many crossmodal correspondences likely result from an organism picking up on the statistical regularities that are present in the environment, and occur for pairs of stimulus dimensions that happen to be correlated in nature (Spence & Deroy, 2012). Others, though, are likely to involve further structural or internal determinants that fall naturally out of the organisation of the organism's mind/perceptual system. The emerging cognitive neuroscience of crossmodal correspondences has only recently started to distinguish between the different areas involved in different audiovisual correspondences (e.g., Bien et al., 2012; Peiffer-Smadja, 2010; Sadaghiani et al., 2009; Spence & Parise, 2012).
Continuing along these lines, one can further distinguish between correspondences that occur because of a common amodal coding (of, for instance, magnitude, shape, intensity, or space; e.g., Walker-Andrews, 1994; Walsh, 2003; see also Cohen, 1934), and those that occur because of other indirect comparisons or mappings, in terms of their emotional effect (Palmer & Schloss, 2012; Schifferstein & Tanudjaja, 2004) or linguistic/semantic coding (Long, 1977; Martino & Marks, 1999, 2000; Sadaghiani et al., 2009; Smith & Sera, 1992). What's more, other differences in the development of specific crossmodal correspondences might have an effect on their internalisation or robustness, and therefore on the degree of automaticity of their effect: One could, for instance, hypothesise that those crossmodal correspondences that are more 'natural' and show up at very early stages of human development (such as the correspondence between pitch and brightness or pitch and size; see also Lewkowicz & Turkewitz, 1980) are more strongly internalised, and hence more automatically (or at least more rapidly) processed, than others that may only develop later (for instance, those requiring further conceptual development, such as correspondences involving mass; Simner, Harrold, Creed, Monro, & Foulkes, 2009; Smith, Carey, & Wiser, 1985). Or perhaps it is all just a function of exposure, with increased exposure making the crossmodal correspondence more likely to operate from an earlier stage of information processing.
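The exposure-based hypothesis lends itself to a simple quantitative sketch. The snippet below implements a Bush and Mosteller (1951)-style linear-operator learning rule, hypothetically applied to the strength of an internalised crossmodal association; the function name, learning rate, and asymptote are illustrative assumptions of ours, not parameters estimated from any of the studies reviewed here.

```python
# Illustrative sketch (not from the paper): a Bush & Mosteller (1951)-style
# linear-operator rule, hypothetically tracking how repeated exposure to a
# statistical regularity might strengthen a crossmodal association.
# 'alpha' (learning rate) and 'ceiling' (asymptote) are invented values.

def bush_mosteller_update(strength, alpha=0.1, ceiling=1.0):
    """One exposure trial: move association strength a fraction alpha
    of the remaining distance toward the ceiling."""
    return strength + alpha * (ceiling - strength)

strength = 0.0
history = []
for trial in range(50):
    strength = bush_mosteller_update(strength)
    history.append(strength)

# Strength rises quickly at first and then approaches the ceiling
# asymptotically, consistent with the idea that greater cumulative
# exposure could make a correspondence operate earlier in processing.
```

On this toy picture, the difference between early-acquired and late-acquired correspondences would simply be a difference in where along the learning curve a given observer currently sits.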
In any case, further research will certainly need to keep these differences in mind: Instead of looking for a general answer to the automaticity question, researchers should perhaps focus instead on trying to understand what the kinds and degrees of automaticity observed in various tasks tell us about the access to, and control over, variously internalised kinds of correspondence that our minds have.10 In conclusion, the literature reviewed in this section draws attention to the possibility that different answers to the automaticity question may be obtained for different types of crossmodal correspondence (e.g., statistical, structural, and/or semantic). It might even be the case that different answers will be obtained when considering the correspondences that hold between different pairings of sensory modalities.

5. Conclusions

In the present article, we have reviewed a number of recent studies in which crossmodal correspondences have sometimes failed to impact on participants' performance. The interpretation of these failures (or, perhaps better said, null results) has led some researchers to argue that crossmodal correspondences are strategic, goal-dependent, and/or intentional (e.g., Chiou & Rich, 2012a; Klapetek et al., 2012), require conscious control, and operate relatively late in information processing (Fernández-Prieto et al., 2012). Although our purpose in this article has been to qualify this conclusion, it should, at least, be clear by now that the automaticity of the crossmodal correspondences discussed here is not in any way comparable to the sort of automaticity that has been documented in the case of synaesthetic relations. Synaesthesia is largely involuntary (Ward, 2012), meaning that the conscious sensory concurrent is automatically induced by the presentation of the inducing stimulus, just as long as that inducer is processed consciously by the synaesthete.
It has been argued that synaesthesia occurs relatively early in information processing (though see also Simner & Hubbard, 2006), and is largely load-insensitive (see Mattingley, 2009; Treisman, 2005, for reviews). In fact, many researchers take the automaticity of the concurrent to be a key defining characteristic of the condition (Hochel & Milán, 2008; see also Ward, 2012). Note, though, that while some researchers have argued that (pre-attentive) pop-out occurs in the case of synaesthesia, the weight of scientific opinion is now against such claims (Mattingley, 2009; Ward, Jonas, Dienes, & Seth, 2010), leaving synaesthesia as a largely involuntary process. The synaesthetic association of sounds of different pitches with specific elevations in space, as happens in (rare) cases of music-space synaesthesia, has, for instance, been shown to result in automatic processing (at least in the sense of being involuntary and goal-independent) and to lead to compatibility effects in spatial Stroop-like tasks: When both synaesthetic and non-synaesthetic participants are presented with a musical note and have

10 In the future, it may be helpful to try and develop a quantitative account of the acquisition of crossmodal correspondences. Relevant here may be the traditional models of learning, such as those proposed by Estes (1950) and Bush and Mosteller (1951).


to reach for a visual target with the cursor of a mouse, only the synaesthetes are significantly faster when the target appears in a compatible as compared to an incompatible location (see Linkovski, Akiva-Kabiri, Gertner, & Henik, 2012). Whereas we agree with the general notion that crossmodal correspondences do not share the kind of automaticity exhibited by synaesthesia, we have also tried to make clear that there are a number of ways in which crossmodal correspondences might still present a high degree of automaticity. They do not qualify as automatic processes in the strong, or old-fashioned, sense of being systematic or necessary (e.g., Logan, 1985, 1989); this view of automaticity has, though, now largely been abandoned by researchers (see Moors & De Houwer, 2006, for a review). That said, crossmodal correspondences can affect a participant's behaviour without their (conscious) control and in a relatively fast manner (though it should once again be acknowledged that the speed criterion is perhaps the weakest, or least important, of the four criteria when it comes to assessing the automaticity of a given cognitive process). Regarding the intentionality criterion, the evidence that the processing of crossmodal correspondences is goal-dependent and strategic remains weak. Regarding consciousness, we have highlighted how, in the most interesting cases in which crossmodal correspondences appear to be 'activated', researchers seem to need to manipulate the perceptual salience of the relevant dimensions, primarily by means of the selection process involved or, on occasion, the verbal instructions given to participants (Klapetek et al., 2012). We might also ask whether the differences in the results of different studies of crossmodal correspondences might not also be explained by certain audiovisual correspondences (e.g., those operating between pitch and elevation) being more robust or salient than others.
In this regard, it would be useful to vary the kind of audiovisual correspondence tested across the paradigms reviewed here, for instance, by contrasting the results obtained with high-pitch-brightness vs. high-pitch-size vs. high-pitch-elevation mappings within the same experimental paradigm (cf. Evans & Treisman, 2010; Parise & Spence, 2012). When it comes to future investigation, it is also worth bearing in mind that researchers studying the automaticity of crossmodal correspondences have typically used contrasting pairs of stimuli in different sensory modalities, together with either explicit or implicit matching tasks. Resorting to pairs of stimuli (that vary along a single dimension) is, however, perhaps not always the most appropriate strategy, as the task can appear surprising and hence might lead participants to adopt a more reflective strategy, for instance, reasoning by analogy ('the more on the pitch scale should be matched with the more on the brightness scale'; see, e.g., Martino & Marks', 1999, 'semantic coding hypothesis'). If that is really what is happening, it could be argued that the results of such studies might not reflect internalised or learned crossmodal correspondences. As such, the presentation of only a single stimulus in at least one of the sensory modalities might be a good way in which to investigate those correspondences that have been internalised (see Guzman-Martinez et al., 2012). Finally, further similarities and differences between the automaticity of crossmodal correspondences, other crossmodal priors, and synaesthesia may emerge if one were to look at the load-insensitivity of crossmodal correspondences, and their pre-attentive character. So, for example, it would be interesting to determine whether increasing the perceptual load in a given modality would necessarily reduce the magnitude of any crossmodal correspondence effects that are observed in a specific experimental paradigm (Chiou & Rich, 2012b; cf.
Helbig & Ernst, 2008; Lavie, 2005; Mattingley et al., 2006; Sweeny et al., 2012). Furthermore, with regard to the non-conscious criterion, it would be interesting to investigate whether visual stimuli (say, a small or a large circle) of which a participant is not aware, because, say, they are presented to one eye under conditions of continuous flash suppression or binocular rivalry (Schwartz, Grimault, Hupé, Moore, & Pressnitzer, 2012; Sweeny et al., 2012), can nevertheless still influence that participant's categorisation of, say, the pitch of a sound (see Marks, 2004; Spence, 2011).

5.1. Closing comments: Where do we stand with respect to the notion of automaticity?

Ultimately, it should now be clear that the traditional dichotomous, all-or-none view of a given cognitive process as being either automatic or not automatic is no longer tenable. Instead, when it comes to evaluating researchers' claims regarding the automaticity of a given cognitive process, or phenomenon, such as crossmodal correspondences, we need to consider the extent to which the various defining criteria (goal-independence, non-consciousness, load-insensitivity, and speed) are met. According to this latter proposal, then, a given cognitive process will turn out to be more-or-less automatic, depending on the number and extent of the defining features that characterise the phenomenon (or criteria that are met). While the latter suggestion might seem an eminently sensible alternative to the traditional dichotomous view, it must be remembered that it leads to its own set of complications: First and foremost, one is immediately faced with the question of how to characterise these ''different degrees of automaticity''. How, for example, should one decide whether or not a given cognitive process (say, crossmodal sensory correspondences) is more or less automatic than another phenomenon (such as, for example, synaesthesia)?
How might one quantify the 'degree' to which each criterion is satisfied, and how should the different criteria be weighted in order to compute some absolute measure of 'overall automaticity'? ''Is so much load-invariance worth so much goal-independence or 'fastness'?'', one might well ask. Here we would like to end by stating our view that defining the common currency with which to quantify the degree of automaticity of a given cognitive process still remains a challenge of the first order. Furthermore, the challenge is likely to remain significant as long as researchers continue using different behavioural paradigms to study different crossmodal correspondences (see above). That said, we leave it for the reader to decide whether or not such difficulties would best be resolved by dispensing with the very notion of 'automaticity' (as suggested by one of the original reviewers of this paper), and perhaps simply focusing instead on the processes that lead to, or control, those patterns of behaviour that we may choose to characterise as being more or less automatic.
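To make the common-currency problem concrete, the following sketch treats each of the four criteria named above as a graded score in [0, 1] and combines them with explicit weights. The scores and weights are invented placeholders (nothing here is estimated from data); the point is simply that the very same criterion profile yields different 'overall automaticity' values under different weightings.

```python
# Hypothetical illustration of the aggregation problem: per-criterion
# automaticity scores (invented values) combined via a weighted mean.

def overall_automaticity(scores, weights):
    """Weighted mean of per-criterion scores, each assumed to lie in [0, 1]."""
    total = sum(weights[c] for c in scores)
    return sum(scores[c] * weights[c] for c in scores) / total

# Placeholder criterion profile for some cognitive process (not measured data).
scores = {
    "goal-independence": 0.7,
    "non-conscious": 0.4,
    "load-insensitivity": 0.5,
    "speed": 0.9,
}

equal_weights = {c: 1.0 for c in scores}
# Downweight speed, reflecting the suggestion above that it is the weakest
# of the four criteria.
speed_discounted = dict(equal_weights, speed=0.25)

a = overall_automaticity(scores, equal_weights)     # equal weighting: 0.625
b = overall_automaticity(scores, speed_discounted)  # lower: speed counts less
```

That `a` and `b` differ for one and the same profile is exactly the difficulty: any single 'overall' number presupposes a principled choice of weights, which is what the field currently lacks.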


Acknowledgment

We would like to thank Cesare Parise for his detailed comments and suggestions on an earlier version of this manuscript.

References

Ahissar, M., & Hochstein, S. (2004). The reverse hierarchy theory of visual perceptual learning. Trends in Cognitive Sciences, 8, 457–464. Alsius, A., Navarra, J., Campbell, R., & Soto-Faraco, S. (2005). Audiovisual integration of speech falters under high attention demands. Current Biology, 15, 1–5. Alsius, A., Navarra, J., & Soto-Faraco, S. (2007). Attention to touch weakens audiovisual speech integration. Experimental Brain Research, 183, 399–404. Awh, E., Belopolsky, A. V., & Theeuwes, J. (2012). Top-down versus bottom-up attentional control: A failed theoretical dichotomy. Trends in Cognitive Sciences, 16, 437–443. Bargh, J. A. (1992). The ecology of automaticity: Toward establishing the conditions needed to produce automatic processing effects. American Journal of Psychology, 105, 181–199. Bargh, J. A. (1994). The four horsemen of automaticity: Awareness, intention, efficiency, and control in social cognition. In R. S. Wyer & T. K. Srull (Eds.). Handbook of social cognition (Vol. 1, pp. 1–40). Hillsdale, NJ: Erlbaum. Belkin, K., Martin, R., Kemp, S. E., & Gilbert, A. N. (1997). Auditory pitch as a perceptual analogue to odor quality. Psychological Science, 8, 340–342. Ben-Artzi, E., & Marks, L. E. (1995). Visual–auditory interaction in speeded classification: Role of stimulus difference. Perception and Psychophysics, 57, 1151–1162. Bernstein, I. H., & Edelstein, B. A. (1971). Effects of some variations in auditory input upon visual choice reaction time. Journal of Experimental Psychology, 87, 241–247. Bertelson, P., Vroomen, J., de Gelder, B., & Driver, J. (2000). The ventriloquist effect does not depend on the direction of deliberate visual attention. Perception and Psychophysics, 62, 321–332. Bien, N., ten Oever, S., Goebel, R., & Sack, A. T. (2012).
The sound of size: Crossmodal binding in pitch-size synesthesia: A combined TMS, EEG, and psychophysics study. NeuroImage, 59, 663–672. Blake, R., Palmeri, T., Marois, R., & Kim, C.-Y. (2005). On the perceptual reality of synesthesia. In L. C. Robertson & N. Sagiv (Eds.), Synesthesia: Perspectives from cognitive neuroscience (pp. 47–73). New York: Oxford University Press. Bremner, A., Lewkowicz, D., & Spence, C. (Eds.). (2012). Multisensory development. Oxford: Oxford University Press. Bush, R. R., & Mosteller, F. (1951). A mathematical model for simple learning. Psychological Review, 58, 313–323. Cabrera, D., & Morimoto, M. (2007). Influence of fundamental frequency and source elevation on the vertical localization of complex tones and complex tone pairs. Journal of the Acoustical Society of America, 122, 478. Calvert, G., Spence, C., & Stein, B. E. (Eds.). (2004). The handbook of multisensory processing. Cambridge, MA: MIT Press. Chen, Y.-C., Yeh, S.-L., & Spence, C. (2011). Crossmodal constraints on human visual awareness: Can auditory semantic context modulate binocular rivalry? Frontiers in Perception Science, 2, 212. Chica, A., Sanabria, D., Lupiáñez, J., & Spence, C. (2007). Comparing intramodal and crossmodal cuing in the endogenous orienting of spatial attention. Experimental Brain Research, 179, 353–364. Chiou, R., & Rich, A. N. (2012a). Cross-modality correspondence between pitch and spatial location modulates attentional orienting. Perception, 41, 339–353. Chiou, R., & Rich, A. N. (2012b). Perceptual difficulty and speed pressure reveal different behavioural effects of voluntary and involuntary attention. Poster session presented at the 39th Australasian Experimental Psychology Conference, Sydney, Australia. Clark, H. H., & Brownell, H. H. (1976). Position, direction, and their perceptual integrality. Perception and Psychophysics, 19, 328–334. Cohen, N. E. (1934). Equivalence of brightness across modalities.
American Journal of Psychology, 46, 117–119. Cohen Kadosh, R., & Terhune, D. B. (2011). Redefining synaesthesia? British Journal of Psychology, 103, 20–23. Crisinel, A.-S., & Spence, C. (2010). As bitter as a trombone: Synesthetic correspondences in non-synesthetes between tastes and flavors and musical instruments and notes. Attention, Perception, and Psychophysics, 72, 1994–2002. Crisinel, A.-S., & Spence, C. (2011). Crossmodal associations between flavoured milk solutions and musical notes. Acta Psychologica, 138, 155–161. Crisinel, A.-S., & Spence, C. (2012). A fruity note: Crossmodal associations between odors and musical notes. Chemical Senses, 37, 151–158. De Houwer, J., Teige-Mocigemba, S., Spruyt, A., & Moors, A. (2009). Implicit measures: A normative analysis and review. Psychological Bulletin, 135, 347–368. Demattè, M. L., Sanabria, D., & Spence, C. (2007). Olfactory-tactile compatibility effects demonstrated using the implicit association task. Acta Psychologica, 124, 332–343. Deroy, O., Crisinel, A., & Spence, C. (in press). Crossmodal correspondences between odours and contingent features: Odours, musical notes, and arbitrary shapes. Psychonomic Bulletin & Review. Deroy, O., & Spence, C. (submitted for publication). Borderline cases of crossmodal experiences. Mind and Language. Deroy, O., & Spence, C. (in press). Why we are not all synaesthetes (not even weakly so). Psychonomic Bulletin & Review. Deroy, O., & Valentin, D. (2011). Tasting liquid shapes: Investigating the sensory basis of cross-modal correspondences. Chemosensory Perception, 4, 80–90. Eramudugolla, R., Kamke, M., Soto-Faraco, S., & Mattingley, J. B. (2011). Perceptual load influences auditory space perception in the ventriloquist aftereffect. Cognition, 118, 62–74. Ernst, M. O. (2007). Learning to integrate arbitrary signals from vision and touch. Journal of Vision, 7(5/7), 1–14. Esterman, M., Verstynen, T., Ivry, R. B., & Robertson, L. C. (2006).
Coming unbound: Disrupting automatic integration of synesthetic color and graphemes by TMS of the right parietal lobe. Journal of Cognitive Neuroscience, 18, 1570–1576. Estes, W. K. (1950). Toward a statistical theory of learning. Psychological Review, 57, 94–107. Evans, K. K., & Treisman, A. (2010). Natural cross-modal mappings between visual and auditory features. Journal of Vision, 10(1), 1–12 (article no. 6). Fairhall, S. L., & Macaluso, E. (2009). Spatial attention can modulate audiovisual integration at multiple cortical and subcortical sites. European Journal of Neuroscience, 29, 1247–1257. Fernández-Prieto, I., Vera-Constán, F., García-Morera, J., & Navarra, J. (2012). Spatial recoding of sound: Pitch-varying auditory cues modulate up/down visual spatial attention. Seeing and Perceiving, 25(Suppl.), 150–151. Fiebelkorn, I. C., Foxe, J. J., & Molholm, S. (2010). Dual mechanisms for the cross-sensory spread of attention: How much do learned associations matter? Cerebral Cortex, 20, 109–120. Fiebelkorn, I. C., Foxe, J. J., & Molholm, S. (2012). Attention and multisensory feature integration. In B. E. Stein (Ed.), The new handbook of multisensory processing (pp. 383–394). Cambridge, MA: MIT Press. Gallace, A., & Spence, C. (2006). Multisensory synesthetic interactions in the speeded classification of visual size. Perception and Psychophysics, 68, 1191–1203. Giard, M. H., & Peronnet, F. (1999). Auditory–visual integration during multimodal object recognition in humans: A behavioral and electrophysiological study. Journal of Cognitive Neuroscience, 11, 473–490. Greenwald, A. G., McGhee, D. E., & Schwartz, J. L. K. (1998). Measuring individual differences in implicit cognition: The implicit association test. Journal of Personality and Social Psychology, 74, 1464–1480. Guzman-Martinez, E., Ortega, L., Grabowecky, M., Mossbridge, J., & Suzuki, S. (2012). Interactive coding of visual spatial frequency and auditory amplitude-modulation rate.
Current Biology, 22, 383–388. Hackley, S. A. (1993). An evaluation of the automaticity of sensory processing using event-related potentials and brain-stem reflexes. Psychophysiology, 30, 415–428.


Hanson-Vaux, G., Crisinel, A.-S., & Spence, C. (2013). Smelling shapes: Crossmodal correspondences between odors and shapes. Chemical Senses, 38, 161–166. Helbig, H. B., & Ernst, M. O. (2008). Visual-haptic cue weighting is independent of modality-specific attention. Journal of Vision, 8(10), 1–16 (article no. 2). Heron, J., Roach, N. W., Hanson, J. V. M., McGraw, P. V., & Whitaker, D. (2012). Audiovisual time perception is spatially specific. Experimental Brain Research, 218, 477–485. Hochel, E., & Milán, E. G. (2008). Synaesthesia: The existing state of affairs. Cognitive Neuropsychology, 25, 93–117. Horowitz, T. S., Wolfe, J. M., Alvarez, G. A., Cohen, M. A., & Kuzmova, Y. I. (2009). The speed of free will. Quarterly Journal of Experimental Psychology, 62, 2262–2288. Hubbard, T. L. (1996). Synesthesia-like mappings of lightness, pitch, and melodic interval. American Journal of Psychology, 109, 219–238. Jeschonek, S., Pauen, S., & Babocsai, L. (in press). Cross-modal mapping of visual and acoustic displays in infants: The effect of dynamic and static components, European, http://dx.doi.org/10.1080/17405629.2012.681590. Keller, P. E., & Koch, I. (2006). Exogenous and endogenous response priming with auditory stimuli. Advances in Cognitive Psychology, 2, 269–276. Klapetek, A., Ngo, M. K., & Spence, C. (2012). Do crossmodal correspondences enhance the facilitatory effect of auditory cues on visual search? Attention, Perception, and Psychophysics, 74, 1154–1167. Klein, R., Brennan, M., D’Aloisio, A., D’Entremont, B., & Gilani, A. (1987). Covert cross-modality orienting of attention. Unpublished manuscript. Klein, R. M., Brennan, M., & Gilani, A. (1987). Covert cross-modality orienting of attention in space. Paper presented at the annual meeting of the Psychonomics Society, Seattle (November). Klein, R. M., & Juckes, T. (1989). Can auditory frequency control the direction of visual attention. 
Paper presented at the Canadian Acoustic Association, Halifax, NS, October (abstract published in proceedings). Kovic, V., Plunkett, K., & Westermann, G. (2010). The shape of words in the brain. Cognition, 114, 19–28. Lavie, N. (2005). Distracted and confused?: Selective attention under load. Trends in Cognitive Sciences, 9, 75–82. Lewkowicz, D. J., & Turkewitz, G. (1980). Cross-modal equivalence in early infancy: Auditory–visual intensity matching. Developmental Psychology, 16, 597–607. Linkovski, O., Akiva-Kabiri, L., Gertner, L., & Henik, A. (2012). Is it for real? Evaluating authenticity of musical pitch-space synesthesia. Cognitive Processing. http://dx.doi.org/10.1007/s10339-012-0498-0. Logan, G. D. (1985). Skill and automaticity: Relations, implications, and future directions. Canadian Journal of Psychology, 39, 367–386. Logan, G. D. (1989). Automaticity and cognitive control. In J. S. Uleman & J. A. Bargh (Eds.), Unintended thought (pp. 52–74). New York: Guilford Press. Long, J. (1977). Contextual assimilation and its effect on the division of attention between nonverbal signals. Quarterly Journal of Experimental Psychology, 29, 397–414. Ludwig, V. U., Adachi, I., & Matsuzawa, T. (2011). Visuoauditory mappings between high luminance and high pitch are shared by chimpanzees (Pan troglodytes) and humans. Proceedings of the National Academy of Sciences USA, 108, 20661–20665. Lupiáñez, J., & Callejas, A. (2006). Automatic perception and synaesthesia: Evidence from colour and photism naming in a Stroop-negative priming task. Cortex, 42, 204–212. Mack, A., & Rock, I. (1998). Inattentional blindness. Cambridge, MA: MIT Press. MacLeod, C. M., & Dunbar, K. (1988). Training and Stroop-like interference: Evidence for a continuum of automaticity. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 126–135. Maeda, F., Kanai, R., & Shimojo, S. (2004). Changing pitch induced visual motion illusion. Current Biology, 14, R990–R991. Marks, L. E. (1987).
On cross-modal similarity: Auditory–visual interactions in speeded discrimination. Journal of Experimental Psychology: Human Perception and Performance, 13, 384–394. Marks, L. E. (2004). Cross-modal interactions in speeded classification. In G. A. Calvert, C. Spence, & B. E. Stein (Eds.), Handbook of multisensory processes (pp. 85–105). Cambridge, MA: MIT Press. Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. New York: W.H. Freeman and Company. Martino, G., & Marks, L. E. (1999). Perceptual and linguistic interactions in speeded classification: Tests of the semantic coding hypothesis. Perception, 28, 903–923. Martino, G., & Marks, L. E. (2000). Cross-modal interaction between vision and touch: The role of synesthetic correspondence. Perception, 29, 745–754. Martino, G., & Marks, L. E. (2001). Synesthesia: Strong and weak. Current Directions in Psychological Science, 10, 61–65. Matthews, P. B. C. (1991). The human stretch reflex and the motor cortex. Trends in Neurosciences, 14, 87–91. Mattingley, J. B. (2009). Attention, automaticity and awareness in synaesthesia. Annals of the New York Academy of Sciences (The Year in Cognitive Science), 1156, 141–167. Mattingley, J. B., Payne, J. M., & Rich, A. N. (2006). Attentional load attenuates synaesthetic priming effects in grapheme-colour synaesthesia. Cortex, 42, 213–221. Maurer, D., & Mondloch, C. J. (2005). Neonatal synaesthesia: A reevaluation. In L. C. Robertson & N. Sagiv (Eds.), Synaesthesia: Perspectives from cognitive neuroscience (pp. 193–213). Oxford: Oxford University Press. Melara, R. D. (1989). Dimensional interaction between color and pitch. Journal of Experimental Psychology: Human Perception and Performance, 15, 69–79. Melara, R. D., & O’Brien, T. P. (1987). Interactions between synesthetically corresponding dimensions. Journal of Experimental Psychology: General, 116, 323–336. Miller, J. O. (1991). 
Channel interaction and the redundant targets effect in bimodal divided attention. Journal of Experimental Psychology: Human Perception and Performance, 17, 160–169. Molholm, S., Ritter, W., Javitt, D. C., & Foxe, J. J. (2004). Multisensory visual–auditory object recognition in humans: A high-density electrical mapping study. Cerebral Cortex, 14, 452–465. Molholm, S., Ritter, W., Murray, M. M., Javitt, D. C., Schroeder, C. E., & Foxe, J. J. (2002). Multisensory auditory–visual interactions during early sensory processing in humans: A high-density electrical mapping study. Cognitive Brain Research, 14, 115–128. Mondloch, C. J., & Maurer, D. (2004). Do small white balls squeak? Pitch-object correspondences in young children. Cognitive, Affective, and Behavioral Neuroscience, 4, 133–136. Moors, A., & De Houwer, J. (2006). Automaticity: A theoretical and conceptual analysis. Psychological Bulletin, 132, 297–326. Mossbridge, J. A., Grabowecky, M., & Suzuki, S. (2011). Changes in auditory frequency guide visual-spatial attention. Cognition, 121, 133–139. Muggleton, N., Tsakanikos, E., Walsh, V., & Ward, J. (2007). Disruption of synaesthesia following TMS of the right posterior parietal cortex. Neuropsychologia, 45, 1582–1585. Nahm, F. K. D., Tranel, D., Damasio, H., & Damasio, A. R. (1993). Cross-modal associations and the human amygdala. Neuropsychologia, 31, 727–744. Navarra, J., Alsius, A., Soto-Faraco, S., & Spence, C. (2009). Assessing the role of attention in the audiovisual integration of speech. Information Fusion, 11, 4–11. Ngo, M. K., & Spence, C. (2010). Auditory, tactile, and multisensory cues facilitate search for dynamic visual stimuli. Attention, Perception, and Psychophysics, 72, 1654–1665. Ngo, M., & Spence, C. (2012). Facilitating masked visual target identification with auditory oddball stimuli. Experimental Brain Research, 221, 129–136. Occelli, V., Spence, C., & Zampini, M. (2009).
Compatibility effects between sound frequencies and tactually stimulated locations on the hand. NeuroReport, 20, 793–797. Palmer, S. E., & Schloss, K. B. (2012). Color, music and emotion. In École thématique interdisciplinaire CNRS. Couleur: Approches multisensorielles (pp. 43–58). Roussillon: Ôkhra SCIC SA. Parise, C., & Spence, C. (2009). 'When birds of a feather flock together': Synesthetic correspondences modulate audiovisual integration in non-synesthetes. PLoS ONE, 4(5), e5664.


Parise, C. V., & Spence, C. (2012). Audiovisual crossmodal correspondences and sound symbolism: An IAT study. Experimental Brain Research, 220, 319–333. Parise, C. V., & Spence, C. (in press). Audiovisual crossmodal correspondences. In J. Simner, & E. Hubbard (Eds.), Oxford handbook of synaesthesia. Oxford: Oxford University Press. Patching, G. R., & Quinlan, P. T. (2002). Garner and congruence effects in the speeded classification of bimodal signals. Journal of Experimental Psychology: Human Perception and Performance, 28, 755–775. Pedley, P. E., & Harper, R. S. (1959). Pitch and the vertical localization of sound. The American Journal of Psychology, 72, 447–449. Peiffer-Smadja, N. (2010). Exploring the bouba/kiki effect: A behavioral and fMRI study. Unpublished MSc thesis, Université Paris V, Descartes, France. Pratt, C. C. (1930). The spatial character of high and low tones. Journal of Experimental Psychology, 13, 278–285. Price, M. C., & Mattingley, J. B. (in press). Automaticity in sequence-space synaesthesia: A critical appraisal of the evidence. Cortex. Proctor, R. W., & Cho, Y. S. (2006). Polarity correspondence: A general principle for performance of speeded binary classification tasks. Psychological Bulletin, 132, 416–442. Rader, C. M., & Tellegen, A. (1987). An investigation of synesthesia. Journal of Personality and Social Psychology, 52, 981–987. Ramachandran, V. S., & Hubbard, E. M. (2003). Hearing colors, tasting shapes. Scientific American, 288(May), 43–49. Rich, A. N., Bradshaw, J. L., & Mattingley, J. B. (2005). A systematic, large-scale study of synaesthesia: Implications for the role of early experience in lexical-colour associations. Cognition, 98, 53–84. Rich, A. N., & Mattingley, J. B. (2003). The effects of stimulus competition and voluntary attention on colour-graphemic synaesthesia. NeuroReport, 14, 1793–1798. Rich, A. N., & Mattingley, J. B. (2010). Out of sight, out of mind: Suppression of synaesthetic colours during the attentional blink.
Cognition, 114, 320–328. Röder, B., & Büchel, C. (2009). Multisensory interactions within and outside the focus of visual spatial attention (commentary on Fairhall & Macaluso). European Journal of Neuroscience, 29, 1245–1246. Roffler, S. K., & Butler, R. A. (1968). Factors that influence the localization of sound in the vertical plane. Journal of the Acoustical Society of America, 43, 1255–1259. Rudmin, F., & Cappelli, M. (1983). Tone-taste synesthesia: A replication. Perceptual and Motor Skills, 56, 118. Rusconi, E., Kwan, B., Giordano, B. L., Umiltà, C., & Butterworth, B. (2006). Spatial representation of pitch height: The SMARC effect. Cognition, 99, 113–129. Sadaghiani, S., Maier, J. X., & Noppeney, U. (2009). Natural, metaphoric, and linguistic auditory direction signals have distinct influences on visual motion processing. Journal of Neuroscience, 29, 6490–6499. Sagiv, N., Heer, J., & Robertson, L. (2006). Does binding of synesthetic color to the evoking grapheme require attention? Cortex, 42, 232–242. Santangelo, V., & Spence, C. (2008). Is the exogenous orienting of spatial attention truly automatic? Evidence from unimodal and multisensory studies. Consciousness and Cognition, 17, 989–1015. Schifferstein, H. N. J., & Tanudjaja, I. (2004). Visualizing fragrances through colors: The mediating role of emotions. Perception, 33, 1249–1266. Schneider, W., Dumais, S. T., & Shiffrin, R. (1984). Automatic and control processing and attention. In R. S. Parasuraman & D. R. Davies (Eds.), Varieties of attention (pp. 1–27). New York: Academic Press. Schwartz, J.-L., Grimault, N., Hupé, J.-M., Moore, B. C. J., & Pressnitzer, D. (2012). Multistability in perception: Binding sensory modalities, an overview. Proceedings of the Royal Society B, 367, 896–905. Seo, H.-S., Arshamian, A., Schemmer, K., Scheer, I., Sander, T., Ritter, G., et al (2010). Cross-modal integration between odors and abstract symbols. Neuroscience Letters, 478, 175–178. Shiffrin, R. M. (1988). 
Attention. In R. C. Atkinson, R. J. Hernstein, G. Lindzey, & R. D. Luce (Eds.). Stevens’ handbook of experimental psychology (Vol. 2, pp. 739–811). New York: Wiley. Simner, J., Harrold, J., Creed, H., Monro, L., & Foulkes, L. (2009). Early detection of markers for synaesthesia in childhood populations. Brain, 132, 57–64. Simner, J., & Hubbard, E. M. (2006). Variants of synesthesia interact in cognitive tasks: Evidence for implicit associations and late connectivity in cross-talk theories. Neuroscience, 143, 805–814. Smith, C., Carey, S., & Wiser, M. (1985). On differentiation: A case study of the development of the concepts of size, weight, and density. Cognition, 21, 177–237. Smith, L. B., & Sera, M. D. (1992). A developmental analysis of the polar structure of dimensions. Cognitive Psychology, 24, 99–142. Spence, C. (2010). Crossmodal spatial attention. Annals of the New York Academy of Science (The Year in Cognitive Neuroscience), 1191, 182–200. Spence, C. (2011). Crossmodal correspondences: A tutorial review. Attention, Perception, and Psychophysics, 73, 971–995. Spence, C., & Deroy, O. (2012). Are chimpanzees really synaesthetic? i-Perception, 3, 316–318. Spence, C., & Deroy, O. (2013). Crossmodal mental imagery. In S. Lacey, & R. Lawson (Eds.), Multisensory imagery: Theory and applications. (pp. 130–159). New York: Springer. Spence, C., & Driver, J. (1997). Audiovisual links in exogenous covert spatial orienting. Perception and Psychophysics, 59, 1–22. Spence, C., McDonald, J., & Driver, J. (2004). Exogenous spatial cuing studies of human crossmodal attention and multisensory integration. In C. Spence & J. Driver (Eds.), Crossmodal space and crossmodal attention (pp. 277–320). Oxford, UK: Oxford University Press. Spence, C., & Ngo, M. (2012a). Capitalizing on shape symbolism in the food and beverage sector. Flavour, 1, 12. Spence, C., & Ngo, M. K. (2012b). 
Does attention or multisensory integration explain the crossmodal facilitation of masked visual target identification? In B. E. Stein (Ed.), The new handbook of multisensory processing (pp. 345–358). Cambridge, MA: MIT Press. Spence, C., Nicholls, M. E. R., & Driver, J. (2001). The cost of expecting events in the wrong sensory modality. Perception and Psychophysics, 63, 330–336. Spence, C., & Parise, C. V. (2012). The cognitive neuroscience of crossmodal correspondences. i-Perception, 3, 410–412. Sperber, D. (2005). Modularity and relevance: How can a massively modular mind be flexible and context-sensitive? In P. Carruthers, S. Laurence, & S. P. Stich (Eds.), The innate mind: Structure and content (pp. 53–68). New York: Oxford University Press. Stein, B. E. (Ed.). (2012). The new handbook of multisensory processing. Cambridge, MA: MIT Press. Sweeny, T. D., Guzman-Martinez, E., Ortega, L., Grabowecky, M., & Suzuki, S. (2012). Sounds exaggerate visual shape. Cognition, 124, 194–200. Theeuwes, J. (2010). Top-down and bottom-up control of visual selection. Acta Psychologica, 135, 77–99. Treisman, A. (2005). Synesthesia: Implications for attention, binding, and consciousness – A commentary. In L. Robertson & N. Sagiv (Eds.), Synaesthesia: Perspectives from cognitive neuroscience (pp. 239–254). Oxford, UK: Oxford University Press. Trimble, O. C. (1934). Localization of sound in the anterior posterior and vertical dimensions of auditory space. British Journal of Psychology, 24, 320–334. Tzelgov, J. (1997). Specifying the relations between automaticity and consciousness: A theoretical note. Consciousness and Cognition, 6, 441–451. Van der Burg, E., Olivers, C. N. L., Bronkhorst, A. W., & Theeuwes, J. (2008). Non-spatial auditory signals improve spatial visual search. Journal of Experimental Psychology: Human Perception and Performance, 34, 1053–1065. Vroomen, J., Bertelson, P., & de Gelder, B. (2001). 
The ventriloquist effect does not depend on the direction of automatic visual attention. Perception and Psychophysics, 63, 651–659. Walker, P., Bremner, J. G., Mason, U., Spring, J., Mattock, K., Slater, A., et al (2010). Preverbal infants’ sensitivity to synesthetic cross-modality correspondences. Psychological Science, 21, 21–25. Walker, P., & Smith, S. (1985). Stroop interference based on the multimodal correlates of haptic size and auditory pitch. Perception, 14, 729–736. Walker-Andrews, A. (1994). Taxonomy for intermodal relations. In D. J. Lewkowicz & R. Lickliter (Eds.), The development of intersensory perception: Comparative perspectives (pp. 39–56). Hillsdale, NJ: Lawrence Erlbaum. Walsh, V. (2003). A theory of magnitude: Common cortical metrics of time, space and quality. Trends in Cognitive Sciences, 7, 483–488. Ward, J. (2012). Synaesthesia. In B. E. Stein (Ed.), The new handbook of multisensory processes (pp. 319–333). Cambridge, MA: MIT Press. Ward, J., Huckstep, B., & Tsakanikos, E. (2006). Sound-colour synaesthesia: To what extent does it use cross-modal mechanisms common to us all? Cortex, 42, 264–280. Ward, J., Jonas, C., Dienes, Z., & Seth, A. (2010). Grapheme-colour synaesthesia improves detection of embedded shapes but without pre-attentive ‘pop-out’ of synaesthetic colour. Proceedings of the Royal Society of London. Section B. Biological Sciences, 277, 1021–1026.