Reference, perception, and attention

Philos Stud (2009) 144:339–360 DOI 10.1007/s11098-008-9213-5

Reference, perception, and attention Athanasios Raftopoulos

Published online: 19 February 2008 © Springer Science+Business Media B.V. 2008

Abstract I examine John Campbell’s claim that the determination of the reference of a perceptual demonstrative requires conscious visual object-based selective attention. I argue that Campbell is correct to claim, first, that a complex binding parameter is needed to establish the referent of a perceptual demonstrative and, second, that this referent is determined independently of, and before, the application of sortals, but that this binding parameter does not require object-based attention for its construction. If object-based attention were indeed required, then the determination of the referent would necessarily involve the application of sortal concepts, since object-based attention initiates top-down cognitive effects on visual processing. I also examine Mohan Matthen’s claim that reference to objects is established only through the visual processing in the dorsal visual stream. I argue that, although it is true that processing in the dorsal stream can determine reference (a thesis that goes against Campbell’s view that the determination of the referent requires conscious attention), processing along the ventral visual stream can also establish the reference of perceptual demonstratives. I also claim that Matthen’s account of dorsal processing underestimates the kind of information processed along the dorsal stream, and that this has some implications for the reference fixing of perceptual demonstratives.

Keywords Perception · Attention · Reference

A. Raftopoulos, Department of Psychology, University of Cyprus, P.O. Box 20537, Nicosia 1678, Cyprus

1 Introduction

Philosophers have argued for a variety of reasons that the referents of visual perceptual demonstratives should be determined in a non-conceptual manner


through the processing of the visual system (Campbell 2002, 2006a, b; Raftopoulos and Muller 2006b; Raftopoulos, in press; Smith 2002). To be able to determine such referents in a non-conceptual manner, the visual system should parse the objects in a visual scene by segregating them from both the background and the other objects in the scene, index them in a way that would allow it to track the objects in space and time and collect information about them, and do that without involving any sortal concepts whatsoever. In other words, the visual system should be able to individuate an object in a scene, lock onto it, and make us perceive it as an object that persists in space and time and carries properties that may change without the object losing its identity (or, as it is usually called, its numerical identity). All this should take place without the application of sortals such as, e.g., objecthood. Campbell (2002, 2006a, b) argues that the referents of perceptual demonstratives can be known in such a non-conceptual manner, that is, independently of any sortal classification, through the function of conscious visual attention that yields a complex binding parameter, which carries information that depends on the occasional context of the visual scene, and which makes us know (or acquaints1 us with) the referent by opening an object file for that specific object. Matthen (2005) claims that the reference to visual objects is achieved through the function of the dorsal visual stream, that is, the visual pathway that guides action upon the environment sidestepping the central nervous cognitive system of the brain. This has two consequences that differentiate Matthen’s account from Campbell’s. First, reference is determined non-consciously, since the representations formed during processing in the dorsal system lie outside awareness. Moreover, according to Matthen, the only information processed along the dorsal pathway concerns location.
Therefore, information about location suffices to fix the reference of perceptual demonstratives. Second, processing in the ventral system yields only descriptive information regarding the objects to which reference has been determined by the dorsal system. This means that perception along the ventral pathway presupposes the application of sortals. Before I start discussing Campbell’s and Matthen’s claims, I will first state my assumptions. I will not discuss the reasons that have led philosophers to suggest that demonstrative reference should be determined in a non-conceptual manner. I will take the thesis that demonstrative reference must be established nonconceptually for granted and proceed on that assumption. Of course, that it should be so for philosophical reasons does not necessarily mean that it can be so, or that it is so. If, for instance, it turns out that the referents of perceptual demonstratives cannot be determined without the application of sortal concepts, as Brandom (1996), Clark (2006), and McDowell (1994) argue, then those philosophers who claim that demonstrative reference should be determined non-conceptually should find another way to address the issues that have led them to take that standpoint in the first place. For lack of space I will not deal with this issue here. Instead I will assume that one can mount a convincing argument to the effect that a stage of

1

Campbell explains that he uses the term ‘‘acquaintance’’ the same way that Russell, who introduced the term, does.


visual processing, which I call ‘perception’, does not require the application of sortal concepts.2

The second assumption that I make is that visual processing consists of three functionally separate stages: first, sensation, in which differences in light intensities that are initially registered in the retina are processed; second, perception, in which complex information is retrieved directly (that is, without cognitive penetration and thus without the application of sortals) from the visual scene; and third, observation or vision, in which perception is imbued with top-down flowing conceptual information and becomes a stage of visual processing that is cognitively penetrable (Raftopoulos 2001; in press; Raftopoulos and Muller 2006a). Perception, which roughly coincides with early vision as defined by Pylyshyn (2001, 2003), yields the non-conceptual content of experience as the content of the perceptual neural states.

Siding with those who think that demonstrative reference is determined non-conceptually imposes certain restrictions on the kind of demonstrative reference one discusses. One cannot, for example, discuss demonstrative reference that occurs in utterances, such as ‘That picture’ or, even, the more generic ‘That object’. The reason is simple. Once one has uttered these words one has already exercised a conceptual framework and, therefore, the terms occurring in the demonstratives are, as it were, conceptually contaminated. It makes no sense to wonder whether the subject has applied any sortals when she uttered that statement; of course she has.3 This is why I have proposed elsewhere (Raftopoulos and Muller 2006b) that one should not be interested in utterances of perceptual demonstratives but, instead, in the mental act that one finds oneself in when looking at a scene and locking onto the objects in that scene.
In that case, the perceptual content of the relevant perceptual states of the perceiver plays the role of the ‘‘That’’ in utterances of perceptual

2

Such arguments can be found, among other places, in Campbell (2006a, b), Heck (2000), Kelly (2001), Peacocke (2001), Raftopoulos (in press), Smith (2002), and Vision (1997).

3

Confusion arises if one does not take that point into account. Clark (2006, p. 181), for instance, claims that the issue of the sortal dependence of perceptual contents should not be treated as a claim about how to focus one’s attention, that is, whether attention focusing requires the application of sortals (Clark takes this to be Campbell’s main point against conceptualism). Rather, the issue is ‘‘if we think it true that in using some expression a subject is referring to object x, and not to y, what are the conditions necessary for this to happen? To answer that question it seems we must consider how the subject represents x, whether the subject represents the thing as an x, or as a y: and there the suggestion that mastery of sortal concepts might be required is not so silly at all.’’ Now, a subject represents an object x as an x only if sortal concepts are applied; one cannot represent an animal as a tiger unless one has applied the sortal ‘‘tiger’’ and ‘‘animal’’. However, this is not what is at stake in the problem of fixing the referents of perceptual demonstratives. The issue here is not whether the expressions that subjects use to refer involve sortals, of course they do. The issue is whether to refer to objects in a visual scene through perceptual demonstratives the application of sortals is required. And the answer to that issue is, ‘of course not’. It should be mentioned though that Campbell’s (1997, 2002) treatment of the issue is ambiguous enough to validate Clark’s worries. Campbell frequently confounds perceptual demonstratives with expressions of reference. Campbell (1997, p. 55), for example, writes that the problem of reference to objects by means of perceptual demonstratives is a problem of relating concepts to imagistic content. ‘‘Imagistic content’’ is the content involved in imagistic or pictorial representations, which largely preserve the spatial structure of the scene they represent. 
It is the content of our experiences as we consciously access it and use it to see things as being such and such: ‘‘looking out of the window, then we may discuss the castle before us, identifying it as ‘that castle,’ the one we can see’’.


demonstratives. In addition, the same content is the mode of presentation of that demonstrative in the mind of the perceiver. Finally, I would like to explain briefly what I mean by ‘non-conceptual’ when I talk of the non-conceptual content of experience. There are various ways to define this notion and I have argued elsewhere (Raftopoulos and Muller 2006a) that the proper way to avoid various pitfalls is to define non-conceptual content as the content of perceptual mental states, that is, as the content of those visual states in which the relevant processes are cognitively impenetrable, since the neural sciences can help us isolate in part those states and their contents, thus allowing us to define non-conceptual content in a non-circular manner. However, I have also argued (Raftopoulos and Muller 2006b) that in the context of discussing the problem of demonstrative reference fixing, the notion of non-conceptual content as defined by Burge (1977) suffices to clarify the meaning of non-conceptual. According to Burge, being acquainted in perception with an object means that one is in direct (without any conceptual intermediaries) contact with the object itself and retrieves information regarding that very object from the object itself and not through a description. On that account, perception puts us in a de re relationship with the object (as opposed to a descriptivist relationship). When one forms a de re belief, one stands in ‘‘an appropriate non-conceptual, contextual relation to objects the belief is about’’ (Burge 1977, p. 346). Thus, the content of a mental state is nonconceptual if its reference is determined independently of any descriptive content that the mental state might have under a canonical description.4 Since a description involves sortals, a non-descriptivist de re relationship with objects that allows the fixing of reference of perceptual demonstratives does not involve any sortals. 
In this paper, I address several problems that plague Campbell and Matthen’s account of reference fixing with regard to perceptual demonstratives and claim that the theory of reference that I have presented elsewhere (Raftopoulos and Muller 2006b; Raftopoulos, in press) allows us to overcome these problems while retaining some of the useful insights that mark Campbell’s and Matthen’s accounts. I start first by discussing in section one Matthen’s claims that reference to objects in a visual scene is established through the dorsal visual pathway and that the ventral system provides only descriptive information regarding the objects in a scene. I claim that although Matthen is right to argue that the dorsal system can establish reference without awareness of the contents processed along the dorsal stream, the ventral system can also fix the referents of perceptual demonstratives in a non-conceptual manner. In the second section, I discuss Campbell’s claims to the effect that fixing reference requires awareness and that the referents of perceptual demonstratives are fixed through the function of conscious visual selective attention. With regard to the first issue I will argue that in some cases reference can be fixed in a non-conscious 4

Cussins (1990) calls the content of a representation ‘‘non-conceptual,’’ by which he means those representations whose contents are canonically characterized by means of concepts which are such that the organism need not have those concepts in order to have that content. More specifically, for any state S with content, S has non-conceptual content P, iff the fact that X is in S does not imply that X possesses the concepts that canonically characterize P, meaning that X does not need to possess the concepts that would normally enter in a report of the content of S that adequately specifies that content.


manner and that when, in other cases, awareness is required for reference fixing, this is not the awareness Campbell has in mind. With regard to the second issue, I will claim contra-Campbell that if conscious visual attention is required to fix the reference of perceptual demonstratives, then reference fixing presupposes the application of sortal concepts, thus undermining Campbell’s attempt to build a theory of non-conceptual reference fixing. However, fortunately for the project of non-conceptual reference fixing of perceptual demonstratives, I claim that reference to objects does not require conscious visual attention but only the preattentive perceptual mechanisms of object segregation. Finally, in section three, I discuss Campbell’s claim that consciousness is required for demonstrative reference fixing.

2 Reference: dorsal or ventral?

Matthen (2005, pp. 27, 272) argues that objects and not locations are the first referents of visual conscious states, a thesis also adopted by Smith (2002) who, in a similar vein, claims that the objects of our awareness are the objects and not their properties, as they are perceived in perception.5 This goes against Clark’s (2000) and Campbell’s (2002) conception of locations as the first referents of vision that identify the bearers of the features perceived at those locations, by organizing the features perceived at the same location in a spatio-temporal manifold. Campbell, for instance, uses Treisman’s (1988) theory of selective attention, the Feature Integration Theory (FIT), according to which, in vision, information from different feature maps is bound together by extracting the location encoded implicitly in any feature information. Spatial attention makes the implicit location explicit. Information localized at the same location is bound together and thought to pertain to a certain object that occupies that space. On this view, perception of location comes first and perception of objects follows, since it is only through locations that the representations of objects in perception are constructed. Against this, Matthen argues that perception is first of objects in the external world and not of their locations. ‘‘Material objects come first; features are attributed to them after they are identified’’ (Matthen 2005, p. 324). For Matthen (2005, p. 183), visual

5

Although phenomenal content, what Smith (2002) and others call sensations, is inherent in perceptual experience, it is not the object of awareness of this experience; the object of awareness is a specific object with its properties. Smith distinguishes between sensation and perception. Sensations or sensory qualities or qualia are inherent features of experiences themselves. However, sensations lack intentionality. They do not refer to something beyond them; they are not properties, a sense-datum, of which one is aware. Although sensations are presented in consciousness as intrinsic properties of the experience itself, they do not function as the immediate objects of awareness of this experience. When one perceives, one is not aware of these properties as properties of one’s experience; one is aware of objects outside one’s body carrying these properties. One is aware of sensations only in cases of non-perceptual sensations. Thus, there is a distinction between having a sensation of something and being perceptually conscious of something. Sensations can occur either perceptually or non-perceptually, depending on whether they are properties of perceptual states or not, respectively. In the former case they are perceptual sensations (Smith 2002, p. 92). In the latter case sensations give us no awareness of physical objects, only of some properties. The faculty that makes us aware of objects is perception.


states are in object-attribute form6 and, thus, a visual state does not simply represent or present the co-locability of certain visual features of objects, but in addition (re)presents these properties as adhering to an individual object that persists in space and time. This way, visual states identify the objects in a scene. Our awareness of space has a different origin than our awareness of the other sense features in that visual directions constitute an omnipresent grid that overlays every scene indexing the features represented in it. In that sense, ‘direction’ is part of the form of a visual representation, whereas ‘distance’, being a feature, is part of the empirical content of vision (Matthen 2005, pp. 274–275). Matthen is right. Perceptual states always (re)present the world within a coordinate system; visual scenes are always perceived as being in a space that is structured by means of a Cartesian coordinate system whose center is the body of the perceiver and whose axes correspond to the top-down and left-right direction as defined by the body of the perceiver. In that sense, the Cartesian grid indeed constitutes the form of the representation and not its content. It is this grid that enables the representation of the content of a perceptual state and, therefore, could not be part of that content.7 Within this spatial framework, objects can be represented in two ways; in a scene-based, allocentric, relational form, and in a perceiver-centered egocentric, absolute form. In the latter case, all featural information is computed within an absolute, body-centered frame of reference, in which features are computed with respect to the body of the perceiver (first her retina, then the center of the distance between the eyes, or the center of gravity of her body depending on the aim of the perceiver). 
Size, for instance, is computed in an absolute metric, that is, with respect to the perceiver, and not relationally with respect to the sizes of other objects in the scene. In the former case, information is cast in a relational object-centered or scene-based frame of reference in which the object features are computed with respect to the other objects and features in the scene. In this spatial framework objects are still perspectivally seen, that is, they are being represented in a Cartesian framework whose center is the body of the perceiver, but they are organized within this grid with respect to their relations to the other objects in the scene. We saw that for Matthen objects come first in perception and properties are predicated to them after objects have been identified. But how are objects identified in perception without any properties being predicated to them, and then described by 6

Now, ‘‘there is an analogy between familiar ways of reporting seeing and the units into which objects of sight may be articulated. The latter are divisible in roughly the way we tend to parse linguistic components of sentences’’ (Vision 1997, p. 85). In other words, perceptual states have the object-attribute form.

7

Peacocke (1992, pp. 61–62) makes roughly the same point in elaborating his notion of scenarios. Peacocke searches for a level of non-conceptual content on which to anchor concepts in a non-circular way. This kind of non-conceptual content is provided by the spatial types ‘the type being that under which fall precisely those ways of filling the space around the subject that are consistent with the correctness of the content.’ To specify the spatial types one needs to fix an origin and the axes of the resulting frame. These elements cannot be defined with reference to the real world, since a spatial type may be instantiated at different places. Thus, the point of origin and the axes should be defined with respect to a thing that is always present irrespective of the location at which the type is instantiated; this is the body of the subject. The point of origin may be the center of gravity of the body and the axes of reference the directions of right-left, up-down, and back-front, as defined on the subject’s body.


means of their properties? Matthen (2005, p. 8) distinguishes between motion-guiding vision, or vision that guides the body in its interaction with the environment and offers motor affordances (for example, it enables reaching, grasping, and in general the physical manipulation of objects), and vision for knowing the environment (that is, vision whose aim is to find out about objects in the world and their properties), or descriptive vision, which offers epistemic affordances. Motion-guiding vision is implemented along the dorsal visual stream, and the processes of epistemic vision take place along the ventral visual stream (Matthen 2006, Chap. 13). Matthen takes the dorsal system to process spatial information (specifically the distances and angles of objects from the perceiver’s body, or the eyes, or the head), which it represents in a perceiver-centered spatial framework. The ventral stream processes information about size, shape, location, color etc., which it represents in a scene-based framework. The distinction between a ventral and a dorsal visual stream that subserve different visual functions is relatively old. In its first formulation (Ungerleider and Haxby 1994) the dorsal system corresponds to the ‘where’ system that locates objects in space, and the ventral system corresponds to the ‘what’ system that provides descriptive information about the objects, that is, information about their visual properties. Matthen (2005, pp. 297–299) adopts this distinction. However, there is almost universal agreement nowadays that the dorsal system processes featural information as well as spatial information, and that the ventral stream processes spatial information too.
The difference between the two is not between a ‘where’ and a ‘what’ system but, rather, between a system that guides interaction with the environment directly by bypassing the cognitive centers of the brain, and a system that is dedicated to representing and storing in working and long term memory information about the environment that is useful for the organism’s various practical and epistemic actions (Goodale and Milner 2004, 2006; Jacob and Jeannerod 2003; Norman 2002). As we shall see next, this has some serious repercussions for Matthen’s theory of reference. Having distinguished between motor-guiding vision and descriptive vision, Matthen proceeds to argue for the thesis that reference to objects through vision is eventually established through the motor-guiding dorsal visual stream, whereas the role of the ventral stream is restricted to providing information about these objects. More specifically, Matthen (2005, p. 302) proposes that bodily actions are executed in three stages. During the first stage, the perceiver identifies an object as belonging to a certain kind and as having certain object features (color, size, shape etc.). At the second stage, an action plan is formulated that concerns an action that specifically refers to the object being identified. At the third stage, once the target has been identified, motion-guiding vision, which is involved only at this stage, takes over and guides the action on the object. Motor-guiding vision enables us to come into physical contact with the objects in a scene and physically manipulate them. This constitutes a kind of demonstrative reference to visual stimuli, which does not depend on the attribution of descriptive features to the objects (Matthen 2005, p. 300). Thus, there is in motor-guiding vision a non-descriptive reference to objects, a deictic element akin to ostension in language (Matthen 2005, p. 304). 
Through motion-guiding vision we have a feeling of presence of the objects in a scene, a feeling that descriptive vision cannot provide


and which seems to be a phenomenological consequence of the function of the motion-guiding vision that brings us into contact with the world (Matthen 2005, p. 316). After motion-guiding vision has identified an object through this kind of deictic reference, descriptive vision assigns it to descriptive classes. This is an ingenious move in that it allows one to solve many of the problems that have plagued descriptive theories of reference (for a discussion see Raftopoulos, in press; Raftopoulos and Muller 2006b). Moreover, Matthen’s view agrees with that of Campbell’s (2002) that reference should be established in a non-descriptive manner. Unfortunately, Matthen’s view is beset by problems of its own. I will concentrate on three of them. First, it is not clear in Matthen’s account what is the role of the ventral system in reference fixing. Second, if reference is determined through the dorsal system, spatial information is not often enough for that purpose. This means that the dorsal system must also process information other than spatial information. Matthen’s thesis that the dorsal system processes only spatial information leads him to an impasse. Third, there are strong reasons suggesting that reference should be determined through the ventral system, as well. First, although the dorsal stream is involved only at the third stage, Matthen describes the three-stage process of reference fixing as a ‘bodily action’. So, is it the three-stage process that determines reference or is it only the last stage in which the motor-guiding system takes over? Matthen’s (2005, p. 302) own comparison of his account with that of Campbell seems to suggest that it is the whole three stage process, since Matthen includes in the referential process the first two stages as well to stress the similarities between the two accounts. 
However, the rest of the chapter strongly suggests that he takes it for granted that reference to objects is determined only through the function of the motor-guiding visual stream, since it is this stream that provides us with a deictic reference and the feeling of presence of the objects in the scene. Matthen (2006) suggests as much when he raises the possibility, against Campbell’s claim that reference fixing requires consciousness, that the reference of perceptual demonstratives could be fixed non-consciously, as it would be if reference fixing were the result of processing along the dorsal stream, whose states have content that is not available to awareness. This marks a deep difference between Campbell’s and Matthen’s accounts. If reference fixing takes place through the processing in the dorsal pathway, then demonstrative reference is fixed without awareness of objects and of their features, in so far as the contents of the states in the dorsal stream are unconscious. Notice, though, that for Matthen (2005) there is a kind of ‘‘awareness’’ accompanying the processes along the dorsal stream, in that it is dorsal processing that makes us ‘‘feel’’ the presence of objects. If it is the case that reference to objects is determined only through the function of the motor-guiding visual stream, then Matthen’s account of reference fixing is partly consistent with his claim that reference is fixed non-descriptively, but, as we shall see, the account is flawed. If, on the other hand, Matthen includes in the process of reference fixing the first two stages by subsuming them under the general heading ‘bodily action’, then this conflicts both with his thesis that the referents of perceptual demonstratives are fixed non-descriptively and with his thesis that objects are identified in a scene first without recourse to their features and the features are assigned to objects later on (objects first, features later on).
The reason is that during the first stage the


ventral system provides information about size, shape, and other visual features that identify the object; as Matthen (2005, p. 302) states, object features are needed for the identification to proceed. But the ventral system is a descriptive system and thus the identification of objects relies on descriptive information, which means, pace Matthen, that features come first and objects come later by being constructed on the basis of some featural description. In fact, the problem is more pervasive. If to initiate action on a specific object the dorsal stream requires information about the object that it receives from the ventral system as Matthen suggests, then this presupposes that this object has been identified as such and thus that reference to it has already been fixed, in which case one wonders what is the role of the dorsal system in reference fixing. That is why I said that even if Matthen adopts the view that it is only the dorsal system that establishes reference, this is only partially consistent with the rest of his theses.8 Note that if Matthen intends that demonstrative reference be fixed through the dorsal system in a non-descriptive way, then he should agree with Campbell that the fixing of the referents of perceptual demonstratives takes place in a non-conceptual way, since the processing along the dorsal pathway is cognitively impenetrable and thus it is not conceptually contaminated. Although in Matthen (2005) this view is rejected in favor of conceptualism (that is, the view that perception requires the application of sortal concepts) on account of the fact that Matthen considers the classificatory function as paramount to perceptual processing, in Matthen (2006) it is conceded that demonstrative perceptual reference may be perceptually fixed without the application of sortal concepts since the classificatory function of perception could be taken over by the various perceptual mechanisms that constrain perceptual processing. 
Matthen’s predicament stems from his view that the dorsal system processes only locational information. But, and this is the second of the problems in Matthen’s account, this view is wrong. There is, first, a conceptual problem. If the dorsal system does not process featural information other than spatial information, how does the ventral system communicate with it? We saw that at stage three of bodily action the dorsal system receives information regarding the identity of the target from the ventral system. There are two possibilities here. Either the dorsal system and the ventral system meet at some area just before the motor cortex in which the signals from the two streams are combined and the resulting signal is transmitted to the motor cortex to initiate action, or the dorsal system must process information coming from the ventral system. The first option cannot hold since such an area does not exist in the brain (the dorsal system communicates directly with the motor cortex). So, one is left with the second option.

8 Besides, there is an empirical problem, which is fatal to Matthen’s three-stage account of bodily action. Evidence (see Raftopoulos, in press, Chap. 2, for an extensive discussion) shows that activation can reach the motor cortex through the dorsal system at about 130 ms after stimulus onset. At that point in time the activation along the ventral system has barely solved the binding problem and has not necessarily reached the mnemonic circuits. The identification and recognition of an object, as indexed by the elicitation of the P300 component of the ERP, takes place at about 270–300 ms after stimulus onset. This means that if the dorsal system always had to rely on information from the ventral system regarding the identity of the target in order to initiate action, then action would be much more delayed than it actually is.


Indeed, this seems to be the case; ventral and dorsal systems cross over in many areas and information is transferred from the ventral to the dorsal system (Goodale and Milner 2004, 2006; Norman 2002). This creates a problem regarding the frame of reference in which the information is cast, given that the two streams use different spatial frameworks (the dorsal stream uses a perceiver-centered absolute frame whereas the ventral uses a scene-based relational frame), but there is a solution that I will discuss later on. The transfer of information means that the dorsal system can process featural information other than spatial information. But if this is the case, why should one claim that the dorsal system, in order to identify its targets, necessarily needs information from the ventral system? Would it not be possible for the dorsal system to process the information it needs on its own? The answer is yes; as I stated at the beginning of this section, it is almost uncontroversial nowadays that the dorsal system processes featural information and uses it to identify its targets. Upon stimulus onset, the ventral and the dorsal systems process information in parallel, each one subserving its specific function. So, things might seem brighter for Matthen. If he were willing to abandon his thesis regarding the kind of information processed in the dorsal stream, he would be able to argue that the dorsal system can identify its target non-descriptively and initiate action, and, thus, establish the deictic non-descriptive reference that Matthen, and rightly so, is after. Unfortunately, that will not do either. There are many occasions, and this is the third point of my criticism of Matthen’s account, in which a perceiver refrains from acting upon the objects of a visual scene. She may have decided, for instance, to remain a passive viewer, or she may have been instructed to remain so, say in the context of an experiment where she has received instructions from the experimenter.
It is known (Goodale and Milner 2004) that the dorsal system works only in real time and stores the visuomotor information that it extracts from the scene for only a few hundred ms. This entails that a few seconds after the stimulus onset, and in a situation in which action is not initiated, the dorsal system ceases to function. Since the results of the processing along the dorsal system are not stored in memory, the representations that were constructed during its processing are lost. If an action is required after the delay, then the action must be guided through information about the visual scene that has been stored in memory.9 This information, however, is represented in the ventral stream, which now must transmit it to the dorsal stream to guide action, since it is only the dorsal stream that interfaces with the motor cortex. The ventral system plays the same role when action is based upon semantic information regarding either the object or the visual scene. Since semantic information is represented in the ventral stream, it must be transmitted to the dorsal system (Goodale and Milner 2004, pp. 82–85). A problem with this is that, as Matthen (2005, p. 299, fn. 3) remarks, this transfer of information from the ventral to the dorsal system requires a transformation between scene-based and perceiver-centered representations. Goodale and Milner (2004, pp. 101–102) suggest that since the representation of information in the ventral system is cast in a relational framework, in order to be usable by the dorsal system, in which information is represented in an absolute frame of reference, the information from the ventral system must be transformed first into a frame of reference that can, in turn, be transformed to the absolute frame of reference used by the dorsal system. Such a frame of reference is the retinotopic frame of reference that characterizes the cells both in the retina and in V1. Thus, it is plausible that information from the ventral system is transmitted back to V1 and there it is transformed into retinotopic coordinates. From there it is fed into the dorsal system in which it is cast in an absolute frame of reference. In other words, the neutral retinotopic frame of reference acts as the medium that allows the translation from the relational frame of reference used in the ventral system to the perceiver-centered frame used in the dorsal system. Now, consider the perceiver who has been instructed to remain passive and who is presented with a visual scene. Since the dorsal system is off-line and does not function, Matthen has to argue that the perceiver cannot determine the demonstrative reference of her perceptual states.

9 This is suggested by the following evidence. The ventral system stores size information in a relational scene-based framework. Thus it falls victim to the size-contrast illusion, in which one object is perceived to be larger than another, although both have the same size, just because smaller objects surround it. The dorsal system, on the other hand, owing to the absolute perceiver-centered framework that it uses to represent information, is not victimized by the illusion: the same person who perceives one object to be larger than the other uses the same grip when asked to grasp them, which means that the calibration of the grip does not fall victim to the illusion. This, by the way, shows that the dorsal system processes size information on its own and does not have to wait for that information to be transmitted to it from the ventral system. Now, when action is delayed, the calibration of the grip falls victim to the illusion. The reason is that, since the dorsal system works in real time, the delay of the action causes the loss of the size information retrieved in the dorsal stream and, therefore, any action has to rely on size information stored in memory along the ventral stream. However, sizes in the ventral system are represented in a relational framework and, thus, are subject to the size-contrast illusion. This leads the dorsal system that uses the ventral information to fall victim to the illusion.

In other words, and given that the objects of awareness of perceptual states are the objects in the external world that are identified through demonstrative reference, the perceiver does not entertain perceptual states with content. She sees the visual scene as she would view the contents of pictures, that is, non-referentially. But this is certainly far-fetched if not outright absurd. After all, Matthen (2005, pp. 306–313) himself distinguishes between viewing real scenes and depicted scenes and, in the case I have been describing, one clearly confronts a real scene. It follows from these considerations that reference fixing of perceptual demonstratives must be feasible through the ventral system. Matthen could retort that if reference is established through the ventral system, and given that this is a descriptive system, this reference would be descriptive, which leads us back to the descriptive theories of reference that Matthen, Campbell, and I agree are a forlorn cause. There is no reason for despair though. Reference can be fixed in non-descriptive ways through the ventral system provided that the featural information (including spatio-temporal information that plays the primary role in reference fixing) used to fix the reference is not encoded and stored anywhere in the system so that it could become part of the meaning of the referent and then be used as an identifying description of the referent. In other words, although spatio-temporal and other featural information is used to determine the referents of perceptual demonstratives, this information is not assigned, properly speaking, to the object. This is why objects are singled out first in a scene as entities that persist in space and time and they are perceived as the same objects even though their properties may change. In this sense, the information that is used to single out and index objects in a scene belongs to the foundational facts that fix the reference and not to the facts that contribute to the semantic content of the object-term (Soames 2005). Thus, one should distinguish foundational facts, to wit, the facts that originally brought it about that the viewer parsed the scene the way she did and that have sustained the reference to the objects parsed as entities persisting in space and time (always within the time framework of a single fixation), from the facts that other viewers may use to parse the same scene. The foundational facts do not constitute necessary and sufficient conditions for parsing the same objects in a visual scene and, thus, do not provide descriptions that semantically fix the referents of perceptual demonstratives that correspond to the perceptual acts of singling out objects in a visual scene.

Despite the various problems in Matthen’s account, one should hold fast to some of the points made by Matthen, namely that reference can be determined in a nondescriptive manner through the dorsal system that guides action upon objects, that one is aware of objects and their properties and not of properties hanging at the same location, and that objects are singled out in a scene first and then they are assigned features, although Matthen must allow that other featural information, in addition to spatio-temporal information, may be used to single out objects in a scene. Finally, one important contribution of Matthen’s, although as I argue in the next section it is wrong, is his view that when it comes to singling out objects and, thus, determining the referents of perceptual demonstratives, object-based attention and not space-based attention is the kind of attention that could do the job. Matthen intends this to be a criticism of Campbell’s theory of reference, to which I now turn.

3 Is attention needed for reference fixing?

I have extensively discussed Campbell’s (2002) theory of reference in the context of presenting my own (Raftopoulos and Muller 2006b). So, I will deal here only with a thesis that is central to Campbell’s theory of reference fixing of perceptual demonstratives, namely that visual selective attention is required to fix demonstrative reference. One should notice, first, that there are two rather important shifts, or clarifications, in Campbell’s thought. The first concerns the usage of sortal concepts in fixing the referents of perceptual demonstratives, and the second concerns the kind of attention that is involved in reference fixing. In Campbell (2002), the issue of whether sortal concepts are needed in fixing demonstrative reference was left unclear, with Campbell oscillating between the view that reference could be fixed in clearly perceptual ways without any conceptual involvement (a demand imposed by his attempt to elaborate a nondescriptive theory of demonstrative reference), and the view that at least some concepts, mainly perceptual concepts, may be needed (a demand imposed mostly by his view that attention is needed to establish reference) (Raftopoulos and Muller 2006b). However, in subsequent work (Campbell 2006a, b) no such hesitation is to be found. Campbell clearly states that demonstrative reference can be established without any sortal involvement by means of purely perceptual mechanisms that single out objects in a scene. Another welcome move on Campbell’s part is that he


has clarified what demonstrative reference purports to do. I have claimed (Raftopoulos and Muller 2006b; Raftopoulos, in press) that the referents of perceptual demonstratives are precursors of the ordinary objects of our experience. More specifically, the individuals directly retrieved from a visual scene that constitute the referents of demonstrative reference in vision are usually called ‘visual’ objects or ‘proto-objects’ and are structural descriptions of 2½D objects that are singled out in a visual scene and are indexed so that the visual system could keep track of them as they move in space and time and undergo featural changes. This presupposes that the visual system can first segregate objects from ground and from the other objects in the scene, that is, it can individuate objects in the scene. It should also be able to index these objects by locking determinately onto one thing in the scene rather than onto another. These things should be perceived as persisting in space and time and retain their (numerical) identity despite featural changes. What the visual system is not required to do is to supply a criterion of identity and classify the objects as members of classes of other things that have the same criterion of identity, a criterion that would allow their reidentification in other visual contexts. Campbell (2006b, pp. 234–236) agrees now with all these points. This means that whenever Campbell (2002) talked about the perceptual system identifying an object for the benefit of the information processing-systems, identification should be construed in the way just explained. In earlier work, Campbell (1997, 2002) had relied on Treisman’s FIT theory to address the binding problem, that is, the problem of how the visual system constructs objects, and moved from there to build his theory. FIT puts emphasis on the role of spatial attention as the key factor that binds together features found at the same location. 
As a result, Campbell’s claim that selective visual attention is necessary to fix demonstrative reference was taken to mean that spatial attention plays this role. This view has been severely criticized (Martin 1997: Matthen 2005; Raftopoulos and Muller 2006b) on the ground that there is evidence that objects are singled out by means of an object-based attention that locks directly onto objects (see Raftopoulos in press for an extended discussion and references). In other words, either spatial attention is not needed for the perception of objects, or it does not play the predominant role that Campbell initially thought that it did. Responding to these criticisms Campbell (2006a, pp. 239–242) makes clear that the attention he has in mind is allocated to objects and results in object files for the objects that are individuated in the scene, and not to locations. Indeed, Campbell (2002) discusses conscious attention to objects and the way this attention serves to single out information from a visual scene so that it could identify the objects for the benefit of information-processing systems, by which Campbell means that conscious attention opens and maintains object-files for the objects in a visual scene that allow the perceptual system to lock onto those objects. Still, locations play an important role in the allocation of attention to objects. Thus, ‘‘you can hold that attention is allocated to objects, but still say that location plays a role in the allocation of attention to objects’’ (Campbell 2006a, p. 240). Location plays such an important role because object files are assigned and maintained primarily by spatiotemporal information. The theories Campbell relies on to address the issue of object files (Pylyshyn 2001, 2003; Kahneman et al. 1992; Scholl 2001; Scholl and Leslie 1999) underline the role of


object-based attention in opening object files for objects parsed in a visual scene; object-based attention is needed for object individuation. Thus, Campbell (2006b, p. 233) states, ‘‘object-based attention can sustain demonstrative reference to particular objects.’’ At the same time, the demand that demonstrative reference be possible without the application of sortal concepts drives Campbell to complement the above sentence with the statement ‘‘without the appeal to the use of sortal concepts.’’ This is exactly the juncture where things start to go awry for Campbell’s ‘‘revised’’ theory. Remember that Campbell envisages a theory of demonstrative reference in which object-based attention individuates objects in a scene without any conceptual involvement. However, as I will claim now, assigning object-based attention to the role of opening and sustaining object files brings in the application of the conceptual framework of the perceiver in singling out objects in a visual scene. Let us see why. A first sign that things are not as straightforward as Campbell suggests comes from a careful reading of Scholl and Leslie’s (1999) and Pylyshyn’s (1994, 2001) work. This would reveal a small caveat regarding the role of attention. Pylyshyn (1994) points out that object individuation and the assignment of object-files occurs preattentionally. Scholl (2001) states that object individuation occurs very early in the attentional process or preattentively, where preattentive processing may be an attentive processing that requires relatively little attention. (Although it is not clear to me what an attentive processing that requires little attention is, I will not comment upon this point.) Why do these researchers oscillate between an attentive object-based processing and a preattentive processing that individuates objects? 
The reason is that there is abundant evidence that objects are individuated in a visual scene, by being segregated from ground and other objects, much earlier than the onset of object-based attention. In Raftopoulos (2001; in press), I discuss the mechanisms of vision and their relationship with spatial and object-based attention. Here, in a nutshell, is the core tenet of that account. When a visual scene is presented to the eyes, a feedforward sweep (FFS) reaches V1 at a latency of about 40 ms. Information is fed forward to the extrastriate, parietal, and temporal areas of the brain. The first ERP component, C1, is elicited at about 50 ms after stimulus onset and is unaffected by attention, be it spatial or object-centered. By 70–80 ms after stimulus onset most visual areas are activated. The preattentional FFS culminates within 100 or 120 ms after stimulus onset. Around 70–90 ms after stimulus onset, spatial attention, by modulating the P1 waveform, enhances visual processing in a voluntary task-driven search at the salient locations. However, P1 is sensitive only to the characteristics of the stimulus. Around 100 ms after the presentation of the stimuli at the attended locations, an extensive part of our brain responds to the physical characteristics of the visual array. Around 150 ms after the stimulus these features fuse to a single form or structural description of the objects in a visual scene by means of ‘Local Recurrent Processing’10 (LRP). At 150 ms the onset of N1 indexes the beginning of the registration of differences

10 LRP is processing that involves lateral and top-down flow of information, in addition to FFS. However, the information that flows top-down originates from sites within the circuits of early vision and thus does not entail the cognitive penetrability of early vision by conceptual information.


between targets and distractors and, in general, differences between task-relevant and task-irrelevant items in the visual scene. About 200–300 ms after stimulus presentation, a voluntary task-driven search is registered in the same areas that process the visual features during FFS and LRP by enhancing neuronal activation of the salient objects and/or locations. These attentional effects are indexed by the onset of N2, which also signifies the onset of the biasing of processing by object-centered attention. Thus, the top-down effects of attention to objects are delayed in time and involve the same anatomical areas as FFS and LRP, except that attention amplifies the recordings in these areas. Finally, about 250 ms after the stimulus, some of the same areas participate in the cognitive/semantic processing of the input. Global RP, that is, processing that involves higher cognitive centers11 takes place and objects are classified and recognized, a process that is indexed by the onset of P3. Another important thesis is that attention is not a mechanism (like a zoom-lens) that focuses on and selects some information while ignoring some other. Instead, attention is the result of a biased competition (Desimone 1999; Reynolds and Desimone 2001) among pieces of information that, due to the restricted receptive fields of neurons in the more central areas of the brain, cannot all of them enter and be further processed by these central areas. More specifically, attention acts in a way that biases the competition between neuronal populations that encode environmental stimuli. All the stimuli in a visual scene are initially processed in parallel and activate neuronal assemblies that represent them. 
These assemblies eventually engage in competitive interactions for one of two reasons: either because they project onto cells in topographically organized cortical areas in which neurons have restricted receptive fields and thus cannot process all stimuli, or because some behaviorally relevant feature or object must be selected among all present stimuli. Thus, in the biased competition model of attention, multiple representations of objects are active and compete to be selected to drive a motor output (whether it be pressing a button, reaching to grasp an object, or some other motor behavior). The enhancement of neuronal responses in the extrastriate cortex due to attention is better understood “in the context of competitive interactions among neurons representing all of the stimuli present in the visual field. These interactions can be biased in favor of behaviorally relevant stimuli as a result of many different processes, both spatial and non-spatial and both bottom-up and top-down” (Desimone 1999, p. 13), where top-down influence is derived mostly from working memory. As a result of the biased interaction, behaviorally irrelevant stimuli are suppressed. In this framework, attentional selection is better understood not so much as the enhancement of neuronal responses but, rather, as the modulation of the competitive interaction of the stimuli in the visual field, and attention is better viewed as a dynamic property of the system and not as a separate mechanism. Although the biased competition account of visual processing posits the existence of a parallel bottom-up stage at which information from the environment is fed toward the visual areas of the brain, it is not clear about the kind of information that is so processed. The stimuli include object features such as color and oriented lines, but the theory does not explicitly deal with the problem of feature binding that may

11 To use the terminology of Dehaene et al. (1998), the signal enters the global workspace.


occur during the parallel stage of processing; that is, it does not specify which features retrieved in the parallel mode may combine to form a more complex structure. Vecera (2000) concentrates on object-based attention and extends the biased competition account of visual search to the segregation or segmentation of objects from backgrounds and the selection of these objects by attentional processes. Object segmentation is the set of preattentional visual processes that determine which features combine to form the shapes present in a visual scene (Driver et al. 2001; Scholl 2001; Vecera 2000). These processes segment a shape from the background and segregate it from other shapes that are similarly segmented. Vecera defines object-based attention as the visual processes that select a segregated shape from among several segregated shapes. Given that the visual system cannot process all stimuli present in multi-object scenes, objects or regions in space compete with one another for processing in two respects. ‘‘First, there is a competition within object-based segregation processes and the segregated regions formed by segregation processes; the outcome of this competition is a perceptual group or figure that is more salient than other groups or figures. Second, there is a competition within object-based attentional processes; the outcome of this competition is the selection of one perceptual figure or group over another’’ (Vecera 2000, pp. 359–360). Consequently, feature integration, as a process that binds parts of a scene into units, and thus object segmentation or segregation, takes place at many different levels of visual processing, both early and late and both preattentionally and attentionally (Driver et al. 2001; Scholl 2001); feature binding takes place at many levels of visual processing. 
To sum up, the existing evidence suggests that during early vision there is a bottom-up stage (a stage in which processing is guided only by the stimuli and not by cognitive influences) in which a preliminary segregation of the sensory data into separate proto-objects takes place. Top-down effects, including familiarity with objects or scenes or some form of attentional setting, may override this initial segregation in favor of some other parsing of the scene into objects. The top-down effects also resolve ambiguities when the bottom-up processes do not suffice to segment a scene into its objects. However, these top-down effects occur after early vision has performed its first pass and parsed the scene into proto-objects. Feature integration and object segregation, thus, are better seen not as a separate stage of visual processing higher in the brain, but as an emergent phenomenon due to interactive activation among the cortical areas. Here is an example of how object-based attention works in a search task in which a subject has been instructed to search a scene for a specific feature. Suppose that a cue (say a certain feature) is presented to a subject who, after a delay, is asked to perform a task involving the selection of an object (target object) with that cued feature among other objects that do not have the specific feature (distractors). After the cue has been presented the neuronal assemblies in the prefrontal cortex that represent that cue are activated and remain activated for the duration of the task. The description of the target provided by the demands of the task creates a ‘‘template’’ that is stored in visual working memory for the duration of the task (otherwise put, the subject keeps the cue in her working memory to use it in the selection process). Thus, the features of the cue are temporarily stored in working memory, even when the stimulus has been withdrawn. The activation of this assembly is fed back to the extrastriate inferior


temporal cortex thereby activating only the neurons that respond to the cued feature. This means that working memory biases activity in IT in favor of cells that select the cued feature. So, when the choice array is presented and the subject has to select the target feature, all cells in the IT cortex that respond to any feature in the visual field are initially activated and compete to be further processed. Cells representing different stimuli engage in mutually suppressive interactions, which eventually are biased in favor of the cells that represent the cued feature. The bias is due to the top-down activation of the cells from the signals that originate in working memory. When the subject makes her choice, the activation of cells responding to non-target stimuli has been suppressed. The aforementioned example in connection to the account of the timing of perceptual processes presented above helps us put into focus the role of working memory and higher cognitive centers in object-based attention. Object-based attention acts as the conductor of cognitive influences on visual processing and mediates the passage from the cognitively impenetrable processing that takes place in early vision or perception to a cognitively penetrable processing (what I have called ‘observation’ or ‘vision’). The instructions of the experimenter, for instance, enable the subject to employ the relevant cognitive resources and bring them into play to deal with the problem at hand. The fact that neurons in IT encoding the features of the target are more strongly activated than the neurons encoding other features on account of the influence of working memory suggests that, in the words of Clark (2006, p. 174) ‘‘somehow too the word comes down from on-high that it is RED that is sought; the word ‘red’ starts a chain of processes that eventuate in the appropriate instruction reaching chromatic systems in such a way that appropriate chromatic target is identified’’. 
So, the point in space and time at which non-conceptual representations, which characterize the non-conceptual cognitively impenetrable perceptual processing, come into contact or talk, as Clark notes, to the conceptual variety is determined by the top-down influence of working memory on perceptual processing that starts at the IT cortex and eventually spreads down to lower visual areas of the cortex. This top-down effect signals the onset of object-based attention, which is none other than the result of the biased competition in the IT cortex. How do these results affect the issue regarding the role of selective object-based attention in fixing the referents of perceptual demonstratives? Well, Campbell claims that object-based attention fixes the reference of demonstratives in a nondescriptive, that is, a conceptually encapsulated, way. However, the above sketch of visual processing and object-based attention renders clear that object-based attention, first, intervenes very late in visual processing and after objects have been individuated in the visual scene on the basis of spatio-temporal (and perhaps other featural information if the spatio-temporal information does not suffice to individuate the objects in the scene), and, second, that object-based attention involves working memory, which belongs to the consumer parts of the brain and functions as the interface between perceptual and conceptual/cognitive processing, that is, as the locus at which conceptual top-down information starts affecting visual processes rendering them cognitively penetrable.
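
The cued-search example and the biased competition account described above can be caricatured in a few lines of code. What follows is only a toy sketch under stated assumptions, not a model taken from Desimone (1999) or Vecera (2000): the function name `biased_competition`, the update rule, and all parameter values are hypothetical choices made for illustration. Each unit stands for an assembly encoding one feature; all units receive the same bottom-up drive, they suppress one another, and one of them receives an additional top-down bias standing in for the template held in working memory.

```python
def biased_competition(bottom_up, top_down_bias,
                       inhibition=0.6, rate=0.1, steps=200):
    """Toy mutual-suppression dynamics (illustrative assumption only).

    bottom_up      -- stimulus-driven input to each unit
    top_down_bias  -- extra input from the working-memory template
    inhibition     -- strength with which rival units suppress a unit
    """
    acts = [0.0] * len(bottom_up)
    for _ in range(steps):
        total = sum(acts)
        updated = []
        for act, drive, bias in zip(acts, bottom_up, top_down_bias):
            rivals = total - act  # summed activity of the other units
            change = rate * (-act + drive + bias - inhibition * rivals)
            updated.append(max(0.0, act + change))  # no negative activity
        acts = updated
    return acts


# Three equally salient features; only the first one is cued.
cued = biased_competition([1.0, 1.0, 1.0], [0.3, 0.0, 0.0])
# Same scene with no template held in working memory.
neutral = biased_competition([1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
print("with cue:   ", [round(a, 3) for a in cued])
print("without cue:", [round(a, 3) for a in neutral])
```

Under these assumptions the cued unit settles at a higher level than its rivals, and the rivals settle below the level they reach when no bias is present, which is the sense in which responses to non-target stimuli are suppressed. The sketch says nothing about timing or anatomy; it only illustrates how a small top-down bias can decide a competition among otherwise equal bottom-up signals.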


Surely this is not the result Campbell would wish for. To square perceptual nondescriptive demonstrative reference fixing that involves no sortals with the above empirical facts, one should search for a theory that allows for object individuation before the onset of object-based attention. This theory would explain how objects are individuated and locked onto (indexed) by early visual (perceptual) processing in a cognitively impenetrable way, that is, without the application of conceptual descriptive knowledge. Since there is evidence that object individuation takes place preattentively through early perceptual segmentation processes, these mechanisms and not object-based attention should be the focus of a non-conceptual nondescriptive theory of perceptual demonstrative reference fixing.12 Notice that Campbell (2002, pp. 17, 41) knows that the binding problem has already been solved to some extent before the onset of object-based conscious attention: “the natural way for conscious attention to identify the object, for the benefit of the processing-information systems, is to use the parameter that was used in solving the Binding Problem … that will provide a kind of address to the object that is bound” (Campbell 2002, p. 41). So, one has to wonder why Campbell thinks that conscious attention is needed for the identification of objects, that is, for the individuation of, and the opening of object-files for, the objects in a visual scene. I will briefly discuss this in the next section. The reader has certainly noticed that in the account of perceptual processing presented above, although object-based attention enters the picture too late to be able to play the role Campbell assigns to it in fixing demonstrative reference, spatial attention appears early enough. Could not one argue that it is spatial attention that fixes the reference of perceptual demonstratives?
Furthermore, if spatial attention affects early vision, and since spatial attention can be cognitively driven, does this not mean that conceptual content permeates perceptual processing in its early stages and that, consequently, conceptualism is vindicated? With regard to the first question, that spatial attention cannot play that role is evidenced by research showing single-object benefit phenomena and by a plethora of other research suggesting that the perceptual system locks onto objects and not places (Matthen 2005; Pylyshyn 2001, 2003; Raftopoulos, in press; Scholl 2001). With regard to the second question, namely whether the fact that spatial attention can be cognitively driven threatens the cognitive impenetrability of early vision, I have argued (Raftopoulos 2006; in press) that spatial attention allows only an indirect cognitive penetrability of perception that does not undermine the thesis that perception is cognitively impenetrable.

12 I have proposed such a theory elsewhere (Raftopoulos and Muller 2006b; Raftopoulos, in press).

4 Demonstrative reference fixing and consciousness

A last issue that I would like to address concerns Campbell’s insistence that perceptual demonstrative reference fixing requires consciousness. Campbell (2006b, p. 243) asks ‘‘But we can ask why it has to be consciousness of the object that is involved, rather than some perceptual state that might or might not be conscious. What exactly does consciousness contribute to our grasp of demonstrative concepts?
The answer I propose is that it makes possible for us to think in terms of categorical objects’’. ‘Thinking in terms of categorical objects’ is contrasted with ‘thinking in terms of the dispositions that surround us’ and it means that we think of objects and their properties in terms of the object and the property in itself and not in terms of the potentialities or dispositions for action that they afford us. When we perceive a round object we think in terms of roundness itself, rather than in terms of the complex of dispositions that roundness grounds.

It seems to me that Campbell targets Matthen’s (2005, 2006) account of perceptual demonstrative reference, in which the referents of perceptual demonstratives are determined through the function of vision-for-action, or motor-guiding vision (the dorsal visual stream). Campbell takes it, and, although I will not argue for that point here, I think he is correct, that the dorsal system deals with objects in terms of the potentialities for action they afford us, and not in terms of the objects and their properties in themselves. Only in the ventral system could ‘categorical representations’ emerge and, since we perceive objects categorically, reference to objects should be fixed through the function of the ventral stream. Furthermore, consciousness eventually emerges in the ventral stream and, thus, conscious experience can be involved in fixing the referents of perceptual demonstratives. Campbell (2006b, pp. 242–246) offers a rather incomplete argument to the effect that categorical perception is possible because of the role of conscious experience, but I will not deal with this thesis either. Instead, I will accept, for the sake of argument, that Campbell is right. So, let us suppose that consciousness is necessary for categorical perception and reference. What does this entail for the issue of demonstrative reference fixing?
Matthen argues that demonstrative reference is determined non-descriptively through the dorsal system and Campbell argues that it is consciously determined through the ventral system. I do not see any reason why one should choose one alternative at the expense of the other. In both cases, acting on an object, or recognizing/cognizing about it, requires that this object be identified in a way that better suits the occasional aims of the perceiver. This identification may consist in perceiving the object as a set of potentialities for action afforded by the object, or it may consist in categorically perceiving the object. In both cases the object must have been parsed in a scene both from ground and from other objects and must have been locked on by the perceptual system. In that sense, the referent of the perceptual demonstrative has been fixed, albeit in two different ways: one in an action/potentialities-related manner and the other in a categorical manner.13

13 Of course, Campbell could argue that only in categorical perception can one be construed as properly referring to an object, but such an argument is missing and, furthermore, I find it hard to see how this is something more than an exercise in mere semantics of the term ‘‘reference’’.

Let me return now to Campbell’s claim that reference requires consciousness. Campbell points out that categorical reference is provided not simply by conscious experience but by conscious object-based attention to the object. There is one caveat and one problem here. The caveat is that Campbell uses the terms ‘conscious attention’ and ‘selective attention’ indiscriminately. Since information binding is necessary to provide the complex binding parameter that individuates objects in a scene, thereby allowing demonstrative reference to these objects, and since Campbell thinks that attention is needed for that binding, Campbell draws the conclusion that conscious attention is needed for demonstrative reference fixing. The first objection to this is that, as we have seen, a significant amount of binding takes place preattentively (meaning before the onset of object-based attention, which according to Campbell fixes the demonstrative reference to objects in a visual scene). So there are occasions in which the complex binding parameter that individuates objects can be formed preattentively. (If motion and shape, to give an example, suffice to individuate objects, then this takes place preattentively. The same holds whenever spatio-temporal information suffices to individuate objects.) A second objection is that even when the binding of features requires some form of attention, as is the case with shape and color, the attention involved is spatial attention and not object-based attention (color and shape binding occurs earlier than the onset of object-based attention but it seems to require spatial attention). A third objection, which is a problem raised by Clark (2006, p. 172) as well, is that there are cases in which all the binding, selection, gating, and processing job required either to individuate objects (even in cases when this individuation presupposes conceptual involvement and, thus, the reference involved transcends the confines of perceptual demonstratives) or, even, to use conceptual information regarding these objects, can take place outside the realm of awareness, as in cases of implicit perception. In these cases there are certainly informational bottlenecks, but they are either non-attentional (Lamme 2003), or if they are attentional they are outside the realm of consciousness (Dehaene et al. 1998; Merikle et al. 2001; Rensink 2000; Treisman 1998).
In the latter case, as Clark rightly points out, selective attention can do the entire job without the subject being conscious of the objects referred to in perception. Notice that the processes involved in implicit perception usually take place in the ventral visual stream, and thus, the (implicit) perception of objects is categorical and not based on action potentialities. This means that Campbell cannot argue that this kind of perception is non-categorical and, therefore, non-referential.

The problem with Campbell’s claim that conscious object-based attention is required for reference fixing is that object-based attention, by involving the consumer parts of the brain and, thus, GRP, cannot provide the non-conceptual nondescriptive reference fixing that Campbell is after. So, does this rule out any role for consciousness in fixing the referents of perceptual demonstratives? This is not necessarily so, provided that one could assign the role of establishing categorical perception to a non-conceptual non-descriptive kind of consciousness. This cannot be the kind of consciousness that Campbell has in mind since, according to Campbell, this kind contributes to our grasp of demonstrative concepts and goes hand in hand with object-based attention, and we have rejected this possibility.14 However, there is another kind of consciousness that does not require object-based attention and does not involve the application of sortal concepts either; it is the ‘phenomenal consciousness or awareness’, which is opposed to ‘access or report consciousness or awareness’. The former is the consciousness we have of the phenomenal non-conceptual content of experience, whereas the latter is the consciousness we have of conceptual content (Block 2005; Lamme 2003; Raftopoulos and Muller 2006a; Raftopoulos, in press). Thus, Campbell could retain his view that awareness is necessary for categorical perception and reference, provided that the awareness involved is phenomenal awareness and not access awareness, a restriction that is imposed by the demand that the referents of perceptual demonstratives be fixed non-conceptually and non-descriptively.

14 We saw that object-based attention is registered at about 200–300 ms after stimulus onset. This coincides with the onset of GRP, a fact that shows that object-based attention is inextricably related to the function of memory and other cognitive centers. If the reader looks back at the example of how the biased competition account of attention accounts for the search of a feature in a visual scene, she will see that, indeed, attention that locks on to objects/features requires working memory.

References

Block, N. (2005). Two neural correlates of consciousness. Trends in Cognitive Sciences, 9(2), 46–53.
Brandom, R. B. (1996). Making it explicit: Reasoning, representing, and discursive commitment. Cambridge, MA: Harvard University Press.
Burge, T. (1977). Belief de re. Journal of Philosophy, 74, 338–362.
Campbell, J. (1997). Sense, reference and selective attention. Proceedings of the Aristotelian Society, Supplementary Volume, 71, 55–74.
Campbell, J. (2002). Reference and consciousness. Oxford: Clarendon Press.
Campbell, J. (2006a). Does visual attention depend on sortal classification? Reply to Clark. Philosophical Studies, 127, 221–237.
Campbell, J. (2006b). What is the role of location in the sense of a visual demonstrative? Reply to Matthen. Philosophical Studies, 127, 239–254.
Clark, A. (2000). A theory of sentience. Oxford: Oxford University Press.
Clark, A. (2006). Attention and inscrutability: A commentary on John Campbell, reference and consciousness. Philosophical Studies, 127, 167–193.
Cussins, A. (1990). The connectionist construction of concepts. In M. Boden (Ed.), The philosophy of artificial intelligence. Oxford: Oxford University Press.
Dehaene, S., Kerszberg, M., & Changeux, J. P. (1998). A neuronal model of global workspace in effortful cognitive tasks. Proceedings of the National Academy of Sciences, USA, 95, 14529–14534.
Dehaene, S., Naccache, L., Le Clec’H, G., Koechlin, E., Mueller, M., Dehaene-Lambertz, G., van de Moortele, P.-F., & Le Bihan, D. (1998). Imaging unconscious semantic priming. Nature, 395, 597–600.
Desimone, R. (1999). Visual attention mediated by biased competition in extrastriate visual cortex. In G. W. Humphreys, J. Duncan, & A. Treisman (Eds.), Attention, space and action: Studies in cognitive neuroscience (pp. 13–30). Oxford: Oxford University Press.
Driver, J., David, G., Russell, C., Turatto, M., & Freeman, E. (2001). Segmentation, attention and phenomenal visual objects. Cognition, 80, 61–95.
Goodale, M., & Milner, D. (2004). Sight unseen. New York, NY: Oxford University Press.
Goodale, M., & Milner, D. (2006). The visual brain in action (2nd ed.). Oxford: Oxford University Press.
Heck, R., Jr. (2000). Non-conceptual content and the ‘space of reasons’. Philosophical Review, 109(4), 483–523.
Jacob, P., & Jeannerod, M. (2003). Ways of seeing: The scope and limits of visual cognition. New York, NY: Oxford University Press.
Kahneman, D., Treisman, A., & Gibbs, B. J. (1992). The reviewing of object files: Object specific integration of information. Cognitive Psychology, 24, 174–219.
Kelly, S. D. (2001). Demonstrative concepts and experience. The Philosophical Review, 110(3), 397–420.
Lamme, V. A. F. (2003). Why visual attention and awareness are different. Trends in Cognitive Sciences, 7(1), 12–18.
Martin, M. G. F. (1997). The shallows of the mind. Proceedings of the Aristotelian Society, Supplementary Volume, 71, 75–98.
Matthen, M. (2005). Seeing, doing, and knowing: A philosophical theory of sense-perception. Oxford: Clarendon Press.


Matthen, M. (2006). On visual experience of objects. Philosophical Studies, 127, 195–220.
McDowell, J. (1994). Mind and world. Cambridge, MA: Harvard University Press.
Merikle, P. M., Smilek, D., & Eastwood, J. D. (2001). Perception without awareness: Perspectives from cognitive psychology. Cognition, 79, 115–134.
Norman, J. (2002). Two visual systems and two theories of perception: An attempt to reconcile the constructivist and ecological approaches. Behavioral and Brain Sciences, 25, 73–144.
Peacocke, C. (1992). A study of concepts. Cambridge, MA: The MIT Press.
Peacocke, C. (2001). Does perception have a non-conceptual content? The Journal of Philosophy, XCVIII(5), 239–269.
Pylyshyn, Z. (1994). Some primitive mechanisms of spatial attention. Cognition, 50, 363–384.
Pylyshyn, Z. (2001). Visual indexes, preconceptual objects, and situated vision. Cognition, 80, 127–158.
Pylyshyn, Z. (2003). Seeing and visualizing: It’s not what you think. Cambridge, MA: The MIT Press, A Bradford series book.
Raftopoulos, A. (2001). Is perception informationally encapsulated? The issue of the theory-ladenness of perception. Cognitive Science, 25, 423–451.
Raftopoulos, A. (2006). Defending realism on the proper ground. Philosophical Psychology, 19(1), 1–31.
Raftopoulos, A. Perception and cognition: How do psychology and the neural sciences inform philosophy. Cambridge, MA: The MIT Press, A Bradford Book (in press).
Raftopoulos, A., & Muller, V. (2006a). The non-conceptual content of experience. Mind and Language, 27(2), 187–219.
Raftopoulos, A., & Muller, V. (2006b). Non-conceptual demonstrative reference. Philosophy and Phenomenological Research, 72(2), 251–286.
Rensink, R. A. (2000). Seeing, sensing, and scrutinizing. Vision Research, 40, 1469–1487.
Reynolds, J. H., & Desimone, R. (2001). Neural mechanisms of attentional selection. In J. Braun, C. Koch, & J. Davis (Eds.), Visual attention and cortical circuits. Cambridge, MA: The MIT Press.
Scholl, B. J. (2001). Objects and attention: The state of the art. Cognition, 80, 1–46.
Scholl, B. J., & Leslie, A. M. (1999). Explaining the infant’s object concept: Beyond the perception/cognition dichotomy. In E. Lepore & Z. Pylyshyn (Eds.), What is cognitive science? (pp. 26–74). Malden, MA: Blackwell.
Smith, A. D. (2002). The problem of perception. Cambridge, MA: Harvard University Press.
Soames, S. (2005). Reference and description. Princeton, NJ: Princeton University Press.
Treisman, A. (1988). Features and objects. Quarterly Journal of Experimental Psychology, 40, 201–237.
Treisman, A. (1998). Feature binding, attention, and object perception. Philosophical Transactions of the Royal Society of London Series B, 353, 1295–1306.
Ungerleider, L. J., & Haxby, J. M. (1994). ‘What’ and ‘where’ in the human brain. Current Opinion in Neurobiology, 4, 157–165.
Vecera, P. (2000). Toward a biased competition account of object-based segmentation and attention. Brain and Mind, 1, 353–384.
Vision, G. (1997). Problems of vision. Oxford: Oxford University Press.
