NOTICE: This is the author’s version of a work that was accepted for publication in TOPOI. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms, may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in TOPOI: http://link.springer.com/article/10.1007%2Fs11245-013-9186-7
A generative system for intentional action?
Marco Mazzone
Abstract: It has been proposed that intentional actions are supplied by a generative system of the sort proposed by Chomsky for language. In this paper I aim to provide a closer analysis of this claim for the sake of conceptual clarification. To this end, I will first clarify what is involved in the thesis of a structural analogy between language and action, and then I will consider what kind of evidence there seems to be in favour of the thesis of a neurobiological identity. On this basis, I will subsequently focus on two definitional issues. The first is whether, as the claim of a generative system for intentional action suggests, humans may perform an infinite number of possible actions. The second is whether, as the claim of a generative system for intentional action suggests, what is at issue is conscious planning of action and therefore controlled processing.
Keywords: generative process; intention; action; Broca's area
1. Introduction
It has been recently suggested that both intentional action and its comprehension could be implemented by a generative system similar to that involved in language processing. Baldwin and Baird (2001, 171), for instance, have claimed that “[a] generative knowledge system underlies our skill at discerning intentions, enabling us to comprehend intentions even when action is novel and unfolds in complex ways over time”. Let me emphasize here the adjectives “novel” and “complex”. In fact, the ability to produce and understand complex and always novel sentences is the essential fact that motivates Chomsky's thesis of a generative system for language. Baldwin and Baird (2001, 176) also make the explicit claim that the generative system for discerning intentions “is probably just as rich and complex as the generative system underlying language”. This thesis of a strict analogy between language and intentional action due to their common generative structure has been taken very seriously by Pastra and Aloimonos (2012). Their purpose is to “present a biologically inspired generative grammar of action, which employs the structure-building operations and
principles of Chomsky's Minimalist Programme as a reference model” (Pastra and Aloimonos 2012, 103). In their view, the proposal of a generative grammar for describing the structure of action is justified by “experimental evidence on the common biological basis of language and action” (Pastra and Aloimonos 2012, 113). In this quotation a stronger thesis is suggested than that of a structural analogy between language and action: the claim is that there is in fact a “common biological basis”, that is, a (plausibly partial) neurobiological identity between these two phenomena. In this vein, Baars and Gage (2010, 405) suggest that the prefrontal cortex and the whole executive system “is critical for planning and the generative processes” both in action and language. Making plans for the future, they claim, requires that the brain has the ability to reconfigure elements of prior experiences in a way that does not exactly copy any past experience. This ability is apparent in toolmaking, one of the fundamental distinguishing features of primate cognition, but according to the authors “the generative power of language to create new ideas depends on this ability as well. The ability to manipulate and recombine internal representations depends critically on the PFC, which probably made it critical for the development of language” (Baars and Gage 2010, 402). In this paper my aim is to provide a closer analysis of the claim that intentional actions are supplied by a generative system just as language is. To this end, I will first clarify what is involved in the thesis of a structural analogy between language and action (section 2), and then I will consider what kind of evidence there seems to be in favour of the thesis of a neurobiological identity (section 3). On this basis, I will subsequently focus on two definitional issues. 
The first (section 4) is whether, as the claim of a generative system for intentional action suggests, humans may perform an infinite number of possible actions. Generative Grammar is intended to be an explanation of our ability to produce an infinite number of sentences. It has been suggested that communicative intentions would display the same kind of complexity as sentences (Levelt 1989; Sperber and Wilson 2002). Sperber and Wilson (2002) have offered an argument to the effect that, in this respect, communicative intentions might be quite different from intentions underlying non-communicative action: while communicative intentions – due to their semantic complexity – can vary in infinite ways, the number of possible non-communicative actions (and the related intentions) would be greatly constrained by many practicalities. A way to understand this claim – although not necessarily the one that Sperber and Wilson would espouse – is in terms of what is consciously intended. That is, the communicative intention underlying an utterance could be a conscious thought endowed with the same degree of semantic complexity as the utterance itself, while conscious plans of action would not have a comparable degree of complexity. These considerations bring us to the second issue (section 5): whether, as the claim of a generative system for intentional action suggests, what is at issue is conscious planning of action and therefore
controlled processing. With regard to those two issues, my aim is not to provide any conclusive argument in favour of one position over another. More modestly, I intend to analyse some theoretical options for the sake of conceptual clarification. Specifically, I will first consider the possibility that communicative intentions are conscious thoughts – that is, thoughts whose complex semantic content is consciously intended – driving language production in a top-down manner. Then I will sketch an account of my preferred option, according to which there are no such conscious plans at the instigation of speaking. There are reasons to think that automatic and controlled processes interact in a much more flexible way than that (Mazzone and Campisi in press; Mazzone in press): both in language and action, automatic processes can be thought to operate through a constraint-based, multi-level generative system, while conscious control can be thought to focus – from moment to moment and from circumstance to circumstance – on different components of the multiple representations involved. If this picture is correct, a single automatic generative mechanism may deliver an infinite number of both communicative and non-communicative actions endowed with complex structure, while conscious control interacts with that automatic processing in a distributed and changeable way.
2. Structural analogy between language and action
Action appears to have a complex intentional structure. Baldwin and Baird (2001, 172) point to the fact that even infants as young as 10-11 months are able to detect this structure, and in fact they parse continuous action along intention boundaries. Pastra and Aloimonos (2012, 103) recall that two-year-old children can not only parse hierarchically organized actions (Bauer 1995), but can also copy and reproduce them (Whiten et al. 2006).
Interestingly, in virtue of that hierarchical organization, the structure of action can be analysed in terms of means-end parse trees. A hierarchical organization is in fact considered an essential feature of action processing. As Baldwin and Baird (2001, 172) put it: Adults also appear to process continuous action streams in terms of hierarchical relations that link smaller-level intentions (e.g. in a kitchen cleaning-up scenario: intending to grasp a dish, turn on the water, pass the dish under the water) with intentions at higher levels (intending to wash a dish or clean a kitchen).
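The hierarchical relations described in this passage can be pictured as a means-end tree. What follows is only an illustrative sketch in Python, not a model proposed by Baldwin and Baird: the class name `Intention` and its methods are hypothetical, and the nodes simply reuse the kitchen example quoted above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Intention:
    """A node in a means-end tree: a goal plus the sub-intentions serving it."""
    goal: str
    means: List["Intention"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        """Return the lowest-level intentions in the order they are carried out."""
        if not self.means:
            return [self.goal]
        acts: List[str] = []
        for sub in self.means:
            acts.extend(sub.leaves())
        return acts

    def depth(self) -> int:
        """Number of hierarchical levels at and below this node."""
        return 1 + max((m.depth() for m in self.means), default=0)


# Smaller-level intentions linked to intentions at higher levels.
wash_dish = Intention("wash a dish", [
    Intention("grasp a dish"),
    Intention("turn on the water"),
    Intention("pass the dish under the water"),
])
clean_kitchen = Intention("clean a kitchen", [wash_dish])

print(clean_kitchen.leaves())
print(clean_kitchen.depth())
```

On this picture, parsing an action stream "along intention boundaries" amounts to recovering where one subtree ends and the next begins; the sketch represents only the tree, not the parsing process.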
Pastra and Aloimonos (2012) offer some detailed examples of how actions can be analysed in terms of parse trees. As an example, let us consider their partial analysis of the action “grasp with the hand a knife to slice” (Pastra and Aloimonos 2012, 108). This action can be first broken down
into a right-branch node containing the action of grasping the knife with the hand and a left-branch node containing the goal of slicing. Then the action of grasping the knife with the hand can be further broken down into a right-branch node containing the action of extending the hand towards the knife and a left-branch node containing the action of enclosing the knife with the hand. Apart from the details of this example and of the whole proposal, what is important to us is its general structure. Lower-level representations of actions can be combined together in accordance with higher-level representations – in other words, the latter specify arrangements of the former. This is also what happens in the traditional account of Generative Grammar (GG). Higher-level categories (e.g., Sentence) determine the combination of lower-level ones (e.g., Nominal Phrase, Verbal Phrase) by means of phrase structure rules (e.g., S → NP – VP). To be sure, in GG those rules have been conceived as operations supplied by a module specialized for language. In this case, one cannot expect that action and language, despite their superficial analogy, are processed in the same way. However, it has been proposed that the generative component of GG can be embedded within a different framework, which is domain-general instead of domain-specific for language. This view has been largely elaborated by Ray Jackendoff (2002; 2007a). In his approach, rules such as S → NP – VP are reinterpreted as “constraints on possible trees rather than as algorithmic generative engines for producing trees” (Jackendoff 2007a, 8). Here, the important idea is that “words, regular affixes, idioms, constructions, and ordinary phrase structure rules […] can all be expressed in a common formalism, namely as pieces of structure stored in long term memory” (Jackendoff 2007a, 11).
This means that the assumption of a rigid distinction, crucial to GG, between words and syntactic rules is completely abandoned: instead, there is supposed to be a continuum of pieces of representation with different degrees of abstraction – both with respect to generality of form and meaning. As a consequence, instead of specialized procedural rules ensuring the combination of inert words, there would be just pieces of structure – at different levels of abstraction – which act as constraints in the process of combination, insofar as they prescribe how they must be combined with each other. In this perspective, the only process that is needed is a domain-general one. In Jackendoff's (2007a, 11) words, “[t]he generation of novel sentences is accomplished across the board by the operation of clipping together pieces of stored structure, an operation called unification”. This process of unification is explicitly distinguished from the operation called “Merge” in the Minimalist Program, in that the former is not specific to language, as is made clear by Jackendoff
and Pinker (2005, 222): Unlike the recursive Merge operation in Chomsky's MP, [unification] combines expressions of any size and composition, not just words and syntactic trees […]. Unification may be a fundamental operation throughout perception and cognition: if so, the language-specific part of grammar would reside in the nature of the stored representations (their constants and variables) rather than in the operation that combines them.
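The idea of "clipping together pieces of stored structure" can be made concrete with a toy example. The sketch below is a simplified illustration of unification over nested attribute-value structures, written in Python; it is not Jackendoff's formalism. The names `unify`, `Clash` and `OPEN`, and the particular attributes used (`cat`, `subj`, `goal`, `tool`), are assumptions introduced here for illustration only.

```python
class Clash(Exception):
    """Raised when two pieces of structure cannot be clipped together."""


OPEN = None  # hypothetical marker for an unfilled slot (a variable)


def unify(a, b):
    """Combine two pieces of structure, or raise Clash if they conflict.

    Pieces are nested dicts (attribute-value structures) or atomic values.
    The same operation applies whatever the pieces happen to describe.
    """
    if a is OPEN:
        return b
    if b is OPEN:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for key, value in b.items():
            merged[key] = unify(merged[key], value) if key in merged else value
        return merged
    if a == b:
        return a
    raise Clash(f"{a!r} does not unify with {b!r}")


# A stored phrase-structure constraint (S -> NP - VP) with open slots ...
sentence_frame = {"cat": "S", "subj": {"cat": "NP"}, "pred": {"cat": "VP"}}
# ... clipped together with a lexical piece that fits the subject slot.
sentence = unify(sentence_frame, {"subj": {"head": "Mary"}})

# The same operation applied to a stored action schema.
coffee_frame = {"goal": "make coffee", "tool": OPEN}
action = unify(coffee_frame, {"tool": "small coffee maker"})

print(sentence)
print(action)
```

In this toy version, whatever is specific to language or to action resides in the stored pieces (their constants and open slots), while `unify` itself is indifferent to the domain, which is the point of the passage quoted above.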
Jackendoff and Pinker (2005) claim that not even recursion is special to language. They make an argument that recursive structures can be found in perception as well as in language. Recursion should then be considered a general feature of cognition and specifically of the domain-general process of unification. In sum, the same sort of combinatorial structure seems to be found in language and action. Jackendoff (2007b) explicitly analyses some examples of non-communicative actions such as making coffee, and he insists on the thesis of a strict analogy with linguistic competence: in both language and action there would be both hierarchical complexity and recursive structure. However, as we saw, Jackendoff goes beyond the claim of a structural analogy: he argues that it is at least possible to coherently account for generativity by means of a domain-general, constraint-based mechanism. Therefore, it is at least possible that the generativity of language does not require a specialized mechanism. But he goes even further by offering linguistic, psycholinguistic, and evolutionary arguments for the positive thesis that language is processed by a domain-general, not a domain-specific process. This is also what has been suggested on the basis of neurobiological evidence.
3. Neurobiological identity between language and action
In his review of hierarchical models of behaviour, Matthew Botvinick (2008) notes that the issue has recently received renewed attention in cognitive and developmental psychology, as well as in neuropsychology and neuroscience. With regard to the neural basis of the phenomenon, Botvinick (2008, 204) observes that “[t]he neural mechanisms underlying the production of hierarchically organized behavior have long been considered to reside, at least in part, within the dorsolateral PFC (DLPFC)”.
Specifically, on the basis of neurophysiological and neuropsychological evidence, the dorsolateral prefrontal cortex has been proposed by Fuster (1990; 2004) to have a key role in the temporal integration of behaviour, insofar as it serves to maintain
context or goal information at multiple, hierarchically nested levels of task structure. It should be noticed that in Fuster's view these considerations on the DLPFC specifically concern high-level processing of behaviour, but hierarchical organization in itself is taken to be a general phenomenon concerning the entire brain: The physiology of the cerebral cortex is organized in hierarchical manner. At the bottom of the cortical organization, sensory and motor areas support specific sensory and motor functions.
Progressively higher areas—of later phylogenetic and ontogenetic development—support functions that are progressively more integrative. The prefrontal cortex (PFC) constitutes the highest level of the cortical hierarchy dedicated to the representation and execution of actions. (Fuster 2001, 319)
In other words, the brain seems to be organized along two distinct pathways, respectively constituting a sensory and a motor hierarchy of cortical maps. The PFC lies at the top of the motor hierarchy and it seems to contain neuronal networks that, both in monkeys and in humans, represent abstract programs or plans of action (Fuster 2003, 76). Grafton and Hamilton (2007) have found evidence for a hierarchy of action representation in the entire brain. They have performed three functional brain imaging studies of action observation using the method of repetition suppression, based on the reduced physiologic response to repeated stimuli. This has made it possible to separately address different levels of representation for the same stimulus, revealing differential activation for the outcome of a movement (right inferior frontal network), for the evaluation of the goal-object interactions (anterior intraparietal sulcus), and for the evaluation of the lower level kinematics (visual association areas). Specifically, as far as the frontal cortex is concerned, recent research supports the idea that “a topographical organization might exist within the frontal cortex and the DLPFC, according to which progressively higher levels of behavioral structure are represented as one moves rostrally” (Botvinick 2008, 205). Koechlin and Jubault (2006, 936), on the basis of evidence from magnetic resonance imaging, propose that Broca's area and its homolog in the right hemisphere (synthetically, BCA regions) might “implement a specialized executive system governing action selection in hierarchically structured action plans”. Importantly, their study shows “phasic activation at the boundaries of action segments that constitutes a hierarchical action plan” (Koechlin and Jubault 2006, 936), consistent with behavioural evidence that we segment actions in accordance with their hierarchical structure (as seen in section 2). 
They have also found evidence of a topographical organization extending from premotor to anterior BCA regions, with a different
localisation for single acts, simple chunks, and superordinate chunks. Specifically, premotor regions seem to be involved in selecting single acts, posterior BCA regions (Brodmann area 44) in selecting/inhibiting simple chunks, and anterior BCA regions (Brodmann area 45) in selecting/inhibiting superordinate chunks. Importantly, left Brodmann areas 44 and 45 constitute Broca's area, that is, the area which is traditionally linked to language production. As a matter of fact, a topographical organization of hierarchically organized representations within Broca's area has been proposed for language as well as for action. There is some evidence of dissociation between phonological, semantic and syntactic processing, with posterior regions preferentially involved in phonological processing while anterior and anterior-ventral regions would be more specifically involved in syntactic and semantic processing (Koechlin and Jubault 2006, 971; Gough et al. 2005; Bookheimer 2002). Therefore, both in language and action processing Broca's area seems to play a key role in managing hierarchical complexity. There is strong neuroimaging evidence that this region has functions that extend beyond language alone. As observed by Fadiga et al. (2009), Broca's area is not only related to language processing, but is also a part of the premotor cortex and, as such, is involved in action representation (Roby-Brami et al. 2012, 150). In particular, there is extensive evidence of the role of Broca's area in the representation of human action (Fazio et al. 2009; Clerget et al. 2009), and more specifically of tool use (Higuchi et al. 2007) and music (Fadiga et al. 2009). This conclusion also receives support from neuropsychological evidence. Roby-Brami et al. (2012, 144), in their review of the topic, have found “functional and neuroanatomical links between language and praxis in brain-damaged patients with aphasia and/or apraxia”.
Putting together these neurological data on the relation between language and action with the evidence of the role played by Broca's area in managing hierarchical representations, the hypothesis has been advanced that this area is a “supramodal hierarchical processor” (Tettamanti and Weniger 2006; Fazio et al. 2009). In other words, its computational role could be to process hierarchical structures in different domains, including language, tool use and music. Further evidence of the neurological relation between action recognition and language perception has been provided by studies based on event-related potentials. Sitnikova et al. (2008), making use of a violation paradigm applied to the structure of action, have found that the introduction of a tool irrelevant to the action context elicited a late positivity (P600), that is, a neurophysiological response usually linked to violations in syntactic processing. The general hypothesis is that there are two distinct semantic integration mechanisms at play both in action and language (see also Kuperberg 2007). The first (reflected by the N400 component) is based on the activation of graded semantic memory networks and is supposed to be particularly useful in familiar
circumstances. The second (reflected by the P600 component) could involve discrete, rule-like representations, and – as far as action perception is concerned – it might “play an important part in flexible visual real-world comprehension by enabling viewers to understand relationships in novel combinations between entities and actions” (Sitnikova et al. 2008, 2054). Given the poor spatial resolution of EEG techniques, Sitnikova and colleagues cannot draw conclusions on the precise localisation of the second, discrete and rule-based, process. They speculate, however, that this process could be based on the neurobiological mechanisms specific to prefrontal cortex, given the recent evidence (Rougier et al. 2005) that they “can lead to self-organization of discrete, rule-like representations coded by patterns of activity” (Sitnikova et al. 2008, 2055). Finally, it is interesting to cite Hagoort's (2005) proposal concerning a left frontal language network (including BA 44 and 45, BA 47 and possibly the ventral part of BA 6) that would provide a “unification workspace” in the precise sense intended by Jackendoff (2002, 2007): a process for unifying linguistic pieces of structure in accordance with their internal hierarchical structure. In sum, the analogy between language and action due to their common hierarchical structure seems to have a plausible neurobiological ground. Not only is hierarchical organization a crucial feature of the cerebral cortex in general, but also there is evidence that managing hierarchical structures in language and action could recruit partially overlapping areas in the brain, with a crucial role played by the prefrontal cortex and Broca's area.
4. An infinite number of intentional actions?
Let us recall Baldwin and Baird's (2001, 171) claim that a generative system enables us to “comprehend intentions even when action is novel and unfolds in complex ways over time”.
An implicit assumption underlying this claim seems to be that when actions are novel and complex it is more difficult to comprehend intentions, possibly because they are novel and complex as well. A similar issue arises with regard to language. It should be kept in mind that the generative system proposed by Chomsky for language processing is not presumed to explain communicative intentions. It is presumed to explain instead the infinite variety of sentence structures. Do communicative intentions have a complex structure just as sentences do? Is there, as a consequence, an infinite variety of communicative intentions, just as there is an infinite variety of sentences? An affirmative answer to the latter question has been given by Levelt in his Speaking (Levelt 1989), which represents the most comprehensive effort undertaken thus far to address the role of intentions within a general framework for language production. Levelt proposes that communicative intentions are produced by a processing system that he calls “the Conceptualizer”. In his view, the Conceptualizer involves highly controlled processing. The argument that he gives for that is as
follows: Speakers do not have a small, fixed set of intentions that they have learned to realize in speech. Communicative intentions can vary in infinite ways, and for each of these ways the speaker will have to find new means of expression. This requires much attention. (Levelt 1989, 21)
It seems that, in Levelt's opinion, there is not a finite set of communicative intentions to be realized by the infinite variety of sentences. On the contrary, communicative intentions differ from each other just as sentences do. One reasonable interpretation is that this happens because communicative intentions are just as complex as sentences are. In Levelt's model this makes perfect sense: communicative intentions are thought to drive language production in a substantially top-down manner, and therefore intentions must contain in themselves the whole content to be expressed by sentences. A similar view is adopted by Sperber and Wilson (2002). They assume that communicative intentions differ from each other as a function of the semantic complexity of utterances. But they also claim that in this respect communicative action differs from non-communicative action:
In the repertoire of human actions, utterances are much more differentiated than other types of actions: many utterances are wholly new, whereas it is relatively rare to come across actions that are not reiterations of previous actions […]. Leaving stereotypical utterances aside, the prior probability of most utterances ever occurring is close to zero, as Chomsky pointed out long ago. Semantically, the complexity of ordinary [that is, non-communicative] intentions is limited by the range of possible actions, which is in turn constrained by many practicalities. There are no such limitations on the semantic complexity of speaker's meaning. Quite simply, we can say so much more than we can do. (Sperber and Wilson 2002, 11)
The claim is that while sentences (and the related communicative intentions) are complex, it is not the case that ordinary (i.e., non-communicative) actions and intentions have a similar kind of complexity. This claim is obviously contrary to Baldwin and Baird's (2001) thesis of a generative system for action, insofar as that thesis presupposes the complexity and infinity of non-communicative intentions as well. Now, my aim is by no means to settle the issue here. I just want to consider and compare with each other some ways to understand the claim of an infinite number
of communicative and non-communicative intentions. To start with, we have to clarify the conceptual difference between performing an infinite number of actions/utterances and intending to perform an infinite number of actions/utterances. Let us start from the supposition that non-communicative actions do have a complex hierarchical structure in the sense we defended above (section 2). Moreover, let us assume that the components of those complex actions change (to some extent) from one time to another. For instance, in making coffee I may or may not choose caffeine-free coffee; I can use a big or a little coffee maker and so on and so forth. Nonetheless, one could argue that these occasional differences are not a constitutive part of high-level intentions: in the previous example, I would intend to make coffee every time in spite of the differences. That is, the lower-level structure of the action – to make caffeine-free coffee with a little coffee maker... – would not be intended in itself. Alternatively, one could assume that we intend the precise occasion-specific structure of our non-communicative actions, so that – in accordance with Baldwin and Baird (2001) – non-communicative intentions may in principle vary infinitely. The same issue arises with regard to communicative intentions: we can consider the semantic complexity of utterances either as constitutive of the respective intentions or not.
Both Levelt (1989) and Sperber and Wilson (2002) appear to choose the first option: they assume that the ever-changing arrangement of words in utterances is constitutive of communicative intentions, so that communicative intentions vary from utterance to utterance.1 More precisely, Sperber and Wilson (2002) might be understood as suggesting that, while in the case of non-communicative actions the agent may intend to make an action with an occasion-specific structure without intending that occasional structure, in the case of communicative intentions the speaker would intend the precise structure of what he utters. What I want to address now is which precise sense one should give to this intending the specific structure of actions/utterances. One possible interpretation is in terms of conscious intentions – or, in Levelt's words, in terms of controlled processing. It is this hypothesis that we are going to consider now.
1 I am not considering here the way in which, even for the same utterance, communicative intention may change as a function of context. However, as a first approximation that issue may be set aside. As a matter of fact, in the quotation above Sperber and Wilson (2002) seem to suggest that communicative intentions are just as complex as would be predicted by the syntactic structure of the corresponding utterances.
5. Conscious intentions?
Levelt (1989) seems very close to the position we intend to analyse. As we saw, he proposes that communicative intentions can vary in infinite ways and that they are produced by a processing system, the Conceptualizer, which involves highly controlled processing. In fact, in Levelt's view, “speaking is usually an intentional activity; it serves a purpose the speaker wants to realize. An intentional activity is, by definition, under central control” (Levelt 1989, 20). More specifically, the Conceptualizer is in charge of a sum of activities which include “conceiving of an intention, selecting the relevant information to be expressed for the realization of this purpose, ordering this information for expression, keeping track of what was said before, and so on. These activities require the speaker’s constant attention” (Levelt 1989, 9). In this quotation, let me emphasize the suggestion that the conscious operations of the Conceptualizer include the selection of the information whose expression is needed in order for the communicative intention to be fulfilled. In other words, communicative intentions can be said to contain in themselves the whole content to be expressed by utterances. In practice, the final output of the Conceptualizer is taken to be a preverbal message consisting of conceptual information. This preverbal message is then fed into the subsequent component of the model, the Formulator, which has to translate the conceptual structure into a linguistic structure thanks to the operations of two subcomponents, the Grammatical and the Phonological Encoder. Finally, a phonetic plan, which is the output of the Formulator, is fed into the Articulator which is responsible for its motor execution. In sum, in Levelt's view a communicative intention is essentially a preverbal message, that is, a complete thought which is produced under conscious control (by the Conceptualizer) and then drives language production in a substantially top-down manner (through the operations of the Formulator and the Articulator). Let me emphasize that this view is not trivial at all.
To start with, as we have repeated, according to Generative Grammar it is sentences that result from a generative system and that can vary in infinite ways. The claim of a modular generative system for sentences is quite different from the claim of a generative system for intended thoughts which operates under conscious control. It is not clear whether these two theses are even compatible with each other. Intuitively, one has to choose between the two, insofar as the assumption is made that the generative mechanism operates at one single level of representation – consider, for instance, the debate opposing Generative Grammar and Generative Semantics in the sixties and seventies. However, I would like to consider the possibility that this assumption is to be abandoned altogether. Specifically, I intend to outline the hypothesis of a generative system which operates neither at the level of syntax nor at the level of thoughts alone, in fact a multi-level generative system for communicative intentions which operates automatically but under conscious control. The idea of a multi-level generative system for language production is explicitly formulated by Jackendoff (2007a). His model – that we have partially outlined above (in section 2) – can be characterized as “a constraint-based architecture with parallel sources of generativity” (Jackendoff
2007a, 12). More precisely, his theory “treats phonology, syntax, and semantics as independent generative components” and it “uses a parallel constraint-based formalism that is nondirectional” (Jackendoff 2007a, 2). Therefore, the building of linguistic structures, which is accomplished through the unification process we referred to above, is taken to be constrained by word-based, phrase-based, semantically based and even pragmatically based conditions. In short, in this hypothesis there would be a multi-level generative system operating through a constraint-based process. Processes of this kind are generally conceived of as essentially automatic. So what would be the place for consciousness in this process? One possibility is that consciousness interacts with the automatic generative component in a dynamic and changeable manner. In Mazzone and Campisi (in press) the claim is made that “in order for actions to be intentional it is not required that action plans are consciously represented and then put into effect in a purely top-down manner”. This is intended to mean essentially two things. First, consciousness need not be concentrated in the instigation of action; it can be expected instead to be distributed along the whole time course of action. Second, consciousness need not be focussed on one single level of representation and one single content; it can be focussed instead on contents lying at different levels of the action representation, as a function of changeable circumstances. As an argument for the first point, it should be kept in mind that “consciousness is too slow for it to ensure fast and efficient action initiation: most of the time conscious control comes later, in the course of processing, as a mechanism for goal maintenance and shielding, for reorganization of habits, or for the management of unexpected difficulties” (Mazzone and Campisi in press).
As to the second point, we should consider that “actions are complex entities involving an indefinite number of different goals, both hierarchically and heterarchically related, and agents may consciously attend to different goals [or other goal-related representations] in the course of one and the same action” (Mazzone and Campisi in press). The distributed and changeable role of consciousness is well expressed by Jordan (2003, 6), according to whom action is supported by a “hierarchy of nested control systems, each of which is responsible for pre-specifying, monitoring, and producing outcomes at a different level of scale”. Now, [g]iven that these systems are all coupled and function simultaneously, one can be said to be controlling (i.e., pre-specifying, monitoring, and producing) multiple events at the same time. The temptation of course is to identify consciousness with one particular level of event control. It seems to be the case however that the actual timbre of consciousness can find itself distributed across different levels of event-control at any given moment. A beginning pianist for example, is probably very conscious of his/her attempt to control the
relationship between finger positions and keys. Pre-specifying, monitoring, and producing these relationships constitutes the task at hand. For an expert pianist however, who has long since automated the control of finger-key relationships, consciousness may be most intimately tied to the control of the emotional states provoked by the piece. Thus, the content of consciousness is fluid and dynamic, and pinning it down to one particular level of event control seems difficult, if not impossible. (Jordan 2003, 6)
6. Conclusions
Do communicative intentions have a complex structure which parallels the complex structure of utterances, so that intentions can vary in infinite ways? I have considered a possible affirmative answer to this question, based on the idea that the complex structure of utterances is consciously intended by the speaker. However, this assumption can be understood in (at least) two different ways. On the one hand, a conscious thought can be formed at the instigation of speaking and it can drive speaking in a substantially top-down manner. On the other hand, utterances may be produced by an automatic, multi-level generative system, with conscious control applying to the resulting complex representation in a dynamic and changeable manner. Even so, it can be argued (as in Mazzone and Campisi in press) that such a distributed view of conscious control is compatible with the claim that utterances are intentional and, specifically, that their semantic structure is intended. The same considerations also apply to non-communicative actions: their structure might be intended either in the sense that they are previously planned and then executed in a top-down manner or in the sense that their structure is generated by an automatic, multi-level generative system operating under distributed intentional control. Although I think that there are reasons to prefer the latter theoretical option (see Mazzone and Campisi in press; Mazzone in press), this was not my point here. I just wanted to analyse some implications of the thesis of a generative system for intentional actions, for the sake of conceptual clarification. The first implication of the thesis is that intentional actions have the sort of hierarchical (and recursive) structure that is generally ascribed to language. This is in fact the view expressed by Baldwin and Baird (2001) and Pastra and Aloimonos (2012).
Jackendoff (2002; 2007a; 2007b) has also adopted this view within a general framework in which the very same cognitive process is responsible for the generation of sentences and actions. Moreover, we have considered some evidence that hierarchical processing of language and action might recruit partially overlapping areas in the brain. Then, I have considered another possible implication: the idea that there is an infinite number
of non-communicative intentions underlying the infinite number of possible actions, just as, according to Levelt (1989) and Sperber and Wilson (2002), there is an infinite number of communicative intentions underlying the infinite number of possible sentences. This idea is far from obvious. It depends, among other things, on whether communicative and non-communicative intentions are conceived of as endowed with a complex structure of the same sort as sentences. The two views that I have been contrasting above share the assumption that intentions are indeed complex, but in one case something like a generative system for conscious intentions is presupposed, while in the other a multi-level, automatic generative system is postulated to work in concert with distributed conscious attention. It should be noticed that neither view is easy to reconcile with the thesis of a generative system for sentences. In the former case, it is instead intentions that drive the process, imposing the structure on lower levels of processing. In the latter case, there is no single level driving the process of combination. Thus, somewhat paradoxically, although the hypothesis of a generative system for action is inspired by the thesis of a generative system for syntax, the former winds up calling the latter into question. Insofar as the same sort of mechanisms, located in overlapping neural areas, seem involved in the generation of action and utterances, it is the assumption of language specificity that is called into question. And insofar as communicative and non-communicative actions are taken to be driven by conscious intentions or, alternatively, conceived as the result of multi-level generative systems, one has to question the assumption that syntax has a special role in driving linguistic processing.
References
Baars B, Gage N (2010) Cognition, brain, and consciousness. Academic Press, Amsterdam
Baldwin D, Baird J (2001) Discerning intentions in dynamic human action. Trends Cogn Sci 5: 171-178
Bauer P (1995) Recalling past events: from infancy to early childhood. Ann Child Dev 11: 25-71
Bookheimer S (2002) Functional MRI of language: new approaches to understanding the cortical organization of semantic processing. Annu Rev Neurosci 25: 151-188
Botvinick M (2008) Hierarchical models of behavior and prefrontal function. Trends Cogn Sci 12: 201-208
Clerget E, Winderickx A, Fadiga L et al (2009) Role of Broca’s area in encoding sequential human actions: a virtual lesion study. Neuroreport 20: 1496-1499
Fadiga L, Craighero L, D'Ausilio A (2009) Broca's area in language, action, and music. Ann NY Acad Sci 1169: 448-458
Fazio P, Cantagallo A, Craighero L et al (2009) Encoding of human action in Broca's area. Brain 132: 1980-1988
Fuster J (1990) Prefrontal cortex and the bridging of temporal gaps in the perception-action cycle. Ann NY Acad Sci 608: 318-329
Fuster J (2001) The prefrontal cortex-An update: Time is of the essence. Neuron 2: 319-333
Fuster J (2003) Cortex and Mind. Unifying cognition. Oxford University Press, Oxford
Fuster J (2004) Upper processing stages of the perception-action cycle. Trends Cogn Sci 8: 143-145
Gough P, Nobre A, Devlin J (2005) Dissociating linguistic processes in the left inferior frontal cortex with transcranial magnetic stimulation. J Neurosci 25: 8010-8016
Grafton S, Hamilton A (2007) Evidence for a distributed hierarchy of action representation in the brain. Hum Movement Sci 26: 590-616
Hagoort P (2005) On Broca, brain, and binding: a new framework. Trends Cogn Sci 9: 416-423
Higuchi S, Chaminade T, Imamizu H et al (2007) Shared neural correlates for language and tool use in Broca’s area. Neuroreport 20: 1376-1381
Jackendoff R (2002) Foundations of language. Brain, meaning, grammar, evolution. Oxford University Press, Oxford
Jackendoff R (2007a) A parallel architecture perspective on language processing. Brain Res 1146: 2-22
Jackendoff R (2007b) Language, consciousness, culture: Essays on mental structure (Jean Nicod Lectures). MIT Press, Cambridge, MA
Jackendoff R, Pinker S (2005) The nature of the language faculty and its implications for evolution of language (Reply to Fitch, Hauser, & Chomsky). Cognition 97: 211-225
Jordan J (2003) Consciousness on the edge: The intentional nature of experience. In: Science and Consciousness Review. http://www.scicon.org/news/articles/20040101.html. Cited 19 Feb 2013
Koechlin E, Jubault T (2006) Broca's area and the hierarchical organization of human behavior. Neuron 50: 963-974
Kuperberg G (2007) Neural mechanisms of language comprehension: Challenges to syntax. Brain Res 1146: 23-49
Levelt W (1989) Speaking: From intention to articulation. MIT Press, Cambridge, MA
Mazzone M (in press) Automatic and controlled processes in pragmatics. In: Capone A, Lo Piparo F, Carapezza M (eds) Perspectives on Pragmatics and Philosophy. Springer
Mazzone M, Campisi E (in press) Distributed intentionality. A model of intentional behavior in humans. Philos Psychol
Pastra K, Aloimonos Y (2012) The minimalist grammar of action. Philos T R Soc B 367: 103-117
Roby-Brami A, Hermsdorfer J, Roy A et al (2012) A neuropsychological perspective on the link between language and praxis in modern humans. Philos T R Soc B 367: 144-160
Rougier N, Noelle D, Braver T et al (2005) Prefrontal cortex and flexible cognitive control: rules without symbols. Proc Natl Acad Sci USA 102: 7338-7343
Sitnikova T, Holcomb P, Kiyonaga K et al (2008) Two neurocognitive mechanisms of semantic integration during the comprehension of visual real-world events. J Cognitive Neurosci 20: 1-21
Sperber D, Wilson D (2002) Pragmatics, modularity and mind-reading. Mind Lang 17: 3-23
Tettamanti M, Weniger D (2006) Broca’s area: A supramodal hierarchical processor? Cortex 42: 491-494
Whiten A, Flynn E, Brown K et al (2006) Imitation of hierarchical action structure by young children. Dev Sci 9: 574-582