Controlled and automatic processing in animals and machines with application to autonomous vehicle control
Kevin Gurney1, Amir Hussain2, Jon Chambers1, and Rudwan Abdullah2
1 Adaptive behaviour research group, Department of Psychology, University of Sheffield, S10 2TN, UK
[email protected]
2 Department of Computing Science, University of Stirling, Stirling, FK9 4LA, Scotland, UK
Abstract. There are two modes of control recognised in the cognitive psychological literature. Controlled processing is slow, requires serial attention to sub-tasks, and requires effortful memory retrieval and decision making. In contrast, automatic control is less effortful, less prone to interference from simultaneous tasks, and is driven largely by the current stimulus. Neurobiological analogues of these are goal-directed and habit-based behaviour respectively. Here, we suggest how these control modes might be deployed in an engineering solution to Autonomous Vehicle Control. We present pilot data on a first step towards instantiating automatised control in the architecture, and suggest a synergy between the engineering and biological investigation of this dual-process approach.
keywords: Executive control, habits, basal ganglia loops, Fuzzy Tuning, Autonomous Vehicle Control, dual-process theory
1 Introduction
What are the different control strategies used by humans to enact behaviour and what might their implications be for control engineering? To get some idea of an answer to the first question, consider the following scenarios. Imagine making tea soon after getting out of bed in the morning in your own kitchen. You probably know exactly what to do without having to be consciously aware of it – the locations of the tea, milk, sugar, kettle and water-tap are all well learned, as are the motor actions required to interact with the objects in these locations. Introspection after the event leads us to use terms such as ‘I did it in my sleep’ or ‘I was on auto-pilot’. Now consider doing the same task if you are staying at a friend’s house for the first time. A completely different strategy appears to be used. Thus, we have to be alert, explore, and use high level cognitive knowledge that we hope generalises well (for example, we hope the tea will be in a cupboard near the sink, not in the living room). These two modes of control are well recognised in the psychological literature [see, for example, 1] as automatic, and controlled or executive processing respectively. There is also
growing neurobiological evidence for the existence of different control regimes, supported by different neural substrates [2, 3]. In this paper, we first review the two modes of control from a biological perspective, looking at their characterisation, mechanistic and neural substrate, and the rationale for there being a control duality in the first place. We then go on to deploy these ideas in the context of an architecture for autonomous vehicle control which uses multiple sub-controllers [4]. This builds on a general consideration of the links between biology and the vehicle controller dealt with previously [5]. Thus, we suggest how the architecture may incorporate ‘automatised’ processing to complement its current ‘executive’ control of sub-controller selection. The programme of work required to fully evaluate the resulting architecture is ambitious, but we make a first step here by describing a pilot study dealing with one aspect of learning automatic responses. We conclude by considering the computational problems that arise in the new architecture and their similarity with biological counterparts.
2 Dual mode control: biological perspectives
2.1 Characterisation
The concepts of automatic and controlled processing have a long history in psychological research [6] but the development of a full dual-process theory of cognition is often attributed to the work of Shiffrin and Schneider [7, 8]. Controlled processing is under the subject’s direct and active control, is slow, and requires serial attention to component stimuli or sub-tasks. It is sensitive to a task’s difficulty (which limits the ability to perform additional tasks at the same time) and requires effortful memory retrieval and decision making. In contrast, automatic control [9] is less effortful, less prone to interference from simultaneous tasks, is driven largely by the current stimulus and does not necessarily give rise to conscious awareness. As well as defining two kinds of processing, the dual-process theory supposes a dynamic of the transfer of control under learning. Thus, controlled processing is the mode required in the early acquisition of new skills which, when well practised, are carried out using automatic processing. For a recent review of dual-process theory see [10]. The development of automatic processing has close similarities with the notion of stimulus-response (S-R) learning, or habit learning, in the behavioural neuroscience literature (for a recent review see [11]). A habit is deemed to be constituted when an animal responds to a stimulus without regard to the value of reward obtained by that response [12]. This may occur, for example, if the response elicits a food reward but the animal is sated. Thus, habits are context-triggered, learned behaviours which are entirely contingent on stimulus and not the goal. Similarly, controlled processing may be likened to goal-directed behaviour in animals, in which the animal makes a response eliciting something of genuine, current reward value.
Further, just as there is a gradual transfer from controlled to automatic processing in cognitive dual-process theory, habits are learned after an initial goal-directed period of the same behaviour.
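The difference between habitual and goal-directed responding can be made concrete with a toy devaluation test in the spirit of [12]. The sketch below is illustrative only: the function names, the numeric strengths and the thresholds are assumptions introduced here and are not taken from any cited model.

```python
# Toy devaluation test. All names and numbers are illustrative assumptions.

def goal_directed_responds(outcome_value, threshold=0.0):
    """Respond only if the outcome currently has value for the animal."""
    return outcome_value > threshold


def habitual_responds(sr_strength, threshold=0.5):
    """Respond whenever the learned stimulus-response link is strong enough.

    The current value of the outcome is deliberately not an argument.
    """
    return sr_strength > threshold


# Food is valuable when hungry (1.0) and devalued when sated (0.0).
hungry_value, sated_value = 1.0, 0.0
sr_strength_after_overtraining = 0.9

goal_directed = [goal_directed_responds(v) for v in (hungry_value, sated_value)]
habitual = [habitual_responds(sr_strength_after_overtraining)
            for _ in (hungry_value, sated_value)]
```

Under these assumptions the goal-directed rule stops responding once the food is devalued, whereas the habitual rule responds in both conditions.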
2.2 Mechanisms and models: psychological
In the domain of cognitive psychology, one of the most influential models of dual process theory is that of Norman and Shallice [13, 14], henceforth referred to as the NS model. Central to the model (Figure 1) is the idea that behaviour is decomposed into a set of primitive actions or schemas. Examples of schemas are provided by the scenario described in the introduction: pouring milk into a cup, opening a fridge door, putting a kettle on, etc. These schemas are activated by sensory input (pre-processed through the perceptual system) via a set of ‘triggers’. Thus, each schema delivers a particular behavioural output in response to a narrow range of sensory input. However, given a particular environmental situation, several possible schemas may be activated. These cannot all be expressed simultaneously because there are limited motor resources (one cannot pour milk and spoon sugar with one’s dominant hand simultaneously). This problem of schema selection is an example of the more widely studied problem of action selection [15], a universal problem for all autonomous agents. It appears in many forms across various disciplines, and the need for its resolution in all animals lies at the heart of our neuroscientific account of automatic and controlled processing described in the next section.
Fig. 1. The model of Norman and Shallice (1980) with Supervisory Attentional System (SAS)
In the current context, schema selection is mediated in the first instance by a process called ‘contention scheduling’ which is supposed to occur via mutual inhibition between schema activation levels. In this way, schemas which are triggered strongly by sensory input are more likely to be enacted than those
which are weakly activated. However, what happens if the system doesn’t have a schema triggered by the current sensory context, or the initially chosen schema fails to achieve the goal? Such non-automatic processing requires control by a more sophisticated, general purpose planning and programming system. In the NS model this is the so-called Supervisory Attentional System (SAS). However, the SAS does not take over the role of the contention scheduling mechanism; rather, it works by biasing the results of that process on the schema pool. The NS model has recently been reinvigorated in quantitative form by Garforth et al. [16]. Here, key components of the NS scheme have been incorporated into a simulated autonomous robot model that ‘forages for food’. The model is a connectionist one using clusters of neural networks to implement the various functional components which include but also extend the NS model framework.
2.3 Mechanisms and models: neurobiological
The Norman and Shallice model highlights the importance of action selection as a pivotal concept in understanding the functions of, and relationship between, controlled and automatic processing. In the vertebrate brain, action selection (cf. contention scheduling) is believed to be mediated by a set of sub-cortical structures known as the basal ganglia (BG). In [17], we developed this idea and argued that the BG act as a central switch, receiving requests for behavioural expression from subsystems throughout the brain, and then selectively permitting these to take control of their target motor plant (or cognitive resources). The strength of an action request is supposed to reside in its overall signal level or ‘salience’ (cf. schemata activity level in the NS model). Further, requests are more or less ‘listened to’ by the BG according to how well activity profiles on the afferents to BG input neurons match with the corresponding patterns of synaptic ‘weights’ [18]. Thus, a template match between the two will result in a BG action channel which is sensitive to the action request (this would correspond to assigning sensitivities to elements in the trigger database of the NS model). Further downstream, output from the BG is inhibitory, normally active, and targets the requesting subsystem in a closed loop (via thalamus). Actions are selected when competitive processes between action channels in the BG result in withdrawal of inhibition from the target subsystem. An action request will therefore be successful if its salience is sufficiently high, and there is a receptive action channel input in the BG. These processes are illustrated for two action channels in Fig. 2. The scheme shown in Fig. 2 is well suited to perform action selection for habitual or routine behaviours if the action requests are dominated by signals from sensorimotor cortical areas, representing environmental state and ongoing motor actions.
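A minimal sketch of this selection-by-disinhibition scheme is given below. It is not the model of [17]: the inner-product template match, the form of the competition, and every numeric value (tonic level, cross-channel gain, threshold, inputs and weights) are assumptions chosen only to illustrate how salience and pattern match could jointly determine which channel has inhibition withdrawn.

```python
# Minimal sketch of basal ganglia style selection by disinhibition.
# Weights, gains and the threshold are illustrative assumptions.

def channel_drive(activity, weights):
    """Template match: inner product of afferent activity and synaptic weights.

    The result grows with both overall salience and pattern match.
    """
    return sum(a * w for a, w in zip(activity, weights))


def bg_outputs(drives, tonic=1.0, cross=1.0):
    """Inhibitory output per channel.

    Each channel's own drive lowers its output; the mean drive of the
    competing channels raises it. Output is clipped at zero.
    """
    outputs = []
    for i, drive in enumerate(drives):
        others = [d for j, d in enumerate(drives) if j != i]
        competition = sum(others) / len(others) if others else 0.0
        outputs.append(max(0.0, tonic - drive + cross * competition))
    return outputs


def selected_channels(outputs, threshold=0.5):
    """A channel is selected when inhibition on its target is withdrawn."""
    return [i for i, out in enumerate(outputs) if out < threshold]


# Two channels, loosely following Fig. 2. Index 0 here plays the role of
# 'channel 1' in the figure: it is both more salient and better matched.
requests = [([0.9, 0.1], [1.0, 0.0]), ([0.3, 0.3], [1.0, 0.0])]
drives = [channel_drive(x, w) for x, w in requests]
outputs = bg_outputs(drives)
winners = selected_channels(outputs)
```

With these made-up numbers only the first channel falls below the threshold, so it alone gains access to its target, while the losing channel receives more inhibition than at rest.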
Indeed, the role of the BG in the encoding of habits under these circumstances is well documented [11, 19]. In particular, we propose that habitual selection by pattern matching in BG works in a similar way to schemata triggering in the NS model. But, if the basal ganglia are the neural substrate of contention scheduling in the NS model, what instantiates the SAS? Much evidence points to the prefrontal cortex (PFC) as serving an ‘executive’ or supervisory role similar to that of the SAS [20]. Further, PFC also forms loops through BG (but with different domains of BG than sensorimotor cortices), and there is the possibility of substantial crosstalk from the PFC loops to their sensorimotor counterparts [21, 22]. However, just as in the NS scheme, the ‘supervisory’ PFC does not simply usurp the automatic processing system, but rather works by modulating it via its cross-connections. In this way, it is supposed that goal-directed (non-habitual) behaviours governed by PFC can transfer into habits in sensorimotor loops by learning therein under the influence of the PFC loops [11, 19].
Fig. 2. The basal ganglia as the central action selection ‘switch’ in the brain, showing pattern matching at the input and closed loop architecture (thalamus omitted for simplicity). Channel 1 is selected because, i) its overall input salience is greater, and ii) its pattern of activity is better matched with the synaptic weights on the pertinent BG input neuron.
2.4 Automatic and controlled processing - the computational rationale
So far we have characterised the processes of controlled and automatic processing, and offered possible mechanistic and neurobiological accounts. However, it remains to answer the question: what computational advantages might ensue from having two modes of action selection? This is especially pertinent if we are to adopt a dual process model in a conventional engineering situation for, while developing biologically grounded solutions is interesting for its own sake, this is not reason enough to do so. One obvious advantage to having routine, automatised control is that it offers a speed advantage. Thus, while all decisions could, in principle, be made using a high level supervisory system, if this were the case, every behavioural decision would require extensive high-level cognitive processing, which would incur a time
penalty by dint of the excessive computational load. In contrast, the automatised version of the behaviour runs with little computational overhead (using, say, pattern matching). Such decision time differences are observed extensively in cognitive tasks which tap both processes [23]. A second, less obvious advantage has been recently demonstrated by Daw et al. [24]. They modelled learning a task using two training schemes. One made use of internal cognitive models and the other was based on immediately available information; these schemes are representative of controlled and automatic processing respectively. The result was that each method showed different uncertainties in the expected reward of each trial. Further, these uncertainties changed as learning progressed, and as a function of the time into the trial. Thus, an optimal agent could make use of different strategies to dynamically reduce uncertainty to a minimum.
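The arbitration idea can be illustrated with a toy rule that, on each trial, trusts whichever system currently reports the lower uncertainty. This is a deliberately simplified sketch and not the model of Daw et al. [24]; the dictionary layout, the system names and all numbers are assumptions.

```python
# Toy uncertainty-based arbitration between two value estimators.
# The (mean, variance) pairs below are hypothetical.

def arbitrate(estimates):
    """Return the name of the system with the smallest variance.

    `estimates` maps a system name to a (mean, variance) pair.
    """
    return min(estimates, key=lambda name: estimates[name][1])


# Hypothetical uncertainties early and late in learning.
early = {"controlled": (0.6, 0.05), "automatic": (0.5, 0.40)}
late = {"controlled": (0.6, 0.05), "automatic": (0.6, 0.01)}

choice_early = arbitrate(early)
choice_late = arbitrate(late)
```

In this example control passes from the model-based (‘controlled’) estimator to the stimulus-driven (‘automatic’) one as the latter becomes more certain, which mirrors the transfer of control under learning described in section 2.1.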
3 Autonomous vehicle control
The field of autonomous vehicles is a rapidly growing one which promises improved performance, fuel economy, emission levels, comfort and safety. An important component of autonomous vehicle control (AVC) aims to control the throttle, wheel brake and steering systems so that the vehicle can follow a desired path and target speed (possibly governed by a ‘lead vehicle’) and at the same time keep a safe inter-vehicle spacing under the constraint of comfortable driving [25]. This is the problem considered here, and one particularly promising strategy for its control is to break the task space into a series of distinct operating regions and to switch between them when required [26]. In order to make contact with the previous discussion, we interpret this control scheme as one in which the plant (an automobile) switches between distinct ‘behaviours’. Recently, Abdullah et al. [4] have formulated one instantiation of AVC using such a multi-controller scheme. We have previously described similarities between this AVC architecture and biological solutions to action selection [5], but here, we focus on the issues of automatic and controlled behaviour. As well as using multiple sub-controllers, the AVC controller also uses a fuzzy logic based switching and tuning supervisor which uses information about the system’s internal state, and the environment, to determine which controller to choose. It is this part of the controller which is relevant to the present discussion and it is shown in Figure 3a. Here, n controllers are shown for generality, although only 2 were used in the original scheme. Each controller is MIMO with three outputs – one for each of ‘steer’, ‘throttle’ and ‘brake’ – and the outputs of the ith controller form a vector y^(i). In general, the selected signal for each control function may be independently chosen from any of the controllers (so ‘steer’ and ‘brake’, for example, may originate from different controllers).
The result is a final control signal vector y∗. The selection of the control signals y∗(t) is dependent on those at the previous time step y∗(t−1); nevertheless, the selection function per se is performed entirely by the fuzzy controller which, because of its powerful processing capacity, is a clear example of a supervisory executive system exercising controlled processing. There is, however, no counterpart in the AVC architecture to automatic, ‘habit-based’ processing, which might incur the benefits described in section 2.4.
Fig. 3. Partial view of AVC architecture showing controller selection. a) original scheme, b) new scheme with automatised, or ‘habit-based’ control option
In order to remedy this, we propose the architectural modification shown in Figure 3b, which is based on the discussion of the basal ganglia and its use of weight-input template matching to trigger action selection. That is, for each of the three control signals, a neural network is assigned which has, as its input, the signal vector y∗(t − 1). It is interesting to note that the use of a loop with delay is reminiscent of the loops through basal ganglia (Figure 2). Each network has a number of output units z_j, 1 ≤ j ≤ n, and has to learn to flag which controller should be used to drive each of the three control signals using a 1-out-of-n code (one unit on, the others off). In general, with many controllers, it will be necessary to subject the network outputs to a ‘cleanup’ process to read out a selection signal. This might be done using a winner-take-all net, or by simply taking argmax_j(z_j). The role of the fuzzy controller in this scheme is now to bias or influence this selection process rather than govern it ‘single-handed’. While the mechanism of the top-down bias remains to be elucidated, the neural net pattern matching can be implemented in a straightforward way. We therefore attempted to do this using simple feedforward networks for the driving task shown in Figure 4a. The time series of the control signals (brake, steering, and throttle) are shown in Figure 4b. With only two controllers (n = 2), rather than use a 1-out-of-2 code, we used networks with a single output z with the interpretation that z > 0.5 signalled one controller rather than the other. Training was done using the Levenberg-Marquardt algorithm as implemented in the Matlab Neural Network toolbox.
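The readout stage described here can be sketched as follows. The sketch covers the general argmax cleanup, the single-output threshold used for n = 2, and the assembly of y∗ by taking each control signal from its selected controller. All names and numbers are hypothetical; in particular, the additive bias standing in for the fuzzy supervisor is only one conceivable form, since the mechanism of the top-down bias is left open, and the mapping of z > 0.5 to controller index 1 is an arbitrary choice.

```python
# Sketch of controller selection readout. Names, values and the additive
# bias are illustrative assumptions, not the AVC implementation.

SIGNALS = ("steer", "throttle", "brake")


def argmax_readout(z, bias=None):
    """Index of the largest network output, after an optional additive bias."""
    if bias is None:
        bias = [0.0] * len(z)
    biased = [zj + bj for zj, bj in zip(z, bias)]
    return max(range(len(biased)), key=biased.__getitem__)


def threshold_readout(z, threshold=0.5):
    """Two-controller case: pick index 1 if z > threshold, else index 0."""
    return 1 if z > threshold else 0


def compose_control(selection, controller_outputs):
    """Build y* by taking each signal from its selected controller.

    `selection` maps signal name -> controller index.
    `controller_outputs` is a list of dicts, one per controller.
    """
    return {s: controller_outputs[selection[s]][s] for s in SIGNALS}


controller_outputs = [
    {"steer": 0.10, "throttle": 0.30, "brake": 0.00},
    {"steer": 0.12, "throttle": 0.00, "brake": 0.45},
]
network_z = {"steer": 0.2, "throttle": 0.1, "brake": 0.8}
selection = {s: threshold_readout(network_z[s]) for s in SIGNALS}
y_star = compose_control(selection, controller_outputs)
```

In this example ‘steer’ and ‘throttle’ are taken from the first controller and ‘brake’ from the second, illustrating that the three signals may originate from different controllers.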
Fig. 4. Pilot study for training network matching in ‘habit-based learning’
In a first set of experiments, the three control signals formed the inputs, but in a subsequent set, these were augmented with the signal derivatives, giving 6 inputs to each network. Best results were then obtained with 3 hidden nodes in a 6-3-1 network structure. A typical network output for the ‘brake’ network is shown in Figure 4c. The training worked well, with residual errors too small to see on the graph; this was also a feature of the networks for ‘steer’ and ‘throttle’. While it is encouraging to see highly accurate learning of the training set, the existence of good generalisation – certainly desirable if this scheme is to work robustly – requires the use of validation data. Gathering and deployment of this data are scheduled for future work, but in the interim, we note that the progressive reduction in layer size of the network suggests that generalisation will, indeed, occur.
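For concreteness, the shape of the 6-3-1 network and its augmented input can be sketched as below. This is a forward pass only: Levenberg-Marquardt training is not shown, the weights are untrained random placeholders, and the tanh and logistic activations and the backward-difference derivative are assumptions, since these details are not stated above.

```python
import math
import random

# Forward pass of a 6-3-1 feedforward network with derivative-augmented
# inputs. Activations, weights and sample values are assumptions.


def augment_with_derivatives(current, previous, dt=1.0):
    """Append backward-difference derivatives to the three control signals."""
    return list(current) + [(c - p) / dt for c, p in zip(current, previous)]


def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """6-3-1 network: tanh hidden units followed by a logistic output."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    net = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-net))


rng = random.Random(0)  # untrained placeholder weights
w_hidden = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(3)]
b_hidden = [rng.uniform(-1, 1) for _ in range(3)]
w_out = [rng.uniform(-1, 1) for _ in range(3)]
b_out = rng.uniform(-1, 1)

y_prev = (0.10, 0.30, 0.00)  # steer, throttle, brake one step earlier
y_now = (0.12, 0.25, 0.05)   # steer, throttle, brake at t - 1
x = augment_with_derivatives(y_now, y_prev)
z = forward(x, w_hidden, b_hidden, w_out, b_out)
```

The scalar z lies between 0 and 1, so it can be compared against the 0.5 threshold to pick one of the two controllers.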
4 Discussion
Both psychological and neuroscientific thinking endorse the view that there are two quite different modes of behavioural control. Further, the psychological notions of automatic and controlled processing would appear to have strong similarities with the neuroscientific counterparts of S-R (or habit-based) control, and goal-driven behaviour, respectively. We have suggested that adaptive controllers in an engineering context may benefit from deploying these two modes of control and reap benefits in terms of speed, and reduced errors during learning. We went on to show how at least part of this programme (automatised or S-R control) may be implemented in a particular AVC architecture.
However, many questions remain that are common to both biological and engineering domains. Studying a specific candidate for dual-process control (like the AVC architecture) therefore promises to be useful, not only for the particular application itself, but also in developing our understanding of automatic and controlled processing in the brain. Issues for the AVC controller include:
– Exactly how does the executive fuzzy controller bias the selection process? (One possible bridge between the two modes of control might make use of neuro-fuzzy hybrid techniques.)
– How do the habits get learned while having some degree of control? That is, how can there be a seamless and sensible transition from controlled to automatised behaviour without slipping too quickly into poor ‘habitual’ behaviour?
– In solving this problem can we make use of the more natural ‘soft-selection’ seen in basal ganglia control? (‘Mixtures of controllers’)
The extent to which unique engineered solutions to these questions emerge perforce will strongly suggest their presence in the animal brain. We look forward to developing the programme of work outlined here and answering these questions in future work.
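As one illustration of the ‘soft-selection’ question raised above, a mixture of controllers could blend the controller output vectors rather than switch between them. The sketch below uses a softmax over selection scores; this particular weighting, the temperature parameter and the numbers are assumptions introduced here, not a proposal drawn from the AVC architecture or from basal ganglia data.

```python
import math

# Sketch of soft selection as a weighted mixture of controller outputs.
# The softmax weighting and all values are illustrative assumptions.


def softmax(scores, temperature=1.0):
    """Convert selection scores into weights that sum to one."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mix_controllers(scores, controller_outputs, temperature=1.0):
    """Blend controller output vectors using softmax weights."""
    weights = softmax(scores, temperature)
    n_signals = len(controller_outputs[0])
    return [
        sum(w * out[k] for w, out in zip(weights, controller_outputs))
        for k in range(n_signals)
    ]


outputs = [[0.10, 0.30, 0.00], [0.12, 0.00, 0.45]]  # steer, throttle, brake
soft = mix_controllers([0.0, 0.0], outputs)              # equal weights
nearly_hard = mix_controllers([0.0, 5.0], outputs, 0.1)  # close to second controller
```

Lowering the temperature moves the blend towards hard, winner-take-all switching, so a single parameter spans the range between the two selection styles.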
References
[1] Styles, E.: The Psychology of Attention. Psychology Press, East Sussex, UK (1997)
[2] Packard, M.G., Knowlton, B.J.: Learning and memory functions of the basal ganglia. Annu Rev Neurosci 25 (2002) 563–593
[3] Owen, A.M.: Cognitive planning in humans: neuropsychological, neuroanatomical and neuropharmacological perspectives. Prog Neurobiol 53(4) (Nov 1997) 431–450
[4] Abdullah, R., Hussain, A., Warwick, K., Zayed, A.: Autonomous intelligent cruise control using a novel multiple-controller framework incorporating fuzzy-logic based switching and tuning. Neurocomputing (in press)
[5] Hussain, A., Gurney, K., Abdullah, R., Chambers, J.: Emergent common functional principles in control theory and the vertebrate brain: A case study with autonomous vehicle control. In: ICANN (2). (2008) 949–958
[6] Hommel, B., Ridderinkhof, K.R., Theeuwes, J.: Cognitive control of attention and action: issues and trends. Psychol Res 66(4) (Nov 2002) 215–219
[7] Schneider, W., Shiffrin, R.: Controlled and automatic information processing I. Detection, search and attention. Psychological Review 84 (1977) 1–66
[8] Shiffrin, R., Schneider, W.: Controlled and automatic information processing II. Perception, learning, automatic attending and a general theory. Psychological Review 84 (1977) 128–190
[9] Neumann, O.: Automatic processing: A review of recent findings and a place for an old theory. In Prinz, W., Sanders, A., eds.: Cognition and Motor Processes. Springer, Berlin (1984)
[10] Birnboim, S.: The automatic and controlled information-processing dissociation: is it still relevant? Neuropsychol Rev 13(1) (Mar 2003) 19–31
[11] Graybiel, A.M.: Habits, rituals, and the evaluative brain. Annu Rev Neurosci 31 (2008) 359–387
[12] Adams, C., Dickinson, A.: Instrumental responding following reinforcer devaluation. Quarterly Journal of Experimental Psychology 33 (1981) 109–122
[13] Norman, D., Shallice, T.: Attention to action: Willed and automatic control of behavior. In Davidson, R.J., Schwartz, G.E., Shapiro, D., eds.: Consciousness and Self-Regulation. Plenum Press, New York (1980)
[14] Shallice, T.: Specific impairments of planning. Philos Trans R Soc Lond B Biol Sci. 298 (1982) 199–209
[15] Prescott, T.J., Bryson, J.J., Seth, A.K.: Introduction to the theme issue on modelling natural action selection. Philos Trans R Soc Lond B Biol Sci. 362 (2007) 1521–1529
[16] Garforth, J., McHale, S.L., Meehan, A.: Executive attention, task selection and attention-based learning in a neurally controlled simulated robot. Neurocomputing 69(16-18) (2006) 1923–1945
[17] Redgrave, P., Prescott, T.J., Gurney, K.N.: The basal ganglia: a vertebrate solution to the selection problem? Neuroscience 89 (1999) 1009–1023
[18] Wilson, C.J.: Striatal circuitry: categorically selective, or selectively categorical? In Miller, R., Wickens, J., eds.: Brain Dynamics and the Striatal Complex. Volume submitted MS. Harwood Academic (1999) 289–306
[19] Yin, H.H., Knowlton, B.J.: The role of the basal ganglia in habit formation. Nat Rev Neurosci 7(6) (Jun 2006) 464–476
[20] Dalley, J.W., Cardinal, R.N., Robbins, T.W.: Prefrontal executive and cognitive functions in rodents: neural and neurochemical substrates. Neurosci Biobehav Rev 28(7) (Nov 2004) 771–784
[21] Joel, D., Weiner, I.: The organization of the basal ganglia-thalamocortical circuits: open interconnected rather than closed segregated. Neuroscience 63 (1994) 363–379
[22] Haber, S.N., Fudge, J.L., McFarland, N.R.: Striatonigrostriatal pathways in primates form an ascending spiral from the shell to the dorsolateral striatum. J. Neurosci 20(6) (2000) 2369–2382
[23] MacLeod, C.M.: Half a century of research on the Stroop effect: an integrative review. Psychol Bull 109(2) (Mar 1991) 163–203
[24] Daw, N.D., Niv, Y., Dayan, P.: Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat Neurosci 8(12) (Dec 2005) 1704–1711
[25] Conatser, R., Wagner, J., Ganta, S., Walker, I.: Diagnosis of automotive electronic throttle control systems. Control Engineering Practice 12 (2004) 23–30
[26] Lee, C.Y.: Adaptive control of a class of nonlinear systems using multiple parameter models. Int. J. Contr. Autom. Sys. 4(4) (2006) 428–437