Adaptive, Perception-Action-based Cognitive Modelling of Human Driving Behaviour using Control, Gaze and Signal Inputs

Affan Shaukat, David Windridge, Erik Hollnagel, Luigi Macchi, and Josef Kittler
Abstract A perception-action framework for cognition represents the world in terms of an embodied agent's ability to bring about changes within that environment. This amounts to an affordance-based modelling of the environment. Recent psychological research suggests that a hierarchical perception-action model, known as the Extended Control Model (ECOM), is employed by humans within a vehicle-driving context. We thus seek to use machine-learning techniques to identify ECOM states (i.e. hierarchical driver intentions) using the modalities of eye-gaze, signalling and driver control input with respect to external visual features. Our approach consists in building a deductive logical model based on a priori highway-code and ECOM rules, which is then applied to non-contextual stochastic classifications of feature inputs from a test-car's camera and detectors so as to determine the currently active ECOM state. Since feature inputs are both noisy and sparse, the goal of the logic system is to adaptively impose top-down consistency and completeness on the input. The cognitively-motivated combination of stochastic bottom-up and logical top-down representational induction means that the machine-learning problem is one of symbol tethering in Sloman's sense.
Affan Shaukat, David Windridge
Centre for Vision, Speech and Signal Processing, Faculty of Engineering & Physical Sciences, University of Surrey, Guildford, UK
e-mail: {a.shaukat,d.windridge}@surrey.ac.uk

1 Introduction

Artificial cognitive systems are distinguished from general machine-learning systems in that they exhibit adaptive, anticipatory and goal-directed behaviour. The majority of such systems are embodied in a real or virtual environment in which it is possible to speak of a cognitive agent's actions and perceptions. A key problem in this context is the extent to which perceptual adaptivity can occur; in particular, whether it is possible for an agent to update its model of the world (using e.g. computer vision techniques) at the same time as adapting its perceptual framework (i.e. its mode of representing the world) in a manner appropriate to the environment. (In classical approaches, modelling necessarily assumes a fixed representational domain [11].) A perception-action framework for artificial cognition is capable of resolving this issue by imposing a bijectivity between perceptual transitions and actions. Such a system hence attempts to represent the world in the most efficient manner with respect to the ability of the cognitive system to bring about changes in it. This amounts to an affordance-based model of the environment. Recent work, based on psychological research [14], suggests that a hierarchical perception-action model, known as the Extended Control Model (ECOM), is employed by humans within a driving context. Perception-Action (P-A) methodologies such as ECOM can be characterised as constituting the greedy clustering of those perceptual states that are made distinct by the agent's actions in order to model the environment [10, 11]. The current research seeks to use machine-learning techniques to identify ECOM states (i.e. hierarchical driver intentions) using the modalities of eye-gaze, signalling and driver control input with respect to visual features identified by an external camera. Our proposed approach describes human intentions with respect to the environment in the form of subsumptive [4, 3] task-based hierarchical perception-action circuits, minimising the number of non-measurable assumptions made regarding the nature of human cognition. It consists of a deductive logical model based on a priori highway-code and ECOM rules that is interfaced to feature inputs from the camera and detectors via stochastic machine-learning techniques in order to determine the currently active ECOM state.
The different levels of the ECOM model thus serve as a classification objective, with training/test-set annotation of the intentional features generated by per-frame expert analysis of representative driving scenarios. Each ECOM level consists of mutually exclusive classes; however, different levels may be active simultaneously, with a strong downward conditional dependency. Since feature input is both noisy and sparse with respect to the ECOM output states constituting the perception-action hierarchy, the first goal of the logic system is to impose completeness and consistency on the input. However, this is not always possible when detector input is consistently faulty. The deductive system thus additionally has the potential to modify detector inputs in a top-down fashion, and thereby complete (since it follows bottom-up learning) a full adaptive bootstrap cycle [18]. As a proof-of-concept implementation of this idea, we test the notion that global consistency checking is capable of providing top-down constraints on the feature-detectors under simulated failure conditions. Note that, because of the existence of the deductive a priori logical model, our problem is strictly one of symbol tethering [13], rather than grounding [12]. We use six cross-road traversing scenarios for experimentation, with two cases each of left-turning, right-turning and straight-over junction traverses. We thus seek to map the Regulating and Monitoring levels of the ECOM model onto the highway-code-relevant entities of these scenarios for each frame of data using ground-truth training information. This is achieved via annotation of the driver's visual scene using a set of bounding boxes outlining key objects per frame. This serves as a coarse-grained characterisation of the driver's gaze behaviour in terms of logically-salient road entities (e.g. traffic signs, lights). In building the classification data-set, we migrate all relevant logical relations between these entities down to the lowest-level features as far as possible in order to generate a high-redundancy feature domain. This permits the use of standard pattern-recognition approaches such as feature selection and stochastic classification prior to first-order logical induction. The resulting logical feature vector comprises 772 descriptors for each frame of data, with per-frame intentional annotation constituting the classes to be learned.
2 Concept Implementation and Methodology

Our methodological approach consists of the following steps:

1. We first formalise and render mutually consistent the Highway Code and ECOM models. We then compile a comprehensive list of low-level predication relating to the ECOM model of Regulating/Monitoring intentions and the corresponding highway-code-derived world-model. These will correlate with the various possible detector inputs (i.e. they will relate to external visual features and to signal, motor and gaze inputs from the driver as measured by various car-instrument modalities).
2. We formalise all relevant a priori hierarchical scene description inherent in the Highway Code; for example, the lane/road occupancy relation: ∀n: In_Lane(n, car_i) ⇒ In_Road(car_i).
3. We next obtain ground-truth per-frame data for all of the low-level predication. For this purpose, we select six cross-road junction scenarios. Cross-roads, in particular, act as a super-set of all other road-junction situations, given their exhaustive pathing options. World annotation is conducted via machine-learning techniques; intentional annotation is carried out by human experts.
4. We then extend the annotation to all of the intermediate and high-level predication by application of the rules determined in stage 2, giving rise to a per-frame binary feature vector of scene and intentional descriptors at progressively abstracted levels of the representational hierarchy.
5. Through the application of decision trees to the above training data, we obtain unordered/context-free association rules for determining intentional ECOM classes and sub-classes given the signalling/motor/feature inputs. We thus classify the mapping between the human perception-action hierarchy and the super-set of hierarchical scene descriptors implicit in the Highway Code.
6. We use a Highway Code & ECOM-based logical deduction system (using first-order predicate logic) for determining long-range a priori logical consistency amongst classes of decision-tree outputs.
7. A cognitively-motivated top-down respecification module is tested for the ability to re-weight feature detectors on the basis of global logical consistency, giving the system an adaptive bootstrap capability.
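Stage 4 above, the extension of low-level annotation to higher-level predication via rules such as the lane/road occupancy relation of stage 2, can be sketched as a simple fixed-point rule application. The predicate names and tuple encoding here are illustrative, not the paper's actual vocabulary:

```python
# Sketch of stage 4: propagate a priori Highway-Code implication rules,
# e.g.  forall n: In_Lane(n, car) => In_Road(car),  over the per-frame
# low-level facts until no new higher-level facts can be derived.
# Facts are encoded as tuples ("Predicate", arg, ...) for illustration.

def propagate(facts, rules):
    """Apply implication rules to a set of per-frame facts until fixpoint."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for fact in list(derived):
                new = rule(fact)
                if new is not None and new not in derived:
                    derived.add(new)
                    changed = True
    return derived

def lane_implies_road(fact):
    # forall n: In_Lane(n, car) => In_Road(car)
    if fact[0] == "In_Lane":
        _, _, car = fact
        return ("In_Road", car)
    return None

frame = {("In_Lane", 2, "car_0")}
print(propagate(frame, [lane_implies_road]))
```

Running the sketch derives `("In_Road", "car_0")` alongside the original lane fact; chaining further rules in the same loop yields the progressively abstracted binary descriptors described above.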
3 Annotation Specification and Implementation

3.1 Intentional Annotation

The ECOM model consists of four layers of control (or concurrent loops), only three of which are appropriate to the experimental scenario (since we do not consider navigation). These are: Monitoring, Regulating & Tracking. For the current purposes, we adopt the principle that Tracking-level behaviour manages the continuous activity undertaken to keep the vehicle within a specific, discrete conceptual configuration (e.g. car-order within a lane). From a driver's perspective it refers to minor modifications of car speed, direction, intended distance from the car in front or behind, or lateral position on the road. In the case of an experienced driver these actions are predominantly a matter of physical reflex, without high-level conscious attention. (However, in the case of an inexperienced driver these Tracking behaviours may conceivably be enacted at the Regulating level.) Regulating intentions hence provide an input to the Tracking control-loop to perform a specific, highway-code-relevant action, e.g. changing lane. Other Regulating intentions include intentionally stopping and turning right/left at a junction, and as such can, where necessary, be linked hierarchically. In attempting to correlate highway-code-based scene descriptions with the subsumptive perception-action model employed in driving, we are hence primarily concerned with the Monitoring and Regulating levels of ECOM (figure 1). The intentional states are expert-annotated, with annotation based on the observed context and the empirically-motivated psychological model. Once this is carried out per-frame for all ECOM levels, gaze can be mapped onto the highway-code-relevant visual categories (traffic-light states etc.) via bounding-box occupancy tests, with the composite data serving as a training set for the final stage, in which intention is deduced from gaze, signalling & control behaviour with respect to the changing road configuration. Per-frame description of human scene representation is thus achieved via the bounding-box labels associated with the gaze occupancies, supplemented with any relevant car/pedestrian lane/road occupancies. We turn now to a description of how the per-frame hierarchical scene-descriptors are propagated through the forward-camera video footage.
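The bounding-box occupancy test itself is straightforward; the following minimal sketch assumes a point-gaze estimate and axis-aligned boxes in image coordinates (the `(label, x_min, y_min, x_max, y_max)` format is our illustrative choice, not the paper's annotation schema):

```python
# Illustrative per-frame gaze occupancy test: the driver's gaze point is
# mapped onto highway-code-relevant categories via the annotated bounding
# boxes of the current frame.

def gaze_occupancy(gaze, boxes):
    """Return the labels of all annotated boxes containing the gaze point."""
    gx, gy = gaze
    return [label for (label, x0, y0, x1, y1) in boxes
            if x0 <= gx <= x1 and y0 <= gy <= y1]

boxes = [("traffic_light", 100, 40, 130, 90),
         ("junction_zone_DI1", 0, 200, 640, 480)]
print(gaze_occupancy((115, 60), boxes))   # gaze on the traffic light
```

Because ground-plane zones may nest, the test deliberately returns every containing box rather than a single winner, leaving subsumption to the logic layer.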
3.2 Scene Description Annotation

Two distinct categories are apparent within the video annotation of the junction scenario: 'ground-plane' and 'view-plane' scene description. The absolute positions of certain entities, such as signs and lights, are of less importance than the fact that they can be seen by the driver; these are the 'view-plane' entities. Indicator entities, such as road-arrows and lane-markers, are intrinsically positional relative to the world, and their outlines are considered as bounded with respect to the ground plane. Gaze behaviour is thus characterised, on a per-frame basis, using key-entity (e.g. sign, traffic-light) bounding-box transitions within both the ground-plane and view-planes as appropriate. Manual placement of bounding boxes is used for per-frame annotation of mobile (i.e. cars, pedestrians) and view-plane (i.e. signs, lights) entities. However, the projective propagation of ground-plane junction zones and topologies throughout the video footage requires an intrinsically 3D approach to bounding-box description in order to establish correlations with the gaze direction.

Fig. 1 Hierarchically-arranged perception-action circuits within the ECOM model.
3.3 Projective Ground-Plane Tracking

The following technique was found to be most effective for accomplishing the task of ground-plane tracking for propagating key ground-plane entities. It involves five steps:

1. Temporally aggregate LIDAR data to give an approximate delineation of junction outlines within inner-urban (city) areas.
2. Histogram and drift-correct the aggregate LIDAR data to further distinguish the road outline and differentiate it from traffic noise.
3. Compute a Hough transform H(r, θ) of the histogram via the edge-point mapping r(θ) = x_0·cos θ + y_0·sin θ, where r represents the distance between the line and the origin, and θ is the angle of the vector from the origin to this closest point (x_0, y_0). After edge-detection, a Hough-transform histogram [1, 9] with high angular suppression is used to obtain the predominating road vectors. That is, we obtain a Canny edge-detected image [5] such that non-zero intensity values with coordinates (x_0, y_0) in the image plane contribute to the Hough intensity H(r, θ). A selection criterion is then applied to the peaks in H(r, θ) to identify the top two line candidates (i.e. the highest-density bins), subject to the constraint that they are more than 30° apart in θ (this is illustrated in figure 2):

   {(r_1, θ_1), (r_2, θ_2)} = argmax_{r_1,θ_1,r_2,θ_2} [H(r_1, θ_1) + H(r_2, θ_2)]  s.t.  |θ_1 − θ_2| > 30°     (1)

4. The junction topology and pedestrian-crossing/lane structure is fitted to {(r_1, θ_1), (r_2, θ_2)} on the basis of a priori knowledge of their absolute number.
5. An approximate view-plane transformation matrix is applied to project the junction topology into the screen frame (figure 2), allowing further small-scale adjustments for car height, camera orientation, etc.

The outputs of this process are used as the projected junction-lane bounding boxes on the driver's field-of-view, generating per-frame gaze occupancies of junction zones and sub-zones. These are added to the view-plane bounding-box annotation.

Fig. 2 Edge-detect and Hough Transform histogram (left); projecting the junction topology into the screen frame (right).
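The peak-selection criterion of Eq. (1) amounts to a constrained pairwise maximisation over Hough bins. A minimal sketch, assuming the accumulator has already been reduced to a list of candidate peaks with vote counts:

```python
# Sketch of Eq. (1): among all pairs of Hough-space peaks, maximise
# H(r1, th1) + H(r2, th2) subject to the two line candidates being more
# than 30 degrees apart in theta, so near-parallel duplicates of the
# same road edge are rejected.
from itertools import combinations

def top_two_lines(peaks, min_sep_deg=30.0):
    """peaks: list of ((r, theta_deg), votes) Hough bins."""
    best, best_votes = None, -1.0
    for (p1, v1), (p2, v2) in combinations(peaks, 2):
        if abs(p1[1] - p2[1]) > min_sep_deg and v1 + v2 > best_votes:
            best, best_votes = (p1, p2), v1 + v2
    return best

peaks = [((12.0, 5.0), 90), ((14.0, 8.0), 85), ((40.0, 95.0), 70)]
print(top_two_lines(peaks))   # the near-parallel pair is rejected
```

Here the two strongest peaks are only 3° apart, so the criterion instead pairs the strongest peak with the roughly perpendicular one, matching the intended behaviour at a cross-road junction.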
4 Context-Free Machine Learning

The current experiment comprises six cross-road traversing scenarios, consisting of two cases each of left-turning, right-turning and straight-over junction traverses. This set acts as a nearly maximally complex representation of the driving situation, since it effectively depicts all possible 'configuration-changing' driving scenarios: i.e. all other driving situations (lane changes, roundabout traverses, etc.) can be considered as degenerate instances of these cases. The ECOM hierarchy has been set up in such a way that individual intentions are considered to be mutually temporally exclusive; individual levels, however, may be simultaneously operative. Formally, the classification of ECOM intentions is the simultaneous categorisation of the unique item i^l within each level; it can be considered as a mapping problem:

   ∀l: X → i^l, where i = argmax_j {p(j^l | X)}     (2)

(X is the feature vector).
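Eq. (2) can be sketched directly: one independent argmax per level over the class posteriors, with the level and intention names below purely illustrative:

```python
# Sketch of Eq. (2): ECOM intention classification is a per-level argmax
# over class posteriors p(j^l | X); classes within a level are mutually
# exclusive, but every level is classified simultaneously.

def classify_levels(posteriors):
    """posteriors: {level: {intention: p(intention | X)}} for one frame."""
    return {level: max(probs, key=probs.get)
            for level, probs in posteriors.items()}

posteriors = {
    "regulating": {"turn_left": 0.7, "turn_right": 0.1, "straight_on": 0.2},
    "monitoring": {"observe_lights": 0.6, "observe_signs": 0.4},
}
print(classify_levels(posteriors))
```

The downward conditional dependency between levels noted earlier is not enforced here; in the full system it is imposed afterwards by the logical consistency checks.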
A robust and locally self-consistent inference system is expected to have the ability to accept potentially inconsistent information at any arbitrary level of the hierarchy. Ideally it requires the determination of intentional class attribution down to the lowest level of the hierarchy (where, in general, consistency is likely to be weakest). A full hierarchy of binary features is therefore generated, consisting of junctions, roads and lanes (with a notion of subsumption implicit within the hierarchy). Thus, lanes at a junction are characterised as belonging to the set {ROn, RIn, LOn, LIn, DOn, DIn, OOn, OIn}, where n is a number between 1 and the total number of lanes of the road (R = right, L = left, D = driver's side, O = opposite side; I/O = inbound/outbound lane). Consequently, road directions are characterised as belonging to the set {RO, RI, LO, LI, DO, DI, OO, OI}, with roads belonging to the set {R, L, D, O} (thus, in general, subsumption is determined via ordinal subset relations). There are also generalised velocity descriptors, such as 'driver-ward', 'leftward', etc., that subsume the tracking-level angle-based descriptors, allowing more coarse-grained velocity relations to be captured by the classification process. A logical feature vector comprising 772 features for each frame of data is obtained.

A decision-tree learning algorithm [17, 8] based on Gini impurity [2] is used for context-free classification, on the basis of its readily-interpretable results. Rule induction using decision-tree algorithms [15] has the characteristic of direct translatability into logical clauses. The optimum attribute m* is considered as the root node of the decision tree; it is the attribute that produces the maximum information gain, m* = argmax_m {Gi(m)}, where Gi(m) is the information gain for attribute m. A 10-fold cross-validation technique is used for testing the acontextual classifiers. Trees are pruned via iterative removal of non-leaf nodes and performance testing. The test results show average per-level misclassification rates of between 9.1% and 30.16% on ground-truthed data. It can be seen that the decision-tree algorithm shows good performance at the higher levels of the intentional hierarchy (see Table 1). The decision rules characterise the states of the scene-description hierarchy within the ECOM-based junction navigation. It is noteworthy that they depend only on a small fraction of the total set of hierarchical feature descriptors (refer to figure 3).

Table 1 Percentage misclassification rates for each scenario (ECOM intentional levels 1 to 5)

Level  Straight on 1   Straight on 2   Left turn 1     Left turn 2     Right turn 1    Right turn 2
1      0               0               0               0               0               0
2      0               0               0               0               0               0
3      10.27 ± 0.072   17.07 ± 0.15    9.10 ± 0.064    7.16 ± 0.043    29.88 ± 0.166   7.48 ± 0.046
4      15.38 ± 0.063   17.07 ± 0.15    14.13 ± 0.067   11.29 ± 0.028   30.16 ± 0.168   10.56 ± 0.035
5      16.30 ± 0.066   19.89 ± 0.079   16.81 ± 0.067   11.29 ± 0.024   25.71 ± 0.077   12.40 ± 0.041
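The "ordinal subset relations" used for subsumption can be read off the label scheme directly: a lane label such as 'RO2' (right road, outbound lane 2) is subsumed by the road-direction label 'RO', which is in turn subsumed by the road label 'R'. A hedged sketch, under the assumption that the subset relation reduces to a string-prefix test on these labels:

```python
# Illustrative subsumption test over the {R,L,D,O} / {RO,RI,...} / {ROn,...}
# label hierarchy: a more general label subsumes any strictly longer label
# that extends it (prefix = ordinal subset, assumed here for illustration).

def subsumes(general, specific):
    """True if `general` subsumes `specific` in the label hierarchy."""
    return specific.startswith(general) and len(general) < len(specific)

assert subsumes("R", "RO")      # road subsumes road direction
assert subsumes("RO", "RO2")    # road direction subsumes lane
assert not subsumes("LO", "RO2")
print("subsumption checks pass")
```

This is what lets relations asserted at the road level be migrated down to every constituent lane feature when building the high-redundancy 772-feature vector.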
Fig. 3 Level 2 decision tree, where 'Tobs', 'Tobj-driverward', etc. are the binarised feature predicates (left); decision-tree output projected onto the screen (right).
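The Gini-based attribute selection m* = argmax_m {Gi(m)} underlying these trees can be sketched as follows; the two-feature, four-frame data set is a toy example, not the paper's 772-feature data:

```python
# Illustrative Gini information-gain computation for root-node selection
# over binary features and binary classes, matching the paper's per-frame
# binary feature vectors.

def gini(labels):
    """Gini impurity of a binary label list."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)

def gini_gain(feature_column, labels):
    """Impurity decrease obtained by splitting on one binary feature."""
    left = [y for x, y in zip(feature_column, labels) if x == 0]
    right = [y for x, y in zip(feature_column, labels) if x == 1]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted

X = [[0, 1], [0, 0], [1, 1], [1, 0]]   # two binary features, four frames
y = [0, 0, 1, 1]
gains = [gini_gain([row[m] for row in X], y) for m in range(2)]
print(max(range(2), key=lambda m: gains[m]))   # index of m*
```

Here feature 0 separates the classes perfectly (gain 0.5) while feature 1 is uninformative (gain 0), so feature 0 becomes the root node; the same criterion is applied recursively to grow the tree.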
5 Deductive Logic Infrastructure Construction

The logical deduction system is used as an extension to the previously discussed intentional detection system, accommodating long-range rule-like behaviours. In its free-standing mode of operation, the logic system constructs a logically consistent world-model from the computer-vision system inputs and the driver's gaze, signal and control inputs in order to determine the active intention and sub-intention at any given time. If i^l_c is the current ECOM intention, P^l_c is the current ECOM perception and A^l_c is the current ECOM action or state, then an intention maps an l-level percept change onto its corresponding action: i^l_c : (P^l_c1, P^l_c2) ↦ A^l_c. Given the notion that action precedes percept, this can be formalised into a mathematical principle of bijectivity via the equation {∀m, n: P^l_m, P^l_n : ∃A(P^l_m, P^l_n)}, i.e. every pair of percepts P^l_m, P^l_n is linked by an action or a sequence of actions, i.e. states of A. This serves to limit the model of the driver's perceptual domain to entities that are intention-relevant. The logic system has been built using first-order predicate logic [16], with a priori logical predication applied via a top-down approach, starting with the most general world predicates and clauses. The deductive system is currently coded in SWI-PROLOG [6, 7], with a recursive predicate structure designed to maximally assist debugging and code-based expandability. The junction predicates are defined in such a way that the subsumptive hierarchy implicit within the highway code persists between the different levels of the logic: it starts with the most general predicate of junction, followed by roads, lane numbers, lane direction (inbound/outbound) and physical/legal conditional path possibilities. The full set of instantaneous feature detections is thus presented to the system, irrespective of its completeness or consistency.
The PROLOG system then initiates a deductive process in order to generate the logically-legitimate configuration possibilities, as well as all possible legal/physical outcomes, for overall temporal consistency. Also input into this process are the active ECOM intention and sub-intention (at the Monitoring/Regulating level) determined by the decision trees. The intention is thus that the PROLOG logic system determines the consistency of the current intentions/world-states with the bulk of the preceding intentions/world-states using Viterbi-like searching, selecting the longest/best path through intentional nodes, weighted by the decision-tree outputs. All active predicates per-frame are considered as true, and inactive predicates as false, with previous-frame data asserted as previously true (this is equivalent to a temporalised closed-world assumption). As indicated, the logic system is, in itself, capable of acting as a per-frame ECOM intention classifier; each of the intentions i^l on each of the levels l can be asserted, in turn, alongside the negation of all of the other intentions on the same level, to test for consistency. That is, for the intention set {j^n} defined ∀n, we test the assertions:

   ∀l, i:  i^l → True : l = n, i = j
           i^l → False : l ≠ n, i ≠ j

All consistent intention classes can be given an equal weighting such that they sum to unity, creating a pseudo-probability and giving a stochastic measure of the likelihood of the correctly predicted intentional output. It can be seen (figure 4) that the accuracy figures increase with time for the right-turn scenario, since the default 'straight on' assumption becomes falsified as more temporal context is accrued. Given that these results are based on logic, it is thus possible to make relatively accurate predictions purely on the basis of prior knowledge of the conditionalities of the ECOM model and the constrained (highway-code-based) nature of the traffic scenario. There are thus two distinct modes of pattern recognition employed in the composite system: the discriminative (i.e. the decision trees) and the generative (i.e. the a priori logical model). By combining the decision-tree outputs with logical deduction, through their incorporation into the consistency-testing and aggregation procedure, the accuracy of the composite system is very significantly greater than that of either individual system (figure 4).
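The pseudo-probability construction can be sketched as follows; the intention names are illustrative, and the lambda stands in for the PROLOG consistency test rather than reproducing it:

```python
# Minimal sketch of the pseudo-probability over intentions: every intention
# that survives the per-frame consistency test receives an equal weight,
# normalised to sum to unity; inconsistent intentions receive zero.

def pseudo_probabilities(intentions, is_consistent):
    consistent = [i for i in intentions if is_consistent(i)]
    if not consistent:
        return {i: 0.0 for i in intentions}
    w = 1.0 / len(consistent)
    return {i: (w if i in consistent else 0.0) for i in intentions}

intentions = ["turn_left", "turn_right", "straight_on"]
# stand-in for the PROLOG consistency check, assumed here for illustration
probs = pseudo_probabilities(intentions, lambda i: i != "straight_on")
print(probs)
```

As more temporal context falsifies candidate intentions, the surviving set shrinks and the pseudo-probability of the correct intention rises, which is the behaviour visible in figure 4.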
Fig. 4 Comparison of a priori logic and decision-tree accuracy for the 1st right-turn scenario (left); combination of decision trees with logical consistency constraints (right).
6 Top-Down Feature Respecification/Cognitive Bootstrapping

The previous experiments relate to the bottom-up aspects of the system, which consist of building the world and intentional models from detector inputs. However, the global logical consistency checks also have the potential to modify detector inputs in a top-down fashion, thereby completing a full 'bootstrap' modelling of the world (ultimately allowing the possibility of running the intentional system fully unsupervised). As a proof-of-concept implementation of this idea, we tested the notion that global consistency checking is capable of providing usable top-down constraints on the feature-detectors under simulated failure conditions. This is achieved by simulating the effects of noise on features: the binary values of single features within the frame vector are randomly replaced by arbitrary binary digits. Frames are selected according to a uniform random distribution, with an average failure probability of one in every five frames. The goal is then to determine which of the 772 possible features is subject to this additive noise, purely on the basis of global logical consistency. This is achieved by sequentially removing feature predicates (i.e. detector inputs), recalculating the associated predication of each frame for all of the 772 features, and determining whether this improves overall global consistency, as measured by the final size of the consistency set. (The consistency set is the largest set of mutually consistent frames.) The binary weighting of the feature inputs is done on the basis of this consistency. The plots in figure 5 (left) show the results for a subset of the features. A clear peak in the magnitude of the consistency set can be seen for the removal of the error-compromised feature. Two other peaks exhibit a similar characteristic because of the conditional dependencies existing between features.
However, critical to our argument is that this measure of global consistency correlates with accurate prediction of the ground-truth ECOM values. Blanking the features associated with the peak in global consistency can indeed be shown to increase overall system accuracy in the manner intended (figure 5 (right)), such that the system is able to adaptively determine the reliability of its own detector input. The system thus exhibits the capability of bottom-up inference of intentional states and top-down feedback of global context information.
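The blanking loop described above can be sketched as follows; `consistency_count` stands in for the PROLOG global-consistency measure, and the toy frame data is an assumption for illustration only:

```python
# Hedged sketch of the top-down respecification loop: each feature is
# blanked in turn, the frames re-evaluated, and the feature whose removal
# maximises the consistency set (the largest set of mutually consistent
# frames) is flagged as the noise-compromised detector.

def find_faulty_feature(frames, n_features, consistency_count):
    best_f, best_count = None, -1
    for f in range(n_features):
        blanked = [row[:f] + row[f + 1:] for row in frames]  # drop feature f
        count = consistency_count(blanked)
        if count > best_count:
            best_f, best_count = f, count
    return best_f

# toy stand-in: feature 1 carries injected noise, so dropping it makes
# every frame agree with the first one
frames = [[1, 0, 0], [1, 1, 0], [1, 0, 0], [1, 1, 0]]
count = lambda rows: sum(r == rows[0] for r in rows)
print(find_faulty_feature(frames, 3, count))
```

In the full system the blanked feature is then re-weighted to zero, closing the top-down half of the bootstrap cycle.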
Fig. 5 Size of consistency set versus feature number (left), accuracy versus feature number (right), for the 1st example of the right-turn scenario.

7 Conclusion

We sought, in this paper, to determine the mapping that exists between the task-subsumption hierarchy and the scene-representation hierarchy employed by human drivers while navigating junctions, utilising the psychologically-derived ECOM perception-action model of human intentionality. This was accomplished via the application of statistical pattern-recognition techniques to an expert-annotated hierarchy of intentional descriptors, applied to driving footage with an eye-tracking overlay. An intentional deduction system based on a priori ECOM and Highway Code logic models was combined with the stochastic classifiers to deduce rule-like behaviours relating to intentional context. In doing so, a prototype system has been demonstrated for the application of adaptive bootstrap techniques within the context of our problem, which also provides a mechanism for classifying driver intention with good baseline accuracy. As well as providing scientific evidence for the utilisation of perception-action hierarchies by humans, this approach, if successful, has the potential for practical application to driver warning and feedback systems.
8 Acknowledgements

The work presented here was supported by the European Union, grant DIPLECS (FP7 ICT project no. 215078). (This paper does not necessarily represent the opinion of the European Community, and the European Community is not responsible for any use made of its contents.) We also gratefully acknowledge funding from the UK Engineering and Physical Sciences Research Council (grant EP/F069421/1).
References

1. L. Banjanović-Mehmedović, I. Petrović, and E. Ivanjko. Hough transform based correction of mobile robot orientation. In Proceedings of the IEEE-ICIT 2004 International Conference on Industrial Technology, December 8-10, Hammamet, Tunisia, 2004.
2. L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth and Brooks, Monterey, CA, 1984.
3. R. A. Brooks. A robust layered control system for a mobile robot. Technical report, Cambridge, MA, USA, 1985.
4. R. A. Brooks. Intelligence without representation. Artificial Intelligence, 47(1-3):139–159, 1991.
5. J. Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6):679–698, November 1986.
6. A. Colmerauer, H. Kanoui, R. Pasero, and P. Roussel. Un système de communication homme-machine en français [A French-language human-machine communication system]. Technical report, Groupe d'Intelligence Artificielle, Université d'Aix-Marseille II, 1973.
7. A. Colmerauer. Prolog in 10 figures. Communications of the ACM, 28(12):1296–1310, 1985.
8. W. Duch, R. Setiono, and J. M. Zurada. Computational intelligence methods for rule-based data understanding. In Proceedings of the IEEE, pages 771–805, 2004.
9. R. O. Duda and P. E. Hart. Use of the Hough transformation to detect lines and curves in pictures. Communications of the ACM, 15(1):11–15, 1972.
10. G. Granlund. Organization of architectures for cognitive vision systems. In Proceedings of the Workshop on Cognitive Vision, Schloss Dagstuhl, Germany, October 2003.
11. G. Granlund. A cognitive vision architecture integrating neural networks with symbolic processing. Künstliche Intelligenz, (2):18–24, 2005.
12. S. Harnad. The symbol grounding problem. Physica D, 42(1-3):335–346, 1990.
13. N. Hawes, J. Wyatt, and A. Sloman. An architecture schema for embodied cognitive systems. Technical Report CSR-06-12, University of Birmingham, School of Computer Science, November 2006.
14. E. Hollnagel and D. D. Woods. Joint Cognitive Systems: Foundations of Cognitive Systems Engineering, pages 149–154. CRC Press, Taylor & Francis Group, Boca Raton, FL, 2005.
15. D. E. Johnson, F. J. Oles, T. Zhang, and T. Goetz. A decision-tree-based symbolic rule induction system for text categorization. IBM Systems Journal, 41:428–437, 2002.
16. R. Kowalski. Predicate logic as a programming language. In Information Processing 74, pages 569–574. North-Holland, 1974.
17. J. R. Quinlan. Induction of decision trees. Machine Learning, 1(1):81–106, March 1986.
18. M. Shevchenko, D. Windridge, and J. Kittler. A linear-complexity reparameterisation strategy for the hierarchical bootstrapping of capabilities within perception-action architectures. Image and Vision Computing, 27(11):1702–1714, 2009.