Anticipation in Robotics


Alan FT Winfield, University of the West of England, Bristol, UK, [email protected]
Verena V Hafner, Humboldt-Universität zu Berlin, Berlin, Germany, [email protected]

Abstract

In this chapter, we introduce anticipatory robotic systems. We show how intelligent robots can anticipate the future by outlining two broad approaches: the first shows how robots can use anticipation to learn how to control their own bodies, and the second shows how robots can use anticipation to predict the behaviour of themselves interacting with others, and hence demonstrate improved safety, or simple ethical behaviours. Both approaches are illustrated with experimental results from recent work. We show that, with practical realisable embedded artificial intelligence, robots can indeed predict the future and that this is a technology with significant potential for improved safety and human-robot interaction.

1. Introduction

In recent years, robotics has increasingly moved away from industrial applications towards autonomous, adaptive and learning systems that can interact with humans, or in human environments, in a predictable and intuitive way. The ability to anticipate is a major advantage for these robots, since it allows them to make predictions, hold expectations and take decisions in a way that combines past, present and future events.

The subject of this chapter is intelligent robots (as distinct from traditional preprogrammed industrial assembly robots, or robots that are remotely teleoperated by humans). A reasonable definition for a modern intelligent robot is an 'embodied artificial intelligence (AI)' (Winfield, 2012). It is the robot's AI that determines how the robot will behave, and the mechanisms for anticipation outlined in this chapter form part of a robot's embedded AI.

The aim of this chapter is to show how intelligent robots can anticipate the future by outlining two broad approaches: the first shows how robots can use anticipation to learn how to control their own bodies, and the second shows how robots can use anticipation to predict the behaviour of themselves interacting with others, and hence demonstrate improved safety, or simple ethical behaviours. Both approaches are illustrated with experimental results from recent work. Our motivation is to show that, with practical realisable embedded AIs, robots can indeed predict the future and that this is a technology with significant potential for improved safety and human-robot interaction.

In this chapter, we follow Robert Rosen's definition of an anticipatory system as "[...] a system containing a predictive model of itself and/or of its environment, which allows it to change state at an instant in accord with the model's predictions pertaining to a later instant." (Rosen 1985, p.339). Such a system could be a human, another animal or a robot, as long as it is able to anticipate future states or events. Rosen developed a rigorous theoretical foundation for living organisms as "anticipatory systems".

We consider here anticipation via the mechanism of predictive internal models. These models could either be learned or already be present in the system. We broadly consider two classes of anticipatory robotic systems: predictive internal models for sensorimotor systems, and predictive internal models for simulating multi-robot systems. In the first, a robot learns to make predictions of the outcome of its own sensorimotor actions, and adapts its actions accordingly. In the second, a robot runs an internal simulation of itself and other agents, and chooses its actions depending on the predicted outcomes. We choose to focus on these two classes of anticipatory system because we consider them to be not only interesting and powerful exemplars but also approaches with considerable potential for future intelligent robot systems. Another motivating factor is that these two algorithmic approaches are fully transparent and explainable, in contrast to methods based on artificial neural networks or deep learning. Combining the two classes, simulations of self and others can be run, which are a prerequisite for developing a theory of mind. This is also a first step towards bringing robotics and the cognitive sciences closer together.

Other mechanisms of anticipation that do not involve internal simulations can be subsumed under implicit anticipation. This can be seen in simple reactive robots that, for example, avoid obstacles using distance sensors. This behaviour can be achieved with pre-programmed thresholds without any internal simulation (Brooks 1991, Braitenberg 1984). Map-based navigation and planning systems such as SLAM could be categorised as explicit anticipation, since they involve memory (the past) and predictions (planning).

Anticipatory behaviour in robotics can be seen as an extension of prediction. Prediction can be extended by classification tasks that require sorting objects or actions into different classes. It can also be extended by sensory attenuation: here, an experienced sensory signal is attenuated by a predicted sensory signal. In robotics, this is often used in egonoise cancellation (Schillaci et al. 2016b). The most common form of anticipation in robotics is a combination of prediction and decision-making, where the prediction has a direct impact on the robot's future behaviour. Finally, anticipation is defined by the expectations one has of self, others, and the environment. In human-robot interaction, both the expectations the human has of the robot and potential expectations the robot could have are relevant for an intuitive interaction.
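To make the distinction concrete, the following minimal sketch (our illustration, not taken from the cited works) shows implicit anticipation in a purely reactive controller: distance-sensor readings are mapped to wheel speeds through fixed, pre-programmed thresholds, with no internal model or simulation. The sensor layout, threshold and speed values are illustrative assumptions.

# Minimal sketch of implicit anticipation: a purely reactive, Braitenberg-style
# obstacle avoider. There is no internal model; behaviour follows directly from
# pre-programmed thresholds applied to the current sensor readings.
# All sensor names, thresholds and speeds below are illustrative assumptions.

def reactive_step(left_distance: float, right_distance: float,
                  threshold: float = 0.3, cruise_speed: float = 0.5):
    """Map two distance-sensor readings (metres) to (left, right) wheel speeds."""
    left_speed, right_speed = cruise_speed, cruise_speed
    if left_distance < threshold:     # obstacle close on the left -> turn right
        left_speed, right_speed = cruise_speed, -cruise_speed
    elif right_distance < threshold:  # obstacle close on the right -> turn left
        left_speed, right_speed = -cruise_speed, cruise_speed
    return left_speed, right_speed

if __name__ == "__main__":
    print(reactive_step(0.2, 1.0))  # obstacle on the left: (0.5, -0.5)
    print(reactive_step(1.0, 1.0))  # clear ahead: (0.5, 0.5)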

2 Anticipation for Sensorimotor Control

In this section, we consider sensorimotor learning and prediction for anticipating both the robot's own behaviour and the behaviour of others or external events. Sensorimotor interaction is a dynamic process that needs to adapt over time to changes in body morphology, motor parameters or the environment. Most anticipatory systems therefore undergo an exploration and learning process. In the next subsection, learning architectures for internal simulations of sensorimotor actions are described.

2.1 Sensorimotor learning and prediction

The ability to perform sensorimotor predictions is not fully developed at birth in mammals and has to be learned through exploratory behaviours during the lifetime. Developmental Robotics is a field in which this principle of learning and development is applied to robots. Schillaci et al. (2016a) review the latest developments in cognitive science and robotics concerning exploratory learning and sensorimotor prediction in their chapter in "Re-enacting sensorimotor experience for cognition" (Schillaci et al. 2016c). Vernon et al. (2015) consider episodic memory in that context and review the simulation hypothesis.

2.1.1 Neuroscientific foundations

Let us first investigate sensorimotor prediction processes in humans before we apply them to robots. Wolpert et al. (1995) reported an experiment showing that sensorimotor prediction processes for motor planning and execution exist in humans. In this experiment, participants were asked to perform hand movements while holding a tracked tool; their movements were perturbed and the visual feedback was hidden from them, and they then had to estimate the position of their thumb. The results indicated that the participants continuously updated an internal model with a prediction of their motor actions. Wolpert et al. (1998) later proposed a model in which an internal body representation, combining sensory input and motor output signals, is stored in the brain.

Computational predictive models have recently gained a lot of interest in the scientific community, since they can account for many of the processes going on in the brain. Andy Clark talks about "prediction-action machines" in his recent book (Clark, 2015). Friston et al. (2015) discuss generative models in the context of active inference and learning.

2.1.2 Robots learning internal models

A first robotic implementation of sensorimotor prediction was performed by Dearden and Demiris (2005). They used an internal forward model to predict the visually observed movements of a robot gripper. The internal model of the robot learned by performing random movements with the gripper and visually observing them. The model was encoded as a Bayesian neural network.

Schillaci and Hafner (2011) performed experiments with the humanoid robot Nao, which learned an internal model to predict the consequences of its own motor actions. The model consists of an inverse and a forward model that can predict, respectively, the motor command needed to reach a desired state and the sensory state that will be reached when performing this motor command (see figure 2). The robot learned both models during an exploratory phase in which it performed random motor actions with one arm and remembered the corresponding sensory information, which in this case was the position of its hand as seen through its camera (see figure 1). An interpretation of such an anticipatory sensory perception can be seen in figure 3. Robots can predict not only their own actions, but also the actions of others (Copete et al. 2015, Schillaci et al. 2012b).
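To illustrate the general idea, the following sketch learns a forward and an inverse model from random motor babbling on a toy two-joint arm and then uses them as an internal simulation of a reaching action. It is a simplified illustration under our own assumptions (the toy kinematics, noise level and nearest-neighbour regressor), not the Bayesian network of Dearden and Demiris (2005) or the architecture of Schillaci and Hafner (2011).

import numpy as np

# Toy 2-joint planar arm: the "true" body that the robot explores by babbling.
def true_hand_position(q, link=0.5):
    # q: joint angles (2,); returns the hand position (x, y)
    return np.array([link * np.cos(q[0]) + link * np.cos(q[0] + q[1]),
                     link * np.sin(q[0]) + link * np.sin(q[0] + q[1])])

rng = np.random.default_rng(0)

# 1) Exploration phase: random motor babbling, remembering (motor, sensory) pairs.
motor_samples = rng.uniform(-np.pi, np.pi, size=(2000, 2))
sensor_samples = np.array([true_hand_position(q) for q in motor_samples])
sensor_samples += rng.normal(scale=0.01, size=sensor_samples.shape)  # sensor noise

# 2) Forward model: predict the sensory outcome of a motor command
#    (here a simple k-nearest-neighbour average over the babbling data).
def forward_model(q, k=10):
    d = np.linalg.norm(motor_samples - q, axis=1)
    return sensor_samples[np.argsort(d)[:k]].mean(axis=0)

# 3) Inverse model: propose a motor command expected to reach a desired sensory state.
def inverse_model(target, k=10):
    d = np.linalg.norm(sensor_samples - target, axis=1)
    return motor_samples[np.argsort(d)[:k]].mean(axis=0)

# Internal simulation: chain inverse and forward model and measure the prediction error.
target = np.array([0.6, 0.4])
q_cmd = inverse_model(target)       # simulated motor command
predicted = forward_model(q_cmd)    # anticipated sensory outcome
print("prediction error:", np.linalg.norm(predicted - target))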

Figure 1: A sequence from an exploration behaviour – in this case random motor babbling – performed by the humanoid robot Nao. The bottom row shows the corresponding frames grabbed by the robot's camera, in which the robot's hand is visible. Picture taken from Schillaci and Hafner (2011).

Figure 2: Combination of inverse and forward model for sensorimotor predictions.



Figure 3: An example of an internal simulation (image taken from Schillaci (2014)). The inverse model simulates the motor command needed for reaching a desired sensory state from the current state of the system. Before being sent to the actuators, such a simulated motor command can be fed into the forward model, which anticipates its outcome in terms of sensory perception. A prediction error of the internal simulation can be calculated by comparing the simulated sensory outcome with the desired sensory state.

2.2 Exploration strategies

In order to make predictions about its own sensorimotor actions or those of others, a robot first has to gather experience through exploratory behaviour. The most straightforward exploration strategy is random body or motor babbling, as presented in Dearden and Demiris (2005) and Schillaci et al. (2011). However, this exploration strategy is not optimal in more complex cases, for example when the robot has a high number of degrees of freedom (many limbs). A more effective approach in this case is goal-directed exploration, as implemented by Rolf and Steil (2014) on the Bionic Handling Assistant, a robot arm with artificial muscles and a large number of degrees of freedom. Here, the motor actions are not set randomly; instead, a goal is chosen in the sensory space and the information the model has already gained is used to reach it. Another form of exploration is inspired by curiosity-based exploration in animals: Baranes and Oudeyer (2013) presented an intrinsically motivated goal exploration mechanism that drives exploration towards the actions that would maximise learning progress (Oudeyer et al. 2007).

2.3 Tool Use

The ability to use tools, as developed in humans and some other animals, requires anticipatory skills involving simulating sensorimotor actions and anticipating their consequences. Matching objects to tools and actions (affordances) is a further step.

In a preliminary study, Schillaci et al. (2012a) present internal models based on inverse and forward model pairs to decide whether or not to use a tool for a reaching task in a humanoid robot (see figure 5). The reaching action is learned through exploration with and without a tool, resulting in two internal models (see figure 4). When faced with the decision to reach for an object, the robot internally simulates the reaching action with the tool using one model and without the tool using the other model. The action with the smaller prediction error is then chosen.
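A minimal sketch of the decision mechanism just described: two learned internal models (with and without the tool) are each run as an internal simulation of reaching the same target, and the action whose simulation yields the smaller prediction error is selected. The model representation and the dummy models in the usage example are illustrative assumptions, not the implementation of Schillaci et al. (2012a).

import numpy as np

# Assume two inverse/forward model pairs have been learned through exploration,
# one while holding a tool and one without (cf. figure 4). Here each "internal
# model" is simply a pair of callables standing in for the learned models.

def simulate_reach(internal_model, target):
    """Run an internal simulation of reaching `target` and return the prediction error."""
    q = internal_model["inverse"](target)        # simulated motor command
    predicted = internal_model["forward"](q)     # anticipated sensory outcome
    return np.linalg.norm(predicted - target)    # prediction error

def choose_action(model_with_tool, model_without_tool, target):
    err_tool = simulate_reach(model_with_tool, target)
    err_no_tool = simulate_reach(model_without_tool, target)
    # The action whose internal simulation predicts the smaller error is chosen.
    return ("reach with tool", err_tool) if err_tool < err_no_tool \
        else ("reach without tool", err_no_tool)

if __name__ == "__main__":
    # Dummy stand-in models: with the tool the reachable space is larger (cf. figure 5).
    with_tool = {"inverse": lambda t: t, "forward": lambda q: np.clip(q, -1.0, 1.0)}
    without_tool = {"inverse": lambda t: t, "forward": lambda q: np.clip(q, -0.6, 0.6)}
    print(choose_action(with_tool, without_tool, np.array([0.8, 0.2])))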

Figure 4: Two internal models for sensorimotor simulation. The green one has been learned on a reaching movement using a tool, the blue one without a tool.

Figure 5: Reachable space of the humanoid robot Nao with (left) and without (right) holding a tool.



2.4 Sensory attenuation and self-models

In sensory attenuation, an experienced sensory signal is attenuated by a predicted sensory signal. The neural mechanisms have been extensively studied in the electrosensory system of the electric fish (Krahe and Maler 2014). In humans, the effect becomes evident in the observation that people have difficulty tickling themselves (Blakemore et al. 2000). In robotics, a similar effect can be produced, and a prominent application is egonoise cancellation (Schillaci et al. 2016b, Pico et al. 2016). Here, the motor noise of the robot can be predicted using information on the motor commands, resulting in a filtered acoustic signal with reduced noise.
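The principle can be sketched as follows: a predicted self-generated (ego-noise) signal, derived here from the current motor command, is subtracted from the observed signal, leaving the externally caused component. The linear motor-to-noise mapping is our own simplifying assumption; in the cited work the prediction model is learned and adaptive (Schillaci et al. 2016b, Pico et al. 2016).

import numpy as np

# Sketch of sensory attenuation: subtract a prediction of self-generated (ego) noise
# from the observed signal. The fixed motor-to-noise model below is an illustrative
# assumption; in practice the mapping would be learned during exploration.

def predict_ego_noise(motor_command: float, t: np.ndarray) -> np.ndarray:
    # Assume ego-noise amplitude grows with motor speed (toy model).
    return 0.8 * motor_command * np.sin(2 * np.pi * 50 * t)

def attenuate(observed: np.ndarray, motor_command: float, t: np.ndarray) -> np.ndarray:
    # Attenuate the experienced signal by the predicted self-generated signal.
    return observed - predict_ego_noise(motor_command, t)

if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 1000)
    external = 0.3 * np.sin(2 * np.pi * 5 * t)            # external signal of interest
    motor = 0.9                                           # current motor command
    observed = external + predict_ego_noise(motor, t)     # e.g. microphone signal
    filtered = attenuate(observed, motor, t)
    print("residual error:", np.max(np.abs(filtered - external)))  # ~0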

Figure 6: Top row: video sequence from a robot's perspective, with the arm covering an object (orange ball). Bottom row: video sequence of an imaginary image generated through predictive processes.

Sensory attenuation works for many different modalities, such as visual, auditory or tactile. One way to structure the learning architectures in a self-organising way is to use self-organising maps (SOMs). Morse et al. (2010) created a SOM-based predictive internal model that could integrate several modalities. An adaptive SOM-based model for egonoise prediction has been presented in Schillaci et al. (2016b). Bechtle et al. (2016) present an experiment in which a humanoid robot predicts the occlusions caused by its own arm movements and generates an imaginary image without the arm occlusion (see figure 6).

Predictive internal models can also explain how self-other distinction can develop in humans. Having a predictive sensorimotor model allows an agent to recognise which sensations are caused by its own actions and which are caused by others or by external events. This can be a first step towards developing a sense of agency, a sense of ownership and eventually a minimal self in artificial systems.

2.5 Human-robot interaction

Robots acting predictably

In human-robot interaction, both the expectations the human has of the robot and potential expectations the robot could have are relevant for an intuitive interaction. Therefore, robots that interact with humans or in human environments are designed to act in a predictable way. This can easily be realised by creating smooth motion trajectories, but more complex scenarios can also be imagined, such as autonomous cars projecting a zebra-crossing pattern onto the road when they recognise pedestrians. Robots could also run internal simulations of themselves and act in a way that keeps the prediction error small.

Joint Attention

Joint attention is an important social skill in which two or more agents share their attention towards a common object or goal. The prerequisites for joint attention are attention detection and manipulation, social coordination and intentional understanding (Kaplan and Hafner, 2006). The intentional understanding skill is the most advanced and has yet to be fully understood in humans before it can be applied to robots. It is based on anticipating the actions and intentions of someone else, and eventually leads towards having a theory of mind, which only develops in human children at the age of around three and a half years (Baron-Cohen 1997). In robotics, in particular in the field of human-robot interaction, joint attention is a crucial skill. Examples are shared attention through gaze tracking and pointing (Nagai 2003).

3 Anticipation through simulation-based internal models

An interesting and potentially very powerful new way of providing robots with the cognitive machinery to anticipate the future is to embed a simulation of the robot and its environment inside the robot. In this section we explore simulation-based internal models for anticipation.

3.1 Conceptual basis for simulation-based internal models

A robot with a simulation-based internal model has, potentially, a mechanism for generating and testing what-if hypotheses: (1) what if I carry out action x? and (2) of several possible next actions xi, which should I choose? Holland writes: "an internal model allows a system to look ahead to the future consequences of current actions, without actually committing itself to those actions" (Holland, 1992, p25). This leads to the idea of an internal model as a consequence engine – a mechanism for estimating the consequences of actions.

Dennett, in his book Darwin's Dangerous Idea (1995), develops the same idea in what he calls the Tower of Generate-and-Test; a conceptual model for the evolution of intelligence that has become known as Dennett's Tower. Dennett's tower is a set of conceptual creatures, each one of which is successively more capable of reacting to (and hence surviving in) the world through having more sophisticated strategies for anticipating, by 'generating and testing' hypotheses about how to act or react.

The ground floor of Dennett's tower represents Darwinian creatures; these have only natural selection as the generate-and-test mechanism, so mutation and selection is the only way that Darwinian creatures can adapt – individuals cannot. All biological organisms are Darwinian creatures. A small subset of these are Skinnerian creatures, which can learn, but only by generating and physically testing all the different possible actions and then reinforcing the successful behaviour – providing of course that the creature survives. Dennett's Popperian creatures have the additional ability to internally model the possible actions, so that some of them (the bad ones) are discarded before they are tried out for real. A robot with an internal model, capable of generating and testing what-if hypotheses, is thus an example of an artificial Popperian creature within Dennett's scheme. The ability to internally model possible actions is of course a significant innovation. The third storey of Dennett's tower, a sub-sub-subset of Darwinians, are the Gregorian creatures. In addition to an internal model, Gregorians have what Dennett refers to, after Richard Gregory, as mind tools – including words, which they import from the (cultural) environment (Dennett 1995, p378). Conceptually, therefore, Dennett's Gregorians are social learners.

Another valuable tool for visually abstracting cognitive agents is to be found in Maturana and Varela's (1987) seminal work The Tree of Knowledge: The Biological Roots of Human Understanding. Figure 7 (left) shows Maturana and Varela's abstraction of a biological entity interacting with its environment (out of such interaction emerges behaviour). Figure 7 (right) shows one example of Moore's extension of this visual abstraction to artificial entities: a robot with an internal model of self and environment (Moore 2012, Moore 2016).


Figure 7. Pictograms showing visual abstractions of (left) a biological cognitive agent (Maturana & Varela, 1987) and (right) Moore's extension of Maturana and Varela's schema to robots, for a robot with an internal model of self and environment (Moore 2012).

The use of internal models within control systems is well established, but these are typically mathematical models of the plant (the system to be controlled). Typically a set of first-order linear differential equations models the plant, and these allow the design of controllers able to cope with reasonably well defined uncertainties; methods also exist to extend the approach to cover non-linear plants (Isidori et al., 2003). In such internal-model based control the environment is not modelled explicitly – only certain exogenous disturbances are included in the model. This contrasts with the internal simulation approach of this section, which models both the plant (in our case a robot) and its operational environment.

In the field of cognitive robots specifically addressing the problem of machine consciousness (Holland, 2003), the idea of embedding a simulator in a robot has emerged in recent years. Such a simulation allows a robot to try out (or 'imagine') alternative sequences of motor actions, to find the sequence that best achieves the goal (for instance, picking up an object), before then executing that sequence for real. Feedback from the real-world actions might also be used to calibrate the robot's internal model. The robot's embodied simulation thus adapts to the body's dynamics, and provides the robot with what Marques and Holland (2009) call a 'functional imagination'.

Figure 8. Robots with simulation-based internal models. Top left: ECCE Robot (Marques et al, 2010; Diamond et al, 2012). Top right: starfish-like robot (Bongard et al, 2006). Bottom left: Vaughan and Zuluaga's (2006) robot with incomplete self-knowledge. Bottom right: e-puck robots, each with an embedded genetic algorithm and simulator (O'Dowd et al, 2014).

Bongard et al. (2006) describe a four-legged starfish-like robot that makes use of explicit internal simulation, both to enable the robot to learn its own body morphology and control, and notably to allow the robot to recover from physical damage by learning its new morphology following the damage. The internal model of Bongard et al. models only the robot, not its environment. In contrast, Vaughan and Zuluaga (2006) demonstrate self-simulation of both a robot and its environment in order to allow a robot to plan navigation tasks with incomplete self-knowledge; their approach provides perhaps the first experimental proof-of-concept of a robot using a simulation-based internal model to anticipate and hence avoid unsafe actions.

Figure 9. From Zagal et al. (2009): walking behaviour executed with the simulated NAO robot (top), and with the real NAO robot (bottom).

Zagal et al. (2009) describe self-modelling using internal simulation in humanoid soccer robots; in what they call a 'back-to-reality' algorithm, behaviours adapted and tested in simulation are transferred to the real robot. In a similar approach, but within the context of evolutionary swarm robotics, O'Dowd et al. (2014) describe simple wheeled mobile robots which embed within each robot a simulator for both the robot and its environment; a genetic algorithm is used to evolve a new robot controller, which then replaces the 'live' robot controller about once every minute.

Simulation technology is now sufficiently well developed to provide a practical basis for implementing the kind of internal model required to test what-if hypotheses, as outlined above. In robotics, advanced physics- and sensor-based simulation tools are commonly used to test and develop, even evolve, robot control algorithms before they are tested in real hardware. Examples of robot simulators include Webots (Michel, 2004), Gazebo (Koenig and Howard, 2004), and Player-Stage (Vaughan and Gerkey, 2007). When using simulation tools, roboticists are well aware of the dangers in making claims about algorithms tested only in simulation. The term reality gap is used as shorthand for the gap between the performance of real sensors and actuators and their approximated and idealised versions in simulation (Jacobi et al., 1995). Furthermore, there is an emerging science of simulation, aiming for principled approaches to simulation tools and their use (Stepney et al., 2011).

3.2 A Generic architecture for simulation-based internal models

Figure 10 proposes an architecture for a robot with a simulation-based internal model which is used to test and evaluate the consequences of the robot's next possible actions. The machinery for modelling next actions is relatively independent of the robot's controller; the robot is capable of working normally without that machinery, albeit without the ability to generate and test what-if hypotheses. The what-if processes are not in the robot's main control loop, but instead run in parallel to moderate the Robot Controller's normal action selection process, acting in effect as a kind of governor. This governance might be to rule out certain actions because they are modelled as unsafe for the robot, or to recommend new robot actions in order to, for instance, prevent an accident.


Figure 10. The Consequence Engine: an architecture for robot anticipation using a simulation-based internal model (Blum, 2015).

At the heart of the architecture is the Consequence Engine (CE). The CE is initialised from the Object Tracker-Localiser, and loops through all possible next actions; these next actions are generated within the Robot Controller (RC) and transferred to the mirror RC within the CE (for clarity this data flow is omitted from figure 10). For each candidate action the CE simulates the robot executing that action, and generates a set of model outputs ready for evaluation by the Action Evaluator. The Consequence Evaluator loops through each possible next action; this is the Generate-and-Test loop. Only when the complete set of next possible actions has been tested does the Consequence Evaluator send its recommendations to the Robot Controller. These processes are explained in more detail below.

The Internal Simulation

The Consequence Engine shown in figure 10 incorporates a simulator with both a World Model and a Robot Model. The World Model (WM) is a model of the robot's environment, including the terrain across which the robot must move (if it is a mobile robot) and the other objects the robot might encounter. Those objects might be static obstacles (i.e. walls) or hazards (i.e. holes in the ground), or dynamic objects. The dynamic objects could be moving obstacles, or actors with which our robot must interact; these could be other robots or, if we are concerned with human-robot interaction, one or more humans. For many of the applications we might envisage, the WM will also need to model real-world physics, so that, for instance, the inertia of moving objects is accounted for in collisions.

The Robot Model (RM) is a model of the robot itself and, as is commonplace in robot simulators, this will model the robot's sensors and actuators. Importantly, the RM is controlled by the same Robot Controller as the real robot, so that it will act and react in the same way as the real robot. Thus the Robot Controller in the Internal Model mirrors the real robot's controller, as shown in figure 10.

All of these components are normally present in current robot simulators. However, in order to make use of such an internal simulator we need the following additional capabilities. (1) It must be capable of being initialised so that objects in the WM, and the state and disposition of the RM within the WM, can be matched to the real-world environment and to the location and disposition of the real robot in that environment; the Object Tracker-Localizer shown in figure 10 supplies this initialisation data. (2) It must be capable of being run with a given RM action for a given simulated time period, then halted and re-initialised to the same state, then run again for each of the robot's next possible actions; this is the Generate-and-Test loop. (3) The final state of the simulator at the end of each of these fixed-time runs must be captured and suitably coded, then supplied to the Action Evaluator.

The Action Evaluator

The purpose of the Action Evaluator (AE) is to compare the outputs of the internal simulator for each of the robot's next possible actions, and select the 'best' action for the real robot. To understand how to do this we first need to consider what we mean by the simulator outputs. Clearly, for each action, the robot is likely to have changed its disposition in the WM during the simulation run. Or it may not, either because its move was blocked by another object, or simply because the next action being tested might be 'stand still'. If there are dynamic actors in the environment, then their positions relative to the robot are also likely to have changed. Thus the position of the robot at the end of each internal simulation run, and of any other dynamic actors, are useful outputs; perhaps better still are the changes in position of the robot. Collisions are also significant consequences as far as safety is concerned, and they are likely to be detected directly by the simulator, since during a simulated what-if run those collisions actually happen; collision or no-collision is therefore another extremely useful output.

Given that the outputs of the internal simulator are, minimally, change of position and collision/no-collision for each next possible action tested, how is the AE to judge which is the best action? Clearly such a judgement requires rules, which are implemented in the safety logic (SL). The rules might, for instance, determine that all collisions are unsafe. Thus, for a set of next possible actions tested in the internal simulator, if only one has the output no-collision then that action would be chosen and sent to the Robot Controller. But if several next actions are equally safe (i.e. none of them are predicted to result in collisions), how is the AE/SL to decide? The simple answer is that the AE/SL does not have to decide between the safe actions, since the RC is, we assume, capable of action selection in order to decide which action is the next best action toward achieving the robot's task or mission. Clearly, for n next possible actions modelled and evaluated, the number of actions judged safe, s, could be any value in the range 0…n, and so the AE needs to send an s-tuple of safe actions to the RC. Using its action selection mechanism, the RC then chooses one of the s-tuple actions, possibly overriding an unsafe action.

Consider the abstract scenario illustrated in figure 11. Here a robot is approaching two hazards: a wall on the left and a hole directly ahead. Let us assume the hole is deep enough that it presents a serious hazard to the robot. The robot has four possible next actions, each of which is simulated in its CE. The actions are move ahead left, move straight ahead, move ahead right or remain stationary; for simplicity assume left and right movements actually consist of a turn on the spot, followed by straight moves as shown by the dashed arrows.
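A minimal sketch of this generate-and-test loop follows: for each candidate action the internal simulator is initialised from the current world state and run for a fixed period, its outputs (position change and collision/no-collision) are judged by a simple safety logic, and the s-tuple of safe actions is passed to the robot controller. The simulator stub and action names are illustrative assumptions, reproducing the outcomes of Table 1 below; this is not the implementation of the Consequence Engine described in Blum (2015).

# Sketch of the Consequence Engine's generate-and-test loop (cf. figure 10).
# The internal simulator here is a stub returning (position_change, collision);
# a real implementation would run a physics- and sensor-based robot simulator.

from typing import Callable, Dict, List, Tuple

Actions = ["ahead_left", "ahead", "ahead_right", "stand_still"]
SimOutput = Tuple[float, bool]  # (position change in metres, collision?)

def consequence_engine(world_state: Dict,
                       simulate: Callable[[Dict, str], SimOutput],
                       actions: List[str]) -> List[str]:
    """Return the s-tuple of actions judged safe by the safety logic."""
    safe_actions = []
    for action in actions:                              # generate ...
        _, collision = simulate(world_state, action)    # ... and test internally
        if not collision:                               # safety logic: all collisions unsafe
            safe_actions.append(action)
    return safe_actions                                 # passed to the Robot Controller

if __name__ == "__main__":
    # Stub simulator reproducing the outcomes of Table 1 below.
    outcomes = {"ahead_left": (0.05, True), "ahead": (0.10, True),
                "ahead_right": (0.20, False), "stand_still": (0.0, False)}
    print(consequence_engine({}, lambda state, a: outcomes[a], Actions))
    # -> ['ahead_right', 'stand_still']; the RC then selects among these.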

Figure 11. An abstract scenario in which a robot faces two hazards, a wall on its left, and a hole in the ground straight ahead. Consider that the robot has four next possible actions: turn left, move straight ahead, turn right or stand still.

Robot action    Position change    Robot outcome    Consequence
Ahead left      5 cm               Collision        Robot collides with wall
Ahead           10 cm              Collision        Robot falls into hole
Ahead right     20 cm              No-collision     Robot safe
Stand still     0 cm               No-collision     Robot safe

Table 1. Possible anticipated outcomes for each of the four next possible actions in the abstract scenario of figure 11.

Table 1 shows the change of position and collision/no-collision values that might be generated by the IM for each of the four possible next actions of the robot. Two of the four actions are clearly unsafe: Ahead Left, which leads to a collision with the wall, and Ahead, which results in the robot falling into the hole. It is perfectly reasonable to expect the IM to simulate and detect both outcomes and, from a safety perspective, both can be classified as Collision (we can assume the WM's physics engine will model the robot colliding with the bottom of the hole). Two of the actions, Ahead Right and Stand Still, are safe, and so the CE will output the 2-tuple (Ahead Right; Stand Still) to the RC. It is then easy for the RC to select the action Ahead Right, since it almost certainly results in the robot moving closer to its target destination. The ability to anticipate the near future for each of its next possible actions clearly improves the robot's safety. And importantly, because the IM is initialised to match the situation in which the robot finds itself, the robot is able to maintain its safety in situations that may not have been anticipated when the robot was designed.

In the example sketched here some actions are evaluated to be safe (robot outcome: no-collision). What if the situation a robot finds itself in means that all robot actions are evaluated as unsafe, leaving the RC with no options? This problem might be addressed if we arrange that, instead of generating binary (safe or not-safe) outcomes, the CE outputs an analogue value estimating the degree of safety risk. The CE could then provide the RC with the 'least unsafe' options. This approach is outlined in Section 3.3 below.

The Object Tracker-Localizer

The Object Tracker-Localizer (OTL) is required to track the (relative) position of both static and dynamic objects in the robot's local environment, while at the same time localising the robot relative to those objects, and to provide this position data to the IM. For moving (dynamic) objects the OTL must also provide the IM with the speed and direction of those objects, so that their trajectories can be modelled. The OTL is the essential mechanism by which the robot's Internal Model is synchronised to its local environment, and to the robot's position in that environment. Although the OTL might appear to be a demanding requirement, robots of reasonable sophistication are likely to require sensing and processing of sense data for object tracking and localisation as a part of their normal control architecture.

3.2.1 Experimental validation: The Corridor Experiment

We have implemented and tested the simulation-based internal model architecture outlined above in an experimental scenario which we call the corridor experiment (Blum et al, 2017). Inspired by the problem of how mobile robots could move quickly and safely through crowds of moving humans, the aim of this experiment is to compare the performance of our simulation-based internal model with a purely reactive approach. In other words: can a robot's safety be improved with anticipation?

In this experiment one mobile robot (the CE-robot) is equipped with the consequence engine, while five other mobile robots have only simple obstacle avoidance behaviours. The setup is shown in figure 12 (left); here the smart CE-robot is shown in blue at its starting position. The CE-robot's goal is to reach the end of the corridor on the right while maintaining its own safety by avoiding, and keeping a safe distance from, the five unintelligent robots shown in red. Figure 12 (right) shows the trajectories of all six robots during a simulated run of the experiment, with the CE-robot reaching the end of the corridor.


Figure 12. The corridor experiment setup (left), with five (red) robots moving randomly and one intelligent (CE) robot (blue) with a simulation-based internal model. (Right) the (simulated) trajectories of all six robots by the time the blue robot has reached the end of the corridor.



Figure 13. The real robot corridor experiment, using e-puck robots (Mondada et al, 2009) fitted with Linux extension boards (Liu and Winfield, 2011). This image shows the initial condition, with the CE (intelligent) robot on the left and the five unintelligent robots positioned at randomly selected locations in the corridor. The arena markings have no significance here.


Figure 14. Results of the corridor experiment. Multiple trials (88 simulations and 54 real robot experiments) were run over four experiments: (1) the baseline in simulation, in which the blue robot has only a simple reactive collision avoidance behaviour, (2) the baseline with real robots, (3) using the consequence engine in the blue robot in simulation and (4) using the consequence engine in the blue robot with real robots. Panel (a) shows the time taken for the blue robot to reach the end of the corridor, (b) shows the distance that the blue robot covers while reaching the end of the corridor, (c) shows the 'danger ratio' experienced by the blue robot, and (d) the number of consequence engine runs per timestep in the blue robot. The danger ratio is the percentage of the run time that another robot is within the blue robot's safety radius.

The baseline experiments are introduced in order to provide a benchmark against which to evaluate the merit of the consequence engine in improving the CE-robot's safety. In the baseline experiments the consequence engine in the CE-robot is disabled, so that the CE-robot defaults to simple Braitenberg (1984) obstacle avoidance. Baseline experiments are conducted pair-wise with the CE experiments under identical initial conditions, and after each experiment a random robot is selected to be the CE robot in order to compensate for inter-robot heterogeneity.

The results of the corridor experiment in figure 14 show that for a relatively small cost in additional run time and distance covered, panels (a) and (b), the danger ratio is very significantly reduced from a mean value of ~20% to a mean value of zero, panel (c). Clearly there is a computational cost and this is reflected in panel (d); the baseline experiment has no CE and hence runs no simulations, whereas the CE robot runs an average of between 8 and 10 simulations per timestep. This is entirely to be expected: anticipation clearly incurs a computational overhead.
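For clarity, the danger ratio reported in panel (c) can be computed from per-timestep distance data as follows; the data structure and the safety radius value here are illustrative assumptions.

import numpy as np

def danger_ratio(distances_to_nearest_robot: np.ndarray, safety_radius: float) -> float:
    """Percentage of the run time during which another robot is within
    the blue (CE) robot's safety radius."""
    inside = distances_to_nearest_robot < safety_radius
    return 100.0 * np.count_nonzero(inside) / len(distances_to_nearest_robot)

if __name__ == "__main__":
    # One distance sample per control timestep (metres); the values are made up.
    d = np.array([0.9, 0.6, 0.4, 0.2, 0.5, 0.8])
    print(danger_ratio(d, safety_radius=0.5))  # -> 33.3...% of timesteps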

3.3 Simulation-based internal models for ethical robots

Consider the scenario illustrated in figure 15. Here there are two actors: our self-aware robot and a human. The environment also contains a hole in the ground, of sufficient size and depth that it poses a serious danger to both the robot and the human. As in the previous example the robot has four possible next actions, each of which is simulated in its IM. Let us extend the architecture of figure 10 in the following two ways. Firstly, we extend the definition of the 'collision/no-collision' output of the IM to include all safety outcomes, and assign to these a numerical value that represents the estimated degree of danger. Thus 0 indicates 'safe' and (say) 10 'fatal'. An intermediate value, say 4, might be given for a low-speed collision: unsafe but probably low-risk, whereas 'likely to fall into a hole' would merit the highest danger rating of 10. Secondly, we also output, to the CE, the same safety consequences for the other actor(s) in the environment, noting that the way we have specified the IM and its inputs from the OTL means that the IM is equally capable of anticipating the effect of hazards on all dynamic actors in the environment, including itself. If one of those dynamic actors is a human, then we now see the possibility of the robot choosing to execute a moderately unsafe action in order to prevent that human from coming to harm.


Figure 15. Abstract scenario in which a robot and a human both face a dangerous hazard: a hole in the ground. The human is assumed to be unaware of the hazard and will move forwards. The robot has four next possible actions: turn left, move straight ahead, turn right or stand still.

Robot action    Robot outcome    Human outcome    Consequence
Ahead left      0                10               Robot safe; human falls into hole
Ahead           10               10               Both robot and human fall into hole
Ahead right     4                4                Robot collides with human
Stand still     0                10               Robot safe; human falls into hole

Table 2. Anticipated possible safety outcome values for both robot and human, for each of the four next possible robot actions in the scenario of figure 15. Here the outcome is on a continuous scale from 0 to 10, where 0 means completely safe and 10 means fatally dangerous.

Table 2 shows the safety outcome values that might be generated by the IM for each of the four possible next actions of the robot, for both the robot and human actors in this scenario. From the robot's perspective, two of the four actions are safe: Ahead Left means the robot avoids the hole, and Stand Still means the robot also remains safe. Both of the other actions are unsafe for the robot, but Ahead is clearly the most dangerous, as it will result in the robot falling into the hole. For the human, three out of four of the robot's actions have the same outcome: the human falling into the hole. Only one robot action is safer for the human: if the robot moves Ahead Right then it might collide with the human before she falls into the hole. Notice here that the simulation-based internal model allows the robot to anticipate the consequences of not just its own actions, but also the human's. This is an important advantage of the simulation-based internal modelling approach.

In order for the CE to generate the action Ahead Right in this scenario it clearly needs both a safety rule, as before, and an 'ethical' rule, which can take precedence over the safety rule. This logic might take the form:

IF for all robot actions, the human is equally safe THEN
    (* default safe actions *)
    output s-tuple of safe actions
ELSE
    (* ethical action *)
    output s-tuple of action(s) for least unsafe human outcome(s)
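The rule above can be made concrete with a short sketch: the consequence engine now returns graded danger values (0 safe to 10 fatal) for both robot and human, and the selection logic outputs the action(s) with the least unsafe human outcome, falling back to the default s-tuple of actions safe for the robot when the human is equally safe under all actions. The numeric values follow Table 2; everything else is an illustrative assumption rather than the code used in the experiments described below.

# Sketch of the ethical rule layered on top of the safety logic (cf. Table 2).
# Danger values: 0 = completely safe ... 10 = fatally dangerous.

def select_actions(outcomes: dict) -> list:
    """outcomes maps action -> (robot_danger, human_danger)."""
    human_dangers = {a: h for a, (_, h) in outcomes.items()}
    if len(set(human_dangers.values())) == 1:
        # Default safe actions: the human is equally safe whatever the robot does,
        # so output the s-tuple of actions that are safe for the robot itself.
        return [a for a, (r, _) in outcomes.items() if r == 0]
    # Ethical action: output the action(s) with the least unsafe human outcome,
    # even if they carry some risk for the robot.
    least = min(human_dangers.values())
    return [a for a, h in human_dangers.items() if h == least]

if __name__ == "__main__":
    table2 = {"ahead_left": (0, 10), "ahead": (10, 10),
              "ahead_right": (4, 4), "stand_still": (0, 10)}
    print(select_actions(table2))  # -> ['ahead_right']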

What we have set out in this section appears to match remarkably well with Asimov's first and third laws of robotics: (1) a robot may not injure a human being or, through inaction, allow a human being to come to harm, and (3) a robot must protect its own existence as long as such protection does not conflict with the First (or Second) Laws (Asimov, 1950). The schema proposed here will compel a robot to maintain its own safety (third law: 'protect its own existence'); it will avoid injuring (i.e. colliding with) a human (first law: 'may not injure a human'), but may also sometimes compromise that rule in order to prevent a human from coming to harm (first law: '... or, through inaction, allow a human to come to harm'). This is not to suggest that a robot which apparently implements part of Asimov's famous laws is ethical in any formal sense, i.e. in a sense that an ethicist might accept. But it does present the intriguing possibility of a route toward engineering a minimally ethical robot.

3.3.1 Experimental Validation: a minimally ethical robot

We have experimentally implemented the abstract scenario shown in figure 15, with two mobile robots: one acting as the ethical robot, and the other as a proxy human (Winfield et al 2014). The ethical robot (which we refer to as the Asimov-robot) embeds the consequence engine, together with the ethical rule shown above, while the proxy human (H-robot) has no consequence engine but instead a simple state machine in which the robot moves straight ahead unless it encounters an obstacle, which triggers a simple avoidance behaviour. The experimental setup using e-puck robots (Mondada et al, 2009) is shown in figure 16.


Figure 16. The ethical robot experiment setup, using e-puck robots. The ethical robot, which embeds the consequence engine and ethical rule above, is the Asimov-robot (A-robot); the proxy human is the H-robot. The yellow square is the hole in the ground and 'goal' is the goal position for the A-robot. The other markings in the arena have no significance. The consequence engine evaluates 30 next possible actions every 0.5 seconds.




Figure 17. Left: a baseline experimental run in which there is no H-robot. The red lines show the start (A) and end (goal) positions of the Asimov-robot for 26 runs; the consequence engine allows the Asimov-robot to maintain its own safety and skirt around the corner of the hole. Right: the same setup with the introduction of a proxy human (H-robot). Here we see the Asimov-robot start toward its goal position, but when it notices the H-robot it diverts its path toward it. When it is close enough to the H-robot a collision avoidance response is triggered and the H-robot moves away from danger. The Asimov-robot then resumes its path toward its goal position while itself avoiding the hole.


Figure 18. The ethical robot experiment with a balanced ethical dilemma. The setup (left) is the same as in figures 16 and 17, but here there are two proxy humans, H1 and H2, both initially moving in the direction of the hole. The number of H-robots 'rescued' from falling into the hole over 33 trials is shown on the right.

The trial runs in figure 17 show that the ethical (Asimov) robot is able to prevent the H-robot from falling into the hole, i.e. to 'rescue' the H-robot, every time: its success rate is 100%. By simulating ahead in time, for each of its next possible actions, both itself and the H-robot, and hence anticipating the consequences (for both) of each of those actions, then applying the ethical rule, the Asimov-robot is able to prevent both itself and the proxy human from coming to harm.

We then tested the Asimov-robot with a balanced ethical dilemma. Running the same setup as figures 15 and 16 we introduced two proxy humans (H1 and H2), in what we believe to be the world's first experimental demonstration of a real robot facing an ethical dilemma. The Asimov-robot runs exactly the same code as before. As figure 18 shows, over 33 trials the Asimov-robot is unable to 'rescue' either H1 or H2 in 14 runs. In 16 runs it succeeds in rescuing one of the two H-robots, and in 3 runs it rescues both [1]. At first sight this result is puzzling. We know from figure 17 that the Asimov-robot can rescue one H-robot very reliably, so why does it fail to rescue either H1 or H2 in 14 of the 33 runs? Analysis revealed the answer to be the fact that the Asimov-robot is able to change its decision after every run of the CE, i.e. every 0.5 seconds; in other words any decision to move in the direction of H1 or H2 can be changed – no decision is sustained. A real system would need to make decisions with some persistence (just as a human would when faced with a similar balanced dilemma). Our puzzling result simply reflects the fact that the Asimov-robot is not coded with a memory that persists over multiple cycles of anticipation.

[1] The setup is such that there isn't time to rescue both, so the fact that the Asimov-robot rescues both H1 and H2 in 3 trials can only be regarded as an anomaly.

3.4 Future potential of simulation-based internal models

Exogenous fault detection in swarm robotic systems

Consider now a swarm of robots in which some robots may have developed faults. It has long been assumed that swarm systems are robust, i.e. that the failure of individual robots will have little effect on a swarm's overall collective behaviour. However, this is not always the case. Partial failures [2] in particular can have the effect of thwarting the desired behaviour of the swarm (Bjerknes and Winfield, 2013). One approach to addressing this problem, explored in recent work, is to equip every robot in the swarm with a simulation of itself and its neighbours. Each robot is then able to anticipate the behaviour of neighbouring robots by comparing the behaviour it observes with the behaviour predicted by its simulation-based internal model. Inspired by immune systems, we refer to this approach as exogenous fault detection (Millard et al, 2013, 2014).

[2] Such as failed motors, while sensors and communication remain operational.
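The exogenous fault detection idea can be sketched as follows: a robot predicts a neighbour's trajectory with its simulation-based internal model and flags a possible fault when the observed trajectory diverges from the prediction by more than a threshold. The divergence metric and threshold are illustrative assumptions, not the detector of Millard et al. (2013, 2014).

import numpy as np

def neighbour_faulty(observed_traj: np.ndarray,
                     predicted_traj: np.ndarray,
                     threshold: float = 0.05) -> bool:
    """Flag a possible fault when a neighbour's observed trajectory diverges
    from the trajectory predicted by the internal simulation."""
    divergence = np.mean(np.linalg.norm(observed_traj - predicted_traj, axis=1))
    return divergence > threshold

if __name__ == "__main__":
    predicted = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0]])   # simulated neighbour
    stalled   = np.array([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]])   # e.g. failed motors
    print(neighbour_faulty(stalled, predicted))           # -> True (possible fault)
    print(neighbour_faulty(predicted + 0.01, predicted))  # -> False (sensor noise only)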

The imitation of goals

Imitation is a very important form of social learning in humans. This importance is reflected in the early emergence of imitation in human infants; from the age of two, humans can imitate both actions and their intended goals (Gariépy et al, 2014), and this has been termed rational imitation. Imitation has long been regarded as a compelling method for (social) learning in robots. However, robot imitation faces a number of challenges (Dautenhahn and Nehaniv, 2002). One of the most fundamental issues is determining what to imitate (Carpenter and Call, 2006). Although not trivial, it is relatively straightforward to imitate actions, but inferring goals from observed actions, and thus determining which parts of a demonstrated sequence of actions are relevant, i.e. rational imitation, is a difficult research problem. One approach, explored in Vanderelst and Winfield (2017), is to equip the imitating robot with a simulation-based internal model that allows the robot to explore the alternative sequences of actions required to attain the demonstrator robot's potential goals (i.e. goals that are possible explanations for the observed actions). Comparing these actions with those observed in the demonstrator robot enables the imitating robot to infer the goals underlying the observed actions.

Towards Artificial Theory of Mind

Theory of mind is the term given by philosophers and psychologists to the ability to form a predictive model of others (Carruthers and Smith, 1996). With theory of mind it is supposed we are able to understand how others might behave in particular circumstances, and to form beliefs about their intentions. Artificial theory of mind could provide robots designed to interact with humans with the cognitive machinery to, for instance, anticipate their needs. This could be a considerable advantage for companion (care) robots or workplace assistant robots. Arguably the simulation-based internal model takes us a step in the direction of artificial theory of mind; indeed one of the several theories of mind is the simulation theory of mind (Michlmayr, 2002). Clearly the examples given in this chapter fall very far short of anything that could be characterised as artificial theory of mind. How a robot could be initialised with (or better still might learn) an adequate model of its human companion or workmate is clearly a very difficult problem. In Winfield (2017) we propose an experiment in which two robots, each equipped with the same consequence engine, semiotically interact with each other. Although not directly addressing the problem of artificial theory of mind, such an embodied computational model of social interaction could advance us in that direction.

3.5 Challenges and Open Questions with simulation-based internal models

Although the architecture proposed above has been demonstrated to be technically realisable with current simulation technology in the laboratory, it is by no means certain that the resulting robots would be a practical proposition for real-world applications. Significant challenges fall into five categories: performance, the reality gap, timing, sensing and situational awareness, and validation.

Performance

Sensor-based simulation is computationally expensive, and the time required to simulate each next possible action and complete the CE cycle is likely to be a major limiting factor on the robot's overall performance. For example, the internal modelling process for the complex anthropomimetic humanoid ECCE-Robot, using workstation grade computational hardware, ran at about one quarter of real time (Diamond et al., 2012). Ideally we require an efficient internal modelling process that runs in the background, overriding the robot's next control action as and when necessary, yet with no perceptible interruption to the robot's normal operation. Achieving this ideal presents two challenges: engineering the simulation, and integration, so that the Internal Model and its data flows mesh smoothly with the robot's actions in the real world. One approach to speeding up the simulator is to exploit GPGPU technology and parallelise processes within the simulator (Jones et al, 2014). The key simulation challenge is to find the optimal level of simulation fidelity. Too much fidelity will slow down the simulator; too little and the reality gap will reduce the value of the simulation outcomes, i.e. the CE will not provide a useful prediction of what will really happen if the robot performs a given action. It may be that a variable-fidelity simulator is required, in which the robot can adapt the simulation fidelity according to the perceived hazard. Although current robot simulator technology is likely to be adequate for proof-of-principle, a different kind of variable-fidelity simulation framework will most probably be needed for a practical real-world robot. Developing this presents a substantial research challenge.

The Reality Gap

This refers to the gap between the performance of real sensors and actuators and their approximated and idealised versions in simulation (Jacobi et al., 1995). The reality gap is unavoidable, and is potentially a very serious limitation of simulation-based internal models. However, an unexpected outcome of the experiments outlined in section 3.3.1 is that the effect of the reality gap reduces as the A-robot moves physically closer to the H-robot, enabling the A-robot to accurately intercept the H-robot. This advantage would, of course, not be seen in robots that are not required to physically intercept.

Timing

Here we have a number of open questions. Firstly, when and how often does the robot need to initiate the process of internally modelling the sequence of next possible actions? The process is computationally expensive and, given the performance issue discussed above, likely to slow down the robot if it has to wait for the IM cycle to complete before acting. Ideally, the IM cycle would only be triggered as and when some potential hazard is detected by the robot's sensors and, furthermore, far enough in advance that the IM cycle can complete before the robot encounters the hazard. Intuitively, a static environment is safer for the robot, thus we could propose that when the robot senses a nearby object or actor starting to move, the IM cycle is triggered. Such an event would be sensed by the Object Tracker-Localizer, so it would make sense for that process to initiate the whole IM cycle.
Timing

Here we have a number of open questions. Firstly, when and how often does the robot need to initiate the process of internally modelling the sequence of next possible actions? The process is computationally expensive and, given the performance issue discussed above, is likely to slow down the robot if it has to wait for the IM cycle to complete before acting. Ideally, the IM cycle would only be triggered as and when some potential hazard is detected by the robot's sensors and, furthermore, far enough in advance that the IM cycle can complete before the robot encounters the hazard. Intuitively, a static environment is safer for the robot, so we could propose that the IM cycle is triggered when the robot senses a nearby object or actor starting to move. Such an event would be sensed by the Object Tracker-Localiser, so it would make sense for that process to initiate the whole IM cycle. Perhaps the OTL should also send a signal to the Robot Controller to slow down the robot; an appropriate response to sensing a moving object in any case, with the added benefit of allowing more time for the IM cycle to complete.

Secondly, how far ahead should the IM simulate, for each next possible action? Let us call this time ts. If ts is too short, the internal modelling process is likely to be useless, since it will not simulate far enough ahead to interact with the hazard that triggered the IM cycle. But setting ts too long is likely to affect the robot's performance, given the computational cost of Internal Modelling. Ideally ts and its upper limit (time-out) should be adaptive, but how to set or discover these values clearly presents an open research question.
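A minimal sketch of the trigger-and-look-ahead logic just described might look as follows (Python; all names, interfaces and thresholds are hypothetical assumptions rather than part of any implementation discussed above): when the Object Tracker-Localiser reports that a tracked object has started moving, the robot is slowed and an IM cycle is started with a look-ahead ts derived from the estimated time to encounter, clamped between a minimum value and a time-out.

# Hypothetical sketch of when to trigger the IM cycle and how far ahead to simulate.
T_MIN, T_MAX = 1.0, 10.0          # bounds on the look-ahead ts (seconds)

def on_tracker_update(tracked_objects, robot, start_im_cycle):
    """Call whenever the Object Tracker-Localiser publishes an update.
    tracked_objects: list of dicts with 'speed' (m/s) and 'distance' (m);
    robot: assumed controller interface with speed and slow_down();
    start_im_cycle: function taking the look-ahead time ts in seconds."""
    moving = [o for o in tracked_objects if o["speed"] > 0.05]
    if not moving:
        return                     # static scene: no IM cycle needed
    robot.slow_down()              # buy time for the IM cycle to complete
    # Choose ts from the soonest likely encounter: distance closed at the
    # combined speed of robot and object, clamped to [T_MIN, T_MAX].
    closing = [o["distance"] / max(o["speed"] + robot.speed, 1e-3)
               for o in moving]
    ts = min(T_MAX, max(T_MIN, min(closing)))
    start_im_cycle(ts)

class _StubRobot:                  # minimal stand-in for the robot controller
    speed = 0.3
    def slow_down(self):
        print("slowing down")

on_tracker_update([{"speed": 0.4, "distance": 2.0}],
                  _StubRobot(),
                  start_im_cycle=lambda ts: print("IM cycle, ts = %.1f s" % ts))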

Sensing and situational awareness

A requirement of the simulation-based internal modelling approach to anticipation is that the robot must have the situational awareness needed to initialise its internal model. Any failure to sense objects or actors in the environment, and to represent them via the Object Tracker-Localiser in the World Model (Fig. 4), will clearly compromise the robot's ability to anticipate the possible effects of its actions.

Validation and verification

This chapter proposes a route to practical anticipation and, hence, safer robots. But a safety system is worthless unless it can be formally shown to be safe. Thus we face the difficult question of whether, and how, a robot engineered along the lines proposed could be validated (validation would determine whether the robot is safe in use; verification checks the correctness of its design). At first this might appear to be an insurmountable challenge: after all, the whole idea of the architecture outlined here is that it offers the potential for a robot that can act safely even when confronted with new situations, including scenarios not anticipated by the robot's designers. Given that robot behaviours are an emergent property of the interaction between the robot and its environment, logic would suggest that placing a robot in an unpredictable environment will lead to unpredictable behaviours – and hence a robot that cannot be validated. However, a more careful analysis suggests there may be a route to verification, and hence partial validation.

Firstly, consider that the Generate-and-Test machinery, including its Internal Model, does not control the robot directly. In fact it serves to reduce the number of next possible actions in any given situation, by assessing some to be unsafe and inhibiting those in the Robot Controller's action selection mechanism. This suggests that a robot with the Generate-and-Test machinery cannot be less safe than the same robot without that machinery. If the Robot Controller has already been shown to be safe (or as safe as it can be within its design limitations), then the introduction of the internal modelling process cannot compromise that assurance.





Let us test this proposition by considering the two possible ways in which the Generate-and-Test process can give incorrect assessments:

• Incorrectly evaluating a safe action as unsafe: here the effect is to (unnecessarily) limit the choice of next possible actions; in effect, the robot acts more cautiously than it needs to.
• Incorrectly evaluating an unsafe action as safe: if the controller then selects this action, the robot will execute an unsafe action. However, the same robot without the Generate-and-Test machinery would, in the same situation, execute the same unsafe action, so the robot with the Generate-and-Test process is no more unsafe than the robot without it.
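The second case is the crux of the argument that the machinery cannot make the robot less safe, and a minimal sketch (Python, with hypothetical names; a drastic simplification of the consequence engine machinery) makes the point explicit: the Generate-and-Test step only ever removes actions from the set the Robot Controller may choose from, so any action available to the filtered robot was equally available to the unfiltered robot.

# Hypothetical sketch of the Generate-and-Test step as an action filter.
# It only ever inhibits actions; it never adds new ones, so the filtered
# action set is always a subset of the baseline controller's action set.
def generate_and_test(candidate_actions, simulate, safety_logic):
    """Return the candidate actions not assessed as unsafe.
    simulate(action) -> predicted outcome (from the internal model);
    safety_logic(outcome) -> True if the outcome is judged safe."""
    allowed = [a for a in candidate_actions if safety_logic(simulate(a))]
    # One possible fallback: if every action is judged unsafe, behave
    # exactly like the robot without the machinery.
    return allowed or list(candidate_actions)

# Toy example: actions are headings; the internal model predicts a
# collision for the heading pointing at an obstacle dead ahead.
actions = ["ahead", "left", "right", "stop"]
simulate = lambda a: {"collision": a == "ahead"}
safe = lambda outcome: not outcome["collision"]
filtered = generate_and_test(actions, simulate, safe)
assert set(filtered).issubset(set(actions))   # filtering never adds options
print(filtered)                               # -> ['left', 'right', 'stop']

The fallback to the full action set when every candidate is judged unsafe is one possible design choice, not something prescribed by the architecture described here.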

How might the internal modelling process give rise to these incorrect assessments? There are several possible reasons, including at least: (i) the robot might fail to accurately perceive its environment and the objects in it, because of sensor limitations or sensor noise for example, and therefore incorrectly initialise the World Model; (ii) limitations in simulation fidelity (the reality gap) might result in the Robot Model failing to (virtually) sense an object that has correctly been initialised in the World Model, or failing to (virtually) collide with an object; or (iii) the simulation time ts is too short, so the CE does not simulate far enough ahead to (virtually) encounter a hazard. (i) is clearly a fundamental limitation of any robot; animals and humans also suffer the consequences of sensory limitations. Since perfect perception is impossible, (i) cannot be held against the internal modelling approach. (ii) and (iii) are the factors discussed above under Performance and Timing, and need careful design in order to minimise the likelihood of incorrect assessments.

Secondly, the Generate-and-Test process, including the rule-set in the Safety Logic, is entirely deterministic. For any given situation, i.e. current disposition and set of perceptual inputs, and for a given simulator fidelity and internal timing, a robot will always generate the same set of IM and CE outputs. Thus we may be able to formally check the correctness of the deterministic Generate-and-Test process using agent model checking (Dennis et al., 2012) or deductive verification approaches (Dixon et al., 2002).
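The following toy sketch (Python) illustrates what this determinism buys, without in any way standing in for the agent model checking or deductive verification approaches cited above: because each (discretised) situation and action maps to a fixed verdict, a safety property can be checked by exhaustive enumeration. The situation encoding, the stand-in verdict function and the property are all invented for the example.

# Hypothetical sketch: exhaustive checking of a deterministic safety verdict.
from itertools import product

def verdict(situation, action):
    """Toy deterministic stand-in for one Generate-and-Test evaluation:
    an action is unsafe (False) if it points at a hazard that is close."""
    hazard_bearing, hazard_range = situation
    return not (action == hazard_bearing and hazard_range == "near")

BEARINGS = ["ahead", "left", "right"]
RANGES = ["near", "far"]
ACTIONS = ["ahead", "left", "right", "stop"]

# Property: in every situation with a near hazard, the action pointing
# at that hazard is always assessed unsafe.
violations = [(s, a)
              for s in product(BEARINGS, RANGES)
              for a in ACTIONS
              if s[1] == "near" and a == s[0] and verdict(s, a)]
print("property holds" if not violations else violations)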

4 Social and Technological Implications

Introducing robots that are able to anticipate the consequences of their own or humans' behaviours raises a number of societal, ethical and technological questions. An important aspect is safety. As we have shown in this chapter, anticipatory robots can in general be safer, perhaps even behave ethically, and this broadens the potential for new applications and capabilities, mainly in the context of human-robot interaction such as education and care, but also in semi-industrial joint human-robot workspaces (so-called cobots). But if robots are to anticipate humans they will need to embed better models of human behaviour – in other words, they will need to be equipped with artificial theory of mind. Given the technical difficulty of implementing artificial theory of mind (see section 3.4), the scope for a robot misinterpreting a human's behaviour, and hence incorrectly anticipating their needs, must be high. For this reason such robots will need a high level of transparency, both so that the human user can understand why a robot is behaving in a particular way, and so that an accident investigator can discover the internal processes that led a robot to misanticipate a human, causing harm to the human (Winfield and Jirotka, 2017).

The approaches outlined in this chapter lend themselves to this kind of transparency since their internal processes are traceable, repeatable and hence explainable. These will be important attributes when anticipatory AIs are embedded in safety-critical applications such as driverless cars, drones and assisted-living (care) robots. This contrasts with AIs that make use of deep learning approaches. Game-playing AIs based on deep learning, such as DeepMind's AlphaGo (Silver et al., 2016), are undoubtedly anticipatory systems since they are able to look ahead and model the consequences of possible plays. Whilst powerful, such systems are also inscrutable: because they are based on artificial neural networks (ANNs) trained with very large datasets, it is generally impossible to discover the reasons for a particular decision. Until this is solved, the lack of transparency and explainability renders anticipatory AIs based on deep learning unsuitable for application in safety-critical systems.

5 Summary and Conclusions

In this chapter, we have outlined two major contributions to anticipation in robotics, namely predictive internal models for sensorimotor control and predictive internal models for safety and ethical systems. We explained the mechanisms of anticipation and illustrated these with examples of recent experimental work. We then suggested further potential applications of simulation-based internal models. We elaborated the limitations and challenges of implementation, and – in a discussion of societal and technological implications – we underlined the need for transparency and explainability, especially in safety-critical applications. We noted that the approaches to anticipation outlined in this chapter lend themselves to transparency. In conclusion, we have shown that it is possible to develop robots that are able to anticipate their own and others' behaviour and thus act in a more intelligent, safe and transparent way.

Acknowledgments

The work of this chapter is part-funded by EPSRC grant ref. EP/L024861/1, by the German Academic Exchange Service (DAAD), and by Deutsche Forschungsgemeinschaft (DFG) grant HA 5458/9-1. The authors also gratefully acknowledge the research colleagues who conducted the experimental work outlined in this chapter, in particular Dr Christian Blum and Dr Wenguo Liu.

References

Asimov, I. (1950). I, Robot. Gnome Press.

Baranes, A., & Oudeyer, P.-Y. (2013). Active learning of inverse models with intrinsically motivated goal exploration in robots. Rob. Auton. Syst. 61, pp. 49–73.

Baron-Cohen, S. (1997). Mindblindness: an essay on autism and theory of mind. MIT Press, Boston, MA, USA.

Bechtle, S., Schillaci, G., & Hafner, V.V. (2016). On the Sense of Agency and of Object Permanence in Robots. Proceedings of the 6th Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics, pp. 166–171, Paris, France.

Bjerknes, J.D., & Winfield, A.F.T. (2013). On fault tolerance and scalability of swarm robotic systems. Distributed Autonomous Robotic Systems, pp. 431–444.

Blakemore, S.J., Wolpert, D., & Frith, C. (2000). Why can't you tickle yourself? NeuroReport 11, 11–16.

Blum, C. (2015). Self-organization in networks of mobile sensor nodes. PhD thesis, Mathematisch-Naturwissenschaftliche Fakultät, Humboldt-Universität zu Berlin.

Bongard, J., Zykov, V., & Lipson, H. (2006). Resilient machines through continuous self-modeling. Science 314(5802), 1118–1121.

Braitenberg, V. (1984). Vehicles: Experiments in synthetic psychology. Cambridge, MA: MIT Press.

Brooks, R.A. (1991). Intelligence without representation. Artificial Intelligence 47(1–3), 139–159.

Carpenter, M., & Call, J. (2006). The question of 'what to imitate': inferring goals and intentions. In: Nehaniv, C.L., & Dautenhahn, K. (eds.) Imitation and Social Learning in Robots, Humans and Animals: Behavioural, Social and Communicative Dimensions, pp. 135–152. Cambridge University Press.

Carruthers, P., & Smith, P.K. (eds.) (1996). Theories of Theories of Mind. Cambridge University Press.

Clark, A. (2016). Surfing Uncertainty: Prediction, Action, and the Embodied Mind. Oxford University Press.

Copete, J.L., Nagai, Y., & Asada, M. (2016). Motor development facilitates the prediction of others' actions through sensorimotor predictive learning. Proceedings of the 6th IEEE International Conference on Development and Learning and on Epigenetic Robotics, September 19–22, 2016.

Dautenhahn, K., & Nehaniv, C. (2002). Challenges in Building Robots That Imitate People, pp. 363–390. MIT Press.

Dearden, A., & Demiris, Y. (2005). Learning forward models for robots. Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI'05), pp. 1440–1445.

Dennett, D. (1995). Darwin's Dangerous Idea. Penguin.

Dennis, L.A., Fisher, M., Webster, M., & Bordini, R.H. (2012). Model checking agent programming languages. Automated Software Engineering 19(1), pp. 5–63.

Diamond, A., Knight, R., Devereux, D., & Holland, O. (2012). Anthropomimetic robots: Concept, construction and modelling. International Journal of Advanced Robotic Systems 9:209.

Dixon, C., Fisher, M., & Bolotov, A. (2002). Resolution in a logic of rational agency. Artificial Intelligence 139(1), pp. 47–89.

Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P., O'Doherty, J., & Pezzulo, G. (2016). Active inference and learning. Neurosci. Biobehav. Rev. 68, pp. 862–879.

Gariépy, J.F., Watson, K.K., Du, E., Xie, D.L., Erb, J., Amasino, D., & Platt, M.L. (2014). Social learning in humans and other animals. Front. Neurosci. 8(58).

Holland, J. (1992). Complex Adaptive Systems. Daedalus.

Holland, O. (ed.) (2003). Machine Consciousness. Imprint Academic.

Isidori, A., Marconi, L., & Serrani, A. (2003). Fundamentals of internal-model based control theory. In: Robust Autonomous Guidance, Advances in Industrial Control (Springer London), pp. 1–58.

Jacobi, N., Husbands, P., & Harvey, I. (1995). Noise and the reality gap: The use of simulation in evolutionary robotics. In: Proceedings of the Third European Conference on Advances in Artificial Life (Springer), pp. 704–720.

Jones, S., Studley, M., & Winfield, A.F. (2014). Mobile GPGPU acceleration of embodied robot simulation. In: Headleand, C.J., Teahan, W.J., & Ap Cenydd, L. (eds.) Artificial Life and Intelligent Agents: First International Symposium, ALIA 2014, Bangor, UK, November 5–6, 2014, Revised Selected Papers, vol. 519. Springer, pp. 97–109.

Kaplan, F., & Hafner, V.V. (2006). The Challenges of Joint Attention. Interaction Studies 7(2), pp. 135–169.

Koenig, N., & Howard, A. (2004). Design and use paradigms for Gazebo, an open-source multi-robot simulator. In: Proc. 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), vol. 3, pp. 2149–2154. IEEE.

Krahe, R., & Maler, L. (2014). Neural maps in the electrosensory system of weakly electric fish. Current Opinion in Neurobiology 24, pp. 13–21. https://doi.org/10.1016/j.conb.2013.08.013

Libet, B. (1985). Unconscious cerebral initiative and the role of conscious will in voluntary action. Behavioral and Brain Sciences 8.

Liu, W., & Winfield, A.F. (2011). Open-hardware e-puck Linux extension board for experimental swarm robotics research. Microprocessors and Microsystems 35(1).

Marques, H., & Holland, O. (2009). Architectures for functional imagination. Neurocomputing 72(4–6), 743–759.

Marques, H.G., Jantsch, M., Wittmeier, S., Holland, O., Alessandro, C., Diamond, A., Lungarella, M., & Knight, R. (2010). ECCE1: the first of a series of anthropomimetic musculoskeletal upper torsos. 10th IEEE-RAS International Conference on Humanoid Robots, pp. 391–396. IEEE.

Maturana, H.R., & Varela, F.J. (1987). The Tree of Knowledge: The Biological Roots of Human Understanding. Boston, MA: New Science Library/Shambhala Publications.

Michlmayr, M. (2002). Simulation Theory Versus Theory Theory: Theories concerning the Ability to Read Minds. Diploma thesis, Leopold-Franzens-Universität Innsbruck.

Millard, A., Timmis, J., & Winfield, A.F. (2014). Run-time Detection of Faults in Autonomous Mobile Robots Based on the Comparison of Simulated and Real Robot Behaviour. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2014).

Millard, A.G., Timmis, J., & Winfield, A.F.T. (2013). Towards exogenous fault detection in swarm robotic systems. Towards Autonomous Robotic Systems, pp. 429–430.

Mondada, F., Bonani, M., Raemy, X., Pugh, J., Cianci, C., Klaptocz, A., Magnenat, S., Zufferey, J.C., Floreano, D., & Martinoli, A. (2009). The e-puck, a robot designed for education in engineering. In: Proc. 9th Conference on Autonomous Robot Systems and Competitions, pp. 59–65.

Moore, R.K. (2016). Introducing a pictographic language for envisioning a rich variety of enactive systems with different degrees of complexity. Int. J. Advanced Robotic Systems 13(74).

Moore, R.K. (2012). Lecture: Extending Maturana & Varela's symbols, FECS, Feb. 2012.

Morse, A.F., Greef, J.D., Belpaeme, T., & Cangelosi, A. (2010). Epigenetic robotics architecture (ERA). IEEE Trans. Auton. Ment. Dev. 2, 325–339.

Nagai, Y., Hosoda, K., Morita, A., & Asada, M. (2003). A constructive model for the development of joint attention. Connection Science 15(4), 211–229.

O'Dowd, P., Studley, M., & Winfield, A.F. (2014). The distributed co-evolution of an on-board simulator and controller for swarm robot behaviours. Evolutionary Intelligence 7(2), pp. 95–106.

Oudeyer, P.-Y., Kaplan, F., & Hafner, V.V. (2007). Intrinsic motivation systems for autonomous mental development. IEEE Trans. Evol. Comput. 11, pp. 265–286.

Pico, A., Schillaci, G., Hafner, V.V., & Lara, B. (2016). How Do I Sound Like? Forward Models for Robot Ego-Noise Prediction. Proceedings of the 6th Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics, pp. 246–251, Paris, France.

Rolf, M., & Steil, J.J. (2014). Explorative learning of inverse models: a theoretical perspective. Neurocomputing 131, pp. 2–14.

Rosen, R. (1985). Anticipatory Systems: Philosophical, Mathematical and Methodological Foundations. Pergamon Press, Oxford, UK.

Schillaci, G., & Hafner, V.V. (2011). Random movement strategies in self-exploration for a humanoid robot. 6th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pp. 245–246, Lausanne.

Schillaci, G., Hafner, V.V., & Lara, B. (2012a). Coupled Inverse-Forward Models for Action Execution Leading to Tool-Use in a Humanoid Robot. Proceedings of the 7th ACM/IEEE International Conference on Human-Robot Interaction (HRI 2012), pp. 231–232, Boston, USA.

Schillaci, G., Hafner, V.V., & Lara, B. (2016a). Exploration behaviours, body representations and simulation processes for the development of cognition in artificial agents. Frontiers in Robotics and AI, section Humanoid Robotics, 3:39. doi: 10.3389/frobt.2016.00039

Schillaci, G., Hafner, V.V., & Lara, B. (eds.) (2016c). Re-enacting sensorimotor experience for cognition. Research Topic in Frontiers in Robotics and AI, Section Humanoid Robotics. doi: 10.3389/frobt.2016.00077

Schillaci, G., Lara, B., & Hafner, V.V. (2012b). Internal Simulations for Behaviour Selection and Recognition. In: Salah, A.A., et al. (eds.) Human Behaviour Understanding 2012, Lecture Notes in Computer Science, vol. 7559, pp. 148–160.

Schillaci, G., Ritter, C.-N., Hafner, V.V., & Lara, B. (2016b). Body Representations for Robot Ego-Noise Modelling and Prediction. Towards the Development of a Sense of Agency in Artificial Agents. International Conference on the Simulation and Synthesis of Living Systems (ALife XV), pp. 390–397, Mexico, July 2016.

Silver, D., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489. doi:10.1038/nature16961

Vanderelst, D., & Winfield, A. (2017). Rational imitation for robots: The cost difference model. Adaptive Behavior 25(2), pp. 60–71.

Vaughan, R.T., & Zuluaga, M. (2006). Use your illusion: Sensorimotor self-simulation allows complex agents to plan with incomplete self-knowledge. In: Proc. International Conference on Simulation of Adaptive Behaviour (SAB), pp. 298–309.

Vernon, D., Beetz, M., & Sandini, G. (2015). Prospection in cognition: the case for joint episodic-procedural memory in cognitive robotics. Frontiers in Robotics and AI, section Humanoid Robotics, 2:19.

Winfield, A.F. (2012). Robotics: A very short introduction. Oxford University Press.

Winfield, A.F. (2014). Robots with internal models: A route to self-aware and hence safer robots. In: Pitt, J. (ed.) The Computer After Me: Awareness and Self-Awareness in Autonomic Systems. London: Imperial College Press.

Winfield, A.F., Blum, C., & Liu, W. (2014). Towards an ethical robot: Internal models, consequences and ethical action selection. In: Mistry, M., Leonardis, A., Witkowski, M., & Melhuish, C. (eds.) Advances in Autonomous Robotics Systems. Springer, pp. 85–96.

Winfield, A.F. (2017). When Robots Tell each other Stories: The Emergence of Artificial Fiction. In: Stepney, S., & Walsh, R. (eds.) Narrating Complexity. Springer, in press.

Wolpert, D.M., Ghahramani, Z., & Jordan, M.I. (1995). An internal model for sensorimotor integration. Science 269, 1880–1882.

Wolpert, D.M., Goodbody, S.J., & Husain, M. (1998). Maintaining internal representations: the role of the human superior parietal lobe. Nat. Neurosci. 1, 529–533.

Zagal, J.C., Delpiano, J., & Ruiz-del-Solar, J. (2009). Self-modeling in humanoid soccer robots. Robot. Auton. Syst. 57(8), 819–827.

Keywords

Anticipation, robotics, internal simulations, predictive models, sensorimotor learning, multi-agent systems, developmental robotics, theory of mind, robot safety, robot ethics, expectations, forward models, exploration, tool-use, sensory attenuation, human-robot interaction