Hormonal Modulation of Development and Behaviour Permits a Robot ...

DOI: http://dx.doi.org/10.7551/978-0-262-32621-6-ch031

Hormonal modulation of development and behaviour permits a robot to adapt to novel interactions

John Lones, Matt Lewis and Lola Cañamero
Embodied Emotion, Cognition and (Inter-)Action Lab
School of Computer Science & STRI, University of Hertfordshire
College Lane, Hatfield, Herts, AL10 9AB, U.K.

Abstract

Hormones are known to play a critical role in modulating the behaviour and development of organisms when confronted with different environmental challenges. In this paper we present a biologically plausible hormonal mechanism that allows an autonomous robot to respond appropriately to novel objects and interactions depending upon both its current internal state and its past experiences. In our experiments, robots that had been exposed to negative experiences during their initial developmental phase displayed withdrawn behaviour and were less likely to explore new objects and environments, or to engage with a human caregiver. In contrast, robots with a positive upbringing showed much greater levels of outgoing behaviour such as exploration and social interaction.

Introduction

Hormones have been established as critical behavioural modulators for many types of organisms, ranging from vertebrates (leRoith et al, Montoya et al 2012) right through to simple unicellular organisms (Kabara 1988), highlighting them as a potentially fundamental aspect of life. As in biological organisms, hormones have also been shown to have a potentially critical role to play in autonomous robots. In these artificial models, hormones have been used to modulate different action selection mechanisms in order to achieve appropriate behaviour in a range of scenarios (Avila-García & Cañamero 2004, French & Cañamero 2005, Blanchard & Cañamero 2006, Chelian et al 2012, Krichmar 2013). In our own previous studies (Lones & Cañamero 2013, Lones et al 2013, Lones et al 2014) we integrated hormonal modulation of a homeostatic system through an epigenetic mechanism, in which hormones were secreted in relation to homeostatic deficits. This mechanism enabled a sensory-driven autonomous robot to adapt successfully to a wide range of environmental challenges and gave rise to the emergence of unique behavioural characteristics.

In this paper, we look to improve the adaptation capabilities of an autonomous robot by endowing it with a hormone-signalled epigenetic mechanism that modulates the development of a wide range of survival-related and social behaviours as a function of both its internal state and the environmental conditions, and that permits the robot to interact appropriately with novel objects in its environment. We do this by adding two new hormones to our previous system, functionally akin to two chemical modulators, the steroid hormones corticosteroids and testosterone. In biological organisms both hormones have long drawn particular interest for their role in modulating a wide range of value-laden survival and social behaviours. This modulation arises from the interaction between the hormones and one of their primary targets, the amygdala (Koolhass et al., 1990). While the exact mechanisms are unknown, the effects of the hormones once they have reached the amygdala are better understood. Testosterone is linked to promoting outgoing reward-seeking behaviours such as dominance, aggression, exploration, and curiosity (Daitzman & Zuckerman, 1980; Mazur & Booth, 1998). In contrast, corticosteroids are related to avoidance and withdrawal behaviours (Buss et al., 2003; Montoya et al., 2012). Moreover, these hormones not only modulate emotional processing in the amygdala but are also believed to affect its neural connectivity to other areas of the brain, particularly the orbitofrontal cortex. Exposure to corticosteroids strengthens these neural connections, while testosterone weakens them (Mehta, 2010; van Wiggan et al., 2010). As the orbitofrontal cortex is associated with decision making, strengthening or weakening the emotional input from the amygdala could result in additional behavioural modulation (Bechara, 2000).

Although the two steroid hormones have significant potential to modulate behaviour, studies of their individual roles do not always offer conclusive evidence. This is particularly noticeable in human studies, where results are normally limited to observations of subjects and can even be contradictory. There are at least two likely explanations for this. Firstly, there is significant evidence to suggest that cortisol (CHT) and testosterone (T) work in tandem to modulate behaviour, and that it is the ratio or imbalance between the two chemicals that is important (Montoya et al., 2012). For example, in a situation with a high T/CHT ratio (high T, low CHT) aggression is more prevalent than in a situation

ALIFE 14: Proceedings of the Fourteenth International Conference on the Synthesis and Simulation of Living Systems

with an equal ratio, even when the T level remains constant (Popma et al., 2007). Secondly, the effects of hormones and of their secretion are likely to be subjective. This is particularly relevant for organisms with highly complex cognition and neural mechanisms, such as humans, where aspects such as learning, planning, normative behaviour and beliefs gained through life-long experience affect “consciousness” and can therefore lead to differences in individual emotional processing (Arnold, 1960; LeDoux, 1993; Khalin, 1993). However, it is not only the level of neural mechanisms that can lead to subjective hormonal modulation. More recently, evidence has arisen of phenotypic plasticity in the neuroendocrine systems responsible for the secretion and regulation of T and CHT. Changes in gene expression occurring within these neuroendocrine systems are known to be associated with extreme forms of behaviour (Mcgowan et al 2009). However, it is also highly likely that these changes could have an effect on day-to-day behaviour.

The neuroendocrine systems of T and CHT consist of the hypothalamic–pituitary–gonadal axis (HPG-Axis) and the hypothalamic–pituitary–adrenal axis (HPA-Axis), for T and CHT secretion respectively. While these two axes are often considered separate entities, they are interconnected through feedback loops. Specifically, research has shown that the HPA-Axis suppresses the activity of the HPG-Axis at all levels. In addition, the HPA-Axis contains a negative feedback loop consisting of glucocorticoid receptors which, in response to rising corticoid levels, signal for the suppression of the axis activity (Montoya et al 2012). However, the relationship between cortisol levels and HPA activity is not simple and static. Research has suggested that the glucocorticoid receptors responsible for the feedback are susceptible to epigenetic changes consisting of upwards and downwards regulation.
Downwards regulation, which is a reduction in the total number of receptors, leads to reduced sensitivity to corticoids and thus weakens the negative feedback loop. In contrast, upwards regulation leads to an increased number of receptors and therefore increased sensitivity and a more reactive negative feedback loop (Liu et al 1997, Mcgowan et al 2009, Zhang et al 2013). Downregulation of glucocorticoid receptors has been linked with, and is believed to be triggered at least partially by, continuously high levels of corticoids in the system (Mcgowan et al 2009). Upregulation, on the other hand, has been associated with a positive upbringing and positive experiences during early life, with dopamine considered a potential chemical trigger (Liu et al 1997). As with many other forms of epigenetic change, organisms tend to be more susceptible to this mechanism during critical periods of development (Reik et al 2001).

In this paper, we present a hormone-driven, biologically plausible robotic model of these mechanisms. As in the biological examples discussed, in this artificial model hormones play a critical role in short-term modulation of both the internal environment and the connectivity of different “neural functions” of the robot. This is achieved by secreting the hormones as a function of the robot’s internal state and external stimuli, creating a “chemical soup” that surrounds the robot’s neural functions. Each “neural function” has receptors that can detect the concentration of specific hormones and is susceptible to modulation accordingly. In addition, hormone levels are also used to signal epigenetic changes in specific areas of the model during critical developmental periods. These changes have long-term implications for the behaviour of the robot, arising from specific characteristics of its internal state and of the environmental stimuli to which it was exposed during the critical developmental period.

Robotic model

To test the viability of the mechanism described above, experiments were run using the Koala II robot. We added a webcam to the standard robot configuration, which consists of 14 IR range sensors. Control of the entire model was handled through a serial connection to a computer running Ubuntu; the architecture was programmed in C++, and OpenCV was used for vision and image processing. Vision was used to detect resources based on predetermined characteristics such as shape and colour. As we will discuss in more detail later, we conducted experiments in a noisy environment in the open space of our lab. Because of this noise we found it essential to add two functions between the IR sensors and the action selection mechanism (ASM). The first function is a form of sensory memory, similar to iconic memory, in which information from the IR sensors is briefly stored before decaying:

(1)

where Tsv is the total IR sensory value which is passed to the ASM, sv the current sensor value, i the sensor number, t time, rD the rate of decay, which was set to make sensory information fully decay in just under a second, and Abst the absolute threshold that defines the minimum value that sensors need to become ‘active’. The second function heightens the sensitivity of sensors neighbouring an active sensor by reducing Abst by a heightened value, which is equal to 33%:

(2)
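The two pre-processing functions described above can be sketched roughly as follows. The bodies of equations (1) and (2) are not reproduced in this text, so the sketch is an illustration under stated assumptions rather than the authors' implementation: it assumes a stored value is refreshed whenever the current reading reaches the threshold and otherwise shrinks linearly by rD per step, and that neighbours are the sensors at index i-1 and i+1. The class name `SensoryMemory`, the threshold of 100 and the decay of 50 are hypothetical; only the 14 sensors and the 33% reduction come from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SensoryMemory:
    n_sensors: int = 14            # number of IR sensors on the robot (from the paper)
    abs_threshold: float = 100.0   # Abst, hypothetical raw sensor units
    decay: float = 50.0            # rD, hypothetical amount removed per time step
    heighten: float = 0.33         # 33% threshold reduction for neighbours (from the paper)
    stored: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.stored = [0.0] * self.n_sensors

    def thresholds(self, raw: List[float]) -> List[float]:
        """Per-sensor threshold, lowered when an adjacent sensor is active."""
        active = [v >= self.abs_threshold for v in raw]
        result = []
        for i in range(self.n_sensors):
            left = i > 0 and active[i - 1]
            right = i < self.n_sensors - 1 and active[i + 1]
            factor = 1.0 - self.heighten if (left or right) else 1.0
            result.append(self.abs_threshold * factor)
        return result

    def step(self, raw: List[float]) -> List[float]:
        """Return Tsv for every sensor: a fresh reading or a decaying memory."""
        limits = self.thresholds(raw)
        for i, value in enumerate(raw):
            if value >= limits[i]:
                self.stored[i] = value          # sensor is active: store reading
            else:
                self.stored[i] = max(0.0, self.stored[i] - self.decay)
        return list(self.stored)


memory = SensoryMemory(n_sensors=3)
first = memory.step([120.0, 70.0, 10.0])   # sensor 1 passes only via the lowered threshold
second = memory.step([0.0, 0.0, 0.0])      # nothing sensed: stored values decay
```

In this toy run the reading of 70 on the middle sensor is below the base threshold but is kept because its neighbour is active; on the following empty step both stored values shrink by the decay amount instead of vanishing at once.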

Figure 1: Action selection architecture used in this paper.

Homeostatic variables

The architecture of the robot includes three survival-related variables which are homeostatically controlled: health, energy and temperature. These change as a function of the robot’s actions and interactions with the environment. Whereas ‘health’ is simulated, the other two variables are linked to the actual physics of the robot. When the robot detects contact of significant strength, its ‘health’ variable decreases in proportion to the size of the force. This deficit can be recovered by finding and using a repair resource. Energy is linked to the robot’s battery, which has a maximum capacity of 3.5 Ah (3500 mAh), permitting around 4 hours of use. Because of this long battery life, and to make the environment more challenging, the architecture was programmed to allow the robot to sense a maximum of 75 mAh of charge at any time, equivalent to a running time of around 5 minutes at a moderate activity level. Consumption of an energy resource increases the available charge, up to the maximum of 75 mAh. Finally, temperature is related to the speed of the motors and is sensed directly using the robot’s internal temperature monitors. Each of the survival-related homeostatic variables has a lethal boundary which, if transgressed, results in the agent’s death. In the case of energy and health, the lethal boundary is set at the bottom end of the range of permissible values; in the case of temperature, the lethal boundary is at the upper end of the range.
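As a small worked check of the figures above, the sketch below models a homeostatic variable with a lethal boundary at one end of its range and reproduces the "around 5 minutes" figure from the battery numbers. The ranges are those given in the paper (0 to 75 mAh, 0 to 100%, 26 to 32 degrees). The class `HomeostaticVariable`, the normalised `deficit` measure and the assumption that the ideal value sits at the end opposite the lethal boundary are illustrative choices, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class HomeostaticVariable:
    name: str
    low: float
    high: float
    lethal_at_low: bool   # True for energy and health, False for temperature
    value: float

    def deficit(self) -> float:
        """Normalised deficit: 0 at the assumed ideal end, 1 at the lethal boundary."""
        span = self.high - self.low
        if self.lethal_at_low:
            raw = (self.high - self.value) / span
        else:
            raw = (self.value - self.low) / span
        return min(1.0, max(0.0, raw))

    def is_lethal(self) -> bool:
        """True once the lethal boundary has been reached or crossed."""
        if self.lethal_at_low:
            return self.value <= self.low
        return self.value >= self.high


def sensed_runtime_minutes(capacity_mah: float = 3500.0,
                           full_runtime_hours: float = 4.0,
                           sensed_cap_mah: float = 75.0) -> float:
    """Running time available from the sensed charge, assuming constant drain."""
    return full_runtime_hours * 60.0 * sensed_cap_mah / capacity_mah


energy = HomeostaticVariable("energy", 0.0, 75.0, True, 75.0)
temperature = HomeostaticVariable("temperature", 26.0, 32.0, False, 29.0)
minutes = sensed_runtime_minutes()   # 240 min * 75 / 3500, a little over 5 minutes
```

Under a constant-drain assumption, 75 mAh out of 3500 mAh corresponds to a little over five of the 240 available minutes, which matches the running time quoted in the text.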

Table 1: The homeostatic resources

Homeostatic variable   Derived from       Range           Lethal boundary   Deficit occurrence   Recovery
Energy                 Battery level      0-75 mAh        0 mAh             Semi-dynamic         Energy resource
Health                 Simulated          0-100%          0%                Dynamic              Health resource
Temperature            Internal sensors   26-32 degrees   32 degrees        Dynamic              Reducing motor speed

Action selection mechanism (ASM)

The robot’s action selection architecture consists of three main internal systems: object and stimuli detection, valence assignment and free flow movement. Behaviours emerge from the activation, interaction and modulation of the different internal systems, rather than being pre-determined.

Object detection is achieved by a combination of image and IR data. Data from the camera is processed (using the OpenCV library) to detect resources scattered around the environment. The IR sensors feed into a learning, growing artificial neural network in order to detect and classify novel objects. The network primarily uses information about the size of objects, determined by the number of active IR sensors. Since this type of neural network needs a very long training phase to learn from scratch, for the experiments presented in this paper we trained the network off-line; the same network was used for all runs.

Table 2: The types of objects the IR sensors can detect

Number of sensors active                   Type of object
0                                          Empty space
1                                          Small object
2                                          Medium sized object
3                                          Large object
4+                                         Wall
0, but both neighbouring neurons active    A hole or gap
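The mapping in Table 2 can be illustrated with a simple rule-based stand-in. In the paper this classification is learned by a neural network trained off-line, so the function below is only a sketch of the resulting mapping, not of the network. It assumes that "number of sensors active" means the length of a run of adjacent active sensors; the function name `classify_runs` and the tuple layout are hypothetical.

```python
from typing import List, Tuple

# Labels for runs of one to three adjacent active sensors (Table 2)
SIZE_LABELS = {1: "Small object", 2: "Medium sized object", 3: "Large object"}


def classify_runs(active: List[bool]) -> List[Tuple[int, int, str]]:
    """Return (start index, length, label) for each object or gap found."""
    found = []
    n = len(active)
    i = 0
    while i < n:
        if active[i]:
            start = i
            while i < n and active[i]:
                i += 1
            length = i - start
            # Four or more adjacent active sensors are treated as a wall
            found.append((start, length, SIZE_LABELS.get(length, "Wall")))
        else:
            # An inactive sensor flanked by two active ones is a hole or gap
            if 0 < i < n - 1 and active[i - 1] and active[i + 1]:
                found.append((i, 1, "A hole or gap"))
            i += 1
    return found


readings = [True, False, True, True, True, True, False, False]
objects = classify_runs(readings)
```

An all-inactive reading returns an empty list, which corresponds to the "Empty space" row of the table.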

As well as being able to classify different objects into size groups, a second function of this system is to predict the behaviour of a detected object in terms of how close or far it will be at the next point in time, or which sensors might detect it next if it is being passed. Each IR sensor can store up to three predicted locations at any one time: those generated by itself and by its two neighbouring sensors. Since, as we will see later, the environment is primarily static, the robot will always assume any change in distance to be the result of its own movement. For example, while moving forward, an object that is detected in front of the robot will be expected to be closer, while an object detected on the left will be expected to be detected later by an IR sensor on the same side but further away in the direction of motion, and therefore the predicted location of this object is shared with that sensor.

Figure 2: Object prediction system.

The distance to the predicted location is determined by a range of values between two boundaries, which are calculated as follows:

(3)

(4)

where the terms are the speed at which the robot is moving towards the object, the expected object speed (always 0 in this case, as objects are expected to be stationary), and the current concentrations of the stress and curiosity hormones. The width of the predicted area (if the prediction is passed on to a neighbouring sensor) is determined by

(5)

where lWSpeed and rWSpeed are the speeds of the left and right wheels, respectively, and the remaining term is the distance between neighbouring sensors, which is unique to each pair. If no object is detected by a sensor or its neighbours, then the robot will only expect to detect distant objects. If an object is detected outside the predicted area, e.g., if an object is placed directly in front of the robot, it will be perceived as unexpected and treated as a stressor.

Valence assignment occurs after an object has been detected, and its value depends upon both the type of object and the current internal state of the robot. Valence is an emotion dimension associated with objects, interactions or events (Russell, 1980; Posner et al, 2005) that provides values along a pleasure-displeasure continuum. The value of an object is calculated as a function of the potential threat associated with the object and the internal state of the robot. The base valence for a novel object is determined as a function of its size: larger objects are perceived more negatively, as they could potentially be more dangerous. The internal factors contributing to valence are the current concentrations of the stress and curiosity hormones. The effect that these hormones have on the perception of different objects scales differently for each object, as shown in Table 3.

(6)

Table 3: Valence detection

Type of object         Initial valence   Curiosity scale   Stress scale
Empty space            1                 1                 -1
Small object           200               2                 -4
Medium sized object    -200              8                 -8
Large object           -400              14                -12
Wall                   0                 -2                10
A hole or gap          -800              20                -16

Hormone model

The epigenetic artificial hormones in this model consist of two main groups, which we have named “endocrine hormones” (eH) and “neuro-hormones” (nH). While both groups share common characteristics, they also present some significant differences. Firstly, endocrine hormones (eH) are secreted by their respective glands in relation to deficits in the homeostatic variables. Each variable has an associated hormone (E1 for energy, H1 for health and T1 for temperature) released by a corresponding gland. The secretion of each of these hormones occurs when the ASM stimulates its associated gland as a function of the relevant homeostatic deficit:

(7)

where the two terms are the strength of stimulation from the ASM and the gland’s activity level. The gland’s activity level is determined by a simple, biologically plausible mechanism akin to epigenetic adaptation that we previously implemented in (Lones & Cañamero, 2013). This mechanism consists of a positive feedback loop in which a high concentration of a secreted hormone leads to increased gland activity:

(8)

where the remaining term is a constant used to control the speed of the epigenetic change.

The secretion of the second group of hormones is triggered by a combination of the robot’s external perception of the environment and its current internal state. This group includes two hormones called “stress” and “curiosity”. As the name suggests, the stress hormone is inspired by the roles that cortisol and serotonin play in the regulation of stress responses, and hence it regulates the stress response in the robot.
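As a rough illustration of the endocrine hormones (E1, H1, T1) and their glands, the sketch below assumes that secretion is the product of the ASM stimulation and the gland's activity level, that stimulation equals the normalised homeostatic deficit, and that gland activity grows in proportion to the hormone concentration during the critical period. The bodies of equations (7) and (8) are not reproduced in this text, so these functional forms, the class name `Gland`, the constant `k` and the starting values are all assumptions. The value 0.95 is the decay rate the paper states for all hormones; applying it as a per-step multiplier is also an assumption.

```python
from dataclasses import dataclass

DECAY = 0.95  # decay rate given in the paper; per-step multiplication is assumed


@dataclass
class Gland:
    activity: float = 1.0        # gland activity level (hypothetical start value)
    concentration: float = 0.0   # hormone currently in the "chemical soup"
    k: float = 0.001             # hypothetical speed of the epigenetic change

    def step(self, deficit: float, critical_period: bool) -> float:
        """Advance one time step and return the new hormone concentration."""
        stimulation = deficit                     # assumed: stimulation = deficit
        secretion = stimulation * self.activity   # assumed product form
        self.concentration = self.concentration * DECAY + secretion
        if critical_period:
            # Positive feedback: high concentration raises gland activity
            self.activity += self.k * self.concentration
        return self.concentration


def develop(deficit: float, steps: int) -> Gland:
    """Expose a fresh gland to a constant deficit during its critical period."""
    gland = Gland()
    for _ in range(steps):
        gland.step(deficit, critical_period=True)
    return gland


deprived = develop(deficit=0.8, steps=100)      # grew up with a large deficit
comfortable = develop(deficit=0.1, steps=100)   # grew up with a small deficit
```

Under these assumptions the gland that developed under a large deficit ends the critical period with a higher activity level, so the same later deficit triggers a larger secretion. This is the kind of hypersensitivity to deficits that the paper attributes to robots raised in harsh conditions. The feedback in this sketch is unbounded; a real controller would presumably limit it.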

The secretion of the stress hormone is given by:

(9)

where oS is overstimulation (a simple sum of all objects detected), F is fear (the total value of negative stimuli detected), fL is the feedback loop, x is a constant, and roD is the robot’s self-perceived “risk of death”, which is derived from the current concentrations of the three endocrine hormones (risk of death as a measure of viability was previously developed by Avila-García & Cañamero, 2004). The risk of death is determined by both the current level and the duration of any homeostatic deficits. In addition, since hormone concentration is also affected by the epigenetic mechanism described above, the perceived risk of death also depends on the development of the robot. For example, a robot that is constantly low on energy will likely develop hypersensitivity to energy deficits and will end up perceiving a high risk of death when energy deficits occur, whereas a robot that “grows up” in an environment that permits it to always maintain high energy levels will develop a natural tolerance to deficits. The stress hormone is secreted under three types of circumstances: when there is a high risk of death (linked to homeostatic deficits), when there is overstimulation from exposure to different novel objects, and in the presence of perceived threats.

Our curiosity hormone takes inspiration from the roles that hormones such as testosterone and dopamine play in regulating (in this case increasing) behaviours related to dominance, aggression and curiosity:

(10)

As can be seen, secretion occurs with a low risk of death, with the perception of interesting objects (pS), and as homeostatic deficits are recovered (R). In addition, the concentration of the stress hormone limits the secretion of the curiosity hormone, in a similar manner to the biological interaction between cortisol and testosterone through the HPA axis. Much like the HPA axis found in biological systems, our artificial HPA negative feedback loop includes an epigenetic mechanism. In biological systems, this mechanism consists of glucocorticoid receptors present at both the hypothalamus and the anterior lobe of the pituitary gland. These receptors detect the current concentration of corticosteroids and trigger negative feedback if levels are too high (Liu et al 1997, Mcgowan et al 2009, Zhang et al 2013). An important feature of these receptors is that they are susceptible to epigenetic changes in gene expression. Exposure to constantly high levels of corticosteroids, for instance, leads to downregulation in the feedback loop (Mcgowan et al 2009). However, a positive environmental upbringing has been shown to lead to upregulation. While the exact mechanism of upregulation is unknown, exposure to hormones and neurotransmitters associated with happiness and outgoing behaviour, such as dopamine, is a potential and realistic mechanism with some supporting evidence (Liu et al 1997). In this model, upregulation and downregulation of the glucocorticoid receptors, and therefore the regulation of the feedback loop, are respectively associated with exposure to the curiosity and stress hormones:

(11)

Once part of the chemical soup, all hormones decay at the same rate, which in this model was set to 0.95:

(12)

Experiments

For this paper we tested this architecture under two main experimental conditions: a relatively static environment where the only changes occurred as a result of the interactions of the robot, and a much more dynamic environment that included human-robot interaction. The environments used for our tests were implemented in the open space of our lab, an area of about 45 m² in which desks and chairs are located near the walls, surrounding an empty central area. The robot could roam freely around the lab, which was modified only by removing the swivel chairs and by placing a limited number of plywood boards to protect sensitive or delicate areas. Resources, obstacles and other environmental stimuli were then added, as can be seen in Figure 3. In all cases, an identical architecture and robot were used for each experiment.

Figure 3: The positive and negative environments. Figure 3 shows two different angles of the environment, with the positive environment on the left and the negative environment on the right; the differences are discussed in more detail below.

In both experimental conditions (“static” and “dynamic” environments), the tests involved two phases. In the first phase, identical for both experimental conditions, the robot spent its critical developmental period (in our case the first 5 minutes of its “life”) in one of two developmental environments: either a “positive”, easy environment or a “negative”, stressful environment, with the runs of each experimental condition split equally between the two. As can be seen in Figure 3, the positive and negative developmental environments share a similar design, with a few subtle but important differences, as follows.

A) The first difference is the homogeneity of the materials used, with more variation occurring in the negative environment. As IR sensors naturally respond differently to different materials, increasing variation leads to more fluctuations in sensor readings. Owing to the neural mechanism described previously, these fluctuations are likely to lead to a stress response.

B) The second difference is the increased use, in the negative environment, of objects whose textures and/or colours are particularly difficult to detect. Not only does this lead to fluctuations in sensor readings but, since the robot is unable to detect distance accurately, it also increases the likelihood of collisions.

C) The third difference is the spacing between objects. In the negative environment, the distance between different objects is small, increasing the potential for over-stimulation. In addition, smaller spaces in structures like the maze depicted in the figure hamper navigation significantly, potentially increasing stress.

D) The fourth and final difference is the “reward” obtained for exploring the environment, which is greater in the positive environment. For example, in the positive environment the reward for completing the maze is an easily accessible homeostatic resource. In contrast, while the resource is still present in the negative environment, the likelihood of finding and accessing it, and therefore of getting a reward for completing the maze, is smaller.

The second phase differed for each condition. A set of 10 single-robot runs, each lasting 15 minutes, took place in the relatively static environment, and a set of 10 runs of 5 minutes each took place in the dynamic environment. In this second condition, human-robot interaction took place in an empty environment after the robots had developed. While challenges and potential stimuli to explore were still in abundance at the edges of the environment, the centre was largely barren in order to increase the chances that the robot would focus on human interaction. Runs were reduced to 5 minutes since the experimenter could finely control exposure to stimuli, making unnecessary the longer runs used in the static environment condition to ensure that the robot could fully explore the environment.

While the ASM was never specifically designed or programmed for human-robot interaction, we found that it naturally lent itself to simple types of interaction. For example, the robot generally found stroking positive, as it led to a release of the curiosity hormone, whereas sudden movement usually evoked a fearful or negative response. The robot’s response to an action was also highly influenced by the way it was carried out. For example, as in (Cañamero & Fredslund, 2000), the speed, force and duration of the stroking motion had a significant impact on the robot’s response. The response was also influenced by a combination of the robot’s internal state and developmental history. A robot that had had extensive human interaction was more likely to enjoy vigorous stroking, whereas one that had suffered extreme deprivation might find displeasure in even minimal contact.

Experimental results and discussion

To analyse the results of our experiments in terms of the performance of the robot, we had initially planned to use the viability-based indicators of performance developed in previous work (Avila-García & Cañamero, 2004), such as wellbeing, overall comfort, or risk of death. However, due to the epigenetic developmental mechanism in our present architecture, the robots that developed under different environmental conditions developed different tolerances to stimuli and homeostatic deficits. These differences in tolerance led to skewed results using the above-mentioned performance indicators, as we will discuss below.

Experiments in the static environment. As can be seen in Figure 4, the interaction between the environment and the epigenetic hormonal mechanisms led the robots to behave significantly differently once developed and placed within the “neutral” testing environment.

[Figure 4 panels: internal state of a robot developed in the positive environment; internal state of a robot developed in the negative environment. Rows in each panel: Events, Energy, Health, Temp, E1, H1, T1, Curiosity, Stress.]

Figure 4: The internal state of two robots. Figure 4 shows the internal state of a robot from the positive and negative environments during the 15 minutes in the static environment. Darker colours indicate increased stimuli, higher homeostatic deficits or increased hormone concentrations, respectively.

In all cases, robots that had developed in the negative environment showed very “withdrawn” behaviour: the robot spent a significant portion of its time executing a behaviour similar to wall following. If the robot found a corner or an enclosed area, it would remain stationary in this location until other internal needs (e.g., the need to replenish energy) took priority. The robot most likely stopped in these enclosed areas because it perceived them as the safest locations: in a stressed state, walls detected on multiple sides have positive valence, so the robot effectively treats such areas as nests. As can be seen in Figure 4, interaction with other areas of the environment was minimal due to the constantly high level of the stress hormone, which effectively suppressed the HPG-Axis, preventing the emergence of a ratio between curiosity and stress that would modulate the ASM into investigating novel objects. On the few rare occasions when the robot did have a high enough level of the curiosity hormone to initiate interaction with novel objects, it quickly became over-stimulated and reverted to the previous withdrawn behaviour. Stress responses to interactions with novel objects not only tended to be more prevalent in these robots but were also significantly more severe and lasted on average 60 percent longer. Stress responses and hypersensitivity with regard to homeostatic deficits were also heightened in the robots that developed in a negative environment. Essentially, this meant that the robot would look to maintain its homeostatic variables at a higher level and, if they started to drop, would quickly enter its withdrawn behaviour. The implication of this is that, once the robot found an area of the environment with access to both resources, it would tend to stay in that general region and never really explore for new opportunities.

In contrast, the robots that had developed in the positive environment showed much more outgoing behaviour, thoroughly exploring the entirety of the environment and interacting with a large range of the different novel objects. While this outgoing behaviour did lead to increased risks such as collisions or over-stimulation, which caused the two high-stress moments that can be seen in Figure 4, the robot was able to recover fairly quickly. In addition, robots developed in a positive environment tended to have a greater tolerance to homeostatic deficits, which made them spend more time exploring and interacting with the environment. When comparing the two robots developed under different environmental conditions, one additional significant difference can be observed in Figure 4: the manner in which hormones were secreted, which tended towards quick, large releases in the robot from the negative environment compared to smaller, sustained releases in the robot with a positive background. The quick bursts led to more unpredictable but responsive behaviour with regard to environmental stimuli. This could perhaps be compared to a highly responsive “fight-or-flight” system.
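The observation that walls take on positive valence for a stressed robot can be checked against the values in Table 3. The sketch below assumes that valence is the initial valence plus each hormone concentration multiplied by its scale. This linear form is an assumption, since the body of the paper's valence equation (6) is not reproduced in this text, and the hormone concentrations used here are made-up illustrative values rather than measurements.

```python
# (initial valence, curiosity scale, stress scale) per object type, from Table 3
TABLE_3 = {
    "Empty space": (1, 1, -1),
    "Small object": (200, 2, -4),
    "Medium sized object": (-200, 8, -8),
    "Large object": (-400, 14, -12),
    "Wall": (0, -2, 10),
    "A hole or gap": (-800, 20, -16),
}


def valence(obj: str, curiosity: float, stress: float) -> float:
    """Assumed linear combination of base valence and hormone effects."""
    initial, curiosity_scale, stress_scale = TABLE_3[obj]
    return initial + curiosity_scale * curiosity + stress_scale * stress


# Hypothetical hormone concentrations for two internal states
stressed = {"curiosity": 5.0, "stress": 60.0}
curious = {"curiosity": 60.0, "stress": 5.0}

wall_when_stressed = valence("Wall", **stressed)           # 0 - 10 + 600
wall_when_curious = valence("Wall", **curious)             # 0 - 120 + 50
large_when_stressed = valence("Large object", **stressed)  # -400 + 70 - 720
large_when_curious = valence("Large object", **curious)    # -400 + 840 - 60
```

With these illustrative concentrations, a wall is attractive only to the stressed robot, while a large object is attractive only to the curious one. This matches the qualitative pattern reported for the robots from the negative and positive environments.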

Experiments in the dynamic environment. In our second environment, dynamism was introduced by the presence of a human who interacted with the robot. Once again, the robots in the different runs had developed either in a positive or a negative environment. Due to the limited range of the robot’s sensors, the range of “recognisable” interactions was relatively small.

As we expected, the robots that had developed in a negative environment had a “timid demeanour” and tried to avoid any form of human interaction. However, gentle stroking motions along the IR sensors could be used to initiate interaction by causing a rise in the concentration of the curiosity hormone. Interaction was primarily limited to this slow stroking as well as the robot exploring the human at its own pace. Any sudden movements or overzealous stroking would quickly lead to overstimulation of the robot and an attempt to withdraw. However, even with an ideal level of interaction, the hypersensitivity to homeostatic deficits meant the robot would spend at most around 30 seconds interacting before becoming more interested in procuring resources. It should be noted that robots that had developed such an aversive phenotype would try to withdraw immediately regardless of the interaction that the human tried to have. In addition, continuing to engage with any of these robots after they had attempted to withdraw led to an “aggressive” behaviour in which the robot would attempt to push past or, in a few cases, even turn to face the human and drive into them. We are still investigating the reason behind the emergence of this behaviour; however, we know that, for it to be triggered, the robot needs to have both an extremely high concentration of the stress hormone and a medium to high concentration of the curiosity hormone. Due to the inhibiting relation of the HPA-HPG axis, this concentration mix is a relatively rare phenomenon, and so far has only been achieved in the robot from the negative environment. These incidents represented the only time any of the robots made physical contact, excluding accidental collisions.

As we also expected, the robot that developed in a positive environment was much more tolerant of interaction with the human. Slow- to medium-speed stroking led to an initial positive response; after a period of interaction, faster stroking and sudden movements were tolerated and even sparked interest from the robot. For example, if an object such as a ball was thrown, the robot would go after it to investigate. Once the object stopped and the robot had explored it as a normal novel object in the environment, its interest in the object would drop, often leading the robot to return to the human in search of increased stimulation. (It is worth noting that the robot did not know who or what had thrown the object and returned to the human purely because the human was a large moving object and therefore had a high positive valence.) As a comparison, an object thrown at a robot from the negative environment almost always led to the robot’s withdrawal. An interesting aspect that emerged was the ability of the human to calm down the robot (i.e., to reduce its level of curiosity) by using low stimulation in the interaction.

Conclusion

In this paper we have presented and demonstrated a biologically plausible epigenetic mechanism that modulates the development of a wide range of value-laden survival and social behaviours, taking inspiration from a key component in the formation of motivations and behaviours: the HPA-HPG axis. As this epigenetic mechanism is dependent upon the internal state of the robot and its exposure to different environmental stimuli, the final phenotype of each robot reflects the conditions in which it developed. Our results show that a robot that developed in a negative environment spent a majority of its time thereafter trying to avoid any interaction with anything new or novel, preferring instead to simply stay in a safe location and maximise its homeostatic levels, providing a buffer to help protect itself from its perceived environmental dangers. In contrast, a robot developed in a positive environment spent a large portion of its time interacting with and exploring its surroundings. This included interaction with a human, for which the robot had not been programmed and which emerged as a consequence of the developmental history of the robot. Based upon our results in this paper, we argue that this model can provide a useful adaptive mechanism for autonomous robots to develop behaviour that allows them to interact appropriately with different elements of a novel (physical and social) environment. In the future, we will look to expand this model by running experiments over longer periods of time and utilising the learning algorithm mentioned in the ASM section. This will allow us to investigate the potential roles that hormones may play in modulating learning experiences.

ALIFE 14: Proceedings of the Fourteenth International Conference on the Synthesis and Simulation of Living Systems

References

Arnold, M. B. (1960). Emotion and personality. New York, NY, US: Columbia University Press.
Avila-García, O. and Cañamero, L. (2004). Using hormonal feedback to modulate action selection in a competitive scenario. In: From Animals to Animats: Proceedings of the 8th International Conference of Adaptive Behavior (SAB’04), pp. 243–252.
Bechara, A., Damasio, H., Damasio, A. R. (2000). Emotion, decision making and the orbitofrontal cortex. Cerebral Cortex, 10:295–307.
Blanchard, A. and Cañamero, L. (2006). Developing Affect-Modulated Behaviors: Stability, Exploration, Exploitation, or Imitation? In F. Kaplan et al. (eds.), Proc. 6th International Workshop on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems (EpiRob 6), 128: 17–24.
Buss, K. A., Schumacher, J. R. M., Dolski, I., Kalin, N. H., Goldsmith, H. H., Davidson, R. J. (2003). Right frontal brain activity, cortisol, and withdrawal behavior in 6-month-old infants. Behavioral Neuroscience, 117, pp. 11–20.
Cañamero, L. D., Fredslund, J. (2000). How Does It Feel? Emotional Interaction with a Humanoid LEGO Robot. In K. Dautenhahn, ed., Socially Intelligent Agents: The Human in the Loop. AAAI 2000 Fall Symposium, pp. 23–28.
Chelian, S. E., Oros, N., Zaldivar, A., Krichmar, J., & Bhattacharyya, R. (2012). Model of the interactions between neuromodulators and pre-frontal cortex during a resource allocation task. In ICDL-EpiRob.
Daitzman, R. J., & Zuckerman, M. (1980). Disinhibitory sensation seeking and gonadal hormones. Personality and Individual Differences, 1, 103–110.
Du Ruisseau, P., Taché, Y., Brazeau, P., and Collu, R. (1979). Effects of chronic immobilization stress on pituitary hormone secretion, on hypothalamic factor levels, and on pituitary responsiveness of LHRH and TRH in female rats. Neuroendocrinology, 29:90.
French, R. and Cañamero, L. (2005). Introducing Neuromodulation to a Braitenberg Vehicle. In Proc. IEEE Intl. Conference on Robotics and Automation, "Robots get Closer to Humans" (ICRA 2005), pp. 4199–4204.
Kalin, N. H. (1993). The neurobiology of fear. Scientific American, 268, 94–101.
Koolhaas, J. M., van den Brink, T. H. C., Roozendaal, B., & Boorsma, F. (1990). Medial amygdala and aggressive behavior: Interaction between testosterone and vasopressin. Aggressive Behav., 16:223–229.
Krichmar, J. L. (2012). A biologically inspired action selection algorithm based on principles of neuromodulation. In The 2012 International Joint Conference on Neural Networks, pp. 1–8.
LeDoux, J. E. (1993). Emotional memory systems in the brain. Behav. Brain Res., 58, pp. 69–79.
LeRoith, D., Shiloach, J., Roth, J., and Lesniak, M. A. (1980). The evolutionary origins of vertebrate hormones: Insulin in unicellular organisms. Proc. Natl. Acad. Sci. USA, 77:6184–6188.
Liu, D., Tannenbaum, B., Caldji, C., Francis, D., Freedman, A., et al. (1997). Maternal care, hippocampal glucocorticoid receptor gene expression and hypothalamic-pituitary-adrenal responses to stress. Science, 277:1659–62.
Lones, J. & Cañamero, L. (2013). Epigenetic adaptation through hormone modulation in autonomous robots. In IEEE ICDL-EPIROB 2013: The Third Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics.
Lones, J., Cañamero, L. & Lewis, M. (2013). Epigenetic adaptation in action selection environments with temporal dynamics. In Advances in Artificial Life, ECAL 2013: 12th Conf on the Synthesis and Simulation of Living Systems. MIT Press, pp. 505–512.
Lones, J., Cañamero, L. & Lewis, M. (2014). Hormonal modulation of interaction between autonomous agents. In press for The Fourth Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics.
Mazur, A., and Booth, A. (1998). Testosterone and Dominance in Men. Behavioral and Brain Sciences, 21:353–63.
McGowan, P. O., Sasaki, A., Dymov, S., Labonté, B., Turecki, G., Szyf, M., et al. (2009). Epigenetic regulation of the glucocorticoid receptor in human brain associates with childhood abuse. Nature Neuroscience, 12, 342–348.
Mehta, P. H., Beer, J. (2010). Neural mechanisms of the testosterone–aggression relation: The role of orbitofrontal cortex. Journal of Cognitive Neuroscience, (10):2357–2368.
Montoya, E., Terburg, D., Bos, P. A., and Honk, J. (2012). Testosterone, cortisol, and serotonin as key regulators of social aggression: A review and theoretical perspective. Motiv Emot, March 2012; 36(1): pp. 65–73.
Popma, A., Vermeiren, R., Geluk, C. A. M. L., Rinne, T., van den Brink, W., Knol, D. L. (2007). Cortisol moderates the relationship between testosterone and aggression in delinquent male adolescents. Biol Psychiatry, 61(3), pp. 405–411.
Reik, W., Dean, W., Walter, J. (2001). Epigenetic Reprogramming in Mammalian Development. Science, 10 August 2001: 293 (5532), 1089–1093.
van Wingen, G., Mattern, C., Verkes, R. J., Buitelaar, J., Fernandez, G. (2010). Testosterone reduces amygdala-orbitofrontal cortex coupling. Psychoneuroendocrinology, 35: 105–113.
Zhang, T. Y., Labonté, B., Wen, X. L., Turecki, G., Meaney, M. J. (2013). Epigenetic Mechanisms for the Early Environmental Regulation of Hippocampal Glucocorticoid Receptor Gene Expression in Rodents and Humans. Neuropsychopharmacology, 38(1), pp. 111–123.
