Emotions as a Vehicle for Rationality: Rational Decision Making Models Based on Emotion-Related Valuing and Hebbian Learning

Jan Treur¹, Muhammad Umair¹,²
¹ VU University Amsterdam, Agent Systems Research Group, De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands
² COMSATS Institute of Information Technology, Lahore, Pakistan
Email: ¹{j.treur, m.umair}@vu.nl, [email protected]
URL: http://www.cs.vu.nl/~treur
Abstract. In this paper an adaptive decision model based on predictive loops through feeling states is analysed from the perspective of rationality. Hebbian learning is considered for different types of connections in the decision model. To assess the extent of rationality, a measure is introduced reflecting the environment's behaviour. Simulation results and the extents of rationality of the different models over time are presented and analysed.

Keywords: decision making, cognitive agent model, emotion, Hebbian learning
1 Introduction
Decision making usually involves considering different options and comparing them in order to make a reasonable choice out of them. Each option has an associated emotional response, related to a prediction of a rewarding or aversive consequence of a choice for that option. The extent to which such an emotional response associated to an option is felt as positive is often seen as a form of valuing of the option. In decisions such valuing plays an important role, and can be seen as a grounding for the decision. Decisions that are not solidly grounded by having a positive feeling about them often do not last long, as any opportunity to get rid of them will be considered to cancel the decision. In recent neurological literature this idea of emotional valuing (and grounding) of decisions has been related to a notion of value as represented in the amygdala; e.g., (Bechara, Damasio, and Damasio, 2003; Bechara, Damasio, Damasio, and Lee, 1999; Montague and Berns, 2002; Morrison and Salzman, 2010; Ousdal, Specht, Server, Andreassen, Dolan, Jensen, 2014; Pessoa, 2010; Rangel, Camerer, Montague, 2008; Rudebeck and Murray, 2014).
A question that may be put forward is whether such decisions based on emotional valuing are not in conflict with rational behaviour. Indeed, there is a long tradition of considering emotions and rationality enemies of each other. It is this theme that is the focus of this paper. It will be discussed and computationally evaluated how emotions are not an enemy but a vehicle for rationality. It will be investigated in what sense emotional valuing as a basis for decision making satisfies some rationality criterion.
Making a decision is not just an instantaneous process in the present. Instead, decision making has an embedding in the temporal dimension; earlier situations are relevant as well, and it also has implications for future situations. More specifically, experiences with (outcomes of) decisions made within the given environment in the past play an important role. Learning processes adapt the decision making mechanism to such experiences. In this way the decisions become more reasonable, or in some way rational, given the increasing knowledge about the environment built up by these past experiences. The question to which extent such learning based on specific biologically plausible learning models leads to decision making satisfying some rationality criterion will be addressed in this paper.
The computational model for decision making considered here first generates, for a given (observed) situation, preparations for a number of options relevant for that situation. Next, based on predictive as-if body loops, associated feeling states are generated, in order to obtain emotional valuations of the options; e.g., (Damasio, 1994; Damasio, 2004; Damasio, 2010; Janak and Tye, 2015; Ousdal, Specht, Server, Andreassen, Dolan, Jensen, 2014; Pearson, Watson, and Platt, 2014; Pessoa, 2010; Rangel, Camerer, Montague, 2008). The extent to which such a feeling state is positive strengthens the preparation for the related option. Through this process a strongest option emerges, which can become the outcome of the decision. For this selection process mutual inhibition relations may be added. The type of biologically inspired learning considered is Hebbian learning (cf. Hebb, 1949; Gerstner and Kistler, 2002), in four different variations obtained by applying it to different types of connections in the decision model.
In order to assess whether a specific decision making model can be considered rational, first a notion of rationality is needed. Such a notion strongly depends on characteristics of the environment. What is rational in one environment may be totally irrational in another environment, and vice versa. For example, throwing a ping pong ball (for table tennis) to hit something can be quite rational in an indoor environment without wind, but totally irrational in a stormy outdoor environment, as the effects of such an action will be totally different. Therefore a rationality measure needs to reflect the environment's characteristics in the sense of its behaviour when actions are performed. Two examples of such a rationality measure will be defined and applied to assess the computational decision model. The point of departure for the underlying notion of rationality here is that the more the agent makes the most beneficial choices for the given environment, the more rational it is.
In this paper, in Section 2 the decision model and the different variants of adaptivity considered are introduced. Sections 3 to 5 present a number of simulation results for a deterministic world, a stochastic world and a changing world, respectively. In Section 6 measures for rationality are discussed, and the different models are evaluated. Finally, Section 7 is a discussion.
2 The Adaptive Decision Model Addressed
Traditionally an important function attributed to the amygdala concerns the context of fear. However, in recent years much evidence on the amygdala in humans has been collected showing a function beyond this fear context. In humans many parts of the prefrontal cortex (PFC) and other brain areas such as hippocampus, basal ganglia, and hypothalamus have extensive, often bidirectional connections with the amygdala; e.g., (Ghashghaei, Hilgetag, Barbas, 2007; Janak and Tye, 2015; Likhtik and Paz, 2015; Morrison and Salzman, 2010; Salzman and Fusi, 2010). A role of amygdala activation has been found in various tasks involving emotional aspects; e.g., (Lindquist and Barrett, 2012; Murray, 2007; Pessoa, 2010). Usually emotional responses are triggered by stimuli for which a prediction is possible of a rewarding or aversive consequence. Feeling these emotions represents a way of experiencing the value of such a prediction: to which extent it is positive or negative for the person. This idea of value also plays a central role in work on the neural basis of economic choice in neuroeconomics. In particular, in decision-making tasks where different options are compared, choices have been related to a notion of value as represented in the amygdala; e.g., (Bechara, Damasio, and Damasio, 2003; Bechara, Damasio, Damasio, and Lee, 1999; Montague and Berns, 2002; Morrison and Salzman, 2010; Ousdal, Specht, Server, Andreassen, Dolan, Jensen, 2014; Pessoa, 2010; Rangel, Camerer, Montague, 2008; Sugrue, Corrado, and Newsome, 2005).
Damasio (1999, 2010) distinguishes an emotion (or emotional response) from a feeling (or felt emotion); see for example: 'Seen from a neural perspective, the emotion-feeling cycle begins in the brain, with the perception and appraisal of a stimulus potentially capable of causing an emotion and the subsequent triggering of an emotion. The process then spreads elsewhere in the brain and in the body proper, building up the emotional state. In closing, the process returns to the brain for the feeling part of the cycle, although the return involves brain regions different from those in which it all started.' (Damasio, 2010, p. 111)
Any mental state in a person induces emotions felt by this person, as described by Damasio (1999; 2004; 2010); e.g.: ‘… few if any exceptions of any object or event, actually present or recalled from memory, are ever neutral in emotional terms. Through either innate design or by learning, we react to most, perhaps all, objects with emotions, however weak, and subsequent feelings, however feeble.’ (Damasio, 2004, p. 93)
More specifically, in this paper it is assumed that responses in relation to a sensory representation state roughly proceed according to the following causal chain for a body loop, based on elements from Bosse, Jonker, and Treur (2008) and Damasio (1999; 2004):

sensory representation → preparation for bodily response → body state modification → sensing body state → sensory representation of body state → induced feeling
In addition, an as-if body loop uses a direct causal relation

preparation for bodily response → sensory representation of body state
as a shortcut in the causal chain; cf. Damasio (1994; 1999). This can be considered a prediction of the action effect by internal simulation (e.g., Hesslow, 2002). The resulting induced feeling is a valuation of this prediction. If the level of the feeling (which is assumed positive) is high, a positive valuation is obtained. In Damasio (1999) this is described as follows: ‘The changes related to body state are achieved by one of two mechanisms. One involves what I call the ‘body loop’. It uses both humoral signals (chemical messages conveyed via the bloodstream) and neural signals (electrochemical messages conveyed via nerve pathways). As a result of both types of signal the body landscape is changed and is subsequently represented in somatosensory structures of the central nervous system, from the brain stem on up. The change in the representation of the body landscape can partly be achieved by another mechanism, which I call the ‘as if body loop’. In this alternate mechanism, the representation of body-related changes is created directly in sensory body maps, under the control of other neural sites, for instance, the prefrontal cortices. It is ‘as if’ the body had really been changed but it was not.’ (Damasio, 1999, p. 79-80)
The body loop (or as-if body loop) is extended to a recursive (as-if) body loop by assuming that the preparation of the bodily response is also affected by the level of the induced feeling:

induced feeling → preparation for the bodily response
Such recursion is suggested by Damasio (2004), noticing that what is felt is a body state which is under control of the person: ‘The brain has a direct means to respond to the object as feelings unfold because the object at the origin is inside the body, rather than external to it. The brain can act directly on the very object it perceives. (…) The object at the origin on the one hand, and the brain map of that object on the other, can influence each other in a sort of reverberative process that is not to be found, for example, in the perception of an external object.’ (Damasio, 2004, pp. 91-92)
In this way the valuation of the prediction affects the preparation. A high valuation will strengthen activation of the preparation.
Informally described theories in scientific disciplines, for example in biological or neurological contexts, are often formulated in terms of causal relationships or in terms of dynamical systems. To adequately formalise such a theory the hybrid dynamic modelling language LEADSTO has been developed, which subsumes qualitative and quantitative causal relationships and dynamical systems; cf. (Bosse, Jonker, and Treur, 2008). This language has proven successful in a number of contexts, varying from biochemical processes that make up the dynamics of cell behaviour to neurological and cognitive processes; cf. (Bosse, Jonker, and Treur, 2008; Bosse, Jonker, Meij, Treur, 2007). Within LEADSTO a local dynamic property (LP) or temporal relation a →D b denotes that when a state property a occurs, then after a certain time delay (which for each relation instance can be specified as any positive real number D), state property b will occur. Below, this D is the time step Δt. A dedicated software environment is available to support specification and
simulation. A specification of the model in LEADSTO format can be found in Appendix B. An overview of the basic decision model involving the generation and valuation of emotional responses and feelings is depicted in Fig. 1. This picture also shows formal representations explained in Table 1. However, note that the precise numerical relations are not expressed in this picture, but in the detailed specifications below, through local (dynamic) properties LP0 to LP6. To get an idea of a real life context for the model, consider a simple scenario in which a person can choose among three options of shops to get a certain product, for example fruit. In the world each of the three options provides this product with a certain satisfaction factor, indicated by characteristics or effectiveness rates λ1, λ2, λ3 (measured between 0 and 1) for the three options respectively. The model describes a situation in which a number of times the person goes to all three shops to find out (and learn) what these satisfaction factors are. In each of the shops the person buys a fraction of the fruit corresponding to the person's tendency or preference to decide for this shop.
[Figure 1 depicts the states world_state(w), sensor_state(w), srs(w), prep_state(bi), effector_state(bi), sensor_state(bi), srs(bi) and feeling(bi), connected in the loops described above, with connection weights ω1i, ω2i, ω3i and world characteristics λi.]
Fig. 1. Overview of the model for decision making evaluated from a rationality perspective
The effector state for bi combined with the (possibly stochastic) effectiveness of executing bi in the world (indicated by λi) activates the sensor state for bi via the body loop as described above. By a recursive as-if body loop each of the preparations for bi generates a level of feeling for bi which is considered a valuation of the prediction of the action effect by internal simulation. This in turn affects the level of the related preparation for bi. Dynamic interaction within these loops results in an equilibrium for the strength of the preparation and of the feeling. Depending on these values, the option bi is actually activated with a certain intensity. The specific strengths of the connections from the sensory representation to the preparations, and within the recursive as-if body loops can be innate, or are acquired during lifetime.
Table 1. Overview of the (dynamic) states and connections used in the model

Formal name          Informal name                                                   Description
world_state(w)       World state w                                                   Characterizes the current world situation which the person is facing
sensor_state(w)      Sensor state for w                                              The person observes the world state through the sensor state, which provides sensory input
sensor_state(bi)     Sensor state for bi                                             The person observes the body state through the sensor state, which provides sensory input
srs(w)               Sensory representation of world state w                         Internal representation of sensory input for w
srs(bi)              Sensory representation of body state bi                         A feeling state feeling(bi) is generated by a predictive as-if body loop, via the sensory representation srs(bi) of body state bi; this provides a valuing of a prediction for the option bi before executing it
feeling(bi)          Feeling associated to body state bi                             See srs(bi): the feeling provides the valuing of the prediction for option bi
prep_state(bi)       Preparation for an action involving bi                          Preparation for a response involving body state bi
effector_state(bi)   Effector state for body state bi                                Body expression of bi
ω1i                  Weight of the connection from srs(w) to prep_state(bi)          One of the connections which can be learnt by Hebbian learning; it models the relation between stimulus w and the directly associated response bi
ω2i                  Weight of the connection from feeling(bi) to prep_state(bi)     Another connection which can be learnt by Hebbian learning; it models how the generated feeling affects the preparation for response bi
ω3i                  Weight of the connection from prep_state(bi) to srs(bi)         The third connection which can be learnt by Hebbian learning; it models how the preparation for response bi affects the (predicted) body state representation for bi
The computational model is based on such neurological notions as valuing in relation to feeling, body loop and as-if body loop. The adaptivity in the model is based on Hebbian learning. The detailed specification of the model is presented below starting with how the world state is sensed. Note that in this detailed specification each local property LP is given in two (equivalent) formats. The first format is semiformally expressed in the LEADSTO language and the second format is in the form of a differential equation, for those readers more familiar with that format. For a summarized formal specification in differential equation format see Appendix A, for a summarized formal specification in LEADSTO format, see Appendix B.
More detailed description of the model
As a first step sensing of the world state w takes place (see also Fig. 1, the arrow in the upper left corner). This means that the activation value of the sensor state for w is updated (gradually) to get its value more in the direction of the value of the world state for w. After a few steps these activation values will become (practically) equal. This is expressed in property LP0.
LP0 Sensing a world state
If   world state property w occurs with level V1
and  the sensor state for w has level V2
then the sensor state for w will have level V2 + γ [V1 – V2] Δt.

d sensor_state(w)/dt = γ [world_state(w) – sensor_state(w)]
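To make this update scheme concrete, the following minimal Python sketch (the helper name and example values are illustrative, not from the paper) shows the shared leaky-integration form used by LP0, LP1, LP4 and LP5: the target level is nudged toward the source level with speed factor γ and step Δt.

```python
def follow(target: float, source: float, gamma: float = 1.0, dt: float = 1.0) -> float:
    """Shared update form of LP0/LP1/LP4/LP5: target + gamma * (source - target) * dt."""
    return target + gamma * (source - target) * dt

# Example: sensor_state(w) gradually tracking world_state(w) = 1.0 from 0.0
level = 0.0
for _ in range(5):
    level = follow(level, 1.0, gamma=0.5, dt=1.0)  # 0.5, 0.75, 0.875, ...
```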
From the sensor state for w, the sensory representation of w is updated by dynamic property LP1. Again this means that the activation value of the sensory representation for w is updated (gradually) to get its value more in the direction of the value of the sensor state of w. After a few steps these values will become (practically) equal. This is expressed in property LP1.

LP1 Generating a sensory representation for a sensed world state
If   the sensor state for world state property w has level V1
and  the sensory representation for w has level V2
then the sensory representation for w will have level V2 + γ [V1 – V2] Δt.

d srs(w)/dt = γ [sensor_state(w) – srs(w)]
To get an updated value of the preparation state for bi, there are impacts of two other states (see the two arrows pointing at the preparation state in Fig. 1, upper right corner): the sensory representation of w and the feeling state for bi. This means that a combination of the activation values of these states has impact on the activation value of the preparation state. The combination function h to combine two inputs which activate a subsequent state uses a threshold function th, thus keeping the resultant value in the range [0, 1]:

th(σ, τ, V) = 1 / (1 + e^(–σ(V – τ)))

where σ is the steepness and τ is the threshold value. The combination function is:

h(σ, τ, V1, V2, ω1, ω2) = th(σ, τ, ω1V1 + ω2V2)

where V1 and V2 are the current activation levels of the two states and ω1 and ω2 are the connection strengths of the links from these states to the destination state. In this way dynamic property LP2 describes the update of the preparation state for bi from the sensory representation of w and the feeling of bi.

LP2 From sensory representation and feeling to preparation of a body state
If   a sensory representation for w with level V occurs
and  the feeling associated with body state bi has level Vi
and  the preparation state for bi has level Ui
and  ω1i is the strength of the connection from sensory representation for w to preparation for bi
and  ω2i is the strength of the connection from feeling of bi to preparation for bi
and  σi is the steepness value for preparation of bi
and  τi is the threshold value for preparation of bi
and  γ1 is the person's flexibility for bodily responses
then after Δt the preparation state for body state bi will have level Ui + γ1 [h(σi, τi, V, Vi, ω1i, ω2i) – Ui] Δt.

d preparation(bi)/dt = γ1 [h(σi, τi, srs(w), feeling(bi), ω1i, ω2i) – preparation(bi)]
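As a sketch of how the threshold and combination functions and the LP2 update could be implemented, consider the Python fragment below. It assumes the simple logistic form of th given above; all function and parameter names are illustrative rather than taken from the paper's software.

```python
import math

def th(sigma: float, tau: float, v: float) -> float:
    """Logistic threshold function with steepness sigma and threshold tau; output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-sigma * (v - tau)))

def h(sigma: float, tau: float, v1: float, v2: float, w1: float, w2: float) -> float:
    """Combination function: threshold applied to the weighted sum of two inputs."""
    return th(sigma, tau, w1 * v1 + w2 * v2)

def lp2_update(prep: float, srs_w: float, feeling: float, w1: float, w2: float,
               sigma: float, tau: float, gamma1: float, dt: float = 1.0) -> float:
    """LP2: move preparation(bi) toward h(sigma, tau, srs(w), feeling(bi), w1, w2)."""
    return prep + gamma1 * (h(sigma, tau, srs_w, feeling, w1, w2) - prep) * dt
```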
In a similar manner dynamic property LP3 describes the update of the activation level of the sensory representation of a body state from the respective values for the preparation state and the sensor state (see Fig. 1, lower left corner).
LP3 From preparation and sensor state to sensory representation of a body state
If   the preparation state for bi has level Xi
and  the sensor state for bi has level Vi
and  the sensory representation for body state bi has level Ui
and  ω3i is the strength of the connection from preparation state for bi to sensory representation for bi
and  σi is the steepness value for sensory representation of bi
and  τi is the threshold value for sensory representation of bi
and  γ2 is the person's flexibility for bodily responses
then after Δt the sensory representation for bi will have level Ui + γ2 [h(σi, τi, Xi, Vi, ω3i, 1) – Ui] Δt.

d srs(bi)/dt = γ2 [h(σi, τi, preparation(bi), sensor_state(bi), ω3i, 1) – srs(bi)]
Dynamic property LP4 describes the update of the activation level of the feeling for bi from the activation level of the sensory representation (see Fig. 1, lower middle part).

LP4 From sensory representation of a body state to feeling
If   the sensory representation for body state bi has level V1
and  bi is felt with level V2
then bi will be felt with level V2 + γ [V1 – V2] Δt.

d feeling(bi)/dt = γ [srs(bi) – feeling(bi)]
LP5 describes how the activation level of the effector state for bi is updated from the activation level of the preparation state (see Fig. 1, upper right corner).

LP5 From preparation to effector state
If   the preparation state for bi has level V1
and  the effector state for body state bi has level V2
then the effector state for body state bi will have level V2 + γ [V1 – V2] Δt.

d effector_state(bi)/dt = γ [preparation_state(bi) – effector_state(bi)]
LP6 describes the update of the activation level of the sensor state for bi from the activation level of the effector state for bi (see Fig. 1, the arrow from the upper right corner to the lower left corner).

LP6 From effector state to sensor state of a body state
If   the effector state for bi has level V1
and  λi is the world preference/recommendation for the option bi
and  the sensor state for body state bi has level V2
then the sensor state for bi will have level V2 + γ [λi V1 – V2] Δt.

d sensor_state(bi)/dt = γ [λi effector_state(bi) – sensor_state(bi)]
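Putting LP0 to LP6 together, one synchronous Euler step of the whole loop could look as follows. This is only an illustrative sketch reusing the th/h functions from the earlier fragment; the state layout, dictionary keys and parameter names are assumptions, not part of the original specification.

```python
def model_step(state, weights, lam, params, world_w, dt=1.0):
    """One synchronous update step of LP0-LP6 for n decision options (a sketch)."""
    g, g1, g2 = params["gamma"], params["gamma1"], params["gamma2"]
    new = {
        "ss_w":  state["ss_w"]  + g * (world_w - state["ss_w"]) * dt,         # LP0
        "srs_w": state["srs_w"] + g * (state["ss_w"] - state["srs_w"]) * dt,  # LP1
        "prep": [], "srs_b": [], "feel": [], "eff": [], "ss_b": [],
    }
    for i in range(len(lam)):
        # LP2: preparation driven by srs(w) and feeling(bi)
        prep = state["prep"][i] + g1 * (h(params["sigma_p"], params["tau_p"],
                state["srs_w"], state["feel"][i],
                weights["w1"][i], weights["w2"][i]) - state["prep"][i]) * dt
        # LP3: predicted body representation from preparation and sensed body state
        srs_b = state["srs_b"][i] + g2 * (h(params["sigma_s"], params["tau_s"],
                 state["prep"][i], state["ss_b"][i],
                 weights["w3"][i], 1.0) - state["srs_b"][i]) * dt
        # LP4: the feeling follows the predicted body representation
        feel = state["feel"][i] + g * (state["srs_b"][i] - state["feel"][i]) * dt
        # LP5: the effector state follows the preparation
        eff = state["eff"][i] + g * (state["prep"][i] - state["eff"][i]) * dt
        # LP6: body loop; the executed effect is scaled by the world characteristic lambda_i
        ss_b = state["ss_b"][i] + g * (lam[i] * state["eff"][i] - state["ss_b"][i]) * dt
        for key, val in zip(("prep", "srs_b", "feel", "eff", "ss_b"),
                            (prep, srs_b, feel, eff, ss_b)):
            new[key].append(val)
    return new
```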
[Figure 2 consists of four panels over time 0–800: the world state WS(w,V); the static world preferences λ1, λ2, λ3; the connection strengths ω11, ω12, ω13 from srs(w) to prep(bi); and the effector states ES(b1,V1), ES(b2,V2), ES(b3,V3).]
Fig. 2. Output of the basic model without learning
Just to get the idea, Fig. 2 shows some of the variables for a simple instance of the scenario and the output of the basic model in a deterministic environment without learning (further parameters are as for Fig. 3). Fig. 2 only shows the output of the effector states for the options bi over time without learning any of the connections, for given deterministic world characteristics or preferences for the available options. The horizontal axis represents time (corresponding to simulation steps). The vertical axis represents the activation values of the depicted states, which can be any value between 0 and 1. The upper left hand graph indicates the recurring world state (WS) in which the decision problem and the options occur. This serves as input. The symbol V (or V1, V2, V3) indicates the activation value, which for this world state only switches from 0 to 1 and back. The upper right hand graph shows the (different) world characteristics λi for the three options. They are preset as characteristics for any given scenario. The lower left hand graph shows that the strengths ω1i of all three connections from srs(w) to prep_state(bi) are the same for this case and constant for the three options (no learning takes place). The lower right hand graph shows that the tendencies to choose the three responses or decision options b1, b2 or b3 for w remain the same (due to the fact that no learning takes place).
For the considered case study it was assumed that three options are available to the agent and the objective is to see how rationally an agent makes its decisions using a given human-like adaptive model: under deterministic as well as stochastic world characteristics, and for static as well as changing worlds. The dynamic properties LP7 to LP9 describe a learning mechanism for the connection strengths
(A) from sensory representation for w to preparation for option bi
(B) from feeling bi to preparation for bi
(C) from preparation for bi to sensory representation of bi
These have been explored separately (A), (B), or (C), and in combination (ABC). The first type of learning (A) corresponds to learning as considered traditionally for a connection between stimulus and response. The other two types of learning (B and C) concern strengthening of the as-if body loop. This models the association of the feeling to an option and how this feeling has impact on preparing for this option.
The learning model used is Hebbian learning (Hebb, 1949). This is based on the principle that strengthening of a connection between neurons over time may take place when both nodes are often active simultaneously ('neurons that fire together wire together'). The principle goes back to Hebb (1949), but has recently gained enhanced interest by more extensive empirical support, and more advanced mathematical formulations (e.g., Gerstner and Kistler, 2002). More specifically, here it is assumed that the strength ω of a connection between states S1 and S2 is adapted using the following Hebbian learning rule, taking into account a maximal connection strength 1, a learning rate η > 0, an extinction rate ζ ≥ 0 (usually small), and activation levels V1 and V2 (between 0 and 1) of the two nodes involved. The first expression is in differential equation format, the second one in difference equation format:

dω/dt = η V1V2 (1 – ω) – ζω
ω(t + Δt) = ω(t) + [η V1(t)V2(t) (1 – ω(t)) – ζ ω(t)] Δt

Such Hebbian learning rules can be found, for example, in (Gerstner and Kistler, 2002, p. 406). By the factor 1 – ω the learning rule keeps the level of ω bounded by 1 (which could be replaced by any other positive number). When the extinction rate ζ is relatively low, the upward changes during learning are proportional to both V1 and V2, and maximal learning takes place when both are 1. Whenever one of these activation levels is 0 (or close to 0) extinction takes over, and ω slowly decreases (unlearning).
First, based on the Hebbian learning mechanism described above, LP7 models the update of the strength of the connection from sensory representation of w to preparation of bi (type A; see Fig. 1, the arrow with label ω1i in the upper part, in the middle).

LP7 Hebbian learning (A): connection from sensory representation of w to preparation of bi
If   the connection from sensory representation of w to preparation of bi has strength ω1i
and  the sensory representation for w has level V
and  the preparation of bi has level Vi
and  the learning rate from sensory representation of w to preparation of bi is η
and  the extinction rate from sensory representation of w to preparation of bi is ζ
then after Δt the connection from sensory representation of w to preparation of bi will have strength ω1i + (η V Vi (1 – ω1i) – ζ ω1i) Δt.

dω1i/dt = η srs(w) preparation(bi) (1 – ω1i) – ζ ω1i
Similarly, the learning of the two connections involved in the as-if body loop (B and C) is specified in LP8 and LP9 (see Fig. 1, the arrows with labels ω2i (B) and ω3i (C)).
LP8 Hebbian learning (B): connection from feeling bi to preparation of bi
If   the connection from feeling associated with body state bi to preparation of bi has strength ω2i
and  the feeling for bi has level Vi
and  the preparation of bi has level Ui
and  the learning rate from feeling of bi to preparation of bi is η
and  the extinction rate from feeling of bi to preparation of bi is ζ
then after Δt the connection from feeling of bi to preparation of bi will have strength ω2i + (η ViUi (1 – ω2i) – ζ ω2i) Δt.

dω2i/dt = η feeling(bi) preparation(bi) (1 – ω2i) – ζ ω2i
LP9 Hebbian learning (C): connection from preparation of bi to sensory representation of bi
If   the connection from preparation of bi to sensory representation of bi has strength ω3i
and  the preparation of bi has level Vi
and  the sensory representation of bi has level Ui
and  the learning rate from preparation of bi to sensory representation of bi is η
and  the extinction rate from preparation of bi to sensory representation of bi is ζ
then after Δt the connection from preparation of bi to sensory representation of bi will have strength ω3i + (η ViUi (1 – ω3i) – ζ ω3i) Δt.

dω3i/dt = η preparation(bi) srs(bi) (1 – ω3i) – ζ ω3i
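The three learning properties share one Hebbian update; a small sketch (illustrative names, reusing the rule dω/dt = η·pre·post·(1 − ω) − ζ·ω given above) is:

```python
def hebbian(omega: float, pre: float, post: float,
            eta: float, zeta: float, dt: float = 1.0) -> float:
    """Hebbian update with extinction: omega + (eta*pre*post*(1 - omega) - zeta*omega) * dt."""
    return omega + (eta * pre * post * (1.0 - omega) - zeta * omega) * dt

# LP7 (A): pre = srs(w),          post = preparation(bi)  -> updates omega_1i
# LP8 (B): pre = feeling(bi),     post = preparation(bi)  -> updates omega_2i
# LP9 (C): pre = preparation(bi), post = srs(bi)          -> updates omega_3i
```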
3 Simulation Results for a Deterministic World

In this section some of the simulation results, performed using numerical software, are described in detail. A real life scenario context that can be kept in mind to understand what is happening is the following.

Scenario context
At regular times (for example, every week) a person faces the decision problem where to buy a certain type of fruit. There are three options for shops to choose from. Each option i has its own characteristics (price and quality, for example); based on these characteristics buying in this shop provides a satisfaction factor λi between 0 and 1. For the deterministic case it is just this value that is obtained for this option. For the stochastic case a probability distribution around this λi will determine the outcome. For a static world the λi remain constant, whereas for a dynamic world they may change over time. By going to a shop the person experiences the related outcome λi and learns from this. To find out which shop is most satisfactory, each time the person decides to go to all three of them to buy some fraction of the amount of fruit needed. This fraction corresponds to the person's tendency or preference to decide for this shop. Due to the learning this tendency changes over time, in favour of the shop(s) which provide(s) the highest satisfaction.
The simulation results address different scenarios reflecting different types of world characteristics, from worlds that are deterministic to stochastic, and from a static to a changing world. Moreover, learning the connections was done one at a time (A), (B), (C), and learning multiple connections simultaneously (ABC). An overall summary of the results is given in Table 2. Note that this table only contains the values of the connection strengths and activation levels of the effector states after completion of
the simulation experiments: at the end time. In contrast, the current and the next sections show the processes performed to achieve these final values. Results for the rationality measures are presented in Section 6. For all simulation results shown, time is on the horizontal axis whereas the vertical axis shows the activation level of the different states. The step size for all simulations is ∆t = 1. Fig. 3 shows simulation results for the model under constant, deterministic world characteristics: λ1 = 0.9, λ2 = 0.2, and λ3 = 0.1. Other parameters are set as: learning rate η = 0.04, extinction rate ζ = 0.0015, initial connection strengths ω2i = ω3i = 0.8, speed factors γ = 1, γ1 = 0.5, γ2 = 1, steepness σ = 2 and threshold τ = 1.2 for the preparation state, and σ = 10 and τ = 0.3 for the sensory representation of bi. For the initial 80 time units the stimulus w is kept at 1, for the next 170 time units it is kept at 0, and the same sequence of activation and deactivation of the stimulus is repeated for the rest of the simulation.
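A compact driver loop for this setting, wiring together the model_step and hebbian sketches introduced earlier, could look as follows. It follows the parameter values and the 80-on/170-off stimulus schedule described above, but everything else (names, data layout) is an illustrative assumption.

```python
lam = [0.9, 0.2, 0.1]                                   # deterministic world characteristics
params = {"gamma": 1.0, "gamma1": 0.5, "gamma2": 1.0,
          "sigma_p": 2.0, "tau_p": 1.2,                 # preparation state
          "sigma_s": 10.0, "tau_s": 0.3}                # sensory representation of bi
weights = {"w1": [0.5] * 3, "w2": [0.8] * 3, "w3": [0.8] * 3}
state = {"ss_w": 0.0, "srs_w": 0.0, "prep": [0.0] * 3, "srs_b": [0.0] * 3,
         "feel": [0.0] * 3, "eff": [0.0] * 3, "ss_b": [0.0] * 3}
eta, zeta, dt = 0.04, 0.0015, 1.0

for t in range(800):
    world_w = 1.0 if (t % 250) < 80 else 0.0            # 80 time units on, 170 off
    state = model_step(state, weights, lam, params, world_w, dt)
    for i in range(3):                                  # case (A): learn only the omega_1i links
        weights["w1"][i] = hebbian(weights["w1"][i], state["srs_w"],
                                   state["prep"][i], eta, zeta, dt)
```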
Fig. 3 Deterministic World: (a) Connection strengths (A) (b) Effector States for bi Initial values ω11= ω12= ω13= 0.5; η= 0.04, ζ= 0.0015
Moreover, it depicts the situation in which only one type of link (ω1i) is learned, as specified in LP7 using the Hebbian approach (A) for the connection from sensory representation of w to preparation state for bi. It is shown that the model adapts the connection strengths of the links ω1i according to the world characteristics given by λi. So ω11 strengthens more and more over time, resulting in a higher activation level of the effector state for b1 compared to the activation level of the effector states for the other two options b2 and b3. Fig. 4 and Fig. 5 show the simulation results while learning is performed for the links (B) from feeling to preparation state for bi and (C) from preparation state to sensory representation of bi, respectively.
Fig. 4 Deterministic World: (a) Connection strengths (B) (b) Effector States for bi. Initial values ω21= ω22= ω23= 0.5, η = 0.04; ζ= 0.0015
Fig. 5 Deterministic World: (a) Connection strengths (C) (b) Effector States for bi Initial values ω31= ω32= ω33= 0.5, η = 0.04, ζ = 0.0015
Fig. 6 shows the results when the Hebbian learning is applied on all links simultaneously (ABC). These results show that for deterministic world characteristics the agent successfully adapts the connection strength in all four different cases rationally, as these connection strengths become more in line with the world characteristics.
Fig. 6. Deterministic World: Connection strengths (ABC) and Effector States for bi Initial values ω11= ω12= ω13= 0.5, ω21= ω22= ω23= 0.8, ω31= ω32= ω33= 0.8; ηi= 0.04, ζi= 0.0013
4 Simulation Results for a Stochastic World

Other experiments were carried out for a stochastic world with the four different cases mentioned earlier. To simulate the stochastic world, probability distribution functions (PDF) were defined for the λi according to a normal distribution. Using these PDFs, random numbers were generated for the λi, limiting the values to the interval [0, 1], with μ1=0.9, μ2=0.2 and μ3=0.1 respectively. Furthermore the standard deviation for all λi was taken as 0.1. Fig. 7 shows the world state w and the stochastic world characteristics λi. Figures 8 to 11 show the simulation results while learning is performed for the links (A) from sensory representation of w to preparation state for bi, (B) from feeling to preparation state for bi, and (C) from preparation state to sensory representation of bi, one at a time, and (ABC) all three simultaneously.
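One plausible way to generate such clipped, normally distributed world characteristics is sketched below; the clipping method is not specified in the paper, so the min/max bound here is an assumption.

```python
import random

def sample_lambdas(means=(0.9, 0.2, 0.1), sd=0.1):
    """Draw lambda_i from Normal(mu_i, sd) and limit the values to [0, 1]."""
    return [min(1.0, max(0.0, random.gauss(mu, sd))) for mu in means]
```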
Fig. 7. Stochastic World
Fig. 8 Stochastic World: (a) Connection strengths (A) (b) Effector States. Initial values ω11= ω12= ω13= 0.5; η= 0.04, ζ= 0.0015
Fig. 9 Stochastic World: (a) Connection strengths (B) (b) Effector States. Initial values ω21= ω22= ω23= 0.5; η= 0.04, ζ= 0.0015
Fig. 10 Stochastic World: (a) Connection strengths (C) (b) Effector State Initial values ω31= ω32= ω33= 0.5; η= 0.04, ζ= 0.0015
Fig. 11. Stochastic World: Connection Strengths (ABC) and Effector States Initial values ω11= ω12= ω13= 0.5, ω21= ω22= ω23= 0.8, ω31= ω32= ω33= 0.8; ηi= 0.04, ζi= 0.0013
It can be seen from these results that also in a stochastic scenario the agent model successfully learnt the connections and adapted the connection strengths and effector states to the world characteristics, with results quite similar to those for a deterministic world.
5 Simulation Results for a Changing Stochastic World

Another scenario was explored in which the (stochastic) world characteristics were changing drastically: from μ1=0.9, μ2=0.2 and μ3=0.1 for the λi respectively to μ1=0.1, μ2=0.2 and μ3=0.9 respectively, with a standard deviation of 0.1 for all. Fig. 12 to Fig. 15 show the results for such a scenario. The results show that the agent has successfully adapted to the changing world characteristics over time, as the final effector states reflect more the changed world characteristics. The initial settings in this experiment were taken from the previous simulation results shown in Figs. 7 and 8 to keep the continuity of the experiment. It can be observed that the connection strength for option 3 becomes higher compared to the other options, and consequently the value of the effector state for b3 becomes higher than for the other two by the end of the experiment.
Fig. 12 Changing Stochastic World: (a) Connection strengths (A) (b) Effector States. Initial values ω11=0.78, ω12= 0.53, ω13= 0.52; η= 0.04, ζ= 0.0015
Fig. 13 Changing Stochastic World: (a) Connection strengths (B) (b) Effector States. Initial values ω21=0.88, ω22= 0.58, ω23= 0.47; η= 0.04, ζ= 0.0015
Fig. 14 Changing Stochastic World: (a) Connection strengths (C) (b) Effector States Initial values ω31=0.87, ω32= 0.30, ω33= 0.23; η= 0.04, ζ= 0.0015
Fig. 15. Changing Stochastic World: Connection strengths (ABC) and Effector States Initial values ω11=0.80, ω12= 0.55, ω13= 0.54, ω21= ω31= 0.84, ω22= ω32= 0.30, ω23= ω33= 0.29; ηi= 0.04, ζi= 0.001
Note that Table 2 contains the values of the different connection strengths and the activation levels of the effector states after the completion of the simulation experiments (at the end time).
Table 2. Overview of the simulation results for all links (A), (B), (C) and (ABC)

Link  Scenario        ωx1   ωx2   ωx3   ES1   ES2   ES3
A     Static          0.78  0.53  0.52  0.56  0.15  0.14
A     Stochastic      0.78  0.53  0.52  0.56  0.15  0.14
A     Changed World   0.40  0.38  0.80  0.09  0.09  0.58
B     Static          0.89  0.58  0.46  0.65  0.38  0.31
B     Stochastic      0.89  0.59  0.47  0.65  0.39  0.32
B     Changed World   0.42  0.57  0.89  0.30  0.37  0.65
C     Static          0.88  0.29  0.23  0.63  0.28  0.26
C     Stochastic      0.88  0.29  0.23  0.63  0.28  0.27
C     Changed World   0.04  0.08  0.87  0.25  0.26  0.63

For the combined case (ABC), in which all three connection types are learned:

Scenario        ω11   ω12   ω13   ω21   ω22   ω23   ω31   ω32   ω33   ES1   ES2   ES3
Static          0.81  0.55  0.54  0.85  0.30  0.29  0.85  0.30  0.29  0.59  0.13  0.13
Stochastic      0.80  0.55  0.54  0.84  0.30  0.29  0.84  0.30  0.29  0.57  0.13  0.13
Changed World   0.64  0.64  0.94  0.02  0.03  0.96  0.02  0.03  0.96  0.16  0.16  0.75
6 Evaluating the Models on Rationality

In the previous sections it was shown that the agent model behaves rationally in different scenarios. These scenarios and their different cases were elaborated in detail, but the results were assessed with respect to their rationality in a qualitative and rather informal manner. For example, no attempt was made to assign an extent or level to the rationality observed during these experiments. The current section addresses this, and to this end two different formally defined measures to assess the extent of rationality are introduced. The notion of rationality aimed at in this paper concerns the extent to which the agent makes the choices that are most beneficial to it in the given environment. For example, it does not take into account the extent to which the agent has information about the environment. So, if the agent has had no experiences yet with the given environment (an extreme form of incomplete information), then according to the notion considered here it will behave totally irrationally. Only when the agent gathers more information about the environment by having experiences with choices previously made, will its behaviour become more and more rational as an adaptation to its environment. The focus then is on how rational the agent becomes over time, in such an adaptation process to the environment. Of the rationality measures considered here, one is based on a discrete scale and the other on a continuous scale.
Method 1 (Discrete Rationality Measure)
The first method presented is based on the following point of departure: an agent which has the same respective order of effector state activation levels for the different options compared to the order of world characteristics λi will be considered highly rational. So in this method the rank of the average value λi at any given time unit is determined, and compared with the rank of the respective effector state levels. More specifically, the following formula is used to determine the irrationality factor IF:

IF = Σi=1..n | rank(λi) – rank(ESi) |

where n is the number of options available, and rank(λi) and rank(ESi) denote the rank of option i (rank 1 for the highest value) according to the world characteristics and according to the effector state levels, respectively. This irrationality factor tells to which extent the agent is behaving rationally, in the sense that the higher the irrationality factor IF is, the lower the rationality of the agent is. It is assumed that the ranking is unique and no two values are assigned the same rank. To calculate the discrete rationality factor DRF, the maximum possible irrationality factor Max. IF (obtained when the agent's ranking completely reverses the world's ranking) can be determined as follows:

Max. IF = 2 Σi=1..ceiling(n/2) (n – (2i – 1))

Here ceiling(x) is the smallest integer greater than or equal to x. Note that Max. IF is approximately ½n². As a higher IF means lower rationality, the discrete rationality factor DRF is calculated as:

DRF = 1 – IF / Max. IF

On this scale, for each n only a limited number of values are possible; for example, for n = 3 three values are possible: 0, 0.5, and 1. In general ½ Max. IF + 1 values are possible, which is approximately ¼n² + 1. As an example, suppose during a simulation average values of λ1 = 0.107636, λ2 = 0.203044, and λ3 = 0.888522 are given, whereas the effector state values are ES1 = 0.170554, ES2 = 0.12367 and ES3 = 0.43477 at a given time point. According to these data the world's ranks are 3, 2, 1 for λ1, λ2, λ3 and the agent's ranks are 2, 3, 1 for ES1, ES2, ES3 respectively. So according to the given formulas IF = 2, Max. IF = 4 and DRF = 0.5. So in this particular case at this given time point the agent is behaving 50% rationally.

Method 2 (Continuous Rationality Measure)
The second method presented is based on the following point of departure: an agent which receives the maximum benefit will be the highly rational agent. This is only possible if ESi is 1 for the option whose λi is the highest. In this method, to calculate the continuous rationality factor CRF, first, to account for the effort spent in performing actions, the effector state values ESi are normalised as follows:

nESi = ESi / (ES1 + … + ESn)

Here n is the number of options available. Based on this the continuous rationality factor CRF is determined as follows, with Max(λi) the maximal value of the different λi:

CRF = (Σi=1..n nESi λi) / Max(λi)
This method enables measuring to which extent the agent is behaving rationally in a continuous manner. For the given example used to illustrate the previous method, CRF = 0.6633. So according to this method the agent is considered to behave 66.33% rationally in the given world. Fig. 16 to Fig. 19 show the two types of rationality (depicted as percentages) of the agent for the different scenarios with a changing stochastic world. In these figures the first 250 time points show the rationality achieved by the agent just before changing the world characteristics drastically for the simulations shown in Section 5.
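The two measures can be computed directly from the average λi and the effector state values; the following sketch (illustrative names) reproduces the worked example above, giving DRF = 0.5 and CRF ≈ 0.66.

```python
import math

def ranks(values):
    """Rank 1 for the highest value, rank n for the lowest (ranking assumed unique)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def discrete_rationality(lam, es):
    n = len(lam)
    if_value = sum(abs(a - b) for a, b in zip(ranks(lam), ranks(es)))
    max_if = 2 * sum(n - (2 * i - 1) for i in range(1, math.ceil(n / 2) + 1))
    return 1.0 - if_value / max_if

def continuous_rationality(lam, es):
    n_es = [e / sum(es) for e in es]                      # normalised effector states nES_i
    return sum(ne * l for ne, l in zip(n_es, lam)) / max(lam)

lam = [0.107636, 0.203044, 0.888522]
es = [0.170554, 0.12367, 0.43477]
print(discrete_rationality(lam, es))     # 0.5
print(continuous_rationality(lam, es))   # ~0.66
```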
Fig. 16. Rationality during learning ω1i (A)
Fig. 17. Rationality during learning ω2i (B)
Fig. 18. Rationality during learning ω3i (C)
Fig. 19. Rationality for learning ω1i, ω2i, ω3i (ABC)
From time point 250 onwards, these figures show the rationality of the agent after the change has been made (see Section 5). It is clear from the results shown in Fig. 16 to Fig. 19 that the rationality factor of the agent in all four cases improves over time for the given world.
7 Discussion

This paper focused on how the extent of rationality of a biologically inspired human-like adaptive decision model can be analysed. In particular, this was explored for variants of a decision model based on valuing of predictions involving feeling states generated in the amygdala; e.g., (Bechara, Damasio, and Damasio, 2003; Bechara, Damasio, Damasio, and Lee, 1999; Damasio, 1994; Damasio, 2004; Montague and Berns, 2002; Janak and Tye, 2015; Morrison and Salzman, 2010; Ousdal, Specht, Server, Andreassen, Dolan, Jensen, 2014; Pessoa, 2010; Rangel, Camerer, Montague,
2008). The model was made adaptive using Hebbian learning; cf. (Gerstner and Kistler, 2002; Hebb, 1949). Note that this study limits itself to a biologically inspired model for natural or human-like decision making. A preliminary and shorter version of the work discussed here was presented in (Treur and Umair, 2011). To assess the extent of rationality with respect to given world characteristics, two measures were introduced, and using these measures the extents of rationality of the different models over time were analysed. The point of departure for the underlying notion of rationality chosen is that the more the agent makes the most beneficial choices for the given environment, the more rational it is. It was shown how by the learning processes a high level of rationality was obtained, according to these measures, and how after a major world change this rationality level is re-obtained after some delay. It turned out that emotion-related valuing of predictions in the amygdala as a basis for adaptive decision making according to Hebbian learning satisfies reasonable rationality measures. In this sense the model shows how in human-like decision making emotions are used as a vehicle to obtain rational decisions. This contrasts with the traditional view that emotions and rationality disturb each other or compete with each other, especially in decision processes. More and more findings from neuroscience show that this traditional view is an inadequate way of conceptualising processes in the brain. In (Bosse, Treur, and Umair, 2012) it is shown how the presented model can be extended with social interaction and how this also contributes to rationality for the case of collective decision making. Moreover, in (Treur and Umair, 2012) it is discussed how some other types of models can be evaluated according to the rationality measures used here.
References

Bechara, A., Damasio, H., and Damasio, A.R. (2003). Role of the Amygdala in Decision-Making. Annals of the New York Academy of Sciences 985, 356-369.
Bechara, A., Damasio, H., Damasio, A.R., and Lee, G.P. (1999). Different Contributions of the Human Amygdala and Ventromedial Prefrontal Cortex to Decision-Making. Journal of Neuroscience 19, 5473-5481.
Bosse, T., Hoogendoorn, M., Memon, Z.A., Treur, J., and Umair, M. (2010). An Adaptive Model for Dynamics of Desiring and Feeling based on Hebbian Learning. In: Yao, Y., Sun, R., Poggio, T., Liu, J., Zhong, N., and Huang, J. (eds.), Proceedings of the Second International Conference on Brain Informatics, BI'10. Lecture Notes in Artificial Intelligence, vol. 6334, pp. 14-28. Springer Verlag.
Bosse, T., Jonker, C.M., Meij, L. van der, and Treur, J. (2007). A Language and Environment for Analysis of Dynamics by Simulation. International Journal of Artificial Intelligence Tools 16, 435-464.
Bosse, T., Jonker, C.M., and Treur, J. (2008). Formalisation of Damasio's Theory of Emotion, Feeling and Core Consciousness. Consciousness and Cognition 17, 94-113.
Bosse, T., Treur, J., and Umair, M. (2012). Rationality for Adaptive Collective Decision Making Based on Emotion-Related Valuing and Contagion. In: Ding, W., et al. (eds.), Modern Advances in Intelligent Systems and Tools, Proceedings of the 25th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE'12, Part II. Studies in Computational Intelligence, vol. 431, pp. 103-112. Springer Verlag.
Damasio, A. (1994). Descartes' Error: Emotion, Reason and the Human Brain. Papermac, London.
Damasio, A. (1999). The Feeling of What Happens: Body and Emotion in the Making of Consciousness. Harcourt Brace, New York.
Damasio, A. (2004). Looking for Spinoza. Vintage Books, London.
Damasio, A.R. (2010). Self Comes to Mind: Constructing the Conscious Brain. Pantheon Books, New York.
Forgas, J.P., Laham, S.M., and Vargas, P.T. (2005). Mood effects on eyewitness memory: Affective influences on susceptibility to misinformation. Journal of Experimental Social Psychology 41, 574-588.
Gerstner, W., and Kistler, W.M. (2002). Mathematical formulations of Hebbian learning. Biological Cybernetics 87, 404-415.
Ghashghaei, H.T., Hilgetag, C.C., and Barbas, H. (2007). Sequence of information processing for emotions based on the anatomic dialogue between prefrontal cortex and amygdala. Neuroimage 34, 905-923.
Hebb, D.O. (1949). The Organization of Behaviour. John Wiley & Sons, New York.
Hesslow, G. (2002). Conscious thought as simulation of behaviour and perception. Trends in Cognitive Sciences 6, 242-247.
Janak, P.H., and Tye, K.M. (2015). From circuits to behaviour in the amygdala. Nature 517, 284-292.
Likhtik, E., and Paz, R. (2015). Amygdala-prefrontal interactions in (mal)adaptive learning. Trends in Neurosciences 38, 158-166.
Lindquist, K.A., and Barrett, L.F. (2012). A functional architecture of the human brain: emerging insights from the science of emotion. Trends in Cognitive Sciences 16, 533-540.
Montague, P.R., and Berns, G.S. (2002). Neural economics and the biological substrates of valuation. Neuron 36, 265-284.
Morrison, S.E., and Salzman, C.D. (2010). Re-valuing the amygdala. Current Opinion in Neurobiology 20, 221-230.
Murray, E.A. (2007). The amygdala, reward and emotion. Trends in Cognitive Sciences 11, 489-497.
Ousdal, O.T., Specht, K., Server, A., Andreassen, O.A., Dolan, R.J., and Jensen, J. (2014). The human amygdala encodes value and space during decision making. Neuroimage 101, 712-719.
Pearson, J.M., Watson, K.K., and Platt, M.L. (2014). Decision Making: The Neuroethological Turn. Neuron 82, 950-965.
Pessoa, L. (2010). Emotion and cognition and the amygdala: From "what is it?" to "what's to be done?" Neuropsychologia 48, 3416-3429.
Rangel, A., Camerer, C., and Montague, P.R. (2008). A framework for studying the neurobiology of value-based decision making. Nature Reviews Neuroscience 9, 545-556.
Rudebeck, P.H., and Murray, E.A. (2014). The Orbitofrontal Oracle: Cortical Mechanisms for the Prediction and Evaluation of Specific Behavioral Outcomes. Neuron 84, 1143-1146.
Salzman, C.D., and Fusi, S. (2010). Emotion, Cognition, and Mental State Representation in Amygdala and Prefrontal Cortex. Annual Review of Neuroscience 33, 173-202.
Sugrue, L.P., Corrado, G.S., and Newsome, W.T. (2005). Choosing the greater of two goods: neural currencies for valuation and decision making. Nature Reviews Neuroscience 6, 363-375.
Treur, J., and Umair, M. (2011). On Rationality of Decision Models Incorporating Emotion-Related Valuing and Hebbian Learning. In: Lu, B.-L., Zhang, L., and Kwok, J. (eds.), Proceedings of the 18th International Conference on Neural Information Processing, ICONIP'11, Part III. Lecture Notes in Artificial Intelligence, vol. 7064, pp. 217-229. Springer Verlag.
Treur, J., and Umair, M. (2012). Rationality for Temporal Discounting, Memory Traces and Hebbian Learning. In: Ding, W., et al. (eds.), Modern Advances in Intelligent Systems and Tools, Proceedings of the 25th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE'12, Part II. Studies in Computational Intelligence, vol. 431, pp. 53-61. Springer Verlag.
Appendix A Model Specification in Differential Equation Format

LP0 Sensing a world state
d sensor_state(w)/dt = γ [world_state(w) – sensor_state(w)]

LP1 Generating a sensory representation for a sensed world state
d srs(w)/dt = γ [sensor_state(w) – srs(w)]

LP2 From sensory representation and feeling to preparation of a body state
d preparation(bi)/dt = γ1 [h(σi, τi, srs(w), feeling(bi), ω1i, ω2i) – preparation(bi)]

LP3 From preparation and sensor state to sensory representation of a body state
d srs(bi)/dt = γ2 [h(σi, τi, preparation(bi), sensor_state(bi), ω3i, 1) – srs(bi)]

LP4 From sensory representation of a body state to feeling
d feeling(bi)/dt = γ [srs(bi) – feeling(bi)]

LP5 From preparation to effector state
d effector_state(bi)/dt = γ [preparation_state(bi) – effector_state(bi)]

LP6 From effector state to sensor state of a body state
d sensor_state(bi)/dt = γ [λi effector_state(bi) – sensor_state(bi)]

LP7 Hebbian learning (A): connection from sensory representation of w to preparation of bi
dω1i/dt = η srs(w) preparation(bi) (1 – ω1i) – ζ ω1i

LP8 Hebbian learning (B): connection from feeling bi to preparation of bi
dω2i/dt = η feeling(bi) preparation(bi) (1 – ω2i) – ζ ω2i

LP9 Hebbian learning (C): connection from preparation of bi to sensory representation of bi
dω3i/dt = η preparation(bi) srs(bi) (1 – ω3i) – ζ ω3i
Appendix B Model Specification in LEADSTO Format

LP0 Sensing a world state
world_state(w, V1) & sensor_state(w, V2) ↠ sensor_state(w, V2 + γ [V1 – V2] Δt)

LP1 Generating a sensory representation for a sensed world state
sensor_state(w, V1) & srs(w, V2) ↠ srs(w, V2 + γ [V1 – V2] Δt)

LP2 From sensory representation and feeling to preparation of a body state
srs(w, V) & feeling(bi, Vi) & preparation_state(bi, Ui) & has_connection_strength(srs(w), preparation(bi), ω1i) & has_connection_strength(feeling(bi), preparation(bi), ω2i) & has_steepness(prep_state(bi), σi) & has_threshold(prep_state(bi), τi) ↠ preparation(bi, Ui + γ1 (h(σi, τi, V, Vi, ω1i, ω2i) – Ui) Δt)

LP3 From preparation and sensor state to sensory representation of a body state
preparation_state(bi, Xi) & sensor_state(bi, Vi) & srs(bi, Ui) & has_connection_strength(preparation(bi), srs(bi), ω3i) & has_steepness(srs(bi), σi) & has_threshold(srs(bi), τi) ↠ srs(bi, Ui + γ2 (h(σi, τi, Xi, Vi, ω3i, 1) – Ui) Δt)

LP4 From sensory representation of a body state to feeling
srs(bi, V1) & feeling(bi, V2) ↠ feeling(bi, V2 + γ [V1 – V2] Δt)

LP5 From preparation to effector state
preparation_state(bi, V1) & effector_state(bi, V2) ↠ effector_state(bi, V2 + γ [V1 – V2] Δt)

LP6 From effector state to sensor state of a body state
effector_state(bi, V1) & effectiveness_rate(bi, λi) & sensor_state(bi, V2) ↠ sensor_state(bi, V2 + γ [λi V1 – V2] Δt)

LP7 Hebbian learning (A): connection from sensory representation of w to preparation of bi
has_connection_strength(srs(w), preparation(bi), ω1i) & srs(w, V) & preparation(bi, Vi) & has_learning_rate(srs(w), preparation(bi), η) & has_extinction_rate(srs(w), preparation(bi), ζ) ↠ has_connection_strength(srs(w), preparation(bi), ω1i + (η V Vi (1 – ω1i) – ζ ω1i) Δt)

LP8 Hebbian learning (B): connection from feeling bi to preparation of bi
has_connection_strength(feeling(bi), preparation(bi), ω2i) & feeling(bi, Vi) & preparation(bi, Ui) & has_learning_rate(feeling(bi), preparation(bi), η) & has_extinction_rate(feeling(bi), preparation(bi), ζ) ↠ has_connection_strength(feeling(bi), preparation(bi), ω2i + (η Vi Ui (1 – ω2i) – ζ ω2i) Δt)

LP9 Hebbian learning (C): connection from preparation of bi to sensory representation of bi
has_connection_strength(preparation(bi), srs(bi), ω3i) & preparation(bi, Vi) & srs(bi, Ui) & has_learning_rate(preparation(bi), srs(bi), η) & has_extinction_rate(preparation(bi), srs(bi), ζ) ↠ has_connection_strength(preparation(bi), srs(bi), ω3i + (η Vi Ui (1 – ω3i) – ζ ω3i) Δt)