2004 IEEE International Conference on Systems, Man and Cybernetics

Emotionally Motivated Reinforcement Learning Based Controller

Aladdin Ayesh

De Montfort University, The Gateway Leicester LE1 9BH

UK [email protected]

Abstract

There have been several attempts to model emotions in autonomous agents and robotics. The use of emotions in conjunction with reinforcement learning in particular has attracted attention, since both notions are borrowed analogies from psychology. The work presented here is an approach to robot control based on modeling emotions within a reinforcement learning algorithm. The main contribution of this paper is the use of fuzzy cognitive maps (FCM) to facilitate the modeling of emotions and inferencing for action selection. This approach does not use feeling estimation; instead a direct link between sensory data and emotions is used for emotional estimation. An emotion-based reinforcement learning algorithm is proposed for action selection in robotic control.

Keywords: Intelligent Control, Reinforcement Learning, Emotions.

1 Introduction

There have been several attempts to model emotions in autonomous agents and robotics, e.g. [1-4]. The use of emotions in conjunction with reinforcement learning in particular has attracted attention [4-6], since both notions are borrowed analogies from psychology [7]. The work presented here is an approach to robot control based on modeling emotions within a reinforcement learning algorithm. The main contribution of this paper is the use of fuzzy cognitive maps (FCM) [8] to facilitate the modeling of emotions and inferencing for action selection [1]. This is done in support of a reductionist emotional model that neglects feelings estimation. Instead, it uses sensory data directly for emotional estimation.

Cognitive maps [8-14] are a relatively new and interesting technique introduced to computational intelligence from economics and political sciences [8, 9]. However, they have their origins in psychology [12, 13]. Recently, they have become increasingly popular with computational intelligence researchers, especially a version blended with fuzzy logic [8, 9, 14]. There is little formalization of FCM in computational intelligence; nonetheless, they have great potential for supporting hybrid systems because of their resemblance to neural networks and their ability to present qualitative data. We use cognitive maps in this paper to define links between emotions, emotional states, physical states and actions. By doing this, we bypass the feeling interpretation layer [4] of sensory inputs and provide a direct link between sensory fusion and emotion estimation.

We start the paper with a preliminaries section, which establishes two topics: fuzzy cognitive maps and emotional modeling. The emotion-action modeling section presents the theoretical aspects of the development. The implementation section discusses some of the implementation and ongoing experiments. We conclude the paper with future work.

0-7803-8566-7/04/$20.00 © 2004 IEEE.

2 Preliminaries

2.1 Fuzzy Cognitive Maps

Cognitive maps were proposed in the psychology literature as early as 1948 [13] to explain some interesting research results from experiments done on animals such as rats. Following that, cognitive maps were used to explain some human behavior on both the individual and social levels [12]. The same concept was borrowed by economics and political sciences and was developed further to model and enable decision making [9, 15]. Recently, cognitive maps have started to attract the attention of computational intelligence researchers. The increasing popularity of cognitive maps is due to many factors. First is their similarity to neural networks and semantic networks in providing a connectionist model [1]. At the same time, their psychological background, in studying rats' spatial behavior [12], makes them similar to the spatial-topological maps that are often used in robots.

Cognitive maps are graphical and formal representations of crisp cause-effect relationships among the elements of a given environment [9, 10]. They are similar to neural nets in the sense that they consist of nodes that are linked together. However, they differ from neural nets in the fact that they represent semantically defined relationships, as opposed to the often numerically weighted links in neural networks. A version of cognitive maps that is blended with fuzzy logic [8-10, 14] is more interesting from a computational intelligence viewpoint. Fuzzy cognitive maps (FCM) are represented in the form of fuzzy signed directed graphs with feedback. They model the world as a collection of concepts and causal relations between these concepts [8, 10]. This provides, in our opinion, a middle ground between the pure connectionist and the symbolic AI approaches. However, there are still few studies on formal representation and inferencing using cognitive maps [1, 10, 14].

Cognitive maps and fuzzy cognitive maps are a relatively new topic within the field of computational intelligence with little formalization. This is one of the weaknesses cognitive maps suffer from. Since cognitive maps were introduced into computational intelligence via the economic sciences, they are often presented as a knowledge representation tool. Consequently, there is very little research on the formalization of cognitive maps in general and on inferencing within cognitive maps in particular. This may hinder their use for the purpose of reasoning. In this paper we use cognitive maps and fuzzy cognitive maps to define links between emotions, emotional states, physical states and actions. This allows us to bypass the feeling interpretation layer [4] of sensory inputs. Instead, sensory data is used directly to estimate emotions and trigger reactions. The aim is to have a more reactive type of robot that can act almost in real time.
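The concepts-and-causal-links structure described above can be sketched in a few lines of code. The following is a minimal illustration only: the concept names, weights and sigmoid squashing function are assumptions for the example, not taken from the paper.

```python
# Minimal fuzzy-cognitive-map sketch: concepts are nodes, signed fuzzy
# weights are causal links, and inference iterates activation updates.
# Concept names and weights here are illustrative, not from the paper.
import math

def squash(x):
    """Sigmoid squashing keeps activations in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def fcm_step(activations, edges):
    """One synchronous update: each concept sums its weighted inputs.
    Concepts with no incoming influence keep their current value."""
    new = {}
    for node in activations:
        incoming = sum(activations[src] * w
                       for (src, dst), w in edges.items() if dst == node)
        new[node] = squash(incoming) if incoming != 0 else activations[node]
    return new

# Example map: a close obstacle raises fear, fear suppresses "advance".
activations = {"obstacle_close": 1.0, "fear": 0.0, "advance": 0.5}
edges = {("obstacle_close", "fear"): 0.8,   # positive causal link
         ("fear", "advance"): -0.9}         # negative causal link

for _ in range(5):                          # iterate until roughly stable
    activations = fcm_step(activations, edges)
print(activations)
```

The signed weights give the "fuzzy signed directed graph" character: a negative weight models inhibition, and iterating the update is the feedback the text mentions.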

2.2 Modelling emotions for control

The emotional model Gadanho and Hallam presented in [5] distinguishes between two notions: emotions and feelings. Feelings, as presented by Gadanho and Hallam, can be seen as sensor fusion processes whose results are then used to approximate emotions. This creates a further level of calculation that separates sensing from emotional estimation. However, a feedback loop allows the approximation of the feelings' intensity based on emotions.

Some similarities can be seen between the Gadanho and Hallam model and the model presented in this paper. First, both use reinforcement learning [4] as the underlying mechanism for action selection. Second, Gadanho and Hallam deploy neural networks, whereas here we deploy cognitive maps; the similarity between neural networks and cognitive maps has already been highlighted. Conversely, the differences between the two models appear in the modeling of emotions and the calculation of action triggers. A more symbolic approach is used in this paper. First, feelings are replaced with a limited set of emotional states. Unlike the Gadanho and Hallam model, we specify a set of emotional states that has a direct impact on behavior. This reductionist approach may neglect some of the interesting analogies with the biological basis of emotions and behavior psychology; nonetheless, it provides us with a model suitable for a variety of robots, especially reactive robots with limited processing resources. In estimating emotions, we calculate emotional intensity instead of feelings intensity. We use cognitive maps to associate emotions, actions and physical states. Fuzzy functions can be used to calculate the relationships between emotions and emotional states. Unlike the case in [5, 16], we do not determine a dominant emotion for any emotional state. Emotional states are used to estimate action emotional triggers. Action costs are evaluated based on physical and emotional states combined.

3 Emotion-Action Modelling

The cognitive map represents the robot's knowledge of its states. As a result, two types of states are required: physical and emotional states. The emotional state is calculated from the set of emotions a robot may have in a given physical state. The physical state, on the other hand, is calculated from the perception of the environment and changes within that environment. To simplify matters, perception [17] will not be discussed here. Instead, physical states are calculated from actions using STRIPS-like assumptions [18], which simplifies sensory fusion.
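The STRIPS-like bookkeeping mentioned above can be illustrated with a short sketch: the physical state is a set of facts, and each action updates it via add/delete lists rather than by re-sensing the environment. The action and fact names below are invented for the example.

```python
# STRIPS-like state update sketch: the physical state is a set of facts,
# and each action transforms it via precondition/add/delete lists.
# Action and fact names are illustrative, not taken from the paper.

actions = {
    "move_forward": {"pre": {"path_clear"},
                     "add": {"moved"},
                     "delete": {"at_start"}},
    "turn_left":    {"pre": set(),
                     "add": {"facing_left"},
                     "delete": {"facing_forward"}},
}

def apply_action(state, name):
    """Return the successor physical state, or the unchanged state
    when the action's preconditions do not hold."""
    act = actions[name]
    if not act["pre"] <= state:          # precondition check (subset)
        return state
    return (state - act["delete"]) | act["add"]

phs = {"at_start", "facing_forward", "path_clear"}
phs = apply_action(phs, "move_forward")
print(sorted(phs))   # ['facing_forward', 'moved', 'path_clear']
```

This is what "simplifies sensory fusion": the successor state is derived from the action model, so the controller does not need to re-interpret raw sensor data after every step.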

3.1 Emotion modeling

In modeling emotions we distinguish between emotions (e), emotional states (es) and emotional triggers (et). Figure 1 shows the directed relationships between physical states (phs), actions (a) and the emotional elements.

Figure 1. Relationship between emotions, actions and the robot's states.

The physical state represents the robot's perception of itself in relation to the physical environment. Changes in the physical state, which are often the result of environmental changes or the displacement of the robot, affect the robot's emotions, which in turn may affect the emotional state of the robot. Since STRIPS-like assumptions are used, actions will be the main cause of physical changes. The emotional state will change less often, and over longer cycles of operation, than the physical state or the emotions. The results of emotional state changes are reflected in the emotional triggers, which form the action selection mechanism. The manipulation and adaptation of the links between emotional states, emotional triggers and actions allow a more adaptive and non-deterministic behavior to emerge. As a result, the system will be capable of coping with a larger range of applications.

The emotional system proposed is based on the assumption of six types of emotions [19], namely fear (F), anger (A), happiness (H), disgust (D), sadness (S) and surprise (S). There are three emotional states: neutral (N), restless (R) and stable (G). The stable emotional state represents the goal state in which the emotional estimation functions terminate. Each emotion is represented as a fuzzy set with a membership function E that defines the intensity of that emotion. Each emotional state is linked to every emotion with a fuzzy function estimating the strength of that link. This estimation is based on the changes in the physical states and on emotional rules determining the meaning of each emotion in relation to the emotional states. The system emotional state, which determines the emotional trigger, is determined through a max function. If the positive or desirable emotions are more apparent in the resulting emotion estimation, the emotional state moves closer to stable, whilst the negative or undesirable emotions bring the emotional state to restless. A balance between the two will maintain the neutral state. Emotional triggers are linked to one or more actions through an estimation function. Based on these links, the system defines which action will be triggered given an association of emotional state and physical state. Emotional triggers appear as selection cases within the reinforcement learning algorithm.
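The max-style state estimation described above can be sketched as follows. The grouping of emotions into desirable and undesirable sets, the intensity values and the tie margin are all assumptions made for the example; the paper does not fix them.

```python
# Sketch of a max-style emotional-state estimation: each emotion has a
# fuzzy intensity in [0, 1]; desirable emotions pull toward stable (G),
# undesirable ones toward restless (R), and a balance keeps neutral (N).
# The groupings, intensities and margin are illustrative assumptions.

DESIRABLE = {"happiness", "surprise"}
UNDESIRABLE = {"fear", "anger", "disgust", "sadness"}

def emotional_state(intensities, margin=0.1):
    """Compare the strongest desirable and undesirable intensities."""
    pos = max(intensities.get(e, 0.0) for e in DESIRABLE)
    neg = max(intensities.get(e, 0.0) for e in UNDESIRABLE)
    if pos - neg > margin:
        return "stable"      # G
    if neg - pos > margin:
        return "restless"    # R
    return "neutral"         # N

print(emotional_state({"happiness": 0.8, "fear": 0.2}))   # stable
print(emotional_state({"fear": 0.7, "happiness": 0.1}))   # restless
print(emotional_state({"fear": 0.5, "happiness": 0.5}))   # neutral
```

A fuzzier variant would weight each emotion's link to each state with a membership function instead of a fixed grouping; the max comparison stays the same.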

3.2 Action modeling

We define two types of actions: primary (PA) and composed (CA). Each composed action is a predetermined sequence of primary actions, which may be viewed as sub-plans or behaviors. A primary cost is attached to each primary action, which is then used in conjunction with the physical-emotional states estimation to determine the primary action cost within the reached physical state:

Cost_a(PA, phs) = |Cost_a(PA) * [e(phs, es' = restless) - e(phs, es = restless)] + [e(phs, es' = neutral) - e(phs, es = neutral)] - [e(phs, es' = stable) - e(phs, es = stable)]|    (1)

Each composed action has a cost calculated as follows:

Cost_a(CA) = γ Σ_i [Cost_a(PA_i) + Cost_v(PA_i)]    (2)

γ is a constant representing the complexity cost of the composed action; in other words, the more complex the composed action is, the higher the cost of that action. Cost_v represents the cost of error adjustment that may be required with the primary action PA_i.

The action cost affects the emotional trigger, which acts mostly as an action selection mechanism. Given an emotional state es, the emotional trigger triggers the low-cost action that leads to the stable (G) emotional state or maintains the neutral (N) one, while the restless (R) emotional state overrides the action cost. Two fuzzy membership functions are used here. The first relates the emotional triggers to the emotional states. The second relates the emotional triggers to the action costs.

3.3 Algorithmic development

A simplified version of reinforcement learning (Algorithm 1), similar to the Q-learning algorithm [20, p. 375], is developed based on emotional state and cost. The cases in the algorithm, which refer to emotional triggers, are self-explanatory since they map to the three possible emotional states.

Algorithm 1 Emotion-based reinforcement learning algorithm (QE)

∀ (phs, es) ∃ a: Initialize an entry QE(a, phs, es) to zero
Observe current physical state (phs) and emotional state (es)
While (Can-Act = true) {
    Select action a and execute it;
    Receive an immediate cost c;
    Observe phs' and es';
    Case Goal-achieved: es' = stable and es' ≥ threshold Then
        Can-Act = false; Break;
    Case Agitation: es' = restless and es' > es Then
        For (phs, es): if a is the only action Then Ignore cost; Break;
        Else: Ignore cost and action; Break;
    Else: Update entry QE(a, phs, es) as follows:
        QE(a, phs, es) = c + min_a' QE(a', phs', es')
}

QE is the costing function using equations 1 and 2. The costing table represents the cognitive map, which could be expanded to a more complex representation [21]. Emotion rules are used with fuzzy operators to estimate the emotional state (es). These rules are represented as simple if-then rules at this stage. The reinforcement algorithm treats primary and composed actions in the same way. A further development is to enable the dynamic formation of composed actions and to provide a polymorphic version of QE that deals with composed actions accordingly. This may be difficult to implement on Lego robots but should be possible on Pioneer robots, as discussed in the next section.
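The core update of Algorithm 1, QE(a, phs, es) = c + min_a' QE(a', phs', es'), with the Goal-achieved case terminating an episode once the emotional state turns stable, can be exercised on a toy problem. The 4-cell corridor world below, its costs, and the rule "the last cell yields a stable state" are invented for illustration; only the update rule and the termination case follow the paper.

```python
# Runnable sketch of the QE update in Algorithm 1:
#   QE(a, phs, es) = c + min over a' of QE(a', phs', es')
# with the Goal-achieved case stopping an episode when es' = stable.
# The corridor world, costs and transitions are illustrative only.

ACTIONS = ["advance", "retreat"]
GOAL = 3                                    # last corridor cell

def step(phs, a):
    """Toy dynamics on a corridor 0..3; advancing costs less."""
    nphs = min(phs + 1, GOAL) if a == "advance" else max(phs - 1, 0)
    es = "stable" if nphs == GOAL else "neutral"
    c = 1.0 if a == "advance" else 2.0      # immediate cost
    return nphs, es, c

QE = {(a, phs, es): 0.0
      for a in ACTIONS for phs in range(GOAL + 1)
      for es in ("neutral", "stable")}

for episode in range(30):
    phs, es, can_act = 0, "neutral", True
    while can_act:
        # Greedy selection over current QE entries (ties broken by order).
        a = min(ACTIONS, key=lambda act: QE[(act, phs, es)])
        nphs, nes, c = step(phs, a)
        if nes == "stable":                 # Goal-achieved case
            QE[(a, phs, es)] = c
            can_act = False
        else:                               # standard QE backup
            QE[(a, phs, es)] = c + min(QE[(a2, nphs, nes)]
                                       for a2 in ACTIONS)
        phs, es = nphs, nes

# Cost-to-go shrinks as the robot nears the stable goal state.
print([QE[("advance", s, "neutral")] for s in range(GOAL)])  # [3.0, 2.0, 1.0]
```

Because QE accumulates cost rather than reward, action selection takes the minimum over entries, mirroring the min in the paper's update rule.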

4 Implementation

The algorithms are being implemented on Lego and Pioneer robots. The differences between the capabilities of these two types of robots assess the suitability of the algorithms for smaller and more reactive robots.

Lego robots have limited memory and processing power, which forced a re-adjustment of the algorithms used. The main adjustment is the restriction placed on the calculation of updates: the calculation of action cost considers only a limited number of future states, after which it resets itself. In addition, the links between emotions, physical states and actions are not modifiable. Early trials show that the robot forgets some of its learned behavior, which is expected. This is not what is aimed for, even though it is somewhat realistic in simulating animals. One of the modifications being tried is the development of 'strip maps'. As a result, the entries of the Q table will be reduced to a minimum. Practically, only the last entry per action, physical state and emotional state will be maintained, which means any possibility of backtracking will be lost. The links between emotional states, physical states, emotional triggers and actions will be updated in line with the Q table. Future work will focus on extending this into a two-dimensional map in which some spatial information is also maintained.

Pioneers do not suffer from Lego's limitations. As a result, the standard algorithm and an extended algorithm are being implemented. The extension lies in the way emotions are calculated: the current emotional state is taken into consideration when calculating the next emotional state [16]. This complicates the setting of the Q table and the selection of entries. This is being implemented and trial results will be reported in a future paper. The hoped-for benefit is a smoother transition between emotions and a reduction in 'emotional outbursts' when changing between emotional states.
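One way to realise the memory-capped 'strip map' idea described above is a Q-table that retains only its most recently updated entries, discarding older ones (and with them any chance of backtracking). This is a speculative sketch of that restriction, not the paper's implementation; the cap size and keys are illustrative.

```python
# Speculative sketch of a memory-capped Q table for the 'strip map'
# restriction: only the most recently updated entries are kept, so
# older entries (and any chance of backtracking) are dropped.
from collections import OrderedDict

class StripQTable:
    def __init__(self, max_entries=8):
        self.max_entries = max_entries
        self.table = OrderedDict()          # (a, phs, es) -> cost

    def update(self, key, value):
        if key in self.table:
            self.table.move_to_end(key)     # refresh recency
        self.table[key] = value
        while len(self.table) > self.max_entries:
            self.table.popitem(last=False)  # forget the oldest entry

    def get(self, key, default=0.0):
        return self.table.get(key, default)

q = StripQTable(max_entries=2)
q.update(("advance", 0, "neutral"), 3.0)
q.update(("advance", 1, "neutral"), 2.0)
q.update(("advance", 2, "neutral"), 1.0)    # evicts the phs=0 entry
print(len(q.table), q.get(("advance", 0, "neutral")))   # 2 0.0
```

Evicted entries fall back to the default cost, which matches the observed effect that the robot "forgets" some learned behavior once the table is capped.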

5 Conclusions and Future work

In this paper, the modeling of emotions and their links to behavior were expressed in the form of cognitive maps. Cognitive maps were discussed in detail. A reductionist approach to emotions, which bypasses feelings, was discussed. In this approach, we do not separate sensor fusion from emotion calculation. Instead, sensors are used directly to calculate emotions. These emotions are then mapped to preset emotional states. A cognitive map is developed to link emotions, emotional states, physical states, actions and action triggers. There are three possible versions of this map. The two used are: (1) with preset weights, and (2) with fuzzy functions to calculate the links' weights. The main contribution of this paper is the development of a control algorithm, based on the Q-learning algorithm, that links sensory data directly to emotion estimation. The aim is to provide real-time or near real-time data processing for the robot's reactions.

Future work includes further expansion of the cognitive map representation and further development of the fuzzy estimators. Also, some spatial information will be encoded in 2-D cognitive maps. This will provide an internal perception-emotion [1] model of self, the environment and eventually other robots sharing the environment. On the theoretical side, we will be looking at the development of 'social cognitive maps' [12]. In addition, the link between cognitive maps and neural networks will then be exploited. Another extension is the dynamic formation of composed actions. The ideal situation is to start the robot with a few primary actions; as the robot is trained, more complex behaviors should emerge, forming composed actions. Achieving this requires further studies in cognitive map formalization, inferencing and self-formation.

References

[1] A. Ayesh, "Perception and Emotion Based Reasoning: A Connectionist Approach," Informatica, vol. 27, pp. 119-126, 2003.
[2] D. Cañamero, "Issues in the design of emotional agents," presented at the AAAI Fall Symposium on Emotional and Intelligent: The Tangled Knot of Cognition, Technical Report FS-98-03, Menlo Park, CA, 1998.
[3] D. D. de Grandpre and D. M. Tucker, "Emotion and the Self-Organization of Semantic Memory," in Learning as Self-organization, K. H. Pribram and J. King, Eds. Mahwah, NJ: Lawrence Erlbaum Associates, 1996, pp. 423-442.
[4] S. Gadanho, "Reinforcement Learning in Autonomous Robots: An Empirical Investigation of the Role of Emotions," PhD thesis, University of Edinburgh, 1999.
[5] S. C. Gadanho and J. Hallam, "Emotion triggered learning in autonomous robot control," Grounding Emotions in Adaptive Systems, special issue of Cybernetics and Systems, vol. 32, pp. 531-559, 2001.
[6] S. C. Gadanho and L. Custodio, "Asynchronous Learning by Emotions and Cognition," presented at the Seventh International Conference on the Simulation of Adaptive Behavior (SAB2002), 2002.
[7] R. L. Solso, Cognitive Psychology, 6th ed. Boston: Allyn & Bacon, 2001.
[8] B. Kosko, "Fuzzy Cognitive Maps," International Journal of Man-Machine Studies, pp. 65-75, 1986.
[9] R. Axelrod, Structure of Decision: The Cognitive Maps of Political Elites. Princeton, NJ: Princeton University Press, 1976.
[10] C. Carlsson and R. Fuller, "Adaptive Fuzzy Cognitive Maps for Hyperknowledge Representation in Strategy Formation Process," presented at the International Panel Conference on Soft and Intelligent Computing, 1996.
[11] E. Chown, S. Kaplan, and D. Kortenkamp, "Prototypes, Location, and Associative Networks (PLAN): Towards a Unified Theory of Cognitive Mapping," Cognitive Science, vol. 19, pp. 1-51, 1995.
[12] E. Laszlo, R. Artigiani, A. Combs, and V. Csanyi, Changing Visions: Human Cognitive Maps: Past, Present, and Future. London: Adamantine Press, 1996.
[13] E. C. Tolman, "Cognitive maps in rats and men," Psychological Review, pp. 189-209, 1948.
[14] M. P. Wellman, "Inference in Cognitive Maps," SIAM Journal on Computing, vol. 36, pp. 1-12, 1994.
[15] M. Pidd, Tools for Thinking: Modelling in Management Science. Chichester: John Wiley & Sons, 2000.
[16] A. Ayesh and J. Cowell, "Emotional analysis of Facial Expressions," presented at IEEE SMC 2004, The Hague, The Netherlands, 2004.
[17] F. Pirri and A. Finzi, "An approach to Perception in Theory of Actions: Part I," ETAI, vol. 4, 1999.
[18] R. E. Fikes and N. J. Nilsson, "STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving," in Readings in Planning, J. Allen, J. Hendler, and A. Tate, Eds. San Mateo, CA: Morgan Kaufmann, 1990, pp. 88-97.
[19] P. Ekman, W. V. Friesen, and J. Hager, Facial Action Coding System, 2002.
[20] T. M. Mitchell, Machine Learning. Singapore: McGraw-Hill, 1997.
[21] D. Randell and M. Witkowski, "Building Large Composition Tables via Axiomatic Theories," presented at the Eighth International Conference on Principles of Knowledge Representation and Reasoning (KR-2002), 2002.

