Using Evolution for Thinking and Deciding

F. Bellas, J. A. Becerra, R. J. Duro
Grupo de Sistemas Autónomos
Universidade da Coruña
Mendizábal s/n, 15403 Ferrol, SPAIN

Abstract: This paper addresses the problem of implementing mechanisms that permit artificial agents to obtain original solutions to problems and, at the same time, make use of their previous experience in as unpredetermined a way as possible. This is achieved through a cognitive mechanism that uses its interaction with the surroundings to improve the level of satisfaction obtained. The mechanism is based on Darwinist principles and involves the concurrent operation of evolutionary processes on two levels. The first, or unconscious, level consists of evolutionary processes over models of the environment, which are evaluated according to how good they are at predicting the next perceptions. This leads to a current, or conscious, model, which is employed for evaluating strategies, using as fitness the level of satisfaction of the motivations of the artificial organism. Once a strategy is selected as the current one, it is carried out by means of the effectors that act on the real environment, returning a fitness measure for the evolution of the models. This cycle is repeated and, as time progresses, the models and strategies the organism works with become better adapted to the fulfillment of its motivations. This type of mechanism presents several advantages over other structures, such as its capability of obtaining original solutions and, at the same time, its capacity to make use of previous experience. In addition, if the environment changes, the mechanism will adapt efficiently.

Key-Words: Cognitive Mechanism, Evolutionary Brain, Artificial Neural Networks, Autonomous Robots.

1 Introduction
One of the main points of interest in autonomous robotics is how to provide artificial structures with cognitive mechanisms whose operation and adaptability to changing environments, requirements and tasks are not predetermined by the designer. In other words, mechanisms that allow the artificial agent to organize its own knowledge from its own perceptions and needs, and that allow it to survive and operate in an efficient manner in its environment. In the stride towards this goal many approaches have been followed, some completely original and others inspired by natural ways of doing things. In this paper we consider a model based on some not very well known theories within the field of cognitive science that relate the brain or neural structure to its operation through a Darwinist, or evolutionary, approach in somatic time. These theories are: the Theory of Evolutionary Learning Circuits (TELC) [1], [2]; the Theory of Selective Stabilization of Synapses (TSSS) [3], [4]; the Theory of Selective Stabilization of Pre-Representations (TSSP) [5]; and the Theory of Neuronal Group Selection (TNGS) or "Neural Darwinism" [6]. For an excellent review see [7]. Taking inspiration from these theories, we have developed an operational cognitive mechanism for

artificial organisms. The problem can be formulated in the following terms: how can we provide our organisms with an underlying cognitive mechanism that does not predetermine their behavior or learning and that allows them to find creative and original solutions in an autonomous way? Our approach to this question was initially presented in [8], and since then we have developed it to provide the agent with a Darwinist cognitive mechanism that allows it to acquire, in its lifetime, a model of the environment or environments in which it lives and of the consequences of its actions on its internal state. This model is obtained and continuously updated through an evolutionary process. There are few examples in the behaviour-based robotics approach and, in particular, in the evolutionary robotics literature, where explicit models of the environment the robot operates in are used. Watson [9] presents a system that starts from pre-trained building blocks and behaviour sequences and, when released into a real-time environment, adds its experiences to memory, building new behaviour sequences, rules and procedures and deleting unused ones through a genetic algorithm. Nordin et al. [10] employ a memory-based genetic programming

mechanism in order to obtain a cognitive architecture for a Khepera robot that makes use of previous experience in its interaction with the world. Basically, a planning process can incorporate a GP system that is used to evolve a suitable plan for optimizing the outcome given the best current world model. In [11], the author introduces a selectionist mechanism, which he calls the selectron, for the evolution of new behavioral competencies of robotic agents. This mechanism may be used on-line and on a real robot as it operates in a changing environment. The approach we follow here is based on the development of an artificial nervous system that is used as a base for achieving the objectives set for the autonomous agent. In section 2 we present the basic cognitive architecture. Section 3 is devoted to the operation of the mechanism. In section 4 we provide some simple application examples and in section 5 we discuss the results. Finally, in section 6 we indicate some conclusions and the work we will be carrying out in the near future.







2 Cognitive architecture
The cognitive architecture developed is based on three blocks that are interrelated to form a closed loop between the environment and the artificial organism (figure 1). These blocks handle three types of representations or functional units, which are combined at each instant of time to obtain the best global structure for the robot according to the situation it finds itself in. These three functional units are:

• World Models: A world model is a computational structure that permits evaluating the consequences of actions on the environment as perceived by the agent (sensed values in t+1).

• Internal Models: Computational structures that permit predicting the internal state in instant t+1 from the sensory inputs predicted for t+1 (provided by the world model) and the motivation. That is, an internal model provides an indication of the satisfaction factor the agent uses to make decisions.

• Strategies: These are actions or sequences of several actions.

For the mechanism to function, these three elements must be structured and combined in a way that permits a fluid operation. In our case, the three basic blocks of the mechanism are:

• World Model Evolver: A structure that manages world models through evolution. It evolves a population of world models and at each moment of time selects the best one according to the information it has available about the real world.

• Internal Model Evolver: Manages an internal model base, evolving it and providing the best internal model available when one is requested.

• Strategy Evolver: Manages and evolves strategies, providing one whenever the agent needs it.

Fig. 1: Block diagram of the cognitive structure
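To make the division of roles concrete, the following minimal Python sketch outlines the three functional units and a generic evolver. All names, signatures and the selection scheme are our own illustrative assumptions, not the authors' implementation; in the paper the models are artificial neural networks whose weights act as the chromosomes of a genetic algorithm.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence

Perception = Sequence[float]
Strategy = Sequence[float]  # an action or a sequence of actions


@dataclass
class WorldModel:
    # Predicts the perceptions at t+1 from the perceptions and action at t.
    predict: Callable[[Perception, Strategy], Perception]


@dataclass
class InternalModel:
    # Predicts the internal state (satisfaction) at t+1 from the perceptions
    # predicted for t+1 and the motivation.
    satisfaction: Callable[[Perception, float], float]


class Evolver:
    """Generic evolver: keeps a population, evolves it, and hands out the
    member that currently has the highest fitness."""

    def __init__(self, population: List, fitness: Callable[[object], float],
                 mutate: Callable[[object], object]):
        self.population = population
        self.fitness = fitness
        self.mutate = mutate

    def best(self):
        return max(self.population, key=self.fitness)

    def step(self) -> None:
        # Toy generational step: keep the better half, refill with mutated
        # copies (a stand-in for the paper's genetic algorithm over weights).
        ranked = sorted(self.population, key=self.fitness, reverse=True)
        half = ranked[: max(1, len(ranked) // 2)]
        self.population = half + [self.mutate(random.choice(half))
                                  for _ in range(len(ranked) - len(half))]
```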

3 Operation
The operation of the structure described in the previous section is divided into two parts:

1. Internal operation: The internal operation may be likened to a thought process. That is, during this stage, and without any interaction with the external world, the mechanism explores possible solutions to its problems. It uses the information it has compiled through its models and what it knows of the environment to obtain better models, deciding in the end on the best strategy to employ.

2. External operation: The selected strategies are applied in the environment, producing new values the sensors can measure and, thus, new information the agent can use to improve its models of the world. Consequently, we can say that the environment, through the sensory inputs, evaluates the world models and the internal models, and that the current model, through the predicted satisfaction of the motivations, evaluates the strategies.

The key idea in the operation of the mechanism is how the thought process takes place. Each block is managed through a Darwinist process. The prototypical Darwinist strategies in computation are genetic algorithms and other evolutionary techniques. The controllers these processes handle are Artificial Neural Networks, whose parameters constitute the "chromosomes" of the genetic algorithms, although any other computational structure could be used.

As regards the interaction with the environment, it takes place by means of the sensory inputs and the actions performed by the agent through its effectors. These interactions only use the language induced by the sensory and actuation mechanisms available. The action-perception pairs obtained from the environment are the only information the agent has on the nature and mechanics of the environment, and the only information that can be used to improve the models. Our mechanism preserves a short-term memory of these pairs, which is used in the evolutionary process that evolves world models. The inputs to the world models are the perceptions and actions at time t and the outputs are the predicted perceptions at time t+1. At each moment of the interaction we select the model with the highest fitness value as the current model.

In addition, from the perceptions predicted by the current world model and the objectives or phobias included in the motivation of the organism, the internal model provides the expected satisfaction index for these perceptions. The internal model evolver uses the data obtained from the short-term event memory and the motivation satisfaction history to evolve the best internal model possible, which is set as the current internal model.

Finally, when trying to decide the best strategy for the agent at a given moment of time, the set of strategies is evolved. The current world model is used to predict the effect of each strategy on the agent's future perceptions. These predicted perceptions are run through the current internal model, whose output, the predicted satisfaction of the motivational index, becomes the fitness value for each strategy. The strategy that obtains the highest fitness is then selected as the current strategy and applied on the environment.

This basic cycle is repeated indefinitely. It implements a parallel processing scheme (through the evolution of populations of possible solutions) that leads to sequential solutions by selection, with very few constraints on the type of solutions that may result. It also provides the opportunity for generating new, un-programmed solutions and the capacity for predicting and planning.
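Tying the pieces together, one interaction of the mechanism might look as follows. This is a sketch only: the agent, environment and memory objects are assumed interfaces built on the hypothetical Evolver above, and the generation count is a placeholder (section 4 reports about 4 generations per interaction on a 50-instant memory).

```python
def cognitive_cycle(agent, environment, memory, motivation, generations=4):
    """One full perception-thought-action cycle (illustrative sketch)."""
    # Unconscious level: evolve world models; their fitness is how well they
    # predict the (perception, action) -> next-perception pairs in memory.
    for _ in range(generations):
        agent.world_evolver.step()
    world_model = agent.world_evolver.best()  # the current "conscious" model

    # Evolve internal models against the motivation-satisfaction history.
    for _ in range(generations):
        agent.internal_evolver.step()
    internal_model = agent.internal_evolver.best()

    # Conscious level: a strategy's fitness is the satisfaction predicted by
    # running it through the current world model and then the internal model.
    def strategy_fitness(strategy):
        predicted = world_model.predict(memory.last_perception(), strategy)
        return internal_model.satisfaction(predicted, motivation)

    agent.strategy_evolver.fitness = strategy_fitness
    for _ in range(generations):
        agent.strategy_evolver.step()
    strategy = agent.strategy_evolver.best()

    # External operation: act on the real environment; the new sensed values
    # are stored and will evaluate the models in the next cycle.
    next_perception = environment.execute(strategy)
    memory.store(memory.last_perception(), strategy, next_perception)
    return strategy
```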

4 Application
To demonstrate the validity of the concepts applied in this work to the generation of coherent cognitive structures that allow artificial beings to adapt to different environments in an efficient manner, we present two very simple, but illustrative, application examples. In both cases we make use of a three-legged robot performing operations in a simulated world. First, we let our tripod learn to stand up from whatever initial state it finds itself in, and without any indication of what standing up means except that the internal satisfaction (motivational) index is higher the higher the center of mass of the organism; a sketch of this index is given below. This example will allow us to give some indication of how the mechanism goes about doing its job and how models and strategies are evolved. In the second example, we include a sensing apparatus in the tripod that is equivalent to two light sensors. Here the satisfaction of the robot is not statically obtained, as it must learn to act differently depending on the position of the light with respect to itself and on how satisfied it feels. That is, to feel well, the robot must follow the light as closely as possible until its "light reservoir" is filled; after this, light "hurts" and it must keep away from it as much as possible within the limits set by the experiment. This will provide an indication of how well the mechanism works in cases where the objectives of the artificial organism change in time and it must adapt its strategies to fulfil these new objectives. In both experiments the world model base and the internal model base are made up of populations of artificial neural networks. The strategy base consists of sequences of actions of different lengths. All three structures are evolved, as indicated before, using the sensory information acquired by the organism through its interaction with the world and stored in the short-term event memory (usually 50 instants long). The evolutionary processes run between interactions with the world, usually for 4 generations.
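The motivational index of the first experiment reduces to a one-line function: satisfaction grows with the height of the center of mass. A minimal sketch follows; the normalization bound is our own assumption, as the paper only states that the index is higher the higher the center of mass.

```python
def standup_satisfaction(center_of_mass_height: float,
                         max_height: float = 30.0) -> float:
    """Motivational index for experiment 1: the higher the robot's center of
    mass, the more satisfied it is. max_height is an illustrative bound."""
    return center_of_mass_height / max_height
```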

4.1 Learning to Stand Up
As a first example, we have considered a three-legged robot in a very simple environment. The robot must learn by itself to stand up on its three legs starting from any configuration, much like a baby learning to stand. The world models are neural networks with 6 inputs (the sensed values and the strategy at t) and 3 outputs (the predicted sensing at t+1). The internal models are also neural networks, with 4 inputs (the sensed values at t+1 predicted by the current world model and the motivation) and, in this simple case, just one output (the predicted internal state at t+1). In both cases we evolve the weights of the neural networks using as fitness function the similarity, in the form of a Euclidean distance, between the values predicted by the networks and those obtained in reality when the robot finally interacts with the world through the selected strategies.
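A minimal sketch of this fitness computation, assuming the 50-instant short-term event memory described above and a hypothetical model interface of our own (the paper gives no implementation details); the distance is negated so that higher fitness is better.

```python
import math
from typing import Callable, List, Sequence, Tuple

# One stored event: (perceptions at t, strategy at t, perceptions at t+1).
Event = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def world_model_fitness(
        predict: Callable[[Sequence[float], Sequence[float]], Sequence[float]],
        memory: List[Event]) -> float:
    """Fitness of a world model: negated mean Euclidean distance between the
    perceptions it predicts and those actually sensed afterwards."""
    total = 0.0
    for perception, strategy, actual_next in memory:
        predicted = predict(perception, strategy)
        total += math.dist(predicted, actual_next)  # Euclidean distance
    return -total / len(memory)
```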

Initially, the organism and its cognitive mechanism are made up of randomly generated neural networks and strategies, and thus know nothing of the world they are in or of themselves. They have no idea of the perception+action → consequent-perception pairs that rule their interaction with the world, or of the perception → satisfaction-index pairs that would allow them to select the strategies that bring maximum satisfaction in a rapid manner. With time, through trial and error, the robot obtains a better representation of the world and is able to use it to finally generate coherent strategies that achieve its objective, in this case, to stand up. We display this process in figure 2, which shows snapshots of five instants in the life of an organism learning to stand up (the learning period was about 500 instants). Initially, the actions the organism performs are pseudorandom and allow it to explore the world (first three snapshots). Once its representation of the world is good enough, it is able to consistently reach the goal (last two snapshots).

To demonstrate the adaptability of this mechanism to a changing world or to changing requirements on the individual, we have run a simulation where we changed the objective, that is, the maximum-satisfaction state of the robot, every 20 instants of time, and allowed the robot to use the information from its world model to rapidly obtain new strategies leading to the optimal state. Figure 3 shows the global position of the robot after the execution of each strategy; the red arrows represent the desired position. As can be seen, in many cases there is a transient period before the robot reaches the objective. This transient corresponds to the models being learned in a stable manner.

[Figure 2/3 plot area: robot position vs. iteration (0-800), titled "Execution of strategies in the environment"; see the captions below.]

Fig. 2: The tripod robot reaches three intermediate positions (14.34, 18.89, 23.40) before the desired one (28), which is reached (28.02) and maintained.

Fig. 3: Position of robot (line) and desired position (arrows) in an experiment where the objectives are changed with time.

4.2 Following a light

As a second example, we have added two light sensors to the tripod robot and introduced it into a world where there is a moving light. To be satisfied, the robot must follow the light as closely as possible until its "light reservoir" is filled. After this, the light "hurts" and the robot must keep away from it as much as possible within the limits set by the experiment. Thus, in this experiment, in addition to developing world models of the effects of its actions on its perceptions, the internal model of the robot must adapt to two different circumstances: when the light is desirable and when the light is harmful. The whole mechanism must adapt

and generate coherent strategies in both cases. In figure 4 we show the setup of the problem and in figure 5 the result obtained by the robot using the cognitive mechanism presented above. In this case, the ANNs for the world models have 3 inputs (the light perceived by each of the light sensors in figure 4 and the motion to be performed by the robot legs) and two outputs (the perceptions after the movement). The ANNs for the internal models have two inputs (the perception of light predicted by the world model) and one output (the internal state). The robot is run for 50 instants of time following the light, which it does perfectly well, as shown in area A of figure 5. After this period the light begins hurting, that is, the motivational index is changed and the whole mechanism must adapt in order to obtain strategies that allow the robot to avoid the light as much as possible (area B of figure 5). It can be seen that the robot adopts a strategy of escaping from the light to its extreme position until the light comes relatively close, and then it just goes to the other end of its range. This is a very good strategy and, what is more relevant, the mechanism was able to adapt to the change in objectives in no more than one interaction with the environment.
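The two-phase motivation of this experiment can be summarised in a small sketch. The reservoir capacity, distance measure and normalization are our own illustrative assumptions; the paper only specifies that the light is desirable until the reservoir fills and harmful afterwards.

```python
def light_satisfaction(distance_to_light: float, reservoir: float,
                       reservoir_capacity: float = 50.0,
                       max_distance: float = 30.0) -> float:
    """Motivational index for experiment 2 (illustrative constants).
    While the reservoir is filling, being near the light is satisfying;
    once it is full, the light "hurts" and distance becomes satisfying."""
    if reservoir < reservoir_capacity:
        return 1.0 - distance_to_light / max_distance  # seek the light
    return distance_to_light / max_distance            # avoid the light
```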

[Figure 4 schematic: the two sensor-to-light distances D1 (5.0-27.5) and D2 (5.0-27.5), with sensor ranges 3-30.]

Fig. 4: Setup for experiment 2.

[Figure 5 plot: height (0-35) vs. time, measured in interactions with the environment (0-100), with regions A and B marked.]

Fig. 5: Example 2. Trajectory of the robot (solid line) following a light (circles) for 50 instants of time, after which the goal is changed to avoiding the light.

5 Discussion
When applying this mechanism, as in the case of any living organism in the first stages of learning, the actions performed cannot be called strategies with an objective; they are basically randomly selected actions. Strictly speaking, the selection of the actions is not random, as the world models in the model base are used to predict their consequences and the internal models to predict internal satisfaction; but as these models are initially very poor, or even completely random, the resulting actions will also be very poor with respect to the objective that must be achieved. These initial actions provide the organism with sets of experimental data on the action-perception-satisfaction correspondences in the real world and, as time passes, the evolution of the model bases using these data makes the models better adapted to the real world the robot finds itself in and to the objective it must fulfill for its satisfaction index to go up.

After a certain amount of interaction, the models, both world and internal, become good enough for the strategies selected using them to be meaningful. The operation of the organism then becomes purposeful. Thus, it ends up achieving the objectives set for it and, what is more important, in a consistent manner. This is very important, as the objective of this whole cognitive mechanism is not only to make the organism achieve its goals (there are many other techniques that would allow it to achieve individual objectives faster and in a simpler way), but to provide it with an internal mechanism that permits learning the structure of the world it finds itself in, and of itself, so that it can evaluate the best medium-term strategy without interacting with the world until it has selected that strategy. Put more strongly, it is a way of allowing the organism to learn to think and decide before acting.

This is the basic idea in experiment 2. For the first 50 instants of time the robot, using the models of the world and of itself that it is developing, and through their evolution and selection, thinks out the best actions to perform according to the perceptions it receives. Only after this thought process does it perform the action, and, as shown in the figure, the action is usually quite appropriate. At instant of time 50 the robot starts perceiving light as a harmful entity and, consequently, must change the prediction of the satisfaction a given predicted perception is going to provide, so as to be able to obtain the best possible way of avoiding light. We would like to remark that this change of motivation does not imply a change in the world models; the mechanism must simply select strategies with a different criterion. As shown in figure 5, the model base it has and the evolutionary mechanism employed allow it to start doing this in just one interaction with the world, and after 5 interactions it becomes quite good. The

first avoidance actions are relatively clumsy, as seen in the figure, since the robot is adapting its internal model to the new circumstances, but the remaining transitions are quite sharp and the decisions it makes are crisp.


6 Conclusions
In this paper we address the problem of how a cognitive operation model for an artificial organism, based on a Darwinist mechanism, can work so as not to predetermine the behaviors of the organism. The model provides an "evolutionary learning" mechanism that may be employed on-line in dynamic environments, where the final model is a summary of the correlation of the information the organism has considered relevant for the required behaviors. This structure permits artificial organisms to think things out before acting and provides a framework for learning to take place. It is important to note that during this thinking process two things must occur: it must be possible to generate original solutions (strategies) that the organism may never have used before, and it must be possible to make use of previous experience. This is achieved through the use of evolutionary processes that allow new solutions to arise and, at the same time, permit combining, in a meaningful way, previously acquired experience in the form of world or internal models or in the form of successful strategies. The results obtained through the implementation of this mechanism are very promising and allow artificial organisms to interact in a seamless way with simple environments. We are currently working on the application of the structure to more complex cognitive processes of real robots operating in real environments.

References:

[1] Conrad, M., Evolutionary Learning Circuits, J. Theor. Biol., Vol. 46, 1974, p. 167.
[2] Conrad, M., Complementary Molecular Models of Learning and Memory, BioSystems, Vol. 8, 1976, pp. 119-138.
[3] Changeux, J.P., Courrège, P., and Danchin, A., A Theory of the Epigenesis of Neural Networks by Selective Stabilization of Synapses, Proc. Natl. Acad. Sci. USA, Vol. 70, 1973, pp. 2974-2978.
[4] Changeux, J.P., and Danchin, A., Selective Stabilization of Developing Synapses as a Mechanism for the Specification of Neural Networks, Nature, Vol. 264, 1976, pp. 705-712.
[5] Changeux, J.P., Heidmann, T., and Patte, P., Learning by Selection, in P. Marler and H.S. Terrace (Eds.), The Biology of Learning, Springer-Verlag, Berlin, 1984, pp. 115-133.
[6] Edelman, G.M., Neural Darwinism: The Theory of Neuronal Group Selection, Basic Books, 1987.
[7] Weiss, G., Neural Networks and Evolutionary Computation. Part II: Hybrid Approaches in the Neurosciences, Proceedings of the First IEEE Conference on Evolutionary Computation, Vol. 1, 1994, pp. 273-277.
[8] Prieto, F., Duro, R.J., and Santos, J., Modelo de Mecanismo Mental Darwinista para robots móviles autónomos (A Darwinist Mental Mechanism Model for Autonomous Mobile Robots), Proceedings of SAAEI'96, Zaragoza, Spain, 1996.
[9] Watson, J.B., Behavior-Based Control for Autonomous Robotics, Proceedings of the 3rd Annual Conference on Evolutionary Programming, A.V. Sebald and L.J. Fogel (Eds.), World Scientific, Singapore, 1994, pp. 185-190.
[10] Nordin, P., Banzhaf, W., and Brameier, M., Evolution of a World Model for a Miniature Robot Using Genetic Programming, Robotics and Autonomous Systems, Vol. 25, 1998, pp. 105-116.
[11] Steels, L., Emergent Functionality in Robotic Agents through On-line Evolution, Artificial Life IV, 1995, pp. 8-14.
