Coevolutionary Process Control

Jan Paredis
Research Institute for Knowledge Systems
Universiteit Maastricht
Postbus 463, NL-6200 AL Maastricht, The Netherlands
e-mail: [email protected]

A short version of this paper appears in: Proceedings of the International Conference on Artificial Neural Networks and Genetic Algorithms (ICANNGA97), G. D. Smith (ed.), Springer, Vienna, 1997. Please use this as reference.
Abstract. This text describes the use of a Coevolutionary Genetic Algorithm (CGA) for process control. The CGA combines two Artificial Life techniques - life-time fitness evaluation (LTFE) and coevolution - to improve the genetic search for a Neural Network (NN) controlling a given process. Here, the approach is illustrated and tested on a well-known bioreactor control problem which involves issues of delay, nonlinearity and instability. The goal is to control the system such that it is steered towards - and then remains at - a specific point in the state space. Different types of target points are used in the experiments: stable and unstable points. This paper concentrates on the features of CGAs which are particularly relevant for process control.

Keywords: arms races, coevolution, life-time fitness evaluation, neural networks, predator-prey systems, process control.
1. Introduction

Escaping from predators is clearly an essential ability which increases an entity's survival chances and hence also its reproduction chances. In general, predator-prey interactions often result in a selection pressure which enforces evolution's drive towards the creation of highly complex adaptations. These interactions typically result in a strong evolutionary pressure on prey to defend themselves better (e.g. by running faster, growing bigger shields, better camouflage ...), in response to which future generations of predators develop better attacking strategies (e.g. stronger claws, better eye-sight ...). Here, success on one side is felt by the other side as failure, to which it must "respond" in order to maintain its chances of survival. This, in turn, calls for a reaction from the first side. Such an arms race results in a stepwise increase in the complexity of both predator and prey.

Predator-prey coevolution is the main motor behind the coevolutionary genetic algorithm (CGA) presented here. In contrast with traditional "single population" genetic algorithms (GAs), a CGA operates on two or more coevolving populations. Earlier research studied the use of a CGA for two completely unrelated tasks: the search for good classification neural networks (Paredis 1994a) and the search for solutions of constraint satisfaction problems (Paredis 1994b). Both applications have demonstrated the
power of the CGA. It has been shown that a CGA derives its performance from the combination of coevolution and life-time fitness evaluation (see later). In addition, a symbiotic variant of the CGA has been used to search for good genetic representations for a given problem (Paredis 1995; 1996a).

According to the definition of Narendra and Parthasarathy (1990), process control involves the analysis and synthesis of dynamical systems in which one or more variables are kept within prescribed bounds. The current paper demonstrates the use of a CGA for process control and shows how the properties of a CGA can be exploited for it. The goal of this paper is not to compare the performance of a CGA with that of traditional single-population GAs. This was already done in (Paredis 1994a) within the same context of genetic search in the weight space of neural networks (NNs); that paper focused on a classification task.

The structure of this text is as follows: the next section briefly describes the benchmark problem. Section 3 introduces the CGA and its application to process control. Next, empirical results are described. Finally, conclusions are given.
2. The Bioreactor

Figure 1 depicts a bioreactor. Such a reactor is a tank containing water, nutrients and biological cells. Nutrients and cells are introduced into the tank at a rate r. The state of the process is characterized by the number of cells and the amount of nutrients. The outflow rate is equal to the inflow rate; as a consequence, the volume in the tank remains constant. The objective of the bioreactor control problem is to bring - and maintain - the amount of cells in the tank at an a priori given desired level. Anderson and Miller (1991) describe the relevance of this problem as follows: "The bioreactor is a challenging problem for neural network controllers for several reasons. Although the task involves few variables and is easily simulated, its nonlinearity makes it difficult to control. For example, small changes in parameter value can cause the bioreactor to become unstable. The issues of delay, nonlinearity, and instability can be studied with the bioreactor control problem. Significant delays exist between changes in the flow rate and the response in cell concentration."
Figure 1: Schematic representation of a bioreactor (inflow rate = r; c1 = amount of cells, c2 = amount of nutrients; outflow rate = r)
Mathematically, the state of the tank is described by c1 and c2, the amounts of cells and nutrients respectively. The control parameter is r. The variables c1 and c2 are constrained to the interval [0,1]; r lies in the interval [0,2]. The equations of motion describing the system dynamics are given below. Here, c1[t], c2[t], and r[t] represent the values of c1, c2, and r at time t. The state of the entire system is described by the triple (c1[t], c2[t], r[t]). The initial values, c1[0], c2[0], and r[0], are given; they define the control problem to be solved. In the equations, the growth parameter, β, is equal to 0.02 and the nutrient inhibition parameter, γ, equals 0.48. These parameters determine the rate of cell growth and nutrient consumption, respectively. A sampling interval, ∆, of 0.01 seconds is used. Hence, each application of the equations describes the state of the reactor 0.01 seconds later. Equation (1) says that the change in the amount of cells equals the amount by which the cells grow minus the amount of cells carried out of the tank (∆·c1[t]·r[t]). This growth is proportional to the amount of cells (c1[t]) but depends nonlinearly on the concentration of nutrient (c2[t]). The second equation says that the change in the amount of nutrients is determined by the rate at which nutrient is swept out of the system and the rate at which it is metabolized by the cells (Ungar 1991).

Equations of Motion:
c1[t+1] = c1[t] + ∆ · ( −c1[t]·r[t] + c1[t]·(1 − c2[t])·e^(c2[t]/γ) )                              (1)

c2[t+1] = c2[t] + ∆ · ( −c2[t]·r[t] + c1[t]·(1 − c2[t])·e^(c2[t]/γ) · (1 + β)/(1 + β − c2[t]) )    (2)
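For concreteness, the dynamics can be implemented in a few lines. The following Python sketch applies equations (1) and (2) once; the function name and code organization are our own illustration, not part of the original benchmark code.

    import math

    BETA = 0.02    # growth parameter (beta)
    GAMMA = 0.48   # nutrient inhibition parameter (gamma)
    DELTA = 0.01   # sampling interval, in seconds

    def step(c1, c2, r):
        """Advance the bioreactor by one sampling interval (0.01 s).

        Implements equations (1) and (2): cells are washed out at rate r
        and grow nonlinearly in the nutrient concentration; nutrients are
        washed out and metabolized by the cells.
        """
        growth = c1 * (1.0 - c2) * math.exp(c2 / GAMMA)
        c1_next = c1 + DELTA * (-c1 * r + growth)
        c2_next = c2 + DELTA * (-c2 * r + growth * (1.0 + BETA) / (1.0 + BETA - c2))
        return c1_next, c2_next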
The goal of this problem is to bring - and keep - the amount of cells at a desired level during a period of 50 seconds. This involves 5000 applications of the equations of motion given above. During this period, the control parameter r can be adjusted every half second. Hence, the input to the controller is c1[t] and c2[t] for t = 50, 100, ..., 5000. The output is r[t], again for t = 50, 100, ..., 5000. For other values of t, r[t] is equal to r[t−1]. The overall objective is to minimize the cumulative measure:
∑_{t ∈ {50, 100, ..., 5000}} (c1[t] − c1*[t])²                              (3)
In equation (3), c1*[t] is a constant function giving the desired level of c1. Ungar (1991), as well as Anderson and Miller (1991), defines two kinds of bioreactor problems. These involve the control of the bioreactor around stable or unstable desired states (c1*, c2*, r*). Here r* is the known flow rate at which the tank remains at the state (c1*, c2*). All problems start from a state in a region around the target state, i.e. c1[0] is a random number uniformly drawn from the interval [0.9c1*, 1.1c1*]. In a similar way, c2[0] is drawn from [0.9c2*, 1.1c2*] and r[0] from [0.9r*, 1.1r*].

The difference between a stable and an unstable state can be illustrated with a simple example. Consider a pendulum. An example of control around a stable point involves keeping the pendulum pointing downward as much as possible, given small external perturbations. The inverted pendulum illustrates the concept of control around an unstable state: here, the goal is to have the pendulum pointing up all the time, which typically involves carefully balancing it. This intuitive example illustrates the difference in difficulty between both problems: it is much more difficult to stay around an unstable state than around a stable one. Nevertheless, both problems make it possible to illustrate various properties of our CGA.
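Combining the simulation step sketched above with the error measure of equation (3), one 50-second evaluation episode can be written as follows. This is a minimal sketch building on the step function above; the name run_episode and the controller callable are our own illustrative assumptions.

    def run_episode(controller, c1, c2, r, c1_target):
        """Simulate 50 seconds of control; return the error of equation (3).

        Every half second (50 applications of `step`), the controller is
        given (c1, c2) and returns a new flow rate r in [0, 2]; r is held
        constant in between, and the squared deviation of c1 from the
        desired level is accumulated at t = 50, 100, ..., 5000.
        """
        error = 0.0
        for _ in range(100):            # 100 control decisions
            for _ in range(50):         # half a second at Delta = 0.01 s
                c1, c2 = step(c1, c2, r)
            error += (c1 - c1_target) ** 2
            r = controller(c1, c2)
        return error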
2.1 Problem 1

The first problem starts from a region in the state space of the bioreactor from which a stable state has to be reached. Here, this stable state is (c1*, c2*, r*) = (0.1207, 0.8801, 0.75). In other words, these amounts of cells and nutrients can be maintained with a constant flow rate of 0.75. The controller has to steer the system such that c1 reaches c1* and stays there. The system is controlled over a period of 50 seconds.

2.2 Problem 2

The second problem starts from a region in the state space of the bioreactor from which an unstable state (c1*, c2*, r*) = (0.2107, 0.7226, 1.25) has to be reached. Again, the initial values c1[0], c2[0] and r[0] are within 10% of their target values.
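As a sketch of how these random initial conditions can be drawn (the helper below is our own illustration; the text specifies only the uniform 10% intervals and the two target triples):

    import random

    def random_start_state(c1_star, c2_star, r_star):
        """Draw (c1[0], c2[0], r[0]) uniformly within 10% of the target triple."""
        return (random.uniform(0.9 * c1_star, 1.1 * c1_star),
                random.uniform(0.9 * c2_star, 1.1 * c2_star),
                random.uniform(0.9 * r_star, 1.1 * r_star))

    STABLE_TARGET = (0.1207, 0.8801, 0.75)     # problem 1
    UNSTABLE_TARGET = (0.2107, 0.7226, 1.25)   # problem 2

    # 200 such states form the CGA's state population (see section 3):
    start_states = [random_start_state(*STABLE_TARGET) for _ in range(200)]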
3. The CGA

Figure 2 depicts the NN architecture used: a standard multi-layer perceptron with a single hidden layer of 12 nodes. This NN maps the amounts of cells and nutrients in the tank to the flow rate. The maximum activation value of a node is 1, whereas the flow rate r[t] should cover the interval [0,2]. Hence, the activation value of the output node is doubled in order to obtain r[t]; a sketch of this computation is given below the figure.
Figure 2: The architecture of the neurocontroller (inputs c1[t] and c2[t], one hidden layer, output r[t]/2; applied for t = 50, 100, ..., 5000)
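A minimal sketch of this forward pass, assuming sigmoid activations in both layers and one bias per node (the paper does not spell out these details, nor the exact layout of the weight string):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def make_controller(weights):
        """Turn a flat weight string (the genotype) into a controller function.

        Assumed layout: 12 hidden nodes with 2 input weights and a bias each,
        then one output node with 12 weights and a bias (49 numbers in total).
        """
        w_h = weights[:24].reshape(12, 2)   # hidden-layer weights
        b_h = weights[24:36]                # hidden-layer biases
        w_o = weights[36:48]                # output-layer weights
        b_o = weights[48]                   # output bias

        def controller(c1, c2):
            hidden = sigmoid(w_h @ np.array([c1, c2]) + b_h)
            out = sigmoid(np.dot(w_o, hidden) + b_o)
            return 2.0 * out                # double the [0,1] activation to cover [0,2]

        return controller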
Our CGA approach operates on two populations which interact through their fitness. The first population contains NNs. We use a simple genetic representation of a NN: a linear string of its weights, with the weights of links feeding into the same node located next to each other. In accordance with (Paredis 1991) and (Whitley 1993), the weights are encoded directly as real numbers. Furthermore, the weights of the NNs in the initial population lie in the range [-1,1]. This keeps the weight contributions in a range to which the (sigmoidal) transfer function is sensitive (Wieland 1991).

The second population consists of 200 starting states around the goal state. They take the form of triples (c1[0], c2[0], r[0]). These three variables are randomly generated but deviate at most 10% from their "optimal" values c1*, c2* and r*, respectively. In the strictest sense, these states do not form a real population because they are never replaced (in contrast with some of the other CGA applications (Paredis 1995; 1996a)). Hence, the population consists of the same 200 starting states all the time.

In the CGA, the interaction between the populations occurs during encounters between a NN and a starting state. During such an encounter, the NN is used in combination with the simulator. This is done in the following way: starting from the c1[0], c2[0], and r[0] specified by the starting state, the equations of motion (1) and (2) are applied 50 times in order to simulate the bioreactor for half a second. This determines the values of c1[50] and c2[50]
which are then clamped on the corresponding input nodes of the NN. Next, the activation value of the output node is calculated using standard forward propagation. This value is then doubled and assigned to r[50]; in other words, it gives the new value of r. This cycle of simulation followed by the calculation of r is repeated a hundred times.

The error measure, as defined by equation (3), is used to calculate the reward for the two objects involved in the encounter. For the state, this pay-off is equal to the error. The NN, on the other hand, receives a pay-off equal to the negative of the error (see later). This is analogous to the negative fitness interaction in predator-prey systems, in which success on one side entails failure for the other side and vice versa. Each encounter results in updating the fitness of the NN and the starting state involved. The fitness of an object - NN or starting state - is defined as the average pay-off received over the last twenty encounters it was involved in. For this purpose, each object has an associated history which contains the twenty most recently received pay-offs.

Now, the complete algorithm can be described. First, the two populations are created. The members of the initial NN population are strings of randomly generated weights. The population of states consists of randomly generated (c1, c2, r) triples with a maximum deviation of 10% from c1*, c2*, and r*, respectively. Next, the following basic cycle is repeated (the pseudo-code below describes it). First, twenty encounters are executed. Each involves the SELECTion of a state and a NN. This selection is biased towards highly ranked individuals in the population, which is sorted on fitness. The error resulting from the ENCOUNTER is then used to award a pay-off to the NN and the state. This pay-off is pushed on the history of these individuals; at the same time, the oldest pay-off is removed from both histories. Next, the pay-offs in the histories are averaged in order to UPDATE the fitness of the individuals. After the execution of 20 such encounters, two NNs are SELECTed in order to reproduce. Here, standard two-point crossover and adaptive mutation (Whitley 1989) are used. The fitness of the new NN offspring is then calculated. This is done through twenty encounters of the NN with SELECTed states. This fills up the history of the NN with negative errors which are then averaged in order to obtain the initial fitness¹. Finally, the new NN is INSERTed at the appropriate rank location in the NN population. At the same time, the individual with the minimum fitness is removed. In this way, the population remains sorted on fitness. The lines after the DO-loop in the code below are identical to the basic cycle of GENITOR (Whitley 1989), a well-known GA.

DO 20 TIMES
    nn    := SELECT(nn-pop)
    state := SELECT(state-pop)
    error := ENCOUNTER(nn, state)
    UPDATE-FITNESS(nn, -error)
    UPDATE-FITNESS(state, error)
nn1   := SELECT(nn-pop)              ; parent1
nn2   := SELECT(nn-pop)              ; parent2
child := MUTATE-CROSSOVER(nn1, nn2)
f     := FITNESS(child)
INSERT(child, f, nn-pop)
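Below is a minimal Python sketch of this basic cycle, tying together the pieces sketched earlier (step, run_episode, make_controller, random_start_state). The rank-biased selection and the simple Gaussian mutation stand in for the paper's GENITOR-style operators (which use adaptive mutation); the data layout and helper names are our own assumptions.

    import random
    from collections import deque

    import numpy as np

    HISTORY = 20      # pay-offs kept per individual
    N_WEIGHTS = 49    # see the controller sketch above

    def new_individual(payload, fill=None):
        hist = deque([fill] * HISTORY if fill is not None else [], maxlen=HISTORY)
        return {"payload": payload, "history": hist,
                "fitness": fill if fill is not None else 0.0}

    def select(pop):
        """Rank-biased selection: pop is kept sorted best-first."""
        return pop[min(random.randrange(len(pop)) for _ in range(2))]

    def encounter(nn, state, c1_target):
        c1, c2, r = state["payload"]
        error = run_episode(make_controller(nn["payload"]), c1, c2, r, c1_target)
        nn["history"].append(-error)     # predator-prey pay-offs: the NN wants a
        state["history"].append(error)   # small error, the state a large one
        for ind in (nn, state):
            ind["fitness"] = sum(ind["history"]) / len(ind["history"])

    def mutate_crossover(p1, p2):
        """Two-point crossover plus simple Gaussian mutation (illustrative)."""
        a, b = sorted(random.sample(range(N_WEIGHTS), 2))
        w = np.concatenate([p1["payload"][:a], p2["payload"][a:b], p1["payload"][b:]])
        mask = np.random.rand(N_WEIGHTS) < 0.05
        return new_individual(w + mask * np.random.randn(N_WEIGHTS) * 0.1)

    def basic_cycle(nn_pop, state_pop, c1_target):
        """One basic cycle: 20 encounters, then one GENITOR-style reproduction.

        For simplicity the populations are re-sorted before each selection; a
        real implementation would keep them sorted incrementally.
        """
        for _ in range(20):
            nn_pop.sort(key=lambda i: i["fitness"], reverse=True)
            state_pop.sort(key=lambda i: i["fitness"], reverse=True)
            encounter(select(nn_pop), select(state_pop), c1_target)
        nn_pop.sort(key=lambda i: i["fitness"], reverse=True)
        child = mutate_crossover(select(nn_pop), select(nn_pop))
        for _ in range(HISTORY):         # fill the child's history
            encounter(child, select(state_pop), c1_target)
        nn_pop[-1] = child               # replace the least fit NN

    # Initialization for problem 1, following the paper's description:
    c1_t = STABLE_TARGET[0]
    state_pop = [new_individual(random_start_state(*STABLE_TARGET), fill=1.0)
                 for _ in range(200)]
    nn_pop = [new_individual(np.random.uniform(-1, 1, N_WEIGHTS)) for _ in range(100)]
    for nn in nn_pop:                    # initial NN fitness via 20 encounters
        for _ in range(HISTORY):
            encounter(nn, select(state_pop), c1_t)
    nn_pop.sort(key=lambda i: i["fitness"], reverse=True)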
In the pseudo-code above, two parameters are introduced: the number of encounters before each reproduction and the number of most recent encounters used to calculate the fitness of the NNs and states. In the code above, in all experiments in the rest of this paper, and in all other CGA applications, both parameters are - rather arbitrarily - set to 20.

¹ This same procedure is used to compute the fitness of the NNs in the initial population. The history of the members of the initial state population, on the other hand, is filled with 1s. Hence, their initial fitness is equal to 1.
It is important to remark that a "maximization GA" is used here. Hence, highly fit NNs receive small (negative) pay-offs, i.e. they keep the error small. Highly fit states, on the other hand, get relatively high (positive) pay-offs. In other words, a state is considered fit when the NNs cannot achieve good control from it. We use the term life-time fitness evaluation (LTFE) for the continuous fitness feedback resulting from the encounters. Because of the fitness interaction between states and NNs, both populations are kept in constant flux. As soon as the NNs become successful on a certain set of states, these states get a lower fitness and other, more difficult states are selected more often. This forces the CGA (or, more specifically, the selection process) to concentrate on those states too (because of the rank-biased SELECTion of the pairs of individuals involved in an encounter). As Hillis (1992) argues, this has two advantages over the traditional approach. First, it helps prevent large portions of the NN population from becoming stuck in local optima. Second, fitness testing becomes much more efficient: one mainly focuses on states that are not yet solved. The results of the experiments in the next section show the power of CGAs as well as their applicability to control problems.
4. Empirical Results

Unless stated otherwise, all experiments described here use the following parameter values: the NN population has size 100; the population of starting states has size 200; the hidden layer of the NN contains 12 nodes.
4.1 Problem 1

Figure 3 depicts the behavior of the highest ranked NN obtained after a typical run of 5000 basic cycles. Here, the experiment starts from the state (c1*, c2*, r*). One clearly sees a quick damping of the number of cells. The system stabilizes at 0.12144, which is only about 0.6% away from the desired value of 0.1207. It is good to keep in mind that the CGA has been trained on starting states in a region around (c1*, c2*, r*). It was, however, not trained on these specific values. Hence, this plot - to some extent - illustrates how the fittest NN generalizes.

Figure 3: Control achieved after a typical CGA run (y-axis: number of cells c1; x-axis: simulated time in 0.5 seconds)

Now we can also investigate the ranking of the starting states in their population. Figure 4 compares the control achieved by the best NN starting from the fittest and from the least fit starting states. These tests were done after the execution of 5000 basic cycles. The curve labelled c1-opt gives, again, the control achieved by the fittest NN starting from (c1*, c2*, r*). The curves labelled c1-ex1 and c1-ex2 depict the control achieved by this same NN starting from the two fittest states. Similarly, ex199 and ex200 are the two least fit examples. Both graphs have the same scale, which allows for easy comparison between them. First of all, the good control starting from the stable state can be observed. Secondly, it is clear that control starting from the fit states is worse than control starting from the less fit states (compare the amplitude of the oscillations in both graphs). This clearly illustrates that LTFE
ranks the states. In other words, states from which the current NN population achieves poor control have a high fitness.
Figure 4: Control achieved starting from states with high (left) and low (right) fitness. Both panels plot the number of cells against simulated time (in 0.5 seconds); the left panel shows curves c1-opt, c1-ex1 and c1-ex2, the right panel c1-opt, c1-ex199 and c1-ex200.
This same property is also visualized in figure 5. The x-axis of both graphs in this figure represents the ranking of the starting states: x = 1 corresponds to the fittest starting state, the fitness decreases as x increases, and the least fit starting state is located at x = 200. The y-axis represents the Cartesian distance between the triple (c1[0], c2[0], r[0]) of the starting state and the target state (c1*, c2*, r*). The left scatter diagram depicts the situation before the first cycle, i.e. when the initial population of states has just been filled randomly. This scatter diagram exhibits no structure. This is in contrast to the right scatter diagram, which depicts the situation after 5000 basic cycles. Here, there is clearly some structure: the fittest starting states are obviously those farthest away from the target state. The starting states close to the target state, on the other hand, are primarily located at the bottom of the population ranking. This is because these states are easier to control.
Figure 5: Ranking of the states versus distance from the stable state. Left: initial state population. Right: state population after the execution of 5000 cycles. (x-axis: start states ranked 1-200; y-axis: distance from the target state, 0-0.125.)
4.2 Problem 2

Let us now look at the performance of our CGA around the unstable state of problem 2. Figure 6 illustrates the progress made during a run. It compares the control obtained during a typical run after 1500 and 3000 executions of the basic cycle, respectively. This figure shows the evolution of the variables c1, c2 and r when the fittest NN is used, starting from the state (c1*, c2*, r*).
Figure 6: Control achieved by the best NN after 1500 (left) and 3000 (right) basic cycles. Both panels plot c1, c2 and r (y-axis: 0 to 1.5) against simulated time (in 0.5 seconds).
Figure 6 shows that control improves during evolution. Remember that the initial values of c1, c2 and r in these graphs are the optimal ones. Hence, during the simulation depicted in the graphs, these values should remain as close as possible to the initial ones. Furthermore, it is good to emphasize that the fitness function only takes into account the deviation of c1 from c1*. Figure 7 zooms in on the fluctuations of c1 after 1500 and 3000 executions of the basic cycle. The left graph in this figure shows the control of c1 in both cases. It clearly shows the decrease in the amplitude of the oscillations. The right graph magnifies the control of c1 obtained after 3000 executions of the CGA's basic cycle. The fittest NN obtained after 3000 basic cycles not only achieves a smaller deviation, it is also able to reduce the amplitude of the deviation during the control period. In other words, control does not only improve from generation to generation, it also improves during the 50 seconds of simulated time.
Figure 7: The control of c1 achieved by the fittest NN after 1500 (curve c1-1500, left) and 3000 cycles (curve c1-3000, left and right). The right panel magnifies the curve after 3000 cycles (y-axis: 0.208-0.218); both x-axes show simulated time (in 0.5 seconds).
5. Conclusions

This text describes the application of CGAs to process control. Hillis (1992) provided the basic inspiration for the work on CGAs: he coevolved (using predator-prey interactions) sorting network architectures and sets of lists of numbers on which the sorting networks are tested. The partial and continuous nature of LTFE is ideally suited to deal with coupled fitness landscapes, and coevolutionary interacting species typically give rise to such coupled landscapes. The incorporation of LTFE is an important difference with Hillis' work. As a matter of fact, LTFE is much closer to reality than the traditional "all at once" fitness calculation.

Over the last two years, CGAs have been used in various applications (Paredis 1994a & b). Moreover, the CGA can easily be extended to incorporate symbiotic interactions (Paredis 1995; 1996a). All this earlier work showed several advantages of the combined use of coevolution and LTFE, such as increased performance (in terms of solution quality as well as computational demands) and noise tolerance.

With respect to process control, an important question has now to be answered: how large is the class of control problems which can be solved with a CGA? Narendra and Parthasarathy's definition of process control given in the introduction of this paper sheds some light on this issue. According to this definition, the main task of process control is to "keep one or more variables within prescribed bounds". This immediately suggests the fitness criterion to be used: the amount of deviation outside the prescribed boundaries (as, for example, in equation 3). Specific control problems typically specialize the general definition above. In the bioreactor example described here, control was learned from different starting states. Other problems, for example, involve various noise patterns the controller should be able to deal with. In this case, the CGA would use a population of noise patterns. In general, the second population consists of tests for the controller. These tests can take different forms: starting states, noise patterns, etc. Furthermore, additional learning algorithms can be used during an encounter to tune the weights of a NN. This way, genetic and life-time learning can be combined. This point certainly warrants further research.
Acknowledgements

This research is sponsored by the European Union as Brite/Euram Project No. BE-7686 (PSYCHO). The author is indebted to Roos-Marie Bal for proof-reading this paper.
References

Anderson, C. W., Miller III, W. T., (1991), Challenging Control Problems, in Neural Networks for Control, W. T. Miller, R. S. Sutton, P. J. Werbos (eds.), MIT Press / Bradford Books.

Hillis, W. D., (1992), Co-evolving Parasites Improve Simulated Evolution as an Optimization Procedure, in Artificial Life II, Langton, C. G., Taylor, C., Farmer, J. D., and Rasmussen, S. (eds.), Addison-Wesley, California.

Narendra, K. S., and Parthasarathy, K., (1990), Identification and Control of Dynamical Systems using Neural Networks, IEEE Transactions on Neural Networks, 1(1), IEEE Computer Society Press.

Paredis, J., (1991), The Evolution of Behavior: Some Experiments, in Proceedings of Simulation of Adaptive Behavior: From Animals to Animats, Meyer, and Wilson (eds.), MIT Press / Bradford Books.

Paredis, J., (1994a), Steps towards Coevolutionary Classification Neural Networks, in Proc. Artificial Life IV, R. Brooks, P. Maes (eds.), MIT Press / Bradford Books.

Paredis, J., (1994b), Coevolutionary Constraint Satisfaction, in Proc. PPSN-III, Lecture Notes in Computer Science, vol. 866, Davidor, Y., Schwefel, H.-P., Manner, R. (eds.), Springer Verlag.

Paredis, J., (1995), The Symbiotic Evolution of Solutions and their Representations, in Proceedings of the Sixth International Conference on Genetic Algorithms (ICGA 95), Eshelman, L. (ed.), Morgan Kaufmann Publishers.

Paredis, J., (1996a), Symbiotic Coevolution for Epistatic Problems, in Proceedings of the European Conference on Artificial Intelligence 1996 (ECAI 96), John Wiley & Sons (in press).

Paredis, J., (1996b), Coevolutionary Computation, Artificial Life Journal, vol. 2, nr. 4, Langton, C. (ed.), MIT Press / Bradford Books (in press).

Ungar, L. H., (1991), A Bioreactor Benchmark for Adaptive Network-based Process Control, in Neural Networks for Control, W. T. Miller, R. S. Sutton, P. J. Werbos (eds.), MIT Press / Bradford Books.

Whitley, D., (1989), Optimizing Neural Networks using Faster, more Accurate Genetic Search, in Proc. Third Int. Conf. on Genetic Algorithms, Morgan Kaufmann.

Whitley, D., (1993), Genetic Reinforcement Learning for Neurocontrol Problems, Machine Learning, 13, p. 259-284, Kluwer Academic Publishers.

Wieland, A. P., (1991), Evolving Controls for Unstable Systems, in Connectionist Models: Proc. of the 1990 Summer School, D. S. Touretzky, J. L. Elman, T. J. Sejnowski, G. E. Hinton (eds.), Morgan Kaufmann.