Equilibrium States of Iterated Random Maps Arising in Evolutionary Algorithms

German Hernandez, Fernando Niño
Dept. of Mathematical Sciences, The University of Memphis and Universidad Nacional de Colombia

Anthony Quas, Dipankar Dasgupta
Dept. of Mathematical Sciences, The University of Memphis

Abstract

This paper studies the equilibrium states and the dynamical entropy of iterated random maps that arise in modeling a class of evolutionary algorithms.

Keywords: evolutionary algorithms, iterated random maps, ergodic theory, equilibrium states.
1 INTRODUCTION

An evolutionary algorithm can be seen as a random dynamical system [1] whose configuration space is a space of populations and whose dynamics is defined by a stochastic transition operator that models the behavior of natural selection and natural variation acting on a population of a species. We use the term configuration space instead of the standard "state space" because the term "state" will be used here to denote a probability measure over the space of configurations of a system. Iterated random maps [6] are a class of discrete random dynamical systems that have been studied intensively in recent years due to their applications in different areas, in particular in the generation of fractal images [2] and in image compression [7]. In Hernandez, Niño and Quas [9] we proposed a model for a wide class of evolutionary algorithms using iterated random maps, and we studied some properties of their asymptotic behavior using this model. In this paper, we study the equilibrium states of the iterated random maps that model evolutionary algorithms. Here, an equilibrium state of a system is understood as a thermodynamical or statistical-mechanics equilibrium [12], i.e., a probability measure over the space of configurations of the system that is preserved by the dynamics and towards which the dynamics evolves (a stable equilibrium). These equilibrium states can also be characterized by a variational principle, without time-evolution considerations [11]. In this case, an equilibrium state is characterized as a measure that maximizes a linear combination of an entropy and an energy-like quantity (in most cases, either a sum or a difference).

(Research supported in part by COLCIENCIAS grants to G. Hernandez and F. Niño.)

The rest of this paper is organized as follows: section 2 reviews the fundamental concepts of iterated random maps; section 3 describes evolutionary algorithms and their model using iterated random maps; section 4 presents the variational characterization of the equilibrium states of these iterated random maps; finally, section 5 presents some conclusions.
2 Iterated Random Maps

An iterated random map [6] is a system governed by a collection of mappings that are selected randomly according to some probability distribution. It is defined precisely as follows. Given a metric space (X, ρ), a set S = {h | h : X → X} of functions from X to itself, and a collection {ν_x | x ∈ X} of probability measures on S, an iterated random map is a discrete random dynamical system ⟨X, T⟩ with configuration space X and dynamics T defined, at each iteration and for each x ∈ X, by a random map on S distributed like ν_x. The n-th iterate T^n of T is

T^n(x) = h_n ∘ h_{n−1} ∘ ... ∘ h_1(x),

where h_1 is distributed like ν_x, h_2 like ν_{h_1(x)}, h_3 like ν_{h_2∘h_1(x)}, and so on. If ν_x = ν for all x ∈ X, then h_1, h_2, ..., h_n is a sequence of i.i.d. (independent, identically distributed like ν) random maps on S. In this paper we consider a particular class of iterated random maps in which X is compact and S is a finite set of contractions; iterated random maps of this type are known in the literature as probabilistic iterated function systems [3]. Assuming S = {f_1, f_2, ..., f_k} and p_i(x) = ν_x(f_i) for i = 1, ..., k, T can be defined as

T(x) = f_i(x) with probability p_i(x), for i = 1, ..., k.
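To make the definition concrete, here is a minimal sketch, not from the paper, of sampling one orbit of an iterated random map with place-dependent probabilities on X = [0, 1]; the maps and the probability function p1 are illustrative choices:

```python
import random

# Two contractions on X = [0, 1]; the probability of choosing f1
# depends on the current point x (place-dependent probabilities).
f = [lambda x: x / 3.0,               # f1: contraction factor 1/3
     lambda x: x / 3.0 + 2.0 / 3.0]   # f2: contraction factor 1/3

def p1(x):
    """Illustrative place-dependent probability of selecting f1 at x."""
    return 0.25 + 0.5 * x  # continuous in x, bounded away from 0 and 1

def T(x):
    """One random iteration: apply f1 with probability p1(x), else f2."""
    return f[0](x) if random.random() < p1(x) else f[1](x)

def iterate(x0, n):
    """The n-th iterate T^n(x0) = h_n o ... o h_1(x0)."""
    x = x0
    for _ in range(n):
        x = T(x)
    return x

x = iterate(0.5, 50)  # one sample of T^50(0.5)
```

Since both maps send [0, 1] into itself, every iterate stays in the configuration space.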
In random dynamics we are interested in the probabilistic properties of a system, i.e., instead of studying the evolution (orbit) of a point in the configuration space, we study the evolution of probability measures (or probability densities of points) under the dynamics of the system. In order to do this, we define an operator that reflects the action of the system on the Borel probability measures on X, whose space is denoted M(X); we are also looking for the measures describing the long-term behavior of the system, i.e., the invariant measures towards which the dynamics evolves (stable equilibrium states). In order to define the operator on M(X) that reflects the action of T, we first observe that the system ⟨X, T⟩ defines a discrete Markov chain on X with transition probability function (of reaching B given that the system is in the configuration x) given by
P(x, B) = P(T(x) ∈ B) = ∑_{i=1}^{k} p_i(x) 1_B(f_i(x)),
where B is any Borel measurable subset of X and 1_B is the indicator function of B. Then an orbit of a point x_0 is a random walk {x_0, x_1, ...} on X with x_{i+1} = T(x_i). The time evolution of a Borel probability measure under the random dynamical system is described by a Markov operator M_T : M(X) → M(X), which is defined by

M_T μ(B) = ∫_X P(x, B) dμ(x) = ∑_{i=1}^{k} ∫_{f_i^{−1}(B)} p_i(x) dμ(x)

for all Borel measurable subsets B of X and all μ ∈ M(X). In the case of constant probabilities p_i(x) = p_i (i.e., ν_x = ν), the Markov operator reduces to

M_T μ = ∑_{i=1}^{k} p_i · μ ∘ f_i^{−1} for μ ∈ M(X) (see [10]).
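For constant probabilities, one step of the Markov operator on a finitely supported measure can be sketched as follows (the maps and weights are illustrative assumptions): each atom of μ is pushed forward through every map f_i, and its mass is split in the proportions p_i.

```python
from collections import defaultdict

# Illustrative system: the same two contractions as before, now with
# constant probabilities p = (1/2, 1/2), so M_T mu = sum_i p_i mu o f_i^{-1}.
f = [lambda x: x / 3.0, lambda x: x / 3.0 + 2.0 / 3.0]
p = [0.5, 0.5]

def markov_step(mu):
    """mu is a dict {point: mass}; returns M_T mu as a dict of the same kind."""
    out = defaultdict(float)
    for x, mass in mu.items():
        for fi, pi in zip(f, p):
            out[fi(x)] += pi * mass   # push the atom through f_i with weight p_i
    return dict(out)

mu = {0.0: 1.0}        # start with a point mass at 0
mu = markov_step(mu)   # the mass splits between f1(0) = 0 and f2(0) = 2/3
```

Total mass is conserved at each step, as it must be for an operator mapping probability measures to probability measures.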
If X is compact, each f_i is a contraction, and the moduli of continuity of the probabilities p_i(x) satisfy a Dini condition [3], then the system ⟨X, T⟩ has a compact global attractor A_T, and it is known that for any point x_0,

ω_T^n(x_0) → A_T with probability 1,

where ω_T^n(x_0) is defined by

ω_T^n(x_0) = ∪_{r=n}^{∞} T^r(x_0)

and the convergence is in the Hausdorff metric. Further, there exists a unique globally attracting invariant measure (stable equilibrium state) supported on the attractor A_T, and this measure is also ergodic [3]. Another way to study this is through the Perron-Frobenius operator that describes the time evolution of probability densities; see figure 1.
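A quick way to see the convergence to the attractor is the "chaos game": iterate the random map and discard a burn-in. With the illustrative maps x/3 and x/3 + 2/3 chosen with equal probability, A_T is the middle-thirds Cantor set, so the orbit tail avoids the removed open interval (1/3, 2/3):

```python
import random

# Chaos-game sketch: the orbit tail after a burn-in clusters on the
# global attractor A_T (here the middle-thirds Cantor set).
f = [lambda x: x / 3.0, lambda x: x / 3.0 + 2.0 / 3.0]

def orbit_tail(x0, burn_in=100, n=1000, seed=1):
    rng = random.Random(seed)
    x = x0
    for _ in range(burn_in):
        x = rng.choice(f)(x)   # each map with probability 1/2
    tail = []
    for _ in range(n):
        x = rng.choice(f)(x)
        tail.append(x)
    return tail

tail = orbit_tail(0.5)
```

Because each map sends [0, 1] into [0, 1/3] ∪ [2/3, 1], no point of the tail lies in the removed middle interval.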
Figure 1: The Markov operator M_T acting on the Borel measures μ on the state space X, and the corresponding Perron-Frobenius operator acting on the probability densities dμ associated with the dynamics T.
3 Evolutionary Algorithms

Let us consider an optimization problem of the form

optimize {F(i) | i ∈ I},
where the search space I is a finite space with a metric d and F is the optimization function; "optimize" means either maximize or minimize. If we regard the search space I as a space of individuals and the optimization function as a fitness function, then I and F define a fitness landscape (I, F) for an evolutionary algorithm. Such an algorithm works on a collection of individuals, called a population, that evolves in time through the application of computational procedures that emulate the behavior of natural evolution, i.e., selection and variation acting on the population. The algorithm is expected to produce approximately optimal solutions. The effectiveness of these algorithms is supported by the premise that natural evolution, as observed in nature, is a process that produces approximately optimal individuals. In general, an evolutionary algorithm is a procedure of the form

Choose initial population
Repeat
  Perform variation
  Perform selection
Until a stopping criterion is satisfied

The stopping criterion can be either that the EA reaches a predetermined number of iterations, or that some condition on the values of the fitness function is met, for instance reaching a stable value. In evolutionary algorithms a population is a multiset of n individuals, with n fixed, i.e., a collection of n elements of I in which each element can appear more than once. The multisets of n individuals of I, denoted I^n/≈, can be formally defined as the set of equivalence classes of I^n = ∏_{i=1}^{n} I under the relation ≈ that makes two n-tuples x, y ∈ I^n equivalent if x and y have the same elements and each element appears the same number of times in x and in y. Then we define the space of possible configurations of a population, denoted P and called the space of populations, as P = I^n/≈. An evolutionary algorithm can be viewed as a random dynamical system ⟨P, E⟩ with P a space of populations and E an evolution operator that models the behavior of selection and variation acting on the space of populations. This is a stochastic operator of the form E : P → P in which E(x), for a given population x ∈ P, is a random population obtained, usually, as the result of composing a selection operator and some variation operators. The selection and variation operators considered here are defined next. We consider the selection operator S : P → P that takes a population x and produces a population y composed only of individuals of x that have maximum fitness, i.e., composed only of the fittest individuals in x. Selection operators of this type were used by Cerf in [4] and Francois in [8]. In order to define S formally, let us denote by F(x) = {i ∈ x | F(i) = max_{k∈x} F(k)}^n/≈ ⊆ P the set of populations composed only of the fittest individuals of x. Then S is defined through the probability of producing a population z by applying S to a population x, as follows:

P(S(x) = z) = 1_{F(x)}(z) / |F(x)|,

with 1_{F(x)} being the indicator function of F(x). The action of S on x is thus to select a random population in F(x) with uniform probability. We consider two variation operators: a local variation operator and a global variation operator. The local variation operator V_L : P → P introduces a random perturbation of the population. Let us denote by B_ε(x) the ball in P centered at x with radius ε ∈ (0, 1). The operator V_L applied to a population x selects a population from B_ε(x) with a uniform probability distribution. Then the probability of producing a population z by applying V_L to a population x is

P(V_L(x) = z) = 1_{B_ε(x)}(z) / |B_ε(x)|.

The global variation operator V_G : P →
P introduces a variation of the population similar to the effect of genetic flow in natural populations. Let us define the set of x-reachable populations R(x) as the set of populations that can be produced from x by application of local variation composed with selection, i.e.,

R(x) = ∪_{y ∈ B_ε(x)} F(y),

and let us also define the set of evolvable populations E by

E = ∪_{x ∈ P} R(x).

The global variation operator applied to any x produces a uniformly random element of E. Then we define V_G through the probability of producing a population z by applying V_G to a population x, as follows:

P(V_G(x) = z) = 1_E(z) / |E|.

Finally, the evolution operator E : P → P is the result of the random independent application of three different combinations of the operators defined above:

E(x) = S(x) with probability p_S, S∘V_L(x) with probability p_L, V_G∘S∘V_L(x) with probability p_G,

and we assume that p_S + p_L + p_G = 1. This can be defined as an iterated random map because E = {z_1, z_2, ..., z_|E|} is finite and we can define the functions f_i : P → P, x ↦ z_i, for i = 1, 2, ..., |E|. These (constant) functions are contractive with contractivity factor 0. Also, we can define the probability functions p_i : P → [0, 1] by

p_i(x) = p_S P(S(x) = z_i) + p_L P(S∘V_L(x) = z_i) + p_G P(V_G∘S∘V_L(x) = z_i).

The probabilities p_i(x) are continuous in P with the Hausdorff distance induced by the distance d between individuals; in this case the distance between x and y is the Hausdorff distance between the sets of individuals in the populations x and y. These probabilities satisfy (i) ∑_{i=1}^{|E|} p_i(x) = 1, (ii) p_i(x) > 0 for all i, and (iii) Dini's condition on the moduli of continuity, due to the finiteness of the space; see [9] for more details. Then the evolution operator is in this case

E(x) = f_i(x) with probability p_i(x), for i = 1, ..., |E|.

The system ⟨P, E⟩ has a compact global attractor A_E = E and a unique globally attracting invariant measure (equilibrium state) supported on the attractor. Hence, from the probabilistic point of view, this system is asymptotically stable.

4 Equilibrium State

The existence of a unique globally attracting invariant measure implies, in this case, not only the existence of a unique thermodynamical equilibrium, but also that the algorithm approaches this equilibrium starting from any probability distribution on the space of populations. Next, we will see that these equilibrium states are characterized by a variational principle. An iterated random map can be viewed as a Markov chain on P with transition probability
P_{x,y} = ∑_{i=1}^{|E|} p_i(x) 1_{{y}}(f_i(x)).
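On a toy finite configuration space (an illustrative three-point space, not the paper's population space P), the transition probabilities above can be assembled into a stochastic matrix:

```python
# Transition matrix P[x][y] = sum_i p_i(x) * 1_{y}(f_i(x)) for a
# hypothetical system of two maps with place-dependent probabilities.
X = [0, 1, 2]
f = [lambda x: 0,               # f1: send everything to configuration 0
     lambda x: min(x + 1, 2)]   # f2: move one step up, capped at 2

def p(i, x):
    """Illustrative place-dependent probability of map i at configuration x."""
    return [0.4, 0.6][i] if x == 0 else [0.7, 0.3][i]

P = {x: {y: sum(p(i, x) for i, fi in enumerate(f) if fi(x) == y)
         for y in X}
     for x in X}
```

Each row of the matrix is a probability distribution over configurations, as the transition probability of a Markov chain must be.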
We know that the dynamics is asymptotically concentrated on E for any initial probability distribution on P; in other words, the populations in P \ E are transient, see [12]. This Markov chain can be seen as an abstract dynamical system (Ω, σ) with Ω = P^Z the set of all orbits of the Markov chain and σ the left shift on Ω. Given a σ-invariant measure μ and a finite μ-partition ξ of Ω (μ(A) > 0 for A ∈ ξ), the entropy H_μ(ξ) of ξ with respect to μ is

H_μ(ξ) = − ∑_{A∈ξ} μ(A) log μ(A),

and the dynamical entropy of the transformation σ with respect to the partition ξ is

h_μ(σ, ξ) = lim_{n→∞} (1/n) H_μ(ξ ∨ σ^{−1}ξ ∨ ... ∨ σ^{−(n−1)}ξ),

where ξ ∨ η = {A ∩ B | A ∈ ξ, B ∈ η, μ(A ∩ B) > 0}. The quantity h_μ(σ, ξ) measures the average information, per unit time, obtained from observing the system through the partition ξ. The dynamical entropy of σ is defined as h_μ(σ) = sup_ξ h_μ(σ, ξ). If φ(ω) is taken to be the local energy function φ(ω) = log P_{ω_0,ω_1}, then the system (Ω, σ) (see [11], sec. 5.4) has a unique equilibrium state μ* on Ω that satisfies a variational principle (using entropy and energy) of the form
h_{μ*}(σ) + ∫ φ dμ* = sup_{μ ∈ I(σ)} [ h_μ(σ) + ∫ φ dμ ],
where I(σ) is the set of σ-invariant measures. The measure μ* arising in this way is exactly the distribution on the space of sequences of populations that describes the stationary distribution of populations arising from the EA. The quantity h_{μ*}(σ) can in this case be explicitly computed as
h_{μ*}(σ) = − ∑_{x,y ∈ P} π_x P_{x,y} log P_{x,y},

where π_x denotes the stationary probability of the population x; this entropy has been called population entropy in thermodynamical studies of evolutionary theory [5]. This quantity can be interpreted as the maximal information per unit time that an observer of the system can gain by looking at it through any finite partition of Ω.

5 Conclusions

The characterization of the equilibrium state and the dynamical entropy in this model opens the possibility of studying the structural stability of evolutionary algorithms. In particular, we can study phase transitions, i.e., changes in the properties of the equilibrium state due to changes in the parameters of the evolutionary algorithm.

References

[1] L. Arnold. Random Dynamical Systems. Springer Verlag, 1998.
[2] M. Barnsley. Fractals Everywhere. Academic Press, 1988.
[3] M. Barnsley, S. Demko, J. Elton, and J. Geronimo. Invariant measures for Markov processes arising from iterated function systems with place-dependent probabilities. Ann. Inst. Henri Poincaré Probabilités et Statistiques, 24(3):367-394, 1988.
[4] R. Cerf. The dynamics of mutation-selection algorithms with large population sizes. Ann. Inst. Henri Poincaré Probabilités et Statistiques, 32(4):455-508, 1996.
[5] L. Demetrius. The thermodynamics of evolution. Physica A, 189:417-436, 1992.
[6] P. Diaconis and D. Freedman. Iterated random functions. SIAM Review, 41(1), 1999.
[7] Y. Fisher (editor). Fractal Image Compression: Theory and Applications. Springer Verlag, 1995.
[8] O. Francois. An evolutionary strategy for global optimization and its Markov chain analysis. IEEE Transactions on Evolutionary Computation, 2(3), 1998.
[9] G. Hernandez, F. Niño, and A. Quas. Ergodicity in evolutionary systems. In Proceedings of the World Multiconference on Systemics, Cybernetics and Informatics, Ed. Nagib Callaos et al., volume 3, pages 148-155. Publications of the International Institute of Informatics and Systemics, 1999.
[10] P. Kamthan and M. Mackey. Statistical dynamics of random maps. Random and Computational Dynamics, 3(3), 1995.
[11] G. Keller. Equilibrium States in Ergodic Theory. Cambridge University Press, London Mathematical Society Student Texts 42, 1998.
[12] O. Penrose. Foundations of Statistical Mechanics. Pergamon Press, 1970.
[13] A. Quas. On representations of Markov chains by random smooth maps. Bull. London Math. Soc., 23:487-492, 1991.
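As a small numerical illustration of the population entropy formula of section 4, the quantity −∑ π_x P_{x,y} log P_{x,y} can be computed for a hypothetical two-state transition matrix, with the stationary distribution π obtained by power iteration:

```python
import math

# Population entropy  h = - sum_{x,y} pi_x P_{x,y} log P_{x,y}
# for an illustrative 2-state chain (not from the paper).
P = [[0.9, 0.1],
     [0.4, 0.6]]

def stationary(P, iters=10_000):
    """Stationary distribution pi = pi P, by power iteration."""
    pi = [1.0 / len(P)] * len(P)
    for _ in range(iters):
        pi = [sum(pi[x] * P[x][y] for x in range(len(P)))
              for y in range(len(P))]
    return pi

def entropy_rate(P):
    pi = stationary(P)
    return -sum(pi[x] * P[x][y] * math.log(P[x][y])
                for x in range(len(P)) for y in range(len(P))
                if P[x][y] > 0)

h = entropy_rate(P)  # here pi = (0.8, 0.2), so h ≈ 0.3947 nats per step
```

For this chain the entropy rate is strictly between 0 (a deterministic chain) and log 2 (the i.i.d. uniform chain on two states), reflecting how predictable one step of the dynamics is to an observer.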