To appear, Symposium on Genetic Algorithms (SGA), Madison, Wisconsin, July 22-25, 1998


Deceptive and Other Functions of Unitation as Bayesian Networks

Ole J. Mengshoel

David E. Goldberg

David C. Wilkins

Dept. of Computer Science University of Illinois Urbana, IL 61801 [email protected]

Dept. of General Engineering University of Illinois Urbana, IL 61801 [email protected]

Beckman Institute University of Illinois Urbana, IL 61801 [email protected]

ABSTRACT

Deceptive and other functions of unitation have been considered in order to understand which fitness functions are hard and which are easy for genetic algorithms to optimize. This paper focuses on genetic algorithm fitness functions represented as Bayesian networks. We investigate onemax, trap, and hill functions of unitation when converted into Bayesian networks. Among other things, this paper shows that Bayesian networks can be deceptive.

1 Introduction

The notion of deception was introduced in order to systematically investigate the conditions under which GA schema processing may lead a GA away from fitness optima [Goldberg, 1987]. Functions of unitation were later discussed in connection with deceptive functions. A function of unitation (or bit-counting function) is defined over a bit string; it depends on the number of ones in the bit string, and not on their positions. There are several advantages to such artificial functions, notably their ease of specification and analysis [Goldberg, 1992]. Unfortunately, they are not used in applications. In contrast, Bayesian networks (BNs) have proven to be an important knowledge representation formalism, for instance for probabilistic reasoning in expert systems [Pearl, 1988] [Neapolitan, 1990]. Combining the advantages of the two types of functions, this paper converts functions of unitation into BNs.

The current research is motivated by previous and ongoing research on using GAs for BN inference [Rojas-Guzman and Kramer, 1993] [Rojas-Guzman and Kramer, 1996] [Welch, 1996] [Mengshoel, 1997] [Mengshoel and Wilkins, 1998b] [Mengshoel and Wilkins, 1998a]. We believe that this line of research can benefit from an increased focus on the relationship between a BN and the joint probability distribution defined by the BN, which is the GA's fitness function. Here we focus on characteristics of a Bayesian network, given a certain joint probability distribution, which is derived from a function of unitation. There has been some research within the BN community on typical properties of the joint distribution defined by a BN [Druzdzel, 1994], but little research on varying the hardness of constructed networks, which is what we do here.

This paper introduces an approach to mapping functions of unitation into Bayesian networks. In addition, instances from three classes of functions of unitation are mapped to Bayesian networks: onemax functions, trap functions, and hill functions. Onemax functions of unitation are non-deceptive, and the constructed onemax BNs are easy as well. Trap functions of unitation, on the other hand, are deceptive. The trap function instances mapped into highly connected, hard BNs. Hill functions of unitation are similar to trap functions of unitation, but they are not deceptive. The instances considered were still mapped to highly connected BNs. These results show that BNs can be deceptive, and suggest that deceptive functions of unitation are indeed hard, not only for GAs, and that other functions of unitation can be hard as well, although less so than the deceptive ones. In addition, this research paves the way for controlled and systematic experimentation, since experimental BNs of varying difficulty can be constructed using the mapping. An additional advantage of constructing a BN from a function of unitation is that the latter is a parametric function while the former is non-parametric.

The rest of this paper is structured as follows. Section 2 presents schemata, deception, and functions of unitation. In Section 3, the basics of BNs are presented. Section 4 describes how to construct BNs from functions of unitation, and the following sections show how BNs can be constructed from different classes of functions of unitation. Section 5 constructs BNs from onemax functions of unitation, which are non-deceptive functions. Section 6 considers trap functions of unitation, which are deceptive functions. Section 7 discusses hill-shaped functions of unitation. Section 8 concludes and presents future work.

2 Schemata, Deception, and Functions of Unitation

The notion of GA deception is based on the assumption that schema processing plays a central role in GAs. In the following we introduce some definitions that are helpful in discussing static schema processing.

Definition 1 Let $p : \{0,1\}^n \to \mathbb{R}$ be a fitness function, and let $B = b_1 \cdots b_n$ and $S = s_1 \cdots s_n$ with $b_i \in \{0,1\}$ and $s_i \in \{0,1,*\}$. Now define inst(S) = inst($s_1 \ldots s_n$) to refer to an instantiation of the schema S. An instantiation B of S (B = inst(S)) is such that if $s_i = *$, then $b_i = 0$ or $b_i = 1$; if $s_i \neq *$, then $b_i = s_i$. Now the set of instantiations of S, Inst(S), can be defined as follows: Inst(S) = {B | B = inst(S)}.

Given the definition of an instantiation of a string, the notion of schema fitness can be introduced.

Definition 2 (Schema fitness) The schema fitness $q : \{0,1,*\}^n \to \mathbb{R}$ of schema $S \in \{0,1,*\}^n$ is defined as follows:

$$q(S) = \sum_{B \in \mathrm{Inst}(S)} p(B).$$

When there is no ambiguity, we write p(S) instead of q(S).

As an example of the above definition, q(*1*) = p(010) + p(110) + p(011) + p(111).

One aspect of GAs that we will focus on is their optimizing capability. For simplicity we assume in the following that there is one optimal string; this can easily be generalized.

Definition 3 Let f be a fitness function, and let $X_{OPT} = \arg\max_{X \in \{0,1\}^n} f(X)$ be an optimal string. Then any schema S such that inst(S) = $X_{OPT}$ is called an optimal schema.

Schema processing operates on schema partitions [Grefenstette, 1993] [Mitchell, 1996]. In the 3-bit case, the schemata in the schema partition (d**) are (0**) and (1**). Competition between bit strings takes place within a schema partition. It is a common assumption within the GA community that GAs start by processing low-order schema partitions, and then gradually shift towards processing high-order schema partitions.

Definition 4 A schema partition is deceptive if there exists a schema with higher fitness than the optimal schema in the schema partition. A fitness function is deceptive if it contains a deceptive schema partition. A fitness function is fully deceptive if all schema partitions (except for the maximally specified one) are deceptive. A fitness function is non-deceptive if no schema partition is deceptive.

As an example, consider a fully deceptive 3-bit string, and assume without loss of generality that 111 is the global maximum, 000 the deceptive maximum. This is formalized in the optimality condition p(111) > p(000) and the deceptive conditions shown in Table 1. Note that in this case, the deceptive schemata are all instances of the same string, namely the deceptive maximum. According to Definition 4, this need not be the case.

Table 1: Conditions for full deception in a bit string of length three.

Schema partition   Conditions
(d**)              p(0**) > p(1**)
(*d*)              p(*0*) > p(*1*)
(**d)              p(**0) > p(**1)
(dd*)              p(00*) > p(01*), p(00*) > p(10*), p(00*) > p(11*)
(d*d)              p(0*0) > p(0*1), p(0*0) > p(1*0), p(0*0) > p(1*1)
(*dd)              p(*00) > p(*01), p(*00) > p(*10), p(*00) > p(*11)

There has been extensive research on deception within the GA community. The connection between GAs and Walsh functions has been presented by Goldberg [Goldberg, 1989b] [Goldberg, 1989a]. He emphasizes the utility of using Walsh functions and Walsh polynomials for analyzing GAs [Goldberg, 1989b]. These methods are extended to analyze deception and disruption to schema processing by genetic operators. In addition, the first fully deceptive function, of bit length three, is presented [Goldberg, 1989a].

The notion of unitation has been used repeatedly for the purpose of analysis of hard and easy fitness functions.

Definition 5 Let B be a bit string of length n. The unitation u(B) of B is a function defined as

$$u(b_1 \cdots b_n) = b_1 + \cdots + b_n = \sum_{i=1}^{n} b_i. \qquad (1)$$

A function of unitation is defined using u(B). In other words, unitation means the number of ones in a bit string, so u(100) = 1 and u(111) = 3.

Research on deception has also been critiqued. For example, Bäck says that 'it may well be the case that [deceptive functions] are the problems for which Genetic Algorithms must fail according to the construction of the problem and the provable limits of Genetic Algorithms' [Back, 1996, p. 128]. The results in this paper, in particular Section 6, do indeed suggest that deceptive functions are inherently difficult, not only for GAs but for other algorithms as well. It should, however, be emphasized that we only give examples and not formal proofs of this.
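To make Definitions 2, 4, and 5 concrete, the following minimal Python sketch (our own; the function names are not from the paper) computes schema fitness by enumeration and checks full deception. Since all schemata within one partition have the same number of instantiations, comparing fitness sums (Definition 2) rather than averages does not change the outcome of the deception test.

```python
import itertools

def unitation(bits):
    """u(B): the number of ones in bit string B (Definition 5)."""
    return sum(bits)

def schema_fitness(schema, p, n):
    """q(S): sum of p(B) over all instantiations B of schema S (Definition 2).
    A schema is a tuple over {0, 1, '*'}."""
    return sum(p(bits) for bits in itertools.product((0, 1), repeat=n)
               if all(s == '*' or s == b for s, b in zip(schema, bits)))

def is_fully_deceptive(p, n, optimum):
    """Definition 4: in every schema partition (except the maximally
    specified one), some schema must beat the optimal schema."""
    for mask in itertools.product((False, True), repeat=n):  # True = fixed bit
        fixed = [i for i, m in enumerate(mask) if m]
        if len(fixed) in (0, n):  # skip the empty and maximally specified partitions
            continue
        opt_schema = tuple(b if m else '*' for b, m in zip(optimum, mask))
        opt_fit = schema_fitness(opt_schema, p, n)
        deceived = False
        for vals in itertools.product((0, 1), repeat=len(fixed)):
            schema = ['*'] * n
            for i, v in zip(fixed, vals):
                schema[i] = v
            if tuple(schema) != opt_schema and schema_fitness(tuple(schema), p, n) > opt_fit:
                deceived = True
        if not deceived:
            return False
    return True

# Example: the 3-bit trap function of Section 6 (global optimum 111,
# deceptive optimum 000) versus the non-deceptive onemax f(B) = u(B).
trap = lambda b: 2 * (2 - unitation(b)) if unitation(b) <= 2 else 5 * (unitation(b) - 2)
print(is_fully_deceptive(trap, 3, (1, 1, 1)))       # True
print(is_fully_deceptive(unitation, 3, (1, 1, 1)))  # False
```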

3 Bayesian Networks

A Bayesian network (BN) represents a joint probability distribution in a compact manner, by exploiting a graph structure. This section introduces formal definitions related to BNs, and also defines the inference task we are focusing on.

Definition 6 A discrete random variable V is associated with $k \ge 1$ mutually exclusive and exhaustive states $v_1, \ldots, v_k$, and V's state space is $\Omega_V = \{v_1, \ldots, v_k\}$.

Typically, one is interested in multivariate distributions, leading to the following definition.

Definition 7 Let $\{V_1, \ldots, V_n\}$ be (discrete) random variables, and $(v_1, \ldots, v_n)$ instantiations of those random variables. Here, instantiation $v_i$ goes with random variable $V_i$. Pr(v) denotes the joint probability distribution over the variables $(V_1, \ldots, V_n)$:

$$\Pr(v) = \Pr(v_1, \ldots, v_n) = \Pr(V_1 = v_1, \ldots, V_n = v_n).$$

The following definitions related to directed acyclic graphs (DAGs) will also prove useful.

Definition 8 Let V be a node in a DAG. Then the following functions are defined: Pa(V) gives the parents of V, and pa(V) gives an instantiation of the parents of V. Ch(V) gives the children of V, and Ne(V) gives the neighbors of V: Ne(V) = Pa(V) ∪ Ch(V).

The notion of a Bayesian network can now be introduced.

Definition 9 A Bayesian network is a tuple (V, W, Pr), where (V, W) is a directed acyclic graph with nodes $V = \{V_1, \ldots, V_n\}$ and edges $W = \{W_1, \ldots, W_m\}$; Pr is a set of conditional probability tables. For each node $V_i \in V$ there is one such table, which defines a conditional probability distribution over $V_i$ in terms of its parents $Pa(V_i)$: $\Pr(V_i \mid Pa(V_i))$.

Consider a Bayesian network over the set of nodes V. Then the joint distribution Pr(v) is

$$\Pr(v) = \Pr(v_1, \ldots, v_n) = \prod_{i=1}^{n} \Pr(v_i \mid \mathrm{pa}(V_i)), \qquad (2)$$

where $\mathrm{pa}(V_i) \subseteq \{v_{i+1}, \ldots, v_n\}$.

We consider BNs where nodes represent discrete random variables, and in particular the special case where all nodes are binary. In this case there is one bit position in a GA bit string per node in the BN. More formally, let $V_i = v_i$ be the assignment to node number i in a BN with binary nodes only, so $v_i \in \{0,1\} = \Omega_{V_i}$. Then the following defines a one-to-one mapping from node $V_i$ to bit $b_i$ in position i in bit string B: $b_i = 0$ if $v_i = 0$, and $b_i = 1$ if $v_i = 1$. In the following we shall often gloss over the distinction between a GA individual and a BN instantiation, and we may say $\Pr : \{0,1\}^n \to [0,1]$. For example, we may write Pr(010) rather than $\Pr(V_1 = 0, V_2 = 1, V_3 = 0) = \Pr(0,1,0)$.

Bayesian networks are different from most other GA fitness functions in their accommodation of evidence through conditioning.

Definition 10 Variables whose states are known are called evidence variables E. The remaining variables X = V − E are called non-evidence variables. When all non-evidence variables are instantiated, X = x, this is denoted an explanation x.

Intuitively, an explanation x explains the evidence e. If nodes $E = (E_1, \ldots, E_m)$ are instantiated to $\{E_1 = e_1, \ldots, E_m = e_m\}$, then we can use Bayes rule to compute the posterior belief over the remaining nodes X:

$$\Pr(x \mid e) = \frac{\Pr(x, e)}{\Pr(e)} \propto \Pr(x, e) = \Pr(v). \qquad (3)$$

Pr(e) can be computed by marginalization; however, this is often not done, since Pr(x, e) can be used instead, as indicated in Equation 3. In the non-evidence case, Equation 2 can be used directly.

All explanations are not equal. In particular, those that are more probable are typically of greater interest, leading to the following definition.

Definition 11 Let the explanations $x^1, x^2, x^3, \ldots$ be ordered according to their posterior probability: $\Pr(x^1 \mid e) \ge \Pr(x^2 \mid e) \ge \Pr(x^3 \mid e) \ge \cdots$ The most probable explanation is $x^1 = x_{MPE}$. The k most probable explanations are $x^1, \ldots, x^k$.

Two typical BN inference tasks are belief updating and belief revision [Pearl, 1988]. Belief revision is concerned with computing the most probable explanation or, more generally, the k most probable explanations. Notice the equivalence between computing the most probable explanation in the BN setting and computing the optimal string in the GA setting, that is, $X_{OPT} = x_{MPE}$. Both belief updating and belief revision are, in the general case, computationally hard. Two factors of a BN make it harder: a highly connected topology and highly skewed probability distributions. Concerning the topology, polynomial time algorithms exist for BNs that are trees (also known as singly connected networks) [Pearl, 1988]. Concerning skewness, the situation is not as clear. However, it is known that skewed distributions can make stochastic simulation converge much more slowly, in particular when low-probability states are instantiated.
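The following Python sketch (our own; the data-structure and function names are assumptions, not from the paper) shows how Equation 2 turns a set of conditional probability tables into a joint probability, and how a brute-force search over explanations yields the MPE of Definition 11. Per Equation 3, it suffices to maximize Pr(x, e) rather than Pr(x | e).

```python
import itertools

# A binary BN encoded as: parents[i] = indices of Pa(V_i), and cpt[i] maps a
# tuple of parent values to Pr(V_i = 1 | pa(V_i)).

def joint_prob(v, parents, cpt):
    """Pr(v) as a product of conditional probabilities (Equation 2)."""
    prob = 1.0
    for i, vi in enumerate(v):
        pa = tuple(v[j] for j in parents[i])
        p_one = cpt[i][pa]                 # Pr(V_i = 1 | pa(V_i))
        prob *= p_one if vi == 1 else 1.0 - p_one
    return prob

def mpe(parents, cpt, evidence=None):
    """Brute-force most probable explanation (Definition 11). Maximizing
    Pr(x, e) suffices, since Pr(x | e) is proportional to it (Equation 3)."""
    n = len(parents)
    evidence = evidence or {}
    best, best_p = None, -1.0
    for bits in itertools.product((0, 1), repeat=n):
        if any(bits[i] != e for i, e in evidence.items()):
            continue                       # skip explanations inconsistent with e
        p = joint_prob(bits, parents, cpt)
        if p > best_p:
            best, best_p = bits, p
    return best, best_p

# Example: a hypothetical three-node chain V1 -> V2 -> V3.
parents = {0: (), 1: (0,), 2: (1,)}
cpt = {0: {(): 0.3}, 1: {(0,): 0.2, (1,): 0.9}, 2: {(0,): 0.1, (1,): 0.8}}
print(mpe(parents, cpt))                   # ((0, 0, 0), 0.504)
```

The exhaustive loop over $2^n$ explanations is only for illustration; it is exactly this exponential search space that motivates GA-based belief revision in the work cited above.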

4 Constructing Bayesian Networks

Both functions of unitation and BNs can be used as GA fitness functions; the advantage of functions of unitation compared to BNs is that there are fewer parameters to manipulate. In particular, the number of conditional probabilities that needs to be supplied for a node in a BN grows exponentially with the number of parents that the node has. In contrast, the number of parameters in a function of unitation grows at most linearly with the number of bits.

This is how to systematically create a Bayesian network from a fitness function:

1. Use the fitness function to create a joint probability distribution. For example, if we want a fully deceptive BN consisting of k nodes, a fully deceptive trap function (see Section 6) consisting of k bits is first constructed. This deceptive trap function is then used to construct a deceptive joint probability distribution. This is a table consisting of $2^k$ rows, one row per explanation in the Bayesian network.

2. Use the joint probability distribution to construct a Bayesian network that represents it, but in a more compact manner. Possibly, some approximation needs to be taken here, but the essential characteristics (such as deception) need to be maintained. This creates a network with k nodes (and some number of edges) from the $2^k$ table.

To prepare for a more formal description of the above, the following notation is introduced:

- cc(i): cell count for cell i. The cell count is the number of samples that exist for an explanation. For example, with four samples for the explanation $X_1 = 0, X_2 = 1, X_3 = 1$, the cell count is cc(011) = 4. Often, cells are indexed by a decimal number rather than a binary number, giving, for example, cc(3) = 4.
- f: fitness function (in the following a function of unitation)
- Bin(i): function that gives the integer i in binary
- k: number of bits in the fitness function
- Round(x): rounds the number x to the nearest integer
- c: constant; in the following c = 3 is used
- Build(): constructs a pattern P, further described below
- Direct(): constructs edges from the pattern P, further described below
- Estimate(): estimates conditional probabilities, further described below

The Construct algorithm for constructing a k-node BN from a k-bit fitness function, such as a k-bit trap function, is as follows:

1. Let there be k nodes $\{X_1, X_2, \ldots, X_k\}$. Let each node have two states $\{0, 1\}$, i.e. $|\Omega_{X_i}| = 2$ for all i.
2. For i = 0 to $2^k - 1$: cc(i) ← Round(exp(f(Bin(i)) + c))
3. P ← Build($(X_1, X_2, \ldots, X_k)$, $(|\Omega_{X_1}|, \ldots, |\Omega_{X_k}|)$, cc)
4. E ← Direct(P)
5. Pr ← Estimate($(X_1, X_2, \ldots, X_k)$, $(|\Omega_{X_1}|, \ldots, |\Omega_{X_k}|)$, cc, E)

The above algorithm constructs a DAG (V, E) and conditional probability tables Pr which together make up a Bayesian network. Examples of how Construct works are given in Sections 5, 6, and 7. The Construct algorithm is based on the observation that a BN can be induced from the joint distribution table; this is learning with a complete data set. Approaches for machine learning of BNs can therefore be utilized; here TETRAD [Scheines et al., 1994] is used. The algorithms Build and Estimate both correspond to TETRAD procedures (see [Scheines et al., 1994]), while Direct is a quite simple algorithm. Given nodes $(X_1, X_2, \ldots, X_k)$ along with their cardinalities $(|\Omega_{X_1}|, |\Omega_{X_2}|, \ldots, |\Omega_{X_k}|)$ and a cell count cc, Build constructs a set of patterns P which expresses conditional independence and dependence relations between the random variables. Direct takes the patterns P and creates a set of edges E. Estimate takes the same parameters as Build and the edges E, and constructs conditional probability tables Pr. Build and Estimate were used with their default parameter settings when the results in the following three sections were produced.

Why is the exponential taken above? This is because BNs are set up to work by multiplication of probabilities, and this is 'counteracted' by taking the exponential here. More formally, an exponential function of unitation can be written as follows, where the function of unitation $u(b_1 \cdots b_n) = b_1 + \cdots + b_n$ is assumed:

$$e^{u(B)+c} = e^{u(b_1 \cdots b_n)+c} = e^{b_1 + \cdots + b_n + c} = e^{c} \prod_{i=1}^{n} e^{b_i}.$$

Note that the last expression above is of the same form as Equation 2. This justifies the use of $e^{u(B)+c}$ in the Construct algorithm above.
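As an illustration of steps 1 and 2, here is a small Python sketch (our own naming; TETRAD's Build, Direct, and Estimate of steps 3-5 are not reimplemented) that produces the cell-count table from a fitness function:

```python
import math

def unitation(bits):
    """u(B): number of ones in the bit string B."""
    return sum(bits)

def construct_cell_counts(f, k, c=3):
    """Steps 1-2 of Construct: one cell per explanation, with
    cc(i) = Round(exp(f(Bin(i)) + c)). The resulting complete data set
    would then be handed to Build, Direct, and Estimate (steps 3-5)."""
    counts = {}
    for i in range(2 ** k):
        bits = tuple((i >> (k - 1 - j)) & 1 for j in range(k))  # Bin(i)
        counts[bits] = round(math.exp(f(bits) + c))
    return counts

# With f(B) = u(B) and k = 3 this reproduces the cell counts of Table 2
# in the next section: cc(000) = round(e^3) = 20, cc(001) = round(e^4) = 55,
# ..., cc(111) = round(e^6) = 403.
print(construct_cell_counts(unitation, 3))
```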

5 Onemax Functions

Onemax functions are generalizations of the unitation u(B) of a bit string B. Ackley introduced the onemax function f(B) = 10 × u(B), i.e. just a constant multiple of the function of unitation [Ackley, 1987]. More generally, a onemax function of unitation is

$$f(B) = d \times u(B), \qquad (4)$$

where d is a constant. By letting d = 1 in Equation 4, we get the onemax function f(B) = u(B), which is used in the following.

Table 2: Non-deceptive onemax fitness function defined over bit string consisting of three bits.

i   B = Bin(i)   u(B)   f(B)   f(B)+c   cc(i)
0   000          0      0      3        20
1   001          1      1      4        55
2   010          1      1      4        55
3   011          2      2      5        148
4   100          1      1      4        55
5   101          2      2      5        148
6   110          2      2      5        148
7   111          3      3      6        403

Figure 1: Graph of BN constructed from non-deceptive onemax fitness function.

Table 3: BN constructed from non-deceptive onemax fitness function.

Table 4: BN constructed from non-deceptive onemax fitness function with four bits.

Table 2 shows bit strings and corresponding cell counts for this onemax function. The conditional probability tables created by the Construct algorithm based on the input in Table 2 are shown in Table 3, and the graph is shown in Figure 1. It is easy to check that the BN constructed from the onemax function is non-deceptive, like the onemax function itself. Using a four-bit onemax function in a similar way as above, the BN whose conditional probabilities are shown in Table 4 is attained. As one might have expected, the BNs constructed from the two onemax functions are quite 'easy': there are no interactions between nodes in the BNs, and the conditional probability tables are the same within any one BN.
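This is to be expected from Section 4's factorization: the onemax cell counts are proportional to $e^{u(B)+c} = e^c \prod e^{b_i}$, so the joint factorizes into independent, identical node marginals, leaving no edges to learn. A one-line check (our derivation, not a value recovered from Tables 3 or 4):

```python
import math

# Each bit contributes an independent factor e^{b_i}, so every node has
# Pr(X_i = 1) = e / (1 + e), regardless of the other nodes (our derivation
# from the factorization of the onemax cell counts).
print(math.e / (1 + math.e))   # ~0.7311
```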

6 Trap Functions

This section focuses on deception in BNs. Deceptive BNs are constructed from trap functions, which are deceptive functions of unitation. The advantage of trap functions is that they have few parameters but can still be made deceptive. Ackley used a trap function for empirical investigations [Ackley, 1987], and this function has later been generalized [Deb and Goldberg, 1992]. The trap function definition of Deb and Goldberg is the following:

$$f(B) = \begin{cases} \frac{a}{z}\,(z - u(B)) & \text{if } u(B) \le z \\ \frac{b}{k-z}\,(u(B) - z) & \text{otherwise,} \end{cases} \qquad (5)$$

where a is the local (deceptive) optimum, b is the global optimum, and z is the slope-change location. The point of a trap function is that there are two optima, a and b, and by varying the parameters, one can make it more or less difficult to find the global optimum b as opposed to the local optimum a. It has been shown that when the following condition holds, a trap function is fully deceptive [Deb and Goldberg, 1992, p. 99]:

$$r = \frac{a}{b} > \frac{2 - 1/(k - z)}{2 - 1/z}. \qquad (6)$$

The smallest bit string that can be fully deceptive consists of three bits [Goldberg, 1987]. In the following, we construct a 3-node BN that is fully deceptive. The construction is based on the trap function in Eq. 5. Let k = 3, z = 2, a = 4, and b = 5 in that equation, so we get the function

$$f(B) = \begin{cases} 2(2 - u(B)) & \text{if } u(B) \le 2 \\ 5(u(B) - 2) & \text{if } u(B) > 2, \end{cases}$$

which is shown in the figure below.

[Figure: plot of the above trap function f against u(B), for u(B) from 0 to 3.]

We note that the condition for full deception of Eq. 6 is met:

$$\frac{4}{5} = r > \frac{2 - 1/(3 - 2)}{2 - 1/2} = \frac{2}{3}.$$

Now the following trap function values can be calculated: f(000) = 4, f(001) = 2, f(011) = 0, and f(111) = 5. This gives bit string fitness as shown in Table 5. Schema fitness values can be computed from Table 5, and it can easily be checked that this fitness function is indeed fully deceptive.
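A small sketch (our own naming) of the general trap function of Eq. 5 and the full-deception test of Eq. 6, instantiated with the parameters used above:

```python
def trap(u, k, z, a, b):
    """Deb and Goldberg's trap function (Eq. 5) as a function of unitation u."""
    return a / z * (z - u) if u <= z else b / (k - z) * (u - z)

def fully_deceptive_condition(k, z, a, b):
    """Eq. 6: the trap is fully deceptive when r = a/b exceeds this bound."""
    return a / b > (2 - 1 / (k - z)) / (2 - 1 / z)

# Parameters used in the text: k = 3, z = 2, a = 4, b = 5.
print(fully_deceptive_condition(3, 2, 4, 5))     # True: 4/5 > 2/3
print([trap(u, 3, 2, 4, 5) for u in range(4)])   # [4.0, 2.0, 0.0, 5.0], cf. Table 5
```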

Table 5: Deceptive trap fitness function defined over bit string consisting of three bits.

i   B = Bin(i)   u(B)   f(B)   f(B)+c   cc(i)
0   000          0      4      7        1097
1   001          1      2      5        148
2   010          1      2      5        148
3   011          2      0      3        20
4   100          1      2      5        148
5   101          2      0      3        20
6   110          2      0      3        20
7   111          3      5      8        2981

Figure 2: Graph of BN constructed from deceptive trap fitness function with three bits.

Table 6: Conditional probability tables of BN constructed from deceptive trap fitness function.

Pr(Y3 | Y1, Y2):

Y1:      0               1
Y2:      0       1       0       1
Y3 = 0   0.8811  0.8809  0.8809  0.0067
Y3 = 1   0.1189  0.1190  0.1190  0.9933

The BN whose graph is shown in Figure 2 and whose conditional probability tables are shown in Table 6 is constructed from the fitness function f shown in Table 5. In order to make this BN deceptive, the Pr(Y1) table is replaced with a new table Pr(Y1') as follows. Since 111 is the global optimum and 000 the deceptive optimum, the condition

Pr(111) > Pr(000)

needs to be maintained. After some calculation we get Pr(Y1' = 0) < 0.5479. It turns out that, for example, Pr(Y1' = 0) = 0.54 and Pr(Y1' = 1) = 0.46 gives a fully deceptive BN when the two other tables in Table 6 are kept the same.

What happens if we consider a four-bit trap function? Let k = 4, z = 3, a = 4, and b = 5 in Equation 5, giving the function

$$f(B) = \begin{cases} \frac{4}{3}\,(3 - u(B)) & \text{if } u(B) \le 3 \\ 5(u(B) - 3) & \text{if } u(B) > 3. \end{cases}$$

The result of running Construct, as shown in Table 7, is quite similar to that for the three-bit trap function. As one might have expected, the BNs constructed from the two trap functions turn out to be 'harder' than the BNs constructed from the two onemax functions. In particular, there are two things to notice. First, the graph in Figure 2 is fully connected, indicating interactions between all random variables. Second, the conditional probability tables in Table 6 and in Table 7 contain more varied and extreme entries than those for the onemax functions. In particular, Pr(Y2 | Y1 = 1) and Pr(Y3 | Y1 = 1, Y2 = 1) are such extreme entries.
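The threshold Pr(Y1' = 0) < 0.5479 can be re-derived from the chain-rule factorization. In the sketch below (our own), the Pr(Y3 | Y1, Y2) entries come from Table 6, while the Pr(Y2 | Y1) values are maximum-likelihood re-estimates from the Table 5 cell counts (an assumption, as those entries are not among the recoverable Table 6 values):

```python
# Factors of Pr(v) = Pr(Y1) Pr(Y2 | Y1) Pr(Y3 | Y1, Y2) for the 3-node trap BN.
p_y2_1_given_y1 = {0: 168 / 1413, 1: 3001 / 3169}  # ML estimates from Table 5 counts
p_y3_1_given_11 = 0.9933                           # Pr(Y3 = 1 | Y1 = 1, Y2 = 1), Table 6
p_y3_0_given_00 = 0.8811                           # Pr(Y3 = 0 | Y1 = 0, Y2 = 0), Table 6

# Pr(111) > Pr(000) with p = Pr(Y1' = 1) requires
#   p * Pr(Y2 = 1 | Y1 = 1) * Pr(Y3 = 1 | 1, 1)
#     > (1 - p) * Pr(Y2 = 0 | Y1 = 0) * Pr(Y3 = 0 | 0, 0).
hi = p_y2_1_given_y1[1] * p_y3_1_given_11
lo = (1 - p_y2_1_given_y1[0]) * p_y3_0_given_00
print(lo / (hi + lo))   # p must exceed ~0.4521 ...
print(hi / (hi + lo))   # ... i.e. Pr(Y1' = 0) must stay below ~0.5479
```

This matches the bound stated above, and Pr(Y1' = 0) = 0.54 indeed satisfies it.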

7 Hill Functions

Intuitively, onemax functions are 'easy' while trap functions are 'hard', and we have seen how the corresponding constructed BNs reflect this. But could it be the case that not only trap functions, but also other functions of unitation, result in highly connected BNs? In order to investigate this question, we constructed a hill function by subtracting a trap function from its maximal value:

$$g(B) = b - f(B),$$

where f is a trap function and b is the trap function's maximal value. For instance, for the 3-bit trap function presented in Section 6 we get

$$g(B) = \begin{cases} 1 + 2u(B) & \text{if } u(B) \le 2 \\ 5(3 - u(B)) & \text{if } u(B) > 2. \end{cases}$$

Fitness values and cell counts for the 3-bit case are shown in Table 8.

Table 8: Hill fitness function defined over bit string consisting of three bits.

i   B = Bin(i)   u(B)   g(B)   g(B)+c   cc(i)
0   000          0      1      4        55
1   001          1      3      6        403
2   010          1      3      6        403
3   011          2      5      8        2981
4   100          1      3      6        403
5   101          2      5      8        2981
6   110          2      5      8        2981
7   111          3      0      3        20

Somewhat surprisingly, the same graph was induced as for the corresponding trap function, shown in Figure 2. However, the conditional probability tables, shown in Table 9, are different. The most prominent differences are the following. First, Pr(Z1) is slightly more skewed than Pr(Y1). Pr(Z2 | Z1 = 1) is much less skewed than Pr(Y2 | Y1 = 1). Pr(Z3 | Z1 = 1, Z2 = 1) is very similar to Pr(Y3 | Y1 = 1, Y2 = 1), and the other conditional probability distributions are also quite similar to each other.
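The Table 8 values follow directly from g(B) = 5 − f(B) and the cell-count rule of Section 4; a short self-contained sketch (our own naming):

```python
import math

def unitation(bits):
    return sum(bits)

def hill(bits):
    """g(B) = b - f(B) for the 3-bit trap of Section 6 (b = 5)."""
    u = unitation(bits)
    f = 2 * (2 - u) if u <= 2 else 5 * (u - 2)   # the 3-bit trap function
    return 5 - f

# Reproduce the Table 8 cell counts: cc(i) = round(exp(g(B) + 3)).
for i in range(8):
    bits = tuple((i >> (2 - j)) & 1 for j in range(3))
    print(bits, hill(bits), round(math.exp(hill(bits) + 3)))
# e.g. (0,0,0) -> g = 1, cc = 55; (0,1,1) -> g = 5, cc = 2981;
#      (1,1,1) -> g = 0, cc = 20
```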

Table 7: Conditional probability tables of BN constructed from deceptive trap fitness function with four bits.

Pr(Y2 | Y1):

Y1:      0       1
Y2 = 0   0.7916  0.1296
Y2 = 1   0.2084  0.8704

Pr(Y3 | Y1, Y2):

(Y1, Y2):  (0,0)   (0,1)   (1,0)   (1,1)
Y3 = 0     0.7915  0.7917  0.7917  0.0310
Y3 = 1     0.2084  0.2082  0.2082  0.9690

Pr(Y4 | Y1, Y2, Y3):

(Y1, Y2, Y3):  (0,0,0)  (0,0,1)  (0,1,0)  (0,1,1)  (1,0,0)  (1,0,1)  (1,1,0)  (1,1,1)
Y4 = 0         0.7915   0.7918   0.7918   0.7917   0.7918   0.7917   0.7917   0.0067
Y4 = 1         0.2085   0.2082   0.2082   0.2083   0.2082   0.2083   0.2083   0.9933

Table 9: Conditional probability tables of BN constructed from hill fitness function with three bits.

Pr(Z1):

Z1 = 0   Z1 = 1
0.3757   0.6243

Pr(Z2 | Z1):

Z1:      0       1
Z2 = 0   0.1192  0.5300
Z2 = 1   0.8808  0.4700

Pr(Z3 | Z1, Z2):

(Z1, Z2):  (0,0)   (0,1)   (1,0)   (1,1)
Z3 = 0     0.1201  0.1191  0.1191  0.9933
Z3 = 1     0.8799  0.8809  0.8809  0.0067

Similar to the 4-bit trap function, a 4-bit hill function was also investigated. The resulting conditional probability tables are omitted due to space restrictions. These tables show the same trend as the 3-bit tables, both by themselves and compared to the 4-bit trap BN. First, Pr(Z1) is somewhat more skewed than Pr(Y1). Pr(Z2 | Z1 = 1) is much less skewed than Pr(Y2 | Y1 = 1), and Pr(Z3 | Z1 = 1, Z2 = 1) is much less skewed than Pr(Y3 | Y1 = 1, Y2 = 1). Other conditional probability distributions are quite similar to each other.

8 Conclusion and Future Work

Characterizing classes of fitness functions for which evolutionary computation is appropriate is one of the most important current research directions. This paper focuses on two types of fitness functions: functions of unitation and Bayesian networks. Functions of unitation are attractive because they are relatively easy to manipulate and can be made hard to solve for GAs. Unfortunately, functions of unitation are artificial functions that are less interesting from an application point of view. In contrast, Bayesian networks have proven useful in many areas of artificial intelligence, but it is less clear which Bayesian networks are hard for GAs. Functions of unitation and Bayesian networks are therefore complementary, and this paper introduces an approach to mapping functions of unitation into Bayesian networks. In addition, instances from three classes of functions of unitation have been mapped to Bayesian networks: onemax functions, trap functions, and hill functions. Onemax functions of unitation are non-deceptive, and the constructed onemax BNs are non-deceptive and easy as well. Trap functions of unitation, on the other hand, are deceptive. The trap function instances mapped into highly connected, hard BNs, which can be made deceptive. Hill functions of unitation are similar to trap functions of unitation, but they are not deceptive under the current definition. The instances considered were still mapped to highly connected BNs. However, overall the skewness in the conditional distributions was smaller for the hill BNs than for the trap BNs.

Both the trap and the hill functions give highly connected Bayesian networks, suggesting that they are in a sense difficult. In retrospect, this appears reasonable since functions of unitation are global, meaning that the entire string is taken into account for fitness computation. However, when the skewness of the conditional distributions is also taken into account, the hill BNs are generally less skewed than the trap BNs, supporting the intuition that hill BNs are 'easier' than trap BNs.

Future work in three areas stands out: systematic experimentation, other measures of function difficulty, and other aspects of the relation between functions of unitation and BNs. Regarding systematic experimentation, the approach described here can be used to create BNs for systematic experimentation. There are several parameters to vary when constructing these networks, even when they are constructed from functions of unitation: size of deception, size of deceptive network, number of deceptive sub-networks, and signal difference between deceptive and non-deceptive optima. Future work on other measures of function difficulty could focus on how deception relates to other measures of function difficulty, for example NP-hardness. To our knowledge, there has not been any research on this. Regarding other aspects related to functions of unitation and BNs, one could first aim to exactly formalize the conditions under which BNs are deceptive or non-deceptive, similar to what has been done for functions of unitation [Deb and Goldberg, 1992]. Second, one could investigate empirically the degree to which there is deception in some 'natural' BNs, i.e. BNs that have been developed for application purposes.

Acknowledgments

The research of Ole J. Mengshoel and Dr. Wilkins was supported in part by ONR Grant N00014-95-1-0749, ARL Grant DAAL01-96-2-0003, and NRL Contract N00014-97-C-2061. Dr. Goldberg's contribution to this study was sponsored by the Air Force Office of Scientific Research, Air Force Materiel Command, USAF, under grants F49620-94-1-0103, F49620-95-1-0338, and F49620-97-1-0050. The US Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Air Force Office of Scientific Research or the U.S. Government. Comments from Vadim Bulitko, Eugene Grois, and William Hsu as well as from anonymous reviewers helped improve the presentation of the results in the paper.

References

[Ackley, 1987] Ackley, D. H. (1987). An Empirical Study of Bit Vector Function Optimization, chapter 13, pages 170-204. In [Davis, 1987].

[Back, 1996] Bäck, T. (1996). Evolutionary Algorithms in Theory and Practice. Oxford University Press, New York, NY.

[Davis, 1987] Davis, L., editor (1987). Genetic Algorithms and Simulated Annealing. Pitman, London.

[Deb and Goldberg, 1992] Deb, K. and Goldberg, D. E. (1992). Analyzing deception in trap functions. In FOGA-92, Foundations of Genetic Algorithms, Vail, Colorado.

[Druzdzel, 1994] Druzdzel, M. J. (1994). Some properties of joint probability distributions. In Proceedings of the Tenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-94), pages 187-194, Seattle, WA.

[Goldberg, 1987] Goldberg, D. E. (1987). Simple Genetic Algorithms and the Minimal Deceptive Problem, pages 74-88. In [Davis, 1987].

[Goldberg, 1989a] Goldberg, D. (1989a). Genetic algorithms and Walsh functions: II. Deception and its analysis. Complex Systems, 3(2):153-171.

[Goldberg, 1989b] Goldberg, D. (1989b). Genetic algorithms and Walsh functions: I. A gentle introduction. Complex Systems, 3(2):129-152.

[Goldberg, 1992] Goldberg, D. E. (1992). Construction of high-order deceptive functions using low-order Walsh coefficients. Annals of Mathematics and Artificial Intelligence, 5:35-48.

[Grefenstette, 1993] Grefenstette, J. J. (1993). Deception considered harmful. In Whitley, L. D., editor, Foundations of Genetic Algorithms 2, San Mateo, CA. Morgan Kaufmann.

[Mengshoel, 1997] Mengshoel, O. J. (1997). Belief network inference in dynamic environments. In Proc. of AAAI-97, page 813, Providence, RI.

[Mengshoel and Wilkins, 1998a] Mengshoel, O. J. and Wilkins, D. C. (1998a). Abstraction for belief revision: Using a genetic algorithm to compute the most probable explanation. In Proc. 1998 AAAI Spring Symposium on Satisficing Models, pages 46-53, Stanford University, CA.

[Mengshoel and Wilkins, 1998b] Mengshoel, O. J. and Wilkins, D. C. (1998b). Genetic algorithms for belief network inference: The role of scaling and niching. In Proc. Seventh Annual Conference on Evolutionary Programming, San Diego, CA. To appear.

[Mitchell, 1996] Mitchell, M. (1996). An Introduction to Genetic Algorithms. The MIT Press, Cambridge, MA.

[Neapolitan, 1990] Neapolitan, R. E. (1990). Probabilistic Reasoning in Expert Systems. John Wiley & Sons, New York, New York.

[Pearl, 1988] Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, California.

[Rojas-Guzman and Kramer, 1993] Rojas-Guzman, C. and Kramer, M. A. (1993). GALGO: A Genetic ALGOrithm decision support tool for complex uncertain systems modeled with Bayesian belief networks. In Proceedings of the Ninth Conference on Uncertainty in Artificial Intelligence, pages 368-375, Washington, D.C.

[Rojas-Guzman and Kramer, 1996] Rojas-Guzman, C. and Kramer, M. A. (1996). An evolutionary computing approach to probabilistic reasoning on Bayesian networks. Evolutionary Computation, 4(1):57-85.

[Scheines et al., 1994] Scheines, R., Spirtes, P., Glymour, C., and Meek, C. (1994). TETRAD II: Tools for Causal Modeling. Lawrence Erlbaum, Hillsdale, NJ.

[Welch, 1996] Welch, R. L. (1996). Real time estimation of Bayesian networks. In Proceedings of the Twelfth Annual Conference on Uncertainty in Artificial Intelligence (UAI-96), pages 533-544, Portland, Oregon.