Evolutionary Strategy for Learning Multiple-Valued Logic Functions

Alioune Ngom
Computer Science Department, 5115 Lambton Tower
University of Windsor, 401 Sunset Avenue
N9B 3P4 Windsor, Ontario, Canada. *

Dan A. Simovici
Department of Mathematics and Computer Science
University of Massachusetts at Boston
02125 Boston, Massachusetts, USA.

Ivan Stojmenović
Computer Science Department, School of Information Technology and Engineering
University of Ottawa, K1N 6N5 Ottawa, Ontario, Canada. †

Abstract

We consider the problem of synthesizing multiple-valued logic functions by neural networks. An evolutionary strategy (ES) is described which finds the longest strip in a set of points $V \subseteq K^n$. A strip contains points located between two parallel hyperplanes. Repeated application of the ES partitions the space $V$ into a certain number of strips, each of them corresponding to a hidden unit. We construct neural networks based on these hidden units. Preliminary experimental results are presented and discussed.

Keywords: Multiple-valued logic, Multiple-threshold perceptron, Evolution strategy, Neural network, Partitioning method, Constructive algorithm.

1. Introduction

In this paper we propose to synthesize multiple-valued logic functions by minimal multilayer feedforward neural networks. There are various measures that can be used in constructing multiple-valued neural networks. The most important measures are depth (number of layers) and size (number of processing units). The depth is related to the speed of computing a function, whereas the size determines the hardware cost. In this paper, we use multiple-valued multiple-threshold perceptrons as the basic processing elements (i.e. nodes) of the network. We apply the strip-based neural network growth algorithm of Ngom et al. [9] to realize multiple-valued logic functions by minimal networks. The technique in [9] applies a genetic algorithm to find small networks for given arbitrary functions. Our strip-based method, however, uses an evolutionary strategy to construct minimal networks.

Let $K = \{0, 1, \ldots, k-1\}$ with $k \ge 2$. A $k$-valued logic function maps the Cartesian product $K^n$ into $K$. Denote by $P_k^n$ the set of all such functions $f : K^n \to K$. The set $P_k$ defined by $P_k = \bigcup_{n \ge 1} P_k^n$ is the set of all $k$-valued logic functions. For instance, $P_2$ is the set of all two-valued logic functions.

An $n$-input $k$-valued $s$-threshold perceptron [8], abbreviated as $(n,k,s)$-perceptron, computes a weighted $n$-input $k$-valued $s$-threshold function $f_{\mathbf{t},\mathbf{o}}(\mathbf{w} \cdot \mathbf{x})$ given by

$$f_{\mathbf{t},\mathbf{o}}(\mathbf{w} \cdot \mathbf{x}) = \begin{cases} o_1 & \text{if } \mathbf{w} \cdot \mathbf{x} < t_1, \\ o_i & \text{if } t_{i-1} \le \mathbf{w} \cdot \mathbf{x} < t_i, \quad 2 \le i \le s, \\ o_{s+1} & \text{if } \mathbf{w} \cdot \mathbf{x} \ge t_s, \end{cases} \tag{1}$$

where $\mathbf{o} = (o_1, \ldots, o_{s+1})$, with $o_i \in K$, is the output vector; $\mathbf{t} = (t_1, \ldots, t_s)$ is the threshold vector, with $t_1 < t_2 < \cdots < t_s$ and $s \ge 1$ the number of threshold values; $\mathbf{x} = (x_1, \ldots, x_n) \in K^n$ is the input vector; $\mathbf{w} = (w_1, \ldots, w_n)$ is the weight vector; and $\mathbf{w} \cdot \mathbf{x}$ is the dot product of $\mathbf{w}$ and $\mathbf{x}$. The perceptron's transfer function is a $k$-valued $s$-threshold function $f_{\mathbf{t},\mathbf{o}}(\mathbf{w} \cdot \mathbf{x})$. An $(n,k,s)$-perceptron partitions the input space $K^n$ into $s+1$ disjoint classes $C_1, \ldots, C_{s+1}$ using $s$ parallel hyperplanes, where $C_i = \{\mathbf{x} \in K^n : t_{i-1} \le \mathbf{w} \cdot \mathbf{x} < t_i\}$ (we assume $t_0 = -\infty$ and $t_{s+1} = +\infty$). Each hyperplane, denoted $H_i$ ($1 \le i \le s$), has an equation of the form

$$\mathbf{w} \cdot \mathbf{x} = t_i. \tag{2}$$

Our model of a multiple-valued logic neuron is the $(n,k,s)$-perceptron defined above.

The first model of multiple-valued logic neurons (and neural networks) was introduced by Chan [2], and since then various other models have been described (see, for instance, [7]). The problem we address in this paper is that of learning multiple-valued logic functions using minimal neural networks composed of $(n,k,s)$-perceptrons. The problem of deciding whether or not a given task can be performed by a given architecture is known to be NP-complete [4]. Also, it has been shown in [1] that the problem of finding the absolute minimal architecture for a given task is NP-hard.
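As an illustration of definition (1), the following Python sketch evaluates an $(n,k,s)$-perceptron; the function name mv_perceptron and its calling convention are ours, not the paper's.

    from bisect import bisect_right

    def mv_perceptron(x, w, thresholds, outputs):
        """Evaluate the (n, k, s)-perceptron transfer function of equation (1).

        x          -- input vector in K^n, K = {0, ..., k-1}
        w          -- real weight vector of length n
        thresholds -- strictly increasing thresholds t_1 < ... < t_s
        outputs    -- output values o_1, ..., o_{s+1}, each in K
        """
        z = sum(wi * xi for wi, xi in zip(w, x))   # the dot product w . x
        # bisect_right returns the number of thresholds <= z, i.e. the index
        # i with t_i <= z < t_{i+1}, so outputs[i] is o_{i+1} of equation (1).
        return outputs[bisect_right(thresholds, z)]

    # A 2-input 4-valued 2-threshold perceptron that outputs 3 between the
    # hyperplanes w.x = 1.5 and w.x = 2.5, and 0 elsewhere:
    print(mv_perceptron((1, 2), (1.0, 0.5), [1.5, 2.5], [0, 3, 0]))   # -> 3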

* Research supported by NSERC grant RGPIN22811700 and University of Windsor's Startup Fund.
† Also with DISCA, IIMAS, UNAM, Dirección Circuito Escolar s/n, Coyoacán, México D.F. 04510, Mexico. Research supported by a REDII grant.

Proceedings of the 34th International Symposium on Multiple-Valued Logic (ISMVL'04) 0195-623X/04 $20.00 © 2004 IEEE



Figure 1. Example of longest strip for n = 2 and k = 4. (Figure: a grid of function values, 0 to 3, over inputs x and y; the longest strip is a maximal run of equal-valued points lying between two parallel lines.)

Procedure ES-BasedSynthesis
    i := 0;  T := V;
    Repeat
        Apply ES to find a subset S_i of T s.t. |S_i| is closest to |S_max| of T with respect to f;
        Create a new hidden unit i;
        T := T - S_i;  i := i + 1;
    Until T = empty set;
    Construct a network with i hidden units on the first layer;

Figure 2. ES-based synthesis algorithm.
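To make the loop in Figure 2 concrete, here is a minimal Python sketch under our own naming; find_best_strip stands for the ES of Section 3 and is not from the paper.

    def synthesize(points, f, find_best_strip):
        """Strip-based growth loop of Figure 2 (sketch).

        points          -- training inputs (initially all of K^n), as tuples
        f               -- the target k-valued function
        find_best_strip -- hypothetical ES routine returning (w, strip)
        """
        hidden_units = []
        remaining = list(points)
        while remaining:
            w, strip = find_best_strip(remaining, f)   # subset S_i of the points
            hidden_units.append((w, strip))            # one hidden unit per strip
            covered = set(strip)
            remaining = [x for x in remaining if x not in covered]
        return hidden_units

Each pass removes the points covered by the new unit, so the loop terminates once every training point belongs to some strip.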

Our approach to the problem's solution is discussed in section 2. The learning method is based on the general principle of partitioning algorithms discussed in [9]. A partitioning algorithm seeks to construct a minimal network by partitioning the input space into classes that are as large as possible. Each class of the partition is then assigned to a new hidden unit. The connections and weights of the new units are determined in such a way that the constructed network always gives the correct answer for any input. Distinct partitioning algorithms differ in the way the input space is partitioned. Network topologies obtained from different partitioning algorithms may also differ in the way new hidden units are connected.

In [9], a minimal neural network is obtained by assigning nodes to optimal subsets (called strips) of a function's space and combining those nodes in such a way that the given function is synthesized. A genetic algorithm (GA) is used to find the optimal strips of a function. Although the GA is a powerful optimization method, it is slow compared to other optimization techniques. In this paper, we use the computationally faster evolutionary strategy in place of the genetic algorithm to obtain the function's strips.

2. Longest strip based growth algorithm

A strip is a set of points between two parallel hyperplanes which have the same value. The longest strip is the strip with the maximum possible cardinality. A maximum separable subset is a set of points having equal values, with the maximum possible cardinality, that can be separated from all other points by exactly one hyperplane. An example of a longest strip is shown in Figure 1. The original STRIP algorithm of [9] uses a genetic algorithm (GA) to determine the longest strip or the maximum separable subset of a currently given set of training examples. Here, we use an evolution strategy (ES) in place of the genetic algorithm (see Figure 2).

The underlying growth algorithm of STRIP constructs a network by removal of points in a predefined objective subset $S_{\max} \subseteq V$. $S_{\max}$ is either the longest strip or the maximum separable subset in $V$. The goal is then, using ES, to obtain a subset $S \subseteq V$ such that $|S|$ is as close as possible to $|S_{\max}|$, if not equal to it (of course, $|S| \le |S_{\max}|$). In the algorithm, ES finds subsequent halfspaces delimited by either one or two hyperplanes (depending on the predefined objective subset $S_{\max}$). Each halfspace is assigned to a hidden unit that correctly classifies all of its elements. Once the first hidden layer is complete, the remaining weights, layers and units of the network are determined to complete the network construction (the details of the network architecture are described in section 4).

3. Determining longest strips by evolutionary strategy

Evolutionary strategies were proposed by Schwefel [11] as optimization methods for real-valued parameters. An ES manipulates a single potential solution to a problem. Specifically, it operates on an encoded representation of the solution, and not directly on the solution itself. Schwefel's ES encodes a solution as a real-valued vector; we will refer to such a vector as a chromosome. The solution is associated with an objective value that reflects how good or bad it is compared with other potential solutions in the space. ES is a randomized, guided hill-climbing technique in which a good candidate solution is obtained by applying a certain number of small mutations to a given parent solution. The best result of mutation is used again to generate the next best solution, and so on until some convergence criterion is satisfied.

3.1. Problem representation

Our ES method uses the same solution representation as [9]. That is, a potential solution $S$ (a subset of $K^n$) is represented as a weight vector $\mathbf{w}$ (such a vector can be decoded to obtain $S$). More formally, a potential solution is a subset $S \subseteq V$ and the best solution is one whose size is closest to $|S_{\max}|$ (if not equal to $|S_{\max}|$). Given a weight vector $\mathbf{w} = (w_1, \ldots, w_n)$, we can find the unique strip (or separable subset) of maximum cardinality that is associated with $\mathbf{w}$ (see section 3.2). Each chromosome $\mathbf{w}$ uniquely determines a partition of $V$ into $s+1$ classes by $s$ parallel hyperplanes (for some $s$), and the best chromosome is the one that maximizes the number of points between a pair of parallel hyperplanes. To determine how good a solution is, the ES needs an objective function to evaluate each chromosome $\mathbf{w}$.


3.2. Fitness function

The objective function, the function to optimize, provides the mechanism for evaluating each chromosome. Let $V \subseteq K^n$ be the current set of points; initially, $V = K^n$. To compute the longest strip generated by $\mathbf{w}$, we calculate the value $\mathbf{w} \cdot \mathbf{x}$ for every $\mathbf{x} \in V$ and construct a sorted list of records of the form $(\mathbf{w} \cdot \mathbf{x}, f(\mathbf{x}))$. The list is sorted using $\mathbf{w} \cdot \mathbf{x}$ as primary key and $f(\mathbf{x})$ as secondary key. Let these records be sorted as $r_1, \ldots, r_m$, or more precisely $(\mathbf{w} \cdot \mathbf{x}_1, f(\mathbf{x}_1)), \ldots, (\mathbf{w} \cdot \mathbf{x}_m, f(\mathbf{x}_m))$, where $m = |V|$. A strip in the sorted list is a sequence $r_i, \ldots, r_j$ such that

1. $f(\mathbf{x}_i) = f(\mathbf{x}_{i+1}) = \cdots = f(\mathbf{x}_j)$;
2. $\mathbf{w} \cdot \mathbf{x}_{i-1} < \mathbf{w} \cdot \mathbf{x}_i$ and $\mathbf{w} \cdot \mathbf{x}_j < \mathbf{w} \cdot \mathbf{x}_{j+1}$.

The length of the strip is $j - i + 1$ and $f(\mathbf{x}_i)$ is the value of the strip.

Given a set of points $V$ and a function $f$ over $V$, let $S_1(\mathbf{w}, V, f)$ and $S_2(\mathbf{w}, V, f)$ be, respectively, the leftmost and rightmost strips generated by $\mathbf{w}$, with strip values $v_1$ and $v_2$. We denote by $L(\mathbf{w}, V, f)$ the length of the longest strip generated by $\mathbf{w}$, and by $L_{12}(\mathbf{w}, V, f)$ the length of the longer of the leftmost and rightmost strips, on set $V$ and function $f$. To evaluate how good $\mathbf{w}$ is, we propose the following fitness function with respect to the definition of $S_{\max}$:

$$\text{fitness}(\mathbf{w}) = L(\mathbf{w}, V, f) \quad (S_{\max} = \text{longest strip}), \tag{3}$$

$$\text{fitness}(\mathbf{w}) = L_{12}(\mathbf{w}, V, f) \quad (S_{\max} = \text{maximum separable subset}). \tag{4}$$

Let $V_v = \{\mathbf{x} \in V : f(\mathbf{x}) = v\}$, that is, the set of points of value $v$. An alternative objective is to select a strip $S$ of value $v$, i.e. $S \subseteq V_v$, which maximizes $|S| / |V_v|$, where $|S|$ denotes the length of $S$. That is, as in [6, 12], the selection criterion chooses the strip that constitutes the largest proportion of a class of points that can be separated. We denote by $P(\mathbf{w}, V, f)$ the largest such proportion over all strips generated by $\mathbf{w}$, and by $P_1(\mathbf{w}, V, f)$ and $P_2(\mathbf{w}, V, f)$ the proportions of the leftmost and rightmost strips, respectively, on set $V$ and function $f$. Our alternative fitness functions with respect to $S_{\max}$ are

$$\text{fitness}(\mathbf{w}) = P(\mathbf{w}, V, f) \quad (S_{\max} = \text{largest strip proportion}), \tag{5}$$

$$\text{fitness}(\mathbf{w}) = \max(P_1(\mathbf{w}, V, f), P_2(\mathbf{w}, V, f)) \quad (S_{\max} = \text{largest separable proportion}). \tag{6}$$
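To illustrate, a minimal Python sketch of this sort-and-scan evaluation (the name and simplifications are ours): it returns the length and value of the longest strip generated by $\mathbf{w}$, which is exactly the fitness (3). The proportion-based fitnesses (5) and (6) divide such lengths by the class sizes $|V_v|$.

    def longest_strip(points, f, w):
        """Sort-and-scan evaluation of a chromosome w (sketch).

        Returns (length, value) of the longest strip generated by w,
        following the two conditions above; ties in w.x at a run boundary
        are treated conservatively (the run is simply not counted).
        """
        dot = lambda v, x: sum(vi * xi for vi, xi in zip(v, x))
        recs = sorted((dot(w, x), f(x)) for x in points)   # keys: w.x, then f(x)
        best_len, best_val = 0, None
        i = 0
        while i < len(recs):
            j = i
            while j + 1 < len(recs) and recs[j + 1][1] == recs[i][1]:
                j += 1                                     # extend the equal-value run
            left_ok = i == 0 or recs[i - 1][0] < recs[i][0]
            right_ok = j == len(recs) - 1 or recs[j][0] < recs[j + 1][0]
            if left_ok and right_ok and j - i + 1 > best_len:
                best_len, best_val = j - i + 1, recs[i][1]
            i = j + 1
        return best_len, best_val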

3.3. Mutation

Chromosomes are subject to random mutations. With probability $p_m$, each coordinate of a vector is altered according to some mutation operator. We use the three mutation operators of [9]. For a chromosome to be mutated, one of the three mutation operators is selected with probability $1/3$.
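The three operators themselves are defined in [9] and are not restated here; purely as an illustrative stand-in, the sketch below applies a generic Gaussian coordinate mutation, which is our assumption and not one of the paper's operators.

    import random

    def mutate(w, p_m=0.1, sigma=0.1):
        """Stand-in mutation: alter each coordinate with probability p_m
        by adding zero-mean Gaussian noise (not the operators of [9])."""
        return [wi + random.gauss(0.0, sigma) if random.random() < p_m else wi
                for wi in w]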

3.4. Weight neighborhood

Recall that, to find the subset $S$ for a current $\mathbf{w}$ in a set $V$ of points, we sort the points with respect to $\mathbf{w} \cdot \mathbf{x}$ (first key) and $f(\mathbf{x})$ (second key) and find the longest strip or the maximum separable subset from the sorted list. The neighbors of $\mathbf{w}$ are precisely those weight vectors that yield the least change in the ordering of the $\mathbf{w} \cdot \mathbf{x}$'s. For instance, if $\mathbf{w}$ gives the order $\mathbf{w} \cdot \mathbf{x}_1 \le \mathbf{w} \cdot \mathbf{x}_2 \le \cdots \le \mathbf{w} \cdot \mathbf{x}_m$, then a near neighbor of $\mathbf{w}$ may produce an order in which only two consecutive points are transposed, while a far neighbor of $\mathbf{w}$ may produce a very different order.

Consider the $h$-th coordinate, $w_h$, of $\mathbf{w}$, and let $\mathbf{w} \cdot \mathbf{x}_i$ and $\mathbf{w} \cdot \mathbf{x}_{i+1}$ be two consecutive elements in the order produced by $\mathbf{w}$, with $\Delta_i = \mathbf{w} \cdot \mathbf{x}_{i+1} - \mathbf{w} \cdot \mathbf{x}_i \ge 0$. A vector $\mathbf{u}$ is a neighbor of $\mathbf{w}$ with respect to coordinate $h$ if $u_h = w_h + \delta$ and $u_l = w_l$ for $l \ne h$, where $|\delta|$ is minimal such that the sorted order is changed. Then only the $h$-th coordinate is affected, and hence $\mathbf{u} \cdot \mathbf{x}_i = \mathbf{w} \cdot \mathbf{x}_i + \delta x_{i,h}$ and $\mathbf{u} \cdot \mathbf{x}_{i+1} = \mathbf{w} \cdot \mathbf{x}_{i+1} + \delta x_{i+1,h}$. Taking $\mathbf{u} \cdot \mathbf{x}_i = \mathbf{u} \cdot \mathbf{x}_{i+1}$ and solving for $\delta$, we obtain

$$\delta = \frac{\mathbf{w} \cdot \mathbf{x}_{i+1} - \mathbf{w} \cdot \mathbf{x}_i}{x_{i,h} - x_{i+1,h}} = \frac{\Delta_i}{x_{i,h} - x_{i+1,h}}. \tag{7}$$

Only differences $\Delta_i > 0$, for $1 \le i \le m-1$, between distinct consecutive elements in the sorted order are considered. For each coordinate $h$ ($1 \le h \le n$) there should be only one neighbor, chosen such that $|\delta|$ is minimized and not equal to zero. Let $\delta_h$ be that minimal value and denote the nearest neighbor of $\mathbf{w}$ with respect to $h$ by $\mathbf{u}^h$. Then $\mathbf{u}^h$ differs from $\mathbf{w}$ only in the $h$-th coordinate, by $\delta_h$; that is, $u^h_h = w_h + \delta_h$ and $u^h_l = w_l$ for $l \ne h$. In other words, the Euclidean distance $d(\mathbf{w}, \mathbf{u}^h)$ between $\mathbf{w}$ and $\mathbf{u}^h$ is $|\delta_h|$. Thus there are $n$ neighbors of $\mathbf{w}$, one nearest neighbor per coordinate of $\mathbf{w}$. We denote the set of all such neighbors of $\mathbf{w}$ by $N(\mathbf{w}) = \{\mathbf{u}^1, \ldots, \mathbf{u}^n\}$; this set is called the neighborhood of $\mathbf{w}$.
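In code, the neighborhood $N(\mathbf{w})$ can be sketched as follows (names ours): for each coordinate $h$ we scan consecutive pairs of the sorted projections and keep the smallest nonzero step $\delta$ given by equation (7); in practice one steps just past this tipping value so the order actually changes.

    def neighborhood(points, w):
        """Return up to n nearest neighbors of w, one per coordinate (sketch)."""
        dot = lambda v, x: sum(vi * xi for vi, xi in zip(v, x))
        xs = sorted(points, key=lambda x: dot(w, x))
        neighbors = []
        for h in range(len(w)):
            best = None
            for a, b in zip(xs, xs[1:]):
                gap = dot(w, b) - dot(w, a)        # Delta_i >= 0
                denom = a[h] - b[h]                # x_{i,h} - x_{i+1,h}
                if gap != 0 and denom != 0:
                    delta = gap / denom            # equation (7)
                    if best is None or abs(delta) < abs(best):
                        best = delta
            if best is not None:                   # coordinate h may admit no neighbor
                u = list(w)
                u[h] += best
                neighbors.append(u)
        return neighbors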

3.5. Evolution strategy

Our evolutionary strategy is shown in Figure 3. In the algorithm, $\mathbf{w}^t$ is the current solution at generation $t$, $\mathbf{w}^*$ is the best solution generated so far, $\bar{\mathbf{w}}$ is the best solution in $N(\mathbf{w}^t)$, and $\mathbf{w}^{t+1}$ is the next solution to generate.


Procedure EvolutionaryStrategy
    t := 0;
    w^t := random unit vector;  w* := w^t;
    Repeat
        w_bar := best in N(w^t);
        If fitness(w_bar) >= fitness(w^t) then
            w^{t+1} := w_bar;  w* := w_bar;
        Else
            Success := false;  j := 0;
            Repeat                              { search in the N(u^j)'s }
                j := j + 1;
                w_bar := best in N(u^j);        { u^j is the j-th neighbor of w^t }
                If fitness(w_bar) > fitness(w^t) then
                    w^{t+1} := w_bar;  w* := w_bar;  Success := true;
            Until Success = true or j = n;
            If Success = false then             { w^t may be a local optimum }
                With probability 1/2 do either one of
                    1: w^{t+1} := best between w_bar and all the u^j's;
                    2: With probability 1/4 do either one of
                        2.1: w^{t+1} := random unit vector;        { big jump }
                        2.2: w^{t+1} := random mutation of w^t;
                        2.3: w^{t+1} := random mutation of w*;
                        2.4: w^{t+1} := random mutation of u^j, for a random j;
        t := t + 1;
    Until the stopping criterion is true;

Figure 3. Evolution strategy to find S such that |S| is as close as possible to |S_max|.

We initially start with a random unit vector $\mathbf{w}^0$ and set $\mathbf{w}^*$ to $\mathbf{w}^0$. The algorithm works as follows. At generation $t$, we compute the best neighbor of $\mathbf{w}^t$, that is, a vector $\bar{\mathbf{w}}$ in $N(\mathbf{w}^t)$ that has the highest fitness value. If $\bar{\mathbf{w}}$ is better than $\mathbf{w}^t$ then we are done for this generation; $\mathbf{w}^{t+1}$ and $\mathbf{w}^*$ are both set to $\bar{\mathbf{w}}$ and we move to the next generation. If $\bar{\mathbf{w}}$ is worse than $\mathbf{w}^t$ then we must decide how to set $\mathbf{w}^{t+1}$. We first attempt to find a solution better than $\mathbf{w}^t$ in the neighborhoods of the neighbors of $\mathbf{w}^t$; that is, we search successively in each $N(\mathbf{u}^j)$ ($1 \le j \le n$) until such a solution is found or all $n$ neighborhoods have been tried without success. If a solution is found, we set both $\mathbf{w}^{t+1}$ and $\mathbf{w}^*$ to that solution and move on to the next generation. If no such solution is found, then $\mathbf{w}^t$ is possibly a local optimum. We then have two choices: with equal probability we either 1) set $\mathbf{w}^{t+1}$ to the best among $\bar{\mathbf{w}}$ and the $\mathbf{u}^j$'s and move on to generation $t+1$, or 2) to jump out of a possible local optimum, set $\mathbf{w}^{t+1}$ to a random unit vector or to a random alteration of $\mathbf{w}^t$, $\mathbf{w}^*$, or $\mathbf{u}^j$ for some random $j$. In the second choice, one of the four possibilities is selected with probability $1/4$, and the alteration of a vector is done by choosing any of the three mutation techniques described in section 3.3 (with probability $1/3$).

Let $g$ be the number of generations. At each generation the ES evaluates the $n$ neighbors of $\mathbf{w}^t$ and, when a local optimum is suspected, the $n$ neighborhoods of those neighbors, so only $O(n^2)$ chromosomes are evaluated per generation. The GA of [9] evaluates a population of $P$ chromosomes per generation, and since the population size $P$ is much larger than $n$, our ES is much faster than the GA of [9].
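A condensed Python sketch of this loop follows, assuming fitness, neighborhood, mutate and random_unit_vector helpers in the spirit of the sketches above (here taken to be closed over the current point set); the uniform probabilities mirror the description in the text.

    import random

    def evolution_strategy(fitness, neighborhood, mutate, random_unit_vector,
                           generations=1000):
        """Hill-climbing ES of Figure 3 (sketch, not the authors' code)."""
        w = random_unit_vector()
        best = w
        for _ in range(generations):
            nbrs = neighborhood(w)                       # N(w^t)
            w_bar = max(nbrs, key=fitness)
            if fitness(w_bar) >= fitness(w):
                w = best = w_bar                         # accept the best neighbor
                continue
            seen = []                                    # search the N(u^j)'s
            for u in nbrs:
                cand = max(neighborhood(u), key=fitness)
                seen.append(cand)
                if fitness(cand) > fitness(w):
                    w = best = cand
                    break
            else:                                        # possible local optimum
                if random.random() < 0.5:
                    w = max(nbrs + seen, key=fitness)    # choice 1
                else:                                    # choice 2: escape
                    w = random.choice([random_unit_vector(),
                                       mutate(w),
                                       mutate(best),
                                       mutate(random.choice(nbrs))])
        return best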

4. Constructing the neural network

In the $i$-th iteration of the ES-based synthesis algorithm, ES will find a chromosome $\mathbf{w}_i$ which generates a subset $S_i$. Subset $S_i$ is the longest strip (or maximum separable subset) found by the ES. The optimization ability of ES makes it possible to attain a solution as close (in size) to $S_{\max}$ as possible, if not equal. The ability of ES to produce $S_{\max}$ depends on its many control parameters and on the complexity of the tasks to learn.

Longest strip based network. At every iteration $i$ of the ES-based synthesis algorithm, ES finds a chromosome $\mathbf{w}_i$ which produces the longest strip $S_i = \{\mathbf{x} : t_1 \le \mathbf{w}_i \cdot \mathbf{x} < t_2\}$, with strip value $v_i = f(\mathbf{x})$ for $\mathbf{x} \in S_i$. We then create an $(n, k, 2)$-perceptron (hidden unit $i$) whose weight vector is $\mathbf{w}_i$, threshold vector is $\mathbf{t} = (t_1, t_2)$ and output vector is $\mathbf{o} = (0, v_i, 0)$. In other words, the perceptron has a transfer function of the form $f_{(t_1, t_2), (0, v_i, 0)}(\mathbf{w}_i \cdot \mathbf{x})$ (that is, a $k$-valued two-threshold function). The $(n,k,2)$-perceptron will output the value $v_i$ for all points $\mathbf{x} \in S_i$ and the value 0 for all points $\mathbf{x} \notin S_i$. In order to achieve good accuracy on the testing set, that is, good generalization ability when approximating a function, we set the threshold vector to $\mathbf{t} = (t_1 - \epsilon_1, t_2 + \epsilon_2)$. Thus test points of value $v_i$ which are outside but close to the strip (those that lie between $t_1 - \epsilon_1$ and $t_1$, or between $t_2$ and $t_2 + \epsilon_2$) will be correctly classified by unit $i$, since they are now spanned by $S_i$. The offsets $\epsilon_1$ and $\epsilon_2$ are given by



 











$$\epsilon_1 = \frac{t_1 - \max\{\mathbf{w}_i \cdot \mathbf{x} : \mathbf{w}_i \cdot \mathbf{x} < t_1\}}{2} \quad \text{and} \quad \epsilon_2 = \frac{\min\{\mathbf{w}_i \cdot \mathbf{x} : \mathbf{w}_i \cdot \mathbf{x} \ge t_2\} - t_2}{2}, \tag{8}$$

that is, half the gap between each threshold and the projection of the nearest training point outside the strip.
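Putting the strip and the offsets together, a hidden unit can be sketched as follows, reusing the mv_perceptron sketch from the introduction; the half-gap offsets implement our reading of equation (8).

    def make_hidden_unit(w, t1, t2, value, points):
        """Build an (n, k, 2)-perceptron unit for the strip {x : t1 <= w.x < t2},
        widening the thresholds by the half-gap offsets of equation (8)."""
        dot = lambda v, x: sum(vi * xi for vi, xi in zip(v, x))
        below = [dot(w, x) for x in points if dot(w, x) < t1]
        above = [dot(w, x) for x in points if dot(w, x) >= t2]
        eps1 = (t1 - max(below)) / 2 if below else 0.0
        eps2 = (min(above) - t2) / 2 if above else 0.0
        thresholds = [t1 - eps1, t2 + eps2]
        outputs = [0, value, 0]                 # o = (0, v_i, 0)
        return lambda x: mv_perceptron(x, w, thresholds, outputs)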

Maximum separable subset based network. At every iteration $i$ of the ES-based synthesis algorithm, ES finds a chromosome $\mathbf{w}_i$ which produces a maximum separable subset $S_i = \{\mathbf{x} : \mathbf{w}_i \cdot \mathbf{x} < t_1\}$ or $S_i = \{\mathbf{x} : \mathbf{w}_i \cdot \mathbf{x} \ge t_1\}$ (i.e., the larger of the leftmost and the rightmost strips), with strip value $v_i = f(\mathbf{x})$ for $\mathbf{x} \in S_i$. We then create an $(n, k, 1)$-perceptron (hidden unit $i$) whose weight vector is $\mathbf{w}_i$, threshold vector is $\mathbf{t} = (t_1)$, and output vector is $\mathbf{o} = (v_i, 0)$ if $S_i$ is the leftmost strip or $\mathbf{o} = (0, v_i)$ if $S_i$ is the rightmost strip. In other words, the perceptron has a transfer function of the form $f_{(t_1), (v_i, 0)}(\mathbf{w}_i \cdot \mathbf{x})$ or $f_{(t_1), (0, v_i)}(\mathbf{w}_i \cdot \mathbf{x})$ (that is, a $k$-valued one-threshold function). The $(n,k,1)$-perceptron will output the value $v_i$ for all points $\mathbf{x} \in S_i$ and the value 0 for all points $\mathbf{x} \notin S_i$.



