Evolving Rules to Solve Problems: The Learning Classifier Systems Way
Pier Luca Lanzi

UEC, Chofu, Tokyo, Japan, March 5th, 2008

Evolving

Early Evolutionary Research


• Box (1957). Evolutionary operations; led to simplex methods and Nelder–Mead.
• Other evolutionary pioneers: Friedman (1959), Bledsoe (1961), Bremermann (1961).
• Rechenberg (1964), Schwefel (1965). Evolution Strategies.
• Fogel, Owens & Walsh (1966). Evolutionary programming.

Common view: Evolution = random mutation + save the best.

Early intuitions


• “There is the genetical or evolutionary search by which a combination of genes is looked for, the criterion being the survival value.” Alan M. Turing, Intelligent Machinery, 1948.

• “We cannot expect to find a good child-machine at the first attempt. One must experiment with teaching one such machine and see how well it learns. One can then try another and see if it is better or worse. There is an obvious connection between this process and evolution, by the identifications:
  Structure of the child machine = hereditary material
  Changes of the child machine = mutations
  Natural selection = judgment of the experimenter”
  Alan M. Turing, “Computing Machinery and Intelligence”, 1950.


Meanwhile… in Ann Arbor…
• Holland (1959). Iterative circuit computers.
• Holland (1962). Outline for a logical theory of adaptive systems.
• Role of recombination (Holland 1965)
• Role of schemata (Holland 1968, 1971)
• Two-armed bandit (Holland 1973, 1975)
• First dissertations (Bagley, Rosenberg 1967)
• Simple Genetic Algorithm (De Jong 1975)


What are Genetic Algorithms?


• Genetic algorithms (GAs) are search algorithms based on the mechanics of natural selection and genetics.
• Two components:
  – Natural selection: survival of the fittest
  – Genetics: recombination of structures, variation
• Underlying metaphor:
  – Individuals in a population must be adapted to the environment to survive and reproduce
  – A problem can be viewed as an environment; we evolve a population of solutions to solve it
  – Different individuals are differently adapted
  – To survive, a solution must be “adapted” to the problem
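As an illustration of these mechanics, here is a minimal sketch of my own (not material from the talk): a toy GA that evolves bit strings toward all ones ("one-max"). The fitness function, population size, and rates are arbitrary choices.

    import random

    def one_max(individual):
        """Fitness: number of 1s in the bit string (a stand-in objective)."""
        return sum(individual)

    def select(population, k=2):
        """Tournament selection: the fittest of k randomly drawn individuals survives."""
        return max(random.sample(population, k), key=one_max)

    def crossover(a, b):
        """One-point recombination of two parent strings."""
        point = random.randrange(1, len(a))
        return a[:point] + b[point:]

    def mutate(individual, rate=0.01):
        """Flip each bit with a small probability."""
        return [bit ^ 1 if random.random() < rate else bit for bit in individual]

    def evolve(pop_size=50, length=20, generations=100):
        population = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
        for _ in range(generations):
            population = [mutate(crossover(select(population), select(population)))
                          for _ in range(pop_size)]
        return max(population, key=one_max)

    print(evolve())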


Rules

What was the goal?


1#11:buy ⇒ 30
0#0#:sell ⇒ -2
…

• A real system with an unknown underlying dynamics
• Use a classifier system online to generate a behavior that matched the real system
• The evolved rules would provide a plausible, human-readable model of the unknown system
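To make the rule syntax above concrete, here is a small sketch of my own (not code from the talk): conditions are ternary strings over {0, 1, #}, where '#' is a wildcard that matches either bit, and each rule pairs a condition with an action and a predicted payoff.

    def matches(condition: str, state: str) -> bool:
        """A condition matches when every non-# position equals the corresponding input bit."""
        return all(c == '#' or c == s for c, s in zip(condition, state))

    # The rules shown above: (condition, action, predicted payoff)
    rules = [("1#11", "buy", 30), ("0#0#", "sell", -2)]

    state = "1011"
    for condition, action, payoff in rules:
        if matches(condition, state):
            print(f"{condition}:{action} matches {state}, predicted payoff {payoff}")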


Holland’s Vision, Cognitive System One


To state, in concrete technical form, a model of a complete mind and its several aspects:
• A cognitive system interacting with an environment
• Binary detectors and effectors
• Knowledge = set of classifiers
• Condition-action rules that recognize a situation and propose an action
• Payoff reservoir for the system’s needs
• Payoff distributed through an epochal algorithm
• Internal memory as message list
• Genetic search of classifiers

As time goes by…

[Timeline figure; parallel threads labeled “Reinforcement Learning” and “Machine Learning”]

• 1970s: Genetic algorithms and CS-1. Research flourishes; success is limited.
• 1980s: Evolving rules as optimization. Research follows Holland’s vision; success is still limited.
• 1990s: Stewart Wilson creates XCS. Robotics applications and first results on classification, but the interest fades away.
• 2000s: Classifier systems finally work. Large development of models, facetwise theory, and applications.

Stewart W. Wilson & The XCS Classifier System


1. Simplify the model
2. Go for accurate predictions, not high payoffs
3. Apply the genetic algorithm to subproblems, not to the whole problem
4. Focus on classifier systems as reinforcement learning with rule-based generalization
5. Use reinforcement learning (Q-learning) to distribute reward

The most successful model developed so far.
• Wilson, S.W.: Classifier Fitness Based on Accuracy. Evolutionary Computation 3(2), 149–175 (1995).

The Classifier System Way

Learning Classifier Systems as Reinforcement Learning Methods


[Figure: agent–environment loop; the system receives state s_{t+1} and reward r_{t+1} from the environment and sends back action a_t]

• The goal: maximize the amount of reward received
• How much future reward when a_t is performed in s_t?
• What is the expected payoff for s_t and a_t?
• Need to compute a value function, Q(s_t, a_t) → payoff
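As a concrete (if idealized) reference point, here is a minimal tabular Q-learning sketch of my own; the talk only states that such a value function must be computed, and the action set and constants below are placeholders.

    from collections import defaultdict

    ALPHA = 0.1    # learning rate
    GAMMA = 0.9    # discount factor
    ACTIONS = ["buy", "sell"]      # placeholder action set
    Q = defaultdict(float)         # Q[(state, action)] -> expected payoff

    def q_update(state, action, reward, next_state):
        """One Q-learning step: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

    # Example: performing "buy" in state "1011" yielded reward 30 and led to state "0010".
    q_update("1011", "buy", 30, "0010")
    print(Q[("1011", "buy")])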


How does reinforcement learning work?

• Define the inputs, the actions, and how the reward is determined
• Define the expected payoff
• Compute a value function Q(s_t, a_t) mapping state-action pairs into expected payoffs


This looks simple… Let’s bring RL to the real world!


• Reinforcement learning assumes that Q(s_t, a_t) is represented as a table
• But the real world is complex: the number of possible inputs can be huge!
• We cannot afford an exact Q(s_t, a_t)
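A quick back-of-the-envelope illustration (mine, not from the slides): with n binary detectors the exact table needs one entry per input string and action, so its size explodes as 2^n.

    # Rows of an exact Q-table per action, as a function of the number of binary inputs.
    for n in (10, 30, 50):
        print(f"{n} binary inputs -> {2**n:,} states to tabulate per action")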


What are the issues?

• Exact representation infeasible
• Approximation mandatory
• The function is unknown; it is learnt online from experience


What are the issues?


• Learning an unknown payoff function while also trying to approximate it
• The approximator works on intermediate estimates, while also providing information for the learning
• Convergence is not guaranteed


What does this have to do with Learning Classifier Systems?


• They solve reinforcement learning problems
• They represent the payoff function Q(s_t, a_t) as a population of rules, the classifiers
• Classifiers are evolved while Q(s_t, a_t) is learnt online
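A sketch of my own of this core idea (the field names are illustrative, not an official API): the system's payoff estimate for a state-action pair is combined from the classifiers that match the state and advocate that action, here as a fitness-weighted average in the style of XCS.

    from dataclasses import dataclass

    @dataclass
    class Classifier:
        condition: str     # ternary string over {0, 1, #}
        action: str
        prediction: float  # estimated payoff
        fitness: float     # how much this estimate is trusted

    def matches(condition: str, state: str) -> bool:
        return all(c == '#' or c == s for c, s in zip(condition, state))

    def system_prediction(population, state, action):
        """Fitness-weighted payoff estimate for (state, action) built from matching rules."""
        advocates = [cl for cl in population
                     if cl.action == action and matches(cl.condition, state)]
        total_fitness = sum(cl.fitness for cl in advocates)
        if total_fitness == 0:
            return 0.0
        return sum(cl.prediction * cl.fitness for cl in advocates) / total_fitness

    population = [Classifier("1#11", "buy", 30.0, 0.9),
                  Classifier("0#0#", "sell", -2.0, 0.5)]
    print(system_prediction(population, "1011", "buy"))   # -> 30.0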


What is a classifier?


IF condition C is true for input s THEN the value of action a is p(s,w)

Which type of approximation? Which representation?

[Figure: the payoff landscape of action A over input s is approximated, on the interval where the condition C(s) = l ≤ s ≤ u matches, by the linear prediction p(x, w) = w_0 + w_1 x]
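The following is my own sketch of such a classifier (names and the learning rate are illustrative): an interval condition l ≤ s ≤ u paired with a computed linear prediction p(s, w) = w_0 + w_1 s, whose weights are nudged toward the observed payoff with a Widrow-Hoff (delta-rule) step.

    class IntervalClassifier:
        def __init__(self, lower, upper, action, eta=0.2):
            self.lower, self.upper = lower, upper   # condition: l <= s <= u
            self.action = action
            self.w = [0.0, 0.0]                     # weights of p(s, w) = w0 + w1 * s
            self.eta = eta                          # learning rate

        def matches(self, s):
            return self.lower <= s <= self.upper

        def prediction(self, s):
            return self.w[0] + self.w[1] * s

        def update(self, s, payoff):
            """Delta-rule step: move p(s, w) toward the payoff observed for this action in s."""
            error = payoff - self.prediction(s)
            self.w[0] += self.eta * error
            self.w[1] += self.eta * error * s

    cl = IntervalClassifier(lower=0.2, upper=0.6, action="A")
    if cl.matches(0.4):
        cl.update(0.4, payoff=10.0)
        print(cl.prediction(0.4))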

Same example with computed prediction



How do learning classifier systems work? The main performance cycle


• The classifiers predict an expected payoff
• The incoming reward is used to update the rules which helped in getting the reward
• Any reinforcement learning algorithm can be used to estimate the classifier prediction
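A minimal sketch of my own of that update step: every rule that contributed to the executed action has its prediction moved toward the received payoff (a simple Widrow-Hoff update here; as noted above, any reinforcement learning estimator could be plugged in instead).

    BETA = 0.2  # learning rate

    class Rule:
        def __init__(self, prediction):
            self.prediction = prediction

    def update_action_set(action_set, payoff):
        """Move the prediction of each contributing rule toward the received payoff."""
        for cl in action_set:
            cl.prediction += BETA * (payoff - cl.prediction)

    action_set = [Rule(10.0), Rule(25.0)]
    update_action_set(action_set, payoff=30.0)
    print([cl.prediction for cl in action_set])   # both predictions move toward 30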



Where do classifiers come from?

• In principle, any search method may be used
• I prefer genetic algorithms because they are representation independent
• A genetic algorithm selects, recombines, and mutates existing classifiers to search for better ones
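As a sketch of my own of that search (rates and the symbol set are illustrative), genetic operators can act directly on ternary conditions: one-point crossover recombines two parents and mutation perturbs individual symbols over {0, 1, #}.

    import random

    SYMBOLS = "01#"

    def crossover(cond_a: str, cond_b: str) -> str:
        """One-point crossover of two ternary conditions."""
        point = random.randrange(1, len(cond_a))
        return cond_a[:point] + cond_b[point:]

    def mutate(cond: str, rate: float = 0.05) -> str:
        """Replace each symbol with a random one from {0, 1, #} with small probability."""
        return "".join(random.choice(SYMBOLS) if random.random() < rate else c for c in cond)

    parents = ["1#11", "0#0#"]
    print(mutate(crossover(parents[0], parents[1])))   # e.g. "1#0#"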


• Accurate Estimates About Classifiers (Powerful RL)
• Genetics-Based Generalization
• Classifier Representation

One Algorithm, Many Representations!

One Representation, One Principle


• Data described by 6 variables a1, …, a6
• They represent the simple concept “a1 = a2 ∨ a5 = 1”
• A rather typical approach:
  – Select a representation
  – Select an algorithm which produces such a representation
  – Apply the algorithm
• Decision rules (attribute-value):
  – if (a5 = 1) then class = 1 [95.3%]
  – if (a1 = 3 ∧ a2 = 3) then class = 1 [92.2%]
  – …
• FOIL:
  – Clause 0: is_0(a1,a2,a3,a4,a5,a6) :- a1≠a2, a5≠1
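To make the target concept concrete, here is a small check of my own (assuming, as the rules above suggest, that attributes take values in {1, 2, 3}; that range is my assumption, not stated on the slide): a single attribute-value rule such as "if a5 = 1 then class = 1" is consistent with the concept but covers only part of its positive examples.

    import itertools

    def concept(a):
        """Target concept from the slide: class = 1 iff a1 = a2 or a5 = 1, with a = (a1, ..., a6)."""
        return int(a[0] == a[1] or a[4] == 1)

    # Assumed attribute values {1, 2, 3}; enumerate all 3**6 examples.
    examples = list(itertools.product([1, 2, 3], repeat=6))
    positives = [a for a in examples if concept(a) == 1]
    covered = [a for a in positives if a[4] == 1]    # rule: if a5 = 1 then class = 1
    print(f"{len(covered)} of {len(positives)} positive examples covered by 'a5 = 1'")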


Learning Classifier Systems: One Principle, Many Representations

Ternary rules: 0, 1, #
  ####1#:1
  22####:1

if a5