Evolving Rules to Solve Problems: The Learning Classifier Systems Way Pier Luca Lanzi
UEC, Chofu, Tokyo, Japan, March 5th, 2008
Early Evolutionary Research
Box (1957). Evolutionary operations. Led to simplex methods, Nelder-Mead. Other Evolutionaries: Friedman (1959), Bledsoe (1961), Bremermann (1961)
Rechenberg (1964), Schwefel (1965). Evolution Strategies. Fogel, Owens & Walsh (1966). Evolutionary programming.
Common view: Evolution = random mutation + save the best.
Early intuitions
“There is the genetical or evolutionary search by which a combination of genes is looked for, the criterion being the survival value.” Alan M. Turing, Intelligent Machinery, 1948
“We cannot expect to find a good child-machine at the first attempt. One must experiment with teaching one such machine and see how well it learns. One can then try another and see if it is better or worse. There is an obvious connection between this process and evolution, by the identifications:
Structure of the child machine = Hereditary material
Changes of the child machine = Mutations
Natural selection = Judgment of the experimenter”
Alan M. Turing, “Computing Machinery and Intelligence”, 1950.
Meanwhile… in Ann Arbor…
Holland (1959). Iterative circuit computers.
Holland (1962). Outline for a logical theory of adaptive systems.
Role of recombination (Holland 1965)
Role of schemata (Holland 1968, 1971)
Two-armed bandit (Holland 1973, 1975)
First dissertations (Bagley, Rosenberg 1967)
Simple Genetic Algorithm (De Jong 1975)
What are Genetic Algorithms?
Genetic algorithms (GAs) are search algorithms based on the mechanics of natural selection and genetics. Two components:
- Natural selection: survival of the fittest
- Genetics: recombination of structures, variation
Underlying metaphor:
- Individuals in a population must be adapted to the environment to survive and reproduce
- A problem can be viewed as an environment; we evolve a population of solutions to solve it
- Different individuals are differently adapted
- To survive, a solution must be “adapted” to the problem
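The selection–recombination–variation loop can be sketched in a few lines. This is a minimal illustration, not from the slides: the bit-string representation, the one-max fitness, and all parameter values are invented for the example.

```python
import random

random.seed(0)

POP_SIZE, GENOME_LEN, GENERATIONS = 20, 16, 30
MUTATION_RATE = 1.0 / GENOME_LEN

def fitness(genome):
    # Illustrative "one-max" problem: fitness = number of 1s.
    return sum(genome)

def tournament(pop, k=3):
    # Natural selection: the fittest of k randomly drawn individuals survives.
    return max(random.sample(pop, k), key=fitness)

def crossover(a, b):
    # Genetics: one-point recombination of two parent structures.
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(genome):
    # Variation: flip each bit with a small probability.
    return [1 - g if random.random() < MUTATION_RATE else g for g in genome]

pop = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    pop = [mutate(crossover(tournament(pop), tournament(pop)))
           for _ in range(POP_SIZE)]

best = max(pop, key=fitness)
print(fitness(best))
```

After a few dozen generations the population concentrates on high-fitness strings, which is the whole point of the metaphor: solutions survive by being adapted to the problem.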
Rules
What was the goal?
1#11 : buy  ⇒ 30
0#0# : sell ⇒ -2
…
A real system with unknown underlying dynamics. Use a classifier system online to generate a behavior that matches the real system. The evolved rules would provide a plausible, human-readable model of the unknown system.
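Rules like the ones above use a ternary alphabet, where # is a “don’t care” that matches either bit. A minimal sketch of how such conditions match an input (the rule set and payoffs are the illustrative ones from the slide):

```python
def matches(condition, state):
    # A ternary condition matches when every non-# symbol equals the input bit.
    return all(c == '#' or c == s for c, s in zip(condition, state))

# Rules in the slide's "condition:action => payoff" style.
rules = [("1#11", "buy", 30), ("0#0#", "sell", -2)]

state = "1011"
fired = [(action, payoff) for cond, action, payoff in rules
         if matches(cond, state)]
print(fired)  # only 1#11:buy matches input 1011
```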
Holland’s Vision, Cognitive System One
To state, in concrete technical form, a model of a complete mind and its several aspects:
- A cognitive system interacting with an environment
- Binary detectors and effectors
- Knowledge = set of classifiers
- Condition-action rules that recognize a situation and propose an action
- Payoff reservoir for the system’s needs
- Payoff distributed through an epochal algorithm
- Internal memory as message list
- Genetic search of classifiers
As time goes by…
Reinforcement learning, machine learning…

1970’s: Genetic algorithms and CS-1. Research flourishes, but success is limited.
1980’s: Evolving rules as optimization. Research follows Holland’s vision; success is still limited.
1990’s: Stewart Wilson creates XCS. Robotics applications and first results on classification, but the interest fades away.
2000’s: Classifier systems finally work. Large development of models, facetwise theory, and applications.
Stewart W. Wilson & The XCS Classifier System
1. Simplify the model
2. Go for accurate predictions, not high payoffs
3. Apply the genetic algorithm to subproblems, not to the whole problem
4. Focus on classifier systems as reinforcement learning with rule-based generalization
5. Use reinforcement learning (Q-learning) to distribute reward
The most successful model developed so far.
Wilson, S.W.: Classifier Fitness Based on Accuracy. Evolutionary Computation 3(2), 149-175 (1995).
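Point 2 above is the key shift in XCS: classifier fitness is derived from the accuracy of the payoff prediction, not from the payoff itself. A sketch of the standard XCS accuracy function; the parameter values shown are typical defaults, chosen here purely for illustration:

```python
# XCS-style accuracy: a classifier whose prediction error is below the
# tolerance EPSILON_0 counts as fully accurate; above it, accuracy
# falls off as a power law.
ALPHA, EPSILON_0, NU = 0.1, 10.0, 5.0  # illustrative defaults

def accuracy(prediction_error):
    if prediction_error < EPSILON_0:
        return 1.0
    return ALPHA * (prediction_error / EPSILON_0) ** -NU

print(accuracy(5.0), accuracy(20.0))
```

Because even a low-payoff rule can be perfectly accurate about its (low) payoff, this lets the genetic algorithm keep consistently correct classifiers everywhere in the problem space.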
The Classifier System Way
Learning Classifier Systems as Reinforcement Learning Methods
[Diagram: the System selects action at; the Environment returns the next state st+1 and the reward rt+1]
The goal: maximize the amount of reward received. How much future reward when at is performed in st? What is the expected payoff for st and at? We need to compute a value function, Q(st, at) → payoff.
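A minimal tabular Q-learning sketch shows what “computing Q(st, at)” means in the simplest case. The toy chain problem and all parameter values here are invented for illustration:

```python
import random
random.seed(1)

# Toy deterministic chain 0 -> 1 -> 2 -> 3 (invented for illustration):
# action 1 moves right, action 0 stays; reaching state 3 pays reward 1.
N_STATES, ACTIONS = 4, (0, 1)
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    s2 = min(s + a, N_STATES - 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

for _ in range(500):
    s = 0
    while s < N_STATES - 1:
        # Epsilon-greedy action selection over the current estimates.
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r = step(s, a)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
        target = r + GAMMA * max(Q[(s2, x)] for x in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

print({s: round(Q[(s, 1)], 2) for s in range(N_STATES - 1)})
```

The learnt values of moving right decay geometrically with the distance from the reward, which is exactly the “expected payoff” reading of Q.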
How does reinforcement learning work?
Define the inputs, the actions, and how the reward is determined.
Compute a value function Q(st, at) mapping state-action pairs into expected payoffs.
This looks simple… Let’s bring RL to the real world!
Reinforcement learning assumes that Q(st, at) is represented as a table. But the real world is complex: the number of possible inputs can be huge! We cannot afford an exact Q(st, at).
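A quick back-of-the-envelope check makes the point. The input sizes below are invented for illustration:

```python
# Size of an exact Q-table for binary inputs of length n_bits and
# n_actions possible actions: one entry per state-action pair.
def q_table_entries(n_bits, n_actions):
    return (2 ** n_bits) * n_actions

print(q_table_entries(10, 4))   # small sensor: still storable
print(q_table_entries(100, 4))  # realistic sensor: astronomically large
```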
What are the issues?
- Exact representation is infeasible; approximation is mandatory
- The function is unknown; it is learnt online from experience
What are the issues?
We are learning an unknown payoff function while also trying to approximate it: the approximator works on intermediate estimates while also providing the information that drives the learning. Convergence is not guaranteed.
What does this have to do with Learning Classifier Systems?
They solve reinforcement learning problems, but represent the payoff function Q(st, at) as a population of rules, the classifiers. Classifiers are evolved while Q(st, at) is learnt online.
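Reading Q(s, a) off a population of rules can be sketched as a fitness-weighted average over the matching classifiers, in the spirit of XCS. The population, predictions, and fitness values below are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Classifier:
    condition: str     # ternary string over 0, 1, #
    action: str
    prediction: float  # estimated payoff
    fitness: float     # accuracy-based weight

def matches(cond, state):
    return all(c == '#' or c == s for c, s in zip(cond, state))

def system_prediction(pop, state, action):
    # Q(s, a) is read off the population as a fitness-weighted
    # average of the matching classifiers' predictions.
    ms = [cl for cl in pop
          if cl.action == action and matches(cl.condition, state)]
    total = sum(cl.fitness for cl in ms)
    return sum(cl.prediction * cl.fitness for cl in ms) / total if ms else 0.0

# Hypothetical population.
pop = [Classifier("1#11", "buy", 30.0, 0.8),
       Classifier("1###", "buy", 20.0, 0.2),
       Classifier("0#0#", "sell", -2.0, 0.9)]

print(system_prediction(pop, "1011", "buy"))
```

Because conditions contain don’t-cares, each classifier covers many states at once: the population is a compact, generalized approximation of the table that plain reinforcement learning cannot afford.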
What is a classifier?
IF condition C is true for input s THEN the value of action a is p(s,w)
Which type of approximation? Which representation?

[Figure: the payoff landscape of action A over input s. A classifier with interval condition C(s) = l ≤ s ≤ u covers the interval [l, u] and computes its prediction with a linear model p(x, w) = w0 + x·w1.]
Same example with computed prediction
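An interval classifier with computed (linear) prediction can be sketched as below. The class, the target payoff landscape, and the learning rate are all illustrative assumptions:

```python
class IntervalClassifier:
    """Covers l <= s <= u; predicts payoff with a linear model w0 + w1*s."""
    def __init__(self, l, u, w0=0.0, w1=0.0):
        self.l, self.u, self.w = l, u, [w0, w1]

    def matches(self, s):
        return self.l <= s <= self.u

    def prediction(self, s):
        return self.w[0] + self.w[1] * s

    def update(self, s, payoff, eta=0.1):
        # Delta-rule (Widrow-Hoff style) update of the linear weights.
        error = payoff - self.prediction(s)
        self.w[0] += eta * error
        self.w[1] += eta * error * s

cl = IntervalClassifier(l=0.0, u=1.0)
for _ in range(200):
    for s in (0.0, 0.5, 1.0):
        cl.update(s, payoff=2.0 + 3.0 * s)  # illustrative linear landscape

print(round(cl.prediction(0.5), 2))
```

Instead of one constant prediction per interval, the classifier fits a line over the interval it covers, so far fewer classifiers are needed to approximate a smooth payoff landscape.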
How do learning classifier systems work? The main performance cycle
The classifiers predict an expected payoff. The incoming reward is used to update the rules which helped in obtaining that reward. Any reinforcement learning algorithm can be used to estimate the classifier prediction.
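One step of the performance cycle can be sketched as: build the match set, form a prediction per action, select an action, then update the rules that advocated it. The rule list, the reward function, and the learning rate BETA are illustrative assumptions:

```python
BETA = 0.2  # illustrative learning rate

def matches(cond, state):
    return all(c == '#' or c == s for c, s in zip(cond, state))

# Each classifier: [condition, action, predicted payoff]
population = [["1#11", "buy", 10.0], ["1###", "buy", 25.0],
              ["10##", "sell", 5.0], ["0###", "sell", 0.0]]

def performance_step(state, reward_fn):
    match_set = [cl for cl in population if matches(cl[0], state)]
    # Prediction array: average predicted payoff per advocated action.
    actions = {cl[1] for cl in match_set}
    pa = {a: sum(cl[2] for cl in match_set if cl[1] == a) /
             sum(1 for cl in match_set if cl[1] == a) for a in actions}
    action = max(pa, key=pa.get)  # exploit: best predicted payoff
    reward = reward_fn(state, action)
    # Credit assignment: only the rules that proposed the action are updated.
    for cl in match_set:
        if cl[1] == action:
            cl[2] += BETA * (reward - cl[2])
    return action, reward

action, reward = performance_step(
    "1011", lambda s, a: 30.0 if a == "buy" else -2.0)
print(action, reward)
```

The Widrow-Hoff update used here is the simplest choice; as the slide says, any reinforcement learning estimator could take its place.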
Where do classifiers come from? In principle, any search method may be used. I prefer genetic algorithms because they are representation independent. A genetic algorithm selects, recombines, and mutates existing classifiers to search for better ones.
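Genetic search over ternary conditions can be sketched with the two classic operators; the parent conditions and the mutation rate are illustrative:

```python
import random
random.seed(7)

ALPHABET = "01#"
MU = 0.05  # illustrative per-symbol mutation rate

def crossover(c1, c2):
    # One-point recombination of two parent conditions.
    p = random.randrange(1, len(c1))
    return c1[:p] + c2[p:], c2[:p] + c1[p:]

def mutate(cond):
    # Each symbol may change to another ternary symbol with probability MU,
    # letting the search discover more general (#) or more specific rules.
    return "".join(random.choice(ALPHABET) if random.random() < MU else c
                   for c in cond)

parents = ("1#11", "0#0#")
child1, child2 = crossover(*parents)
child1, child2 = mutate(child1), mutate(child2)
print(child1, child2)
```

The same operators apply unchanged to interval or symbolic conditions, which is the representation independence argued above.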
Accurate Estimates About Classifiers (Powerful RL)
Genetics-Based Generalization
Classifier Representation
One Algorithm Many Representations!
One Representation, One Principle
Data described by 6 variables a1, …, a6. They represent the simple concept “a1=a2 ∨ a5=1”.
A rather typical approach:
- Select a representation
- Select an algorithm which produces such a representation
- Apply the algorithm
Decision rules (attribute-value):
- if (a5 = 1) then class = 1 [95.3%]
- if (a1 = 3 ∧ a2 = 3) then class = 1 [92.2%]
- …
FOIL:
- Clause 0: is_0(a1,a2,a3,a4,a5,a6) :- a1 ≠ a2, a5 ≠ 1
Learning Classifier Systems: One Principle, Many Representations
Ternary rules over 0, 1, #
####1#:1
22####:1
if a5