Bloat and its control in Genetic Programming

Nic McPhee
Division of Science and Mathematics, University of Minnesota, Morris
Morris, Minnesota, USA
(Currently on sabbatical working with Riccardo Poli, University of Essex, UK)

13 June 2008, University of Granada
Nic McPhee (U of Minnesota, Morris)
Bloat control in GP
13 June 2008, U of Granada
1 / 21
Overview
The big picture
- Genetic Programming has been successful in numerous domains
- But average tree size often grows quickly, without relation to fitness
- This "bloat" has negative performance implications
- Parsimony pressure is often used, but it is ad hoc and crude
- Can theory help?
  - A size evolution equation from schema theory
  - Similar to Price's theorem from biology
  - ⇒ Precise and powerful control of average program size
Overview
Outline
1. Brief overview of Genetic Programming
2. The problem of bloat
3. Price's theorem and size evolution
4. Dynamically computing parsimony penalty
Overview of Genetic Programming
Outline
1. Brief overview of Genetic Programming
   - Evolutionary Computation (EC): population-based search
   - Genetic Programming (GP): EC with expression trees
   - Open questions in Genetic Programming
2. The problem of bloat
3. Price's theorem and size evolution
4. Dynamically computing parsimony penalty
Overview of Genetic Programming
EC: population based search
Evolutionary Computation (EC): population-based search

The basic process:
- Generate a random initial population.
- Some of these are better than others at solving your problem.
- Take the better ones and mutate/recombine them to generate new individuals.
- Some of these are better than others, etc.
- Cook until done (or bored).

Key issues:
- How to represent/manipulate these potential solutions
- What biases those representations/manipulations have
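The loop above can be sketched in a few lines of Python. Everything here (OneMax as the toy problem, tournament selection, the parameter values) is an illustrative assumption, not a detail from the talk:

```python
import random

# A minimal sketch of the basic EC loop. The toy problem (OneMax:
# maximize the number of 1s in a bit string), tournament selection,
# and all parameter values are illustrative choices.

def fitness(bits):
    return sum(bits)

def mutate(bits, rate=0.05):
    # Flip each bit independently with probability `rate`.
    return [b ^ 1 if random.random() < rate else b for b in bits]

def evolve(pop_size=50, length=20, n_gens=30, seed=0):
    random.seed(seed)
    # Generate a random initial population.
    pop = [[random.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(n_gens):
        # Take the better individuals (tournament of two) and
        # mutate them to generate the next population.
        def pick():
            a, b = random.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        pop = [mutate(pick()) for _ in range(pop_size)]
    return max(pop, key=fitness)

best = evolve()
```

The representation (bit strings here) and the variation operators carry the biases mentioned above; swapping in trees and subtree operators turns this same loop into GP.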
Overview of Genetic Programming
GP = EC + expression trees
Genetic Programming (GP): EC with expression trees

- Genetic Algorithms (GAs) = EC with (fixed-length) bit strings.
- Genetic Programming (GP) uses expression trees instead.
- Subtree crossover (XO) is the most common recombination operator.
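Subtree crossover can be sketched as follows; representing trees as nested tuples and all helper names are illustrative assumptions, not the talk's notation:

```python
import random

# A sketch of subtree crossover (XO) on expression trees. Trees are
# nested tuples, e.g. ('+', ('x',), ('1',)): first element is the
# node label, the rest are child subtrees.

def nodes(tree, path=()):
    """Yield (path, subtree) for every node in the tree."""
    yield path, tree
    for i, child in enumerate(tree[1:], start=1):
        yield from nodes(child, path + (i,))

def replace(tree, path, new_subtree):
    """Return a copy of `tree` with the node at `path` replaced."""
    if not path:
        return new_subtree
    children = list(tree[1:])
    children[path[0] - 1] = replace(children[path[0] - 1],
                                    path[1:], new_subtree)
    return (tree[0],) + tuple(children)

def subtree_crossover(parent1, parent2, rng=random):
    """Graft a random subtree of parent2 into a random point of
    parent1."""
    path, _ = rng.choice(list(nodes(parent1)))
    _, subtree = rng.choice(list(nodes(parent2)))
    return replace(parent1, path, subtree)

rng = random.Random(42)
p1 = ('+', ('x',), ('*', ('x',), ('1',)))
p2 = ('-', ('y',), ('0',))
child = subtree_crossover(p1, p2, rng)
```

Note that the child's size can differ arbitrarily from either parent's, which is exactly why tree sizes are free to grow: crossover has no built-in size constraint.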
Overview of Genetic Programming
Open questions in GP
Questions and issues in GP
As with any complex process, there are questions still to be answered, including:
- Why does subtree XO work?
- How do we evolve solutions humans can understand?
- How/why are variants the same/different?
- Why do tree sizes bloat?
- How can we combat bloat without undue bias?
- etc.
The problem of bloat
Outline
1. Brief overview of Genetic Programming
2. The problem of bloat
   - What is bloat?
   - Causes of bloat
   - Controlling bloat
   - Parsimony pressure
3. Price's theorem and size evolution
4. Dynamically computing parsimony penalty
The problem of bloat
What is bloat?
- Noticed very early in GP history
- Initial generations are driven by search
- Soon, though, average tree size grows fast
  - Greater than linear, less than quadratic
- Growth not related to improvements in fitness
- Large trees require memory to store and CPU cycles to process
The problem of bloat
Causes of bloat
- Still an active research question, but much has been learned
- Early thought: protection against harmful XO
- More recently: small trees are likely unfit, but sampled often
- Any "final" explanation is likely to be a combination of (or at least encompass) many of the existing ideas.
The problem of bloat
Controlling bloat
Ad hoc methods:
- Parsimony pressure
  - Koza's original "solution"
  - Still probably the most widely used
- Mutation operators aimed at shrinking trees

More theoretically grounded approaches:
- Multi-objective approaches
- Using Minimum Description Length, entropy, etc., to measure/control solution complexity
- Tarpeian bloat control (based on schema theory results)
The problem of bloat
Parsimony pressure
The basic approach:

    f_p(x) = f(x) − c·ℓ(x)

where
- f is the original (unpenalized) fitness
- ℓ(x) is the length (or size) of the tree x
- c is the parsimony penalty

Choosing the "right" c is important, and not obvious:
- Too small, and you still have bloat
- Too large, and you over-constrain the search process
- In most applications c is constant, which is known to be problematic
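A minimal sketch of this penalty; measuring ℓ(x) as the node count of a nested-tuple tree is an illustrative assumption:

```python
# A sketch of the penalized fitness f_p(x) = f(x) - c * l(x),
# where l(x) is taken to be the node count of the tree.

def size(tree):
    # l(x): total number of nodes in the tree.
    return 1 + sum(size(child) for child in tree[1:])

def penalized_fitness(f, tree, c):
    return f - c * size(tree)

small = ('x',)                              # 1 node
big = ('+', ('x',), ('*', ('x',), ('1',)))  # 5 nodes

# With equal raw fitness, the penalty ranks the smaller tree higher.
better = penalized_fitness(0.9, small, 0.01) > penalized_fitness(0.9, big, 0.01)
```

The sensitivity to c is visible even here: with c = 0 the two trees tie and size is unconstrained, while a large c would let tiny unfit trees outrank bigger fit ones.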
The problem of bloat
Parsimony pressure
In this work we show how to compute c dynamically:
- in a disciplined, theoretically grounded manner,
- which allows us to tightly control the average tree size,
- and even dynamically alter the control during a run.
Price’s theorem and size evolution
Outline
1. Brief overview of Genetic Programming
2. The problem of bloat
3. Price's theorem and size evolution
   - Size evolution equation
   - Price's theorem
4. Dynamically computing parsimony penalty
Price’s theorem and size evolution
Size evolution equation
Size evolution equation, part 1
Earlier schema theory work showed

    E[µ(t+1)] = Σ_ℓ ℓ · p(ℓ, t)

where
- E[µ(t+1)] is the expected average size at time t+1
- the summation is over all lengths (sizes) ℓ
- p(ℓ, t) is the probability of selecting a program of size ℓ in generation t
Price’s theorem and size evolution
Size evolution equation
Size evolution equation, part 2

We can focus on the change in size:

    E[∆µ] = E[µ(t+1) − µ(t)] = Σ_ℓ ℓ · (p(ℓ, t) − Φ(ℓ, t))

where
- µ(t) is the average size at time t
- the summation is over all lengths (sizes) ℓ
- p(ℓ, t) is the probability of selecting a program of size ℓ in generation t
- Φ(ℓ, t) is the proportion of programs of size ℓ in generation t

The difference between p and Φ is ultimately the key.
Price’s theorem and size evolution
Price’s theorem
This is Price's Theorem!

Assuming fitness-proportionate selection, we can rewrite this as:

    E[∆µ] = Cov(ℓ, f) / f̄(t)

where
- Cov(ℓ, f) is the covariance between size and fitness
- f̄(t) is the average fitness at time t

This is just a version of Price's Theorem!
- An important theorem from evolutionary biology
- It describes the change in frequency of heritable traits (size here) using their covariance with fitness
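The identity can be checked numerically. This sketch uses a small invented population under fitness-proportionate selection; the sizes and fitness values are illustration data, not results from the talk:

```python
# Check that sum over l of l * (p(l,t) - Phi(l,t)) equals
# Cov(l, f) / mean(f) under fitness-proportionate selection.
# One (size, fitness) pair per individual; all values invented.

sizes = [3, 5, 7, 9, 15]
fits = [1.0, 2.0, 2.5, 4.0, 3.5]
n = len(sizes)

total_f = sum(fits)
fbar = total_f / n

# p: selection probability of each individual (fitness-proportionate);
# phi: each individual's share of the current population.
p = [f / total_f for f in fits]
phi = [1 / n] * n

delta_mu = sum(l * (pi - qi) for l, pi, qi in zip(sizes, p, phi))

# Population covariance between size and fitness.
lbar = sum(sizes) / n
cov = sum((l - lbar) * (f - fbar) for l, f in zip(sizes, fits)) / n
```

Here `delta_mu` and `cov / fbar` agree to floating-point precision, and both are positive: larger trees are fitter in this sample, so average size is expected to grow — the covariance view of bloat in miniature.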
Bloat control in GP
13 June 2008, U of Granada
16 / 21
Dynamically computing parsimony penalty
Outline
1. Brief overview of Genetic Programming
2. The problem of bloat
3. Price's theorem and size evolution
4. Dynamically computing parsimony penalty
   - The math
   - Simple example
   - Empirical results
Dynamically computing parsimony penalty
The math
Generalized parsimony pressure
Generalize the earlier parsimony pressure:

    f_p(x, t) = f(x) − g(ℓ(x), t)

Using this new fitness function we find

    E[∆µ] = (Cov(ℓ, f) − Cov(ℓ, g)) / (f̄ − ḡ)

so "no bloat" means E[∆µ] = 0, i.e., Cov(ℓ, f) = Cov(ℓ, g).
Dynamically computing parsimony penalty
Simple example
A simple example

Let g(ℓ(x), t) = c(t)·ℓ(x), so f_p(x, t) = f(x) − c(t)·ℓ(x).

Then Cov(ℓ, f) = Cov(ℓ, g) implies

    c(t) = Cov(ℓ, f) / Var(ℓ)

Compute c(t) from that equation at each generation and you get no change (in expectation) in the average size over time.

- A theoretically grounded, dynamic parsimony pressure!
- Can be generalized, e.g., so that µ(t) tracks a specified function.
Dynamically computing parsimony penalty
Empirical results
[Figure: Average size vs. time for different target-size functions (Constant, Linear, Sin, Limited, Local). 6-Mux problem, population size 2000, c · size penalty, 500 generations.]
Thanks!
Thanks for your time and attention!

Thanks also to J.J. Merelo for inviting me out to the University of Granada.

Contact:
  [email protected]
  http://www.morris.umn.edu/~mcphee/

Questions?