Basics of Genetic Programming and GPLab Toolbox
Dr. Asifullah Khan
Pattern Recognition Lab Department of Computer and Information Sciences Pakistan Institute of Engineering and Applied Sciences (PIEAS)
Things to be Discussed
• Evolutionary Computing and its Types
• Introduction to Genetic Programming
• GPLab (Genetic Programming Toolbox)
  – Modules of GPLab
  – Parameters setting
  – Graphical outputs
• GP examples
• GPLab based implementation
Genetic Programming (GP)
• “Genetic Programming is the automated learning of computer programs” [1].
• GP algorithms are inspired by the theory of evolution.
Natural Evolution
The Darwinian theory of evolution can be summarized as:
• In a world with limited resources and stable populations, each individual competes with others for survival.
• Those individuals with the “best” characteristics (traits) are more likely to survive and to reproduce.
• These desirable characteristics are passed on to their offspring from generation to generation.
Natural Evolution
“Viewed as a learning process, natural evolution results in very long-time learning from the collective experience of generations of populations”
In other words, every living organism is the result of millions of years of learning by its ancestors about how to survive on earth long enough to reproduce.
Natural Evolution
There are four essential preconditions for the occurrence of evolution by natural selection [1]:
1. Reproduction of individuals in the population
2. Variation that affects the likelihood of survival of individuals
3. Heredity in reproduction
4. Finite resources causing competition
Natural Evolution “Competition results in winners or losers”
Natural Evolution “Generally, the most-fit pass on their traits”
Natural Evolution “Evolution requires a diverse population”
Evolution “Mating and mutation create feature diversity from among the pool of mostly advantageous traits”
“It takes thousands of cycles for truly amazing adaptations to emerge”
Evolutionary Computing “Evolutionary computing is the branch of computational intelligence that models the process of natural evolution”
Evolutionary Computing
• Several evolutionary algorithms (EAs) have been developed so far. Some of the common evolutionary algorithms are:
  – Differential Evolution
  – Cultural Algorithm
  – Genetic Algorithms
  – Genetic Programming
  – ……
Differential Evolution (DE)
• Differential Evolution was proposed by Price and Storn in 1995.
• DE differs from other EAs in that distance and direction information from the current population is used to guide the search process [2].
• Nowadays, it is considered one of the most powerful evolutionary algorithms for real-valued function optimization.
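The way DE uses distance and direction information can be illustrated with the common DE/rand/1 mutation step. This is a minimal Python sketch; the function name, the scale factor value, and the toy population are illustrative assumptions, not taken from the slides.

```python
import random

def de_rand_1_mutation(population, target_index, scale=0.5, rng=random):
    """Build a DE/rand/1 mutant vector for the individual at target_index."""
    # Pick three distinct individuals, none of them the target itself.
    candidates = [i for i in range(len(population)) if i != target_index]
    a, b, c = (population[i] for i in rng.sample(candidates, 3))
    # The difference (b - c) carries distance and direction information
    # taken from the current population; `scale` is an assumed setting.
    return [a_j + scale * (b_j - c_j) for a_j, b_j, c_j in zip(a, b, c)]

# Hypothetical two-dimensional population.
population = [[0.0, 0.0], [1.0, 2.0], [3.0, 1.0], [2.0, 4.0]]
mutant = de_rand_1_mutation(population, target_index=0, rng=random.Random(0))
print(mutant)
```

As the population converges, the difference vectors shrink, so the step size adapts without a separate schedule.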
Cultural Algorithm (CA)
[Figure: Cultural Algorithm framework, showing the belief space (adjusting beliefs), the population space (evolutionary operators, performance function), and the acceptance and influence functions that connect them.]
General Features of CA
• Dual inheritance (at population and knowledge levels)
• Knowledge acts as “beacons” that guide the evolution of the population
• Supports self-adaptation at various levels
• Evolution can take place at different rates at different levels (“Culture evolves 10 times faster than the biological component”).
• Supports hybrid approaches to problem solving.
Genetic Algorithm
Genetic Algorithms (GA)
• The first evolutionary computing technique developed to simulate genetic systems
• A probabilistic search algorithm that iteratively transforms a set of mathematical objects (each with an associated fitness value) into a new population using the Darwinian principle of natural selection.
Genetic Programming(GP) In 1992, Koza introduced GP by evolving tree structures
Genetic Programming (GP)
• Information learned through biological evolution is stored in DNA base pairs. Sequences of DNA base pairs act like instructions, facilitating the manufacturing of proteins.
• This program-like nature of DNA, together with its variable length, explains the appeal of biological evolution as a model for the evolution of computer programs.
Genetic Programming (GP)
GP is mostly used as a machine learning approach.
[Figure: diagram relating Artificial Intelligence, Machine Learning, and Evolutionary Algorithms (GP, EP, GA, ES).]
Genetic Programming (GP)
• Specialization of Genetic Algorithms
• GA vs. GP
  – The main difference between GA and GP is their representation scheme
  – Genetic Algorithms: strings
  – Genetic Programming: tree structure
Genetic Programming (Basic Algorithm)
1. Create a population of programs
  • Each program attempts to solve the given problem
2. Determine fitness of programs
  • Determine fitness by success in solving the problem
  • The fitter the member, the better its chance to produce offspring in the next generation
3. Select parents and produce offspring
  • Use selection schemes
  • Use crossover and mutation
Genetic Search Cycle
[Figure: cycle diagram. Initial Population → Evaluate Candidate Solutions → Fitness Values → Check Termination Criteria (Save the Best) → Perform Selection → Selected Individuals → Apply Genetic Operators → New Candidates → back to evaluation.]
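The search cycle can be sketched as a generic loop. The Python below is a minimal illustration under stated assumptions: the function `evolve`, its parameters, and the toy integer problem are hypothetical, and lower fitness is assumed to be better.

```python
import random

def evolve(init_population, fitness, select, vary, max_generations, target=0.0):
    """Generic evolutionary search cycle; lower fitness is treated as better."""
    population = list(init_population)
    best = min(population, key=fitness)                # evaluate, save the best
    for _ in range(max_generations):
        if fitness(best) <= target:                    # check termination criteria
            break
        parents = select(population, fitness)          # perform selection
        population = vary(parents)                     # apply genetic operators
        best = min(population + [best], key=fitness)   # evaluate new candidates
    return best

# Toy usage: evolve an integer towards 42 (a stand-in problem, not GP trees).
rng = random.Random(1)
fitness = lambda x: abs(x - 42)
select = lambda pop, fit: sorted(pop, key=fit)[: len(pop) // 2]   # keep best half
vary = lambda parents: [p + rng.choice([-1, 0, 1]) for p in parents for _ in range(2)]
initial = [rng.randint(0, 100) for _ in range(20)]
best = evolve(initial, fitness, select, vary, max_generations=200)
print(best, fitness(best))
```

In GP the individuals would be program trees and `vary` would apply subtree crossover and mutation; the loop itself stays the same.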
Tree Based Representation
• GP evolves executable computer programs
• Each individual represents a computer program as a tree structure
• The tree structure has the following implications [1]:
  – Adaptive individuals: contrary to GAs, where the size of individuals is usually fixed, a GP population will usually have individuals of different size, shape and complexity.
  – Domain-specific grammar: a grammar needs to be defined that accurately reflects the problem to be solved. It should be possible to represent any possible solution using the defined grammar.
Tree Representation (XOR)
T = [x1, x2]
F = [OR, AND, NOT]
[Figure: program tree for XOR with OR at the root, two AND nodes, two NOT nodes, and the leaves x1 and x2, i.e. (x1 AND NOT x2) OR (NOT x1 AND x2).]
GP based Modeling of A Problem
Genetic Programming Components
Terminal Set
• Works as primitive data types
• Constants
• Parameter-less functions
• Inputs
• Members of this set make up the leaves of the program tree
Function Set
• Set of available functions
• Often tailored according to the needs of the problem domain [2,4]
Using Trees To Represent Computer Programs
Expression: 9+((X*7)/(Y-5))
[Figure: expression tree with + at the root. Functions (internal nodes): +, /, *, -. Terminals (leaves): 9, X, 7, Y, 5.]
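The expression 9+((X*7)/(Y-5)) can be written as a nested tuple and evaluated recursively. This is a minimal Python sketch; the tuple encoding and the helper names are illustrative choices, not something the slides prescribe.

```python
import operator

# Function set: every internal node names one of these binary operators.
FUNCTIONS = {"+": operator.add, "-": operator.sub,
             "*": operator.mul, "/": operator.truediv}

# 9 + ((X * 7) / (Y - 5)) as nested tuples: (function, left, right).
tree = ("+", 9, ("/", ("*", "X", 7), ("-", "Y", 5)))

def evaluate(node, variables):
    """Evaluate a tree: tuples are functions, strings are inputs, numbers are constants."""
    if isinstance(node, tuple):
        name, left, right = node
        return FUNCTIONS[name](evaluate(left, variables), evaluate(right, variables))
    if isinstance(node, str):
        return variables[node]
    return node

def terminals(node):
    """Collect the leaves of the tree from left to right."""
    if isinstance(node, tuple):
        return [t for child in node[1:] for t in terminals(child)]
    return [node]

print(evaluate(tree, {"X": 2, "Y": 7}))  # 9 + (14 / 2)
print(terminals(tree))
```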
Initial Population
• The initial population is generated randomly within the restrictions of maximum depth
• For each individual
  – The root is randomly selected from the set of functions
  – The arity of a function determines the branching factor of the root and non-terminal nodes
  – For each non-root node an element is selected randomly from either the terminal set or the function set
  – If the element is selected from the terminal set the node becomes a leaf
Initial Population (Cont.)
Full
• Non-terminals are used to build a complete tree up to the leaf nodes, which are then completely populated with terminals.
• Every tree is grown to maximum depth and has the maximum number of nodes allowed.
Grow
• The root node is chosen from the function set
• All nodes not at maximum depth are chosen randomly and the growth of a branch ends when a terminal is chosen.
• Trees can have irregular shapes
Ramp Half and Half
• The population is separated into M partitions, with the ith partition having a maximum depth of M-i
• Half of each partition is populated with grow while the other half is populated with full.
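The three initialization methods can be sketched in Python as follows. The function set, terminal set, and helper names are assumptions for illustration. The ramped variant cycles through depth limits 1..max_depth instead of building explicit partitions, which is a simplification of the scheme described above.

```python
import random

FUNCTIONS = {"+": 2, "*": 2, "neg": 1}   # name -> arity (illustrative set)
TERMINALS = ["x", "1"]

def full(depth, rng):
    """Full: every branch reaches exactly `depth`."""
    if depth == 0:
        return rng.choice(TERMINALS)
    name = rng.choice(list(FUNCTIONS))
    return (name, *(full(depth - 1, rng) for _ in range(FUNCTIONS[name])))

def grow(depth, rng, is_root=True):
    """Grow: a branch stops when a terminal is drawn or `depth` is reached."""
    if depth == 0:
        return rng.choice(TERMINALS)
    # The root is always a function; other nodes may be functions or terminals.
    pool = list(FUNCTIONS) if is_root else list(FUNCTIONS) + TERMINALS
    name = rng.choice(pool)
    if name in TERMINALS:
        return name
    return (name, *(grow(depth - 1, rng, False) for _ in range(FUNCTIONS[name])))

def ramped_half_and_half(pop_size, max_depth, rng):
    """Alternate full and grow while cycling through depth limits 1..max_depth."""
    population = []
    for i in range(pop_size):
        depth = 1 + (i // 2) % max_depth
        method = full if i % 2 == 0 else grow
        population.append(method(depth, rng))
    return population

def depth_of(tree):
    """Depth of a tree; a single terminal has depth 0."""
    return 0 if isinstance(tree, str) else 1 + max(depth_of(c) for c in tree[1:])

rng = random.Random(0)
for individual in ramped_half_and_half(6, 3, rng):
    print(depth_of(individual), individual)
```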
Genetic Operators; Simple Examples
Crossover
• Randomly select a node from Parent 1
• Randomly select a node from Parent 2
• Swap the two nodes along with their subtrees
Mutation
• Randomly select a node in the program tree
• Remove that node and its subtree
• Replace the node with a new subtree (generated using full, grow or ramp half and half)
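Both operators reduce to picking a node and swapping or replacing its subtree. The Python sketch below assumes trees are nested tuples of the form (function, child, ...) with plain values as leaves; all helper names and the example parents are illustrative.

```python
import random

def nodes(tree, path=()):
    """Yield the path (tuple of child indices) of every node in the tree."""
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from nodes(child, path + (i,))

def get(tree, path):
    """Return the subtree found by following `path` from the root."""
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new_subtree):
    """Return a copy of `tree` with the node at `path` replaced."""
    if not path:
        return new_subtree
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new_subtree),) + tree[i + 1:]

def crossover(parent1, parent2, rng):
    """Swap one randomly chosen subtree of each parent."""
    p1 = rng.choice(list(nodes(parent1)))
    p2 = rng.choice(list(nodes(parent2)))
    child1 = replace(parent1, p1, get(parent2, p2))
    child2 = replace(parent2, p2, get(parent1, p1))
    return child1, child2

def mutate(tree, rng, new_subtree):
    """Replace a randomly chosen node and its subtree with new_subtree()."""
    return replace(tree, rng.choice(list(nodes(tree))), new_subtree())

rng = random.Random(0)
parent1 = ("+", ("*", "x", 7), 9)
parent2 = ("-", "y", ("abs", -7))
print(crossover(parent1, parent2, rng))
print(mutate(parent1, rng, lambda: "z"))   # in practice: a full- or grow-generated subtree
```

Because subtrees are swapped rather than copied one way, the two children together contain exactly as many nodes as the two parents.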
Genetic Operators (Cont.)
• Extensive research is ongoing into new genetic operators for crossover and mutation
Crossover Operator
[Figure: subtree crossover. Randomly selected subtrees of Parent 1 and Parent 2 are exchanged to produce Child 1 and Child 2.]
Mutation Operator
[Figure: subtree mutation. The right subtree is randomly selected for mutation, and the entire subtree is replaced.]
Fitness based selection
• Parents for the production of the next generation are selected on the basis of their fitness
• The sum of absolute errors is the most basic fitness function used
• The fitness measure can be varied depending on the problem domain
  – Number of correct solutions
  – Number of errors navigating a maze
  – Time required to solve a puzzle
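A sum-of-absolute-errors fitness can be sketched in a few lines of Python. The fitness cases and the candidate programs are hypothetical, and lower values mean fitter programs.

```python
def sum_absolute_error(program, fitness_cases):
    """Sum of |obtained - expected| over all fitness cases (0 is a perfect fit)."""
    return sum(abs(program(x) - expected) for x, expected in fitness_cases)

# Hypothetical fitness cases sampled from y = 2x + 1.
cases = [(x, 2 * x + 1) for x in range(5)]

print(sum_absolute_error(lambda x: 2 * x + 1, cases))  # exact candidate
print(sum_absolute_error(lambda x: 2 * x, cases))      # off by 1 on each of 5 cases
```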
Ranking Selection
• Selection based on fitness order
  – The members of the population are ranked from best to worst.
  – The selection probability is assigned based on the rank.
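One simple way to assign probabilities by rank is a linear weighting, sketched below in Python. The linear weights and the function name are assumptions, and lower fitness is again taken to be better.

```python
import random

def rank_selection(population, fitness, k, rng):
    """Draw k parents with probability proportional to a linear rank weight."""
    ranked = sorted(population, key=fitness)       # best first
    n = len(ranked)
    weights = [n - rank for rank in range(n)]      # best gets n, worst gets 1
    return rng.choices(ranked, weights=weights, k=k)

rng = random.Random(0)
population = [5, 1, 9, 3]                          # toy individuals; fitness = value
picks = rank_selection(population, lambda x: x, 4000, rng)
print({x: picks.count(x) for x in sorted(population)})
```

Only the ordering of fitness values matters here, so one individual with an extreme fitness value cannot dominate the selection.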
Tournament Selection
• Select a subset of the population (the tournament size) randomly.
• More fit (winning) individuals are used to generate replacements for less fit (losing) individuals.
  – Accelerates processing time (compared with full competition)
  – Facilitates parallel processing
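A single tournament can be sketched in Python as below. The function name and the toy population are illustrative, and the sketch returns only the winner as a parent instead of also replacing the losers.

```python
import random

def tournament_selection(population, fitness, tournament_size, rng):
    """Sample `tournament_size` individuals and return the fittest (lowest fitness)."""
    contestants = rng.sample(population, tournament_size)
    return min(contestants, key=fitness)

rng = random.Random(0)
population = [8, 3, 5, 1, 9, 4]                    # toy individuals; fitness = value
winner = tournament_selection(population, lambda x: x, 3, rng)
print(winner)
```

Each tournament looks at only a few individuals, so the population never needs a full sort and tournaments can run independently.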
Types of GP
• Conventional Tree based GP
• Linear GP
• Graph based GP
• Cartesian GP
• Multi-Gene GP
• Multi-Objective Optimization based GP
• …….
Science-oriented Applications of GP
• Sequence prediction/classification
• Forecasting & Prediction
• Crystallography; Biochemistry
• Datamining
• Geoscience and Remote Sensing
• …….
Engineering-oriented Applications of GP
• On-Line Control of Real Robots
• Design of Electrical Circuits
• Spacecraft Attitude Maneuvers
• Antenna Design
• Motion Animation
• …….
Computer-Science-oriented Applications of GP
• Cellular Encoding of ANN
• Intrusion Detection
• Image Classification
• Digital Watermarking
• Computer Aided Diagnostics Systems
• Soccer Softbot Team Coordination
• GP-Music
• Evolutionary Art
• …….
Genetic Programming
Part II: GPLAB
GPLab
• GPLab is an open source genetic programming toolbox for MATLAB.
• Developed by
  – Sara Silva (Evolutionary and Complex Systems Group, University of Coimbra, Portugal)
  – gplab.sourceforge.net/
GPLab (Continued)
SET VARS (This module initializes the parameters)
GEN POP (Generates the initial population and calculates its fitness)
GENERATION (Generates a new population by applying genetic operators)
SET VARS
• This module
  – initializes the parameters with the default values
  – updates the parameters with the user settings
• This module can be called
  – by the user
  – by a request for parameter initialization from GEN POP
GEN POP
• This module generates the initial population and calculates its fitness.
• Three initialization methods exist
  – Full
  – Grow
  – Ramp Half and Half
• By default the fitness is the sum of absolute differences between obtained and expected results.
• A custom fitness function can also be used.
GENERATION
• Generates a new population by applying the genetic operators (tree crossover, tree mutation)
• Parents are selected from the pool through one of the following four sampling methods
  – Roulette
  – SUS
  – Tournament
  – Lexicographic Parsimony Pressure Tournament
GENERATION (Continued)
• Three methods to calculate the expected no. of offspring are:
  – Absolute
  – Rank85
  – Rank89
• Repeats itself until the stopping condition is met or the maximum generation is reached
Parameters (Tree Initialization)
• The initial population of trees is created at the beginning of a GPLAB run
  – The initial maximum depth/size of the new trees is determined by the parameter “inicmaxlevel”
• The method used to generate the initial population is specified by the parameter “initpoptype”
  – Possible values: 'fullinit', 'growinit', 'rampedinit'
Parameters (Functions)
• The “functions” parameter can be used to indicate which functions GPLab should use
• Functions available in GPLab to set functions are
  params=setfunctions(params,'func1',2,'func2',1);
  params=addfunctions(params,'func1',2,'func2',1);
• The table of functions on the next slide lists the functions present in GPLab
• User defined functions can also be used
Parameters (Available Functions)
Parameters (Terminals)
• Terminals are the variables needed to evaluate the fitness cases
• GPLab can use as terminals
  – Constants
  – Any function with null arity, e.g. rand()
• The “terminals” parameter can be used to set the terminals
• The function to set terminals is
  params=setterminals(params,'1','rand');
Parameters (Genetic Operators)
• Four operators are available in GPLab
  – Crossover
  – Mutation
  – Shrink Mutation
  – Swap Mutation
• Functions to set operators are
  params=setoperators(params,'operator1',2,2,'operator2',2,1);
  params=addoperators(params,'operator1',2,2,'operator2',2,1);
Parameters (Selection)
• Parents are selected according to one of the five sampling methods
  – 'roulette'
  – 'sus'
  – 'tournament'
  – 'lexictour'
  – 'doubletour'
• User defined sampling methods can also be used
  params.sampling='new_sampling_method';
Parameters (Expected Number of Children)
• The expected number of children can be calculated using one of the three available methods
  – Absolute
  – Rank85
  – Rank89
• The method is selected by setting the parameter “expected”
Parameters (Fitness Measurement)
• The “calcfitness” parameter determines the method for fitness measurement
• Methods available in GPLab for fitness measurement are “regfitness” and “antfitness”
• A user defined function can also be used for fitness measurement
• Data file
  – When starting a GPLAB run the user is required to indicate the names of the files where the fitness cases are stored
Runtime Graphical Output
• GPLab can represent some state variables of the algorithm graphically as plots.
  – Runtime plots are updated after every generation.
• Four different graphs can be plotted at runtime, as determined by the parameter “graphics”
  – plotfitness
  – plotdiversity
  – plotcomplexity
  – plotoperators
[Figure: runtime plot, Fitness]
[Figure: runtime plot, Diversity]
[Figure: runtime plot, Complexity]
[Figure: runtime plot, Operators Probability]
Offline Graphical Output
• Five specialized functions are provided by GPLab to visualize different aspects of evolution and the results obtained.
  – Accuracy VS Complexity
  – Pareto Front
  – Desired VS Obtained
  – Operator Evolution
  – Tree Visualization
[Figure: offline plot, Accuracy VS Complexity]
[Figure: offline plot, Pareto Front]
[Figure: offline plot, Desired VS Obtained]
[Figure: offline plot, Tree Visualization]
Genetic Programming (example)
Training Data (XOR), 2-bit input data
X1  X2  XOR
0   0   0
0   1   1
1   0   1
1   1   0
Genetic Programming (example)
• Terminals
  – Two bits of data (each bit as a separate terminal)
  T = { X1, X2 }
• Functions
  – Functions for the XOR problem include the logical operators OR, AND, NOR and NAND; the function set used here is
  F = { OR, AND, NOR }
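With T = { X1, X2 } and F = { OR, AND, NOR }, a candidate tree can be scored by counting how many rows of the XOR truth table it reproduces. The Python sketch below uses a tuple encoding, and the two candidate trees are illustrative examples, not trees taken from the slides.

```python
# Function set F = {OR, AND, NOR} on 0/1 inputs.
FUNCTIONS = {
    "OR": lambda a, b: a | b,
    "AND": lambda a, b: a & b,
    "NOR": lambda a, b: 1 - (a | b),
}

# XOR training data: ((x1, x2), expected output).
CASES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def evaluate(node, x1, x2):
    """Evaluate a tree whose leaves are the terminals "x1" and "x2"."""
    if node == "x1":
        return x1
    if node == "x2":
        return x2
    name, left, right = node
    return FUNCTIONS[name](evaluate(left, x1, x2), evaluate(right, x1, x2))

def hits(tree):
    """Number of fitness cases the tree gets right (4 is a perfect score)."""
    return sum(evaluate(tree, x1, x2) == expected for (x1, x2), expected in CASES)

perfect = ("NOR", ("AND", "x1", "x2"), ("NOR", "x1", "x2"))
partial = ("OR", "x1", "x2")
print(hits(perfect), hits(partial))
```

The first candidate is true exactly when the inputs are neither both 1 nor both 0, which is XOR; plain OR fails only on the (1, 1) case.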
Tree Representation
[Figure: program tree for the XOR example. Functions (internal nodes): NOR, AND, OR. Terminals (leaves): x1, x2.]
Tree Representation; Evolving If Then Rules for decision making
IF (quality > 20) AND (Service > 80) THEN good ELSE bad
can be represented by the following tree:
AND
  >
    quality
    20
  >
    Service
    80
Tree Representation; Evolving Codes
i = 1;
while (i < 20) {
    i = i + 1;
}
SYMBOLIC REGRESSION
PREPARATORY STEPS
SYMBOLIC REGRESSION
POPULATION OF 4 RANDOMLY CREATED INDIVIDUALS FOR GENERATION 0
SYMBOLIC REGRESSION: x^2 + x + 1
FITNESS OF THE 4 INDIVIDUALS IN GEN 0
Individual  Program   Fitness
(a)         x + 1     0.67
(b)         x^2 + 1   1.00
(c)         2         1.70
(d)         x         2.67
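The four fitness values are consistent with measuring the area between each candidate and the target x^2 + x + 1 over the interval [-1, 1]. That reading is an assumption, chosen because it reproduces the numbers on the slide. The Python sketch below estimates the area with a midpoint rule; the function name and step count are illustrative.

```python
def area_error(candidate, target, lo=-1.0, hi=1.0, steps=20000):
    """Midpoint-rule estimate of the integral of |candidate - target| on [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width
        total += abs(candidate(x) - target(x))
    return total * width

target = lambda x: x * x + x + 1

# The four generation-0 individuals from the slide.
individuals = {
    "x + 1": lambda x: x + 1,
    "x^2 + 1": lambda x: x * x + 1,
    "2": lambda x: 2.0,
    "x": lambda x: x,
}

scores = {name: round(area_error(f, target), 2) for name, f in individuals.items()}
print(scores)
```

Under this measure a lower value is better, so x + 1 is the fittest of the four.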
SYMBOLIC REGRESSION: x^2 + x + 1
GENERATION 1
• Copy of (a)
• Mutant of (c), picking “2” as mutation point
• First offspring of crossover of (a) and (b), picking “+” of parent (a) and left-most “x” of parent (b) as crossover points
• Second offspring of crossover of (a) and (b), picking “+” of parent (a) and left-most “x” of parent (b) as crossover points
References
1. Banzhaf, W., Nordin, P., Keller, R. E., & Francone, F. D. (1998). Genetic programming: an introduction (Vol. 1). San Francisco: Morgan Kaufmann.
2. Alpaydin, E. (2014). Introduction to machine learning. MIT Press.
3. Poli, R., Langdon, W. B., McPhee, N. F., & Koza, J. R. (2008). A field guide to genetic programming. Lulu.com.
4. Poli, R., & Koza, J. (2014). Genetic programming (pp. 143-185). Springer US.
Thanks
Thanks to Department of Electrical Engineering, COMSATS Attock and especially to Dr. Raja Asif