Playing Regex Golf with Genetic Programming - Google Sites
Recommend Documents
publish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ..... acceptab
traction may span across several lines (e.g., HTML elements including their ... that the learner cannot generate queries ex novo and can ..... NoProfit-HTML/Email.
This matches the end of a line. . The period or dot matches any character. \d. Matches any single digit. \w. Matches any
answer the query (button âExtractâ or âDo not extractâ, re- spectively). Otherwise, when the user has to describ
both aesthetic and real terrains (without requiring a database of real terrain data). Additionally ... visualisation of
regular structure, good for optimisation (rendering, collision ... optimisation approach, uses a database of pre-selecte
PDF Download Genetic Programming Iii Full Online, epub free Genetic ... flat document, including the text, fonts, graphi
and consider its values in some natural situations. Key words. Gittins index, Markov chain, Markov decision theory, random walk, hitting time, game theory.
down decision tree algorithms like CART [5] and C4.5 [27]. Traditionally, such .... with J classes, as well as an associ
The graph has a set of ni in- ... The genotype is of fixed length however the graph described by it is not. ..... 7 show
Swap node mutation Exchange a primitive with another of the same arity (prob. ... the EKM, the 5 solutions out of 10 run
not explicitly computational in that often one must apply some other mapping ... Cartesian Genetic Programming represent
post-docking filters and how we selected the best candidates using seeded li- braries. In section 7 we examine the evolv
chemical compounds and a measured endpoint for each compound. The measured endpoint is the property of interest. Typical
Jul 11, 2007 - [email protected]. ABSTRACT. In nature ... republish, to post on servers or to redistribute to lists, req
PDF Download Genetic Programming Iii Full Online, epub free Genetic ... flat document, including the text, fonts, graphi
Jul 7, 2010 - algorithm, Genetic Programming and Evolvable Machines (2009) Vol. 10. For further info see: http://dipaola
computer systems, it is developmental, in that it acquires increasingly ... The computational network that forms when th
PDF Download Genetic Programming Iii Full Online, epub free Genetic Programming Iii by John R. Koza, ebook free Genetic
1.1 VISUALISED DISTRIBUTED GENETIC PROGRAMMING ENGINE . ..... also advantages of machine learning: the ability of massiv
Researchers in artificial intelligence, machine learning, evolutionary computation, and ... Includes an introduction to
PDF Download Genetic Programming Iii Full Online, epub free Genetic ... is a file format used to present documents in a
... Series book ebook pdf cartesian genetic programming natural computing series ââ¬Â¦ span class ... PDF Download Car
... a deep dive into web applications and how they work Download the free trial version below to get started ... city Th
Playing Regex Golf with Genetic Programming - Google Sites
Our previous GP-based tool. Automatic regex generation from examples. For data extraction. IEEE Computer, GECCO Hot Off
Playing Regex Golf with Genetic Programming A. Bartoli, A. De Lorenzo, E. Medvet, F. Tarlao University of Trieste, Italy
The problem (“Regex Golf”) ● ●
Specific kind of code golf Writing the shortest regular expression which: ○ ○
matches all the strings in a given list does not match any strings in another list
Regex Golf - Naive solution Positive examples
● ● ● ● ● ●
The Phantom Menace Attack of the Clones Revenge of the Sith A New Hope The Empire Strikes Back Return of the Jedi
Negative examples ● ● ● ● ● ● ● ● ●
The Wrath of Khan The Search for Spock The Voyage Home The Final Frontier The Undiscovered Country Generations First Contact Insurrection Nemesis
The Phantom Menace|Attack of the Clones|Revenge of the Sith|A New Hope|The Empire Strikes Back|Return of the Jedi
Regex Golf - Best solution Positive examples
● ● ● ● ● ●
Negative examples ● ● ● ● ●
The Phanto Phantomm Menace Menace Attack of the the Clones Clones Revenge of the the Sith Sith A New Hope The Empire Strikes Back Return of the the Jedi Jedi
● ● ● ●
m | [tN]|B
The Wrath of Khan The Search for Spock The Voyage Home The Final Frontier The Undiscovered Country Generations First Contact Insurrection Nemesis
16 difficult instances
A lot of vibrant discussions...
...really a lot...
February 26-th, 2014
Automatic Regex Golf
Our previous GP-based tool Automatic regex generation from examples For data extraction
IEEE Computer, GECCO Hot Off the Press http://regex.inginf.units.it
Key observations ● We need a classifier---not an extractor ○ No need to identify boundaries
● No need to infer a general pattern ● No need to process streams with unknown items ○ We need to “overfit”
No greedy/lazy quantifiers (execution time too long to be practical)
● Leave nodes: ○ ○
a-z, A-Z, ^, $, .. problem-dependent elements (next slide)
Problem-dependent elements ● All characters in matches ● All partial ranges including those characters ● All “most useful” n-grams (n=2,3,4) ○ Build all n-grams ○ Score each n-gram based on its frequency: +1 for each match, -1 for each unmatch ○ Rank n-grams ○ Select the smallest set totalling M points (M being the number of matches)
Problem in detail ● 16 instances, each with: ○ Set of matches M ○ Set of unmatches U ○ Weight w (a “difficulty” coefficient)
● Score of regex r on a given instance: ○ w * #misclassifications - length(r)
Problem in detail
Baseline ● GP-RegexExtract (our data extraction tool) ● “Norvig” (January 2014) ○ Deterministic algorithm widely discussed on the web ○ IMPORTANT ■ Not developed for this challenge ■ Designed for completely avoiding misclassifications ■ Comparison not fully fair...but the only algorithm we were aware of
Results: Warning ● The web site does not collect scores/rankings ● Programmers advertised solutions on forums ○
GitHub, Reddit, Hacker News
● Sometimes only scores without any evidence ● Sometimes slightly improving earlier results by other programmes ● No evidence of time spent
Great Results ! ● 6-th/8-th worldwide ○ At the time ○ To the best of our knowledge
● Without any hint from other programmers ● Much better than the baseline
Execution time, Actual regexes
Summary ● Evolutionary computation has reached a level in which it may successfully compete with human programmers ● In scenarios explicitly designed to test their practical skills and creativity ● And, It may do so without any starting hint or external help.
Web application A web-based prototype is public available at http://regex.inginf.units.it/golf