Hochschule Rhein-Waal Rhine-Waal University of Applied Sciences Faculty of Communication and Environment Degree Program Information Engineering and Computer Science, M. Sc. Prof. Dr.-Ing. Sandro Leuchter
TEACHING COMPUTER TO PLAY GAMES USING ARTIFICIAL NEURAL NETWORK AND GENETIC ALGORITHM
Term Paper WS 2015/16 Module “Applied Research Project B”
By Muhammad Ahsan Nawaz (18790), Vu Thanh Ngo (18773) & Junaid Asghar (20819)
DECLARATION We hereby declare that the work presented herein is our own work completed without the use of any aids other than those listed. Any material from other sources or works done by others has been given due acknowledgement and listed in the reference section. Sentences or parts of sentences quoted literally are marked as quotations; identification of other references with regard to the statement and scope of the work is quoted. The work presented herein has not been published or submitted elsewhere for assessment in the same or a similar form. We will retain a copy of this assignment until after the Board of Examiners has published the results, which we will make available on request.
Muhammad Ahsan Nawaz, Vu Thanh Ngo & Junaid Asghar
Abstract: This project explores the application of several machine learning techniques, such as Artificial Neural Networks (ANNs), Genetic Algorithms (GAs), Neuro-Evolution, and Neuro-Evolution of Augmenting Topologies (NEAT), to develop an agent capable of successfully playing games; as an experiment we chose Super Mario Bros, Japanese edition. The world of Mario presents a partially observable, dynamic, episodic problem, and thus provides an interesting and applicable platform for exploring several machine learning techniques. Mastering the game of Mario has remained a long-standing challenge in the field of AI. In computer games, neuro-evolution is more widely applied than supervised learning algorithms, which require a syllabus of correct input-output pairs. Following this sentiment, we explore and apply NEAT together with a genetic algorithm for this agent. We want to exploit the advantage claimed by NEAT's authors: that evolution can both optimize and complexify solutions simultaneously, offering the possibility of evolving increasingly complex solutions over generations and strengthening the analogy with biological evolution.
Keywords: Artificial Neural Network, Genetic Algorithm, Neuro-Evolution of Augmenting Topologies (NEAT), Machine Learning, Visualization
Table of Contents

CHAPTER – 1 Introduction ------------------------------------------------------------------- 6
1. Introduction -------------------------------------------------------------------------------------- 7
   1.1. Problem Statement ------------------------------------------------------------------- 8
   1.2. Objectives ----------------------------------------------------------------------------- 8
   1.3. Scope & Limitation ------------------------------------------------------------------ 8
   1.4. Project Planning ---------------------------------------------------------------------- 9
   1.5. Risk Management -------------------------------------------------------------------- 9
   1.6. Project Outline ---------------------------------------------------------------------- 10
CHAPTER – 2 Background ------------------------------------------------------------------ 11
2. Background -------------------------------------------------------------------------------- 12
   2.1. Artificial Neural Network --------------------------------------------------------- 13
   2.2. Evolutionary Algorithm------------------------------------------------------------- 14
      2.2.1. Fitness Evaluation --------------------------------------------------------------- 14
   2.3. Genetic Algorithm ------------------------------------------------------------------ 15
      2.3.1. Methodology -------------------------------------------------------------------- 16
   2.4. LUA Scripting Language ----------------------------------------------------------- 17
      2.4.1. Environment & Chunks --------------------------------------------------------- 17
      2.4.2. Why LUA? ----------------------------------------------------------------------- 18
CHAPTER – 3 Literature Review ------------------------------------------------------------ 21
3. Literature Review ------------------------------------------------------------------------ 22
   3.1. Neuro-Evolution of Augmenting Topologies (NEAT) ------------------------------ 22
      3.1.1. Network Topology --------------------------------------------------------------- 22
      3.1.2. TWEANNs Encoding ------------------------------------------------------------- 22
      3.1.3. Initial Populations and Topological Innovation ----------------------------------- 23
      3.1.4. Tracking Genes through Historical Markings ------------------------------------- 24
      3.1.5. Protecting Innovation through Speciation ----------------------------------------- 25
      3.1.6. Minimizing Dimensionality through Incremental Growth from Minimal Structure -- 26
CHAPTER – 4 Implementation -------------------------------------------------------------- 27
   4.1. Artificial Neural Network Implementation ------------------------------------------- 28
   4.2. Activation Function----------------------------------------------------------------- 29
   4.3. Applying the Genetic Algorithm ---------------------------------------------------- 30
      4.3.1. Initialize Population ------------------------------------------------------------ 31
      4.3.2. Evaluation ------------------------------------------------------------------------ 32
      4.3.3. Selection ------------------------------------------------------------------------- 32
      4.3.4. Crossover ------------------------------------------------------------------------ 33
      4.3.5. Mutation -------------------------------------------------------------------------- 33
         4.3.5.1. Structural Mutation -------------------------------------------------------------- 34
   4.4. Visualization ------------------------------------------------------------------------- 35
CHAPTER – 5 Test Result ------------------------------------------------------------------ 36
5. Test Result ---------------------------------------------------------------------------------- 37
CHAPTER – 6 Discussions and Future Work --------------------------------------------------- 38
References ---------------------------------------------------------------------------------------- 40
List of Figures:
Figure 1: Artificial Neural Network ----------------------------------------------------------------------- 13
Figure 2: Matching up genomes for different network topologies using innovation numbers (Stanley and Miikkulainen, 2002) ----------------------------------------------------------------------- 25
Figure 3: Layers are mapped to game sprites ---------------------------------------------------------- 28
Figure 4: Network of neuron structure -------------------------------------------------------------------- 29
Figure 5: How the activation function produces output in an ANN ---------------------------------- 30
Figure 6: Sigmoid function ----------------------------------------------------------------------------- 30
Figure 7: GA flow chart ------------------------------------------------------------------------------- 31
Figure 8: Sample code for the evaluation phase -------------------------------------------------------- 32
Figure 9: Uniform crossover operation-------------------------------------------------------------------- 33
Figure 10: Example of mutation (Obitko.com, 2016)---------------------------------------------------- 34
Figure 11: The two types of structural mutation in NEAT (Stanley and Miikkulainen, 2002) ---- 35
Figure 12: Fitness and respective generation ------------------------------------------------------------- 37
List of Tables:
Table 1: Project Planning ------------------------------------------------------------------------------------ 9
Table 2: Project Management (Division of Work) -------------------------------------------------------- 9
CHAPTER – 1 Introduction
1. Introduction: This chapter provides an overview of the entire report: problem statement, objectives, scope & limitations, project planning, risk management and project outline. Nintendo's Mario set the standard for a generation of video games. The linear world of Mario, filled with enemies, obstacles and power-ups, is a template which many others have followed. An agent that could solve the problem of playing Mario successfully could easily be applied to any other similar game such as Mega Man, Sonic, or Kirby. The problem any agent faces when playing a game like Mario is how to correctly interpret what it sees in the environment and decide on the best action to take. The problem is important because, as the numerous variations on Mario show, there are many interesting incarnations of it. We experimented with several learning solutions to the problem of correctly mapping what an agent sees in its environment (its state) to an optimal action. The machine learning aspects of our research involve Artificial Neural Networks (ANNs) and Genetic Algorithms (GAs). In this paper we explore the strengths and weaknesses of the agents designed with these algorithms. First, we present the results of an agent that uses an ANN whose weights as well as topology are genetically evolved based on the NEAT algorithm.
We then test the agent on Super Mario Bros (Japanese edition) to see how well it can play the game, and also try another version of Mario, Super Mario World (US). Finally, we collect and analyze the results and discuss the advantages and problems of the agent.
1.1. Problem Statement
The problem of teaching a computer to play games has highlighted the importance of algorithms for this kind of problem in many fields. Within the scope of this project, we focus on algorithms related to neural networks. The problem any agent faces when playing a game like Mario is how to correctly interpret what it sees in the environment and decide on the best action to take. The problem is important because, as the numerous variations on Mario show, there are many interesting incarnations of it.
1.2. Objectives
The main objective of this project is to explore genetic algorithms, utilize the power of the Artificial Neural Network (ANN), and make it evolve to solve a problem: teaching Mario to react in the game. In addition, NEAT is a newer technique that we want to try and apply along with the genetic algorithm, in the hope of raising the efficiency of the ANN. We also want to test these algorithms' limits as well as their flexibility towards different games and different situations.
1.3. Scope & Limitation
In this project the approach is to find a function which maps a certain state of the game to an action; once the neural network is trained, the action it rates most promising is picked. The main objective of the Mario game is to score as many points as possible by winning the game in different ways; however, those scoring rules are out of the scope of this project, so only the basic rules and the win/loss outcome are considered. This report also aims at specifying the requirements of the software to be developed, but it can also be applied to assist in the selection of in-house and commercial software products. The standard can be used to create this final report directly or as a model for defining a project-specific standard; it does not identify any specific method, nomenclature or tool for preparing the final report.
1.4. Project Planning
Project planning is very important in every project, whether it is a software development project or belongs to another field such as electrical or civil engineering, because every project uses or consumes different resources during development. If a project is not well planned, risks concerning cost, time and effort can occur that cause the project to go over budget, be delivered late, or produce a low-quality product. A project should therefore be well planned, and the project manager should communicate with the team members in order to deliver the project on time. The proposed project's planning is described in the table below.

Table 1: Project Planning
Task   Activity             Time required
1      Project Proposal     30/10/2015
2      Mid-term meeting     23/11/2015
3      Scientific Poster    22/01/2016
4      Final Presentation   29/01/2016
5      Final report         05/02/2016
Table 2: Project Management (Division of Work)

Name                            Activity
Muhammad Ahsan Nawaz (18790)    Project Management & Project Documentation
Vu Thanh Ngo (18773)            Implementation of Genetic Algorithm, Artificial Neural Network, NEAT, documentation
Junaid Asghar (20819)           Project documentation

1.5. Risk Management
There can always be risks in a project that can harm it in several ways, and a good software engineer tries to minimize them. "Risk is the potential future harm that may arise from some present action". Risk management is a process used to minimize risk before it can harm the productivity of a software project. The percentage of projects that finish on time and within the estimated budget is smaller than the percentage that finish late and over budget. Risk and the management of risk therefore play an important role in software development. There are two ways of handling risk: in the reactive approach a problem is corrected as it occurs, while in the proactive approach the software engineer starts thinking about possible risks in the project before they occur. Several types of risk can occur during a software development process:
Generic risks:
These types of risks can occur in almost every project; for example, requirement changes, loss of team members, loss of funding, or underestimation of the project.
Product specific risks:
These are high-level risks associated with the type of product being developed, for example the availability of testing resources.
Product risks:
These types of risks can affect quality or performance of software/product being developed.
Business risks:
These risks can affect the viability of the software. There are also specific risks associated with team members, customers, tools, technology, time estimation, and team size. Many of these risks can be minimized by the development methodology used for the project. There are many different tools that can be used to analyse the risk apparent in a project and that can help choose the best way to minimize or eliminate that risk. The risks that can occur in our project are as follows:
Time:
Time is a risk that can affect the on-time completion of the project: uncertain circumstances can cause late delivery.
1.6. Project Outline
This document contains almost everything that can help the reader to understand the project, and is divided into several chapters. Chapter 2 presents the background and the tools used in this project. Chapter 3 presents the literature review about NEAT, and in Chapter 4 we present how we integrate these algorithms and implement them. Chapter 5 presents the test results, and Chapter 6 the discussion and future work.
CHAPTER – 2 Background
2. Background For the past several years, a number of game-related competitions have been organized in conjunction with major international conferences on computational intelligence (CI) and artificial intelligence for games (Game AI). In these competitions, competitors are invited to submit their best controllers or strategies for a particular game; the controllers are then ranked based on how well they play the game, alone or in competition with other controllers. The competitions are based on popular board games (such as Go and Othello) or video games (such as Pac-Man, Unreal Tournament and various car racing games). In most of these competitions, competitors submit controllers that interface to an API built by the organizers of the competition. The winner of the competition is the person or team that submitted the controller that played the game best, either on its own (for single-player games such as Pac-Man) or against others (in adversarial games such as Go). Usually, prizes of a few hundred US dollars are associated with each competition, and a certificate is always awarded. There is no requirement that the submitted controllers be based on any particular type of algorithm, but in many cases the winners turn out to include computational intelligence (typically neural networks and/or evolutionary computation) in one form or another. The submitting teams tend to comprise students, faculty members and persons not currently in academia (e.g. working as software developers). There are several reasons for holding such competitions as part of the regular events organized by the computational intelligence community. A main motivation is to improve benchmarking of learning and other AI algorithms. Benchmarking is frequently done using very simple test-bed problems that might or might not capture the complexity of real-world problems.
When researchers report results on more complex problems, the technical complexities of accessing, running and interfacing to the benchmarking software might prevent independent validation of and comparison with the published results. Here, competitions have the role of providing software, interfaces and scoring procedures to fairly and independently evaluate competing algorithms and development methodologies. Another strong incentive for running these competitions is that they motivate researchers. Existing algorithms get applied to new areas, and the effort needed to participate in a competition is (or at least, should be) less than it takes to write new experimental software, do experiments and write a completely new paper. Competitions might even bring new researchers into the
computational intelligence fields, both academics and non-academics. One of the reasons for this is that game-based competitions simply look cool. The particular competition described in this paper is similar in aims and organization to some other game-related competitions (in particular the simulated car racing competitions) but differs in that it is built on a platform game, and thus relates to the particular challenges an agent faces while playing such games.
2.1. Artificial Neural Network
One common product of an Evolutionary Algorithm (EA) is an Artificial Neural Network (ANN). The principles of ANNs are inspired by biological neural networks, or brains, in the sense that ANNs are made of interconnected nodes (neurons) which produce results based on the weighted connections between the nodes. The input into a neural network comes from a set of "input" nodes. In the context of game AIs, the input nodes take data directly from the game environment, in the form of numerical data such as distance to a wall, distance to an enemy, or current health. The output from an ANN comes from one or more output nodes, which are used to determine a course of action. For example, in Figure 1 the input nodes are shown on the left of the image and the output nodes on the right. Each node takes values from one or more inputs, performs some operation with them, and sends a value to that node's output. Any number of nodes can take one of their inputs from another node's output. In this way, values propagate through the ANN until they reach the output nodes.
Figure 1: Artificial Neural Network Source: https://www.hindawi.com/journals/aai/2011/686258/fig1
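This propagation can be sketched in a few lines; the layer sizes, weights, and sigmoid activation below are purely illustrative, not the values used by our agent:

```python
import math

def forward(inputs, weights, biases):
    """Propagate input values through one layer of an ANN.

    Each output node sums its weighted inputs, adds a bias, and applies a
    sigmoid activation to produce the value it sends onward.
    """
    outputs = []
    for w_row, b in zip(weights, biases):
        total = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(1.0 / (1.0 + math.exp(-total)))  # sigmoid squashes to (0, 1)
    return outputs

# Example: 3 game inputs (e.g. distances to obstacles) feed 2 output nodes
actions = forward([0.5, 1.0, -0.3],
                  [[0.2, -0.4, 0.1], [0.7, 0.3, -0.5]],
                  [0.0, 0.1])
```

Stacking such layers, so that the outputs of one become the inputs of the next, gives the feed-forward propagation described above.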
2.2. Evolutionary Algorithm
This paper is primarily about evolutionary algorithms applied to AI development, so some background on EAs will be relevant to some readers. An EA is a type of algorithm which is engineered to solve a given problem. It does this by creating a population of "individuals", where each individual represents a solution to the problem. The EA then varies this population over time by removing poor solutions from the population and replacing them with new individuals derived from the more successful solutions. Randomness involved in the introduction of new individuals creates and maintains diversity in the population, allowing it to gravitate towards better solutions to the original problem over time. An EA starts with an initially random population. Usually, EAs are organized into "generations", where each generation has a new population of individuals produced from individuals in the previous generation. To produce the next generation, each individual in the old generation is assigned a number representing how good a solution it is, i.e. its "fitness score". Then, individuals with high fitness (better solutions) are selected to produce new individuals in the new population. New individuals are produced from old individuals in a way in which some of the old individuals' traits and characteristics are passed down, often by means of the mutation operation (which modifies a random part of the individual) and the crossover operation (which combines the traits of two individuals). The individuals in the new population will be similar to the best individuals from the previous population, but the random modifications have a chance of producing individuals that are better solutions. In this way, the population tends towards individuals with high fitness.
2.2.1. Fitness Evaluation The most important part of an EA is determining which individuals in a population are better or worse solutions to the problem that the EA is supposed to solve. Naturally, we want to keep the most fit solutions (the solutions that best solve the problem) while abandoning the least fit. This is done via a fitness function, which is intended to measure the relative success of an individual at solving the problem. In the context of game AIs, this is often not a "function" in the traditional sense of the word, but rather an execution of the individual, either on some data or in competition against another individual, to see how well it does. This can easily become the most computationally expensive part of an EA, and the care put into its implementation can
easily determine how quickly an EA reaches a sufficiently good solution (or whether it does at all). The fitness function does not have to be perfect, and often includes some noise (randomness). For example, if a set of algorithms is produced for playing a two-player board game, their fitness might be determined via a round-robin series of matches. However, if the game involves any amount of randomness you are not guaranteed to get exact results. Often, to simplify the calculation, individuals are instead paired randomly for a smaller number of matches. This produces more noise but speeds up the evaluation of fitness. Where the happy medium lies depends on the given problem, which is one of the reasons why implementing a good EA is nontrivial.
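Such a randomized pairing scheme can be sketched as follows; `play_match` is a hypothetical function returning the winner of one (possibly randomized) game, and the number of matches per individual is an illustrative trade-off between speed and noise:

```python
import random

def evaluate_fitness(population, play_match, matches_per_individual=4):
    """Estimate fitness by pairing individuals randomly for a few matches
    instead of a full round-robin: faster, but noisier."""
    wins = {id(ind): 0 for ind in population}
    for ind in population:
        for _ in range(matches_per_individual):
            opponent = random.choice([p for p in population if p is not ind])
            if play_match(ind, opponent) is ind:
                wins[id(ind)] += 1
    # Fitness is each individual's observed win rate over its sampled matches.
    return [wins[id(ind)] / matches_per_individual for ind in population]
```

Raising `matches_per_individual` moves this sketch back towards the exact (but expensive) round-robin evaluation.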
2.3. Genetic Algorithm
In the 1950s and the 1960s several computer scientists independently studied evolutionary systems with the idea that evolution could be used as an optimization tool for engineering problems. The idea in all these systems was to evolve a population of candidate solutions to a given problem, using operators inspired by natural genetic variation and natural selection (Mitchell, 1996). Genetic algorithms (GAs) were invented by John Holland in the 1960s and were developed by Holland and his students and colleagues at the University of Michigan in the 1960s and the 1970s. In contrast with evolution strategies and evolutionary programming, Holland's original goal was not to design algorithms to solve specific problems, but rather to formally study the phenomenon of adaptation as it occurs in nature and to develop ways in which the mechanisms of natural adaptation might be imported into computer systems. Holland's 1975 book Adaptation in Natural and Artificial Systems presented the genetic algorithm as an abstraction of biological evolution and gave a theoretical framework for adaptation under the GA. Holland's GA is a method for moving from one population of "chromosomes" (e.g., strings of ones and zeros, or "bits") to a new population by using a kind of "natural selection" together with the genetics-inspired operators of crossover, mutation, and inversion. Each chromosome consists of "genes" (e.g., bits), each gene being an instance of a particular "allele" (e.g., 0 or 1). The selection operator chooses those chromosomes in the population that will be allowed to reproduce, and on average the fitter chromosomes produce more offspring than the less fit ones. Crossover exchanges subparts of two chromosomes, roughly mimicking biological recombination between two single-chromosome ("haploid")
organisms; mutation randomly changes the allele values of some locations in the chromosome; and inversion reverses the order of a contiguous section of the chromosome, thus rearranging the order in which genes are arrayed. (Here, as in most of the GA literature, "crossover" and "recombination" will mean the same thing.) (Mitchell, 1996).
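The three operators can be sketched on bit-string chromosomes as follows; the crossover point, mutation rate, and inversion bounds are illustrative parameters, not values tied to our implementation:

```python
import random

def crossover(parent_a, parent_b, point):
    """Exchange the tails of two chromosomes after a crossover point."""
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])

def mutate(chromosome, rate=0.01):
    """Flip each bit independently with a small probability."""
    return [1 - g if random.random() < rate else g for g in chromosome]

def invert(chromosome, start, end):
    """Reverse a contiguous section, rearranging the order of the genes."""
    return chromosome[:start] + chromosome[start:end][::-1] + chromosome[end:]
```

Selection then decides which chromosomes these operators are applied to, with fitter chromosomes chosen more often.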
2.3.1. Methodology Genetic Algorithm (GA) mimics the processes of natural evolution to find the optimum solution to the given problem within the search space. The GA pool contains a number of individuals called chromosomes. Each chromosome, encoded from the problem parameters, holds a potential solution. According to evolutionary theory, only the chromosomes with good fitness are likely to survive, generate offspring, and pass their strength on to them through the genetic operators. The fitness of a chromosome is linked to the predefined problem or objective function (Coley, 1999; Goldberg, 1989). The GA cycle can be decomposed into five steps (Ismail, 2007):
1. Randomly initialize the population in the pool. With a larger population the coverage of the search space is better, but this is traded off against the calculation time of each generation. In the simplest case, each real-valued parameter is binary coded to give a bit string, and the bit strings for several parameters are concatenated to form a single string, or chromosome.
2. Evaluate the chromosomes with the objective function. After the evaluation, all chromosomes are ranked by fitness value in descending or ascending order, depending on the purpose of the objective function.
3. Select the parents from the chromosomes with biased chances: a higher-fitness chromosome is more likely to survive.
4. Generate the offspring using the genetic operators, crossover and mutation. Crossover is a recombination operator that swaps parts of two parents; two random decisions are made prior to this operation: whether to do it at all, and where the crossover point is. Mutation gives a chance to explore the uncovered search space: it mutates, or complements, some genes in the chromosome of the offspring, so that new parameter values appear.
5. Entirely replace the elder generation in the pool with the newer one and return to step 2. In some cases, the few best elders may be kept from replacement; this is known as the elitist strategy.
The loop stops when a) the loop number exceeds some predefined point, or b) the steady state lasts for a predetermined time.
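The five steps above, including the elitist strategy, can be sketched as follows; the population size, rates, and the toy fitness function are placeholder choices, not the parameters of our Mario agent:

```python
import random

def ga_cycle(fitness, chromo_len=16, pop_size=20, elites=2,
             generations=50, mutation_rate=0.05):
    """Run a simple generational GA over bit-string chromosomes."""
    # Step 1: randomly initialize the population in the pool.
    pop = [[random.randint(0, 1) for _ in range(chromo_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Step 2: evaluate and rank the chromosomes by fitness.
        pop.sort(key=fitness, reverse=True)
        # Elitist strategy: the few best elders escape replacement.
        next_pop = [list(c) for c in pop[:elites]]
        while len(next_pop) < pop_size:
            # Step 3: biased selection - parents come from the fitter half.
            a, b = random.choices(pop[:pop_size // 2], k=2)
            # Step 4: crossover, then mutation, produce the offspring.
            point = random.randrange(1, chromo_len)
            child = a[:point] + b[point:]
            child = [1 - g if random.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        # Step 5: replace the elder generation and loop back to step 2.
        pop = next_pop
    return max(pop, key=fitness)

# Toy objective: maximize the number of ones in the chromosome.
best = ga_cycle(fitness=sum)
```

Under this one-max objective the loop reliably drives the population towards the all-ones chromosome within a few dozen generations.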
2.4. LUA Scripting Language LUA is a powerful, light-weight programming language designed for extending applications. Lua is also frequently used as a general-purpose, stand-alone language. Lua combines simple procedural syntax (similar to Pascal) with powerful data description constructs based on associative arrays and extensible semantics. Lua is dynamically typed, interpreted from bytecodes, and has automatic memory management with garbage collection, making it ideal for configuration, scripting, and rapid prototyping. Unlike most other scripting languages, Lua has a strong focus on embeddability, favouring a development style where parts of an application are written in a "hard" language (such as C or C++) and parts are written in Lua. Currently Lua is used in a vast range of applications and is regarded as the leading scripting language in the game industry. Lua is an extension programming language designed to support general procedural programming with data description facilities.
2.4.1. Environment & Chunks All statements in Lua are executed in a global environment. This environment is initialized with a call from the embedding program to lua_open and persists until a call to lua_close, or the end of the embedding program. If necessary, the host programmer can create multiple independent global environments and freely switch between them. The global environment can be manipulated by Lua code or by the embedding program, which can read and write global variables using API functions from the library that implements Lua. Global variables in Lua do not need to be declared: any variable is assumed to be global unless explicitly declared local, and before the first assignment the value of a global variable is nil. A table is used to keep all global names and values. The unit of execution of Lua is called a chunk. A chunk is simply a sequence of statements, which are executed sequentially. Each statement can optionally be followed by a semicolon:

chunk ::= {stat [';']}

The notation above is the usual extended BNF, in which {a} means zero or more a's, [a] means an optional a, and {a}+ means one or more a's.
A chunk may be stored in a file or in a string inside the host program. When a chunk is executed, it is first pre-compiled into bytecodes for a virtual machine, and then the statements are executed in sequential order by simulating that machine. All modifications a chunk makes to the global environment persist after the chunk ends. Chunks may also be pre-compiled into binary form and stored in files; see the program luac for details. Text files with chunks and their binary pre-compiled forms are interchangeable: Lua automatically detects the file type and acts accordingly.
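A minimal example chunk, illustrating the points above (the variable names are arbitrary):

```lua
-- A chunk is just a sequence of statements, executed in order.
x = 10                 -- global variable: no declaration needed
local y = 20           -- local variable: visible only inside this chunk
function double(n)     -- functions live in the global environment too
  return 2 * n
end
print(double(x) + y)   -- prints 40
print(z)               -- an unassigned global reads as nil
```

Running this file with the stand-alone interpreter, or loading it from a host program, executes the chunk against the current global environment.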
2.4.2. Why LUA?
1. Embeddability
- Provided as a C library; the stand-alone interpreter is itself a client
- Simple API: simple types, low-level operations, stack model
- Embedded in C/C++, Java, Fortran, C#, Perl, Ruby, Ada, etc.
- Language designed with embedding in mind
- Bi-directional: the host calls Lua and Lua calls the host; most other scripting languages focus only on calling external code
2. Portability
- Runs on most machines we have ever heard of: UNIX, Mac, Windows, Windows CE, Symbian, Palm, Xbox, PS2, PS3, PSP, etc., as well as embedded hardware
- Written in ANSI C / ANSI C++; avoids #ifdefs and the dark corners of the standard; development targets a single, very well documented platform: ANSI C
- Lua has two parts, core and library; the core is moving towards a free-standing implementation with no direct dependencies on the OS
3. Simplicity
- Just one data structure: tables (associative arrays), with t.x as syntactic sugar for t["x"]; all efforts go to simplicity and performance
- Complete manual of about 100 pages covering the core, the libraries (standard and auxiliary) and the API
- Paradigm: mechanisms instead of policies
- Non-intimidating syntax for non-programmer users such as engineers, geologists, mathematicians, etc.
4. Small Size
- Entire distribution (tar.gz) is about 209 KB
- Binary is less than 200 KB