A Genetic Programming System for the Induction of Iterative Solution Algorithms to Novice Procedural Programming Problems

NELISHIA PILLAY
University of KwaZulu-Natal, Pietermaritzburg Campus
________________________________________________________________________________________________
The study presented in this paper evaluates genetic programming (GP) as a means of evolving solution algorithms to novice iterative programming problems. This research forms part of a study aimed at reducing the costs associated with developing intelligent programming tutors by inducing solutions to the programming problems presented to students, instead of requiring the lecturer to provide these solutions. The paper proposes a GP system for the induction of algorithms using iteration and nested iteration. The proposed system was tested on 15 randomly selected novice procedural programming problems requiring the use of iterative and nested-iterative constructs. The system was able to evolve solutions to eight of these problems. Premature convergence of the GP algorithm as a result of fitness function biases was identified as the cause of the failure of the system to induce solutions to the remaining seven problems. The iterative structure-based algorithm (ISBA) was developed and successfully implemented to escape local optima caused by fitness function biases and evolve solutions to these problems.

Categories and Subject Descriptors: I.2.2 [Artificial Intelligence]: Automatic Programming
General Terms: Genetic Programming, Automatic Programming
Additional Key Words and Phrases: Intelligent Programming Tutors

________________________________________________________________________________________________

1. INTRODUCTION

Intelligent programming tutors (IPTs) have proven to be effective in helping novice programmers overcome learning difficulties. Unfortunately, the high developmental costs of IPTs have impeded their large-scale use in the classroom [Freedman, Ali and McRoy 2000]. The research presented in this paper forms part of a larger initiative (described in [Pillay 2003]) aimed at reducing these costs. An IPT compares students’ attempts at programming problems to solutions to these problems stored in the IPT’s knowledgebase. In order to cater for more than one problem solving approach, alternative solutions are stored [Xu and Chee 1999]. The stored solutions are usually constructed by the developer of the system. A function of the Expert Module of the generic architecture described in [Pillay 2003] is the automatic induction of these solution algorithms. The study presented in this paper evaluates genetic programming for this purpose. This study focuses specifically on the induction of novice iterative solution algorithms.

The following section defines the scope of the study. Section 3 gives a brief overview of genetic programming. Section 4 provides an account of similar studies conducted to evolve iterative algorithms using genetic programming. The overall methodology employed to test the hypothesis that genetic programming is able to evolve novice iterative solution algorithms is presented in Section 5. Section 6 proposes a genetic programming system for the induction of iterative solution algorithms. The performance of the proposed system when applied to 15 randomly chosen novice iterative programming problems is discussed in Section 7. Revisions made to the initial system based on an analysis of its performance are described in Section 8. Section 9 summarises the results of the study.

2. SCOPE OF THE STUDY

The main aim of the study presented in this paper is to determine whether genetic programming can evolve novice iterative solution algorithms. Genetic programming will be evaluated as a means of finding at least one solution to each novice procedural programming problem. The knowledgebase of an IPT stores alternative solutions to each problem, so the system will eventually need to generate multiple solutions to a problem, each taking a different problem solving approach. In addition, algorithms evolved by genetic programming systems usually contain a large amount of redundant code [Banzhaf, Nordin, Keller and Francone 1998], so methods for removing redundant code will have to be investigated. Both issues fall outside the scope of this paper. Once it is established that a genetic programming system is capable of evolving iterative solution algorithms to novice procedural programming problems, i.e. the results of the study presented in the paper are positive, future work will examine the evolution of multiple solutions and the removal of redundant code from algorithms.

________________________________________________________________________________________________ Author Addresses: N. Pillay, School of Computer Science, Pietermaritzburg Campus, University of KwaZulu-Natal, South Africa, [email protected] Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, that the copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than SAICSIT or the ACM must be honoured. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. © 2005 SAICSIT Proceedings of SAICSIT 2005, Pages 111 –113

3. GENETIC PROGRAMMING

Genetic programming, a concept introduced by Koza, falls into the category of evolutionary algorithms and is hence based on Darwin’s theory of evolution [Koza 1992]. Given the input for a problem and the corresponding output, a genetic programming system can evolve an algorithm to produce the required output. The input-output pairs, called fitness cases, represent a subset of the problem domain and thus the algorithm derived is usually not specific to the fitness cases. The evolutionary process involves creating an initial population and iteratively refining the population until a solution algorithm is evolved. Each generation of this process involves evaluating the population, applying selection methods, e.g. fitness proportionate selection or tournament selection, to choose the fitter individuals as parents of the next generation, and applying genetic operators, namely, reproduction, mutation and crossover, to create the next generation. Problem-specific fitness functions are used for evaluation purposes. The elements of each population are usually represented as parse trees comprised of function nodes, e.g. +, if and for, and terminals, i.e. constants and variables representing the input to the problem. An individual of the population is created by randomly choosing elements of a specified function and terminal set.

4. PREVIOUS WORK

This section examines previous genetic programming systems implemented to induce iterative algorithms. Iterative algorithms are evolved by including iterative operators in the function set used by the GP system [Koza 1992; Langdon 1998]. In the system implemented by Koza [1992] to solve an instance of the Blocks World problem the DU (dountil) operator is included in the function set. DU takes two arguments, a body to be executed on each iteration, and a condition. The iteration stops when the condition specified as the second argument of the DU is met. A maximum of 25 iterations per DU operator instance and a maximum of 100 iterations per individual are permitted. A similar operator, namely, forwhile, is used in the GP system developed by Langdon [Langdon 1998] to automatically induce methods for the list abstract data type. Nesting of loops was not allowed and a maximum of 32 iterations could be performed by a forwhile operator instance.

A more recent approach to evolving iterative algorithms introduced by Koza et al. [Koza, Bennett III, Andre and Keane 1999] is the use of Automatically Defined Loops (ADLs). An ADL is comprised of four branches, namely, a loop initialisation branch (LIB), a loop condition branch (LCB), a loop body branch (LBB) and a loop update branch (LUB). An invocation of the loop results in the LIB being evaluated, followed by the LCB. Depending on the result of the evaluation of the LCB, either the LBB is executed followed by an evaluation of the LUB, or the loop is terminated. An ADL is assigned a name and an argument list. ADLs can be used by a genetic programming system by creating individuals during initial population generation that include ADL structures in their architecture. Architecture-altering operations [Koza, Bennett III, Andre and Keane 1999] are used to alter the architecture of these automatically defined structures during the evolutionary process.
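The order in which the four ADL branches are evaluated can be illustrated with a minimal Python sketch. It is only an illustration of the LIB, LCB, LBB, LUB sequence described above: the function name run_adl, the dictionary used as loop state, and the max_iterations cap are assumptions made for the example (the cap mirrors the limits reported for DU and forwhile, not a documented ADL parameter).

```python
from typing import Callable, Dict

State = Dict[str, int]


def run_adl(
    lib: Callable[[State], None],
    lcb: Callable[[State], bool],
    lbb: Callable[[State], None],
    lub: Callable[[State], None],
    state: State,
    max_iterations: int = 100,  # assumed safety cap, not part of the ADL definition
) -> State:
    """Evaluate LIB once, then repeat LCB -> LBB -> LUB while LCB holds."""
    lib(state)
    iterations = 0
    while iterations < max_iterations and lcb(state):
        lbb(state)
        lub(state)
        iterations += 1
    return state


# Hypothetical branches that sum the integers 1..5.
result = run_adl(
    lib=lambda s: s.update(i=1, total=0),
    lcb=lambda s: s["i"] <= 5,
    lbb=lambda s: s.update(total=s["total"] + s["i"]),
    lub=lambda s: s.update(i=s["i"] + 1),
    state={},
)
print(result["total"])
```

In an evolved individual each branch would itself be a parse tree; plain Python callables stand in for those trees here.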
In some studies problem-specific iteration operators have been implemented. For example, Koza et al. [Koza, Bennett III, Andre and Keane 1999] define Automatically Defined Iterations (ADIs), which are used specifically to access elements of an array, and Kirshenbaum [Kirshenbaum 2001] implements bounded iteration operators to access elements of a vector.

5. METHODOLOGY

The overall methodology employed to test the hypothesis that genetic programming can induce solutions to iterative novice procedural programming problems is the “proof by demonstration” methodology described by Johnson [Johnson 2003]. This methodology involves iteratively improving the performance of a computer system in order to answer a research question. The causes of the failure of the system on one iteration form input to the refinement process of the next iteration. If the system is still unable to induce solutions after numerous steps of refinement, reasons for the failure must be identified. In this study the proof by demonstration methodology is implemented with one step of refinement as follows:
─ Develop and implement a genetic programming system for the induction of novice iterative procedural solution algorithms.
─ Test the system on the set of 15 randomly chosen problems in Appendix A. ASCII graphics problems are used to test nested iteration. The problem set contains 10 problems requiring the use of a single iterative construct and 5 ASCII graphics problems. Due to the randomness of genetic programming the system was tested using ten different random number generator seeds for each problem.
─ If the system is unable to evolve a solution for at least one seed for each problem, make changes to the system.
─ Test the revised system and report on the results obtained. If the system is unable to induce solutions to any of the problems, discuss reasons for failure.

The proposed system was implemented using the Professional version of JBuilder 4 with JDK1.3. The random number generator provided by the Random class in the JBuilder library was used for purposes of random number generation. This random number generator implements the linear congruential method to modify the initial seed. The simulations in this study were run on two different systems, namely, a Pentium III with Windows XP and a Pentium 4 with Windows 2000.

6. PROPOSED GP SYSTEM

This section provides an overview of the proposed genetic programming system implemented to evolve novice procedural solution algorithms. In order to ensure that the programs generated have a valid structure and to facilitate the translation of the evolved algorithms into a particular programming language, the system is strongly typed. Each terminal, constant, memory location, function and function argument is of a specific type. The types catered for by the system are Integer, Real, Boolean, Char, String and Output (for ASCII graphics problems). A number of functions are defined as generic operators. This means that the type of each operator instance is only determined during the process of initial population generation. For example, an individual may contain two for operator instances of different types.

Input variables                   N
Types of input variables          Integer
Source of input variables         Keyboard
Input values for N                0, 1, 2, 3, 4, 5
Constants                         one: 1
Output variables                  Fact
Types of output variables         Integer
Destinations of target variables  Screen
Output values for Fact            1, 1, 2, 6, 24, 120
Function set                      +, -, *, /, if, for, <, >, <=, >=, ==, !=

Table 6.1.1: Problem Specification for the Factorial Problem
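As a rough illustration, the specification in Table 6.1.1 could be held in a small record type. The sketch below is in Python rather than the Java used for the actual system, and the class name ProblemSpec and its field names are hypothetical; only the values come from the table. The relational operators in the function set are assumed from context.

```python
from dataclasses import dataclass
from math import factorial
from typing import Dict, Tuple


@dataclass(frozen=True)
class ProblemSpec:
    """Hypothetical container for one programming problem specification."""
    inputs: Dict[str, Tuple[str, str]]    # variable -> (type, source)
    outputs: Dict[str, Tuple[str, str]]   # variable -> (type, destination)
    constants: Dict[str, int]             # application-domain knowledge
    function_set: Tuple[str, ...]         # subset of the representation language
    fitness_cases: Tuple[Tuple[int, int], ...]  # (input value, target output)


FACTORIAL_SPEC = ProblemSpec(
    inputs={"N": ("Integer", "Keyboard")},
    outputs={"Fact": ("Integer", "Screen")},
    constants={"one": 1},
    # Relational operators are an assumption based on the surrounding text.
    function_set=("+", "-", "*", "/", "if", "for",
                  "<", ">", "<=", ">=", "==", "!="),
    fitness_cases=tuple(zip((0, 1, 2, 3, 4, 5), (1, 1, 2, 6, 24, 120))),
)

# Sanity check: every fitness case agrees with the mathematical factorial.
for n, target in FACTORIAL_SPEC.fitness_cases:
    assert factorial(n) == target
```

Keeping the fitness cases as explicit input/target pairs makes it easy to reuse the same record when evaluating candidate algorithms.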

The following sections describe the programming problem specification, internal representation language, genetic programming system architecture, and mechanisms built into the system for escaping local optima.

6.1 Programming Problem Specification

For each problem, the input to the genetic programming system is a problem specification. Each problem specification contains the following information:
─ A description of each input to the problem and output of the problem. Each input/output description is comprised of a variable representing the input/output, the type of the input/output and the source/destination of the input/output (keyboard, file or memory).
─ A set of fitness cases where each fitness case provides a value (or list of values) for each input variable and the corresponding output values for each output variable defined.
─ Details regarding the application domain. This knowledge takes the form of constant values in the program specification.
─ Screen output must be specified in the case of ASCII graphics problems. Each screen output specification contains a set of x and y-coordinate pairs defining a screen location and the character found at that position.
─ A subset of the internal representation language that students should have knowledge of to solve the particular problem.
Table 6.1.1 illustrates the programming problem specification for the Factorial Problem.

6.2 Internal Representation Language

This section describes the internal representation language used by the GP system to express procedural algorithms. A subset of the language, representing the knowledge that the student needs to solve the problem, is used to evolve a solution algorithm for a specific programming problem. During interpretation of operators by the GP system, checks are performed to ensure that the closure property is met. For example, if a calculation evaluates to Infinity this value is replaced with the maximum integer value permitted. Similarly, if the condition of a loop is invalid and the loop is not executed, an error message is propagated through to the evaluation method and the individual is penalized.

6.2.1 Sequential Operators

The internal representation language includes the following arithmetic, string manipulation, and logical operators:
─ Arithmetic: +, -, *, /, sqrt, % (modulus), neg, sq, cube, pow, trunc, round, ceil, floor, abs.
─ String manipulation: length, concat (concatenates a string), concatc (concatenates a character), del (deletes a single character), delete (deletes a substring), insert, copy, equal, charat, upcase.
─ Logical: ==, !=, <=, >=, <, >, cequal (checks character equality), cnoteq, bequal (checks Boolean equality), bneq, not, and, or.
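The closure check on arithmetic results described in Section 6.2 can be sketched as follows. This is a minimal Python illustration, not the system's Java code: the helper names are hypothetical, MAX_INT is assumed to be the 32-bit signed maximum, and the treatment of negative infinity is an assumption since the text only mentions Infinity.

```python
import math

MAX_INT = 2**31 - 1  # assumption: the maximum integer value is the 32-bit signed limit


def close_result(value: float) -> float:
    """Replace an infinite result so that every operator returns a usable value."""
    if math.isinf(value):
        # Mapping -Infinity to -MAX_INT is an assumption made for this sketch.
        return float(MAX_INT) if value > 0 else float(-MAX_INT)
    return value


def closed_mul(a: float, b: float) -> float:
    """Multiplication wrapped in the closure check."""
    return close_result(a * b)


print(closed_mul(3.0, 4.0))      # ordinary result is passed through unchanged
print(closed_mul(1e308, 10.0))   # overflow to Infinity is replaced by MAX_INT
```

Wrapping each operator this way means any subtree can be passed as an argument to any type-compatible function node without the interpreter failing.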

6.2.2 Memory Manipulation Operators

In some problems each fitness case of an input variable may contain a list of values instead of a single value. In such cases the system uses indexed memory to store the list. Prior to the execution of an individual on a particular fitness case, the indexed memory is initialized to the list of values specified in the fitness case. The aread operator is used to access elements of the indexed memory structure. The alen function, which returns the number of elements in indexed memory, is added to the terminal set in these cases.

6.2.3 Conditional Control Structures

The internal representation language includes the if operator which performs the function of an if-then-else statement. The first argument of the operator represents a condition and is of type Boolean. The second and third arguments can be of any type. Both these arguments are instantiated to be of the same type during initial population generation.

6.2.4 Iterative Control Structures

In order to facilitate the translation of the evolved algorithms to a procedural programming language, the syntax and functioning of these operators have been chosen to be similar to that of iterative operators commonly used by procedural programming languages. The iterative control structures provided are the for, while, and dowhile operators which implement the for, while and dowhile loops respectively. In order to prevent infinite and time-consuming iterations an upper bound is set on the number of iterations that can be performed per individual. Two variables are maintained for each iterative control structure instance. The first is the counter variable which is incremented on each iteration. In the case of the for operator this variable can be incremented or decremented and is given an initial value of one. The second variable, the iteration variable, stores the result of each iteration.
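A bounded for operator with a counter variable and an iteration variable might be interpreted along the following lines. This Python sketch rests on several assumptions that the text does not settle: the argument layout (start, end, body), the initial value of the iteration variable, the explicit descending flag, and all class and function names are hypothetical. The iteration allowance is shared by every loop of one individual, as described above.

```python
class IterationLimitExceeded(Exception):
    """Raised when an individual uses up its iteration allowance."""


class IterationBudget:
    """Upper bound on iterations, shared by all loops of one individual."""

    def __init__(self, max_iterations: int) -> None:
        self.remaining = max_iterations

    def spend(self) -> None:
        if self.remaining <= 0:
            raise IterationLimitExceeded()
        self.remaining -= 1


def for_op(start, end, body, budget, initial=1, descending=False):
    """Run body(counter, iteration_var) for each counter value from start to end."""
    step = -1 if descending else 1
    counter = start
    iteration_var = initial  # assumed initial value of the iteration variable
    while (counter >= end) if descending else (counter <= end):
        budget.spend()
        iteration_var = body(counter, iteration_var)  # result of this iteration
        counter += step
    return iteration_var


def evolved_factorial(n: int, max_iterations: int = 50) -> int:
    """One possible factorial solution expressed with the bounded for operator."""
    budget = IterationBudget(max_iterations)
    return for_op(1, n, lambda counter, acc: acc * counter, budget)


print([evolved_factorial(n) for n in range(6)])
```

When the allowance is exhausted the exception would be caught by the evaluation procedure, which could then penalize the individual.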
Both these variables are added to the memory structure of the individual and to the terminal set when creating the third argument of the for operator and the second argument of the while and dowhile operators. The counter variable is included in the terminal set when creating the first argument of both the conditional loops.

6.2.5 Input and Output Operators

The code required to obtain user input, e.g. from the keyboard or a file, and to return outputs, e.g. to the screen or a file, is fairly standard. Thus, in an attempt to reduce the computational effort involved in inducing algorithms using genetic programming, it was decided that the genetic programming system would not be required to generate this code. Based on the input sources and output destinations listed in the problem specification, the standard code components will be added to the solution algorithm generated by the system.

The place(x, y, char) output operator is included in the language in order to generate solution algorithms to ASCII graphics problems. These problems are used to test nested iteration. The system uses a multidimensional array to represent the screen. Two variables, currentx and currenty, keep track of the next screen position to be written to. The place operator writes char to the screen position specified by its first two arguments. The maximum x and y-coordinates specified in the problem specification are added to the terminal set. In order to cater for procedural languages that do not contain commands that write to a specific screen location the language includes toscreen and newline. The toscreen operator writes its argument to the next screen position to be written. Execution of the newline operator results in currentx being incremented and currenty being set to zero.

6.2.6 Multiple Statements

In addition to evolving functions, the GP system in this study has to evolve algorithms which consist of more than one statement.
The internal representation language provides blockn operators, where n = 1..5, for combination purposes. A blockn operator instance takes n arguments and returns the result of evaluating its nth argument. The blockn operators are generic operators and each instance is the same type as its last argument.

6.2.7 Errors

The system deals with two types of errors. The first error type is #serror, which represents the error message returned by an operator if it receives an illegal operand or is unable to perform its function. This error is propagated through to the procedure calculating the raw fitness (defined in section 6.3) of the individual and the individual is penalized by adding an error offset to the raw fitness. The value of the error offset used is problem dependent and is one of the GP parameters. The second error type is user-defined errors. Novice procedural programming problems often require the programmer to perform some type of error checking and output a corresponding error message. These errors are defined in the problem specification by giving an output variable a value of error when an error message needs to be displayed for the given input. If the solution algorithm has to perform error checking for more than one input variable there will be more than one error type and each error type will be numbered in the problem specification, e.g. error1.

6.3 Standard GP Features

This section provides an overview of the standard features, namely, program representation, control models, initial population generation methods, population evaluation methods, selection methods and the genetic operators, implemented by the system.

Each individual is represented as a single parse tree with its corresponding memory structure. Initially, a multi-tree genome was used in order to cater for algorithms that produce multiple outputs. While this structure was effective for simple problems, the system was not able to induce multiple outputs simultaneously for more complicated problems. This is consistent with the studies conducted by Bruce [1995] and Langdon [1998]. Thus, for those cases which require the system to evolve more than one output for a problem, a separate run is performed for each output. In novice procedural programming problems it is often the case that part of the solution algorithm for one output (or the entire algorithm) forms part of the solution for one or more of the other outputs of a multiple output problem. To cater for this a function node is created for each subtree of a solution algorithm for an output. These function nodes are added to the terminal set when inducing the solution algorithms for the other problem outputs.

It is evident from the literature reviewed [Koza 1992; Langdon 1998] that the most effective control model and method of initial population generation is problem dependent. Thus, the system provides for both control models, namely generational and steady-state, and for all three methods of initial population generation, namely grow, full and ramped half-and-half. Inverse tournament selection is used when the steady-state control model is implemented.

Each individual is evaluated by executing the individual on each set of input values described in the fitness cases and comparing the output of the execution with the target output specified in the fitness cases. If the target output is numerical the error fitness function defined in Koza [1992] is used to calculate the fitness of the individual. This function sums the absolute value of the differences between the target output and that produced by the individual for each fitness case.
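For numerical targets, the error fitness just described reduces to a sum of absolute differences. The short Python sketch below illustrates the calculation on the factorial fitness cases; the function name raw_fitness and the two candidate programs are hypothetical, and a lower value indicates a fitter individual, with zero meaning every fitness case is matched.

```python
from math import factorial
from typing import Callable, Iterable, Tuple


def raw_fitness(
    program: Callable[[int], float],
    fitness_cases: Iterable[Tuple[int, float]],
) -> float:
    """Sum of |target - produced| over all fitness cases (0 means all cases match)."""
    return sum(abs(target - program(value)) for value, target in fitness_cases)


# Fitness cases for the factorial problem: (input N, target Fact).
CASES = [(0, 1), (1, 1), (2, 2), (3, 6), (4, 24), (5, 120)]

print(raw_fitness(factorial, CASES))    # a correct candidate
print(raw_fitness(lambda n: 1, CASES))  # a candidate that always outputs 1
```

The constant candidate matches only the first two cases, so its error is dominated by the largest targets, which is one way a fitness function can favour some structures over others.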
For all other target output types the fitness measure is the number of fitness cases for which the individual produces exactly the same output as the target output. In the case of ASCII graphics problems the output written to the screen maintained by the system is compared to that described in the problem specification. The individual is also penalized if a screen location is written to more than once or an attempt has been made to access a location beyond the bounds of the screen.

The tournament selection method is used to choose the parents of successive generations. The mutation and crossover operators are used to create the next generation. The mutation operator randomly selects a subtree in the copy of the selected parent and replaces it with a newly created subtree. The crossover operator randomly selects subtrees in copies of both the selected parents. These subtrees are then swapped to create two offspring, and the fitter of the two offspring is returned. To reduce bloat a limit is set on the size of offspring produced. If an offspring (that is not a solution) exceeds this limit, the genetic operation is performed again until an offspring of the correct size is generated.

6.4 Escaping Local Optima

In some cases genetic programming systems are unable to generate solution algorithms due to premature convergence of the GP algorithm. Lack of genetic diversity, selection variance and destructive genetic operators have been cited as the main causes of premature convergence [Banzhaf, Nordin, Keller and Francone 1998; Mawhinney 2000]. In order to promote genetic diversity the reproduction operator is not used and duplicates are not permitted in the initial population. Furthermore, mutation application rates will be increased if necessary. In order to deal with selection variance the system performs multiple iterations per seed, in the hope that a different area of the search space will be visited on each iteration.
The system provides non-destructive mutation and crossover operators for use if necessary. These operators produce offspring that are at least as fit as their parents.

7. RESULTS

This section reports on the performance of the proposed genetic programming system when applied to the set of 15 novice procedural programming problems. The following GP parameters were varied in an attempt to find a solution in each case: control model, method of initial population generation, population size, maximum tree size, initial tree depth limit, mutation tree depth limit, tournament size, bound, error offset (to calculate the penalty for #serror), type of genetic operators (non-destructive or standard), genetic operator application rates, maximum number of generations, maximum number of runs per seed, and maximum number of iterations per individual.

The proposed system was able to successfully evolve solutions to eight of the 15 problems. The generational control model was used for all problems. For these eight problems, selection variance was the main cause of premature convergence. The values of the GP parameters used are listed in Table B.1 in Appendix B and one of the solutions evolved for each of these problems is depicted in Table C.1 in Appendix C.

The evolutionary process was studied for each of the seven problems for which the system was unable to find at least one solution. Fitness function biases against certain primitives or combinations of primitives were identified as the cause of premature convergence in each of these cases. Given a particular landscape, defined by the fitness function and set of fitness cases for that problem, individuals containing certain primitives or combinations of primitives will always have a poor fitness and are thus eliminated early during the evolutionary process. In those cases where the system was not able to find a solution, the eliminated structures were essential components of solution algorithms. In particular, individuals containing an iterative operator had a poorer fitness than those not containing these operators.

Changes to the fitness function and fitness cases used resulted in the system getting stuck in different local optima. The following section describes an iterative structure-based algorithm which was used to successfully escape local optima caused by fitness function biases.

8. REVISED SYSTEM

Table 8.1 lists the iterative structure-based algorithm (ISBA) implemented to escape local optima caused by fitness function biases. The ISBA performs multiple iterations per seed and uses similarity indexes, similar to those implemented by Mawhinney [2000], to ensure that different areas of the search space are examined on each iteration. In order to determine how to apply this restriction, studies were conducted to determine how the GP algorithm converges to a particular structure. These studies revealed that during the first of two phases the GP algorithm appears to search globally until the best individual of each generation has more or less the same structure from the root to some depth d, where d is less than the depth of the tree. The second phase performs a local search, with individuals of the population fixed from the root to a depth d. The rest of the tree evolves during the run until the GP algorithm converges to an overall structure with the best individual being basically the same from one generation to the next.

Thus, based on this study of GP convergence, the ISBA performs both a global level search (consisting of m runs) and a local level search (consisting of n runs). The gsim index is used to test for similarity during the global level search. This index performs comparisons of both the root and the component of the individual from the root to some depth d. Each comparison can be a node comparison, i.e. the comparison checks that the nodes are the same, or a type comparison where nodes are compared with respect to type, e.g. iterative, constant, arithmetic. The lsim similarity index is used during the local level search to determine the number of relations in an individual that are equal to those in the local optima for the local levels. Alternatively, similarity checks may not be performed during the n local runs. This has the same effect as performing multiple iterations, each of which is forced to visit different areas of the search space.
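A node-comparison version of a gsim-style check might look like the Python sketch below. It is an illustration only: the nested-tuple tree encoding, the convention that the root is at depth 0, the function names, and the exact way the root threshold and the prefix threshold are combined are all assumptions. Only the general idea (compare the root, then compare nodes position by position down to depth d) comes from the description above.

```python
from typing import Dict, Sequence, Tuple

Tree = tuple  # hypothetical encoding: (label, child, child, ...); terminals are ("x",)


def prefix_nodes(tree: Tree, depth: int, path: Tuple[int, ...] = ()) -> Dict[tuple, str]:
    """Map position -> label for nodes from the root (depth 0) down to `depth`."""
    label, *children = tree
    nodes = {path: label}
    if depth > 0:
        for index, child in enumerate(children):
            nodes.update(prefix_nodes(child, depth - 1, path + (index,)))
    return nodes


def is_similar(individual: Tree, local_optima: Sequence[Tree], d: int,
               root_threshold: int, node_threshold: int) -> bool:
    """Node comparison: True if the individual is too close to a recorded optimum."""
    same_root = sum(1 for optimum in local_optima if optimum[0] == individual[0])
    if same_root > root_threshold:
        return True
    mine = prefix_nodes(individual, d)
    for optimum in local_optima:
        theirs = prefix_nodes(optimum, d)
        shared = sum(1 for pos, label in mine.items() if theirs.get(pos) == label)
        if shared > node_threshold:
            return True
    return False


optimum = ("for", ("one",), ("N",), ("*", ("iter",), ("counter",)))
close = ("for", ("one",), ("N",), ("+", ("iter",), ("counter",)))
far = ("if", ("<", ("N",), ("one",)), ("one",), ("N",))
print(is_similar(close, [optimum], d=1, root_threshold=1, node_threshold=2))
print(is_similar(far, [optimum], d=1, root_threshold=1, node_threshold=2))
```

A type comparison could reuse the same traversal by mapping each label to a category such as iterative, constant or arithmetic before comparing.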

While a solution has not been found and the number of global areas explored is less than m
begin
  Step 1:
    ─ Perform a run.
    ─ If this is not the first global run, ensure that this run does not visit the same areas as the previous global runs.
    ─ This can be achieved by ensuring that each newly created individual, selected individual and offspring is not similar to the local optima representing the areas explored on the previous global runs. The gsim similarity index will be used for this purpose.
  Step 2:
    ─ If the individual converged to is not a solution, record the best individual of the last generation as a local optimum for global areas.
    ─ Identify the fixed component of the local optimum, consisting of the subtree comprising the nodes from the root to depth d, that will form the first d levels of each element of all the populations of the local level search.
  Step 3:
    While a solution has not been found and the number of local areas visited is less than n
    begin
      Step 3.1:
        ─ Perform a run.
        ─ The elements of all the populations are fixed from the root to level d, to be the same as the first d levels of the local optimum representing the current global area.
        ─ If this is not the first local area visited and similarity checking must be performed, ensure that this run does not visit the same area as the previous local runs.
        ─ This is achieved by ensuring that each newly created individual, selected individual and offspring is not similar to the local optima representing the areas explored on previous local runs. The lsim similarity index is used for this purpose.
      Step 3.2:
        ─ If the run does not converge to a solution and similarity checking must be performed, store the best individual of the last generation as a local optimum for local areas.
    end
end

Table 8.1: Iterative Structure-Based Algorithm (ISBA)
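The control flow of Table 8.1 can be outlined in Python as a pair of nested loops. This is a sketch of the loop structure only: run_gp and fixed_component are hypothetical callables standing in for a complete GP run and for the extraction of the first d levels of a tree, and the similarity checks themselves are assumed to happen inside run_gp using the lists passed to it.

```python
from typing import Callable, List, Optional, Tuple


def isba(
    run_gp: Callable[..., Tuple[object, bool]],
    fixed_component: Callable[[object], object],
    m: int,
    n: int,
    local_similarity: bool = True,
) -> Optional[object]:
    """Outline of the global/local search loops; returns a solution or None."""
    global_optima: List[object] = []
    for _ in range(m):                                   # global level search
        best, solved = run_gp(avoid_global=global_optima, prefix=None, avoid_local=[])
        if solved:
            return best
        global_optima.append(best)                       # Step 2: record local optimum
        prefix = fixed_component(best)                   # first d levels stay fixed
        local_optima: List[object] = []
        for _ in range(n):                               # Step 3: local level search
            best_local, solved = run_gp(
                avoid_global=[], prefix=prefix, avoid_local=local_optima
            )
            if solved:
                return best_local
            if local_similarity:                         # Step 3.2
                local_optima.append(best_local)
    return None


# Stub run for demonstration: "succeeds" on the second local run of a global area.
calls = []


def stub_run(avoid_global, prefix, avoid_local):
    calls.append((len(avoid_global), prefix, len(avoid_local)))
    solved = prefix is not None and len(avoid_local) == 1
    return "tree%d" % len(calls), solved


solution = isba(stub_run, lambda best: "prefix-of-" + best, m=2, n=3)
print(solution, calls)
```

Separating the loop structure from the GP run keeps the m, n and similarity-checking choices visible in one place.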

The ISBA requires the following parameters to be specified:
─ Number of global level runs (m).
─ Number of local level runs (n).
─ Depth until which a component is fixed (d).
─ Type of root comparison for gsim (rcomp).
─ Maximum number of local optima which an individual can have the same (or same type) root as (rthresh).
─ Type of global area comparison for gsim (gcomp).
─ Maximum number of nodes, from the root to depth d, which an individual can have that is the same (or same type) in the same position in one of the local optima (gthresh).
─ The maximum number of relations that an individual can have in common with one of the local optima (lthresh).

The ISBA was able to evolve solutions to the seven problems which the proposed GP system was unable to find solutions to. A value of two proved to be sufficient for d in all cases. The generational control model was used for all problems. The GP and ISBA parameter values used are listed in Table B.2 in Appendix B and one of the solutions evolved for each problem is illustrated in Table C.2 in Appendix C.

9. CONCLUSION

The study presented in this paper evaluated genetic programming as a means of evolving novice iterative solution algorithms. The genetic programming system implemented was, in its revised form, able to evolve solutions to all 15 randomly chosen iterative programming problems. The proposed system found solutions to eight of these problems. The use of multiple runs successfully escaped local optima caused by selection variance. In the revised system the ISBA was developed to escape local optima caused by fitness function biases. The revised system was able to find solutions for the remaining seven problems.

As was anticipated, a number of the evolved solutions contained redundant code. Furthermore, the problem-solving approach taken in some of the evolved solutions was very different from that usually employed by human programmers. For example, some of the solutions to the factorial problem used an if operator to determine the bounds of the for loop, whereas human solutions to this problem usually contain the for loop in the else section of the if statement.

Future extensions to the system will examine the removal of redundant code and the use of syntax rules to evolve solutions adhering to good programming practice. Furthermore, some of the solutions evolved by the system were inefficient. Mechanisms similar to the fitness function penalty employed by Koza [1992] will be incorporated into the system to ensure that efficient algorithms are evolved. Future investigations will also focus on improving the runtime of the ISBA and automating the process of identifying suitable ISBA parameters for a given problem domain.

10. REFERENCES

BANZHAF, W., NORDIN, P., KELLER, R. E., AND FRANCONE, F. D. 1998. Genetic Programming – An Introduction – On the Automatic Evolution of Computer Programs and its Applications. Morgan Kaufmann Publishers.

BRUCE, W. S. 1995. The Application of Genetic Programming to the Automatic Generation of Object-Oriented Programs. PhD Dissertation, School of Computer and Information Sciences, Nova Southeastern University.

FREEDMAN, R., ALI, S. S., AND MCROY, S. 2000. What is an Intelligent Tutoring System? International Journal of Artificial Intelligence in Education 11, 3, 15 – 16.

JOHNSON, C. 2003. What is Research in Computing Science? Department of Computer Science, Glasgow University. http://www.dcs.gla.ac.uk/~johnson/teaching/research_skills/basics.html

KIRSHENBAUM, E. 2001. Iteration Over Vectors in Genetic Programming. HP Laboratories. http://www.hpl.hp.com/techreports/2001/HPL-2001-327.html

KOZA, J. R. 1992. Genetic Programming I: On the Programming of Computers by Means of Natural Selection. MIT Press.

KOZA, J. R., BENNETT III, F. H., ANDRE, D., AND KEANE, M. A. 1999. Genetic Programming III, Darwinian Invention and Problem Solving. Morgan Kaufmann Publishers.

LANGDON, W. B. 1998. Genetic Programming + Data Structures = Automatic Programming. Kluwer Publishers.

MAWHINNEY, D. 2000. Preventing Early Convergence in Genetic Programming by Replacing Similar Programs. http://citeseer.nj.nec.com/mawhinney00preventing.html

PILLAY, N. 2003. Developing Intelligent Programming Tutors for Novice Programmers. Inroads – SIGCSE Bulletin 35, 2, 78 – 82. ACM Press.

XU, S., AND CHEE, Y. S. 1999. SIPLeS-II: An Automatic Program Diagnoses System for Programming Learning Environments. In Artificial Intelligence in Education – Open Learning Environments: New Technologies to Support Learning Exploration, and Collaboration, LeMans, France, July 1999, S. P. LAJOIE AND M. VIVET, Eds. IOS Press, 397 – 404.

11. ACKNOWLEDGEMENTS

This material is based upon work supported by the National Research Foundation (NRF). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author and the NRF does not accept any liability in regard thereto.

Proceedings of SAICSIT 2005


12. APPENDIX A: PROBLEM SET

1. Write a program that calculates and outputs the sum of a list of numbers entered by the user.
2. Find the largest number in a sequence of integers.
3. Write a program that takes in two integers m and n with m

=1 ) { for count486#1 = (1) to (n) { count486#1 * lvar486#1 } } else { 1 }

for count302#1 = (1) to ( length ( str ) ) { concatc ( lvar302#11 , upcase ( charat ( str , count302#1 ) ) ) }

for count118#1 = (4) to (1) { for count118#2 = (1) to (4) { place ( count118#2 , count118#1 , if ( ( count118#2 + 1) lvar481#7 ) { aread( count481#7 ) } else { lvar481#7 } }

for count102#1 = ( num1 ) to ( num2 ) { count102#1 * count102#1 + lvar102#1 }

for count18#1 = (alen) to (1) { aread(count18#1) + lvar18#1 } / alen / 1 2

while( count275#1 < alen ) { ten * lvar275#1 + aread( count275#1 ) }

for count293#1 = (1 + N) to (1 + 1) { lvar293#1 + 1 / count293#1 }

while ( cnoteq ( charat ( str , count361#1 ) , term ) ) { if ( cequal ( space , charat ( str , count361#1 ) ) ) { one + lvar361#1 } else { lvar361#1

5

1hr, 28 mins

6

2hrs, 23 mins

8

5

15

7

4 mins, 12 secs

2

N/A

6

8

1 hr, 2 mins, 7 secs

5

N/A

1

2

Note that because the root of the body of the loop is “+”, lvar275#1 gets an initial value of 0.


} } 10

13 mins

4

N/A

5

if (e == zero ) { 1 } else { for count200#1 = (e) to (1) { lvar200#1 * b } }

Table C.2: Solutions Evolved by the Revised System
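To make the evolved expressions easier to read, two of the loops listed above are hand-translated into Python below. This is a sketch under stated assumptions, not output of the system: it assumes that the value of the loop body is assigned to the loop variable (lvar) on each iteration and that the loop evaluates to the final value of lvar. The initial value 0 for a body rooted at "+" follows the note above; the initial value 1 for a body rooted at "*" is an assumption made here, as are the function names.

```python
# Initial value of the loop variable, chosen by the root of the loop body.
# 0 for "+" is stated in the source; 1 for "*" is an assumption of this sketch.
INITIAL_VALUE = {"+": 0, "*": 1}


def sum_of_squares(num1, num2):
    """for count = (num1) to (num2) { count * count + lvar }"""
    lvar = INITIAL_VALUE["+"]
    for count in range(num1, num2 + 1):
        lvar = count * count + lvar
    return lvar


def power(b, e):
    """if (e == zero) { 1 } else { for count = (e) to (1) { lvar * b } }"""
    if e == 0:
        return 1
    lvar = INITIAL_VALUE["*"]
    # The loop is assumed to count down from e to 1, i.e. e iterations.
    for _count in range(e, 0, -1):
        lvar = lvar * b
    return lvar
```

Under these assumptions the `if` branch of the second expression is redundant whenever a loop from 0 down to 1 executes no iterations, which is consistent with the observation that evolved solutions may contain redundant code.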
