International Journal of Pattern Recognition and Artificial Intelligence
Vol. 18, No. 6 (2004) 1039–1056
© World Scientific Publishing Company
PROJECT CellNet: EVOLVING AN AUTONOMOUS PATTERN RECOGNIZER
N. KHARMA∗, T. KOWALIW†, E. CLEMENT, C. JENSEN, A. YOUSSEF and J. YAO

Departments of Electrical and Computer Engineering and Computer Science, Concordia University, 1455 de Maisonneuve Blvd W, Montréal, Canada H3G 1M8

∗[email protected]
†[email protected]
We describe the desire for a black box approach to pattern classification: a generic Autonomous Pattern Recognizer, which is capable of self-adapting to specific alphabets without human intervention. The CellNet software system is introduced, an evolutionary system that optimizes a set of pattern-recognizing agents relative to a provided set of features and a given pattern database. CellNet utilizes a new genetic operator designed to facilitate a canalization of development: Merger. CellNet utilizes our own set of arbitrarily chosen features, and is applied to the CEDAR Database of handwritten Latin characters, as well as to a database of handwritten Indian digits provided by CENPARMI. CellNet's cooperative co-evolutionary approach shows significant improvement over a more standard Genetic Algorithm, both in terms of efficiency and in nearly eliminating over-fitting (to the training set). Additionally, the binary classifiers autonomously evolved by CellNet return validation accuracies approaching 98% for both Latin and Indian digits, with no global changes to the system between the two trials.

Keywords: Pattern recognition; autonomous; evolution; cooperative co-evolution; merger; genetic algorithm; evolutionary computation; handwritten character.
1. Motivation

The field of Pattern Recognition is expansive: the sheer volume of possible frameworks, conceptualizations and systems is enormous. The literature available to a practitioner in any subfield spans a wide range of conceptual frameworks and highly specialized cases. A valuable tool in this environment is an Autonomous Pattern Recognizer (APR). We envision a "black box" approach to pattern recognition in which the APR's operator need not be privy to the details of the mechanism used in order to generate a reliable recognition tool. A black box of this sort needs to accommodate many differing classes of input and output. Almost no assumptions regarding the space of input can be made, nor any assumptions regarding clustering, features, etc. The immediate trade-off found in this setting is between the generality of the system and
the so-called "curse of dimensionality". Any system that operates on a space of possible recognizers this large quickly confronts a computationally infeasible task: the space of all possible recognizers in any sufficiently general system is simply too vast. Toward this end, an APR needs to implement a means to "canalize development": a method of search through the space of possible recognizers that is limiting in terms of the subspace considered, but robust in terms of finding optima.

In addition to these difficulties, such a system would need to be resistant to difficult problems such as overfitting: the tendency of learning systems to "memorize data". In the context of adaptive systems, overfitting means that the system learns a very precise methodology for classifying a training set, one which does not extend to classifying an independent validation set. An APR search technique needs to deal with the problem of overfitting intrinsically in its methodology of subspace search, in addition to other global strategies.

It is toward this aim, and with these constraints in mind, that we present CellNet, a system designed as a positive step toward the ultimate goal of an APR. This paper presents the first phase of the CellNet system, along with several ideas for future growth.

2. Introduction

2.1. Review

2.1.1. Coevolution

Hillis8 was the first to propose the idea of using coevolution in a GA. Without coevolution, the nonchanging (or static) fitness function will ultimately start returning values that offer little information about the differential fitness of agents. Hillis solved this problem via the introduction of a second population of "test agents", which change automatically through evolution. This effectively replaced the single explicit static fitness function (of one population) with a dynamic implicit one that kept changing (and getting harder) with time. Hillis' solution also helped prevent the population from stagnating in local optima. What Hillis had invented was essentially a zero-sum game between two competing populations. On the other hand, Paredis in 1996 proposed a different type of coevolution inspired by the symbiosis found in nature between many living organisms (e.g. shark and parasite-eating fish). In his model, the game is not zero-sum, and an increase in the fitness of one population is tied to, and beneficial to, an increase in the fitness of the other population. For this reason, we choose to call this cooperative coevolution, in contrast to Hillis' competitive coevolution. Artificial evolution, as well as both versions of coevolution, has been used to solve many scientific and engineering problems; the focus below is on GA use for optimization and creation in pattern recognition systems.
2.1.2. Feature selection

To our knowledge, Siedlecki14 contributed the first paper to suggest the use of a GA for feature selection. Ten years later, Kudo10 demonstrated the superiority of the GA over a number of traditional search and optimization methods for sets with a large number of features. Vafaie17 showed that a GA is better than Backward Elimination in some respects, including robustness. Guo7 used a GA for selecting the best set of inputs for a neural network used for fault diagnosis of nuclear power plant problems. The technique was successful, but proved to be computationally expensive. In contrast, Brill3 sped up the evaluation of the various feature sets by running multiple generations on multiple processors, and by using a nearest-neighbor classifier to emulate the functionality of a full-fledged neural network, without actually using one. Moser12 surveyed previous attempts at speeding up the action of GAs, and then proposed two new architectures of his own: VeGA, which uses parallelism to simultaneously evaluate different sets of features (on one computer), and DveGA, which utilizes an adaptive load-balancing algorithm to use a network of heterogeneous computers to carry out the various evaluations. In both cases, the selected features were used for character recognition (CR) purposes. Shi13 applied a GA to the selection of relevant features in handwritten Chinese character recognition. Fung6 used a GA to minimize the number of features required to achieve a minimum acceptable hit rate in signature recognition. Yeung19 succeeded in using a GA to tune four selectivity parameters of a Neocognitron used to recognize images of handwritten Arabic numerals.

2.1.3. Feature creation

A relatively small number of researchers have tried to approach the problem of feature creation (as opposed to mere selection) for pattern recognition. The first appears to be Ref. 15. In that work, Stentiford used evolution to create and combine a set of features (into a vector), which is compared to reference vectors using nearest-neighbor classifiers. A single feature is a set of black- or white-expecting pixel locations. The results were generated using a training set of 30,600 printed characters (from 6–8 high-quality fonts) and a test set of 10,200 printed characters (from the same fonts). The error rate on the test set was 1.01%, obtained using 316 nearest-neighbor classifiers automatically generated by the evolutionary system. More recently, a group of researchers including Melanie Mitchell (of the Santa Fe Institute) used evolution to create features for use in analyzing satellite and other remote-sensing images.16 Specifically, the researchers defined a genetic programming scheme for evolving complete image processing algorithms, for example for identifying areas of open water. Their system, called GENIE, successfully evolved algorithms that were able to identify the intended ground features with 98% accuracy (measured against the "ground truth").
Our system uses cooperative coevolution to select a set of features optimally suited to a given pattern recognition task. The task is related to character/gesture recognition. However, the domain of application is not intended to limit the range of application of our approach in the future.

2.2. CellNet

The system appears as a typical GA, but for one aspect: "Merger". Merger is a new genetic operator that enables a GA to literally join or merge two independent agents into a single new agent. However, the idea behind merger is more general than the specific details of its implementation within the current version of CellNet. Merger is not just an operator; the aim of introducing merger is to create a GA that is capable of autonomously creating complex (multi-module) designs starting from simple agents that are driven, initially, by a simple fitness function. This fitness function is enhanced in later generations by the introduction of new criteria. However, no matter how complex (or "demanding") the fitness function may be, agents cannot achieve a higher degree of complexity than that allowed by their static (fixed-size) representation. Merger attempts to solve this problem by providing the GA with an operator that can be used to increase (or decrease) the size of agents. This also has the side effect of allowing the GA to find right-sized agents without having to fix, beforehand, the right or maximum size of agents.

3. The Model

3.1. Overview

CellNet contains a fixed-size population of pattern-recognizing agents (or "hunters") and another fixed-size population of pattern agents (or "prey"). There is also a set of algorithms (the "environment") that govern the dynamic behavior of each population, and of both populations with respect to each other. The essential metaphor of the model is that of hunters optimizing their ability to recognize prey. Prey consist solely of images, drawn from the CEDAR or CENPARMIa database. Initially, the prescribed number of agents is chosen and placed into the training population. The system also contains a pair of global parameters by which the prey population is adapted — after each set length of time, a percentage of the prey are removed from the population and replaced with previously unseen images drawn from the total pool. This is done to provide a framework in which the hunters have a stable population which they may exploit, but which also provides an amount of variability serving to reward more general hunters, by penalizing those too precise in their classification scheme.

a Portions of the research in this paper use the CENPARMI Database of Indian Digits collected by CENPARMI.
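To make the prey-adaptation scheme concrete, the following minimal Python sketch rotates a fraction of the active prey back into the unseen pool. The names (rotate_prey, reserve_pool, replacement_fraction) are our own hypothetical choices, not identifiers from the CellNet implementation.

    import random

    def rotate_prey(active_prey, reserve_pool, replacement_fraction):
        """Swap a fraction of the current prey images for unseen ones.

        Hypothetical sketch: 'active_prey' is the list of images the hunters
        currently train against; 'reserve_pool' holds the rest of the database.
        """
        n_swap = int(len(active_prey) * replacement_fraction)
        for index in random.sample(range(len(active_prey)), n_swap):
            fresh = reserve_pool.pop(random.randrange(len(reserve_pool)))
            reserve_pool.append(active_prey[index])  # retired image rejoins the pool
            active_prey[index] = fresh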
Hunters are complex recognizing agents, whose structure consists of logical combinations of features drawn from a finite feature set. The purpose of the CellNet system is to evolve one, or several, hunters which can serve as robust pattern-recognizing agents once disembodied from their environment. The finite feature set is a selection of prechosen functions which, when applied to an image, return a normalized value. At present, the CellNet system uses the following feature list, all applied to thinned characters: Histogram (parameterized to consider subsets of the image), Normalized Height (of a bounding box), Normalized Width, Normalized Length, Number of Terminators, Number of Intersections, Fourier Descriptors, Zernike Moments and Central Moments. It should be stressed that the current choice of features is arbitrary, made largely on grounds of availability. Indeed, nearly any set of features designed for character recognition could be used in their stead.

3.2. Genetic representation

A hunter agent consists of one or more "cells", organized into one or more "chromosomes". A cell is, basically, a simple pattern matcher whose state depends on the existence (or lack thereof) of a specific feature in a given pattern (prey). A cell resides in a chromosome, and a chromosome can hold any number of cells (greater than 0). A chromosome with more than one cell combines its cells using either AND or AND-NOT operators. A chromosome, in a given pattern recognition context, can either express an opinion ("vote") or simply stay silent. This depends wholly on the state of its constituent cells in that context. If/when a chromosome votes, it votes either for the class (since we are only dealing with binary classifiers) or against it, depending on the chromosome's "class-bit". As stated, a complete agent is made of one or more chromosomes. In a given context, an agent has an overall vote of its own, which depends on the votes of its chromosomes. If more chromosomes vote for the class than against it, then the whole agent votes for the class, and vice versa. As such, an agent acts as a simple-majority-wins-all mechanism for all its voting chromosomes. The representational structures are discussed in more detail below.

The structure of a hunter cell is:

    C | FX | L1 | L2 | J

C is either 1 or 0, and indicates whether the cell is associated with the class or not. FX refers to a specific feature extraction function, drawn from the finite feature list. L1 and L2 are the lower and upper limits, respectively, of the range of values that FX is allowed to return in order for the cell to be activated. If, at a given time, the value measured by FX falls outside that range, then the cell is considered inactive (and hence its chromosome does not vote). The J bit is a join bit, which is meaningless in this context; its presence is explained below.
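As an illustration of this representation, a cell might be modeled as in the following minimal sketch; the class and field names are hypothetical rather than taken from the CellNet source.

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Cell:
        class_bit: int                      # C: 1 or 0
        feature: Callable[[object], float]  # F_X: image -> normalized value
        lower: float                        # L1: lower activation limit
        upper: float                        # L2: upper activation limit
        join_bit: int                       # J: conjunction with the next cell

        def is_active(self, image) -> bool:
            # The cell activates only when F_X(image) falls within [L1, L2].
            return self.lower <= self.feature(image) <= self.upper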
A cell may be interpreted as follows: if FX returns a value in [L1, L2], then the cell is activated, and votes either for or against the class depending on C. Otherwise, the cell is silent. The structure of a hunter chromosome (here with two cells) is:

    C | FX | L1 | L2 | J1 | FY | L3 | L4 | J2 | A1 | A2

The C bit in a uni- or multi-cellular chromosome stands for class. The C here indicates the vote that the whole chromosome will cast, if/when all its cells are activated. Note that there exists only one C bit per chromosome — see the description of Merger (Sec. 3.6) for details. There can be any number of cells in a chromosome (two in this example). The first cell starts with FX, and the second starts with FY. These two cells are joined by the J1 (join) bit, specifying the conjunction between the cells (AND or AND-NOT); note the second join bit: it is unused at present, and is only activated if a third cell is attached to the chromosome. Finally, the two A (affinity) bits are special bits which guide the process of merger, explained in detail below.

A chromosome may be read as: if FX returns a value in [L1, L2] AND[NOT] FY returns a value in [L3, L4], then the chromosome is active and votes C.

A complete hunter is simply a set of one or more chromosomes. No special structures are added (and none are eliminated). To find out how a hunter votes in a given situation, the number of chromosomal votes for the class is counted, as is the number of votes against the class. The hunter votes "1" (as opposed to "0") if the number of votes for is greater than the number of votes against, and vice versa. If the number of positive and negative votes turns out to be equal, or if none of the chromosomes vote, then the hunter votes "0.2" (interpreted as "uncertain"). The complexity of an agent is defined to be the number of cells in its make-up; the number of chromosomes is not a factor in this measure.
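The voting scheme just described might be sketched as follows, building on the hypothetical Cell class above. We assume, as one plausible convention, that a join bit of 1 denotes AND and 0 denotes AND-NOT; the chromosome interface (.cells, .class_bit) is likewise our own.

    def chromosome_vote(cells, class_bit, image):
        """Return class_bit if the cell conjunction holds, else None (silent)."""
        state = cells[0].is_active(image)
        for previous, cell in zip(cells, cells[1:]):
            term = cell.is_active(image)
            # The join bit of the *previous* cell selects AND (1) or AND-NOT (0).
            state = state and (term if previous.join_bit else not term)
        return class_bit if state else None

    def hunter_vote(chromosomes, image):
        """Simple-majority vote over all non-silent chromosomes."""
        votes = [chromosome_vote(c.cells, c.class_bit, image) for c in chromosomes]
        votes_for = sum(1 for v in votes if v == 1)
        votes_against = sum(1 for v in votes if v == 0)
        if votes_for > votes_against:
            return 1
        if votes_against > votes_for:
            return 0
        return 0.2  # tie, or every chromosome silent: "uncertain"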
3.3. Genetic algorithm

The CellNet system's primary driving force is a GA. It consists of a population of hunters and another of prey, initialized randomly, as the zeroth generation. Following this begins an iterative procedure of refinement: hunter agents are sorted according to an assigned fitness, according to which the next generation is prepared. This preparation consists of elitism, crossover and mutation, and is augmented with our new genetic operator, merger. Each operator is described below.

3.4. Fitness

At the beginning of each generation, each hunter agent is evaluated for fitness. This is not necessarily the agent's accuracy (defined in Sec. 4.1), although for testing purposes raw accuracy is also computed. Rather, fitness is a function based loosely on accuracy, and is meant to provide a measure that will drive the population's evolution. Explicit fitness functions depend on the trial in question, and are shown in Sec. 4.

3.5. Elitism

The first stage of the preparatory process is elitism: here a percentage of the top-ranking agents are simply copied into the next generation without modification. This is done to provide additional stability in the genetic algorithm, by ensuring that the maximal discovered fitness (although not necessarily accuracy) does not diminish.

3.6. Merger

The population is then subjected to the merger operator — an operator designed to select two fit agents and combine them to create a single agent with twice the complexity. The merged agents are then deposited in the new population as single units. Agents are selected for merger with a set probability, chosen using roulette-wheel selection (a.k.a. fitness-proportional selection). Merger is a complex process, largely under genetic control. It operates on the chromosomes present in the parent agents, allowing them to combine either vertically (the chromosomes vote competitively for the agent's final decision) or horizontally (two chromosomes combine using AND or AND-NOT as a conjunction). The result is always a valid agent, although not necessarily a fitter agent.

Example of horizontal merger of two chromosomes:

    Chromosome in Agent A:  0 Fa1 La11 La12 Ja Fa2 La21 La22 | 1 0
    Chromosome in Agent B:  1 Fb1 Lb11 Lb12 | 0 1
    Result:                 0 Fa1 La11 La12 Ja Fa2 La21 La22 AND-NOT Fb1 Lb11 Lb12 | 1 1

Example of vertical merger of two chromosomes:

    Chromosome in Agent A:  0 Fa1 La11 La12 Ja Fa2 La21 La22 | 1 1
    Chromosome in Agent B:  1 Fb1 Lb11 Lb12 | 0 1
    Result:                 0 Fa1 La11 La12 Ja Fa2 La21 La22 | 1 1
                            1 Fb1 Lb11 Lb12 | 0 1
Consider the attempt to merge two agents, A and B. Both A and B have chromosomes, each with a two-bit affinity. The affinity of a chromosome is the means through which it controls how it merges with another. Initially, all chromosomes in A and B are enumerated, and pairs are chosen, sequentially matching the chromosomes as they occur in the agent. Once chromosomes have been paired, they are merged according to the following mechanism (the conjunction is set as AND or AND-NOT on the basis of whether or not the vote classes agree):
    A affinity | B affinity | Result (affinity bits — chromosome — conjunction — chromosome)
    00         | 00         | 00  A and[not] B
    00         | 01         | 01  A and[not] B
    00         | 10         | 10  B and[not] A
    00         | 11         | chromosomes laid out vertically
    01         | 00         | 01  B and[not] A
    01         | 01         | chromosomes laid out vertically
    01         | 10         | 11  B and[not] A
    01         | 11         | chromosomes laid out vertically
    10         | 00         | 10  A and[not] B
    10         | 01         | 11  A and[not] B
    10         | 10         | chromosomes laid out vertically
    10         | 11         | chromosomes laid out vertically
    11         | 00         | chromosomes laid out vertically
    11         | 01         | chromosomes laid out vertically
    11         | 10         | chromosomes laid out vertically
    11         | 11         | chromosomes laid out vertically

The above (admittedly rather intricate) process was chosen for the degree of control it offers, which may be exploited genetically. For example, a chromosome may be declared complete by setting its affinity bits to "11". Or, it may "choose" to always be the first section of a merged chromosome (and hence the section which defines which class is being voted for) by setting its affinity bits to "10".

3.7. Crossover

The remaining population is subjected to crossover, with a set probability. Candidates are chosen using roulette-wheel selection. Crossover is always single-point, although there are two variations: in the first, indices are chosen randomly in the two agents (variable-length crossover); in the second, one index is chosen for both agents (fixed-length crossover; it is assumed in this case that parent agents will be of the same length). In both cases, crossover is bit-wise, where function limits are stored as integers.
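The table can be implemented directly as a lookup. The sketch below uses hypothetical names and encodes affinities as two-character strings; it also exploits the observation that, in the horizontal rows of the table, the resulting affinity bits coincide with the bitwise OR of the parents' affinities.

    HORIZONTAL_A_FIRST = {("00", "00"), ("00", "01"), ("10", "00"), ("10", "01")}
    HORIZONTAL_B_FIRST = {("00", "10"), ("01", "00"), ("01", "10")}

    def merge_decision(affinity_a, affinity_b):
        """Return (layout, first, second) for a paired chromosome merge.

        layout is "horizontal" or "vertical"; for horizontal merges the
        conjunction (AND vs AND-NOT) is chosen elsewhere, according to
        whether the two class bits agree.
        """
        key = (affinity_a, affinity_b)
        if key in HORIZONTAL_A_FIRST:
            return ("horizontal", "A", "B")
        if key in HORIZONTAL_B_FIRST:
            return ("horizontal", "B", "A")
        return ("vertical", "A", "B")  # all remaining combinations

    def merged_affinity(affinity_a, affinity_b):
        # Bitwise OR of the parents' affinity bits, matching the table rows.
        return "".join("1" if a == "1" or b == "1" else "0"
                       for a, b in zip(affinity_a, affinity_b))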
3.8. Mutation

In the final stage of the preparatory process, a mutation operator is applied. The mutation operator skims over all bits in all agents, making random changes with a set probability. The mutation operator applies to all hunter agents with the exception of those selected for elitism.

4. Methodology and Results

4.1. Merger versus the standard GA

In the first series of experiments, a standard problem was chosen to evaluate the performance of CellNet in finding an effective pattern recognizer. The chosen problem was the ability of the system to distinguish between handwritten characters — specifically, to distinguish between the zero character and anything else. All examples were drawn from the CEDAR database. Data was drawn from two series of runs. The first set of runs consisted of the CellNet system evaluated without merger (hereby denoted the Standard Genetic Algorithm or SGA trial). The second set of runs consisted of a similar run of the CellNet system, this time using the new merger operation (hereby denoted the Merger-Enabled or ME trial). In both the SGA and ME trials, a population of 500 agents was initialized randomly. 500 images were chosen as prey — in the ME and SGA trials, no prey replacement was allowed (i.e. the prey population was static). Fitness was assigned to each agent according to the function

$$\mathrm{fitness}(A) = \frac{1}{|T|} \sum_{i \in T} \begin{cases} 1.0 & \text{agent } A \text{ correctly identifies image } i \\ 0.2 & \text{agent } A \text{ replies ``uncertain'' for image } i \\ 0.0 & \text{agent } A \text{ misidentifies image } i. \end{cases}$$

Raw accuracy — from which fitness may differ, depending on the trial — is defined as

$$\mathrm{accuracy}_{\mathrm{train}}(A) = \frac{1}{|T|} \sum_{i \in T} \begin{cases} 1.0 & \text{agent } A \text{ correctly identifies image } i \\ 0.2 & \text{agent } A \text{ replies ``uncertain'' for image } i \\ 0.0 & \text{agent } A \text{ misidentifies image } i \end{cases}$$

for the training set (where T is the training set; note that T can change between generations, cf. Sec. 3.1, although it is static for the ME and SGA trials), and as

$$\mathrm{accuracy}_{\mathrm{validate}}(A) = \frac{1}{|V|} \sum_{i \in V} \begin{cases} 1 & \text{agent } A \text{ correctly identifies image } i \\ 0 & \text{agent } A \text{ does not correctly classify image } i \end{cases}$$

for the purposes of validation (where V is the set of images reserved for validation).

Once fitness was assigned, the agents were sorted in descending order and prepared for the next generation, following this process:

(1) 10% of the top-ranking agents were passed immediately into the next generation as elite.
(2) From the remaining population, each agent was considered for merger with a probability of 1%. In the event that an agent was chosen, a second agent was chosen randomly, using a roulette-wheel algorithm.
(3) From the remainder, 50% of the population was chosen for crossover. Crossover was single-point, and occurred at the same index point in both agents (i.e. it respected agent complexity).
(4) For all population members save those selected for elitism, mutation was applied with a probability of 0.01.

(A code sketch of this four-stage process follows below.) The difference between the trials lies in step 2, and in the choice of the original population. In the SGA trial, the initial population members were created with a complexity of 40 cells, and step 2 was omitted — note that this implies a constant complexity of 40 cells for each agent throughout the SGA trial. In the ME trial, the initial population members had a complexity of one cell, and step 2 was included — merger was altered so as not to allow agents of size greater than 40 cells. Every ten generations, a validation step was performed: accuracy was computed on an independent verification set of 300 images. This independent verification had no feedback to the system — instead, it was used to measure the progress of the system in a manner resistant to the effects of overfitting. A run in either trial was executed for 1,000 generations.
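A schematic of the four-stage preparatory process might look as follows. This is a minimal sketch: merge, crossover and mutate are stand-ins for the operators of Secs. 3.5–3.8, and the parameter names are ours.

    import random

    def roulette_select(agents, fitnesses):
        # Fitness-proportional (roulette-wheel) selection.
        return random.choices(agents, weights=fitnesses, k=1)[0]

    def prepare_next_generation(agents, fitnesses, merge, crossover, mutate,
                                elite_frac=0.10, p_merge=0.01,
                                p_cross=0.50, p_mut=0.01):
        ranked = sorted(zip(agents, fitnesses), key=lambda af: af[1], reverse=True)
        agents = [a for a, _ in ranked]
        fitnesses = [f for _, f in ranked]
        n_elite = int(elite_frac * len(agents))
        next_gen = agents[:n_elite]                      # (1) elitism: copied unchanged
        for agent in agents[n_elite:]:
            if random.random() < p_merge:                # (2) merger
                partner = roulette_select(agents, fitnesses)
                next_gen.append(merge(agent, partner))
            elif random.random() < p_cross:              # (3) crossover
                partner = roulette_select(agents, fitnesses)
                next_gen.append(crossover(agent, partner))
            else:
                next_gen.append(agent)
        # (4) mutation: every agent except the elites
        next_gen[n_elite:] = [mutate(a, p_mut) for a in next_gen[n_elite:]]
        return next_gen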
Standard Genetic Algorithm (SGA) and Merger-Enabled (ME) trial results: each trial was repeated 15 times, with little variance in results. Results of typical executions are shown below.

[Figure: Typical run of the ME trial — maximum and mean accuracy on the training and validation sets, versus generation (0–400).]
[Figure: Typical run of the SGA trial — maximum and mean accuracy on the training and validation sets, versus generation (0–600).]
The maximum accuracy found on the validation set for the typical ME trial was 89.2%. For the SGA trial, the maximum validation accuracy found was 72.9%.
4.2. Auto-complexity trial (Latin)

In the second experiment (the Latin Auto-Complexity or LAC trial), the system was configured to again find a binary classifier, in a manner similar to the ME trial described above. In the LAC trial, however, no maximum complexity was enforced: agents were initialized with a complexity of one cell, and could grow without bound. For each run of the LAC trial, the CellNet system was executed for 1,000 generations, outputting data regarding its (raw) accuracy on the independent verification set, and data regarding the complexity of the agents produced. The initial prey population was initialized at 500 images, set to replace 3% from a total pool of 1,000 every five generations. The LAC trial was run with a rate of elitism of 0.1, a rate of merger of 0.02, a rate of (variable-length) crossover of 0.4, a rate of mutation of 0.01, and a complexity penalty (alpha) of 0.0005. The fitness function used was

$$\mathrm{fitness}(A) = \left( \frac{1}{|T|} \sum_{i \in T} \begin{cases} 1.0 & \text{agent } A \text{ correctly identifies image } i \\ 0.2 & \text{agent } A \text{ replies ``uncertain'' on image } i \\ 0.0 & \text{agent } A \text{ misidentifies image } i \end{cases} \right)^{2} - \alpha \cdot \mathrm{complexity}(A)$$

where T is the (nonstatic) set of training images.
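In code, this penalized fitness might be computed as in the sketch below, where agent.classify (returning 1, 0 or "uncertain") and agent.complexity (the cell count) are hypothetical interfaces of our own.

    def lac_fitness(agent, training_set, alpha=0.0005):
        """Squared mean per-image score, minus the complexity penalty."""
        total = 0.0
        for image, label in training_set:
            verdict = agent.classify(image)   # 1, 0, or "uncertain"
            if verdict == "uncertain":
                total += 0.2
            elif verdict == label:
                total += 1.0                   # correct identification
            # a misidentification contributes 0.0
        mean_score = total / len(training_set)
        return mean_score ** 2 - alpha * agent.complexity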
Latin Auto-Complexity (LAC) trial results (best run of 20):

[Figure: Best run of the LAC trial — maximum and mean accuracy on the training and verification sets, versus generation (0–500).]
[Figure: Best run of the LAC trial — maximum, mean, minimum, and most-fit-agent complexity, versus generation (0–500).]
The maximum validation accuracy found in the best LAC trial was 97.7%, found around generation 150. Following generation 250, maximum accuracy dropped slightly, settling around 94.8%, despite continued smooth increases in training accuracy. There was initially a large variance in the complexity measure, which quickly stabilized. The majority of the most accurate agents displayed little variation from the mean. The mean maximum validation accuracy found between runs of the LAC trial was 0.954 (with a standard deviation of 0.0147). Complexity in all runs (for both the most accurate agent and the mean) settled at slightly higher than 30 cells, with very little variance between runs (approximately ±5 cells). Additional informal LAC trials were run involving Latin characters other than zero, with similar results.

4.3. Auto-complexity trial (Indian)

In the final experiment (the Indian Auto-Complexity or IAC trial), the system was configured identically to the LAC trial above, save that the prey population was restructured so that the task was to distinguish between the digit "4" and anything else, using Indian handwritten characters from the CENPARMI database. The initial prey population was initialized at 400 images, set to replace 3% from a total pool of 600 every five generations. Shown below are the results of the run (of 20) which found the most accurate classifier.
[Figure: Best run of the IAC trial — maximum and mean accuracy on the training and verification sets, versus generation (0–480).]
[Figure: Best run of the IAC trial — maximum, mean, minimum, and most-fit-agent complexity, versus generation (0–480).]
The maximum validation accuracy found in the IAC trial was 97.4%, found around generation 380. Following generation 450, the accuracies followed a stable course, indicating convergence in the genetic algorithm. The IAC trial was run 20 times, each time with similar results. The mean maximum validation accuracy found between runs was 0.956 (with a standard deviation of 0.019), with typical mean complexities similar to those found in the best run above. A hunter with validation accuracy within 0.005 of the maximum found was generated in each trial prior to generation 500 — convergence appears to have occurred in all cases prior to generation 1,000.

5. Analysis

In the ME versus SGA trials, a clear advantage is shown for the ME system. The SGA system shows a slow, gradual progression toward its optimal training accuracy, and a poor progression toward a good validation accuracy — the latter is not surprising, as the SGA does not contain any measures to prevent overfitting to the training set, and the chosen features bear a large potential for precisely that. More interesting is the ME system's resilience to overfitting. We believe this is a result of the structural construction of the agents, as early simple agents do not have the complexity necessary for "data memorization".

The LAC and IAC trials showed a good progression toward a powerful pattern recognizer, without sacrificing much efficiency relative to the ME trial. Unlimited potential agent complexity proved a computational burden in the initial generations, but the cost quickly decreased to a stable and relatively efficient maximum. The effects of
overfitting can be seen here as well, but to a much lesser extent than in the SGA trial: a difference between training and verification accuracy of approximately 2% seems to be the norm, with overfitting nonexistent or negative in some runs. Variance existed between the runs within the LAC and IAC trials; there is nearly a 2% difference in validation accuracy between the typical and the best run. This shows that several runs of the CellNet system are needed to guarantee a high-accuracy classifier. It also shows that the CellNet system is capable of discovering several independent classifiers, based solely on the seed chosen for random parameter generation. This opens up the CellNet system for placement within a larger classifier-combining framework (e.g. bagging) utilizing agents found in independent runs of the system.

The slow-down in convergence in the LAC and IAC trials relative to the ME trials is to be expected: the complexity of the ME trial was specifically chosen to be appropriate to the task at hand, while the agents in the auto-complexity trials had no such luxury. The auto-complexity trials were initially more computationally expensive in terms of the time required for a single generation, but soon stabilized to a more reasonable range of agent complexities. The presence of large (100+ cells) and small (three-cell) agents throughout the trials is most likely the result of a rather forgiving preparatory process between generations.

6. Conclusions

We believe the trials demonstrate the success of the merger operator, both in speeding up the evolutionary process and in selecting a complexity appropriate to the task of handwritten character recognition. The final LAC and IAC trials are largely autonomous processes resulting in the discovery of binary classifiers whose raw accuracy compares with that of other current approaches (for example, Refs. 2, 4 and 9). Additionally, we believe that the merger operator has demonstrated its ability to aid in the elimination of overfitting. It is our belief that the current CellNet model of cooperative coevolution is an appropriate choice for approaching the task of designing an APR, and that, given appropriate feature functions, the CellNet system could be extended to virtually any image classification task.

7. Future Directions

The idea of a truly Autonomous Pattern Recognizer is enticing; a black box which could learn to recognize an arbitrary visual pattern in the best possible manner is an ambitious goal, and one not yet achieved by the CellNet system. However, work is currently under way along several axes by which the CellNet system presented above may be improved.

One very powerful aspect of evolutionary computing, little exploited thus far, is the ability of a "blind designer" to discover nonintuitive solutions which a human designer might overlook. The first axis along which CellNet is to be improved is the inclusion of a mechanism for developing original features concurrently with
their organization as cells in agents. The current mandate of the CellNet Research Group is to investigate the possibility of replacing the finite feature list with an evolvable framework by which the system may design and utilize custom-tailored neural networks. The inclusion of a mechanism for the design of features once again borders on the computationally infeasible; the sheer number of possible features makes a reduction in the space of considered strategies necessary. Indeed, there is ample evidence that the human brain uses precisely such a tactic — the psychological field of Pre-Attentive Vision focuses on this topic explicitly. Rather than allow for processing on the entirety of a target image, the CellNet Research Group is now focusing on preprocessing images on the basis of algorithms extracted from the field of Pre-Attentive Vision, and on including a capacity for the design of features on this more limited resulting space.

Finally, a missing component in the creation of a truly autonomous system is the capacity of that system to select global parameters at run-time, as well as to handle global strategies for the prevention of overfitting. Rather than specify parameters such as elitism or crossover rate explicitly, the CellNet Research Group is involved in a redesign of the CellNet environment which will facilitate their inclusion intrinsically. This includes a divergence from the typical paradigm of Genetic Programming in favor of an immersive environment inspired by other experiments in simulated ecology. It is hoped that the natural competitive coevolution between camouflaged images and recognizing agents will provide a self-optimizing world in which global parameters are no longer necessary.

References

1. Y. Al-Ohali, M. Cheriet and C. Suen, Databases for recognition of handwritten Arabic cheques, Patt. Recogn. (2002).
2. A. Amin and P. Compton, Hand printed character recognition using machine learning, in Progress in Handwriting Recognition, eds. A. Downton and S. Impedovo (World Scientific, 1998).
3. F. Brill, D. Brown and W. Martin, Fast genetic selection of features for neural network classifiers, IEEE Trans. Neural Networks 3(2) (1992) 324–328.
4. K. Cheung, D. Yeung and T. Chin, Recognition of handwritten digits using deformable models, in Progress in Handwriting Recognition, eds. A. Downton and S. Impedovo (World Scientific, 1998), pp. 145–150.
5. M. Dash and H. Liu, Feature selection for classification, Intell. Data Anal. 1(3) (1997).
6. G. Fung, J. Liu and R. Lau, Feature selection in automatic signature verification based on genetic algorithms, in Proc. ICONIP'96, 1996, pp. 811–815.
7. Z. Guo and R. Uhrig, Using genetic algorithms to select inputs for neural networks, in Proc. COGANN'92, 1992, pp. 223–234.
8. W. D. Hillis, Co-evolving parasites improve simulated evolution as an optimization procedure, in Artificial Life II, SFI Studies in the Sciences of Complexity, Vol. X, eds. C. G. Langton, C. Taylor, J. D. Farmer and S. Rasmussen (Addison-Wesley, 2000), pp. 313–322.
9. N. Kharma and R. Ward, A novel invariant mapping applied to handwritten Arabic character recognition, Patt. Recogn. 34 (2001) 2115–2120.
10. M. Kudo and J. Sklansky, A comparative evaluation of medium- and large-scale feature selectors for pattern recognition, Kybernetika 34(4) (1998) 429–434.
11. M. Martin-Bautista and M.-A. Vila, A survey of genetic feature selection in mining issues, in Proc. 1999 Congress on Evolutionary Computation, 1999, pp. 1314–1321.
12. A. Moser, A distributed vertical genetic algorithm for feature selection, in Proc. ICDAR'99, 1999, pp. 1–9 (late submissions' supplement).
13. D. Shi, W. Shu and H. Liu, Feature selection for handwritten Chinese character recognition based on genetic algorithms, IEEE Int. Conf. Syst. Man Cybern. 5 (1998) 4201–4206.
14. W. Siedlecki and J. Sklansky, On automatic feature selection, Int. J. Patt. Recogn. Artif. Intell. 2 (1988) 197–220.
15. F. W. M. Stentiford, Automatic feature design for optical character recognition using an evolutionary search procedure, IEEE Trans. Patt. Anal. Mach. Intell. PAMI-7(3) (1985) 349–355.
16. S. P. Brumby, J. Theiler, S. Perkins, N. Harvey, J. J. Szymanski, J. J. Bloch and M. Mitchell, Investigation of image feature extraction by a genetic algorithm, in Proc. SPIE, Vol. 3812, 1999, pp. 24–31.
17. H. Vafaie and K. De Jong, Robust feature selection algorithms, in Proc. IEEE Int. Conf. Tools with Artificial Intelligence, 1993, pp. 356–363.
18. M. Vose, The Simple Genetic Algorithm (MIT Press, 1999).
19. D. Yeung, Y. Cheng, H. Fong and F. Chung, Neocognitron based handwriting recognition system performance tuning using genetic algorithms, IEEE Int. Conf. Syst. Man Cybern. 5 (1998) 4228–4233.
Nawwaf Kharma is an Assistant Professor of computer engineering at Concordia University in Montreal. He had previously worked at the University of Paisley in Scotland and the University of British Columbia in Vancouver, B.C. He teaches both genetic algorithms and machine learning at the graduate level at Concordia. Dr. Kharma received his Ph.D. in artificial intelligence from Imperial College, University of London, England. His research interests are in evolutionary computation, machine learning and pattern recognition.
Taras Kowaliw received his B.Sc. in pure mathematics from the University of Toronto, and his M.Sc. in computer science from Concordia University in Montreal, Canada. He is currently pursuing his Ph.D. at the latter institution, concentrating on evolutionary computation and development. He also works in the design of software for people with disabilities at the Adaptive Technology Resource Centre at the University of Toronto, and teaches in the Department of Digital Art at Concordia.
Etienne Clement received his bachelor's degree in software engineering with honors from Concordia University. He worked as a software developer at Discreet with Motorola, and as a research assistant at Concordia University. Moreover, he has acquired expertise in the audio/video and pattern recognition domains, having worked on related projects for more than two years. He is currently working as a software engineer at Discreet.

Albert Youssef holds a B.S. in computer engineering from the University of Balamand, Lebanon (2001), and a Master's degree in electrical and computer engineering from Concordia University, Montreal, Canada (2003).
Christopher Jensen received a B.Eng. degree in software engineering from Concordia University, Montreal, in 2003. He is presently working at Matrox Electronic Systems on MIL, the Matrox Imaging Library, pursuing his interests in imaging, system programming and general development on the Linux platform.
Jie Yao obtained her bachelor's degree from the Department of Automation at Tsinghua University, and a master's degree from the Department of Electrical Engineering at McGill University. She is now a Ph.D. student in the Department of Electrical and Computer Engineering at Concordia University, Montreal, Canada.