N-Version Genetic Programming via Fault Masking

Kosuke Imamura, Robert B. Heckendorn, Terence Soule, and James A. Foster

Initiative for Bioinformatics and Evolutionary STudies (IBEST), Dept. of Computer Science, University of Idaho, Moscow, ID 83844-1010
{kosuke,heckendo,tsoule,foster}@cs.uidaho.edu
http://www.cs.uidaho.edu/ibest
Abstract. We introduce a new method, N-Version Genetic Programming (NVGP), for building fault tolerant software by assembling automatically generated modules into an ensemble in such a way as to maximize their collective fault masking ability. The ensemble itself is an example of n-version modular redundancy for fault tolerance, where the output of the ensemble is the most frequent output of the n independent modules. By maximizing collective fault masking, NVGP approaches the fault tolerance expected from n-version modular redundancy with independent faults in component modules. The ensemble comprises individual modules drawn from a large pool generated with genetic programming, using operators that increase the diversity of the population. Our experimental test problem classified promoter regions in Escherichia coli DNA sequences. For this problem, NVGP reduced the number and variance of errors relative to single modules produced by GP, with statistical significance.
1 Introduction

We introduce a new technique for building fault tolerant software from an ensemble of automatically generated modules that significantly reduces errors when applied to a classification problem. We use genetic programming to provide a large pool of candidate modules with sufficient diversity to allow us to select an ensemble whose faults are nearly uncorrelated. We combine the ensemble into a single system whose output is the most common output of its constituent modules. The error rate of this N-Version Genetic Programming (NVGP) system is directly related to the extent to which the constituent modules mask different faults. Collections of modules for which the system error rate is low either make mistakes on different inputs, so that in the overall system these mistakes are in the minority and are suppressed, or they make very few mistakes, so that little error suppression is necessary. The isolated island model evolves individuals in distinct demes and promotes speciation. Such speciated individuals may produce faulty outputs on different instances. We compare the observed error rate of ensembles built from random samples of the best evolved modules to the theoretically optimal expected value, and retain the best ensembles.

The expected failure rate for n independent components, each of which fails with probability p, where the composite system requires m component faults to fail (initially derived for n-modular redundant hardware systems [1]), is

f = \sum_{k=m}^{n} \binom{n}{k} (1-p)^{n-k} p^{k}.

For an N-version classifier system, such as ours,
the individual fault rate p is the ratio of misclassified examples to the total number of training instances. In this case, f is the error rate of an ideal ensemble. The error rate of an ensemble is close to the theoretically optimal rate f precisely when component failures are not correlated. This is our criterion for selecting the best ensemble. We experimentally validated this system with a classification problem taken from bioinformatics: recognizing E. coli promoters, which are DNA sequences that initiate or enhance the activation of genes. Our experiment shows a statistically significant reduction in the number and variance of errors for our system when compared to single modules produced by genetic programming.
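To make the expression concrete, here is a minimal Python sketch (the helper names are ours, not from the paper) that computes the ideal ensemble error rate f and applies the most-frequent-output combination rule described above.

```python
from collections import Counter
from math import comb

def ideal_ensemble_error(n: int, m: int, p: float) -> float:
    """f = sum_{k=m..n} C(n,k) * (1-p)^(n-k) * p^k, assuming independent module faults."""
    return sum(comb(n, k) * (1 - p) ** (n - k) * p ** k for k in range(m, n + 1))

def majority_output(outputs):
    """Ensemble output: the most frequent output among the n module outputs."""
    return Counter(outputs).most_common(1)[0][0]

# A 5-version majority-vote ensemble fails when 3 or more modules fail;
# with modules that each misclassify 10% of instances, f is about 0.86%.
print(ideal_ensemble_error(n=5, m=3, p=0.10))              # ~0.00856
print(majority_output([True, False, True, True, False]))   # True
```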
1.1 Fault tolerant software
Our approach is based on N-version programming (NVP). NVP was an early approach to building fault tolerant software that adapted proven hardware approaches to fault tolerance [2]. When applied to software, the objective was to avoid catastrophic failure caused by flawed software design by combining N≥2 functionally equivalent modules (we use the words versions, modules, and components interchangeably) that were developed by independent teams, with different design methodologies, from the same initial specifications [3]. A fundamental assumption of the NVP approach was that independent programming efforts would reduce the probability that similar errors would occur in two or more versions [3]. But this assumption has been questioned. Knight and Leveson applied a probabilistic metric to the assumed independence of modules in NVP [4], and rejected the hypothesis that independently developed programs fail independently. However, this conclusion does not invalidate NVP in general. Hatton determined that multiple versions developed for NVP are sometimes more reliable and cost effective than a single good version [5], even with non-independent faults. His 3-version system increased the reliability of the composite NVP system by a factor of 45. This is far less than the theoretical improvement of a factor of 833.6, but it is still a significant improvement in system reliability.
1.2 Test Problem: E. coli promoter recognition
We tested our approach on an E. coli DNA promoter region classification problem. A promoter is a DNA sequence that regulates when and where an associated gene will be expressed. The task is to decide whether or not a given DNA sequence is an E. coli promoter. Our objective was to quantify the effectiveness of a fault tolerant system built with our ensemble construction method, not to produce a competitive promoter detection tool. This problem has also been addressed with artificial neural networks [6][7][8] and genetic programming [9].
2 Previous work

Different ensemble construction methods have been studied in an effort to enhance accuracy. This section reviews averaging, median selection, boosting, bagging, and evolutionary methods. All of these methods exploit the heterogeneity of the ensemble components.

2.1 Averaging and Median selection
A simple averaging method gathers outputs from all component modules and computes their arithmetic average. Imamura and Foster showed that simple averaging reduces error margins in path prediction [10] and in function approximation with evolved digital circuits [11]. Another approach is weighted averaging, in which component modules are assigned optimal weights for computing a weighted average of the module outputs; linearly optimal combinations of artificial neural networks take this approach [12][13]. In median selection, the median of the component outputs is the ensemble output. Soule approximated the sine function by taking the median of the outputs of individuals that were each evolved on a subset of the training set to encourage specialization [14].
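As an illustration of these combination rules, the following minimal Python sketch computes the simple average, a weighted average, and the median of a set of hypothetical module outputs; the names and example values are ours, not taken from the cited work.

```python
import numpy as np

def simple_average(outputs):
    """Arithmetic mean of the component module outputs."""
    return np.mean(outputs, axis=0)

def weighted_average(outputs, weights):
    """Weighted mean; in practice the weights would come from an optimal linear combination."""
    w = np.asarray(weights, dtype=float)
    return np.tensordot(w / w.sum(), np.asarray(outputs, dtype=float), axes=1)

def median_output(outputs):
    """Median selection: the ensemble output is the median component output."""
    return np.median(outputs, axis=0)

# Three hypothetical modules approximating the same target value:
outs = [0.9, 1.1, 1.4]
print(simple_average(outs))                     # 1.1333...
print(weighted_average(outs, [0.5, 0.3, 0.2]))  # 1.06
print(median_output(outs))                      # 1.1
```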
2.2 Boosting and Bagging
Boosting and bagging are methods that perturb the training data by resampling in order to induce classifier diversity. The AdaBoost algorithm trains a weak learner by iterating training while increasing the weights of misclassified samples and decreasing the weights of correctly classified ones [15]. The classifiers trained in successive rounds are weighted according to their performance and cast a weighted majority vote. Bagging (bootstrap aggregating), on the other hand, replicates training subsets by sampling with replacement [16]. It then trains classifiers separately on these subsets and builds an ensemble by aggregating the individual classifiers. In evolutionary computation, Iba applied boosting and bagging to genetic programming and experimentally validated their effectiveness and their potential for controlling bloat [17]. Land used a boosting technique to improve the performance of Evolutionary Programming derived neural network architectures in a breast cancer diagnostic application [18]. However, both techniques have limitations: boosting is susceptible to noise, bagging is not any better than a simple ensemble in some cases, neither boosting nor bagging is appropriate for data-poor cases, and bootstrap methods can have a large bias [15][19][20][21][22][23]. Langdon used genetic programming to combine classifiers into ensembles [24].
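For orientation, a bare-bones sketch of bagging in Python is shown below: each module is trained on a bootstrap resample of the training set and the ensemble classifies by majority vote. The train_one routine is a hypothetical placeholder for whatever base learner is used; nothing here is specific to the systems cited above.

```python
import random
from collections import Counter

def bagging_ensemble(train_set, train_one, n_modules=10, seed=0):
    """Train n_modules classifiers on bootstrap resamples and combine them by majority vote.
    train_one(samples) is assumed to return a callable predictor."""
    rng = random.Random(seed)
    modules = []
    for _ in range(n_modules):
        resample = [rng.choice(train_set) for _ in train_set]  # sample with replacement
        modules.append(train_one(resample))

    def predict(x):
        votes = Counter(module(x) for module in modules)
        return votes.most_common(1)[0][0]

    return predict
```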
2.3 Classification of Ensemble Methods
Table 1 categorizes current ensemble methods in genetic programming in terms of their sampling technique in combination with the evolutionary approach. In cooperative methods [14][25], speciation pressure (such as that caused by crowding penalties [25]) plays a vital role in evolving heterogeneous individuals, while in isolation methods there is no interaction between individuals during evolution. Resampling methods create different classifiers by using different training sets (bagging) or by varying the weights of training instances (boosting). Non-resampling methods create different classifiers from the same training set, with or without explicit speciation pressure. NVGP is a non-resampling technique based on isolated evolution of diverse individuals.

Table 1. Classification of ensemble creation methods.
                                 Evolutionary Approach
Training set selection     Non-Isolation      Isolation
Resampling                 Boosting           Bagging
Non-resampling             Crowding           NVGP
3 Experimental Data

We compared the performance distributions of a group of single best versions and a group of NVGP ensembles. Evaluation and comparison of only one or a small number of evolved individuals or ensembles would have been susceptible to stochastic errors in performance estimation. We assume that the number of errors has an approximately normal distribution, since each test instance can be viewed as a Bernoulli trial [23].
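For illustration, a comparison of this kind might look like the sketch below, which applies a two-sample t-test to hypothetical error counts under the normality assumption; the choice of test and the numbers are our own assumptions, not results from the paper.

```python
import numpy as np
from scipy import stats

# Hypothetical holdout error counts for ten single best-of-run modules
# and ten NVGP ensembles (illustrative values only).
single_errors   = np.array([9, 11, 10, 12, 8, 10, 11, 9, 13, 10])
ensemble_errors = np.array([6, 7, 5, 8, 6, 7, 6, 5, 7, 6])

# Under the normal approximation of Bernoulli error counts, a two-sample
# (Welch) t-test is one way to compare the two performance distributions.
t, p = stats.ttest_ind(single_errors, ensemble_errors, equal_var=False)
print(f"t = {t:.2f}, p = {p:.4f}")
```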
3.1 Problem
Our problem is to classify whether a given DNA sequence is an E. coli promoter. The data set is taken from the UCI ML repository [26]. It contains 53 E. coli DNA promoter sequences and 53 non-promoter sequences, all of length 68.
3.2 Computing Environment
In order to generate sufficiently large statistical samples for the experiments, we used the cluster supercomputing facilities of the Initiative for Bioinformatics and Evolutionary STudies (IBEST). This machine uses commodity computing parts to provide substantial computing power for considerably less money than traditional supercomputers1. Cluster-based computers built this way are referred to as Beowulf computers.
3.3 Input and output
We used 2-gram encoding for input [27]. The 2-gram encoding counts the occurrences of two consecutive input characters (nucleotides) in a sliding window. There are four characters in our sequences (“a”, “c”, “g”, and “t”). The classifier clusters the positive instances and places the negative instances outside the cluster. The cluster is defined by the interval [µ-3*δ, µ+3*δ], where µ is the mean of the classifier output values for the positive instances and δ is the standard deviation. If an output value from a given sequence falls within this interval then it is in the cluster and so is classified as a promoter. Otherwise, it is classified as a non-promoter.
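A minimal Python sketch of the 2-gram encoding and the interval-based classification rule described above; the function names are ours, and the evolved program is stood in for by any callable that maps the 16-dimensional count vector to a single output value.

```python
from itertools import product
import numpy as np

NUCLEOTIDES = "acgt"
# The sixteen possible 2-grams, in a fixed order; each count feeds one input register.
TWO_GRAMS = ["".join(pair) for pair in product(NUCLEOTIDES, repeat=2)]

def two_gram_encode(seq: str) -> np.ndarray:
    """Count occurrences of each dinucleotide in a sliding window of width 2.
    Assumes lowercase a/c/g/t sequences."""
    counts = dict.fromkeys(TWO_GRAMS, 0)
    for i in range(len(seq) - 1):
        counts[seq[i:i + 2]] += 1
    return np.array([counts[g] for g in TWO_GRAMS], dtype=float)

def fit_cluster_interval(positive_outputs: np.ndarray) -> tuple[float, float]:
    """Cluster interval [mu - 3*sd, mu + 3*sd] from classifier outputs on positive instances."""
    mu, sd = positive_outputs.mean(), positive_outputs.std()
    return mu - 3 * sd, mu + 3 * sd

def classify(output_value: float, interval: tuple[float, float]) -> bool:
    """True (promoter) if the module's output falls inside the cluster interval."""
    low, high = interval
    return low <= output_value <= high
```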
1 The total cost of the machine is about US$44,000. Micron Technology generously donated all of the memory for the machine.
3.4 Classifier
Target Machine Architecture. Our classifier is a linear genome machine [28], which mimics the MIPS architecture [29]. There are two instruction formats in this architecture: (Opcode r1,r2,r3) and (Opcode r1,r2,data). We used the instruction set in Figure 1. The length of an individual program is restricted to a maximum of 80 instructions. Each evolving individual (a potential component for our NVGP ensemble system) used sixteen read-only registers for the input data, which contained the counts for individual nucleotide 2-grams as described above, and four read/write working registers.
Arithmetic operations
Inst.   Action
ADDI    reg[r1] = reg[r2] + data
ADDR    reg[r1] = reg[r2] + reg[r3]
MUL     reg[r1] = reg[r2] * reg[r3]
DIV     reg[r1] = reg[r2] / reg[r3]
MULI    reg[r1] = reg[r2] * data
DIVI    reg[r1] = reg[r2] / data
SIN     reg[r1] = sin( reg[r2] )
COS     reg[r1] = cos( reg[r2] )
LOG     reg[r1] = log( reg[r2] )
EXP     reg[r1] = exp( reg[r2] )

Data and control operations
Inst.   Action
NOP     None
MOVE    reg[r1] = reg[r2]
LOAD    reg[r1] = data
CJMP    if( reg[r1]
CJMPI
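For orientation, the sketch below interprets a linear genome over this instruction set in Python. It is our own minimal reconstruction under stated assumptions: protected division and guarded log/exp are guesses about how undefined operations are handled, and the conditional jumps CJMP/CJMPI are omitted because their semantics are truncated in this excerpt.

```python
import math

N_INPUT, N_WORK = 16, 4   # 16 read-only input registers + 4 read/write working registers

def safe_div(a, b):
    return a / b if b != 0 else 1.0          # assumed protected division

def run(program, inputs):
    """Execute a list of (opcode, r1, r2, arg) instructions, where arg is r3 or data."""
    reg = list(inputs) + [0.0] * N_WORK

    def write(r, value):
        if r >= N_INPUT:                      # input registers are read-only
            reg[r] = value

    for op, r1, r2, arg in program:
        if   op == "ADDI": write(r1, reg[r2] + arg)
        elif op == "ADDR": write(r1, reg[r2] + reg[arg])
        elif op == "MUL":  write(r1, reg[r2] * reg[arg])
        elif op == "DIV":  write(r1, safe_div(reg[r2], reg[arg]))
        elif op == "MULI": write(r1, reg[r2] * arg)
        elif op == "DIVI": write(r1, safe_div(reg[r2], arg))
        elif op == "SIN":  write(r1, math.sin(reg[r2]))
        elif op == "COS":  write(r1, math.cos(reg[r2]))
        elif op == "LOG":  write(r1, math.log(abs(reg[r2])) if reg[r2] != 0 else 0.0)
        elif op == "EXP":  write(r1, math.exp(min(reg[r2], 50.0)))
        elif op == "MOVE": write(r1, reg[r2])
        elif op == "LOAD": write(r1, arg)
        elif op == "NOP":  pass
    return reg[N_INPUT]                       # e.g. the first working register as the output
```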