THE COST MINIMIZING INVERSE CLASSIFICATION PROBLEM: A GENETIC ALGORITHM APPROACH
Michael V. Mannino Graduate School of Business Administration, Campus Box 165, P.O. Box 173364 University of Colorado at Denver, Denver, CO 80217-3364
[email protected]
Murlidhar V. Koushik Microsoft Corporation 11/2134, One Microsoft Way Redmond, WA 98052-6399
[email protected]
April 10, 2000
Abstract

We consider the inverse problem in classification systems described as follows. Given a set of prototype cases representing a set of categories, a similarity function, and a new case classified in some category, find the cost minimizing changes to the attribute values such that the case is reclassified as a member of a (different) preferred category. The problem is “inverse” because the usual mapping is from a case to its unknown category. The increased application of classification systems in business suggests that this inverse problem can be of significant benefit to decision-makers as a form of sensitivity analysis. Analytic approaches to this inverse problem are difficult to formulate as the constraints are either not available or difficult to determine. To investigate this inverse problem, we develop several genetic algorithms and study their performance as problem difficulty increases. We develop a real genetic algorithm with feasibility control, a traditional binary genetic algorithm, and a steepest ascent hill-climbing algorithm. In a series of simulation experiments, we compare the performance of these algorithms to the optimal solution as the problem difficulty increases (more attributes and classes). In addition, we analyze certain algorithm effects (level of feasibility control, operator design, and fitness function) to determine the best approach. Our results indicate the viability of the real genetic algorithm and the importance of feasibility control as the problem difficulty increases.
Keywords: inverse classification, genetic algorithms, classification systems, sensitivity analysis
1. Introduction

Classification systems have become important tools used in the support of organizational decision making. When presented with a new case, a classification system returns the category that best describes the case. Examples of classification in business decision making include fault diagnosis in semiconductor manufacturing [ICFQ93], bank failure prediction [TK90], and industry and occupation code prediction [CMSW92]. The primary goal of a classification system is to perform at the same level as human experts. Classification systems can provide many benefits to an organization such as reducing decision making time, improving the consistency of decisions, and reducing dependence on scarce human experts. Notably absent in most classification systems is the ability to systematically conduct sensitivity analysis. Sensitivity analysis, which has been widely studied in many areas of mathematical programming, provides insight to decision makers by focusing on the impact of changes to one or more input variables on the values of the decision variables and objective function. In this paper, we study a sensitivity problem that we refer to as the cost minimizing inverse classification problem. This problem deals with the question: What is the minimum required change to a case in order to reclassify it as a member of a (different) preferred class? This is an inverse problem because the normal mapping in a classification system is from a case to its predicted class. An efficient solution to this inverse classification problem can be of significant value to decision makers in situations where attribute values can be manipulated. For example, municipal bond ratings by the two major rating agencies, Moody's Investors Service, Inc. and Standard & Poor's Corporation, are well known determinants of the credit worthiness (and hence the
borrowing costs and financial integrity) of a community’s debt obligations. A number of factors belonging to four categories - economic base, financial analysis, administrative structure and debt - are used by these agencies in arriving at the bond rating. Several factors, such as population size and geographic location, can be difficult to manipulate. However, other factors such as housing variables (for example, the percentage of owner-occupied dwellings and percentage of single-family housing units, which are both significant in their influence on the rating [CF84]) are far more directly controllable. The cost-minimizing solution to the problem of moving to an improved rating would clearly be of interest to the municipality’s administrators. As a first attempt to study the cost minimizing inverse classification problem, we study a heuristic approach using genetic algorithms for similarity-based classification systems. The most distinctive feature of similarity-based classification systems is that concept boundaries are not explicitly represented. The classifier, when presented with a new case, returns the predicted class as well as a goodness of fit measure such as the distance from other similar cases. Because concept boundaries are not explicitly represented, it is difficult to develop an analytic model of the cost minimizing inverse problem. A genetic approach is interesting here because it can be efficient, incorporate problem specific knowledge, yet operate without explicit knowledge of concept boundaries. Similarity-based classification systems are widely studied and used [Dasr90, SW86]. The simplest approach, known as the k nearest neighbors approach, does not require analysis of training data. However, because it may require retrieval of a large number of cases, other approaches have been devised that reduce the retrieval effort but require analysis of training data [HT96, ZO96, ABR98].
We focus on similarity-based techniques that reduce the search effort by saving only the prototype or most representative instances of each class [AKA91]. Similarity-based classification has been applied to industrial problems [Alle94] including industry and occupation classification [CMSW92], clinical audiology [PBH90], customer technical support [Simo92], and message classification [Good90]. In the bond rating example discussed previously, similarity-based classification would be a good approach because the municipality does not know the explicit concept boundaries as they are a closely guarded secret of the credit rating agency. To study this problem, we develop several genetic algorithms and study their performance. To facilitate the comparison of the genetic algorithms with the optimal solution, we restrict our study to classification systems with real attributes. We present a real genetic algorithm with feasibility control, a traditional binary genetic algorithm, and a steepest ascent hill climbing algorithm. The real genetic algorithm uses feasibility preserving operators, a feasibility control parameter, and fitness functions for evaluation of infeasible solutions. In a series of simulation experiments, we compare the performance of these algorithms to the optimal solution as the problem difficulty increases (more attributes and classes). In addition, we analyze certain algorithm effects (level of feasibility control, operator design, and fitness function design) to determine the best approach. Our results provide insight into the capabilities of a genetic algorithm when precise constraint boundaries are not known, the importance of feasibility control, and the effect of other design issues. The rest of this paper is organized as follows. Section 2 reviews research on nearest point problems and sensitivity analysis for classification systems. Section 3 defines the cost minimizing inverse classification problem and lists some simplifying assumptions. Section 4
describes the genetic algorithms used in our study. Section 5 reports on experiments comparing the genetic algorithms. Section 6 summarizes the paper and discusses future research directions.
2. Related Work

The work most closely related to the model presented in Section 3 is the nearest point problem. In the nearest point problem, the goal is to find the point in a convex polyhedral space that is nearest, by Euclidean distance, to a given point. There are many applications of nearest point problems in robotics, geophysical data analysis, and optimization algorithms [Alsu90]. The nearest point problem can be solved by algorithms that utilize its underlying linear complementarity structure [Murt88], interior point methods [Alsu94], and exterior point methods [AM92]. However, none of the work on nearest point problems addresses imprecisely known constraints. Clustering is another related area. The objective in clustering problems is to assign objects to groups so that the members of each group are as similar as possible. Clustering can be hard, in which each object is assigned to exactly one cluster, or fuzzy, in which objects are assigned with a degree of association. Clustering is similar to the inverse classification problem as the constraint boundaries are not known and heuristic solutions such as Tabu search [Glov89, Alsu95] and simulated annealing [AS93, LA87, SA91] have been used. Clustering is dissimilar in that a nearest point is not sought. Several related studies deal with sensitivity analysis of classification. The sensitivity of efficiency classifications in the additive model of data envelopment analysis is studied in [CHJS92]. The authors formulate a model in which an organization’s classification as efficient or inefficient remains unchanged in a region of stability. Their model requires two classifications
(efficient or inefficient) and the explicit representation of resource production and usage in the data envelopment analysis model. Sensitivity analysis in neural net models is studied in [MB98]. The authors develop a technique to show the relative importance of input variables and elements in the hidden layer of a neural network for product entry decisions. Although their technique does not require explicit modeling of concept boundaries, they do not compute a cost minimizing point as described in this paper. Construction of prototype instances in a neural network framework is presented in [HP92]. This work describes a backward mode of classification in which a neural network for classification and a training set are known but good representatives (or prototypes) of a class are unknown. However, the authors do not address the problem of finding cost minimizing points. There has been active research on using genetic algorithms for constrained optimization problems [Mich95, MS96]. These efforts use techniques such as penalty functions, specialized operators, co-evolution, and repair algorithms. In addition, hybrid methods combining evolution with traditional optimization techniques have been used [KM97]. Some of these techniques (penalty functions and specialized operators) are adapted in this work. None of these approaches has been reported for the problem described here. All of the techniques require explicit enumeration of the constraints. A final study to note involves the use of genetic algorithms for sensitivity analysis [KO94]. The authors describe a method whereby the output of a genetic algorithm is consulted to provide sensitivity information. The authors’ method may also apply to the genetic algorithms described here to provide additional information.
3. Problem Definition and Solution Approaches

In this section we define the particular inverse classification problem studied in this paper. We make several simplifying assumptions about the inverse classification problem so that we can compare the performance of approximate solution approaches to the optimal solution.

3.1 General Problem Definition

The cost minimizing inverse classification problem determines the minimum cost alternative by which a reference instance r = {r1, r2, ..., rn}, currently categorized in class Ci, can have its attribute values modified so that it is recategorized in a (different) preferred class Cλ, i ≠ λ ∈ {1, 2, ..., m}. The process of classifying an instance k is a mapping from the set of attribute values {k1, k2, ..., kn} and prototype instances R to exactly one of m classes: classify(k, R) → Ci, i ∈ {1, 2, ..., m}. Let q = {q1, q2, ..., qn} denote a modified instance of r after one or more attribute values have been manipulated and change_cost(r, q) denote the total cost of the modifications. The inverse classification problem CP1 can be formulated as

Problem CP1:  min_q change_cost(r, q)    (1)
              s.t.  classify(q, R) → Cλ
3.2 Assumptions

We make the following assumptions that enable the cost minimizing inverse classification problem to be formulated as an optimization problem.

• Attribute domains are real-valued, i.e., dom(A1) × ... × dom(An) = E^n, where E^n represents n-dimensional Euclidean space. Further, we assume that attribute values are mapped to the range (LB, UB), where LB and UB represent the lower and upper bounds, respectively.

• Each attribute can be independently changed. Independence allows an attribute to be modified without inducing changes in other attributes. The case where only some attributes can be modified is clearly of practical interest. Our approach allows this to be modeled by associating such attributes with significantly higher change costs.

• Attribute change costs are non-uniform, linear functions. Non-uniformity implies that change costs can differ among attributes. Linearity implies that change costs are stationary over the domain.

• Concepts defining class boundaries represent linear constraints. Linear class boundaries imply that the region describing any individual class is a convex space.

With these assumptions, the inverse classification problem can be formulated as Problem CP2. Let hj denote the change cost associated with attribute Aj. Since hj is stationary, the objective function is the weighted Euclidean distance between the original and modified instances. In the formulation, we have chosen to minimize the square of the weighted Euclidean distance.

Problem CP2:  min_q Σ_{j=1}^{n} hj² (rj − qj)²    (2)
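The classification mapping and the CP2 objective can be made concrete with a small sketch. The prototype values, class labels, and cost weights below are invented for illustration; the classifier is a minimal nearest-prototype rule standing in for the similarity function.

```python
import numpy as np

def classify(k, prototypes, labels):
    """Nearest-prototype classification: return the label of the
    prototype closest to k by Euclidean distance."""
    d2 = ((prototypes - np.asarray(k, dtype=float)) ** 2).sum(axis=1)
    return labels[int(np.argmin(d2))]

def change_cost(r, q, h):
    """CP2 objective: squared weighted Euclidean distance."""
    r, q = np.asarray(r, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(h ** 2 * (r - q) ** 2))

# Hypothetical two-attribute example with one prototype per class.
prototypes = np.array([[1.0, 1.0], [4.0, 4.0]])
labels = ["C1", "C2"]
h = np.array([1.0, 2.0])              # per-attribute change costs

r = np.array([1.5, 1.0])              # reference instance
q = np.array([3.0, 3.0])              # candidate modified instance

assert classify(r, prototypes, labels) == "C1"
assert classify(q, prototypes, labels) == "C2"   # q satisfies the CP1 constraint
print(change_cost(r, q, h))                      # 18.25
```

The inverse classification problem then amounts to searching over q for the feasible point (classified in the preferred class) with the smallest change_cost.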
Appendix A describes a mathematical programming formulation of Problem CP2. The formulation requires the construction of a Voronoi diagram, a fundamental data structure in computational geometry [Aure91]. A Voronoi diagram is a partition that reflects proximity relationships among a collection of points or sites. Each point is associated with the region
closest to it. Voronoi diagrams have been applied in a wide variety of areas such as cluster analysis, disk optimization, collision detection in robotics, and crystallography. Appendix B describes the technique used to construct a Voronoi diagram. The mathematical programming formulation in Appendix A uses the constraints generated from the Voronoi diagram. The mathematical programming formulation in Appendix A is impractical for many classification systems because of the complexity of constructing the underlying Voronoi diagram. First, to support queries on all classes, the entire Voronoi diagram must be constructed. Vertex enumeration algorithms used to construct the Voronoi diagram have exponential worst-case complexity. Second, the number of sites can be large because many classification systems use more than one site per class. Techniques to identify prototypes from a training set of cases, such as the IB3 algorithm [AKA91], usually permit multiple prototypes per class to support disjunctive concept spaces. Third, there are environments in which the prototypes change as new cases are acquired. Recomputing the Voronoi diagram in changing environments may not be cost effective. In the next section, we describe approximate techniques that can be used in situations where the math programming solution is not appropriate. We use the math programming formulation to gauge the quality of solutions generated by the heuristic methods.
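The connection between nearest-site classification and the Voronoi constraints can be verified directly: membership in site s_i's cell is equivalent to the linear bisector inequalities ‖x − s_i‖² ≤ ‖x − s_j‖², which expand to 2(s_j − s_i)·x ≤ ‖s_j‖² − ‖s_i‖². A sketch with made-up sites, checking that the linear-constraint test agrees with nearest-site lookup:

```python
import numpy as np

def in_voronoi_cell(x, i, sites):
    """Membership in site i's Voronoi cell via the linear bisector
    constraints 2(s_j - s_i).x <= |s_j|^2 - |s_i|^2 for all j != i."""
    si = sites[i]
    for j, sj in enumerate(sites):
        if j == i:
            continue
        if 2 * (sj - si) @ x > sj @ sj - si @ si + 1e-12:
            return False
    return True

sites = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
rng = np.random.default_rng(0)
for x in rng.uniform(-1, 4, size=(200, 2)):
    nearest = int(np.argmin(((sites - x) ** 2).sum(axis=1)))
    # The half-space test must agree with brute-force nearest-site lookup.
    assert in_voronoi_cell(x, nearest, sites)
```

This is exactly why the feasible region for a preferred class with a single prototype is a convex polyhedron, even though the classifier never states its constraints explicitly.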
4. Genetic Algorithm Design In this section we present the genetic algorithms used to find solutions to the inverse classification problem CP2. We first present the features of the real genetic algorithm including the mutation and crossover operators, the feasibility control parameter, and the fitness functions for non-feasible solutions. In the last two subsections, we briefly present the two comparison
algorithms, the traditional binary genetic algorithm and the steepest ascent hill climbing algorithm.

4.1 Real Genetic Algorithm

Genetic algorithms are probabilistic search algorithms based on the principles of natural selection, evolution, and heredity. Historically, genetic algorithms have used the binary alphabet for internal representation of chromosomes. This practice was largely inspired by early work in genetic algorithms that exploited the simplicity of this approach. In addition, a number of theoretical results, such as the well-known Schema Theorem [Holl75] and the Implicit Parallelism Property [GB89], were derived on the basis of this representation. However, recent work suggests that the implicit parallelism result does not depend on the use of a binary alphabet and that it may be worthwhile to experiment with larger alphabets, floating point representations, and new operators [Anto89, Wrig91]. In addition, experiments using genetic algorithms on floating point representations with modified genetic operators show that overall performance, as measured by efficiency and standard deviation, improves substantially, especially in the presence of non-trivial constraints [Mich96]. In this study we adopt a floating point representation for feature values since this is the most natural way of modeling the data. Each chromosome is therefore represented as a vector with a fixed number of floating point values, one value per attribute. An important advantage of real encoding for constrained optimization problems is the ability to control the feasibility of chromosomes. A chromosome is infeasible when it is not classified in the preferred class. We exploit this advantage by designing feasibility preserving mutation and crossover operators. The operators are adapted from [Mich96] where they are used with known linear constraints. In contrast, the problem studied here has unknown linear constraints. In the remainder of this section, the details of the real genetic algorithm are described. The mutation and crossover operators are presented in subsections 4.1.1 and 4.1.2, followed by the feasibility control parameter and fitness function in subsections 4.1.3 and 4.1.4.

4.1.1 Mutation Operator

Because class boundaries are not precisely known, mutation cannot take advantage of explicit constraint information to transform a chromosome to a feasible state. Instead, the feasible mutation operator uses a binary search to find a feasible state change. If no feasible transformation is found, no mutation occurs. Pseudo code for the mutation operator, known as FeasibleMutationSearch, appears below. In the pseudo code, CalcNonUniformMutation is the non-uniform mutation operator described in [Mich96]. A non-uniform mutation operator is designed so that the extent of change induced by the operator varies with the age of the population. The extent of change becomes very small when the population’s age approaches the designated maximal number of generations.

procedure FeasibleMutationSearch
  Input: Chromosome, MutPoint, SearchDepth, Age
  Output: new value for the mutated gene (MutPoint) of Chromosome
  TempChromosome ← Chromosome
  Adjust ← CalcNonUniformMutation(Chromosome, MutPoint, Age)
  TempChromosome[MutPoint] ← Adjust
  if Improves(TempChromosome, Chromosome)
    Chromosome[MutPoint] ← Adjust
    exit procedure
  Upper ← | Chromosome[MutPoint] − Adjust |
  Lower ← 0
  BestMutation ← BinSearch(Lower, Upper, SearchDepth, MutPoint, Chromosome)
  Chromosome[MutPoint] ← BestMutation
end procedure
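A runnable sketch of this operator is given below. The non-uniform mutation formula is a simplified rendering of the [Mich96] operator, the feasibility test is a stand-in for the classification query classify(q, R) → Cλ, and all names and bounds are illustrative, not the authors' implementation.

```python
import random

def calc_non_uniform_mutation(value, lb, ub, age, max_age, b=2.0):
    """Simplified non-uniform mutation in the style of [Mich96]:
    the perturbation magnitude shrinks as age approaches max_age."""
    up = random.random() < 0.5
    span = (ub - value) if up else (value - lb)
    delta = span * (1.0 - random.random() ** ((1.0 - age / max_age) ** b))
    return value + delta if up else value - delta

def feasible_mutation_search(chrom, mut_point, depth, age, max_age,
                             lb, ub, feasible):
    """FeasibleMutationSearch with the diversity criterion: accept the
    proposed change if feasible; otherwise binary-search for the largest
    feasible change along the same direction."""
    original = chrom[mut_point]
    adjust = calc_non_uniform_mutation(original, lb, ub, age, max_age)
    trial = chrom[:mut_point] + [adjust] + chrom[mut_point + 1:]
    if feasible(trial):
        chrom[mut_point] = adjust
        return
    sign = 1.0 if adjust >= original else -1.0
    lower, upper = 0.0, abs(adjust - original)
    best = original                       # no mutation if nothing feasible
    for _ in range(depth):
        mid = (lower + upper) / 2.0
        candidate = original + sign * mid
        trial = chrom[:mut_point] + [candidate] + chrom[mut_point + 1:]
        if feasible(trial):
            best, lower = candidate, mid  # feasible: try a larger step
        else:
            upper = mid                   # infeasible: shrink the step
    chrom[mut_point] = best

# Toy linear feasibility region standing in for classify(q, R) -> C_lambda.
random.seed(1)
feasible = lambda c: sum(c) <= 10.0
chrom = [4.0, 4.0]                        # feasible starting chromosome
feasible_mutation_search(chrom, 0, depth=20, age=1, max_age=50,
                         lb=0.0, ub=100.0, feasible=feasible)
assert feasible(chrom) and 0.0 <= chrom[0] <= 100.0
```

Because feasibility along the mutation direction forms an interval inside a convex region, the binary search reliably locates the largest feasible step to the stated depth.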
FeasibleMutationSearch uses two alternative criteria to control the search. The diversity criterion selects the largest feasible adjustment to the chromosome. Using the diversity criterion, the Improves function returns true if the original adjustment results in a feasible solution. If Improves returns false, the binary search is executed. Inside the binary search, the best mutation value is updated whenever a mutation results in a feasible solution and the mutation amount is larger than the previous best mutation. In contrast, the fitness criterion selects the mutation value with the largest fitness. Using the fitness criterion, the Improves function returns true if the original adjustment is feasible and the resulting chromosome has a larger fitness value than the original chromosome. Inside the binary search, the best mutation value is updated whenever a mutation results in a feasible chromosome and the chromosome’s fitness is larger than the fitness associated with the previous best mutation.

4.1.2 Crossover Operator

We adapt the partial vector crossover operator from [Mich96] as a feasibility preserving crossover operator. Let q^t = {q1, q2, ..., qn} and s^t = {s1, s2, ..., sn} represent two feasible solution vectors at time t and assume that the crossover point falls between the attributes indexed by j and j+1. Since the search space F is convex (cf. Section 3), there exists a ∈ [0, 1] such that

q^{t+1} = {q1, ..., qj, a·s_{j+1} + (1−a)·q_{j+1}, ..., a·s_n + (1−a)·q_n} ∈ F
s^{t+1} = {s1, ..., sj, a·q_{j+1} + (1−a)·s_{j+1}, ..., a·q_n + (1−a)·s_n} ∈ F

Similar to the feasible mutation operator, we use a binary search with two criteria to find the best a value. The diversity criterion finds the largest a value that is consistent with the feasibility constraint. The fitness criterion finds the fitness maximizing a value that is consistent with the feasibility constraint. If no consistent value of a is found, then a is set to 0, implying no swapping of genetic material. Note that a = 1 results in a simple swapping of values between the two chromosomes.

4.1.3 Feasibility Control Parameter

To control the application of the feasibility preserving operators, we use a feasibility control parameter. The feasibility control parameter is defined as the minimal fraction of feasible solutions required in each generation. A value of 0 indicates that all members of a generation can be infeasible (no feasibility requirement), while a value of 1 indicates that all members must be feasible. The feasibility control parameter restricts the application of the feasibility preserving mutation and crossover operators previously defined. When the feasibility level is exceeded in a generation, subsequent chromosomes in the generation are created using non-closed versions of the operators. The non-closed operators involve a single application of the underlying formula (non-uniform mutation or partial vector crossover) without the binary search to ensure feasibility.

4.1.4 Fitness Function

The fitness function for feasible chromosomes is derived directly from the objective function of problem CP2. The only change is due to the maximizing nature of genetic algorithms: since the objective is a minimization, a simple transformation converts it into a maximization. The fitness of a feasible solution q is given by
fitness(q) = Cmax − [ Σ_{j=1}^{n} hj² (rj − qj)² ]^{1/2}    (3)

where Cmax is a value larger than the worst objective function value.

When infeasible solutions can be generated, the fitness function is typically augmented with a penalty function. Both the binary genetic algorithm and the real genetic algorithm (with a feasibility control less than 1) can generate infeasible solutions. The augmented fitness function should have several properties: (1) the amount of penalty depends on the extent of infeasibility, (2) infeasible solutions should receive lower fitness values than feasible solutions, and (3) the transition between the fitness values of feasible and infeasible solutions should be smooth. The second and third properties may conflict. If all feasible solutions have a larger fitness value than infeasible solutions, the fitness function may have a large discontinuity that causes poor feasible solutions to dominate. To make the transition smoother, it may be beneficial to allow some infeasible solutions (especially those close to the optimal solution) to have better fitness values than poor feasible solutions. We designed two fitness functions for infeasible solutions to test the sensitivity of the genetic algorithm to the second and third properties. In both infeasible fitness functions, the fitness is the sum of a penalty amount measuring the amount of infeasibility and an objective amount measuring the quality of some feasible solution. The penalty amount is the distance between the infeasible point and the closest feasible point. The objective amount differs in the two fitness functions. In the AFitness function, the fitness of each infeasible solution is guaranteed to be worse than all feasible solutions. In the SFitness function, the fitness of each infeasible solution is guaranteed to be worse than some feasible solution (in practice, it is worse than most feasible solutions). The AFitness function ensures that the second property is satisfied, while the SFitness function ensures that the third property is satisfied. Figure 1, containing a two dimensional Voronoi diagram with five sites, illustrates the two fitness functions for infeasible solutions.
In Figure 1, Site3 is the preferred site, r is the reference point, and i is an infeasible point generated by the genetic algorithm. The labeled points interior to Site3 (q1 through q6) are feasible solutions previously generated by the genetic algorithm. Assume that q3 is the closest to i among the interior points (q1 through q6), while q6 is the worst point among the interior points. Then, the distance from i to q3, denoted by the dark line connecting them, is the penalty amount. The objective amount in AFitness is the distance from r to q6 (the worst known feasible point). The objective amount in SFitness is the distance from r to q3.

[Figure 1: Voronoi Diagram Depicting the Infeasible Fitness Functions. A two dimensional Voronoi diagram with five sites (Site1 through Site5) showing the reference point r, an infeasible point i, feasible points q1 through q6 inside the preferred site Site3, the penalty distance from i to q3, and the objective distances used by SFitness (r to q3) and AFitness (r to q6).]
The precise definition of the two fitness functions for infeasible chromosomes is given below. Let i be an n dimensional infeasible point, q′ be the point in F closest to i, and q″ be the point in F farthest from r. Then the AFitness and SFitness functions are defined as:

AFitness(i) = Cmax − [ ( Σ_{j=1}^{n} hj² (q′j − ij)² )^{1/2} + ( Σ_{j=1}^{n} hj² (q″j − rj)² )^{1/2} ]    (4)

SFitness(i) = Cmax − [ ( Σ_{j=1}^{n} hj² (q′j − ij)² )^{1/2} + ( Σ_{j=1}^{n} hj² (q′j − rj)² )^{1/2} ]    (5)
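Equations (4) and (5) can be sketched directly in code. In practice q′ and q″ are approximated from the feasible points found so far; here the feasible points, weights, and Cmax are all illustrative values.

```python
import numpy as np

def wdist(a, b, h):
    """Weighted Euclidean distance with per-attribute costs h."""
    return float(np.sqrt(np.sum(h ** 2 * (np.asarray(a) - np.asarray(b)) ** 2)))

def afitness(i, r, feasible_pts, h, c_max):
    """Equation (4): penalty (i to closest known feasible point) plus the
    objective of the *farthest* known feasible point from r, so the result
    is no better than any known feasible fitness."""
    q_close = min(feasible_pts, key=lambda q: wdist(q, i, h))
    q_far = max(feasible_pts, key=lambda q: wdist(q, r, h))
    return c_max - (wdist(q_close, i, h) + wdist(q_far, r, h))

def sfitness(i, r, feasible_pts, h, c_max):
    """Equation (5): penalty plus the objective of the *closest* feasible
    point, so the result is worse than some (in practice most) feasible
    fitness values, giving a smoother transition."""
    q_close = min(feasible_pts, key=lambda q: wdist(q, i, h))
    return c_max - (wdist(q_close, i, h) + wdist(q_close, r, h))

h = np.array([1.0, 1.0])
c_max = 100.0
r = np.array([0.0, 0.0])                       # reference point
feasible_pts = [np.array([1.0, 0.0]), np.array([5.0, 0.0])]
i = np.array([1.0, 1.0])                       # infeasible point

def feas_fitness(q):                           # equation (3) for feasible q
    return c_max - wdist(r, q, h)

# AFitness never exceeds the worst known feasible fitness;
# SFitness may exceed it (here the worst known feasible point is [5, 0]).
assert afitness(i, r, feasible_pts, h, c_max) <= min(feas_fitness(q) for q in feasible_pts)
assert sfitness(i, r, feasible_pts, h, c_max) > feas_fitness(np.array([5.0, 0.0]))
```

The two assertions mirror properties (2) and (3) from the discussion above: AFitness enforces strict dominance of feasible solutions, while SFitness trades some of that dominance for a smoother fitness landscape.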
4.2 Binary Genetic Algorithm

Even though the published literature contains a large number of algorithms, the traditional binary algorithm [Gold89b] is still widely used and a good benchmark for a more specialized algorithm. Thus, the experiments in Section 5 use the traditional binary genetic algorithm as one of the algorithms. The traditional binary genetic algorithm uses the single-point crossover and mutation operators. The fitness function is described in Section 4.1.4. The only modification to the traditional binary algorithm is the use of the modGA [Mich96] reproduction algorithm instead of roulette wheel selection. The modGA algorithm uses an elitist strategy in which some members of the previous generation are copied to the next generation.

4.3 Steepest Ascent Hill Climber

The steepest ascent hill climber is a greedy heuristic algorithm as it always moves in the direction of largest gain. On certain optimization problems [Mich96], hill climbing algorithms will eventually find the optimal solution. In our problem, a hill climber seems like a good heuristic because the constraints are linear and only one optimal value exists.
We designed a steepest ascent hill climber so that its search effort is similar to the genetic algorithms. In the outer loop, the steepest ascent hill climber conducts NumGens local searches, where NumGens is the number of generations used by the genetic algorithms. In each outer iteration, the hill climber uses the best solution from the previous iteration. In the first iteration, the best solution in the original feasible population is used. In the inner loop, the hill climber performs PopSize mutations of a starting solution, where PopSize is the population size used by the genetic algorithms. Mutation is performed using the diversity control version of the FeasibleMutationSearch procedure (Section 4.1.1) where the age and mutation point are randomly generated each time. In effect, the hill climber works like a super-elite, real genetic algorithm using only mutation. The pseudo code for the hill climber is given below.

procedure HillClimb
  Input: Chromosome, NumGens, PopSize, Depth
  Output: BestChromosome (the best chromosome generated)
  BestChromosome ← Chromosome
  for i = 1 to NumGens
    InitChromosome ← BestChromosome
    for j = 1 to PopSize
      NewChromosome ← InitChromosome
      Randomly generate Age and MutPoint
      FeasibleMutationSearch(NewChromosome, MutPoint, Depth, Age)
      if fitness(NewChromosome) > fitness(BestChromosome)
        BestChromosome ← NewChromosome
    next j
  next i
  return BestChromosome
end procedure
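A compact runnable rendering of the HillClimb structure follows. A simple random perturbation stands in for FeasibleMutationSearch, and the fitness function and starting point are invented for the example; the point is the loop structure, which samples NumGens × PopSize mutations of the best-so-far solution.

```python
import random

def hill_climb(chrom, num_gens, pop_size, fitness, mutate):
    """Keep the best-so-far solution; each 'generation' samples pop_size
    mutations of it, mirroring the search effort of the genetic algorithms."""
    best = list(chrom)
    for _ in range(num_gens):
        init = list(best)                  # restart each generation from the best
        for _ in range(pop_size):
            new = mutate(list(init))
            if fitness(new) > fitness(best):
                best = new
    return best

random.seed(7)
target = [2.0, 3.0]                        # hypothetical optimum
fitness = lambda c: -sum((a - b) ** 2 for a, b in zip(c, target))

def mutate(c):
    j = random.randrange(len(c))           # random mutation point
    c[j] += random.uniform(-0.5, 0.5)      # random perturbation
    return c

start = [0.0, 0.0]
best = hill_climb(start, num_gens=50, pop_size=20, fitness=fitness, mutate=mutate)
assert fitness(best) >= fitness(start)
```

Because the best-so-far solution is never discarded, the sketch is trivially elitist, which is the sense in which the hill climber behaves like a super-elite, mutation-only genetic algorithm.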
5. Experimental Comparison In this section, we describe simulation experiments to evaluate the performance of the genetic algorithms. We describe the research questions, experimental design, experimental procedures, and results.
5.1 Research Questions

The objective of the experiments is to investigate genetic algorithm performance as related to algorithm issues and problem difficulty. For the algorithm issues, we are primarily interested in the effect of feasibility control and algorithm (binary, real, and hill climbing) because these factors generalize to other genetic algorithms. We are secondarily interested in issues specific to our algorithms including feasibility search criteria (diversity versus fitness) in the mutation and crossover operators, and evaluation of infeasible solutions. For problem difficulty, we are interested in the effect of search effort and number of attributes. We have several beliefs about the relationship between algorithm issues, problem difficulty, and performance. First, we believe that increasing feasibility control will improve performance and that increasing the number of classes will make feasibility control more important. Hypotheses 1 and 1.1 state these beliefs. Second, because of the importance of feasibility control, we believe that the real-1 algorithm (real genetic algorithm with feasibility control parameter equal to 1) will dominate the other algorithms as stated in Hypothesis 2. Because the other algorithm issues are rather specific to our problem, we do not state formal hypotheses for them. Third, we believe that performance will worsen as the number of attributes increases as stated in Hypothesis 3. Additional search effort will improve performance but will not entirely compensate for increases in the number of attributes. The search effort is determined by the number of individuals sampled (number of generations times the population size).

Hypothesis 1: As the feasibility control increases, the performance of the real genetic algorithm improves.

Hypothesis 1.1: As the number of classes changes from low to medium and medium to high, feasibility control becomes more important.
Hypothesis 2: The real-1 algorithm will dominate the binary, real-0, and hill climber algorithms.

Hypothesis 3: Performance becomes worse as the number of attributes increases. Increasing the number of individuals sampled will not entirely compensate for increases in the number of attributes.

We use two performance measures: (i) the percentage cost difference (PD) between the genetic algorithm and the optimal solution, and (ii) the percentage improvement (PI) of the genetic algorithm over the best initial solution. BestGA is defined as the best solution found over all generations in which the genetic algorithm executed. PI is an internal measure of performance because it uses the initial population to account for problem difficulty. Problem difficulty can arise from a number of sources, such as how far the reference point is from the optimal solution, the quality of solutions in the initial population, and the shape of the preferred region.

PD = 100 × (BestGA − Optimal) / Optimal    (6)

PI = 100 × (BestInitial − BestGA) / (BestInitial − Optimal)    (7)
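As a concrete illustration of equations (6) and (7), the two measures can be computed as follows (the cost values in the example are hypothetical, not taken from the experiments):

```python
# Sketch of the two performance measures in equations (6) and (7).
# Costs are hypothetical; lower cost is better, so BestGA >= Optimal.

def percentage_difference(best_ga: float, optimal: float) -> float:
    """PD = 100 * (BestGA - Optimal) / Optimal  -- distance from the optimum."""
    return 100.0 * (best_ga - optimal) / optimal

def percentage_improvement(best_initial: float, best_ga: float,
                           optimal: float) -> float:
    """PI = 100 * (BestInitial - BestGA) / (BestInitial - Optimal)
    -- fraction of the possible improvement over the best initial solution."""
    return 100.0 * (best_initial - best_ga) / (best_initial - optimal)

# Example with made-up costs: optimal 40, best initial solution 100, GA best 46.
print(percentage_difference(46.0, 40.0))          # 15.0
print(percentage_improvement(100.0, 46.0, 40.0))  # 90.0
```

A PI of 100 would mean the genetic algorithm closed the entire gap between the best initial solution and the optimum.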
5.2 Experimental Design

We use several experiments to test the hypotheses discussed in Section 5.1. For Hypothesis 1, we use a linear regression model with the response function shown in (8) below:

E(Y) = β0 + β1·FC + β2·NCL + β3·NCM + β4·FC·NCL + β5·FC·NCM    (8)

Here, Y is the performance measure (PD or PI), FC is the feasibility control parameter, and NCL and NCM are indicator variables defined as follows:

NCL = 1 if the number of classes is low, 0 otherwise    (9)

NCM = 1 if the number of classes is moderate, 0 otherwise    (10)
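The indicator coding in (9) and (10) treats the high class level as the baseline (both indicators zero). A minimal sketch of evaluating response function (8), with placeholder coefficients rather than fitted estimates, is:

```python
# Sketch of response function (8) with the indicator coding of (9)-(10).
# The beta values below are hypothetical placeholders, not fitted estimates.

def indicators(n_classes: str):
    """Return (NCL, NCM) for a class level in {'low', 'moderate', 'high'}."""
    return (1 if n_classes == "low" else 0,
            1 if n_classes == "moderate" else 0)

def expected_y(fc: float, n_classes: str, beta):
    """E(Y) = b0 + b1*FC + b2*NCL + b3*NCM + b4*FC*NCL + b5*FC*NCM."""
    b0, b1, b2, b3, b4, b5 = beta
    ncl, ncm = indicators(n_classes)
    return b0 + b1 * fc + b2 * ncl + b3 * ncm + b4 * fc * ncl + b5 * fc * ncm

beta = (60.0, -50.0, 5.0, 15.0, 0.0, -10.0)  # hypothetical coefficients
print(expected_y(1.0, "moderate", beta))  # 60 - 50 + 15 - 10 = 15.0
```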
In this model, we vary the feasibility control from 0.0 to 1.0 in steps of 0.10. For the number of classes, we use low (see footnote 1), moderate (15), and high (30) levels. Each observation is the average performance over 30 problems, where a problem is a random combination of a preferred site and a reference point. Because we are not interested in the impact of the number of attributes, we run separate regressions for several numbers of attributes. Each regression uses a sample of 160 randomly generated observations.

For Hypothesis 2, we use a single factor, fixed effects model with equal treatment sample sizes [NWK85]. This experiment has four treatments corresponding to the binary genetic algorithm, the real genetic algorithm with feasibility control of 0, the real genetic algorithm with feasibility control of 1, and the hill climbing algorithm. Each observation is the average over 30 randomly selected problems (combinations of number of classes, attributes, preferred site, and reference point). The number of classes is uniformly drawn from low, moderate, or high, where the levels are defined as in experiment 1. For each treatment, the same set of 30 observations was used.

For Hypothesis 3, we generate data to plot graphs depicting the relationship between the number of generations, number of attributes, and performance. We generate data by varying the maximum number of generations and averaging the best performance across a set of 30 randomly selected problems (combinations of preferred class and reference point).
1. The number of classes for the "low" level depends on the number of attributes because of requirements for generating the Voronoi diagram (see Section 3). We used 5 classes for 2 attributes, 6 classes for 4 attributes, 7 classes for 5 attributes, and 12 classes for 10 attributes.
5.3 Experimental Procedure

To conduct the experiments discussed in Section 5.2, we determined parameter values and devised a method for generating experimental observations. We determined parameter values through an examination of previous work and pilot studies. For the replacement ratio (0.9), aging parameter (2), and binary search depth (3) of the real genetic algorithm, we relied on values suggested in [Mich96]. For the real crossover and mutation rates, we conducted a small simulation experiment at a number of levels. The simulations suggested that a crossover rate of 0.3 and a mutation rate of 0.2 give excellent results. These rates are comparable to previous studies [Mich96, MVH91]. For the binary genetic algorithm, we used a crossover rate of 0.5 and a mutation rate of 0.001 based on an examination of past studies discussed in [Gold89b, Gref86]. For the population size, pilot studies suggested that small sizes (10) give slightly better results after 2500 individuals and converge more rapidly than larger sizes. Our results with small population sizes are also consistent with previous studies discussed in [Gold89a].

Each observation results from executing the appropriate algorithm (real, binary, or hill climbing) with specified parameter settings on randomly selected problems. A problem is a combination of reference point, preferred class, and change cost vector. The techniques described in Appendices A and B were used to generate the Voronoi diagram for a given set of sites and to determine the optimal solution for the Voronoi diagram, reference point, change cost vector, and preferred class. The costs were uniformly drawn from a population with a moderate (0.3) coefficient of variation. There is one set of sites for each combination of attributes and number of classes.
5.4 Pilot Study Results

We first conducted informal studies of the issues specific to the real genetic algorithm. Figure 2 displays results for the two infeasible fitness functions (SFitness versus AFitness). Each point in Figure 2 is an average over a number of observations, where each observation is the average of a number of runs with the same initial population. The results in Figure 2 support the notion that genetic algorithms can be sensitive to the choice of fitness function for infeasible solutions. SFitness is the better choice, as all genetic algorithms with feasibility control less than 1 show improved performance. It is interesting to note that the order of the real-0 and binary algorithms changes with the different infeasible fitness functions.

[Figure 2, two panels: SFitness Results and AFitness Results (3 Attributes, 30 Classes); AvgPD versus Individuals (0-2500) for Real-1, Real-0.5, Real-0, and Binary.]
Figure 2: Infeasible Fitness Function Results

Figure 3 depicts results for the four combinations of operator (crossover and mutation) and search criterion (diversity and fitness). The fitness and diversity versions of the crossover operator (CrossFit and CrossDiv, respectively) do not seem to affect the results, so we ignore them in our discussion. The mutation operator with the diversity criterion (MutDiv) provides slightly better results after 2500 individuals. It is interesting to note that the fitness criterion (MutFit) provides a very rapid improvement in the fitness function. However, the rapid
improvement cannot be sustained, indicating that more search depth may be necessary. Even with the same binary search depth, the MutFit operator requires considerably longer run times (up to twice as long) than the MutDiv operator because the MutDiv operator performs the binary search only if the initial mutation result is infeasible. Thus, we strongly prefer the MutDiv operator because it provides excellent results after 2500 individuals and uses little search effort.
[Figure 3: Diversity vs. Fitness Search Criteria (4 Attributes, 15 Classes); AvgPD versus Individuals (0-2500) for CrossDiv/MutDiv, CrossFit/MutDiv, CrossDiv/MutFit, and CrossFit/MutFit.]
Figure 3: Search Criteria Results for the Mutation/Crossover Operators

5.5 Experiment Results

For experiment 1, Tables 1 through 4 show the parameter estimates for the response functions (PD and PI) in Equation 8 for 5 and 10 attributes. In these tables, only the variables significant at a P-value of 0.06 or below are shown.

Table 1. PD Parameter Estimates for 10 Attributes (R-square 0.6983; Adj R-sq 0.6907)

Variable    DF   Param. Est.    Std. Error    t-value    P-value
INTERCEPT    1    62.064340     2.57018713     24.148    0.0001
FC           1   -53.445617     3.83140876    -13.949    0.0001
NCL          1     6.432133     2.42319567      2.654    0.0087
NCM          1    18.872213     4.10872816      4.593    0.0001
FC*NCM       1   -12.680096     6.63619464     -1.911    0.0578
Table 2. PI Parameter Estimates for 10 Attributes (R-square 0.9112; Adj R-sq 0.9095)

Variable    DF   Param. Est.    Std. Error    t-value    P-value
INTERCEPT    1    68.962752     0.45239398    152.440    0.0001
FC           1    24.703983     0.61000792     40.498    0.0001
NCL          1     1.506254     0.47251011      3.188    0.0017
NCM          1     1.150204     0.47251011      2.434    0.0160

Table 3. PD Parameter Estimates for 5 Attributes (R-square 0.6947; Adj R-sq 0.6890)

Variable    DF   Param. Est.    Std. Error    t-value    P-value
INTERCEPT    1     9.760559     0.39049992     24.995    0.0001
FC           1    -9.769902     0.52654999    -18.555    0.0001
NCL          1     1.431880     0.40786387      3.511    0.0006
NCM          1     1.815742     0.40786387      4.452    0.0001

Table 4. PI Parameter Estimates for 5 Attributes (R-square 0.6918; Adj R-sq 0.6880)

Variable    DF   Param. Est.    Std. Error    t-value    P-value
INTERCEPT    1    86.850910     0.35075947    247.608    0.0001
FC           1    10.470872     0.63382790     16.520    0.0001
FC*NCL       1     2.064426     0.67227601      3.071    0.0025

The results support Hypothesis 1 but offer only mild support for Hypothesis 1.1. For Hypothesis 1, feasibility control is significant in all response functions. To aid in understanding the results for Hypothesis 1.1, Table 5 decomposes the response function by number of classes. This table reveals some support in the percentage improvement response functions: the percentage improvement functions for 10 attributes support Hypothesis 1.1. However, in the five-attribute case, feasibility control appears more important for the low number of classes. In the percentage difference response functions, the slope is affected by the number of classes in only one case (five attributes, medium).
Table 5. Response Functions for Experiment 1

Classes   5 Dimensions                 10 Dimensions
High      E(PD) = 9.76 − 9.77·FC       E(PD) = 62.06 − 53.45·FC
          E(PI) = 86.85 + 10.47·FC     E(PI) = 68.26 + 24.70·FC
Medium    E(PD) = 11.38 − 9.77·FC      E(PD) = 80.86 − 66.12·FC
          E(PI) = 86.85 + 10.47·FC     E(PI) = 70.11 + 24.70·FC
Low       E(PD) = 11.38 − 9.77·FC      E(PD) = 68.69 − 53.45·FC
          E(PI) = 86.85 + 12.54·FC     E(PI) = 70.46 + 24.70·FC
There are a number of possible explanations for the weak support of Hypothesis 1.1. One explanation is that the increase in classes was not large enough to make a difference. Another is that competing forces are at work. For example, the fitness function exerts strong pressure to focus on feasible solutions and on infeasible solutions close to the optimum. This increased selection pressure, combined with the smaller feasible space, may counteract the effect of the additional constraints.

Tables 6 through 8 show the analysis of variance results for experiment 2. Because both analyses support the hypothesis that at least one of the means is different, t tests for equality of sample means were performed (Tables 9 and 10). The results support Hypothesis 2 in that Real-1 is the best algorithm on both the PD and PI measures. The results also support a ranking of the remaining algorithms as Real-0, HC, and Binary. The reasonable performance of HC suggests that mutation is an important operator. The superiority of Real-0 over HC suggests that reproduction and crossover play a significant although not dominant role in the search process. The poor performance of Binary suggests that the small role of mutation is not adequate in constrained optimization problems. Simple changes, such as large increases in the mutation rate, make Binary behave as a random search. A more fundamental change may be necessary.
Table 6: PD ANOVA Results for Experiment 2

Source    SS          df    MS          F        P-value     F crit
Alg       44008.256    3    14669.419   150.2    8.64E-40    2.683
Within    11329.226  116       97.666
Total     55337.482  119
Table 7: PI ANOVA Results for Experiment 2

Source    SS          df    MS          F         P-value     F crit
Alg       49173.759    3    16391.253   843.305   1.43E-78    2.683
Within     2254.682  116       19.437
Total     51428.441  119

Table 8: PD and PI Summary Results for Experiment 2

ALG       PD Avg    PD Var     PI Avg    PI Var
Real-1     4.324      7.847    96.323     0.997
Real-0    16.601     38.987    86.051     6.345
HC        19.900     29.039    70.490    14.401
Binary    55.754    314.790    42.642    56.005
Table 9: PD Sample Mean Results

Comparison        P-value         t-Value
Real-1, Real-0    3.19412E-12     -9.82621685
Real-0, HC        0.032770655     -2.18817487
HC, Binary        2.62237E-12     10.59174601

Table 10: PI Sample Mean Results

Comparison        P-value         t-Value
Real-1, Real-0    2.44412E-22     20.76336538
Real-0, HC        3.13435E-24     18.71289961
HC, Binary        8.40517E-22     -18.1783105

For experiment 3, Figure 4 shows results for various numbers of attributes as the maximum number of generations varies. These graphs support Hypothesis 3 in that performance deteriorates as the number of attributes increases. The graphs for the high and low class levels look similar, indicating that real-1 was robust to increasing the number of constraints in this range. However, the results do not show that the search effort required to reach the performance limit increases. In the PD graphs, the performance limit is reached at about 4,000 individuals for all four graphs, although there is some fluctuation after this point in the 10-attribute case. In another simulation, we increased the number of individuals in the 10-attribute case from 10,000 to 19,000. Surprisingly, performance did not improve much even at 19,000 individuals.

[Figure 4, four panels: PD and PI performance as search effort increases, for 30 classes and for low classes; AvgPD and AvgPI versus Individuals (0-10,000) for 2, 3, 4, 5, and 10 attributes (2A-10A).]

Figure 4: PD and PI Search Effort Results
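The pairwise comparisons in Tables 9 and 10 use two-sample t tests with equal sample sizes. A minimal sketch of the pooled-variance t statistic, with made-up samples rather than the experimental data, is:

```python
import math

def pooled_t(sample_a, sample_b):
    """Two-sample t statistic with pooled variance (equal sample sizes, as
    in experiment 2); returns (t, degrees of freedom)."""
    na, nb = len(sample_a), len(sample_b)
    ma = sum(sample_a) / na
    mb = sum(sample_b) / nb
    va = sum((x - ma) ** 2 for x in sample_a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in sample_b) / (nb - 1)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    t = (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2

# Tiny illustration with hypothetical PD samples: the lower-PD algorithm
# yields a large negative t statistic versus the higher-PD one.
a = [4.1, 4.5, 4.3, 4.4, 4.2]
b = [16.2, 17.0, 16.5, 16.8, 16.6]
t, df = pooled_t(a, b)
print(round(t, 2), df)
```

The p-values in Tables 9 and 10 would then come from the t distribution with the returned degrees of freedom.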
5.6 Discussion

These results provide several lessons about genetic algorithm performance. First, they provide additional evidence that constrained optimization problems are difficult for genetic algorithms. The poor performance of the traditional binary algorithm shows that specialized designs may be necessary. Second, the results provide additional evidence of the viability of real genetic algorithms. The real genetic algorithm is a good tool even when constraint boundaries are not known precisely. The reasonable performance of the real algorithm even without feasibility control shows that real encoding is responsible for some of the success. Third, the results show that genetic algorithm performance on constrained optimization problems is rather sensitive to the operators and the fitness function. The importance of feasibility control shows the need for feasibility-preserving operators. In addition, the results support fitness functions that do not overly penalize infeasible solutions and operators that emphasize diversity rather than fitness improvements.

A final lesson from this study involves the use of approximate solution techniques such as genetic algorithms versus optimal solution techniques such as quadratic programming. As the results demonstrate, the best approximate results were within about 10% of the optimal solution for the largest problems tested. There is ample reason to prefer the optimal solution if it can be generated efficiently. However, because constructing a Voronoi diagram is computationally expensive, the real genetic algorithm may be preferred for moderate-size problems in dynamic or time-constrained environments, where it provides a fast approximate solution. In addition, for large problems, the real genetic algorithm provides a solution approach when the Voronoi diagram cannot be constructed.
6. Summary and Conclusion

We examined the cost minimizing inverse classification problem with respect to similarity-based classification systems. As a form of sensitivity analysis, the importance of this problem seems likely to grow with the spread of classification systems in organizations. The use of similarity-based systems was motivated by their importance and their implicit representation of concept boundaries. A genetic algorithm is well suited to this problem because the concept boundaries can be implicitly encoded in the fitness function. We developed three algorithms (real genetic, binary genetic, and hill climbing) and compared their performance to the optimal solution. The strong performance of the real genetic algorithm emphasized that genetic algorithms can perform well on constrained optimization problems even when the constraint boundaries are not explicitly represented. In addition, the empirical results replicated other studies regarding the need for feasibility control in operator design and the viability of real encoding.

We see this as an initial effort to study inverse classification problems. An obvious extension is to relax the real attribute restriction. Many classification systems operate with attributes of mixed data types, including real, integer, ordinal, and nominal. To study this relaxation, we will need to amend our genetic algorithms, use an actual classification system, and devise good heuristics against which to compare our approach. Because the performance of the binary genetic algorithm was poor here, a mapping between real and discrete representations may be necessary. Other possible extensions are studying other classification systems and searching for tradeoffs between the benefit of moving to a better class and the cost of changing attributes.
Appendix A – Mathematical Programming Formulation

The assumptions given in Section 3.2 allow us to transform the inverse classification problem into an equivalent problem based on Voronoi diagrams. Formally, a Voronoi diagram is a cell complex that reflects proximity relationships as follows. Let S = {s1, s2, …, sm} be a set of m sites defined in E^n. Then the Voronoi diagram V(S) of S decomposes E^n into m cells, one for each site in S, with the property that a point x lies in the cell corresponding to site si if and only if δ(x, si) < δ(x, sj) for all j ∈ S − {i}, where δ(x, si) denotes the Euclidean distance between point x and site si [Aure91, Edel87].

The relationships between the inverse classification problem and Voronoi diagrams can be described as follows. Prototypes in the classification system correspond to sites in the Voronoi diagram. Attributes of a prototype or an instance correspond to dimensions in the Voronoi diagram. Concepts defining a given class are represented in the Voronoi diagram as bisectors separating two adjacent regions. Figure A.1 shows a Voronoi diagram in the plane for five sites in the range LB = 100, UB = 200. Note that not all pairs of sites have their bisecting surfaces present in the diagram. For example, there is no point common to the regions of sites 1 and 3. Consider point p in region 2, and assume that region 3 is the preferred region. Then point q, shown on the bisector separating regions 2 and 3, is a solution to the inverse classification problem if attribute change costs are uniform.
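The cell-membership property above is just the nearest neighbor rule: a point belongs to the cell of its closest site. A minimal sketch (the five 2-D sites below are illustrative, not those of Figure A.1):

```python
import math

# A point lies in the Voronoi cell of its nearest site (Euclidean distance).
# The sites below are hypothetical examples in the range (100, 200).

def nearest_site(x, sites):
    """Return the index of the site closest to point x."""
    return min(range(len(sites)),
               key=lambda i: math.dist(x, sites[i]))

sites = [(120.0, 180.0), (150.0, 110.0), (160.0, 150.0),
         (185.0, 185.0), (190.0, 140.0)]
print(nearest_site((155.0, 115.0), sites))  # 1: the second site is closest
```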
[Figure A.1: a Voronoi diagram in the plane for five sites (Site1-Site5) over the range 100-200 on each attribute (horizontal axis: Attribute X), with points q and r marked near the bisector between regions 2 and 3.]

Figure A.1: Voronoi Diagram for Two Dimensions and Five Sites

One of the problems in using Voronoi diagrams is that it is difficult to generate such diagrams in arbitrary dimensions. Commercially available software packages, such as Mathematica [Boyl91], cannot construct Voronoi diagrams beyond 2-dimensional space. As a result, our approach is based on the observation that Voronoi diagrams in n dimensions are related to arrangements of hyperplanes in (n+1) dimensions [Edel87]. In addition, we use the reverse search algorithm [AF92] for enumerating all vertices of a convex polyhedron, and its C implementation [Avis93]. Details of the method used in the construction of Voronoi diagrams are given in Appendix B.
This method allows us to determine the set of bisectors that are present in the Voronoi diagram. Let B(λ) denote the set of sites with which site sλ has a bisector in the Voronoi diagram. Then B(λ) can be partitioned into sets B+(λ) and B−(λ), defined as follows (see footnote 2):

B+(λ) = { si : Σ(j=1..n) (sλ,j − si,j)·sλ,j − (1/2) Σ(j=1..n) (sλ,j² − si,j²) > 0 }    (A1)

B−(λ) = { si : Σ(j=1..n) (sλ,j − si,j)·sλ,j − (1/2) Σ(j=1..n) (sλ,j² − si,j²) < 0 }    (A2)
It follows that any other point q = {q1, q2, …, qn} is contained in the open region of site sλ if it satisfies similar constraints with respect to the sites in B+(λ) and B−(λ). Using the constraints in (A1) and (A2), the inverse classification problem can be formulated as shown below in Problem CP2. Let hj denote the change cost associated with attribute Aj. Since hj is stationary, the objective function is the weighted Euclidean distance between the original and modified instances. In the formulation shown below, we have chosen to minimize the square of the weighted Euclidean distance.

Problem CP2:

min over q of  Σ(j=1..n) hj² (rj − qj)²

subject to

Σ(j=1..n) (sλ,j − sk,j)·qj − (1/2) Σ(j=1..n) (sλ,j² − sk,j²) ≥ 0    for all sk ∈ B+(λ)

−[ Σ(j=1..n) (sλ,j − sk,j)·qj − (1/2) Σ(j=1..n) (sλ,j² − sk,j²) ] ≥ 0    for all sk ∈ B−(λ)

qj ≥ LB    for j = 1, 2, …, n

−qj ≥ −UB    for j = 1, 2, …, n
2. See equation B2 in Appendix B for a derivation of these constraints.
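A candidate point can be checked against CP2-style constraints directly from the site coordinates, without materializing the Voronoi diagram. The sketch below (with hypothetical 2-D sites and a simplified bisector test in the form of equation B2, applied to all neighbors rather than the signed B+/B− split) is the kind of feasibility test the genetic algorithm's fitness function relies on:

```python
# Sketch of a feasibility test for a candidate point q: q must lie on the
# preferred site's side of each bisector (equation B2 form) and within
# [LB, UB]. Sites, neighbors, and bounds are hypothetical examples.

def sep_value(s_a, s_b, x):
    """LHS of the bisector Sep(a, b) evaluated at x; positive on s_a's side."""
    n = len(s_a)
    return (sum((s_a[j] - s_b[j]) * x[j] for j in range(n))
            - 0.5 * sum(s_a[j] ** 2 - s_b[j] ** 2 for j in range(n)))

def feasible(q, preferred, neighbors, lb, ub):
    """True if q satisfies the bound and bisector constraints for the
    preferred site's region."""
    if not all(lb <= qj <= ub for qj in q):
        return False
    return all(sep_value(preferred, s, q) > 0 for s in neighbors)

preferred = (160.0, 150.0)
neighbors = [(150.0, 110.0), (185.0, 185.0)]
print(feasible((158.0, 148.0), preferred, neighbors, 100.0, 200.0))  # True
```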
The objective function in Problem CP2 can be reduced to standard form as

Problem CP3:  min over q of  gᵀq + (1/2) qᵀHq

where g is an n × 1 column vector whose jth element is −2·rj·hj², H is an n × n diagonal matrix whose jth diagonal element is 2·hj², q is an n × 1 column vector of decision variables, and gᵀ, qᵀ denote transposes. Since H is positive definite, a unique solution to Problem CP3 exists provided the constraint set is consistent. In this research we used the IMSL procedure QPROG [IMSL89] to solve Problem CP3.
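The reduction drops the constant Σj hj²·rj², which does not affect the minimizer. A numeric check of this equivalence with made-up data:

```python
# Numeric check (with made-up data) that the CP2 objective
# sum_j h_j^2 (r_j - q_j)^2 equals the CP3 quadratic form
# g^T q + (1/2) q^T H q plus the constant sum_j h_j^2 r_j^2,
# where g_j = -2 r_j h_j^2 and H is diagonal with H_jj = 2 h_j^2.

h = [1.0, 0.5, 2.0]   # change costs (hypothetical)
r = [3.0, 1.0, 2.0]   # reference point (hypothetical)
q = [2.5, 1.5, 1.0]   # candidate point (hypothetical)

cp2 = sum(hj ** 2 * (rj - qj) ** 2 for hj, rj, qj in zip(h, r, q))

g = [-2.0 * rj * hj ** 2 for hj, rj in zip(h, r)]
H_diag = [2.0 * hj ** 2 for hj in h]
const = sum(hj ** 2 * rj ** 2 for hj, rj in zip(h, r))
cp3 = (sum(gj * qj for gj, qj in zip(g, q))
       + 0.5 * sum(Hj * qj ** 2 for Hj, qj in zip(H_diag, q)))

print(abs(cp2 - (cp3 + const)) < 1e-12)  # True: the two differ by a constant
```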
Appendix B – Construction of Voronoi Diagram

We describe here the steps used for constructing Voronoi diagrams.

Input: Coordinates of m sites in E^n. The ith site is denoted by si, i = 1, 2, …, m.

Output: Set of bisector surfaces (hyperplanes) defining the region of each site si.

Step 1: Transform coordinates to hyperplanes in E^(n+1). We use the property that Voronoi diagrams in E^n are related to arrangements of hyperplanes in E^(n+1). Let the coordinates of site si in E^n be denoted by {si,1, si,2, …, si,n}. We map each site si ∈ S in E^n to a hyperplane tangent to the unit paraboloid in E^(n+1) by using the geometric transformation ξ defined below. For this purpose we identify any point {yi,1, yi,2, …, yi,n} in E^n with the point {yi,1, yi,2, …, yi,n, 0} in E^(n+1). The transform ξ(si) maps the coordinates of site si to the hyperplane

ξ(si) = 2(si,1·x1 + si,2·x2 + … + si,n·xn) − (si,1² + si,2² + … + si,n²)    (B1)

It has been shown in [Edel87] that ξ(si) is the unique hyperplane that touches the unit paraboloid U: xn+1 = x1² + x2² + … + xn² in the vertical projection U(si) of si on surface U. The transformations hence result in a convex polyhedron given by the set of m unique hyperplanes in E^(n+1).

Step 2: Enumerate all vertices of the convex polyhedron. Enumeration of all vertices is based on the property that vertices of the convex polyhedron in E^(n+1), when projected back to E^n, are also vertices of the Voronoi diagram [Edel87]. The reverse search enumeration method described in [AF92] was employed for this purpose. A C language implementation of the algorithm is given in [Avis93]. This implementation was found to be suitable since it uses multiple-precision rational arithmetic and is capable of handling problems of fairly large size. The vertices generated by the algorithm are transformed to E^n by projection, i.e., by simply dropping the (n+1)th coordinate.

Step 3: Identify active bisecting surfaces of the Voronoi diagram. Identification of the active bisector surfaces uses the following decision rule. Consider sites si and sk. The bisecting surface Sep(i, k) separating these two sites is the hyperplane

Sep(i, k):  Σ(j=1..n) (si,j − sk,j)·xj − (1/2) Σ(j=1..n) (si,j² − sk,j²) = 0    (B2)
If Sep(i, k) contains one or more of the vertices generated in Step 2, then it is an active bisector surface and is present in the Voronoi diagram; otherwise it is not. Identification of all the bisector surfaces thus partitions the n-dimensional space (LB, UB)^n into m convex regions, one for each of the m sites.

Step 4: Test the Voronoi diagram. Testing is necessary since vertex coordinates are converted from rational to real values, which may cause some loss of precision. To test the algorithm, we generated several thousand points at random and verified that classification by the active bisector surfaces was identical to classification by the nearest neighbor rule.
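A minimal sketch of Steps 1, 3, and 4 with hypothetical 2-D sites follows. It builds the lifted hyperplane coefficients of (B1), the bisector test of (B2), and a Step 4-style consistency check; for simplicity it treats every pairwise bisector as active, which is always correct for classification even if some bisectors are not present in the diagram:

```python
import math
import random

# Illustrative 2-D sites; not the sites used in the experiments.
sites = [(120.0, 180.0), (150.0, 110.0), (160.0, 150.0)]

def xi(s):
    """Coefficients (a_1..a_n, c) of the lifted hyperplane
    x_{n+1} = a.x + c from equation (B1): a = 2*s, c = -|s|^2."""
    return tuple(2.0 * sj for sj in s) + (-sum(sj ** 2 for sj in s),)

def sep(si, sk, x):
    """LHS of Sep(i, k) at point x (equation B2); positive on s_i's side."""
    n = len(si)
    return (sum((si[j] - sk[j]) * x[j] for j in range(n))
            - 0.5 * sum(si[j] ** 2 - sk[j] ** 2 for j in range(n)))

def classify_by_bisectors(x):
    """Index of the site on whose side of every bisector x lies."""
    for i, si in enumerate(sites):
        if all(sep(si, sk, x) >= 0 for k, sk in enumerate(sites) if k != i):
            return i
    return None

def classify_by_nearest(x):
    return min(range(len(sites)), key=lambda i: math.dist(x, sites[i]))

# Step 4 style test: both classifiers agree on random points in (100, 200)^2.
random.seed(1)
pts = [(random.uniform(100, 200), random.uniform(100, 200))
       for _ in range(1000)]
print(all(classify_by_bisectors(p) == classify_by_nearest(p) for p in pts))
```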
References

ABR98    Avesani, P., Blanzieri, E., and Ricci, F. "Advanced Metrics for Class-Driven Similarity Search," in Proceedings of the 10th International Workshop on Database & Expert System Applications, 1998.

AF92     Avis, D. and Fukuda, K. "A Pivoting Algorithm for Convex Hulls and Vertex Enumeration of Arrangements and Polyhedra," Discrete and Computational Geometry 8 (1992), 295-313.

AKA91    Aha, D., Kibler, D., and Albert, M. "Instance-Based Learning Algorithms," Machine Learning 6 (1991), 37-66.

Alle94   Allen, B. "Case-Based Reasoning: Business Applications," Communications of the ACM 37, 3 (March 1994), 40-42.

Alsu90   Al-Sultan, K. Nearest Point Problems: Theory and Algorithms, Ph.D. Dissertation, The University of Michigan, Ann Arbor, 1990.

Alsu94   Al-Sultan, K. "A Newton Based Radius Reduction Algorithm for Nearest Point Problems in Pos Cones," ORSA Journal on Computing 6, 3 (Summer 1994), 292-299.

Alsu95   Al-Sultan, K. "A Tabu Search Approach to the Clustering Problem," Pattern Recognition 28, 9 (1995), 1443-1451.

AM92     Al-Sultan, K. and Murty, K. "Exterior Point Algorithms for Nearest Points and Quadratic Programs," Mathematical Programming 57 (1992), 145-161.

Anto89   Antonisse, J. "A New Interpretation of Schema Notation that Overturns the Binary Encoding Constraint," in Proc. Third International Conference on Genetic Algorithms, J. Schaffer (ed.), Morgan Kaufmann Publishers, Los Altos, CA, 1989.

Aure91   Aurenhammer, F. "Voronoi Diagrams - A Survey of a Fundamental Geometric Data Structure," ACM Computing Surveys 23, 3 (September 1991), 345-405.

AS93     Al-Sultan, K. and Selim, S. "A Global Algorithm for the Fuzzy Clustering Problem," Pattern Recognition 26, 9 (1993), 1357-1361.

Avis93   Avis, D. "A C Implementation of the Reverse Search Vertex Enumeration Algorithm," Technical Report SOCS-92.12 (October 1992; revised June 1993), School of Computer Science, McGill University, Montreal, Quebec.

Boyl91   Boyland, P. Guide to Standard Mathematica Packages, Technical Report, Wolfram Research Inc., Champaign, IL, 1991.

CF84     Cluff, G.S. and Farnham, P.G. "Standard & Poor's vs. Moody's: Which City Characteristics Influence Municipal Bond Ratings?," Quarterly Review of Economics and Business 2, 3 (1984), 72-94.

CHJS92   Charnes, A., Haag, S., Jaska, P., and Semple, J. "Sensitivity of Efficiency Classifications in the Additive Model of Data Envelopment Analysis," Int. J. Systems Sci. 23, 5 (1992), 789-798.

CMSW92   Creecy, R., Masand, B., Smith, S., and Waltz, D. "Trading MIPS and Memory for Knowledge Engineering," Communications of the ACM 35, 8 (August 1992), 48-64.

Dasr90   Dasrathy, B. (ed.) Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques, IEEE Computer Society Press, 1990.

Edel87   Edelsbrunner, H. Algorithms in Combinatorial Geometry, Springer-Verlag, 1987.

GB89     Grefenstette, J. and Baker, J. "How Genetic Algorithms Work: A Critical Look at Implicit Parallelism," in Proc. Third International Conference on Genetic Algorithms, J. Schaffer (ed.), Morgan Kaufmann Publ., San Mateo, CA, 1989, pp. 20-27.

Glov89   Glover, F. "Tabu Search - Part I," ORSA Journal on Computing 2, 1 (1989), 190-206.

Gold89a  Goldberg, D. "Sizing Populations for Serial and Parallel Genetic Algorithms," in Proc. Third International Conference on Genetic Algorithms, J. Schaffer (ed.), Morgan Kaufmann Publ., San Mateo, CA, 1989.

Gold89b  Goldberg, D. Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading, MA, 1989.

Good90   Goodman, M. "Prism: A Case Based Telex Classifier," in Proceedings of the 2nd Conference on Innovative Applications of Artificial Intelligence, 1990.

Gref86   Grefenstette, J. "Optimization of Control Parameters for Genetic Algorithms," IEEE Transactions on Systems, Man and Cybernetics 16, 1 (Jan./Feb. 1986), 122-128.

HP92     Hirota, K. and Pedrycz, W. "Prototype Construction and Evaluation as Inverse Problems in Pattern Classification," Pattern Recognition 25, 6 (1992), 601-608.

HT96     Hastie, T. and Tibshirani, R. "Discriminant Adaptive Nearest Neighbor Classification," IEEE Transactions on Pattern Analysis and Machine Intelligence 18, 6 (June 1996), 607-616.

IMSL89   IMSL Math/Library. User's Manual - Fortran Subroutines for Mathematical Applications, Version 1.1, December 1989.

ICFQ93   Irani, K., Cheng, J., Fayyad, U., and Qian, Z. "Applying Machine Learning to Semiconductor Manufacturing," IEEE Expert 8, 1 (February 1993), 41-47.

KM97     Kim, J. and Myung, H. "Evolutionary Programming Techniques for Constrained Optimization Problems," IEEE Transactions on Evolutionary Computation 1, 2 (July 1997), 129-140.

KO94     Kimbrough, S. and Oliver, J. "On Automating Candle Lighting Analysis: Insight from Search with Genetic Algorithms and Approximate Models," in Proceedings of the Twenty-Seventh Annual Hawaii International Conference on System Sciences, 1994, pp. 536-544.

LA87     van Laarhoven, P. and Aarts, E. Simulated Annealing: Theory and Applications, D. Reidel Publishing Company, Holland, 1987.

MB98     Mak, B. and Blanning, R. "An Empirical Measure of Element Contribution in Neural Networks," IEEE Transactions on Systems, Man, and Cybernetics 28, 4 (November 1998), 561-564.

Mich95   Michalewicz, Z. "A Survey of Constraint Handling Techniques in Evolutionary Computation Methods," in Proc. 4th Annual Conference on Evolutionary Programming, McDonnell, Reynolds, and Fogel (eds.), MIT Press, 1995, pp. 135-155.

Mich96   Michalewicz, Z. Genetic Algorithms + Data Structures = Evolution Programs, Third Edition, Springer-Verlag, 1996.

MS96     Michalewicz, Z. and Schoenauer, M. "Evolutionary Algorithms for Constrained Optimization Problems," Evolutionary Computation 4, 1 (1996), 1-32.

MVH91    Michalewicz, Z., Vignaux, G., and Hobbs, M. "A Nonstandard Genetic Algorithm for the Nonlinear Transportation Problem," ORSA Journal on Computing 3, 4 (Fall 1991), 307-316.

Murt88   Murty, K. Linear Complementarity, Linear and Nonlinear Programming, Springer-Verlag, Berlin, 1988.

NWK85    Neter, J., Wasserman, W., and Kutner, M. Applied Linear Statistical Models, Richard D. Irwin, Inc., Homewood, IL, 1985.

PBH90    Porter, B., Bareiss, R., and Holte, R. "Concept Learning and Heuristic Classification in Weak-Theory Domains," Artificial Intelligence 45, 1 (1990), 229-264.

Simo92   Simoudis, E. "Using Case-Based Retrieval for Customer Technical Support," IEEE Expert 7, 5 (October 1992), 7-11.

SA91     Selim, S. and Al-Sultan, K. "A Simulated Annealing Algorithm for the Clustering Problem," Pattern Recognition 24, 10 (1991), 1003-1008.

SW86     Stanfill, C. and Waltz, D. "Toward Memory-Based Reasoning," Communications of the ACM 29, 12 (December 1986), 1213-1228.

TK90     Tam, K. and Kiang, M. "Predicting Bank Failures: A Neural Network Approach," Applied Artificial Intelligence 4 (1990), 265-282.

Wrig91   Wright, A. "Genetic Algorithms for Real Parameter Optimization," in Foundations of Genetic Algorithms, G. Rawlins (ed.), Morgan Kaufmann Publ., San Mateo, CA, 1991.

ZO96     Zakarauskas, P. and Ozard, J. "Complexity Analysis for Partitioning Nearest Neighbor Searching Algorithms," IEEE Transactions on Pattern Analysis and Machine Intelligence 18, 6 (June 1996), 663-668.