Metaheuristic Algorithms in Modeling and Optimization

Amir H. Gandomi 1, Xin-She Yang 2, Siamak Talatahari 3, Amir H. Alavi 4

1 Department of Civil Engineering, The University of Akron, Akron, OH 44325, USA
2 Mathematics and Scientific Computing, National Physical Lab, Teddington TW11 0LW, UK
3 Marand Faculty of Engineering, University of Tabriz, Tabriz, Iran
4 School of Civil Engineering, Iran University of Science and Technology, Tehran, Iran

Abstract
Metaheuristic algorithms have become powerful tools for modeling and optimization. This chapter provides an overview of nature-inspired metaheuristic algorithms, especially those developed in the last two decades, and their applications. We briefly introduce algorithms such as genetic algorithms, differential evolution, genetic programming, fuzzy logic and, most importantly, swarm-intelligence-based algorithms such as ant and bee algorithms, particle swarm optimization, cuckoo search, the firefly algorithm, the bat algorithm, and the krill herd algorithm. We also briefly describe the main characteristics of these algorithms and outline some of their recent applications.

Keywords: Nature-Inspired Algorithm; Metaheuristic Algorithms; Modeling; Optimization.

1. INTRODUCTION
In metaheuristic algorithms, meta- means 'beyond' or 'higher level'. These algorithms generally perform better than simple heuristics. All metaheuristic algorithms use some tradeoff between local search and global exploration, and the variety of solutions is often realized via randomization. Despite the popularity of metaheuristics, there is no agreed definition of heuristics and metaheuristics in the literature. Some researchers use 'heuristics' and 'metaheuristics' interchangeably; however, the recent trend is to name all stochastic algorithms with randomization and global exploration as metaheuristics. Randomization provides a good way to move away from local search towards search on a global scale. Therefore, almost all metaheuristic algorithms are suitable for nonlinear modeling and global optimization. Metaheuristics can be an efficient way to produce acceptable solutions by trial and error for a complex problem in a reasonably practical time. The complexity of the problem of interest often makes it impossible to search every possible solution or combination; thus, the aim is to find a good feasible solution in an acceptable timescale. There is no guarantee that the best solutions can be found, and we often do not even know whether an algorithm will work, or why it works if it does (Yang, 2008; Yang, 2010a). The idea is to have an efficient and practical algorithm that will work most of the time and is able to produce good-quality

solutions. Among the quality solutions found, it can be expected that some are nearly optimal, though there is no guarantee of such optimality. The main components of any metaheuristic algorithm are intensification and diversification, or exploitation and exploration (Blum and Roli 2003). Diversification means generating diverse solutions so as to explore the search space on a global scale, while intensification means focusing the search on a local region by exploiting the information that a current good solution has been found in this region. This is combined with the selection of the best solutions (Yang, 2011a). The selection of the best ensures that the solutions will converge to optimality, while diversification via randomization avoids the solutions being trapped at local optima and increases the diversity of the solutions. A good combination of these two major components will usually ensure that the global solution is achievable. Metaheuristic algorithms can be classified in many ways. One way is to classify them as population-based or trajectory-based (Yang, 2010a). For example, the genetic algorithm (GA) and genetic programming (GP) are population-based, as they use a set of strings; so is particle swarm optimization (PSO), which uses multiple agents or particles (Kennedy and Eberhart, 1995). On the other hand, simulated annealing (SA) (Kirkpatrick et al. 1983) uses a single solution that moves through the design space or search space, while artificial neural networks use a different approach altogether. Modeling and optimization may have different emphases, but for solving real-world problems we often have to use both, because modeling ensures that the objective functions are evaluated using the correct mathematical/numerical model of the problem of interest, while optimization achieves the optimal settings of the design parameters. For optimization, the essential part is the optimization algorithm. For this reason, we will focus on algorithms, especially metaheuristic algorithms.

2. METAHEURISTIC ALGORITHMS
2.1. Characteristics of Metaheuristics
Throughout history, especially in the early periods of human history, the main approach to problem-solving has always been heuristic or metaheuristic – by trial and error. Many important discoveries were made by 'thinking outside the box', and often by accident; that is heuristics. Archimedes's Eureka moment was a heuristic triumph. In fact, our daily learning experience (at least as a child) is dominantly heuristic (Yang, 2010a). The popularity and success of metaheuristics can be attributed to many reasons, one of the main ones being that these algorithms have been developed by mimicking the most successful processes in nature, including biological systems and physical and chemical processes. For most algorithms, we know their fundamental components, but how exactly these components interact to achieve efficiency remains largely a mystery, which inspires more active studies. Convergence analysis of a few algorithms has provided some insight, but in general the mathematical analysis of metaheuristic algorithms has many open questions and is still an ongoing, active research topic (Yang 2011a; Yang 2011c).

The notable performance of metaheuristic algorithms often results from the fact that they imitate the best features in nature. Intensification and diversification are the two main features of metaheuristic algorithms (Blum and Roli 2003; Yang 2010a; Yang 2011c; Gandomi and Alavi 2012). The intensification, or exploitation, phase searches around the current best solutions and selects the best candidates or solutions. The diversification, or exploration, phase ensures that the algorithm explores the search space efficiently. The overall efficiency of an algorithm is mainly influenced by a fine balance between these two components. The system may be trapped in local optima if there is too little exploration and too much exploitation; in this case, it would be very difficult, or even impossible, to find the global optimum. On the other hand, with too much exploration but too little exploitation, it may be difficult for the system to converge, and the overall search performance decelerates. Balancing these two components is itself a major optimization problem (Yang 2011c). Evidently, exploitation and exploration alone are only part of the search. During the search, a proper mechanism or criterion should be used to select the best solutions. 'Survival of the fittest' is a common criterion, based on continually updating the current best solution found so far. Moreover, a certain degree of elitism should be used to ensure that the best or fittest solutions are not lost and are passed on to the next generations. Each algorithm and its variants use different ways to obtain a balance between exploration and exploitation. Certain randomization in combination with a deterministic procedure can be an efficient way to achieve exploration or diversification; this makes sure that the newly generated solutions are distributed as diversely as possible in the feasible search space. From the implementation viewpoint, the actual way of implementing the algorithm does affect the performance to some degree. Hence, validation and testing of any algorithm implementation are important (Talbi 2009).

2.2. No Free Lunch Theorems
There are the so-called 'No Free Lunch theorems', which can have significant implications for the field of optimization (Wolpert and Macready 1997). One of the theorems states that if algorithm A outperforms algorithm B for some optimization functions, then B will be superior to A for other functions. In other words, when averaged over all possible function spaces, both algorithms A and B will perform, on average, equally well. That is to say, there are no universally better algorithms. An alternative viewpoint is that, for a given optimization problem, there is no need to average over all possible functions; in this case, the major task is to find the best solutions, which has nothing to do with the average over all possible function spaces. Other researchers believe that there is no universal tool and that, based on experience, some algorithms outperform others for given types of optimization problems. Thus, the main objective is either to choose the most suitable algorithm for a given problem or to design better algorithms for most types of problems, not necessarily for all problems.

3. METAHEURISTIC ALGORITHMS IN MODELING
Various methodologies can be employed for nonlinear system modeling, each with its own advantages and drawbacks. The need to determine both the structure and the parameters of engineering systems makes their modeling a difficult task. In general, models are classified into two main groups: (1) phenomenological and (2) behavioral (Metenidis et al. 2004). The first class is established by taking into account the physical relationships governing a system; the structure of a phenomenological model is chosen on the basis of prior knowledge about the system. To cope with the complexity of designing phenomenological models, behavioral models are often used instead. Behavioral models capture the relationships between the inputs and outputs on the basis of a measured set of data; thus, there is no need for prior knowledge about the mechanisms that produced the experimental data. Such models are beneficial because they can provide very good results with minimal effort (Metenidis et al. 2004; Gandomi and Alavi 2011, 2012a,b). Statistical regression techniques are widely used behavioral modeling approaches.

Several alternative metaheuristic approaches have been developed for behavioral modeling. Developments in computer hardware during the last two decades have made it much easier for these techniques to grow into more efficient frameworks. In addition, various metaheuristics may be used as efficient tools in problems where conventional approaches fail or perform poorly. Two well-known classes of metaheuristic algorithms used in nonlinear modeling are Artificial Neural Networks (ANNs) (Haykin 1999) and Genetic Programming (GP) (Koza 1992). ANNs have been used for a wide range of structural engineering problems (e.g. Sakla and Ashour 2005; Alavi and Gandomi 2011a). In spite of the successful performance of ANNs, they usually do not give deep insight into the process by which they obtain a solution. GP, as an extension of genetic algorithms (GAs), possesses completely new characteristics. It is essentially a supervised machine learning approach that searches a program space instead of a data space. GP automatically generates computer programs that are represented as tree structures and expressed using a functional programming language (Koza 1992; Gandomi and Alavi 2011). The ability to generate prediction models without assuming the form of the existing relationships is surely a main advantage of GP over regression and ANN techniques. GP and its variants are widely used for solving real-world problems (e.g., Gandomi et al. 2011a; Alavi and Gandomi 2011b). Some other metaheuristic algorithms, such as Fuzzy Logic (FL) and the Support Vector Machine (SVM), have also been used in the literature for modeling. These algorithms (ANNs, GP, FL and SVM) are explained in the following subsections.

3.1. Artificial Neural Networks
ANNs emerged as a result of simulating the biological nervous system. The ANN method was developed in the early 1940s by McCulloch and co-workers (Perlovsky 2001). The first studies focused on building simple neural networks to model simple logic functions. At present, ANNs are applied to problems that do not have algorithmic solutions or whose solutions are too complex to be found. Here, the approximation ability of two of the most well-known ANN architectures, MLP and RBF, is described.

3.1.1 Multilayer Perceptron Network
MLPs are a class of ANN structures with a feed-forward architecture and are among the most widely used metaheuristic techniques for modeling complex systems in real-world applications (Alavi et al. 2010a). MLP networks are usually applied to perform supervised learning tasks, which involve iterative training methods to adjust the connection weights within the network. MLPs are universal approximators; that is, they are capable of approximating essentially any continuous function to an arbitrary degree of accuracy. They are often trained using the back-propagation (BP) algorithm (Rumelhart et al. 1986). An MLP consists of an input layer, at least one hidden layer of neurons, and an output layer. Each layer has several processing units, and each unit is fully interconnected by weighted connections to units in the subsequent layer. Every input is multiplied by the interconnection weights of the nodes, and the output (h_j) is obtained by passing the sum of the products through an activation function as follows:

$$h_j = f\Big( \sum_i x_i w_{ij} + b \Big) \qquad (1)$$

where f(·) is the activation function, x_i is the activation of the ith neuron in the previous layer, w_ij is the weight of the connection joining the jth neuron in a layer with the ith neuron in the previous layer, and b is the bias of the neuron. For nonlinear problems, sigmoid functions (hyperbolic tangent sigmoid or log-sigmoid) are usually adopted as the activation function. Adjusting the interconnection weights between layers reduces the following error function:

$$E = \frac{1}{2} \sum_n \sum_k \left( t_k^n - h_k^n \right)^2 \qquad (2)$$

where t_k^n and h_k^n are respectively the target (actual) output and the calculated output, n indexes the samples, and k indexes the output nodes. Further details of MLPs can be found in (Cybenko 1986; Haykin 1999).
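To make Eqs. (1) and (2) concrete, the following minimal Python sketch computes the forward pass of a one-hidden-layer MLP and the associated error. The layer sizes, random weights and the tanh activation are our illustrative choices, not taken from any of the cited references.

    import numpy as np

    def mlp_forward(x, W, b, V, c):
        """One-hidden-layer MLP: Eq. (1) with a tanh activation."""
        h = np.tanh(W @ x + b)      # hidden activations h_j = f(sum_i x_i w_ij + b)
        return V @ h + c            # linear output layer

    def error(targets, outputs):
        """Eq. (2): squared error over samples n and output nodes k."""
        return 0.5 * np.sum((targets - outputs) ** 2)

    rng = np.random.default_rng(0)
    W, b = rng.normal(size=(5, 3)), rng.normal(size=5)   # 3 inputs -> 5 hidden units
    V, c = rng.normal(size=(2, 5)), rng.normal(size=2)   # 5 hidden -> 2 outputs
    x, t = rng.normal(size=3), rng.normal(size=2)
    print(error(t, mlp_forward(x, W, b, V, c)))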

3.1.2. Radial Basis Function
RBF networks have feed-forward architectures. Compared with other ANN structures such as MLPs, the RBF procedure for finding complex relationships is generally faster, and RBF training is much less computationally intensive. The structure of an RBF network consists of an input layer, a hidden layer with a nonlinear RBF activation function, and a linear output layer. Input vectors are transformed into radial basis functions by the hidden layer (Alavi et al. 2009). The transformation functions used are based on a Gaussian distribution as the activation function. The center and width are the two important parameters of the Gaussian basis function. As the distance (usually the Euclidean distance) between the input vector and the center increases, the output given by the activation function decays to zero; the rate of decay is controlled by the width of the RBF. The Gaussian basis function (C_j) is given in the following form:

$$C_j(\mathbf{x}) = \exp\left( -\frac{\|\mathbf{x} - \boldsymbol{\mu}_j\|^2}{2\sigma_j^2} \right) \qquad (3)$$

where ||·|| is the Euclidean norm and x is the input pattern. In addition, μ_j and σ_j are the center and the spread of the Gaussian basis function, respectively. The output of the kth neuron in the output layer of the network is computed as:

$$y_k(\mathbf{x}) = \sum_{j=1}^{n} w_{jk} \, h_j(\mathbf{x}) + b_k \qquad (4)$$

in which n is the number of hidden neurons, w_jk is the weight between the jth hidden neuron and the kth output neuron, and b_k is the bias term. RBF networks with Gaussian basis functions have been shown to be universal function approximators with high point-wise convergence (Girosi and Poggio 1990).
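Similarly, a small Python sketch of Eqs. (3) and (4) is given below; the number of hidden units, centers, spreads and weights are illustrative placeholders.

    import numpy as np

    def rbf_predict(x, centers, sigmas, W, b):
        """Eqs. (3)-(4): Gaussian hidden layer followed by a linear output layer."""
        d2 = np.sum((centers - x) ** 2, axis=1)   # squared distances ||x - mu_j||^2
        h = np.exp(-d2 / (2.0 * sigmas ** 2))     # Eq. (3): Gaussian basis outputs
        return W.T @ h + b                        # Eq. (4): y_k = sum_j w_jk h_j(x) + b_k

    rng = np.random.default_rng(1)
    centers = rng.uniform(-1, 1, size=(4, 2))    # 4 hidden units, 2-dimensional inputs
    sigmas = np.full(4, 0.5)                     # spreads sigma_j
    W, b = rng.normal(size=(4, 3)), np.zeros(3)  # 3 output neurons
    print(rbf_predict(np.array([0.2, -0.3]), centers, sigmas, W, b))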

3.2. Genetic Programming
GP is a symbolic optimization technique that creates computer programs to solve a problem using the principle of Darwinian natural selection (Koza, 1992). Friedberg (1958) left the first footprints in this area by using a learning algorithm to stepwise improve a program. Much later, Cramer (1985) applied genetic algorithms (GAs) and tree-like structures to evolve programs. The breakthrough in GP then came in the late 1980s with the experiments of Koza (1992) on symbolic regression. GP was introduced by Koza (1992) as an extension of GA. The main difference between GP and GA is the representation of the solution: GP solutions are computer programs represented as tree structures and expressed in a functional programming language (such as LISP) (Koza, 1992). GA creates a string of numbers that represents the solution, while in GP the evolving programs (individuals) are parse trees that can vary in length throughout the run, rather than fixed-length binary strings. Essentially, this is the beginning of computer programs that can program themselves (Koza, 1992). Since GP often evolves computer programs, the solutions can be executed without post-processing, whereas the coded binary strings typically evolved by GA require post-processing. Optimization techniques such as GA are generally used in parameter optimization, that is, to find the best values for a given set of model parameters. GP, on the other hand, provides the basic structure of the approximation model together with the values of its parameters (Javadi and Rezania, 2009). GP optimizes a population of computer programs according to a fitness landscape determined by a program's ability to perform a given computational task. The fitness of each program in the population is evaluated using a predefined fitness function; thus, the fitness function is the objective function GP aims to optimize (Torres et al., 2009). This classical GP approach is referred to as tree-based GP. In addition to traditional tree-based GP, there are other types of GP where programs are represented in different ways (see Figure 1): linear and graph-based GP (Banzhaf et al., 1998; Alavi et al. 2012). The emphasis here is placed on the linear-based GP techniques.

Figure 1 Different types of genetic programming.

3.2.1. Linear-Based Genetic Programming
There are several main reasons for using linear GP. Basic computer architectures are fundamentally the same now as they were twenty years ago, when GP began, and almost all architectures represent computer programs in a linear fashion. Also, computers do not naturally run tree-shaped programs, so slow interpreters have to be used as part of tree-based GP. Conversely, by evolving the binary bit patterns actually used by the computer, the use of an expensive interpreter (or compiler) is avoided and GP can run several orders of magnitude faster (Poli et al., 2007). Several linear variants of GP have recently been proposed, including (Oltean and Grosan, 2003a): linear genetic programming (LGP) (Brameier and Banzhaf, 2007), gene expression programming (GEP) (Ferreira, 2001), multi-expression programming (MEP) (Oltean and Dumitrescu, 2002), Cartesian genetic programming (CGP) (Miller and Thomson, 2002), genetic algorithm for deriving software (GADS) (Patterson, 2002) and infix form genetic programming (IFGP) (Oltean and Grosan, 2003b). LGP, GEP and MEP are the most common linear-based GP methods. These variants make a clear distinction between the genotype and the phenotype of an individual, and individuals are represented as linear strings (Oltean and Grosan, 2003a).

3.2.1.1. Linear Genetic Programming
LGP is a subset of GP with a linear representation of individuals. There are several main differences between LGP and traditional tree-based GP. Figure 2 presents a comparison of the program structures in LGP and tree-based GP. Linear genetic programs can be seen as a data flow graph generated by multiple usages of register content. LGP operates on genetic programs that are represented as linear sequences of instructions of an imperative programming language (such as C/C++) (see Figure 2a). As shown in Figure 2b, the data flow in tree-based GP is more rigidly determined by the tree structure of the program (Brameier and Banzhaf, 2001; Gandomi et al. 2010).

[Figure 2: (a) the LGP program f[0] = 0; f[0] += v[1]; f[0] /= 3; f[0] += v[4]; return f[0]; and (b) the equivalent tree-based GP structure, both computing y = f[0] = (v[1] / 3) + v[4].]

Figure 2 Comparison of the GP program structures. (a) LGP (b) Tree-based GP (after (Alavi et al., 2010c)).

In the LGP system described here, an individual program is interpreted as a variable-length sequence of simple C instructions. The instruction set or function set of LGP consists of arithmetic operations, conditional branches, and function calls. The terminal set of the system is composed of variables and constants. The instructions are restricted to operations that accept a minimum number of constants or memory variables, called registers (r), and assign the result to a destination register, e.g., r0 := r1 + 1.

LGPs can be converted into a functional representation by successive replacements of variables, starting with the last effective instruction (Oltean and Grosan, 2003a). Automatic Induction of Machine code by Genetic Programming (AIMGP) is a particular form of LGP in which evolved programs are stored as linear strings of native binary machine code and are directly executed by the processor during fitness calculation. The absence of an interpreter and of complex memory handling results in a significant speedup of AIMGP execution compared to tree-based GP. This machine-code-based LGP approach searches for the computer program and the constants at the same time. Comprehensive descriptions of the basic parameters used to direct a search for a linear genetic program can be found in (Brameier and Banzhaf, 2007).
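To illustrate the linear representation, the following Python sketch interprets a register-based program similar to the one in Figure 2a. The (op, destination, source) instruction encoding is a hypothetical simplification for illustration, not the encoding of any particular LGP system.

    def run_linear_program(program, v, n_regs=2):
        """Interpret a list of (op, dest, src) register instructions."""
        f = [0.0] * n_regs                                 # calculation registers
        for op, dest, src in program:
            arg = v[src[1]] if src[0] == 'v' else src[1]   # input variable or constant
            if   op == '+=': f[dest] += arg
            elif op == '-=': f[dest] -= arg
            elif op == '*=': f[dest] *= arg
            elif op == '/=': f[dest] /= arg
        return f[0]

    # The program of Figure 2a: y = (v[1] / 3) + v[4]
    prog = [('+=', 0, ('v', 1)), ('/=', 0, ('c', 3)), ('+=', 0, ('v', 4))]
    print(run_linear_program(prog, v=[0, 6.0, 0, 0, 1.5]))   # (6 / 3) + 1.5 = 3.5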

3.2.1.2. Gene Expression Programming
GEP is a natural development of GP, first presented by Ferreira (2001). GEP consists of five main components: the function set, terminal set, fitness function, control parameters, and termination condition. Unlike the parse-tree representation in conventional GP, GEP uses fixed-length character strings to represent solutions, which are afterwards expressed as parse trees of different sizes and shapes, called GEP expression trees (ETs). One advantage of the GEP technique is that the creation of genetic diversity is extremely simplified, as the genetic operators work at the chromosome level. Another strength of GEP is its unique multigenic nature, which allows the evolution of more complex programs composed of several subprograms (Gandomi and Alavi, 2011, 2012b, 2012c). Each GEP gene contains a list of symbols with a fixed length that can be any element from a function set such as {+, −, ×, /, √} and a terminal set such as {X1, X2, X3, 2}. The function set and terminal set must have the closure property: each function must be able to take any value of the data type that can be returned by a function or assumed by a terminal. A typical GEP gene with the given function and terminal sets can be as follows:

+. ×. √. X1. −. +. +. ×. X2. X1. X3. 3. X2. X3    (5)

where X1, X2 and X3 are variables, 3 is a constant, and '.' is the element separator, used for easy reading. The above expression is often called the Karva notation or K-expression (Ferreira, 2006). A K-expression can be represented by a diagram, namely an ET. GEP genes have a fixed length, which is predetermined for a given problem; thus, what varies in GEP is not the length of the genes but the size of the corresponding ETs. This means that a certain number of redundant elements may not be used in the genome mapping. Hence, the valid length of a K-expression may be equal to or less than the length of the GEP gene. To guarantee the validity of a randomly selected genome, GEP employs a head–tail method: each GEP gene is composed of a head and a tail. The head may contain both function and terminal symbols, whereas the tail may contain terminal symbols only.
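As a hedged illustration of how a K-expression is decoded, the following Python sketch builds the expression tree of the gene in Eq. (5) breadth-first (level by level, the usual Karva decoding) and evaluates it. Here × and √ are written as '*' and 'sqrt', and the arity table and helper names are our own.

    import math, operator

    ARITY = {'+': 2, '-': 2, '*': 2, '/': 2, 'sqrt': 1}
    OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul,
           '/': operator.truediv, 'sqrt': lambda a: math.sqrt(a)}

    def eval_karva(gene, env):
        """Breadth-first decoding of a K-expression, then recursive evaluation."""
        nodes = [[s, []] for s in gene]          # [symbol, children]; valid gene assumed
        next_child = 1                           # index of the next unused symbol
        for node in nodes:
            for _ in range(ARITY.get(node[0], 0)):
                node[1].append(nodes[next_child])
                next_child += 1
            if next_child >= len(nodes):
                break
        def value(node):
            sym, kids = node
            if sym in OPS:
                return OPS[sym](*[value(k) for k in kids])
            return env.get(sym, sym)             # variable lookup or numeric constant
        return value(nodes[0])

    # The gene of Eq. (5): +. x. sqrt. X1. -. +. +. x. X2. X1. X3. 3. X2. X3
    gene = ['+', '*', 'sqrt', 'X1', '-', '+', '+', '*', 'X2', 'X1', 'X3', 3, 'X2', 'X3']
    print(eval_karva(gene, {'X1': 4.0, 'X2': 1.0, 'X3': 2.0}))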

3.2.1.3. Multi-Expression Programming
Multi-Expression Programming (MEP) is a subarea of GP developed by Oltean and Dumitrescu (2002). MEP uses linear chromosomes for solution encoding and has the special ability to encode multiple solutions (computer programs) of a problem in a single chromosome. Based on the fitness values of the individuals, the best encoded solution is chosen to represent the chromosome. There is no increase in the complexity of the MEP decoding process compared with other GP variants that store a single solution in a chromosome, except in situations where the set of training data is not known a priori (Oltean and Grosan, 2003a, c). The evolutionary steady-state MEP algorithm typically starts with the creation of a random population of individuals. MEP is represented in a way similar to that in which C and Pascal compilers translate mathematical expressions into machine code. The number of MEP genes per chromosome is constant and specifies the length of the chromosome. Each gene encodes a terminal (an element in the terminal set T) or a function symbol (an element in the function set F). A gene that encodes a function includes pointers towards the function arguments; function parameters always have indices of lower values than the position of the function itself in the chromosome. Consequently, the first symbol in a chromosome must be a terminal symbol. As an example, consider the MEP chromosome below, where the numbers on the left are gene labels that do not belong to the chromosome, the function set is F = {+, ×, /} and the terminal set is T = {x1, x2, x3, x4}:

0: x1
1: x2
2: × 0, 1
3: x3
4: + 2, 3
5: x4
6: / 4, 5

Translation of MEP individuals into computer programs is obtained by reading the chromosome top-down, starting with the first position. A terminal symbol defines a simple expression, and each function symbol specifies a complex expression obtained by connecting the operands specified by its argument positions with the function symbol itself (Oltean and Grosan, 2003c).

Because multiple solutions are encoded in a single chromosome, one of the resulting expressions (E1, …, E7 in the example above) must be chosen as the chromosome representative. Owing to this multi-expression representation, each MEP chromosome may be viewed as a forest of trees rather than a single tree, and each of these expressions can be considered a possible solution of the problem. The fitness of each expression in an MEP chromosome is calculated to designate the best encoded expression in that chromosome (Alavi et al. 2010b).
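A minimal Python sketch of the MEP decoding described above follows; the tuple-based gene encoding and the toy fitness criterion are our illustrative assumptions.

    def decode_mep(chromosome, env):
        """Evaluate every expression encoded in an MEP chromosome (top-down)."""
        values = []
        for gene in chromosome:
            if isinstance(gene, str):                 # terminal gene, e.g. 'x1'
                values.append(env[gene])
            else:                                     # function gene: (op, i, j)
                op, i, j = gene
                a, b = values[i], values[j]
                values.append({'+': a + b, '*': a * b, '/': a / b}[op])
        return values                                 # one value per expression E1..E7

    chromosome = ['x1', 'x2', ('*', 0, 1), 'x3', ('+', 2, 3), 'x4', ('/', 4, 5)]
    exprs = decode_mep(chromosome, {'x1': 2.0, 'x2': 3.0, 'x3': 1.0, 'x4': 7.0})
    best = min(exprs, key=lambda v: abs(v - 1.0))     # toy fitness: closeness to 1.0
    print(exprs, best)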

3.3. Fuzzy Logic
Fuzzy Logic (FL) is a process of mapping an input space onto an output space using membership functions and linguistically specified rules (Ceven and Ozdemir, 2007). The concept of the 'fuzzy set' was introduced by Zadeh (1986). The fuzzy approach is closer to human thought, since it provides possible rules relating input variables to the output variable. FL is well suited to implementing control rules that can only be expressed verbally, and it can also be used for the modeling of systems that cannot be modeled with linear differential equations (Afandizadeh-Zargari et al. 2012). The essential idea in FL is the concept of partial belonging of an object to different subsets of the universal set, instead of full belonging to a single set. Partial belonging to a set can be described numerically by a membership function (Topcu and Sarıdemir, 2008). A membership function is a curve mapping an input element to a value between 0 and 1, showing the degree to which it belongs to a fuzzy set; this membership degree varies between 0 and 1. A membership function can have different shapes for different kinds of fuzzy sets, such as bell, sigmoid, triangle, and trapezoid (Ceven and Ozdemir, 2007). In FL, rules and membership sets are used to make a decision. The idea of a fuzzy set is basic and simple: an object is allowed to have a gradual membership of a set. This means the degree of truth of a statement can range between 0 and 1, and is not limited to the two logic values {true, false}. When linguistic variables are used, these degrees may be managed by specific functions. A fuzzy system consists of input and output variables. For each variable, the fuzzy sets that characterize the variable are formulated, and for each fuzzy set a membership function is defined; after that, the rules that relate the input and output variables to their fuzzy sets are defined. Figure 3 depicts a typical fuzzy logic system, in which a general fuzzy inference system has four basic components: fuzzification, fuzzy rule base, fuzzy inference engine, and defuzzification (Topcu and Sarıdemir, 2008). In the fuzzification stage, each piece of the input data is converted to degrees of membership by a lookup in one or several membership functions. The fuzzy rule base contains rules covering all possible fuzzy relations between the inputs and outputs; these rules are expressed as a collection of IF-THEN statements, in which all the rules have antecedents and consequents. The fuzzy inference engine takes into account all the fuzzy rules in the fuzzy rule base and learns how to transform a set of inputs into the corresponding outputs; it generates the resulting fuzzy set based on the input and the antecedents of the rules. Finally, the resulting fuzzy outputs of the inference engine are converted to a number through the so-called defuzzification process (Topcu and Sarıdemir, 2008).
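As a small illustration of the four stages, the following Python sketch implements a single-input fuzzy system with triangular membership functions. The rule base, membership parameters and the weighted-average defuzzification scheme are invented for illustration and are not from the cited references.

    def tri(x, a, b, c):
        """Triangular membership function on [a, c], peaking at b."""
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

    def fuzzy_infer(temp):
        # Fuzzification: membership degrees of the crisp input
        cold = tri(temp, -10, 0, 15)
        warm = tri(temp, 5, 20, 35)
        # Rule base: IF cold THEN heater = high (80); IF warm THEN heater = low (20)
        rules = [(cold, 80.0), (warm, 20.0)]
        # Inference + defuzzification (weighted average of rule consequents)
        num = sum(w * out for w, out in rules)
        den = sum(w for w, _ in rules)
        return num / den if den else 0.0

    print(fuzzy_infer(10.0))   # a crisp output between the two rule consequents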


Figure 3 Fuzzy logic system.

3.4. Support Vector Machines
The support vector machine (SVM) is a well-known machine learning method based on statistical learning theory (Boser et al. 1992; Vapnik 1995, 1998). Similar to ANNs, the SVM procedure involves a training phase in which a series of input and target output values are fed into the model. The trained algorithm is then employed to evaluate a separate set of testing data. Two fundamental concepts underlie the SVM (Goh and Goh 2007):
1. An optimum margin classifier: a linear classifier that constructs a separating hyperplane (decision surface) such that the distance between the positive and negative examples is maximized.
2. Use of kernel functions: a kernel function calculates the dot product of two vectors. A suitable nonlinear kernel can map the original example data onto a new dataset that becomes linearly separable in a high-dimensional feature space, even though it is non-separable in the original input space (Vapnik 1995, 1998; Goh and Goh 2007).
The SVM procedure can be outlined as follows (Goh and Goh 2007): (1) choose a kernel function with its related kernel parameters; (2) solve a Lagrangian cost function and obtain the Lagrange multipliers; (3) carry out the binary classification task. With training input data points expressed in vector form x_i (i = 1, ..., m) belonging to two classes y_i = +1 or −1, the decision surface f is determined by:

f x    i yi K x, xi   b m

i 1

(8)

where K is the kernel function, α_i are the Lagrange multipliers, and b is the bias term. Among the parameters of the SVM algorithm (Goh and Goh 2007), the dimension (power) and the penalty parameter are the two parameters needed for a polynomial kernel. For a given kernel, the SVM analysis essentially involves the optimization of a convex cost function, which results in a unique optimal separating hyperplane, obtained using an efficient quadratic programming method. Comprehensive descriptions of SVMs can be found in the more advanced literature (Vapnik 1995; Goh and Goh 2007).
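A short Python sketch of evaluating Eq. (8) with a polynomial kernel follows. It assumes the Lagrange multipliers and bias have already been obtained from the quadratic programming step; all numbers are placeholders.

    import numpy as np

    def poly_kernel(u, v, degree=2):
        """Polynomial kernel K(u, v) = (u . v + 1)^degree."""
        return (u @ v + 1.0) ** degree

    def svm_decision(x, support_x, support_y, alphas, b):
        """Eq. (8): f(x) = sum_i alpha_i y_i K(x, x_i) + b, returned as a class sign."""
        s = sum(a * y * poly_kernel(x, xi)
                for a, y, xi in zip(alphas, support_y, support_x))
        return np.sign(s + b)

    X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
    y = np.array([1, 1, -1])
    alphas = np.array([0.5, 0.5, 1.0])     # placeholder Lagrange multipliers
    print(svm_decision(np.array([0.2, 0.9]), X, y, alphas, b=0.1))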

4. METAHEURISTIC ALGORITHMS IN OPTIMIZATION
Finding an optimal solution to an optimization problem is often a very challenging task, depending on the choice and the correct use of the right algorithm. The choice of an algorithm may depend on the type of problem, the availability of algorithms, computational resources, and time constraints. For large-scale, nonlinear, global optimization problems, there is often no agreed guideline for algorithm choice, and in many cases there is no efficient algorithm. For hard optimization problems, especially nondeterministic polynomial-time hard (NP-hard) optimization problems, there is no efficient algorithm at all. In most applications, an optimization problem can be expressed in the following generic form (Yang, 2010a; Yang 2011e):

$$\underset{\mathbf{x} \in \Re^n}{\text{minimize}} \; f_i(\mathbf{x}), \quad (i = 1, 2, \ldots, M), \qquad (9)$$

$$\text{subject to } \; h_j(\mathbf{x}) = 0, \quad (j = 1, 2, \ldots, J), \qquad (10)$$

$$g_k(\mathbf{x}) \le 0, \quad (k = 1, 2, \ldots, K), \qquad (11)$$

where f_i(x), h_j(x) and g_k(x) are functions of the design vector x = (x_1, x_2, ..., x_n)^T. Here the components x_i of x are called design or decision variables, and they can be real continuous, discrete, or a mix of these two. The functions f_i(x), i = 1, 2, ..., M, are called the objective functions, or simply cost functions; in the case of M = 1, there is only a single objective. The space spanned by the decision variables is called the design space or search space. The equalities for h_j and inequalities for g_k are called constraints. It is worth pointing out that the inequalities can also be written the other way (≥ 0), and the objectives can also be formulated as a maximization problem.

Various algorithms may be used for solving optimization problems. The conventional or classic algorithms are mostly deterministic. For instance, the simplex method in linear programming is deterministic. Some other deterministic optimization algorithms, such as the Newton-Raphson algorithm, use gradient information and are called gradient-based algorithms. Non-gradient-based, or gradient-free/derivative-free, algorithms use only the function values, not any derivatives (Yang 2011b). Heuristics and metaheuristics are the main types of stochastic algorithms, and the difference between them is negligible. Heuristic means 'to find' or 'to discover by trial and error'. Quality solutions to a tough optimization problem can be found in a reasonable amount of time, but there is no guarantee that optimal solutions are reached; the hope is that these algorithms work most of the time, but not necessarily all the time. This is acceptable when easily reachable good solutions, not necessarily the best solutions, are needed (Yang 2010a; Koziel and Yang 2011). As discussed earlier in this chapter, metaheuristic optimization algorithms are often inspired by nature. According to their source of inspiration, metaheuristic algorithms can be classified into different categories, as shown in Figure 4. The main category is the biology-inspired algorithms, which generally use biological evolution and/or the collective behavior of animals. Science is another source of inspiration for metaheuristics; these algorithms are usually inspired by physics and chemistry. Moreover, art-inspired algorithms have been successful for global optimization; they are generally inspired by the behavior of artists, such as musicians and architects, when creating artistic works. Finally, socially inspired algorithms form another category; these algorithms simulate social behavior to solve optimization problems.

Figure 4 Sources of inspiration in metaheuristic optimization algorithms.

Although there are different sources of inspiration for metaheuristic optimization algorithms, the algorithms have similarities in their structures. Therefore, they can also be classified into two main categories: evolutionary algorithms and swarm algorithms.

4.1. Evolutionary Algorithms

Evolutionary algorithms generally use an iterative procedure, based on a biological evolution process, to solve optimization problems. Some of these algorithms are described below.

4.1.1. Genetic Algorithm
The genetic algorithm (GA) is a powerful optimization method based on the principles of genetics and natural selection (Holland 1975). Holland (1975) was the first to use crossover and recombination, mutation, and selection in the study of adaptive and artificial systems. These genetic operators form the essential part of GA for problem-solving. Up to now, many variants of GA have been developed and applied to a wide range of optimization problems (Rani et al. 2012; Nikjoofar and Zarghami 2012). One of the main advantages is that GA is a gradient-free method with the flexibility to deal with various types of optimization, whether the objective function is stationary or non-stationary, linear or nonlinear, continuous or discontinuous, or with random noise. In GA, a population can simultaneously search many directions of the search space, because the multiple offspring in the population act like independent agents. This feature makes the algorithm well suited to parallel implementation. Further, different parameters and groups of encoded strings can be manipulated at the same time. Despite these advantages, GAs have some disadvantages pertaining to the formulation of the fitness function, the choice of population size and other important parameters, and the selection criteria for the new population. The convergence of GA can depend heavily on the appropriate choice of these parameters.
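For concreteness, the following minimal Python sketch uses the genetic operators named above (tournament selection, one-point crossover, bit-flip mutation). The parameter values and the toy objective are our illustrative choices, not from the cited references.

    import random

    def ga(fitness, n_bits=16, pop_size=30, gens=100, pm=0.02):
        pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
        for _ in range(gens):
            nxt = []
            for _ in range(pop_size):
                # tournament selection of two parents (minimization)
                p1 = min(random.sample(pop, 3), key=fitness)
                p2 = min(random.sample(pop, 3), key=fitness)
                cut = random.randrange(1, n_bits)                     # one-point crossover
                child = p1[:cut] + p2[cut:]
                child = [b ^ (random.random() < pm) for b in child]   # bit-flip mutation
                nxt.append(child)
            pop = nxt
        return min(pop, key=fitness)

    # Toy objective: minimize the number of ones in the bit string
    print(ga(lambda bits: sum(bits)))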

4.1.2. Differential Evolution
Differential evolution (DE) was developed by Storn and Price (1997). It is a vector-based evolutionary algorithm and can be considered a further development of genetic algorithms. It is a stochastic search algorithm with a self-organizing tendency that does not use derivative information. DE carries out operations over each component (each dimension of the solution). Solutions are represented as vectors, and mutation and crossover are carried out using these vectors (Gandomi et al. 2012a). For example, in genetic algorithms, mutation is carried out at one site or multiple sites of a chromosome, while in differential evolution a difference vector of two randomly chosen vectors is used to perturb an existing vector. Such vectorized mutation can be viewed as a self-organizing search, directed towards optimality (Yang, 2008; Yang 2010a). This kind of perturbation is carried out over each population vector and thus can be expected to be more efficient. Similarly, crossover is a vector-based, component-wise exchange of chromosomes or vector segments.
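The following Python sketch illustrates the classic DE/rand/1/bin scheme implied above (difference-vector mutation plus binomial crossover with greedy selection); the parameter values and the sphere test function are illustrative.

    import random

    def de(obj, dim=5, np_=20, F=0.8, CR=0.9, gens=200, lo=-5.0, hi=5.0):
        pop = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(np_)]
        for _ in range(gens):
            for i in range(np_):
                a, b, c = random.sample([p for j, p in enumerate(pop) if j != i], 3)
                jrand = random.randrange(dim)
                trial = [a[j] + F * (b[j] - c[j])              # difference-vector mutation
                         if j == jrand or random.random() < CR
                         else pop[i][j]                        # binomial crossover
                         for j in range(dim)]
                if obj(trial) <= obj(pop[i]):                  # greedy selection
                    pop[i] = trial
        return min(pop, key=obj)

    print(de(lambda x: sum(v * v for v in x)))   # sphere function as a stand-in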

4.1.3. Harmony Search
The harmony search (HS) algorithm is a music-inspired algorithm based on the improvisation process of a musician (Geem et al. 2001). Previous reviews of the HS literature have focused on applications in civil engineering, such as engineering optimization (Lee and Geem 2005), design of structures (Lee et al. 2005), design of water distribution networks (Geem 2007), geometry design of geodesic domes (Saka 2007), design of steel frames (Degertekin 2008a, b), groundwater management problems (Ayvaz and Elci 2012), and geotechnical engineering problems (Cheng and Geem 2012). The HS algorithm includes a number of optimization operators, such as the harmony memory (HM), the harmony memory size (HMS), the harmony memory considering rate (HMCR), and the pitch adjusting rate (PAR). In the HS algorithm, the HM stores feasible vectors, which are all in the feasible space, and the harmony memory size determines the number of vectors to be stored. During the optimization process, a new harmony vector is generated from the HM based on memory considerations, pitch adjustments, and randomization. If the new harmony vector is better than the worst harmony in the HM, judged in terms of the objective function value, the new harmony is included in the HM and the existing worst harmony is excluded from it. Pitch adjustment plays a role similar to the mutation operator in genetic algorithms, but it is limited to a certain local pitch adjustment and thus corresponds to a local search. The use of randomization drives the system further, to explore various regions with high solution diversity so as to find the global optimum.
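A Python sketch of one HS improvisation step with the HM, HMCR and PAR operators follows; the bandwidth, bounds and parameter values are illustrative assumptions.

    import random

    def improvise(HM, HMCR=0.9, PAR=0.3, bw=0.1, lo=0.0, hi=1.0):
        """Generate one new harmony from the harmony memory HM."""
        new = []
        for j in range(len(HM[0])):
            if random.random() < HMCR:                 # memory consideration
                value = random.choice(HM)[j]
                if random.random() < PAR:              # pitch adjustment (local move)
                    value += random.uniform(-bw, bw)
            else:                                      # pure randomization
                value = random.uniform(lo, hi)
            new.append(min(hi, max(lo, value)))
        return new

    def hs_step(HM, obj):
        """Replace the worst harmony if the new one is better (minimization)."""
        new = improvise(HM)
        worst = max(range(len(HM)), key=lambda i: obj(HM[i]))
        if obj(new) < obj(HM[worst]):
            HM[worst] = new

    HM = [[random.random() for _ in range(3)] for _ in range(5)]
    for _ in range(1000):
        hs_step(HM, lambda x: sum(v * v for v in x))
    print(min(HM, key=lambda x: sum(v * v for v in x)))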

4.2. Swarm-Intelligence-Based Algorithms
Swarm-intelligence-based algorithms use the collective behavior of animals such as birds, insects or fish. Here, we briefly introduce some of the most widely used swarm algorithms.

4.2.1. Particle Swarm Optimization

The particle swarm optimization (PSO) algorithm, inspired by the simulation of social behavior, was initially proposed by Kennedy and Eberhart (1995). PSO uses the idea that social sharing of information among the members of a population may provide an evolutionary advantage (Kennedy et al. 2001), and it has been applied to many real-world problems (Yang 2008; Yang 2010a; Talatahari et al. 2013). A standard PSO algorithm is initialized with a population (swarm) of random potential solutions (particles). Each particle iteratively moves across the search space and is attracted to the position of the best fitness historically achieved by the particle itself (local best) and by the best among the neighbors of the particle (global best) (Kaveh and Talatahari 2009a). In fact, in PSO, instead of using more traditional genetic operators, each particle adjusts its flight according to its own flying experience and its companions' flying experience (Kaveh and Talatahari 2008, 2009b; Hadidi et al. 2011). The original PSO, developed by Kennedy and Eberhart (1995), used the following two equations:

$$v_i^d(k+1) = v_i^d(k) + c_1 \cdot rand_{1i}^d \cdot \big(pbest_i^d(k) - x_i^d(k)\big) + c_2 \cdot rand_{2i}^d \cdot \big(gbest^d(k) - x_i^d(k)\big) \qquad (12)$$

$$x_i^d(k+1) = x_i^d(k) + v_i^d(k+1) \qquad (13)$$

where the first equation calculates the velocity of each particle according to three factors: (1) the previous velocity v_i^d(k); (2) the direction of the best position pbest_i^d(k) visited by the particle itself; and (3) the direction of the best position of the swarm gbest^d(k) up to iteration k. Eq. (13) updates each particle's position in the search space. In these equations, x_i and v_i represent the current position and the velocity of the ith particle, respectively; rand_{1i}^d and rand_{2i}^d are random numbers between 0 and 1; gbest^d(k) corresponds to the global best position of the swarm up to iteration k; and c_1 and c_2 are the cognitive and social parameters, respectively. After many numerical simulations, Eberhart and Shi (1998) added a weighting/inertia factor w to Eq. (12) to control the trade-off between the global exploration and local exploitation abilities of the flying particles:

$$v_i^d(k+1) = w \, v_i^d(k) + c_1 \cdot rand_{1i}^d \cdot \big(pbest_i^d(k) - x_i^d(k)\big) + c_2 \cdot rand_{2i}^d \cdot \big(gbest^d(k) - x_i^d(k)\big) \qquad (14)$$

A well-chosen inertia weight can stabilize the swarm as well as speed up convergence. With a linearly decreasing inertia weight, the PSO may lack global search ability at the end of a run, even when global search is required to jump out of a local minimum. Nevertheless, the results reported in the literature illustrate that a linearly decreasing inertia weight can improve the performance of PSO greatly and give better results than both the simple PSO and evolutionary programming, as reported in Angeline (1998) and Shi and Eberhart (1999).
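For concreteness, the following Python sketch implements Eqs. (13) and (14) with an inertia weight; the parameter values follow common practice rather than any specific reference.

    import random

    def pso(obj, dim=2, n=20, iters=200, w=0.7, c1=1.5, c2=1.5):
        x = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n)]
        v = [[0.0] * dim for _ in range(n)]
        pbest = [xi[:] for xi in x]
        gbest = min(pbest, key=obj)
        for _ in range(iters):
            for i in range(n):
                for d in range(dim):
                    v[i][d] = (w * v[i][d]                                       # inertia
                               + c1 * random.random() * (pbest[i][d] - x[i][d])  # cognitive
                               + c2 * random.random() * (gbest[d] - x[i][d]))    # social
                    x[i][d] += v[i][d]                                           # Eq. (13)
                if obj(x[i]) < obj(pbest[i]):
                    pbest[i] = x[i][:]
            gbest = min(pbest, key=obj)
        return gbest

    print(pso(lambda p: sum(t * t for t in p)))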

4.2.2. Ant Colony Optimization
In 1992, Dorigo developed a paradigm known as ant colony optimization (ACO), a cooperative search technique that mimics the foraging behavior of real-life ant colonies (Dorigo 1992; Dorigo et al. 1996). Ant algorithms mimic the characteristics of real ants, which can rapidly establish the shortest route from a food source to their nest and vice versa. Ants start searching the area surrounding their nest in a random manner. Ethologists have observed that ants can construct the shortest path from their colony to a food source and back using pheromone trails (Deneubourg and Goss 1989; Goss et al. 1990). When ants encounter an obstacle, there is at first an equal probability for all ants to move right or left, but after a while the number of ants choosing the shorter path increases because of the increased amount of pheromone on that path. With the increase in the number of ants and the pheromone on the shorter path, eventually all of the ants choose and move along the shorter one (Talatahari et al. 2012a; Kaveh and Talatahari 2010a). In fact, real ants use their pheromone trails as a medium for communicating information among themselves. When an isolated ant comes across a food source in its random sojourn, it deposits a quantity of pheromone at that location. Other randomly moving ants in the neighborhood can detect this marked pheromone trail, follow it with a very high probability, and simultaneously enhance the trail by depositing their own pheromone. More and more ants follow the pheromone-rich trail, and the probability of the trail being followed by other ants is further enhanced by the increased trail deposition. This is an autocatalytic (positive feedback) process, which favors the path along which more ants have previously traversed. Ant algorithms are based on this indirect communication capability of ants. In ACO algorithms, virtual ants are deputed to generate rules by using heuristic information, or visibility, and the principle of indirect pheromone communication for the iterative improvement of rules. The general procedure of the ACO algorithm schedules three steps: initialization, solution construction, and pheromone updating. The initialization of ACO includes two parts: the first consists mainly of the initialization of the pheromone trail; second, a number of ants are placed on randomly chosen nodes. Each of the distributed ants then performs a tour on the graph by constructing a path according to the node transition rule described next. To generate a solution, each ant constructs a complete solution to the problem according to a probabilistic state transition rule, which depends mainly on the state of the pheromone and the visibility of the ants; visibility is an additional ability used to make the method more efficient. When every ant has constructed a solution, the intensity of the pheromone trails on each edge is updated by the pheromone updating rule, which is applied in two phases: first, an evaporation phase, where a fraction of the pheromone evaporates, and then a reinforcement phase, where the elitist ant, which has the best solution among the others, deposits an amount of pheromone. At the end of each movement, a local pheromone update reduces the level of the pheromone trail on the paths selected by the ant colony during the preceding iteration.
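The following toy Python sketch illustrates the probabilistic transition and the two-phase pheromone update (evaporation and reinforcement) for a choice between two paths. It omits the visibility term and elitism, and all settings are illustrative.

    import random

    def aco_two_paths(lengths, n_ants=10, iters=50, rho=0.5, Q=1.0):
        """Two candidate paths; the shorter one accumulates more pheromone."""
        tau = [1.0, 1.0]                         # pheromone on each path
        for _ in range(iters):
            chosen = []
            for _ in range(n_ants):
                p0 = tau[0] / (tau[0] + tau[1])  # probabilistic state transition
                chosen.append(0 if random.random() < p0 else 1)
            tau = [t * (1 - rho) for t in tau]   # evaporation phase
            for path in chosen:                  # reinforcement phase
                tau[path] += Q / lengths[path]   # shorter path -> larger deposit
        return tau

    print(aco_two_paths([1.0, 2.0]))             # pheromone concentrates on path 0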

4.2.3. Bee Algorithms
Bee algorithms are another class of metaheuristic algorithms, mimicking the behavior of bees (Yang, 2005; Karaboga, 2005; Yang 2008). Different variants of bee algorithms use slightly different characteristics of bee behavior. For example, in the honeybee-based algorithms, forager bees are allocated to different food sources (or flower patches) so as to maximize the total nectar intake (Nakrani and Tovey, 2004; Yang, 2005; Karaboga, 2005; Pham et al., 2006). In the virtual bee algorithm (VBA), developed by Xin-She Yang in 2005, pheromone concentrations can be linked with the objective functions more directly (Yang, 2005). The artificial bee colony (ABC) algorithm, developed by Karaboga (2005, 2010) and later advocated by Basturk and Karaboga (2006), is based on the foraging behavior of honey bees. In the ABC algorithm, the bees in a colony are divided into three groups: employed bees (forager bees), onlooker bees (observer bees) and scouts. Unlike the honey bee algorithm, which has two groups of bees (forager bees and observer bees), the bees in ABC are more specialized (Afshar et al., 2007; Karaboga, 2005). The first half of the colony consists of the employed artificial bees and the second half of the onlookers. The position of a food source represents a possible solution to the optimization problem, and the nectar amount of the food source corresponds to the quality (fitness) of the associated solution. In the first step, the ABC algorithm generates a randomly distributed initial population of a predefined size. After initialization, the population of positions (solutions) is subjected to repeated cycles of the search processes of the employed bees, onlooker bees and scout bees. An employed bee produces a modification of the position (solution) in its memory depending on local information (visual information) and tests the nectar amount (fitness value) of the new food source (new solution). Provided that the nectar amount of the new source is higher than that of the previous one, the bee memorizes the new position and forgets the old one; otherwise it keeps the position of the previous source in its memory. When all the employed bees have completed the search process, they share the nectar and position information of the food sources with the onlooker bees in the dance area.
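A Python sketch of the employed-bee move with greedy replacement is given below. The neighborhood perturbation v_ij = x_ij + φ (x_ij − x_kj) is the one commonly used in ABC; the bounds and objective are illustrative, and the onlooker and scout phases are omitted for brevity.

    import random

    def employed_bee_step(foods, i, obj):
        """Perturb food source i along one dimension towards/away from a neighbour k."""
        dim = len(foods[i])
        k = random.choice([j for j in range(len(foods)) if j != i])
        j = random.randrange(dim)
        phi = random.uniform(-1.0, 1.0)
        candidate = foods[i][:]
        candidate[j] = foods[i][j] + phi * (foods[i][j] - foods[k][j])
        if obj(candidate) < obj(foods[i]):       # greedy selection: keep the better source
            foods[i] = candidate

    foods = [[random.uniform(-5, 5) for _ in range(3)] for _ in range(10)]
    sphere = lambda x: sum(v * v for v in x)
    for _ in range(1000):
        for i in range(len(foods)):
            employed_bee_step(foods, i, sphere)
    print(min(foods, key=sphere))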

4.2.4 Firefly Algorithm
The Firefly Algorithm (FA) was first developed by Xin-She Yang (Yang, 2008; Yang, 2009) and is based on the flashing patterns and behaviour of fireflies. In essence, FA uses the following three idealized rules: 1) Fireflies are unisexual, so one firefly will be attracted to other fireflies regardless of their sex. 2) The attractiveness is proportional to the brightness, and both decrease as the distance increases. Thus, for any two flashing fireflies, the less bright one will move towards the brighter one; if there is no firefly brighter than a particular firefly, it moves randomly. 3) The brightness of a firefly is determined by the landscape of the objective function. As a firefly's attractiveness is proportional to the light intensity seen by adjacent fireflies, we can define the variation of attractiveness β with the distance r. The movement of a firefly i attracted to another, more attractive (brighter) firefly j is determined by

$$\mathbf{x}_i^{t+1} = \mathbf{x}_i^t + \beta_0 e^{-\gamma r_{ij}^2} (\mathbf{x}_j^t - \mathbf{x}_i^t) + \alpha \, \boldsymbol{\epsilon}_i^t, \qquad (18)$$

where β_0 is the attractiveness at r = 0. The second term is due to the attraction, while the third term is randomization, with α being the randomization parameter and ε_i^t a vector of random numbers drawn from a Gaussian or uniform distribution at time t. Furthermore, the randomization ε_i^t can easily be extended to other distributions such as Lévy flights. A demo version of the firefly algorithm implementation, without Lévy flights, can be found at the Mathworks file exchange web site.1 The firefly algorithm has attracted much attention (Sayadi et al., 2010; Apostolopoulos and Vlachos, 2011; Gandomi et al., 2011b; Yang et al., 2012). A discrete version of FA can efficiently solve NP-hard scheduling problems (Sayadi et al., 2010), while a detailed analysis has demonstrated the efficiency of FA over a wide range of test problems, including multiobjective load dispatch problems (Apostolopoulos and Vlachos, 2011). A chaos-enhanced firefly algorithm with a basic method for automatic parameter tuning has also been developed (Yang, 2011b), and the use of various chaotic maps can significantly improve the performance of the firefly algorithm, though different chaotic maps may have different effects (Gandomi et al., 2012c).
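A hedged Python sketch of the firefly move in Eq. (18), with Gaussian randomization, is given below; the parameter values are illustrative, and brightness is simply taken as the negative of the cost for minimization.

    import math, random

    def firefly_step(x, obj, beta0=1.0, gamma=1.0, alpha=0.1):
        """One sweep: each firefly moves towards every brighter one (minimization)."""
        n, dim = len(x), len(x[0])
        bright = [-obj(xi) for xi in x]                     # brightness ~ -cost
        for i in range(n):
            for j in range(n):
                if bright[j] > bright[i]:
                    r2 = sum((x[i][d] - x[j][d]) ** 2 for d in range(dim))
                    beta = beta0 * math.exp(-gamma * r2)    # attractiveness term
                    for d in range(dim):
                        x[i][d] += (beta * (x[j][d] - x[i][d])
                                    + alpha * random.gauss(0, 1))   # Eq. (18)

    x = [[random.uniform(-5, 5) for _ in range(2)] for _ in range(15)]
    sphere = lambda p: sum(v * v for v in p)
    for _ in range(100):
        firefly_step(x, sphere)
    print(min(x, key=sphere))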

1 http://www.mathworks.com/matlabcentral/fileexchange/29693-firefly-algorithm

4.2.5 Cuckoo Search
Cuckoo search (CS) is one of the latest nature-inspired metaheuristic algorithms, developed in 2009 by Xin-She Yang and Suash Deb (Yang and Deb, 2009). CS is based on the brood parasitism of some cuckoo species. In addition, this algorithm is enhanced by so-called Lévy flights (Pavlyukevich, 2007), rather than by simple isotropic random walks. Recent studies show that CS is potentially far more efficient than PSO and genetic algorithms (Yang and Deb, 2010). For simplicity in describing the standard cuckoo search, we use the following three idealized rules: 1) Each cuckoo lays one egg at a time and dumps it in a randomly chosen nest. 2) The best nests with high-quality eggs will be carried over to the next generations. 3) The number of available host nests is fixed, and the egg laid by a cuckoo is discovered by the host bird with a probability p_a ∈ [0, 1]; in this case, the host bird can either get rid of the egg or simply abandon the nest and build a completely new one. As a further approximation, this last assumption can be implemented by replacing a fraction p_a of the n host nests with new nests (with new random solutions). From the implementation point of view, we can use the simple representation that each egg in a nest represents a solution and each cuckoo can lay only one egg (thus representing one solution); the aim is to use the new and potentially better solutions (cuckoos) to replace the not-so-good solutions in the nests. Obviously, this algorithm can be extended to the more complicated case where each nest has multiple eggs representing a set of solutions. For this introduction, we will use the simplest approach, where each nest has only a single egg; in this case, there is no distinction between an egg, a nest and a cuckoo, as each nest corresponds to one egg, which also represents one cuckoo. The algorithm uses a balanced combination of a local random walk and a global explorative random walk, controlled by a switching parameter p_a. The local random walk can be written as

x ti 1= x ti   s  H ( pa   )  ( x tj  x tk ),

(20)

t where x j and x tk are two different solutions selected randomly by random permutation, H (u) is a

Heaviside function,  is a random number drawn from a uniform distribution, and s is the step size. On the other hand, the global random walk is carried out by using Lévy flights

x ti 1= x ti   L( s,  ), L( s,  ) =

( ) sin( / 2) 1 , ( s ? s0 > 0).  s1 

(22)

Here  > 0 is the step size scaling factor, which should be related to the scales of the problem of interests. In most cases, we can use  = O( L /10) where L is the characteristic scale of the problem of interest, while in some case  = O( L /100) can be more effective and avoid flying too far. The above equation is essentially the stochastic equation for a random walk. In general, a random walk is a Markov chain whose next status/location only depends on the current location (the first term in the above equation) and the transition probability (the second term). However, a substantial fraction of the new solutions should be generated by far field randomization and whose locations should be far enough from
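The following Python sketch illustrates one CS generation: a global Lévy-flight move (drawn with Mantegna's algorithm, a common way to generate Lévy steps) plus the abandonment of a fraction p_a of nests. The step scaling and bounds are illustrative simplifications of the scheme above.

    import math, random

    def levy_step(lam=1.5):
        """Mantegna's algorithm for a Levy-stable step of index lam."""
        sigma = (math.gamma(1 + lam) * math.sin(math.pi * lam / 2)
                 / (math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
        u, v = random.gauss(0, sigma), random.gauss(0, 1)
        return u / abs(v) ** (1 / lam)

    def cuckoo_generation(nests, obj, pa=0.25, alpha=0.01):
        n, dim = len(nests), len(nests[0])
        best = min(nests, key=obj)
        for i in range(n):
            # global Levy flight biased by the distance to the current best
            cand = [nests[i][d] + alpha * levy_step() * (nests[i][d] - best[d])
                    for d in range(dim)]
            if obj(cand) < obj(nests[i]):
                nests[i] = cand
            if random.random() < pa and nests[i] is not best:   # abandoned nests
                nests[i] = [random.uniform(-5, 5) for _ in range(dim)]

    nests = [[random.uniform(-5, 5) for _ in range(2)] for _ in range(15)]
    sphere = lambda p: sum(v * v for v in p)
    for _ in range(200):
        cuckoo_generation(nests, sphere)
    print(min(nests, key=sphere))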

A Matlab implementation is given by the author and can be downloaded from the Mathworks file exchange web site (www.mathworks.com/matlabcentral/fileexchange/29809-cuckoo-search-cs-algorithm). Cuckoo search is very efficient in solving engineering optimization problems (Gandomi et al., 2012c; Gandomi et al., 2012d).

4.2.6 Bat Algorithm
The bat algorithm (BA) is a relatively new metaheuristic, developed by Xin-She Yang in 2010 (Yang, 2010b). It was inspired by the echolocation behaviour of microbats. Microbats use a type of sonar, called echolocation, to detect prey, avoid obstacles, and locate their roosting crevices in the dark. These bats emit a very loud sound pulse and listen for the echo that bounces back from the surrounding objects. Their pulses vary in properties and can be correlated with their hunting strategies, depending on the species. Most bats use short, frequency-modulated signals to sweep through about an octave, while others more often use constant-frequency signals for echolocation. Their signal bandwidth varies with the species and is often increased by using more harmonics. The bat algorithm uses three idealized rules: 1) All bats use echolocation to sense distance, and they also 'know' the difference between food/prey and background barriers in some magical way; 2) A bat flies randomly with velocity v_i at position x_i with a fixed frequency range [f_min, f_max], varying its emission rate r ∈ [0, 1] and loudness A_0 to search for prey, depending on the proximity of the target; 3) Although the loudness can vary in many ways, we assume that it varies from a large (positive) A_0 to a minimum constant value A_min. The above rules can be translated into the following formulas:

fi  f min  ( f max  f min ) , vit 1  vit  ( xit  x*) f i ,

xit 1  xit  vit ,

(5)

where  is a random number drawn from a uniform distribution, and x* is the current best solution found so far during iterations. The loudness and pulse rate can vary with iteration t in the following way: Ait 1   Ait , rit  ri0 [1  exp(  t )]. (6) Here α and γ are constants. In fact, α is similar to the cooling factor of a cooling schedule in the simulated annealing to be discussed later. In the simplest case, we can use α = , and we have in fact used α =  = 0.9 in most simulations. BA has been extended to multiobjective bat algorithm (MOBA) by Yang (2011d), and preliminary results suggested that it is very efficient (Yang and Gandomi, 2012; Gandomi et al., 2012e).

4.2.7. Charged System Search

The charged system search (CSS) is one of the most recently introduced metaheuristic algorithms (Kaveh and Talatahari, 2010b). It has been used to solve different types of optimization problems, such as the design of skeletal structures (Kaveh and Talatahari, 2010c), the design of grillage systems (Kaveh and Talatahari, 2010d), parameter identification of MR dampers (Talatahari et al., 2012b), and the design of composite open channels (Kaveh et al., 2012). The algorithm was inspired by the governing laws of charged systems. Like the swarm algorithms, CSS uses multiple agents, and each agent can be considered as a charged sphere. These agents are treated as charged particles (CPs) that can affect each other according to the Coulomb and Gauss laws of electrostatics, while their motion is governed by Newtonian mechanics. CPs impose electrical forces on one another, and these forces vary with the separation distance between the CPs: for a CP located outside a charged sphere, the force is inversely proportional to the square of the separation distance between the particles. At each iteration, each CP moves to its new position considering the resultant forces and its previous velocity. If a CP exits the allowable search space, its position is corrected using the harmony search-based handling approach described by Kaveh and Talatahari (2009c). In addition, to store the best designs, a memory (the charged memory) is used, containing the positions of a number (CMS) of the best agents found so far.
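As a small illustration of this force model (not Kaveh and Talatahari's full implementation, which additionally weights charges by fitness and includes velocity and memory updates), the sketch below evaluates the force magnitude that one charged sphere exerts at separation r, assuming unit charges and a common sphere radius a; the linear behaviour inside the sphere follows from Gauss's law for a uniformly charged sphere.

```python
import numpy as np

def css_force_magnitude(r, q_i=1.0, q_j=1.0, a=1.0):
    """Force magnitude between two charged spheres at separation r.

    Inside the sphere (r < a) the field grows linearly with r (Gauss's law);
    outside (r >= a) it decays with the inverse square of r (Coulomb's law).
    """
    r = np.asarray(r, dtype=float)
    return np.where(r < a,
                    q_i * q_j * r / a ** 3,    # linear region inside the sphere
                    q_i * q_j / r ** 2)        # inverse-square region outside

# Usage: force profile over a range of separations.
print(css_force_magnitude(np.linspace(0.1, 3.0, 5)))
```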
4.2.8. Krill Herd

Krill herd (KH) is another novel biologically-inspired algorithm, proposed by Gandomi and Alavi (2012a). The KH algorithm is based on simulating the herding behavior of krill individuals. The minimum distances of each krill individual from food and from the highest density of the herd are considered as the objectives for the Lagrangian movement. The time-dependent positions of the krill individuals are updated by three main components: i) movement induced by the presence of other individuals; ii) foraging activity; and iii) random diffusion. The algorithm is gradient-free, because no derivatives are needed, and it is also a metaheuristic, because it uses stochastic/random search in addition to some deterministic components. For any metaheuristic algorithm, it is important to tune its related parameters. One of the interesting parts of the KH algorithm is that it simulates krill behavior quite closely, and the values of its coefficients were based on empirical studies of real-world krill systems. For this reason, only the time interval needs fine-tuning in the KH algorithm. This can be considered a first attempt to use a real-world system to derive algorithm-dependent parameters, which can be advantageous. The preliminary results indicate that the KH method is very encouraging for further application to optimization tasks.
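Since the full KH formulation is fairly involved, the following deliberately simplified Python sketch shows only the shape of the three-component Lagrangian update; the herd-centre attraction standing in for the induced motion, the combined food/best attraction standing in for foraging, and all coefficient values are illustrative placeholders, whereas the actual algorithm of Gandomi and Alavi (2012a) derives these terms from neighbour densities, fitness values, and empirically fixed constants.

```python
import numpy as np

def krill_step(X, N_prev, F_prev, food, best, dt=0.5,
               N_max=0.01, V_f=0.02, D_max=0.005, w_n=0.5, w_f=0.5):
    # Lagrangian motion: dX/dt = N + F + D, then X <- X + dt * dX/dt.
    n, dim = X.shape
    # (i) Motion induced by other krill: drift toward the herd centre,
    #     a stand-in for the density-based induced-motion term.
    N = N_max * (X.mean(axis=0) - X) + w_n * N_prev
    # (ii) Foraging: attraction toward the food location and the best krill so far.
    F = V_f * ((food - X) + (best - X)) + w_f * F_prev
    # (iii) Random physical diffusion.
    D = D_max * np.random.uniform(-1.0, 1.0, (n, dim))
    return X + dt * (N + F + D), N, F

# Usage: advance a herd of 30 krill in 10 dimensions by one time step.
X = np.random.uniform(-5.0, 5.0, (30, 10))
N0 = np.zeros_like(X)
F0 = np.zeros_like(X)
X, N0, F0 = krill_step(X, N0, F0, food=np.zeros(10), best=X[0])
```

Consistent with the remark above, the time interval dt is the main quantity a user would tune in practice.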
5. Challenges in Metaheuristics

As we have seen from this review, metaheuristic algorithms have been used successfully to solve a variety of real-world problems. However, some challenging issues concerning metaheuristics remain. First, the theoretical analysis of these algorithms still lacks a unified framework, and there are many open problems, as outlined by Yang in a recent review (Yang, 2011c). For example, in what ways do algorithm-dependent parameters affect the efficiency of an algorithm? What is the optimal balance between exploration and exploitation so that a metaheuristic algorithm performs most efficiently? How can memory in an algorithm help to improve its performance?

Another important issue is the gap between theory and practice, because metaheuristic applications are expanding rapidly, far ahead of mathematical analysis. At the same time, most applications concern small-scale problems. Future applications and studies should focus on large-scale problems.

On the other hand, many new algorithms keep appearing, and more algorithms make it even harder to understand the working mechanisms of metaheuristics in general. We may need a unified approach to analyze, and ideally to classify, these algorithms, so that we can understand all metaheuristics in a more insightful way (Yang, 2011c). These challenges also provide timely research opportunities, so that important progress can be made in the near future.

REFERENCES

Afandizadeh-Zargari S., Zabihi S., Alavi A.H., Gandomi A.H., (2012). "A Computational Intelligence Based Approach for Short-Term Traffic Flow Prediction." Expert Systems, 29(2), 124–142.
Afshar, A., Haddad, O.B., Marino, M.A., Adams, B.J., (2007). Honey-bee mating optimization (HBMO) algorithm for optimal reservoir operation, Journal of the Franklin Institute, 344, 452–462.
Alavi A.H., Gandomi A.H., Bolury J., Mollahasani A., (2012). "Linear and Tree-Based Genetic Programming for Solving Geotechnical Engineering Problems." Chapter 12 in Metaheuristics in Water Resources, Geotechnical and Transportation Engineering, X.S. Yang et al. (Eds.), Elsevier, 289–310.

Alavi A.H., Gandomi A.H., (2011a). "Prediction of Principal Ground-Motion Parameters Using a Hybrid Method Coupling Artificial Neural Networks and Simulated Annealing." Computers and Structures, 89(23-24), 2176–2194.
Alavi A.H., Gandomi A.H., (2011b). "A Robust Data Mining Approach for Formulation of Geotechnical Engineering Systems." Engineering Computations: International Journal for Computer-Aided Engineering and Software, 28(3), 242–274.
Alavi A.H., Gandomi A.H., Gandomi M., Sadat Hosseini S.S., (2009). "Prediction of Maximum Dry Density and Optimum Moisture Content of Stabilized Soil Using RBF Neural Networks." The IES Journal Part A: Civil & Structural Engineering, 2(2), 98–106.
Alavi A.H., Gandomi A.H., Mollahasani A., Heshmati A.A.R., Rashed A., (2010a). "Modeling of Maximum Dry Density and Optimum Moisture Content of Stabilized Soil Using Artificial Neural Networks." Journal of Plant Nutrition and Soil Science, 173(3), 368–379.
Alavi A.H., Gandomi A.H., Sahab M.G., Gandomi M., (2010b). "Multi Expression Programming: A New Approach to Formulation of Soil Classification." Engineering with Computers, 26(2), 111–118.
Angeline P.J., (1998). Evolutionary optimization versus particle swarm optimization: philosophy and performance differences, in: Proceedings of the Annual Conference on Evolutionary Programming, San Diego, pp. 601–610.
Apostolopoulos, T. and Vlachos, A., (2011). Application of the Firefly Algorithm for Solving the Economic Emissions Load Dispatch Problem, International Journal of Combinatorics, Volume 2011, Article ID 523806. http://www.hindawi.com/journals/ijct/2011/523806.html (Date accessed: 14 Jan 2012).
Ayvaz M.T. and Elci A. (2012). "Application of the Hybrid HS Solver Algorithm to the Solution of Groundwater Management Problems." Chapter 4 in Metaheuristics in Water Resources, Geotechnical and Transportation Engineering, Xin-She Yang et al. (Eds.), Elsevier, 79–97.
Banzhaf, W., Nordin, P., Keller, R. and Francone, F. (1998). Genetic Programming – An Introduction: On the Automatic Evolution of Computer Programs and its Application. dpunkt/Morgan Kaufmann, Heidelberg/San Francisco.
Basturk, B., Karaboga, D., (2006). An artificial bee colony (ABC) algorithm for numeric function optimization, in: Proceedings of the IEEE Swarm Intelligence Symposium, Indianapolis, IN, USA, May 12–14.
Blum, C. and Roli, A. (2003). Metaheuristics in combinatorial optimization: overview and conceptual comparison, ACM Computing Surveys, Vol. 35, 268–308.

Boser B.E., Guyon I.M., Vapnik V.N., (1992). A training algorithm for optimal margin classifiers, in: Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, Pittsburgh, Vol. 5, pp. 144–152.
Brameier, M. and Banzhaf, W. (2007). Linear Genetic Programming. Springer Science + Business Media LLC, New York, NY.
Brameier, M. and Banzhaf, W. (2001). "A Comparison of Linear Genetic Programming and Neural Networks in Medical Data Mining", IEEE Transactions on Evolutionary Computation, Vol. 5, No. 1, pp. 17–26.
Ceven, E.K. and Ozdemir, O. (2007). Using Fuzzy Logic to Evaluate and Predict Chenille Yarn's Shrinkage Behaviour. FIBRES & TEXTILES in Eastern Europe, 15(3), 55–59.
Cheng Y.M. and Geem Z.W. (2012). "Hybrid Heuristic Optimization Methods in Geotechnical Engineering." Chapter 9 in Metaheuristics in Water Resources, Geotechnical and Transportation Engineering, X.S. Yang et al. (Eds.), Elsevier, 205–229.
Cortes C., Vapnik V. (1995). Support vector networks. Machine Learning, 20, 273–297.
Cramer, N.L. (1985). A representation for the adaptive generation of simple sequential programs, in: Genetic Algorithms and their Applications, pp. 183–187.
Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Math. Cont. Sign. Syst., 2, 303–314.
Degertekin S.O. (2008a). Optimum design of steel frames using harmony search algorithm. Structural and Multidisciplinary Optimization, 36, 393–401.
Degertekin S.O. (2008b). Harmony search algorithm for optimum design of steel frame structures: a comparative study with other optimization methods. Structural Engineering and Mechanics, 29, 391–410.
Deneubourg, J.L. and Goss, S. (1989). "Collective patterns and decision-making", Ethology Ecology & Evolution, Vol. 1, pp. 295–311.
Dorigo, M. (1992). "Optimization, learning and natural algorithms", PhD thesis, Dip. Elettronica e Informazione, Politecnico di Milano, Milano.
Dorigo, M., Maniezzo, V. and Colorni, A. (1996). "The ant system: optimization by a colony of cooperating agents", IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 26, No. 1, pp. 29–41.
Eberhart R.C., Kennedy J. (1995). A new optimizer using particle swarm theory, in: Proceedings of the Sixth International Symposium on Micro Machine and Human Science, Nagoya, Japan.
Ferreira, C. (2001). "Gene expression programming: a new adaptive algorithm for solving problems." Complex Systems, 13(2), 87–129.

Ferreira, C. (2006). Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence, 2nd Ed., Springer-Verlag, Germany.
Francone, F.D. and Deschaine, L.M. (2004). "Extending the boundaries of design optimization by integrating fast optimization techniques with machine-code-based, linear genetic programming", Information Sciences, Vol. 161, pp. 99–120.
Friedberg, R.M. (1958). "A learning machine: Part I", IBM Journal of Research and Development, Vol. 2, 2–13.
Gandomi A.H., Alavi A.H., (2011). "Multi-Stage Genetic Programming: A New Strategy to Nonlinear System Modeling." Information Sciences, 181(23), 5227–5239.
Gandomi A.H., Alavi A.H., (2012a). "Krill Herd: A New Bio-Inspired Optimization Algorithm." Communications in Nonlinear Science and Numerical Simulation, in press. [DOI: 10.1016/j.cnsns.2012.05.010]
Gandomi A.H., Alavi A.H., (2012b). "A New Multi-Gene Genetic Programming Approach to Nonlinear System Modeling. Part I: Materials and Structural Engineering Problems." Neural Computing and Applications, 21(1), 171–187.
Gandomi A.H., Alavi A.H., (2012c). "A New Multi-Gene Genetic Programming Approach to Nonlinear System Modeling. Part II: Geotechnical and Earthquake Engineering Problems." Neural Computing and Applications, 21(1), 189–201.
Gandomi A.H., Alavi A.H., Mirzahosseini M.R., Moqaddas Nejad F. (2011a). "Nonlinear Genetic-Based Models for Prediction of Flow Number of Asphalt Mixtures." Journal of Materials in Civil Engineering, ASCE, 23(3), 1–18.
Gandomi A.H., Alavi A.H., Sahab M.G., (2010). "New Formulation for Compressive Strength of CFRP Confined Concrete Cylinders Using Linear Genetic Programming." Materials and Structures, 43(7), 963–983.
Gandomi A.H., Alavi A.H., Yun G.J., (2011a). "Nonlinear Modeling of Shear Strength of SFRC Beams Using Linear Genetic Programming." Structural Engineering and Mechanics, 38(1), 1–25.
Gandomi, A.H., Yang, X.S., Alavi, A.H., (2011b). Mixed variable structural optimization using firefly algorithm, Computers and Structures, 89(23-24), 2325–2336.
Gandomi, A.H., Yang, X.S., Alavi, A.H., (2011c). Cuckoo search algorithm: a metaheuristic approach to solve structural optimization problems, Engineering with Computers, in press. [DOI: 10.1007/s00366-011-0241-y]

Gandomi A.H., Yang X.S., Talatahari S., Deb S., (2012a). "Coupled Eagle Strategy and Differential Evolution for Unconstrained and Constrained Global Optimization." Computers and Mathematics with Applications, 63(1), 191–200.
Gandomi A.H., Babanajad S.K., Alavi A.H., Farnam Y., (2012b). "A Novel Approach to Strength Modeling of Concrete under Triaxial Compression." Journal of Materials in Civil Engineering, in press. [DOI: 10.1061/(ASCE)MT.1943-5533.0000494]
Gandomi A.H., Yang X.S., Talatahari S., Alavi A.H., (2012c). "Firefly Algorithm with Chaos." Communications in Nonlinear Science and Numerical Simulation, in press. [DOI: 10.1016/j.cnsns.2012.06.009]
Gandomi A.H., Talatahari S., Yang X.S., Deb S., (2012d). "Design optimization of truss structures using cuckoo search algorithm." The Structural Design of Tall and Special Buildings, in press. [DOI: 10.1002/tal.1033]
Gandomi A.H., Yang X.S., Talatahari S., Alavi A.H., (2012e). "Bat Algorithm for Constrained Optimization Tasks." Neural Computing and Applications, in press. [DOI: 10.1007/s00521-012-1028-9]
Geem Z.W. (2006). Optimal cost design of water distribution networks using harmony search. Engineering Optimization, 38, 259–277.
Geem Z.W., Kim J.H. (2001). A new heuristic optimization algorithm: harmony search. Simulation, 76, 60–68.
Girosi, F., Poggio, T. (1990). Networks and the best approximation property. Biological Cybernetics, 63(3), 169–176.
Goh A.T.C., Goh S.H. (2007). Support vector machines: their use in geotechnical engineering as illustrated using seismic liquefaction data. Computers and Geotechnics, 34, 410–421.
Goss, S., Beckers, R., Deneubourg, J.L., Aron, S. and Pasteels, J.M. (1990). "How trail laying and trail following can solve foraging problems for ant colonies", in: Hughes R.N. (Ed.), Behavioural Mechanisms in Food Selection, NATO-ASI Series, Vol. G 20, Berlin.
Hadidi A., Kaveh A., Farahmand Azar B., Talatahari S. and Farahmandpour C., (2011). An Efficient Optimization Algorithm Based on Particle Swarm and Simulated Annealing for Space Trusses, International Journal of Optimization in Civil Engineering, 1(3), 375–395.
Haykin, S. (1999). Neural Networks – A Comprehensive Foundation, 2nd Ed., Prentice Hall Inc., Englewood Cliffs.
Holland, J. (1975). Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor.

Javadi, A.A. and Rezania, M. (2009). "Applications of artificial intelligence and data mining techniques in soil modeling", Geomechanics and Engineering, Vol. 1, No. 1, pp. 53–74.
Karaboga, D., (2005). An Idea Based on Honey Bee Swarm for Numerical Optimization, Technical Report TR06, Erciyes University, Computer Engineering Department.
Karaboga, D., (2010). Artificial bee colony algorithm. Scholarpedia, 5, 6915.
Kaveh A. and Talatahari S., (2008). A Discrete Particle Swarm Ant Colony Optimization for Design of Steel Frames, Asian Journal of Civil Engineering, 9(6), 563–575.
Kaveh A. and Talatahari S., (2009a). Hybrid Algorithm of Harmony Search, Particle Swarm and Ant Colony for Structural Design Optimization, in: Harmony Search Algorithms for Structural Design Optimization, Studies in Computational Intelligence, Vol. 239, Springer-Verlag, Berlin/Heidelberg, pp. 159–198.
Kaveh A. and Talatahari S., (2009b). A Particle Swarm Ant Colony Optimization Algorithm for Truss Structures with Discrete Variables, Journal of Constructional Steel Research, 65(8-9), 1558–1568.
Kaveh A. and Talatahari S., (2009c). Particle Swarm Optimizer, Ant Colony Strategy and Harmony Search Scheme Hybridized for Optimization of Truss Structures, Computers and Structures, 87(5-6), 267–283.
Kaveh A. and Talatahari S., (2010a). An Improved Ant Colony Optimization for Constrained Engineering Design Problems, Engineering Computations: International Journal for Computer-Aided Engineering and Software, 27(1), 155–182.
Kaveh A. and Talatahari S., (2010b). A Novel Heuristic Optimization Method: Charged System Search, Acta Mechanica, 213(3-4), 267–289.
Kaveh A. and Talatahari S., (2010c). Optimal Design of Skeletal Structures via the Charged System Search Algorithm, Structural and Multidisciplinary Optimization, 41(6), 893–911.
Kaveh A. and Talatahari S., (2010d). Charged System Search for Optimum Grillage Systems Design Using the LRFD-AISC Code, Journal of Constructional Steel Research, 66(6), 767–771.
Kennedy J., Eberhart R.C., Shi Y. (2001). Swarm Intelligence. Morgan Kaufmann Publishers, San Francisco, CA.
Kennedy, J. and Eberhart, R. (1995). Particle swarm optimization, in: Proc. of the IEEE Int. Conf. on Neural Networks, Piscataway, NJ, pp. 1942–1948.
Kirkpatrick, S., Gelatt, C.D., and Vecchi, M.P. (1983). Optimization by simulated annealing, Science, 220, 671–680.
Koza, J.R., (1992). Genetic Programming: On the Programming of Computers by Means of Natural Selection, MIT Press, Cambridge, MA.

Koziel, S. and Yang, X.S., (2011). Computational Optimization, Methods and Algorithms, Studies in Computational Intelligence, Vol. 356, Springer, Berlin, Germany.
Lee K.S., Geem Z.W. (2005). A new meta-heuristic algorithm for continuous engineering optimization: harmony search theory and practice. Computer Methods in Applied Mechanics and Engineering, 194, 3902–3933.
Lee K.S., Geem Z.W., Lee S.-H. et al. (2005). The harmony search heuristic algorithm for discrete structural optimization. Engineering Optimization, 37, 663–684.
Metenidis M.F., Witczak M., Korbicz J. (2004). A novel genetic programming approach to nonlinear system modelling: application to the DAMADICS benchmark problem, Engineering Applications of Artificial Intelligence, 17, 363–370.
Miller, J. and Thomson, P. (2002). "Cartesian Genetic Programming", in: Genetic Programming, Poli R., Banzhaf W., Langdon B., Miller J., Nordin P., Fogarty T.C. (Eds.), Springer-Verlag, Berlin.
Nikjoofar A., Zarghami M., (2012). "Water Distribution Networks Designing by the Multiobjective Genetic Algorithm and Game Theory." Chapter 5 in Metaheuristics in Water Resources, Geotechnical and Transportation Engineering, Xin-She Yang et al. (Eds.), Elsevier.
Oltean, M. and Dumitrescu, D. (2002). Multi Expression Programming, Technical Report UBB-01-2002, Babeş-Bolyai University, Cluj-Napoca, Romania.
Oltean, M. and Grosan, C. (2003a). "A comparison of several linear genetic programming techniques", Advances in Complex Systems, Vol. 14, No. 4, pp. 1–29.
Oltean, M. and Grosan, C. (2003b). Solving Classification Problems using Infix Form Genetic Programming, in: Intelligent Data Analysis, Berthold M. (Ed.), LNCS 2810, Springer-Verlag, Berlin, pp. 242–252.
Oltean, M. and Grosan, C. (2003c). Evolving evolutionary algorithms using multi expression programming, in: Artificial Life, LNAI Vol. 2801, Springer-Verlag, pp. 651–658.
Patterson, N. (2002). Genetic Programming with Context-Sensitive Grammars, Ph.D. Thesis, School of Computer Science, University of Scotland.
Pavlyukevich, I. (2007). Lévy flights, non-local search and simulated annealing, Journal of Computational Physics, 226, 1830–1844.
Perlovsky, L.I. (2001). Neural Networks and Intellect, Oxford University Press.
Pham, D.T., Ghanbarzadeh, A., Koc, E., Otri, S., Rahim, S., and Zaidi, M., (2006). The Bees Algorithm: A Novel Tool for Complex Optimisation Problems, in: Proceedings of the IPROMS 2006 Conference, pp. 454–461.
Poli, R., Langdon, W.B., McPhee, N.F. and Koza, J.R. (2007). Genetic Programming: An Introductory Tutorial and a Survey of Techniques and Applications, Technical Report CES-475, University of Essex, UK.
Rani D., Jain S.K., Srivastava D.K. and Perumal M. (2012). "Genetic Algorithms and Their Applications to Water Resources Systems." Chapter 3 in Metaheuristics in Water Resources, Geotechnical and Transportation Engineering, Xin-She Yang et al. (Eds.), Elsevier, 43–77.

Rumelhart, D.E., Hinton, G.E., Williams, R.J. (1986). Learning internal representations by error propagation, in: Parallel Distributed Processing, MIT Press, Cambridge.
Saka M.P. (2007). Optimum geometry design of geodesic domes using harmony search algorithm. Advances in Structural Engineering, 10, 595–606.
Sakla S.S., Ashour A.F. (2005). Prediction of tensile capacity of single adhesive anchors using neural networks, Computers and Structures, 83(21-22), 1792–1803.
Sayadi, M.K., Ramezanian, R. and Ghaffari-Nasab, N., (2010). A discrete firefly meta-heuristic with local search for makespan minimization in permutation flow shop scheduling problems, International Journal of Industrial Engineering Computations, 1, 1–10.
Shi Y., Eberhart R.C. (1998). A modified particle swarm optimizer, in: Proceedings of the IEEE International Conference on Evolutionary Computation, Alaska, pp. 69–73.
Shi Y., Eberhart R.C. (1999). Empirical study of particle swarm optimization, in: Proceedings of the 1999 IEEE Congress on Evolutionary Computation, Vol. 3, pp. 1945–1950.
Storn R. and Price K.V., (1997). "Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces," Journal of Global Optimization, 11(4), 341–359.
Talatahari S., Singh V.P. and Hassanzadeh Y. (2012a). "Ant Colony Optimization for Estimating Parameters of Flood Frequency Distributions." Chapter 6 in Metaheuristics in Water Resources, Geotechnical and Transportation Engineering, Xin-She Yang et al. (Eds.), Elsevier, 121–146.
Talatahari S., Kaveh A., Mohajer Rahbari N., (2012b). Parameter Identification of Bouc-Wen Model for MR Fluid Dampers Using Adaptive Charged System Search Optimization, Journal of Mechanical Science and Technology, 26(8).
Talatahari S., Kheirollahi M., Farahmandpour C., Gandomi A.H., (2013). A multi-stage particle swarm for optimum design of truss structures, Neural Computing and Applications, doi: 10.1007/s00521-012-1072-5.
Talbi, E.G. (2009). Metaheuristics: From Design to Implementation, Wiley, Hoboken, NJ, USA.
Topcu I.B., Sarıdemir M. (2008). Prediction of compressive strength of concrete containing fly ash using artificial neural networks and fuzzy logic. Computational Materials Science, 41, 305–311.
Torres, R.S., Falcão, A.X., Gonçalves, M.A., Papa, J.P., Zhang, B., Fan, W. and Fox, E.A. (2009). "A genetic programming framework for content-based image retrieval", Pattern Recognition, 42(2), 283–292.
Vapnik V. (1995). The Nature of Statistical Learning Theory. Springer-Verlag, New York.
Vapnik V. (1998). Statistical Learning Theory. Wiley, New York.
Wolpert, D.H., Macready, W.G. (1997). "No Free Lunch Theorems for Optimization," IEEE Transactions on Evolutionary Computation, 1(1), 67–82.

Yang, X.S., (2005). Engineering optimization via nature-inspired virtual bee algorithms, in: Artificial Intelligence and Knowledge Engineering Applications: A Bioinspired Approach, Lecture Notes in Computer Science, Vol. 3562, Springer, Berlin, Germany, pp. 317–323.
Yang, X.S. (2008). Nature-Inspired Metaheuristic Algorithms, First Edition, Luniver Press, Frome, UK.
Yang, X.S. and Deb, S., (2009). Cuckoo search via Lévy flights, in: Proc. of the World Congress on Nature & Biologically Inspired Computing (NaBIC 2009), IEEE Publications, USA, pp. 210–214.
Yang, X.S. (2009). Firefly algorithms for multimodal optimization, in: 5th Symposium on Stochastic Algorithms, Foundations and Applications (SAGA 2009) (Eds. Watanabe O. and Zeugmann T.), LNCS 5792, pp. 169–178.
Yang, X.S. (2010a). Engineering Optimization: An Introduction with Metaheuristic Applications, John Wiley and Sons, Hoboken, NJ, USA.
Yang, X.S., (2010b). A new metaheuristic bat-inspired algorithm, in: Nature-Inspired Cooperative Strategies for Optimization (NICSO 2010) (Eds. Gonzalez J.R. et al.), Springer, SCI 284, pp. 65–74.
Yang, X.S. and Deb, S., (2010). Engineering optimisation by cuckoo search, International Journal of Mathematical Modelling and Numerical Optimisation, 1(4), 330–343.
Yang X.S. (2011a). Review of metaheuristics and generalized evolutionary walk algorithm, International Journal of Bio-Inspired Computation, 3(2), 77–84.
Yang, X.S., (2011b). Chaos-enhanced firefly algorithm with automatic parameter tuning, International Journal of Swarm Intelligence Research, 2(4), 1–11.
Yang, X.S., (2011c). Metaheuristic optimization: algorithm analysis and open problems, in: Proceedings of the 10th Symposium on Experimental Algorithms (SEA 2011) (Eds. P.M. Pardalos and S. Rebennack), Lecture Notes in Computer Science, Vol. 6630, Springer, pp. 21–32.
Yang, X.S., (2011d). Bat algorithm for multi-objective optimisation, International Journal of Bio-Inspired Computation, 3(5), 267–274.
Yang, X.S., (2011e). Metaheuristic optimization, Scholarpedia, 6(8), 11472. http://www.scholarpedia.org/article/Metaheuristic_Optimization (Date accessed: 14 Dec 2011).
Yang, X.S., Gandomi, A.H., (2012). Bat algorithm: a novel approach for global engineering optimization, Engineering Computations, 29(5), 464–483.
Yang X.S., Sadat Hosseini S.S., Gandomi A.H., (2012). "Firefly Algorithm for Solving Non-Convex Economic Dispatch Problems with Valve Loading Effect." Applied Soft Computing, 12(3), 1180–1186.
Zadeh, L.A. (1965). Fuzzy sets. Information and Control, 8, 338–353.
