DYNAMIC SYSTEM IDENTIFICATION USING ADAPTIVE ALGORITHMS
By SAIKAT SINGHA ROY
ACKNOWLEDGEMENT I am indebted to many people who contributed, through their support, knowledge and friendship, to this work and to my years at KIIT University, Bhubaneswar. I am grateful to my supervisor, Dr. Harpal Thethi, who gave me the opportunity to carry out this work in the laboratory. He encouraged, supported and motivated me with much kindness throughout the work. I always had the freedom to follow my own ideas, for which I am very grateful. I really admire his patience and staying power in carefully reading the whole thesis. It is because of his help that I stand where I am. I am also grateful to KIIT University for providing me adequate infrastructure to carry out the present investigations. I am thankful to Prof. Babita Majhi of the Dept. of IT, Institute of Technical Education and Research, S 'O' A University, Bhubaneswar, and Prof. Ganapati Panda of the School of Electrical Sciences, Indian Institute of Technology Bhubaneswar, for extending their valuable suggestions and help whenever I approached them. My special thanks to Prof. A. K. Sen, Dean of the School of Electronics Engineering, and Prof. S. S. Singh, M.Tech Co-ordinator of the School of Electronics Engineering, for their constant inspiration and encouragement during my research. My hearty thanks to Susovan Mondal for his constant encouragement and cooperation during the thesis work. I am also thankful to Aritra Ghosh and Smruti Ranjan Nath for their help, support and encouragement. I acknowledge all staff members of the ECE department, KIIT University, for helping me. I render my respect to all my family members for giving me mental support and inspiration for carrying out my research work.
ABSTRACT Over the past few years the field of System Identification has drawn the interest of numerous researchers due to its wide applicability in different fields. Adaptive direct modeling (system identification) and adaptive inverse modeling (channel equalization) find extensive applications in telecommunication, control systems, instrumentation, power system engineering and geophysics. The identification task becomes very difficult if the plants or systems are nonlinear and dynamic in nature. Further, the existing conventional methods like the least mean square (LMS) algorithm do not provide suitable training to build up precise direct and inverse models. Very often these derivative based algorithms do not lead to optimal solutions in pole-zero and Hammerstein type system identification problems, as they have a tendency to be trapped in local minima. To overcome this problem, in this report the Genetic Algorithm (GA), Bacterial Foraging Optimization (BFO) and Differential Evolution (DE) techniques have been applied to develop a new model for efficient identification of nonlinear dynamic systems. Hence there are two significant problems which need to be addressed. These are: (i) development of accurate direct models of complex plants using some novel architecture and new learning techniques; (ii) development of new training rules which alleviate the local minima problem during training and thus help in generating improved adaptive models. In terms of identification performance, the proposed model is shown to outperform the multilayer perceptron based model. A novel method of identification of FIR and IIR plants is proposed using the Differential Evolution algorithm. It is shown that the new approach is more accurate in identification compared to the existing recursive LMS, genetic algorithm (GA) and BFO based approaches. The new scheme takes less computational effort, is more accurate and consumes fewer input samples for training. DE is a population based algorithm like the genetic algorithm, using similar operators: crossover, mutation and selection. In this work, the performance of the DE algorithm is compared with that of other algorithms like LMS, GA and BFO. By adding some nonlinearity to the desired linear dynamic system, the performances of all these algorithms are compared. In this report we have also proposed an efficient DE scheme for identification of complex nonlinear dynamic systems using the Functional Link Artificial Neural Network (FLANN). An efficient DE scheme is selected according to its mutation strategy. FLANN is basically a single layer structure in which nonlinearity is introduced by enhancing the input pattern with a nonlinear functional expansion. The results are then compared with those of some popular algorithms. An exhaustive simulation study shows that the DE based algorithm exhibits superior performance in minimizing the error function.
CONTENTS

Particulars                                                           Page No.

CHAPTER 1  INTRODUCTION                                                      1
  1.1  Background                                                            1
  1.2  Motivation                                                            3
  1.3  Major contribution of the thesis                                      3
CHAPTER 2  ADAPTIVE MODELING                                                 4
  2.1  Introduction                                                          4
  2.2  Adaptive filter                                                       4
    2.2.1  Adaptive FIR filter                                               5
    2.2.2  Adaptive IIR filter                                               6
  2.3  Artificial neural network                                             7
    2.3.1  Single layer neuron                                               8
    2.3.2  Multilayer Perceptron                                             9
    2.3.3  FLANN                                                             9
CHAPTER 3  ADAPTIVE ALGORITHMS                                              11
  3.1  Introduction                                                         11
  3.2  Gradient based algorithms                                            11
    3.2.1  LMS algorithm                                                    11
    3.2.2  Disadvantages of LMS algorithm                                   12
  3.3  Evolutionary based algorithms                                        13
    3.3.1  Genetic Algorithm                                                13
      3.3.1.1  Operators of GA                                              14
      3.3.1.2  Parameters of GA                                             16
    3.3.2  Steps in GA                                                      18
    3.3.3  Bacterial Foraging Optimization                                  19
      3.3.3.1  Chemotaxis                                                   20
      3.3.3.2  Swarming                                                     21
      3.3.3.3  Reproduction                                                 21
      3.3.3.4  Elimination and Dispersal                                    22
    3.3.4  Steps in BFO                                                     25
    3.3.5  Differential Evolution                                           26
      3.3.5.1  Classical DE – How does it Work?                             27
      3.3.5.2  The Complete DE Family of Storn and Price                    29
      3.3.5.3  Summary of all Schemes                                       30
    3.3.6  Steps in DE                                                      31
CHAPTER 4  ADAPTIVE SYSTEM IDENTIFICATION OF FIR SYSTEMS USING LMS, GA, BFO, DE   33
  4.1  Direct modeling using adaptive LMS algorithm                         33
  4.2  Direct modeling using adaptive GA algorithm                          34
  4.3  Direct modeling using adaptive BFO algorithm                         35
  4.4  Direct modeling using adaptive DE algorithm                          37
  4.5  Simulation Results                                                   38
    4.5.1  Linear System                                                    38
    4.5.2  Non-linear System                                                41
CHAPTER 5  DYNAMIC SYSTEM IDENTIFICATION USING FLANN STRUCTURE AND BP, GA, DE BASED LEARNING ALGORITHMS   49
  5.1  Dynamic System Identification of Nonlinear System                    49
  5.2  A generalized FLANN structure based identification model             50
  5.3  BP, GA and DE based nonlinear system identification                  51
  5.4  Simulation Results                                                   55
CHAPTER 6  CONCLUSION AND SCOPE FOR FURTHER WORK                            71
  6.1  Conclusion                                                           71
  6.2  Further research extension                                           72
References                                                                  73
LIST OF FIGURES

Figure No.  Figure Title                                              Page No.

2.1   The general adaptive filtering problem                                 5
2.2   Adaptive filter using Adaptive algorithms                              6
2.3   Structure of an adaptive IIR filter                                    7
2.4   Structure of a single neuron                                           8
2.5   MLP Structure                                                          9
2.6   Structure of FLANN model                                              10
3.1   Flow graph for GA                                                     18
3.2   Movement of Bacteria                                                  21
3.3   Flow chart of BFO                                                     24
3.4   Illustrating creation of the donor vector in 2-D parameter space      29
4.1   Block diagram of an adaptive identification system using FIR filter structure   34
4.2   Plot of Desired vs Actual Output of the H1(Z) system using LMS algorithm   39
4.3   Plot of Mean Square Error (MSE) vs Iteration for LMS                  39
4.4   Plot of Mean Square Error (MSE) in dB vs Iteration for LMS            39
4.5   Plot of desired response of linear system and response of model system for GA    40
4.6   Plot of desired response of linear system and response of model system for BFO   40
4.7   Plot of desired response of linear system and response of model system for DE    41
4.8   MSE comparison of GA for linear and non-linear system                 42
4.9   Mean square error (MSE) for GA with different SNR                     42
4.10  Plot of desired response of nonlinear (NL1) plant and response of model system for GA    43
4.11  Plot of desired response of nonlinear (NL1) plant and response of model system for BFO   43
4.12  Plot of desired response of nonlinear (NL1) plant and response of model system for DE    44
4.13  Plot of MSE vs. Iteration for different Exponential DE schemes with 30 dB SNR    44
4.14  Plot of MSE vs. Iteration for different Binomial DE schemes with 30 dB SNR       45
4.15  Plot of MSE vs. Iteration for Binomial and Exponential DE with 30 dB SNR         45
4.16  Difference of DE and GA with respect to MSE for 20 dB SNR             46
4.17  Comparison of system output between LMS, GA, BFO and DE for 30 dB SNR 46
5.1   Block diagram of nonlinear system identification                      49
5.2   Structure of FLANN model                                              50
5.3   Neural network using BP algorithm                                     51
5.4   Comparison of output response of model-1 using nonlinearity defined in (5.23)    58
5.5   Comparison of output response of example-1 using nonlinearity defined in (5.24)  59
5.6   Comparison of output response of example-1 using nonlinearity defined in (5.25)  61
5.7   Comparison of output response of model-2                              63
5.8   Comparison of output response of model-3                              66
5.9   Comparison of output response of model-4                              68
5.10  Comparison of different DE strategies for model-1                     69
5.11  Comparison of different DE strategies for model-2                     69
5.12  Comparison of different DE strategies for model-3                     70
LIST OF TABLES

Table No.  Table Name                                                Page No.

4.1  Comparison between Linear and Non-Linear transversal structure for GA   47
4.2  Comparison between Linear and Non-Linear transversal structure for DE   47
4.3  Comparison of DE and GA with respect to error between actual signal and desired signal   48
4.4  Comparison of DE and GA with respect to NMSE between actual signal and desired signal for different nonlinear functions and linear system   48
4.5  Comparison of LMS, GA, BFO and DE with respect to system coefficients between actual signal and desired signal for different nonlinear functions and linear system   48
5.1  Comparison of NMSE of different examples                                68
GLOSSARY

AWGN    Additive white Gaussian noise
BFO     Bacterial foraging optimization
BIBO    Bounded input bounded output
BP      Back propagation
CNN     Chebyshev neural network
CR      Crossover ratio
DE      Differential Evolution
DSP     Digital signal processing
FE      Functional expansion
FIR     Finite impulse response
FLANN   Functional link artificial neural network
GA      Genetic Algorithm
IIR     Infinite impulse response
LMS     Least mean square
MF      Mutation factor
MIMO    Multiple input multiple output
MLANN   Multilayer artificial neural network
MLP     Multilayer perceptron
MMSE    Minimum mean square error
MSE     Mean square error
NMSE    Normalized mean square error
PSO     Particle swarm optimization
RBF     Radial basis function
RLS     Recursive least square
SI      Swarm intelligence
SISO    Single input single output
CHAPTER 1 INTRODUCTION

1.1. Background
Identification of a nonlinear dynamic plant is a major area in engineering today. System identification is widely used in numerous applications like biological processes [1], control systems [2], signal processing [3], intelligent sensor design [4], process control [5], power system engineering [6], image and speech processing [6], geophysics [7], acoustic noise and vibration control [8] and communication engineering [9]. Many practical systems used in process control, robotics and autonomous systems are nonlinear and dynamic (i.e., with no prior knowledge available) in nature. Finding a perfect model of these types of plants is a challenging task. There are certain classical parameterized models such as the Wiener-Hammerstein model [10], the Volterra series [11] and polynomial identification models [12-13] which offer reasonable precision, but the problem with these methods is that they involve a lot of computational complexity. Subsequently, many neural network based approaches like the Multilayer Perceptron (MLP), Radial Basis Function (RBF) and recurrent neural networks have been applied to the nonlinear system identification problem. For basic neural networks, Back Propagation (BP) is generally used as the adaptive algorithm to provide better accuracy. Earlier, Narendra and Parthasarathy (1990) [14] employed multilayer perceptron (MLP) networks for effective identification and control of dynamic systems like the truck-backer-upper problem [15]. However, the major disadvantage of the earlier methods is that they employ a derivative based learning algorithm (BP) to train their system parameters, which can lead to local minima and thereby to incorrect estimation of the parameters. Direct modeling mainly refers to adaptive identification of unknown plants. Simple static linear plants are easily identified through parameter estimation using conventional derivative based least mean square (LMS) type algorithms [16]. But most practical plants are dynamic, nonlinear, or a combination of these two characteristics. In many applications Hammerstein and MIMO plants need identification. In addition, the output of the plant is associated with measurement noise or additive white Gaussian noise (AWGN).
Identification of such complex plants is a difficult task and poses many challenging problems. The conventional LMS and recursive least square (RLS) [17] techniques work well for identification of static plants, but when the plants are of dynamic type, the existing forward-backward LMS [18] and RLS algorithms very often lead to non-optimal solutions due to premature convergence of weights to local minima [19]. This is a major drawback of the existing derivative based techniques. To alleviate this issue, this thesis suggests the use of derivative free optimization techniques in place of the conventional techniques. In the recent past, population based optimization techniques have been reported which fall under the category of evolutionary computing [20] or computational intelligence [21]. These are also called bio-inspired techniques and include the genetic algorithm (GA) and its variants [22], bacterial foraging optimization (BFO) and its variants [23], and Differential Evolution (DE) and its variants [24]. These techniques are suitably employed to obtain efficient iterative learning algorithms for developing adaptive direct and inverse models of complex plants and channels. Development of direct adaptive models essentially consists of two components. The first component is an adaptive network, which may be linear or nonlinear in nature. Use of a nonlinear network is preferable when nonlinear plants or channels are to be identified. The linear networks used in the thesis are the adaptive linear combiner (all-zero or FIR structure) [25] and the pole-zero or IIR structure [25]. Under the nonlinear category, the low complexity single layer functional link artificial neural network (FLANN) [26] and the multilayer perceptron network (MLP) [27] are used. The second component is the training or learning algorithm used to train the parameters of the model. As stated earlier, the structures used are trained by bio-inspired techniques such as GA, BFO and DE. Depending upon the complexity and nature of the plants to be identified, a proper combination of model network and corresponding bio-inspired learning rule is selected so that the combination yields the best possible performance in direct modeling tasks. This requires prior experience and knowledge of simulation results. One of the objectives of the present investigation is to choose models with an appropriate combination of structure and algorithm so that the best possible performance of direct models is obtained. The bio-inspired optimization tools cannot be applied directly to develop direct models of plants, as they were not originally aimed at training the parameters of models. Therefore another motivation of the investigation is to formulate the direct modeling problems as optimization problems and then to introduce bio-inspired techniques suitably to effectively optimize the cost function of the models. In conventional identification and equalization problems, the mean square error at the output is considered as the cost function to be minimized using bio-inspired techniques.

1.2. Motivation
In summary, the main motivations of the research work carried out in the present thesis are the following:
(i) To formulate the direct and inverse modeling problems as error square optimization problems.
(ii) To introduce bio-inspired optimization tools such as GA and DE and their variants to efficiently minimize the squared error cost function of the models; in other words, to develop alternate identification schemes.
(iii) To achieve improved identification (direct modeling) of complex nonlinear all-zero and pole-zero plants by introducing new and improved identification algorithms.
(iv) The objective of identification is to determine a suitable mathematical model of a given system/process useful for predicting the behavior of the system under different operating conditions.
1.3. Major contribution of the thesis
A low complexity functional link artificial neural network based nonlinear dynamic system identifier has been developed and its learning algorithm has been derived. Improved identification performance has been demonstrated through simulation studies. The Differential Evolution technique has been used to effectively identify IIR plants. Further, an extensive simulation study has been made on the use of the proposed method to identify higher order plants with lower order models. The new approach has been shown to overcome the local minima problem in multimodal situations and to accurately identify nonlinear dynamic models.
CHAPTER 2 ADAPTIVE MODELING
2.1 Introduction
The main motive of the research work carried out in this thesis is to develop elegant and efficient adaptive identification schemes for complex nonlinear and dynamic plants. These adaptive models inherently need suitable adaptive structures and appropriate learning rules to train their parameters. In the present investigation we briefly outline some selected adaptive architectures such as the adaptive linear combiner, adaptive pole-zero filters, the functional link artificial neural network and the multilayer artificial neural network. In addition we present some recently developed population based, bio-inspired, derivative free techniques such as GA and DE for training the parameters or coefficients of the adaptive structures. An adaptive linear combiner or filter is feed forward in structure. By choosing a particular adaptive filter structure, one specifies the number and type of parameters that need adjustment. The adaptive algorithms used to update these parameters tend to minimize the cost function of the model. The main contribution of the thesis is to solve these complex optimization problems using bio-inspired learning rules. In the following section, the general adaptive filtering problem is presented and the mathematical notation for representing the form and operation of the adaptive filter is introduced.
2.2 Adaptive filter
Figure 2.1 shows the block diagram of an adaptive FIR filter or adaptive linear combiner, in which a sample of a digital input signal xk is fed into an adaptive filter that computes a corresponding output signal sample yk at time k. The output signal is compared to a second signal dk, called the desired signal.
Fig. 2.1 The general adaptive filtering problem
The difference signal, given by

ek = dk − yk   (2.1)

is known as the error signal. The error signal is used to adapt the parameters of the filter from time k to time (k+1) in a well-defined manner. This process of adaptation is represented by an oblique arrow in the figure. As the time index k is incremented, the output of the adaptive filter matches the desired signal better and better, following an adaptation process such that the magnitude of ek decreases over time.
2.2.1 Adaptive FIR filter
The general architecture of an FIR adaptive filter or adaptive linear combiner is depicted in Fig. 2.2. The input passes through a tapped delay line, with one unit delay per tap; this structure is also called an adaptive linear combiner. Let Xn = [x(n) x(n−1) … x(n−M+1)]^T be the M-by-1 tap input vector, where M−1 is the number of delay elements. The tap weights Wn = [w0(n) w1(n) … w_{M−1}(n)]^T form the elements of the M-by-1 tap weight vector. The output is represented as

y(n) = Σ_{i=0}^{M−1} w_i(n) x(n−i)   (2.2)

or, in vector notation,

y(n) = Wn^T Xn   (2.3)
Fig. 2.2 Adaptive filter using Adaptive algorithms
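As a concrete illustration of (2.2)-(2.3), the following minimal Python sketch computes the combiner output with a tapped delay line; the tap values and input signal are illustrative assumptions, not taken from the thesis:

```python
import numpy as np

def fir_output(w, x_buffer):
    """Output of an M-tap linear combiner: y(n) = Wn^T Xn."""
    return np.dot(w, x_buffer)

M = 4                                   # number of taps
w = np.array([0.5, -0.3, 0.2, 0.1])     # example tap weights w_0 .. w_{M-1}
x = np.random.randn(100)                # example input signal

y = np.zeros_like(x)
buf = np.zeros(M)                       # Xn = [x(n), x(n-1), ..., x(n-M+1)]
for n in range(len(x)):
    buf = np.roll(buf, 1)               # shift the delay line by one sample
    buf[0] = x[n]
    y[n] = fir_output(w, buf)
```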
2.2.2 Adaptive IIR filter
The structure of a direct-form adaptive IIR filter is shown in Fig. 2.3. In this case, the output of the system is given by

y(n) = Σ_{m=0}^{M−1} b_m(n) u(n−m) + Σ_{m=1}^{M−1} a_m(n) y(n−m)   (2.4)

where b_m(n) and a_m(n) represent the feedforward and feedback coefficients of the filter, respectively. In matrix form, y(n) may be written as

y(n) = W^T(n) U(n)   (2.5)

where the combined weight vector is

W(n) = [b_0(n) b_1(n) … b_{M−1}(n) a_1(n) … a_{M−1}(n)]^T   (2.6)

and the combined input and output signal vector is

U(n) = [u(n) u(n−1) … u(n−M+1) y(n−1) … y(n−M+1)]^T   (2.7)
Fig. 2.3 Structure of an adaptive IIR filter
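A minimal sketch of the direct-form recursion in (2.4); the coefficient values and signal are illustrative assumptions only:

```python
import numpy as np

def iir_output(b, a, u_buf, y_buf):
    """Direct-form IIR output per (2.4):
    y(n) = sum_m b_m u(n-m) + sum_m a_m y(n-m)."""
    return np.dot(b, u_buf) + np.dot(a, y_buf)

M = 3
b = np.array([0.05, 0.4, 0.05])   # feedforward coefficients b_0..b_2
a = np.array([1.1314, -0.25])     # feedback: a[0]*y(n-1) + a[1]*y(n-2)

u = np.random.randn(50)
y = np.zeros_like(u)
u_buf = np.zeros(M)               # [u(n), u(n-1), u(n-2)]
y_buf = np.zeros(len(a))          # [y(n-1), y(n-2)]
for n in range(len(u)):
    u_buf = np.roll(u_buf, 1); u_buf[0] = u[n]
    y[n] = iir_output(b, a, u_buf, y_buf)
    y_buf = np.roll(y_buf, 1); y_buf[0] = y[n]   # feed the output back
```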
The weight update operation of the adaptive IIR filter is carried out using either conventional derivative based or derivative free learning algorithms. In addition to the linear structures, nonlinear structures can be used, for which the principle of superposition does not hold even when the parameter values are fixed. Such structures are useful when the relationship between d(n) and x(n) is not linear in nature. This class of nonlinear structures consists of the artificial neural network (ANN), the functional link artificial neural network (FLANN) and the radial basis function (RBF) network. These networks inherently contain distributed nonlinear elements in each path, like the sigmoid function in the ANN, sine/cosine terms in the FLANN and the Gaussian function in the RBF network. In the next section these nonlinear structures are dealt with in detail.
2.3 Artificial neural network (ANN)
The artificial neural network (ANN) takes its name from the network of nerve cells in the brain. Recently, the ANN has proved to be an important technique for classification and optimization problems [27-29]. McCulloch and Pitts proposed one of the earliest computational models of the neuron. There are extensive applications of various types of ANN in the fields of communication, control, instrumentation and forecasting. The ANN is capable of performing nonlinear mapping between the input and output space due to its large parallel interconnection between different layers and its nonlinear processing characteristics. An artificial neuron basically consists of a computing element that performs the weighted sum of the input signals and the connecting weights.
The sum is added to the bias or threshold and the resultant signal is then passed through a nonlinear function of sigmoid or hyperbolic tangent type. Each neuron is associated with three parameters whose learning can be adjusted: the connecting weights, the bias and the slope of the nonlinear function. From the structural point of view, a neural network (NN) may be single layer or multilayer. In a multilayer structure, there are one or more artificial neurons in each layer, and for a practical case there may be a number of layers. Each neuron of one layer is connected to each neuron of the next layer. The functional-link ANN is another type of single layer NN. In this type of network the input data are allowed to pass through a functional expansion block where they are nonlinearly mapped to a larger number of points. This is achieved by using trigonometric functions, tensor products or power terms of the input. The output of the functional expansion is then passed through a single neuron. Basically there are two types of NNs used in practice: i) the single neuron structure and ii) the multilayer perceptron (MLP).
2.3.1 Single layer neuron
The basic structure of an artificial neuron is presented in Fig. 2.4.
Fig. 2.4 Structure of a single neuron
The operation in a neuron involves the computation of the weighted sum of the inputs and the threshold. The resultant signal is then passed through a nonlinear activation function. This is also called a perceptron, which is built around a nonlinear neuron. The output of the neuron may be represented as

y(n) = φ( Σ_{j=1}^{N} w_j(n) x_j(n) + α(n) )   (2.8)

where α(n) is the threshold to the neurons at the first layer, w_j(n) is the weight associated with the jth input, N is the number of inputs to the neuron and φ(.) is the nonlinear activation function.
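A small sketch of (2.8), assuming a hyperbolic tangent as the activation φ(.); the input, weight and threshold values are illustrative:

```python
import numpy as np

def neuron(x, w, alpha):
    """Single-neuron output per (2.8): y = phi(sum_j w_j x_j + alpha),
    using tanh as an example of the nonlinear activation phi(.)."""
    return np.tanh(np.dot(w, x) + alpha)

x = np.array([0.5, -1.2, 0.3])   # N = 3 inputs x_j(n)
w = np.array([0.4, 0.1, -0.7])   # connecting weights w_j(n)
alpha = 0.05                     # threshold alpha(n)
y = neuron(x, w, alpha)
```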
2.3.2 Multilayer Perceptron
In the multilayer neural network or multilayer perceptron (MLP), the input signal propagates through the network in a forward direction, on a layer-by-layer basis. This network has been applied successfully to solve difficult and diverse problems by training it in a supervised manner with the highly popular algorithm known as the error back-propagation algorithm [27-28]. The scheme of an MLP using four layers is shown in Fig. 2.5. x_i(n) represents the input to the network, f_j and f_k represent the outputs of the two hidden layers and y_l(n) represents the output of the final layer of the neural network. The connecting weights between the input and the first hidden layer, the first and second hidden layers, and the second hidden layer and the output layer are represented by w_ij, w_jk and w_kl respectively.
Fig. 2.5 MLP Structure
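A compact sketch of the layer-by-layer forward pass just described; the 3-4-4-1 layer sizes, tanh activations and random initialization are illustrative assumptions:

```python
import numpy as np

def forward(x, weights, biases):
    """Layer-by-layer MLP forward pass with tanh units."""
    a = x
    for W, b in zip(weights, biases):
        a = np.tanh(W @ a + b)    # each layer: nonlinearity of weighted sum
    return a

# Illustrative four-layer (3-4-4-1) network matching the scheme of Fig. 2.5
rng = np.random.default_rng(0)
sizes = [3, 4, 4, 1]
weights = [rng.standard_normal((m, n)) * 0.5
           for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]
y = forward(rng.standard_normal(3), weights, biases)
```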
2.3.3 Functional Link Artificial Neural Network (FLANN)
The FLANN is a single layer Artificial Neural Network structure proposed by Pao, which is capable of forming complex decision regions by generating nonlinear decision boundaries [30]. In this structure, the dimension of the pattern space is increased by enhancing the input patterns with some nonlinear functions. For nonlinear dynamic system identification, the FLANN structure has been proposed in [31]. In order to identify the plants, a series-parallel scheme is employed during the training phase [14]. The structure of a FLANN is shown in Fig. 2.6.
Fig. 2.6 Structure of FLANN model
Input x(k) is expanded using nonlinear functions and is then applied to an adaptive linear combiner whose weights are updated by an adaptive algorithm. In [31] a trigonometric expansion was proposed, as it offered better performance for most of the applications. Hence in the proposed investigation too we have selected the trigonometric expansion, which is given as follows:

v(k) = [1, sin(πx(k)), cos(πx(k)), …, sin(nπx(k)), cos(nπx(k))]^T   (2.9)
     = [v_0(k), v_1(k), …, v_{2n}(k)]^T   (2.10)

Here there are n sine and n cosine expansions of each input sample. The first term v_0(k) is a unity input, so there are a total of (2n + 1) terms in the input vector. The weight vector related to the kth input vector defined in (2.10) is given as

h(k) = [h_0(k), h_1(k), …, h_{2n}(k)]^T   (2.11)

Hence the estimated output of the identification model is computed as

ŷ_p(k+1) = v(k) · h(k)   (2.12)
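A minimal sketch of the expansion (2.9)-(2.10) and the output computation (2.12); the expansion order n and the random weight initialization are illustrative assumptions:

```python
import numpy as np

def flann_expand(x, n):
    """Trigonometric functional expansion of a scalar input x(k):
    v(k) = [1, sin(pi x), cos(pi x), ..., sin(n pi x), cos(n pi x)]."""
    v = [1.0]                                 # unity term v_0(k)
    for i in range(1, n + 1):
        v += [np.sin(i * np.pi * x), np.cos(i * np.pi * x)]
    return np.array(v)                        # (2n + 1) terms in total

n = 2
h = np.random.randn(2 * n + 1)                # weight vector h(k)
x_k = 0.3
y_hat = np.dot(flann_expand(x_k, n), h)       # (2.12): y_p(k+1) = v(k) . h(k)
```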
CHAPTER 3 ADAPTIVE ALGORITHMS 3.1 Introduction
An adaptive algorithm is a procedure for adjusting the parameters of an adaptive filter to minimize a cost function chosen for the task at hand. There is no unique solution to the adaptive filtering problem, because there is no single algorithm guaranteed to find the global minimum of a performance surface. Adaptive algorithms are mainly of two types: (i) gradient based algorithms and (ii) evolutionary based algorithms. The choice of the appropriate algorithm is important. The Least Mean Square (LMS), Recursive Least Square (RLS) and Back Propagation (BP) algorithms are gradient descent algorithms, while the Genetic Algorithm (GA), Differential Evolution (DE), Particle Swarm Optimization (PSO) etc. are evolutionary based algorithms. In this thesis LMS, BP, GA and DE are used for training the models.
3.2 Gradient based algorithms
These algorithms are gradient search in nature and are derived by taking the derivative of the squared error. During the process of training, these algorithms tend to drive the weights of the model to local minima. This leads to premature termination of weight adaptation. As a result the mean square error does not reach its global minimum and hence the accuracy of prediction becomes inferior. However, these algorithms are simple to implement and can be expressed in closed form equations. A brief description of the simple LMS algorithm is presented below.
3.2.1 LMS algorithm
The least-mean-square (LMS) algorithm is a search algorithm in which a simplification of the gradient vector computation is made possible by appropriately modifying the objective function. The LMS algorithm, as well as others related to it, is widely used in various applications of adaptive filtering
due to its computational simplicity. The convergence characteristics of the LMS algorithm are examined in order to establish a range for the convergence factor that will guarantee stability. The convergence speed of the LMS is shown to be dependent on the eigenvalue spread of the input signal correlation matrix. In this chapter, several properties of the LMS algorithm are discussed, including the misadjustment in stationary and non-stationary environments and the tracking performance. The analysis results are verified by a large number of simulation examples. The LMS algorithm is by far the most widely used algorithm in adaptive filtering, for several reasons. The main features that attract the use of the LMS algorithm are its low computational complexity, proof of convergence in stationary environments, unbiased convergence in the mean to the Wiener solution, and stable behavior when implemented with finite-precision arithmetic. The convergence analysis of the LMS presented here utilizes the independence assumption. The weight updating equation for the nth instant is given by

w_k(n+1) = w_k(n) + Δw_k(n)   (3.1)

where Δw_k(n) is the change of the kth weight at the nth iteration. The change in weight of each path in each iteration is obtained by minimizing the mean squared error. Using this value, the weight update equation is given as

w_k(n+1) = w_k(n) + 2η e(n) x(n−k)   (3.2)

where η is the learning rate parameter (0 ≤ η ≤ 1). This procedure is repeated till the mean square error (MSE) of the network approaches a minimum value.
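A minimal simulation sketch of this update applied to FIR system identification; the toy plant coefficients, step size and noise level are illustrative assumptions, not taken from the thesis:

```python
import numpy as np

def lms_identify(x, d, M, mu):
    """Identify an M-tap FIR model with the LMS update
    w(n+1) = w(n) + 2*mu*e(n)*x(n)."""
    w = np.zeros(M)
    buf = np.zeros(M)
    mse = []
    for n in range(len(x)):
        buf = np.roll(buf, 1); buf[0] = x[n]
        e = d[n] - np.dot(w, buf)        # error e(n) = d(n) - w^T x(n)
        w = w + 2 * mu * e * buf         # LMS weight update
        mse.append(e * e)
    return w, np.array(mse)

# Toy plant: an unknown 3-tap FIR system plus measurement noise
rng = np.random.default_rng(1)
x = rng.standard_normal(2000)
w_plant = np.array([0.26, 0.93, 0.26])
d = np.convolve(x, w_plant)[:len(x)] + 0.01 * rng.standard_normal(len(x))
w_hat, mse = lms_identify(x, d, M=3, mu=0.01)
```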
LMS Algorithm
Initialization: x(0) = w(0) = [0 0 . . . 0]^T
Do for n ≥ 0:
  e(n) = d(n) − x^T(n) w(n)
  w(n+1) = w(n) + 2μ e(n) x(n)

3.2.2 Disadvantages of LMS algorithm
1. It is a derivative based algorithm, so the parameters may fall into local minima during training.
2. It does not perform satisfactorily under high noise conditions.
3. It does not perform satisfactorily if the order of the filter increases.
4. Once close to the optimal solution, the weights normally rattle around it rather than converging.
5. It does not perform satisfactorily for nonlinear system identification.
To overcome these problems, evolutionary algorithms like GA, BFO and DE are considered.
3.3 Evolutionary based algorithms
As the history of the field suggests, there are many different variants of evolutionary algorithms. The common underlying idea behind all these techniques is the same: given a population of individuals, environmental pressure causes natural selection (survival of the fittest), and this causes a rise in the fitness of the population. Given a quality function to be maximized, we can randomly create a set of candidate solutions, i.e., elements of the function's domain, and apply the quality function as an abstract fitness measure (the higher the better). Based on the fitness, some of the better candidates are chosen to seed the next generation by applying recombination and/or mutation to them. Recombination is an operator applied to two or more selected candidates (the so-called parents) and results in one or more new candidates (the children). Mutation is applied to one candidate and results in one new candidate. Executing recombination and mutation leads to a set of new candidates (the offspring) that compete, based on their fitness (and possibly age), with the old ones for a place in the next generation. This process can be iterated until a candidate with sufficient quality (a solution) is found or a previously set computational limit is reached. GA, BFO and DE are evolutionary based algorithms. Different types of evolutionary algorithms include: (i) Genetic Algorithm (GA), (ii) Differential Evolution (DE), (iii) Particle Swarm Optimization (PSO), (iv) Bacterial Foraging Optimization (BFO).
3.3.1 Genetic Algorithm
Genetic algorithms are a part of evolutionary computing, which is a rapidly growing area of artificial intelligence. Genetic algorithms are inspired by Darwin's theory of evolution. Simply said, problems are solved by an evolutionary process resulting in a best (fittest) solution (the survivor); in other words, the solution is evolved. Genetic Algorithms are a family of computational models inspired by evolution. These algorithms encode a
potential solution to a specific problem on a simple chromosome-like data structure and apply recombination operators to these structures so as to preserve critical information. Genetic algorithms are often viewed as function optimizers, although the range of problems to which genetic algorithms have been applied is quite broad. An implementation of a genetic algorithm begins with a population of typically random chromosomes. One then evaluates these structures and allocates reproductive opportunities in such a way that those chromosomes which represent a better solution to the target problem are given more chances to reproduce than those chromosomes which are poorer solutions. The goodness of a solution is typically defined with respect to the current population. In a broader usage of the term, a genetic algorithm is any population based model that uses selection and recombination operators to generate new sample points in a search space. The genetic algorithm is a probabilistic search algorithm that iteratively transforms a set (called a population) of mathematical objects (typically fixed-length binary character strings), each with an associated fitness value, into a new population of offspring objects using the Darwinian principle of natural selection and operations that are patterned after naturally occurring genetic operations, such as crossover (sexual recombination) and mutation.
3.3.1.1 Operators of GA
(a) Population: In GA, a solution is represented by a string of a certain data type, where the data type can be numeric, binary, or user defined. The string structure is called a chromosome. The population comprises a group of chromosomes from which candidates can be selected for the solution of the problem. For example, in binary coding each parameter is coded using a string of L bits. A linear mapping procedure is used to decode any unsigned integer in [0, 2^L − 1] to a specified interval [low, high]. For multi-parameter optimization, the coded parameter values are concatenated to form a large string which then forms one member (chromosome) of the population.
(b) Crossover: Crossover operates on selected genes from parent chromosomes and creates new offspring. The simplest way to do this is to choose a crossover point at random and copy everything before this point from the first parent
and then copy everything after the crossover point from the other parent. Crossover can be illustrated as follows ( | is the crossover point):

Chromosome 1:  11011|00100110110
Chromosome 2:  11011|11000011110
Offspring 1:   11011|11000011110
Offspring 2:   11011|00100110110
There are other ways to perform crossover; for example, we can choose more crossover points. Crossover can be quite complicated and depends mainly on the encoding of chromosomes. A specific crossover designed for a specific problem can improve the performance of the genetic algorithm.
Single point crossover: one crossover point is selected; the binary string from the beginning of the chromosome to the crossover point is copied from the first parent, and the rest is copied from the other parent.

Parent A:   11001011
Parent B:   11011111
Offspring:  11001111
Two point crossover: two crossover points are selected; the binary string from the beginning of the chromosome to the first crossover point is copied from the first parent, the part from the first to the second crossover point is copied from the other parent, and the rest is copied from the first parent again.

Parent A:   11001011
Parent B:   11011111
Offspring:  11011111
(c) Mutation: After crossover is performed, mutation takes place. Mutation is intended to prevent all solutions in the population from falling into a local optimum of the solved problem. The mutation operation randomly changes the offspring resulting from crossover. In case of binary encoding we can switch a few randomly chosen bits from 1 to 0 or from 0 to 1. Mutation can be illustrated as follows:

Original Offspring 1:  1101111000011110
Original Offspring 2:  1101100100110110
Mutated Offspring 1:   1100111000011110
Mutated Offspring 2:   1101101100110110
The technique of mutation (as well as crossover) depends mainly on the encoding of chromosomes. For example, when encoding permutations, mutation could be performed as an exchange of two genes.
(d) Selection: Finally the fitness (cost function) of the original parents, the offspring (children after crossover of selected parents) and the mutated offspring is calculated. The best chromosomes are selected from the entire pool and are then treated as the parents of the next generation. There are many methods of selecting the best chromosomes, for example roulette wheel selection, Boltzmann selection, tournament selection, rank selection and steady state selection.
Roulette Wheel Selection: Parents are selected according to their fitness. The better the chromosomes are, the more chances they have to be selected. Imagine a roulette wheel where all the chromosomes in the population are placed. The size of the section in the roulette wheel is proportional to the value of the fitness function of each chromosome: the bigger the value, the larger the section. A marble is thrown in the roulette wheel and the chromosome where it stops is selected. Clearly, chromosomes with bigger fitness values will be selected more times. The entire process continues till the global optimum is reached.
3.3.1.2 Parameters of GA
Crossover and Mutation Probability: There are two basic parameters of GA: crossover probability and mutation probability.
Crossover probability: how often crossover will be performed. If there is no crossover, offspring are exact copies of the parents. If there is crossover, offspring are made from parts of both parents' chromosomes. If the crossover probability is 100%, then all offspring are made by crossover. If it is 0%, the whole new generation is made from exact copies of chromosomes from the old population (but this does not mean that the new generation is the same!). Crossover is made in the hope that new chromosomes will contain good parts of old chromosomes and therefore be better. However, it is good to let some part of the old population survive to the next generation.
Mutation probability: how often parts of a chromosome will be mutated. If there is no mutation, offspring are generated immediately after crossover (or copied directly) without any change. If mutation is performed, one or more parts of a chromosome are changed. If the mutation probability is 100%, the whole chromosome is changed; if it is 0%, nothing is changed. Mutation generally prevents the GA from falling into local extremes. Mutation should not occur very often, because then the GA would in fact change into a random search.
Other Parameters: There are also some other parameters of GA. One particularly important parameter is the population size.
Population size: how many chromosomes are in the population (in one generation). If there are too few chromosomes, the GA has few possibilities to perform crossover and only a small part of the search space is explored. On the other hand, if there are too many chromosomes, the GA slows down. Research shows that after some limit (which depends mainly on the encoding and the problem) it is not useful to use very large populations, because they do not solve the problem faster than moderate sized populations.
Fig. 3.1 Flow graph for GA (start → specify the control parameters → create an initial population and evaluate the fitness function → selection → crossover → mutation → evaluate fitness and update the population → repeat until generation > G → stop)
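Following the flow graph, here is a compact sketch of one GA generation, assuming binary chromosomes, roulette wheel selection, single-point crossover and bit-flip mutation; the toy fitness function is an illustrative assumption:

```python
import random

def fitness(bits):
    """Toy fitness: decode the L-bit chromosome to [0, 1], reward closeness to 0.6."""
    x = int(bits, 2) / (2 ** len(bits) - 1)
    return 1.0 / (1e-6 + abs(x - 0.6))

def mutate(bits, pm):
    """Bit-flip mutation: each locus flips with probability pm."""
    return "".join(('1' if b == '0' else '0') if random.random() < pm else b
                   for b in bits)

def generation(pop, pc=0.8, pm=0.02):
    """One generation: roulette selection, single-point crossover, mutation."""
    fits = [fitness(c) for c in pop]
    new_pop = []
    while len(new_pop) < len(pop):
        p1, p2 = random.choices(pop, weights=fits, k=2)   # roulette wheel
        if random.random() < pc:                          # crossover
            cut = random.randrange(1, len(p1))
            p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
        new_pop += [mutate(p1, pm), mutate(p2, pm)]
    return new_pop

pop = ["".join(random.choice("01") for _ in range(16)) for _ in range(20)]
for g in range(50):                                       # iterate generations
    pop = generation(pop)
best = max(pop, key=fitness)
```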
3.3.2 Steps in GA
1. [Start] Generate a random population of n chromosomes (suitable solutions for the problem).
2. [Fitness] Evaluate the fitness f(x) of each chromosome x in the population.
3. [New population] Create a new population by repeating the following steps until the new population is complete:
   1. [Selection] Select two parent chromosomes from the population according to their fitness (the better the fitness, the bigger the chance to be selected).
   2. [Crossover] With a crossover probability, cross over the parents to form new offspring (children). If no crossover is performed, the offspring are exact copies of the parents.
   3. [Mutation] With a mutation probability, mutate the new offspring at each locus (position in the chromosome).
   4. [Accepting] Place the new offspring in the new population.
4. [Replace] Use the newly generated population for a further run of the algorithm.
5. [Test] If the end condition is satisfied, stop and return the best solution in the current population.
6. [Loop] Go to step 2.
3.3.3 Bacterial Foraging Optimization
Natural selection tends to eliminate animals with poor "foraging strategies" (methods for locating, handling, and ingesting food) and favor the propagation of the genes of those animals that have successful foraging strategies, since they are more likely to enjoy reproductive success (they obtain enough food to enable them to reproduce). After many generations, poor foraging strategies are either eliminated or reshaped into good ones. Logically, such evolutionary principles have led scientists in the field of "foraging theory" to hypothesize that it is appropriate to model the activity of foraging as an optimization process: a foraging animal takes actions to maximize the energy obtained per unit time spent foraging, in the face of constraints presented by its own physiology (e.g., sensing and cognitive capabilities) and environment (e.g., density of prey, risks from predators, physical characteristics of the search area). Evolution has balanced these constraints and essentially "engineered" what is sometimes referred to as an "optimal foraging policy" (such terminology is especially justified in cases where the models and policies have been ecologically validated). Optimization models are also valid for "social foraging", where groups of animals cooperatively forage. The bacterial swarm proceeds through four principal mechanisms, namely chemotaxis, swarming, reproduction and elimination-dispersal. Below we briefly describe each of these processes and finally provide a pseudo-code of the entire algorithm.
Let us define a chemotactic step to be a tumble followed by a tumble or a tumble followed by a run. Let j be the index of the chemotactic step, k the index of the reproduction step, and l the index of the elimination-dispersal event. Also let p: dimension of the search space; S: total number of bacteria in the population; Nc: the number of chemotactic steps; Ns: the swimming length; Nre: the number of reproduction steps; Ned: the number of elimination-dispersal events; Ped: the elimination-dispersal probability; C(i): the size of the step taken in the random direction specified by the tumble.
3.3.3.1 Chemotaxis
This process simulates the movement of an E. coli cell through swimming and tumbling via flagella. Biologically, an E. coli bacterium can move in two different ways: it can swim for a period of time in the same direction, or it may tumble, and it alternates between these two modes of operation for its entire lifetime. Suppose θ^i(j, k, l) represents the i-th bacterium at the j-th chemotactic, k-th reproductive and l-th elimination-dispersal step, and C(i) is the size of the step taken in the random direction specified by the tumble (run length unit). Then in computational chemotaxis the movement of the bacterium may be represented by

θ^i(j+1, k, l) = θ^i(j, k, l) + C(i) D   (3.3)

where D indicates a unit length vector in the random direction.
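A minimal sketch of one chemotactic step per (3.3): a tumble followed by up to Ns swims while the cost keeps improving. The tumble direction generator and toy cost function are illustrative assumptions:

```python
import numpy as np

def tumble_direction(p):
    """Unit-length vector D in a random direction of the p-dim search space."""
    delta = np.random.uniform(-1, 1, p)
    return delta / np.sqrt(delta @ delta)

def chemotaxis_step(theta, cost, C, Ns):
    """One chemotactic step: tumble, then swim up to Ns times while improving."""
    d = tumble_direction(len(theta))
    j_last = cost(theta)
    for _ in range(Ns):
        candidate = theta + C * d          # theta(j+1) = theta(j) + C(i) D
        j_new = cost(candidate)
        if j_new < j_last:                 # nutrient gradient improving: swim
            theta, j_last = candidate, j_new
        else:
            break                          # stop swimming; next step tumbles
    return theta

theta = np.random.uniform(-2, 2, 2)        # one bacterium in a 2-D space
sphere = lambda t: float(t @ t)            # toy cost function
theta = chemotaxis_step(theta, sphere, C=0.1, Ns=4)
```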
Fig. 3.2 Movement of Bacteria (swim and tumble)
3.3.3.2 Swarming
An interesting group behavior has been observed for several motile species of bacteria, including E. coli and S. typhimurium, where stable spatio-temporal patterns (swarms) are formed in a semisolid nutrient medium. A group of E. coli cells arrange themselves in a traveling ring by moving up the nutrient gradient when placed amidst a semisolid matrix with a single nutrient chemo-effecter. When stimulated by a high level of succinate, the cells release an attractant, aspartate, which helps them aggregate into groups and thus move as concentric patterns of swarms of high bacterial density. The cell-to-cell signaling in an E. coli swarm may be represented by the following function:

J_cc(θ, P(j,k,l)) = Σ_{i=1}^{S} [ −d_attract exp( −w_attract Σ_{m=1}^{D} (θ_m − θ_m^i)² ) ] + Σ_{i=1}^{S} [ h_repellent exp( −w_repellent Σ_{m=1}^{D} (θ_m − θ_m^i)² ) ]   (3.4)
where θ = [θ_1, θ_2, …, θ_D]^T is a point in the D-dimensional search domain.
3.3.3.3 Reproduction
The least healthy bacteria eventually die, while each of the healthier bacteria (those yielding lower values of the objective function) asexually splits into two bacteria, which are then placed in the same location. This keeps the swarm size constant.
After Nc chemotactic steps, a reproduction step is taken. Let Nre be the number of reproduction steps to be taken. For convenience, we assume that S is a positive even integer. Let Sr = S/2 be the number of population members which have had sufficient nutrients so that they will reproduce (split in two) with no mutations. For reproduction, the population is sorted in order of ascending accumulated cost (a higher accumulated cost means that a bacterium did not get as many nutrients during its lifetime of foraging and hence is not as "healthy" and thus unlikely to reproduce); then the Sr least healthy bacteria die and the other Sr healthiest bacteria each split into two bacteria, which are placed at the same location. Other fractions or approaches could be used in place of Sr = S/2; this method rewards bacteria that have encountered a lot of nutrients and allows us to keep a constant population size, which is convenient in coding the algorithm.
3.3.3.4 Elimination and Dispersal
Gradual or sudden changes in the local environment where a bacterium population lives may occur due to various reasons; e.g., a significant local rise of temperature may kill a group of bacteria that are currently in a region with a high concentration of nutrient gradients. Events can take place in such a fashion that all the bacteria in a region are killed or a group is dispersed into a new location. To simulate this phenomenon in BFOA, some bacteria are liquidated at random with a very small probability, while the new replacements are randomly initialized over the search space.
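A minimal sketch of the reproduction and elimination-dispersal steps just described; the population size, search bounds and toy health values are illustrative assumptions:

```python
import numpy as np

def reproduce(bacteria, health):
    """Reproduction: sort by accumulated cost (health), let the least healthy
    half die, and split each of the Sr = S/2 healthiest bacteria into two."""
    order = np.argsort(health)                    # ascending accumulated cost
    best = bacteria[order[: len(bacteria) // 2]]
    return np.concatenate([best, best.copy()])    # clones at the same locations

def eliminate_disperse(bacteria, ped, low=-2.0, high=2.0):
    """Elimination-dispersal: with probability Ped, a bacterium is liquidated
    and re-initialized at a random point of the search space."""
    for i in range(len(bacteria)):
        if np.random.rand() < ped:
            bacteria[i] = np.random.uniform(low, high, bacteria.shape[1])
    return bacteria

S, p = 10, 2
bacteria = np.random.uniform(-2, 2, (S, p))
health = np.array([float(b @ b) for b in bacteria])   # toy accumulated cost
bacteria = eliminate_disperse(reproduce(bacteria, health), ped=0.25)
```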
Fig. 3.3 Flow chart of BFO