Chapter-3

SOFT COMPUTING TECHNIQUES

Soft computing is an emerging approach to computing which parallels the remarkable ability of the human mind to reason and learn in an environment of uncertainty and imprecision. Zadeh, L. A. (1992)

3.1 Meaning of Soft-Computing

Soft computing is an innovative approach to constructing computationally intelligent systems that possess human-like expertise within a specific domain and the ability to adapt and learn in changing environments. To achieve this complex goal, no single computing paradigm or solution is sufficient; an intelligent system is required that combines knowledge, techniques and methodologies from various sources, and it is in this arena that soft computing emerged [200]. Soft computing encompasses a group of unique methodologies, contributed mainly by Expert Systems, Fuzzy Logic, ANN and EA, which provide flexible information-processing capabilities for solving real-life problems. The advantage of employing soft computing is its capability to tolerate imprecision, uncertainty and partial truth in order to achieve tractability and robustness while simulating human decision-making behavior at low cost [151]. In other words, soft computing provides an opportunity to represent the ambiguity of human thinking together with the uncertainty of real life [266]. Soft computing is a wide-ranging group of techniques such as neural networks, genetic algorithms, nearest neighbor methods, particle swarm optimization, ant colony optimization, fuzzy systems, rough sets, simulated annealing, DNA computing, quantum computing, membrane computing, etc. While some of these techniques are still in the emerging stage, the rest have found widespread use in areas such as pattern recognition, classification, image processing, voice recognition and data mining. Each of these methodologies has its own strengths. The seamless integration of these methodologies to create intelligent systems forms the core of soft computing [200]. Soft computing is useful where precise scientific tools are incapable of giving a low-cost, analytic and complete solution.

Scientific methods of previous centuries could model and precisely analyze only relatively simple systems of physics, classical Newtonian mechanics and engineering. However, many complex cases, e.g. systems related to biology and medicine, the humanities, management sciences and similar fields, remained outside the main territory of successful application of precise mathematical and analytical methods. Soft computing is not a homogeneous body of concepts and techniques. Rather, it is a collection of methodologies which, in one form or another, reflect the guiding principle of soft computing: exploit the tolerance for imprecision, uncertainty and partial truth to achieve tractability, robustness and low solution cost. Viewed from a slightly different perspective, soft computing is a consortium of methodologies which, either singly or in combination, serve to provide effective tools for the development of intelligent systems [160]. A recent trend is to view Fuzzy Logic, Neuro-computing, Genetic Computing and Probabilistic Computing as an association of computing methodologies falling under the rubric of so-called soft computing. The essence of soft computing is that its constituent methodologies are for the most part complementary and synergistic rather than competitive. A concomitant of the concept of soft computing is that in many situations it is advantageous to employ these tools in combination rather than in isolation [8]. It is thus clear from the above definitions that soft computing techniques resemble human reasoning more closely than traditional techniques, which are largely based on conventional logical systems, such as sentential logic and predicate logic, or rely heavily on the mathematical capabilities of a computer. We now see that the principal constituents of soft computing are fuzzy logic, ANN theory and probabilistic reasoning, with the latter subsuming belief networks, genetic algorithms, chaos theory and parts of learning theory.

What is important to note is that soft computing is not a melange of Fuzzy Logic, ANN and Probabilistic Reasoning. Rather, it is a partnership in which each of the partners contributes a distinct methodology for addressing problems in its domain. In this perspective, the principal contributions of Fuzzy Logic, ANN and Probabilistic Reasoning are complementary rather than competitive. The definitions of Prof. Zadeh also imply that, unlike hard computing schemes, which strive for exactness and full truth, soft computing techniques exploit the tolerance for imprecision, partial truth and uncertainty inherent in a particular problem. Another common contrast comes from the observation that inductive reasoning plays a larger role in soft computing than in hard computing. The novelty and strength of soft computing lie in its synergistic power through the fusion of two or more computational models/techniques [104]. The major soft computing techniques are briefly described here.

3.2 Soft Computing Tools

In this section we briefly outline some of the common soft computing tools (Neural Networks, Genetic Algorithms, k-NN, LI-KNN, GI-KNN, PSO and ACO) based on their fundamental characteristics.

3.2.1 Artificial Neural Network

An ANN is a parallel distributed information-processing structure consisting of a number of nonlinear processing units called neurons. The neuron operates as a mathematical processor, performing specific mathematical operations on its inputs to generate an output [267]. It can be trained to recognize patterns and to identify incomplete patterns by resembling the human-brain processes of recognizing information, suppressing noise and retrieving information correctly [89]. In terms of modeling, remarkable progress has been made in the last few decades to improve ANN. ANNs are strongly interconnected systems of so-called neurons which have simple behavior, but when connected they can solve complex problems. Changes may be made further to enhance their performance [89]. The details of Neural Networks are given in Chapter-4.
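As an illustration of the neuron just described, the following minimal Python sketch computes a single neuron's output as a weighted sum of its inputs passed through a sigmoid activation; the weights, bias and choice of activation are illustrative assumptions, not values taken from this work.

import math

def neuron_output(inputs, weights, bias):
    # Single artificial neuron: weighted sum of inputs plus a bias,
    # squashed by a sigmoid activation (one common nonlinear choice).
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Example usage with assumed (hypothetical) weights and bias.
print(neuron_output([0.5, 0.2, 0.9], weights=[0.4, -0.6, 0.3], bias=0.1))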

3.2.2 Genetic Algorithms

Evolutionary algorithms (EA) were invented to mimic some of the processes observed in natural evolution. Evolution occurs on chromosomes, the organic devices for encoding the structure of living beings. Processes of natural selection then drive those chromosomes that encode successful structures to reproduce more frequently than those that encode failed structures. In other words, the chromosomes with the best evaluations tend to reproduce more often than those with bad evaluations. By using simple encodings and reproduction mechanisms, the algorithms can display complicated behavior and turn out to solve some extremely difficult problems [46]. GAs are a special subclass of a wider set of EA techniques. GAs were named and introduced by John Holland in the mid-1960s. Then Lawrence Fogel began to work on evolutionary programming, and Ingo Rechenberg and Hans-Paul Schwefel introduced the evolution strategy. In resolving difficult problems where little is known, their pioneering work stimulated the development of a broad class of optimization methods [208].

Subsequently, genetic algorithms were studied by De Jong and Goldberg, and by others such as Davis, Eshelman, Forrest, Grefenstette, Koza, Mitchell, Riolo and Schaffer, to name only a few; GAs have been most frequently applied to the domain of optimization [11]. Based on the principles of natural evolution, GAs are robust and adaptive methods for solving search and optimization problems [263]. Because of their robustness, genetic algorithms have attracted vast interest among researchers all over the world [64]. In addition, by simulating some features of biological evolution, GAs can solve problems where traditional search and optimization methods are less effective. Therefore, genetic algorithms have been demonstrated to be promising techniques which have been applied to a broad range of application areas [264]. The ability to apply GAs to real-world problems has improved significantly over the past decade [46]. Details of the Genetic Algorithm are given in Chapter-5.
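As a minimal illustration of the selection, crossover and mutation cycle described above, the Python sketch below evolves bit-string chromosomes against a toy fitness function; the encoding, the "OneMax" fitness and all parameter values are illustrative assumptions only.

import random

def fitness(bits):
    # Illustrative fitness: count of 1-bits (the toy "OneMax" problem).
    return sum(bits)

def evolve(pop_size=20, length=16, generations=50, p_mut=0.01):
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        def select():
            # Tournament selection: the fitter of two random individuals survives.
            a, b = random.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        new_pop = []
        while len(new_pop) < pop_size:
            p1, p2 = select(), select()
            cut = random.randint(1, length - 1)      # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ 1 if random.random() < p_mut else b for b in child]  # mutation
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=fitness)

print(evolve())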

3.2.3 Nearest Neighbor Techniques

(a) k-Nearest Neighbor

The nearest neighbor rule is a simple classification technique used in pattern recognition, which says that a point belongs to the same category as the point nearest to it. A variation of the nearest neighbor rule classifies a point as belonging to the category of the majority of a certain number of nearest neighbors. Nearest neighbor classifiers are based on learning by analogy. They are instance-based or lazy learners in that they store all of the training samples and do not build a classifier until a new (unlabeled) sample needs to be classified. This contrasts with eager learning methods, such as decision tree induction and back-propagation, which construct a generalization model before receiving new samples to classify.

Lazy learners can incur expensive computational costs when the number of potential neighbors (i.e., stored training samples) with which to compare a given unlabeled sample is large. Therefore, they require efficient indexing techniques. As expected, a lazy learning method is faster at training than an eager method, but slower at classification, since all computation is delayed to that time. Unlike decision tree induction and back-propagation, nearest neighbor classifiers assign equal weight to each attribute. This may cause confusion when there are many irrelevant attributes in the data. The k-NN classifier has been both a workhorse and a benchmark classifier [9], [10], [41], [162], [235]. Given a query vector x0 and a set of N labeled instances {xi, yi}, the task of the classifier is to predict the class label of x0 over the P predefined classes. The k-NN classification algorithm tries to find the k nearest neighbors of x0 and uses a majority vote to determine its class label. A metric distance between any two entities is called a notion of proximity. If two entities are in close proximity, then they are said to belong to the same class or group. The nearest neighbor search is a method to identify entities in the same proximity in a supervised manner and is defined as: "Given a collection of data points and a query point in a d-dimensional metric space, find the data point that is closest to the query point" [22].

(b) Locally Informative k-NN (LI-KNN)

Without prior knowledge, most k-NN classifiers apply the Euclidean distance as the measure of closeness between examples. Treating neighbors of low relevance with the same importance as those of high relevance can degrade the performance of k-NN procedures [60], so we believe it is worthwhile to further explore the information exhibited by neighbors. To find out the importance of an instance, we propose a new distance metric that assesses the informativeness of a point given a specific query point. We then use it to augment k-NN classification and advocate our first method, LI-KNN. The rationale of informativeness is that two points are likely to share the same class label when their distance is sufficiently small, assuming the points have a uniform distribution. This idea is the same as in k-NN classification. On the other hand, compared with the traditional k-NN classifier, which measures pairwise distances between the query point and its neighbors, our metric also calculates the closeness between neighboring points, i.e., an informative point should also have a large distance from dissimilar points. This further guarantees, with maximum likelihood, that the other informative points in its locality have the same class label.

(c) Globally Informative k-NN (GI-KNN)

The LI-KNN algorithm classifies each individual query point by learning informative points separately; however, the informativeness of those neighbors is then discarded without being utilized for other query points. Indeed, in most scenarios, different queries Q may yield different informative points. However, it is reasonable to expect that some points are more informative than others, i.e., they could be informative neighbors for several different points.

As a result, it would seem reasonable to put more emphasis on those points that are globally informative. Since it has been shown that k-NN classification can be improved by learning a distance metric from the training examples, we enhance the power of the informativeness metric and propose a boosting-like iterative method, namely Globally Informative k-NN (GI-KNN), which aims to learn the best weighting for points within the entire training set. Details of the Nearest Neighbor Techniques are given in Chapter-6.
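To make the majority-vote rule of part (a) concrete, the following minimal Python sketch implements plain k-NN with Euclidean distance; the toy data, labels and choice of k are illustrative assumptions, and LI-KNN/GI-KNN would replace this plain distance with the informativeness-based weighting described above.

import math
from collections import Counter

def euclidean(a, b):
    # Euclidean distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(query, samples, labels, k=3):
    # Classic k-NN: find the k training samples closest to the query
    # and return the majority class label among them.
    order = sorted(range(len(samples)), key=lambda i: euclidean(query, samples[i]))
    top_k = [labels[i] for i in order[:k]]
    return Counter(top_k).most_common(1)[0][0]

# Illustrative toy data (hypothetical values).
X = [[1.0, 1.1], [0.9, 1.0], [4.0, 4.2], [4.1, 3.9]]
y = ["A", "A", "B", "B"]
print(knn_predict([1.2, 0.8], X, y, k=3))   # expected to print "A"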

3.2.4 Polynomial Neural Network (PNN)

PNN is a flexible neural architecture whose topology is not predetermined like that of a conventional ANN but is developed through learning, layer by layer. The design is based on the Group Method of Data Handling (GMDH), which was invented by Prof. A. G. Ivakhnenko in the late 1960s [80], [81], [146], [149]. He developed GMDH as a means of identifying nonlinear relations between input and output variables. As described in [148], the GMDH generates successive layers with complex links that are individual terms of a polynomial equation. The individual terms generated in the layers are partial descriptions (PDs) of the data, namely quadratic regression polynomials with two inputs. The first layer is created by computing regressions of the input variables and choosing the best ones for survival. The second layer is created by computing regressions of the values in the previous layer along with the input variables and retaining the best candidates. More layers are built until the network stops improving according to the termination criteria. The selection criterion used in this study penalizes models that become too complex, in order to prevent overtraining. Figure-3.1 shows a basic PNN model with all inputs. The details of the PNN are given in Chapter-4 along with the ANN.
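As a rough illustration of a single GMDH partial description, the Python sketch below fits the two-input quadratic polynomial y ~ a0 + a1*xi + a2*xj + a3*xi^2 + a4*xj^2 + a5*xi*xj by least squares; the function name, toy data and use of NumPy are assumptions for illustration, not the implementation used in this work.

import numpy as np

def fit_partial_description(xi, xj, y):
    # One GMDH partial description (PD): quadratic regression of two inputs,
    # y ~ a0 + a1*xi + a2*xj + a3*xi**2 + a4*xj**2 + a5*xi*xj.
    A = np.column_stack([np.ones_like(xi), xi, xj, xi**2, xj**2, xi * xj])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs

# Illustrative toy data (hypothetical): y depends nonlinearly on two inputs.
rng = np.random.default_rng(0)
xi, xj = rng.uniform(-1, 1, 100), rng.uniform(-1, 1, 100)
y = 1.0 + 2.0 * xi - xj + 0.5 * xi * xj + rng.normal(0, 0.01, 100)
print(fit_partial_description(xi, xj, y))

In a full PNN, many such PDs are generated per layer for pairs of candidate inputs, and only the best of them, judged by an external selection criterion, survive to feed the next layer.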

Figure-3.1 Basic PNN Model

3.2.5 Fuzzy Logic

In the real world, information is often ambiguous or imprecise. When we state that it is warm today, context is necessary to approximate the temperature. What counts as a warm day in January corresponds to a much lower temperature than a warm day in August, which may be 33 degrees Celsius. After a long spell of frigid days, we may call a milder but still chilly day relatively warm. Human reasoning filters and interprets information in order to arrive at conclusions or to dismiss it as inconclusive. Although machines cannot yet handle imprecise information in the same way that humans do, computer programs with fuzzy logic are becoming quite useful where the sheer volume of tasks defies human analysis and action. Fuzzy logic is an organized method for dealing with imprecise data. The data sets engaged in fuzzy logic are treated as fuzzy sets. Traditional sets either include or do not include an individual element; there is no case other than true or false.

Fuzzy sets allow partial membership. Fuzzy logic is basically a multi-valued logic that allows intermediate values to be defined between conventional evaluations like yes/no, true/false, black/white, etc. Notions like rather warm or pretty cold can be formulated mathematically and processed by computer. In this way, an attempt is made to apply a more human-like way of thinking in the programming of computers. Fuzzy logic is an extension of classical propositional and predicate logic, which rest on the principle of binary truth functionality. Fuzzy logic is a multi-valued logic. However, the most pertinent feature of fuzzy logic, for which it receives so much attention, is its scope for partial matching. In any real-world system, the inferences guided by a number of rules follow a middle decision trajectory over time. This particular behavior of following a middle decision trajectory [107] is human-like and is a unique feature of fuzzy logic, which has made it so attractive. Very recently, Prof. Zadeh highlighted another important characteristic [79] of fuzzy logic that distinguishes it from other multi-valued logics. He called it f.g-generalization. According to him, any theory, method, technique or problem can be fuzzified (or f-generalized) by replacing the concept of a crisp set with a fuzzy set. Further, any theory, technique, method or problem can be granulated (or g-generalized) by partitioning its variables, functions and relations into granules (information clusters). Finally, we can combine f-generalization with g-generalization and call it f.g-generalization. Thus, ungrouping an information system into components by some strategy and regrouping them into clusters by some other strategy can give rise to a new kind of information sub-system. Determining the strategies for ungrouping and grouping, however, rests on the designer's choice. The philosophy of f.g-generalization will undoubtedly re-discover fuzzy logic in a new form.
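To illustrate partial membership, the short Python sketch below defines a triangular membership function for a linguistic term such as "warm"; the breakpoints of 20, 28 and 35 degrees Celsius are illustrative assumptions only.

def triangular(x, a, b, c):
    # Triangular fuzzy membership: 0 outside [a, c], rising linearly to 1 at b.
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def warm(temp_c):
    # Degree to which a temperature (in Celsius) is "warm" (assumed breakpoints).
    return triangular(temp_c, 20.0, 28.0, 35.0)

for t in (18, 24, 28, 33):
    print(t, "degrees ->", round(warm(t), 2))   # partial memberships in [0, 1]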


3.2.6 Particle Swarm Optimization

PSO is one of the soft computing techniques. It is a population-based stochastic optimization algorithm modeled on the social behavior of bird flocks [91], [92]. It can be easily implemented and has been successfully applied to solve a wide range of optimization problems, such as continuous nonlinear and discrete optimization problems [91], [92], [193]. There are two basic variations of PSO models based on the types of neighborhoods used: 1) global best (gbest) and 2) local best (lbest). In the gbest neighborhood, the particles are attracted to the best solution found by any member of the swarm (i.e., the collection of particles). This represents a fully connected network in which each particle has access to the information of all other members in the community. However, in the case of the local best approach, each particle has access only to the information of its immediate neighbors, according to a certain swarm topology. The PSO is explained in Chapter-5.
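A minimal gbest-PSO sketch in Python is given below for illustration; the inertia and acceleration coefficients, the sphere objective and the search bounds are assumptions made for this example, not parameters used elsewhere in this work.

import random

def sphere(x):
    # Illustrative objective to minimize (sum of squares, minimum at the origin).
    return sum(v * v for v in x)

def pso(dim=2, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, bound=5.0):
    # Initialize positions, velocities, and personal/global bests.
    pos = [[random.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    gbest = min(pbest, key=sphere)
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Velocity update: inertia plus pulls toward personal and global bests.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if sphere(pos[i]) < sphere(pbest[i]):
                pbest[i] = pos[i][:]
                if sphere(pbest[i]) < sphere(gbest):
                    gbest = pbest[i][:]
    return gbest

print(pso())   # should approach the minimum at the origin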
