STUDIES IN APPLICATIONS OF SOFT COMPUTING TO SOME OPTIMIZATION PROBLEMS

Arindam Chaudhuri
Registration Number: 08119900024

A Thesis submitted to Netaji Subhas Open University in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2010

Dedicated to the loving memory of my beloved Father

ACKNOWLEDGEMENTS

The work reported in this thesis was performed between 2004 and 2010 at the Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India and the Department of Computer Science Engineering, Birla Institute of Technology Mesra, Patna Campus, India. With a deep sense of gratitude I express my heartiest thanks to Prof Kajal De for supervising the research work leading to this dissertation. Her invaluable advice was a source of great motivation for me. She provided constant encouragement and took great pains to ensure the success of this work. I am indebted to her for guiding this thesis and making this work possible. Heartfelt thanks go to Prof Dipak Chatterjee, Department of Mathematics, St. Xavier's College, Kolkata, India. His discussions of different aspects of mathematics and statistics formed a backbone for this work and will be of immense value to me throughout my research and academic career. Without his decisive help and support at critical moments, finalizing this thesis would not have been possible. My indebtedness to Dr. Pabitra Mitra, Department of Computer Science Engineering, Indian Institute of Technology Kharagpur, India is beyond words. I have known him for the last decade, and he has been my role model of an ideal researcher and academician. A special note of thanks goes to Dr. Debaprasad Mandal. I acknowledge Birla Institute of Technology Mesra, India for providing computing facilities and the Indian Statistical Institute for library facilities. I shall forever remain grateful to my wife Shivapriya and my father- and mother-in-law for the constant enthusiasm and support that helped me during this research work.

NSOU, Kolkata 2010

Arindam Chaudhuri

Contents

1 Introduction and Scope of Thesis
1.1 Introduction
1.1.1 Research Problem
1.1.2 Research Assumptions
1.1.3 Hypothesis
1.2 Optimization Problem
1.3 Solution of Optimization Problem
1.4 Soft Computing
1.4.1 Fuzzy Sets
1.4.2 Artificial Neural Networks
1.4.3 Genetic Algorithms
1.4.4 Rough Sets
1.4.5 Ant Colony Optimization
1.4.6 Hybrid Algorithms
1.5 Soft Computing for Optimization Problems
1.5.1 Scope of Applicability
1.5.2 Research Issues and Challenges
1.6 Related Works
1.6.1 Work related to Traveling Salesman Problem
1.6.2 Work related to Transportation Problem
1.6.3 Work related to Decision Making and Rectangular Games
1.6.4 Work related to Financial Investment Classification
1.6.5 Work related to Forecasting and Prediction
1.6.6 Work related to Bankruptcy Prediction
1.6.7 Work related to Timetable Assignment Problem
1.6.8 Work related to Longest Common Subsequence Problem
1.6.9 Work related to Job Scheduling Problem
1.7 Scaling Soft Computing Algorithms to Complex Data Sets
1.7.1 Data reduction
1.7.2 Dimensionality reduction
1.7.3 Active learning
1.7.4 Data partitioning
1.7.5 Efficient search algorithms
1.8 Scope of the Thesis
1.8.1 Different Formulations of Traveling Salesman Problem
1.8.2 Modeling Various Aspects of Transportation Problem
1.8.3 Decision Making and its Applications in Game Theory and Financial Investments
1.8.4 Time Series Forecasting and Predicting Stock Prices along with Bankruptcy in Organizations
1.8.5 Some Problems in Assignment, Sequencing and Job Scheduling
1.8.6 Conclusions and Scope for Further Research

2 Different Formulations of Traveling Salesman Problem
2.1 Introduction
2.2 Fuzzy Self Organizing Map (FSOM) Model
2.2.1 Self Organizing Map (SOM)
2.2.2 Fuzzy Self Organizing Map (FSOM)
2.2.3 Heuristic solution for TSP by FSOM
2.3 Fuzzy Integer Linear Programming (FILP) Model
2.3.1 ILP Model of TSP
2.3.2 FILP Model of TSP
2.4 Fuzzy Multi-Objective Linear Programming (FMOLP) Model
2.4.1 Multi-Objective Linear Programming
2.4.2 Fuzzy Multi-Objective Linear Programming
2.4.3 FMOLP Model of TSP
2.5 Experimental Results and Comparisons
2.5.1 Simulation results of TSP using FSOM technique
2.5.2 Simulation results of TSP using FILP Approach
2.5.3 Simulation results of TSP using FMOLP Approach
2.6 Conclusion

3 Modeling Various Aspects of Transportation Problem
3.1 Introduction
3.2 Transportation Problem under Probabilistic and Fuzzy Uncertainties
3.3 Solution of Constraint Equations of Transportation Problem through Neuro-Fuzzy Approach
3.3.1 Neuro-Fuzzy Approach
3.3.2 Fuzzy Back-propagation Learning rule
3.3.3 Polak-Ribiere conjugate gradient algorithm with Fuzzy rules
3.3.4 Architecture of Neuro-Fuzzy network
3.3.5 Complexity analysis of modified Polak-Ribiere conjugate gradient Neuro-Fuzzy Algorithm
3.4 Proposed solution of Transportation Problem using FVAM
3.4.1 Transportation Problem through Fuzzy Trapezoidal Numbers
3.4.2 Definitions regarding Fuzzy feasible solution of Transportation Problem
3.4.3 Solution of Transportation Problem using Fuzzy Trapezoidal Numbers
3.4.4 Computational Complexity of FVAM
3.5 Experimental Results and Comparisons
3.5.1 Simulation results of Transportation Problem under Probabilistic and Fuzzy Uncertainties
3.5.2 Numerical Examples illustrating Solution of System of Linear Equations using Neuro-Fuzzy Approach
3.5.3 Simulation results of Transportation Problem using FVAM
3.6 Conclusion

4 Decision Making and its Applications in Game Theory and Financial Investments
4.1 Introduction
4.2 Solution of Decision Making Problems using Fuzzy Soft Relations
4.2.1 Soft Relation – A Classical Approach
4.2.2 Fuzzy Soft Relation
4.2.3 Soft Relation and Fuzzy Soft Relation – An Alternative Approach
4.3 Solution of Rectangular Fuzzy Games by Principle of Dominance using LR-type Trapezoidal Fuzzy Numbers
4.3.1 Interval Numbers
4.3.2 Two Person Zero Sum Games and Pay-off Matrix
4.3.3 Solution of 2 × 2 Games with Mixed Strategies
4.3.4 Concept of Dominance
4.4 Classification of Financial Investments using Multi-class SVM Approach
4.4.1 Support Vector Machine
4.4.2 Multi-class SVM for Classification of Financial Investments
4.5 Experimental Results and Comparisons
4.5.1 Application of Fuzzy Soft Relations to Decision Making Problems
4.5.2 Application of LR-type Trapezoidal Fuzzy Numbers to Dominance Problem
4.5.3 Application of Multi-class SVM for Classification Problem
4.6 Conclusion

5 Time Series Forecasting and Predicting Stock Prices along with Bankruptcy in Organizations
5.1 Introduction
5.2 Predicting Stock Price using RFMLP Networks
5.2.1 Fuzzy Multi Layer Perceptron Networks
5.2.2 Generation of Stock Price Prediction Rules using RFMLP Networks
5.2.3 Quantitative Performance Measures
5.3 Time Series Forecasting using Hybrid Neuro-Fuzzy Regression Model
5.3.1 ANN Approach to Time Series
5.3.2 Fuzzy Forecasting Model
5.3.3 Fuzzy Regression Model
5.3.4 Neuro-Fuzzy Forecasting Model
5.4 Enhancing Forecasting Accuracy with Non-uniform Spread FLR Model
5.4.1 Fuzzy Linear Regression
5.4.2 Membership Function of Regression Coefficients
5.4.3 Fuzzy Regression with Non-uniform Spreads
5.5 FSVM for Bankruptcy Prediction
5.5.1 Bankruptcy Analysis Methodology
5.5.2 Need for Risk Classification
5.5.3 Support Vector Machine
5.5.4 Fuzzy Support Vector Machine
5.6 Experimental Results and Comparisons
5.6.1 Simulation results of Discovering Stock Price Prediction Rules using RFMLP Model
5.6.2 Simulation results of Forecasting using Hybrid Neuro-Fuzzy Model
5.6.3 Simulation results denoting Non-uniform Spread in FLR Model
5.6.4 Simulation results denoting Bankruptcy Prediction using FSVM
5.7 Conclusion

6 Some Problems in Assignment, Sequencing and Job Scheduling
6.1 Introduction
6.2 FILP for Examination Timetable Problem
6.2.1 Examination Timetable at Netaji Subhas Open University
6.2.2 Model 1 for Examination Timetable Problem
6.2.3 Model 2 for Examination Timetable Problem
6.2.4 Model 3 for Examination Timetable Problem
6.3 FGH Algorithm for University Course Timetable Problem
6.3.1 University Course Timetable Problem
6.3.2 Uncertainty measures in University Course Timetable Problem
6.3.3 FGH Algorithm for University Course Timetable Problem
6.3.4 Formulation of fitness function
6.3.5 Genetic Operators
6.4 ACO technique for LCS Problem
6.4.1 Longest Common Subsequence
6.4.2 Ant Colony Optimization
6.4.3 Framework of basic Ant Colony Optimization Algorithm
6.4.4 ACO for LCS Problem
6.4.5 Stochastic Combinatorial Optimization for ACO-LCS Algorithm
6.4.6 Stochastic Gradient Ascent update in ACO-LCS
6.4.7 Cross-entropy update in ACO-LCS
6.5 RFMLP–NN for JSP
6.5.1 Generation of solutions using Genetic Algorithms
6.5.2 Data Classification Problem
6.5.3 RFMLP–NN Model
6.5.4 Structure of RFMLP–NN Model
6.5.5 RFMLP–NN Job Scheduler
6.6 Experimental Results and Comparisons
6.6.1 Simulation results of examination timetable problem from Netaji Subhas Open University, Kolkata
6.6.2 Simulation results of University Course Timetable Problem from St. Xavier's College, Kolkata
6.6.3 Illustration of ACO-LCS Algorithm
6.6.4 Performance illustration of RFMLP–NN Classifier for JSP
6.7 Conclusion

7 Conclusion and Scope for further Research
7.1 Conclusion
7.2 Scope for further Research

Appendix: Data Sets Used in Experiments
Bibliography

List of Figures

Figure 2.1: Kohonen SOM with two dimensional neighborhood and input vector
Figure 2.2: One dimensional neighborhood of Kohonen SOM
Figure 2.3: Classical two dimensional neighborhood
Figure 2.4: Extended two dimensional neighborhood of Kohonen SOM
Figure 2.5: Self-organization of network with two dimensional neighborhoods
Figure 2.6: Self-organization of network with one dimensional neighborhood
Figure 2.7: SOM solution without 2-opt optimization (left): there are two local loops on the left; the first and last neurons can be seen in the middle; they are not connected in the picture, but the distance between them is also computed. The same solution improved by 2-opt (right): the loops on the left have been erased and additional changes can be observed
Figure 2.8: 2-opt optimization: if there is a cycle (A, B, C, D, E, F) and the path (B, C, D) is reversed, the new cycle is (A, D, C, B, E, F)
Figure 2.9: Illustration of network of TSP; distance between arcs $(i, s)$ and $(j, s+1)$ is $x_{isj}$
Figure 2.10: Optimal tour length for 225 city set taken from TSPLIB (left) is 3916; tour length generated by FSOM 2-opt hybrid (right) is 3899
Figure 2.11: Optimal tour length for 2392 city set taken from TSPLIB (left) is 378037; tour length generated by FSOM 2-opt hybrid (right) is 377946
Figure 2.12: Symmetric Traveling Salesman Problem
Figure 3.1: General representation of Transportation Problem
Figure 3.2: Frequency distributions to be transformed
Figure 3.3: The transformation of Cumulative function to Fuzzy number
Figure 3.4: Frequency distribution $F$ and Fuzzy number $\mu$ for optimized $x_{11}$
Figure 3.5: Frequency distribution $F$ and Fuzzy number $\mu$ for optimized $x_{12}$
Figure 3.6: Frequency distribution $F$ and Fuzzy number $\mu$ for optimized $x_{22}$
Figure 3.7: Frequency distribution $F$ and Fuzzy number $\mu$ for optimized $x_{33}$
Figure 3.8: Frequency distribution $F$ and Fuzzy number $\mu$ for optimized benefit $D$: 1 – Monte Carlo method for 10,000 random steps; 2 – Monte Carlo method for 100,000,000 random steps; 3 – Fuzzy approach
Figure 3.9: Transportation Problem with 3 origins and 4 destinations
Figure 3.10: First allocation to Transportation Problem
Figure 3.11: Second allocation to Transportation Problem
Figure 3.12: Third allocation to Transportation Problem
Figure 3.13: Fourth allocation to Transportation Problem
Figure 3.14: Fifth allocation to Transportation Problem
Figure 3.15: Sixth allocation to Transportation Problem
Figure 3.16: Final allocated matrix with corresponding allocation values
Figure 3.17: Final allocated matrix with sum of $U_i$ and $V_j$ values and cell evaluations for unoccupied cells
Figure 4.1: The term tall men in terms of Crisp and Fuzzy soft relations
Figure 4.2: (a) Trapezoidal membership function; (b) Triangular membership function; (c) Smooth Trapezoid; (d) Smooth Triangular
Figure 4.3: Truth and Cayley tables for $p \wedge q$
Figure 4.4: Truth table for Fuzzy soft connective or
Figure 4.5: Modus Ponens rule of inference
Figure 4.6: The Cayley table for Fuzzy soft connective implication
Figure 4.7: Average Classifier accuracy for Glass dataset using Gauss kernel
Figure 5.1: Intra and Inter module links
Figure 5.2: Steps for designing sample modular RFMLP
Figure 5.3: Chromosomal representation
Figure 5.4: Artificial Neural Network Structure $N^{(p \times q \times 1)}$
Figure 5.5: Minima $f_{\mathrm{opt}}$ and $\hat{f}_n$ of expected ($R$) and empirical ($\hat{R}$) Risk functions generally do not coincide
Figure 5.6: Eight possible ways of shattering 3 points on plane with Linear indicator function
Figure 5.7: The separating hyperplane $x^T w + b = 0$ and margin in non-separable case
Figure 5.8: Positive connectivity of the network obtained for BSE data using Model S; bold lines indicate weights greater than $PThres_2$, while others indicate values between $PThres_1$ and $PThres_2$
Figure 5.9: Results obtained from neuro-fuzzy model (Series 1 denotes upper bound of exchange rate; Series 2 denotes actual value of exchange rate; Series 3 denotes lower bound of exchange rate)
Figure 5.10: Results of neuro-fuzzy model after deleting 22nd August, 2008 lower bound (Series 1 denotes upper bound of exchange rate; Series 2 denotes actual value of exchange rate; Series 3 denotes lower bound of exchange rate)
Figure 5.11: Ratings of Organizations in two dimensions; case of low complexity of classifier functions, radial basis is $100 \cdot 2^{1/2}$, capacity is fixed at $C = 1$
Figure 5.12: Ratings of Organizations in two dimensions; case of an average complexity of classifier functions, radial basis is $2^{1/2}$, capacity is fixed at $C = 1$
Figure 5.13: Ratings of Organizations in two dimensions; case of an excessively high complexity of classifier functions, radial basis is $0.5 \cdot 2^{1/2}$, capacity is fixed at $C = 1$
Figure 5.14: Ratings of Organizations in two dimensions; case of high capacity $C = 300$, radial basis is fixed at $2^{1/2}$
Figure 5.15: Power (Lorenz) curve – the cumulative default rate as a function of the percentile of Organizations sorted according to their score – for training set of Organizations; FSVM is applied with radial basis $2^{1/2}$ and capacity $C = 1$
Figure 6.1: Fuzzy representation of elapsed times
Figure 6.2: Fuzzy representation of schedule of teacher
Figure 6.3: Schematic illustration of RFMLP–NN based Scheduler
Figure 6.4: Representation of GA solution
Figure 6.5: Input features and target class for Classification Problem
Figure 6.6: Block diagram of the Procedure
Figure 6.7: Rough Fuzzy Multi Layer Perceptron Neural Network (12–12–10–6)
Figure 6.8: Directed path from one element to other depicting Pheromone deposited by ants
Figure 6.9: The identical nature of 1234 with respect to other permutations
Figure 6.10: The identical nature of 1234 with respect to other permutations
Figure 6.11: The diagram of a = 1234, b = 1432
Figure 6.12: The diagram of a = 1234, b = 1234
Figure 6.13: The diagram of a = 1234, b = 1423
Figure 6.14: The diagram of a = 123, b = 213
Figure 6.15: The diagram of a = 123, b = 123
Figure 6.16: The diagram of a = 231, b = 213
Figure 6.17: The diagram of a = 12, b = 12, 1
Figure 6.18: The diagram of a = 21, b = 21, 1
Figure 6.19: The diagram of a = 12, b = 21, 1
Figure 6.20: The diagram of a = 21, b = 12, 1

List of Tables

Table 2.1: Comparison of FSOM, EA and 2-opt Algorithm
Table 2.2: Results for Lin-Kernighan Algorithm
Table 2.3: Matrix for time, cost and distance for each pair of cities
Table 2.4: Solution of Fuzzy Multi-Objective linear program
Table 3.1: Average values of Gaussian distributions of uncertain parameters
Table 4.1: General form of Payoff table
Table 4.2: General form of Regret table
Table 4.3: Parameterized matrix for Soft relation is husband of
Table 4.4: Parameterized matrix for Fuzzy Soft relation far
Table 4.5: Membership matrix for Fuzzy Soft relation costly, beautiful
Table 4.6: Membership matrix for Fuzzy Soft relation cheap, beautiful
Table 4.7: Membership values and corresponding elements of Universe
Table 4.8: Kernel functions of SVM
Table 4.9: Probability values of person $p_i$ with respect to risk taking parameter
Table 4.10: Possibility values of person $p_i$ with respect to risk taking attitude
Table 4.11: Probability values of investment $i_k$ with respect to advance mobilization
Table 4.12: Possibility values of investment $i_k$ with respect to advance mobilization
Table 4.13: Probability values of fund source $m_k$ with respect to fund mobility
Table 4.14: Possibility values of fund source $m_k$ with respect to fund mobility
Table 4.15: Probability values of recruitment $r_i$ with respect to innovativeness parameter
Table 4.16: Possibility values of recruitment $r_i$ with respect to innovativeness parameter
Table 4.17: Probability values of product $t_i$ with respect to price parameter
Table 4.18: Possibility values of product $t_i$ with respect to price parameter
Table 4.19: Comparison of classifier accuracy using different Kernel functions
Table 4.20: Comparison of classifier accuracy using different methods for Multi-class SVM
Table 5.1: Rating grades and Risk premia
Table 5.2: Rating grades and capital requirements; figures in the last column were estimated for a loan to an SME with turnover of 5 million euros and maturity of 2.5 years using data from column 2 and the recommendations of the Basel Committee on Banking Supervision
Table 5.3: Rough set dependency rules for BSE data along with input Fuzzy parameter values
Table 5.4: Comparative Performance of different models on BSE dataset
Table 5.5: Comparison of performance of rules extracted by various methods on BSE data
Table 5.6: Rules extracted from trained networks (Model S) for BSE data along with input Fuzzy parameter values
Table 5.7: Weights and Biases of ANN $N^{(2 \times 3 \times 1)}$
Table 5.8: Results of Neuro-Fuzzy model for test data
Table 5.9: Comparison of forecasted interval widths of Neuro-Fuzzy model with different sample sizes
Table 5.10: Comparison of forecasted interval widths by Neuro-Fuzzy model with other forecasting models
Table 5.11: Comparison of performance of Neuro-Fuzzy model with other forecasting models
Table 5.12: Numerical data and estimation errors (Example 1)
Table 5.13: $\alpha$-cuts of Fuzzy regression coefficients at eleven distinct $\alpha$-values
Table 5.14: Numerical data and estimation errors (Example 2)
Table 5.15: Cluster centre locations; there are 25 members in class {-1}: successful organizations and 75 members in class {1}: unsuccessful organizations
Table 6.1: Timetable Problem specifications
Table 6.2: Hard Constraint specifications
Table 6.3: Soft Constraint specifications
Table 6.4: Standard Genetic Operators and parameters considered
Table 6.5: ft06 problem instance devised by Fisher and Thompson (1963)
Table 6.6: Assignment of class labels to target feature
Table 6.7: Sample data for classification task
Table 6.8: Training parameters for 12–12–10–6 RFMLP Classifier
Table 6.9: Test results for FILP Model 1
Table 6.10: Test results for FILP Model 2
Table 6.11: Test results for FILP Model 3
Table 6.12: Test results for Model 1 as ILP heuristic
Table 6.13: Test results for Model 2 as ILP heuristic
Table 6.14: Test results for Model 3 as ILP heuristic
Table 6.15: Comparison of best and mean cost of FILP (Model 3) technique with other AI based Heuristic techniques
Table 6.16: Comparison of Execution Times (in minutes) of FILP (Model 3) technique with other AI based Heuristic techniques
Table 6.17: Comparison of the results obtained by FILP (Model 3) technique and best result cited in literature
Table 6.18: Simulation results for standard operators and parameters
Table 6.19: Simulation results for domain specific operators
Table 6.20: Comparison between manual and Fuzzy Genetic Algorithm solution
Table 6.21: Comparison of Execution Times (in minutes) of Fuzzy Genetic Heuristic solution with other GA based Heuristic techniques
Table 6.22: Confusion matrix of 12–12–10–6 RFMLP-NN Classifier
Table 6.23: Makespan of Schedules for ft06
Table 6.24: Makespan obtained by various Schedulers on test data set
Table 6.25: Different groups of schedulers
Table 6.26: Comparison of makespan performance of different schedulers

Chapter 1
Introduction and Scope of Thesis

1.1 Introduction
The ever-increasing demand to lower production costs in the face of cutthroat competition has prompted engineers and technologists to look for rigorous methods of decision making [94], [126], [146], [148], such as optimization methods or operations research techniques, to design and produce products both economically and efficiently. Since the search for the best has always fascinated mankind, operations and strategies have been devised for finding the best solution to a variety of problems in almost all branches of activity that can be perceived by logic or intuition or both. The history of computing records this endeavor for optimization even from the days when Euclid was concerned with finding the greatest and least straight lines that can be drawn from a point to the circumference of a circle. More complex problems of maximization and minimization had to wait until calculus and the calculus of variations were introduced and developed. Many problems of a geometric, dynamic or physical nature have since been attempted and solved by determining optimal solutions [12], [126], [148]. Problems involving curves of quickest descent or minimum areas of revolution fall in this category. The existence of optimization methods can be traced even to the days of Newton, Lagrange and Cauchy. The contributions of these mathematicians to the development of differential calculus made possible methods of optimization for some problems. The foundation of the calculus of variations by Bernoulli, Euler, Lagrange and Weierstrass led to the discovery of methods of optimization for constrained problems, which involve the addition of unknown multipliers. Cauchy made the first application of the steepest descent method to solve unconstrained minimization problems [94], [148]. In spite of these early contributions, very little progress was made until the middle of the twentieth century, when high speed digital computers made the implementation of optimization techniques possible and stimulated further research on new methods of optimization [12], [126]. In recent years, applied science has acquired a high degree of competence in industrial, scientific and military operations. The growth of Computer Science has added much to this progress [94], [148]. In the past, large margins of safety were acceptable in the design and planning of such operations. But now the competitive situation has become so tense that a real demand has been created for the discovery of methods which systematically improve systems of operations. This has become possible with a growing reservoir of reliable models and the widespread availability of fast and accurate computing facilities. Almost every problem in design, analysis and operations associated with manufacturing plants and industrial processes can be reduced to the determination of the largest and smallest values of a function of several variables. Searching over a wide variety of feasible solutions to these problems, together with the consequent search for the best among many probable ones, led to the growth of the discipline of optimization techniques [12], [126], [146]. Optimization is thus an important and challenging aspect of the mathematical and computational domain and has applications in all disciplines of engineering and science [94], [126]. It involves finding the best solution to a given problem in the most effective way, often subject to constraints. The task of optimization by a computing device can be described as a transformation from a modeling space $M$ to a solution space $S$ and finally to a decision space $D$, i.e.,

$$M \to S \to D$$

Here, the mapping function from $S$ to $D$ is the decision function, and the elements $d \in D$ are termed decisions. Although the search for maxima and minima dates back to time immemorial, the term optimization techniques dates only to around 1940, when development of the subject began in the United Kingdom during World War II. During the conflict, lives and national freedom were at stake, and experts in different domains were urged to enter the field and devise methods for solving problems leading to victory. This combination of needs and intellects gave birth to the subject, which in peacetime is used in all kinds of management science [12], [148]. Optimization is the application of the scientific method to decision problems of business and all units of social organization, including government and military organizations. Optimization problems arise in almost every sphere of human activity [126]. They occur in every engineering discipline, such as civil engineering, mechanical engineering, electrical engineering, telecommunication engineering, chemical and bio-chemical engineering, engineering design and manufacturing systems [146]. These techniques have reached a degree of maturity over the past several years and are being used in a wide spectrum of industries, including the aerospace, automotive, chemical, electrical and manufacturing industries. They also occur in business administration, management and other economics and industry related fields [94]. With rapidly advancing computer technology, computers are becoming more powerful, and correspondingly, the size and complexity of the problems being solved using optimization techniques are also increasing. Optimization methods coupled with the modern tools of the digital computer are being used to enhance the creative process of conceptual and detailed design of engineering systems. In fact, newly developed optimization techniques are now being applied in every walk of life where decisions have to be taken in complex situations that can be represented by a mathematical model [111]. The objective of the thesis is to provide some results of investigations, both theoretical and experimental, addressing optimal solutions of optimization problems by Soft Computing techniques [86], [163]. The problems considered include the traveling salesman problem [29], [35], transportation problem [32], [37], [40], decision making problem [39], rectangular game problem [30], financial investment problem [33], stock price prediction problem [36], time series forecasting problem [41], [42], bankruptcy prediction problem [38], job scheduling [44] and sequencing problems [31]. Various methodologies have been designed and developed using the Soft Computing approach, integrating Fuzzy Logic [159], [163], Artificial Neural Networks (ANN) [79], Genetic Algorithms (GA) [74], Rough sets [121] and Ant Colony Optimization (ACO) [58]. The emphasis of the proposed methodologies is on (a) handling data sets which are large both in size and dimension and involve classes that are overlapping, intractable and have non-linear boundaries, and (b) demonstrating the significance of the Soft Computing paradigm for generating optimal, cost effective and low complexity solutions. Before the description of the scope of the thesis, a brief review is presented of the optimization problem, its solution through different traditional techniques, the elements of Soft Computing, their applicability to optimization problems, related work on the associated problems and the scalability of Soft Computing algorithms to complex data sets.
But before that, a brief discussion of the research problem, research assumptions and hypothesis is given in the next three subsections.

1.1.1 Research Problem
Since the pioneering applications of Soft Computing in different domains in the middle of the 1980s [86], industrial as well as research interest has grown rapidly, especially in recent years. This has led to an increasing number of industrial and theoretical applications in easy and user friendly environments. However, optimization of the results obtained remains an important issue, as the solution found is not always optimal. The optimality of the solution is efficiently and

effectively tackled using various aspects of Soft Computing [163] and its different hybridizations [14], [117]. The research problem in this thesis is thus formulated as follows:

To develop cost effective, efficient and optimum solutions to different aspects of optimization problems using different Soft Computing tools

1.1.2 Research Assumptions
From the systems engineering and developers' point of view, it may be argued that the information extracted from data has a very important effect on modeling different abstract concepts. This fact is apparent in developing optimal solutions on complex real life data for different aspects of optimization problems using various Soft Computing paradigms [86]. For example, decision making situations involving a large number of complex uncertain parameters [39] require a solution that is optimal in nature. The optimality of solutions is taken care of through fine tuning of the different variables of the problem, so that the chosen tool is applied effectively. With an increase in the number of variables, the problem becomes more computationally intensive, which entails the development of hybrid tools [86] so that a feasible solution is obtained. From this point of view, the research rests on the following assumption:

Fast adaptation and tuning of high dimension optimization problems in an environment with complex uncertain parameters requires specific generation of a Soft Computing model as part of the development of an optimal solution for the problem

1.1.3 Hypothesis
The generation of different Soft Computing models [86] is considered as part of developing cost effective and optimal solutions for various categories of optimization problems. In particular, the generation and tuning of Soft Computing techniques is needed for fast adaptation to abstract aspects of optimization problems [163]. Therefore, the development process, methods, environment and application area of the problem, as well as matters related to algorithm development, have to be understood and taken into consideration in the fast adaptation and tuning process. Each of them sets its own restrictions on the design and on the parameters used in modeling. From that point of view, the research problem is considered from many aspects and the research hypothesis can be formulated as follows:

In order to model real life nonlinear optimization problems, generation and tuning of Soft Computing models is required for fast adaptation to new circumstances of optimization problems dynamically, such that low cost solutions are obtained

The solution development process and methodology can be specified to include restrictions set by the design parts and design parameters of the model. The modeling environment and the nature of the application area include restrictions originating from development tools and the natural environment. Algorithm development parameters have their own restrictions, which should be identified and selected to support the procedures [53]. This hypothesis is supported by showing several applications where other approaches are either very difficult or computationally intensive to implement. This chapter is organized as follows. Section 1.2 presents the basic concept of the optimization problem. Next, different solution techniques for optimization problems are discussed. In section 1.4, the fundamental principles of Soft Computing are illustrated. The different

applications of Soft Computing to optimization problems are described in section 1.5. This is followed by a discussion of previous work related to the optimization problems considered in the thesis. The scalability of Soft Computing algorithms to complex data sets is highlighted in section 1.7. Finally, section 1.8 discusses the scope of the thesis.

1.2 Optimization Problem
An optimization technique can be defined as a computing paradigm which helps the user to develop the best possible solution among a set of solutions available for a given problem, thus enabling better decisions [126]. The ultimate goal of all decisions is either to minimize the effort required or to maximize the desired benefit. Since the effort required or benefit desired in any practical situation can be expressed as a function of certain decision variables, optimization can be defined as the process of finding the conditions that give the maximum or minimum value of a function. The general mathematical optimization problem has the following form [21], [111], [120]:

$$\min f_0(X)$$

subject to the constraints

$$f_i(X) \le b_i, \quad i = 1, \ldots, m, \qquad g_j(X) = c_j, \quad j = 1, \ldots, p \qquad (1)$$

Here, the vector $X = (x_1, \ldots, x_n)^T$ is the optimization variable or decision variable or design vector of the problem. The function $f_0 : \mathbb{R}^n \to \mathbb{R}$ is the objective function. The functions $f_i : \mathbb{R}^n \to \mathbb{R}$, $i = 1, \ldots, m$ are the inequality constraint functions and the constants $b_1, \ldots, b_m$ are the limits or bounds for the constraints. The functions $g_j : \mathbb{R}^n \to \mathbb{R}$, $j = 1, \ldots, p$ are the equality constraint functions and the constants $c_1, \ldots, c_p$ are their limits or bounds. The number of variables $n$ and the numbers of constraints $m$ and/or $p$ need not be related in any way. The problem stated in equation (1) is called a constrained optimization problem. Some optimization problems do not involve any constraints and can be stated as follows [21], [111], [120]:

$$\min f_0(X) \qquad (2)$$

Such problems are called unconstrained optimization problems. A vector $X^*$ is called an optimal solution of the above problem if it has the smallest objective value among all vectors that satisfy the constraints, i.e., for any $z$ with $f_1(z) \le b_1, \ldots, f_m(z) \le b_m$ and $g_1(z) = c_1, \ldots, g_p(z) = c_p$, we have $f_0(z) \ge f_0(X^*)$.

There can be a variety of mathematical models of real life optimization problems. The objective function which is sought to be optimized can be either linear or nonlinear. Sometimes even its explicit mathematical formulation may not be known. The constraints on the objective function can be linear or nonlinear [111], [146]. The representation of an optimization problem can also be given through Crisp sets, Fuzzy sets [159], [163] or Rough sets [121]. In a Fuzzy set or Rough set representation of an optimization problem, either the objective function or the constraint functions or both can have fuzzy or rough membership functions [117] respectively.
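To make the notation of equation (1) concrete, the following minimal sketch solves a small constrained instance with SciPy's general-purpose minimizer. The particular objective, constraints and starting point are invented for illustration and are not taken from the thesis.

```python
# Minimal sketch of the constrained problem in equation (1), solved with SciPy.
# The objective f0 and the constraint data below are invented for illustration.
import numpy as np
from scipy.optimize import minimize

def f0(x):
    # Objective function f0 : R^2 -> R
    return (x[0] - 1.0) ** 2 + (x[1] - 2.5) ** 2

constraints = [
    # Inequality constraint f1(X) <= b1, written as b1 - f1(X) >= 0 for SciPy
    {"type": "ineq", "fun": lambda x: 6.0 - (x[0] + 2.0 * x[1])},
    # Equality constraint g1(X) = c1, written as g1(X) - c1 = 0
    {"type": "eq", "fun": lambda x: x[0] - x[1] - 1.0},
]

result = minimize(f0, x0=np.array([2.0, 0.0]), constraints=constraints)
print(result.x, result.fun)  # optimal X* and objective value f0(X*)
```

Dropping the constraints list reduces the same call to the unconstrained form of equation (2).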

The classical optimization methods are useful in finding optimal solutions of continuous and differentiable functions [111]. These methods are analytical in nature and make use of the techniques of differential calculus in locating points of optima. They are best suited for finding the maxima or minima of unconstrained functions, which may be either single-variable or multi-variable in nature, leading to single-variable or multi-variable optimization [111]. An important category of constrained optimization is the linear programming problem, in which the objective function as well as the constraints are linear. To every linear programming problem [148] there corresponds another linear programming problem called its dual, and the original problem is called the primal. As most linear programming problems are mathematical models of real life situations, in many cases it is not sufficient just to obtain their solutions but also to analyze the effect of subsequent changes in the input data on the solutions obtained. This is known as post optimality analysis [146]. Another important theory, known as Kuhn-Tucker theory [111], locates the points of maxima and minima of constrained and unconstrained non-linear optimization problems. It provides a set of necessary and sufficient conditions for checking whether a given point is a point of optimality. Quadratic programming is an example which uses the Kuhn-Tucker necessary conditions. Unless the Kuhn-Tucker sufficiency conditions are verified, there is no way of being certain whether the solution obtained is a local or global optimum. Other special classes of non-linear optimization problems are the linear fractional programming problem, the separable programming problem and the geometric programming problem [111]. In many optimization problems the optimal solution is sought in terms of integral values of the variables. Such problems, subject to the constraint that the variables are integers, are called integer programming problems [148]. If some variables are restricted to be integers while others are real numbers, the problem is said to be a mixed integer programming problem [146]. The decision making problem, or decision theory [94], is an optimization problem whose basic objective is to provide a method wherein data concerning the outcomes of different consequences can be evaluated to enable the decision maker to identify the best course of action. Decision making can be performed under certainty, uncertainty and risk. There can also be situations in which more than one objective function is to be optimized subject to the same set of constraints. These are known as multi-objective optimization problems [111]. In some problems the decision maker would not only like to optimize the objective but would also like some desired values to be achieved as closely as possible. Such problems are called goal programming problems [146]. There are problems where it is not desirable or possible to determine the optimal values of all unknown decision variables simultaneously. In such problems, decisions have to be taken in stages. These problems are known as multi-stage decision problems or dynamic programming problems [53]. Dynamic programming is well suited to deal with problems involving discrete variables as well as non-convex, discontinuous and non-differentiable functions. In some cases, more than one decision maker may be involved in the decision making process. Such problems are called multi-person decision-making problems [111]. It is, however, not necessary that all decision makers have identical objectives in mind.
There can be situations in which the persons involved in the decision process have conflicting objectives and take decisions in opposition to each other. These problems are called game theory problems [94]. Other important categories of decision making problems are multi-criteria and multi-person decision-making problems. With growing awareness of the advantages of optimization and the easy availability of high speed computers, a new category of optimization techniques has emerged, referred to as heuristic-based optimization [86], [111], [126]. These techniques rely extensively on numerical computations and often bear the imprint of the user's field. Commonly used heuristics are artificial neural networks [79] based on learning algorithms, simulated annealing, genetic algorithms, random search based methods and hybrid algorithms [60], [74], [117].

1.3 Solution of Optimization Problem

Once an optimization problem is defined, the next task is to solve it. The solution of an optimization problem usually involves the following three phases [111], [126], [148]: (i) the modeling phase; (ii) the solution of the mathematical model; and (iii) the validation of results and their implementation. The first phase is vital for obtaining a correct solution to the problem. The other two phases provide the basis for determining the optimal solution and its implementation in a real life situation. There is no single method available for solving all optimization problems efficiently. As far as possible, the problem should be amenable to solution by one or more of the available techniques of optimization. In many cases a new technique may have to be developed to solve the problem, and many such techniques are still under investigation and development. In the past, in view of the limited availability of computational facilities, the trend was to introduce approximations and assumptions into the model so that it could be solved using some well-known techniques of optimization [146]. However, the solution of this modified and simplified model often did not meet the actual specifications of the user. This was one of the main reasons why users initially were not so enthusiastic about using these methods. However, with the easy availability of fast computing facilities in the form of personal computers and, at the same time, the development of more robust and efficient computational techniques of optimization, the scenario has now changed. The solution of more realistic and complex problems can now be obtained in more or less their original form in a relatively much shorter time span. As a result, realistic solutions can be obtained much faster and implemented for better results. There is a host of methodologies to solve different forms of the optimization problem; some of the commonly used solution techniques are outlined here. The most common form of optimization problem, viz. the linear programming problem, is solved using graphical and simplex methods [148]. In a linear programming problem, a solution satisfying the constraints and the non-negativity restrictions is called a feasible solution. If a feasible solution is basic, it is called a basic feasible solution. A basic feasible solution in which at least one basic variable vanishes is called a degenerate basic feasible solution. If not a single basic variable vanishes, it is called a non-degenerate basic feasible solution. The dual of a linear programming problem can likewise be solved by the simplex method. However, as the dimension of the linear programming problem increases, computational time and effort increase exponentially, which is taken care of by Karmarkar-type interior point methods [146]. The transportation problem, a type of linear programming problem, is solved using a number of techniques, among which Vogel's approximation method [148] gives the most nearly optimal solution. The assignment problem, which is a special case of the linear programming problem, is solved using the Hungarian method [146]. The solution of different types of constrained non-linear optimization problems is obtained through the Kuhn-Tucker conditions. The solution methods for unconstrained non-linear optimization problems include direct search, gradient and Newton-Raphson methods [146]. The integer programming problem and the mixed integer programming problem are solved using cutting plane, branch and bound and exhaustive enumeration approaches [111]. Multi-stage optimization problems are solved through decomposition and backward and forward recursion techniques.
Decision making problems and their variants are tackled through various deterministic and non-deterministic techniques [126]. Game theory, an important category of decision making problem, is solved by different techniques, of which the principle of dominance [94] is very effective. The multi-objective and goal programming optimization problems and their variants are solved using different techniques, including the weighted sum and preemptive approaches [111]. A small linear programming example is sketched below.
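As a hedged illustration of the linear programming case discussed above, the following sketch solves a tiny LP with SciPy's linprog; the cost vector and constraints are invented purely for demonstration and do not come from the thesis.

```python
# Minimal LP sketch: minimize c^T x subject to A_ub x <= b_ub, x >= 0.
# All numbers below are invented for illustration only.
from scipy.optimize import linprog

c = [-3.0, -2.0]             # maximize 3*x1 + 2*x2 by minimizing its negative
A_ub = [[1.0, 1.0],          # x1 + x2 <= 4
        [1.0, 3.0]]          # x1 + 3*x2 <= 6
b_ub = [4.0, 6.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)       # optimal decision variables and objective value
```

The same routine also accepts equality constraints (A_eq, b_eq), mirroring the primal form solved by the simplex and interior point methods mentioned above.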

1.4 Soft Computing
Soft Computing refers to a collection of computational techniques in Computer Science, Artificial Intelligence, Machine Learning and some engineering disciplines which attempt to study, model and analyze very complex phenomena [86], [163]: those for which more conventional methods have not yielded low cost, analytic and complete solutions. Earlier computational approaches could model and precisely analyze only relatively simple systems. More complex systems arising in biology, medicine, the humanities, management science and similar fields often remained intractable to conventional mathematical and analytical methods. It is now realized that complex real world problems require Intelligent Systems that combine knowledge, techniques and methodologies from various sources. These Intelligent Systems are supposed to possess human like expertise within a specific domain, adapt themselves and learn to do better in changing environments, and explain how they make decisions or take actions [60]. In confronting real world computing problems, it is frequently advantageous to use several computing techniques synergistically rather than exclusively, resulting in the construction of hybrid Intelligent Systems such as Soft Computing, which can be defined as follows [163]:

Soft Computing is an emerging approach to computing which parallels the remarkable ability of human mind to reason and learn in an environment of uncertainty and imprecision. [Lotfi A. Zadeh]

Soft Computing differs from conventional or hard computing in that, unlike hard computing, it is tolerant of imprecision, uncertainty, partial truth and approximation. In effect, the role model for Soft Computing is the human mind. The guiding principle of Soft Computing is as follows [163]:

Exploit tolerance for imprecision, uncertainty, partial truth and approximation to achieve tractability, robustness and low cost solution

The basic ideas underlying Soft Computing in its current incarnation have links to many earlier influences, among them Zadeh's 1965 paper on Fuzzy sets, his 1973 paper on the analysis of complex systems and decision processes, and his 1979 report (1981 paper) on possibility theory and soft data analysis. The inclusion of ANN and GA in Soft Computing came at a later point. The principal constituents of Soft Computing [163] are thus Fuzzy sets, ANN, Evolutionary Algorithms (EA), Machine Learning and Probabilistic Reasoning, with the latter subsuming Belief Networks, Chaos theory and parts of Learning theory [121]. Each of these constituents has its own strength. What is important to note is that Soft Computing is not a mélange. Rather, it is a partnership in which each of the partners contributes a distinct methodology for addressing problems in its domain. In this perspective, the principal constituent methodologies of Soft Computing are complementary rather than competitive. Furthermore, Soft Computing may be viewed as a foundation component of the emerging field of Computational Intelligence. Over the past few decades, various Soft Computing tools have been hybridized with Rough sets to form some exciting paradigms such as Rough-fuzzy, Neuro-fuzzy, Neuro-fuzzy-genetic or Rough-fuzzy-genetic hybridizations [14], [86], [121]. Some of the important constituents of Soft Computing are briefly discussed below.

1.4.1 Fuzzy Sets
The human brain interprets imprecise and incomplete sensory information provided by the perceptive organs. Fuzzy set theory [159], [163] provides a systematic calculus to deal with such information linguistically, and it performs numerical computation by using linguistic labels stipulated by membership functions. The logic built on Fuzzy sets is Fuzzy Logic, which deals with reasoning that is approximate rather than precisely deduced from classical predicate logic. A small sketch of a membership function is given below.
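The following minimal sketch (illustrative only; the linguistic label "warm" and its breakpoints are invented, not taken from the thesis) shows a trapezoidal membership function of the kind that stipulates a linguistic label:

```python
# Trapezoidal membership function mu(x) for a linguistic label such as "warm".
# Breakpoints a <= b <= c <= d are illustrative values, not from the thesis.
def trapezoidal(x: float, a: float, b: float, c: float, d: float) -> float:
    if x <= a or x >= d:
        return 0.0                 # outside the support: no membership
    if b <= x <= c:
        return 1.0                 # core of the fuzzy set: full membership
    if a < x < b:
        return (x - a) / (b - a)   # rising edge: partial membership
    return (d - x) / (d - c)       # falling edge: partial membership

# Degrees of membership of temperatures in the fuzzy set "warm" (core 20-30)
for t in (15, 22, 28, 33):
    print(t, trapezoidal(t, 18, 20, 30, 35))
```

The graded values between 0 and 1 are exactly the partial memberships discussed next.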
It can be thought of as the application side of Fuzzy set theory, dealing with well thought out real world expert values for complex problems [110], [130]. Degrees of truth are often confused with probabilities; however, they are conceptually distinct. The membership functions denoting linguistic labels are fuzzy truth values that represent membership in vaguely defined sets, not the likelihood of some event or condition. Set membership values are allowed to range inclusively between 0 and 1 and, in their linguistic form, represent imprecise concepts like slightly, quite and very. Specifically, partial membership in a set is allowed. Fuzzy sets are also related to Possibility Theory [159], [163]. The computational paradigm of Fuzzy Logic thus generalizes classical two-valued logic for reasoning under uncertainty. In order to achieve this, the notion of membership in a set needs to become a matter of degree. This is the essence of Fuzzy sets. By doing this, two things are accomplished, viz. ease of describing human knowledge involving vague concepts and an enhanced ability to develop cost-effective solutions to real-world problems. Fuzzy Logic is a multi-valued logic which is a model-less approach and has been called a clever disguise of probability theory [159], [163]. The theory of Fuzzy sets provides an effective means of describing the behavior of systems which are too complex or too ill-defined to admit precise mathematical analysis by classical methods and tools. It has shown enormous promise in handling uncertainties to a reasonable extent, particularly in decision making models under different kinds of risks, subjective judgment, vagueness and ambiguity. Moreover, a selection of fuzzy if-then rules forms the key component of Fuzzy Inference Systems that can effectively model human expertise in specific applications [86]. Extensive applications of this theory to various fields, e.g., Expert Systems, Control Systems, Pattern Recognition, Machine Intelligence etc., have already been well established.

1.4.2 Artificial Neural Networks
ANN is a computational paradigm [79] that consists of nodes connected by links. Each node performs a simple operation to compute its output from its input, which is transmitted through the links connected to other nodes. This relatively simple model of ANN is analogous to that of neural systems in the human brain [19]; the nodes correspond to neurons in the brain, and the links or interconnections correspond to the synapses that transmit signals between neurons. Each interconnection has a strength that is expressed by a weight value. One of the major features of ANN is its capability to learn from real life patterns or examples. While the details of the learning algorithms of ANN vary from architecture to architecture, they have one common aspect: they can adjust the parameters in the ANN such that the network learns to improve its performance on a given task. The most common forms of learning used in ANN are supervised and unsupervised learning [60]. Supervised learning is guided by specifying, for each training input pattern, the class to which the pattern is supposed to belong. That is, the desired response of the network is used in the learning algorithm for appropriate adjustment of the weights. These adjustments are made incrementally in the desired direction to minimize the difference between the desired and actual outputs, which facilitates convergence to a solution [79]. Once the network converges to a solution, it is capable of classifying each input pattern together with the other patterns that share the same distinguishing features. In unsupervised learning, the network forms its own classification of patterns. The classification is based on commonalities in certain features of the input patterns. This requires that a network implementing unsupervised learning be able to identify common features across the range of input patterns. The major advantage of ANN is its flexible non-linear modeling capability, with no need to specify a particular model form [86].
1.4.2 Artificial Neural Networks
ANN is a computational paradigm [79] that consists of nodes connected by links. Each node performs a simple operation to compute its output from its input, and the output is transmitted through links to other nodes. This relatively simple model of ANN is analogous to that of neural systems in the human brain [19]; the nodes correspond to neurons in the brain and the links or interconnections correspond to synapses that transmit signals between neurons. Each interconnection has a strength that is expressed by a weight value. One of the major features of ANN is the capability to learn from real life patterns or examples. While the details of the learning algorithms of ANN vary from architecture to architecture, they have one common aspect: they can adjust the parameters in the ANN such that the network learns to improve its performance on a given task. The most common forms of learning used in ANN are supervised and unsupervised learning [60]. Supervised learning is guided by specifying, for each training input pattern, the class to which the pattern is supposed to belong. That is, the desired response of the network is used in the learning algorithm for appropriate adjustment of the weights. These adjustments are made incrementally in the desired direction to minimize the difference between desired and actual outputs, which facilitates convergence to a solution [79]. Once the network converges to a solution, it is capable of classifying each input pattern together with other patterns that share close distinguishing features. In unsupervised learning, the network forms its own classification of patterns. The classification is based on commonalities in certain features of the input patterns, which requires that a network implementing unsupervised learning be able to identify common features across the range of input patterns. The major advantages of ANN are their flexible non-linear modeling capability and the absence of any need to specify a particular model form [86]. Rather, the model is adaptively formed based on the features present in the data. This data-driven approach is suitable for many empirical data sets where no theoretical guidance is available to suggest an appropriate data generating process. However, ANN require a large amount of data in order to yield accurate results. No definite rule exists for the sample size requirement of a given problem. The amount of data for network training depends on the network structure, the training method and the complexity of the particular problem or the amount of noise in the data on hand. With a large enough sample, ANN can model any complex structure in the data. ANN can thus benefit more from large samples than linear statistical models can. Some commonly used categories of ANN are single layer perceptron networks, feed-forward networks, radial basis function networks, multi-layer perceptron networks, Kohonen self-organizing networks, recurrent networks, stochastic networks, associative networks etc. [19], [79].
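A minimal illustration of supervised weight adjustment is sketched below in Python; the single-node network, the perceptron-style update rule, the learning rate and the toy training set are illustrative assumptions chosen only to show how weights move incrementally in the desired direction:

```python
# Minimal sketch of supervised learning in a single-node network:
# weights are adjusted incrementally in the direction that reduces
# the difference between desired and actual output (perceptron rule).
def train_perceptron(samples, epochs=20, rate=0.1):
    n = len(samples[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            out = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = target - out          # desired minus actual output
            w = [wi + rate * err * xi for wi, xi in zip(w, x)]
            b += rate * err
    return w, b

# Toy training set: logical AND, which is linearly separable.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(train_perceptron(data))
```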

1.4.3 Genetic Algorithms
GA was first suggested by John Holland. They are unorthodox optimization search algorithms: nature inspired algorithms mimicking natural evolution [74], [108], in which a population of abstract representations (called chromosomes, genotypes or genomes) of candidate solutions (called individuals, creatures or phenotypes) to an optimization problem evolves toward better solutions. GA perform a directed random search through a given set of alternatives with the aim of finding the best alternative with respect to given criteria of goodness. Traditionally, solutions are represented in binary as strings of 0s and 1s, but other encodings are also possible. The evolution usually starts from a population of randomly generated individuals and proceeds in generations. In each generation, the fitness of every individual in the population is evaluated, multiple individuals are stochastically selected from the current population based on their fitness, and they are recombined and possibly randomly mutated to form a new population. The new population is then used in the next iteration of the algorithm. Commonly, the algorithm terminates when either a maximum number of generations has been produced or a satisfactory fitness level has been reached for the population. If the algorithm has terminated due to the maximum number of generations, a satisfactory solution may or may not have been reached. A typical GA requires two things to be defined viz., a genetic representation of the solution domain and a fitness function to evaluate the solution domain [74], [108]. A standard representation of a solution is an array of bits. Arrays of other types and structures can be used in essentially the same way. The main property that makes these genetic representations convenient is that their parts are easily aligned due to their fixed size, which facilitates simple crossover operations. Variable length representations may also be used. The fitness function is defined over the genetic representation and measures the quality of the represented solution; it is always problem dependent. Once the genetic representation and fitness function are defined, GA proceeds to initialize a population of solutions randomly and improve it through repetitive application of mutation, crossover, inversion and selection operators.
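The basic loop described above may be illustrated by the following Python sketch; the OneMax fitness function (counting 1-bits), the roulette wheel selection and all parameter values are illustrative assumptions standing in for problem specific choices:

```python
import random

# Illustrative GA sketch: binary chromosomes, fitness-proportional
# selection, one-point crossover and bit-flip mutation. The OneMax
# fitness (count of 1-bits) is a stand-in for a problem-specific one.
def genetic_algorithm(n_bits=20, pop_size=30, generations=50,
                      p_cross=0.9, p_mut=0.02):
    fitness = lambda c: sum(c)                      # problem dependent
    pop = [[random.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(c), c) for c in pop]
        total = sum(f for f, _ in scored)
        def select():                               # roulette wheel
            r = random.uniform(0, total)
            acc = 0
            for f, c in scored:
                acc += f
                if acc >= r:
                    return c
            return scored[-1][1]
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = select(), select()
            if random.random() < p_cross:           # one-point crossover
                cut = random.randint(1, n_bits - 1)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            nxt += [[1 - g if random.random() < p_mut else g for g in c]
                    for c in (p1, p2)]               # bit-flip mutation
        pop = nxt[:pop_size]
    return max(pop, key=fitness)

print(genetic_algorithm())
```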
1.4.4 Rough Sets
The concept of Rough sets was introduced by Pawlak [121]. It has emerged as another major mathematical approach for managing uncertainty that arises from inexact, noisy or incomplete information [121]. It has turned out to be methodologically significant in the domains of Artificial Intelligence and Cognitive Sciences, especially in the representation of and reasoning with vague and imprecise knowledge, data classification, data analysis, learning and knowledge discovery [116], [117]. The theory has proved to be of substantial importance in many areas of application. The major distinction between Rough sets and Fuzzy sets is that the former requires no external parameters and uses only the information present in the given data [116], [117]. It may be noted that Fuzzy set theory hinges on the notion of a membership function on the domain of discourse, assigning to each object a grade of belongingness in order to represent an imprecise concept. The focus of Rough set theory is on the ambiguity caused by limited discernibility of objects in the domain of discourse. The idea is to approximate any concept by a pair of exact sets called the lower and upper approximations. On the basis of the lower and upper approximations of a Rough set, the accuracy of approximating the Rough set can be calculated as the ratio of the cardinalities of the lower and upper approximations [116], [117]. But concepts, in such a granular universe, may well be imprecise in the sense that they may not be representable by crisp subsets. This led to a direction, among others, in which the notions of Rough sets and Fuzzy sets can be integrated, the aim being to develop a model of uncertainty stronger than either. Research works combining Fuzzy sets and Rough sets for developing efficient methodologies and algorithms for various real life decision making applications have started to come out. Integration of these theories with ANN is being performed with the aim of building more efficient and intelligent systems in the Soft Computing paradigm.
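The following Python sketch illustrates the computation of lower and upper approximations and the resulting accuracy measure; the universe, the equivalence classes and the target concept are hypothetical examples constructed only for demonstration:

```python
# Sketch of Rough set lower and upper approximations. Objects are
# grouped into equivalence classes by an indiscernibility relation;
# a target concept X is approximated from below (classes wholly
# contained in X) and from above (classes intersecting X), and the
# accuracy is the ratio of the two cardinalities.
def approximations(equiv_classes, X):
    lower = set().union(*[c for c in equiv_classes if c <= X])
    upper = set().union(*[c for c in equiv_classes if c & X])
    return lower, upper

# Hypothetical universe of 6 objects partitioned by indiscernibility.
classes = [{1, 2}, {3, 4}, {5, 6}]
X = {1, 2, 3}                      # concept to approximate
lo, up = approximations(classes, X)
print(lo, up, len(lo) / len(up))   # accuracy of approximation = 0.5
```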

1.4.5 Ant Colony Optimization
The ACO algorithm introduced by Marco Dorigo is a probabilistic technique for solving computational problems which can be reduced to finding good paths through graphs. It is inspired by the behavior of ants in finding paths from the colony to food. This method uses many ants or agents to traverse the solution space and find locally productive areas [58]. While usually inferior to GA and other forms of local search, it is able to produce results in problems where no global or up-to-date perspective can be obtained, and thus where other methods cannot be applied. In the real world, ants initially wander randomly and upon finding food return to their colony while laying down pheromone trails. If other ants find such a path, they are likely not to keep traveling at random but instead to follow the trail, returning and reinforcing it if they eventually find food. Over time, however, the pheromone trail starts to evaporate, thus reducing its attractive strength. The more time it takes for an ant to travel down the path and back again, the more time the pheromones have to evaporate. A short path, by comparison, gets marched over faster, and thus its pheromone density remains high, as pheromone is laid on the path as fast as it can evaporate. Pheromone evaporation also has the advantage of avoiding convergence to a locally optimal solution [58]. If there were no evaporation at all, the paths chosen by the first ants would tend to be excessively attractive to the following ones; in that case, the exploration of the solution space would be constrained. Thus, when one ant finds a good path from colony to food source, other ants are more likely to follow that path, and positive feedback eventually leads to all the ants following a single path. The idea of the ACO algorithm is to mimic this behavior with simulated ants walking around the graph representing the problem to solve. ACO algorithms have been used to produce near-optimal solutions to the traveling salesman problem. They have an advantage over the GA approach when the graph may change dynamically; the ACO algorithm can be run continuously and adapt to changes in real time. This is of interest in network routing and transportation systems.
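The mechanism may be illustrated by the following Python sketch for a small symmetric traveling salesman instance; the parameter values, the distance matrix and the visibility rule (pheromone divided by distance) are illustrative assumptions rather than a definitive implementation:

```python
import random

# Sketch of the ACO mechanism on a small symmetric TSP instance:
# ants choose the next city with probability proportional to
# pheromone times inverse distance; trails evaporate and completed
# tours deposit pheromone in inverse proportion to their length.
def aco_tsp(dist, n_ants=10, n_iter=50, rho=0.5, Q=1.0):
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]            # pheromone levels
    best, best_len = None, float("inf")
    for _ in range(n_iter):
        tours = []
        for _ in range(n_ants):
            tour, unvisited = [0], set(range(1, n))
            while unvisited:
                i = tour[-1]
                cand = list(unvisited)
                weights = [tau[i][j] / dist[i][j] for j in cand]
                j = random.choices(cand, weights)[0]
                tour.append(j)
                unvisited.remove(j)
            length = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
            tours.append((length, tour))
            if length < best_len:
                best, best_len = tour, length
        for i in range(n):                         # evaporation
            for j in range(n):
                tau[i][j] *= (1 - rho)
        for length, tour in tours:                 # reinforcement
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += Q / length
                tau[b][a] += Q / length
    return best, best_len

d = [[0, 2, 9, 10], [2, 0, 6, 4], [9, 6, 0, 3], [10, 4, 3, 0]]
print(aco_tsp(d))
```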
1.4.6 Hybrid Algorithms
In the late eighties there was a trend to integrate technologies such as Fuzzy sets, ANN, GA, Rough sets etc. so as to synergistically enhance the capability of each Soft Computing tool. This resulted in the fusion and growth of various hybridization methods [86], [109], [116], [117] such as Neuro-Fuzzy, Neuro-Genetic, Neuro-Fuzzy-Genetic, Rough-Fuzzy-Genetic, Rough-Neuro-Fuzzy-Genetic etc. Neuro-Fuzzy hybridization is the most visible integration realized so far. The past few years have witnessed a rapid growth in the number and variety of applications of Fuzzy Logic and ANN, ranging from consumer electronics and industrial control to decision support systems and financial trading. Neuro-Fuzzy modeling, together with the new driving force from stochastic, gradient-free optimization techniques such as GA and Simulated Annealing, forms a constituent of Soft Computing aimed at solving real-world decision making, modeling and control problems. The integration of these technologies with Rough sets has resulted in the birth and growth of hybrid systems which can handle uncertainty and vagueness better than either alone. The problems addressed are usually imprecisely defined and require human intervention. Thus, Neuro-Fuzzy, Rough set and other Soft Computing approaches, with their ability to incorporate human knowledge and to adapt their knowledge base via new optimization techniques, play increasingly important roles in the conception and design of hybrid Intelligent Systems.

1.5 Soft Computing for Optimization Problems

A solution method for an optimization problem is an algorithm that computes a solution to the problem to some desired accuracy, given a particular problem from a class of problems [126], [146]. Since the early years, a large effort has gone into developing algorithms for solving various classes of optimization problems, analyzing their properties and developing good software implementations. The effectiveness of these algorithms, i.e., their ability to solve the optimization problem, varies considerably and depends on factors such as the particular forms of the objective and constraint functions and how many variables and constraints there are. Even when the objective and constraint functions are smooth, such as polynomials, the general optimization problem is difficult to solve. Approaches to the general problem therefore involve some kind of compromise, such as very long computation time or the possibility of not finding the solution. There are, however, some important exceptions to the general rule that most optimization problems are difficult to solve. For a few problem classes, there are effective algorithms that can reliably solve even large problems with hundreds or thousands of variables and constraints. However, sometimes the problem is intractable in nature and it becomes practically impossible to find an optimal, or even feasible, solution within reasonable time. Such a problem is called an NP-complete or NP-hard problem [53]. In such situations, Soft Computing tools provide a major helping hand in developing optimal, cost-effective solutions which would otherwise have been impossible.
1.5.1 Scope of Applicability
Keeping in view the computationally intractable nature of optimization problems in real life situations, the concentration in this thesis is on designing and developing optimal algorithms using different Soft Computing concepts. High level programming languages such as C/C++ and MATLAB were used for the various model implementations. The developed methods and algorithms have appreciable computational complexity. Their advantage compared to conventional methods is that the developed model adapts itself to new circumstances dynamically, which makes it possible to apply it to other complex optimization problems. It may be argued that some potential applications of Soft Computing to optimization problems [90], [94], [111], [126], [146], [148] are made in the following domains:
a) Decision Making
b) Traveling Salesman Problem
c) Job Scheduling
d) Assignment Problem
e) Sequencing
f) Financial Investments
g) Theory of Games
h) Transportation Problem
i) Time Series Forecasting
j) Stock Price Prediction
k) Bankruptcy Prediction

An appreciable amount of work has been done in this direction; however, a lot of work still remains to be explored. With this objective, an attempt has been made to develop optimal and cost effective solutions for the aforementioned categories of optimization problems using various Soft Computing paradigms, which are as follows:
a) Fuzzy Sets [159], [163]
b) Artificial Neural Networks [79]
c) Genetic Algorithms [74], [108]
d) Rough Sets [121]
e) Ant Colony Optimization [58]
f) Hybrid Algorithms [86], [109], [116], [117]

1.5.2 Research Issues and Challenges
The following research issues were considered during the course of the entire work:
1. Massive data sets and high dimensionality: Huge data sets create combinatorially explosive search spaces for model induction, which may make the process of extracting information infeasible owing to space and time constraints. They also increase the chances that a Soft Computing algorithm will generate spurious results that are not generally valid.
2. Over-fitting and assessing statistical significance: Data sets used for verifying the feasibility of the developed solutions are usually huge and available from distributed sources. As a result, the presence of spurious data points often leads to over-fitting of models. Regularization and resampling methodologies need to be emphasized in model design.
3. Understandability of data patterns: It is necessary to make the solutions developed more understandable to humans, so that they can be applied to other similar real life problems as well.
4. Nonstandard and incomplete data: The data can be missing and/or noisy in nature.
5. Mixed media data: Learning from data that is represented by a combination of various media, like numeric, symbolic and text sources.
6. Integration: Different Soft Computing tools are often only a part of an entire decision making system. It is desirable that they integrate smoothly with the problem domain, the solution space and the final decision making procedure.

In Section 1.7, issues related to the large size of data sets are discussed in more detail.

1.6 Related Works
The works related to the different aspects of the optimization problems dealt with in this thesis are briefly reviewed here.
1.6.1 Work related to Traveling Salesman Problem
The first instance of the traveling salesman problem was given by Euler in 1759, whose problem was to move a knight to every position on a chess board exactly once. The problem first gained fame in a book written by the German salesman B. F. Voigt in 1832 on how to be a successful traveling salesman. He mentions the problem, although not by that name, by suggesting that covering as many locations as possible without visiting any location twice is the most important aspect of the scheduling of a tour. The origin of the problem in mathematics dates back to the 1930s. The problem has been studied intensively in both Operations Research and Computer Science since the 1950s. Therefore, it is not surprising that a large number of different algorithmic techniques have been applied to the problem to develop feasible and optimal solutions. Up to the early 1980s these approaches comprised mainly construction heuristics by Clarke and Wright in 1964; Christofides in 1976; Golden and Stewart in 1985; Bentley in 1992 [29], iterative improvement algorithms by Flood in 1956; Croes in 1958; Lin in 1965; Lin and Kernighan in 1973 [29] and exact methods like branch and bound or branch and cut by Dantzig, Fulkerson and Johnson in 1954; Grötschel in 1981; Padberg and Grötschel in 1985; Grötschel and Holland in 1991 and Applegate et al. in 1995 [29]. Since the beginning of the 1980s, more and more metaheuristics have been tested on the traveling salesman problem. In fact, the traveling salesman problem was the first problem to which simulated annealing, one of the first metaheuristic approaches, was applied by Kirkpatrick, Gelatt and Vecchi in 1983 [29] and Cerny in 1985. Following simulated annealing, virtually every metaheuristic has used the traveling salesman problem as a test problem. These include tabu search by Knox in 1994; Zachariasen and Dam in 1996 [29], guided local search by Voudouris and Tsang in 1999 [29], evolutionary algorithms by Merz and Freisleben in 1997; Walters in 1998 [29], ant colony optimization algorithms by Dorigo in 1992 [31] and iterated local search by Baum in 1986; Martin et al. in 1991; Johnson and McGeoch in 1997 and Applegate et al. in 2003 [29]. The state of the art for solving the symmetric traveling salesman problem with heuristics is summarized in the overview article by Johnson and McGeoch in 1997 [29], which contains a discussion of the relative performance of different metaheuristic approaches to the traveling salesman problem and concludes that iterated local search algorithms using fine tuned implementations of the Lin-Kernighan heuristic are the most successful. The most recent effort in collecting the state of the art for traveling salesman problem solving by heuristic methods was undertaken by the 8th DIMACS Implementation Challenge on the traveling salesman problem. The details of this benchmark challenge can be found in the paper by Johnson and McGeoch in 2002 [29]. The conclusion of this recent undertaking is that, when running time is not much of a concern, the best performing algorithms appear to be the tour merging approach (a traveling salesman problem specific heuristic) of Applegate et al. in 1999 [29] and the iterated version of Helsgaun's Lin-Kernighan variant in 2000 [29]. In this context, it is interesting to note that the iterated version of Helsgaun's implementation of the Lin-Kernighan heuristic uses a constructive approach, as ant colony optimization does, to generate initial tours for local searches, where the best so far solution strongly biases the tour construction. Conventional programming approaches do not deal with the multi-objective traveling salesman problem. Fischer and Richter in 1982 [29] used a branch and bound approach to solve the traveling salesman problem with two sum criteria. Gupta and Warburton in 1986 [29] used 2- and 3-opt heuristics for the max-ordering traveling salesman problem. Sigal in 1994 [29] proposed a decomposition approach for solving the traveling salesman problem with respect to the two criteria of route length and bottleneck, where both objectives are obtained from the same cost matrix. Tung in 1994 [29] used a branch and bound method with a multiple labeling scheme to keep track of possible Pareto-optimal tours. In 1997, Melamed and Sigal [29] suggested a constraint-based algorithm for the bi-objective traveling salesman problem. Ehrgott in 2000 [29] proposed an approximation algorithm with a worst case performance bound. Hansen in 2000 [29] applied a tabu search algorithm to the multi-objective traveling salesman problem. Borges and Hansen in 2002 [29] used weighted sums programs to study global convexity for the multi-objective traveling salesman problem.
Jaszkiewicz in 2002 [29] proposed a genetic local search which combines ideas from evolutionary algorithms and local search with modifications of the aggregation of objective functions. Paquete and Stützle in 2003 [29] proposed a two-phase local search procedure to tackle the bi-objective traveling salesman problem. During the first phase, a good solution to one single objective is found by using an effective single objective algorithm. This solution provides the starting point for the second phase, in which a local search algorithm is applied to a sequence of different aggregations of the objectives, where each aggregation converts the bi-objective problem into a single objective one. Yan et al. in 2003 [29] used an evolutionary algorithm to solve the multi-objective traveling salesman problem. Angel, Bampis and Gourvès in 2004 [29] proposed a dynamic search algorithm which uses local search with an exponential sized neighborhood that can be searched in polynomial time using dynamic programming and a rounding technique. Paquete, Chiarandini and Stützle in 2004 [29] suggested a Pareto local search method which extends local search algorithms for the single objective traveling salesman problem to the bi-objective case.

This method uses an archive to hold the non-dominated solutions found in the search process.
1.6.2 Work related to Transportation Problem
The first references to the transportation problem date back to the 17th century, and the problem has been studied intensively in optimization since then. In Mathematics and Economics, transportation theory is the name given to the study of optimal transportation and allocation of resources. The problem was initially formalized by the French mathematician Gaspard Monge in 1781 [37]. Major advances were made in the field during World War II by the Russian mathematician and economist Leonid Kantorovich [37]. Consequently, the problem is sometimes known as the Monge-Kantorovich transportation problem. In 1941, F. L. Hitchcock [37] presented a study entitled The Distribution of a Product from Several Sources to Numerous Localities. This presentation is considered to be the first important contribution to the solution of transportation problems. In 1947, T. C. Koopmans [37] presented an independent study called Optimum Utilization of the Transportation System. These two contributions helped in the development of transportation methods which involve a number of shipping sources and a number of destinations; an earlier approach was given by Kantorovich. The linear programming formulation and the associated systematic method of solution were first given by G. B. Dantzig [37].
1.6.3 Work related to Decision Making and Rectangular Games Problem
The general decision making problem is an NP-complete problem, initially studied in the middle of the 20th century by a number of researchers. In 1947, Brunswik [39] gave the foundational work on the lens model, where policy capturing denotes a methodology for studying individual differences in decision strategies via mathematical or statistical models. Policy capturing has been employed to study a range of decision environments. In this approach, a set of judgment stimuli, created on the basis of manipulated cues, is presented to participants so that their ensuing judgments can be captured and modeled. Internal validity is addressed through systematic manipulation of environmental cues, whereas external validity is addressed through the use of expert decision makers. Brehmer and Brehmer [39] addressed certain fundamental issues in using such an approach, including the degree to which individuals use different decision policies and their actual awareness of strategies. In addition, this type of modeling research can play a pivotal role in understanding how to train individuals to use a given policy. Gobet and Ritter [39] gave judgment modeling research of this type that capitalizes on the advantages of individual level data analysis. Although the potential benefits of modeling decision making are numerous, a review of traditional modeling approaches, e.g., linear regression, reveals a number of factors suggesting that research on alternative methods is warranted. These factors include the use of unrealistic, orthogonal judgment cues, arising from the difficulty of analyzing inter-correlated cues with multiple regression; reliance on linear models, even though the cited pervasiveness of linearity may be more reflective of a lack of research on alternative models; and the limited selection of methods for eliciting participants' verbal descriptions of their judgment policies. According to the behaviorist Isabel Briggs Myers, a person's decision making process depends to a significant degree on their cognitive style. Myers developed a set of four bipolar dimensions, called the Myers-Briggs Type Indicator (MBTI). The terminal points on these dimensions are: thinking and feeling; extroversion and introversion; judgment and perception; and sensing and intuition. She claimed that a person's decision making style is based largely on how they score on these four dimensions. Other studies suggest that national or cross cultural differences in decision making exist across entire societies. For instance, Maris Martinsons has found that American, Japanese and Chinese business leaders each exhibit a distinctive national style of decision making.

Zadeh conceived the concept of Fuzzy sets and later the idea of Soft Computing to deal with the impreciseness and uncertainty involved in all decision making problems. In the 1980s and 1990s a large number of researchers applied Fuzzy sets and Soft Computing concepts to solve many problems in engineering and management. Some notable works in this direction are given by Jang and Sun. A key concept in Fuzzy Systems theory and related techniques is the idea of adaptive, model-free estimation. Kosko, in discussing Soft Computing techniques, noted that Intelligent Systems adaptively estimate continuous functions from data without specifying mathematically how outputs depend on inputs. Essentially this statement refers to the ability of fuzzy systems to map an input domain X, e.g., cues, to an output range Y, e.g., judgments/decisions, without denoting the function f: X → Y. However, it has been demonstrated mathematically that fuzzy systems are universal approximators of continuous functions of a rather general class. Because of this distinction as model-free estimators and universal approximators, modeling techniques such as Fuzzy models have an innate freedom from a priori assumptions about the type of relationships that may exist between variables. Although universal approximation places no theoretical limits on the modeling capabilities of fuzzy systems, how to construct an optimal model for a given data set so as to achieve the full modeling power of the approach remains an open question. Inherent in the claim that model-free estimation is an advantage is the belief that some relationships of interest to human performance researchers depart from the normally assumed linear form and that, for exploratory research, the specific form of such models cannot be predetermined. In support of this idea, Hammond [39] suggested that various cognitive and judgment tasks vary on a continuum from intuitive to analytical and that the respective judgment models may similarly change in nature and complexity. In view of the wealth of evidence accruing in the physical and life sciences, Barton said that many real systems function through complex nonlinear interactions, where adaptive modeling tools may prove useful to human performance researchers. Similar universal approximation methods, such as ANN, have been incorporated into the analytical toolbox of researchers interested in modeling elements of human performance. Craiger and Coovert [39] discussed how Fuzzy sets can be used to capture linguistic values such as high and low in variables related to human performance, such as job or task experience and performance. In line with these ideas, the variable performance can be captured by specifying a finite universal set that consists of levels of performance using three fuzzy sets viz., high, moderate and low performance. It is noteworthy that the concept of a Fuzzy set is congruent with early psychometric ideas; for example, pioneers such as L. L. Thurstone [39] put forth the idea that an individual's opinion could be characterized by more than a single point estimate response, as suggested by Hesketh and Hesketh [39]. Newell and Simon [39] in their seminal work demonstrated that much of human problem solving could be expressed as if-then types of production rules. This finding helped launch the field of Intelligent Systems. Subsequently, Expert and other Intelligent Systems have been implemented to model, capture and support human decision making.
However, traditional rule-based systems suffer from several problems, including the fact that human experts are often needed to articulate propositional rules, that the symbolic processing normally used prevents direct application of mathematics, and that traditional rule-based systems require a large number of rules that are often brittle and thus not robust to the novel sets of data inputs frequently required. Game theory, an important category of decision making problems, was originated by John von Neumann in 1928 and later developed by G. B. Dantzig [30]. The full mathematical treatment was made available in 1944, when John von Neumann and Oskar Morgenstern published the famous book Theory of Games and Economic Behavior. Von Neumann's [30] approach to solving game theory problems was based on the principle of the best out of the worst, i.e., he utilized the idea of minimization of maximum losses. Game problems represented by rectangular matrices are also known as rectangular games. Most competitive problems can be handled by this principle.
1.6.4 Work related to Financial Investment Classification Problem

Data classification is an important and much studied problem in the Machine Learning domain. Much of the work on data classification is motivated by real life applications. Indeed, numerous applications of data classification bring life to the research area and help to direct future work. Classification is among the first crucial steps in making sense of the blooming, buzzing confusion of sensory data that Intelligent Systems confront. The first references to the data classification problem date back to the later half of the 19th century. The problem has been studied intensively in both optimization and Computer Science since the 1970s [33], [34]. The foundations can be traced to Plato and were later extended by Aristotle, who distinguished between an essential property, i.e., one which would be shared by all members of a class, and an accidental property, i.e., one which could differ among members of a class. The problem of data classification revolves around the concept of Pattern Recognition and can be cast as the problem of finding such essential properties of a category. The Zen patriarch Bodhidharma [33], [34] would point at things and demand an answer to the question "what is that?" as a way of confronting the deepest issues in mind, the identification of objects and the nature of classification and decision. It has been a central theme in the discipline of Philosophical Epistemology, the study of the nature of knowledge. A more modern treatment of some philosophical problems of Pattern Recognition can be found in a number of references. Linear programming and integer programming have found successful applications in the finance domain to perform data classification. The idea is to apply programming techniques to choose a weight vector so that the weighted sum of answers is above some cutoff value for good investments and below the cutoff value for bad ones. The classical study of classification by programming can be found in the work done by Glover and Mangasarian [33], [34]. The nearest-neighbor approach is the standard non-parametric approach to the classification problem. Classification trees, also known as recursive partitioning algorithms, are a completely different statistical approach to classification. The idea is to split the set of application answers into different sets and then identify each of these sets as good or bad depending on what the majority in that set is. This was first used by Makowski and Coffman [33], [34]. ANN, which were originally developed from attempts to model communication and processing of information in the human brain, are among the most promising credit scoring models and have been adopted by many financial systems. They can be developed to classify nonlinearly separable cases using multiple-layer networks with nonlinear transfer functions. However, ANN introduce new problems such as over-fitting and an opaque mechanism. There is a vast literature on ANN applications in finance. The support vector machine, closely related to ANN, was first proposed by Vapnik [33]. Unlike classical methods that merely minimize the empirical training error, the support vector machine aims at minimizing an upper bound of the generalization error by maximizing the margin between the separating hyperplane and the data. It is a powerful and promising tool for data classification and function estimation. In this method input vectors are mapped into a higher dimensional feature space and an optimal separating hyperplane in this space is constructed.
Support vector machines have been successfully applied to a number of applications ranging from bioinformatics to text categorization and face or fingerprint identification.
1.6.5 Work related to Forecasting and Prediction
Forecasting and prediction of data have been studied for a long time. Among the important Soft Computing tools used for forecasting are ANN, as mentioned by Zhang et al. [41]. The amount of data required for network training depends on the network structure, the training method, the complexity of the particular problem and the amount of noise in the data at hand. Nam and Schafer suggested that ANN performed better as training sample sizes increased [41]. Kang argued that ANN forecasting models perform quite well with sample sizes smaller than 50, while Box-Jenkins models typically require at least 50 data points in order to forecast successfully [41].

Another commonly used Soft Computing forecasting tool is Fuzzy sets, which are suitable under incomplete data conditions, though their performance is not always satisfactory. Fuzzy set theory was originally developed to deal with problems involving linguistic terms and has been successfully applied to various applications such as university enrollment forecasting by Chen, Hwang; Chen, Hsu; Chen, financial forecasting by Hsu, Tse, Wu; Huarng, Yu; Huarng, Yu; Yu, temperature forecasting by Chen, Hwang, reactor applications etc. [41]. Tanaka et al. [41] suggested Fuzzy regression to handle fuzzy environments and to avoid modeling error. The model is an interval prediction model with the disadvantage that the prediction interval can be very wide if some extreme values are present. Watada gave an application of Fuzzy regression to fuzzy time-series analysis. Fuzzy sets were applied to time series models, leading to fuzzy time series, by Song, Chissom [41]. Different fuzzy time series models have been proposed since then by Song, Chissom; Sullivan, Woodall; Chen; Hwang, Chen, Lee; Chang; Song; Huarng; Chen; Huarng, Yu [41]. In subsequent works, several other methods were proposed, such as high order fuzzy relationships by Chen, Hsu; Chen; Chen, Chung; Hwang, Chen, Lee [41]. Additionally, Huarng [41] pointed out that the length of intervals affects forecasting accuracy in fuzzy time series and proposed a method with distribution based and average based lengths to reconcile this issue. Yu [41] suggested a weighted model to tackle recurrence and weighting issues in fuzzy time series forecasting. Other works can be found in Miller [41]. Using hybrid models or combining several models has become common practice to improve forecasting accuracy, as suggested by Reid; Bates, Granger; Clemen [41]. The basic idea of model combination in forecasting is to use each model's unique features to capture different patterns in the data. Both theoretical and empirical findings suggest that combining different methods can be an effective way to improve forecasts, as illustrated by Armano, Marchesi, Murru; Luxhoj, Riis, Stensballe [41]. Notable works on time series forecasting include the Fuzzy auto-regressive integrated moving average (FARIMA) method by Tseng et al. [41], the hybrid GA and high-order Fuzzy time-series approach for enrollment forecasting by Chen, Chung, and the combined methodology of Huarng et al. [41] using ANN to forecast fuzzy time-series. Finally, Rough sets have in recent years gained momentum and are widely used as a viable intelligent knowledge discovery technique in many applications, including forecasting and prediction. For instance, building trading systems using the Rough set model was studied by several researchers. Ziarko et al., Golan and Edwards applied the Rough set model to discover strong trading rules, which reflect highly repetitive patterns in data, from a historical database of the Toronto stock exchange. A detailed review of applications of Rough sets in the financial domain can be found in Tay, Francis, Lixiang and Shen [41]. Fuzzy regression is an important technique for analyzing the vague relationship between a dependent variable (response variable) and independent variables (explanatory variables), studied by Celikyilmaz, Turksen; Coopi, D'Urso, Giordani; Ge, Wang; Hong, Song, Young; Hong, Hwang; Hung, Yang; Khashei, Hejazi, Bijari; Wang, Zhang [41]. Some successful applications of fuzzy regression include insurance, housing, thermal comfort forecasting, productivity and consumer satisfaction, product life cycle prediction, project evaluation, reservoir operations, actuarial analysis, robotic welding processes and business cycle analysis. Tanaka et al. first studied the fuzzy linear regression problem with crisp explanatory variables and fuzzy response variables. They formulated the fuzzy linear regression problem as a linear programming model to determine the regression coefficients as fuzzy numbers, where the objective function minimizes the total spread of the fuzzy regression coefficients subject to the constraint that the support of the estimated values must cover the support of their associated observed values at a certain pre-specified level. This was later improved by Tanaka; Tanaka and Watada; Tanaka, Hayashi and Watada [42]. The drawbacks of these approaches have been pointed out by several investigations by Redden and Woodall; Wang and Tsur; Kao and Lin [42]: sensitivity to outliers, wide ranges in estimation, and the fact that more observations result in fuzzier estimations, which contradicts the general observation that more observations provide better estimations.

Some other works include the fuzzy least-squares approach to determining regression coefficients as proposed by Diamond [42] and the criterion of minimizing the difference of membership values between the observed and estimated fuzzy dependent variable by Kim and Bishu [42]. Sakawa and Yano [42] formulated three types of multi-objective programming approaches to investigate fuzzy linear regression models with fuzzy explanatory variables and responses. Hong et al. [42] used shape-preserving arithmetic operations on LR fuzzy numbers for least-squares fitting to investigate a class of fuzzy linear regression problems. Here, the derived regression coefficients are fuzzy numbers. However, since regression coefficients derived on the basis of Zadeh's extension principle are fuzzy numbers, the spread of the estimated dependent variable becomes wider as the magnitudes of the independent variables increase, even if the spreads of the observed dependent variables are decreasing. To avoid the problem of wide spreads for large values of the explanatory variables in estimation, Kao and Chyu [42] proposed a two-stage approach which obtains crisp regression coefficients in the first stage and determines a unique fuzzy error term in the second stage. Kao and Chyu [42] also proposed a least-squares method to derive regression coefficients that are crisp. These two studies have better performance, but they still cannot cope with situations of decreasing or non-uniform spread. Another issue is that crisp regression coefficients may eliminate the problem of increasing spread, but they also misrepresent the functional relationship between the dependent and independent variables in a fuzzy environment. When the spreads of the fuzzy independent variables are large, it is possible that the spread of the regression coefficients is also large. In this case the values of the regression coefficients lie in a wide range, even from negative to positive values. If the derived regression coefficients are crisp, some valuable information may be lost. According to Bargiela et al. [42], a regression model based on fuzzy data shows the beneficial characteristic of enhanced generalization of data patterns compared to regression models based on numeric data. This is because the membership function associated with fuzzy sets has significant informative value in terms of capturing the notion of accuracy of information of the patterns in the data set used for derivation of the regression model.
1.6.6 Work related to Bankruptcy Prediction Problem
The early works in bankruptcy analysis were published in the late 1960s and mid 1970s [38]. Demand from financial institutions for investment risk estimation stimulated subsequent research. However, despite substantial interest, the accuracy of corporate default predictions was much lower than in the private loan sector, largely due to the small number of corporate bankruptcies. Meanwhile, the situation in bankruptcy analysis has changed dramatically. Large datasets with a median number of failing companies exceeding 1000 have become available; 20 years ago the median was around 40 companies and statistically significant inferences could often not be reached. The growth of computer technology and advances in statistical learning techniques have allowed identification of more complex data structures. Basic methods are no longer adequate for analyzing expanded datasets. Demand for advanced methods of controlling and measuring default risks has rapidly increased in anticipation of the New Basel Capital Accord adoption [38], which emphasizes the importance of risk management and encourages improvements in financial institutions' risk assessment capabilities.
The importance of financial ratios for the analysis of organizations' financial health has been known for more than a century. The first researchers applying financial ratios for bankruptcy prediction were Ramser, Fitzpatrick, and Winakor and Smith in the 1930s [38]. However, it was not until the publications of Beaver and Altman in the 1960s [38] and the introduction of univariate and multivariate discriminant analysis that the systematic application of statistics to bankruptcy analysis started. Altman's linear Z-score model [38] became the standard technique for many applications and is still widely used today due to its simplicity. However, its assumption of equal normal distributions for both unsuccessful and successful organizations with the same covariance matrix has been severely criticized. This approach was further developed by Deakin and Altman et al. in the 1970s [38]. Later on, the center of research shifted towards logit and probit models. The original works of Martin and Ohlson were followed by Wiginton, Zavgren and Zmijewski in the 1980s [38].

Among other statistical methods applied to bankruptcy analysis are the gambler's ruin model by Wilcox and option pricing theory by Merton in the 1970s, and recursive partitioning by Frydman et al., ANN by Tam and Kiang, Odom and Sharda, and Rough sets by Dimitras et al. in the 1990s [38].
1.6.7 Work related to Timetable Assignment Problem
Two important types of timetable assignment problems are the examination and university course timetable problems. The examination timetable problem is NP-hard and therefore no optimal algorithm is known which generates a solution within reasonable time. There are a number of versions of the examination timetable problem, differing from one university to another. A lot of work has been done on this type of problem with respect to studies of specific universities. The first attempts were based on the graph coloring concept, where vertices represent courses and an arc joins two vertices only if they cannot be scheduled at the same time. The problem is thus to find the chromatic number of the resulting graph. However, like the examination timetable problem, the chromatic number problem is also NP-hard. Due to the complexity of the problem, most of the work done concentrates on heuristic algorithms which try to find good approximate solutions. Some of these include GA, tabu search, simulated annealing and, recently, the use of scatter search methods. Over the decades, the scheduling community has used domain knowledge in order to generate high quality solutions. Therefore, it is very likely that those types of algorithms could not run on different types of scheduling problems. However, in the recent past there has been an increase in research towards generating algorithms that work well over a range of scheduling problems by choosing appropriate heuristics. Such algorithms are usually referred to as hyper-heuristics. In order to consider different types of scheduling problems, there must exist a unique formalism for their representation that will capture the variety of domain-specific constraints. In the scheduling community there have been some attempts to develop languages for constraint representation, but they addressed only single types of problems. Le Pape presented scheduling software which implements a constraint language that is powerful enough to represent a variety of resource and temporal constraints. However, such a representation is more appropriate for certain types of algorithms. Some other types of algorithms, such as meta-heuristics, may need to map the given representation of the problem into a representation suitable for their running before starting to construct a schedule. Another version of the timetable problem, i.e., the university course timetable problem, is an NP-hard constrained optimization problem of combinatorial nature, as illustrated by Cooper and Kingston [46], and no optimal algorithm is known which generates a solution within reasonable time. These problems are mainly classified as constraint satisfaction problems, as given by Brailsford, Potts and Smith [46], where the main goal is to satisfy all problem constraints rather than to optimize a number of objectives. There are a number of versions of the university course timetable problem, differing from one university to another, as suggested by Slim and Marte [46]. A lot of work has been done on this type of problem with respect to studies of specific universities, and many formulations and algorithms have been developed. Due to the complexity of the problem, most of the work done concentrates on heuristic algorithms which try to find good approximate solutions.
Some of these include GA by Corne, Fang and Mellish [46], tabu search by Mooney [46], simulated annealing by Saleh and Coddington [46], and recently scatter search methods by Marti, Lourenco and Laguna [46]. Heuristic optimization methods are explicitly aimed at good feasible solutions that may not be optimal, where the complexity of the problem or the limited time available does not allow exact solution. Generally, two questions arise: (i) how fast the solution executes and (ii) how close the solution is to the optimal one. A tradeoff is often required between time and quality, which is taken care of by running a simpler algorithm more than once, comparing the results obtained with those of more complicated ones, and by effectiveness in comparing different heuristics. For design and planning problems quality is more important than time, whereas for control problems the situation is just the reverse.

Generally, even if the quality criterion of the solution is met, the solution is not optimal. The empirical evaluation of a heuristic method is based on the analytical difficulty involved in the problem and pathological worst case results.
1.6.8 Work related to Longest Common Subsequence Problem
The longest common subsequence problem has applications in DNA analysis and in the design of conveyor belt workstations in machine production processes. A brute-force approach to solve the longest common subsequence problem is to enumerate all subsequences of the first sequence and check each subsequence to see whether it is also a subsequence of the second sequence, keeping track of the longest subsequence found. Each subsequence of the first sequence corresponds to a subset of the indices {1, ..., m}. There are 2^m subsequences of the first sequence, so that exponential time is required, making this approach impractical for long sequences. The O(mn) algorithm for the longest common subsequence problem seems to be a folk algorithm. Knuth posed the question of whether subquadratic algorithms for longest common subsequence exist. Masek and Paterson answered this question in the affirmative by giving an algorithm that runs in O(mn/log n) time, where n ≥ m and the sequences are drawn from a set of bounded size. For the special case in which no element appears more than once in an input sequence, Szymanski showed that the problem can be solved in O((n + m) log(n + m)) time. Many of these results extend to the problem of computing string edit distances. Many sequential algorithms for the longest common subsequence problem have been proposed in the literature, and the time complexity has been shown to be Ω(mn) when the alphabet is not fixed and the lengths of the two strings are m and n (m ≤ n). Parallel algorithms have also been devised on different models. On the theoretical CREW PRAM model, Liu and Lin proposed the fastest algorithm, which requires mn/log m processors and takes O(log m) time. In order to offer practical solutions, researchers have provided solutions on the systolic model, which is more realistic. When symbols are input sequentially on an n-processor linear systolic array, a tight lower bound has been achieved by Lin and Chen; their algorithm takes m + 2n - 1 steps.
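The O(mn) folk algorithm mentioned above is easily stated as a dynamic program; the following Python sketch, included for illustration, also recovers one longest common subsequence by a traceback. The example strings are arbitrary:

```python
# Classical O(mn) dynamic programming algorithm for the longest
# common subsequence problem: L[i][j] holds the LCS length of the
# first i symbols of a and the first j symbols of b.
def lcs(a, b):
    m, n = len(a), len(b)
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    # Trace back through the table to recover one longest subsequence.
    out, i, j = [], m, n
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i -= 1; j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

print(lcs("AGGTAB", "GXTXAYB"))   # prints GTAB
```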
1.6.9 Work related to Job Scheduling Problem
Artificial Intelligence aims at developing machines and programs that have the capability to learn, adapt and exhibit human-like intelligence. Hence, learning algorithms are important for practical applications of Artificial Intelligence, as noted by Mitchell [44], and the area is closely related to Machine Learning, as discussed by Dietterich [44]. There have been several applications of ANN in scheduling. A comprehensive survey of ANN architectures used in scheduling is provided by Cheung [44]. These are basically searching networks (Hopfield networks), probabilistic networks (Boltzmann machines), error correcting networks (multi-layer perceptrons), competing networks and self organizing networks. An investigation and review of the application of ANN to the job scheduling problem is also provided by Jain and Meeran [44]. ANN have been used in many important applications such as function approximation, classification, memory recall, optimization and noise filtering. Many commercial products, such as modems, image processing and recognition systems, speech recognition software, knowledge acquisition systems, medical instrumentation etc., have been developed using ANN, as noted by Widrow et al. [44]. ANN was first employed for the job scheduling problem by Foo and Takefuji [44]. They formulated the scheduling problem as an integer linear programming problem and used a modified Hopfield network to model the problem. The energy function of the network is a linear function and is the sum of the starting times of the jobs. In later work, Foo et al. [44] investigated the scaling properties of the modified Hopfield network for the scheduling problem. Some of the drawbacks of these networks include lack of convergence, convergence to local minima and the number of hardware processing elements required.

An adaptive ANN for the generalized job scheduling problem was proposed by Yang and Wang [44], where the precedence and resource constraints of the problem are mapped to the network architecture. The network consists of linear segmented activation functions. The network generates feasible solutions, which are further improved by heuristics to obtain non-delay solutions. The other prominent ANN systems used in scheduling are error correcting networks, which adapt network parameters (weights and biases) based on propagation of the error between the desired and computed outputs of the network. A major class of such systems is the multi-layer perceptron network, where supervised learning takes place by the back propagation algorithm. A modified multi-layer perceptron model was proposed by Jain and Meeran [44], where the ANN performs the task of optimization and outputs the desired sequence. A novel input/output representation scheme is used to encode the job scheduling problem for ANN training. Although the method has been able to handle large problem sizes (30 × 10) compared to other approaches, the generalization capability of the model is limited to approximately 20% deviation from the training sample. In contrast to the above approach, many applications of error correcting networks to the job scheduling problem utilize ANN as a component of a hybrid scheduling system. ANN have been used to rank and determine the coefficients of priority rules, as given by Rabelo and Alptekin [44]; an expert system utilizes these coefficients to generate schedules. GA was used for optimization while ANN performs multi-objective schedule evaluation, as suggested by Dagli and Sittisathanchai [44]; the network maps a set of scheduling criteria to appropriate values provided by experienced schedulers. A hybrid approach was presented by Yu and Liang [44] for the job scheduling problem, in which GA is used for optimization of job sequences and ANN performs optimization of operation start times. This approach has been successfully tested on a large number of simulation cases and practical applications. The use of multi-layer perceptron ANN for simulation of job environments was investigated by Fonseca and Navaresse [44]. This work compares the performance of ANN in estimating manufacturing lead time to traditional simulation approaches. In an approach very similar to the ANN learning mechanism, Agarwal et al. [44] proposed an adaptive learning approach for the flow shop scheduling problem. In their approach, heuristics are guided through the problem search space based on weighted processing times. The weights are adaptive, with two parameters, learning rate and reinforcement rate, used to influence the extent and memory of the search. The authors report satisfactory performance on several sets of benchmark problems drawn from the literature.

1.7 Scaling Soft Computing Algorithms to Complex Data Sets
In today's competitive scenario, organizations and institutions are handling very large volumes of customer, operations, scientific and other sorts of data, of gigabytes or even terabytes in size. Operations research practitioners would like to be able to apply Soft Computing techniques and algorithms to these large data repositories in order to discover useful knowledge, such that meaningful inferences can be drawn and effective decisions made. The question of scalability asks whether an algorithm can process large data volumes efficiently while building from them the best possible models [109]. From the point of view of complexity analysis, for most scaling problems the limiting factors of a data set have been the number of examples and their dimension. A large number of examples introduces potential problems with both time and space complexity. For time complexity, the appropriate algorithmic question is: what is the growth rate of the algorithm's run time as the number of examples and their dimensions increase? As expected, time-complexity analysis does not tell the whole story. As the number of instances grows, space constraints become critical, since almost all existing implementations of Soft Computing algorithms operate with the training set entirely in main memory. Finally, the goal of the Soft Computing algorithm must be considered. Evaluating the effectiveness of a scaling technique becomes complicated if degradation in the quality of optimality is permitted. The effectiveness of a technique for scaling Soft Computing algorithms is therefore measured in terms of three factors viz., time complexity, space complexity and quality of learning. Many diverse techniques, general and task specific, have been proposed and implemented for scaling Soft Computing algorithms [109]. Some of the broad categories relevant to this thesis are discussed here. Besides these, other hardware driven (Parallel Processing, Distributed Computing) and database driven (relational representation) methodologies are equally effective.

1.7.1 Data reduction
The simplest approach for coping with the infeasibility of inference from a very large data set is to infer from a reduced or condensed representation of the original massive data set [109]. The reduced representation should be as faithful to the original data as possible for its effective use in different processing tasks. At present the following categories of reduced representations are mainly used:
a) Sampling/Instance selection: Various random, deterministic and density biased sampling strategies exist in the Statistics literature. Their use in Machine Learning and Soft Computing tasks has also been widely studied [109]. It is to be noted that merely generating a random sample from a large database stored on disk may itself be a non-trivial task from a computational viewpoint; a one-pass sampling sketch follows this list. Several aspects of instance selection, for example instance representation, selection of interior/boundary points and instance pruning strategies, have also been investigated in instance based and classification frameworks. Challenges in designing an instance selection algorithm include accurate representation of the original data distribution, making fine distinctions at different scales and noticing rare events and anomalies.
b) Data squashing: This is a form of lossy compression where a large data set is replaced by a small data set and some accompanying quantities, while attempting to preserve its statistical information [109].
c) Indexing data structures: Systems such as kd-trees, R-trees, hash tables, AD-trees, multiresolution kd-trees and cluster feature trees partition the data or feature space into buckets recursively and store enough information regarding the data in each bucket so that many processing tasks can be achieved in constant or linear time [109].
d) Data cubes: These use a relational aggregation database operator to represent chunks of data [109].
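As an illustration of the one-pass sampling referred to in item a) above, the following Python sketch implements uniform reservoir sampling; the stream and the sample size are illustrative assumptions:

```python
import random

# Sketch of one-pass uniform sampling (reservoir sampling), useful
# when the data set is too large to hold in main memory: each of the
# n records ends up in the sample of size k with probability k/n.
def reservoir_sample(stream, k):
    sample = []
    for i, record in enumerate(stream):
        if i < k:
            sample.append(record)
        else:
            j = random.randint(0, i)      # inclusive upper bound
            if j < k:
                sample[j] = record
    return sample

print(reservoir_sample(range(10**6), 5))
```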

The last three techniques fall into a general class of representations called cached sufficient statistics. These are summary data structures that lie between the statistical algorithms and the database, intercepting the kinds of operations that have the potential to consume large amounts of time if they were answered by direct reading of the data set.
1.7.2 Dimensionality reduction
An important problem related to processing data sets large both in dimension and size is that of selecting a subset of the original features. Preprocessing the data to obtain a smaller set of representative features, retaining the optimal or salient characteristics of the data, not only decreases processing time but also leads to more compact models and better generalization [109].
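One widely used projection based technique of this kind is principal component analysis; the following Python sketch, using synthetic data, is offered purely as an illustration of dimensionality reduction rather than of any specific method adopted in this thesis:

```python
import numpy as np

# Sketch of projection-based dimensionality reduction (principal
# component analysis): the d-dimensional data are projected onto the
# k directions of largest variance. The data here are synthetic.
def pca(X, k):
    Xc = X - X.mean(axis=0)                    # center the data
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)           # ascending eigenvalues
    top = vecs[:, np.argsort(vals)[::-1][:k]]  # k leading components
    return Xc @ top

X = np.random.randn(200, 10)
print(pca(X, 3).shape)                         # (200, 3)
```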

1.7.3 Active learning

Traditional Soft Computing and Machine Learning algorithms deal with input data consisting of independent and identically distributed samples. In this framework, the number of samples (the sample complexity) required by a class of Soft Computing algorithms to achieve a specified accuracy can be determined theoretically. In practice, as the amount of data grows, the increase in accuracy slows, forming the learning curve. This slowdown in learning can be avoided by employing selection methods that sift through the additional examples and filter out a small, non-identically distributed set of relevant examples containing the essential information. Formally, active learning studies the closed-loop phenomenon of a learner selecting actions that influence what data are added to its training set [109]. When actions are selected properly, the sample complexity for some problems decreases drastically, and some NP-hard learning problems become polynomial in computation time.

1.7.4 Data partitioning

Another approach to scaling is to partition the data, avoiding the need to run algorithms on very large data sets. The models inferred from the individual partitions are then combined to obtain the final ensemble model. Data partitioning techniques can be categorized based on whether they process subsets sequentially or concurrently. Several model combination strategies also exist in the literature, including committee machines, voting classifiers, mixtures of experts, stacked generalization, Bayesian sampling, statistical techniques and Soft Computing methods [109]. The problem of modular task decomposition for achieving computational efficiency has also been studied.

1.7.5 Efficient search algorithms

The most straightforward approach to scaling Soft Computing learning is to produce more efficient algorithms or to increase the efficiency of existing algorithms. As mentioned earlier, an optimization problem may be framed as a search through a space of models based on some fitness criterion. This view allows for three possible ways of achieving scalability [109], illustrated by the stump sketch after this list:

a) Restricted model space: Simple Soft Computing algorithms, for example two-level trees, decision stumps and constrained searches, involve a smaller model space and decrease the complexity of the search process.

b) Knowledge encoding: Encoding domain knowledge to provide an initial solution close to the optimal one results in fast convergence and avoidance of local minima. Domain knowledge may also be used to guide the search process for faster convergence.

c) Powerful algorithms and heuristics: Strategies like greedy search, divide and conquer, and modular computation are often found to provide considerable speedups. Program optimization through efficient data structures and dynamic search space restructuring, as well as the use of GA, randomized algorithms and parallel algorithms, may also obtain approximate solutions much faster than conventional algorithms.
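As an illustration of the restricted model space idea in a) above, the sketch below fits a decision stump, a one-split classifier over a single real feature; the tiny hypothesis space makes the search trivially cheap. This is an illustrative Python fragment, not code from the thesis.

def fit_stump(xs, ys):
    # Exhaustively choose the threshold and orientation that
    # minimise misclassifications on binary labels in {0, 1}.
    best = None
    for t in sorted(set(xs)):
        for sign in (0, 1):  # which side of t predicts class 1
            preds = [1 if ((x > t) == bool(sign)) else 0 for x in xs]
            errors = sum(p != y for p, y in zip(preds, ys))
            if best is None or errors < best[0]:
                best = (errors, t, sign)
    return best  # (training errors, threshold, orientation)

# Usage: a stump separating small from large values.
print(fit_stump([0.1, 0.4, 0.35, 0.8, 0.9, 0.75], [0, 0, 0, 1, 1, 1]))
# -> (0, 0.4, 1): predict class 1 when x > 0.4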

1.8 Scope of the Thesis

The major objective of this thesis is to present results of theoretical and experimental investigations addressing applications of Soft Computing to optimization problems. The problem domains considered include the traveling salesman problem, the transportation problem, decision making and its applications in game theory and financial investments, forecasting and prediction, and resource allocation, sequencing and job scheduling. Various traditional methodologies have been developed using classical approaches, the solutions obtained from which are not always optimal. Soft Computing approaches integrating Fuzzy sets, ANN, Rough sets and GA try to improve the optimality of the solutions obtained using classical approaches. The emphasis of the proposed methodologies is on handling data sets which are large both in size and dimension and involve classes that are overlapping, intractable and have non-linear boundaries. Several strategies based on data reduction, dimensionality reduction, active learning and efficient search heuristics are employed for dealing with the issue of scaling in Soft Computing algorithms. The problems of handling linguistic input and ambiguous output decisions, issues of overlapping or intractable class structures, selection of optimal parameters and discovery of human comprehensible knowledge in the form of linguistic rules are addressed in the Soft Computing framework. The methodologies developed for TSP are based on Fuzzy Self Organizing Map (FSOM), Fuzzy Integer Linear Programming (FILP) and Fuzzy Multi-Objective Linear Programming (FMOLP) approaches and are useful in handling uncertainty in real life data. A comparative study of the transportation problem is made under probabilistic and fuzzy uncertainties, which is followed by representation of the constraints as systems of linear equations and their solution by a Neuro-Fuzzy approach. A solution of the balanced transportation problem is also developed using Fuzzy Vogel's Approximation Method (FVAM), whose optimality is tested through the Fuzzy Modified Distribution Method (FMODIM). The solution of the decision making problem is developed using the concept of Fuzzy Soft relations. Fuzzy game rectangular payoff matrices are represented using LR-type trapezoidal Fuzzy numbers and solved through the algebraic method and the principle of dominance. The problem of financial investments is solved through a Multi-class Support Vector Machine (SVM) approach, which allows the investor to accomplish effective decision making while investing. Rough Fuzzy Multi Layer Perceptron (RFMLP) ANN are used to extract stock price prediction rules from the Bombay Stock Exchange (BSE). A hybrid Neuro-Fuzzy regression model is developed for time series forecasting of currency exchange rate data of the US dollar against the INR. Two important problems of Fuzzy Linear Regression (FLR) viz., wide spreads for large values of explanatory variables and the functional relationship between dependent and independent variables, are also addressed. The corporate bankruptcy problem is then tackled using FSVM. Assignment, sequencing and the Job Scheduling Problem (JSP) are studied using FILP, Fuzzy Genetic Heuristic (FGH) algorithm, Ant Colony Optimization (ACO) and RFMLP ANN computing paradigms, which generate optimal solutions for decision making in uncertain environments. The effectiveness of the algorithms is demonstrated on different real life data sets, mainly large in dimension or size, taken from varied domains, for example, the traveling salesman problem, banking systems, financial institutions, corporate organizations, stock exchanges and currency exchange rates, academic institutions, and scheduling and sequencing problems. The models are found to be effective and significantly superior to several related ones. The results of the investigations are summarized below under different chapter headings.
1.8.1 Different Formulations of Traveling Salesman Problem

In chapter 2, a Fuzzy based ANN algorithm viz., Fuzzy Self Organizing Map for TSP is proposed [35] and is compared with two well known and very effective heuristic methods viz., Lin-Kernighan and an EA. The numerical simulation indicates that the FSOM algorithm produces appreciably satisfactory results compared to both the evolutionary algorithm and the Lin-Kernighan algorithm for TSP. Then TSP is represented as an Integer Linear Programming (ILP) problem [29], which is further extended with fuzzy numbers, resulting in the development of the Fuzzy Integer Linear Programming (FILP) formulation of TSP. The FILP formulation of the problem takes into consideration the impreciseness, vagueness and uncertainty aspects of real life data, thus providing a tool which helps in reasoning about imprecise knowledge based systems. These methods are based on the representation theorem and a fuzzy number ranking method. A symmetric version of TSP is also solved by the FMOLP paradigm [45]. Here, the multi-objective TSP exists in uncertain situations where route selection is done by exploiting vague decision parameters. Tolerances are introduced by the decision maker and adjusted to give a range of solutions with different aspiration levels, from which the best solution is chosen that meets a satisfactory level within the given domain of tolerances. FMOLP can be effective in achieving k-dimensional points according to the decision maker's aspiration level in a multi-dimensional solution space.

1.8.2 Modeling Various Aspects of Transportation Problem

In chapter 3, the transportation problem is comparatively studied under probabilistic and fuzzy uncertainties [37]. The approach allows a direct fuzzy extension of the classical numerical simplex method. The experimental results obtained are compared using fuzzy and probabilistic approaches. A simple special method for transformation of frequency distributions into fuzzy numbers without loss of useful information is used to achieve comparability of uncertain initial data in the fuzzy and random cases. The fuzzy interval is achieved using trapezoidal fuzzy numbers. The constraints of the transportation problem are then expressed as systems of linear equations, which are solved using neuro-fuzzy networks [32] that can learn as well as allow prior knowledge to be embedded via fuzzy rules with appropriate linguistic labels. The first phase consists of constructing an appropriate error cost function for the particular type of problem to be solved, based on defined error variables typically formulated from the functional network; the problem is thus represented by a structured multi-layer ANN. The second phase is the optimization step, which involves deriving an appropriate learning rule for the structured neuro-fuzzy network using the defined error cost function. The third phase involves training the neuro-fuzzy network using the learning rule developed to match some set of desired patterns i.e., input/output signal pairs. The network is thus optimized to minimize the associated error cost function, i.e., the training phase adjusts the network's synaptic weights according to the derived learning rule. The nodes in different layers of the neuro-fuzzy network perform different operations. The fourth is the application phase, in which appropriate output signals are collected from the structured ANN for a particular set of inputs to solve the problem. Finally, an initial basic solution of the balanced transportation problem is developed using FVAM, whose optimality is tested through FMODIM [40]. The solution is developed in terms of the revised simplex method. This model may arise in many physical situations and is developed using trapezoidal fuzzy numbers. The feasible region is closed, bounded and non-empty, which ensures the existence of an optimal solution to the problem. The cost values of each cell in the transportation table are represented in terms of trapezoidal fuzzy numbers, which allows handling of the uncertainty and vagueness involved.
1.8.3 Decision Making and its Applications in Game Theory and Financial Investments

In chapter 4, optimal solutions are devised for different complex decision making problems in engineering, management and social science disciplines which involve data that are not always precisely defined, using the concept of Fuzzy Soft relations [39]. These problems have various types of uncertainties, some of which can be dealt with using theories viz., Probability theory, Fuzzy set theory, Rough set theory, Vague set theory and Approximate Reasoning theory. However, all these techniques lack parameterization of the tools, due to which they could not be applied successfully in tackling such problems. The Soft set concept is free from the above difficulty and has rich potential for application to these problems. With the motivation of this new concept, Soft relations and Fuzzy Soft relations are defined and applied to solve various decision making problems. This is followed by the solution of rectangular games by the principle of dominance using LR-type trapezoidal Fuzzy numbers [30]. LR-type trapezoidal Fuzzy numbers are defined by trapezoidal membership functions and are characterized by their simple formulation and computational efficiency. The solution of Fuzzy games with pay-offs as imprecise numbers is generally given by the minimax-maximin principle. The determination of $2 \times 2$ Fuzzy games from a rectangular $m \times n$ Fuzzy game without saddle point is a fundamental problem of Fuzzy game theory. Practically, the pay-off is not a fixed real number, so it is considered as an LR-type trapezoidal Fuzzy number and the $m \times n$ matrix is reduced to a $2 \times 2$ matrix and then solved. Multi-class SVM is then used to evaluate the credibility of financial investments [33], [34], which is an important decision making problem. The curse of dimensionality is addressed here using kernel functions; however, a proper kernel function for a certain problem is dependent on the specific dataset. Here, the choice of kernel function is studied empirically and optimal results are achieved for multi-class SVM combining several binary classifiers. The performance of multi-class SVM is illustrated by extensive experimental results, which indicate that with suitable kernel parameters better classification accuracy is achieved as compared to other methods. Experimental results on the datasets indicate that the Gaussian kernel is not always the best choice to achieve high generalization of the classifier, although it is often the default choice.

1.8.4 Time Series Forecasting and Predicting Stock Prices along with Bankruptcy in Organizations

In chapter 5, three important aspects of forecasting viz., stock price prediction, currency exchange rate forecasting and bankruptcy prediction are studied using hybrid Soft Computing techniques viz., RFMLP ANN, FSVM and the Neuro-Fuzzy Regression model. First, generic stock price prediction is presented using a modular evolutionary approach to design a hybrid connectionist system under the Soft Computing framework for classification and rule generation. The basic building block is the RFMLP ANN [36]. The model extracts knowledge in the form of rules from daily stock movements of BSE, guiding investors whether to buy, sell or hold stock. The efficiency of the prediction process is increased using the Rough Set with Boolean Reasoning (RSBR) discretization algorithm. The original classification includes splitting the task into several subtasks with an RFMLP network for each subtask. The sub-network modules are integrated to preserve crude domain knowledge. The pool of integrated networks is evolved using a GA with a restricted adaptive mutation operator that utilizes domain knowledge to accelerate training and preserve localized rule structure. The parameters for the input and output fuzzy membership functions of the network are tuned using GA along with the link weights. The existing procedure has been modified for generation of Rough set dependency rules to directly handle a real valued attribute table containing fuzzy membership values; this helps in preserving all class representative points in the dependency rules by adaptively applying a threshold that automatically takes care of the shape of the membership functions. In this design, all possible inference rules contribute to the final solution. The use of GA is beneficial for modeling multimodal distributions, since all major representatives are given a fair chance during network synthesis. Next, the rule extraction algorithm is given.
The performance of the generated rules is evaluated quantitatively, and two new measures are defined indicating the certainty and confusion in a decision, along with some existing measures, to evaluate the quality of the rules. A quantitative comparison of the rule extraction algorithm is made with some existing ones like the Subset method, M of N method and X2R method. This is followed by a hybrid Neuro-Fuzzy model developed from the basic concepts of ANN and fuzzy regression models for time-series forecasting under incomplete data conditions [41]. The data considered are the exchange rates of the US dollar to the Indian rupee. The ANN preprocesses the raw data and provides the necessary background to apply the fuzzy regression model. The fuzzy regression model eliminates the disadvantage of the large amount of historical data imposed on model formulation. The model yields accurate results with fewer observations and incomplete data sets for point and interval forecasts. The empirical results indicate that the model's performance is comparatively better than that of other models, which makes it an ideal candidate for forecasting and decision making. The method is empirically compared with other forecasting models such as the auto regressive integrated moving average, Chen's fuzzy time-series (first and higher order), Yu's fuzzy time-series, FARIMA and ANN, and gives improved forecasting results. Then, two important problems of FLR [42] viz., wide spreads for large values of explanatory variables and the functional relationship between dependent and independent variables, are addressed. First, a procedure is developed for constructing the membership function of fuzzy regression coefficients, which conserves the fuzziness of the input information completely. A variable spread FLR model is generated with higher explanatory power and forecasting accuracy, which resolves the problem of wide spreads of the estimated response for larger values of the independent variables in fuzzy regression, so that situations of decreasing or non-uniform spreads can be tackled. These problems are taken care of by means of a three-step approach. In the first step, the membership functions of the least-squares estimates of the fuzzy response and explanatory variables are derived based on Zadeh's extension principle, to obtain valuable information. In the second step, the fuzzy regression coefficients are defuzzified to crisp values via a fuzzy ranking method to avoid the problem of non-uniform spreads for larger values of the explanatory variables in estimation. Finally, in the third step a mathematical programming approach determines the fuzzy error term for each pair of explanatory variables and response, such that the errors in estimation are minimized subject to the condition that the spread of each estimated response equals that of the associated observed response. As the spreads of the error terms coincide with those of the associated observed responses, non-uniform spreads are accommodated no matter how the spreads of the observed responses change. Finally, FSVM is used to study bankruptcy in corporate organizations [38]. It is formed by integrating SVM and Fuzzy sets. FSVM is based on the idea of structural risk minimization and handles uncertainty and impreciseness in corporate data such that the overall prediction accuracy is enhanced. It is implemented here for analyzing predictors as financial ratios and adapts to default probability estimation. Extensive data sets are required to fully utilize the classification power of FSVM. The test dataset comprises the 50 largest bankrupt organizations with capitalization of no less than $1 billion that filed for protection against creditors under Chapter 11 of the United States Bankruptcy Code in 2001–2002 after the stock market crash of 2000. The performance is illustrated by experimental results which show that FSVM is better placed than traditional bankruptcy prediction methods in extracting useful information from the data. Thus, this approach inherits the advantages of Machine Learning and Fuzzy Logic, which enhances the prediction accuracy of the model.

1.8.5 Some Problems in Assignment, Sequencing and Job Scheduling

In chapter 6, first an important NP-hard assignment problem viz., the Examination Timetable Problem (ETP) is considered. It is basically the assignment of courses to be examined, and of candidates, to time periods and examination rooms while satisfying a set of hard and soft constraints. The simulation example is taken from Netaji Subhas Open University, Kolkata. Here, the solution is developed using the FILP technique [46].
As in most real life situations the information available in the system is not exact, lacks precision and has an inherent degree of vagueness, the various allocation variables are considered as fuzzy numbers, expressing the lack of precision that the decision maker has. Each feasible solution has a fuzzy number obtained through the fuzzy objective function. The solution is obtained using a fuzzy number ranking method. The performance of the different FILP techniques is demonstrated by experimental data generated through extensive simulation from Netaji Subhas Open University, Kolkata, India, in terms of execution times. The proposed FILP models are compared with a commonly used heuristic viz., the ILP approach, on experimental data, which gives an idea about the quality of the heuristic. The FILP technique is further compared with different AI based heuristic techniques for ETP with respect to best and mean cost as well as execution time measures on the Carter benchmark datasets to illustrate its effectiveness. An appreciable amount of time is required by the FILP technique to generate a satisfactory solution in comparison to the other heuristic solutions. The work acts as a benchmark for other heuristic algorithms [46] and helps better reformulations of mathematical models of the problem. The experimental study presented here focuses on producing a methodology that generalizes well over a spectrum of techniques and generates significant results for one or more datasets. The performance of the FILP model is finally compared to the best results cited in the literature for the Carter benchmarks to assess its potential. A heuristic based solution for the university course timetable problem is presented next. The problem is an NP-hard combinatorial optimization problem which lacks analytical solution methods and has received tremendous attention due to its wide use in universities. Several algorithms have been proposed based on heuristics like search techniques and evolutionary computation. Here, the FGH algorithm is used to solve the problem [43]. The method incorporates a GA using an indirect representation based on event priorities, a micro GA and heuristic local search operators to tackle a real world timetabling problem from St. Xavier's College, India. Fuzzy sets model the measure of violation of soft constraints in the fitness function to take care of the inherent uncertainty and vagueness involved in real life data. The present search technique differs from other techniques in several aspects: (i) the algorithm is multi-path, searching many peaks in parallel and hence reducing the possibility of local minimum trapping; (ii) it works with a coding of the parameters instead of the parameters themselves, which helps the genetic operators to evolve the current state into the next state with minimum computations; (iii) the fitness of each string is evaluated to guide the search instead of the optimization function; (iv) there is no requirement for derivatives or other auxiliary knowledge; (v) GA explores the search space where the probability of finding improved performance is high. The algorithm incorporates a number of techniques and domain specific heuristic local search operators to enhance the search efficiency. The non-rigid soft constraints involved in the problem are basically optimization objectives for the search algorithm, with an inherent degree of uncertainty that reflects different aspects of real life data. This uncertainty is tackled by formulating the measure of violation of each soft constraint in the fitness function using fuzzy membership functions. The solutions are assessed with respect to the manual solution developed by the college staff. The proposed technique satisfies all hard constraints of the problem and achieves a significantly better score in satisfying the soft constraints. The algorithm is computationally intensive in comparison to the standard benchmark heuristics. The concentration then shifts to finding a low-complexity solution for the Longest Common Subsequence (LCS) problem using the ACO paradigm. ACO is a novel nature-inspired metaheuristic for the solution of hard combinatorial optimization problems. It belongs to the class of metaheuristics, which are approximate algorithms used to obtain good enough solutions in a reasonable amount of computation time. The inspiring source of ACO is the foraging behavior of real ants when searching for food. This characteristic of real ant colonies is exploited in artificial ant colonies in order to solve combinatorial optimization problems. Considering two strings $a_1 \ldots a_n$ and $b_1 \ldots b_m$ ($m \le n$), the traditional technique for finding an LCS is based on dynamic programming, which consists of creating a recurrence relation and filling a table of size $m \times n$. This problem has two different aspects: the first deals with various measures of pre-sortedness and the other with the problem of generation of the LCS provided the elements are arranged in an increasing sequence, resulting in the longest increasing subsequence. Here, the second aspect of the problem is tackled. The proposed ACO-LCS algorithm [31] draws an analogy with the behavior of ant

colonies, known as the ant system. It is a viable approach to stochastic combinatorial optimization. The main characteristics of this model are positive feedback, distributed computation and the use of a constructive greedy heuristic. Positive feedback accounts for rapid discovery of good solutions, distributed computation avoids premature convergence and the greedy heuristic helps find acceptable solutions in a minimum number of stages. The proposed methodology is applied to LCS and simulation results are given. The effectiveness of this approach is demonstrated by its efficient computational complexity. Finally, a hybrid RFMLP–NN is used to study the scheduling process of JSP [44]. It is a Soft Computing paradigm in which a consortium of methodologies works synergistically and provides flexible information processing capability for handling real life ambiguous situations. As job scheduling is a decision making process, the next operation is to select a partial schedule from a set of competing operations with the objective of minimizing a performance measure. A complete schedule is the consequence of the best selected decisions. However, there exists an inherent degree of uncertainty, vagueness and impreciseness associated with such problems, which is taken care of by using Fuzzy sets and Rough sets. Fuzzy sets help in handling vagueness and impreciseness in the linguistic input description and ambiguity in the output decision. Rough sets deal with uncertainty arising from inexact or incomplete information and extract crude domain knowledge for determining the network parameters. Rough sets synthesize approximations of concepts and find hidden patterns in the acquired data. They aid in the representation and processing of both qualitative and quantitative parameters in reduced form, and mix user defined and measured data, thus evaluating the significance of data. They classify decision rules from data and provide a legible and straightforward interpretation of the synthesized models. The approach is well suited for parallel processing applications and hence can be effectively integrated with ANN. ANN are recognized as a powerful and general technique for Machine Learning because of their non-linear modeling abilities and robustness in handling noise-ridden data. The nodes of an ANN are linked by connections, each with an associated weight which is a measure of its strength; its sign is indicative of excitation or inhibition potential. The network captures task relevant knowledge as part of its training regimen, encoded in the architecture or topology of the network. Transfer functions are used for nonlinear mapping, along with the set of network parameters (weights and biases). To generate the learning or knowledge base, GA are chosen for producing optimal solutions to known benchmark problems, as they have proved successful in empirical scheduling research. In each optimal solution, every individually selected operation of a job is treated as a decision which contains knowledge. Each decision is a function of job characteristics divided into classes using domain knowledge. The scheduler enhances classification strength and captures predictive knowledge regarding the assignment of an operation's position in a sequence. The trained network successfully replicates the performance of GA. The better performance of the scheduler on test problems compared to other methods demonstrates the utility of the method, and its scalability on larger problem sets gives satisfactory results. RFMLP–NN thus captures predictive knowledge regarding the assignment of an operation's position in the job sequence.

1.8.6 Conclusions and Scope for Further Research

The concluding remarks along with the scope for further research are given in Chapter 7. The different data sets used in the experiments are described briefly in the Appendix.

Chapter 2 Different Formulations of Traveling Salesman Problem

2.1 Introduction

TSP is a widely studied NP-hard [53] combinatorial optimization problem. Much of the work on TSP is not motivated by direct applications, but rather by the fact that the problem provides an ideal platform for the study of general methods that can be applied to a wide range of discrete optimization problems. Indeed, numerous direct applications of TSP bring life to the research area and help to direct future work. The idea of TSP is to find a tour of a given number of cities, visiting each city exactly once and returning to the starting city, such that the length of this tour is minimized [146]. The investigation question which arises is thus as follows: in what order should the cities be visited such that the distance traveled is minimized? TSP is represented by a complete edge-weighted graph $G = (V, E)$ with $V$ being the set of $n = |V|$ nodes or vertices representing cities and $E \subseteq V \times V$ being the set of directed edges or arcs. Each arc $(i, j) \in E$ is assigned a value $d_{ij}$, which is the distance between cities $i$ and $j$ with $i, j \in V$. The problem can be either asymmetric or symmetric in nature. In the asymmetric TSP, the distance between a pair of nodes $i, j$ depends on the direction of traversing the edge or arc, i.e., there is at least one arc $(i, j)$ for which $d_{ij} \neq d_{ji}$. In the symmetric TSP, $d_{ij} = d_{ji}$ holds for all arcs in $E$. The goal in TSP is to find a minimum length Hamiltonian Circuit [66], [69], [101] of the graph, where a Hamiltonian Circuit is a closed path visiting each of the $n$ nodes of $G$ exactly once. Thus, an optimal solution to TSP is a permutation $\pi$ of the node indices $\{1, \ldots, n\}$ such that the length $f(\pi)$ is minimal, where

$$f(\pi) = \sum_{i=1}^{n-1} d_{\pi(i)\pi(i+1)} + d_{\pi(n)\pi(1)} \qquad (1)$$

TSP has direct importance, since quite a lot of practical applications can be put in this form. It naturally arises as a sub-problem in many transportation and logistics applications [94]; for example, the problem of arranging school bus routes to pick up children in a school district [126]. This bus application is of important historical significance to TSP, since it provided motivation for Merrill Flood [29], one of the pioneers of TSP research in the 1940s. A second TSP application from the 1940s involved the transportation of farming equipment from one location to another to test soil, leading to mathematical studies by P. C. Mahalanobis and R. J. Jessen [29]. More recent applications involve the scheduling of service calls at cable firms, the delivery of meals to homebound persons, the scheduling of stacker cranes in warehouses, the routing of trucks for parcel post pickup and a host of others. Although transportation applications [94], [148] are the most natural setting for TSP, the simplicity of the model has led to many interesting applications in other areas. A classic example is the scheduling of a machine to drill holes in a circuit board or other object. In this case the holes to be drilled are the cities and the cost of travel is the time it takes to move the drill head from one hole to the next. The technology for drilling varies from one industry to another, but whenever the travel time of the drilling device is a significant portion of the overall manufacturing process, the problem can play a role in reducing costs.

TSP also has theoretical significance in complexity theory, since the problem belongs to the class of NP-complete problems [70]; these are difficult optimization problems in which the set of feasible solutions, which satisfy the constraints of the problem but are not necessarily optimal, is finite though usually very large. The number of feasible solutions grows as some combinatorial factor such as $N!$, where $N$ characterizes the size of the problem [101]. It has often been the case that progress on TSP has led to progress on many combinatorial optimization problems; TSP is thus an ideal stepping stone for the study of combinatorial optimization problems. Based on a deterministic approach, the world record setting TSP solution is by Applegate et al [7], [8], who have solved instances as large as 24,978 cities to optimality. To cope with this exponential growth, parallel implementations of TSP were realized [35]. Although many optimal algorithms exist for solving TSP, it has been realized that it is computationally infeasible to obtain the optimal solution to the problem. For large-size problems [101] it has been proved that it is almost impossible to generate an optimal solution within a reasonable amount of time. Heuristics, instead of optimal algorithms, are thus extensively used to solve such problems [78], [111]. Many heuristic algorithms give near optimal solutions to the problem and are used for practicability reasons, specifically for large numbers of cities. Heuristic approaches for solving TSP are thus very popular, as they try to produce a near optimal solution. The commonly used heuristic approaches are [78]: (a) Greedy Algorithms; (b) 2-opt Algorithm; (c) 3-opt Algorithm; (d) Simulated Annealing; (e) GA and (f) ANN. However, efficiencies vary from case to case and from size to size. Generally, the most commonly used heuristics are ANN [73], suited for solving problems which are hard to capture in mathematical models. However, the usage and employment of ANN in such application domains is often dependent on the tractability of processing costs. The problem domains for the employment of ANN [9], [79] are increasing and the problems themselves are getting larger and more complex [157]. This leads to larger networks consisting of huge numbers of nodes and interconnection links, which results in growing costs for network specific operations. Especially the cost intensive training phase of ANN inherits a major drawback, due to the situation that large numbers of patterns viz., input and target values are fed into the network iteratively. The effectiveness of ANN [9], [79] is improved by the deployment of Fuzzy Logic [163], [166], which generalizes classical two-valued logic for reasoning under uncertainty. This is achieved by the notion of membership. Two things are accomplished by this viz., (i) ease of describing human knowledge involving vague concepts and (ii) enhanced ability to develop a cost-effective solution to real-world problems. ANN and Fuzzy Logic [86] are two complementary technologies. ANN can learn from data and feedback; however, understanding the knowledge or pattern learned by ANN has been difficult. More specifically, it is difficult to develop an insight about the meaning associated with each neuron and each weight. Hence, ANN is often viewed as a black box approach [9]. In contrast, Fuzzy rule-based models are easy to comprehend because they use linguistic terms and the structure of if-then rules. Unlike ANN, Fuzzy Logic does not come with a learning algorithm. Since ANN can learn, it is natural to merge the two technologies; this merger creates the Neuro-Fuzzy ANN [86]. A Neuro-Fuzzy ANN thus describes a fuzzy rule-based model using an ANN like structure. In this chapter, a Fuzzy based ANN algorithm viz., Fuzzy Self Organizing Map (FSOM) [35], [95] for TSP is proposed and compared with two well known and very effective heuristic methods viz., Lin-Kernighan [35] and an Evolutionary Algorithm (EA) [108]. The numerical simulation indicates that the FSOM algorithm produces appreciably satisfactory results compared to both the EA and the Lin-Kernighan algorithm. The concentration then shifts towards formulating the problem as an Integer Linear Programming (ILP) [71], [132] problem and extending the ILP formulation to fuzzy constraints with fuzzy numbers, which can be treated as the Fuzzy Integer Linear Programming (FILP) [166] formulation of TSP. As in real life situations the information available in the system under consideration is not of an exact nature, the FILP formulation takes into consideration the lack of precision of a vague nature generally assumed in such formulations, thus providing a tool which helps in reasoning about imprecise knowledge based systems. These methods are based on the representation theorem and a fuzzy number ranking method. Finally, the symmetric version of TSP is solved by the Fuzzy Multi-Objective Linear Programming (FMOLP) [45] paradigm with vague decision parameters. Here, the multi-objective TSP exists in an uncertain environment where route selection is done by exploiting these parameters. Further, tolerances are introduced by the decision maker to accommodate the inherent vagueness. By adjusting these tolerances, a range of solutions with different aspiration levels is found, from which the decision maker can choose the one that best meets his satisfaction level within the given domain of tolerances. FMOLP can be effective in achieving k-dimensional points according to the decision maker's aspiration level in a multi-dimensional solution space. This chapter is organized as follows. In the next section, a solution to TSP is proposed using the FSOM model. This is followed by the FILP formulation of TSP in section 2.3. In section 2.4, the symmetric TSP is solved using the FMOLP problem. Experimental results and comparisons are given in section 2.5. Finally, in section 2.6 conclusions are given.

2.2 Fuzzy Self Organizing Map Model

In this section, FSOM with a one dimensional neighborhood [86] is used to find an optimal solution for the symmetric TSP. The solution generated is improved by the 2opt algorithm [57], [74]. The FSOM algorithm is compared with the Lin-Kernighan algorithm and an EA with enhanced edge recombination operator and self-adapting mutation rate [124].

2.2.1 Self Organizing Map

SOM [95], introduced by Teuvo Kohonen in the early 1980s, is an ANN that is trained using competitive, unsupervised learning [19] to produce a low-dimensional discretized representation of the input space of the training samples, called a map, which preserves the topological properties of the input space. SOM is useful for visualizing low-dimensional views of high-dimensional data, akin to multi-dimensional scaling. It operates in two modes viz., training and mapping. Training builds the map using input examples through a competitive process, and mapping automatically classifies a new input vector. This approach is based on the Winner Takes All (WTA) and Winner Takes Most (WTM) algorithms [86], the former being the most basic competitive learning algorithm. When an input vector or pattern is presented, the distance to each neuron's synaptic weights is calculated. The neuron whose weights are most correlated with the current input vector is the winner. Only the winning neuron modifies its synaptic weights towards the point presented by the input pattern; the synaptic weights of the other neurons do not change. The learning process is described by the following equation:

Wi  Wi   ( x  Wi )

(2)

where, i  {0………number of neurons}, Wi represents all synaptic weights of winning neuron,  is learning rate and x is current input vector. WTM has better convergence than WTA [79]. The difference is that many neurons in WTM strategy adapt their synaptic weights in single learning iteration only. In this case not only winner, but also its neighborhood adapts. The further neighboring neuron is from winner, smaller modification is applied to its weights. This adaptation process can be described as:

Wi  Wi  N (i, x)( x  Wi ) (3) for all neurons i that belongs to winner's neighborhood. Here, N (i, x ) function defines neighborhood. Classical SOM is created when function N (i, x ) is defined as [95]: ( i , w)   N (i, x)  {10 ford forothers

(4)

where $d(i, w)$ is the euclidean distance between the winning and the $i$th neuron and $\lambda$ is the neighborhood radius. To train SOM, the euclidean distances between the input vector and all neural weights are calculated [35]. The neuron that has the shortest distance to the input vector, i.e., the winner, is chosen and its weights are slightly modified towards the direction represented by the input vector. Then the neighboring neurons are taken and their weights are modified in the same direction. The parameters $\eta$ and $\lambda$ are multiplied by $\Delta\eta$ and $\Delta\lambda$ respectively during each learning iteration. The two latter parameters are always less than one; therefore, $\eta$ and $\lambda$ become smaller during the learning process. At the beginning SOM tries to organize itself globally, and with the following iterations it performs more and more local organization, because the learning rate and the neighborhood get smaller. The Kohonen SOM shown in figure 2.1 maps input vectors of any dimension onto a map with one, two or more dimensions. Input patterns which are similar to one another in the input space are put close to one another in the map. The input vector is passed to every neuron. SOM is made of a vector or matrix of output neurons. If the vector representation is chosen, each neuron has two neighbors, one on the left and the other on the right; this is called a one-dimensional neighborhood, as shown in figure 2.2.
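A minimal sketch of one WTM update of equations (2)-(4) is given below (illustrative Python, not from the thesis; the neighborhood distance $d(i, w)$ is taken here as the index distance of a one-dimensional neighborhood, which is an assumption):

import numpy as np

def wtm_step(W, x, eta, lam):
    # One Winner-Takes-Most update: locate the winner, then move
    # every neuron within radius lam of it towards the input x.
    winner = np.argmin(np.linalg.norm(W - x, axis=1))
    for i in range(len(W)):
        if abs(i - winner) <= lam:    # N(i, x) = 1 inside the radius, eq. (4)
            W[i] += eta * (x - W[i])  # eq. (3); eq. (2) is the lam = 0 case
    return W

# Usage: ten neurons with 2-D weights adapting to one input pattern.
rng = np.random.default_rng(0)
W = wtm_step(rng.random((10, 2)), np.array([0.5, 0.5]), eta=0.5, lam=2)
print(W.round(3))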

Figure 2.1: Kohonen SOM with two dimensional neighborhood and input vector

Figure 2.2: One dimensional neighborhood of Kohonen SOM

Figure 2.3: Classical two dimensional neighborhood

Figure 2.4: Extended two dimensional neighborhood of Kohonen SOM

If the two-dimensional matrix representation is used, neurons have 4 neighbors (viz., left, right, top and bottom). This is the classical two dimensional neighborhood shown in figure 2.3. Instead of taking the four nearest neurons, 8 or more can be taken, as shown in figure 2.4. As many dimensions can be used as required; however, 2D is the most common.

2.2.2 Fuzzy Self Organizing Map

FSOM [73], [163] introduces the concept of the membership function from the theory of Fuzzy sets into the learning process. The membership $R_{lj}$ of each pattern $l$ to each neuron $j$ is calculated and then the weight vector of each neuron is adjusted according to the memberships of all patterns to the neuron. In FSOM some network parameters related to the neighborhood in SOM are replaced with the membership function, and the learning rate parameter is omitted. FSOM considers all input data at each iteration step and is thus more effective at decreasing oscillations and avoiding dead units. The FSOM used here is a combination of SOM and the Fuzzy C means clustering algorithm [35]. In the ANN structure, each output neuron directly corresponds to a city in the network of cities. The number of output neurons used to describe the cities is generally arbitrary; however, if the number of neurons is equal to the number of cities, the problem gets simplified. The larger the number of neurons, the greater the accuracy of the model. The number of output neurons needed for good accuracy depends on the complexity of the problem: the more complex the problem, the more output neurons are required. The number of output neurons is selected manually. The weights $W$ connect the input vector components and the output neurons. The weight vectors are of the same dimension as the sample vectors. The weight components are initialized randomly and adjusted gradually using the self organizing learning algorithm, and ultimately a mapping is done from input to output [79]. Let $M$ denote the number of input patterns, $N$ the number of input vector components and $K$ the number of output neurons; then the learning algorithm [35] consists of the following steps:

1) Randomize weights for all neurons.

2) Input all patterns $X_l = \{X_{l1}, \ldots, X_{lN}\}$, $l = 1, \ldots, M$.

3) Take one random input pattern and calculate the euclidean distances from each pattern $X_l$ to all output neurons.

$$d_{lj}(t) = \sqrt{\sum_{i=1}^{N} \left(X_{li} - W_{ij}(t)\right)^2}, \qquad l = 1, \ldots, M,\; j = 1, \ldots, K. \qquad (5)$$

4) Compute memberships of each pattern to all neurons.

$$R_{lj}(t) = \frac{\{d_{lj}(t)\}^{-2}}{\sum_{m=1}^{K} \{d_{lm}(t)\}^{-2}}, \qquad l = 1, \ldots, M,\; j = 1, \ldots, K \qquad (6)$$

5) Find winning neuron and neighbors of winner.

6) Adjust synaptic weights of each neuron according to computed memberships.

7) $$W_{ij}(t+1) = W_{ij}(t) + \frac{\sum_{l=1}^{M} R_{lj}(t)\,\left(X_{li} - W_{ij}(t)\right)}{\sum_{l=1}^{M} R_{lj}(t)} \qquad (7)$$

8) Reduce the values of the parameters $\eta$ and $\lambda$.

9) Determine the stability condition of the network.

$$\max_{1 \le i \le N,\; 1 \le j \le K} \left\{\, \left|W_{ij}(t+1) - W_{ij}(t)\right| \,\right\} \le \varepsilon \qquad (8)$$

If the stability condition is satisfied or a predefined number of iterations is reached, the learning process terminates; otherwise go to Step 2 for another loop of learning. From the above learning procedure, it can be seen that FSOM eases the difficulty of selecting network parameters. The weights are adjusted only once in each learning loop, and the features of all input samples are taken into consideration each time the weights are adjusted, so learning speed and estimation accuracy are both greatly improved.

2.2.3 Heuristic solution for TSP by FSOM

The most interesting results of self-organization can be achieved in networks that have two dimensional input vectors and two dimensional neighborhoods [79]. In this case the input to the network consists of two values viz., x and y, which represent a point in two dimensional space. This kind of network can map two dimensional objects in such a way that a mesh covering the object is created. This process is illustrated in figure 2.5. Each example consists of six squares. The first one shows the object that should be learned. The second square illustrates the network just after randomization of all neural weights. The following squares describe the learning process. It is to be noted that each neuron or circle represents a point whose coordinates are equal to the neuron's weights. These figures illustrate that the Kohonen ANN is a powerful self-organizing and clustering tool. However, it is also possible to create a network with a one dimensional neighborhood and two dimensional inputs. The learning process of this network is shown in figure 2.6. It can be observed that this network tries to organize its neurons in such a way that a short route between all neurons emerges. These experiments were the stimulus to build a system based on a one-dimensional FSOM that would solve TSP problems.
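Before turning to the TSP mapping, a compact sketch of the batch learning loop of steps 1)-9) is given below. This is an illustrative Python fragment under simplifying assumptions: the winner/neighborhood bookkeeping of steps 5)-6) and the parameter decay of step 8) are omitted, leaving the pure Fuzzy C means style update of equations (5)-(8).

import numpy as np

def fsom_train(X, K, iters=500, eps=1e-5, seed=0):
    # X: M patterns x N components; K output neurons.
    rng = np.random.default_rng(seed)
    W = rng.random((K, X.shape[1]))                                # step 1
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)  # eq. (5)
        d = np.maximum(d, 1e-12)                                   # avoid division by zero
        R = d ** -2 / (d ** -2).sum(axis=1, keepdims=True)         # eq. (6)
        W_new = (R.T @ X) / R.sum(axis=0)[:, None]                 # eq. (7)
        if np.abs(W_new - W).max() <= eps:                         # eq. (8)
            return W_new
        W = W_new
    return W

# Usage: map 30 random planar cities onto 30 neurons.
cities = np.random.default_rng(1).random((30, 2))
print(fsom_train(cities, K=30).shape)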

Figure 2.5: Self-organization of network with two dimensional neighborhoods

Figure 2.6: Self-organization of network with one dimensional neighborhood

To solve the TSP problem, a one dimensional network is created [35]. If the weights of a neuron are equal to some city's coordinates, this neuron represents that city. In other words, a neuron and a city are assigned to each other and there is a one-to-one mapping between the set of cities and the set of neurons. All neurons are organized in a vector [1]. This vector represents the sequence of cities that must be visited. However, some modifications need to be made before FSOM is able to fully solve this problem, because the real-valued neural weights may never exactly equal the coordinates of the cities. The positions of cities and the positions of neurons may not coincide; however, adequate neural weights and city coordinates are very close to each other. Therefore, an algorithm is applied that modifies the neural weights so that they equal the city coordinates, restoring the one-to-one mapping assumed at the beginning: if neuron A is assigned to city B, the weights of neuron A are set equal to the coordinates of city B. After applying this algorithm, a good and fast solution is obtained; however, it is not locally optimal. Therefore, it needs to be optimized using the well known 2opt algorithm [108]. In this case 2opt works fast even for a large number of cities, because the current solution is already good. Usually 2opt does not change the solution a lot, as shown in figure 2.7 [23], [88]. The 2opt algorithm is based on one simple rule: select a part of the tour, reverse it and insert it back in the cycle. If the new tour is shorter than the original cycle, it replaces it. The algorithm stops when no improvement can be made. For example, if there is a cycle (A, B, C, D, E, F) and the path (B, C, D) is reversed, the new cycle is (A, D, C, B, E, F). After 2opt optimization the solution is locally optimal, as shown in figure 2.8 [23], [88]. The FSOM training parameters should be chosen according to the number of cities to achieve the best results. Empirically, good training parameters are found to be [88]: (i) for 200 cities: $\eta = 0.5$, $\Delta\eta = 0.9667$, $\Delta\lambda = 0.966$; (ii) for 700 cities: $\eta = 0.6$, $\Delta\eta = 0.9665$, $\Delta\lambda = 0.9664$; (iii) for 1200 cities: $\eta = 0.8$, $\Delta\eta = 0.9662$, $\Delta\lambda = 0.9666$. In every case the number of iterations is set to 25000.
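A minimal sketch of the 2opt improvement rule just described follows (illustrative Python; the distance matrix in the usage example is made up):

def two_opt(tour, d):
    # Repeatedly reverse a segment of the tour whenever swapping the
    # edges (a,b),(c,e) for (a,c),(b,e) shortens the cycle; stop at
    # a local optimum.
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                a, b = tour[i], tour[(i + 1) % n]
                c, e = tour[j], tour[(j + 1) % n]
                if a == e:
                    continue  # reversal would span the whole cycle
                if d[a][c] + d[b][e] < d[a][b] + d[c][e]:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour

# Usage: untangle a deliberately crossed 4-city tour.
d = [[0, 1, 4, 3], [1, 0, 2, 5], [4, 2, 0, 1], [3, 5, 1, 0]]
print(two_opt([0, 2, 1, 3], d))  # -> [0, 1, 2, 3]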

Figure 2.7: SOM solution without 2opt optimization (left). There are two local loops on the left. The first and last neurons can be seen in the middle; they are not connected in the picture but the distance between them is also computed. The same solution improved by 2opt (right). The loops on the left have been erased. Additional changes can be observed.

Figure 2.8: 2opt optimization. If there is a cycle (A, B, C, D, E, F) and the path (B, C, D) is reversed, then the new cycle is (A, D, C, B, E, F)

2.3 Fuzzy Integer Linear Programming Model

In this section, the FILP paradigm is presented based on a formulation of TSP [29]. The ILP [132] is reformulated with fuzzy constraints using fuzzy numbers. FILP problems have a certain lack of precision of a vague nature in their formulation. A technique is proposed to solve them with fuzzy numbers in the objective function, based on the representation theorem and a fuzzy number ranking method [11], [163], [166].

2.3.1 ILP Model of TSP

The prototype associated with TSP is the standard assignment prototype, and there is a one-to-one correspondence between TSP tours and extreme points. The modeling consists essentially of lifting this prototype into a higher dimension in such a way that the quadratic cost function of TSP is captured using a linear function [29]. To do this, the framework of the graph $G = (V, E)$ shown in figure 2.9 is used. Nodes in $V$ correspond to (city, travel stage) pairs $(i, s) \in (M, S)$ and arcs correspond to binary variables $x_{irj} = u_{ir} u_{j,r+1}$ ($(i, j) \in (M, M \setminus \{i\})$; $r \in R$). Clearly there is a one-to-one correspondence between perfect bipartite matching solutions of TSP and paths in this graph that simultaneously span the set of stages $S$ and the set of cities $M$. For simplicity of exposition such paths are referred to as city and stage spanning paths. Also, the set of all nodes of the graph that have a given city index in common is considered as a level of the graph, and the set of all nodes of the graph that have a given travel stage index in common as a stage of the graph. The idea behind this approach is to develop constraints for TSP which force the flow in graph $G$ to propagate along city and stage spanning paths [29]. The approach consists of developing a reformulation of the standard assignment prototype using variables that are functions of the flow variables associated with the arcs of graph $G$. The correspondence between the vertices of this model and TSP tours is achieved through the association of costs to the vertices of the model. This model can also be thought of as having layers of flow, like commodity flows linked through consistency requirement constraints, similar to capacity constraints in a multi-commodity flow context. However, the commodity flows in this framework do not necessarily originate from source nodes of the network; rather, some of the commodities are created within the network itself at intermediate nodes. The ILP formulation of TSP now follows [29], [132]. For $(i, j, u, v, k, t) \in M^6$ and $(p, r, s) \in R^3$ such that $r < p < s$, let $z_{irjupvkst}$ be the 0/1 binary variable that takes value 1 iff the flow on arc $(i, r, j)$ of graph $G$ subsequently flows on arcs $(u, p, v)$ and $(k, s, t)$ respectively. Similarly, for $(i, j, k, t) \in M^4$ and $(s, r) \in R^2$ such that $r < s$, let $y_{irjkst}$ be the binary variable that indicates whether the flow on arc $(i, r, j)$ subsequently flows on arc $(k, s, t)$, i.e., $y_{irjkst} = 1$, or not, i.e., $y_{irjkst} = 0$. Finally, $y_{irjirj}$ denotes the binary variable that indicates whether there is flow on arc $(i, r, j)$ or not. Then, with respect to the multi-commodity framework analogy, $y_{irjirj}$ is linked to a commodity that propagates onto the stages succeeding stage $r$ in the graph through the $y_{irjkst}$ ($s > r$) variables. Hence, given an instance of $(y, z)$, the term flow layer is used to refer to the sub-graph of $G$ induced by the arc $(i, r, j)$ corresponding to a given positive $y_{irjirj}$ and the arcs $(k, s, t)$ ($s \in R$, $s > r$) corresponding to the $y_{irjkst}$'s that are positive. Hence, the flow on arc $(i, r, j)$ also flows on arc $(k, s, t)$ for a given $s > r$ iff arc $(k, s, t)$ belongs to the flow layer originating from arc $(i, r, j)$. Also, the flow on a given arc $(i, r, j)$ of graph $G$ visits a given level of the graph (say level $t$) if

$$\sum_{\substack{s \in R \\ s \ge r+1}} \; \sum_{k \in M \setminus \{i, j, t\}} y_{irjtsk} \;+\; \sum_{\substack{s \in R \\ s \le r-1}} \; \sum_{k \in M \setminus \{i, j, t\}} y_{tskirj} \;>\; 0 \qquad (9)$$

The logical constraints of the model are that (i) flow must be conserved; (ii) flow must be connected and (iii) flow layers must be consistent with one another. By consistency of flow layers is meant the requirement that any flow layer originating from a given arc $(i, r, j)$ with $r \ge 2$ must be a sub-graph of one or more flow layers originating from the set of arcs at any given stage preceding $r$. More specifically, consider the arc $(i, r, j)$ corresponding to a given positive component of $y$, i.e., $y_{irjirj} > 0$. For $s < r$ ($s \in R$), define $F_s(i, r, j) = \{(k, t) \in M^2 : y_{kstirj} > 0\}$. Then, by consistency of flow layers is meant the condition that the flow layer originating from arc $(i, r, j)$ must be a sub-graph of the union of the flow layers originating from the arcs comprising each of the $F_s(i, r, j)$'s respectively. These ideas are developed as follows [29], [132]:

1. Flow Conservation: Any flow through graph $G$ must be initiated at stage 1. Also, for $(i, j) \in M^2$, $r \in R$, $r \ge 2$, the flow on arc $(i, r, j)$ must be equal to the sum of the flows from stage 1 that propagate onto arc $(i, r, j)$.

$$\sum_{i \in M} \sum_{j \in M} y_{i1ji1j} = 1 \qquad (10)$$

$$y_{irjirj} - \sum_{u \in M} \sum_{v \in M} y_{u1virj} = 0, \qquad i, j \in M;\; r \in R,\; r \ge 2 \qquad (11)$$

2. Flow Connectivity: All flows must propagate through the graph, on to stage $n-1$, in a connected manner. Each flow layer must be a connected graph and must conserve flow.

$$\sum_{k \in M} y_{irjkst} - \sum_{k \in M} y_{irjt(s+1)k} = 0 \qquad (12)$$

3. Consistency of Flow Layers: For $p, s \in R$ ($1 < p < s$) and $(u, v, k, t) \in M^4$, the flow on $(u, p, v)$ subsequently flows onto $(k, s, t)$ iff for each $r < p$ ($r \in R$) there exists $(i, j) \in M^2$ such that the flow from $(i, r, j)$ propagates onto $(k, s, t)$ via $(u, p, v)$. This results in the following three types of constraints:

a) Layering Constraints A:

$$y_{irjupv} - \sum_{k \in M} \sum_{t \in M} z_{irjupvkst} = 0, \qquad i, j, u, v \in M;\; p, r, s \in R,\; 2 \le p \le n-3,\; r \le p-1,\; s \ge p+1 \qquad (13)$$

b) Layering Constraints B:

$$y_{irjkst} - \sum_{u \in M} \sum_{v \in M} z_{irjupvkst} = 0, \qquad i, j, k, t \in M;\; p, r, s \in R,\; 2 \le p \le n-3,\; r \le p-1,\; s \ge p+1 \qquad (14)$$

c) Layering Constraints C:

$$y_{upvkst} - \sum_{i \in M} \sum_{j \in M} z_{irjupvkst} = 0, \qquad u, v, k, t \in M;\; p, r, s \in R,\; 2 \le p \le n-3,\; r \le p-1,\; s \ge p+1 \qquad (15)$$

4. Visit Requirements: The flow within any layer must visit every level of graph $G$.

$$y_{u1vu1v} - \sum_{\substack{s \in R \\ s \ge 2}} \sum_{k \in M} y_{u1vkst} = 0, \qquad u, v \in M;\; t \in M \setminus \{u, v\} \qquad (16)$$

$$y_{u1virj} - \sum_{\substack{s \in R \\ s \le r-1}} \sum_{k \in M} z_{u1vtskirj} - \sum_{\substack{s \in R \\ s \ge r+1}} \sum_{k \in M} z_{u1virjkst} = 0, \qquad r \in R \setminus \{1\};\; u, v, i, j \in M;\; t \in M \setminus \{u, v, i, j\} \qquad (17)$$

5. Visit Restrictions: The flow must be connected with respect to the stages of graph $G$. There can be no flow between nodes belonging to the same level of the graph. No level of the graph can be visited at more than one stage and vice versa.

$$\sum_{\substack{(k,t) \in M^2 \\ (k,t) \neq (i,j)}} y_{irjkrt} \;+ \sum_{\substack{(k,t) \in (M \setminus \{j\}) \times M \\ (k, r+1, t) \notin E}} y_{irjk(r+1)t} \;+ \sum_{\substack{s \in R \\ s \neq r}} \sum_{k \in M} \sum_{t \in M} y_{irikst} \;+ \sum_{\substack{s \in R \\ s \ge r+1}} \sum_{k \in M} \left( y_{irjksi} + y_{irjisk} + y_{irjksj} \right) \;+ \sum_{\substack{s \in R \\ s \ge r+2}} \sum_{k \in M} y_{irjjsk} \;+ \sum_{\substack{s \in R \\ s \neq r}} \sum_{k \in M} \sum_{t \in M} y_{kstjrj} \;=\; 0, \qquad i, j \in M;\; r \in R \qquad (18)$$

2.3.2 FILP Model of TSP

Considering TSP as an FILP problem with imprecise costs, the problem is presented with imprecise coefficients in the objective function, i.e., with coefficients defined by fuzzy numbers. The problem can be written as [11], [29], [146]:

$$\text{Min } Z_{TSP} = \sum_{j \in N} \hat{c}_j x_j \qquad (19)$$

subject to

$$\sum_{j \in N} a_{ij} x_j \le b_i, \qquad i \in M,\; M = \{1, \ldots, m\} \qquad (20)$$

$$x_j \ge 0, \qquad j \in N,\; x_j \in \mathbb{N},\; N = \{1, \ldots, n\} \qquad (21)$$

where $a_{ij}, b_i \in \mathbb{R}$ are real coefficients and the costs in the objective function are fuzzy numbers, i.e., $\hat{c}_j \in F(\mathbb{R})$, $F(\mathbb{R})$ being the set of real fuzzy numbers, $i \in M$, $j \in N$. Thus, one has membership functions $\mu_j : \mathbb{R} \to [0, 1]$, $j \in N$, expressing the lack of precision on the values of the coefficients that the decision-maker has [166]. For each feasible solution, there is a fuzzy number obtained by means of the fuzzy objective function. Hence, in order to solve the problem, obtaining both the optimal solution and the corresponding fuzzy value of the objective, methods ranking the fuzzy numbers obtained from this function are considered. From this, two ways of solving $Z_{TSP}$ are approached. The first consists of the use of a well known fuzzy number ranking method, which provides a different auxiliary conventional model solving the former problem. The second approach explores the behavior of the representation theorem of Fuzzy sets when it is used as a tool to solve the problem.

Let X be the set of feasible alternatives of Z_{TSP} and g a function mapping the set of feasible alternatives of Z_{TSP} into the set of fuzzy numbers, g : X → F(R), g(x) = \hat{c}x = \sum_{j \in N} \hat{c}_j x_j, \hat{c}_j ∈ F(R), where the extended sum and the product by positive real numbers, defined in F(R) by means of Zadeh's extension principle, are considered [163]. Consider the set of fuzzy numbers A = {g(x) : x ∈ X}. Then x* ∈ X is said to be an optimal alternative if the fuzzy number g(x*) is the least in A. Hence the problem now is how to determine the least element of A. As stated earlier, this is tackled using a fuzzy number ranking method, defined by means of a ranking function and in particular a linear ranking function; this is not too restrictive because many well known fuzzy number ranking methods can be formulated using a linear ranking function in some way.

s 1

s2

(i , s )

s  n2

s  n 1

( j , s  1)

Figure 2.9: Illustration of network of TSP; Distance between arcs (i, s ) and ( j , s  1) is x isj Considering A, B  F ( R ) , a simple method of comparison between them consists of definition of certain function f : F ( R )  R . If function f () is known, then f ( A)  f ( B) , f ( A)  f ( B) , f ( A)  f ( B) are equivalent to A  B , A  B , A  B respectively. Usually f is called an linear

f ( A  B)  f ( A)  f ( B) and function if A, B  F ( R ) ; r  R, r  0 ; f (rA)  rf ( A) . As it is well known, from this definition several fuzzy numbers ranking method may be considered. To simplify, triangular fuzzy numbers is considered. They are denoted by cˆ j  (r j , c j , R j ) and their membership functions are supposed in the form [11], [29]: ranking

(u  r j ) /(c j  r j ); r j  u  c j  u  R, j  N ,  cj  ( R j  u ) /( R j  c j ); c j  u  R j  0; otherwise

(22)

Then the following result holds. Assume the linear expression \hat{y} = \sum_j \hat{c}_j x_j, in which the \hat{c}_j's are fuzzy numbers with membership functions like those given above and x_j ≥ 0, j ∈ N. Then the membership function of the fuzzy number \hat{y} is given by [11], [29]:

μ(z) =
  h(z) = (z - rx)/(cx - rx),   x ≠ 0, rx ≤ z ≤ cx
  g(z) = (Rx - z)/(Rx - cx),   x ≠ 0, cx ≤ z ≤ Rx    (23)
  0,                           otherwise

where r = (r_1, …, r_n), c = (c_1, …, c_n) and R = (R_1, …, R_n). Denoting d = R - c and d' = c - r, the quantities d·x and d'·x are the right and left lateral margins of the fuzzy number \hat{c}x. On applying different methods of ranking fuzzy numbers to Z_{TSP}, it is interesting to observe that its optimal solution is an optimal solution of a conventional programming problem with the same constraints and a non-fuzzy objective function. This non-fuzzy objective reflects, by means of the ranking function, the preference of the decision-maker. Consider a ranking function f mapping each Fuzzy set into the real line, f : F(R) → R; a solution of Z_{TSP} is found from [11], [29]:

\min f(\hat{c}x)    (24)

subject to Ax ≥ b, x_j ∈ \mathbb{N}, j ∈ N    (25)

Then, according to the ranking function f, different auxiliary models solving Z_{TSP} can be obtained. Clearly, if a linear ranking function is used then the auxiliary problem obtained above is the following ILP problem:

\min \{ \sum_{j \in N} f(\hat{c}_j) x_j : x ∈ X \}    (26)

Next the representation theorem of Fuzzy sets is used. Considering c ∈ R^n, c = (c_1, …, c_n), the membership function

μ(c) = \inf_{j \in N} μ_j(c_j)    (27)

defines a fuzzy objective which induces a fuzzy preorder in X. Consequently a fuzzy solution to Z_{TSP} can be found from the solution of the multi-objective parametric ILP problem [11], [29]:

\max \{ cx : c ∈ R^n, μ(c) ≥ 1 - α \}    (28)

Taking into account that μ(c) ≥ 1 - α ⟺ μ_j(c_j) ≥ 1 - α, j ∈ N, α ∈ [0, 1], one obtains μ_j(c_j) ≥ 1 - α ⟺ h_j^{-1}(1 - α) ≤ c_j ≤ g_j^{-1}(1 - α), j ∈ N, and denoting φ_j = h_j^{-1}, ψ_j = g_j^{-1}, j ∈ N, the problem stated above can be written as follows:

\max \{ cx : x ∈ X, φ(1 - α) ≤ c ≤ ψ(1 - α), α ∈ [0, 1] \}    (29)

where φ(·) = [φ_1(·), …, φ_n(·)] and ψ(·) = [ψ_1(·), …, ψ_n(·)]. Moreover, if Λ(1 - α), α ∈ [0, 1], denotes the set of vectors c ∈ R^n with all their components c_j in the interval [φ_j(1 - α), ψ_j(1 - α)], j ∈ N, the above expression is finally rewritten as [11], [29], [166]:

\max \{ cx : x ∈ X, c ∈ Λ(1 - α), α ∈ [0, 1] \}    (30)

which for each α ∈ [0, 1] is a multi-objective ILP problem, denoted M(α), having in its objective function costs that take values in the respective intervals. Different alternatives are considered here. The first is the resolution of all problems in the family {M(α), α ∈ [0, 1]}, where the fuzzy solution for z is obtained from the solution of the following multi-objective ILP problem [11], [29]:

\max (c^1 x, c^2 x, …)    (31)

subject to Ax ≥ b, x ≥ 0, c^k ∈ E(1 - α), α ∈ [0, 1], k = 1, 2, …, 2^n    (32)

where E(1 - α) is the subset of Λ(1 - α) constituted by the vectors whose j-th component equals either the upper or the lower bound of c_j, ψ_j(1 - α) or φ_j(1 - α), j ∈ N. On using interval arithmetic for solving linear programming problems with interval objective functions, the fuzzy solution for z is found from the parametric solution of the following bi-objective parametric problem P(α) [11], [29]:

\max z'(α) = (z^1(x, α), z^c(x, α))    (33)

subject to Ax ≥ b, x_j ∈ \mathbb{N}, j ∈ N, α ∈ [0, 1]    (34)

where z^1(x, α) and z^c(x, α), in the case of triangular fuzzy numbers, are defined by:

z^1(x, α) = \sum_{j=1}^{n} (c_j - α(c_j - r_j)) x_j    (35)

z^c(x, α) = \frac{1}{2} \sum_{j=1}^{n} (2c_j + α(R_j + r_j - 2c_j)) x_j    (36)
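For triangular costs \hat{c}_j = (r_j, c_j, R_j), the coefficients appearing in z^1(x, α) and z^c(x, α) of (35)-(36) are straightforward to tabulate; a small illustrative sketch in Python (the sample number (1, 3, 5) anticipates the example of section 2.5.2):

def z_coefficients(tri, alpha):
    # Coefficients of x_j in z^1(x, alpha) and z^c(x, alpha), eqs. (35)-(36).
    r, c, R = tri
    z1 = c - alpha * (c - r)
    zc = 0.5 * (2 * c + alpha * (R + r - 2 * c))
    return z1, zc

for alpha in (0.0, 0.5, 1.0):
    print(alpha, z_coefficients((1, 3, 5), alpha))
# alpha = 0 gives (3.0, 3.0), the modal value; alpha = 1 gives (1.0, 3.0).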

Now, in accordance with the representation theorem for fuzzy numbers, one defines [11], [29]:

\hat{S} = \bigcup_{α} α S(1 - α)    (37)

which is the Fuzzy set giving the fuzzy solution to the former problem, in which S(1 - α) is the set of solutions of the auxiliary problem considered, according to the above two approaches, for every α ∈ [0, 1]. Concretely, the decision-maker may be able to assign weights λ_k ∈ [0, 1], with \sum_k λ_k = 1, to each of the objectives taking part in the above two approaches; conventional parametric linear programming problems are then obtained. Assuming the above two systems, and considering λ = (λ_1, …, λ_t) and λ = (λ_1, λ_2) respectively, these problems are denoted M_λ(α) and P_λ(α). If the set of optimal points of these is defined as S_λ(1 - α), α ∈ [0, 1], then the fuzzy solution with weight λ is given by the Fuzzy set \hat{S}_λ = \bigcup_{α} α S_λ(1 - α).

2.4 Fuzzy Multi-Objective Linear Programming Model

The problem has been well studied [8], [29], [35], [69], [73], [78], [101] and has been solved with different approaches. As real life information is often available in the form of vague descriptions, fuzzy programming methods are designed to handle it and find optimal solutions to such problems. This enables emulation of the human reasoning process and decision making based on imprecise data. One such concept is FMOLP [45], [64], [103], which deals with flexible aspiration levels or goals and with fuzzy constraints with acceptable deviations. In TSP viewed as a multi-objective combinatorial optimization problem [155], [159], each objective function is represented in a distinct dimension. Deciding optimality in multi-objective TSP entails determining k-dimensional points that belong to the feasible solution space of the problem and possess minimum possible values in all dimensions. A permissible deviation from the specified value of each structural dimension must also be considered, because the salesman can face situations in which he is not able to achieve his objectives completely [62], [63]; there must be a set of alternatives from which he can select one that best meets his aspiration level. Furthermore, in TSP the salesman decides on an optimal and feasible route between any pair of cities on the basis of expected measures. This is true of most real world problems, because it is not possible to have all constraints and resources in exact form. An ideal solution method would solve every TSP instance to optimality, but this is not practical for most large problems [35], [103]. While advances have been made in solving TSP, the aspiration level of the decision maker must be met, under which the current optimal solution remains optimal and feasible. In this section a methodology is presented which deals with vague parameters and achieves a certain aspiration level of optimality for multi-objective TSP by transforming it into a linear program using FMOLP.

2.4.1 Multi-Objective Linear Programming

The first formal representation of the linear programming problem and efficient techniques for solving it were developed in [45]. The general linear programming model suffers from the limitation that it deals only with a single objective function and does not incorporate soft constraints. In 1974 the concept of multi-objective linear programming was introduced. A general linear multiple criteria decision making model is presented as follows [51], [85]:

\max z_i = \sum_{j=1}^{n} c_{ij} x_j,  i = 1, …, k    (38)

subject to:

\sum_{j=1}^{n} a_{ij} x_j ≤ b_i,  i = 1, …, m    (39)

where x is the unknown vector, x^T = [x_1, …, x_n], which maximizes k objective functions in n variables under m constraints. The parameters c_{ij}, a_{ij} and b_i are given crisp values. In a compact form, multiple objective problems can be represented by the following multi-objective linear programming model [51], [85]:

optimize Z = CX    (40)

subject to AX ≤ b    (41)

where Z = [z_1, …, z_k] is the objective function vector, C is a K × N matrix of constants, X is an N × 1 vector of decision variables, A is an M × N matrix of constants and b is an M × 1 constant vector.

2.4.2 Fuzzy Multi-Objective Linear Programming

The concept of decision making in a fuzzy environment involving several objectives was first proposed by Bellman and Zadeh [45], and in 1978 Zimmermann [166] applied their approach to the vector-maximum problem. Zimmermann transformed the fuzzy multi-objective linear programming problem into a classic single objective linear program. The adopted fuzzy model [103], [166] is given as follows:

maximize CX ≳ Z^0    (42)

subject to AX ≲ b    (43)

where Z^0 = [z_1^0, …, z_n^0] are goals or aspiration levels, and ≳ and ≲ are fuzzy inequalities, fuzzifications of ≥ and ≤ respectively. For measuring the satisfaction levels of objectives and constraints, [166] suggested the simplest form of membership function, given as follows:

μ_{1k}(C_k X) =
  0,                         if C_k X ≤ z_k^0 - t_k
  1 - (z_k^0 - C_k X)/t_k,   if z_k^0 - t_k ≤ C_k X ≤ z_k^0       k = 1, …, n    (44)
  1,                         if C_k X ≥ z_k^0

where t_k is the admissible violation for objective z_k, decided by the decision maker. This is the membership function for a maximizing objective; in the case of a minimization objective function, the fuzzy membership function is [103], [166]:

μ_{1k}(C_k X) =
  0,                         if C_k X ≥ z_k^0 + t_k
  1 - (C_k X - z_k^0)/t_k,   if z_k^0 ≤ C_k X ≤ z_k^0 + t_k       k = 1, …, n    (45)
  1,                         if C_k X ≤ z_k^0

Another class of fuzzy membership function suggested by [103], [166], μ_{2i}(a_i X) for the i-th constraint, is:

μ_{2i}(a_i X) =
  0,                       if a_i X ≥ b_i + d_i
  1 - (a_i X - b_i)/d_i,   if b_i ≤ a_i X ≤ b_i + d_i             i = 1, …, m    (46)
  1,                       if a_i X ≤ b_i

Here d_i is the admissible violation of the fuzzy resource b_i for the i-th constraint. These membership functions express the satisfaction of the decision maker with the solution, so they must be maximized. As a result, the objective function becomes [45], [51], [85]:

\max_X (μ_{11}(C_1 X), …, μ_{1k}(C_k X), μ_{21}(a_1 X), …, μ_{2m}(a_m X))    (47)

According to fuzzy set theory [45], the membership function of the intersection of any two or more sets is the minimum of the membership functions of these sets. Applying this, the objective function becomes:

\max_X \min(μ_{11}(C_1 X), …, μ_{1k}(C_k X), μ_{21}(a_1 X), …, μ_{2m}(a_m X))    (48)

From the above representation, the fuzzy program [51], [85], [103], [166] can be rewritten as follows:

maximize λ    (49)

subject to

λ ≤ 1 - (z_k^0 - C_k X)/t_k,  k = 1, …, n    (50)

λ ≤ 1 - (a_i X - b_i)/d_i,  i = 1, …, m    (51)

λ ≥ 0, X ≥ 0    (52)

where λ is the overall level of satisfaction of the fuzzy goals and constraints.

2.4.3 FMOLP Model of TSP

The objective of TSP is to determine an optimal order for traveling to all cities so that the total cost is minimized. Here the situation is considered in which the decision maker has to determine a solution with minimized cost, time and overall distance; individual objective functions can be formed for all these objectives. Let x_{ij} represent the link from city i to j [132], [148], with

x_{ij} = 1 if the tour goes from city i to city j, and x_{ij} = 0 otherwise    (53)

Let c_{ij} be the cost of traveling from city i to j; the overall cost of a particular route is the sum of the costs on the links comprising the route. Since the decision maker has to minimize the overall cost of traveling, a goal can be set for the total estimated cost of the entire route, denoted z_1^0. There can be situations when the estimated cost is not met, so the decision maker sets a tolerance on the estimated cost. Denoting the tolerance on this goal by t_1, the objective function for minimization of cost is given as follows [51], [103], [166]:

z_1: \min \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} x_{ij} ≲ z_1^0,  tolerance t_1    (54)

Let d_{ij} be the distance from city i to j, z_2^0 the corresponding aspiration level for the objective function for minimization of distance, and t_2 its tolerance; then the objective function takes the form [51], [103], [166]:

z_2: \min \sum_{i=1}^{n} \sum_{j=1}^{n} d_{ij} x_{ij} ≲ z_2^0,  tolerance t_2    (55)

Let t_{ij} be the time spent in traveling from city i to j, z_3^0 the corresponding aspiration level for the objective function for minimization of total time, and t_3 its tolerance. The objective function is written as follows [51], [103], [166]:

z_3: \min \sum_{i=1}^{n} \sum_{j=1}^{n} t_{ij} x_{ij} ≲ z_3^0,  tolerance t_3    (56)

One important aspect is the dependency of the objective functions on each other. Most of the time they are dependent, but determining the exact form of the dependency is a complex process; the proposed framework works in all cases provided some feasible solution exists. These multiple objective functions can be represented in the vector form of the earlier section, comprising multiple objectives with specified goals and tolerances, and membership functions can be set for the individual objective functions to check their level of acceptability [166]. A restriction is imposed in TSP that every city should be visited from exactly one of its neighboring cities and vice versa, i.e.

\sum_{i=1}^{n} x_{ij} = 1, ∀ j    (57)

\sum_{j=1}^{n} x_{ij} = 1, ∀ i    (58)

A route cannot be selected more than once, i.e.

x_{ij} + x_{ji} ≤ 1, ∀ i, j    (59)

and non-negativity constraints hold:

x_{ij} ≥ 0    (60)

These constraints can collectively be expressed in vector form, and fuzzy membership functions can be defined for all objective functions. Finally, a linear model is formulated from the TSP objective functions, constraints and corresponding membership functions using the fuzzy multi-objective linear model; it can be solved by mixed integer linear programming [166].

2.5 Experimental Results and Comparisons

In this section some experimental results are presented, conducted on well known data sets of varying dimension and size.

2.5.1 Simulation results of TSP using FSOM Technique

In the quest for a solution of TSP using FSOM [35], two types of tests were performed: (i) using city sets taken from TSPLIB, for which optimal solutions are already known, and (ii) using randomly chosen cities. TSPLIB city sets are hard to solve because in many cases the cities are not chosen randomly, as shown in figures 2.10 and 2.11. Generally, larger city sets consist of small patterns. The city set shown in figure 2.11 consists of two different patterns, each used nine times; thus the optimal tour is identical in each of these smaller patterns (figure 2.11, left). FSOM tries to figure out a unique tour in each of the smaller patterns (figure 2.11, right). The testing process using randomly chosen cities is more objective. It is based on the Held-Karp traveling salesman bound [35], [88]. An empirical relation for the expected tour length is

L  k nR

(61)

where L is expected tour length, n is number of cities, R is an area of square box on which cities are placed and k is an empirical constant. For n ≥ 100, the value of k is

k  0.70805 

0.52229 1.31572 3.07474   n n n n

(62)

The three random city sets viz., 200, 700, 1200 cities are used in this experiment, square box edge length being 500.
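A sketch of the estimate (61)-(62) in Python (the signs in (62) were partly lost above and are read here as in the standard Held-Karp regression; treat them as an assumption):

import math

def expected_tour_length(n, R):
    # Empirical Held-Karp tour-length estimate, eqs. (61)-(62), for n >= 100.
    k = (0.70805 + 0.52229 / math.sqrt(n)
         + 1.31572 / n - 3.07474 / (n * math.sqrt(n)))
    return k * math.sqrt(n * R)

for n in (200, 700, 1200):
    print(n, expected_tour_length(n, 500 * 500))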

Figure 2.10: Optimal tour length for the 225-city set taken from TSPLIB (left) is 3916; tour length generated by the FSOM-2opt hybrid (right) is 3899.

Figure 2.11: Optimal tour length for the 2392-city set taken from TSPLIB (left) is 378037; tour length generated by the FSOM-2opt hybrid (right) is 377946.

Instances | Optimum  | FSOM Avg | FSOM Best | FSOM Best Time | EA Avg | EA Best | EA Best Time | 2opt Avg | 2opt Best | 2opt Best Time
EIL51     | 426      | 435      | 428       | 0.024          | 428.2  | 426     | 10           | 537      | 524       | 1.44
EIL101    | 629      | 654      | 640       | 0.069          | 653.3  | 639     | 75           | 869      | 789       | 2.96
TSP225    | 3916     | 3909     | 3899      | 0.254          | -      | 4044    | 871          | -        | 4679      | 6.7
PCB442    | 50778    | 50635    | 50537     | 0.407          | -      | 55657   | 10395        | -        | 56686     | 12.37
PR1002    | 259045   | 259024   | 259010    | 1.999          | -      | 286908  | 25639        | -        | 292069    | 29
PR2392    | 378037   | 377969   | 377946    | 7.967          | -      | -       | -            | -        | -         | -
RAND200   | 3851.81  | 3844     | 3769      | 0.131          | 3931.4 | 3822    | 69.6         | 4344     | 4037      | 5.9
RAND700   | 8203.73  | 8199     | 8069      | 0.824          | -      | 9261    | 11145        | -        | 14116     | 17.8
RAND1200  | 11475.66 | 11469    | 11437     | 2.311          | -      | 12858   | 56456        | -        | 24199     | 37

Table 2.1: Comparison of the FSOM, EA and 2opt algorithms (times in seconds; "-" means not reported)

Instances | Optimum | Lin-Kernighan Average Result | Average Time
EIL51     | 426     | 427.4                        | 0.012
EIL101    | 629     | 640                          | 0.039
PCB442    | 50778   | 51776.5                      | 0.137
PR2392    | 378037  | 389413                       | 0.719

Table 2.2: Results for the Lin-Kernighan algorithm

All statistics for FSOM are generated over 75 runs on each city set. With the number of iterations set to 100 the average results did not show any considerable difference; better results are obtained on increasing the number of iterations. FSOM generates a tour in relatively short time: the 225-city set is solved in 254 ms and a 1000-city set in less than 2 seconds. The average tour lengths for city sets of up to 2000 cities are close to the reference optima (for the random instances, the Held-Karp estimate described above); FSOM thus generates solutions that are noticeably close to the optimal tour. FSOM has been compared with an EA. The EA used the enhanced edge recombination operator [108], steady-state survivor selection in which the worst solution is always replaced, and tournament parent selection with tournament size depending on the number of cities and the population size. Scramble mutation is used. The optimal mutation rate depends on the number of cities and the state of evolution; therefore a self-adapting mutation rate has been used. Every genotype has its own mutation rate, modified in a similar way as in evolution strategies. This adapts the mutation rate to the number of cities and the evolution state automatically, so there is no need to check manually which parameters are optimal for each city set. Evolution stops when the population converges. The population size was set to 1000 [98]; with smaller populations the EA did not work as well. When the EA stopped, its best solution was optimized by the 2opt algorithm. The results for FSOM, EA and the 2opt algorithm are shown in table 2.1. For the EA there are 20 runs of the algorithm for the sets EIL51, EIL101 and RAND200; for the other sets the EA was run twice. The optimum solutions for the instances taken from TSPLIB were already available, and the optimum solutions for the random instances are calculated from the empirical relation described above. The experiments show that the EA finds better solutions for instances with up to 100 cities: both average and best results are better than FSOM's, and for city sets with 50 cities or fewer the EA finds the optimum in every execution. The results for 225 cities are nearly comparable for both algorithms; however, for larger numbers of cities (442 and more) FSOM yields better solutions. With more cities the search space increases significantly and the EA needs a bigger population size. For TSP225 with a population size of 1000 the EA's result was 4044, but when the population was expanded to 3000 a tour of length 3949 was found, comparable to the FSOM solution. This underlines the fact that with an EA one can always expand the population size so that the algorithm has a greater chance of achieving good results; however, the algorithm is then much slower. It is also interesting to compare the FSOM algorithm to other, non-evolutionary approaches. One of the best TSP algorithms, and an appreciably fast one, is the Lin-Kernighan algorithm [35]. The algorithm was run 20 times on each city set; the average results and average times are shown in table 2.2, which indicates that Lin-Kernighan is comparable to FSOM. There is no considerable difference in time for the small 51-city instance (0.012 seconds for Lin-Kernighan against 0.024 seconds for FSOM); on the other hand, for the 2392-city instance Lin-Kernighan needed just 0.719 seconds while FSOM required almost 8 seconds. This is because FSOM is post-optimized by 2opt [108], which is the slowest part of the algorithm. When average results are compared with table 2.1, Lin-Kernighan is superior on the smaller instances, while the FSOM averages are better on the largest ones (PCB442 and PR2392); the gap between the two algorithms thus changes with the number of cities. FSOM was also used to generate the initial population for the EA. Such initialization takes only a fraction of the time needed for the EA to finish, because FSOM is a fast algorithm. In this case the EA converged much faster but in the end did not improve much on the best solution generated by FSOM alone. It appears that all initial solutions were very similar to each other, so population diversity was low and the EA lost its exploration abilities.

2.5.2 Simulation results of TSP using FILP Approach

The following problem illustrates the proposed FILP treatment of TSP [29]:

max imize z  cˆ1 x1  5x2 subject to 2 x1  x2  12

2 x1  8x2  35 x j  0, x j  N , j  N where cˆ  (1,3,5) Considering the following functions: 1 (1   )  3  2 , 1 (1   )  3  2 and associated interval parametric problem is as follows:

max imize z  c1 x1  5x2 subject to 2 x1  x2  12

2 x1  8x2  35 3  2  c1  (1   )  3  2 x j  0, x j  N , j  N ,   [0,1] From [ max imize z ' ( )  ( z 1 ( x,  ), z c ( x,  )) ] auxiliary multi-objective ILP problem is:

max imize {(3  2 ) x1  5x2 , (3  2 ) x1  5x2 } subject to 2 x1  x2  12

2 x1  8x2  35 3  2  c1  (1   )  3  2

x j  0, x j  N , j  N ,   [0,1] Next, above auxiliary multi-objective ILP problem for following weight vectors:

  (1,0) and   (0.5,0.5) For   (1,0) auxiliary parametric problem is:

max imize z  (3  2 ) x1  5x2 subject to 2 x1  x2  12

2 x1  8x2  35 x j  0, x j  N , j  N ,   [0,1] The optimal solution of which is as follows:

x( )  (7,2), z ( )  31  14 ,   [0,0.25] x( )  (5,3), z ( )  30  10 ,   [0.25,0.875] x( )  (1,4), z ( )  23  2 ,   [0.875,1] Sˆ  {(7,2) / 0.25, (5,3) / 0.875, (1,4) / 1} 

For λ = (0.5, 0.5):

maximize z = 3 x_1 + 5 x_2
subject to 2x_1 + x_2 ≤ 12
           2x_1 + 8x_2 ≤ 35
           x_j ≥ 0, x_j ∈ \mathbb{N}, j ∈ N, α ∈ [0, 1]

The corresponding optimal solution is x(α) = (7, 2), z(α) = 31, α ∈ [0, 1], so \hat{S}_λ = {(7, 2)/1}.

2.5.3 Simulation results of TSP using FMOLP Approach

The proposed FMOLP approach for TSP has been analyzed on a symmetric TSP in which the salesman starts from his home city 0, visits three cities exactly once and comes back to home city 0 by adopting the route with minimum cost, time and distance covered [71]. A map of the cities to be visited is shown in figure 2.12, and the cities are listed along with their cost, time and distance matrix in table 2.3, where the triplet (c, d, t) represents the cost, distance and time parameters for the corresponding pair of cities.

Figure 2.12: Symmetric Traveling Salesman Problem

Let x_{ij} be the decision variable for selection of the link (i, j) from city i to city j. The objective functions z_1, z_2, z_3 are formulated for cost, distance and time respectively. Their aspiration levels are set to 65, 16 and 11 by solving each objective function subject to the given TSP constraints, and their tolerances are set to 5, 2 and 1. The corresponding objective functions are as follows [166]:

z1  20x01  15x02  11x03  20x10  30x12  10x13  min

(63)

~

15x 20  30x 21  20x23  11x30  10x31  20x32  65 tolerence  t1  5

z1  5 x01  5 x02  3x03  5 x10  5 x12  3x13  min

(64)

~

5 x20  5 x21  10x23  3x30  3x31  10x32 16 tolerence  t 2  2 z1  4 x01  5 x02  2 x03  4 x10  3x12  3x13 

min

~

5 x20  3x21  2 x 23  2 x30  3x31  2 x32 11

(65)

city | 0         | 1         | 2          | 3
0    | (0,0,0)   | (20,5,4)  | (15,5,5)   | (11,3,2)
1    | (20,5,4)  | (0,0,0)   | (30,5,3)   | (10,3,3)
2    | (15,5,5)  | (30,5,3)  | (0,0,0)    | (20,10,2)
3    | (11,3,2)  | (10,3,3)  | (20,10,2)  | (0,0,0)

Table 2.3: Matrix of (cost, distance, time) for each pair of cities

The fuzzy membership functions for the cost, distance and time objective functions, based on the above equations, are:

μ(z_1) =
  0,                 if z_1 ≥ 70
  1 - (z_1 - 65)/5,  if 65 ≤ z_1 ≤ 70    (66)
  1,                 if z_1 ≤ 65

μ(z_2) =
  0,                 if z_2 ≥ 18
  1 - (z_2 - 16)/2,  if 16 ≤ z_2 ≤ 18    (67)
  1,                 if z_2 ≤ 16

μ(z_3) =
  0,                 if z_3 ≥ 12
  1 - (z_3 - 11)/1,  if 11 ≤ z_3 ≤ 12    (68)
  1,                 if z_3 ≤ 11
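These satisfaction degrees are easy to evaluate for a candidate route; a short Python sketch checking the route 0-3-1-2-0 against (66)-(67), with z_3 omitted as in the first row of table 2.4:

def mu(z, goal, tol):
    # Satisfaction degree of the form (66)-(68).
    if z >= goal + tol:
        return 0.0
    return min(1.0, 1.0 - (z - goal) / tol)

z1, z2 = 66, 16          # cost and distance of tour 0-3-1-2-0 (table 2.3)
lam = min(mu(z1, 65, 5), mu(z2, 16, 2))
print(lam)               # 0.8, matching the first row of table 2.4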

The fuzzy multi-objective linear program with the max-min approach is given as follows:

maximize λ    (69)

subject to

λ ≤ 1 - (z_1 - 65)/5
λ ≤ 1 - (z_2 - 16)/2    (70)
λ ≤ 1 - (z_3 - 11)/1

x_{01} + x_{02} + x_{03} = 1    (71)
x_{10} + x_{12} + x_{13} = 1    (72)
x_{20} + x_{21} + x_{23} = 1    (73)
x_{30} + x_{31} + x_{32} = 1    (74)
x_{10} + x_{20} + x_{30} = 1    (75)
x_{01} + x_{21} + x_{31} = 1    (76)
x_{02} + x_{12} + x_{32} = 1    (77)
x_{03} + x_{13} + x_{23} = 1    (78)
x_{01} + x_{10} ≤ 1    (79)
x_{02} + x_{20} ≤ 1    (80)
x_{03} + x_{30} ≤ 1    (81)
x_{12} + x_{21} ≤ 1    (82)
x_{13} + x_{31} ≤ 1    (83)
x_{23} + x_{32} ≤ 1    (84)

λ ≥ 0, x_{ij} ≥ 0    (85)
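Since the instance has only three cities besides the depot, the program (69)-(85) can be cross-checked by enumerating all tours; a sketch (pure enumeration rather than the MILP solver used in the experiments):

from itertools import permutations

C = {(0, 1): 20, (0, 2): 15, (0, 3): 11, (1, 2): 30, (1, 3): 10, (2, 3): 20}
D = {(0, 1): 5, (0, 2): 5, (0, 3): 3, (1, 2): 5, (1, 3): 3, (2, 3): 10}

def cost(M, tour):
    # Total weight of the closed tour 0 -> tour -> 0 under symmetric matrix M.
    edges = zip((0,) + tour, tour + (0,))
    return sum(M[(min(i, j), max(i, j))] for i, j in edges)

def mu(z, goal, tol):
    return 0.0 if z >= goal + tol else min(1.0, 1.0 - (z - goal) / tol)

best = max((min(mu(cost(C, t), 65, 5), mu(cost(D, t), 16, 2)), t)
           for t in permutations([1, 2, 3]))
print(best)   # (0.8, (3, 1, 2)): route 0-3-1-2-0, as in table 2.4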

Solution | z_1^0, t_1 | z_2^0, t_2 | z_3^0, t_3 | λ    | Route
1        | 65, 5      | 16, 2      | -          | 0.80 | (x_{03}, x_{31}, x_{12}, x_{20})
2        | 65, 5      | 16, 2      | 11, 1      | -    | no feasible solution
2        | 65, 5      | 16, 2      | 11, 4      | 0.55 | (x_{03}, x_{31}, x_{12}, x_{20})
3        | 65, 5      | 16, 2      | 11, 5      | 0.62 | (x_{03}, x_{31}, x_{12}, x_{20})

Table 2.4: Solutions of the fuzzy multi-objective linear program

The above fuzzy linear program and its variants were solved using MATLAB. As given in table 2.4, when only z_1 and z_2 are considered and z_3 is omitted, an optimal route with λ = 0.80 is obtained. When z_3 is also considered, the problem becomes infeasible at these tolerances. On relaxing the tolerance on z_3 to 4, the problem becomes feasible and the optimal path is achieved with λ = 0.55; on increasing the tolerance on z_3 from 4 to 5, an optimal solution with λ = 0.62 is obtained. These results show that by adjusting the tolerances an optimal solution to the multi-criteria TSP [166] can be determined.

2.6 Conclusion

In this chapter, TSP is studied using the FSOM, FILP and FMOLP approaches. Experimental results indicate that the FSOM-2opt hybrid algorithm generates appreciably better results than both the EA and the Lin-Kernighan algorithm as the number of cities increases. Some FSOM parameters can be optimized further; experiments with other self organizing networks should be performed, and a Gaussian neighborhood and a conscience mechanism could be applied to improve the TSP solutions generated by the ANN. Optimization algorithms other than the 2opt algorithm may also be used, which could give better results. There are many algorithms that solve permutation problems, and EAs have many different operators that work with permutations; enhanced edge recombination is one of the best operators for TSP. However, it has been shown that other permutation operators, which are worse than enhanced edge recombination for TSP, are actually better for other permutation problems such as warehouse or shipping scheduling applications. Therefore, the FSOM-2opt hybrid might work even better for other permutation problems than for TSP. Next, the focus shifted to representing TSP as an ILP model, extended with fuzzy constraints and fuzzy numbers so that the problem is treated as an FILP problem with imprecise costs. The FILP formulation of TSP generates an optimal solution which is feasible in nature and also takes care of the impreciseness aspect. Finally, the symmetric version of TSP is considered as a fuzzy problem with vague decision parameters and solved using FMOLP, which deals with flexible aspiration levels or goals and fuzzy constraints with acceptable deviations. The multi-objective TSP exists in an uncertain or vague environment where route selection is done by exploiting these parameters; the tolerances introduced by the decision maker accommodate this vagueness. By adjusting these tolerances, a range of solutions with different aspiration levels is found, from which the decision maker can choose the one that best meets his satisfaction level within the given domain of tolerances. FMOLP can be effective in achieving k-dimensional points according to the aspiration level of the decision maker in a multi-dimensional solution space. The FMOLP paradigm for TSP is rich enough to direct advanced research in Operations Research and Soft Computing. There is definite potential for work on methods to solve TSP with vague descriptions of resources; for efficient results some heuristics are required, and the relative dependencies among the objective functions could also be determined.

Chapter 3

Modeling various aspects of Transportation Problem

3.1 Introduction

The transportation problem is a special category of linear programming problem that deals with the transportation of a product manufactured at different plants or supply sources to a number of different warehouses or demand destinations. It has been widely studied in Logistics and Operations Management, where the distribution of goods and commodities from sources to destinations is an important issue [40], [83], [91], [94], [100], [146], [153]. Much of the work on the transportation problem is motivated by real life applications. The objective is to satisfy the destination requirements within the sources' capacity constraints at minimum transportation cost [71], [115]. It is an optimization problem which has been applied to various NP-hard problems. The computational procedure is an adaptation of the simplex method applied to the system of equations of the associated linear programming problem. The distributor's decisions can be optimized by reformulating the distribution problem as a generalization of the classical transportation problem. The conventional transportation problem can be represented as a mathematical structure [91], [94] comprising an objective function subject to certain constraints. In the classical approach, the costs of transporting from m sources or factories to n destinations or warehouses are to be minimized. Considering m sources S_1, …, S_m and n destinations D_1, …, D_n, the problem can be represented diagrammatically as in figure 3.1. The structure which represents the transportation model in a rectangular array is called the transportation table [148].

                          DESTINATIONS
              D1    D2    …     Dn    | a_i (supply)
SOURCES   S1  c11   c12   …     c1n   | a1
          S2  c21   c22   …     c2n   | a2
          …   …     …     …     …     | …
          Sm  cm1   cm2   …     cmn   | am
b_j (demand)  b1    b2    …     bn    |

Figure 3.1: General representation of the Transportation Problem

Let a_i > 0, i = 1, …, m, be the supply amount available at the i-th source S_i, and let the demand amount required at the j-th destination D_j be b_j > 0, j = 1, …, n. The cost of transporting one unit of commodity from the i-th source to the j-th destination is c_{ij} ≥ 0, i = 1, …, m; j = 1, …, n, which corresponds to the (i, j)-th cell at the intersection of the i-th row and the j-th column. In the transportation table, each of the mn cells corresponds to a variable; each row corresponds to one of m constraints called row constraints and each column corresponds to one of n constraints called column constraints. If x_{ij} ≥ 0 is the decision variable giving the amount of commodity to be transported from the i-th source to the j-th destination, then the problem is to determine the x_{ij} so as to minimize [94], [148]:

z = \sum_{i=1}^{m} \sum_{j=1}^{n} c_{ij} x_{ij}    (1)

subject to:

\sum_{j=1}^{n} x_{ij} = a_i,  i = 1, …, m    (2)

\sum_{i=1}^{m} x_{ij} = b_j,  j = 1, …, n    (3)

with x_{ij} ≥ 0 ∀ i, j. The cells allocated in the transportation table are called occupied cells and the empty cells are called unoccupied cells, with x_{ij} > 0 and x_{ij} = 0 respectively. The necessary and sufficient condition [94], [148] for the existence of a feasible solution to the transportation problem is that:

\sum_{i=1}^{m} a_i = \sum_{j=1}^{n} b_j    (4)

Equality in equation (4) guarantees that the problem is a balanced transportation problem and consistent in nature; otherwise the problem is unbalanced and inconsistent. The set of constraints given by equations (2) and (3) represents m + n equations in mn non-negative variables. Each variable x_{ij} appears in exactly two constraints, one associated with a source and the other with a destination; if the data are put in matrix form as represented in figure 3.1, the elements of the constraint matrix are either 0 or 1. In the transportation table, an ordered set of four or more cells is said to form a loop if the following conditions are satisfied [94], [148]: (i) any two adjacent cells in the ordered set lie in the same row or in the same column and (ii) no three or more adjacent cells in the ordered set lie in the same row or in the same column. A feasible solution to the transportation problem is basic iff the corresponding cells in the transportation table do not contain a loop. The initial basic feasible solution [94], [148] is generally obtained by applying one of the commonly used techniques: (i) the North-West Corner Rule (a brief sketch of which follows); (ii) the Matrix Minima Method; and (iii) Vogel's Approximation Method.
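An illustrative Python sketch of the North-West Corner Rule for a balanced problem (the data are invented):

def north_west_corner(supply, demand):
    # Initial basic feasible solution of a balanced transportation problem.
    assert sum(supply) == sum(demand), "problem must be balanced, eq. (4)"
    a, b = list(supply), list(demand)
    alloc = [[0] * len(b) for _ in a]
    i = j = 0
    while i < len(a) and j < len(b):
        q = min(a[i], b[j])      # allocate as much as possible at cell (i, j)
        alloc[i][j] = q
        a[i] -= q
        b[j] -= q
        if a[i] == 0:
            i += 1               # row exhausted: move down
        else:
            j += 1               # column exhausted: move right
    return alloc

print(north_west_corner([20, 30, 25], [10, 25, 18, 22]))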

However, in real life situations the available information is of imprecise nature and there is an inherent degree of vagueness or uncertainty present in the problem data under consideration. The term imprecision here signifies vagueness rather than lack of knowledge about the parameters present in the system. The uncertainty aspect is tackled using the concept of Fuzzy sets [163], which provides a strict mathematical framework in which vague conceptual phenomena can be precisely and rigorously studied, and which thus serves as an important decision making tool [166]. Using the concept of fuzzy membership functions, different effective solutions to the transportation problem were formulated [24], [153], [166]; the most commonly used membership function is the triangular one. Effective algorithms were worked out, but with the parameters of the task described in the form of real numbers. Nevertheless, such conditions are seldom or almost never fulfilled, because of the natural uncertainties encountered in real world situations; for example, it is hard to define a stable cost for a specific route. This problem was solved for the case of interval uncertainty of transporting costs [40]. The above mentioned work introduces a restriction in the form of the membership function, which allows transformation of the initial fuzzy linear programming problem into a usual linear programming problem by well defined analytic procedures. In practice, however, the membership functions describing the uncertain parameters of the model can have considerably complicated forms, and in such cases a numerical approach is needed. The main problem when constructing a numerical fuzzy optimization algorithm is the comparison of fuzzy values. To solve it, an approach based on the α-level representation of fuzzy numbers and on the probability estimation of the event that a given interval is greater than or equal to another interval is used [37]. The probabilistic approach is used only to infer a set of formulae for deterministic quantitative estimation of interval inequality or equality; the method allows comparison of an interval and a real number and implicitly takes into account the widths of the intervals being ordered. In this chapter, a comparative study of the transportation problem under probabilistic and fuzzy uncertainties is made [37]. The proposed approach accomplishes a direct fuzzy extension of the classical numerical simplex method. The experimental results obtained using the fuzzy and probabilistic approaches are compared. A simple special method for the transformation of frequency distributions into fuzzy numbers without loss of useful information is used to achieve comparability of uncertain initial data in the fuzzy and random cases; the fuzzy interval is realized using trapezoidal fuzzy numbers. Next, the constraints of the transportation problem are expressed as systems of linear equations, which are solved [32] using neuro-fuzzy networks [86] that can learn as well as allow prior knowledge to be embedded via fuzzy rules with appropriate linguistic labels, thanks to their inherent parallel architecture. The first phase consists of constructing an appropriate error cost function for the particular type of problem to be solved; the error cost function is based on error variables typically formulated from the functional network for the problem. The problem in general is thus represented by a structured multilayer ANN [79]. The multi-layer ANN has fuzzy rules embedded in it, giving rise to a neuro-fuzzy network: each node and link in a neuro-fuzzy network corresponds to a specific component in a fuzzy system, and a node is usually not fully connected to the nodes in an adjacent layer. The second phase is the optimization step, which involves deriving an appropriate learning rule [79] for the structured neuro-fuzzy network using the defined error cost function, typically by deriving the learning rule in its vector-matrix form.
Once the vector-matrix form of the learning rule is derived, the scalar form can be formulated in a relatively straightforward manner. The third phase involves training the neuro-fuzzy network, using the learning rule developed above, to match some set of desired patterns, i.e., input-output signal pairs; the training phase thus adjusts the network's synaptic weights according to the derived learning rule in order to minimize the associated error cost function. The nodes in different layers of the neuro-fuzzy network perform different operations. The fourth and final phase is the application phase, in which appropriate output signals are collected from the structured ANN for a particular set of inputs to solve the problem.

Finally, a solution to the balanced transportation problem is developed in terms of a revised simplex method computational procedure [146]. Though this problem can be solved by the simplex method, its special structure allows a simplified algorithm for its solution. The model is not representative of one particular situation but may arise in many physical settings. With this point of view the model is developed using trapezoidal fuzzy numbers [86]. The feasible region is closed, bounded and non-empty, which ensures the existence of an optimal solution to the balanced transportation problem. The cost values of each cell in the transportation table are represented in terms of trapezoidal fuzzy numbers, which allows handling of the uncertainty and vagueness involved. The initial basic solution of the transportation problem is obtained using the Fuzzy Vogel's Approximation Method (FVAM), and its optimality is tested through the Fuzzy Modified Distribution Method (FMODIM) [40]. This chapter is organized as follows. In section 3.2, the transportation problem is studied under probabilistic and fuzzy uncertainties. In the next section, the solution of the constraint equations of the transportation problem is obtained through a neuro-fuzzy approach. This is followed by the solution of the transportation problem using FVAM in section 3.4. Section 3.5 illustrates different experimental results and comparisons. Finally, conclusions are given in section 3.6.

3.2 Transportation Problem under Probabilistic and Fuzzy Uncertainties

The primary aim here is to minimize the costs involved in the transportation problem and to maximize profits under identical conditions [148]. There are m sources and n destinations; let a_i ≥ 0, i = 1, …, m, be the maximal quantities of goods that can be supplied by the sources and b_j ≥ 0, j = 1, …, n, be the maximal requirements of goods at the destinations. In accordance with signed contracts, the distributor must buy at least p_i units of goods at a price of t_i monetary units from each i-th source and sell at least q_j units of goods at a price of s_j monetary units to each j-th destination. The total transportation cost of delivering goods from the i-th source to the j-th destination is denoted c_{ij} [37]. There is a reduction in price k_i if a quantity of goods greater than the stipulated quantity p_i is purchased, and a reduction in price r_j if the quantity sold is greater than the contracted quantity q_j. The problem lies in finding the optimal quantities of goods x_{ij} (i = 1, …, m; j = 1, …, n) delivered from the i-th source to the j-th destination such that the total benefit D is maximized under the restrictions. Assuming all the above mentioned parameters are fuzzy, the resulting optimization problem can be formulated as follows [37], [141]:

\max D = \sum_{i=1}^{m} \sum_{j=1}^{n} z_{ij} x_{ij}    (5)

subject to:

\sum_{j=1}^{n} x_{ij} ≤ a_i,  i = 1, …, m    (6)

\sum_{i=1}^{m} x_{ij} ≤ b_j,  j = 1, …, n    (7)

\sum_{j=1}^{n} x_{ij} ≥ p_i,  i = 1, …, m    (8)

\sum_{i=1}^{m} x_{ij} ≥ q_j,  j = 1, …, n    (9)

where z_{ij} = r_j - k_i - c_{ij}, i = 1, …, m; j = 1, …, n, and D, z_{ij}, a, b, p, q are fuzzy values. To solve the problem specified by equations (5) to (9), a numerical method based on the α-cut representation of fuzzy numbers and a probabilistic approach to interval and fuzzy interval comparison is used [26], together with the direct fuzzy extension of the simplex method. To estimate the effectiveness of the method, the results of fuzzy optimization were compared with those obtained from equations (5) to (9) when all uncertain parameters were considered as normally distributed random values [26], and when the parameters were considered as real numbers. Generally, the problem lies in the different precisions of representation of the uncertain data [166]. For instance, one part of the parameters may be represented as trapezoidal fuzzy numbers on the basis of experts' opinions [37], while at the same time the other part may take the form of histograms or frequency distributions of considerably complicated form obtained from statistical analyses. In such cases, the correct approach is to transform all available uncertain data to the form of the lowest level of certainty; thus data represented in the form of frequency distributions or histograms are transformed into membership functions of fuzzy numbers. To present the initial data in fuzzy number form, a technique is applied which develops a membership function on the basis of a frequency distribution, if one exists, or directly uses a histogram. In the simplest case of normal frequency distributions, they can be exhaustively described by their averages m and standard deviations σ [75]; in more complicated situations it seems better to use histograms. Thus a numerical technique is used which transforms a frequency distribution or histogram into a trapezoidal fuzzy number. As an illustration, consider the reduction of a frequency distribution to a fuzzy number with the following algorithm [141]:

Step 1: On the interval between the smallest value x_min (x_min = 50) and the maximum value x_max (x_max = 151), the function F(x_i) is defined as the area under the curve given in figure 3.2 from x_min to the current value x_i. As a result, the cumulative function shown in figure 3.3 is obtained; F(x_i) is the probability that x ≤ x_i.

Step 2: Using the obtained cumulative function F(x), four decision values of F(x_i) are given at

i = 0, 1, 2, 3, which define a mapping of F(x) onto x in such a way that they provide the upper and bottom α-levels of a trapezoidal fuzzy number. In the example of figure 3.3, the intervals [95, 105] and [78, 120] are in essence the 30% and 90% probability confidence intervals; as a result, the trapezoidal fuzzy interval represented by the quadruple [78, 95, 105, 120] is obtained. Evidently, the transformation accuracy depends only on the suitability and correctness of the upper and bottom confidence intervals chosen. It is worth noting that the main advantage of this method is that it can be used both when the initial data take the form of a frequency distribution function and when they take the form of a rough histogram; the method represents all uncertain data in a uniform way as trapezoidal fuzzy intervals [134]. The solution of the fuzzy programming problem in equations (5) to (9) is realized by representing all fuzzy numbers as sets of α-cuts; in effect, this reduces the fuzzy problem to a set of crisp interval optimization problems [133], [141]. The final solution is obtained numerically using the probabilistic approach to interval comparison. A standard Monte Carlo procedure was used for the realization of the probabilistic approach to the description of the uncertain parameters of the optimization problem in equations (5) to (9): for each randomly selected set of real-valued parameters, the optimization problem is solved as a linear programming problem.
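A Python sketch of Steps 1-2, assuming, as in the example above, that the 30% and 90% central confidence intervals of the empirical cumulative function fix the two α-levels of the trapezoid:

import numpy as np

def histogram_to_trapezoid(samples, inner=0.30, outer=0.90):
    # Steps 1-2: read the central `inner` and `outer` confidence intervals
    # off the empirical distribution as the alpha-levels [a, b, c, d].
    qs = [(1 - outer) / 2, (1 - inner) / 2, (1 + inner) / 2, (1 + outer) / 2]
    a, b, c, d = np.quantile(samples, qs)
    return [a, b, c, d]

rng = np.random.default_rng(0)
print(histogram_to_trapezoid(rng.normal(100, 13, 10_000)))
# roughly [78.6, 95.0, 105.0, 121.4], close to the quadruple of the example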

Figure 3.2: Frequency distributions to be transformed

Figure 3.3: The transformation of Cumulative function to Fuzzy number

3.3 Solution of constraint equations of Transportation Problem through Neuro-Fuzzy Approach

In this section, the constraints of the transportation problem are expressed as systems of linear equations, which are solved using neuro-fuzzy networks [86] that can learn as well as allow prior knowledge to be embedded via fuzzy rules [123]. The entire technique is represented in five phases, which optimize the network to minimize the associated error cost function, such that appropriate output signals are collected from the structured ANN [79] for a particular set of inputs. This allows the system of linear equations to be solved in a smaller number of iterations. The constraint equations (2) and (3) can be represented as the following set of linear algebraic equations with constant coefficients [75]:

a_{11} x_1 + … + a_{1n} x_n = b_1
a_{21} x_1 + … + a_{2n} x_n = b_2
   ⋮
a_{m1} x_1 + … + a_{mn} x_n = b_m    (10)

Here the vector X = [x_1, …, x_n]^T is the unknown vector, given the coefficients a_{ij}, i = 1, …, m, j = 1, …, n, and b_i, i = 1, …, m. Since in equations (2) and (3) the coefficients a_{ij} are either 0 or 1, the above system of equations can be written in the more compact vector-matrix form

AX = B    (11)

where A ∈ R^{m×n}, X ∈ R^{n×1}, B ∈ R^{m×1}, A is the m × n matrix of 0-1 coefficients and B = [b_1, …, b_m]^T.

Three cases can exist [76]: (a) if n = m, i.e., there are as many equations as unknowns, the system of equations is called exactly determined; (b) if n > m, i.e., there are more unknowns than equations, the system is called underdetermined; (c) if m > n, i.e., there are more equations than unknowns, the system is called overdetermined, which is the situation most often encountered in practice. An efficient way to solve such constraint systems of linear equations numerically is given by Gauss-Jordan elimination or by Cholesky decomposition [137]. For problems of the form of equation (11) where A is a singular matrix, A is decomposed into a product of three matrices in a process called singular value decomposition. The types of systems of major interest here are much more complex: large-scale over-determined and under-determined systems, ill-conditioned systems, and systems that carry uncertainty and impreciseness. These complex systems are handled using neuro-fuzzy techniques [86].

3.3.1 Neuro-Fuzzy Approach

The solution developed for equation (11) is based on a neuro-fuzzy approach that can learn as well as allow prior knowledge to be embedded via fuzzy rules with appropriate linguistic labels. Typically, a neuro-fuzzy network has five to six layers of nodes; the functionalities associated with the different layers include the following steps [86]:

1. Compute the matching degree to a fuzzy condition involving one variable.
2. Compute the matching degree to a conjunctive fuzzy condition involving multiple variables.
3. Compute the normalized matching degree.
4. Compute the conclusion inferred by a fuzzy rule.
5. Combine the conclusions of all fuzzy rules in the model.

The solution approach here is based on a fuzzy back-propagation learning rule [32].

3.3.2 Fuzzy Back-propagation Learning Rule

A fuzzy back-propagation learning rule [79] is developed to solve equation (11). Back-propagation learning is generally obtained by applying a conjugate gradient descent method to minimize the error between the network's output and the target output over the entire training set. In the conjugate gradient descent method [79], a set of direction vectors {d_0, …, d_{n-1}} is generated that are conjugate with respect to the matrix A, i.e., d_i^T A d_j = 0, i ≠ j; the conjugate direction vector at the k-th iteration of the iterative process is generated by adding to it the calculated current negative gradient vector of the objective function. The vector generated at the k-th iteration is applied to parameter identification of a Takagi-Sugeno-Kang (TSK) [86] fuzzy model whose antecedent membership functions are of Gaussian type. More specifically, in the forward pass, for a given input pattern the actual response of the model is computed directly from equation (16), and the effect from input to output is completed through a single propagation step. During this process the antecedent parameters m_{ij}, σ_{ij} and the consequents c_i, which amount to the weights of the ANN, are all fixed. In the backward pass, the error signal resulting from the difference between the actual output and the desired output is propagated backward, and the parameters m_{ij}, σ_{ij} and c_i are adjusted using the error correction rule; again the process is completed in a single propagation step. Denoting the error function at the k-th iteration as [99]:

J(k) = \frac{1}{2} (\hat{y}(k) - y(k))^2    (12)

the error correction rules for c_i, m_{ij} and σ_{ij} are given by the following expressions:

c_i(k) = c_i(k-1) - η_1 \left( \frac{\partial J(k)}{\partial c_i} \right)\Big|_{c_i = c_i(k-1)},  i = 1, …, M    (13)

m_{ij}(k) = m_{ij}(k-1) - η_2 \left( \frac{\partial J(k)}{\partial m_{ij}} \right)\Big|_{m_{ij} = m_{ij}(k-1)},  i = 1, …, M, j = 1, …, S    (14)

σ_{ij}(k) = σ_{ij}(k-1) - η_3 \left( \frac{\partial J(k)}{\partial σ_{ij}} \right)\Big|_{σ_{ij} = σ_{ij}(k-1)},  i = 1, …, M, j = 1, …, S    (15)

where η_1, η_2, η_3 are learning rate parameters. To show how the gradients ∂J(k)/∂c_i, ∂J(k)/∂m_{ij} and ∂J(k)/∂σ_{ij} are formed, \hat{y}(k), the actual output of the model at the k-th observation, is rewritten in its extended form as [99]:

\hat{y}(k) = \sum_{i=1}^{M} v_i(k) c_i    (16)

where

v_i(k) = \frac{\prod_{j=1}^{S} μ_{A_{ij}}(x_j(k))}{\sum_{i=1}^{M} \prod_{j=1}^{S} μ_{A_{ij}}(x_j(k))} = \frac{\prod_{j=1}^{S} \exp(-(x_j(k) - m_{ij})^2 / σ_{ij}^2)}{\sum_{i=1}^{M} \prod_{j=1}^{S} \exp(-(x_j(k) - m_{ij})^2 / σ_{ij}^2)}    (17)

Correspondingly the error signal becomes

e(k) = y(k) - \hat{y}(k) = y(k) - \sum_{i=1}^{M} v_i(k) c_i    (18)

Differentiating J(k) with respect to the parameters and using the chain rule,

\frac{\partial J(k)}{\partial c_i} = \frac{\partial J(k)}{\partial e(k)} \frac{\partial e(k)}{\partial c_i} = -2 e(k) v_i(k)    (19)

\frac{\partial J(k)}{\partial m_{ij}} = \frac{\partial J(k)}{\partial e(k)} \frac{\partial e(k)}{\partial v_i(k)} \frac{\partial v_i(k)}{\partial m_{ij}} = -2 e(k) v_i(k) \left[ c_i - \sum_{l=1}^{M} v_l(k) c_l \right] \frac{x_j(k) - m_{ij}}{σ_{ij}^2}    (20)

\frac{\partial J(k)}{\partial σ_{ij}} = \frac{\partial J(k)}{\partial e(k)} \frac{\partial e(k)}{\partial v_i(k)} \frac{\partial v_i(k)}{\partial σ_{ij}} = -2 e(k) v_i(k) \left[ c_i - \sum_{l=1}^{M} v_l(k) c_l \right] \frac{(x_j(k) - m_{ij})^2}{σ_{ij}^3}    (21)

(20) (21)

Substituting equations (19), (20) and (21) into equations (13), (14) and (15) respectively,

ci (k )  ci (k  1)  21e(k )vi (k ); i  1,......... ......, M

(22)

M

mij (k )  mij (k  1)  2 2 e(k )vi (k )[ci   vl (k )cl (k )][(( x j (k )  mij (k )) /  ij2 (k ))]; l 1

(23)

i  1,......... ......, M , j  1,......... ......., S M

 ij (k )   ij (k  1)  2 3 e(k )vi (k )[ci   vl (k )cl (k )][(( x j (k )  mij (k )) 2 /  ij3 (k ))]; l 1

(24)

i  1,......... ......, M , j  1,......... ......., S where, vi (k ) are computed by equation (17) with mij and  ij replaced by mij (k  1) and

 ij (k  1) respectively and e(k ) is computed by equation (18) with c i replaced by ci (k  1) . The last three equations make up the back-propagation algorithm for parameter estimation of TSK fuzzy models using gaussian antecedent membership functions. This is valid for purely quadratic case. However, it is important for generalizations of conjugate gradient method to non-quadratic problems. It is assumed that near solution point the problem is approximately quadratic. Here, polak-ribiere conjugate gradient method [19] based on line search methods for non-quadratic case is presented. The method is modified using fuzzy rules with appropriate linguistic labels. The objective is to minimize cost function E (x ) where x  R n1 and E is not necessarily quadratic function. 3.3.3 Polak-Ribiere conjugate gradient algorithm with Fuzzy rules Step 1: Set x 0 Step 2: Compute g 0   x E ( x0 )  E ( x) / x at x  x0 Step 3: Set d 0   g 0 Step 4: Compute x k 1  x k   k d k v k where, xk 1  min E ( xk   k d k vk )  0

S

and v k 

 j 1

M

Aij

x j (k )

S

  i 1 j 1

Aij

x j (k )

where,  k  g kT1 ( g k 1  g k ) / g kT g k Step 5: Compute g k 1   x E ( x k 1 ) Step 6: Compute d k 1   g k 1   k d k Step 4 through Step 6 are carried out for k  0,1,......... ....., n  1 Step 7: Replace x 0 by x n and go to Step1 Step 8: Continue until convergence is achieved; termination criterion could be || d k ||  (where,  is an appropriate predefined small number)

The restart feature in above algorithm (step 7) is important for cases where cost function is not quadratic. Polak-ribiere conjugate gradient algorithm with fuzzy rules is restated by search in steepest descent directions after each n iteration [32]. This feature of algorithm is important for global convergence because it cannot be guaranteed that directions that d k generate are descent directions. 3.3.4 Architecture of Neuro-Fuzzy network Modified polak-ribiere conjugate gradient algorithm with fuzzy rules gives the updated solution [32],

x k 1  x k   k d k v k where, xk 1  min E ( xk   k d k vk )  0

The vector d k is current direction vector and E () is cost function to be minimized. In this case cost function is given by,

E ( x)  1 || Ax  b || 2  1 ( Ax  b)T ( Ax  b) (25) 2 2 T Therefore, E ( xk  d k vk )  1 [ A( xk  d k vk )  b] [ A( xk  d k vk )  b] 2 E ( xk  d k vk )  1 [( xk  d k vk )T AT  bT ][ Ax k  Ad k vk  b] 2 1 E ( xk  d k vk )  [( xkT  d kT vkT ) AT  bT ][ Axk  Ad k vk  b] 2 E ( xk  d k vk )  1 [( xkT AT  d kT vkT AT  bT ][ Ax k  Ad k vk  b] 2 E( xk  d k vk )  1 [ xkT AT Axk  xkT AT Ad k vk  xkT AT b  d kT vk AT Axk 2 2 T T T   d k vk A Ad k vk  d kT vkT AT b  bT Axk  bT Ad k vk  bT b] (26) Computing gradient of E ( x k  d k v k ) in equation (26) with respect to  and equating result equal to zero,   E ( x k  d k v k )  E ( x k  d k v k ) / 

 E( xk  d k vk )  1 [ xkT AT Ad k vk  d kT vkT AT Axk  2d kT vkT AT Ad k vk  d kT vkT AT b  bT Ad k vk ] 2  E ( xk  d k vk )  1 [2d kT AT Axk vk  2d kT vkT AT Ad k vk  2d kT AT bvk ] 2   E ( x k  d k v k )  0 T T d k A Axk vk  d kT vkT AT Ad k vk  d kT AT bvk  0 (27) Solving for  from equation (27) and assuming    k gives,

k  

g kT vkT d k d kT vkT AT Ad k

(28)

Here, g k v k is gradient of E ( x k ) i.e., g k v k   x E ( x k ) or g k   x E ( x k ) / v k (29)

Thus, modified polak-ribiere conjugate gradient algorithm with fuzzy rules (with restart) for solving AX  B is summarized in following steps [32]: Algorithm Step 1: Set x 0 Step 2: Compute g 0  ( AT Ax0  AT b) / v0 Step 3: Set d 0   g 0 Step 4: Compute x k 1  x k   k d k v k where,  k  

g kT vkT d k d kT vkT AT Ad k

Step 5: Compute g k 1  ( AT Axk 1  AT b) / vk 1 Step 6: Compute d k 1   g k 1   k d k where  k  g kT1 ( g k 1  g k ) / g kT g k Step 4 through Step 6 are carried out for k  0,1,......... ....., n  1 Step 7: Replace x 0 by x n and go to Step1 Step 8: Continue until convergence is achieved; termination criterion could be || d k ||  (where,  is an appropriate predefined small number) 3.3.5 Complexity Analysis of modified Polak-Ribiere conjugate gradient Neuro-Fuzzy Algorithm In the context of analyzing computational complexity of modified polak-ribiere conjugate gradient algorithm with fuzzy rules for solving system of linear algebraic equations, the linear system is expressed as matrix equation AX  B in which each matrix belongs to field of real numbers R [53]. Further, three cases viz., exactly determined, underdetermined and overdetermined is considered Now, the central question is: How fast solution of system of linear equations is solved using modified polak-ribiere conjugate gradient algorithm with fuzzy rules? For exactly determined case in which number of equations is equal to number of unknowns, using above neuro-fuzzy algorithm, computational complexity is (n 2 ) which defines asymptotic tight bound. Again, considering underdetermined case in which number of equations is less than number of unknowns, typically such system has infinitely many solutions (if there are any). However, using above neuro-fuzzy algorithm computational complexity is O(n 2 ) which defines asymptotic upper bound. Finally, considering over-determined case in which number of equations exceeds number of unknowns, there may not exist any solution. Finding good approximate solutions to over-determined system of linear equations is an important problem. Here, neuro-fuzzy algorithm yields computational complexity ( n 2 ) which defines asymptotic lower bound. Thus, from this discussion it is concluded that proposed neuro-fuzzy algorithm provides improved results over other well-known techniques.

3.4 Proposed solution of Transportation Problem using FVAM

Here, the transportation problem is solved using the Fuzzy Vogel's Approximation Method, and the optimality of the solution is tested through the Fuzzy Modified Distribution Method [40]. The solution allows handling of the uncertainty and vagueness involved in the cost values of each cell of the transportation table.

3.4.1 Transportation Problem through Fuzzy Trapezoidal Numbers

For a transportation problem [40] having m sources and n destinations, consider that the ith source must supply the fuzzy quantity $A_i = [a_i^{(1)}, a_i^{(2)}, a_i^{(3)}, a_i^{(4)}] \succeq [-\delta, 0, 0, \delta]$, whereas the jth destination must receive the fuzzy quantity $B_j = [b_j^{(1)}, b_j^{(2)}, b_j^{(3)}, b_j^{(4)}] \succeq [-\delta, 0, 0, \delta]$. Let the fuzzy cost $C_{ij} = [c_{ij}^{(1)}, c_{ij}^{(2)}, c_{ij}^{(3)}, c_{ij}^{(4)}]$ of shipping a unit quantity from source i to destination j be known for all sources and destinations. The fuzzy quantities $A_i$, $B_j$ and $C_{ij}$ are represented using fuzzy trapezoidal numbers as given by the following membership function [86], [141]:

$$trapezoid(x; a, b, c, d) = \begin{cases} 0, & x \le a \\ (x - a)/(b - a), & a \le x \le b \\ 1, & b \le x \le c \\ (d - x)/(d - c), & c \le x \le d \\ 0, & x \ge d \end{cases} \quad (30)$$
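As a quick illustration (a hypothetical helper, not taken from the thesis), equation (30) translates directly into code:

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership function of equation (30)."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)   # rising edge
    if x <= c:
        return 1.0                 # plateau
    return (d - x) / (d - c)       # falling edge

# e.g. the fuzzy cost [12.95, 13, 13, 13.05] used in section 3.5.3
assert trapezoid(13.0, 12.95, 13, 13, 13.05) == 1.0
assert trapezoid(12.975, 12.95, 13, 13, 13.05) == 0.5
```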

It is possible to transport from any source to any destination such that all requirements are satisfied at the total minimum transportation cost. This scenario holds for the balanced transportation problem. In an unbalanced transportation problem [94], the sum of the availabilities or supplies of the sources is not equal to the sum of the demands at the destinations. To solve such a problem, the unbalanced problem is first converted into a balanced one in which the sums of demand and supply are made equal. For this, a fictitious or dummy source or destination is added that provides the required supply or demand respectively. The costs of transporting a unit from the fictitious origin, as well as the costs of transporting a unit to the fictitious destination, are taken as zero. This is equivalent to not transporting from the dummy source or to the dummy destination, with zero transportation cost. The mathematical formulation of the problem is as follows [40]. Let the decision variable $X_{ij} = [x_{ij}^{(1)}, x_{ij}^{(2)}, x_{ij}^{(3)}, x_{ij}^{(4)}]$, represented by a fuzzy trapezoidal number, be the number of units supplied from origin i to destination j. Then the problem can be written as follows [141]:

$$\text{Minimize } Z = \sum_{i=1}^{m} \sum_{j=1}^{n} C_{ij} X_{ij} \quad \text{(here, } Z = [z^{(1)}, z^{(2)}, z^{(3)}, z^{(4)}]\text{)} \quad (31)$$

subject to:

$$\sum_{j=1}^{n} X_{ij} \approx A_i, \quad i = 1, 2, \ldots, m \quad \text{(supply constraints)} \quad (32)$$

$$\sum_{i=1}^{m} X_{ij} \approx B_j, \quad j = 1, 2, \ldots, n \quad \text{(destination constraints)} \quad (33)$$

$$X_{ij} \succeq [-\delta, 0, 0, \delta] \quad \forall\, i, j \quad (34)$$

where $A_i \succeq [-\delta, 0, 0, \delta]$ and $B_j \succeq [-\delta, 0, 0, \delta]$ for all i, j.

It is obvious from constraint equations (32), (33) and (34) that every component $X_{ij}$ of the fuzzy feasible solution vector X is bounded, i.e.,

$$[-\delta, 0, 0, \delta] \preceq [x_{ij}^{(1)}, x_{ij}^{(2)}, x_{ij}^{(3)}, x_{ij}^{(4)}] \preceq \min([a_i^{(1)}, a_i^{(2)}, a_i^{(3)}, a_i^{(4)}], [b_j^{(1)}, b_j^{(2)}, b_j^{(3)}, b_j^{(4)}]) \quad (35)$$

Thus, the feasible region of the problem is closed, bounded and non-empty. Hence there always exists an optimal solution to the balanced transportation problem. Constraint equations (32) and (33) can be written in matrix form as

$$AX = B \quad (36)$$

with $X = (X_{11}, X_{12}, \ldots, X_{1n}, X_{21}, \ldots, X_{2n}, \ldots, X_{mn})^T$, $B = (A_1, A_2, \ldots, A_m, B_1, B_2, \ldots, B_n)^T$ and A the $(m + n) \times mn$ matrix [75], [76]:

$$A = \begin{bmatrix} \mathbf{1} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{1} & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{1} \\ I & I & \cdots & I \end{bmatrix} \quad (37)$$

Here, $\mathbf{1}$ is the $1 \times n$ matrix with all components equal to 1 and $I$ is the $n \times n$ identity matrix. Since the sum of the m equations in (32) equals the sum of the n equations in (33), the (m + n) rows of A are linearly dependent, so that rank(A) ≤ m + n − 1.
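The block structure of (37) can be generated mechanically. The sketch below (an illustration using NumPy, with a hypothetical function name) builds A for arbitrary m and n and confirms the rank bound just stated:

```python
import numpy as np

def transportation_matrix(m, n):
    """Builds the (m+n) x mn coefficient matrix of equation (37):
    the first m rows contain blocks of n ones (supply equations),
    the last n rows are m copies of the n x n identity (demand
    equations)."""
    top = np.kron(np.eye(m), np.ones((1, n)))   # supply rows
    bottom = np.tile(np.eye(n), (1, m))         # demand rows
    return np.vstack([top, bottom])

A = transportation_matrix(3, 4)
assert A.shape == (7, 12)
# the rows are linearly dependent: rank(A) = m + n - 1
assert np.linalg.matrix_rank(A) == 3 + 4 - 1
```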

3.4.2 Definitions regarding Fuzzy feasible solution of Transportation Problem

Some important definitions relevant for developing the feasible solution of the transportation problem using trapezoidal fuzzy numbers [40], [141] are given below:

(i) Fuzzy feasible solution: Any set of fuzzy non-negative allocations ($X_{ij} \succeq [-\delta, 0, 0, \delta]$, δ being a small positive number) which satisfies the row and column sums is a fuzzy feasible solution.

(ii) Fuzzy basic feasible solution: A fuzzy feasible solution is a fuzzy basic feasible solution if the number of non-negative allocations is at most (m + n − 1), where m is the number of rows and n is the number of columns in the transportation table.

(iii) Fuzzy non-degenerate basic feasible solution: Any fuzzy feasible solution to a transportation problem containing m origins and n destinations is said to be fuzzy non-degenerate if it contains exactly (m + n − 1) occupied cells.

(iv) Fuzzy degenerate basic feasible solution: If a fuzzy basic feasible solution contains fewer than (m + n − 1) non-negative allocations, it is said to be fuzzy degenerate.

3.4.3 Solution of Transportation Problem using Fuzzy Trapezoidal Numbers

Based on the definitions discussed above, the solution of the fuzzy transportation problem is generally obtained in two stages, viz. (i) an initial basic feasible solution and (ii) a test of optimality for the solution [40]. The initial basic feasible solution can be obtained using methods such as the North West Corner Rule, the Least Cost (Matrix Minima) Method or Vogel's Approximation Method (VAM) [94]. VAM is preferred over the other methods, since the initial basic feasible solution obtained by it is either optimal or very close to optimal. The uncertainty in the data is taken care of using fuzzy trapezoidal numbers, such that the Fuzzy Vogel's Approximation Method (FVAM) is evolved. The steps involved in FVAM for finding the fuzzy initial solution are briefly enumerated below [40], [141]:

Step 1: The penalty cost is found by taking the difference of the smallest and next smallest costs in each row and column.
Step 2: Among the penalties calculated in Step 1, the maximum penalty is chosen. If the maximum penalty occurs more than once, any one is chosen arbitrarily.
Step 3: In the row or column selected in Step 2, the cell having the least cost is considered. An allocation is made to this cell by taking the minimum of the supply and demand values.
Step 4: Finally, the row or column which is fully fuzzy exhausted is deleted. Considering the reduced transportation tables, Steps 1-3 are repeated until all requirements are fulfilled.

Once the fuzzy initial basic feasible solution has been obtained, the next step is to determine whether the solution is fuzzy optimum or not. The optimality test can be applied to any initial basic feasible solution of the transportation problem, provided the solution has exactly (m + n − 1) non-negative allocations, where m is the number of origins and n is the number of destinations; these allocations must also be in independent positions. To perform the optimality test, FMODIM is made use of. The various steps involved in FMODIM for performing the optimality test are given below [40], [141]:

Step 1: Find the fuzzy initial basic feasible solution of the fuzzy transportation problem using FVAM.

Step 2: Find a set of numbers $U_i = [u_i^{(1)}, u_i^{(2)}, u_i^{(3)}, u_i^{(4)}]$ and $V_j = [v_j^{(1)}, v_j^{(2)}, v_j^{(3)}, v_j^{(4)}]$ for each row and column satisfying $U_i\,(+)\,V_j \approx C_{ij}$ for each occupied cell. One starts by assigning a fuzzy zero, which may be [-0.05, 0, 0, 0.05], to the row or column having the maximum number of allocations. If the maximum number of allocations occurs more than once, any one is chosen arbitrarily.
Step 3: For each empty or unoccupied cell, find the sum of $U_i$ and $V_j$.
Step 4: Find the net evaluation value for each empty cell, given by $\Delta_{ij} = C_{ij}\,(-)\,(U_i\,(+)\,V_j)$, and write it in each such cell. This gives the optimality condition, which may be any of the following:
a) If all $\Delta_{ij} \succ [-\delta, 0, 0, \delta]$, the solution is fuzzy optimum and a unique fuzzy solution exists.
b) If some $\Delta_{ij} \approx [-\delta, 0, 0, \delta]$, the solution is fuzzy optimum but an alternate solution exists.
c) If at least one $\Delta_{ij} \prec [-\delta, 0, 0, \delta]$, the solution is not fuzzy optimum. In this case, the next step is performed to improve the total transportation cost.
Step 5: Select the empty cell having the most negative value of $\Delta_{ij}$. From this cell draw a closed path by drawing horizontal and vertical lines with the corner cells occupied. Assign positive and negative signs alternately and find the minimum allocation among the cells having a negative sign. This allocation is added to the allocations having a positive sign and subtracted from the allocations having a negative sign.
Step 6: Step 5 yields a better solution by making one or more occupied cells empty and one empty cell occupied. For this new set of basic feasible allocations, Steps 2-5 are repeated until an optimum basic feasible solution is obtained.
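For concreteness, here is a crisp sketch of the Vogel penalty logic of FVAM Steps 1-4 (function names are illustrative; the fuzzy version replaces ordinary arithmetic and comparisons with the trapezoidal interval operations above). With ties broken in favour of rows, it reproduces the allocations of the worked example in section 3.5.3.

```python
import numpy as np

def vam(cost, supply, demand):
    """Crisp sketch of Vogel's Approximation Method (Steps 1-4)."""
    cost = np.asarray(cost, dtype=float)
    supply, demand = list(map(float, supply)), list(map(float, demand))
    m, n = cost.shape
    alloc = np.zeros((m, n))
    live_r, live_c = set(range(m)), set(range(n))

    def penalty(vals):
        vals = sorted(vals)   # single-entry rows/columns: penalty = the cost
        return vals[1] - vals[0] if len(vals) > 1 else vals[0]

    while live_r and live_c:
        # Step 1: row and column penalties on the surviving cells
        rp = {i: penalty([cost[i][j] for j in live_c]) for i in live_r}
        cp = {j: penalty([cost[i][j] for i in live_r]) for j in live_c}
        # Step 2: pick the largest penalty (row wins ties here)
        i_max, j_max = max(rp, key=rp.get), max(cp, key=cp.get)
        if rp[i_max] >= cp[j_max]:
            i = i_max
            j = min(live_c, key=lambda j: cost[i][j])  # Step 3: least cost
        else:
            j = j_max
            i = min(live_r, key=lambda r: cost[r][j])
        q = min(supply[i], demand[j])                  # Step 3: allocate
        alloc[i][j] = q
        supply[i] -= q
        demand[j] -= q
        if supply[i] <= 1e-12:                         # Step 4: delete
            live_r.discard(i)
        if demand[j] <= 1e-12:
            live_c.discard(j)
    return alloc

# Crisp centres of the example in section 3.5.3
cost = [[13, 15, 16, 18], [20, 22, 11, 8], [19, 25, 17, 11]]
print(vam(cost, [280, 330, 400], [300, 250, 280, 180]))
```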

3.4.4 Computational Complexity of FVAM

The computational complexity of the transportation problem using FVAM with m origins and n destinations is investigated here [53]. Let the total computational time be T(m, n). The time to calculate the penalty values for the m rows is (n + 1)m, and for the n columns it is (m + 1)n. The time to search for the maximum penalty with the corresponding least cost is {m/(m + n)}n if the maximum penalty is found in a row, and {n/(m + n)}m if it is found in a column. The time required to obtain the feasible solution corresponding to the (m + n − 1) cell allocations is

$$\begin{cases} \dfrac{m}{(m+n)}\, n(m+n-1) \\[6pt] \dfrac{n}{(m+n)}\, m(m+n-1) \end{cases}$$

Hence the total computational time required is

$$T(m, n) = \begin{cases} (n+1)m + \dfrac{m}{(m+n)}\, n(m+n-1) \\[6pt] (m+1)n + \dfrac{n}{(m+n)}\, m(m+n-1) \end{cases}$$

Thus the total time complexity is $O(mn)$. The computational time grows as the values of m and n increase, as a result of which the problem becomes intractable in nature.

3.5 Experimental Results and Comparisons

In this section some experimental results are presented, conducted on examples taken from different real-life problems.

3.5.1 Simulation results of Transportation Problem under Probabilistic and Fuzzy Uncertainties

To compare the results of fuzzy programming with those obtained using the Monte Carlo method, all uncertain parameters were first resolved using Gaussian frequency distributions [37]. Their averages are presented in table 3.1. For simplicity, the standard deviation σ = 10 is considered.

a1 = 460    a2 = 460    a3 = 610
b1 = 410    b2 = 510    b3 = 610
p1 = 440    p2 = 440    p3 = 590
q1 = 390    q2 = 490    q3 = 590
t1 = 600    t2 = 491    t3 = 581
s1 = 1000   s2 = 1130   s3 = 1197
k1 = 590    k2 = 480    k3 = 570
r1 = 990    r2 = 1100   r3 = 1180
c11 = 100   c12 = 30    c13 = 100
c21 = 110   c22 = 36    c23 = 405
c31 = 120   c32 = 148   c33 = 11

Table 3.1: Average values of Gaussian distributions of uncertain parameters

Figure 3.4: Frequency distribution F and fuzzy number μ for optimized x11

Figure 3.5: Frequency distribution F and fuzzy number μ for optimized x12

Figure 3.6: Frequency distribution F and fuzzy number μ for optimized x22

Figure 3.7: Frequency distribution F and fuzzy number μ for optimized x33

Figure 3.8: Frequency distribution F and fuzzy number μ for optimized benefit D: 1 - Monte Carlo method for 10,000 random steps; 2 - Monte Carlo method for 100,000,000 random steps; 3 - fuzzy approach

The results obtained using the fuzzy optimization method and the Monte Carlo method, i.e., linear programming with real-valued but random parameters, are given in figures 3.4 to 3.8 for m = n = 3; the final frequency distributions F are drawn with dotted lines and the fuzzy numbers μ with continuous lines. It is easy to see that the fuzzy approach gives somewhat wider fuzzy intervals than the Monte Carlo method. Using the probabilistic method two extreme results are obtained, whereas the fuzzy approach always gives results without ambiguity. It is worth noting that the probabilistic method demands very many random steps to obtain a smooth frequency distribution of the resulting benefit D; as a result, this method is generally not used in practice.

3.5.2 Numerical Examples illustrating Solution of System of Linear Equations using Neuro-Fuzzy Approach

To verify the effectiveness of the neuro-fuzzy network approach, some examples are presented that solve exactly determined, underdetermined and overdetermined systems of equations [32].

Example 1: Consider the following exactly determined system of linear equations:

$$\begin{aligned} 10x_1 - 2x_2 - x_3 - x_4 &= 3 \\ -2x_1 + 10x_2 - x_3 - x_4 &= 15 \\ -x_1 - x_2 + 10x_3 - 2x_4 &= 27 \\ -x_1 - x_2 - 2x_3 + 10x_4 &= -9 \end{aligned}$$

The neuro-fuzzy conjugate gradient algorithm given above is used to solve this system for $X = [x_1, x_2, x_3, x_4]$. The initial zero condition is assumed for the unknown quantities. After the 3rd iteration the solution is $X = [1, 2, 3, 0]$. The same solution is obtained using the Gauss-Seidel and Jacobi methods, but only after a larger number of iterations.

Example 2: Consider the following exactly determined system of linear equations:

$$\begin{aligned} 20x_1 + x_2 - 2x_3 &= 17 \\ 3x_1 + 20x_2 - x_3 &= -18 \\ 2x_1 - 3x_2 + 20x_3 &= 25 \end{aligned}$$

The neuro-fuzzy conjugate gradient algorithm given above is used to solve this system for $X = [x_1, x_2, x_3]$. The zero initial condition is assumed for the unknown quantities. After the 4th iteration the solution is $X = [1, -1, 1]$. The same solution is obtained using the Jacobi method after 6 iterations.

Example 3: Consider the following underdetermined system of linear equations:

6x1  2x2  4x3  9x4  12x5  2x6  12x7  x9  12
8x1  10x2  x3  8x4  22x5  11x7  11x8  7x9  13
9x1  7x2  6x3  6x4  10x5  10x6  15x7  13x8  12x9  9
10x1  11x2  6x3  8x4  5x5  9x6  x7  3x8  5x9  0
2x1  x2  4x3  3x4  3x5  4x6  12x7  10x8  3x9  6

The neuro-fuzzy conjugate gradient algorithm given above is used to solve this system for $X = [x_1, x_2, \ldots, x_9]$. The initial zero condition is assumed for the unknown quantities. The proposed algorithm gives the following solution after the 4th iteration:

$X = [0.1886, 0.4444, 0.1066, 0.1450, 0.3418, 0.0678, 0.4396, 0.0186, 0.0060]$

The same solution is obtained using the singular value decomposition method.

Example 4: Consider the following overdetermined system of linear equations:

$$\begin{aligned} x_1 + 2x_2 + 3x_3 &= 14 \\ 3x_1 + 2x_2 + x_3 &= 10 \\ x_1 + x_2 + x_3 &= 6 \\ 2x_1 + 3x_2 - x_3 &= 5 \\ x_1 + x_2 &= 3 \end{aligned}$$

The neuro-fuzzy conjugate gradient algorithm solves this system for $X = [x_1, x_2, x_3]$. The zero initial condition is assumed for the unknown quantities. The proposed algorithm gives the solution $X = [1, 2, 3]$ after the 2nd iteration.

3.5.3 Simulation results of Transportation Problem using FVAM

A fuzzy transportation problem is considered here and its fuzzy initial basic feasible solution is obtained by FVAM. The fuzzy membership functions of the costs and allocations are then determined. Finally, the optimality of the solution obtained is tested using FMODIM [40]. The transportation problem considered consists of 3 origins and 4 destinations. The cost coefficients are denoted by trapezoidal fuzzy numbers. The corresponding availability (supply) and requirement (demand) vectors are also given in figure 3.9. In figure 3.9, $\sum_i A_i$ = [1009.85, 1010, 1010, 1010.15] (column sum) and $\sum_j B_j$ = [1009.80, 1010, 1010, 1010.20] (row sum). Since $\sum_i A_i$ and $\sum_j B_j$ are fuzzy equal, differing by the fuzzy zero [-0.35, 0, 0, 0.35], the given problem is balanced and there exists a fuzzy feasible solution to the problem; the balance check uses the elementary trapezoidal arithmetic sketched below.
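The fuzzy addition and subtraction used throughout the allocation steps that follow act component-wise on [a1, a2, a3, a4], with subtraction reversing the interval ends. A minimal sketch (hypothetical helper names) that reproduces the balance check above:

```python
from functools import reduce

def tadd(p, q):
    """Fuzzy addition: [a1+b1, a2+b2, a3+b3, a4+b4]."""
    return [p[k] + q[k] for k in range(4)]

def tsub(p, q):
    """Fuzzy subtraction: [a1-b4, a2-b3, a3-b2, a4-b1]."""
    return [p[0] - q[3], p[1] - q[2], p[2] - q[1], p[3] - q[0]]

supplies = [[279.95, 280, 280, 280.05], [329.95, 330, 330, 330.05],
            [399.95, 400, 400, 400.05]]
demands = [[299.95, 300, 300, 300.05], [249.95, 250, 250, 250.05],
           [279.95, 280, 280, 280.05], [179.95, 180, 180, 180.05]]

sum_a = reduce(tadd, supplies)  # [1009.85, 1010, 1010, 1010.15]
sum_b = reduce(tadd, demands)   # [1009.80, 1010, 1010, 1010.20]
# the difference is the fuzzy zero [-0.35, 0, 0, 0.35] (up to rounding)
print([round(t, 2) for t in tsub(sum_a, sum_b)])
```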

First, row and column penalties are found as the difference between the fuzzy least and the next fuzzy least cost in the corresponding rows and columns respectively. In the above problem the maximum penalty is [6.9, 7, 7, 7.1], corresponding to column D2. In this column the cell having the fuzzy least cost is (1, 2). To this cell the minimum of Supply A1 and Demand B2, i.e., min([279.95, 280, 280, 280.05], [249.95, 250, 250, 250.05]) = [249.95, 250, 250, 250.05], is allocated, as given in figure 3.10. This exhausts the second column by the fuzzy zero [-0.10, 0, 0, 0.10] and the supply is reduced to [279.95, 280, 280, 280.05] (-) [249.95, 250, 250, 250.05] = [29.90, 30, 30, 30.10]. The second column is then deleted from figure 3.10, giving the shrunken matrix in figure 3.11; as the second column is deleted, the penalty values also change.

              D1                       D2                       D3                       D4                       Supply (Ai)
O1            [12.95,13,13,13.05]      [14.95,15,15,15.05]      [15.95,16,16,16.05]      [17.95,18,18,18.05]      [279.95,280,280,280.05]
O2            [19.95,20,20,20.05]      [21.95,22,22,22.05]      [10.95,11,11,11.05]      [7.95,8,8,8.05]          [329.95,330,330,330.05]
O3            [18.95,19,19,19.05]      [24.95,25,25,25.05]      [16.95,17,17,17.05]      [10.95,11,11,11.05]      [399.95,400,400,400.05]
Demand (Bj)   [299.95,300,300,300.05]  [249.95,250,250,250.05]  [279.95,280,280,280.05]  [179.95,180,180,180.05]

Figure 3.9: Transportation Problem with 3 origins and 4 destinations

                D1                       D2                         D3                       D4                       Supply (Ai)              Row Penalty
O1              [12.95,13,13,13.05]      [14.95,15,15,15.05]        [15.95,16,16,16.05]      [17.95,18,18,18.05]      [279.95,280,280,280.05]  [1.9,2,2,2.1]
                                         ([249.95,250,250,250.05])
O2              [19.95,20,20,20.05]      [21.95,22,22,22.05]        [10.95,11,11,11.05]      [7.95,8,8,8.05]          [329.95,330,330,330.05]  [2.9,3,3,3.1]
O3              [18.95,19,19,19.05]      [24.95,25,25,25.05]        [16.95,17,17,17.05]      [10.95,11,11,11.05]      [399.95,400,400,400.05]  [5.9,6,6,6.1]
Demand (Bj)     [299.95,300,300,300.05]  [249.95,250,250,250.05]    [279.95,280,280,280.05]  [179.95,180,180,180.05]
                                         [-0.10,0,0,0.10]
Column Penalty  [5.9,6,6,6.1]            [6.9,7,7,7.1]              [4.9,5,5,5.1]            [2.9,3,3,3.1]

Figure 3.10: First allocation to Transportation Problem

In figure 3.11 the maximum value of the penalty is [5.9, 6, 6, 6.1], corresponding to both column D1 and row O3. Taking either the row or the column, say row O3, the cell having the least fuzzy cost is (3, 4). To this cell the minimum of Supply A3 and Demand B4, i.e., min([399.95, 400, 400, 400.05], [179.95, 180, 180, 180.05]) = [179.95, 180, 180, 180.05], is allocated, as given in figure 3.11.

This exhausts the fourth column by the fuzzy zero [-0.10, 0, 0, 0.10] and the supply is reduced to [399.95, 400, 400, 400.05] (-) [179.95, 180, 180, 180.05] = [219.90, 220, 220, 220.10]. The fourth column is then deleted from figure 3.11, giving the shrunken matrix in figure 3.12; the penalty values also change.

                D1                       D3                       D4                         Supply (Ai)              Row Penalty
O1              [12.95,13,13,13.05]      [15.95,16,16,16.05]      [17.95,18,18,18.05]        [29.90,30,30,30.10]      [2.9,3,3,3.1]
O2              [19.95,20,20,20.05]      [10.95,11,11,11.05]      [7.95,8,8,8.05]            [329.95,330,330,330.05]  [2.9,3,3,3.1]
O3              [18.95,19,19,19.05]      [16.95,17,17,17.05]      [10.95,11,11,11.05]        [399.95,400,400,400.05]  [5.9,6,6,6.1]
                                                                  ([179.95,180,180,180.05])
Demand (Bj)     [299.95,300,300,300.05]  [279.95,280,280,280.05]  [179.95,180,180,180.05]
                                                                  [-0.10,0,0,0.10]
Column Penalty  [5.9,6,6,6.1]            [4.9,5,5,5.1]            [2.9,3,3,3.1]

Figure 3.11: Second allocation to Transportation Problem

In figure 3.12 the maximum value of the penalty is [7.9, 8, 8, 8.1], corresponding to row O2. In this row the cell having the fuzzy least cost is (2, 3). To this cell the minimum of Supply A2 and Demand B3, i.e., min([329.95, 330, 330, 330.05], [279.95, 280, 280, 280.05]) = [279.95, 280, 280, 280.05], is allocated, as given in figure 3.12. This exhausts the third column by the fuzzy zero [-0.10, 0, 0, 0.10] and the supply is reduced to [329.95, 330, 330, 330.05] (-) [279.95, 280, 280, 280.05] = [49.90, 50, 50, 50.10]. The third column is then deleted from figure 3.12, giving the shrunken matrix in figure 3.13; the penalty values also change.

                D1                       D3                         Supply (Ai)              Row Penalty
O1              [12.95,13,13,13.05]      [15.95,16,16,16.05]        [29.90,30,30,30.10]      [2.9,3,3,3.1]
O2              [19.95,20,20,20.05]      [10.95,11,11,11.05]        [329.95,330,330,330.05]  [7.9,8,8,8.1]
                                         ([279.95,280,280,280.05])
O3              [18.95,19,19,19.05]      [16.95,17,17,17.05]        [219.90,220,220,220.10]  [1.9,2,2,2.1]
Demand (Bj)     [299.95,300,300,300.05]  [279.95,280,280,280.05]
                                         [-0.10,0,0,0.10]
Column Penalty  [5.9,6,6,6.1]            [4.9,5,5,5.1]

Figure 3.12: Third allocation to Transportation Problem

In figure 3.13 the maximum value of the penalty is [19.95, 20, 20, 20.05], corresponding to row O2. As there is only one element in this row, its cell (2, 1) is the fuzzy least cost cell and is considered for allocation. The minimum of Supply A2 and Demand B1, i.e., min([49.95, 50, 50, 50.05], [299.95, 300, 300, 300.05]) = [49.95, 50, 50, 50.05], is allocated, as given in figure 3.13. This exhausts the second row by the fuzzy zero [-0.10, 0, 0, 0.10] and the demand is reduced to [299.95, 300, 300, 300.05] (-) [49.95, 50, 50, 50.05] = [249.90, 250, 250, 250.10]. The second row is then deleted from figure 3.13, giving the shrunken matrix in figure 3.14; the penalty values also change.

                D1                       Supply (Ai)              Row Penalty
O1              [12.95,13,13,13.05]      [29.90,30,30,30.10]      [12.95,13,13,13.05]
O2              [19.95,20,20,20.05]      [49.95,50,50,50.05]      [19.95,20,20,20.05]
                ([49.95,50,50,50.05])    [-0.10,0,0,0.10]
O3              [18.95,19,19,19.05]      [219.90,220,220,220.10]  [18.95,19,19,19.05]
Demand (Bj)     [299.95,300,300,300.05]
Column Penalty  [5.9,6,6,6.1]

Figure 3.13: Fourth allocation to Transportation Problem

In figure 3.14 the maximum value of the penalty is [18.95, 19, 19, 19.05], corresponding to row O3. As there is only one element in this row, its cell (3, 1) is the fuzzy least cost cell and is considered for allocation. The minimum of Supply A3 and Demand B1, i.e., min([219.90, 220, 220, 220.10], [249.90, 250, 250, 250.10]) = [219.90, 220, 220, 220.10], is allocated, as given in figure 3.14. This exhausts the third row by the fuzzy zero [-0.10, 0, 0, 0.10] and the demand is reduced to [249.90, 250, 250, 250.10] (-) [219.90, 220, 220, 220.10] = [29.85, 30, 30, 30.15]. The third row is then deleted from figure 3.14, giving the shrunken matrix in figure 3.15; the penalty values also change. In figure 3.15 only one cell remains, corresponding to column D1 and row O1, and there is only one penalty value, [12.95, 13, 13, 13.05]. The cell having the fuzzy least cost is (1, 1). To this cell the minimum of Supply A1 and Demand B1, i.e., min([29.90, 30, 30, 30.10], [29.85, 30, 30, 30.15]) = [29.90, 30, 30, 30.10], is allocated, as given in figure 3.15. This exhausts the first row and first column by the fuzzy zero [-0.20, 0, 0, 0.20], and all requirements are fulfilled at the 6th allocation. The demand and supply differ by the fuzzy zero [-0.35, 0, 0, 0.35] (-) [-0.20, 0, 0, 0.20] = [-0.55, 0, 0, 0.55] due to fuzziness. Finally, the initial basic feasible solution is shown in figure 3.16.

                D1                         Supply (Ai)              Row Penalty
O1              [12.95,13,13,13.05]        [29.90,30,30,30.10]      [12.95,13,13,13.05]
O3              [18.95,19,19,19.05]        [219.90,220,220,220.10]  [18.95,19,19,19.05]
                ([219.90,220,220,220.10])  [-0.10,0,0,0.10]
Demand (Bj)     [249.90,250,250,250.10]
Column Penalty  [5.9,6,6,6.1]

Figure 3.14: Fifth allocation to Transportation Problem

There are 6 positive independent allocations, in agreement with m + n − 1 = 3 + 4 − 1 = 6. This ensures that the solution is a fuzzy non-degenerate basic feasible solution. The total transportation cost is

{C11 (×) X11} (+) {C12 (×) X12} (+) {C21 (×) X21} (+) {C23 (×) X23} (+) {C31 (×) X31} (+) {C34 (×) X34}
= {[12.95, 13, 13, 13.05] (×) [29.90, 30, 30, 30.10]} (+) {[14.95, 15, 15, 15.05] (×) [249.95, 250, 250, 250.05]} (+) {[19.95, 20, 20, 20.05] (×) [49.95, 50, 50, 50.05]} (+) {[10.95, 11, 11, 11.05] (×) [279.95, 280, 280, 280.05]} (+) {[18.95, 19, 19, 19.05] (×) [219.90, 220, 220, 220.10]} (+) {[10.95, 11, 11, 11.05] (×) [179.95, 180, 180, 180.05]}
= [14323.47, 14380, 14380, 14436.57]   (38)

Here, the Cij are the cost coefficients and the Xij are the allocations (i = 1, 2, 3; j = 1, 2, 3, 4).
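As a sketch (hypothetical helper names; for the positive trapezoidal data used here, component-wise endpoint products suffice for fuzzy multiplication), equation (38) can be verified directly:

```python
def tmul(p, q):
    """Fuzzy multiplication of non-negative trapezoidal numbers."""
    return [p[k] * q[k] for k in range(4)]

costs = [[12.95, 13, 13, 13.05], [14.95, 15, 15, 15.05],
         [19.95, 20, 20, 20.05], [10.95, 11, 11, 11.05],
         [18.95, 19, 19, 19.05], [10.95, 11, 11, 11.05]]
allocs = [[29.90, 30, 30, 30.10], [249.95, 250, 250, 250.05],
          [49.95, 50, 50, 50.05], [279.95, 280, 280, 280.05],
          [219.90, 220, 220, 220.10], [179.95, 180, 180, 180.05]]

total = [0.0] * 4
for c, x in zip(costs, allocs):
    term = tmul(c, x)
    total = [total[k] + term[k] for k in range(4)]
# reproduces [14323.47, 14380, 14380, 14436.57] of equation (38)
print([round(t, 2) for t in total])
```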

                D1                       Supply (Ai)            Row Penalty
O1              [12.95,13,13,13.05]      [29.90,30,30,30.10]    [12.95,13,13,13.05]
                ([29.90,30,30,30.10])    [-0.60,0,0,0.60]
Demand (Bj)     [29.85,30,30,30.15]
Column Penalty  [12.95,13,13,13.05]

Figure 3.15: Sixth allocation to Transportation Problem

3.5.3.2 Fuzzy membership functions of costs and allocations

Now the fuzzy membership functions of the Cij and Xij, and then their transportation costs, are presented [166].

$$\mu_{C_{11}}(x) = \begin{cases} (x - 12.95)/0.05, & 12.95 \le x \le 13 \\ 1, & x = 13 \\ (13.05 - x)/0.05, & 13 \le x \le 13.05 \\ 0, & \text{otherwise} \end{cases}$$

Let $C_{11}^{(\alpha)}$ be the interval of confidence for the level of presumption α ∈ [0, 1].

              D1                         D2                         D3                         D4                         Supply (Ai)
O1            [12.95,13,13,13.05]        [14.95,15,15,15.05]        [15.95,16,16,16.05]        [17.95,18,18,18.05]        [279.95,280,280,280.05]
              ([29.90,30,30,30.10])      ([249.95,250,250,250.05])
O2            [19.95,20,20,20.05]        [21.95,22,22,22.05]        [10.95,11,11,11.05]        [7.95,8,8,8.05]            [329.95,330,330,330.05]
              ([49.95,50,50,50.05])                                 ([279.95,280,280,280.05])
O3            [18.95,19,19,19.05]        [24.95,25,25,25.05]        [16.95,17,17,17.05]        [10.95,11,11,11.05]        [399.95,400,400,400.05]
              ([219.90,220,220,220.10])                                                        ([179.95,180,180,180.05])
Demand (Bj)   [299.95,300,300,300.05]    [249.95,250,250,250.05]    [279.95,280,280,280.05]    [179.95,180,180,180.05]

Figure 3.16: Final allocated matrix with corresponding allocation values

Thus, C11 = [ c1( ) , c 2( ) ] = [0.05 + 12.95, 13.05 – 0.05]

(39)

( x  29.90) / 0.10,29.90  x  30 1,30  x  30   X 11 ( x)   (30.10  x) / 0.10,30  x  30.10 0, otherwise Let X11 be interval of confidence for level of presumption,   [0, 1]. Thus, X11 = [ x1( ) , x 2( ) ] = [0.10 + 29.90, 30.10 – 0.10]

(40)

C11 () X11 = [(0.05 + 12.95) (0.10 + 29.90), (13.05 – 0.05) (30.10 – 0.10)] = [0.0052 + 2.79 + 387.205, 0.0052 – 2.81 + 392.805] (41) Similarly, other fuzzy membership functions can be written as follows:

( x  14.95) / 0.05,14.95  x  15 1,15  x  15   c12 ( x)   (15.05  x) / 0.05,15  x  15.05 0, otherwise C12 = [0.05 + 14.95, 15.05 – 0.05]

(42)

( x  249.95) / 0.05,249.95  x  250 1,250  x  250   X 12 ( x)   (250.05  x) / 0.05,250  x  250.05 0, otherwise X12 = [0.05 + 249.95, 250.05 – 0.05] (43) C12 () X12 = [(0.05 + 14.95) (0.05 + 249.95), (15.05 – 0.05) (250.05 – 0.05)] = [0.00252 + 13.245 + 3736.7525, 0.00252 – 13.255 + 3763.2525] (44)

( x  19.95) / 0.05,19.95  x  20 1,20  x  20   c 21 ( x)   (20.05  x) / 0.05,20  x  20.05 0, otherwise C21 = [0.05 + 19.95, 20.05 – 0.05] (45) ( x  49.95) / 0.05,49.95  x  50 1,50  x  50   X 21 ( x)   (50.05  x) / 0.05,50  x  50.05 0, otherwise

X21 = [0.05 + 49.95, 50.05 – 0.05] (46) C21 () X21 = [(0.05 + 19.95) (0.05 + 49.95), (20.05 – 0.05) (50.05 – 0.05)] = [0.00252 + 3.495 + 996.5025, 0.00252 – 3.505 + 1003.5025] (47) ( x  10.95) / 0.05,10.95  x  11 1,11  x  11   c 23 ( x)   (11.05  x) / 0.05,11  x  11.05 0, otherwise

C23 = [0.05 + 10.95, 11.05 – 0.05] (48) ( x  279.95) / 0.05,279.95  x  280 1,280  x  280   X 23 ( x)   (280.05  x) / 0.05,280  x  280.05 0, otherwise

X23 = [0.05 + 279.95, 280.05 – 0.05]

(49)

C23 () X23 = [(0.05 + 10.95) (0.05 + 279.95), (11.05 – 0.05) (280.05 – 0.05)] = [0.00252 + 14.545 + 3065.4525, 0.00252 – 14.555 + 3094.5525] (50) ( x  18.95) / 0.05,18.95  x  19 1,19  x  19   c31 ( x)   (19.05  x) / 0.05,19  x  19.05  0, otherwise

C31 = [0.05 + 18.95, 19.05 – 0.05] (51) ( x  219.90) / 0.10,219.90  x  220 1,220  x  220   X 31 ( x)   (220.10  x) / 0.10,220  x  220.10 0, otherwise

X31 = [0.10 + 219.90, 220.10 – 0.10] (52) C31 () X31 = [(0.05 + 18.95) (0.10 + 219.90), (19.05 – 0.05) (220.10 – 0.10)] = [0.0052 + 12.89 + 4167.105, 0.0052 – 12.91 + 4192.905] (53) ( x  10.95) / 0.05,10.95  x  11 1,11  x  11   c34 ( x)   (11.05  x) / 0.05,11  x  11.05  0, otherwise

C34 = [0.05 + 10.95, 11.05 – 0.05] (54)

( x  179.95) / 0.05,179.95  x  180 1,180  x  180   X 34 ( x)   (180.05  x) / 0.05,180  x  180.05 0, otherwise X34 = [0.05 + 179.95, 180.05 – 0.05] (55) C34 () X34 = [(0.05 + 10.95) (0.05 + 179.95), (11.05 – 0.05) (180.05 – 0.05)] = [0.00252 + 9.545 + 1970.4525, 0.00252 – 9.555 + 1989.5525] (56) Now, total transportation cost is given by: Cost = {C11 () X11} (+) {C12 () X12 } (+) {C21 () X21} (+) {C23 () X23} (+) {C31 () X31} (+) {C34 () X34 }= [0.022 + 56.51 + 14323.47, 0.022 – 53.78 + 14436.57] (57) The expression given in equation (51) is obtained using equations (41), (44), (47), (50), (53) and (56). From equation (57) following equations are solved whose roots  [0, 1]: 0.022 + 56.51 + 14323.47-x1 = 0

(58)

0.022 – 53.78 + 14436.57-x2 = 0

(59)

From equation (58),

  {56.51  ((56.51) 2  4  0.02  (14323.47  x1 ))}/(2  0.02) From equation (59),

  {53.78  ((53.78) 2  4  0.02  (14436.57  x2 ))}/(2  0.02) {56.51  ((56.51) 2  4  0.02  (14323.47  x ))} /(2  0.02),14323.47  x  14380 1  1,14380  x  14380  cos t ( x)   {53.78  ((53.78) 2  4  0.02  (14436.57  x 2 ))} /(2  0.02),14380  x  14436.57  0, otherwise which is the required fuzzy membership function of transportation cost.

To determine the fuzzy optimal solution for the above problem, FMODIM is used. The sets of numbers $U_i$ and $V_j$ are determined for each row and column of the matrix given in figure 3.16, with $U_i (+) V_j \approx C_{ij}$ for each occupied cell. The value of fuzzy zero is assigned to U1 = [-0.05, 0, 0, 0.05] arbitrarily, as all rows have an identical number of allocations. From the occupied cells,

V1 = C11 (-) U1 = [12.95, 13, 13, 13.05] (-) [-0.05, 0, 0, 0.05] = [12.90, 13, 13, 13.10]
V2 = C12 (-) U1 = [14.95, 15, 15, 15.05] (-) [-0.05, 0, 0, 0.05] = [14.90, 15, 15, 15.10]
U2 = C21 (-) V1 = [19.95, 20, 20, 20.05] (-) [12.90, 13, 13, 13.10] = [6.85, 7, 7, 7.15]
V3 = C23 (-) U2 = [10.95, 11, 11, 11.05] (-) [6.85, 7, 7, 7.15] = [3.80, 4, 4, 4.20]
U3 = C31 (-) V1 = [18.95, 19, 19, 19.05] (-) [12.90, 13, 13, 13.10] = [5.85, 6, 6, 6.15]
V4 = C34 (-) U3 = [10.95, 11, 11, 11.05] (-) [5.85, 6, 6, 6.15] = [4.80, 5, 5, 5.20]

Now the sums of $U_i$ and $V_j$ are calculated for each unoccupied cell; the values of $U_i (+) V_j$ are written below the $C_{ij}$ values of those cells:

U1 (+) V3 = [-0.05, 0, 0, 0.05] (+) [3.80, 4, 4, 4.20] = [3.75, 4, 4, 4.25]
U1 (+) V4 = [-0.05, 0, 0, 0.05] (+) [4.80, 5, 5, 5.20] = [4.75, 5, 5, 5.25]
U2 (+) V2 = [6.85, 7, 7, 7.15] (+) [14.90, 15, 15, 15.10] = [21.75, 22, 22, 22.25]
U2 (+) V4 = [6.85, 7, 7, 7.15] (+) [4.80, 5, 5, 5.20] = [11.65, 12, 12, 12.35]
U3 (+) V2 = [5.85, 6, 6, 6.15] (+) [14.90, 15, 15, 15.10] = [20.75, 21, 21, 21.25]
U3 (+) V3 = [5.85, 6, 6, 6.15] (+) [3.80, 4, 4, 4.20] = [9.65, 10, 10, 10.35]

Next, the net evaluations $\Delta_{ij} = C_{ij} (-) (U_i (+) V_j)$ are found for each unoccupied cell; the values of $\Delta_{ij}$ are written below the values of $U_i (+) V_j$:

Δ13 = C13 (-) (U1 (+) V3) = [15.95, 16, 16, 16.05] (-) ([-0.05, 0, 0, 0.05] (+) [3.80, 4, 4, 4.20]) = [11.80, 12, 12, 12.20]
Δ14 = C14 (-) (U1 (+) V4) = [17.95, 18, 18, 18.05] (-) ([-0.05, 0, 0, 0.05] (+) [4.75, 5, 5, 5.25]) = [12.75, 13, 13, 13.25]
Δ22 = C22 (-) (U2 (+) V2) = [21.95, 22, 22, 22.05] (-) ([6.85, 7, 7, 7.15] (+) [14.90, 15, 15, 15.10]) = [-0.20, 0, 0, 0.20]
Δ24 = C24 (-) (U2 (+) V4) = [7.95, 8, 8, 8.05] (-) ([6.85, 7, 7, 7.15] (+) [4.80, 5, 5, 5.20]) = [3.70, 4, 4, 4.30]
Δ32 = C32 (-) (U3 (+) V2) = [24.95, 25, 25, 25.05] (-) ([9.65, 10, 10, 10.35] (+) [14.90, 15, 15, 15.10]) = [-0.40, 0, 0, 0.40]
Δ33 = C33 (-) (U3 (+) V3) = [16.95, 17, 17, 17.05] (-) ([9.65, 10, 10, 10.35] (+) [3.80, 4, 4, 4.20]) = [2.50, 3, 3, 3.50]

The calculated values are given in figure 3.17. Here, since no $\Delta_{ij}$ is fuzzy negative, the solution obtained is optimal in nature, and an alternative solution exists, indicated by Δ32 = [-0.40, 0, 0, 0.40]. Therefore, the optimal allocations are the following:

X11 = [29.90, 30, 30, 30.10]; X12 = [249.95, 250, 250, 250.05]
X21 = [49.95, 50, 50, 50.05]; X23 = [279.95, 280, 280, 280.05]
X31 = [219.90, 220, 220, 220.10]; X34 = [179.95, 180, 180, 180.05]
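As a sketch of the dual-variable computation of Step 2 (crisp centres only; the fuzzy version subtracts trapezoids component-wise instead; function names are illustrative):

```python
def modi_duals(costs, occupied, m, n):
    """Compute U_i and V_j from the occupied cells, with U_0 = 0."""
    U, V = {0: 0.0}, {}
    changed = True
    while changed and (len(U) < m or len(V) < n):
        changed = False
        for (i, j) in occupied:
            if i in U and j not in V:
                V[j] = costs[(i, j)] - U[i]    # V_j = C_ij (-) U_i
                changed = True
            elif j in V and i not in U:
                U[i] = costs[(i, j)] - V[j]    # U_i = C_ij (-) V_j
                changed = True
    return U, V

occ = {(0, 0): 13, (0, 1): 15, (1, 0): 20, (1, 2): 11, (2, 0): 19, (2, 3): 11}
U, V = modi_duals(occ, occ.keys(), 3, 4)
print(U)  # {0: 0.0, 1: 7.0, 2: 6.0}, the centres of U1, U2, U3 above
print(V)  # {0: 13.0, 1: 15.0, 2: 4.0, 3: 5.0}, the centres of V1..V4
```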

The total optimum transportation cost = [14323.47, 14380, 14380, 14436.57]. The fuzzy membership functions for the optimality test can be obtained using a similar technique as in the case of the fuzzy initial basic feasible solution.

              D1                         D2                         D3                         D4                         Ui
O1            [12.95,13,13,13.05]        [14.95,15,15,15.05]        [15.95,16,16,16.05]        [17.95,18,18,18.05]        [-0.05,0,0,0.05]
              ([29.90,30,30,30.10])      ([249.95,250,250,250.05])  [3.75,4,4,4.25]            [4.75,5,5,5.25]
                                                                    [11.80,12,12,12.20]        [12.75,13,13,13.25]
O2            [19.95,20,20,20.05]        [21.95,22,22,22.05]        [10.95,11,11,11.05]        [7.95,8,8,8.05]            [6.85,7,7,7.15]
              ([49.95,50,50,50.05])      [21.75,22,22,22.25]        ([279.95,280,280,280.05])  [11.65,12,12,12.35]
                                         [-0.20,0,0,0.20]                                      [3.70,4,4,4.30]
O3            [18.95,19,19,19.05]        [24.95,25,25,25.05]        [16.95,17,17,17.05]        [10.95,11,11,11.05]        [5.85,6,6,6.15]
              ([219.90,220,220,220.10])  [20.75,21,21,21.25]        [9.65,10,10,10.35]         ([179.95,180,180,180.05])
                                         [-0.40,0,0,0.40]           [2.50,3,3,3.50]
Vj            [12.90,13,13,13.10]        [14.90,15,15,15.10]        [3.80,4,4,4.20]            [4.80,5,5,5.20]

Figure 3.17: Final allocated matrix with the sums of Ui and Vj values and cell evaluations for unoccupied cells (in each unoccupied cell the first value below the cost is Ui (+) Vj and the second is Δij; parenthesized values in occupied cells are allocations)

3.6 Conclusion

In this chapter, the transportation problem is studied under probabilistic and fuzzy uncertainties, and is solved using a fuzzy technique. The method is based on the α-level representation of fuzzy numbers and on probability estimation of the fact that a given interval is greater than or equal to another interval. The proposed method makes it possible to extend the simplex method to fuzzy numbers. Numerical results obtained using the fuzzy optimization method and the Monte Carlo method (linear programming with real-valued, random parameters) show that the fuzzy approach has considerable advantages in comparison with the Monte Carlo method, especially from a computational point of view. The results obtained can be improved considerably by hybridization of neuro-fuzzy or rough-fuzzy approaches.

Another important aspect dealt with here is the representation of the constraint equations of the transportation problem as a system of linear equations, solved through a neuro-fuzzy approach using the Polak-Ribiere conjugate gradient method for three cases, namely exactly determined, underdetermined and overdetermined systems. This is achieved using the fuzzy backpropagation learning rule. The neuro-fuzzy algorithm presented here has several advantages over conventional ANN algorithms. First, neuro-fuzzy algorithms are much simpler than others: the only algebraic operations required are addition and multiplication; inverses and other complex operations are not needed. Second, neuro-fuzzy algorithms retain most of the parallel and distributed attributes of conventional ANN algorithms. Third, the nodes and links in the neuro-fuzzy network are comprehensible. Fourth, the neuro-fuzzy network has more layers than a conventional ANN, which allows a greater degree of precision in the solution obtained. Finally, neuro-fuzzy algorithms are simple to implement. Various numerical examples are presented to demonstrate the effectiveness of the neuro-fuzzy algorithm developed, and a discussion of the computational complexity of the proposed neuro-fuzzy algorithm is also given.

Finally, the initial solution of the transportation problem is obtained using FVAM and the optimality of the solution is tested through FMODIM. The closed, bounded and non-empty feasible region of the transportation problem using fuzzy trapezoidal numbers ensures the existence of an optimal solution to the balanced transportation problem. The multi-valued nature of Fuzzy sets allows handling of the uncertainty and vagueness involved in the cost values of each cell of the transportation table. The fuzzification of the cost of the transportation problem is discussed with the help of a simulation example. The computational complexity involved in the problem is also discussed. The effectiveness of the solutions obtained can be greatly enhanced by incorporating GA along with fuzzy trapezoidal numbers, such that the computational complexity is greatly reduced.

Chapter 4
Decision Making and its Applications in Game Theory and Financial Investments

4.1 Introduction

In today's fast moving world the need for sound, rational decision making by business, industry and government is vividly and sometimes disquietingly apparent. A decision is a selection from two or more courses of action. Decision making can be regarded as an outcome of mental processes, basically cognitive in nature, leading to the selection of a course of action among several alternatives. Every decision making process produces a final choice [66]. The output can be an action or an opinion of choice. Decision making is vital for all categories of problems, which may be either long-range or short-range in nature; the problem may also lie at a relatively high or low level of managerial responsibility. Decision theory provides a rich set of concepts and techniques to aid the decision maker in dealing with complex decision problems. General decision theory is defined as follows [94]:
1. A process which results in the selection, from a set of alternative courses of action, of that course of action which is considered to meet the objectives of the decision problem more satisfactorily than the others, as judged by the decision maker.
2. The process of logical and quantitative analysis of all factors that influence the decision problem, assisting the decision maker in analyzing these problems with several courses of action and consequences.
The analysis inherent in decision theory is a discipline providing various tools for modeling decision situations, with a view to explaining them or prescribing actions that increase coherence between the possibilities offered by the situation and the goals and value systems of the agents involved. Mathematical decision analysis consists in building a functional or relational model. Human performance in decision making terms has been the subject of active research from several perspectives. From a psychological perspective, it is necessary to examine individual decisions in the context of the set of needs and preferences an individual has and the values he seeks [80]. From a cognitive perspective, the decision making process must be regarded as a continuous process integrated in the interaction with the environment. From a normative perspective, the analysis of individual decisions is concerned with the logic of decision making, the rationality and the invariant choice it leads to [89]. At another level, it might be regarded as a problem solving activity which is terminated when a satisfactory solution is found. Therefore, decision making is a reasoning or emotional process which can be rational or irrational and can be based on explicit or tacit assumptions. Logical decision making is an important part of all technical professions, where specialists apply their knowledge in a given area to making informed decisions [80]. Some research using naturalistic methods shows that in situations with higher time pressure, higher stakes or increased ambiguities, experts use intuitive decision making rather than structured approaches, following a recognition primed decision approach to fit a set of indicators into the expert's experience and immediately arrive at a satisfactory course of action without weighing alternatives. Also, recent robust decision efforts have formally integrated uncertainty into the decision making process. The roles of human judgment and the factors associated with the fallibility of decision making have been central facets in many areas of human performance research [39], [100]. Attempts to understand decision making have generated a rich history of psychological research, much of which is characterized by building formal mathematical and computational models.
These models have been used for variety of purposes, across range of disciplines and settings. For example, in Artificial Intelligence research, researchers have studied intelligent agents acting in environments and concerned themselves with decisions that agents should make. The alternative situation has also been of interest. Researchers observe human agents acting in an environment and attempt to

model why certain decisions are made. Such efforts focus on the policies that individuals are presumed to be using [80]. The inherent feature underlying all decision making problems is vagueness or uncertainty. In order to tackle this problem, most psychological researchers make use of probability. However, two potential issues arise with using probabilistic models. First, some natural sources of uncertainty may not exist in a form that fits a known probability distribution. Second, for modeling cognitive phenomena, the abstract or subjective nature of many cognitive processes may reflect a type of uncertainty that is not conceptually congruent with probability theory and randomness [127]. The basic idea that conventional mathematics should be augmented to describe complex systems prompted Lotfi Zadeh to develop the theory of Fuzzy sets [166], later generalized into Soft Computing, encapsulating techniques such as Fuzzy Systems, ANN and GA [163]. Fuzzy set theory and Fuzzy Logic provide a system of mathematics that maps directly into natural language, thus capturing complex interactions between variables in qualitative descriptions that lend themselves to everyday reasoning. The potential of the Fuzzy System approach for modeling human judgment and decision making lies in several critical features, such as model-free estimators or universal approximators, the imprecision associated with everyday reasoning, and the representation of human judgment models as fuzzy rules. Biases can creep into our decision making processes. Researchers have studied how many different people make decisions about the same question and have then crafted potential cognitive interventions aimed at improving decision making outcomes. Some commonly argued cognitive biases are selective search for evidence; premature termination of search for evidence; inertia; selective perception; wishful thinking or optimism bias; choice supportive bias; recency; repetition bias; anchoring and adjustment; source credibility bias; incremental decision making; attribution symmetry; role fulfillment; underestimating uncertainty and illusion of control [39], [100]. Some decision making techniques used in everyday life include listing the advantages and disadvantages of each option, commonly used by Plato and Benjamin Franklin; flipping a coin, cutting a deck of playing cards, and other random or coincidence methods; accepting the first option that seems like it might achieve the desired result; prayer, tarot cards, astrology, augurs, revelation, or other forms of divination; acquiescing to a person in authority or an expert; and calculating the expected value or utility of each option. Let us consider an example. A person is considering two jobs. At the first job option the person has a 60% chance of getting a 30% raise in the first year, and at the second job option the person has an 80% chance of getting a 10% raise in the first year. The decision maker would calculate the expected value of each option, i.e., the probability multiplied by the increase in value. As such, the expected value for option a = 0.60 * 0.30 = 0.18 and for option b = 0.80 * 0.10 = 0.08. The person deciding on the job would choose the option with the highest expected value, in this example option number one [80]. Optimization models like linear programming, transportation and assignment problems involve the internal interests of an organization [94]. Their mode of decision making generally involves maximization of profits or minimization of costs.
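The expected-value comparison above is a one-line computation; a tiny illustrative snippet:

```python
# Expected value of each job option, as in the example above
options = {"a": (0.60, 0.30), "b": (0.80, 0.10)}  # (probability, raise)
ev = {name: round(p * r, 2) for name, (p, r) in options.items()}
print(ev)                   # {'a': 0.18, 'b': 0.08}
print(max(ev, key=ev.get))  # 'a': the option with the highest expected value
```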
But in practical life, it is required to take decisions in competing situations when there are two or more opposing teams with conflicting interests and the outcome is controlled by the decisions of all parties concerned. Such problems occur frequently in economics, business administration, sociology, political science and military operations. In all the above problems where competitive situations are involved, one opponent acts in a rational manner and tries to resolve the conflict of interests in his favor. It is in this context that the concept of game theory [148] proves a useful decision making tool, whereby decisions are made under conflict caused by opposing interests. When solving a number of practical real life problems, it is required to analyze situations where, at the outset, there are two or more opposing entities with conflicting interests and where the action of one depends on the action which the opponent takes. Such situations are known as conflicting situations, where each of the contenders in

the operation takes all available measures to prevent the opponent from succeeding. The choice of a system of armaments, possible ways of its application in battle and, in general, the planning of military operations all belong to conflicting situations; each decision which is taken must be calculated to ensure that it is least advantageous to the opponent. The strategies of the opponents are represented by a payoff game matrix, where the gains of one player are equal to the losses of the other player, such that the game is called a two-person zero-sum game [148]. This game can easily be extended to an n-person game. For example, if two firms are locked in a war to maintain their market share, then a price-cut by the first firm will invite a reaction from the second firm in the nature of a price-cut. This will, in turn, affect the sales and profits of the first firm, which will again have to develop a counter strategy to meet the challenge from the second firm. The game will thus go on until one of the firms emerges as the winner. Game theory may thus be defined as follows [148]: a decision making situation where two or more intelligent and rational opponents are involved under conditions of conflict and cooperation, each seeking to determine the rival's most profitable counter strategy to one's own best moves and to formulate appropriate defensive measures. The payoff value in the game matrix is often denoted by a single numeric value [148]. However, in most situations this representation of the payoff value is not adequate, because there is an inherent degree of vagueness, imprecision or uncertainty involved. This impreciseness is effectively tackled using Fuzzy sets [166], such that the single value of the payoff is replaced by a number of values denoting different parameters associated with the problem [112]. Imprecision here is meant in the sense of vagueness rather than lack of knowledge about the parameters present in the system. Fuzzy set theory [166] thus serves as an important decision making tool and provides a strict mathematical framework in which vague conceptual phenomena can be precisely and rigorously studied. Another important decision making problem which has been studied in the past few decades is the investment problem [33]. It involves making a decision to invest in a financial institution and has been a source of major concern for investors. Generally, the credibility of an investment is tested by means of financial investment analysis [33], which provides the capacity to easily analyze the performance of any investment. However, in most situations the investor faces the problem of making a correct decision which yields maximum returns. Today one of the most significant threats for many investors [33] is to invest money in such investments as give them maximum dividends, despite the promises made by financial institutions to investors. Evidence shows that in the recent past bankruptcies and defaults have occurred at higher rates than at any time. Due to recent financial crises, financial investment risk assessment is an area that has seen a resurgence of interest from both the academic world and the business community. Especially for investments made by an investor in a financial institution, the ability to discriminate good investments from bad ones is crucial. In order to enable interested parties to take either preventive or corrective action, the need for efficient and reliable models that predict defaults accurately is imperative.
The general approach of financial investment analysis is to apply a classification technique [60] on similar data of previous investments, both good and bad, in order to find a relation between the characteristics and potential failures. The most fundamental function executed by the human brain is classification. Humans can analyze objects using some characteristics to perceive differences and similarities. So it is possible to classify animals as friendly or dangerous, objects as edible or not, etc. In all moments of our life, classification is imposed among two or more objects. A considerable amount of work has been carried out in different areas of classification problems such as text categorization, optical character recognition, intrusion detection, speech recognition, handwriting recognition [34] etc.

Accurate classifiers should be found in order to categorize new investments as good or bad. Nonlinear regression models, logistic regression and probit regression have also been applied to a wide range of financial problems. Several techniques are used for data classification, like statistical methods, ANN, Fuzzy sets, Rough sets and hybridizations of these techniques [117], [121]. An ANN trained with a specific set of examples can be used to classify elements, deciding whether some data characteristic belongs to a class or not [79]. It has been used in commercial applications to classify customers, fraud detection, handwriting recognition, data mining, medical diagnostics, speech recognition and others. SVM is an ANN tool that has been used in a number of finance applications. It was originally designed for binary classification based on statistical learning theory and developed by Vapnik [151]. Some famous works [141], [142], [143], [150] show that SVM achieves accuracy comparable with that of back propagation ANN. The works by Van Gestel compare classical linear rating methods with state-of-the-art SVM techniques. The test results clearly indicate that the SVM methodology yields significantly better results on an out-of-sample test set. One of the main drawbacks in the application of standard SVM is its sensitivity to outliers or noise in the training sample due to overfitting. Larger and more complex classification problems have subsequently been solved with SVM. How to effectively extend it to multi-class problems is still an ongoing research issue [128]. However, most real-world applications are essentially multi-class classification problems. Multi-class classification is intrinsically harder than the binary classification problem, because the classifier has to learn to construct a greater number of separation boundaries or relations. The classification error rate is greater in the multi-class problem than in the binary one, as there can be an error in the determination of any one of the decision boundaries or relations. There are basically two types of multi-class classification algorithms. The first type deals directly with multiple values in the target field, e.g., k-nearest neighbor, naive Bayes, classification trees etc. [34]. Intuitively, these methods can be interpreted as trying to construct a conditional probability density for each class, then classifying by selecting the class with maximum a posteriori probability. For data with a high dimensional input space and very few samples per class, it is very difficult to construct accurate densities. Other approaches decompose the multi-class problem into a set of binary problems and then combine them to make the final multi-class classifier; this group contains SVM and boosting with any binary classifier. In certain settings, the latter approach results in better performance than the multiple-target approaches.
The number of free parameters used in SVM depends on margin that separates data points and not on number of input features. SVM provides generic technique to fit surface of hyperplane to data through use of an appropriate kernel function. Use of kernel function enables curse of dimensionality to be addressed and the solution implicitly contains Support vectors that provide a description of significant data for classification [79]. The most commonly used kernel Functions are polynomial, gaussian and sigmoidal functions. Although in literature, default choice of kernel function for most of applications is gaussian. In training SVM we need to select kernel function and its parameters and value of margin parameter C . The choice of kernel function and parameters to map dataset well in high dimension may depend on specific datasets. There is no method to determine how to choose an appropriate kernel function and its parameters for given dataset to achieve high generalization of classifier. The main modeling

freedom consists in the choice of the kernel function and the corresponding kernel parameters, which influence the speed of convergence and the quality of results. Furthermore, the choice of the regularization parameter C is vital to obtain good classification results. In this chapter, optimal solutions are devised for different complex problems in engineering, management and social science disciplines which involve data that are not always precisely defined, using the concept of the Fuzzy Soft relation. The Fuzzy Soft relation has its origins in Soft sets, which were initially given by Molodtsov [106], [158]. The following decision making problems are considered here: (a) House acquisition; (b) Job allocation problem; (c) Investment portfolio; (d) Fund sources; (e) Manpower recruitment; (f) Product marketing. These problems have various types of uncertainties, some of which can be dealt with using theories viz., Probability theory, Fuzzy set theory [163], Rough set theory [121], Vague set theory and Approximate Reasoning theory [39]. However, all these techniques lack parameterization tools, due to which they could not be applied successfully in tackling such problems. The Soft set concept is free from the above difficulty and has rich potential for application to these problems. With the motivation of this new concept, the Soft relation and Fuzzy Soft relation are defined, which are extensions of crisp and fuzzy relations respectively, and are applied to solve the above decision making problems. This is followed by the solution of rectangular games by the principle of dominance using LR-type trapezoidal Fuzzy numbers. LR-type trapezoidal Fuzzy numbers [30] are defined by trapezoidal membership functions. They are characterized by their simple formulations and computational efficiency and thus have been used extensively to solve different real life problems. The solution of Fuzzy games with pay-off as an imprecise number is generally given by the minimax-maximin principle [30]. In the past, work has also been done on an algebraic method of solving m × n rectangular Fuzzy games with pay-off as interval numbers having no saddle point. The determination of 2 × 2 Fuzzy games from a rectangular m × n Fuzzy game without a saddle point is a fundamental problem of Fuzzy game theory. In classical game theory the pay-off is a crisp number. Practically, it may happen that the pay-off is not necessarily a fixed real number. Here, the pay-off is considered as an LR-type trapezoidal Fuzzy number; the m × n matrix is reduced to a 2 × 2 matrix and then the problem is solved. Next, Multi-class SVM [55] is used to evaluate the credibility of financial investments, which is an important decision making problem. The technique is used in knowledge extraction from databases. The curse of dimensionality is addressed using the kernel function. However, a proper kernel function for a certain problem is dependent on the specific dataset, and as such there is no good method to choose a kernel function [34]. Here, the choice of kernel function is studied empirically and optimal results are achieved for Multi-class SVM combining several binary classifiers. The performance of Multi-class SVM is illustrated by extensive experimental results, which indicate that with suitable kernel parameters better classification accuracy is achieved as compared to other methods. Experimental results on the datasets indicate that the Gaussian kernel is not always the best choice to achieve high generalization of the classifier, although it is often the default choice. This chapter is organized as follows.
In section 4.2, the solution of decision making problems is illustrated using Fuzzy Soft relations. The solution of rectangular Fuzzy games by the principle of dominance using LR-type trapezoidal Fuzzy numbers is illustrated in section 4.3. In section 4.4, the classification of financial investments using Multi-class SVM is addressed. Experimental results and comparisons are given in section 4.5. Finally, in section 4.6 conclusions are given.

4.2 Solution of Decision Making Problems using Fuzzy Soft Relations

Decision theory [39], [149] can be used to determine optimal strategies where a decision maker is faced with several decision alternatives and an uncertain pattern of future events. For example, a manufacturer of a new style of clothing would like to manufacture large quantities of the product if consumer acceptance, and consequently demand for the product, is going to be high; otherwise smaller quantities are produced if consumer acceptance and demand for the product are going to be low. Unfortunately, seasonal clothing items require the manufacturer to make the production quantity decision before demand is actually known. The actual consumer acceptance of the new product will not be determined until the items have been placed in stores and consumers have had the opportunity to purchase them. The selection of the best production volume decision from among several production volume alternatives when the decision maker is faced with uncertainty of future demand is a problem for decision theory analysis. Decision theory commences with the assumption that, regardless of the type of decision involved, all decision making problems have certain common characteristics, which are briefly enumerated below [39]:
1. Decision maker: The decision maker refers to the individual or group of individuals responsible for making the choice of an appropriate course of action amongst the available courses of action.
2. Courses of action: The courses of action or strategies are the acts that are available to the decision maker. The decision analysis involves a selection among two or more courses of action, and the problem is to choose the best of these alternatives in order to achieve an objective.
3. States of nature: The events identify occurrences which are outside the decision maker's control and which determine the level of success for a given act. These events are often called states of nature or outcomes.
4. Payoff: Each combination of a course of action and a state of nature is associated with a payoff, which measures the net benefit to the decision maker that accrues from a given combination of decision alternatives and events.
5. Payoff table: For a given problem, the payoff table lists the states of nature, which are mutually exclusive as well as collectively exhaustive, and the set of given courses of action or strategies. For each combination of state of nature and course of action, the payoff is calculated. Suppose the problem under consideration has $m$ possible events or states of nature denoted by $S_1, \ldots, S_m$ and $n$ courses of action denoted by $A_1, \ldots, A_n$. Then the payoff corresponding to strategy $A_j$ of the decision maker under state of nature $S_i$ is denoted by $p_{ij}$ $(i = 1, \ldots, m;\ j = 1, \ldots, n)$. The $mn$ payoffs can be conveniently arranged in tabular form, known as the $m \times n$ payoff table, as shown in table 4.1.
6. Regret or opportunity loss table: The opportunity loss is defined as the difference between the best possible profit for a state of nature and the actual profit obtained for the particular action taken. Opportunity losses are calculated separately for each state of nature that might occur. Consider a fixed state of nature $S_i$. The payoffs corresponding to the $n$ strategies are given by $p_{i1}, \ldots, p_{in}$. Suppose $M_i$ is the maximum of these quantities. Then if strategy $A_1$ is used by the decision maker there is a loss of opportunity of $M_i - p_{i1}$. The opportunity losses can be computed as shown in table 4.2:

States of Nature        Conditional Payoffs: Courses of Action (Strategies)
                        A_1       A_2       ......    A_n
S_1                     p_11      p_12      ......    p_1n
S_2                     p_21      p_22      ......    p_2n
......                  ......    ......    ......    ......
S_m                     p_m1      p_m2      ......    p_mn

Table 4.1: General form of Payoff table

States of Nature        Conditional Opportunity Loss: Courses of Action (Strategies)
                        A_1           A_2           ......    A_n
S_1                     M_1 - p_11    M_1 - p_12    ......    M_1 - p_1n
S_2                     M_2 - p_21    M_2 - p_22    ......    M_2 - p_2n
......                  ......        ......        ......    ......
S_m                     M_m - p_m1    M_m - p_m2    ......    M_m - p_mn

Table 4.2: General form of Regret table
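To make the payoff-to-regret transformation concrete, the following minimal Python sketch computes an opportunity loss table in the sense of table 4.2; the 3 × 3 payoff values are hypothetical illustrative numbers, not data from this thesis.

```python
# A minimal sketch: derive the regret (opportunity loss) table from a payoff
# table. Rows are states of nature S_i, columns are courses of action A_j;
# the payoff values below are illustrative only.
payoffs = [
    [80, 120, 100],   # S1
    [60,  90, 110],   # S2
    [70,  40,  50],   # S3
]

regret = []
for row in payoffs:
    m = max(row)                          # M_i: best payoff under state S_i
    regret.append([m - p for p in row])   # opportunity loss M_i - p_ij

for s, r in zip(("S1", "S2", "S3"), regret):
    print(s, r)
```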

4.2.1 Soft Relation – A Classical Approach

The concept of soft relation given in [106], which has rich potential for application to decision making problems, is presented here and illustrated with an example [39].
Definition 1: A soft relation may be defined as a soft set over the power set of the cartesian product of two crisp sets. If $X$ and $Y$ are two non-empty crisp sets of some universal set and $E$ is a set of parameters, then a soft relation, denoted $(R, E)$, is defined as a mapping from $E$ to $P(X \times Y)$.
Example 1: Let U = {Professors teaching in a College}, M = {Male Professors in U} = {m1,.........,m9}, N = {Female Professors in U} = {f1,.........,f9}. Let E1 and E2 be two sets of parameters given by E1 = {is father of, is uncle of, is husband of, is grandfather of, is son of, is nephew of}, E2 = {is mother of, is aunt of, is wife of, is grandmother of, is daughter of, is niece of}. Then a soft relation R over $P(M \times N)$ corresponding to E1 may be given as (R, E1) = {R (is father of) = {(m1, f1), (m2, f3), (m4, f6), (m6, f7)}, R (is uncle of) = {(m2, f1), (m3, f5), (m5, f6)}, R (is husband of) = {(m3, f1), (m4, f7), (m9, f6)}, R (is grandfather of) = {(m1, f4), (m5, f4), (m6, f6)}, R (is son of) = {(m1, f2), (m1, f5), (m4, f3)}, R (is nephew of) = {(m2, f8), (m1, f9), (m7, f8)}}. Another soft relation R over $P(M \times N)$ corresponding to E2 may be given as (R, E2) = {R (is mother of) = {(f2, m1), (f5, m1), (f3, m4)}, R (is aunt of) = {(f8, m2), (f9, m1), (f8, m7)}, R (is wife of) = {(f1, m3), (f7, m4), (f6, m9)}, R (is grandmother of) = {(f9, m6), (f8, m5), (f7, m9)}, R (is daughter of) = {(f1, m1), (f3, m2), (f6, m4), (f7, m6)}, R (is niece of) = {(f2, m2), (f3, m3)}}. It is evident that R (is father of) and R (is daughter of) can be derived from each other. Similarly {R (is husband of), R (is wife of)} and {R (is son of), R (is mother of)} can also be derived from each other. Again {R (is nephew of), R (is aunt of)} is a derivable relation. There may be many other soft relations over $P(M \times N)$. Each approximate value set in the above soft relation (R, E1) or (R, E2) can be expressed in parameterized matrix form as shown in table 4.3.

Considering the set $E$ and $P(X \times Y) = V$, any subset of the cartesian product $E \times V$ is called a soft binary relation, denoted by $T$. $\forall e \in E, v \in V$, if $\langle e, v \rangle \in T$, then $e$ and $v$ satisfy the relation $T$, i.e., $eTv$; otherwise $\langle e, v \rangle \notin T$, i.e., $e$ and $v$ do not satisfy the relation $T$. The corresponding soft binary relation can be represented as a matrix $M_T = (t_{ij})_{n \times m}$, called the relation matrix of $T$ [39], where

$$t_{ij} = \begin{cases} 1, & \langle e_i, v_j \rangle \in T \\ 0, & \langle e_i, v_j \rangle \notin T \end{cases} \qquad i = 1, \ldots, n;\ j = 1, \ldots, m$$

Let $T$ be a relation on a non-empty set $E$. $\forall a, b, c \in E$, if the relation satisfies: (1) reflexivity, i.e., $aTa$; (2) symmetry, i.e., $aTb \Rightarrow bTa$, then $T$ is called a soft similarity relation. Furthermore, if $T$ satisfies transitivity, i.e., $aTb, bTc \Rightarrow aTc$, then $T$ is a soft equivalence relation. Here, soft similarity relations are denoted as $Y$ and soft equivalence relations as $Q$ [39]. Given a set $E$ and a soft similarity relation $Y$ on $E$, $\forall e \in E$, the set $[e]_Y$ is called the soft similarity class of $e$ induced by the relation $Y$, where $[e]_Y = \{e_i \mid \langle e_i, e \rangle \in Y, e_i \in E\}$. If $Q$ is a soft equivalence relation on $E$, then correspondingly, $\forall e \in E$, a soft equivalence subset of $e$ with respect to the relation $Q$ is defined, denoted by $[e]_Q$, where $[e]_Q = \{e_i \mid \langle e_i, e \rangle \in Q, e_i \in E\}$ [39].

Let $E$ be a set and $\Pi$ a family of sets consisting of non-empty subsets of $E$, viz., $\Pi = \{\Pi_\alpha \mid \alpha \in \Lambda, \Pi_\alpha \subseteq E, \Pi_\alpha \neq \emptyset\}$, where $\Lambda$ is a subscript set. If $\forall e \in E$ there is a $\Pi_e \in \Pi$ such that $e \in \Pi_e$, the set family $\Pi$ is called a cover of $E$. It is easy to verify that $\bigcup_{\alpha \in \Lambda} \Pi_\alpha = E$ if $\Pi$ is a cover of $E$. If, for any $\Pi_1, \Pi_2$ in $\Pi$, $\Pi_1 \cap \Pi_2 = \emptyset$, then $\Pi$ is a soft partition of $E$. Given any set $E$, $\Pi = \{E\}$ is the coarsest partition and $\Pi = \{\{e_i\} \mid e_i \in E\}$ is the finest partition. Let $\Pi_1 = \{\Pi_\alpha \mid \alpha \in M\}$ and $\Pi_2 = \{\Pi_\beta \mid \beta \in N\}$ be two partitions of the set $E$. If $\forall \alpha_0 \in M$ there exists $\beta_0 \in N$ such that $\Pi_{\alpha_0} \subseteq \Pi_{\beta_0}$, then the partition $\Pi_1$ is finer than $\Pi_2$, denoted by $\Pi_1 \preceq \Pi_2$. For partitions $\Pi_1$ and $\Pi_2$ of the set $E$, $V = \{\Pi_\alpha \cap \Pi_\beta \mid \Pi_\alpha \in \Pi_1, \Pi_\beta \in \Pi_2, \Pi_\alpha \cap \Pi_\beta \neq \emptyset\}$ is also a soft partition of the set $E$. Moreover, $V \preceq \Pi_1$, $V \preceq \Pi_2$ [39].

4.2.2 Fuzzy Soft Relation

The concept of fuzzy soft relation is an extension of the crisp soft relation. The fuzziness aspect deals with the uncertainty and vagueness inherent in decision making problems. This concept is further extended to a relation on two fuzzy soft sets, which leads to the extension principle [39].
Definition 2: A fuzzy soft relation may be defined as a soft set over the fuzzy power set of the cartesian product of two crisp sets. If $\tilde{P}(X \times Y)$ is the fuzzy power set, $X$ and $Y$ are two non-empty crisp sets of some universal set and $E$ is a set of parameters, then a function $R : E \to \tilde{P}(X \times Y)$ is called a fuzzy soft relation. For each $\varepsilon \in E$, each ordered pair in $R(\varepsilon)$ has a degree of membership in the fuzzy soft relation $R$, indicating the strength of the parametric relationship present between the elements of the ordered pairs in $R$.
Example 2: Let P = {Paris, Berlin, Amsterdam} and Q = {Rome, Madrid, Lisbon} be two sets of cities, and E = {far, very far, near, very near, crowded, well managed}. Let R be a fuzzy soft relation over the sets P and Q given by (R, E) = {R (far) = {(Paris, Rome)/ 0.60, (Paris, Madrid)/ 0.45, (Paris, Lisbon)/ 0.40, (Berlin, Rome)/ 0.55, (Berlin, Madrid)/ 0.65, (Berlin, Lisbon)/ 0.70, (Amsterdam, Rome)/ 0.75, (Amsterdam, Madrid)/ 0.50, (Amsterdam, Lisbon)/ 0.80}}. This information can be represented in the form of a two-dimensional array (matrix) as shown in table 4.4.

It is obvious from the matrix given in table 4.4 that a fuzzy soft relation may be considered as a parameterized fuzzy relation.

R (is husband of)   m1  m2  m3  m4  m5  m6  m7  m8  m9
f1                  0   0   1   0   0   0   0   0   0
f2                  0   0   0   0   0   0   0   0   0
f3                  0   0   0   0   0   0   0   0   0
f4                  0   0   0   0   0   0   0   0   0
f5                  0   0   0   0   0   0   0   0   0
f6                  0   0   0   0   0   0   0   0   1
f7                  0   0   0   1   0   0   0   0   0
f8                  0   0   0   0   0   0   0   0   0
f9                  0   0   0   0   0   0   0   0   0

Table 4.3: Parameterized matrix for Soft relation is husband of

R (far)        Rome    Madrid   Lisbon
Paris          0.60    0.45     0.40
Berlin         0.55    0.65     0.70
Amsterdam      0.75    0.50     0.80

Table 4.4: Parameterized matrix for Fuzzy Soft relation far

The soft relation $T$ defined over non-empty crisp sets takes either of the two values $\{0, 1\}$. In this case the relation matrix is basically a boolean matrix. In the case of a fuzzy soft relation, the values are considered in the interval $[0, 1]$, viz., $t \in [0, 1]$, where the grades of the relation signify the strength with which the elements satisfy the relation. In the context of fuzzy soft relations, the properties of fuzzy relations are defined as [39]: (1) reflexivity, i.e., $T(a, a) = 1$; (2) symmetry, i.e., $T(a, b) = T(b, a)$; (3) transitivity, i.e., $T(a, c) \geq \max_b (T(a, b) \otimes T(b, c))$, where $\otimes$ is a t-norm; here, the minimum is considered as the t-norm. The relation is called a fuzzy soft similarity relation if it satisfies the conditions of reflexivity and symmetry. The relation is called a $T$-indistinguishability relation or fuzzy soft equivalence relation if the soft similarity relation satisfies t-transitivity. Generally, fuzzy soft equivalence relations are also called soft similarity relations. Usually, the concepts of soft similarity relation, fuzzy soft similarity relation, soft equivalence relation and fuzzy soft equivalence relation are used for distinguishing among objects. If $T_1$ and $T_2$ are two soft fuzzy relations on a set $E$, the operators which can be defined are [39]: (a) union: $(T_1 \cup T_2)(a, b) = \max\{T_1(a, b), T_2(a, b)\}, \forall a, b \in E$; (b) intersection: $(T_1 \cap T_2)(a, b) = \min\{T_1(a, b), T_2(a, b)\}, \forall a, b \in E$; (c) containment: $T_1 \subseteq T_2 \Leftrightarrow T_1(a, b) \leq T_2(a, b), \forall a, b \in E$. Given a fuzzy relation $T$ on $E$ and $\alpha \in [0, 1]$, the $\alpha$-cut $T_\alpha$ of the fuzzy soft relation is the crisp soft relation [39], where

$$T_\alpha(a, b) = \begin{cases} 1, & T(a, b) \geq \alpha \\ 0, & T(a, b) < \alpha \end{cases}$$

$T$ is a fuzzy soft equivalence relation if and only if the $\alpha$-cut $T_\alpha$ of $T$ is a crisp soft equivalence relation for all $\alpha \in [0, 1]$. Given a finite set $E$ and a fuzzy soft equivalence relation $T$, the fuzzy soft equivalence class $[e_i]_T$ of $e_i \in E$ is a fuzzy subset, defined as

$$[e_i]_T = \frac{t_{i1}}{e_1} + \frac{t_{i2}}{e_2} + \cdots + \frac{t_{in}}{e_n}, \quad \text{where } t_{ij} = T(e_i, e_j).$$

The fuzzy soft equivalence class is a fuzzy information granule, whereby the elements in the class are fuzzy indiscernible with $e_i$; $t_{ij}$ measures the degree to which two elements are equivalent or indiscernible. The family of fuzzy soft equivalence classes $[e_i]_T$, written as $E/T = \{[e_i]_T \mid e_i \in E\}$, is called the fuzzy soft quotient set of $E$ induced by $T$ [39].
Definition 3: Given a finite set $E$ and a fuzzy soft binary relation $T$ on $E$, define the soft fuzzy class $[e_i]_T$ of $e_i \in E$ induced by the relation $T$ as $[e_i]_T = \frac{t_{i1}}{e_1} + \frac{t_{i2}}{e_2} + \cdots + \frac{t_{in}}{e_n}$. Here $[e_i]_T$ is a fuzzy soft set, and the fuzzy cardinal number of $[e_i]_T$ is defined as $|[e_i]_T| = \sum_{j=1}^{n} t_{ij}$. For a finite set $E$, $\forall e_i \in E$, $t_{ij} \leq 1$; hence the cardinality of $[e_i]_T$ is also finite and $|[e_i]_T| \leq n$.
Definition 4: Let $E$ be a finite set and $T$ a fuzzy soft relation on $E$. If the fuzzy soft relation class of $e_i$ is $[e_i]_T$, then the expected cardinality of $[e_i]_T$ is computed as $\mathrm{card}([e_i]_T) = \frac{|[e_i]_T|}{|E|} = \frac{\sum_{j=1}^{n} t_{ij}}{n}$. The expected cardinality $\mathrm{card}([e_i]_T) \leq 1$ can be considered as the ratio of $[e_i]_T$ in $E$.
Definition 5: The uncertainty quantity of the fuzzy soft relation class $[e_i]_T$ is defined as $V([e_i]_T) = -\log_2 \mathrm{card}([e_i]_T)$. As $\mathrm{card}([e_i]_T) \leq 1$, $V([e_i]_T) \geq 0$, and $V([e_i]_T)$ decreases monotonically as $\mathrm{card}([e_i]_T)$ increases.
Definition 6: Given a finite set $E$ and a fuzzy soft relation $T$ on $E$, the average uncertainty quantity $G(T)$ of the fuzzy soft relation is calculated as $G(T) = \frac{1}{n} \sum_{i=1}^{n} -\log_2 \mathrm{card}([e_i]_T)$. The average uncertainty quantity of the fuzzy soft relation $T$ on $E$ is the mapping $G : (E, T) \to \mathbb{R}^{+}$, where $\mathbb{R}^{+}$ is the domain of nonnegative real numbers. With this mapping an order is formed to compare fuzzy soft relations with respect to uncertainty quantity. It is to be noted that the uncertainty quantity is not only a function of the fuzzy soft relation $T$, but is also related to the set $E$. Also, $G(T) = 0$ if $E = \emptyset$.
Proposition 1: Given a non-empty and finite set $E$ and a soft relation $T$ on $E$, if $\forall a, b \in E$, $T(a, b) = 1$, then $G(T) = 0$.
Proposition 2: Let $T_1$ and $T_2$ be two fuzzy soft relations on a nonempty and finite set $E$; then $T_1 \subseteq T_2 \Rightarrow G(T_2) \leq G(T_1)$.
Proposition 3: Let $T_1$ and $T_2$ be two fuzzy soft relations on a nonempty and finite set $E$; then $G(T_1 \cup T_2) \leq \max(G(T_1), G(T_2))$ and $G(T_1 \cap T_2) \geq \min(G(T_1), G(T_2))$.

Definition 7: Let $(F, A)$ and $(G, B)$ be two fuzzy soft sets over a common universal set. Then a relation $R$ of $(F, A)$ on $(G, B)$ may be defined as a mapping $R : A \times B \to \tilde{P}(U^2)$ such that for each $e_i \in A$, $e_j \in B$ and for all $u_l \in F(e_i), u_k \in G(e_j)$, the relation $R$ is characterized by the following membership function: $\mu_R(u_l, u_k) = \mu_{F(e_i)}(u_l) \cdot \mu_{G(e_j)}(u_k)$, where $u_l \in F(e_i), u_k \in G(e_j)$. Thus the mapping is well defined. The higher the value of the membership grade in the relation $R$ for a pair, the stronger is the parametric character present between the pair.
Example 3: Let U = {w1, w2, w3, w4, w5, w6} be a set of watches and A = {cheap, costly}, B = {beautiful, in a golden locket}. Let $(F, A)$ and $(G, B)$ be two soft sets given by F (cheap) = {w1/ 0.1, w2/ 0.25, w3/ 0.2, w4/ 0.6, w5/ 0.15, w6/ 0.35}, F (costly) = {w1/ 1, w2/ 0.75, w3/ 0.8, w4/ 0.55, w5/ 0.9, w6/ 0.85}, G (beautiful) = {w1/ 0.65, w2/ 1, w3/ 0.8, w4/ 0.7, w5/ 0.8, w6/ 0.75}, G (in a golden locket) = {w1/ 0.6, w2/ 0.75, w3/ 0.8, w4/ 0.5, w5/ 0.45, w6/ 0.95}. The relation $R : A \times B \to \tilde{P}(U^2)$ is given by the membership matrices in tables 4.5 and 4.6. The membership function of $R$ can also be defined using other appropriate techniques.
Let $X$ be the cartesian product of the universes $X_1, \ldots, X_r$ and let $(F, A_1), \ldots, (F, A_r)$ be $r$ fuzzy soft sets in $X_1, \ldots, X_r$ respectively. Further consider $f$ to be a mapping from $X$ to $Y$ given by $y = f(x_1, \ldots, x_r)$. Then a fuzzy soft set $B$ can be defined using the extension principle as $B = \{\langle y, \mu_B(y) \rangle \mid y = f(x_1, \ldots, x_r), (x_1, \ldots, x_r) \in X\}$, where

$$\mu_B(y) = \begin{cases} \max\limits_{(x_1, \ldots, x_r) \in f^{-1}(y)} \min\{\mu_{(F, A_1)}(x_1), \ldots, \mu_{(F, A_r)}(x_r)\}, & f^{-1}(y) \neq \emptyset \\ 0, & \text{otherwise} \end{cases} \quad \text{[39], [86].}$$
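A minimal Python sketch of Definition 7, reproducing entries of table 4.5 for R (costly, beautiful) from Example 3 by taking the product of the two membership grades:

```python
# A minimal sketch of Definition 7: the membership grade of a pair (u_l, u_k)
# is the product of the grades in the two fuzzy soft sets.
F_costly = {"w1": 1.0, "w2": 0.75, "w3": 0.8, "w4": 0.55, "w5": 0.9, "w6": 0.85}
G_beautiful = {"w1": 0.65, "w2": 1.0, "w3": 0.8, "w4": 0.7, "w5": 0.8, "w6": 0.75}

R = {(ul, uk): F_costly[ul] * G_beautiful[uk]
     for ul in F_costly for uk in G_beautiful}

print(R[("w1", "w1")])            # 1.0 * 0.65 -> 0.65
print(round(R[("w3", "w4")], 2))  # 0.8 * 0.7  -> 0.56
```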

4.2.3 Soft Relation and Fuzzy Soft Relation – An Alternative Approach

An alternative approach to soft relations and fuzzy soft relations is also considered. The soft relation generally uses two-valued logic, and as such the propositions may be either true or false, but not both [39]. As a consequence of this, something which is not true is false and vice versa, i.e., the law of the excluded middle holds. This is only an approximation to human reasoning [159], [163], which gives rise to multi-valued logic in fuzzy soft relations [125]. For example, consider the class of tall men, which does not constitute a class or set in the usual mathematical sense of these terms. The term tall is an elastic property. To define the class of tall men as a crisp set, a predicate $P(x)$ is used, where $x$ is the height of a person, for example $x \geq 176$ cm, with 176 denoting the threshold value. This is an abrupt approximation to the concept tall. From an engineering viewpoint, it is likely that the measurement is uncertain, due to sources of noise in the equipment. Thus, measurements within the narrow range $176 \pm \epsilon$, where $\epsilon$ is the variation due to noise, could fall on either side of the threshold randomly. This leads to the concept of the membership grade $\mu_A(x)$, which allows finer detail, such that the transition from membership to non-membership is gradual rather than abrupt. The membership grades for all members define a fuzzy relation, as given in figure 4.1. Corresponding to the membership grade, there is a membership function that relates $x$ to each membership grade $\mu_A(x)$, which is in fact a real number in the closed interval $[0, 1]$.

The term fuzzy soft or indistinct suggests the image of a boundary zone rather than an abrupt frontier. Indeed, soft relations are being considered as relations composed of crisp sets, to distinguish them from fuzzy soft relations. As with soft relations, the only guidance is intuition in deciding which objects are members and which are not; a formal basis for how to determine the membership grade of a fuzzy soft relation is absent. The membership grade is a precise but arbitrary measure, as it rests on personal opinion, not reason. The range of values of the membership grade is $0 \leq \mu \leq 1$; the higher the value, the higher the membership grade. A soft relation is consequently a special case of a fuzzy soft relation, with membership values restricted to $\mu \in \{0, 1\}$. The members of fuzzy soft relations are taken from the universe of discourse, which comprises all objects that can be taken into consideration and generally depends on the context.

R (costly, beautiful)   w1     w2     w3     w4     w5     w6
w1                      0.65   1      0.80   0.70   0.80   0.75
w2                      0.49   0.75   0.60   0.53   0.60   0.56
w3                      0.52   0.80   0.64   0.56   0.64   0.60
w4                      0.35   0.55   0.44   0.39   0.55   0.42
w5                      0.59   0.90   0.72   0.63   0.72   0.68
w6                      0.53   0.85   0.68   0.60   0.68   0.64

Table 4.5: Membership matrix for Fuzzy Soft relation costly, beautiful

R (cheap, beautiful)    w1     w2     w3     w4     w5     w6
w1                      0.07   0.10   0.08   0.07   0.08   0.08
w2                      0.16   0.25   0.20   0.18   0.20   0.19
w3                      0.13   0.20   0.16   0.14   0.16   0.15
w4                      0.39   0.60   0.48   0.42   0.48   0.45
w5                      0.01   0.15   0.12   0.11   0.12   0.11
w6                      0.23   0.35   0.28   0.25   0.28   0.26

Table 4.6: Membership matrix for Fuzzy Soft relation cheap, beautiful

There are two ways to represent a membership function, viz., continuous or discrete. A continuous fuzzy soft relation $A$ is defined by means of a continuous membership function $\mu_A(x)$. A trapezoidal membership function is a piecewise linear, continuous function controlled by four parameters, viz., $a, b, c, d$ [86]:

 0, xa  x a ,a xb  trapezoid ( x; a, b, c, d )   1b,ba xc ; x    d  x ,c  x  d  0d,dc x The parameters a  b  c  d define four breakpoints, here designated as: left footpoint, a; left shoulderpoint, b; right shoulderpoint, c; and right footpoint, d as shown in figure 4.2 (a). A triangular membership function is piecewise linear, and derived from trapezoidal membership function by merging two shoulder points into one, i.e., b  c as shown in figure 4.2(b). Smooth,

differentiable versions of trapezoidal and triangular membership functions can be obtained by replacing linear segments corresponding to intervals a  x  b and c  x  d by nonlinear function, for instance half period of cosine function,

 0, xa  1  1 cos( x b  ), a xb  smoothtrapezoid ( x; a, b, c, d )   12,b2xc ba ; x  1 1 x c   cos(  ),c xd  02,d 2 x d c

Figure 4.1: The term tall men in terms of Crisp and Fuzzy soft relations

These are known as smooth trapezoid or soft trapezoid and smooth triangular or soft triangular functions, and are illustrated in figures 4.2 (c) and (d). Other possibilities exist for generating smooth trapezoidal functions, for example Gaussian, generalized bell, and sigmoidal membership functions [86]. A discrete fuzzy soft relation is defined by means of a discrete variable $x_i$ $(i = 1, 2, \ldots)$. It is generally defined by ordered pairs, $A = \{\langle x_1, \mu(x_1) \rangle, \langle x_2, \mu(x_2) \rangle, \ldots \mid x_i \in U, i = 1, 2, \ldots\}$. Each membership value $\mu(x_i)$ is an evaluation of the membership function $\mu$ at the discrete point $x_i$ in the universe $U$, and the whole set is a collection, usually finite, of pairs $\langle x_i, \mu(x_i) \rangle$.
Example 4: To obtain a discrete triangular membership function from the trapezoid membership function, assume that the universe is a vector u of 7 elements. In MATLAB notation, u = [9 10 11 12 13 14 15]. Considering the parameters a = 10, b = 12, c = 12, and d = 14, the trapezoid membership function yields the corresponding membership values as a vector of 7 elements, viz., [0, 0, 0.5, 1, 0.5, 0, 0]. Each membership value corresponds to one element of the universe, written more specifically as given in table 4.7, with the universe in the bottom row and the membership values in the top row. As a crude rule of thumb, the continuous form is more computing intensive, but less storage demanding, than the discrete form.

μ(x)   0   0    0.5   1    0.5   0    0
x      9   10   11    12   13    14   15

Table 4.7: Membership values and corresponding elements of Universe
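A minimal Python sketch of the trapezoidal membership function, reproducing the discrete values of Example 4 and table 4.7:

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership function with breakpoints a <= b <= c <= d."""
    if x <= a or x >= d:
        return 0.0
    if a < x < b:
        return (x - a) / (b - a)
    if b <= x <= c:
        return 1.0
    return (d - x) / (d - c)   # c < x < d

# Example 4: universe u and parameters a=10, b=12, c=12, d=14 give the
# discrete triangular membership values of table 4.7.
u = [9, 10, 11, 12, 13, 14, 15]
print([trapezoid(x, 10, 12, 12, 14) for x in u])
# -> [0.0, 0.0, 0.5, 1.0, 0.5, 0.0, 0.0]
```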

Figure 4.2: (a) Trapezoidal membership function; (b) Triangular membership function; (c) Smooth Trapezoid; (d) Smooth Triangular

Definition 8: If $(F, A)$ and $(G, B)$ are two soft sets over a common universe $U$, then a soft subset $(R, C)$ of $(F, A) \times (G, B)$ is called a soft relation of $(F, A)$ and $(G, B)$, where $C \subseteq A \times B$ and for every $(x, y) \in C$, $R(x, y)$ and $S(x, y)$ are identical approximations, where $S(x, y) = F(x) \cap G(y)$. It is clear that $(R, C)$ is also a soft set, and therefore basic concepts such as union, intersection, complement, difference and exclusion can be applied without any modification to soft relations [125].
Example 5: Let U = {c1, c2, c3, c4, c5, c6} be a set of cars and E = {cheap, costly, fuel efficient, produced by firm A, produced by firm B, produced by firm C} be a set of parameters. Let (F, P) = {cheap cars = {c1, c2, c3}, costly cars = {c4, c5}, fuel efficient cars = {c1, c3, c5, c6}} and (G, Q) = {cars produced by firm A = {c1, c3}, cars produced by firm B = {c2, c3, c4}, cars produced by firm C = {c2, c5, c6}} be two soft sets over U. Then the soft relation (R, C) of all cheap and fuel efficient cars produced by the firms A and C respectively is given by (R, C) = {R (cheap, produced by firm A) = {c1, c3}, R (fuel efficient, produced by firm C) = {c5, c6}}.
Based on definition 8, the generalized representations of the operations viz., AND, OR, NOT, NAND and NOR [39], [125] are: (a) AND operation: the AND operation can be generalized for a family of $n$ fuzzy soft sets $\{(F_i, A_i) \mid i \in N\}$, denoted by $(F_1, A_1) \wedge (F_2, A_2) \wedge \ldots \wedge (F_n, A_n) = \bigwedge_{i \in N} (F_i, A_i)$ and given by $\bigwedge (F_i, A_i) = (H, \prod A_i) = (H, A_1 \times A_2 \times \ldots \times A_n)$ where $H(x_1, \ldots, x_n) = F_1(x_1) \cap F_2(x_2) \cap \ldots \cap F_n(x_n)$ $\forall (x_1, \ldots, x_n) \in A_1 \times \ldots \times A_n$; (b) OR operation: the OR operation can be generalized for a family of $n$ fuzzy soft sets $\{(F_i, A_i) \mid i \in N\}$, denoted by $(F_1, A_1) \vee (F_2, A_2) \vee \ldots \vee (F_n, A_n) = \bigvee_{i \in N} (F_i, A_i)$ and given by $\bigvee (F_i, A_i) = (P, \prod A_i) = (P, A_1 \times A_2 \times \ldots \times A_n)$ where $P(x_1, \ldots, x_n) = F_1(x_1) \cup F_2(x_2) \cup \ldots \cup F_n(x_n)$ $\forall (x_1, \ldots, x_n) \in A_1 \times \ldots \times A_n$; (c) NOT operation: the NOT operation can be generalized for a family of $n$ fuzzy soft sets $\{(F_i, A_i) \mid i \in N\}$, denoted by $\neg(F_i, A_i)$ and given by $\neg(F_i, A_i) = (P, \neg A_i)$ where $P(\neg x_i) = \neg F_i(x_i)$ $\forall x_i \in A_i$; (d) NAND operation: the NAND operation can be generalized for a family of $n$ fuzzy soft sets $\{(F_i, A_i) \mid i \in N\}$, denoted by $\neg((F_1, A_1) \wedge (F_2, A_2) \wedge \ldots \wedge (F_n, A_n)) = \neg \bigwedge_{i \in N} (F_i, A_i)$ and given by $\neg \bigwedge (F_i, A_i) = (H, \neg \prod A_i) = (H, \neg A_1 \times \neg A_2 \times \ldots \times \neg A_n)$ where $H(x_1, \ldots, x_n) = \neg F_1(x_1) \cap \neg F_2(x_2) \cap \ldots \cap \neg F_n(x_n)$ $\forall (x_1, \ldots, x_n) \in A_1 \times \ldots \times A_n$; (e) NOR operation: the NOR operation can be generalized for a family of $n$ fuzzy soft sets $\{(F_i, A_i) \mid i \in N\}$, denoted by $\neg((F_1, A_1) \vee (F_2, A_2) \vee \ldots \vee (F_n, A_n)) = \neg \bigvee_{i \in N} (F_i, A_i)$ and given by $\neg \bigvee (F_i, A_i) = (P, \neg \prod A_i) = (P, \neg A_1 \times \neg A_2 \times \ldots \times \neg A_n)$ where $P(x_1, \ldots, x_n) = \neg F_1(x_1) \cup \neg F_2(x_2) \cup \ldots \cup \neg F_n(x_n)$ $\forall (x_1, \ldots, x_n) \in A_1 \times \ldots \times A_n$. Similarly, set difference and exclusion operations can also be defined. Considering the above operations, the following definitions are given:
Definition 9: If $(F_1, A_1), \ldots, (F_n, A_n)$ are $n$ soft sets, then a soft subset $(R, C)$ of $(F_1, A_1) \times (F_2, A_2) \times \ldots \times (F_n, A_n)$ is called an n-ary soft relation. Here, $C \subseteq A_1 \times A_2 \times \ldots \times A_n$ and $\forall (x_1, \ldots, x_n) \in A_1 \times A_2 \times \ldots \times A_n$, $R(x_1, \ldots, x_n)$ and $P(x_1, \ldots, x_n)$ are identical approximations, where $P(x_1, \ldots, x_n) = F_1(x_1) \cap F_2(x_2) \cap \ldots \cap F_n(x_n)$.
Definition 10: If $(F, A)$ and $(G, B)$ are two fuzzy soft sets, then a fuzzy soft subset $(R, C)$ of $(F, A) \times (G, B)$ is called a fuzzy soft relation. Here, $C \subseteq A \times B$ and $\forall (x, y) \in A \times B$, $R(x, y)$ is a fuzzy subset of $P(x, y)$, where $P(x, y) = F(x) \cap G(y)$.
Example 6: Let U = {c1, c2, c3, c4, c5, c6} be a set of cars, and let (F, A) be a fuzzy soft set which describes the cost of the cars and (G, B) a fuzzy soft set which describes the attractiveness of the cars, where A = {costly, moderate, cheap} and B = {fuel efficient, beautiful, having metallic color}. Let F (costly) = {c1/ 0.5, c2/ 0.8, c3/ 0, c4/ 0.1, c5/ 1, c6/ 0.9}; F (moderate) = {c1/ 0.2, c2/ 0.4, c3/ 0.5, c4/ 0.6, c5/ 0.5, c6/ 0.7}; F (cheap) = {c1/ 0.5, c2/ 0.1, c3/ 1, c4/ 0.9, c5/ 0, c6/ 0.4}; G (fuel efficient) = {c1/ 0.4, c2/ 0.6, c3/ 0.8, c4/ 1, c5/ 0.2, c6/ 0.5}; G (having metallic color) = {c1/ 1, c2/ 0, c3/ 0, c4/ 1, c5/ 0, c6/ 1}; G (beautiful) = {c1/ 0.8, c2/ 0, c3/ 0.5, c4/ 0.7, c5/ 0.9, c6/ 0.8}. Then a fuzzy soft relation R of all cheap, fuel efficient and beautiful cars is given by (R, C) = {R (cheap, fuel efficient) = {c1/ 0.4, c2/ 0.1, c3/ 0.8, c4/ 0.9, c5/ 0, c6/ 0.4}, R (cheap, beautiful) = {c1/ 0.5, c2/ 0, c3/ 0.5, c4/ 0.7, c5/ 0, c6/ 0.4}}, as illustrated in the sketch below. The fuzzy soft relation considered in definition 10 can be generalized to $n$ fuzzy soft sets $\{(F_i, A_i) \mid i \in N\}$ in the following manner [39], [125].
Definition 11: The fuzzy soft set $(R, C)$ of $\prod (F_i, A_i)$ is called an n-ary fuzzy soft relation. Here, $C \subseteq A_1 \times \ldots \times A_n$ and $\forall (x_1, \ldots, x_n) \in A_1 \times \ldots \times A_n$, $R(x_1, \ldots, x_n) \subseteq O(x_1, \ldots, x_n)$, where $O(x_1, \ldots, x_n) = F_1(x_1) \cap \ldots \cap F_n(x_n)$. By analogy, the relation on $n$ soft sets is called an n-ary or n-dimensional relation [125].
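A minimal sketch of Definition 10, reproducing R (cheap, fuel efficient) of Example 6 by taking the pointwise minimum (intersection) of the two fuzzy soft sets:

```python
# A minimal sketch of Definition 10: a fuzzy soft relation obtained as the
# min (intersection) of two fuzzy soft sets from Example 6.
F_cheap = {"c1": 0.5, "c2": 0.1, "c3": 1.0, "c4": 0.9, "c5": 0.0, "c6": 0.4}
G_fuel_efficient = {"c1": 0.4, "c2": 0.6, "c3": 0.8, "c4": 1.0, "c5": 0.2, "c6": 0.5}

R = {c: min(F_cheap[c], G_fuel_efficient[c]) for c in F_cheap}
print(R)
# -> {'c1': 0.4, 'c2': 0.1, 'c3': 0.8, 'c4': 0.9, 'c5': 0.0, 'c6': 0.4}
```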

The association of logic with crisp and fuzzy soft relations can be considered as the study of language in arguments and persuasion, and it is used to judge the correctness of a chain of reasoning in a mathematical proof [39]. The goal is to reduce the principles of reasoning to a code. For a crisp soft relation, truth or falsity values are assigned as the truth-values of a proposition. A fuzzy soft relation considers true or false values of a proposition or an intermediate truth-value such as maybe true, which may further be extended to multi-valued logic. Generally the unit interval is subdivided into finer divisions to achieve a greater level of precision. Logical statements are basically represented as propositions or elementary sentences [163] which are combined with connectives such as and (conjunction), or (disjunction), if-then (implication), if and only if (equivalence) to form compound propositions. In many practical situations, assertions are used which contain at least one propositional variable; these are called propositional forms. The main difference between a proposition and a propositional form is that every proposition has a truth-value, whereas a propositional form is an assertion whose truth-value cannot be determined until propositions are substituted for its propositional variables. When there is no confusion, propositional forms are also referred to as propositions. A truth-table summarizes the possible truth-values of an assertion. For example, consider the truth-table for the crisp soft propositional form $p \vee q$. The truth-table in figure 4.3 lists all possible combinations of truth-values, i.e., the cartesian product of the arguments $p$ and $q$, in the two leftmost columns. The rightmost column holds the truth-values of the proposition. Alternatively, the truth-table can be rearranged into a two-dimensional array, also known as a Cayley table [39], as shown below.

Figure 4.3: Truth and Cayley tables for $p \vee q$

Along the vertical axis in the Cayley table, symbolized by the arrow ↓, are the possible values 0 and 1 of the first argument $p$; along the horizontal axis, symbolized by the arrow →, are the possible values 0 and 1 of the second argument $q$. At the intersection of row $i$ and column $j$ is the truth-value of the expression $p_i \vee q_j$. Similarly, other logical operations can be represented by means of truth-tables and Cayley tables. By analogy, similar truth-tables can be defined for fuzzy soft logic connectives [39]. One starts by defining negation and disjunction; the truth-tables of the other connectives can be derived from that point of departure. Defining disjunction as set union, $p \vee q = \max(p, q)$, the truth-table for the fuzzy soft connective or is shown in figure 4.4. As before, the p-axis is vertical and the q-axis horizontal. At the intersection of row $i$ and column $j$ the value of the expression is $\max(p_i, q_j)$. When looking for definitions of fuzzy soft connectives, it is required that such connectives agree with their crisp soft counterparts on the truth-domain $\{0, 1\}$. In terms of truth-tables, the values in the four corners of the fuzzy soft Cayley table should agree with the Cayley table for the crisp soft connective. Similarly, other fuzzy soft logic connectives can be defined [39].

pq

Figure 4.4: Truth table for Fuzzy soft connective or The implication connective however should be taken care of with caution. If it is defined as material implication, p  q then fuzzy soft truth-table is obtained which is unsuitable, as it causes several useful logical laws to break down. It is important to realize, that a design choice is made at this point, in order to proceed with definition of implication and equivalence [39]. The choice is which logical laws it is to be applied. Not all laws known from two-valued soft logic

can be valid in fuzzy soft logic. Taking for instance propositional form, p  p  1 which is equivalent to law of excluded middle. Testing with truth-value p = 0.5 (fuzzy soft logic) left hand side yields 0.5  ¬0.5 = max (0.5, 1 − 0.5) = 0.5. This is different from right hand side, and thus law of excluded middle is invalid in fuzzy soft logic. If a proposition is true with truth-value of 1, for any combination of truth-values assigned to variables, it is said to be valid. Such proposition is a tautology [39]. If proposition is true for some, but not all combinations, it is satisfiable. One tautology that is definitely applied in fuzzy soft logic applications is [ p  ( p  q )]  q . The above tautology is closely associated with modus ponens rule of inference. Another tautology that is extensively used is transitive relationship, [( p  q )  (q  r )]  ( p  r ) . Whether these propositions are valid in fuzzy soft logic depends on how connectives are defined. Or rather, connectives are defined, implication in particular, such that those propositions become valid. Closely related to implication connective is inference. Logic provides principles of reasoning, by means of inference, drawing of conclusions from assertions. The verb to infer means to conclude from evidence, deduce, or to have as logical consequence. Rules of inference specify conclusions drawn from assertions known or assumed to be true. One such commonly used rule of inference is modus ponens [39]. The generalized form to fuzzy soft logic is core of fuzzy soft reasoning. It is often presented in form of argument given in figure 4.5.

P PQ Q Figure 4.5: Modus Ponens rule of inference In other words, if P is known to be true, and P  Q is true, then Q must be true. Considering two-valued soft logic, it is seen from cayley-table for implication that given in figure 4.6, whenever P  Q and P are true then so is Q; by P true only second row is considered leaving Q true as only possibility. In such an argument assertions above the line are premises, and assertion below line the conclusion. It is to be noticed that premises are assumed to be true, not considering all possible truth combinations. On the other hand, underlying modus ponens is tautology [39], which expresses the same, but is valid for all truth-values. Therefore modus ponens is valid in fuzzy soft logic, if tautology is valid in fuzzy soft logic.

pq

Figure 4.6: The Cayley table for Fuzzy soft connective implication The inference mechanism in fuzzy soft modus ponens can be generalized [39]. Given relation R connecting logical variables p and q, possible values of q for a particular instance of p are inferred; considering vector-matrix representation, to emphasize the computer implementation, with p as column vector and R two-dimensional truth-table, with p-axis vertical, the inference is defined as qt = pt ◦ R. The operation ◦ is an inner  −  product. The  operation is same as in p  (p  q) and  operation along columns yields what can possibly be implied about q, confer rightmost implication in [p  (p  q)]  p. Assuming p is true corresponds to p  10 . But the scheme is more general, because it could also be assumed that p is false, compose with R and



11   which is  01

study what can be inferred about q. Taking for instance modus ponens, thus R  

11    (01) . The outcome qt is 01  

truth-table for p  q. Assigning p as above, qt = pt ◦ R = (01)  

truth-vector pointing at q true as only possible conclusion, as expected. For instance with

11    (11) . Thus q could be either true or false as expected.  01

p  (10) t yields qt = pt ◦ R = (10)  

The inference could even proceed in reverse direction, from q to p, but then composition must be from right side of R to match axes. Assume for instance q is true or q  (10) t , then p = R ◦ q =

11  1  1          . Hence, if q is false and p  q , p is false (modus tollens) [39]. The array  01  0   0 

based inference mechanism is even more general, because R can be any dimension n, n > 0, n  I. Given values of n − 1 variables, possible outcomes of remaining variable can be inferred by an ndimensional inner product. Furthermore, given values of n − d variables, d  I and 0 < d < n, then truth-array connecting remaining d variables can be inferred. Thus, using fuzzy soft connectives various fuzzy soft inference rules can be developed.
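The array-based inference can be written down directly as a max-min ($\vee$-$\wedge$) composition. The following sketch reproduces the modus ponens and modus tollens computations above with NumPy:

```python
import numpy as np

# A minimal sketch of the inner max-min (or-and) composition q^t = p^t o R;
# R is the truth-table for p => q.
R = np.array([[1, 1],
              [0, 1]])          # rows: p in {0, 1}; columns: q in {0, 1}

def compose(p, R):
    # q_j = max_i min(p_i, R_ij): "and" with p, then "or" along the p-axis
    return np.max(np.minimum(p[:, None], R), axis=0)

p_true = np.array([0, 1])       # truth-vector: p is true
print(compose(p_true, R))       # -> [0 1], q must be true (modus ponens)

q_false = np.array([1, 0])      # truth-vector: q is false
# composing from the right side of R infers p (modus tollens)
print(np.max(np.minimum(R, q_false[None, :]), axis=1))  # -> [1 0], p is false
```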

4.3 Solution of Rectangular Fuzzy Games by Principle of Dominance using LR-type Trapezoidal Fuzzy Numbers

LR-type trapezoidal fuzzy numbers are used here to develop the solution of rectangular fuzzy games based on the principle of interval numbers and the order relation among intervals. The technique is then applied to the payoff matrix and is used to find the solution of $2 \times 2$ games with mixed strategies [112]. This is further extended to games with no saddle point [94]. Then the dominance principle [148] is explained. All concepts are illustrated with examples.

4.3.1 Interval Numbers

An interval number is defined as $I = [X_L, X_R] = \{x : X_L \leq x \leq X_R, x \in \mathbb{R}\}$ [133]. Another way of representing an interval number is in terms of midpoint and width: $I = \langle m(I), w(I) \rangle$, where $m(I) =$ midpoint of $I = (X_L + X_R)/2$ and $w(I) =$ half width of $I = (X_R - X_L)/2$. The addition of two interval numbers $I = [X_L, X_R]$ and $J = [Y_L, Y_R]$ is $I + J = [X_L + Y_L, X_R + Y_R]$. Using the mean-width notation, if $I = \langle m_1, w_1 \rangle$ and $J = \langle m_2, w_2 \rangle$ then $I + J = \langle m_1 + m_2, w_1 + w_2 \rangle$. Similarly, the other binary operations on interval numbers are defined.

If $I = [a, b]$ and $J = [c, d]$, then $[a, b] \leq [c, d]$ iff $b \leq c$, and this is denoted by $I \leq J$. $I$ is contained in $J$ iff $a \geq c$, $b \leq d$, and this is denoted by $I \subseteq J$ [133], [135].
Definition 11: The dominance index (DI) of the proposition that $I$ is dominated over $J$ is $DI(I \preceq J) = (m_2 - m_1)/(w_1 + w_2)$. Using DI the following ranking order is defined.
Definition 12: If $DI(I \preceq J) \geq 1$, then $I$ is said to be totally dominating over $J$ in the sense of minimization, and $J$ is said to be totally dominating over $I$ in the sense of maximization. This is denoted by $I \preceq J$.
Definition 13: If $0 < DI(I \preceq J) < 1$, then $I$ is said to be partially dominating over $J$ in the sense of minimization, and $J$ is said to be partially dominating over $I$ in the sense of maximization. This is denoted by $I \precsim J$. When $DI(I \preceq J) \leq 0$, so that $m_1 \geq m_2$, emphasis may be placed on the widths of the interval numbers $I$ and $J$. If $w_1 > w_2$, then the left end point of $I$ is less than that of $J$ and there is a chance that, on finding a minimum distance, the distance may lie in $I$. But at the same time, since the right end point of $I$ is greater than that of $J$, one who prefers $I$ over $J$ in minimization may, in the worst case, lose more than one who prefers $J$ over $I$ [30], [135].
Numerical Example 7: $I = [110, 120] = \langle 115, 5 \rangle$, $J = [150, 155] = \langle 152.5, 2.5 \rangle$. $DI(I \preceq J) = (152.5 - 115)/(5 + 2.5) = 5 \geq 1$. So in minimization $I$ is totally dominating over $J$.
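A minimal Python sketch of the dominance index of Definition 11 in midpoint-width form, reproducing Numerical Example 7:

```python
# A minimal sketch of Definition 11: DI(I <= J) = (m2 - m1) / (w1 + w2)
# for interval numbers in midpoint-width form <m, w>.
def dominance_index(I, J):
    (m1, w1), (m2, w2) = I, J
    return (m2 - m1) / (w1 + w2)

# Numerical Example 7: I = [110, 120] = <115, 5>, J = [150, 155] = <152.5, 2.5>
I, J = (115, 5), (152.5, 2.5)
print(dominance_index(I, J))   # 5.0 >= 1: I totally dominates J (minimization)
```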

Definition 14 The dominated index (DI ) of proposition A  ( , a,  ) is dominated over B  ( , b,  ) is given by DI ( A  B )  (b  a ) /(   ) .Using DI index the following ranking order is defined. Definition 15 If DI ( A  B )  1 , A is said to be totally dominating over B in the sense of minimization and B is said to be totally dominating over A in the sense of maximization, it is also denoted by A  B . Definition 16 If 0  DI ( A  B )  1 , then A is said to be partially dominating over B in the sense of minimization and B is said to be partially dominating over A in the sense of maximization, it is also denoted by A  B . Lemma 1: If DI ( A  B )  0 then A and B are said to be non comparable and is denoted by A  B . In this case A is preferred over B if (a)    and    or (b)    and    , otherwise a pessimistic decision maker would prefer the number with smaller length of support whereas an optimistic decision maker would do the converse.

B  1,0.2  (1) If A  6,0.2  and then DI ( A  B)  (6  1) /(0.2  0.2)  1 . Thus, A is totally dominating over B in the sense of minimization and B is said to be totally dominating over A in the sense of maximization. Numerical

Example

8:

(2) If $A = \langle 0.4, 0.5 \rangle$ and $B = \langle 0.3, 0.5 \rangle$ then $DI(A \preceq B) = (0.4 - 0.3)/(0.5 + 0.5) = 0.1$. So $A$ is said to be partially dominating over $B$.

4.3.2 Two Person Zero Sum Games and Pay-off Matrix

Some basic definitions of two person zero sum games and the pay-off matrix are given here, which form the basic building blocks of game theory [94].

A game of two persons in which the gains of one player are the losses of the other player is called a two person zero sum game, i.e., in a two person zero sum game the algebraic sum of the gains of both players after a play is bound to be zero. Games relating to pure strategies taken by the players are considered here, based on two assumptions [148]:
1. Player A is in the better position and is called the maximizing player (or row player), and player B is called the minimizing player (or column player).
2. The total gain of one player is exactly equal to the total loss of the other player. In general, if player A takes m pure strategies and B takes n pure strategies, then the game is called a two person zero sum game or an $m \times n$ rectangular game.
Two person zero sum games are known as rectangular games since they are represented by a rectangular pay-off matrix. A pay-off matrix is always written for the maximizing player. Considering the general $m \times n$ rectangular game, the pay-off matrix of A with m pure strategies $A_1, \ldots, A_m$ and B with n pure strategies $B_1, \ldots, B_n$ is given by [148]:

$$\begin{pmatrix} \langle a_{11}, b_{11} \rangle & \cdots & \langle a_{1n}, b_{1n} \rangle \\ \vdots & \ddots & \vdots \\ \langle a_{m1}, b_{m1} \rangle & \cdots & \langle a_{mn}, b_{mn} \rangle \end{pmatrix}$$

The elements $\langle a_{ij}, b_{ij} \rangle$ are LR-type trapezoidal fuzzy numbers, and for a crisp game they may be positive, negative or zero. When player A chooses strategy $A_i$ and player B selects $B_j$, it results in a pay-off of the LR-type trapezoidal fuzzy number $\langle a_{ij}, b_{ij} \rangle$ to player A.

4.3.3 Solution of 2 × 2 Games with Mixed Strategies

Consider the fuzzy game of players A (strategies represented horizontally) and B (strategies represented vertically) whose pay-off is given by the following matrix, and for which there is no saddle point [94]:

$$\begin{pmatrix} \langle a_{11}, b_{11} \rangle & \langle a_{12}, b_{12} \rangle \\ \langle a_{21}, b_{21} \rangle & \langle a_{22}, b_{22} \rangle \end{pmatrix}$$

where the pay-offs $\langle a_{ij}, b_{ij} \rangle$ are symmetric LR-type trapezoidal fuzzy numbers. If $x_i$ and $y_j$ are the probabilities with which A chooses the $i$th strategy and B chooses the $j$th strategy, then [148]:

$x_1 = (a_{22} - a_{21})/(a_{11} + a_{22} - a_{12} - a_{21})$;  $y_1 = (a_{22} - a_{12})/(a_{11} + a_{22} - a_{12} - a_{21})$
$x_2 = (a_{11} - a_{12})/(a_{11} + a_{22} - a_{12} - a_{21})$;  $y_2 = (a_{11} - a_{21})/(a_{11} + a_{22} - a_{12} - a_{21})$

which are crisp numbers, and the value of the game can then be easily computed as $V = \langle a, b \rangle$, where $a$ and $b$ are the left and right spreads of the LR-type trapezoidal fuzzy number, given by [30]:

a  (a11a22  a12 a21 ) /(a11  a22  a12  a21 ) ; b  (b11b22  b12b21 ) /(b11  b22  b12  b21 ) 4.3.4 Concept of Dominance If one pure strategy of a player is better for him or as good as another, for all possible pure strategies of opponent then first is said to dominate the second [148]. The dominated strategy can simply be discarded from pay-off matrix since it has no value. When this is done, optimal strategies for the reduced matrix are also optimal for the original matrix with zero probability for discarded strategies. When there is no saddle point in pay-off matrix, then size of the game can be reduced by dominance, before the problem is solved. Definition 17 If all elements of the i th row of pay-off matrix of a m  n rectangular game are dominating over r th row in the sense of maximization, r th row is discarded and deletion of r th row from matrix does not change the set of optimal strategies of maximizing player. Numerical Example 9 Consider the fuzzy game of two players A (strategies represented horizontally) and B (strategies represented vertically) with the following pay-off matrix. Player A is maximizing player and player B is minimizing player.

 1,0.2  7,0.3  2,0.1   6,0.2  2,0.1  7,0.3    0,0.2  1,0.2  6,0.2 

   

DI ( A31  A21 )  (6  0) /(0.2  0.2)  1 DI ( A32  A22 )  (2  1) /(0.4  0.2)  1 DI ( A31  A21 )  (7  6) /(0.3  0.2)  1 Thus A2 is dominating over A3 in the sense of maximization and row A3 is deleted. The reduced

 1,0.2  7,0.3   2,0.1     6,0.2  2,0.1  7,0.3 

matrix is given by, 

Definition 18: If all elements of the $j$th column are dominating over the $s$th column in the sense of minimization, the $s$th column is deleted, and the deletion of the $s$th column from the matrix does not change the set of optimal strategies of the minimizing player.
Numerical Example 10: Considering the above pay-off matrix,

$DI(B_{11} \preceq B_{13}) = (2 - 1)/(0.2 + 0.1) \geq 1$
$DI(B_{21} \preceq B_{23}) = (7 - 6)/(0.2 + 0.3) \geq 1$

Here $B_1$ is totally dominating over $B_3$ in the sense of minimization, and the resultant pay-off matrix is given by:

$$\begin{pmatrix} \langle 1, 0.2 \rangle & \langle 7, 0.3 \rangle \\ \langle 6, 0.2 \rangle & \langle 2, 0.1 \rangle \end{pmatrix}$$

Definition 19 If the linear combination of p th and q th rows dominates all elements of the s th row in the sense of minimization, s th row is discarded and the deletion of s th row from matrix does not change the set of optimal strategies of maximizing player. Numerical Example 11 Considering a particular pay-off matrix of two players A (strategies represented horizontally) and B (strategies represented vertically) as follows:

  1,0.4   2,0.1   1,0.1    3,0.5   1,0.3   2,0.2      1,0.2   3,0.4   2,0.4   The convex combination of second and third row gives A4  A3  (1   ) A3 ;0    1 . Taking

  0.5 the elements of A4 are  1,0.35  ,  2,0.35  and  2,0.30  .Now A4 is dominating over A1 in the sense of maximization and row A1 is discarded such that the resulting pay-off matrix is  3,0.5  1,0.3  2,0.2   given by,    1,0.2  3,0.4  2,0.4  Definition 20 If j th column is dominated by the convex combination of m th and n th column, j th column is discarded in sense of minimization and deletion of j th column from matrix does not change the set of optimal strategies of the minimizing player. Numerical Example 12 Considering above pay-off matrix, the convex combination of B1 and B2 i.e., B4  B1  (1   ) B2 ;0    1 .

 2,0.4      1,0.3  

Taking   0.5 elements of B4 are 

Now B4 is totally dominating over B3 and thus the third column is removed such that the resulting

  3,0.5   3,0.4    1,0.2   1,0.3  

matrix is given by 

Definition 21 When there is no saddle point and no course of action dominates any other the values for different 2  2 sub games are computed. As A is maximizing player he will definitely select that pair strategies which will give the best value of 2  2 sub games and the corresponding sub matrix provides optimal solution. Similarly, B is minimizing player he will definitely select that pair of courses, which will give the least value of 2  2 sub games, the corresponding sub matrix will provide optimal solution to the fuzzy problem.

Numerical Example 13: Consider the fuzzy game whose pay-off matrix is given by:

$$\begin{pmatrix} \langle 19, 0.2 \rangle & \langle 15, 0.4 \rangle & \langle 16, 0.1 \rangle \\ \langle 0, 0.2 \rangle & \langle 20, 0.4 \rangle & \langle 5, 0.4 \rangle \end{pmatrix}$$

There is no saddle point and no course of action dominates any other. The values $V_1, V_2, V_3$ are computed from the following three $2 \times 2$ sub games obtained from the given matrix.

Sub game 1: $\begin{pmatrix} \langle 19, 0.2 \rangle & \langle 15, 0.4 \rangle \\ \langle 0, 0.2 \rangle & \langle 20, 0.4 \rangle \end{pmatrix}$; the corresponding value is $V_1 = \langle \frac{95}{6}, \frac{7}{5} \rangle$.

Sub game 2: $\begin{pmatrix} \langle 19, 0.2 \rangle & \langle 16, 0.1 \rangle \\ \langle 0, 0.2 \rangle & \langle 5, 0.4 \rangle \end{pmatrix}$; the corresponding value is $V_2 = \langle 16, 0.1 \rangle$.

Sub game 3: $\begin{pmatrix} \langle 15, 0.4 \rangle & \langle 16, 0.1 \rangle \\ \langle 20, 0.4 \rangle & \langle 5, 0.4 \rangle \end{pmatrix}$; the corresponding value is $V_3 = \langle \frac{245}{16}, \frac{11}{5} \rangle$.

Here $\min\{V_1, V_2, V_3\} = V_3$, such that the resulting pay-off matrix is $\begin{pmatrix} \langle 15, 0.4 \rangle & \langle 16, 0.1 \rangle \\ \langle 20, 0.4 \rangle & \langle 5, 0.4 \rangle \end{pmatrix}$.
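A minimal sketch of the Definition 21 procedure: enumerate the $2 \times 2$ sub games of the $2 \times 3$ matrix of Numerical Example 13 and compute their mean values with the mixed-strategy formula. Note that sub game (0, 2) actually has a saddle point, so the mixed-strategy formula does not apply to it (the thesis takes $V_2 = \langle 16, 0.1 \rangle$ directly); the sketch omits the saddle-point check for brevity.

```python
from itertools import combinations

# Enumerate the 2x2 sub games of the 2x3 matrix (means only). Caveat: the
# mixed-strategy value formula below is valid only for sub games without a
# saddle point; sub game (0, 2) has one and needs the saddle value instead.
means = [[19, 15, 16], [0, 20, 5]]

def mixed_value(a):
    # value of a 2x2 game by the section 4.3.3 formula (no saddle point)
    D = a[0][0] + a[1][1] - a[0][1] - a[1][0]
    return (a[0][0] * a[1][1] - a[0][1] * a[1][0]) / D

for j, k in combinations(range(3), 2):
    sub = [[means[0][j], means[0][k]],
           [means[1][j], means[1][k]]]
    print((j, k), mixed_value(sub))
# (0, 1) -> 95/6 ~ 15.83 and (1, 2) -> 245/16 ~ 15.31, matching V1 and V3.
```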

4.4 Classification of Financial Investments using Multi-class SVM Approach

Financial investment is an important decision making problem in Operations Research [94] which concerns the future benefits of the investor. It may be of a long-term or short-term nature. The starting point of an investment is an asset allocation strategy, i.e., finding the right mix of assets to meet one's investing needs. Some commonly used investments include equity or share investments, property or real estate investments, cash investments, retirement funds, recurring investments, and capital growth and income investments. The question to be asked while investing is [33]: what is the prime objective of the investment? The investor should know what he is trying to achieve, because different investment products exhibit different characteristics, both in terms of the returns they offer and in terms of the risks the investor incurs while trying to achieve those returns. Asset allocation is all about the right risk/return mix. The basic principle of risk and return in financial markets is more or less common sense: the higher the rewards offered, the greater the risks incurred, i.e., the riskier an investment proposition is, the higher the potential rewards need to be to encourage people to take on those risks. When it comes to risk and return, different asset classes show dramatically different characteristics. Stocks, for example, offer the highest return but they also carry the highest risk of losses. The final piece of the jigsaw is the holding period, or the time in which the investor wants to achieve the given return, because not only do different asset classes exhibit different risk/return characteristics, but those characteristics change depending on how long the asset is held. Thus, an asset allocation strategy is about trying to achieve a blend of risk and return that is right for the investor, given the investment aims and the time frame in which they are to be achieved. A risk/return characteristic associated with asset allocation is inflation. Long term investors must learn to live with intermediate swings in equity value without getting too carried away by either optimism or pessimism. Long term investors who cannot cope with temporary short-term falls in value need to hold a proportion of their assets as bonds. Shorter term investors must always hold a high proportion of their assets as debt instruments to protect against the risk of having to recoup the cash value at a time when stock markets are performing badly. The next step is the portfolio building process. Some important portfolio categories are equity portfolios, fixed income portfolios, investments in funds and alternative investments. There are two key stages to the equity portfolio building process, the first driven by the need to control risk, the second by the need to achieve returns [33]. Risk control is achieved by diversification. Returns can be realized by any number of different investment styles. The risk associated with an investment is the uncertainty of future returns; it can be measured by the variability or volatility of the share price, and it can be reduced by diversification. This idea of diversification can be generalized to movements in share prices. Share prices that do not move in opposite directions are positively correlated, which cancels the effect of diversification in such investments. Diversification benefits can be achieved wherever two investments are anything less than perfectly positively correlated. They reduce equity risk by nullifying the company specific risk inherent in any single shareholding. However, what can never be nullified is the risk inherent in holding stocks as an asset class. The key technique in building portfolios which can outperform the market is fundamental analysis [33]. It focuses on finding companies with the best business prospects. There are generally two ways of looking at company fundamental analysis, viz., the business profile, i.e., the history and current goals of the company, and the financial profile. There are two approaches to assembling a portfolio which investment managers commonly use to describe their selection technique. The top-down approach begins with an assessment of the current situation and future trends against the background of the social and economic environment. The bottom-up approach aims to identify attractive shares by starting with an assessment of the company's operations. Attractive companies are initially identified without reference to the industry sector or wider socio-economic trends. This focuses stock selection on quantitative financial figures, the company's accounts, as qualitative judgment can only be made with reference to its competitors and markets. The most sophisticated investment styles, whether top-down or bottom-up, will also run any potential stock-picks through rigorous quantitative analysis. There are two broad approaches to fixed income portfolio management, viz., passive and active. Passive management is a buy-and-hold strategy. An active portfolio seeks to profit from bond price changes created by interest rate fluctuations or by credit rating changes. Some commonly used fixed income portfolios are money markets, bonds, etc. Besides direct investment one can also invest in investment funds. An investment fund is a pooled investment vehicle, whereby a professional fund manager invests the money paid into the fund on the investors' behalf. They are pooled in the sense that the combined money of a group of investors is invested in one investment vehicle, and each investor's returns or losses are proportionate to the amount they put into the fund. Some of these funds are straightforward investment vehicles; others are designed specifically with tax efficiency in mind; and then there are funds linked to specific goals like pension plans, mortgage linked endowments and with-profits life insurance policies. Interest in alternative investment strategies grows daily.
While most investors are happy enough to see their stocks go up and up, no one can predict for sure how far an individual stock will rise or, perhaps more importantly, how far it will fall. Many investors try to cancel out the risk of having picked a bad stock by buying a number of different stocks, adding bonds to the portfolio or buying stocks of companies in different countries. This process is also known as diversification and is the best way to ensure that a portfolio will generate a positive return. There is some evidence that even if an investor diversifies a traditional portfolio of stocks and bonds internationally, it will not be enough to prevent an overall fall in value in a bear market. One of the main reasons for this lies in the globalization of the world economy [33]. Governments and companies are increasingly working towards a common agenda, so that securities listed on different international markets respond to similar fundamentals. The ability to consider all investment types allows the performance of each investment to be assessed, forecast, monitored and reviewed. This enables informed buy, sell or hold decisions to be made. With financial investment analysis, the investments currently held can be analyzed, along with investments to be monitored for possible future purchase. If a portfolio is built of all investments, the current and future net worth and income streams can be determined. This is particularly beneficial for investment and retirement planning. All projection values are provided as both future and current equivalent values, providing a true indication of future returns. In the next two subsections, Multi-class SVM [34] is used to classify financial investments, which guides an investor in assessing the credibility of an investment and thereby investing funds in that investment, such that his decision is correct and maximum returns are yielded.

4.4.1 Support Vector Machine

Consider a training sample $\{x_k, y_k\}, k = 1, \ldots, N$, where $x_k \in \mathbb{R}^d$ is the $k$th input pattern, $d$ is the dimension of the input space, and $y_k$ is the corresponding observed result, which is a binary variable $+1$ or $-1$. In the financial investment model, $x_k$ denotes the attributes of the applicant and $y_k$ is the observed result of whether the investment made by the investor yields profits or losses, i.e., whether the investment is good or bad [33], [55]. Hence, the investment is bad or non-profitable if $y_k = -1$, else $y_k = 1$. It is further assumed that the training set is linearly separable after being mapped into a higher dimensional feature space by a nonlinear function $\varphi(\cdot)$; the classifier should then be constructed as follows:

$$\begin{cases} w^T \varphi(x_k) + b \geq 1, & y_k = +1 \\ w^T \varphi(x_k) + b \leq -1, & y_k = -1 \end{cases} \qquad (1)$$

The distance between the two boundary hyperplanes is $2/\|w\|_2$ [151]. A large distance is encouraged for the purpose of generalization ability. In the real world, the training set is usually not linearly separable even after mapping into a high dimensional feature space, which means we cannot find a perfect separating hyperplane such that each $x_k$ satisfies condition (1). A soft margin is introduced to incorporate the possibility of violation. The error term $\xi_k$ of instance $k$ is defined as follows:

$$y_k [w^T \varphi(x_k) + b] \geq 1 - \xi_k, \quad \xi_k \geq 0, \quad k = 1, \ldots, N \qquad (2)$$

It is expected that training should maximize the classification margin and minimize the sum of the error terms at the same time. When the training set is not linearly separable in the feature space, the two goals usually cannot be achieved at the same time. The two-group classification problem is formulated as the following primal optimization problem [33], [55], [79]:

$$\min_{w, b, \xi_k} \Phi(w, b, \xi_k) = \frac{1}{2} w^T w + C \sum_{k=1}^{N} \xi_k$$

subject to:

$$y_k [w^T \phi(x_k) + b] \geq 1 - \xi_k, \quad \xi_k \geq 0, \quad k = 1, \ldots, N \qquad (3)$$

where the regularization parameter $C$ is a constant that trades off the two goals. The larger $C$ is, the more the error term is emphasized; a small $C$ means that a large classification margin is encouraged. By introducing Lagrange multipliers $\alpha_i$ and $\nu_i$ for the constraints in equation (3), the problem can be transformed into its dual form [33], [55], [79] as follows:

$$\max_{\alpha} J(\alpha) = \sum_{k=1}^{N} \alpha_k - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \, \phi(x_i)^T \phi(x_j)$$

subject to:

$$\sum_{k=1}^{N} \alpha_k y_k = 0, \quad 0 \leq \alpha_k \leq C, \quad k = 1, \ldots, N \qquad (4)$$

From the conditions of optimality, it follows that:

$$w = \sum_{k=1}^{N} \alpha_k y_k \, \phi(x_k) \qquad (5)$$

where $\alpha_k$ is the solution of the quadratic programming problem given by equation (4). It is to be noted that $w$ depends only on those training instances whose corresponding $\alpha$ is larger than zero. These instances are called support vectors. As noted at the beginning of the model, $\phi(x)$ is used to map the input vector into a higher dimensional space such that the two groups are linearly separable. However, the explicit form of $\phi(x)$ is still not known when solving the quadratic programming problem. The merit of the support vector machine is that, by means of a kernel function $K(x_i, x_j)$ [79], [151], which is an inner product in the feature space, it achieves linear separability of the training data in the high dimensional feature space and thus nonlinear separability in the input space. In this way, the optimal map function can be found even without specifying the explicit form of the map function $\phi(x)$. The choice of kernel includes linear, polynomial, radial-basis function and two-layer perceptron kernels. The kernel function should satisfy Mercer's conditions. Substituting $\phi(x_i)^T \phi(x_j)$ with the kernel function

$K(x_i, x_j)$ leads to the following optimization problem:

$$\max_{\alpha} J(\alpha) = \sum_{k=1}^{N} \alpha_k - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$

subject to:

$$\sum_{k=1}^{N} \alpha_k y_k = 0, \quad 0 \leq \alpha_k \leq C, \quad k = 1, \ldots, N \qquad (6)$$

After solving equation (6) and substituting $w = \sum_{k=1}^{N} \alpha_k y_k \phi(x_k)$ into the original classification problem [33], [55], the following classifier is obtained:

$$y(x) = \mathrm{sign}(w^T \phi(x) + b) = \mathrm{sign}\left(\sum_{k=1}^{N} \alpha_k y_k K(x, x_k) + b\right) \qquad (7)$$

where $b = -\frac{1}{2}(x_{+1} + x_{-1})^T w$, and $x_{+1}$ and $x_{-1}$ are two support vectors belonging to different classes, for which $y_k[w^T \phi(x_k) + b] = 1$. Here, the decision value of $x$, $\sum_k \alpha_k y_k K(x, x_k)$, is used as its financial score instead of the classifier (7) directly. A financial analyst can specify a cutoff to change the percentage of accepted instances: only if an instance's financial score is larger than the cutoff is the investment treated as good. An important extension of the standard SVM is the least squares SVM. In the latter case, the total error term [55], [151] is the sum of squared deviations of each sample. In this model, the classification problem is formulated as follows:

$$\min_{w, b, \xi_k} \Phi(w, b, \xi_k) = \frac{1}{2} w^T w + C \sum_{k=1}^{N} \xi_k^2$$

subject to:

$$y_k [w^T \phi(x_k) + b] = 1 - \xi_k, \quad k = 1, \ldots, N \qquad (8)$$
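The scoring scheme just described can be sketched in a few lines of code. The following is a minimal illustration, not the thesis implementation: the data are synthetic, and the kernel parameters and cutoff value are hypothetical choices.

```python
# A minimal sketch (not the thesis implementation) of the financial
# scoring idea above, using scikit-learn's soft-margin SVM. The feature
# matrix X, labels y and the cutoff are all hypothetical illustrations.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic applicant attributes: two Gaussian clouds standing in for
# "good" (+1) and "bad" (-1) investments.
X_good = rng.normal(loc=1.0, scale=0.8, size=(50, 4))
X_bad = rng.normal(loc=-1.0, scale=0.8, size=(50, 4))
X = np.vstack([X_good, X_bad])
y = np.array([1] * 50 + [-1] * 50)

# C trades off margin width against the sum of error terms, as in eq. (3).
clf = SVC(kernel="rbf", C=4.0, gamma=0.25)
clf.fit(X, y)

# decision_function returns sum_k alpha_k y_k K(x, x_k) + b, which plays
# the role of the financial score discussed above.
scores = clf.decision_function(X)

# An analyst-chosen cutoff (0.5 here, purely illustrative) controls the
# fraction of investments accepted as good.
cutoff = 0.5
accepted = scores > cutoff
print(f"accepted {accepted.sum()} of {len(scores)} instances")
```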

The great merit of the least squares SVM is that the classification problem can be solved by an $(N+1) \times (N+1)$ linear system instead of an $N$-dimensional quadratic programming problem.

4.4.2 Multi-class SVM for Classification of Financial Investments

Multi-class SVM is an extension of the traditional SVM. Equation (7) gives an optimal hyperplane in $\Re^n$ [34], [55]. However, more complex decision surfaces can be generated by employing a nonlinear mapping $\phi: \Re^n \rightarrow F$ while at the same time controlling their complexity and solving the same optimization problem in $F$. Equation (6) can be rewritten as follows:

$$\max_{\alpha} J(\alpha) = \sum_{k=1}^{N} \alpha_k - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \, x_i^T x_j$$

subject to:

$$\sum_{k=1}^{N} \alpha_k y_k = 0, \quad 0 \leq \alpha_k \leq C, \quad k = 1, \ldots, N \qquad (6a)$$

It is evident from equation (6a) that $x_i$ always appears in the form of the inner product $x_i^T x_j$. This implies that there is no need to evaluate the nonlinear mapping $\phi$ as long as the inner product in $F$ for given $x_i, x_j \in \Re^n$ is known. So instead of defining $\phi: \Re^n \rightarrow F$ explicitly, a function $K: \Re^n \times \Re^n \rightarrow \Re$ is introduced to define an inner product in $F$. The only requirement on the kernel $K(x, y)$ is to satisfy Mercer's condition, which states that there exists a mapping $\phi$ and an expansion

$$K(x, y) = \sum_i \phi_i(x) \phi_i(y)$$

if and only if, for any $g(x)$ such that $\int g(x)^2 \, dx$ is finite,

$$\int\!\!\int K(x, y)\, g(x)\, g(y)\, dx\, dy \geq 0.$$

Solving equation (6) gives a decision function of the form given by equation (7), whose decision boundary is a hyperplane in $F$ that translates to nonlinear boundaries in the original space. As financial investment classification problems often involve more than two classes, a binary SVM is usually not enough to solve the whole problem; rather, a $k$-class or Multi-class SVM is often found more suitable. The most common way to build a $k$-class SVM [60] is by constructing and combining several binary classifiers. To solve multi-class classification problems, the whole pattern is divided into a number of binary classification problems. The two representative ensemble schemes are one against all (one against others) and one against one [128]. One against all trains $k$ binary classifiers, each of which separates one class from the other $(k-1)$ classes. Given a point $X$ to classify, the binary classifier with the largest output determines the class of $X$. One against one constructs $k(k-1)/2$ binary classifiers whose outputs are aggregated to make the final decision. The decision tree formulation is a variant of the one against all formulation. Error correcting output codes are a general representation of the one against all or one against one formulations, using error correcting codes for encoding the outputs. The one against all approach provides better classification accuracy in comparison to the others [34], as a result of which it has been applied to the classification of financial investments. Commonly used kernels for the decision functions of a binary SVM classifier, such as the polynomial, gaussian and sigmoid kernels, may not map every dataset well into a high dimensional space. There can be other functions which satisfy Mercer's conditions and can enhance classifier accuracy by an appropriate transformation into a high dimensional space. Some of the kernel functions [34] used for the classification of financial investments are given in Table 4.8.

Kernel Function | $K(x, x_i)$ for $\lambda > 0$
Cauchy | $1 / (1 + \lambda |x - x_i|^2)$
Gaussian | $e^{-\lambda |x - x_i|^2}$
Hyperbolic Secant | $2 / (\exp(\lambda |x - x_i|) + \exp(-\lambda |x - x_i|))$
Laplace | $\exp(-\lambda |x - x_i|)$
Squared Sine | $\sin^2(\lambda |x - x_i|) / (\lambda |x - x_i|)^2$
Symmetric Triangle | $\max(1 - \lambda |x - x_i|, 0)$

Table 4.8: Kernel functions of SVM
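Such non-standard kernels can be supplied to an off-the-shelf SVM as callables. The following minimal sketch, not from the thesis, shows two of the Table 4.8 kernels written as Gram-matrix functions and plugged into a one-against-all ensemble; the data and the value of λ (`lam`) are purely illustrative.

```python
# A minimal sketch showing how the non-standard kernels of Table 4.8
# could be plugged into a one-against-all Multi-class SVM. scikit-learn
# accepts a callable returning the Gram matrix; lam is the lambda
# parameter of Table 4.8 (value here is illustrative).
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.metrics.pairwise import euclidean_distances

lam = 0.25

def cauchy_kernel(A, B):
    # K(x, x_i) = 1 / (1 + lam * |x - x_i|^2)
    return 1.0 / (1.0 + lam * euclidean_distances(A, B) ** 2)

def symmetric_triangle_kernel(A, B):
    # K(x, x_i) = max(1 - lam * |x - x_i|, 0)
    return np.maximum(1.0 - lam * euclidean_distances(A, B), 0.0)

# One-against-all ensemble: one binary SVM per class, largest output wins.
rng = np.random.default_rng(1)
X = rng.normal(size=(90, 4))
y = rng.integers(0, 3, size=90)          # three synthetic classes

clf = OneVsRestClassifier(SVC(kernel=cauchy_kernel, C=16.0))
clf.fit(X, y)
print(clf.predict(X[:5]))
```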

4.5 Experimental Results and Comparisons

In this section some experimental results are presented, conducted on some well known data sets of varying dimension and size.

4.5.1 Application of Fuzzy Soft Relations to Decision Making Problems

The concept of fuzzy soft relation and its generalization can be used effectively for solving a wide range of decision making problems. Using fuzzy soft relations there is an inherent reduction in computational effort. This fact is illustrated by several real life applications considered in this section. Six real life applications, viz., house acquisition, job allocation, investment portfolio, fund sources, manpower recruitment and product marketing, show how fuzzy soft relations can be used to generate effective solutions with the least possible effort. In all the applications, the membership values of the fuzzy soft sets are determined by considering the parameter set E. Generally, two important membership functions, the trapezoidal and triangular membership functions, are used for all cases. The final decision result changes if different membership values are given. The advantages of fuzzy soft relations are also illustrated by comparing with other methods, viz., probability and possibility distributions. The different values of the probability and possibility distributions are obtained through various real life simulations. Using fuzzy soft relations a variation of the house acquisition problem is solved, which was solved earlier by [106]. Let U = {h1, h2, h3, h4, h5, h6, h7} be a set of seven houses and E = {expensive, wooden, beautiful, cheap, in green surroundings, concrete, moderately beautiful, by the roadside} be the set of parameters. Let (F1, A1) be the fuzzy soft set which describes the cost of the houses, given by (F1, A1) = {F1 (cheap) = {h1/ 1, h2/ 0, h3/ 1, h4/ 0.2, h5/ 1, h6/ 0.2, h7/ 1}, F1 (expensive) = {h1/ 0, h2/ 1, h3/ 0.1, h4/ 0.9, h5/ 0.3, h6/ 1, h7/ 0.7}}. Let (F2, A2) be the fuzzy soft set which describes the attractiveness of the houses, given by (F2, A2) = {F2 (beautiful) = {h1/ 1, h2/ 0.4, h3/ 1, h4/ 0.4, h5/ 0.6, h6/ 0.8, h7/ 0.7}, F2 (moderately beautiful) = {h1/ 0.3, h2/ 0.7, h3/ 0.5, h4/ 0.6, h5/ 0.2, h6/ 0.3, h7/ 0.4}}. Let (F3, A3) be the fuzzy soft set which describes the physical trait of the houses, given by (F3, A3) = {F3 (wooden) = {h1/ 0.2, h2/ 0.3, h3/ 1, h4/ 1, h5/ 1, h6/ 0, h7/ 1}, F3 (concrete) = {h1/ 0.7, h2/ 0.9, h3/ 0, h4/ 0.1, h5/ 0.3, h6/ 0.8, h7/ 0.6}}. Similarly, let (F4, A4) be the fuzzy soft set which describes the characteristics of the place where the houses are located, given by (F4, A4) = {F4 (in green surroundings) = {h1/ 1, h2/ 0.1, h3/ 0.5, h4/ 0.3, h5/ 0.2, h6/ 0.3, h7/ 1}, F4 (by the roadside) = {h1/ 0.2, h2/ 0.7, h3/ 0.8, h4/ 1, h5/ 0.5, h6/ 0.9, h7/ 0.6}}. Suppose that Mr.

Jones is interested in buying a house on the basis of his choice of the parameters beautiful, wooden, cheap, in green surroundings. This implies that from the houses available in U, he should select the house that satisfies all the parameters of his choice. The problem can be solved by virtue of definition 7, using a fuzzy soft relation (R, C) among the fuzzy soft sets (F1, A1), (F2, A2), (F3, A3) and (F4, A4) of the houses of U which are cheap, beautiful, wooden and in green surroundings. By the definition of fuzzy soft relation, (R, C) is given by (R, C) = {R (cheap, beautiful, wooden, in green surroundings) = {h1/ 0.2, h2/ 0, h3/ 0.5, h4/ 0.2, h5/ 0.2, h6/ 0, h7/ 0.7}}. Thus, the house which best satisfies the requirements of Mr. Jones's choice is the house which has the largest membership value in the relation. Here, h7 has the largest membership value, equal to 0.7; hence Mr. Jones will buy the house h7. It is noted that the solution of the above problem obtained by [106] requires calculating the row sum, column sum and membership score for each house. This requires more computational time compared to the solution obtained by using the fuzzy soft relation. So the method above is more efficient and economical. Consider another decision making problem of allocating a particular job to the best possible person who fulfills the requirements of the job [39]. The problem is adopted from a job allocation problem in the Indian industrial scenario. Let U = {p1, p2, p3, p4, p5, p6} be a crisp set of six persons for the job. Let E = {enterprising, average, confident, confused, willing to take risks, unwilling to take risks} be the set of parameters. Let (F1, A1) = {F1 (enterprising) = {p1/ 0.5, p2/ 0.7, p3/ 0.3, p4/ 0.1, p5/ 0.8, p6/ 0.9}, F1 (average) = {p1/ 0.3, p2/ 0.1, p3/ 0.5, p4/ 0.8, p5/ 0.05, p6/ 0.7}} be the fuzzy soft set describing the enterprising qualities of the persons. Again, let (F2, A2) = {F2 (confident) = {p1/ 0.6, p2/ 0.8, p3/ 0.5, p4/ 0.2, p5/ 0.9, p6/ 0.8}, F2 (confused) = {p1/ 0.3, p2/ 0.1, p3/ 0.7, p4/ 0.9, p5/ 0.5, p6/ 0.6}} be the fuzzy soft set describing the confidence level of the persons. Similarly, let (F3, A3) = {F3 (willing to take risks) = {p1/ 0.7, p2/ 0.8, p3/ 0.5, p4/ 0.2, p5/ 0.6, p6/ 0.5}, F3 (unwilling to take risks) = {p1/ 0.3, p2/ 0.07, p3/ 0.65, p4/ 0.95, p5/ 0.1, p6/ 0.6}} be the fuzzy soft set describing the willingness level of the persons. Assuming that the particular job requires an enterprising, confident person who is willing to take risks, the problem is to find the candidate who best suits the requirements of the job. To solve this problem definition 7 is used, giving a fuzzy soft relation (R, C) of the fuzzy soft sets (F1, A1), (F2, A2), (F3, A3) of all candidates who are enterprising, confident and willing to take risks. By definition, (R, C) is given by (R, C) = {p1/ 0.21, p2/ 0.45, p3/ 0.08, p4/ 0.05, p5/ 0.43, p6/ 0.36}. From the relation it is evident that the most suitable candidate for the job is p2, who possesses the greatest membership value in the relation (R, C). Now, probability and possibility distributions for the above problem corresponding to one specific parameter are presented. Considering the risk taking parameter, viz., willing to take risks, from the parameter set E, the following probability distribution prob is obtained for the persons pi; i = 1,………,6 from the set U.

pi | 1 | 2 | 3 | 4 | 5 | 6
prob (pi) | 0.25 | 0.55 | 0.1 | 0.1 | 0 | 0

Table 4.9: Probability values of person pi with respect to the risk taking parameter

Again, the fuzzy set expressing the risk taking attitude of the persons pi; i = 1,………,6 from the set U may be expressed using the following possibility distribution π:

pi | 1 | 2 | 3 | 4 | 5 | 6
π (pi) | 1 | 1 | 1 | 1 | 0.8 | 0.7

Table 4.10: Possibility values of person pi with respect to the risk taking attitude

It is to be noted that each possibility is at least as high as the corresponding probability. Further, the sum of prob (pi) is always equal to 1, but the sum of π (pi) may be equal to, greater than or less than 1. It is obvious from the above discussion that it is much easier to represent a large amount of information, viz., different parameters, using fuzzy soft relations, which is a prime requirement in most decision making situations, because the final decision to the problem depends on various associated parameters. Besides this, using probability and possibility distributions only a partial representation of the information is possible, which leads to final decision results that are inaccurate and incomplete. Finally, the solution to the problem is obtained with minimal computational effort using fuzzy soft relations.
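The selection scheme used in these examples can be written out in a few lines. Definition 7 is not restated in this section; the published scores of the job allocation and investment examples are consistent with a componentwise product of the chosen parameters' membership values (e.g., p2: 0.7 × 0.8 × 0.8 ≈ 0.45), so the following minimal sketch assumes that operation and uses the job allocation data.

```python
# A minimal sketch of fuzzy soft relation based selection. Definition 7
# is assumed here to combine membership values by componentwise product,
# which reproduces the published job allocation scores for most
# candidates.
persons = ["p1", "p2", "p3", "p4", "p5", "p6"]

# Membership values for the three chosen parameters (from the text above).
enterprising = [0.5, 0.7, 0.3, 0.1, 0.8, 0.9]
confident    = [0.6, 0.8, 0.5, 0.2, 0.9, 0.8]
willing      = [0.7, 0.8, 0.5, 0.2, 0.6, 0.5]

# Fuzzy soft relation (R, C): combine the chosen parameters per candidate.
relation = {
    p: round(e * c * w, 2)
    for p, e, c, w in zip(persons, enterprising, confident, willing)
}
print(relation)                      # {'p1': 0.21, 'p2': 0.45, ...}

# The decision is the candidate with the largest membership value.
best = max(relation, key=relation.get)
print("selected:", best)             # selected: p2
```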

The investment portfolio problem is simulated from ICICI Prudential Financial Services, India [39]. Let U = {i1, i2, i3, i4, i5, i6} be the set of six investments at the disposal of an investor to invest some money and E = {investment price, advance mobilization, period, returns, risk, security} be the set of parameters. Let (F1, A1) be the fuzzy soft set which describes the attractiveness of the investments to the customers, given by (F1, A1) = {F1 (investment price) = {i1/ 0.1, i2/ 0.7, i3/ 0.4, i4/ 0.9, i5/ 0.6, i6/ 0.5}, F1 (advance mobilization) = {i1/ 0.7, i2/ 0.1, i3/ 1, i4/ 0.8, i5/ 0.4, i6/ 0.5}}. Let (F2, A2) be the fuzzy soft set which describes the rate of returns on the investments, given by (F2, A2) = {F2 (period) = {i1/ 0.5, i2/ 0.6, i3/ 0.5, i4/ 0.8, i5/ 0.7, i6/ 1}, F2 (high returns) = {i1/ 0.9, i2/ 0.6, i3/ 0.3, i4/ 1, i5/ 0.7, i6/ 0.8}}. Let (F3, A3) be the fuzzy soft set which describes the risk factor of the investments, given by (F3, A3) = {F3 (risk) = {i1/ 0.9, i2/ 0.8, i3/ 0.7, i4/ 0.6, i5/ 1, i6/ 0.5}, F3 (security) = {i1/ 0.4, i2/ 0.7, i3/ 0.2, i4/ 0.3, i5/ 1, i6/ 0.9}}. Consider that the person wishes to have an investment which has advance mobilization, gives high returns and is of a secured nature. The problem involves finding the investment which gives maximum returns to the person. The problem is solved using definition 7, with a fuzzy soft relation (R, C) of the fuzzy soft sets (F1, A1), (F2, A2), (F3, A3) of all investments which are advance mobilized, give high returns and are of secured nature. By definition, (R, C) is given by (R, C) = {i1/ 0.25, i2/ 0.04, i3/ 0.06, i4/ 0.24, i5/ 0.28, i6/ 0.36}. From the relation it is obvious that the most profitable investment for the person is i6, which has the greatest membership value in the relation (R, C). Considering the investment parameter, viz., advance mobilization, from the parameter set E, the following probability distribution prob is obtained for the investments ik; k = 1,………,6 from the set U.

ik | 1 | 2 | 3 | 4 | 5 | 6
prob (ik) | 0.19 | 0.36 | 0.2 | 0.1 | 0.1 | 0.05

Table 4.11: Probability values of investment ik with respect to advance mobilization

Again, the fuzzy set expressing advance mobilization of the investments ik; k = 1,………,6 from the set U may be expressed using the following possibility distribution π:

ik | 1 | 2 | 3 | 4 | 5 | 6
π (ik) | 0.36 | 0.69 | 1 | 0.54 | 1 | 0.86

Table 4.12: Possibility values of investment ik with respect to advance mobilization

As evident from the above discussion, a large amount of information is easily represented using fuzzy soft relations as compared to other methods, which leads to much more precise and accurate decision results. Also, the computational effort is minimized using fuzzy soft relations. The fund sources problem is taken from Axis Bank, India [39]. Let U = {s1, s2, s3, s4, s5} be the set of five fund sources available to a manager in the banking system and E = {term deposit, demand

deposit, fund pricing, fund mobility, liquidity, investment} be the set of parameters. Let (F1, A1) be the fuzzy soft set which describes the maturity pattern of deposits, given by (F1, A1) = {F1 (term deposit) = {s1/ 0.95, s2/ 0.86, s3/ 0.79, s4/ 1, s5/ 0.21}, F1 (demand deposit) = {s1/ 0.75, s2/ 0.69, s3/ 0.58, s4/ 0.46, s5/ 0.29}}. Let (F2, A2) be the fuzzy soft set which describes the competition in the fund market, given by (F2, A2) = {F2 (fund pricing) = {s1/ 0.15, s2/ 0.27, s3/ 0.37, s4/ 0.78, s5/ 0.35}, F2 (fund mobility) = {s1/ 0.5, s2/ 0.66, s3/ 0.7, s4/ 0.19, s5/ 1}}. Let (F3, A3) be the fuzzy soft set which describes the strength of the organization to fulfill its commitments, given by (F3, A3) = {F3 (liquidity) = {s1/ 1, s2/ 0.75, s3/ 0.53, s4/ 0.48, s5/ 0.96}, F3 (investment) = {s1/ 0.24, s2/ 0.39, s3/ 0.85, s4/ 1, s5/ 0.44}}. Assume that the manager intends to have a fund source which possesses the attributes term deposit, fund mobility and liquidity. This means that from the fund sources in U, he must select the fund source that satisfies all the parameters of his requirements. To solve this problem definition 7 is used, giving a fuzzy soft relation (R, C) of the fuzzy soft sets (F1, A1), (F2, A2), (F3, A3) of all fund sources which have the attributes term deposit, fund mobility and liquidity. By definition, (R, C) is given by (R, C) = {s1/ 0.48, s2/ 0.43, s3/ 0.24, s4/ 0.09, s5/ 0.20}. Thus, the fund source which best satisfies the requirements of the manager's choice is the fund source which has the largest membership value in the relation. Here, s1 has the largest membership value, equal to 0.48; hence the manager will choose the fund source s1. Considering the fund mobility parameter from the set E, the following probability distribution prob is obtained for the fund sources mk; k = 1,………,5 from the set U.

mk | 1 | 2 | 3 | 4 | 5
prob (mk) | 0.35 | 0.21 | 0.04 | 0.2 | 0.2

Table 4.13: Probability values of fund source mk with respect to fund mobility

Again, the fuzzy set expressing fund mobility of the fund sources mk; k = 1,………,5 from the set U may be expressed using the following possibility distribution π:

mk | 1 | 2 | 3 | 4 | 5
π (mk) | 1 | 1 | 1 | 1 | 0.46

Table 4.14: Possibility values of fund source mk with respect to fund mobility

From the above discussion it is evident that fuzzy soft relations represent voluminous information easily as compared to other methods, from which more precise and less vague decision results are obtained. The computational effort required is also less. The manpower recruitment problem is adopted from Tata Consultancy Services, India [39]. Let U = {m1, m2, m3, m4, m5, m6, m7} be a set of seven programmers to be recruited by the human resources manager of a software development organization and E = {hardworking, disciplined, honest, obedient, intelligence, innovative, entrepreneurial attitude, aspirant} be the set of parameters. Let (F1, A1) be the fuzzy soft set which describes the punctuality of the programmers, given by (F1, A1) = {F1 (hardworking) = {m1/ 0.17, m2/ 1, m3/ 0.88, m4/ 0.26, m5/ 0.55, m6/ 0.28, m7/ 0.98}, F1 (disciplined) = {m1/ 1, m2/ 0.33, m3/ 0.7, m4/ 0.64, m5/ 0.4, m6/ 0.3, m7/ 0.57}}. Let (F2, A2) be the fuzzy soft set which describes the truth in the behavior of the programmers, given by (F2, A2) = {F2 (honest) = {m1/ 0.09, m2/ 0.81, m3/ 0.05, m4/ 1, m5/ 0.45, m6/ 0.24, m7/ 0.18}, F2 (obedient) = {m1/ 1, m2/ 0.56, m3/ 1, m4/ 0.04, m5/ 0.65, m6/ 0.97, m7/ 1}}. Let (F3, A3) be the fuzzy soft set which describes the innovativeness in the programmers' attitude, given by (F3, A3) = {F3 (intelligence) = {m1/ 0.13, m2/ 0.93, m3/ 0.08, m4/ 0.36, m5/ 1, m6/ 0.48, m7/ 0.47}, F3 (innovative) = {m1/ 0.54, m2/ 0.22, m3/ 0.16, m4/ 0.42, m5/ 0.5, m6/ 0.2, m7/ 0.99}}. Let (F4, A4) be the fuzzy soft set which describes the exploratory attitude of the programmers, given by (F4, A4) = {F4 (entrepreneurial attitude) = {m1/ 1, m2/ 0.72, m3/ 0.7, m4/ 0.64, m5/ 0.7, m6/ 0.8, m7/ 0.65}, F4 (aspirant) = {m1/ 0.14, m2/ 0.3, m3/ 0.82, m4/ 0.62, m5/ 1, m6/ 0.05, m7/ 0.77}}. Assume that the human resources manager wants to recruit a programmer who has qualities like hardworking, honest, innovative and entrepreneurial attitude. Thus, from the available candidates in U, that programmer is selected who satisfies all the parameters of the requirements. To solve this problem definition 7 is used, giving a fuzzy soft relation (R, C) of the fuzzy soft sets (F1, A1), (F2, A2), (F3, A3), (F4, A4) of all programmers who have the qualities hardworking, honest, innovative and entrepreneurial attitude. By definition, (R, C) is given by (R, C) = {m1/ 0.01, m2/ 0.13, m3/ 0.05, m4/ 0.07, m5/ 0.09, m6/ 0.01, m7/ 0.11}. Thus, from the above calculations it is inferred that the second programmer, m2, has the largest membership value, i.e., 0.13, in the relation; hence the human resources manager selects the second programmer for the software development job. Considering the innovative parameter from the parameter set E, the following probability distribution prob is obtained for the recruitments ri; i = 1,………,7 from the set U.

ri | 1 | 2 | 3 | 4 | 5 | 6 | 7
prob (ri) | 0.19 | 0.16 | 0.15 | 0.15 | 0.05 | 0.16 | 0.14

Table 4.15: Probability values of recruitment ri with respect to the innovativeness parameter

Again, the fuzzy set expressing the innovativeness parameter for the recruitments ri; i = 1,………,7 from the set U may be expressed using the following possibility distribution π:

ri | 1 | 2 | 3 | 4 | 5 | 6 | 7
π (ri) | 1 | 1 | 1 | 0.55 | 1 | 0.69 | 1

Table 4.16: Possibility values of recruitment ri with respect to the innovativeness parameter

From the above discussion it is evident that fuzzy soft relations represent voluminous information easily as compared to other methods, from which more precise and less vague decision results are obtained. The computational effort required is also less. The product marketing problem is simulated from Khosla Electronics, Kolkata, India [39]. Let U = {t1, t2, t3, t4, t5, t6, t7} be the set of seven brands of televisions to be sold in an international market by a retail outlet owner and E = {price, modern technology, portability, screen size, weight, longevity, picture clarity, audible sound} be the set of parameters. Let (F1, A1) be the fuzzy soft set which describes the price effectiveness of the televisions, given by (F1, A1) = {F1 (price) = {t1/ 0.6, t2/ 0.5, t3/ 0.56, t4/ 1, t5/ 0.01, t6/ 0, t7/ 0.99}, F1 (modern technology) = {t1/ 1, t2/ 0.75, t3/ 0.43, t4/ 0.33, t5/ 1, t6/ 0.83, t7/ 0.04}}. Let (F2, A2) be the fuzzy soft set which describes the televisions' lightness aspect, given by (F2, A2) = {F2 (portability) = {t1/ 0.06, t2/ 0.7, t3/ 1, t4/ 0.05, t5/ 0, t6/ 1, t7/ 0.8}, F2 (weight) = {t1/ 1, t2/ 0.87, t3/ 0.03, t4/ 0.23, t5/ 0.16, t6/ 0.75, t7/ 1}}. Let (F3, A3) be the fuzzy soft set which describes the dimensionality of the televisions, given by (F3, A3) = {F3 (screen size) = {t1/ 0.61, t2/ 0.1, t3/ 0.2, t4/ 0.25, t5/ 0.67, t6/ 0.05, t7/ 1}, F3 (audible sound) = {t1/ 0.83, t2/ 1, t3/ 0.21, t4/ 0.45, t5/ 0, t6/ 0.74, t7/ 0.84}}. Let (F4, A4) be the fuzzy soft set which describes the durability of the televisions, given by (F4, A4) = {F4 (longevity) = {t1/ 1, t2/ 0.4, t3/ 0.7, t4/ 0.55, t5/ 1, t6/ 0.91, t7/ 0.97}, F4 (picture clarity) = {t1/ 0.12, t2/ 0.89, t3/ 0.39, t4/ 1, t5/ 0.6, t6/ 0, t7/ 0.46}}. It is assumed that the retail owner wants to maximize his profits by selling the television brand which possesses attributes such as modern technology, portability, audible sound and picture clarity. Hence from the available brands of

television sets in U, he must select the television brand which satisfies all the parameters of the requirements. To solve this problem definition 7 is used, giving a fuzzy soft relation (R, C) of the fuzzy soft sets (F1, A1), (F2, A2), (F3, A3), (F4, A4) of all brands of television sets which have the attributes modern technology, portability, audible sound and picture clarity. By definition, (R, C) is given by (R, C) = {t1/ 0.01, t2/ 0.46, t3/ 0.04, t4/ 0.08, t5/ 0, t6/ 0, t7/ 0.01}. Thus, the television brand which best meets the requirements of the retail owner is the television brand which has the maximum membership value in the relation. Here, t2 has the largest membership value, equal to 0.46; hence the retail outlet owner will consider the television brand t2 to maximize his profits. Considering the price parameter from the parameter set E, the following probability distribution prob is obtained for the products ti; i = 1,………,7 from the set U.

ti | 1 | 2 | 3 | 4 | 5 | 6 | 7
prob (ti) | 0.16 | 0.04 | 0.1 | 0.1 | 0.15 | 0.3 | 0.15

Table 4.17: Probability values of product ti with respect to the price parameter

Again, the fuzzy set expressing the price parameter for the products ti; i = 1,………,7 from the set U may be expressed using the following possibility distribution π:

ti | 1 | 2 | 3 | 4 | 5 | 6 | 7
π (ti) | 1 | 0.89 | 1 | 1 | 1 | 0.96 | 1

Table 4.18: Possibility values of product ti with respect to the price parameter

From the above discussion it is evident that fuzzy soft relations represent voluminous information easily as compared to other methods, from which more precise and less vague decision results are obtained. The computational effort required is less.

4.5.2 Application of LR-type Trapezoidal Fuzzy Numbers to Dominance Problem

In this section, the dominance method [30] discussed in section 4.3 is illustrated by means of a fuzzy game involving two players A (strategies represented horizontally) and B (strategies represented vertically) whose pay-off matrix is as follows:

  8,0.3   15,0.4   4,0.1   2,0.4  19,0.1   15,0.5   17,0.4   16,0.1      0,0.3   20,0.2   15,0.5   5,0.4   All DI ( A1  A2 )  1, so A2 is totally dominating over A1 in the sense of minimization and row

A1 is deleted, such that the resulting pay-off matrix is:  19,0.1   15,0.5   17,0.4   16,0.1    0.0.3   20,0.2   15,0.5   5,0.4    

Again, all $DI(B_4 \rightarrow B_3) = 1$, so $B_4$ totally dominates $B_3$ in the sense of minimization and the column $B_3$ is deleted, such that the resulting pay-off matrix is:

$$\begin{pmatrix} \langle 19, 0.1 \rangle & \langle 15, 0.5 \rangle & \langle 16, 0.1 \rangle \\ \langle 0, 0.3 \rangle & \langle 20, 0.2 \rangle & \langle 5, 0.4 \rangle \end{pmatrix}$$

Here no course of action dominates any other and there is no saddle point. The values for the different $2 \times 2$ pairs of strategies are computed. Since B is the minimizing player, the minimum value is considered and the corresponding pay-off matrix provides the optimal solution to the fuzzy problem. The least value of $\{V_1, V_2, V_3\}$ is $V_3$, so the optimal strategies of A are $(A_2, A_3)$ and of B are $(B_2, B_4)$. The final pay-off matrix is given by:

  15,0.5   16,0.1   20,0.2   5,0.4     The probabilities are x1  0, x 2 

15 1 1 15 , x3  , y1  0, y 2  , y 3  0, y 4  and value of 16 16 16 16

245 11 ,  . Hence, optimal solution of the complete game is (0, x 2 , x3 ) for A 16 5 and (0, y 2 ,0, y 4 ) for B ; the value of game being V . the game is V 
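The mixed strategy computation for the final 2 × 2 game can be checked with a few lines of code. The sketch below uses the standard closed-form solution for a 2 × 2 zero-sum game without a saddle point, applied to the mean values of the fuzzy pay-offs; it is a verification aid, not part of the dominance method itself.

```python
# A minimal sketch verifying the mixed strategies and value of the final
# 2x2 game above, using the standard closed-form solution for a 2x2
# zero-sum game without a saddle point (mean values of the fuzzy pay-offs).
from fractions import Fraction as F

a11, a12 = F(15), F(16)   # row A2 against columns B2, B4
a21, a22 = F(20), F(5)    # row A3 against columns B2, B4

denom = a11 + a22 - a12 - a21
p = (a22 - a21) / denom          # probability of row A2
q = (a22 - a12) / denom          # probability of column B2
v = (a11 * a22 - a12 * a21) / denom

print(p, 1 - p)   # 15/16 1/16   -> x2, x3
print(q, 1 - q)   # 11/16 5/16   -> y2, y4
print(v)          # 245/16       -> value of the game
```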

4.5.3 Application of Multi-class SVM for Classification Problem

The performance of Multi-class SVM is evaluated using different kernel functions on the iris, wine and glass datasets of the UCI Machine Learning repository [34], as well as a financial dataset taken from the Weekly Dow Jones Industrial Average, 1900–1989. The iris dataset records the physical dimensions of three different classes of iris flowers. There are four attributes in this dataset. The class setosa is linearly separable from the other two classes, whereas the latter two are not linearly separable from each other. The wine dataset was obtained from the chemical analysis of wines produced in the same region of Italy but derived from three different cultivators. There are 13 attributes and 178 patterns in this dataset, with three classes corresponding to the three different cultivators. The glass dataset was collected for the study of different types of glass, motivated by criminological investigations: glass left at the scene of a crime can be used as evidence if it is correctly identified. The glass dataset contains 214 cases, with 9 attributes and 6 classes. The Weekly Dow Jones Industrial Average 1900–1989 dataset lists an important aggregate stock price index beginning in 1900. There are 4 attributes and 7 classes in this dataset. A daily version of this dataset is also available, starting from 1915. The generalization performance is evaluated via ten-fold cross-validation for each dataset. The one against all method is considered for designing the Multi-class SVM. For a $k$-class problem, the Multi-class SVM is developed by combining $k$ binary SVMs with the same values of $C$ and $\sigma$, and its performance is tested for different choices of kernel functions on the predefined datasets. The most important criterion for evaluating the performance of Multi-class SVM is its accuracy rate. For each Multi-class SVM and given kernel function, the hyperparameter space is explored on a two dimensional grid with the following values:

  [2 10 ,2 9 ,2 8 ,2 7 ,2 6 ,2 5 ,2 4 ] ; C  [2 2 ,2 1 ,2 0 ,......... ......, 212 ] Data Set Iris

Kernel

Gauss

Cauchy

Laplace

Best

100 (24, 2-5) 98 (24, 2-5) 94.44 (25, 210 ) 82.78 (25, 210 ) 86.34 (2-1, 24 ) 69.10 (20, 21)

100 (28, 2-10) 96.67 (28, 2-10) 94.44 (29, 2-10)

98 (25, 2-6) 86 (2-1, 27 )

Average Wine

Best

Average

Glass

Best

Average Weekly Dow Jones

Best Average

Squared Sine 100 (22, 2-1) 97.33 (22, 2-1) 100 (212, 2-8)

Symmetric Triangle 100 (28, 2-1) 96.67 (28, 2-10) 100 (212, 2-10)

C4.5

100 (23, 2-5) 96.67 (23, 2-5) 100 (29, 2-10)

Hyper Secant 100 (26, 2-3) 98 (26, 2-3) 100 (212, 2-9)

81.11 (29, 2-10)

81.67 (29, 2-10)

94.44 (212, 2-9)

96.11 (212, 2-9)

82.22 (212, 2-10)

92.22

77.27 (24, 21)

81.82 (22, 21)

86.36 (20, 20)

86.36 (20, 20)

81.82 (20, 2-1)

81.82

72.82 (21, 22)

70.91 (20, 20)

70.91 (2-1, 21)

70.46 (2-1, 21)

69.54 (20, 20)

72.82

96.33 (29, 2-10) 87.27 (26, 21)

95.67 (23, 2-7) 86.82 (22, 21)

97.33 (26, 2-4) 89.89 (20, 20)

96.33 (22, 2-4) 84.36 (20, 20)

95 (28, 2-11) 84.82 (20, 2-1)

96

100 94 100

86.82

Table 4.19: Comparison of classifier accuracy (%) using different kernel functions, with optimal (C, σ)

Data Set | One against one | DAG | One against all | C&S | W&W | C4.5 | Proposed method (one against all)
Iris | 97.33 (2^12, 2^-9) | 96.67 (2^12, 2^-8) | 96.67 (2^9, 2^-3) | 97.33 (2^10, 2^-7) | 97.33 (2^12, 2^-8) | 100 | 100 (Gauss, 2^8, 2^-3)
Wine | 99.43 (2^7, 2^-10) | 98.87 (2^6, 2^-9) | 98.87 (2^7, 2^-6) | 98.87 (2^1, 2^-3) | 98.87 (2^8, 2^-1) | 100 | 100 (Laplace, 2^9, 2^-10)
Glass | 72.50 (2^11, 2^-2) | 73.83 (2^12, 2^-3) | 72.96 (2^11, 2^-2) | 72.96 (2^4, 2^1) | 72.04 (2^9, 2^-4) | 81.82 | 86.36 (Hyper Secant, 2^0, 2^0)
Weekly Dow Jones | 97.33 (2^12, 2^-10) | 96.67 (2^12, 2^-9) | 96.67 (2^9, 2^-6) | 97.33 (2^10, 2^-7) | 97.33 (2^12, 2^-9) | 100 | 100 (Squared Sine, 2^8, 2^-5)

Each entry gives the accuracy (%) with the optimal (C, σ).

Table 4.20: Comparison of classifier accuracy using different methods for Multi-class SVM

For all 255 possible combinations of $C$ and $\sigma$, the best and average cross-validation accuracy is computed and rounded to three decimal places. All tests are performed on a Pentium IV microprocessor with 512 MB RAM. Figure 4.7 shows the average cross-validation accuracy of the Multi-class SVM classifier for the glass dataset using the Gaussian kernel as a function of the two parameters $C$ and $\sigma$. The figure shows the variation in accuracy for 121 combinations of $C$ and $\sigma$ only. The optimal values of the parameters can be chosen by locating the maximum value of the average accuracy attained on the grid. Similar experiments were performed on the different datasets using the other kernel functions. It is observed that Multi-class SVM demonstrates better accuracy for certain values of $C$ and $\sigma$. The significance of choosing appropriate values of $C$ and $\sigma$ can be realized from the 3D plot; an analysis has also been done on a cross-sectional 2D view of the same. The accuracy of the classifier for a given dataset is influenced by all possible combinations of $C$ and $\sigma$ using different kernel functions. The best and average cross-validation accuracies are shown in Table 4.19 with their optimal parameters $C$ and $\sigma$. For comparison with Multi-class SVM, the decision tree construction algorithm C4.5 [3] has also been applied on the same datasets for determining the best and average cross-validation accuracy; it can be observed that the results obtained are comparable to or better than those of C4.5 for all datasets. Similarly, Table 4.20 compares the accuracy of the Multi-class SVM classifier with the results obtained by C4.5 and available results [34]. The best results in each category are indicated in bold. From Table 4.20 it can be observed that the obtained results are better in each category.

Figure 4.7: Average Classifier accuracy for Glass dataset using Gauss kernel
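The grid exploration described above can be reproduced with a short script. The following is a minimal sketch, not the thesis code: it runs ten-fold cross-validation over the stated $(C, \sigma)$ grid for a one-against-all SVM with a Gaussian kernel on the iris dataset; scikit-learn's gamma is used in place of $\sigma$, which is an assumed correspondence.

```python
# A minimal sketch of the (C, sigma) grid search with ten-fold
# cross-validation; gamma stands in for sigma (an assumption about the
# parameterization), and only the iris dataset is shown.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = {
    "estimator__C": [2.0 ** k for k in range(-2, 13)],
    "estimator__gamma": [2.0 ** k for k in range(-10, -3)],
}

search = GridSearchCV(
    OneVsRestClassifier(SVC(kernel="rbf")),
    param_grid,
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
    scoring="accuracy",
)
search.fit(X, y)
print("best accuracy:", round(100 * search.best_score_, 2))
print("best params:", search.best_params_)
```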

4.6 Conclusion

In this chapter, the concepts of Soft relation and Fuzzy Soft relation are presented to solve various decision making problems in the engineering, management and social science domains. These problems often involve data of an imprecise, uncertain and vague nature. A number of solutions have been proposed for such problems in the past using Probability theory, Fuzzy set theory, Rough set theory, Vague set theory, Approximate Reasoning theory etc. These techniques however lack parameterization tools, due to which they could not be applied successfully in dealing with such problems. The Soft set and Fuzzy Soft set concepts possess certain parameterization features, are extensions of crisp and fuzzy relations respectively, and have rich potential for application to decision making problems. This fact is evident from the theoretical analysis, which illustrates the rationality of the proposed method. Finally, these concepts are used in solving some real life decision making problems, demonstrating the advantages of Fuzzy Soft sets compared to other paradigms. Another important decision making problem which arises in practical life situations is game theory, where decisions are made in competitive situations under conflict caused by opposing interests. Due to the inherent uncertainty and vagueness present in real life data, rectangular fuzzy games using LR-type trapezoidal Fuzzy numbers are considered. The pay-off is considered as an imprecise number instead of a crisp number, which takes care of the uncertainty, impreciseness and vagueness aspects. LR-type trapezoidal Fuzzy numbers are used because of their simplicity and computational efficiency. The solution of fuzzy games with pure strategies by the minimax-maximin principle, and also an algebraic method to solve $2 \times 2$ fuzzy games without a saddle point by using mixed strategies, are discussed. The concept of the dominance method is also illustrated. LR-type trapezoidal Fuzzy numbers generate optimal solutions which are feasible in nature and also take care of the impreciseness aspect. The effectiveness of the solution obtained may be improved using Rough sets. Next, the problem of classification of financial investments using Multi-class SVM is considered. The classification accuracy of SVM is improved using Multi-class SVM. Experimental results on four datasets show that the gaussian kernel is not always the best choice to achieve high generalization of the classifier, although it is often the default choice. The dependency of classifier accuracy on the different kernel functions of Multi-class SVM is exhibited using different datasets. With the choice of kernel function and optimal values of the parameters $C$ and $\sigma$, it is possible to achieve maximum classification accuracy on all datasets. It would be interesting and practically more useful to determine some method for choosing the kernel function and its parameters based on statistical properties of the given data. The proposed method in conjunction with Multi-class SVM can then be tested on application domains such as image processing, text classification, intrusion detection etc. Better results may be obtained if the Multi-class SVM classifier is integrated with fuzzy and rough membership functions.

Chapter 5

Time Series Forecasting and Predicting Stock Prices along with Bankruptcy in Organizations

5.1 Introduction

The accurate prediction of financial market data [5], [10], [56] has been a challenging problem in the Operations Research domain and has attracted the attention of researchers over the past few decades. The central aspect of improving prediction accuracy is to have good and efficient forecasting techniques. Forecasting is the process of estimation in unknown situations [36], [41]. It is used in almost all facets of human life and aims to tell of events before they happen. Prediction is a more generalized term [38] and is a claim that a particular event will occur in the future, in more certain terms than a forecast. According to Howard H. Stevenson, prediction revolves around two things: it is important, because action depends upon it, and hard, because the desired future has to be realized along with the best way to get there [38]. Forecasting differs from prediction in that it looks to the future, whereas prediction may not, as in the successful reconstruction of some past outcome. Further, forecasting differs from explanation in having the goal of predicting an outcome rather than the goal of theorizing about outcomes. The usage thus differs between areas of application. For example, in finance the term forecasting is sometimes reserved for estimates of values at certain specific future times, while prediction is used for more general estimates, such as the number of times inflation will occur over a long period. In consumer demand planning, statistical forecasting is used in everyday business forecasting for manufacturing companies. Both forecasting and prediction refer to the estimation of time series, cross-sectional or longitudinal data. Risk and uncertainty [36], [38], [41], [42], [86], [166] are central to forecasting and prediction. A time series is a set of observations $x_t$ [20], each one being recorded at a specific time $t$. The common type of time series used is the discrete-time time series, in which the set $T_0$ of times at which observations are made is a discrete set [22], as when observations are made at fixed time intervals. They are often encountered in the fields of engineering, science, sociology and economics [10], where the objective is to draw inferences from such series which can help in effective decision making. Before this is done, it is necessary to set up a hypothetical probability model to represent the data. After an appropriate family of models is chosen, it is then possible to estimate parameters, check for goodness of fit to the data and possibly use the fitted model to enhance understanding of the mechanism generating the series [28], [67], [68], [104]. Once a satisfactory model has been developed, it is used in a variety of ways depending on the particular field of application. The model may simply provide a compact description of the data. For example, it may be able to represent birth data as a sum of specified trend, seasonal and random terms. For the interpretation of economic statistics such as employment figures [67], [68], it is important to recognize the presence of seasonal components and remove them so as not to confuse them with long-term trends, a process known as seasonal adjustment. Other applications of time series models include [104] separation or filtering of noise, prediction of future values of series depicting sales of products or population data, testing hypotheses such as global warming using recorded temperature data, predicting one series from observations of another, and controlling future values of a series by adjusting parameters.
Time series models are also useful in simulation studies, such as examining the performance of a reservoir depending on random daily inputs of water to the system [67], [68]. The general approach to time series modeling includes [41]: (a) plotting the series, examining the main features of the graph and checking in particular whether there is a trend, a seasonal component, any apparent sharp changes in behavior or any outlying observations; (b) removing the trend and seasonal components to get stationary residuals, which is generally achieved by applying a preliminary transformation to the data; (c) choosing a model to fit the residuals, making use of various sample statistics; (d) forecasting, achieved by forecasting the residuals and then inverting the transformations to arrive at forecasts of the original series; and (e) expressing the series in terms of its Fourier components, which are sinusoidal waves of different frequencies.
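Steps (a)–(d) of this general approach can be illustrated with a short script. The following is a minimal sketch, not taken from the thesis: it assumes a synthetic monthly series with trend and seasonality, removes them by differencing inside an ARIMA model fitted with statsmodels, and produces forecasts on the original scale; the model order is an illustrative choice.

```python
# A minimal sketch of steps (a)-(d): difference away the trend, fit a
# model to the residual structure, forecast and invert. The series here
# is synthetic; a real application would load observed data instead.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
t = np.arange(120)
series = pd.Series(
    10 + 0.05 * t                       # trend
    + 2.0 * np.sin(2 * np.pi * t / 12)  # seasonal component
    + rng.normal(0, 0.3, size=120),     # noise
    index=pd.date_range("2000-01", periods=120, freq="MS"),
)

# ARIMA with d=1 differences the series once; the order (2, 1, 1) is an
# illustrative choice, normally selected from sample statistics (ACF/PACF).
model = ARIMA(series, order=(2, 1, 1)).fit()
forecast = model.forecast(steps=12)     # forecasts on the original scale
print(forecast.round(2))
```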

Another important aspect related to forecasting and prediction is the prediction of stock prices. The complexity and difficulty of predicting stock prices with a reasonable level of precision on one hand, and the emergence of Computational Intelligence techniques [136], [149] such as ANN, Fuzzy sets, EA, Rough sets [36], [74], [121], [163] etc. as alternatives to conventional statistical regression and Bayesian models with better performance on the other hand, have paved the road for the increased usage of these techniques in various areas of finance [36]. These areas include the utilization of GA and Genetic Programming for portfolio optimization, stock selection using ANN and the prediction of the S&P 100 index using Rough sets. Stock and future traders have come to rely upon various types of Intelligent Systems to make trading decisions. Several information systems have been developed in recent years for modeling expertise, decision support and complicated automation tasks [36]. In recent years, Rough sets have gained momentum and have been widely used as a viable intelligent knowledge discovery technique in many applications, including the finance and investment areas [36], [84], [121]. For instance, building trading systems using the Rough set model has been studied by several researchers. Applications of Rough sets in economic and financial prediction can be divided into three main areas, viz., database marketing, business failure prediction and financial investment [16], [17], [36], [38], [84]. Rough sets have been applied to stock exchange data to discover strong trading rules. Despite many prediction attempts using Rough set models, prediction still remains a challenging and difficult task to perform, especially within complicated, dynamic and often stochastic areas such as economics and finance. The next important issue in forecasting and prediction is time series forecasting. It is a key element of financial and managerial decision making and is highly utilized in predicting economic and business trends for improved decisions and investments. Financial data presents a challenging and complex problem to understand and forecast. Many forecasting methods have been developed to study financial data in the last few decades. ANN, serving as a powerful computational framework, have gained much popularity in business applications [79]. ANN have been successfully applied to loan evaluation, signature recognition, time series forecasting, classification analysis and many other Pattern Recognition problems [41]. The major advantage of ANN is their flexible non-linear modeling capability, with no need to specify a particular model form. Rather, the model is adaptively formed based on the features present in the data. This data-driven approach is suitable for many empirical data sets where no theoretical guidance is available to suggest an appropriate data generating process. However, ANN require a large amount of data in order to yield accurate results [19]. No definite rule exists for the sample size requirement of a given problem. The amount of data for network training depends on the network structure, the training method and the complexity of the particular problem or the amount of noise in the data on hand. With a large enough sample, ANN can model any complex structure in the data. ANN can thus benefit more from large samples than linear statistical models. Forecasting using Fuzzy sets is suitable under incomplete data conditions and requires fewer observations than other forecasting models, but its performance is not always satisfactory.
Fuzzy theory was originally developed to deal with problems involving linguistic terms and has been successfully applied to various applications. Tanaka et al. [15], [27], [61], [77] suggested Fuzzy regression to deal with fuzzy environments and to avoid modeling error. The model is an interval prediction model with the disadvantage that the prediction interval can be very wide if some extreme values are present. An application of Fuzzy regression to fuzzy time-series analysis also exists. Combining the strengths of ANN and Fuzzy sets leads to the development of the hybrid Neuro-Fuzzy model [86]. This hybrid model improves the forecasting accuracy. The basic idea of model combination in forecasting is to use each model's unique features to capture different patterns in the data. Theoretical and empirical findings suggest that combining different techniques can be an effective and efficient way to improve forecasts [41]. Notable works on time series

forecasting include the Fuzzy auto regressive integrated moving average (FARIMA) method and the hybrid GA and high-order Fuzzy time-series approach for enrollment forecasting [41], [42]. The forecasting accuracy of the Fuzzy Linear Regression (FLR) model [27] can be greatly improved by achieving greater explanatory power with non-uniform spreads. FLR is an important technique for analyzing the vague relationship between a dependent variable (response variable) and independent variables (explanatory variables) in complex systems involving human subjective judgment under incomplete and imprecise data conditions [42]. Some successful applications of fuzzy regression include insurance, housing, thermal comfort forecasting, productivity and consumer satisfaction, product life cycle prediction, project evaluation, reservoir operations, actuarial analysis, robotic welding processes and business cycle analysis. Tanaka et al. first studied the FLR problem with crisp explanatory variables and fuzzy response variables [42]. They formulated the FLR problem as a linear programming model to determine the regression coefficients as fuzzy numbers, where the objective function minimizes the total spread of the fuzzy regression coefficients subject to the constraint that the support of the estimated values must cover the support of their associated observed values for a certain pre-specified level. This was later improved by [15], [27], [61], [77], [113]. The drawbacks of these approaches, such as wide ranges in estimation and more observations resulting in fuzzier estimations, which contradicts the general observation that more observations provide better estimations, have been pointed out by several investigations. Some other works include a fuzzy least-squares approach to determine the regression coefficients and the criterion of minimizing the difference of membership values between the observed and estimated fuzzy dependent variable. Three types of multi-objective programming approaches have also been formulated [97], [113] to investigate the FLR model with fuzzy explanatory variables and responses. Shape preserving arithmetic operations involving LR fuzzy numbers for least-squares fitting [42], [81], [92], [93] have also been used to investigate a class of FLR problems, where the derived regression coefficients are fuzzy numbers. However, since regression coefficients derived on the basis of Zadeh's extension principle are fuzzy numbers, the spread of the estimated dependent variable becomes wider as the magnitudes of the independent variables increase, even if the spreads of the observed dependent variables are actually decreasing. To avoid the problem of wide spreads for large values of the explanatory variables in estimation, a two-stage approach [92] has been developed to obtain crisp regression coefficients in the first stage and determine a unique fuzzy error term in the second stage. A least-squares method [93] has been proposed to derive regression coefficients that are crisp. These two studies have better performance, but they still cannot cope with the situation of decreasing or non-uniform spreads. Another issue is that crisp regression coefficients may eliminate the problem of increasing spreads, but they also mislead the functional relationship between the dependent and independent variables in a fuzzy environment. When the spreads of the fuzzy independent variables are large, it is possible that the spread of the regression coefficients is also large. In this case the values of the regression coefficients lie in a wide range, even from negative to positive values. If the derived regression coefficients are crisp, some valuable information may be lost.
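Tanaka's linear programming formulation mentioned above can be made concrete with a small sketch. The following assumes symmetric triangular fuzzy coefficients (center $a_j$, spread $c_j$) and crisp inputs, minimizing the total spread subject to the inclusion constraints at level $h$; it is a generic illustration of the formulation, not the thesis implementation, and the data are synthetic.

```python
# A minimal sketch of Tanaka-style fuzzy linear regression: coefficients
# are symmetric triangular fuzzy numbers (center a_j, spread c_j >= 0);
# the LP minimizes total spread while each observed y_i is covered at
# level h. Data here are synthetic illustrations.
import numpy as np
from scipy.optimize import linprog

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])   # one explanatory variable
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
h = 0.5

A = np.hstack([np.ones((len(X), 1)), X])            # design matrix with intercept
n = A.shape[1]
absA = np.abs(A)

# Variables z = [a_0..a_{n-1}, c_0..c_{n-1}]; objective: total spread.
cost = np.concatenate([np.zeros(n), absA.sum(axis=0)])

# Coverage constraints, written as A_ub z <= b_ub:
#   A a + (1-h) |A| c >= y   ->  -A a - (1-h)|A| c <= -y
#   A a - (1-h) |A| c <= y
A_ub = np.vstack([
    np.hstack([-A, -(1 - h) * absA]),
    np.hstack([A, -(1 - h) * absA]),
])
b_ub = np.concatenate([-y, y])

bounds = [(None, None)] * n + [(0, None)] * n        # centers free, spreads >= 0
res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
a, c = res.x[:n], res.x[n:]
print("centers:", a.round(3), "spreads:", c.round(3))
```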
According to [15], a regression model based on fuzzy data shows the beneficial characteristic of enhanced generalization of data patterns compared to regression models based on numeric data only. This is because the membership function associated with fuzzy sets has significant informative value in terms of capturing either the notion of accuracy of information or the notion of proximity of patterns in the data set used for the derivation of the regression model. Thus, when the explanatory variables are fuzzy, the regression coefficients will be fuzzy and they should be described by membership functions to completely conserve the fuzziness of the explanatory variables. Another important category of prediction in business and corporate settings is bankruptcy prediction [38]. It is a phenomenon of increasing interest to investors or creditors, borrowing organizations and governments. Timely identification of an organization's impending failure is always desirable. Bankruptcy is the condition [5], [6] in which an organization cannot meet its debt obligations and

petitions a federal district court for either reorganization of its debts or liquidation of its assets. In this action, the property of the debtor is taken over by a receiver or trustee in bankruptcy for the benefit of the creditors. An effective prediction in time is valued priceless for a business in order to evaluate risks or prevent bankruptcy. A fair amount of research has therefore focused on bankruptcy prediction. Signs of potential financial distress are evident long before bankruptcy occurs [6]. Financial distress begins when an organization is unable to meet its scheduled payments or when the projection of future cash flows points to an inability to do so in the near future. The causes leading to business failure and subsequent bankruptcy can be divided into economic, financial, neglect, fraud, disaster and others [38]. Economic factors include industry weakness and poor location. Financial factors include excessive debt and insufficient capital. Research shows that financial difficulties are often the result of managerial error and misjudgment. When errors and misjudgments proliferate, it could be a sign of managerial neglect. Corporate fraud and failure became a public concern during the late 90's, leading to disasters which include human error and malice. However, no models are yet available that could detect and flag corporate fraud. Bankruptcy filing is not exclusive to any specific economy. Globalization can feed waves of economic distress across societies and national economies after the original economy witnesses its deleterious impact. Countries like Japan, Belgium, Thailand, Greece, Hungary etc. are developing their own bankruptcy prediction models to deter the disastrous consequences of ultimate financial distress. Predicting corporate failure using past financial data is a traditional and modern topic of financial business [4], [54], [56]. The solution to this problem is a discriminant function from the variable space in which observations are defined into a binary set. Various researches have demonstrated that AI techniques such as ANN [79] can serve as useful tools for bankruptcy prediction. In the early stage of applying ANN to bankruptcy prediction, back propagation ANN were used and their prediction results were compared and integrated with other results. During the past few decades, ANN have become the dominant modeling paradigm of bankruptcy prediction, though non-ANN methods are still used. Research efforts have been directed to the integration of ANN models with other Soft Computing tools such as Fuzzy sets, Rough sets and GA [86] for better performance and improvement of the predicted results. Since the work of [5], [114], bankruptcy prediction has been studied actively by academics and practitioners. This field of risk management continues to be very active, much due to the continuous development of new financial derivatives. For example, the pricing of credit derivatives relies on good estimates of counterparty risk. The literature on bankruptcy prediction is extensive and many models have been proposed and tested empirically, often with contradictory conclusions. There are basically three kinds of models that are commonly addressed in the literature. The first group comprises the statistical models, for example discriminant analysis, correlation and regression analysis, logit and probit models etc. Linear discriminant analysis models have been widely used; Altman's popular z-score [4], for example, is based on linear discriminant analysis. Generalized linear models or multiple logistic regression models have also been popular.
Ohlson's o-score, based on generalized linear models with a logit link function, is also referred to as logit analysis. The second group comprises market based models, for example the Merton or Black-Scholes-Merton models and Moody's KMV public firm model [38]. The market models are based on the value of the firm set by the market. Stock prices are commonly used as proxies for the value. Market based models require that firms are registered on a stock exchange, and this is quite often not the case. The third group encompasses different computational intelligence techniques such as decision trees, ANN, SVM [151] etc. Most researchers use one of these techniques to compare the prediction performance with other techniques for a specific data set [38]. However, there is no single conclusion that one technique is consistently better than another for general bankruptcy prediction. Factors which can contribute to the understanding of corporate bankruptcy can be found both in the field of economics and in the theory of business management [38]. However, several attempts to specify a model of bankruptcy prediction based on causal specifications of underlying economic determinants have not fully

succeeded. The difficulties of merging the theoretical and empirical fields may arise from the diversity of the phenomenon. Organizations are heterogeneous and the available information is limited. Furthermore, the event of bankruptcy is twofold, as the decision of whether or not to continue operations is not directly connected to the particular outcome of bankruptcy. In the search for explanatory factors it is required to identify factors that influenced the insufficiency of an organization's performance, but for organizations that do fail it is needed to explain why the particular outcome of bankruptcy was observed and not timely liquidation, merger or restructuring of debt. In this chapter, three important aspects of forecasting, viz., stock price prediction, currency exchange rate forecasting and bankruptcy prediction, are studied using hybrid soft computing techniques, viz., rough fuzzy multilayer perceptron networks, fuzzy support vector machines and the neuro-fuzzy regression model. In the first part of the chapter, a generic stock price prediction model is presented using a modular evolutionary approach for designing a hybrid connectionist Soft Computing framework for classification and rule generation. The basic building block used is the Rough Fuzzy Multi Layer Perceptron (RFMLP) ANN [14], [36], which is an important paradigm for pattern classification. The model extracts knowledge in the form of rules from daily stock movements of the Bombay Stock Exchange (BSE) that guide investors whether to buy, sell or hold a stock [36]. To increase the efficiency of the prediction process, the Rough Set with Boolean Reasoning (RSBR) discretization algorithm is used to discretize the data. The original classification task is split into several subtasks and a number of RFMLP networks are obtained, one for each subtask. The sub-network modules are integrated in a particular manner so as to preserve the crude domain knowledge which is encoded in them using Rough sets. The pool of integrated networks is then evolved using a GA [74] with a restricted adaptive or variable mutation operator that utilizes domain knowledge to accelerate training and preserves localized rule structures as potential solutions. The parameters for the input and output fuzzy membership functions of the network are also tuned using the GA, together with the link weights. The existing procedure has been modified for the generation of Rough set [36], [121] dependency rules for directly handling a real valued attribute table containing fuzzy membership values. This helps in preserving all class representative points in the dependency rules by adaptively applying a threshold that automatically takes care of the shape of the membership functions. In this knowledge based network design, all possible inference rules contribute to the final solution. The use of GA in this context is beneficial for modeling multimodal distributions, since all major representatives in the population are given a fair chance during network synthesis [36]. Next, a rule extraction algorithm is given. The performance of the generated rules is evaluated quantitatively. Two new measures are accordingly defined, indicating the certainty and confusion in a decision. These new indices are used along with some existing measures to evaluate the quality of the rules. A quantitative comparison of the rule extraction algorithm is made with some existing ones like the Subset method, M of N method, X2R method etc. [36] on datasets of BSE. This is followed by a hybrid neuro-fuzzy model, which is developed based on the basic concepts of ANN and fuzzy regression models for time-series forecasting under incomplete data conditions [41].
The time-series data considered here are the exchange rates of the US dollar to the Indian rupee. The ANN is used to preprocess the raw data and provide the necessary background to apply the fuzzy regression model. The fuzzy regression model eliminates the disadvantage of the large amount of historical data otherwise required for formulating the model. The model yields more accurate results with fewer observations and incomplete data sets, for both point and interval forecasts. The empirical results indicate that the performance of the model compares favourably with other models, which makes it an ideal candidate for forecasting and decision making under best and worst possible situations. The method is empirically compared with other forecasting models such as the auto regressive integrated moving average, Chen’s fuzzy time-series (first and higher order), Yu’s fuzzy time-series, FARIMA and ANN, against which the present method gives improved forecasting results [41]. Next, two important problems of FLR viz., wide spreads for large values of the explanatory variables and the functional relationship between dependent and independent variables, are addressed [42]. First, a procedure is developed for

constructing the membership function of the fuzzy regression coefficients such that the fuzziness of the input information is completely conserved. Then a variable spread FLR model with higher explanatory power and forecasting accuracy is generated, which resolves the problem of wide spreads of the estimated response for larger values of the independent variables in fuzzy regression, so that situations of decreasing or non-uniform spreads can be tackled. These problems are taken care of by means of a three-step approach. In the first step, membership functions of the least-squares estimates of the fuzzy response and explanatory variables are derived based on Zadeh’s extension principle [160], [161], [162] to completely conserve fuzziness, such that some valuable information is obtained. In the second step, the fuzzy regression coefficients are defuzzified to crisp values via a fuzzy ranking method to avoid the problem of non-uniform spreads for larger values of the explanatory variables in estimation. Finally, in the third step a mathematical programming approach determines the fuzzy error term for each pair of explanatory variables and response, such that the errors in estimation are minimized subject to constraints requiring the spread of each estimated response to equal that of the associated observed response. As the spreads of the error terms coincide with those of their associated observed responses, the spreads used here are non-uniform in nature, no matter how the spreads of the observed responses change.

Finally, FSVM is used to study bankruptcy in corporate organizations [38]. FSVM, formed by integrating SVM and Fuzzy sets, is based on the idea of structural risk minimization and has the capability to handle uncertainty and impreciseness in corporate data. Thus, the overall prediction accuracy of the whole model is enhanced. FSVM is implemented here for analyzing predictors in the form of financial ratios, and a method of adapting it to default probability estimation is proposed. FSVM extracts meaningful information from financial data, although extensive data sets are required in order to fully utilize its classification power. The test dataset comprises the 50 largest bankrupt organizations with capitalization of no less than $1 billion that filed for protection against creditors under Chapter 11 of the United States Bankruptcy Code in 2001 – 2002 after the stock market crash of 2000. The experimental results show that FSVM is better capable of extracting useful information from corporate data [38] than traditional bankruptcy prediction methods. Thus, this approach inherits the advantages of Machine Learning and Fuzzy Logic, such that the prediction accuracy of the whole model is enhanced.

This chapter is organized as follows. In section 5.2, stock prices are predicted using the RFMLP ANN. This is followed by time series forecasting using the hybrid Neuro-Fuzzy regression model in section 5.3. In section 5.4, forecasting accuracy is enhanced with the non-uniform spread Fuzzy linear regression model. Section 5.5 illustrates the bankruptcy prediction problem through FSVM. Experimental results and comparisons are given in section 5.6. Finally, conclusions are given in section 5.7.

5.2 Predicting Stock Price using RFMLP Networks
In the quest of generating prediction rules for stock price movement at BSE, the RFMLP ANN is used [36]. The use of Computational Intelligence Systems such as ANN, Fuzzy sets and GA for stock market prediction is well established. The process extracts knowledge in the form of rules from daily stock movements, leading to a rule extraction algorithm that guides investors. Rough sets are used to discretize the data, which increases the efficiency of the prediction process. The methodology uses a GA to obtain a structured network suitable for both classification and rule extraction. The modular concept, based on a divide and conquer strategy, provides accelerated training and a compact network suitable for generating a minimum number of rules with high certainty values. The concept of a variable mutation operator is introduced for preserving the localized structure of the constituting knowledge based sub-networks while they are integrated and evolved. The extracted rules are compared with those of some existing rule extraction techniques on the basis of quantitative performance indices, which demonstrates the effectiveness of the proposed methodology.

5.2.1 Fuzzy Multi Layer Perceptron Networks
The Fuzzy Multi Layer Perceptron (FMLP) network [10], [36] is often used for data classification. The methodology is utilized for extracting dependency rules using a knowledge encoding algorithm that maps rules to the parameters of the FMLP network. A Multi Layer Perceptron (MLP) network consists of a group of nodes arranged in layers. Each node in a layer is connected to all nodes in the next layer by links that have weights associated with them. The input layer contains nodes that represent the input features of the classification problem [60]. A real-valued feature is represented by a single node, whereas a discrete feature with n distinct values is represented by n input nodes. The classification strength of the MLP network is enhanced by incorporating Rough sets [36], [121] and Fuzzy sets [155], [166] into the network, such that RFMLP networks are evolved, where the ANN acts as an efficient connectionist bridge between the two. In this hybridization, Fuzzy sets help in handling linguistic input information and ambiguity in the output decision, while Rough sets extract domain knowledge for determining the network parameters. The FMLP model incorporates fuzziness at the input and output levels of the MLP and is capable of handling exact (numerical) and inexact (linguistic) forms of input data. Any input feature value is described in terms of some combination of membership values in the linguistic property sets low (L), medium (M) and high (H). Class membership values $\mu$ of patterns are represented at the output layer of the FMLP network. During training, the weights are updated by back propagating errors with respect to these membership values, such that the contribution of uncertain vectors is automatically reduced. A four layered feed forward MLP network is used here. The output of a neuron in any layer $h$ other than the input layer ($h = 0$) is given as [36]:

$$y_j^{(h)} = \frac{1}{1 + \exp\left(-\sum_i y_i^{(h-1)} w_{ji}^{(h-1)}\right)} \qquad (1)$$

where $y_i^{(h-1)}$ is the state of the $i$th neuron in the preceding $(h-1)$th layer and $w_{ji}^{(h-1)}$ is the weight of the connection from the $i$th neuron in layer $(h-1)$ to the $j$th neuron in layer $h$. For nodes in the input layer, $y_j^{(0)}$ corresponds to the $j$th component of the input vector. Also, $x_j^{(h)} = \sum_i y_i^{(h-1)} w_{ji}^{(h-1)}$.

An $n$-dimensional pattern $F_i = [F_{i1}, F_{i2}, \ldots, F_{in}]$ is represented as the $3n$-dimensional vector [36]:

$$F_i = [\mu_{low(F_{i1})}(F_i), \ldots, \mu_{high(F_{in})}(F_i)] = [y_1^{(0)}, y_2^{(0)}, \ldots, y_{3n}^{(0)}] \qquad (2)$$

where the $\mu$ values indicate the membership functions of the corresponding linguistic $\pi$-sets low, medium and high along each feature axis, and $y_1^{(0)}, y_2^{(0)}, \ldots, y_{3n}^{(0)}$ refer to the activations of the $3n$ neurons in the input layer. When an input feature is exact in nature, $\pi$-fuzzy sets in one-dimensional form with range $[0, 1]$ are used, represented as:

$$\pi(F_j; c, \lambda) = \begin{cases} 2\left(1 - \dfrac{\|F_j - c\|}{\lambda}\right)^2, & \dfrac{\lambda}{2} \le \|F_j - c\| \le \lambda \\ 1 - 2\left(\dfrac{\|F_j - c\|}{\lambda}\right)^2, & 0 \le \|F_j - c\| \le \dfrac{\lambda}{2} \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$

where $\lambda > 0$ is the radius of the $\pi$-function with $c$ the central point.

Considering an $l$-class problem domain such that there are $l$ nodes in the output layer, let $o_k = [o_{k1}, \ldots, o_{kl}]$ and $v_k = [v_{k1}, \ldots, v_{kl}]$ denote the mean and standard deviation respectively of the exact training data for the $k$th class $c_k$. The weighted distance of a training pattern $F_i$ from the $k$th class $c_k$ is defined as [36]:

$$z_{ik} = \sqrt{\sum_{j=1}^{n} \left[\frac{F_{ij} - o_{kj}}{v_{kj}}\right]^2} \quad \text{for } k = 1, \ldots, l \qquad (4)$$

where $F_{ij}$ is the value of the $j$th component of the $i$th pattern point. The membership of the $i$th pattern in class $k$, lying in $[0, 1]$, is defined as:

$$\mu_k(F_i) = \frac{1}{1 + \left(\dfrac{z_{ik}}{f_d}\right)^{f_e}} \qquad (5)$$

where the positive constants $f_d$ and $f_e$ are the denominational and exponential fuzzy generators controlling the amount of fuzziness in the class membership set.
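To make the fuzzification concrete, the following is a minimal Python sketch of equations (3)–(5): the $\pi$-membership along one feature axis and the distance-based class membership. The centres, the radius and the generators f_d, f_e below are illustrative placeholders, not values from the thesis experiments.

import numpy as np

def pi_membership(f, c, lam):
    """Equation (3): pi-function with centre c and radius lam > 0."""
    d = abs(f - c)
    if lam / 2 <= d <= lam:
        return 2.0 * (1.0 - d / lam) ** 2
    if d < lam / 2:
        return 1.0 - 2.0 * (d / lam) ** 2
    return 0.0

def class_membership(F, o_k, v_k, f_d=1.0, f_e=2.0):
    """Equations (4)-(5): weighted distance to class k, then fuzzy membership."""
    z_ik = np.sqrt(np.sum(((F - o_k) / v_k) ** 2))
    return 1.0 / (1.0 + (z_ik / f_d) ** f_e)

# A 2-feature pattern mapped to low/medium/high activations (equation (2))
F = np.array([0.3, 0.7])
centres = [0.2, 0.5, 0.8]          # assumed centres for low, medium, high
inputs = [pi_membership(f, c, 0.4) for f in F for c in centres]
mu_k = class_membership(F, o_k=np.array([0.4, 0.6]), v_k=np.array([0.2, 0.2]))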

5.2.2 Generation of Stock Price Prediction Rules using RFMLP Networks
The methodology to discover stock price prediction rules of BSE consists of five phases [36], [84] viz., the pre-processing phase, analysis phase, rule generating phase, classification phase, and rule extraction and prediction phase. These phases are detailed in the following discussion.

In the pre-processing phase, a decision table is created for Rough set analysis [36], [84]. In this process, a number of data preparation tasks such as data conversion, data cleansing, data completion checks, conditional attribute creation, decision attribute generation and discretization of attributes are performed. Data splitting creates two randomly generated subsets, one containing the objects for analysis and the remaining subset containing the objects for validation. Data conversion is performed on the initial data to generate a form to which specific Rough set tools can be applied. Real world data often contain missing values. Since Rough set classification involves mining rules from the data, objects with missing values may have undesirable effects on the rules that are constructed. The aim of the data completion procedure is to remove all objects that have one or more missing values. Incomplete data or information systems [121] exist broadly in practical data analysis, and completing an incomplete information system through various completion methods in the pre-processing stage is normal in the knowledge discovery process. However, these methods may distort the original data and knowledge, and can even leave the original data unexplored. To overcome these deficiencies inherent in the traditional methods, a decomposition approach is used for the incomplete information system (i.e. decision table), as proposed in [3].

Attributes in classification and prediction [60] may have varying importance in the problem domain being considered. Their importance can be pre-assumed using auxiliary knowledge about the problem and expressed by properly chosen weights. However, the Rough set approach to classification avoids any additional information aside from what is included in the information table itself. Basically, the Rough set approach tries to determine from the available data in the decision table whether all attributes are of the same strength and, if not, how they differ in respect of classifier power. Therefore, some strategy for the discretization of real-valued attributes has to be used when learning strategies are applied to data classification with real-valued attributes; it has been shown that the quality of the learning algorithm depends on this strategy [36]. Discretization uses a data transformation procedure that involves finding cuts in the data sets which divide the data into intervals. Values lying within an interval are then mapped to the same value. This reduces the size of the attribute value set and ensures that the rules that are mined are not too specific.

For the discretization of continuous-valued attributes, the Rough Sets with Boolean Reasoning (RSBR) algorithm [36], [84] is considered here. The main advantage of RSBR is that it combines the discretization of real-valued attributes and classification. The main steps of the RSBR discretization algorithm are given in algorithm 1 [36].

Algorithm 1: RSBR discretization algorithm
Input: Information system table (S) with real valued attributes $A_{ij}$, and $n$, the number of intervals for each attribute.
Output: Information table (ST) with discretized real valued attributes.
Step 1: for $A_{ij} \in S$ do
define a set of Boolean variables as follows:

$$B = \left\{ \sum_{i=1}^{N} C_{a_i}, \sum_{i=1}^{N} C_{b_i}, \sum_{i=1}^{N} C_{c_i}, \ldots, \sum_{i=1}^{N} C_{N_i} \right\} \qquad (6)$$

where $\sum_{i=1}^{N} C_{a_i}$ corresponds to a set of intervals defined on the variables of attribute $a$;
end for
Step 2: Create a new information table $S_{new}$ by using the set of intervals $C_{a_i}$.
Step 3: Find a minimal subset of $C_{a_i}$ that discerns all the objects in the decision class $D$ using:

$$\Phi^u = \{(i, j) : d(x_i) \neq d(x_j)\} \qquad (7)$$

where $\Phi(i, j)$ is the number of minimal cuts that must be used to discern two different instances $x_i$ and $x_j$ in the information table.

The principal task in the method of rule generation is to compute reducts and the corresponding rules with respect to the particular kind of information and decision system; knowledge encoding is also embedded in this phase [36]. Relativised versions of matrices and functions viz., $d$-reducts and $d$-discernibility matrices, are used as the basic tools for computation [36], [121]. Let $S = \langle U, A \rangle$ be a decision table, with $C$ and $D = \{d_1, \ldots, d_l\}$ its sets of condition and decision attributes respectively. The decision table $S$ is divided into $l$ tables $S_i = \langle U_i, A_i \rangle$, $i = 1, \ldots, l$, corresponding to the $l$ decision attributes $d_1, \ldots, d_l$, where $U = U_1 \cup \ldots \cup U_l$ and $A_i = C \cup \{d_i\}$. Let $\{x_{i1}, \ldots, x_{ip}\}$ be the set of those objects of $U_i$ that occur in $S_i$, $i = 1, \ldots, l$. Now for each $d_i$-reduct $B = \{b_1, \ldots, b_k\}$, the discernibility matrix, denoted $M_{d_i}(B)$, is defined as follows [36]:

$$c_{ij} = \{a \in B : a(x_i) \neq a(x_j)\}, \quad \text{for } i, j = 1, \ldots, n \qquad (8)$$

For each object $x_j \in \{x_1, \ldots, x_n\}$, the discernibility function $f_{d_i}^{x_j}$ is defined as

$$f_{d_i}^{x_j} = \bigwedge \left\{ \bigvee (c_{ij}) : 1 \le i, j \le n, \; j < i, \; c_{ij} \neq \emptyset \right\} \qquad (9)$$

where $\bigvee (c_{ij})$ is the disjunction of all members of $c_{ij}$. Then $f_{d_i}^{x_j}$ is brought to its conjunctive normal form. Thus, a dependency rule $r_i$ is obtained, viz., $P_i \to d_i$, where $P_i$ is the disjunctive normal form of $f_{d_i}^{x_j}$, $j \in \{i_1, \ldots, i_p\}$. The dependency factor $df_i$ for $r_i$ is given by [36]:

$$df_i = \frac{card(POS_i(d_i))}{card(U_i)} \qquad (10)$$

where $POS_i(d_i) = \bigcup_{X \in I_{d_i}} l_i(X)$, and $l_i(X)$ is the lower approximation of $X$ with respect to $I_i$. A small computational sketch of these constructions is given below.
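Here is a small Python sketch of two pieces of this pipeline: a simplified midpoint-cut discretization in the spirit of algorithm 1, and the discernibility matrix of equation (8). The toy values and attribute names are illustrative only, not from the thesis data, and the greedy cut filter is a stand-in for the full Boolean reasoning of equation (7).

from itertools import combinations

def candidate_cuts(values):
    """Midpoints between consecutive distinct values of a real-valued attribute."""
    v = sorted(set(values))
    return [(a + b) / 2.0 for a, b in zip(v, v[1:])]

def useful_cuts(values, decisions):
    """Keep a cut only if some pair of objects with different decisions
    lies on opposite sides of it (a simplified stand-in for equation (7))."""
    kept = []
    for cut in candidate_cuts(values):
        left = {d for v, d in zip(values, decisions) if v <= cut}
        right = {d for v, d in zip(values, decisions) if v > cut}
        if left and right and len(left | right) > 1:
            kept.append(cut)
    return kept

def discernibility_matrix(objects, attributes):
    """Equation (8): c_ij is the set of attributes on which objects i, j differ."""
    return {(i, j): {a for a in attributes if objects[i][a] != objects[j][a]}
            for i, j in combinations(range(len(objects)), 2)}

cuts = useful_cuts([2.1, 2.9, 3.4, 4.8], ["hold", "buy", "buy", "sell"])
c = discernibility_matrix(
    [{"L1": 1, "M1": 0}, {"L1": 0, "M1": 1}, {"L1": 0, "M1": 0}], ["L1", "M1"])
# The discernibility function (9) conjoins, over all non-empty c_ij, the
# disjunction of their members; its DNF yields the rule antecedents P_i.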

Here $df_i \le 1$. In knowledge encoding, a feature $F_j$ for class $c_k$ is considered in the $l$-class problem domain. The inputs for the $i$th representative sample $F_i$ are mapped to the corresponding three-dimensional feature space $(\mu_{low(F_{ij})}(F_i), \mu_{medium(F_{ij})}(F_i), \mu_{high(F_{ij})}(F_i))$, which corresponds to the low, medium and high values of the stock price index [36]. Let these be represented by $L_j$, $M_j$, $H_j$ respectively. As the method considers multiple objects in a class, a separate $n_k \times 3n$-dimensional attribute-value decision table is generated for each class $c_k$, where $n_k$ indicates the number of objects in $c_k$. The absolute distance between each pair of objects is computed along each attribute $L_j$, $M_j$, $H_j$ for all $j$. Equation (8) is modified to directly handle the real-valued attribute table consisting of fuzzy membership values by defining

$$c_{ij} = \{a \in B : |a(x_i) - a(x_j)| > Th\} \quad \text{for } i, j = 1, \ldots, n_k \qquad (11)$$

where $Th$ is an adaptive threshold whose adaptivity is built in, depending on the inherent shape of the membership function.

While designing the initial structure of the RFMLP, the union of the rules of the $l$ classes is considered. The input layer consists of $3n$ attribute values, while the output layer is represented by the $l$ classes. The hidden layer nodes model the innermost operator in the antecedent part of a rule, which can be either a conjunct or a disjunct. The output layer nodes model the outer level operands, which can again be either a conjunct or a disjunct. For each inner level operator, corresponding to one output class and one dependency rule, one hidden node is dedicated. Only those input attributes that appear in this conjunct or disjunct are connected to the appropriate hidden node, which in turn is connected to the corresponding output node. Each outer level operator is modeled at the output layer by joining the corresponding hidden nodes. A single attribute involving no inner level operators is directly connected to the appropriate output node via a hidden node, to maintain uniformity in rule mapping.

The classification phase classifies the rules generated. In the process of classifying the dataset, the problem is effectively decomposed into sub-problems, as a result of which the sub-problems can be solved with compact networks; the efficient combination and training of the networks yields gains in training time, network size and accuracy [36]. Two stages are used. In the first stage, an $l$-class classification problem is split into $l$ two-class problems. Let there be $l$ sets of sub-networks, with $3n$ inputs and one output node each. Rough set theoretic concepts are used to encode domain knowledge into each of the sub-networks, using equations (9), (10) and (11). As explained in the previous section, the number of hidden nodes and the connectivity of the knowledge-based sub-networks are automatically determined. Each two-class problem leads to the generation of one or more crude sub-networks, each encoding a particular decision rule. Let each of these constitute a pool of knowledge-based modules, so that $m \le l$ such pools are obtained. Each pool $k$ is perturbed to generate a total of $n_k$ sub-networks, such that $n_1 = \ldots = n_k = \ldots = n_m$.

These pools constitute the initial population of sub-networks, which are then evolved independently using genetic algorithms. At the end of the first stage, the modules or sub-networks corresponding to each two-class problem are concatenated to form an initial network for the second stage. The inter-module links are initialized to small random values, as depicted in figure 5.1. A set of such concatenated networks forms the initial population of the GA. The mutation probability for inter-module links is set to a high value, while that of intra-module links is set to a relatively lower value. This restricted mutation helps preserve some of the localized rule structures already extracted and evolved as potential solutions. The initial population for the GA over the entire network is formed from all possible combinations of these individual network modules and random perturbations about them. This ensures that, for complex multimodal pattern distributions, all the different representative points remain in the population. The algorithm then searches through the reduced space of possible network topologies. The steps are summarized in algorithm 2 [36]; a toy sketch of the restricted mutation follows the algorithm.

Algorithm 2: Classification algorithm
Input: An l class classification problem.
Output: The pool of RFMLP networks evolved using the GA.
Step 1: For each class, generate Rough set dependency rules using the methodology discussed in the rule generation phase.
Step 2: Map each of the dependency rules to a separate sub-network module using the knowledge encoding methodology of the rule generation phase.
Step 3: Partially evolve each of the sub-networks using a conventional GA.
Step 4: Concatenate the sub-network modules to obtain the complete network. For concatenation the intra-module links are left unchanged, while the inter-module links are initialized to low random values. Each of the sub-networks solves a two-class classification problem, while the concatenated network solves the actual l class problem. Every possible combination of sub-network modules is generated to form the pool of networks.
Step 5: The pool of networks is evolved using a modified GA with an adaptive mutation operator. The mutation probability is set to a low value for intra-module links and to a high value for inter-module links.
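The following is a minimal sketch of the restricted mutation in step 5, assuming a binary chromosome with a boolean mask marking inter-module bits; the probabilities 0.4 and 0.04 are illustrative, not the thesis settings.

import random

def restricted_mutation(bits, inter_module_mask, p_inter=0.4, p_intra=0.04):
    """Flip inter-module bits with high probability and intra-module bits
    with low probability, preserving evolved rule structure (algorithm 2, step 5)."""
    out = []
    for b, is_inter in zip(bits, inter_module_mask):
        p = p_inter if is_inter else p_intra
        out.append(1 - b if random.random() < p else b)
    return out

chromosome = [1, 0, 1, 1, 0, 0, 1, 0]
mask = [False, False, False, True, True, False, False, True]  # True = inter-module bit
mutant = restricted_mutation(chromosome, mask)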

Consider the problem of classifying two-dimensional data into two classes. The input fuzzifier maps the features into a six-dimensional feature space. Let the sample set of rules obtained from Rough sets be [36]:

$$c_1 \leftarrow (L_1 \wedge M_2) \vee (H_2 \wedge M_1); \quad c_2 \leftarrow M_2 \vee H_1; \quad c_3 \leftarrow L_3 \vee L_1$$

where $L_j$, $M_j$, $H_j$ correspond to $\mu_{low(F_j)}$, $\mu_{medium(F_j)}$, $\mu_{high(F_j)}$ respectively, which denote the low, medium and high values of the stock price index. For the first phase of the GA, three different pools are formed, using one crude sub-network for class 1 and two crude sub-networks for class 2 respectively. Three partially trained sub-networks result from each of these pools. They are then concatenated to form $(1 \times 2 = 2)$ networks. The population for the final phase of the GA is formed with these networks and perturbations about them. The steps followed in obtaining the final network are illustrated in figure 5.2.

The different features of the GA [74] are now discussed with relevance to this algorithm. The problem variables consist of the weight values and the input and output fuzzy parameters. Each weight is encoded

into a binary word of 16 bit length, where [000...0] decodes to −128 and [111...1] decodes to 128, as shown in figure 5.3. An additional bit is assigned to each weight to indicate the presence or absence of the link. The fuzzy parameters tuned are the centre $c$ and radius $\lambda$ for each of the linguistic attributes low, medium and high of each feature, and the output fuzzy parameters $f_d$ and $f_e$ [36]. These are also coded as 16 bit strings in the range [0, 2]. For the input parameters, [000...0] decodes to 0 and [111...1] decodes to 1.2 times the maximum value attained by the corresponding feature in the training set. The chromosome is obtained by concatenating all the above strings. Sample values of the string length are around 2000 bits for reasonably sized networks. The initial population is generated by coding the networks obtained by Rough set based knowledge encoding and by random perturbations about them. A population size of 64 was considered.

Due to the large string length, single point crossover would have little effectiveness. Multiple point crossover is adopted [74], with the distance between two crossover points being a random variable between eight and 24 bits. This ensures a high probability that only one crossover point occurs within a word encoding a single weight. The crossover probability is fixed at 0.7. The search string being very large, the influence of mutation on the search is greater than that of crossover. The mutation probability [74], [108] has spatio-temporal variation; the maximum value of $p_{mut}$ is 0.4 and the minimum value is 0.01. The mutation probabilities also vary along the encoded string: bits corresponding to inter-module links are assigned the probability $p_{mut}$, i.e., the value of $p_{mut}$ at that iteration, while intra-module links are assigned the probability $p_{mut}/10$. This ensures minimal alterations in the structure of the individual modules already evolved. Hence, the mutation operator indirectly incorporates the domain knowledge extracted through Rough sets.
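A minimal sketch of the 16-bit weight coding described above, assuming linear scaling between the stated end points; the presence bit and the exact decoding convention used in the thesis may differ.

def encode_weight(w, lo=-128.0, hi=128.0, bits=16):
    """Linearly map w in [lo, hi] to a 16-bit integer rendered as a bit string."""
    q = round((w - lo) / (hi - lo) * (2 ** bits - 1))
    return format(q, f"0{bits}b")

def decode_weight(s, lo=-128.0, hi=128.0):
    """Inverse mapping: [000...0] -> lo, [111...1] -> hi."""
    return lo + int(s, 2) / (2 ** len(s) - 1) * (hi - lo)

word = encode_weight(3.75)
w = decode_weight(word)      # approximately 3.75
gene = "1" + word            # leading presence bit marks the link as active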

Figure 5.1: Intra and Inter module links

The objective function considered is of the form [36]:

$$F = \alpha_1 f_1 + \alpha_2 f_2 \qquad (12)$$

where $f_1$ = (No. of correctly classified samples in the training set) / (Total no. of samples in the training set) and $f_2$ = 1 − (No. of links present) / (Total no. of links possible). Here, $\alpha_1$ and $\alpha_2$ determine the relative weight of each factor; they are taken as 0.9 and 0.1 respectively, to give more importance to the classification score compared to the network size in terms of the number of links. The network connectivity, weights and input/output fuzzy parameters are optimized simultaneously. Selection is done by the roulette wheel method [74], [108]. The probabilities are calculated on the basis of the ranking of individuals in terms of the objective function, instead of the objective function value itself. Elitism is incorporated in the selection process to prevent oscillation of the fitness function across generations.

The fitness of the best individual of a new generation is compared with that of the current generation; if the latter has a higher value, the corresponding individual replaces a randomly selected individual in the new population.

The rule extraction and prediction phase extracts rules from the previous phases and utilizes them to predict stock price movement. The rule extraction algorithm considered here is decompositional in nature, as given in algorithm 3 [36].

Algorithm 3: Decompositional algorithm
Input: Several RFMLP hybrid networks.
Output: Generates the most embedded rules over a small number of computational steps.
Step 1: Compute the following quantities: $P_{Mean}$ = mean of all positive weights; $P_{Threshold1}$ = mean of all positive weights less than $P_{Mean}$; $P_{Threshold2}$ = mean of all weights greater than $P_{Mean}$. Similarly calculate $N_{Threshold1}$ and $N_{Threshold2}$ for negative weights.
Step 2: For each hidden and output unit: for all weights greater than $P_{Threshold1}$, search for positive rules only, and for all weights less than $N_{Threshold1}$, search for negated rules only, by the Subset method; search for combinations of positive weights above $P_{Threshold2}$ and negative weights greater than $N_{Threshold1}$ that exceed the bias; similarly search for negative weights less than $N_{Threshold2}$ and positive weights below $P_{Threshold1}$ to find the rules.
Step 3: Associate with each rule $j$ a confidence factor

$$cf_j = \inf_{j:\,\text{nodes in the path}} \frac{\sum_i w_{ji} - \theta_j}{\sum_i w_{ji}} \qquad (13)$$

where $w_{ji}$ is the $i$th incoming link weight to node $j$ and $\theta_j$ is its threshold.
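A small Python sketch of step 1 and the confidence factor of equation (13) follows, assuming weights are given per node as plain lists; the thresholds passed to the confidence function are the node biases.

import numpy as np

def weight_thresholds(weights):
    """Step 1 of algorithm 3: split the positive weights around their mean."""
    w = np.asarray(weights)
    pos = w[w > 0]
    p_mean = pos.mean()
    return p_mean, pos[pos < p_mean].mean(), pos[pos > p_mean].mean()

def confidence_factor(path_weights, path_biases):
    """Equation (13): infimum over the nodes in a rule path of
    (sum of incoming weights - bias) / (sum of incoming weights)."""
    return min((sum(w) - b) / sum(w) for w, b in zip(path_weights, path_biases))

pm, p_th1, p_th2 = weight_thresholds([0.8, 0.6, 1.1, 0.4, -0.3])
cf = confidence_factor(path_weights=[[0.8, 0.6], [1.1, 0.4]],
                       path_biases=[0.5, 0.7])   # min(0.643, 0.533) = 0.533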

Since the training algorithm imposes a structure on the network, resulting in a sparse network having few strong links, the $P_{Threshold1}$ and $N_{Threshold1}$ values are well separated. Hence, the above rule extraction algorithm generates the most embedded rules over a small number of computational steps. An important consideration is the order of application of rules in the rule base. Since most real life patterns are noisy and overlapping, the rule bases obtained are often not totally consistent, and multiple rules may fire for a single example. Several existing approaches apply rules sequentially [14], often leading to degraded performance. The rules extracted by this method have confidence factors associated with them; therefore, if multiple rules are fired, the strongest rule, having the highest confidence, is used.

Two existing rule extraction algorithms similar to this one are the Subset method and the M of N method [36]. The major problem with the Subset algorithm is that the cost of finding all subsets grows with the size of the power set of the links to each unit. It requires lengthy, exhaustive searches of size $2^k$ for a hidden or output node with fan-in $k$, and extracts a large set of rules, up to $\beta_p \cdot (1 + \beta_n)$, where $\beta_p$ and $\beta_n$ are the numbers of subsets of positively and negatively weighted links respectively. Some generated rules may be repetitive, as permutations of rule antecedents are not taken care of automatically. Moreover, there is no guarantee that all useful knowledge embedded in the trained network will be extracted. Additionally, the rule extraction procedure involves a back-propagation step requiring significant computation time. The M of N algorithm has good generalization accuracy but can have degraded comprehensibility [36]; it considers groups of links as equivalence classes, thereby generating a bound on the number of rules rather than establishing a ceiling on the number of antecedents.

Figure 5.2: Steps for designing sample modular RFMLP

Figure 5.3: Chromosomal representation

5.2.3 Quantitative Performance Measures
Some quantitative measures are now provided to evaluate the performance of the rules [36], [109]. Let $N$ be an $l \times l$ matrix whose $(i, j)$th element $n_{ij}$ indicates the number of patterns actually belonging to class $i$ but classified as class $j$. The measures are as follows; a short computational sketch of measures (a), (b), (c) and (e) is given after the list.

(a) Accuracy: the correct classification percentage provided by the rules on a test set, defined as $\frac{n_{ic}}{n_i} \times 100$, where $n_i$ is the number of points in class $i$ and $n_{ic}$ of these points are correctly classified.

(b) User’s Accuracy: if $n_i'$ points are found to be classified into class $i$ [36], then the user’s accuracy (U) is $U = n_{ic} / n_i'$. This gives a measure of the confidence that the classifier attributes to a region as belonging to a class; in other words, it denotes the level of purity associated with a region.

(c) Kappa: the coefficient of agreement, kappa [36], measures the relationship of beyond-chance agreement to expected disagreement. It uses all the cells in the confusion matrix. The estimate of kappa, $K$, is the proportion of agreement after chance agreement is removed from consideration. The kappa value for class $i$ is

$$K_i = \frac{n \cdot n_{ic} - n_i \cdot n_i'}{n \cdot n_i' - n_i \cdot n_i'}$$

The numerator and denominator of the overall kappa are obtained by summing the respective numerators and denominators of $K_i$ separately over all classes.

(d) Fidelity: this represents how closely the rule base approximates the parent ANN model [16], [17], measured as the percentage of the test set for which the network and rule base outputs agree. Fidelity may or may not be greater than accuracy.

(e) Confusion: this measure quantifies the goal that confusion should be restricted to within a minimum number of classes, which helps in higher level decision making. Let $\hat{n}_{ij}$ be the mean of all $n_{ij}$ for $i \neq j$. Then, for an $l$ class problem,

$$Conf = \frac{Card\{n_{ij} : n_{ij} \ge \hat{n}_{ij}, i \neq j\}}{l}$$

A lower value of $Conf$ denotes a smaller number of classes between which confusion occurs.

(f) Cover: ideally the extracted rules should cover all the cluster regions of the pattern space. The percentage of examples from the test set for which no rules are fired is used as a measure of the uncovered region; a rule base having a smaller uncovered region is superior.

(g) Rule base size: measured in terms of the number of rules; lower values indicate a more compact rule base.

(h) Computational complexity: the CPU time required is presented.

(i) Certainty: the certainty of a rule base is quantified by the confidence factors $cf$ of its rules.
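These measures follow directly from the confusion matrix; below is a brief Python sketch computing accuracy, user’s accuracy, per-class kappa and the confusion index for a toy 3-class matrix. The matrix values are illustrative only.

import numpy as np

N = np.array([[50, 3, 2],
              [4, 45, 6],
              [1, 5, 44]])            # n_ij: true class i classified as j
n = N.sum()
l = N.shape[0]

n_i = N.sum(axis=1)                   # points in class i
n_ip = N.sum(axis=0)                  # points classified into class i (n_i')
n_ic = np.diag(N)                     # correctly classified points

accuracy = 100.0 * n_ic / n_i                                   # measure (a)
users_accuracy = n_ic / n_ip                                    # measure (b)
kappa_i = (n * n_ic - n_i * n_ip) / (n * n_ip - n_i * n_ip)     # measure (c)

off_diag = N[~np.eye(l, dtype=bool)]
conf = (off_diag >= off_diag.mean()).sum() / l                  # measure (e)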

5.3 Time Series Forecasting using Hybrid Neuro-Fuzzy Regression Model
In this section, time series financial market data are forecasted using the hybrid Neuro-Fuzzy regression model, which leads to improved decisions and investments [41]. The major concern is the accuracy and effectiveness of the forecasts. ANN and Fuzzy sets are well suited to tackle these types of problems. The model yields accurate results with fewer observations and incomplete data sets for point and interval forecasts. The empirical results indicate that the performance of the model compares favourably with other models for forecasting and decision making under different conditions.

5.3.1 ANN Approach to Time Series
ANN flexibly models a wide range of non-linear problems [79]. ANNs are universal approximators that can approximate a large class of functions with a high degree of accuracy. Their power lies in the parallel processing of information from the data. No prior assumption of the model form is required in the model building process; instead, the network model is largely determined by the characteristics of the data. The single hidden layer feed forward network is the most widely used form for time-series modeling and forecasting [164]. The model is characterized by a network of three layers of simple processing units connected by acyclic links, as shown in figure 5.4. The output $y_t$ and the inputs

( yt 1 ,......... ., yt  p ) are represented by following relationship: q

p

j 1

i 1

yt  w0   w j  g (w0, j   wi , j  yt i )   t

(14)

where (wi , j ; i  0,1,2,......., p, j  1,2,......., q) and (w j ; j  0,1,2,......., q) are model parameters called connection weights, p is number of input nodes and q is number of hidden nodes. The logistic function used as hidden layer transfer function is:

Sig ( x) 

1 1  e x

(15)
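As a concrete instance of equations (14)–(15), the following is a minimal NumPy sketch of a single hidden layer forecaster producing a one-step-ahead prediction from the last p observations; the weights here are random placeholders standing in for trained values.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # equation (15)

def forecast_one_step(history, w0, w, W0, W):
    """Equation (14) without the error term: p = len(history) lagged inputs,
    q = len(w) hidden nodes; returns the one-step-ahead forecast."""
    lags = history[::-1]                    # y_{t-1}, ..., y_{t-p}
    hidden = sigmoid(W0 + W @ lags)         # q hidden activations
    return w0 + w @ hidden

p, q = 4, 3
rng = np.random.default_rng(0)
w0, w = 0.1, rng.normal(size=q)             # output-layer weights
W0, W = rng.normal(size=q), rng.normal(size=(q, p))
y_hat = forecast_one_step(np.array([1.2, 1.4, 1.3, 1.5]), w0, w, W0, W)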

Thus, equation (14) performs a non-linear functional mapping from past observations to the future value $y_t$, i.e.

$$y_t = f(y_{t-1}, \ldots, y_{t-p}, w) + \varepsilon_t \qquad (16)$$

where $w$ is the vector of all parameters and $f(\cdot)$ is a function determined by the network structure and connection weights. Hence, the ANN is equivalent to a non-linear autoregressive model. Equation (14) implies that one output node in the output layer is used for one-step-ahead forecasting, and the network is able to approximate an arbitrary function when the number of hidden nodes $q$ is sufficiently large [140]. Generally, a simple network structure with a small number of hidden nodes performs well in out-of-sample forecasting, primarily owing to the over-fitting effect typically found in the ANN modeling process. The choice of $q$ is data dependent, and there is no systematic rule for determining this parameter. Along with choosing an appropriate number of hidden nodes, an important task in ANN modeling of time-series is the selection of the number of lagged observations $p$, the dimension of the input vector [147]. This is an important parameter to estimate in an ANN model because it plays a major role in determining the non-linear autocorrelation structure of the time-series.

5.3.2 Fuzzy Forecasting Model
Fuzzy sets, introduced by Zadeh [155], [159], provide a powerful framework for coping with vague or ambiguous situations and for expressing linguistic values and human subjective judgments in natural language. Fuzzy sets were applied to time-series, leading to Fuzzy time-series, by [48], [49]. Among the notable Fuzzy time-series models where calculations are easy and forecasting performance is good is that of Chen [41], [47], as given in algorithm 4 below.

5.3.3 Fuzzy Regression Models
Statistical models use the concept of measurement error to deal with the difference between estimators and observations, and treat the data as precise values; this is where the Fuzzy regression model differs [113], [139]. In Fuzzy regression, the residuals between estimators and observations are not produced by measurement errors but rather by parameter uncertainty in the model, and a possibility distribution is used to deal with the real observations. The generalized model of Fuzzy linear regression is:

$$\tilde{Y} = \tilde{\beta}_0 + \tilde{\beta}_1 x_1 + \ldots + \tilde{\beta}_n x_n = \sum_{i=0}^{n} \tilde{\beta}_i x_i = x' \tilde{\beta} \qquad (17)$$

where $x'$ denotes the transpose of the vector of independent variables, $n$ is the number of variables and $\tilde{\beta}_i$ is the fuzzy set representing the $i$th parameter of the model. The parameter $\tilde{\beta}_i$ is an L-type fuzzy number [130], the possibility distribution of which is given by

$$\mu_{\tilde{\beta}_i}(\beta_i) = L\{(\alpha_i - \beta_i)/c\} \qquad (18)$$

Algorithm 4: Chen’s algorithm
Step 1: Define the universe of discourse and the intervals for rule abstraction, i.e., U = [starting, ending]. Once the interval length is determined, U can be partitioned into equal-length intervals.
Step 2: Define fuzzy sets based on U and fuzzify the historical data.
Step 3: Fuzzify the observed rules. A datum is fuzzified to $A_j$ if its maximal degree of membership is in $A_j$.
Step 4: Establish fuzzy logical relationships and group them based on the current states of the data in the fuzzy logical relationships. If $A_1 \to A_1$, $A_1 \to A_2$ and $A_1 \to A_3$, they can be grouped as $A_1 \to A_1, A_2, A_3$.
Step 5: Perform the forecasting operation. If $F(t-1) = A_i$ and $A_i \to A_i, A_j, \ldots, A_k$, then the forecast value is $A_i, A_j, \ldots, A_k$.
Step 6: Defuzzify using the centroid method to get the results.
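A compact Python sketch of algorithm 4 follows, assuming simple interval-based fuzzy sets (each $A_j$ is an equal-length interval, with its midpoint used for centroid defuzzification); the data values are illustrative.

import numpy as np

def chen_forecast(series, n_intervals=7):
    """Chen's fuzzy time-series method (algorithm 4), simplified."""
    lo, hi = min(series), max(series)
    edges = np.linspace(lo, hi, n_intervals + 1)
    mids = (edges[:-1] + edges[1:]) / 2
    # Steps 2-3: fuzzify each observation to the interval of maximal membership
    labels = np.clip(np.searchsorted(edges, series, side="right") - 1,
                     0, n_intervals - 1)
    # Step 4: group fuzzy logical relationships A_i -> {A_j}
    groups = {}
    for a, b in zip(labels, labels[1:]):
        groups.setdefault(a, set()).add(b)
    # Steps 5-6: forecast from the last state, defuzzify via midpoint centroid
    rhs = groups.get(labels[-1], {labels[-1]})
    return float(np.mean([mids[j] for j in rhs]))

y_hat = chen_forecast([102, 105, 103, 108, 110, 107, 111, 109])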

Figure 5.4: Artificial Neural Network Structure $[N^{(p \times q \times 1)}]$

Fuzzy parameters in the form of triangular fuzzy numbers are used, such that

$$\mu_{\tilde{\beta}_i}(\beta_i) = \begin{cases} 1 - \dfrac{|\alpha_i - \beta_i|}{c_i}, & \alpha_i - c_i \le \beta_i \le \alpha_i + c_i \\ 0, & \text{otherwise} \end{cases} \qquad (19)$$

where $\mu_{\tilde{\beta}_i}(\beta_i)$ is the membership function of the fuzzy set represented by the parameter $\beta_i$, $\alpha_i$ is the centre of the fuzzy number and $c_i$ is the width or spread around the centre of the fuzzy number. By the extension principle, the membership function of the fuzzy number $\tilde{y}_t = x_t' \tilde{\beta}$ can be defined using the pyramidal fuzzy parameters as follows [139]:

$$\mu_{\tilde{y}}(y_t) = \begin{cases} 1 - \dfrac{|y_t - x_t \alpha|}{c' |x_t|}, & x_t \neq 0 \\ 1, & x_t = 0,\ y_t = 0 \\ 0, & x_t = 0,\ y_t \neq 0 \end{cases} \qquad (20)$$

where $\alpha$ and $c$ denote the vectors of modal values and spreads for all model parameters respectively, and $t = 1, 2, \ldots, k$ indexes the observations. Finally, the method uses the criterion of minimizing the total vagueness $S$, defined as the sum of the individual spreads of the fuzzy parameters of the model [41], [47], [139]:

$$\text{Minimize } S = \sum_{t=1}^{k} c' |x_t| \qquad (21)$$

This approach also takes into account that the membership value of each observation $y_t$ must be greater than an imposed threshold level $h \in [0, 1]$. This criterion expresses the requirement that the fuzzy output of the model should cover all the data points $y_1, \ldots, y_k$ to a certain $h$ level. The choice of the $h$ level value influences the widths $c$ of the fuzzy parameters:

$$\mu_{\tilde{y}}(y_t) \ge h \quad \text{for } t = 1, 2, \ldots, k \qquad (22)$$

The index $t$ refers to the number of non-fuzzy data used in constructing the model. The problem of finding the Fuzzy regression parameters was formulated as the following linear programming problem [41], [47], [113], [139]:

$$\text{Minimize } S = \sum_{t=1}^{k} c' |x_t|$$

$$\text{subject to} \quad \begin{cases} x_t' \alpha + (1 - h) c' |x_t| \ge y_t, & t = 1, 2, \ldots, k \\ x_t' \alpha - (1 - h) c' |x_t| \le y_t, & t = 1, 2, \ldots, k \\ c \ge 0, & \text{otherwise} \end{cases} \qquad (23)$$

where $\alpha' = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ and $c' = (c_1, c_2, \ldots, c_n)$ are the vectors of unknown variables and $S$ is the total vagueness.

5.3.4 Neuro-Fuzzy Forecasting Model
ANN is an appropriate forecasting paradigm for a wide range of non-linear problems [79]. However, a large amount of historical data is required to produce accurate results. In today’s competitive

scenario, owing to uncertainty from the environment and the rapid growth of new technologies, forecasting must often be performed using little data over a short span of time. Thus, forecasting methods are required that are efficient under incomplete data conditions. The Fuzzy regression model is suitable for interval forecasting where inadequate historical data are available. The parameters of the ANN model viz., the weights $(w_{i,j};\ i = 0, 1, 2, \ldots, p,\ j = 1, 2, \ldots, q)$ and biases $(w_j;\ j = 0, 1, 2, \ldots, q)$, are crisp in nature. Now, instead of crisp values, fuzzy parameters in the form of trapezoidal fuzzy numbers (equation (26)) are used for the weights $(\tilde{w}_{i,j};\ i = 0, 1, 2, \ldots, p,\ j = 1, 2, \ldots, q)$ and biases $(\tilde{w}_j;\ j = 0, 1, 2, \ldots, q)$ of the respective layers. The methodology considered here includes the wide spread of the forecasted model. The model is described using a fuzzy function with fuzzy parameters [41], [47], [113], [139]:

$$\tilde{y}_t = f\left(\tilde{w}_0 + \sum_{j=1}^{q} \tilde{w}_j \cdot g\left(\tilde{w}_{0,j} + \sum_{i=1}^{p} \tilde{w}_{i,j} \cdot y_{t-i}\right)\right) \qquad (24)$$

where $y_t$ are the observations and $\tilde{w}_{i,j};\ i = 0, 1, 2, \ldots, p,\ j = 1, 2, \ldots, q$ and $\tilde{w}_j;\ j = 0, 1, 2, \ldots, q$ are fuzzy numbers. Equation (14) can be rewritten as [41], [47]:

$$\tilde{y}_t = f\left(\tilde{w}_0 + \sum_{j=1}^{q} \tilde{w}_j \cdot \tilde{X}_{t,j}\right) = f\left(\sum_{j=0}^{q} \tilde{w}_j \cdot \tilde{X}_{t,j}\right) \qquad (25)$$

where $\tilde{X}_{t,j} = g\left(\tilde{w}_{0,j} + \sum_{i=1}^{p} \tilde{w}_{i,j} \cdot y_{t-i}\right)$. Fuzzy parameters in the form of trapezoidal fuzzy numbers $\tilde{w}_{i,j} = (a_{i,j}, b_{i,j}, c_{i,j}, d_{i,j})$ are used [41], [47]:

$$\mu_{\tilde{w}_{i,j}}(w_{i,j}) = \begin{cases} \dfrac{w_{i,j} - a_{i,j}}{b_{i,j} - a_{i,j}}, & a_{i,j} \le w_{i,j} \le b_{i,j} \\ 1, & b_{i,j} \le w_{i,j} \le c_{i,j} \\ \dfrac{c_{i,j} + d_{i,j} - w_{i,j}}{d_{i,j}}, & c_{i,j} \le w_{i,j} \le c_{i,j} + d_{i,j} \end{cases} \qquad (26)$$

where $\mu_{\tilde{w}}(w_{i,j})$ is the membership function of the fuzzy set representing $w_{i,j}$. By using the extension principle [18], [105], [107], the membership of $\tilde{X}_{t,j} = g\left(\tilde{w}_{0,j} + \sum_{i=1}^{p} \tilde{w}_{i,j} \cdot y_{t-i}\right)$ in equation (25) is:

$$\mu_{\tilde{X}_{t,j}}(x_{t,j}) = \begin{cases} \dfrac{X_{t,j} - g\left(\sum_{i=0}^{p} a_{i,j} y_{t,i}\right)}{g\left(\sum_{i=0}^{p} b_{i,j} y_{t,i}\right) - g\left(\sum_{i=0}^{p} a_{i,j} y_{t,i}\right)}, & X_{t,j} \le g\left(\sum_{i=0}^{p} b_{i,j} y_{t,i}\right) \\ 1, & g\left(\sum_{i=0}^{p} b_{i,j} y_{t,i}\right) \le X_{t,j} \le g\left(\sum_{i=0}^{p} c_{i,j} y_{t,i}\right) \\ \dfrac{g\left(\sum_{i=0}^{p} d_{i,j} y_{t,i}\right) - X_{t,j}}{g\left(\sum_{i=0}^{p} d_{i,j} y_{t,i}\right) - g\left(\sum_{i=0}^{p} c_{i,j} y_{t,i}\right)}, & X_{t,j} \ge g\left(\sum_{i=0}^{p} c_{i,j} y_{t,i}\right) \end{cases} \qquad (27)$$

where $y_{t,i} = 1$ $(t = 1, 2, \ldots, k,\ i = 0)$ and $y_{t,i} = y_{t-i}$ $(t = 1, 2, \ldots, k,\ i = 1, 2, \ldots, p)$. Considering the trapezoidal fuzzy numbers $\tilde{X}_{t,j}$ with the membership function given by equation (27), the trapezoidal fuzzy parameters $\tilde{w}_j$ will be as follows [41], [47]:

$$\mu_{\tilde{w}_j}(w_j) = \begin{cases} \dfrac{w_j - d_j}{e_j - d_j}, & d_j \le w_j \le e_j \\ 1, & e_j \le w_j \le f_j \\ \dfrac{f_j + g_j - w_j}{g_j}, & f_j \le w_j \le f_j + g_j \end{cases} \qquad (28)$$

The membership function of $\tilde{y}_t = f\left(\tilde{w}_0 + \sum_{j=1}^{q} \tilde{w}_j \cdot \tilde{X}_{t,j}\right) = f\left(\sum_{j=0}^{q} \tilde{w}_j \cdot \tilde{X}_{t,j}\right)$ is given as [41], [47]:

$$\mu_{\tilde{Y}}(y_t) = \begin{cases} -\dfrac{B_1}{2A_1} + \left[\left(\dfrac{B_1}{2A_1}\right)^2 - \dfrac{C_1 - f^{-1}(y_t)}{A_1}\right]^{1/2}, & f^{-1}(y_t) \le C_1 \\ 1, & C_1 \le f^{-1}(y_t) \le C_2 \\ -\dfrac{B_2}{2A_2} + \left[\left(\dfrac{B_2}{2A_2}\right)^2 - \dfrac{f^{-1}(y_t) - C_2}{A_2}\right]^{1/2}, & f^{-1}(y_t) \ge C_2 \end{cases} \qquad (29)$$

where

$$A_1 = \sum_{j=0}^{q} (e_j - d_j) \left( g\left(\sum_{i=0}^{p} b_{i,j} y_{t,i}\right) - g\left(\sum_{i=0}^{p} a_{i,j} y_{t,i}\right) \right)$$

$$B_1 = \sum_{j=0}^{q} \left( d_j \left( g\left(\sum_{i=0}^{p} b_{i,j} y_{t,i}\right) - g\left(\sum_{i=0}^{p} a_{i,j} y_{t,i}\right) \right) + g\left(\sum_{i=0}^{p} a_{i,j} y_{t,i}\right) (e_j - d_j) \right)$$

$$A_2 = \sum_{j=0}^{q} (f_j - g_j) \left( g\left(\sum_{i=0}^{p} c_{i,j} y_{t,i}\right) - g\left(\sum_{i=0}^{p} d_{i,j} y_{t,i}\right) \right)$$

$$B_2 = \sum_{j=0}^{q} \left( g_j \left( g\left(\sum_{i=0}^{p} c_{i,j} y_{t,i}\right) - g\left(\sum_{i=0}^{p} d_{i,j} y_{t,i}\right) \right) + g\left(\sum_{i=0}^{p} d_{i,j} y_{t,i}\right) (f_j - g_j) \right)$$

$$C_1 = \sum_{j=0}^{q} d_j \, g\left(\sum_{i=0}^{p} a_{i,j} y_{t,i}\right), \qquad C_2 = \sum_{j=0}^{q} g_j \, g\left(\sum_{i=0}^{p} d_{i,j} y_{t,i}\right)$$

Considering the threshold level $h$ for all membership function values of the observations, the non-linear programming problem is given as follows [41], [47]:

$$\text{Min} \sum_{t=1}^{k} \sum_{j=0}^{q} \left( f_j \, g\left(\sum_{i=0}^{p} c_{i,j} y_{t,i}\right) - d_j \, g\left(\sum_{i=0}^{p} a_{i,j} y_{t,i}\right) \right)$$

$$\text{subject to} \quad \begin{cases} -\dfrac{B_1}{2A_1} + \left[\left(\dfrac{B_1}{2A_1}\right)^2 - \dfrac{C_1 - f^{-1}(y_t)}{A_1}\right]^{1/2} \ge h, & f^{-1}(y_t) \le C_1, \; t = 1, 2, \ldots, k \\ 1, & C_1 \le f^{-1}(y_t) \le C_2 \\ -\dfrac{B_2}{2A_2} + \left[\left(\dfrac{B_2}{2A_2}\right)^2 - \dfrac{f^{-1}(y_t) - C_2}{A_2}\right]^{1/2} \ge h, & f^{-1}(y_t) \ge C_2, \; t = 1, 2, \ldots, k \end{cases} \qquad (30)$$

To preserve the simplicity and efficiency of the model in forecasting, the trapezoidal fuzzy numbers are taken to be symmetric, the output neuron transfer function is taken to be linear, and the connection weights between the input and hidden layers are considered to be of crisp form. The membership function of $y_t$ then transforms to [41], [47]:

$$\mu_{\tilde{y}}(y_t) = \begin{cases} 1 - \dfrac{\left| y_t - \sum_{j=0}^{q} \alpha_j X_{t,j} \right|}{\sum_{j=0}^{q} c_j |X_{t,j}|}, & X_{t,j} \neq 0 \\ 0, & \text{otherwise} \end{cases} \qquad (31)$$

Here $y_t$ represents the $t$th observation, and the $h$-level is the threshold value representing the degree to which the model should be satisfied by all the data points $y_1, \ldots, y_k$. The choice of the $h$ value influences the widths of the fuzzy parameters [41], [47]:

$$\mu_{\tilde{y}}(y_t) \ge h, \quad t = 1, \ldots, k \qquad (32)$$

The index $t$ refers to the number of non-fuzzy data used for constructing the model. On the other hand, the fuzziness $S$ included in the model is defined by [41], [47]:

$$S = \sum_{j=0}^{q} \sum_{t=1}^{k} c_j |w_j| |X_{t,j}| \qquad (33)$$

where $w_j$ is the connection weight between the output neuron and the $j$th neuron of the hidden layer, and $X_{t,j}$ is the output value of the $j$th neuron of the hidden layer at time $t$. Now the problem lies in finding the parameters of the discussed method, which is reformulated as the following linear programming problem [41], [47]:

$$\text{Minimize } S = \sum_{j=0}^{q} \sum_{t=1}^{k} c_j |w_j| |X_{t,j}|$$

$$\text{subject to} \quad \begin{cases} \sum_{j=0}^{q} \alpha_j X_{t,j} + (1 - h) \left( \sum_{j=0}^{q} c_j |X_{t,j}| \right) \ge y_t, & t = 1, \ldots, k \\ \sum_{j=0}^{q} \alpha_j X_{t,j} - (1 - h) \left( \sum_{j=0}^{q} c_j |X_{t,j}| \right) \le y_t, & t = 1, \ldots, k \\ c_j \ge 0, & j = 0, 1, \ldots, q \end{cases} \qquad (34)$$
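Problem (34) is a standard linear program in the unknowns α_j and c_j once the hidden-layer outputs X_{t,j} are fixed by phase I; below is a minimal sketch of solving it with scipy.optimize.linprog. The matrices X, y and the level h are illustrative placeholders.

import numpy as np
from scipy.optimize import linprog

def fit_fuzzy_regression(X, y, w_abs, h=0.5):
    """Solve the linear program (34): variables are [alpha_0..alpha_q, c_0..c_q]."""
    k, m = X.shape                       # k observations, m = q + 1 regressors
    absX = np.abs(X)
    cost = np.concatenate([np.zeros(m), w_abs * absX.sum(axis=0)])  # objective (33)
    # X alpha + (1-h)|X| c >= y   and   X alpha - (1-h)|X| c <= y
    A_ub = np.block([[-X, -(1 - h) * absX],
                     [ X, -(1 - h) * absX]])
    b_ub = np.concatenate([-y, y])
    bounds = [(None, None)] * m + [(0, None)] * m    # spreads c_j >= 0
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:m], res.x[m:]          # centres alpha, spreads c

# Toy phase-II call: 5 observations, bias plus 2 hidden-neuron outputs
X = np.array([[1, 0.2, 0.7], [1, 0.3, 0.6], [1, 0.5, 0.4],
              [1, 0.6, 0.3], [1, 0.8, 0.1]])
y = np.array([1.1, 1.2, 1.4, 1.5, 1.7])
alpha, c = fit_fuzzy_regression(X, y, w_abs=np.ones(3))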

The procedure of the model is [41], [47]:
(a) Phase I: Train the network using the available information from the observations. The optimum solution of the parameters $w^* = (w_j^*;\ j = 0, 1, 2, \ldots, q,\ w_{i,j}^*;\ i = 0, 1, 2, \ldots, p,\ j = 0, 1, 2, \ldots, q)$ and the output values of the hidden neurons serve as part of the input data set for phase II.
(b) Phase II: Determine the minimal fuzziness using the criterion stated in equation (33) together with $w^*$; the number of constraint functions is the same as the number of observations.
(c) Phase III: The data around the model's upper and lower bounds, when the proposed model has outliers with wide spread, are deleted in accordance with Ishibuchi's recommendations. In order for the model to include all possible conditions, $c_j$ has a wide spread when the data set includes a significant difference or an outlying case; [113] suggested that the data around the model's upper and lower boundaries be deleted so that the Fuzzy regression model can be reformulated.

5.4 Enhancing Forecasting Accuracy with Non-uniform spread FLR Model
Fuzzy linear regression has proved to be a powerful tool for analyzing the vague relationship between dependent and independent variables in complex systems involving human subjective judgment under incomplete conditions [42]. As illustrated in section 5.3, Fuzzy linear regression hybridized with ANN is an efficient time series forecasting technique. Some previous investigations on Fuzzy linear regression analysis obtain crisp regression coefficients to eliminate the problem of increasing spreads of the estimated fuzzy responses as the magnitude of the independent variables increases. However, they still cannot handle the situation of decreasing or variable spreads. In this section, a three phase method is presented to construct a Fuzzy linear regression model with variable spreads to tackle this problem.

5.4.1 Fuzzy Linear Regression
The linear regression model expresses the relationship between one or more explanatory variables and a response variable. The crisp simple linear regression model involving a single independent variable is represented mathematically by the following relation [42], [139]:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \ldots, n \qquad (35)$$

where $x_i$ and $y_i$ represent the explanatory variable and the response in the $i$th observation respectively, $\beta_0$ and $\beta_1$ are the regression coefficients and $\varepsilon_i$ is the error term associated with the $i$th observation. In classical statistical analysis, the population parameters $\beta_0$ and $\beta_1$ are estimated by certain sample statistics, such as the least-squares estimators, which are the best linear unbiased estimators and are used most frequently [42]; they are given as:

$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \qquad (36i)$$

$$b_0 = \frac{\sum_{i=1}^{n} y_i \sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i y_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \bar{y} - b_1 \bar{x} \qquad (36ii)$$

where $\bar{x} = \sum_{i=1}^{n} x_i / n$ and $\bar{y} = \sum_{i=1}^{n} y_i / n$. The case of multiple regression is a straightforward generalization of linear regression. When any of the responses $y_i$ or explanatory variables $x_i$ is fuzzy, the crisp regression model defined in equation (35) is modified into the Fuzzy regression model expressed mathematically as [27]:

$$\tilde{y}_i = \tilde{\beta}_0 + \tilde{\beta}_1 x_i + \tilde{\varepsilon}_i, \quad i = 1, \ldots, n \qquad (37)$$

where $\tilde{\beta}_0$ and $\tilde{\beta}_1$ are the regression coefficients, $(\tilde{x}_i, \tilde{y}_i),\ i = 1, \ldots, n$ are $n$ pairs of fuzzy observations and $\tilde{\varepsilon}_i$ is the fuzzy error term associated with the regression model. Since crisp numbers can be described by degenerated fuzzy numbers, consider $\tilde{x}_i$ and $\tilde{y}_i$ as the fuzzy numbers [42]:

$$\tilde{x}_i = \{(x_i, \mu_{\tilde{x}_i}(x_i)) \mid x_i \in X_i\}, \quad i = 1, \ldots, n \qquad (38i)$$

$$\tilde{y}_i = \{(y_i, \mu_{\tilde{y}_i}(y_i)) \mid y_i \in Y_i\}, \quad i = 1, \ldots, n \qquad (38ii)$$

where $X_i$ and $Y_i$ are the crisp universal sets of the explanatory variables and responses, and $\mu_{\tilde{x}_i}(x_i)$ and $\mu_{\tilde{y}_i}(y_i)$ are their corresponding membership functions respectively. When $\tilde{x}_i$ and $\tilde{y}_i$ are fuzzy numbers, the least-squares estimators stated in equations (36i) and (36ii) become fuzzy numbers, denoted $\tilde{b}_1$ and $\tilde{b}_0$ respectively. It is necessary to find their membership functions to completely conserve the fuzziness of the observations.

5.4.2 Membership Function of Regression Coefficients
According to Zadeh’s extension principle [139], [160], [161], [162], the membership functions of the regression coefficients are defined as:

$$\mu_{\tilde{b}_1}(z_1) = \sup_{X_i, Y_i} \min \{\mu_{\tilde{x}_i}(x_i), \mu_{\tilde{y}_i}(y_i), \forall i \mid z_1 = b_1\} \qquad (39i)$$

$$\mu_{\tilde{b}_0}(z_0) = \sup_{X_i, Y_i} \min \{\mu_{\tilde{x}_i}(x_i), \mu_{\tilde{y}_i}(y_i), \forall i \mid z_0 = b_0\} \qquad (39ii)$$

where $b_1$ and $b_0$ are defined in equations (36i) and (36ii). Although equations (39i) and (39ii) are theoretically correct, since several fuzzy numbers are involved it is almost impossible to derive the membership functions of $\tilde{b}_1$ and $\tilde{b}_0$ from these two equations in an explicit manner. An easier approach has been proposed [42] to derive the membership functions based on Zadeh’s extension principle and the $\alpha$-cut representation, which can be adopted to construct the membership functions $\mu_{\tilde{b}_1}$ and $\mu_{\tilde{b}_0}$. The idea is to derive the $\alpha$-cuts of $\tilde{b}_1$ and $\tilde{b}_0$, which are defined for $\tilde{x}_i$ and $\tilde{y}_i$ [42] as follows:

xi ( )  {xi  X i |  ~ ( xi )  }

(40i)

xi

and yi ( )  { yi  Yi |  ~ ( yi )  }

(40ii)

yi

Here, xi ( ) and yi ( ) are crisp sets rather than fuzzy sets. Since, x i and y i are assumed to be fuzzy numbers, their -cuts defined in equations (39i) and (39ii) are crisp intervals expressed as follows: xi ( )  [min{xi |  ~ ( xi )   }, max{xi |  ~ ( xi )   }]  [( xi )L , ( xi )U ] xi X i

xi X i

xi

xi

and yi ( )  [min{ yi |  ~ ( yi )   }, max{ yi |  ~ ( yi )   }]  [( yi )L , ( yi )U ] yi Yi

yi Yi

yi

yi

These intervals indicate where constant explanatory variables and responses lie at possibility. By convexity of fuzzy number, bounds of these intervals are functions of  and can be obtained as ( xi ) L  min  ~ 1 ( ) , ( xi )U  max  ~ 1 ( ) , ( y i ) L  min  ~ 1 ( ) and ( yi )U  max  ~1 ( ) xi

xi

yi

yi

respectively. The membership functions of $\tilde{b}_1$ and $\tilde{b}_0$ defined in equations (39i) and (39ii) are parameterized by $\alpha$; as a result, $\alpha$-cuts can be used to construct their membership functions [42]. The basic idea of deriving the membership functions of the regression coefficients is to employ $\alpha$-cuts and Zadeh’s extension principle to transform the fuzzy regression coefficients $\tilde{b}_1$ and $\tilde{b}_0$ into families of crisp regression coefficient intervals $[(b_1)_\alpha^L, (b_1)_\alpha^U]$ and $[(b_0)_\alpha^L, (b_0)_\alpha^U]$ respectively, parameterized by $\alpha$. According to the extension principle given by equation (39i), $\mu_{\tilde{b}_1}(z_1)$ is the minimum of $\mu_{\tilde{x}_i}(x_i)$ and $\mu_{\tilde{y}_i}(y_i), \forall i$. To attain this membership value, either $\mu_{\tilde{x}_i}(x_i) \ge \alpha$ or $\mu_{\tilde{y}_i}(y_i) \ge \alpha$, and at least one $\mu_{\tilde{x}_i}(x_i)$ or $\mu_{\tilde{y}_i}(y_i), \forall i$, must be equal to $\alpha$ such that $z_1 = b_1$, in order to satisfy $\mu_{\tilde{b}_1}(z_1) = \alpha$, where $b_1$ is defined in equation (36i). Since all $\alpha$-cuts form a nested structure with respect to $\alpha$ [42], $\mu_{\tilde{x}_i}(x_i) \ge \alpha$ and $\mu_{\tilde{x}_i}(x_i) = \alpha$ have the same domain as $\mu_{\tilde{y}_i}(y_i) \ge \alpha$ and $\mu_{\tilde{y}_i}(y_i) = \alpha$. To find the membership function $\mu_{\tilde{b}_1}(z_1)$, it is sufficient to derive the lower and upper bounds of the $\alpha$-cuts of $\tilde{b}_1$. This is achieved by solving the following crisp parametric nonlinear programming problems:

$$(b_1)_\alpha^L = \min \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \quad \text{subject to } (x_i)_\alpha^L \le x_i \le (x_i)_\alpha^U, \; (y_i)_\alpha^L \le y_i \le (y_i)_\alpha^U, \; \forall i \qquad (41i)$$

$$(b_1)_\alpha^U = \max \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \quad \text{subject to } (x_i)_\alpha^L \le x_i \le (x_i)_\alpha^U, \; (y_i)_\alpha^L \le y_i \le (y_i)_\alpha^U, \; \forall i \qquad (41ii)$$

where at least one $x_i$ or $y_i$ must hit the boundary of its $\alpha$-cut to satisfy $\mu_{\tilde{b}_1}(z_1) = \alpha$. In the objective functions of equations (41i) and (41ii), $\bar{x} = \sum_{i=1}^{n} x_i / n$ and $\bar{y} = \sum_{i=1}^{n} y_i / n$ are crisp numbers rather than fuzzy numbers, as all $x_i$ and $y_i$, $i = 1, \ldots, n$, there are crisp numbers. Similarly, the membership function $\mu_{\tilde{b}_0}(z_0)$ can be obtained by deriving the lower and upper bounds of the $\alpha$-cut of $\tilde{b}_0$, solved by the following parametric nonlinear programming problems [42]:

$$(b_0)_\alpha^L = \min \frac{\sum_{i=1}^{n} y_i \sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i y_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} \quad \text{subject to } (x_i)_\alpha^L \le x_i \le (x_i)_\alpha^U, \; (y_i)_\alpha^L \le y_i \le (y_i)_\alpha^U, \; \forall i \qquad (42i)$$

$$(b_0)_\alpha^U = \max \frac{\sum_{i=1}^{n} y_i \sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i y_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} \quad \text{subject to } (x_i)_\alpha^L \le x_i \le (x_i)_\alpha^U, \; (y_i)_\alpha^L \le y_i \le (y_i)_\alpha^U, \; \forall i \qquad (42ii)$$

Each of the above two models is a nonlinear program with bound constraints that can be solved effectively and efficiently using nonlinear programming algorithms, such as sequential quadratic programming methods [42]. The membership functions $\mu_{\tilde{b}_i}(z_i)$ are then constructed as follows:

$$\mu_{\tilde{b}_i}(z_i) = \begin{cases} 0, & z_i \le (b_i)_{\alpha=0}^L \\ L_i(z_i), & (b_i)_{\alpha=0}^L \le z_i \le (b_i)_{\alpha=1}^L \\ 1, & (b_i)_{\alpha=1}^L \le z_i \le (b_i)_{\alpha=1}^U \\ R_i(z_i), & (b_i)_{\alpha=1}^U \le z_i \le (b_i)_{\alpha=0}^U \\ 0, & (b_i)_{\alpha=0}^U \le z_i \end{cases}, \quad i = 0, 1 \qquad (43)$$

If the values of $(b_1)_\alpha^L$, $(b_1)_\alpha^U$, $(b_0)_\alpha^L$ and $(b_0)_\alpha^U$ cannot be solved analytically, then numerical values for them at different levels $\alpha$ can be collected to approximate the shapes of $L_i(z_i)$ and $R_i(z_i)$, $i = 0, 1$.
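A numerical sketch of problems (41i)–(41ii) follows, using scipy.optimize.minimize with box bounds to obtain the α-cut end points of b̃1; the interval data are illustrative placeholders for the α-cuts of three fuzzy observations.

import numpy as np
from scipy.optimize import minimize

def b1(z, n):
    """Objective of (41i)/(41ii): least-squares slope over z = [x_1..x_n, y_1..y_n]."""
    x, y = z[:n], z[n:]
    xc, yc = x - x.mean(), y - y.mean()
    return np.dot(xc, yc) / np.dot(xc, xc)

def b1_alpha_cut(x_lo, x_hi, y_lo, y_hi):
    """Lower/upper bounds of the alpha-cut of b1 over the given box."""
    n = len(x_lo)
    bounds = list(zip(x_lo, x_hi)) + list(zip(y_lo, y_hi))
    z0 = np.array([(a + b) / 2 for a, b in bounds])
    lo = minimize(lambda z: b1(z, n), z0, bounds=bounds)
    hi = minimize(lambda z: -b1(z, n), z0, bounds=bounds)
    return lo.fun, -hi.fun

lo, hi = b1_alpha_cut(x_lo=[0.9, 1.9, 2.9], x_hi=[1.1, 2.1, 3.1],
                      y_lo=[1.8, 3.8, 5.8], y_hi=[2.2, 4.2, 6.2])

Since general-purpose minimizers return local optima, several starting points, or the sequential quadratic programming solvers mentioned above, may be needed in practice.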

5.4.3 Fuzzy Regression with Non-uniform Spreads
The idea that the ideal regression coefficients in a fuzzy linear regression model should be crisp rather than fuzzy numbers, to avoid increasing spreads of the estimated response, has been given earlier [15], [27], [61], [77], [113], [139]. Furthermore, a better fuzzy regression model should cope with situations of decreasing or non-uniform spreads. Here, an improved fuzzy linear regression model with better forecasting performance is discussed.

To find representative crisp values for $\tilde{b}_1$ and $\tilde{b}_0$, the fuzzy numbers $\tilde{b}_1$ and $\tilde{b}_0$ are defuzzified to crisp values. Among the defuzzification approaches, the center of area (COA) method is the most commonly used technique, and it is used here for defuzzifying the fuzzy regression coefficients to crisp ones [86].

Let $(b_i)_c$ be the defuzzified value of $\tilde{b}_i$. The COA method calculates $(b_i)_c$ as follows:

$$(b_i)_c = \frac{\int_{(b_i)_{\alpha=0}^L}^{(b_i)_{\alpha=0}^U} z_i \, \mu_{\tilde{b}_i}(z_i) \, dz_i}{\int_{(b_i)_{\alpha=0}^L}^{(b_i)_{\alpha=0}^U} \mu_{\tilde{b}_i}(z_i) \, dz_i}, \quad i = 0, 1 \qquad (44)$$

where $\mu_{\tilde{b}_i}(z_i)$ is the membership function of $\tilde{b}_i$, $i = 0, 1$, which can be developed as discussed previously. If the analytical form of $\mu_{\tilde{b}_i}(z_i)$ cannot be obtained, then numerical methods of approximation, such as the trapezoidal rule, can be used. From equation (37), the error term is as follows [42]:

$$\tilde{\varepsilon}_i = \tilde{y}_i - (\tilde{\beta}_0 + \tilde{\beta}_1 \tilde{x}_i), \quad i = 1, \ldots, n \qquad (45)$$
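When only sampled values of the membership function are available, equation (44) reduces to a ratio of two numerical integrals; a brief sketch using the trapezoidal rule:

import numpy as np
from scipy.integrate import trapezoid

def coa_defuzzify(z, mu):
    """Centre-of-area defuzzification of equation (44) on a sampled membership."""
    return trapezoid(z * mu, z) / trapezoid(mu, z)

# Sampled membership of a (roughly triangular) fuzzy coefficient
z = np.linspace(0.0, 2.0, 201)
mu = np.clip(1.0 - np.abs(z - 1.2) / 0.5, 0.0, None)
b_c = coa_defuzzify(z, mu)   # close to 1.2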

Now, $\tilde{\beta}_0$ and $\tilde{\beta}_1$ can be estimated by $(b_i)_c$, $i = 0, 1$. Substituting $(b_i)_c$ into equation (45),

$$\tilde{\varepsilon}_i = \tilde{y}_i - ((b_0)_c + (b_1)_c \tilde{x}_i), \quad i = 1, \ldots, n \qquad (46)$$

When $\tilde{x}_i$ and $\tilde{y}_i$ are fuzzy numbers, the error terms $\tilde{\varepsilon}_i$ are also fuzzy numbers. Assuming $\tilde{x}_i$ and $\tilde{y}_i$ to be the trapezoidal fuzzy numbers $(x_i^L, x_i^{M_1}, x_i^{M_2}, x_i^R)$ and $(y_i^L, y_i^{M_1}, y_i^{M_2}, y_i^R)$ respectively, the $\tilde{\varepsilon}_i$ are also trapezoidal in nature [86]. An estimate $\tilde{E} = (-l, 0, 0, r)$ for $\tilde{\varepsilon}_i$, $i = 1, \ldots, n$, leads to fixing the resulting spread of each estimated response, such that $\tilde{E}_i = (-l_i, 0, 0, r_i)$. Consequently, the fuzzy linear regression model with non-uniform spreads is as follows [42], [139]:

$$(\tilde{y}_i)_{NUS} = (b_0)_c + (b_1)_c \tilde{x}_i + \tilde{E}_i = (b_0)_c + (b_1)_c \tilde{x}_i + (-l_i, 0, 0, r_i), \quad i = 1, \ldots, n \qquad (47)$$

( yi ) NUS  (b0 ) c  (b1 ) c xi  Ei  (b0 ) c  (b1 ) c x i  (l i ,0,0, ri ); i  1,......... ., n (47)

To minimize total error in estimation best values of l and r should be found. The error in estimation is difference between observed and estimated responses as follows [42], [139]:

Di  

~ ^ ~~

S ( yi )  S ( yi )

|  ~ ( y )   ^ ( y ) | dy, i  1,......... ..., n

(48)

~

yi

yi

^

~

~

where, S ( y i ) and S ( y i ) are supports of  ~ ( y) and  ( y ) respectively. Taking equation (48) ^ ~

yi

yi

as measure, a mathematical program is used to find optimal values of spreads of each estimated response l i and ri such that total error in estimation is minimized. According to equation (47), ^

~

estimated response y i can be represented as [42], [139], ^

~

M M y i  ( yˆ iL , yˆ i 1 , yˆ i 2 , yˆ iR )

 ((b0 ) c  (b1 ) c xiL  li , (b0 ) c  (b1 ) c xiM1 , (b0 ) c  (b1 ) c xiM 2 , (b0 ) c  (b1 ) c xiR  ri ) (49) To cope with the situation of decreasing non-uniform spread for observed responses; here it is assumed that spread of each estimated response equal to that of its associated observed response, such that following equalities hold: yˆ iR  yˆ iL  yiR  yiL , i  1,2,......., n (50) When x i is crisp, equation (50) becomes yˆ i  yˆ i  li  ri  yi  yi , i  1,2,......., n . The above constraints only limit value of sum of left and right spreads for each estimated response. It is possible that equation (50) is valid, but l i and ri is much smaller than it’s observed left or right spreads respectively. Thus, for individual left or right spread of estimated response, a lower bound should be considered [42], [139], such that, R

R

L

L

yˆ iM1  yˆ iL  l min , i  1,2,......., n (51i) yˆ iR  yˆ iM 2  rmin , i  1,2,......., n (51ii) where, l min and rmin are smallest left and right spreads of observed responses respectively. Hence, incorporating equations (50), (51i) and (51ii), mathematical program to find optimal values of li* and ri* , i  1,2,......... ., n can be formulated as [42]: n

$$\min \sum_{i=1}^{n} D_i \qquad (52)$$

subject to

$$D_i = \int_{S(\tilde{y}_i) \cup S(\hat{\tilde{y}}_i)} |\mu_{\tilde{y}_i}(y) - \mu_{\hat{\tilde{y}}_i}(y)| \, dy, \quad i = 1, \ldots, n \qquad (53)$$

$$\hat{\tilde{y}}_i = (b_0)_c + (b_1)_c \tilde{x}_i + \tilde{E}_i, \quad i = 1, \ldots, n \qquad (54)$$

$$\tilde{E}_i = (-l_i, 0, 0, r_i), \quad i = 1, \ldots, n \qquad (55)$$

$$\hat{y}_i^R - \hat{y}_i^L = y_i^R - y_i^L, \quad i = 1, \ldots, n \qquad (56)$$

$$\hat{y}_i^{M_1} - \hat{y}_i^L \ge l_{min}, \quad i = 1, \ldots, n \qquad (57)$$

$$\hat{y}_i^R - \hat{y}_i^{M_2} \ge r_{min}, \quad i = 1, \ldots, n \qquad (58)$$

Substituting li* and ri* for l i and ri in equation (47) respectively, fuzzy regression model becomes [42], [139]: ~

~

( y i ) NUS  (b0 ) c  (b1 ) c xi  (li* ,0,0, ri* ), i  1,......... ., n

(59)
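For concreteness, the error measure of equation (48) can be computed numerically, as suggested by the earlier remark on the trapezoidal rule. The following is a minimal sketch (not the thesis implementation), assuming trapezoidal membership functions and a uniform integration grid; the fuzzy numbers in the example are hypothetical.

import numpy as np

def trapezoidal_mu(y, a, b, c, d):
    """Membership of the trapezoidal fuzzy number (a, b, c, d) at points y."""
    y = np.asarray(y, dtype=float)
    mu = np.zeros_like(y)
    if b > a:
        rising = (y >= a) & (y < b)
        mu[rising] = (y[rising] - a) / (b - a)
    mu[(y >= b) & (y <= c)] = 1.0
    if d > c:
        falling = (y > c) & (y <= d)
        mu[falling] = (d - y[falling]) / (d - c)
    return mu

def estimation_error(observed, estimated, n_grid=2000):
    """D_i of equation (48): area between the two membership functions
    over the union of their supports, via the trapezoidal rule."""
    lo = min(observed[0], estimated[0])
    hi = max(observed[3], estimated[3])
    y = np.linspace(lo, hi, n_grid)
    gap = np.abs(trapezoidal_mu(y, *observed) - trapezoidal_mu(y, *estimated))
    return np.trapz(gap, y)

# Hypothetical observed and estimated trapezoidal responses.
print(estimation_error((3.5, 4.0, 4.0, 4.5), (3.3, 4.1, 4.1, 4.8)))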

The model of equation (59) can be adopted to derive the estimated response for forecasting the associated observed response for the specific, collected values of the independent variables $\tilde{x}_i$, $i = 1, \ldots, n$. For predicting the response corresponding to values other than these specific ones, a fuzzy inference system is applied to the fuzzy regression model [130]. A fuzzy inference system is an important function approximation technique [99] and has been applied in fields such as expert systems and automatic control. It consists of three components, viz. a rule base, a database and a reasoning mechanism. The rule base contains a selection of fuzzy if-then rules activated by a certain value of interest, the database defines the membership functions adopted in the fuzzy if-then rules, and the inference procedure, called fuzzy reasoning, aggregates the information from the activated fuzzy rules. Different types of fuzzy rules and aggregation give rise to several fuzzy inference systems, of which the Mamdani fuzzy model and the Takagi-Sugeno fuzzy model [86] are the most important. Here, the Takagi-Sugeno model with one input and one output is adopted for deriving the spread of the predicted error terms. The output from the Takagi-Sugeno fuzzy model is optimized on the training data set by reducing the training error. Suppose that the observed responses activated by the independent variable of interest are $\tilde{y}_i^a$, $i = 1, \ldots, p$ and the associated error terms are $\tilde{E}_i^a = (l_i^a, 0, 0, r_i^a)$, $i = 1, \ldots, p$. Let $y_{estimated}$ denote the estimated response and $e_{estimated}$ its estimated error term. Then each row of the membership function constitutes an if-then rule given as [42]

$$R_i: \text{if } y_{estimated} = \tilde{y}_i^a \text{ then } e_{estimated} = \tilde{E}_i^a, \quad i = 1, \ldots, p \quad (60)$$

The fuzzy set of the predicted error term, $\tilde{E}_{predicted}$, obtained from the above may have an irregular shape. Since all observations are trapezoidal fuzzy numbers, it is preferable that the obtained estimated response is also a trapezoidal fuzzy number; this is achieved by transforming $\tilde{E}_{predicted}$ into a trapezoidal fuzzy number. Denoting the transformed predicted error term $\tilde{E}_{estimated} = (l_{estimated}, m^1_{estimated}, m^2_{estimated}, r_{estimated})$, $l_{estimated}$ and $r_{estimated}$ are taken as the minimum and maximum of the possible $\tilde{E}_{predicted}$ values, while $m^1_{estimated}$ and $m^2_{estimated}$ are calculated using the COA (center of area) method. Thus, the estimated response for $x_{estimated}$ becomes [42], [139]

$$\tilde{y}_{NUS}^{estimated} = (\tilde{b}_0)_c + (\tilde{b}_1)_c x_{estimated} + (l_{estimated}, c^1_{estimated}, c^2_{estimated}, r_{estimated}) \quad (61)$$
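The rule aggregation of equation (60) can be illustrated with a small sketch. The weighted-average combination below is one simple zero-order Takagi-Sugeno style aggregation under assumed firing strengths; it is not necessarily the exact scheme of [42], and all numbers are hypothetical.

def aggregate_spreads(rules):
    """rules: list of (firing_strength, left_spread, right_spread) for the
    activated if-then rules; returns the aggregated (l, r) spreads."""
    total = sum(w for w, _, _ in rules)
    l_hat = sum(w * l for w, l, _ in rules) / total
    r_hat = sum(w * r for w, _, r in rules) / total
    return l_hat, r_hat

# Two activated rules whose spreads are taken from hypothetical observed errors.
print(aggregate_spreads([(0.7, 0.226, 0.226), (0.3, 0.0, 0.927)]))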

5.5 FSVM for Bankruptcy Prediction

SVM is a promising statistical tool [151] which, when embedded with fuzzy membership functions [155], [159], leads to the development of FSVM for corporate bankruptcy analysis [38]. FSVM is implemented for analyzing predictors in the form of financial ratios, and a method of adapting it to default probability estimation is proposed here. Fuzzy SVM functions are capable of extracting useful information from financial data, although extensive data sets are required in order to fully utilize their classification power. Thus, the aim is to provide an approach which inherits the advantages of Machine Learning and Fuzzy Logic [87] such that the prediction accuracy of the model is enhanced and effective results are obtained.

5.5.1 Bankruptcy Analysis Methodology

Research in bankruptcy analysis was motivated by the demand from financial institutions for investment risk estimation. However, despite substantial interest, the accuracy of corporate default predictions was much lower than in the private loan sector, due to the small number of corporate bankruptcies. Meanwhile, the situation in bankruptcy analysis has changed dramatically. Large datasets with a median number of failing companies exceeding 1000 have become available; twenty years ago the median was around 40 companies, and statistically significant inferences often could not be reached. The growth of computer technology and advances in Statistical Learning techniques have allowed the identification of more complex data structures. As basic methods are often inadequate, the demand for advanced methods of controlling and measuring default risks has rapidly increased in anticipation of the adoption of the New Basel Capital Accord [38]. This emphasizes the importance of risk management and improvements in the risk assessment capabilities of financial institutions. In order to estimate investment risks one needs to evaluate the default probability ($PD$) for an organization. Each organization is described by a set of variables $x$, commonly known as predictors, such as financial ratios, and by its class $y$ that can be either $y = -1$ (successful) or $y = 1$ (unsuccessful or bankrupt) [38]. Initially, an unknown classifier function $f: x \to y$ is estimated on a training set of organizations $(x_i, y_i)$, $i = 1, \ldots, n$. The training set represents the dataset of organizations which are known to have survived or gone bankrupt. Finally, $f$ is applied to computing default probabilities ($PD$) that can be uniquely translated into an organization rating. Statistical models such as the option pricing model, logit and probit regressions and discriminant analysis assume that the relationship between the input and output parameters can be described a priori [38]. Besides their fixed structure, these models are fully determined by a set of parameters, and the solution requires the estimation of these parameters on the training dataset. Although statistical models provide a very clear interpretation of the modeled processes, they are rigid in structure and not flexible enough to capture information from data. Non-statistical models like ANN or GA [74], [79] are more flexible in describing data: they do not impose very strict limitations on the classifier function, but usually do not provide a clear interpretation either. Statistical models for corporate default prediction are of practical importance. For example, corporate bond ratings published regularly by rating agencies such as Moody's or S&P strictly correspond to company default probabilities estimated to a great extent statistically. Moody's RiskCalc model is basically a probit regression estimation of the cumulative default probability over a number of years using a linear combination of non-parametrically transformed predictors [38]. These non-linear transformations $f_1, f_2, \ldots, f_d$ are estimated on univariate models. As a result, the original probit model [38]:

$$E[y_{i,t} \mid x_{i,t}] = \Phi(\beta_1 x_{i_1,t} + \beta_2 x_{i_2,t} + \cdots + \beta_d x_{i_d,t}) \quad (62)$$

is converted into the following expression:

$$E[y_{i,t} \mid x_{i,t}] = \Phi\{\beta_1 f_1(x_{i_1,t}) + \beta_2 f_2(x_{i_2,t}) + \cdots + \beta_d f_d(x_{i_d,t})\} \quad (63)$$

where $y_{i,t}$ is the cumulative default probability within the prediction horizon for organization $i$ at time $t$. Although modifications of traditional methods like probit analysis extend their applicability, it is more desirable to base the methodology on the general ideas of Statistical Learning theory without making many restrictive assumptions. The ideal classification machine [60] applying a classifying function $f$ from an available set of functions $\mathcal{F}$ is based on the expected risk minimization principle. The expected risk [151]:

$$R(f) = \int \frac{1}{2} |f(x) - y| \, dP(x, y) \quad (64)$$

is estimated under the distribution $P(x, y)$, which is assumed to be known. This is, however, never true in practical applications, and the distribution must also be estimated from the training set $(x_i, y_i)$, $i = 1, \ldots, n$, leading to an ill-posed problem [38]. In most methods applied today in statistical applications, this problem is solved by implementing another principle, namely the principle of empirical risk minimization, i.e. risk minimization over the training set of organizations, even when the training set is not representative. The empirical risk, defined as [38]:

$$\hat{R}(f) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{2} |f(x_i) - y_i| \quad (65)$$

is nothing else but the average value of the loss over the training set, while the expected risk is the expected value of the loss under the true probability measure. The loss for independent and identically distributed observations is given by [38]:

$$\frac{1}{2} |f(x) - y| = \begin{cases} 0, & \text{correct classification} \\ 1, & \text{incorrect classification} \end{cases} \quad (66)$$

The solutions to the problems of expected and empirical risk minimization [38]:

$$f_{opt} = \arg\min_{f \in \mathcal{F}} R(f) \quad (67)$$
$$\hat{f}_n = \arg\min_{f \in \mathcal{F}} \hat{R}(f) \quad (68)$$

generally do not coincide, as shown in figure 5.5, although they converge as $n \to \infty$ if $\mathcal{F}$ is not too large. The expected risk cannot be minimized directly since the distribution $P(x, y)$ is unknown. However, according to Statistical Learning theory [151], it is possible to estimate a Vapnik-Chervonenkis (VC) bound that holds with a certain probability $1 - \eta$ [38]:

$$R(f) \le \hat{R}(f) + \phi\left(\frac{h}{n}, \frac{\ln(\eta)}{n}\right) \quad (69)$$

For a linear indicator function $g(x) = \operatorname{sign}(x^T w + b)$ [38]:

$$\phi\left(\frac{h}{n}, \frac{\ln(\eta)}{n}\right) = \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) - \ln\frac{\eta}{4}}{n}} \quad (70)$$

where $h$ is the VC dimension. The VC dimension of a function set $\mathcal{F}$ in $d$-dimensional space is $h$ if some function $f \in \mathcal{F}$ can shatter $h$ objects $\{x_i \in \mathbb{R}^d, i = 1, \ldots, h\}$ in all $2^h$ possible configurations, and no set $\{x_j \in \mathbb{R}^d, j = 1, \ldots, q\}$ exists with $q > h$ that satisfies this property. For example, three points on the plane ($d = 2$) can be shattered by linear indicator functions in $2^3 = 8$ ways, whereas 4 points cannot be shattered in all $2^4 = 16$ ways. Thus, the VC dimension of the set of linear indicator functions in two-dimensional space is three, as shown in figure 5.6. The expression for the VC bound in equation (69) is a regularized functional in which the VC dimension $h$ is the parameter controlling the complexity of the classifier function [79]. The term $\phi\left(\frac{h}{n}, \frac{\ln(\eta)}{n}\right)$ introduces a penalty for excessive complexity of the classifier function. There is a trade-off between the number of classification errors on the training set and the complexity of the classifier function. If complexity were not controlled, it would be possible to find a classifier function that makes no classification errors on the training set, no matter how low its generalization ability would be [38].
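To make the bound concrete, the penalty term of equation (70) can be evaluated directly. A minimal sketch follows; the values of n, h and eta are illustrative only, not taken from the thesis experiments.

import math

def vc_penalty(n, h, eta):
    """Complexity penalty phi(h/n, ln(eta)/n) of equation (70)."""
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)

empirical_risk = 0.10          # hypothetical training error
n, h, eta = 1000, 3, 0.05      # sample size, VC dimension, 1 - confidence
bound = empirical_risk + vc_penalty(n, h, eta)
print(f"R(f) <= {bound:.3f} with probability {1 - eta:.2f}")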

Figure 5.5: Minima $f_{opt}$ and $\hat{f}_n$ of the expected ($R$) and empirical ($\hat{R}$) risk functions generally do not coincide

5.5.2 Need for Risk Classification

In most countries only a small percentage of firms have been rated to date. The lack of rated firms is mainly due to two factors. Firstly, an external rating is an extremely costly procedure. Secondly, until the recent past most banks decided on their loans to small and medium sized enterprises (SME) [38] without asking for the client's rating figure or applying their own rating procedure to estimate the client's default risk. At best, banks based their decision on rough scoring models; at worst, the credit decision was left completely to the loan officer. Since learning its own risk is costly, and until recently the lending procedure of banks failed to set the right incentives, small and medium sized firms shied away from rating. However, regulations are about to change the environment for borrowing and lending decisions. With the implementation of the new Basel Capital Accord (Basel II) [38], scheduled for the end of 2006, not only firms that issue debt securities on the market are in need of a rating, but also any ordinary firm that applies for a bank loan. If no external rating is available, banks have to employ an internal rating system and deduce each client's specific risk class. Moreover, Basel II puts pressure on firms and banks from two sides.

Figure 5.6: Eight possible ways of shattering 3 points on the plane with a linear indicator function

First, banks have to demand risk premia in accordance with the specific borrower's default probability. Table 5.1 presents an example of how individual risk classes map into risk premia. For small US firms, a one-year default probability of 0.11% results in a spread of 2%. Of course, the mapping used by lenders will differ if the firm type or the country in which the bank is located changes. However, in any case future loan pricing has to follow the basic rule: the higher the firm's default risk, the higher the risk premium the bank has to charge. Second, Basel II requires banks to hold client-specific equity buffers. The magnitudes of these buffers are determined by the risk weight function defined by the Basel committee [38] and a solvability coefficient of 8%. The function maps default probabilities into risk weights. Table 5.2 illustrates the change in the capital requirements per unit of a loan induced by switching from Basel I to Basel II. Apart from basic risk determinants such as default probability, maturity and loss given default, risk weights also depend on the type of loan and annual turnover. Table 5.2 refers to an SME loan and assumes that the borrower's annual turnover is 5 million EUR [38]. Since the lock-in of a bank's equity affects the provision costs of a loan, it is likely that these costs will be handed over directly to the individual borrower. Basel II will affect any firm that is in need of external finance. As both the risk premium and the credit costs are determined by default risk, a firm's rating will have a deeper economic impact on banks as well as on the firms themselves than ever before. Thus, in the wake of Basel II the choice of the right rating method is of crucial importance. To avoid frictions of large magnitude, the employed method must meet certain conditions. On the one hand, the rating procedure must keep the number of misclassifications as low as possible; on the other, it must be as simple as possible and, if employed by the borrower, also provide some guidance on how to improve his own rating. FSVM have the potential to satisfy both demands [38]. First, the procedure is easy to implement, so that any firm could generate its own rating information. Second, the method is suitable for estimating a unique default probability for each firm. Third, the rating estimation done by FSVM is transparent and does not depend on heuristics or expert judgments; this property implies objectivity and a high degree of robustness against user changes. Moreover, an appropriately trained FSVM enables the firm to detect the specific impact of all rating determinants on the overall classification, so that the firm can find out, prior to negotiations, what drawbacks it has and how to overcome its problems. Overall, FSVM employed in the internal rating systems of banks will improve the transparency and accuracy of the system. Both improvements may help firms and banks adapt to the Basel II framework more easily.

5.5.3 Support Vector Machine

SVM is a classification method that is based on Statistical Learning theory and structural risk minimization to build a model of a given system [151]. The methodology is becoming renowned

due to many useful features and promising empirical performance. It has been successfully applied to optical character recognition, medical diagnostics and text classification. SVM is among the most widely used nonparametric techniques related to ANN and is deemed to be highly accurate. It has a flexible structure and produces better classification results than parametric methods. SVM has very attractive properties: it gives a single solution characterized by the global minimum of the optimized functional, and not multiple solutions associated with local minima as in the case of other methods. SVM does not rely on heuristics and is thus not an arbitrary choice for modeling various real life problems. SVM is an optimization technique in which prediction error and model complexity are simultaneously minimized [38]. Let the training samples of a multi-input single-output model be denoted as $X = \{(x_k, y_k) \mid (x_1, y_1), \ldots, (x_N, y_N)\}$, $k = 1, \ldots, N$, where $x_k \in \mathbb{R}^d$ is the $k$th input pattern, $d$ denotes the dimension of the input space, and $y_k$ is the corresponding observed result, a binary variable taking the value 1 or -1. In the bankruptcy prediction model, $x_k$ denotes the attributes of an organization and $y_k$ is the observed result of whether the organization will go bankrupt or not: if the organization becomes bankrupt then $y_k = 1$, else $y_k = -1$. It is assumed that the training set is linearly separable after being mapped into a higher dimensional feature space by a nonlinear function $\phi(\cdot)$; the classifier should then be constructed as follows [38]:

$$\begin{cases} w^T \phi(x_k) + b \ge 1, & y_k = 1 \\ w^T \phi(x_k) + b \le -1, & y_k = -1 \end{cases} \quad (71)$$

Rating Class (S&P)   One-year PD (%)   Risk Premia (%)
AAA                  0.01              0.75
AA                   0.02 – 0.04       1.00
A+                   0.05              1.50
A                    0.08              1.80
A-                   0.11              2.00
BBB                  0.15 – 0.40       2.25
BB                   0.65 – 1.95       3.50
B+                   3.20              4.75
B                    7.00              6.50
B-                   13.00             8.00
CCC                  > 13              10.00
CC                   -                 11.50
C                    -                 12.70
D                    -                 14.00

Table 5.1: Rating grades and Risk premia

The basic idea of SVM classification is to find a separating hyperplane corresponding to the largest possible margin between the points of the different classes [151], as shown in figure 5.7. Some penalty for misclassification must also be introduced. The classification error $\xi_k$ is related to the distance from a misclassified point $x_k$ to the canonical hyperplane bounding its class; if $\xi_k > 0$, an error in separating the two sets occurs. The distance between the two boundary hyperplanes is $2/\|w\|$ [38], and a large distance is encouraged for the purpose of generalization ability. In the real world, the training set is usually not linearly separable even when mapped into a high dimensional feature space, which means that a perfect separating hyperplane satisfying condition (71) for every $x_k$ cannot be found.

A soft margin is introduced to incorporate the possibility of violation. The error term $\xi_k$ of instance $k$ is defined as follows [38]:

$$y_k [w^T \phi(x_k) + b] \ge 1 - \xi_k, \quad \xi_k \ge 0, \quad k = 1, \ldots, N \quad (72)$$

Rating Class (S&P)   One-year PD (%)   Capital Requirements (Basel I) (%)   Capital Requirements (Basel II) (%)
AAA                  0.01              8.00                                 0.63
AA                   0.02 – 0.04       8.00                                 0.93 – 1.40
A+                   0.05              8.00                                 1.60
A                    0.08              8.00                                 2.12
A-                   0.11              8.00                                 2.55
BBB                  0.15 – 0.40       8.00                                 3.05 – 5.17
BB                   0.65 – 1.95       8.00                                 6.50 – 9.97
B+                   3.20              8.00                                 11.90
B                    7.00              8.00                                 16.70
B-                   13.00             8.00                                 22.89
CCC                  > 13              8.00                                 > 22.89
CC                   -                 8.00                                 -
C                    -                 8.00                                 -
D                    -                 8.00                                 -

Table 5.2: Rating grades and capital requirements; figures in the last column were estimated for a loan to an SME with a turnover of 5 million EUR and a maturity of 2.5 years, using the data from column 2 and the recommendations of the Basel Committee on Banking Supervision

It is expected that the training should maximize the classification margin and minimize the sum of the error terms at the same time. When the training set is not linearly separable in the feature space, the two goals usually cannot be achieved simultaneously, and the two-group classification problem is formulated as the following primal optimization problem [38]:

$$\min_{w, b, \xi_k} \Phi(w, b, \xi_k) = \frac{1}{2} w^T w + C \sum_{k=1}^{N} \xi_k$$
$$\text{subject to: } y_k [w^T \phi(x_k) + b] \ge 1 - \xi_k, \quad \xi_k \ge 0, \quad k = 1, \ldots, N \quad (73)$$

where the regularization parameter $C$ is a constant that trades off the two goals: the larger $C$, the more the error term is emphasized, while a small $C$ means that a large classification margin is encouraged. The conditional minimization of the objective function under these constraints provides the highest possible margin when classification errors are inevitable due to the linearity of the separating hyperplane. Under such a formulation the problem is convex, and it can be shown that margin maximization reduces the VC dimension. After substituting the Karush-Kuhn-Tucker conditions [38] into the primal Lagrangian, the dual Lagrangian is as follows:

$$\max_{\alpha} J(\alpha) = \sum_{k=1}^{N} \alpha_k - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \phi(x_i)^T \phi(x_j)$$
$$\text{subject to: } \sum_{k=1}^{N} \alpha_k y_k = 0, \quad 0 \le \alpha_k \le C, \quad k = 1, \ldots, N. \quad (74)$$

Figure 5.7: The separating hyperplane $x^T w + b = 0$ and the margin in the non-separable case

From the conditions of optimality it follows that

$$w = \sum_{k=1}^{N} \alpha_k y_k \phi(x_k) \quad (75)$$

where $\alpha_k$ is the solution of the quadratic programming problem given by equation (74). Here $w$ depends only on those training instances whose corresponding $\alpha$ is larger than zero; these instances are called support vectors. As noted at the beginning of the model, $\phi(x)$ is used to map the input vector into a higher dimensional space in which the two groups are linearly separable. However, the explicit form of $\phi(x)$ is still not known when solving the quadratic programming problem. The merit of the support vector machine is that, by means of a kernel function $K(x_i, x_j)$ [79], [151], which is an inner product in the feature space, it achieves linear separability of the training data in the high dimensional feature space, and thus nonlinear separability in the input space. In this way the optimal mapping can be found without specifying the explicit form of the map function $\phi(x)$. The choice of kernel includes linear, polynomial, radial basis function and two-layer neural perceptron kernels; the kernel function should satisfy Mercer's conditions. Substituting $\phi(x_i)^T \phi(x_j)$ with the kernel function $K(x_i, x_j)$ leads to the following optimization problem [38]:

$$\max_{\alpha} J(\alpha) = \sum_{k=1}^{N} \alpha_k - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$
$$\text{subject to: } \sum_{k=1}^{N} \alpha_k y_k = 0, \quad 0 \le \alpha_k \le C, \quad k = 1, \ldots, N. \quad (76)$$

After solving equation (76) and substituting $w = \sum_{k=1}^{N} \alpha_k y_k \phi(x_k)$ into the original classification problem, the following classifier is obtained:

$$y(x) = \operatorname{sign}(w^T \phi(x) + b) = \operatorname{sign}\left(\sum_{k=1}^{N} \alpha_k y_k K(x, x_k) + b\right) \quad (77)$$

where $b = -\frac{1}{2}(x_{+1} + x_{-1})^T w$, and $x_{+1}$ and $x_{-1}$ are two support vectors belonging to different classes for which $y_k [w^T \phi(x_k) + b] = 1$. The value of the classification function (the score of an organization) can be computed as follows [38]:

$$f(x) = w^T \phi(x) + b \quad (79)$$
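As an illustration of the classifier of equations (76)-(77), the following minimal sketch trains an RBF-kernel SVM on two financial ratios. It uses scikit-learn as an assumed library choice, and the tiny dataset is hypothetical, not the thesis data.

import numpy as np
from sklearn.svm import SVC

# Hypothetical training data: columns are [NI/TA, TL/TA];
# y = -1 for successful organizations, +1 for bankrupt ones.
X = np.array([[0.08, 0.55], [0.05, 0.40], [-0.03, 0.75], [-0.01, 0.80]])
y = np.array([-1, -1, 1, 1])

clf = SVC(C=1.0, kernel="rbf", gamma="scale")  # C trades margin vs. training error
clf.fit(X, y)
print(clf.decision_function([[0.02, 0.60]]))   # score f(x) of equation (79)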

Each value of $f(x)$ uniquely corresponds to a default probability.

5.5.4 Fuzzy Support Vector Machine

The idea of SVM is extended to FSVM, which is applied to bankruptcy prediction by introducing a fuzzy membership function into SVM [38]. The main idea of FSVM is that if an input is detected as an outlier, the input's membership is decreased so that its contribution to the total error term is reduced; FSVM also treats each input as an input of the opposite class with the complementary membership. In this way it is expected that the new fuzzy machine makes full use of the data and achieves better generalization ability. This FSVM approach is commonly known as the bilateral weighted FSVM, because here each instance contributes two errors to the total error term in the objective function. First it is explained why Fuzzy set theory is applied to bilaterally weight the fitting error in the model. Then it is shown how to solve the training problem by transforming it into a quadratic programming problem [79] for bankruptcy prediction. Finally, the procedure of membership function generation is discussed. In many corporate and business applications clear classifications are usually impossible. In bankruptcy prediction analysis, it cannot be said that one organization will never go bankrupt and will yield maximum returns to investors, or vice versa [38]; even organizations which are least likely to default may not be so considered. Based on this idea, every data point in the training dataset is treated as belonging to both the positive and the negative class, but with different memberships, and memberships are assigned to both classes for every data point. The underlying economic meaning in bankruptcy prediction analysis is that each training sample is treated as possibly both good and bad, in order to increase the training algorithm's generalization ability. This means that the number of training data points is increased from the original $N$ to $2N$, i.e., from the training dataset $\{x_k, y_k\}$, $k = 1, \ldots, N$ to $\{x_k, 1, m_k\}, \{x_k, -1, 1 - m_k\}$, $k = 1, \ldots, N$. In the notation $\{x_k, y_k, m_k\}$, $x_k$ is the $k$th input vector, $y_k$ is the observed result and $m_k$ is the membership of the $k$th organization in class $y_k$. Thus, the classification problem can be reformulated as follows [38]:

$$\min_{w, b, \xi_k, \eta_k} \Phi(w, b, \xi_k, \eta_k) = \frac{1}{2} w^T w + C \sum_{k=1}^{N} [m_k \xi_k + (1 - m_k) \eta_k]$$
subject to:
$$w^T \phi(x_k) + b \ge 1 - \xi_k, \quad k = 1, \ldots, N$$
$$w^T \phi(x_k) + b \le -1 + \eta_k, \quad k = 1, \ldots, N$$
$$\xi_k \ge 0, \quad \eta_k \ge 0, \quad k = 1, \ldots, N \quad (80)$$

The corresponding error terms are denoted by:

$$\text{Error Term} = \begin{cases} (1 - m_k)(1 + w^T \phi(x_k) + b), & w^T \phi(x_k) + b \ge 1 \\ m_k (1 - w^T \phi(x_k) - b) + (1 - m_k)(1 + w^T \phi(x_k) + b), & -1 \le w^T \phi(x_k) + b \le 1 \\ m_k (1 - w^T \phi(x_k) - b), & w^T \phi(x_k) + b \le -1 \end{cases} \quad (81)$$

$$\text{Error Term} = \begin{cases} 0, & w^T \phi(x_k) + b \ge 1 \\ 1 - w^T \phi(x_k) - b, & w^T \phi(x_k) + b \le 1 \end{cases} \quad (82)$$

$$\text{Error Term} = \begin{cases} 0, & w^T \phi(x_k) + b \ge 1 \\ m_k (1 - w^T \phi(x_k) - b), & w^T \phi(x_k) + b \le 1 \end{cases} \quad (83)$$
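The piecewise error of equation (81) reduces to a membership-weighted combination of the two hinge slacks, which the following minimal sketch computes for a single instance; the decision value and membership used are hypothetical.

def bilateral_error(f_val, m_k):
    """Bilateral error of one instance: f_val = w^T phi(x_k) + b,
    m_k = positive-class membership; both slacks are hinge losses."""
    xi = max(0.0, 1.0 - f_val)    # slack as a positive-class instance
    eta = max(0.0, 1.0 + f_val)   # slack as a negative-class instance
    return m_k * xi + (1.0 - m_k) * eta

# An instance near the boundary contributes errors to both classes.
print(bilateral_error(0.2, 0.7))  # 0.7*0.8 + 0.3*1.2 = 0.92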

It is this bilateral weighting of the error measure that makes FSVM distinct from other SVM formulations. Equations (81)-(83) refer to the error term of an input originally labeled as positive; the error term in the objective function for an input originally labeled as negative can be written similarly and is omitted here. Let the Lagrange multipliers corresponding to the constraints be $\alpha_k, \beta_k, u_k, v_k$. Based on this, the Lagrangian function of the previous model is constructed [38]:

$$\max_{\alpha_k, \beta_k, u_k, v_k} \min_{w, b, \xi_k, \eta_k} J(w, b, \xi_k, \eta_k; \alpha_k, \beta_k, u_k, v_k) = \frac{1}{2} w^T w + C \sum_{k=1}^{N} m_k \xi_k + C \sum_{k=1}^{N} (1 - m_k) \eta_k - \sum_{k=1}^{N} \alpha_k [w^T \phi(x_k) + b - 1 + \xi_k] + \sum_{k=1}^{N} \beta_k [w^T \phi(x_k) + b + 1 - \eta_k] - \sum_{k=1}^{N} u_k \xi_k - \sum_{k=1}^{N} v_k \eta_k \quad (84)$$

Differentiating this with respect to $w, b, \xi_k, \eta_k$:

$$\frac{\partial J}{\partial w} = w - \sum_{k=1}^{N} \alpha_k \phi(x_k) + \sum_{k=1}^{N} \beta_k \phi(x_k) = 0 \quad (85)$$
$$\frac{\partial J}{\partial b} = -\sum_{k=1}^{N} \alpha_k + \sum_{k=1}^{N} \beta_k = 0 \quad (86)$$
$$\frac{\partial J}{\partial \xi_k} = C m_k - \alpha_k - u_k = 0, \quad k = 1, \ldots, N \quad (87)$$
$$\frac{\partial J}{\partial \eta_k} = C(1 - m_k) - \beta_k - v_k = 0, \quad k = 1, \ldots, N \quad (88)$$

From the Kuhn-Tucker theorem [38] the following conditions are also satisfied:

$$\alpha_k [w^T \phi(x_k) + b - 1 + \xi_k] = 0, \quad k = 1, \ldots, N \quad (89)$$
$$\beta_k [w^T \phi(x_k) + b + 1 - \eta_k] = 0, \quad k = 1, \ldots, N \quad (90)$$
$$u_k \xi_k = 0, \quad k = 1, \ldots, N \quad (91)$$
$$v_k \eta_k = 0, \quad k = 1, \ldots, N \quad (92)$$
$$\alpha_k \ge 0; \; \beta_k \ge 0; \; u_k \ge 0; \; v_k \ge 0; \; \xi_k \ge 0; \; \eta_k \ge 0, \quad k = 1, \ldots, N \quad (93)$$

According to equation (85),

$$w = \sum_{k=1}^{N} (\alpha_k - \beta_k) \phi(x_k) \quad (94)$$

From equations (86), (87) and (88),

$$0 \le \alpha_k \le C m_k, \quad k = 1, \ldots, N \quad (95)$$
$$0 \le \beta_k \le C(1 - m_k), \quad k = 1, \ldots, N \quad (96)$$

From equations (87), (88) and (93), $C m_k - \alpha_k$ and $\xi_k$ cannot both be larger than zero at the same time, or one of the constraints would be violated, so

$$\xi_k (C m_k - \alpha_k) = 0, \quad k = 1, \ldots, N \quad (97)$$

and similarly $\eta_k$ satisfies

$$\eta_k [C(1 - m_k) - \beta_k] = 0, \quad k = 1, \ldots, N \quad (98)$$

Taking the difference between the sums of equations (89) and (90) over $k = 1, \ldots, N$,

$$\sum_{k=1}^{N} (\alpha_k - \beta_k) w^T \phi(x_k) + b \sum_{k=1}^{N} (\alpha_k - \beta_k) - \sum_{k=1}^{N} (\alpha_k + \beta_k) + \sum_{k=1}^{N} (\alpha_k \xi_k + \beta_k \eta_k) = 0 \quad (99)$$

N

max   k    k   k, k

k 1

k 1

1 N  2 i 1

N

 ( j 1

i

  i )( j   j ) ( xi ) T  ( x j )

subject to: N

 k 1

N

k

   k ,0   k Cm k , k  1,......... ., N k 1

(101)

0   k  C (1  mk ), k  1,........, N T Let K ( xi , x j )   ( xi )  ( x j ) . (102)

In order to transform this into quadratic programming problem, it is assumed that  k   k   k .The previous optimization becomes [38]: N

N

k 1

k 1

max   k   2 k   k , k

1 N  2 i 1

N

  j 1

i

j

K ( xi , x j )

N

subject to:

k 1

0   k   k  Cm k , k  1,......... , N After solving this and substituting w 



N

 ( k 1

k

k

0

0   k  C (1  mk ), k  1,......... , N (103)

  k ) ( xk ) into original classification problem,

the following classifier is obtained: N

y( x)  sign( wT  ( x)  b)  sign( ( k   k ) K ( x, xk )  b) (104) k 1

Here, decision value w 

N

 ( k 1

k

  k ) K ( x, xk ) is used as final bankruptcy score obtained from

FSVM. The major drawback of FSVM lies in its computational complexity, as it involves high dimensional quadratic problem [38]. The dimensions of quadratic programming problem are increased from N in SVM to 2  N . The quadratic problem is still convex because Hessian matrix related to Equation (41) is negative semi definite. It is interesting that quadratic programming involved in solving the model is unique. If decision variables are arranged as vector ( 1 ,......... ,  N , 1 ,......... ,  N )T , elements in upper-right, lower-left and lower-right quarter of its corresponding Hessian matrix are all zero and only upper-left quarter is negative definite matrix. An important step before solving quadratic programming problem is membership generation. Membership generating method itself is required to have good discriminatory power, which means an organization that is least likely to default should be assigned a high score by this method. Though the method itself can be used to evaluate bankruptcy, as shown in the test results, the classification performance is improved by FSVM. Many previous bankruptcy analysis methods [38] can be used to generate memberships. The problem of generating memberships from scores obtained by other methods i.e., the problem lies in mapping initial scores obtained from other bankruptcy analysis methods into membership which falls in unit interval [0, 1] of memberships. It is assumed that primary score for each organization, s k obtained by bankruptcy analysis method is known; then following functions can be used to define organization’s membership of positive class m k [38]: Linear: mk 

s k  min s k k 1,........,N

max s k  min s k k 1,........,N

k 1,......... ,N

Logistic:

mk 

(105)

a

ask  b

a ask b  1

(107)

 1,sk  s Bridge: mk   ( sk  s ) /( s  s ), s  sk  s (106)  0, s k  s s  Probit: mk  ( k ) (108) 

where, s, s, a, b,  and  are constants and  is cumulative normal distribution function. One of the main drawbacks of linear map function is that sometimes small min k 1,........,N s k and large max k 1,........,N s k will make the mapping inappropriate. Thus, the training of FSVM consists of [38]: (a) Using bankruptcy analysis method each training data is evaluated, for which the primary score for each organization s k is obtained; (b) Using equations (105)-(108) each organization’s membership of positive class m k is computed. The organization’s membership of

negative class is 1  m k ; (c) Finally, quadratic programming problem is solved corresponding to equation (103) to obtain final classifier given by equation (104).
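The membership generation step can be sketched as follows for the linear and logistic maps of equations (105) and (107); the primary scores and the constants a and b below are hypothetical.

import math

def linear_membership(scores):
    """Equation (105): min-max scaling of primary scores."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def logistic_membership(scores, a=1.0, b=0.0):
    """Equation (107): logistic squashing of primary scores."""
    return [math.exp(a * s + b) / (math.exp(a * s + b) + 1) for s in scores]

scores = [-1.2, -0.3, 0.4, 2.1]          # hypothetical primary scores s_k
m_pos = logistic_membership(scores)      # membership of positive class
m_neg = [1 - m for m in m_pos]           # membership of negative class
print(m_pos, m_neg)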

5.6 Experimental Results and Comparisons

In this section some experimental results are presented, conducted on some well known data sets of varying dimension and size.

5.6.1 Simulation results of Discovering Stock Price Prediction Rules using RFMLP Model

Let the RFMLP model be termed model S. Some other models are compared with this model to test its effectiveness [36], [109]. The models considered are: (a) Model O: an ordinary MLP trained using back propagation with weight decay; (b) Model F: an FMLP trained using back propagation with weight decay; (c) Model R: an FMLP trained using back propagation with weight decay and with initial knowledge encoding using Rough sets; (d) Model FM: a modular MLP trained with GA along with tuning of the fuzzy parameters. Here, the term modular refers to the use of sub-networks corresponding to each class, which are later concatenated using GA. The recognition scores obtained for the data by model S are presented in table 5.4, which also shows a comparison with the other related MLP based classification methods, viz. models O, F, R and FM. In all cases, 10% of the samples are used as the training set and the remaining samples as the test set. Ten such independent runs are performed, and the mean and standard deviation of the classification accuracy computed over them are presented in table 5.4. The rule generation phase computes reducts and the corresponding rules with respect to the stock exchange data considered. The dependency rules generated using Rough sets and the encoding scheme for the stock exchange dataset, along with the input fuzzy parameter values, are given in table 5.3. A feature $F_i$ in table 5.3, where $F \in \{L, M, H\}$ stands for low, medium or high, denotes the property $F$ of the $i$th feature. The integrated networks contain 18 hidden nodes in a single layer for the stock exchange dataset. After combination, 96 such networks were obtained, and the initial population of the GA was formed using 64 networks. In the first phase of the GA, for models FM and S, each of the sub-networks is partially trained for 10 sweeps. The classification accuracies obtained by the models are analyzed for statistical significance. Tests of significance are performed for the inequality of the means (of accuracies) obtained using the RFMLP algorithm and the other methods considered. Since both mean pairs and variance pairs are unknown and different, a generalized version of the t-test is appropriate here. This is the classical Behrens-Fisher problem in hypothesis testing, for which a suitable test statistic is described. The test statistic is of the form [36]:

$$v = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\lambda_1 s_1^2 + \lambda_2 s_2^2}} \quad (109)$$

where $\bar{x}_1, \bar{x}_2$ are the means, $s_1, s_2$ are the standard deviations, $\lambda_1 = 1/n_1$, $\lambda_2 = 1/n_2$, and $n_1, n_2$ are the numbers of observations. Since the experiments were performed on 10 independent random training sets for all algorithms, $n_1 = n_2 = 10$. The test confidence level considered was 95%. In table 5.4, the mean and standard deviation (SD) of the accuracies are presented. Using the means and standard deviations, the value of the test statistic is computed.
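The statistic of equation (109) is straightforward to compute. A minimal sketch follows; the two accuracy samples are hypothetical stand-ins for two models' 10-run results, not the values of table 5.4.

import math

def behrens_fisher_v(x1, x2):
    """v = (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2), equation (109)."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    s1 = math.sqrt(sum((x - m1) ** 2 for x in x1) / (n1 - 1))
    s2 = math.sqrt(sum((x - m2) ** 2 for x in x2) / (n2 - 1))
    return (m1 - m2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

model_a = [88.4, 88.7, 88.5, 88.9, 88.6, 88.3, 88.8, 88.6, 88.5, 88.7]
model_b = [66.5, 67.2, 66.8, 67.0, 66.9, 66.4, 67.3, 66.7, 67.1, 66.9]
print(behrens_fisher_v(model_a, model_b))  # compare with the tabled critical value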

Dependency Rules:
$c_1 \leftarrow M_2 \wedge L_3$
$c_1 \leftarrow M_1 \wedge M_2$
$c_2 \leftarrow (M_2 \wedge M_3) \vee (H_2 \wedge M_2)$
$c_2 \leftarrow M_1 \wedge H_2$
$c_3 \leftarrow (L_2 \wedge H_3) \vee (M_2 \wedge H_3)$
$c_3 \leftarrow (L_1 \wedge H_2) \vee (L_1 \wedge M_3)$
$c_4 \leftarrow (L_1 \wedge L_2) \vee (L_1 \wedge L_3) \vee (L_2 \wedge M_3) \vee (L_1 \wedge M_3)$
$c_5 \leftarrow (H_1 \wedge M_2) \vee (M_2 \wedge M_3) \vee (M_1 \wedge M_2) \vee (M_2 \wedge L_3)$
$c_5 \leftarrow (H_1 \wedge M_2) \vee (M_1 \wedge M_2) \vee (H_2 \wedge H_3) \vee (H_2 \wedge L_1)$
$c_5 \leftarrow (L_2 \wedge L_3) \vee (H_3 \wedge M_3) \vee M_1$
$c_6 \leftarrow L_1 \wedge M_3 \wedge L_2$
$c_6 \leftarrow M_2 \wedge H_3$
$c_6 \leftarrow L_2 \wedge H_3$
$c_6 \leftarrow M_1 \wedge M_3 \wedge L_2$

Fuzzy Parameters:
Feature 1: $c_L = 0.346$, $c_M = 0.469$, $c_H = 0.619$, $\lambda_L = 0.117$, $\lambda_M = 0.169$, $\lambda_H = 0.136$
Feature 2: $c_L = 0.221$, $c_M = 0.437$, $c_H = 0.729$, $\lambda_L = 0.218$, $\lambda_M = 0.266$, $\lambda_H = 0.286$
Feature 3: $c_L = 0.399$, $c_M = 0.546$, $c_H = 0.686$, $\lambda_L = 0.146$, $\lambda_M = 0.147$, $\lambda_H = 0.136$

Table 5.3: Rough set dependency rules for BSE data along with input fuzzy parameter values

Models               Model O        Model F        Model R        Model FM       Model S
                     Train   Test   Train   Test   Train   Test   Train   Test   Train   Test
Accuracy % (Mean)    66.9    65.4   86.6    82.7   87.6    87.0   86.4    83.6   88.6    86.4
Accuracy % (SD)      0.56    0.56   0.46    0.56   0.36    0.26   0.46    0.56   0.26    0.26
Number of Links      136            210            156            124            84
Sweeps               5600           5600           2000           200            90

Table 5.4: Comparative performance of different models on BSE dataset

Figure 5.8: Positive connectivity of the network obtained for BSE data using Model S; bold lines indicate weights greater than $PThres_2$, while others indicate values between $PThres_1$ and $PThres_2$

If the computed value exceeds the corresponding tabled value, the means are unequal with statistical significance, i.e. the algorithm having the higher mean accuracy is significantly superior to the one having the lower value. It is observed from table 5.4 that model S performs best, with the least network size as well as the least number of sweeps. For model R and model F, the classification performance on the test set is marginally better than that of model S, but with a significantly higher number of links and training sweeps required. Comparing models F and R, it is observed that the incorporation of domain knowledge in the latter through Rough sets boosts its performance. Similarly, using the modular approach with GA (model FM) improves the efficiency of model F. Since model S encompasses the principles of both models R and FM, it results in the least redundant yet most effective model. The variation of the classification accuracy of the models with iteration is also studied. As expected, model S is found to have a high recognition score at the very beginning of evolutionary training; the next best values are attained by models R and FM, and the lowest by models O and F using back propagation. Model S converges after about 90 iterations of the GA, providing the highest accuracy compared to all other models, whereas the back propagation based models require about 2000 – 5000 iterations for convergence. It may be noted that the suggested training algorithm is successful in imposing a structure among the connection weights. It has been observed that the weight values of the FMLP trained with back propagation (model F) are more or less uniformly distributed between the maximum and minimum values, whereas RFMLP (model S) has most of its weight values equal to zero, while the majority of its nonzero weights have high values. Hence, it can be inferred that the former model results in a dense network with weak links, while the incorporation of Rough sets, modular concepts and GA produces a sparse network with strong links; the latter is suitable for rule extraction. The connectivity (positive weights) of the trained network is shown in figure 5.8.

results in dense network with weak links, while incorporation of Rough sets, modular concepts and GA produces sparse network with strong links. The latter is suitable for rule extraction. The connectivity (positive weights) of trained network is shown in figure 5.8.

Algorithm   Accuracy (%)   Users' Accuracy (%)   Kappa (%)   Uncovered Region (%)   Number of Rules   CPU Time (Sec)   Conf
Model S     86.04          84.37                 79.19       4.10                   10                1.0              1.4
Subset      84.00          83.72                 78.29       3.89                   16                1.2              1.7
M of N      80.00          81.04                 75.69       3.20                   14                1.1              1.7
X2R         78.00          76.89                 75.36       3.72                   14                0.8              1.5
C4.5        81.00          81.19                 78.29       4.10                   16                0.9              1.4

Table 5.5: Comparison of performance of rules extracted by various methods on BSE data

5.6.2 Simulation results of Forecasting using Hybrid Neuro-Fuzzy Model

The efficiency of the hybrid neuro-fuzzy regression model is demonstrated by considering exchange rate data of the US dollar (US $) against the Indian national rupee (INR), and the effectiveness of the method is shown by comparing it with other forecasting models [41]. The data consist of 50 daily observations of the exchange rate of US $ versus INR from 8th May to 15th October, 2008. The first 40 observations are used to formulate the model and the last 10 observations to evaluate the performance of the method. The proposed procedure involves three phases [41]: (a) Phase I: Train the ANN model. In order to obtain the optimum network architecture, different architectures were evaluated using a constructive algorithm to compare the network's performance. The best fitted network architecture, which presented the best forecasting accuracy on the test data, is composed of two input, three hidden and one output neurons, i.e. $N^{(2-3-1)}$. The weights and biases of the network are given in table 5.7. (b) Phase II: Determine minimal fuzziness. Considering the weights $(w_0, w_1, w_2, w_3) = (-985.2, 0.5947, 985.33, 0.2787)$, the fuzzy parameters are obtained from equation (30) with $h = 0$. The results are plotted in figure 5.9. The method provides possible intervals. From figure 5.9 the actual values are located within the fuzzy intervals, but the string of fuzzy intervals is rather wide, especially when the macro-economic environment is smooth in nature. This problem is resolved using the method of Tanaka and Ishibuchi to provide a narrower interval for the decision maker. (c) Phase III: Delete outliers around the model's upper and lower bounds. From the above results it is evident that the observation of 22nd August, 2008 is located at the lower bound, so the linear programming constraint equation produced by this observation is deleted and Phase II is renewed. The results are shown in figure 5.10. Using the given method, future values of the exchange rate for the next ten transaction days are forecasted; the results are shown in table 5.8. The results of the forecast are satisfactory, and the fuzzy intervals are narrower than before. Thus, it helps the decision maker to understand the possible interval of exchange rates when the macro-economic environment is stable. The performance of the method can be greatly improved with larger data sets. Table 5.9 presents a comparison of the forecasted interval widths obtained from the given method with different data set sizes.

Extracted Rules:
$c_1 \leftarrow M_1 \wedge L_3 \wedge M_2$; $cf = 0.856$
$c_1 \leftarrow H_1 \wedge M_2$; $cf = 0.766$
$c_2 \leftarrow M_2 \wedge M_3$; $cf = 0.829$
$c_2 \leftarrow M_1 \wedge H_1 \wedge L_2 \wedge M_2$; $cf = 0.869$
$c_3 \leftarrow L_1 \wedge H_2$; $cf = 0.796$
$c_4 \leftarrow L_1 \wedge L_2 \wedge L_3$; $cf = 0.736$
$c_5 \leftarrow M_2 \wedge H_3$; $cf = 0.896$
$c_5 \leftarrow M_1 \wedge M_2$; $cf = 0.799$
$c_5 \leftarrow H_1 \wedge M_2$; $cf = 0.737$
$c_6 \leftarrow H_2$; $cf = 0.746$

Fuzzy Parameters:
Feature 1: $c_L = 0.360$, $c_M = 0.516$, $c_H = 0.697$; $\lambda_L = 0.129$, $\lambda_M = 0.175$, $\lambda_H = 0.186$
Feature 2: $c_L = 0.237$, $c_M = 0.446$, $c_H = 0.736$; $\lambda_L = 0.219$, $\lambda_M = 0.269$, $\lambda_H = 0.299$
Feature 3: $c_L = 0.397$, $c_M = 0.566$, $c_H = 0.687$; $\lambda_L = 0.265$, $\lambda_M = 0.219$, $\lambda_H = 0.236$

Table 5.6: Rules extracted from trained networks (Model S) for BSE data along with input fuzzy parameter values

       Input weights                       Hidden weights   Biases
       w_{i,1}      w_{i,2}     w_{i,3}    w_j              w_{0,j}    w_0
i=1    3.786        2.3752      4.5530     0.59466          -6.5937    -985.1925
i=2    42.1044      -11.4969    -26.0886   985.3296         11.4486
i=3    -155.2669    172.2537    158.1926   0.27868          -27.1696

Table 5.7: Weights and biases of ANN $N^{(2-3-1)}$

Figure 5.9: Results obtained from the neuro-fuzzy model (Series 1 denotes the upper bound of the exchange rate; Series 2 denotes the actual value of the exchange rate; Series 3 denotes the lower bound of the exchange rate)

Figure 5.10: Results of the neuro-fuzzy model after deleting the 22nd August, 2008 lower-bound observation (Series 1 denotes the upper bound of the exchange rate; Series 2 denotes the actual value of the exchange rate; Series 3 denotes the lower bound of the exchange rate)

A comparison of the performance of the proposed model on a single time series, viz. the exchange rate (US $/INR), with other forecasting models such as ARIMA, Chen's fuzzy time-series (first-order),

Chen’s fuzzy time-series (high-order), Yu’s fuzzy time-series, FARIMA and ANN have been considered for comparison with forecasting power of the illustrated model [41]. To measure forecasting performance, in point estimation case, MAE (mean absolute error) and MSE (mean squared error) are employed as performance indicators, which are computed from following equations [41]:

MAE 

1 N  | ei | N i 1

MSE 

(110)

1 N (ei ) 2 (111)  N i 1
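The two indicators of equations (110)-(111) can be computed directly, as in the following minimal sketch with hypothetical forecast errors.

# Hypothetical forecast errors e_i = actual - predicted.
errors = [0.21, -0.35, 0.10, -0.05, 0.42]

mae = sum(abs(e) for e in errors) / len(errors)   # equation (110)
mse = sum(e * e for e in errors) / len(errors)    # equation (111)
print(f"MAE = {mae:.3f}, MSE = {mse:.3f}")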

where $e_i$ is the individual forecast error and $N$ is the number of error terms.

Date                 Actual   Lower Bound   Upper Bound
1st October, 2008    46.47    44.30         48.50
2nd October, 2008    46.63    44.50         48.66
3rd October, 2008    47.01    45.00         49.09
6th October, 2008    47.77    45.28         49.79
7th October, 2008    47.78    45.65         49.80
8th October, 2008    48.00    45.96         50.10
9th October, 2008    48.00    45.99         50.40
10th October, 2008   48.41    46.36         50.46
14th October, 2008   47.93    45.80         49.97
15th October, 2008   48.53    46.36         50.57

Table 5.8: Results of Neuro-Fuzzy model for test data

Sample Size                        100    200    300    400
Exchange Rate Series (US $/INR)    0.26   0.24   0.20   0.19

Table 5.9: Comparison of forecasted interval widths of Neuro-Fuzzy model with different sample sizes

Model                           Forecasted       Related performance
                                interval width   ANN      Fuzzy ARIMA   Neuro-Fuzzy Model
ANN (95% confidence interval)   15.4             0        -             -
Fuzzy ARIMA                     3.7              75.7 %   0             -
Neuro-Fuzzy Model               1.9              86.9 %   42.6 %        0

Table 5.10: Comparison of forecasted interval widths by Neuro-Fuzzy model with other forecasting models

Based on the numerical results given in table 5.10, it is seen that the predictive power of the proposed model is rather encouraging and that the possible interval given by the model is narrower than the 95% confidence interval of the ANN. The width of the forecasted interval of the neuro-fuzzy model is 1.9 rupees, which indicates an 86.9% improvement upon the 95% confidence interval of the ANN. Moreover, the model requires fewer observations than the ANN and is an interval forecaster that yields more information. Although the method is basically designed for interval forecasting, its performance in point estimation is also more satisfactory than that of the ANN. This evidence shows that the performance of the neuro-fuzzy model is better than that of the other models, and the interval obtained is narrower than that obtained by FARIMA; its superiority over FARIMA is depicted by a 42.6% improvement. The performance of the neuro-fuzzy model is also better than that of point estimation models such as Chen's fuzzy time-series (first order), Chen's fuzzy time-series (second order) and Yu's fuzzy time-series, as per the results given in table 5.11, where the MAE and MSE of the neuro-fuzzy model are lower than those of the other models. Thus, the neuro-fuzzy model obtains accurate results under incomplete data conditions, forecasts with little historical data, and provides the best and worst possible situations for decision makers.

Model                                        Exchange Rate
                                             MAE     MSE
Auto Regressive Integrated Moving Average    0.925   1.242
Chen's Fuzzy time-series (first order)       0.750   0.778
Chen's Fuzzy time-series (second order)      0.750   0.778
Yu's Fuzzy time-series                       0.750   0.778
Artificial Neural Networks (ANN)             0.690   0.684
Neuro-Fuzzy Model                            0.580   0.521

Table 5.11: Comparison of performance of Neuro-Fuzzy model with other forecasting models

5.6.3 Simulation results denoting Non-uniform Spread in Fuzzy Linear Regression Model

To illustrate the validity of non-uniform spreads in fuzzy linear regression, two simulation examples are given. The first example, which has been investigated by several previous studies [42], shows that the discussed fuzzy linear regression model has higher explanatory power than those of the related earlier works. The second example shows that the proposed method is capable of dealing with the situation of variable spreads and has higher forecasting accuracy than other methods [42].

Example 1: The data given in table 5.12 have all observations for the independent and response variables as trapezoidal fuzzy numbers. Although in this case $(\tilde{b}_1)_\alpha^L$, $(\tilde{b}_1)_\alpha^U$, $(\tilde{b}_0)_\alpha^L$ and $(\tilde{b}_0)_\alpha^U$ cannot be solved analytically, the lower and upper bounds of the membership functions of $\tilde{b}_1$ and $\tilde{b}_0$ at eleven distinct $\alpha$-values can be obtained, as shown in table 5.13. Then, by defuzzifying these two fuzzy numbers via equation (44) with its corresponding $n = 200$ trapezoids and solving equations (52)-(58), the fuzzy linear regression model constructed by the non-uniform spread method is

$$\tilde{y}_{NUS} = 3.6286 + 0.5696 \tilde{x} + \tilde{E}_i, \quad i = 1, \ldots, 8 \quad (112)$$

where $\tilde{E}_1 = (0.226, 0, 0, 0.226)$, $\tilde{E}_2 = (0.226, 0, 0, 0.226)$, $\tilde{E}_3 = (0, 0, 0, 0.927)$, $\tilde{E}_4 = (0.226, 0, 0, 0.226)$, $\tilde{E}_5 = (0.226, 0, 0, 0.226)$, $\tilde{E}_6 = (0.927, 0, 0, 0)$, $\tilde{E}_7 = (0.226, 0, 0, 0.226)$ and $\tilde{E}_8 = (0.226, 0, 0, 0.226)$. The expressions for the other models, viz. Sakawa-Yano, two-stage and least-squares, can be determined similarly. As evident from table 5.12, the total error is minimum in the case of the non-uniform spread method, which justifies the superiority of the method.

Observation   $[\tilde{x}_i, \tilde{y}_i]$                             Errors in Estimation
Number                                                                 Sakawa-Yano   Two-stage   Least-square   Non-uniform Spread
1             [(1.5, 2.0, 2.0, 2.5), (3.5, 4.0, 4.0, 4.5)]             0.633         0.848       0.974          0.825
2             [(3.0, 3.5, 3.5, 4.0), (5.0, 5.5, 5.5, 6.0)]             0.453         0.208       0.702          0.221
3             [(4.5, 5.5, 5.5, 6.5), (6.5, 7.5, 7.5, 8.5)]             1.613         1.489       1.683          1.186
4             [(6.5, 7.0, 7.0, 7.5), (6.0, 6.5, 6.5, 7.0)]             1.165         0.910       1.111          0.926
5             [(8.0, 8.5, 8.5, 9.0), (8.0, 8.5, 8.5, 9.0)]             0.770         0.760       0.848          0.712
6             [(9.5, 10.5, 10.5, 11.5), (7.0, 8.0, 8.0, 9.0)]          1.977         1.449       1.603          1.227
7             [(10.5, 11.0, 11.0, 11.5), (10.0, 10.5, 10.5, 11.0)]     1.368         1.000       1.508          0.969
8             [(12.0, 12.5, 12.5, 13.0), (9.0, 9.5, 9.5, 10.0)]        1.452         0.806       0.934          0.869
Total Error                                                            9.431         7.470       9.363          6.935

Table 5.12: Numerical data and estimation of errors

$\alpha$   $(\tilde{b}_1)_\alpha^L$   $(\tilde{b}_1)_\alpha^U$   $(\tilde{b}_0)_\alpha^L$   $(\tilde{b}_0)_\alpha^U$
0.0        0.296                      0.782                      1.562                      5.401
0.1        0.317                      0.755                      1.770                      5.231
0.2        0.340                      0.727                      1.975                      5.058
0.3        0.360                      0.700                      2.178                      4.881
0.4        0.382                      0.673                      2.381                      4.702
0.5        0.404                      0.646                      2.583                      4.520
0.6        0.426                      0.620                      2.783                      4.338
0.7        0.448                      0.594                      2.983                      4.147
0.8        0.472                      0.569                      3.181                      3.958
0.9        0.495                      0.544                      3.377                      3.766
1.0        0.519                      0.519                      3.572                      3.572

Table 5.13: $\alpha$-cuts of fuzzy regression coefficients at eleven distinct $\alpha$-values

Example 2: Consider the data given in table 5.14. The spreads of the observed responses are 1, 1, 1, 1 and 5. By applying the non-uniform spread method, the membership functions of the fuzzy regression coefficients are:

$$\mu_{\tilde{b}_1}(z_1) = \begin{cases} 0, & z_1 \le 1.6 \\ \dfrac{10 z_1 - 16}{8}, & 1.6 \le z_1 \le 2.4 \\ 1, & z_1 = 2.4 \\ \dfrac{32 - 10 z_1}{8}, & 2.4 \le z_1 \le 3.2 \\ 0, & 3.2 \le z_1 \end{cases} \quad (113)$$

$$\mu_{\tilde{b}_0}(z_0) = \begin{cases} 0, & z_0 \le -1.4 \\ \dfrac{z_0 + 1.4}{2.1}, & -1.4 \le z_0 \le 0.7 \\ 1, & z_0 = 0.7 \\ \dfrac{2.8 - z_0}{2.1}, & 0.7 \le z_0 \le 2.8 \\ 0, & 2.8 \le z_0 \end{cases} \quad (114)$$

The regression model is

$$\tilde{y}_{NUS} = 0.6 + 2.4 x + \tilde{E}_i, \quad i = 1, \ldots, 5 \quad (115)$$

where $\tilde{E}_1 = (0.6, 0, 0, 0.6)$, $\tilde{E}_2 = (0.6, 0, 0, 0.6)$, $\tilde{E}_3 = (0.6, 0, 0, 0.6)$, $\tilde{E}_4 = (0.6, 0, 0, 0.6)$ and $\tilde{E}_5 = (0.6, 0, 0, 0.6)$. By applying the method given in [42], the regression model is $\tilde{y}_{KC} = 0.6 + 2.4 x + (1.2, 0, 0, 0.796)$. To compare explanatory power, equation (48) is used. Table 5.14 shows the errors in estimation for the two methods. It is expected that for observations with dramatically non-uniform spread, the non-uniform model performs better than those with uniform spread.

Observation   $[x_i, \tilde{y}_i]$            Errors in Estimation
Number                                        Two-stage   Non-uniform Spread
1             [1, (2.0, 2.5, 2.5, 3.0)]       0.456       0.356
2             [2, (5.0, 5.5, 5.5, 6.0)]       1.093       0.836
3             [3, (6.0, 6.5, 6.5, 7.0)]       0.789       0.836
4             [4, (9.0, 9.5, 9.5, 10.0)]      0.557       0.356
5             [5, (9.0, 11.5, 11.5, 14.0)]    1.586       0.000
Total Error                                   4.480       2.384

Table 5.14: Numerical data and estimation of errors

5.6.4 Simulation results denoting Bankruptcy Prediction using FSVM

The results of bankruptcy prediction on the dataset given in section 5.5 using FSVM [38] are discussed here. The kernel used for FSVM is an anisotropic radial basis function in a Gaussian transformation. The logistic function given by equation (107) is used to map the primary score $s_k$ of each organization to its membership of the positive class $m_k$. It is assumed that the training dataset includes $N_+$ positive instances and $N - N_+$ negative instances. After training the machines, each test datum is input into the machines and its decision value is obtained. A key question in performance measurement is the determination of the cutoff value: the higher the cutoff, the more instances will be rejected. In reality the optimal cutoff value is chosen from descriptive statistic values for the organizations in a way that trades off defaults against successes. The most significant predictors suggested by discriminant analysis belong to the profit and leverage ratios. To demonstrate the ability of FSVM to extract information from data, two ratios from these groups, viz. NI/TA from the profitability ratios and TL/TA from the leverage ratios, are considered. The FSVM can differ in two respects: (i) the capacity, controlled by the coefficient $C$, and (ii) the complexity of the classifier functions, controlled by the anisotropic radial basis of the Gaussian kernel transformation. In figures 5.11 – 5.14, triangles and squares denote successful and failing organizations from the training dataset respectively. The intensity of the gray background corresponds to different values of the score $f$: the darker the area, the higher the score and the greater the probability of default. The most successful organizations, lying in the brighter area, have positive profitability and a reasonable leverage of (TL/TA) ≈ 0.4. Figure 5.11 presents classification results for FSVM using locally near-linear classifier functions with the capacity fixed at $C = 1$, for which the anisotropic radial basis is $100\Sigma^{1/2}$ (with $\Sigma$ the variance matrix of the predictors). The discriminating rule in this case can be approximated by a linear combination of the predictors and is similar to that suggested by discriminant analysis, although the coefficients of the predictors may be different. As the complexity of the classifying functions increases, the radial basis becomes $2\Sigma^{1/2}$, as illustrated in figure 5.12; now the areas of successful and failing organizations become localized. If the radial basis is decreased to $0.5\Sigma^{1/2}$, as shown in figure 5.13, FSVM will try to track each observation; the complexity in this case is too high for the given data set. Figure 5.14 demonstrates the effect of a high capacity ($C = 300$) on the classification results. As the capacity grows, FSVM localizes only one cluster of successful organizations, and the area outside this cluster is associated with approximately equally high score values. Thus, besides estimating scores for organizations, FSVM also manages to learn that there always exists a cluster of successful organizations, while the cluster of bankrupt organizations vanishes when the capacity is high: an organization must possess certain characteristics in order to be successful, while failing organizations can be located elsewhere. This result was obtained without using any additional knowledge besides that contained in the training set. The calibration of the model, or the estimation of the mapping $f \to PD$, can be illustrated by the following example, where FSVM with radial basis $2\Sigma^{1/2}$ and capacity $C = 1$ is used. Three rating grades are set, viz. safe, neutral and risky, which correspond to score values $f < -0.0115$, $-0.0115 \le f \le 0.0115$ and $f > 0.0115$ respectively, and the total number of organizations and the number of failing organizations in each of the three groups are calculated. If the training set were representative of the whole population of organizations, the ratio of failing to all organizations in a group would give the estimated probability of default. Figure 5.15 shows the Lorenz curve [38], i.e. the cumulative default rate as a function of the percentile of organizations sorted according to their score, for the training set of organizations. For the abovementioned three rating grades, $PD_{safe} = 0.24$, $PD_{neutral} = 0.50$ and $PD_{risky} = 0.76$. If a sufficient number of observations is available, the model can also be calibrated for finer rating grades such as AAA or BB by adjusting the score values separating the groups of organizations so that the estimated default probabilities within each group equal those of the corresponding rating grades. It is to be noted that the model is calibrated on the grid determined by $\operatorname{grad}(f) = 0$ or $\operatorname{grad}(\widehat{PD}) = 0$, and not on an orthogonal grid as in Moody's RiskCalc model. In other words, the restrictive assumption of an independent influence of the predictors is not made, as it is in the latter model. This can be important, since the same decrease in profitability will have different consequences for highly and lowly leveraged firms. For multidimensional classification the results cannot be easily visualized. In this case a cross-validation technique is used to compute the percentage of correct classifications and compare it with that of discriminant analysis. It is to be noted that both of the most widely used methods, viz. discriminant analysis and logit regression, choose only one predictor significant at the 5% level (NI/TA) when forward selection is used. Cross-validation has the following stages: one organization is taken out of the sample and FSVM is trained on the remaining organizations; then the class of the out-of-sample organization is evaluated by FSVM; this procedure is repeated for all organizations and the percentage of correct classifications is calculated. The best percentage of correctly cross-validated organizations, where all available ratios are used as predictors, is higher for FSVM than for discriminant analysis (69% as compared to 60%). However, the difference is not significant at the 5% level, which indicates that a linear function might be considered an optimal classifier for the number of observations in the data set considered. As for the direction vector of the separating hyperplane, it can be estimated differently by FSVM and discriminant analysis without much affecting the accuracy, since the correlation of the underlying predictors is high. The cluster center locations, estimated using cluster analysis, are presented in table 5.15. The results of the cluster analysis indicate that the two clusters are likely to correspond to successful and failing organizations; substantial differences in the interest coverage ratios, NI/TA, EBIT/TA and TL/TA between the clusters are noted.
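The calibration f → PD described above amounts to counting failing organizations per rating grade. A minimal sketch, with hypothetical scores and outcomes, follows.

def grade(f):
    """Map a score f to the safe/neutral/risky grades defined above."""
    if f < -0.0115:
        return "safe"
    return "neutral" if f <= 0.0115 else "risky"

scores = [-0.03, -0.02, 0.0, 0.004, 0.02, 0.05]  # hypothetical scores f(x)
failed = [0, 1, 0, 1, 1, 1]                       # 1 = failing organization

counts, fails = {}, {}
for f, y in zip(scores, failed):
    g = grade(f)
    counts[g] = counts.get(g, 0) + 1
    fails[g] = fails.get(g, 0) + y

pd_by_grade = {g: fails[g] / counts[g] for g in counts}
print(pd_by_grade)  # estimated default probability per grade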

Figure 5.11: Ratings of organizations in two dimensions; case of low complexity of the classifier functions, radial basis $100\Sigma^{1/2}$, capacity fixed at $C = 1$

Figure 5.12: Ratings of organizations in two dimensions; case of an average complexity of the classifier functions, radial basis $2\Sigma^{1/2}$, capacity fixed at $C = 1$

Cluster   EBIT/TA   NI/TA    EBIT/S   EBIT/INT   TD/TA   TL/TA   SIZE     QA/CL   CASH/TA   WC/TA   CA/CL   STD/TD   S/TA    INV/COGS
{-1}      0.263     0.078    0.313    13.223     0.200   0.549   15.104   1.108   0.047     0.126   1.879   0.144    1.178   0.173
{+1}      0.015     -0.027   -0.040   1.012      0.379   0.752   15.059   1.361   0.030     0.083   1.813   0.061    0.959   0.155

Table 5.15: Cluster centre locations; there are 25 members in class {-1} (successful organizations) and 75 members in class {+1} (unsuccessful organizations)

Figure 5.13: Ratings of organizations in two dimensions; case of an excessively high complexity of the classifier functions, radial basis $0.5\Sigma^{1/2}$, capacity fixed at $C = 1$

Figure 5.14: Ratings of organizations in two dimensions; case of high capacity $C = 300$, radial basis fixed at $2\Sigma^{1/2}$

Figure 5.15: Power (Lorenz) curve – cumulative default rate as a function of the percentile of organizations sorted according to their score – for the training set of organizations; FSVM is applied with radial basis $2\Sigma^{1/2}$ and capacity $C = 1$

5.7 Conclusion

In this chapter, a methodology is presented for generating stock price prediction rules of BSE using RFMLP ANN. The RFMLP network was modularly evolved using GA for designing a knowledge-based network for pattern classification and rule generation. The algorithm involves the synthesis of several MLP modules, each encoding Rough set rules for a particular class. These knowledge-based modules are refined using GA. The genetic operators are implemented in such a way that they help preserve the modular structure already evolved. It is seen that this methodology, along with modular network decomposition, results in accelerated training and a more compact network with comparable classification accuracy, as compared to other hybridizations. The model is used to develop a new rule extraction algorithm. The extracted rules are compared with some related rule extraction techniques on the basis of some quantitative performance indices. It is observed that the proposed methodology extracts rules which are less in number, yet accurate, and have high certainty factor and low confusion with less computation time. The research has immense potential for application to large scale prediction problems involving knowledge discovery tasks using case-based reasoning, particularly related to mining of classification rules.

This is followed by a neuro-fuzzy hybrid model for time series forecasting of exchange rate data, which is used in today's competitive scenario. It is a quantitative technique that has become an important tool for financial market forecasting and for improving decisions and investments. One of the most important factors in choosing a forecasting technique is its forecasting accuracy. The thrust is on improving the effectiveness of time series models. The real world environment experiences uncertain and quick changes; thus, future situations should be forecasted using a small amount of data from a short span of time. The neuro-fuzzy model combines the advantages of ANN and fuzzy regression and is used to forecast the exchange rate of the US dollar to the Indian national rupee. The disadvantage of requiring a large volume of historical data is removed by drawing on the advantages of fuzzy regression models. The model requires fewer observations to obtain accurate results and also obtains narrower possible intervals than other interval forecasting models under incomplete data conditions, by exploiting the ability of ANN to preprocess raw data. The empirical results indicate that the model is suitable for use in incomplete data conditions. The performance of the model is also better than that of other models, as illustrated by the results. It is suitable for both point and interval forecasts with incomplete data. Thus, the hybrid model makes good forecasts under both the best and the worst possible situations, which makes it more suitable for decision making than other techniques.

Next, the FLR model with non-uniform spreads is studied for achieving greater explanatory power and forecasting accuracy. Some previous works obtain fuzzy regression coefficients whose spreads for the estimated fuzzy responses increase as the independent variables increase in magnitude; such coefficients, like crisp regression coefficients with uniform spreads, are not suitable for general cases. Although some works obtain crisp regression coefficients and uniform spread, they cannot deal with the situation where the spreads of the observed responses are actually non-uniform. Here, the regression coefficients are calculated as crisp values and the spreads of the fuzzy error terms are non-uniform.
The models proposed earlier with an increasing spread or constant spread are suitable for cases when the observed responses have an increasing or constant spread. The non-uniform spread fuzzy linear regression model resolves the problem of wider spreads of the estimated response for larger values of the independent variables in fuzzy regression analysis and deals with non-uniform spreads effectively. The method is based on the extension principle, which also provides the membership function of the least-squares estimate of the regression coefficient, thereby conserving the fuzziness of the observations. Further, the method has greater explanatory power and forecasting accuracy. Finally, the method for constructing the membership function of the fuzzy regression coefficient completely conserves the fuzziness of the input information. A numerical example illustrates the strength of the non-uniform method in terms of better explanatory power than earlier studies. The non-uniform method can also be applied to multiple fuzzy regression problems.

Finally, a novel Soft Computing tool viz., FSVM is considered to study the problem of bankruptcy prediction in corporate organizations. SVM are capable of extracting information from real life business data. Moreover, they give an opportunity to obtain results that are not obvious at first glance. They are easily adjusted with only a few parameters. This makes them particularly well suited as an underlying technique for organization rating and investment risk assessment methods used by financial institutions. SVM are also based on very few restrictive assumptions and can reveal effects overlooked by many other methods. They have been able to produce accurate classification results in other areas and can become an option of choice for several applications. But real life corporate data has an inherent degree of uncertainty and impreciseness, so unpredictable results may crop up. In order to create a practically valuable methodology for tackling the uncertainty aspect, SVM is integrated with fuzzy membership functions so that an effective decision making classification tool is obtained. To conduct the study, a test dataset is used which comprises the 50 largest bankrupt organizations with capitalization of no less than $1 billion that filed for protection against creditors under Chapter 11 of the United States Bankruptcy Code in 2001–2002 after the stock market crash of 2000. The performance of FSVM is illustrated by experimental results, which show that it is better capable of extracting useful information from corporate data than traditional bankruptcy prediction methods.

Chapter 6 Some Problems in Assignment, Sequencing and Job Scheduling

6.1 Introduction

The allocation and assignment of resources is a matter of central concern in Computer Science and Operations Research [53], [72], [94], [152]. It involves the distribution of resources among competing groups of people or programs and is used to assign available resources in an economic and cost-effective way, such that there is the best possible utilization of the organization's resources. The management of an organization is generally responsible for allocating resources to achieve its goals and objectives. According to Churchman, resource allocation in organizations can be stated as follows [122], [138]: In organizations, the decision-making function is the responsibility of management. In order to execute its responsibility, an organization's management requires information about the resources available to it and their relative effectiveness for achieving the organization's objectives. Resources are acquired, allocated, motivated and manipulated under controlling authority. They include people, materials, plant and equipment, money and information. Here, rationality is the achievement of the organization's objectives that will best fulfill its goals. An important question which arises here is: how does an organization rationally allocate its resources in order to achieve its goals? Executives are often asked how resources are assigned in their organization, which they are sometimes unable to answer specifically. The question which then crops up is how resource decisions are made in some rational way. In order to make resource decisions in a rational way [122], [138], an organization performs tasks such as: (i) identify or design alternatives; (ii) identify and structure the organization's goals into objectives, sub-objectives and so on; (iii) measure how well each alternative contributes to each of the lowest level sub-objectives and (iv) find the best combination of alternatives, subject to environmental and organizational constraints. Closely associated with the assignment problem are sequencing and scheduling problems. The assignment problem is a special form of the transportation problem [145], [146], where the decision variable of the transportation matrix can assume either 0 or 1 value, i.e. one facility can perform one job at a time. An assignment plan is optimal if it minimizes total cost or effectiveness or maximizes the profit of performing all jobs. The Examination Timetable Problem (ETP) and the University Course Timetable Problem (UCTP) are two important types of NP-hard assignment problems [43], [46], [108] considered here. ETP can be defined as the assignment of courses to be examined and their candidates to time periods and examination rooms while satisfying a set of constraints. These constraints may be either hard or soft [43], [46]. Hard constraints such as avoiding student examination collision and room over-sizing must be satisfied. Soft constraints such as scheduling large candidate examinations first may be tolerable but must be violated as little as possible. Generally, therefore, no optimal algorithm is known which generates a solution within reasonable time. UCTP represents an important class of optimization problem in Operations Research [108]. It is considered one of the most difficult problems faced by universities and colleges today.
The problem can be defined as the allocation of given resources (teachers, students and classrooms) to objects (courses) placed in space-time, satisfying all university constraints and optimizing the utilization of existing facilities such that a set of desirable objectives is satisfied effectively and efficiently. The university timetable problem exists in two forms viz., course and exam timetable formats, and requires several slots with different categories such as lectures, tutorials and practical sessions, which fit within a week and repeat for the whole semester. Given the increasing number of students in universities, a large number of courses are offered every term. Each course has a different number of enrolled students and each classroom has a different capacity, which makes the assignment of courses to classrooms complicated. Furthermore, it is not enough merely to schedule a course in a classroom with higher capacity than the number of enrolled students, since this can still lead to inefficient utilization of classrooms, which can cause difficulties for teachers and students.

The sequencing problem [94], [126] is concerned with the appropriate selection of a sequence of jobs done on a finite number of service facilities in some well-defined order so as to optimize some efficiency measure, such as total elapsed time or overall cost. The sequence determination may be extended to the longest common subsequence problem [53], which has an identical number of jobs processed by a different number of facilities such that total elapsed time is minimized; the objective is to find the longest common subsequence corresponding to the two sequences obtained. The longest common subsequence can be seen as a measure of closeness of two strings, as it consists in finding the maximum number of identical elements of the two strings when preserving the order of elements matters. It is one of the most studied sequencing problems, as it plays an important role in string comparison. It has potential applications in many areas such as Pattern Recognition, Data Compression, Word Processing and Genetics. The problem has two different aspects, where the first one deals with various measures of pre-sortedness and the other with the problem of generation of the longest common subsequence provided elements are arranged in increasing sequence, resulting in the longest increasing subsequence. Fredman [31], [53] has examined the complexity of an algorithm that computes the length $L$ of the longest increasing subsequence of $S$, where $S = (x_1, \ldots, x_n)$ is a sequence of $n$ distinct integers, given by $L = \max\{k : 1 \le i_1 < i_2 < \cdots < i_k \le n;\ x_{i_1} < x_{i_2} < \cdots < x_{i_k}\}$, with an order of $n \log_2 n$ running time and $n \log_2 n - n \log_2 \log_2 n + O(n)$ comparisons in the worst case. It has been shown that a substantial amount of information regarding the ordering of the elements in $S$ is required for the value of $L$ to be determined uniquely.
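The $n \log_2 n$ running time quoted above can be achieved by the classical patience-sorting technique, which maintains the smallest possible tail element of an increasing subsequence of each length. A minimal sketch follows; the function name is illustrative.

```python
import bisect

def lis_length(seq):
    """Length of the longest increasing subsequence in O(n log n).

    tails[k] holds the smallest possible tail of an increasing
    subsequence of length k + 1 seen so far; tails stays sorted,
    so each element is placed by binary search.
    """
    tails = []
    for x in seq:
        pos = bisect.bisect_left(tails, x)  # first tail >= x
        if pos == len(tails):
            tails.append(x)                 # extends the longest subsequence
        else:
            tails[pos] = x                  # improves an existing tail
    return len(tails)

print(lis_length([3, 1, 4, 1, 5, 9, 2, 6]))  # -> 4 (e.g. 1, 4, 5, 9)
```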
Sequencing and scheduling are closely related to each other and are often used interchangeably. Scheduling is concerned with the assignment of time to a set of jobs for processing through a group of machines, or their service sector equivalents, in order to best satisfy some criteria [13], [44], [118], [129]. Scheduling revolves around scheduling theory, which is concerned with the formulation and study of various scheduling models and the development of associated solution techniques. It involves the sequencing of activities under time and resource constraints to meet a specific objective. It is a complex decision making problem because of conflicting goals, limited resources and the difficulty of accurately modeling real world problems. In an industrial job assignment context, scheduling activities are mapped to operations, and resources to machines. The purpose of the scheduler is to determine the starting time of each operation so as to achieve the desired performance measures, while satisfying capacity and technological constraints. In today's highly competitive industrial environment, there is an urgent need for a robust and flexible approach capable of generating good solutions within an acceptable timeframe. Some widely studied classical scheduling models are single machine, parallel machine, flow scheduling and job scheduling problems [13], [108], [129]. A great deal of research has been carried out and is still being done on scheduling problems [13], [44], [118], [129]. The reason is that scheduling offers a great theoretical challenge to researchers because of its combinatorial nature. Also, from a practical point of view, it plays a significant role in the successful operation of production, planning and control departments. Most scheduling research has considered optimizing a single objective. The problem has been tackled using integer programming, branch and bound, heuristic techniques etc. The heuristics provide good, satisfactory but not necessarily optimal solutions in reasonable time and use problem specific information. The problems of scheduling [2], [118], [129], [131], [165] may be segregated based on requirements, complexity of processes and scheduling objectives. Requirements may be produced either by an open shop for customer orders or a closed shop for inventory replenishment.

The complexity of processes is primarily determined by the order in which different machines appear in the operations of individual jobs. Broadly, scheduling can be classified as flow scheduling and job scheduling. A number of assumptions revolve around flow or job scheduling. They are primarily made for simplicity of the structure of the problems and to build a generalized model. Most of the different applications using these models require relaxing one or several of these assumptions, so that they are not entirely realistic models for applications. Some of the assumptions include availability of jobs, non-interference of machines, non-passing of jobs etc. [165]. In flow scheduling, it is generally assumed that all jobs must be processed on all machines in the same technological or machine order. In job scheduling, jobs may be processed following different machine orders. There is no common path of movement of jobs from machine to machine. Each machine is likely to appear for processing each operation of each job. The scheduling objectives are evaluated to determine the optimum schedule of jobs. Some of the objectives include makespan, total flow time, average job tardiness and number of tardy jobs. The makespan of a schedule of jobs is the completion time of the last job in that schedule; it is assumed that the schedule starts at zero time. The total flow time of a schedule of jobs is the sum of the completion times of all jobs in that schedule. Job tardiness indicates the lateness of a job with respect to its due date. Minimization of makespan results in maximization of overall resource utilization, whereas total flow time aims at minimizing work-in-process inventory, and minimum tardiness yields minimum penalty [13]. Scheduling problems may also be deterministic or stochastic, and static or dynamic [25], [52], [118]. The problem is deterministic or stochastic when the time required to process a task over the respective machine takes a fixed or random value. The scheduling problem is considered static if the ordering of jobs on each machine is determined once and remains unchanged, as opposed to the dynamic case, which can accommodate changes of job ordering for admitting new jobs to the system. A four parameter notation [25], [52] is generally used to identify individual scheduling problems, often written as $A / B / C / D$, where $A$ denotes the job-arrival process (for dynamic problems $A$ denotes the probability distribution of times between arrivals; for static problems it is assumed that jobs arrive simultaneously unless stated otherwise); $B$ describes the number of machines $m$ used in the scheduling problem; $C$ refers to the flow pattern of jobs through the machines in the shop and $D$ describes the criterion by which the schedule of jobs will be determined. Among the above scheduling models, the deterministic job scheduling problem has attracted the most attention, for two reasons viz., (i) the generic formulation makes it applicable to possibly all scheduling domains and (ii) the problem's intractable nature has inspired researchers to develop a broad spectrum of strategies, ranging from simple heuristics to adaptive search strategies based on conceptual frameworks borrowed from Biology, Genetics and Evolution. It consists of a finite set of jobs to be processed on a finite set of machines. The difficulty in the job scheduling problem lies in the number of possible schedules. Generally, for an $n \times m$ job scheduling problem, the cardinality of the set of possible schedules is $(n!)^m$ [44].
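The $(n!)^m$ count grows explosively even for small shops, which is what rules out complete enumeration; the quick check below, purely illustrative, prints the size of the schedule space for a few problem dimensions.

```python
from math import factorial

# Number of possible schedules for an n-job, m-machine problem: (n!)^m
for n, m in [(5, 3), (10, 5), (15, 10)]:
    count = factorial(n) ** m
    print(f"{n} jobs x {m} machines: (n!)^m has {len(str(count))} digits")
```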
Though the set of feasible schedules in which precedence constraints have not been violated is a subset of this set, it is still large enough to discourage complete enumeration for even moderately sized problems. The computation time for algorithms searching the possible solution space to identify an optimal schedule increases exponentially with problem size, which makes it an NP-hard problem [53]. The search based approach to the job scheduling problem is based on exploration of the feasible solution space to identify an optimal solution. Adaptive search algorithms like GA, Tabu Search and Simulated Annealing have been applied to this domain with success and in many cases are capable of providing optimal or near optimal solutions. Heuristics offer a knowledge based alternative to the problem. Many dispatching rules are abstractions formulated from expert knowledge of the problem. Elementary priority dispatch rules such as Shortest Processing Time (SPT), First Come First Served (FCFS) and Earliest Due Date (EDD) have proven useful in simulation studies of the job environment [154]. The ease of application, rapidity of computation and flexibility to changing shop floor conditions are key reasons why heuristic based approaches are used widely in many industrial situations.
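As an illustration of how such dispatching rules operate, the sketch below sequences a small set of jobs on a single machine under SPT, EDD and FCFS and reports total tardiness; the job data are invented for the example.

```python
# Each job: (name, processing_time, due_date) -- illustrative data only
jobs = [("J1", 4, 9), ("J2", 2, 5), ("J3", 7, 8), ("J4", 1, 12)]

def sequence(jobs, rule):
    """Order jobs by a priority dispatching rule."""
    keys = {
        "SPT": lambda j: j[1],   # Shortest Processing Time first
        "EDD": lambda j: j[2],   # Earliest Due Date first
        "FCFS": lambda j: 0,     # keep arrival order (stable sort)
    }
    return sorted(jobs, key=keys[rule])

def total_tardiness(seq):
    """Sum of max(0, completion - due) over the sequence."""
    t, tardiness = 0, 0
    for _, p, d in seq:
        t += p
        tardiness += max(0, t - d)
    return tardiness

for rule in ("SPT", "EDD", "FCFS"):
    order = sequence(jobs, rule)
    print(rule, [j[0] for j in order], "tardiness:", total_tardiness(order))
```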

Their main weakness, however, is that different heuristics cater to different problems and no single heuristic dominates the rest across all situations. The stochastic nature of search in these algorithms yields good solutions but does not help explain the process by which they are obtained or the properties attributable to the provided solutions. An interesting line of investigation would be to cast the scheduling problem as a learning task with the goal of capturing properties of known good solutions. In such a formulation, optimal solutions generated by efficient optimizers provide the desired learning objects. In these optimized sequences, each individual operation is treated as a decision which captures some problem specific knowledge. Hence, these solutions contain valuable information such as the relationship between an operation's attributes and its position in the sequence. An exploration of these sequences by Machine Learning techniques would capture predictive knowledge regarding the assignment of an operation's position in the sequence based on its attributes. While this approach learns knowledge contained in schedules produced by an optimization method, there is no reason that knowledge contained in schedules produced by a human scheduler could not serve as the training set. In this chapter, first an important NP-hard assignment problem viz., ETP [46] is discussed. It is defined as the assignment of courses to be examined and their candidates to time periods and examination rooms while satisfying a set of hard and soft constraints. The simulation example is taken from Netaji Subhas Open University, Kolkata. Many possible models exist for the examination timetable problem. Here, the solution is developed using the FILP technique. As in most real life situations, the information available in the system is not exact, lacks precision and has an inherent degree of vagueness; the various allocation variables are therefore considered as fuzzy numbers [155], [159] defined by a membership function $\mu : R \to [0,1]$ expressing the lack of precision that the decision maker has. Each feasible solution has a fuzzy number obtained by the fuzzy objective function. The solution to this problem is obtained using a fuzzy number ranking method. The performance of the different FILP techniques is demonstrated, in terms of execution times, on experimental data generated through extensive simulation from Netaji Subhas Open University, Kolkata, India. The proposed FILP models are compared with a commonly used heuristic viz., the ILP approach on the experimental data, which gives an idea about the quality of the heuristic. The FILP technique is again compared with different AI based heuristic techniques for ETP with respect to best and mean cost as well as execution time measures on the Carter benchmark datasets to illustrate its effectiveness. The comparative study is performed using mathematical Model 3 of the FILP technique because the minimum number of variables is required in its formulation. An appreciable amount of time is required by the FILP technique to generate a satisfactory solution in comparison to other heuristic solutions. The work acts as a benchmark for other heuristic algorithms [46] and helps in better reformulations of mathematical models of the problem. The experimental study presented here focuses on producing a methodology that generalizes well over a spectrum of techniques and generates significant results for one or more datasets.
The performance of the FILP model is finally compared with the best results cited in the literature for the Carter benchmarks to assess its potential. A heuristic based solution for UCTP [43] is presented next. The problem is an NP-hard combinatorial optimization problem which lacks analytical solution methods. It has received tremendous attention during the past few years given its wide use in universities. Several algorithms have been proposed, most of which are based on heuristics like search techniques and evolutionary computation. Here, the FGH algorithm [166] is used to solve the problem. The method incorporates GA [74] using an indirect representation based on event priorities, a micro GA and heuristic local search operators to tackle a real world timetable problem from St. Xavier's College, India. Fuzzy sets model the measure of violation of soft constraints in the fitness function to take care of the inherent uncertainty and vagueness involved in real life data. The present search technique differs from other techniques in several aspects: (i) the algorithm is multi-path, searching many peaks in parallel, and hence reduces the possibility of local minimum trapping;

(ii) it works with a coding of the parameters instead of the parameters themselves, which helps the genetic operators evolve the current state into the next state with minimum computations; (iii) the fitness of each string is evaluated to guide the search instead of the optimization function, so no computation of derivatives or other auxiliary knowledge is required; (iv) GA explores the search space where the probability of finding improved performance is high. Hence, GA is often viewed as a black box approach. In contrast, Fuzzy Logic models are easy to comprehend because they use linguistic terms and structured rules. Unlike GA, Fuzzy Logic does not come with a search algorithm. Fuzzy models adopt techniques from other areas such as statistics, linear system identification etc. Since GA has search ability, it is natural to merge the two paradigms. The algorithm incorporates a number of techniques and domain specific heuristic local search operators to enhance search efficiency. The non-rigid soft constraints involved in the problem are basically optimization objectives for the search algorithm, and there is an inherent degree of uncertainty involved in these objectives, which comprise different aspects of real life data. This uncertainty is tackled by formulating the measure of violation of each soft constraint in the fitness function using fuzzy membership functions. The solutions are developed with respect to a specified benchmark problem. The simulation results indicate that the FGH algorithm produces better results than the manual solution developed by the college staff [43]. Then the concentration shifts to finding a low-complexity solution for the LCS problem using the ACO paradigm [58]. ACO is a novel nature-inspired metaheuristic for the solution of hard combinatorial optimization problems. It belongs to the class of metaheuristics, which are approximate algorithms used to obtain good enough solutions in a reasonable amount of computation time. The inspiring source of ACO is the foraging behavior of real ants. When searching for food, ants initially explore the area surrounding their nest in a random manner. As soon as an ant finds a food source, it evaluates the quantity and quality of the food and carries some of it back to the nest. During the return trip, the ant deposits a chemical pheromone trail on the ground. The quantity of pheromone deposited depends on the quantity and quality of the food and guides other ants to the food source. This indirect communication between ants via pheromone trails enables them to find shortest paths between their nest and food sources. This characteristic of real ant colonies is exploited in artificial ant colonies in order to solve combinatorial optimization problems. Considering two strings $a_1 \ldots a_n$ and $b_1 \ldots b_m$ $(m \le n)$, the traditional technique for finding the LCS is based on dynamic programming [53], which consists of creating a recurrence relation and filling a table of size $m \times n$.
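The dynamic programming approach just mentioned fills an $(m+1) \times (n+1)$ table from the recurrence $L[i][j] = L[i-1][j-1] + 1$ if the characters match, and $\max(L[i-1][j], L[i][j-1])$ otherwise. A minimal sketch, assuming plain Python strings as input:

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b.

    Fills an (m+1) x (n+1) table L where L[i][j] is the LCS length
    of prefixes b[:i] and a[:j], per the classical recurrence.
    """
    n, m = len(a), len(b)
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if b[i - 1] == a[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1   # characters match
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[m][n]

print(lcs_length("AGCAT", "GAC"))  # -> 2
```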

This problem has two different aspects, in which the first one deals with various measures of pre-sortedness and the other with the problem of generation of the LCS provided elements are arranged in an increasing sequence, resulting in the longest increasing subsequence. Although the first aspect has received a good deal of attention in the past, the latter one has not, except for a few studies. Here, the second aspect of this problem is tackled. The proposed ACO-LCS algorithm [31] draws an analogy with the behavior of ant colonies and is known as the ant system. It is a viable approach to stochastic combinatorial optimization. The main characteristics of this model are positive feedback, distributed computation and the use of a constructive greedy heuristic. Positive feedback accounts for rapid discovery of good solutions, distributed computation avoids premature convergence and the greedy heuristic helps find acceptable solutions in a minimum number of stages. The proposed methodology is applied to LCS and simulation results are given. The effectiveness of this approach is demonstrated by its efficient computational complexity. Finally, a hybrid RFMLP-NN is used to study the scheduling process for JSP [44], [102]. RFMLP-NN is a Soft Computing paradigm in which a consortium of methodologies works synergistically and provides flexible information processing capability for handling real life ambiguous situations. The main ingredients of RFMLP-NN [14] are Rough sets, Fuzzy Logic and MLP ANN.

As job scheduling is a decision making process, the next operation is to select a partial schedule from a set of competing operations with the objective of minimizing the performance measure. A complete schedule is the consequence of the best selected decisions. However, there exists an inherent degree of uncertainty, vagueness and impreciseness associated with such problems. These aspects are taken care of by using Fuzzy sets and Rough sets. Fuzzy sets help in handling vagueness and impreciseness in linguistic input descriptions and ambiguity in output decisions. Rough sets deal with uncertainty arising from inexact or incomplete information and extract crude domain knowledge for determining network parameters. They synthesize approximations of concepts and find hidden patterns in acquired data. They aid in the representation and processing of both qualitative and quantitative parameters in reduced form and mix user defined and measured data, thus evaluating the significance of data. They classify decision rules from data and provide a legible and straightforward interpretation of synthesized models. The approach is well suited for parallel processing applications and hence can be effectively integrated with ANN. ANN is a data-driven modeling tool [79] that is able to capture and represent complex and non-linear input-output relationships. ANN are recognized as a powerful and general technique for Machine Learning because of their non-linear modeling abilities and robustness in handling noise-ridden data. They are composed of several layers of processing elements or nodes. These nodes are linked by connections, with each connection having an associated weight. The weight of a connection is a measure of its strength and its sign is indicative of excitation or inhibition potential. An ANN captures task relevant knowledge as part of its training regimen. This knowledge is encoded in the architecture or topology of the network. Transfer functions are used for nonlinear mapping along with a set of network parameters viz., weights and biases. To generate the learning or knowledge base, GA are chosen [74] for producing optimal solutions to a known benchmark problem, as they have proved to be successful in empirical scheduling research. In each optimal solution, every individually selected operation of a job is treated as a decision which contains knowledge. Each decision is a function of job characteristics divided into classes using domain knowledge. The scheduler enhances classification strength and captures predictive knowledge regarding the assignment of an operation's position in the sequence. The trained network successfully replicates the performance of GA. The better performance of the scheduler on test problems compared to other methods demonstrates the utility of the method. The scalability of the scheduler on larger problem sets gives satisfactory results. RFMLP-NN thus captures predictive knowledge [44] regarding the assignment of an operation's position in the job sequence. This chapter is organized as follows. In section 6.2, the examination timetable problem is solved through the FILP technique. This is followed by the solution of the university course timetable problem using the FGH algorithm in section 6.3. In section 6.4, the ACO paradigm is used to find a solution for the longest common subsequence problem. Section 6.5 uses RFMLP ANN for studying JSP. Experimental results and comparisons are given in section 6.6. Finally, in section 6.7 conclusions are given.

6.2 FILP for Examination Timetable Problem

ETP is defined as the assignment of courses to be examined and their candidates to time periods and examination rooms while satisfying a set of constraints which may be either hard or soft. Here, the solution of the problem is developed using the FILP technique [46]. The various allocation variables are considered as fuzzy numbers expressing the lack of precision that the decision maker has. Each feasible solution has a fuzzy number obtained by the fuzzy objective function, which is tackled through a fuzzy number ranking method. The work acts as a benchmark for testing heuristic algorithms and also for better reformulations of the problem.

6.2.1 Examination Timetable at Netaji Subhas Open University

Here a mathematical model is developed for the examination timetable at Netaji Subhas Open University, Kolkata using the FILP technique [46]. The examination period is fixed at two weeks, with two examination sessions per day. A week is made up of six consecutive days, i.e. Monday to Saturday. Some examinations have more candidates than a single room can hold and are scheduled in more than one room. A room can have more than one examination scheduled in it if sufficient room space is available. To optimize examination space, lecture halls are also used for examinations. The following assumptions are made: (i) An examination can be scheduled in any of the available rooms; (ii) All examinations are of variable time length duration; (iii) There are no open book examinations; (iv) The assignment of courses to rooms and timeslots is done as per the division of courses in lecture theatres into different groups; (v) The maximum mixing of courses in a room is four due to the four colors arrangement; (vi) Walking distance between examination rooms is irrelevant, as there is an interval of at least one hour between examinations. The problem is divided into hard and soft constraints [46]. The hard constraints considered for this problem are: (i) No candidate can be assigned to more than one examination at the same time; (ii) A room cannot be assigned more candidates than its capacity. The soft constraints for this problem are: (i) Room space wastage is minimized during examinations and each room is utilized as much as possible; (ii) Continuous examinations are discouraged for a candidate in a day; (iii) Each lecturer has at least one gap between invigilation sessions. The hard constraints must be satisfied and these are modeled as constraints of the problem. Soft constraints are to be minimized and these are modeled as the objective function of the problem. Three mathematical formulations of the problem are developed viz., Model 1, Model 2 and Model 3 [46], as discussed below.

6.2.2 Model 1 for Examination Timetable Problem

Let $\tilde{x}_{jkr}$ denote course $j$ assigned to time slot $k$ in room $r$. The time slot $k$ takes care of both time and day. As examinations take two weeks spanning a period of twelve days in a semester, the slots are numbered in increasing order from Monday of the first week to Saturday of the second week. This gives a total of 24 timeslots for the whole examination period. The variable $\tilde{x}_{jkr}$ is a fuzzy number, i.e., $\tilde{x}_{jkr} \in F(R)$, $F(R)$ being the set of real fuzzy numbers. Thus, $\tilde{x}_{jkr}$ has a membership function $\mu : R \to [0,1]$. For each feasible solution, there is a fuzzy number which is obtained by means of the fuzzy objective function. Hence, to solve this problem for the optimal solution as well as the fuzzy value of the objective, a fuzzy number ranking method is considered [46], which provides a different auxiliary conventional optimization model. The hard constraints of Model 1 [46] are briefly enumerated as follows:

(i) Every student is registered for the courses that he will take before the beginning of the semester, i.e.,

$s_{ij} = \begin{cases} 1, & \text{if the } i\text{th student takes the } j\text{th course} \\ 0, & \text{otherwise} \end{cases}$   (1)

(ii) Lecturers are assigned courses from their respective departments. For preparing the examination timetable it is required to have the lecturer identification and the courses assigned to each lecturer, i.e.,

$l_{ij} = \begin{cases} 1, & \text{if the } i\text{th lecturer takes the } j\text{th course} \\ 0, & \text{otherwise} \end{cases}$   (2)

(iii) There are $n$ courses that are scheduled in $m$ rooms and $t$ time slots.

(iv) Examinations are scheduled into rooms whose capacity is known. Let $\tilde{R}_r$ denote the examination capacity of room $r$. The examination capacity of a room is normally less than the teaching capacity so as to disperse candidates. The variable $\tilde{R}_r$ is a fuzzy number, i.e., $\tilde{R}_r \in G(R)$, $G(R)$ being the set of real fuzzy numbers. Thus, $\tilde{R}_r$ behaves identically as $\tilde{x}_{jkr}$.

(v) A student is not assigned to more than one examination session at a time, i.e., a student cannot be assigned to two courses $j_1$ and $j_2$ in the same time slot $k$,

$s_{ij_1} \tilde{x}_{j_1 k r_1} + s_{ij_2} \tilde{x}_{j_2 k r_2} \le 1; \quad \forall r_1, r_2, i, j_1, j_2 \text{ such that } j_1 \ne j_2$   (3)

(vi) An examination is not assigned to a room which has less capacity than the course size, i.e., courses assigned to time slot $k$ in room $r$ must not exceed the examination room capacity, i.e.,

$\sum_j \sum_i s_{ij} \tilde{x}_{jkr} \le \tilde{R}_r; \quad k = 1, \ldots, 24, \; r = 1, \ldots, m$   (4)

Thus, if any of the courses $j = 1, \ldots, n$ are assigned to time slot $k$ at room $r$, then the sum of all students taking courses $j = 1, \ldots, n$ must not exceed the room capacity. The soft constraints of Model 1 [46] are briefly illustrated as follows:

(i) As room space wastage is minimized during examinations, each room $r$ must be utilized as much as possible, i.e.,

$\min z = \sum_r \sum_k \left( \tilde{R}_r - \sum_j \sum_i s_{ij} \tilde{x}_{jkr} \right)$   (5)

(ii) The case where the $i$th student has consecutive examinations $j_1$ and $j_2$ in time slots $k$ and $k+1$ on any day is to be minimized, i.e.,

$\min \sum_i \sum_{j_1} \sum_{j_2} \sum_k \sum_{r_1} \sum_{r_2} \left( s_{ij_1} \tilde{x}_{j_1 k r_1} + s_{ij_2} \tilde{x}_{j_2 (k+1) r_2} \right); \quad \forall i, k, r_1, r_2, j_1, j_2 \text{ such that } j_1 \ne j_2$   (6)

(iii) The case where a lecturer invigilates more than one consecutive examination $j_1$ and $j_2$ in time slots $k$ and $k+1$ is to be minimized, i.e.,

$\min \sum_i \sum_{j_1} \sum_{j_2} \sum_k \sum_{r_1} \sum_{r_2} \left( l_{ij_1} \tilde{x}_{j_1 k r_1} + l_{ij_2} \tilde{x}_{j_2 (k+1) r_2} \right); \quad \forall i, k, r_1, r_2, j_1, j_2 \text{ such that } j_1 \ne j_2$   (7)

(iv) Every teacher has his own availability schedule for invigilation, ensuring which he submits a plan with the desirable time periods that suit him best,

$\min_{\tilde{s}_{jk}} \sum_j \sum_k \tilde{s}_{jk}$   (8)

(v) Every teacher has minimum and maximum limits of invigilation hours, which are 4 and 12 respectively, during the entire examination period,

$\min_{4 \le \tilde{w}_{ijk} \le 12} \sum_i \sum_j \sum_k \tilde{w}_{ijk}$   (9)

(vi) The travel time of teachers and students between rooms within the campus is minimized,

$\min_{\tilde{p}_{ijk}} \sum_i \sum_j \sum_k \tilde{p}_{ijk}$   (10)

Finally, the FILP model [46] is represented as follows:

$\min z = \sum_r \sum_k \left( \tilde{R}_r - \sum_j \sum_i s_{ij} \tilde{x}_{jkr} \right) + \sum_i \sum_{j_1} \sum_{j_2} \sum_k \sum_{r_1} \sum_{r_2} \left( s_{ij_1} \tilde{x}_{j_1 k r_1} + s_{ij_2} \tilde{x}_{j_2 (k+1) r_2} \right) + \sum_i \sum_{j_1} \sum_{j_2} \sum_k \sum_{r_1} \sum_{r_2} \left( l_{ij_1} \tilde{x}_{j_1 k r_1} + l_{ij_2} \tilde{x}_{j_2 (k+1) r_2} \right) + \min_{\tilde{s}_{jk}} \sum_j \sum_k \tilde{s}_{jk} + \min_{4 \le \tilde{w}_{ijk} \le 12} \sum_i \sum_j \sum_k \tilde{w}_{ijk} + \min_{\tilde{p}_{ijk}} \sum_i \sum_j \sum_k \tilde{p}_{ijk}$   (11)

subject to

$s_{ij_1} \tilde{x}_{j_1 k r_1} + s_{ij_2} \tilde{x}_{j_2 k r_2} \le 1; \quad k = 1, \ldots, 24, \; \forall r_1, r_2, i, j_1, j_2 \text{ such that } j_1 \ne j_2$

$\sum_j \sum_i s_{ij} \tilde{x}_{jkr} \le \tilde{R}_r; \quad k = 1, \ldots, 24, \; r = 1, \ldots, m$

Given a timetable with $n$ courses, $m$ rooms and $t$ timeslots, the problem comprises $ntm$ variables. A typical examination timetable at Netaji Subhas Open University involves 250 courses, 50 rooms, 10,000 students and 400 lecturers on a 24 slot time interval. This gives a problem with $ntm = 250 \times 24 \times 50 = 3{,}00{,}000$ variables. The problem is generally intractable in nature. The number of variables is minimized by breaking the variable definition into two separate variables, which gives rise to the following mathematical model.

6.2.3 Model 2 for Examination Timetable Problem

Let $\tilde{x}_{jk}$ denote course $j$ assigned to time slot $k$ and $\tilde{y}_{jr}$ denote course $j$ assigned to room $r$. The variables $\tilde{x}_{jk}$ and $\tilde{y}_{jr}$ are fuzzy numbers, i.e., $\tilde{x}_{jk} \in F_1(R)$ and $\tilde{y}_{jr} \in F_2(R)$, $F_1(R)$ and $F_2(R)$ being sets of real fuzzy numbers. The variables $\tilde{x}_{jk}$ and $\tilde{y}_{jr}$ behave identically as the variables $\tilde{x}_{jkr}$ and $\tilde{R}_r$ used in Model 1. The hard constraints of Model 2 [46] are briefly enumerated as follows:

(i) The $i$th student cannot have more than one of the courses $j_1$ and $j_2$ scheduled in the same time slot $k$, i.e.,

$s_{ij_1} \tilde{x}_{j_1 k} + s_{ij_2} \tilde{x}_{j_2 k} \le 1; \quad \forall i, j_1, j_2 \text{ such that } j_1 \ne j_2$   (12)

(ii) Room $r$ cannot have more students $i$ than its capacity $\tilde{R}_r$, i.e.,

$\sum_j \sum_i s_{ij} \tilde{y}_{jr} \le \tilde{R}_r; \quad r = 1, \ldots, m$   (13)

The soft constraints of Model 2 [46] are briefly illustrated as follows:

(i) The case where lecturer $i$ invigilates consecutive examinations $j_1$ and $j_2$ is avoided; there is at least one gap between $[k, k+1]$ of an examination slot, i.e.,

$\forall i, k, j_1, j_2 \text{ such that } j_1 \ne j_2$   (14)

(ii) The case where student $i$ has two or more consecutive examinations $j_1$ and $j_2$ is to be minimized, i.e.,

$\forall i, k, j_1, j_2 \text{ such that } j_1 \ne j_2$   (15)

(iii) The room space wastage is minimized,

$\tilde{R}_r - \sum_j \sum_i \left( s_{ij} \tilde{y}_{jr} \right); \quad r = 1, \ldots, m$   (16)

Soft constraints (iv), (v) and (vi) are identical to the constraints specified in Model 1. Thus, the FILP model [46] is represented as follows:

$\min z = \sum_r \left( \tilde{R}_r - \sum_j \sum_i s_{ij} \tilde{y}_{jr} \right) + \sum_i \sum_{j_1} \sum_{j_2} \sum_k \left( s_{ij_1} \tilde{x}_{j_1 k} + s_{ij_2} \tilde{x}_{j_2 (k+1)} \right) + \sum_i \sum_{j_1} \sum_{j_2} \sum_k \left( l_{ij_1} \tilde{x}_{j_1 k} + l_{ij_2} \tilde{x}_{j_2 (k+1)} \right) + \min_{\tilde{s}_{jk}} \sum_j \sum_k \tilde{s}_{jk} + \min_{4 \le \tilde{w}_{ijk} \le 12} \sum_i \sum_j \sum_k \tilde{w}_{ijk} + \min_{\tilde{p}_{ijk}} \sum_i \sum_j \sum_k \tilde{p}_{ijk}$   (17)

subject to

$s_{ij_1} \tilde{x}_{j_1 k} + s_{ij_2} \tilde{x}_{j_2 k} \le 1; \quad k = 1, \ldots, 24, \; \forall i, j_1, j_2 \text{ such that } j_1 \ne j_2$

$\sum_j \sum_i s_{ij} \tilde{y}_{jr} \le \tilde{R}_r; \quad r = 1, \ldots, m$

In Model 2, the number of variables is $n(t + m) = 250 \times (24 + 50) = 18{,}500$. This is a considerable reduction from the previous model. However, the number of variables is still too large to solve a real timetable problem. The number of variables can be further reduced if the number of crisp as well as fuzzy variables is minimized.

6.2.4 Model 3 for Examination Timetable Problem

Let $T_c$ denote the time slot in which course $c$ is slotted, $C_i$ denote the room which is assigned to course $i$, $\tilde{G}_{si}$ denote student $s$ taking course $i$ and $\tilde{A}_{li}$ denote lecturer $l$ assigned to course $i$. The variables $\tilde{G}_{si}$ and $\tilde{A}_{li}$ are fuzzy numbers, i.e., $\tilde{G}_{si} \in F_1(R)$ and $\tilde{A}_{li} \in F_2(R)$, $F_1(R)$ and $F_2(R)$ being sets of real fuzzy numbers. The capacity of room $i$ is denoted by $\tilde{R}_i$. The variables $\tilde{G}_{si}$, $\tilde{A}_{li}$ and $\tilde{R}_i$ behave identically as the variables $\tilde{x}_{jkr}$ and $\tilde{R}_r$ used in Model 1 and $\tilde{x}_{jk}$ and $\tilde{y}_{jr}$ in Model 2. The hard constraints of Model 3 [46] are briefly enumerated as follows:

(i) The courses $i$ and $j$ registered by the same student $s$ cannot be scheduled in the same slot, as this may result in student collision, i.e.,

$T_i \tilde{G}_{si} \ne T_j \tilde{G}_{sj}; \quad \forall s, i, j \text{ such that } i \ne j$   (18)

(ii) The count of all students $s$ registered for courses $i$ which have been slotted in room $r$ must not exceed the capacity of the room, i.e.,

$\sum_{i \mid C_i = r} \sum_s \tilde{G}_{si} \le \tilde{R}_r, \quad \forall r$   (19)

Since $C_i$ is a variable, it is to be removed from the bounds, which is done by introducing a 0/1 variable $\delta_{ir}$, such that

$\delta_{ir} = \begin{cases} 1, & C_i = r \\ 0, & \text{otherwise} \end{cases}$   (20)

Thus, the above expression can be rewritten as:

$\sum_i \delta_{ir} \sum_s \tilde{G}_{si} \le \tilde{R}_r, \quad \forall r$   (21)

(iii) To enforce the 0/1 variable, the following constraint is introduced,

$C_i \, \delta_{ir} = r \, \delta_{ir}, \quad \forall i, r$   (22)

(iv) Given a course slotted in room $r$, i.e. $C_r$, where room $r$ has no standby generator, the time in which this course is slotted cannot be a multiple of 3 (i.e. it cannot be held in evening hours), i.e.,

$T_{C_r} \ne p, \quad p \in E, \; r \in G$   (23)

The soft constraints of Model 3 [46] are briefly illustrated as follows:

(i) The room space wastage is minimized,

$\sum_r \left( \tilde{R}_r - \sum_{i \mid C_i = r} \sum_s \tilde{G}_{si} \right)$   (24)

(ii) To avoid consecutive examinations for a student, the gap between two examinations is taken to be at least 2, i.e.,

$| T_i \tilde{G}_{si} - T_j \tilde{G}_{sj} | \ge 2; \quad \forall s, i, j \text{ such that } i \ne j$   (25)

The above expression can also be rewritten as,

$\min \left( 2 - ( T_i \tilde{G}_{si} - T_j \tilde{G}_{sj} ) \right), \; \forall s, i, j \text{ such that } i \ne j$, or $\min \sum_s \sum_{i \ne j} \left( 2 - ( T_i \tilde{G}_{si} - T_j \tilde{G}_{sj} ) \right)$   (26)

Soft constraints (iii), (iv) and (v) are identical to soft constraints (iv), (v) and (vi) in Model 1. Thus, the FILP model [46] is represented as follows:

$\min z = \sum_r \left( \tilde{R}_r - \sum_i \delta_{ir} \sum_s \tilde{G}_{si} \right) + \sum_s \sum_{i \ne j} \left( 2 - ( T_i \tilde{G}_{si} - T_j \tilde{G}_{sj} ) \right) + \min_{\tilde{s}_{jk}} \sum_j \sum_k \tilde{s}_{jk} + \min_{4 \le \tilde{w}_{ijk} \le 12} \sum_i \sum_j \sum_k \tilde{w}_{ijk} + \min_{\tilde{p}_{ijk}} \sum_i \sum_j \sum_k \tilde{p}_{ijk}$   (27)

subject to

$T_i \tilde{G}_{si} - T_j \tilde{G}_{sj} \ne 0; \quad \forall s, i, j \text{ such that } i \ne j$

$\sum_i \delta_{ir} \sum_s \tilde{G}_{si} \le \tilde{R}_r, \quad \forall r$

$C_i \, \delta_{ir} = r \, \delta_{ir}, \quad \forall i, r$

$T_{C_r} \ne p, \quad p \in E, \; r \in G$

$\delta \in \{0,1\}, \quad T, C \in I$

In this model, there are three types of variables viz., $T$, $C$ and $\delta$, giving a total of $t + n + nm = 28 + 200 + (200 \times 50) = 10{,}228$ variables. This is a significant reduction in the number of variables compared with the previous two models. However, the number of variables is still too large for the problem to be solved exactly. This is obvious, since the problem is NP-hard. Here, small instances of test problems can be solved to optimality and can act as benchmarks for testing the performance of heuristic procedures.
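To make the structure of these formulations concrete, the following is a minimal sketch of a tiny crisp (defuzzified) analogue of Model 2's hard constraints in PuLP: binary variables assign courses to slots and rooms under the student-collision and room-capacity constraints. The data, variable names and the feasibility-only objective are illustrative simplifications; the FILP models above instead treat the allocation variables as fuzzy numbers and rank fuzzy objective values.

```python
import pulp

# Toy data (illustrative): 3 courses, 2 slots, 2 rooms
courses, slots, rooms = range(3), range(2), range(2)
enrolled = {0: 30, 1: 20, 2: 25}   # students per course
cap = {0: 60, 1: 30}               # room capacities
clash = [(0, 1)]                   # course pairs sharing at least one student

prob = pulp.LpProblem("exam_timetable", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (courses, slots), cat="Binary")  # course -> slot
y = pulp.LpVariable.dicts("y", (courses, rooms), cat="Binary")  # course -> room

prob += pulp.lpSum([])  # feasibility version; the full model adds soft penalties

for j in courses:  # each course gets exactly one slot and one room
    prob += pulp.lpSum(x[j][k] for k in slots) == 1
    prob += pulp.lpSum(y[j][r] for r in rooms) == 1

for j1, j2 in clash:  # crisp analogue of (12): no student collision in a slot
    for k in slots:
        prob += x[j1][k] + x[j2][k] <= 1

for r in rooms:  # crisp analogue of (13): room capacity
    prob += pulp.lpSum(enrolled[j] * y[j][r] for j in courses) <= cap[r]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.LpStatus[prob.status])
for j in courses:
    slot = next(k for k in slots if x[j][k].value() > 0.5)
    room = next(r for r in rooms if y[j][r].value() > 0.5)
    print(f"course {j}: slot {slot}, room {room}")
```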

6.3 FGH Algorithm for University Course Timetable Problem

UCTP is another NP-hard combinatorial optimization problem which lacks analytical solution methods. Several heuristic algorithms based on search and evolutionary procedures [74] have been proposed. Here, the FGH algorithm is used to solve the problem. The method incorporates GA using an indirect representation based on event priorities, a micro GA and heuristic local search operators to tackle a real world timetable problem from St. Xavier's College, India [43]. Fuzzy sets model the measure of violation of soft constraints in the fitness function to take care of the inherent uncertainty and vagueness involved in real life data. The solutions are developed with respect to a specified benchmark problem.

6.3.1 University Course Timetable Problem

UCTP consists in finding the exact time allocation, within a limited time period, of a number of events (courses-lectures) and assigning to them a number of resources (teachers, students and classrooms) in such a way that a number of constraints are satisfied. In most universities, courses are organized in a number of semesters. The constraints to be satisfied by a timetable are usually divided into two categories viz. hard and soft constraints [43]. Hard constraints should be rigidly fulfilled. Such constraints include: (i) No resource (teachers, students and classrooms) may be assigned to different events at the same time; (ii) Events of the same semester must not be assigned to the same time slot (in order for students of the semester to be able to attend all semester courses); (iii) Assigned resources to an event must belong to the set of valid resources for that event, etc. On the other hand, it is desirable to fulfill soft constraints to the possible extent, but this is not fully essential for a valid solution. Therefore, soft constraints can also be seen as optimization objectives for the search algorithm. Such constraints are: (i) Schedule an event within a particular window of the whole period (such as during evenings); (ii) Minimize time gaps or travel times between adjacent lectures of the same teacher, etc. The problem considered for this work is taken from St. Xavier's College, Kolkata, India and involves the weekly scheduling of all courses of the Department of Computer Science of the College. The problem specifications are given in table 6.1. The hard and soft constraints considered for this problem are given in tables 6.2 and 6.3 respectively.

Serial Number   Parameter Description                     Quantity
1               Number of Courses                         90
2               Number of different Lectures              200
3               Number of scheduled Events                210
4               Number of Semesters                       11
5               Type of Lectures (Theory/Laboratory)      2
6               Number of Teachers                        50
7               Number of Classrooms/Laboratories         19
8               Number of Days                            5
9               Number of Periods within a Day            10

Table 6.1: Timetable Problem specifications

In table 6.1, the value 10 for the field time-periods within a day denotes the possible starting periods of each class (from 8:00 am to 6:00 pm) and not complete time slots that can accommodate an equal number of consequent classes. As different lectures have different durations (1 to 2 hours), the real number of consequent classes that can be scheduled within a day depends on the specific set of classes chosen and their durations. Any solution satisfying the above constraints is a feasible schedule for the problem. The specific case is considered as a benchmark for the following reasons: (i) The real constraints were easily accessed for developing a manual solution to the problem, in order to set up the university course timetable problem on a realistic basis; (ii) There was easy access to manual solutions for the problem, which facilitates easy comparisons with the present results; (iii) The specific problem is in general NP-hard, and serves as a demanding benchmark for developing an efficient optimization algorithm.

Serial Number   Hard Constraint
1               No resource (teacher, student or classroom) is assigned to different events at the same time
2               Events of the same semester are not assigned to the same time slot when both events are of type theory or when one event is theory and the other event is laboratory. Same semester events run concurrently only if they are both of type laboratory, as for each course 4 laboratory classes are scheduled within the week, each attended by a different group of students
3               The maximum number of time periods per day should not exceed a particular value (10)
4               Each lecture is held in a classroom belonging to the specific set of valid rooms for the lecture
5               Each classroom has its own availability schedule
6               Each lecture is assigned to a teacher that belongs to the specific set of teachers that can deliver the lecture
7               Specific lectures must be rigidly assigned to specific teachers
8               Theory classes need one teacher while Laboratory classes need two teachers

Table 6.2: Hard Constraint specifications

Serial Number   Soft Constraint
1               Every teacher has his own availability schedule, ensuring which he submits a plan with desirable time periods that suit him best
2               Every teacher has minimum and maximum limits of weekly work-hours
3               If a class is broken into more than one noncontiguous lecture within a week, a specific number of days must be left between these lectures
4               The travel time of teachers and students between the classrooms within the campus is to be minimized
5               The time gaps within the schedule of each teacher are to be minimized
6               The time gaps within the schedule of each classroom are to be minimized

Table 6.3: Soft Constraint specifications

There are certain difficulties involved in the chosen problem case, which are justified by the following facts [43]: (i) the problem has two types of lectures viz. theory and laboratory with diverse characteristics and constraints; (ii) the number of classrooms in the College that accommodate all taught lessons is quite small (viz. only 19), a fact which makes the timetable schedule very tight. Some of the classrooms are laboratories designed for laboratory classes, and others are theory classrooms. All laboratories are occupied by classes for the full number of periods per day and all five days, with only minor time-gaps; (iii) Specific classes are taught in specific classrooms. Theory classes may be assigned to any of the lecture classrooms, but laboratory classes must be assigned to specific laboratory classrooms; (iv) There are quite a large number of teachers, each of whom has his own minimum and maximum hour limits per week and the ability to teach in a limited set of classes.

6.3.2 Uncertainty measures in University Course Timetable Problem

Here the different uncertainty measures in formulating UCTP are discussed. The uncertainty measures are associated with the soft constraints of the problem [43]. Fuzzy sets are used to model the uncertainty and vagueness associated with the soft constraints in the final timetable schedule by allowing grades of membership in a set. The model allows the decision maker to express his preference for the ultimate schedule such that the related measure of violation is appropriately represented. Among the soft constraints, the best availability schedule of each teacher, the maximum and minimum workload of each teacher, as well as the classes broken into more than one non-contiguous lecture within a week, where a specific number of days is left between lectures, are uncertain due to both human and environmental factors. In addition, the travel time of teachers and students between rooms within the campus, the time gaps within the schedule of each teacher and the time gaps within the schedule of each room, which have to be minimized, have an inherent degree of uncertainty and impreciseness associated with them. These constraints are represented using Fuzzy sets [155], [159]. The estimation of time elapsed with respect to soft constraints 4, 5 and 6 is obtained by taking into consideration the nature of the teacher, the student and the location of rooms. While some people walk faster, others may walk slowly, as a result of which elapsed times are basically dependent on the walking speed of different people. Uncertain elapsed times $\tilde{p}_{ij}$ are modeled by using triangular membership functions represented by the triplet $(p_{ij}^1, p_{ij}^2, p_{ij}^3)$, where $p_{ij}^1$ and $p_{ij}^3$ are the lower and upper bounds of the elapsed time while $p_{ij}^2$ is the modal point [43], as represented in figure 6.1.

Figure 6.1: Fuzzy representation of elapsed times
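A triangular membership function of this kind is straightforward to evaluate. The sketch below, with illustrative parameter values, returns the membership grade of a crisp elapsed time against the triplet $(p^1, p^2, p^3)$.

```python
def triangular(x, p1, p2, p3):
    """Membership grade of x under a triangular fuzzy number (p1, p2, p3):
    0 outside [p1, p3], rising linearly to 1 at the modal point p2."""
    if x <= p1 or x >= p3:
        return 0.0
    if x <= p2:
        return (x - p1) / (p2 - p1)
    return (p3 - x) / (p3 - p2)

# Elapsed walking time between two rooms: between 2 and 8 minutes,
# most plausibly 4 (illustrative values only)
for t in (2, 3, 4, 6, 8):
    print(t, "min ->", round(triangular(t, 2, 4, 8), 2))
```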

Figure 6.2: Fuzzy representation of schedule of teacher

The second soft constraint, i.e. the weekly work-hours of each teacher, can similarly be represented by triangular membership functions represented by the triplet $(w_{ij}^1, w_{ij}^2, w_{ij}^3)$, where $w_{ij}^1$ and $w_{ij}^3$ are the minimum and maximum bounds of the weekly work-hours of each teacher while $w_{ij}^2$ is the modal point. Soft constraints 3 and 1 are represented using LR trapezoidal membership functions. The specific number of days that must be left between non-contiguous lectures within a week is given by the days elapsed $\tilde{d}_j$ and represented by the doublet $(d_j^1, d_j^2)$, where $d_j^1$ and $d_j^2$ denote the left and right ends of the trapezoid, as depicted in figure 6.2. The schedule of each teacher with respect to his own availability and the desirable time periods that suit him best is given by the schedule $\tilde{s}_j$ and represented by the doublet $(s_j^1, s_j^2)$, where $s_j^1$ and $s_j^2$ denote the left and right ends of the trapezoid.

6.3.3 FGH Algorithm for University Course Timetable Problem

This section presents the FGA for UCTP. To solve the timetable problem, an optimization method based on FGA is developed that incorporates a number of techniques and domain specific local search operators. GA is an iterative search procedure widely used in solving optimization problems, motivated by biological models of evolution [74]. In each iteration, a population of candidate solutions is maintained. Genetic operators such as mutation and crossover are applied to evolve the solutions and find good solutions that have a high probability to survive to the next iteration. First, a representation method is required to encode a timetable solution into an encoded form or chromosome suitable for applying the genetic operators. Generally, two different approaches are considered viz. direct and indirect approaches. A direct representation [74] directly encodes all event attributes viz. day, time slot, teacher, classroom etc. for all events. In these cases the GA has to decide all timetable parameters and deliver a complete and constraint free schedule. This results in a very large search space of solutions satisfying all constraints. However, directly encoded solutions that undergo genetic operators frequently result in invalid solutions that have to be handled in some manner. An indirect representation [74], on the other hand, considers an encoded solution, i.e. chromosome, that usually represents an ordered list of events, which are placed into the timetable according to some predefined method (timetable builder). The timetable builder [43] can use any combination of heuristics and local search to place events into the timetable, while observing the constraints of the problem. For the GA implementation of this work, an indirect representation that encodes four fields for each event into the chromosome is considered: (i) day to allocate the event; (ii) teachers (1 or 2) to assign to the event; (iii) classroom where the event will be held; (iv) priority to allocate the event within the day. All fields are first encoded as integers and then entered into the chromosome as binary numbers. When the GA produces such a solution, it first decodes it to obtain these four fields for every event in the schedule. Then it invokes the timetable builder routine, which works as follows (see the sketch after this list): (i) It separates events into clusters, one for each day; (ii) For every cluster, it sorts events according to their priority values in ascending order (small values have high priority and are placed first); (iii) It takes the first event in the cluster (the event with highest priority), marks it as taken, and places it into the schedule of the particular day; (iv) Starting from time slot 1, it places the event and checks if any constraints are violated; if none are violated, the allocation is fixed and the algorithm moves on to the next event in the cluster; (v) If any constraints are violated, the event is allocated to subsequent time periods until all constraints are satisfied; (vi) If there exists no time period for which all constraints are satisfied, the event is marked as violating the maximum time periods exceeded per day constraint (constraint 3 of table 6.2); (vii) The algorithm continues with the next event in the list. When all events have been processed, the timetable builder moves to the next cluster (day), and this is repeated for all days in the schedule.
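A minimal sketch of such a timetable builder is given below. It assumes each decoded event is a dict carrying the genetically determined `day` and `priority` fields together with a `violates(schedule, slot)` predicate that stands in for the hard-constraint checks; the structure and names are illustrative only.

```python
def build_timetable(events, n_days=5, n_slots=10):
    """Greedy timetable builder over genetically decoded events.

    events: list of dicts with 'day' and 'priority' fields (decoded
    from the chromosome) plus a 'violates(schedule, slot)' predicate
    standing in for the constraint checks. Illustrative sketch only.
    """
    schedule = {d: {} for d in range(n_days)}   # day -> slot -> event
    unplaced = []
    for day in range(n_days):
        # steps (i)-(ii): cluster events by day, sort by priority (ascending)
        cluster = sorted((e for e in events if e["day"] == day),
                         key=lambda e: e["priority"])
        for event in cluster:                   # step (iii): best priority first
            for slot in range(n_slots):         # steps (iv)-(v): first feasible slot
                if slot not in schedule[day] and not event["violates"](schedule, slot):
                    schedule[day][slot] = event
                    break
            else:                               # step (vi): no feasible period
                unplaced.append(event)          # flag: max periods per day exceeded
    return schedule, unplaced
```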

In this work, the allocation priority of events is determined genetically. The timetabler manages to satisfy the hard 1st, 2nd, 5th, 7th and 8th constraints of table 6.2, while all other constraints are satisfied by the GA.

6.3.4 Formulation of fitness function

Now the formulation of the fitness function for the FGA is discussed. After the timetabler has produced a timetable, it is evaluated through a fitness function that analyzes the solution and calculates its overall fitness value as a sum of weighted scores and penalties for all constraints, i.e., hard as well as soft. The fitness function used here is as follows [43]:

$$F(x) = \sum_{i=1}^{6} w_i^s \cdot P_i^{soft}(x) + \sum_{i=1}^{8} w_i^h \cdot P_i^{hard}(x) \tag{28}$$

where $x$ is the timetable under evaluation, $P_i^{soft}(x)$ is the measure of violation of the $i$th soft constraint, $P_i^{hard}(x)$ is the measure of violation of the $i$th hard constraint, $w_i^s$ is the weight factor for the $i$th soft constraint and $w_i^h$ is the weight factor for the $i$th hard constraint. The weights $w_i^s, w_i^h \in [0, 1]$ are normalized and chosen randomly in the GA; the weights are changed in every iteration in order to explore different areas of the search space. The function $F(x)$ is to be minimized. The impreciseness and uncertainty related to the measure of violation of soft constraints are taken care of using fuzzy logic, whereas those related to the measure of violation of hard constraints are handled using probabilistic measures, as hard constraints are defined rigidly. The nature of the fitness function depends on both measures of violation, $P_i^{soft}(x)$ and $P_i^{hard}(x)$, which assess the ultimate quality of the different allocations within the population.

The first measure of violation for soft constraints is calculated by taking into consideration the estimation of time elapsed with respect to different resources and events. Fuzzy elapsed times between resources and events imply fuzzy completion times. The question arises how to compare fuzzy completion times with fuzzy elapsed times between resources and events; this is investigated here based on the possibility measure [43]. The possibility measure $\pi_{\tilde{V}_j}(\tilde{p}_{ij})$ evaluates the possibility of the fuzzy event $\tilde{V}_j$ occurring within the fuzzy set $\tilde{p}_{ij}$. It is used to compute the measure of violation of time elapsed with respect to different resources and events such that fuzzy completion times are minimized. Since the uncertainty in the elapsed times $\tilde{p}_{ij}$ is modeled by the triplet $(p_{ij}^1, p_{ij}^2, p_{ij}^3)$, the possibility measure is considered as being composed of two fuzzy events $\tilde{V}_{1j}$ and $\tilde{V}_{2j}$; the event $\tilde{V}_{1j}$ is considered for the pair $(p_{ij}^1, p_{ij}^2)$ and the event $\tilde{V}_{2j}$ for the pair $(p_{ij}^2, p_{ij}^3)$ of elapsed times. Thus, the measure of violation is given by [43]

$$P_{time\text{-}elapsed}^{soft}(x) = \pi_{\tilde{V}_j}(\tilde{p}_{ij}) = 1 - \sup\min\{\sup\min\{\mu_{\tilde{V}_{1j}}(x), \mu_{\tilde{p}_{ij}}(x)\}, \sup\min\{\mu_{\tilde{V}_{2j}}(x), \mu_{\tilde{p}_{ij}}(x)\}\}; \quad j = 1, \ldots, n \tag{29}$$

where $\mu_{\tilde{V}_j}(x)$, $\mu_{\tilde{V}_{1j}}(x)$, $\mu_{\tilde{V}_{2j}}(x)$ and $\mu_{\tilde{p}_{ij}}(x)$ are the membership functions of the fuzzy sets $\tilde{V}_j$, $\tilde{V}_{1j}$, $\tilde{V}_{2j}$ and $\tilde{p}_{ij}$ respectively.

The second measure of violation for soft constraints is calculated by taking into consideration the weekly work-hours of each teacher with respect to the minimum and maximum bounds on the weekly work-hours of each teacher. Fuzzy weekly work-hours of each teacher lead to the computation of a fuzzy maximum and minimum of weekly work-hours. The comparison of the fuzzy weekly work-hours with the fuzzy maximum and minimum of weekly work-hours of each teacher is performed using the above possibility measure [43]. The possibility measure $\pi_{\tilde{P}_j}(\tilde{w}_{ij})$ evaluates the possibility of the fuzzy event $\tilde{P}_j$ occurring within the fuzzy set $\tilde{w}_{ij}$. It is used to compute the measure of violation of the weekly work-hours of each teacher with respect to the minimum and maximum bounds. As the uncertainty involved in the weekly work-hours $\tilde{w}_j$ is modeled by the triplet $(w_{ij}^1, w_{ij}^2, w_{ij}^3)$, the possibility measure is considered as being composed of two fuzzy events $\tilde{P}_{1j}$ and $\tilde{P}_{2j}$; the event $\tilde{P}_{1j}$ is considered for the pair $(w_{ij}^1, w_{ij}^2)$ and the event $\tilde{P}_{2j}$ for the pair $(w_{ij}^2, w_{ij}^3)$ of weekly work-hours. Thus, the measure of violation is given by [43]

$$P_{work\text{-}hours}^{soft}(x) = \pi_{\tilde{P}_j}(\tilde{w}_{ij}) = 1 - \sup\min\{\sup\min\{\mu_{\tilde{P}_{1j}}(x), \mu_{\tilde{w}_{ij}}(x)\}, \sup\min\{\mu_{\tilde{P}_{2j}}(x), \mu_{\tilde{w}_{ij}}(x)\}\}; \quad j = 1, \ldots, n \tag{30}$$

where $\mu_{\tilde{P}_j}(x)$, $\mu_{\tilde{P}_{1j}}(x)$, $\mu_{\tilde{P}_{2j}}(x)$ and $\mu_{\tilde{w}_{ij}}(x)$ are the membership functions of the fuzzy sets $\tilde{P}_j$, $\tilde{P}_{1j}$, $\tilde{P}_{2j}$ and $\tilde{w}_{ij}$ respectively.

The third measure of violation for soft constraints is calculated by considering the schedule of each teacher with respect to his own availability and the desirable time periods that suit him best. The fuzzy availability of each teacher gives the fuzzy desirable time periods. The comparison of fuzzy availability with fuzzy desirable time periods is performed using the possibility measure [43]. The possibility measure $\pi_{\tilde{E}_j}(\tilde{s}_j)$ evaluates the possibility of the fuzzy event $\tilde{E}_j$ occurring within the fuzzy set $\tilde{s}_j$. It is used to compute the measure of violation of each teacher's schedule with respect to his own availability and desirable time periods, and is given by

$$P_{schedule}^{soft}(x) = \pi_{\tilde{E}_j}(\tilde{s}_j) = 1 - \sup\min\{\mu_{\tilde{E}_j}(x), \mu_{\tilde{s}_j}(x)\}; \quad j = 1, \ldots, n \tag{31}$$

where $\mu_{\tilde{E}_j}(x)$ and $\mu_{\tilde{s}_j}(x)$ are the membership functions of the fuzzy sets $\tilde{E}_j$ and $\tilde{s}_j$ respectively.

The fourth measure of violation for soft constraints is calculated by considering the specific number of days left between non-contiguous lectures within a week and the corresponding upper and lower bounds. The fuzzy number of days left between non-contiguous lectures within a week results in the computation of fuzzy upper and lower bounds. The comparison between these two aspects is performed using the possibility measure [43]. The possibility measure $\pi_{\tilde{M}_j}(\tilde{d}_j)$ evaluates the possibility of the fuzzy event $\tilde{M}_j$ occurring within the fuzzy set $\tilde{d}_j$. It is used to compute the measure of violation of the number of days left between non-contiguous lectures within a week, and is given by [43]

$$P_{days\text{-}left}^{soft}(x) = \pi_{\tilde{M}_j}(\tilde{d}_j) = 1 - \sup\min\{\mu_{\tilde{M}_j}(x), \mu_{\tilde{d}_j}(x)\}; \quad j = 1, \ldots, n \tag{32}$$

where $\mu_{\tilde{M}_j}(x)$ and $\mu_{\tilde{d}_j}(x)$ are the membership functions of the fuzzy sets $\tilde{M}_j$ and $\tilde{d}_j$ respectively.
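To make the possibility computation of equations (29)-(32) concrete, here is a small Python sketch of my own; the triangular membership functions and the discretized grid are illustrative assumptions, not code from the thesis:

```python
import numpy as np

def triangular(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b
    (a natural reading of the triplet representation used above)."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def violation(mu_event, mu_set, grid):
    """Measure of violation 1 - sup_x min(mu_event(x), mu_set(x)),
    approximated on a discrete grid, cf. equations (29)-(32)."""
    possibility = np.max(np.minimum(mu_event(grid), mu_set(grid)))
    return 1.0 - possibility

grid = np.linspace(0.0, 10.0, 1001)
mu_E = lambda x: triangular(x, 2.0, 4.0, 6.0)   # fuzzy event, e.g. desirable periods
mu_s = lambda x: triangular(x, 5.0, 7.0, 9.0)   # fuzzy set, e.g. teacher's schedule
print(violation(mu_E, mu_s, grid))              # ~0.75: the two sets barely overlap
```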

From the above discussion it is clear that some of the problem's constraints are handled by the timetabler during construction of the complete solution from the genetically produced abstract solution. The rest of the constraints are handled using a penalty function that is composed as a weighted sum of penalty terms, each of which corresponds to the measure of violation of one constraint. Moreover, the soft constraints can also be seen as optimization objectives that have to be optimized to the extent possible.

Serial Number   Genetic Operator                            Parameter
1 (a)           Crossover                                   40-point
1 (b)           Crossover                                   Uniform
2 (a)           Mutation                                    Probability = 0.007
2 (b)           Mutation                                    Probability = 0.02
3 (a)           Window Mutation Operator                    Probability = 0.1
3 (b)           Window Mutation Operator                    Probability = 0.4
4 (a)           Swap Chromosome Operator                    Probability = 0.1
4 (b)           Swap Chromosome Operator                    Probability = 0.4
5 (a)           Swap Bit Operator                           Probability = 0.1
5 (b)           Swap Bit Operator                           Probability = 0.4
6 (a)           Swap Window Operator                        Probability = 0.1
6 (b)           Swap Window Operator                        Probability = 0.4
7 (a)           Random Genotype Operator                    Probability = 0.1
7 (b)           Random Genotype Operator                    Probability = 0.4
8 (a)           Mutate Chromosome Operator                  Probability = 0.1
8 (b)           Mutate Chromosome Operator                  Probability = 0.4
9               Bit Swap Mutate Hill Climbing Operator      Probability = 0.5
10              Window Swap Hill Climbing Operator          Probability = 0.5
11 (a)          Varying Fitness Function                    Linear
11 (b)          Varying Fitness Function                    Square
11 (c)          Varying Fitness Function                    Exponential
12 (a)          Genetic Algorithm Population                200
12 (b)          Genetic Algorithm Population                400
13              Micro Genetic Algorithm Combinatorial       Probability = 1.0
                Hill Climbing Operator

Table 6.4: Standard genetic operators and parameters considered

6.3.5 Genetic Operators

The next issue to consider is the blend of genetic operators incorporated into the GA in order to achieve maximum optimization performance. To do this, standard operators are considered first, as well as general-purpose combinatorial operators. The operators and their parameters considered are shown in table 6.4. The operators 3 (a) through 10 are discussed in [43]. The Varying Fitness Function technique and operator 13 are given in [43]. The standard GA setup employed Roulette Wheel parent selection, a population of 50 solutions, the standard 5-point Crossover operator and the Bit Mutation operator (probability = 0.001 per bit), with elitism. The offspring replaced the whole population of parents, with fitness scaling and a generation limit of 7000 generations. The operators and their parameters of table 6.4 were tested in order before adopting them in the final algorithm. Due to the specific nature and intractability of the problem, domain-specific hill climbing operators are also considered; these are applied only to the best solution of each generation (a sketch of one such operator follows this list):

(i) Change Day Hill Climbing Operator: this operator selects an event at random (1st and 2nd constraints of table 6.2) and changes its encoded day-of-allocation field, assigning to it all day values sequentially, except the original day value. Every time the resulting timetable is evaluated; if it scores better than the original, the change is kept, otherwise the old day value is restored.

(ii) Fix Teacher Hill Climbing Operator: this operator finds all events with teacher-class constraint violations (6th constraint of table 6.2) and selects one such event at random. Then it changes the encoded teacher-to-allocate field, assigning to it all valid teachers sequentially, except the original one. Every time the resulting timetable is evaluated; if it evaluates better than the original, the change is kept, otherwise the old teacher value is restored. This operator is also successfully applied to the 7th and 8th constraints of table 6.2.

(iii) Fix Classroom Hill Climbing Operator: this operator considers events with classroom availability constraint violations (5th constraint of table 6.2) and selects one event at random. Then it changes the encoded classroom-to-allocate field, assigning to it all valid classrooms sequentially, except the original one. Every time the resulting timetable is evaluated; if it evaluates better than the original, the change is kept, otherwise the old classroom value is restored.

(iv) Fix Room Hill Climbing Operator: this operator finds all events with classroom lecture constraint violations (4th constraint of table 6.2) and selects one such event at random. Then it changes the encoded room-to-allocate field, assigning to it all valid classrooms sequentially, except the original one. Every time the resulting timetable is evaluated; if it evaluates better than the original, the change is kept, otherwise the old room value is restored.

(v) Fix Day Hill Climbing Operator: this operator finds all events that are allocated beyond the maximum-time-periods-per-day limit (3rd constraint of table 6.2) and selects one such event at random. Then it changes the encoded day-of-allocation field, assigning to it all day values sequentially, except the original one. Every time the resulting timetable is evaluated; if it evaluates better than the original, the change is kept, otherwise the old day value is restored.

The four Fix operators above are specifically designed to give the GA the ability to fulfill the 3rd, 4th and 6th hard constraints of table 6.2 that are not satisfied automatically by the timetabler. The effectiveness of these operators is illustrated through the simulation results. In the FGH algorithm, event priority is encoded so that small values denote high priority and are placed first. The algorithm does not swap events evenly between clusters, since not all constraints are always satisfied and violations may exist between clusters. If an event cannot be scheduled, the timetabler will generate inconsistent results.
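As a sketch of the hill climbing pattern shared by these operators, the fragment below implements the Change Day Hill Climbing Operator in Python; the `events`, `fitness` and `rebuild` names are assumptions for illustration, not the thesis code:

```python
import random

def change_day_hill_climb(solution, num_days, fitness, rebuild):
    """Try every alternative day for one randomly chosen event;
    keep a change only if the rebuilt timetable scores better
    (lower fitness, since F(x) of equation (28) is minimized)."""
    event = random.choice(solution.events)
    best_day = event.day
    best_score = fitness(rebuild(solution))
    for day in range(num_days):
        if day == best_day:
            continue
        event.day = day                       # tentative move
        score = fitness(rebuild(solution))
        if score < best_score:                # improving move: keep it
            best_score, best_day = score, day
    event.day = best_day                      # restore best (or original) day
    return solution
```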

6.4 ACO technique for LCS Problem

In this section, a solution of the LCS problem is obtained using ACO [31]. LCS can be directly applied to the job sequencing problem [94], where there are $n$ jobs, each of which has to be processed one at a time on each of $m$ machines, and a sequence is sought which minimizes the total elapsed time between the start of the jobs on the first machine and the completion of the jobs on the last machine. Again, if the $n$ jobs are processed on each of $k$ machines, another sequence is obtained which optimizes the total elapsed time. Thus, there exists an alternate sequence for processing the $n$ jobs, and corresponding to the two sequences an LCS is obtained. The solution is developed using the natural behavior of ants while searching for their food. In general, the ACO approach [58] attempts to solve an optimization problem by repeating two steps: (i) candidate solutions are constructed using a pheromone model, i.e., a parameterized probability distribution over the solution space; (ii) the candidate solutions are used to modify the pheromone values in a way that is deemed to bias future sampling toward high quality solutions. The solution obtained for LCS has high computational efficiency.

6.4.1 Longest Common Subsequence

Given an alphabet $\Sigma$, an element of $\Sigma^*$ is called a sequence or string. A subsequence of a sequence is the given sequence with some elements (possibly none) left out. Formally, given a sequence $X = \langle x_1, \ldots, x_m \rangle$, another sequence $Z = \langle z_1, \ldots, z_k \rangle$ is a subsequence of $X$ if there is a strictly increasing sequence $\langle i_1, \ldots, i_k \rangle$ of indices of $X$ such that $x_{i_j} = z_j$ for all $j = 1, \ldots, k$ [53]. Further, given two sequences $X$ and $Y$, a sequence $Z$ is a common subsequence of $X$ and $Y$ if $Z$ is a subsequence of both $X$ and $Y$. For example, if $X = \langle A, B, C, B, D, A, B \rangle$ and $Y = \langle B, D, C, A, B, A \rangle$, the sequence $\langle B, C, A \rangle$ is a common subsequence of both $X$ and $Y$. The sequence $\langle B, C, A \rangle$ is not an LCS of $X$ and $Y$, since it has length 3 while the sequence $\langle B, C, B, A \rangle$, which is also common to both $X$ and $Y$, has length 4. The sequence $\langle B, C, B, A \rangle$ is an LCS of $X$ and $Y$, as is the sequence $\langle B, D, A, B \rangle$, since there is no common subsequence of length 5 or greater. In the LCS problem, given two sequences $X = \langle x_1, \ldots, x_m \rangle$ and $Y = \langle y_1, \ldots, y_n \rangle$, the objective is thus to find a maximum length common subsequence of $X$ and $Y$, provided the elements are arranged in an increasing sequence. The LCS problem has an optimal-substructure property [31]: the subproblems correspond to pairs of prefixes of the two input sequences. Given a sequence $X = \langle x_1, \ldots, x_m \rangle$, the $i$th prefix of $X$, for $i = 0, \ldots, m$, is defined as $X_i = \langle x_1, \ldots, x_i \rangle$. The LCS problem can easily be represented recursively, and the recursive solution to the problem has the overlapping-subproblems property (see the sketch after the following definition).

6.4.2 Ant Colony Optimization

Following Papadimitriou and Steiglitz [31], a combinatorial optimization problem is a pair $P = (S, f)$ in which the given finite set of solutions $S$ is called the search space and an objective function $f : S \to \mathbb{R}^+$ assigns a positive cost value to each solution. The search process is stochastic in nature, and the goal is to find a solution of minimum cost in a reasonable amount of time. ACO algorithms belong to the class of metaheuristics. The central component of an ACO algorithm is a parameterized probabilistic model called the pheromone model [58]. The pheromone model consists of a vector of model parameters $T$ called pheromone trail parameters. The pheromone trail parameters $T_i \in T$, which are usually associated with components of solutions, have values $\tau_i$ called pheromone values. The pheromone model is used to probabilistically generate solutions to the problem by assembling them from a finite set of solution components. At runtime, ACO algorithms update the pheromone values using previously generated solutions. The update aims to concentrate the search in regions of the search space containing high quality solutions. In particular, the reinforcement of solution components depending on solution quality is an important ingredient of ACO algorithms. Initially ACO was applied to solve the TSP [31] and later many other combinatorial optimization problems. The framework can be derived from the tackled combinatorial optimization problem, which is defined as follows:

Definition: A model $P = (S, \Omega, f)$ of a combinatorial optimization problem consists of (a) a search (or solution) space $S$ defined over a finite set of discrete decision variables and a set $\Omega$ of constraints among the variables; (b) an objective function $f : S \to \mathbb{R}^+$ to be minimized.
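Returning to the LCS recursion noted at the end of section 6.4.1, a minimal memoized sketch (the standard dynamic programming formulation, not thesis code) illustrates the optimal-substructure and overlapping-subproblems properties:

```python
from functools import lru_cache

def lcs_length(X, Y):
    """Length of the LCS via the prefix recursion: c(i, j) = c(i-1, j-1) + 1
    if x_i == y_j, else max(c(i-1, j), c(i, j-1)); memoization handles the
    overlapping subproblems."""
    @lru_cache(maxsize=None)
    def c(i, j):
        if i == 0 or j == 0:
            return 0
        if X[i - 1] == Y[j - 1]:
            return c(i - 1, j - 1) + 1
        return max(c(i - 1, j), c(i, j - 1))
    return c(len(X), len(Y))

print(lcs_length("ABCBDAB", "BDCABA"))  # 4, e.g. the subsequence BCBA
```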

The search space $S$ is defined as follows [31]: given a set of $n$ discrete variables $X_i$ with values $v_i^j \in D_i = \{v_i^1, \ldots, v_i^{|D_i|}\}$, $i = 1, \ldots, n$, a variable instantiation, i.e., the assignment of value $v_i^j$ to variable $X_i$, is denoted by $X_i = v_i^j$. A feasible solution $s \in S$ is a complete assignment that satisfies the constraints. If the set of constraints $\Omega$ is empty, each decision variable can take any value from its domain independently of the values of the other decision variables; in this case $P$ is called an unconstrained problem model, otherwise a constrained problem model. A feasible solution $s^* \in S$ is called a globally optimal solution if $f(s^*) \le f(s)\ \forall s \in S$. The set of globally optimal solutions is denoted by $S^* \subseteq S$. To solve a combinatorial optimization problem it is required to find a solution $s^* \in S^*$. The model of the combinatorial optimization problem under consideration implies a finite set of solution components and a pheromone model [58]. Denoting the combination of a decision variable $X_i$ and one of its domain values $v_i^j$ as the solution component $c_i^j$, the pheromone model consists of a pheromone trail parameter $T_i^j$ for each solution component $c_i^j$. The set of all solution components is denoted by $\mathfrak{C}$. The value of the pheromone trail parameter $T_i^j$, called the pheromone value, is denoted by $\tau_i^j$. The vector of all pheromone trail parameters is $T$. As a combinatorial optimization problem can be modeled in different ways, different models of the combinatorial optimization problem can be used to define different pheromone models.

6.4.3 Framework of basic ACO Algorithm

Algorithm 1 captures the framework of the basic ACO algorithm [58]. It works as follows. At each iteration, $n_a$ ants probabilistically construct solutions to the combinatorial optimization problem under consideration, exploiting the given pheromone model. Then, optionally, a local search procedure is applied to the constructed solutions. Finally, before the next iteration starts, some of the solutions are used for performing the pheromone update. This framework is explained in more detail as follows:

(a) InitializePheromoneValues($T$): at the start of the algorithm the pheromone values are all initialized to a constant value $c > 0$.

(b) ConstructSolution($T$): the basic ingredient of any ACO algorithm is a constructive heuristic [31]. A constructive heuristic assembles solutions as sequences of elements from the finite set of solution components $\mathfrak{C}$. A solution construction starts with an empty partial solution $s^p = \langle\rangle$. Then, at each construction step the current partial solution $s^p$ is extended by adding a feasible solution component from the set $\mathfrak{N}(s^p) \subseteq \mathfrak{C} \setminus \{s^p\}$. This set is determined at each construction step by the solution construction mechanism in such a way that the problem constraints are met. The process of constructing solutions can be regarded as a walk (or path) on the construction graph $G_C = (\mathfrak{C}, \mathfrak{L})$, a fully connected graph whose vertices are the solution components in $\mathfrak{C}$ and whose edges are the elements of $\mathfrak{L}$. The allowed walks on $G_C$ are implicitly defined by the solution construction mechanism that defines the set $\mathfrak{N}(s^p)$ with respect to the partial solution $s^p$. The choice of solution component $c_i^j \in \mathfrak{N}(s^p)$ at each construction step is done probabilistically with respect to the pheromone model. The probability of choosing $c_i^j$ is proportional to $[\tau_i^j]^{\alpha} \cdot [\eta(c_i^j)]^{\beta}$, where $\eta$ is a function that assigns to each valid solution component, possibly depending on the current construction step, a heuristic value also called the heuristic information. The values of the parameters $\alpha \ge 0$ and $\beta \ge 0$ determine the relative importance of pheromone value and heuristic information. Heuristic information is optional, but often needed for achieving high algorithm performance. The probabilities for choosing the next solution component, called transition probabilities, are defined as follows [31]:

$$p(c_i^j \mid s^p) = \frac{[\tau_i^j]^{\alpha} \cdot [\eta(c_i^j)]^{\beta}}{\sum_{c_k^l \in \mathfrak{N}(s^p)} [\tau_k^l]^{\alpha} \cdot [\eta(c_k^l)]^{\beta}}, \quad \forall\, c_i^j \in \mathfrak{N}(s^p) \tag{33}$$

(c) LocalSearch($s$): a local search procedure is applied for improving the solutions constructed by the ants. The use of such a procedure is optional, though it has been observed experimentally that, if available, its use improves the algorithm's overall performance.

(d) ApplyPheromoneUpdate($T$, $\mathfrak{S}_{iter}$, $s^{bs}$): the aim of the pheromone value update rule is to increase the pheromone values on solution components that have been found in high quality solutions. Most ACO algorithms use a variation of the following update rule:

$$\tau_i^j \leftarrow (1-\rho)\cdot\tau_i^j + \rho \cdot \sum_{\{s \in \mathfrak{S}_{upd} \mid c_i^j \in s\}} F(s), \quad \text{for } i = 1, \ldots, n; \; j = 1, \ldots, |D_i| \tag{34}$$

Instantiations of this update rule are obtained by different specifications of $\mathfrak{S}_{upd}$, which in all cases is a subset of $\mathfrak{S}_{iter} \cup \{s^{bs}\}$, where $\mathfrak{S}_{iter}$ is the set of solutions that were constructed in the current iteration and $s^{bs}$ is the best-so-far solution. The parameter $\rho \in (0,1]$ is called the evaporation rate. It has the function of uniformly decreasing all pheromone values [58]. From a practical point of view, pheromone evaporation is needed to avoid too rapid convergence of the algorithm towards a suboptimal region; it implements a useful form of forgetting, favoring the exploration of new areas of the search space. $F : \mathfrak{S} \mapsto \mathbb{R}^+$ is a function such that $f(s) < f(s') \Rightarrow F(s) \ge F(s')$, $\forall s \ne s' \in \mathfrak{S}$, where $\mathfrak{S}$ is the set of all sequences of solution components that may be constructed by the ACO algorithm and that correspond to feasible solutions. $F(\cdot)$ is commonly called the quality function.

6.4.4 Ant Colony Optimization for Longest Common Subsequence problem

In this section, a dynamic algorithm is developed for solving the LCS problem using the ACO technique, viz. the ACO-LCS algorithm [31], provided the elements are arranged in an increasing sequence. Given a set $L$ of strings over an alphabet $\Sigma$, the LCS problem consists of finding a string of maximal length that is a subsequence of each string in $L$. A string $B$ is a subsequence of a string $A$ if $B$ can be obtained from $A$ by deleting zero or more characters from $A$. The character at position $j$ in string $L_i$ is denoted by $s_{ij}$. The components of the resulting construction graph are the $s_{ij}$'s, and the graph is fully connected. The constraints enforce that a true subsequence of the strings in $L$ has to be built; these constraints are implicitly enforced through the construction policy used by the ants. A pheromone trail $\tau_{ij}$ is associated with each component $s_{ij}$; it gives the desirability of choosing character $s_{ij}$ when building the subsequence. The ACO-LCS algorithm [31] works as follows. Considering a sequence $S = \langle s_1, \ldots, s_n \rangle$, each ant iteratively starts from an alphabet character and builds subsequences of the strings in $L$ defined over the alphabet $\Sigma$, independently of the other ants. Thus, the sequence $S$ is divided into two or more subsequences. Each ant $k$ receives a copy of the original set of strings $L$ and initializes its subsequence to the empty string. At the first construction step, ant $k$ adds to the subsequence it is building a character that

occurs at the front of at least one string $L_i \in L$, i.e., it chooses at least one component $s_{i1}$ from the sequence $S$. The choice of the character to be added is based on the pheromone trails (which are initialized to one), as well as on some additional information, as explained in section 6.4.5. Once a character is added, the same character is removed from the front of the strings on which it occurred. Then the procedure is reapplied to the modified set of strings $L'$ until all characters have been removed from all strings and the set $L'$ consists of empty strings. To describe the solution construction procedure more formally, for each ant $k$ an indicator vector $v^k = (v_1^k, \ldots, v_l^k)$ with $l = |L|$ is defined. The element $v_i^k$ of $v^k$ points to the front position $v_i^k$ of string $L_i$, the position that contains the character that is a candidate for inclusion in the subsequence. Consider for example the set $L = \{bbbaaa, bbaaab, cbaab, cbaaa\}$, for which the longest subsequence is $cbbbaaab$. In this case, the vector $v^k = (2,2,3,3)$ represents the situation in which the first character of the first and second strings, as well as the first two characters of the third and fourth strings, is already included in the subsequence. The characters that are candidates for inclusion in the subsequence are $a$ and $b$. Thus, $s_1(v_1^k) = s_{12} = b$, $s_2(v_2^k) = s_{22} = b$, $s_3(v_3^k) = s_{33} = a$, $s_4(v_4^k) = s_{43} = a$ [35].

Algorithm 1: Framework of basic ACO algorithm

input: An instance $P$ of the combinatorial optimization problem model $P = (S, \Omega, f)$
InitializePheromoneValues($T$)
$s^{bs} \leftarrow$ NULL
while termination conditions not met do
    $\mathfrak{S}_{iter} \leftarrow \emptyset$
    for $j = 1, \ldots, n_a$ do
        $s \leftarrow$ ConstructSolution($T$)
        if $s$ is a valid solution then
            $s \leftarrow$ LocalSearch($s$) {optional}
            if ($f(s) < f(s^{bs})$) or ($s^{bs}$ = NULL) then $s^{bs} \leftarrow s$
            $\mathfrak{S}_{iter} \leftarrow \mathfrak{S}_{iter} \cup \{s\}$
        end if
    end for
    ApplyPheromoneUpdate($T$, $\mathfrak{S}_{iter}$, $s^{bs}$)
end while
output: Best-so-far solution $s^{bs}$

At the beginning of solution construction, the vector $v^k$ is initialized to $v^k = (1, \ldots, 1)$. The solution construction procedure is completed once the indicator vector has reached the value $v^k = (|L_1| + 1, \ldots, |L_l| + 1)$. As mentioned earlier, at each construction step the feasible neighborhood $N^k(v^k)$, i.e., the set of characters that can be appended to the subsequence under construction, is composed of the characters occurring at the positions pointed to by the indicator vector [31]:

$$N^k(v^k) = \{x \in \Sigma : \exists i, \; x = s_i(v_i^k)\} \tag{36}$$

The choice of the character $x \in N^k(v^k)$ to append to the subsequence is done according to the pseudorandom proportional action choice rule of ACS, given by the following equation [31]:

$$j = \begin{cases} \arg\max_{l \in N_i^k} \{\tau_{il}[\eta_{il}]^{\beta}\}, & \text{if } q \le q_0 \\ J, & \text{otherwise} \end{cases} \tag{37}$$

where $q$ is a random variable uniformly distributed in $[0,1]$, $q_0$ ($0 \le q_0 \le 1$) is a parameter, and $J$ is a random variable selected according to the probability distribution given by the following equation [58]:

$$p_{ij}^k = \frac{[\tau_{ij}]^{\alpha}[\eta_{ij}]^{\beta}}{\sum_{l \in N_i^k} [\tau_{il}]^{\alpha}[\eta_{il}]^{\beta}}, \quad j \in N_i^k \tag{38}$$

where $\eta_{ij}$ is a heuristic value that is available a priori, $\alpha$ and $\beta$ are two parameters which determine the relative influence of the pheromone trail and the heuristic information, and $N_i^k$ is the feasible neighborhood of ant $k$ when at node $i$, i.e., the set of nodes that ant $k$ has not yet visited. Using the pheromone trail values, the sum of the pheromone trails of all occurrences of $x$ in the $l$ strings is [31]

$$\sum_{i : s_i(v_i^k) = x} \tau_i(v_i^k) \tag{39}$$

Finally, the indicator vector is updated, i.e., for $i = 1, \ldots, l$ [31]:

$$v_i^k \leftarrow \begin{cases} v_i^k + 1, & \text{if } s_i(v_i^k) = x \\ v_i^k, & \text{otherwise} \end{cases} \tag{40}$$

where $x$ is the character appended to the subsequence. It is to be noted that equation (39) assigns a larger pheromone amount to character $x$: (1) the higher the amount of pheromone on the components $s_{ij}$ for which $s_{ij} = s_i(v_i^k) = x$; and (2) the higher the number of times the character $x$ occurs at the current front of the strings in $L$ (i.e., the larger the cardinality of the set $\{s_{ij} : s_{ij} = s_i(v_i^k) = x\}$). The latter rule (2) reflects the majority merge heuristic, which at each step chooses the character that occurs most often at the front of the strings. An additional variant can also be considered that weighs each pheromone trail with $|L_i| - v_i^k + 1$, giving higher weight to characters occurring in strings in which many characters still need to be matched by the subsequence. This choice is inspired by the L-majority merge heuristic, which weighs each character with the length of the string $L_i$ and then selects the character with the largest sum of the weights of all its occurrences. Considering the sequence

$S$ divided into only two subsequences, viz. $a_i$ and $b_j$, where $1 \le i \le n/2$ and $1 \le j \le n/2$ [31], each subsequence is of length $n/2$ and is obtained by making pairwise comparisons between each pair of elements of $S$ by the ants, based on the amount of pheromone deposited, subject to the condition that $a_i \le b_j$. When $n$ is odd, the last element is left to be used later. Now, pairwise comparisons are made between the elements of the smaller set, called the a-set (say), based on the amount of pheromone deposited by the ants. This is repeated until the smallest element is found; the repetition procedure is generally based on the pheromone updates and local search done by the ants. A similar procedure is repeated for the larger set, called the b-set (say), until the largest element is found. It is to be observed here that an extra phase is required when $n$ is odd, in which case the last element is compared successively with the maximum and minimum elements to calculate the final maximum and minimum element respectively. The repetition procedure is again based on the pheromone updates and local search done successively by the ants.

Pheromone Update. The amount of pheromone an ant $k$ deposits is given by [58]

$$\Delta\tau^k = g(r^k) / |s^k| \tag{41}$$

where $r^k$ is the rank of ant $k$ after ordering the ants according to the quality of their solutions in the current iteration, $g$ is some function of the rank, and $s^k$ is the solution built by ant $k$. The pheromones are updated as follows. First, a vector

$z^k = (z_1^k, \ldots, z_l^k)$ with $l = |L|$, analogous to the one used for solution construction, is defined and initialized to $z^k = (1, \ldots, 1)$. The element $z_i^k$ of $z^k$ points to the character in string $L_i$ that is a candidate for receiving pheromone. Then $s^k$, the subsequence built by ant $k$, is scanned from the first to the last position. Let $x_h$ denote the $h$th character in $s^k$, $h = 1, \ldots, |s^k|$. At each step of the scanning procedure, first the set $M_h^k = \{s_i(z_i^k) : s_i(z_i^k) = x_h\}$ of elements in the strings belonging to $L$ that are pointed to by the indicator vector and whose value is equal to $x_h$ is memorized; it is to be noted that, by construction of the subsequence, at least one such character $x_h$ exists. Next, the indicator vector is updated, i.e., for $i = 1, \ldots, l$ [31]:

$$z_i^k \leftarrow \begin{cases} z_i^k + 1, & \text{if } s_i(z_i^k) = x_h \\ z_i^k, & \text{otherwise} \end{cases} \tag{42}$$

Once the subsequence has been entirely scanned, the amount of pheromone to be deposited on component $s_{ij} \in M_h^k$ is given by [31]

$$\Delta\tau_{ij}^k = \frac{\Delta\tau^k}{|M_h^k|} \cdot \frac{2(|s^k| - h + 1)}{|s^k|^2 + |s^k|} \tag{43}$$

The left term of the right-hand side of equation (43) says that the pheromone for the $h$th character in $s^k$ is distributed equally among the components of the strings in $L$ if $x_h$ occurred in more than one string. The right term of the right-hand side of equation (43) is a scaling factor that ensures that the overall sum of the pheromones deposited by ant $k$ is equal to $\Delta\tau^k$; additionally, this scaling factor ensures that the earlier a character occurs in $s^k$, the larger the amount of pheromone it receives. Hence, each character of a string receives an amount of pheromone that depends on how early the character was chosen in the construction process of ant $k$, how good ant $k$'s solution is, and the number of strings from which the character was chosen in the same construction step. Once all pheromone trails have evaporated and the above computations are done, the contributions of all ants to the characters' pheromone trails are summed and added to the pheromone trail matrix.

Local Search. The pairwise comparison between elements in the generated subsequence is based on the local search done by the ants. Local search is an optional component of ACO algorithms [58], although it has been shown since early implementations that it can greatly improve the overall performance of the ACO metaheuristic when static combinatorial optimization problems are

considered. In ACO-LCS, local search is applied once the ants have built their initial solutions, and each solution is carried to its local optimum by the application of local search routines. The locally optimal solutions are then used to update the pheromone trails on the arcs of the construction graph, according to the pheromone update procedure. Some commonly used local search procedures are: (i) edge-exchange heuristics; (ii) path-preserving exchange heuristics; (iii) handling precedence constraints; (iv) lexicographic search strategy with precedence constraints; (v) labeling procedure. In the ACO-LCS problem, any of these search procedures can be used to obtain a local optimum [1]. The ACO-LCS algorithm is schematically represented below [31].

Algorithm 2: ACO-LCS algorithm

1. /* Initialization Phase */
   For each pair (v, s)
       τ(v, s) ← 1   /* trails initialized to one (section 6.4.4) */
   End-For
2. /* First step of iteration */
   For k = 1 to m do
       Let vk be the node where agent k is located
       vk ← 0   /* All ants start from node 0 */
   End-For
3. /* This is the step in which agents build their subsequences. The subsequence of agent k is stored in Subsequence_k in an increasing sequence */
   For k = 1 to m do
       For i = 1 to n-1 do
           Starting from vk compute the set N(vk) of feasible nodes
           /* N(vk) contains all nodes j still to be visited and such that all nodes that have to precede j have already been inserted in the sequence */
           Choose the next node sk according to the pseudorandom proportional action choice rule of Ant Colony System, in an increasing sequence, by making pairwise comparisons
           Subsequence_k(i) ← (vk, sk)
           τ(vk, sk) ← Δτ_ij^k   /* pheromone deposit of equation (43) */
           vk ← sk   /* New node for agent k */
       End-For
   End-For
4. /* In this step, the local search is applied to the solutions built by each ant */
   For k = 1 to m do
       Optimized_Subsequence_k ← local_opt_routine(Subsequence_k)
   End-For
5. /* In this step pheromone trails are updated using the pheromone update rule */
   For k = 1 to m do
       Compute Lk   /* Lk is the length of Optimized_Subsequence_k */
   End-For
   Let Lbest be the longest Lk from the beginning and Optimized_Subsequence_best the corresponding sequence
   For each (z, s) ∈ Optimized_Subsequence_best
       τ(z, s) ← updated value   /* Use the pheromone update rule */
   End-For
6. If (End_condition = True) then print Lbest and Optimized_Subsequence_best else goto Step 2 End-if
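As an illustration of the construction step used in Algorithm 2, the following Python sketch applies the pseudorandom proportional rule of equations (37)-(38) to choose the next character from the string fronts, using the majority-merge occurrence count as the heuristic $\eta$; the data structures are my own illustrative choices, not the thesis implementation:

```python
import random

def front_candidates(strings, v):
    """Characters at the positions pointed to by indicator vector v
    (equation (36)); v is 1-based as in the text."""
    cands = {}
    for i, s in enumerate(strings):
        if v[i] <= len(s):
            cands.setdefault(s[v[i] - 1], []).append(i)
    return cands

def choose_character(cands, tau, v, beta=2.0, q0=0.9):
    """ACS pseudorandom proportional rule (37)-(38): tau[i][j] is the
    pheromone of character j of string i; the summed front pheromone of
    equation (39) is weighted by the occurrence count as heuristic eta."""
    def score(ch):
        pher = sum(tau[i][v[i] - 1] for i in cands[ch])   # equation (39)
        return pher * (len(cands[ch]) ** beta)
    chars = list(cands)
    if random.random() <= q0:                             # exploitation
        return max(chars, key=score)
    total = sum(score(c) for c in chars)                  # biased exploration
    r, acc = random.uniform(0, total), 0.0
    for c in chars:
        acc += score(c)
        if acc >= r:
            return c
    return chars[-1]

strings = ["bbbaaa", "bbaaab", "cbaab", "cbaaa"]
v = [2, 2, 3, 3]                                          # example from the text
tau = [[1.0] * len(s) for s in strings]                   # trails initialized to one
print(choose_character(front_candidates(strings, v), tau, v))  # 'a' or 'b'
```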

6.4.5 Stochastic Combinatorial Optimization for ACO-LCS Algorithm

The stochastic combinatorial optimization aspect of the ACO-LCS algorithm [31] is discussed here. Stochastic optimization [126] refers to those combinatorial optimization problems for which some of the variables used to define them have a stochastic nature. These could be the problem components as defined below, which can have some probability of being part of the problem or not, the values taken by some of the variables describing the problem, or the value returned by the objective function. An artificial ant in ACO is a stochastic constructive procedure that incrementally builds a solution by adding opportunely defined solution components to the partial solution under construction. Thus, the ACO metaheuristic can be applied to the LCS problem, for which a constructive heuristic can be defined. The real issue here is how to map the LCS problem to a representation that can be used by the artificial ants to build solutions. In the following, a formal characterization of the representation that the artificial ants use, and of the policy they implement, is given. The model based search approach is used for the ACO-LCS algorithm. Considering the above minimization problem $(S, \Omega, f)$, the goal is to find an optimal solution $s^*$, i.e., a feasible solution of minimum cost; the set of all optimal solutions is denoted by $S^*$. At a very general level, the model based search approach attempts to solve this minimization problem by repeating the following two steps: (a) candidate solutions are constructed using some parameterized probabilistic model, i.e., a parameterized probability distribution over the solution space; (b) the candidate solutions are used to modify the model in a way that is deemed to bias future sampling toward low cost solutions. An auxiliary memory may be used in which some important information collected during the search is stored. The memory, which may store information on the distribution of cost values or a collection of high-quality solutions, can later be used for the model update. Moreover, in some cases it may be desired to build a new model at every iteration, rather than to iteratively update the same one. For any algorithm belonging to this general scheme, two components corresponding to the two steps above need to be instantiated: (a) a probabilistic model that allows an efficient generation of candidate solutions, and (b) an update rule for the model's parameters or structure. In the rest of this section, two systematic approaches within the model based search framework, viz. stochastic gradient ascent and the cross-entropy method [31], are discussed; they define the second component, i.e., the update rule for the model. The main characteristics of this model are positive feedback, distributed computation and the use of a constructive greedy heuristic. Positive feedback accounts for the rapid discovery of good solutions, distributed computation avoids premature convergence, and the greedy heuristic helps find acceptable solutions in a minimum number of stages. It is assumed that the combinatorial optimization problem $(S, \Omega, f)$ is mapped onto a problem that can be characterized by [31]: (a) a finite set $C = \{c_1, \ldots, c_{N_c}\}$ of components, where $N_c$ is the number of components; (b) a finite set $X$ of states of the problem, where a state is a sequence $x = \langle c_i, c_j, \ldots, c_k, \ldots \rangle$ over the elements of $C$; the length of a sequence $x$, i.e., the number of components in the sequence, is expressed by $|x|$, and the maximum length of a sequence is bounded by a positive constant $n < +\infty$; (c) the set of (candidate) solutions $S$, which is a subset of $X$ (i.e., $S \subseteq X$); (d) a set of feasible states $\tilde{X}$, with $\tilde{X} \subseteq X$, defined via the set of constraints $\Omega$; (e) a non-empty set $S^*$ of optimal solutions, with $S^* \subseteq \tilde{X}$ and $S^* \subseteq S$.

Given the above formulation, artificial ants build candidate solutions by performing randomized walks on the completely connected weighted graph $G_C = (\mathfrak{C}, \mathfrak{L})$, a fully connected graph whose vertices are the solution components in $\mathfrak{C}$ and whose edges are the elements of $\mathfrak{L}$. Further, a vector $T$ gathering the so-called pheromone trails $\tau$ is considered. The graph $G_C$ is called the construction graph. Each artificial ant is put on a randomly chosen vertex of the graph and then performs a randomized walk by moving at each step from vertex to vertex, in such a way that the next vertex is chosen stochastically according to the strength of the pheromone currently on the arcs. While moving from one node of the graph $G_C$ to another, the constraints $\Omega$ may be used to prevent the ants from building infeasible solutions. Formally, the solution construction behavior of a generic ant [31] can be described as follows:

Ant_Solution_Construction
for each ant:
    select a start node $c_1$ according to some problem dependent criterion
    set $k \leftarrow 1$ and $x_k \leftarrow \langle c_1 \rangle$
    while $x_k = \langle c_1, \ldots, c_k \rangle \in \tilde{X}$, $x_k \notin S$ and $J_{x_k} \ne \emptyset$ do:
        at each step $k$, after building the sequence $x_k$, select the next node (component) $c_{k+1}$ randomly using the following distribution:

$$P_T(c_{k+1} = c \mid x_k) = \begin{cases} \dfrac{F_{c_k c}(\tau(c_k, c))}{\sum_{(c_k, y) \in J_{x_k}} F_{c_k y}(\tau(c_k, y))}, & \text{if } (c_k, c) \in J_{x_k} \\ 0, & \text{otherwise} \end{cases} \tag{44}$$

where the connection $(c_k, y)$ belongs to $J_{x_k}$ iff the sequence $x_{k+1} = \langle c_1, \ldots, c_k, y \rangle$ satisfies the constraints $\Omega$ (i.e., $x_{k+1} \in \tilde{X}$), and $F_{ij}(z)$ is some monotonic function, most commonly $z^{\alpha}\eta(i,j)^{\beta}$, where $\alpha, \beta > 0$ and $\eta$ are heuristic "visibility" values. If at some stage $x_k \notin S$ and $J_{x_k} = \emptyset$, i.e., the construction process has reached a dead-end, the current state $x_k$ is discarded. For certain problems, one may find it useful to use a more general scheme, where $F$ depends on the pheromone values of several related connections rather than just a single one. The probabilistic rule given by the above equation, together with the underlying construction graph, implicitly defines the first component of the model based search algorithm, viz. the probabilistic model. Having chosen the probabilistic model, the next step is to choose the parameter update mechanism. In the following, updates within the ACO framework, as well as ones derived from the stochastic gradient ascent algorithm and the cross-entropy method, are described. Many different schemes for pheromone update have been proposed within the ACO framework. Most pheromone updates can be described using the following generic scheme [31]:
{F (c k , c)( (c k , c))}/  {F (c k , y )( (c k , y ))}, (c k , y )  J xk  ( ck , y )J xk PT (c k 1  c | x k )   (44) 0, otherwise where, connection (c k , y ) belongs to J xk iff sequence x k 1  c1 ,......... ....., c k , y  satisfies ~ constraints  i.e., x k 1  X and Fij (z) is some monotonic function most commonly, z  (i, j )  , where  ,   0 and  are heuristic visibility values. If at some stage x k  S and J xk   i.e., construction process has reached a dead-end, current state x k is discarded. For certain problems, one may find useful to use more general scheme, where F depends on pheromone values of several related connections, rather than just single one. The probabilistic rule given by above equation, together with underlying construction graph, implicitly defines first component of model based search algorithm viz., probabilistic model. Having chosen probabilistic model, next step is to choose parameter update mechanism. In the following, updates within ACO framework as well as ones derived from stochastic gradient ascent algorithm and cross-entropy method are described. Many different schemes for pheromone update have been proposed within ACO framework. Most pheromone updates can be described using following generic scheme [31]:

s  S t' , (i, j )  s :  (i, j )   (i, j )  Q f ( s | S1 ,......... ., S t ),

(45)

(i, j ) :  (i, j )  (i   )   (i, j ) th where, S i is sample in i iteration,  ,0    1 is evaporation rate and Q f (s | S1 ,......... ., S t ) is some quality function which is typically required to be non-increasing with respect to f and is defined over reference set S t' . Different ACO algorithms may use different quality functions and reference sets. For example, in very first ACO algorithm viz., ant system, quality function was 1 / f ( s ) and reference set S t'  S t . In more recently proposed scheme, called iteration best

update, reference set was singleton containing best solution within S t (if there were several iteration-best solutions, one of them was chosen randomly). For global best update, reference set contained best among all iteration-best solutions (and if there were more than one global-best solution, earliest one was chosen). In case good lower bound on optimal solution cost is available, following quality function may be used [31]:

( f ( s)  LB ) ( f  f ( s)) (46) }   0{ } ( f  LB ) ( f  LB ) where, f is average of costs of last k solutions and LB is lower bound on optimal solution cost. Q f ( s | S1 ,......... ., S t )   0 {1 

With this quality function, solutions are evaluated by comparing their cost to average cost of other recent solutions, rather than by using absolute cost values. In addition, quality function is automatically scaled based on proximity of average cost to lower bound. Pheromone update which slightly differs from generic update described above was used in ant colony system [58]. There pheromones are evaporated by ants online during solution construction, hence only pheromones involved in construction evaporate. Two additional modifications of generic update were also available. The first one uses maximum and minimum pheromone trail limits. With this modification, probability to generate any particular solution is kept above some positive threshold, which helps preventing search stagnation and premature convergence to suboptimal solutions. The second modification, proposed under the name of hyper-cube framework in context of combinatorial problems with binary coded solutions, is to normalize quality function, hence obtaining an automatic scaling of pheromone values  i . While all updates described above are of somewhat heuristic nature, stochastic gradient ascent and cross-entropy methods allow deriving parameters update rules in more systematic manner which are discussed below [31]. 6.4.6 Stochastic Gradient Ascent update in ACO-LCS An update rule for stochastic gradient is given by [31],

T t 1  T t   t  Q f (s) ln PT (s)

(47)

sSt

where, S t is sample at stage t . In case distribution is implicitly defined by ACO-type construction process, parameterized by vector of pheromone values T , gradient  ln PT (s) can be efficiently calculated. The following is generalized calculation [31]: From definition of Ant_Solution_Construction, it follows that for s  c1 , c2 ,......... .  , |s|1

PT ( s )   PT (c k 1 | prefk ( s))

(48)

k 1

where, prefk (s ) is k  prefix of s and consequently |s|1

 ln PT ( s)   ln PT (c k 1 | prefk ( s)) (49) k 1

Finally, given pair of components (i, j )  C 2 , using expression for PT (c k 1  c | x k ) and assuming differentiability of F , it is easy to verify that if i  c k , j  c k 1 then [31],

  {ln PT (c k 1 | prefk ( s ))}  {ln F ( (i, j )) /  F ( (i, y ))}   (i, j )   (i, j ) ( i , y )J x ( k )  {F ' ( (i, j )) / F ( (i, j ))}  {F ' ( (i, j )) /

 F ( (i, y))}

(50)

( i , y )J x ( k )

 {1  F ( (i, j )) /

 F ( (i, y))}{F ( (i, j )) / F ( (i, j ))} '

( i , y )J x ( k )

 {1  PT ( j | prefk ( s))}{G ( (i, j ))} ' where, G ()  F ()

F ()

and subscript of F was omitted for clarity of presentation.

If i  c k , j  c k 1 then,

 {ln PT (ck 1 | prefk ( s))}   PT ( j | prefk ( s))G( (i, j ))  (i, j )

(51)

If i  c k , then PT (c k 1 | prefk ( s)) is independent of  (i, j ) and

 {ln PT (ck 1 | prefk ( s))}  0  (i, j )

(52)

By combining these results, following pheromone update rule is derived [31]:

s  S t' , (i, j )  s :  (i, j )   (i, j )   t Q f ( s )G ( (i, j )), s  {c1 ,......... ..c k ,......... .}  S t , i  c k ,1  k | s |,

(53)

j :  (i, j )   t Q f ( s ) PT ( j | prefk ( s ))G ( (i, j )) Hence, any connection (i, j ) used in construction of solution is reinforced by an amount  t Q f (s)G( (i, j )) and any connection considered during construction has its pheromone values evaporated by an amount  t Q f (s) PT ( j | prefk (s))G( (i, j )) . It is to be noted that, if solutions are allowed to contain loops a connection may be updated more than once for same solution. In order to guarantee stability of resulting algorithm, it is desirable to have bounded gradient '

 ln PT (s) . This means that function F , for which G  F F is bounded and should be used. 6.4.7 Cross-entropy update in ACO-LCS Cross-entropy approach requires solving following intermediate problem [31]

Pt 1  arg max  Q f (s) ln P(s) PM

(54)

sSt

Considering this problem in more details in case of ACO-LCS type probabilistic model, maximum gradient must be zero [31], such that, Q f (s) ln P(s)  0 (55)



sSt

In some relatively simple cases, for example when solution s is represented by an unconstrained string of bits of length n , ( s1 ,......... ...., s n ) and there is single parameter  i for i th position in string, such that PT ( s ) 



i

p ( si ) , the above equation system reduces to set of independent

equations [31]

ln p d d ln(1  p ) { }  { }, i  1,......... ..., n d  Q f ( s) d  Q f ( s) sSt si 1

(56)

sS t si  0

which may often be solved analytically. For example, p   it can be verified that solution of above equation is

p     Q f (s)si /  Q f (s) sSt

(57)

sSt

and in fact, a similar solution also applies to more general class of markov chain models. Now, since pheromone trails  i in above equation are random variables, whose values depend on particular sample. To make the algorithm more robust, some conservatism is introduced into the update. For example, rather than discarding old pheromone values, new values may be taken to be convex combination of old values and solution of above equation is [31]:

 i  (1   ) i    Q f (s)si /  Q f (s) sSt

(58)

sSt

The resulting update is identical to one used in hyper-cube framework for ACO. However, for Q f (s) ln P(s)  0 many cases of interest equation are coupled and



sSt

an analytical solution is unavailable. Nevertheless, in actual implementations of cross-entropy method update was of form given by equation, p   which may be considered as an approximation to exact solution of cross-entropy minimization problem. Since, in general exact solution is not available an iterative scheme such as gradient ascent could be employed. As shown previously, gradient of log-probability may be calculated as follows [31]: If i  c k , j  c k 1 then,

 {ln PT (ck 1 | prefk ( s))}  {1  PT ( j | prefk ( s))G( (i, j ))}  (i, j ) where, G () 

F ' ()

F ()

(59)

and subscript of F was omitted for clarity of presentation. If

i  ck , j  ck 1 then,

 {ln PT (ck 1 | prefk ( s))}   PT ( j | prefk ( s))G( (i, j ))  (i, j ) If i  c k , then PT (c k 1 | prefk ( s)) is independent of  (i, j ) and

 {ln PT (ck 1 | prefk ( s))}  0  (i, j )

(61)

(60)

and these values may be plugged into any general iterative solution scheme of cross-entropy minimization problem. It may be concluded that if equation, p   is used as possible approximate solution of equation [31]

Pt 1  arg max  Q f (s) ln P(s) PM

(62)

sSt

the same pheromone update rule as in hyper-cube framework for ACO is derived. If otherwise a single-step gradient ascent is used for solving the problem given by above equation, generalization of stochastic gradient ascent update is obtained in which quality function is allowed to change over time.

6.5 RFMLP–NN for JSP

The objective of this section is to develop an RFMLP–NN scheduler for JSP [14], [50], [82], [87], [102], [109], [156]. To accomplish this, the learning task for the ANN needs to be identified. The central issue here is that optimal solutions to the scheduling problem have common features which can be implicitly captured by a machine learning tool like an ANN. The second issue is that the learned function, i.e., the ANN, can be utilized to generate good solutions to new problems. The third issue concentrates on handling the inherent uncertainty and impreciseness involved in scheduling problems, which is taken care of by rough and fuzzy sets. To generate the learning or knowledge base, GAs are chosen for producing optimal solutions, as they have proved to be successful in empirical scheduling research. Scheduling can be understood as a decision making process [149], where a decision is the selection of the next operation to add to the partial schedule from among a set of competing operations, with the objective of minimizing a chosen performance measure. A complete schedule is thus the consequence of repeated decisions to select the best operation. Thus, in an optimal solution every individually scheduled operation of a job is treated as a decision which contains knowledge. Each decision is modeled as a function of a set of job characteristics, such as processing time and machine load, which are divided into classes using domain knowledge from common dispatching rules like shortest processing time. The job of the RFMLP–NN is then to capture predictive knowledge regarding the assignment of an operation's position in the sequence. A schematic illustration of the entire procedure [44], [87] is given in figure 6.3. It generates optimal solutions to a benchmark scheduling problem using GA; models the scheduling function as a machine learning problem by defining a classification scheme which maps optimal sequences to data patterns; develops an appropriate ANN incorporating rough and fuzzy sets; and focuses on the development of a scheduler which combines a well known active scheduling algorithm with the ANN to obtain schedules for new problems. The ANN is trained on the classification data patterns. The different tasks are explained in the following subsections.

6.5.1 Generation of solutions using Genetic Algorithms

The knowledge base for the learning task is constructed from optimal solutions to JSP generated by GA [74], [87], [102], [119]. The famous 6 × 6 problem instance, viz. ft06, devised by [44], has been chosen as the benchmark problem. This instance has six jobs, each with six operations, to be scheduled on six machines, and has a known optimum makespan of 55 units. The data for the instance is shown in table 6.5 using the structure (machine, processing time). A distributed GA implemented in Java [44] is utilized here for obtaining solutions to the benchmark problem. A solution generated by the GA is a sequence of 36 operations (6 jobs × 6 machines): {1, 3, 2, 4, 6, 2, 3, 4, 3, 6, 6, 2, 5, 5, 3, 5, 1, 1, 6, 4, 4, 4, 1, 2, 5, 3, 2, 3, 6, 1, 5, 2, 1, 6, 4, 5}. Each number in the sequence represents a job number together with a current operation number; the repetition of a job number in the sequence indicates the next operation of that job. The representation of a GA solution is shown in figure 6.4, and a short decoding sketch is given below. For the benchmark instance shown in table 6.5, the GA is run 3000 times. Each run produced a solution, and the optimal makespan of 55 units was achieved 1696 times. The next step is to transform these 1696 solutions, or chromosomes, into a data structure suitable for the classification task.
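The decoding of such a sequence into (job, operation) pairs can be sketched in a few lines of Python; this is a reading of the representation described above, not the thesis's Java code:

```python
from collections import Counter

def decode_sequence(seq):
    """Decode a GA chromosome like [1, 3, 2, ...] into (job, operation)
    pairs: the k-th occurrence of job j denotes operation k of job j."""
    seen = Counter()
    ops = []
    for job in seq:
        seen[job] += 1
        ops.append((job, seen[job]))
    return ops

seq = [1, 3, 2, 4, 6, 2, 3, 4, 3, 6, 6, 2, 5, 5, 3, 5, 1, 1,
       6, 4, 4, 4, 1, 2, 5, 3, 2, 3, 6, 1, 5, 2, 1, 6, 4, 5]
print(decode_sequence(seq)[:5])  # [(1, 1), (3, 1), (2, 1), (4, 1), (6, 1)]
```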

[Figure 6.3 sketches the flow: generate optimal sequences from genetic algorithms → map sequences to patterns by the classification scheme → generate data sets (training, cross-validation and testing) → train the neural network → scheduler, which takes a new problem and outputs the proposed schedule.]

Figure 6.3: Schematic illustration of RFMLP–NN based Scheduler

6.5.2 Data Classification Problem

The solutions obtained by the GA [74], [119] contain valuable information relevant to the scheduling process. The learning task is to predict the position of an operation in the GA sequence, denoted by the chromosome, based on its features or attributes. A sequence contains information about the ordering of operations on each machine; a schedule specifies both the sequence and the starting times of the operations. The reason for utilizing the sequences produced by the GA, instead of the schedules, for the learning task is twofold: (a) in a GA, chromosomes represent solutions to the optimization problem. In constructing a GA for scheduling, the decision to represent chromosomes either as sequences or as schedules is a design decision. In the former case, a decoder is specified to map a sequence to a schedule. The genetic operators needed to manipulate sequences are simpler than those needed to manipulate schedules. The efficiency of the GA for a complicated combinatorial optimization problem like JSP is an important consideration, and hence sequences are utilized to represent chromosomes; (b) there exists an N:1 mapping between sequences and schedules. This implies that an operation in different positions in two sequences might still occupy the same position in the decoded schedule. As prediction of the position of an operation is the objective of the learning task, sequences were utilized instead of schedules to prevent any loss of information relevant to the learning task.

Job    Operation 1    Operation 2    Operation 3    Operation 4    Operation 5    Operation 6
1      3, 1           1, 3           2, 6           4, 7           6, 3           5, 6
2      2, 8           3, 5           5, 10          6, 10          1, 10          4, 4
3      3, 5           4, 4           6, 8           1, 9           2, 1           5, 7
4      2, 5           1, 5           3, 5           4, 3           5, 8           6, 9
5      3, 9           2, 3           5, 5           6, 4           1, 3           4, 1
6      2, 3           4, 3           6, 9           1, 10          5, 4           3, 1

Table 6.5: ft06 problem instance devised by (Fisher and Thompson 1963)

[Figure 6.4 renders the GA solution as two rows, job number over operation number, for each of the 36 positions in the sequence: (1,1), (3,1), (2,1), (4,1), (6,1), (2,2), (3,2), (4,2), (3,3), (6,2), ..., (4,6), (5,6).]

Figure 6.4: Representation of GA solution

Based on a study of the operation attributes commonly used in priority dispatch rules, the attributes that have been identified as input features are operation, process time, remaining time and machine load. These input features have been clustered into different classes using the concept hierarchy for JSP developed by [96]. Each job in the benchmark problem has six operations that must be processed in a given sequence. The operation feature identifies the sequence number of the operation, ranging between 1 and 6. This feature has been clustered into four classes: {1} first, {2, 3} middle, {4, 5} later and {6} last. The process time feature represents the processing time of the operation. The remaining time feature denotes the sum of the processing times of the remaining operations of that job and provides a measure of the work remaining to be done for completion of the job, assuming no resource conflicts. For the benchmark problem, viz. the ft06 instance, processing times range from 1 to 10 units, while remaining times range from 0 to 39 units. Based on the data, three classes or clusters for these features are identified, with the ranges being split into three equal intervals classified as short, medium and long. The machine load feature determines the machine loading and is clustered into two classes, viz. light and heavy. This feature represents the capacity or utilization of the machines in units of time and helps in differentiating between possible bottleneck and non-bottleneck machines. These input features thus signify the features of an operation that affect its placement in the schedule. The target concept to be learned is the priority, or position in the sequence. Since an operation can be positioned in any one of the 36 locations available in a GA sequence, it may be difficult to discover an exact relationship between the input features and the position. However, if the problem is modified to predict a range of locations for the operation, the learning task becomes simpler. The target feature, priority, thus determines the range of positions in the sequence where the operation can be inserted. The possible range of positions has been split into 6 classes and assigned the class labels shown in table 6.6. The classification problem with input and target features is illustrated in figure 6.5 [44]; a pattern-construction sketch follows table 6.6.

Range of Positions    Priority
1–6                   Zero
7–12                  One
13–18                 Two
19–24                 Three
25–30                 Four
31–36                 Five

Table 6.6: Assignment of class labels to target feature
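The mapping from one scheduled operation to a labeled training pattern can be sketched as follows in Python; the class vocabularies and table 6.6 come from the text, while the exact interval boundaries for the "three equal intervals" and the load threshold are my own illustrative assumptions:

```python
def to_pattern(position, operation, process_time, remaining_time,
               machine_load, load_threshold=30):
    """Map one scheduled operation to a labeled pattern (section 6.5.2)."""
    op_class = {1: "first", 2: "middle", 3: "middle",
                4: "later", 5: "later", 6: "last"}[operation]
    # ft06: process times 1-10 and remaining times 0-39, each split into
    # three equal intervals (boundaries assumed here: 4/7 and 13/26).
    pt_class = "short" if process_time <= 4 else "medium" if process_time <= 7 else "long"
    rt_class = "short" if remaining_time <= 13 else "medium" if remaining_time <= 26 else "long"
    ml_class = "light" if machine_load <= load_threshold else "heavy"
    priority = (position - 1) // 6     # table 6.6: positions 1-36 -> classes 0-5
    return (op_class, pt_class, rt_class, ml_class, priority)

print(to_pattern(position=5, operation=1, process_time=3,
                 remaining_time=20, machine_load=25))
# ('first', 'short', 'medium', 'light', 0)
```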


6.5.3 RFMLP–NN Model

Various important aspects should be considered for developing an effective ANN model [79], [156]. Two such aspects are considered here. The first criterion is the selection of a suitable architecture, training algorithm, learning constants and termination conditions for building the model. The second criterion relates to the determination of the sizes and the choice of data patterns for the training, cross validation and testing data sets. Considerable experimentation is required to achieve a good model of the data. The ANN model chosen for data classification is the MLP ANN [44], [50]. An MLP consists of a group of nodes arranged in layers. Each node in a layer is connected to all nodes in the next layer by links which have weights associated with them. The input layer contains nodes that represent the input features of the classification problem. A real valued feature is represented by a single node, whereas a discrete feature with n distinct values is represented by n input nodes. The classification strength of the MLP ANN is enhanced by incorporating Rough and Fuzzy sets in the ANN, which results in the development of the RFMLP–NN model. The ANN acts as an efficient connectionist link between the two. In this hybridization, Fuzzy sets help in handling linguistic input information and ambiguity in the output decision, while Rough sets extract domain knowledge for determining the network parameters. The Fuzzy MLP model [109] incorporates fuzziness at the input and output levels of the MLP, as shown in figure 6.6, and is capable of handling exact (numerical) and inexact (linguistic) forms of input data. Any input feature value is described in terms of some combination of membership values in the linguistic property sets low (L), medium (M) and high (H). Class membership values $\mu$ of patterns are represented at the output layer of the Fuzzy MLP. During training, the weights are updated by back propagating errors with respect to these membership values, such that the contribution of uncertain vectors is automatically reduced. A four layered feed forward MLP is used here. The output of a neuron in any layer $h$ other than the input layer ($h \ne 0$) is given as follows [44], [50], [156]:

$y_j^{(h)} = \frac{1}{1 + \exp\left(-\sum_i y_i^{(h-1)} w_{ji}^{(h-1)}\right)}$     (63)

( h 1) where, yi( h1) is state of i th neuron in preceding (h  1) th layer and w ji is weight of connection

0 from i th neuron in layer (h  1) to j th neuron in layer h . For nodes in input layer, y j corresponds to

j th component of input vector. Further it is to be noted that x (jh )  i y i( h 1) w (jih 1) . An n  dimensional pattern Fi  [ Fi1 , Fi 2 ,......... , Fin ] is represented as 3n-dimensional vector [44]: (64) Fi  [ low( Fi1 ) ( Fi ),........ .,  high( Fin ) ( Fi )]  [ y1(0) , y 2(0) ,......... ., y3(0n) ] where,  values indicate membership functions of corresponding linguistic   sets low, medium, and high along each feature axis and y1( 0) , y 2( 0) ,......... ., y3( 0n) refer to activations of 3n neurons in input layer. When input feature is exact in nature,   Fuzzy sets in one dimensional form are used with range [0,1] and are represented as follows [44]:
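A minimal C++ sketch of the layer activation of equation (63) is given below, assuming a weights[j][i] layout for the connection weights; it is an illustration, not the implementation used in this work.

    #include <cmath>
    #include <vector>

    // Output of one layer given the previous layer's states (equation 63).
    std::vector<double> layerOutput(
            const std::vector<double>& prev,
            const std::vector<std::vector<double>>& weights) {
        std::vector<double> out(weights.size());
        for (std::size_t j = 0; j < weights.size(); ++j) {
            double x = 0.0;                      // x_j = sum_i y_i * w_ji
            for (std::size_t i = 0; i < prev.size(); ++i)
                x += prev[i] * weights[j][i];
            out[j] = 1.0 / (1.0 + std::exp(-x)); // sigmoid of equation (63)
        }
        return out;
    }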

When an input feature is exact in nature, $\pi$ Fuzzy sets in one dimensional form, with range $[0, 1]$, are used and are represented as follows [44]:

$\pi(F_j; c, \lambda) = \begin{cases} 2\left(1 - \frac{\|F_j - c\|}{\lambda}\right)^2, & \text{for } \frac{\lambda}{2} \le \|F_j - c\| \le \lambda \\ 1 - 2\left(\frac{\|F_j - c\|}{\lambda}\right)^2, & \text{for } 0 \le \|F_j - c\| \le \frac{\lambda}{2} \\ 0, & \text{otherwise} \end{cases}$     (65)

where $\lambda > 0$ is the radius of the $\pi$-function with $c$ as the central point. When the input feature $F_j$ is linguistic in nature, its membership values for the $\pi$-sets low (L), medium (M) and high (H) are quantified as follows [44]:

$low \equiv \left\{ \frac{0.95}{L}, \frac{\pi(F_j(0.95/L); c_{jm}, \lambda_{jm})}{M}, \frac{\pi(F_j(0.95/L); c_{jh}, \lambda_{jh})}{H} \right\}$

$medium \equiv \left\{ \frac{\pi(F_j(0.95/M); c_{jl}, \lambda_{jl})}{L}, \frac{0.95}{M}, \frac{\pi(F_j(0.95/M); c_{jh}, \lambda_{jh})}{H} \right\}$     (66)

$high \equiv \left\{ \frac{\pi(F_j(0.95/H); c_{jl}, \lambda_{jl})}{L}, \frac{\pi(F_j(0.95/H); c_{jm}, \lambda_{jm})}{M}, \frac{0.95}{H} \right\}$

where $c_{jl}, \lambda_{jl}, c_{jm}, \lambda_{jm}, c_{jh}, \lambda_{jh}$ indicate the centers and radii of the three linguistic properties along the $j$th axis, and $F_j(0.95/L), F_j(0.95/M), F_j(0.95/H)$ denote the corresponding feature values $F_j$ at which the three linguistic properties attain membership values of 0.95.
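The one dimensional $\pi$-function of equation (65) can be sketched in C++ as follows; the code is illustrative only.

    #include <cmath>

    // Pi-function of equation (65) for feature value f, centre c and
    // radius lambda (> 0); returns a membership in [0, 1].
    double piMembership(double f, double c, double lambda) {
        double d = std::fabs(f - c);
        if (d >= lambda) return 0.0;
        double r = d / lambda;
        if (d >= lambda / 2.0) return 2.0 * (1.0 - r) * (1.0 - r);
        return 1.0 - 2.0 * r * r;
    }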

Now the procedure for selecting the centers and radii of the overlapping $\pi$-sets is considered. Let $m_j$ be the mean of the pattern points along the $j$th axis. Then $m_{jl}$ and $m_{jh}$ are defined as the means, along the $j$th axis, of the pattern points having co-ordinate values in the ranges $[F_j^{min}, m_j)$ and $(m_j, F_j^{max}]$ respectively, where $F_j^{min}$ and $F_j^{max}$ denote the lower and upper bounds of the dynamic range of feature $F_j$ for the training set, considering exact values only. For the three linguistic property sets, the centers and corresponding radii are defined as follows [44]:

$c_{low(F_j)} = m_{jl}; \quad c_{medium(F_j)} = m_j; \quad c_{high(F_j)} = m_{jh}$     (67)

and

$\lambda_{low(F_j)} = 2(c_{medium(F_j)} - c_{low(F_j)}); \quad \lambda_{high(F_j)} = 2(c_{high(F_j)} - c_{medium(F_j)})$

$\lambda_{medium(F_j)} = \frac{\lambda_{low(F_j)}(F_j^{max} - c_{medium(F_j)}) + \lambda_{high(F_j)}(c_{medium(F_j)} - F_j^{min})}{F_j^{max} - F_j^{min}}$     (68)

respectively.
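A sketch of the computation of the centers and radii of equations (67) and (68) along one feature axis is given below; the struct and function names are illustrative assumptions.

    #include <algorithm>
    #include <numeric>
    #include <vector>

    struct PiParams { double cLow, cMed, cHigh, lLow, lMed, lHigh; };

    // Centres and radii of the three pi-sets for one feature axis,
    // computed from the exact feature values of the training set.
    PiParams linguisticParams(const std::vector<double>& v) {
        double fmin = *std::min_element(v.begin(), v.end());
        double fmax = *std::max_element(v.begin(), v.end());
        double m = std::accumulate(v.begin(), v.end(), 0.0) / v.size();
        double sl = 0.0, sh = 0.0; int nl = 0, nh = 0;
        for (double x : v) {          // means below and above the mean m_j
            if (x < m) { sl += x; ++nl; } else if (x > m) { sh += x; ++nh; }
        }
        PiParams p;
        p.cLow  = nl ? sl / nl : m;   // m_jl
        p.cMed  = m;                  // m_j
        p.cHigh = nh ? sh / nh : m;   // m_jh
        p.lLow  = 2.0 * (p.cMed - p.cLow);                     // equation (68)
        p.lHigh = 2.0 * (p.cHigh - p.cMed);
        p.lMed  = (p.lLow * (fmax - p.cMed) + p.lHigh * (p.cMed - fmin))
                  / (fmax - fmin);
        return p;
    }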

Here, the distribution of the pattern points along each feature axis is taken into account while choosing the corresponding centers and radii of the linguistic properties. Besides, the amount of overlap between the three linguistic properties can be different along different axes, depending on the pattern set. Consider an $l$-class problem domain, such that there are $l$ nodes in the output layer, and let the $n$-dimensional vectors $o_k = [o_{k1}, \ldots, o_{kn}]$ and $v_k = [v_{k1}, \ldots, v_{kn}]$ denote the mean and standard deviation respectively of the exact training data for the $k$th class $c_k$. The weighted distance of a training pattern $F_i$ from the $k$th class $c_k$ is defined as follows [44]:

$z_{ik} = \sum_{j=1}^{n} \left[ \frac{F_{ij} - o_{kj}}{v_{kj}} \right]^2 \quad \text{for } k = 1, \ldots, l$     (69)

where $F_{ij}$ is the value of the $j$th component of the $i$th pattern point. The membership of the $i$th pattern in class $k$, lying in the range $[0, 1]$, is defined as follows:

$\mu_k(F_i) = \frac{1}{1 + \left( \frac{z_{ik}}{f_d} \right)^{f_e}}$     (70)

where the positive constants $f_d$ and $f_e$ are the denominational and exponential fuzzy generators controlling the amount of fuzziness in the class membership set. Then, for the $i$th input pattern, the desired output of the $j$th output node is defined as follows:

$d_j = \mu_j(F_i)$     (71)
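Equations (69) and (70) together give the desired class membership, as the following illustrative C++ sketch shows.

    #include <cmath>
    #include <vector>

    // Class membership of equations (69)-(70) for pattern F with respect
    // to a class with mean o and standard deviation v; fd and fe are the
    // denominational and exponential fuzzy generators.
    double classMembership(const std::vector<double>& F,
                           const std::vector<double>& o,
                           const std::vector<double>& v,
                           double fd, double fe) {
        double z = 0.0;
        for (std::size_t j = 0; j < F.size(); ++j) {
            double t = (F[j] - o[j]) / v[j];
            z += t * t;                                  // equation (69)
        }
        return 1.0 / (1.0 + std::pow(z / fd, fe));       // equation (70)
    }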

According to this definition, a pattern can simultaneously belong to more than one class, and this is determined from the training set during the learning phase. The rule generation and knowledge encoding for configuring the Fuzzy MLP ANN is formulated using Rough sets [44]. The algorithm is able to deal with multiple objects corresponding to one decision attribute. From the perspective of Pattern Recognition, this implies using multiple prototypes to serve as representatives of any arbitrary decision region. The entire procedure is represented diagrammatically in figure 6.6. The principal task in the method of rule generation is to compute reducts with respect to a particular kind of information system and decision system. Relativised versions of the matrices and functions, viz. d-reducts and d-discernibility matrices, are used as the basic tools for computation. Let $S = \langle U, A \rangle$ be a decision table with $C$ and $D = \{d_1, \ldots, d_l\}$ its sets of condition and decision attributes respectively. The decision table $S = \langle U, A \rangle$ is divided into $l$ tables $S_i = \langle U_i, A_i \rangle$, $i = 1, \ldots, l$, corresponding to the $l$ decision attributes $d_1, \ldots, d_l$, where $U = U_1 \cup \ldots \cup U_l$ and $A_i = C \cup \{d_i\}$. Let $\{x_{i1}, \ldots, x_{ip}\}$ be the set of those objects of $U_i$ that occur in $S_i$, $i = 1, \ldots, l$. Now, for each $d_i$-reduct $B = \{b_1, \ldots, b_k\}$, a discernibility matrix, denoted $M_{d_i}(B)$, is defined from the $d_i$-discernibility matrix as follows [44]:

$c_{ij} = \{a \in B : a(x_i) \ne a(x_j)\}; \quad i, j = 1, \ldots, n$     (72)

For each object $x_j \in \{x_{i1}, \ldots, x_{ip}\}$, the discernibility function $f_{d_i}^{x_j}$ is defined as [44]:

$f_{d_i}^{x_j} = \bigwedge \left\{ \bigvee (c_{ij}) : 1 \le i, j \le n, \; j < i, \; c_{ij} \ne \emptyset \right\}$     (73)

where $\bigvee (c_{ij})$ is the disjunction of all members of $c_{ij}$. Then $f_{d_i}^{x_j}$ is brought to its conjunctive normal form. Thus a dependency rule $r_i$ is obtained, viz. $P_i \rightarrow d_i$, where $P_i$ is the disjunctive normal form of $f_{d_i}^{x_j}$, $j \in \{i_1, \ldots, i_p\}$. The dependency factor $df_i$ for $r_i$ is given by [44]:

$df_i = \frac{card(POS_i(d_i))}{card(U_i)}$     (74)

where $POS_i(d_i) = \bigcup_{X \in I_{d_i}} l_i(X)$, and $l_i(X)$ is the lower approximation of $X$ with respect to $I_i$. In this case, $df_i = 1$.

In the knowledge encoding phase, consider the feature $F_j$ for class $c_k$ in an $l$-class problem domain. The inputs for the $i$th representative sample $F_i$ are mapped to the corresponding three-dimensional feature space of $\mu_{low(F_{ij})}(F_i), \mu_{medium(F_{ij})}(F_i), \mu_{high(F_{ij})}(F_i)$. Let these be represented by $L_j, M_j, H_j$ respectively. As the method considers multiple objects in a class, a separate $n_k \times 3n$-dimensional attribute-value decision table is generated for each class $c_k$, where $n_k$ indicates the number of objects in $c_k$. The absolute distance between each pair of objects is computed along each attribute $L_j, M_j, H_j$ for all $j$. Equation (72) is modified to directly handle the real-valued attribute table consisting of fuzzy membership values, by defining

$c_{ij} = \{a \in B : |a(x_i) - a(x_j)| > Th\}; \quad i, j = 1, \ldots, n_k$

where $Th$ is an adaptive threshold. It is to be noted that the adaptivity of this threshold is built in, depending on the inherent shape of the membership function.
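One entry of the modified discernibility matrix can be computed as in the following sketch, where the attribute values of an object are its fuzzy membership values; the names are illustrative.

    #include <cmath>
    #include <set>
    #include <vector>

    // Entry c_ij of the thresholded discernibility matrix: the set of
    // attribute indices on which objects xi and xj differ by more than Th.
    std::set<int> discernibilityEntry(const std::vector<double>& xi,
                                      const std::vector<double>& xj,
                                      double Th) {
        std::set<int> cij;
        for (std::size_t a = 0; a < xi.size(); ++a)
            if (std::fabs(xi[a] - xj[a]) > Th)
                cij.insert(static_cast<int>(a));
        return cij;
    }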

While designing the initial structure of the RFMLP–NN [44], the union of the rules of the $l$ classes is considered. The input layer consists of the $3n$ attribute values, while the output layer is represented by the $l$ classes. The hidden layer nodes model the innermost operator in the antecedent part of a rule, which can be either a conjunct or a disjunct. The output layer nodes model the outer level operands, which can again be either a conjunct or a disjunct. For each inner level operator, corresponding to one output class and one dependency rule, one hidden node is dedicated. Only those input attributes that appear in this conjunct or disjunct are connected to the appropriate hidden node, which in turn is connected to the corresponding output node. Each outer level operator is modeled at the output layer by joining the corresponding hidden nodes. It is to be noted that a single attribute involving no inner level operators is directly connected to the appropriate output node via a hidden node, to maintain uniformity in the rule mapping.

6.5.4 Structure of RFMLP–NN Model

The initial structure of the four layered RFMLP–NN is designed here. The classification problem under consideration has four discrete input features or attributes, with four, three, three and two distinct values. Thus, the input layer of the RFMLP–NN has a total of 12 (= 4 + 3 + 3 + 2) nodes, each representative of an input feature value [44], [156]. The hidden layers map the input to the output layer, and the number of nodes in the hidden layers is chosen empirically to obtain a good model. The nodes in the hidden layers model the innermost operator in the antecedent part of a rule, which can either be a conjunct or a disjunct. The output layer gives the decision of the classifier, which is represented by the classes. Each output node represents a possible target class, and hence there are six output nodes in the RFMLP–NN. The nodes in this layer model the outer level operator, which can again be either a conjunct or a disjunct. The dependency factor considered for any rule is 1. Only those input attributes that appear in a conjunct or disjunct are connected to the appropriate hidden nodes, which in turn are connected to the corresponding output nodes. The winner takes all heuristic is used to determine class membership when multiple nodes are present in the output layer: the class of the output node with maximum activation or output is the class computed by the network. Next, the initial weight encoding procedure is described [44], [156]. Let the dependency factor for a particular dependency rule for class $c_k$ be given by $df = \alpha = 1$. The weight $w_{ki}^{(1)}$ between hidden node $i$ and output node $k$ is set to $\frac{\alpha}{fac} + \varepsilon$, where $fac$ refers to the number of outer level operands in the antecedent of the rule and $\varepsilon$ is a small random number taken to destroy any symmetry among the weights. Generally $fac \ge 1$, and each hidden node gets connected to only one output node. Let the initial weight considered at a hidden node be denoted by $\beta$. The weight $w_{ia_j}^{(0)}$ between an attribute $a_j$ (where attribute $a$ corresponds to low L, medium M or high H) and hidden node $i$ is set to $\frac{\beta}{facd} + \varepsilon$, where $facd$ is the number of attributes connected by the corresponding inner level operator. Since $facd \ge 1$, for an $l$-class problem domain there are at least $l$ hidden nodes. It may be noted that the number of hidden nodes is determined directly from the dependency rules and depends on the form in which the antecedents are present in the rules.
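The initial weight encoding can be sketched as follows; the range of the small random number ε is an assumption made for illustration.

    #include <random>

    // Initial weight between hidden node i and output node k:
    // alpha / fac + eps, where alpha = df and fac counts outer operands.
    double hiddenToOutputWeight(double alpha, int fac, std::mt19937& g) {
        std::uniform_real_distribution<double> eps(-0.01, 0.01); // assumed range
        return alpha / fac + eps(g);
    }

    // Initial weight between input attribute a_j and hidden node i:
    // beta / facd + eps, where facd counts inner-level attributes.
    double inputToHiddenWeight(double beta, int facd, std::mt19937& g) {
        std::uniform_real_distribution<double> eps(-0.01, 0.01);
        return beta / facd + eps(g);
    }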

A training algorithm is used to determine the values of the network parameters which best map a given set of input patterns to the desired outputs. The most commonly used training algorithm is the back propagation algorithm [79]; the term back propagation refers to the direction of propagation of the error. The goal of the training regimen is to adjust the weights and biases of the network so as to minimize a chosen cost function. Though several cost functions are available, the function appropriate for classification problems is the cross entropy function, which is a measure of the relative entropy between different classes. This function needs to be minimized, and the back propagation algorithm uses a form of gradient descent to update the weights. A learning rate scales the correction term in revising the weights during the training process and hence governs the rate of learning. In addition, a momentum parameter is also used to speed up the training process; this parameter further scales the correction based on a memory of the size of the previous increment to a weight. In the RFMLP–NN model, a variant of the back propagation algorithm with momentum learning is used, as specified by [123], [156]. The network is initialized with random weights, and the training algorithm modifies the weights according to the procedure discussed above. Since random initial weights are specified, the network is trained multiple times before the best model is chosen. Further, in each run the entire training data set is presented to the network multiple times, and each view is known as an epoch. The number of epochs per run and the number of runs are user specified inputs to the training algorithm. The 1696 optimal schedules obtained by the GA represented a total of 61056 scheduled operations (1696 schedules × 36 operations/schedule). The assignment of input features and target classes was done for each operation according to the classification scheme described in the previous subsection. Sample data for the classification task is shown in table 6.7. The collection of input feature (attribute) and target class pairs used to train the network is called the training set. The testing set contains patterns not used for training and is used to evaluate the performance of the network. Even with testing, the model might overfit the training data, causing degradation in performance when tested on new patterns. Cross validation is a useful technique which avoids this overfitting problem by periodically testing model performance on a cross validation data set during the training phase. The best model is the one with the least cross validation error. The classification data set was split into training, cross validation and testing data sets with 60%, 20% and 20% memberships. Different network architectures were experimented with to determine the best possible RFMLP–NN classifier. The most appropriate model chosen was a two hidden layered RFMLP–NN with a 12–12–10–6 architecture and hyperbolic tangent transfer functions at the hidden layers, as shown in figure 6.7. This network had the best classification accuracy for the testing data set. The training parameters for the 12–12–10–6 RFMLP classifier are given in table 6.8 [44], [156].

6.5.5 RFMLP–NN Job Scheduler

The scheduler is developed based on the output of the RFMLP–NN [44], [156]. Given an input scheduling problem of size m × n, the scheduler first maps each of the mn operations to an input pattern according to the classification scheme described in the previous subsection. These mn input patterns are then presented to the trained RFMLP–NN, which assigns a priority index to each of them. The Giffler–Thompson algorithm [44], which generates active schedules, is used for schedule generation. An active schedule is a schedule in which no operation can be started earlier without delaying some other operation. The set of active schedules is guaranteed to contain an optimum solution for any regular performance measure like makespan.
The Giffler–Thompson algorithm iteratively chooses an operation to schedule from among the set of available operations based on the priority index. The operations are first ranked based on the priority index assigned to each operation by the RFMLP–NN, and the operation with the lowest rank is scheduled on the identified machine. This process continues till all mn operations in the problem have been scheduled. The scheduler is implemented in the C++ programming language as part of an intelligent manufacturing, planning and scheduling framework. The software developed is generic in nature and can also generate schedules with other priority dispatching rules, like shortest processing time.
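A simplified sketch of this priority driven construction is given below for a 6 × 6 problem; a full Giffler–Thompson implementation additionally restricts the choice to the conflict set on the machine whose earliest schedulable operation completes first. The types and data layout are illustrative assumptions.

    #include <algorithm>
    #include <vector>

    struct Op { int machine, duration, priority; };

    // Greedy construction: ops holds 6 jobs x 6 operations in technological
    // order; at each step the front operation with the lowest priority
    // index among the jobs is scheduled. Returns the makespan.
    int buildSchedule(const std::vector<Op>& ops) {
        std::vector<int> jobReady(6, 0), machReady(6, 0);
        std::vector<int> nextOp(6, 0);   // next unscheduled op per job
        int makespan = 0;
        for (int scheduled = 0; scheduled < 36; ++scheduled) {
            int best = -1;
            for (int j = 0; j < 6; ++j) {
                if (nextOp[j] >= 6) continue;     // job finished
                const Op& o = ops[j * 6 + nextOp[j]];
                if (best < 0 ||
                    o.priority < ops[best * 6 + nextOp[best]].priority)
                    best = j;                     // lowest priority index first
            }
            const Op& o = ops[best * 6 + nextOp[best]];
            int start = std::max(jobReady[best], machReady[o.machine]);
            jobReady[best] = machReady[o.machine] = start + o.duration;
            makespan = std::max(makespan, start + o.duration);
            ++nextOp[best];
        }
        return makespan;
    }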

[Figure 6.6 summarizes the knowledge encoding procedure: input patterns are given a linguistic representation through π-functions and a multi-object representation of the training data; from the reduced attribute value table, reducts and discernibility functions (in CNF) are computed; dependency rules and dependency factors (in DNF) are derived; these determine the network connectivity and the initial connection weights, which are then tuned by learning and refinement.]

Figure 6.6: Block diagram of the procedure for knowledge encoding of the Fuzzy Multi Layer Perceptron Neural Network

Pattern-ID   Operation   Process Time   Remaining Time   Machine Load
1            First       Short          Long             Light
2            Middle      Medium         Long             Light
…            …           …              …                …
61055        Later       Medium         Short            Heavy
61056        Last        Short          Short            Light

Table 6.7: Sample data for classification task

Network Parameter                                                Value
Step Size                                                        0.01
Momentum Factor                                                  0.7
Number of runs                                                   10
Number of epochs per run                                         10000
Number of epochs without improvement in cross validation error   500

Table 6.8: Training parameters for the 12–12–10–6 RFMLP Classifier
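The momentum learning update, with the step size and momentum factor of table 6.8, can be sketched as follows; the exact update rule of [123] may differ in detail.

    // One weight update of back propagation with momentum learning:
    // grad is the back propagated gradient of the cross entropy cost
    // with respect to the weight.
    struct WeightState { double w = 0.0, dwPrev = 0.0; };

    void momentumUpdate(WeightState& s, double grad,
                        double step = 0.01, double momentum = 0.7) {
        double dw = -step * grad + momentum * s.dwPrev; // scaled correction
        s.w += dw;                                      // plus memory of last step
        s.dwPrev = dw;
    }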

Figure 6.7: Rough Fuzzy Multi Layer Perceptron Neural Network (12–12–10–6), with output priority classes 0–5

6.6 Experimental Results and Comparisons

In this section some experimental results are presented, conducted on some well known real life situations and on data sets of varying dimension and size.

6.6.1 Simulation results of the examination timetable problem on Netaji Subhas Open University, Kolkata data

The test data is generated through extensive simulation of the existing system at the University for the departments of Mathematics, Computer Science, Physics, Life Science, Management, English and Bengali [46]. The effectiveness of the mathematical models of the FILP technique is further demonstrated by comparing them with different AI based heuristic techniques on the 13 Carter benchmark datasets. The results for the abovementioned mathematical models 1, 2 and 3 are summarized in tables 6.9, 6.10 and 6.11 respectively. A small data set is used, as the number of constraints grows exponentially, which increases the size of the problem. The solution is found in Model 2 after a smaller number of iterations than in Model 1, and the number of variables and constraints is significantly lower. However, solution times are not directly proportional to the size of the problem: the Management data in Model 2 took longer than the Computer Science data on the same model, despite the fact that the two departments have the same number of variables. This is generally because the performance of the algorithm also depends on other factors, such as how deep the constraints lie towards the optimal polytope. Using polyhedral combinatorics, it is worthwhile to find the facets, or deepest cuts, associated with such a model. This trims down infeasibilities and makes it easier for the branch and bound procedure to finish up the remaining problem. Such an approach brings down the size of problem that can be solved by mathematical programming procedures, and it has been applied with success to several Linear Programming models. Generally, the two models give the same solution, but Model 2 is smaller in size and can be used to solve bigger problem sizes, as it gives a set of optimal solutions which can be used to test the performance of heuristics. Further, on comparing the results of Models 1 and 2, it is found that the execution time of Model 2 is lower than that of Model 1. This difference exists mainly because the number of variables and constraints in Model 2 is less than in Model 1. Finally, in Model 3 the solution is found after a still smaller number of iterations than in Models 1 and 2, and the number of variables and constraints is reduced significantly. On comparing Model 3 with Models 1 and 2, it is found that infeasibilities are cut down, making it a better candidate for the branch and bound procedure than the other models. Generally, the three models give the same solution, but Model 3 is smaller in size, gives a set of optimal solutions, and can be used to solve bigger real life problem sizes, which can effectively be used to test the performance of heuristics.

A comparative study of Models 1, 2 and 3 with respect to the heuristic Integer Linear Programming model is given in tables 6.12, 6.13 and 6.14. On comparing the results in tables 6.9, 6.10 and 6.11 with tables 6.12, 6.13 and 6.14, it is evident that the execution times of Models 1, 2 and 3 are lower than those of the heuristic. This is mainly because the various allocation variables associated with Models 1, 2 and 3 are formulated using fuzzy numbers, for which a feasible solution is obtained in less execution time. The rigidity involved in the allocation variables of the Integer Linear Programming heuristic mainly accounts for its higher execution times.

Model 1
Departments        Variables  Constraints  Execution Time (seconds)
Mathematics        336        72,666       24.46
Computer Science   446        112,025      26.36
Physics            336        86,769       25.19
Life Science       336        164,396      27.46
Management         446        110,019      26.30
English            286        66,889       22.18
Bengali            286        64,886       22.18

Table 6.9: Test results for FILP Model 1

Model 2
Departments        Variables  Constraints  Execution Time (seconds)
Mathematics        124        7,786        24.16
Computer Science   127        6,669        24.07
Physics            124        8,968        24.25
Life Science       124        9,846        24.44
Management         127        10,896       25.45
English            107        5,787        21.07
Bengali            107        4,996        21.69

Table 6.10: Test results for FILP Model 2

Model 3
Departments        Variables  Constraints  Execution Time (seconds)
Mathematics        48         255          12.37
Computer Science   44         290          9.96
Physics            44         346          11.33
Life Science       48         396          12.86
Management         44         425          12.46
English            27         227          7.66
Bengali            27         207          7.37

Table 6.11: Test results for FILP Model 3

The FILP technique is also tested on the data instances prepared by Carter, Laporte and Lee in 1996, comprising 13 real world examination timetable problems from 3 Canadian high schools, 5 Canadian, 1 American, 1 British and 1 Middle-East universities, readily available from the website http://www.cs.nott.ac.uk/~rxq/data.htm. These datasets measure the performance of approaches to the ETP and are carefully prepared to mimic the real world ETP at Netaji Subhas Open University,

Kolkata, with different sizes and supersets of constraints. The datasets are of varying sizes with respect to different parameter values. To demonstrate the significance of the FILP technique, a comparative performance of best and mean cost, as well as execution times, is made with respect to seven different AI based heuristics when applied to the Carter benchmark datasets. These results are given in tables 6.15 and 6.16 respectively.

Model 1 as Integer Linear Programming heuristic
Departments        Variables  Constraints  Execution Time (seconds)
Mathematics        336        72,666       26.55
Computer Science   446        112,025      28.90
Physics            336        86,769       27.69
Life Science       336        164,396      29.98
Management         446        110,019      28.56
English            286        66,889       24.75
Bengali            286        64,886       24.79

Table 6.12: Test results for Model 1 as ILP heuristic

Model 2 as Integer Linear Programming heuristic
Departments        Variables  Constraints  Execution Time (seconds)
Mathematics        124        7,786        26.36
Computer Science   127        6,669        26.09
Physics            124        8,968        26.37
Life Science       124        9,846        26.86
Management         127        10,896       32.64
English            107        5,787        26.05
Bengali            107        4,996        24.69

Table 6.13: Test results for Model 2 as ILP heuristic

Model 3 as Integer Linear Programming heuristic
Departments        Variables  Constraints  Execution Time (seconds)
Mathematics        48         255          14.75
Computer Science   44         290          11.96
Physics            44         346          13.37
Life Science       48         396          14.98
Management         44         425          14.69
English            27         227          10.75
Bengali            27         207          10.37

Table 6.14: Test results for Model 3 as ILP heuristic

The following AI based heuristic abbreviations are used: M1: Roulette Wheel Graph Coloring Technique; M2: Heuristic Combinations for Hyper Heuristic Technique; M3: Ant Colonization Technique; M4: Ahuja–Orlin Technique; M5: Fuzzy Logic Technique; M6: Ordering Heuristics Technique; M7: Decision Tree Based Routine Generation Technique; FILPM3: Fuzzy Integer Linear Programming Technique (Model 3). Mathematical model 3 of FILP is used because it requires the minimum number of variables in its formulation. The cost of each timetable is calculated using the proximity cost function defined by the following equation, which assesses the quality of a timetable in terms of how well the examinations are spread. The cost of this function is minimized for each ETP.

$\frac{\sum_{i,j} w(|e_i - e_j|) N_{ij}}{S}$     (75)

where $|e_i - e_j|$ is the distance between the periods for each pair of examinations $(e_i, e_j)$ with common students, $N_{ij}$ is the number of students common to both examinations, $S$ is the total number of students, and $w(1) = 16$, $w(2) = 8$, $w(3) = 4$, $w(4) = 2$, $w(5) = 1$, with $w(n) = 0$ when $n > 5$; i.e., the smaller the distance between periods, the higher the weight allocated. The best solutions generated are accessible from http://saturn.cs.unp.ac.za/~nelishiap/et/heuristics.htm. The proximity costs of each of these solutions are highlighted in table 6.15. The mean cost for a problem is the average of the proximity costs of the best solutions obtained in each of the forty runs performed. The overall system did not include mechanisms to ensure that feasible timetables were produced; however, the different heuristic techniques produced feasible timetables for all benchmarks.
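Equation (75) can be computed as in the following sketch; the data layout and names are illustrative assumptions.

    #include <cstdlib>
    #include <vector>

    // Proximity cost of equation (75): exams[k] is the period assigned to
    // examination k, N[i][j] the number of students common to exams i and
    // j, and S the total number of students.
    double proximityCost(const std::vector<int>& exams,
                         const std::vector<std::vector<int>>& N, double S) {
        static const double w[6] = {0, 16, 8, 4, 2, 1}; // w(1)..w(5), else 0
        double cost = 0.0;
        for (std::size_t i = 0; i < exams.size(); ++i)
            for (std::size_t j = i + 1; j < exams.size(); ++j) {
                int d = std::abs(exams[i] - exams[j]);
                if (d >= 1 && d <= 5) cost += w[d] * N[i][j];
            }
        return cost / S;
    }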

Datasets   M1             M2             M3             M4             M5             M6             M7             FILPM3
CAR–F–92   6.21/6.25      4.28/4.36      4.80/4.86      4.40/4.42      4.54/4.60      4.38/4.40      4.31/4.32      4.27/4.31
CAR–S–91   7.01/7.07      4.97/5.10      5.70/5.82      5.20/5.30      5.29/5.37      5.08/5.18      5.24/5.30      4.95/4.96
EAR–F–83   42.81/44.24    36.86/37.22    36.86/36.96    36.90/34.98    37.02/37.18    38.44/38.45    37.75/37.79    36.78/36.80
HEC–S–92   12.90/12.96    11.85/11.85    11.90/11.97    12.30/12.31    12.79/12.80    11.96/11.97    11.86/11.87    11.85/11.86
KFU–S–93   18.47/18.50    14.62/14.62    15.00/15.04    14.64/14.67    15.81/15.84    14.67/14.70    14.96/14.99    14.50/14.52
LSE–F–91   15.62/15.64    11.14/11.14    12.10/12.12    11.20/11.21    13.46/13.47    11.69/11.70    12.80/12.86    11.14/11.17
PUR–S–93   7.92/7.99      4.73/4.78      5.40/5.44      –              –              –              4.90/4.97      4.72/4.75
RYE–S–93   10.50/10.54    9.65/9.70      10.20/10.26    9.70/9.75      10.39/10.44    9.89/9.90      9.72/9.79      9.65/9.66
STA–F–83   161.00/161.07  158.33/158.33  158.40/158.47  159.20/159.27  160.75/160.76  158.72/158.75  158.37/158.45  158.30/158.31
TRE–S–92   10.98/11.04    8.48/8.52      8.80/8.84      8.48/8.48      9.31/9.37      8.78/8.87      8.89/8.90      8.37/8.37
UTA–S–92   4.76/4.80      3.40/3.43      3.80/3.84      3.60/3.64      3.57/3.60      3.55/3.62      3.44/3.46      3.35/3.37
UTE–S–92   29.69/29.70    28.88/28.88    28.97/28.98    29.00/29.02    30.32/30.32    29.63/29.65    29.07/29.10    28.86/28.87
YOR–F–83   40.83/40.84    40.74/41.52    41.60/41.96    41.20/41.28    40.80/40.87    40.85/40.86    40.77/40.80    40.72/40.72

(Entries are best/mean proximity cost; – indicates that no result is reported for that technique.)

Table 6.15: Comparison of best and mean cost of FILP (Model 3) technique with other AI based Heuristic techniques

From tables 6.15 and 6.16, it is evident that the different AI based techniques produced appreciable results on the different data sets. Experiments are performed on four small datasets, four medium

datasets and five large datasets, tested using 40 runs for each instance. The proposed FILP-M3 technique produced the best overall results, performing as well as or better than the other heuristic techniques on the 13 Carter benchmarks. The time consumed for each dataset is variable in nature and depends upon the different parameters involved in each technique, though the execution times are more or less of the same order across techniques. The FILP technique takes a smaller amount of time to generate a satisfactory solution in comparison to the other heuristics. Although the study presented in this work focuses on developing a methodology that generalizes well over a spectrum of techniques producing significant results for one or more datasets, the performance of this method is also compared to the best results cited in the literature for the Carter benchmarks, to assess the potential of the methodology. This is illustrated by tabulating the difference from the best cited results, as given in table 6.17. It is evident that, even though the method described here performs only a construction phase and not an improvement phase, the results produced by the FILP-M3 technique are comparable to the best results cited in the literature for the Carter benchmarks. Furthermore, the FILP-M3 method has produced better results than some of the methodologies and outperformed them on a number of benchmarks. The results presented in this section show the effectiveness and potential of the FILP-M3 technique as a general methodology for producing good quality solutions to the ETP at Netaji Subhas Open University, Kolkata, India.

Datasets   M1    M2    M3    M4    M5    M6    M7    FILPM3
CAR–F–92   10    7     9     8     9.50  8     7     7
CAR–S–91   12    10    11    10    10    10    10    10
EAR–F–83   4     2     2     1.50  2     3     2     1.50
HEC–S–92   14    11    10    9.50  11    11    11    9.55
KFU–S–93   12    5     6     5.50  7     5     6     5
LSE–F–91   5     2     3     2     4     3     3.50  2
PUR–S–93   110   80    90    –     –     –     87    80
RYE–S–93   5     5     5     5     5.50  5     5     5
STA–F–83   31    0.50  0.70  0.80  0.90  0.55  0.50  0.50
TRE–S–92   3     2     2     2     2.50  2     2     2
UTA–S–92   10    9     9.50  9     9     9     9     9
UTE–S–92   2     1     1     1     1.20  0.90  1     1
YOR–F–83   3     2     2     2     2     3     2     2

(– indicates that no result is reported for that technique.)

Table 6.16: Comparison of Execution Times (in minutes) of FILP (Model 3) technique with other AI based Heuristic techniques

6.6.2 Simulation results of the University Course Timetable Problem on St. Xavier's College, Kolkata data

Ideally, simulations should be performed for all parameter combinations of table 6.4 before deciding the best combination of operators to be adopted in the ultimate implementation. However, the number of combinations is prohibitive for exhaustive evaluation, so an elitism like technique has been applied in order to reduce the number of simulations needed. The technique is briefly enumerated below [43]. The first simulation experiment was conducted for the standard GA setup discussed in section 6.4.5. The experiment consisted of 20 independent runs. After completion of the runs, three statistical figures were calculated, viz. the overall best solution quality achieved, the overall worst solution quality achieved and the average solution quality achieved throughout the 20 runs. Then the first operator setup of table 6.4 was added to the standard GA setup and another simulation round of 20 runs was launched. The results were compared to those of the standard setup via the three statistical

figures mentioned above. If the new setup performed better than the original, the new setup was adopted as the best so far setup; otherwise the tested setup was discarded. With this method only 25 simulations of 20 runs each are needed to evaluate the operators and their parameters of table 6.4. The validity of this method rests on the assumption that the operators are more or less independent of each other, which is largely justified by the experimental results. The simulation results for the operators of table 6.4 are given in table 6.18, where the adopted setups are displayed in bold typeface. From table 6.18 it is clear that the operators that exhibited the best performance, and were adopted in the GA scheme, are [74]: (i) Uniform Crossover; (ii) Window Mutation Operator (Probability = 0.4); (iii) Swap Chromosome Operator (Probability = 0.1); (iv) Mutate Chromosome Operator (Probability = 0.1); (v) Varying Fitness Function with square increase; (vi) GA Population of 400 genotypes; and (vii) Micro GA Combinatorial Hill Climbing Operator.
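The adoption rule can be sketched as follows; the exact comparison of the three statistical figures is an assumption made for illustration.

    #include <functional>
    #include <vector>

    struct Stats { double best, mean, worst; };  // of 20 independent runs

    // Incrementally adopt a candidate operator setup only if its run
    // statistics (lower is better) improve on the best-so-far setup.
    int adoptSetups(const std::vector<int>& candidates, int standardSetup,
                    const std::function<Stats(int)>& runSimulations) {
        int bestSetup = standardSetup;
        Stats bestStats = runSimulations(standardSetup);
        for (int s : candidates) {
            Stats t = runSimulations(s);
            if (t.best < bestStats.best && t.mean < bestStats.mean
                && t.worst < bestStats.worst) {
                bestSetup = s;                   // adopt improved setup
                bestStats = t;
            }                                    // otherwise discard it
        }
        return bestSetup;
    }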

Datasets   FILP (Model 3)  Best result cited  Difference
CAR–F–92   4.27            4.28               0.01
CAR–S–91   4.95            4.97               0.02
EAR–F–83   36.78           36.86              0.08
HEC–S–92   11.85           11.85              0.00
KFU–S–93   14.50           14.62              0.12
LSE–F–91   11.14           11.14              0.00
PUR–S–93   4.72            4.73               0.01
RYE–S–93   9.65            9.65               0.00
STA–F–83   158.30          158.33             0.03
TRE–S–92   8.37            8.48               0.11
UTA–S–92   3.35            3.40               0.05
UTE–S–92   28.86           28.88              0.02
YOR–F–83   40.72           40.74              0.02

Table 6.17: Comparison of the results obtained by FILP (Model 3) technique and the best results cited in literature

By adding these operators to the standard GA scheme, it was possible to improve the best overall solution from a value of 61087 for the standard setup down to a value of 22982 for the advanced setup. Roughly 60 hard constraints are violated to obtain the value of 61087, while only 22 hard constraints are violated for the value of 22982. The next step tests the effectiveness of the domain specific Hill Climbing operators discussed in section 6.4.5. For this reason four more simulation experiments were conducted, each incorporating one of the four domain specific operators. Again 20 runs were executed for each experiment, and each time the results were compared to the best so far results. When an operator was found to enhance the performance of the GA optimizer, it was adopted. The simulation results for these operators are shown in table 6.19. As is obvious from table 6.19, each one of the four domain specific operators enhances the performance of the GA optimizer, and thus all four operators were adopted in the final scheme. The domain specific operators managed to improve the best overall solution from the value of 22982 for the advanced setup down to a value of 2809. The optimal solution of 2809 can be analyzed into two parts: (i) the hard constraints violation part, which is 2000; and (ii) the soft constraints violation part, which is 809. The value of 2000 for the first part means that 2 hard constraints are violated at the optimal solution. The value of 809 for the second part means that all soft constraints are fully satisfied and that gaps within classroom and teacher schedules are

adequately minimized. The lower value for the soft constraints, i.e. 809, as compared to the value of 2000 for the hard constraints, is attributed to the treatment of the measure of violation of the soft constraints using Fuzzy sets [155], [159].

Setup     Mean Quality  Best Quality  Worst Quality
Standard  74192         61087         97073
1 (a)     72286         52046         90669
1 (b)     67886         60009         75969
2 (a)     72389         56966         82976
2 (b)     81690         65010         93024
3 (a)     61326         50024         75996
3 (b)     61484         44026         77987
4 (a)     66282         42010         82997
4 (b)     65090         52024         75999
5 (a)     70095         55998         82998
5 (b)     65887         52990         82884
6 (a)     65888         54027         87998
6 (b)     65186         55997         75987
7 (a)     71665         60995         80024
7 (b)     76882         52030         88987
8 (a)     53490         45036         64995
8 (b)     59680         44985         76996
9         68479         46046         91980
10        58278         47982         70999
11 (a)    50265         38978         66980
11 (b)    52269         36975         62936
11 (c)    57696         47660         69526
12 (a)    53277         38984         88024
12 (b)    42036         26002         58982
13        29782         22982         35030

Table 6.18: Simulation results for standard operators and parameters

The final step encodes and evaluates a manual solution for a similar timetable problem that was already available. The manual solution was evaluated through the same fitness function that was used for the FGA optimizer. The comparative results of the manual solution and the FGA solution are given in table 6.20, where the objective value is the part of the fitness value attributed to the violation of soft constraints, i.e. the objectives, and the penalty value is the part of the fitness value attributed to hard constraints. The classroom hour gaps figure is the total number of hours within classroom schedules during which classrooms are unoccupied, and the teacher hour gaps figure is the total number of hours within each teacher's schedule during which the teacher does not have a class assignment.

Setup        Mean Quality  Best Quality  Worst Quality
Day Change   20886         14446         28686
Fix Teacher  18169         12421         28684
Fix Room     12996         9466          22989
Fix Day      6936          2809          11130

Table 6.19: Simulation results for domain specific operators

As is obvious from table 6.20, the FGA optimizer does not manage to satisfy all hard constraints, although it comes very close to achieving it, producing a solution with only 2 violated constraints. On the other hand, it is also evident that the solution produced by the FGA satisfies the soft constraints better than the manual solution. The FGA solution scores an objective value of 809, compared to 1286 for the manual solution. The value of 809 corresponds to only 5 classroom hour gaps and only 1 teacher hour gap, compared to 90 and 98 respectively for the manual solution. It seems that the manual solution was the outcome of a focused effort to satisfy the hard constraints, while the soft constraints did not receive much importance. On the other hand, the GA solution is well developed concerning the soft constraints, as they have been treated with fuzzy membership functions. In table 6.21, experiments are performed on different datasets, viz. small, medium and large. In this process, the following GA based heuristic abbreviations are used: GA1: Genetic Algorithm 1; GA2: Genetic Algorithm 2; GA3: Genetic Algorithm 3; GA4: Genetic Algorithm 4; GA5: Genetic Algorithm 5; GA6: Genetic Algorithm 6; GA7: Genetic Algorithm 7; FGH: Fuzzy Genetic Heuristic [43].

Feature                              Manual Solution  Fuzzy Genetic Algorithm Solution
Fitness                              1286             2809
Objective Value                      1286             809
Penalty Value                        0                2000
Number of Hard Constraints Violated  0                2
Number of Soft Constraints Violated  0                0
Classroom Hour Gaps                  90               5
Teacher Hour Gaps                    98               1

Table 6.20: Comparison between manual and Fuzzy Genetic Algorithm solution

Datasets  GA1     GA2     GA3     GA4     GA5     GA6     GA7     FGH
Small1    11.55   15.79   16.60   17.45   14.56   16.62   18.32   19.76
Small2    11.59   15.86   16.64   17.46   14.57   16.64   18.33   19.90
Small3    11.62   15.96   16.66   17.47   14.59   16.66   18.34   19.86
Small4    11.64   15.98   16.69   17.50   14.60   16.67   18.35   19.89
Small5    11.69   15.99   16.86   17.52   14.69   16.69   18.37   19.89
Medium1   109.96  106.90  116.99  115.90  111.30  112.30  112.28  119.07
Medium2   104.86  109.50  107.84  107.86  86.32   80.32   79.86   119.56
Medium3   110.99  100.79  118.30  117.99  114.37  112.37  112.16  118.66
Medium4   –       –       104.56  115.57  114.54  112.50  112.37  117.96
Medium5   –       –       105.99  115.96  85.69   75.69   69.86   118.98
Large     –       –       –       –       116.32  119.30  119.55  119.99

(– indicates that no result is reported for that technique.)

Table 6.21: Comparison of Execution Times (in minutes) of the Fuzzy Genetic Heuristic solution with other GA based Heuristic techniques

6.6.3 Illustration of ACO-LCS Algorithm

To illustrate the effectiveness of the ACO-LCS algorithm [31], three types of sequences are considered, viz. (i) a sequence consisting of n elements when n is a power of 2; (ii) a sequence consisting of n elements when n is even; and (iii) a sequence consisting of n elements when n is odd. These sequences are formed by elements which are determined by the amount of pheromone deposited by the ants. In the first case, suppose the sequence consists of 8 distinct elements and the sequence is processed such that the LCS can be generated. Since 8! = 40320, this is the number of possible cases

that are to be studied; this number of cases is initially reduced by a substantial amount through pairwise comparisons. This is done by drawing a directed path from one element to another that has pheromone deposited by the ants, as shown in figure 6.8.

Figure 6.8: Directed path from one element to another depicting pheromone deposited by ants

where the $a_i$'s, $1 \le i \le n/2$, and $b_j$'s, $1 \le j \le n/2$, are the sets of smaller and larger elements. As the $a_i$'s and $b_j$'s contain four elements each, 576 combinations are possible. The proposed algorithm specifically selects those permutations which have a higher amount of pheromone, by deleting those permutations of the $a_i$'s and $b_j$'s sets which are identical in nature, as determined by tracing the path from minimum to maximum. As a result of this, 8 possible cases arise on each side, and 64 combinations are possible. The 8 possible $a_i$ sets are 1234, 1243, 2143, 2413, 2341, 2134, 4321, 3214; the identical nature of 1234 with respect to the other permutations is shown in figure 6.9.

Figure 6.9: Identical nature of 1234 with respect to other permutations

Similarly, the possible $b_j$ sets are 1234, 1423, 1243, 1432, 2143, 3124, 4231, 4123; the identical nature of 1234 with respect to the other permutations is shown in figure 6.10. So there are 64 combinations possible among the $a_i$'s and $b_j$'s. The following is observed after some logical computational steps; the 16 sets of length 6 are (a1234, b1432), (a1234, b4231), (a1243, b1423), (a1243, b4123), (a2143, b1423), (a2143, b4123), (a2413, b3124), (a2413, b2143), (a2341, b3124), (a2341, b2143), (a2134, b4231), (a2134, b1423), (a3214, b1234), (a3214, b1243), (a4321, b1234), (a4321, b1243).

Figure 6.10: Identical nature of 1234 with respect to other permutations

The diagram of (a1234, b1432) is shown in figure 6.11.

Figure 6.11: Diagram of (a1234, b1432)

The 8 sets of length 4 are (a1234, b1234), (a1243, b1243), (a2143, b2143), (a2413, b1423), (a2341, b1432), (a2134, b3124), (a4321, b4231), (a3214, b4123). The diagram of (a1234, b1234) is shown in figure 6.12.

Figure 6.12: Diagram of (a1234, b1234)

The 40 sets of length 5 are (a1234, b1423), …, (a4321, b1432). The diagram of (a1234, b1423) is shown in figure 6.13.

Figure 6.13: Diagram of (a1234, b1423)

Now consider the second type of sequence, which consists of an even number of elements (say 6). Then the 6 sets of length 5 each are (a123, b213), (a213, b123), (a231, b123), (a231, b312), (a321, b132), (a321, b213). The diagram of (a123, b213) is shown in figure 6.14.

Figure 6.14: Diagram of (a123, b213)

Figure 6.15: Diagram of (a123, b123)

The 8 sets of length 4 are (a123, b123), (a123, b132), (a123, b312), (a213, b132), (a213, b213), (a213, b312), (a231, b132), (a321, b312); the typical diagram of (a123, b123) is shown in figure 6.15. The 2 sets of length 6 constitute the best possible case and yield a fully sorted sequence; these are (a231, b213), (a321, b123), and the diagram of (a231, b213) is shown in figure 6.16.

Figure 6.16: Diagram of (a231, b213)

Finally, consider the case when n is odd. Here an additional step is required to determine both the intermediate maximum and minimum element, as shown below. Taking a sequence consisting of an odd number of elements (say 5) gives 6 sets of length 4, as follows: (a12, b12, 1), (a12, b21, 1), (a21, b12, 1), (a21, b21, 1), (a12, b12, 1), (a21, b21, 1), where 1 is the last element; the diagram of (a12, b12, 1) is shown in figure 6.17.

Figure 6.17: Diagram of (a12, b12, 1)

The 2 sets of length 3 are (a21, b21, 1), (a12, b12, 1); the diagram of (a21, b21, 1) is shown in figure 6.18.

Figure 6.18: Diagram of (a21, b21, 1)

The 4 sets of length 5 each constitute the best possible case and yield a fully sorted sequence: (a12, b21, 1), (a21, b12, 1), (a12, b21, 1), (a21, b12, 1); the diagrams of (a12, b21, 1) and (a21, b12, 1) are shown in figures 6.19 and 6.20 respectively.

Figure 6.19: Diagram of (a12, b21, 1)

Figure 6.20: Diagram of (a21, b12, 1)

6.6.4 Performance illustration of RFMLP–NN Classifier for JSP

Here the performance of the RFMLP–NN classifier and scheduler [44] on a test set of problems is presented. A confusion matrix, which compares the desired classification (the GA solutions) with the output of the classifier on the test data set, is given in table 6.22 and is used to evaluate the performance of the RFMLP–NN classifier. In the confusion matrix, the diagonal entries represent the number of test set instances classified correctly, as set by the GA, for the RFMLP–NN classifier. The classification accuracy for each class is calculated by dividing the number of correct classifications by the total number of instances in that class. It can be observed from the table that the deviation from the correct classification is often only by one class. This deviation can be attributed to the presence of considerable noise in the classification data set. The two main sources of noise are: (i) Genetic Algorithm assignments: the GA assigned different priorities to the same operation in different schedules, since multiple sequences may produce the same optimal makespan. This led to some ambiguity in the classification data set, with identical training patterns having different target features, which increased the complexity of the learning task. (ii) Encoding of the classification problem: the chromosome sequences are mapped according to the classification scheme into training patterns. While this generalization reduced the dimensionality of the data, which was desirable for comprehensibility, it represented a loss of information and a source of noise for the classification task. A comparative summary of the makespan of the schedules generated by GA, RFMLP–NN, SPT and other priority dispatching rules for the ft06 instance is given in table 6.23. The performance of the Attribute Oriented Induction (AOI) rule set is reported from [99]. AOI is a rule induction method using concept hierarchies to aggregate the membership of tuples, and it produced a set of rules which assign a priority to an operation based on either mean or mode placement. Makespans for schedules built using dispatching rules other than SPT are from an empirical study [44]. Only the GA is able to achieve the optimal makespan of 55 units. The RFMLP–NN developed here achieves a makespan of 58 units, a deviation of 3 time units (5.45%) from the optimum makespan. The

deviations of the other methods ranged from 12 to 29 units (21.8% – 52.7%). The performance of the RFMLP–NN is considerably better than the other methods in scheduling the benchmark 6 × 6 problem.

Output \ Desired             Priority  Priority  Priority  Priority  Priority  Priority
                             (Zero)    (One)     (Two)     (Three)   (Four)    (Five)
Priority (Zero)              819       86        7         0         0         0
Priority (One)               356       869       246       9         0         0
Priority (Two)               221       496       816       186       9         0
Priority (Three)             0         32        424       1119      532       46
Priority (Four)              0         0         0         66        319       119
Priority (Five)              0         0         5         56        521       1237
Classification Accuracy (%)  64.79     66.29     59.46     86.56     28.46     95.37
Total Accuracy (%)           66.98

Table 6.22: Confusion matrix of the 12–12–10–6 RFMLP Neural Network Classifier

To assess the generalization capabilities of the RFMLP–NN, a test data set consisting of 10 randomly generated 6 × 6 problem scenarios is constructed. The motivation in using a test set of 6 × 6 problems is to keep the scheduling complexity similar to the benchmark ft06 problem, while altering the processing times and precedence constraints of the operations. Schedules based on the different approaches are built using the Giffler–Thompson algorithm, with priorities for operations being assigned from the base algorithm (RFMLP–NN, AOI–Mean, AOI–Mode and SPT). Table 6.24 shows the performance of these schedules on the test data sets. The GA solutions are considered the benchmark solutions for comparing the other schedulers. Figures indicated in bold in table 6.24 represent the best solutions obtained for the test instance under consideration. The RFMLP–NN scheduler provides a better makespan (nearest to the GA makespan) for more problem instances than the other methods on the test data set, and it performed better than the SPT heuristic in nine of the ten cases. From an observation of the average makespan values and percentage deviations provided in table 6.24, it is evident that the RFMLP–NN scheduler approaches the GA in scheduling the test problems. This illustrates the generalization capabilities of the RFMLP–NN scheduler, as the learning comes from only the optimal solutions to the benchmark ft06 problem instance.
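The per class accuracy, as defined above, can be computed from the confusion matrix of table 6.22 as in the following sketch; reading the desired class column-wise is an assumption about the table layout.

    #include <vector>

    // Per class accuracy: correct classifications of a class divided by
    // the total number of instances of that class (column-wise desired).
    std::vector<double> classAccuracy(const std::vector<std::vector<int>>& cm) {
        std::vector<double> acc(cm.size(), 0.0);
        for (std::size_t k = 0; k < cm.size(); ++k) {
            long total = 0;
            for (std::size_t r = 0; r < cm.size(); ++r) total += cm[r][k];
            if (total > 0) acc[k] = 100.0 * cm[k][k] / total;
        }
        return acc;
    }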

Scheduler                                                     Makespan  Deviation (%)
Genetic Algorithm (GA)                                        55        0
Rough Fuzzy Multi Layer Perceptron Neural Network (RFMLP–NN)  58        5.45
Attribute Oriented Induction (AOI)                            67        21.81
Shortest Processing Time (SPT)                                83        50.90
Most Work Remaining (MWKR)                                    67        21.81
Shortest Remaining Processing Time (SRMPT)                    84        52.72
Smallest Ratio of Processing Time to Total Work (SPT-TWORK)   72        29.09

Table 6.23: Makespan of Schedules for ft06

Analysis of Variance (ANOVA) is used for comparing the performance of the alternate schedulers. The experiment is designed as a randomized complete block design, to account for the variability arising from the different job scheduling problem instances in the test data set. The

assumptions made for this experiment are that the observations are independent and normally distributed, with the same variance for each treatment (scheduler). The Anderson–Darling test [44] verified the normality assumption, and Bartlett's test is used to validate the assumption of homogeneity of variances. The null hypothesis for this experiment, H0 (that all treatment means are equal), is tested at 5% significance. If the null hypothesis is rejected, it is concluded that there exists a significant difference in the treatment means. Duncan's multiple range test is then used to identify the pairs of schedulers which have a significant difference in means. The treatment means are sorted in ascending order. The test statistic is the least significant studentized range $r_p$, where $p$ denotes the number of treatment means; it depends on the number of means and the degrees of freedom. For $2 \le p \le 6$, the values of $r_p$ are obtained from the least significant studentized range table, and Duncan's critical value $R_p$ for these means is computed. The difference between each pair of means drawn from the set of ordered treatment means is compared with the critical value $R_p$; this comparison is carried out for all 15 possible combinations of treatment pairs. The determination of significant differences in means allowed the schedulers to be combined into groups, as given in table 6.25.

Problem instance  GA    RFMLP–NN  AOI–Mode  AOI–Mean  SPT
ft06-R1           46    48        49        53        54
ft06-R2           53    55        56        58        64
ft06-R3           60    60        62        67        71
ft06-R4           48    54        60        55        63
ft06-R5           55    60        61        63        66
ft06-R6           54    56        61        59        67
ft06-R7           51    54        53        53        60
ft06-R8           67    70        76        75        71
ft06-R9           54    56        59        68        59
ft06-R10          59    64        70        70        86
Average           54.7  57.7      60.7      62.1      66.1
Deviation (%)     –     5.48      10.97     13.53     20.84

(Bold values indicate the best solutions obtained for the problem instance under consideration.)

Table 6.24: Makespan obtained by various Schedulers on the test data set

Scheduler                                                     Group
Genetic Algorithm (GA)                                        A
Rough Fuzzy Multi Layer Perceptron Neural Network (RFMLP–NN)  B
Attribute Oriented Induction (AOI–Mean)                       B
Attribute Oriented Induction (AOI–Mode)                       B
Shortest Processing Time (SPT)                                C

Table 6.25: Different groups of schedulers

The three groups identified by Duncan's test correspond to different scheduling approaches for the JSP. The optimization method, viz. the GA, provided the best makespan. The Machine Learning methods (RFMLP–NN, AOI–Mean and AOI–Mode) constituted the second group B. While the differences within this group are not statistically significant, the RFMLP–NN approach provided the best results on the test problems, having the lowest average makespan. Also, the AOI based methods produce rules which assign priorities to operations; these rules are not sufficient to schedule an arbitrary randomly generated 6 × 6 scenario. For some operations, the input features of the operation did not match any rule antecedent, and in such cases

an average priority index of 2.5 is assigned to such operations. In contrast to the rule-based approach, the trained RFMLP–NN could successfully describe any randomly generated 6 × 6 instance. The performance of all members of the second group is better than the SPT heuristic. The RFMLP–NN scheduler has been shown to have good generalization capabilities, based on its performance on the test set of 10 randomly generated 6 × 6 problem instances. Another dimension for evaluating the performance of the RFMLP–NN scheduler is its scalability to larger problem sizes. For this, five well-known problem instances from the scheduling literature, with sizes ranging from 100 to 400 operations, were selected. These problems are ft10, la24, la36, abz7 and yn1 [44], selected from different sources to provide a gradation in problem size. The makespan achieved by the RFMLP–NN scheduler is compared to the makespan of the best known solution, the GA, and active schedulers based on the SPT heuristic and a random assignment heuristic (table 6.26). The last heuristic selects an operation from among the set of schedulable operations randomly [44].

Problem name           Size  Best known solution  GA    RFMLP–NN  SPT     Random
ft10 (10 × 10)         100   930                  1056  1136      1429    1748
la24 (20 × 10)         200   935                  1204  1546      1874    2014
la36 (15 × 15)         225   1268                 1538  1699      2289    2377
abz7 (20 × 15)         300   665                  828   921       1030    1400
yn1 (20 × 20)          400   886                  1139  1132      1539    1998
Average                      936.8                1153  1286.8    1632.2  1907.4
Deviation from GA (%)        –                    –     11.60     41.56   65.43

Table 6.26: Comparison of makespan performance of different schedulers

An analysis of the results indicates a consistent pattern in the performance of the schedulers. The GA provides the best solution among the different schedulers, and the RFMLP–NN scheduler outperforms the other schedulers (SPT and Random) on all problems. The deviation in the average makespan provided by the RFMLP–NN scheduler is 11.60% from the average makespan provided by the GA, while those of the SPT and random assignment heuristics are 41.56% and 65.43% respectively. The deviation of the RFMLP–NN scheduler from the average GA makespan increases by 6.12 percentage points on the larger problem data set, from 5.48% on the test set of 6 × 6 problems. This compares favorably to the increase in the deviation of the SPT scheduler of 20.72 percentage points between the two data sets. This is impressive because the RFMLP–NN is trained on solutions obtained from a single 6 × 6 benchmark problem. To be able to provide good solutions with minimum computational effort on considerably larger problems, with different sequencing constraints and processing times, validates the learning of the RFMLP–NN. It shows that the RFMLP–NN is able to capture certain key problem size invariant properties of good solutions as part of its training regimen. An inspection of the results in table 6.26 also reveals a consistent divergence in the GA performance when compared to the best known solutions. This is because the GA used in this research had a simple evolutionary scheme with standard genetic operators and constant parameters. Typically, GA parameters have to be tuned with considerable experimentation; for larger JSPs, GAs have been shown to achieve very good solutions with more sophisticated genetic operators and adaptive parameters. In one such study [44] it is reported that, by incorporating local search into the GA scheme, near-optimal or optimal solutions are obtained on a set of benchmark problems similar to those utilized in this work. As the primary objective of this work is to develop and demonstrate the potential of the RFMLP–NN to learn from good solutions, a simple GA is employed, which provided optimal solutions for the 6 × 6 problem. The source of the solutions for the learning task is not important, and this is indeed one of the strengths

of this approach. In a real world scenario, solutions considered good by experts can be used to train the RFMLP–NN, thus providing a computational model based on expert knowledge. Alternately, more effective optimizers could be used to generate good solutions for training the RFMLP–NN. The RFMLP–NN scheduler developed here is generic in nature and could be effectively combined with any optimizer to generate good solutions with low computational effort and time. The RFMLP–NN scheduler is computationally less intensive than the GA and offers a more comprehensible scheduling approach. It also provides an attractive alternative to simple heuristics like SPT, as it effectively utilizes more problem specific knowledge in making scheduling decisions with similar computational effort. For an arbitrary problem, the development of an efficient optimizer involves significant design effort and an understanding of the problem domain. The learning framework using the RFMLP–NN presented in this work is generic and can easily be applied whenever there are known good solutions to problems, irrespective of their source. A thorough understanding of the problem domain is not essential for successful application of the methodology. Thus, the learning framework is also particularly suited to problem domains in which current understanding is limited.

6.7 Conclusion

In this chapter, different problems relating to assignment, scheduling and sequencing are studied. The first problem addresses the FILP model for the examination timetable problem. Many possible models exist for the examination timetable problem; the problem size can be further reduced by formulating it with a smaller number of allocation variables, without affecting the optimality of the solution obtained. As the problem is NP-hard in nature, the FILP formulation gives an optimal solution that can serve as a good benchmark for other heuristics. The test data set obtained through extensive simulation is used to compare the solutions with those obtained from another heuristic, viz. the ILP approach, which gives an idea of the quality of the heuristic. The technique is also compared with different AI based heuristic techniques for the ETP, with respect to best and mean cost as well as execution time measures, on the Carter benchmark datasets, to illustrate its effectiveness. The comparative study is performed using mathematical Model 3 of the FILP technique because it requires the minimum number of variables in its formulation. The FILP technique takes an appreciably smaller amount of time to generate a satisfactory solution in comparison to the other heuristics. The experimental study presented here focuses on producing a methodology that generalizes well over a spectrum of techniques and generates significant results for one or more datasets. The performance of the FILP model is finally compared to the best results cited in the literature for the Carter benchmarks, to assess its potential. The problem can also be reformulated using a rough fuzzy hybrid approach. A variant of the examination timetable problem, viz. the university course timetable problem, is then studied using the FGA heuristic. The technique uses an indirect representation featuring event allocation priorities and invokes a timetable builder routine for constructing a complete timetable. The algorithm incorporates a number of techniques and domain specific heuristic local search operators to enhance the search efficiency. The non-rigid soft constraints in the problem are basically optimization objectives for the search algorithm. There is an inherent degree of uncertainty involved in the objectives, which comprise different aspects of real life data. This uncertainty is tackled by formulating the measure of violation parameter of each soft constraint in the fitness function using fuzzy membership functions. The FGA heuristic has been applied to a real world university course timetable problem for which manual solutions were already available. It has been shown through extensive simulation that, on incorporating certain combinatorial and domain specific operators, the search efficiency of the evolutionary algorithm is significantly enhanced. On comparing the FGA heuristic with the manual solution, it is evident that although the technique does not satisfy all hard constraints of the problem, it achieves a significantly better score in satisfying the soft constraints, and thus its

performance is superior. The algorithm’s inability to satisfy all hard constraints is attributed to difficulty of specific problem and to limited resources (7000 generations) used during simulation process. However, more simulations are required to develop an efficient solution with no violations in hard constraints. Further, to verify efficiency and robustness of algorithm, it should be tested on different real world timetable problems. The algorithm can also be adapted to solve other university course timetable as well as scheduling problems. The quality of solutions can be greatly improved using other hybrid paradigms such as neuro fuzzy genetic or rough fuzzy genetic approaches. Next, LCS is considered using ACO technique. The proposed methodology viz., ACO-LCS is applied to LCS and simulation results are given. The proposed algorithm draws analogy with behavior of ant colonies function and yields better results than traditional technique for finding LCS based on dynamic programming as is evident from its efficient computational complexity. The stochastic combinatorial optimization aspect for proposed technique is also given which are characterized by positive feedback, distributed computation, and use of constructive greedy heuristic. Positive feedback accounts for rapid discovery of good solutions, distributed computation avoids premature convergence and greedy heuristic helps find acceptable solutions in minimum number of stages. The computational complexity of ACO-LCS algorithm in worst case is obtained as log 2 n  . Finally, a novel knowledge based approach viz., RFMLP–NN is used to solve job scheduling problem by utilizing various ingredients of Machine Learning paradigm. The ability of GA to provide multiple optimal solutions is exploited to generate knowledge base of good solutions. RFMLP–NN is successfully trained on this knowledge base. RFMLP–NN Scheduler is utilized to schedule any job scenarios of comparable size to benchmark problem. Also, this methodology can be employed for learning from any set of schedules, regardless of their origin. A test problem set consisting of 10 randomly generated 6 6 scenarios is used to evaluate generalization capabilities of developed scheduler. A comparative evaluation of RFMLP–NN scheduler with other schedulers developed from different methodology and SPT heuristic is also undertaken. Among these schedulers, RFMLP–NN Scheduler developed in this work has closest average makespan to that of GA. The RFMLP–NN Scheduler also performed satisfactorily on test set of larger problem sizes, which demonstrates success of RFMLP–NN in learning key properties to identify good solutions. Thus, this investigation successfully develops RFMLP–NN scheduler which provides close approximation to performance of GA scheduler to JSP.
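As a point of reference for the comparison above, the traditional dynamic programming baseline for LCS can be sketched as follows; this is the standard O(mn) textbook algorithm, not the ACO-LCS method itself.

```python
def lcs(a, b):
    """Dynamic programming LCS: dp[i][j] = LCS length of a[:i] and b[:j]."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Backtrack through the table to recover one longest common subsequence.
    out, i, j = [], m, n
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i -= 1; j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

print(lcs("AGGTAB", "GXTXAYB"))  # -> "GTAB"
```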

Chapter 7 Conclusion and Scope for further Research

7.1 Conclusion

In every chapter, conclusions drawn from the respective methodologies developed and the experimental results therein are presented. Here, they are consolidated to provide an overall picture of the contributions of the thesis. In this thesis, certain Soft Computing tools were used to develop solutions for different aspects of optimization problems. The problems considered were the traveling salesman problem, transportation problem, decision making problem, rectangular game problem, financial investments problem, stock price prediction problem, time series forecasting problem, bankruptcy prediction problem, resource allocation problem, assignment problem, and sequencing and job scheduling problems. Various methodologies have been developed using Soft Computing approaches integrating Fuzzy Logic, ANN, Rough sets, GA and ACO. The emphasis of the proposed methodologies is on handling data sets which are large, both in size and dimension, and involve classes that are overlapping, intractable and have non-linear boundaries. Several strategies based on data reduction, dimensionality reduction, active learning and efficient search heuristics are employed for dealing with the issue of scaling in the learning problem. The problems of handling linguistic input and ambiguous output decisions, learning of overlapping/intractable class structures, selection of optimal parameters and discovering human comprehensible knowledge in the form of linguistic rules are addressed in the Soft Computing framework. Different features of the methodologies, along with comparisons with related ones, are demonstrated extensively on different real life data sets. These data have varying numbers of dimensions and are drawn from varied domains, such as the traveling salesman problem, banking systems, financial institutions, corporate organizations, stock exchanges and currency exchange rates, academic institutions, and scheduling and sequencing problems. The models are found to be effective and significantly superior to several related ones.

In chapter 2, TSP is rigorously studied using FSOM, FILP and FMOLP approaches. Experimental results indicate that the FSOM-2opt hybrid algorithm generates appreciably better results compared to both the Evolutionary and the Lin-Kernighan algorithm for TSP as the number of cities increases. Experiments with other SOMs should be performed, and a Gaussian neighborhood and conscience mechanism can be applied to improve the solutions developed. Optimization algorithms other than the 2opt algorithm may also be used. Enhanced edge recombination is one of the best permutation operators for TSP. The permutation operators are actually better for other permutation problems like warehouse or shipping scheduling applications. Therefore, the FSOM-2opt hybrid might work better for other permutation problems than for TSP. The FILP formulation of TSP generates an optimal solution which is feasible in nature and also takes care of the impreciseness aspect. Finally, symmetric versions of TSP with vague decision parameters are considered and solved using FMOLP. It deals with flexible aspiration levels or goals and fuzzy constraints with acceptable deviations. The multi-objective TSP exists in an uncertain or vague environment where route selection is done by exploiting these parameters. The tolerances are introduced by the decision maker to accommodate this vagueness.
By adjusting these tolerances, a range of solutions with different aspiration levels is found, from which the decision maker can choose the one that best meets his satisfaction level within the given domain of tolerances. FMOLP can be effective in achieving k-dimensional points according to the aspiration level of the decision maker in a multi-dimensional solution space.

In chapter 3, the transportation problem is studied under probabilistic and fuzzy uncertainties. The fuzzy method is based on the α-level representation of fuzzy numbers and a probability estimate that a given interval is greater than or equal to another interval. The method makes it possible to extend the simplex method to fuzzy numbers. Numerical results obtained using the fuzzy optimization method and the Monte Carlo method with linear programming using real valued and random parameters show that the fuzzy approach has considerable advantages over the Monte Carlo method, especially from a computational point of view. Another important aspect dealt with here is the representation of the constraint equations of the transportation problem as a system of linear equations, solved through a Neuro-Fuzzy approach using the Polak-Ribiere conjugate gradient method for three cases, viz., exactly determined, underdetermined and overdetermined systems. This is achieved using the Fuzzy Back-Propagation learning rule. The only algebraic operations required here are addition and multiplication. The nodes and links in the Neuro-Fuzzy network are comprehensible; they allow a greater degree of precision in the solution obtained and are simple to implement. The numerical examples illustrate the effectiveness of the Neuro-Fuzzy algorithm developed. An initial solution of the transportation problem is obtained using FVAM, and the optimality of the solution is tested through FMODIM. The closed, bounded and non-empty feasible region of the transportation problem using fuzzy trapezoidal numbers ensures the existence of an optimal solution to the balanced transportation problem. The multi-valued nature of Fuzzy sets allows handling of the uncertainty and vagueness involved in the cost values of each cell in the transportation table. The technique developed demonstrates accuracy in the solutions obtained on real life data.

In chapter 4, the concepts of Soft relation and Fuzzy Soft relation are presented to solve various decision making problems arising in engineering, management and social science domains. These problems often involve data that are imprecise, uncertain and vague in nature. Solutions developed through other techniques lack parameterization of tools, due to which they could not be applied successfully to such problems. Soft set and Fuzzy Soft set concepts possess certain parameterization features, are extensions of crisp and fuzzy relations respectively, and have rich potential for application to decision making problems. This fact is evident from the theoretical analysis, which illustrates the rationality of the proposed method. These concepts are used to solve some real life decision making problems, which illustrates their advantages over other paradigms. Another important decision making problem arising in practical life situations is game theory, where decisions are made in competitive situations under conflict caused by opposing interests. Rectangular fuzzy games using LR-type trapezoidal fuzzy numbers are considered, whose pay-off matrix is imprecise in nature. LR-type trapezoidal fuzzy numbers are used because of their simplicity and computational efficiency. The solutions of fuzzy games with pure strategies by the minimax-maximin principle, and of 2 × 2 fuzzy games without a saddle point by an algebraic method using mixed strategies, along with the concept of the dominance method, are obtained. LR-type trapezoidal fuzzy numbers generate optimal solutions which are feasible in nature and also take care of the impreciseness aspect. The problem of classification of financial investments using multi-class SVM is considered next. Experimental results on four datasets show that the Gaussian kernel is not always the best choice to achieve high generalization of the classifier, although it is often the default choice. The dependence of classifier accuracy on different kernel functions of multi-class SVM is exhibited using different datasets.
With an appropriate choice of kernel function and optimal values of the regularization parameter C and the kernel parameter, it is possible to achieve maximum classification accuracy on all datasets.

In chapter 5, RFMLP ANN is used to generate stock price prediction rules of BSE. The RFMLP network is evolved modularly using GA for designing a knowledge based network for pattern classification and rule generation. The algorithm involves the synthesis of several MLP modules, each encoding the Rough set rules for a particular class, and GA refines the knowledge-based modules. The genetic operators preserve the modular structure evolved. This methodology, along with modular network decomposition, results in accelerated training and a more compact network with comparable classification accuracy. The model is used to develop a new rule extraction algorithm. The extracted rules are compared with those of some rule extraction techniques on the basis of some quantitative performance indices; they are fewer in number, accurate, have a high certainty factor and low confusion, and require less computation time. Then a Neuro-Fuzzy hybrid model for time series forecasting of exchange rate data is illustrated. It is an important quantitative tool for financial market forecasting and for improving decisions and investments. The thrust is on improving the effectiveness of time series models. As the real world environment experiences uncertain and quick changes, future situations should be forecast using a small amount of data from a short span of time. The Neuro-Fuzzy model combines the advantages of ANN and Fuzzy regression and is used to forecast the exchange rate of the US dollar to the Indian national rupee. By exploiting the advantages of the Fuzzy regression model and of ANN to preprocess raw data, the model requires fewer observations to obtain accurate results and also obtains narrower possible intervals than other interval forecasting models under incomplete data conditions, as evident from the empirical results. The performance of the model is also better than that of other models, and it is suitable for both point and interval forecasts with incomplete data. Thus, the hybrid model makes good forecasts under both the best and worst possible situations, which makes it more suitable for decision making than other techniques. This is followed by a study of an FLR model with non-uniform spreads for achieving greater explanatory power and forecasting accuracy. Fuzzy regression coefficients yield estimated fuzzy responses whose spreads increase as the independent variables increase in magnitude, and crisp regression coefficients with uniform spreads are not always suitable. Although some models obtain crisp regression coefficients and uniform spreads, they cannot deal with situations where the spreads of the observed responses are actually non-uniform. Here, the regression coefficients are calculated as crisp values and the spreads of the fuzzy error terms are non-uniform. The non-uniform spread FLR model resolves the problem of wider spreads of the estimated response for larger values of the independent variables in Fuzzy regression analysis. The method is based on the extension principle, which also provides the membership function of the least-squares estimate of each regression coefficient. The method has greater explanatory power and forecasting accuracy, and the construction of the membership function of the fuzzy regression coefficient conserves the fuzziness of the input information. A numerical example illustrates the strength of the method in terms of better explanatory power. Finally, a novel Soft Computing tool, viz., FSVM, is considered to study the problem of bankruptcy prediction in corporate organizations. SVMs are capable of extracting information from real life business data. Moreover, they give an opportunity to obtain results that are not obvious at first glance. They are easily adjusted with only a few parameters, which makes them particularly well suited as an underlying technique for organization rating and investment risk assessment methods used by financial institutions. SVMs are also based on very few restrictive assumptions and can reveal effects overlooked by many other methods. They have been able to produce accurate classification results in other areas and can become the option of choice for several applications. But real life corporate data has an inherent degree of uncertainty and impreciseness, so unpredictable results may crop up.
In order to create a practically valuable methodology for tackling the uncertainty aspect, SVM is integrated with fuzzy membership functions so that an effective decision making classification tool is obtained. To conduct the study, a test dataset is used which comprises the 50 largest organizations that filed for bankruptcy under the United States Bankruptcy Code in 2001 – 2002 after the stock market crash of 2000. The performance of FSVM is illustrated by experimental results which show that it is better capable of extracting useful information from corporate data.

In chapter 6, different problems relating to assignment, scheduling and sequencing are studied. The first problem addresses the FILP model for the examination timetable problem. The problem can be further reduced by formulating it with a lesser number of allocation variables without affecting the optimality of the solution obtained. As the problem is NP-hard in nature, the FILP formulation gives an optimal solution that can serve as a good benchmark for other heuristics. The small test data set is used for comparison with solutions obtained from another heuristic, viz., the ILP approach, which gives an idea about the quality of the heuristic. The technique is also compared with different AI based heuristic techniques for ETP with respect to best and mean cost as well as execution time measures on the Carter benchmark datasets to illustrate its effectiveness. The comparative study is performed using mathematical Model 3 of the FILP technique because the minimum number of variables is required in its formulation. The FILP technique takes an appreciable amount of time to generate a satisfactory solution in comparison to other heuristic solutions. The experimental study presented here focuses on producing a methodology that generalizes well over a spectrum of techniques and generates significant results for one or more datasets. The performance of the FILP model is finally compared to the best results cited in the literature for the Carter benchmarks to assess its potential. The problem can also be reformulated using a rough fuzzy hybrid approach. A variant of the examination timetable problem, viz., the university timetable problem, is then studied using the FGA heuristic. The technique uses an indirect representation featuring event allocation priorities and invokes a timetable builder routine for constructing a complete timetable. The algorithm incorporates a number of techniques and domain specific heuristic local search operators to enhance search efficiency. The non-rigid soft constraints in the problem are optimization objectives for the search algorithm. The uncertainty in the data is tackled by formulating the measure of violation parameter of each soft constraint in the fitness function using fuzzy membership functions. The FGA heuristic has been applied on a real world university course timetable problem for which manual solutions are already available. It has been shown through extensive simulation that on incorporating certain combinatorial and domain specific operators the search efficiency of the evolutionary algorithm is significantly enhanced. On comparing the FGA heuristic with the manual solution it is evident that although the technique does not satisfy all hard constraints of the problem, it achieves a significantly better score in satisfying soft constraints and thus its performance is superior. The algorithm can also be adapted to solve other university course timetable as well as scheduling problems. Next, the LCS problem is considered using the ACO technique. The proposed methodology, viz., ACO-LCS, is applied to LCS and simulation results are given. The proposed algorithm draws an analogy with the behavior of ant colonies and yields better results than the traditional dynamic programming technique for finding LCS. The stochastic combinatorial optimization aspects of the proposed technique are also given, characterized by positive feedback, distributed computation and the use of a constructive greedy heuristic. Positive feedback accounts for the rapid discovery of good solutions, distributed computation avoids premature convergence and the greedy heuristic helps find acceptable solutions in a minimum number of stages. A worst-case complexity of ⌈log₂ n⌉ is obtained. Finally, RFMLP–NN is used to solve the job scheduling problem by utilizing various ingredients of the Machine Learning paradigm. The ability of GA to provide multiple optimal solutions is exploited to generate a knowledge base of good solutions, on which the network is successfully trained. The RFMLP–NN Scheduler is utilized to schedule job scenarios of comparable size to the benchmark problem. Also, this methodology can be employed for learning from any set of schedules, regardless of their origin.
A comparative evaluation of the RFMLP–NN scheduler with schedulers developed from different methodologies and with the SPT heuristic is also undertaken. Among these schedulers, the RFMLP–NN Scheduler developed in this work has the closest average makespan to that of GA. The RFMLP–NN Scheduler also performed satisfactorily on a test set of larger problem sizes, which demonstrates the success of RFMLP–NN in learning the key properties that identify good solutions. Thus, the RFMLP–NN Scheduler successfully provides a close approximation to the performance of the GA scheduler for JSP. In summary, the entire gamut of study and investigation encompasses an in-depth examination of different important aspects of optimization problems using various Soft Computing paradigms, such that cost effective, efficient and optimum solutions are developed.
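Since trapezoidal fuzzy numbers and their α-level (α-cut) interval representation recur throughout the chapters summarized above, a minimal illustrative sketch is given below; the numeric trapezoid is a hypothetical example, not data from the thesis. Each α in [0, 1] maps the fuzzy number to a crisp interval, which is how fuzzy arithmetic and comparisons are reduced to interval operations.

```python
def alpha_cut(a, b, c, d, alpha):
    """Interval [lo, hi] of the trapezoidal fuzzy number (a, b, c, d) at level alpha."""
    assert a <= b <= c <= d and 0.0 <= alpha <= 1.0
    lo = a + alpha * (b - a)   # left edge rises from a (alpha = 0) to b (alpha = 1)
    hi = d - alpha * (d - c)   # right edge falls from d (alpha = 0) to c (alpha = 1)
    return lo, hi

for alpha in (0.0, 0.5, 1.0):
    print(alpha, alpha_cut(2, 4, 6, 9, alpha))
# 0.0 -> (2, 9); 0.5 -> (3.0, 7.5); 1.0 -> (4, 6), the core of the fuzzy number
```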

7.2 Scope for further Research

In the FSOM solution for TSP in chapter 2, the network control parameters can be optimized to obtain better solutions. Experiments with other self organizing networks should be performed. A Gaussian neighborhood and conscience mechanism can be applied, which will improve the solutions generated by the ANN. The permutation operators which are worse for TSP than enhanced edge recombination are actually better for other permutation problems like warehouse or shipping scheduling applications. The FSOM technique may be hybridized with Rough sets, which may yield better optimal solutions. The FMOLP paradigm for TSP is rich enough to direct advanced research in different domains of optimization problems. There is definite potential for the development of methods to solve TSP with vague descriptions of resources. For efficient results, some heuristics are required; for example, the relative dependencies among the objective functions can be determined. The objective function of the FMOLP solution for TSP may be obtained using GA. Further, the technique may be effectively applied to other optimization problems such as assignment, transportation and decision making problems to obtain optimal solutions. In chapter 3, the effectiveness of the initial solution of the transportation problem obtained using FVAM can be greatly enhanced by incorporating GA or other Evolutionary Computing techniques along with fuzzy trapezoidal numbers such that the computational complexity is greatly reduced. Besides this, the transportation problem studied under probabilistic and fuzzy uncertainties can also be represented using a hybridization of Neuro-Fuzzy or Rough-Fuzzy approaches, which will improve the overall quality of results. The rectangular fuzzy games using LR-type trapezoidal Fuzzy numbers in chapter 4 are restricted to 2 players only; the game can easily be extended to n players. The pay-off values may also be represented by lower and upper approximation values in Rough sets, which will enhance the effectiveness of the solution. In the classification of financial investments using multi-class SVM, it will be interesting and practically more useful to devise a method for selecting the kernel function and its parameters based on statistical properties of the data. The proposed method in conjunction with multi-class SVM can then be tested on application domains such as image processing, text classification, intrusion detection etc. Likewise, better results can be obtained if the multi-class SVM classifier is integrated with Fuzzy and Rough membership functions. In chapter 5, the RFMLP ANN method for generating stock price prediction rules of BSE has immense potential for application to large scale prediction problems involving knowledge discovery tasks using case-based reasoning, particularly related to the mining of classification rules. The FLR model with non-uniform spreads can also be applied to multiple fuzzy regression problems. The FSVM considered to study the problem of bankruptcy prediction in corporate organizations can be enhanced with RFSVM, as it improves the classification accuracy of SVM. The solutions obtained for the examination timetable problem and the university timetable problem in chapter 6 require more simulations to develop an efficient solution with no violations of hard constraints. To verify the efficiency and robustness of the algorithms, they should be tested on different real world timetable problems. The FILP model for the examination timetable problem can also be reformulated using a Rough Fuzzy hybrid approach.
Similarly, the quality of solutions for the university timetable problem using the FGA heuristic can be greatly improved using hybrid paradigms such as Neuro-Fuzzy-Genetic or Rough-Fuzzy-Genetic approaches.
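On the kernel selection direction suggested above, one statistically motivated possibility (an assumption for illustration, not a method developed in the thesis) is the median heuristic, which sets the Gaussian kernel width to the median pairwise distance of the samples:

```python
import numpy as np

def rbf_kernel(X, Y, sigma):
    """Gaussian kernel matrix K[i, j] = exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def median_sigma(X):
    """Median of pairwise Euclidean distances (the 'median heuristic')."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    d = np.sqrt(d2[np.triu_indices(len(X), k=1)])  # upper triangle, no diagonal
    return float(np.median(d))

X = np.random.default_rng(0).normal(size=(50, 4))  # hypothetical data
sigma = median_sigma(X)
K = rbf_kernel(X, X, sigma)
print(sigma, K.shape)
```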

Appendix

Data Sets used in Experiments

The details of the data sets used in the empirical evaluation and comparison of the algorithms developed are briefly enumerated below:

Traveling Salesman Problem data: The traveling salesman problem data is taken from the TSPLIB 95 documentation, University of Heidelberg. The data consists of varying sample sizes of 51, 101, 200, 225, 442, 700, 1002, 1200 and 2392.

Transportation Problem data: The transportation problem data consists of 569 samples of transportation matrices of different dimensions. Source: Department of Mathematics, New Alipore College, Kolkata, India.

Rectangular Games data: The rectangular games data consists of 237 samples of two-dimensional game data of different dimensions. Source: Post Graduate Department of Commerce, St. Xavier's College, Kolkata, India.

Investment Portfolio data: The investment portfolio data is taken from ICICI Prudential Financial Services, India. There were 500 samples consisting of six investments for an investor to invest his money in, based on a set of six parameters.

Fund Sources data: The fund sources data is taken from Axis Bank, India. There were 700 samples of five fund sources available to the Manager, based on a set of seven parameters.

Job Allocation data: The job allocation data is considered from Reliance Industries, India. There were 1000 samples of six persons for the job, based on a set of six parameters.

Manpower Recruitment data: The manpower recruitment data is considered from Tata Consultancy Services, India. There were 500 samples of seven programmers to be recruited, based on a set of eight parameters.

Product Marketing data: The product marketing data is adopted from Khosla Electronics, Kolkata, India. There were 5000 samples comprising six brands of televisions to be sold in an international market, based on a set of eight parameters.

Bombay Stock Exchange data: The data comprises daily stock movements of the Bombay Stock Exchange spanning a period of 10 years (1999 - 2008). There are 5000 stock data samples, each made up of three attributes.

Financial Investment data: The data is considered from the Treasury Bond Futures data, 1994-95, for French corporations, from Financial Analysis Made Easy. This dataset illustrates the huge drop in bond prices which took place in 1994. The data file is in plain text and contains high and low prices over 20 minute intervals on Treasury Bond Futures from 7th January, 1994 to 3rd February, 1995. There are a total of 5347 observations in the dataset.

UCI Machine Learning Repository data: This data comprises measurements from the Iris flowers, Wine and Glass datasets; the Iris dataset, for instance, has 150 samples, 4 features and 3 classes.

Weekly Dow Jones Industrial Average data: This dataset lists an important aggregate stock price index beginning in 1900 and going up to 1989. The data is formed by patterns from three different classes, viz., high, medium and low, representing the aggregate stock price index; each class consists of 100 samples. A daily version of this dataset is nearly 2 megabytes in size. Each sample is composed of four attributes: date, time, amount invested and volume.

Currency Exchange Rate data: This data consists of 50 daily observations of the exchange rate of the United States Dollar versus the Indian National Rupee from 8th May, 2008 to 15th October, 2008.

Bankruptcy Prediction data: The dataset comprises the 50 largest bankrupt organizations with capitalization of at least $1 billion that filed for protection against creditors under Chapter 11 of the United States Bankruptcy Code in 2001 – 2002 after the stock market crash of 2000.

Employee Recruitment data: The employee recruitment data consists of 5000 samples spanning a period of 10 years, 1998-2008, from Allahabad Bank, Kolkata, India.

Examination Timetable Problem data: This data is taken from the examination timetable system of Netaji Subhas Open University, Kolkata, India.

University Course Timetable data: This data is taken from the course timetable system of St. Xavier's College, Kolkata, India. There are over 7000 samples in the dataset.

Ant Colony Optimization Problem data: The ant colony dataset is taken from the Department of Computer Science Engineering, Techno India – Salt Lake, Kolkata, India, and consists of nearly 5000 samples.

Job Scheduling Problem data: This dataset comprises 10 randomly generated 6 × 6 scheduling problem scenarios. Some important scheduling data considered are different instance sizes of ft06, along with the ft10 (10 × 10), la24 (20 × 10), la36 (15 × 15), abz7 (20 × 15) and yn1 (20 × 20) problem instances.
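As an illustration of the job scheduling data described above, the following is a hypothetical sketch of how a random 6 × 6 job shop instance could be generated; the machine routings and processing time range are assumptions, not the actual generation procedure used in the thesis.

```python
import random

def random_jssp_instance(n_jobs=6, n_machines=6, t_max=99, seed=None):
    """Random job shop instance: each job visits every machine once, in a
    random order, with a random integer processing time per operation."""
    rng = random.Random(seed)
    inst = []
    for _ in range(n_jobs):
        machines = list(range(n_machines))
        rng.shuffle(machines)  # random machine routing for this job
        inst.append([(m, rng.randint(1, t_max)) for m in machines])
    return inst  # inst[j] = [(machine, processing_time), ...] in visit order

for job in random_jssp_instance(seed=42):
    print(job)
```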


List of Publications of Author related to the Thesis

[1] Chaudhuri, A., A Dynamic Algorithm for looking Traveling Salesman Problem as a Fuzzy Integer Linear Programming Problem in an Optimal way, Proceedings of International Conference on Information Technology, Haldia Institute of Technology, Haldia, India, 1, pp 246 – 251, 2007.

[2] Chaudhuri, A., Solution of Rectangular Fuzzy Games by Principle of Dominance Using LR-type Trapezoidal Fuzzy Numbers, Proceedings of 2nd International Conference on Advanced Computing and Communication Technologies, Asia Pacific Institute of Information Technology, Panipat, India, pp 203 – 208, 2007.

[3] Chaudhuri, A., Chakraborty, K. M., A Dynamic Algorithm for the Longest Common Subsequence Problem using Ant Colony Optimization Technique, Proceedings of 2nd International Conference on Mathematics: Trends and Developments, Al-Azhar University, Cairo, Egypt, 4, pp 93 – 120, 2007.

[4] Chaudhuri, A., De, K., Chatterjee, D., Solution of System of Equations - A Neuro-Fuzzy Approach, East West Journal of Mathematics, Special Volume, Proceedings of International Conference on Discrete Mathematics and its Applications, University Chamber of Commerce, Bangkok, Thailand, pp 66 – 80, 2008.

[5] Chaudhuri, A., De, K., Classification of Financial Investments using Neuro-Fuzzy and Rough Set Approaches, Proceedings of 1st International Conference on Data Management, Institute of Management Technology, Ghaziabad, India, pp 805 – 826, 2008.

[6] Chaudhuri, A., De, K., Chatterjee, D., A Comparative Study of Kernels for the Multi-Class Support Vector Machine, Proceedings of IEEE 4th International Conference on Natural Computation, Jinan, China, 2, pp 3 – 7, 2008.

[7] Chaudhuri, A., De, K., Chatterjee, D., A Study of the Traveling Salesman Problem Using Fuzzy Self Organizing Map, Proceedings of IEEE Region 10 and 3rd International Conference on Industrial and Information Systems, Indian Institute of Technology Kharagpur, India, pp 1 – 5, 2008.

[8] Chaudhuri, A., De, K., Chatterjee, D., Discovering Stock Price Prediction Rules of Bombay Stock Exchange Using Rough Fuzzy Multilayer Perception Networks, In Rudra P. Pradhan (Editor), Forecasting Financial Markets in India, Allied Publishers, New Delhi, India, pp 69 – 96, 2009.

[9] Chaudhuri, A., De, K., A Comparative study of the Transportation Problem under Probabilistic and Fuzzy Uncertainties, GANIT, Journal of Bangladesh Mathematical Society, 15(2), pp 31 – 40, 2009.

[10] Chaudhuri, A., De, K., Chatterjee, D., Solution of the Decision Making Problems Using Fuzzy Soft Relations, International Journal of Information Technology, 15(1), pp 78 – 107, 2009.

[11] Chaudhuri, A., De, K., Chatterjee, D., Mitra, P., Trapezoidal Fuzzy Numbers for the Transportation Problem, International Journal of Intelligent Computing and Applications, 2(4), pp 96 – 115, 2009.

[12] Chaudhuri, A., De, K., Time Series Forecasting using Hybrid Neuro-Fuzzy Regression Model, In H. Sakai et al. (Editors), Rough Sets, Fuzzy Sets, Data Mining and Granular Computing, Lecture Notes in Artificial Intelligence, Springer Verlag, Berlin, LNAI 5908, pp 369 – 381, 2009.

[13] Chaudhuri, A., De, K., Achieving greater explanatory power and forecasting accuracy with non-uniform spread fuzzy linear regression, Proceedings of 13th Conference of Society of Operations Management, Department of Management Studies, Indian Institute of Technology, Madras, India, pp 80 – 86, 2009.

[14] Chaudhuri, A., De, K., A Fuzzy Genetic Algorithm Heuristic for University Course Timetable Problem, International Journal of Advances in Soft Computing and its Applications, 2(1), pp 101 – 124, 2010.

[15] Chaudhuri, A., De, K., A Study of the Job Scheduling Problem Using Rough Fuzzy Multilayer Perception Neural Networks Technique, International Journal of Artificial Intelligence: Theory and Applications, 1(1), pp 4 – 24, 2010.

[16] Chaudhuri, A., De, K., Solution of the Traveling Salesman Problem Using Fuzzy Multi-Objective Linear Programming technique, African Journal of Mathematics and Computer Science Research, 3(7), pp 1 – 7, 2010.

[17] Chaudhuri, A., De, K., A Study of the Traveling Salesman Problem Using Fuzzy Self Organizing Map, Book Chapter: Traveling Salesman Problem, Theory and Applications, Donald Davendra (Editor), In Tech Open Access Publishers, Croatia, pp 197 – 212, 2010.

[18] Chaudhuri, A., De, K., Fuzzy Support Vector Machine for Bankruptcy Prediction, Applied Soft Computing Journal, 11(2), pp 2472 – 2486, 2011.

[19] Chaudhuri, A., De, K., Fuzzy Integer Linear Programming Mathematical Models for the Examination Timetable Problem, International Journal of Innovative Computing, Information and Control – Special Issue, 7(5), pp 1 – 25, 2011.
