Evolving Connectionist Systems: Characterisation, Simplification, Formalisation, Explanation and Optimisation

Michael John Watts
A thesis submitted for the degree of Doctor of Philosophy at the University of Otago, Dunedin, New Zealand February 27, 2004
Dedication

To my beloved wife, Yuxiao Li. And to our darling daughter, Athena Jean Xiang Li-Watts.
Abstract

There are several well-known problems with conventional artificial neural networks (ANN), such as difficulties with selecting the structure of the network, and problems with forgetting previously-learned knowledge after further training. Constructive neural network algorithms attempt to solve these problems, but in turn have problems of their own. The Evolving Connectionist System (ECoS) is a class of open architecture artificial neural networks that are similar in the way neurons are added to their structures and in the way their connection weights are modified. The ECoS algorithm is intended to address the problems with constructive neural networks. Several problems with ECoS are identified and discussed in this thesis. These problems are: the lack of comparison of ECoS with constructive neural networks; the excessive complexity of the Evolving Fuzzy Neural Network (EFuNN), which is the seminal ECoS network; the lack of a testable formalisation of ECoS; the dependence on fuzzy logic elements embedded within the network for fuzzy rule extraction; and the lack of methods for optimising ECoS networks. The research in this thesis addresses these problems. The overall theme of the research can be summarised as the characterisation, simplification, formalisation, explanation and optimisation of ECoS. Characterisation in this thesis means the comparison of ECoS with existing constructive ANN. Simplification means reducing the network to a minimalist implementation. Formalisation means the creation of a testable predictive model of ECoS training. Explanation means explaining ECoS networks via the extraction of fuzzy rules. Finally, optimisation means creating ECoS networks that have a minimum number of neurons with maximum accuracy. Each of these themes is approached in ways that build upon, and are complementary to, the basic ECoS network and ECoS training algorithm.
The basic ECoS structure and algorithm are left unchanged, and the problems are addressed by extending that structure, rather than by altering it as has been done in other work on ECoS. The principal contributions of this thesis are: a qualitative comparison of ECoS to constructive neural network algorithms; a proposed simplified version of EFuNN called SECoS; an experimentally tested formalisation of ECoS; novel algorithms for explaining SECoS via the extraction of fuzzy rules; and several novel algorithms for the optimisation of ECoS networks. The formalisation of ECoS and the proposed algorithms are evaluated on data from a set of standard benchmarking problems. Further experiments are performed with a data set with real-world applications, namely the recognition of isolated New Zealand English phonemes. The analyses of the experimental results show that the proposed algorithms are effective across both the benchmark data sets and the case study data set.
First Thoughts

For I dipt into the future, far as human eye could see, Saw the vision of the world, and all the wonders that would be;
Saw the heavens fill with commerce, argosies of magic sail, Pilots of the purple twilight, dropping down with costly bales;
Heard the heavens fill with shouting, and there rained a ghastly dew, From the nations’ airy navies grappling in the central blue;
Far along the world-wide whisper of the south wind rushing warm, With the standards of the people plunging through the thunder storm;
Till the war-drum throbbed no longer, and the battle flags were furled, In the parliament of man, the Federation of the World;
There the common sense of most shall hold a fretful realm in awe, And the kindly earth shall slumber, lapt in universal law...
...Yet I doubt not through the ages, one increasing purpose runs, And the thoughts of men are widened in the process of the suns.
- Tennyson, Locksley Hall
Lo, there do I see my father; Lo, there do I see my mother, my sisters and my brothers; Lo, there do I see the line of my people, back to the beginning; Lo, they do call to me; They bid me take my place among them in the halls of Valhalla, Where the brave may live forever.
- Ancient Norse Battle Prayer.
Acknowledgements

There are many, many people who have contributed to this work, and touched my life during the long years it took to complete.

The first person I must acknowledge is my beloved wife, Yuxiao Li. She is the light of my life, my heart and soul, the rock from which I gain my support, and the owner of my heart. With the completion of this thesis, I can now show her and our baby the attention they deserve.

My parents, Jack and Sheila, for their continued love and support throughout the trials and tribulations of a postgraduate degree.

My brother Jim, for the cheap hardware and the Open Source vs Microsoft debates.

My nephews, Jamie Adam, Connor Jack, and Frewen Thomas, for being such a fun and effective distraction from work, and for reminding me of the fun in everyday life.

My father and mother in law, Drs Chunhui Li and Guilan Zhang, and my sister in law, Dr Yuhong Li, for all of the support and love they have shown me, even after I stole away their daughter and sister.

Professor Nik Kasabov, the author of my career and a first-class human being. His work was the foundation of this thesis, and his unflagging belief gave me the confidence to keep working, even when it all looked hopeless.

Dr Colin Aldridge, who helped me immensely in the management of my thesis, and whose timely advice and comments assisted me in bringing together a coherent thesis from the mess of years of research and scrawlings.

Ali Akbar Ghobakhlou, for all of his encouragement and energy during the good times, and his support during the bad.

Silvia Slavcheva Raykova, for her endless patience with the English language, and her good humour at my puns and jokes at her expense.

Carl Leichter, for introducing me to the wonderful world of MST3K, and for teaching me the endless humour to be had in the constant absurdities of life. The term “Leichterian” is now indelibly engraved in my lexicon.

Richard Kilgour, the perfect competition for my humour, against whom I have continually had to compete to be the first with a pun or verbal jab. His help with the numerous technical problems that pop up during the preparation of a thesis must also be noted.

Zhang Ji, for being such a fun person to be with, and for not hitting me after I drunkenly spun her around the dance floor of the Cook. The help she gave me in wooing my wife should also not be forgotten.

Dr Mark Laws, for the fun I had working on the English - Maori translator project, and for all the enlightenment on linguistics and on Maori culture.

Brendon Sly, for the timely technical help, the advice on things technical, and the discussions about The Blackadder.

Roy Ward, the best programmer I know, for his help with coding and with mathematics. Also, for never letting me forget that there are two sides to every story.

Brian Niven of the Department of Mathematics and Statistics, for his help with the statistical analysis of the experimental results in this thesis.

Mike Jennings, Tony Moore and Chris Edwards, for the final proof-reading. And Martin Purvis, for his insightful comments on my work.
List of Abbreviations

ANN - Artificial Neural Network
ASR - Automatic Speech Recognition
AVQ - Adaptive Vector Quantisation
DCS - Dynamic Cell Structure
DENFIS - Dynamic Evolving Fuzzy Inference System
DNC - Dynamic Node Creation
d.p. - decimal places
EA - Evolutionary Algorithm
EC - Evolutionary Computation
ECM - Evolving Clustering Method
ECoS - Evolving Connectionist System
EFuNN - Evolving Fuzzy Neural Network
EP - Evolutionary Programming
ES - Evolution Strategy
FLEFuNN - Four Layer Evolving Fuzzy Neural Network
FuNN - Fuzzy Neural Network
GA - Genetic Algorithm
GAL - Grow and Learn
GCS - Growing Cell Structure
IIS - Intelligent Information System
MF - Membership Function
MLP - Multi-Layer Perceptron
NN-MLP - Nearest Neighbour Multi-Layer Perceptron
RAN - Resource Allocating Network
SECoS - Simple (or Simplified) Evolving Connectionist System
SRN - Simple Recurrent Network
ZISC - Zero Instruction Set Computer
List of Symbols

A is the activation of a neuron.
i is a neuron in the input layer of an ECoS network.
n is a neuron in the evolving layer of an ECoS network.
j is the winning (most highly activated) neuron in the evolving layer of an ECoS.
o is a neuron in the output layer of an ECoS network.
is the number of neurons in the input layer of the ECoS network.
W is the weight matrix between two layers of neurons.
D is the distance between two vectors in the same dimensionality space. This is measured so that the result is in the range [0, 1].
t is a discrete point in time.
I is the input vector to the ECoS network.
Od is the desired output vector.
O is the calculated (actual) output vector of the ECoS network.
Eo is the error (difference between desired activation and true activation) over output neuron o.
Ethr is the error threshold training parameter.
Sthr is the sensitivity threshold training parameter.
η1 is the learning rate one training parameter.
η2 is the learning rate two training parameter.
C is a cluster centre in the ECM (Section 4.7) algorithm.
Ru is the radius of a cluster in the ECM algorithm.
Dthr is the distance threshold parameter in the ECM algorithm.
x is the ECM input vector.
S is a measure of the size of a cluster in the ECM algorithm.
N is an ECoS network, of any particular variety.
No is an ECoS network before training.
Nt is an ECoS network after training on a data set.
T is a set of training vectors.
I is the input vector component of a single training example.
is a set of training parameters.
MC is a set of ECoS evolving layer neurons that all correspond to the same class C.
R is a region in the input space.
D is a distance between two points in the input space, as defined by some normalised distance measuring function yielding results in the range [0, 1].
V is the amount of space (volume) enclosed by a particular region. This is an area in the case of a two-dimensional input space, a volume in three-dimensional space, and a hyper-volume in the case of input spaces of more than three dimensions.
Rj is the region defined by the winning evolving layer neuron j.
Vj is the volume enclosed by Rj.
Ra is the hyper-spherical region around a winning evolving layer neuron j such that, if a training example falls within this region, no neuron will be added to the ECoS by the training algorithm.
Da is the distance from j that defines Ra.
Va is the volume enclosed by Ra.
Pa is the probability of adding a neuron to the ECoS.
Rs is the hyper-spherical subregion of Ra that is defined by the sensitivity threshold parameter Sthr.
Ds is the distance from j that defines Rs.
Vs is the volume enclosed by Rs.
Re is the hyper-spherical subregion of Ra that is defined by the error threshold parameter Ethr.
Ve is the volume enclosed by Re.
Demax is the radius defining the outside edge of Re.
Demin is the radius defining the inside edge of Re.
Aemax is the maximum possible activation of output neuron o, as defined by Ethr.
Aemin is the minimum possible activation of output neuron o, as defined by Ethr.
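To show how the training-related symbols above fit together (the distance D, the activation A, the thresholds Sthr and Ethr, and the learning rates η1 and η2), the following is a minimal, hypothetical sketch in Python of a single training step of a SECoS-style network. It is an illustrative assumption based on the general ECoS training algorithm discussed in Chapter 4, not the exact implementation used in this thesis; the function name, the Manhattan-based normalised distance, and the single-winner output propagation are all the sketch's own choices.

```python
import numpy as np

def secos_train_step(W1, W2, I, Od, Sthr, Ethr, eta1, eta2):
    """One illustrative training step of a SECoS-style network.

    W1: input-to-evolving-layer weights (one row per evolving neuron)
    W2: evolving-layer-to-output weights
    I:  input vector, assumed normalised to [0, 1]
    Od: desired output vector
    """
    if W1.shape[0] == 0:
        # Empty evolving layer: seed it with a neuron mapping I to Od.
        return np.array([I]), np.array([Od])

    # Normalised distance D between I and each evolving-layer neuron,
    # so that the activation A = 1 - D lies in [0, 1].
    D = np.abs(W1 - I).sum(axis=1) / len(I)
    A = 1.0 - D
    j = int(np.argmax(A))            # winning (most highly activated) neuron j

    # Propagate only the winner's activation to the output layer.
    O = A[j] * W2[j]
    Eo = np.abs(Od - O).max()        # worst-case error over the output neurons

    if A[j] < Sthr or Eo > Ethr:
        # The example is too far from every neuron, or the error is too
        # large: add a new evolving-layer neuron for it.
        W1 = np.vstack([W1, I])
        W2 = np.vstack([W2, Od])
    else:
        # Otherwise adapt the winner: pull W1[j] towards I (learning rate
        # one) and adjust W2[j] to reduce the output error (learning rate two).
        W1[j] += eta1 * (I - W1[j])
        W2[j] += eta2 * (Od - O)
    return W1, W2
```

Applying this step over a training set T grows the evolving layer only where the existing neurons cannot account for a new example, which is the behaviour the thresholds Sthr and Ethr control.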
Contents
1
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
i
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
ii
First Thoughts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
iii
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
iv
Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
v
Symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
vi
Introduction
1
1.1
Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1
1.1.1
Research Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2
Research Hypotheses and Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4
1.2.1
Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4
1.2.2
Hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5
1.2.3
Scope of Hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
6
1.3
Criteria for Success . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7
1.4
Original Contributions of the Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8
1.5
Structure of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9
1.6
Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9
1.7
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
10
1.2
2
Fuzzy Systems, Neural Networks and Evolutionary Algorithms
11
2.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11
2.2
Fuzzy Logic Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11
2.2.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11
2.2.2
Fuzzy Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
12
2.2.3
Zadeh-Mamdani Fuzzy Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
13
2.2.4
Takagi-Sugeno-Kang Fuzzy Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . .
15
Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
15
2.3.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
15
2.3.2
Artificial Neurons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
15
2.3.3
Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
16
2.3.4
ANN Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
16
2.3.5
Perceptrons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
17
2.3
CONTENTS
2.4
2.5
2.6
2.7
2.8 3
ix
2.3.6
Multi-Layer Perceptrons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
18
2.3.7
Backpropagation of Errors Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
18
2.3.8
Fuzzy Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
19
2.3.9
The Fuzzy Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
20
Perceptron Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
22
2.4.1
Problems with Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . .
22
Evolutionary Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
24
2.5.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
24
2.5.2
Evolutionary Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
24
2.5.3
Genetic Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
25
Benchmark Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
27
2.6.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
27
2.6.2
Two Spirals Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
28
2.6.3
Iris Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
29
2.6.4
Mackey-Glass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
29
2.6.5
Gas Furnace Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
30
Benchmark Experiments with MLP and FuNN . . . . . . . . . . . . . . . . . . . . . . . . . . . .
32
2.7.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
32
2.7.2
Two Spirals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
35
2.7.3
Iris Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
37
2.7.4
Mackey-Glass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
38
2.7.5
Gas Furnace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
39
2.7.6
Conclusions for Benchmark Experiments with MLP and FuNN . . . . . . . . . . . . . .
40
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
40
Constructive Connectionist Systems
41
3.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
41
3.2
Pruning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
44
3.2.1
Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
44
3.2.2
Penalty Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
45
3.3
Constructing versus Pruning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
45
3.4
The Dynamic Node Creation Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
45
3.5
The Tiling Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
46
3.6
The Upstart Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
46
3.7
Cascade Correlation Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
47
3.8
Resource Allocating Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
49
3.9
Evolutionary Nearest Neighbor MLP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
50
3.10 Growing Cell Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
51
3.11 Zero Instruction Set Computer Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
55
CONTENTS
4
x
3.12 Grow and Learn Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
56
3.13 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
58
3.14 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
58
Evolving Connectionist Systems
60
4.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
60
4.2
The ECoS Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
61
4.3
Evolving Fuzzy Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
62
4.4
General ECoS Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
64
4.4.1
ECoS training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
65
4.4.2
Neuron Allocation Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
66
4.5
EFuNN Revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
67
4.6
The Simple Evolving Connectionist System . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
67
4.6.1
The SECoS Structure and Learning Algorithm . . . . . . . . . . . . . . . . . . . . . . .
68
4.7
Dynamic Evolving Neural-Fuzzy Inference System . . . . . . . . . . . . . . . . . . . . . . . . .
68
4.8
Output Space Expansion in ECoS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
70
4.8.1
EFuNN Output Expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
70
4.8.2
SECoS Output Expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
71
Temporal Extensions of ECoS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
71
4.10 Evaluation of ECoS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
72
4.10.1 Evaluation of ECoS in terms of constructive connectionist systems . . . . . . . . . . . . .
73
4.10.2 Evaluation of ECoS in terms of Intelligent Information Systems Criteria . . . . . . . . . .
73
4.11 Comparing ECoS with Constructive Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . .
74
4.11.1 Upstart Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
74
4.11.2 Resource Allocating Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
74
4.11.3 Evolutionary Nearest Neighbour MLP . . . . . . . . . . . . . . . . . . . . . . . . . . . .
75
4.11.4 Growing Cell Structure Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
76
4.11.5 ZISC Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
76
4.11.6 Grow and Learn Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
77
4.12 Applications of ECoS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
77
4.13 Experiments with ECoS Networks over the Benchmark Data Sets . . . . . . . . . . . . . . . . . .
79
4.13.1 Two Spirals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
82
4.13.2 Iris Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
84
4.13.3 Mackey-Glass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
85
4.13.4 Gas Furnace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
87
4.13.5 Benchmark Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
88
4.14 Problems with the ECoS Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
89
4.15 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
89
4.16 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
90
4.9
CONTENTS
xi
5
Formalisation of Evolving Connectionist Systems
91
5.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
91
5.2
Existing Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
92
5.3
A New Formalisation of ECOS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
93
5.3.1
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
94
5.4
Theoretical Basis of ECoS Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
97
5.5
Influence of the Neuron Addition Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . .
98
5.6
Influence of the Sensitivity Threshold Parameter . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.7
Influence of the Error Threshold Parameter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.8
Influence of the Learning Rate One Parameter . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.9
Influence of the Learning Rate Two Parameter . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Axioms of State
5.10 Function Approximation versus Classification Problems . . . . . . . . . . . . . . . . . . . . . . . 106 5.11 Convergence of ECoS Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 5.12 Ramifications of Non-orthogonality of Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . 107 5.13 Ramifications of Non-uniform Distribution of Data . . . . . . . . . . . . . . . . . . . . . . . . . 107 5.14 Validation of the Formalisation with Benchmark Data Sets . . . . . . . . . . . . . . . . . . . . . 108 5.14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 5.14.2 Experimental Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 5.14.3 Sensitivity Threshold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 5.14.4 Error Threshold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 5.14.5 Learning Rate Two . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112 5.14.6 Benchmark Experiment Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114 5.15 Problems with ECoS Revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117 5.16 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118 5.17 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118 6
Fuzzy Rules and Evolving Connectionist Systems
119
6.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
6.2
Evaluation of Rule Extraction Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
6.3
Previous Work in Rule Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
6.4
Fuzzy Rule Extraction from FuNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 6.4.1
The REFuNN Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
6.4.2
Evaluation of the REFuNN Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
6.5
Fuzzy Rule Extraction from ECoS Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
6.6
Extraction of Fuzzy Rules from EFuNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
6.7
6.6.1
The Rule Extraction Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
6.6.2
Evaluation of the RE-EFuNN Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 128
Extraction of Fuzzy Rules from SECoS Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 129 6.7.1
Extracting Zadeh-Mamdani Fuzzy Rules . . . . . . . . . . . . . . . . . . . . . . . . . . 130
CONTENTS
6.7.2 6.8
6.9
xii
Extracting Takagi-Sugeno Fuzzy Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
Insertion of Fuzzy Rules into SECoS Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 6.8.1
Inserting Zadeh-Mamdani Fuzzy Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
6.8.2
Inserting Takagi-Sugeno Fuzzy Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
Evaluation of Fuzzy Rules Extracted from ECoS Networks . . . . . . . . . . . . . . . . . . . . . 137
6.10 Problems with Fuzzy Rules Extracted from ECoS Networks . . . . . . . . . . . . . . . . . . . . 137 6.11 Experiments with Benchmark Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138 6.11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138 6.11.2 Experimental Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140 6.11.3 Two Spirals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142 6.11.4 Iris Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145 6.11.5 Mackey-Glass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 6.11.6 Gas Furnace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150 6.11.7 Conclusions for Experiments with Benchmark Data . . . . . . . . . . . . . . . . . . . . . 152 6.12 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153 6.13 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153 7
Methods for the Optimisation of ECoS Networks
155
7.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
7.2
Optimising Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
7.3
7.4
7.5
7.2.1
Pruning Revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
7.2.2
Evolutionary Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
7.2.3
Requirements for ECoS Optimisation Algorithms . . . . . . . . . . . . . . . . . . . . . . 159
Methods for Optimising Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159 7.3.1
Online Neuron Aggregation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
7.3.2
Optimisation of Parameters by Evolutionary Algorithm . . . . . . . . . . . . . . . . . . . 160
Methods for Post-training Optimisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162 7.4.1
Offline Aggregation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
7.4.2
Sleep Learning in ECoS Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
7.4.3
Evolutionary Sleep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
Experiments with Benchmark Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165 7.5.1
Online Aggregation Experimental Method . . . . . . . . . . . . . . . . . . . . . . . . . . 165
7.5.2
Evolutionary Optimised Training Experimental Method . . . . . . . . . . . . . . . . . . 166
7.5.3
Offline Aggregation Experimental Method . . . . . . . . . . . . . . . . . . . . . . . . . 167
7.5.4
Sleep Training Experimental Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
7.5.5
Evolved Sleep Training Experimental Method . . . . . . . . . . . . . . . . . . . . . . . . 168
7.5.6
Method for the Comparison of Techniques . . . . . . . . . . . . . . . . . . . . . . . . . 170
7.5.7
Experiments with the Two Spirals Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . 171
7.5.8
Experiments with the Iris Classification Dataset . . . . . . . . . . . . . . . . . . . . . . . 175
CONTENTS
7.5.9
xiii
Experiments with the Mackey-Glass Dataset . . . . . . . . . . . . . . . . . . . . . . . . 178
7.5.10 Experiments with the Gas Furnace Dataset . . . . . . . . . . . . . . . . . . . . . . . . . 182 7.5.11 Benchmark Experiments Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
7.6 Conclusions . . . 186
7.7 Summary . . . 186

8 Case Study: The Isolated Phoneme Recognition Problem 187
8.1 Introduction . . . 187
8.2 The Nature of Speech . . . 189
8.3 Previous Work . . . 189
8.4 The Otago Speech Corpus . . . 190
8.5 Experimental Method . . . 190
8.5.1 Presentation of Results . . . 192
8.5.2 Statistical Tests . . . 195
8.5.3 Limitations of the Approach . . . 195
8.6 Results with the MLP and FuNN Algorithms . . . 196
8.7 Results with EFuNN and SECoS Algorithms . . . 198
8.8 Results of Fuzzy Rules Extracted from ECoS . . . 202
8.8.1 Fuzzy Rule Extraction . . . 202
8.8.2 Fuzzy Rule Insertion . . . 204
8.9 Results of ECoS Optimisation Techniques . . . 206
8.9.1 Online Neuron Aggregation . . . 206
8.9.2 Offline Aggregation . . . 208
8.9.3 Sleep Learning . . . 208
8.10 Conclusions . . . 212
8.11 Summary . . . 213

9 Conclusions and Future Work 214
9.1 Introduction . . . 214
9.2 Summary of Benchmark Experiment Results . . . 215
9.3 Hypothesis One: ECoS and Constructive Algorithms . . . 215
9.3.1 How similar is the ECoS algorithm to other constructive algorithms? . . . 215
9.3.2 What elements of existing constructive algorithms can be adapted to ECoS? . . . 215
9.3.3 Support for Hypothesis One . . . 215
9.3.4 Conclusions for Hypothesis One . . . 218
9.4 Hypothesis Two: Simplified ECoS . . . 218
9.4.1 Is a simplified version of EFuNN competitive with the original EFuNN? . . . 218
9.4.2 Are the simplified ECoS as flexible as EFuNN? . . . 218
9.4.3 Support for Hypothesis Two . . . 219
9.4.4 Conclusions for Hypothesis Two . . . 220
9.5 Hypothesis Three: Formalisation of ECoS . . . 220
9.5.1 How can the internal state of an ECoS network be explained? . . . 220
9.5.2 What effect does each training parameter have on the behaviour of ECoS? . . . 220
9.5.3 Support for Hypothesis Three . . . 220
9.5.4 Conclusions for Hypothesis Three . . . 221
9.6 Hypothesis Four: Fuzzy Rule Extraction . . . 221
9.6.1 How accurate are the rules extracted from the simplified networks? . . . 221
9.6.2 How do the rules extracted from the non-fuzzy ECoS compare to rules extracted from EFuNN? . . . 222
9.6.3 Support for Hypothesis Four . . . 222
9.6.4 Conclusions for Hypothesis Four . . . 222
9.7 Hypothesis Five: ECoS Optimisation . . . 222
9.7.1 At what stages of an ECoS network life-cycle can optimisation methods be applied? . . . 223
9.7.2 How can evolutionary algorithms be applied to optimising ECoS networks? . . . 223
9.7.3 Support for Hypothesis Five . . . 223
9.7.4 Conclusions for Hypothesis Five . . . 224
9.8 Future Work . . . 224
Bibliography 244

A Results of Hypothesis Tests for Experiments with MLP and FuNN 245
A.1 Two Spirals . . . 245
A.2 Iris Classification . . . 246
A.3 Mackey-Glass . . . 247
A.4 Gas Furnace . . . 248

B Results of Hypothesis Tests for Experiments with EFuNN and SECoS 250
B.1 Two Spirals . . . 250
B.2 Iris Classification . . . 251
B.3 Mackey-Glass . . . 253
B.4 Gas Furnace . . . 255

C Results of Hypothesis Tests for Experiments with Rule Extraction 260
C.1 Two Spirals . . . 260
C.2 Iris Classification . . . 261
C.3 Mackey-Glass . . . 263
C.4 Gas Furnace . . . 265

D Results of Hypothesis Tests for ECoS Optimisation Experiments 271
D.1 Two Spirals . . . 271
D.1.1 Online Aggregation . . . 271
D.1.2 Evolutionary Optimised Training . . . 272
D.1.3 Offline Aggregation . . . 272
D.1.4 Sleep Learning . . . 273
D.1.5 Evolved Sleep Learning . . . 273
D.1.6 Comparison of Techniques . . . 274
D.2 Iris Classification . . . 275
D.2.1 Online Aggregation . . . 275
D.2.2 Evolutionary Optimised Training . . . 276
D.2.3 Offline Aggregation . . . 276
D.2.4 Sleep Learning . . . 277
D.2.5 Evolved Sleep Learning . . . 277
D.2.6 Comparison of Techniques . . . 278
D.3 Mackey-Glass . . . 279
D.3.1 Online Aggregation . . . 279
D.3.2 Evolutionary Optimised Training . . . 280
D.3.3 Offline Aggregation . . . 280
D.3.4 Sleep Learning . . . 281
D.3.5 Evolved Sleep Learning . . . 281
D.3.6 Comparison of Techniques . . . 282
D.4 Gas Furnace . . . 283
D.4.1 Online Aggregation . . . 283
D.4.2 Evolutionary Optimised Training . . . 284
D.4.3 Offline Aggregation . . . 284
D.4.4 Sleep Learning . . . 285
D.4.5 Evolved Sleep Learning . . . 285
D.4.6 Comparison of Techniques . . . 286

E Results of Hypothesis Tests for Phoneme Recognition Experiments 288
E.1 MLP and FuNN . . . 288
E.2 EFuNN and SECoS . . . 289
E.3 Inserted Fuzzy Rules . . . 290
E.4 ECoS Optimisation . . . 291
E.4.1 Online Aggregation . . . 291
E.4.2 Offline Aggregation . . . 292
E.4.3 Sleep Learning . . . 293

F Complete Phoneme Experimental Results 296
List of Tables

1.1 Location of investigation of each hypothesis. . . . 10
2.1 Statistical hypotheses for comparing MLP and FuNN. . . . 35
2.2 Statistical hypotheses for evaluating changes in accuracy after further training. . . . 35
2.3 Statistical hypotheses for comparing changes in accuracy of MLP and FuNN. . . . 35
2.4 Backpropagation training parameters for the two spirals problem. . . . 36
2.5 Mean percent correct / approximate variance (to 1 decimal place (d.p.)) for the two spirals problem. . . . 36
2.6 Backpropagation training parameters for the iris classification problem. . . . 37
2.7 Mean percent correct / approximate variance (to 1 d.p.) for the iris classification problem. . . . 37
2.8 Average mean squared error / approximate variance (x10^-4) for the Mackey-Glass problem. . . . 38
2.9 Average mean squared error / approximate variance (to 3 d.p.) for the gas furnace problem. . . . 39
4.1 Evaluation of ECoS in terms of requirements of IIS. . . . 72
4.2 ECoS training parameters for benchmark data. . . . 80
4.3 Statistical hypotheses for comparing EFuNN and SECoS. . . . 80
4.4 Statistical hypotheses for comparing MLP and SECoS. . . . 81
4.5 Statistical hypotheses for comparing FuNN and EFuNN. . . . 81
4.6 Statistical hypotheses for comparing changes in accuracy of SECoS and EFuNN. . . . 82
4.7 Statistical hypotheses for comparing changes in accuracy of MLP and SECoS. . . . 82
4.8 Statistical hypotheses for comparing changes in accuracy of FuNN and EFuNN. . . . 82
4.9 Statistical hypotheses for evaluating changes in accuracy after further training. . . . 83
4.10 Mean percent correct / standard deviation (to 1 d.p.) for the two spirals problem. . . . 83
4.11 Mean percent correct / standard deviation (to 1 d.p.) for the iris classification problem. . . . 84
4.12 Average mean squared error / standard deviation (x10^-4) for the Mackey-Glass problem. . . . 86
4.13 Average mean squared error / standard deviation (to 3 d.p.) for the gas furnace problem. . . . 87
5.1 ECoS training parameters for benchmark data. . . . 108
6.1 Fuzzy rules extracted from a SECoS trained on the iris data set. . . . 131
6.2 Zadeh-Mamdani rules extracted from a SECoS trained on the gas furnace data set. . . . 132
6.3 Takagi-Sugeno rules extracted from a SECoS trained on the iris data set. . . . 134
6.4 Takagi-Sugeno rules extracted from a SECoS trained on the gas furnace data set. . . . 135
6.5 Statistical hypotheses for comparing networks and extracted rules for the two spirals and iris classification data sets. . . . 141
6.6 Statistical hypotheses for comparing networks and extracted rules for the Mackey-Glass and gas furnace data sets. . . . 141
6.7 Statistical hypotheses for comparing rules extracted from SECoS and EFuNN. . . . 141
6.8 Statistical hypotheses for comparing Zadeh-Mamdani rules extracted from SECoS with Takagi-Sugeno rules extracted from SECoS. . . . 142
6.9 Statistical hypotheses for comparing Zadeh-Mamdani rules with the networks created via insertion of those rules. . . . 142
6.10 Statistical hypotheses for comparing networks created via the insertion of Zadeh-Mamdani rules with the original networks. . . . 142
6.11 Mean percentage correct / standard deviation (to 1 d.p.) for the two spirals problem. . . . 143
6.12 Mean percent correct / standard deviation (to 1 d.p.) for the iris classification problem. . . . 145
6.13 Average mean squared error / standard deviation (x10^-4) of networks and extracted rules for the Mackey-Glass data set. . . . 148
6.14 Average mean squared errors / standard deviation (to 3 d.p.) of networks and extracted rules for the gas furnace data set. . . . 150
6.15 Reported Mean Squared Error for the Gas Furnace Problem. . . . 152
7.1 Training parameters for online aggregation. . . . 165
7.2 Statistical hypotheses for evaluating online aggregation. . . . 165
7.3 Parameters for GA optimised training over Two Spirals and Iris Classification data sets. . . . 166
7.4 Parameters for GA optimised training over Mackey-Glass and Gas Furnace data sets. . . . 166
7.5 Statistical hypotheses for evaluating evolutionary optimised ECoS training. . . . 167
7.6 Parameters for offline aggregation. . . . 167
7.7 Statistical hypotheses for evaluating offline aggregation. . . . 167
7.8 Sleep Training Parameters. . . . 168
7.9 Statistical hypotheses for evaluating sleep training. . . . 168
7.10 Parameters for GA optimised sleep training. . . . 169
7.11 Statistical hypotheses for evaluating evolved sleep training. . . . 169
7.12 Statistical hypotheses for comparing sleep trained networks and evolved sleep trained networks. . . . 170
7.13 Statistical hypotheses for comparing online aggregation and evolutionary optimised ECoS training. . . . 170
7.14 Statistical hypotheses for comparing offline aggregation and sleep training. . . . 171
7.15 Statistical hypotheses for comparing offline aggregation and evolved sleep training. . . . 171
7.16 Mean percent correct / standard deviation / approximate variance (to 1 d.p.) for the two spirals problem. . . . 172
7.17 Average mean squared error / standard deviation / approximate variance (to 3 d.p.) for the two spirals problem. . . . 172
7.18 Optimised SECoS for iris classification. . . . 176
7.19 Optimised EFuNN for iris classification. . . . 176
7.20 Average mean squared error / standard deviation / approximate variance (x10^-4) for the Mackey-Glass problem. . . . 179
7.21 Average mean squared error / standard deviation / approximate variance (x10^-4) for the Mackey-Glass problem. . . . 180
7.22 Optimised SECoS for gas furnace. . . . 182
7.23 Optimised EFuNN for gas furnace. . . . 183
7.24 Success of each optimisation method applied to SECoS, by benchmark data set. . . . 185
7.25 Success of each optimisation method applied to EFuNN, by benchmark data set. . . . 186
8.1 Phoneme numbers, character representation and example words. . . . 191
8.2 Phonemes grouped by class. . . . 192
8.3 Mel scale filter central frequencies. . . . 193
8.4 Examples available for each phoneme in each phoneme data set. . . . 194
8.5 Training Parameters for MLP and FuNN trained for the phoneme case study. . . . 196
8.6 Mean accuracies / approximate variance of MLP and FuNN for the phoneme recognition case study. . . . 197
8.7 Statistical hypotheses for comparing MLP and FuNN. . . . 198
8.8 Statistical hypotheses for evaluating changes in accuracy after further training for the phoneme case study. . . . 198
8.9 ECoS training parameters for phoneme recognition problem. . . . 199
8.10 Mean percentage / standard deviation of true positive, true negative and overall accuracies of EFuNN and SECoS for the phoneme recognition case study. . . . 199
8.11 Statistical hypotheses for comparing SECoS and EFuNN for the phoneme recognition case study. . . . 200
8.12 Statistical hypotheses for comparing MLP and SECoS for the phoneme case study. . . . 200
8.13 Statistical hypotheses for comparing FuNN and EFuNN for the phoneme case study. . . . 201
8.14 Statistical hypotheses for comparing changes in accuracy of SECoS and EFuNN for the phoneme recognition case study. . . . 202
8.15 Mean percentage / standard deviation of true positive, true negative and overall accuracies of fuzzy rule extraction, for the phoneme recognition case study. . . . 203
8.16 Mean percentage / standard deviation of true positive, true negative and overall accuracies of fuzzy rule insertion, for the phoneme recognition case study. . . . 204
8.17 Statistical hypotheses for comparing Zadeh-Mamdani rules with the networks created via insertion of those rules, for the phoneme recognition case study. . . . 205
8.18 Statistical hypotheses for comparing networks created via the insertion of Zadeh-Mamdani rules with the original networks, for the phoneme recognition case study. . . . 205
8.19 Online aggregation training parameters for the phoneme recognition problem. . . . 206
8.20 Mean percentage / standard deviation of true positive, true negative and overall accuracies for ECoS networks optimised via online aggregation training, for the phoneme recognition case study. . . . 207
8.21 Statistical hypotheses for evaluating online aggregation, for the phoneme recognition case study. . . . 207
8.22 Offline aggregation parameters for the phoneme recognition problem. . . . 208
8.23 Mean percentage / standard deviation of true positive, true negative and overall accuracies of ECoS networks optimised via offline aggregation, for the phoneme recognition case study. . . . 209
8.24 Statistical hypotheses for evaluating offline aggregation, for the phoneme recognition case study. . . . 209
8.25 Sleep learning parameters used for the phoneme recognition problem. . . . 210
8.26 Mean percentage / standard deviation of true positive, true negative and overall accuracies of SECoS optimised by sleep learning, for the phoneme recognition case study. . . . 210
8.27 Statistical hypotheses for evaluating sleep training, for the phoneme recognition case study. . . . 211
8.28 Statistical hypotheses for comparing offline aggregation and sleep training, for the phoneme recognition case study. . . . 211
8.29 Summary of results over the phoneme recognition case study. . . . 213
9.1 Summary of results over the benchmark data sets. . . . 216
9.2 Locations of benchmark results. . . . 217
A.1 Rejection / acceptance of H0 for MLP vs. FuNN for the two spirals problem. . . . 245
A.2 Rejection / acceptance of H0 for change in accuracies of MLP for the two spirals problem. . . . 245
A.3 Rejection / acceptance of H0 for change in accuracies for FuNN for the two spirals problem. . . . 246
A.4 Rejection / acceptance of H0 for change in accuracies of MLP vs. change in accuracies of FuNN for the two spirals problem. . . . 246
A.5 Rejection / acceptance of H0 for MLP vs. FuNN for the iris classification problem. . . . 246
A.6 Rejection / acceptance of H0 for change in accuracies of MLP for the iris classification problem. . . . 247
A.7 Rejection / acceptance of H0 for change in accuracies for FuNN for the iris classification problem. . . . 247
A.8 Rejection / acceptance of H0 for change in accuracies of MLP vs. change in accuracies of FuNN for the iris classification problem. . . . 247
A.9 Rejection / acceptance of H0 for MLP vs. FuNN for the Mackey-Glass problem. . . . 248
A.10 Rejection / acceptance of H0 for change in accuracies of MLP for the Mackey-Glass problem. . . . 248
A.11 Rejection / acceptance of H0 for change in accuracies for FuNN for the Mackey-Glass problem. . . . 248
A.12 Rejection / acceptance of H0 for change in accuracies of MLP vs. change in accuracies of FuNN for the Mackey-Glass problem. . . . 249
A.13 Rejection / acceptance of H0 for MLP vs. FuNN for the gas furnace problem. . . . 249
A.14 Rejection / acceptance of H0 for change in accuracies of MLP for the gas furnace problem. . . . 249
A.15 Rejection / acceptance of H0 for change in accuracies for FuNN for the gas furnace problem. . . . 249
A.16 Rejection / acceptance of H0 for change in accuracies of MLP vs. change in accuracies of FuNN for the gas furnace problem. . . . 249
B.1 Rejection / acceptance of H0 for EFuNN vs. SECoS for the two spirals. . . . 251
B.2 Rejection / acceptance of H0 for MLP vs. SECoS for the two spirals. . . . 251
B.3 Rejection / acceptance of H0 for FuNN vs. EFuNN for the two spirals. . . . 251
B.4 Rejection / acceptance of H0 for change in accuracies of SECoS for the two spirals problem. . . . 252
B.5 Rejection / acceptance of H0 for change in accuracies of EFuNN for the two spirals problem. . . . 252
LIST OF TABLES
xxi
B.6 Rejection /acceptance of H0 for change in accuracies of SECoS vs. change in accuracies of EFuNN for the two spirals problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252 B.7 Rejection of H0 for change in accuracies of MLP vs. change in accuracies of SECoS for the two spirals problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253 B.8 Rejection /acceptance of H0 for change in accuracies of FuNN vs. change in accuracies of EFuNN for the two spirals problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253 B.9 Rejection /acceptance of H0 for EFuNN vs. SECoS for the iris classification problem. . . . . . . . 253 B.10 Rejection /acceptance of H0 for MLP vs. SECoS for the iris classification problem. . . . . . . . . 254 B.11 Rejection /acceptance of H0 for FuNN vs. EFuNN for the iris classification problem. . . . . . . . 254 B.12 Rejection /acceptance of H0 for change in accuracies of SECoS for the iris classification problem. 254 B.13 Rejection /acceptance of H0 for change in accuracies of EFuNN for the iris classification problem. 255 B.14 Rejection /acceptance of H0 for change in accuracies of SECoS vs. change in accuracies of EFuNN for the iris classification problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255 B.15 Rejection /acceptance of H0 for change in accuracies of MLP vs. change in accuracies of SECoS for the iris classification problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255 B.16 Rejection /acceptance of H0 for change in accuracies of FuNN vs. change in accuracies of EFuNN for the iris classification problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256 B.17 Rejection /acceptance of H0 for EFuNN vs. SECoS for the Mackey-Glass data set. . . . . . . . . 256 B.18 Rejection /acceptance of H0 for MLP vs. SECoS for the Mackey-Glass data set. . . . . . . . . . . 
256 B.19 Rejection /acceptance of H0 for FuNN vs. EFuNN for the Mackey-Glass data set. . . . . . . . . . 257 B.20 Rejection /acceptance of H0 for change in accuracies of SECoS for the Mackey-Glass problem. . 257 B.21 Rejection /acceptance of H0 for change in accuracies of EFuNN for the Mackey-Glass problem. . 257 B.22 Rejection /acceptance of H0 for change in accuracies of SECoS vs. change in accuracies of EFuNN for the Mackey-Glass problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257 B.23 Rejection /acceptance of H0 for change in accuracies of MLP vs. change in accuracies of SECoS for the Mackey-Glass problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257 B.24 Rejection /acceptance of H0 for change in accuracies of FuNN vs. change in accuracies of EFuNN for the Mackey-Glass problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257 B.25 Rejection /acceptance of H0 for EFuNN vs. SECoS for the gas furnace problem. . . . . . . . . . 258 B.26 Rejection /acceptance of H0 for MLP vs. SECoS for the gas furnace problem. . . . . . . . . . . . 258 B.27 Rejection /acceptance of H0 for FuNN vs. EFuNN for the gas furnace problem. . . . . . . . . . . 258 B.28 Rejection /acceptance of H0 for change in accuracies of SECoS for the gas furnace problem. . . . 258 B.29 Rejection /acceptance of H0 for change in accuracies of EFuNN for the gas furnace problem. . . . 258 B.30 Rejection /acceptance of H0 for change in accuracies of SECoS vs change in accuracies of EFuNN for the gas furnace problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258 B.31 Rejection /acceptance of H0 for change in accuracies of MLP vs. change in accuracies of SECoS for the gas furnace problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
LIST OF TABLES
xxii
B.32 Rejection /acceptance of H0 for change in accuracies of FuNN vs. change in accuracies of EFuNN for the gas furnace problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259 C.1 Rejection / acceptance of H0 for SECoS vs. Zadeh-Mamdani fuzzy rules extracted from SECoS, for the two spirals problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261 C.2 Rejection / acceptance of H0 for EFuNN vs. Zadeh-Mamdani fuzzy rules extracted from EFuNN, for the two spirals problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261 C.3 Rejection / acceptance of H0 for SECoS vs. Takagi-Sugeno fuzzy rules extracted from SECoS, for the two spirals problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261 C.4 Rejection / acceptance of H0 for Zadeh-Mamdani rules extracted from SECoS vs. Zadeh-Mamdani rules extracted from EFuNN for the two spirals problem. . . . . . . . . . . . . . . . . . . . . . . 262 C.5 Rejection / acceptance of H0 for Zadeh-Mamdani fuzzy rules extracted from SECoS vs. TakagiSugeno fuzzy rules extracted from SECoS, for the two spirals problem. . . . . . . . . . . . . . . 262 C.6 Rejection / acceptance of H0 for Zadeh-Mamdani fuzzy rules extracted from SECoS vs. SECoS created by insertion of Zadeh-Mamdani rules, for the two spirals problem. . . . . . . . . . . . . . 262 C.7 Rejection / acceptance of H0 for Zadeh-Mamdani fuzzy rules extracted from EFuNN vs. EFuNN created by insertion of Zadeh-Mamdani rules, for the two spirals problem. . . . . . . . . . . . . . 263 C.8 Rejection / acceptance of H0 for SECoS vs. SECoS created by insertion of Zadeh-Mamdani rules, for the two spirals problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263 C.9 Rejection / acceptance of H0 for EFuNN vs. EFuNN created by insertion of Zadeh-Mamdani rules, for the two spirals problem. . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . 263 C.10 Rejection / acceptance of H0 for SECoS vs. Zadeh-Mamdani fuzzy rules extracted from SECoS, for the iris classification problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264 C.11 Rejection / acceptance of H0 for EFuNN vs. Zadeh-Mamdani fuzzy rules extracted from EFuNN, for the iris classification problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264 C.12 Rejection / acceptance of H0 for SECoS vs. Takagi-Sugeno fuzzy rules extracted from SECoS, for the iris classification problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264 C.13 Rejection / acceptance of H0 for Zadeh-Mamdani rules extracted from SECoS vs. Zadeh-Mamdani rules extracted from EFuNN for the iris classification problem. . . . . . . . . . . . . . . . . . . . 265 C.14 Rejection / acceptance of H0 for Zadeh-Mamdani fuzzy rules extracted from SECoS vs. TakagiSugeno fuzzy rules extracted from SECoS, for the iris classification problem. . . . . . . . . . . . 265 C.15 Rejection / acceptance of H0 for Zadeh-Mamdani fuzzy rules extracted from SECoS vs. SECoS created by insertion of Zadeh-Mamdani rules, for the iris classification problem. . . . . . . . . . . 265 C.16 Rejection / acceptance of H0 for Zadeh-Mamdani fuzzy rules extracted from EFuNN vs. EFuNN created by insertion of Zadeh-Mamdani rules, for the iris classification problem. . . . . . . . . . . 266 C.17 Rejection / acceptance of H0 for SECoS vs. SECoS created by insertion of Zadeh-Mamdani rules, for the iris classification problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266 C.18 Rejection / acceptance of H0 for EFuNN vs. EFuNN created by insertion of Zadeh-Mamdani rules, for the iris classification problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
LIST OF TABLES
C.19 Rejection / acceptance of H0 for SECoS vs. Zadeh-Mamdani fuzzy rules extracted from SECoS, for the Mackey-Glass problem . . . 267
C.20 Rejection / acceptance of H0 for EFuNN vs. Zadeh-Mamdani fuzzy rules extracted from EFuNN, for the Mackey-Glass problem . . . 267
C.21 Rejection / acceptance of H0 for SECoS vs. Takagi-Sugeno fuzzy rules extracted from SECoS, for the Mackey-Glass problem . . . 267
C.22 Rejection / acceptance of H0 for Zadeh-Mamdani rules extracted from SECoS vs. Zadeh-Mamdani rules extracted from EFuNN for the Mackey-Glass problem . . . 268
C.23 Rejection / acceptance of H0 for Zadeh-Mamdani fuzzy rules extracted from SECoS vs. Takagi-Sugeno fuzzy rules extracted from SECoS, for the Mackey-Glass problem . . . 268
C.24 Rejection / acceptance of H0 for Zadeh-Mamdani fuzzy rules extracted from SECoS vs. SECoS created by insertion of Zadeh-Mamdani rules, for the Mackey-Glass problem . . . 268
C.25 Rejection / acceptance of H0 for Zadeh-Mamdani fuzzy rules extracted from EFuNN vs. EFuNN created by insertion of Zadeh-Mamdani rules, for the Mackey-Glass problem . . . 268
C.26 Rejection / acceptance of H0 for SECoS vs. SECoS created by insertion of Zadeh-Mamdani rules, for the Mackey-Glass problem . . . 268
C.27 Rejection / acceptance of H0 for EFuNN vs. EFuNN created by insertion of Zadeh-Mamdani rules, for the Mackey-Glass problem . . . 268
C.28 Rejection / acceptance of H0 for SECoS vs. Zadeh-Mamdani fuzzy rules extracted from SECoS, for the gas furnace problem . . . 269
C.29 Rejection / acceptance of H0 for EFuNN vs. Zadeh-Mamdani fuzzy rules extracted from EFuNN, for the gas furnace problem . . . 269
C.30 Rejection / acceptance of H0 for SECoS vs. Takagi-Sugeno fuzzy rules extracted from SECoS, for the gas furnace problem . . . 269
C.31 Rejection / acceptance of H0 for Zadeh-Mamdani rules extracted from SECoS vs. Zadeh-Mamdani rules extracted from EFuNN for the gas furnace problem . . . 269
C.32 Rejection / acceptance of H0 for Zadeh-Mamdani fuzzy rules extracted from SECoS vs. Takagi-Sugeno fuzzy rules extracted from SECoS, for the gas furnace problem . . . 269
C.33 Rejection / acceptance of H0 for Zadeh-Mamdani fuzzy rules extracted from SECoS vs. SECoS created by insertion of Zadeh-Mamdani rules, for the gas furnace problem . . . 269
C.34 Rejection / acceptance of H0 for Zadeh-Mamdani fuzzy rules extracted from EFuNN vs. EFuNN created by insertion of Zadeh-Mamdani rules, for the gas furnace problem . . . 270
C.35 Rejection / acceptance of H0 for SECoS vs. SECoS created by insertion of Zadeh-Mamdani rules, for the gas furnace problem . . . 270
C.36 Rejection / acceptance of H0 for EFuNN vs. EFuNN created by insertion of Zadeh-Mamdani rules, for the gas furnace problem . . . 270
D.1 Rejection / acceptance of H0 for SECoS vs. SECoS trained by online aggregation for the two spirals problem . . . 271
D.2 Rejection / acceptance of H0 for EFuNN vs. EFuNN trained by online aggregation for the two spirals problem . . . 272
D.3 Rejection / acceptance of H0 for SECoS vs. SECoS trained via evolutionary optimised training for the two spirals problem . . . 272
D.4 Rejection / acceptance of H0 for EFuNN vs. EFuNN trained via evolutionary optimised training for the two spirals problem . . . 272
D.5 Rejection / acceptance of H0 for SECoS optimised by offline aggregation for the two spirals problem . . . 273
D.6 Rejection / acceptance of H0 for EFuNN optimised by offline aggregation for the two spirals problem . . . 273
D.7 Rejection / acceptance of H0 for SECoS optimised by sleep learning for the two spirals problem . . . 273
D.8 Rejection / acceptance of H0 for SECoS optimised by GA optimised sleep learning, for the two spirals problem . . . 274
D.9 Rejection / acceptance of H0 for SECoS optimised via sleep learning vs. SECoS optimised via GA optimised sleep learning, for the two spirals problem . . . 274
D.10 Rejection / acceptance of H0 for SECoS optimised by online aggregation vs. SECoS trained via GA optimised training for the two spirals problem . . . 274
D.11 Rejection / acceptance of H0 for EFuNN optimised by online aggregation vs. EFuNN trained via GA optimised training for the two spirals problem . . . 275
D.12 Rejection / acceptance of H0 for SECoS optimised by offline aggregation vs. SECoS optimised by sleep learning for the two spirals problem . . . 275
D.13 Rejection / acceptance of H0 for SECoS optimised by offline aggregation vs. SECoS optimised by GA optimised sleep learning for the two spirals problem . . . 275
D.14 Rejection / acceptance of H0 for SECoS vs. SECoS trained by online aggregation for the iris classification problem . . . 276
D.15 Rejection / acceptance of H0 for EFuNN vs. EFuNN trained by online aggregation for the iris classification problem . . . 276
D.16 Rejection / acceptance of H0 for SECoS vs. SECoS trained via evolutionary optimised training for the iris classification problem . . . 276
D.17 Rejection / acceptance of H0 for EFuNN vs. EFuNN trained via evolutionary optimised training for the iris classification problem . . . 277
D.18 Rejection / acceptance of H0 for SECoS optimised by offline aggregation for the iris classification problem . . . 277
D.19 Rejection / acceptance of H0 for EFuNN optimised by offline aggregation for the iris classification problem . . . 277
D.20 Rejection / acceptance of H0 for SECoS optimised by sleep learning for the iris classification problem . . . 278
D.21 Rejection / acceptance of H0 for SECoS optimised by GA optimised sleep learning, for the iris classification problem . . . 278
D.22 Rejection / acceptance of H0 for SECoS optimised via sleep learning vs. SECoS optimised via GA optimised sleep learning, for the iris classification problem . . . 278
D.23 Rejection / acceptance of H0 for SECoS optimised by online aggregation vs. SECoS trained via GA optimised training for the iris classification problem . . . 279
D.24 Rejection / acceptance of H0 for EFuNN optimised by online aggregation vs. EFuNN trained via GA optimised training for the iris classification problem . . . 279
D.25 Rejection / acceptance of H0 for SECoS optimised by offline aggregation vs. SECoS optimised by sleep learning for the iris classification problem . . . 279
D.26 Rejection / acceptance of H0 for SECoS optimised by offline aggregation vs. SECoS optimised by GA optimised sleep learning for the iris classification problem . . . 280
D.27 Rejection / acceptance of H0 for SECoS vs. SECoS trained by online aggregation for the Mackey-Glass problem . . . 280
D.28 Rejection / acceptance of H0 for EFuNN vs. EFuNN trained by online aggregation for the Mackey-Glass problem . . . 280
D.29 Rejection / acceptance of H0 for SECoS vs. SECoS trained via evolutionary optimised training for the Mackey-Glass problem . . . 281
D.30 Rejection / acceptance of H0 for EFuNN vs. EFuNN trained via evolutionary optimised training for the Mackey-Glass problem . . . 281
D.31 Rejection / acceptance of H0 for SECoS optimised by offline aggregation for the Mackey-Glass problem . . . 281
D.32 Rejection / acceptance of H0 for EFuNN optimised by offline aggregation for the Mackey-Glass problem . . . 282
D.33 Rejection / acceptance of H0 for SECoS optimised by sleep learning for the Mackey-Glass problem . . . 282
D.34 Rejection / acceptance of H0 for SECoS optimised by GA optimised sleep learning, for the Mackey-Glass problem . . . 282
D.35 Rejection / acceptance of H0 for SECoS optimised via sleep learning vs. SECoS optimised via GA optimised sleep learning, for the Mackey-Glass problem . . . 283
D.36 Rejection / acceptance of H0 for SECoS optimised by online aggregation vs. SECoS trained via GA optimised training for the Mackey-Glass problem . . . 283
D.37 Rejection / acceptance of H0 for EFuNN optimised by online aggregation vs. EFuNN trained via GA optimised training for the Mackey-Glass problem . . . 283
D.38 Rejection / acceptance of H0 for SECoS optimised by offline aggregation vs. SECoS optimised by sleep learning for the Mackey-Glass problem . . . 284
D.39 Rejection / acceptance of H0 for SECoS optimised by offline aggregation vs. SECoS optimised by GA optimised sleep learning for the Mackey-Glass problem . . . 284
D.40 Rejection / acceptance of H0 for SECoS vs. SECoS trained by online aggregation for the gas furnace problem . . . 284
D.41 Rejection / acceptance of H0 for EFuNN vs. EFuNN trained by online aggregation for the gas furnace problem . . . 285
D.42 Rejection / acceptance of H0 for SECoS vs. SECoS trained via evolutionary optimised training for the gas furnace problem . . . 285
D.43 Rejection / acceptance of H0 for EFuNN vs. EFuNN trained via evolutionary optimised training for the gas furnace problem . . . 285
D.44 Rejection / acceptance of H0 for SECoS optimised by offline aggregation for the gas furnace problem . . . 286
D.45 Rejection / acceptance of H0 for EFuNN optimised by offline aggregation for the gas furnace problem . . . 286
D.46 Rejection / acceptance of H0 for SECoS optimised by sleep learning for the gas furnace problem . . . 286
D.47 Rejection / acceptance of H0 for SECoS optimised by GA optimised sleep learning, for the gas furnace problem . . . 287
D.48 Rejection / acceptance of H0 for SECoS optimised via sleep learning vs. SECoS optimised via GA optimised sleep learning, for the gas furnace problem . . . 287
D.49 Rejection / acceptance of H0 for SECoS optimised by online aggregation vs. SECoS trained via GA optimised training for the gas furnace problem . . . 287
D.50 Rejection / acceptance of H0 for EFuNN optimised by online aggregation vs. EFuNN trained via GA optimised training for the gas furnace problem . . . 287
D.51 Rejection / acceptance of H0 for SECoS optimised by offline aggregation vs. SECoS optimised by sleep learning for the gas furnace problem . . . 287
D.52 Rejection / acceptance of H0 for SECoS optimised by offline aggregation vs. SECoS optimised by GA optimised sleep learning for the gas furnace problem . . . 287
E.1 Rejection / acceptance of H0 for MLP vs. FuNN for the phoneme case study . . . 288
E.2 Rejection of H0 for change in accuracy of MLP for the phoneme case study . . . 288
E.3 Rejection of H0 for change in accuracy of FuNN for the phoneme case study . . . 289
E.4 Rejection of hypothesis comparing SECoS and EFuNN for the phoneme recognition case study . . . 289
E.5 Rejection of hypothesis comparing MLP and SECoS for the phoneme recognition case study . . . 290
E.6 Rejection / acceptance of H0 comparing FuNN and EFuNN for the phoneme recognition case study . . . 290
E.7 Rejection of H0 for changes in accuracy of EFuNN for the phoneme recognition case study . . . 291
E.8 Rejection of H0 for changes in accuracy of SECoS for the phoneme recognition case study . . . 291
E.9 Rejection of H0 for changes in accuracy of SECoS and EFuNN for the phoneme recognition case study . . . 291
E.10 Rejection / acceptance of H0 for comparison of Zadeh-Mamdani rules extracted from EFuNN and the networks created from those rules, for the phoneme recognition case study . . . 292
E.11 Rejection / acceptance of H0 for EFuNN vs. EFuNN created via the insertion of Zadeh-Mamdani fuzzy rules, for the phoneme recognition case study . . . 292
E.12 Rejection / acceptance of H0 for evaluation of online aggregation for EFuNN, for the phoneme recognition case study . . . 293
E.13 Rejection / acceptance of H0 for evaluation of online aggregation for SECoS, for the phoneme recognition case study . . . 293
E.14 Rejection / acceptance of H0 for evaluation of offline aggregation for EFuNN, for the phoneme recognition case study . . . 294
E.15 Rejection / acceptance of H0 for evaluation of offline aggregation for SECoS, for the phoneme recognition case study . . . 294
E.16 Rejection / acceptance of H0 for evaluation of sleep learning, for the phoneme recognition case study . . . 294
E.17 Rejection / acceptance of H0 for comparison of sleep learning and offline aggregation, for the phoneme recognition case study . . . 295
F.1 Mean and standard deviation percent true negative, true positive and overall accuracies (to 1 d.p.) of MLP trained on the phoneme classification problem . . . 297
F.2 Mean and standard deviation percent true negative, true positive and overall accuracies (to 1 d.p.) of FuNN trained on the phoneme classification problem . . . 302
F.3 Percent true negative, true positive and overall accuracies (to 1 d.p.) of EFuNN trained on the phoneme classification problem . . . 307
F.4 Percent true negative, true positive and overall accuracies (to 1 d.p.) of SECoS trained on the phoneme classification problem . . . 312
F.5 Percent true negative, true positive and overall accuracies (to 1 d.p.) of Zadeh-Mamdani rules extracted from EFuNN trained on the phoneme classification problem . . . 317
F.6 Percent true negative, true positive and overall accuracies (to 1 d.p.) of EFuNN created via insertion of Zadeh-Mamdani fuzzy rules, for the phoneme classification problem . . . 322
F.7 Percent true negative, true positive and overall accuracies (to 1 d.p.) of Zadeh-Mamdani rules extracted from SECoS trained on the phoneme classification problem . . . 327
F.8 Percent true negative, true positive and overall accuracies (to 1 d.p.) of SECoS created via insertion of Zadeh-Mamdani fuzzy rules, for the phoneme classification problem . . . 332
F.9 Percent true negative, true positive and overall accuracies (to 1 d.p.) of Takagi-Sugeno rules extracted from SECoS trained on the phoneme classification problem . . . 337
F.10 Percent true negative, true positive and overall accuracies (to 1 d.p.) of EFuNN trained with online aggregation on the phoneme classification problem . . . 342
F.11 Percent true negative, true positive and overall accuracies (to 1 d.p.) of SECoS trained with online aggregation on the phoneme classification problem . . . 347
F.12 Percent true negative, true positive and overall accuracies (to 1 d.p.) of EFuNN optimised with offline aggregation for the phoneme classification problem . . . 352
F.13 Percent true negative, true positive and overall accuracies (to 1 d.p.) of SECoS optimised with offline aggregation for the phoneme classification problem . . . 357
F.14 Percent true negative, true positive and overall accuracies (to 1 d.p.) of SECoS optimised with sleep learning for the phoneme classification problem . . . 362
List of Figures

2.1 Partitioning of space by fuzzy rules . . . 14
2.2 Perceptron example . . . 18
2.3 A Three Neuron Layer MLP . . . 18
2.4 Error Surface . . . 19
2.5 An example FuNN Fuzzy Neural Network . . . 21
2.6 Plot of the two spirals problem . . . 28
2.7 Plot of sepal width versus sepal length for the iris data set . . . 29
2.8 Plot of petal width versus petal length for the iris data set . . . 30
2.9 Plot of the Mackey-Glass function . . . 31
2.10 Plot of carbon dioxide concentration versus time for gas furnace data set . . . 32
2.11 Plot of gas inflow rate versus time for gas furnace data set . . . 33
3.1 The Upstart Algorithm . . . 47
3.2 A Cascade Correlation Network . . . 48
3.3 A Resource-Allocating Network . . . 49
3.4 The R4-Rule . . . 51
3.5 GCS adaptation . . . 52
3.6 GCS insertion . . . 52
3.7 Deletion of neurons in a GCS network . . . 53
3.8 GCS Network trained on the two spirals problem . . . 54
3.9 The ZISC Learning Algorithm . . . 56
3.10 The ZISC Recall Algorithm . . . 57
4.1 General ECoS architecture . . . 64
4.2 General temporal ECoS architecture . . . 72
5.1 Mapping from input to output hyper-spheres . . . 93
5.2 Voronoi regions of an ECoS network . . . 95
5.3 Voronoi regions of an ECoS network trained on the gas furnace data set . . . 96
5.4 Voronoi regions and training examples . . . 97
5.5 Voronoi regions of an ECoS network trained on the two spirals data set . . . 98
5.6 Region defined by both sensitivity and error thresholds . . . 99
5.7 Distance and region defined by sensitivity threshold parameter . . . 100
5.8 Distances and region defined by error threshold parameter . . . 102
5.9 SECoS & EFuNN size vs Sensitivity Threshold for the two spirals problem . . . 109
5.10 SECoS & EFuNN size vs Sensitivity Threshold for the Iris problem . . . 110
5.11 SECoS & EFuNN size vs Sensitivity Threshold for the Mackey-Glass problem . . . 110
5.12 SECoS & EFuNN size vs Sensitivity Threshold for the gas furnace problem . . . 111
5.13 SECoS & EFuNN size vs Error Threshold for the two spirals problem . . . 112
5.14 SECoS & EFuNN size vs Error Threshold for the Iris problem . . . 113
5.15 SECoS & EFuNN size vs Error Threshold for the Mackey-Glass problem . . . 113
5.16 SECoS & EFuNN size vs Error Threshold for the Gas Furnace problem . . . 114
5.17 SECoS & EFuNN size vs Learning Rate Two for the two spirals problem . . . 115
5.18 SECoS & EFuNN size vs Learning Rate Two for the Iris problem . . . 115
5.19 SECoS & EFuNN size vs Learning Rate Two for the Mackey-Glass problem . . . 116
5.20 SECoS & EFuNN size vs Learning Rate Two for the gas furnace problem . . . 116
6.1 Regions defined by neurons compared to regions defined by fuzzy MF . . . 139
6.2 Regions defined by neurons compared to regions defined by fuzzy MF . . . 139
8.1 Human vocal tract . . . 189
Chapter 1
Introduction

To err is human, but to really foul things up requires a computer.
Farmers' Almanac, 1978
1.1 Motivation

Intelligent Information Systems (IIS) (Kasabov, 1998a; Kasabov, 2003) are information processing systems that deal with information in an intelligent way. That is, they deal with information in a way similar to that in which a human domain expert would. Seven general, major requirements for intelligent systems were enumerated in (Kasabov, 1998a, pg 195). These requirements form the high-level motivation for the work discussed in this chapter and for the thesis as a whole. The requirements are:
1. IIS should learn fast from a large amount of data (using fast training, e.g. one-pass training).

2. IIS should be able to adapt incrementally in both real time, and in an off-line mode, where new data is accommodated as it becomes available.

3. IIS should have an open structure where new features (relevant to the task) can be introduced at a later stage of the system's operation. IIS should dynamically create new modules, new inputs and outputs, new connections and neurons. This should occur either in a supervised, or in an unsupervised mode, using one modality or another, accommodating data, heuristic rules, text, images, etc. The system should tolerate and accommodate imprecise and uncertain facts or knowledge and refine its own knowledge.

4. IIS should be memory-based, i.e. they should keep a reasonable track of information that has been used in the past and be able to retrieve some of it for the purpose of inner refinement, external visualisation, or for answering queries.

5. IIS should improve continuously (possibly in a life-long mode) through active interaction with other IIS and with the environment they operate in.

6. IIS should be able to analyse themselves in terms of behaviour, global error and success; to extract rules that explain what has been learned by the system; to make decisions about its own improvement; to manifest introspection.
7. IIS should adequately represent space and time in their different scales; should have parameters to represent such concepts as spatial distance, short-term and long-term memory, age, forgetting, etc.

While it is not explicitly stated in the quotation above, the number of modules in an intelligent system can be reduced as well as expanded; that is, redundant or superseded modules can be destroyed in addition to new modules being added.

Evolving Connectionist Systems (ECoS) (Kasabov, 1998c; Kasabov, 1998b; Kasabov, 1998a; Kasabov, 2003) were created with these seven requirements in mind. They are a class of Artificial Neural Network (ANN) architectures and a general open architecture training algorithm that allows an ECoS type ANN to learn and adapt through the addition and deletion of neurons and the modification of connection weights between those neurons.

ECoS networks as defined by Kasabov have several advantages. Firstly, they are resistant to catastrophic forgetting, which means they are able to learn new examples without forgetting the data they have already learned. This makes them able to learn in an online environment, adapting to and learning new data as soon as they become available. Secondly, as the ECoS training algorithm is a constructive one, ECoS networks do not have a limit to the amount of knowledge they can store. Thirdly, the dynamic addition of neurons also avoids the problem of topology selection. Fourthly, the ECoS training algorithm allows the network to learn quickly, from a single pass over a training data set or a single presentation of a single training example. Finally, a rule extraction algorithm has been proposed (Kasabov and Woodford, 1999) that allows for the explication of the knowledge captured by an ECoS network, via the extraction of fuzzy rules from a trained ECoS network (Chapter 6).

The seminal ECoS network was the Evolving Fuzzy Neural Network (EFuNN) (Kasabov, 1998c).
EFuNN contains fuzzy logic elements (Section 2.2) that transform the input variables into 'fuzzy' representations, then map these fuzzy input values to the target fuzzy output values.
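To make the constructive, one-pass character of ECoS training concrete, a minimal sketch of the general principle is given below. This is an illustrative simplification, not the algorithm as formalised later in this thesis: the distance-based activation, the use of a single winning neuron and the exact update rules are assumptions made here for brevity, although the sensitivity threshold, error threshold and two learning rates are genuine ECoS training parameters.

```python
import numpy as np

class SimpleECoS:
    """Illustrative sketch of a SECoS-style evolving layer (not the formal algorithm)."""

    def __init__(self, n_in, n_out, s_thr=0.5, e_thr=0.1, lr1=0.5, lr2=0.5):
        self.s_thr, self.e_thr = s_thr, e_thr    # sensitivity / error thresholds
        self.lr1, self.lr2 = lr1, lr2            # input- / output-side learning rates
        self.W_in = np.empty((0, n_in))          # evolving-layer input weights
        self.W_out = np.empty((0, n_out))        # evolving-layer output weights

    def _activations(self, x):
        # Activation = 1 - normalised distance between x and each neuron's weights
        d = np.abs(self.W_in - x).sum(axis=1) / self.W_in.shape[1]
        return 1.0 - d

    def train_one(self, x, y):
        """Present a single (x, y) example once: one-pass learning."""
        if self.W_in.shape[0] == 0:
            self._add_neuron(x, y)
            return
        a = self._activations(x)
        w = int(np.argmax(a))                    # winning evolving-layer neuron
        err = np.abs(y - self.W_out[w]).max()
        if a[w] < self.s_thr or err > self.e_thr:
            self._add_neuron(x, y)               # the network grows: no fixed topology
        else:
            # Otherwise move the winner towards this example
            self.W_in[w] += self.lr1 * (x - self.W_in[w])
            self.W_out[w] += self.lr2 * (y - self.W_out[w])

    def _add_neuron(self, x, y):
        self.W_in = np.vstack([self.W_in, x])
        self.W_out = np.vstack([self.W_out, y])

    def recall(self, x):
        w = int(np.argmax(self._activations(x)))
        return self.W_out[w]
```

Under this scheme, looser thresholds produce fewer, more general neurons, while tighter thresholds grow the network faster and fit the training data more closely.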
1.1.1 Research Problems

While ECoS, and EFuNN in particular, are useful algorithms, there are some problems with them. Most of these problems are due to the novelty of the approach: the ECoS algorithm is not well understood; there are no methods of optimising ECoS networks; and EFuNN is overly complex for some applications. Five specific problems are identified and addressed in this thesis:

1. Comparisons of ECoS with existing constructive algorithms have been few and incomplete.

2. The fuzzy logic elements of the Evolving Fuzzy Neural Network (EFuNN) increase the complexity of this ANN, which can reduce its speed and efficiency.

3. There is no testable formalisation of ECoS that explains either the internal mechanisms of an ECoS network or the mechanisms of ECoS training.

4. It is implied in the literature (Kasabov, 1998c; Kasabov, 1998a; Kasabov, 2003) that the fuzzy logic elements of EFuNN are necessary for the extraction of fuzzy rules; this has yet to be shown to be the case.

5. Methods for optimising ECoS networks, such as evolutionary algorithms, have not been investigated.
Each of these problems will now be expanded upon.

1. ECoS and Other Constructive Algorithms

Many constructive algorithms are already in existence. A review of the ECoS literature (Chapter 4) shows that comparisons of ECoS with these other algorithms have been few and incomplete. A codification of the ways in which the ECoS algorithm is similar to, and the ways in which it is different from, these algorithms would be an important step towards determining the place of ECoS in the spectrum of constructive algorithms, yet this appears not to have been done. Such a comparison would also assist in identifying optimisation methods that have been applied to other constructive algorithms and that might be applicable to ECoS.

2. Complexity of EFuNN

The fuzzy logic elements in the EFuNN model add complexity to the network (Chapters 4 and 5) and may not be appropriate for all applications. There are two major problems. Firstly, although the ECoS algorithm eliminates the need to specify the topology of the network a priori, EFuNN in some ways negates that advantage by requiring the prior specification of the number of fuzzy membership functions attached to each input and output variable. These do not change during the training of the network, so a poor choice will affect the network for its entire lifetime. Secondly, the addition of the membership functions increases the dimensionality of the data being learned, which may make it more difficult for the network to learn the problem at hand. Although it is implied in the ECoS literature that the embedded fuzzy membership functions of EFuNN are necessary to support the extraction of fuzzy rules, this may not be the case, as fuzzy rules can be extracted from ANN without such structures (Cechin et al., 1996). Therefore, a reduced and simplified implementation of ECoS could be useful for those situations where EFuNN performs poorly.
Such an implementation, as proposed in this research, would be simpler to create, as no decisions about topology would need to be made. It would be faster, as there are fewer processing elements to simulate. It would be easier to optimise, because of its simplified structure. Finally, such reduced networks would be much simpler to analyse.

3. Theoretical Basis of ECoS

Traditional ANN are supported by a large body of theory (Cybenko, 1989; Kosko, 1993). This theory describes how the ANN training algorithms behave, given the settings of their training parameters; how the training algorithms allow the network to capture knowledge; and how this knowledge is represented by the ANN. This body of theory assists the neural network practitioner both in applying these algorithms and in optimising and extending them. A theoretical basis is also useful in assisting the acceptance of a new algorithm: other researchers are more likely to utilise a new algorithm if its theoretical grounding is known. A few attempts at a theoretical background for ECoS have been made in the past (Kasabov, 1998a), but as will be shown in Chapter 4 these have been unsatisfactory, as they are not testable and do not make any predictions about the behaviour of the network.
CHAPTER 1. INTRODUCTION
4. Fuzzy Rule Extraction without Embedded Fuzzy Elements

The extraction of rules from a trained ANN (Chapter 6) is a very useful technique for explaining and elucidating the knowledge learned by the network (Andrews et al., 1995). Fuzzy rules are preferable as the result of rule extraction as they can be easier to comprehend than ordinary crisp rules (Kasabov, 1996a, pg 178). It is implied in the ECoS literature that the fuzzy logic elements of EFuNN are necessary to enable the extraction of fuzzy rules. However, algorithms exist that allow for the extraction of fuzzy rules from ANN which do not have such embedded elements (Matthews and Jagielska, 1995; Cechin et al., 1996). Therefore, it should be possible to extract fuzzy rules from the simplified ECoS described above (Subsection 6.7.1). This would refute the assumption that the fuzzy logic elements of EFuNN are necessary. It would also allow for the simplified ECoS network, with all of its advantages, to be used in the same situations where an EFuNN would have been used.

5. Optimisation of ECoS

Due to fundamental differences in the way in which ECoS networks store knowledge (see ECoS theory, Chapter 5), methods of optimising traditional ANN cannot be used to optimise ECoS networks. Users of ECoS are thus in the position of being able to train and evaluate an ECoS network, but being unable to improve its performance, if it should prove to be unsatisfactory. This is especially true for the task of reducing the number of neurons in the network. Established methods of optimising ANN such as pruning of connections or neurons (Mozer and Smolensky, 1989; Le Cun et al., 1990) cannot be directly applied to ECoS due to the differences in the way in which knowledge is stored. Mechanisms for optimising an ECoS network before, during and after training are therefore needed.
The larger number of parameters present in the ECoS training algorithm, as compared to more traditional ANN training algorithms, also complicates the optimisation of these networks. Whereas optimising such traditional ANN as backpropagation-trained multi-layer perceptrons requires optimising only two training parameters (Rumelhart et al., 1986), ECoS requires the optimisation of up to seven training parameters. While evolutionary algorithms have been fruitfully applied to this problem for other ANN algorithms (Yao, 1999), they have not, prior to the research reported in this thesis (Chapter 7), been investigated for ECoS.
1.2 Research Hypotheses and Goals

1.2.1 Goals

The general goal of the research reported in this thesis is to overcome the problems identified in Section 1.1 above, and can be summarised as the characterisation, simplification, formalisation, explanation and optimisation of ECoS networks. Whereas other researchers (Kasabov and Song, 2000; Deng and Kasabov, 1999) have developed ECoS in several different directions, this work takes a different approach. Instead of simply modifying the original ECoS methods into new approaches, the original ECoS algorithm is here characterised, in terms of its similarity to existing constructive algorithms, simplified into a minimalist system, and formalised, that is, mathematically analysed. Explanation of what ECoS networks have learned, in terms of extracting comprehensible rules, is investigated, as are methods of optimising ECoS networks that build upon, rather than modify, the original training algorithm. Following this approach, the goals of the research can be achieved by investigating the hypotheses stated in the following subsection.
1.2.2 Hypotheses

The following hypotheses are formulated in response to the problems identified in Section 1.1.

Hypothesis One

In response to Problem One, it is hypothesised that a comparison of ECoS with existing constructive algorithms will lead to a better understanding of the ECoS algorithms, and to methods of optimising ECoS networks. The research questions to investigate for this hypothesis are:

1. How similar is the ECoS algorithm to existing constructive algorithms?
2. What elements of existing constructive algorithms can be adapted to ECoS?

This hypothesis contributes to the research carried out under Hypothesis Five below and is investigated in Chapter 3 and in Section 4.11.

Hypothesis Two

In response to Problem Two, it is hypothesised that a simplified version of EFuNN can be developed that is competitive with EFuNN, yielding an ECoS network that lacks fuzzy logic elements. This simplified ECoS network will be easier to implement, and will be more efficient in operation. The research questions for this hypothesis are:

1. Is a simplified version of EFuNN competitive with the original EFuNN?
2. Are the simplified ECoS networks as flexible as EFuNN, that is, can they handle the same types of problem as EFuNN?

Hypothesis Two leads directly to Hypothesis Four and is investigated in Chapter 4.

Hypothesis Three

In response to Problem Three, it is hypothesised that a testable formalisation of ECoS and the ECoS training algorithm can be developed that will predict the behaviour of ECoS networks in relation to the parameters used to train them. The research questions to investigate for this hypothesis are:

1. How can the internal state of an ECoS network be explained?
2. What effect does each training parameter have on the behaviour of ECoS?

This hypothesis reinforces the motivation for Hypothesis Five, and is investigated in Chapter 5.
Hypothesis Four

In response to Problem Four, it is hypothesised that methods of extracting fuzzy rules from the simplified ECoS network can be developed that are competitive with the rules extracted from EFuNN. Research questions for this hypothesis are:

1. How accurate are the rules extracted from the simplified networks?
2. How do the rules extracted from the non-fuzzy, simplified ECoS compare to the rules extracted from EFuNN, in terms of performance?

Hypothesis Four provides further reinforcement to the motivation for Hypothesis Five and is investigated in Chapter 6.

Hypothesis Five

In response to Problem Five, it is hypothesised that methods of optimising ECoS can be developed that will reduce the size of the network while maintaining its accuracy over both previously seen and unseen data. These methods may include evolutionary algorithms. Research questions for this hypothesis are:

1. At what stages of an ECoS network’s life-cycle can optimisation methods be applied?
2. How can evolutionary algorithms be applied to the problem of optimising ECoS networks?

The motivation for this hypothesis is reinforced by Hypotheses One, Three and Four, and it is investigated in Chapter 7.
1.2.3 Scope of Hypotheses

This subsection defines the scope of each hypothesis. Only work that is within the scope of each hypothesis will be presented in this thesis: anything outside of that scope is not relevant to these hypotheses and will not be addressed.

Hypothesis One

The scope of Hypothesis One is the comparison of the ECoS algorithms with selected existing constructive algorithms. The algorithms reviewed will be used to build up a picture of the development of constructive algorithms, and to reflect the limitations of constructive algorithms that led to the development of ECoS. Existing elements of constructive algorithms will also be identified for application to ECoS.

Hypothesis Two

The scope of Hypothesis Two is the creation and evaluation of a simplified EFuNN network. The evaluation will compare the simplified network with EFuNN in terms of performance (training accuracy, generalisation accuracy and adaptive ability), as well as the size of the networks, in terms of the number of neurons added to the network during training.
Hypothesis Three

The scope of Hypothesis Three is the analysis of ECoS networks and the creation of a testable formalisation that describes the internal state of an ECoS network, and that predicts how an ECoS network will behave in relation to the training parameters.

Hypothesis Four

The scope of Hypothesis Four is the creation of algorithms that can be used to extract fuzzy rules from the simplified ECoS network, and the evaluation of these rules against the original networks and against the fuzzy rules extracted from EFuNN using the existing methods. That is, the accuracy of the fuzzy rules extracted from the simplified ECoS networks will be compared to the accuracy of the networks themselves, as well as to the accuracy of fuzzy rules extracted from EFuNN.

Hypothesis Five

The scope of Hypothesis Five is the creation of algorithms that reduce the size of ECoS networks while maintaining their accuracy. These algorithms should be considered proof-of-concept, especially those that utilise evolutionary algorithms. While there are many evolutionary algorithms in existence, only simple algorithms will be used in this work. If the work shows that the algorithms can be successfully employed, then more advanced algorithms could be utilised at a later time.
1.3 Criteria for Success

Whether or not each of the hypotheses stated in Section 1.2 has been supported will be evaluated according to the following criteria.

1. The research relating to Hypothesis One will be considered to support the hypothesis if:

(a) Constructive neural network algorithms that are similar to ECoS are identified, and the ways in which they are similar and different are identified and described.
(b) Optimisation methods that are applicable to ECoS are identified and described.

Similarity to ECoS is determined according to the extent of the similarities between the structure, internal mechanisms and training algorithm of the constructive neural network being examined and those of ECoS.

2. The research relating to Hypothesis Two will be considered to support the hypothesis if it results in the creation of a simplified ECoS that is competitive with EFuNN. Competitive in this context means:

(a) The simplified ECoS exhibits levels of memorisation of the training data similar to EFuNN.
(b) The simplified ECoS exhibits levels of generalisation over previously unseen data that are similar to or better than those of EFuNN.
(c) The simplified ECoS is able to adapt to new training data, without forgetting previously seen examples, to a similar degree as EFuNN.
(d) The simplified ECoS is of a similar or smaller size than EFuNN.
(e) The simplified ECoS can be applied to the same kinds of problems as EFuNN, that is, it is as flexible as EFuNN.

3. The research relating to Hypothesis Three will be considered to support the hypothesis if:

(a) A formalisation is created that is experimentally testable.
(b) The experiments performed do not disprove the formalisation.

4. The research relating to Hypothesis Four will be considered to support the hypothesis if it results in algorithms that allow for the extraction of fuzzy rules from simplified ECoS networks, where the rules are competitive with the rules extracted from EFuNN. Competitive means that the accuracy of the extracted fuzzy rules is similar to or better than the accuracy of rules extracted from EFuNN.

5. The research relating to Hypothesis Five will be considered to support the hypothesis if it results in algorithms that, when applied to an ECoS network, yield the following results:

(a) The size of the network has been reduced.
(b) The memorisation error over previously seen data has not changed significantly.
(c) The generalisation error over previously unseen data has not changed significantly.

To be acceptable, Criteria 2-5 above must be validated experimentally.
1.4 Original Contributions of the Research

In solving the problems identified above, the thesis makes the following original contributions:

1. A qualitative comparison of ECoS with other, selected, constructive algorithms (Chapter 3 and Section 4.11).
2. Development and validation of a simplified ECoS network, SECoS (Section 4.6).
3. Development and validation of a theory of the internal mechanisms of ECoS (Chapter 5).
4. Development and validation of fuzzy rule extraction and insertion algorithms for SECoS (Chapter 6).
5. Development and validation of methods of optimising ECoS networks, including methods that utilise evolutionary algorithms (Chapter 7).
1.5 Structure of the Thesis

The rest of the thesis is structured as follows. Chapter 2 has two purposes. Firstly, it briefly reviews the basics of the three major groups of technologies used in this thesis, that is, fuzzy systems, artificial neural networks, and evolutionary algorithms. Secondly, several benchmark data sets, and experiments over these sets, are presented and described. The experimental results are used as a basis of comparison for the novel algorithms presented in later chapters.

Chapter 3 reviews several “constructive” neural network algorithms. The material in this chapter is the basis of the investigation of Hypothesis One. The algorithms described in Chapter 3 are used as the basis of the comparison with ECoS that is performed in Chapter 4.

Chapter 4 introduces the ECoS family of algorithms. The original ECoS algorithm, the Evolving Fuzzy Neural Network (EFuNN), is described, as are other ECoS variants. The problems with EFuNN are described. The simplified ECoS network SECoS, one of the original contributions of this thesis, is also described in this chapter. Experiments with both EFuNN and SECoS over the benchmark data sets are reported. These experiments are part of the investigation of Hypothesis Two.

In Chapter 5, problems with the existing formalisation of ECoS are identified. The chapter then presents the novel formalisation of ECoS. This is an original contribution of the thesis, and is done to investigate Hypothesis Three. Experimental results over the benchmark data sets are reported that support the new formalisation. The chapter also identifies some problems with optimising the ECoS algorithm.

Methods for explaining ECoS networks via the extraction of fuzzy rules are discussed in Chapter 6. In this chapter the reasons for extracting rules from ANN are explained. The existing method for extracting fuzzy rules from EFuNN is then described, followed by the novel algorithms for extracting fuzzy rules from SECoS.
Hypothesis Four is investigated via experiments over the benchmark data sets.

Chapter 7 describes several methodologies for the optimisation of ECoS networks. These methodologies are useful for optimising the training of ECoS networks as well as optimising ECoS at the end of training. Experiments over the benchmark data sets are presented as part of the investigation into Hypothesis Five.

A major case study, recognition of isolated spoken phonemes, is presented in Chapter 8. In this case study, the algorithms tested on the benchmark data sets in the previous chapters are applied to a real-world problem with a large data set.

Finally, conclusions are presented and avenues of future work are suggested in Chapter 9.
1.6 Definitions

Definitions of the terms used throughout this thesis are presented in this section.
Evolving Connectionist Systems - Also known as ECoS, these are a class of open architecture ANN algorithms that avoid some of the difficulties inherent in other constructive ANN. They are the focus of this thesis.
Open architecture - An open architecture network, or algorithm, is an ANN that learns through both the
addition and removal of neurons, as well as the modification of connection weights.

Hypothesis    Location(s)
One           Chapter 3 and Section 4.11
Two           Chapter 4, especially Section 4.6
Three         Chapter 5
Four          Chapter 6, especially Section 6.7
Five          Chapter 7

Table 1.1: Location of investigation of each hypothesis.
Online learning - In this thesis, online learning is taken to mean that a learning algorithm learns new data as they become available, for example from a stock-market system. This requires that the learning algorithm is able to adapt to this data quickly, preferably without forgetting what it has previously learned. Online learning is also known as ‘real-time’ learning.
Lifelong learning - In this thesis, lifelong learning means the process of continually training a neural network throughout its existence. That is, the network continues to learn new data for its entire lifetime.
Rule extraction - This is the process of extracting comprehensible rules from an artificial neural network. It is the process of explaining what the network has learned, of opening the ‘black-box’.
Aggregation - This is the process of combining elements of a network into a more compact representation of the original components.
1.7 Summary

This chapter has introduced several important things. Firstly, the basic motivation for the research was introduced in Section 1.1. The research hypotheses and goals were presented in Section 1.2. The places in the thesis where the hypotheses are investigated are listed in Table 1.1. The criteria for success in the research were presented and discussed in Section 1.3. The original contributions of the work were specified in Section 1.4. Finally, the structure of the thesis was described in Section 1.5 and definitions of the terms used were presented in Section 1.6.
Chapter 2
Fuzzy Systems, Neural Networks and Evolutionary Algorithms

2.1 Introduction

This chapter is in two parts. The first reviews the three basic technologies used in this thesis. They are: fuzzy rule-based systems, covered in Section 2.2, which are a way of representing in rules the uncertainty, or imprecision, found in real life; artificial neural networks, presented in Section 2.3, which are mathematical models inspired by biological nervous systems; and evolutionary algorithms, which are search and optimisation algorithms inspired by the mechanisms of natural selection and Darwinian evolution. These are covered in Section 2.5. This chapter provides only a brief overview of each technology. These technologies are the basis for the work in this thesis, but are not contributions of this thesis; therefore, a more in-depth treatment of them is not justified.

The second part of the chapter, starting from Section 2.6, introduces the four benchmark data sets that have been selected for this thesis, and applies two of the neural networks discussed to those data sets. The purpose of these benchmark experiments is to provide a basis of comparison between the conventional, well-established models and the novel algorithms introduced in later chapters.
2.2 Fuzzy Logic Systems

2.2.1 Introduction

Fuzzy rule-based systems are systems that represent knowledge as rules and use inexact, or fuzzy, logic principles in their operation (Zadeh, 1965). Fuzzy logic and fuzzy rule-based systems are covered in this chapter for two reasons: firstly, elements of fuzzy logic are employed in fuzzy neural networks (Subsection 2.3.9); secondly, fuzzy rules are central to the rule extraction sections of this work (Chapter 6).

Traditional crisp rule-based systems use crisp, or Boolean, logic to match the initial parts, or antecedents, of a rule to the facts that are currently known by the system. Those rules in the system whose antecedents are satisfied are allowed to fire, with the rule that fires being selected by one of several different mechanisms. When a rule fires, the action specified in the consequent part of the rule is carried out. This usually means asserting some other facts as true or false.
Boolean, two-valued logic holds that something either is, or is not. An object either is a member of a set, or it is not. This logical structure has been the foundation of Western science since the time of Plato. Unfortunately, the real world does not always abide by such crisp definitions. Crisp logic is unable to deal with such fuzzy concepts as ‘more’ or ‘less’, or with overlapping concepts such as ‘short’, ‘medium’ and ‘tall’. Fuzzy logic is able to represent this vagueness (Zadeh, 1965). An object can belong to a fuzzy set to different degrees, and the sets can overlap. This means that a fuzzy set is able to represent such concepts as ‘more’ and ‘less’, and can deal with the overlapping sets of ‘small’, ‘medium’ and ‘large’. Fuzzy rule-based systems use fuzzy logic operators instead of Boolean operators in their antecedents and consequents. Instead of dealing with precise, crisp conditions and assertions, a fuzzy rule deals with imprecise fuzzy sets. Evaluation of fuzzy rules thus becomes a matter of matching facts against fuzzy conditions, which may be satisfied to different degrees. Fuzzy rules have several advantages over crisp, non-fuzzy rules (Kosko, 1992; Kasabov, 1996a):
- Fuzzy rules are universal function approximators: that is, given a sufficient number of rules, a fuzzy system can approximate any function to an arbitrary degree of accuracy.
- Comprehensibility: assuming meaningful labels have been attached to the input and output variables, and their associated fuzzy sets, fuzzy rules are often easier to understand and explain than comparable crisp rules.
- Modularity: fuzzy rules can be added to and removed from a fuzzy rule-based system as needed, making it easier to develop and optimise the system.
- Parsimony: fewer fuzzy rules are needed to encapsulate knowledge than in an equivalent crisp rule-based system.
The two most important qualities for the purposes of this thesis are comprehensibility and parsimony, as these are the qualities that are taken advantage of when representing knowledge extracted from an ANN (Chapter 6). The rest of this section is structured as follows. Firstly, fuzzy sets are introduced and described in Subsection 2.2.2. This is followed by two different types of fuzzy rule-based system, Zadeh-Mamdani rules in Subsection 2.2.3 and Takagi-Sugeno rules in Subsection 2.2.4.
2.2.2 Fuzzy Sets

In traditional sets an item either belongs to the set or does not. Thus, an item is either A or not-A, and between them A and not-A cover the entire universe of the problem (often called the Universe of Discourse). Traditional sets and numbers are often called crisp sets (and crisp numbers), because the boundaries between the sets are crisply defined. Fuzzy sets differ from crisp sets in that an item may belong to a fuzzy set to a certain degree, that is, it can be both A and not-A simultaneously. The degree to which an item or value belongs to a fuzzy set is called its degree of membership. The degree of membership varies from zero (completely outside of the fuzzy set) to one (completely within the fuzzy set).
Variables and the fuzzy sets associated with them often have names attached to them: they are thus also referred to as ‘linguistic variables’. This is one of the major advantages of fuzzy rule systems, as the linguistic variables can be much easier to comprehend than an equivalent crisp rule system. Fuzzy sets are usually represented by fuzzy membership functions (MF). A MF will accept as its argument a crisp value and return the degree to which that value belongs to the fuzzy set the MF represents. The range of real-numbered values that a MF accepts is called its “universe of discourse”.
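The behaviour of a membership function can be sketched in a few lines of code. The following is an illustrative sketch only; the function name `tri_mf` and the choice of a triangular shape are assumptions made for this example, not something prescribed by the thesis:

```python
def tri_mf(x, left, centre, right):
    """Degree of membership of crisp value x in a triangular fuzzy set.

    The set rises linearly from 'left' to a peak at 'centre', then falls
    linearly to 'right'; outside [left, right] the membership is zero.
    """
    if x <= left or x >= right:
        return 0.0
    if x <= centre:
        return (x - left) / (centre - left)
    return (right - x) / (right - centre)

# A hypothetical fuzzy set 'medium' over a universe of discourse [0, 10]
medium = lambda x: tri_mf(x, 2.0, 5.0, 8.0)
```

For example, `medium(5.0)` returns a degree of membership of 1.0 (fully inside the set), while `medium(3.5)` returns 0.5 (partially inside), illustrating how a crisp value can be both A and not-A to different degrees.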
2.2.3 Zadeh-Mamdani Fuzzy Systems

The knowledge within a fuzzy system is stored in its rules. The oldest and most common type of fuzzy rule is the Zadeh-Mamdani type (Mamdani, 1976). A Zadeh-Mamdani fuzzy rule consists of antecedents (what conditions must exist for the rule to fire) and consequents (what will happen if the rule fires). These rules take the following general form:

    if I1 is A1 and I2 is A2 and ... and In is An
    then O1 is C1 and O2 is C2 and ... and On is Cn

where I is an input variable, A is an input fuzzy MF, O is an output variable and C is an output fuzzy MF.

Fuzzy rules may be described geometrically by determining the region of input space that is delimited by the antecedent membership functions. That is, given a set of fuzzy rules, each rule will describe a hypercube (cubes or squares for fewer than three dimensions) in the input space, where the boundaries of the hypercube are determined by the boundaries of the MF specified in the antecedents. This concept is shown in two dimensions (two input variables) in Figure 2.1, where each variable has three MF attached to it. This divides the input space into nine distinct regions, where each region can be represented by one rule.

The process of producing output from inputs in a fuzzy system is called fuzzy inference. Fuzzy inference is a four-step process. Firstly, the truth of the rule’s antecedents (the rule’s ‘degree of support’) must be calculated. Secondly, an implied membership function must be calculated for each output for each rule with a truth value greater than zero, in a step called implication. Thirdly, in a step called composition, the implied membership functions must be combined to form a single output membership function for each output for the entire fuzzy system. Finally, defuzzification is performed. Defuzzification is the process of calculating a single crisp value from a fuzzy set.

Calculating the degree of truth, or degree of support, of rules involves matching fuzzy facts against the antecedents in the rules. This matching requires the use of fuzzy logical operators, such as the fuzzy AND and the fuzzy OR. The degree of support, or degree to which the rule is activated, is the result of the application of the fuzzy logical operators across each element of the antecedents. The degree of support is used to construct an implied membership function across the consequents.
That is, each rule will specify which fuzzy output membership function will be assigned to each output variable. The membership function that is assigned, however, will be modified by the degree of support. Two common methods of constructing the implied membership function are prod (product) and min (minimum). The min implication
method will form an inferred MF by ‘cutting’ the output MF at the degree of support for the rule. The prod implication method will form an inferred MF by multiplying the output MF by the degree of support for the rule.

Figure 2.1: Partitioning of space by fuzzy rules (adapted from Kasabov, 1996a).

Fuzzy composition is the process of combining, for each output variable, the implied MF that each rule generates for that variable. Since composition is done after implication, and the implication process is independent for each rule, the effect is as if every fuzzy rule fired simultaneously. This parallelism of rules is an advantage of fuzzy rules over crisp rule-based systems, which must select which rules to fire by one of several different strategies. Two common fuzzy composition processes are max and sum. The max procedure performs a piecewise maximum operation across all inferred MF for each variable: the maximum value at each point is taken as the fuzzy value for that point of the universe of discourse. The sum procedure also carries out a piecewise operation across all inferred MF for each variable; however, instead of a maximum operation, a sum operation is performed across the fuzzy values for each point. (Note that this can lead to a final output MF with membership degrees of greater than one.)

The step following inference is defuzzification. This involves the transformation of fuzzy values (the inferred fuzzy membership function) into crisp values. For many fuzzy applications it is not necessary to defuzzify the results of the rules: the fuzzy values alone are sufficient. These situations occur in applications that deal with humans directly, as humans are often more easily able to interpret and use fuzzy linguistic values. Other applications, such as fuzzy machinery control, require crisp outputs. Therefore, a single numerical value for each output variable must be calculated by some form of aggregation of the inferred fuzzy set. Two of the methods of defuzzification in use are Centre of Gravity (CoG) and Mean of Maxima (MoM). The CoG algorithm will find the crisp value that ‘balances’ the output fuzzy set. The MoM algorithm will find the mean of the crisp values that correspond to the maximum fuzzy values.
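The four inference steps described above can be sketched in code for a single-output system. This is a minimal illustration only, assuming min for the fuzzy AND, min implication, max composition and centre-of-gravity defuzzification over a discretised universe of discourse; the function and parameter names are invented for this example:

```python
def mamdani_infer(rules, inputs, universe):
    """One-output Zadeh-Mamdani inference over a discretised universe.

    rules: list of (antecedents, output_mf) pairs, where antecedents maps
    an input name to its membership function and output_mf is the output
    membership function assigned by the rule's consequent.
    """
    composed = [0.0] * len(universe)
    for antecedents, output_mf in rules:
        # Degree of support: fuzzy AND (min) across the matched antecedents
        support = min(mf(inputs[name]) for name, mf in antecedents.items())
        for i, x in enumerate(universe):
            # min implication ('cut' the output MF at the degree of support),
            # then max composition across all rules
            composed[i] = max(composed[i], min(support, output_mf(x)))
    # Centre-of-gravity defuzzification of the composed output MF
    total = sum(composed)
    if total == 0.0:
        return None  # no rule fired
    return sum(x * m for x, m in zip(universe, composed)) / total
```

With a single rule whose antecedent is fully satisfied, the crisp output lands at the ‘balance point’ of the rule’s output membership function, as the CoG description above suggests.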
2.2.4 Takagi-Sugeno-Kang Fuzzy Systems

Takagi-Sugeno-Kang (or Takagi-Sugeno) fuzzy rules are the other major type of fuzzy rule (Takagi and Sugeno, 1985). The major difference between Takagi-Sugeno and Zadeh-Mamdani fuzzy rules is the structure of the consequents of the rules. The general form of a Takagi-Sugeno rule is as follows:

    if I1 is A1 and I2 is A2 and ... and In is An
    then O1 = f1(I1, I2, ..., In) and O2 = f2(I1, I2, ..., In) and ... and On = fn(I1, I2, ..., In)

where I is an input variable, A is an input fuzzy MF, O is an output variable and f1 ... fn are output functions. The antecedent part of the rules has the same structure as in Zadeh-Mamdani rules. The consequents, however, are functions of the input variables. The order of a Takagi-Sugeno rule refers to the order of the output functions: a first-order system has first-order polynomial output functions, while a second-order system has quadratic output functions. The degree of support of Takagi-Sugeno rules is calculated in the same way as for Zadeh-Mamdani rules. The degree of support is then used to calculate the weighted mean of the output values of the associated output functions. There are no implication, composition or defuzzification steps in the evaluation of Takagi-Sugeno fuzzy rules.
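The weighted-mean evaluation of Takagi-Sugeno rules can be sketched as follows for a single output. This is an illustrative sketch only, again assuming min for the fuzzy AND; the names `tsk_infer`, `antecedents` and `output_fn` are invented for the example:

```python
def tsk_infer(rules, inputs):
    """Takagi-Sugeno inference for a single output variable.

    rules: list of (antecedents, output_fn) pairs; antecedents maps an input
    name to its membership function, and output_fn maps the input vector to
    a crisp value. The crisp output is the mean of the rule outputs,
    weighted by each rule's degree of support.
    """
    weighted_sum, support_sum = 0.0, 0.0
    for antecedents, output_fn in rules:
        # Degree of support, calculated as for Zadeh-Mamdani rules
        support = min(mf(inputs[name]) for name, mf in antecedents.items())
        weighted_sum += support * output_fn(inputs)
        support_sum += support
    # Weighted mean of the output functions; no implication, composition
    # or defuzzification steps are needed
    return weighted_sum / support_sum if support_sum > 0.0 else None
```

Note that the output is crisp immediately: the weighted mean replaces the implication, composition and defuzzification steps of the Mamdani scheme.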
2.3 Artificial Neural Networks

2.3.1 Introduction

Another of the major technologies employed in this thesis is the artificial neural network (ANN). Indeed, ANN are the focus of this thesis, but are introduced second because a knowledge of fuzzy rule-based systems is necessary to support fuzzy neural networks. The rest of this section covers the basic principles of ANN as needed to support the research reported in this thesis. Artificial neurons are described in Subsection 2.3.2, followed by artificial neural networks proper in Subsection 2.3.3. Generic methods of training ANN are then described in Subsection 2.3.4, followed by two well-known ANN models in Subsections 2.3.5 and 2.3.6. The backpropagation of errors learning algorithm, which is applied to many ANN models, is described in Subsection 2.3.7. This is followed by a description of fuzzy neural networks (FNN, Subsection 2.3.8), which combine the principles of fuzzy rule-based systems and ANN. A particular model of FNN used in this thesis, known as FuNN, is described in Subsection 2.3.9. Finally, the problems with ‘classical’, fixed-architecture ANN are discussed in Subsection 2.4.1.
2.3.2 Artificial Neurons

Artificial neurons are finite state automata that are inspired by biological neurons (McCulloch and Pitts, 1943). Biological neurons are specialised nerve cells that continuously collect, integrate and emit nerve signals through time. An artificial neuron greatly simplifies and models this process. Signals coming into an artificial neuron are gathered discretely, that is, there is no time component to their function.
There are three mathematical functions associated with an artificial neuron: the input function, the activation function, and the output function. The input function gathers the incoming signals, and is usually a simple multiply-and-sum operation. The activation function processes the incoming aggregated signal. This function determines the ‘strength’ of the firing of the neuron, and it is here that artificial neurons differ the most from biological neurons. Biological neurons emit a ‘spike train’ of signals when they fire, that is, a sequence of constant-amplitude pulses, where the frequency of the pulses increases with greater activation of the neuron: the stronger the activation, the higher the frequency of the pulses. With artificial neurons, only a single signal is emitted, that is, a single value that represents the degree of activation of the neuron. Several different activation functions are in existence, the most common of which are the linear function, the sigmoid, or logistic, function, and the saturated linear function. The final function associated with an artificial neuron is the output function. This processes the activation signal before passing it out of the neuron, and is usually a simple linear function, that is, the activation value is passed out unaltered.
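The three functions of an artificial neuron can be illustrated with a short sketch. This is illustrative only; the sigmoid activation, the bias term and the name `neuron` are choices made for this example rather than anything prescribed here:

```python
import math

def neuron(inputs, weights, bias=0.0):
    """A single artificial neuron.

    Input function: multiply-and-sum over the incoming signals.
    Activation function: sigmoid (logistic).
    Output function: identity (linear), passing the activation unaltered.
    """
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    activation = 1.0 / (1.0 + math.exp(-net))
    return activation  # identity output function
```

For instance, with all weights at zero the net input is zero and the sigmoid returns 0.5, the midpoint of its range.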
2.3.3 Artificial Neural Networks The broadest definition of artificial neural networks is that they are collections of connected artificial neurons (Crick, 1989; Lippmann, 1987). The way in which the neurons are arranged is referred to as the network architecture, or topology. There are many different types of ANN, and different ANN arrange the neurons in different ways. Neurons in ANN are commonly arranged in layers: this is the case with the three types of ANN discussed in this section. These layers are connected to one another with weighted connections. Signals “travel through” the connections, and are multiplied by the weight value of the connection. The weights associated with these connections are usually variable, that is, the values of the weights are changed during the training of the network. If all neurons in a layer are attached to all neurons in the following layer, the network is described as being fully connected. Partially connected networks have some neurons that are not connected to all following neurons. The first neuron layer of an ANN is referred to as the input layer: input signals are submitted to this layer. The final neuron layer emits the final result of the processing carried out within the ANN, and is known as the output layer. Propagating signals from the input layer to the output layer is known as forward propagation. There is a well-known debate on the terminology used to describe the topology of an ANN. This debate involves two separate issues. The first is whether or not to count the first (input) layer as a neuron layer, as this layer often does not perform any processing. The second issue is whether the layers of neurons or the layers of connections should be counted. In this thesis, neuron layers are counted, and the input layer is included in this count. Therefore, the term ‘five neuron layer network’ means that the ANN has five layers of neurons, including the input neuron layer.
2.3.4 ANN Learning As was discussed above, the connections between neuron layers have variable weighting values associated with them. It is the values of these weights that determine how the ANN behaves. ANN learning (or training) is the
process of setting these weight values. That is, learning is the process of optimising the connection weight values so that the network as a whole captures some knowledge about the current problem. It is this encoding of knowledge within the connection weights that gives ANN their alternative name, connectionist systems. Although there are several different forms of ANN learning, only two general learning techniques will be discussed here: supervised and unsupervised learning. Supervised Learning Supervised learning is defined in (Reed and Marks, 1999, pg 7) as: the process of adjusting a system so it produces specified outputs in response to specified inputs. The term supervised learning comes about because an external agent must provide the desired output values. This ‘teacher’ supervises the outputs of the network and modifies its connection weights so that the outputs of the network approximate the desired outputs. Supervised learning is used to train the Perceptron ANN (Subsection 2.3.5). Backpropagation of errors training (Subsection 2.3.7) is also a supervised learning algorithm. Unsupervised Learning With unsupervised learning, there are no target outputs associated with the input values. Instead, an unsupervised learning algorithm learns patterns within the data. Therefore, there is no external ‘teacher’ agent that monitors the outputs of the network. The teacher, insofar as there is one, is embedded within the ANN itself, which learns without any feedback on the values it outputs. Hence the term, unsupervised. The advantage of unsupervised learning is, as stated by (Reed and Marks, 1999, pg 11): Unsupervised learning is useful because unlabelled data is often more readily available than labeled data. An example of an unsupervised learning algorithm is the Kohonen SOM (Kohonen, 1990; Kohonen, 1997).
2.3.5 Perceptrons Perceptrons (Rosenblatt, 1958) are one of the earliest ANN models. They are two neuron layer, fully connected networks. They have one layer of adjustable connections, and they are trained using a supervised learning algorithm. Figure 2.2 presents a perceptron with three input neurons and two output neurons. The perceptron learning algorithm is a supervised learning algorithm. Errors between the actual outputs of the network and the desired outputs from the training data are used to adjust the connection weights in the network. Perceptrons are severely limited in their applications, due to a problem known as the linear separability problem (Minsky and Papert, 1969). The perceptron linear separability problem is the inability of perceptron networks to distinguish between two classes that are not linearly separable. Classes are said to be linearly separable if a line in two dimensions (or a hyperplane in higher dimensionalities) can be placed between them such that all elements of one class lie on one side of the line and all elements of the other class lie on the other side.
Figure 2.2: Perceptron example.
Figure 2.3: A Three Neuron Layer MLP.
2.3.6 Multi-Layer Perceptrons Multi-layer perceptrons (MLP) were derived from perceptrons (Widrow, 1962). A MLP adds one or more layers of neurons to the basic perceptron architecture. These additional layers are inserted between the input and output neuron layers, and are referred to as hidden layers. The name comes about because these layers are not seen by anything outside of the MLP: they are completely hidden from the outside world. Figure 2.3 presents a MLP with three input neurons, three hidden neurons and two output neurons. The addition of the hidden layer to the MLP solves the linear separability problem. Each neuron in the hidden layer of an MLP corresponds to a single perceptron. Thus, for a MLP with n hidden neurons, there are n hyperplanes available to partition the input space. The output layer then combines this partitioning into a smooth decision surface. Although MLP were known at the time the limitations of perceptrons became widely known, no learning algorithm was known for them. It was not until the discovery of the backpropagation of errors algorithm, described in the next subsection, that the MLP became widely used.
2.3.7 Backpropagation of Errors Training Backpropagation of errors training, also known as backprop or BP training, was created to solve the problem of training multi-layer perceptrons (Rumelhart et al., 1986). It is a supervised learning method that is driven by the minimisation of errors of the network over the training set. During training the output values of the network being trained are compared to the desired output values from the training set, and an error is calculated. This error is then propagated backwards over the network (the backpropagation part of the algorithm) and used to calculate changes (deltas) to the connection weights. There are two main parameters of interest in the backpropagation algorithm. These are the learning rate and
momentum. The effect of altering these two parameters can best be visualised by imagining an “error surface”. An error surface is the plot of the error of the network against the value of each connection weight. The dimensionality of this surface is equal to the number of connections in the network. Since a network can have only one set of weight values at any one time, the network will exist at only one point of the error surface at any one time. Figure 2.4 shows an idealised error surface for a network with just two weights. Although there are many different low points on this surface, one is lower than all of the others. This is the “global minimum”, the point of lowest error. Other low points are referred to as “local minima”, regions where the error is low, but not the lowest. The network's traversal of this surface is governed by the backpropagation rule. Backpropagation will move the network down the error surface, towards the minima. Backpropagation is therefore sometimes known as “gradient descent” learning, as the network moves down the gradient of the error surface. The rate at which the network descends the gradient is determined by the value of the learning rate parameter. The momentum term was introduced to backpropagation to smooth out “jitters” in the trajectory of the network over the error surface. Such jitters are caused by sudden changes in the network's trajectory across the error surface. By retaining a portion of the previous weight deltas, such jitters are smoothed out.

Figure 2.4: An idealised error surface for a network with two weights.
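The interaction of the learning rate and momentum parameters can be sketched in isolation, as a toy gradient-descent loop over a two-weight error surface (the function names and the quadratic surface are illustrative assumptions, not part of the backpropagation algorithm itself):

```python
def descend(grad, w, lr=0.1, momentum=0.9, steps=200):
    """Gradient descent with momentum.
    grad(w) returns the gradient of the error surface at weight vector w."""
    velocity = [0.0] * len(w)
    for _ in range(steps):
        g = grad(w)
        # retain a portion of the previous weight deltas (the momentum term)
        # to smooth out jitters in the trajectory across the error surface
        velocity = [momentum * v - lr * gi for v, gi in zip(velocity, g)]
        w = [wi + vi for wi, vi in zip(w, velocity)]
    return w

# Toy error surface E(w) = w1^2 + w2^2, whose gradient is (2*w1, 2*w2);
# its global minimum is at the origin
w_final = descend(lambda w: [2 * w[0], 2 * w[1]], [5.0, 8.0])
```

On this convex toy surface the trajectory spirals into the single minimum; on a real error surface with many local minima, the same parameters govern whether the network settles into a local minimum or overshoots it.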
2.3.8 Fuzzy Neural Networks Fuzzy neural networks (FNN) combine neural networks and fuzzy logic (Section 2.2). Several FNN have been investigated (Furuhashi et al., 1993; Hasegawa et al., 1993; Hasegawa et al., 1992; Hashiyama et al., 1993b; Hashiyama et al., 1993a; Horikawa et al., 1990; Izquierdó et al., 2001; Jang, 1993; Lee et al., 1994; Mitra and Pal, 1996; Mitra et al., 1997; Uchino and Yamakawa, 1995; Yang and Furuhashi, 1993). The major advantage of FNN
is that they allow for the extraction of fuzzy rules from trained ANN (Hiraga and Furuhashi, 1995; Umano et al., 1997). This is especially useful for explaining what the networks have learned from the data. The FNN that will be focussed on here is FuNN (Kasabov et al., 1997a).
2.3.9 The Fuzzy Neural Network The Fuzzy Neural Network (FuNN), first proposed in (Kasabov et al., 1997a), was designed to provide an easy method of combining the advantages of fuzzy logic with the advantages of neural networks. A chief advantage of FuNN is the ability to extract and insert fuzzy rules from and into a FuNN structure (Section 6.4). Fuzzy rules extracted from a trained FuNN are useful for explaining in a comprehensible manner the knowledge that the network has captured. FuNN is a five neuron layer feed-forward network. Each layer performs a specific function. The first neuron layer is the input layer. The second layer is the condition layer. Each neuron in this layer represents a single fuzzy membership function attached to a particular input, and performs fuzzification of the input values based on that membership function. This layer is not fully connected to the input layer: each condition neuron is connected to a single input neuron, that is, each input neuron is connected to its own subset of condition neurons. The weight of the connection between a condition neuron and its input defines the centre of the condition neuron's membership function. The third layer of neurons is the rule layer. Neurons in this layer represent associations between fuzzy inputs and fuzzy outputs, that is, they represent fuzzy rules. The fourth layer of neurons is the action layer: neurons in this layer represent fuzzy membership functions attached to the output neurons. This layer is similar to the condition layer, in that each action neuron is connected only to the output neuron with which its membership function is associated. Also, the value of the connection weight connecting an action neuron to its output defines the centre of the action neuron's membership function. The final neuron layer is the output layer. This calculates crisp output values from the fuzzy output values produced by the action layer neurons.
Figure 2.5 shows an idealised FuNN with three input neurons. Two MF are attached to the first input neuron, three to the second, and two to the third. There are three rule neurons and two outputs, with two MF attached to each output. FuNN Activation Functions The neuron activation functions for each layer of FuNN are described below.
Input Layer - The activation function for the input neurons is a simple linear function. Condition Layer - The activation function for the condition neurons is a fuzzification function based on triangular membership functions, as defined by Equation 2.1.
A = \begin{cases}
1 - \frac{x - w_{i,j}}{w_{i,j+1} - w_{i,j}}, & w_{i,j} < x < w_{i,j+1} \\
1 - \frac{w_{i,j} - x}{w_{i,j} - w_{i,j-1}}, & w_{i,j-1} < x < w_{i,j} \\
1, & x = w_{i,j} \\
0, & \text{otherwise}
\end{cases} \quad (2.1)
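Equation 2.1 describes a triangular membership function whose apex is the condition neuron's own centre weight and whose feet are the centres of the neighbouring membership functions on the same input. A sketch of this fuzzification, assuming (for illustration only) that the centres for one input are held in a sorted list:

```python
def fuzzify(x, centres, j):
    """Activation of condition neuron j per Equation 2.1: a triangular MF
    centred at centres[j], falling to zero at the neighbouring centres."""
    c = centres[j]
    if x == c:
        return 1.0
    if j + 1 < len(centres) and c < x < centres[j + 1]:
        return 1.0 - (x - c) / (centres[j + 1] - c)   # right-hand slope
    if j - 1 >= 0 and centres[j - 1] < x < c:
        return 1.0 - (c - x) / (c - centres[j - 1])   # left-hand slope
    return 0.0

# Three MF attached to one input, centred at 0.0, 0.5 and 1.0
acts = [fuzzify(0.4, [0.0, 0.5, 1.0], j) for j in range(3)]
```

Note that for an input falling between two adjacent centres, the two overlapping triangles produce activations that sum to one, which is the usual behaviour of this style of fuzzification.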
where:

A is the activation of condition node j
x is the input value
w_{i,j} is the connection weight from input node i to condition node j

Figure 2.5: An example FuNN Fuzzy Neural Network.

Rule Layer - The activation function of the rule layer is a standard sigmoid (logistic) function.
Action Layer - The action layer activation function is also a standard sigmoid function. Output Layer - The output layer performs centre of gravity defuzzification over the action layer activations to produce a single crisp output. This value is calculated according to equation 2.2
A_o = \frac{N_o}{\sum_a A_a} \quad (2.2)

where:

A_o is the activation of the output node o
N_o is the weighted sum of the action node activations, as defined in Equation 2.3
A_a is the activation of action node a

N_o = \sum_a w_{o,a} A_a \quad (2.3)

where:

w_{o,a} is the value of the connection weight from action node a to output node o
A_a is the activation of action node a
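Taken together, Equations 2.2 and 2.3 compute an activation-weighted mean of the action-layer MF centres (recalling that each weight w_{o,a} is the centre of an action MF). A minimal sketch, with illustrative names:

```python
def output_activation(action_acts, weights_to_output):
    """Centre-of-gravity defuzzification per Equations 2.2 and 2.3."""
    # Equation 2.3: weighted sum of action activations (weights = MF centres)
    n_o = sum(w * a for w, a in zip(weights_to_output, action_acts))
    # Equation 2.2: normalise by the total action-layer activation
    return n_o / sum(action_acts)

# Two action MF with centres 0.2 and 0.9, activated at 0.8 and 0.4
y = output_activation([0.8, 0.4], [0.2, 0.9])
```

The crisp output therefore always lies between the smallest and largest action MF centres, pulled towards whichever MF is most strongly activated.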
2.4 Perceptron Learning The perceptron learning algorithm is a supervised learning algorithm based on reducing errors across each output neuron. Each example is passed through the perceptron, and the error across each output is calculated according to Equation 2.4. The connections to each output neuron are then updated according to Equation 2.5.
e_j = y_j - o_j \quad (2.4)

where:

e_j is the error at output neuron j
y_j is the desired output value of neuron j
o_j is the actual output value of neuron j

w_{i,j}(t + 1) = w_{i,j}(t) + \eta I_i e_j \quad (2.5)

where:

w_{i,j}(t) is the connection weight from neuron i to neuron j at time t
w_{i,j}(t + 1) is the connection weight from neuron i to neuron j at time t + 1
\eta is the learning rate parameter
I_i is the ith element of the input vector I
e_j is the error at neuron j according to Equation 2.4

2.4.1 Problems with Artificial Neural Networks There are several well-known problems associated with conventional connectionist systems. Difficulties such as selecting the topology of a network, the training algorithm and parameters used, the amount of knowledge an ANN can store, and their limited adaptability all place constraints on the real-world performance of connectionist structures. These restrictions are discussed below. It is these constraints that provided the motivation for the constructive neural networks described in Chapter 3. Topology Selection The topology of an ANN has a distinct effect upon its quality. The input features used are of critical importance. Including redundant or irrelevant features of the input space will cause the network to learn very slowly, if at all, and will require a larger network in order to handle the unnecessarily complex input space. Conversely, omitting important features will make learning in the network almost impossible. Statistical analysis can help make these decisions, but statistical analysis has a problem with handling unending (continuous) data streams.
The number of hidden nodes in a feed-forward network is of great concern to an ANN practitioner. Although it has been established that a feed-forward ANN with sufficient hidden neurons can approximate any continuous function (Kolmogorov, 1957; Cybenko, 1989), the number of neurons needed is unknown. Again, it is a balancing act between having too few, which will make learning impossible due to a lack of connections within which to store knowledge, and having too many, which makes over-learning almost inevitable. The final architectural problem with ANN is the connectivity of the neurons. Some connections after training will probably be less relevant than others: their presence may cause confusion and inefficiency in the network. Attempts at solving this problem usually involve pruning and structural learning (Section 3.2). Parameter Selection Selection of the training parameters is a multi-parameter optimisation problem: too low a learning rate means that the network will train at a very slow pace, but lessens the chance of it becoming stuck in a local minimum. Too high a learning rate will speed the training process, but increases the chance of it being stuck in a local minimum. An additional danger is that a high learning rate will lead to over-training, or over-fitting, on the training data. The momentum parameter must also be chosen with care: if the momentum is too low, then it will not serve its purpose of smoothing the network's trajectory across the error surface. If it is too high, then the network may overshoot the global minimum. Some work has also been done on dynamically calculating the training parameters as learning is underway (Moreira and Fiesler, 1995; Schiffmann et al., 1994).
Evolutionary algorithms have also been used to optimise the training parameters of ANN (Choi and Bluff, 1995; Fontanari and Meir, 1991; Kermani et al., 1999), and this approach is taken in this thesis (Subsection 7.3.2). Limited Dynamic Adaptation Catastrophic forgetting is the term used to describe what happens when an ANN previously trained on one data set is further trained on another (McCloskey and Cohen, 1989). Although the ANN is usually able to learn the new data set quite adequately, it is likely to forget the previous data set. This causes problems when the ANN must be used in a so-called “online learning” application that requires new data to be continuously accommodated: the new data will be handled, but the old data will be forgotten. Training Speed Although there are now many variants of backpropagation training in existence, they all suffer from a lack of training speed (Fahlman, 1988). Backprop requires the repeated presentation of the entire training set, usually hundreds or thousands of times, before an acceptable result is obtained. Although some backprop-derived algorithms are much faster (e.g. quickprop (Fahlman, 1988)), these still require the repetitive presentation of the training set.
2.5 Evolutionary Algorithms 2.5.1 Introduction Evolutionary computation (EC) is the field of study that uses simulated evolution, or evolutionary algorithms (EA), to solve problems. Evolution is a blind adaptation method that functions via the non-random accumulation of random mutations and recombinations of genetic material. The modern theory of evolution through natural selection was formulated by the naturalist Charles Darwin after travelling the world aboard the British research ship HMS Beagle. Briefly stated, his theory was that organisms with advantageous heritable variations will produce more offspring than those that lack the variation. This will increase the frequency of the variation throughout the population, allowing the population as a whole to better deal with its environment (Darwin, 1859). The central principles of Darwin's theory are as follows: some creatures possess certain innate qualities that better suit them to their environment. These qualities are genetic in origin, and arise because of random variations (mutations) in the genes of the individual. As these changes make the organism better suited to its environment, it is that much more fit than the others of its species, and thus has a greater chance of passing its genes on to future generations. These genetic changes accumulate in the population, gradually and continually altering from one form to another. The main contribution of Darwin's theory was the concept of linking the fitness of an organism to beneficial variations in its genetic makeup. It also linked the chances of an organism passing on its genes to its fitness, and described evolution as the gradual accumulation of these genetic changes. Although Darwinian evolution is often described as “survival of the fittest”, this is not the case. It is not so much that the fittest will survive and the unfit will perish - although this does happen, it is not the central principle of the theory - as that the fittest have more offspring.
Also, it is important to realise that although mutations -“the ultimate raw material of evolution” (Villee, 1972, pg 724, pp 726-728) - are created by a random process, the selective pressure on a population - the circumstance that allows some individuals to thrive and causes others to die off - is decidedly non-random. This is a key principle of Darwinian evolution: non-random accumulation of random changes. The rest of this section describes evolutionary algorithms in general and genetic algorithms in particular.
2.5.2 Evolutionary Algorithms Generically, evolutionary algorithms (EA) are search and optimisation algorithms that are inspired by the mechanics of natural selection and natural genetics. EA are well suited to exploring large search spaces, that is, to solving problems that require the optimisation of many parameters. All evolutionary algorithms have certain principles in common. Firstly, they have individuals, where each individual is a possible solution to the problem. The individuals are capable of reproduction, via methods such as mutation, which is a random change of parameters, and crossover, which is a recombination of two individuals to create a third, new individual. Secondly, they have a means of assessing each individual, via an objective function. The objective function assigns a fitness value to the individual, where the fitness depends upon how well the individual performs at the problem. Finally, the
individuals with a high fitness level will have more offspring than the less fit individuals. Thus, the qualities that make the fit individuals so will become more common as time passes. There are three main types of EA: genetic algorithms (GA), Evolution Strategies (ES) and Evolutionary Programming (EP). Genetic algorithms are the algorithms used in this thesis, and are discussed in greater detail in the following subsection. ES were developed in Germany during the 1960s (Hoffmeister and Bäck, 1991; Bäck et al., 1991), and are interesting in that they originally used only a single individual, as opposed to the populations used in EP and GA. An individual in ES consists of a string of real numbers, where each number represents a parameter of the problem at hand. Reproduction in ES is done solely by mutation. Later ES have populations of individuals and reproduce using crossover as well as mutation (Bäck et al., 1991). Evolution strategies have been applied to several different problems, including engineering (Hingston et al., 2002) and scheduling (Yang et al., 2002). Evolutionary programming was developed in California during the mid 1960s (Fogel et al., 1965). Evolutionary programming deals with populations of several individuals. Individuals in EP are not representations of problems, but are actual solution attempts. Reproduction is via mutation, although mutation needs to be performed in a way that is appropriate to the particular problem. A common contemporary application of EP is the training of ANN (Yao and Liu, 1996b; Fogel et al., 1997; Chellapilla and Fogel, 1999).
2.5.3 Genetic Algorithms Genetic Algorithms were first developed by John Holland as a model of evolution (Holland, 1975). They were later used by such people as David Goldberg (Goldberg, 1989) as a powerful problem solving method. Since then, numerous works have been written on their principles and theory (Davis, 1996; Mitchell, 1996; Koza, 1993). They have also been applied to many different application areas, including path planning (Alander, 1993; Kim and Park, 1996; Shibata et al., 1996), engineering (Wu and Li, 1995) and scheduling (Jenkins and Gedeon, 1997; Lee et al., 1997; Monfroglio, 1996; Bud and Nicholson, 1997). A GA acts on a population of individuals. Each individual possesses one or more chromosomes, strings of genes that encode an attempt at solving the problem at hand, and a fitness value. An attempt at solving a problem with a simple genetic algorithm (Goldberg, 1989) could use the following algorithm:

1. Select a representation schema for solution attempts.
2. Initialise the members of the population.
3. Evaluate each member of the population.
4. Select members to breed new individuals.
5. Create new individuals using crossover and mutation.
6. Replace the old population with the new.
7. Continue steps 3 to 6 until a stopping condition is reached.

There are two general ways to represent a problem in a GA: binary and real-valued encoding. With binary encoding, every parameter is represented as a vector of bits. During evaluation, the bits must be decoded into a parameter's value. Real-valued encoding (Antonisse, 1989; Wright, 1991) uses a single real number or character to represent each parameter. While this allows for a far more natural representation of the problem in the GA, it does mean that special mutation operators need to be created for each data type used in the chromosome. Empirical work (Janikow and Michalewicz, 1991) suggests that it makes little difference which schema is used. Davis (Davis, 1996) maintains that the representation used should be a natural reflection of the problem at hand, that is, genes should translate naturally into parameters of the problem, without intermediate steps. Therefore, all of the work done here will use real-valued encoding. Evaluation is the most problem-specific part of any EA. The evaluation phase involves quantifying the fitness of each individual, that is, how well each attempt at the problem solution (individual) performs on the problem at hand. The manner of evaluation is entirely dependent upon the problem being solved, the only requirement being that the fitness value assigned to an individual after evaluation objectively reflects its adequacy at solving the problem. Selection is the mechanism whereby individuals are chosen to form the next population. Although many methods exist, they are all based to some extent upon the fitness of individuals. Thus, the most fit individuals can be expected to be selected more often than less fit individuals. Crossover is the exchange of genes between two chromosomes. During crossover, portions of chromosome are exchanged about one or more points. The type of crossover is described by the number of points about which the chromosomes are exchanged.
Mutation is a random change in genes. A single gene is randomly selected, and its value is changed. With binary representations the bit may be inverted, or it may be reinitialised. With real-valued representations, two methods of mutation are possible. The first is reinitialisation: a new value is randomly chosen and assigned to the gene. The second is real number creep (Michalewicz, 1992). In this method, a small random value is added to the existing value of the gene. This has the effect of making mutation a small, “creeping” process through the search space, instead of the huge jumps of reinitialisation mutation. There are several advantages to using GA as problem solving tools (Goldberg, 1989). The first is their ability to rapidly search large search spaces. As with biological evolution, a GA capitalises on the non-random action of selective pressure in a population to quickly reject unviable individuals (solution attempts) and to propagate successful genes (solution parameters). It does not need to attempt every possible solution; it samples the entire solution space instead. The second advantage of GA is their problem independence. A GA is not affected by the nature of the problem; it requires only that each individual can be assigned a fitness value that reflects how well it solves the problem. There are, of course, drawbacks to GA. Firstly, there must be a way of representing the problem in the artificial chromosomes. Secondly, there must be a way of evaluating the fitness of each individual. GA also have the potential to be very computationally expensive. Evaluating populations over several generations can raise serious computing power issues. Finally, while the selective pressure on a population is non-random, the mutation and
other processes that create new genetic material are random. This means that while the selective pressure will push the population towards the optimal combination of genes, the mutation may not create the genes necessary to achieve that state.
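The simple GA loop outlined in steps 1 to 7 above can be sketched as follows. This sketch uses a real-valued chromosome, fitness-proportionate (roulette-wheel) selection, one-point crossover and reinitialisation mutation; the parameter values and function names are illustrative, not taken from the thesis:

```python
import random

def simple_ga(fitness, n_genes, pop_size=20, generations=50,
              crossover_rate=0.7, mutation_rate=0.05):
    """A minimal real-valued simple GA (steps 1-7 of the text)."""
    # Steps 1-2: real-valued representation, randomly initialised population
    pop = [[random.random() for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        # Step 3: evaluate each member of the population
        scores = [fitness(ind) for ind in pop]
        new_pop = []
        while len(new_pop) < pop_size:
            # Step 4: roulette-wheel selection of two parents
            p1, p2 = random.choices(pop, weights=scores, k=2)
            # Step 5: one-point crossover...
            child = p1[:]
            if random.random() < crossover_rate and n_genes > 1:
                point = random.randrange(1, n_genes)
                child = p1[:point] + p2[point:]
            # ...and reinitialisation mutation
            for g in range(n_genes):
                if random.random() < mutation_rate:
                    child[g] = random.random()
            new_pop.append(child)
        # Step 6: replace the old population with the new
        pop = new_pop
    return max(pop, key=fitness)

# Toy problem: maximise the sum of the genes (optimum is all genes near 1)
best = simple_ga(lambda ind: sum(ind), n_genes=5)
```

Note the requirement mentioned in the text: the only problem-specific parts are the representation (here a list of reals) and the fitness function.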
2.6 Benchmark Datasets 2.6.1 Introduction The purpose of the experiments performed with the benchmark data sets in this thesis is to evaluate the algorithms proposed, and to provide an empirical basis with which to compare the performance of those algorithms to each other and to the algorithms derived later in the thesis. There are four main properties of the algorithms to be investigated:
Learning accuracy, assessed as the accuracy over the training data sample after training has terminated.
Generalisation accuracy, assessed as the accuracy over a previously unseen sample of the data.
Adaptation, assessed as the accuracy over an additional sample of the data, after further training on that data.
Forgetting, assessed as the decrease in accuracy over the original training sample after further training on an additional sample.
The purpose of these experiments is not to create optimal models for these benchmark data sets: the results are used for comparison between the algorithms presented here, not to prove that the algorithms here are the best available. The benchmarks selected fulfilled the following criteria:
They must be well known within the computational intelligence community.
They should have two or more input features, with at least one set having two features (to simplify analysis).
They must be non-trivial, which is taken here to mean consisting of more than one hundred examples.
Since the criteria for evaluating the success of the hypotheses include evaluating the application flexibility of the algorithms, it is also necessary for the benchmarks to be a mix of classification and function approximation.
Four benchmark data sets have been selected. The first two of these sets, the Two Spirals and Iris Classification sets, are classification problems: the model being evaluated must determine the class of each pattern presented to it. The second two sets, Mackey-Glass and Gas Furnace, are time-series, function approximation problems²: the model being evaluated must determine, given some previous values of a variable, the approximate value of that variable at a point in time in the future. Two of the benchmark sets are artificial, while two are natural. That is, the two spirals and Mackey-Glass data sets are both generated by evaluation of an equation across a range of parameter values. The iris classification and gas furnace data sets both come from measurements of real-world processes.

² Time series and function approximation can be considered to be different problems; in this thesis, they are regarded as the same.
2.6.2 Two Spirals Problem The two spirals set (Lang and Witbrock, 1988) consists of 194 examples, where each example consists of two input variables and one or two output variables that represent two classes. The input variables are the coordinates of a point, which may belong to either of the classes. If a single output variable is used to represent both output classes, then the output encoding is said to be dense. If two output variables are used to represent the two classes (one variable each) then the output encoding is said to be sparse. The task is to identify which class the point belongs to. Figure 2.6 illustrates this data set, where the two classes are represented by ‘x’ and ‘o’.
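The spiral coordinates are generated by evaluating an equation across a range of parameter values. A sketch using the commonly cited formulation of the Lang and Witbrock benchmark (97 points per spiral, radius shrinking linearly with angle, the second spiral being the point-reflection of the first); the exact constants here are assumptions based on that common formulation, not taken from the thesis:

```python
import math

def two_spirals(n=97):
    """Generate the two-spirals benchmark: n points per spiral,
    2*n labelled (x, y, class) examples in total."""
    data = []
    for i in range(n):
        angle = i * math.pi / 16.0                 # spiral angle
        radius = 6.5 * (104 - i) / 104.0           # radius shrinks linearly
        x = radius * math.sin(angle)
        y = radius * math.cos(angle)
        data.append((x, y, 0))                     # first spiral, class 0
        data.append((-x, -y, 1))                   # mirrored spiral, class 1
    return data

points = two_spirals()  # 194 labelled examples, as in the benchmark set
```

The interlocking of the two mirrored spirals is what makes the classes so far from linearly separable.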
6
4
y
2
0
−2
−4
−6
−8 −8
−6
−4
−2
0 x
2
4
6
8
Figure 2.6: Plot of the two spirals problem.

The Two Spirals problem is a difficult problem to solve with a fixed-topology, backpropagation-trained network. As stated by (Yu and Liu, 2002, pg 1222): "it is clear that this problem is very hard to solve by use of traditional sigmoidal network". Further evidence of this difficulty is given in (Romero and Alquézar, 2002, pg 1971): "It is an extremely hard problem for architectures with sigmoidal activation functions because of its intrinsic high non-linearity". In other words, the two interlocking spirals make it very difficult for a BP-trained network to learn to separate the two classes (Mizutani and Dreyfus, 2002; Romero and Alquézar, 2002). The two spirals problem is particularly useful for the work in this thesis because it is often used to test so-called "constructive" ANN (see Chapter 3), as stated in (Reed and Marks, 1999, pg 204): "The two-spirals problem is sometimes used as a benchmark for constructive [artificial neural network] algorithms because it requires careful coordination of many hidden nodes and is difficult to learn with simple back-propagation in a MLP network (It is not representative of most real-world problems, however). In a single-hidden-layer architecture, 40 or more hidden nodes are generally needed and training times are long. Most successful solutions use more than one hidden layer; some use short-cut connections."
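For concreteness, the spirals can be generated directly. The following sketch follows the commonly used recipe from the CMU benchmark collection; the exact constants (an angular step of π/16 and a maximum radius of 6.5) are assumptions of that recipe rather than details taken from this chapter:

```python
import math

def two_spirals(points_per_spiral=97):
    """Generate the two-spirals benchmark in the style of the CMU
    neural-bench generator: 97 points per spiral, 194 in total."""
    data = []
    for i in range(points_per_spiral):
        angle = i * math.pi / 16.0
        radius = 6.5 * (104 - i) / 104.0
        x = radius * math.sin(angle)
        y = radius * math.cos(angle)
        data.append((x, y, 0))    # first spiral, class 0
        data.append((-x, -y, 1))  # second spiral, mirrored, class 1
    return data

examples = two_spirals()
print(len(examples))  # 194 examples in total
```

With a dense encoding the class label would be a single value (0 or 1); a sparse encoding would replace it with two one-hot output variables.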
2.6.3 Iris Classification

The iris classification problem is a classic in the field of classification (Ripley, 1993). Although it is often referred to as Fisher's iris data (Fisher, 1936), the data was in fact collected by Anderson (Anderson, 1935). The data set consists of 150 examples, where each example consists of four measurements of the flower of an iris plant. The four measurements are of the sepal width, sepal length, petal width and petal length. There are three species of iris represented, with fifty examples of each species. The three species are Iris setosa, Iris versicolor and Iris virginica. While I. setosa is easily distinguished from the other two, I. versicolor and I. virginica overlap. The complete problem is not linearly separable, that is, a linear discriminator cannot separate the three classes (Eldracher, 1992). This is illustrated in Figures 2.7 and 2.8. Figure 2.7 is a plot of the sepal width against the sepal length of each example, while Figure 2.8 is a plot of the petal width against the petal length of each example. In both plots, it can be seen that the examples for I. setosa are in a distinct group, while those for I. virginica and I. versicolor are tightly mingled together.
Figure 2.7: Plot of sepal width versus sepal length for the iris data set. This problem has been widely used as a benchmark of classification algorithms, including artificial neural networks (Weiss and Kapouleas, 1991; Hansen et al., 1994).
2.6.4 Mackey-Glass The Mackey-Glass data set (Mackey and Glass, 1977) was originally created to model the functioning of physiological control systems. The data is generated from a chaotic function. While the values in the data set are
periodic, they are also acyclic. As a chaotic data set, future values are determined by a complex function of the current values. The formula from which the data set is derived is shown in Equation 2.6, while the data set is plotted in Figure 2.9.

Figure 2.8: Plot of petal width versus petal length for the iris data set.
dx/dt = 0.2·x(t − τ) / (1 + x¹⁰(t − τ)) − 0.1·x(t)    (2.6)
The data used in these experiments was sourced from the MATLAB software distribution. It was generated as follows (MATLAB Manual, 2002): "To obtain the time series value at integer points, we applied the fourth-order Runge-Kutta method to find the numerical solution to the above MG equation; here we assume x(0) = 1.2, τ = 17, and x(t) = 0 for t < 0." The Mackey-Glass data set is often used as a test of neural networks (Vesanto, 1997; Mukherjee et al., 1997; Rantala and Koivisto, 2002; Casdagli, 1989; Lapedes and Farber, 1987; Littmann and Ritter, 1996; Moody, 1989; Platt, 1991a; Principe and Kuo, 1995; Sanger, 1991; Chiu, 1994; Crowder, 1990; Jones et al., 1990; Svarer et al., 1993; Dorado et al., 2002; Yen and Lu, 2002; Sirvadam et al., 2002; Rozich et al., 2002). The data set used in the experiments in this thesis consisted of 1000 examples, with four input variables, representing the previous values of the function, and one output variable, representing the future value of the function.
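A minimal reimplementation of this generation procedure is sketched below. It integrates Equation 2.6 with a classical fourth-order Runge-Kutta step; for simplicity the delayed term x(t − τ) is held constant over each step rather than interpolated, so the output will only approximate the MATLAB-generated series used in the experiments:

```python
def mackey_glass(n_points=1000, tau=17, h=0.1, x0=1.2):
    """Approximate integration of the Mackey-Glass equation (Equation 2.6),
    with x(0) = 1.2 and x(t) = 0 for t < 0, as in the MATLAB recipe.
    Returns the series values at integer time points t = 0, 1, ..., n_points."""
    steps_per_unit = int(round(1.0 / h))
    delay_steps = int(round(tau / h))
    history = [x0]  # history[k] approximates x(k * h)

    def f(x, x_delayed):
        # dx/dt = 0.2 x(t - tau) / (1 + x^10(t - tau)) - 0.1 x(t)
        return 0.2 * x_delayed / (1.0 + x_delayed ** 10) - 0.1 * x

    for k in range(n_points * steps_per_unit):
        x = history[-1]
        xd = history[k - delay_steps] if k >= delay_steps else 0.0
        # classical RK4 step, holding the delayed term constant over the step
        k1 = f(x, xd)
        k2 = f(x + 0.5 * h * k1, xd)
        k3 = f(x + 0.5 * h * k2, xd)
        k4 = f(x + h * k3, xd)
        history.append(x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4))

    return [history[k] for k in range(0, len(history), steps_per_unit)]

series = mackey_glass(1000)
```

Sliding a window of four past values over `series`, with the next value as the target, yields the 1000-example, four-input data set described above.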
2.6.5 Gas Furnace Data The gas furnace data set (Box and Jenkins, 1970, pg 371-372) is a widely used benchmark problem in the area of time-series prediction / function approximation. It consists of 296 examples, where each example is a measure
of the inflow rate of methane gas (CH4) into a furnace, and the concentration of carbon dioxide (CO2) within the furnace. Each example was taken at an interval of nine seconds. Figure 2.10 presents a plot of the CO2 concentration against time for this data set. Figure 2.11 is a plot of the methane inflow rate. For the data set used in these experiments, the input variables consisted of the methane inflow rate measurement four time steps in the past (time t − 4), and the concentration of CO2 at the previous measurement (time t − 1). The single output variable was the CO2 concentration at the current time (t = 0). This reduced the number of available examples to 292.

Figure 2.9: Plot of the Mackey-Glass function.

This data set has been used as a benchmark for artificial neural networks (Kasabov and Woodford, 1999; Kasabov and Song, 2000; Kasabov and Song, 2002; Ribeiro, 2002) and fuzzy rule based systems (Faraq and Tawfik, 2000; Pedrycz, 1984; Sugeno and Tanaka, 1991; Tong, 1978; Wang and Langari, 1996; Xu and Lu, 1987; Rantala and Koivisto, 2002; Sugeno and Yasukawa, 1991; Sugeno and Yasukawa, 1993; Tong, 1980; Kim et al., 1998; Lin and Cunningham III, 1995; Abreu and Pinto-Ferreira, 1996; Gaweda et al., 2002; Valdes, 2002; Abraham, 2002). The work in these papers often varies the number of timesteps the data set is dealing with, that is, the methane and CO2 levels at times other than t − 1 are included as inputs. The general trend in results seems to be that the more previous timesteps are included, the more accurate the model is. From the literature above it is apparent that the number of timesteps included is adequate to model the problem, and is the same as used in previous work with ECoS networks (Kasabov and Woodford, 1999).
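The construction of the 292 input-output examples from the 296 raw readings can be sketched as follows (function and variable names are mine; the lag structure, methane at t − 4 and CO2 at t − 1, is as described above):

```python
def make_gas_furnace_examples(ch4, co2):
    """Window the raw gas furnace series into (inputs, target) pairs:
    inputs are the methane inflow at time t-4 and the CO2 concentration
    at time t-1; the target is the CO2 concentration at time t."""
    examples = []
    for t in range(4, len(co2)):
        inputs = (ch4[t - 4], co2[t - 1])  # two input variables
        target = co2[t]                    # one output variable
        examples.append((inputs, target))
    return examples

# With the 296 readings of the Box-Jenkins set, the four-step methane
# lag removes the first four time points, leaving 292 usable examples.
dummy_ch4 = [0.0] * 296
dummy_co2 = list(range(296))
print(len(make_gas_furnace_examples(dummy_ch4, dummy_co2)))  # 292
```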
Figure 2.10: Plot of carbon dioxide concentration versus time for gas furnace data set.
2.7 Benchmark Experiments with MLP and FuNN

2.7.1 Introduction

The purpose of these experiments was to provide a basis of comparison between the existing, well-known MLP and FuNN algorithms and the algorithms that are introduced later in the thesis. Therefore, experiments with both MLP and FuNN were performed over each of the four benchmark data sets. The application of ANN and fuzzy systems to the four benchmark data sets has been extensively investigated by other researchers (see references in Subsections 2.6.2-2.6.5). However, different researchers carry out their investigations in different ways: some randomly divide the data into train and test sets, while others simply take the first half for training and the second half for testing. For the time-series data sets, differently sized windows in time are taken. Further complicating matters are the different performance measures used: for the classification problems, both the total number of examples correctly classified and the percentage correctly classified have been reported. For the time-series problems, the mean-squared error (MSE), root mean squared error (RMSE), normalised root mean squared error (NRMSE), correlation coefficient (CC) and non-deterministic error index (NDEI) have all been used. Whether or not the data used in calculating these error values have been normalised is often not stated by the authors. These factors combine to make it very difficult to validly compare results between different publications, and equally difficult to validly compare the published results obtained using traditional ANN and fuzzy systems with the results obtained using the algorithms described in this thesis. To be truly comparable, the results for the traditional ANN need to be obtained using the same general experimental setup as that used for the original work in this thesis.
The only time results in the literature are presented in this thesis is when the following criteria are met:
Figure 2.11: Plot of gas inflow rate versus time for gas furnace data set.
- The accuracy reported was measured the same way as that used in these experiments.
- The accuracy reported was measured over the entire benchmark data set, not across a particular subset.
The only time these criteria were met was for the accuracies of gas furnace data modelled with fuzzy systems: this is presented and discussed in Subsection 6.11.6. As the focus of this thesis is artificial neural networks, it was not considered necessary to also carry out experiments using fuzzy systems. The performance of the fuzzy rules that are investigated as a result of rule extraction (Chapter 6) will be compared to the performance of the networks they were extracted from: this is because rule extraction in this thesis is used as a method of explaining what the networks have learned, rather than as a method of obtaining the rules for fuzzy systems. It also bears reiterating that the goal of this work is not to create the best-performing models for any of the benchmark data sets: they are used solely to provide a basis of comparison between the different algorithms described in this thesis.

Ten-fold cross validation (Stone, 1974) was used in these experiments. Cross validation is described as follows by (Weiss and Kapouleas, 1991, pg 782): "In k-fold cross validation, the cases are randomly divided into k mutually exclusive test partitions of approximately equal size. The cases not found in each test partition are independently used for training, and the resulting classifier is tested on the corresponding test partition. The average error rates over all k partitions is the cross-validated error rate." For the experiments in this thesis, each of the data sets was linearly normalised. Each data set was then randomly divided into ten equally sized subsets. For the classification problems (two spirals and iris classification), this was done in such a way that each subset had an equal number of examples of each class. For each run, eight
of the subsets were concatenated into a single training set (Set A), one was used for testing and further training (Set B) and the remaining subset was used as the validation set (Set C). This was repeated ten times, so that each subset was used for testing and validation once. For each run, the network was trained on Set A, then tested over all three sets. The network was then further trained on Set B, and tested again on all three sets. This was done to evaluate four things: firstly, how well the network had learned Set A; secondly, how well it generalised to Sets B and C; thirdly, after further training on Set B, how well it had adapted to Set B; and fourthly, how much it forgot Set A. For the classification experiments (two spirals and iris classification) the accuracy was measured as the percentage of examples from each set that were correctly classified: as the number of examples of each class is balanced in both the two spirals and iris sets, this is an appropriate performance measure. For the function approximation experiments (Mackey-Glass and gas furnace) the mean-squared error was calculated, according to Equation 2.7.
e = (1/n) · Σ (t_o − a_o)²    (2.7)
where:
e is the mean squared error, n is the number of examples in the data set, t_o is the target output value, and a_o is the actual output value.

The MSE was assessed over unnormalised data. Since the networks were trained and tested using normalised data, the network output values were denormalised, using the same maximum and minimum values used to normalise them. As ten-fold cross validation was used, there were ten sets of six accuracy measures. The arithmetic mean of each of the performance measures was calculated and is presented in the subsections below. The variation of the performance measures is presented as either the standard deviation or the approximate variance of the runs, depending upon the particular experiment. The approximate variance is used when multiple experiments were carried out over each 'fold' of the data, as was the case with the MLP and FuNN experiments. The approximate variance was calculated according to Equation 2.8.
σ_t = √( Σᵢ₌₁ⁿ σᵢ² ) / n    (2.8)
where:
σ_t is the total, approximated variance, n is the number of folds in the data, and σ_i is the standard deviation over the ith fold of the data.

When the approximate variance is presented, rather than the standard deviation, this will be noted. For the purposes of statistical tests, the results of these and all following experiments are assumed to be normally distributed. The statistical hypotheses investigated when comparing MLP and FuNN are presented in Table 2.1. Unpaired, two-tailed t-tests were used to evaluate each hypothesis. The first superscript, a or b, indicates which data set the network was trained on. The second superscript indicates which data set the network was recalled with, whether a, b, c, or the full data set, f. The subscript denotes
that the network is either an MLP (m) or a FuNN (fn).

Hypothesis    AA                     AB                     AC                     AF
H0            μ_m^aa = μ_fn^aa       μ_m^ab = μ_fn^ab       μ_m^ac = μ_fn^ac       μ_m^af = μ_fn^af
H1            μ_m^aa ≠ μ_fn^aa       μ_m^ab ≠ μ_fn^ab       μ_m^ac ≠ μ_fn^ac       μ_m^af ≠ μ_fn^af

Hypothesis    BA                     BB                     BC                     BF
H0            μ_m^ba = μ_fn^ba       μ_m^bb = μ_fn^bb       μ_m^bc = μ_fn^bc       μ_m^bf = μ_fn^bf
H1            μ_m^ba ≠ μ_fn^ba       μ_m^bb ≠ μ_fn^bb       μ_m^bc ≠ μ_fn^bc       μ_m^bf ≠ μ_fn^bf

Table 2.1: Statistical hypotheses for comparing MLP and FuNN.

Since the purpose of these benchmark experiments is to establish a basis of comparison for later experiments, it was also necessary to assess how well the networks adapted to further training data. This is determined by comparing the performance of each network over each data set before and after further training, where a significant difference in performance indicates a significant change in the performance of the network. The statistical hypotheses used to evaluate these differences are presented in Table 2.2. Paired-sample, two-tailed t-tests were used to evaluate each hypothesis. The same hypotheses were used for evaluating other networks in later chapters (Chapter 4).

Hypothesis    δA                 δB                 δC                 δF
H0            μ^aa = μ^ba        μ^ab = μ^bb        μ^ac = μ^bc        μ^af = μ^bf
H1            μ^aa ≠ μ^ba        μ^ab ≠ μ^bb        μ^ac ≠ μ^bc        μ^af ≠ μ^bf

Table 2.2: Statistical hypotheses for evaluating changes in accuracy after further training.

To compare the way in which MLP and FuNN networks adapted to new data, the change in accuracy of each MLP was compared to the change in accuracy of each FuNN. The statistical hypotheses used to evaluate this comparison are listed in Table 2.3. In this table, the δ superscript indicates the change in accuracy over the denoted data set. Unpaired, two-tailed t-tests were used to evaluate each hypothesis.
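The performance measures described above, i.e. the linear normalisation of the data, the mean squared error of Equation 2.7 computed over denormalised values, and the approximate variance of Equation 2.8, can be sketched as follows (function names are mine):

```python
import math

def normalise(values, lo, hi):
    """Linear normalisation into [0, 1] using fixed minimum and maximum."""
    return [(v - lo) / (hi - lo) for v in values]

def denormalise(values, lo, hi):
    """Invert the linear normalisation before computing the error."""
    return [v * (hi - lo) + lo for v in values]

def mean_squared_error(targets, actuals):
    """Equation 2.7: e = (1/n) * sum((t_o - a_o)^2)."""
    n = len(targets)
    return sum((t - a) ** 2 for t, a in zip(targets, actuals)) / n

def approximate_variance(fold_std_devs):
    """Equation 2.8: sigma_t = sqrt(sum of sigma_i^2) / n over the n folds."""
    n = len(fold_std_devs)
    return math.sqrt(sum(s * s for s in fold_std_devs)) / n
```

In the experimental pipeline, network outputs would be passed through `denormalise` with the original minimum and maximum of each variable before `mean_squared_error` is applied, matching the unnormalised MSE reported in the tables below.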
Hypothesis    δA                   δB                   δC                   δF
H0            δ_m^a = δ_fn^a       δ_m^b = δ_fn^b       δ_m^c = δ_fn^c       δ_m^f = δ_fn^f
H1            δ_m^a ≠ δ_fn^a       δ_m^b ≠ δ_fn^b       δ_m^c ≠ δ_fn^c       δ_m^f ≠ δ_fn^f

Table 2.3: Statistical hypotheses for comparing changes in accuracy of MLP and FuNN.

2.7.2 Two Spirals

The MLP architecture selected had two input neurons, forty hidden neurons and one output neuron (a dense output encoding was selected for the two classes; the same output encoding scheme will be used throughout this thesis). A hidden layer size of forty neurons was selected, consistent with (Reed and Marks, 1999, pg 204). The FuNN architecture also had two input neurons, with three MF attached to each. Forty hidden (rule) neurons were used, and the output neuron had two MF attached. The training parameters used were as in Table 2.4.

Learning Rate    0.5
Momentum         0.5
Epochs           5000

Table 2.4: Backpropagation training parameters for the two spirals problem.

The standard experimental procedure, as described above, was carried out, that is, each network was first trained on Set A, then tested over Sets A, B and C, after which it was further trained on Set B and tested again. One hundred runs were carried out over each 'fold' of the data set, where each run consisted of creating a new network with randomly initialised weights, then training and testing. The results, as percentage of examples correctly classified, are presented in Table 2.5. The approximate variance is presented as the measure of variation in the results.

              Trained on Set A                        Trained on Set B
Recall Set    A         B         C         All       A         B         C         All
MLP           57.5/0.7  18.2/1.0  23.9/1.1  46.5/0.6  45.4/0.7  99.9/0.2  30.6/0.7  48.9/0.7
FuNN          59.4/1.4  41.8/1.6  37.2/1.4  50.3/1.3  54.9/1.4  72.8/1.7  41.5/1.5  51.9/1.4

Table 2.5: Mean percent correct / approximate variance (to 1 decimal place (d.p.)) for the two spirals problem.

To test if the differences apparent in these results are significant, the statistical hypotheses in Table 2.1 were tested. The results of these are presented in Table A.1, where an entry of "reject" indicates that the null hypothesis was rejected by the test, and an entry of "accept" indicates that the null hypothesis was not rejected by the test. To evaluate the levels of forgetting and adaptation exhibited by each type of network, the hypotheses in Table 2.2 were tested. The results of the hypothesis tests evaluating the adaptation of the MLP networks are in Table A.2, while the results of the hypothesis tests evaluating the adaptation of the FuNN networks are in Table A.3. To assess which model forgot the most and adapted the best, the hypotheses in Table 2.3 were tested. The results of comparing the changes in accuracy of MLP and FuNN are in Table A.4.

Discussion

Inspection of the results in Tables 2.5 and A.1 shows that the performance of FuNN was superior across the board after training on Set A, with the exception of Set A at the 99% level of confidence.
After further training on Set B, FuNN also had superior performance to MLP over Sets A and C: only over Set B was the performance of MLP superior. The results in Tables 2.5 and A.2 show that both the MLP and FuNN networks forgot the previous training set to a significant degree, that is, the performance over Set A significantly declined. However, both networks significantly improved in performance over Set B and Set C. This caused a significant overall increase in accuracy for MLP, while a significant difference for FuNN appeared only at the 95% level of confidence. It was claimed in (Kasabov et al., 1997a; Watts and Kasabov, 1998) that FuNN networks are more resistant
to catastrophic forgetting than MLP. Inspection of the results in Tables 2.5 and A.4 shows that FuNN forgot to a significantly lesser degree than MLP did. Conversely, MLP adapted to Set B significantly better than FuNN did. The level of change in accuracy over Set C and in overall accuracy was not significantly different for either type of network. The overall performance of both networks, however, was often little better than chance. Given the difficulties inherent in this data set, as described in Subsection 2.6.2, it is not very surprising that the accuracies are so low. Overall, the backpropagation-trained, fixed-architecture networks performed poorly over the two spirals data set.

Learning Rate    0.5
Momentum         0.5
Epochs           1000

Table 2.6: Backpropagation training parameters for the iris classification problem.

              Trained on Set A                        Trained on Set B
Recall Set    A         B         C         All       A         B         C         All
MLP           97.6/0.2  96.0/0.2  96.1/0.2  97.3/0.2  96.5/0.2  99.3/0.1  97.0/0.3  97.2/0.2
FuNN          99.0/0.2  95.0/0.3  94.7/0.4  98.5/0.2  97.7/0.3  99.0/0.2  95.5/0.5  97.6/0.3

Table 2.7: Mean percent correct / approximate variance (to 1 d.p.) for the iris classification problem.
2.7.3 Iris Classification

Four input neurons, five hidden (or rule) neurons and one output neuron were used for each network in this set of experiments. The FuNN used had five membership functions attached to each input and output neuron. The standard experimental procedure was used. One hundred runs were carried out over each fold of the data set, with a new network being created for each run. The training parameters used are presented in Table 2.6. The results, as a percentage of examples correctly classified, are presented in Table 2.7. The approximate variance is presented as the measure of variation of the results. To determine if there are any significant differences between MLP and FuNN, the statistical hypotheses in Table 2.1 were tested. The results of the statistical hypothesis tests comparing MLP and FuNN are presented in Table A.5. As before, an entry of "reject" indicates that the null hypothesis was rejected. To evaluate the change in accuracies after further training, the hypotheses in Table 2.2 were tested. The results of these hypothesis tests for MLP are in Table A.6. The results of the hypothesis tests evaluating the adaptation of the FuNN networks are in Table A.7. To determine whether there were significant differences in the amount of forgetting and adaptation by the networks, the statistical hypotheses in Table 2.3 were tested. The results of comparing the changes in accuracy of MLP and FuNN are in Table A.8.
              Trained on Set A                        Trained on Set B
Recall Set    A         B         C         All       A         B         C         All
MLP           2.191/15  2.273/16  2.386/16  2.199/15  2.493/17  2.071/17  2.505/18  2.451/17
FuNN          13/42     14/46     13/46     13/42     16/56     12/37     16/58     16/54

Table 2.8: Average mean squared error / approximate variance (×10⁻⁴) for the Mackey-Glass problem.
Discussion

Inspection of the results in Tables 2.7 and A.5 shows that the FuNN learned the initial training set better than the MLP. This gave FuNN a higher overall accuracy, as Set A was much larger than Sets B and C. The MLP did, however, generalise over Sets B and C significantly better than FuNN. The results in Tables 2.7, A.6 and A.7 show that both MLP and FuNN suffered a significant decrease in accuracy over Set A after further training on Set B. Both significantly improved in performance over Sets B and C, while the overall performance of MLP did not significantly change at all. The comparison of the changes in accuracy over each data set, presented in Table A.8, shows that FuNN forgot the most at the 95% level of confidence, but at the 99% level of confidence no significant difference existed. For this problem, FuNN adapted to Set B better. There was no significant difference in the change in generalisation performance over Set C. The iris classification problem is well known, but not very challenging. Thus, the very good performance of both FuNN and MLP over this data set is not surprising. However, the results indicate that statistically significant differences do exist between FuNN and MLP. These differences echo the findings over the two spirals problem, that is, that MLP adapts better to new data, but FuNN forgets the old data less.
2.7.4 Mackey-Glass

The MLP and FuNN architectures for this problem consisted of four input neurons, five hidden neurons and one output neuron. The FuNN had five membership functions attached to each input and output neuron. The standard experimental procedure was used, with one hundred runs being performed over each fold of the data set. The training parameters were as in Table 2.6. The results, as the mean-squared error over unnormalised data, are presented in Table 2.8. The approximate variance is presented as a measure of variation within the results. The results (including variance) are presented as the mantissa of base-ten numbers, where the base is raised to the negative fourth power. For example, 2.191 is actually 2.191 × 10⁻⁴. This is solely to reduce the amount of space and redundancy in the tables, and all results
for the Mackey-Glass dataset, throughout this thesis, are presented this way. Testing of the hypotheses in Table 2.1 was performed to determine if the relative differences between MLP and FuNN are statistically significant differences. The results of these statistical hypothesis tests are presented in Table A.9. The hypotheses in Table 2.2 were tested for MLP, to test for significant levels of forgetting and adaptation. The results of these tests are in Table A.10. The same tests were applied to the FuNN results. The results of these tests
are in Table A.11. The question of which network type exhibits the greatest change across each data set is answered by evaluating the hypotheses in Table 2.3. The results are presented in Table A.12.

Discussion

Inspection of the results in Tables 2.8 and A.9 reveals that in all cases, the MLP outperformed the FuNN at both the 95% and 99% levels of significance. The results in Tables A.10 and A.11 show that further training over Set B caused forgetting in both MLP and FuNN, although both exhibited an increase in accuracy over Set B. Both networks also showed a decrease in generalisation accuracy over Set C after further training. There was a high variance in all cases, which suggests that the chaotic nature of the data set had been manifested: as the divergence in periodicity of the process increases, so does the error of the network. In other words, the network learned one section of the time series, which then became deprecated by the increasing change in the chaotic function. Although both types of network were able to learn the problem to a high degree of accuracy (low error), the error from the MLP was consistently lower than that of the FuNN. This is in contrast to the results of the two spirals and iris classification problems, where FuNN was often able to outperform MLP. The No Free Lunch Theorem (Wolpert and Macready, 1995) suggests that such a situation would occur: no algorithm is superior across all data sets. It appears reasonable to assume that this is the case with the Mackey-Glass problem.

              Trained on Set A                                    Trained on Set B
Recall Set    A            B            C            All          A            B            C            All
MLP           0.182/0.038  0.193/0.040  0.198/0.041  0.184/0.038  0.226/0.040  0.150/0.032  0.214/0.042  0.219/0.039
FuNN          0.502/0.057  0.612/0.165  0.715/0.147  0.513/0.081  1.090/0.209  0.363/0.053  1.490/0.282  1.018/0.206

Table 2.9: Average mean squared error / approximate variance (to 3 d.p.) for the gas furnace problem.
2.7.5 Gas Furnace

Both the MLP and FuNN networks used for these experiments had two input neurons, five hidden (or rule) neurons and one output neuron. The FuNN networks had five membership functions attached to each input and output. The standard experimental procedure was used, and the training parameters were as in Table 2.6. The results, as MSE measured over unnormalised data, are presented in Table 2.9. The approximate variance is used as a measure of the variation within the results. To determine if the differences between MLP and FuNN are statistically significant, the hypotheses in Table 2.1 were tested. The results of these tests are presented in Table A.13. To investigate the degree of significance of forgetting and adaptation in MLP and FuNN, the hypotheses in
Table 2.2 were tested. Table A.14 presents the results of these tests across the MLP results. The same tests for the results of the FuNN networks were also carried out. The results of these tests are in Table A.15. To determine which network type forgot the most and which adapted the best, the hypotheses in Table 2.3 were tested. The results of these tests are presented in Table A.16.

Discussion

The results across this data set were very similar to those for the Mackey-Glass data set. The performance of the MLP was superior to that of the FuNN across the board; also, the variance of the results was very high. In this case, however, further training caused an increase in generalisation accuracy for both network types. Although forgetting within both network types was evident in the results, it was more pronounced for FuNN. This contradicts the assertions made in the literature, but is consistent with the experimental results for the Mackey-Glass data set.
2.7.6 Conclusions for Benchmark Experiments with MLP and FuNN

The No Free Lunch theorem (Wolpert and Macready, 1995) established that no single algorithm can be superior across all problems. The results of applying the MLP and FuNN networks to the benchmark data sets reflect this principle: although FuNN yielded superior results over the two spirals and iris classification problems, MLP performed better over the Mackey-Glass and gas furnace data sets. Some issues are apparent, however. Firstly, further training a previously trained network, whether FuNN or MLP, caused the network to forget the data set it had previously been trained on. In other words, the phenomenon of catastrophic forgetting, or catastrophic interference, was observed. Also, selection of the appropriate architecture for the networks was an issue. Although the results obtained were satisfactory for most of the data sets (with the exception of the two spirals problem, which is well known to be hard to model), it is an open question whether or not performance would have been better with different architectures. These results reinforce the issues discussed in Subsection 2.4.1 in terms of forgetting and in terms of the difficulty of selecting architectures, and reinforce the need for neural networks that are able to adapt their architecture during learning.
2.8 Summary

This chapter has presented three things. Firstly, the three basic technologies used in this thesis, fuzzy rule-based systems (Section 2.2), artificial neural networks (Section 2.3), and evolutionary algorithms (Section 2.5), were briefly described. Secondly, the four benchmark data sets that are used throughout this thesis were introduced in Section 2.6. Finally, experiments with MLP and FuNN neural networks were performed with these benchmark data sets in Section 2.7. These results provide a basis for comparison with the novel algorithms presented later in the thesis. The results also clearly showed that conventional, fixed-architecture ANN have difficulty adapting to new data: once they have been trained on one set of data, they are difficult to train on another. This difficulty is one of the motivations for constructive ANN algorithms, as described in the next chapter.
Chapter 3
Constructive Connectionist Systems

Things fall apart; the centre cannot hold;
Mere anarchy is loosed upon the world,
The blood-dimmed tide is loosed, and everywhere
The ceremony of innocence is drowned;

W.B. Yeats, The Second Coming
3.1 Introduction Several problems with traditional, fixed-architecture connectionist systems were identified and discussed in Section 2.3. These problems all spring from the fixed nature of traditional algorithms: once an architecture has been selected for the ANN, it is not possible to alter it. This means that additional knowledge cannot be accommodated, as is needed in life-long learning applications, due to the constant number of connections within which the knowledge of the ANN is stored. Often, attempts to insert new knowledge into an ANN with an insufficient number of connections will destroy the knowledge that exists there already, a phenomenon known as catastrophic forgetting (McCloskey and Cohen, 1989). Conversely, an ANN with a surplus of neurons will over-fit the training data, losing the ability to generalise beyond the training data set. If the topology of the network can be determined automatically, then these problems become less significant. This is the motivation behind so-called constructive algorithms, where the network is constructed by the training algorithm, during training. Instead of starting with an ANN of fixed topology, a constructive algorithm starts with a minimal network and progressively adds neurons and connections as needed by the training algorithm. This helps with the topology determination process, as most constructive algorithms will stop adding neurons when such additions no longer improve performance. The algorithms described in this chapter are known as constructive algorithms. These are algorithms that begin with neural networks that initially have a minimal number of neurons, then progressively add neurons in response to some stimulus, usually some function of the error over the training data set. The learning algorithms applied to constructive networks are called constructive learning algorithms. Six motivations for using constructive learning algorithms are listed in (Parekh et al., 2000):

1. Flexibility of Exploring the Space of Neural-Network Topologies
2. Potential for Matching the Intrinsic Complexity of the Learning Task
3. Estimation of Expected Case Complexity of the Learning Task
CHAPTER 3. CONSTRUCTIVE CONNECTIONIST SYSTEMS
42
4. Tradeoffs Among Performance Measures
5. Incorporation of Prior Knowledge
6. Lifelong Learning

These motivations can be expanded as follows:

1. Flexibility of Exploring the Space of Neural-Network Topologies: the algorithm itself is able to explore the alternative topologies that may be used for the problem at hand. This contrasts with traditional algorithms, where the alternative topologies must be investigated manually by the ANN practitioner, while the algorithm itself can investigate only the weight space.

2. Potential for Matching the Intrinsic Complexity of the Learning Task: as above, the algorithm will adjust the network to account for the complexity of the problem at hand. If the network is not able to learn a complex problem, then the algorithm will modify the structure of the network so that it can. If, on the other hand, the problem is of low complexity, then a constructive algorithm starting with a small network will find a parsimonious solution.

3. Estimation of Expected Case Complexity of the Learning Task: this is described as follows in (Parekh et al., 2000, pg 436): "Most practical learning problems are known to be computationally hard to solve. However, little is known about the expected case complexity of problems encountered and successfully solved by living systems primarily because it is difficult to mathematically characterize the properties of such problems. Constructive algorithms, if successful, can provide useful empirical estimates of the expected case complexity of practical learning problems."

4. Tradeoffs Among Performance Measures: depending upon the algorithm used, the speed of the algorithm may be traded off against the accuracy of the network, or against the final size of the network at the conclusion of learning.

5. Incorporation of Prior Knowledge: this can be taken to mean two things. Firstly, the retention of knowledge learned during previous training over a previous data set, by further training without catastrophic forgetting. Secondly, the incorporation of knowledge from other sources, such as rules. Parekh et al. (2000, pg 437) contend that "constructive algorithms provide a natural framework for incorporating problem-specific knowledge into initial network configurations and for modifying this knowledge using additional training examples."

6. Lifelong Learning: this is simply the continued learning of a network throughout its existence, without the loss of knowledge through catastrophic forgetting.
The six motivations described above can be condensed into three criteria for evaluating constructive connectionist systems. These criteria are needed so that the constructive algorithms described here can be meaningfully compared to the ECoS algorithm (Chapter 4), which is the core of this thesis. The criteria are:

1. The domain of application of the algorithm, that is, whether the algorithm can be applied to problems from the domain of classification, function approximation, or both.

2. Suitability to lifelong learning, that is, whether the algorithm allows for further training of the ANN after the completion of training on the first training data set. This includes the resistance of the algorithm to over-training.

3. Efficiency of the algorithm, in terms of the number of calculations required both during training of the ANN and during use of the trained ANN.

Constructive algorithms began as an extension of the backpropagation of errors training algorithm operating on the well-known multi-layer perceptron architecture (Section 2.3). One of the earliest constructive algorithms proposed is the Dynamic Node Creation algorithm (Ash, 1989). This is described and evaluated in Section 3.4, and is of the extended-backpropagation form of constructive algorithm. As constructive algorithms matured, they started to move away from the backpropagation algorithm in favour of other training methods, and started to modify the MLP architecture to a greater degree than just expanding its hidden layer. Two examples of this kind of algorithm are the Tiling algorithm (Mézard and Nadal, 1989), described in Section 3.5, and the Upstart algorithm (Frean, 1990), described in Section 3.6. The Cascade Correlation algorithm (Fahlman and Lebiere, 1990), described in Section 3.7, is another intermediary form of constructive algorithm: it abandons the MLP structure completely, but still uses a form of backpropagation to set the connection weights.
The final group of constructive algorithms described here are different from their predecessors: these are the Resource Allocating Network (Platt, 1991b), described in Section 3.8; the Evolutionary Nearest-Neighbour MLP (Zhao and Higuchi, 1996) in Section 3.9; the Growing Cell Structures (Fritzke, 1991) in Section 3.10; the Zero Instruction Set Computer (ZISC) network in Section 3.11; and the Grow and Learn (GAL) network (Alpaydin, 1994) in Section 3.12. They generally have little in common with the architectures of previous ANN algorithms, and use a method of training specific to their architecture. These models are included because of their similarity to the algorithm presented in Chapter 4 that is the focus of this thesis. This similarity is important, as it is from these similarities that methods of optimising and improving the ECoS algorithm can be derived. This chapter establishes the background material for the investigation of Hypothesis Two. That is, it describes several constructive connectionist systems that will be compared to the ECoS algorithm in Chapter 4. The algorithms presented in this chapter are included either because they are of historical interest, or because of their similarity to the model presented in Chapter 4. Models included for historical interest are some of the earliest constructive algorithms proposed: the Dynamic Node Creation algorithm described in Section 3.4, the Tiling algorithm in Section 3.5, the Upstart algorithm in Section 3.6 and the Cascade Correlation algorithm in Section 3.7. Models included for their similarity to the model in Chapter 4 are the Resource Allocating Network in Section 3.8, the Evolutionary Nearest-Neighbour MLP in Section 3.9, the Growing Cell Structures algorithm in Section 3.10, the Zero Instruction Set Computer networks in Section 3.11 and the Grow and Learn networks in Section 3.12. Before considering constructive methods, it is necessary to consider an alternative method of determining the optimal size of an ANN: pruning of connections and neurons.
3.2 Pruning An alternative method of finding optimally sized networks is to use a so-called “destructive” method. These methods use a technique known as pruning to find an optimally sized ANN. Pruning is the removal of connections or neurons from a trained ANN. The rationale behind pruning is to determine, through some mechanism, which connections or neurons are superfluous to the network and, having so identified them, remove them. The ways in which these superfluous elements are identified are varied, but they are generally based upon an analysis of the connection weights, and may include evaluation of the network over the training data set. These are destructive methods: they start with a large, fully connected network and destroy unneeded connections and neurons. Thus, an optimal network may be found by creating a large network and removing the unnecessary parts during, or after, training. Pruning algorithms are divided into two broad groups, sensitivity analysis and penalty functions.
3.2.1 Sensitivity Analysis Sensitivity analysis estimates the sensitivity of the network error with respect to each particular neuron or weight: those weights that do not contribute much to the error can be safely removed. An example of this type of pruning algorithm is (Mozer and Smolensky, 1989). In this paper, a modified transfer function was used in each neuron to determine the relevance of that neuron to the performance of the network. Those neurons with low relevance were pruned. A similar approach was used in (Chauvin, 1990), although in this case the importance of each neuron was measured by monitoring the variation of the neuron's activity during training: neurons with low variation were considered to be unimportant and were removed. In (Karnin, 1990) the sensitivity was measured as the difference in network error with the weight removed. Rather than individually removing each weight and testing the error of the network, the sensitivity of the network with respect to each weight was approximated by monitoring the changes to the weight during training: weights that did not change much during training had low sensitivity and could be pruned. In (Le Cun et al., 1990) the well-known 'Optimal Brain Damage' procedure was introduced. In this algorithm the 'saliency' of each weight was estimated by calculating the second derivative of the network error with respect to the weight. This was an iterative procedure, with a typical optimisation cycle consisting of training, computing saliencies, pruning, and retraining. The optimal brain damage procedure evolved into the 'Optimal Brain Surgeon' algorithm in (Hassibi and Stork, 1993), which made changes to the calculation of the saliencies that led to a more accurate estimation.
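The change-monitoring approximation attributed above to (Karnin, 1990) can be sketched as follows. This is only an illustrative sketch, not Karnin's exact formula: the function names, the use of total absolute weight change as the sensitivity estimate, and the example values are all assumptions made here.

```python
# Sketch of sensitivity-style pruning by monitoring weight changes during
# training. `weight_history` is a list of weight vectors recorded at
# successive training steps (names and data invented for illustration).

def sensitivity_scores(weight_history):
    """Approximate each weight's sensitivity as the total absolute change
    it underwent during training: weights that barely moved are assumed
    to contribute little and are candidates for pruning."""
    scores = [0.0] * len(weight_history[0])
    for before, after in zip(weight_history, weight_history[1:]):
        for i, (b, a) in enumerate(zip(before, after)):
            scores[i] += abs(a - b)
    return scores

def prune_mask(scores, threshold):
    """True = keep the weight, False = prune it."""
    return [s >= threshold for s in scores]

history = [[0.1, 0.5, -0.2],
           [0.1, 0.9, -0.2],
           [0.1, 1.2, -0.3]]
scores = sensitivity_scores(history)   # roughly [0.0, 0.7, 0.1]
mask = prune_mask(scores, 0.05)        # the static first weight is pruned
```

The key point the sketch illustrates is that no extra forward passes are needed: the sensitivity estimate falls out of bookkeeping already done during training.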
3.2.2 Penalty Functions Penalty functions apply a decrement to weights during training. An example of this type of algorithm is learning with forgetting (Ishikawa, 1996). In this modification to the standard backpropagation algorithm, a penalty or decay factor is included in the weight delta added to each connection. If a connection is not reinforced by backpropagation, then it will be weakened by the decay term. Since any connection that is not reinforced is superfluous to the network, redundant connections will eventually decay to zero, where they cease to have any effect. While the work in (Ishikawa, 1996) used a fixed decay term, the algorithm in (Weigend et al., 1991) scaled the decay term in proportion to the magnitude of the weight, that is, smaller weights decayed faster than larger weights. There are, of course, several overlaps between the two groups of pruning algorithms.
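The decay mechanism described above can be sketched in a few lines. This is an illustrative sketch in the spirit of learning with forgetting, not the exact update rule from (Ishikawa, 1996): the function name, learning rate and decay constant are assumptions.

```python
# Illustrative weight update with a decay penalty: connections that are
# not reinforced by the gradient term shrink toward zero under the decay
# term (parameter values invented for illustration).

def update_weight(w, gradient, lr=0.1, decay=0.01):
    """One weight update: the usual gradient step plus a constant decay
    that pushes the weight toward zero. A weight whose gradient is zero
    (one the error does not depend on) simply decays away."""
    step = -lr * gradient                  # standard backpropagation delta
    penalty = -decay if w > 0 else decay   # fixed decay toward zero
    return w + step + penalty

# A weight that the error never reinforces decays toward zero:
w = 0.5
for _ in range(10):
    w = update_weight(w, gradient=0.0)
# w has decayed from 0.5 to roughly 0.4; left long enough, it reaches zero
```

A proportional scheme in the style of (Weigend et al., 1991) would replace the constant `decay` with a term that shrinks as the weight grows, so that large, presumably useful weights are penalised less.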
3.3 Constructing versus Pruning Pruning methods, or destructive algorithms, as described in the previous section, have the advantage of being fast, simple and flexible: since most pruning algorithms are designed with the multi-layer perceptron-style ANN in mind, they can be easily applied to models that are derived from the MLP, such as SRN (Elman, 1990) or FuNN. Also, they provide a way of finding an optimally sized and connected network trained using either standard backpropagation or some small variation thereof. There is no need to use a specialised learning algorithm, as is required for the constructive algorithms described in this chapter. The essential difference is that a destructive algorithm starts with a large network and reduces its size, while a constructive algorithm starts with a small network and expands it. Thus, the destructively-optimised network must start as large as, or larger than, necessary. If the optimal size of the network is not known, however, it is difficult to choose a size that is "large enough" for the destructive algorithm to reduce. This runs the risk of an initial size being chosen that is far too large, with all of the training inefficiencies that entails. A constructive algorithm, on the other hand, will start with a minimal network and build it only as large as it needs to be. Straddling the line between constructive and destructive techniques are so-called open architecture methods. These use both constructive and destructive techniques: as neurons that were added by the constructive phase of the algorithm may become redundant later in the life cycle of the network, the destructive elements of the algorithm return the network to optimality by removing those neurons. Several constructive algorithms are now reviewed.
3.4 The Dynamic Node Creation Algorithm The Dynamic Node Creation (DNC) algorithm (Ash, 1989) is based on three-neuron-layer multi-layer perceptrons with sigmoid activation functions. Changes in connection weight values are effected by normal backpropagation training. The network starts with a very small number of neurons in the hidden layer, and adds a new neuron whenever the mean training error begins to flatten. The addition of a new neuron is controlled by a threshold value T: whenever the average change in error is less than T, a new hidden neuron is added. The new neuron is fully connected to the network's inputs and outputs, although (Ash, 1989) does not describe how the connection weights are initialised. After the new neuron is inserted, training continues as normal, that is, the connections to both the newly added and existing neurons are modified. Experimental results indicate that the DNC algorithm is able to find minimalist topologies for some problems, and that the network will train faster than one with a fixed architecture. Evaluating this algorithm using the three criteria: the DNC algorithm can be applied to both classification and function approximation problems. The algorithm is designed for a single data set, however, and contains no protection against catastrophic forgetting. It is also vulnerable to over-training, as it is quite possible for the algorithm to keep adding hidden neurons as it tries to push the training error to ever smaller values. It is entirely up to the user to determine when to stop adding neurons and cease training. Finally, the algorithm is iterative, with no guarantee of convergence. There is therefore the potential for it to be quite slow in learning.
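The growth trigger described above can be sketched as follows. This is a minimal illustration under stated assumptions: the window size, the use of the mean absolute change in error, and the example error values are invented here, not taken from (Ash, 1989).

```python
# Sketch of the DNC growth trigger: a hidden neuron is added whenever
# the average change in training error falls below a threshold T
# (window size and error values invented for illustration).

def should_add_neuron(error_history, T, window=5):
    """Return True when the mean absolute change in error over the last
    `window` epochs has flattened out below threshold T."""
    if len(error_history) < window + 1:
        return False
    recent = error_history[-(window + 1):]
    deltas = [abs(b - a) for a, b in zip(recent, recent[1:])]
    return sum(deltas) / window < T

errors = [0.9, 0.5, 0.3, 0.2, 0.199, 0.198, 0.197, 0.196, 0.195]
should_add_neuron(errors, T=0.01)   # True: the error curve has flattened
```

Note that nothing in this rule bounds the number of neurons added, which is exactly the over-training vulnerability discussed above: the trigger fires whenever learning stalls, regardless of whether a larger network would actually generalise better.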
3.5 The Tiling Algorithm Another of the early constructive algorithms was the Tiling algorithm (Mézard and Nadal, 1989). This algorithm builds multiple hidden layers, where additional layers are added in response to errors over the training set. Although it was initially designed for mapping binary inputs to binary outputs, there seems to be no reason why it could not also be applied to mapping continuous values to binary outputs. The algorithm commences with an input layer of the appropriate size. A new layer with a single neuron (known as a "master" unit) is added and connected to the preceding layer. The weights of the master unit are calculated so that the master unit will make at least one fewer error than the master unit in the previous layer. If the master unit is able to correctly classify all training examples, then the algorithm terminates and the master unit serves as the output node for the network. For multiple outputs, a master unit will be added for each output. If the master unit is unable to correctly classify all examples, then ancillary units will be added so that no two training examples with different targets produce the same pattern of activation in the current layer. This breaks the training set into progressively smaller subsets, which makes it easier for each layer to model. It also means that smaller layers are added as the algorithm progresses. Since each layer makes at least one fewer error than the layer before it, convergence is guaranteed for a finite data set. Using the three evaluation criteria gives the following evaluation: the tiling algorithm is suitable for binary outputs only, thus limiting its application domain to classification. There seems to be no way in which this algorithm could be applied to more than one training data set, nor is there any protection against over-training. It is possible that a new neuron will be added for every training example.
Finally, the algorithm is iterative, requiring multiple presentations of the entire training set. Also, the master units of each hidden layer must be tracked and analysed during training, which adds more complexity to the training process.
3.6 The Upstart Algorithm The upstart algorithm was proposed in (Frean, 1990). It is a method of constructing a single hidden layer neural network to solve binary mappings where convergence to zero errors is guaranteed for a finite training set. The
networks are composed of linear threshold units, and additional units are added whenever the network misclassifies an example. The upstart algorithm starts with no hidden layer, with the inputs connected directly to the output node Z. These connection weights are trained using a method such as perceptron learning (Section 2.3) until an error minimum is reached. The existing weights are then frozen. If Z incorrectly activates, then a new unit X is added to the network. This is connected to the inputs as well as to Z. The weight from X to Z must be set so that the signal from X is sufficient to inhibit Z from firing. Similarly, if Z is incorrectly quiescent, then a neuron Y is added with its weight set to excite Z sufficiently to cause it to fire. Both X and Y are referred to as 'daughter' nodes of Z. If errors are still being made, then daughters will be added to X and Y. The algorithm therefore builds a binary tree of neurons. Since each additional unit eliminates at least one classification error, the algorithm is guaranteed to converge to zero errors over a finite data set. Figure 3.1 shows the construction of the binary tree of neurons.
Figure 3.1: The Upstart Algorithm

At the end of training, the tree structure can be converted into a single hidden layer network. The existing neurons in the tree, including the original Z, are placed together into a single layer and the connections between them eliminated. A new output neuron is added and connected to the constructed hidden layer. The new weights are either found using perceptron learning or derived from the tree structure itself. Results in (Frean, 1990) show that the upstart algorithm is significantly faster than the tiling algorithm (Section 3.5). Extensions are also proposed that allow it to create a single hidden layer network directly. The results of the evaluation using the three criteria are as follows: the algorithm deals with binary data only, and is thus limited to classification tasks. Convergence is guaranteed for only a single data set, and it seems unlikely that the upstart algorithm would be able to further train on new data. This is because, as neurons are added, their weights are frozen after training and before insertion into the network. Thus, they would be unable to adapt to new examples. The restriction to a single data set is also due to the requirement that the tree be converted into a single hidden layer network before it can be used. This conversion adds to the complexity of the algorithm, making upstart a more computationally intensive training method, even though it is faster than the tiling algorithm.
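The daughter-creation decision at the heart of the upstart algorithm can be sketched as a simple rule. This is only an illustration of the decision logic described above; the function name and binary encoding are assumptions, not from (Frean, 1990).

```python
# Sketch of the upstart decision rule: a "wrongly on" output calls for an
# inhibitory daughter X, a "wrongly off" output for an excitatory
# daughter Y (names invented for illustration).

def daughter_needed(actual, target):
    """Given the binary actual/target outputs of unit Z for one example,
    say which daughter (if any) would correct the error."""
    if actual == 1 and target == 0:
        return "X"   # Z fired when it should not: add an inhibitory daughter
    if actual == 0 and target == 1:
        return "Y"   # Z stayed quiet when it should fire: excitatory daughter
    return None      # Z was correct; no daughter needed
```

Applied recursively to each daughter's own errors, this rule is what grows the binary tree of neurons shown in Figure 3.1.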
3.7 Cascade Correlation Networks The cascade correlation algorithm (Fahlman and Lebiere, 1990) was created in an attempt to solve the dual problems of topology selection and training speed.
A cascade correlation network consists initially of two layers of neurons, which are fully connected. These connections are initially trained via any one of a number of training algorithms, and this training continues until the training error approaches an asymptote. Then, intermediate nodes are added, one at a time, in an effort to further reduce error. These intermediate nodes are selected from a pool of candidates, each one of which has been previously trained, and are inserted in a layer by themselves, being fully connected to every node in the input layer, and every intermediate node preceding them. The incoming connections of each intermediate node are frozen, but the outgoing connections may be further trained. Figure 3.2 shows a cascade correlation network with three inputs (plus a bias input), two outputs and two hidden units.
Figure 3.2: A Cascade Correlation Network (adapted from (Fahlman and Lebiere, 1990)).

The cascade correlation algorithm can be evaluated as follows: there is no restriction on the problem domain to which cascade correlation can be applied. Since the connection weight training is carried out by backpropagation or one of its variants, the outputs can be either binary or continuous. Cascade correlation networks are not designed to learn more than a single training set: the training algorithm allows them to learn their training data very quickly, but is predicated on the idea that training will stop at some point and not be resumed. Two further problems with cascade correlation networks were identified in (Prechelt, 1997). Firstly, the networks tend to overcompensate for errors. Secondly, the cascading of hidden units results in the representation of very strong nonlinearities, which can adversely affect generalisation when there is a small number of training examples. Finally, the algorithm is rather complex: not only do multiple layers need to be created and maintained, but pools of candidate neurons must also be created, trained and evaluated before insertion into the network.
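The candidate-selection step mentioned above can be sketched as follows. This is a simplified stand-in for the actual mechanism in (Fahlman and Lebiere, 1990), which trains candidates to maximise a correlation measure; here, pre-computed candidate activations are merely scored by the magnitude of their covariance with the residual errors, and all names and data are invented for illustration.

```python
# Sketch of choosing a candidate hidden unit: score each candidate by how
# strongly its activations co-vary with the network's residual errors
# over the training set, and install the highest-scoring one.

def covariance_score(activations, residuals):
    n = len(activations)
    a_mean = sum(activations) / n
    r_mean = sum(residuals) / n
    return abs(sum((a - a_mean) * (r - r_mean)
                   for a, r in zip(activations, residuals)) / n)

def best_candidate(candidate_activations, residuals):
    """Index of the candidate whose output tracks the residual error most
    strongly - the one most useful for reducing that error."""
    scores = [covariance_score(acts, residuals)
              for acts in candidate_activations]
    return scores.index(max(scores))

residuals = [0.9, -0.8, 0.7, -0.6]
candidates = [[0.1, 0.1, 0.1, 0.1],      # flat output: useless
              [1.0, -1.0, 1.0, -1.0],    # tracks the residual's sign
              [0.2, 0.1, -0.1, 0.3]]
best_candidate(candidates, residuals)    # 1
```

Because the winning candidate's incoming weights are then frozen, it becomes a permanent feature detector for exactly the error pattern it was selected on, which is one source of the strong nonlinearities noted by (Prechelt, 1997).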
3.8 Resource Allocating Networks The Resource Allocating Network (RAN) was proposed in (Platt, 1991b) as a way of quickly loading knowledge into a network without over-training. It is a two neuron layer network, where the first layer represents input space regions and the second layer represents the output function being modeled. The first layer neurons use Gaussian functions. The parameters of each Gaussian function represent a specific region of the input space: each neuron in that layer calculates its activation based on how close its region is to the presented input vector, and will thus respond most strongly to inputs from that region. The output layer performs a weighted sum over the outputs of the input layer neurons, and adds a constant vector that is independent of the input vector. Figure 3.3 shows a RAN network.
Figure 3.3: A Resource-Allocating Network (adapted from (Platt, 1991b)).

A RAN learns via two different mechanisms: the first adds Gaussian units according to the novelty of the input examples presented; the second modifies the connection weights according to the error of the network. The network begins life with no stored patterns. As training examples are presented, both the distance between the existing Gaussian units and the input vector, and the difference between the actual and desired output vectors, are measured. If both are above certain thresholds, a new node is added. The Gaussian parameters of the new node are set so that its maximum response is for the current, novel example, with its width being proportional to its distance from the nearest existing node. The connections to the output layer are set to the difference between the network output and the desired output. If a new unit is not added, then the existing connections to the output layer are modified via a simple gradient descent algorithm so that the error of the network is reduced. As training progresses, the threshold values decay towards a set value: this means that the knowledge stored in the network will become more and more finely represented. Evaluating RAN using the criteria in Section 3.1, it is apparent that RAN is a multi-purpose algorithm, that is, it can be applied to both classification and function approximation tasks. The decay of the threshold values
as training proceeds implies that it is not designed to be trained on more than one data set. However, there seems to be no reason why these threshold values could not be reset before further training on a new data set. Nor does there seem to be any reason why the network would forget what it has already learned when it is further trained. Finally, RAN is complex. Gaussian activation functions require more computation than simpler sigmoid or threshold functions, while the post-processing of neuron outputs also adds to the complexity. Also, the training algorithm is iterative. There are, however, no pools of candidates to track, as with cascade correlation, nor does a RAN network require post-training processing and transformations as, for example, an upstart network does. Overall, RAN has some issues with complexity, but these do not seem to be major.
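The two-part novelty criterion that governs RAN growth can be sketched as follows. This is a minimal illustration of the rule described above, not Platt's implementation: the function name, the Euclidean distance measure and the threshold values are assumptions.

```python
# Sketch of the RAN novelty test: a new Gaussian unit is allocated only
# when the input is far from every existing unit AND the output error is
# large; otherwise the existing weights are adapted by gradient descent.

def is_novel(x, centres, error, dist_threshold, err_threshold):
    """True when the example warrants a new hidden unit."""
    if not centres:
        return True   # an empty network stores its first example directly
    nearest = min(sum((xi - ci) ** 2 for xi, ci in zip(x, c)) ** 0.5
                  for c in centres)
    return nearest > dist_threshold and abs(error) > err_threshold

centres = [[0.0, 0.0], [1.0, 1.0]]
is_novel([5.0, 5.0], centres, error=0.8,
         dist_threshold=1.0, err_threshold=0.1)   # True: far away, big error
is_novel([0.1, 0.0], centres, error=0.8,
         dist_threshold=1.0, err_threshold=0.1)   # False: close to a centre
```

The threshold decay described above corresponds to shrinking `dist_threshold` over time, so that later in training only increasingly fine-grained novelty triggers the allocation of a new unit.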
3.9 The Evolutionary Nearest-Neighbour MLP The Nearest-Neighbour Multi-Layer Perceptron (NN-MLP) is a connectionist implementation of the nearest-neighbour classifier algorithm (Gates, 1972). In this architecture, each hidden layer neuron corresponds to a training example, with the incoming connection weight vector of the neuron being equal to the example vector it represents. Activation of the hidden neurons is calculated in the following manner: the distance between the incoming weight vector and the current example vector is calculated, and this measure is subtracted from a counter value embedded in each hidden neuron. When the value of this counter reaches zero, the neuron fires, outputting a value of one. When a neuron fires, it inhibits all other neurons that belong to the same class from firing. An output neuron will fire when any of its inputs is one. The simplest way of constructing a NN-MLP is to create one hidden neuron for each example in the training set. This, however, will lead to a very large hidden layer, with all of the problems of efficiency and generalisation accuracy that that entails. A means of constructing parsimonious NN-MLP was proposed in (Zhao and Higuchi, 1996). This algorithm is designed to find the minimum number of hidden neurons needed to accurately model the training data set. The algorithm is called the R4-Rule, for the four separate procedures employed in it: Recognition, Remembrance, Reduction and Review.
Recognition is the process by which training examples are matched to hidden neurons. Of interest in this phase are two parameters associated with each neuron: its class label and its importance. A hidden neuron n will fire if all three of the following conditions are true:

1. It belongs to the same class as the training example x.
2. The incoming connection weight vector of n is closer to x than any other.
3. Its importance value is the largest.

The class label and importance of each neuron are set automatically during training: whenever n is a winner, its importance is increased. If it is a loser (it matches the first two conditions given above but has a lower importance value) then its importance is decreased. Remembrance involves evaluating the performance of the network across the data set. If a large number of recognition errors occur, then new hidden neurons will be added to the network. As opposed to adding a single hidden neuron for each misclassified example, only one hidden neuron is added for each misclassified class. This
reduces the number of neurons added to the network. If the recognition rate is high, then hidden neurons can be removed. Reduction prunes neurons from the network when their importance values become very low. Only a single neuron is removed each time the reduction procedure is carried out; this stabilises the learning process. If the removal of a neuron causes classification errors, then the reduction is undone. Review adjusts the connection weights of the network, and is carried out whenever a neuron is added or deleted. It is a supervised learning process that adjusts the connection weights of the network to be closer to x. These four steps and the relationships between them are shown in Figure 3.4.
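The recognition step can be sketched as follows. This is a simplified illustration of the winner/loser bookkeeping described above, not the exact procedure of (Zhao and Higuchi, 1996): the data structures, the tie-breaking order, and the penalising of all same-class losers are assumptions made here.

```python
# Sketch of an R4-style recognition step: among hidden neurons with the
# example's class label, the nearest (with importance breaking ties)
# fires; the winner's importance rises and the losers' falls.

def recognise(example, label, neurons):
    """Each neuron is a dict with 'weights', 'label' and 'importance'.
    Returns the index of the firing neuron, or None."""
    same_class = [(i, n) for i, n in enumerate(neurons) if n["label"] == label]
    if not same_class:
        return None
    def dist(n):
        return sum((w - x) ** 2 for w, x in zip(n["weights"], example))
    # nearest first; larger importance breaks ties
    i, winner = min(same_class, key=lambda p: (dist(p[1]), -p[1]["importance"]))
    winner["importance"] += 1          # reward the winner
    for j, n in same_class:
        if j != i:
            n["importance"] -= 1       # penalise the losers
    return i

neurons = [{"weights": [0.0, 0.0], "label": "a", "importance": 0},
           {"weights": [1.0, 1.0], "label": "a", "importance": 0},
           {"weights": [0.0, 1.0], "label": "b", "importance": 0}]
recognise([0.9, 0.9], "a", neurons)   # 1: the nearest same-class neuron fires
```

The importance values accumulated here are exactly what the reduction step later reads: neurons whose importance has drifted very low are the pruning candidates.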
Figure 3.4: The steps of the R4 training rule and the relationships between them.

An evaluation of the NN-MLP algorithm using the three criteria discussed in Section 3.1 yields the following: NN-MLP are designed for classification only. Each neuron added to the hidden layer maintains a class label, and it does not seem to be possible to adapt the algorithm to function approximation. There seems to be no reason why other training sets could not also be accommodated by the algorithm, without catastrophic forgetting of the previously learned data. Also, over-training does not seem to be a problem with this algorithm. NN-MLP trained with the R4 training algorithm are not very complex: although measures such as the class label and importance values must be tracked, the network itself is quite simple. The R4 algorithm is, however, iterative.
3.10 Growing Cell Structures A cellular neural network (Fritzke, 1991) is a two-neuron-layer network with a single layer of connections. In many ways it is similar to the Kohonen SOM (Section 2.3), in that each neuron in the first layer represents an input variable, while each neuron in the second layer represents a point in the input space. The connection weights coming into the second layer neurons represent the coordinates of the points represented by each neuron in that layer. Each neuron is connected to its immediate neighbours by an "edge". With n neurons and k inputs, these edges partition the input space into n k-dimensional cells, which are treated as k-dimensional Voronoi regions (Okabe et al., 1992). In (Fritzke, 1991) neurons are consistently referred to as cells; in this section, the term
neurons will be used. Learning in growing cell structures (GCS) is primarily unsupervised (Fritzke, 1993a). During training, the distance between each neuron's weight vector and the current input vector is calculated. The winner, or "best matching cell", is the cell that is closest to the input vector. The weights of the winner and its immediate neighbours (those neurons that are directly connected to the winner by an edge) are then adjusted to bring their weight vectors closer to the current input vector. A "signal counter" variable is attached to each neuron, and is incremented whenever that neuron is the winner. This process is shown in Figure 3.5, where the values of the signal counters are represented by the height of each column. So that recent wins are weighted more heavily than older wins, a decay factor is applied to all signal counters after each example has been presented.
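One adaptation step of this unsupervised learning can be sketched as follows. This is an illustrative sketch only: the learning rates, the decay constant, and the list-based graph representation are assumptions, not values from (Fritzke, 1993a).

```python
# Sketch of one GCS adaptation step: the best-matching neuron and its
# edge neighbours move toward the input, the winner's signal counter is
# incremented, and all counters decay (parameters invented).

def gcs_adapt(x, weights, neighbours, counters,
              eps_b=0.1, eps_n=0.01, decay=0.99):
    """weights: list of weight vectors; neighbours[i]: indices joined to
    neuron i by an edge; counters: signal counters. Returns the winner."""
    def dist(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    winner = min(range(len(weights)), key=lambda i: dist(weights[i]))
    # move the winner (strongly) and its neighbours (weakly) toward x
    for i, eps in [(winner, eps_b)] + [(j, eps_n) for j in neighbours[winner]]:
        weights[i] = [wi + eps * (xi - wi) for wi, xi in zip(weights[i], x)]
    counters[winner] += 1.0
    for i in range(len(counters)):   # recent wins weigh more than old ones
        counters[i] *= decay
    return winner

weights = [[0.0, 0.0], [1.0, 1.0]]
neighbours = [[1], [0]]
counters = [0.0, 0.0]
winner = gcs_adapt([0.9, 0.9], weights, neighbours, counters)
# winner is 1; its weights move toward the input
```

The decayed counters computed here are the quantity the insertion and deletion rules below consume, so the whole GCS life cycle hangs off this one bookkeeping step.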
Figure 3.5: GCS adaptation (adapted from (Fritzke, 1993a)).

After the presentation of a set number of training examples, a new neuron is inserted. Figure 3.6 illustrates the process of inserting a new neuron. The neuron is inserted at the midpoint of the edge connecting the neuron with the greatest number of wins to its most distant neighbour; the connection weights of the new neuron are therefore set to the mean of the weights of these two parent neurons. The signal counters of the two parents and the child are set to the values they would have had if the child had existed from the beginning of the training process. The edges connecting neurons in that region are recalculated to take the existence of the new neuron into account.
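To make the adaptation and insertion steps concrete, the following Python sketch implements them under stated assumptions: the learning rates (`eps_b`, `eps_n`), the counter decay, and the crude redistribution of signal counters at insertion are illustrative choices, not Fritzke's exact values.

```python
def distance(a, b):
    """Euclidean distance between two weight/input vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

def gcs_adapt(weights, counters, neighbours, x, eps_b=0.1, eps_n=0.01, decay=0.995):
    """One adaptation step: move the best matching cell (and its edge
    neighbours) towards x, decay all signal counters, credit the winner."""
    winner = min(range(len(weights)), key=lambda i: distance(weights[i], x))
    weights[winner] = [w + eps_b * (xi - w) for w, xi in zip(weights[winner], x)]
    for n in neighbours[winner]:
        weights[n] = [w + eps_n * (xi - w) for w, xi in zip(weights[n], x)]
    for i in range(len(counters)):   # decay, so that recent wins weigh more
        counters[i] *= decay
    counters[winner] += 1.0
    return winner

def gcs_insert(weights, counters, neighbours):
    """Insert a new neuron at the midpoint of the edge between the most
    frequent winner and its most distant neighbour."""
    q = max(range(len(counters)), key=lambda i: counters[i])
    f = max(neighbours[q], key=lambda n: distance(weights[q], weights[n]))
    new = len(weights)
    weights.append([(a + b) / 2.0 for a, b in zip(weights[q], weights[f])])
    counters.append(counters[q] / 2.0)   # crude counter redistribution
    counters[q] /= 2.0
    neighbours[q] = [n for n in neighbours[q] if n != f] + [new]
    neighbours[f] = [n for n in neighbours[f] if n != q] + [new]
    neighbours[new] = [q, f]
    return new
```

The edge recalculation here simply splits the parents' shared edge; the full algorithm also recalculates the surrounding Voronoi structure.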
Figure 3.6: GCS insertion (adapted from (Fritzke, 1993a)).
In addition to periodically inserting new neurons, the algorithm can also periodically delete superfluous neurons. The importance of a neuron is determined by its signal frequency divided by the size of the Voronoi region associated with it (referred to as the “receptive field” in (Fritzke, 1993a)). This measure is referred to as the “probability density”. Neurons with a probability density less than a certain threshold are removed, along with all edges attached to them. Figure 3.7 shows the deletion of neurons linking two separate groups.
Figure 3.7: Deletion of neurons in a GCS network trained on two separate regions (adapted from (Fritzke, 1993a)). The rationale of these adaptations, insertions and deletions is that those neurons that win the most are in areas of space with a high density of examples. To accurately map the training set, it is desirable that each neuron has an equal chance of being the winner. The overall effect of the spatial adjustments, neuron insertions and neuron deletions, therefore, is to distribute neurons so that each neuron at the termination of training has an equal chance of winning. This means that the regions defined by clusters of neurons will closely approximate the regions defined by clusters of examples in the training set (Fritzke, 1995). A GCS network trained on the two spirals benchmark data set (Subsection 2.6.2) is displayed in Figure 3.8. This form of learning has been favourably compared to the performance of the Kohonen SOM (Fritzke, 1993b). Since the topology of the network does not have to be determined beforehand, a cellular neural network is able to learn more complex spatial distributions than a SOM (Fritzke, 1993b). Also, their expanding nature allows them to adapt to additional training data without catastrophically forgetting the previously learned data, as was described and demonstrated in (Hamker, 2001). One drawback of this algorithm is the high computational cost of calculating the connecting edges and approximating the Voronoi regions of each neuron (Hamker, 2001). Another approach, used in (Fritzke, 1994), trains the network in a supervised manner. In this model, radial basis function neurons are used in the second neuron layer, and the signal counter variable attached to each neuron is replaced with an “error accumulator” variable. This variable accumulates the difference between the output of the winning neuron and the desired output for each training example.
After presentation of a specific number of training examples, additional neurons are added as before.
Figure 3.8: GCS Network trained on the two spirals problem (adapted from (Fritzke, 1993a)).

A further extension to this algorithm is the application of a construction and learning method called Dynamic Cell Structures (DCS) (Bruske and Sommer, 1995b). This is in many ways a simpler variant of GCS learning that incorporates elements of both Hebbian and Kohonen SOM learning. Construction of a DCS-GCS network starts with two neurons connected by bidirectional symmetrical weights. The learning algorithm's outer loop continues until a stopping condition, such as an error measure, has been reached. The inner loop of the algorithm is repeated for each example in the training set. Each training example is presented to the network and the two most highly activated nodes (the winner and runner-up) are found. The connections between the winner and runner-up neurons are strengthened using Hebbian learning, followed by modification of the weights of the winner's neighbours using Kohonen SOM learning. At the end of the inner loop of the algorithm, a local error measure attached to each node (called a resource in (Bruske and Sommer, 1995b)) is updated. For the unsupervised version of the DCS-GCS algorithm, the sum squared distance between the training example and the node's weights is used. For the supervised version of DCS-GCS, an error measure over the desired and actual output values is used. The inner loop of the DCS algorithm is analogous to a single epoch of the backpropagation training algorithm. The outer loop then adds a new node to the network, positioning it between the nodes with the two highest resource values. This is in contrast to the original GCS algorithm, which inserts new nodes between the neuron with the greatest number of ‘wins’ and its most distant neighbour. The exact position of the new node is based on the ratio of the resource values of the two ‘parents’.
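A minimal sketch of the DCS resource bookkeeping and node insertion, assuming a list-of-lists weight representation; the positioning rule (linear interpolation by resource ratio) and the child's resource approximation are illustrative readings of the description above.

```python
def dcs_update_resource(resources, winner, err):
    """Inner loop: accumulate the local error measure ('resource') of the
    winning node (sum squared distance, or output error, as appropriate)."""
    resources[winner] += err

def dcs_insert(weights, resources):
    """Outer loop: insert a new node between the two highest-resource nodes,
    positioned according to the ratio of their resource values."""
    order = sorted(range(len(resources)), key=lambda i: resources[i], reverse=True)
    a, b = order[0], order[1]
    ratio = resources[a] / (resources[a] + resources[b])
    weights.append([wa + ratio * (wb - wa) for wa, wb in zip(weights[a], weights[b])])
    resources.append((resources[a] + resources[b]) / 2.0)  # rough approximation
    return len(weights) - 1
```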
The resource value of the new node is set to an approximation of what it would have been had it always existed in the network. The final step of the algorithm is the application of a slight decay value to each resource counter in the network. The overall effect of this algorithm is that the resource values of all neurons will gradually become the same, thus achieving a match to the distribution of the training data. Evaluating GCS and DCS-GCS with the criteria in Section 3.1 gives the following results. It is not entirely clear what application areas the supervised version of DCS can be applied to. While it is apparent that unsupervised GCS can only be used for classification, there seems to be no reason why supervised DCS could not be used for
function approximation also. It is possible to continue training GCS/DCS networks on new data, and the deletion of superfluous neurons would protect them from over-training. However, both the GCS and DCS algorithms are quite computationally costly: some phases of the algorithm require many complex operations, such as the calculation of the resource counters of each neuron. That said, they were shown in (Heinke and Hamker, 1998) to be more efficient than some of the alternatives, such as MLP and FuzzyARTMAP (Carpenter et al., 1992).
3.11 Zero Instruction Set Computer Networks The Zero Instruction Set Computer (ZISC) (ZISC Manual, 2002) is an in-silicon, or hardware, implementation of a constructive network for classification. It is a spatially based neural network algorithm, where each neuron represents the centre of a bounded polygon in the input space, and each polygon represents a single class to identify. These polygons are called “Active Influence Fields” (AIF), and the topology of these fields is determined by the activation function of the neurons. The activation of each neuron is distance based, where two different distance measures are available, radial-basis function (RBF) or K-Nearest Neighbour (KNN). RBF distance is measured according to Equation 3.1.
D = \sum_{i} |V_i - P_i|    (3.1)

where:

D is the distance between the current example V and the connection weight vector P.
This defines what is described as a “polyhedral volume influence field” in the input space, about the neuron. KNN distance is measured according to Equation 3.2.
D = \max_{i} |V_i - P_i|    (3.2)
This defines what is described as a “hyper-cube influence field” in the input space, about the neuron. Learning in ZISC networks is based on adding neurons to the network, where neurons are added according to the “uniqueness” of each training example. Uniqueness is defined as a training example either falling outside of the field of any of the existing neurons, or the training example being incorrectly classified. The ZISC learning algorithm for a single training example is displayed graphically in Figure 3.9. Recall of ZISC networks is carried out by finding which influence field an example falls into. Because the fields of the neurons can overlap, there are three possible outcomes of the recognition algorithm. Firstly, the example can fall outside of all neuron fields, in which case the example is regarded as “unrecognised”. Secondly, it can fall within the field of one or more neurons of the same category, in which case it is described as being both recognised and identified. Thirdly, it can fall within the influence fields of two or more neurons that are associated with different classes, in which case it is regarded as being recognised but not identified. The ZISC recall algorithm, for a single input example, is displayed graphically in Figure 3.10. Evaluating ZISC using the criteria established, it is apparent that ZISC is appropriate for classification problems only. In principle, ZISC could be trained on multiple data sets without forgetting, but in practice the limit of 78 neurons per network (this is a limit of the hardware, not the algorithm) means that the further training potential
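The two distance measures (Equations 3.1 and 3.2) and the “uniqueness” test that drives neuron addition can be sketched as below. The neuron representation, the initial influence-field size, and the omission of the chip's field-shrinking behaviour are simplifying assumptions.

```python
def rbf_distance(v, p):
    """Equation 3.1: sum of absolute differences (city-block distance)."""
    return sum(abs(vi - pi) for vi, pi in zip(v, p))

def knn_distance(v, p):
    """Equation 3.2: maximum absolute difference over the components."""
    return max(abs(vi - pi) for vi, pi in zip(v, p))

def zisc_learn_example(neurons, v, label, initial_aif=1.0, dist=rbf_distance):
    """Add a neuron if the example is 'unique': it falls outside every
    existing influence field, or every field it falls into has the wrong
    class. Each neuron is a (prototype, aif, label) tuple."""
    hits = [c for (p, aif, c) in neurons if dist(v, p) <= aif]
    if not hits or all(c != label for c in hits):
        neurons.append((list(v), initial_aif, label))
        return True
    return False
```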
Figure 3.9: The ZISC Learning Algorithm for a single training example (adapted from (ZISC Manual, 2002)). of a ZISC chip is limited. The algorithm is very simple, however, and non-iterative, so software implementations should be quite efficient.
3.12 Grow and Learn Networks Grow and Learn (GAL) networks were proposed by Alpaydin in (Alpaydin, 1994). They are three neuron layer binary classification networks that expand their structure as required to accommodate new training examples. The first neuron layer is the input layer: initially, this is the only layer that exists in the network. The second layer of neurons is the exemplar layer, which represents the memorised training examples. The third layer is called the class layer. The neurons in this layer represent the classes that the network is learning to distinguish. A vector is propagated through the input layer to each neuron in the exemplar layer. There, the distance between the input vector and the incoming connection weight vector is calculated, with Euclidean distance being the most commonly used distance measure. The exemplar neuron with the least distance between the input vector and its incoming connection weight vector is the winner and is allowed to activate. The winner will emit a value of one, while all other exemplar neurons will stay quiescent. The exemplar neuron layer to class neuron layer connection weights identify which class neuron will activate for each exemplar neuron. Training of GAL networks proceeds one example at a time. If the current example is correctly classified (that is, the correct class neuron is activated) then no changes are made to the network. If the current example is misclassified, then a new exemplar neuron is added. When a new exemplar neuron is added, its incoming connection weights are set to the current input vector, so that it will activate the most strongly if that example is
Figure 3.10: The ZISC Recall Algorithm for a single example (adapted from (ZISC Manual, 2002)). seen again. The outgoing connections of the new exemplar are set so that the desired class neuron activates. If the class is new (not represented by the network) then a new class neuron is added also. During the course of training, some of the exemplar neurons may become redundant. This is because newer exemplars may sit closer to the boundaries of the classes being learned than the existing exemplars. To deal with these redundant neurons, sleep learning of GAL networks was developed. The sleep learning algorithm is as follows. First, an exemplar neuron is randomly selected. Its incoming connection weight vector is extracted, and used as an input vector. The input vector is then propagated through the network to identify the desired output class for that neuron. Then, the vector is again propagated through the network with the current exemplar neuron disabled. If the network still correctly classifies the example, then the current exemplar is redundant and can be removed. This process can be repeated as many times as desired. Since this procedure can lead to an increase in error for the GAL network, it is often necessary to repeat the training and sleep-learning process several times. A GAL network will always have a winning exemplar neuron. This means that there will always be one class neuron that activates, whether it is the right class or not. To deal with this, the top two activated exemplars are found (the “best two” exemplars) and their corresponding classes identified. If the two classes are different, then no class neuron will activate. If they are the same, then that class will be the winner. This has the effect of giving the network the capacity to reject an example as unclassifiable. The basic GAL learning algorithm does not modify the connection weights. The only modification to the network is via the addition and deletion of neurons.
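The sleep learning test described above (recall an exemplar's own weight vector with that exemplar disabled, and delete it if classification still succeeds) can be sketched as follows; the nearest-exemplar recall and the data layout are illustrative.

```python
def gal_recall(exemplars, x, skip=None):
    """Winner-take-all recall over a list of (weights, class) exemplars,
    optionally with one exemplar disabled."""
    best_class, best_d = None, float("inf")
    for i, (w, c) in enumerate(exemplars):
        if i == skip:
            continue
        d = sum((wi - xi) ** 2 for wi, xi in zip(w, x)) ** 0.5
        if d < best_d:
            best_class, best_d = c, d
    return best_class

def gal_sleep_step(exemplars, i):
    """Remove exemplar i if the network still classifies its stored
    vector correctly without it (i.e. it is redundant)."""
    x, c = exemplars[i]
    if gal_recall(exemplars, x, skip=i) == c:
        del exemplars[i]
        return True
    return False
```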
This has some disadvantages, such as a sensitivity to noise and a tendency to memorise the training set at the expense of generalisation. Fine-tuning of GAL can overcome this problem to some extent, by modifying the input to exemplar connection weights. This modification is done according to Equation 3.3.
W_{i,e}(t+1) = W_{i,e}(t) + \eta (P_i - W_{i,e}(t))    (3.3)
where:
W_{i,e}(t) is the connection weight from input i to exemplar e at time t, \eta is a learning rate term that is usually set to decay as learning progresses, and P_i is element i of the current training vector P. The intent of this modification is to move the winning exemplar neuron's weight vector towards the statistical mean of that class. It is noted (Alpaydin, 1994, pg 407) that “a large number of iterations will be necessary” to tune the GAL network. GAL is limited to classification problems. However, it is able to continue to learn additional data sets after the completion of the initial training cycle. Also, the removal of redundant neurons through sleep learning will protect it from over-training. Although the training algorithm is iterative, the algorithm itself is quite simple. If the number of training iterations were kept to a minimum, then GAL could be expected to be very efficient.
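Equation 3.3 in use: each application pulls the winning exemplar's weights towards the current training vector, with a decaying learning rate. The initial rate and the decay schedule below are illustrative.

```python
def gal_fine_tune(w, p, eta):
    """Equation 3.3: W(t+1) = W(t) + eta * (P - W(t)), element-wise."""
    return [wi + eta * (pi - wi) for wi, pi in zip(w, p)]

w = [0.0, 1.0]
eta = 0.5
for _ in range(3):
    w = gal_fine_tune(w, [1.0, 0.0], eta)
    eta *= 0.9   # decay the learning rate as training progresses
```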
3.13 Problems

Many of the algorithms described in this chapter have problems associated with them, as evidenced by the evaluation carried out of each using the criteria laid out in Section 3.1. The major problem suffered by most, especially the older, backpropagation-trained MLP-derived algorithms, is that they are limited to learning a single data set. This is a major limitation, as one of the major advantages of constructive networks is that they can continue to adapt to new data throughout their lifetime. Another significant problem is that some algorithms are only suitable for classification, for example the GAL (Kwok and Yeung, 1999) and ZISC algorithms. The high complexity of algorithms such as GCS/DCS is also of concern, as the computational cost of calculating the Voronoi region of each neuron and of maintaining the resource counters is significant.
3.14 Summary

This chapter has reviewed and evaluated several existing constructive connectionist algorithms in support of the investigation of Hypothesis One of the thesis. The Dynamic Node Creation (DNC, Section 3.4), tiling (Section 3.5), upstart (Section 3.6) and Cascade Correlation (Section 3.7) algorithms are all older algorithms derived from backpropagation trained MLPs. The remainder of the algorithms, the Resource Allocating Network (RAN, Section 3.8), Evolutionary Nearest Neighbour MLP (NN-MLP, Section 3.9), Growing Cell Structures/Dynamic Cell Structures (GCS/DCS, Section 3.10), Zero Instruction Set Computer (ZISC, Section 3.11) and Grow and Learn Networks (GAL, Section 3.12), are all examples of algorithms that have moved away from the classical backpropagation trained MLP model. Several problems with existing constructive ANN were identified in Section 3.13. These problems define the requirements for a constructive algorithm that is suitable for both classification and function approximation tasks,
is able to learn continuously throughout its lifetime, and is relatively simple and efficient. This is the specification met by the Evolving Connectionist System (ECoS) in the following chapter.
Chapter 4
Evolving Connectionist Systems The darkness drops again; but now I know That twenty centuries of stony sleep Were vexed to nightmare by a rocking cradle
W.B. Yeats, The Second Coming
4.1 Introduction

As discussed in Section 2.3, conventional connectionist systems have difficulty meeting the seven requirements of IIS as listed in Section 1.1. Constructive connectionist systems, as described in the previous chapter, obviate some of these problems (specifically the problem of architectural selection) but have other restrictions that prevent them from fulfilling these requirements. Evolving Connectionist Systems (ECoS) are open architecture ANN that address the seven requirements above and avoid the problems of both conventional and constructive ANN. By adopting a dynamic, flexible structure that is automatically adjusted by the training algorithm, they do not suffer from the restrictions inherent in other constructive ANN: ECoS networks fulfill the goals set in Section 3.14. The primary purpose of this chapter is to investigate Hypotheses One and Two, as described in Section 1.2. The information in this chapter is essential to the rest of the thesis, as it places the new algorithms of the following chapters in context and provides the motivation for much of the original work. Also presented in this chapter is the first of the original algorithms, the Simple Evolving Connectionist System (Section 4.6). The chapter is organised as follows: Section 4.2 introduces Kasabov’s original ECoS framework. The seminal ECoS network, the Evolving Fuzzy Neural Network (EFuNN), is described in Section 4.3. The algorithms described in Section 4.3 are generalised in Section 4.4, which describes a general architecture for ECoS networks and the general ECoS training algorithm. This is the form of the algorithm that will be focussed upon in this thesis. Section 4.5 revisits the EFuNN implementation of the ECoS principles. The Simple Evolving Connectionist System (SECoS), an original contribution of this thesis, is described in Section 4.6.
An additional ECoS-derived algorithm, Dynamic Evolving Neural-Fuzzy Inference System (DENFIS) is described in Section 4.7. DENFIS is included because, although it will not be analysed theoretically or evaluated experimentally, an evaluation of its learning algorithm provides support for the work in later chapters, specifically the fuzzy rule extraction work in Chapter 6. Methods for expanding the output space (adding additional variables to the output space) of EFuNNs and SECoS networks are presented in Section 4.8, while a method for incorporating temporal
dynamics in ECoS networks is presented in Section 4.9. Evaluations of the ECoS algorithm in terms of the requirements for IIS as laid out in Section 1.1, and in terms of the criteria for evaluating constructive connectionist systems (Section 3.1), are presented in Section 4.10. ECoS and the constructive algorithms described in Chapter 3 are compared in Section 4.11, where methods and techniques that may be applied to ECoS are highlighted. Several selected applications of ECoS-style networks are reviewed in Section 4.12. Experiments over the benchmark data sets are presented in Section 4.13. Some problems with the ECoS model that form the motivation for more of the original work in this thesis are described in Section 4.14. Conclusions and evaluations of research Hypotheses One and Two are presented in Section 4.15. Finally, the chapter is summarised in Section 4.16.
4.2 The ECoS Framework The Evolving Connectionist System (ECoS) framework was developed to fulfill the seven requirements of intelligent systems described above and therefore address the disadvantages of existing constructive connectionist systems. It is a class of ANN architectures and a general purpose open architecture learning algorithm that modifies the structure of the network as training examples are presented. Although the seminal ECoS architecture was the Evolving Fuzzy Neural Network (EFuNN, see Section 4.3), several other architectures have been developed that utilise the ECoS algorithm. These include the minimalist Simple Evolving Connectionist System (SECoS, see Section 4.6) and the Dynamic Evolving Neural-Fuzzy Inference System (DENFIS, see Section 4.7). The general principles of ECoS are, as stated in (Kasabov, 1998a):
1. ECoS learn fast from a large amount of data through one-pass training;

2. ECoS adapt in an on-line mode where new data is incrementally accommodated;

3. ECoS memorise data exemplars for further refinement, or for information retrieval;

4. ECoS learn and improve through active interaction with other systems and with the environment in a multi-modular, hierarchical fashion.

Each of the above points can be expanded as follows:

1. The ECoS training algorithm is designed with the intention that all that the algorithm can learn from the data is learned in the first training pass. Additional exposure to the training data is not necessary, as all of the knowledge that can be captured has been captured.

2. ECoS are intended to be used in an online learning application. This means that new data will be constantly and continuously coming into the system, and that this data must be learned by the network without forgetting the old. The general ECoS architecture and learning algorithm allows an ECoS network to accommodate this new data without a catastrophic forgetting of the old.

3. The manner in which neurons are added to an ECoS means that some training examples are stored, initially, verbatim within the structure of the network. These examples are then either modified (refined) by exposure
to further examples, or, depending upon the training parameters used, remain the same and can be later retrieved.

4. ECoS networks are intended to be used in concert with other networks and systems. Thus, the learning algorithm and architecture are intended to allow for the influence of external forces, with some modules having greater importance than others.

The advantages of ECoS are precisely that they avoid the problems associated with traditional connectionist structures. They are hard to over-train, they learn quickly, and they are far more resistant to catastrophic forgetting than most other models. ECoS networks also have several advantages over other constructive algorithms. Firstly, they are not limited to a particular application domain: they can be applied to both classification and function approximation. Secondly, they do not in principle require multiple presentations of the training data set, as is the case with some of the constructive algorithms in existence, such as RAN (Section 3.8) and GAL (Section 3.12). Finally, they are able to continue learning and are not restricted to learning a single training set, as some other constructive algorithms (such as RAN and Cascade Correlation) are. A comparison between the ECoS model and the constructive algorithms described in Chapter 3 is carried out in Section 4.11. Conversely, it is these advantages that create some of their disadvantages. Since they deal with new examples by adding neurons to their structure, they rapidly increase in size and can become inefficiently large. They also have some sensitivity to their parameters, which require careful selection for optimum performance.
4.3 Evolving Fuzzy Neural Networks Evolving Fuzzy Neural Networks (EFuNN) were proposed by Kasabov in (Kasabov, 1998c). They are fuzzy neural networks redeveloped within the ECoS framework, that is, they are FuNNs (Section 2.3) that have been modified to take into account the principles of ECoS. In this modification of the FuNN architecture, several changes are made to the method of activation of the rule and action neuron layers. Whereas in a traditional FuNN the activation of a rule neuron is calculated as the weighted sum of the condition neuron activations (the fuzzified input values), in an EFuNN the activation is determined by the distance between the condition neuron activations vector (the fuzzified input vector) and the connection weights incoming to that neuron. Thus, activation of an EFuNN rule neuron is proportional to the distance between the current fuzzified input values and the previously seen examples as represented by the connection weights. The activations of the rule layer neurons are propagated to following layers according to two different strategies: OneOfN, and ManyOfN. With OneOfN recall, only the most highly activated neuron has its activation value propagated to the following layers. With ManyOfN recall, only those neurons that are activated above a specified level (known as the Activation Threshold) have their activation values propagated onwards. The changes to the action layer are simpler: while FuNN uses a logistic, or sigmoid, activation function, EFuNN uses a saturated linear activation function. The activation of rule neurons is calculated as:
A_n = 1 - D

where:
An is the activation of rule neuron n, and D is the distance between the fuzzified input vector and the connection weight vector leading into n. The distance between the fuzzified input vector and the weight vector is measured as the normalised fuzzy distance, as defined by Equation 4.1.
D_n = \frac{1}{2} \cdot \frac{\sum_{i=1}^{a} |I_i - W_{i,n}|}{\sum_{i=1}^{a} W_{i,n}}    (4.1)
where:
a is the number of condition neurons (fuzzy inputs), I is the fuzzified input vector, and W is the Condition to Rule layer weight matrix. The EFuNN training algorithm is based upon the principle that rule neurons only exist if they are needed. As EFuNN is an open architecture that is intended to adapt over time, the initial number of rule neurons is very small, with additional neurons being added as more training examples are presented to it. As each training example is presented, the activation values of the neurons in the rule and action layers, and the error over the action neurons, are examined. If the maximum rule neuron activation is below a set threshold (the Sensitivity Threshold) then a rule neuron is added. If the action neuron error is above a threshold value (the Error Threshold) a rule neuron is added. When a neuron is added, its incoming and outgoing connection weights are set to the fuzzified input and output vectors, respectively. If no neurons are added, then both the incoming and outgoing weights of the most highly activated rule neuron are adjusted, according to the two parameters Learning Rate One (for the incoming weights) and Learning Rate Two (for the outgoing weights). While EFuNNs have been applied to a wide variety of application domains (Section 4.12), they do have their disadvantages. These disadvantages can be traced to their fuzzification and defuzzification operations. The formalisation of ECoS in Chapter 5 makes it apparent that the higher the dimensionality of the input space, the larger the number of evolving layer neurons that will be required to model the examples. Like FuNN, each input variable in an EFuNN will have at least two fuzzification neurons attached to it: this is because each membership function is bounded by its neighbour. Thus, the dimensionality of the input space will be at least doubled. The same rule applies to the output space. The other major problem is with the parameters of the fuzzy neurons.
The number of condition and action neurons attached to each input and output, as well as their centres, is chosen and fixed at the creation of the EFuNN. Kasabov’s ECoS / EFuNN learning algorithm is not able to modify either the number of input membership functions or their parameters. Although it may be possible to create an extended EFuNN learning algorithm that will add and adapt condition neurons and their parameters, such an algorithm would have to be designed with extreme care. If the parameters of the condition layer were altered, then the network would become inconsistent: the fuzzy values would be different for the same crisp values, which means that the EFuNN would produce different results than it learned for previously seen examples. This problem could conceivably be circumvented by employing multiple passes through the training set, but this would break from the ECoS approach as
specified in Section 4.2, namely the requirement that training be one-pass.
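For concreteness, the rule neuron activation of Section 4.3 can be sketched as follows. The normalisation used (a half factor over the sum of the weights) is one plausible reading of Equation 4.1, and the guard against a zero denominator is an added assumption; the activation is A = 1 - D as in the text.

```python
def fuzzy_distance(i_vec, w_vec):
    """One plausible form of the normalised fuzzy distance (Equation 4.1)
    between a fuzzified input vector and a rule neuron's incoming weights."""
    num = sum(abs(i - w) for i, w in zip(i_vec, w_vec))
    den = sum(w_vec) or 1.0   # guard against an all-zero weight vector (assumption)
    return 0.5 * num / den

def rule_activation(i_vec, w_vec):
    """A_n = 1 - D_n: activation falls as the input moves away from the
    stored exemplar."""
    return 1.0 - fuzzy_distance(i_vec, w_vec)
```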
4.4 General ECoS Architecture

EFuNN (Section 4.3) was the first ECoS network. From it, a generalised constructive ANN architecture and training algorithm can be derived. An ECoS network is a multiple neuron layer, constructive artificial neural network. An ECoS network will always have at least one evolving neuron layer. This is the constructive layer, the layer that will grow and adapt itself to the incoming data, and is the layer with which the learning algorithm is most concerned. The meaning of the connections leading into this layer, the activation of this layer's neurons and the forward propagation algorithms of the evolving layer all differ from those of classical connectionist systems. For the purposes of this work, the term ‘input layer’ refers to the neuron layer immediately preceding the evolving layer, while the term ‘output layer’ means the neuron layer immediately following the evolving layer. This is irrespective of whether or not these layers are the actual input or output layers of the network proper. The connection layers from the input neuron layer to the evolving neuron layer, and from the evolving layer to the output neuron layer, are fully connected. That is, every input neuron is connected to every evolving layer neuron, and every evolving layer neuron is connected to every output neuron.
Figure 4.1: General ECoS architecture.
The activation A of an evolving layer neuron n is determined by Equation (4.2)
A_n = 1 - D_n
(4.2)
where:
An is the activation of the neuron n, and Dn is the distance between the input vector and the incoming weight vector for that neuron. Since ECoS networks are fully connected, the number of connections coming into an evolving layer neuron from the input layer is the
same as the number of input neurons. Thus, the incoming weight vector has the same dimensionality as the vector input to the evolving layer. It is therefore possible to directly measure the distance in Euclidean space between the two vectors. Although the distance can be measured in any way that is appropriate for the inputs, this distance function must return a value in the range [0, 1]. For this reason, most ECoS algorithms assume that the input data will be normalised, as it is far easier to formulate a distance function that produces output in the desired range if the data is normalised to the range [0, 1]. Thus, examples which exactly match the exemplar stored within the neuron's incoming weights will result in an activation of 1, while examples that are entirely outside of the exemplar's region of input space will result in an activation of near 0. Whereas most ANN propagate the activation of each neuron from one layer to the next, ECoS evolving layers propagate their activation by one of two alternative strategies. The first of these strategies, entitled OneOfN propagation, involves only propagating the activation of the most highly activated (“winning”) neuron. The second strategy, ManyOfN, propagates the activation values of those neurons with an activation value greater than the activation threshold Athr.
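The two propagation strategies translate directly into code; a minimal sketch, with activation vectors as plain Python lists:

```python
def one_of_n(activations):
    """OneOfN: propagate only the winner's activation; zero the rest."""
    winner = max(range(len(activations)), key=lambda i: activations[i])
    return [a if i == winner else 0.0 for i, a in enumerate(activations)]

def many_of_n(activations, a_thr):
    """ManyOfN: propagate every activation at or above the threshold Athr."""
    return [a if a >= a_thr else 0.0 for a in activations]
```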
4.4.1 ECoS training

The ECoS learning algorithm accommodates new training examples within the evolving layer, either by modifying the weight values of the connections attached to the evolving layer neurons, or by adding a new neuron to that layer. The algorithm employed is described below.
Propagate the input vector I through the network.
Find the most highly activated (winning) neuron j and its activation A_j.
IF A_j is less than the sensitivity threshold S_thr:
– Add a neuron.
ELSE:
– Evaluate the errors between the calculated output vector O and the desired output vector O_d.
– IF the absolute error over the desired output is greater than the error threshold E_thr:
  – Add a neuron.
– ELSE:
  – Update the connections to the winning evolving layer neuron.
Repeat for each training vector.
When a neuron is added, its incoming connection weight vector is set to the input vector I, and its outgoing weight vector is set to the desired output vector O_d.
The weights of the connections from each input i to the winning neuron j are modified according to Equation 4.3.
W_{i,j}(t+1) = W_{i,j}(t) + η_1 (I_i - W_{i,j}(t))   (4.3)

where:

W_{i,j}(t) is the connection weight from input i to j at time t,
η_1 is the learning rate one parameter,
I_i is the ith component of the input vector I.

The weights from neuron j to output o are modified according to Equation 4.4:

W_{j,o}(t+1) = W_{j,o}(t) + η_2 A_j E_o   (4.4)

where:

W_{j,o}(t) is the connection weight from j to output o at time t,
η_2 is the learning rate two parameter,
A_j is the activation of j,
E_o is the signed error at o, as measured according to Equation 4.5:

E_o = O_d - A_o   (4.5)

where:

O_d is the desired activation value of o,
A_o is the actual activation of o.

This is essentially the perceptron learning rule. From this it becomes apparent that in (Kasabov, 1998c) and subsequent publications (Kasabov and Woodford, 1999; Woodford and Kasabov, 2001) the terms O_d and A_o above were incorrectly reversed.
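The training procedure of Subsection 4.4.1, together with the weight update rules of Equations 4.3 to 4.5, can be sketched in a single function. This is an illustrative sketch under assumptions not fixed by the text: a normalised Manhattan distance, OneOfN propagation to the outputs, and the maximum absolute output error as the error measure. The names and default parameter values are hypothetical.

```python
def train_example(x, y_d, W_in, W_out, s_thr=0.5, e_thr=0.1, lr1=0.5, lr2=0.5):
    """One ECoS training step (sketch). W_in[j] is neuron j's incoming
    weight vector; W_out[j] holds its weights to each output neuron."""
    def act(w):  # A = 1 - normalised Manhattan distance (Equation 4.2)
        num = sum(abs(a - b) for a, b in zip(x, w))
        den = sum(abs(a + b) for a, b in zip(x, w)) or 1.0
        return 1.0 - num / den

    if W_in:
        j = max(range(len(W_in)), key=lambda k: act(W_in[k]))  # winner
        a_j = act(W_in[j])
        out = [a_j * w for w in W_out[j]]  # OneOfN propagation to outputs
        err = max(abs(d - o) for d, o in zip(y_d, out))
        if a_j >= s_thr and err <= e_thr:
            # Equation 4.3: pull the winner's exemplar towards the input
            W_in[j] = [wi + lr1 * (xi - wi) for xi, wi in zip(x, W_in[j])]
            # Equations 4.4/4.5: perceptron-style update of outgoing weights
            W_out[j] = [w + lr2 * a_j * (d - o)
                        for w, d, o in zip(W_out[j], y_d, out)]
            return "updated"
    # otherwise add a neuron that memorises the current example
    W_in.append(list(x))
    W_out.append(list(y_d))
    return "added"
```

A repeated example updates the existing winner rather than adding a second neuron, which is the behaviour the algorithm above prescribes.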
4.4.2 Neuron Allocation Strategies

When a neuron is added to the evolving layer, it must be inserted somewhere in the layer. Where the neuron is placed is determined by the neuron allocation strategy used, as the position of the neuron is important for some methods of optimisation, such as those discussed in Subsections 7.3.1 and 7.4.1.

The simplest of the evolving layer neuron allocation strategies is the Linear strategy. Here, new rule neurons are simply added sequentially to the end of the layer, irrespective of the spatial location (within the input or output space) of the new neuron.

The second strategy investigated is the Cluster strategy. With this strategy, new neurons are inserted beside the most highly activated of the layer's neurons. In principle this preserves the spatial significance of the new neuron, as the new neuron will be inserted next to the neuron that is closest to the current input vector. This means that the evolving layer will act as a one-dimensional vector quantiser. This strategy is based upon the principle that the most highly activated existing neuron is the closest to the new neuron.
The third strategy is the Maximum Weight strategy. Here the new neuron is inserted beside the existing neuron that possesses the largest weighted connection to the desired winning output neuron. This weight represents the centre of the spatial distribution of the seen examples. Inserting new neurons here preserves the spatial distribution of the training data. Thus, the evolving layer of the ECoS will function as a one dimensional vector quantiser, similar to the Kohonen SOM (Kohonen, 1990; Kohonen, 1997).
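The three allocation strategies reduce to a choice of insertion index within the evolving layer. A minimal sketch, assuming list-based layers; the function name, signature, and strategy labels are illustrative, not from the thesis:

```python
def insertion_index(strategy, acts, W_out, target_output, n_neurons):
    """Choose where to insert a new evolving-layer neuron (sketch).
    acts: current neuron activations; W_out[j][o]: weight from neuron j
    to output o; target_output: the desired winning output neuron."""
    if strategy == "linear":        # append at the end of the layer
        return n_neurons
    if strategy == "cluster":       # beside the most highly activated neuron
        return max(range(n_neurons), key=acts.__getitem__) + 1
    if strategy == "max_weight":    # beside the neuron with the largest
        return max(range(n_neurons),  # weight to the desired output
                   key=lambda j: W_out[j][target_output]) + 1
    raise ValueError(strategy)
```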
4.5 Evolving Fuzzy Neural Networks Revisited

Re-examining the EFuNN algorithm in the light of the general ECoS architecture presented in Section 4.4, EFuNN can be described as follows. The input neuron layer is the condition layer; thus, the input to the evolving layer is a fuzzified input vector. The output neuron layer is the action layer, which means that the target output values that the evolving layer will learn are the fuzzified output vectors. The distance measure in this case is the normalised fuzzy distance, as defined in Equation 4.1. The learning algorithm differs only slightly from the general ECoS learning algorithm, in that the weight values for the incoming and outgoing connections of a new neuron are the fuzzified input and output values: two additional steps must therefore be performed before insertion of the new neuron, namely the fuzzification of the current input and desired output vectors.
4.6 The Simple Evolving Connectionist System

The Simple Evolving Connectionist System (SECoS) is proposed as a minimalist implementation of the ECoS algorithm and is an original contribution resulting from Hypothesis Two of the thesis. SECoS were first introduced in (Watts, 1999a) and more fully developed in (Watts and Kasabov, 2000). They were then quickly adopted by other researchers (Ghobakhlou and Seesink, 2001).

The SECoS model, which lacks the fuzzification and defuzzification mechanisms of EFuNN, was created for several reasons. Firstly, it is intended as a simpler alternative to EFuNN. Since they lack the fuzzification and defuzzification structures of EFuNN, SECoS are much simpler to implement: with fewer connection matrices and a smaller number of neurons, there is much less processing involved in simulating a SECoS network. They are also much easier to understand and analyse: while EFuNN expands the dimensionality of the input and output spaces with its fuzzy logic elements, SECoS deals with the input and output space 'as is'. Therefore, rather than dealing with a fuzzy problem space, SECoS deals with the problem space directly. Each neuron that is added to the network during training represents a point in the problem space, rather than a point in the expanded fuzzy problem space.

Secondly, for some situations, fuzzified inputs are not only unnecessary but harmful to performance, as they lead to an increase in the number of evolving layer neurons. Binary data sets are particularly vulnerable to this, as fuzzification does nothing but increase the dimensionality of the input data. By removing the fuzzification and defuzzification capabilities, it is proposed that the adaptation advantages of EFuNN are retained while the disadvantages of fuzzification are eliminated, specifically the need to select the number and parameters of the input and output membership functions. It is proposed that for most applications, a SECoS will be able to model the training data with fewer neurons in the evolving layer than an equivalent EFuNN.
4.6.1 The SECoS Structure and Learning Algorithm

A SECoS as proposed in this thesis consists of only three layers of neurons: the input layer, with linear transfer functions; an evolving layer, with distance-based neuron activation functions; and an output layer, with a simple saturated linear activation function. The distance measure D_n used in the evolving layer for neuron n is the normalised Manhattan distance, as shown in Equation 4.6:

D_n = ( Σ_{i=1}^{N} |I_i - W_{i,n}| ) / ( Σ_{i=1}^{N} |I_i + W_{i,n}| )   (4.6)

where:

N is the number of input neurons in the SECoS,
I is the input vector,
W is the input to evolving layer weight matrix.

The normalised Euclidean distance can also be used, as defined by Equation 4.7:
D_n = √( Σ_{i=1}^{N} (I_i - W_{i,n})² ) / √N   (4.7)
There are two layers of connections in the SECoS model. The first connects the input neuron layer to the evolving layer. The weight values here represent the coordinates of the point in input space that each evolving layer neuron represents. The second layer of connections connects the evolving layer to the output neuron layer. The weights in this layer represent the output values associated with the defined regions in the input space. The learning algorithm is the same as that described in Subsection 4.4.1 and as used by EFuNN. However, in SECoS the input vector I is the actual crisp input vector, while the desired output vector O_d is the unmodified desired output vector.
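The two distance measures above can be sketched as follows. This is a sketch assuming inputs normalised to [0, 1]; the neuron's incoming weight vector W_{.,n} is written simply as w, and the function names are not from the thesis.

```python
import math

def manhattan_norm(x, w):
    """Normalised Manhattan distance (Equation 4.6)."""
    den = sum(abs(a + b) for a, b in zip(x, w))
    return sum(abs(a - b) for a, b in zip(x, w)) / den if den else 0.0

def euclidean_norm(x, w):
    """Normalised Euclidean distance (Equation 4.7): dividing by sqrt(N)
    keeps the result in [0, 1] for vectors in [0, 1]^N."""
    n = len(x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, w))) / math.sqrt(n)
```

Both return 0 for an exact exemplar match and 1 for maximally distant vectors in the unit hypercube, which is what the activation function A_n = 1 - D_n requires.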
4.7 Dynamic Evolving Neural-Fuzzy Inference System

The Dynamic Evolving Neural-Fuzzy Inference System (DENFIS) is an application of the ECoS principles to an ANN that implements a Takagi-Sugeno-Kang fuzzy inference system. DENFIS was first described in (Kasabov and Song, 2000) and was more completely described in (Kasabov and Song, 2002). DENFIS is presented here as an example of another application of the ECoS principles. DENFIS makes heavy use of the so-called Evolving Clustering Method (ECM). ECM is substantially similar to SECoS training, in that it finds and adjusts cluster prototypes in non-fuzzy space. ECM is described as follows in (Kasabov and Song, 2002, pg 146-147):
Step 0: Create the first cluster C_1 by simply taking the position of the first example from the input stream as the first cluster centre C_1, and setting a value of 0 for its cluster radius Ru_1.

Step 1: If all examples of the data stream have been processed, the algorithm is finished. Else, the current input example x_i is taken and the distances between this example and all n already created cluster centres C_j, D_ij = ||x_i - C_j||, j = 1, 2, ..., n, are calculated.

Step 2: If there is any distance value D_ij = ||x_i - C_j|| equal to, or less than, at least one of the radii Ru_j, j = 1, 2, ..., n, it means that the current example x_i belongs to a cluster C_m with a minimum distance D_im = ||x_i - C_m|| = min(||x_i - C_j||), subject to the constraint D_ij ≤ Ru_j, j = 1, 2, ..., n. In this case, neither a new cluster is created, nor is any existing cluster updated; the algorithm returns to Step 1. Else, go to the next step.

Step 3: Find the cluster C_a (with centre C_a and cluster radius Ru_a) from all n existing cluster centres by calculating the values S_ij = D_ij + Ru_j, j = 1, 2, ..., n, and then choosing the cluster centre C_a with the minimum value S_ia: S_ia = D_ia + Ru_a = min(S_ij), j = 1, 2, ..., n.

Step 4: If S_ia is greater than 2·Dthr, the example x_i does not belong to any existing cluster. A new cluster is created in the same way as described in Step 0, and the algorithm returns to Step 1.

Step 5: If S_ia is not greater than 2·Dthr, the cluster C_a is updated by moving its centre C_a and increasing the value of its radius Ru_a. The updated radius Ru_a^new is set equal to S_ia/2, and the new centre C_a^new is located at the point on the line connecting x_i and C_a such that the distance from the new centre C_a^new to the point x_i is equal to Ru_a^new. The algorithm returns to Step 1.
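A minimal sketch of the ECM procedure follows. Euclidean distance is assumed for ||.||, and clusters are stored as (centre, radius) pairs; the function name and representation are illustrative, not from (Kasabov and Song, 2002).

```python
import math

def ecm(stream, d_thr):
    """Evolving Clustering Method (sketch): one pass over the example
    stream, returning a list of (centre, radius) cluster pairs."""
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    clusters = []
    for x in stream:                                 # Step 1
        if not clusters:                             # Step 0: first cluster
            clusters.append((list(x), 0.0))
            continue
        d = [dist(x, c) for c, _ in clusters]
        if any(dj <= ru for dj, (_, ru) in zip(d, clusters)):
            continue                                 # Step 2: inside a cluster
        s = [dj + ru for dj, (_, ru) in zip(d, clusters)]
        a = min(range(len(s)), key=s.__getitem__)    # Step 3: nearest by S
        if s[a] > 2 * d_thr:                         # Step 4: new cluster
            clusters.append((list(x), 0.0))
        else:                                        # Step 5: grow cluster a
            c, ru = clusters[a]
            new_ru = s[a] / 2.0
            # move the centre along the line from x_i to C_a so that
            # dist(new centre, x_i) == new radius
            t = new_ru / d[a] if d[a] else 0.0
            new_c = [xi + t * (ci - xi) for xi, ci in zip(x, c)]
            clusters[a] = (new_c, new_ru)
    return clusters
```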
In this way, the maximum distance from any cluster centre to the examples that belong to that cluster is not greater than the threshold value Dthr, even though the algorithm does not keep any information about past examples.

When the training examples have been clustered using ECM, the rules are formulated. This is done in two phases: firstly forming the antecedents, followed by the consequent functions. The antecedents are formulated by finding which combination of input membership functions (MF) activates most highly for the centre of the cluster; that is, the values represented by the cluster centre are fuzzified by the input MF set, and the winning (most highly activated) MF are taken as the antecedents for that rule. The consequent functions are then found using a least-squares estimation process over the examples within the cluster. Thus, each cluster is used as the basis of a single rule.

There are many similarities between ECM and SECoS learning. The calculation of distances between the current example and the centres of each cluster in Step 1 of the ECM algorithm is essentially identical to the calculation of the neuron activations in a SECoS evolving layer. The addition of a new cluster centre in Step 4 of ECM is in many ways identical to the addition of a neuron to SECoS under the sensitivity threshold rule: the sensitivity threshold will cause a neuron to be added if the activation is below the specified threshold. This is the same as saying that a neuron will be added if the current training example is further away than a specific distance determined by the sensitivity threshold (see Chapter 5 for an in-depth discussion and analysis of the role of distances and training parameters in the behaviour of ECoS learning). Finally, the updating of the cluster centres in Step 5 of ECM is comparable to the modification of the input to evolving layer connection weights in the ECoS learning algorithm. The similarities between ECM / DENFIS and SECoS learning lend support to the hypothesis, explored later in Chapter 6, that fuzzy rules, including Takagi-Sugeno fuzzy rules, can be extracted from trained SECoS networks.
4.8 Output Space Expansion in ECoS

ECoS networks are intended to be used in an online, life-long learning situation. In such situations, it is entirely possible that new target classes will be introduced that must be handled by the existing system. In these cases, there are three possible solutions.

Firstly, the existing system can be thrown away and a new network created from scratch. This is not satisfactory, for several reasons: the amount of time required may be significant, and the data that has been seen and must be accommodated by the existing network may not be available for retraining.

The second option is to retain the existing network and create a new network specifically to handle the new class. This avoids the problem of training time, but the problem of missing data remains: if the new network is to handle the new class, then it must be trained on negative examples as well as positive examples.

The third option is to modify the existing network to accommodate the new class. This has the advantage of requiring only additional training on the new class, obviating both the time and data availability problems of the previous two options. For conventional ANN this is very difficult to do, because the knowledge within them is distributed across the entire architecture. ECoS-class networks, on the other hand, store their knowledge locally to each neuron, and are therefore inherently suited to this problem. The method of adding new outputs to ECoS networks is referred to as output space expansion, because each class added increases the dimensionality of the output space by one. Algorithms for adding output classes have been developed for both EFuNN and SECoS, and are described in the following subsections.
4.8.1 EFuNN Output Expansion

Expansion of the output space of EFuNN was introduced in (Ghobakhlou et al., 2000). Addition of a new output affects the output neuron layer, the action layer and the rule to action layer connections. When a new output and its action neurons are inserted into the EFuNN, the connections from the existing rule neurons to the new action neurons are set to the fuzzified value of the crisp output zero, using the fuzzy membership functions defined by the new action neurons. This has the effect of making all existing rule neurons represent negative examples for the new output; that is, if any of the existing rule neurons fire, then the new output will be inactive by default. The network is then further trained on examples of the new class, allowing new rule neurons to be constructed to represent the class.

Experiments in (Ghobakhlou et al., 2000) on isolated spoken word recognition demonstrated the efficacy of this algorithm. Each output neuron in this case represented a single word, to be identified from acoustic data. The algorithm described here was used to add new outputs, and hence new words, to the EFuNN.
4.8.2 SECoS Output Expansion

As befits the simplicity of the SECoS model, the algorithm for adding new output classes to a SECoS network is also simple. A new output neuron is inserted into the network, and the connection weights from the evolving layer to the new output are set to zero. This again has the effect of making all existing examples negative by default. This approach was used in (Ghobakhlou and Seesink, 2001) to expand the vocabulary of a spoken word recognition system.
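The SECoS expansion step amounts to appending a zero-weight column to the evolving-to-output connection layer. A minimal sketch, assuming a list-of-lists weight representation (the function name is illustrative):

```python
def add_output_class(W_out):
    """SECoS output space expansion (sketch): append a new output neuron
    whose weights from every evolving-layer neuron are zero, so that all
    existing exemplars are negative examples for the new class by default."""
    for w in W_out:  # W_out[j] holds neuron j's weights to each output
        w.append(0.0)
```

Further training on examples of the new class then adds evolving-layer neurons with non-zero weights to the new output.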
4.9 Temporal Extensions of ECoS

ECoS networks can be modified to capture short-range temporal characteristics of the data. This is done by the addition of a second evolving layer, known as the context or short-term memory layer, that is connected solely to the first evolving layer, in a similar manner to the Jordan-Elman Simple Recurrent Network (SRN) (Elman, 1990). The context layer has two layers of connections associated with it. The first leads from the evolving layer to the context layer. This is a sparsely connected layer, where each evolving layer neuron is connected only to its corresponding context layer neuron. The weights of these connections are set to one and do not change during training. The second layer of connections leads from the context layer to the evolving layer. This is a fully connected layer, where each context layer neuron is connected to every evolving layer neuron. These connections do change during training.

Activation values from each context layer neuron are propagated to each evolving layer neuron, where the weighted value is added to the activation of the evolving layer neuron. The activation values of each evolving layer neuron are then propagated from the evolving layer through the sparse fixed connections to the context layer. Thus, the context layer maintains a memory of the previous example's activations. A new neuron is added to the context layer whenever a neuron is added to the evolving layer. Figure 4.2 shows the general structure of a temporal ECoS network.

Only the connection weight from the winning context layer neuron c to the winning evolving layer neuron j is updated during training. The connection weight W_{c,j} from c to j is modified via Hebbian learning (see Section 2.3), according to Equation 4.8:

W_{c,j}(t+1) = W_{c,j}(t) + η_3 A_c A_j   (4.8)

where:

η_3 is the learning rate three parameter,
A_c is the activation of the winning context layer neuron c,
A_j is the activation of the winning evolving layer neuron j.
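The Hebbian update of Equation 4.8 can be sketched as a single in-place operation; the function name, argument order and default learning rate are illustrative assumptions, not from the thesis.

```python
def hebbian_context_update(W_ctx, c, j, a_c, a_j, lr3=0.5):
    """Equation 4.8 (sketch): strengthen the weight from winning context
    neuron c to winning evolving-layer neuron j in proportion to the
    product of their activations (Hebbian learning)."""
    W_ctx[c][j] += lr3 * a_c * a_j
```

Because the update is proportional to the product A_c · A_j, only co-active winners strengthen their connection, which is the defining property of Hebbian learning.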
Figure 4.2: General temporal ECoS architecture.
4.10 Evaluation of ECoS

In the introduction to this thesis (Section 1.1), seven requirements for intelligent systems were enumerated. Kasabov (Kasabov, 1998a) cites these requirements as the motivation for the ECoS framework, within which the ECoS algorithms were developed. Also, criteria for evaluating constructive connectionist systems were defined in Section 3.1. In this section, the ways in which ECoS fulfills or attempts to fulfill these criteria and requirements are discussed. A summary of the evaluation, in terms of whether or not ECoS meets each of the requirements, is presented in Table 4.1.

Requirement                        Evaluation
Fast Learning                      met
Incremental Adaptation             met
Open Structure                     unclear
Memory Based                       not met
Continuous Improvement             met
Self Analysis                      met
Representation of Time and Space   met

Table 4.1: Evaluation of ECoS in terms of requirements of IIS.
4.10.1 Evaluation of ECoS in terms of constructive connectionist systems

Evaluated by the three criteria defined in Section 3.1: ECoS networks are suitable for both classification and function approximation; they can be trained on multiple data sets without catastrophic forgetting; and the ECoS training algorithm is simple and efficient.
4.10.2 Evaluation of ECoS in terms of Intelligent Information Systems Criteria

Fast Learning
The ECoS learning algorithm is fast, certainly much faster than standard backpropagation training. In principle it requires only a single pass through the training data set. The training process will slow down, however, later in training, when a large number of neurons have already been added to the network. This problem is not restricted to the training phase, as recall is also affected by it. This reinforces the need for optimisation of the ECoS structure, both online, during training, and offline, after training is complete.

Incremental Adaptation
ECoS is very well suited to incremental adaptation; that is, it is able to accommodate new data very easily, while resisting forgetting of, or interference with, the knowledge already acquired by the network. The "real-time" ability of ECoS to adapt is more debatable, again because of the slow response when the network is large. There is nothing in the ECoS training algorithm, however, that prevents either single examples or entire data sets being dealt with.

Open Structure
The exact meaning of a system having an open structure is not entirely clear from (Kasabov, 1998a). ECoS appears to fulfill this requirement to some extent, as the algorithm is a constructive one and thus adds neurons and connections as necessary. It is also possible to add output neurons (Section 4.8) to some ECoS networks. Adding new input features would be desirable, but to date no mechanism for doing this has been suggested.

Memory Based
"Memory based" is defined in (Kasabov, 1998a) to be "keep[ing] a reasonable track of information that has been used in the past". ECoS achieves this by storing modified exemplars in the connection weights; that is, examples that cause the addition of neurons are retained and modified by the learning algorithm.
The other requirements of "memory based", however, are not met by the basic ECoS algorithm.

Continuous Improvement
Continuous improvement and incremental adaptation are very similar processes. The same property that allows an ECoS to learn incrementally also allows it to continuously improve itself. That is to say, being able to train an ECoS one example at a time, without interfering with previously learned examples, also allows it to adapt to new examples throughout its lifetime.
Self Analysis
It is possible to explain what an ECoS network has learned via extraction of rules (Chapter 6). The development of rule extraction algorithms for SECoS, to bring it into line with this requirement, is a major contribution of this thesis. Introspection is arguably provided by the training algorithm, as it examines the structure of the ECoS and adds neurons if it is found lacking. A greater level of introspection is provided in the optimisation algorithms described in Chapter 7.

Representation of Time and Space
The ECoS represents space by the placement of neurons in the evolving layer, and by the connection weights of those neurons. Each connection weight from the input layer to an evolving layer neuron represents a single ordinate. Thus, the complete incoming weight vector of an evolving layer neuron represents a single point in input space, and the complete evolving layer represents a collection of points that reflects the distribution in space of the training data. The temporal extensions to ECoS (specifically, the addition of recurrent neuron layers and connections as described in Section 4.9) allow them to capture some temporal properties of the data; however, the mechanisms by which this is done are not entirely understood.
4.11 Comparing ECoS with Constructive Algorithms

To support the investigation of Hypothesis One it is necessary to compare the new ECoS algorithm to existing constructive algorithms. This serves not only to illustrate the originality of the new algorithm, but also to identify methods of optimisation that can be adopted from existing algorithms. This section compares ECoS to each of the constructive algorithms presented in Chapter 3, identifying the similarities and differences with each, and identifying those elements that may be applied to ECoS.
4.11.1 Upstart Algorithm

The Upstart algorithm (Section 3.6) is a very old algorithm, and thus has very little in common with ECoS. Although both algorithms drive the addition of neurons by error measures, the similarities end there. As discussed in the evaluation in Section 3.6, Upstart is strictly limited to classification-type problems. The connections in Upstart are allowed to skip neuron layers, and the weight values cannot be updated after they are initially set. Upstart is limited to learning a single data set and cannot be further trained on new data. Finally, Upstart does not actually create ANN directly: it creates a tree of neurons and connections that must be transformed into an MLP.
4.11.2 Resource Allocating Networks

At first glance, the ECoS algorithm may seem very similar to the Resource-allocating Network (RAN) proposed by Platt (see Section 3.8). Indeed, the two networks do have many features in common. Both RAN and ECoS, during training, learn to encapsulate regions of input space in the network; that is, the neurons that are added to the network during training will each represent a specific region of the input space. Both algorithms will scale sublinearly with the number of training examples; that is, the number of neurons added will usually be less than the number of training examples presented. The addition of new neurons in both RAN and ECoS is based upon the novelty of each training example, and the neurons that are added will themselves represent these examples. Both algorithms use as one of these criteria the error of the network over the current training example, and in cases where a neuron is not added, the parameters (connection weights or neuron parameters) are adjusted in such a way as to optimise the performance of the network over the current training example. A newly added neuron will perfectly represent (memorise) the training example that triggered its addition, and simple gradient descent learning, such as that used in perceptron learning (Section 2.3), is used by both RAN and ECoS to adjust the connection weights. Finally, while RAN and ECoS both use a distance-based function to calculate the activation values of the growing neuron layer, the activation of the output neurons is based on a more conventional multiply-and-sum operation.

Despite these similarities, closer inspection reveals several important differences between the two algorithms. These differences are principally to do with the complexity of the algorithm and its (implied) intended means of application, as discussed in the evaluation in Section 3.8. Firstly, RAN uses Gaussian functions to explicitly represent a region of input space, where the region is defined by the parameters of the Gaussian functions in the growing layer of the network. Conversely, each neuron in the evolving layer of an ECoS network defines a point in input space, where the point is defined by the connection weight vector of that neuron. ECoS neurons do define regions in the input space, but they do so implicitly rather than explicitly.
Secondly, RAN performs an exponential post-processing on the output values of its units, and has a "bias" function attached to the output layer, which is adjusted to perform the function mapping. RAN is therefore a more complex system, as it has more parameters to adjust and requires more complex calculations.

Thirdly, ECoS training is based on the idea of a one-pass, continuous, life-long learning algorithm, whereas RAN is not. Specifically, RAN has a "resolution" parameter that determines how finely the RAN matches the function being learned. This parameter decays as learning progresses, which calls into question the use of RAN for continuous learning: decay implies that training is going to stop at some point. This means that RAN is unlikely to be useful in life-long learning applications. Finally, methods of optimising RAN have not been reported in the literature. This, combined with the differences described above, means that it will not be further considered in this work.
4.11.3 Evolutionary Nearest Neighbour MLP

The Evolutionary NN-MLP algorithm (see Section 3.9) also has several similarities to ECoS. The training algorithm combines the addition and removal of neurons with the adjustment of existing connection weights. The neurons inserted into NN-MLP also represent single points in the input space, and the activation of those neurons is based on a distance measure. The way in which a winning neuron in a NN-MLP inhibits the firing of other neurons in the network is very similar to the "One-of-N" recall method used by ECoS. Addition of neurons to the NN-MLP is driven by error, and it is capable of continuous learning, as ECoS is.

As discussed in the evaluation in Section 3.9, the NN-MLP algorithm is for classification tasks only. The class that a neuron represents is explicitly stored as a property of the neuron. Training is iterative in a NN-MLP, as opposed to the one-pass ECoS training algorithm, which suggests that a larger amount of time will be required to train a NN-MLP. Finally, the importance of each neuron is a property of the neuron that is adjusted during training, rather than determined by an objective external procedure. One interesting aspect of the NN-MLP algorithm is the way in which pruning is carried out: neurons are pruned based entirely upon their importance measure, where importance is measured as a function of the number of examples for which the neuron activates.
4.11.4 Growing Cell Structure Networks

The general trend in constructive neural networks seems to be towards ECoS-style systems. This is illustrated by a comparison with the Growing Cell Structure (GCS) network algorithm (Section 3.10), which has a large number of similarities with ECoS. Both algorithms are winner-take-all networks, where the activation of neurons is based on the distance between the neuron and the current input vector. GCS networks partition the input space into Voronoi regions, as an ECoS does, and the winning neuron during training is adjusted to be spatially nearer to the current example (see Chapter 5 for a discussion of the partitioning of input space by an ECoS network). Both algorithms allow for continuous, life-long learning.

The differences between the two algorithms are quite significant, however. Firstly, neurons in a GCS are connected together by 'edges'. Signal counters are attached to each neuron, and these counters are used to measure the performance and importance of each neuron. Rather than adding neurons when an example requires them, neurons are added after a set number of examples has been presented. Also, new neurons do not represent training examples: the connections of new neurons are set to the means of the two parent neurons, and the neurons are inserted with the goal of optimising the partitioning of the input space, rather than optimising the representation of the input data. Although both ECoS and GCS adjust the weights of the winning neuron, GCS will also adjust the weights of the winner's neighbours. This means that GCS is not a local learning algorithm, while ECoS is purely local. A major problem with GCS, as identified by the evaluation in Section 3.10, is the complexity of the algorithm: a large number of calculations must be made at each step, such as updating and tracking signal counters and calculating the local resources of each neuron. This creates a large computational requirement.
4.11.5 ZISC Networks As with RAN, at first glance ZISC (Section 3.11) looks very similar to ECoS. Both activate their neurons based on the distance between input vectors and neurons and both divide the input space into regions. There, however, the similarities end. The regions defined by ZISC are not Voronoi regions as they are with ECoS but bounded regions that may overlap with the regions of other neurons. This means that not only is it possible that no neuron at all will activate for an example, but also for several neurons to activate for the same example. This will require some form of conflict resolution strategy, but at present this situation is dealt with by simply labelling the example as unidentifiable. The final difference between ECoS and ZISC is that ZISC is, as discussed in the evaluation in Section 3.11, for classification only, much as GAL and NN-MLP are. The major contribution ZISC makes to this thesis is that it demonstrates that a constructive algorithm can be implemented in silicon - this means that in principle, perhaps ECoS could also be implemented in hardware, with
CHAPTER 4. EVOLVING CONNECTIONIST SYSTEMS
77
all of the speed advantages that this provides.
4.11.6 Grow and Learn Networks

The GAL algorithm (Section 3.12) is the most similar to the ECoS algorithm, and it is especially similar to the SECoS network. In both GAL and ECoS, the neurons use a distance-based activation function, where the neuron that is closest to the current example in input space is the only one that is allowed to fire. GAL, as with ECoS, partitions the input space into Voronoi regions, where each added neuron defines the kernel of a new region. The learning process in both algorithms deals with one example at a time, and both are capable of continuous, lifelong learning. The addition of neurons in both algorithms is driven by the error of the network over each training example, and when a neuron is added, its incoming connection weights are set to the input vector of the current training example. The "fine-tuning" of incoming connection weights in GAL is identical to the learning rule used in the first layer of adjustable connections in EFuNN and SECoS, where the intention of both is to modify the exemplar represented by the neuron into a prototype that represents several examples in a cluster. Interestingly, Alpaydin (Alpaydin, 1994, pg 407) explicitly states that "a large number of iterations will be necessary", while ECoS is touted as a one-pass algorithm.

The differences between the two algorithms are quite informative, however. Firstly, as described in Section 3.12, GAL was designed for classification applications only (Kwok and Yeung, 1999), even though extensions are suggested in (Alpaydin, 1994) that would allow it to learn function approximation problems. Because of this restriction, the connection weights in the hidden-to-output connection layer are used only as class labels. This means that there is no learning in the second layer: an example either belongs to a class or it does not; there is no middle ground in this application.
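The shared exemplar-to-prototype fine-tuning rule can be sketched as below. This is a minimal illustration of moving the winning neuron's weight vector towards the current example; the learning rate value is illustrative only, not a published setting of either algorithm.

```python
import numpy as np

def fine_tune(winner_weights, example, learning_rate=0.5):
    """Move the winning neuron's incoming weight vector towards the
    current example. Applied over many examples in a cluster, this
    turns a stored exemplar into a prototype of the cluster, which is
    the intent of both GAL fine-tuning and the first layer of
    adjustable connections in EFuNN and SECoS."""
    return winner_weights + learning_rate * (example - winner_weights)

w = np.array([0.2, 0.8])
w = fine_tune(w, np.array([0.4, 0.6]))
# with a learning rate of 0.5, w moves halfway towards the example
```

The difference Alpaydin notes is in how often this update is applied: GAL iterates it many times over the data, while ECoS applies it in a single pass.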
Although there is an equivalent to the error threshold parameter from ECoS training, there is no equivalent of the sensitivity threshold parameter. This may be because of the restriction of applications to classification. It would appear, however, that such a parameter would have added an additional level of 'certainty' to the knowledge captured by a GAL, as weakly firing neurons would be reinforced by new, strongly firing neurons, even when the weakly firing neuron activated for the correct class. It is worth noting that there is also no equivalent to the ManyOfN recall of the ECoS algorithm, which casts further doubt on the use of this recall strategy in ECoS. A final difference is the use of the 'top two' activated hidden neurons when determining the classification of an example. Intuitively, this would be a useful thing to adapt to ECoS, but it is not clear how it could be applied to function approximation tasks. The large number of similarities between GAL and ECoS suggests that some of the optimisation strategies used in GAL can be adapted to ECoS. The most promising such strategy is sleep learning. This adaptation is described and investigated in Subsection 7.4.2.
4.12 Applications of ECoS

ECoS networks, and EFuNNs in particular, have been applied to a wide variety of problem domains. These are typically problems where the dynamics of the application domain are changing, as this makes the most use of
the adaptation capability of ECoS.

Speech recognition is a challenging problem for many reasons, not least of which is the variation between speakers. The task is to identify either a complete word, or a segment of a word (such as a phoneme), from an acoustic signal. In (Kasabov, 1999) EFuNNs were applied to spoken phoneme recognition. EFuNNs were again applied to this problem in (Watts, 1999b), and were shown to be both accurate and adaptable. The efficacy of SECoS networks in this application area was demonstrated in (Watts and Kasabov, 2000), where a comparison with MLPs showed that they were faster to train, more adaptable and more accurate than the traditional ANN. A further application in speech recognition is whole word recognition: EFuNNs were used in (Ghobakhlou et al., 2000), while SECoS as proposed in (Watts, 1999a; Watts and Kasabov, 2000) were adopted in (Ghobakhlou and Seesink, 2001). What is notable about (Ghobakhlou et al., 2000) is the application of an output space expansion algorithm (see Section 4.8) to add additional words to the EFuNN. An extensive investigation of ECoS networks in phoneme recognition is presented in Chapter 8.

Economic and financial data is a rich application area for connectionist systems (Widrow et al., 1994). EFuNN networks have been applied to predicting the SE40 New Zealand stock market index in (Kasabov and Fedrizzi, 1999) and (Wang, 2001). In (Kasabov and Fedrizzi, 1999) the task was to predict the SE40 index based on three variables: the change in the current value, the change in the ten day moving average, and the change in the sixty day moving average. Comparison of the results of EFuNN with a FuNN showed that EFuNN was both more accurate and more adaptable than FuNN.

An unusual application was classifying motion vector patterns in an MPEG video stream (Koprinska and Kasabov, 1999).
The task was to classify a frame from a compressed video stream as one of six classes, according to the change in the image: static, where the image is not moving; panning, where the camera is rotating about its vertical axis; zooming, where the camera and object are stationary but the focal length is changing; object motion, where the object in frame is moving; tracking, where a moving object is being tracked by the camera; and dissolve, which is a gradual transition between two sequences. This work compared the performance of EFuNN against the LVQ algorithm (Kohonen, 1997), as well as analysing the effect of varying the number of membership functions on performance.

An example of a biomedical application of EFuNN is (Leichter et al., 2001). Here, an EFuNN was used to classify the stimulus received by a subject based on the electrical activity of the brain, as read by an EEG. The four stimuli were visual, auditory, mixed visual and auditory, and no stimulus.

Horticultural applications were the subject of (Woodford, 2001). Two problems were dealt with here: classification of pest damage on apple tree leaves, and identification of persimmon genotypes. The first problem involved presenting full colour images to an EFuNN, which then had to identify which of three insect pests - Codling Moth, Leafroller and Appleleaf Curling Midge - had caused the damage in the image. The second problem involved identifying which of six cloned persimmons an infrared spectrum of the plant's wax had been taken from. The paper's main thrust was a comparison between EFuNNs and Support Vector Machines (SVM) (Cortes and Vapnik, 1995). The findings were that for the pest damage problem EFuNNs were more accurate over both training and testing sets. For the persimmon genotype problem SVM was slightly more accurate, but EFuNN was competitive.
4.13 Experiments with ECoS Networks over the Benchmark Data Sets

There are three goals to the experiments reported in this section:

1. To compare the performance of ECoS networks to existing algorithms.
2. To evaluate Hypothesis Two of the thesis, according to the criteria specified for that hypothesis.
3. To provide points of comparison for the algorithms introduced later in this thesis.

The first goal involved comparing the performance of the ECoS networks with the prior results from Section 2.7. MLP and SECoS were compared, as were FuNN and EFuNN. It was considered inappropriate to compare MLP and EFuNN, or FuNN and SECoS: nothing would be gained from these comparisons, because SECoS is the ECoS "version" of MLP, while EFuNN is the ECoS "version" of FuNN. Comparisons outside of those pairings would not be informative.

Hypothesis Two (Section 1.2) is a major goal of this thesis. Thus, the criteria for evaluating the success of Hypothesis Two bear repeating, as follows:

1. The simplified ECoS exhibits levels of memorisation of the training data similar to EFuNN.
2. The simplified ECoS exhibits levels of generalisation over previously unseen data that are similar to, or better than, those of EFuNN.
3. The simplified ECoS is able to adapt to new training data, without forgetting previously seen examples, to a similar degree as EFuNN.
4. The simplified ECoS is of a similar or smaller size than EFuNN.
5. The simplified ECoS can be applied to the same kinds of problems as EFuNN, that is, it is as flexible as EFuNN.

This hypothesis is important because SECoS is an original contribution of this thesis. The criteria are designed to comprehensively compare aspects of SECoS and EFuNN. Critically, if SECoS is consistently inferior to EFuNN in terms of the above criteria, then Hypothesis Two of the thesis will not be supported.
If SECoS is competitive with EFuNN, in terms of accuracies that are equal to or better than EFuNN, or sizes that are equal to or smaller than EFuNN, then the hypothesis will have been supported.

The experimental methodology used was described in Section 2.7. Ten-fold cross-validation was again used. The only difference was that only a single run was performed over each fold of the data, because ECoS networks start with no hidden neurons and the training algorithm is completely deterministic. A common set of training parameters was used, as listed in Table 4.2. These parameters are the same as those used in (Watts, 1999a; Watts and Kasabov, 2000), where it was shown empirically that they produce ECoS networks that balance network size and accuracy. Ultimately, though, it is not important whether the parameters are optimal or not. What is important is that, when evaluating and comparing different algorithms, the parameters used are held constant across the different experiments. If the parameters are held constant, then any differences in the size and accuracy of networks produced by different algorithms are
due to the algorithms themselves, rather than variations in parameters. This is standard procedure in scientific experiments, where the number of variable conditions in any set of experiments is held to a minimum¹. Other methods, such as training SECoS and EFuNN networks to the same mean level of accuracy, or to the same mean number of neurons, were rejected as impractical: there is no simple method for selecting the parameters necessary to achieve either of these goals. Also, this would not achieve the goal of these experiments, which is to compare the two algorithms SECoS and EFuNN. The mean and standard deviation of the performance measures across each data subset were measured and are presented in the subsections below. As only a single experimental run was performed over each fold of the data, there was no need to evaluate the approximate variance of the results, as was the case in Section 2.7.

Error threshold         0.1
Sensitivity threshold   0.5
Learning rate one       0.5
Learning rate two       0.5

Table 4.2: ECoS training parameters for benchmark data.

The hypotheses listed in Table 4.3 were used to investigate the first of the goals of this section, that is, the comparison of SECoS and EFuNN. These comparisons are in terms of the memorisation accuracy, the generalisation accuracy and the sizes of the networks, and investigate the first, second and fourth criteria listed above. The fifth criterion is investigated by applying both SECoS and EFuNN across all of the benchmark data sets: if there are no differences between EFuNN and SECoS across all benchmark sets, then the criterion will have been met. For those hypotheses that are concerned with the accuracies of each network over the data sets, unpaired, two-tailed t-tests were used. For hypotheses AN and BN, one-tailed t-tests were used.

Hypothesis  AA                 AB                 AC                 AF                 AN
H0          a_s^aa = a_e^aa    a_s^ab = a_e^ab    a_s^ac = a_e^ac    a_s^af = a_e^af    a_s^an = a_e^an
H1          a_s^aa ≠ a_e^aa    a_s^ab ≠ a_e^ab    a_s^ac ≠ a_e^ac    a_s^af ≠ a_e^af    a_s^an < a_e^an

Hypothesis  BA                 BB                 BC                 BF                 BN
H0          a_s^ba = a_e^ba    a_s^bb = a_e^bb    a_s^bc = a_e^bc    a_s^bf = a_e^bf    a_s^bn = a_e^bn
H1          a_s^ba ≠ a_e^ba    a_s^bb ≠ a_e^bb    a_s^bc ≠ a_e^bc    a_s^bf ≠ a_e^bf    a_s^bn < a_e^bn

Table 4.3: Statistical hypotheses for comparing EFuNN and SECoS.

The performance of the SECoS and EFuNN networks was also compared to that of the MLP and FuNN networks discussed in Section 2.7. The statistical hypotheses for these comparisons are presented in Tables 4.4 and 4.5, respectively.
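The single-run-per-fold methodology can be sketched as follows. The `train` and `test` callables here are hypothetical stand-ins for the actual training and recall procedures, used purely to show the shape of the evaluation loop.

```python
import statistics

def evaluate_over_folds(folds, train, test):
    """Run one training run per fold and summarise the performance
    measure as mean and standard deviation. A single run per fold
    suffices when training starts from an empty evolving layer and is
    completely deterministic, as ECoS training is."""
    scores = []
    for train_data, test_data in folds:
        network = train(train_data)               # hypothetical trainer
        scores.append(test(network, test_data))   # hypothetical scorer
    return statistics.mean(scores), statistics.stdev(scores)

# Toy illustration: a trivial 'network' and invented fold scores
folds = [(None, 1.0), (None, 2.0), (None, 3.0)]
mean, sd = evaluate_over_folds(folds, train=lambda d: d,
                               test=lambda net, t: t)
```

Because every run is deterministic, the spread reported is the variation across folds of the data, not across random initialisations.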
The performance of SECoS was also compared to that of EFuNN; the statistical hypotheses for these comparisons are listed in Table 4.3. In these tables, the first superscript denotes the data set the network was trained with, and the second the data set the network was recalled with. Where the second superscript is n, the variable denotes the number of neurons in the evolving layer of the network. The subscript denotes the type of network, where m represents MLP, fn represents FuNN, e represents EFuNN and s represents SECoS. For example, a_s^ab is the mean accuracy of SECoS networks trained over data set A and recalled with set B, while a_e^an denotes the mean number of neurons in the EFuNN networks after training on data set A.

¹The problems inherent in selecting the values of ECoS training parameters are described in greater detail in Chapter 5, and a solution is presented in Subsection 7.3.2.

Hypothesis  AA                 AB                 AC                 AF
H0          a_m^aa = a_s^aa    a_m^ab = a_s^ab    a_m^ac = a_s^ac    a_m^af = a_s^af
H1          a_m^aa ≠ a_s^aa    a_m^ab ≠ a_s^ab    a_m^ac ≠ a_s^ac    a_m^af ≠ a_s^af

Hypothesis  BA                 BB                 BC                 BF
H0          a_m^ba = a_s^ba    a_m^bb = a_s^bb    a_m^bc = a_s^bc    a_m^bf = a_s^bf
H1          a_m^ba ≠ a_s^ba    a_m^bb ≠ a_s^bb    a_m^bc ≠ a_s^bc    a_m^bf ≠ a_s^bf

Table 4.4: Statistical hypotheses for comparing MLP and SECoS.

Hypothesis  AA                  AB                  AC                  AF
H0          a_fn^aa = a_e^aa    a_fn^ab = a_e^ab    a_fn^ac = a_e^ac    a_fn^af = a_e^af
H1          a_fn^aa ≠ a_e^aa    a_fn^ab ≠ a_e^ab    a_fn^ac ≠ a_e^ac    a_fn^af ≠ a_e^af

Hypothesis  BA                  BB                  BC                  BF
H0          a_fn^ba = a_e^ba    a_fn^bb = a_e^bb    a_fn^bc = a_e^bc    a_fn^bf = a_e^bf
H1          a_fn^ba ≠ a_e^ba    a_fn^bb ≠ a_e^bb    a_fn^bc ≠ a_e^bc    a_fn^bf ≠ a_e^bf

Table 4.5: Statistical hypotheses for comparing FuNN and EFuNN.

The comparisons of MLP versus SECoS, and FuNN versus EFuNN, were made using an unpaired t-test with pooled variance. The t statistic was calculated according to Equation 4.9.
t = (x̄ - x̄_e) / √(P/n₁ + P/n₂)   (4.9)

Where:

x̄ is the mean measure of the ECoS networks,
x̄_e is the mean measure of the MLP or FuNN networks,
n₁ is the number of trials over the ECoS networks (in these experiments, n₁ = 10),
n₂ is the number of trials over the MLP or FuNN networks (in these experiments, n₂ = 1000), and
P is the pooled variance, calculated according to Equation 4.10:

P = ((n₁ - 1)σ² + (n₂ - 1)E²) / (n₁ + n₂ - 2)   (4.10)

Where:

σ is the standard deviation over the ECoS networks, and
E is the approximate variance of the MLP or FuNN networks, calculated according to Equation 2.8.
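Equations 4.9 and 4.10 translate directly into code. The sketch below uses invented means and deviations purely for illustration; only the trial counts (10 ECoS runs against 1000 runs of the fixed-architecture network) are taken from these experiments.

```python
from math import sqrt

def pooled_t(mean_ecos, sd_ecos, n1, mean_other, e_other, n2):
    """Unpaired t statistic with pooled variance, as in Equations
    4.9 and 4.10, for comparing n1 ECoS trials against n2 trials of
    an MLP or FuNN network."""
    # Equation 4.10: pooled variance P
    p = ((n1 - 1) * sd_ecos**2 + (n2 - 1) * e_other**2) / (n1 + n2 - 2)
    # Equation 4.9: t statistic
    return (mean_ecos - mean_other) / sqrt(p / n1 + p / n2)

# Illustrative (invented) summary statistics: 10 ECoS trials with
# mean 85.3 and SD 17.2, versus 1000 trials with mean 80.0, SD 15.0
t = pooled_t(85.3, 17.2, 10, 80.0, 15.0, 1000)
```

The resulting t value is then compared against the critical value for n₁ + n₂ - 2 degrees of freedom, two-tailed for the accuracy hypotheses and one-tailed for the size hypotheses AN and BN.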
Hypothesis  δA               δB               δC               δF
H0          δ_s^a = δ_e^a    δ_s^b = δ_e^b    δ_s^c = δ_e^c    δ_s^f = δ_e^f
H1          δ_s^a ≠ δ_e^a    δ_s^b ≠ δ_e^b    δ_s^c ≠ δ_e^c    δ_s^f ≠ δ_e^f

Table 4.6: Statistical hypotheses for comparing changes in accuracy of SECoS and EFuNN.

Hypothesis  δA               δB               δC               δF
H0          δ_m^a = δ_s^a    δ_m^b = δ_s^b    δ_m^c = δ_s^c    δ_m^f = δ_s^f
H1          δ_m^a ≠ δ_s^a    δ_m^b ≠ δ_s^b    δ_m^c ≠ δ_s^c    δ_m^f ≠ δ_s^f

Table 4.7: Statistical hypotheses for comparing changes in accuracy of MLP and SECoS.

Also of interest is a comparison of the degree of forgetting, and the degree of adaptation, exhibited by the networks. The changes in accuracy over each data set were calculated for each network. The forgetting and adaptation of SECoS and EFuNN networks were evaluated according to the hypotheses in Table 2.2, which are reproduced here as Table 4.9. For the comparison of the performance of the SECoS and EFuNN networks, the hypotheses listed in Table 4.6 were tested. These tests are used to evaluate the third criterion listed above, that is, that SECoS forgets to a similar degree as EFuNN. Unpaired, two-tailed t-tests were used. For the comparison of MLP and SECoS networks, the hypotheses listed in Table 4.7 were tested, and for the comparison of FuNN and EFuNN networks, the hypotheses listed in Table 4.8 were tested. Again, unpaired t-tests using pooled variance were carried out. For reasons of space, the results of all statistical hypothesis tests are presented in Appendix B.
4.13.1 Two Spirals

Both the SECoS and EFuNN networks used in these experiments had two input neurons and one output neuron. The EFuNN had three membership functions attached to each input and output. In other words, the architectures of the SECoS and EFuNN networks were as close as possible to the architectures of the MLP and FuNN networks used previously (Subsection 2.7.2). The results of the experiments, in terms of the percentage of examples correctly classified, are presented in Table 4.10. Tests of the hypotheses in Table 4.3 are necessary to make clear any significant differences in the performances of the two algorithms. The results of these tests are shown in Table B.1.
Hypothesis  δA                δB                δC                δF
H0          δ_fn^a = δ_e^a    δ_fn^b = δ_e^b    δ_fn^c = δ_e^c    δ_fn^f = δ_e^f
H1          δ_fn^a ≠ δ_e^a    δ_fn^b ≠ δ_e^b    δ_fn^c ≠ δ_e^c    δ_fn^f ≠ δ_e^f

Table 4.8: Statistical hypotheses for comparing changes in accuracy of FuNN and EFuNN.
Hypothesis  δA             δB             δC             δF
H0          a^aa = a^ba    a^ab = a^bb    a^ac = a^bc    a^af = a^bf
H1          a^aa ≠ a^ba    a^ab ≠ a^bb    a^ac ≠ a^bc    a^af ≠ a^bf

Table 4.9: Statistical hypotheses for evaluating changes in accuracy after further training.

            Trained on Set A                                       Trained on Set B
Recall Set  A          B          C          All        Neurons    A          B          C          All        Neurons
SECoS       50.2/9.3   57.4/23.4  11.3/14.7  47.1/8.0   6.0/2.7    44.2/11.7  85.3/17.2  54.2/26.4  49.4/10.2  6.4/3.7
EFuNN       76.6/11.9  21.1/22.9  17.9/20.1  65.1/7.7   52.1/20.3  69.6/11.8  89.9/18.9  17.2/20.3  66.3/7.4   59.1/14.2

Table 4.10: Mean percent correct / standard deviation (to 1 d.p.) for the two spirals problem.

Discussion

The results in Tables 2.5, 4.10 and B.2 show that after initial training on Set A, the MLP recalled Set A significantly better than SECoS. SECoS, however, generalised to Set B better than MLP. Contrasting this result is the fact that MLP generalised to Set C better than SECoS. The overall accuracy, however, differs only at the 95% level of significance. This suggests that the performance differences between the two algorithms were not great for this data set. Comparing the two networks after further training on Set B (Table B.2) shows that the MLP had learned the additional data better, but had forgotten more than the SECoS had. After the additional training, both networks were able to generalise better over Set C. Most importantly, after further training, there was no significant difference between the two network algorithms over the full data set. Additionally, the SECoS was much smaller than the MLP: while the MLP used forty hidden neurons, the SECoS networks had an average of only 6.4 neurons. This makes SECoS much more efficient than MLP for this problem.

Tables 2.5, 4.10 and B.3 show that after initial training on Set A, the EFuNN learned the training data significantly better than FuNN. This high performance gave it a significantly better performance over the entire data set, despite it being significantly less accurate over Sets B and C. Comparing FuNN and EFuNN after additional training on Set B, in Table B.3, the EFuNN's performance over Set B increased markedly. Table B.5 shows that EFuNN forgot Set A less than the FuNN did. After further training, EFuNN was able to generalise to Set C significantly better than FuNN. Training of the EFuNN did not result in significantly more evolving layer neurons than were used in the hidden layer of FuNN.
The results in Tables 4.10 and B.1 demonstrate that EFuNN was able to learn the initial training data to a significantly better degree than SECoS. This high accuracy over Set A led to a significantly higher accuracy over the entire data set. On the other hand, SECoS was able to generalise to Set B significantly better than EFuNN, while both generalised to Set C to the same degree (that is, there was no significant difference between the two accuracies over Set C). Tables 4.10 and B.6 show that after further training on Set B, both networks forgot to the same degree, and both learned Set B equally well: there was no significant difference between the accuracies over Set B after further training on Set B. The higher initial accuracy of EFuNN over Set A, however, meant that EFuNN was still significantly more accurate over the entire data set, even after further training. In terms of size, SECoS was significantly smaller, and thus more efficient, than EFuNN.

Overall, the comparison of SECoS and EFuNN is similar to the comparison of MLP and FuNN. In both cases, the fuzzy network was able to learn the training data better, but the non-fuzzy network was able to generalise better. Also, the performance of most of the networks was often less than chance. This performance is much less than that produced by GCS, as reported in (Bruske and Sommer, 1995a), where a classification accuracy of over 99% was reported. However, (Bruske and Sommer, 1995a) used a different experimental setup, with only one training and testing set. The GCS model created also had a much greater number of neurons and took a greater number of cycles to train. Given how hard a problem the two spirals data set is, this is not surprising. It demonstrates again that no one algorithm is superior across all problems.

            Trained on Set A                                     Trained on Set B
Recall Set  A          B          C          All       Neurons    A          B          C          All       Neurons
SECoS       97.8/1.6   94.7/6.9   93.3/7.0   97.1/1.5  24.8/2.1   97.5/1.3   100.0/0.0  92.0/7.6   97.2/1.7  26.1/2.4
EFuNN       97.3/1.4   93.3/7.7   94.0/5.8   96.5/0.7  36.0/1.8   97.2/1.2   98.0/3.2   94.0/6.6   96.9/0.6  37.1/1.7

Table 4.11: Mean percent correct / standard deviation (to 1 d.p.) for the iris classification problem.
4.13.2 Iris Classification

Both the SECoS and EFuNN used in these experiments had four input neurons and three output neurons. The EFuNNs used in these experiments had three MF attached to each input and three attached to each output. The mean accuracies of the SECoS and EFuNN networks are presented in Table 4.11, along with the mean number of neurons present in the evolving layer of each network.

Discussion

The results in Table 4.11 show that both SECoS and EFuNN were able to learn the problem to a high level of accuracy. The results of testing the statistical hypotheses in Table 4.3, as presented in Table B.9, show that there were no significant differences in the accuracies of SECoS and EFuNN over this data set. There were, however, significant differences in the size of the networks, with SECoS being significantly smaller than the equivalent EFuNN. Comparing MLP and SECoS using the hypotheses in Table 4.4 shows that both types of network were able to learn the training data to the same level of accuracy: there was no significant difference in the accuracies over the training set. While the generalisation accuracies over Sets B and C were significantly lower for SECoS, there was no significant difference over the entire data set. Things were different after further training on Set B,
however. SECoS adapted to the new data significantly better than MLP did: the accuracy of SECoS over Set B was significantly better than that of MLP. SECoS also forgot less than MLP did: the accuracy of SECoS over Set A was significantly better than that of MLP. The generalisation accuracy of SECoS over Set C had, however, decreased. Overall, while there was a significant difference over the entire data set after further training at the 95% confidence level, there was no such difference at 99%. The SECoS networks were significantly larger than the MLPs.

The results of the comparison between FuNN and EFuNN are presented in Table B.11. These results show that EFuNN was less accurate than FuNN at all stages of the experiment and across all data subsets. After further training of the networks, however, EFuNN learned Set B better than FuNN. The EFuNN networks were, on average, significantly larger than the FuNNs. Even though EFuNN was significantly less accurate than FuNN, its performance was still quite high. For an easy problem like iris classification, this is not surprising.

The level of forgetting and adaptation of SECoS was investigated by testing the hypotheses in Table 2.2, with the results presented in Table B.12. These results show that there were no significant levels of forgetting evident in SECoS. This is in contrast to MLP, which did exhibit a significant level of forgetting. Although the level of change over Set B was significant at the 95% level, it was not significant at the 99% level. Thus, while there was some adaptation exhibited, it was not a highly significant amount. Generalisation accuracy over Set C did not significantly alter after further training. The level of forgetting and adaptation of EFuNN was investigated by testing the hypotheses in Table 2.2. The results of these tests are displayed in Table B.13. These results show that, as with SECoS and MLP, there were no significant levels of forgetting after further training on Set B.
Again, while the accuracy over Set B increased, the increase was not highly significant. Nor did the generalisation accuracy over Set C change significantly. The changes in accuracy of SECoS were compared to those of EFuNN by testing the hypotheses in Table 4.6, with the results presented in Table B.14. These results show that there were no significant differences in the amount of forgetting or adaptation exhibited by either SECoS or EFuNN. Of particular interest is a comparison of the degree of forgetting and adaptation exhibited by MLP with that exhibited by SECoS. This was done by testing the hypotheses listed in Table 4.7. The results listed in Table B.15 clearly show that MLP forgot more than SECoS, and that SECoS adapted better than MLP to the new data. Generalisation over Set C also decreased for MLP, while it increased for SECoS. Performing similar tests for FuNN versus EFuNN, via evaluation of the hypotheses in Table 4.8, yielded the results presented in Table B.16. As with MLP, the FuNN forgot more than EFuNN, while EFuNN adapted more to the new data. Generalisation over Set C also improved more for EFuNN than it did for FuNN.

The overall conclusions for this benchmark data set are that while the fixed-architecture, backpropagation-trained networks were able to learn the data set better than the ECoS networks, the ECoS networks adapted better to new data. Both SECoS and EFuNN networks performed equally well, although the SECoS networks were significantly smaller than the equivalent EFuNNs.
4.13.3 Mackey-Glass

The SECoS and EFuNN used in these experiments each had four input neurons and one output neuron. The EFuNNs used in this experiment had five MF attached to each input and output.
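For reference, the Mackey-Glass benchmark series is generated from a delay differential equation. The sketch below uses the commonly cited benchmark parameters (tau = 17, a = 0.2, b = 0.1) and a unit Euler step; these settings are assumptions here, not necessarily the exact generation settings used for the thesis's data set.

```python
def mackey_glass(n, tau=17, a=0.2, b=0.1, x0=1.2):
    """Euler integration (step size 1) of the Mackey-Glass delay
    equation dx/dt = a*x(t - tau) / (1 + x(t - tau)**10) - b*x(t).
    The parameter values are common benchmark settings, assumed
    rather than taken from this thesis."""
    history = [x0] * (tau + 1)  # constant initial history
    series = []
    for _ in range(n):
        x, x_tau = history[-1], history[-(tau + 1)]  # x(t), x(t - tau)
        history.append(x + a * x_tau / (1.0 + x_tau ** 10) - b * x)
        series.append(history[-1])
    return series

series = mackey_glass(500)
```

The delayed feedback term makes the series chaotic, which is why past values of the series are used as the network inputs when predicting future values.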
            Trained on Set A                                        Trained on Set B
Recall Set  A          B          C          All        Neurons     A          B            C          All        Neurons
SECoS       13/0.8     15/2.13    15/2.944   13/0.934   60.0/2.1    13/1.355   6.667/0.793  15/2.116   13/1.088   63.0/2.5
EFuNN       22/0.53    13/1.3     12/1.67    11/0.363   267.8/6.4   11/0.383   9.319/1.024  12/1.54    11/0.272   282.4/4.5

Table 4.12: Average mean squared error / standard deviation (×10⁻⁴) for the Mackey-Glass problem.
The mean accuracies, in this case the mean squared errors, as well as the average number of neurons present in the network, are presented in Table 4.12. As before, the accuracies are presented as the mantissa of base ten numbers raised to the negative fourth power.

Discussion

Table 4.12 presents the accuracies of the SECoS and EFuNN trained on this data set. The results of testing the hypotheses in Table 4.3 are presented in Table B.17, which shows that there were several differences between the performance of SECoS and EFuNN over the Mackey-Glass data set. While SECoS learned Set A better than EFuNN, it generalised across Sets B and C worse than EFuNN. SECoS was, however, significantly smaller than EFuNN. Comparing SECoS with MLP, by evaluating the hypotheses in Table 4.4, produced the results presented in Table B.18. These results show that there were no significant differences between the two types of network. Similar results are found in Table B.19, which presents the comparison of FuNN with EFuNN (Table 4.5). Again, there were no significant differences between the two network types. This suggests that the differences in performance between MLP and FuNN (Table A.9), and between SECoS and EFuNN, were due to the fuzzy logic elements embedded in FuNN and EFuNN: in both cases, the networks that lack those elements (MLP and SECoS) performed significantly differently from FuNN and EFuNN.

The results of testing the hypotheses listed in Table 2.2 for SECoS are presented in Table B.20. These results show the amount of forgetting and adaptation exhibited by SECoS after further training. From the results, it can be seen that while there was some forgetting, it was not highly significant. SECoS did adapt to the new data well, with a significant decrease in error being apparent. The generalisation error of SECoS did not change significantly after further training. Performing a similar test over the EFuNN results yielded the results in Table B.21.
These results are similar to those for SECoS, in that the amount of forgetting was not highly significant, while the error over the additional training set did decrease by a highly significant amount. The generalisation error did not change significantly. The hypotheses in Table 4.6 were then tested, yielding the results presented in Table B.22. These results show that while both networks forgot to the same small degree, the change in accuracy of SECoS was very significantly greater than that of EFuNN. The changes in generalisation accuracies were not significantly different, which is to be expected, as the results above show that generalisation accuracies were not altered by further training.

The results of testing the hypotheses in Table 4.7 are presented in Table B.23. These results show that the changes exhibited by MLP were not significantly different to those exhibited by SECoS. In other words, while MLP forgot and SECoS did not, the lower initial error and greater variance of results for the MLP compensated for this. Finally, a comparison of the change in accuracy of FuNN to the change in accuracy of EFuNN was carried out, by evaluating the hypotheses in Table 4.8. The results of these tests are presented in Table B.24. These results mirror those for MLP and SECoS above: there were no significant differences in the changes in accuracy of FuNN and EFuNN after further training. While FuNN forgot, and EFuNN did not, the lower error and greater variance of the FuNN results compensated for this.

In conclusion, the ECoS networks exhibited better resistance to forgetting, and better adaptation, than the fixed-architecture, backpropagation-trained networks. The ECoS networks were, however, often less accurate than the fixed-architecture networks, and tended to be much larger.

            Trained on Set A                                                Trained on Set B
Recall Set  A            B            C            All          Neurons     A            B            C            All          Neurons
SECoS       0.458/0.043  0.615/0.091  0.583/0.207  0.486/0.035  26.8/3.9    0.536/0.104  0.227/0.068  0.629/0.192  0.514/0.093  28.1/4.3
EFuNN       0.45/0.049   0.801/0.212  0.847/0.25   0.525/0.066  107.8/3.6   0.478/0.04   0.337/0.05   0.835/0.215  0.499/0.045  118.2/4.0

Table 4.13: Average mean squared error / standard deviation (to 3 d.p.) for the gas furnace problem.
4.13.4 Gas Furnace

The SECoS and EFuNN used in these experiments both had two input neurons and one output neuron. The EFuNNs used in this experiment had five MF attached to each input and five attached to each output. The mean accuracies of the SECoS and EFuNN networks, as well as the mean number of neurons present in each network, are presented in Table 4.13.

Discussion

Table 4.13 presents the accuracies of the SECoS and EFuNN tested over this data set. The results of testing the hypotheses listed in Table 4.3 are presented in Table B.25. This comparison shows that, for the most part, there were no differences in the accuracies of SECoS and EFuNN after initial training. While there were some differences over Sets B and C that were significant at the 95% level, they were not significant at the 99% level: the differences were significant, but not highly significant. After further training on Set B, some highly significant differences did become apparent. While the accuracies over Set A did not differ significantly, the accuracies of SECoS over Sets B and C were highly significantly lower than the corresponding accuracies of EFuNN. At all times, SECoS was significantly smaller than the equivalent EFuNN. Evaluating the hypotheses listed in Table 4.4 yielded the results presented in Table B.26. These results show that the error of the MLP was highly significantly less than that of the SECoS, with only one exception: the
CHAPTER 4. EVOLVING CONNECTIONIST SYSTEMS
accuracy over Set B after further training. This shows that SECoS was able to adapt very well to the new data. It was not, however, able to learn the initial data as well, nor was it able to generalise as well as the MLP. To compare the FuNN and EFuNN networks, the hypotheses in Table 4.5 were investigated. The results of these tests are presented in Table B.27. These results show that the performance of FuNN was much closer to EFuNN than the performance of MLP was to SECoS. After initial training, there were no significant differences in accuracy between FuNN and EFuNN: only after further training did differences manifest themselves. FuNN forgot Set A severely after further training, while the forgetting of EFuNN was much less. The accuracy of EFuNN over Set B was not significantly different, but the accuracies over Set C were, with the error of EFuNN being very much lower. The high error of FuNN after further training accounts for the significant difference over the entire data set. The level of forgetting and adaptation shown by SECoS was investigated by testing the hypotheses in Table 2.2, the results of which are in Table B.28. Although there was some significant forgetting evident, the forgetting was not highly significant, that is, the accuracies over Set A were not significantly different at the 99% level of confidence. The error over Set B improved significantly after additional training, that is, the SECoS adapted well to the additional data. Finally, the generalisation error was not significantly affected by the further training. Results of a similar test for EFuNN are presented in Table B.29. These results mirror those of SECoS, with the EFuNN adapting well to the additional training data. In order to compare the forgetting and adaptation of SECoS and EFuNN, the hypotheses in Table 4.6 were tested. The results of these tests are presented in Table B.30.
These results show that the two networks forgot and adapted to a similar degree after further training on Set B. Comparing the forgetting and adaptation of MLP and SECoS was done by evaluating the hypotheses listed in Table 4.7, which yielded the results presented in Table B.31. These results show that the SECoS actually forgot more than the MLP did. However, this is due to the SECoS having a much higher initial error. Proportionally, the SECoS forgot less than the MLP. The SECoS did adapt to the additional training data much better than the MLP did. The changes in generalisation accuracy were not significantly different. Finally, a comparison of the forgetting and adaptation of FuNN and EFuNN was carried out, using the hypotheses in Table 4.8. The outcomes of these tests are listed in Table B.32. From these results it is apparent that only the accuracy over Set B changed to a significantly different degree: the EFuNN adapted much better than the FuNN did, as it exhibited a much greater decrease in error than the FuNN. Overall, for the gas furnace data set, the ECoS networks did not perform as well in terms of absolute error as the fixed-architecture, backpropagation trained networks did. However, the ECoS networks once again adapted to new training data much better, with a lower level of forgetting and a greater level of adaptation than either MLP or FuNN.
4.13.5 Benchmark Conclusions

These experiments were designed to assess four things: learning accuracy, generalisation accuracy, adaptation and forgetting. The efficiency of the networks, as measured by the number of neurons present after training, was also of interest. Overall, SECoS networks have been shown to be more efficient, as they were much smaller than EFuNN networks trained on the same data sets with the same training parameters. Due to their simpler structure,
specifically the lack of fuzzification and defuzzification elements, they also had far fewer connections. SECoS were shown to generalise better across the iris classification and gas furnace data sets. EFuNN learned the initial training set and generalised better than SECoS for the Mackey-Glass data set, but SECoS learned the additional training data better for all benchmark data sets. However, this adaptability of SECoS came at the cost of memorisation. In all cases, SECoS forgot the initial training set to a greater degree than EFuNN. Although both SECoS and EFuNN forgot previous data after further training, the level of forgetting does not approach catastrophic levels by any reasonable definition of the word. The No Free Lunch theorem (Wolpert and Macready, 1995) establishes that no one algorithm will be superior over all applications. The results of these experiments with benchmark data sets confirm this for the case of SECoS and EFuNN ECoS networks. While SECoS performed better than EFuNN across the iris and gas furnace benchmark data sets, they performed slightly worse for the Mackey-Glass data. When EFuNN did out-perform SECoS, however, they did so at a cost of a greater degree of complexity and consequent lower efficiency.
4.14 Problems with the ECoS Model

This chapter has identified several problems with ECoS networks. The first is the size to which a network can grow during training. The second is the difficulty of optimising the four training parameters, which is compounded by the fact that the four parameters interact with one another (see Chapter 5 for a formalisation of the ways in which ECoS training parameters are interrelated). The first problem, the size of the network, is fairly straightforward. The more neurons there are in the evolving layer of an ECoS network, the more slowly the network will recall and the more slowly it will respond to new examples (that is, the more slowly it will train). This is simply due to the large number of calculations that must be performed for large networks: the distance between the current input vector and the neuron weight vector must be calculated for each and every evolving layer neuron. The memory requirements for storing the neurons and large connection matrices are also of concern. Problems with generalisation and overtraining can also be expected if the network grows to be too large. If the number of neurons is a significant fraction of the number of training examples seen by the ECoS, then it could not be expected to generalise well to new examples. This has been shown experimentally (Watts, 1999b).
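The recall cost described above can be made concrete with a small sketch of a SECoS-style forward pass. This is a hypothetical, minimal implementation: the function name, the use of a normalised Manhattan distance, and the simple winner-take-all output propagation are illustrative assumptions, not the actual thesis code.

```python
import numpy as np

def secos_recall(x, W_in, W_out):
    """One forward pass through a minimal SECoS-style network.

    W_in  : (m, n) incoming weights; each row is the point in input space
            defined by one evolving-layer neuron.
    W_out : (m, k) outgoing weights from the evolving layer to the outputs.
    """
    m, n = W_in.shape
    # The normalised distance must be computed for each and every
    # evolving-layer neuron -- this O(m) work is why recall slows
    # as the network grows.
    dists = np.abs(W_in - x).sum(axis=1) / n   # normalised Manhattan distance
    acts = 1.0 - dists                         # distance-based activation
    winner = int(np.argmax(acts))              # one-of-n (winner-take-all)
    return acts[winner] * W_out[winner], winner
```

Every neuron added during training appends one row to `W_in` and `W_out`, and hence one more distance computation per recall, which is the source of the slowdown noted above.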
4.15 Conclusions

Comparisons between ECoS and selected constructive ANN algorithms were presented in Section 4.11, where similarities and differences, as well as methods of optimisation that may be applied to ECoS, were highlighted. These comparisons showed that the Grow and Learn (GAL) network is the algorithm that is the most similar to ECoS. A method of optimising GAL networks, sleep learning, may be applied to ECoS (Chapter 7). As a result of this comparison, Hypothesis One, under the criteria specified in Section 1.3, is considered to be successfully supported. Evaluating Hypothesis Two over the benchmark data set results using the criteria in Section 1.3 gives the
following results:

1. The results show that SECoS is able to memorise training data as well as EFuNN.
2. The results show that SECoS is able to generalise to unseen data as well as EFuNN.
3. The results show that SECoS adapts as well as EFuNN and resists forgetting as well as EFuNN.
4. The results show that SECoS is able to be applied to the same range of problems as EFuNN.

Based on these results, Hypothesis Two is considered to be successfully supported.
4.16 Summary

This chapter presents work in support of Hypotheses One and Two, and is the first of the chapters to present original material. The chapter commenced with Section 4.2 by describing the general ECoS framework, within which all ECoS networks are developed. This was followed by Section 4.3, in which the seminal ECoS network, EFuNN, was presented. Section 4.4 derived from EFuNN the general, canonical ECoS network algorithm and its open architecture learning algorithm. Three instances of the ECoS algorithm were presented, where the motivation behind each network and the ways in which they implement the ECoS algorithm were described. Firstly, EFuNN was revisited in Section 4.5 and described in the context of the general ECoS algorithm of Section 4.4. Section 4.6 presented the first major piece of original work in this thesis, the minimalist SECoS network. This was followed in Section 4.7 by the DENFIS model, a network that is a more radical departure from the canonical ECoS algorithm than SECoS. DENFIS was included to provide background material for later original work. A mechanism for expanding the output space of (adding output neurons to) an ECoS network, and EFuNNs and SECoS in particular, was described in Section 4.8, followed by temporal extensions in Section 4.9. Temporal extensions are ways in which the basic ECoS algorithm can be extended to allow it to capture short-term temporal dynamics. Ways in which the ECoS algorithm fulfills the requirements for intelligent systems laid forth in (Kasabov, 1998a) were covered in Section 4.10. Comparisons between ECoS and selected constructive ANN algorithms were presented in Section 4.11. A survey of the applications of ECoS networks was presented in Section 4.12 and a description of the problems with ECoS was presented in Section 4.14. It is these problems that form the motivation for the rest of the original work in this thesis.
The chapter finished with empirical results for SECoS and EFuNN networks trained on the four benchmark data sets, as presented in Section 4.13.
Chapter 5
Formalisation of Evolving Connectionist Systems

Between theory and practice the distance is immense and there stand empires
-- With apologies to Napoleon
5.1 Introduction

Traditional ANN are supported by a large body of theory (Kolmogorov, 1957; Minsky and Papert, 1969; Cybenko, 1989; Kosko, 1993). This body of theory describes:
How the ANN training algorithms behave, given the settings of their training parameters.
How the training algorithms allow the network to capture knowledge.
How this knowledge is represented by the ANN.
This theory assists the neural network practitioner in both applying these algorithms and in optimising and extending them. A theoretical basis is also useful in assisting the acceptance of a new algorithm: other researchers are more likely to utilise a new algorithm if its theoretical grounding is known. It is for these reasons that a theoretical basis to ECoS is desirable. Any theory, or formalisation, that describes the ECoS algorithm must cover two distinct aspects. Firstly, the behaviour, or state, of the network at any time t. Secondly, the way in which the state of the network changes as it trains, which includes the effect each training parameter has on the changes made to the ECoS by the training algorithm. This chapter addresses Hypothesis Three, as described in Section 1.2, and accomplishes two things: firstly, a previous attempt at a theoretical basis to ECoS is examined and critiqued; secondly, a new formalisation of ECoS is presented. This new formalisation is a major original contribution of this thesis. The previous theory is examined and critiqued in Section 5.2. The new theory of ECoS is introduced in Section 5.3 where the geometrical basics are laid out. In Section 5.4, the basic theory presented in Section 5.3 is initially applied to ECoS training. Section 5.5 focuses on the addition of neurons to the evolving layer of the ECoS, which is followed by an analysis and formalisation of the effect of each training parameter, in Sections 5.6 to 5.9. Throughout these sections it is shown how the training algorithm parameters are interrelated. Differences between training ECoS networks over classification and function approximation problems are discussed in Section 5.10. Some thoughts on the convergence of
ECoS training are laid out in Section 5.11. The ramifications of the non-orthogonality of the training parameters are discussed in Section 5.12, while the ramifications of non-uniform training data are discussed in Section 5.13. The predictions made by the formalisation are investigated experimentally in Section 5.14, where the effects of altering training parameters while training ECoS networks on the benchmark data sets are investigated. The problems discussed in Section 4.14 of the previous chapter are revisited and redefined in Section 5.15. Conclusions are offered in Section 5.16. Finally, the chapter is summarised in Section 5.17.
5.2 Existing Theory

Kasabov (Kasabov, 1998a; Kasabov, 1999; Kasabov, 2003) posits the following theory to describe the state and training of an ECoS network (specifically an EFuNN, Section 4.3). Note that the notation used for the training parameters has been altered to bring it into line with those used in previous sections of this thesis:
rj represents an association between a hyper-sphere from the fuzzy input space and a hyper-sphere from the fuzzy output space, the W1(rj) connection weights representing the co-ordinates of the center of the sphere in the fuzzy input space, and the W2(rj) the co-ordinates in the fuzzy output space. The radius of an input hyper-sphere of a rule neuron is defined as (1 - Sthr) ... For example, two pairs of fuzzy input-output data vectors d1 = (Xd1, Yd1) and d2 = (Xd2, Yd2) will be allocated to the first rule neuron r1 if they fall into the r1 input sphere and in the r1 output sphere, i.e. the local normalised fuzzy distance between Xd1 and Xd2 is smaller than the radius r and the local normalised fuzzy difference between Yd1 and Yd2 is smaller than an error threshold Ethr. Each rule neuron, e.g. ... (Kasabov, 1999)

On the topic of adaptation of the existing neurons, Kasabov goes on to say:

Through the process of associating (learning) of new data point to a rule neuron, the centers of this neuron (sic) hyper-spheres adjust in the fuzzy input space depending on a learning rate lr1, and in the fuzzy output space depending on a learning rate lr2. (Kasabov, 1999)

The mapping between hyper-spheres and the modification of their positions during weight adjustment is depicted in Figure 5.1. There are several problems with this theory. Firstly, the radius of the hyper-spheres is defined as 1 - Sthr. The sensitivity threshold, however, is a property of the training algorithm, not of the neurons. Although the example that caused the addition of a neuron will be within that radius, subsequent examples that cause the neuron to fire will not be. Also, if the radius of the hyper-spheres were dependent upon Sthr, then the radius of all hyper-spheres would be identical and only the learning rate parameters would have any effect upon training. Experimental results (Section 5.14) show that this is not the case. Secondly, the suggestion that the region defined by a neuron is a hyper-sphere is not supported by the canonical ECoS algorithm. ECoS evolving layer neurons are unthresholded, thus a neuron will activate if it is the closest in the input space to the current example, no matter how distant the example actually is. As long as the neuron is the closest, it will activate for that example, even if the distance is greater than 1 - Sthr. Which neuron is closest to the example, however, depends on the co-ordinates of the other
neurons in the evolving layer. Since these may be distributed in any manner within the input space, the boundaries between the regions defined by each neuron are not regular; that is, the polygons defined by the points represented by each neuron in the evolving layer of an ECoS network are not regular polygons, nor are they necessarily of regular size. Plainly, then, neither the description of the regions defined by neurons as hyper-spheres, nor the definition of these hyper-spheres, is appropriate for this purpose. Another problem with this theory is that it does not describe the effect of the training parameters. Although experimental results (Watts, 1999b) show that the parameters have different effects upon the behaviour of an ECoS network, the theory here does not describe this in any way. Finally, the theory is untestable: it makes no predictions about the behaviour of the network or training algorithm as parameters are altered. Some elements of the theory above are satisfactory, however: each neuron in the evolving layer does provide a mapping, or association, from a region of input space to a region of output space, and the co-ordinates of these regions in the input space are defined by the connection weights of the neuron. These elements will be retained in the improved theory put forward below.

Figure 5.1: Mapping from input to output hyper-spheres (adapted from (Kasabov, 1999)).
5.3 A New Formalisation of ECoS

With the existing theory unsatisfactory, it becomes necessary to formulate a new theory that overcomes the shortcomings elucidated above. Three assumptions are made:
1. That the ECoS network has at least one layer of connections coming into the evolving neuron layer, and one layer of connections going out of the evolving layer.
2. That the distance between two vectors will be measured so that the distance is in the range [0, 1].
3. That the evolving layer neurons are not thresholded.

Within this chapter, the term region is intended to mean a set of points in n-dimensional space that is defined by specific boundaries. The term volume means the amount of unit space occupied by a region. As stated above, this theory will be in two parts: the state of the network at a time t; and the behaviour of the network in relation to the training parameters, that is, the way in which each training parameter affects the training process. The first set of theory here describes the state of the network.
5.3.1 Axioms of State

The following axioms are the basis of this theory. They are partially derived from an examination of the forward propagation algorithm of ECoS; some are also retained from the previous theoretical work described above. These axioms describe the way in which an ECoS network encapsulates what it has learned about the input space, that is, how an ECoS network represents knowledge.

Axiom 1: Each neuron n in the evolving layer of an ECoS network defines a single point in the input space.
This is self-evident from the ECoS algorithm. Since the activation of any neuron is based on the distance between the current input example and its incoming weights, each neuron therefore represents a single point in the input space.

Axiom 2: The activation of a neuron n for an example I is proportional only to the distance of I from n.
This also is self-evident from the ECoS algorithm, specifically from Equation 4.2.

Axiom 3: A neuron n will activate (win) iff its activation is greater than that of all other neurons in the evolving layer.
Assuming the case of one-of-n activation, a neuron may propagate its activation to following layers only if it is the most highly activated. To be the most highly activated with a distance-based activation function, the point defined by the neuron in Axiom 1 must be the closest to I.

Axiom 4: For every neuron n there is a region Rn in input space within which each point is closer to n than to any other neuron.
This is self-evident: in any plot with multiple points, there will be a region around each point that is closer to that point than any other. In effect, each neuron in the evolving layer corresponds to a Voronoi polygon in the input space (Okabe et al., 1992). This is consistent with the GAL algorithm (Section 3.12), which is the constructive algorithm most similar to ECoS.
In (Alpaydin, 1994, pg 399), it is stated that: The input space is divided in the form of a Voronoi tessellation where exemplar units’ domination regions are bounded by hyperplanes that pass through the medians of the two closest exemplar units.
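The correspondence between winner-take-all activation and Voronoi-cell membership described by Axioms 2 to 4 can be checked numerically. The sketch below is illustrative only: the function names and the use of a normalised Manhattan distance are assumptions. The winner under a distance-based activation is always the owner of the Voronoi cell the example falls in, because argmax(1 - d) = argmin(d).

```python
import numpy as np

def winning_neuron(x, W_in):
    """Index of the most highly activated evolving-layer neuron for input x,
    with activation defined as 1 minus the normalised distance (Axiom 2)."""
    dists = np.abs(W_in - x).sum(axis=1) / W_in.shape[1]
    return int(np.argmax(1.0 - dists))

def voronoi_owner(x, points):
    """Index of the nearest point under the same metric -- i.e. the
    Voronoi cell that x falls into (Axiom 4)."""
    return int(np.argmin(np.abs(points - x).sum(axis=1)))
```

For any set of neuron weight vectors and any input, the two functions return the same index, which is the equivalence the axioms assert.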
Thus, an ECoS network with m neurons will partition the input space into m Voronoi regions, where a neuron will activate if an input example is at a point within the Voronoi region for that neuron. Figure 5.2 shows the Voronoi regions defined in two dimensional input space by a hypothetical ECoS network with three evolving layer neurons. The Voronoi region defined by the winning neuron j is shaded.
Figure 5.2: Voronoi regions defined by the evolving layer neurons of an ECoS network.

In Figure 5.3 the Voronoi regions for each evolving layer neuron in a SECoS (Section 4.6) network are presented. This network was trained on the gas furnace data set (Subsection 2.6.5). The two axes of the plot correspond to the two input variables: the Voronoi plot is generated according to the incoming weights of the evolving layer neurons. In Figure 5.4 the same plot is presented, but with the training examples superimposed over the Voronoi polygons. This shows that overall each neuron's region encapsulates several training examples. Also, there are more neurons, and hence more, smaller, Voronoi polygons, in the region of input space where the training examples are more tightly clustered. Further away from this region, there are fewer, larger regions, corresponding to a lower density of examples. In Figure 5.5 the Voronoi regions of a SECoS network trained on the two spirals data set (Subsection 2.6.2) are presented, where regions for neurons that represent the second of the two classes have been filled in. A basic spiral shape has become apparent, but is slightly coarser than for other constructive algorithms (see, for example, Section 3.10). From the above axioms and results, it is possible to infer some properties of ECoS networks. Assume that the task at hand is a classification problem of c classes, using an ECoS network with m neurons in the evolving layer, and that the examples being presented to the network are uniformly distributed in input space. The probability of an unknown example I of class C being correctly classified is determined by the probability of I falling in a region 'owned' by a neuron in the set of neurons MC that represent the class of I. In the general case this is determined by Equation 5.1.
Figure 5.3: Voronoi regions of an ECoS network trained on the gas furnace data set.
Pt = ( Σ_{i ∈ MC} Vi ) / ( Σ_{m ∈ M} Vm )    (5.1)

where:
Pt is the probability of correctly classifying I as being a member of class C,
Vi is the volume of region i, where i ∈ MC, and
Vm is the volume of region m, where m ranges over the set M of all neurons in the evolving layer of the network.

Since the distance measures used in ECoS must return values between zero and unity, the sum of all neurons' region volumes must be unity. Thus, Equation 5.1 simplifies to:

Pt = Σ_{i ∈ MC} Vi    (5.2)
With uniformly distributed examples, the volume of each of the m regions will be equal. Thus Pt is proportional to the number of neurons allocated for each class. This implies that in the case of an unbalanced training set, that is, a training set with a large number of examples of one class and smaller numbers of other classes, a network will be produced that is less likely to generalise well to the under-represented class. For non-uniformly distributed examples, multiple classes that are tightly clustered together will cause each neuron to have a very small region associated with it. This will require a larger number of neurons to model the problem. Note also that this theory only applies to classification problems: for a discussion of the differences between classification and function approximation problems, see Section 5.10.
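Equation 5.2 can be illustrated with a Monte Carlo sketch: since Voronoi region volumes are awkward to compute analytically, the input space can be sampled uniformly and the fraction of samples owned by neurons of the target class counted. This is a hypothetical helper (the function name and the Manhattan metric are assumptions), not part of any ECoS implementation.

```python
import numpy as np

def estimate_p_correct(neurons, labels, target_class,
                       n_samples=20_000, seed=1):
    """Monte Carlo estimate of Pt (Equation 5.2): the fraction of the unit
    input space lying in Voronoi regions owned by neurons of target_class,
    assuming uniformly distributed examples."""
    rng = np.random.default_rng(seed)
    xs = rng.random((n_samples, neurons.shape[1]))
    # Voronoi owner (nearest neuron) of every sampled point
    owners = np.argmin(
        np.abs(xs[:, None, :] - neurons[None, :, :]).sum(axis=2), axis=1)
    return float(np.mean(labels[owners] == target_class))
```

With two neurons at 0.25 and 0.75 in a one-dimensional input space, one per class, each class owns half the space, so the estimate comes out near 0.5, matching the observation that Pt is proportional to the number of neurons allocated to each class.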
Figure 5.4: Voronoi regions of an ECoS network trained on the gas furnace data set with the training examples superimposed.
5.4 Theoretical Basis of ECoS Training

The training theory presented here makes an additional assumption to the previous section: that the training of the network will be carried out according to the algorithm presented in Subsection 4.4.1. That is, that the modification of the connections coming into the evolving layer neurons will be unsupervised, and that the modification of the connections going out of the evolving layer neurons will be supervised. The ECoS network that results from training can be expressed as a three-tuple, as follows:

Nt = {No, T, P}    (5.3)

where:
Nt is the trained ECoS network,
No is the initial, untrained ECoS network,
T is the training data set, and
P is the training parameter set.

Since the initial network and training data set can be assumed to be outside the control of the training algorithm, the parameter set becomes the focus of interest for this section. According to the ECoS training algorithm, a neuron will be added to an ECoS network to represent the current training example under two circumstances:

1. If the activation of the winning neuron is less than the sensitivity threshold Sthr.
2. If the absolute error over the output is greater than the error threshold Ethr.

It is possible to infer the following from these rules:

1. If an input vector lies within the Voronoi region Rj for a winning neuron j, then a neuron will be added to the network if the training example I lies at a distance from neuron j such that the activation Aj < Sthr, or:
Figure 5.5: Voronoi regions of an ECoS network trained on the two spirals data set.

2. At a position such that the error Ej > Ethr.

There must therefore be a region around j, and within the region Rj, denoted henceforth as Ra, such that any input vector that falls within that region will not cause a neuron to be added. Conversely, any input vector that falls outside of that region will cause a neuron to be added. Since this region is defined only by the distance of the points within it from j, the region Ra must be described solely by the distance of its boundary from j, denoted henceforth as Da. Therefore, Ra is a hyper-sphere in the general case.1
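The two addition triggers can be sketched as a single predicate. This is a hypothetical helper (the function name is an assumption), with the winner's activation and the output error assumed to be already computed by the forward pass:

```python
def should_add_neuron(act_winner, err_output, s_thr, e_thr):
    """The two ECoS neuron-addition triggers: the winner's activation is
    below the sensitivity threshold, or the absolute output error is above
    the error threshold. An example avoids triggering an addition only when
    it falls inside both Rs and Re, i.e. inside the intersection region Ra."""
    return act_winner < s_thr or err_output > e_thr
```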
5.5 Influence of the Neuron Addition Parameters

Following this line of reasoning, it is apparent that there are two regions to be considered. The first is defined by the sensitivity threshold Sthr and will be denoted by Rs, which has the volume Vs. The second is defined by the error threshold Ethr, which will be denoted by Re and has the volume Ve. The region Ra is therefore the intersection of these two regions with the region Rj, that is:

Ra = Rj ∩ (Rs ∩ Re)    (5.4)

If it is assumed that the training examples are uniformly distributed through the input space, and that any training input vectors that lie outside of Ra will cause the addition of a neuron, then the probability Pa of any training example within the neuron's Voronoi region Rj causing the addition of a neuron is equivalent to the proportion of the volume Vj of Rj that lies outside the volume Va of Ra:

Pa = (Vj - Va) / Vj    (5.5)

If the examples are non-uniformly distributed in input space, then the relationship between Pa and Vj will be

1 Kasabov (Kasabov, 1999) had the right idea with his theory of hyper-spheres: they were just applied in the wrong place.
non-linear. In either case, however, it is apparent that the following relationship is true:2

Va → 0 ⇒ Pa → 1

Since Va is defined as the volume of Ra, and Ra is defined as the intersection of Rs and Re, it can be deduced that:

Vs → 0 ⇒ Va → 0 ⇒ Pa → 1
Ve → 0 ⇒ Va → 0 ⇒ Pa → 1

Since Vs and Ve are respectively determined by Sthr and Ethr, it is conjectured that:

Sthr → 1 ⇒ Pa → 1    (5.6)

and:

Ethr → 0 ⇒ Pa → 1    (5.7)

No assumptions are made that these relations are linear. The relationship between each parameter and the rate at which neurons are added to the evolving layer will be influenced by the distribution of the training data in the input space. If the training examples are distributed uniformly within the input space, then the relations will be quasi-linear. If the distribution is non-uniform, as could be expected of most real-world problems, then the relations will be more complex.

Figure 5.6: Region defined by both sensitivity and error thresholds. Examples that lie in this region will not cause a neuron to be added.

2 It can also be conjectured from this equation that an ECoS network will gain neurons rapidly during the early phases of training, but more slowly later on: as the network has few neurons initially, the volume assigned to each neuron is very large, thus the difference between Vj and Va is very large and Pa is very high. Experimental results have confirmed this.
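Conjecture 5.6 can be illustrated with a toy simulation on uniformly distributed data. Note that this models only the sensitivity-threshold trigger, with an assumed normalised Manhattan distance; it is a sketch, not the full ECoS training algorithm.

```python
import numpy as np

def count_added(s_thr, n_examples=300, dim=2, seed=2):
    """Count evolving-layer neurons after presenting uniform random data,
    using only the sensitivity-threshold addition trigger."""
    rng = np.random.default_rng(seed)
    X = rng.random((n_examples, dim))
    neurons = [X[0]]                      # the first example always adds a neuron
    for x in X[1:]:
        dists = np.abs(np.asarray(neurons) - x).sum(axis=1) / dim
        if 1.0 - dists.min() < s_thr:     # winner's activation too low
            neurons.append(x)
    return len(neurons)
```

On a fixed data sequence, raising Sthr towards one produces substantially more neurons, and most additions occur early in the run, while each neuron's region is still large, consistent with the relations above.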
5.6 Influence of the Sensitivity Threshold Parameter

The hyper-sphere defined by Sthr is defined by a single distance, denoted here as Ds. The value of Ds can be derived simply from the activation equation of the neuron, Aj = 1 - Dj, where Dj is the distance between j and the example I, by replacing Dj with Ds and rearranging to make Ds the dependent variable:

Ds = 1 - Sthr

Since Va is a function of the distance Ds, Va = f(Ds), Pa is therefore also a function of Ds. Thus:

Sthr → 1 ⇒ Ds → 0 ⇒ Va → 0 ⇒ Pa → 1

In other words, as the sensitivity threshold increases, so too does the probability of a neuron being added to the network. This proves the conjecture in Equation 5.6 and is consistent with experimental results (Section 5.14). The region Rs defined by Ds around j is displayed as the shaded region in Figure 5.7. This figure shows the Voronoi regions defined in a two dimensional input space by a hypothetical ECoS network with three neurons in the evolving layer.
Figure 5.7: Distance and region Rs (shaded region) defined by the sensitivity threshold training parameter. Training examples that lie within Rs will not cause a neuron to be added.
5.7 Influence of the Error Threshold Parameter

Before considering the influence of the error threshold parameter, it is necessary to consider the activation Ao of an output neuron o. There are three cases to consider for the activation of an output neuron:

Ao = 0,          if Wj,o = 0
Ao = Wj,o · Aj,  if 0 < Aj · Wj,o < 1    (5.8)
Ao = 1,          if Aj · Wj,o ≥ 1

Proving the conjecture in Equation 5.7 therefore requires a proof for each of these three cases.

For the case of Wj,o = 0, Ao will be zero, no matter the activation Aj. Thus, the distance between I and j is indeterminable from this relation. Instead, Pa can be deduced directly. For an activation Ao = 0, the error threshold will trigger the addition of a neuron if the desired output Od is greater than the error threshold. Therefore:

Pa = 1, if Od > Ethr
Pa = 0, otherwise    (5.9)

This is consistent with conjecture 5.7.

For the case of 0 < Aj · Wj,o < 1, the way in which the error is calculated must be examined. The error Eo across output o between the desired output value Od and the actual output value Ao is defined as Eo = |Od - Ao|, which can be expanded as:

Eo = Od - Ao, if Od ≥ Ao    (5.10)
Eo = Ao - Od, if Od < Ao

Thus, there are two situations where Eo could exceed Ethr: when Ao is too low, that is, the example I is too far from n; or when Ao is too high, that is, I is too close to n. There are therefore two activations to consider: Ao^max, that is too high, and Ao^min, that is too low. These activations can be determined by substituting Ethr for Eo and rearranging Equation 5.10 to make Ao the dependent variable, as follows:

Ao^min = Od - Ethr
Ao^max = Od + Ethr    (5.11)

For the case that 0 < Aj · Wj,o < 1, the terms of Equation 5.11 above can be expanded as:

Aj^min · Wj,o = Od - Ethr
Aj^max · Wj,o = Od + Ethr

where Aj^max is the maximum activation of the winning evolving layer neuron j that will cause the maximum output activation Ao^max, and Aj^min is the minimum activation of j that will cause the minimum output activation Ao^min. These can be rearranged to make Aj^max and Aj^min the dependent variables, as follows:

Aj^max = (Od + Ethr) / Wj,o
Aj^min = (Od - Ethr) / Wj,o
CHAPTER 5. FORMALISATION OF EVOLVING CONNECTIONIST SYSTEMS
Given that Equation 4.2 describes the activation of j with respect to the distance D, then expanding A_j gives:

    1 - D_min = (O_d + E_thr) / W_{j,o}
    1 - D_max = (O_d - E_thr) / W_{j,o}

Rearranging to make D_min and D_max the dependent variables yields:

    D_min = 1 - (O_d + E_thr) / W_{j,o}        (5.12)

    D_max = 1 - (O_d - E_thr) / W_{j,o}        (5.13)
Thus, points that lie at a distance between D_min and D_max will not cause a neuron to be added. Figure 5.8 shows this: it presents the same three Voronoi regions from Figure 5.7, where the shaded region is R_e.

Figure 5.8: Distances and region R_e defined by the error threshold parameter.

Training examples that lie within R_e will not cause a neuron to be added. While it may seem that the terms O_d + E_thr and O_d - E_thr in Equations 5.12 and 5.13 could yield distances of less than zero or greater than one, from Equation 5.10 it is apparent that, provided O_d and E_thr are both constrained to the range [0, 1], this cannot happen. From these equations the following constraints can be derived:

    if D = D_min then O_d ≤ 1 - E_thr
    if D = D_max then O_d ≥ E_thr
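Equations 5.12 and 5.13 can be sketched directly. The function names below are illustrative, and the linear activation A_j = 1 - D of Equation 4.2 is assumed:

```python
def no_addition_band(o_d, e_thr, w_jo):
    """Distance band [d_min, d_max] within which a training example does
    not trigger neuron addition (Equations 5.12 and 5.13)."""
    d_min = 1.0 - (o_d + e_thr) / w_jo
    d_max = 1.0 - (o_d - e_thr) / w_jo
    return d_min, d_max

def adds_neuron(d, o_d, e_thr, w_jo):
    """True if an example at distance d from the winning neuron j lies
    outside the band, i.e. its error would exceed the error threshold."""
    d_min, d_max = no_addition_band(o_d, e_thr, w_jo)
    return not (d_min <= d <= d_max)
```

With O_d = 0.5, E_thr = 0.1 and W_{j,o} = 1, the band is [0.4, 0.6]: an example at distance 0.5 is absorbed, while one at distance 0.7 adds a neuron. As E_thr shrinks, the band collapses onto 1 - O_d/W_{j,o}.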
It is apparent from Equations 5.12 and 5.13 that:

    E_thr = 0 ⇒ D_min = D_max = 1 - O_d / W_{j,o}

    E_thr → 0 ⇒ D_min → D_max
Since the volume V_e of the region R_e is given by:

    V_e = f(D_max) - f(D_min)

where f is the volume function, the following is implied:

    E_thr → 0 ⇒ D_min → D_max
    ⇒ E_thr → 0, V_e → 0
    ⇒ E_thr → 0, V_a → 0
    ⇒ E_thr → 0, P_a → 1

In other words, as the error threshold decreases, the probability of a neuron being added increases. This proves the conjecture in Equation 5.7 for this case. Additionally, if the distance D_s is less than the minimum distance D_min, then a neuron will be added for every training example. This is verified by Equations 5.4 and 5.5: if the intersection between the two regions is zero, then P_a = 1.
The final case is when A_j W_{j,o} ≥ 1. Note that this requires W_{j,o} ≥ 1, although not every output will be unity whenever W_{j,o} ≥ 1. In this case, the error will exceed the error threshold only when O_d is less than 1 - E_thr, thus:

    P_a = 1,    if O_d < 1 - E_thr        (5.14)
    P_a = 0,    otherwise

The conjecture in Equation 5.7 again holds true. The distance D_e can be derived by rearranging A_j W_{j,o} ≥ 1 to make A_j the dependent variable and substituting D_e, thus:

    D_e ≤ 1 - 1 / W_{j,o}        (5.15)

Therefore, the maximum distance that an example can be at for this case is determined solely by the connection weight. If, instead,

    D_e > 1 - 1 / W_{j,o}

then the neuron activation will not saturate and Equations 5.12 and 5.13 will apply.
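The saturation bound of Equation 5.15 is similarly direct; the function name is illustrative, and the bound is meaningful only for W_{j,o} ≥ 1:

```python
def saturation_distance(w_jo):
    """Maximum distance at which neuron j still saturates output o
    (Equation 5.15); only meaningful for w_jo >= 1."""
    return 1.0 - 1.0 / w_jo
```

For example, a weight of 2 saturates the output for any example within distance 0.5 of the neuron, while a weight of exactly 1 saturates only at distance zero. As the weight grows without bound, this distance approaches one.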
5.8 Influence of the Learning Rate One Parameter

The learning rate parameters also have an effect upon training, although each parameter has a different effect. This is due to the different mechanisms by which they function. The weight update rule for the input to evolving layer of connections is intended to reduce the difference between the current weight vector and the current input vector. The weight update rule for the evolving to output layer is a variant of the perceptron learning rule, and is based on the idea of reducing errors at the outputs.
Intuitively, the higher the η_1 parameter is, the higher the activation of j will be the next time the current vector I is presented to it. The conjecture for this parameter is thus:

    η_1 → 1 ⇒ A_j^{t+1} → 1        (5.16)
    η_1 → 0 ⇒ A_j^{t+1} → A_j^t

where A_j^{t+1} is the activation of neuron j at time t + 1 for input vector I. This conjecture can be proven as follows. The weight update rule for the input to evolving layer weights can be expressed as:

    W_{i,j}(t + 1) = W_{i,j}(t) + ΔW_{i,j}

where:

    ΔW_{i,j} = η_1 (I_i - W_{i,j}(t))

This can be viewed as a change in distance between W and I. The goal is thus to prove that when η_1 = 1, the change in distance between the two vectors W and I is such that the distance at time t + 1, D_j^{t+1}, is zero.
Given the following:

    A_j^t = 1 - D_j^t        (5.17)

    A_j^{t+1} = 1 - D_j^{t+1}        (5.18)

    D_j^{t+1} = D_j^t - ΔD_j^t        (5.19)

From the weight update rule above, it can be seen that:

    ΔD_j^t = η_1 D_j^t        (5.20)

Rearranging Equation 5.17 to make D_j^t the dependent variable yields:

    D_j^t = 1 - A_j^t

Substituting for D_j^t in Equation 5.20 yields:

    ΔD_j^t = η_1 (1 - A_j^t)        (5.21)

while substituting for ΔD_j^t in Equation 5.19 gives the following:

    D_j^{t+1} = D_j^t - η_1 (1 - A_j^t)        (5.22)

Rearranging Equation 5.18 to make D_j^{t+1} the dependent variable yields:

    D_j^{t+1} = 1 - A_j^{t+1}        (5.23)

Replacing D_j^{t+1} in Equation 5.22 with Equation 5.23 yields:

    1 - A_j^{t+1} = (1 - A_j^t) - η_1 (1 - A_j^t)

Finally, solving for A_j^{t+1} yields:

    A_j^{t+1} = 1 - [(1 - A_j^t) - η_1 (1 - A_j^t)]
which can be simplified to:

    A_j^{t+1} = A_j^t + η_1 (1 - A_j^t)        (5.24)
This holds true for any monotonic linear distance measure. It can be seen that when η_1 = 1, Equation 5.24 becomes:

    A_j^{t+1} = A_j^t + (1 - A_j^t) ⇒ A_j^{t+1} = 1

When η_1 = 0, Equation 5.24 becomes:

    A_j^{t+1} = A_j^t + 0 (1 - A_j^t) ⇒ A_j^{t+1} = A_j^t

This is also true for non-monotonic distance measures, but is not proven here. This objection aside, the conjecture in Equation 5.16 above is proven. From Equation 5.21 above, it is also possible to calculate the maximum distance that a neuron will move. A neuron moves the maximum distance when its activation is the minimum allowed. The minimum allowed activation is set by the sensitivity threshold parameter, S_thr. Thus, by substituting S_thr into Equation 5.21 above, we get:

    ΔD_max = η_1 (1 - S_thr)        (5.25)
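The closed form in Equation 5.24 can be checked numerically; a minimal sketch with invented names:

```python
def next_activation(a_t, eta1):
    """One-step activation update from Equation 5.24:
    A_j(t+1) = A_j(t) + eta_1 * (1 - A_j(t))."""
    return a_t + eta1 * (1.0 - a_t)

# eta_1 = 1 drives the activation to unity in a single step; eta_1 = 0
# leaves it unchanged; intermediate values approach unity geometrically.
a = 0.3
for _ in range(20):
    a = next_activation(a, 0.5)
```

After twenty presentations with η_1 = 0.5 the residual 1 - A_j has shrunk by a factor of 2^20, consistent with the limiting behaviour conjectured in Equation 5.16.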
5.9 Influence of the Learning Rate Two Parameter

The change to the connection weight W_{j,o}, ΔW_{j,o}, is determined by the following equation:

    ΔW_{j,o} = η_2 E_o A_o        (5.26)

where η_2 is the learning rate two parameter, E_o is the absolute error over output neuron o, and A_o is the activation of neuron o. From this equation, a relationship with the error threshold becomes immediately apparent: since the weights will only be updated if a neuron is not added to the network, E_o in Equation 5.26 above will always be less than the error threshold E_thr. Thus, a limit on the value of ΔW_{j,o}, ΔW_{j,o}^max, can be calculated:

    ΔW_{j,o}^max = η_2 E_thr A_o

This shows that the error threshold parameter and the η_2 parameter are closely coupled. A high error threshold will reduce the number of neurons added by two different mechanisms: firstly, by reducing the sensitivity of the network to the error over the current example, and secondly, by reducing its sensitivity to error for later examples. The error threshold is thus a very significant training parameter. The sensitivity threshold is also relevant. The activation A_o is a function of A_j and W_{j,o}, and the minimum activation of j is determined by S_thr; thus the lower limit of ΔW_{j,o}, ΔW_{j,o}^min, can be calculated:

    ΔW_{j,o}^min = η_2 E_o W_{j,o} S_thr
Since ECoS networks are intended to learn for the duration of their existence, it is entirely possible that the weights in this layer of connections will become very large: with an unending (or infinite) stream of training data, the weights themselves may approach infinity, especially with a low sensitivity threshold and a high error threshold. An interesting implication of the unbounded growth of weights can be described using Equations 5.12, 5.13 and 5.15. As the weights continue to grow, the distance values in these equations tend ever closer to one, that is:

    W_{j,o} → ∞ ⇒ D_e → 1

Given that:

    D_e → 1 ⇒ P_a → 0

and since the rate at which the weights grow is directly determined by the η_2 parameter, it is clear that as the η_2 parameter increases, the probability of adding a neuron later in training decreases. Over a complete training set, then, an ECoS network will be expected to add fewer neurons during training with a high learning rate. This has been experimentally verified (Chapter 8). Note that this applies to classification problems only. Function approximation problems behave differently, and are discussed in Section 5.10.
5.10 Function Approximation versus Classification Problems

The analyses of the effects of the sensitivity threshold and error threshold apply to all types of problems, whether classification or function approximation. To explain the effect of the learning rate parameters for function approximation problems requires a slight reworking of parts of the formalisation. Assume that an evolving layer neuron activates with a value of unity. The activation of the output neuron o will therefore be equal to the value of the connection weight W_{j,o}. If W_{j,o} is greater than unity, then the output neuron o will saturate at unity. This will not cause a problem for a classification problem, as the output values will be either zero or unity. If a neuron activates at less than unity, and the connection weight outgoing from j is equal to or greater than unity, then the error will still be low. Thus, a neuron that is not close to the current example can still correctly classify it, if its outgoing weight is sufficiently large. This means that for classification problems, a smaller number of neurons is possible, especially if E_thr and S_thr are low.

For function approximation problems, it is not possible for the W2 connection layer weights to exceed unity. This is because these weights represent the desired output values. Thus, an error that is less than the error threshold will cause the weight to move towards the desired output value. This means that each outgoing connection weight has a region around it on the number line, where the bounds are defined by the error threshold. The neuron then represents a cluster of examples with similar output values. Any input vector that causes that particular evolving layer neuron to activate, and that has a desired output value within the range W_{j,o} ± E_thr, will not cause a neuron to be added. A high η_2, however, will cause the output values to move away from the centre of the cluster of output values. If the data is self-consistent, that is, the values are periodic, then additional neurons will be added, driven by the error threshold parameter. However, if η_2 is too low, then the neuron will not be able to find the centre of the cluster of output values. This will also cause more neurons to be added.
5.11 Convergence of ECoS Training

The convergence of an ANN training algorithm is the ability of that training algorithm to achieve an arbitrary level of accuracy (or arbitrarily low error) over a finite training set. Proving the convergence of a training algorithm is important, as it shows that the algorithm is able to be trained to a sufficient accuracy to be usable within the specific application domain from which the training set is taken. For instance, the Kolmogorov Theorem (Kolmogorov, 1957, cited in (Ripley, 1993)) proves that a single hidden layer MLP trained by backpropagation can approximate any continuous function to any degree of accuracy, given enough neurons in the hidden layer (Cybenko, 1989).

It is a trivial matter to prove that the ECoS training algorithm can converge to zero errors. Assume there is a finite training set T of p unique examples, of either a random or non-random distribution. Then, convergence over T is guaranteed by the selection of training parameters such that P_a = 1: that is, if the sensitivity threshold parameter is set to S_thr = 1, or the error threshold parameter is set to E_thr = 0, then the resulting ECoS network's evolving layer size n will be n = p. Thus, every example of the training set will be represented by the network, so the number of errors E_T over the set T will be E_T = 0. For convergence to an error E_T > 0, any setting of S_thr < 1 and E_thr > 0 such that P_a < 1 and n < p is guaranteed to converge. In this case, it is only necessary to insert enough neurons into the evolving layer such that n is not greater than p - E_T. If T has a uniform distribution of examples, then n < p - E_T.
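The P_a = 1 limiting case can be illustrated with a deliberately degenerate sketch: every unique example is memorised as an exemplar, and nearest-exemplar recall then makes zero errors over the training set. This illustrates the argument only; it is not the ECoS algorithm itself, and all names are invented for the example:

```python
def train_degenerate_ecos(examples):
    """With S_thr = 1 (or E_thr = 0) every example adds a neuron, so the
    evolving layer stores one exemplar per unique training example."""
    layer = []          # list of (input vector, desired output) exemplars
    for x, y in examples:
        if (x, y) not in layer:
            layer.append((x, y))
    return layer

def recall(layer, x):
    """Nearest-exemplar recall: return the output of the closest stored
    exemplar (squared Euclidean distance)."""
    nearest = min(layer, key=lambda node: sum((a - b) ** 2 for a, b in zip(node[0], x)))
    return nearest[1]
```

Since every training example is its own nearest exemplar at distance zero, the error over the training set is zero, matching the n = p, E_T = 0 case above.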
5.12 Ramifications of Non-orthogonality of Parameters

The analysis in the previous sections has shown that three of the four training parameters (sensitivity threshold, error threshold and learning rate two) are non-orthogonal, that is, not independent of one another. The ramifications of this finding are primarily concerned with methods of optimising the settings of the training parameters. Since these three parameters are not independent, it is not possible to sequentially optimise each parameter. For example, while a value for the sensitivity threshold may be iteratively found that is optimal for a particular data set, that value will not be optimal if the error threshold is changed. The same applies to the error threshold and learning rate two parameters. In short, if optimal values are to be found for ECoS training parameters, they must be found simultaneously: the task is a multi-parameter optimisation problem. A solution to this problem is presented in Chapter 7, where an evolutionary algorithm is proposed and applied to the task.
5.13 Ramifications of Non-uniform Distribution of Data

The analysis in the preceding sections assumes a uniform distribution of training examples, that is, examples that are uniformly distributed in the input space. Most real-life problems, such as the benchmark data sets, have non-uniformly distributed training examples. It is therefore worthwhile to consider the ramifications of non-uniformity for this analysis. The equations in the preceding sections describe a single probability for the entire network. With non-uniform data, the probabilities vary according to the position of each neuron. The mean probabilities, however, taken across all neurons in the network and across all examples in the training set, still change according to the settings of the training parameters. Therefore, the predictions made in the preceding sections still hold true.

    E_thr    0.1
    S_thr    0.5
    η_1      0.5
    η_2      0.5

Table 5.1: ECoS training parameters for benchmark data.
5.14 Validation of the Formalisation with Benchmark Data Sets

5.14.1 Introduction

From the formalisations above, several predictions were made about the effect of the training parameters, insofar as they relate to the addition of neurons to the ECoS evolving layer. Firstly, as the sensitivity threshold increases, the number of neurons added increases. Secondly, as the error threshold decreases, the number of neurons added increases. Thirdly, for classification problems, as the learning rate two parameter increases, the number of neurons added decreases, while for function approximation problems, as the learning rate two parameter increases, the number of neurons will first decrease, then increase. The ways in which these three parameters behave has been discussed from a theoretical viewpoint (Sections 5.6, 5.7 and 5.9). The purpose of this section is to experimentally confirm these predictions.
5.14.2 Experimental Method

In these experiments, each of the three parameters of interest was investigated in turn. This was done by holding all parameters constant except the one being investigated. This setup assumes that there is a degree of independence between the parameters; the formalisation above shows that this is not the case. The general trends predicted, however, can be verified using this approach. The testing of the interactions of the parameters is left for future work.

The entire data set was used for training, and ten-fold cross-validation was employed. In this case, each 'run' meant that the network was trained on 90% of the data, then further trained on the remaining 10%. The 10% held out was different each time, which had the effect of varying the order in which training examples were presented to the network. At the conclusion of each run, the mean number of neurons present was calculated. The intention of this process is to minimise the effect the order of examples has on the behaviour of the training algorithm, so that the effect of the training parameters can be better evaluated. The target parameter was started at a value of 0.01 and increased by 0.01 for each run, for a total of 100 runs. The behaviour and size of the networks were therefore evaluated for a target parameter range of [0.01, 1]. The constant values of the parameters are as in Section 4.13, and are reproduced in Table 5.1.
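The sweep procedure can be outlined as below. Here `train_and_count` is a placeholder for a full SECoS or EFuNN training run (returning the final evolving layer size for one fold) and is assumed rather than defined; all other names are invented for the sketch:

```python
def parameter_sweep(train_and_count, folds, fixed, target):
    """Sweep one training parameter from 0.01 to 1.00 in steps of 0.01
    (100 runs), holding the others at the Table 5.1 values, and record
    the mean evolving layer size over the cross-validation folds."""
    results = []
    for step in range(1, 101):
        params = dict(fixed)
        params[target] = step / 100.0       # the parameter under investigation
        sizes = [train_and_count(fold, params) for fold in folds]
        results.append((params[target], sum(sizes) / len(sizes)))
    return results
```

A call such as `parameter_sweep(train_and_count, folds, {"e_thr": 0.1, "s_thr": 0.5, "eta1": 0.5, "eta2": 0.5}, "s_thr")` would produce the (parameter value, mean size) pairs plotted in the figures that follow.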
Figure 5.9: Average number of neurons versus sensitivity threshold parameter for SECoS (l) and EFuNN (r) networks trained on the two spirals data set.
Both SECoS and EFuNN networks were investigated, across each of the four benchmark data sets. The results are presented and discussed in the following subsections.
5.14.3 Sensitivity Threshold

The analysis in Section 5.6 predicted that the number of neurons added would increase as the sensitivity threshold approached unity. The results presented here show that this is the case across all benchmark data sets. Figure 5.9 presents the results across the two spirals data set. This figure presents two plots, each of which plots the mean number of neurons present in the network against the setting of the sensitivity threshold. Results for both SECoS and EFuNN are presented. It can be seen that the prediction has been verified: as the sensitivity threshold increased, so too did the number of neurons added. Note that the sudden jump in neurons in SECoS around 0.6 was probably due to the distribution of the training data: the examples in two spirals are very dense at the centre of the space, and become more sparse further out. At that jump, the size of the SECoS almost exactly matched the size of the EFuNN. The swing upward in numbers occurs at the same place for both types of networks, which lends further empirical support to the theory.

The results across the iris data set are presented in Figure 5.10. Here, both networks showed a long, almost flat number of neurons up to the 0.9 region, at which point the number of neurons started to increase rapidly. Note that the number of neurons in EFuNN was higher than in SECoS at all times, except at values of S_thr very near unity. The smoothness of the SECoS curve was probably because of the wider, more even distribution of the training examples.

The behaviour over the Mackey-Glass data set is presented in Figure 5.11. The same effect is seen, although in this case the size difference between SECoS and EFuNN is more pronounced. The upward swing in network size appeared at almost the same point, in the region of 0.9. The predictions were again verified over the gas furnace data (Figure 5.12), with the upward swing in numbers occurring at almost the same point as for the other data sets. Again, the size difference between SECoS and EFuNN is apparent.
Figure 5.10: Average number of neurons versus sensitivity threshold parameter for SECoS (l) and EFuNN (r) networks trained on the iris classification data set.
Figure 5.11: Average number of neurons versus sensitivity threshold parameter for SECoS (l) and EFuNN (r) networks trained on the Mackey-Glass data set.
Figure 5.12: Average number of neurons versus sensitivity threshold parameter for SECoS (l) and EFuNN (r) networks trained on the gas furnace data set.
Discussion

The analysis in Section 5.6 predicts that as the sensitivity threshold approaches unity, so does the probability of adding a neuron to the network. The results in this section show that this is the case across all of the benchmark data sets. The results all have a great deal of similarity about them, that is, the curves are highly similar to one another, despite the great differences in the data sets. The points at which the curves start to increase were quite similar for each of the four benchmark data sets. This indicates that the point at which the sensitivity threshold dominates (that is, the effect of the sensitivity threshold overcomes the effect of the other parameters) is in the region of 0.85-0.95. Below that value, the curves were quite flat, which implies that the other parameters, such as the error threshold, are responsible for adding most of the neurons. The size difference between SECoS and EFuNN was apparent throughout.
5.14.4 Error Threshold

The analysis in Section 5.7 predicted that the number of neurons added to an ECoS network would increase as the error threshold approached zero. The results in this subsection show that this is the case across all benchmark data sets. Figure 5.13 displays the results over the two spirals data set. For both SECoS and EFuNN, it is apparent that the number of neurons added decreased as the error threshold increased, rapidly approaching a plateau at about the 0.1 mark.

The results for the iris classification benchmark are presented in Figure 5.14. Again, a constant decrease into a plateau is apparent, although the plateau occurs at a later point than for the two spirals data set. Also interesting is that the EFuNN was actually smaller than the SECoS, but only for error thresholds greater than 0.1. This suggests that the default parameters selected in Chapter 4 may not have been optimal. However, the accuracy of the smaller
Figure 5.13: Average number of neurons versus error threshold parameter for SECoS (l) and EFuNN (r) networks trained on the two spirals data set.
EFuNNs is unknown: given that the accuracy of the EFuNN in Chapter 4 was not significantly better than that of the smaller SECoS, the smaller EFuNNs would likely be significantly less accurate.

Figure 5.15 presents the results for the Mackey-Glass data set. These results are as predicted, in that both SECoS and EFuNN displayed a decrease in size as the error threshold increased. The decrease is much smoother than for the two classification problems, and the plateau appeared much later, at approximately the 0.45 mark. The SECoS networks were much smaller than the EFuNN, although the two curves are otherwise very similar.

The results across the gas furnace set are presented in Figure 5.16. Again, the networks behaved as predicted, and the curve is again much smoother than those for the classification problems. The plateau in the curve occurred slightly earlier than for the Mackey-Glass results, but still later than for either two spirals or iris classification.

Discussion

The predictions made in Section 5.7 have been validated by these results. For all data sets, as the error threshold increased, the number of neurons added decreased. For three of the four data sets, SECoS was consistently smaller than EFuNN. One difference observed between the classification and function approximation problems is that the decrease in network size was much smoother for the two function approximation data sets. This is likely due to the greater range of error values that are possible during the training of function approximation networks, as discussed in Section 5.10.
5.14.5 Learning Rate Two

Two predictions were made about the effect of the learning rate two parameter. The first was that for classification problems, the number of neurons added will decrease as the learning rate two parameter increases. The second was that for function approximation problems, the number of neurons will decrease up until a certain point, then start to increase again. The results for the two spirals data set are presented in Figure 5.17. As predicted, there is a downward trend to
Figure 5.14: Average number of neurons versus error threshold parameter for SECoS (l) and EFuNN (r) networks trained on the iris classification data set.
Figure 5.15: Average number of neurons versus error threshold parameter for SECoS (l) and EFuNN (r) networks trained on the Mackey-Glass data set.
Figure 5.16: Average number of neurons versus error threshold parameter for SECoS (l) and EFuNN (r) networks trained on the Gas Furnace data set.
the curve, as the number of neurons decreased with the increasing learning rate two parameter. The same effect is visible in Figure 5.18, which presents the results over the iris classification data set. Again, there is a downward trend visible in the curve.

Figure 5.19 presents the results over the Mackey-Glass data set. As predicted, both curves trend downwards until a central plateau, after which they start to trend upwards again. The SECoS curve shows a small plateau at approximately 0.4, while the EFuNN curve plateaus in approximately the 0.5-0.7 range.

Whereas the y-axes for the other plots have all been set the same for SECoS and EFuNN results, Figure 5.20 is different in that the y-axis for the SECoS plot is set to a maximum of fifty, while the y-axis for the EFuNN plot is set to a maximum of 150. This difference is solely for the sake of clarity, and it shows that both networks behaved as predicted: the size of the networks trended downwards, plateaued briefly, then started to trend upwards again.

Discussion

The predictions made in Sections 5.9 and 5.10 have been validated by these results. For classification problems, the number of neurons added during training trends downwards as the learning rate two parameter increases. For function approximation problems, the number of neurons trends downwards to a point, then starts to trend upwards again. Across all data sets, the SECoS networks were significantly smaller than the EFuNN networks.
5.14.6 Benchmark Experiment Conclusions

All of the predictions made in the formalisation have been validated by these results. There are, however, some outstanding points. Firstly, while the general trends of the plots support the predictions made, there are some 'bumps' in the plots,
Figure 5.17: Average number of neurons versus learning rate two parameter for SECoS (l) and EFuNN (r) networks trained on the two spirals data set.
Figure 5.18: Average number of neurons versus learning rate two parameter for SECoS (l) and EFuNN (r) networks trained on the iris classification data set.
Figure 5.19: Average number of neurons versus learning rate two parameter for SECoS (l) and EFuNN (r) networks trained on the Mackey-Glass data set.
Figure 5.20: Average number of neurons versus learning rate two parameter for SECoS (l) and EFuNN (r) networks trained on the gas furnace data set.
or regions of the plot where the number of neurons increases slightly before decreasing again. This is most likely because the formalisation was derived for data that is uniformly distributed in the input space, whereas the benchmark data sets are not uniformly distributed. When there are "clumps" in the data, the effect of the training parameters becomes unpredictable.

Secondly, while the formalisations suggest whether the number of neurons will go up or down, they do not suggest the rate at which this will happen. Thus, the curves all alter at different rates for different data sets, in a way that cannot be predicted. This will require further development of the formalisation.

Finally, as was shown in the formalisations, each of the parameters interacts with the others. This is almost certainly the cause of the plateaus in the curves for the sensitivity and error thresholds. For the sensitivity threshold experiments, the curves descend to a point where the error threshold starts to drive the addition of neurons. For the error threshold experiments, the curves descend to a point where the sensitivity threshold starts to drive the addition of neurons. Although this interaction is predicted in the formalisation, it may be possible to further clarify the mechanisms involved.
5.15 Problems with ECoS Revisited Some problems with ECoS were identified in Section 4.14. The work in this Chapter has identified some additional problems. It was shown in Section 5.3 that some neurons represent only a few training examples: others, due to the changes in the neuron position during training, may represent no examples at all. Ideally, each neuron would represent a large number of examples. While this can possibly be achieved by very careful selection of the training parameters, it is also desirable to have a method of removing extraneous neurons from the network. This should be done either during training, that is, in an online manner, or after training over a specific data set has been completed, that is, in an offline manner. For maximum flexibility, both options should be available. From Equation 5.3, the selection of optimal training parameters is of critical importance. It was shown in Sections 5.6, 5.7 and 5.9 that the sensitivity threshold, error threshold and learning rate two parameters all effect the behaviour of one another. Therefore, finding the optimal combination of these parameters becomes a difficult combinatorial problem. If there is too much of a disparity between the sensitivity threshold and error threshold then the rate of neuron addition will skyrocket. However, if the sensitivity threshold is set too low, or the error threshold is set too high, then the network will have a small size but will learn the training data to a very low level of accuracy. Also, if the learning rate two parameter is set too high, then the output neurons will saturate very quickly. This will cause the rate of neuron addition to slow, but will contribute to over-training. Also from Equation 5.3, the order in which training examples are presented will have an effect upon the performance and behaviour of an ECoS network. 
The activation and error of a network over an example depend upon the neurons present in the network, yet the neurons present depend upon which examples have already been presented. Ideally, examples that represent each of the target classes, and that lie at the centre of the data clusters for those classes, will be presented first, as they will allow for a smoother construction of the network. This is because any new neurons that are added will represent only the outliers of the cluster, which will refine the division of the input space rather than fragmenting it, as may happen with a random-order presentation of
CHAPTER 5. FORMALISATION OF EVOLVING CONNECTIONIST SYSTEMS
examples. Optimisation of the order of training examples is plainly not possible in an online training environment, but provides a chance to optimise the training process for offline training. The requirement that the training parameters, the order of training examples, and the size of the evolving layer all be optimised forms the motivation for much of the original work in this thesis. Solutions to these problems are presented in Chapter 7.
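For offline training, the cluster-centre-first presentation suggested above can be sketched as a simple ordering heuristic. This is illustrative only, not an algorithm from the thesis; the function name and the round-robin interleaving of classes are assumptions:

```python
import math
from collections import defaultdict

def centre_first_order(examples):
    # examples: list of (vector, label) pairs. Within each class, examples
    # closest to the class centroid come first; classes are interleaved
    # round-robin so that every target class is represented early.
    by_class = defaultdict(list)
    for x, y in examples:
        by_class[y].append(x)

    def centroid(vectors):
        n = len(vectors)
        return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    per_class = []
    for label, vectors in by_class.items():
        c = centroid(vectors)
        per_class.append([(v, label) for v in sorted(vectors, key=lambda v: dist(v, c))])

    ordered = []
    i = 0
    while any(i < len(lst) for lst in per_class):
        for lst in per_class:
            if i < len(lst):
                ordered.append(lst[i])
        i += 1
    return ordered
```

Under this ordering, the first examples presented lie at the centres of the class clusters, so later additions tend to represent only the outliers of each cluster.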
5.16 Conclusions

The overall purpose of this chapter was to address Hypothesis Three from Section 1.2. The criteria for assessing this hypothesis were as follows:

1. A formalisation is created that is experimentally testable.

2. The experiments performed do not disprove the formalisation.

The experiments over the benchmark data set in Section 5.14 showed that the formalisation was experimentally testable. The first criterion is thus satisfied. The results of the experiments did not disprove the formalisation. This therefore satisfies the second criterion. Under the criteria defined for this hypothesis in Section 1.3, the hypothesis is considered to be supported. During the analysis in this chapter, it was proven that the sensitivity threshold, error threshold and learning rate two parameters are all intrinsically linked, that is, each of these parameters will affect the behaviour of the other parameters during training of an ECoS network. This shows that optimisation of the training parameters is a multi-parameter optimisation problem: it is not possible to optimise the training parameters independently; rather, they must be optimised simultaneously. This difficulty in optimising the training parameters provides the motivation for some of the following work, as described in Subsection 7.3.2.
5.17 Summary

The work in this chapter sprang from Hypothesis Three of the thesis and is a major original contribution of the thesis, describing a novel theoretical basis for evolving connectionist systems. An earlier theory was discussed and critiqued in Section 5.2, and these criticisms were addressed using a new formalisation. The geometrical basis of an ECoS network was described in Section 5.3. The theoretical basis of ECoS training was discussed and formalised in Section 5.4. The effect of each training parameter was analysed in Sections 5.6 to 5.9. The difference in behaviour of the training algorithm between classification and function approximation problems was discussed in Section 5.10. Some thoughts on the convergence of ECoS over a finite data set were offered in Section 5.11. This was followed by an experimental validation of the predictions made by the formalisation in Section 5.14. The results in that section supported the predictions made by the formalisation.
Chapter 6
Fuzzy Rules and Evolving Connectionist Systems

"The beginning of knowledge is the discovery of something we do not understand." (Frank Herbert)
6.1 Introduction

Rule extraction is the process of formulating, from a trained artificial neural network (ANN), a set of symbolic rules that mimic the behaviour of the ANN. Six motivations for extracting rules from ANN are given in (Andrews et al., 1995, pg 374-375):

1. Provision of a user explanation capability.

2. Extension of ANN systems to safety-critical problem domains.

3. Software verification and debugging of ANN components in software systems.

4. Improving the generalisation of ANN solutions.

5. Data exploration and the induction of scientific theories.

6. Knowledge acquisition for symbolic AI systems.

These motivations are expanded as follows:

1. Artificial neural networks are usually regarded as "black boxes" (Kosko, 1992, pg 304-305). One of their greatest strengths is that they can be created and trained using only the available data, without any knowledge of the process being modelled. This is also one of their greatest weaknesses, as the knowledge that they have discovered is not readily understandable. By extracting rules from a trained ANN, it becomes possible to elucidate what has been learned.

2. People are unwilling to trust a system that cannot be explained. This is important when ANN are to be used in critical applications such as medical monitoring or industrial control. Therefore, the user explanation capability of point one above can allow for the inspection of the ANN process, allowing the ANN to be applied to a safety-critical application with greater confidence.
3. When an ANN is treated as a black box, it is very difficult to know whether it has truly learned the target data. A famous example of this is cited in (Ripley, 1993, pg 103), where an ANN had been trained to distinguish between photographs with battle tanks in them and photographs of empty scenery. Although the ANN appeared to be functioning correctly, it later transpired that the photographs with tanks had been developed differently to those without: the ANN had thus learned to distinguish between light and dark photographs. If rules had been extracted from the ANN, this problem would have become apparent much earlier.

4. By extracting rules from an ANN trained on a small or limited data set, an experienced user may be able to predict when the generalisation of the ANN will fail. This will allow the user to modify and supplement the data set as necessary to solve this problem.

5. The black box property of ANN can actually be exploited. If nothing is known about a set of data, an ANN can still learn the processes within it. By applying a method of explanation to the ANN, it becomes possible to discover knowledge about the unknown process. Even if ANN are not being used to solve a problem directly, they can produce knowledge that can be applied to solving it.

6. Although symbolic AI systems, such as rule-based expert systems, are useful in solving some problems, a major problem is acquiring the rules. An ANN can be used to learn these rules, which can then be extracted and applied in the expert system. As stated in (Gallant, 1993, pg 315):

A ... use for If-Then rules occurs when constructing a neural network expert system with the aid of a human expert. We can take rules that are implicitly encoded in the network, and ask for comments from our human expert... Presumably it is easier for experts to comment upon rules than it is to create a consistent collection of If-Then rules (with confidence measurements) as in conventional expert systems.
From the above motivations, then, the primary reasons for extracting rules from ANN are explanation and knowledge discovery. Rule extraction algorithms have been developed to facilitate both of these goals. These algorithms are designed to extract rules through one or more of the following methods:
By observing the performance of the ANN over a set of data.
By analysing the connection weight values.
By some combination of these two methods.
Rule extraction algorithms exist that create both crisp and fuzzy rules (Section 2.2). Fuzzy rules are often favoured because of their greater expressive power, due to the greater semantic power of their predicates and consequents (Kasabov, 1996a, pg 178). Many ANN can have only one kind of rule extracted from them, as they have been designed with a specific rule extraction algorithm in mind. An example is the Knowledge Based Neural Network (Towell and Shavlik, 1993), which can have only crisp rules extracted from it. Another example is the Fuzzy Neural Network FuNN (Section 2.3), which, by virtue of its structure, can only have fuzzy rules extracted from it.
As discussed in Section 2.3, the initial values of an ANN's connection weights can have a great effect upon its training performance. If the connection weights are far from optimal, it becomes difficult for the ANN to learn the training data. There is also the problem of how to accommodate existing knowledge about the problem into the ANN: if a partial solution is already known, it makes more sense to utilise that knowledge than to start from scratch. Because of these problems, rule insertion algorithms have also been developed (Towell et al., 1990) that will use existing rules to create and initialise an ANN. This allows existing knowledge to be included in the ANN from the start, which helps them to avoid the problems of a bad initial training point. Another motivation for rule insertion algorithms is "knowledge refinement" (Towell and Shavlik, 1993; Andrews and Geva, 1997). This is when a set of initial, rough rules is inserted into an ANN, which is then further trained to refine the rules. At the conclusion of training, new rules are extracted, which represent the refined knowledge. The importance of rule extraction to the ANN field, along with the desire to explain ECoS networks, is the motivation for Hypothesis Four, as described in Section 1.2, which leads to the work in this chapter. Here, algorithms for extracting and inserting knowledge from and into ECoS-style networks are presented, evaluated and critiqued. The primary motivation for this work is the explanation of trained ECoS networks. Therefore, the rules that are extracted should be comprehensible. It is widely stated in the fuzzy logic community (Kosko, 1992; Kasabov, 1996a) that fuzzy rules are easier to comprehend than crisp rules. For this reason, fuzzy rule extraction is the focus of this chapter. Also, the rule extraction algorithms should not require modification or alteration of the network before rule extraction is carried out.
This is primarily because rule extraction algorithms are intended to explain a trained ECoS: if an ECoS network will adequately solve the problem, then the ANN should be used, and rule extraction performed only when an explanation of the network's behaviour is required. A final comment about rule extraction is that all rule extraction algorithms compromise, to some extent, the accuracy of the extracted rules. As stated in (Gallant, 1993, pg 322), "we have to make some compromise, because the task is inherently impossible". In other words, it is generally not possible to perfectly recreate the performance of a neural network with rules. Ways in which ECoS fuzzy rule extraction algorithms compromise are discussed in Section 6.10. This chapter investigates Hypothesis Four, that is, it investigates the possibility of extracting fuzzy rules from SECoS networks. The rest of this chapter is arranged as follows: a framework for classifying and evaluating ANN rule extraction algorithms is presented in Section 6.2, and previous work in this field is briefly summarised in Section 6.3. The algorithm developed for extracting fuzzy rules from the FuNN model presented in Section 2.3, which is the forebear of the algorithms applied to ECoS, is described and evaluated in Section 6.4. Section 6.5 presents justification for extracting fuzzy rules from ECoS, in terms of relating fuzzy rules to ECoS networks, while Section 6.6 describes, evaluates and critiques the rule extraction algorithm developed for EFuNN (Section 4.3). Novel algorithms for extracting fuzzy rules from SECoS are presented in Section 6.7, while algorithms for inserting fuzzy rules into SECoS are presented in Section 6.8. A method for applying and evaluating the performance of fuzzy rules extracted from ECoS networks is described in Section 6.9. Problems with the fuzzy rules extracted are described in Section 6.10.
Experimental results over the benchmark data sets are presented and discussed in Section 6.11: the purpose of these experiments is to support the evaluation of Hypothesis Four. Finally, conclusions to the chapter are presented.
6.2 Evaluation of Algorithms for Rule Extraction from Neural Networks

Andrews et al. (Andrews et al., 1995) list five criteria by which a rule extraction algorithm may be evaluated.
The expressive power of the rules.
The quality of the rules.
The translucency of the ANN to the rule extraction algorithm.
The complexity of the rule extraction algorithm.
The portability of the algorithm.
The expressive power of the rules is also referred to as the rule format. This is the form of the rules, for example propositional, symbolic, fuzzy or Boolean. The quality of the rules is assessed according to four criteria:
Accuracy of the rules, that is, how well the rules classify unknown examples.
Fidelity of the rules, which is how well the rules replicate the behaviour of the ANN they were extracted from.
Rule consistency, that is, the extent to which rules extracted from different ANN trained on the same data agree with one another.
Rule comprehensibility, which is directly related to the number of rules and the number of antecedents in each rule. Comprehensibility has been described by (Michalski, 1983, pg 122) in his comprehensibility postulate as follows: The results of computer induction should be symbolic descriptions of given entities, semantically and structurally similar to those a human expert might produce observing the same entities. Components of these descriptions should be comprehensible as single ‘chunks’ of information, directly interpretable in natural language, and should relate quantitative and qualitative concepts in an integrated fashion.
The translucency of the ANN to the rule extraction algorithm is a way of describing how much of the ANN is visible to the algorithm. These descriptions range from those algorithms where every neuron and connection weight is inspected, to those where the ANN is treated as a “black box”. Andrews et al. described the former as a “decompositional” algorithm, and the latter as “pedagogical”. They also identified a third class, “eclectic”, that combines aspects of the other two. Complexity refers to the complexity of the core algorithms of the rule extraction procedure. Portability describes how closely tied the rule extraction algorithm is to the ANN architecture it is applied to. Some algorithms are portable across different types of ANN, while some will work only with the ANN model they were designed for. The criteria described above will be used to evaluate the algorithms described in this chapter.
6.3 Previous Work in Rule Extraction

The integration of knowledge representation and neural networks has been done in many ways. Crisp, propositional rules (Gallant, 1993; Vaughn et al., 1993), Boolean logic rules (Towell and Shavlik, 1993), decision trees (Ivanova and Kubat, 1995) and finite-state automata (Tino and Koteles, 1999) have all been dealt with in previous work. Since fuzzy rules are the focus of the original work in this chapter, only fuzzy rule extraction algorithms will be covered in this section. Rule extraction algorithms can be usefully classified in two ways: firstly, according to the type of rules that are extracted; secondly, according to the mechanism used to extract the rules. As described in Section 6.2, there are three kinds of rule extraction algorithm. The first is decompositional, which examines the structure and connection weights of the network directly. That is, a decompositional rule extraction algorithm will "open up" the network and examine its internal workings; it treats the ANN as a "white box". The second kind of algorithm is pedagogical, which works by analysing the behaviour of the network over specific input examples. While a decompositional algorithm treats the ANN as a white box, a pedagogical algorithm treats the ANN as a "black box", and does not examine its internal structures at all. The third class of rule extraction algorithm is known as eclectic, and combines elements of both decompositional and pedagogical algorithms. Distinctions are also made between algorithms that extract local and global rules. Local rules are rules that relate to individual neurons, that is, the rules are extracted so that they reflect individual neurons and the knowledge encapsulated by those neurons. Algorithms that extract local rules must therefore be of the decompositional type.
Global rules are more holistic, in that the functioning of the network as a whole is used to generate the rules, and the rules represent the entire network rather than just fragments of it. Global rule extraction algorithms therefore tend to be pedagogical in nature. Since pedagogical algorithms require the presence of data sets, usually training data, to function, they will be of little use when extracting rules from ECoS networks. This is because ECoS networks are intended to learn throughout their lifetime, which means that the training data may not be available when rule extraction takes place. Ideally, the rule extraction algorithm will not require alteration of the network to function. Although many fuzzy rule extraction algorithms do require modification of the network before rules can be extracted (Umano et al., 1997; Hakim, 2001), the optimisation or alteration of the trained ANN solely for the purposes of rule extraction should be avoided. This is because of the focus on using rule extraction to explain ECoS networks. If the network is altered solely for rule extraction, then the rules will not explain the original network. Finally, the rule extraction algorithm should be applicable to existing networks without modification to the network learning algorithm. There are already several fuzzy rule extraction algorithms that work solely on network algorithms that were expressly designed for rule extraction (Ichimura et al., 1997; Furuhashi et al., 1997; Ishibuchi et al., 1997; Castellano and Fanelli, 2000). In this thesis, the motivation for rule extraction is explanation of an existing, useful ANN, where usefulness of the ANN is determined by how well it performs over the problem.[1] To summarise the requirements from the Introduction and above, the algorithms examined must:
Result in the creation of fuzzy rules.

Not require the use of training data sets.

Not require alteration of the trained ANN before rule extraction is carried out.

Not require modification of the ANN algorithm or training algorithm.

[1] Although an algorithm for extracting fuzzy rules from EFuNN is presented in this chapter, it is included because EFuNN is an ECoS network, and EFuNN is not intended solely as a knowledge discovery tool. Similarly, the REFuNN algorithm for extracting fuzzy rules from a FuNN is also included, because of the influence it had on the EFuNN fuzzy rule extraction algorithm.
The previous work presented in this section is therefore constrained to work that fulfils the above requirements. Under the evaluation criteria of (Andrews et al., 1995), the algorithms reviewed in this section have some criteria in common. All of the algorithms are expressive, because they all extract fuzzy rules. The algorithms must be decompositional in nature, as no data is assumed to be present. Quality of the rules can only be reported if it is evaluated in the publication; most do not do so. All of the publications discussed in this section deal with extraction from a specific type of neural network, so portability is not an issue. The work reported in (Mukaidono and Yamaoka, 1992) dealt with a conventional MLP that was trained on fuzzified inputs and outputs. The initial number of hidden neurons in the MLP was equal to the number of input membership functions raised to the power of the number of inputs: thus, there was one hidden neuron for each possible rule. After training by the backpropagation algorithm, the relative importance of each hidden neuron is calculated. Unimportant hidden neurons are then removed and the network re-trained. This process is repeated iteratively until the network error no longer improves: the hidden neurons that remain thus represent the rules necessary to represent the system. In (Matthews and Jagielska, 1995) a technique is presented that extracts fuzzy rules from an MLP that is trained on data that has been passed through external fuzzy membership functions. It is in many ways similar to the FuNN network presented in Section 2.3, except that the MF for FuNN are internal. In both cases, the MLP is learning to associate fuzzy inputs with fuzzy outputs. The paper describes two decompositional methods of extracting fuzzy rules from the trained MLP. In the first, the hidden layer neuron that contributes the most to the activation of a particular output neuron is identified.
The positive incoming weights to that neuron are used to identify the antecedents for the rule that corresponds to that neuron. The second method estimates both the excitatory and inhibitory effect of each input neuron, relating these to each output neuron. In many ways, it is a simplified version of the work in (Mukaidono and Yamaoka, 1992). Of particular interest in this paper is the way in which the extracted fuzzy rules were evaluated. Rather than evaluating the rules with a conventional compositional fuzzy algorithm, where the outputs of all rules are weighted and combined, only the output of the winning rule was calculated. The fuzzy system was thus a 'winner take all' system. This paper was followed up by (Whitfort et al., 1995), where the rule extraction technique was compared to a genetic algorithm based rule formulation technique, and (Jagielska et al., 1996), where rule extraction, genetic algorithm based formulation and rough set based techniques were all compared. A genetic algorithm was used to train a three-neuron-layer MLP in (Brasil et al., 2000). The GA's fitness function was a simple linear function of the network error: no special genetic operators were applied, nor was the architecture of the network fundamentally altered. The MLP in this case was again learning to map fuzzified input data to fuzzified output data. Rules were extracted by presenting fuzzy input and output vectors to the network, and identifying which neurons and connections contributed the most to the activation of the output neurons. The input and output values were fuzzified to form the antecedents and consequents. Although this requires the use
of data examples, it also closely examines the neuron activations and connection weight values, and can thus be classified as decompositional. It is also not clear whether the input and output vectors used were from the training data set or whether they were generated on the fly. Training on fuzzy input and output data was also the method used in (Huang and Benjamin, 2001). Here, however, the rules were extracted via a depth-first tree search, in which the transformed connection weights were sorted and organised into a tree data structure: transformed values were used to speed the search by eliminating negative values in the tree. A rule was formed when a complete traversal from the root to a leaf yielded a positive sum of weights, when the bias is taken into account. This rule extraction algorithm was applied to a manufacturing problem.
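The 'winner take all' evaluation described above can be sketched as follows. The rule representation, the triangular membership function and the min-composition used for firing strength are illustrative assumptions, not details taken from the cited paper:

```python
def triangular(x, centre, width):
    # Simple triangular membership function centred at `centre`.
    return max(0.0, 1.0 - abs(x - centre) / width)

def fire_strength(rule, inputs):
    # Min-composition of the antecedent membership degrees.
    return min(
        triangular(inputs[var], centre, width)
        for var, (centre, width) in rule["antecedents"].items()
    )

def winner_take_all(rules, inputs):
    # Only the single most strongly firing rule contributes to the output,
    # instead of weighting and combining the outputs of all rules.
    winner = max(rules, key=lambda rule: fire_strength(rule, inputs))
    return winner["consequent"]

rules = [
    {"antecedents": {"x": (0.0, 1.0)}, "consequent": "low"},
    {"antecedents": {"x": (1.0, 1.0)}, "consequent": "high"},
]
print(winner_take_all(rules, {"x": 0.2}))
```

For the input 0.2, the "low" rule fires at 0.8 and the "high" rule at 0.2, so only the "low" consequent is produced; a compositional system would instead blend both.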
6.4 Fuzzy Rule Extraction from FuNN

The REFuNN (Rule Extraction from FuNN) algorithm was proposed in (Kasabov, 1996b) and is designed as a means of extracting Zadeh-Mamdani fuzzy rules from trained FuNN networks.
6.4.1 The REFuNN Algorithm

The REFuNN algorithm consists of five distinct steps:

1. Initialisation of a FuNN.

2. Training the FuNN.

3. Extracting the initial set of weighted rules.

4. Extracting simple rules from the set of weighted rules.

5. Aggregating the initial weighted rules.

Each of these steps will now be expanded.

1. Initialisation of a FuNN involves creating a FuNN structure and setting its initial weight values. These weight values can be set randomly or by insertion of fuzzy rules.

2. Training the FuNN is done via backpropagation of errors, in much the same way as training an MLP. The input and output membership functions may or may not be modified during this training (Subsection 2.3.9).

3. The initial weighted rules are extracted from the FuNN as follows: each action layer neuron is examined, and all incoming connection weights that are above a certain threshold Tha are identified. The rule neurons that these connections lead from are then identified. These are taken to be the rules, while the action layer neurons are the consequents. For each rule neuron, each condition-to-rule connection that is above the threshold Th is identified. These connections identify which condition neurons form the antecedents of the rules. The values of the weights from the condition neurons and to the action neurons are taken as the initial degrees of importance of each antecedent and the certainty factor for each consequent.
4. Extraction of simple rules is done by simply removing the degrees of importance. In the case of a rule that has antecedents important enough to cause the rule to fire irrespective of the other antecedents, the rule is decomposed into several other rules, where the important antecedents are separated into their own rules.

5. Aggregating the initial rules is done by combining all rules that have the same antecedents and consequents into one rule. The degrees of importance and certainty factors are aggregated by taking the normalised sum across each rule being aggregated.
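The weighted-rule extraction of step 3 can be sketched as follows. This is an illustrative reading only, assuming simple list-of-lists weight matrices; the variable names and the dictionary rule format are not from the REFuNN publication:

```python
def extract_weighted_rules(w_condition_rule, w_rule_action, thr_c, thr_a):
    # w_condition_rule[c][r]: weight from condition neuron c to rule neuron r.
    # w_rule_action[r][a]: weight from rule neuron r to action neuron a.
    rules = []
    n_rules = len(w_rule_action)
    n_actions = len(w_rule_action[0])
    for a in range(n_actions):
        for r in range(n_rules):
            w_ra = w_rule_action[r][a]
            if w_ra <= thr_a:
                continue  # connection too weak to justify a rule
            antecedents = {
                c: row[r] for c, row in enumerate(w_condition_rule) if row[r] > thr_c
            }
            if antecedents:
                # Surviving condition weights act as degrees of importance;
                # the rule-to-action weight acts as the certainty factor.
                rules.append({"antecedents": antecedents, "consequent": (a, w_ra)})
    return rules
```

With two thresholds, only strongly weighted condition and action connections survive into the rules, which is the trade-off noted in the evaluation: high thresholds lose knowledge, low thresholds bloat the rules.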
6.4.2 Evaluation of the REFuNN Algorithm

Using the classification scheme of Andrews et al. (Section 6.2), the REFuNN algorithm may be evaluated as follows. The end product of the REFuNN algorithm is fuzzy rules, so the expressive power of the rules is high. Provided that suitable labels are chosen for the inputs, outputs and MF, the resulting rules will be easy to understand. The quality of the rules is dependent upon the thresholds set during rule extraction: if the thresholds are set too high, then too much knowledge will be lost, while a threshold that is too low will include too many elements in the rules, impacting both the quality of the rules and their expressive power. REFuNN examines the connection weights, and can thus be described as a decompositional algorithm. The translucency of the FuNN network to the REFuNN algorithm is very high. The complexity of the rule extraction process itself is quite low, but the rule aggregation phase can be computationally very demanding if the number of rules is large: comparing each rule to every other rule requires n(n - 1) comparisons, for n rules. Finally, the portability of the REFuNN algorithm is low: it is designed expressly for FuNN-style ANN. It could conceivably be used, with slight modifications, on a FuNN-derived network, such as a four-layer FuNN, but not on any other kind of network. The algorithm used for extracting fuzzy rules from FuNN is, however, useful as the basis of and inspiration for algorithms for extracting fuzzy rules from EFuNN.
6.5 Fuzzy Rule Extraction from ECoS Networks

Justification for the approach used to extract fuzzy rules from ECoS networks can be found in (Kosko, 1993, pg 205):

... rules are patches. So the question is how you learn patches. The answer is patches are clusters in the data. Learn data clusters and you learn patches and you learn rules and you have an adaptive fuzzy system.

Kosko goes on to describe the Adaptive Vector Quantization (AVQ) method that is used to determine fuzzy rules. The AVQ algorithm was described as follows in (Kong and Kosko, 1992, pg 217-218):
1. Initialize synaptic vectors: m_i(0) = x(i), i = 1, ..., p. Sample-dependent initialization avoids many pathologies that can distort nearest-neighbor learning.

2. For random sample x(t), find the closest or "winning" synaptic vector m_j(t):

||m_j(t) - x(t)|| = min_i ||m_i(t) - x(t)||

where ||x||^2 = x_1^2 + ... + x_n^2 defines the squared Euclidean vector norm of x. We can define the N synaptic vectors closest to x as "winners".

3. Update the winning synaptic vector(s) m_j(t) with an appropriate learning algorithm.

This clustering is very similar to the ECM used in DENFIS (Section 4.7), which is itself equivalent to the learning in the input layer to evolving layer connection weights of SECoS and EFuNN. The only difference is that whereas the number of prototypes in AVQ is fixed, it is dynamic in ECM. Neurons in the evolving layer of EFuNN and SECoS networks perform the same function as the prototypes in ECM and AVQ, that is, they represent the prototypes of the clusters being learned. Thus, each evolving layer neuron can be interpreted as representing a patch in space (actually a Voronoi polygon, as established in Chapter 5). Therefore, each neuron can be represented as a fuzzy rule, which will represent the data on which the ECoS was trained. Further support, by counter-example, can be found in (Mitra and Hayashi, 2000, pg 748), where it is stated:

Often an ANN solution with good generalisation does not necessarily imply involvement of hidden units with distinct meaning. Hence any individual unit cannot essentially be associated with a single concept or feature of the problem domain. This is typical of connectionist approaches, where all information is stored in a distributed manner among the neurons and their associated connectivity.

This supports the concept of extracting fuzzy rules from ECoS networks by virtue of the fact that the knowledge in ECoS is purely local rather than distributed. Thus, each neuron does have meaning, which can be expressed in the form of a fuzzy rule.
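The AVQ procedure quoted above can be sketched as follows. The learning rate, epoch count and the "move the winner toward the sample" update are assumptions standing in for the "appropriate learning algorithm" of step 3:

```python
def avq_train(samples, initial_prototypes, learning_rate=0.1, epochs=10):
    # Competitive learning with a fixed number of prototypes: for each sample,
    # the nearest prototype (squared Euclidean distance) wins and is moved
    # toward the sample. In ECM/ECoS the number of prototypes would instead
    # grow as training proceeds.
    prototypes = [list(p) for p in initial_prototypes]
    for _ in range(epochs):
        for x in samples:
            j = min(
                range(len(prototypes)),
                key=lambda i: sum((p - xi) ** 2 for p, xi in zip(prototypes[i], x)),
            )
            prototypes[j] = [
                p + learning_rate * (xi - p) for p, xi in zip(prototypes[j], x)
            ]
    return prototypes

# Sample-dependent initialisation, as the quoted step 1 recommends.
samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
prototypes = avq_train(samples, [samples[0], samples[3]])
```

After training, each prototype sits near the centre of one data cluster; interpreting each prototype (or, in ECoS, each evolving layer neuron) as a patch in the input space is what justifies reading it as a fuzzy rule.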
6.6 Extraction of Fuzzy Rules from EFuNN

It is to be expected that fuzzy rules can be extracted from a trained EFuNN, because the structure of EFuNN is based upon the structure of FuNN (Section 2.3), and fuzzy rules can be extracted from FuNN (Section 6.4). The structure of EFuNN (Section 4.3) restricts the rules that are extracted to the Zadeh-Mamdani type. The MF used in extracting rules are those that are embedded in the structure of the EFuNN itself. As there are MF attached to both the input and output neurons, extracting Takagi-Sugeno rules would be much more difficult than extracting Zadeh-Mamdani rules. While the former would require an extensive decomposition of the structure and function of the EFuNN network, the latter requires only an examination of the connection weights. The following subsection describes how this is done.
6.6.1 The Rule Extraction Algorithm

The EFuNN fuzzy rule extraction algorithm (RE-EFuNN) is based on the principle that an EFuNN rule node represents a mapping from a cluster of data to a certain class: this is consistent with the justification presented in Section 6.5. The algorithm, as described in (Kasabov and Woodford, 1999, pg 1409), is as follows:
1. A EFuNN (sic) is evolved on incoming data.

2. The values W1(i,j) are thresholded: if W1(i,j) > Thr1 then W1t(i,j) = W1(i,j), otherwise W1t(i,j) = 0.

3. The values W2(j,k) are thresholded in a similar way with the use of a threshold Thr2.

4. A rule Rj that represents a rule node j (j = 1, 2, …, Nrn) is formed as follows:

Rj: IF x1 is I1 [W1t(i1,j)] AND x2 is I2 [W1t(i2,j)] AND … AND xn is In [W1t(in,j)] THEN y1 is L1 [W2(j,l1)] AND y2 is L2 [W2(j,l2)] AND … AND ym is Lm [W2(j,lm)],

where I1, I2, …, In are the fuzzy values (labels) of the input variables x1, x2, …, xn correspondingly, with the highest connection weights to the rule node j that are above the threshold Thr1; L1, L2, …, Lm are the fuzzy values of the output variables y1, y2, …, ym correspondingly, that are supported by the rule node j by connection weights above the threshold Thr2. The values W1t(i,j) are interpreted as fuzzy co-ordinates of the clusters represented in the rules Rj, and the values W2(j,l) are interpreted as certainty factors that this cluster belongs to the fuzzy output class.

5. The rules that have the same condition and action parts, but differ in fuzzy co-ordinate values and certainty factors, are aggregated by taking the average value across the fuzzy co-ordinates and the maximum value for the certainty degrees. Taking an average value for the fuzzy co-ordinates is equivalent to finding the geometrical centre at (sic) the cluster that aggregates several rule nodes into one.
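The thresholding and rule-formation steps above can be sketched as follows, under the assumption that W1 and W2 are stored as NumPy arrays indexed [condition MF, rule] and [rule, action MF]; the names here are illustrative, not taken from the original implementation.

```python
import numpy as np

def re_efunn(W1, W2, thr1=0.1, thr2=0.1):
    """Illustrative sketch of RE-EFuNN rule formation.

    W1: condition-to-rule weights, shape (n_condition_mf, n_rules)
    W2: rule-to-action weights, shape (n_rules, n_action_mf)
    Returns one rule per rule node: the antecedent and consequent MF
    indices whose thresholded weights survive, with weights attached.
    """
    W1t = np.where(W1 > thr1, W1, 0.0)  # step 2: threshold W1
    W2t = np.where(W2 > thr2, W2, 0.0)  # step 3: threshold W2
    rules = []
    for j in range(W1.shape[1]):        # step 4: one rule per rule node j
        antecedent = [(i, W1t[i, j]) for i in range(W1.shape[0]) if W1t[i, j] > 0]
        consequent = [(k, W2t[j, k]) for k in range(W2.shape[1]) if W2t[j, k] > 0]
        rules.append((antecedent, consequent))
    return rules
```

The aggregation of step 5 is omitted here, since (as argued below) duplicate rules indicate redundant neurons that are better merged in the network itself.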
6.6.2 Evaluation of the RE-EFuNN Algorithm

Using the classification scheme of Andrews et al (Section 6.2), the RE-EFuNN algorithm can be evaluated as follows. As with the REFuNN algorithm, the use of fuzzy rules allows for a high degree of expressive power of the extracted rules. The fact that the algorithm can extract one rule for each rule neuron in the EFuNN, however, coupled with the number of rule neurons that may be present in an EFuNN, means that an excessively large number of rules can be created, obscuring the semantics of the rules. Rule quality in the RE-EFuNN algorithm is less dependent upon the thresholds used in the rule extraction process than it is in the REFuNN algorithm: with fewer parameters it is easier to optimise the performance of the algorithm. The translucency of the EFuNN is again high: RE-EFuNN is a decompositional algorithm. The algorithm is, however, unnecessarily complex. There are several steps in the algorithm that can be eliminated, saving a large amount of computational time in the rule extraction process. The rule aggregation process is again the major culprit, with the number of comparisons being n(n − 1) for n neurons in the EFuNN rule layer.
Finally, the portability of the algorithm is very low. It
was designed expressly for EFuNN networks, and cannot be used unmodified on other types of networks. It is, however, the inspiration for the algorithm for extracting Zadeh-Mamdani rules from SECoS (Subsection 6.7.1). Several criticisms can be levelled at the RE-EFuNN algorithm. The first concerns the weight zeroing step. If the algorithm only looks for the winning connection from each condition neuron, what is the point in zeroing?
Small weights will be ignored anyway. In all fairness, this is an optional step and is not an intrinsic flaw in the algorithm. The second is the rule aggregation step. If two rule neurons produce rules that have the same antecedents and consequents, then that would seem to indicate that the two neurons are themselves redundant. Thus, the neurons would best be aggregated within the network itself. This would yield a smaller, more efficient network, and would eliminate the rule aggregation step from the rule extraction algorithm. The third criticism is that the rule to action layer connections are taken as certainty factors. The problem with this is that certainty factors must be in the range [0, 1], while the weights in this connection layer can, as was shown in Chapter 5, grow above one. Thus, the algorithm could yield rules with certainty factors greater than one. This is a minor problem and could be solved by simply taking the normalised values of the rule to action weights as the certainty factors. The final criticism deals more properly with the architecture of EFuNN itself. In the canonical EFuNN algorithm, the fuzzy membership functions are stored as connections and neurons in the condition layer of the network. These membership functions are extracted along with the encoded rules during rule extraction. However, the membership functions do not change during training: thus, the membership functions in the extracted rules are as arbitrary as they were when the network was created. Also, changing the input membership function without modifying the connection weights leading from the condition neurons to the rule neurons will cause a decrease in accuracy. This is because the condition to rule connection weights represent the fuzzified input values of the previous training examples. If the membership functions are changed, then the fuzzified values of the same examples will be different.
Thus, the network will not be able to accurately classify previously seen examples. This therefore rules out the use of such membership function optimisation methods as genetic algorithms. The membership functions of an EFuNN can be set once only, at the start of its lifetime, and cannot be easily changed, even if the rules later extracted from the network show that these are not optimal.
6.7 Extraction of Fuzzy Rules from SECoS Networks

As demonstrated in Section 4.13, the size of the evolving layer in a SECoS is often smaller than that of the equivalent EFuNN. Also as demonstrated in Section 4.13, they can be just as accurate as EFuNNs and can be applied to the same problem domains. The presumed inability to extract fuzzy rules from a SECoS is perceived as one of the model's major disadvantages when compared to EFuNN. This section introduces algorithms that allow for the extraction of both Zadeh-Mamdani and Takagi-Sugeno fuzzy rules from a trained SECoS network. The major difference between EFuNN and SECoS is the absence in SECoS of the input and output membership functions. This lack of membership functions is what accounts for the smaller size of a trained SECoS: the fuzzification in an EFuNN expands the input and output space dimensionality, which requires more rule neurons to learn it. However, comparing the RE-EFuNN algorithm of Section 6.6, and the REFuNN algorithm of Section 6.4, with the previous work discussed in Section 6.3, suggests that FuNN and EFuNN may be the right way to go
insofar as learning fuzzy rules is concerned. Most of the previous work discussed in Section 6.3 involved using a standard ANN to learn fuzzified input and output data, that is, learning fuzzy input to output mappings. Since the MF in FuNN and EFuNN networks are fixed, that is essentially what is happening: the FuNN and EFuNN networks are learning to map fuzzified input vectors to fuzzified output vectors. However, it occurred to the author that for the case of EFuNN, this is completely unnecessary. As established in Chapter 5, ECoS networks learn and store exemplars in their input to evolving layer connections. In the case of EFuNN, these are fuzzy exemplars, while for SECoS they are crisp exemplars. What, then, is the difference between storing fuzzified exemplars in an EFuNN, and fuzzifying the crisp examples stored in a SECoS? It is the hypothesis of this chapter that fuzzy rules of both Zadeh-Mamdani and Takagi-Sugeno type can be extracted from a SECoS by providing membership functions during the rule extraction process. The fuzzy rule extraction process is then a matter of combining the provided fuzzy membership functions with the properties of the trained SECoS network in such a manner that fuzzy propositions and consequents are created that adequately reflect the knowledge captured by the SECoS. Since ECoS learn by partitioning the input space into regions, and fuzzy rules can be visualised as methods of associating regions of input space (Section 6.5) with consequents, the fit between the two models is quite natural: rule extraction becomes a matter of mapping the two sets of regions together and by so doing finding the antecedents and consequents.
6.7.1 Extracting Zadeh-Mamdani Fuzzy Rules

Zadeh-Mamdani fuzzy rules are useful for classification tasks and have the (arguable) advantage of being more comprehensible than Takagi-Sugeno rules, especially for a large number of consequent variables. The algorithm for extracting Zadeh-Mamdani rules from trained SECoS networks is as follows:
For each evolving layer neuron h:

– Create a new rule.

– For each input neuron i:

- Find the MF associated with i that activates the most strongly for the weight W_i,h. Add that MF to the antecedent of the rule for that input. This is the MF for that input for this rule.

- Insert the membership degree of the weight in the winning MF as the degree of importance for this condition.

– For each output neuron o:

- Find the MF associated with o that activates the most strongly for the weight W_h,o. Add that MF to the consequent of the rule for that output. This is the MF for that output for this rule.

- Insert the membership degree of the weight in the winning MF as the degree of importance for this consequent.
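This extraction loop can be sketched as follows, assuming triangular MF supplied externally as (label, feet, peak) parameters; all names and the MF representation are illustrative.

```python
import numpy as np

def tri_mf(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def extract_zm_rules(W_in, W_out, in_mfs, out_mfs):
    """Sketch of Zadeh-Mamdani rule extraction from a trained SECoS.

    W_in:  input-to-evolving weights, shape (n_inputs, n_neurons)
    W_out: evolving-to-output weights, shape (n_neurons, n_outputs)
    in_mfs/out_mfs: per-variable lists of (label, a, b, c) triangles.
    One rule per evolving layer neuron; for each weight, the most
    strongly activated MF becomes the antecedent/consequent label and
    its membership degree becomes the degree of importance.
    """
    rules = []
    for h in range(W_in.shape[1]):
        antecedent, consequent = [], []
        for i in range(W_in.shape[0]):
            degs = [(lbl, tri_mf(W_in[i, h], a, b, c)) for lbl, a, b, c in in_mfs[i]]
            antecedent.append(max(degs, key=lambda t: t[1]))  # winning MF
        for o in range(W_out.shape[1]):
            degs = [(lbl, tri_mf(W_out[h, o], a, b, c)) for lbl, a, b, c in out_mfs[o]]
            consequent.append(max(degs, key=lambda t: t[1]))
        rules.append((antecedent, consequent))
    return rules
```

Because the MF are passed in as arguments rather than being part of the network, they can be replaced and the extraction re-run at will, which is the key advantage discussed below.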
Table 6.1: Fuzzy rules extracted from a SECoS trained on the iris data set (rule bodies elided).

Functionally, this algorithm is equivalent to the RE-EFuNN algorithm. The RE-EFuNN algorithm chooses antecedent MF based on the highest magnitude weights from the condition to rule neurons, which are really crisp exemplar values that have been fuzzified by the EFuNN's internal MF. The SECoS-FRE algorithm chooses antecedent MF based on the fuzzified values of the weights, which, while representing crisp exemplars, are fuzzified using the provided external MF. The advantage of this algorithm is that, since the membership functions are not an integral part of the network, the number of MF, their type and their parameters can all be optimised before the rule extraction process is carried out. If the rules extracted with a particular set of MF are not optimal, then the MF can be changed and fresh rules generated, all without altering the SECoS. This opens the possibility of refining the MF via some method such as an evolutionary algorithm. In this case, the fitness function would be a measure of the quality of the extracted fuzzy rules. Table 6.1 displays a selection of rules extracted from a SECoS trained on the iris data set. There were three MF associated with each input. The MF for petal and sepal length were named Short, Medium and Long. The MF for petal and sepal width were named Narrow, Medium and Wide. There were three MF associated with each output, named IsNot, Uncertain and Certain. The degree of importance value associated with each antecedent and the degree of confidence value associated with each consequent element were all rounded to two decimal places. Another example is shown in Table 6.2, which displays a selection of rules extracted from a SECoS trained on the gas furnace data set. The SECoS had nineteen neurons in the evolving layer.
There were nine MF associated with the variable representing carbon dioxide levels, for both antecedents and consequents, and six associated with the variable representing methane levels. The MF for carbon dioxide levels were named ExtremelyLow, VeryLow, ModerateLow, Moderate, ModerateHigh, High, VeryHigh and ExtremelyHigh. The MF for methane levels were named VeryLow, Low, ModerateLow, ModerateHigh, High and VeryHigh. Evaluating this algorithm using the same criteria as above (Section 6.2) shows that it is indeed very similar to
the RE-EFuNN algorithm. The expressive power of the rules is very high, dependent upon the careful selection of variable and membership function names. Since SECoS networks generally have fewer neurons in the evolving layer, the number of rules is also smaller and hence easier to understand. The translucency of the SECoS is very high: in common with REFuNN and RE-EFuNN, this is a decompositional algorithm. The complexity of the algorithm is low: each connection is examined only once and, unlike the RE-EFuNN algorithm, no comparison or aggregation of the extracted rules is required. As there are no thresholds involved, the only parameters that require optimising are the provided sets of input and output MF. The portability of the algorithm is low, as it was designed expressly for the SECoS architecture.

Table 6.2: Zadeh-Mamdani rules extracted from a SECoS trained on the gas furnace data set (rule bodies elided).
6.7.2 Extracting Takagi-Sugeno Fuzzy Rules

First-order Takagi-Sugeno fuzzy rules take the general form:

if x1 is A1 and x2 is A2 then y1 = f(x1, x2)
That is, the consequents are a function of the antecedent variables. Inference over Takagi-Sugeno rules involves calculating the weighted combination of each activated rule. The computations involved in performing inference over Takagi-Sugeno rules are easier than for Zadeh-Mamdani types, as no inferred MF is formed. Also, no defuzzification is required to obtain an output. Their main disadvantage is that they can be harder to comprehend than Zadeh-Mamdani rules: while their antecedents consist of semantically meaningful linguistic variables, their consequents consist entirely of linear equations. First-order Takagi-Sugeno rules have significant advantages compared to Zadeh-Mamdani rules when extracted from SECoS networks. Firstly, as the consequent functions can be tuned to be much closer to the true output of the network, they will produce a much closer approximation of the behaviour of the network. Secondly, due to the presence of explicit weight values in the rules, they will exactly recreate the parent network when used in rule insertion. The consequent function for rules extracted from a SECoS was derived in the following way. The activation value A_o of an output neuron o of a SECoS network can be described using the following equation:

A_o = W_j,o · A_j     (6.1)

where W_j,o is the connection weight from the winning neuron in the evolving layer, j, to o, and A_j is the activation value of
j, as determined by the following:

A_j = 1 − D_j     (6.2)

where D_j, for the common case of normalised Manhattan distances, is calculated according to:

D_j = Σ_i |I_i − W_i,j| / Σ_i |I_i + W_i,j|     (6.3)

where I is the input vector and W_i,j is the connection weight from input i to j. Expanding A_j into Equation 6.1 yields:
A_o = W_j,o · (1 − D_j)

while expanding Equation 6.3 into the above gives:

A_o = W_j,o · (1 − Σ_i |I_i − W_i,j| / Σ_i |I_i + W_i,j|)

Thus the general form of the consequent function for Takagi-Sugeno rules extracted from a SECoS network, for k inputs, with y representing the output variable, is:

y = W_j,o · (1 − (|I_1 − W_1,j| + |I_2 − W_2,j| + … + |I_k − W_k,j|) / (|I_1 + W_1,j| + |I_2 + W_2,j| + … + |I_k + W_k,j|))     (6.4)

where y is rounded to unity if y > 1.
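The consequent function of Equation 6.4 can be written directly as code. This sketch uses illustrative names; the clipping of y at unity follows the rounding rule above.

```python
def ts_consequent(inputs, w_in, w_out):
    """Evaluate the Takagi-Sugeno consequent (Equation 6.4) for one rule.

    inputs: crisp input vector I_1..I_k
    w_in:   input-to-evolving weights W_1,j..W_k,j stored in the rule
    w_out:  evolving-to-output weight W_j,o stored in the rule
    """
    num = sum(abs(i - w) for i, w in zip(inputs, w_in))  # numerator of the
    den = sum(abs(i + w) for i, w in zip(inputs, w_in))  # normalised distance D_j
    y = w_out * (1 - num / den)
    return min(y, 1.0)  # y is rounded to unity if y > 1
```

When the input vector coincides with the stored exemplar, the distance term vanishes and the rule outputs exactly W_j,o, mirroring the behaviour of the parent SECoS for that exemplar.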
This being the case, a decompositional rule extraction algorithm will be able to substitute values for W_j,o and W_1,j … W_k,j, in which case the function when activated will, under certain circumstances, exactly mimic the operation of the SECoS. The comprehensibility of the function could be increased by using semantically meaningful variable names instead of I_1 … I_k. Since the weight values are retained in the function as constants, the original network will be exactly recreated when these rules are used in rule insertion. The antecedent parts of the rules are generated as for the Zadeh-Mamdani type rule extraction algorithm above. Therefore, the Takagi-Sugeno type rule extraction algorithm is as follows:
For each neuron j in the evolving layer:

– Create a new rule.

– For each input neuron i:

- Find the MF associated with i that activates the most strongly for the weight W_i,j. Add that MF to the antecedent of the rule for that input. This is the MF for that input for this rule.

- Insert the membership degree of the weight in the winning MF as the degree of importance for this condition.
– For each output neuron o:

- Substitute the value of W_j,o into Equation 6.4.

- For each input neuron i: substitute the value of W_i,j into Equation 6.4.

if … then IrisSetosa = 0 and IrisVersicolor = 0 and IrisVirginica = 1 − (|SepalLength − 0.83| + |SepalWidth − 0.79| + |PetalLength − 0.85| + |PetalWidth − 0.94|) / (|SepalLength + 0.83| + |SepalWidth + 0.79| + |PetalLength + 0.85| + |PetalWidth + 0.94|)

if … then IrisSetosa = 0 and IrisVersicolor = 1 − (|SepalLength − 0.87| + |SepalWidth − 0.77| + |PetalLength − 0.74| + |PetalWidth − 0.68|) / (|SepalLength + 0.87| + |SepalWidth + 0.77| + |PetalLength + 0.74| + |PetalWidth + 0.68|) and IrisVirginica = 0

if … then IrisSetosa = 1 − (|SepalLength − 0.78| + |SepalWidth − 0.8| + |PetalLength − 0.55| + |PetalWidth − 0.48|) / (|SepalLength + 0.78| + |SepalWidth + 0.8| + |PetalLength + 0.55| + |PetalWidth + 0.48|) and IrisVersicolor = 0 and IrisVirginica = 0

Table 6.3: Takagi-Sugeno rules extracted from a SECoS trained on the iris data set (antecedents elided).
Table 6.3 contains a set of rules extracted from a SECoS trained on the iris data set. These rules were extracted from the same network as the Zadeh-Mamdani rules in Table 6.1, and the same antecedent MF were also used. Although these rules are accurate, they are less comprehensible than the equivalent Zadeh-Mamdani rules in Table 6.1: while the consequent functions specify exactly what the output values should be for each variable, they are less clear about specifying which class the rule actually corresponds to. For classification tasks, then, the extraction of Takagi-Sugeno rules is less useful than the extraction of Zadeh-Mamdani rules, from the viewpoint of comprehensibility. Another example is shown in Table 6.4, where rules extracted from a SECoS trained on the gas furnace data set are presented. These rules were extracted from the same network as the Zadeh-Mamdani rules in Table 6.2. As the gas furnace application is a function approximation problem, the greater accuracy of the consequent functions will offset the lower level of comprehensibility. For function approximation tasks, then, the extraction of Takagi-Sugeno rules is as useful as the extraction of Zadeh-Mamdani rules. Evaluating the Takagi-Sugeno SECoS rule extraction algorithm using the criteria of Section 6.2 shows the differences between this and the Zadeh-Mamdani rule extraction algorithm. The expressive power of the rules is lower, as the consequents are harder to understand, although this may be ameliorated by careful selection of labels for the input and output variables and the input MF. The number of rules is of course directly dependent upon the
number of neurons in the SECoS, so careful optimisation of the network will be necessary to keep the number of rules to the bare minimum. The quality of the rules can be very high, however, possibly much higher than Zadeh-Mamdani rules extracted from the same SECoS. The reason for this is that while the accuracy of Zadeh-Mamdani rules is dependent upon the careful selection of output MF, the outputs of Takagi-Sugeno rules will be identical to those of the parent SECoS, for all examples that cause the rule to activate. Accuracy thus becomes a function of the careful selection of the input MF, rather than both input and output MF as is the case with Zadeh-Mamdani rule extraction. In common with all of the previously discussed ECoS fuzzy rule extraction algorithms, this is a decompositional algorithm, with a high translucency of the SECoS network from which rules are extracted. The complexity of the algorithm is actually less than that of the Zadeh-Mamdani rule extraction algorithm: while the Zadeh-Mamdani rule extraction algorithm must find the most highly activated output MF for each evolving to output layer connection weight, for the Takagi-Sugeno rule extraction algorithm these outgoing weight values must only be substituted into the consequent function for the rule. Thus, the only operation being performed over the outgoing weights is a simple read, rather than the read, fuzzify and compare operations of the Zadeh-Mamdani rule extraction algorithm. Once again, the portability of the algorithm is very low, as the algorithm is suitable only for SECoS networks.

if … then CarbonDioxide = 0.76 · (1 − (|Methane − 0.35| + |CarbonDioxide − 0.69|) / (|Methane + 0.35| + |CarbonDioxide + 0.69|))

if … then CarbonDioxide = 0.7 · (1 − (|Methane − 0.22| + |CarbonDioxide − 0.7|) / (|Methane + 0.22| + |CarbonDioxide + 0.7|))

if … then CarbonDioxide = 0.58 · (1 − (|Methane − 0.41| + |CarbonDioxide − 0.54|) / (|Methane + 0.41| + |CarbonDioxide + 0.54|))

Table 6.4: Takagi-Sugeno rules extracted from a SECoS trained on the gas furnace data set (antecedents elided).
6.8 Insertion of Fuzzy Rules into SECoS Networks

Initialisation of neural networks with fuzzy rules, or rule insertion, is a way of creating an ANN that already contains domain knowledge, before the commencement of training. There are several advantages to doing this. Firstly, it will accelerate training, as the network already contains knowledge about the problem, which does not then have to be learned from scratch. Secondly, it can help avoid the problems of random initialisation, such as the ANN being initialised in a bad part of the solution space. Thirdly, when combined with rule extraction, it allows for the refinement of domain knowledge. Preliminary rules can be acquired from, for example, domain experts. These are inserted into an ANN, which is then trained on data taken from the domain. At the completion of training, rules, which have been refined by the learning process, are extracted from the ANN. These rules can then be examined, as part of knowledge discovery, and modified either by hand or by some other rule optimisation technique. If necessary, they can be inserted into another ANN and the process repeated, with each step producing rules that more closely approximate the problem.
Insertion of fuzzy rules into ECoS networks is particularly advantageous, because the ECoS algorithm is able to discover new rules (that is, add neurons) as necessary. That is, it is possible to start with a small number of basic rules, which are then inserted into an ECoS network. During training the ECoS will add additional neurons to its evolving layer, each one of which represents a new rule. Thus, the ECoS network will expand the basic rule set until it is able to represent the problem to an acceptable level. As was explained in Section 6.7, it is possible to extract fuzzy rules from SECoS networks. The development of rule insertion algorithms is necessary to complete their usefulness. In this section the algorithms for inserting both Zadeh-Mamdani (Subsection 6.8.1) and Takagi-Sugeno (Subsection 6.8.2) fuzzy rules are presented.
6.8.1 Inserting Zadeh-Mamdani Fuzzy Rules

The algorithm for inserting Zadeh-Mamdani fuzzy rules into a SECoS network is in many ways the reverse of the algorithm for extracting them. The algorithm is as follows:

For each rule r:

– For each antecedent i:

- Defuzzify the MF for this antecedent, using the degree of importance of the antecedent as a weighting.

- Insert the defuzzified value into the connection from input i to evolving layer neuron r.

– For each consequent o:

- Defuzzify the MF for this consequent, using the degree of confidence of the consequent as a weighting.

- Insert the defuzzified value into the connection from evolving layer neuron r to output o.
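A minimal sketch of this insertion, assuming each antecedent and consequent is reduced to a single MF centre weighted by its degree; the names and the simple importance-weighted defuzzifier are illustrative assumptions, standing in for whichever defuzzification method is actually chosen.

```python
import numpy as np

def insert_zm_rules(rules, n_inputs, n_outputs):
    """Sketch: build SECoS weight matrices from Zadeh-Mamdani rules.

    rules: list of (antecedent, consequent); each antecedent/consequent
           is a list of (mf_centre, degree) pairs, one per input/output.
    Each rule becomes one evolving layer neuron.  The defuzzified value
    here is simply degree * mf_centre -- a stand-in for a full
    defuzzification procedure.
    """
    n_rules = len(rules)
    W_in = np.zeros((n_inputs, n_rules))
    W_out = np.zeros((n_rules, n_outputs))
    for r, (antecedent, consequent) in enumerate(rules):
        for i, (centre, degree) in enumerate(antecedent):
            W_in[i, r] = degree * centre   # weighted defuzzified value
        for o, (centre, degree) in enumerate(consequent):
            W_out[r, o] = degree * centre
    return W_in, W_out
```

Because the weights are recovered through defuzzification, they depend on the defuzzifier used, which is exactly why a round trip through extraction and insertion need not reproduce the original network.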
The algorithm will thus create a SECoS that encapsulates the knowledge represented by the fuzzy rules. This is done by inferring the crisp values that correspond to each weighted (by degree of importance and degree of confidence) membership function in each antecedent and consequent of the rule set. The efficiency of this algorithm is roughly equal to that of the rule extraction algorithm. While defuzzification is a more complex procedure than fuzzification, there is no searching for winning MF as there is in the rule extraction algorithm. The problem with this algorithm is that the weights will not necessarily be the same as those in the network from which the rules were extracted, that is, if a set of Zadeh-Mamdani rules were extracted from a SECoS, then inserted unmodified into a new SECoS, then the two networks would not necessarily be identical. This is because the weights are determined by a defuzzification process, and are thus influenced by the defuzzification algorithm used.
6.8.2 Inserting Takagi-Sugeno Fuzzy Rules

Inserting Takagi-Sugeno rules into a SECoS is very much simpler than inserting Zadeh-Mamdani rules. This is because each connection weight is stored as a constant in one of the rules. Thus, rule insertion is simply a
matter of creating a SECoS with the appropriate number of input, evolving layer and output neurons and copying the appropriate values from the rule constants into the connections. Since there is no defuzzification involved, the algorithm is much simpler and more efficient than the Zadeh-Mamdani rule insertion algorithm. It will also be more accurate, which is to say, there will be no uncertainty about the weight values as there is with Zadeh-Mamdani rule insertion, because there is no defuzzification involved. If a set of Takagi-Sugeno rules are extracted from a SECoS, then used unmodified to initialise another SECoS network, then the SECoS that is created by the rule insertion algorithm will exactly match the original network.
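Since each connection weight appears as a constant in a rule's consequent function, Takagi-Sugeno insertion reduces to copying those constants back into a new network; a minimal sketch, with illustrative names and a single output variable assumed:

```python
import numpy as np

def insert_ts_rules(rules):
    """Sketch: rebuild SECoS weights from Takagi-Sugeno rules.

    rules: list of (w_in, w_out) pairs as stored in each rule's
           consequent function (Equation 6.4): w_in is W_1,j..W_k,j,
           w_out is W_j,o.  No defuzzification is needed, so the
           reconstructed network exactly matches the original.
    """
    W_in = np.array([w for w, _ in rules]).T   # shape (n_inputs, n_rules)
    W_out = np.array([[w] for _, w in rules])  # shape (n_rules, 1 output)
    return W_in, W_out
```

The exactness of this round trip is what distinguishes Takagi-Sugeno insertion from the Zadeh-Mamdani case, where defuzzification introduces uncertainty into the recovered weights.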
6.9 Evaluation of Fuzzy Rules Extracted from ECoS Networks

A problem arises when assessing fuzzy rules extracted from either an EFuNN or a SECoS. In the ECoS algorithm, only the winning evolving layer neuron will fire. This means that, to maintain fidelity, only a single fuzzy rule should fire. In normal fuzzy inference systems, however, different rules can fire to different degrees (Section 2.2). This conflict between two forms of fuzzy inference means that a new form of fuzzy inference, based on winning rules, must be developed. The ‘winner takes all’ approach used in (Matthews and Jagielska, 1995) can be adapted to fulfill these needs. For both Zadeh-Mamdani and Takagi-Sugeno rules, the degree of truth of the antecedents of each rule is calculated. The degrees of importance attached to each element of the antecedents are used as a multiplier for the fuzzy membership values derived from the MF attached to that element. The rule with the highest degree of truth is declared the winner, and all other rules are inhibited from firing. For Zadeh-Mamdani rules, the resulting inferred MF is created in the usual way, with the degree of truth of the winning rule and the degree of confidence both being used to shape the inferred MF for each output variable. Maximum and product shaping can both be used. For Zadeh-Mamdani rules, defuzzification can be performed using any of the existing defuzzification methods, although for the case of rules extracted from an EFuNN, centre of gravity defuzzification should be used, to maintain the fidelity of the rules. This is because the EFuNN output layer uses CoG defuzzification. For Takagi-Sugeno rules, the outputs are evaluated by inserting the current input vector into the functions associated with each output variable. Since only the winning rule is allowed to fire, no weighting is applied to the results of these functions. Also, degrees of importance are used only in the antecedents.
The main advantage to this approach is that it preserves the semantics of the original ECoS network, that is, the rules will have a similar meaning as the neurons from which they were extracted. The disadvantage is that it is a different form of fuzzy inference to the established norm, which makes it more difficult to directly compare the results of this method to established methods.
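The winner-takes-all inference described above can be sketched for the Takagi-Sugeno case as follows. The truth of each rule combines its importance-weighted antecedent memberships by product (the combination operator is an assumption, as is every name here), and only the winning rule's consequent function fires.

```python
import numpy as np

def wta_infer(x, rules):
    """Sketch of winner-takes-all inference over Takagi-Sugeno rules.

    rules: list of (antecedent, consequent_fn); antecedent is a list of
           (mf, importance) pairs, one per input variable; consequent_fn
           maps the crisp input vector to the output value.
    The degree of truth of a rule is the product of importance-weighted
    memberships (an assumed operator); only the winning rule fires, so
    no weighting is applied to its consequent.
    """
    truths = []
    for antecedent, _ in rules:
        degs = [imp * mf(xi) for xi, (mf, imp) in zip(x, antecedent)]
        truths.append(np.prod(degs))
    winner = int(np.argmax(truths))        # all other rules are inhibited
    return rules[winner][1](x)
```

Because at most one rule fires, the output of this inference traces the output of the single winning evolving layer neuron, preserving the local semantics of the parent ECoS.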
6.10 Problems with Fuzzy Rules Extracted from ECoS Networks

There are some problems with the extraction of fuzzy rules from ECoS networks. Andrews et al (Andrews et al., 1995) include the comprehensibility of the rules as a measure of rule quality, where comprehensibility is directly related to the number of rules. Since there will be one rule for each evolving layer neuron, and ECoS networks tend
to have a large number of neurons in the evolving layer, the number of rules will also tend to be large. This will certainly impact the comprehensibility of the rules. The problem of redundancy in the extracted rules reflects problems in the parent network, that is, redundant neurons in the evolving layer of the ECoS. Redundant rules are undesirable because they can reduce the comprehensibility of the rules, as well as reduce their accuracy. Since they are smaller on average than EFuNNs, SECoS networks are less vulnerable to this. SECoS are also less vulnerable to this problem due to the use of external MF for fuzzy rule extraction: if a large number of redundant rules are extracted, then the MF can be tuned to reduce the redundancy by, for example, adding additional MF. This may not be sufficient in all cases, however, and adding too many antecedent MF can decrease the comprehensibility of the rules. This technique also cannot be applied to EFuNN, as the MF are embedded in these networks. It can therefore become necessary in some cases to reduce the size of the ECoS network's evolving layer by applying one of the optimisation techniques described in Chapter 7. The most significant problem with the extraction of fuzzy rules from ECoS, however, can best be shown graphically. Figure 6.1 plots the positions in input space of the evolving layer neurons of a SECoS network trained on the gas furnace data set, as well as the Voronoi regions defined by each neuron. Overlaid on this plot is a grid that shows the partitioning of the input space by the five MF attached to each input variable. It can be seen that the partitions defined by the MF do not correspond well to either the positions of the neurons or the Voronoi regions. Yet, when fuzzy rules are extracted from this network using these MF, these are the regions that will be defined by these rules: any examples that fall within these regions will cause the rule to activate.
This will cause a great deal of confusion within the rule set, and can potentially lead to contradictory or inconsistent rules. That is, the rules may not be able to generalise properly, nor will the rules be able to accurately replicate the performance of the network. Thus, the naive application of unoptimised MF to the task of extracting rules from ECoS networks is not likely to lead to accurate rules. For the case of rule extraction from SECoS networks, this is not a huge problem, because the MF are not intrinsic to the network and can be separately optimised without affecting the network in any way. Such an optimisation is shown in Figure 6.2, where both the number and parameters of the MF attached to each input variable have been manually tuned so that each partition defined by the MF contains only one evolving layer neuron. For EFuNN networks, however, the problems are more severe, as the MF are embedded within the structure of the EFuNN itself. This makes optimisation of the MF very difficult, as any change to the MF will by necessity cause a change in the behaviour of the network.
6.11 Experiments with Benchmark Data Sets

6.11.1 Introduction

The major goal of the experiments in this section is to support the evaluation of Hypothesis Four (Section 1.2). As this is a major goal of the thesis, the criteria for evaluating the success of this hypothesis bear repeating: The research relating to Hypothesis Four will be considered to support the hypothesis if it results in algorithms that allow for the extraction of fuzzy rules from simplified ECoS networks, where the
CHAPTER 6. FUZZY RULES AND EVOLVING CONNECTIONIST SYSTEMS
139
Figure 6.1: Regions defined by neurons compared to regions defined by fuzzy MF.
. Figure 6.2: Regions defined by neurons compared to regions defined by fuzzy MF
CHAPTER 6. FUZZY RULES AND EVOLVING CONNECTIONIST SYSTEMS
140
rules are competitive with the rules extracted from EFuNN. Competitive means that the accuracy of the extracted fuzzy rules is similar to or better than the accuracy of rules extracted from EFuNN. It is thus necessary to evaluate the performance of the fuzzy rules extracted from ECoS, and compare them to the fuzzy rules extracted from EFuNN. It is also necessary to evaluate the performance of the rule insertion algorithms. Evaluating the goal above can be broken down in to seven sub-tasks, as follows: 1. Evaluate the performance of the extracted fuzzy rules. 2. Compare the performance of the extracted fuzzy rules with the performance of the original network. 3. Compare Zadeh-Mamdani rules extracted from SECoS with Zadeh-Mamdani rules extracted from EFuNN. 4. Compare Zadeh-Mamdani rules extracted from SECoS with Takagi-Sugeno rules extracted from SECoS. 5. Evaluate the performance of networks created via rule insertion. 6. Compare the performance of the networks created via rule insertion with the performance of the rules. 7. Compare the performance of the networks created via rule insertion with the performance of the original networks.
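The first two tasks require running examples through the extracted Zadeh-Mamdani rules. The inference method actually used is the one described in Section 6.9; the sketch below is a generic Mamdani-style evaluator with minimum for the antecedent conjunction and height defuzzification, included only to make the evaluation step concrete. The rule format, MF parameters and data are all hypothetical.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fire(rule, example, mfs):
    """Degree to which a Zadeh-Mamdani rule fires: min over its antecedents."""
    return min(tri(example[var], *mfs[var][label]) for var, label in rule["if"])

def infer(rules, example, mfs, out_peaks):
    """Height defuzzification: consequent peaks weighted by firing degree."""
    num = den = 0.0
    for rule in rules:
        w = fire(rule, example, mfs)
        num += w * out_peaks[rule["then"]]
        den += w
    return num / den if den else None  # None: no rule fired at all

# hypothetical one-input rule set with two overlapping triangular MF
mfs = {"x": {"low": (-1.0, 0.0, 1.0), "high": (0.0, 1.0, 2.0)}}
out_peaks = {"small": 0.0, "large": 1.0}
rules = [{"if": [("x", "low")], "then": "small"},
         {"if": [("x", "high")], "then": "large"}]
print(infer(rules, {"x": 0.75}, mfs, out_peaks))  # 0.75
```

Running every example in a data subset through such an evaluator yields the percent-correct or mean-squared-error figures reported for the extracted rules.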
6.11.2 Experimental Method

The networks used in this section are the trained EFuNN and SECoS networks existing at the end of the benchmarking work in Section 4.13. After training on each data subset, fuzzy rules were extracted from the trained network: both Zadeh-Mamdani and Takagi-Sugeno rules were extracted from the SECoS networks, while only Zadeh-Mamdani rules were extracted from the EFuNNs, for reasons explained in Section 6.6. The extracted rules were then tested over each data subset, using the fuzzy inference method described in Section 6.9. New networks were then created from the Zadeh-Mamdani rules, and the performance of these newly created networks evaluated. No further training was performed on these networks. The Takagi-Sugeno rules were not used to create new networks: as discussed in Subsection 6.7.2, any network created from these rules will be identical to the original.

The first task was performed by carrying out the experiments as described. The second task was performed by investigating the statistical hypotheses listed in Tables 6.5 and 6.6. The alternate hypotheses are based on the assumption that the rules will be less accurate than the original networks. For classification problems, this means that the performance measure (percent correct) will be lower, while for function approximation problems, this means that the performance measure (mean-squared error) will be higher. Thus, the hypotheses in Table 6.5 were used to evaluate the two spirals and iris classification problems, while the hypotheses in Table 6.6 were used to evaluate the Mackey-Glass and gas furnace problems. The meanings of the superscripts and subscripts used are consistent with those in previous sections (Section 2.7, Section 4.13). A subscript of r indicates that the measure refers to the extracted rules. One-tailed, paired-value t-tests were used to evaluate these hypotheses.
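As a concrete sketch of such a test (the per-run scores below are invented, not thesis results), the paired t statistic can be computed directly and compared against the one-tailed critical value for the appropriate degrees of freedom:

```python
import math

def paired_t(a, b):
    """t statistic for a paired t-test between two equal-length samples."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)  # sample variance of the differences
    return mean / math.sqrt(var / n)

# hypothetical per-run percent-correct scores, paired by run
net_acc  = [87.2, 85.1, 88.0, 86.4, 87.5]
rule_acc = [80.3, 78.9, 81.5, 79.7, 80.8]

t = paired_t(net_acc, rule_acc)
# one-tailed critical value t(0.05, df = 4) is 2.132; H0 is rejected when t exceeds it
print(round(t, 2), t > 2.132)
```

The two-tailed tests used for the later tasks follow the same pattern, with the critical value taken from the two-tailed column of the t table.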
Hypothesis   AA              AB              AC              AF
H0           a^A = a^A_r     a^B = a^B_r     a^C = a^C_r     a^F = a^F_r
H1           a^A > a^A_r     a^B > a^B_r     a^C > a^C_r     a^F > a^F_r

Hypothesis   BA              BB              BC              BF
H0           b^A = b^A_r     b^B = b^B_r     b^C = b^C_r     b^F = b^F_r
H1           b^A > b^A_r     b^B > b^B_r     b^C > b^C_r     b^F > b^F_r

Table 6.5: Statistical hypotheses for comparing networks and extracted rules for the two spirals and iris classification data sets.
Hypothesis   AA              AB              AC              AF
H0           a^A = a^A_r     a^B = a^B_r     a^C = a^C_r     a^F = a^F_r
H1           a^A < a^A_r     a^B < a^B_r     a^C < a^C_r     a^F < a^F_r

Hypothesis   BA              BB              BC              BF
H0           b^A = b^A_r     b^B = b^B_r     b^C = b^C_r     b^F = b^F_r
H1           b^A < b^A_r     b^B < b^B_r     b^C < b^C_r     b^F < b^F_r
Table 6.6: Statistical hypotheses for comparing networks and extracted rules for the Mackey-Glass and gas furnace data sets.

The third task was carried out by testing the hypotheses listed in Table 6.7. Here, a subscript of rs indicates that the rules were extracted from SECoS, while a subscript of re indicates that the rules were extracted from EFuNN. Two-tailed t-tests were used to test these hypotheses. The fourth task was performed by testing the hypotheses presented in Table 6.8. Here, a subscript of zm indicates that the rules are of Zadeh-Mamdani type, while a subscript of ts indicates that the rules are of Takagi-Sugeno type. Two-tailed t-tests were used to test these hypotheses. As with the second task, the fifth task was carried out by performing the experiments as described above. The sixth task was performed by testing the hypotheses presented in Table 6.9. Here, a subscript of r indicates rules, while a subscript of i indicates the network created by the insertion of those rules. Two-tailed, paired-value t-tests were used to test these hypotheses.
Hypothesis   AA                 AB                 AC                 AF
H0           a^A_rs = a^A_re    a^B_rs = a^B_re    a^C_rs = a^C_re    a^F_rs = a^F_re
H1           a^A_rs ≠ a^A_re    a^B_rs ≠ a^B_re    a^C_rs ≠ a^C_re    a^F_rs ≠ a^F_re

Hypothesis   BA                 BB                 BC                 BF
H0           b^A_rs = b^A_re    b^B_rs = b^B_re    b^C_rs = b^C_re    b^F_rs = b^F_re
H1           b^A_rs ≠ b^A_re    b^B_rs ≠ b^B_re    b^C_rs ≠ b^C_re    b^F_rs ≠ b^F_re

Table 6.7: Statistical hypotheses for comparing rules extracted from SECoS and EFuNN.
Hypothesis   AA                 AB                 AC                 AF
H0           a^A_zm = a^A_ts    a^B_zm = a^B_ts    a^C_zm = a^C_ts    a^F_zm = a^F_ts
H1           a^A_zm ≠ a^A_ts    a^B_zm ≠ a^B_ts    a^C_zm ≠ a^C_ts    a^F_zm ≠ a^F_ts

Hypothesis   BA                 BB                 BC                 BF
H0           b^A_zm = b^A_ts    b^B_zm = b^B_ts    b^C_zm = b^C_ts    b^F_zm = b^F_ts
H1           b^A_zm ≠ b^A_ts    b^B_zm ≠ b^B_ts    b^C_zm ≠ b^C_ts    b^F_zm ≠ b^F_ts
Table 6.8: Statistical hypotheses for comparing Zadeh-Mamdani rules extracted from SECoS with Takagi-Sugeno rules extracted from SECoS.

Hypothesis   AA               AB               AC               AF
H0           a^A_r = a^A_i    a^B_r = a^B_i    a^C_r = a^C_i    a^F_r = a^F_i
H1           a^A_r ≠ a^A_i    a^B_r ≠ a^B_i    a^C_r ≠ a^C_i    a^F_r ≠ a^F_i

Hypothesis   BA               BB               BC               BF
H0           b^A_r = b^A_i    b^B_r = b^B_i    b^C_r = b^C_i    b^F_r = b^F_i
H1           b^A_r ≠ b^A_i    b^B_r ≠ b^B_i    b^C_r ≠ b^C_i    b^F_r ≠ b^F_i
Table 6.9: Statistical hypotheses for comparing Zadeh-Mamdani rules with the networks created via insertion of those rules.

The seventh and final task was carried out by testing the hypotheses listed in Table 6.10. Here, a subscript of i again indicates the network created by the insertion of rules, and a subscript of o indicates the original network. Two-tailed, paired-value t-tests were used to test these hypotheses. The results of all statistical hypothesis tests are presented in Appendix C.
6.11.3 Two Spirals

The results of the experiments are presented in Table 6.11. For purposes of comparison, the accuracies of the original SECoS and EFuNN networks are reproduced. In this table, “SECoS-ZM” refers to the Zadeh-Mamdani
Hypothesis   AA               AB               AC               AF
H0           a^A_i = a^A_o    a^B_i = a^B_o    a^C_i = a^C_o    a^F_i = a^F_o
H1           a^A_i ≠ a^A_o    a^B_i ≠ a^B_o    a^C_i ≠ a^C_o    a^F_i ≠ a^F_o

Hypothesis   BA               BB               BC               BF
H0           b^A_i = b^A_o    b^B_i = b^B_o    b^C_i = b^C_o    b^F_i = b^F_o
H1           b^A_i ≠ b^A_o    b^B_i ≠ b^B_o    b^C_i ≠ b^C_o    b^F_i ≠ b^F_o

Table 6.10: Statistical hypotheses for comparing networks created via the insertion of Zadeh-Mamdani rules with the original networks.
[Table 6.11 appears here: mean percentage correct / standard deviation over recall sets A, B, C and the full data set, together with evolving layer neuron / rule counts, for SECoS, SECoS-ZM, SECoS-TS, SECoS-ZM in, EFuNN, EFuNN-ZM and EFuNN-ZM in, after training on Set A and after further training on Set B.]
Table 6.11: Mean percentage correct / standard deviation (to 1 d.p.) for the two spirals problem.

rules extracted from SECoS, “SECoS-TS” means the Takagi-Sugeno rules extracted from SECoS, and “SECoS-ZM in” denotes SECoS created by the insertion of Zadeh-Mamdani rules. “EFuNN-ZM” means Zadeh-Mamdani rules extracted from EFuNN, and “EFuNN-ZM in” means EFuNN created by the insertion of Zadeh-Mamdani rules.

Discussion

The accuracies of the SECoS and EFuNN networks, along with the accuracies of the various rules extracted from them, are presented in Table 6.11. It is interesting to note the large differences in means between SECoS and SECoS-ZM, and between EFuNN and EFuNN-ZM. Each of these measures, however, had a very large standard deviation, which made it important to test for significant differences between the groups of measures. These tests of significance were done by testing the hypotheses in Table 6.5. The results of these tests over the SECoS and SECoS-ZM accuracies are given in Table C.1. These results show that there were highly significant differences between the networks and the extracted rules, especially for those rules extracted from the initially trained network. The rules extracted from the network after further training showed a significant difference over Set A, but no difference over Sets B and C. While there was a significant difference between the two over the full data set, this difference was not highly significant, as the tests revealed no difference at the 99% level of confidence. What is particularly intriguing is the fact that the accuracy of the rules was significantly higher than that of the parent network. This was most likely an artifact of the difficulty of the two spirals problem, as none of
the other benchmark data sets exhibited a similar phenomenon. Repeating these tests for EFuNN gave the results presented in Table C.2. These results show that there were significant differences between the accuracies over the initial training set and over Set B. There were no significant differences between the accuracies over Set C, nor were there highly significant differences between the accuracies over Set B and over the full data set. After further training, the significant difference over Set B at 95% confidence disappeared; the other differences and similarities remained the same. In the case of rules extracted from EFuNN, the extracted rules were generally less accurate than the original networks. It was only over Set A, the initial training set, that the accuracy of the rules exceeded the accuracy of the parent EFuNN. The accuracies of the original SECoS networks were also compared to the accuracies of the Takagi-Sugeno rules extracted from them. These accuracies were compared by evaluating the hypotheses listed in Table 6.5, which gave the results presented in Table C.3. A comparison of the rules extracted from SECoS and those extracted from EFuNN was carried out by testing the hypotheses listed in Table 6.7. This gave the results in Table C.4. For the rules extracted from the networks after initial training, the rules extracted from EFuNN were significantly less accurate than those extracted from SECoS. While the EFuNN-derived rules had a significantly lower accuracy over Set B, this difference was not highly significant. No significant differences existed over Set C, but the performance of the SECoS-derived rules was much higher over the full data set. For the rules extracted after further training, there were no highly significant differences at all: differences existed at the 95% level of confidence only for Set A and the full data set.
It is important to note, however, that the EFuNN-derived rules were able to match the performance of the SECoS-derived rules only with a much larger set of rules. It was expected that the Takagi-Sugeno rules would perform poorly over this problem, as Takagi-Sugeno rules are primarily intended for use with function approximation problems. Evaluating the hypotheses in Table 6.8 yielded the results in Table C.5, which confirm this expectation: over all data sets, the performance of the Takagi-Sugeno rules was inferior to that of the Zadeh-Mamdani rules. To be useful as a knowledge engineering technique, the networks created by rule insertion must have a performance equal to or better than that of the fuzzy rules. This was investigated by testing the hypotheses in Table 6.9, which yielded the results presented in Table C.6. There were some significant differences at 95% confidence, but no highly significant differences at 99% confidence. In other words, for this data set, the rule insertion technique produced networks of similar performance to the fuzzy rules. Repeating these tests for EFuNN yielded the results presented in Table C.7. These results are similar to those for SECoS, in that there were no highly significant differences apparent. It is also highly desirable for the rule insertion algorithm to produce networks that are similar to the original. A comparison of the original networks and those created by rule insertion was performed by testing the hypotheses listed in Table 6.10. The results of these tests over the SECoS rules and networks are presented in Table C.6. It can be seen from these results that the performance of the SECoS networks created by rule insertion was not significantly different from the performance of the SECoS networks from which the rules were originally extracted. In other words, the amount of information lost during the rule extraction and rule insertion operations was not so great that a close approximation of the original could not be created.
             Trained on Set A                                      Trained on Set B
Recall set   A          B          C          All        Neurons/   A          B          C          All        Neurons/
                                                         Rules                                                  Rules
SECoS        97.8/1.6   94.7/6.9   93.3/7.0   97.1/1.5   24.8/2.1   97.5/1.3   100.0/0.0  92.0/7.6   97.2/1.7   26.1/2.4
SECoS-ZM     81.4/2.0   80.7/6.2   79.3/8.0   81.1/1.5   24.8/2.1   80.2/2.1   80.0/7.7   80.0/8.3   80.1/2.5   26.1/2.4
SECoS-TS     52.9/1.7   54.7/8.2   52.0/8.8   53.0/1.5   24.8/2.1   52.8/2.2   55.3/5.5   51.3/9.5   52.9/1.8   26.1/2.4
SECoS-ZM in  81.1/9.5   78.0/10.9  82.0/11.8  81.1/9.5   24.8/2.1   82.5/10.4  78.0/10.9  82.0/12.6  82.0/10.4  26.1/2.4
EFuNN        97.3/1.4   93.3/7.7   94.0/5.8   96.5/0.7   36.0/1.8   97.2/1.2   98.0/3.2   94.0/6.6   96.7/0.6   37.1/1.7
EFuNN-ZM     86.3/0.6   86.0/4.9   86.0/3.8   86.2/1.0   36.0/1.8   86.4/0.8   86.0/3.8   85.3/4.2   86.3/0.8   37.1/1.7
EFuNN-ZM in  96.0/2.6   92.0/5.3   92.7/3.8   95.3/2.0   36.0/1.8   95.6/1.8   96.7/4.7   93.3/5.4   95.5/1.9   37.1/1.7
Table 6.12: Mean percent correct / standard deviation (to 1 d.p.) for the iris classification problem.

Repeating these tests for EFuNN yielded the results presented in Table C.7. In contrast to the results over SECoS, several differences are apparent. In all cases where highly significant differences exist, the accuracy of the EFuNN created via rule insertion is significantly lower than that of the original. This indicates that the rule extraction and rule insertion procedures caused too much information to be lost.

Conclusions

For this data set, the rule extraction and insertion algorithms for SECoS produced better results with fewer rules than the equivalent algorithms for EFuNN. The Takagi-Sugeno rules extracted from SECoS were, as expected, poor performers.
6.11.4 Iris Classification

The results of the experiments are presented in Table 6.12. For comparison purposes, the results of the original SECoS and EFuNN networks are included. The row labels are as in the previous subsection.

Discussion

The accuracies of the original networks, the rules extracted from them, and the networks created via rule insertion are all presented in Table 6.12. Since the iris classification problem is considered to be quite easy, it is not surprising that the accuracies of the extracted rules were quite high.
The results of a comparison between the Zadeh-Mamdani rules extracted from SECoS and the original SECoS networks are presented in Table C.10. These are the results of testing the hypotheses presented in Table 6.5. An across-the-board rejection of the null hypothesis shows that the performance of the extracted rules was significantly lower than the performance of the original networks. Repeating these tests for EFuNN gave the results in Table C.11. Again, the null hypothesis was rejected for all data sets, at all levels of significance. This shows conclusively that the extracted rules were less accurate than the original EFuNN networks. As was the case with the two spirals data set, the Takagi-Sugeno rules were expected to be less accurate than the original SECoS networks over this (classification) problem. The results of the hypothesis tests presented in Table C.14 show that this was the case: in all cases, at all levels of significance, the performance of the Takagi-Sugeno rules was highly significantly lower than the performance of the SECoS from which they were extracted. More interesting is the comparison of the performance of Zadeh-Mamdani rules extracted from SECoS with that of Zadeh-Mamdani rules extracted from EFuNN. This comparison was performed by testing the hypotheses listed in Table 6.7; the results of these tests are presented in Table C.13. For the rules extracted from the initially trained networks, the rules extracted from EFuNN had highly significantly superior performance over the initial training set, Set A. There was no significant difference in accuracy over Set B. While a significant difference existed over Set C at the 95% level of confidence, no such difference existed at 99% confidence; thus, there was no highly significant difference between the two rule sets. Across the entire data set, the rules extracted from EFuNN were significantly better. The rules extracted after additional training on Set B had similar performance.
The accuracy of the EFuNN-derived rules over Set A was significantly higher. The accuracy of these rules over Set B was significantly better at 95% confidence, but not highly significantly better at 99% confidence. The generalisation accuracy over Set C was not significantly different at either the 95% or the 99% level of confidence. The higher accuracy of the EFuNN-derived rules over the training sets gives this rule set a significantly better performance over the entire data set. Overall, the rules extracted from EFuNN performed better than those extracted from SECoS. However, the SECoS-derived rules were able to generalise as well as the EFuNN-derived rules, and it is not really surprising that EFuNN-derived rules can classify training data better, given that there are 50% more rules in the EFuNN-derived rule sets. A comparison of the accuracies of Zadeh-Mamdani and Takagi-Sugeno rules extracted from SECoS was performed by testing the hypotheses listed in Table 6.8. The results of these tests are presented in Table C.14. Since the Takagi-Sugeno rules performed so poorly over the data sets, it is not surprising that they performed much worse than the Zadeh-Mamdani rules extracted from the same SECoS networks. To be useful as a knowledge engineering technique, the networks created by the rule insertion algorithms must be at least as accurate as the rules from which they were created. This was tested for these experiments by testing the hypotheses in Table 6.9. The results of these tests over the SECoS-derived rules and the SECoS created from them are presented in Table C.15. These results show that there were no significant differences between the accuracy of the rules and the accuracy of the SECoS networks created from them. Repeating these tests for the EFuNN-derived rules yielded the results in Table C.16. It is apparent from these results that, with a single exception, the EFuNN created from the insertion of the rules were significantly more
accurate than the rules from which they were created. It was only for the generalisation accuracy over Set B, for the rules extracted after initial training on Set A, that there was no highly significant difference at the 99% level of confidence. This may have been due to the relatively high standard deviation of the results over this data set, or it may simply have been a statistical aberration. It is also desirable for the rule insertion algorithms to produce a network that is not significantly different from the network from which the rules were originally derived. This was evaluated by comparing the accuracies of the original networks with the accuracies of the networks created by rule insertion, by testing the hypotheses listed in Table 6.10. The results of these tests for the SECoS networks are presented in Table C.17. These results show that the accuracies of the networks created via rule insertion were consistently lower than the accuracies of the original networks. The sole exception was over Set B, for the networks from the initial training group: in this case, there was a significant difference at the 95% level of confidence, but not at the 99% level. Again, this may have been a statistical aberration. Repeating these tests for the EFuNN-derived rules and their original networks gave the results in Table C.18. These results clearly show that the EFuNN created via rule insertion were not significantly different from the EFuNN from which the rules were originally extracted.

Conclusions

Overall, for this benchmark data set, the EFuNN rule extraction and insertion algorithms were superior to the SECoS rule insertion and extraction algorithms. The SECoS algorithms were competitive, however: in many cases there were no significant differences between the two.
Given that there were 50% more rules in the EFuNN-derived sets, it is not surprising that the EFuNN-related algorithms performed better: with more rules, it is easier to get good results, especially over the training data, as was the case in these experiments.
6.11.5 Mackey-Glass

The accuracies of the networks and rules are presented in Table 6.13. The accuracies are presented as mean squared errors, where the values shown are to be multiplied by 10^-4. The row labels are the same as in the previous subsections.

Discussion

The accuracies of the SECoS and EFuNN networks, along with the accuracies of the rules extracted from them, are presented in Table 6.13. It is interesting to note that the errors over the rules were very high compared to the networks, in some cases more than an order of magnitude higher. The standard deviations were quite low across all of the results, however, which indicates that the relatively poor mean performance of the rules was not due to a small number of poor performers. The accuracies of the Zadeh-Mamdani rules extracted from SECoS were compared to the accuracies of the SECoS they were extracted from by evaluating the hypotheses listed in Table 6.6. The results of these tests are presented in Table C.19. These results make it apparent that the rules were in all cases less accurate than the original networks.
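The scaling convention used for Table 6.13 can be made concrete with a small sketch (the target and prediction values are hypothetical): the raw mean squared error is computed as usual, then multiplied by 10^4 for presentation.

```python
def mse(targets, predictions):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

# hypothetical target and predicted values for a function approximation task
targets     = [0.10, 0.20, 0.30]
predictions = [0.11, 0.19, 0.33]
raw = mse(targets, predictions)
print(round(raw * 1e4, 2))  # value as it would appear in Table 6.13 (3.67)
```

A tabulated value of, say, 13 therefore corresponds to a raw mean squared error of 0.0013.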
             Trained on Set A                                        Trained on Set B
Recall set   A           B          C          All        Neurons/   A           B            C          All        Neurons/
                                                          Rules                                                     Rules
SECoS        13/0.8      15/2.13    15/2.944   13/0.934   60.0/2.1   13/1.355    6.667/0.793  15/2.116   13/1.088   63.0/2.5
SECoS-ZM     117/16      120/25     111/26     117/16     60.0/2.1   103/21      107/34       110/28     105/22     63.0/2.5
SECoS-TS     88/11       91/21      85/23      88/9.095   60.0/2.1   92/6.084    91/19        95/28      93/6.249   63.0/2.5
SECoS-ZM in  114/24      105/18     109/24     113/21     60.0/2.1   119/35      96/22        105/31     115/32     63.0/2.5
EFuNN        22/0.53     13/1.3     12/1.67    11/0.363   267.8/6.4  11/0.383    9.319/1.024  12/1.54    11/0.272   282.4/4.5
EFuNN-ZM     113/8.213   110/16     114/20     113/7.77   267.8/6.4  112/6.953   112/20       113/22     112/7.479  282.4/4.5
EFuNN-ZM in  56/1.448    62/8.654   62/7.608   58/1.515   267.8/6.4  57/1.099    55/8.662     62/11      57/1.133   282.4/4.5

Table 6.13: Average mean squared error / standard deviation (×10^-4) of networks and extracted rules for the Mackey-Glass data set.
Performing the same comparisons for the Zadeh-Mamdani rules extracted from EFuNN gave the results in Table C.20. Again, each set of rules was significantly less accurate than the EFuNN networks they were extracted from. The Takagi-Sugeno rules extracted from SECoS also performed poorly. Table C.21 presents the results of comparing SECoS to the Takagi-Sugeno rules. Again, each set of rules was significantly less accurate than the SECoS networks they were extracted from. By testing the hypotheses listed in Table 6.7, the performance of the Zadeh-Mamdani rules extracted from SECoS was compared with the performance of the Zadeh-Mamdani rules extracted from EFuNN. The results of these tests are presented in Table C.22. These results show that while neither the SECoS- nor the EFuNN-derived Zadeh-Mamdani rules were able to compete with the networks those rules were extracted from, both were equally poor. In other words, there were no significant differences in the performance of rules derived from SECoS and rules derived from EFuNN. Takagi-Sugeno rules were expected to perform better, relative to Zadeh-Mamdani rules, over function approximation tasks than they did over classification problems. Testing the hypotheses listed in Table 6.8 gave the results presented in Table C.23. Inspection of these results indicates that the Takagi-Sugeno rules were superior in some cases. For the rules extracted after initial training on Set A, the error over the Takagi-Sugeno rules was significantly lower than the error over the Zadeh-Mamdani rules. The generalisation errors over Sets B and C were also significantly lower at the 95% level of confidence, but not at the 99% level. The error over the entire data set was lower for the Takagi-Sugeno rules than for the Zadeh-Mamdani rules, which may be due to the significantly lower error over Set A.
For the rules extracted after further training on Set B, there were no significant differences. A comparison of the accuracies of the SECoS networks created via rule insertion with the accuracies of the Zadeh-Mamdani rules they were created from was achieved by testing the hypotheses listed in Table 6.9. The results of these tests are presented in Table C.24. From these results it is apparent that the rule insertion algorithm was able to create networks that closely approximate the performance of the rules. Comparing the accuracies of the EFuNN networks created via rule insertion with the accuracies of the Zadeh-Mamdani rules they were created from yielded the results in Table C.25. These results show that the EFuNN rule insertion algorithm was able to create networks of superior performance to the original rules. By testing the hypotheses in Table 6.10, a comparison of the original SECoS networks and the SECoS networks created by the insertion of Zadeh-Mamdani rules was carried out. The results of this comparison are presented in Table C.26. Combined with an inspection of Table 6.13, these results show that the SECoS networks resulting from fuzzy rule insertion were not as accurate as the original networks. Repeating these tests for the EFuNN results yielded the results in Table C.27. The same conclusions can be drawn: the EFuNN networks created from the rules were significantly less accurate than the original networks. This implies that the rule extraction and rule insertion algorithms both lost too much information to recreate the original network.
[Table 6.14 appears here: average mean squared error / standard deviation over recall sets A, B, C and the full data set, together with evolving layer neuron / rule counts, for SECoS, SECoS-ZM, SECoS-TS, SECoS-ZM in, EFuNN, EFuNN-ZM and EFuNN-ZM in, after training on Set A and after further training on Set B.]

Table 6.14: Average mean squared errors / standard deviation (to 3 d.p.) of networks and extracted rules for the gas furnace data set.

Conclusions

The rules extracted from both SECoS and EFuNN performed relatively poorly over this data set. The best performing rules were the Takagi-Sugeno rules extracted from SECoS. There were no significant differences between the performance of the Zadeh-Mamdani rules extracted from SECoS and the Zadeh-Mamdani rules extracted from EFuNN.
6.11.6 Gas Furnace

The results of the experiments over the gas furnace data set are presented in Table 6.14. The results are mean-squared errors, presented to three decimal places.

Discussion

The accuracies, as mean-squared errors, of each network type and the rules extracted from them are presented in Table 6.14. It is immediately apparent that the rules, and the networks created via rule insertion, were less accurate than the original networks. The accuracies of the extracted rules and the accuracies of the original networks were compared by testing the
hypotheses in Table 6.6. These tests were performed to compare SECoS to the Zadeh-Mamdani rules extracted from SECoS, SECoS to the Takagi-Sugeno rules extracted from SECoS, and EFuNN to the Zadeh-Mamdani rules extracted from EFuNN. The results of the comparison between SECoS and the Zadeh-Mamdani rules are presented in Table C.28. Inspection of these results, and of the accuracies in Table 6.14, shows that the rules were much less accurate than the original networks. The results of these tests over EFuNN and the Zadeh-Mamdani rules are presented in Table C.29. Again, the results indicate that the extracted rules were significantly less accurate than the original networks. The comparison of SECoS and the Takagi-Sugeno rules gave the results in Table C.30. Once again, the rules were significantly less accurate than the SECoS networks. By testing the hypotheses listed in Table 6.7, it was possible to compare the accuracies of the Zadeh-Mamdani rules extracted from SECoS with those of the Zadeh-Mamdani rules extracted from EFuNN. The results of this comparison are presented in Table C.31. These results show that there were no significant differences between the performance of the two rule sets. Note that there are far fewer rules in the SECoS-derived sets than in the EFuNN-derived sets. Despite this disparity in size, the SECoS-derived rules were able to perform at a level comparable to the EFuNN-derived rules. The results of evaluating the hypotheses in Table 6.8 are presented in Table C.32. The purpose of these hypotheses was to compare the performance of the SECoS-derived Zadeh-Mamdani rules to that of the Takagi-Sugeno rules extracted from SECoS. These results show, for the rules extracted after the initial training over Set A, that the Takagi-Sugeno rules were significantly more accurate than the Zadeh-Mamdani rules at the 95% level of confidence. At the 99% level of confidence, there was no difference.
The generalisation performance over Set B was the same for both, but there was a significant difference at the 95% level of confidence for the generalisation accuracy over Set C. The accuracy of the Takagi-Sugeno rules was significantly better over the full data set at both the 95% and 99% levels of confidence. For the rules extracted after further training, significant differences existed only over Set A and the full data set, at the 95% level of confidence.

The purpose of the hypotheses in Table 6.9 was to allow for the comparison of the accuracy of the Zadeh-Mamdani rules with the accuracy of the networks created via insertion of these rules. The results of testing these hypotheses over the SECoS-derived rules are presented in Table C.33. These results show that there were almost no significant differences at the 95% level of confidence, and no highly significant differences at all at the 99% level of confidence. Differences existed only over the rules extracted after further training, and only over Set B and the full data set (which suggests that the difference over Set B was large enough to disturb the overall accuracy). These results show that the rule insertion algorithm was again able to produce a SECoS that closely approximates the performance of the rules from which it was created.

The tests were repeated for the EFuNN-derived rules. The results are presented in Table C.34. In contrast to the results for SECoS, significant differences existed across all but one of the data sets. Only for Set B, over the rules extracted after further training, was there no significant difference. For those cases where there were significant differences, the performance of the EFuNN created via rule insertion was worse than that of the rules.

The final tests performed were intended to compare the original networks to those created via rule insertion. The hypotheses listed in Table 6.10 were used to do this. The results of testing those hypotheses over the SECoS
Source                              Mean-Squared Error
(Box and Jenkins, 1970)             0.202
(Tong, 1978)                        0.469
(Pedrycz, 1984)                     0.320
(Xu and Lu, 1987)                   0.328
(Sugeno and Tanaka, 1991)           0.068
(Sugeno and Tanaka, 1991)           0.359
(Sugeno and Yasukawa, 1991)         0.355
(Sugeno and Yasukawa, 1993)         0.190
(Lin and Cunningham III, 1995)      0.071
(Abreu and Pinto-Ferreira, 1996)    0.172
(Wang and Langari, 1996)            0.066
(Kim et al., 1997)                  0.055
(Kim et al., 1998)                  0.048
(Gaweda et al., 2002)               0.045

Table 6.15: Reported Mean Squared Error for the Gas Furnace Problem

networks are presented in Table C.35. These results clearly show that in all cases, the SECoS networks created via the insertion of Zadeh-Mamdani rules were significantly less accurate than the original SECoS networks. Repeating these tests over the EFuNN results yielded the results in Table C.36. Again, the EFuNN networks that resulted from rule insertion were significantly less accurate than the original networks.

Results reported in the literature are presented in Table 6.15. The results for the extracted fuzzy rules are all inferior to these results; however, any comparison between these two groups of results should be considered purely indicative. This is because the number of time-steps used in the literature varies, that is, the number of previous measurements used in each system differs from what was used here.

Conclusions

The fuzzy rule extraction and insertion algorithms all performed relatively poorly over the gas furnace data set. The rules extracted from SECoS were, however, competitive with those extracted from EFuNN, despite the rule sets being much smaller.
6.11.7 Conclusions for Experiments with Benchmark Data

With the exception of the two spirals data set, the performance of the extracted rules was significantly worse than the performance of the original networks. For the two spirals set, the extracted Zadeh-Mamdani rules performed significantly better than the original networks. Across the iris data set, the rules were less accurate than the networks, but still performed much better than chance. There were few significant differences between the Zadeh-Mamdani rules extracted from SECoS and the Zadeh-Mamdani rules extracted from EFuNN. This is
despite the fact that there were far fewer fuzzy rules in the rule sets extracted from SECoS than there were in the rule sets extracted from EFuNN. As expected, the extracted Takagi-Sugeno rules performed poorly across the classification problems. Across the two function approximation problems, they performed significantly better than the SECoS-derived Zadeh-Mamdani rules, but they were still significantly less accurate than the original networks.

The performance of the networks created by rule insertion was generally poor. While the SECoS networks that resulted from rule insertion were not generally significantly different in performance from the rules from which they were created, the EFuNN networks were often inferior to their source rules. This leads to the conclusion that the rule insertion algorithms are capable of producing networks that faithfully represent the rules. The performance of the created networks, however, was often significantly worse than the performance of the original networks. Combined with the fact that, as mentioned above, the rules perform worse than the networks they were extracted from, this indicates that the rule extraction algorithms lose too much information to allow for a complete recreation of the original network.
6.12 Conclusions

The criteria for evaluating the support of Hypothesis Four are as follows: The research relating to Hypothesis Four will be considered to support the hypothesis if it results in algorithms that allow for the extraction of fuzzy rules from simplified ECoS networks, where the rules are competitive with the rules extracted from EFuNN. Competitive means that the accuracy of the extracted fuzzy rules is similar to or better than the accuracy of rules extracted from EFuNN.

This chapter has shown that algorithms for extracting fuzzy rules from SECoS are possible. From the benchmark results, it is clear that the Zadeh-Mamdani rules extracted from SECoS are of comparable performance to the Zadeh-Mamdani rules extracted from EFuNN. Based on these results, under the criteria in Section 1.3, Hypothesis Four is considered to be supported.
6.13 Summary

This chapter investigated Hypothesis Four and introduced several original contributions. It started by presenting, in Section 6.2, a widely-used set of criteria for evaluating rule extraction algorithms. Relevant previous work in rule extraction was briefly reviewed in Section 6.3, where the suitability of different rule extraction methods to ECoS was also discussed. This was followed by Section 6.4, which covered previous work in extracting fuzzy rules from FuNN. Justifications for extracting fuzzy rules from ECoS networks were presented in Section 6.5. An algorithm for extracting fuzzy rules from EFuNN was then reviewed and critiqued in Section 6.6. Section 6.7 presented the first two original algorithms of the chapter: an algorithm for extracting Zadeh-Mamdani fuzzy rules from trained SECoS networks, and an algorithm for extracting Takagi-Sugeno fuzzy rules from trained SECoS networks. Algorithms for inserting both Zadeh-Mamdani and Takagi-Sugeno rules into SECoS networks were the third and fourth pieces of original work and were presented
in Section 6.8. The final original piece of work in the chapter is a simple means of assessing fuzzy rules extracted from ECoS networks, one that takes into account some of the differences between ECoS networks and more conventional ANN (Section 6.9). Some problems with fuzzy rules extracted from ECoS networks, which provide additional motivation for the ECoS optimisation algorithms in Chapter 7, were discussed in Section 6.10. Finally, experimental results of applying the algorithms described to the benchmark data sets were presented and analysed in Section 6.11.
Chapter 7
Methods for the Optimisation of ECoS Networks

Artists who seek perfection in everything are those who cannot attain it in anything
— Eugene Delacroix
7.1 Introduction It is readily apparent that in order to extract accurate rules, one must first possess an accurate network. The difficulties involved in optimising the training parameters of an ECoS network, in terms of reducing the size of the evolving layer while maintaining accuracy, were presented in the formalisation in Chapter 5. Two additional problems discussed in the previous chapter were the size of and redundancies within the evolving layer of trained ECoS networks. The first leads to a large number of extracted rules, and hence a decrease in the comprehensibility of the rules, and the second leads to rules that are redundant or contradictory. There is therefore a need to optimise both the training of ECoS networks, and trained ECoS networks, so that accuracy is improved and redundancies in the evolving layer are reduced or eliminated. By so doing the size of the evolving layer will be reduced, leading to more efficient (in terms of computational loading) networks and extracted rules that are easier to comprehend. Optimisation in this chapter is taken to mean the creation of an ECoS network that fulfills the following criteria. The network should:
Exhibit good memorisation of the training data.
Exhibit good generalisation to data it has not previously experienced.
Be parsimonious, that is, be of the smallest size that can fulfill the previous two criteria.
As was previously established, it is a challenging task in the general case to create an ECoS that fulfills these criteria. There is therefore a need for methods and algorithms that can be applied either before, during or after training of an ECoS network. There are many ways of optimising an ECoS network. This chapter presents a "toolbox" of methods for optimising an ECoS network at various stages of its life-cycle. It was established in Chapter 5 that an ECoS network is the result of a three-tuple, consisting of the initial ECoS network, the training data set, and the training parameters. In this chapter, a fourth element is added: the post-training optimisation that can be applied to an ECoS network at the conclusion of a training operation.
Firstly, the order of presentation of the training examples, and their quality, will have an effect. If the quality (in terms of consistency, distribution and noisiness) of the data is poor, then the network will perform poorly. This situation is solely in the hands of the user of the network, and is therefore outside the scope of this work. The order in which training examples are presented, however, is within the control of the ECoS training algorithm, and a method of optimising this is combined with a method of optimising the third element of the ECoS training function, the settings of the training parameters.

It was proven in Chapter 5 that each of the parameters present in the ECoS training algorithm is non-orthogonal to the other parameters, that is, the value of one parameter will have an effect on the behaviour of the other parameters. Thus, optimising the values of these parameters is a multi-parameter optimisation problem: it is not possible to perform a search on one parameter at a time, while a combinatorial search is not feasible due to the amount of time required to investigate the complete parameter space. Finally, processing of the network at the completion of a training cycle can be used to optimise its performance and eliminate redundancies in the evolving layer.

These four factors comprise two separate phases in an ECoS life-cycle: training, which concerns the application of the training data and training parameters; and post-training processing, which concerns optimisation of a trained network. Each of these phases presents an opportunity to optimise the network. The goal of this chapter is therefore to introduce and evaluate algorithms that optimise ECoS networks, either during training, or at the completion of training. Those methods that optimise during training should create networks that are of similar performance to unoptimised networks, while having a smaller number of neurons.
Those methods that optimise after training should produce networks that are of similar performance to the original networks, while being significantly smaller.

This chapter investigates Hypothesis Five from Section 1.2. That is, it investigates algorithms that may be used to optimise ECoS networks. The chapter is structured as follows: previous work in optimising ANN is briefly reviewed in Section 7.2. Optimisation of the training process is presented in Section 7.3. In this section, a method of dynamically removing redundant neurons is presented, and evolutionary algorithms are used to optimise both the order of training examples and the training parameters. Several novel methods of post-training processing and optimisation are presented in Section 7.4. Empirical evaluation of the optimisation techniques over the benchmark data sets is presented in Section 7.5. Conclusions to the work in the chapter are offered in Section 7.6. Finally, the chapter is summarised in Section 7.7.
7.2 Optimising Neural Networks

As befits the substantial amount of active research into ANN, there are a large number of methods for optimising conventional ANN. This section reviews some of the existing methods, highlighting those that can be applied to ECoS and explaining why those that cannot be applied are unsuitable. Two general groups of methods are pruning and optimisation with evolutionary algorithms.
7.2.1 Pruning Revisited

The rationale behind pruning, and some of the ways in which it can be done, were presented in Section 3.2. Pruning algorithms are not considered here for application to ECoS, for the following reasons:
ECoS neurons represent regions, rather than hyperplanes in the input space.
The magnitudes of the incoming weights of an ECoS evolving layer neuron do not reflect the importance of that neuron.
For the first point, if an ECoS neuron is simply removed, then an entire region of input space is no longer represented in the network. Also, the regions of the surrounding neurons would expand into the vacated region, which could lead to further confusion within the ECoS. For the second point, many of the pruning algorithms described in Section 3.2 make use of a measure of the magnitude of the connection weights to determine which neurons will be removed. The connection weights of an ECoS do not reflect the importance of a neuron, so any pruning method based on weight magnitudes will fail for ECoS. For these reasons, pruning algorithms per se will not be used. Instead, ways of combining several neurons into one will be introduced, which allow for the retention of knowledge within the network.
7.2.2 Evolutionary Algorithms

Selection of the architecture, training parameters and connection weights of an ANN is a multi-parameter optimisation problem. Since evolutionary algorithms (EA) are themselves multi-parameter optimisation algorithms, it is not surprising that they have been extensively applied to optimising ANN. There are three general ways in which evolutionary algorithms have been applied to optimising ANN:
Selection of the ANN's topology and / or input features (Angeline et al., 1994; Arena et al., 1993; Baba et al., 1992; Balakrishnan and Honavar, 1996; Bebis et al., 1996; Billings and Zheng, 1995; Bornholdt and Graudenz, 1992; Brown and Card, 1997; Cho and Shimohara, 1998; East and Rowe, 1997; Esat et al., 1999; Fukumi and Akamatsu, 1996; Gupta and Ding, 1994; Harp et al., 1990; Jacobsson and Olsson, 2000; Kasabov and Watts, 1997; Lee and Sim, 1998; Mandischer, 1993b; Mandischer, 1993a; McCullagh et al., 1997; McDonnell and Waagen, 1994; Opitz and Shavlik, 1997; Robbins et al., 1993; Sarkar and Yegnanarayana, 1997b; Sarkar and Yegnanarayana, 1997a; Schiffmann et al., 1990; Schiffmann et al., 1993; Torreele, 1991; Watts and Kasabov, 1998; Watts et al., 2002)
Selection of the training parameters or learning rules (Choi and Bluff, 1995; Fontanari and Meir, 1991; Kermani et al., 1999; McCullagh and Bluff, 1993; Watts et al., 2002)
Direct selection of connection weights (Aguiler and Colmenares, 1997; Belew et al., 1990; Chellapilla and Fogel, 1999; de Castro et al., 1998; Faraq et al., 1997; Fogel et al., 1997; Fukuda et al., 1997a; Gueriot and Maillard, 1996; Hanebeck and Schmidt, 1994; Heistermann, 1990; Hung and Adeli, 1994; Lei et al., 1997; Liu and Yao, 1996; Maillard, 1997; Menczer and Parisi, 1992; Moriarty and Miikkulainen, 1998;
Mühlenbein and Kindermann, 1989; Munro, 1993; Paredis, 1994; Philipsen and Cluitmans, 1993; Ray and Ghoshal, 1996; Scholz, 1990; Siddiqi and Lucas, 1998; Siddique and Tokhi, 2001; Smalz and Conrad, 1994; Fukuda et al., 1997b; Yao and Liu, 1996a; Yao and Liu, 1996b; Yao, 1997; Yao and Liu, 1998; Zhao, 1997)

There are, of course, many overlaps between these groups. Selection of the architecture is not applicable to ECoS, because ECoS is a constructive algorithm: the learning algorithm itself will select the topology, and the network is expected to be able to grow and adapt to new examples. Evolutionary optimisation of an ANN's topology is only useful at the start of the network's life. This is also the reason that EA cannot be used to directly set the connection weight values of an ECoS, as the architecture would have to be selected simultaneously. This has been done, but it is computationally expensive (Liu and Yao, 1996), and the same objections to direct selection of the architecture apply. If the ECoS learning algorithm is able to adequately select both the architecture and connection weights of an ECoS network, it is better to use it than to replace it with an EA-based approach, as ECoS is much faster.

Selection, or optimisation, of the training parameters is a suitable application of EA because it is a multi-parameter optimisation problem, which is the kind of problem at which EA excel. It is also easy to evaluate the effectiveness of the training parameters, in terms of the criteria in the previous section. This is because a data set must be present, which can be used to evaluate the accuracy of the network, and because the initial and final size of the network can be assessed.

Evolutionary Optimisation of Training Parameters

Evolutionary optimisation of training parameters seems to be less popular than the other applications: only a few papers deal with it.
For example, in the excellent and comprehensive review of combinations of EA and ANN by Yao (Yao, 1999), entire sections are devoted to the evolution of connection weights and topologies, while scant mention is given to selection of training parameters. This may be for a couple of reasons: firstly, it is a relatively trivial problem compared to selection of the topology or training of the network; secondly, it still requires training of the network for each individual in the evolving population, which engenders all of the problems involved with the speed of BP learning. In this subsection, some of the available publications on this topic will be briefly discussed.

In (Choi and Bluff, 1995) a GA was used to select the learning rate, momentum, sigmoid parameter and number of training epochs to be used for the backpropagation training of the network. This technique was tested with several different data sets, including bottle classification data, where a glass bottle is classified as either suitable for reuse or suitable for recycling, and breast cancer data, where tissue samples are classified as malignant or benign. For each of the test data sets, the genetically derived network outperformed those networks whose control parameters were manually set, often by a significant margin.

In (Kermani et al., 1999) a GA was used to optimise a MLP for processing the signals coming from an electronic nose. This work used a GA to select not just the learning parameters, but also the input features and training examples to use. The results were that the GA was able to find optimal combinations of parameters, as well as minimal training sets.

In (Watts et al., 2002) similar work was done: in this case, the input features, number of hidden neurons,
and training parameters of a MLP were selected via GA. The work investigated the use of a MLP to model the efficiency of stop codons in biological protein synthesis. The MLPs created via GA-optimised training were much smaller and more accurate than those of the previous work (Watts et al., 2001) on the same data.
7.2.3 Requirements for ECoS Optimisation Algorithms

From the review above, the following requirements for ECoS optimisation algorithms can be formulated:
They should adequately balance size and performance in the final optimised ECoS network.
They must be generic to ECoS, that is, they must not, wherever possible, rely on architectural artifacts of either SECoS or EFuNN.
Post-training optimisation algorithms should not require the presence of a data set.
7.3 Methods for Optimising Training

Training optimisation is aimed at optimising the training process so that the resulting network is both parsimonious and accurate. While training is of little use if the network does not adapt to the new data, allowing the evolving layer to grow too large during training will severely impact the performance of the ECoS network. One way of reducing the size of the evolving layer is to reduce redundancy in the points represented by the neurons. This is the rationale behind online aggregation, described in Subsection 7.3.1. Although this algorithm is able to reduce the rate of growth of the network, it introduces two more parameters to the training process. It was proven earlier in this thesis that the four basic ECoS training parameters are interrelated. If online aggregation is used, then there are six interrelated parameters to optimise (seven if the choice of whether to use aggregation is counted as a parameter). This is a difficult task to perform manually, but it can be approached with an evolutionary algorithm. This is the rationale behind the evolutionary training parameter optimisation algorithm described in Subsection 7.3.2.
7.3.1 Online Neuron Aggregation

Evolving layer neuron aggregation is the process of combining several adjacent neurons into one neuron that represents all of the previous exemplars for that spatial region. During the aggregation process, the distance between the incoming and outgoing weight vectors of two neurons is calculated. If the distances are below specified thresholds, the two neurons are either aggregated together, or added to a set of neurons that are all aggregated into one. The rationale behind aggregation is to reduce the size of the evolving layer of the ECoS, while retaining the knowledge stored within the connections to each neuron. The normalised incoming distance Din between two nodes m and n is measured according to Equation 7.1, where x is the number of input nodes. The outgoing distance Dout between m and n is measured according to Equation 7.2, where a is the number of output nodes.
Online aggregation is carried out when neuron connection weights are modified. After the weight changes have been applied, the incoming and outgoing distances between the modified neuron and its immediate neighbours are
measured. If both distances are below the aggregation thresholds, then the neurons are aggregated together. There is no need to measure the distance between any other neurons, as only one neuron at a time is ever modified under the canonical ECoS training algorithm, which means that the spatial position of only one neuron is ever changing at any one time. Note that this algorithm assumes that new neurons are allocated spatially, using such a method as the Minimum Distance strategy from Subsection 4.4.1.
\[
D^{in}_{m,n} = \frac{\sum_{i=1}^{x} \left| W_{i,m} - W_{i,n} \right|}{\sum_{i=1}^{x} \left| W_{i,m} + W_{i,n} \right|} \tag{7.1}
\]

\[
D^{out}_{m,n} = \frac{\sum_{o=1}^{a} \left| W_{m,o} - W_{n,o} \right|}{\sum_{o=1}^{a} \left| W_{m,o} + W_{n,o} \right|} \tag{7.2}
\]

The aggregated incoming and outgoing weights are calculated according to Equations 7.3 and 7.4, where k is the neuron that results from the aggregation:

\[
W_{i,k} = \frac{W_{i,n} + W_{i,m}}{2} \tag{7.3}
\]

\[
W_{k,o} = \frac{W_{n,o} + W_{m,o}}{2} \tag{7.4}
\]
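As a worked illustration, Equations 7.1 to 7.4 can be sketched in NumPy. This is a minimal sketch under my own assumptions, not code from the thesis: the matrix layout (inputs by evolving-layer neurons for the incoming weights, evolving-layer neurons by outputs for the outgoing weights) and all function names are illustrative.

```python
import numpy as np

def incoming_distance(w_in, m, n):
    """Normalised distance between the incoming weight vectors of
    evolving-layer neurons m and n (Equation 7.1)."""
    return (np.abs(w_in[:, m] - w_in[:, n]).sum()
            / np.abs(w_in[:, m] + w_in[:, n]).sum())

def outgoing_distance(w_out, m, n):
    """Normalised distance between the outgoing weight vectors of
    neurons m and n (Equation 7.2)."""
    return (np.abs(w_out[m, :] - w_out[n, :]).sum()
            / np.abs(w_out[m, :] + w_out[n, :]).sum())

def aggregate_pair(w_in, w_out, m, n):
    """Merge neurons m and n into one neuron k whose incoming and
    outgoing weights are the means of the originals (Eqs. 7.3, 7.4)."""
    k_in = (w_in[:, m] + w_in[:, n]) / 2.0    # Eq. 7.3
    k_out = (w_out[m, :] + w_out[n, :]) / 2.0  # Eq. 7.4
    keep = [j for j in range(w_in.shape[1]) if j not in (m, n)]
    new_in = np.column_stack([w_in[:, keep], k_in])   # k appended last
    new_out = np.vstack([w_out[keep, :], k_out])
    return new_in, new_out
```

During online aggregation, `aggregate_pair` would be invoked only when both distances fall below the thresholds Tin and Tout.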
Online aggregation has the advantage of modifying the network as training is underway: there is no need to halt training at any point to perform a global (offline) aggregation, and there is no need to examine every neuron in the evolving layer as there is with offline aggregation (Subsection 7.4.1). The disadvantage of online aggregation is that there are two more parameters to optimise, in addition to deciding when to utilise the approach. This makes the parameter optimisation problem more complicated, which leads to the next approach to training optimisation, the optimisation of parameters by genetic algorithm. Experimental results with this algorithm over the benchmark data sets are presented in Section 7.5.
7.3.2 Optimisation of Parameters by Evolutionary Algorithm

As discussed in Section 7.2, evolutionary algorithms, and genetic algorithms in particular, have been used both to select the training parameters and to condition the training data. A genetic algorithm can be applied to do both of these for an ECoS network. It was established in Chapter 5 that the results of training are determined by both the training parameters and the training examples, which includes the order in which the examples are presented. A method of automatically modifying the sensitivity and error thresholds was presented in (Kasabov, 2003, pp. 86-87). Here, small, preselected delta values are added to or subtracted from the current values of the sensitivity and error thresholds, based on a test of the error over a certain number of training examples. This approach does not really solve the problem, though, because the delta values must still be selected before the commencement of training. The purpose of the
GA described in this section is to overcome the problem of parameter selection, by evolving both the training parameter values and the order in which the training examples are presented.

The chromosome used consisted of the training parameters, encoded as floating point numbers, followed by a long sequence of integers that indirectly encode the order of the training set. That is, there was one integer gene for each example in the training data set, and the value of that gene specified which slot in the training data set that example should occupy. This is analogous to the way in which an array of values is shuffled: for each entry in the array, a random number is selected that represents the final position of the entry. The current entry, and the entry in the slot nominated by the random number, are then swapped. The GA does the same thing, except that the number of the nominated slot is evolved by the GA, rather than being randomly selected. Thus, while the numbers in the chromosome do not directly describe the final order of the training set, they do indirectly, and deterministically, represent it in a form that recombination and mutation operators can function on, without the constraints that a direct encoding of order would impose.

For evaluation, one or two data sets are needed. As the GA is optimising the training, a training data set must be supplied. An optional test data set can also be used, to measure the generalisation ability of the trained network. Each individual is decoded into a set of training parameters and a training set. A copy of the ECoS being optimised is then trained and recalled with the training data set and, if present, the testing set. The error over these data sets, along with the change in size of the evolving layer of the SECoS, is used to calculate the fitness of the individual, according to Equation 7.5.
\[
f_i = \frac{1}{1 + e_t} + \frac{1}{1 + e_r} + \left( 1 - \frac{\left| n_{t+1} - n_t \right|}{d} \right) \tag{7.5}
\]
where:
fi is the fitness of individual i,
et is the training error,
er is the error over the test set,
nt is the size of the evolving layer at time t, and
d is the number of vectors in the training set.

Each of the terms of the fitness function is equally weighted, that is, each term yields values of the same magnitude. The range of (0, 1) for each term is quite arbitrary, but with the use of fitness normalisation in the GA the actual range of values does not matter. The form of the two error terms was chosen over a simple inverse of the error to prevent the unbounded growth of the influence of one term. If simple inverses were used, the effect of a small error over the training set, for example, would overwhelm the effect of a large error over the testing set. The term dealing with the number of neurons added is designed such that adding neurons at the maximum rate of one per training example will yield zero, while adding the minimum number of neurons (none) will yield unity. As the goal of the GA is to produce a set of training parameters that maximises accuracy while minimising the number of neurons used, these terms allow the GA to find the balance between these metrics.

As originally described in (Watts and Kasabov, 2002), the size component was measured according to Equation 7.6, and the presence of the testing set was not optional. The size component of the fitness function was changed
because the new measure is in the same range as the other components ((0, 1), as opposed to (0.5, 1)). The testing set was made optional to obviate a problem identified in (Watts and Kasabov, 2002): specifically, that when using this algorithm in an online application, where there is a data stream, it may be difficult to partition the stream into a training and a testing set.
\[
\frac{1}{1 + \frac{\left| n_{t+1} - n_t \right|}{d}} \tag{7.6}
\]
Thus, as the errors and change in network size all approach zero, the fitness will approach the maximum value of three (or two, if no test set is used). It is unlikely, but not impossible, for any individual to reach the maximum fitness, as that would require a perfect learning of the data with no change in network size. Experimental results with this algorithm over the benchmark data sets are presented in Section 7.5.
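The indirect order encoding and the fitness function of Equation 7.5 can be sketched as follows. This is an illustrative sketch under my own naming, not the thesis implementation; the swap-based decoding mirrors the shuffle analogy described above.

```python
def decode_order(order_genes):
    """Decode the integer order genes into a permutation of training
    example indices. Gene i nominates the slot that example i is
    swapped with, mirroring an array shuffle whose 'random' choices
    are evolved rather than drawn at random."""
    order = list(range(len(order_genes)))
    for i, slot in enumerate(order_genes):
        order[i], order[slot] = order[slot], order[i]
    return order

def fitness(e_t, e_r=None, n_before=0, n_after=0, d=1):
    """Fitness of one individual (Equation 7.5). e_t is the training
    error, e_r the optional test-set error (None when no test set is
    used), n_before / n_after the evolving-layer sizes before and
    after training, and d the number of training vectors."""
    f = 1.0 / (1.0 + e_t) + (1.0 - abs(n_after - n_before) / d)
    if e_r is not None:
        f += 1.0 / (1.0 + e_r)
    return f
```

Note that any vector of in-range integer genes decodes deterministically to a valid permutation, so crossover and mutation can operate on the genes without ever producing an invalid training-set order.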
7.4 Methods for Post-training Optimisation

The purpose of post-training optimisation is to reduce the size of the evolving layer of an ECoS network while preserving the network's performance. The three methods described in this section reduce the size of the network by removing redundant neurons. Redundant neurons are neurons whose function is, or can be, largely or entirely fulfilled by other neurons in the evolving layer. That is, the Voronoi region defined by the neuron can be adequately described by another neuron.
7.4.1 Offline Aggregation The difference between online and offline aggregation is simple. Online aggregation is performed while training is still underway: neurons are compared and aggregated when a neuron's connection weights are altered by the training algorithm, and only the altered neuron and its immediate neighbours are examined. Offline aggregation, however, is performed after training has terminated, and all neurons are examined. The offline aggregation strategy exhaustively compares each neuron to every other neuron. A naive application of this strategy would require n(n - 1) comparisons for n neurons, and would also run into the same redundancy problems as the fuzzy rule aggregation phase of EFuNN rule extraction (Section 6.6). A more careful application of this strategy is as follows:
- For each neuron n in the evolving layer:
  - Add n to the aggregation set A.
  - For each neuron m > n in the evolving layer:
    - IF m is not in A, measure the distance between the incoming (D_in) and outgoing (D_out) weight vectors of n and m.
    - IF D_in <= T_in AND D_out <= T_out THEN add m to A.
  - Aggregate all neurons in A into one neuron k.
  - Replace n with k.

This strategy is very thorough: all neurons that are close together will be aggregated, no matter where they are in the evolving layer. The use of the set A does somewhat reduce the number of comparisons made, as n(n - 1)/2 is now the maximum number of comparisons. Offline aggregation requires that training be halted before optimisation can be carried out. However, experiments have shown (Section 7.5) that it is able to reduce the size of the target network more effectively than online aggregation. Experimental results with these algorithms over the benchmark data sets are presented in Section 7.5.

CHAPTER 7. METHODS FOR THE OPTIMISATION OF ECOS NETWORKS
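The strategy above can be sketched as follows. This is an illustrative implementation only: Euclidean distance between weight vectors and averaging as the merge rule are assumptions, not the aggregation operation defined elsewhere in the thesis.

```python
import numpy as np

def offline_aggregate(w_in, w_out, t_in, t_out):
    """Sketch of offline aggregation: w_in and w_out hold one incoming /
    outgoing weight vector per evolving-layer neuron (one row each).
    Neurons whose incoming and outgoing vectors are both within the
    distance thresholds t_in / t_out are merged into a single neuron,
    here by averaging the group's weight vectors."""
    survivors_in, survivors_out = [], []
    merged = set()
    for n in range(len(w_in)):
        if n in merged:
            continue
        group = [n]                                  # the aggregation set
        for m in range(n + 1, len(w_in)):
            if m in merged:
                continue
            d_in = np.linalg.norm(w_in[n] - w_in[m])
            d_out = np.linalg.norm(w_out[n] - w_out[m])
            if d_in <= t_in and d_out <= t_out:
                group.append(m)
                merged.add(m)
        # replace the whole group with one neuron k
        survivors_in.append(w_in[group].mean(axis=0))
        survivors_out.append(w_out[group].mean(axis=0))
    return np.array(survivors_in), np.array(survivors_out)
```

Because neurons already absorbed into a group are skipped, at most n(n - 1)/2 pairwise comparisons are made.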
7.4.2 Sleep Learning in ECoS Networks During training, modification of the connection weights of an ECoS network may move evolving layer neurons very close together, making some of them redundant. Sleep learning is a way of identifying and removing these redundant neurons. It is performed in an offline mode, using the exemplars stored within the ECoS network: the network is "asleep" to external stimuli. The idea of sleep learning in ECoS networks has been suggested before (Kasabov, 1998b), but the form of sleep learning discussed in that work is based on strengthening already-learned concepts, rather than reducing the size of the network. As described in this section, sleep learning is based upon the GAL sleep learning algorithm ((Alpaydin, 1994), Section 3.12), with modifications that account for the differences between GAL and ECoS. It is thus able to reduce the size of the ECoS network while retaining the knowledge captured by the network. The algorithm for ECoS sleep learning is as follows:

- For each neuron n in the evolving layer:
  - Extract the incoming and outgoing connection weights of n to use as an example. Because the outgoing weight values can exceed unity, process these weights so that they fall into the range of the output layer activation function.
- For each example x in the set of extracted examples:
  - Remove the corresponding neuron n from the network.
  - Propagate x through the network.
  - IF the maximum neuron activation in the evolving layer is less than S_thr OR the error is greater than E_thr THEN re-insert n into the network.
  - ELSE modify the incoming and outgoing connection weights of the winning neuron, according to Equations 4.3 and 4.4.
All of the exemplars are extracted at the start of learning, because weight modifications applied to neurons later in the sleep training process would otherwise disrupt it. The purpose of sleep learning is to eliminate superfluous neurons that are not necessary to adequately represent the exemplars stored within the network at the start of sleep learning. If the exemplar for a neuron were extracted after learning had been applied to that neuron, the exemplar would be different: in effect, the sleep learning algorithm would be chasing a moving target. A problem arises when applying this method to an EFuNN network: because the exemplars are fuzzified, they must first be defuzzified before the sleep learning algorithm can be applied. As different defuzzification algorithms will yield different results, the behaviour of the sleep learning algorithm will vary according to how the exemplars are defuzzified. In other words, the values that are extracted may not be a true representation of the neurons in non-fuzzy space. The remaining problem with this algorithm lies in optimising the parameter values: there are almost as many parameters to optimise as there are for the training algorithm. A possible solution to this problem is examined in the next subsection. Experimental results with this algorithm over the benchmark data sets are presented in Section 7.5.
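The procedure can be sketched as follows. This is an illustration only: the activation function (a simple 1/(1 + distance) stand-in), the error measure, and the winner's weight updates are placeholders for the actual ECoS activation and Equations 4.3 and 4.4.

```python
import numpy as np

def sleep_learn(w_in, w_out, s_thr, e_thr, lr1=0.5, lr2=0.5):
    """Sketch of ECoS sleep learning over a layer of incoming (w_in)
    and outgoing (w_out) weight vectors, one row per neuron."""
    # Extract every exemplar before any learning takes place, so the
    # algorithm is not chasing a moving target.
    exemplars = [(i, w_in[i].copy(), w_out[i].copy())
                 for i in range(len(w_in))]
    kept = set(range(len(w_in)))
    for i, x, y in exemplars:
        others = sorted(kept - {i})
        if not others:
            continue                       # never empty the layer
        # Stand-in distance-based activation in (0, 1].
        acts = np.array([1.0 / (1.0 + np.linalg.norm(x - w_in[j]))
                         for j in others])
        win = others[int(np.argmax(acts))]
        err = float(np.mean(np.abs(y - w_out[win])))
        if acts.max() < s_thr or err > e_thr:
            continue                       # re-insert n: keep the neuron
        kept.remove(i)                     # n is redundant
        # Winner absorbs the exemplar (stand-in for Equations 4.3 / 4.4).
        w_in[win] += lr1 * (x - w_in[win])
        w_out[win] += lr2 * (y - w_out[win])
    idx = sorted(kept)
    return w_in[idx], w_out[idx]
```

A neuron survives only if, once it is removed, no remaining neuron can represent its exemplar strongly and accurately enough.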
7.4.3 Evolutionary Sleep Learning ECoS sleep learning suffers from the same problem as ECoS training: the difficulty of selecting optimal parameters. Since selecting the training parameters via an evolutionary algorithm (Section 7.3.2) proved effective, it is logical to apply an EA to selecting the sleep learning parameters as well. Since the goal of sleep learning is to reduce the size of an ECoS while maintaining its performance, the goal of the EA is to select a set of parameters such that the size of the ECoS is reduced as much as possible, while the performance of the network is disturbed as little as possible. The EA must therefore minimise both the size of the network (or, equivalently, maximise the reduction in size) and the change in error of the network. The EA in this case optimises only four parameters: the error threshold, the sensitivity threshold, and the two learning rates. The chromosome therefore consists of only four real-valued genes, one for each parameter. The performance of the network after sleep learning is evaluated by assessing its error over the extracted exemplars. The fitness function is thus as in Equation 7.7.
f_i = \frac{n_t - n_{t+1}}{n_t} + (1 - |e_{t+1} - e_t|)    (7.7)
where:

- n_t is the number of neurons in the evolving layer before sleep training, and n_{t+1} is the number of neurons in the evolving layer after sleep training;
- e_t is the network error over the extracted exemplars before sleep training, and e_{t+1} is the network error over the extracted exemplars after sleep training.

Thus, as the network size decreases, and the error remains low, the fitness will approach the maximum value of two. Again, it is not possible for the fitness to actually reach two, as that would mean that the network had all of its neurons removed without losing any accuracy. This is, of course, impossible.
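Equation 7.7 is straightforward to compute directly; a sketch (the function and argument names are illustrative):

```python
def sleep_fitness(n_before, n_after, err_before, err_after):
    """Fitness for evolved sleep learning (Equation 7.7): the relative
    reduction in evolving-layer size, plus one minus the absolute
    change in error over the extracted exemplars."""
    return (n_before - n_after) / n_before + (1.0 - abs(err_after - err_before))
```

Halving the network at constant error scores 0.5 + 1 = 1.5; the unreachable maximum of two would require removing every neuron without any change in error.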
As an evolutionary algorithm, this algorithm can be quite slow, especially with a large network. This is because a large network will be slow to have the extracted exemplars propagated through it, and there will also be a large number of exemplars to extract. The algorithm does, however, have the advantage of being independent of any external data sets: all that is required is a trained ECoS network. Experimental results with this algorithm over the benchmark data sets are presented in Section 7.5.

Error Threshold         0.1
Sensitivity Threshold   0.5
Learning Rate One       0.5
Learning Rate Two       0.5
Threshold In            0.5
Threshold Out           0.5

Table 7.1: Training parameters for online aggregation.

Hypothesis   H0             H1
AA           a^A = a^A_a    a^A ≠ a^A_a
AB           a^B = a^B_a    a^B ≠ a^B_a
AC           a^C = a^C_a    a^C ≠ a^C_a
AF           a^F = a^F_a    a^F ≠ a^F_a
AN           a^N = a^N_a    a^N_a < a^N
BA           b^A = b^A_a    b^A ≠ b^A_a
BB           b^B = b^B_a    b^B ≠ b^B_a
BC           b^C = b^C_a    b^C ≠ b^C_a
BF           b^F = b^F_a    b^F ≠ b^F_a
BN           b^N = b^N_a    b^N_a < b^N

Table 7.2: Statistical hypotheses for evaluating online aggregation.
7.5 Experiments with Benchmark Data Sets The results of each optimisation technique over each benchmark dataset are presented in this section. The results over the Two Spirals problem are presented in Subsection 7.5.7, Iris Classification in Subsection 7.5.8, Mackey-Glass in Subsection 7.5.9, and Gas Furnace in Subsection 7.5.10. The general experimental setups, parameters and statistical hypotheses used for each optimisation technique are discussed in the subsections below.
7.5.1 Online Aggregation Experimental Method The format of the online aggregation experiments followed that of the experiments in Section 4.13. That is, a network was trained, tested, then further trained and tested again. The only difference was that online aggregation was used during the training phases of the experiment. The parameters used during training are listed in Table 7.1. The accuracies and sizes of the networks were compared to the accuracies and sizes of the networks created for Section 4.13. The statistical hypotheses used for these comparisons are listed in Table 7.2. In this table, a subscript of a indicates a network trained using online aggregation.
Population Size      100
Generations          100
Crossover rate       0.5
Mutation rate        0.1
Selection strategy   tournament

Table 7.3: Parameters for GA optimised training over Two Spirals and Iris Classification data sets.
Population Size      50
Generations          50
Crossover rate       0.5
Mutation rate        0.1
Selection strategy   tournament

Table 7.4: Parameters for GA optimised training over Mackey-Glass and Gas Furnace data sets.
A two-tailed t-test was used to evaluate each hypothesis, with the exception of hypotheses AN and BN, where one-tailed t-tests were used. The results of the statistical hypothesis tests are presented in Appendix D.
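The pooled-variance (Student's) t statistic underlying these tests can be computed directly; a sketch, since the thesis does not specify an implementation:

```python
import numpy as np
from math import sqrt

def pooled_t(x, y):
    """Pooled-variance t statistic for two independent samples.
    For a two-tailed test (hypotheses AA-AF, BA-BF) |t| is compared
    with the critical value; for the one-tailed AN / BN tests, H0 is
    rejected only when t also lies in the predicted direction."""
    nx, ny = len(x), len(y)
    # pooled estimate of the common variance
    sp2 = (((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1))
           / (nx + ny - 2))
    return (np.mean(x) - np.mean(y)) / sqrt(sp2 * (1.0 / nx + 1.0 / ny))
```

The statistic is antisymmetric in its arguments, which is what makes the one-tailed direction meaningful.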
7.5.2 Evolutionary Optimised Training Experimental Method The setup for these experiments followed the usual pattern of training, testing, further training and re-testing. Training in this case used evolutionary optimisation of the training parameters and data set order. Since evolutionary algorithms are stochastic, one hundred runs were performed over each fold of the data, for a total of one thousand individual runs. The training parameters used for the Two Spirals and Iris Classification data sets are listed in Table 7.3. Due to time constraints, a smaller population size and fewer generations were used for the experiments across the Mackey-Glass and Gas Furnace data sets, as these are the larger of the benchmark data sets. For these problems, the parameters in Table 7.4 were used. The EA selected was a real-encoded variant of Goldberg's Simple Genetic Algorithm (Goldberg, 1989). This algorithm was used because it is well known, and because code for it was available to the author. The results are presented as the mean of all one thousand runs, with the approximate variance as the measure of variance. The networks trained via evolutionary optimised training were compared to the baseline networks from Section 4.13. The statistical hypotheses used for these comparisons are listed in Table 7.5. In this table, a subscript of g indicates that the network was trained using parameters optimised via GA. A two-tailed pooled variance t-test was used to evaluate each hypothesis, with the exception of hypotheses AN and BN, where one-tailed t-tests were used. The results of the statistical hypothesis tests are presented in Appendix D.
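A real-encoded GA of the kind described can be sketched as follows. This is a generic illustration, not the author's implementation: the operators shown (binary tournament selection, uniform crossover, Gaussian mutation) are assumptions in the style of a simple GA.

```python
import numpy as np

def real_ga(fitness, bounds, pop_size=20, gens=30,
            cx_rate=0.5, mut_rate=0.1, seed=0):
    """Minimal real-encoded GA sketch: each individual is a vector of
    real-valued genes, one per parameter being optimised."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    pop = rng.uniform(lo, hi, size=(pop_size, len(bounds)))
    for _ in range(gens):
        fit = np.array([fitness(ind) for ind in pop])
        new = []
        while len(new) < pop_size:
            parents = []
            for _ in range(2):                 # binary tournament selection
                i, j = rng.integers(pop_size, size=2)
                parents.append(pop[i] if fit[i] >= fit[j] else pop[j])
            child = parents[0].copy()
            if rng.random() < cx_rate:         # uniform crossover
                mask = rng.random(len(bounds)) < 0.5
                child[mask] = parents[1][mask]
            mut = rng.random(len(bounds)) < mut_rate
            child[mut] += rng.normal(0.0, 0.1, mut.sum())  # Gaussian mutation
            new.append(np.clip(child, lo, hi))
        pop = np.array(new)
    fit = np.array([fitness(ind) for ind in pop])
    return pop[int(np.argmax(fit))]            # best of the final generation
```

For ECoS training optimisation, the chromosome would hold the training parameters (and, in the thesis, the data set order was optimised as well, which this sketch does not attempt).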
Hypothesis   H0             H1
AA           a^A = a^A_g    a^A ≠ a^A_g
AB           a^B = a^B_g    a^B ≠ a^B_g
AC           a^C = a^C_g    a^C ≠ a^C_g
AF           a^F = a^F_g    a^F ≠ a^F_g
AN           a^N = a^N_g    a^N_g < a^N
BA           b^A = b^A_g    b^A ≠ b^A_g
BB           b^B = b^B_g    b^B ≠ b^B_g
BC           b^C = b^C_g    b^C ≠ b^C_g
BF           b^F = b^F_g    b^F ≠ b^F_g
BN           b^N = b^N_g    b^N_g < b^N

Table 7.5: Statistical hypotheses for evaluating evolutionary optimised ECoS training.

Threshold In    0.5
Threshold Out   0.5

Table 7.6: Parameters for offline aggregation.
7.5.3 Offline Aggregation Experimental Method The experimental procedure for testing this algorithm was slightly different from the above. This is because offline aggregation is performed after the termination of a training session. The procedure was therefore as follows: a network was trained from scratch across Set A, and tested across all data sets. The network was then aggregated, and again tested across A, B and C. The aggregated network was then further trained on Set B, tested, aggregated again and then tested again. There were therefore four sets of results to be considered. The results of the first train and test cycle are identical to those in Section 4.13. It was only after aggregation was performed the first time that the results differed. The aggregation parameters used are presented in Table 7.6. The purpose of aggregation is to, where possible, reduce the size of the network while maintaining as much as possible the initial performance of the network. The statistical hypotheses used to test this were thus as in Table 7.7. A two-tailed paired-sample t-test was used to evaluate each hypothesis, with the exception of hypotheses AN and BN, where one-tailed paired-sample t-tests were used. Paired sample tests were used because the aggregated network is derived directly from the original network: the performance of the aggregated network will therefore be dependent upon the performance of the original network.
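The paired-sample statistic operates on the per-network differences, rather than treating the two sets of results as independent samples; a sketch for illustration:

```python
import numpy as np
from math import sqrt

def paired_t(before, after):
    """Paired-sample t statistic: each aggregated network derives
    directly from one original network, so the test is applied to the
    per-pair differences rather than to two independent samples."""
    d = np.asarray(before, dtype=float) - np.asarray(after, dtype=float)
    return float(np.mean(d)) / (np.std(d, ddof=1) / sqrt(len(d)))
```

As with the pooled-variance statistic, a two-tailed test compares |t| with the critical value, while the one-tailed AN / BN tests also require t to lie in the predicted direction.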
Hypothesis   H0             H1
AA           a^A = a^A_a    a^A ≠ a^A_a
AB           a^B = a^B_a    a^B ≠ a^B_a
AC           a^C = a^C_a    a^C ≠ a^C_a
AF           a^F = a^F_a    a^F ≠ a^F_a
AN           a^N = a^N_a    a^N_a < a^N
BA           b^A = b^A_a    b^A ≠ b^A_a
BB           b^B = b^B_a    b^B ≠ b^B_a
BC           b^C = b^C_a    b^C ≠ b^C_a
BF           b^F = b^F_a    b^F ≠ b^F_a
BN           b^N = b^N_a    b^N_a < b^N

Table 7.7: Statistical hypotheses for evaluating offline aggregation.
Error Threshold         0.1
Sensitivity Threshold   0.5
Learning Rate One       0.5
Learning Rate Two       0.5

Table 7.8: Sleep Training Parameters.
Hypothesis   H0                 H1
AA           a^A_s = a^A_st     a^A_s ≠ a^A_st
AB           a^B_s = a^B_st     a^B_s ≠ a^B_st
AC           a^C_s = a^C_st     a^C_s ≠ a^C_st
AF           a^F_s = a^F_st     a^F_s ≠ a^F_st
AN           a^N_s = a^N_st     a^N_st < a^N_s
BA           b^A_s = b^A_st     b^A_s ≠ b^A_st
BB           b^B_s = b^B_st     b^B_s ≠ b^B_st
BC           b^C_s = b^C_st     b^C_s ≠ b^C_st
BF           b^F_s = b^F_st     b^F_s ≠ b^F_st
BN           b^N_s = b^N_st     b^N_st < b^N_s

Table 7.9: Statistical hypotheses for evaluating sleep training. The results of the statistical hypothesis tests are presented in Appendix D.
7.5.4 Sleep Training Experimental Method The overall format of the experiments with the sleep training algorithm was the same as for the offline aggregation algorithm: after each training session, the network was subjected to sleep training, then tested. The parameters used for the sleep training were as in Table 7.8. The parameters are the same as the training parameters used in Section 4.13, and were selected for the following reason: sleep training is intended to remove redundant neurons, which occur because the ECoS training algorithm allows evolving layer neurons to ‘drift’ in space towards other neurons. By sleep training the network using the same parameters it was originally trained with, these redundant neurons are more likely to be removed. In other words, the final network will be closer to the network that would have existed, had the ECoS been globally trained. The sleep trained networks were compared with the original networks, that is, the networks as they were before sleep training. The statistical hypotheses used for this comparison are listed in Table 7.9, where a subscript of st indicates a network after sleep training. Two-tailed, paired sample t-tests were used, except for hypotheses AN and BN, where one-tailed paired sample
t-tests were used. The results of the statistical hypothesis tests are presented in Appendix D.
7.5.5 Evolved Sleep Training Experimental Method The evolved sleep training algorithm was tested using a similar experimental setup to the sleep training. As the GA used is stochastic, one hundred runs were performed across each fold of the data, for one thousand runs altogether. The GA parameters used are listed in Table 7.10. These parameters, as with the GA
Population Size      100
Generations          100
Crossover rate       0.5
Mutation rate        0.1
Selection strategy   tournament

Table 7.10: Parameters for GA optimised sleep training.
optimised training described above, are within the ranges described in (Grefenstette, 1986) and are considered to be conservative. Results are presented as the mean across all one thousand experiments, with the approximate variance as the measure of variance. Two sets of comparisons were performed. The first compared the networks resulting from evolved sleep training with the original networks. The statistical hypotheses used for these comparisons are listed in Table 7.11, where a subscript of es indicates a network optimised by evolved sleep training. Two-tailed pooled variance t-tests were used to evaluate these hypotheses, with one-tailed tests used for hypotheses AN and BN. Unpaired t-tests were used because of the stochastic nature of the GA: the randomness involved means that the same original network can yield many different resulting networks. Comparisons were also made between the networks resulting from evolved sleep training and the networks that resulted from the unoptimised sleep training above. The purpose of these comparisons was to evaluate whether or not the evolutionary optimisation of parameters yields networks that are significantly better than those of the unoptimised algorithm. The statistical hypotheses used for these comparisons are presented in Table 7.12. In this table, a subscript of es indicates a network resulting from the evolved sleep training algorithm. Again, two-tailed pooled variance t-tests were used to compare the accuracies, while a one-tailed pooled variance t-test was used to compare the numbers of neurons. The results of the statistical hypothesis tests are presented in Appendix D.

Hypothesis   H0                 H1
AA           a^A_s = a^A_es     a^A_s ≠ a^A_es
AB           a^B_s = a^B_es     a^B_s ≠ a^B_es
AC           a^C_s = a^C_es     a^C_s ≠ a^C_es
AF           a^F_s = a^F_es     a^F_s ≠ a^F_es
AN           a^N_s = a^N_es     a^N_es < a^N_s
BA           b^A_s = b^A_es     b^A_s ≠ b^A_es
BB           b^B_s = b^B_es     b^B_s ≠ b^B_es
BC           b^C_s = b^C_es     b^C_s ≠ b^C_es
BF           b^F_s = b^F_es     b^F_s ≠ b^F_es
BN           b^N_s = b^N_es     b^N_es < b^N_s

Table 7.11: Statistical hypotheses for evaluating evolved sleep training.
Hypothesis   H0                  H1
AA           a^A_st = a^A_es     a^A_st ≠ a^A_es
AB           a^B_st = a^B_es     a^B_st ≠ a^B_es
AC           a^C_st = a^C_es     a^C_st ≠ a^C_es
AF           a^F_st = a^F_es     a^F_st ≠ a^F_es
AN           a^N_st = a^N_es     a^N_st ≠ a^N_es
BA           b^A_st = b^A_es     b^A_st ≠ b^A_es
BB           b^B_st = b^B_es     b^B_st ≠ b^B_es
BC           b^C_st = b^C_es     b^C_st ≠ b^C_es
BF           b^F_st = b^F_es     b^F_st ≠ b^F_es
BN           b^N_st = b^N_es     b^N_st ≠ b^N_es

Table 7.12: Statistical hypotheses for comparing sleep trained networks and evolved sleep trained networks.

Hypothesis   H0                H1
AA           a^A_a = a^A_g     a^A_a ≠ a^A_g
AB           a^B_a = a^B_g     a^B_a ≠ a^B_g
AC           a^C_a = a^C_g     a^C_a ≠ a^C_g
AF           a^F_a = a^F_g     a^F_a ≠ a^F_g
AN           a^N_a = a^N_g     a^N_a ≠ a^N_g
BA           b^A_a = b^A_g     b^A_a ≠ b^A_g
BB           b^B_a = b^B_g     b^B_a ≠ b^B_g
BC           b^C_a = b^C_g     b^C_a ≠ b^C_g
BF           b^F_a = b^F_g     b^F_a ≠ b^F_g
BN           b^N_a = b^N_g     b^N_a ≠ b^N_g

Table 7.13: Statistical hypotheses for comparing online aggregation and evolutionary optimised ECoS training.
7.5.6 Method for the Comparison of Techniques Several different optimisation methods exist for each of the two phases of training. While numerous comparisons could be made between these methods, only a few are informative. There is no point, for example, in comparing the results of online aggregation of EFuNN with the results of sleep training of SECoS. Comparisons are thus restricted to the same network type (SECoS or EFuNN), and the same phase of training (during training, or post-training). The following comparisons of results (accuracies and network sizes) were carried out:
Online aggregation vs. evolutionary optimised training.
Offline aggregation of SECoS vs. sleep training of SECoS.
Offline aggregation of SECoS vs. evolutionary optimised sleep training of SECoS.
The first comparison was carried out using the hypotheses listed in Table 7.13. The subscripts used are as above. Two tailed t-tests were used to test each hypothesis. The second comparison was performed using the hypotheses listed in Table 7.14. Again, two-tailed t-tests were used to test each hypothesis. The third comparison was performed using the hypotheses listed in Table 7.15. Two-tailed pooled-variance
t-tests were used to evaluate each hypothesis. The results of the statistical hypothesis tests are presented in Appendix D.
Hypothesis   H0                 H1
AA           a^A_a = a^A_st     a^A_a ≠ a^A_st
AB           a^B_a = a^B_st     a^B_a ≠ a^B_st
AC           a^C_a = a^C_st     a^C_a ≠ a^C_st
AF           a^F_a = a^F_st     a^F_a ≠ a^F_st
AN           a^N_a = a^N_st     a^N_a ≠ a^N_st
BA           b^A_a = b^A_st     b^A_a ≠ b^A_st
BB           b^B_a = b^B_st     b^B_a ≠ b^B_st
BC           b^C_a = b^C_st     b^C_a ≠ b^C_st
BF           b^F_a = b^F_st     b^F_a ≠ b^F_st
BN           b^N_a = b^N_st     b^N_a ≠ b^N_st

Table 7.14: Statistical hypotheses for comparing offline aggregation and sleep training.

Hypothesis   H0                 H1
AA           a^A_a = a^A_es     a^A_a ≠ a^A_es
AB           a^B_a = a^B_es     a^B_a ≠ a^B_es
AC           a^C_a = a^C_es     a^C_a ≠ a^C_es
AF           a^F_a = a^F_es     a^F_a ≠ a^F_es
AN           a^N_a = a^N_es     a^N_a ≠ a^N_es
BA           b^A_a = b^A_es     b^A_a ≠ b^A_es
BB           b^B_a = b^B_es     b^B_a ≠ b^B_es
BC           b^C_a = b^C_es     b^C_a ≠ b^C_es
BF           b^F_a = b^F_es     b^F_a ≠ b^F_es
BN           b^N_a = b^N_es     b^N_a ≠ b^N_es

Table 7.15: Statistical hypotheses for comparing offline aggregation and evolved sleep training.
7.5.7 Experiments with the Two Spirals Dataset The results for each algorithm, as a percentage of examples correctly classified and as the number of neurons in the evolving layer of the network, are presented in Tables 7.16 and 7.17. Table 7.16 presents the results for SECoS, and Table 7.17 presents the results for EFuNN. The results for each network type are presented together in one table to make it easier to compare the performance of one algorithm with another. The results of the baseline SECoS and EFuNN networks are also included in these tables for the purposes of comparison. The analyses of these results are presented and discussed in the subsections below, where each subsection is devoted to one algorithm. Online Aggregation The results for training with the online aggregation algorithm are presented in the rows labelled 'Online Agg.' in Tables 7.16 and 7.17. The results indicate that the resulting networks were smaller and, in the case of SECoS, more accurate than the originals. The hypotheses in Table 7.2 were tested across the SECoS experiments, and the results of these tests are presented in Table D.1. These tests show that while the accuracies were slightly higher, they were not significantly higher; similarly, while the networks trained with online aggregation were slightly smaller, they were not significantly smaller. Similar results were found for EFuNN. The accuracies and sizes of the EFuNN networks are presented in Table 7.17: again the networks were slightly smaller, and generally slightly more accurate, than the unoptimised networks. Testing the hypotheses in Table 7.2 yielded the results presented in Table D.2. Again, while the networks trained using online aggregation were slightly better, they were not significantly better.
                        Trained on Set A                                       Trained on Set B
Recall Set              A          B          C          All        Neurons    A          B          C          All        Neurons
SECoS                   50.2/9.3   57.4/23.4  11.3/14.7  47.1/8.0   6.0/2.7    44.2/11.7  85.3/17.2  54.2/26.4  49.4/10.2  6.4/3.7
Online Agg.             50.5/8.7   59.6/22.8  15.0/25.8  47.8/8.3   4.5/1.2    46.9/12.8  87.6/15.9  55.4/26.3  51.9/11.1  4.7/1.6
GA Training             65.2/1.0   46.7/3.2   47.0/2.8   56.2/0.8   9.1/1.0    56.2/0.8   83.7/1.2   42.6/2.7   58.0/0.7   10.0/1.0
Offline Agg. (Before)   50.2/9.3   57.4/23.4  11.3/14.7  47.1/8.0   6.0/2.7    46.6/4.1   96.4/5.0   39.4/21.6  50.8/2.4   4.6/1.3
Offline Agg. (After)    54.7/3.1   58.2/13.6  12.1/20.5  50.8/2.0   3.3/0.9    45.7/3.8   95.8/8.2   41.8/14.9  50.4/2.7   3.3/0.9
Sleep (Before)          50.2/9.3   57.4/23.4  11.3/14.7  47.1/8.0   6.0/2.7    41.9/3.8   89.6/9.0   70.1/13.6  49.5/2.5   4.5/3.4
Sleep (After)           52.6/4.1   51.9/17.8  25.4/30.2  49.8/2.4   3.2/2.3    49.8/5.3   53.4/20.0  52.2/12.9  50.4/3.1   2.9/2.6
GA Sleep (Before)       50.2/9.3   57.4/23.4  11.3/14.7  47.1/8.0   6.0/2.7    44.6/0.1   90.3/0.1   63.1/0.2   51.1/0.2   3.9/0.0
GA Sleep (After)        50.8/0.1   46.6/0.1   44.0/0.1   49.7/0.1   1.6/0.0    50.1/0.1   47.1/0.2   55.0/0.2   50.3/0.1   1.1/0.0

Table 7.16: Mean percent correct / standard deviation / approximate variance (to 1 d.p.) for the two spirals problem (SECoS).

                        Initialised / trained on Set A                         Trained on Set B
Recall Set              A          B          C          All        Neurons    A          B          C          All        Neurons
EFuNN                   76.6/11.9  21.1/22.9  17.9/20.1  65.1/7.7   52.1/20.3  69.6/11.8  89.9/18.9  17.2/20.3  66.3/7.4   59.1/14.2
Online Agg.             63.4/6.7   25.4/23.7  27.3/26.3  55.9/4.8   40.0/17.8  57.6/8.5   80.0/24.0  23.3/19.4  56.3/5.5   40.0/17.8
GA Training             99.8/0.4   16.9/0.8   13.1/0.9   82.8/0.5   46.6/0.6   93.4/0.8   99.9/0.3   21.7/1.1   86.9/0.7   49.9/0.6
Offline Agg. (Before)   76.6/11.9  21.1/22.9  17.9/20.1  65.1/7.7   52.1/20.3  53.7/6.4   35.0/28.9  40.6/25.0  50.6/6.2   14.4/4.5
Offline Agg. (After)    53.7/6.4   35.0/28.9  40.6/25.0  50.6/6.2   6.0/0.0    35.3/9.3   63.7/20.3  34.2/30.2  37.9/6.9   6.0/0.0

Table 7.17: Mean percent correct / standard deviation / approximate variance (to 1 d.p.) for the two spirals problem (EFuNN).
Evolutionary Optimised Training As the evolutionary algorithm used in these experiments was a genetic algorithm, the results are presented in the rows labelled 'GA Training' in Tables 7.16 and 7.17. Inspection of the results for SECoS shows that the GA optimised training was able to better learn the training data, but was slightly less accurate at generalisation. Testing the hypotheses listed in Table 7.5 yielded the results presented in Table D.3. These results show that after training on Set A using GA optimised training parameters, the resulting networks performed highly significantly better over the training set than the networks trained using unoptimised parameters. On the other hand, they were highly significantly less accurate at generalising over Set B, but highly significantly more accurate over Set C. The networks were significantly larger than the baseline networks, which suggests that the GA sacrificed size for learning accuracy. After further training on Set B, the same situation existed: the accuracies over Sets A and B were highly significantly better than those of the baseline networks, while the generalisation accuracy was significantly lower. Again, the GA sacrificed size for accuracy, as the optimised networks were significantly larger than the baseline networks. The results for EFuNN are similar. Again, the optimised EFuNN were more accurate over the training data, and less accurate over the unseen data. Testing the hypotheses listed in Table 7.5 yielded the results in Table D.4. These results show that after training on Set A, the EFuNN were highly significantly more accurate over Set A than the baseline EFuNN networks. They were, however, significantly less accurate over unseen data. The overall accuracy was highly significantly better than that of the baseline networks.
The optimised EFuNN were also significantly smaller than the baseline, so in this case the GA was able to optimise both the accuracy over the training data and the size of the network, but at the expense of generalisation accuracy. After further training over Set B, the networks adapted very well, with the accuracy over Set B being highly significantly better than that of the unoptimised networks. In this case, generalisation accuracy was also significantly better, while the size of the networks remained significantly smaller. Offline Aggregation The results for the experiments with offline aggregation are presented in the sections of Tables 7.16 and 7.17 labelled 'Offline Agg.'. There are two rows in each of these sections: the first, labelled 'Before', gives the results of the networks after conventional ECoS training; the second, labelled 'After', gives the results of the networks after they have been optimised using offline aggregation. Testing the hypotheses listed in Table 7.7 gave the results presented in Table D.5. These results show that while the networks were significantly reduced in size after aggregation, there were no significant changes in accuracy. The same statistical tests were performed for EFuNN, and the results are presented in Table D.6. These results show that while there were significant decreases in accuracy over Set A, there were no significant changes in accuracy over Sets B and C.
Sleep Learning Sleep training was applied only to SECoS networks. The results of applying this algorithm are presented in the section labelled ‘Sleep’ in Table 7.16. There are two rows in this section, labelled ‘Before’ and ‘After’, where ‘Before’ is the performance of the network after conventional ECoS training, and ‘After’ is the performance after optimisation via sleep learning. Further training caused a slight increase in the size of the network. Testing of the hypotheses listed in Table 7.9 yielded the results in Table D.7. These results show that, for the initially trained networks, the sleep learning algorithm was able to produce networks that were as accurate as the originals, but were significantly smaller. For the additionally trained networks, the optimised networks were more accurate over Set A, but less accurate over Sets B and C. Again, the optimised networks were significantly smaller than the originals. Evolved Sleep Learning As with sleep learning, evolved sleep learning was applied only to SECoS. As the evolutionary algorithm used was a GA, the results are presented in the section of Table 7.16 labelled ‘GA Sleep’. The row labels ‘Before’ and ‘After’ have the same meaning as before. The hypotheses listed in Table 7.11 were tested, which yielded the results presented in Table D.8. These results confirm that in all cases, the networks that result from the evolved sleep training algorithm were significantly smaller. With few exceptions, they were also significantly more accurate. After additional training, the optimised networks were more accurate on average over Set A and the full data set, but less accurate over Sets B and C. The networks that were optimised after additional training were significantly more accurate over Set A, but less accurate over the other Sets. A comparison of sleep learning, and evolutionary optimised sleep learning, was performed by testing the hypotheses in Table 7.12. The results of these tests are presented in Table D.9. 
These results show that there was no significant difference in the size of the networks created by either algorithm. The accuracy of the networks created by evolved sleep learning was significantly better only over Set C, and was either significantly worse or equivalent over the other subsets. Comparison of Techniques This subsection discusses the results of comparing the optimisation techniques investigated. A comparison of the networks resulting from online aggregation and evolutionary optimised training was performed by testing the hypotheses listed in Table 7.13. The results of these tests over SECoS are presented in Table D.10. These results show that there were significant differences across all data sets, and that the sizes of the networks were also significantly different. Inspection of the accuracies, with the results of the tests in mind, shows that after training on Set A the networks resulting from online aggregation were more accurate over Sets A and B, but less accurate over Set C. The online aggregation algorithm also produced networks that were significantly smaller. The results after further training on Set B were much the same, the exception being that the performance over Set C was lower for the evolutionary optimised networks.
The results of the same tests over EFuNN are presented in Table D.11. The results were quite similar to those for SECoS: the major difference is that after training on Set A, the accuracies of the evolutionary trained networks over Sets B and C were inferior. Again, the evolutionary trained EFuNN were significantly larger. After further training on Set B, their performance over Set B was superior to that of the online aggregated networks, in contrast to the case with SECoS. The mean network size remained larger. A comparison of offline aggregation of SECoS and sleep learning optimisation of SECoS was performed by evaluating the hypotheses listed in Table 7.14; the results are presented in Table D.12. As expected from the discussions above, there were no significant differences between SECoS optimised by offline aggregation and SECoS optimised by sleep learning: the accuracies and sizes of the networks were all either identical or very similar. The performance of SECoS offline aggregation and evolutionary optimised sleep learning was compared by testing the hypotheses listed in Table 7.15. The results of these tests are presented in Table D.13. As expected, there were no differences in the accuracies. The only difference between the two algorithms was in the size of the networks after further training, where the evolutionary optimised sleep algorithm produced networks that were significantly smaller than those produced by offline aggregation.
7.5.8 Experiments with the Iris Classification Dataset

The results of each experiment over SECoS are presented in Table 7.18. The results over EFuNN are presented in Table 7.19. The row labels are as in Subsection 7.5.7. These results are analysed and discussed in the following subsections.

[Table 7.18: Mean percent correct / standard deviation / approximate variance (to 1 d.p.) for SECoS optimised for the iris classification problem. Rows: SECoS (unoptimised), Online Agg., GA Training, Offline Agg. (before/after), Sleep (before/after) and GA Sleep (before/after); columns: recall accuracy over Sets A, B, C and All, plus evolving-layer neuron counts, after training on Set A and after further training on Set B.]

[Table 7.19: Mean percent correct / standard deviation / approximate variance (to 1 d.p.) for EFuNN optimised for the iris classification problem. Rows: EFuNN (unoptimised), Online Agg., GA Training and Offline Agg. (before/after); columns as in Table 7.18.]

Online Aggregation

The results of optimising the networks via online aggregation were compared to the results of the unoptimised ECoS training by testing the hypotheses listed in Table 7.2. The results of these tests for SECoS are presented in Table D.14. Inspection of the appropriate rows in Table 7.18, with these results in mind, shows that the accuracy over Set A after initial training was lower for the SECoS created via online aggregation training than for the unoptimised learning algorithm. Results over Sets B and C were not significantly different, while the networks yielded by online aggregation were significantly smaller. The same results are shown after further training on Set B.

The results of testing the same hypotheses over the EFuNN results are presented in Table D.15. These results show that while there were some significant differences in accuracy at the 95% level of confidence, there were none at the 99% level. There were, however, significant differences in the size of the networks, with online aggregation producing networks that were much smaller than the baseline networks.

Evolutionary Optimised Training

The performance of the SECoS networks trained via evolutionary optimised training was compared to the performance of the baseline SECoS networks by evaluating the hypotheses presented in Table 7.5. The results of these hypothesis tests are presented in Table D.16. These results, and the results in Table 7.18, show that the networks trained over Set A using evolutionary optimised training were significantly more accurate and significantly smaller than the baseline SECoS networks. After further training on Set B, the evolutionary optimised training networks were less accurate over Set A, had equal accuracy over Set B, and were superior over Set C. Again, they were significantly smaller than the baseline networks.

Repeating these tests for the EFuNN results yielded the results presented in Table D.17. The results after initial training on Set A were similar to those for SECoS: the EFuNNs created using evolutionary optimised training were more accurate and smaller than the equivalent baseline networks. After further training on Set B, the performance of the evolutionary trained EFuNNs was lower over Sets A and C, and higher over Set B. Again, these networks were smaller than the networks created using unoptimised training.

Offline Aggregation

The results for the offline aggregation experiments are laid out in Tables 7.18 and 7.19 in the same way as in Subsection 7.5.7. The results of offline aggregation over SECoS were compared to the unoptimised networks by testing the hypotheses in Table 7.7. The results of these comparisons are presented in Table D.18. These results show that after initial training on Set A, the aggregated networks were significantly less accurate than the original networks, although they were also significantly smaller.

Running the same tests across the results for EFuNN yielded the results in Table D.19. These results show that, for the initially trained networks, the decrease in accuracy over Set A was significant only at the 95% level of confidence; there was no significant decrease at the 99% level. No significant changes occurred over Set B, but significant changes occurred at both the 95% and 99% levels for the full data set. After further training on Set B, there was no significant change in accuracy caused by aggregation over Set A.
The decrease in accuracy over Set B was significant at the 95% level of confidence only. As before, there was a significant decrease in accuracy over Set C and the full data set after aggregation.

Sleep Learning

The results of the sleep learning experiments are laid out as in Subsection 7.5.7. The results of testing the hypotheses listed in Table 7.9 are presented in Table D.20. These results show that after performing sleep learning on a network initially trained on Set A, the accuracies of the networks were significantly decreased over Sets A and B, but not significantly altered over Set C. The sizes of the networks were significantly reduced. Performing sleep learning on networks after further training on Set B yielded networks that differed significantly in performance only over Set C, yet were also significantly smaller.

Evolved Sleep Learning

The performance of the networks that resulted from evolved sleep learning was compared to two other sets of results: firstly, to the performance of the original networks; secondly, to the performance of the networks that resulted from unoptimised sleep learning. The first comparison was performed by testing the hypotheses listed in Table 7.11. The results of these tests are presented in Table D.21. These results, and the accuracies listed in Table 7.18, show that, for the initially trained networks, evolved sleep training significantly reduced the size of the network, but also significantly reduced the accuracy of the networks across the data sets. For the additionally trained networks, the evolved sleep training again significantly reduced the size of the networks. Accuracy was again significantly reduced across all data sets.

Comparison of Techniques

The first comparison carried out was of online aggregation and evolutionary optimised training. This comparison was carried out by testing the hypotheses listed in Table 7.13. The results of these hypothesis tests are presented in Table D.23. These results show that in all cases, the evolutionary optimised network was more accurate and smaller than the network that resulted from training using online aggregation. Repeating the tests for EFuNN yielded the results presented in Table D.24. Bearing these results in mind while inspecting the results in Table 7.19 shows that, for the networks initially trained on Set A, the evolutionary optimised networks were superior in performance. After additional training on Set B, the evolutionary optimised networks were superior over Set B, but less accurate over Sets A and C. The evolutionary optimised networks were smaller.

The comparison of SECoS optimised using offline aggregation and SECoS optimised using sleep learning was performed by testing the hypotheses listed in Table 7.14. The results of these tests are presented in Table D.25. These results show that the sleep learning algorithm was able to produce networks that were as accurate as, or more accurate than, the networks produced by offline aggregation. Offline aggregation, however, did produce networks that were significantly smaller than those produced by sleep learning. The final comparison of techniques for this data set was between SECoS optimised by offline aggregation and SECoS optimised by evolutionary optimised sleep learning.
The comparison was performed by testing the hypotheses in Table 7.15, the results of which are presented in Table D.26. These results show that there were no significant differences in the accuracies of the networks resulting from the two algorithms. There were, however, significant differences in the size of the networks, with offline aggregation again producing networks that were smaller than those produced by the sleep learning algorithm.
7.5.9 Experiments with the Mackey-Glass Dataset

The results of the experiments with SECoS are presented in Table 7.20. The results of the experiments with EFuNN are presented in Table 7.21. The format of the results is as was used in previous experiments dealing with the Mackey-Glass data set.

Online Aggregation

The results of the online aggregation algorithm were compared to the original networks by evaluating the hypotheses listed in Table 7.2. The results of these tests for the SECoS results are presented in Table D.27. The results of these tests for the EFuNN results are presented in Table D.28. Inspection of the errors in Table 7.20 and the results in Table D.27 shows that the networks that resulted from the use of online aggregation were significantly less accurate than those that did not use aggregation. They were, however, significantly smaller than the original networks.
[Table 7.20: Average mean squared error / standard deviation / approximate variance (×10⁻⁴) for SECoS optimised for the Mackey-Glass problem. Rows: SECoS (unoptimised), Online Agg., GA Opt. Train, Offline Agg. (before/after), Sleep (before/after) and GA Sleep (before/after); columns: error over Sets A, B, C and All, plus evolving-layer neuron counts, after training on Set A and after further training on Set B.]
[Table 7.21: Average mean squared error / standard deviation / approximate variance (×10⁻⁴) for EFuNN optimised for the Mackey-Glass problem. Rows: EFuNN (unoptimised), Online Agg., GA Opt. Train and Offline Agg. (before/after); columns: error over Sets A, B, C and All, plus evolving-layer neuron counts, after initialisation and training on Set A and after further training on Set B.]
Evolutionary Optimised Training

The networks that resulted from evolutionary optimised training were compared to the original, unoptimised networks by testing the hypotheses listed in Table 7.4. The results of these tests for SECoS are presented in Table D.29, and for EFuNN in Table D.30. These results, and the accuracies presented in Tables 7.20 and 7.21, show that the evolutionary optimised training produced networks that were not significantly different in accuracy from those that resulted from unoptimised training, but were significantly smaller.

Offline Aggregation

The performance of the networks resulting from offline aggregation was compared to the performance of the original, unoptimised networks. This was done by testing the hypotheses listed in Table 7.7. The outcomes of these tests over the SECoS results are presented in Table D.31, while the outcomes of the tests over the EFuNN results are presented in Table D.32. The results over SECoS show that the aggregated networks were significantly less accurate over all data sets, but were significantly smaller than the originals.

Sleep Learning

The performance of SECoS networks optimised by sleep learning was compared to the performance of the original networks, by testing the hypotheses listed in Table 7.9. The results of these tests are presented in Table D.33. These results, along with the accuracies in Table 7.20, show that the sleep trained networks, while significantly smaller, were significantly less accurate than the original networks.
Evolved Sleep Learning

Two sets of comparisons were carried out for the results of the evolved sleep learning experiments. The first compared the optimised networks to the original networks, via the evaluation of the hypotheses listed in Table 7.11. The results of this comparison are presented in Table D.34. These results show that there were no significant differences in accuracy between the original and optimised networks, while the optimised networks were significantly smaller than the originals. While the mean error for the evolved sleep learning networks was higher than for the originals, the variance was also quite high, which suggests that several poorly performing networks dragged the mean performance down. After further training, the results were somewhat worse: the accuracy across all data subsets was significantly less than the original. The networks were significantly reduced in size, however.

The second comparison compared the performance of unoptimised sleep learning with evolved sleep learning. This was done by testing the hypotheses listed in Table 7.12. The results of these tests are presented in Table D.35. These results show that there was no significant difference in accuracy between the unoptimised sleep learning and the evolved sleep learning. There was, however, a significant difference in size, with the evolved sleep learning algorithm producing networks that were significantly smaller than those produced by the unoptimised sleep learning algorithm.

Comparison of Techniques

The results for online aggregation and GA optimised training were compared by testing the hypotheses listed in Table 7.13. The results of the comparison for SECoS are presented in Table D.36, while the results for EFuNN are presented in Table D.37. The results in both of these tables show that there were no significant differences in the accuracies of networks resulting from either online aggregation or evolved training.
There were, however, significant differences in the size of the resulting networks, with the evolutionary optimised training producing networks that were significantly smaller.

The results of SECoS networks optimised by offline aggregation and SECoS networks optimised by sleep learning were compared by testing the statistical hypotheses listed in Table 7.14. The results of these tests are presented in Table D.38. From these results, and the errors listed in Table 7.20, it can be seen that while offline aggregation produced networks that were much smaller, the networks resulting from sleep learning were significantly more accurate than those that resulted from offline aggregation.

The final comparison for this data set was between offline aggregation of SECoS and SECoS optimised by evolved sleep learning. This comparison was done by testing the statistical hypotheses in Table 7.15, the results of which are presented in Table D.39. These results show that there was no significant difference in the accuracies of SECoS networks resulting from offline aggregation and those resulting from evolved sleep learning. There was a significant difference in size, however, with evolved sleep learning producing networks that were smaller than those produced by offline aggregation.
[Table 7.22: Average mean squared error / standard deviation / approximate variance (to 3 d.p.) for SECoS optimised for the gas furnace problem. Rows: SECoS (unoptimised), Online Agg., GA Training, Offline Agg. (before/after), Sleep (before/after) and GA Sleep (before/after); columns: error over Sets A, B, C and All, plus evolving-layer neuron counts, after training on Set A and after further training on Set B.]
7.5.10 Experiments with the Gas Furnace Dataset

The results of the experiments with the gas furnace data set are presented in Table 7.22 for the SECoS networks, and in Table 7.23 for the EFuNN networks.

[Table 7.23: Average mean squared error / standard deviation / approximate variance (to 3 d.p.) for EFuNN optimised for the gas furnace problem. Rows: EFuNN (unoptimised), Online Agg., GA Training and Offline Agg. (before/after); columns: error over Sets A, B, C and All, plus evolving-layer neuron counts, after training on Set A and after further training on Set B.]

Online Aggregation

The performance of networks trained using online aggregation was compared to the performance of the unoptimised networks from Subsection 4.13.4. The comparison was carried out by testing the statistical hypotheses listed in Table 7.2. The results of these tests for SECoS networks are presented in Table D.40, while the results for EFuNN are in Table D.41. The results in Tables D.40 and 7.22 show that online aggregation had some effect upon the network. For the networks after initial training on Set A, the accuracy of the aggregated networks was significantly less than that of the unoptimised networks. Over Sets B and C, there were no significant differences. The aggregated networks were significantly smaller than the unoptimised networks. After further training on Set B, significant differences existed at the 95% level of confidence across Sets A, B and C, but no such differences existed at the 99% level of confidence. Again, the aggregated networks were significantly smaller than the unoptimised networks.

The results over the EFuNN, in Tables D.41 and 7.23, show that there were no significant differences in accuracy between the aggregated and unoptimised networks. Nor was there any difference in the size of the networks, which shows that the aggregation process failed at the parameter settings used.

Evolutionary Optimised Training

The performance of networks trained using evolutionary optimised training parameters was compared to the performance of the original, unoptimised networks. The comparison was performed by testing the statistical hypotheses listed in Table 7.5. The results of these tests over the SECoS results are presented in Table D.42. These results, and the accuracies presented in Table 7.22, show that the evolutionary optimised SECoS were both significantly smaller and significantly less accurate than the unoptimised networks. The results of the tests over EFuNN are presented in Table D.43. As was the case with SECoS, the networks trained using evolutionary optimised parameters were both significantly smaller and significantly less accurate than the unoptimised networks. Evolutionary optimisation of parameters failed to produce useful networks in this case.

Offline Aggregation

The performance of networks that were aggregated in an offline mode was compared to the accuracies of the unaggregated networks. The comparison was performed by testing the hypotheses listed in Table 7.7. The results of these tests over SECoS are presented in Table D.44. These results, and the accuracies in Table 7.22, show that offline aggregation significantly reduced the size of the networks, but also seriously degraded the networks' accuracies.
The results for the tests across EFuNN are presented in Table D.45. As with SECoS, these results, and the accuracies presented in Table 7.23, show that while the EFuNNs that resulted from offline aggregation were significantly smaller than the originals, they were also significantly less accurate. That is, aggregation significantly degraded the accuracies of the networks.

Sleep Learning

A comparison of the accuracy of SECoS networks before and after the application of sleep learning was performed by testing the hypotheses listed in Table 7.9. The results of these tests are presented in Table D.46. These results, and the accuracies in Table 7.22, show that the sleep training yielded networks that were significantly smaller than the originals, but also much less accurate.

Evolved Sleep Learning

The first comparison performed over the results of evolved sleep learning compared the performance of the networks that resulted from evolved sleep to the performance of the original networks. This comparison involved testing the hypotheses in Table 7.11, and the results are presented in Table D.47. The results of these tests, and the accuracies presented in Table 7.22, show that the evolved sleep algorithm resulted in networks that were significantly less accurate than the original networks, as well as significantly smaller.

The second comparison compared the performance of the evolved sleep networks with the performance of the networks trained using unoptimised sleep learning. The comparison was performed by testing the statistical hypotheses listed in Table 7.12. The results of these tests are presented in Table D.48. These results show that, while the evolved sleep learning produced networks that were significantly smaller than the unoptimised algorithm, the accuracy of the networks yielded by the unoptimised sleep learning algorithm was much higher.
Comparison of Techniques

The first pair of techniques to be compared was online aggregation and evolutionary optimised training. The comparison involved testing the hypotheses listed in Table 7.13. The results of these tests are in Table D.49. Inspection of these results, and the results in Table 7.22, shows that the accuracies of the networks trained using online aggregation were greater than those of the networks resulting from evolutionary optimised training. Evolutionary training did result in networks that were significantly smaller, however. Note that there is one test that did not reject the null hypothesis, the test of hypothesis BB at the 99% level. This is most likely a statistical anomaly. Repeating these tests for the EFuNN networks gave similar results (Table D.50). Networks trained using online aggregation were significantly larger than the networks trained via evolutionary optimised training, but were also more accurate. No anomalous results appeared for the tests of the EFuNN results.

The second set of comparisons compared the performance of SECoS aggregated offline with the performance of SECoS optimised by sleep learning. The statistical hypotheses in Table 7.14 were tested, and the results are presented in Table D.51. These results show that, for the networks after initial training, the offline aggregation algorithm produced networks that were smaller than the networks produced by sleep learning, but that sleep learning produced networks that were more accurate. For the networks after further training on Set B, the sleep-trained networks were more accurate over Set A, while no significant difference existed over Sets B and C. Offline aggregation again produced networks that were smaller.

The third set of comparisons was between SECoS aggregated offline and SECoS trained by evolutionary sleep learning. The results of testing the hypotheses listed in Table 7.15 are presented in Table D.52. These results, and the accuracies in Table 7.22, show that after initial training, the evolved sleep learning algorithm produced networks that were more accurate than the networks produced by offline aggregation. After further training, no significant difference existed between the performance of the networks produced by either algorithm over Set A. Over Sets B and C, the networks produced by the evolved sleep algorithm were significantly less accurate.

               Two Spirals   Iris      Mackey-Glass   Gas Furnace
Online Agg.    fail          success   success        partial
GA Training    partial       success   success        fail
Offline Agg.   success       fail      fail           fail
Sleep          success       partial   fail           fail
GA Sleep       partial       partial   success        fail

Table 7.24: Success of each optimisation method applied to SECoS, by benchmark data set.
7.5.11 Benchmark Experiments Conclusions

It can be expected, from the No Free Lunch Theorem, that no one optimisation method will produce superior results across all data sets. This is the reason that multiple optimisation techniques were developed and presented in this chapter: with a toolbox of optimisation techniques available, it is more likely that an effective algorithm will be available for a particular problem.

In the introduction to this chapter, criteria were established to judge the success of an optimisation algorithm. These criteria reflect the requirement that an optimisation technique must reduce the size of a network without significantly decreasing its accuracy, relative either to an unoptimised network (in the case of online training optimisation techniques) or to the network before optimisation (in the case of offline optimisation techniques). In this discussion, an algorithm will be considered to have failed if the accuracies were significantly degraded, or the size of the network was not significantly reduced. An algorithm will be considered partially successful if the accuracies over some of the data subsets (one or more of Sets A, B or C) were not significantly altered, and the size of the network was reduced. An algorithm will be considered successful if none of the accuracies were significantly degraded, and the size of the network was reduced.

Table 7.24 presents the evaluation of each algorithm applied to SECoS, for each benchmark data set. Table 7.25 presents the evaluation of each algorithm applied to EFuNN, for each benchmark data set. Each of the techniques was successfully applied to at least one of the benchmark data sets, and each of the data sets had at least one method applied to it that was at least partially successful. Overall, reducing the number of neurons in the evolving layer of the network was detrimental to the performance of the function approximation networks.
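The three-way fail/partial/success judgment described above is mechanical, and can be stated as a small decision rule. The sketch below is an illustrative restatement, not code from the thesis; the argument names are invented:

```python
def judge(accuracy_degraded, size_reduced):
    """Classify an optimisation run as 'success', 'partial' or 'fail'.

    accuracy_degraded: list of booleans, one per data subset (Sets A, B, C),
        True where accuracy was significantly degraded.
    size_reduced: True if the network was significantly smaller.
    """
    # Fail: size not reduced, or accuracy degraded over every subset
    if not size_reduced or all(accuracy_degraded):
        return "fail"
    # Partial: some, but not all, subsets significantly degraded
    if any(accuracy_degraded):
        return "partial"
    # Success: size reduced with no significant degradation anywhere
    return "success"

print(judge([False, False, False], True))   # success
print(judge([True, False, False], True))    # partial
print(judge([True, True, True], True))      # fail
```

Applying this rule to each cell of the hypothesis-test results yields the entries of Tables 7.24 and 7.25.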
This agrees with the analysis presented in Chapter 5, which establishes that function approximation problems will require more neurons than classification problems. This was particularly problematic for the gas furnace problem, with none of the optimisation methods being entirely successful.

               Two Spirals   Iris      Mackey-Glass   Gas Furnace
Online Agg.    fail          partial   success        fail
GA Training    partial       partial   success        fail
Offline Agg.   partial       partial   fail           fail

Table 7.25: Success of each optimisation method applied to EFuNN, by benchmark data set.
7.6 Conclusions

This chapter dealt with Hypothesis Five of Section 1.2. The criteria for the support of this hypothesis are reproduced below: The research relating to Hypothesis Five will be considered to support the hypothesis if it results in algorithms that, when applied to an ECoS network, yield the following results:

1. The size of the network has been reduced.
2. The memorisation error over previously seen data has not changed significantly.
3. The generalisation error over previously unseen data has not changed significantly.

Algorithms to optimise ECoS networks were presented and tested over the benchmark data sets. The size of the networks was in all cases reduced. The memorisation accuracy was not significantly decreased in several cases, and the generalisation accuracy was likewise not significantly decreased in several cases. Based on these results, Hypothesis Five is considered to be supported.
7.7 Summary

Previous work in optimising ANN was briefly summarised in Section 7.2. Section 7.3 presented two methods for optimising an ECoS network during training, and for optimising the training process itself. Methods for optimising ECoS networks post-training were presented in Section 7.4. Finally, an empirical evaluation of the optimisation techniques over the benchmark data sets was presented in Section 7.5. These results show that each of the optimisation algorithms was successfully applied to at least one of the benchmark data sets.
Chapter 8
Case Study: The Isolated Phoneme Recognition Problem

'Tis better to be silent and be thought a fool, than to speak and remove all doubt.
Abraham Lincoln
8.1 Introduction

The algorithms described so far in this thesis have been evaluated over benchmark data sets. To be useful contributions, the algorithms must also be demonstrated over data from a real-world application. Testing and evaluation in this case means determining the performance of the algorithms, previously tested over benchmark data, over a real-world data set. This will support the assessment of the success of the hypotheses presented in Chapter 1.

This chapter presents the major case study of the thesis. There are two major purposes to this chapter:

1. To simulate the application of ECoS in a real-world application system.
2. To test and evaluate the algorithms described in this thesis on a real-world problem.

The problem selected for this case study is that of isolated New Zealand English phoneme recognition. Phonemes are the smallest unique components of speech. Although the English language has over a quarter of a million words, there are only between forty and forty-five phonemes, depending upon the accent. This makes the recognition of phonemes a promising approach for the application of neural networks to automatic speech recognition systems. As was stated in (Robinson and Fallside, 1990):

The most promising approach to the problem of large vocabulary automatic speech recognition is to build a recogniser which works at the phoneme level and then map (sic) the resulting strings of phonemes onto a string of words.

They go on to say:

The phoneme recognition approach is practical because the number of phonemes is small (about 45) compared with the number of words in a large vocabulary task (about 1000). The speaker independent phoneme modules may be trained with a much smaller speech corpus than would be required to train speaker independent word models.
Isolated phoneme recognition is thus the task of identifying which phonemes are present in a particular processed acoustic vector. For the task of phoneme recognition, the string of identified phonemes would then be passed on to a further module in the system, where they would be processed into words. Phoneme recognition is a difficult problem, for several reasons:
- Variability of the acoustic signal of a phoneme in different utterances by a single speaker.
- Variations between speakers of the same accent.
- Variations between speakers of different accents.
- Variations in the duration of phonemes.
- Variations in the stress of the phoneme, depending upon where in the word the phoneme occurs.
- Coarticulation effects, that is, modification of a phoneme by the phonemes that precede and follow it.
A multi-modular approach to this problem was selected, that is, a separate network was created for each phoneme to be recognised. The rationale for this approach was described in (Kasabov, 2003, pg 233-234): The rationale behind this approach is that single phoneme NN can be adapted to different accents and pronunciations without necessarily retraining the whole system (or whole NN in the case of a single NN that recognises all phonemes) Multi-modular systems are easier to optimise, as only individual networks need to be changed. They also allow for a mix of ANN architectures. Finally, this setup produces multiple results for each algorithm, which allows for the application of statistical analysis techniques similar to those used for the benchmark data sets. The goal of the experiments in this chapter was not to build a functioning speech recognition system, nor was it to create optimal phoneme recognition neural networks. The goal was to evaluate the algorithms described in this thesis in the context of the adaptive isolated phoneme recognition problem. This was done by creating an ANN module for each of the phonemes available, using those algorithms that are appropriate and practicable. The accuracy of each type of network over training and unseen phoneme examples was evaluated, and the degree to which it adapted to new utterances and new speakers was evaluated. While statistical hypothesis tests are described in this chapter, for reasons of space the results of these tests are presented in Appendix E. For the purposes of these tests, the results are assumed to be normally distributed. This chapter is organised as follows: Section 8.2 describes the nature of speech and how it is formed. Section 8.3 discusses previous work in using neural networks to recognise isolated phonemes, while Section 8.4 describes the Otago Speech Corpus, from which the experimental data used was derived. 
Section 8.5 describes how the data for the experiments was prepared, and describes the general structure of the experiments. Results with MLP and FuNN networks are presented and discussed in Section 8.6. The performance of EFuNN and SECoS in the experimental framework is presented in Section 8.7. The results of extracting fuzzy rules from the trained ECoS networks are presented in Section 8.8. Some of the optimisation algorithms presented in Chapter 7 are applied to the case study problem in Section 8.9. Finally, conclusions are offered in Section 8.10 and the chapter is summarised in Section 8.11.
Figure 8.1: Human vocal tract (taken from http://www.umanitoba.ca/faculties/arts/linguistics/russell/138/sec1/anatomy.htm).
8.2 The Nature of Speech Speech, as with all other sounds, is a sequence of pressure waves, produced by vibrations, moving through a medium, usually air. The particular vibrations in speech are produced by air moving from the lungs and through the larynx, which produces vibrations of varying and variable frequencies. These sounds are then further modified by the action of the tongue and lips, to produce the sounds we associate with speech. Figure 8.1 shows a cross-section of the human vocal tract. When sound is digitised, samples are taken of its amplitude at a specific frequency. This is referred to as the sample frequency and is usually expressed in Hertz (Hz). The resolution of each sample is determined by the number of bits used to record each sample. The higher the sample frequency and sample resolution, the better the quality of the recording, but the more data there is to process. An important theorem in signal sampling is the Nyquist Theorem (Nyquist, 1928). Briefly, it states that the highest frequency that can be represented is less than half the sample frequency. This has implications in the capture of speech data.
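The Nyquist limit can be illustrated numerically: a sinusoid above half the sample frequency "folds" back into the representable band. The following helper is purely illustrative and is not part of any software described in this thesis:

```python
def alias_frequency(f_signal, f_sample):
    """Apparent frequency (Hz) of a sinusoid of f_signal Hz sampled at
    f_sample Hz. By the Nyquist theorem only frequencies below
    f_sample / 2 are representable; anything higher aliases back into
    that band."""
    f = f_signal % f_sample
    return f if f <= f_sample / 2 else f_sample - f

# A 9000 Hz tone sampled at 16000 Hz aliases down to 7000 Hz,
# since 9000 Hz exceeds the Nyquist limit of 8000 Hz.
print(alias_frequency(9000, 16000))
```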
8.3 Previous Work Two of the most common models used for speech recognition are Hidden Markov Models (HMM) and ANN. HMM are out of the scope of this thesis, and will not be further considered, other than to point out that studies that have compared ANN and HMM in the past have found that ANN have superior performance (Bengio and De Mori, 1988; Kirchhoff, 1998). Much work has been done applying ANN to the problem of speech recognition (Lippmann, 1989). This has in the past involved using common ANN such as MLP for applications such as phoneme classification (Leung et al., 1990; Togneri et al., 1992) and word classification (Bourland and Wellekens, 1987; Franzini, 1988), and performing phoneme classification with RBF networks (Renals and Rohwer, 1989). Commonly used are recurrent networks (Watrous and Shastri, 1987; Waibel et al., 1989; Koizumi et al., 1996; Watrous et al., 1990; Robinson and Fallside, 1990; Anderson et al., 1988) and time delay neural networks (TDNN)
(Waibel et al., 1988; Waibel et al., 1989). Speech recognition is also often used as a test problem for new ANN algorithms (Homma et al., 1988; Lawrence et al., 1996; Glaeser, 1998; Sima et al., 1998; Pican et al., 1996; Haskey and Datta, 1998). For a more recent review of ASR technologies, see (Lippmann, 1997). The work in (Kasabov et al., 1997b) and (Kasabov et al., 1999) used FuNN networks to identify individual phonemes. This was a multi-modular approach, in that each phoneme had a single network associated with it. No accuracies are reported in either of these publications, nor was the ability of the FuNN networks to adapt investigated. Most relevant to the work in this chapter is the research presented in (Kilgour, 2003). Although Kilgour's work was in the context of building a complete speech recognition system, he did do some work on phoneme recognition, using almost identical data sets to those used in this chapter. Kilgour investigated FuNNs and a four-layer variant of EFuNN, which he called FLEFuNN (Four Layer EFuNN). As the goal was a speech recognition system, the networks were trained on single data sets: there was no evaluation of the adaptive capabilities of the networks. No rules were extracted from any of the networks, and different data sets were used for the FuNN and FLEFuNN experiments. Of all of the results from (Kilgour, 2003), only a small set of results over FuNN are relevant to this thesis.
8.4 The Otago Speech Corpus The Otago Speech Corpus (Sinclair and Watson, 1995) is the source of the isolated phoneme data used in these experiments. This is a corpus of recorded spoken words from native speakers of New Zealand English, and consists of two collections of recordings. The first consists of spoken digits, from zero to nine, recorded from ten females and eleven males. Each speaker spoke each digit three times. The second consists of words that contain each of the forty three phonemes present in the New Zealand English dialect. Each of the phonemes was captured in an initial, medial and final position: there were thus 129 words recorded, with each word recorded three times. The phonemes were segmented manually. Due to recording difficulties and non-standard pronunciations, the 9,144 words in the corpus yielded a total of 10,467 segmented phonemes. The forty three phonemes present in the corpus are listed in Table 8.1, along with example words in which they are present.
8.5 Experimental Method The setup of the experiments in this chapter is different to that of the experiments performed over the benchmark data. The benchmark data was tested using ten-fold cross validation, with the final performance being measured as the mean over all ten sets. This is inappropriate for the current case study. This is principally because the amount of time required to train networks over the case-study data is such that it was not feasible to perform cross-validation. Another reason is that, in a real-world application of phoneme-based speech recognition systems, the entire data set (that is, all of the data the system will ever have to deal with) will not be available at the time the system is created.
Number  ASCII Character  Example Word      Number  ASCII Character  Example Word
01      /p/              put               23      /w/              went
02      /b/              but               24      /ie/             yes
03      /t/              ten               25      /I/              pit
04      /d/              den               26      /e/              pet
05      /k/              can               27      /&/              pat
06      /g/              game              28      /V/              putt
07      /f/              full              29      /A/              pot
08      /v/              very              30      /U/              good
09      /T/              thin              31      /i/              bean
10      /D/              then              32      /a/              barn
11      /s/              some              33      /O/              born
12      /z/              zeal              34      /3/              burn
13      /S/              ship              35      /u/              boon
14      /Z/              measure           36      /el/             bay
15      /h/              hat               37      /al/             buy
16      /ch/             chain             38      /Oi/             boy
17      /dj/             Jane              39      /OU/             no
18      /m/              man               40      /aU/             now
19      /n/              not               41      /i@/             peer
20      /N/              long              42      /U@/             poor
21      /l/              like              43      /e@/             pair
22      /r/              run
Table 8.1: Phoneme numbers, character representation and example words.
Therefore, the system (and the networks therein) will have to identify examples as they become available, and, ideally, adapt to these examples without forgetting what has already been learned. Since the purpose of this case study is to simulate the application of ECoS to a real-world problem, this restriction must be taken into account. To simulate this situation, three data sets were prepared. The first data set, denoted in this chapter as Set A, consisted of two utterances of each phoneme from one male and one female speaker (speakers 12 and 17 from the Otago Speech Corpus, Section 8.4). The second data set, denoted as Set B, consisted of the remaining utterances of each phoneme for the two speakers that were present in the corpus. The third and final data set, denoted Set C, consisted of three utterances of each phoneme, by two additional speakers, once again one male and one female (Speakers 2 and 20 from the Otago Speech Corpus, Section 8.4). This is a different approach to that used in (Kilgour, 2003), who did not investigate the adaptation of networks. Overall, only the accuracies over Set C in this work are comparable to the results from (Kilgour, 2003). Following the data preparation steps described in (Kasabov et al., 1999, pg 244), each of the isolated phonemes
Class        Members
Plosive      /p/, /b/, /t/, /d/, /k/, /g/
Fricative    /f/, /v/, /T/, /D/, /s/, /z/, /S/, /Z/, /h/
Affricate    /ch/, /dj/
Nasal        /m/, /n/, /N/
Approximant  /l/, /r/, /w/, /iw/
Monophthong  /I/, /e/, /&/, /V/, /A/, /U/, /i/, /a/, /O/, /3/, /u/
Diphthong    /el/, /al/, /Oi/, /OU/, /aU/, /i@/, /U@/, /e@/
Table 8.2: Phonemes grouped by class.
were then mel-scaled, using the frequency bins shown in Table 8.3. Mel-scaling is a logarithmic transformation that has been previously reported to lead to good classification results (Davis and Mermelstein, 1980). Each mel-scale window has a length of thirteen milliseconds and an overlap of 50%, giving a total window length of approximately twelve milliseconds. This resulted in a succession of twenty-six element overlapped vectors that were then linearly normalised to be in the range [0, 1], with a maximum value of 30000 and a minimum value of zero. These vectors were then time-stepped over three time steps, which resulted in seventy eight elements in each vector, representing times t, t + 1 and t + 2. The number of vectors (examples) present in each data set is listed in Table 8.4. As stated in the introduction to this chapter, a multi-modular approach was followed for these experiments, that is, a separate ANN was created and trained to identify each phoneme. Thus, the positive examples for each network reflect which phoneme the network is intended to activate for. This approach followed from (Kasabov et al., 1997b; Kasabov et al., 1999).
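The normalisation and time-stepping arithmetic described above can be sketched as follows. This is a minimal illustration only, assuming each frame is a plain list of twenty-six mel-filter values in the raw range 0 to 30000; the function name and the clipping behaviour are assumptions, not the original preprocessing code:

```python
def time_step_vectors(frames, max_value=30000.0, steps=3):
    """Linearly normalise 26-element mel frames to [0, 1], then
    concatenate each run of `steps` consecutive frames (times t, t+1,
    t+2) into a single 78-element input vector."""
    normed = [[min(x, max_value) / max_value for x in frame] for frame in frames]
    return [sum(normed[i:i + steps], [])   # flatten `steps` frames into one vector
            for i in range(len(normed) - steps + 1)]
```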
8.5.1 Presentation of Results The performance of the networks is reported as the percent true negative, percent true positive, and percent overall correct. With data sets as unbalanced as these (many times more negative examples than positive) it is not informative to report only the overall accuracy. It is possible for a model to achieve a very high overall accuracy by classifying every example as negative. By reporting the true negative and true positive accuracies, a better picture is drawn of the performance of the network. While experiments were performed over all forty-three phonemes, to reduce the amount of space taken, only two types of results are shown. The first is the mean accuracy across all forty-three phonemes. The second is the results for seven of the phonemes. These seven phonemes are /t/,/s/,/ch/,/n/,/l/,/a/ and /al/. These are the phonemes that have the largest number of examples, from each of the phoneme groups listed in Table 8.2. The complete results for all forty-three phonemes are presented in Appendix F.
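The three reported figures follow directly from the confusion counts. A simple sketch (the function name is illustrative) shows why per-class rates are needed: an all-negative classifier scores highly on overall accuracy while detecting nothing:

```python
def class_accuracies(tp, tn, fp, fn):
    """Percent true positive, percent true negative and percent overall
    correct, from true/false positive/negative counts."""
    true_pos = 100.0 * tp / (tp + fn)
    true_neg = 100.0 * tn / (tn + fp)
    overall = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    return true_pos, true_neg, overall

# Classifying everything as negative: 0% true positive, yet ~91% overall.
print(class_accuracies(0, 1000, 0, 100))
```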
Bin Number  Central Frequency (Hz)    Bin Number  Central Frequency (Hz)
1           86                        14          1723
2           173                       15          1981
3           256                       16          2325
4           430                       17          2670
5           516                       18          3015
6           603                       19          3445
7           689                       20          3962
8           775                       21          4565
9           947                       22          5254
10          1033                      23          6029
11          1120                      24          6997
12          1292                      25          8010
13          1550                      26          9216
Table 8.3: Mel scale filter central frequencies.
ASCII Character  Set A  Set B  Set C     ASCII Character  Set A  Set B  Set C
/p/              116    57     59        /w/              103    48     54
/b/              17     9      23        /ie/             107    53     69
/t/              140    67     46        /I/              174    85     115
/d/              27     13     23        /e/              253    124    214
/k/              109    52     42        /&/              285    138    190
/g/              28     13     36        /V/              163    80     96
/f/              299    147    173       /A/              173    86     120
/v/              73     35     118       /U/              99     48     81
/T/              256    124    172       /i/              325    159    351
/D/              79     38     115       /a/              538    262    448
/s/              362    178    115       /O/              519    251    344
/z/              195    95     156       /3/              485    234    311
/S/              297    144    199       /u/              401    195    330
/Z/              80     39     82        /el/             476    235    165
/h/              83     38     123       /al/             688    335    270
/ch/             226    108    113       /Oi/             406    195    360
/dj/             38     18     98        /OU/             590    289    328
/m/              123    60     68        /aU/             469    232    344
/n/              182    90     92        /i@/             370    180    329
/N/              95     46     63        /U@/             226    109    203
/l/              138    66     153       /e@/             257    127    202
/r/              105    53     65
Table 8.4: Examples available for each phoneme in each phoneme data set.
8.5.2 Statistical Tests There were several statistical tests performed over the results of these experiments. As these tests required a single performance metric for each data set, the Matthews correlation coefficient (Matthews, 1975) was calculated for each result, according to Equation 8.1.
C_x = \frac{P_x N_x - N_x^f P_x^f}{\sqrt{(N_x + N_x^f)(N_x + P_x^f)(P_x + N_x^f)(P_x + P_x^f)}}    (8.1)
where:
C_x is the correlation coefficient for a class x, P_x is the number of true positive classifications, N_x is the number of true negative classifications, P_x^f is the number of false positive classifications, and N_x^f is the number of false negative classifications. The correlation coefficient will be 1 when all examples are correctly classified, and -1 when all examples are incorrectly classified. The advantage of this approach is that the correlation coefficient provides a more balanced measure of the accuracy of the network: even if the number of positive examples is very much smaller than the number of negative examples, the correlation coefficient will give an accurate measure of performance. The correlation coefficients for each test are not presented in this thesis, as it was felt that the metrics described in the previous subsection were more informative. The coefficients were instead used in the statistical tests that were carried out. These tests were carried out across all forty three phonemes, that is, the tests indicated differences in the algorithms over all phonemes, rather than across individual phonemes. The results of these tests should thus be considered to be indicative, rather than conclusive. That is, Algorithm A may have been better than Algorithm B overall, but for certain phonemes, Algorithm B may have been a better choice. Since a multi-modular approach was taken, this situation would not cause problems in a real-world speech recognition system: the developer would simply select the method that worked best for each particular phoneme.
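The coefficient in Equation 8.1 can be computed directly from the four confusion counts. A minimal sketch (the function name is illustrative; the zero-denominator guard is an added assumption):

```python
import math

def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient (Equation 8.1).
    tp/tn: true positives/negatives; fp/fn: false positives/negatives."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

# Perfect classification gives 1; complete misclassification gives -1.
print(matthews_cc(10, 90, 0, 0))   # prints 1.0
print(matthews_cc(0, 0, 90, 10))   # prints -1.0
```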
8.5.3 Limitations of the Approach There are limitations with the above experimental approach. Whereas ten-fold cross-validation was used for the benchmark data sets, the case-study data was presented to the algorithms evaluated in only one way: firstly Set A was presented, then Set B, followed by Set C. This is problematic, as it amounts to a single small sample. The limitations are, firstly, that the performance was probably over-estimated, as the models will be biased towards the optimal models for those data sets. In other words, the accuracies tend to be higher over these data sets than they would be for truly randomly selected data. The most worrying limitation is the potential for the results to be very different due to being an atypical sample. This was not very likely, however, due to the way in which the data was collected: if a word was mispronounced by the speaker, it was recorded again and the original recording discarded. Having an equal number of male and female speakers also balanced the data set, as variations between
genders are very strong. In conclusion, then, while the results of this case study should be treated with caution, they are considered unlikely to be strongly atypical.

Learning Rate  0.5
Momentum       0.5
Total Epochs   1000
Table 8.5: Training parameters for MLP and FuNN trained for the phoneme case study.
8.6 Results with the MLP and FuNN Algorithms As discussed in Section 8.3, networks such as MLP and FuNN have previously been applied to speech-recognition and phoneme-recognition problems. Each MLP and FuNN investigated here had ten hidden neurons, consistent with (Kasabov et al., 1999). The FuNN had three MF attached to each input, and two attached to the output. Again, this was consistent with (Kasabov et al., 1999). The MLP and FuNN networks were trained with a variant of backpropagation, known as bootstrapped backpropagation. This method was used because of the unbalanced nature of the training sets: there were a large number of negative examples, in comparison with the number of positive examples. Bootstrap training, as used here, consists of drawing a random subset from the overall data set, where the examples are drawn so that the relative frequency of the positive examples to the negative examples is higher than in the complete data set. The network is trained on this subset for a specific number of epochs, at which time a new subset is built and the process repeated. The parameters used to train the MLP and FuNN networks are presented in Table 8.5. These parameters are similar to those used in the benchmark experiments. This choice was deliberate: differences between the architecture and parameters chosen were minimised as much as possible. This means that differences in performance were due to differences in the data, rather than the way in which the algorithms were used. The training subset consisted of four hundred examples, with one hundred drawn from the positive examples and three hundred drawn from the negative: there was thus a three to one ratio of negative to positive examples. For those phonemes that have fewer than one hundred examples available, the size of each subset was scaled so that the three to one ratio was maintained. The subset was rebuilt every ten epochs. 
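The bootstrapped subset construction described above can be sketched as follows, assuming the positive and negative examples are held in separate lists. The function name and generator interface are illustrative, not the original training code:

```python
import random

def bootstrap_subsets(positives, negatives, n_pos=100, ratio=3,
                      rebuild_every=10, total_epochs=1000):
    """Yield (epoch, subset) pairs for bootstrapped training: each subset
    holds n_pos positive and ratio * n_pos negative examples, so the
    positive class is over-represented relative to the full data set.
    The subset is rebuilt every `rebuild_every` epochs; n_pos is scaled
    down when fewer positives are available, preserving the
    negative-to-positive ratio."""
    n_pos = min(n_pos, len(positives))
    subset = None
    for epoch in range(total_epochs):
        if epoch % rebuild_every == 0:   # draw a fresh random subset
            subset = (random.sample(positives, n_pos)
                      + random.sample(negatives, min(ratio * n_pos, len(negatives))))
            random.shuffle(subset)
        yield epoch, subset
```

Each network would then be trained for one backpropagation epoch on every yielded subset.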
Even though bootstrapped backpropagation is faster than standard backpropagation, a large amount of time was required to train the MLP and FuNN networks. This meant that only nine networks were trained and evaluated for each phoneme: at the time the experiments were carried out, performing more trials was not considered to be feasible, especially as neither MLP nor FuNN is the focus of this chapter. A summary of the results of these experiments with MLP and FuNN is presented in Table 8.6. Note that the approximate variance is used as the measure of variation in this table. The results for FuNN, over Set C, are superior to those reported in (Kilgour, 2003). The FuNN used by Kilgour had only eight neurons in the rule layer, and bootstrapped training was not used, which may explain the inferior results.
Network  Train |        Recalled with Set A         |        Recalled with Set B         |        Recalled with Set C
         set   | True Neg.   True Pos.   Overall    | True Neg.   True Pos.   Overall    | True Neg.   True Pos.   Overall
MLP      A     | 96.4 / 0.1  95.8 / 0.2  96.3 / 0.1 | 95.8 / 0.2  81.4 / 0.3  95.5 / 0.1 | 94.5 / 0.2  45.5 / 0.4  93.4 / 0.2
MLP      B     | 95.8 / 0.3  62.3 / 0.6  95.0 / 0.3 | 95.8 / 0.3  61.1 / 0.5  95.0 / 0.3 | 94.6 / 0.3  36.5 / 0.6  93.3 / 0.3
MLP      C     | 25.6 / 0.7  90.1 / 0.6  27.0 / 0.7 | 25.6 / 0.7  90.4 / 0.6  27.0 / 0.7 | 23.9 / 0.7  90.0 / 0.6  25.4 / 0.7
FuNN     A     | 93.3 / 0.5  36.6 / 0.7  92.1 / 0.5 | 93.2 / 0.5  34.7 / 0.7  92.0 / 0.5 | 91.8 / 0.5  29.6 / 0.7  90.5 / 0.5
FuNN     B     | 80.1 / 0.8  20.1 / 0.8  79.0 / 0.8 | 80.1 / 0.8  20.1 / 0.8  79.0 / 0.8 | 80.0 / 0.8  20.0 / 0.8  78.9 / 0.8
FuNN     C     | 79.3 / 0.9  20.9 / 0.9  78.2 / 0.8 | 79.3 / 0.8  20.9 / 0.8  78.2 / 0.8 | 79.3 / 0.8  20.9 / 0.8  78.1 / 0.8
Table 8.6: Mean accuracies / approximate variance of MLP and FuNN for the phoneme recognition case study.
The complete results for MLP are presented in Table F.1. The complete FuNN results are in Table F.2. A comparison of the overall performance of MLP and FuNN was performed. This comparison was of a similar format to that done in Section 2.7. The hypotheses used to perform this comparison are listed in Table 8.7. Each hypothesis was tested using a two-tailed pooled-variance t-test. In this table, the first superscript, a, b or c, indicates which data set the network was trained on. The second superscript indicates which data set the network was recalled with, again a, b or c. The subscript denotes whether the network is an MLP (m) or a FuNN (fn). The results of these statistical tests are presented in Table E.1. These results, and the results in Table 8.6, clearly show that there is a highly significant difference between the overall performance of MLP and FuNN, with the MLP out-performing FuNN across the majority of phonemes. Again following Section 2.7, the forgetting of the networks after further training was assessed. This was done by testing the statistical hypotheses listed in Table 8.8. In this table, the superscript denotes, firstly, the data set the network had been trained on and, secondly, the data set over which it was recalled. A two-tailed, paired-value
t-test was used to test each hypothesis. The results of these tests are presented in Table E.2, for the MLP, and Table E.3 for the FuNN. These results show that the MLP suffered from severe levels of forgetting, after further training on the additional data sets. The FuNN also forgot after further training on Set B, but not after further training on Set C. By then, however, the damage was already done, as the FuNN was not able to classify the examples very well at all.
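The pooled-variance t-statistic underlying these two-tailed tests can be sketched as the standard two-sample computation (this is the textbook formula, not code from the thesis):

```python
import math

def pooled_t(xs, ys):
    """Two-sample t-statistic with pooled variance: the difference of
    sample means divided by the pooled standard error. The statistic is
    compared against the t distribution with len(xs) + len(ys) - 2
    degrees of freedom."""
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    pooled_var = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
    return (mx - my) / math.sqrt(pooled_var * (1 / nx + 1 / ny))
```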
Hypothesis  H0                     H1
AA          µ_m^{aa} = µ_fn^{aa}   µ_m^{aa} ≠ µ_fn^{aa}
AB          µ_m^{ab} = µ_fn^{ab}   µ_m^{ab} ≠ µ_fn^{ab}
AC          µ_m^{ac} = µ_fn^{ac}   µ_m^{ac} ≠ µ_fn^{ac}
BA          µ_m^{ba} = µ_fn^{ba}   µ_m^{ba} ≠ µ_fn^{ba}
BB          µ_m^{bb} = µ_fn^{bb}   µ_m^{bb} ≠ µ_fn^{bb}
BC          µ_m^{bc} = µ_fn^{bc}   µ_m^{bc} ≠ µ_fn^{bc}
CA          µ_m^{ca} = µ_fn^{ca}   µ_m^{ca} ≠ µ_fn^{ca}
CB          µ_m^{cb} = µ_fn^{cb}   µ_m^{cb} ≠ µ_fn^{cb}
CC          µ_m^{cc} = µ_fn^{cc}   µ_m^{cc} ≠ µ_fn^{cc}
Table 8.7: Statistical hypotheses for comparing MLP and FuNN.

Hypothesis  H0                H1
δBA         µ^{aa} = µ^{ba}   µ^{aa} ≠ µ^{ba}
δBB         µ^{ab} = µ^{bb}   µ^{ab} ≠ µ^{bb}
δBC         µ^{ac} = µ^{bc}   µ^{ac} ≠ µ^{bc}
δCA         µ^{ba} = µ^{ca}   µ^{ba} ≠ µ^{ca}
δCB         µ^{bb} = µ^{cb}   µ^{bb} ≠ µ^{cb}
δCC         µ^{bc} = µ^{cc}   µ^{bc} ≠ µ^{cc}
Table 8.8: Statistical hypotheses for evaluating changes in accuracy after further training for the phoneme case study.
Overall, the backpropagation trained networks performed well over the initial training sets, but suffered badly from forgetting after additional training. This is as was seen with the benchmark problems in Section 2.7.
8.7 Results with EFuNN and SECoS Algorithms ECoS networks, in this case EFuNN and SECoS, were sequentially trained over each of the data sets. At the conclusion of each training session, each network was tested on all three data sets. Since ECoS training is deterministic, only a single network was created and trained for each phoneme. The training parameters were as in Table 8.9. Again, these are the same as those used for the experiments with the benchmark data sets. Also, no consideration was given to the unbalanced nature of the training set, that is, no attempt was made to balance the number of positive and negative examples used in the training data sets. This is because, in a real-life situation, ECoS networks would have to accept new training data as it became available. It would not, therefore, be possible to balance the training data used. The EFuNN used here had three MF attached to each input, and two attached to the output. This is the same structure as was used for the FuNN in Section 8.6. A summary of the results for both EFuNN and SECoS is presented in Table 8.10. The standard deviation is used as the measure of variation in these results, and for the results in the following sections. The complete results for EFuNN are presented in Table F.3. The complete results for SECoS are presented in Table F.4. The overall accuracies and sizes of the EFuNN and SECoS networks were compared using the hypotheses listed in Table 8.11. In this table, a subscript of s indicates SECoS, while a subscript of e indicates EFuNN. The
Error threshold        0.1
Sensitivity threshold  0.5
Learning rate one      0.5
Learning rate two      0.5
Table 8.9: ECoS training parameters for phoneme recognition problem.
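The sequential train-then-recall protocol used for the ECoS experiments (train on Set A, then B, then C, recalling over all three sets after each session) can be sketched generically; the function and its callbacks are illustrative, not the experimental harness itself:

```python
def sequential_evaluation(net, train, recall, data_sets):
    """Train `net` on each data set in order, recalling it over all
    data sets after each training session.
    Returns {training set name: {recall set name: accuracy}}."""
    results = {}
    for train_name, train_data in data_sets:
        train(net, train_data)                       # further training only on new data
        results[train_name] = {name: recall(net, data)
                               for name, data in data_sets}
    return results
```

The nested recall results are what populate tables such as Table 8.10, and the differences between successive rows measure adaptation and forgetting.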
Network  Train |         Recalled with Set A          |         Recalled with Set B          |         Recalled with Set C          | Neurons
         set   | True Neg.   True Pos.    Overall     | True Neg.   True Pos.    Overall     | True Neg.   True Pos.    Overall     |
EFuNN    A     | 96.9 / 2.6  32.7 / 24.9  95.3 / 2.2  | 96.8 / 2.7  29.5 / 23.6  95.1 / 2.2  | 96.9 / 2.7  21.4 / 24.1  95.1 / 2.6  | 107.7 / 43.3
EFuNN    B     | 90.0 / 8.4  69.3 / 13.8  89.5 / 8.1  | 90.4 / 8.3  84.8 / 9.9   90.2 / 7.9  | 88.5 / 8.5  39.3 / 27.2  87.2 / 8.3  | 233.0 / 61.5
EFuNN    C     | 93.9 / 5.0  61.6 / 12.3  93.3 / 5.1  | 94.0 / 5.2  78.6 / 12.4  93.6 / 5.2  | 95.6 / 4.5  70.7 / 20.0  95.0 / 4.7  | 319.8 / 101.5
SECoS    A     | 97.7 / 0.8  98.3 / 2.0   97.7 / 0.8  | 96.7 / 1.5  74.2 / 13.0  96.2 / 1.9  | 94.7 / 3.3  34.0 / 24.9  93.3 / 4.1  | 538.4 / 328.3
SECoS    B     | 97.8 / 1.0  97.8 / 2.6   97.8 / 1.0  | 98.8 / 0.6  99.6 / 0.7   98.8 / 0.6  | 95.0 / 3.4  34.4 / 25.4  93.6 / 4.2  | 726.6 / 431.9
SECoS    C     | 94.3 / 4.4  97.8 / 2.6   94.3 / 4.4  | 94.9 / 4.0  99.4 / 1.0   95.0 / 4.0  | 93.1 / 5.5  90.7 / 9.8   92.9 / 5.5  | 880.4 / 476.8
Table 8.10: Mean percentage / standard deviation of true positive, true negative and overall accuracies of EFuNN and SECoS for the phoneme recognition case study.
superscript n indicates the number of neurons in the evolving layer of the network. Two-tailed t-tests were used to test each hypothesis. The results of these tests are presented in Table E.4. Inspection of these results, and the results in Table 8.10, indicates that there were significant differences between the performance of EFuNN and SECoS, with SECoS being far more accurate than EFuNN. EFuNN was, however, significantly smaller than SECoS, which is contrary to the results from Section 4.13.

Hypothesis  H0                    H1
AA          µ_s^{aa} = µ_e^{aa}   µ_s^{aa} ≠ µ_e^{aa}
AB          µ_s^{ab} = µ_e^{ab}   µ_s^{ab} ≠ µ_e^{ab}
AC          µ_s^{ac} = µ_e^{ac}   µ_s^{ac} ≠ µ_e^{ac}
AN          µ_s^{an} = µ_e^{an}   µ_s^{an} ≠ µ_e^{an}
BA          µ_s^{ba} = µ_e^{ba}   µ_s^{ba} ≠ µ_e^{ba}
BB          µ_s^{bb} = µ_e^{bb}   µ_s^{bb} ≠ µ_e^{bb}
BC          µ_s^{bc} = µ_e^{bc}   µ_s^{bc} ≠ µ_e^{bc}
BN          µ_s^{bn} = µ_e^{bn}   µ_s^{bn} ≠ µ_e^{bn}
CA          µ_s^{ca} = µ_e^{ca}   µ_s^{ca} ≠ µ_e^{ca}
CB          µ_s^{cb} = µ_e^{cb}   µ_s^{cb} ≠ µ_e^{cb}
CC          µ_s^{cc} = µ_e^{cc}   µ_s^{cc} ≠ µ_e^{cc}
CN          µ_s^{cn} = µ_e^{cn}   µ_s^{cn} ≠ µ_e^{cn}
Table 8.11: Statistical hypotheses for comparing SECoS and EFuNN for the phoneme recognition case study.

Following Section 4.13, the performance of SECoS was compared to the performance of MLP from Section 8.6. The statistical hypotheses used to perform this comparison are presented in Table 8.12, where the subscripts m and s denote MLP and SECoS respectively.

Hypothesis  H0                    H1
AA          µ_m^{aa} = µ_s^{aa}   µ_m^{aa} ≠ µ_s^{aa}
AB          µ_m^{ab} = µ_s^{ab}   µ_m^{ab} ≠ µ_s^{ab}
AC          µ_m^{ac} = µ_s^{ac}   µ_m^{ac} ≠ µ_s^{ac}
BA          µ_m^{ba} = µ_s^{ba}   µ_m^{ba} ≠ µ_s^{ba}
BB          µ_m^{bb} = µ_s^{bb}   µ_m^{bb} ≠ µ_s^{bb}
BC          µ_m^{bc} = µ_s^{bc}   µ_m^{bc} ≠ µ_s^{bc}
CA          µ_m^{ca} = µ_s^{ca}   µ_m^{ca} ≠ µ_s^{ca}
CB          µ_m^{cb} = µ_s^{cb}   µ_m^{cb} ≠ µ_s^{cb}
CC          µ_m^{cc} = µ_s^{cc}   µ_m^{cc} ≠ µ_s^{cc}
Table 8.12: Statistical hypotheses for comparing MLP and SECoS for the phoneme case study.

Two-tailed, pooled-variance t-tests were used to test each of these hypotheses. The results of these tests are presented in Table E.5. These results, and the results in Tables 8.6 and F.4, show that the performance of the two network types was not significantly different after training on the initial data Set A. This is despite the fact that the training set for the SECoS was unbalanced, that is, there were many more negative examples than positive examples in the data set. After additional training on Sets B and C, the performance of SECoS was significantly better than that of MLP.
A similar comparison was carried out between FuNN and EFuNN. The statistical hypotheses used to perform this comparison are presented in Table 8.13, where the subscripts fn and e denote FuNN and EFuNN respectively.

Hypothesis  H0                     H1
AA          µ_fn^{aa} = µ_e^{aa}   µ_fn^{aa} ≠ µ_e^{aa}
AB          µ_fn^{ab} = µ_e^{ab}   µ_fn^{ab} ≠ µ_e^{ab}
AC          µ_fn^{ac} = µ_e^{ac}   µ_fn^{ac} ≠ µ_e^{ac}
BA          µ_fn^{ba} = µ_e^{ba}   µ_fn^{ba} ≠ µ_e^{ba}
BB          µ_fn^{bb} = µ_e^{bb}   µ_fn^{bb} ≠ µ_e^{bb}
BC          µ_fn^{bc} = µ_e^{bc}   µ_fn^{bc} ≠ µ_e^{bc}
CA          µ_fn^{ca} = µ_e^{ca}   µ_fn^{ca} ≠ µ_e^{ca}
CB          µ_fn^{cb} = µ_e^{cb}   µ_fn^{cb} ≠ µ_e^{cb}
CC          µ_fn^{cc} = µ_e^{cc}   µ_fn^{cc} ≠ µ_e^{cc}
Table 8.13: Statistical hypotheses for comparing FuNN and EFuNN for the phoneme case study.

Again, two-tailed, pooled-variance t-tests were used to test these hypotheses. The results of these tests are presented in Table E.6. Inspection of these results, and the results in Tables 8.6 and F.4, shows similar results to the comparison of MLP and SECoS. That is, FuNN and EFuNN both learn Set A well, such that there is no significant difference between the two network types. After additional training, however, the FuNN forgot significantly, while the EFuNN did not. The degree of forgetting exhibited by both EFuNN and SECoS was evaluated by testing the hypotheses listed in Table 8.8. Two-tailed, paired-value t-tests were again used. The results of these tests for EFuNN are presented in Table E.7. Inspection of these results, and the results in Tables F.3 and 8.10, shows that further training over Set B caused a significant increase in the accuracies over Sets A and B. Additional training on Set C caused some forgetting over Sets A and B that was significant at the 95% level of confidence, but insignificant at the 99% level of confidence. The accuracy over Set C, however, improved significantly. The results for SECoS are presented in Table E.8. Inspection of these results, and the results in Tables 8.10 and F.4, shows that after additional training over Set B, there was some forgetting of Set A, but the degree of forgetting was significant only at the 95% level of confidence. The SECoS adapted well to Set B, with the accuracy improving significantly. After further training on Set C, there was some forgetting over both Sets A and B, as well as a significant improvement in the accuracy over Set C. Finally, the change in accuracy of EFuNN after further training was compared to the change in accuracy of SECoS.
This comparison was carried out by testing the statistical hypotheses presented in Table 8.14. Two-tailed, unpaired t-tests were used to test these hypotheses. The results of these tests are presented in Table E.9. Inspection of these results, and the results in Table 8.10, shows that EFuNN forgot Set A to a greater extent than SECoS, after additional training on Set B. There was no significant difference in the changes in accuracy over Set C. After further training on Set C, SECoS adapted to the new data to a greater extent than did EFuNN. Overall, both EFuNN and SECoS were able to learn the phoneme data well, and were both able to adapt to new
CHAPTER 8. CASE STUDY: THE ISOLATED PHONEME RECOGNITION PROBLEM
Hypothesis
H0 H1 Hypothesis
H0 H1
202
ÆBA ÆBB ÆBC Æbb Æb Æba = Æba Æbb Æb s e s = e s = e Æbb Æb Æba 6= Æba Æbb Æb s e s 6= e s 6= e ÆCA ÆCB ÆCC Æ b Æ
Æ a = Æ a Æ b Æ
s e s = e s = e Æ b Æ
Æ a 6= Æ a Æ b Æ
s e s 6= e s 6= e
Table 8.14: Statistical hypotheses for comparing changes in accuracy of SECoS and EFuNN for the phoneme recognition case study. data well. In terms of the adaptation and resistance to forgetting, these results strongly mirror the results over the benchmark data, as discussed in Section 4.13. The major difference here is that the mean size of the EFuNN was less than the mean size of the SECoS. However, the simpler structure of SECoS means that the amount of storage required was, on average, similar for both types of network.
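The pooled-variance (unpaired) and paired t statistics used throughout these comparisons can be sketched in a few lines. The thesis does not specify an implementation, and the per-phoneme accuracy figures below are illustrative, not the experimental results:

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Paired t statistic: differences are taken observation-by-observation."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))

def pooled_t(x, y):
    """Pooled-variance (unpaired) t statistic for two independent samples."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))

# Illustrative per-phoneme accuracies for two network types, evaluated on
# the same ten phonemes.
a = [96.1, 94.8, 97.2, 93.5, 95.0, 96.4, 92.9, 94.1, 95.7, 96.8]
b = [95.2, 93.9, 96.8, 92.1, 94.6, 95.9, 91.8, 93.0, 94.9, 96.1]

# For a two-tailed test at the 95% level of confidence, reject H0 when |t|
# exceeds the critical value: 2.262 for 9 degrees of freedom (paired),
# 2.101 for 18 degrees of freedom (pooled).
print(f"paired t = {paired_t(a, b):.2f}")
print(f"pooled t = {pooled_t(a, b):.2f}")
```

Because both networks are evaluated on the same phonemes, the paired test removes the per-phoneme variation and can detect a consistent difference that the pooled test misses; this is why the paired form is used when comparing two results on identical data, and the pooled form when samples are independent.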
8.8 Results of Fuzzy Rules Extracted from ECoS The general experimental procedure was similar to that used in Section 6.11. At the completion of training over each set of data, fuzzy rules were extracted from the ECoS network and tested over all three data sets. In the case of the Zadeh-Mamdani rules, the rules were then used to create a new network, which was also tested over all three data sets. The networks created via rule insertion were not additionally trained. The MF used to extract rules from SECoS were the same as the MF embedded in the EFuNN used. That is, there were three triangular MF for each input, and the centres of those MF were the same as the centres of the MF embedded in EFuNN.
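The three triangular MF per input can be sketched as follows; the centre positions here are illustrative placeholders, since in the experiments the centres were taken from the MF embedded in the EFuNN:

```python
def tri_mf(x, left, centre, right):
    """Degree of membership of x in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= centre:
        return (x - left) / (centre - left)
    return (right - x) / (right - centre)

# Three MF over a [0, 1]-normalised input, centred at 0, 0.5 and 1
# (illustrative centres; the outer sets act as shoulders at the edges).
def fuzzify(x):
    low = tri_mf(x, -0.5, 0.0, 0.5)
    mid = tri_mf(x, 0.0, 0.5, 1.0)
    high = tri_mf(x, 0.5, 1.0, 1.5)
    return low, mid, high

print(fuzzify(0.25))  # (0.5, 0.5, 0.0)
```

Each input value is thus described by three membership degrees, and a rule antecedent records which fuzzy set dominates for each input.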
8.8.1 Fuzzy Rule Extraction A summary of the results for the fuzzy rules extracted from EFuNN and SECoS is presented in Table 8.15. The complete results of the Zadeh-Mamdani rules extracted from EFuNN are presented in Table F.5. The complete results of the Zadeh-Mamdani rules extracted from SECoS are presented in Table F.7. The full results for the Takagi-Sugeno rules extracted from SECoS are in Table F.9. Inspection of the results in Tables F.5, F.7 and F.9 makes further statistical analysis unnecessary: for all three types of rule, the rules either classify all examples as negative, or all examples as positive. That is, they score one hundred percent on the negative data and zero percent on the positive data, or vice versa. Clearly, the fuzzy rules extracted from either EFuNN or SECoS are not suitable for use as classifiers in this problem domain. This is disappointing, as the rules extracted in Section 6.11 did display a useful level of performance.
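The degenerate behaviour described above is easy to reproduce with a small synthetic example: a rule set that labels every example negative scores perfectly on the negative examples, scores zero on the positive ones, and still reports a misleadingly high overall accuracy when the classes are imbalanced (the counts below are made up for illustration, not taken from the phoneme data):

```python
def rates(true, pred):
    """True negative, true positive and overall accuracy, as percentages."""
    tn = sum(1 for t, p in zip(true, pred) if t == 0 and p == 0)
    tp = sum(1 for t, p in zip(true, pred) if t == 1 and p == 1)
    correct = sum(1 for t, p in zip(true, pred) if t == p)
    return (100.0 * tn / true.count(0),
            100.0 * tp / true.count(1),
            100.0 * correct / len(true))

# 95 negative and 5 positive examples: the kind of class imbalance seen in
# a one-network-per-phoneme classifier.
labels = [0] * 95 + [1] * 5

# A degenerate rule set that outputs "negative" for every example.
all_negative = [0] * 100
print(rates(labels, all_negative))  # (100.0, 0.0, 95.0)
```

This is why the tables report true positive and true negative accuracies separately: the overall accuracy alone would hide the failure.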
[Table 8.15 body omitted: flattened columns of mean percentage / standard deviation figures. For each rule set (EFuNN-ZM, SECoS-ZM and SECoS-TS) and each training set (A, B, C), the table lists true negative, true positive and overall accuracies when recalled with Sets A, B and C, together with the mean number of rules.]
Table 8.15: Mean percentage / standard deviation of true positive, true negative and overall accuracies of fuzzy rule extraction, for the phoneme recognition case study.
[Table 8.16 body omitted: flattened columns of mean percentage / standard deviation figures. For each network created via rule insertion (EFuNN-ZM-in and SECoS-ZM-in) and each training set (A, B, C), the table lists true negative, true positive and overall accuracies when recalled with Sets A, B and C, together with the mean number of neurons.]
Table 8.16: Mean percentage / standard deviation of true positive, true negative and overall accuracies of fuzzy rule insertion, for the phoneme recognition case study.
8.8.2 Fuzzy Rule Insertion Despite the poor performance of the extracted rules, the rules may still be useful for explanation purposes. If this is the case, then some use may be gained from the rule insertion algorithms developed. To this end, the extracted Zadeh-Mamdani rules were used to create new networks, which were then tested. The summary results of the networks created via insertion of Zadeh-Mamdani fuzzy rules are presented in Table 8.16. The complete results of the EFuNN created via the insertion of rules are presented in Table F.6. The full results of the SECoS created via the insertion of rules are presented in Table F.8. These results show that the rules extracted from SECoS were not of sufficient quality to create useful networks. Therefore, the networks created by insertion of Zadeh-Mamdani rules extracted from SECoS will not be further considered. The first set of statistical tests done on these results was to compare the performance of the networks created via rule insertion to the performance of the rules they were created from. The statistical hypotheses used to perform this comparison are presented in Table 8.17, where a subscript of r indicates the rules, and a subscript of i indicates the network created via insertion of those rules. Two-tailed, paired-sample t-tests were used to evaluate each of these hypotheses. The results of testing these hypotheses for EFuNN are presented in Table E.10. These results confirm the results shown in Tables F.5 and F.6: the EFuNNs that resulted from the insertion of fuzzy rules were significantly more accurate than the rules they were created from.
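The insertion algorithms themselves are defined in Chapter 6. The sketch below shows only the general idea assumed here: each Zadeh-Mamdani rule contributes one evolving-layer neuron whose incoming weights encode the rule antecedent and whose outgoing weights encode the consequent, with one-winner recall. The function and variable names are hypothetical:

```python
import numpy as np

def insert_rules(rules, n_inputs, n_outputs):
    """Build (W_in, W_out) weight matrices from a list of rules, where each
    rule is (antecedent_vector, consequent_vector). One evolving-layer
    neuron is created per rule (a sketch, not the thesis algorithm)."""
    w_in = np.zeros((len(rules), n_inputs))
    w_out = np.zeros((len(rules), n_outputs))
    for i, (antecedent, consequent) in enumerate(rules):
        w_in[i] = antecedent
        w_out[i] = consequent
    return w_in, w_out

def recall(x, w_in, w_out):
    """One-winner recall: activate the neuron nearest the input and
    propagate its outgoing weights."""
    dist = np.linalg.norm(w_in - x, axis=1)
    return w_out[np.argmin(dist)]

# Two illustrative rules over a two-input, two-output problem.
rules = [(np.array([0.1, 0.9]), np.array([1.0, 0.0])),
         (np.array([0.8, 0.2]), np.array([0.0, 1.0]))]
w_in, w_out = insert_rules(rules, 2, 2)
print(recall(np.array([0.15, 0.85]), w_in, w_out))  # [1. 0.]
```

Under this view, the quality of an inserted network is bounded by how faithfully the antecedent vectors cover the regions of input space the original neurons occupied.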
[Table 8.17 body: for each combination of training set (A, B, C) and recall set (A, B, C), the null hypothesis states that the accuracy of the extracted rules (subscript r) equals the accuracy of the network created by inserting them (subscript i), and the alternative hypothesis states that they differ.]
Table 8.17: Statistical hypotheses for comparing Zadeh-Mamdani rules with the networks created via insertion of those rules, for the phoneme recognition case study.
[Table 8.18 body: for each combination of training set and recall set, the null hypothesis states that the accuracy of the network created via rule insertion (subscript i) equals the accuracy of the original network (subscript o), and the alternative hypothesis states that they differ.]
Table 8.18: Statistical hypotheses for comparing networks created via the insertion of Zadeh-Mamdani rules with the original networks, for the phoneme recognition case study. The final comparisons carried out for this section were comparisons of the performance of the networks created via rule insertion with the performance of the networks the rules were originally extracted from. This was done by evaluating the hypotheses listed in Table 8.18. In this table, a subscript of o indicates the original network.
Two-tailed, unpaired t-tests were used to test each hypothesis for both EFuNN and SECoS. The results of these tests for EFuNN are presented in Table E.11. These results, when considered with the results in Tables F.3 and F.6, show that the EFuNNs created by the rule insertion algorithm were not significantly less accurate than the original networks. Overall, the fuzzy rules extracted from both EFuNN and SECoS were not able to accurately classify examples in this problem. The rules extracted from EFuNN, however, could be used to recreate the networks from which they were originally extracted. This matches some of the results from Section 6.11, where the EFuNN created via insertion of Zadeh-Mamdani rules performed better than the SECoS created the same way.
Error threshold: 0.1
Sensitivity threshold: 0.5
Learning rate one: 0.5
Learning rate two: 0.5
Threshold in: 0.5
Threshold out: 0.5
Table 8.19: Online aggregation training parameters for the phoneme recognition problem.
8.9 Results of ECoS Optimisation Techniques A toolbox of ECoS optimisation methods was presented in Chapter 7. While each of these methods was effective over at least some of the benchmark data sets, the evolutionary methods have the penalty of being very time consuming, even for small data sets. For this reason, the evolutionary methods will not be investigated for this case study, although they may be in the future when more computing power becomes available. While the evolutionary methods ran very quickly over the iris classification and gas furnace data sets, much more time was required for the Mackey-Glass data set. This difference is because the iris classification and gas furnace benchmarks each have fewer than two hundred examples, while the Mackey-Glass data set used had one thousand. Since the first phoneme data set has 10175 examples, with seventy-eight input features, the evolutionary methods could be expected to run between ten and one hundred times more slowly: this would extend the time required to perform the experiments by an unacceptable amount.
8.9.1 Online Neuron Aggregation The online aggregation experiments were carried out in the same fashion as those in Section 7.5. That is, each network was trained, tested and further trained, as in Section 8.7, with the difference that online aggregation was used during training. The training parameters used for the online aggregation experiments are presented in Table 8.19. These are the same parameters that were used in the benchmark online aggregation experiments in Section 7.5. The summary results across all phonemes, for both EFuNN and SECoS, are presented in Table 8.20. The full results for the experiments with EFuNN are in Table F.10. The full results for the experiments with SECoS are presented in Table F.11. The performance and sizes of the networks trained using online aggregation were compared to the performance and sizes of the networks from Section 8.7. This was done using the hypotheses listed in Table 8.21. In this table, a subscript of a denotes a network that was trained using online aggregation. Two-tailed t-tests were used to test each hypothesis. The results of testing these hypotheses for EFuNN are presented in Table E.12. These results, and the results in Tables 8.10 and 8.20, show that after training on Set A, there were no significant differences in accuracy between the unoptimised and optimised EFuNN. The EFuNN that had been optimised with online aggregation, however, were significantly smaller. After further training, the accuracies over Sets A and B were significantly lower than those of the unoptimised networks, as was also the case after training on Set C. The results of these tests for SECoS are presented in Table E.13. Inspection of these results and the results in Tables 8.10 and 8.20 shows that, while the accuracies were not significantly changed, neither were the sizes of the networks. Inspection of the individual results in Tables F.4 and F.11 reveals that for many phonemes, online aggregation did not result in any reduction in network size. These results run counter to the expectations from the benchmark results, as online aggregation of SECoS was successful over two of the four benchmark data sets, and partially successful over a third. It appears that the phoneme data is as complex as the two spirals data set, for which online aggregation was unsuccessful.

[Table 8.20 body omitted: flattened columns of mean percentage / standard deviation accuracies and mean neuron counts for EFuNN and SECoS trained with online aggregation, for each training set (A, B, C) recalled with Sets A, B and C.]
Table 8.20: Mean percentage / standard deviation of true positive, true negative and overall accuracies for ECoS networks optimised via online aggregation training, for the phoneme recognition case study.

[Table 8.21 body: for each combination of training set, recall set and network size, the null hypothesis states that the measure for the unoptimised network equals that for the network trained with online aggregation (subscript a), and the alternative hypothesis states that they differ.]
Table 8.21: Statistical hypotheses for evaluating online aggregation, for the phoneme recognition case study.

Threshold in: 0.5
Threshold out: 0.5
Table 8.22: Offline aggregation parameters for the phoneme recognition problem.
8.9.2 Offline Aggregation The offline aggregation experiments for this problem were performed in a similar manner to the experiments performed with the benchmark data sets (Section 7.5). That is, at the completion of training over each of the data sets, the ECoS networks were aggregated and tested. The aggregated networks were then further trained. The aggregation parameters used are presented in Table 8.22. As before, they are the same parameters that were used in the benchmark experiments in Section 7.5. For the sake of brevity, only the performance of the aggregated networks is presented here. The summary of the performance of the aggregated networks for all phonemes is presented in Table 8.23. The complete result set for the aggregated EFuNN is in Table F.12. The complete result set for the aggregated SECoS is in Table F.13. The performance of the aggregated networks was compared to the performance of the original, unaggregated networks. This was done by testing the hypotheses listed in Table 8.24. In this table, a subscript of a denotes the aggregated network. Two-tailed, paired-sample t-tests were used to test each hypothesis. The results of testing these hypotheses for EFuNN are presented in Table E.14. In all cases, the offline aggregation algorithm significantly reduced the size of the network, but also significantly reduced the accuracies of the networks. The results of testing these hypotheses for SECoS are presented in Table E.15. Again, the networks were significantly reduced in size, but also significantly degraded in performance. This is consistent with the results of the benchmark experiments, where a reduction in network size was often accompanied by reductions in accuracy.
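A minimal sketch of offline aggregation in the spirit of Chapter 7, assuming neurons are merged greedily when both their incoming and outgoing weight vectors fall within the Threshold in and Threshold out distances of Table 8.22, with the merged weights averaged; the thesis algorithm may differ in detail:

```python
import numpy as np

def offline_aggregate(w_in, w_out, thr_in, thr_out):
    """Greedily merge evolving-layer neurons whose incoming AND outgoing
    weight vectors are both within the given distance thresholds."""
    merged_in, merged_out = [], []
    for win, wout in zip(w_in, w_out):
        for i, (cin, cout) in enumerate(zip(merged_in, merged_out)):
            if (np.linalg.norm(win - cin) < thr_in
                    and np.linalg.norm(wout - cout) < thr_out):
                # Merge into the existing neuron by averaging both weight sets.
                merged_in[i] = (cin + win) / 2
                merged_out[i] = (cout + wout) / 2
                break
        else:
            merged_in.append(win.copy())
            merged_out.append(wout.copy())
    return np.array(merged_in), np.array(merged_out)

# Three neurons; the first two represent nearly the same region and output.
w_in = np.array([[0.10, 0.90], [0.12, 0.88], [0.80, 0.20]])
w_out = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
a_in, a_out = offline_aggregate(w_in, w_out, thr_in=0.5, thr_out=0.5)
print(len(a_in))  # 2
```

Averaging the weights of merged neurons is what trades size for accuracy: the merged neuron sits between the exemplars it replaces, so examples near either original position are matched less precisely.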
8.9.3 Sleep Learning The experimental setup for the sleep learning experiments was the same as for the offline aggregation experiments. That is, at the conclusion of training over each of the three data sets, the SECoS network was sleep trained and the sleep trained network was tested. The sleep trained network was then further trained. Again, only the performance of the sleep trained networks is presented here. The sleep training parameters used are presented in Table 8.25. As before, these are the same parameters as were used in the benchmark experiments in Section 7.5. The summary of the performance of the sleep optimised networks is presented in Table 8.26. The complete results are presented in Table F.14. The performance of the sleep optimised SECoS was compared to the performance and size of the original, unoptimised networks. This was done by testing the statistical hypotheses listed in Table 8.27. In this table, a subscript of st indicates a network optimised by sleep training. Two-tailed, paired-sample t-tests were used to test each hypothesis. The results of testing the hypotheses in Table 8.27 are presented in Table E.16. These results, and the results in Tables 8.10 and 8.26, show that while the networks were consistently reduced in size, they were also consistently degraded in accuracy. Finally, the performance of the sleep optimised SECoS networks was compared to the performance of the SECoS optimised via offline aggregation. This comparison was carried out by testing the statistical hypotheses listed in Table 8.28. A subscript of a indicates a network optimised via offline aggregation. Two-tailed, unpaired-sample t-tests were used to test each hypothesis. The results of testing these hypotheses are presented in Table E.17. Inspection of these results, and the results in Tables 8.23 and 8.26, shows that both algorithms reduced the size of the SECoS networks by the same amount.

[Table 8.23 body omitted: flattened columns of mean percentage / standard deviation accuracies and mean neuron counts for EFuNN and SECoS after offline aggregation, for each training set (A, B, C) recalled with Sets A, B and C.]
Table 8.23: Mean percentage / standard deviation of true positive, true negative and overall accuracies of ECoS networks optimised via offline aggregation, for the phoneme recognition case study.

[Table 8.24 body: for each combination of training set, recall set and network size, the null hypothesis states that the measure for the unaggregated network equals that for the aggregated network (subscript a), and the alternative hypothesis states that they differ.]
Table 8.24: Statistical hypotheses for evaluating offline aggregation, for the phoneme recognition case study.

Error threshold: 0.1
Sensitivity threshold: 0.5
Learning rate one: 0.5
Learning rate two: 0.5
Table 8.25: Sleep learning parameters used for the phoneme recognition problem.

[Table 8.26 body omitted: flattened columns of mean percentage / standard deviation accuracies and mean neuron counts for the sleep trained SECoS, for each training set (A, B, C) recalled with Sets A, B and C.]
Table 8.26: Mean percentage / standard deviation of true positive, true negative and overall accuracies of SECoS optimised by sleep learning, for the phoneme recognition case study.
[Table 8.27 body: for each combination of training set and recall set, and for network size, the null hypothesis states that the measure for the unoptimised SECoS (subscript s) equals that for the sleep trained network (subscript st), and the alternative hypothesis states that they differ.]
Table 8.27: Statistical hypotheses for evaluating sleep training, for the phoneme recognition case study.
[Table 8.28 body: for each combination of training set and recall set, and for network size, the null hypothesis states that the measure for the network optimised via offline aggregation (subscript a) equals that for the sleep trained network (subscript st), and the alternative hypothesis states that they differ.]
Table 8.28: Statistical hypotheses for comparing offline aggregation and sleep training, for the phoneme recognition case study.
The offline aggregated networks had a higher true negative accuracy, and hence a higher overall accuracy, but the sleep trained networks had a better true positive rate. It is interesting to note that there was no significant difference in accuracy over Set C after training on Sets A and B. Sleep learning was partially successful across the benchmark data sets, so these results are not inconsistent.
8.10 Conclusions The results of these experiments are summarised in Table 8.29. Overall, the results are comparable to the results over the four benchmark data sets (Sections 2.7, 4.13, 6.11 and 7.5). The MLP and FuNN trained via backpropagation were able to learn the initial data set, but were unable to adapt to new data sets. That is, both MLP and FuNN experienced significant levels of forgetting after additional training on new data sets. The ECoS networks, EFuNN and SECoS, were able to learn the initial training set well, and also to adapt to new data sets without significantly forgetting the previous data. The SECoS networks were typically more accurate than the EFuNN networks. In a break from the benchmark results, the EFuNN networks were significantly smaller, in terms of the number of evolving layer neurons, than the SECoS networks. The fuzzy rules extracted from the ECoS networks were unable to meaningfully classify any of the data sets. In all cases, the extracted rules were not able to approximate the operation of the original networks. The fuzzy rule extraction algorithms must therefore be regarded as failures for this problem. The fuzzy rule insertion algorithms gave mixed results: the EFuNN networks created via the insertion of fuzzy rules were significantly more accurate than the rules from which they were created, although they were less accurate than the original networks. The SECoS networks created via rule insertion performed as poorly as the rules. Three of the optimisation methods developed were applied to this problem: online aggregation, offline aggregation and sleep learning. Online aggregation was partially successful for EFuNN: it was able to reduce the size of the network and, after training on the initial data set, was able to produce networks that were as accurate as the unoptimised EFuNN. After further training, however, the performance of the EFuNN degraded significantly.
Online aggregation of SECoS was not successful, as the size of the SECoS networks was not significantly reduced. Offline aggregation succeeded in reducing the size of both the EFuNN and SECoS networks, but also significantly degraded their accuracy. Finally, sleep learning was applied to the SECoS networks. Again, the sizes of the SECoS networks were consistently reduced by sleep learning, but their accuracy was also significantly degraded. When compared to offline aggregation, the sleep trained networks were found to be more accurate. In conclusion, SECoS networks gave superior performance over the phoneme recognition problem. They were able to learn and adapt to new data well, as is necessary in a speech recognition system. Although they were larger than the equivalent EFuNNs, the simpler structure of SECoS meant that the two network types were typically of a similar size in terms of numbers of connections. SECoS also adapted better and forgot less than EFuNN. The optimisation methods tested resulted in a decrease in the size of each network, but at the cost of a decrease in accuracy. In a real-life application of these techniques, it would be necessary to decide how much of a decrease in accuracy is acceptable.

Algorithm        Evaluation
MLP              learned well but forgot badly
FuNN             learned well but forgot badly
EFuNN            learned well and adapted well
SECoS            learned and adapted better than EFuNN, but larger
EFuNN-ZM         could not perform to a useful level of accuracy
SECoS-ZM         failed to produce rules of usable accuracy
SECoS-TS         could not meaningfully classify examples
EFuNN-ZM-in      produced networks more accurate than the rules, but less accurate than the original EFuNN
SECoS-ZM-in      produced poorly performing networks
EFuNN-on-agg     reduced the size of the networks without initially reducing accuracy
SECoS-on-agg     did not significantly reduce the size of the networks
EFuNN-off-agg    reduced both size and accuracy of networks
SECoS-off-agg    reduced both size and accuracy of networks
SECoS-sleep      reduced both size and accuracy of networks
Table 8.29: Summary of results over the phoneme recognition case study.
8.11 Summary This chapter has applied the methods developed in this thesis to the case study problem, the recognition of isolated New Zealand English phonemes. Overall, the results of these experiments were consistent with the results of the experiments over the benchmark data sets. The points of difference were:
The SECoS networks were larger than the EFuNN networks.
The extracted rules did not have a useful level of performance.
Online aggregation of SECoS did not reduce the size of the networks.
None of these differences invalidates the hypotheses presented in Chapter 1.
Chapter 9
Conclusions and Future Work And my soul from out that shadow that lies floating on the floor Shall be lifted–nevermore! Edgar Allan Poe, The Raven
9.1 Introduction The overall theme of this thesis can be summarised as the characterisation, simplification, formalisation, explanation and optimisation of Evolving Connectionist System (ECoS) artificial neural networks (ANN). This theme is the motivation for the five hypotheses listed in Chapter 1. Characterisation means classifying ECoS in terms of how they compare to constructive ANN. That is, comparing ECoS with constructive algorithms described in the literature. This part of the theme is the motivation for Hypothesis One. Simplification refers to the reduction of the original ECoS network, EFuNN, to a simpler form. That is, developing an ECoS network that eliminates the fuzzy logic elements of EFuNN. This part of the theme is the motivation for Hypothesis Two. Formalisation means developing a testable theory of the internal workings of ECoS and the ways in which they behave during training, in relation to their training parameters. Experimental testing is also needed to verify the formalisation developed. This part of the theme is the motivation for Hypothesis Three. Explanation means methods of explaining what the ECoS networks have learned. Although a fuzzy rule extraction algorithm existed for EFuNN, no such algorithm existed for the SECoS developed herein. This part of the theme is the motivation for Hypothesis Four. Optimisation refers to methods that can be used to optimise ECoS according to certain criteria. The criteria are, in brief, reducing the size of an ECoS network while maintaining its performance over both previously seen and previously unseen data. This part of the theme is the motivation for Hypothesis Five. The research in this thesis has examined the themes above by investigating the five hypotheses described in Chapter 1. This chapter draws together the work presented in this thesis. There are three parts to the chapter: firstly, Section 9.2 presents a summary of the results of each algorithm over the benchmark data sets. 
Secondly, Sections 9.3 to 9.7 review the research in the context of the five research hypotheses stated in Chapter 1, and the criteria in Section 1.3. These sections also reiterate the major findings and contributions of the thesis. Finally, possible extensions to the work are described in Section 9.8.
9.2 Summary of Benchmark Experiment Results Table 9.1 presents a summary of the results of each algorithm investigated in this thesis over each of the benchmark data sets. The row labels are the same as those used in the previous sections of the thesis, as is the format of the results. The results presented are the results over the entire data set at the conclusion of all training, that is, after the networks have been further trained on data set B. Table 9.2 presents the location of the complete results for each of the algorithms for each of the data sets.
9.3 Hypothesis One: ECoS and Constructive Algorithms Hypothesis One was stated as: It is hypothesised that a comparison of ECoS with existing constructive algorithms will lead to a better understanding of the ECoS algorithms, and lead to methods of optimising ECoS networks. Hypothesis One is concerned with the differences and similarities between ECoS and other constructive neural network algorithms. The rationale for this is, firstly, to identify the place of ECoS in the spectrum of constructive algorithms and, secondly, to identify methods of optimising constructive ANN that can be applied to ECoS. Hypothesis One was addressed in two places. Firstly, a review of relevant constructive ANN algorithms was presented in Chapter 3. Secondly, these algorithms were compared point-by-point to ECoS in Section 4.11.
9.3.1 How similar is the ECoS algorithm to other constructive algorithms? A comparison (Section 4.11) of the ECoS algorithm with the constructive algorithms discussed in Chapter 3 was carried out. From this comparison, it was found that the algorithm most similar to ECoS is the Grow and Learn (GAL) network of Alpaydin (1994). The next most similar was the Resource Allocating Network (RAN) of Platt (1991b), although RAN is significantly more complex than ECoS. The other algorithms reviewed bore very little similarity to ECoS, and most were limited either to a single application area, or to learning a single data set.
9.3.2 What elements of existing constructive algorithms can be adapted to ECoS? The optimisation method of sleep learning, from GAL (Alpaydin, 1994), was adapted to ECoS. This was described and evaluated in Chapter 7, and found to be effective.
9.3.3 Support for Hypothesis One The criteria for supporting Hypothesis One were stated in Section 1.3 as follows: 1. Constructive neural network algorithms that are similar to ECoS are identified, and the ways in which they are similar and different are identified and described. 2. Optimisation methods that are applicable to ECoS are identified and described. Each of these criteria will now be discussed.
CHAPTER 9. CONCLUSIONS AND FUTURE WORK
Table 9.1: Summary of results over the benchmark data sets (accuracy and network size for each algorithm over the Two Spirals, Iris Classification, Mackey-Glass and Gas Furnace data sets; the individual results appear in the tables listed in Table 9.2).
Algorithm            Two Spirals         Iris Classification  Mackey-Glass        Gas Furnace
MLP                  Table 2.5, pg 36    Table 2.7, pg 37     Table 2.8, pg 38    Table 2.9, pg 39
FuNN                 Table 2.5, pg 36    Table 2.7, pg 37     Table 2.8, pg 38    Table 2.9, pg 39
SECoS                Table 4.10, pg 83   Table 4.11, pg 84    Table 4.12, pg 86   Table 4.13, pg 87
EFuNN                Table 4.10, pg 83   Table 4.11, pg 84    Table 4.12, pg 86   Table 4.13, pg 87
SECoS-ZM             Table 6.11, pg 143  Table 6.12, pg 145   Table 6.13, pg 148  Table 6.14, pg 150
SECoS-TS             Table 6.11, pg 143  Table 6.12, pg 145   Table 6.13, pg 148  Table 6.14, pg 150
SECoS-ZM in          Table 6.11, pg 143  Table 6.12, pg 145   Table 6.13, pg 148  Table 6.14, pg 150
EFuNN-ZM             Table 6.11, pg 143  Table 6.12, pg 145   Table 6.13, pg 148  Table 6.14, pg 150
EFuNN-ZM in          Table 6.11, pg 143  Table 6.12, pg 145   Table 6.13, pg 148  Table 6.14, pg 150
SECoS Online Agg.    Table 7.16, pg 172  Table 7.18, pg 176   Table 7.20, pg 179  Table 7.22, pg 182
SECoS GA Training    Table 7.16, pg 172  Table 7.18, pg 176   Table 7.20, pg 179  Table 7.22, pg 182
SECoS Offline Agg.   Table 7.16, pg 172  Table 7.18, pg 176   Table 7.20, pg 179  Table 7.22, pg 182
SECoS Sleep          Table 7.16, pg 172  Table 7.18, pg 176   Table 7.20, pg 179  Table 7.22, pg 182
SECoS GA Sleep       Table 7.16, pg 172  Table 7.18, pg 176   Table 7.20, pg 179  Table 7.22, pg 182
EFuNN Online Agg.    Table 7.17, pg 172  Table 7.19, pg 176   Table 7.21, pg 180  Table 7.23, pg 183
EFuNN GA Training    Table 7.17, pg 172  Table 7.19, pg 176   Table 7.21, pg 180  Table 7.23, pg 183
EFuNN Offline Agg.   Table 7.17, pg 172  Table 7.19, pg 176   Table 7.21, pg 180  Table 7.23, pg 183

Table 9.2: Locations of benchmark results
Identification of Similar Constructive Algorithms Two constructive algorithms were identified that are similar to ECoS. The ways in which they differ were discussed. Adaptation of Optimisation Algorithms An algorithm for removing redundant neurons from GAL networks was identified and adapted for use with ECoS networks. The efficacy of this algorithm was demonstrated in Subsection 7.4.2 and is discussed in Section 9.7.
9.3.4 Conclusions for Hypothesis One This hypothesis has been supported, in that the uniqueness of ECoS was established, and that a suitable technique for the optimisation of ECoS networks was identified.
9.4 Hypothesis Two: Simplified ECoS Hypothesis Two was stated in Section 1.2 as: It is hypothesised that a simplified version of EFuNN can be developed, that is competitive with EFuNN, yielding an ECoS network that lacks fuzzy logic elements. This simplified ECoS network will be easier to implement, and will therefore be more efficient in operation. Hypothesis Two is concerned with the development of a simplified derivative of EFuNN, the Simple Evolving Connectionist System, or SECoS. The motivation for this is that the fuzzy logic elements of EFuNN may not be useful for all applications: there is therefore a need for a simplified alternative. Hypothesis Two was addressed in Section 4.6. Experimental results comparing the performance of EFuNN and SECoS were presented in Section 4.13.
9.4.1 Is a simplified version of EFuNN competitive with the original EFuNN? The benchmark results presented in this thesis (Section 4.13) indicate that the simplified ECoS, SECoS, is indeed competitive with EFuNN. Across all benchmark data sets, the accuracies of the SECoS networks were competitive with the accuracies of the EFuNN networks, and the SECoS networks were consistently smaller than the EFuNN networks. Overall, then, the SECoS algorithm is superior to EFuNN for some problems, and competitive for the others.
9.4.2 Are the simplified ECoS as flexible as EFuNN? Flexibility is the ability of the algorithm to model problems of different types, that is, the ability to model both classification and function approximation. Section 4.13 shows that SECoS is able to learn both classification and function approximation data sets, and is able to do so with an accuracy and network size that is competitive with EFuNN.
9.4.3 Support for Hypothesis Two Five criteria were specified in Section 1.3 for the support of Hypothesis Two: 1. The simplified ECoS exhibits levels of memorisation of the training data similar to EFuNN. 2. The simplified ECoS exhibits similar or better levels of generalisation over previously unseen data than EFuNN. 3. The simplified ECoS is able to adapt to new training data, without forgetting previously seen examples, to a similar degree to EFuNN. 4. The simplified ECoS is of a similar or smaller size than EFuNN. 5. The simplified ECoS can be applied to the same kinds of problems as EFuNN, that is, it is as flexible as EFuNN. Each of these criteria will now be discussed. SECoS exhibits levels of memorisation of the training data similar to EFuNN The results over the benchmark data sets in Section 4.13 show that SECoS is able to memorise training data with equal or greater accuracy than EFuNN across all benchmark data sets and the case study data. This criterion has therefore been successfully met. SECoS exhibits levels of generalisation over previously unseen data similar to EFuNN The results in Section 4.13 show that there were few significant differences between the generalisation accuracies of SECoS and EFuNN. Where significant differences did exist, the generalisation accuracy of SECoS was either only very slightly inferior to, or significantly better than, that of EFuNN. SECoS is able to adapt to new training data, without forgetting previously seen examples, to a similar degree to EFuNN The degree of adaptation of SECoS to new data is greater than that of EFuNN: across all of the benchmark data sets, the accuracy of SECoS over the additional training data sets was greater than that of EFuNN. SECoS did, however, forget slightly more than EFuNN. This criterion has therefore been successfully met.
SECoS is of a similar or smaller size than EFuNN The SECoS networks experimented with were significantly smaller than the equivalent EFuNN networks across all benchmark data sets, despite the fact that the SECoS networks were often of equal or superior performance over those data sets. This criterion has therefore been met.
SECoS can be applied to the same kinds of problems as EFuNN The experimental results over the benchmark data sets show quite clearly that SECoS is able to learn, generalise and adapt to both classification and function approximation data, as EFuNN is able to. Thus, this criterion is successfully fulfilled.
9.4.4 Conclusions for Hypothesis Two Having shown that all of the criteria specified for the support of this hypothesis are successfully fulfilled, Hypothesis Two is considered to be supported.
9.5 Hypothesis Three: Formalisation of ECoS Hypothesis Three was stated in Section 1.2 as: It is hypothesised that a testable formalisation of ECoS and the ECoS training algorithm can be developed that will predict the behaviour of ECoS networks, in relation to the parameters used to train them. Hypothesis Three is concerned with the development of an experimentally testable formalisation that describes the behaviour of ECoS networks. The motivation for this is twofold. Firstly, it assists the neural network practitioner in the application of ECoS networks. Secondly, it makes it easier for other researchers to accept the ECoS model if a formalisation is available. Hypothesis Three is the subject of Chapter 5.
9.5.1 How can the internal state of an ECoS network be explained? The internal state of an ECoS network can be explained as a collection of Voronoi regions, where each region is defined by an evolving layer neuron and its incoming connections. Examples that fall within the Voronoi region of a neuron will cause that neuron to activate, where the degree of activation is determined by the position of the example within the region, relative to the neuron.
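This geometric interpretation can be illustrated with a short sketch, in which each evolving layer neuron is represented by its vector of incoming connection weights. The Euclidean distance measure and the normalisation used here are illustrative assumptions, not the exact ECoS activation function:

```python
import numpy as np

def activations(x, exemplars):
    """Distance-based activation of evolving layer neurons.

    Each row of `exemplars` is the incoming weight vector of one
    neuron; an example falling inside a neuron's Voronoi region is
    nearer to that neuron's exemplar than to any other, so that
    neuron activates most strongly.
    """
    d = np.linalg.norm(exemplars - x, axis=1)  # distance to each exemplar
    return 1.0 - d / (d.max() + 1e-12)         # nearer => higher activation

exemplars = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
x = np.array([0.1, 0.2])
a = activations(x, exemplars)
winner = int(np.argmax(a))  # index of the Voronoi region containing x
```

Here `x` lies in the Voronoi region of the first exemplar, so the first neuron wins; its activation falls off as `x` moves towards the boundary of the region.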
9.5.2 What effect does each training parameter have on the behaviour of ECoS? It was shown in Chapter 5 that the number of neurons added to an ECoS network during training will increase as the sensitivity threshold parameter increases, and decrease as the error threshold parameter increases. The number of neurons added is also affected by the learning rate two parameter. The effect of each of the parameters in the ECoS training algorithm is influenced by at least one of the other parameters: thus, optimisation of the parameters becomes a multi-parameter optimisation problem.
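The direction of these effects follows from the decision step of ECoS training, sketched below in simplified form. The function name and signature are illustrative, not taken from the thesis:

```python
def should_add_neuron(max_activation, output_error,
                      sensitivity_threshold, error_threshold):
    """Simplified sketch of the ECoS neuron-addition decision.

    A neuron is added when no existing neuron is activated strongly
    enough for the current example, or when the network's output
    error for that example is too large.  Raising the sensitivity
    threshold therefore causes more neurons to be added, while
    raising the error threshold causes fewer to be added.
    """
    if max_activation < sensitivity_threshold:
        return True   # example falls outside all existing regions
    if output_error > error_threshold:
        return True   # winning neuron represents the output poorly
    return False      # otherwise, adjust the winning neuron's weights

# A higher sensitivity threshold makes addition more likely:
assert should_add_neuron(0.6, 0.05, 0.7, 0.1)
assert not should_add_neuron(0.6, 0.05, 0.5, 0.1)
```

Learning rate two enters indirectly: it changes the output error observed on later examples, and hence how often the error threshold test triggers, which is one way the parameters interact.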
9.5.3 Support for Hypothesis Three Two criteria were specified in Section 1.3 for the support of Hypothesis Three:
1. A formalisation is created that is experimentally testable. 2. The experiments performed do not disprove the formalisation. Each of these criteria will now be discussed. A formalisation is created that is experimentally testable The formalisation developed includes several derivations, each of which results in predictions about the behaviour of an ECoS network in relation to the parameter values. Each of these predictions is experimentally testable. This criterion is thus fulfilled. The formalisation is not experimentally refuted Both EFuNN and SECoS networks were evaluated with a variety of parameters over each of the benchmark data sets and the case study data. In each case, the predictions made by the formalisation were supported. This criterion is thus fulfilled.
9.5.4 Conclusions for Hypothesis Three Each of the two criteria established for this hypothesis has been successfully met. Hypothesis Three is therefore considered to be supported.
9.6 Hypothesis Four: Fuzzy Rule Extraction Hypothesis Four was stated in Section 1.2 as: It is hypothesised that methods of extracting fuzzy rules from the simplified ECoS network can be developed that are competitive with the rules extracted from EFuNN. Hypothesis Four is concerned with the extraction of fuzzy rules from SECoS networks. This is motivated by the desire to explain what the SECoS has learned. Since it is possible to explain EFuNN networks by fuzzy rule extraction, for SECoS to be competitive the same capability is needed. Hypothesis Four was addressed in Chapter 6, especially Section 6.7.
9.6.1 How accurate are the rules extracted from the simplified networks? Experimental results show that for the classification benchmark problems the Zadeh-Mamdani fuzzy rules extracted from SECoS are able to recognise the classes at a level greater than chance. The results for the function approximation problems are somewhat poorer, with the rules for the gas furnace data set displaying a very high error. Conversely, the Takagi-Sugeno rules extracted from SECoS are able to perform somewhat better over the function approximation tasks. For the classification tasks, as could be expected, they perform rather worse.
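This division of labour follows from the consequent forms of the two rule types. In the standard formulations (the exact membership functions and consequent parameters used in the thesis may differ):

```latex
\begin{align*}
\text{Zadeh-Mamdani:} \quad & \text{IF } x_1 \text{ is } A_1 \text{ AND } x_2 \text{ is } A_2 \text{ THEN } y \text{ is } B \\
\text{Takagi-Sugeno:} \quad & \text{IF } x_1 \text{ is } A_1 \text{ AND } x_2 \text{ is } A_2 \text{ THEN } y = q_0 + q_1 x_1 + q_2 x_2
\end{align*}
```

A crisp linear consequent approximates a continuous output directly, which is consistent with the Takagi-Sugeno rules faring better on the function approximation tasks, while the symbolic consequent of a Zadeh-Mamdani rule maps more naturally onto class labels.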
9.6.2 How do the rules extracted from the non-fuzzy ECoS compare to rules extracted from EFuNN? The Zadeh-Mamdani rules extracted from SECoS were of comparable accuracy to the Zadeh-Mamdani rules extracted from EFuNN, for all benchmark data sets. Fewer rules were extracted from SECoS than from EFuNN, however, which increases the readability of the rule set. For the function approximation tasks, the Takagi-Sugeno rules extracted from SECoS performed better than the Zadeh-Mamdani rules extracted from EFuNN.
9.6.3 Support for Hypothesis Four A single criterion was specified in Section 1.3 for the support of Hypothesis Four: The research relating to Hypothesis Four will be considered to support the hypothesis if it results in algorithms that allow for the extraction of fuzzy rules from simplified ECoS networks, where the rules are competitive with the rules extracted from EFuNN. Competitive means that the accuracy of the extracted fuzzy rules is similar to or better than the accuracy of rules extracted from EFuNN. This criterion will now be discussed. Experimental results show that the Zadeh-Mamdani rules extracted from SECoS had an accuracy comparable to, although slightly lower than, that of the rules extracted from EFuNN. The Takagi-Sugeno rules were, as expected, less accurate across the classification problems, but more accurate across the function approximation problems than the fuzzy rules extracted from EFuNN. The ability to extract Takagi-Sugeno rules from SECoS gives SECoS an additional measure of flexibility over EFuNN. Hypothesis Four can thus be described as partially supported.
9.6.4 Conclusions for Hypothesis Four This hypothesis has been supported, in that algorithms for extracting fuzzy rules from SECoS have been proposed, and shown experimentally to be effective.
9.7 Hypothesis Five: ECoS Optimisation Hypothesis Five was stated in Section 1.2 as: It is hypothesised that methods of optimising ECoS can be developed, that will reduce the size of the network while maintaining its accuracy over both previously seen and unseen data. Hypothesis Five is concerned with the optimisation of ECoS networks. Although ECoS networks are able to model problems to a high degree of accuracy, their efficiency can suffer if a large number of neurons are added during training. The goal of optimisation is to reduce the number of neurons while maintaining accuracy. Hypothesis Five is investigated in Chapter 7.
9.7.1 At what stages of an ECoS network life-cycle can optimisation methods be applied? There are two stages in the life-cycle of an ECoS network: training and post-training (although post-training can, of course, be followed by further training). Optimisation methods can be applied to both of these. The methods used to optimise the training and to optimise the network post-training were successful, to different degrees.
9.7.2 How can evolutionary algorithms be applied to optimising ECoS networks? Evolutionary algorithms were applied to both stages of the ECoS life-cycle. The application of an EA in the optimisation of the training of ECoS was successful. The use of an EA to optimise sleep training was partially successful, with the EA proving too aggressive in the reduction of the size of the ECoS.
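For illustration, a minimal GA over the four ECoS training parameters (sensitivity threshold, error threshold and the two learning rates) might look like the following sketch. The representation, operators and toy fitness function are assumptions for exposition, not the GA actually evaluated in Chapter 7; in practice the fitness function would train an ECoS network with the candidate parameters and score its accuracy and size:

```python
import random

def evolve_parameters(fitness, generations=30, pop_size=20, seed=0):
    """Minimal elitist GA over four real-valued training parameters."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(4)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]         # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, 4)          # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(4)               # Gaussian point mutation
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        pop = parents + children               # parents survive (elitism)
    return max(pop, key=fitness)

# Toy fitness: prefer small sensitivity and error thresholds.
best = evolve_parameters(lambda p: -(p[0] + p[1]))
```

Because the parents survive each generation, the best individual never worsens; the risk noted above is that an aggressive fitness function, such as one that heavily rewards small networks, can drive the search too far.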
9.7.3 Support of Hypothesis Five Three criteria were specified in Section 1.3 for the evaluation of each algorithm devised for the optimisation of ECoS networks: 1. The size of the network has been reduced. 2. The memorisation error over previously seen data has not changed significantly. 3. The generalisation error over previously unseen data has not changed significantly. Each of these criteria will now be discussed in the context of each algorithm. The algorithms are:
Online neuron aggregation.
GA optimised training.
Offline neuron aggregation.
Sleep learning.
Evolved sleep learning.
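Of these, offline neuron aggregation is the simplest to sketch: evolving layer neurons whose incoming weight vectors lie close together are merged into a single neuron. The distance test and simple averaging below are illustrative assumptions, not the exact aggregation criterion used in Chapter 7:

```python
import numpy as np

def aggregate(w_in, w_out, threshold):
    """Merge evolving layer neurons with nearby incoming weight vectors.

    `w_in` holds one incoming (input-to-evolving) weight vector per
    row, and `w_out` the corresponding outgoing rows.  Neurons within
    `threshold` of the first member of a group are merged, and each
    merged neuron takes the average weights of its group.
    """
    groups = []
    for i in range(len(w_in)):
        for g in groups:
            if np.linalg.norm(w_in[g[0]] - w_in[i]) < threshold:
                g.append(i)
                break
        else:
            groups.append([i])
    new_in = np.array([w_in[g].mean(axis=0) for g in groups])
    new_out = np.array([w_out[g].mean(axis=0) for g in groups])
    return new_in, new_out

w_in = np.array([[0.0, 0.0], [0.05, 0.0], [1.0, 1.0]])
w_out = np.array([[1.0], [1.0], [0.0]])
new_in, new_out = aggregate(w_in, w_out, threshold=0.2)
```

In this example the first two neurons are merged, reducing the evolving layer from three neurons to two; merging can also discard detail, which is the accuracy trade-off examined under the criteria below.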
The size of the network has been reduced All of the algorithms reduced the size of the network to some degree. Under this criterion, then, the optimisation algorithms were successful. The memorisation error over previously seen data has not changed significantly With the exception of GA optimised training, all optimisation methods resulted in a decrease in memorisation accuracy. The decrease was most apparent for offline aggregation and GA optimised sleep learning. This criterion has therefore not been met.
The generalisation error over previously unseen data has not changed significantly Across the classification data sets, generalisation accuracy was reduced only for the offline aggregation algorithm. Both of the function approximation data sets suffered a decrease in generalisation accuracy when optimised. It seems that the number of neurons required for function approximation is larger than the number needed for classification.
9.7.4 Conclusions for Hypothesis Five None of the optimisation algorithms were more than partially effective over the gas furnace benchmark. Given the poor performance of the unoptimised ECoS over this data set, though, it appears that this is a difficult problem for ECoS to learn. Over the other benchmarks, GA optimised training and online aggregation performed consistently well across all data sets, and sleep training performed well over the classification benchmarks. Offline aggregation was the next most effective, with very good performance over the iris classification and Mackey-Glass benchmarks. The GA optimised sleep training was too aggressive overall, although the results over several of the benchmarks are still acceptable. One conclusion that is quite obvious is that ECoS networks used for function approximation tasks require a larger number of neurons than those used for classification problems. This is evidenced by the decrease in performance of ECoS networks when subjected to optimisation methods that reduce the number of neurons in the network. In conclusion, all optimisation algorithms were at least partially successful. Hypothesis Five is thus considered to be partially supported.
9.8 Future Work This thesis has not attempted to solve all of the problems with ECoS, nor has it tried to exhaustively investigate the ECoS family of algorithms. There is much more work that can be done to further expand and extend the ECoS algorithm. Firstly, there are several variations of ECoS that can still be investigated. The first of these is recurrent ECoS networks. These have been proposed and briefly investigated in (Watts and Kasabov, 1999) and (Kasabov, 2003). A more rigorous assessment of recurrent ECoS networks would be useful, as would the expansion of the formalisation in Chapter 5 to include the effect of the recurrent connections. A further variation of ECoS that could be investigated is EFuNNs using membership functions other than the constrained triangular MF of the canonical EFuNN algorithm, such as Gaussians. The use of different MF types with different inputs is also a possibility. Use of different MF types has the potential to improve the performance of EFuNN, as well as improving the performance of the extracted fuzzy rules. Finally, although algorithms have been developed for adding output neurons (or, equivalently, adding classes) to both EFuNN and SECoS networks, there is as yet no algorithm for splitting ECoS networks when the number of outputs grows too large. This would provide an important efficiency boost, and should be investigated.
The analysis in Chapter 5 suggests that there may be a problem with the adjustment of the connection weights during ECoS training. Specifically, the adjustment of the input to evolving layer weights will change the activation level of the evolving layer neuron (and also the error of the network), but the adjustments of the evolving to output layer weights are calculated according to the old activation and error levels. This problem is acknowledged in (Kasabov, 2003, pg 72), where the EFuNN d/p (dual pass) algorithm is described. This overcomes the above problem by modifying the input to evolving layer connections, then propagating the input vector through the network again and recalculating the error. This is an effective and robust method, but is considered to be inelegant. The theoretical analysis of Section 5.8 suggests that a more elegant method of solving this problem may be possible. It is possible that the theoretical analysis in Chapter 5 could be used to optimise the values of the training parameters as training is underway. This has been attempted before in (Kasabov, 2003), and may be improved by taking into account the formalisation developed in this thesis. Experimental investigation of the interaction between training parameters is also desirable. It was mentioned that the major advantage of using external MF when extracting fuzzy rules from SECoS was that they could be optimised without modifying the SECoS network itself. The most promising way of doing this is with an evolutionary algorithm (EA), as was done in, for example, (Gan et al., 1995). It may also be possible to use an EA to optimise the fuzzy rules extracted from an ECoS and then reinsert them into an ECoS for further use in either a production (or recall) mode, or for further training. An improved method of extracting fuzzy rules from EFuNN is also possible. This would look at the two winning MF for each input, rather than just the single winner.
This would only be applicable to the canonical EFuNN, with bounded triangular MF, but would obviate the problem identified with inserting extracted rules into another EFuNN. It could also be used with the SECoS rule extraction algorithm, when MF similar to those embedded in EFuNN were used. It may also be possible to formulate an additional method of extracting Takagi-Sugeno fuzzy rules from SECoS that makes use of training data. Rather than extracting the connection weights as the parameters of the consequent function, it may be possible to determine which evolving layer neuron activates for each training example. These identified examples could then be used to perform a Least-Mean-Squares analysis to formulate the consequent function. This would be very similar to the way in which DENFIS (Kasabov and Song, 2002) generates Takagi-Sugeno rules, but would have the advantage of retaining the advantageous qualities of SECoS. Many more optimisation techniques could be developed. Initialisation of ECoS with Voronoi diagrams (Bose and Garga, 1993) is one possibility, although problems with generating Voronoi diagrams for high dimensions (Okabe et al., 1992) may preclude this. The parameters used in offline aggregation are candidates for optimisation via an EA, although this may require the provision of a testing data set. Sleep learning, and evolved sleep learning, can be applied to EFuNN, provided the problems identified with defuzzifying the stored exemplars can be overcome. The techniques developed for the enhanced EFuNN rule extraction algorithm described above may be applicable here. Finally, a method of implementing forgetting in ECoS networks could be formulated. Rather than causing the values of the connection weights to decay, the weights could be adjusted to cause the evolving layer neurons
to “drift” in space towards more frequently winning neurons. This would make aggregation easier, as it would provide a way of identifying infrequently used neurons. There are many applications that may benefit from the application of ECoS networks. One of these is bioinformatics, a field where ANN have been fruitfully applied (Wu and McLarty, 2000; Baldi and Brunak, 1998). As genome and proteome sequencing efforts continue to generate large quantities of data, methods of mining this data for knowledge are becoming increasingly important. ANN have been applied to modelling genome data (Brunak et al., 1991; Bisant and Maizel, 1995; Fu, 1999) and, especially, to the prediction of protein structure (Bohr et al., 1988; Qian and Sejnowski, 1988; Andreassen et al., 1990; Chandonia and Karplus, 1995; Rost, 1996; Qian, 1996; Rost et al., 1996; Diederichs et al., 1998). This field could potentially benefit from the application of ECoS methods.
Bibliography Abraham, A. (2002). Optimization of evolutionary neural networks using hybrid learning algorithms. In Proceedings of IJCNN 2002, pages 2797–2802. Abreu, A. and Pinto-Ferreira, L. C. (1996). Fuzzy modeling: a rule based approach. In Proceedings of the fifth IEEE International Conference on Fuzzy Systems, pages 162–168. Aguilar, J. and Colmenares, A. (1997). Recognition algorithm using evolutionary learning on the random neural network. In Proceedings of the 1997 IEEE International Conference on Neural Networks, volume 2, pages 1023–1028. IEEE Press. Alander, J. T. (1993). On robot navigation using a genetic algorithm. In Albrecht, R., Reeves, C., and Steele, N., editors, Artificial Neural Nets and Genetic Algorithms, pages 471–478. Alpaydin, E. (1994). GAL: Networks that grow when they learn and shrink when they forget. International Journal of Pattern Recognition and Artificial Intelligence, 8(1):391–414. Anderson, E. (1935). The irises of the Gaspé Peninsula. Bulletin of the American Iris Society, 59. Anderson, S., Merrill, J., and Port, R. (1988). Dynamic speech categorization with recurrent networks. In Touretzky, D., Hinton, G., and Sejnowski, T., editors, Proceedings of the 1988 Connectionist Models Summer School, pages 398–396. Morgan Kaufmann. Andreassen, H., Bohr, H., Bohr, J., Brunak, S., Bugge, T., Cotterill, R., Jacobsen, C., Kusk, P., Lautrup, B., Petersen, S., Saermark, T., and Ulrich, K. (1990). Analysis of the secondary structure of the human immunodeficiency virus (HIV) proteins p17, gp120, and gp41 by computer modeling based on neural network methods. Journal of Acquired Immune Deficiency Syndromes, 3:615–622. Andrews, R., Diederich, J., and Tickle, A. B. (1995). Survey and critique of techniques for extracting rules from trained artificial neural networks. Knowledge-Based Systems, 8(6):373–389. Andrews, R. and Geva, S. (1997). Refining expert knowledge with an artificial neural network.
In Kasabov, N., Kozma, R., Ko, K., O'Shea, R., Coghill, G., and Gedeon, T., editors, Progress in Connectionist-Based Information Systems, volume 2, pages 847–850. Springer. Angeline, P. J., Saunders, G. M., and Pollack, J. B. (1994). An evolutionary algorithm that constructs recurrent neural networks. IEEE Transactions on Neural Networks, 5(1):54–65. Antonisse, J. (1989). A new interpretation of schema notation that overturns the binary encoding constraint. In Schaffer, J., editor, Proceedings of the Third International Conference on Genetic Algorithms, pages 86–91. Arena, P., Caponetto, R., Fortuna, L., and Xibilia, M. G. (1993). M.L.P. optimal topology via genetic algorithms. In Artificial Neural Nets and Genetic Algorithms, pages 670–674. Springer-Verlag Wien New York. Ash, T. (1989). Dynamic node creation in backpropagation networks. Connection Science, 1(4):365–375. Baba, N., Marume, M., and Itoh, K. (1992). Utilization of stochastic automaton and genetic algorithm for neural
network design. In Proceedings of the 2nd International Conference on Fuzzy Logic and Neural Networks, volume 2, pages 837–840, Iizuka, Japan. Bäck, T., Hoffmeister, F., and Schwefel, H.-P. (1991). A survey of evolution strategies. In Belew, R. K. and Booker, L. B., editors, Proceedings of the Fourth International Conference on Genetic Algorithms, pages 2–9. Balakrishnan, K. and Honavar, V. (1996). Analysis of neurocontrollers designed by simulated evolution. In International Conference on Neural Networks 1996: Plenary, Panel and Special Sessions, pages 130–135. Baldi, P. and Brunak, S. (1998). Bioinformatics: The Machine Learning Approach. MIT Press. Bebis, G., Georgiopoulos, M., and Kasparis, T. (1996). Coupling weight elimination and genetic algorithms. In Proceedings of the 1996 IEEE International Conference on Neural Networks, pages 1115–1120. Belew, R. K., McInerney, J., and Schraudolph, N. N. (1990). Evolving networks: Using the genetic algorithm with connectionist learning. In Langton, C. G., Taylor, C., Farmer, J. D., and Rasmussen, S., editors, Artificial Life II, pages 511–547, Santa Fe, New Mexico. Addison-Wesley Publishing Company. Bengio, Y. and De Mori, R. (1988). Speaker normalization and automatic speech recognition using spectral lines and neural networks. In Touretzky, D., Hinton, G., and Sejnowski, T., editors, Proceedings of the 1988 Connectionist Models Summer School, pages 388–397. Morgan Kaufmann. Billings, S. A. and Zheng, G. L. (1995). Radial basis function network configuration using genetic algorithms. Neural Networks, 8(6):877–890. Bisant, D. and Maizel, J. (1995). Identification of ribosome binding sites in Escherichia coli using neural network models. Nucleic Acids Research, 23(9):1632–1639. Bohr, H., Bohr, J., Brunak, S., Cotterill, R. M., Lautrup, B., Norskov, L., Olsen, O. H., and Petersen, S. B. (1988). Protein secondary structure and homology by neural networks. The α-helices in rhodopsin. FEBS Letters, 241(1,2):223–228.
Bornholdt, S. and Graudenz, D. (1992). General asymmetric neural networks and structure design by genetic algorithms. Neural Networks, 5:327–334. Bose, N. and Garga, A. K. (1993). Neural network design using Voronoi diagrams. IEEE Transactions on Neural Networks, 4(5):778–787. Bourlard, H. and Wellekens, C. (1987). Multilayer perceptrons and automatic speech recognition. In IEEE First Annual Conference on Neural Networks, volume IV, pages 407–416, San Diego. Box, G. E. and Jenkins, G. M. (1970). Time Series Analysis: Forecasting and Control. Holden-Day. Brasil, L. M., de Azevedo, F. M., and Barreto, J. M. (2000). A hybrid expert system for the diagnosis of epileptic crisis. Artificial Intelligence in Medicine, 585:1–7. Brown, A. and Card, H. (1997). Evolutionary artificial neural networks for competitive learning. In Proceedings of ICNN, pages 1558–1562. Brunak, S., Engelbrecht, J., and Knudsen, S. (1991). Prediction of human mRNA donor and acceptor sites from the DNA sequence. Journal of Molecular Biology, 220:49–65. Bruske, J. and Sommer, G. (1995a). Dynamic cell structure learns perfectly topology preserving map. Neural Computation, 7:845–865. Bruske, J. and Sommer, G. (1995b). Dynamic cell structures. In Tesauro, G., Touretzky, D., and Leen, T., editors,
Advances in Neural Information Processing Systems 7, pages 497–504. The MIT Press. Bud, A. and Nicholson, A. (1997). Scheduling trains with genetic algorithms. In Kasabov, N., Kozma, R., Ko, K., O'Shea, R., Coghill, G., and Gedeon, T., editors, Progress in Connectionist-Based Information Systems, volume 2, pages 1017–1020. Carpenter, G., Grossberg, S., Markuzon, M., Reynolds, J., and Rosen, D. (1992). Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps. IEEE Transactions on Neural Networks, 3:698–713. Casdagli, M. (1989). Nonlinear prediction of chaotic time-series. Physica D, 35:335–356. Castellano, G. and Fanelli, A. (2000). Fuzzy inference and rule extraction using a neural network. Neural Network World Journal, 3:361–371. Cechin, A. L., Epperlin, U., Rosenstiel, W., and Koppenhoefer, B. (1996). The extraction of Sugeno fuzzy rules from neural networks. In Andrews, R. and Diederich, J., editors, Rules and Networks, pages 16–24. Queensland University of Technology, Neurocomputing Research Centre. Chandonia, J.-M. and Karplus, M. (1995). Neural networks for secondary structure and structural class predictions. Protein Science, 4:275–285. Chauvin, Y. (1990). A back-propagation algorithm with optimal use of hidden units. In Touretzky, D., editor, Advances in Neural Information Processing Systems (Denver, 1988), pages 519–526. Morgan Kaufmann, San Mateo. Chellapilla, K. and Fogel, D. B. (1999). Evolving neural networks to play checkers without relying on expert knowledge. IEEE Transactions on Neural Networks, 10(6):1382–1391. Chiu, S. (1994). Fuzzy model identification based on cluster estimation. Journal of Intelligent and Fuzzy Systems, 2. Cho, S.-B. and Shimohara, K. (1998). Cooperative behavior in evolved modular neural networks. In Methodologies for the Conception, Design and Application of Soft Computing: Proceedings of IIZUKA '98, pages 606–609. Choi, B. and Bluff, K. (1995).
Genetic optimisation of control parameters of a neural network. In Kasabov, N. K. and Coghill, G., editors, Artificial Neural Networks and Expert Systems, pages 174–177. IEEE Computer Society Press. Cortes, C. and Vapnik, V. (1995). Support vector networks. Machine Learning, 20:273–297. Crick, F. (1989). The recent excitement about neural networks. Nature, 337:129–132. Crowder, R. (1990). Predicting the Mackey-Glass time series with cascade-correlation learning. In Touretzky, D., Hinton, G., and Sejnowski, T., editors, Proceedings of the 1990 Connectionist Models Summer School, pages 117–123, Carnegie Mellon Univ. Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2:303–314. Darwin, C. (1859). The Origin of Species by Means of Natural Selection. John Murray, London. Davis, L., editor (1996). Handbook of Genetic Algorithms. International Thomson Computer Press. Davis, S. B. and Mermelstein, P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing,
28(4):357–366. de Castro, L. N., Iyoda, E. M., Von Zuben, F. J., and Gudwin, R. (1998). Feedforward neural network initialization: an evolutionary approach. In Proceedings of Vth Brazilian Conference on Neural Networks, December 9-11, 1998, pages 43–48. Deng, D. and Kasabov, N. (1999). Evolving self-organizing map and its application in generating a world macroeconomic map. In Kasabov, N. and Ko, K., editors, Emerging Knowledge Engineering and Connectionist-based Systems (Proceedings of the ICONIP/ANZIIS/ANNES’99 Workshop “Future directions for intelligent systems and information sciences”, Dunedin, 22-23 November 1999), pages 7–12. University of Otago Press. Diederichs, K., Freigang, J., Umhau, S., Zeth, K., and Breed, J. (1998). Prediction by a neural network of outer membrane β-strand protein topology. Protein Science, 7:2413–2420. Dorado, J., Rabunal, J., Rivero, D., Santos, A., and Pazos, A. (2002). Automatic recurrent ANN rule extraction with genetic programming. In Proceedings of IJCNN 2002, pages 1552–1557. East, I. R. and Rowe, J. (1997). Abstract genetic representation of dynamical neural networks using Kauffman networks. Artificial Life, 3:67–80. Eldracher, M. (1992). Classification of non-linear-separable real-world-problems using δ-rule, perceptrons and topologically distributed encoding. In Proceedings of the 1992 ACM/SIGAPP Symposium on Applied Computing, volume 2, pages 1098–1104. ACM Press. Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14:179–211. Esat, I., Kothari, B., and Wrathall, P. (1999). Encoding neural networks for GA based structural construction. In ICONIP ’99 6th International Conference on Neural Information Processing, pages 359–365. Fahlman, S. E. (1988). An empirical study of learning speed in back-propagation networks. Technical Report CMU-CS-88-162, Department of Computer Science, Carnegie-Mellon University. Fahlman, S. E. and Lebiere, C. (1990). The cascade-correlation learning architecture. In Touretzky, D.
S., editor, Advances in Neural Information Processing Systems 2, pages 524–532. Morgan Kaufmann Publishers. Farag, W. and Tawfik, A. (2000). On fuzzy model identification and the gas furnace data. In Proceedings of the IASTED International Conference. Farag, W. A., Quintana, V. H., and Lambert-Torres, G. (1997). Neuro-fuzzy modeling of complex systems using genetic algorithms. In Proceedings of the 1997 IEEE International Conference on Neural Networks, volume 1, pages 444–449. IEEE Press. Fisher, R. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7:179–188. Fogel, D. B., Wasson, E. C., Boughton, E. M., and Porto, V. W. (1997). A step toward computer-assisted mammography using evolutionary programming and neural networks. Cancer Letters, 119:93–97. Fogel, L. J., Owens, A. J., and Walsh, M. J. (1965). Artificial intelligence through a simulation of evolution. In Maxfield, M., Callahan, A., and Fogel, L., editors, Biophysics and Cybernetic Systems: Proceedings of the 2nd Cybernetic Sciences Symposium, pages 131–155. Fontanari, J. and Meir, R. (1991). Evolving a learning algorithm for the binary perceptron. Network, 2:353–359. Franzini, M. A. (1988). Learning to recognize spoken words: A study in connectionist speech recognition. In Touretzky, D., Hinton, G., and Sejnowski, T., editors, Proceedings of the 1988 Connectionist Models Summer School,
pages 407–416. Morgan Kaufmann. Frean, M. (1990). The upstart algorithm: A method for constructing and training feedforward neural networks. Neural Computation, 2(2):198–209. Fritzke, B. (1991). Unsupervised clustering with growing cell structures. In Proceedings of the IJCNN-91 Seattle. IEEE Press. Fritzke, B. (1993a). Growing cell structures - a self organizing network for unsupervised and supervised learning. Technical Report TR-93-026, International Computer Science Institute. Fritzke, B. (1993b). Kohonen feature maps and growing cell structures - a performance comparison. In Giles, C., Hanson, S., and Cowan, J., editors, Advances in Neural Information Processing Systems 5. Morgan Kaufmann. Fritzke, B. (1994). Supervised learning with growing cell structures. In Cowan, J. D., Tesauro, G., and Alspector, J., editors, Advances in Neural Information Processing Systems 6, pages 255–262. Morgan Kaufmann. Fritzke, B. (1995). A growing neural gas network learns topologies. In Tesauro, G., Touretzky, D., and Leen, T., editors, Advances in Neural Information Processing Systems 7, pages 625–632. The MIT Press. Fu, L. (1999).
An expert network for DNA sequence analysis. IEEE Intelligent Systems, 14(January/February):65–71. Fukuda, T., Komata, Y., and Arakawa, T. (1997a). Recurrent neural networks with self-adaptive GAs for biped locomotion robot. In Proceedings of the 1997 IEEE International Conference on Neural Networks, volume 3, pages 1710–1715. IEEE Press. Fukuda, T., Komata, Y., and Arakawa, T. (1997b). Recurrent neural networks with self-adaptive GAs for biped locomotion robot. In 1997 International Conference on Neural Networks (ICNN ’97), volume 3, pages 1710–1715, Westin Galleria Hotel, Houston, Texas, USA. IEEE Press. Fukumi, M. and Akamatsu, N. (1996). A genetic approach to feature selection for pattern recognition systems. In Methodologies for the Conception, Design and Application of Intelligent Systems: Proceedings of IIZUKA ’96, pages 907–910. Furuhashi, T., Hasegawa, T., Horikawa, S.-i., and Uchikawa, Y. (1993). An adaptive fuzzy controller using fuzzy neural networks. In Proceedings of Fifth IFSA World Congress, pages 769–772. Furuhashi, T., Matushita, S., Tsutsui, H., and Uchikawa, Y. (1997). Knowledge extraction from hierarchical fuzzy model obtained by fuzzy neural networks and genetic algorithm. In Proceedings of the 1997 International Conference on Neural Networks (ICNN’97), volume 4, pages 2374–2379. IEEE Press. Gallant, S. I. (1993). Neural Network Learning and Expert Systems. MIT Press. Gan, M., Lan, H., and Zhang, L. (1995). A genetic-based method of generating fuzzy rules and membership functions by learning from example. In Proceedings of International Conference on Neural Information Processing (ICONIP’95), volume 1, pages 335–338. Gates, G. (1972). The reduced nearest neighbor rule. IEEE Transactions on Information Theory, pages 431–433. Gaweda, A. E., Zurada, J. M., and Aronhime, P. B. (2002). Efficient data-driven modeling with fuzzy relational rule network. In Proceedings of FUZZ-IEEE 2002, pages 174–178. Ghobakhlou, A., Watts, M., and Kasabov, N. (2000). On-line expansion of output space in evolving fuzzy neural networks.
In Proceedings of ICONIP 2000, Taejon, Korea, November 2000, volume 2, pages 891–896.
Ghobakhlou, A. A. and Seesink, R. (2001). An interactive multi modal system for mobile robotic control. In Proceedings of the Fifth Biannual Conference on Artificial Neural Networks and Expert Systems (ANNES2001), pages 93–99. Glaeser, A. (1998). Modular neural networks for low-complex phoneme recognition. In Proceedings of ICSLP’98, pages 1303–1306. Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimisation and Machine Learning. Addison-Wesley. Grefenstette, J. J. (1986). Optimization of control parameters for genetic algorithms. IEEE Transactions on Systems, Man, and Cybernetics, 16(1):122–128. Gueriot, D. and Maillard, E. (1996). A local approach for a fuzzy error function used in multilayer perceptron training through a genetic algorithm. In Proceedings of the 1996 IEEE International Conference on Neural Networks, pages 1050–1055. Gupta, M. M. and Ding, H. (1994). Fuzzy neuronal networks and genetic algorithms. In Proceedings of the 3rd International Conference on Fuzzy Logic, Neural Nets and Soft Computing (Iizuka, Japan), pages 187–188. Hakim, B. A. (2001). Extraction and optimization of fuzzy rules. In Zhang, L. and Gu, F., editors, Proceedings of ICONIP 2001, November 14-18, 2001, Shanghai, China, volume 1, pages 361–365. Fudan University Press. Hamker, F. H. (2001). Life-long learning cell structures-continuously learning without catastrophic interference. Neural Networks, 14:551–573. Hanebeck, U. D. and Schmidt, G. K. (1994). Optimization of fuzzy networks via genetic algorithms. In Proceedings of International Conference on Neural Information Processing, volume 3, pages 1583–1588. Hansen, L., Rasmussen, C., Svarer, C., and Larsen, J. (1994). Adaptive regularization. In Proceedings of the IEEE Workshop on Neural Networks for Signal Processing IV, pages 78–87, Piscataway, New Jersey. IEEE Press. Harp, S. A., Samad, T., and Guha, A. (1990). Designing application-specific neural networks using the genetic algorithm. In Touretzky, D.
S., editor, Advances in Neural Information Processing Systems 2, pages 447–454. Morgan Kaufmann Publishers. Hasegawa, T., Horikawa, S.-i., Furuhashi, T., and Uchikawa, Y. (1992). A study on fuzzy modeling of BOF using a fuzzy neural network. In Proceedings of the 2nd International Conference on Fuzzy Logic and Neural Networks (Iizuka, Japan, July 17-22, 1992), pages 1061–1064. Hasegawa, T., Horikawa, S.-i., Furuhashi, T., and Uchikawa, Y. (1993). An application of fuzzy neural networks to design of adaptive fuzzy controllers. In Proceedings of 1993 International Joint Conference on Neural Networks, pages 1761–1764. Hashiyama, T., Furuhashi, T., and Uchikawa, Y. (1993a). A fuzzy neural network for identifying changes of degrees of attention in a multi-attribute decision making process. In Proceedings of 1993 International Joint Conference on Neural Networks, pages 705–708. Hashiyama, T., Furuhashi, T., and Uchikawa, Y. (1993b). A study on a multi-attribute decision making process using a fuzzy neural network. In Proceedings of Fifth IFSA World Congress, pages 810–813. Haskey, S. and Datta, S. (1998). A comparative study of OCON and MLP architectures for phoneme recognition. In Proceedings of ICSLP 98. Hassibi, B. and Stork, D. (1993). Optimal brain surgeon. In Hanson, S., Cowan, J., and Giles, C., editors, Advances
in Neural Information Processing Systems (Denver, 1992), pages 164–171. Morgan Kaufmann, San Mateo. Heinke, D. and Hamker, F. H. (1998). Comparing neural networks: A benchmark on growing neural gas, growing cell structures, and fuzzy ARTMAP. IEEE Transactions on Neural Networks, 9(6):1279–1291. Heistermann, J. (1990). The application of a genetic approach as an algorithm for neural networks. In Schwefel, H.-P. and Männer, R., editors, Parallel Problem Solving from Nature, volume 496 of Lecture Notes in Computer Science, pages 297–301. Springer-Verlag. Hingston, P., Barone, L., and While, L. (2002). Evolving crushers. In Proceedings of CEC 2002, pages 1109–1114. Hiraga, I. and Furuhashi, T. (1995). An acquisition of operator’s rules for collision avoidance using fuzzy neural networks. IEEE Transactions on Fuzzy Systems, 3(3). Hoffmeister, F. and Bäck, T. (1991). Genetic algorithms and evolution strategies: Similarities and differences. In Schwefel, H.-P. and Männer, R., editors, Parallel Problem Solving from Nature. Springer-Verlag. Holland, J. H. (1975). Adaptation in Natural and Artificial Systems. MIT Press. Homma, T., Atlas, L. E., and Marks, R. J. (1988). An artificial neural network for spatio-temporal bipolar patterns: Application to phoneme classification. In Touretzky, D., Hinton, G., and Sejnowski, T., editors, Proceedings of the 1988 Connectionist Models Summer School, pages 380–387. Morgan Kaufmann. Horikawa, S.-i., Furuhashi, T., Okuma, S., and Uchikawa, Y. (1990). Composition methods of fuzzy neural networks. In Proceedings of IEEE/IECON ’90, pages 1253–1258. Huang, S. H. and Benjamin, M. (2001). Automated knowledge acquisition for design and manufacturing: The case of micromachined atomizer. Journal of Intelligent Manufacturing, 12:377–391. Hung, S. and Adeli, H. (1994). A parallel genetic/neural network learning algorithm for MIMD shared memory machines. IEEE Transactions on Neural Networks, 5(6):900–909.
Ichimura, T., Matsumoto, N., Tazaki, E., and Yoshida, K. (1997). Extraction method of rules from reflective neural network architecture. In Proceedings of the 1997 International Conference on Neural Networks (ICNN’97), volume 1, pages 510–515. IEEE Press. Ishibuchi, H., Nii, M., and Murata, T. (1997). Linguistic rule extraction from neural networks and genetic-algorithm-based rule selection. In Proceedings of the 1997 International Conference on Neural Networks (ICNN’97), volume 4, pages 2390–2395. IEEE Press. Ishikawa, M. (1996). Structural learning with forgetting. Neural Networks, pages 501–521. Ivanova, I. and Kubat, M. (1995). Initialization of neural networks by means of decision trees. Knowledge Based Systems, 8(6):333–344. Izquierdo, J. M. C., Dimitriadis, Y. A., Sánchez, E. G., and Coronado, J. L. (2001). Learning from noisy information in FasArt and FasBack neuro-fuzzy systems. Neural Networks, 14:407–425. Jacobsson, H. and Olsson, B. (2000). An evolutionary algorithm for inversion of ANNs. In Wang, P. P., editor, Proceedings of the Fifth Joint Conference on Information Sciences, volume 1, pages 1070–1073. Jagielska, I., Matthews, C., and Whitfort, T. (1996). The application of neural networks, fuzzy logic, genetic algorithms, and rough sets to automated knowledge acquisition. In Yamakawa, T. and Matsumoto, G., editors, Methodologies for the Conception, Design, and Application of Intelligent Systems: Proceedings of IIZUKA’96, volume 2, pages 565–569. World Scientific.
Jang, J.-S. R. (1993). ANFIS: Adaptive-network-based fuzzy inference system. IEEE Transactions on Systems, Man and Cybernetics, 23:665–684. Janikow, C. Z. and Michalewicz, Z. (1991). An experimental comparison of binary and floating point representations in genetic algorithms. In Belew, R. K. and Booker, L. B., editors, Fourth International Conference on Genetic Algorithms, pages 31–36, University of California, San Diego. Morgan Kaufmann Publishers. Jenkins, N. and Gedeon, T. (1997). Genetic algorithms applied to university exam scheduling. In Kasabov, N., Kozma, R., Ko, K., O’Shea, R., Coghill, G., and Gedeon, T., editors, Progress in Connectionist-Based Information Systems, volume 2, pages 1034–1037. Jones, R., Lee, Y., Barnes, C., Flake, G., Lee, K., and Lewis, P. (1990). Function approximation and time series prediction with neural networks. In Proc. IEEE Int. Joint Conf. Neural Networks, volume 1, pages 649–665. Karnin, E. (1990). A simple procedure for pruning back-propagation trained neural networks. IEEE Transactions on Neural Networks, 1(2):239–242. Kasabov, N. (1998a). The ECOS framework and the ECO learning method for evolving connectionist systems. Journal of Advanced Computational Intelligence, 2(6):195–202. Kasabov, N. (1999). Evolving connectionist systems: A theory and a case study on adaptive speech recognition. In International Joint Conference on Neural Networks (IJCNN), July 10-16. Kasabov, N. and Fedrizzi, M. (1999). Fuzzy neural networks and evolving connectionist systems for intelligent decision making. In Proceedings of the Eighth International Fuzzy Systems Association World Congress, Taiwan, August 17-20, pages 30–35. Kasabov, N., Kim, J., Watts, M., and Gray, A. (1997a). FuNN/2 - a fuzzy neural network architecture for adaptive learning and knowledge acquisition in multi-modular distributed environments. Information Sciences Applications. Kasabov, N., Kozma, R., Kilgour, R., Laws, M., Taylor, J., Watts, M., and Gray, A. (1997b).
A methodology for speech data analysis and a framework for adaptive speech recognition using fuzzy neural networks. In Progress in Connectionist-Based Information Systems, Proceedings of the ICONIP/ANZIIS/ANNES ’97, Dunedin, 24-28 November 1997. Springer-Verlag. Kasabov, N., Kozma, R., Kilgour, R., Laws, M., Watts, M., Gray, A., and Taylor, J. (1999). Speech data analysis and recognition using fuzzy neural networks and self-organising maps. In Kasabov, N. and Kozma, R., editors, Neuro-Fuzzy Techniques for Intelligent Information Systems, pages 241–263. Physica-Verlag. Kasabov, N. and Song, Q. (2000). Dynamic evolving neuro-fuzzy inference system (DENFIS): On-line learning and application for time-series prediction. In Proceedings of the 6th International Conference on Soft Computing, October 1-4, 2000, Fuzzy Logic Systems Institute, Iizuka, Japan, pages 696–702. Kasabov, N. and Song, Q. (2002). DENFIS: Dynamic evolving neural-fuzzy inference systems. IEEE Transactions on Fuzzy Systems, 10(2):144–154. Kasabov, N. and Woodford, B. (1999). Rule insertion and rule extraction from evolving fuzzy neural networks: Algorithms and applications for building adaptive, intelligent expert systems. In IEEE International Fuzzy Systems Conference, pages 1406–1411. Kasabov, N. K. (1996a). Foundations of Neural Networks, Fuzzy Systems and Knowledge Engineering. MIT Press.
Kasabov, N. K. (1996b). Learning fuzzy rules and approximate reasoning in fuzzy neural networks and hybrid systems. Fuzzy Sets and Systems, 82(2). Kasabov, N. K. (1998b). ECOS: Evolving connectionist systems and the ECO learning paradigm. In Usui, S. and Omori, T., editors, ICONIP’98 Proceedings, volume 2, pages 1232–1235. Kasabov, N. K. (1998c). Evolving fuzzy neural networks - algorithms, applications and biological motivation. In Yamakawa, T. and Matsumoto, G., editors, Methodologies for the Conception, Design and Application of Soft Computing, volume 1, pages 271–274. World Scientific. Kasabov, N. K. (2003). Evolving Connectionist Systems: Methods and Applications in Bioinformatics, Brain Study and Intelligent Machines. Springer. Kasabov, N. K. and Watts, M. J. (1997). Genetic algorithms for structural optimisation, dynamic adaptation and automated design of fuzzy neural networks. In 1997 International Conference on Neural Networks (ICNN ’97), volume 4, pages 2546–2549, Westin Galleria Hotel, Houston, Texas, USA. IEEE Press. Kermani, B. G., Schiffman, S. S., and Nagle, H. T. (1999). Using neural networks and genetic algorithms to enhance performance in an electronic nose. IEEE Transactions on Biomedical Engineering, 46(4):429–439. Kilgour, R. (2003). Evolving Systems for Connectionist-Based Speech Recognition. PhD thesis, University of Otago. Kim, E., Park, M., Ji, S., and Park, M. (1997). A new approach to fuzzy modeling. IEEE Transactions on Fuzzy Systems, 5:328–337. Kim, E., Park, M., Kim, S., and Park, M. (1998). A transformed input-domain approach to fuzzy modeling. IEEE Transactions on Fuzzy Systems, 6(4):596–604. Kim, Y.-W. and Park, D.-J. (1996). Ship collision avoidance using genetic algorithm. In Methodologies for the Conception, Design, and Application of Intelligent Systems, pages 545–548. Kirchhoff, K. (1998). Combining articulatory and acoustic information for speech recognition in noisy and reverberant environments. 
In Proceedings of the International Conference on Spoken Language Processing, Sydney, Australia, pages 891–894. Kohonen, T. (1990). The self-organizing map. Proceedings of the IEEE, 78(9):1464–1479. Kohonen, T. (1997). Self-Organizing Maps. Springer, second edition. Koizumi, T., Mori, M., Taniguchi, S., and Maruya, M. (1996). Recurrent neural networks for phoneme recognition. In Proceedings of ICSLP’96, volume 1, pages 326–329. Kolmogorov, A. (1957). On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition. Dokl. Akad. Nauk SSSR, 114:953–956. (in Russian). Kong, S.-G. and Kosko, B. (1992). Adaptive fuzzy systems for backing up a truck-and-trailer. IEEE Transactions on Neural Networks, 3(2):211–223. Koprinska, I. and Kasabov, N. (1999). An application of evolving fuzzy neural network for compressed video parsing. In ICONIP/ANZIIS/ANNES’99 Workshop, Dunedin, New Zealand, November 22-24, pages 96–102. Kosko, B. (1992). Neural Networks and Fuzzy Systems: A Dynamical Systems Approach to Machine Intelligence. Prentice-Hall, Englewood Cliffs, New Jersey. Kosko, B. (1993). Fuzzy Thinking. Flamingo.
Koza, J. R. (1993). Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, 3rd edition. Kwok, T.-Y. and Yeung, D.-Y. (1999). Constructive algorithms for structure learning in feedforward neural networks for regression problems. IEEE Transactions on Neural Networks. Lang, K. J. and Witbrock, M. J. (1988). Learning to tell two spirals apart. In Touretzky, D., Hinton, G., and Sejnowski, T., editors, Proceedings of the 1988 Connectionist Models Summer School, pages 52–57. Lapedes, A. and Farber, R. (1987). Nonlinear signal processing using neural networks: prediction and system modeling. Technical Report LA-UR-87-2662, Los Alamos Nat. Lab., Los Alamos, NM. Lawrence, S., Tsoi, A. C., and Back, A. D. (1996). The gamma MLP for speech phoneme recognition. In Touretzky, D., Mozer, M., and Hasselmo, M., editors, Advances in Neural Information Processing Systems 8, pages 785–791. MIT Press. Le Cun, Y., Denker, J., and Solla, S. (1990). Optimal brain damage. In Touretzky, D., editor, Advances in Neural Information Processing Systems, pages 598–605. Morgan-Kaufmann, San Mateo. Lee, D.-W. and Sim, K.-B. (1998). Ontogenesis of artificial neural networks based on L-System and genetic algorithms. In Yamakawa, T. and Matsumoto, G., editors, Methodologies for the Conception, Design and Application of Soft Computing: Proceedings of IIZUKA’98, volume 2, pages 817–820. World Scientific. Lee, K.-M., Kwak, D.-H., and Lee-Kwang, H. (1994). On fuzzy modeling with fuzzy neural networks. In Proceedings of International Conference on Neural Information Processing, volume 3, pages 1589–1594. Lee, K.-M., Yamakawa, T., Uchino, E., and Lee, K.-M. (1997). A genetic algorithm approach to job shop scheduling. In Kasabov, N., Kozma, R., Ko, K., O’Shea, R., Coghill, G., and Gedeon, T., editors, Progress in Connectionist-Based Information Systems, volume 2, pages 1030–1033. Lei, J., He, G., and Jiang, P. (1997).
The state estimation of the CSTR system based on a recurrent neural network trained by HGAs. In Proceedings of the 1997 IEEE International Conference on Neural Networks, volume 2, pages 779–782. IEEE Press. Leichter, C. S., Cichocki, A., and Kasabov, N. (2001). Independent component analysis and evolving fuzzy neural networks for the classification of single trial EEG data. In Proceedings of the Fifth Biannual Conference on Artificial Neural Networks and Expert Systems (ANNES2001), pages 100–105. Leung, H. C., Glass, J. R., Philips, M. S., and Zue, V. W. (1990). Phonetic classification and recognition using the multi-layer perceptron. In Advances in Neural Information Processing, pages 248–254. Lin, Y. and Cunningham III, G. (1995). A new approach to fuzzy-neural modeling. IEEE Transactions on Fuzzy Systems, 3(2):190–197. Lippmann, R. P. (1987). An introduction to computing with neural nets. IEEE ASSP Mag., pages 4–22. Lippmann, R. P. (1989). Review of neural networks for speech recognition. Neural Computation, 1:1–38. Lippmann, R. P. (1997). Speech recognition by machines and humans. Speech Communication, 22:1–15. Littmann, E. and Ritter, H. (1996). Learning and generalization in cascade network architectures. Neural Computation, 8:1521–1539. Liu, Y. and Yao, X. (1996). A population-based learning algorithm which learns both architectures and weights of neural networks. In Yao, X. and Li, X., editors, Proceedings of ICYCS’95 Workshop on Soft Computing, pages
54–65. Mackey, M. C. and Glass, L. (1977). Oscillation and chaos in physiological control systems. Science, 197:287–289. Maillard, E. P. (1997). RBF neural network, basis functions and genetic algorithm. In Proceedings of the 1997 IEEE International Conference on Neural Networks, volume 4, pages 2187–2192. IEEE Press. Mamdani, E. (1976). Advances in linguistic synthesis of fuzzy controllers. International Journal of Man-Machine Studies, 8(6):669–678. Mandischer, M. (1993a). Genetic optimization and representation of neural networks. In Proceedings of the Fourth Australian Conference on Neural Networks (ACNN93), pages 122–125. Mandischer, M. (1993b). Representation and evolution of neural networks. In Albrecht, R., Reeves, C., and Steele, N., editors, Artificial Neural Nets and Genetic Algorithms, pages 643–649. Springer-Verlag Wien New York. MATLAB Manual (2002). MATLAB Neural Networks Toolbox Manual. The MathWorks, Inc. Matthews, B. (1975). Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta, 405:442–451. Matthews, C. and Jagielska, I. (1995). Fuzzy rule extraction from a trained multilayered neural network. In Proceedings of the 1995 IEEE International Conference on Neural Networks (ICNN’95), Perth, Australia. McCloskey, M. and Cohen, N. (1989). Catastrophic interference in connectionist networks: The sequential learning project. The Psychology of Learning and Motivation, 24:109–164. McCullagh, J. and Bluff, K. (1993). Genetic modification of a neural network’s training data. In Kasabov, N. K., editor, Artificial Neural Networks and Expert Systems, pages 58–59. McCullagh, J., Choi, B., and Bluff, K. (1997). Genetic evolution of a neural network’s input vector for meteorological estimations.
In Kasabov, N., Kozma, R., Ko, K., O’Shea, R., Coghill, G., and Gedeon, T., editors, 1997 International Conference on Neural Information Processing and Intelligent Information Systems, volume 2, pages 1046–1049, Dunedin, New Zealand. Springer. McCulloch, W. and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5:115–133. McDonnell, J. and Waagen, D. (1994). Evolving recurrent perceptrons for time-series modeling. IEEE Transactions on Neural Networks, 5(1):24–38. Menczer, F. and Parisi, D. (1992). Recombination and unsupervised learning: effects of crossover in the genetic optimization of neural networks. Network, 3:423–442. Mézard, M. and Nadal, J.-P. (1989). Learning in feedforward layered networks: the tiling algorithm. Journal of Physics A, 22:2191–2203. Michalewicz, Z. (1992). Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag. Michalski, R. S. (1983). A theory and methodology of inductive learning. Artificial Intelligence, 20:111–161. Minsky, M. L. and Papert, S. A. (1969). Perceptrons. MIT Press. Mitchell, M. (1996). An Introduction to Genetic Algorithms. MIT Press. Mitra, S., De, R. K., and Pal, S. K. (1997). Knowledge-based fuzzy MLP for classification and rule generation. IEEE Transactions on Neural Networks, 8(6):1338–1350.
Mitra, S. and Hayashi, Y. (2000). Neuro-fuzzy rule generation: Survey in soft computing framework. IEEE Transactions on Neural Networks, 11(3):748–768. Mitra, S. and Pal, S. K. (1996). Fuzzy self-organization, inferencing, and rule generations. IEEE Transactions on Systems, Man, and Cybernetics, 26(5):608–620. Mizutani, E. and Dreyfus, S. E. (2002). MLP’s hidden-node saturations and insensitivity to initial weights in two classification benchmark problems: parity and two-spirals. In Proceedings of the Congress on Evolutionary Computation, pages 2831–2836. Monfroglio, A. (1996). Timetabling through constrained heuristic search and genetic algorithms. Software: Practice and Experience, 26(3):251–279. Moody, J. (1989). Fast learning in multi-resolution hierarchies. In Touretzky, D., editor, Advances in Neural Information Processing Systems I, pages 29–39. Morgan Kaufmann. Moreira, M. and Fiesler, E. (1995). Neural networks with adaptive learning rate and momentum terms. Technical Report 95-04, Institut Dalle Molle d’Intelligence Artificielle Perceptive. Moriarty, D. E. and Miikkulainen, R. (1998). Forming neural networks through efficient and adaptive coevolution. Evolutionary Computation, 5(4):373–399. Mozer, M. and Smolensky, P. (1989). Skeletonization: A technique for trimming the fat from a network via relevance assessment. In Touretzky, D., editor, Advances in Neural Information Processing Systems (Denver, 1988), pages 107–115. Morgan-Kaufmann, San Mateo. Mühlenbein, H. and Kindermann, J. (1989). The dynamics of evolution and learning - towards genetic neural networks. In Pfeifer, R., Schreter, Z., and Fogelman-Soulié, editors, Connectionism in Perspective, pages 173–197. North-Holland. Mukaidono, M. and Yamaoka, M. (1992). A learning method of fuzzy inference with neural networks and its application. In Proceedings of the 2nd International Conference on Fuzzy Logic and Neural Networks, volume 1, pages 185–187. Fuzzy Logic Systems Institute.
Mukherjee, S., Osuna, E., and Girosi, F. (1997). Nonlinear prediction of chaotic time series using support vector machines. In Principe, J., Giles, L., Morgan, N., and Wilson, E., editors, IEEE Workshop on Neural Networks for Signal Processing VII, page 511. IEEE Press. Munro, P. W. (1993). Genetic search for optimal representations in neural networks. In Albrecht, R., Reeves, C., and Steele, N., editors, Artificial Neural Nets and Genetic Algorithms, pages 628–634. Springer-Verlag. Nyquist, H. (1928). Certain topics in telegraph transmission theory. Trans. AIEE, 47:617–644. Okabe, A., Boots, B., and Sugihara, K. (1992). Spatial Tessellations: Concepts and Applications of Voronoi Diagrams. John Wiley and Sons, Ltd. Opitz, D. W. and Shavlik, J. W. (1997). Connectionist theory refinement: Genetically searching the space of network topologies. Journal of Artificial Intelligence Research, 6:177–290. Paredis, J. (1994). Steps towards co-evolutionary classification neural networks. In Brooks, R. A. and Maes, P., editors, Proceedings of the Fourth International Workshop on the Synthesis and Simulation of Living Systems, pages 102–108. The MIT Press. Parekh, R., Yang, J., and Honavar, V. (2000). Constructive neural-network learning algorithms for pattern classification. IEEE Transactions on Neural Networks, 11(2):436–451. Pedrycz, W. (1984). An identification algorithm in fuzzy relational systems. Fuzzy Sets and Systems, 13:153–167. Philipsen, W. and Cluitmans, L. (1993). Using a genetic algorithm to tune Potts neural networks. In Albrecht, R., Reeves, C., and Steele, N., editors, Artificial Neural Nets and Genetic Algorithms, pages 650–657. Springer-Verlag. Pican, N., Fohr, D., and Mari, J.-F. (1996). HMMs and OWE neural network for continuous speech recognition. In Proceedings of ICSLP. Platt, J. (1991a). Learning by combining memorization and gradient descent. In Advances in Neural Information Processing Systems III. Platt, J. (1991b). A resource-allocating network for function interpolation. Neural Computation, 3(2):213–225. Prechelt, L. (1997). Investigation of the CasCor family of learning algorithms. Neural Networks, 10(5):885–896. Principe, J. and Kuo, J.-M. (1995). Non-linear modelling of chaotic time series with neural networks. In Advances in Neural Information Processing Systems VII. Qian, H. (1996). Prediction of α-helices in proteins based on thermodynamic parameters from solution chemistry. Journal of Molecular Biology, 256:663–666. Qian, N. and Sejnowski, T. J. (1988). Predicting the secondary structure of globular proteins using neural network models. Journal of Molecular Biology, 202:865–884. Rantala, J. and Koivisto, H. (2002). Optimised subtractive clustering for neuro-fuzzy models. In Proceedings of the 3rd International Conference on Fuzzy Sets & Fuzzy Systems (FSFS’02), Interlaken, Switzerland, February 11-15. Ray, K. S. and Ghoshal, J. (1996). Neuro genetic approach to pattern recognition. In Methodologies for the Conception, Design, and Application of Intelligent Systems: Proceedings of IIZUKA ’96, pages 221–224. Reed, R. D. and Marks, R. J. (1999). Neural Smithing. MIT Press, Cambridge, Massachusetts. Renals, S. and Rohwer, R. (1989). Phoneme classification experiments using radial basis functions.
In Proceedings of International Joint Conference on Neural Networks - IJCNN, Washington, D.C., volume I, pages 461–467. Ribeiro, B. (2002). Kernelized based functions with minkovsky’s norm for SVM regression. In IJCNN-2002, pages 2198–2203. Ripley, B. D. (1993). Statistical aspects of neural networks. In Barndorrf-Nielsen, O., Jensen, J., and Kendall, W., editors, Networks and Chaos - Statistical and Probabilistic Aspects, chapter 2, pages 40–123. Chapman and Hall. Robbins, P., Soper, A., and Rennolls, K. (1993). Use of genetic algorithms for optimal topology determination in back propagation neural networks. In Albrecht, R., Reeves, C., and Steele, N., editors, Artificial Neural Nets and Genetic Algorithms, pages 726–730. Springer-Verlag Wien New York. Robinson, T. and Fallside, F. (1990). Phoneme recognition from the TIMIT database using recurrent error propagation networks. Technical report, Cambridge University, Engineering Department. Romero, E. and Alqu´ezar, R. (2002). A new incremental method for function approximation using feed-forward neural networks. In Proceedings of the International Joint Conference on Neural Networks (IJCCN) 2002, pages 1968–1973.
BIBLIOGRAPHY
240
Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65:386–408. Rost, B. (1996). PHD: Predicting one-dimensional protein structure by profile-based neural networks. Methods in Enzymology, 266:525–539. Rost, B., Fariselli, P., and Casadio, R. (1996). Topology prediction for helical transmembrane proteins at 86% accuracy. Protein Science, 5:1704–1718. Rozich, R., Ioerger, T., and Yager, R. (2002). FURL - a theory revision approach to learning fuzzy rules. In Proceedings of FUZZ-IEEE 2002, pages 791–796. Rumelhart, D., Hinton, G., and Williams, R. (1986). Learning representations by back-propagating errors. Nature, 323:533–536. Sanger, T. (1991). A tree-structured adaptive network for function approximation in high-dimensional spaces. IEEE Trans. Neural Networks, 2(2):285–293. Sarkar, M. and Yegnanarayana, B. (1997a). An evolutionary programming-based probabilisitc neural networks construction technique. In Proceedings of the 1997 IEEE International Conference on Neural Networks, volume 1, pages 456–461. IEEE Press. Sarkar, M. and Yegnanarayana, B. (1997b). Feedforward neural networks configuration using evolutionary programming. In Proceedings of the 1997 IEEE International Conference on Neural Networks, volume 1, pages 438–443. Schiffmann, W., Joost, M., and Werner, R. (1990). Performance evaluation of evolutionarily created neural network topoplogies. In Schwefel, H.-P. and M¨anner, R., editors, Parallel Problem Solving from Nature, volume 496 of Lecture Notes in Computer Science, pages 274–283. Springer-Verlag. Schiffmann, W., Joost, M., and Werner, R. (1993). Application of genetic algorithms to the construction of topologies for multilayer perceptrons. In Albrecht, R., Reeves, C. R., and Steele, N., editors, Artificial Neural Nets and Genetic Algorithms, pages 675–682. Springer-Verlag Wien New York. Schiffmann, W., Joost, M., and Werner, R. (1994). 
Optimization of backpropagation algorithm for training multiplayer perceptrons. Technical report, Institute of Physics, University of Koblenz. Scholz, M. (1990). A learning strategy for neural networks based on a modified evolutionary strategy. In Schwefel, H.-P. and M¨anner, R., editors, Parallel Problem Solving from Nature, volume 496 of Lecture Notes in Computer Science, pages 314–318. Springer-Verlag. Shibata, T., Fukuda, T., Kosuge, K., and Arai, F. (1996). Path-planning for multiple mobile robots by genetic algorithms. In Methodologies for the Conception, Design, and Application of Intelligent Systems, pages 747– 750. Siddiqi, A. and Lucas, S. (1998). A comparison of matrix rewriting versus direct encoding for evolving neural networks. In Proceedings of IEEE Conference on Evolutionary Computation 1998. Siddique, M. and Tokhi, M. (2001). Training neural networks: Backpropagation vs genetic algorithms. In Proceedings of IJCNN 2001, pages 2673–2678. Sima, M., Croitoru, V., and Burileanu, D. (1998). Performance analysis on speech recognition using neural networks. In Proceedings of the International Conference and Development and Application Systems, Suceava,
BIBLIOGRAPHY
241
Romania, pages 259–266. Sinclair, S. and Watson, C. (1995). The development of the Otago speech database. In Kasabov, N. K. and Coghill, G., editors, Proceedings of ANNES ’95. IEEE Computer Society Press, Los Alamitos, CA. Sirvadam, V., McCloone, S., and Irwin, G. (2002). Separable recursive training algorithms for feedforward neural networks. In Proceedings of IJCNN 2002, pages 1212–1217. Smalz, R. and Conrad, M. (1994). Combining evolution with credit apportionment: A new learning algorithm for neural nets. Neural Networks, 7(2):341–351. Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society, 36:111–147. Sugeno, M. and Tanaka, K. (1991). Identification of a fuzzy model and its application to prediction of a complex system. Fuzzy Sets and Systems, 42:315–334. Sugeno, M. and Yasukawa, T. (1991). Linguistic modeling based on numerical data. In Proceedings of IFSA’91, Brussels. Sugeno, M. and Yasukawa, T. (1993). A fuzzy-logic based approach to qualitative modeling. IEEE Transactions on Fuzzy Systems, 1(1):7–31. Svarer, C., Hansen, L., Larsen, J., and Rasmussen, C. (1993). Designer networks for time series processing. In et al., C. K., editor, Proceedings of the 1993 IEEE Workshop on Neural Networks for Signal Processing (NNSP’93), pages 78–87, Baltimore. Takagi, T. and Sugeno, M. (1985). Fuzzy identification of systems and its applications to modeling and control. IEEE Transactions on Systems, Man, and Cybernetics, SMC-15:116–132. Tino, P. and Koteles, M. (1999). Extracting finite-state representations from recurrent neural networks trained on chaotic symbolic sequences. IEEE Transactions on Neural Networks, 10(2):284–302. Togneri, R., Forrokhi, D., Zhang, Y., and Attikiouzel, Y. (1992). A comparison of the LBG, MLP, SOM and GMM algorithms for vector quantisation and clustering analysis. 
In Proceedings Fourth Australian International Conference of Speech Science and Technology, Brisbane, Australia, pages 173–177. Tong, R. (1978). Synthesis of fuzzy models for industrial processed: Some recent results. International Journal General Systems, 4:143–162. Tong, R. (1980). The evaluation of fuzzy models derived from experimental data. Fuzzy Sets and Systems, 4:1–12. Torreele, J. (1991). Temporal processing with recurrent networks: An evolutionary approach. In Belew, R. K. and Booker, L. B., editors, Proceedings of the Fourth International Conference on Genetic Algorithms, pages 555–561. Towell, G. and Shavlik, J. (1993). The extraction of refined rules from knowledge based neural networks. Machine Learning, 131:71–101. Towell, G. G., Shavlik, J. W., and Noordwier, M. O. (1990). Refinement of approximate domain theories by knowledge-based neural networks. In Proceedings of the Eighth National Conference on Artificial Intelligence (AAAI-90), volume 2, pages 861–866. Uchino, E. and Yamakawa, T. (1995). System modeling by a neo-fuzzy neuron with applications to acoustic and chaotic systems. International Journal on Artificial Intelligence Tools, 1(2):73–91.
BIBLIOGRAPHY
242
Umano, M., Fukunaka, S., Hatono, I., and Tamura, H. (1997). Acquisition of fuzzy rules using fuzzy neural networks with forgetting. In 1997 International Conference on Neural Networks (ICNN ’97), volume 4, pages 2369–2373. IEEE Press. Valdes, J. J. (2002). Time series discovery with similarity-based neuro-fuzzy networks and evolutionary algorithms. In Proceedings of IJCNN 2002, pages 2345–2350. Vaughn, M., Ong, E., and Cavill, S. (1993). Direct rule extraction form a MLP network that performs whole life assurance risk assessment. In Usui, S. and Omori, T., editors, Proceedings of the Fifth International Conference on Neural Information Processing, volume 2, pages 909–914. IOS Press. Vesanto, J. (1997). Using the SOM and local models in time-series prediction. In Proceedings of WSOM’97, Workshop on Self-Organizing Maps, Espoo, Finland, June 4–6, pages 209–214. Helsinki University of Technology, Neural Networks Research Centre, Espoo, Finland. Villee, C. A. (1972). Biology. W.B. Saunders Company, 6th edition. Waibel, A., Hanazawa, T., Hinton, G., and Shikano, K. (1989). Phoneme recognition using time-delay neural networks. IEEE Transactions of Acoustics, Speech and Signal Processing, 37(3):328–339. Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., and Lang, J. (1988). Phoneme recognition: neural networks versus hidden markov models. In Proceedings ICASSP, New York, NY, pages 107–110. Wang, L. and Langari, R. (1996). Complex systems modeling via fuzzy logic. IEEE Transactions on Systems, Man, and Cybernetics, 26(1):100–106. Wang, X. (2001). “on-line” time series prediction system—EFuNN-T. In Proceedings of the Fifth Biannual Conference on Artificial Neural Networks and Expert Systems (ANNES2001), pages 82–86. Watrous, R. and Shastri, L. (1987). Learning phonetic features using connectionist networks: An experiment in speech recognition. In Proceedings of the 10th International Conference on Artificial Intelligence, pages 351–354. Watrous, R. 
L., Ladendorf, B., and Kuhn, G. (1990). Complete gradient optimization of a recurrent network applied to /b/,/d/,/g/ discrimination. Journal of the Acoustical Society of America, 87(3):1301–1309. Watts, M. (1999a).
Evolving connectionist systems for biochemical applications.
In Kasabov, N. and
Ko, K., editors, Emerging Knowledge Engineering and Connectionist-based Systems (Proceedings of the ICONIP/ANZIIS/ANNES’99 Workshop “Future directions for intelligent systems and information sciences”, Dunedin, 22-23 November 1999), pages 147–151. University of Otago Press. Watts, M. (1999b). An investigation of the properties of evolving fuzzy neural networks. In Proceedings of ICONIP’99, November 1999, Perth, Australia, pages 217–221. Watts, M. and Kasabov, N. (1998). Genetic algorithms for the design of fuzzy neural networks. In Usui, S. and Omori, T., editors, The Fifth International Conference on Neural Information Processing, volume 2, pages 793–796, Kitakyushu, Japan. IOS Press. Watts, M. and Kasabov, N. (1999). Spatial-temporal adaptation in evolving fuzzy neural networks for on-line adaptive phoneme recognition. Technical Report TR99/03, Department of Information Science, University of Otago. Watts, M. and Kasabov, N. (2000). Simple evolving connectionist systems and experiments on isolated phoneme
BIBLIOGRAPHY
243
recognition. In Proceedings of the first IEEE conference on evolutionary computation and neural networks, San Antonio, May 2000, pages 232–239. IEEE Press. Watts, M., Major, L., and Tate, W. (2002). Evolutionary optimisation of MLP for modelling protein synthesis termination signal efficiency. In Proceedings of the Congress on Evolutionary Computation (CEC) 2002, pages 606–610. Watts, M., Major, L., Tate, W., and Kasabov, N. (2001). Neural network analysis of protein synthesis termination signal efficiency. In Proceedings of International Conference on Neural Information Processing (ICONIP) 2001, Shanghai, China, pages 975–980. Watts, M. J. and Kasabov, N. K. (2002). Evolutionary optimisation of evolving connectionist systems. In Proceedings of the Congress on Evolutionary Computation (CEC) 2002, pages 606–610. Weigund, A., Rumelhart, D., and Huberman, B. (1991). Generalization by weight-elimination with application to forecasting. In Lippmann, R., Moody, J., and Touretzky, D., editors, Advances in Neural information Processing Systems (3), pages 875–882. Morgan Kaufmann, San Mateo. Weiss, S. and Kapouleas, I. (1991). An empirical comparison of pattern recognition, neural nets and machine learning classification methods. In Proceedings of the 11th International Joint Conference on Artificial Intelligence, Detroit, pages 781–787. Whitfort, T., Matthews, C., and Jagielska, I. (1995). Automated knowledge acquisition for a fuzzy classification problem. In Kasabov, N. K. and Coghill, G., editors, The Second New Zealand International Two-Stream Conference on Artificial Neural Networks and Expert Systems, pages 227–230. IEEE Computer Society Press. Widrow, B. (1962). Self-Organizing Systems, chapter Generalization and information storage in networks of adaline, pages 435–461. Sparta. Widrow, B., Rumelhart, D. E., and Lehr, M. A. (1994). Neural networks: Applications in industry, business and science. Communications of the ACM, 37(3):93–105. Wolpert, D. H. and Macready, W. 
G. (1995). No free lunch theorems for search. Technical Report SFI-TR-95-02010, Santa Fe Institute. Woodford, B. (2001). Comparative analysis of the EFuNN and the support vector machine models for the classification of horticulture data. In Proceedings of the Fifth Biannual Conference on Artificial Neural Networks and Expert Systems (ANNES2001), pages 70–75. Woodford, B. J. and Kasabov, N. K. (2001). Ensembles of EFuNNs: An architecture for a multi module classifier. In The proceedings of FUZZ-IEEE’2001. The 10th IEEE International Conference on Fuzzy Systems, December 2-5 2001, Melbourne, Australia, pages 441–445. Wright, A. H. (1991). Genetic algorithms for real parameter optimization. In Rawlins, G. J. E., editor, Foundations of Genetic Algorithms, pages 205–218. Wu, C. and McLarty, J. (2000). Neural Networks and Genome Informatics. Elsevier Health Sciences. Wu, Z. and Li, W. (1995). Optimization of floor plate structure in railway passenger train by genetic algorithm. In Proceedings of ICONIP 95, volume 1, pages 347–350. Xu, C. and Lu, Y. (1987). Fuzzy model identification and self-learning for dynamic systems. IEEE Transactions of Systems, Man, and Cybernetics, 17:683–689.
BIBLIOGRAPHY
244
Yang, J. T., Huang, H.-D., and Horng, J.-T. (2002). Devising a cost effective baseball scheduling by evolutionary algorithms. In Proceedings of CEC 2002, pages 1660–1665. Yang, X. and Furuhashi, T. (1993). A basic study on apparel CAD using a fuzzy neural network. In Proceedings of 1993 International Joint Conference on Neural Networks, pages 701–704. Yao, X. (1997). A new evolutionary system for evolving artificial neural networks. IEEE Transactions on Neural Networks, 8(3):694–713. Yao, X. (1999). Evolving artificial neural networks. Proceedings of the IEEE, pages 1423–1447. Yao, X. and Liu, Y. (1996a). Ensemble structure of evolutionary artificial neural networks. In Proceedings of the Third IEEE Conference on Evolutionary Computation (ICEC ’96), pages 659–664, Nagoya, Japan. Yao, X. and Liu, Y. (1996b). Evolving artificial neural network through evolutionary programming. In Fogel, L. J., Angeline, P. J., and B¨ack, T., editors, Evolutionary Programming V, pages 257–266. MIT Press. Yao, X. and Liu, Y. (1998). Making use of population information in evolutionary artificial neural networks. IEEE Transactions on Systems, Man and Cybernetics, 28(3):417–425. Yen, G. G. and Lu, H. (2002). Hierarchical rank density genetic algorithm for radial-basis function neural network design. In Proceedings of CEC 2002, pages 25–30. Yu, C.-C. and Liu, B.-D. (2002). A backpropagation algorithm with adaptive learning rate and momentum coefficient. In Proceedings of the International Joint Conference on Neural Networks (IJCNN) 2002, pages 1218–1223. Zadeh, L. (1965). Fuzzy sets. Information and Control, 8:338–353. Zhao, Q. (1997). A co-evolutionary algorithm for neural network learning. In Proceedings of the 1997 IEEE International Conference on Neural Networks, volume 1, pages 432–437. IEEE Press. Zhao, Q. and Higuchi, T. (1996). Evolutionary learning of nearest-neighbour MLP. IEEE Transactions on Neural Networks, 7(3):762–767. ZISC Manual (2002). ZISC Zero Instruction Set Computer. 
Silicon Recognition, Inc., version 4.2 edition. http://www.silirec.com.
Appendix A
Results of Hypothesis Tests for Experiments with MLP and FuNN

This Appendix presents the results of the statistical hypothesis tests for the experiments with the MLP and FuNN algorithms described in Section 2.7. Each section presents the results for the tests over one of the benchmark data sets.
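Each table below records only the accept/reject decision on the null hypothesis H0 at the 95% and 99% confidence levels. As an illustrative sketch only (the thesis follows the methodology of Section 2.7; the function names and accuracy values here are hypothetical, not the thesis data), such a decision can be computed from two sets of per-trial accuracies using Welch's t statistic compared against a large-sample critical value:

```python
import math

def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples of accuracies."""
    na, nb = len(sample_a), len(sample_b)
    ma = sum(sample_a) / na
    mb = sum(sample_b) / nb
    va = sum((x - ma) ** 2 for x in sample_a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in sample_b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

def decision(sample_a, sample_b, t_critical):
    """'reject' H0 (no difference in mean accuracy) if |t| exceeds the
    critical value for the chosen confidence level, otherwise 'accept'."""
    return "reject" if abs(welch_t(sample_a, sample_b)) > t_critical else "accept"

# Hypothetical per-trial accuracies for two networks over ten trials.
net_a = [0.91, 0.89, 0.92, 0.90, 0.88, 0.93, 0.91, 0.90, 0.89, 0.92]
net_b = [0.84, 0.82, 0.85, 0.83, 0.81, 0.86, 0.84, 0.83, 0.82, 0.85]

# Large-sample (normal-approximation) critical values: 1.96 at 95%, 2.58 at 99%.
print(decision(net_a, net_b, 1.96))  # decision at the 95% level
print(decision(net_a, net_b, 2.58))  # decision at the 99% level
```

Because the 99% critical value is larger, a difference can be significant at the 95% level but not at the 99% level, which is how a table can read "reject" in one row and "accept" in the other.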
A.1
Two Spirals
This section presents the results of the statistical hypothesis tests for the experiments over the two spirals data sets. The results are as follows:
Table A.1 presents the results of testing the hypotheses listed in Table 2.1 on page 35.
Table A.2 presents the results of testing the hypotheses listed in Table 2.2 on page 35 over the results for MLP.
Table A.3 presents the results of testing the hypotheses listed in Table 2.2 on page 35 over the results for FuNN.
Table A.4 presents the results of testing the hypotheses listed in Table 2.3 on page 35.
Hypothesis  AA      AB      AC      AF      BA      BB      BC      BF
95%         reject  reject  reject  reject  reject  reject  reject  reject
99%         accept  reject  reject  reject  reject  reject  reject  reject

Table A.1: Rejection / acceptance of H0 for MLP vs. FuNN for the two spirals problem.
Hypothesis  ΔA      ΔB      ΔC      ΔF
95%         reject  reject  reject  reject
99%         reject  reject  reject  reject

Table A.2: Rejection / acceptance of H0 for change in accuracies of MLP for the two spirals problem.
Hypothesis  ΔA      ΔB      ΔC      ΔF
95%         reject  reject  reject  reject
99%         reject  reject  reject  accept

Table A.3: Rejection / acceptance of H0 for change in accuracies for FuNN for the two spirals problem.

Hypothesis  ΔA      ΔB      ΔC      ΔF
95%         reject  reject  accept  accept
99%         reject  reject  accept  accept

Table A.4: Rejection / acceptance of H0 for change in accuracies of MLP vs. change in accuracies of FuNN for the two spirals problem.
A.2
Iris Classification
This section presents the results of the statistical hypothesis tests for the experiments over the iris classification data sets. The tables are as follows:
Table A.5 presents the results of testing the hypotheses listed in Table 2.1 on page 35.
Table A.6 presents the results of testing the hypotheses listed in Table 2.2 on page 35 over the results for MLP.
Table A.7 presents the results of testing the hypotheses listed in Table 2.2 on page 35 over the results for FuNN.
Table A.8 presents the results of testing the hypotheses listed in Table 2.3 on page 35.
Hypothesis  AA      AB      AC      AF      BA      BB      BC      BF
95%         reject  reject  reject  reject  reject  reject  reject  reject
99%         reject  reject  reject  reject  reject  reject  reject  reject

Table A.5: Rejection / acceptance of H0 for MLP vs. FuNN for the iris classification problem.
Hypothesis  ΔA      ΔB      ΔC      ΔF
95%         reject  reject  reject  accept
99%         reject  reject  reject  accept

Table A.6: Rejection / acceptance of H0 for change in accuracies of MLP for the iris classification problem.

Hypothesis  ΔA      ΔB      ΔC      ΔF
95%         reject  reject  reject  reject
99%         reject  reject  reject  reject

Table A.7: Rejection / acceptance of H0 for change in accuracies for FuNN for the iris classification problem.
A.3
Mackey-Glass
This section presents the results of the statistical hypothesis tests for the experiments over the Mackey-Glass data sets. The tables are as follows:
Table A.9 presents the results of testing the hypotheses listed in Table 2.1 on page 35.
Table A.10 presents the results of testing the hypotheses listed in Table 2.2 on page 35 over the results for MLP.
Table A.11 presents the results of testing the hypotheses listed in Table 2.2 on page 35 over the results for FuNN.
Table A.12 presents the results of testing the hypotheses listed in Table 2.3 on page 35.
Hypothesis  ΔA      ΔB      ΔC      ΔF
95%         reject  reject  accept  reject
99%         accept  reject  accept  reject

Table A.8: Rejection / acceptance of H0 for change in accuracies of MLP vs. change in accuracies of FuNN for the iris classification problem.
Hypothesis  AA      AB      AC      AF      BA      BB      BC      BF
95%         reject  reject  reject  reject  reject  reject  reject  reject
99%         reject  reject  reject  reject  reject  reject  reject  reject

Table A.9: Rejection / acceptance of H0 for MLP vs. FuNN for the Mackey-Glass problem.

Hypothesis  ΔA      ΔB      ΔC      ΔF
95%         reject  reject  reject  reject
99%         reject  reject  reject  reject

Table A.10: Rejection / acceptance of H0 for change in accuracies of MLP for the Mackey-Glass problem.
A.4
Gas Furnace
This section presents the results of the statistical hypothesis tests for the experiments over the gas furnace data sets. The tables are as follows:
Table A.13 presents the results of testing the hypotheses listed in Table 2.1 on page 35.
Table A.14 presents the results of testing the hypotheses listed in Table 2.2 on page 35 over the results for MLP.
Table A.15 presents the results of testing the hypotheses listed in Table 2.2 on page 35 over the results for FuNN.
Table A.16 presents the results of testing the hypotheses listed in Table 2.3 on page 35.
Hypothesis  ΔA      ΔB      ΔC      ΔF
95%         reject  reject  reject  reject
99%         reject  reject  reject  reject

Table A.11: Rejection / acceptance of H0 for change in accuracies for FuNN for the Mackey-Glass problem.
Hypothesis  ΔA      ΔB      ΔC      ΔF
95%         reject  reject  reject  reject
99%         reject  reject  reject  reject

Table A.12: Rejection / acceptance of H0 for change in accuracies of MLP vs. change in accuracies of FuNN for the Mackey-Glass problem.
Hypothesis  AA      AB      AC      AF      BA      BB      BC      BF
95%         reject  reject  reject  reject  reject  reject  reject  reject
99%         reject  reject  reject  reject  reject  reject  reject  reject

Table A.13: Rejection / acceptance of H0 for MLP vs. FuNN for the gas furnace problem.
Hypothesis  ΔA      ΔB      ΔC      ΔF
95%         reject  reject  reject  reject
99%         reject  reject  reject  reject

Table A.14: Rejection / acceptance of H0 for change in accuracies of MLP for the gas furnace problem.
Hypothesis  ΔA      ΔB      ΔC      ΔF
95%         reject  reject  reject  reject
99%         reject  reject  reject  reject

Table A.15: Rejection / acceptance of H0 for change in accuracies for FuNN for the gas furnace problem.
Hypothesis  ΔA      ΔB      ΔC      ΔF
95%         reject  reject  reject  reject
99%         reject  reject  reject  reject

Table A.16: Rejection / acceptance of H0 for change in accuracies of MLP vs. change in accuracies of FuNN for the gas furnace problem.
Appendix B
Results of Hypothesis Tests for Experiments with EFuNN and SECoS

This Appendix presents the results of the statistical hypothesis tests for the experiments with the EFuNN and SECoS algorithms, as presented in Section 4.13. Each section in this appendix presents the results of the tests over one of the benchmark data sets.
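Several of the tables below test the change in per-trial accuracy after further training, i.e. the degree of forgetting. As a hedged sketch only (the function name and accuracy values below are hypothetical, not the thesis data), a one-sample t test on the per-trial accuracy changes can test H0 that further training causes no change in recall accuracy:

```python
import math

def forgetting_t(acc_before, acc_after):
    """One-sample t statistic on per-trial accuracy changes, testing
    H0: mean change in recall accuracy after further training is zero."""
    deltas = [pre - post for pre, post in zip(acc_before, acc_after)]
    n = len(deltas)
    mean = sum(deltas) / n
    var = sum((d - mean) ** 2 for d in deltas) / (n - 1)
    return mean / math.sqrt(var / n)

# Hypothetical recall accuracies on one data set, before and after the
# network is further trained on a second data set.
before = [0.90, 0.91, 0.89, 0.92, 0.90, 0.88, 0.91, 0.90, 0.89, 0.90]
after = [0.85, 0.87, 0.83, 0.88, 0.84, 0.84, 0.85, 0.86, 0.83, 0.85]

t = forgetting_t(before, after)
# Large-sample critical value 2.58 for the 99% confidence level.
print("reject" if abs(t) > 2.58 else "accept")
```

A rejected H0 here means the drop in accuracy is statistically significant, i.e. measurable forgetting occurred; an accepted H0 means the algorithm retained its earlier knowledge within sampling noise.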
B.1 Two Spirals

This section presents the results of the tests for the experiments over the two spirals data set. The results are as follows:
Table B.1 presents the results of the tests of the hypotheses presented in Table 4.3 on page 80, that is, the results of comparing SECoS to EFuNN.
Table B.2 presents the results of the tests of the hypotheses presented in Table 4.4 on page 81, that is, the results of comparing MLP to SECoS.
Table B.3 presents the results of the tests of the hypotheses presented in Table 4.5 on page 81, that is, the results of comparing FuNN to EFuNN.
Table B.4 presents the results of the tests of the hypotheses presented in Table 2.2 on page 35 on SECoS, that is, the results of testing the forgetting of SECoS.
Table B.5 presents the results of the tests of the hypotheses presented in Table 2.2 on page 35 on EFuNN, that is, the results of testing the forgetting of EFuNN.
Table B.6 presents the results of the tests of the hypotheses listed in Table 4.6 on page 82, that is, the results of comparing the degree of forgetting of SECoS and EFuNN.
Table B.7 presents the results of testing the hypotheses listed in Table 4.7 on page 82, that is, the results of comparing the degree of forgetting of MLP and SECoS.
Table B.8 presents the results of testing the hypotheses listed in Table 4.8 on page 82, that is, the results of comparing the forgetting of FuNN and EFuNN.
Hypothesis  AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%         reject  reject  accept  reject  reject  reject  accept  reject  reject  reject
99%         reject  reject  accept  reject  reject  reject  accept  reject  reject  reject

Table B.1: Rejection / acceptance of H0 for EFuNN vs. SECoS for the two spirals.

Hypothesis  AA      AB      AC      AF      BA      BB      BC      BF
95%         reject  reject  reject  reject  reject  reject  reject  reject
99%         reject  reject  reject  accept  reject  reject  reject  reject

Table B.2: Rejection / acceptance of H0 for MLP vs. SECoS for the two spirals.
B.2 Iris Classification

This section presents the results of the tests for the experiments over the iris classification data set. The results are as follows:
Table B.9 presents the results of the tests of the hypotheses presented in Table 4.3 on page 80, that is, the results of comparing SECoS to EFuNN.
Table B.10 presents the results of the tests of the hypotheses presented in Table 4.4 on page 81, that is, the results of comparing MLP to SECoS.
Table B.11 presents the results of the tests of the hypotheses presented in Table 4.5 on page 81, that is, the results of comparing FuNN to EFuNN.
Table B.12 presents the results of the tests of the hypotheses presented in Table 2.2 on page 35 on SECoS, that is, the results of testing the forgetting of SECoS.
Table B.13 presents the results of the tests of the hypotheses presented in Table 2.2 on page 35 on EFuNN, that is, the results of testing the forgetting of EFuNN.
Table B.14 presents the results of the tests of the hypotheses listed in Table 4.6 on page 82, that is, the results of comparing the degree of forgetting of SECoS and EFuNN.
Table B.15 presents the results of testing the hypotheses listed in Table 4.7 on page 82, that is, the results of comparing the degree of forgetting of MLP and SECoS.
Hypothesis  AA      AB      AC      AF      BA      BB      BC      BF
95%         reject  reject  reject  reject  reject  reject  reject  reject
99%         reject  reject  reject  reject  reject  reject  reject  reject

Table B.3: Rejection / acceptance of H0 for FuNN vs. EFuNN for the two spirals.
Hypothesis  ΔA      ΔB      ΔC      ΔF
95%         accept  reject  reject  accept
99%         accept  reject  reject  accept

Table B.4: Rejection / acceptance of H0 for change in accuracies of SECoS for the two spirals problem.

Hypothesis  ΔA      ΔB      ΔC      ΔF
95%         reject  reject  accept  accept
99%         reject  reject  accept  accept

Table B.5: Rejection / acceptance of H0 for change in accuracies of EFuNN for the two spirals problem.
Table B.16 presents the results of testing the hypotheses listed in Table 4.8 on page 82, that is, the results of comparing the forgetting of FuNN and EFuNN.
Hypothesis  ΔA      ΔB      ΔC      ΔF
95%         accept  reject  reject  accept
99%         accept  reject  reject  accept

Table B.6: Rejection / acceptance of H0 for change in accuracies of SECoS vs. change in accuracies of EFuNN for the two spirals problem.
Hypothesis  ΔA      ΔB      ΔC      ΔF
95%         reject  reject  reject  accept
99%         reject  reject  reject  accept

Table B.7: Rejection / acceptance of H0 for change in accuracies of MLP vs. change in accuracies of SECoS for the two spirals problem.

Hypothesis  ΔA      ΔB      ΔC      ΔF
95%         reject  reject  reject  accept
99%         reject  reject  reject  accept

Table B.8: Rejection / acceptance of H0 for change in accuracies of FuNN vs. change in accuracies of EFuNN for the two spirals problem.
B.3 Mackey-Glass

This section presents the results of the tests for the experiments over the Mackey-Glass data set. The results are as follows:
Table B.17 presents the results of the tests of the hypotheses presented in Table 4.3 on page 80, that is, the results of comparing SECoS to EFuNN.
Table B.18 presents the results of the tests of the hypotheses presented in Table 4.4 on page 81, that is, the results of comparing MLP to SECoS.
Table B.19 presents the results of the tests of the hypotheses presented in Table 4.5 on page 81, that is, the results of comparing FuNN to EFuNN.
Table B.20 presents the results of the tests of the hypotheses presented in Table 2.2 on page 35 on SECoS, that is, the results of testing the forgetting of SECoS.
Table B.21 presents the results of the tests of the hypotheses presented in Table 2.2 on page 35 on EFuNN, that is, the results of testing the forgetting of EFuNN.
Table B.22 presents the results of the tests of the hypotheses listed in Table 4.6 on page 82, that is, the results of comparing the degree of forgetting of SECoS and EFuNN.
Hypothesis  AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%         accept  accept  accept  accept  reject  accept  accept  accept  accept  reject
99%         accept  accept  accept  accept  reject  accept  accept  accept  accept  reject

Table B.9: Rejection / acceptance of H0 for EFuNN vs. SECoS for the iris classification problem.
Hypothesis  AA      AB      AC      AF      BA      BB      BC      BF
95%         accept  reject  reject  accept  reject  reject  reject  reject
99%         accept  reject  reject  accept  reject  reject  reject  accept

Table B.10: Rejection / acceptance of H0 for MLP vs. SECoS for the iris classification problem.

Hypothesis  AA      AB      AC      AF      BA      BB      BC      BF
95%         reject  reject  reject  reject  reject  reject  reject  reject
99%         reject  reject  reject  reject  reject  reject  reject  reject

Table B.11: Rejection / acceptance of H0 for FuNN vs. EFuNN for the iris classification problem.
Table B.23 presents the results of testing the hypotheses listed in Table 4.7 on page 82, that is, the results of comparing the degree of forgetting of MLP and SECoS.
Table B.24 presents the results of testing the hypotheses listed in Table 4.8 on page 82, that is, the results of comparing the forgetting of FuNN and EFuNN.
Hypothesis  ΔA      ΔB      ΔC      ΔF
95%         accept  reject  accept  accept
99%         accept  accept  accept  accept

Table B.12: Rejection / acceptance of H0 for change in accuracies of SECoS for the iris classification problem.
Hypothesis  ΔA      ΔB      ΔC      ΔF
95%         accept  reject  accept  accept
99%         accept  accept  accept  accept

Table B.13: Rejection / acceptance of H0 for change in accuracies of EFuNN for the iris classification problem.

Hypothesis  ΔA      ΔB      ΔC      ΔF
95%         accept  accept  accept  accept
99%         accept  accept  accept  accept

Table B.14: Rejection / acceptance of H0 for change in accuracies of SECoS vs. change in accuracies of EFuNN for the iris classification problem.
B.4 Gas Furnace

This section presents the results of the tests for the experiments over the gas furnace data set. The results are as follows:
Table B.25 presents the results of the tests of the hypotheses presented in Table 4.3 on page 80, that is, the results of comparing SECoS to EFuNN.
Table B.26 presents the results of the tests of the hypotheses presented in Table 4.4 on page 81, that is, the results of comparing MLP to SECoS.
Table B.27 presents the results of the tests of the hypotheses presented in Table 4.5 on page 81, that is, the results of comparing FuNN to EFuNN.
Table B.28 presents the results of the tests of the hypotheses presented in Table 2.2 on page 35 on SECoS, that is, the results of testing the forgetting of SECoS.
Table B.29 presents the results of the tests of the hypotheses presented in Table 2.2 on page 35 on EFuNN, that is, the results of testing the forgetting of EFuNN.
Table B.30 presents the results of the tests of the hypotheses listed in Table 4.6 on page 82, that is, the results of comparing the degree of forgetting of SECoS and EFuNN.
Hypothesis  ΔA      ΔB      ΔC      ΔF
95%         reject  reject  reject  reject
99%         reject  reject  reject  reject

Table B.15: Rejection / acceptance of H0 for change in accuracies of MLP vs. change in accuracies of SECoS for the iris classification problem.
Hypothesis  ΔA      ΔB      ΔC      ΔF
95%         reject  reject  reject  reject
99%         reject  accept  reject  reject

Table B.16: Rejection / acceptance of H0 for change in accuracies of FuNN vs. change in accuracies of EFuNN for the iris classification problem.

Hypothesis  AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%         reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
99%         reject  reject  reject  reject  reject  reject  reject  reject  reject  reject

Table B.17: Rejection / acceptance of H0 for EFuNN vs. SECoS for the Mackey-Glass data set.
Table B.31 presents the results of testing the hypotheses listed in Table 4.7 on page 82, that is, the results of comparing the degree of forgetting of MLP and SECoS.
Table B.32 presents the results of testing the hypotheses listed in Table 4.8 on page 82, that is, the results of comparing the forgetting of FuNN and EFuNN.
Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          accept  accept  accept  accept  accept  accept  accept  accept
99%          accept  accept  accept  accept  accept  accept  accept  accept

Table B.18: Rejection / acceptance of H0 for MLP vs. SECoS for the Mackey-Glass data set.
Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          accept  accept  accept  accept  accept  accept  accept  accept
99%          accept  accept  accept  accept  accept  accept  accept  accept

Table B.19: Rejection / acceptance of H0 for FuNN vs. EFuNN for the Mackey-Glass data set.

Hypothesis   ΔA      ΔB      ΔC      ΔF
95%          reject  reject  accept  accept
99%          accept  reject  accept  accept

Table B.20: Rejection / acceptance of H0 for change in accuracies of SECoS for the Mackey-Glass problem.

Hypothesis   ΔA      ΔB      ΔC      ΔF
95%          accept  reject  accept  reject
99%          accept  reject  accept  reject

Table B.21: Rejection / acceptance of H0 for change in accuracies of EFuNN for the Mackey-Glass problem.

Hypothesis   ΔA      ΔB      ΔC      ΔF
95%          accept  reject  accept  accept
99%          accept  reject  accept  accept

Table B.22: Rejection / acceptance of H0 for change in accuracies of SECoS vs. change in accuracies of EFuNN for the Mackey-Glass problem.
Hypothesis   ΔA      ΔB      ΔC      ΔF
95%          accept  accept  accept  accept
99%          accept  accept  accept  accept

Table B.23: Rejection / acceptance of H0 for change in accuracies of MLP vs. change in accuracies of SECoS for the Mackey-Glass problem.
Hypothesis   ΔA      ΔB      ΔC      ΔF
95%          accept  accept  accept  accept
99%          accept  accept  accept  accept

Table B.24: Rejection / acceptance of H0 for change in accuracies of FuNN vs. change in accuracies of EFuNN for the Mackey-Glass problem.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          accept  reject  reject  accept  reject  accept  reject  reject  accept  reject
99%          accept  accept  accept  accept  reject  accept  reject  accept  accept  reject

Table B.25: Rejection / acceptance of H0 for EFuNN vs. SECoS for the gas furnace problem.

Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          reject  reject  reject  reject  reject  accept  reject  reject
99%          reject  reject  reject  reject  reject  accept  reject  reject

Table B.26: Rejection / acceptance of H0 for MLP vs. SECoS for the gas furnace problem.

Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          accept  accept  accept  accept  reject  accept  reject  reject
99%          accept  accept  accept  accept  reject  accept  reject  reject

Table B.27: Rejection / acceptance of H0 for FuNN vs. EFuNN for the gas furnace problem.

Hypothesis   ΔA      ΔB      ΔC      ΔF
95%          reject  reject  accept  accept
99%          accept  reject  accept  accept

Table B.28: Rejection / acceptance of H0 for change in accuracies of SECoS for the gas furnace problem.

Hypothesis   ΔA      ΔB      ΔC      ΔF
95%          reject  reject  accept  accept
99%          accept  reject  accept  accept

Table B.29: Rejection / acceptance of H0 for change in accuracies of EFuNN for the gas furnace problem.

Hypothesis   ΔA      ΔB      ΔC      ΔF
95%          accept  accept  accept  accept
99%          accept  accept  accept  accept

Table B.30: Rejection / acceptance of H0 for change in accuracies of SECoS vs. change in accuracies of EFuNN for the gas furnace problem.

Hypothesis   ΔA      ΔB      ΔC      ΔF
95%          reject  reject  reject  accept
99%          reject  reject  accept  accept

Table B.31: Rejection / acceptance of H0 for change in accuracies of MLP vs. change in accuracies of SECoS for the gas furnace problem.
Hypothesis   ΔA      ΔB      ΔC      ΔF
95%          accept  reject  accept  accept
99%          accept  reject  accept  accept

Table B.32: Rejection / acceptance of H0 for change in accuracies of FuNN vs. change in accuracies of EFuNN for the gas furnace problem.
Appendix C

Results of Hypothesis Tests for Experiments with Rule Extraction

This Appendix presents the results of the statistical hypothesis tests for the experiments with the rule extraction and rule insertion algorithms for EFuNN and SECoS.
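The tables that follow record, for each hypothesis, whether H0 was accepted or rejected at the 95% and 99% confidence levels. As an illustration only, the sketch below shows how such a decision can be made with a two-sample Welch's t-test over accuracies from repeated trials; the accuracy values, sample sizes and critical values here are hypothetical and are not taken from the experiments in this thesis.

```python
import math

def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples."""
    na, nb = len(sample_a), len(sample_b)
    ma = sum(sample_a) / na
    mb = sum(sample_b) / nb
    # Unbiased sample variances.
    va = sum((x - ma) ** 2 for x in sample_a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in sample_b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

def decide(t_stat, critical):
    """Reject H0 (no difference in mean accuracy) if |t| exceeds the critical value."""
    return "reject" if abs(t_stat) > critical else "accept"

# Hypothetical accuracies of two networks over ten trials each.
secos = [0.91, 0.89, 0.92, 0.90, 0.93, 0.88, 0.91, 0.92, 0.90, 0.89]
efunn = [0.85, 0.84, 0.86, 0.83, 0.85, 0.84, 0.86, 0.85, 0.83, 0.84]

t = welch_t(secos, efunn)
# Two-tailed critical values for roughly 18 degrees of freedom.
print(decide(t, 2.101))  # 95% level -> reject
print(decide(t, 2.878))  # 99% level -> reject
```

A result can thus be rejected at the 95% level yet accepted at the 99% level, which is the pattern seen in several of the tables below.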
C.1 Two Spirals
This section presents the results of the statistical hypothesis tests for the two spirals data set. The results are as follows:
Table C.1 presents the results of testing the hypotheses presented in Table 6.5 on page 141, that is, the comparison of the accuracy of the extracted Zadeh-Mamdani rules and the accuracy of the original SECoS network.
Table C.2 presents the results of testing the hypotheses presented in Table 6.5 on page 141 over the EFuNN-derived rules, that is, the comparison of the accuracy of the extracted Zadeh-Mamdani rules and the accuracy of the original EFuNN network.
Table C.3 presents the results of testing the hypotheses listed in Table 6.5 over the SECoS-derived Takagi-Sugeno rules, that is, the comparison of the accuracy of the extracted Takagi-Sugeno rules to the accuracy of the original SECoS network.
Table C.4 presents the results of testing the hypotheses listed in Table 6.7 on page 141, that is, the comparison of the accuracy of the Zadeh-Mamdani rules extracted from SECoS with the accuracy of the Zadeh-Mamdani rules extracted from EFuNN.
Table C.5 presents the results of testing the hypotheses listed in Table 6.8 on page 142, that is, the comparison of the accuracy of the Zadeh-Mamdani rules extracted from SECoS with the accuracy of the Takagi-Sugeno rules extracted from SECoS.
Table C.6 presents the results of testing the hypotheses presented in Table 6.9 on page 142 for the SECoS rule insertion algorithm, that is, the comparison of the accuracy of the Zadeh-Mamdani rules extracted from SECoS with the accuracy of the SECoS created via the insertion of those rules.
APPENDIX C. RESULTS OF HYPOTHESIS TESTS FOR EXPERIMENTS WITH RULE EXTRACTION 261
Table C.7 presents the results of testing the hypotheses presented in Table 6.9 on page 142 for the EFuNN rule insertion algorithm, that is, the comparison of the accuracy of the Zadeh-Mamdani rules extracted from EFuNN with the accuracy of the EFuNN created via the insertion of those rules.
Table C.8 presents the results of testing the hypotheses listed in Table 6.10 on page 142 for the SECoS rule insertion algorithm, that is, the comparison of the accuracy of the original SECoS network with the SECoS created via insertion of Zadeh-Mamdani fuzzy rules.
Table C.9 presents the results of testing the hypotheses listed in Table 6.10 on page 142 for the EFuNN rule insertion algorithm, that is, the comparison of the accuracy of the original EFuNN network with the EFuNN created via insertion of Zadeh-Mamdani fuzzy rules.

Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          reject  reject  reject  reject  reject  accept  accept  reject
99%          reject  reject  reject  reject  reject  accept  accept  accept

Table C.1: Rejection / acceptance of H0 for SECoS vs. Zadeh-Mamdani fuzzy rules extracted from SECoS, for the two spirals problem.
Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          reject  reject  accept  reject  reject  accept  accept  reject
99%          reject  accept  accept  accept  reject  accept  accept  accept

Table C.2: Rejection / acceptance of H0 for EFuNN vs. Zadeh-Mamdani fuzzy rules extracted from EFuNN, for the two spirals problem.
Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          reject  reject  accept  reject  reject  accept  accept  reject
99%          reject  reject  accept  reject  reject  accept  accept  accept

Table C.3: Rejection / acceptance of H0 for SECoS vs. Takagi-Sugeno fuzzy rules extracted from SECoS, for the two spirals problem.
C.2 Iris Classification
This section presents the results of the statistical hypothesis tests for the iris classification data set. The results are as follows:
Table C.10 presents the results of testing the hypotheses presented in Table 6.5 on page 141, that is, the comparison of the accuracy of the extracted Zadeh-Mamdani rules and the accuracy of the original SECoS network.
APPENDIX C. RESULTS OF HYPOTHESIS TESTS FOR EXPERIMENTS WITH RULE EXTRACTION 262
Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          reject  reject  accept  reject  reject  accept  accept  reject
99%          reject  accept  accept  reject  accept  accept  accept  accept

Table C.4: Rejection / acceptance of H0 for Zadeh-Mamdani rules extracted from SECoS vs. Zadeh-Mamdani rules extracted from EFuNN for the two spirals problem.

Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject

Table C.5: Rejection / acceptance of H0 for Zadeh-Mamdani fuzzy rules extracted from SECoS vs. Takagi-Sugeno fuzzy rules extracted from SECoS, for the two spirals problem.
Table C.11 presents the results of testing the hypotheses presented in Table 6.5 on page 141 over the EFuNN-derived rules, that is, the comparison of the accuracy of the extracted Zadeh-Mamdani rules and the accuracy of the original EFuNN network.
Table C.12 presents the results of testing the hypotheses listed in Table 6.5 over the SECoS-derived Takagi-Sugeno rules, that is, the comparison of the accuracy of the extracted Takagi-Sugeno rules to the accuracy of the original SECoS network.
Table C.13 presents the results of testing the hypotheses listed in Table 6.7 on page 141, that is, the comparison of the accuracy of the Zadeh-Mamdani rules extracted from SECoS with the accuracy of the Zadeh-Mamdani rules extracted from EFuNN.
Table C.14 presents the results of testing the hypotheses listed in Table 6.8 on page 142, that is, the comparison of the accuracy of the Zadeh-Mamdani rules extracted from SECoS with the accuracy of the Takagi-Sugeno rules extracted from SECoS.
Table C.15 presents the results of testing the hypotheses presented in Table 6.9 on page 142 for the SECoS rule insertion algorithm, that is, the comparison of the accuracy of the Zadeh-Mamdani rules extracted from SECoS with the accuracy of the SECoS created via the insertion of those rules.
Table C.16 presents the results of testing the hypotheses presented in Table 6.9 on page 142 for the EFuNN
Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          reject  accept  accept  reject  accept  accept  accept  accept
99%          accept  accept  accept  accept  accept  accept  accept  accept

Table C.6: Rejection / acceptance of H0 for Zadeh-Mamdani fuzzy rules extracted from SECoS vs. SECoS created by insertion of Zadeh-Mamdani rules, for the two spirals problem.
Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          reject  accept  accept  accept  reject  accept  accept  accept
99%          accept  accept  accept  accept  accept  accept  accept  accept

Table C.7: Rejection / acceptance of H0 for Zadeh-Mamdani fuzzy rules extracted from EFuNN vs. EFuNN created by insertion of Zadeh-Mamdani rules, for the two spirals problem.

Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          accept  accept  accept  accept  accept  accept  accept  accept
99%          accept  accept  accept  accept  accept  accept  accept  accept

Table C.8: Rejection / acceptance of H0 for SECoS vs. SECoS created by insertion of Zadeh-Mamdani rules, for the two spirals problem.

rule insertion algorithm, that is, the comparison of the accuracy of the Zadeh-Mamdani rules extracted from EFuNN with the accuracy of the EFuNN created via the insertion of those rules.
Table C.17 presents the results of testing the hypotheses listed in Table 6.10 on page 142 for the SECoS rule insertion algorithm, that is, the comparison of the accuracy of the original SECoS network with the SECoS created via insertion of Zadeh-Mamdani fuzzy rules.
Table C.18 presents the results of testing the hypotheses listed in Table 6.10 on page 142 for the EFuNN rule insertion algorithm, that is, the comparison of the accuracy of the original EFuNN network with the EFuNN created via insertion of Zadeh-Mamdani fuzzy rules.
C.3 Mackey-Glass
This section presents the results of the statistical hypothesis tests for the Mackey-Glass data set. The results are as follows:
Table C.19 presents the results of testing the hypotheses presented in Table 6.6 on page 141, that is, the comparison of the accuracy of the extracted Zadeh-Mamdani rules and the accuracy of the original SECoS network.

Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          reject  accept  accept  accept  reject  reject  accept  reject
99%          reject  accept  accept  accept  accept  reject  accept  accept

Table C.9: Rejection / acceptance of H0 for EFuNN vs. EFuNN created by insertion of Zadeh-Mamdani rules, for the two spirals problem.
Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject

Table C.10: Rejection / acceptance of H0 for SECoS vs. Zadeh-Mamdani fuzzy rules extracted from SECoS, for the iris classification problem.

Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  accept  reject  reject  reject  reject  reject  reject

Table C.11: Rejection / acceptance of H0 for EFuNN vs. Zadeh-Mamdani fuzzy rules extracted from EFuNN, for the iris classification problem.
Table C.20 presents the results of testing the hypotheses presented in Table 6.6 on page 141 over the EFuNN-derived rules, that is, the comparison of the accuracy of the extracted Zadeh-Mamdani rules and the accuracy of the original EFuNN network.
Table C.21 presents the results of testing the hypotheses listed in Table 6.6 over the SECoS-derived Takagi-Sugeno rules, that is, the comparison of the accuracy of the extracted Takagi-Sugeno rules to the accuracy of the original SECoS network.
Table C.22 presents the results of testing the hypotheses listed in Table 6.7 on page 141, that is, the comparison of the accuracy of the Zadeh-Mamdani rules extracted from SECoS with the accuracy of the Zadeh-Mamdani rules extracted from EFuNN.
Table C.23 presents the results of testing the hypotheses listed in Table 6.8 on page 142, that is, the comparison of the accuracy of the Zadeh-Mamdani rules extracted from SECoS with the accuracy of the Takagi-Sugeno rules extracted from SECoS.
Table C.24 presents the results of testing the hypotheses presented in Table 6.9 on page 142 for the SECoS rule insertion algorithm, that is, the comparison of the accuracy of the Zadeh-Mamdani rules extracted from SECoS with the accuracy of the SECoS created via the insertion of those rules.
Table C.25 presents the results of testing the hypotheses presented in Table 6.9 on page 142 for the EFuNN
Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject

Table C.12: Rejection / acceptance of H0 for SECoS vs. Takagi-Sugeno fuzzy rules extracted from SECoS, for the iris classification problem.
Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          reject  accept  reject  reject  reject  reject  accept  reject
99%          reject  accept  accept  reject  reject  accept  accept  reject

Table C.13: Rejection / acceptance of H0 for Zadeh-Mamdani rules extracted from SECoS vs. Zadeh-Mamdani rules extracted from EFuNN for the iris classification problem.

Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject

Table C.14: Rejection / acceptance of H0 for Zadeh-Mamdani fuzzy rules extracted from SECoS vs. Takagi-Sugeno fuzzy rules extracted from SECoS, for the iris classification problem.

rule insertion algorithm, that is, the comparison of the accuracy of the Zadeh-Mamdani rules extracted from EFuNN with the accuracy of the EFuNN created via the insertion of those rules.
Table C.26 presents the results of testing the hypotheses listed in Table 6.10 on page 142 for the SECoS rule insertion algorithm, that is, the comparison of the accuracy of the original SECoS network with the SECoS created via insertion of Zadeh-Mamdani fuzzy rules.
Table C.27 presents the results of testing the hypotheses listed in Table 6.10 on page 142 for the EFuNN rule insertion algorithm, that is, the comparison of the accuracy of the original EFuNN network with the EFuNN created via insertion of Zadeh-Mamdani fuzzy rules.
C.4 Gas Furnace
This section presents the results of the statistical hypothesis tests for the gas furnace data set. The results are as follows:
Table C.28 presents the results of testing the hypotheses presented in Table 6.6 on page 141, that is, the comparison of the accuracy of the extracted Zadeh-Mamdani rules and the accuracy of the original SECoS network.

Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          accept  accept  accept  accept  accept  accept  accept  accept
99%          accept  accept  accept  accept  accept  accept  accept  accept

Table C.15: Rejection / acceptance of H0 for Zadeh-Mamdani fuzzy rules extracted from SECoS vs. SECoS created by insertion of Zadeh-Mamdani rules, for the iris classification problem.
Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  accept  reject  reject  reject  reject  reject  reject

Table C.16: Rejection / acceptance of H0 for Zadeh-Mamdani fuzzy rules extracted from EFuNN vs. EFuNN created by insertion of Zadeh-Mamdani rules, for the iris classification problem.

Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  accept  reject

Table C.17: Rejection / acceptance of H0 for SECoS vs. SECoS created by insertion of Zadeh-Mamdani rules, for the iris classification problem.
Table C.29 presents the results of testing the hypotheses presented in Table 6.6 on page 141 over the EFuNN-derived rules, that is, the comparison of the accuracy of the extracted Zadeh-Mamdani rules and the accuracy of the original EFuNN network.
Table C.30 presents the results of testing the hypotheses listed in Table 6.6 over the SECoS-derived Takagi-Sugeno rules, that is, the comparison of the accuracy of the extracted Takagi-Sugeno rules to the accuracy of the original SECoS network.
Table C.31 presents the results of testing the hypotheses listed in Table 6.7 on page 141, that is, the comparison of the accuracy of the Zadeh-Mamdani rules extracted from SECoS with the accuracy of the Zadeh-Mamdani rules extracted from EFuNN.
Table C.32 presents the results of testing the hypotheses listed in Table 6.8 on page 142, that is, the comparison of the accuracy of the Zadeh-Mamdani rules extracted from SECoS with the accuracy of the Takagi-Sugeno rules extracted from SECoS.
Table C.33 presents the results of testing the hypotheses presented in Table 6.9 on page 142 for the SECoS rule insertion algorithm, that is, the comparison of the accuracy of the Zadeh-Mamdani rules extracted from SECoS with the accuracy of the SECoS created via the insertion of those rules.
Table C.34 presents the results of testing the hypotheses presented in Table 6.9 on page 142 for the EFuNN
Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          accept  accept  accept  accept  reject  accept  accept  reject
99%          accept  accept  accept  accept  accept  accept  accept  accept

Table C.18: Rejection / acceptance of H0 for EFuNN vs. EFuNN created by insertion of Zadeh-Mamdani rules, for the iris classification problem.
Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject

Table C.19: Rejection / acceptance of H0 for SECoS vs. Zadeh-Mamdani fuzzy rules extracted from SECoS, for the Mackey-Glass problem.

Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject

Table C.20: Rejection / acceptance of H0 for EFuNN vs. Zadeh-Mamdani fuzzy rules extracted from EFuNN, for the Mackey-Glass problem.

rule insertion algorithm, that is, the comparison of the accuracy of the Zadeh-Mamdani rules extracted from EFuNN with the accuracy of the EFuNN created via the insertion of those rules.
Table C.35 presents the results of testing the hypotheses listed in Table 6.10 on page 142 for the SECoS rule insertion algorithm, that is, the comparison of the accuracy of the original SECoS network with the SECoS created via insertion of Zadeh-Mamdani fuzzy rules.
Table C.36 presents the results of testing the hypotheses listed in Table 6.10 on page 142 for the EFuNN rule insertion algorithm, that is, the comparison of the accuracy of the original EFuNN network with the EFuNN created via insertion of Zadeh-Mamdani fuzzy rules.
Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject

Table C.21: Rejection / acceptance of H0 for SECoS vs. Takagi-Sugeno fuzzy rules extracted from SECoS, for the Mackey-Glass problem.
Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          accept  accept  accept  accept  accept  accept  accept  accept
99%          accept  accept  accept  accept  accept  accept  accept  accept

Table C.22: Rejection / acceptance of H0 for Zadeh-Mamdani rules extracted from SECoS vs. Zadeh-Mamdani rules extracted from EFuNN for the Mackey-Glass problem.

Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          reject  reject  reject  reject  accept  accept  accept  accept
99%          reject  accept  accept  reject  accept  accept  accept  accept

Table C.23: Rejection / acceptance of H0 for Zadeh-Mamdani fuzzy rules extracted from SECoS vs. Takagi-Sugeno fuzzy rules extracted from SECoS, for the Mackey-Glass problem.

Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          accept  accept  accept  accept  accept  accept  accept  accept
99%          accept  accept  accept  accept  accept  accept  accept  accept

Table C.24: Rejection / acceptance of H0 for Zadeh-Mamdani fuzzy rules extracted from SECoS vs. SECoS created by insertion of Zadeh-Mamdani rules, for the Mackey-Glass problem.

Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject

Table C.25: Rejection / acceptance of H0 for Zadeh-Mamdani fuzzy rules extracted from EFuNN vs. EFuNN created by insertion of Zadeh-Mamdani rules, for the Mackey-Glass problem.

Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject

Table C.26: Rejection / acceptance of H0 for SECoS vs. SECoS created by insertion of Zadeh-Mamdani rules, for the Mackey-Glass problem.

Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject

Table C.27: Rejection / acceptance of H0 for EFuNN vs. EFuNN created by insertion of Zadeh-Mamdani rules, for the Mackey-Glass problem.
Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject

Table C.28: Rejection / acceptance of H0 for SECoS vs. Zadeh-Mamdani fuzzy rules extracted from SECoS, for the gas furnace problem.

Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject

Table C.29: Rejection / acceptance of H0 for EFuNN vs. Zadeh-Mamdani fuzzy rules extracted from EFuNN, for the gas furnace problem.

Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject

Table C.30: Rejection / acceptance of H0 for SECoS vs. Takagi-Sugeno fuzzy rules extracted from SECoS, for the gas furnace problem.

Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          accept  accept  accept  accept  accept  accept  accept  accept
99%          accept  accept  accept  accept  accept  accept  accept  accept

Table C.31: Rejection / acceptance of H0 for Zadeh-Mamdani rules extracted from SECoS vs. Zadeh-Mamdani rules extracted from EFuNN for the gas furnace problem.

Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          reject  accept  reject  reject  reject  accept  accept  reject
99%          accept  accept  accept  reject  accept  accept  accept  accept

Table C.32: Rejection / acceptance of H0 for Zadeh-Mamdani fuzzy rules extracted from SECoS vs. Takagi-Sugeno fuzzy rules extracted from SECoS, for the gas furnace problem.

Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          accept  accept  accept  accept  reject  accept  accept  reject
99%          accept  accept  accept  accept  accept  accept  accept  accept

Table C.33: Rejection / acceptance of H0 for Zadeh-Mamdani fuzzy rules extracted from SECoS vs. SECoS created by insertion of Zadeh-Mamdani rules, for the gas furnace problem.
Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          reject  reject  reject  reject  reject  accept  reject  reject
99%          reject  reject  reject  reject  reject  accept  reject  reject

Table C.34: Rejection / acceptance of H0 for Zadeh-Mamdani fuzzy rules extracted from EFuNN vs. EFuNN created by insertion of Zadeh-Mamdani rules, for the gas furnace problem.
Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject

Table C.35: Rejection / acceptance of H0 for SECoS vs. SECoS created by insertion of Zadeh-Mamdani rules, for the gas furnace problem.
Hypothesis   AA      AB      AC      AF      BA      BB      BC      BF
95%          reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject

Table C.36: Rejection / acceptance of H0 for EFuNN vs. EFuNN created by insertion of Zadeh-Mamdani rules, for the gas furnace problem.
Appendix D

Results of Hypothesis Tests for ECoS Optimisation Experiments

This Appendix presents the results of the statistical hypothesis tests for the ECoS optimisation experiments described in Section 7.5. Each section contains the results for one of the benchmark data sets.
D.1 Two Spirals
This section describes the results for the hypothesis tests for the experiments over the two spirals data set. Each subsection deals with the results of one optimisation algorithm.
D.1.1 Online Aggregation

This subsection presents the results for the tests over the experiments with the online aggregation optimisation algorithm. The tables in this subsection are as follows:
Table D.1 presents the results of testing the hypotheses listed in Table 7.2 on page 165 for SECoS, that is, for comparing the performance of the SECoS networks trained using online aggregation with the unoptimised SECoS.
Table D.2 presents the results of testing the hypotheses listed in Table 7.2 on page 165 for EFuNN, that is, for comparing the performance of the EFuNN networks trained using online aggregation with the unoptimised EFuNN.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          accept  accept  accept  accept  accept  accept  accept  accept  accept  accept
99%          accept  accept  accept  accept  accept  accept  accept  accept  accept  accept

Table D.1: Rejection / acceptance of H0 for SECoS vs. SECoS trained by online aggregation for the two spirals problem.
APPENDIX D. RESULTS OF HYPOTHESIS TESTS FOR ECOS OPTIMISATION EXPERIMENTS 272
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          accept  accept  accept  accept  accept  accept  accept  accept  accept  accept
99%          accept  accept  accept  accept  accept  accept  accept  accept  accept  accept

Table D.2: Rejection / acceptance of H0 for EFuNN vs. EFuNN trained by online aggregation for the two spirals problem.
D.1.2 Evolutionary Optimised Training

This subsection presents the results for the tests over the experiments with the evolutionary optimised training algorithm. The tables in this subsection are as follows:
Table D.3 presents the results of testing the hypotheses presented in Table 7.5 on page 167 for SECoS, that is, the comparison of SECoS trained using evolutionary optimised training and unoptimised SECoS.
Table D.4 presents the results of testing the hypotheses presented in Table 7.5 on page 167 for EFuNN, that is, the comparison of EFuNN trained using evolutionary optimised training and unoptimised EFuNN.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject

Table D.3: Rejection / acceptance of H0 for SECoS vs. SECoS trained via evolutionary optimised training for the two spirals problem.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject

Table D.4: Rejection / acceptance of H0 for EFuNN vs. EFuNN trained via evolutionary optimised training for the two spirals problem.
D.1.3 Offline Aggregation

This subsection presents the results for the tests over the experiments with the offline aggregation algorithm. The tables in this subsection are as follows:
Table D.5 presents the results of testing the hypotheses in Table 7.7 on page 167 over the SECoS optimised by offline aggregation, that is, the comparison between SECoS optimised by offline aggregation and the unoptimised SECoS.
Table D.6 presents the results of testing the hypotheses in Table 7.7 on page 167 over the EFuNN optimised by offline aggregation, that is, the comparison between EFuNN optimised by offline aggregation and the unoptimised EFuNN.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          accept  accept  accept  accept  reject  accept  accept  accept  accept  reject
99%          accept  accept  accept  accept  reject  accept  accept  accept  accept  reject

Table D.5: Rejection / acceptance of H0 for SECoS optimised by offline aggregation for the two spirals problem.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  accept  accept  reject  reject  reject  accept  accept  reject  reject
99%          reject  accept  accept  reject  reject  reject  accept  accept  reject  reject

Table D.6: Rejection / acceptance of H0 for EFuNN optimised by offline aggregation for the two spirals problem.
D.1.4 Sleep Learning

This subsection presents the results for the tests over the experiments with the sleep learning algorithm. The only table in this subsection is Table D.7, which presents the results of testing the hypotheses presented in Table 7.9 on page 168, that is, the comparison of SECoS networks optimised by sleep learning with unoptimised SECoS networks.

Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          accept  accept  accept  accept  reject  reject  reject  reject  accept  reject
99%          accept  accept  accept  accept  reject  reject  reject  reject  accept  reject

Table D.7: Rejection / acceptance of H0 for SECoS optimised by sleep learning for the two spirals problem.
D.1.5 Evolved Sleep Learning

This subsection presents the results for the tests over the experiments with the evolved sleep learning algorithm. The tables in this subsection are as follows:
Table D.8 presents the results of testing the hypotheses in Table 7.11 on page 169, that is, the comparison between the SECoS networks optimised via evolved sleep learning and unoptimised SECoS networks.
Table D.9 presents the results of testing the hypotheses in Table 7.12, that is, the comparison between SECoS networks optimised via sleep learning and SECoS networks optimised via evolved sleep learning.
APPENDIX D. RESULTS OF HYPOTHESIS TESTS FOR ECOS OPTIMISATION EXPERIMENTS
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
Table D.8: Rejection / acceptance of H0 for SECoS optimised by GA optimised sleep learning, for the two spirals problem.

Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  reject  accept  accept  reject  reject  reject  accept  accept
99%          reject  reject  reject  accept  accept  accept  reject  reject  accept  accept
Table D.9: Rejection / acceptance of H0 for SECoS optimised via sleep learning, vs. SECoS optimised via GA optimised sleep learning, for the two spirals problem.
D.1.6 Comparison of Techniques
This subsection presents the results of the comparisons of optimisation techniques that were explored. The tables in this subsection are as follows:
Table D.10 presents the results of testing the hypotheses listed in Table 7.13 on page 170 for SECoS, that is, the comparison of the performance of the SECoS networks optimised by online aggregation and the SECoS networks optimised by GA optimised training.
Table D.11 presents the results of testing the hypotheses listed in Table 7.13 on page 170 for EFuNN, that is, the comparison of the performance of the EFuNN networks optimised by online aggregation and the EFuNN networks optimised by GA optimised training.
Table D.12 presents the results of testing the hypotheses listed in Table 7.14 on page 171, that is, the comparison of SECoS networks optimised by offline aggregation and SECoS networks optimised by sleep learning.
Table D.13 presents the results of testing the hypotheses in Table 7.15 on page 171, that is, the comparison of SECoS networks optimised by offline aggregation and SECoS networks optimised by evolved sleep learning.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
Table D.10: Rejection / acceptance of H0 for SECoS optimised by online aggregation vs. SECoS trained via GA optimised training for the two spirals problem.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
Table D.11: Rejection / acceptance of H0 for EFuNN optimised by online aggregation vs. EFuNN trained via GA optimised training for the two spirals problem.

Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          accept  accept  accept  accept  accept  accept  accept  accept  accept  accept
99%          accept  accept  accept  accept  accept  accept  accept  accept  accept  accept
Table D.12: Rejection / acceptance of H0 for SECoS optimised by offline aggregation vs. SECoS optimised by sleep learning for the two spirals problem.
D.2 Iris Classification
This section describes the results for the hypothesis tests for the experiments over the iris classification data set. Each subsection deals with the results of one optimisation algorithm.

D.2.1 Online Aggregation
This subsection presents the results for the tests over the experiments with the online aggregation optimisation algorithm. The tables in this subsection are as follows:
Table D.14 presents the results of testing the hypotheses listed in Table 7.2 on page 165 for SECoS, that is, for comparing the performance of the SECoS networks trained using online aggregation with the unoptimised SECoS.
Table D.15 presents the results of testing the hypotheses listed in Table 7.2 on page 165 for EFuNN, that is, for comparing the performance of the EFuNN networks trained using online aggregation with the unoptimised EFuNN.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          accept  accept  accept  accept  accept  accept  accept  accept  accept  reject
99%          accept  accept  accept  accept  accept  accept  accept  accept  accept  reject
Table D.13: Rejection / acceptance of H0 for SECoS optimised by offline aggregation vs. SECoS optimised by GA optimised sleep learning for the two spirals problem.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  accept  accept  reject  reject  reject  accept  accept  reject  reject
99%          reject  accept  accept  reject  reject  reject  accept  accept  reject  reject
Table D.14: Rejection / acceptance of H0 for SECoS vs. SECoS trained by online aggregation for the iris classification problem.

Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  accept  accept  reject  reject  reject  accept  accept  reject  reject
99%          accept  accept  accept  accept  reject  accept  accept  accept  accept  reject
Table D.15: Rejection / acceptance of H0 for EFuNN vs. EFuNN trained by online aggregation for the iris classification problem.
D.2.2 Evolutionary Optimised Training
This subsection presents the results for the tests over the experiments with the evolutionary optimised training algorithm. The tables in this subsection are as follows:
Table D.16 presents the results of testing the hypotheses presented in Table 7.5 on page 167 for SECoS, that is, the comparison of SECoS trained using evolutionary optimised training and unoptimised SECoS.
Table D.17 presents the results of testing the hypotheses presented in Table 7.5 on page 167 for EFuNN, that is, the comparison of EFuNN trained using evolutionary optimised training and unoptimised EFuNN.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  reject  reject  reject  reject  accept  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  accept  reject  reject  reject
Table D.16: Rejection / acceptance of H0 for SECoS vs. SECoS trained via evolutionary optimised training for the iris classification problem.
D.2.3 Offline Aggregation
This subsection presents the results for the tests over the experiments with the offline aggregation algorithm. The tables in this subsection are as follows:
Table D.18 presents the results of testing the hypotheses in Table 7.7 on page 167 over the SECoS optimised by offline aggregation, that is, the comparison between SECoS optimised by offline aggregation and the unoptimised SECoS.
Table D.19 presents the results of testing the hypotheses in Table 7.7 on page 167 over the EFuNN optimised by offline aggregation, that is, the comparison between EFuNN optimised by offline aggregation and the unoptimised EFuNN.

Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
Table D.17: Rejection / acceptance of H0 for EFuNN vs. EFuNN trained via evolutionary optimised training for the iris classification problem.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
Table D.18: Rejection / acceptance of H0 for SECoS optimised by offline aggregation for the iris classification problem.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  accept  reject  reject  reject  accept  reject  reject  reject  accept
99%          accept  accept  reject  accept  reject  accept  accept  reject  accept  accept
Table D.19: Rejection / acceptance of H0 for EFuNN optimised by offline aggregation for the iris classification problem.
D.2.4 Sleep Learning
This subsection presents the results for the tests over the experiments with the sleep learning algorithm. The only table in this subsection is Table D.20, which presents the results of testing the hypotheses presented in Table 7.9 on page 168, that is, the comparison of SECoS networks optimised by sleep learning with unoptimised SECoS networks.
D.2.5 Evolved Sleep Learning
This subsection presents the results for the tests over the experiments with the evolved sleep learning algorithm. The tables in this subsection are as follows:
Table D.21 presents the results of testing the hypotheses in Table 7.11 on page 169, that is, the comparison between the SECoS networks optimised via evolved sleep learning and unoptimised SECoS networks.
Table D.22 presents the results of testing the hypotheses in Table 7.12, that is, the comparison between SECoS networks optimised via sleep learning and SECoS networks optimised via evolved sleep learning.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  accept  reject  reject  accept  accept  reject  reject  reject
99%          reject  reject  accept  reject  reject  accept  accept  reject  accept  reject
Table D.20: Rejection / acceptance of H0 for SECoS optimised by sleep learning for the iris classification problem.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
Table D.21: Rejection / acceptance of H0 for SECoS optimised by GA optimised sleep learning, for the iris classification problem.
D.2.6 Comparison of Techniques
This subsection presents the results of the comparisons of optimisation techniques that were explored. The tables in this subsection are as follows:
Table D.23 presents the results of testing the hypotheses listed in Table 7.13 on page 170 for SECoS, that is, the comparison of the performance of the SECoS networks optimised by online aggregation and the SECoS networks optimised by GA optimised training.
Table D.24 presents the results of testing the hypotheses listed in Table 7.13 on page 170 for EFuNN, that is, the comparison of the performance of the EFuNN networks optimised by online aggregation and the EFuNN networks optimised by GA optimised training.
Table D.25 presents the results of testing the hypotheses listed in Table 7.14 on page 171, that is, the comparison of SECoS networks optimised by offline aggregation and SECoS networks optimised by sleep learning.
Table D.26 presents the results of testing the hypotheses in Table 7.15 on page 171, that is, the comparison of SECoS networks optimised by offline aggregation and SECoS networks optimised by evolved sleep learning.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
Table D.22: Rejection / acceptance of H0 for SECoS optimised via sleep learning, vs. SECoS optimised via GA optimised sleep learning, for the iris classification problem.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
Table D.23: Rejection / acceptance of H0 for SECoS optimised by online aggregation vs. SECoS trained via GA optimised training for the iris classification problem.

Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
Table D.24: Rejection / acceptance of H0 for EFuNN optimised by online aggregation vs. EFuNN trained via GA optimised training for the iris classification problem.
D.3 Mackey-Glass
This section describes the results for the hypothesis tests for the experiments over the Mackey-Glass data set. Each subsection deals with the results of one optimisation algorithm.

D.3.1 Online Aggregation
This subsection presents the results for the tests over the experiments with the online aggregation optimisation algorithm. The tables in this subsection are as follows:
Table D.27 presents the results of testing the hypotheses listed in Table 7.2 on page 165 for SECoS, that is, for comparing the performance of the SECoS networks trained using online aggregation with the unoptimised SECoS.
Table D.28 presents the results of testing the hypotheses listed in Table 7.2 on page 165 for EFuNN, that is, for comparing the performance of the EFuNN networks trained using online aggregation with the unoptimised EFuNN.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  accept  accept  reject  reject  reject  accept  accept  reject  reject
99%          reject  accept  accept  reject  reject  accept  accept  accept  accept  reject
Table D.25: Rejection / acceptance of H0 for SECoS optimised by offline aggregation vs. SECoS optimised by sleep learning for the iris classification problem.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          accept  accept  accept  accept  reject  accept  accept  accept  accept  reject
99%          accept  accept  accept  accept  reject  accept  accept  accept  accept  reject
Table D.26: Rejection / acceptance of H0 for SECoS optimised by offline aggregation vs. SECoS optimised by GA optimised sleep learning for the iris classification problem.

Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  accept  reject  reject  reject  reject  accept  reject  reject
Table D.27: Rejection / acceptance of H0 for SECoS vs. SECoS trained by online aggregation for the Mackey-Glass problem.
D.3.2 Evolutionary Optimised Training
This subsection presents the results for the tests over the experiments with the evolutionary optimised training algorithm. The tables in this subsection are as follows:
Table D.29 presents the results of testing the hypotheses presented in Table 7.5 on page 167 for SECoS, that is, the comparison of SECoS trained using evolutionary optimised training and unoptimised SECoS.
Table D.30 presents the results of testing the hypotheses presented in Table 7.5 on page 167 for EFuNN, that is, the comparison of EFuNN trained using evolutionary optimised training and unoptimised EFuNN.
D.3.3 Offline Aggregation
This subsection presents the results for the tests over the experiments with the offline aggregation algorithm. The tables in this subsection are as follows:
Table D.31 presents the results of testing the hypotheses in Table 7.7 on page 167 over the SECoS optimised by offline aggregation, that is, the comparison between SECoS optimised by offline aggregation and the unoptimised SECoS.
Table D.32 presents the results of testing the hypotheses in Table 7.7 on page 167 over the EFuNN optimised by offline aggregation, that is, the comparison between EFuNN optimised by offline aggregation and the unoptimised EFuNN.

Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
Table D.28: Rejection / acceptance of H0 for EFuNN vs. EFuNN trained by online aggregation for the Mackey-Glass problem.

Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          accept  accept  accept  accept  reject  accept  accept  accept  accept  reject
99%          accept  accept  accept  accept  reject  accept  accept  accept  accept  reject
Table D.29: Rejection / acceptance of H0 for SECoS vs. SECoS trained via evolutionary optimised training for the Mackey-Glass problem.

Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          accept  accept  accept  accept  reject  accept  accept  accept  accept  reject
99%          accept  accept  accept  accept  reject  accept  accept  accept  accept  reject
Table D.30: Rejection / acceptance of H0 for EFuNN vs. EFuNN trained via evolutionary optimised training for the Mackey-Glass problem.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
Table D.31: Rejection / acceptance of H0 for SECoS optimised by offline aggregation for the Mackey-Glass problem.
D.3.4 Sleep Learning
This subsection presents the results for the tests over the experiments with the sleep learning algorithm. The only table in this subsection is Table D.33, which presents the results of testing the hypotheses presented in Table 7.9 on page 168, that is, the comparison of SECoS networks optimised by sleep learning with unoptimised SECoS networks.
D.3.5 Evolved Sleep Learning
This subsection presents the results for the tests over the experiments with the evolved sleep learning algorithm. The tables in this subsection are as follows:
Table D.34 presents the results of testing the hypotheses in Table 7.11 on page 169, that is, the comparison between the SECoS networks optimised via evolved sleep learning and unoptimised SECoS networks.
Table D.35 presents the results of testing the hypotheses in Table 7.12, that is, the comparison between SECoS networks optimised via sleep learning and SECoS networks optimised via evolved sleep learning.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  reject  reject  reject  accept  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  accept  reject  reject  reject  reject
Table D.32: Rejection / acceptance of H0 for EFuNN optimised by offline aggregation for the Mackey-Glass problem.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
Table D.33: Rejection / acceptance of H0 for SECoS optimised by sleep learning for the Mackey-Glass problem.
D.3.6 Comparison of Techniques
This subsection presents the results of the comparisons of optimisation techniques that were explored. The tables in this subsection are as follows:
Table D.36 presents the results of testing the hypotheses listed in Table 7.13 on page 170 for SECoS, that is, the comparison of the performance of the SECoS networks optimised by online aggregation and the SECoS networks optimised by GA optimised training.
Table D.37 presents the results of testing the hypotheses listed in Table 7.13 on page 170 for EFuNN, that is, the comparison of the performance of the EFuNN networks optimised by online aggregation and the EFuNN networks optimised by GA optimised training.
Table D.38 presents the results of testing the hypotheses listed in Table 7.14 on page 171, that is, the comparison of SECoS networks optimised by offline aggregation and SECoS networks optimised by sleep learning.
Table D.39 presents the results of testing the hypotheses in Table 7.15 on page 171, that is, the comparison of SECoS networks optimised by offline aggregation and SECoS networks optimised by evolved sleep learning.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          accept  accept  accept  accept  reject  reject  reject  reject  reject  reject
99%          accept  accept  accept  accept  reject  reject  reject  reject  reject  reject
Table D.34: Rejection / acceptance of H0 for SECoS optimised by GA optimised sleep learning, for the Mackey-Glass problem.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          accept  accept  accept  accept  reject  accept  accept  accept  accept  reject
99%          accept  accept  accept  accept  reject  accept  accept  accept  accept  reject
Table D.35: Rejection / acceptance of H0 for SECoS optimised via sleep learning, vs. SECoS optimised via GA optimised sleep learning, for the Mackey-Glass problem.

Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          accept  accept  accept  accept  reject  accept  accept  accept  accept  reject
99%          accept  accept  accept  accept  reject  accept  accept  accept  accept  reject
Table D.36: Rejection / acceptance of H0 for SECoS optimised by online aggregation vs. SECoS trained via GA optimised training for the Mackey-Glass problem.
D.4 Gas Furnace
This section describes the results for the hypothesis tests for the experiments over the gas furnace data set. Each subsection deals with the results of one optimisation algorithm.

D.4.1 Online Aggregation
This subsection presents the results for the tests over the experiments with the online aggregation optimisation algorithm. The tables in this subsection are as follows:
Table D.40 presents the results of testing the hypotheses listed in Table 7.2 on page 165 for SECoS, that is, for comparing the performance of the SECoS networks trained using online aggregation with the unoptimised SECoS.
Table D.41 presents the results of testing the hypotheses listed in Table 7.2 on page 165 for EFuNN, that is, for comparing the performance of the EFuNN networks trained using online aggregation with the unoptimised EFuNN.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          accept  accept  accept  accept  reject  accept  accept  accept  accept  reject
99%          accept  accept  accept  accept  reject  accept  accept  accept  accept  reject
Table D.37: Rejection / acceptance of H0 for EFuNN optimised by online aggregation vs. EFuNN trained via GA optimised training for the Mackey-Glass problem.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
Table D.38: Rejection / acceptance of H0 for SECoS optimised by offline aggregation vs. SECoS optimised by sleep learning for the Mackey-Glass problem.

Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          accept  accept  accept  accept  reject  accept  accept  accept  accept  reject
99%          accept  accept  accept  accept  reject  accept  accept  accept  accept  reject
Table D.39: Rejection / acceptance of H0 for SECoS optimised by offline aggregation vs. SECoS optimised by GA optimised sleep learning for the Mackey-Glass problem.
D.4.2 Evolutionary Optimised Training
This subsection presents the results for the tests over the experiments with the evolutionary optimised training algorithm. The tables in this subsection are as follows:
Table D.42 presents the results of testing the hypotheses presented in Table 7.5 on page 167 for SECoS, that is, the comparison of SECoS trained using evolutionary optimised training and unoptimised SECoS.
Table D.43 presents the results of testing the hypotheses presented in Table 7.5 on page 167 for EFuNN, that is, the comparison of EFuNN trained using evolutionary optimised training and unoptimised EFuNN.
D.4.3 Offline Aggregation
This subsection presents the results for the tests over the experiments with the offline aggregation algorithm. The tables in this subsection are as follows:
Table D.44 presents the results of testing the hypotheses in Table 7.7 on page 167 over the SECoS optimised by offline aggregation, that is, the comparison between SECoS optimised by offline aggregation and the unoptimised SECoS.
Table D.45 presents the results of testing the hypotheses in Table 7.7 on page 167 over the EFuNN optimised by offline aggregation, that is, the comparison between EFuNN optimised by offline aggregation and the unoptimised EFuNN.

Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  accept  accept  reject  reject  reject  reject  accept  reject  reject
99%          reject  accept  accept  reject  reject  accept  accept  accept  reject  reject
Table D.40: Rejection / acceptance of H0 for SECoS vs. SECoS trained by online aggregation for the gas furnace problem.

Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          accept  accept  accept  accept  accept  accept  accept  accept  reject  accept
99%          accept  accept  accept  accept  accept  accept  accept  accept  reject  accept
Table D.41: Rejection / acceptance of H0 for EFuNN vs. EFuNN trained by online aggregation for the gas furnace problem.

Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
Table D.42: Rejection / acceptance of H0 for SECoS vs. SECoS trained via evolutionary optimised training for the gas furnace problem.
D.4.4 Sleep Learning
This subsection presents the results for the tests over the experiments with the sleep learning algorithm. The only table in this subsection is Table D.46, which presents the results of testing the hypotheses presented in Table 7.9 on page 168, that is, the comparison of SECoS networks optimised by sleep learning with unoptimised SECoS networks.
D.4.5 Evolved Sleep Learning
This subsection presents the results for the tests over the experiments with the evolved sleep learning algorithm. The tables in this subsection are as follows:
Table D.47 presents the results of testing the hypotheses in Table 7.11 on page 169, that is, the comparison between the SECoS networks optimised via evolved sleep learning and unoptimised SECoS networks.
Table D.48 presents the results of testing the hypotheses in Table 7.12, that is, the comparison between SECoS networks optimised via sleep learning and SECoS networks optimised via evolved sleep learning.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
Table D.43: Rejection / acceptance of H0 for EFuNN vs. EFuNN trained via evolutionary optimised training for the gas furnace problem.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
Table D.44: Rejection / acceptance of H0 for SECoS optimised by offline aggregation for the gas furnace problem.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  reject  reject  reject  accept  reject  reject  reject  reject
99%          reject  reject  accept  reject  reject  accept  reject  reject  reject  reject
Table D.45: Rejection / acceptance of H0 for EFuNN optimised by offline aggregation for the gas furnace problem.
D.4.6 Comparison of Techniques
This subsection presents the results of the comparisons of optimisation techniques that were explored. The tables in this subsection are as follows:
Table D.49 presents the results of testing the hypotheses listed in Table 7.13 on page 170 for SECoS, that is, the comparison of the performance of the SECoS networks optimised by online aggregation and the SECoS networks optimised by GA optimised training.
Table D.50 presents the results of testing the hypotheses listed in Table 7.13 on page 170 for EFuNN, that is, the comparison of the performance of the EFuNN networks optimised by online aggregation and the EFuNN networks optimised by GA optimised training.
Table D.51 presents the results of testing the hypotheses listed in Table 7.14 on page 171, that is, the comparison of SECoS networks optimised by offline aggregation and SECoS networks optimised by sleep learning.
Table D.52 presents the results of testing the hypotheses in Table 7.15 on page 171, that is, the comparison of SECoS networks optimised by offline aggregation and SECoS networks optimised by evolved sleep learning.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
Table D.46: Rejection / acceptance of H0 for SECoS optimised by sleep learning for the gas furnace problem.
Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  accept  reject  reject  reject
Table D.47: Rejection / acceptance of H0 for SECoS optimised by GA optimised sleep learning, for the gas furnace problem.

Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
Table D.48: Rejection / acceptance of H0 for SECoS optimised via sleep learning, vs. SECoS optimised via GA optimised sleep learning, for the gas furnace problem.

Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  accept  reject  reject  reject
Table D.49: Rejection / acceptance of H0 for SECoS optimised by online aggregation vs. SECoS trained via GA optimised training for the gas furnace problem.

Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject  reject  reject
Table D.50: Rejection / acceptance of H0 for EFuNN optimised by online aggregation vs. EFuNN trained via GA optimised training for the gas furnace problem.

Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  reject  reject  reject  reject  accept  accept  reject  reject
99%          reject  reject  reject  reject  reject  reject  accept  accept  reject  reject
Table D.51: Rejection / acceptance of H0 for SECoS optimised by offline aggregation vs. SECoS optimised by sleep learning for the gas furnace problem.

Hypothesis   AA      AB      AC      AF      AN      BA      BB      BC      BF      BN
95%          reject  reject  reject  reject  reject  accept  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  accept  reject  reject  reject  reject
Table D.52: Rejection / acceptance of H0 for SECoS optimised by offline aggregation vs. SECoS optimised by GA optimised sleep learning for the gas furnace problem.
Appendix E
Results of Hypothesis Tests for Phoneme Recognition Experiments

This Appendix presents the results of the statistical hypothesis tests performed for the experiments in the phoneme recognition case study, as described in Chapter 8. Each section deals with the tests done with a particular family of algorithms.
E.1 MLP and FuNN
This section presents the results of the tests done over the MLP and FuNN algorithms. The tables in this section are as follows:
Table E.1 presents the results of testing the hypotheses listed in Table 8.7 on page 198, that is, the comparison of the performance of MLP and FuNN networks.
Table E.2 presents the results of testing the hypotheses in Table 8.8 on page 198 for MLP, that is, the evaluation of forgetting in MLP networks after further training.
Table E.3 presents the results of testing the hypotheses in Table 8.8 on page 198 for FuNN, that is, the evaluation of forgetting in FuNN networks after further training.
Hypothesis   AA      AB      AC      BA      BB      BC      CA      CB      CC
95%          reject  reject  reject  reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject  reject  reject  reject
Table E.1: Rejection / acceptance of H0 for MLP vs. FuNN for the phoneme case study.
Hypothesis   ΔBA     ΔBB     ΔBC     ΔCA     ΔCB     ΔCC
95%          reject  reject  reject  reject  reject  reject
99%          reject  reject  reject  reject  reject  reject
Table E.2: Rejection of H0 for change in accuracy of MLP for the phoneme case study.
Hypothesis   ΔBA     ΔBB     ΔBC     ΔCA     ΔCB     ΔCC
95%          reject  reject  reject  accept  accept  accept
99%          reject  reject  reject  accept  accept  accept
Table E.3: Rejection of H0 for change in accuracy of FuNN for the phoneme case study.
E.2 EFuNN and SECoS
This section presents the results of the statistical hypothesis tests for the experiments with EFuNN and SECoS. The tables in this section are as follows:
Table E.4 presents the results of testing the hypotheses in Table 8.11 on page 200, that is, the comparison of the performance of SECoS and EFuNN.
Table E.5 presents the results of testing the hypotheses in Table 8.12 on page 200, that is, the comparison of the performance of MLP and SECoS.
Table E.6 presents the results of testing the hypotheses in Table 8.13 on page 201, that is, the comparison of the performance of FuNN and EFuNN.
Table E.7 presents the results of testing the hypotheses in Table 8.8 on page 198 for EFuNN, that is, the assessment of the forgetting of EFuNN.
Table E.8 presents the results of testing the hypotheses in Table 8.8 on page 198 for SECoS, that is, the assessment of the forgetting of SECoS.
Table E.9 presents the results of testing the hypotheses in Table 8.14 on page 202, that is, the comparison of the forgetting of SECoS and EFuNN.
Hypothesis   AA      AB      AC      AN
95%          reject  reject  accept  reject
99%          reject  reject  accept  reject

Hypothesis   BA      BB      BC      BN
95%          reject  reject  reject  reject
99%          reject  reject  accept  reject

Hypothesis   CA      CB      CC      CN
95%          reject  reject  accept  reject
99%          reject  reject  accept  reject

Table E.4: Rejection / acceptance of H0 comparing SECoS and EFuNN for the phoneme recognition case study.
APPENDIX E. RESULTS OF HYPOTHESIS TESTS FOR PHONEME RECOGNITION EXPERIMENTS 290
Hypothesis   AA      AB      AC
95%          accept  accept  accept
99%          accept  accept  accept

Hypothesis   BA      BB      BC
95%          reject  reject  reject
99%          reject  reject  reject

Hypothesis   CA      CB      CC
95%          reject  reject  reject
99%          reject  reject  reject

Table E.5: Rejection / acceptance of H0 comparing MLP and SECoS for the phoneme recognition case study.
Hypothesis   AA      AB      AC
95%          accept  accept  accept
99%          accept  accept  accept

Hypothesis   BA      BB      BC
95%          reject  reject  reject
99%          reject  reject  reject

Hypothesis   CA      CB      CC
95%          reject  reject  reject
99%          reject  reject  reject

Table E.6: Rejection / acceptance of H0 comparing FuNN and EFuNN for the phoneme recognition case study.
E.3 Inserted Fuzzy Rules

This section presents the results of testing the statistical hypotheses for the fuzzy rule insertion experiments. The tables in this section are as follows:
Table E.10 presents the results of testing the hypotheses listed in Table 8.17 on page 205, that is, the comparison of the performance of the Zadeh-Mamdani rules extracted from EFuNN with the EFuNN created via the insertion of those rules.
Table E.11 presents the results of testing the hypotheses listed in Table 8.18 on page 205, that is, the comparison of the performance of EFuNN networks created via rule insertion with the performance of the original EFuNN networks.
Hypothesis   ΔBA     ΔBB     ΔBC
95%          reject  reject  accept
99%          reject  reject  accept

Hypothesis   ΔCA     ΔCB     ΔCC
95%          reject  reject  reject
99%          accept  accept  reject

Table E.7: Rejection of H0 for changes in accuracy of EFuNN for the phoneme recognition case study.
Hypothesis   ΔBA     ΔBB     ΔBC
95%          reject  reject  accept
99%          accept  reject  accept

Hypothesis   ΔCA     ΔCB     ΔCC
95%          reject  reject  reject
99%          reject  reject  reject

Table E.8: Rejection of H0 for changes in accuracy of SECoS for the phoneme recognition case study.
E.4 ECoS Optimisation

This section presents the results of testing the statistical hypotheses for the ECoS optimisation experiments.
E.4.1 Online Aggregation
This subsection presents the results of the tests performed for the online aggregation experiments. The tables are as follows:

Table E.12 presents the results of testing the hypotheses listed in Table 8.21 on page 207, that is, the comparison between the performance of EFuNN networks optimised using online aggregation and the performance of unoptimised EFuNN networks.

Table E.13 presents the results of testing the hypotheses listed in Table 8.21 on page 207, that is, the comparison between the performance of SECoS networks optimised using online aggregation and the performance of unoptimised SECoS networks.

Hypothesis   ΔBA     ΔBB     ΔBC
95%          reject  reject  accept
99%          reject  reject  accept

Hypothesis   ΔCA     ΔCB     ΔCC
95%          reject  reject  reject
99%          reject  reject  reject

Table E.9: Rejection of H0 for changes in accuracy of SECoS and EFuNN for the phoneme recognition case study.
Hypothesis   AA      AB      AC
95%          reject  reject  reject
99%          reject  reject  reject

Hypothesis   BA      BB      BC
95%          reject  reject  reject
99%          reject  reject  reject

Hypothesis   CA      CB      CC
95%          reject  reject  reject
99%          reject  reject  reject

Table E.10: Rejection / acceptance of H0 for comparison of Zadeh-Mamdani rules extracted from EFuNN and the networks created from those rules, for the phoneme recognition case study.
Hypothesis   AA      AB      AC
95%          accept  accept  accept
99%          accept  accept  accept

Hypothesis   BA      BB      BC
95%          accept  accept  accept
99%          accept  accept  accept

Hypothesis   CA      CB      CC
95%          accept  accept  accept
99%          accept  accept  accept

Table E.11: Rejection / acceptance of H0 for EFuNN vs. EFuNN created via the insertion of Zadeh-Mamdani fuzzy rules, for the phoneme recognition case study.
E.4.2 Offline Aggregation
This subsection presents the results of the tests performed for the offline aggregation experiments. The tables are as follows:

Table E.14 presents the results of testing the hypotheses listed in Table 8.24 on page 209, that is, the comparison between the performance of EFuNN networks optimised using offline aggregation and the performance of unoptimised EFuNN networks.

Table E.15 presents the results of testing the hypotheses listed in Table 8.24 on page 209, that is, the comparison between the performance of SECoS networks optimised using offline aggregation and the performance of unoptimised SECoS networks.
Hypothesis   AA      AB      AC      AN
95%          reject  accept  accept  reject
99%          accept  accept  accept  reject

Hypothesis   BA      BB      BC      BN
95%          reject  reject  accept  reject
99%          reject  reject  accept  reject

Hypothesis   CA      CB      CC      CN
95%          reject  reject  reject  reject
99%          reject  reject  reject  reject

Table E.12: Rejection / acceptance of H0 for evaluation of online aggregation for EFuNN, for the phoneme recognition case study.
Hypothesis   AA      AB      AC      AN
95%          accept  accept  accept  accept
99%          accept  accept  accept  accept

Hypothesis   BA      BB      BC      BN
95%          accept  accept  accept  accept
99%          accept  accept  accept  accept

Hypothesis   CA      CB      CC      CN
95%          accept  accept  accept  accept
99%          accept  accept  accept  accept

Table E.13: Rejection / acceptance of H0 for evaluation of online aggregation for SECoS, for the phoneme recognition case study.
E.4.3 Sleep Learning

Table E.16 presents the results of testing the hypotheses presented in Table 8.27 on page 211, that is, the comparison of the performance of SECoS networks optimised via sleep learning with the performance of unoptimised SECoS networks.

Table E.17 presents the results of testing the hypotheses listed in Table 8.28 on page 211, that is, the comparison of the performance of SECoS networks optimised via sleep learning with the performance of SECoS networks optimised via offline aggregation.
Hypothesis   AA      AB      AC      AN
95%          reject  reject  reject  reject
99%          reject  reject  accept  reject

Hypothesis   BA      BB      BC      BN
95%          reject  reject  reject  reject
99%          reject  reject  reject  reject

Hypothesis   CA      CB      CC      CN
95%          reject  reject  reject  reject
99%          reject  reject  reject  reject

Table E.14: Rejection / acceptance of H0 for evaluation of offline aggregation for EFuNN, for the phoneme recognition case study.
Hypothesis   AA      AB      AC      AN
95%          reject  reject  reject  reject
99%          reject  reject  reject  reject

Hypothesis   BA      BB      BC      BN
95%          reject  reject  reject  reject
99%          reject  reject  reject  reject

Hypothesis   CA      CB      CC      CN
95%          reject  reject  reject  reject
99%          reject  reject  reject  reject

Table E.15: Rejection / acceptance of H0 for evaluation of offline aggregation for SECoS, for the phoneme recognition case study.
Hypothesis   AA      AB      AC      AN
95%          reject  reject  reject  reject
99%          reject  reject  reject  reject

Hypothesis   BA      BB      BC      BN
95%          reject  reject  reject  reject
99%          reject  reject  reject  reject

Hypothesis   CA      CB      CC      CN
95%          reject  reject  reject  reject
99%          reject  reject  reject  reject

Table E.16: Rejection / acceptance of H0 for evaluation of sleep learning, for the phoneme recognition case study.
Hypothesis   AA      AB      AC      AN
95%          reject  reject  accept  accept
99%          reject  reject  accept  accept

Hypothesis   BA      BB      BC      BN
95%          reject  reject  accept  accept
99%          reject  reject  accept  accept

Hypothesis   CA      CB      CC      CN
95%          reject  reject  reject  accept
99%          reject  reject  reject  accept

Table E.17: Rejection / acceptance of H0 for comparison of sleep learning and offline aggregation, for the phoneme recognition case study.
Appendix F
Complete Phoneme Experimental Results
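The tables in this appendix report percent true negative, true positive and overall accuracies for each binary (phoneme present / absent) classification task. As a hedged illustration of how the three figures relate (a generic sketch with toy data, not the evaluation code used in the thesis):

```python
def accuracy_summary(targets, predictions):
    """Percent true negative, true positive and overall accuracy
    for one binary classification task (0 = absent, 1 = present)."""
    tn = sum(1 for t, p in zip(targets, predictions) if t == 0 and p == 0)
    tp = sum(1 for t, p in zip(targets, predictions) if t == 1 and p == 1)
    neg = targets.count(0)
    pos = targets.count(1)
    return (100.0 * tn / neg,          # percent true negative
            100.0 * tp / pos,          # percent true positive
            100.0 * (tn + tp) / len(targets))  # percent overall

# Toy example: eight negative and two positive frames.
targets     = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
predictions = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
print(accuracy_summary(targets, predictions))  # (87.5, 50.0, 80.0)
```

Because each phoneme is rare relative to the negative class, the overall accuracy in these tables tracks the true negative accuracy closely, even when the true positive accuracy is very low.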
Table F.1: Mean and standard deviation percent true negative, true positive and overall accuracies (to 1 d.p.) of MLP trained on the phoneme classification problem.
Phoneme  Train         Recalled with Set A                 Recalled with Set B                 Recalled with Set C
         set     True Neg.  True Pos.  Overall     True Neg.  True Pos.  Overall     True Neg.  True Pos.  Overall

/p/      A       97.5/0.3   95.2/2.9   97.4/0.3    97.8/0.2   79.3/3.1   97.6/0.2    97.9/0.3   63.7/6.5    97.6/0.3
         B       98.2/0.8   64.4/7.4   97.8/0.7    98.4/0.6   61.6/7.9   98.0/0.6    98.6/0.6   46.5/11.0   98.2/0.5
         C       3.2/4.8    100.0/0.0  4.3/4.7     3.5/5.2    100.0/0.0  4.6/5.2     2.5/3.7    100.0/0.0   3.3/3.7
/b/      A       99.0/0.4   98.4/8.9   99.0/0.4    99.3/0.3   23.5/3.7   99.1/0.3    99.8/0.1   4.3/0.0     99.5/0.1
         B       94.1/1.3   67.3/9.8   94.0/1.3    95.5/1.1   43.2/13.0  95.4/1.1    97.4/1.6   0.0/0.0     97.1/1.6
         C       19.0/28.9  92.8/12.4  19.2/28.9   18.7/28.2  88.9/19.2  18.9/28.1   16.7/25.4  71.0/43.5   16.9/25.2
/t/      A       98.5/0.4   97.5/0.9   98.5/0.4    98.3/0.4   90.4/2.0   98.2/0.4    95.7/1.3   67.9/2.8    95.6/1.3
         B       97.0/3.9   89.1/6.2   96.9/3.8    97.1/4.0   88.1/5.8   97.0/3.9    95.0/2.9   72.2/12.2   94.8/2.8
         C       20.8/41.0  98.7/2.7   21.8/40.4   21.0/41.1  98.7/2.6   22.0/40.6   21.3/41.8  100.0/0.0   21.8/41.6
/d/      A       98.0/0.8   100.0/0.0  98.0/0.8    97.9/0.8   70.9/7.5   97.9/0.8    96.5/1.4   1.9/2.3     96.2/1.4
         B       99.0/0.3   66.7/8.3   98.9/0.3    99.0/0.3   59.8/9.2   98.9/0.3    99.4/0.5   0.0/0.0     98.1/0.5
         C       7.2/14.1   86.4/27.0  7.4/14.0    6.9/13.6   86.3/27.4  7.1/13.5    6.1/12.1   83.6/32.6   6.3/11.9
/k/      A       99.0/0.2   99.7/0.6   99.0/0.2    99.0/0.2   91.7/3.0   98.9/0.2    98.8/0.5   81.2/5.2    98.7/0.5
         B       98.9/0.4   75.4/8.0   98.7/0.4    99.0/0.3   80.3/5.2   98.8/0.3    98.4/0.9   67.5/6.5    98.2/0.9
         C       88.2/30.3  76.4/12.2  88.1/29.9   88.5/30.0  78.0/11.3  88.4/29.6   88.2/30.7  78.3/9.1    88.1/30.5
/g/      A       98.5/0.3   100.0/0.0  98.5/0.3    98.5/0.2   78.6/7.5   98.5/0.2    99.5/0.1   0.3/0.9     99.0/0.1
         B       99.0/0.2   84.9/7.5   99.0/0.2    99.0/0.2   76.1/8.1   99.0/0.2    99.7/0.1   0.0/0.0     99.2/0.1
         C       2.2/4.3    100.0/0.0  2.5/4.2     2.4/4.5    98.3/5.1   2.6/4.5     2.1/3.7    100.0/0.0   2.6/3.7
/f/      A       95.8/0.7   97.9/1.2   95.9/0.6    94.5/1.0   93.0/1.2   94.5/1.0    96.6/0.4   94.5/1.0    96.6/0.4
         B       97.3/1.1   54.8/13.4  96.0/0.8    96.7/1.4   51.2/13.7  95.5/1.1    97.6/1.1   79.4/8.2    97.1/1.0
         C       95.3/1.4   46.3/7.5   93.9/1.3    94.0/1.6   49.1/7.9   92.7/1.5    97.0/0.7   59.5/9.3    96.1/0.8
/v/      A       95.4/1.0   94.1/2.1   95.4/1.0    94.7/1.1   84.8/2.5   94.6/1.1    92.9/0.9   95.0/3.4    93.0/0.8
         B       98.3/1.6   15.5/17.5  97.7/1.5    98.6/1.3   22.2/20.5  98.1/1.1    99.1/1.0   2.8/5.0     97.5/0.9
         C       11.4/33.0  88.9/33.3  12.0/32.6   11.5/33.1  89.2/32.4  12.0/32.6   11.5/33.2  88.9/33.3   12.8/32.0
/T/      A       96.8/1.3   97.4/1.7   96.8/1.2    95.8/1.5   92.9/2.6   95.8/1.4    95.1/1.0   52.3/6.8    94.1/1.0
         B       97.9/0.9   72.4/10.2  97.3/0.8    97.4/1.2   72.9/9.9   96.8/1.0    96.7/1.0   13.8/14.3   94.7/0.8
         C       11.9/32.8  96.6/10.2  14.0/31.7   12.0/32.7  95.0/15.1  14.1/31.5   11.4/32.1  94.2/17.4   13.5/30.9
Phoneme  Train         Recalled with Set A                 Recalled with Set B                 Recalled with Set C
         set     True Neg.  True Pos.  Overall     True Neg.  True Pos.  Overall     True Neg.  True Pos.  Overall

/D/      A       96.3/0.8   96.6/1.6   96.3/0.8    95.4/0.9   75.4/2.9   95.3/0.9    90.6/1.4   77.8/5.8    90.4/1.3
         B       97.6/0.5   73.7/7.7   97.5/0.5    97.2/0.5   70.2/0.4   97.0/0.5    94.1/1.5   76.8/5.0    93.8/1.4
         C       3.0/4.5    100.0/0.0  3.7/4.4     3.2/4.8    99.4/1.8   4.0/4.8     2.2/3.3    100.0/0.0   3.8/3.2
/s/      A       98.8/0.5   95.6/0.8   98.7/0.5    98.1/0.6   93.3/1.4   98.0/0.5    98.1/0.6   76.3/4.0    97.8/0.6
         B       99.4/0.2   77.0/10.1  98.6/0.3    99.1/0.3   79.4/9.1   98.4/0.2    99.1/0.7   56.0/26.7   98.4/0.7
         C       99.4/0.2   75.0/15.6  98.6/0.4    98.9/0.2   80.8/11.7  98.3/0.2    99.6/0.3   84.1/9.1    99.3/0.3
/z/      A       98.3/0.7   96.9/1.6   98.3/0.7    98.1/1.0   86.3/4.4   97.9/0.9    96.3/0.9   95.7/1.6    96.3/0.8
         B       99.4/0.1   76.2/7.1   99.0/0.1    99.0/0.2   86.0/5.6   98.8/0.2    98.0/0.3   88.7/2.7    97.8/0.3
         C       0.0/0.0    100.0/0.0  1.9/0.0     0.0/0.0    100.0/0.0  1.9/0.0     0.0/0.0    100.0/0.0   2.2/0.0
/S/      A       97.7/0.2   96.7/1.1   97.6/0.2    97.3/0.2   96.6/2.2   97.3/0.2    97.6/0.5   83.2/5.8    97.2/0.5
         B       98.2/0.9   76.8/18.4  97.6/0.4    98.1/1.0   79.9/13.5  97.6/0.6    99.5/0.4   31.8/30.3   97.6/0.6
         C       96.7/1.8   92.8/5.5   96.5/1.6    96.7/1.6   94.2/5.5   96.6/1.4    97.9/1.2   80.2/22.7   97.4/0.6
/Z/      A       98.9/0.3   99.3/2.1   98.9/0.3    98.2/0.3   64.1/5.1   97.9/0.3    99.1/0.5   54.3/1.6    98.5/0.5
         B       97.7/1.0   65.3/29.3  97.4/0.8    97.7/1.2   57.0/23.4  97.4/1.0    99.4/0.5   39.7/20.6   98.7/0.3
         C       2.6/7.7    90.8/27.5  3.3/7.4     2.5/7.5    90.6/28.2  3.2/7.2     2.0/5.9    93.5/19.5   3.0/5.6
/h/      A       98.8/0.5   99.3/0.6   98.8/0.4    98.7/0.4   83.9/4.3   98.6/0.4    99.1/0.3   35.9/3.2    98.0/0.3
         B       99.5/0.3   71.1/6.0   99.3/0.2    99.6/0.2   73.4/6.2   99.4/0.2    99.6/0.3   28.6/3.9    98.4/0.3
         C       12.5/32.5  96.5/10.4  13.2/32.1   12.9/32.4  95.9/12.3  13.5/32.1   12.3/32.9  83.1/26.0   13.6/32.0
/ch/     A       96.0/0.3   99.0/0.7   96.1/0.2    95.9/0.3   97.3/1.1   95.9/0.3    96.4/0.3   40.1/7.0    95.5/0.2
         B       96.9/0.6   77.6/12.7  96.4/0.4    97.0/0.7   77.3/12.5  96.6/0.5    96.9/1.0   42.4/22.0   96.1/0.7
         C       73.3/41.6  95.6/4.8   73.8/40.6   73.5/41.7  96.1/4.1   74.0/40.7   74.2/42.1  82.3/11.1   74.3/41.2
/dj/     A       95.8/0.5   99.7/0.9   95.9/0.5    95.4/0.5   87.7/8.7   95.4/0.5    94.6/0.8   5.6/0.7     93.4/0.8
         B       96.8/0.8   82.2/24.6  96.7/0.7    96.3/0.8   83.3/9.6   96.3/0.8    96.0/1.2   3.7/1.6     94.7/1.2
         C       0.3/0.5    100.0/0.0  0.6/0.5     0.4/0.8    100.0/0.0  0.7/0.8     0.2/0.5    100.0/0.0   1.6/0.5
/m/      A       96.6/0.9   96.7/3.1   96.6/0.8    95.9/0.8   88.0/5.5   95.9/0.8    92.5/1.5   26.5/9.6    91.9/1.4
         B       98.5/0.7   42.5/15.5  97.8/0.6    98.5/0.7   34.4/13.1  97.7/0.6    98.3/1.4   6.7/5.9     97.4/1.3
         C       6.0/7.7    99.4/1.9   7.1/7.6     6.4/7.5    99.4/1.7   7.5/7.4     4.3/5.9    100.0/0.0   5.2/5.8
Phoneme  Train         Recalled with Set A                 Recalled with Set B                 Recalled with Set C
         set     True Neg.  True Pos.  Overall     True Neg.  True Pos.  Overall     True Neg.  True Pos.  Overall

/n/      A       96.0/0.9   92.9/4.3   96.0/0.9    96.0/0.9   91.5/4.0   96.0/0.8    92.1/2.1   15.0/11.5   91.1/2.0
         B       98.8/0.3   28.8/6.0   97.5/0.2    98.8/0.2   37.8/7.5   97.7/0.1    98.0/0.4   0.8/2.5     96.8/0.4
         C       50.0/47.8  53.6/50.4  50.1/46.0   50.1/47.7  52.3/50.1  50.1/45.9   48.7/48.9  55.6/52.7   48.7/47.6
/N/      A       97.1/0.3   100.0/0.0  97.1/0.3    96.5/0.4   95.7/3.6   96.5/0.4    91.4/1.2   0.0/0.0     90.6/1.2
         B       90.9/24.2  43.3/22.3  90.5/23.7   90.6/24.5  56.5/18.7  90.2/24.1   88.7/23.1  11.1/33.3   88.0/22.6
         C       28.6/33.1  93.0/16.3  29.2/32.7   27.7/31.9  95.2/11.4  28.4/31.5   22.4/28.3  92.6/15.2   23.0/28.0
/l/      A       97.1/0.7   97.7/1.8   97.1/0.6    96.2/0.7   87.4/6.0   96.1/0.7    93.1/1.5   49.3/6.2    92.1/1.4
         B       97.6/0.8   58.9/7.7   97.0/0.7    97.0/1.0   60.8/5.2   96.5/0.1    95.6/1.5   42.1/7.1    94.4/1.4
         C       12.2/30.5  93.6/19.0  13.6/29.9   12.6/30.0  94.3/17.2  13.7/29.4   11.4/29.4  97.5/7.6    13.3/28.6
/r/      A       99.6/0.1   97.6/1.2   99.5/0.1    99.3/0.2   74.4/2.3   99.1/0.2    97.9/0.8   18.8/13.6   97.2/0.7
         B       99.6/0.5   74.9/15.3  99.4/0.4    99.5/0.5   64.6/13.5  99.2/0.4    98.8/0.9   13.8/12.6   98.1/0.9
         C       6.2/8.5    100.0/0.0  7.2/8.4     6.2/8.1    100.0/0.0  7.2/8.0     4.7/6.2    100.0/0.0   5.6/6.1
/w/      A       98.1/0.3   96.0/1.3   98.1/0.3    97.9/0.4   68.1/5.8   97.6/0.4    91.0/2.1   36.8/4.9    90.6/2.1
         B       97.2/1.2   65.8/12.3  96.8/1.1    97.7/1.1   66.7/12.8  97.4/0.9    93.4/2.7   58.4/13.4   93.1/2.6
         C       21.8/26.2  95.0/14.5  22.5/25.8   21.8/25.7  97.5/6.9   22.6/25.4   16.6/22.2  97.7/6.8    17.2/22.0
/ie/     A       95.5/1.1   99.0/1.8   95.5/1.1    95.1/1.2   67.9/10.2  94.8/1.1    94.4/2.4   34.8/13.2   93.8/2.3
         B       89.1/26.4  33.2/28.4  88.6/25.8   89.3/26.4  30.4/28.2  88.6/25.8   87.3/27.9  22.2/30.9   86.6/27.4
         C       45.6/47.6  61.0/46.2  45.7/46.6   45.5/47.4  62.1/45.9  45.6/46.4   42.8/46.0  66.2/43.6   43.0/45.2
/I/      A       97.4/0.5   99.3/0.9   97.4/0.5    96.5/0.7   81.8/2.6   96.4/0.6    94.0/1.6   64.8/9.2    93.6/1.5
         B       97.3/1.0   58.4/14.6  96.6/1.0    97.3/0.9   60.4/15.6  96.7/0.8    96.6/1.6   48.1/20.9   95.8/1.5
         C       17.7/32.2  90.2/29.3  18.9/31.2   17.7/31.8  90.1/29.8  18.9/30.8   15.5/29.5  94.9/15.4   16.8/29.8
/e/      A       95.5/0.9   98.1/3.0   95.6/0.8    95.6/0.9   91.8/4.2   95.5/0.8    96.5/0.9   48.2/12.1   95.0/0.6
         B       96.1/0.5   87.5/7.9   95.9/0.4    96.3/0.4   84.1/8.3   95.9/0.2    96.8/0.7   66.5/9.0    95.9/0.6
         C       45.9/45.6  85.1/20.8  46.8/44.1   45.7/45.7  84.4/20.7  46.7/44.1   42.8/45.1  89.6/15.3   44.2/43.3
/&/      A       97.3/0.5   98.5/1.6   97.3/0.5    96.8/0.7   90.8/3.9   96.6/0.6    95.0/0.8   92.4/7.0    94.9/0.6
         B       96.3/2.7   85.0/8.9   96.0/2.6    96.3/2.4   81.2/8.9   95.9/2.3    93.3/2.5   95.0/4.2    93.4/2.4
         C       3.8/10.0   100.0/0.0  6.5/9.7     3.8/9.7    100.0/0.0  6.5/9.5     3.3/9.1    100.0/0.0   5.9/8.8
Phoneme  Train         Recalled with Set A                 Recalled with Set B                 Recalled with Set C
         set     True Neg.  True Pos.  Overall     True Neg.  True Pos.  Overall     True Neg.  True Pos.  Overall

/V/      A       93.5/0.8   99.5/0.6   93.6/0.8    93.3/1.0   77.4/5.0   93.1/0.9    89.6/1.3   85.8/2.6    89.6/1.2
         B       97.0/1.2   47.0/23.9  96.2/1.0    97.0/1.3   44.9/13.7  96.2/1.1    94.6/2.2   60.3/14.2   94.1/2.0
         C       6.0/11.9   100.0/0.0  7.5/11.7    5.7/11.3   100.0/0.0  7.2/11.1    4.6/9.1    100.0/0.0   5.9/8.9
/A/      A       96.9/0.8   99.5/0.3   97.0/0.8    96.3/0.8   88.4/1.9   96.1/0.8    96.7/1.7   78.1/5.3    96.4/1.6
         B       95.0/3.4   90.6/10.1  95.0/3.3    94.9/3.3   84.8/12.1  94.7/3.2    94.6/4.6   77.2/18.9   94.3/4.3
         C       13.8/25.4  99.9/0.4   15.3/24.9   13.4/24.5  99.9/0.4   14.9/24.1   10.7/19.2  100.0/0.0   12.2/18.9
/U/      A       97.5/0.7   97.4/5.4   97.5/0.7    97.3/0.9   57.9/11.3  96.9/0.8    94.4/1.4   49.5/10.3   93.9/1.4
         B       98.1/0.8   71.2/21.0  97.8/0.7    98.1/0.8   50.9/16.4  97.7/0.7    93.1/1.4   46.2/24.6   92.6/1.2
         C       15.5/30.9  90.3/29.0  16.2/30.4   15.5/30.8  90.7/27.8  16.2/30.2   12.6/26.6  90.4/28.8   13.4/26.0
/i/      A       93.6/1.6   96.0/2.7   93.6/1.5    92.3/1.9   87.1/2.8   92.2/1.7    91.0/2.4   46.4/13.6   88.7/1.7
         B       97.0/1.4   39.4/20.0  95.2/0.9    96.6/1.6   39.1/20.2  94.8/1.1    95.5/2.1   24.2/13.5   91.9/1.5
         C       38.4/40.1  81.1/34.6  39.7/37.9   38.2/39.9  81.0/34.1  39.6/37.7   35.2/39.3  79.4/33.2   37.4/35.8
/a/      A       95.8/1.0   96.3/2.4   95.8/0.8    94.7/1.3   96.0/2.8   94.8/1.1    95.9/0.9   94.4/2.2    95.8/0.7
         B       92.7/4.2   94.3/4.2   92.8/4.1    91.9/4.1   95.3/5.4   92.1/4.1    91.7/4.4   96.6/6.4    92.1/4.0
         C       1.5/2.5    100.0/0.0  6.7/2.4     1.9/2.9    100.0/0.0  7.0/2.8     1.2/2.3    100.0/0.0   7.5/2.1
/O/      A       95.9/0.9   96.4/3.4   95.9/0.7    95.5/1.1   88.0/7.1   95.1/0.7    92.2/2.3   76.5/12.4   91.4/1.7
         B       93.2/11.2  73.7/15.1  92.2/10.1   93.2/11.5  68.9/16.2  92.0/10.3   90.6/11.9  66.7/13.5   89.5/10.8
         C       24.4/34.8  96.7/7.8   28.0/32.7   23.7/33.7  96.2/9.0   27.4/31.6   18.6/27.8  98.1/4.6    22.5/26.3
/3/      A       95.9/1.5   95.6/1.3   95.8/1.4    94.9/1.6   86.8/1.6   94.5/1.5    93.4/1.8   81.1/4.6    92.9/1.6
         B       96.1/2.7   34.5/16.8  93.1/2.4    93.0/2.6   39.1/17.0  93.4/2.1    92.3/4.7   49.3/19.7   90.4/4.2
         C       21.1/20.4  99.3/1.6   24.8/19.3   21.2/19.4  99.6/1.0   24.8/18.5   17.2/17.7  99.8/0.5    20.8/16.9
/u/      A       96.5/1.2   94.1/3.3   96.4/1.0    96.1/1.2   77.8/8.1   95.4/0.9    94.6/1.6   33.2/6.2    91.7/1.4
         B       96.0/0.6   76.1/8.3   95.2/0.7    96.6/0.4   69.5/8.2   95.5/0.5    96.1/1.6   20.2/8.6    92.6/1.4
         C       18.4/28.3  95.0/12.3  21.4/26.7   18.4/28.1  95.3/12.5  21.4/26.5   14.8/23.6  94.9/14.9   18.6/21.9
/el/     A       92.2/2.8   87.0/7.3   92.0/2.3    90.4/3.4   78.2/7.9   89.8/2.8    90.9/3.5   28.0/10.9   89.4/3.2
         B       92.4/5.4   49.3/14.6  90.4/5.0    92.7/4.9   48.4/13.9  90.6/4.4    91.2/5.0   28.2/18.1   89.7/4.5
         C       25.2/28.8  83.0/27.9  27.9/26.2   24.9/27.8  84.6/26.3  27.7/25.3   20.5/24.1  80.0/32.8   21.9/22.9
Phoneme  Train         Recalled with Set A                 Recalled with Set B                 Recalled with Set C
         set     True Neg.  True Pos.  Overall     True Neg.  True Pos.  Overall     True Neg.  True Pos.  Overall

/al/     A       92.1/3.0   85.4/5.3   91.7/2.5    90.4/3.2   79.3/5.8   89.6/2.7    92.8/2.7   2.2/1.7     89.3/2.5
         B       93.2/3.8   57.4/12.3  90.8/2.9    93.1/3.6   61.5/10.9  90.9/2.8    93.7/5.2   7.5/13.4    90.4/4.6
         C       27.8/41.9  77.9/43.7  31.2/36.2   28.0/41.8  77.8/43.8  31.3/36.1   26.3/42.2  77.8/44.1   28.4/38.9
/Oi/     A       93.7/1.8   82.6/7.2   93.3/1.5    93.2/1.9   71.9/6.9   92.3/1.6    91.0/3.2   0.3/0.3     86.4/3.0
         B       93.2/3.4   41.5/15.1  91.1/3.0    93.4/3.3   46.7/17.1  91.5/2.8    89.2/5.8   0.4/0.8     84.6/5.5
         C       16.7/13.0  99.9/0.2   20.0/12.4   16.3/12.7  100.0/0.0  19.6/12.2   12.8/10.2  99.0/2.9    17.2/9.6
/OU/     A       94.6/2.0   82.5/1.7   93.9/1.7    94.9/1.0   75.7/9.5   93.7/2.0    86.0/1.9   8.8/4.3     81.9/4.3
         B       83.5/23.9  52.3/19.4  81.7/21.6   83.6/24.1  53.0/19.2  81.8/21.7   80.3/24.9  14.2/32.5   77.2/22.3
         C       16.7/23.6  93.5/16.4  21.1/21.4   16.5/22.6  94.8/13.8  21.1/20.5   13.7/17.8  98.5/4.5    17.6/16.8
/aU/     A       92.3/2.3   93.4/4.6   92.4/2.0    90.6/2.7   86.0/5.5   90.4/2.4    95.1/1.5   0.9/1.1     90.5/1.4
         B       97.0/0.8   40.7/11.7  94.4/0.7    96.3/1.0   46.4/9.9   94.0/0.8    96.3/1.5   4.0/3.8     91.8/1.3
         C       39.8/39.7  73.3/35.7  41.3/36.4   39.2/39.2  75.6/34.3  40.9/35.9   34.7/37.4  83.0/34.5   37.0/34.2
/i@/     A       93.5/1.7   95.3/1.2   93.5/1.4    92.7/2.0   80.6/7.8   92.3/1.7    88.9/1.6   1.9/1.7     84.8/1.5
         B       87.4/25.0  47.1/23.8  85.9/23.3   86.9/24.9  46.0/23.7  85.4/23.2   82.4/25.3  14.4/32.1   79.2/22.7
         C       31.5/27.7  88.0/30.5  33.6/25.7   31.1/37.4  87.8/31.7  33.2/25.3   27.4/28.4  85.8/32.4   30.1/25.6
/U@/     A       96.0/1.0   91.7/6.8   95.9/0.8    92.4/1.2   65.7/4.5   91.8/7.3    95.1/1.3   1.8/1.0     92.1/1.6
         B       86.3/27.2  47.2/22.6  85.4/26.1   86.5/26.9  51.1/20.4  85.7/25.9   81.8/27.7  17.5/31.9   79.9/26.0
         C       3.1/5.0    100.0/0.0  5.3/4.9     3.5/5.6    100.0/0.0  5.6/5.4     2.2/3.4    100.0/0.0   5.0/3.3
/e@/     A       94.0/0.9   97.3/1.0   94.0/0.8    93.3/1.1   75.1/3.0   92.9/1.0    92.6/1.0   52.8/7.4    91.4/0.8
         B       95.2/2.5   47.9/25.4  94.0/1.8    95.3/2.4   42.2/22.6  94.0/1.8    95.1/2.9   28.5/22.2   93.2/2.3
         C       18.1/32.6  87.6/33.1  19.9/31.0   18.3/32.4  87.4/33.1  20.1/30.8   16.5/32.6  88.9/33.3   18.6/30.7
Table F.2: Mean and standard deviation percent true negative, true positive and overall accuracies (to 1 d.p.) of FuNN trained on the phoneme classification problem.
Phoneme  Train         Recalled with Set A                 Recalled with Set B                 Recalled with Set C
         set     True Neg.  True Pos.  Overall     True Neg.  True Pos.  Overall     True Neg.  True Pos.  Overall

/p/      A       96.2/8.2   86.7/29.3  95.8/7.8    96.6/7.6   48.0/31.6  96.0/7.2    96.3/6.9   36.9/33.6   95.8/6.6
         B       88.9/33.3  11.1/33.3  88.0/32.6   88.9/33.3  11.1/33.3  88.0/32.6   88.9/33.3  11.1/33.3   88.2/32.8
         C       88.9/33.3  11.1/33.3  88.0/32.6   88.9/33.3  11.1/33.3  88.0/32.6   88.9/33.3  11.1/33.3   88.2/32.8
/b/      A       99.6/0.9
         B       100.0/0.0  0.0/0.0    99.8/0.0
         C       100.0/0.0  0.0/0.0    99.8/0.0
/t/      A       97.6/3.3
         B       66.7/50.0  33.3/50.0  66.2/48.6   66.7/50.0  33.3/50.0  66.2/48.6   66.7/50.0  33.3/50.0   66.4/49.3
         C       66.7/50.0  33.3/50.0  66.2/48.6   66.7/50.0  33.3/50.0  66.2/48.6   66.7/50.0  33.3/50.0   66.4/49.3
/d/      A       97.9/6.3
         B       13.1/26.1  99.5/0.9   18.5/25.6   96.6/2.9   9.5/28.4
         C       100.0/0.0  0.0/0.0    99.7/0.0    100.0/0.0  0.0/0.0    99.7/0.0    100.0/0.0  0.0/0.0     99.7/0.0
99.7/0.8
99.6/0.7
99.9/0.2
100.0/0.0 0.0/0.0
99.8/0.0
100.0/33.3 0.0/0.0
99.7/33.2
100.0/0.0 0.0/0.0
99.8/0.0
100.0/0.0 0.0/0.0
99.7/0.0
97.5/3.6
6.2/14.8
19.7/27.9 96.4/3.1
9.4/28.2
98.1/2.5
0.5/1.4
99.6/0.2
15.2/24.4 97.6/2.4
97.7/6.2
97.9/6.3
97.6/6.2
98.5/4.5
4.8/14.5
98.2/4.5
100.0/0.0 0.0/0.0
99.7/0.0
100.0/0.0 0.0/0.0
99.7/0.0
100.0/0.2 0.0/0.0
99.6/0.2
/k/      A       88.9/33.3  11.1/33.3  88.1/32.6   88.9/33.3  11.1/33.3  88.1/32.6   88.9/33.3  11.1/33.3   88.4/32.9
         B       88.9/33.3  11.1/33.3  88.1/32.6   88.9/33.3  11.1/33.3  88.1/32.6   88.9/33.3  11.1/33.3   88.4/32.9
         C       88.9/33.3  11.1/33.3  88.1/32.6   88.9/33.3  11.1/33.3  88.1/32.6   88.9/33.3  11.1/33.3   88.4/32.9
/g/      A       100.0/0.0  0.0/0.0    99.7/0.0    100.0/0.0  0.0/0.0    99.7/0.0    100.0/0.1  0.0/0.0     99.5/0.1
         B       100.0/0.0  0.0/0.0    99.7/0.0    100.0/0.0  0.0/0.0    99.7/0.0    100.0/0.0  0.0/0.0     99.5/0.0
         C       88.9/33.3  11.1/33.3  88.7/33.1   88.9/33.3  11.1/33.3  88.7/33.2   88.9/33.3  11.1/33.3   88.5/33.0
/f/      A       77.8/44.1  22.2/44.1  76.1/41.5   77.8/44.1  22.2/44.1  76.1/41.5   77.8/44.1  22.2/44.1   76.4/41.9
         B       55.6/52.7  44.4/52.7  55.2/49.6   55.6/52.7  44.4/52.7  55.2/49.6   55.6/52.7  44.4/52.7   55.3/50.1
         C       55.6/52.7  44.4/52.7  55.2/49.6   55.6/52.7  44.4/52.7  55.2/49.6   55.6/52.7  44.4/52.7   55.3/50.1
/v/      A       88.9/33.3  11.1/33.3  88.3/32.9   88.9/33.3  11.1/33.3  88.3/32.9   88.9/33.3  11.1/33.3   87.6/32.2
         B       100.0/0.0  0.0/0.0    99.3/0.0    100.0/0.0  0.0/0.0    99.3/0.0    100.0/0.0  0.0/0.0     98.3/0.0
         C       100.0/0.0  0.0/0.0    99.3/0.0    100.0/0.0  0.0/0.0    99.3/0.0    100.0/0.0  0.0/0.0     98.3/0.0
/T/      A       100.0/0.0  0.0/0.0    97.5/0.0    100.0/0.0  0.0/0.0    97.5/0.0    100.0/0.1  0.0/0.0     97.5/0.1
         B       55.6/52.7  44.4/52.7  55.3/50.1   55.6/52.7  44.4/52.7  55.3/50.1   55.6/52.7  44.4/52.7   55.3/50.1
         C       55.6/52.7  44.4/52.7  55.3/50.1   55.6/52.7  44.4/52.7  55.3/50.1   55.6/52.7  44.4/52.7   55.3/50.1
Phoneme  Train         Recalled with Set A                 Recalled with Set B                 Recalled with Set C
         set     True Neg.  True Pos.  Overall     True Neg.  True Pos.  Overall     True Neg.  True Pos.  Overall

/D/      A       88.9/33.3  11.1/33.3  88.3/32.8   88.9/33.3  11.1/33.3  88.3/32.8   88.9/33.3  11.1/33.3   87.6/32.2
         B       100.0/0.0  0.0/0.0    99.2/0.0    100.0/0.0  0.0/0.0    99.2/0.0    100.0/0.0  0.0/0.0     98.4/0.0
         C       100.0/0.0  0.0/0.0    99.2/0.0    100.0/0.0  0.0/0.0    99.2/0.0    100.0/0.0  0.0/0.0     98.4/0.0
/s/      A       96.4/3.0   95.6/3.0   95.3/2.7
         B       33.3/50.0  67.2/49.2  34.5/46.5   33.3/50.0  67.1/49.4  34.5/46.4   33.3/50.0  66.7/50.0   33.9/48.4
         C       33.3/50.0  67.4/49.0  34.5/46.5   33.3/50.0  67.2/49.2  34.6/46.4   33.3/50.0  66.7/50.0   33.9/48.4
/z/      A       100.0/0.0  0.0/0.0
         B       66.7/50.0  33.3/50.0  66.0/48.1   66.7/50.0  33.3/50.0  66.0/48.1   66.7/50.0  33.3/50.0   65.9/47.8
         C       66.7/50.0  33.3/50.0  66.0/48.1   66.7/50.0  33.3/50.0  66.0/48.1   66.7/50.0  33.3/50.0   65.9/47.8
/S/      A       96.8/0.6
         B       77.8/44.1  22.2/44.1  76.2/41.5   77.8/44.1  22.2/44.1  76.2/41.5   77.8/44.1  22.2/44.1   76.2/41.6
         C       77.8/44.1  22.2/44.1  76.2/41.5   77.8/44.1  22.2/44.1  76.2/41.5   77.8/44.1  22.2/44.1   76.2/41.6
/Z/      A       95.7/4.7
         B       100.0/0.0  0.0/0.0    99.2/0.0    100.0/0.0  0.0/0.0    99.2/0.0    100.0/0.0  0.0/0.0     98.8/0.0
         C       100.0/0.0  0.0/0.0    99.2/0.0    100.0/0.0  0.0/0.0    99.2/0.0    100.0/0.0  0.0/0.0     98.8/0.0
/h/      A       88.9/33.3  11.1/33.3  88.3/32.8   88.9/33.3  11.1/33.3  88.3/32.8   88.9/33.3  11.1/33.3   87.5/32.2
         B       100.0/0.0  0.0/0.0
         C       88.9/33.3  11.1/33.3  88.3/32.8   88.9/33.3  11.1/33.3  88.3/32.8   88.9/33.3  11.1/33.3   87.5/32.2
/ch/     A       93.4/6.8
         B       100.0/0.0  0.0/0.0
         C       88.9/33.3  11.1/33.3  87.2/31.9   88.9/33.3  11.1/33.3  87.2/31.9   88.9/33.3  11.1/33.3   87.6/32.3
/dj/     A       73.5/42.0  59.9/45.9  73.4/41.8   73.5/42.0  61.1/46.4  73.4/41.8   74.4/42.3  23.7/43.3   73.7/41.1
         B       99.5/1.5
         C       100.0/0.0  0.0/0.0
/m/      A       93.0/11.4  54.4/45.1  92.5/10.9   92.7/11.9  52.6/43.4  92.2/11.4   90.2/11.4  34.4/37.0   89.6/11.0
         B       77.8/44.1  22.7/43.9  77.1/43.0   77.8/44.1  22.4/44.0  77.1/43.0   77.7/44.1  22.2/44.1   77.2/43.2
         C       77.8/44.1  22.2/44.1  77.1/43.0   77.8/44.1  22.2/44.1  77.1/43.0   77.8/44.1  22.2/44.1   77.2/43.2
81.7/30.9 95.9/2.5
92.3/5.6
98.1/0.0
96.7/0.5
44.6/43.1 95.3/4.3
99.2/0.0
78.8/30.5 93.1/6.3
9.1/27.2
97.8/0.0
95.6/3.2
82.3/31.3 95.1/2.5
100.0/0.0 0.0/0.0
96.7/0.8
95.8/4.4
94.4/3.6
96.6/0.6
44.7/42.6 95.4/4.1
100.0/0.0 0.0/0.0
93.5/6.4
98.1/0.0
99.2/0.0
78.8/30.4 93.2/5.9
100.0/0.0 0.0/0.0
99.1/1.4
99.5/1.6
9.3/27.8
99.6/0.0
100.0/0.0 0.0/0.0
97.8/0.0
81/30.7
100.0/0.0 0.0/0.0
96.9/0.5
96.6/3.3
85.3/12.1 96.6/0.1
35.4/39.8 95.9/2.9
100.0/0.0 0.0/0.0
93.7/4.8
97.8/0.0
98.3/0.0
59.5/25.4 93.2/4.4
100.0/0.0 0.0/0.0
98.4/0.0
99.1/1.5
99.4/1.8
0.1/0.3
98.0/1.8
99.6/0.0
100.0/0.0 0.0/0.0
98.6/0.0
Phoneme  Train         Recalled with Set A                 Recalled with Set B                 Recalled with Set C
         set     True Neg.  True Pos.  Overall     True Neg.  True Pos.  Overall     True Neg.  True Pos.  Overall

/n/      A       100.0/0.0  0.0/0.0    98.2/0.0    100.0/0.0  0.0/0.0    98.2/0.0    100.0/0.0  0.0/0.0     98.7/0.0
         B       44.4/52.7  55.6/52.7  44.6/50.8   44.4/52.7  55.6/52.7  44.6/50.8   44.4/52.7  55.6/52.7   44.6/51.3
         C       55.6/52.7  44.4/52.7  55.4/50.8   55.6/52.7  44.4/52.7  55.4/50.8   55.6/52.7  44.4/52.7   55.4/51.3
/N/      A       100.0/0.0  0.0/0.0    99.1/0.0    100.0/0.0  0.0/0.0    99.1/0.0    100.0/0.1  0.0/0.0     99.1/0.1
         B       100.0/0.0  0.0/0.0    99.1/0.0    100.0/0.0  0.0/0.0    99.1/0.0    100.0/0.0  0.0/0.0     99.1/0.0
         C       100.0/0.0  0.0/0.0    99.1/0.0    100.0/0.0  0.0/0.0    99.1/0.0    100.0/0.0  0.0/0.0     99.1/0.0
/l/      A       88.9/33.3  11.1/33.3  87.8/32.4   88.9/33.3  11.1/33.3  87.9/32.4   88.9/33.3  11.1/33.3   87.2/31.9
         B       88.9/33.3  11.1/33.3  87.8/32.4   88.9/33.3  11.1/33.3  87.9/32.4   88.9/33.3  11.1/33.3   87.2/31.9
         C       88.9/33.3  11.1/33.3  87.8/32.4   88.9/33.3  11.1/33.3  87.9/32.4   88.9/33.3  11.1/33.3   87.2/31.9
/r/      A       93.9/12.3  72.5/17.5  93.7/12.0   93.7/12.2  67.3/17.5  93.5/1.9    89.0/16.1  27.9/29.2   88.5/15.6
         B       100.0/0.0  0.0/0.0    99.0/0.0    100.0/0.0  0.0/0.0    98.9/0.0    100.0/0.0  0.0/0.0     99.1/0.0
         C       100.0/0.0  0.0/0.0    99.0/0.0    100.0/0.0  0.0/0.0    98.9/0.0    100.0/0.0  0.0/0.0     99.1/0.0
/w/      A       100.0/0.0  0.0/0.0    99.0/0.0    100.0/0.0  0.0/0.0    99.0/0.0    100.0/0.0  0.0/0.0     99.2/0.0
         B       100.0/0.0  0.0/0.0    99.0/0.0    100.0/0.0  0.0/0.0    99.0/0.0    100.0/0.0  0.0/0.0     99.2/0.0
         C       100.0/0.0  0.0/0.0    99.0/0.0    100.0/0.0  0.0/0.0    99.0/0.0    100.0/0.0  0.0/0.0     99.2/0.0
/ie/     A       100.0/0.0  0.0/0.0    98.9/0.0    100.0/0.0  0.0/0.0    98.9/0.0    100.0/0.0  0.0/0.0     99.0/0.0
         B       100.0/0.0  0.0/0.0    98.9/0.0    100.0/0.0  0.0/0.0    98.9/0.0    100.0/0.0  0.0/0.0     99.0/0.0
         C       100.0/0.0  0.0/0.0    98.9/0.0    100.0/0.0  0.0/0.0    98.9/0.0    100.0/0.0  0.0/0.0     99.0/0.0
/I/      A       86.2/12.7  67.6/31.4  85.9/12.0   86.5/12.2  63.4/29.4  86.1/11.6   77.4/16.0  67.1/38.9   77.2/15.2
         B       88.9/33.3  11.6/33.2  87.6/32.2   88.9/33.3  11.5/33.2  87.6/32.2   88.9/33.3  11.1/33.3   87.6/32.2
         C       78.1/43.3  22.2/44.1  77.2/41.8   78.2/43.3  22.2/44.1  77.2/41.8   78.0/43.7  21.4/42.6   77.0/42.3
/e/      A       92.8/7.2   90.5/6.0   92.7/6.9    93.3/7.5   85.4/8.0   93.1/7.1    90.6/5.9   79.3/32.7   90.3/5.0
         B       88.9/33.3  11.1/33.3  87.0/31.7   88.9/33.3  11.1/33.3  86.9/31.7   88.9/33.3  11.1/33.3   86.5/31.3
         C       88.9/33.3  11.1/33.3  87.0/31.7   88.9/33.3  11.1/33.3  86.9/31.7   88.9/33.3  11.1/33.3   86.5/31.3
/&/      A       96.1/3.2   66.7/30.3  95.3/2.5    95.9/3.5   57.5/26.0  94.8/2.8    92.8/7.1   62.2/16.0   91.9/6.5
         B       77.8/44.1  22.2/44.1  76.2/41.6   77.8/44.1  22.2/44.1  76.2/41.6   77.8/44.1  22.2/44.1   76.3/41.7
         C       77.8/44.1  22.2/44.1  76.2/41.6   77.8/44.1  22.2/44.1  76.2/41.6   77.8/44.1  22.2/44.1   76.3/41.7
Phoneme  Train         Recalled with Set A                 Recalled with Set B                 Recalled with Set C
         set     True Neg.  True Pos.  Overall     True Neg.  True Pos.  Overall     True Neg.  True Pos.  Overall

/V/      A       88.5/5.1   83.0/13.9  88.5/4.8    88.3/5.5   76.8/20.8  88.1/5.1    86.6/3.9   88.5/15.0   86.6/3.6
         B       88.9/33.3  11.1/33.3  87.6/32.3   88.9/33.3  11.1/33.3  87.6/32.3   88.9/33.3  11.1/33.3   87.8/32.4
         C       88.9/33.3  11.1/33.3  87.6/32.3   88.9/33.3  11.1/33.3  87.6/32.3   88.9/33.3  11.1/33.3   87.8/32.4
/A/      A       93.3/1.9
         B       88.9/33.3  11.1/33.3  87.6/32.2   88.9/33.3  11.1/33.3  87.5/32.2   88.9/33.3  11.1/33.3   87.6/32.2
         C       88.9/33.3  11.1/33.3  87.6/32.2   88.9/33.3  11.1/33.3  87.5/32.2   88.9/33.3  11.1/33.3   87.6/32.2
/U/      A       90.9/4.5
         B       77.8/44.1  22.2/44.1  77.2/43.2   77.8/44.1  22.2/44.1  77.2/43.2   77.8/44.1  22.2/44.1   77.1/43.1
         C       76.5/43.5  33.1/49.7  76.0/42.7   76.5/43.5  32.4/48.7  76.0/42.7   75.2/43.3  32.2/48.4   74.7/42.3
/i/      A       95.6/10.2  15.9/31.4  93.1/8.9
         B       66.7/50.0  33.3/50.0  65.6/46.8   66.7/50.0  33.3/50.0  65.6/46.8   66.7/50.0  33.3/50.0   65.0/45.0
         C       66.7/50.0  33.3/50.0  65.6/46.8   66.7/50.0  33.3/50.0  65.6/46.8   66.7/50.0  33.3/50.0   65.0/45.0
/a/      A       92.9/3.3
         B       88.9/33.3  11.1/33.3  84.8/29.8   88.9/33.3  11.1/33.3  84.8/29.8   88.9/33.3  11.1/33.3   84.0/29.1
         C       88.9/33.3  11.1/33.3  84.8/29.8   88.9/33.3  11.1/33.3  84.8/29.8   88.9/33.3  11.1/33.3   84.0/29.1
/O/      A       92.6/5.4
         B       66.7/50.0  33.3/50.0  65.0/44.9   66.7/50.0  33.3/50.0  65.0/44.9   66.7/50.0  33.3/50.0   65.0/45.1
         C       66.7/50.0  33.3/50.0  65.0/44.9   66.7/50.0  33.3/50.0  65.0/44.9   66.7/50.0  33.3/50.0   65.0/45.1
/3/      A       92.7/7.5
         B       77.8/44.0  22.2/44.1  75.1/39.9   77.8/44.1  22.2/44.1  75.2/39.9   77.8/44.1  22.2/44.1   75.3/40.2
         C       77.8/44.1  22.2/44.1  75.1/39.9   77.8/44.1  22.2/44.1  75.2/39.9   77.8/44.1  22.2/44.1   75.3/40.2
/u/      A       88.1/33.1  19.4/37.0  85.3/30.6   88.2/33.1  18.7/36.5  85.4/30.6   88.3/33.1  15.2/34.0   84.9/30.0
         B       55.6/52.7  44.4/52.7  55.1/48.6   55.6/52.7  44.4/52.7  55.1/48.6   55.6/52.7  44.4/52.7   55.0/47.8
         C       55.6/52.7  44.4/52.7  55.1/48.6   55.6/52.7  44.4/52.7  55.1/48.6   55.6/52.7  44.4/52.7   55.0/47.8
/el/     A       87.4/32.7  19.4/37.4  84.3/29.7   87.4/32.8  19.0/37.2  84.2/29.7   86.8/32.8  15.8/31.3   85.2/31.3
         B       66.7/50.0  33.3/50.0  65.1/45.3   66.7/50.0  33.3/50.0  65.1/45.3   66.7/50.0  33.3/50.0   65.9/47.7
         C       66.7/50.0  33.3/50.0  65.1/45.3   66.7/50.0  33.3/50.0  65.1/45.3   66.7/50.0  33.3/50.0   65.9/47.7
93.6/4.6
93.3/1.8
79.6/30.3 90.7/4.2
81.7/31.1 92.3/1.9
74.0/29.0 91.6/4.2
58.7/35.7 91.0/5.9
93.0/21.2 87.0/7.5
91.0/4.3
92.9/2.0
65.3/26.5 90.7/4.0
95.6/10.4 14.6/29.6 93.0/9.1
92.0/3.7
91.9/5.8
92.3/7.4
91.6/31.5 91.4/2.1
68.9/27.1 90.8/4.6
54.4/33.3 90.5/5.9
90.8/2.4
82.5/7.1
95.7/9.7
91.0/2.7
87.7/3.1
90.8/2.3
77.4/29.1 82.5/6.7
15.7/31.9 91.8/7.6
86.7/32.4 93.6/1.4
82.7/10.3 77.2/29.5 82.5/8.8
90.5/8.2
53.5/32.0 88.9/6.6
Phoneme  Train         Recalled with Set A                 Recalled with Set B                 Recalled with Set C
         set     True Neg.  True Pos.  Overall     True Neg.  True Pos.  Overall     True Neg.  True Pos.  Overall

/al/     A       85.3/3.9   65.6/10.4  84.0/3.0    85.1/4.4   70.9/10.3  84.1/3.4    81.8/2.5   37.1/8.2    80.1/2.2
         B       88.9/33.3  11.1/33.3  83.6/28.8   88.9/33.3  11.1/33.3  83.6/28.8   88.9/33.3  11.1/33.3   85.9/30.8
         C       88.9/33.3  11.1/33.3  83.6/28.8   88.9/33.3  11.1/33.3  83.6/28.8   88.9/33.3  11.1/33.3   85.9/30.8
/Oi/     A       90.1/6.5   51.9/30.5  88.6/5.1    90.0/6.5   45.2/26.9  88.2/5.2    82.1/10.8  0.1/0.1     78.0/10.2
         B       88.9/33.3  11.1/33.3  85.8/30.7   88.9/33.3  11.1/33.3  85.8/30.7   88.9/33.3  11.1/33.3   84.9/29.9
         C       88.9/33.3  11.1/33.3  85.8/30.7   88.9/33.3  11.1/33.3  85.8/30.7   88.9/33.3  11.1/33.3   84.9/29.9
/OU/     A       95.7/3.9   21.8/14.9  91.5/3.0    96.0/3.6   17.8/12.8  91.5/2.9    90.5/7.2   9.6/10.5    86.7/6.5
         B       100.0/0.0  0.0/0.0    94.2/0.0    100.0/0.0  0.0/0.0    94.2/0.0    100.0/0.0  0.0/0.0     95.4/0.0
         C       100.0/0.0  0.0/0.0    94.2/0.0    100.0/0.0  0.0/0.0    94.2/0.0    100.0/0.0  0.0/0.0     95.4/0.0
/aU/     A       100.0/0.0  0.0/0.0    95.4/0.0    100.0/0.0  0.0/0.0    95.3/0.0    100.0/0.0  0.0/0.0     95.1/0.0
         B       44.4/52.7  55.6/52.7  45.0/47.8   44.4/52.7  55.6/52.7  45.0/47.8   44.4/52.7  55.6/52.7   45.0/47.6
         C       44.4/52.7  55.6/52.7  45.0/47.8   44.4/52.7  55.6/52.7  45.0/47.8   44.4/52.7  55.6/52.7   45.0/47.6
/i@/     A       99.0/2.9   6.3/18.8   95.6/2.1    99.0/2.9   5.9/17.6   95.7/2.1    98.4/4.7   0.0/0.0     93.9/4.4
         B       44.4/52.7  55.6/52.7  44.8/48.9   44.4/52.7  55.6/52.7  44.8/48.9   44.4/52.7  55.6/52.7   45.0/47.8
         C       44.4/52.7  55.6/52.7  44.8/48.9   44.4/52.7  55.6/52.7  44.8/48.9   44.4/52.7  55.6/52.7   45.0/47.8
/U@/     A       88.9/33.3  11.1/33.3  87.2/31.9   88.9/33.3  11.1/33.3  87.2/31.9   88.9/33.3  11.1/33.3   86.7/31.4
         B       55.6/52.7  44.4/52.7  55.3/50.4   55.6/52.7  44.4/52.7  55.3/50.4   55.6/52.7  44.4/52.7   55.2/49.7
         C       55.6/52.7  44.4/52.7  55.3/50.4   55.6/52.7  44.4/52.7  55.3/50.4   55.6/52.7  44.4/52.7   55.2/49.7
/e@/     A       97.7/6.8   8.0/24.1   95.5/6.0    97.8/6.6   7.6/22.8   95.5/5.9    97.6/7.3   8.1/24.4    95.0/6.4
         B       44.4/52.7  55.6/52.7  44.7/50.0   44.4/52.7  55.6/52.7  44.7/50.0   44.4/52.7  55.6/52.7   44.8/49.7
         C       44.4/52.7  55.6/52.7  44.7/50.0   44.4/52.7  55.6/52.7  44.7/50.0   44.4/52.7  55.6/52.7   44.8/49.7
Table F.3: Percent true negative, true positive and overall accuracies (to 1 d.p.) of EFuNN trained on the phoneme classification problem.
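The true negative, true positive and overall accuracy columns used throughout these tables follow the usual two-class definitions for a single phoneme treated as the positive class. As an illustrative sketch (not code from the thesis), they can be computed from confusion-matrix counts as follows:

```python
# Illustrative sketch: computing the percentage accuracies reported in
# these tables from confusion-matrix counts for one phoneme.
# The function name and example counts are hypothetical.

def accuracies(tn, fp, fn, tp):
    """Return (true negative %, true positive %, overall %) to 1 d.p."""
    true_neg = 100.0 * tn / (tn + fp)            # negatives correctly rejected
    true_pos = 100.0 * tp / (tp + fn)            # positives correctly recalled
    overall = 100.0 * (tn + tp) / (tn + fp + fn + tp)
    return round(true_neg, 1), round(true_pos, 1), round(overall, 1)

# A model that rejects every example scores 100% true negative, 0% true
# positive, and an overall accuracy equal to the proportion of negative
# examples -- the pattern visible in many rows of the rule-extraction tables.
print(accuracies(tn=989, fp=0, fn=11, tp=0))  # -> (100.0, 0.0, 98.9)
```

Because each phoneme is a rare positive class, the overall column is dominated by the true negative rate, which is why near-100% overall accuracy can coexist with 0% true positive accuracy.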
Columns: Phoneme; Trained with (A, B or C); then for recall on Set A, Set B and Set C in turn: True Neg., True Pos., Overall; Neurons.
Phonemes in this block: /p/, /b/, /t/, /d/, /k/, /g/, /f/, /v/, /T/.
A
97.9
6.9
96.9
97.9
10.5
96.9
96.0
13.6
95.4
163
B
95.7
60.3
95.3
96.1
77.2
95.8
93.5
49.2
93.1
274
C
97.4
63.8
97.0
98.3
66.7
97.9
99.3
91.5
99.2
345
A
98.2
23.5
98.1
98.5
44.4
98.4
98.8
0.0
98.5
103
B
86.3
64.7
86.3
88.4
88.9
88.4
90.0
73.9
89.9
176
C
96.9
41.2
69.8
97.2
44.4
97.1
99.3
95.7
99.3
229
A
97.3
23.6
96.3
97.2
26.9
96.2
97.7
21.7
97.2
23
B
82.3
88.6
82.4
83.0
80.6
83.0
83.6
80.4
83.5
86
C
96.9
60.7
96.4
97.2
68.7
96.8
98.8
93.5
98.8
116
A
96.2
29.6
96.0
95.4
30.8
95.3
97.0
21.7
96.8
129
B
88.8
55.6
88.7
89.7
100.0
89.7
89.7
47.8
89.6
186
C
97.0
55.6
96.9
97.5
100.0
97.5
98.6
65.2
98.5
242
A
96.0
59.6
95.6
95.4
63.5
95.1
98.2
57.1
98.0
145
B
90.4
89.0
90.4
91.5
94.2
91.5
91.6
83.3
91.6
215
C
97.8
50.5
97.2
97.9
57.7
97.5
99.2
59.5
98.9
231
A
95.9
64.3
95.8
94.9
53.8
94.8
98.5
0.0
98.0
138
B
86.0
89.3
86.0
86.8
100.0
86.8
88.5
83.3
88.5
213
C
98.0
60.7
97.9
97.8
100.0
97.8
98.8
27.8
98.4
252
A
92.9
99.7
93.1
93.3
97.3
93.4
95.2
93.1
95.2
143
B
94.1
82.9
93.8
95.2
86.4
94.9
94.2
80.3
93.8
223
C
96.3
76.9
95.7
97.3
74.8
96.6
98.9
89.0
98.7
248
A
97.1
57.5
96.8
96.8
54.3
96.5
94.9
83.9
94.7
173
B
87.6
78.1
87.6
88.1
82.9
88.1
86.7
68.6
86.4
281
C
94.4
42.5
94.0
94.5
65.7
94.3
97.4
66.1
96.9
389
A
92.4
88.7
92.3
93.4
76.6
93.0
94.3
58.7
93.4
147
B
91.7
84.0
91.5
92.2
95.2
92.2
91.1
93.0
91.2
254
C
94.2
68.8
93.6
95.2
79.0
94.8
97.9
75.0
97.3
317
Columns: Phoneme; Trained with (A, B or C); then for recall on Set A, Set B and Set C in turn: True Neg., True Pos., Overall; Neurons.
Phonemes in this block: /D/, /s/, /z/, /S/, /Z/, /h/, /ch/, /dj/, /m/.
A
91.2
41.8
90.9
91.2
34.2
90.7
94.5
3.5
93.0
148
B
89.6
74.7
89.5
89.5
89.5
89.5
87.9
53.0
87.3
277
C
93.9
41.8
93.5
93.6
73.7
93.5
96.8
68.7
96.4
387
A
98.6
56.1
97.1
97.8
48.3
96.0
96.8
53.0
96.1
78
B
89.5
79.6
89.1
89.2
78.7
88.8
87.2
16.5
86.1
192
C
97.9
80.7
97.2
97.7
73.0
96.8
97.3
74.8
96.9
237
A
97.7
74.4
97.3
98.0
56.8
97.3
98.9
69.9
98.3
87
B
87.3
95.4
87.4
88.3
94.7
88.4
86.6
93.6
86.8
153
C
96.5
82.6
96.3
97.3
81.1
97.0
99.0
72.4
98.5
183
A
96.6
38.0
94.9
95.9
27.1
93.9
98.6
30.7
96.7
93
B
94.3
64.0
93.4
94.6
77.8
94.1
92.3
8.0
89.9
197
C
94.5
79.8
94.1
95.4
90.3
95.2
98.3
76.9
98.2
248
A
97.4
42.5
96.9
97.3
38.5
96.9
97.1
29.3
96.3
40
B
86.9
63.8
86.8
87.3
89.7
87.3
86.9
43.9
86.4
142
C
97.2
62.5
96.9
97.1
71.8
96.9
98.4
80.5
98.1
164
A
90.9
79.5
90.8
91.7
73.7
91.6
93.9
27.6
92.8
151
B
91.5
89.2
91.4
92.1
97.4
92.1
89.5
31.7
88.5
243
C
98.6
60.2
98.3
98.7
89.5
98.6
99.0
76.4
98.6
279
A
98.1
24.8
96.5
97.9
27.8
96.4
98.6
14.2
97.2
79
B
85.2
78.8
85.1
85.6
93.5
85.8
84.7
60.2
84.3
186
C
93.2
67.3
92.7
93.5
88.9
93.4
96.7
90.3
96.6
244
A
96.8
26.3
96.5
96.5
22.2
96.2
95.8
1.0
94.5
44
B
86.3
36.8
86.1
86.9
94.4
86.9
88.3
4.1
87.2
118
C
97.3
47.4
97.1
97.5
94.4
97.5
98.9
15.3
97.7
172
A
95.5
48.8
94.9
94.9
53.3
94.4
92.4
45.6
92.0
158
B
95.7
61.8
95.3
95.4
81.7
95.3
91.2
72.1
91.0
276
C
97.5
35.7
96.8
97.3
60.0
96.9
96.7
75.0
96.5
417
Columns: Phoneme; Trained with (A, B or C); then for recall on Set A, Set B and Set C in turn: True Neg., True Pos., Overall; Neurons.
Phonemes in this block: /n/, /N/, /l/, /r/, /w/, /ie/, /I/, /e/, /&/.
A
96.2
24.7
94.9
95.1
22.2
93.8
92.8
32.6
92.0
180
B
93.2
80.8
93.0
93.6
93.3
93.6
87.3
47.8
86.8
340
C
95.9
63.7
95.3
96.8
84.4
96.5
97.3
80.4
97.1
517
A
93.5
64.2
93.2
93.0
67.4
92.8
90.3
34.9
89.8
169
B
94.7
86.3
94.6
94.8
97.8
94.8
89.8
9.5
89.1
324
C
96.5
63.2
96.2
96.6
95.7
96.5
97.3
84.1
97.2
424
A
92.2
52.2
91.7
91.9
43.9
91.2
90.7
34.0
89.5
155
B
95.6
71.0
95.3
95.7
83.3
95.5
91.0
30.7
89.7
333
C
95.8
65.2
95.4
95.8
77.3
95.6
96.0
73.9
95.5
558
A
91.6
23.8
90.9
91.7
17.0
90.9
90.5
12.3
89.8
146
B
94.2
77.1
94.1
94.3
94.3
94.3
88.6
24.6
88.0
277
C
96.0
68.6
95.7
95.5
94.3
95.5
96.5
90.8
96.5
384
A
97.4
13.6
96.6
97.1
20.8
96.4
98.3
0.0
97.6
99
B
94.3
67.0
94.0
93.7
89.6
93.6
88.4
33.3
88.0
248
C
95.0
47.6
94.5
94.1
91.7
94.1
97.0
94.4
97.0
406
A
98.8
5.6
97.8
99.1
0.0
98.1
99.0
10.1
98.1
92
B
83.7
51.4
83.4
85.1
75.5
85.0
81.1
24.6
80.5
309
C
93.4
42.1
92.9
93.9
71.7
93.6
95.1
73.9
94.9
417
A
99.4
23.6
98.1
99.3
21.2
97.9
98.1
3.5
96.6
119
B
95.5
69.5
95.1
95.6
94.1
95.5
95.2
18.3
93.9
277
C
96.3
71.3
95.9
96.0
92.9
95.9
97.3
80.0
97.0
340
A
99.5
12.6
97.4
99.6
7.3
97.3
99.5
8.9
96.8
62
B
94.6
68.4
93.9
94.9
70.2
94.3
95.5
23.8
93.4
212
C
95.3
73.1
94.7
94.8
82.3
94.5
97.5
80.8
97.0
270
A
99.8
11.2
97.3
99.8
7.2
97.3
99.9
2.6
97.3
54
B
94.5
64.9
93.7
94.1
79.0
93.7
96.7
35.3
95.0
201
C
96.2
70.2
95.5
95.5
79.7
95.0
98.1
53.2
96.9
225
Columns: Phoneme; Trained with (A, B or C); then for recall on Set A, Set B and Set C in turn: True Neg., True Pos., Overall; Neurons.
Phonemes in this block: /V/, /A/, /U/, /i/, /a/, /O/, /3/, /u/, /el/.
A
99.4
20.2
98.1
99.5
7.5
98.0
99.8
2.1
98.5
49
B
95.5
34.4
94.5
96.1
57.5
95.4
95.1
10.4
93.9
187
C
97.2
34.4
96.2
97.2
57.5
96.5
96.8
49.0
96.1
233
A
97.8
46.2
97.0
97.4
37.2
96.3
97.7
7.5
96.2
76
B
97.8
58.4
97.2
97.7
74.4
97.3
97.8
20.0
96.5
219
C
97.9
60.7
97.2
97.3
73.3
96.9
96.3
80.8
96.1
254
A
99.3
19.2
98.6
99.4
8.3
98.6
98.0
19.8
97.1
46
B
98.6
57.6
98.2
98.7
85.4
98.6
96.1
3.7
95.0
164
C
96.7
59.6
96.4
96.8
85.4
96.6
96.9
56.8
96.4
256
A
92.8
37.2
91.0
92.2
35.2
90.4
91.9
39.0
89.3
151
B
93.6
52.9
92.3
93.5
76.7
92.9
90.2
49.3
88.2
365
C
93.4
60.9
92.3
92.6
83.0
92.3
93.7
72.9
92.6
496
A
98.9
26.8
95.1
99.1
20.6
94.9
98.6
36.6
94.6
51
B
97.5
67.7
96.0
98.1
74.8
96.8
99.0
18.1
93.9
182
C
96.3
69.1
94.9
96.2
76.0
95.2
98.3
81.7
97.2
214
A
99.3
18.3
95.2
99.1
16.3
95.0
98.6
10.8
94.3
73
B
96.2
63.0
94.5
96.5
72.9
95.3
94.0
52.9
92.0
236
C
92.0
58.0
90.3
92.8
68.9
91.6
92.7
82.6
92.2
423
A
98.8
35.1
95.7
99.1
25.6
95.6
99.5
5.1
95.4
68
B
95.4
71.8
94.3
95.7
76.5
94.8
91.1
46.9
89.2
217
C
95.8
70.1
94.6
96.0
66.7
94.6
96.5
57.9
94.8
318
A
94.3
35.2
92.0
94.2
26.7
91.5
95.2
27.3
92.0
151
B
93.2
72.3
92.3
93.9
87.2
93.6
91.2
27.9
88.2
302
C
91.2
72.1
90.4
91.2
83.6
90.9
91.7
86.1
91.4
428
A
99.4
1.9
94.9
99.5
1.3
94.9
99.7
0.6
97.4
82
B
93.9
48.9
91.8
94.2
66.0
92.9
91.6
24.8
90.1
273
C
91.4
57.4
89.8
91.4
69.8
90.4
92.9
80.6
92.6
354
Columns: Phoneme; Trained with (A, B or C); then for recall on Set A, Set B and Set C in turn: True Neg., True Pos., Overall; Neurons.
Phonemes in this block: /al/, /Oi/, /OU/, /aU/, /i@/, /U@/, /e@/.
A
99.6
7.3
93.3
99.2
5.4
92.9
99.6
6.7
96.0
100
B
88.4
57.4
86.3
89.3
71.9
88.1
89.2
23.3
86.7
291
C
84.9
65.4
83.6
85.4
72.8
84.6
88.8
50.0
87.3
416
A
98.1
14.3
94.7
98.4
11.3
95.0
96.8
0.0
91.9
89
B
91.1
62.3
89.9
91.8
83.6
91.5
87.1
1.9
82.8
232
C
89.6
51.5
88.1
88.9
75.9
88.4
91.5
23.3
88.0
360
A
98.6
3.7
93.1
98.6
2.4
93.0
98.6
0.0
94.1
99
B
65.9
73.1
66.3
67.3
87.9
68.5
57.5
26.2
56.0
278
C
82.6
57.3
81.1
82.4
74.0
81.9
86.4
87.2
86.4
436
A
99.4
1.9
94.9
99.5
1.7
94.9
99.5
0.0
94.7
88
B
83.2
77.6
82.9
83.8
82.8
83.8
86.9
8.1
83.1
180
C
75.4
70.8
75.2
73.6
75.0
73.7
77.1
64.2
76.5
252
A
98.9
10.0
95.7
98.3
13.9
95.3
97.5
0.9
93.0
147
B
90.7
57.6
89.5
91.2
78.3
90.7
84.3
10.9
80.9
274
C
85.9
65.1
85.1
87.1
80.0
86.9
91.6
29.5
88.7
359
A
99.0
7.5
97.0
98.9
9.2
96.9
99.0
2.0
96.2
152
B
99.0
72.6
87.7
87.7
91.7
87.8
84.2
6.9
82.0
245
C
90.5
71.7
90.1
90.5
90.8
90.5
91.3
41.9
89.9
380
A
99.4
2.3
97.0
99.5
0.8
96.9
99.3
0.0
96.4
89
B
50.4
79.8
51.2
50.8
98.4
52.0
52.1
68.3
52.6
163
C
81.8
82.5
81.8
81.1
97.6
81.6
85.0
89.6
85.2
330
Table F.4: Percent true negative, true positive and overall accuracies (to 1 d.p.) of SECoS trained on the phoneme classification problem.
Columns: Phoneme; Trained with (A, B or C); then for recall on Set A, Set B and Set C in turn: True Neg., True Pos., Overall; Neurons.
Phonemes in this block: /p/, /b/, /t/, /d/, /k/, /g/, /f/, /v/, /T/.
A
98.5
100.0
98.6
98.4
78.9
98.2
98.2
47.5
97.8
271
B
98.6
100.0
98.6
99.4
100.0
99.4
97.3
59.3
97.0
376
C
85.0
100.0
85.2
87.3
100.0
87.4
80.0
100.0
80.2
507
A
98.7
100.0
98.7
99.2
66.7
99.1
99.1
4.3
98.8
116
B
98.7
100.0
98.7
99.3
100.0
99.3
99.7
0.0
99.4
162
C
84.2
100.0
84.2
85.5
100.0
85.5
83.7
100.0
83.7
235
A
98.4
100.0
98.4
98.2
89.6
98.1
97.1
21.7
96.6
293
B
99.2
100.0
99.2
99.6
100.0
99.6
97.6
37.0
97.2
406
C
94.8
100.0
94.9
95.3
100.0
95.3
94.4
100.0
94.5
490
A
98.6
100.0
98.6
99.5
46.2
98.4
99.2
17.4
99.0
146
B
98.7
100.0
98.7
99.2
100.0
99.2
99.6
8.7
99.3
200
C
74.6
100.0
74.7
77.2
100.0
77.2
73.9
100.0
74.0
294
A
98.6
100.0
98.6
98.2
90.4
98.1
99.4
73.8
99.2
232
B
99.2
100.0
99.2
99.6
100.0
99.6
99.2
83.3
99.1
333
C
96.8
100.0
96.9
97.7
100.0
97.7
98.3
100.0
98.3
409
A
98.8
100.0
98.8
98.3
55.8
98.2
99.3
8.3
98.8
133
B
99.2
100.0
99.2
99.5
100.0
99.5
99.5
2.8
99.0
184
C
97.8
100.0
97.8
97.5
100.0
97.5
98.7
100.0
98.7
265
A
99.1
100.0
99.2
98.4
91.2
98.2
98.8
51.4
97.6
471
B
99.3
100.0
99.3
99.8
100.0
99.8
98.8
43.9
97.4
674
C
98.6
100.0
98.7
99.0
100.0
99.1
99.2
97.1
99.2
814
A
97.0
100.0
97.0
96.2
62.9
95.9
94.6
24.6
93.4
296
B
98.2
100.0
98.2
98.8
100.0
98.8
97.1
16.1
95.7
406
C
97.6
100.0
97.6
97.9
100.0
97.9
98.7
72.0
98.3
541
A
98.8
100.0
98.8
98.6
85.5
98.3
97.4
73.8
97.0
410
B
98.8
100.0
98.9
99.0
100.0
99.1
97.4
72.7
96.8
570
C
96.5
100.0
96.6
97.0
100.0
97.0
98.2
100.0
98.3
799
Columns: Phoneme; Trained with (A, B or C); then for recall on Set A, Set B and Set C in turn: True Neg., True Pos., Overall; Neurons.
Phonemes in this block: /D/, /s/, /z/, /S/, /Z/, /h/, /ch/, /dj/, /m/.
A
97.0
100.0
97.0
96.3
68.4
96.0
95.1
40.9
94.2
267
B
98.1
100.0
98.2
98.7
100.0
98.7
95.4
46.1
94.6
381
C
98.0
100.0
98.0
98.2
100.0
98.2
98.5
93.0
98.4
532
A
98.4
100.0
98.4
97.5
96.1
97.5
96.5
77.4
96.2
442
B
98.2
99.7
98.3
99.0
100.0
99.1
96.8
73.0
96.4
594
C
98.0
99.7
98.1
98.3
100.0
98.4
98.8
100.0
98.8
688
A
97.2
100.0
97.2
96.5
82.1
96.3
95.2
98.7
95.2
325
B
98.1
100.0
98.1
98.2
100.0
98.2
97.1
91.7
97.0
447
C
95.9
100.0
96.0
96.0
100.0
96.1
95.3
98.7
95.3
539
A
98.5
100.0
98.6
97.7
84.0
97.3
97.0
74.4
96.3
495
B
99.4
100.0
99.4
99.7
100.0
99.7
98.3
64.8
97.3
666
C
98.0
100.0
98.1
98.4
100.0
98.4
99.4
93.5
99.2
812
A
97.9
100.0
97.9
97.1
79.5
97.0
96.1
61.0
95.7
237
B
98.7
100.0
98.7
98.9
100.0
99.0
98.2
56.1
97.7
334
C
88.2
100.0
88.3
88.8
100.0
88.8
86.8
82.9
86.8
412
A
98.0
100.0
98.0
97.9
97.4
97.9
97.6
22.8
96.3
211
B
98.4
100.0
98.4
99.2
100.0
99.2
98.1
22.0
96.8
292
C
94.3
100.0
94.4
95.3
100.0
95.4
97.8
90.2
97.7
414
A
98.5
99.1
98.5
97.9
84.3
97.6
97.6
57.5
96.9
407
B
99.0
99.1
99.0
99.6
100.0
99.6
96.6
51.3
95.9
576
C
95.1
99.1
95.2
96.0
100.0
96.1
92.2
96.5
92.3
719
A
98.1
100.0
98.1
97.9
55.6
97.8
95.5
1.0
94.2
168
B
98.7
100.0
98.7
98.9
100.0
99.0
96.2
0.0
94.8
208
C
90.8
100.0
90.9
90.9
100.0
91.0
86.1
70.4
85.9
356
A
97.7
95.5
97.7
97.2
70.0
96.9
94.3
17.6
93.6
380
B
97.8
92.7
97.7
99.0
100.0
99.0
94.7
17.6
94.0
518
C
95.6
96.7
95.7
96.1
98.3
96.1
92.3
98.5
92.3
707
Columns: Phoneme; Trained with (A, B or C); then for recall on Set A, Set B and Set C in turn: True Neg., True Pos., Overall; Neurons.
Phonemes in this block: /n/, /N/, /l/, /r/, /w/, /ie/, /I/, /e/, /&/.
A
97.5
92.9
97.4
96.9
70.0
96.4
93.1
29.3
92.3
453
B
97.6
92.9
97.5
98.9
100.0
98.9
92.0
26.1
91.2
602
C
96.8
94.0
96.7
97.1
100.0
97.1
97.1
92.4
97.0
820
A
97.3
100.0
97.3
97.0
71.7
96.8
95.4
0.0
94.6
320
B
97.7
93.7
97.7
98.6
100.0
98.6
95.3
1.6
94.5
444
C
95.9
92.6
95.9
96.5
97.8
96.5
97.1
100.0
97.2
566
A
97.5
96.4
97.5
96.6
69.7
96.1
94.6
41.2
93.4
467
B
97.6
97.8
97.6
98.6
100.0
98.6
93.2
45.8
92.1
647
C
94.5
97.8
94.6
94.9
100.0
95.0
89.6
88.9
89.6
931
A
96.6
100.0
96.6
96.5
81.1
96.3
93.1
32.3
92.6
315
B
96.8
100.0
96.8
97.1
100.0
97.2
92.4
43.1
91.9
443
C
95.2
100.0
95.3
94.9
100.0
94.9
93.1
90.8
93.1
554
A
97.0
98.1
97.0
96.5
60.4
96.2
92.7
31.5
92.2
444
B
97.3
99.0
97.3
97.9
100.0
97.9
92.9
37.0
92.4
610
C
95.3
99.0
95.4
95.4
100.0
95.5
97.0
100.0
97.0
818
A
96.2
100.0
96.3
95.7
58.5
95.3
94.9
14.5
94.1
583
B
96.7
100.0
96.8
98.4
100.0
98.4
95.0
8.7
94.1
809
C
94.2
100.0
94.2
94.4
100.0
94.5
95.6
100.0
95.6
1003
A
97.2
98.9
97.2
97.1
89.4
97.0
92.5
37.4
91.6
493
B
97.7
98.3
97.7
98.6
97.6
98.5
94.4
31.1
93.3
665
C
96.4
98.3
96.4
96.6
96.5
96.5
94.2
98.3
94.3
771
A
98.0
93.7
97.9
97.7
71.0
97.0
97.5
32.2
95.5
444
B
98.3
93.7
98.2
99.3
98.4
99.3
97.6
36.9
95.8
605
C
96.6
94.9
96.5
97.6
99.2
97.6
97.7
81.8
97.2
728
A
98.0
97.5
98.0
97.1
88.4
97.0
93.8
73.2
93.2
528
B
98.2
95.1
98.1
98.7
98.6
98.7
96.4
76.3
95.9
715
C
96.6
95.8
96.6
97.2
99.3
97.2
95.7
88.9
95.5
779
Columns: Phoneme; Trained with (A, B or C); then for recall on Set A, Set B and Set C in turn: True Neg., True Pos., Overall; Neurons.
Phonemes in this block: /V/, /A/, /U/, /i/, /a/, /O/, /3/, /u/, /el/.
A
97.3
95.1
97.3
96.5
66.3
96.0
93.1
20.8
92.1
464
B
97.8
92.6
97.7
98.6
98.8
98.6
94.7
27.1
93.7
652
C
96.3
92.6
96.2
97.0
98.8
97.0
92.2
86.5
92.1
725
A
98.4
100.0
98.4
97.8
83.7
97.6
95.8
50.0
95.1
402
B
98.4
98.3
98.4
98.8
100.0
98.8
96.5
56.7
95.8
553
C
96.2
98.3
96.3
96.2
100.0
96.3
94.4
92.5
94.4
630
A
98.3
98.0
98.3
97.8
50.0
97.4
96.9
16.0
96.0
331
B
98.3
99.0
98.3
98.9
100.0
98.9
95.4
19.8
94.5
466
C
97.3
99.0
97.3
97.3
100.0
97.4
96.5
76.5
96.3
613
A
97.0
96.0
96.9
94.9
64.8
93.9
92.9
38.7
90.2
883
B
96.2
93.5
96.1
98.1
98.7
98.1
91.4
46.4
89.1
1165
C
93.6
92.0
93.5
93.6
96.9
93.7
91.9
91.2
91.9
1362
A
98.5
97.0
98.4
96.9
93.1
96.7
97.1
30.8
92.9
711
B
98.5
96.5
98.4
99.3
99.2
99.3
98.0
25.9
93.4
938
C
97.4
96.5
97.4
98.3
99.2
98.3
96.0
93.3
95.9
1011
A
98.0
98.8
98.0
96.3
77.7
95.4
91.8
57.6
90.1
846
B
97.5
97.5
97.5
98.4
100.0
98.5
91.4
63.1
90.0
1139
C
96.0
96.7
96.0
96.7
97.6
96.8
95.5
89.5
95.2
1447
A
97.1
97.9
97.1
95.0
82.5
94.5
89.4
59.8
88.1
893
B
97.6
96.9
97.6
99.2
99.6
99.2
90.8
62.4
89.6
1204
C
96.7
95.1
96.6
97.5
97.4
97.5
96.8
86.8
96.3
1430
A
96.5
97.8
96.6
95.0
80.0
94.4
91.3
31.2
88.5
778
B
97.3
97.5
97.3
98.6
99.0
98.6
91.8
27.6
88.8
1017
C
92.7
97.5
92.9
92.7
99.0
93.0
92.3
94.5
92.4
1274
A
96.5
95.6
96.5
92.9
68.1
91.7
91.6
27.9
90.1
1137
B
96.4
91.2
96.1
98.4
97.0
98.3
92.3
24.2
90.7
1565
C
94.4
91.4
94.2
95.6
97.4
95.7
91.7
95.8
91.8
1737
Columns: Phoneme; Trained with (A, B or C); then for recall on Set A, Set B and Set C in turn: True Neg., True Pos., Overall; Neurons.
Phonemes in this block: /al/, /Oi/, /OU/, /aU/, /i@/, /U@/, /e@/.
A
97.2
96.8
97.2
93.5
73.4
92.1
91.5
19.6
88.7
1379
B
96.2
95.9
96.2
98.2
98.8
98.3
90.4
15.2
87.5
1859
C
93.5
95.9
93.7
94.9
98.8
95.1
90.0
76.3
89.4
2067
A
96.9
97.3
96.9
95.2
68.2
94.1
89.0
0.6
84.5
1014
B
95.9
95.9
95.9
97.8
100.0
97.9
88.0
0.8
83.6
1338
C
93.6
96.1
93.7
94.4
100.0
94.7
89.1
81.4
88.7
1597
A
96.2
96.8
96.3
92.6
61.6
90.8
82.2
13.1
78.9
1420
B
94.8
96.4
94.9
97.3
100.0
97.5
82.7
9.8
79.3
1884
C
92.2
96.6
92.4
93.4
100.0
93.8
88.9
95.4
89.2
2229
A
96.7
98.9
96.8
95.2
75.9
94.3
93.7
5.8
89.4
945
B
96.6
98.5
96.7
98.1
100.0
98.1
93.5
5.2
89.2
1272
C
90.4
98.5
90.8
91.4
100.0
91.8
84.3
68.0
83.5
1368
A
97.3
97.0
97.2
94.8
68.3
93.8
91.7
5.5
87.7
982
B
97.0
96.8
97.0
98.5
98.9
98.5
92.5
5.8
88.4
1271
C
94.3
95.9
94.4
95.5
98.9
95.6
92.0
63.2
90.7
1429
A
97.3
98.7
97.4
95.8
82.6
95.5
94.7
4.4
92.1
725
B
97.1
99.1
97.1
98.7
100.0
98.7
95.0
7.9
92.4
928
C
93.6
99.6
93.7
93.3
100.0
93.4
90.0
83.7
89.8
1121
A
97.7
94.6
97.6
95.9
53.5
94.8
94.7
13.9
92.4
893
B
96.5
96.1
96.5
98.8
100.0
98.8
93.1
29.2
91.2
1125
C
94.4
96.1
94.4
95.0
100.0
95.2
90.6
92.1
90.7
1313
Table F.5: Percent true negative, true positive and overall accuracies (to 1 d.p.) of Zadeh-Mamdani rules extracted from EFuNN trained on the phoneme classification problem.
Columns: Phoneme; Trained with (A, B or C); then for recall on Set A, Set B and Set C in turn: True Neg., True Pos., Overall; Rules.
Phonemes in this block: /p/, /b/, /t/, /d/, /k/, /g/, /f/, /v/, /T/.
A
100.0
0.0
98.9
100.0
0.0
98.8
100.0
0.0
99.2
163
B
100.0
0.0
98.9
100.0
0.0
98.8
100.0
0.0
99.2
274
C
100.0
0.0
98.9
100.0
0.0
98.9
100.0
0.0
99.2
345
A
100.0
0.0
99.8
100.0
0.0
99.8
100.0
0.0
99.7
103
B
0.0
100.0
0.2
0.0
100.0
0.2
0.0
100.0
0.3
176
C
100.0
0.0
99.8
100.0
0.0
99.8
100.0
0.0
99.7
229
A
100.0
0.0
98.6
100.0
0.0
98.6
100.0
0.0
99.3
23
B
0.4
100.0
1.7
0.7
100.0
2.1
0.0
100.0
0.7
86
C
100.0
0.0
98.6
100.0
0.0
98.6
100.0
0.0
99.3
116
A
100.0
0.0
99.7
100.0
0.0
99.7
100.0
0.0
99.7
129
B
1.9
100.0
2.2
2.1
84.6
2.3
0.6
100.0
0.9
186
C
100.0
0.0
99.7
100.0
0.0
99.7
100.0
0.0
99.7
242
A
100.0
0.0
98.9
100.0
0.0
99.0
100.0
0.0
99.4
145
B
0.0
100.0
1.1
0.0
100.0
1.0
0.0
100.0
0.6
215
C
100.0
0.0
98.9
100.0
0.0
99.0
100.0
0.0
99.4
231
A
100.0
0.0
99.7
100.0
0.0
99.7
100.0
0.0
99.5
138
B
0.1
100.0
0.4
0.1
100.0
0.3
0.0
100.0
0.5
213
C
100.0
0.0
99.7
100.0
0.0
99.7
100.0
0.0
99.5
252
A
0.0
100.0
3.0
0.0
100.0
3.0
0.0
100.0
2.5
143
B
0.0
100.0
2.9
0.0
100.0
3.0
0.0
100.0
2.5
223
C
100.0
0.0
97.1
100.0
0.0
97.0
100.0
0.0
97.5
248
A
100.0
0.0
99.3
100.0
0.0
99.3
100.0
0.0
98.3
173
B
0.0
100.0
0.7
0.0
100.0
0.7
0.0
100.0
1.7
281
C
100.0
0.0
99.3
100.0
0.0
99.3
100.0
0.0
98.3
389
A
0.0
100.0
2.5
0.0
100.0
2.5
0.0
100.0
2.4
147
B
0.0
100.0
2.5
0.0
100.0
2.5
0.0
100.0
2.4
254
C
100.0
0.0
97.5
100.0
0.0
97.5
100.0
0.0
97.6
317
Columns: Phoneme; Trained with (A, B or C); then for recall on Set A, Set B and Set C in turn: True Neg., True Pos., Overall; Rules.
Phonemes in this block: /D/, /s/, /z/, /S/, /Z/, /h/, /ch/, /dj/, /m/.
A
99.7
0.0
98.9
99.6
0.0
98.8
99.8
0.0
98.2
148
B
100.0
0.0
99.2
100.0
0.0
99.2
100.0
0.0
98.4
277
C
100.0
0.0
99.2
100.0
0.0
99.2
100.0
0.0
98.4
387
A
100.0
0.0
96.4
100.0
0.0
96.4
100.0
0.0
98.4
78
B
100.0
0.0
96.4
100.0
0.0
96.4
100.0
0.0
98.4
192
C
0.0
100.0
3.6
0.0
100.0
3.6
0.0
100.0
1.6
237
A
100.0
0.0
98.1
100.0
0.0
98.1
100.0
0.0
97.8
87
B
0.0
100.0
1.9
0.0
100.0
1.9
0.0
100.0
2.2
153
C
100.0
0.0
98.1
100.0
0.0
98.1
100.0
0.0
97.8
183
A
100.0
0.0
97.1
100.0
0.0
97.1
100.0
0.0
97.2
93
B
100.0
0.0
97.1
100.0
0.0
97.1
100.0
0.0
97.2
197
C
100.0
0.0
97.1
100.0
0.0
97.1
100.0
0.0
97.2
248
A
100.0
0.0
99.2
100.0
0.0
99.2
100.0
0.0
98.8
40
B
0.1
100.0
0.9
0.0
100.0
0.8
0.1
100.0
1.2
142
C
100.0
0.0
99.2
100.0
0.0
99.2
100.0
0.0
98.8
164
A
0.2
100.0
1.0
0.2
100.0
1.0
0.2
100.0
1.9
151
B
0.0
100.0
0.9
0.0
100.0
0.8
0.1
100.0
1.8
243
C
100.0
0.0
99.2
100.0
0.0
99.2
100.0
0.0
98.3
279
A
100.0
0.0
97.8
100.0
0.0
97.8
100.0
0.0
98.4
79
B
0.0
100.0
2.2
0.0
100.0
2.2
0.0
100.0
1.6
186
C
100.0
0.0
97.8
100.0
0.0
97.8
100.0
0.0
98.4
244
A
100.0
0.0
99.6
100.0
0.0
99.6
100.0
0.0
98.6
44
B
0.0
100.0
0.4
0.0
100.0
0.4
0.0
100.0
1.4
118
C
100.0
0.0
99.6
100.0
0.0
99.6
100.0
0.0
98.6
172
A
100.0
0.0
98.8
100.0
0.0
98.8
100.0
0.0
99.0
158
B
100.0
0.0
98.8
100.0
0.0
98.8
100.0
0.0
99.0
276
C
100.0
0.0
98.8
100.0
0.0
98.8
100.0
0.0
99.0
417
Columns: Phoneme; Trained with (A, B or C); then for recall on Set A, Set B and Set C in turn: True Neg., True Pos., Overall; Rules.
Phonemes in this block: /n/, /N/, /l/, /r/, /w/, /ie/, /I/, /e/, /&/.
A
100.0
0.0
98.2
100.0
0.0
98.2
100.0
0.0
98.7
180
B
100.0
0.0
98.2
100.0
0.0
98.2
100.0
0.0
98.7
340
C
100.0
0.0
98.2
100.0
0.0
98.2
100.0
0.0
98.7
517
A
100.0
0.0
99.1
100.0
0.0
99.1
100.0
0.0
99.1
169
B
100.0
0.0
99.1
100.0
0.0
99.1
100.0
0.0
99.1
324
C
100.0
0.0
99.1
100.0
0.0
99.1
100.0
0.0
99.1
424
A
92.2
52.2
91.7
91.9
43.9
91.2
90.7
34.0
89.5
155
B
95.6
71.0
95.3
95.7
83.3
95.5
91.0
30.7
89.7
333
C
95.8
65.2
95.4
95.8
77.3
95.6
96.0
73.9
95.5
558
A
100.0
0.0
99.0
100.0
0.0
98.9
100.0
0.0
99.1
146
B
100.0
0.0
99.0
100.0
0.0
98.9
100.0
0.0
99.1
277
C
100.0
0.0
99.0
100.0
0.0
98.9
100.0
0.0
99.1
384
A
100.0
0.0
99.0
100.0
0.0
99.0
100.0
0.0
99.2
99
B
100.0
0.0
99.0
100.0
0.0
99.0
100.0
0.0
99.2
248
C
100.0
0.0
99.0
100.0
0.0
99.0
100.0
0.0
99.2
406
A
100.0
0.0
98.9
100.0
0.0
99.0
100.0
0.0
99.0
92
B
0.0
100.0
1.1
0.0
100.0
1.1
0.0
100.0
1.0
309
C
100.0
0.0
98.9
100.0
0.0
98.9
100.0
0.0
99.0
417
A
100.0
0.0
98.3
100.0
0.0
98.3
100.0
0.0
98.4
119
B
100.0
0.0
98.3
100.0
0.0
98.3
100.0
0.0
98.4
277
C
100.0
0.0
98.3
100.0
0.0
98.3
100.0
0.0
98.4
340
A
100.0
0.0
97.5
100.0
0.0
97.5
100.0
0.0
97.0
62
B
100.0
0.0
97.5
100.0
0.0
97.5
100.0
0.0
97.0
212
C
100.0
0.0
97.5
100.0
0.0
97.5
100.0
0.0
97.0
270
A
100.0
0.0
97.2
100.0
0.0
97.2
100.0
0.0
97.3
54
B
100.0
0.0
97.2
100.0
0.0
97.2
100.0
0.0
97.3
201
C
100.0
0.0
97.2
100.0
0.0
97.2
100.0
0.0
97.3
225
Columns: Phoneme; Trained with (A, B or C); then for recall on Set A, Set B and Set C in turn: True Neg., True Pos., Overall; Rules.
Phonemes in this block: /V/, /A/, /U/, /i/, /a/, /O/, /3/, /u/, /el/.
A
100.0
0.0
98.4
100.0
0.0
98.4
100.0
0.0
98.6
49
B
100.0
0.0
98.4
100.0
0.0
98.4
100.0
0.0
98.6
187
C
100.0
0.0
98.4
100.0
0.0
98.4
100.0
0.0
98.6
233
A
100.0
0.0
98.3
100.0
0.0
98.3
100.0
0.0
98.3
76
B
100.0
0.0
98.3
100.0
0.0
98.3
100.0
0.0
98.3
219
C
100.0
0.0
98.3
100.0
0.0
98.3
100.0
0.0
98.3
254
A
100.0
0.0
99.0
100.0
0.0
99.0
100.0
0.0
98.9
46
B
100.0
0.0
99.0
100.0
0.0
99.0
100.0
0.0
98.9
164
C
100.0
0.0
99.0
100.0
0.0
99.0
100.0
0.0
98.9
256
A
100.0
0.0
96.8
100.0
0.0
96.8
100.0
0.0
95.0
151
B
100.0
0.0
96.8
100.0
0.0
96.8
100.0
0.0
95.0
365
C
100.0
0.0
96.8
100.0
0.0
96.8
100.0
0.0
95.0
496
A
98.9
26.8
95.1
99.1
20.6
94.9
98.6
36.6
94.6
51
B
97.5
67.7
96.0
98.1
74.8
96.8
99.0
18.1
93.9
182
C
96.3
69.1
94.9
96.2
76.0
95.2
98.3
81.7
97.2
214
A
100.0
0.0
94.9
100.0
0.0
94.9
100.0
0.0
95.1
73
B
100.0
0.0
94.9
100.0
0.0
94.9
100.0
0.0
95.1
236
C
100.0
0.0
94.9
100.0
0.0
94.9
100.0
0.0
95.1
423
A
100.0
0.0
95.2
100.0
0.0
95.3
100.0
0.0
95.6
68
B
100.0
0.0
95.2
100.0
0.0
95.3
100.0
0.0
95.6
217
C
100.0
0.0
95.2
100.0
0.0
95.3
100.0
0.0
95.6
318
A
100.0
0.0
96.1
100.0
0.0
96.1
100.0
0.0
95.3
151
B
100.0
0.0
96.1
100.0
0.0
96.1
100.0
0.0
95.3
302
C
100.0
0.0
96.1
100.0
0.0
96.1
100.0
0.0
95.3
428
A
100.0
0.0
95.3
100.0
0.0
95.3
100.0
0.0
97.7
82
B
100.0
0.0
95.3
100.0
0.0
95.3
100.0
0.0
97.7
273
C
100.0
0.0
95.3
100.0
0.0
95.3
100.0
0.0
97.7
354
Columns: Phoneme; Trained with (A, B or C); then for recall on Set A, Set B and Set C in turn: True Neg., True Pos., Overall; Rules.
Phonemes in this block: /al/, /Oi/, /OU/, /aU/, /i@/, /U@/, /e@/.
A
100.0
0.0
93.2
100.0
0.0
93.2
100.0
0.0
96.2
100
B
100.0
0.0
93.2
100.0
0.0
93.2
100.0
0.0
96.2
291
C
100.0
0.0
93.2
100.0
0.0
93.2
100.0
0.0
96.2
416
A
100.0
0.0
96.0
100.0
0.0
96.1
100.0
0.0
94.9
89
B
100.0
0.0
96.0
100.0
0.0
96.1
99.9
0.0
94.8
232
C
100.0
0.0
96.0
100.0
0.0
96.1
100.0
0.0
94.9
360
A
100.0
0.0
94.2
100.0
0.0
94.2
100.0
0.0
95.4
99
B
0.0
100.0
5.8
0.0
100.0
5.8
0.0
100.0
4.6
278
C
100.0
0.0
94.2
100.0
0.0
94.2
100.0
0.0
95.4
436
A
100.0
0.0
95.4
100.0
0.0
95.3
100.0
0.0
95.1
88
B
0.6
100.0
5.2
0.4
100.0
5.1
1.1
100.0
5.9
180
C
100.0
0.0
95.4
100.0
0.0
95.3
100.0
0.0
95.1
252
A
100.0
0.0
96.4
100.0
0.0
96.4
100.0
0.0
95.3
147
B
100.0
0.0
96.4
100.0
0.0
96.4
100.0
0.0
95.3
274
C
100.0
0.0
96.4
100.0
0.0
96.4
100.0
0.0
95.3
359
A
100.0
0.0
97.8
100.0
0.0
97.8
100.0
0.0
97.1
152
B
100.0
0.0
97.8
100.0
0.0
97.8
100.0
0.0
97.1
245
C
100.0
0.0
97.8
100.0
0.0
97.8
100.0
0.0
97.1
380
A
100.0
0.0
97.5
100.0
0.0
97.4
100.0
0.0
97.1
89
B
0.0
100.0
2.5
0.0
100.0
2.6
0.0
100.0
2.9
163
C
100.0
0.0
97.5
100.0
0.0
97.4
100.0
0.0
97.1
330
Table F.6: Percent true negative, true positive and overall accuracies (to 1 d.p.) of EFuNN created via insertion of Zadeh-Mamdani fuzzy rules, for the phoneme classification problem.
Columns: Phoneme; Trained with (A, B or C); then for recall on Set A, Set B and Set C in turn: True Neg., True Pos., Overall; Neurons.
Phonemes in this block: /p/, /b/, /t/, /d/, /k/, /g/, /f/, /v/, /T/.
A
97.9
7.8
96.9
97.7
8.8
96.6
95.8
13.6
95.2
163
B
95.6
61.2
95.2
95.8
75.4
95.6
93.1
55.9
92.8
274
C
97.3
61.2
96.9
98.3
64.9
97.9
99.3
89.8
99.2
345
A
98.0
17.6
97.8
98.2
33.3
98.1
98.2
0.0
97.9
103
B
85.5
64.7
85.5
87.4
88.9
87.4
89.4
73.9
89.4
176
C
96.7
41.2
96.6
97.1
55.6
97.0
99.3
95.7
99.3
229
A
97.6
18.6
96.5
97.3
23.4
96.3
97.6
21.7
97.1
23
B
80.9
87.9
80.9
81.3
82.1
81.3
81.7
89.1
81.7
86
C
96.8
60.7
96.3
97.2
71.6
97.0
98.7
97.8
98.7
116
A
96.2
33.3
96.0
95.4
38.5
95.3
96.8
17.4
96.5
129
B
88.6
55.6
88.5
89.3
100.0
89.3
89.1
47.8
88.9
186
C
96.8
55.6
96.7
97.0
100.0
97.0
98.6
65.2
98.5
242
A
95.8
61.5
95.4
95.2
63.5
94.9
98.2
57.1
97.9
145
B
90.2
90.8
90.2
91.2
94.2
91.2
91.5
85.7
91.5
215
C
97.8
49.5
97.3
97.9
57.7
97.5
99.2
64.3
99.0
231
A
95.6
67.9
95.5
94.8
61.5
94.7
98.3
0.0
97.8
138
B
85.6
89.3
85.6
86.1
100.0
86.1
88.0
83.3
88.0
213
C
97.9
57.1
97.8
97.8
100.0
97.8
98.7
27.8
98.4
252
A
92.7
100.0
92.9
93.1
97.3
93.2
95.1
93.6
95.1
143
B
94.0
82.3
93.7
95.0
86.4
94.8
93.9
80.3
93.5
223
C
96.2
77.3
95.7
97.2
75.5
96.5
98.9
89.6
98.7
248
A
97.0
61.6
96.8
96.7
54.3
96.4
94.6
87.3
94.5
173
B
87.5
76.7
87.5
87.8
82.9
87.7
86.4
68.6
86.1
281
C
94.2
41.1
93.9
94.6
65.7
94.4
97.4
66.9
96.9
389
A
92.1
89.1
92.0
93.1
76.6
92.7
94.0
59.3
93.2
147
B
91.5
83.6
91.3
91.9
95.2
92.0
90.9
93.0
90.9
254
C
94.1
68.4
93.4
95.2
78.2
94.8
97.9
73.8
97.3
317
Columns: Phoneme; Trained with (A, B or C); then for recall on Set A, Set B and Set C in turn: True Neg., True Pos., Overall; Neurons.
Phonemes in this block: /D/, /s/, /z/, /S/, /Z/, /h/, /ch/, /dj/, /m/.
A
91.0
43.0
90.6
90.8
36.8
90.4
94.3
3.5
92.9
148
B
89.1
73.4
88.9
88.8
89.5
88.8
87.2
56.5
86.7
277
C
93.7
41.7
93.2
93.5
68.4
93.3
96.7
69.7
96.2
387
A
98.4
58.8
97.0
97.5
50.0
95.8
96.6
60.0
96.0
78
B
89.0
79.0
88.9
88.7
76.4
88.3
86.6
14.8
85.4
192
C
97.8
79.8
97.1
97.5
73.0
96.6
97.1
77.4
96.8
237
A
97.7
72.8
97.2
98.0
56.8
97.2
99.1
66.7
98.4
87
B
86.8
95.4
87.0
88.2
96.7
88.3
86.2
91.0
86.3
153
C
96.4
80.5
96.1
97.3
80.0
97.0
98.9
71.2
98.3
183
A
96.4
39.4
94.8
95.9
28.5
93.9
98.6
34.2
96.7
93
B
94.0
60.6
93.1
94.5
77.1
94.0
92.2
6.5
89.8
197
C
94.5
77.8
94.0
95.4
90.3
95.3
98.8
75.9
98.1
248
A
97.4
43.8
97.0
97.4
41.0
97.0
96.9
25.6
96.1
40
B
86.4
60.0
86.2
86.7
89.7
86.7
86.5
42.7
85.9
142
C
97.2
61.3
96.9
97.2
71.8
97.0
98.5
78.0
98.2
164
A
90.7
79.5
90.7
91.4
76.3
91.3
93.8
28.5
92.6
151
B
91.2
88.0
91.2
91.8
97.4
91.8
89.2
33.3
88.2
243
C
98.5
60.2
98.2
98.7
89.5
98.6
99.1
69.9
98.6
279
A
97.9
24.8
96.2
97.7
28.7
96.2
98.3
14.2
97.0
79
B
84.6
78.3
84.4
85.1
93.5
85.3
84.0
69.9
83.7
186
C
93.1
65.9
92.5
93.3
87.0
93.1
96.5
92.0
96.5
244
A
96.8
31.6
96.6
96.2
38.9
95.9
96.0
0.0
94.7
44
B
85.0
36.8
84.8
85.8
94.4
85.9
87.6
4.1
86.4
118
C
97.0
47.4
96.8
97.2
94.4
97.2
98.4
15.3
97.2
172
A
94.9
50.4
94.4
94.6
53.3
94.1
91.9
54.6
91.4
158
B
95.3
61.8
94.9
95.2
78.3
95.0
90.4
73.5
90.2
276
C
97.2
39.0
96.5
97.0
65.0
96.6
96.4
75.0
96.2
417
Columns: Phoneme; Trained with (A, B or C); then for recall on Set A, Set B and Set C in turn: True Neg., True Pos., Overall; Neurons.
Phonemes in this block: /n/, /N/, /l/, /r/, /w/, /ie/, /I/, /e/, /&/.
A
95.8
24.2
94.5
94.6
24.4
93.3
92.0
38.0
91.3
180
B
93.0
79.7
92.7
93.3
92.2
93.3
86.5
48.9
86.0
340
C
95.7
61.0
95.1
96.2
83.3
96.0
97.0
80.4
96.8
517
A
92.9
71.6
92.7
92.4
71.7
92.2
89.5
28.6
89.0
169
B
94.5
85.3
94.5
94.5
97.8
94.5
89.2
11.1
88.5
324
C
96.3
61.1
96.0
96.4
95.7
96.4
97.0
84.1
96.8
424
A
91.5
52.2
90.9
90.9
48.5
90.3
90.1
34.0
88.9
155
B
95.3
71.0
95.0
95.2
83.3
95.0
90.6
35.3
89.4
333
C
95.6
65.9
95.2
95.3
77.3
95.1
96.2
74.5
95.7
558
A
90.7
25.7
90.1
90.8
18.9
90.1
90.7
21.5
90.0
146
B
93.4
72.4
93.6
94.0
88.7
93.9
88.4
41.5
87.9
277
C
95.7
65.7
95.4
95.5
88.7
95.4
96.8
69.2
96.5
384
A
96.9
11.7
96.1
96.7
18.8
95.9
98.1
0.0
97.4
99
B
93.5
65.0
93.2
92.7
89.6
92.7
87.7
31.5
87.2
248
C
94.5
45.6
94.0
93.4
89.6
93.3
96.9
96.3
96.9
406
A
98.9
7.5
97.9
98.9
1.9
97.8
99.3
5.8
98.4
92
B
83.0
49.5
82.6
84.4
77.4
84.4
80.3
26.1
79.8
309
C
93.2
39.3
92.6
93.4
71.7
93.2
94.2
76.8
94.0
417
A
99.3
16.7
97.9
99.3
15.3
97.8
98.6
7.0
97.1
119
B
95.6
64.9
95.1
95.4
94.8
95.3
95.3
17.4
94.1
277
C
96.7
63.8
96.1
96.2
90.6
96.1
96.5
80.0
96.2
340
A
99.6
12.6
97.4
99.7
9.7
97.5
99.7
8.4
96.9
62
B
94.5
68.0
93.8
94.7
70.2
94.1
95.1
23.8
93.0
212
C
94.4
74.7
93.9
94.6
80.6
94.2
97.6
80.4
97.1
270
A
99.9
8.8
97.4
99.8
8.7
97.2
99.9
1.6
97.2
54
B
94.2
64.9
93.4
94.0
76.8
93.5
95.9
42.1
96.5
201
C
96.2
68.8
95.4
95.5
75.4
95.0
98.2
63.2
97.3
225
Columns: Phoneme; Trained with (A, B or C); then for recall on Set A, Set B and Set C in turn: True Neg., True Pos., Overall; Neurons.
Phonemes in this block: /V/, /A/, /U/, /i/, /a/, /O/, /3/, /u/, /el/.
A
99.5
17.2
98.2
99.5
6.3
98.0
99.9
3.1
98.6
49
B
95.6
31.3
94.5
96.2
62.5
95.6
94.9
10.4
93.8
187
C
97.2
28.8
96.1
97.4
58.8
96.7
97.1
47.9
96.4
233
A
97.6
54.3
96.8
97.0
48.8
96.2
96.5
18.3
95.2
76
B
97.5
57.8
96.8
97.8
74.4
97.4
97.9
13.3
96.4
219
C
97.3
56.1
96.6
97.4
72.1
97.0
95.2
81.7
95.0
254
A
99.8
10.1
98.9
99.9
4.2
99.0
99.7
0.0
98.5
46
B
98.6
44.4
98.1
98.7
81.3
98.5
95.7
11.1
94.7
164
C
96.8
43.4
96.3
97.0
79.2
96.9
97.0
50.6
96.4
256
A
92.4
40.3
90.7
91.8
38.4
90.1
91.9
37.6
89.2
151
B
93.3
50.5
91.9
93.3
78.0
92.8
89.4
49.6
87.4
365
C
93.3
58.2
92.2
92.8
83.0
92.5
93.5
70.9
92.3
496
A
99.0
27.0
95.2
99.0
19.8
95.0
98.7
40.4
95.0
51
B
97.7
66.9
96.0
98.2
72.5
96.8
99.2
18.8
94.1
182
C
96.3
68.6
94.8
96.2
76.3
95.2
98.0
83.5
97.1
214
A
99.3
14.6
95.0
99.2
15.1
94.9
98.1
8.4
93.7
73
B
96.2
61.7
94.4
96.5
71.3
95.3
94.1
50.0
91.9
236
C
91.5
54.3
89.6
91.9
65.3
90.6
92.4
82.0
91.9
423
A
98.8
34.0
95.7
99.0
25.6
95.5
99.6
6.1
95.5
68
B
95.3
71.1
94.2
95.5
80.8
94.8
90.0
39.2
87.8
217
C
95.8
67.8
94.4
95.7
73.5
94.6
96.1
47.9
94.0
318
A
94.1
35.1
91.8
93.6
26.2
91.0
95.5
26.7
92.2
151
B
92.4
73.8
91.7
93.2
86.7
92.9
90.1
24.2
87.1
302
C
90.3
72.3
89.6
90.7
82.1
90.4
91.4
84.5
91.1
428
A
99.4
3.2
94.9
99.4
4.3
94.8
99.8
0.0
97.5
82
B
94.1
46.4
91.8
94.0
64.3
92.6
91.4
25.5
89.9
273
C
91.5
53.4
89.7
91.7
66.4
90.5
93.4
80.6
93.1
354
Columns: Phoneme; Trained with (A, B or C); then for recall on Set A, Set B and Set C in turn: True Neg., True Pos., Overall; Neurons.
Phonemes in this block: /al/, /Oi/, /OU/, /aU/, /i@/, /U@/, /e@/.
A
99.5
5.8
93.2
99.3
3.0
92.8
99.3
8.1
95.8
100
B
87.1
57.0
85.1
88.3
69.9
87.0
88.7
22.6
86.2
291
C
83.7
64.0
82.4
84.5
70.1
83.5
87.5
47.4
85.9
416
A
98.4
12.3
94.9
98.6
10.3
95.1
97.2
0.0
92.2
89
B
90.4
60.3
89.2
91.2
83.6
90.9
87.2
2.5
82.9
232
C
89.4
51.5
87.9
89.1
75.9
88.6
90.8
22.8
87.3
360
A
98.6
4.6
63.2
98.4
3.5
92.9
98.7
0.0
94.1
99
B
63.9
69.0
64.2
65.5
86.2
66.7
56.1
26.5
54.7
278
C
82.8
53.4
81.1
82.4
71.6
81.8
86.1
84.8
86.0
436
A
99.2
2.8
94.8
99.4
2.6
94.9
99.4
0.0
94.6
88
B
81.8
75.5
81.5
83.1
82.8
83.1
86.4
10.2
82.7
180
C
74.8
67.2
74.4
74.1
73.3
74.1
76.5
69.2
76.2
252
A
98.9
7.0
95.6
98.6
10.0
95.3
97.9
1.2
93.4
147
B
90.2
58.9
89.1
90.9
76.7
90.4
84.0
10.0
80.5
274
C
85.1
64.1
84.3
86.5
76.1
96.1
90.7
32.2
88.0
359
A
98.9
8.4
96.9
98.7
11.9
96.8
98.6
3.0
95.9
152
B
86.4
74.8
86.2
86.5
89.9
86.6
82.2
5.9
80.0
245
C
89.7
73.0
89.4
89.7
89.9
89.7
91.1
41.9
89.7
380
A
99.4
2.3
97.0
99.6
2.4
97.2
99.4
0.0
96.6
89
B
46.7
79.0
47.5
47.3
99.2
48.6
48.1
75.2
48.9
163
C
81.4
83.7
81.5
81.7
96.9
92.1
84.5
90.6
84.7
330
Table F.7: Percent true negative, true positive and overall accuracies (to 1 d.p.) of Zadeh-Mamdani rules extracted from SECoS trained on the phoneme classification problem.
| Phoneme | Trained with | Set A TN | Set A TP | Set A overall | Set B TN | Set B TP | Set B overall | Set C TN | Set C TP | Set C overall | Rules |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /p/ | A | 100.0 | 31.0 | 99.2 | 100.0 | 22.8 | 99.1 | 99.4 | 15.2 | 98.7 | 271 |
| | B | 100.0 | 33.6 | 99.2 | 100.0 | 24.6 | 99.1 | 99.4 | 18.6 | 98.8 | 376 |
| | C | 99.9 | 32.8 | 99.1 | 99.9 | 24.6 | 99.0 | 98.8 | 20.3 | 98.2 | 507 |
| /b/ | A | 99.8 | 0.0 | 99.7 | 99.8 | 0.0 | 99.6 | 99.8 | 0.0 | 99.5 | 116 |
| | B | 99.9 | 0.0 | 99.8 | 99.9 | 0.0 | 99.7 | 99.9 | 0.0 | 99.5 | 162 |
| | C | 100.0 | 0.0 | 99.8 | 100.0 | 0.0 | 99.8 | 99.9 | 0.0 | 99.6 | 235 |
| /t/ | A | 100.0 | 0.0 | 98.6 | 100.0 | 0.0 | 98.6 | 100.0 | 0.0 | 99.3 | 293 |
| | B | 0.4 | 100.0 | 1.7 | 0.7 | 100.0 | 2.1 | 0.0 | 100.0 | 0.7 | 406 |
| | C | 100.0 | 0.0 | 98.6 | 100.0 | 0.0 | 98.6 | 100.0 | 0.0 | 99.3 | 490 |
| /d/ | A | 100.0 | 3.7 | 99.7 | 100.0 | 0.0 | 99.7 | 100.0 | 0.0 | 99.6 | 146 |
| | B | 100.0 | 3.7 | 99.7 | 100.0 | 0.0 | 99.7 | 100.0 | 0.0 | 99.6 | 200 |
| | C | 100.0 | 3.7 | 99.7 | 100.0 | 0.0 | 99.7 | 100.0 | 0.0 | 99.6 | 294 |
| /k/ | A | 100.0 | 0.0 | 98.9 | 100.0 | 0.0 | 98.9 | 100.0 | 0.0 | 99.4 | 232 |
| | B | 100.0 | 0.0 | 98.8 | 100.0 | 0.0 | 98.9 | 100.0 | 0.0 | 99.3 | 333 |
| | C | 100.0 | 0.0 | 98.8 | 100.0 | 0.0 | 98.8 | 100.0 | 0.0 | 99.4 | 409 |
| /g/ | A | 100.0 | 0.0 | 99.7 | 100.0 | 0.0 | 99.7 | 100.0 | 0.0 | 99.5 | 133 |
| | B | 100.0 | 0.0 | 99.7 | 100.0 | 0.0 | 99.7 | 100.0 | 0.0 | 99.5 | 184 |
| | C | 100.0 | 0.0 | 99.7 | 100.0 | 0.0 | 99.7 | 100.0 | 0.0 | 99.5 | 265 |
| /f/ | A | 100.0 | 0.0 | 97.1 | 100.0 | 0.0 | 97.0 | 100.0 | 0.0 | 97.5 | 471 |
| | B | 100.0 | 0.0 | 97.1 | 100.0 | 0.0 | 97.0 | 100.0 | 0.0 | 97.5 | 674 |
| | C | 99.9 | 0.0 | 97.0 | 100.0 | 0.0 | 97.0 | 100.0 | 0.0 | 97.5 | 814 |
| /v/ | A | 100.0 | 0.0 | 99.3 | 100.0 | 0.0 | 99.3 | 100.0 | 0.0 | 98.3 | 296 |
| | B | 100.0 | 0.0 | 99.3 | 100.0 | 0.0 | 99.3 | 100.0 | 0.0 | 98.3 | 406 |
| | C | 100.0 | 0.0 | 99.3 | 100.0 | 0.0 | 99.6 | 100.0 | 0.0 | 98.3 | 541 |
| /T/ | A | 100.0 | 0.0 | 97.5 | 100.0 | 0.0 | 97.5 | 100.0 | 0.0 | 97.6 | 410 |
| | B | 100.0 | 0.0 | 97.5 | 100.0 | 0.0 | 97.5 | 100.0 | 0.0 | 97.6 | 570 |
| | C | 100.0 | 0.0 | 97.5 | 100.0 | 0.0 | 97.5 | 100.0 | 0.0 | 97.6 | 799 |
Table F.7 (continued).

| Phoneme | Trained with | Set A TN | Set A TP | Set A overall | Set B TN | Set B TP | Set B overall | Set C TN | Set C TP | Set C overall | Rules |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /D/ | A | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 98.4 | 267 |
| | B | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 98.4 | 381 |
| | C | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 98.4 | 532 |
| /s/ | A | 100.0 | 0.0 | 96.4 | 100.0 | 0.0 | 96.4 | 100.0 | 0.0 | 98.4 | 442 |
| | B | 100.0 | 0.0 | 96.4 | 100.0 | 0.0 | 96.4 | 100.0 | 0.0 | 98.4 | 594 |
| | C | 0.0 | 100.0 | 3.6 | 0.0 | 100.0 | 3.6 | 0.0 | 100.0 | 1.6 | 688 |
| /z/ | A | 99.8 | 1.5 | 97.9 | 99.8 | 0.0 | 97.9 | 100.0 | 0.0 | 97.7 | 325 |
| | B | 99.8 | 1.5 | 98.0 | 99.9 | 0.0 | 97.9 | 100.0 | 0.0 | 97.8 | 447 |
| | C | 99.8 | 1.5 | 97.9 | 99.8 | 0.0 | 97.9 | 100.0 | 0.0 | 97.7 | 539 |
| /S/ | A | 99.9 | 6.4 | 97.1 | 99.8 | 5.6 | 97.1 | 99.6 | 0.0 | 96.8 | 495 |
| | B | 99.8 | 7.4 | 97.1 | 99.8 | 9.7 | 97.2 | 99.6 | 0.0 | 96.8 | 666 |
| | C | 99.8 | 7.7 | 97.1 | 99.8 | 11.1 | 97.2 | 99.7 | 0.0 | 96.9 | 812 |
| /Z/ | A | 100.0 | 2.5 | 99.2 | 100.0 | 2.6 | 99.2 | 100.0 | 0.0 | 98.8 | 237 |
| | B | 99.9 | 3.8 | 99.1 | 99.9 | 2.6 | 99.1 | 99.9 | 0.0 | 98.7 | 334 |
| | C | 99.8 | 3.8 | 99.0 | 99.7 | 2.6 | 99.0 | 99.7 | 0.0 | 98.5 | 412 |
| /h/ | A | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 98.3 | 211 |
| | B | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 98.3 | 292 |
| | C | 99.8 | 0.0 | 99.0 | 99.7 | 0.0 | 99.0 | 99.9 | 8.1 | 98.3 | 414 |
| /ch/ | A | 100.0 | 0.0 | 97.8 | 100.0 | 0.0 | 97.8 | 100.0 | 0.0 | 98.4 | 407 |
| | B | 0.0 | 100.0 | 2.2 | 0.0 | 100.0 | 2.2 | 0.0 | 100.0 | 1.6 | 576 |
| | C | 100.0 | 0.0 | 97.8 | 100.0 | 0.0 | 97.8 | 100.0 | 0.0 | 98.4 | 719 |
| /dj/ | A | 100.0 | 5.3 | 99.6 | 100.0 | 0.0 | 99.6 | 99.8 | 0.0 | 98.4 | 168 |
| | B | 100.0 | 2.6 | 99.6 | 100.0 | 0.0 | 99.6 | 99.8 | 0.0 | 98.4 | 208 |
| | C | 99.5 | 5.3 | 99.2 | 99.5 | 0.0 | 99.1 | 98.9 | 11.2 | 97.7 | 356 |
| /m/ | A | 99.5 | 6.5 | 98.4 | 99.3 | 5.0 | 98.2 | 98.8 | 0.0 | 97.8 | 380 |
| | B | 99.5 | 4.9 | 98.3 | 99.2 | 3.3 | 98.1 | 98.3 | 0.0 | 97.3 | 518 |
| | C | 99.4 | 5.7 | 98.3 | 99.3 | 3.3 | 98.1 | 98.8 | 0.0 | 97.8 | 707 |
Table F.7 (continued).

| Phoneme | Trained with | Set A TN | Set A TP | Set A overall | Set B TN | Set B TP | Set B overall | Set C TN | Set C TP | Set C overall | Rules |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /n/ | A | 100.0 | 0.0 | 98.2 | 100.0 | 0.0 | 98.2 | 100.0 | 0.0 | 98.7 | 453 |
| | B | 100.0 | 0.0 | 98.2 | 100.0 | 0.0 | 98.2 | 100.0 | 0.0 | 98.7 | 602 |
| | C | 100.0 | 0.0 | 98.2 | 100.0 | 0.0 | 98.2 | 100.0 | 0.0 | 98.7 | 820 |
| /N/ | A | 99.8 | 0.0 | 98.8 | 99.7 | 0.0 | 98.8 | 99.0 | 0.0 | 98.1 | 320 |
| | B | 99.8 | 0.0 | 98.9 | 99.8 | 0.0 | 98.9 | 99.2 | 0.0 | 98.3 | 444 |
| | C | 99.8 | 0.0 | 98.8 | 99.7 | 0.0 | 98.8 | 99.4 | 0.0 | 98.5 | 566 |
| /l/ | A | 100.0 | 0.0 | 98.6 | 100.0 | 0.0 | 98.7 | 100.0 | 0.0 | 97.8 | 467 |
| | B | 100.0 | 0.0 | 98.6 | 100.0 | 0.0 | 98.7 | 100.0 | 0.0 | 97.8 | 647 |
| | C | 100.0 | 0.0 | 98.6 | 100.0 | 0.0 | 98.7 | 100.0 | 0.0 | 97.8 | 931 |
| /r/ | A | 99.8 | 8.6 | 98.9 | 99.9 | 3.8 | 98.8 | 99.1 | 0.0 | 98.2 | 315 |
| | B | 99.9 | 8.6 | 98.9 | 99.8 | 5.7 | 98.8 | 99.9 | 0.0 | 98.9 | 443 |
| | C | 99.1 | 4.8 | 98.1 | 99.2 | 3.8 | 98.2 | 97.0 | 9.2 | 96.2 | 554 |
| /w/ | A | 100.0 | 0.0 | 99.0 | 100.0 | 0.0 | 99.0 | 100.0 | 1.9 | 99.2 | 444 |
| | B | 99.8 | 0.0 | 98.8 | 99.8 | 0.0 | 98.9 | 99.7 | 1.9 | 99.0 | 610 |
| | C | 99.7 | 0.0 | 98.7 | 99.8 | 0.0 | 98.8 | 99.8 | 1.9 | 99.0 | 818 |
| /ie/ | A | 99.9 | 0.0 | 98.8 | 99.8 | 1.9 | 98.8 | 99.4 | 0.0 | 98.5 | 583 |
| | B | 99.9 | 0.0 | 98.8 | 99.8 | 1.9 | 98.7 | 99.4 | 0.0 | 98.4 | 809 |
| | C | 99.6 | 0.0 | 98.5 | 99.5 | 1.9 | 98.5 | 99.2 | 1.4 | 98.3 | 1003 |
| /I/ | A | 99.3 | 5.0 | 97.7 | 99.4 | 2.4 | 97.7 | 98.2 | 8.7 | 96.7 | 493 |
| | B | 99.3 | 6.9 | 97.7 | 99.0 | 5.9 | 97.4 | 96.7 | 9.6 | 95.3 | 665 |
| | C | 98.2 | 6.3 | 96.7 | 98.0 | 4.7 | 96.4 | 94.2 | 17.4 | 93.0 | 771 |
| /e/ | A | 98.8 | 2.0 | 96.4 | 98.4 | 1.6 | 96.0 | 97.5 | 3.7 | 94.6 | 444 |
| | B | 99.4 | 4.0 | 97.0 | 99.4 | 1.6 | 96.9 | 98.7 | 1.4 | 95.8 | 605 |
| | C | 99.3 | 2.4 | 96.9 | 99.3 | 1.6 | 96.9 | 98.8 | 0.5 | 95.8 | 728 |
| /&/ | A | 98.3 | 0.7 | 95.6 | 98.2 | 0.7 | 95.5 | 96.1 | 5.3 | 93.7 | 528 |
| | B | 99.9 | 0.4 | 97.1 | 99.9 | 0.7 | 97.2 | 99.0 | 3.7 | 96.4 | 715 |
| | C | 99.8 | 0.4 | 97.0 | 99.8 | 0.7 | 97.1 | 99.2 | 4.2 | 96.6 | 779 |
Table F.7 (continued).

| Phoneme | Trained with | Set A TN | Set A TP | Set A overall | Set B TN | Set B TP | Set B overall | Set C TN | Set C TP | Set C overall | Rules |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /V/ | A | 100.0 | 3.1 | 98.4 | 100.0 | 0.0 | 98.4 | 99.0 | 9.4 | 97.8 | 464 |
| | B | 100.0 | 3.7 | 98.4 | 100.0 | 0.0 | 98.4 | 99.9 | 0.0 | 98.5 | 652 |
| | C | 100.0 | 3.7 | 98.4 | 100.0 | 0.0 | 98.4 | 99.7 | 14.6 | 98.5 | 725 |
| /A/ | A | 99.8 | 2.9 | 98.2 | 99.8 | 1.2 | 98.1 | 98.8 | 40.8 | 96.9 | 402 |
| | B | 99.9 | 2.9 | 98.2 | 99.9 | 1.2 | 98.2 | 98.8 | 30.8 | 97.6 | 553 |
| | C | 99.8 | 3.5 | 98.1 | 99.8 | 1.2 | 98.1 | 97.4 | 50.8 | 96.7 | 630 |
| /U/ | A | 98.3 | 15.2 | 97.5 | 98.0 | 14.6 | 97.2 | 95.3 | 42.0 | 94.7 | 331 |
| | B | 98.8 | 13.1 | 97.9 | 98.7 | 12.5 | 97.8 | 95.6 | 34.6 | 94.9 | 466 |
| | C | 98.3 | 14.1 | 97.5 | 98.3 | 12.5 | 97.5 | 95.7 | 23.5 | 94.9 | 613 |
| /i/ | A | 98.6 | 1.2 | 95.5 | 98.9 | 1.9 | 95.8 | 97.3 | 0.6 | 92.5 | 883 |
| | B | 98.3 | 1.8 | 95.3 | 98.8 | 1.9 | 95.7 | 97.4 | 2.0 | 92.6 | 1165 |
| | C | 98.7 | 1.8 | 95.6 | 98.8 | 1.3 | 95.7 | 98.6 | 1.4 | 93.8 | 1362 |
| /a/ | A | 100.0 | 0.0 | 94.7 | 100.0 | 0.0 | 94.7 | 100.0 | 0.0 | 93.7 | 711 |
| | B | 100.0 | 0.0 | 94.7 | 100.0 | 0.0 | 94.7 | 100.0 | 0.0 | 93.7 | 938 |
| | C | 100.0 | 0.0 | 94.7 | 100.0 | 0.0 | 94.7 | 100.0 | 0.0 | 93.7 | 1011 |
| /O/ | A | 98.1 | 13.7 | 93.8 | 97.6 | 10.4 | 93.2 | 95.8 | 22.1 | 92.2 | 846 |
| | B | 98.1 | 15.0 | 93.9 | 97.6 | 11.2 | 93.2 | 95.2 | 23.0 | 92.7 | 1139 |
| | C | 98.1 | 15.8 | 93.9 | 98.1 | 9.6 | 93.6 | 97.2 | 15.4 | 93.2 | 1447 |
| /3/ | A | 98.2 | 11.3 | 94.0 | 97.7 | 17.5 | 93.9 | 93.4 | 24.4 | 92.3 | 893 |
| | B | 98.3 | 12.2 | 94.2 | 97.8 | 18.8 | 94.1 | 95.1 | 27.7 | 92.1 | 1204 |
| | C | 98.3 | 9.7 | 94.0 | 97.9 | 12.8 | 93.9 | 99.0 | 22.8 | 92.8 | 1430 |
| /u/ | A | 69.2 | 44.1 | 68.2 | 69.5 | 43.1 | 68.5 | 70.6 | 31.2 | 68.7 | 778 |
| | B | 69.3 | 43.9 | 68.3 | 69.7 | 42.1 | 68.6 | 71.4 | 30.0 | 69.4 | 1017 |
| | C | 69.1 | 39.7 | 67.9 | 69.2 | 38.5 | 68.0 | 71.3 | 46.7 | 70.2 | 1274 |
| /el/ | A | 99.9 | 1.3 | 95.3 | 99.8 | 0.9 | 95.1 | 99.6 | 0.6 | 97.3 | 1137 |
| | B | 98.3 | 3.6 | 93.9 | 98.2 | 5.5 | 93.8 | 97.4 | 3.0 | 95.2 | 1565 |
| | C | 98.4 | 3.6 | 93.9 | 98.2 | 5.5 | 93.8 | 97.6 | 9.1 | 95.5 | 1737 |
Table F.7 (continued).

| Phoneme | Trained with | Set A TN | Set A TP | Set A overall | Set B TN | Set B TP | Set B overall | Set C TN | Set C TP | Set C overall | Rules |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /al/ | A | 100.0 | 0.0 | 93.2 | 100.0 | 0.0 | 93.2 | 100.0 | 0.0 | 96.2 | 1379 |
| | B | 100.0 | 0.0 | 93.2 | 100.0 | 0.0 | 93.2 | 100.0 | 0.0 | 96.2 | 1859 |
| | C | 100.0 | 0.0 | 93.2 | 100.0 | 0.0 | 93.2 | 100.0 | 0.0 | 96.2 | 2067 |
| /Oi/ | A | 98.9 | 11.8 | 95.4 | 98.8 | 14.4 | 95.5 | 96.7 | 1.1 | 91.8 | 1014 |
| | B | 98.5 | 10.1 | 95.0 | 98.5 | 13.8 | 95.2 | 95.6 | 0.8 | 90.7 | 1338 |
| | C | 99.0 | 8.6 | 95.4 | 99.1 | 11.3 | 95.6 | 96.9 | 1.4 | 92.0 | 1597 |
| /OU/ | A | 89.3 | 11.0 | 84.8 | 89.5 | 10.7 | 84.9 | 87.6 | 7.0 | 83.8 | 1420 |
| | B | 75.0 | 26.3 | 75.2 | 76.6 | 20.4 | 73.3 | 76.4 | 13.4 | 73.5 | 1884 |
| | C | 73.2 | 26.4 | 70.5 | 74.6 | 20.1 | 71.5 | 72.3 | 38.1 | 70.8 | 2229 |
| /aU/ | A | 99.9 | 0.0 | 95.3 | 99.9 | 0.0 | 95.2 | 99.3 | 0.6 | 94.4 | 945 |
| | B | 99.9 | 0.2 | 95.3 | 99.9 | 0.9 | 95.2 | 98.9 | 7.6 | 94.4 | 1272 |
| | C | 99.0 | 0.2 | 94.4 | 99.0 | 1.7 | 94.4 | 97.2 | 10.5 | 93.0 | 1368 |
| /i@/ | A | 99.7 | 0.3 | 96.1 | 99.8 | 1.1 | 96.2 | 98.8 | 4.6 | 94.4 | 982 |
| | B | 99.6 | 1.1 | 96.0 | 99.7 | 2.8 | 96.2 | 98.5 | 4.3 | 94.1 | 1271 |
| | C | 99.8 | 0.0 | 96.2 | 99.9 | 0.0 | 96.2 | 97.1 | 12.8 | 93.2 | 1429 |
| /U@/ | A | 99.8 | 2.2 | 97.6 | 99.7 | 2.8 | 97.6 | 99.3 | 0.5 | 96.4 | 725 |
| | B | 99.5 | 3.1 | 97.4 | 99.5 | 2.8 | 97.4 | 98.4 | 0.0 | 95.6 | 928 |
| | C | 99.5 | 3.1 | 97.4 | 99.5 | 3.7 | 97.4 | 98.4 | 0.0 | 95.6 | 1121 |
| /e@/ | A | 99.6 | 0.0 | 97.1 | 99.7 | 1.6 | 97.2 | 99.1 | 0.0 | 96.3 | 893 |
| | B | 99.4 | 1.2 | 96.9 | 99.4 | 2.4 | 96.9 | 98.7 | 1.5 | 95.9 | 1125 |
| | C | 99.3 | 0.0 | 96.8 | 98.9 | 0.8 | 96.4 | 96.4 | 6.9 | 93.8 | 1313 |
Table F.8: Percent true negative, true positive and overall accuracies (to 1 d.p.) of SECoS created via insertion of Zadeh-Mamdani fuzzy rules, for the phoneme classification problem.
| Phoneme | Trained with | Set A TN | Set A TP | Set A overall | Set B TN | Set B TP | Set B overall | Set C TN | Set C TP | Set C overall | Neurons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /p/ | A | 100.0 | 0.0 | 98.9 | 100.0 | 0.0 | 98.8 | 100.0 | 0.0 | 99.2 | 271 |
| | B | 100.0 | 0.0 | 98.9 | 100.0 | 0.0 | 98.8 | 100.0 | 0.0 | 99.2 | 376 |
| | C | 100.0 | 0.0 | 98.9 | 100.0 | 0.0 | 98.8 | 100.0 | 0.0 | 99.2 | 507 |
| /b/ | A | 100.0 | 0.0 | 99.8 | 100.0 | 0.0 | 99.8 | 100.0 | 0.0 | 99.7 | 116 |
| | B | 100.0 | 0.0 | 99.8 | 100.0 | 0.0 | 99.8 | 100.0 | 0.0 | 99.7 | 162 |
| | C | 100.0 | 0.0 | 99.7 | 100.0 | 0.0 | 99.8 | 100.0 | 0.0 | 99.7 | 235 |
| /t/ | A | 100.0 | 0.0 | 98.6 | 100.0 | 0.0 | 98.6 | 100.0 | 0.0 | 99.3 | 293 |
| | B | 100.0 | 0.0 | 98.6 | 100.0 | 0.0 | 98.6 | 100.0 | 0.0 | 99.3 | 406 |
| | C | 100.0 | 0.0 | 98.6 | 100.0 | 0.0 | 98.6 | 100.0 | 0.0 | 99.3 | 490 |
| /d/ | A | 100.0 | 0.0 | 99.7 | 100.0 | 0.0 | 99.7 | 100.0 | 0.0 | 99.7 | 146 |
| | B | 100.0 | 0.0 | 99.7 | 100.0 | 0.0 | 99.7 | 100.0 | 0.0 | 99.7 | 200 |
| | C | 100.0 | 0.0 | 99.7 | 100.0 | 0.0 | 99.7 | 100.0 | 0.0 | 99.7 | 294 |
| /k/ | A | 100.0 | 0.0 | 98.9 | 100.0 | 0.0 | 99.0 | 100.0 | 0.0 | 99.4 | 232 |
| | B | 100.0 | 0.0 | 98.9 | 100.0 | 0.0 | 99.0 | 100.0 | 0.0 | 99.4 | 333 |
| | C | 100.0 | 0.0 | 98.9 | 100.0 | 0.0 | 99.0 | 100.0 | 0.0 | 99.4 | 409 |
| /g/ | A | 100.0 | 0.0 | 99.7 | 100.0 | 0.0 | 99.7 | 100.0 | 0.0 | 99.5 | 133 |
| | B | 100.0 | 0.0 | 99.7 | 100.0 | 0.0 | 99.7 | 100.0 | 0.0 | 99.5 | 184 |
| | C | 100.0 | 0.0 | 99.7 | 100.0 | 0.0 | 99.7 | 100.0 | 0.0 | 99.5 | 265 |
| /f/ | A | 100.0 | 0.0 | 97.1 | 100.0 | 0.0 | 97.0 | 100.0 | 0.0 | 97.5 | 471 |
| | B | 100.0 | 0.0 | 97.1 | 100.0 | 0.0 | 97.0 | 100.0 | 0.0 | 97.5 | 674 |
| | C | 100.0 | 0.0 | 97.1 | 100.0 | 0.0 | 97.0 | 100.0 | 0.0 | 97.5 | 814 |
| /v/ | A | 100.0 | 0.0 | 99.3 | 100.0 | 0.0 | 99.3 | 100.0 | 0.0 | 98.3 | 296 |
| | B | 100.0 | 0.0 | 99.3 | 100.0 | 0.0 | 99.3 | 100.0 | 0.0 | 98.3 | 406 |
| | C | 100.0 | 0.0 | 99.3 | 100.0 | 0.0 | 99.3 | 100.0 | 0.0 | 98.3 | 541 |
| /T/ | A | 100.0 | 0.0 | 97.5 | 100.0 | 0.0 | 97.5 | 100.0 | 0.0 | 97.6 | 410 |
| | B | 100.0 | 0.0 | 97.5 | 100.0 | 0.0 | 97.5 | 100.0 | 0.0 | 97.6 | 570 |
| | C | 100.0 | 0.0 | 97.5 | 100.0 | 0.0 | 97.5 | 100.0 | 0.0 | 97.6 | 799 |
Table F.8 (continued).

| Phoneme | Trained with | Set A TN | Set A TP | Set A overall | Set B TN | Set B TP | Set B overall | Set C TN | Set C TP | Set C overall | Neurons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /D/ | A | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 98.4 | 267 |
| | B | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 98.4 | 381 |
| | C | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 98.4 | 532 |
| /s/ | A | 100.0 | 0.0 | 96.4 | 100.0 | 0.0 | 96.4 | 100.0 | 0.0 | 98.4 | 442 |
| | B | 100.0 | 0.0 | 96.4 | 100.0 | 0.0 | 96.4 | 100.0 | 0.0 | 98.4 | 594 |
| | C | 100.0 | 0.0 | 96.4 | 100.0 | 0.0 | 96.4 | 100.0 | 0.0 | 98.4 | 688 |
| /z/ | A | 100.0 | 0.0 | 98.1 | 100.0 | 0.0 | 98.1 | 100.0 | 0.0 | 97.8 | 325 |
| | B | 100.0 | 0.0 | 98.1 | 100.0 | 0.0 | 98.1 | 100.0 | 0.0 | 97.8 | 447 |
| | C | 100.0 | 0.0 | 98.1 | 100.0 | 0.0 | 98.1 | 100.0 | 0.0 | 97.8 | 539 |
| /S/ | A | 100.0 | 0.0 | 97.1 | 100.0 | 0.0 | 97.1 | 100.0 | 0.0 | 97.2 | 495 |
| | B | 100.0 | 0.0 | 97.1 | 100.0 | 0.0 | 97.1 | 100.0 | 0.0 | 97.2 | 666 |
| | C | 100.0 | 0.0 | 97.1 | 100.0 | 0.0 | 97.1 | 100.0 | 0.0 | 97.2 | 812 |
| /Z/ | A | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 98.8 | 237 |
| | B | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 98.8 | 334 |
| | C | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 98.8 | 412 |
| /h/ | A | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 98.3 | 211 |
| | B | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 98.3 | 292 |
| | C | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 98.3 | 414 |
| /ch/ | A | 100.0 | 0.0 | 97.8 | 100.0 | 0.0 | 97.8 | 100.0 | 0.0 | 98.4 | 407 |
| | B | 100.0 | 0.0 | 97.8 | 100.0 | 0.0 | 97.8 | 100.0 | 0.0 | 98.4 | 576 |
| | C | 100.0 | 0.0 | 97.8 | 100.0 | 0.0 | 97.8 | 100.0 | 0.0 | 98.4 | 719 |
| /dj/ | A | 100.0 | 0.0 | 99.6 | 100.0 | 0.0 | 99.6 | 100.0 | 0.0 | 98.6 | 168 |
| | B | 100.0 | 0.0 | 99.6 | 100.0 | 0.0 | 99.6 | 100.0 | 0.0 | 98.6 | 208 |
| | C | 100.0 | 0.0 | 99.6 | 100.0 | 0.0 | 99.6 | 100.0 | 0.0 | 98.6 | 356 |
| /m/ | A | 100.0 | 0.0 | 98.8 | 100.0 | 0.0 | 98.8 | 100.0 | 0.0 | 99.0 | 380 |
| | B | 100.0 | 0.0 | 98.8 | 100.0 | 0.0 | 98.8 | 100.0 | 0.0 | 99.0 | 518 |
| | C | 100.0 | 0.0 | 98.8 | 100.0 | 0.0 | 98.8 | 100.0 | 0.0 | 99.0 | 707 |
Table F.8 (continued).

| Phoneme | Trained with | Set A TN | Set A TP | Set A overall | Set B TN | Set B TP | Set B overall | Set C TN | Set C TP | Set C overall | Neurons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /n/ | A | 100.0 | 0.0 | 98.2 | 100.0 | 0.0 | 98.2 | 100.0 | 0.0 | 98.7 | 453 |
| | B | 100.0 | 0.0 | 98.2 | 100.0 | 0.0 | 98.2 | 100.0 | 0.0 | 98.7 | 602 |
| | C | 100.0 | 0.0 | 98.2 | 100.0 | 0.0 | 98.2 | 100.0 | 0.0 | 98.7 | 820 |
| /N/ | A | 100.0 | 0.0 | 99.1 | 100.0 | 0.0 | 99.1 | 100.0 | 0.0 | 99.1 | 320 |
| | B | 100.0 | 0.0 | 99.1 | 100.0 | 0.0 | 99.1 | 100.0 | 0.0 | 99.1 | 444 |
| | C | 100.0 | 0.0 | 99.1 | 100.0 | 0.0 | 99.1 | 100.0 | 0.0 | 99.1 | 566 |
| /l/ | A | 100.0 | 0.0 | 98.6 | 100.0 | 0.0 | 98.7 | 100.0 | 0.0 | 97.8 | 467 |
| | B | 100.0 | 0.0 | 98.6 | 100.0 | 0.0 | 98.7 | 100.0 | 0.0 | 97.8 | 647 |
| | C | 100.0 | 0.0 | 98.6 | 100.0 | 0.0 | 98.7 | 100.0 | 0.0 | 97.8 | 931 |
| /r/ | A | 100.0 | 1.9 | 99.0 | 100.0 | 1.9 | 98.9 | 100.0 | 1.0 | 99.1 | 315 |
| | B | 100.0 | 1.9 | 99.0 | 100.0 | 1.9 | 98.9 | 100.0 | 0.0 | 99.1 | 443 |
| | C | 100.0 | 0.0 | 99.0 | 100.0 | 0.0 | 98.9 | 100.0 | 0.0 | 99.1 | 554 |
| /w/ | A | 100.0 | 0.0 | 99.0 | 100.0 | 0.0 | 99.0 | 100.0 | 0.0 | 99.2 | 444 |
| | B | 100.0 | 0.0 | 99.0 | 100.0 | 0.0 | 99.0 | 100.0 | 0.0 | 99.2 | 610 |
| | C | 100.0 | 0.0 | 99.0 | 100.0 | 0.0 | 99.0 | 100.0 | 0.0 | 99.2 | 818 |
| /ie/ | A | 100.0 | 0.0 | 98.9 | 100.0 | 0.0 | 98.9 | 100.0 | 0.0 | 99.0 | 583 |
| | B | 100.0 | 0.0 | 98.9 | 100.0 | 0.0 | 98.9 | 100.0 | 0.0 | 99.0 | 809 |
| | C | 100.0 | 0.0 | 98.9 | 100.0 | 0.0 | 98.9 | 100.0 | 0.0 | 99.0 | 1003 |
| /I/ | A | 100.0 | 0.0 | 98.3 | 100.0 | 0.0 | 98.3 | 100.0 | 0.0 | 98.4 | 493 |
| | B | 100.0 | 0.0 | 98.3 | 100.0 | 0.0 | 98.3 | 100.0 | 0.0 | 98.4 | 665 |
| | C | 100.0 | 0.0 | 98.3 | 100.0 | 0.0 | 98.3 | 99.8 | 2.6 | 98.4 | 771 |
| /e/ | A | 100.0 | 0.0 | 97.5 | 100.0 | 0.0 | 97.5 | 100.0 | 0.0 | 97.0 | 444 |
| | B | 100.0 | 0.0 | 97.5 | 100.0 | 0.0 | 97.5 | 100.0 | 0.0 | 97.0 | 605 |
| | C | 100.0 | 0.0 | 97.5 | 100.0 | 0.0 | 97.5 | 100.0 | 0.0 | 97.0 | 728 |
| /&/ | A | 100.0 | 0.0 | 97.2 | 100.0 | 0.0 | 97.2 | 100.0 | 0.0 | 97.3 | 528 |
| | B | 100.0 | 0.0 | 97.2 | 100.0 | 0.0 | 97.2 | 100.0 | 0.0 | 97.3 | 715 |
| | C | 100.0 | 0.0 | 97.2 | 100.0 | 0.0 | 97.2 | 100.0 | 0.0 | 97.3 | 779 |
Table F.8 (continued).

| Phoneme | Trained with | Set A TN | Set A TP | Set A overall | Set B TN | Set B TP | Set B overall | Set C TN | Set C TP | Set C overall | Neurons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /V/ | A | 100.0 | 0.6 | 98.4 | 100.0 | 0.0 | 98.4 | 100.0 | 0.0 | 98.6 | 464 |
| | B | 100.0 | 0.0 | 98.4 | 100.0 | 0.0 | 98.4 | 100.0 | 0.0 | 98.6 | 652 |
| | C | 100.0 | 0.0 | 98.4 | 100.0 | 0.0 | 98.4 | 100.0 | 1.0 | 98.6 | 725 |
| /A/ | A | 100.0 | 0.6 | 98.3 | 100.0 | 0.0 | 98.2 | 100.0 | 0.0 | 98.2 | 402 |
| | B | 100.0 | 0.0 | 98.3 | 100.0 | 0.0 | 98.2 | 100.0 | 0.0 | 98.2 | 553 |
| | C | 100.0 | 0.0 | 98.3 | 100.0 | 0.0 | 98.2 | 100.0 | 1.0 | 98.2 | 630 |
| /U/ | A | 100.0 | 0.0 | 99.0 | 100.0 | 0.0 | 99.0 | 100.0 | 0.0 | 98.9 | 331 |
| | B | 100.0 | 0.0 | 99.0 | 100.0 | 0.0 | 99.0 | 100.0 | 0.0 | 98.9 | 466 |
| | C | 100.0 | 0.0 | 99.0 | 100.0 | 0.0 | 99.0 | 100.0 | 0.0 | 98.9 | 613 |
| /i/ | A | 100.0 | 0.0 | 96.8 | 100.0 | 0.0 | 96.8 | 100.0 | 0.0 | 95.0 | 883 |
| | B | 100.0 | 0.0 | 96.8 | 100.0 | 0.0 | 96.8 | 100.0 | 0.0 | 95.0 | 1165 |
| | C | 100.0 | 0.0 | 96.8 | 100.0 | 0.0 | 96.8 | 100.0 | 0.0 | 95.0 | 1362 |
| /a/ | A | 100.0 | 0.0 | 94.7 | 100.0 | 0.0 | 94.7 | 100.0 | 0.0 | 93.7 | 711 |
| | B | 100.0 | 0.0 | 94.7 | 100.0 | 0.0 | 94.7 | 100.0 | 0.0 | 93.7 | 938 |
| | C | 100.0 | 0.0 | 94.7 | 100.0 | 0.0 | 94.7 | 100.0 | 0.0 | 93.7 | 1011 |
| /O/ | A | 100.0 | 0.0 | 94.9 | 100.0 | 0.0 | 94.9 | 100.0 | 0.0 | 95.1 | 846 |
| | B | 100.0 | 0.0 | 94.9 | 100.0 | 0.0 | 94.9 | 100.0 | 0.0 | 95.1 | 1139 |
| | C | 100.0 | 0.0 | 94.9 | 100.0 | 0.0 | 94.9 | 100.0 | 0.0 | 95.1 | 1447 |
| /3/ | A | 100.0 | 0.0 | 95.2 | 100.0 | 0.0 | 95.3 | 100.0 | 0.0 | 95.6 | 893 |
| | B | 100.0 | 0.0 | 95.2 | 100.0 | 0.0 | 95.3 | 100.0 | 0.0 | 95.6 | 1204 |
| | C | 100.0 | 0.0 | 95.2 | 100.0 | 0.0 | 95.3 | 100.0 | 0.0 | 95.6 | 1430 |
| /u/ | A | 100.0 | 0.7 | 96.1 | 100.0 | 0.0 | 96.0 | 100.0 | 0.0 | 95.3 | 778 |
| | B | 100.0 | 0.5 | 96.1 | 100.0 | 1.0 | 96.0 | 100.0 | 0.0 | 95.3 | 1017 |
| | C | 100.0 | 0.5 | 96.1 | 100.0 | 1.0 | 96.0 | 100.0 | 0.0 | 95.3 | 1274 |
| /el/ | A | 100.0 | 0.8 | 95.4 | 100.0 | 0.4 | 95.3 | 100.0 | 0.0 | 97.7 | 1137 |
| | B | 100.0 | 0.8 | 95.4 | 100.0 | 0.4 | 95.3 | 99.9 | 0.0 | 97.6 | 1565 |
| | C | 100.0 | 0.8 | 95.3 | 100.0 | 0.1 | 95.3 | 100.0 | 0.0 | 97.6 | 1737 |
Table F.8 (continued).

| Phoneme | Trained with | Set A TN | Set A TP | Set A overall | Set B TN | Set B TP | Set B overall | Set C TN | Set C TP | Set C overall | Neurons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /al/ | A | 100.0 | 0.0 | 93.2 | 100.0 | 0.0 | 93.2 | 100.0 | 0.0 | 96.2 | 1379 |
| | B | 100.0 | 0.0 | 93.2 | 100.0 | 0.0 | 93.2 | 100.0 | 0.0 | 96.2 | 1859 |
| | C | 100.0 | 0.0 | 93.2 | 100.0 | 0.0 | 93.2 | 99.7 | 0.7 | 95.9 | 2067 |
| /Oi/ | A | 100.0 | 0.0 | 96.0 | 100.0 | 0.0 | 96.0 | 100.0 | 0.0 | 94.9 | 1014 |
| | B | 100.0 | 0.0 | 96.0 | 100.0 | 0.0 | 96.0 | 100.0 | 0.0 | 94.9 | 1338 |
| | C | 100.0 | 0.0 | 96.0 | 100.0 | 0.0 | 96.0 | 100.0 | 0.0 | 94.9 | 1597 |
| /OU/ | A | 100.0 | 0.0 | 94.2 | 100.0 | 0.0 | 94.2 | 99.7 | 0.0 | 95.1 | 1420 |
| | B | 100.0 | 0.0 | 94.2 | 100.0 | 0.7 | 94.2 | 99.7 | 0.0 | 95.1 | 1884 |
| | C | 99.9 | 0.3 | 94.2 | 100.0 | 1.0 | 94.2 | 99.7 | 0.0 | 95.1 | 2229 |
| /aU/ | A | 100.0 | 0.0 | 95.4 | 100.0 | 0.0 | 95.3 | 100.0 | 0.0 | 95.1 | 945 |
| | B | 100.0 | 0.0 | 95.4 | 100.0 | 0.0 | 95.3 | 100.0 | 0.3 | 95.1 | 1272 |
| | C | 100.0 | 0.0 | 95.4 | 100.0 | 0.0 | 95.3 | 100.0 | 0.9 | 95.1 | 1368 |
| /i@/ | A | 100.0 | 0.0 | 96.4 | 100.0 | 0.0 | 96.4 | 100.0 | 0.0 | 95.3 | 982 |
| | B | 100.0 | 0.0 | 96.4 | 100.0 | 0.0 | 96.4 | 100.0 | 0.0 | 95.3 | 1271 |
| | C | 100.0 | 0.0 | 96.4 | 100.0 | 0.0 | 96.4 | 99.9 | 4.3 | 95.3 | 1429 |
| /U@/ | A | 100.0 | 0.0 | 97.8 | 100.0 | 0.0 | 97.8 | 100.0 | 0.0 | 97.1 | 725 |
| | B | 100.0 | 0.0 | 97.8 | 100.0 | 0.0 | 97.8 | 100.0 | 0.0 | 97.1 | 928 |
| | C | 100.0 | 0.0 | 97.8 | 100.0 | 0.0 | 97.8 | 100.0 | 0.0 | 97.1 | 1121 |
| /e@/ | A | 100.0 | 0.0 | 97.5 | 100.0 | 0.0 | 97.4 | 100.0 | 0.0 | 97.1 | 893 |
| | B | 100.0 | 0.0 | 97.5 | 100.0 | 0.0 | 97.4 | 100.0 | 0.0 | 97.1 | 1125 |
| | C | 100.0 | 0.0 | 97.5 | 100.0 | 0.0 | 97.4 | 100.0 | 0.0 | 97.1 | 1313 |
Table F.9: Percent true negative, true positive and overall accuracies (to 1 d.p.) of Takagi-Sugeno rules extracted from SECoS trained on the phoneme classification problem.
| Phoneme | Trained with | Set A TN | Set A TP | Set A overall | Set B TN | Set B TP | Set B overall | Set C TN | Set C TP | Set C overall | Rules |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /p/ | A | 100.0 | 30.2 | 99.2 | 100.0 | 22.8 | 99.1 | 99.5 | 15.3 | 98.8 | 271 |
| | B | 100.0 | 33.6 | 99.2 | 100.0 | 24.6 | 99.1 | 99.5 | 18.6 | 98.8 | 376 |
| | C | 99.9 | 32.8 | 99.1 | 99.9 | 24.6 | 99.1 | 99.3 | 20.3 | 98.6 | 507 |
| /b/ | A | 99.8 | 0.0 | 99.7 | 99.8 | 0.0 | 99.6 | 99.8 | 0.0 | 99.5 | 116 |
| | B | 99.9 | 0.0 | 99.8 | 99.9 | 0.0 | 99.7 | 99.9 | 0.0 | 99.5 | 162 |
| | C | 100.0 | 0.0 | 99.8 | 100.0 | 0.0 | 99.8 | 99.9 | 0.0 | 99.6 | 235 |
| /t/ | A | 100.0 | 0.0 | 98.6 | 100.0 | 0.0 | 98.6 | 100.0 | 0.0 | 99.3 | 293 |
| | B | 0.4 | 100.0 | 1.7 | 0.7 | 100.0 | 2.1 | 0.0 | 100.0 | 0.7 | 406 |
| | C | 100.0 | 0.0 | 98.6 | 100.0 | 0.0 | 98.6 | 100.0 | 0.0 | 99.3 | 490 |
| /d/ | A | 100.0 | 3.7 | 99.7 | 99.9 | 0.0 | 99.7 | 100.0 | 0.0 | 99.6 | 146 |
| | B | 100.0 | 3.7 | 99.7 | 100.0 | 0.0 | 100.0 | 100.0 | 0.0 | 99.6 | 200 |
| | C | 100.0 | 3.7 | 99.7 | 99.9 | 0.0 | 99.7 | 100.0 | 0.0 | 99.6 | 294 |
| /k/ | A | 100.0 | 0.0 | 98.9 | 99.9 | 0.0 | 98.9 | 100.0 | 0.0 | 99.4 | 232 |
| | B | 99.9 | 0.0 | 98.8 | 99.9 | 0.0 | 98.8 | 99.9 | 0.0 | 99.3 | 333 |
| | C | 99.9 | 0.0 | 98.8 | 99.9 | 0.0 | 98.8 | 100.0 | 0.0 | 99.4 | 409 |
| /g/ | A | 100.0 | 0.0 | 99.7 | 100.0 | 0.0 | 99.7 | 100.0 | 0.0 | 99.5 | 133 |
| | B | 100.0 | 0.0 | 99.7 | 100.0 | 0.0 | 99.7 | 100.0 | 0.0 | 99.5 | 184 |
| | C | 100.0 | 0.0 | 99.7 | 100.0 | 0.0 | 99.7 | 100.0 | 0.0 | 99.5 | 265 |
| /f/ | A | 100.0 | 0.0 | 97.1 | 100.0 | 0.0 | 97.0 | 100.0 | 0.0 | 97.5 | 471 |
| | B | 100.0 | 0.0 | 97.1 | 100.0 | 0.0 | 97.0 | 100.0 | 0.0 | 97.5 | 674 |
| | C | 99.9 | 0.0 | 97.0 | 99.9 | 0.0 | 97.0 | 100.0 | 0.0 | 97.5 | 814 |
| /v/ | A | 100.0 | 0.0 | 99.3 | 100.0 | 0.0 | 99.3 | 100.0 | 0.0 | 98.3 | 296 |
| | B | 100.0 | 0.0 | 99.3 | 100.0 | 0.0 | 99.3 | 100.0 | 0.0 | 98.3 | 406 |
| | C | 100.0 | 0.0 | 99.3 | 100.0 | 0.0 | 99.3 | 100.0 | 0.0 | 98.3 | 541 |
| /T/ | A | 100.0 | 0.0 | 97.5 | 100.0 | 0.0 | 97.5 | 100.0 | 0.0 | 97.6 | 410 |
| | B | 100.0 | 0.0 | 97.5 | 100.0 | 0.0 | 97.5 | 100.0 | 0.0 | 97.6 | 570 |
| | C | 100.0 | 0.0 | 97.5 | 100.0 | 0.0 | 97.5 | 100.0 | 0.0 | 97.6 | 799 |
Table F.9 (continued).

| Phoneme | Trained with | Set A TN | Set A TP | Set A overall | Set B TN | Set B TP | Set B overall | Set C TN | Set C TP | Set C overall | Rules |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /D/ | A | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 98.4 | 267 |
| | B | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 98.4 | 381 |
| | C | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 98.4 | 532 |
| /s/ | A | 100.0 | 0.0 | 96.4 | 100.0 | 0.0 | 96.4 | 100.0 | 0.0 | 98.4 | 442 |
| | B | 100.0 | 0.0 | 96.4 | 100.0 | 0.0 | 96.4 | 100.0 | 0.0 | 98.4 | 594 |
| | C | 0.0 | 100.0 | 3.6 | 0.0 | 100.0 | 3.6 | 0.0 | 100.0 | 1.6 | 688 |
| /z/ | A | 99.5 | 1.5 | 98.0 | 99.8 | 0.0 | 97.9 | 100.0 | 0.0 | 97.8 | 325 |
| | B | 99.9 | 1.5 | 98.0 | 99.9 | 0.0 | 97.9 | 100.0 | 0.0 | 97.8 | 447 |
| | C | 99.8 | 1.5 | 98.0 | 99.8 | 0.0 | 97.9 | 100.0 | 0.0 | 97.8 | 539 |
| /S/ | A | 99.9 | 6.4 | 97.1 | 99.8 | 5.6 | 97.1 | 99.8 | 0.0 | 97.0 | 495 |
| | B | 99.8 | 7.4 | 97.1 | 99.8 | 9.7 | 97.2 | 99.8 | 0.0 | 97.0 | 666 |
| | C | 99.8 | 7.7 | 97.1 | 99.8 | 11.1 | 97.2 | 99.9 | 0.0 | 97.1 | 812 |
| /Z/ | A | 100.0 | 2.5 | 99.2 | 100.0 | 2.6 | 99.2 | 100.0 | 0.0 | 98.8 | 237 |
| | B | 99.9 | 3.4 | 99.1 | 99.9 | 2.6 | 99.1 | 100.0 | 0.0 | 98.8 | 334 |
| | C | 99.4 | 3.8 | 98.7 | 99.4 | 2.6 | 98.6 | 96.9 | 0.0 | 95.7 | 412 |
| /h/ | A | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 98.3 | 211 |
| | B | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 99.2 | 100.0 | 0.0 | 98.3 | 292 |
| | C | 99.8 | 0.0 | 99.0 | 99.7 | 0.0 | 99.0 | 100.0 | 8.1 | 98.4 | 414 |
| /ch/ | A | 100.0 | 0.0 | 97.8 | 100.0 | 0.0 | 97.8 | 100.0 | 0.0 | 98.4 | 407 |
| | B | 0.0 | 100.0 | 2.2 | 0.0 | 100.0 | 2.2 | 0.0 | 100.0 | 1.6 | 576 |
| | C | 100.0 | 0.0 | 97.8 | 100.0 | 0.0 | 97.8 | 100.0 | 0.0 | 98.4 | 719 |
| /dj/ | A | 100.0 | 5.3 | 99.6 | 100.0 | 0.0 | 99.6 | 99.8 | 0.0 | 98.4 | 168 |
| | B | 100.0 | 2.6 | 99.6 | 100.0 | 0.0 | 99.6 | 99.8 | 0.0 | 98.4 | 208 |
| | C | 99.3 | 5.3 | 99.0 | 99.3 | 0.0 | 98.9 | 97.5 | 11.2 | 96.3 | 356 |
| /m/ | A | 99.5 | 6.5 | 98.4 | 99.3 | 5.0 | 98.2 | 98.8 | 0.0 | 97.8 | 380 |
| | B | 99.5 | 4.9 | 98.3 | 99.2 | 3.3 | 98.1 | 98.3 | 0.0 | 97.4 | 518 |
| | C | 99.5 | 5.7 | 98.3 | 99.3 | 3.3 | 98.1 | 98.8 | 0.0 | 97.9 | 707 |
Table F.9 (continued).

| Phoneme | Trained with | Set A TN | Set A TP | Set A overall | Set B TN | Set B TP | Set B overall | Set C TN | Set C TP | Set C overall | Rules |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /n/ | A | 100.0 | 0.0 | 98.2 | 100.0 | 0.0 | 98.2 | 100.0 | 0.0 | 98.7 | 453 |
| | B | 100.0 | 0.0 | 98.2 | 100.0 | 0.0 | 98.2 | 100.0 | 0.0 | 98.7 | 602 |
| | C | 100.0 | 0.0 | 98.2 | 100.0 | 0.0 | 98.2 | 100.0 | 0.0 | 98.7 | 820 |
| /N/ | A | 99.8 | 0.0 | 98.9 | 99.7 | 0.0 | 98.8 | 99.1 | 0.0 | 98.2 | 320 |
| | B | 99.9 | 0.0 | 98.9 | 99.8 | 0.0 | 98.9 | 99.2 | 0.0 | 98.3 | 444 |
| | C | 99.8 | 0.0 | 98.8 | 99.7 | 0.0 | 98.8 | 99.4 | 0.0 | 98.5 | 566 |
| /l/ | A | 100.0 | 0.0 | 98.2 | 100.0 | 0.0 | 98.2 | 100.0 | 0.0 | 98.7 | 467 |
| | B | 100.0 | 0.0 | 98.2 | 100.0 | 0.0 | 98.2 | 100.0 | 0.0 | 98.7 | 647 |
| | C | 100.0 | 0.0 | 98.2 | 100.0 | 0.0 | 98.2 | 100.0 | 0.0 | 98.7 | 931 |
| /r/ | A | 99.8 | 8.6 | 98.9 | 99.9 | 3.8 | 98.8 | 99.1 | 0.0 | 98.2 | 315 |
| | B | 99.9 | 8.6 | 98.9 | 99.8 | 5.7 | 98.8 | 99.9 | 0.0 | 98.9 | 443 |
| | C | 99.1 | 4.5 | 98.1 | 99.2 | 3.8 | 98.2 | 96.4 | 9.2 | 95.6 | 554 |
| /w/ | A | 100.0 | 0.0 | 99.0 | 100.0 | 0.0 | 99.0 | 100.0 | 1.9 | 99.2 | 444 |
| | B | 99.8 | 0.0 | 98.8 | 99.8 | 0.0 | 98.9 | 99.7 | 1.9 | 99.0 | 610 |
| | C | 99.7 | 0.0 | 98.7 | 99.8 | 0.0 | 98.8 | 99.8 | 1.9 | 99.0 | 818 |
| /ie/ | A | 99.9 | 0.0 | 98.8 | 99.8 | 1.9 | 98.8 | 99.4 | 0.0 | 98.5 | 583 |
| | B | 99.9 | 0.0 | 98.8 | 99.8 | 1.9 | 98.7 | 99.4 | 0.0 | 98.4 | 809 |
| | C | 99.6 | 0.0 | 98.5 | 99.5 | 1.9 | 98.5 | 98.5 | 1.4 | 97.6 | 1003 |
| /I/ | A | 99.3 | 4.6 | 97.7 | 99.4 | 2.4 | 97.7 | 98.2 | 8.7 | 96.7 | 493 |
| | B | 99.3 | 6.9 | 97.7 | 99.0 | 5.9 | 97.4 | 96.7 | 9.6 | 95.3 | 665 |
| | C | 98.2 | 6.3 | 96.7 | 98.0 | 4.7 | 96.4 | 94.2 | 17.4 | 93.0 | 771 |
| /e/ | A | 98.8 | 2.0 | 96.4 | 98.4 | 1.6 | 96.0 | 97.5 | 3.7 | 94.6 | 444 |
| | B | 99.4 | 4.0 | 97.0 | 99.4 | 1.6 | 96.9 | 98.7 | 1.4 | 95.8 | 605 |
| | C | 99.3 | 2.4 | 96.9 | 99.3 | 1.6 | 96.9 | 98.8 | 0.5 | 95.8 | 728 |
| /&/ | A | 98.3 | 0.7 | 95.6 | 98.2 | 0.7 | 95.5 | 96.1 | 5.3 | 93.7 | 528 |
| | B | 99.9 | 0.4 | 97.1 | 99.9 | 0.7 | 97.2 | 99.0 | 3.7 | 96.4 | 715 |
| | C | 99.8 | 0.4 | 97.0 | 99.8 | 0.7 | 97.1 | 98.6 | 4.2 | 96.1 | 779 |
Table F.9 (continued).

| Phoneme | Trained with | Set A TN | Set A TP | Set A overall | Set B TN | Set B TP | Set B overall | Set C TN | Set C TP | Set C overall | Rules |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /V/ | A | 100.0 | 3.1 | 98.4 | 100.0 | 0.0 | 98.4 | 99.0 | 9.4 | 97.8 | 464 |
| | B | 100.0 | 3.7 | 98.4 | 100.0 | 0.0 | 98.4 | 99.9 | 0.0 | 98.5 | 652 |
| | C | 100.0 | 3.7 | 98.4 | 100.0 | 0.0 | 98.4 | 99.3 | 14.6 | 98.2 | 725 |
| /A/ | A | 99.8 | 2.9 | 98.2 | 99.8 | 1.2 | 98.1 | 97.8 | 40.8 | 96.9 | 402 |
| | B | 99.9 | 2.9 | 98.2 | 99.9 | 1.2 | 98.2 | 98.8 | 30.8 | 97.6 | 553 |
| | C | 99.8 | 3.5 | 98.1 | 99.9 | 1.2 | 98.1 | 97.4 | 50.8 | 96.7 | 630 |
| /U/ | A | 98.3 | 15.2 | 97.5 | 98.0 | 14.6 | 97.2 | 95.3 | 42.0 | 94.7 | 331 |
| | B | 98.8 | 13.1 | 97.9 | 98.7 | 12.5 | 97.8 | 95.6 | 34.6 | 94.9 | 466 |
| | C | 98.3 | 14.1 | 97.5 | 98.3 | 12.5 | 97.5 | 95.0 | 23.5 | 94.2 | 613 |
| /i/ | A | 98.7 | 1.2 | 95.5 | 98.9 | 1.9 | 95.8 | 97.3 | 0.6 | 92.5 | 883 |
| | B | 98.3 | 1.8 | 95.3 | 98.8 | 1.9 | 95.7 | 97.4 | 2.0 | 92.6 | 1165 |
| | C | 98.7 | 1.8 | 95.6 | 98.8 | 1.3 | 95.7 | 98.6 | 1.4 | 93.8 | 1362 |
| /a/ | A | 99.0 | 27.0 | 95.2 | 99.0 | 19.8 | 95.0 | 98.7 | 40.4 | 95.0 | 711 |
| | B | 97.7 | 66.9 | 96.0 | 98.2 | 72.5 | 96.8 | 99.2 | 18.8 | 94.1 | 938 |
| | C | 96.3 | 68.6 | 94.8 | 96.2 | 76.3 | 95.2 | 98.0 | 83.5 | 97.1 | 1011 |
| /O/ | A | 98.1 | 13.7 | 93.8 | 97.6 | 10.4 | 93.2 | 95.8 | 22.1 | 92.2 | 846 |
| | B | 98.1 | 15.0 | 93.9 | 97.6 | 11.2 | 93.2 | 95.2 | 23.0 | 91.7 | 1139 |
| | C | 98.1 | 15.8 | 93.9 | 98.1 | 9.6 | 93.6 | 97.2 | 15.4 | 93.2 | 1447 |
| /3/ | A | 98.2 | 11.3 | 94.0 | 97.7 | 17.5 | 93.9 | 95.4 | 24.4 | 92.3 | 893 |
| | B | 98.3 | 12.2 | 94.2 | 97.8 | 18.8 | 94.1 | 95.1 | 27.7 | 92.4 | 1204 |
| | C | 98.3 | 8.7 | 94.0 | 97.9 | 12.8 | 93.9 | 96.0 | 22.8 | 92.8 | 1430 |
| /u/ | A | 98.1 | 8.0 | 94.5 | 97.6 | 14.4 | 94.4 | 94.6 | 11.5 | 90.7 | 778 |
| | B | 98.2 | 7.7 | 94.6 | 97.8 | 12.8 | 94.4 | 95.5 | 10.0 | 91.5 | 1017 |
| | C | 97.7 | 3.7 | 94.0 | 97.2 | 9.7 | 93.8 | 95.1 | 29.1 | 92.1 | 1274 |
| /el/ | A | 99.9 | 1.5 | 95.3 | 99.8 | 0.9 | 95.1 | 99.6 | 0.6 | 97.3 | 1137 |
| | B | 99.8 | 1.5 | 95.2 | 99.8 | 1.7 | 95.1 | 98.2 | 1.2 | 95.9 | 1565 |
| | C | 99.9 | 1.5 | 95.3 | 99.8 | 1.7 | 95.2 | 98.4 | 7.9 | 96.3 | 1737 |
Table F.9 (continued).

| Phoneme | Trained with | Set A TN | Set A TP | Set A overall | Set B TN | Set B TP | Set B overall | Set C TN | Set C TP | Set C overall | Rules |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /al/ | A | 99.5 | 5.8 | 93.2 | 99.3 | 3.0 | 92.8 | 99.3 | 8.1 | 95.8 | 1379 |
| | B | 87.1 | 57.0 | 85.1 | 88.3 | 69.9 | 87.0 | 88.7 | 22.6 | 86.2 | 1859 |
| | C | 83.7 | 64.0 | 82.4 | 84.5 | 70.1 | 83.5 | 87.5 | 47.4 | 85.9 | 2067 |
| /Oi/ | A | 98.9 | 11.8 | 95.4 | 9.8 | 13.3 | 95.5 | 96.7 | 0.0 | 91.7 | 1014 |
| | B | 98.6 | 9.9 | 95.1 | 98.6 | 12.3 | 95.2 | 95.6 | 0.0 | 90.7 | 1338 |
| | C | 99.1 | 8.4 | 95.5 | 99.2 | 9.7 | 95.7 | 96.9 | 1.4 | 92.1 | 1597 |
| /OU/ | A | 99.5 | 1.2 | 93.8 | 99.4 | 0.7 | 93.6 | 96.5 | 2.7 | 92.1 | 1420 |
| | B | 99.4 | 0.9 | 93.7 | 99.3 | 1.0 | 93.6 | 96.6 | 3.7 | 92.3 | 1884 |
| | C | 97.5 | 1.2 | 91.9 | 97.0 | 1.0 | 91.4 | 92.2 | 29.6 | 89.3 | 2229 |
| /aU/ | A | 99.9 | 0.0 | 95.3 | 99.9 | 0.0 | 95.2 | 99.3 | 0.6 | 94.4 | 945 |
| | B | 99.9 | 0.2 | 95.3 | 99.9 | 0.9 | 95.2 | 98.9 | 4.1 | 94.2 | 1272 |
| | C | 99.0 | 0.2 | 94.4 | 99.0 | 1.7 | 94.4 | 97.2 | 10.5 | 93.0 | 1368 |
| /i@/ | A | 99.7 | 0.3 | 96.1 | 99.8 | 1.1 | 96.2 | 98.8 | 0.0 | 94.2 | 982 |
| | B | 99.6 | 1.1 | 96.0 | 99.7 | 2.8 | 96.2 | 98.5 | 0.0 | 94.0 | 1271 |
| | C | 99.8 | 0.0 | 96.2 | 99.9 | 0.0 | 96.2 | 97.1 | 12.8 | 93.2 | 1429 |
| /U@/ | A | 99.8 | 2.2 | 97.6 | 99.7 | 2.8 | 97.6 | 99.3 | 0.5 | 96.4 | 725 |
| | B | 99.5 | 3.1 | 97.4 | 99.5 | 2.8 | 97.4 | 98.4 | 0.0 | 95.6 | 928 |
| | C | 99.5 | 3.1 | 97.4 | 99.5 | 3.7 | 97.4 | 98.4 | 0.0 | 95.6 | 1121 |
| /e@/ | A | 99.4 | 0.0 | 97.1 | 99.7 | 1.6 | 97.2 | 99.1 | 0.0 | 96.3 | 893 |
| | B | 99.4 | 1.2 | 96.9 | 99.4 | 2.4 | 96.9 | 98.7 | 1.5 | 95.9 | 1125 |
| | C | 99.3 | 0.0 | 96.8 | 98.9 | 0.8 | 96.4 | 96.3 | 6.9 | 93.8 | 1313 |
Table F.10: Percent true negative, true positive and overall accuracies (to 1 d.p.) of EFuNN trained with online aggregation on the phoneme classification problem.
| Phoneme | Trained with | Set A TN | Set A TP | Set A overall | Set B TN | Set B TP | Set B overall | Set C TN | Set C TP | Set C overall | Neurons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /p/ | A | 56.1 | 80.2 | 56.4 | 55.2 | 84.2 | 55.5 | 60.2 | 88.1 | 60.4 | 3 |
| | B | 64.1 | 97.4 | 64.4 | 64.3 | 94.7 | 64.7 | 64.8 | 94.9 | 65.0 | 6 |
| | C | 66.6 | 83.6 | 66.8 | 66.8 | 78.9 | 66.9 | 69.4 | 76.3 | 69.5 | 6 |
| /b/ | A | 66.4 | 58.8 | 66.4 | 66.6 | 44.4 | 66.6 | 63.3 | 4.3 | 60.1 | 4 |
| | B | 42.4 | 88.2 | 42.5 | 42.1 | 100.0 | 42.2 | 46.5 | 100.0 | 46.7 | 6 |
| | C | 23.9 | 100.0 | 24.1 | 26.8 | 100.0 | 27.0 | 19.7 | 100.0 | 20.0 | 9 |
| /t/ | A | 76.8 | 97.1 | 77.1 | 76.7 | 100.0 | 77.0 | 79.3 | 100.0 | 79.4 | 3 |
| | B | 69.2 | 100.0 | 69.6 | 69.0 | 100.0 | 69.4 | 69.6 | 100.0 | 69.8 | 5 |
| | C | 81.0 | 95.7 | 81.2 | 81.0 | 97.0 | 81.2 | 85.8 | 100.0 | 85.9 | 7 |
| /d/ | A | 55.5 | 92.6 | 55.6 | 54.2 | 62.3 | 54.3 | 58.7 | 100.0 | 58.8 | 3 |
| | B | 60.8 | 88.9 | 60.9 | 62.4 | 100.0 | 62.5 | 62.5 | 91.3 | 62.6 | 7 |
| | C | 41.6 | 66.7 | 41.6 | 46.1 | 84.6 | 46.2 | 41.9 | 73.9 | 42.0 | 8 |
| /k/ | A | 56.6 | 100.0 | 57.1 | 55.2 | 100.0 | 55.7 | 60.1 | 100.0 | 60.3 | 3 |
| | B | 63.4 | 100.0 | 63.8 | 62.6 | 100.0 | 63.0 | 64.7 | 100.0 | 64.9 | 5 |
| | C | 70.1 | 100.0 | 70.4 | 70.9 | 98.1 | 71.2 | 77.9 | 100.0 | 78.0 | 7 |
| /g/ | A | 55.8 | 100.0 | 56.0 | 54.3 | 100.0 | 54.4 | 59.4 | 100.0 | 59.6 | 3 |
| | B | 56.9 | 100.0 | 57.1 | 56.6 | 100.0 | 56.7 | 57.9 | 100.0 | 58.2 | 8 |
| | C | 66.7 | 82.1 | 66.7 | 66.5 | 84.6 | 66.5 | 72.5 | 80.6 | 72.5 | 8 |
| /f/ | A | 61.3 | 100.0 | 62.4 | 60.5 | 100.0 | 61.7 | 64.3 | 100.0 | 65.1 | 3 |
| | B | 64.1 | 100.0 | 65.2 | 63.4 | 100.0 | 64.5 | 65.7 | 100.0 | 66.5 | 5 |
| | C | 80.0 | 100.0 | 80.6 | 80.0 | 100.0 | 80.6 | 84.2 | 99.4 | 84.6 | 5 |
| /v/ | A | 56.9 | 100.0 | 57.2 | 55.8 | 100.0 | 56.1 | 61.3 | 100.0 | 62.0 | 3 |
| | B | 57.8 | 100.0 | 58.1 | 57.2 | 100.0 | 57.5 | 60.5 | 100.0 | 61.2 | 5 |
| | C | 75.1 | 76.7 | 75.1 | 75.1 | 74.3 | 75.1 | 79.1 | 84.7 | 79.2 | 5 |
| /T/ | A | 61.1 | 100.0 | 62.1 | 60.3 | 100.0 | 61.3 | 64.3 | 100.0 | 65.2 | 3 |
| | B | 64.3 | 100.0 | 65.2 | 63.6 | 100.0 | 64.5 | 66.1 | 100.0 | 67.0 | 5 |
| | C | 75.9 | 99.2 | 76.5 | 77.1 | 99.2 | 77.7 | 80.2 | 77.3 | 88.2 | 7 |
Table F.10 (continued).

| Phoneme | Trained with | Set A TN | Set A TP | Set A overall | Set B TN | Set B TP | Set B overall | Set C TN | Set C TP | Set C overall | Neurons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /D/ | A | 58.1 | 100.0 | 58.4 | 57.4 | 100.0 | 57.7 | 62.4 | 100.0 | 63.0 | 3 |
| | B | 59.3 | 100.0 | 59.6 | 59.0 | 100.0 | 59.3 | 61.9 | 100.0 | 62.5 | 7 |
| | C | 76.5 | 87.3 | 76.5 | 76.4 | 84.2 | 76.5 | 81.4 | 90.4 | 81.6 | 6 |
| /s/ | A | 69.4 | 99.7 | 70.5 | 69.0 | 99.4 | 70.1 | 70.3 | 100.0 | 70.8 | 3 |
| | B | 71.3 | 99.4 | 72.3 | 71.2 | 98.9 | 72.2 | 71.5 | 99.1 | 71.9 | 5 |
| | C | 95.0 | 81.8 | 94.5 | 94.2 | 79.2 | 93.7 | 94.6 | 88.7 | 94.5 | 6 |
| /z/ | A | 63.1 | 100.0 | 63.8 | 62.1 | 100.0 | 62.9 | 65.9 | 99.4 | 66.6 | 3 |
| | B | 70.0 | 100.0 | 70.6 | 69.9 | 100.0 | 70.4 | 69.1 | 100.0 | 69.8 | 6 |
| | C | 80.1 | 76.4 | 80.1 | 80.4 | 83.2 | 80.5 | 84.7 | 60.9 | 84.1 | 6 |
| /S/ | A | 64.2 | 100.0 | 65.2 | 63.6 | 100.0 | 64.7 | 67.4 | 100.0 | 68.3 | 3 |
| | B | 82.7 | 99.0 | 83.2 | 83.1 | 99.3 | 83.6 | 88.2 | 100.0 | 88.5 | 5 |
| | C | 81.8 | 99.0 | 82.3 | 82.0 | 97.9 | 82.4 | 85.3 | 99.0 | 85.7 | 7 |
| /Z/ | A | 74.3 | 92.5 | 74.4 | 74.5 | 97.4 | 74.7 | 77.3 | 67.1 | 77.1 | 3 |
| | B | 79.3 | 90.0 | 79.4 | 79.6 | 92.3 | 79.7 | 82.8 | 65.9 | 82.6 | 5 |
| | C | 84.9 | 70.0 | 84.8 | 85.1 | 66.7 | 84.9 | 88.8 | 65.9 | 88.5 | 7 |
| /h/ | A | 58.2 | 100.0 | 58.6 | 57.3 | 100.0 | 57.7 | 62.7 | 100.0 | 63.3 | 3 |
| | B | 61.0 | 100.0 | 61.3 | 60.2 | 100.0 | 60.5 | 64.3 | 98.4 | 64.9 | 3 |
| | C | 89.7 | 0.0 | 89.0 | 89.2 | 0.0 | 88.5 | 93.4 | 59.3 | 92.8 | 5 |
| /ch/ | A | 67.4 | 100.0 | 68.2 | 66.7 | 100.0 | 67.4 | 69.6 | 86.7 | 69.8 | 3 |
| | B | 75.2 | 100.0 | 75.7 | 75.3 | 100.0 | 75.9 | 77.8 | 86.7 | 77.9 | 5 |
| | C | 92.6 | 72.6 | 92.1 | 92.0 | 81.5 | 91.7 | 94.4 | 64.6 | 94.0 | 9 |
| /dj/ | A | 71.6 | 94.7 | 71.7 | 71.5 | 100.0 | 71.6 | 74.2 | 13.3 | 73.3 | 3 |
| | B | 73.1 | 94.7 | 73.2 | 73.3 | 100.0 | 73.4 | 75.6 | 8.2 | 74.7 | 8 |
| | C | 61.5 | 89.5 | 61.6 | 59.7 | 83.3 | 59.8 | 67.5 | 88.8 | 67.8 | 7 |
| /m/ | A | 50.2 | 91.1 | 50.7 | 48.7 | 91.7 | 49.2 | 54.3 | 100.0 | 54.7 | 3 |
| | B | 53.0 | 99.2 | 53.6 | 51.1 | 100.0 | 51.7 | 57.7 | 100.0 | 58.1 | 5 |
| | C | 50.9 | 83.7 | 51.3 | 49.7 | 76.7 | 50.0 | 54.2 | 70.6 | 54.4 | 5 |
Table F.10 (continued).

| Phoneme | Trained with | Set A TN | Set A TP | Set A overall | Set B TN | Set B TP | Set B overall | Set C TN | Set C TP | Set C overall | Neurons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /n/ | A | 54.0 | 90.7 | 54.6 | 52.9 | 88.9 | 53.6 | 57.8 | 82.6 | 58.1 | 3 |
| | B | 59.2 | 100.0 | 59.95 | 58.3 | 100.0 | 59.1 | 59.5 | 98.9 | 60.0 | 10 |
| | C | 51.5 | 81.3 | 52.0 | 51.0 | 92.2 | 51.7 | 56.9 | 98.9 | 57.5 | 7 |
| /N/ | A | 48.1 | 94.7 | 48.6 | 46.4 | 97.8 | 46.9 | 52.8 | 100.0 | 53.2 | 3 |
| | B | 51.6 | 100.0 | 52.0 | 49.4 | 100.0 | 49.9 | 53.4 | 100.0 | 53.9 | 11 |
| | C | 48.9 | 96.8 | 49.1 | 46.5 | 95.7 | 47.0 | 41.8 | 79.4 | 42.2 | 9 |
| /l/ | A | 51.1 | 100.0 | 51.7 | 48.1 | 100.0 | 48.8 | 50.1 | 88.9 | 51.0 | 4 |
| | B | 47.7 | 94.2 | 48.3 | 45.5 | 97.0 | 46.2 | 45.5 | 94.1 | 46.5 | 6 |
| | C | 81.7 | 67.4 | 81.5 | 81.8 | 68.2 | 81.7 | 79.7 | 64.1 | 79.4 | 7 |
| /r/ | A | 50.0 | 56.2 | 50.1 | 47.1 | 64.2 | 47.3 | 50.7 | 46.2 | 50.7 | 4 |
| | B | 86.3 | 69.5 | 86.1 | 84.4 | 71.7 | 84.3 | 79.8 | 66.2 | 79.7 | 5 |
| | C | 73.3 | 54.3 | 73.1 | 73.3 | 62.3 | 73.2 | 59.9 | 84.6 | 60.1 | 7 |
| /w/ | A | 66.2 | 44.7 | 66.0 | 65.8 | 50.0 | 65.6 | 57.9 | 59.3 | 57.9 | 4 |
| | B | 40.6 | 92.2 | 41.1 | 41.0 | 87.5 | 41.4 | 37.1 | 57.4 | 37.2 | 5 |
| | C | 58.9 | 27.2 | 58.6 | 60.9 | 20.8 | 60.5 | 52.2 | 77.8 | 52.4 | 5 |
| /ie/ | A | 71.3 | 37.4 | 70.9 | 72.6 | 34.0 | 72.2 | 64.9 | 66.7 | 64.9 | 5 |
| | B | 38.0 | 84.1 | 38.5 | 36.9 | 77.4 | 37.3 | 35.2 | 91.3 | 35.7 | 9 |
| | C | 50.5 | 77.6 | 50.8 | 51.1 | 73.6 | 51.4 | 43.1 | 73.9 | 43.4 | 6 |
| /I/ | A | 86.4 | 47.1 | 85.7 | 87.8 | 50.6 | 87.2 | 82.0 | 64.3 | 81.8 | 3 |
| | B | 87.1 | 69.0 | 86.8 | 88.5 | 70.6 | 88.2 | 83.7 | 96.5 | 83.9 | 7 |
| | C | 82.3 | 55.2 | 81.8 | 84.5 | 63.5 | 84.1 | 73.7 | 97.4 | 74.1 | 6 |
| /e/ | A | 61.3 | 71.5 | 61.6 | 62.8 | 62.1 | 62.8 | 63.0 | 75.2 | 63.4 | 4 |
| | B | 82.0 | 54.9 | 81.3 | 82.1 | 51.6 | 81.4 | 84.4 | 37.4 | 83.0 | 6 |
| | C | 58.8 | 60.1 | 58.9 | 60.0 | 66.1 | 60.1 | 52.1 | 92.5 | 53.4 | 8 |
| /&/ | A | 94.7 | 43.9 | 93.3 | 95.5 | 33.3 | 93.8 | 95.0 | 32.6 | 93.4 | 4 |
| | B | 49.3 | 77.2 | 50.1 | 47.9 | 87.0 | 49.0 | 52.8 | 64.7 | 53.1 | 8 |
| | C | 59.8 | 94.4 | 60.8 | 62.5 | 87.7 | 63.2 | 62.9 | 100.0 | 63.9 | 7 |
Table F.10 (continued).

| Phoneme | Trained with | Set A TN | Set A TP | Set A overall | Set B TN | Set B TP | Set B overall | Set C TN | Set C TP | Set C overall | Neurons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /V/ | A | 88.6 | 76.7 | 88.4 | 89.0 | 71.3 | 88.8 | 83.4 | 87.5 | 83.5 | 3 |
| | B | 81.5 | 81.6 | 81.5 | 81.1 | 92.5 | 81.3 | 78.3 | 64.6 | 78.2 | 5 |
| | C | 73.4 | 87.1 | 73.6 | 74.1 | 88.8 | 74.3 | 69.4 | 96.9 | 69.8 | 5 |
| /A/ | A | 79.0 | 98.3 | 79.3 | 78.1 | 94.2 | 78.4 | 74.1 | 95.8 | 74.5 | 3 |
| | B | 51.1 | 98.8 | 51.9 | 49.9 | 100.0 | 50.8 | 50.2 | 89.2 | 50.9 | 5 |
| | C | 55.0 | 72.3 | 55.3 | 56.2 | 68.6 | 56.4 | 47.8 | 100.0 | 48.7 | 6 |
| /U/ | A | 85.1 | 84.8 | 85.1 | 85.5 | 64.6 | 85.3 | 75.8 | 77.8 | 75.9 | 4 |
| | B | 82.9 | 83.8 | 82.9 | 83.8 | 72.9 | 83.7 | 74.9 | 86.4 | 75.0 | 5 |
| | C | 82.3 | 85.9 | 82.3 | 82.7 | 77.1 | 82.6 | 74.9 | 90.1 | 75.0 | 7 |
| /i/ | A | 65.6 | 67.7 | 65.6 | 66.4 | 61.6 | 66.2 | 67.7 | 69.8 | 67.8 | 4 |
| | B | 34.8 | 73.2 | 36.0 | 33.3 | 71.7 | 34.5 | 33.9 | 78.3 | 36.1 | 7 |
| | C | 63.7 | 48.3 | 63.2 | 63.3 | 54.1 | 63.0 | 66.7 | 60.4 | 66.4 | 8 |
| /a/ | A | 88.4 | 70.4 | 87.4 | 88.6 | 58.8 | 87.0 | 86.2 | 93.5 | 86.7 | 3 |
| | B | 83.6 | 95.0 | 84.2 | 82.5 | 96.2 | 83.2 | 85.3 | 99.6 | 86.2 | 5 |
| | C | 60.3 | 98.0 | 62.3 | 60.4 | 98.1 | 62.4 | 63.1 | 99.6 | 65.4 | 5 |
| /O/ | A | 82.3 | 74.4 | 81.9 | 81.8 | 72.1 | 81.4 | 73.2 | 81.1 | 73.6 | 4 |
| | B | 77.8 | 86.9 | 78.3 | 77.9 | 82.5 | 78.1 | 65.5 | 85.8 | 66.5 | 6 |
| | C | 63.0 | 12.9 | 60.5 | 62.8 | 17.9 | 60.5 | 73.5 | 19.8 | 70.9 | 8 |
| /3/ | A | 84.3 | 65.2 | 83.4 | 85.8 | 59.8 | 84.6 | 79.5 | 55.0 | 78.4 | 5 |
| | B | 34.2 | 78.4 | 36.3 | 33.7 | 71.4 | 35.5 | 40.3 | 78.8 | 42.0 | 6 |
| | C | 35.8 | 96.9 | 38.7 | 36.1 | 94.0 | 38.8 | 44.6 | 84.9 | 46.3 | 8 |
| /u/ | A | 27.3 | 95.3 | 30.0 | 24.8 | 89.7 | 27.4 | 26.2 | 66.4 | 28.1 | 4 |
| | B | 21.1 | 99.0 | 24.2 | 19.1 | 97.9 | 22.2 | 21.4 | 100.0 | 25.1 | 7 |
| | C | 48.5 | 93.4 | 50.3 | 48.0 | 97.9 | 50.0 | 43.7 | 99.7 | 46.3 | 6 |
| /el/ | A | 45.6 | 53.2 | 45.9 | 46.7 | 52.3 | 46.9 | 52.6 | 24.8 | 52.0 | 4 |
| | B | 35.3 | 60.9 | 36.4 | 36.3 | 57.0 | 37.3 | 48.6 | 63.6 | 49.0 | 7 |
| | C | 70.1 | 44.3 | 68.9 | 71.8 | 43.8 | 70.5 | 80.5 | 71.5 | 80.2 | 8 |
Table F.10 (continued).

| Phoneme | Trained with | Set A TN | Set A TP | Set A overall | Set B TN | Set B TP | Set B overall | Set C TN | Set C TP | Set C overall | Neurons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /al/ | A | 72.7 | 68.8 | 72.4 | 73.1 | 75.2 | 73.2 | 70.9 | 56.3 | 70.3 | 4 |
| | B | 42.9 | 59.9 | 44.0 | 43.7 | 60.0 | 44.8 | 42.2 | 64.4 | 43.1 | 8 |
| | C | 37.7 | 15.4 | 36.2 | 39.0 | 12.8 | 37.2 | 37.1 | 41.1 | 37.3 | 8 |
| /Oi/ | A | 71.8 | 72.9 | 71.9 | 70.1 | 70.8 | 70.1 | 61.6 | 46.7 | 60.9 | 4 |
| | B | 32.8 | 93.6 | 35.2 | 31.0 | 95.9 | 33.5 | 30.0 | 26.7 | 29.9 | 7 |
| | C | 59.8 | 78.6 | 60.5 | 61.2 | 76.4 | 61.8 | 58.8 | 17.8 | 56.8 | 8 |
| /OU/ | A | 65.8 | 55.1 | 65.2 | 66.2 | 52.2 | 65.4 | 62.9 | 39.0 | 61.8 | 4 |
| | B | 73.7 | 42.5 | 71.9 | 73.4 | 41.9 | 71.5 | 73.4 | 21.6 | 71.0 | 8 |
| | C | 65.3 | 49.8 | 64.4 | 65.6 | 49.1 | 64.6 | 59.2 | 38.4 | 58.2 | 6 |
| /aU/ | A | 37.2 | 80.0 | 39.2 | 39.0 | 78.0 | 40.9 | 45.6 | 10.8 | 43.9 | 4 |
| | B | 73.8 | 72.5 | 73.7 | 72.7 | 77.2 | 72.9 | 74.0 | 51.7 | 72.9 | 7 |
| | C | 54.6 | 13.0 | 52.6 | 55.4 | 12.9 | 53.4 | 54.9 | 67.4 | 55.5 | 8 |
| /i@/ | A | 19.4 | 58.6 | 20.8 | 17.5 | 62.8 | 19.2 | 21.2 | 72.3 | 23.6 | 4 |
| | B | 23.9 | 51.9 | 24.9 | 22.4 | 53.9 | 23.5 | 34.3 | 49.5 | 35.1 | 7 |
| | C | 72.8 | 43.2 | 71.7 | 72.8 | 46.1 | 71.8 | 73.0 | 16.1 | 70.4 | 9 |
| /U@/ | A | 44.3 | 76.1 | 45.0 | 42.4 | 81.7 | 43.3 | 42.7 | 29.1 | 42.3 | 4 |
| | B | 36.8 | 93.4 | 38.1 | 36.6 | 89.9 | 37.8 | 33.2 | 19.7 | 32.8 | 6 |
| | C | 16.8 | 84.1 | 18.3 | 17.3 | 82.6 | 18.7 | 22.8 | 74.4 | 24.3 | 11 |
| /e@/ | A | 66.4 | 64.6 | 66.4 | 68.0 | 62.2 | 67.9 | 59.3 | 66.8 | 59.5 | 3 |
| | B | 20.9 | 98.4 | 22.8 | 19.6 | 98.4 | 21.6 | 27.9 | 82.2 | 29.4 | 5 |
| | C | 5.8 | 92.2 | 8.0 | 4.9 | 91.3 | 7.1 | 6.5 | 95.5 | 9.0 | 7 |
APPENDIX F. COMPLETE PHONEME EXPERIMENTAL RESULTS
347
Table F.11: Percent true negative, true positive and overall accuracies (to 1 d.p.) of SECoS trained with online aggregation on the phoneme classification problem.
| Phoneme | Trained with | TN(A) | TP(A) | Ov(A) | TN(B) | TP(B) | Ov(B) | TN(C) | TP(C) | Ov(C) | Neurons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /p/ | A | 98.6 | 100.0 | 98.6 | 98.5 | 78.9 | 98.3 | 98.2 | 50.8 | 97.8 | 271 |
| | B | 98.6 | 100.0 | 98.6 | 99.3 | 100.0 | 99.4 | 97.1 | 50.8 | 96.7 | 373 |
| | C | 85.2 | 100.0 | 85.4 | 87.3 | 100.0 | 87.4 | 80.0 | 100.0 | 80.2 | 499 |
| /b/ | A | 98.7 | 100.0 | 98.7 | 99.2 | 66.7 | 99.1 | 99.1 | 4.6 | 98.3 | 116 |
| | B | 98.7 | 100.0 | 98.7 | 99.3 | 100.0 | 99.3 | 99.7 | 0.0 | 99.4 | 162 |
| | C | 84.2 | 100.0 | 84.2 | 85.5 | 100.0 | 85.5 | 83.7 | 100.0 | 83.7 | 235 |
| /t/ | A | 98.4 | 100.0 | 98.4 | 98.2 | 89.6 | 98.1 | 97.1 | 21.7 | 96.6 | 293 |
| | B | 99.2 | 100.0 | 99.2 | 99.6 | 100.0 | 99.6 | 97.6 | 37.0 | 97.2 | 406 |
| | C | 94.8 | 100.0 | 94.8 | 95.3 | 100.0 | 95.3 | 94.4 | 100.0 | 94.5 | 491 |
| /d/ | A | 98.6 | 100.0 | 98.6 | 98.5 | 46.2 | 98.4 | 99.2 | 17.4 | 99.0 | 146 |
| | B | 98.7 | 100.0 | 98.7 | 99.2 | 100.0 | 99.2 | 99.6 | 8.7 | 99.3 | 200 |
| | C | 74.6 | 100.0 | 74.7 | 77.2 | 100.0 | 77.2 | 73.9 | 100.0 | 74.0 | 294 |
| /k/ | A | 98.8 | 100.0 | 98.6 | 98.2 | 90.4 | 98.1 | 99.4 | 73.8 | 99.2 | 232 |
| | B | 99.2 | 100.0 | 99.2 | 99.6 | 100.0 | 99.6 | 99.2 | 83.3 | 99.1 | 333 |
| | C | 96.8 | 100.0 | 96.9 | 97.7 | 100.0 | 97.7 | 98.3 | 100.0 | 98.3 | 409 |
| /g/ | A | 98.8 | 100.0 | 98.8 | 98.3 | 53.8 | 98.2 | 99.3 | 8.3 | 98.8 | 133 |
| | B | 99.2 | 100.0 | 99.2 | 99.5 | 100.0 | 99.5 | 99.5 | 2.8 | 99.0 | 184 |
| | C | 97.8 | 100.0 | 97.8 | 97.5 | 100.0 | 97.5 | 98.7 | 100.0 | 98.7 | 265 |
| /f/ | A | 99.1 | 100.0 | 99.2 | 98.4 | 91.2 | 98.2 | 98.8 | 51.4 | 97.6 | 471 |
| | B | 99.2 | 99.7 | 99.2 | 99.7 | 100.0 | 99.7 | 98.8 | 43.4 | 97.4 | 673 |
| | C | 98.5 | 99.7 | 98.5 | 98.9 | 100.0 | 98.9 | 99.2 | 90.8 | 99.0 | 838 |
| /v/ | A | 97.0 | 100.0 | 97.0 | 96.2 | 62.9 | 95.9 | 94.6 | 24.6 | 93.4 | 296 |
| | B | 98.2 | 98.6 | 98.2 | 98.8 | 100.0 | 98.8 | 96.8 | 21.2 | 95.5 | 404 |
| | C | 97.2 | 97.3 | 97.2 | 97.3 | 100.0 | 97.4 | 98.5 | 10.2 | 97.0 | 471 |
| /T/ | A | 98.8 | 100.0 | 98.8 | 98.6 | 85.5 | 98.3 | 97.4 | 73.8 | 96.8 | 410 |
| | B | 98.8 | 100.0 | 98.8 | 99.0 | 100.0 | 99.0 | 97.4 | 72.7 | 96.8 | 570 |
| | C | 96.5 | 100.0 | 96.6 | 96.9 | 100.0 | 97.0 | 98.2 | 100.0 | 98.3 | 807 |
| Phoneme | Trained with | TN(A) | TP(A) | Ov(A) | TN(B) | TP(B) | Ov(B) | TN(C) | TP(C) | Ov(C) | Neurons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /D/ | A | 97.3 | 97.5 | 97.3 | 96.3 | 57.9 | 96.0 | 95.7 | 45.2 | 94.9 | 267 |
| | B | 98.0 | 93.7 | 98.0 | 98.3 | 97.4 | 98.3 | 95.8 | 57.4 | 95.2 | 373 |
| | C | 98.1 | 88.6 | 98.0 | 97.9 | 89.5 | 97.8 | 98.8 | 70.4 | 98.3 | 481 |
| /s/ | A | 97.9 | 100.0 | 98.0 | 97.1 | 94.9 | 97.0 | 96.4 | 70.4 | 95.9 | 391 |
| | B | 98.1 | 97.8 | 98.0 | 98.8 | 95.5 | 98.7 | 96.9 | 73.0 | 96.5 | 511 |
| | C | 97.8 | 97.2 | 97.8 | 98.2 | 95.5 | 98.1 | 99.1 | 97.4 | 99.1 | 594 |
| /z/ | A | 97.3 | 97.9 | 97.4 | 96.7 | 81.1 | 96.4 | 95.3 | 98.1 | 95.3 | 305 |
| | B | 98.0 | 94.9 | 98.0 | 98.1 | 100.0 | 98.2 | 97.1 | 86.5 | 96.9 | 412 |
| | C | 95.8 | 96.4 | 95.8 | 95.6 | 98.9 | 95.7 | 94.8 | 98.7 | 94.9 | 504 |
| /S/ | A | 98.6 | 99.7 | 98.6 | 97.8 | 82.6 | 97.4 | 96.4 | 72.4 | 95.7 | 483 |
| | B | 99.3 | 99.0 | 99.3 | 99.8 | 100.0 | 99.8 | 98.1 | 63.3 | 97.1 | 654 |
| | C | 97.5 | 99.0 | 97.6 | 98.3 | 100.0 | 98.3 | 99.4 | 91.0 | 99.1 | 819 |
| /Z/ | A | 97.9 | 100.0 | 97.9 | 97.1 | 79.5 | 97.0 | 96.1 | 61.0 | 95.7 | 237 |
| | B | 98.7 | 98.8 | 98.7 | 99.0 | 97.4 | 99.0 | 98.2 | 56.1 | 97.7 | 332 |
| | C | 88.1 | 98.8 | 88.2 | 88.6 | 94.4 | 88.7 | 85.3 | 93.9 | 85.4 | 421 |
| /h/ | A | 98.0 | 100.0 | 98.0 | 97.9 | 97.4 | 97.9 | 97.6 | 22.8 | 96.3 | 211 |
| | B | 98.4 | 100.0 | 98.4 | 99.2 | 100.0 | 99.2 | 98.1 | 22.0 | 96.8 | 292 |
| | C | 93.9 | 100.0 | 94.0 | 94.8 | 100.0 | 94.8 | 97.7 | 100.0 | 97.8 | 447 |
| /ch/ | A | 98.7 | 99.1 | 98.7 | 97.7 | 82.4 | 97.4 | 97.7 | 55.8 | 97.1 | 411 |
| | B | 99.3 | 98.7 | 99.3 | 99.6 | 100.0 | 99.6 | 96.8 | 52.2 | 96.1 | 582 |
| | C | 94.9 | 98.7 | 95.0 | 95.6 | 100.0 | 95.7 | 92.7 | 85.0 | 92.3 | 707 |
| /dj/ | A | 98.1 | 100.0 | 98.1 | 97.9 | 55.6 | 97.8 | 94.5 | 1.0 | 94.2 | 168 |
| | B | 98.7 | 100.0 | 98.7 | 98.9 | 100.0 | 99.0 | 96.2 | 0.0 | 94.8 | 208 |
| | C | 91.8 | 100.0 | 91.8 | 92.2 | 100.0 | 92.2 | 87.1 | 44.9 | 86.5 | 341 |
| /m/ | A | 97.4 | 69.1 | 97.1 | 97.5 | 63.3 | 97.2 | 93.9 | 29.4 | 93.3 | 306 |
| | B | 97.7 | 65.0 | 97.3 | 98.7 | 85.0 | 98.5 | 93.1 | 35.3 | 92.5 | 413 |
| | C | 95.8 | 63.4 | 95.4 | 96.0 | 81.7 | 95.8 | 92.4 | 58.8 | 92.0 | 553 |
| Phoneme | Trained with | TN(A) | TP(A) | Ov(A) | TN(B) | TP(B) | Ov(B) | TN(C) | TP(C) | Ov(C) | Neurons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /n/ | A | 97.5 | 78.0 | 97.2 | 97.0 | 65.6 | 96.4 | 92.9 | 37.0 | 92.2 | 406 |
| | B | 97.8 | 72.0 | 97.3 | 98.9 | 81.1 | 98.6 | 92.7 | 31.5 | 91.9 | 519 |
| | C | 97.8 | 67.0 | 97.2 | 98.3 | 86.7 | 98.1 | 97.6 | 33.7 | 69.8 | 667 |
| /N/ | A | 97.5 | 88.4 | 97.4 | 97.2 | 63.0 | 96.9 | 94.5 | 3.2 | 93.7 | 309 |
| | B | 97.5 | 83.2 | 97.4 | 98.2 | 82.6 | 98.0 | 94.2 | 0.0 | 93.3 | 419 |
| | C | 96.8 | 83.2 | 96.7 | 97.1 | 87.0 | 97.0 | 98.1 | 82.5 | 97.9 | 522 |
| /l/ | A | 97.5 | 97.1 | 97.5 | 96.8 | 62.1 | 96.3 | 94.2 | 37.3 | 92.9 | 443 |
| | B | 97.4 | 96.4 | 97.4 | 98.3 | 98.5 | 98.3 | 92.0 | 52.9 | 91.2 | 610 |
| | C | 94.2 | 94.9 | 94.2 | 94.8 | 98.5 | 94.9 | 88.9 | 58.8 | 88.3 | 822 |
| /r/ | A | 96.5 | 92.4 | 96.5 | 95.8 | 83.0 | 95.7 | 93.0 | 24.6 | 92.4 | 271 |
| | B | 96.8 | 91.4 | 96.7 | 97.0 | 96.2 | 97.0 | 92.9 | 44.6 | 92.5 | 378 |
| | C | 94.9 | 92.4 | 94.9 | 94.7 | 98.1 | 94.7 | 92.3 | 89.2 | 92.3 | 490 |
| /w/ | A | 96.9 | 96.1 | 96.9 | 96.7 | 62.5 | 96.4 | 92.4 | 29.6 | 91.9 | 426 |
| | B | 97.2 | 93.2 | 97.2 | 98.1 | 93.8 | 98.1 | 93.7 | 25.9 | 93.2 | 578 |
| | C | 95.0 | 90.3 | 94.9 | 94.7 | 93.8 | 94.7 | 96.3 | 94.4 | 96.2 | 763 |
| /ie/ | A | 96.8 | 88.8 | 96.7 | 96.0 | 50.9 | 95.5 | 94.7 | 15.9 | 93.9 | 580 |
| | B | 96.9 | 83.2 | 96.8 | 98.0 | 81.1 | 97.8 | 96.1 | 15.9 | 95.3 | 801 |
| | C | 94.5 | 86.0 | 94.5 | 94.8 | 83.0 | 94.7 | 96.2 | 92.8 | 96.2 | 978 |
| /I/ | A | 97.6 | 92.5 | 97.5 | 97.5 | 81.2 | 97.2 | 92.7 | 31.3 | 91.7 | 484 |
| | B | 98.1 | 90.8 | 98.0 | 98.9 | 94.1 | 98.8 | 94.5 | 26.1 | 93.4 | 629 |
| | C | 96.9 | 92.0 | 96.8 | 97.4 | 95.3 | 97.4 | 94.2 | 94.8 | 94.2 | 746 |
| /e/ | A | 98.3 | 79.1 | 97.8 | 97.8 | 68.5 | 97.1 | 97.3 | 35.0 | 95.4 | 378 |
| | B | 98.1 | 74.3 | 97.5 | 98.9 | 81.5 | 98.5 | 97.9 | 41.6 | 96.2 | 506 |
| | C | 96.4 | 78.3 | 96.0 | 96.8 | 78.2 | 96.3 | 97.5 | 58.9 | 96.3 | 581 |
| /&/ | A | 98.0 | 91.9 | 97.8 | 97.2 | 82.6 | 96.8 | 94.7 | 78.9 | 94.2 | 455 |
| | B | 98.1 | 86.0 | 97.8 | 98.7 | 94.9 | 98.6 | 96.7 | 77.9 | 96.2 | 620 |
| | C | 96.6 | 86.7 | 96.3 | 97.0 | 94.9 | 97.0 | 96.8 | 68.4 | 96.0 | 650 |
| Phoneme | Trained with | TN(A) | TP(A) | Ov(A) | TN(B) | TP(B) | Ov(B) | TN(C) | TP(C) | Ov(C) | Neurons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /V/ | A | 97.0 | 88.3 | 96.8 | 96.3 | 56.3 | 95.6 | 93.2 | 25.0 | 92.3 | 446 |
| | B | 97.6 | 80.4 | 97.4 | 98.6 | 85.0 | 98.4 | 93.8 | 30.2 | 92.9 | 606 |
| | C | 96.8 | 81.0 | 96.5 | 97.3 | 85.0 | 97.1 | 93.4 | 80.2 | 93.2 | 680 |
| /A/ | A | 98.5 | 93.6 | 98.4 | 97.9 | 73.3 | 97.4 | 95.9 | 48.3 | 95.1 | 373 |
| | B | 98.5 | 87.9 | 98.3 | 98.8 | 91.9 | 98.4 | 96.2 | 52.5 | 95.5 | 498 |
| | C | 97.5 | 87.9 | 97.3 | 97.6 | 91.9 | 97.5 | 93.9 | 91.7 | 93.9 | 610 |
| /U/ | A | 98.5 | 97.0 | 98.5 | 97.8 | 62.5 | 97.5 | 96.5 | 24.7 | 95.7 | 339 |
| | B | 98.5 | 90.9 | 98.4 | 99.0 | 95.8 | 99.0 | 96.6 | 22.2 | 95.8 | 452 |
| | C | 97.0 | 92.9 | 97.0 | 97.4 | 95.8 | 97.4 | 97.5 | 64.2 | 97.1 | 600 |
| /i/ | A | 96.5 | 84.6 | 96.1 | 95.1 | 57.2 | 93.9 | 93.2 | 37.3 | 90.4 | 838 |
| | B | 96.4 | 71.4 | 95.6 | 98.3 | 85.5 | 97.9 | 93.0 | 44.4 | 90.6 | 1098 |
| | C | 94.9 | 74.5 | 94.2 | 95.5 | 82.4 | 95.0 | 93.1 | 57.0 | 91.3 | 1177 |
| /a/ | A | 98.6 | 87.0 | 98.0 | 97.1 | 85.9 | 96.5 | 97.3 | 42.4 | 93.9 | 592 |
| | B | 98.5 | 85.3 | 97.8 | 99.3 | 89.3 | 98.7 | 97.4 | 43.3 | 93.9 | 761 |
| | C | 97.5 | 84.8 | 96.8 | 98.3 | 87.4 | 97.7 | 97.2 | 77.9 | 95.9 | 775 |
| /O/ | A | 98.0 | 86.7 | 97.4 | 96.7 | 70.1 | 95.3 | 91.6 | 50.3 | 89.6 | 693 |
| | B | 97.9 | 88.1 | 97.4 | 98.8 | 88.4 | 98.3 | 92.0 | 61.9 | 90.5 | 898 |
| | C | 95.6 | 83.6 | 95.0 | 96.5 | 84.5 | 95.9 | 94.9 | 84.0 | 94.4 | 1126 |
| /3/ | A | 97.5 | 91.3 | 97.2 | 96.0 | 79.1 | 95.2 | 90.4 | 61.7 | 89.1 | 737 |
| | B | 97.6 | 86.8 | 97.1 | 98.9 | 91.9 | 98.5 | 90.6 | 64.0 | 89.4 | 971 |
| | C | 97.0 | 83.9 | 96.4 | 97.5 | 87.2 | 97.0 | 96.9 | 69.8 | 95.7 | 1146 |
| /u/ | A | 96.1 | 84.8 | 95.7 | 94.5 | 68.7 | 96.4 | 90.7 | 32.7 | 88.0 | 692 |
| | B | 97.0 | 83.8 | 96.5 | 97.9 | 87.7 | 97.5 | 92.3 | 23.3 | 89.0 | 918 |
| | C | 92.8 | 82.0 | 92.4 | 92.8 | 89.2 | 92.7 | 92.9 | 80.0 | 92.3 | 1103 |
| /el/ | A | 96.7 | 83.8 | 96.1 | 93.9 | 63.0 | 92.5 | 90.9 | 18.8 | 89.2 | 1089 |
| | B | 96.3 | 77.1 | 95.4 | 98.7 | 91.5 | 98.3 | 91.6 | 22.4 | 90.0 | 1461 |
| | C | 94.7 | 77.9 | 93.9 | 96.0 | 91.9 | 95.8 | 91.7 | 96.4 | 91.8 | 1629 |
| Phoneme | Trained with | TN(A) | TP(A) | Ov(A) | TN(B) | TP(B) | Ov(B) | TN(C) | TP(C) | Ov(C) | Neurons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /al/ | A | 96.9 | 87.8 | 96.3 | 93.9 | 68.1 | 92.2 | 92.6 | 20.7 | 89.9 | 1220 |
| | B | 96.6 | 80.8 | 95.5 | 98.2 | 89.3 | 97.6 | 91.7 | 10.4 | 88.6 | 1620 |
| | C | 93.6 | 81.8 | 92.8 | 94.9 | 90.1 | 94.6 | 91.2 | 70.0 | 90.4 | 1799 |
| /Oi/ | A | 97.2 | 88.9 | 96.8 | 95.1 | 67.7 | 94.1 | 88.4 | 0.3 | 83.9 | 969 |
| | B | 96.1 | 76.4 | 95.3 | 98.1 | 84.1 | 97.5 | 89.5 | 0.3 | 84.9 | 1242 |
| | C | 93.6 | 76.8 | 92.9 | 94.3 | 83.6 | 93.9 | 89.6 | 66.1 | 88.4 | 1486 |
| /OU/ | A | 96.2 | 88.8 | 95.8 | 92.9 | 58.1 | 90.9 | 83.2 | 17.7 | 80.1 | 1304 |
| | B | 95.1 | 87.1 | 94.6 | 97.1 | 94.8 | 97.0 | 82.3 | 11.6 | 79.0 | 1727 |
| | C | 92.1 | 85.8 | 91.8 | 93.0 | 95.2 | 93.1 | 87.3 | 71.6 | 86.6 | 1986 |
| /aU/ | A | 97.3 | 86.8 | 97.0 | 95.7 | 66.4 | 94.3 | 94.7 | 1.7 | 90.2 | 865 |
| | B | 96.1 | 87.4 | 95.7 | 97.6 | 93.5 | 97.4 | 93.7 | 4.4 | 89.3 | 1169 |
| | C | 90.1 | 87.6 | 89.9 | 91.1 | 94.4 | 91.3 | 83.2 | 93.6 | 83.7 | 1279 |
| /i@/ | A | 97.5 | 72.2 | 96.6 | 95.4 | 51.7 | 93.8 | 92.1 | 3.3 | 87.9 | 885 |
| | B | 97.4 | 69.5 | 96.4 | 98.6 | 76.7 | 97.8 | 93.1 | 3.0 | 88.9 | 1127 |
| | C | 93.1 | 72.4 | 92.4 | 93.6 | 80.0 | 93.1 | 86.9 | 77.5 | 86.5 | 1260 |
| /U@/ | A | 97.3 | 92.9 | 97.2 | 96.1 | 75.2 | 95.6 | 95.1 | 4.4 | 92.5 | 701 |
| | B | 97.0 | 91.2 | 96.9 | 98.7 | 92.7 | 98.6 | 95.3 | 8.4 | 92.8 | 895 |
| | C | 93.5 | 92.0 | 93.5 | 93.7 | 93.6 | 93.7 | 91.1 | 73.9 | 93.6 | 1051 |
| /e@/ | A | 97.7 | 75.5 | 97.1 | 96.2 | 47.2 | 94.9 | 94.3 | 14.9 | 92.1 | 810 |
| | B | 96.3 | 73.2 | 95.7 | 98.4 | 88.2 | 98.1 | 94.3 | 26.2 | 92.3 | 1020 |
| | C | 93.4 | 74.7 | 92.9 | 94.4 | 88.2 | 94.2 | 90.3 | 87.1 | 90.3 | 1162 |
Table F.12: Percent true negative, true positive and overall accuracies (to 1 d.p.) of EFuNN optimised with offline aggregation for the phoneme classification problem.
| Phoneme | Trained with | TN(A) | TP(A) | Ov(A) | TN(B) | TP(B) | Ov(B) | TN(C) | TP(C) | Ov(C) | Neurons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /p/ | A | 48.1 | 89.7 | 48.6 | 46.5 | 91.2 | 44.0 | 52.0 | 94.9 | 52.4 | 3 |
| | B | 42.4 | 96.6 | 43.0 | 40.1 | 94.7 | 40.7 | 46.4 | 100.0 | 46.9 | 3 |
| | C | 57.0 | 92.2 | 57.4 | 56.2 | 89.5 | 56.6 | 60.8 | 94.9 | 61.1 | 3 |
| /b/ | A | 23.4 | 94.1 | 23.5 | 21.3 | 100.0 | 21.5 | 28.9 | 100.0 | 29.1 | 3 |
| | B | 35.3 | 82.4 | 35.3 | 33.6 | 100.0 | 33.7 | 40.7 | 100.0 | 40.9 | 3 |
| | C | 57.6 | 70.6 | 57.6 | 57.1 | 66.7 | 57.1 | 61.0 | 100.0 | 61.1 | 3 |
| /t/ | A | 56.3 | 100.0 | 56.9 | 55.7 | 100.0 | 56.3 | 59.8 | 100.0 | 60.1 | 3 |
| | B | 57.7 | 100.0 | 58.3 | 57.5 | 100.0 | 58.1 | 60.2 | 100.0 | 60.5 | 3 |
| | C | 74.6 | 96.4 | 74.9 | 74.7 | 100.0 | 75.1 | 76.5 | 100.0 | 76.7 | 3 |
| /d/ | A | 44.5 | 92.6 | 44.6 | 42.5 | 92.3 | 42.6 | 48.0 | 100.0 | 48.2 | 3 |
| | B | 38.5 | 92.6 | 38.7 | 36.2 | 92.3 | 36.3 | 41.7 | 100.0 | 41.9 | 3 |
| | C | 61.1 | 85.2 | 61.2 | 60.5 | 92.3 | 60.6 | 62.8 | 100.0 | 62.9 | 3 |
| /k/ | A | 45.9 | 100.0 | 46.4 | 44.0 | 100.0 | 44.6 | 49.7 | 100.0 | 50.0 | 3 |
| | B | 46.0 | 100.0 | 46.6 | 44.2 | 100.0 | 44.8 | 49.5 | 100.0 | 49.8 | 3 |
| | C | 58.1 | 100.0 | 58.6 | 57.5 | 98.1 | 57.9 | 61.2 | 100.0 | 61.4 | 3 |
| /g/ | A | 45.4 | 100.0 | 45.5 | 43.5 | 100.0 | 43.7 | 49.4 | 100.0 | 49.7 | 3 |
| | B | 42.6 | 100.0 | 42.7 | 40.5 | 100.0 | 40.7 | 46.3 | 100.0 | 46.6 | 3 |
| | C | 59.4 | 100.0 | 59.5 | 58.9 | 92.3 | 59.0 | 62.0 | 100.0 | 62.2 | 3 |
| /f/ | A | 52.7 | 100.0 | 54.0 | 51.6 | 100.0 | 53.0 | 56.1 | 100.0 | 57.2 | 3 |
| | B | 54.7 | 100.0 | 56.0 | 54.1 | 100.0 | 55.5 | 58.0 | 100.0 | 59.1 | 3 |
| | C | 66.5 | 100.0 | 67.5 | 66.1 | 100.0 | 67.1 | 67.6 | 100.0 | 68.3 | 3 |
| /v/ | A | 49.6 | 100.0 | 50.0 | 48.2 | 100.0 | 48.6 | 54.1 | 100.0 | 54.9 | 3 |
| | B | 49.6 | 100.0 | 49.9 | 48.2 | 100.0 | 48.6 | 53.5 | 100.0 | 54.3 | 3 |
| | C | 63.5 | 100.0 | 63.8 | 63.2 | 97.1 | 63.5 | 65.9 | 100.0 | 66.4 | 3 |
| /T/ | A | 51.6 | 100.0 | 52.8 | 50.4 | 100.0 | 51.6 | 55.3 | 100.0 | 56.4 | 3 |
| | B | 56.4 | 100.0 | 57.5 | 56.1 | 100.0 | 57.2 | 59.4 | 100.0 | 60.4 | 3 |
| | C | 67.6 | 100.0 | 68.5 | 67.6 | 100.0 | 68.4 | 68.7 | 100.0 | 69.5 | 3 |
| Phoneme | Trained with | TN(A) | TP(A) | Ov(A) | TN(B) | TP(B) | Ov(B) | TN(C) | TP(C) | Ov(C) | Neurons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /D/ | A | 49.0 | 100.0 | 49.4 | 47.2 | 100.0 | 47.6 | 52.7 | 100.0 | 53.5 | 3 |
| | B | 53.6 | 100.0 | 54.0 | 53.2 | 100.0 | 53.6 | 57.5 | 100.0 | 58.1 | 3 |
| | C | 66.7 | 98.7 | 66.9 | 66.7 | 100.0 | 67.0 | 69.1 | 100.0 | 69.6 | 3 |
| /s/ | A | 54.6 | 100.0 | 56.2 | 53.6 | 100.0 | 55.2 | 56.5 | 100.0 | 57.3 | 3 |
| | B | 64.9 | 100.0 | 66.2 | 64.0 | 100.0 | 65.3 | 65.7 | 100.0 | 66.3 | 3 |
| | C | 75.1 | 99.4 | 76.0 | 75.3 | 99.4 | 76.1 | 76.4 | 100.0 | 76.8 | 3 |
| /z/ | A | 46.6 | 100.0 | 47.6 | 44.7 | 100.0 | 45.8 | 49.6 | 100.0 | 50.7 | 3 |
| | B | 54.4 | 100.0 | 55.3 | 53.4 | 100.0 | 54.7 | 57.9 | 100.0 | 58.9 | 3 |
| | C | 70.6 | 97.9 | 71.1 | 70.6 | 97.9 | 71.1 | 71.8 | 94.9 | 72.4 | 3 |
| /S/ | A | 47.9 | 100.0 | 49.4 | 46.3 | 100.0 | 47.9 | 51.5 | 100.0 | 52.9 | 3 |
| | B | 72.1 | 100.0 | 72.9 | 71.9 | 100.0 | 72.7 | 74.4 | 100.0 | 75.1 | 3 |
| | C | 80.3 | 100.0 | 80.9 | 80.1 | 100.0 | 80.7 | 83.8 | 100.0 | 84.3 | 3 |
| /Z/ | A | 51.2 | 100.0 | 51.5 | 49.9 | 100.0 | 50.3 | 54.6 | 75.6 | 54.8 | 3 |
| | B | 64.1 | 100.0 | 64.4 | 63.4 | 100.0 | 63.7 | 66.2 | 75.6 | 66.4 | 3 |
| | C | 77.3 | 87.5 | 77.4 | 77.5 | 92.3 | 77.6 | 80.4 | 335.4 | 79.9 | 3 |
| /h/ | A | 49.4 | 100.0 | 49.8 | 47.8 | 100.0 | 48.2 | 53.6 | 100.0 | 54.4 | 3 |
| | B | 50.0 | 100.0 | 50.4 | 48.8 | 100.0 | 49.2 | 54.2 | 100.0 | 55.0 | 3 |
| | C | 69.6 | 97.6 | 69.8 | 69.5 | 97.4 | 69.7 | 72.5 | 98.4 | 73.0 | 3 |
| /ch/ | A | 50.5 | 100.0 | 51.6 | 49.0 | 100.0 | 50.2 | 53.6 | 86.7 | 54.1 | 3 |
| | B | 65.2 | 100.0 | 66.0 | 64.5 | 100.0 | 65.3 | 66.9 | 86.7 | 67.2 | 3 |
| | C | 77.8 | 100.0 | 78.3 | 77.8 | 100.0 | 78.3 | 80.4 | 86.7 | 80.5 | 3 |
| /dj/ | A | 51.0 | 94.7 | 51.1 | 49.9 | 100.0 | 50.1 | 55.1 | 95.9 | 55.6 | 3 |
| | B | 58.3 | 94.7 | 58.4 | 57.7 | 100.0 | 57.9 | 61.9 | 67.3 | 61.9 | 3 |
| | C | 72.8 | 94.7 | 72.9 | 72.7 | 100.0 | 72.8 | 72.3 | 10.2 | 74.4 | 3 |
| /m/ | A | 41.0 | 100.0 | 41.7 | 38.3 | 100.0 | 39.0 | 45.5 | 100.0 | 46.0 | 3 |
| | B | 43.7 | 100.0 | 44.4 | 41.5 | 100.0 | 42.2 | 48.6 | 100.0 | 49.1 | 3 |
| | C | 59.3 | 92.7 | 59.7 | 58.5 | 95.0 | 59.0 | 61.6 | 100.0 | 62.0 | 3 |
| Phoneme | Trained with | TN(A) | TP(A) | Ov(A) | TN(B) | TP(B) | Ov(B) | TN(C) | TP(C) | Ov(C) | Neurons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /n/ | A | 46.4 | 100.0 | 47.4 | 44.6 | 100.0 | 45.6 | 50.4 | 100.0 | 51.1 | 3 |
| | B | 51.6 | 100.0 | 52.5 | 50.5 | 100.0 | 51.4 | 55.4 | 96.7 | 55.9 | 3 |
| | C | 61.5 | 94.5 | 62.1 | 60.7 | 92.2 | 61.3 | 62.9 | 85.9 | 63.2 | 3 |
| /N/ | A | 39.7 | 100.0 | 40.2 | 37.2 | 100.0 | 37.8 | 44.6 | 100.0 | 45.1 | 3 |
| | B | 48.3 | 97.9 | 48.7 | 46.3 | 100.0 | 46.8 | 52.1 | 100.0 | 52.5 | 3 |
| | C | 57.0 | 92.6 | 57.3 | 56.4 | 97.8 | 56.8 | 60.7 | 100.0 | 61.0 | 3 |
| /l/ | A | 36.1 | 100.0 | 37.0 | 33.5 | 100.0 | 34.4 | 41.3 | 85.6 | 42.2 | 3 |
| | B | 43.7 | 100.0 | 44.5 | 41.8 | 100.0 | 42.5 | 47.6 | 81.7 | 48.4 | 3 |
| | C | 53.7 | 94.2 | 54.3 | 52.5 | 92.4 | 53.0 | 57.2 | 74.5 | 57.6 | 3 |
| /r/ | A | 33.8 | 57.1 | 34.1 | 31.1 | 52.8 | 31.3 | 39.3 | 47.7 | 39.3 | 3 |
| | B | 36.3 | 70.5 | 36.7 | 32.3 | 77.4 | 32.8 | 38.3 | 50.8 | 38.5 | 3 |
| | C | 62.2 | 72.4 | 62.3 | 61.6 | 84.9 | 61.8 | 53.0 | 73.8 | 53.2 | 3 |
| /w/ | A | 20.5 | 96.1 | 21.3 | 18.1 | 95.8 | 18.8 | 24.9 | 87.0 | 25.3 | 3 |
| | B | 32.3 | 92.2 | 32.9 | 29.4 | 91.7 | 30.0 | 36.3 | 66.7 | 36.5 | 3 |
| | C | 54.3 | 87.4 | 54.7 | 52.8 | 85.4 | 53.1 | 50.5 | 98.1 | 50.9 | 3 |
| /ie/ | A | 14.1 | 91.6 | 14.9 | 12.9 | 92.5 | 13.7 | 19.9 | 97.1 | 20.7 | 3 |
| | B | 24.9 | 77.6 | 25.4 | 22.8 | 79.2 | 23.4 | 31.3 | 82.6 | 31.8 | 3 |
| | C | 35.4 | 63.6 | 35.7 | 34.2 | 62.3 | 34.5 | 35.6 | 65.2 | 35.8 | 3 |
| /I/ | A | 93.4 | 49.4 | 92.6 | 93.9 | 56.5 | 93.3 | 95.2 | 9.6 | 93.8 | 3 |
| | B | 27.1 | 76.4 | 27.9 | 27.2 | 78.8 | 28.1 | 30.7 | 43.5 | 30.9 | 3 |
| | C | 94.5 | 37.4 | 93.5 | 93.9 | 47.1 | 93.1 | 87.7 | 93.0 | 87.8 | 3 |
| /e/ | A | 35.7 | 96.7 | 37.2 | 35.2 | 91.9 | 36.6 | 43.5 | 96.7 | 45.2 | 3 |
| | B | 91.3 | 76.7 | 91.0 | 92.2 | 66.9 | 91.5 | 94.1 | 42.5 | 92.5 | 3 |
| | C | 86.2 | 91.3 | 86.3 | 86.4 | 89.5 | 86.5 | 86.8 | 79.4 | 86.6 | 3 |
| /&/ | A | 34.2 | 95.1 | 35.9 | 33.7 | 97.1 | 35.4 | 39.3 | 98.9 | 40.9 | 3 |
| | B | 43.2 | 91.2 | 44.5 | 43.6 | 94.9 | 45.0 | 48.1 | 98.4 | 49.5 | 3 |
| | C | 91.2 | 65.9 | 90.5 | 92.3 | 58.0 | 91.4 | 92.7 | 92.6 | 92.7 | 3 |
| Phoneme | Trained with | TN(A) | TP(A) | Ov(A) | TN(B) | TP(B) | Ov(B) | TN(C) | TP(C) | Ov(C) | Neurons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /V/ | A | 98.2 | 30.1 | 97.1 | 98.7 | 25.0 | 97.5 | 98.0 | 29.2 | 97.1 | 3 |
| | B | 38.1 | 92.6 | 38.9 | 36.1 | 95.0 | 37.1 | 44.2 | 100.0 | 44.9 | 3 |
| | C | 93.4 | 68.7 | 93.0 | 93.5 | 58.8 | 93.0 | 86.8 | 75.0 | 86.7 | 3 |
| /A/ | A | 25.3 | 88.4 | 26.4 | 21.9 | 83.7 | 23.0 | 25.3 | 44.2 | 25.6 | 3 |
| | B | 47.3 | 98.3 | 48.2 | 46.3 | 93.0 | 47.1 | 44.2 | 81.7 | 44.9 | 3 |
| | C | 92.2 | 65.3 | 91.7 | 93.1 | 62.8 | 92.6 | 83.6 | 97.5 | 83.8 | 3 |
| /U/ | A | 32.1 | 83.8 | 32.6 | 30.0 | 89.6 | 30.6 | 26.6 | 100.0 | 27.4 | 3 |
| | B | 27.5 | 76.8 | 28.0 | 24.9 | 77.1 | 25.4 | 25.2 | 98.8 | 26.0 | 3 |
| | C | 95.5 | 49.5 | 95.1 | 94.9 | 39.6 | 94.4 | 89.3 | 79.0 | 89.2 | 3 |
| /i/ | A | 32.2 | 89.2 | 34.0 | 31.8 | 84.9 | 33.5 | 43.1 | 98.0 | 45.8 | 3 |
| | B | 87.4 | 23.1 | 85.4 | 88.4 | 23.9 | 86.4 | 91.5 | 10.5 | 87.5 | 3 |
| | C | 75.2 | 58.5 | 74.7 | 75.8 | 54.7 | 75.2 | 81.3 | 65.2 | 80.5 | 3 |
| /a/ | A | 42.9 | 95.7 | 45.7 | 41.7 | 95.8 | 44.5 | 49.9 | 98.4 | 52.9 | 3 |
| | B | 50.1 | 97.2 | 52.6 | 48.2 | 98.5 | 50.8 | 56.4 | 99.1 | 59.1 | 3 |
| | C | 91.8 | 86.8 | 91.5 | 90.7 | 83.2 | 90.3 | 91.5 | 98.7 | 91.9 | 3 |
| /O/ | A | 25.4 | 96.7 | 29.0 | 22.8 | 96.4 | 26.5 | 23.9 | 98.5 | 27.6 | 3 |
| | B | 58.6 | 88.2 | 60.1 | 57.3 | 86.1 | 58.8 | 46.5 | 97.4 | 48.9 | 3 |
| | C | 94.9 | 57.2 | 93.0 | 93.8 | 54.6 | 91.8 | 90.6 | 71.5 | 89.7 | 3 |
| /3/ | A | 21.5 | 90.3 | 24.8 | 20.0 | 90.2 | 23.3 | 26.7 | 77.5 | 29.0 | 3 |
| | B | 87.9 | 62.9 | 86.7 | 88.7 | 62.0 | 87.5 | 88.1 | 77.8 | 87.7 | 3 |
| | C | 91.8 | 39.0 | 89.3 | 91.5 | 48.3 | 89.5 | 87.6 | 69.1 | 86.8 | 3 |
| /u/ | A | 98.3 | 1.2 | 94.5 | 97.8 | 1.0 | 94.0 | 98.9 | 0.0 | 94.3 | 3 |
| | B | 41.1 | 90.8 | 43.1 | 42.1 | 87.2 | 43.9 | 45.5 | 65.8 | 46.4 | 3 |
| | C | 83.4 | 54.1 | 82.3 | 83.8 | 62.1 | 83.0 | 77.8 | 61.5 | 77.1 | 3 |
| /el/ | A | 99.0 | 1.3 | 94.4 | 98.5 | 1.3 | 93.8 | 99.7 | 0.6 | 97.4 | 3 |
| | B | 96.9 | 2.5 | 92.5 | 96.3 | 3.4 | 91.9 | 97.5 | 10.3 | 95.4 | 3 |
| | C | 86.2 | 27.1 | 83.4 | 87.6 | 26.4 | 84.7 | 90.7 | 46.1 | 89.7 | 3 |
| Phoneme | Trained with | TN(A) | TP(A) | Ov(A) | TN(B) | TP(B) | Ov(B) | TN(C) | TP(C) | Ov(C) | Neurons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /al/ | A | 40.1 | 94.6 | 43.8 | 39.2 | 92.8 | 42.8 | 43.4 | 69.6 | 44.4 | 3 |
| | B | 87.8 | 61.5 | 86.0 | 87.9 | 65.7 | 86.4 | 84.5 | 27.0 | 82.3 | 3 |
| | C | 59.0 | 83.7 | 60.5 | 59.4 | 86.6 | 61.2 | 63.8 | 41.5 | 62.9 | 3 |
| /Oi/ | A | 18.5 | 89.2 | 21.4 | 16.6 | 93.3 | 19.6 | 19.6 | 52.5 | 21.3 | 3 |
| | B | 88.9 | 57.9 | 87.7 | 88.5 | 52.3 | 87.1 | 75.6 | 2.8 | 71.9 | 3 |
| | C | 37.7 | 77.1 | 39.3 | 35.8 | 86.7 | 37.8 | 38.0 | 34.2 | 37.8 | 3 |
| /OU/ | A | 19.7 | 72.4 | 22.8 | 17.9 | 79.2 | 21.4 | 25.6 | 68.6 | 27.6 | 3 |
| | B | 34.4 | 78.3 | 37.0 | 33.3 | 83.7 | 36.3 | 33.4 | 44.2 | 33.9 | 3 |
| | C | 83.1 | 39.5 | 80.6 | 82.3 | 34.3 | 79.5 | 75.6 | 59.1 | 74.8 | 3 |
| /aU/ | A | 20.5 | 87.0 | 23.6 | 20.1 | 84.5 | 23.1 | 30.4 | 32.6 | 30.5 | 3 |
| | B | 34.9 | 85.3 | 37.2 | 34.3 | 87.1 | 36.8 | 39.4 | 23.4 | 38.7 | 3 |
| | C | 68.3 | 67.8 | 68.3 | 67.8 | 70.7 | 68.0 | 66.9 | 80.2 | 67.5 | 3 |
| /i@/ | A | 21.0 | 97.8 | 23.8 | 19.7 | 98.3 | 22.6 | 25.6 | 41.9 | 26.4 | 3 |
| | B | 35.5 | 94.3 | 37.6 | 35.7 | 95.0 | 37.8 | 37.7 | 13.4 | 36.6 | 3 |
| | C | 31.4 | 73.8 | 32.9 | 30.0 | 78.9 | 31.8 | 40.1 | 33.7 | 39.8 | 3 |
| /U@/ | A | 31.9 | 63.3 | 32.6 | 29.4 | 67.0 | 30.2 | 37.9 | 62.6 | 38.6 | 3 |
| | B | 23.0 | 84.1 | 24.3 | 20.5 | 92.7 | 22.1 | 28.1 | 74.9 | 29.4 | 3 |
| | C | 40.9 | 76.5 | 41.7 | 39.0 | 82.6 | 39.9 | 45.8 | 40.4 | 45.7 | 3 |
| /e@/ | A | 20.9 | 93.0 | 22.7 | 19.3 | 93.7 | 21.2 | 27.3 | 80.2 | 28.8 | 3 |
| | B | 17.1 | 97.3 | 19.1 | 16.1 | 96.9 | 18.2 | 24.5 | 89.1 | 26.4 | 3 |
| | C | 74.5 | 51.0 | 73.9 | 73.9 | 52.8 | 73.4 | 69.9 | 69.3 | 69.8 | 3 |
Table F.13: Percent true negative, true positive and overall accuracies (to 1 d.p.) of SECoS optimised with offline aggregation for the phoneme classification problem.
| Phoneme | Trained with | TN(A) | TP(A) | Ov(A) | TN(B) | TP(B) | Ov(B) | TN(C) | TP(C) | Ov(C) | Neurons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /p/ | A | 99.7 | 48.3 | 99.1 | 99.7 | 35.1 | 99.0 | 99.4 | 16.9 | 98.7 | 168 |
| | B | 99.9 | 29.3 | 99.1 | 100.0 | 26.3 | 99.1 | 99.6 | 23.7 | 99.0 | 200 |
| | C | 94.9 | 38.8 | 94.3 | 95.4 | 28.1 | 94.7 | 93.5 | 33.9 | 93.0 | 257 |
| /b/ | A | 99.2 | 52.9 | 99.1 | 99.4 | 22.2 | 99.2 | 99.4 | 0.0 | 99.1 | 101 |
| | B | 99.0 | 35.3 | 99.5 | 99.5 | 55.6 | 99.4 | 99.7 | 4.3 | 99.4 | 133 |
| | C | 98.9 | 11.8 | 98.8 | 99.0 | 22.2 | 98.8 | 100.0 | 55.2 | 99.8 | 178 |
| /t/ | A | 99.7 | 35.0 | 98.8 | 99.6 | 29.9 | 98.6 | 99.4 | 6.5 | 98.8 | 158 |
| | B | 100.0 | 30.7 | 99.0 | 100.0 | 41.8 | 99.2 | 99.7 | 13.0 | 99.1 | 192 |
| | C | 99.8 | 15.0 | 98.6 | 99.9 | 25.4 | 98.8 | 100.0 | 26.1 | 99.5 | 218 |
| /d/ | A | 98.9 | 55.6 | 98.8 | 98.9 | 23.1 | 98.7 | 99.4 | 13.0 | 99.1 | 124 |
| | B | 99.5 | 33.3 | 99.3 | 99.7 | 46.2 | 99.6 | 99.9 | 4.3 | 99.6 | 145 |
| | C | 94.3 | 29.6 | 94.1 | 95.1 | 61.5 | 95.0 | 96.2 | 30.4 | 95.5 | 218 |
| /k/ | A | 99.4 | 58.7 | 98.9 | 99.2 | 46.2 | 98.6 | 99.6 | 52.4 | 99.3 | 128 |
| | B | 99.9 | 47.7 | 99.3 | 99.9 | 61.5 | 99.5 | 99.7 | 57.1 | 99.4 | 155 |
| | C | 99.8 | 41.3 | 99.1 | 99.9 | 44.2 | 99.3 | 99.9 | 73.8 | 99.7 | 187 |
| /g/ | A | 99.6 | 53.6 | 99.5 | 99.3 | 15.4 | 99.1 | 99.9 | 0.0 | 99.4 | 108 |
| | B | 99.9 | 42.9 | 99.7 | 99.9 | 38.5 | 99.8 | 100.0 | 0.0 | 99.4 | 123 |
| | C | 99.6 | 21.4 | 99.3 | 99.2 | 61.5 | 99.1 | 99.9 | 5.6 | 99.4 | 169 |
| /f/ | A | 100.0 | 23.7 | 97.7 | 99.9 | 18.4 | 97.5 | 99.9 | 20.8 | 97.9 | 174 |
| | B | 99.9 | 21.7 | 97.6 | 100.0 | 22.4 | 97.7 | 99.9 | 13.3 | 97.7 | 215 |
| | C | 99.7 | 23.4 | 97.4 | 99.7 | 32.0 | 97.7 | 99.9 | 12.7 | 97.8 | 248 |
| /v/ | A | 98.9 | 19.2 | 98.3 | 99.2 | 5.7 | 98.5 | 99.0 | 5.9 | 97.5 | 226 |
| | B | 99.7 | 8.2 | 99.0 | 99.9 | 14.3 | 99.3 | 99.5 | 8.5 | 98.0 | 262 |
| | C | 99.4 | 4.1 | 98.7 | 99.4 | 8.6 | 98.8 | 99.9 | 0.8 | 98.3 | 358 |
| /T/ | A | 99.9 | 39.5 | 98.4 | 99.8 | 33.1 | 98.1 | 99.6 | 12.8 | 97.5 | 164 |
| | B | 99.8 | 10.2 | 97.6 | 99.8 | 12.1 | 97.6 | 99.9 | 0.6 | 97.5 | 180 |
| | C | 99.7 | 16.0 | 97.6 | 99.7 | 20.2 | 97.7 | 99.9 | 32.6 | 98.2 | 237 |
| Phoneme | Trained with | TN(A) | TP(A) | Ov(A) | TN(B) | TP(B) | Ov(B) | TN(C) | TP(C) | Ov(C) | Neurons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /D/ | A | 99.4 | 22.8 | 98.8 | 99.1 | 13.2 | 98.4 | 98.3 | 14.8 | 96.9 | 196 |
| | B | 99.5 | 16.5 | 98.8 | 99.6 | 7.9 | 98.8 | 99.2 | 28.7 | 98.0 | 247 |
| | C | 99.6 | 8.9 | 98.9 | 99.4 | 7.9 | 98.7 | 99.9 | 23.5 | 98.7 | 317 |
| /s/ | A | 99.8 | 33.7 | 96.5 | 98.8 | 22.5 | 96.1 | 98.7 | 5.2 | 97.2 | 198 |
| | B | 99.1 | 15.7 | 96.1 | 99.4 | 14.0 | 96.3 | 97.8 | 2.6 | 96.3 | 255 |
| | C | 98.7 | 21.0 | 96.0 | 98.9 | 20.8 | 96.1 | 99.1 | 40.0 | 98.2 | 296 |
| /z/ | A | 100.0 | 54.4 | 99.1 | 100.0 | 47.4 | 99.0 | 99.7 | 64.1 | 98.6 | 159 |
| | B | 99.9 | 52.3 | 98.9 | 99.8 | 52.6 | 98.9 | 98.9 | 67.3 | 98.2 | 177 |
| | C | 99.9 | 45.1 | 98.9 | 99.9 | 50.5 | 98.9 | 99.8 | 51.9 | 98.7 | 223 |
| /S/ | A | 99.7 | 26.9 | 97.6 | 99.6 | 18.9 | 97.2 | 99.3 | 1.0 | 96.5 | 219 |
| | B | 99.9 | 17.2 | 97.5 | 100.0 | 13.2 | 97.5 | 99.8 | 11.6 | 97.3 | 244 |
| | C | 99.7 | 11.8 | 97.1 | 99.7 | 19.4 | 97.4 | 99.9 | 11.6 | 97.4 | 297 |
| /Z/ | A | 99.6 | 26.3 | 99.0 | 99.4 | 20.5 | 98.8 | 99.4 | 19.5 | 98.5 | 159 |
| | B | 99.7 | 16.3 | 99.1 | 99.8 | 12.8 | 99.1 | 99.7 | 1.2 | 98.5 | 193 |
| | C | 99.7 | 30.0 | 99.1 | 99.7 | 20.5 | 99.1 | 99.7 | 41.5 | 99.0 | 232 |
| /h/ | A | 99.4 | 47.0 | 99.0 | 99.3 | 50.0 | 98.9 | 99.6 | 23.6 | 98.3 | 131 |
| | B | 99.7 | 45.8 | 99.3 | 99.9 | 73.7 | 99.7 | 99.5 | 19.5 | 98.1 | 149 |
| | C | 99.0 | 30.1 | 98.4 | 99.0 | 57.9 | 98.7 | 99.8 | 48.8 | 98.9 | 177 |
| /ch/ | A | 99.8 | 13.7 | 97.9 | 99.8 | 18.5 | 98.0 | 99.7 | 6.2 | 98.2 | 190 |
| | B | 99.6 | 10.6 | 97.6 | 99.8 | 14.8 | 97.9 | 99.3 | 15.0 | 98.0 | 207 |
| | C | 96.6 | 6.6 | 94.6 | 96.7 | 13.9 | 94.9 | 93.7 | 30.1 | 92.7 | 254 |
| /dj/ | A | 99.5 | 28.9 | 99.2 | 99.4 | 22.2 | 99.2 | 98.3 | 2.0 | 97.0 | 135 |
| | B | 99.6 | 13.2 | 99.3 | 99.7 | 22.2 | 99.4 | 98.3 | 2.0 | 96.9 | 147 |
| | C | 99.1 | 2.6 | 98.7 | 99.0 | 33.3 | 98.7 | 99.5 | 0.0 | 98.1 | 262 |
| /m/ | A | 99.5 | 13.0 | 98.5 | 99.6 | 8.3 | 98.5 | 98.9 | 4.4 | 97.9 | 308 |
| | B | 99.8 | 4.1 | 98.7 | 99.9 | 5.0 | 98.8 | 99.3 | 4.4 | 98.4 | 375 |
| | C | 99.0 | 13.8 | 98.0 | 99.0 | 6.7 | 98.9 | 96.3 | 8.8 | 95.4 | 495 |
| Phoneme | Trained with | TN(A) | TP(A) | Ov(A) | TN(B) | TP(B) | Ov(B) | TN(C) | TP(C) | Ov(C) | Neurons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /n/ | A | 99.8 | 10.4 | 98.2 | 99.7 | 4.4 | 98.0 | 99.0 | 3.2 | 97.7 | 353 |
| | B | 99.6 | 8.2 | 97.9 | 99.8 | 13.3 | 98.2 | 98.6 | 6.5 | 97.4 | 401 |
| | C | 99.7 | 4.4 | 98.0 | 99.8 | 4.4 | 98.1 | 99.9 | 0.0 | 98.6 | 568 |
| /N/ | A | 99.7 | 8.4 | 98.8 | 99.5 | 2.2 | 98.6 | 98.9 | 0.0 | 98.0 | 270 |
| | B | 99.8 | 4.2 | 98.9 | 99.9 | 8.7 | 99.0 | 99.4 | 0.0 | 98.5 | 329 |
| | C | 99.6 | 3.2 | 98.7 | 99.6 | 2.2 | 98.7 | 99.9 | 6.3 | 99.1 | 422 |
| /l/ | A | 99.9 | 11.6 | 98.7 | 99.9 | 12.1 | 98.7 | 99.4 | 5.2 | 97.4 | 355 |
| | B | 99.8 | 8.7 | 98.6 | 99.9 | 7.6 | 98.6 | 99.0 | 5.9 | 97.0 | 419 |
| | C | 99.5 | 5.1 | 98.2 | 99.4 | 0.0 | 98.1 | 99.5 | 7.2 | 97.5 | 603 |
| /r/ | A | 99.7 | 34.3 | 99.9 | 99.4 | 30.2 | 98.7 | 99.2 | 18.5 | 98.5 | 240 |
| | B | 98.8 | 26.7 | 98.0 | 98.6 | 41.5 | 98.0 | 97.3 | 0.0 | 96.4 | 286 |
| | C | 99.3 | 21.9 | 98.5 | 99.3 | 39.6 | 98.7 | 99.4 | 23.1 | 98.7 | 373 |
| /w/ | A | 99.9 | 18.4 | 99.1 | 99.7 | 14.6 | 98.8 | 98.9 | 0.0 | 98.2 | 352 |
| | B | 99.8 | 6.8 | 98.8 | 99.8 | 6.3 | 98.9 | 99.4 | 7.4 | 98.7 | 404 |
| | C | 99.5 | 6.8 | 98.5 | 99.6 | 12.5 | 98.7 | 99.7 | 0.0 | 98.9 | 543 |
| /ie/ | A | 99.7 | 1.9 | 98.7 | 99.7 | 1.9 | 98.6 | 99.2 | 17.4 | 98.4 | 497 |
| | B | 99.6 | 0.0 | 98.5 | 99.9 | 3.8 | 98.8 | 99.5 | 0.0 | 98.6 | 582 |
| | C | 99.6 | 3.7 | 98.6 | 99.6 | 3.8 | 98.5 | 99.5 | 20.3 | 98.7 | 718 |
| /I/ | A | 99.4 | 6.3 | 97.8 | 99.2 | 9.4 | 97.6 | 98.8 | 0.0 | 97.2 | 365 |
| | B | 99.6 | 8.0 | 98.1 | 99.8 | 7.1 | 98.2 | 99.9 | 0.0 | 98.3 | 446 |
| | C | 99.9 | 0.6 | 98.2 | 99.8 | 3.5 | 98.2 | 99.7 | 24.3 | 98.5 | 497 |
| /e/ | A | 99.8 | 4.7 | 97.5 | 99.8 | 2.4 | 97.4 | 99.5 | 5.1 | 96.7 | 313 |
| | B | 99.9 | 1.2 | 97.4 | 99.9 | 4.0 | 97.5 | 99.9 | 4.2 | 97.0 | 397 |
| | C | 99.9 | 1.2 | 97.5 | 99.9 | 3.2 | 97.5 | 99.9 | 5.6 | 97.1 | 456 |
| /&/ | A | 99.9 | 12.6 | 97.5 | 99.9 | 16.7 | 97.6 | 99.4 | 41.6 | 97.8 | 367 |
| | B | 99.9 | 14.3 | 97.5 | 99.9 | 16.7 | 97.6 | 99.4 | 40.0 | 97.8 | 438 |
| | C | 100.0 | 6.3 | 97.3 | 99.9 | 10.1 | 97.4 | 99.9 | 30.0 | 98.0 | 467 |
| Phoneme | Trained with | TN(A) | TP(A) | Ov(A) | TN(B) | TP(B) | Ov(B) | TN(C) | TP(C) | Ov(C) | Neurons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /V/ | A | 99.5 | 6.1 | 98.0 | 99.4 | 2.5 | 97.9 | 99.0 | 1.0 | 97.6 | 333 |
| | B | 99.9 | 4.9 | 98.3 | 99.9 | 5.0 | 98.4 | 99.4 | 2.1 | 98.1 | 411 |
| | C | 99.9 | 3.7 | 98.4 | 99.8 | 2.5 | 98.3 | 99.4 | 18.8 | 98.6 | 450 |
| /A/ | A | 100.0 | 19.1 | 98.6 | 99.9 | 12.8 | 98.4 | 99.6 | 3.3 | 98.0 | 266 |
| | B | 100.0 | 17.9 | 98.6 | 100.0 | 19.8 | 98.6 | 99.2 | 22.5 | 97.9 | 328 |
| | C | 99.6 | 12.1 | 98.1 | 99.6 | 20.9 | 98.2 | 98.5 | 51.7 | 97.7 | 360 |
| /U/ | A | 99.9 | 18.2 | 99.1 | 99.9 | 4.2 | 99.0 | 99.8 | 3.7 | 98.7 | 253 |
| | B | 99.8 | 15.2 | 98.9 | 99.9 | 14.6 | 99.1 | 99.7 | 3.7 | 98.6 | 312 |
| | C | 99.7 | 9.1 | 98.8 | 99.9 | 6.3 | 99.0 | 99.9 | 6.2 | 98.8 | 416 |
| /i/ | A | 99.9 | 3.1 | 96.8 | 99.9 | 3.8 | 96.8 | 99.7 | 5.7 | 95.1 | 622 |
| | B | 99.8 | 1.5 | 96.6 | 99.7 | 4.4 | 96.7 | 98.4 | 3.1 | 93.6 | 696 |
| | C | 99.7 | 3.4 | 96.6 | 99.6 | 13.2 | 96.9 | 98.6 | 10.0 | 94.2 | 806 |
| /a/ | A | 100.0 | 8.9 | 95.2 | 99.9 | 7.6 | 95.1 | 99.5 | 15.0 | 94.2 | 406 |
| | B | 99.9 | 15.2 | 95.5 | 100.0 | 14.1 | 95.4 | 99.6 | 10.5 | 94.0 | 471 |
| | C | 99.8 | 10.0 | 95.1 | 99.8 | 9.5 | 95.0 | 99.3 | 38.6 | 95.5 | 484 |
| /O/ | A | 100.0 | 8.9 | 95.3 | 100.0 | 5.2 | 95.2 | 99.6 | 3.8 | 95.0 | 481 |
| | B | 99.9 | 4.2 | 95.1 | 100.0 | 2.4 | 95.1 | 99.8 | 3.5 | 95.1 | 554 |
| | C | 99.5 | 7.5 | 94.8 | 99.5 | 5.2 | 94.8 | 99.0 | 4.9 | 94.4 | 720 |
| /3/ | A | 99.9 | 10.7 | 95.7 | 99.9 | 12.4 | 95.7 | 99.2 | 8.3 | 95.2 | 608 |
| | B | 100.0 | 7.0 | 95.5 | 100.0 | 8.1 | 95.6 | 99.1 | 11.6 | 95.3 | 697 |
| | C | 99.8 | 2.1 | 95.2 | 99.8 | 5.6 | 95.4 | 99.9 | 1.9 | 95.6 | 829 |
| /u/ | A | 99.5 | 6.7 | 95.9 | 99.4 | 5.1 | 95.7 | 98.9 | 0.0 | 94.3 | 514 |
| | B | 99.7 | 5.5 | 96.0 | 99.7 | 6.2 | 96.1 | 99.2 | 0.0 | 94.5 | 566 |
| | C | 99.4 | 9.5 | 95.9 | 99.7 | 6.7 | 96.1 | 99.7 | 6.4 | 95.3 | 677 |
| /el/ | A | 99.9 | 2.7 | 95.3 | 99.9 | 0.4 | 95.1 | 99.8 | 1.2 | 97.5 | 773 |
| | B | 99.9 | 2.1 | 95.3 | 99.9 | 3.0 | 95.3 | 98.7 | 1.2 | 96.4 | 860 |
| | C | 99.6 | 4.0 | 95.2 | 99.7 | 4.3 | 95.1 | 99.4 | 0.0 | 97.1 | 917 |
| Phoneme | Trained with | TN(A) | TP(A) | Ov(A) | TN(B) | TP(B) | Ov(B) | TN(C) | TP(C) | Ov(C) | Neurons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /al/ | A | 99.8 | 7.1 | 93.5 | 99.9 | 4.2 | 93.4 | 99.7 | 0.4 | 95.9 | 865 |
| | B | 100.0 | 2.3 | 93.4 | 100.0 | 0.3 | 93.2 | 99.9 | 1.9 | 96.1 | 926 |
| | C | 100.0 | 3.1 | 93.4 | 99.9 | 2.4 | 93.3 | 99.6 | 10.7 | 96.2 | 1017 |
| /Oi/ | A | 99.9 | 2.5 | 96.0 | 99.8 | 2.6 | 96.0 | 99.3 | 0.0 | 94.2 | 714 |
| | B | 99.8 | 1.2 | 95.9 | 99.9 | 2.6 | 96.0 | 99.4 | 0.0 | 94.3 | 762 |
| | C | 99.7 | 3.7 | 95.9 | 99.7 | 3.6 | 96.0 | 99.9 | 0.0 | 94.8 | 905 |
| /OU/ | A | 99.9 | 6.9 | 94.5 | 99.9 | 3.8 | 94.3 | 98.2 | 0.9 | 93.6 | 985 |
| | B | 99.9 | 5.1 | 94.4 | 99.9 | 7.3 | 94.5 | 97.2 | 0.0 | 92.7 | 1020 |
| | C | 99.5 | 1.0 | 93.8 | 99.7 | 2.1 | 94.0 | 99.9 | 0.0 | 95.3 | 1145 |
| /aU/ | A | 99.8 | 4.1 | 95.4 | 99.8 | 3.4 | 95.3 | 99.7 | 0.0 | 94.9 | 602 |
| | B | 100.0 | 2.3 | 95.5 | 100.0 | 4.7 | 95.5 | 99.9 | 0.0 | 95.0 | 621 |
| | C | 99.7 | 0.9 | 95.1 | 99.5 | 1.7 | 95.0 | 96.7 | 17.7 | 92.9 | 658 |
| /i@/ | A | 99.8 | 0.8 | 96.2 | 99.9 | 3.3 | 96.4 | 99.7 | 0.3 | 95.1 | 745 |
| | B | 99.9 | 1.4 | 96.3 | 99.9 | 2.8 | 96.4 | 99.8 | 0.0 | 95.1 | 779 |
| | C | 99.7 | 3.8 | 96.2 | 99.5 | 5.0 | 96.1 | 99.9 | 1.8 | 95.3 | 859 |
| /U@/ | A | 99.5 | 6.6 | 97.5 | 99.4 | 2.8 | 97.2 | 99.5 | 1.0 | 96.6 | 548 |
| | B | 99.8 | 5.3 | 97.7 | 99.9 | 9.2 | 97.9 | 99.4 | 0.0 | 96.5 | 564 |
| | C | 99.5 | 0.4 | 97.3 | 99.4 | 4.6 | 97.3 | 99.0 | 1.0 | 96.2 | 655 |
| /e@/ | A | 99.8 | 2.7 | 97.3 | 99.7 | 3.9 | 97.2 | 99.4 | 2.0 | 96.6 | 718 |
| | B | 99.8 | 3.1 | 97.4 | 99.9 | 7.9 | 97.5 | 99.6 | 1.0 | 96.8 | 732 |
| | C | 99.8 | 0.4 | 97.3 | 99.7 | 0.8 | 97.2 | 99.9 | 6.4 | 97.2 | 792 |
Table F.14: Percent true negative, true positive and overall accuracies (to 1 d.p.) of SECoS optimised with sleep learning for the phoneme classification problem.
| Phoneme | Trained with | TN(A) | TP(A) | Ov(A) | TN(B) | TP(B) | Ov(B) | TN(C) | TP(C) | Ov(C) | Neurons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /p/ | A | 94.0 | 100.0 | 94.1 | 94.3 | 84.2 | 94.2 | 92.2 | 72.9 | 92.1 | 166 |
| | B | 83.5 | 100.0 | 83.7 | 83.6 | 100.0 | 83.8 | 79.0 | 79.7 | 79.0 | 218 |
| | C | 45.4 | 99.1 | 46.0 | 47.0 | 98.2 | 47.6 | 43.1 | 100.0 | 43.5 | 298 |
| /b/ | A | 79.3 | 100.0 | 79.3 | 80.8 | 77.8 | 80.8 | 75.2 | 21.7 | 75.0 | 32 |
| | B | 86.3 | 100.0 | 86.4 | 87.4 | 100.0 | 87.4 | 86.4 | 8.7 | 86.2 | 47 |
| | C | 60.4 | 100.0 | 60.5 | 61.9 | 100.0 | 62.0 | 58.5 | 100.0 | 58.6 | 76 |
| /t/ | A | 92.3 | 100.0 | 92.4 | 92.3 | 92.5 | 92.3 | 87.5 | 41.3 | 87.1 | 202 |
| | B | 89.3 | 100.0 | 89.5 | 89.1 | 100.0 | 89.3 | 85.5 | 71.7 | 85.4 | 287 |
| | C | 60.6 | 100.0 | 61.2 | 59.7 | 100.0 | 60.3 | 54.9 | 100.0 | 55.0 | 323 |
| /d/ | A | 92.1 | 100.0 | 92.1 | 91.3 | 53.8 | 91.2 | 93.3 | 43.5 | 93.1 | 58 |
| | B | 81.5 | 96.3 | 81.5 | 81.7 | 100.0 | 81.8 | 80.5 | 21.7 | 80.3 | 75 |
| | C | 49.7 | 100.0 | 49.8 | 49.8 | 100.0 | 49.9 | 42.3 | 100.0 | 42.5 | 105 |
| /k/ | A | 90.4 | 100.0 | 90.5 | 89.7 | 90.4 | 89.7 | 91.7 | 88.1 | 91.7 | 155 |
| | B | 93.9 | 100.0 | 94.0 | 94.1 | 100.0 | 94.2 | 95.9 | 73.8 | 95.8 | 212 |
| | C | 83.4 | 100.0 | 83.6 | 83.5 | 100.0 | 83.7 | 88.7 | 100.0 | 88.8 | 255 |
| /g/ | A | 93.3 | 100.0 | 93.4 | 93.0 | 69.2 | 92.9 | 93.8 | 5.6 | 93.3 | 53 |
| | B | 91.3 | 96.4 | 91.4 | 92.0 | 100.0 | 92.0 | 93.8 | 5.6 | 93.4 | 62 |
| | C | 64.8 | 100.0 | 64.9 | 64.7 | 100.0 | 64.8 | 67.0 | 100.0 | 67.2 | 119 |
| /f/ | A | 96.7 | 100.0 | 96.8 | 96.3 | 93.2 | 96.2 | 95.5 | 64.7 | 94.7 | 380 |
| | B | 97.4 | 100.0 | 97.4 | 98.3 | 100.0 | 98.4 | 97.5 | 60.1 | 96.6 | 525 |
| | C | 87.9 | 100.0 | 88.2 | 87.8 | 100.0 | 88.2 | 89.4 | 97.1 | 89.6 | 638 |
| /v/ | A | 93.2 | 100.0 | 93.2 | 93.4 | 77.1 | 93.2 | 91.9 | 34.7 | 90.9 | 129 |
| | B | 91.7 | 100.0 | 91.8 | 92.4 | 100.0 | 92.4 | 91.6 | 39.8 | 90.7 | 186 |
| | C | 59.2 | 100.0 | 59.5 | 59.8 | 100.0 | 60.0 | 65.5 | 87.3 | 65.8 | 234 |
| /T/ | A | 96.6 | 100.0 | 96.7 | 96.4 | 88.7 | 96.2 | 95.9 | 77.3 | 95.5 | 291 |
| | B | 95.3 | 100.0 | 95.4 | 95.6 | 100.0 | 95.7 | 94.7 | 85.5 | 94.5 | 405 |
| | C | 91.2 | 100.0 | 91.4 | 92.4 | 100.0 | 92.6 | 92.9 | 100.0 | 93.1 | 605 |
| Phoneme | Trained with | TN(A) | TP(A) | Ov(A) | TN(B) | TP(B) | Ov(B) | TN(C) | TP(C) | Ov(C) | Neurons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /D/ | A | 91.8 | 100.0 | 91.9 | 91.1 | 65.8 | 90.9 | 89.4 | 47.0 | 88.7 | 120 |
| | B | 87.0 | 98.7 | 87.1 | 87.0 | 100.0 | 87.1 | 87.2 | 69.6 | 87.0 | 166 |
| | C | 84.9 | 94.9 | 85.0 | 83.9 | 100.0 | 84.0 | 88.6 | 92.2 | 88.6 | 272 |
| /s/ | A | 93.4 | 98.6 | 93.6 | 92.5 | 96.1 | 92.6 | 90.2 | 80.9 | 90.0 | 255 |
| | B | 82.3 | 97.0 | 82.8 | 82.3 | 96.6 | 82.8 | 79.0 | 62.6 | 78.7 | 309 |
| | C | 55.0 | 98.1 | 56.5 | 54.5 | 98.3 | 56.1 | 60.5 | 99.1 | 61.4 | 356 |
| /z/ | A | 92.4 | 100.0 | 92.6 | 91.7 | 88.4 | 91.6 | 91.2 | 97.4 | 91.4 | 193 |
| | B | 90.9 | 100.0 | 91.0 | 90.0 | 100.0 | 90.2 | 87.5 | 99.4 | 87.8 | 248 |
| | C | 71.7 | 100.0 | 72.2 | 71.9 | 100.0 | 72.5 | 65.2 | 99.4 | 66.0 | 290 |
| /S/ | A | 93.5 | 99.7 | 93.7 | 93.2 | 87.5 | 93.0 | 89.9 | 81.4 | 89.7 | 352 |
| | B | 93.2 | 99.7 | 93.4 | 94.5 | 100.0 | 94.6 | 92.7 | 67.3 | 92.0 | 448 |
| | C | 90.5 | 99.7 | 90.7 | 92.1 | 100.0 | 92.4 | 93.1 | 91.0 | 93.0 | 531 |
| /Z/ | A | 92.0 | 100.0 | 92.1 | 90.4 | 82.1 | 90.4 | 90.6 | 63.4 | 90.3 | 122 |
| | B | 62.0 | 100.0 | 62.3 | 62.0 | 97.4 | 62.2 | 56.6 | 82.9 | 56.9 | 162 |
| | C | 65.6 | 100.0 | 65.9 | 64.6 | 100.0 | 64.9 | 58.8 | 100.0 | 59.3 | 211 |
| /h/ | A | 87.5 | 100.0 | 87.6 | 87.3 | 100.0 | 87.4 | 86.5 | 26.0 | 85.5 | 110 |
| | B | 84.2 | 100.0 | 84.3 | 84.4 | 100.0 | 84.6 | 86.2 | 34.1 | 85.3 | 153 |
| | C | 56.3 | 100.0 | 56.7 | 55.4 | 100.0 | 55.8 | 65.4 | 94.3 | 65.9 | 260 |
| /ch/ | A | 95.3 | 99.1 | 95.3 | 95.2 | 89.8 | 95.1 | 94.8 | 61.1 | 94.2 | 306 |
| | B | 95.2 | 98.7 | 95.3 | 96.5 | 100.0 | 96.6 | 93.7 | 60.2 | 93.2 | 425 |
| | C | 78.2 | 98.7 | 78.7 | 77.3 | 100.0 | 77.8 | 63.2 | 94.7 | 63.7 | 523 |
| /dj/ | A | 86.5 | 100.0 | 86.6 | 85.4 | 55.6 | 85.3 | 77.1 | 7.1 | 76.1 | 71 |
| | B | 90.1 | 100.0 | 90.2 | 91.0 | 100.0 | 91.0 | 78.5 | 4.1 | 77.5 | 95 |
| | C | 77.0 | 100.0 | 77.1 | 77.2 | 100.0 | 77.3 | 71.3 | 68.4 | 71.2 | 156 |
| /m/ | A | 93.4 | 95.9 | 93.4 | 92.4 | 76.7 | 92.3 | 89.5 | 36.8 | 89.0 | 150 |
| | B | 92.6 | 91.1 | 92.6 | 93.3 | 96.7 | 93.4 | 89.1 | 22.1 | 88.5 | 159 |
| | C | 91.2 | 94.3 | 91.2 | 91.4 | 98.3 | 91.5 | 88.1 | 97.1 | 88.2 | 250 |
| Phoneme | Trained with | TN(A) | TP(A) | Ov(A) | TN(B) | TP(B) | Ov(B) | TN(C) | TP(C) | Ov(C) | Neurons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /n/ | A | 94.6 | 91.8 | 94.6 | 93.7 | 75.6 | 93.4 | 89.3 | 29.3 | 88.6 | 196 |
| | B | 91.3 | 91.2 | 91.3 | 91.9 | 97.8 | 92.0 | 84.4 | 45.7 | 83.9 | 227 |
| | C | 79.3 | 91.8 | 79.5 | 78.5 | 98.9 | 78.8 | 67.2 | 91.3 | 67.6 | 336 |
| /N/ | A | 89.9 | 98.9 | 90.0 | 88.4 | 80.4 | 88.3 | 88.2 | 9.5 | 87.5 | 114 |
| | B | 93.5 | 96.8 | 93.6 | 93.8 | 100.0 | 93.8 | 91.5 | 7.9 | 90.7 | 142 |
| | C | 88.2 | 91.6 | 88.2 | 88.0 | 97.8 | 88.1 | 88.2 | 87.3 | 88.2 | 171 |
| /l/ | A | 92.3 | 99.3 | 92.4 | 91.1 | 72.7 | 90.9 | 89.7 | 56.2 | 89.0 | 198 |
| | B | 90.6 | 98.6 | 90.7 | 90.6 | 100.0 | 90.7 | 84.8 | 68.0 | 84.5 | 263 |
| | C | 85.7 | 98.6 | 85.9 | 86.1 | 100.0 | 86.3 | 76.4 | 90.2 | 76.7 | 421 |
| /r/ | A | 87.0 | 100.0 | 87.2 | 85.5 | 94.3 | 85.5 | 81.5 | 73.8 | 81.5 | 100 |
| | B | 85.2 | 97.1 | 85.4 | 84.6 | 96.2 | 84.7 | 77.8 | 81.5 | 77.8 | 121 |
| | C | 77.8 | 100.0 | 78.0 | 79.0 | 96.2 | 79.2 | 73.3 | 95.4 | 73.5 | 138 |
| /w/ | A | 89.4 | 99.0 | 89.5 | 88.8 | 70.8 | 88.6 | 84.4 | 46.3 | 84.1 | 171 |
| | B | 84.5 | 100.0 | 84.6 | 86.1 | 100.0 | 86.3 | 81.8 | 55.6 | 81.6 | 209 |
| | C | 80.1 | 98.1 | 80.3 | 79.6 | 100.0 | 79.8 | 78.1 | 98.1 | 78.3 | 274 |
| /ie/ | A | 86.6 | 99.1 | 86.7 | 86.5 | 75.5 | 86.4 | 86.0 | 44.9 | 85.6 | 198 |
| | B | 85.4 | 100.0 | 85.5 | 86.5 | 100.0 | 86.6 | 85.2 | 34.8 | 84.7 | 267 |
| | C | 82.2 | 99.1 | 82.4 | 81.4 | 100.0 | 81.6 | 83.2 | 100.0 | 83.4 | 336 |
| /I/ | A | 84.6 | 97.7 | 84.8 | 84.5 | 90.6 | 84.6 | 76.0 | 50.4 | 75.6 | 198 |
| | B | 85.9 | 100.0 | 86.1 | 85.7 | 100.0 | 86.0 | 77.5 | 44.3 | 77.0 | 238 |
| | C | 84.4 | 99.4 | 84.6 | 84.2 | 98.8 | 84.4 | 81.2 | 100.0 | 81.5 | 288 |
| /e/ | A | 92.6 | 92.1 | 92.6 | 91.9 | 74.2 | 91.5 | 91.6 | 41.1 | 90.0 | 199 |
| | B | 87.6 | 87.4 | 87.6 | 87.9 | 96.8 | 88.1 | 85.3 | 42.1 | 84.0 | 231 |
| | C | 88.4 | 87.0 | 88.3 | 88.5 | 91.9 | 88.6 | 87.2 | 70.6 | 86.7 | 271 |
| /&/ | A | 82.8 | 96.1 | 83.2 | 80.7 | 94.2 | 81.1 | 73.4 | 80.0 | 73.6 | 226 |
| | B | 87.6 | 90.9 | 87.7 | 86.7 | 97.8 | 97.0 | 80.2 | 86.3 | 80.3 | 276 |
| | C | 86.2 | 94.7 | 86.5 | 86.2 | 98.6 | 86.6 | 81.6 | 82.6 | 81.6 | 289 |
| Phoneme | Trained with | TN(A) | TP(A) | Ov(A) | TN(B) | TP(B) | Ov(B) | TN(C) | TP(C) | Ov(C) | Neurons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /V/ | A | 92.3 | 93.9 | 92.3 | 91.2 | 76.3 | 90.1 | 85.6 | 49.0 | 85.1 | 236 |
| | B | 91.2 | 93.9 | 91.2 | 91.2 | 98.8 | 91.3 | 85.2 | 46.9 | 84.7 | 307 |
| | C | 79.0 | 93.9 | 79.2 | 78.8 | 97.5 | 79.1 | 67.7 | 84.4 | 67.9 | 341 |
| /A/ | A | 92.2 | 100.0 | 92.3 | 91.0 | 88.4 | 90.9 | 86.9 | 60.8 | 86.4 | 179 |
| | B | 90.0 | 96.0 | 90.1 | 89.2 | 98.8 | 89.3 | 83.8 | 65.0 | 83.5 | 219 |
| | C | 75.5 | 92.5 | 75.8 | 74.5 | 94.2 | 74.8 | 69.1 | 90.8 | 69.5 | 250 |
| /U/ | A | 88.8 | 94.9 | 88.9 | 87.8 | 66.7 | 87.6 | 87.7 | 24.7 | 87.0 | 132 |
| | B | 87.3 | 100.0 | 87.4 | 87.5 | 100.0 | 87.6 | 84.5 | 27.2 | 83.8 | 163 |
| | C | 85.3 | 97.0 | 85.4 | 85.6 | 93.8 | 85.7 | 86.6 | 60.5 | 86.3 | 221 |
| /i/ | A | 87.6 | 96.9 | 87.9 | 85.5 | 76.7 | 85.2 | 83.8 | 52.7 | 82.2 | 433 |
| | B | 86.9 | 94.8 | 87.1 | 89.6 | 95.6 | 89.8 | 81.8 | 47.0 | 80.1 | 512 |
| | C | 84.4 | 91.7 | 84.6 | 85.4 | 91.2 | 85.6 | 86.3 | 79.8 | 86.0 | 569 |
| /a/ | A | 89.8 | 94.8 | 90.1 | 88.3 | 88.5 | 88.3 | 86.4 | 35.3 | 83.2 | 374 |
| | B | 91.6 | 92.6 | 91.6 | 91.5 | 97.7 | 91.8 | 91.2 | 22.3 | 86.9 | 447 |
| | C | 81.9 | 91.4 | 82.4 | 81.5 | 96.2 | 82.2 | 73.7 | 93.3 | 75.0 | 482 |
| /O/ | A | 91.1 | 95.8 | 91.3 | 89.8 | 80.9 | 89.4 | 84.4 | 60.5 | 83.3 | 457 |
| | B | 84.1 | 96.0 | 84.7 | 84.0 | 97.6 | 84.7 | 76.4 | 68.0 | 76.0 | 566 |
| | C | 83.0 | 94.8 | 83.6 | 83.6 | 95.6 | 84.2 | 76.3 | 83.4 | 76.7 | 743 |
| /3/ | A | 89.5 | 96.7 | 89.9 | 87.5 | 82.5 | 87.2 | 80.3 | 75.9 | 80.1 | 404 |
| | B | 88.8 | 89.9 | 88.9 | 90.4 | 93.6 | 90.5 | 80.7 | 77.8 | 80.6 | 461 |
| | C | 87.5 | 89.5 | 87.6 | 88.4 | 93.6 | 88.7 | 86.5 | 87.8 | 86.6 | 566 |
| /u/ | A | 84.6 | 97.8 | 85.1 | 82.7 | 82.6 | 82.7 | 77.6 | 59.1 | 76.7 | 380 |
| | B | 87.1 | 92.8 | 87.3 | 88.5 | 98.5 | 88.9 | 79.8 | 51.8 | 78.5 | 426 |
| | C | 81.5 | 92.8 | 82.0 | 81.7 | 98.0 | 82.3 | 78.7 | 94.5 | 79.5 | 541 |
| /el/ | A | 85.7 | 98.1 | 86.3 | 83.6 | 76.6 | 83.3 | 82.5 | 43.0 | 81.6 | 612 |
| | B | 86.0 | 91.8 | 86.3 | 88.4 | 96.6 | 88.8 | 83.5 | 46.7 | 82.6 | 808 |
| | C | 81.2 | 89.5 | 81.6 | 82.3 | 94.9 | 82.9 | 79.4 | 81.2 | 79.4 | 878 |
| Phoneme | Trained with | TN(A) | TP(A) | Ov(A) | TN(B) | TP(B) | Ov(B) | TN(C) | TP(C) | Ov(C) | Neurons |
|---|---|---|---|---|---|---|---|---|---|---|---|
| /al/ | A | 87.4 | 94.6 | 87.8 | 84.0 | 75.5 | 83.5 | 82.2 | 32.2 | 80.3 | 784 |
| | B | 85.5 | 93.5 | 86.1 | 88.9 | 97.3 | 89.4 | 81.0 | 34.8 | 79.2 | 1007 |
| | C | 81.0 | 93.3 | 81.9 | 83.9 | 97.9 | 84.9 | 78.1 | 75.9 | 78.0 | 1142 |
| /Oi/ | A | 88.8 | 97.8 | 89.1 | 86.9 | 74.4 | 86.4 | 81.7 | 5.3 | 77.8 | 534 |
| | B | 85.5 | 92.4 | 85.8 | 88.1 | 99.0 | 88.5 | 78.9 | 4.4 | 75.1 | 644 |
| | C | 80.5 | 94.6 | 81.1 | 81.9 | 99.5 | 82.6 | 74.1 | 81.1 | 74.5 | 749 |
| /OU/ | A | 87.8 | 97.1 | 88.4 | 84.0 | 69.9 | 83.2 | 70.8 | 15.2 | 68.2 | 761 |
| | B | 83.0 | 94.9 | 83.7 | 86.0 | 99.0 | 86.7 | 73.3 | 17.7 | 70.7 | 922 |
| | C | 79.5 | 95.3 | 80.4 | 81.0 | 99.3 | 82.1 | 75.3 | 91.5 | 76.1 | 1084 |
| /aU/ | A | 88.5 | 97.2 | 88.9 | 87.2 | 78.9 | 86.8 | 85.0 | 12.8 | 81.4 | 529 |
| | B | 82.6 | 96.6 | 83.3 | 84.9 | 98.3 | 85.5 | 78.8 | 9.3 | 75.4 | 655 |
| | C | 75.4 | 97.2 | 76.5 | 76.0 | 98.7 | 77.1 | 66.0 | 97.4 | 67.5 | 717 |
| /i@/ | A | 89.9 | 95.7 | 90.1 | 87.6 | 72.8 | 87.1 | 84.6 | 15.2 | 81.4 | 471 |
| | B | 87.4 | 87.6 | 87.4 | 90.3 | 95.6 | 90.5 | 84.2 | 20.4 | 81.2 | 518 |
| | C | 82.0 | 88.4 | 82.2 | 83.8 | 96.7 | 84.3 | 79.6 | 72.9 | 79.3 | 586 |
| /U@/ | A | 89.7 | 96.0 | 89.8 | 88.3 | 79.8 | 88.1 | 86.7 | 13.3 | 84.6 | 333 |
| | B | 87.6 | 93.4 | 87.7 | 89.3 | 99.1 | 89.5 | 86.3 | 20.7 | 84.5 | 387 |
| | C | 82.8 | 94.2 | 93.1 | 82.3 | 99.1 | 82.6 | 78.5 | 80.8 | 78.6 | 515 |
| /e@/ | A | 90.3 | 95.7 | 90.4 | 88.7 | 69.3 | 88.2 | 88.0 | 22.8 | 86.1 | 406 |
| | B | 86.7 | 89.5 | 86.7 | 89.7 | 96.1 | 89.9 | 84.9 | 51.5 | 83.9 | 463 |
| | C | 81.2 | 92.2 | 81.5 | 82.6 | 98.4 | 83.0 | 76.1 | 90.1 | 76.5 | 572 |