Neural Network based Construction of Fuzzy Graphs

Michael R. Berthold and Klaus-Peter Huber

Institute for Computer Design and Fault Tolerance (Prof. D. Schmid), University of Karlsruhe, Germany

Abstract: Function approximation using example data has gained considerable interest in the past. One interesting application is the approximation of the behaviour of simulation models, called metamodelling. The goal is to approximate the behaviour as well as to extract some understandable knowledge about the simulation model. In this paper a combination of a special type of Neural Network (a Rectangular Basis Function Network) with a (de-)fuzzification module is used. The resulting system approximates real-valued functions with an adjustable precision. A constructive algorithm builds the network from scratch, resulting in a structure where each hidden unit represents a rectangular area with a corresponding membership function (or a fuzzy point). The underlying knowledge can be extracted from the network in the form of a Fuzzy Graph.

Figure 1: The Fuzzy Graph Rectangular Basis Function Network. [Diagram: during training, the desired output f(x) is fuzzified into desired classes for the RecBF Neural Network; during testing, the class memberships the network produces for an input x are defuzzified back into f(x), and a Fuzzy Graph representation can be extracted.]

I. Introduction

The dynamic behaviour of systems can be analyzed using simulation models. Unfortunately, the results are often only available in the form of large datasets, which makes it hard to extract the underlying regularities. Additionally, each simulation run is time consuming, so an interactive analysis is usually impossible. One solution is to approximate the behaviour of the simulation model, resulting in a metamodel that performs a mapping from simulation parameters to the outputs of interest. Using the original system, each such mapping would require a complete simulation run. Using the approximation, this mapping can be done quickly, and in addition the underlying simulation metamodel can be analyzed to extract knowledge about the system. One kind of metamodel construction is based on data examples, which are usually noisy due to the underlying stochastic simulation process. Earlier approaches are based on regression models (see for example [4]), but their application requires assumptions about the underlying dependencies (e.g. linearity). Purely rule-based approaches, on the other hand, may reveal a specific type of information, i.e. areas of more or less constant output (insensitivity regions [3]) or bottlenecks of the system [6], but they lack real function approximation capability. Other approaches use Neural Networks [7], because they are known to approximate arbitrary functions well, but they do not allow what they have learned to be interpreted in an understandable manner. Knowledge extraction from Neural Network based metamodels is therefore usually difficult. In this paper another approach based on a special kind of Neural Network is presented, one that allows the use of Fuzzy Graphs [9] to represent the discovered knowledge. Its main advantages are the automatic construction of the Neural Network from data examples and a straightforward knowledge extraction. Additionally, the underlying fuzzy concept seems well suited to handling stochastic data. The paper describes how these Fuzzy Graphs are generated using so-called Rectangular Basis Function Networks (RecBFN, [2]). An example is used to demonstrate how the proposed method works in practice.

(Institut für Rechnerentwurf und Fehlertoleranz (Prof. D. Schmid), Universität Karlsruhe, Postfach 6980, 76128 Karlsruhe, Germany. eMail: [email protected], [email protected])

II. Fuzzy Graph Rectangular Basis Function Networks

Figure 2: A RecBF in two-dimensional space. The center vector ~r defines the location, the λ+, λ− define the inner (or core) rectangle, and the Λ+, Λ− specify the outer rectangle (or support region).

Figure 3: An example of compatible and incompatible classes. One rule, with its core (solid line) and support area (dashed line), is shown.

The approach presented in this paper aims to combine the expressive power of Fuzzy Graphs with the learning ability of Neural Networks. Using a constructive algorithm from the field of Radial Basis Functions [1], a network of Rectangular Basis Functions is built from scratch using training data. The internal structure of the resulting network then enables an interpretation in the form of a structure similar to Fuzzy Graphs. The entire system is sketched in figure 1. It can be divided into two stages: the Neural Network module, and a second unit responsible for fuzzification of the training values and defuzzification of the network output. Training of the network (or construction of the Fuzzy Graph) requires classified input patterns. Therefore the fuzzification unit is used to determine which class each pattern belongs to; the network is then trained to correctly classify the corresponding input patterns. Function approximation takes the reverse step: the network produces membership values for all classes and the defuzzification unit computes the final output value using the well-known center-of-gravity method.
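The (de-)fuzzification stage described above can be sketched as follows. This is an illustrative reading, not the authors' code: the function names (make_classes, fuzzify, defuzzify) and the choice of equidistant triangular membership functions are our assumptions, matching the setup used later in figure 6.

```python
def make_classes(y_min, y_max, n_classes):
    """Equidistant, overlapping triangular membership functions over [y_min, y_max]."""
    width = (y_max - y_min) / (n_classes - 1)
    centers = [y_min + i * width for i in range(n_classes)]

    def membership(y, center):
        return max(0.0, 1.0 - abs(y - center) / width)

    return centers, membership

def fuzzify(y, centers, membership):
    """Class membership values for a crisp output value y."""
    return [membership(y, c) for c in centers]

def defuzzify(memberships, centers):
    """Center-of-gravity defuzzification of class membership values."""
    total = sum(memberships)
    if total == 0.0:
        return 0.0
    return sum(m * c for m, c in zip(memberships, centers)) / total

centers, mu = make_classes(0.0, 10.0, 10)
memberships = fuzzify(3.7, centers, mu)   # only the two nearest classes fire
print(round(defuzzify(memberships, centers), 2))  # recovers 3.7
```

Because adjacent triangular classes form a partition of unity, fuzzifying a crisp value and immediately defuzzifying it reproduces the value exactly; the loss of precision only appears once the class memberships come from the trained network instead.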

III. Rectangular Basis Function Networks

In strong analogy to Radial Basis Function Networks (RBF, see [5]), Rectangular Basis Function Networks can be seen as feedforward networks with one hidden layer. The input layer is fully connected to the hidden layer. Each unit in the hidden layer holds a center vector ~r and two sets of widths, λi−, λi+ and Λi−, Λi+ (1 ≤ i ≤ n, where n is the input dimension), that describe the location and size of the Rectangular Basis Function¹. This is in contrast to Radial Basis Function Networks, where each RBF covers a radial area and is described by one parameter, the standard deviation σ. Each RecBF defines a core region using the λ+,− and a larger support region using the Λ+,−. Figure 2 shows an example of one RecBF in two-dimensional feature space. All RecBFs are connected with a non-zero weight to exactly one output unit. Each output unit corresponds to one class, resulting in a binary 1-of-n coding on all outputs.

The goal of the used training algorithm is to cover all training patterns with the core of at least one RecBF of their corresponding class. In contrast to common classification algorithms, not all classes have to be mutually exclusive: it is possible to declare a compatibility relation C between different classes, making it possible for patterns of one class to be covered by a support or core region of compatible classes. For all other (incompatible) classes the algorithm guarantees that patterns do not lie inside the support region of conflicting classes. The compatibility relation can be used to enable the network to tolerate noisy patterns or small oscillations along class boundaries, as will be demonstrated later. Figure 3 shows an example of a rule for a class that is compatible with one other class but incompatible with a second. The resulting RecBF units will therefore have cores that just cover all patterns of their assigned class. The core may contain compatible patterns as well. The support region covers the largest surrounding area, just barely avoiding conflicts with incompatible patterns.
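A single RecBF with its core and support regions can be sketched as below. The class name and field names are ours, not the paper's; the membership shape (1 inside the core, declining linearly to 0 at the support boundary, one-dimensional values combined with the minimum operator) follows the description given later in section III.B and figure 5.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RecBF:
    r: List[float]          # center vector ~r
    lo_core: List[float]    # r_i - lambda_i^-
    hi_core: List[float]    # r_i + lambda_i^+
    lo_supp: List[float]    # r_i - Lambda_i^-
    hi_supp: List[float]    # r_i + Lambda_i^+

    def membership(self, x: List[float]) -> float:
        """1 inside the core, linear decline to 0 at the support boundary,
        per-dimension values combined with the minimum operator."""
        mus = []
        for i, xi in enumerate(x):
            if self.lo_core[i] <= xi <= self.hi_core[i]:
                mus.append(1.0)
            elif self.lo_supp[i] <= xi < self.lo_core[i]:
                mus.append((xi - self.lo_supp[i]) / (self.lo_core[i] - self.lo_supp[i]))
            elif self.hi_core[i] < xi <= self.hi_supp[i]:
                mus.append((self.hi_supp[i] - xi) / (self.hi_supp[i] - self.hi_core[i]))
            else:
                mus.append(0.0)   # outside the support region
        return min(mus)

# A RecBF like the one in figure 2: core [-1,1]^2 inside support [-2,2]^2.
rbf = RecBF(r=[0.0, 0.0], lo_core=[-1, -1], hi_core=[1, 1],
            lo_supp=[-2, -2], hi_supp=[2, 2])
print(rbf.membership([0.5, 0.0]))   # inside the core
print(rbf.membership([1.5, 0.0]))   # halfway through the support in one dimension
```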

¹ A similar approach using only one set of widths was introduced in [8] independently of the RecBFN approach presented in this paper.

A. Training









Training of RecBFNs is done using a fast and constructive algorithm that is based on an existing RBF algorithm called Dynamic Decay Adjustment (DDA, see [1]). The algorithm is based on three steps that introduce new RecBFs when necessary and adjust the core and support regions of existing RecBFs:

covered: if a new training pattern lies inside the support region of an already existing RecBF of the correct class, its core region is extended to cover the new pattern, which is, even in high-dimensional space, an easy task.

commit: if a new pattern is not covered by a RecBF of the correct class, a new hidden unit is introduced. Its reference vector is set to the new training instance and its widths are chosen as large as possible without running into conflict with already existing prototypes of incompatible classes (this is accomplished by executing the shrink procedure against existing prototypes of incompatible classes).

shrink: if a new pattern is incorrectly covered by an already existing RecBF of an incompatible class, this prototype's support area is reduced (i.e. shrunk) so that the conflict is solved. Because it is not possible to find an optimal solution for this problem in reasonable time, a heuristic that aims to generate large rectangles is presented later.

ALGO DDA-RecBFN
  // reset weights and core:
  FORALL prototypes p_i^c DO
    A_i^c = 0.0
    λ^{+/−} = 0.0
  ENDFOR
  // train one complete epoch
  FORALL training patterns (x, c) DO:
    IF ∃ p_i^c : p_i^c covers x THEN
      A_i^c = A_i^c + 1.0
      adjust λ^{+/−} (core) to cover x
    ELSE
      // "commit": introduce new RecBF
      add new RecBF p_{m_c+1}^c with:
        r_{m_c+1}^c = x
        Λ^{+/−} = ∞
      FORALL k : (k, c) ∉ C, 1 ≤ j ≤ m_k DO
        p_{m_c+1}^c ← shrink(r_j^k)
      ENDFOR
      A_{m_c+1}^c = 1.0
      m_c = m_c + 1
    ENDIF
    // "shrink": adjust conflicting RecBFs
    FORALL k : (k, c) ∉ C, 1 ≤ j ≤ m_k DO
      p_j^k ← shrink(x)
    ENDFOR
  ENDFOR
END ALGO

where p_i^c is the i-th RecBF of class c (1 ≤ i ≤ m_c), m_c is the number of RecBFs of class c, r_i^c = (r_{i,1}^c, …, r_{i,n}^c) is the reference vector, A_i^c is the amplitude (or weight) of p_i^c, and C is the compatibility relation: (k, c) ∈ C ⇔ k is compatible to c.

Figure 4: The DDA algorithm to train one epoch for RecBFNs. See text for further details.

The program presented in figure 4 trains for one epoch using the DDA algorithm. First, all weights are set to zero to avoid accumulation of erroneous information about the training patterns. Next, all training patterns are presented to the network. If a new pattern is already covered by a RecBF of the correct class, the weight of the biggest covering RecBF is increased; otherwise a new RecBF is introduced, having the new pattern as its reference. Its initial size is chosen as infinite, and it is then shrunk against all existing RecBFs of incompatible classes. The last step shrinks all RecBFs of incompatible classes whose support region covers this specific pattern.
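The epoch above can be sketched in Python as follows. This is a deliberately simplified reading of figure 4, not the authors' implementation: the shrink step here just cuts the support along the axis where the conflicting point is farthest from the center, instead of the volume-maximizing heuristic described in the text, and all names are ours.

```python
import math

class Proto:
    """One RecBF prototype: reference vector r, per-dimension core and support
    intervals, and an amplitude A counting the patterns it covers."""
    def __init__(self, r):
        self.r = list(r)
        self.core = [[ri, ri] for ri in self.r]               # core starts as a point
        self.supp = [[-math.inf, math.inf] for _ in self.r]   # support starts infinite
        self.A = 1.0

    def covers(self, x):
        return all(lo <= xi <= hi for xi, (lo, hi) in zip(x, self.supp))

    def extend_core(self, x):
        for i, xi in enumerate(x):
            self.core[i][0] = min(self.core[i][0], xi)
            self.core[i][1] = max(self.core[i][1], xi)

    def shrink(self, x):
        """Crude stand-in for the paper's shrink heuristic: cut the support
        along the axis where x is farthest from the center."""
        if not self.covers(x):
            return
        i = max(range(len(x)), key=lambda d: abs(x[d] - self.r[d]))
        if x[i] < self.r[i]:
            self.supp[i][0] = x[i]
        else:
            self.supp[i][1] = x[i]

def incompatible(k, c, compatible):
    return k != c and frozenset((k, c)) not in compatible

def train_epoch(patterns, protos, compatible):
    """One DDA-RecBFN epoch; protos maps class -> list of Proto."""
    for plist in protos.values():             # reset weights and cores
        for p in plist:
            p.A = 0.0
            p.core = [[ri, ri] for ri in p.r]
    for x, c in patterns:
        hits = [p for p in protos.get(c, []) if p.covers(x)]
        if hits:                              # "covered"
            best = max(hits, key=lambda p: p.A)
            best.A += 1.0
            best.extend_core(x)
        else:                                 # "commit"
            p = Proto(x)
            for k, plist in protos.items():
                if incompatible(k, c, compatible):
                    for q in plist:
                        p.shrink(q.r)
            protos.setdefault(c, []).append(p)
        for k, plist in protos.items():       # "shrink" conflicting RecBFs
            if incompatible(k, c, compatible):
                for q in plist:
                    q.shrink(x)
    return protos

# Two one-dimensional patterns of mutually incompatible classes: each
# prototype's support is cut back at the other class's pattern.
protos = train_epoch([([0.0], "low"), ([5.0], "high")], {}, compatible=set())
```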

Figure 5: Membership functions extracted from one RecBF. All one-dimensional functions are combined using a minimum operator, resulting in a membership function for the corresponding rule.

The main difference to the original DDA algorithm is a new shrink procedure in which three cases are distinguished, based on a heuristic that maximizes the volume of each RecBF. If an existing finite width can be shrunk without falling below λmin, the one with the smallest loss in volume is chosen². If this is not the case, either one of the remaining infinite widths is shrunk or, if this would result in a width smaller than λmin, one of the finite widths is shrunk below λmin.

B. Rule Extraction from RecBFNs

As mentioned previously, the RecBFN generated by the presented training algorithm can be used to extract rules. Each hidden unit represents one rule, described by a simple conjunction of restrictions on the subset of all attributes that were shrunk during training. In addition, so-called core and support areas are extracted to define membership functions for the used attributes. The core represents the smallest area where patterns of the class were found; this area is therefore assigned a membership value of 1 for the corresponding class. The larger support area contains no conflicting patterns, and towards its boundaries the membership value declines linearly to 0. This leads to the function illustrated in figure 5 and shows strong similarities to the soft rules described by Zadeh [9]. Each RecBF represents one soft rule, and all RecBFs connected to the same output form the rule set of the corresponding class.

C. Fuzzy Graph Approximation with RecBFNs

In the framework shown in figure 1 the RecBF Network can be used to approximate functions and to extract a Fuzzy Graph from the network. To use the RecBFN for function approximation, the output values have to be fuzzified, as illustrated in figure 6, resulting in different classes for different ranges of the output value. In this example the membership functions are of equidistant widths, achieved by a linear quantization, with adjacent classes being compatible. Other, individual membership functions are also possible, allowing separate resolutions for different ranges of the output value.

² The vector of minimum widths has to be chosen a priori. It inhibits long, thin rectangles (the "straw" dilemma) and is easy to adjust. For [0,1]-normalized attributes a choice of 1/10th is appropriate.

Figure 6: Fuzzification of output y, resulting in 10 classes for the underlying network. [Diagram: ten triangular membership functions mi(y) over the range of the output value y.]

In this scenario the network can be seen as modeling a Fuzzy Graph, where each hidden unit or RecBF represents one fuzzy point. Figure 7 shows a one-dimensional function and the corresponding Fuzzy Graph after presenting 200 equally spaced training patterns with a 10-class output quantization as shown in figure 6. Figure 8 shows the resulting function as approximated by the Fuzzy Graph RecBFN. This example shows that the resulting Fuzzy Graph approximates the original function well, with a specific degree of accuracy. In regions containing "noise" up to a certain amount, the Fuzzy Graph ignores the oscillations and tends to produce plateaus. The amount of smoothing can be controlled by the output fuzzification: using more and finer membership functions results in higher precision.

Figure 7: The corresponding Fuzzy Graph extracted from the RecBFN for the function sin(x)*(10/x) + sqrt(x) + 0.5. All core regions, but only the support areas for the rules of class 5, are shown.

Figure 8: A one-dimensional function and its approximation by a 10-class Fuzzy Graph RecBFN.

IV. Conclusions

In this paper an approach using an existing Neural Network model to extract Fuzzy Graphs has been proposed. Adding a (de-)fuzzification component to the network makes it possible to use the underlying classifier to find fuzzy points in the feature space. Construction of the Fuzzy Graph is done in a fast and efficient way without any need for user interaction besides definition of the output fuzzification. Evaluation of new data points is straightforward, and the resulting representation is easy to understand. The system approximates functions with a controllable accuracy, which seems useful for noisy data. The Fuzzy Graph RecBFN offers a promising way to build a metamodel from simulation data that allows analysis of the underlying system behaviour. In addition, the extracted Fuzzy Graph is an understandable representation of the knowledge acquired by the Neural Network.

V. Acknowledgments

We thank Prof. D. Schmid for his support and the opportunity to work on this interesting project. In addition we would like to thank Prof. Zadeh for his comments on the RecBFN approach, which initiated the idea of Fuzzy Graph RecBFNs.

References

[1] Michael R. Berthold and Jay Diamond. Boosting the performance of RBF networks with Dynamic Decay Adjustment. In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors, Advances in Neural Information Processing Systems 7, Cambridge, MA, 1995. MIT Press.
[2] Klaus-Peter Huber and Michael R. Berthold. Building precise classifiers with automatic rule extraction. In International Conference on Neural Networks (submitted). IEEE, November 1995.
[3] Klaus-Peter Huber and Helena Szczerbicka. Sensitivity analysis of simulation models with decision tree algorithms. In Proceedings of the European Simulation Symposium ESS'94, volume 1, pages 43-47, 1994.
[4] J. P. C. Kleijnen. Regression metamodels for generalizing simulation results. IEEE Transactions on Systems, Man and Cybernetics, 9(2):93-96, 1979.
[5] John Moody and Christian J. Darken. Fast learning in networks of locally-tuned processing units. Neural Computation, 1:281-294, 1989.
[6] Henri Pierreval. Rule-based simulation metamodels. European Journal of Operational Research, 61:6-17, 1992.
[7] Henri Pierreval. A metamodeling approach based on neural networks. International Journal of Computer Simulation, 6(2), 1996.
[8] Nadine Tschichold-Gürman. RuleNet: An artificial neural network model for cooperation with knowledge based systems. Internal report, Swiss Federal Institute of Technology, 1994.
[9] Lotfi A. Zadeh. Soft computing and fuzzy logic. IEEE Software, pages 48-56, November 1994.