14th PSCC, Sevilla, 24-28 June 2002, Session 20, Paper 3

MAPPING NEURAL NETWORKS INTO RULE SETS AND MAKING THEIR HIDDEN KNOWLEDGE EXPLICIT: APPLICATION TO SPATIAL LOAD FORECASTING

Adriana Castro, INESC & UFPA*, Porto, Portugal, [email protected]
Vladimiro Miranda, INESC & FEUP, Porto, Portugal, [email protected]

* Federal University of Pará, Brazil. Sponsored by CAPES, Brazilian Government.
Abstract – This paper presents a mathematical transform that maps artificial neural networks into rule-based fuzzy inference systems. This allows one to make explicit the knowledge implicitly captured by a trained neural network. The result is exact, and its application is illustrated with data from a spatial load forecasting problem.
Keywords: Artificial Neural Networks, Fuzzy Inference Systems, rule-based systems, spatial load forecasting.

1. INTRODUCTION

Artificial Neural Networks (ANNs) are recognized for their powerful capacity to express relationships between the variables of a problem. They constitute a powerful interpolation tool, and their presently most used form is a universal approximator, meaning that it is always possible to design an ANN that approximates a given function to some desired precision. Furthermore, ANNs are exceptionally apt for efficient computational implementation.

However, there is still much distrust of ANNs, for a number of reasons, some better than others. One often heard argument is that ANNs do not have explaining capability: they deliver, but they don't tell why. In a number of ways, this is certainly true. In many cases, ANNs are sufficient and there is no real need to make knowledge explicit. But in some application areas this will be felt as a must. A good example is the building of diagnoses for system or equipment failures. Human understanding would be greatly enhanced if the relation between variables or symptoms and equipment condition were explicit, and engineers or technicians would also gain more confidence in the diagnosis produced.

On the other hand, rule-based systems (Fuzzy Inference Systems, FIS) have precisely the desired characteristics of an explicit form of knowledge. However, their construction is not always straightforward. One of the more important problems in the design of a FIS from input-output data is a form of "curse of dimensionality", meaning that the number of rules of
the system grows exponentially as the number of inputs increases, and the computational complexity of practical implementations increases accordingly. Besides, if the number of rules is excessive, understanding becomes more difficult for the human specialist.

To couple the advantages of neural networks and rule-based systems, there have been a number of works trying to establish relations between them [1]-[4]. Some of these relationships have been built on the basis of progressive approximations. In this paper, instead of approximating an ANN by some arbitrarily built rule-based system, we present a mathematical transform that maps a certain type of ANN into Takagi-Sugeno (TS) fuzzy inference systems. This is not an approximation but an equivalence process, which allows one to replace an ANN by a TS fuzzy inference system and obtain exactly the same results.

The paper presents the derivation and definition of the transform, together with one practical example. The example illustrates how the transform makes explicit, and puts into light, knowledge that was hidden in the ANN architecture. The practical case used to illustrate the technique is related to Spatial Load Forecasting. However, the technique has the potential to contribute to the development of better diagnosis systems, for instance for machine failure (transformers, generators...) or relay failure, or even for complex power system problems like voltage collapse, or any other problem where an ANN produces a suitable solution.

2. ARTIFICIAL NEURAL NETWORKS

An ANN is characterized by having in its architecture many low-level processing units with a high degree of interconnectivity via weighted connections. Motivated by different networks found in biological systems, several ANN architectures have been proposed in the literature. Among these, the most commonly used is the Multilayer Feedforward Neural Network (Figure 1). In its most basic form, this is a model consisting of a finite number of successive layers, where each layer consists of a finite number of processing units called
neurons. Each neuron of each layer is connected to every neuron of the subsequent layer through synaptic weights.
Figure 1: ANN Architecture
Considering the ANN of Figure 1, every neuron in the hidden layer calculates:

s_j = f\left(\sum_{i=1}^{n} x_i w_{ij} + \theta_j\right)    (1)
where x_i is the i-th input to the net, w_{ij} is the weight of the connection from input neuron i to hidden neuron j, \theta_j is the bias of the j-th hidden neuron, and f(.) is the activation function of the neuron. For the output layer, each neuron calculates:

y_k = g\left(\sum_{j=1}^{m} \beta_{jk} s_j\right)    (2)
where \beta_{jk} is the weight of the connection from hidden neuron j to output neuron k, y_k is the k-th output of the net, and g(.) is the activation function of the neuron.

Among all properties of ANNs, the most important is their function approximation capability. It has been extensively demonstrated that a Multilayer Feedforward Neural Network with arbitrary squashing functions in the hidden neurons can approximate virtually any function of interest to any desired degree of accuracy [5]-[6]. However, a barrier to a more widespread acceptance of ANNs is that they lack explaining capability: it is impossible for the human specialist to understand how the neural network arrives at a particular decision. ANNs are considered black boxes, and nothing is revealed about the knowledge encoded within them.
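As an illustration of equations (1) and (2), the following is a minimal sketch, not from the paper, of the forward pass of such a network; it assumes Python with NumPy, and all names (forward, W, theta, beta) are illustrative.

    import numpy as np

    def forward(x, W, theta, beta, f, g):
        # Forward pass of the one-hidden-layer network of equations (1)-(2).
        # x: inputs (n,); W: input-to-hidden weights (n, m);
        # theta: hidden biases (m,); beta: hidden-to-output weights (m, k).
        s = f(W.T @ x + theta)   # equation (1), all hidden neurons at once
        return g(s @ beta)       # equation (2)

    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))   # a common squashing function
    identity = lambda z: z                         # linear output layer
    rng = np.random.default_rng(0)
    x = rng.random(3)                              # n = 3 inputs
    W = rng.random((3, 4))                         # m = 4 hidden neurons
    theta = rng.random(4)
    beta = rng.random((4, 1))                      # k = 1 output
    print(forward(x, W, theta, beta, sigmoid, identity))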
3. FUZZY INFERENCE SYSTEMS

Fuzzy Inference Systems (FIS), or Fuzzy Rule Based Systems (FRBS), have precisely the desired characteristics of an explicit form of knowledge. Like ANNs, FIS are dynamic, parallel processing systems that estimate input-output functions. In a FIS, the relationship between variables is represented by means of fuzzy IF-THEN rules of the form:

IF (antecedent) THEN (consequent)

The antecedent is a fuzzy proposition of the type "x is A", where x is a linguistic variable and A is a linguistic term defined by a fuzzy set. Basically, FIS can be categorized into two families: 1) the family of linguistic models based on collections of IF-THEN rules whose antecedents and consequents utilize fuzzy values, and 2) the family that uses a rule structure with fuzzy antecedents and functional (crisp) consequents.

The second category, based on Takagi-Sugeno (TS) fuzzy inference systems, is built with rules of the following form:

Rule R_l: IF x_{l1} is C_{l1} and ... and x_{ln} is C_{ln} THEN y_l = c_0 + c_1 x_1 + ... + c_n x_n    (3)
where the C_{li} are fuzzy sets, the x_i are the inputs of the system and the c_i are constants. The consequent of the rule is an affine linear function of the input variables, and the output of the TS model is computed as the weighted average of the y_l. When y_l is a constant, the fuzzy inference system is called a zero-order TS fuzzy model, which can be viewed as a special case of the Mamdani fuzzy system in which each rule's consequent is specified by a fuzzy singleton (or a pre-defuzzified consequent). Figure 2 illustrates the reasoning mechanism for the zero-order TS model, which is the model of interest in this paper.

[Figure 2 shows two rules, "If x1 is A1 and x2 is B1 then y1 = c1" and "If x1 is A2 and x2 is B2 then y2 = c2", with firing strengths v1 and v2 and output y = (v_1 y_1 + v_2 y_2)/(v_1 + v_2) = \bar{v}_1 y_1 + \bar{v}_2 y_2.]

Figure 2: A two-input zero-order Takagi-Sugeno model.
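To make the reasoning of Figure 2 concrete, here is a small sketch of zero-order TS inference, assuming Python with NumPy; the Gaussian membership functions and the product T-norm are illustrative choices, not taken from the paper (the AND of Figure 2 could equally be implemented with min).

    import numpy as np

    def ts_zero_order(x1, x2, rules):
        # Zero-order Takagi-Sugeno inference as in Figure 2. Each rule is
        # (mu_A, mu_B, c): two membership functions and a constant consequent.
        v = np.array([mu_a(x1) * mu_b(x2) for mu_a, mu_b, c in rules])
        y_rule = np.array([c for _, _, c in rules])
        return float(v @ y_rule / v.sum())   # weighted average of the y_l

    gauss = lambda m, s: (lambda x: np.exp(-((x - m) / s) ** 2))
    rules = [(gauss(2.0, 1.0), gauss(3.0, 1.0), 1.5),  # If x1 is A1 and x2 is B1 then y1 = 1.5
             (gauss(7.0, 1.0), gauss(6.0, 1.0), 4.0)]  # If x1 is A2 and x2 is B2 then y2 = 4.0
    print(ts_zero_order(2.5, 3.5, rules))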
4. MAPPING NEURAL NETWORKS INTO RULE SETS

4.1. Definition of the topology of the ANN

For the purposes of this paper, consider the ANN in Figure 3. This ANN has one neuron in the output layer, with a linear activation function, and only one
hidden layer, whose activation function for each neuron is the sigmoid basis approximation function, whose graph is shown in Figure 4 and which is defined as:

f(x) = \begin{cases} 1 - e^{-x}, & x \ge 0 \\ 0, & x < 0 \end{cases}    (4)

Figure 3: ANN topology

Figure 4: Sigmoid Basis Approximation Function

4.2. Introducing the concept of f-duality

The concept of f-duality was introduced by Benitez, Castro and Requena in [3]. In their work they used this concept to find a convenient operator, the logical interactive-or, to give a proper interpretation of ANNs. To produce the mapping of an ANN into rule sets as proposed in this paper, the same concept will be used to find the mathematical operation equivalent to equation (1), the operation calculated by each hidden neuron. The following proposition, definition and lemma are useful:

Proposition 1: Let f: X \to Y be a bijective function and let \oplus be an operation defined in the domain X of f. Then there is one and only one operation \otimes, defined in the range Y of f, verifying:

f\left(\mathop{\oplus}_{i=1}^{n} x_i\right) = \mathop{\otimes}_{i=1}^{n} f(x_i)    (5)

Definition 1: Let f be a bijective function and let \oplus be an operation defined in the domain of f. The operation \otimes verifying (5) is called the f-dual of \oplus.

Then, considering the sigmoid basis approximation function as f and \oplus as the operation + in \Re, we have:

Lemma 2: The f-dual of + is *, defined as:

f(x_1 + x_2 + ... + x_n) = f(x_1) * f(x_2) * ... * f(x_n) = a * b * ... * p = 1 - (1 - a)(1 - b)...(1 - p)    (6)

Therefore, applying the concept of f-duality to (1), the output signal of the hidden neurons can also be calculated by:

s_j = f\left(\sum_{i=1}^{n} x_i w_{ij}\right) = f(x_1 w_{1j}) * ... * f(x_n w_{nj}) = 1 - (1 - f(x_1 w_{1j}))...(1 - f(x_n w_{nj}))    (7)

Since the function f(.), the sigmoid basis approximation function, can represent a membership function in fuzzy logic, and the operation in (7) is the well-known Algebraic Sum operator, qualified as an S-norm (union) in fuzzy logic, rules can be extracted from the ANN.

4.3. Extracting Rules from an ANN

From the ANN shown in Figure 3, and considering the hidden neurons without bias (the function of the bias will be explained later), one rule can be extracted for each neuron in the hidden layer:

Rule R_j: If \sum_{i=1}^{n} x_i w_{ij} is A then y_j = \beta_j    (8)
where A is a fuzzy set whose membership function is the activation function of the hidden neuron. From equation (7), rules as in (8) can be expressed as:

Rule R_j: If x_1 w_{1j} is A * ... * x_i w_{ij} is A * ... * x_n w_{nj} is A then y_j = \beta_j    (9)

Since the expression "x_i w_{ij} is A" may also be interpreted as "x_i is A_i", where the fuzzy set A_i has membership function \mu_{A_i} = f(x_i w_{ij}), with the weight w_{ij} acting as a scaling of the slope of f(.), and since the operation * can be considered a logical OR operator, we can rewrite (9) as:

Rule R_j: If x_1 is A_1 or ... or x_i is A_i or ... or x_n is A_n then y_j = \beta_j    (10)

with the firing strength of each rule R_j given by:
v_j = \mu_{A_1} * ... * \mu_{A_i} * ... * \mu_{A_n}    (11)
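A quick numeric check of Lemma 2 and equations (7) and (11) can be written in a few lines; the sketch below assumes Python with NumPy, and the random inputs and weights are illustrative (the weights are kept above 2.3, as Condition 1 will require later).

    import numpy as np

    def f(x):
        # Sigmoid basis approximation function of equation (4).
        return np.where(x >= 0, 1.0 - np.exp(-x), 0.0)

    rng = np.random.default_rng(1)
    x = rng.random(5)               # normalized inputs, 0 <= x_i <= 1
    w = 2.3 + 5.0 * rng.random(5)   # weights kept >= 2.3

    lhs = f(np.sum(x * w))               # f applied to the neuron's sum, as in (1) without bias
    rhs = 1.0 - np.prod(1.0 - f(x * w))  # the f-dual form of (6), (7) and (11)
    print(lhs, rhs)                      # identical up to floating-point rounding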
Finally, from the output neuron in Figure 3, the output of the fuzzy system can be extracted:

y = \sum_{j=1}^{m} \beta_j s_j    (12)

and, since s_j = v_j:

y = \sum_{j=1}^{m} \beta_j v_j    (13)
The inference system extracted from the neural net is similar to a zero-order Takagi-Sugeno model, except that here the fuzzy logic operator used to calculate the firing strength of each rule is an S-norm (OR) and not a T-norm (AND, product). However, for each S-norm there is a T-norm associated with it; that is, there is a fuzzy complement such that the three together satisfy DeMorgan's Law [7]. Specifically, we state that the S-norm s(a,b), the T-norm t(a,b) and the fuzzy complement c(a) form an associated class if:

c(s(a,b)) = t[c(a), c(b)]    (14)

and, as is well known, the T-norm associated with the Algebraic Sum is the Algebraic Product operator, defined as:

t(a,b) = a \times b    (15)
From (14), if the extracted fuzzy system is to use the algebraic product to calculate the rule firing strengths v_j, then the rule set must use the negation of all membership functions extracted in (10), i.e. "x_i is NOT A_i", and the new output of the FIS will be:

y = \sum_{j=1}^{m} \beta_j (1 - v_j)    (16)
Therefore, the fuzzy system represented by the rule set in (10) and the output in (13) can also be represented by the output in (16) and the rule set:

Rule R_j: If x_1 is Not A_1 and ... and x_i is Not A_i and ... and x_n is Not A_n then y_j = \beta_j    (17)

with the firing strength of each rule R_j given by:

v_j = (1 - \mu_{A_1}) \times ... \times (1 - \mu_{A_i}) \times ... \times (1 - \mu_{A_n})    (18)
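The chain (10)-(18) can be verified numerically; the following is a minimal sketch in Python with NumPy (network sizes and weights are illustrative) showing that the ANN output of (1)-(2), the OR-form FIS of (13) and the NOT/AND-form FIS of (16) coincide.

    import numpy as np

    f = lambda x: 1.0 - np.exp(-x)   # sigmoid basis approximation for x >= 0

    rng = np.random.default_rng(2)
    n, m = 3, 4                              # inputs and hidden neurons, no bias here
    x = rng.random(n)
    W = 2.3 + 5.0 * rng.random((n, m))
    beta = rng.normal(size=m)

    y_ann = beta @ f(W.T @ x)                # ANN output, equations (1)-(2), linear g

    mu = f(W * x[:, None])                   # mu_Ai = f(x_i w_ij), one column per rule
    v_or = 1.0 - np.prod(1.0 - mu, axis=0)   # S-norm firing strengths, equation (11)
    y_or = beta @ v_or                       # OR-form output, equation (13)

    v_and = np.prod(1.0 - mu, axis=0)        # "x_i is Not A_i" with the product T-norm, eq. (18)
    y_and = beta @ (1.0 - v_and)             # NOT/AND-form output, equation (16)

    print(y_ann, y_or, y_and)                # all three coincide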
The extracted fuzzy system and the ANN are equivalent, since their outputs are the same for any input. Figure 5 illustrates the extraction/transformation process for two rules.

[Figure 5 shows, for a two-input example, the OR-form rules "If x1 is A1 or x2 is A2 then y1 = \beta_1" and "If x1 is A3 or x2 is A4 then y2 = \beta_2", with firing strengths v_1 = 1 - (1 - \mu_{A1})(1 - \mu_{A2}) and v_2 = 1 - (1 - \mu_{A3})(1 - \mu_{A4}) and output y = v_1 \beta_1 + v_2 \beta_2, together with the equivalent AND-form rules "If x1 is Not A1 and x2 is Not A2 then y1 = \beta_1" and "If x1 is Not A3 and x2 is Not A4 then y2 = \beta_2", with v_1 = (1 - \mu_{A1})(1 - \mu_{A2}) and v_2 = (1 - \mu_{A3})(1 - \mu_{A4}) and output y = (1 - v_1)\beta_1 + (1 - v_2)\beta_2.]

Figure 5: Extraction/Transformation process for two rules.

4.4. Comments

The process explained so far contains the basic idea of the mapping of ANNs into FIS. However, the rule antecedents and consequents extracted from ANNs have to make sense: they must be meaningful and subject to interpretation. For this reason, some considerations are now presented so that the extracted rule set takes an appropriate form.
Condition 1: For each extracted membership function \mu_{A_i} = f(x_i w_{ij}), the weight w_{ij} can be seen as a scaling factor of f(.). Since f(2.3) \approx 0.9, the membership value reaches 0.9 at x_i = 2.3/w_{ij}, and the interpretation of the fuzzy set is "greater than 2.3/w_{ij}". This only makes sense if 0 < 2.3/w_{ij} \le 1, which leads to w_{ij} \ge 2.3. This consideration results from the usual practice of training an ANN with normalized inputs; all extracted membership functions therefore have to be defined on the respective input interval. With w_{ij} \ge 2.3 and 0 \le x_i \le 1, the correct use of equation (7) is guaranteed, since we will always have x_i w_{ij} \ge 0.

Condition 2: The consequent of the rule, whose value is a constant (singleton), has to lie within the output interval of the system. Therefore, a scaling change in \beta_{jk} (the extracted consequent) has to be made after the training of the net. This scaling change can be embedded in the
output (16) of the FIS.

4.5. ANN with bias

If a bias is used in the hidden neurons, equation (7) is rewritten as:

s_j = f\left(\sum_{i=1}^{n} x_i w_{ij} + \theta_j\right) = f(x_1 w_{1j}) * ... * f(x_n w_{nj}) * f(\theta_j)    (19)
Therefore, the firing strength of rule j as in (18) now becomes:

v_j = (1 - \mu_{A_1}) \times (1 - \mu_{A_2}) \times ... \times (1 - \mu_{A_n}) \times (1 - f(\theta_j))    (20)
If \theta_j > 0, we can consider the term 1 - f(\theta_j) as the weight of rule j; the set of rules as in (17) then takes the form:

Rule R_j: If x_1 is Not A_1 and ... and x_i is Not A_i and ... and x_n is Not A_n then y_j = \beta_j, with rule weight r_j = 1 - f(\theta_j)    (21)

4.6. ANN Learning Restrictions

The following restrictions have to be introduced in the learning algorithm of the ANN to guarantee Condition 1 and the use of the biases as rule weights:

a) All weights between the input layer and the hidden layer have to be greater than or equal to 2.3, that is, w_{ij} \ge 2.3.

b) All biases of the hidden neurons have to be greater than zero, that is, \theta_j > 0.

5. APPLICATION TO SPATIAL LOAD FORECASTING

Spatial Load Forecasting (SLF) methods have been developed to predict where, when and how much load growth will occur in a utility service area. They have been used to model the process of load growth in order to predict load evolution on a spatial and temporal basis, and they have proved their value in distribution expansion planning. One of the best ways to implement SLF methods is to use a Geographical Information System (GIS). The GIS is an ideal environment for SLF models because of its ability to manage spatial information, model and simulate the behaviour of the phenomena, visualise data and simulation results, and establish the interaction between the planner and the simulation environment.

To demonstrate the process of extracting a FIS from an ANN, we have selected the study region of the island of Santiago (Republic of Cabo Verde, Africa), illustrated in Figure 6 [8]-[9]. The region size is 39 km x 50.5 km. The resolution of the GIS spatial representation was square cells of 250 m, aggregated in a cell-based map with 31512 cells. Each cell contains information about influence factors, as inputs, and the potential for development, as the output. We have divided the data defined in each cell into a training set and a test set of equal size (15756 points each).
Figure 6: Island of Santiago, Republic of Cabo Verde, Africa.
5.1. ANN Results

An ANN with three normalized inputs (x1: distance to the main urban centre, x2: distance to roads, x3: distance to the secondary urban centre), 14 hidden neurons and one output neuron (potential for development, y \in [0, 5.4]) was trained with the training set of 15756 points. This data set was acquired directly from the map of the region stored in the GIS, together with data about demand development (number and type of consumers) at each point of the map. The objective of building the ANN was to derive a relationship between influence factors and potential for development that would later allow the use of the ANN for spatial demand forecasting, when applied to the same region under future scenarios or to different, although similar, regions.

The training of the ANN was made using a Backpropagation algorithm while guaranteeing the limit constraints on all weights w_{ij} and biases \theta_j. Table 1 shows all the weights and biases after the training of the ANN. Figure 7 shows the results of the ANN for the 15756 trained points. The average error of the ANN was 0.1048.

 j | w1j    | w2j    | w3j   | θj    | βj
 1 | 3.618  | 3.400  | 5.565 | 1.953 | 1.693
 2 | 28.058 | 9.419  | 25.48 | 0.279 | -26.1
 3 | 4.692  | 5.456  | 20.39 | 0.199 | 12.00
 4 | 32.687 | 42.014 | 23.63 | 0.002 | 32.88
 5 | 5.725  | 4.572  | 7.965 | 3.199 | 2.346
 6 | 6.662  | 2.571  | 17.32 | 0.828 | -6.654
 7 | 11.294 | 5.069  | 15.55 | 0.486 | 8.332
 8 | 5.789  | 3.542  | 5.240 | 3.579 | 3.375
 9 | 7.679  | 6.180  | 43.29 | 1.560 | 10.19
10 | 20.529 | 9.048  | 20.42 | 0.074 | -12.93
11 | 4.112  | 5.937  | 21.24 | 0.430 | 14.13
12 | 3.696  | 58.470 | 12.03 | 0.190 | -22.11
13 | 5.672  | 6.668  | 9.983 | 0.389 | -4.926
14 | 21.592 | 2.388  | 20.38 | 1.092 | -11.81

Table 1: Weights w_{ij} between the input and hidden layers, biases \theta_j of the hidden neurons, and weights \beta_j between the hidden layer and the output.
Figure 7: Potential for development in all cells in the map of the study region (x axis: no. of cells) for three-input neural networks - (+) target output and (*) ANN output.
5.2. Extraction of the FIS

With the ANN trained and all weights and biases obtained, we can initiate the process of extracting the rule set composing a FIS. The FIS will have three normalized inputs (distance to the main urban centre, distance to roads and distance to the secondary urban centre) and one output (potential for development); as the trained ANN has 14 neurons in its hidden layer, 14 rules will be extracted. Since the methodology to extract the rule antecedent from a hidden neuron is the same for all neurons in the hidden layer, we demonstrate the method only for the first neuron in the layer. For this neuron, from Table 1, the weights and bias are w_{11} = 3.618, w_{21} = 3.4, w_{31} = 5.565 and \theta_1 = 1.953. Then:

a) For w_{11} = 3.618 we can extract the membership function \mu_{A_1} = f(3.618 x_1), illustrated in Figure 8(a), whose interpretation is "x_1 is greater than 0.6356", since 2.3/3.618 = 0.6356.

b) For w_{21} = 3.4 we can extract the membership function \mu_{A_2} = f(3.4 x_2), illustrated in Figure 8(b), whose interpretation is "x_2 is greater than 0.676", since 2.3/3.4 = 0.676.

c) For w_{31} = 5.565 we can extract the membership function \mu_{A_3} = f(5.565 x_3), illustrated in Figure 8(c), whose interpretation is "x_3 is greater than 0.413", since 2.3/5.565 = 0.413.

d) For \theta_1 = 1.953 we can extract the weight of rule 1, that is, r_1 = 1 - f(1.953) = 0.1417.

Considering now the weights \beta \in [-9.882, 26.57] that will be used to extract the consequents of the rules, we must first map their values into the real output range of the system, that is, [-9.882, 26.57] \to [0, 5.4]. The scaling change is made through:

\bar{\beta}_j = 0.1481 \beta_j + 1.4634    (22)

Applying this scaling change to the \beta of Table 1, we find:

\bar{\beta} = [2.56  0  3.50  5.4  2.62  1.80  3.16  2.71  3.33  1.23  3.69  0.39  1.96  1.33]    (23)

With \bar{\beta}, and applying the method to extract all antecedents from the hidden neurons, the complete set of rules can be obtained. Table 2 shows the FIS extracted.
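The numbers in steps a)-d) and in (22) can be reproduced with a few lines; this is a sketch in Python with NumPy, using only the values quoted above from Table 1.

    import numpy as np

    f = lambda x: 1.0 - np.exp(-x)    # sigmoid basis approximation, x >= 0

    w = np.array([3.618, 3.400, 5.565])   # first hidden neuron, Table 1
    theta = 1.953

    print(2.3 / w)           # interpretation thresholds -> [0.636 0.676 0.413]
    print(1.0 - f(theta))    # rule weight r_1 = 1 - f(theta_1) -> 0.1417

    lo, hi = -9.882, 26.57   # quoted beta range, mapped to the output range [0, 5.4]
    a = 5.4 / (hi - lo)      # -> 0.1481, the slope in (22)
    b = -a * lo              # -> 1.4634, the intercept in (22)
    print(a, b)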
[Figure 8 shows the three extracted membership functions f(3.618 x1), f(3.4 x2) and f(5.565 x3), each reaching the membership value 0.9 at x1 = 0.636, x2 = 0.676 and x3 = 0.413, respectively.]

Figure 8: Membership functions extracted from the first hidden neuron. (a) w11 = 3.618 (b) w21 = 3.4 (c) w31 = 5.565.
Finally, from the output neuron of the ANN, the output of the FIS can be extracted as in (16); since the values of the weights \beta were changed to \bar{\beta}, this scaling change must be included in the system output, which results in:

y = 128.6382 + 9.8811 \sum_{j=1}^{14} v_j r_j - 6.75194 \sum_{j=1}^{14} \bar{\beta}_j v_j r_j    (24)

where the firing strengths v_j of the rules are calculated through the algebraic product as in (18) and r_j is the weight of rule j.
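Taking (24), Table 1 and the scaled consequents (23) at face value, the extracted FIS can be evaluated as sketched below (Python with NumPy; the sample input is arbitrary and the constants are those printed in (24), so this is illustrative only, not a reproduction of the authors' results).

    import numpy as np

    f = lambda x: 1.0 - np.exp(-x)   # sigmoid basis approximation, x >= 0

    # Weights w_ij and biases theta_j from Table 1, scaled consequents from (23):
    W = np.array([
        [3.618, 3.400, 5.565], [28.058, 9.419, 25.48], [4.692, 5.456, 20.39],
        [32.687, 42.014, 23.63], [5.725, 4.572, 7.965], [6.662, 2.571, 17.32],
        [11.294, 5.069, 15.55], [5.789, 3.542, 5.240], [7.679, 6.180, 43.29],
        [20.529, 9.048, 20.42], [4.112, 5.937, 21.24], [3.696, 58.470, 12.03],
        [5.672, 6.668, 9.983], [21.592, 2.388, 20.38]])
    theta = np.array([1.953, 0.279, 0.199, 0.002, 3.199, 0.828, 0.486,
                      3.579, 1.560, 0.074, 0.430, 0.190, 0.389, 1.092])
    beta_bar = np.array([2.56, 0.0, 3.50, 5.4, 2.62, 1.80, 3.16,
                         2.71, 3.33, 1.23, 3.69, 0.39, 1.96, 1.33])
    r = 1.0 - f(theta)               # rule weights, as in (21)

    def fis_output(x):
        # Evaluate the extracted FIS of (24) at a normalized input (x1, x2, x3).
        mu = f(W * x)                    # mu_Ai = f(w_ij x_i), one row per rule
        v = np.prod(1.0 - mu, axis=1)    # firing strengths, equation (18)
        return 128.6382 + 9.8811 * np.sum(v * r) - 6.75194 * np.sum(beta_bar * v * r)

    print(fis_output(np.array([0.2, 0.5, 0.3])))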
6. CONCLUSION

In this paper we have presented the derivation and definition of a transform that maps an ANN into a TS FIS. We have explained how to proceed in a practical case and have demonstrated how the transform makes explicit, and puts into light, knowledge that was hidden in the ANN architecture. The transformation methodology is based on the concept of f-duality, which allowed us to find the mathematical operation equivalent to a hidden neuron and which can be considered the foundation of the whole rule extraction process.
Rule j | IF x1 (distance to main urban centre) is not | AND x2 (distance to roads) is not | AND x3 (distance to secondary urban centre) is not | THEN potential for development is | Rule weight
 1 | greater than 0.63 | greater than 0.67 | greater than 0.41 | 2.56 | 0.14
 2 | greater than 0.08 | greater than 0.24 | greater than 0.09 | 0    | 0.75
 3 | greater than 0.49 | greater than 0.42 | greater than 0.11 | 3.50 | 0.81
 4 | greater than 0.07 | greater than 0.05 | greater than 0.09 | 5.4  | 0.99
 5 | greater than 0.40 | greater than 0.50 | greater than 0.29 | 2.62 | 0.03
 6 | greater than 0.34 | greater than 0.89 | greater than 0.13 | 1.80 | 0.43
 7 | greater than 0.20 | greater than 0.45 | greater than 0.14 | 3.16 | 0.61
 8 | greater than 0.40 | greater than 0.65 | greater than 0.43 | 2.71 | 0.02
 9 | greater than 0.29 | greater than 0.37 | greater than 0.05 | 3.33 | 0.21
10 | greater than 0.11 | greater than 0.25 | greater than 0.11 | 1.23 | 0.92
11 | greater than 0.55 | greater than 0.38 | greater than 0.10 | 3.69 | 0.65
12 | greater than 0.62 | greater than 0.03 | greater than 0.19 | 0.39 | 0.82
13 | greater than 0.40 | greater than 0.34 | greater than 0.23 | 1.96 | 0.67
14 | greater than 0.10 | greater than 0.96 | greater than 0.11 | 1.33 | 0.33

Table 2: Rules extracted from the ANN for the SLF problem.
It is important to emphasize that any method for the extraction of rules from an ANN is valuable only to the degree to which the extracted rules are meaningful and comprehensible to a human expert. Therefore, in order to give the extracted rules a more human-friendly form and logical sense, we had to introduce restrictions in the ANN learning algorithm and to make scaling changes in the extracted consequents.

For illustration, the Spatial Load Forecasting problem was chosen; we have demonstrated that the method for the extraction of the FIS is simple and direct. For this practical case we have obtained a FIS with only 12 rules. In [9] we constructed a FIS for the same problem using the well-known ANFIS (adaptive neuro-fuzzy inference system) and obtained 124 rules for the zero-order Takagi-Sugeno model. Both systems give good results, but in terms of computational complexity for practical implementation, working with the FIS extracted from the ANN is better than working with the FIS obtained from ANFIS. Besides, it becomes easier for the human specialist to understand and analyze a system with a smaller number of rules.

7. REFERENCES

[1] S. Mitra and Y. Hayashi, "Neuro-Fuzzy Rule Generation: Survey in Soft Computing Framework", IEEE Transactions on Neural Networks, vol. 11, no. 3, pp. 748-768, May 2000.
[2] D. Nauck, F. Klawonn and R. Kruse, "Foundations of Neuro-Fuzzy Systems", Wiley, 1997, ISBN 0-471-97151-0.
[3] J. M. Benitez, J. L. Castro and I. Requena, "Are Artificial Neural Networks Black Boxes?", IEEE Transactions on Neural Networks, vol. 8, no. 5, pp. 1156-1164, September 1997.
[4] J. S. R. Jang and C. T. Sun, "Functional Equivalence between Radial Basis Function Networks and Fuzzy Inference Systems", IEEE Transactions on Neural Networks, vol. 4, no. 1, pp. 156-159, January 1993.
[5] F. Scarselli and A. C. Tsoi, "Universal Approximation using Feedforward Neural Networks: A Survey of Some Existing Methods, and Some New Results", Neural Networks, vol. 11, no. 1, pp. 15-37, 1998.
[6] A. Pinkus, "Approximation Theory of the MLP Model in Neural Networks", Acta Numerica, pp. 143-195, Cambridge University Press, 1999.
[7] L. X. Wang, "A Course in Fuzzy Systems and Control", Prentice-Hall International, 1997, ISBN 0-13-593005-7.
[8] C. Monteiro, V. Miranda and T. P. Leão, "Scenario Identification Process on Spatial Load Forecasting", Proceedings of PMAPS 2000, 6th Int. Conference on Probabilistic Methods Applied to Power Systems, ed. INESC Porto, Funchal, Portugal, September 2000.
[9] T. Konjic, I. Kapetanovic, V. Miranda and A. Castro, "Uncertainty in Spatial Load Forecasting Models - A Comparison of Neural Networks and Several Fuzzy Inference Systems", Proceedings of RIMAPS 2001, Euro Conf. on Risk Management in Power System Planning and Operation, Porto, Portugal, September 2001.
ANNEX - Proof of Lemma 2

Only two input variables in the domain X will be considered to prove Lemma 2; the generalization to n inputs is immediate. Let a, b \in [0, 1[ and let x_1, x_2 \in \Re, x_1, x_2 \ge 0, such that a = f(x_1) and b = f(x_2). For the sigmoid basis approximation function defined in equation (4) we then have:

x_1 = -\ln(1 - f(x_1)) = -\ln(1 - a)
x_2 = -\ln(1 - f(x_2)) = -\ln(1 - b)

so that

x_1 + x_2 = -\ln(1 - a) - \ln(1 - b) = -\ln[(1 - a)(1 - b)]

On the other hand, for x = x_1 + x_2,

x_1 + x_2 = -\ln(1 - f(x_1 + x_2))

and therefore

-\ln[(1 - a)(1 - b)] = -\ln(1 - f(x_1 + x_2))
(1 - a)(1 - b) = 1 - f(x_1 + x_2)
f(x_1 + x_2) = 1 - (1 - a)(1 - b)

so that

f(x_1 + x_2) = f(x_1) * f(x_2) = a * b = 1 - (1 - a)(1 - b)

Generalizing to n inputs proves Lemma 2.