On Missing Values and Fuzzy Rules

Michael R. Berthold and Klaus-Peter Huber
Institute for Computer Design and Fault Tolerance (Prof. D. Schmid)
University of Karlsruhe, Am Zirkel 2, 76128 Karlsruhe, Germany
eMail: [Michael.Berthold,KlausPeter.Huber]@Informatik.Uni-Karlsruhe.DE
Abstract
Numerous learning tasks involve incomplete or conflicting attributes. Most algorithms that automatically find a set of fuzzy rules are not well suited to tolerate missing values in the input vector, and the usual technique of substituting missing values with their mean or another constant value can be quite harmful. In this paper a technique is proposed to tolerate missing values during the classification process as well as during training. This is achieved by using the evolving model to predict the most plausible value for the missing attribute, resulting in a "best guess" for the feature vector which is then used to further adjust (or train) the set of fuzzy rules.
1 Introduction

Most learning algorithms in practical applications have to deal with missing values, due to unrecorded information or occlusion of features, for example in vision applications. Often a missing attribute is simply replaced by its mean. In this paper an approach is presented that uses the evolving model to estimate a more reasonable value for each missing attribute, thereby using the learned knowledge to produce a "best guess". In contrast to the work presented in [1] and [3], a fuzzy rule based approach is used. This simplifies the resulting mathematics and allows the best guess for the missing values to be estimated efficiently. Few restrictions on the type of fuzzy system are assumed, i.e. rules have to be normalized and standard min/max inference is required. For most practical applications these restrictions are of no relevance.
2 Fuzzy Rule Based Classifiers
A fuzzy rule based classifier (see [4]) consists of a set of rules R_i^k for each class k (1 \le k \le c, 1 \le i \le m_k, where c denotes the number of classes and m_k indicates the number of rules of class k). If the feature space is of dimensionality n, each feature vector \vec{x} consists of n attributes: \vec{x} = (x_1, \ldots, x_n), x_i \in \mathbb{R} (\mathbb{R} denotes the domain of the attributes, usually the real numbers). A rule's activity \mu_i^k(\vec{x}) indicates the degree of membership of the pattern \vec{x} to the corresponding rule R_i^k; the degree of membership for the entire class is computed using:

\mu^k(\vec{x}) = \max_{1 \le i \le m_k} \mu_i^k(\vec{x})    (1)
Using the normal notation of fuzzy rules, each rule can be decomposed into n individual, one-dimensional membership functions:

\mu_i^k(\vec{x}) = \min_{j=1,\ldots,n} \mu_{i,j}^k(x_j)    (2)

where \mu_{i,j}^k indicates the projection of \mu_i^k onto the j-th attribute. In addition, we assume normalized, one-dimensional membership functions:

\forall x_j \in \mathbb{R}: \; 0 \le \mu_{i,j}^k(x_j) \le 1 \quad \text{and} \quad \exists x_j \in \mathbb{R}: \; \mu_{i,j}^k(x_j) = 1    (3)
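As an illustration, equations 1-3 can be sketched in a few lines of Python. All names here are hypothetical; trapezoidal membership functions (as used later in Figure 1) are assumed:

```python
def trapezoid(x, a, b, c, d):
    """Normalized trapezoidal membership: support [a, d], core [b, c] (eq. 3)."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def rule_activity(x, rule):
    """mu_i^k(x) = min over the one-dimensional memberships (eq. 2)."""
    return min(trapezoid(xj, *mf) for xj, mf in zip(x, rule))

def class_membership(x, rules):
    """mu^k(x) = max over all rules of one class (eq. 1)."""
    return max(rule_activity(x, r) for r in rules)

# two rules of one class, each over two attributes, as (a, b, c, d) trapezoids
rules_k = [
    [(0.0, 1.0, 2.0, 3.0), (0.0, 1.0, 2.0, 3.0)],
    [(2.0, 3.0, 4.0, 5.0), (2.0, 3.0, 4.0, 5.0)],
]
print(class_membership((1.5, 1.5), rules_k))  # inside the first core -> 1.0
```

The min in equation 2 makes a rule a conjunction over its attributes; the max in equation 1 makes a class the disjunction of its rules.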
3 Classification with Missing Values

For the case of missing input values, i.e. attributes where no information about their true value is available, the feature vector can be written as: \vec{x} = (\vec{v}, ?), where \vec{v} = (v_1, \ldots, v_{n_v}) represents the part of the vector with the known values and ? replaces \vec{u} = (u_1, \ldots, u_{n_u}), the vector of unknown attributes (n_v + n_u = n). Using equation 2 the activation of one rule can be obtained through:

\mu_i^k(\vec{v}, ?) = \sup_{\vec{u} \in \mathbb{R}^{n_u}} \mu_i^k(\vec{v}, \vec{u}) = \min\Big( \min_{1 \le j \le n_v} \mu_{i,j}^k(v_j), \; \min_{1 \le l \le n_u} \sup_{u_l \in \mathbb{R}} \{ \mu_{i,l}^k(u_l) \} \Big)    (4)
Since we are dealing with normalized fuzzy rules (equation 3), \mu_{i,l}^k will have a maximum value of 1 somewhere on \mathbb{R}, and this equation can be simplified to:

\mu_i^k(\vec{v}, ?) = \min\Big( \min_{j=1,\ldots,n_v} \mu_{i,j}^k(v_j), \; 1 \Big) \overset{eq.\,3}{=} \min_{j=1,\ldots,n_v} \mu_{i,j}^k(v_j)    (5)

independent of \vec{u}, i.e. it is sufficient to compute the activation of each rule in the projection of \mathbb{R}^n onto \mathbb{R}^{n_v}. The results obtained here are similar to the results obtained by [1] for Gaussian Basis Functions but are more efficient to compute. In the case of fuzzy rules the only assumption made is that the rule's activity is composed of one-dimensional normalized activation functions.
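Equation 5 says that a missing attribute simply drops out of the min, because its normalized membership has supremum 1. A minimal sketch (hypothetical names; missing values are encoded as None, and memberships are passed as callables):

```python
def rule_activity(x, mfs):
    """mu_i^k(v, ?): min over the *known* attributes only (eq. 5)."""
    degrees = [mf(xj) for xj, mf in zip(x, mfs) if xj is not None]
    return min(degrees) if degrees else 1.0  # all attributes missing -> sup = 1

# one rule with two triangular memberships peaking at 1.0 and 3.0
mfs = [lambda x: max(0.0, 1.0 - abs(x - 1.0)),
       lambda x: max(0.0, 1.0 - abs(x - 3.0))]

print(rule_activity((1.0, 3.0), mfs))   # both known, both at their peak -> 1.0
print(rule_activity((1.0, None), mfs))  # second attribute missing -> still 1.0
print(rule_activity((0.5, None), mfs))  # only the first attribute counts -> 0.5
```

Note that no substitute value is ever computed for classification; the missing dimension is marginalized out exactly.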
4 Training with Missing Values

Training with missing values is usually done by substituting the mean or another fixed value for each missing attribute. In [3] it was demonstrated how this can lead to severe problems during training. The approach presented here uses the model that evolves during training to predict the most plausible value for each of the missing attributes, resulting in the so-called "best guess". If the training pattern (\vec{x}, k) (input \vec{x}, class k) has missing values, the same notation as above can be used: \vec{x} = (\vec{v}, \vec{u}). Training is now done in two steps: first the best guess for \vec{u} is determined, based on the known values \vec{v}; then normal training is executed, using the best guess for \vec{u} to complete \vec{x}. Usually finding the best guess is computationally expensive, since the maximum of the model function has to be found in \mathbb{R}^{n_u}. In the case of fuzzy rules it becomes easier; the maximum of \mu^k(\vec{v}, \vec{u}) with respect to \vec{u} has to be found:

\sup_{\vec{u} \in \mathbb{R}^{n_u}} \mu^k(\vec{v}, \vec{u}) \overset{eq.\,1}{=} \sup_{\vec{u} \in \mathbb{R}^{n_u}} \max_{1 \le i \le m_k} \mu_i^k(\vec{v}, \vec{u}) = \max_{1 \le i \le m_k} \sup_{\vec{u} \in \mathbb{R}^{n_u}} \mu_i^k(\vec{v}, \vec{u}) \overset{eq.\,4}{=} \max_{1 \le i \le m_k} \mu_i^k(\vec{v}, ?)    (6)
This results in the rule with the highest degree of membership being selected:

i_{max} = \arg\max_{1 \le i \le m_k} \mu_i^k(\vec{v}, ?) \overset{eq.\,5}{=} \arg\max_{1 \le i \le m_k} \min_{1 \le j \le n_v} \mu_{i,j}^k(v_j)    (7)
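Equation 7 reduces rule selection to an arg max over the activations on the known attributes. A sketch under the same hypothetical conventions as above (memberships as callables, one rule = one list of per-attribute memberships):

```python
def best_rule(v, rules):
    """i_max = arg max_i min_j mu_{i,j}^k(v_j) over the known attributes (eq. 7)."""
    activities = [min(mf(vj) for vj, mf in zip(v, mfs)) for mfs in rules]
    return max(range(len(rules)), key=lambda i: activities[i])

# two rules over one known attribute, triangular memberships at 1.0 and 4.0
rules = [[lambda x: max(0.0, 1.0 - abs(x - 1.0))],
         [lambda x: max(0.0, 1.0 - abs(x - 4.0))]]
print(best_rule((3.5,), rules))  # the known value is closer to 4.0 -> index 1
```

Ties (several maxima) are broken arbitrarily here by Python's max, which matches the remark following the equation.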
In the case of several maxima one of them can be chosen arbitrarily. The restriction of the best rule to \vec{v} leads to a not necessarily normalized fuzzy rule in \mathbb{R}^{n_u} with a subspace A_{max}(\vec{u}|\vec{v}) of maximal degree of membership under the condition \vec{v}:

A_{max}(\vec{u}|\vec{v}) = \{ \vec{u} \mid \forall \vec{w} \in \mathbb{R}^{n_u}: \; \mu_{i_{max}}^k(\vec{v}, \vec{u}) \ge \mu_{i_{max}}^k(\vec{v}, \vec{w}) \}    (8)

Using equation 4 this can be simplified to:

A_{max}(\vec{u}|\vec{v}) = \{ \vec{u} \mid \mu_{i_{max}}^k(\vec{v}, \vec{u}) \ge \mu_{i_{max}}^k(\vec{v}, ?) \}    (9)
and \mu_{i_{max}}^k(\vec{v}, ?) is the activation of the rule using the missing values (see equation 5). A_{max}(\vec{u}|\vec{v}) describes a hyper-rectangle in n_u-dimensional space that is equal to the \alpha-cut of \min_{1 \le l \le n_u} \mu_{i_{max},l}^k(u_l) using \alpha = \mu_{i_{max}}^k(\vec{v}, ?) and is therefore easy to obtain. Now the "best" vector (\vec{v}, \vec{u}_{max}) can be computed, using a typical defuzzification technique, for example the Center of Gravity:

\vec{u}_{max} = \frac{\int_{\vec{u} \in A_{max}} \vec{u} \; \mu_{i_{max}}^k(\vec{v}, \vec{u}) \, d\vec{u}}{\int_{\vec{u} \in A_{max}} \mu_{i_{max}}^k(\vec{v}, \vec{u}) \, d\vec{u}}    (10)
Figure 1 illustrates this procedure using a two-dimensional example with one known and one unknown attribute. For the activation function a trapezoidal membership function is used, featuring a core region (\mu = 1) and a support region (0 < \mu < 1); for more details see [2]. Training can now be performed using the input vector (\vec{v}, \vec{u}_{max}) without any missing values.
Figure 1: The proposed way to find a best guess for a missing value so that a standard training algorithm can be used. The attribute v has the value a, u is unknown. Using the described method for classification with missing attributes, \mu(?, a) can be obtained, resulting in an area of maximum activity A_{max}. The "best guess" for u is found using the Center of Gravity method, resulting in u_{max}. The completed vector (u_{max}, a) can now be used for conventional training.
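For a single missing attribute with a trapezoidal membership, the whole chain of equations 8-10 can be sketched numerically: the \alpha-cut of the trapezoid is an interval, and the Center of Gravity over that interval yields u_max. Parameters and names below are illustrative, not the paper's implementation:

```python
def trapezoid(x, a, b, c, d):
    """Normalized trapezoid: support [a, d], core [b, c]."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def best_guess(alpha, a, b, c, d, steps=100000):
    """Center of gravity of mu over its alpha-cut (eqs. 8-10), alpha in (0, 1]."""
    lo = a + alpha * (b - a)   # alpha-cut boundary on the rising edge
    hi = d - alpha * (d - c)   # alpha-cut boundary on the falling edge
    num = den = 0.0            # midpoint-rule integration over A_max
    for i in range(steps):
        u = lo + (hi - lo) * (i + 0.5) / steps
        w = trapezoid(u, a, b, c, d)
        num += u * w
        den += w
    return num / den

# symmetric trapezoid: the best guess is its center, regardless of alpha
print(round(best_guess(0.5, 0.0, 1.0, 3.0, 4.0), 3))  # -> 2.0
```

With \alpha from equation 9 (the activation on the known attributes), a high activation narrows A_max toward the core, while \alpha near 0 widens it toward the full support, where the guess approaches the membership's center of mass, consistent with the mean-substitution special case noted in the discussion.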
5 Discussion

A methodology was proposed to deal with missing inputs during classification and training using sets of fuzzy rules, under the usual assumptions about fuzzy numbers (i.e. max/min inference and normalized rules). During incremental training, the evolving set of fuzzy rules is used to predict the most probable value for each missing attribute, thus resulting in a "best guess" for the completed feature vector during training. The often used method of replacing unknown values by their mean is contained as a special case, when no knowledge about the underlying model is available, i.e. at the beginning of the training process. In contrast to other methods, the estimation of the "best guess" is computationally inexpensive, due to the min/max nature of fuzzy systems. Future plans involve applying the proposed method to a constructive algorithm that automatically finds a set of fuzzy rules based on training examples (see [2]).
Acknowledgements

Thanks go to Prof. D. Schmid for his support and the opportunity to work on this project. Thanks also to Ingrid Fischer and Frank Mayer for many helpful remarks.
References

[1] Subutai Ahmad and Volker Tresp. Some solutions to the missing feature problem in vision. In Stephen J. Hanson, Jack D. Cowan, and C. Lee Giles, editors, Advances in Neural Information Processing Systems 5, pages 393-400, California, 1993. Morgan Kaufmann.

[2] Klaus-Peter Huber and Michael R. Berthold. Building precise classifiers with automatic rule extraction. In IEEE International Conference on Neural Networks, volume 3, pages 1263-1268, 1995.

[3] Volker Tresp, Subutai Ahmad, and Ralph Neuneier. Training neural networks with deficient data. In Jack D. Cowan, Gerald Tesauro, and Joshua Alspector, editors, Advances in Neural Information Processing Systems 6, pages 128-135, California, 1994. Morgan Kaufmann.

[4] Lotfi A. Zadeh. Soft computing and fuzzy logic. IEEE Software, pages 48-56, November 1994.