Membership Functions Generation Based on Density Function

Imen Derbel, Narjes Hachani, Habib Ounelli
Faculty of Sciences of Tunis El Manar1, Campus Universitaire, 2092, Tunis, Tunisia
Abstract
Fuzzy membership functions are considered a key element of fuzzy systems. There are two potential sources for generating a fuzzy membership function: expert knowledge and real data. However, expert knowledge acquisition is a difficult issue, while using real data requires a methodology to translate the data into membership functions. Most previous approaches consider membership function design to be highly dependent on the fuzzy rule base, and they require the number of membership functions to be specified. This paper attempts to overcome these problems and proposes an automatic membership function generation method. Our approach is based on a clustering technique and a density function for deriving the cores of fuzzy sets. Experimental results show that our approach generates large core regions. Large core regions are preferable to small ones when generating membership functions for neuro-fuzzy systems.

1. Introduction
Fuzzy set theory was introduced by Zadeh [16] as a way to capture uncertainty and vagueness in several systems, such as fuzzy database systems. In these systems, imprecision is generally expressed using fuzzy linguistic terms, which are usually defined as fuzzy sets. Each fuzzy set is characterized by its membership function; therefore, these functions must be carefully defined. Most previous attempts at deriving membership functions require expert knowledge of the application area [11]. However, these methods suffer from the problems of knowledge acquisition and subjectivity [7]. The approach proposed in this paper intends to bridge this gap. It provides an automatic generation of trapezoidal membership functions based on a density function. Our approach has several advantages:
• Reflection of the knowledge contained in the data.
• Automatic identification of the optimal number of membership functions.
• Independence from fuzzy rules and from any other component of the fuzzy system.
• Use of a density function to identify the core of a membership function.
• Applicability of the generated membership functions in several fields, such as flexible querying of databases.

The rest of this paper is organized as follows. Section 2 introduces the basic notions of fuzzy sets. Section 3 describes the main steps of the proposed method for generating fuzzy membership functions. Section 4 presents experimental results. Section 5 compares our method to similar approaches proposed in the literature. Finally, Section 6 recalls the main points of this paper and outlines directions for future work.
2. Fuzzy Set Theory

2.1. Fuzzy Sets
A fuzzy set A [8] over a universe of discourse X is a set of pairs:
$$A = \{(x, \mu_A(x))\} \quad \text{such that } x \in X,\ \mu_A(x) \in [0, 1]$$
where μA(x) is called the membership degree of the element x to the fuzzy set A. This degree ranges between the extremes 0 and 1:
• μA(x) = 0 indicates that x in no way belongs to the fuzzy set A.
• μA(x) = 1 indicates that x completely belongs to the fuzzy set A.

A trapezoidal membership function [8] is described by four parameters: its lower limit a, its upper limit d, and the lower and upper limits of its core, b and c respectively. These parameters are shown in Figure 1. The function is given by the following expression:
$$\mu_{\text{trapezoid}}(x) = \begin{cases} 1 & b \le x \le c \\ \dfrac{x-a}{b-a} & a \le x \le b \\ \dfrac{d-x}{d-c} & c \le x \le d \\ 0 & \text{otherwise} \end{cases}$$

Figure 1. Parameters of a trapezoidal membership function

2.2. Characteristics of Fuzzy Sets
1. The core of a fuzzy set A defined over X is the subset of that universe characterized by complete, full membership in the set. It complies with:
$$\mathrm{Core}_A = \{x \in X \mid \mu_A(x) = 1\}$$
2. The support of a fuzzy set A defined over X is the subset of that universe characterized by nonzero membership in the set. It complies with:
$$\mathrm{Supp}_A = \{x \in X \mid \mu_A(x) > 0\}$$
3. The height of a fuzzy set A defined over X complies with:
$$\mathrm{Hgt}_A = \sup_{x \in X} \mu_A(x)$$

3. Membership Function Derivation
The design of membership functions raises three crucial issues: their shape, their number and their parameters.

3.1. Choice of Membership Function Shape
There are several types of membership functions; the most commonly used are trapezoidal and triangular ones. We are interested in trapezoidal functions for two main reasons. First, the trapezoidal function is known for its popularity and simplicity [11]. Second, a recent study [12] has shown that trapezoidal membership functions are more reliable than triangular ones.

3.2. Determination of Membership Function Number
Our approach is based on a clustering method that automatically generates the optimal number of clusters, and hence the number of membership functions. Each obtained cluster is represented by a fuzzy set described by a trapezoidal membership function. We propose to use the clustering method CLUSTERDB* [6], which improves the CLUSTER algorithm [2] by means of the validity index DB*. CLUSTERDB* first constructs an initial relative neighborhood graph (RNG). It then tries to divide this graph into several subgraphs, based on a dynamically computed threshold. This process is applied iteratively to each obtained subgraph until a stopping criterion is reached. The algorithm of our approach is composed of two major steps: clustering and constructing membership functions. In our clustering method, a cluster C is defined as a subgraph (X, E), where X is the set of vertices and E is the set of edges. The weight of an edge connecting two vertices xi and xj is the Euclidean distance between them, noted d(xi, xj).
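CLUSTERDB* is not described here in enough detail to reproduce. Its starting point, the relative neighborhood graph, is a standard construction. Two points p and q are joined unless some third point r is closer to both of them than they are to each other. The brute-force Python sketch below builds that graph with Euclidean edge weights in O(n³) time. The function name `relative_neighborhood_graph` is hypothetical. The sketch does not implement the dynamic threshold or the DB*-based splitting of CLUSTERDB*.

```python
import math
from itertools import combinations


def relative_neighborhood_graph(points):
    """Return the RNG edges as {(i, j): d(x_i, x_j)}, computed by brute force.

    `points` is a list of equal-length coordinate tuples (1-D data uses 1-tuples).
    """
    edges = {}
    for i, j in combinations(range(len(points)), 2):
        d_ij = math.dist(points[i], points[j])
        # The pair (i, j) is NOT linked if some third point k lies in their "lune",
        # i.e. k is strictly closer to both endpoints than they are to each other.
        in_lune = any(
            max(math.dist(points[i], points[k]), math.dist(points[j], points[k])) < d_ij
            for k in range(len(points))
            if k not in (i, j)
        )
        if not in_lune:
            edges[(i, j)] = d_ij  # edge weight: Euclidean distance d(x_i, x_j)
    return edges


if __name__ == "__main__":
    values = [(1.0,), (2.0,), (4.0,), (8.0,)]
    print(relative_neighborhood_graph(values))  # {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 4.0}
```

For one-dimensional attributes, such as the age attribute used in Section 4, the RNG links only consecutive distinct values. This is why the example above yields a simple chain.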
3.3. Identification of Membership Functions Parameters
Generating a membership function consists of generating its core and its support.

Generation of the core
The core includes the objects that best characterize a cluster. It can therefore be represented by the dense part of the cluster, composed of the cluster's centroid Ce and the set of its dense neighbors.

Computation of the centroid
The centroid of the cluster is represented by the average value of the cluster. If this value is not one of the cluster's objects, we consider its nearest neighbor instead. This method is formalized by Algorithm 1.

Algorithm 1: SearchCentroid
Input: a cluster C = (X, E), where X is the data set and E is the set of edges
Output: the cluster's centroid Ce
begin
    xi ← the value of the ith vertex in C
    Avg ← (Σ_{i=1}^{|X|} xi) / |X|
    if (∃ xi ∈ X such that xi = Avg) then
        Ce ← xi
    else
        Ce ← xi such that ∀ xj ∈ X, d(Avg, xi) ≤ d(Avg, xj)
end

Determination of Centroid's Neighbors
We first introduce some basic definitions that are used to identify the centroid's neighbors. We then present the algorithm describing this method.
• The diameter of a cluster is the maximum distance between two objects of the cluster. Let C be a cluster and d(xi, xj) the distance between two objects xi and xj of C:
$$\mathrm{Diam}_C = \max_{x_i, x_j \in C} d(x_i, x_j)$$
• Let xi and xj be two vertices of a cluster C. xj is a direct neighbor of xi if there exists an edge connecting xi and xj. The set of direct neighbors of a vertex xi is defined as V(xi) = {xj ∈ C such that xj is a direct neighbor of xi}.
• The density of a vertex xi ∈ C is defined by the following expression [5]:
$$De(x_i) = \frac{\mathrm{Diam}_C - \dfrac{1}{Dg(x_i)} \displaystyle\sum_{x_j \in V(x_i)} d(x_i, x_j)}{\mathrm{Diam}_C} \tag{1}$$
where Dg(xi) is the cardinality of V(xi). De(xi) has a high value when the elements of V(xi) are close to xi.
• The minimal (respectively maximal) density in a cluster C is denoted DminC (respectively DmaxC) and defined as:
$$D^{\min}_C = \min_{x_i \in C} De(x_i), \qquad D^{\max}_C = \max_{x_i \in C} De(x_i)$$
• A dense vertex of a cluster is an object whose density value is greater than the density threshold of the cluster, thresh = (DminC + DmaxC)/2.
• A vertex xi is considered a neighbor of another vertex xj if it is a direct neighbor of xj and it is a dense vertex.

The retrieval of the centroid's neighbors consists of two main phases:
• We determine the direct neighbors of the centroid.
• For each obtained neighbor, we search its direct neighbors. This search is repeated iteratively until no new neighbor is found.
The set of all obtained neighbors represents the centroid's neighbors. Algorithm 2 details these phases.

Algorithm 2: Core generation
Input: a cluster C = (X, E); Ce: the centroid of C
Output: ClustercoreC: the core of the cluster C
begin
    ClustercoreC ← {Ce}
    DN ← directneighbors(Ce)
    thresh ← (DminC + DmaxC)/2
    for each xi ∈ DN do
        if xi ∉ ClustercoreC then
            if De(xi) ≥ thresh then
                ClustercoreC ← ClustercoreC ∪ {xi}
                DN ← DN ∪ directneighbors(xi)
end
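Eq. (1) and Algorithms 1 and 2 can be sketched in Python for a one-dimensional cluster. The sketch makes the following assumptions:
• The cluster's graph is supplied as an adjacency list. Here it is a chain linking consecutive sorted values, which is what the RNG yields for distinct 1-D values.
• Diam_C is computed as max minus min, which equals the largest pairwise distance in one dimension.
• As in Algorithm 2, a vertex joins the core when its density is at least the threshold, and the centroid is always included.
The function names `density`, `search_centroid` and `core_generation` are hypothetical.

```python
from statistics import fmean


def density(i, values, adjacency, diam):
    """De(x_i) from Eq. (1): (Diam_C - mean distance to V(x_i)) / Diam_C."""
    neighbours = adjacency[i]
    mean_dist = sum(abs(values[i] - values[j]) for j in neighbours) / len(neighbours)
    return (diam - mean_dist) / diam


def search_centroid(values):
    """Algorithm 1: index of the value equal to, or nearest to, the cluster mean."""
    avg = fmean(values)
    return min(range(len(values)), key=lambda i: abs(values[i] - avg))


def core_generation(values, adjacency):
    """Algorithm 2: grow the core from the centroid through dense direct neighbours.

    Assumes a 1-D cluster with at least two distinct values (so Diam_C > 0)
    and at least one direct neighbour per vertex.
    """
    diam = max(values) - min(values)                 # Diam_C for 1-D data
    dens = [density(i, values, adjacency, diam) for i in range(len(values))]
    thresh = (min(dens) + max(dens)) / 2             # (Dmin_C + Dmax_C) / 2
    centroid = search_centroid(values)
    core = {centroid}
    frontier = list(adjacency[centroid])             # DN: direct neighbours of the centroid
    while frontier:
        i = frontier.pop()
        if i not in core and dens[i] >= thresh:      # dense vertex not yet in the core
            core.add(i)
            frontier.extend(adjacency[i])            # DN <- DN U directneighbors(x_i)
    core_values = [values[i] for i in core]
    return min(core_values), max(core_values)        # core interval [b, c]


if __name__ == "__main__":
    values = [10, 11, 12, 13, 14, 17, 25]
    # Chain graph: each value is linked to its sorted predecessor and successor.
    adjacency = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(values)]
                 for i in range(len(values))}
    print(core_generation(values, adjacency))  # (10, 14)
```

In this example, the isolated values 17 and 25 have densities below the threshold of 0.7, so the core stops at 14. The result does not depend on the order in which the frontier is explored. The core is always the set of dense vertices reachable from the centroid through dense vertices.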
Generating the Support of the Fuzzy Set
Based on the obtained cores, we define the support of the membership function of each fuzzy set. Suppose we want to construct the membership functions of the fuzzy sets for the jth quantitative attribute, whose range is [minj, maxj]. Let [bij, cij] be the core of fuzzy set i for this jth attribute. The supports of the membership functions are determined as follows:
• For the fuzzy set corresponding to the first cluster C1j, with core [b1j, c1j], the support is given by:
$$\mathrm{Support}(C_{1j}) = [\min_j,\ b_{2j}]$$
• For the fuzzy set with core [bij, cij], 1 < i < k, the support is defined as follows:
$$\mathrm{Support}(C_{ij}) = [c_{(i-1)j},\ b_{(i+1)j}]$$
• For the fuzzy set associated with the last cluster Ckj, with core [bkj, ckj], the support of the membership function is given by:
$$\mathrm{Support}(C_{kj}) = [c_{(k-1)j},\ \max_j]$$
The membership functions associated with a partition composed of k clusters are shown in Figure 2.

Figure 2. Membership function parameters.

4. Experiments
In this section, we present the experimental results of membership function elicitation. The data sets used in this experimentation are:
1. Books: over 400 book prices collected from "www.amazon.com". This data set contains two clusters.
2. Census Income DB [4]: 606 objects. We are interested in the age attribute, which yields three clusters.
3. Thyroid DB [4]: a selection of 5723 objects from the thyroid disease records supplied by the Garavan Institute and J. Ross. We consider the TSH (Thyroid Stimulating Hormone) attribute, which yields two clusters.

4.1. Experimental Results
The experimental results are summarized in Tables 1, 2 and 3.

Table 1. Membership function parameters for Census Income DB
| Clusters | Core | MF's parameters |
|---|---|---|
| Cluster 1: [1, 16] | 1..13 | 1, 13, 25 |
| Cluster 2: [25, 73] | 25..63 | 13, 25, 63, 81 |
| Cluster 3: [81, 90] | 81..90 | 63, 81, 90 |

Table 2. Membership function parameters for Pima Diabetes DB
| Clusters | Core | MF's parameters |
|---|---|---|
| Cluster 1: [44, 74] | 56..74 | 44, 74, 100 |
| Cluster 2: [100, 199] | 100..183 | 74, 100, 199 |

Table 3. Membership function parameters for Thyroid DB
| Clusters | Core | MF's parameters |
|---|---|---|
| Cluster 1: [0.002, 288] | 0.002..236 | 0.002, 236, 430 |
| Cluster 2: [400, 530] | 430..472 | 236, 430, 530 |

Figure 3 shows the graphical representation of the membership functions associated with Census Income DB. Based on Tables 1, 2 and 3, we deduce that the clustering method used determines the optimal number of clusters, and hence the adequate number of membership functions. The results also show that the generated trapezoidal membership functions have large core regions. Such functions are considered more appropriate than functions with small core regions in the context of neuro-fuzzy systems [12]. Hence, this method is valuable for reducing the time and effort needed to develop a fuzzy expert system. Its cost depends on computer execution time rather than on human experts, whose involvement is time-consuming.
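The support rules of Section 3.3 can be checked against the tables with a small Python sketch. It assumes that the cores are sorted along the attribute. It also reads the three-value entries of Tables 1, 2 and 3 as shoulder functions: the plateau extends to minj for the first fuzzy set and to maxj for the last one. This reading of the tables is an assumption, and the function names are hypothetical. Each fuzzy set is returned as a trapezoid (a, b, c, d) and evaluated with the expression of Section 2.1.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership degree (Section 2.1); a == b or c == d gives a shoulder."""
    if b <= x <= c:
        return 1.0
    if a <= x < b:            # rising edge; b > a is guaranteed here
        return (x - a) / (b - a)
    if c < x <= d:            # falling edge; d > c is guaranteed here
        return (d - x) / (d - c)
    return 0.0


def membership_parameters(cores, lo, hi):
    """Map sorted cores [(b_i, c_i), ...] of an attribute with range [lo, hi]
    to trapezoids (a, b, c, d) whose supports follow Section 3.3:
    [min_j, b_2j], [c_(i-1)j, b_(i+1)j], ..., [c_(k-1)j, max_j]."""
    k = len(cores)
    params = []
    for i, (b, c) in enumerate(cores):
        a = lo if i == 0 else cores[i - 1][1]        # min_j or c_(i-1)j
        d = hi if i == k - 1 else cores[i + 1][0]    # max_j or b_(i+1)j
        if i == 0:
            b = lo                                   # assumed left shoulder
        if i == k - 1:
            c = hi                                   # assumed right shoulder
        params.append((a, b, c, d))
    return params


if __name__ == "__main__":
    # Cores and range [1, 90] of the age attribute, taken from Table 1.
    census = membership_parameters([(1, 13), (25, 63), (81, 90)], lo=1, hi=90)
    print(census)                               # [(1, 1, 13, 25), (13, 25, 63, 81), (63, 81, 90, 90)]
    print([trapezoid(19, *p) for p in census])  # [0.5, 0.5, 0.0]
```

With the Census Income cores, the sketch returns the parameters listed in Table 1. An age of 19 belongs to the first two fuzzy sets with degree 0.5 each. The same call on the Pima cores (range [44, 199]) and the Thyroid cores (range [0.002, 530]) yields the parameters of Tables 2 and 3.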
Figure 3. Trapezoidal membership functions of Census Income DB

In the following section, we present previous methods suggested in the literature [8, 3] and compare them to our approach.

5. Comparison to other approaches
Several methods have been proposed to acquire and construct membership functions [8, 3]. These methods can be categorized as either manual or automatic.

5.1. Experimental Acquisition of Membership Functions
There are four main methods [3] for the experimental acquisition of membership degrees: polling, direct rating, reverse rating and set-valued statistics.

Polling
Polling supposes that vagueness arises from interpersonal disagreement, that is, from the lack of a precise common meaning [3]. A subject is repeatedly presented with an element x. The subject is asked for a yes or no response to a question such as "Do you agree that x is a member of A?". The degree μA(x) is determined as follows:
$$\mu_A(x) = \frac{\text{total number of ``yes'' responses for } x}{\text{total number of ``yes'' and ``no'' responses for } x}$$

Direct Rating
Direct rating supposes that vagueness arises from individual subjective uncertainty [3]. In this procedure, the membership degrees of elements are directly assigned by human subjects. The subject is asked to respond to a question such as "How A is x?". Here, A is the linguistic term and x is an element whose membership degree we seek to acquire. This question is repeated a reasonable number of times, noted n. The responses are recorded as y, the observed membership degree μA(x). From these responses, a conditional distribution function f(y|x) can be generated. This function is characterized by a mean and a variance:
$$\overline{y \mid x} = \frac{1}{n}\sum_{i=1}^{n} (y_i \mid x), \qquad V(y \mid x) = \frac{1}{n-1}\sum_{i=1}^{n} \left((y_i \mid x) - \overline{y \mid x}\right)^2$$
However, this method forces subjects to describe the membership degree precisely and rigidly.

Reverse Rating
Membership degrees are presented to a subject in random order. The subject is then asked to respond to a question such as "Identify the element x that belongs to the fuzzy set A with the yth degree of membership". The response is an element x for a given membership degree y.

Set Valued Statistics
A fuzzy set A is represented by its level cuts {Aα | α ∈ [0, 1]}, where Aα = {x ∈ X | μA(x) ≥ α}. Consider a random set R = {(Aαi, mi) | i = 1, ..., n}. Here, Aαi is a set-valued observation on X and mi is the probability that Aαi is representative of A. The membership degree can then be expressed as:
$$\mu_A(x) = \sum_{i \,:\, x \in A_{\alpha_i}} m_i$$

All manual approaches share two deficiencies: they rely on very subjective interpretations of words, and they inherit the general problems of knowledge acquisition [3]. Hence, human design of membership functions remains unsatisfactory for defining fuzzy concepts. Our method is one of several automatic approaches proposed to overcome these limitations.

5.2. Automatic Acquisition of Membership Functions
Lucero et al. [10] proposed to partition a set of data into classes. Each class is then represented by a triangular membership function. The membership function of a class j consists of three objects: the central object bj, the left object aj and the right object cj. The central object is the median value of the objects included in the cluster; aj and cj are obtained by interpolation. This method requires some parameters to be specified through expert knowledge, and it generates only triangular membership functions. In contrast, our approach provides a trapezoidal membership function generation framework that is independent of expert knowledge.

Fu et al. [1] divide the data into k clusters using CLARANS [14]. The medoid of each cluster represents the core of the corresponding fuzzy set. This approach generates only triangular membership functions. Furthermore, CLARANS requires the number of clusters to be specified. In contrast, our method is based on the CLUSTERDB* algorithm, which automatically detects the adequate number of clusters.
Ribeiro and Moreira [13] developed a flexible query interface for a relational database. To construct a fuzzy attribute, all the values of this attribute are ordered and plotted. The membership function is then defined by visual observation of the graph. It uses three points (a1, b1, a2), corresponding to the lower value, the inflexion point and the higher value of the attribute. This method constructs open-interval trapezoidal membership functions that take into account the real values of each attribute, but it remains an empirical method. There have been many efforts aimed at the automatic generation of fuzzy membership functions as part of fuzzy systems, yet these efforts still rely on expert knowledge [9, 15]. In contrast, our method is independent of both fuzzy rule generation and knowledge acquisition, and it can be used in several domains. In our research team, we use these functions for flexible querying of relational databases.

6. Conclusion
This paper proposes a formal and automatic approach for deriving trapezoidal membership functions. Our approach is based on a clustering method and a density function used to generate the cores of fuzzy sets. The introduced method has three evident advantages:
• independence from the fuzzy rule base;
• automatic determination of the optimal number of membership functions;
• extraction of the knowledge contained in the data.
We are currently handling the incremental aspect of our approach. Moreover, we will use our approach in a flexible querying system.

References
[1] A. Fu, M. H. Wong, S. Sze, W. Wong, W. Wong and W. Yu. Finding fuzzy sets for the mining of fuzzy association rules for numerical attributes. Department of Computer Science and Engineering, 1998.
[2] S. Bandyopadhyay. An automatic shape independent clustering technique. Pattern Recognition, pages 33–45, 2004.
[3] T. Bilgic and I. Turksen. Measurement of Membership Functions: Theoretical and Empirical Work. Handbook of Fuzzy Sets and Systems, 1997.
[4] C. Blake and C. Merz. UCI repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html.
[5] A. Guénoche. Clustering by vertex density in a graph. Proceedings of IFCS Congress Classification, pages 15–24, 2004.
[6] N. Hachani and H. Ounalli. Improving cluster method quality by validity indices. International FLAIRS Conference, 2007.
[7] T. Hong and C. Lee. Induction of fuzzy rules and membership functions from training examples. Fuzzy Sets and Systems, 84:33–47, 1996.
[8] J. Galindo, A. Urrutia and M. Piattini. Fuzzy Databases: Modeling, Design and Implementation. Idea Group Publishing, Hershey, USA, 2006.
[9] C. Lee. Fuzzy logic in control systems. IEEE Transactions on Systems, Man and Cybernetics, 20:404–432, 1990.
[10] Y. Lucero and P. Nava. Membership functions part I: Comparing methods of measurement. Department of Electrical and Computer Engineering, University of Texas, 1999.
[11] M. Makrehchi, O. Basir and M. Kamel. Generation of fuzzy membership function using information theory measures and genetic algorithm. IEEE Transactions on Fuzzy Systems, pages 603–610, 2003.
[12] J. Paetz. A note on core regions of membership functions. 2001.
[13] R. Ribeiro and A. Moreira. Fuzzy query interface for a business database. 2002.
[14] R. T. Ng and J. Han. Efficient and effective clustering methods for spatial data mining. Proc. of the VLDB Conference, Santiago, 1994.
[15] M. Sugeno. An introductory survey of fuzzy control. Information Sciences, 36:59–83, 1985.
[16] L. Zadeh. Fuzzy sets. Information and Control, 1965.