Fuzzy Sets and Systems 75 (1995) 273-289
Towards the implementation of a generalized fuzzy relational database model
J.M. Medina*, M.A. Vila, J.C. Cubero, O. Pons
Department of Computer Sciences and Artificial Intelligence, University of Granada, 18071 Granada, Spain
Received April 1993; revised June 1994
Abstract
This paper presents the elements necessary for the effective implementation of the generalized fuzzy relational database model. From the model described in Medina et al. (1994), some criteria for the representation and handling of imprecise information are introduced, the most important aspect being the simplicity of the implementation. The paper shows a series of mechanisms to implement imprecise information in a classical RDBMS. With the information represented in a classical RDBMS data structure, and with the procedural knowledge about such information implemented, we will be able to build a FRDBMS on a host RDBMS.
Keywords: Fuzzy relational database; Fuzzy sets.
1. Introduction
In the last few years, some authors have dealt with the problem of relaxing the relational model in order to admit some imprecision. This leads to database systems that lie within the scope of Artificial Intelligence, as they allow us to manage information with a terminology that is very similar to natural language. Imprecision can be included in the system at two levels. The first level considers the possibility of making imprecise queries to classical databases.
* Corresponding author. E-mail:
[email protected].
The second one is related to the problem of adding imprecise information to the system. In both cases, fuzzy set theory [35] provides a powerful tool to represent imprecision. At the first level, we consider the works [3, 12]. The handling of the problems at the second level gives rise to the fuzzy relational database models. The existing approaches at this level can be grouped into two classes: models based on unification through similarity relations and relational models based on possibility distributions. In the first group the main works are [4-6, 1], and additional contributions are [24-26]. In the second group there are several approaches: Umano [27], Prade-Testemale [20] and Zemankova [37]. In [15] we introduced a new extension of the relational model called GEFRED. This model incorporates a new definition for the data
0165-0114/95/$09.50 © 1995 - Elsevier Science B.V. All rights reserved SSDI 0165-01 14(94)00380-7
J.M. Medina et al. / Fuzzy Sets and Systems 75 (1995) 273-289
structure and the corresponding data handling, which allows us to integrate the previous relational models in the same framework. In [30, 32] it is proved that GEFRED can be represented from a logical point of view and that the previous models can be considered particular cases of it. Once our model, briefly described in Section 2, has been formulated, we are concerned with the problem of viability, that is, whether and how we can implement it. The problem of implementing a FRDBMS has been treated in the literature following two basic lines.
• Starting from a RDBMS with precise information, to develop a syntax that allows imprecise queries to be formulated. In [3] there is a study of an SQL extension for this kind of query.
• To build a FRDBMS prototype which implements a concrete fuzzy relational database model. In this sense, some proposals have been made: in [28] Umano shows an implementation of his model, and [13] deals with the implementation of a FRDBMS based on Umano's model using Fuzzy-Prolog. For the Prade-Testemale model, some aspects of its implementation can be found in [11], and in [37] some ideas about the implementation of the Zemankova-Kandel model are shown.
Our particular proposal lies within the first line, but includes the capability of representing and handling fuzzy information in a classical RDBMS. The first thing to be considered is to adopt a particular criterion for imprecise information representation, i.e., to find the most suitable representation for imprecise data and operators among those possible in GEFRED. Such a representation (described in Section 3) will facilitate, as far as possible, the corresponding implementation. In Section 4 we will analyze some aspects related to the implementation of fuzzy information in the system to be developed.
In this section, it is shown that the mechanisms adopted to implement all elements related to imprecise information are very general, and their validity is extended to implementations based on other criteria. This section ends with an example that illustrates the way in which imprecise information is represented and how it is implemented in the database and in the metaknowledge base.
2. Theoretical model used
In this section we introduce the basic elements of a fuzzy extension of the relational model, called GEFRED, described in [15]. This extension includes some elements of the previously reviewed fuzzy relational models together with new characteristics. The main contributions are the following.
• Handling of information whose imprecision stems from a wider range of sources.
• A different information organization. The same relation structure is used to represent the initial information, the information resulting from algebraic operations and the final results.
• A certain control can be exercised over the precision with which any simple condition involved in a query is satisfied.
2.1. Data structure
The information the model handles is organized as follows:
• The domain $D_G$ underlying every attribute of the relation contains some of the data types of Table 1.
• We structure the data through a relation model, $R_{FG}$, given by $R_{FG} \subseteq (D_{G1}, C_1) \times \cdots \times (D_{Gn}, C_n)$, where every $D_{Gj}$ is a domain of the type previously described and $C_j$ is a "compatibility attribute" that takes its values in $[0, 1]$. Every attribute is associated with a "compatibility attribute". In base relations, the "compatibility attribute" does not appear. This relation represents the initial information as well as that resulting from the fuzzy algebra operations made on it. Handling these relations through the fuzzy relational algebra may modify, for every tuple, the compatibility attribute values.
2.2. Data handling
The fuzzy algebra used in this model is an extension of the classical one; in this extension specific comparison operators are used in order to handle fuzzy information. Fuzzy querying receives special handling, based on the following points.
Table 1. Data types
1. A single scalar (Behavior = good, represented by the possibility distribution 1/good)
2. A single number (Age = 28, represented by the possibility distribution 1/28)
3. A set of possible scalar assignments (Behavior = {good, bad}, represented by {1/good, 1/bad})
4. A set of possible numeric assignments (Age = {20, 21}, represented by {1/20, 1/21})
5. A possibility distribution in a scalar domain (Behavior = {0.6/bad, 0.7/normal})
6. A possibility distribution in a numeric domain (Age = {0.3/23, 1.0/24, 0.8/25}, fuzzy numbers or linguistic labels)
7. A real number belonging to [0, 1], referring to degree of matching (Quality = 0.9)
8. An Unknown value with possibility distribution Unknown = {1/u : u ∈ U}
9. An Undefined value with possibility distribution Undefined = {0/u : u ∈ U}
10. A NULL value given by NULL = {1/Unknown, 1/Undefined}
• We call "atomic selection" a query, on a relation of type $R_{FG}$, in which we look for the satisfaction of a simple condition.
• When an attribute, an operator and a fuzzy constant are involved in an "atomic selection", the condition will be satisfied to some degree for every attribute value. This degree takes a value in [0, 1].
• In an "atomic selection" we can establish a threshold for the degree of satisfaction of a condition. Thanks to that threshold, the "atomic selection" eliminates those tuples that do not satisfy the condition to a degree greater than or equal to the threshold.
• The result of an "atomic selection" with a threshold for the degree is, once again, a relation of the type introduced in Section 2.1. In that relation, the degree of satisfaction of the condition for every value of the attribute involved appears in the compatibility attribute.
Compound conditions are those obtained by combining simple conditions through logic connectives (negation, conjunction and disjunction). Compound conditions are solved as follows.
- From every simple condition we obtain the resulting relation by applying the "atomic selection" with a given threshold.
- For simple conditions connected with the conjunctive operator, we take the intersection of the relations obtained from every condition. Afterwards, the values of the "compatibility attribute" associated with every attribute involved in the simple conditions are computed: the compatibility attribute of every tuple of the intersection is given a value equal to the minimum of those present in the respective initial simple conditions.
- For simple conditions connected with the disjunctive operator, we take the union of the relations obtained for every condition and update the compatibility attribute with the maximum value.
- For a negated simple condition, we update the compatibility attribute value with the complement to 1 of the present value in every tuple.
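The resolution of atomic and compound conditions described in this section can be illustrated with a minimal Python sketch. It is not the implementation proposed in the paper: it simplifies the model by keeping a single compatibility degree per tuple (the model keeps one per attribute), and the names `atomic_selection`, `conjunction`, `disjunction`, `negation`, the membership function `young` and the sample data are all hypothetical.

```python
def atomic_selection(relation, degree_of, threshold):
    """Keep the tuples whose degree of satisfaction reaches the threshold.

    relation  : dict mapping a tuple key to its record (hypothetical layout)
    degree_of : function returning the degree in [0, 1] for a record
    Returns a dict mapping each surviving key to its compatibility degree.
    """
    result = {}
    for key, record in relation.items():
        degree = degree_of(record)
        if degree >= threshold:
            result[key] = degree
    return result


def conjunction(r1, r2):
    """Intersection of two results; compatibility is the minimum degree."""
    return {k: min(r1[k], r2[k]) for k in r1.keys() & r2.keys()}


def disjunction(r1, r2):
    """Union of two results; compatibility is the maximum degree present."""
    return {k: max(r1.get(k, 0.0), r2.get(k, 0.0)) for k in r1.keys() | r2.keys()}


def negation(r):
    """Complement to 1 of the compatibility degree of every tuple."""
    return {k: 1.0 - d for k, d in r.items()}


def young(age):
    """Hypothetical membership function for the label 'young'."""
    if age <= 25:
        return 1.0
    if age >= 35:
        return 0.0
    return (35 - age) / 10


# Hypothetical sample relation: 'quality' is a degree in [0, 1] (type 7 of Table 1).
people = {
    "ann": {"age": 24, "quality": 0.6},
    "bob": {"age": 30, "quality": 0.8},
    "eve": {"age": 40, "quality": 1.0},
}

is_young = atomic_selection(people, lambda r: young(r["age"]), 0.5)
high_quality = atomic_selection(people, lambda r: r["quality"], 0.7)
both = conjunction(is_young, high_quality)
either = disjunction(is_young, high_quality)
not_young = negation(is_young)
```

With these sample values, only "bob" satisfies both conditions at the chosen thresholds, and his compatibility degree in the conjunction is the minimum of his two degrees.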
3. Fuzzy information representation
The elements related to fuzzy data handling can have different representations. A normalized possibility distribution, for example, can be represented by different types of functions, but we will use a trapezoidal representation for it. The same can be said about the way we model fuzzy relational operators, as well as the rest of the fuzzy items that appear in the system. In this section, we show the representation criteria adopted in our implementation. These criteria are not exclusive to a concrete representation but form the base on which the system is built according to the scheme designed for a FRDBMS. Therefore, we could say that these criteria constitute a step between the formulation of a FRDB model and the effective implementation of a system based on it.
Precise data
We will use the representation provided by the host RDBMS.
Imprecise data
The model considers two different groups with different representations for imprecise data.
- "Imprecise data over ordered underlying domain." This group contains possibility distributions defined on continuous or discrete, but ordered, domains. Type 6 of Table 1 belongs to this group. Each datum of this type is associated with a membership function. For the sake of simplicity in the representation and computing efficiency, we adopt the representation shown in Table 2.
- "Data with analogy over discrete domain." This group is built over discrete domains on which "proximity relations" are defined between the values. In this case, we have to store the data representation as well as the representation of the proximity relations defined on the domain values. The different data which we can represent in this group are the following.
• Simple scalars. These data are represented using the representation scheme of the host RDBMS. We only have to provide the system with the information it needs to handle the "proximity relation" defined on the underlying domain.
• Possibility distribution over discrete domain. An imprecise datum of this type is associated with a representation in which the domain values that constitute it are described together with their respective possibility values: $((p_1, d_1), \ldots, (p_n, d_n))$.
- "Unknown" data type. Data of this type express ignorance about the value an attribute takes, although we know that it takes one of the domain values. This means that it is possible for the attribute to take any of them. Therefore, we represent the UNKNOWN type through the possibility distribution $\{1/u : u \in U\}$, where $U$ is the underlying domain. Fig. 1 shows this possibility distribution.
- "Undefined" data type. When an attribute takes the value UNDEFINED, it reflects the fact that none of the values of its domain are allowed. This means that none of the values are possible. Therefore, the associated possibility distribution is $\{0/u : u \in U\}$, where $U$ is the underlying domain. The possibility distribution is shown in Fig. 1.
- "Null" data type. When an attribute takes the value NULL, it means that we have no information, either because we do not know the value (UNKNOWN) or because no domain value is possible (UNDEFINED). The possibility distribution for this case is {1/UNKNOWN, 1/UNDEFINED}.
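Table 2. Representation of data types (imprecise data over ordered underlying domain)
Data type : Representation
Trapezoidal possibility distribution (e.g. linguistic label "Tall") : [graph over the domain D, not recoverable from the source]
Interval distribution : [graph over the domain D, with α = β and γ = δ]
Approximate value : [graph over the domain D, with marks n − margin, n, n + margin]
[Axis values surviving from the graphs: 125, 150, 170, 190, 200.]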
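The trapezoidal representation of Table 2 can be sketched in Python as follows. This is only an illustration under stated assumptions: the four parameters are named `alpha`, `beta`, `gamma`, `delta` after the axis labels of the table, the interval is taken to be the case α = β and γ = δ, the approximate value the case β = γ = n with support n ± margin, and the function names and the numbers used for "tall" are hypothetical.

```python
def trapezoid(alpha, beta, gamma, delta):
    """Return the membership function of a trapezoidal possibility distribution.

    Support is [alpha, delta], core (possibility 1) is [beta, gamma].
    """
    if not alpha <= beta <= gamma <= delta:
        raise ValueError("parameters must satisfy alpha <= beta <= gamma <= delta")

    def membership(x):
        if beta <= x <= gamma:          # inside the core
            return 1.0
        if x <= alpha or x >= delta:    # outside the support
            return 0.0
        if x < beta:                    # rising edge
            return (x - alpha) / (beta - alpha)
        return (delta - x) / (delta - gamma)  # falling edge

    return membership


def interval(low, high):
    """Interval distribution: a trapezoid with alpha = beta and gamma = delta."""
    return trapezoid(low, low, high, high)


def approximately(n, margin):
    """Approximate value: core at n, support from n - margin to n + margin."""
    return trapezoid(n - margin, n, n, n + margin)


# Hypothetical parameters for a linguistic label "tall" (heights in cm).
tall = trapezoid(170, 190, 200, 210)
about_28 = approximately(28, 2)
twenties = interval(20, 29)
```

Storing four numbers per value is what makes this representation attractive for a host RDBMS: each imprecise datum over an ordered domain fits in a fixed number of ordinary numeric columns.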
Fig. 1. Types UNKNOWN and UNDEFINED.
Fig. 2. "Equal" and "approximately equal" operators.
Proximity relations
We use proximity relations to model the imprecision derived from the likeness between two values of the discourse domain. We only use proximity relations defined over finite universes of discourse, so such relations can be modeled in matrix form.
Fuzzy relational operators
The comparison operators used to relate database relations are the relational ones. To operate on imprecise information, these operators are extended. The representation adopted in our model for the different relational operators is as follows.
- "Equal to". This operator models the equality concept for imprecise data. Formally, it can be expressed through the membership function given by
$$\mu_{\mathrm{equal\_to}}(\tilde{\pi}, \tilde{\pi}') = \sup_{(d, d') \in D \times D} \min\bigl(p(d, d'),\ \pi_{\tilde{\pi}}(d),\ \pi_{\tilde{\pi}'}(d')\bigr),$$
where $p(d, d')$ is a "proximity relation" and $\pi_{\tilde{\pi}}(d)$, $\pi_{\tilde{\pi}'}(d')$ are the respective possibility distributions defined over the discourse domain $D$.
• For imprecise data defined on an ordered domain, $p(d, d') = \delta(d, d')$, where $\delta$ is Dirac's delta, taken here as 1 when $d = d'$ and 0 otherwise. Taking into account the representation we give to this data type, the result of the equal to operation can be obtained geometrically (Fig. 2).
• For data with analogy on a discrete domain, $p(d, d')$ is the matrix representation of the "proximity relation" defined on the discourse domain $D$.
- "Approximately equal". This operator provides the degree to which two "crisp" numeric values are approximately equal. It is calculated according to the following expression:
$$\mu_{\mathrm{approx\_equal}}(x, y) = \begin{cases} 0 & \text{if } |x - y| > \mathit{margin}, \\ 1 - |x - y| / \mathit{margin} & \text{if } |x - y| \le \mathit{margin}. \end{cases}$$
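The "equal to" operator on a discrete domain and the "approximately equal" operator can be sketched in Python as below. This is a hedged illustration, not the paper's implementation: possibility distributions are assumed to be stored as dictionaries from domain values to possibility degrees, the proximity matrix as a dictionary over pairs of values, and the names `equal_to`, `matrix_proximity`, `identity`, `approx_equal` as well as the proximity degrees are hypothetical.

```python
def equal_to(pi_a, pi_b, proximity):
    """Sup-min evaluation of 'equal to' over a finite domain.

    pi_a, pi_b : dicts mapping domain values to possibility degrees
    proximity  : function p(d, d') returning a degree in [0, 1]
    Values absent from a dict have possibility 0 and cannot raise the supremum,
    so iterating over the stored values is enough.
    """
    best = 0.0
    for d, poss_a in pi_a.items():
        for d2, poss_b in pi_b.items():
            best = max(best, min(proximity(d, d2), poss_a, poss_b))
    return best


def matrix_proximity(matrix):
    """Build p(d, d') from a symmetric, reflexive matrix stored as a dict of pairs."""
    def p(d, d2):
        if d == d2:
            return 1.0
        return matrix.get((d, d2), matrix.get((d2, d), 0.0))
    return p


def identity(d, d2):
    """Proximity used on ordered domains: 1 if the values coincide, else 0."""
    return 1.0 if d == d2 else 0.0


def approx_equal(x, y, margin):
    """Degree to which two crisp numbers are approximately equal."""
    if margin <= 0:
        raise ValueError("margin must be positive")
    diff = abs(x - y)
    return 0.0 if diff > margin else 1.0 - diff / margin


# Hypothetical proximity degrees on the domain of Behavior.
behavior_proximity = matrix_proximity({
    ("bad", "normal"): 0.5,
    ("normal", "good"): 0.7,
    ("bad", "good"): 0.1,
})

stored = {"bad": 0.6, "normal": 0.7}   # Behavior value of type 5 in Table 1
constant = {"good": 1.0}               # crisp scalar 'good'

degree_with_proximity = equal_to(stored, constant, behavior_proximity)
degree_without_proximity = equal_to(stored, constant, identity)
```

In this example the stored value does not contain "good" at all, so plain equality gives degree 0, while the proximity between "normal" and "good" lets the comparison succeed to a nonzero degree.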
Fig. 2 shows the way it is calculated. The parameter margin fits the operator to the domain over which it is defined.
- "Greater or equal". It is defined on ordered domains. The membership function of this operator is given by the fuzzy relation
$$\mu_{\ge}(A, B) = \sup_{(x, y) \in X \times Y} \min\bigl(\ge(x, y),\ \pi_A(x),\ \pi_B(y)\bigr),$$
where $A$ and $B$ are imprecise data over an ordered domain or "crisp" numeric data, $\pi_A(x)$ and $\pi_B(y)$ are their respective possibility representations, and $\ge$ is the classic greater or equal operator, given by $\ge(x, y) = 1$ if $x \ge y$ and $0$ otherwise.
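For distributions with finitely many stored values, the sup-min expression of the "greater or equal" operator can be evaluated directly, as in the following Python sketch. It is an illustration under assumptions: distributions are dictionaries from numbers to possibility degrees, a "crisp" number n is treated as the distribution {n: 1.0}, and the names `greater_or_equal` and `crisp` are hypothetical. Trapezoidal data over continuous domains would call for a geometric computation instead, which is not shown here.

```python
def greater_or_equal(pi_a, pi_b):
    """Sup-min evaluation of 'A greater or equal than B' on finite distributions.

    pi_a, pi_b : dicts mapping numeric values to possibility degrees.
    Only pairs (x, y) with x >= y contribute, since the classic operator
    is 1 on those pairs and 0 elsewhere.
    """
    best = 0.0
    for x, poss_a in pi_a.items():
        for y, poss_b in pi_b.items():
            if x >= y:
                best = max(best, min(poss_a, poss_b))
    return best


def crisp(n):
    """A crisp number seen as a possibility distribution."""
    return {n: 1.0}


# Age value of type 6 in Table 1.
age = {23: 0.3, 24: 1.0, 25: 0.8}

dist_vs_crisp = greater_or_equal(age, crisp(25))        # distribution >= crisp
crisp_vs_dist = greater_or_equal(crisp(24), age)        # crisp >= distribution
dist_vs_dist = greater_or_equal(age, {20: 1.0, 21: 1.0})  # distribution >= distribution
```

The same double loop with the test `x <= y` would give the corresponding "less or equal" degree.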
This operator can solve the following comparisons.
• Degree to which a "crisp" number is "greater or equal" than a possibility distribution.
• Degree to which a possibility distribution is "greater or equal" than a "crisp" number.
• Degree to which a possibility distribution is "greater or equal" than another possibility distribution.
- "Less or equal". It is defined on ordered domains. The membership function of this operator is given by the fuzzy relation
$$\mu_{\le}(A, B) = \sup_{(x, y) \in X \times Y} \min\bigl(\le(x, y),\ \pi_A(x),\ \pi_B(y)\bigr),$$
where $A$ and $B$ are imprecise data over an ordered domain or "crisp" numeric data, $\pi_A(x)$ and $\pi_B(y)$ are their respective possibility representations, and $\le$ is the classic "less or equal" operator given by
~ Y , if x~(A,B) = 1 -I~_B(z)
o
f/
.(B