INTERNATIONAL JOURNAL OF COMPUTATIONAL COGNITION (HTTP://WWW.IJCC.US), VOL. 4, NO. 1, MARCH 2006
Fuzzy Functional Dependencies in Relational Databases
Sadeq Al-Hamouz and Ranjit Biswas
Abstract— The work in this paper is an application of fuzzy set theory to relational databases. The authors introduce a new notion of fuzzy functional dependency (ffd) in relational databases. It is observed that the existing notions of ffd share a common drawback. Consequently, the need for introducing a new type of ffd is justified and explained with an example. Armstrong's axioms of fuzzy nature are verified.
Copyright © 2006 Yang's Scientific Research Institute, LLC. All rights reserved.
Index Terms— Fuzzy set, fuzzy relation, fuzzy tolerance relation, α-nearer, α-equality, δ-relation, α-equality of tuples, ffd, α-ffd, f-Armstrong's axioms.
I. INTRODUCTION
IN THIS paper we consider a relational database (Codd's model [5]) and define a new type of functional dependency, called fuzzy functional dependency (ffd). In Codd's model [5] of relational databases, the real world of interest is expressed by means of relations. This model is implemented in terms of precise data only, and data of the same data type are compared using classical logic. In real-life problems, however, the associated data are often imprecise or non-deterministic. Not all real data can be precise, because of their fuzzy nature; consequently, classical logic is not appropriate for comparing such data.

The most important concept in relational databases is that of the functional dependency of one set of attributes upon another. A functional dependency is a property of the semantics, or meaning, of the attributes. The search for functional dependencies is based on computations in which equalities of data (attribute values) are tested. But the classical equality relation is not appropriate for comparing data of a fuzzy nature. A natural question therefore arises: if two sets of attributes are not functionally dependent in the classical sense, are they really functionally independent? This problem was studied in [1,9,10,13,16,17] using fuzzy logic, assuming that some or all of the data are fuzzy in nature. In this paper, we solve this problem with a different approach. Our notion of fuzzy functional dependency differs from the ffds defined in [1,9,10,13,16,17]. The major drawback of these existing concepts of ffd is that two data values of a domain are compared with the help of fuzzy equality relations, which are not equivalence relations. This is not the case in our proposed ffd. We explain our notion of ffd with a hypothetical example.

The paper is organized into seven sections. We present some preliminaries on fuzzy set theory in Section II. We briefly revisit the classical relational model in Section III. In Section IV, we introduce new terminology, viz. “α-nearer” and “α-equal”, and then introduce our notion of fuzzy functional dependency (ffd) between two sets of attributes. In Section V, we explain the concept of ffd by an example. In Section VI, we study Armstrong's axioms with this new notion of data dependency (ffd), verify certain results on Armstrong's axioms and give some examples. Finally, in Section VII we draw an overall conclusion of the work reported here.

Manuscript received March 11, 2005; revised September 20, 2005. Sadeq Al-Hamouz¹ and Ranjit Biswas²; ¹Department of Computer Science, Amman Arab University for Graduate Studies, P.O. Box 2234, Amman 11953, JORDAN; ²Faculty of Information Technology, Philadelphia University, P.O. Box 1, Amman 19392, JORDAN. Emails: [email protected] (S. Al-Hamouz), [email protected] (R. Biswas). Correspondence and offprint requests to Dr. Matthias Kliegel, Department of Psychology, University of Zurich, Freiensteinstrasse 5, CH-8032 Zurich, Switzerland. Fax: 0041-1-6345189. Publisher Item Identifier S 1542-5908(06)10104-9/$20.00. The online version was posted on March 15, 2006 at http://www.yangsky.us/ijcc/ijcc41.htm

II. PRELIMINARIES
In this section we give some basic preliminaries on fuzzy set theory. Let $U = \{u_1, u_2, \ldots, u_n\}$ be a universe of discourse.
Definition 2.1 A fuzzy set A in the universe of discourse U is characterized by a membership function $\mu_A : U \to [0, 1]$; that is, the membership function of a fuzzy set of U takes values in the closed interval [0, 1]. A fuzzy set A is defined as the set of ordered pairs $A = \{(u, \mu_A(u)) : u \in U\}$, where $\mu_A(u)$ is the grade of membership of the element u in the set A. The greater the value of $\mu_A(u)$, the greater the truth of the statement ‘the element u belongs to the set A’. For example, consider the universe U = {DOG, CAT, RAT}. A fuzzy set A of U could be A = {(DOG, 0.7), (CAT, 0.99), (RAT, 0.4)}.
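To make Definition 2.1 concrete, here is a minimal Python sketch (an illustration, not part of the original paper) that stores a fuzzy set over a finite universe as a dictionary of membership grades and enforces $\mu_A : U \to [0, 1]$. The class name `FuzzySet` is hypothetical, and giving grade 0 to elements of U with no listed grade is an assumption of this sketch.

```python
from typing import Dict, Hashable, Iterable


class FuzzySet:
    """Fuzzy set A over a finite universe U, stored as {u: mu_A(u)}."""

    def __init__(self, universe: Iterable[Hashable], grades: Dict[Hashable, float]):
        self.universe = frozenset(universe)
        for u, g in grades.items():
            if u not in self.universe:
                raise ValueError(f"{u!r} is not in the universe")
            if not 0.0 <= g <= 1.0:  # mu_A must map into [0, 1]
                raise ValueError(f"grade {g} of {u!r} lies outside [0, 1]")
        self.grades = dict(grades)

    def mu(self, u: Hashable) -> float:
        """Membership grade of u; an element with no listed grade gets 0 (assumption)."""
        if u not in self.universe:
            raise ValueError(f"{u!r} is not in the universe")
        return self.grades.get(u, 0.0)


# The example from Definition 2.1.
U = ["DOG", "CAT", "RAT"]
A = FuzzySet(U, {"DOG": 0.7, "CAT": 0.99, "RAT": 0.4})
print(A.mu("CAT"))       # 0.99
print(max(U, key=A.mu))  # CAT, the element that belongs to A with the highest grade
```

The `max` call reflects the reading of grades given in the definition: the larger $\mu_A(u)$, the more strongly u belongs to A.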
Definition 2.2 If A and B are two fuzzy sets of the universe U, then
1) $A \subset B$ iff $\forall x \in U$, $\mu_A(x) \le \mu_B(x)$;
2) $B \supset A$ iff $A \subset B$;
3) $A = B$ iff $\forall x \in U$, $\mu_A(x) = \mu_B(x)$;
4) $\bar{A} = \{(x, 1 - \mu_A(x)) : x \in U\}$ (the complement of A);
5) $A \cap B = \{(x, \min\{\mu_A(x), \mu_B(x)\}) : x \in U\}$;
6) $A \cup B = \{(x, \max\{\mu_A(x), \mu_B(x)\}) : x \in U\}$.
The basic mathematics used in Codd's relational model [5] is that of relations and their properties. In this section we recall a few definitions on fuzzy relations [2,3,6,8,11,12,19].
Definition 2.3 Let X and Y be two sets. A fuzzy relation R from X to Y is a fuzzy set on $X \times Y$ and is denoted by $R(X \to Y)$.
Definition 2.4 If A is a fuzzy set on X, then the sup-inf composition of the fuzzy relation $R(X \to Y)$ with A is a fuzzy set B on Y, denoted by $B = R \circ A$ and defined by the membership function
$$\mu_{R \circ A}(y) = \bigvee_{x \in X} \{\mu_A(x) \wedge \mu_R(x, y)\},$$
where $\vee = \sup$ and $\wedge = \inf$.
Definition 2.5 Let $Q(X \to Y)$ and $R(Y \to Z)$ be two fuzzy relations. The sup-inf composition $R \circ Q$ is a fuzzy relation from X to Z defined by the membership function
$$\mu_{R \circ Q}(x, z) = \bigvee_{y \in Y} \{\mu_Q(x, y) \wedge \mu_R(y, z)\} \quad \forall (x, z) \in X \times Z.$$
Definition 2.6 A fuzzy relation $R(X \to X)$ is said to be
1) reflexive iff $\forall x \in X$, $\mu_R(x, x) = 1$;
2) symmetric iff $\forall x_1, x_2 \in X$, $\mu_R(x_1, x_2) = \mu_R(x_2, x_1)$.
A fuzzy relation is said to be a fuzzy tolerance relation if it is reflexive and symmetric.
Definition 2.7 If R is a fuzzy relation on $X \times Y$, its inverse $R^{-1}$ is a fuzzy relation on $Y \times X$ such that $\forall (y, x) \in Y \times X$, $\mu_{R^{-1}}(y, x) = \mu_R(x, y)$.
Proposition 2.1 If R and S are two fuzzy relations on $X \times Y$ and $Y \times Z$ respectively, then
1) $(R^{-1})^{-1} = R$;
2) $(S \circ R)^{-1} = R^{-1} \circ S^{-1}$.

III. THE CLASSICAL RELATIONAL MODEL [5,7,14,15]
A classical relational database consists of a collection of relations. A relation is a table of values in which each row represents a collection of related data values. In a table, each row is called a tuple, each column header is called an attribute, and the table as a whole is called the relation. A relation schema $R(A_1, A_2, \ldots, A_n)$ consists of a relation name R and a list of attributes $A_1, A_2, \ldots, A_n$. The domain of an attribute $A_i$ is denoted by $\mathrm{dom}(A_i)$. An instance relation r of the relation schema $R(A_1, A_2, \ldots, A_n)$, also denoted by $r(R)$, is thus a set of tuples $t_1, t_2, \ldots, t_m$
where each $t_i$ is an n-tuple of the form $t_i = \langle v_1, v_2, \ldots, v_n \rangle$, $v_j \in \mathrm{dom}(A_j)$. The ith value in a tuple t corresponds to the attribute $A_i$ and is denoted by $t[A_i]$. There are various restrictions on data in the form of constraints. Domain constraints specify that each value of an attribute $A_i$ must be an atomic value from the domain $\mathrm{dom}(A_i)$. This includes restrictions on data types, on the range of values (if any), and on the format of data. Assuming that null is not an element of any domain $\mathrm{dom}(A_i)$, the entity integrity constraint, which states that no primary key value can be null, is satisfied. The key constraint says that if $K \subseteq R$ is a super-key, then for any two distinct tuples $t_1$ and $t_2$ in $r(R)$ we have $t_1[K] \neq t_2[K]$. A referential integrity constraint is not imposed on any individual relation in a database; it is specified between two relations and is used to maintain consistency among the tuples of the two relations.
In the logical design of a relational database, integrity constraints play an important role, and among them data dependencies are especially important. There are various types of data dependencies, viz. functional dependency, join dependency, multi-valued dependency, etc. The single most important concept in relational schema design is that of functional dependency. It is a constraint between two sets of attributes X and Y, where $X \subseteq R$, $Y \subseteq R$ and $R = \{A_1, A_2, \ldots, A_n\}$. A functional dependency $X \to Y$ specifies the constraint that, for any two tuples $t_1$ and $t_2$ in $r(R)$, whenever $t_1[X] = t_2[X]$, we also have $t_1[Y] = t_2[Y]$. Thus X functionally determines Y in R iff, whenever two tuples of $r(R)$ agree on their X-values, they necessarily agree on their Y-values. Clearly, for every super-key X of R and every subset $Y \subseteq R$, $X \to Y$ holds.
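A minimal Python sketch (not from the original paper) of the classical definition: the tuples of an instance r(R) are represented as dictionaries from attribute names to values, and the function checks whether every two tuples that agree on X also agree on Y. The function name `satisfies_fd` and the EMP data are hypothetical. Such a check can only show that a particular instance violates $X \to Y$; it cannot establish the dependency as a property of the schema's semantics.

```python
from typing import Dict, Hashable, List, Tuple


def satisfies_fd(r: List[Dict[str, Hashable]], X: List[str], Y: List[str]) -> bool:
    """True iff every two tuples of r that agree on X also agree on Y (crisp equality)."""
    seen: Dict[Tuple[Hashable, ...], Tuple[Hashable, ...]] = {}
    for t in r:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        # The first tuple with a given X-value fixes the Y-value all later ones must match.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


# Hypothetical instance of a schema EMP(NAME, RANK, SALARY).
emp = [
    {"NAME": "Ali", "RANK": "Lecturer", "SALARY": 1000},
    {"NAME": "Sara", "RANK": "Lecturer", "SALARY": 1000},
    {"NAME": "Omar", "RANK": "Professor", "SALARY": 2000},
]
emp2 = emp + [{"NAME": "Huda", "RANK": "Lecturer", "SALARY": 1005}]

print(satisfies_fd(emp, ["RANK"], ["SALARY"]))   # True
print(satisfies_fd(emp, ["SALARY"], ["NAME"]))   # False
print(satisfies_fd(emp2, ["RANK"], ["SALARY"]))  # False
```

The last call fails because 1000 and 1005 are unequal under crisp equality, even though the two salaries are nearly equal; this is the kind of situation in which comparing attribute values with classical equality is too strict.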
A functional dependency is a property of the semantics of the attributes of a relation schema R, which must be clearly understood by the database designers. A functional dependency should not be inferred from a particular relation instance r of R.
IV. FUZZY FUNCTIONAL DEPENDENCY (FFD)
In the classical model, domain constraints, key constraints, and other integrity constraints are imposed by means of crisp comparisons. But in many practical, real-world applications there is a genuine need to think about integrity constraints in fuzzy-theoretic terms. For example, “the salaries of two newly joined lecturers in Computer Science with the same qualifications should be more or less equal.” Constraints of this type occur very often in practice but are not addressed in classical database design, because the equality of two domain values in the classical sense is either true or false. With this point in mind, the authors of [1,9,10,13,16,17] have studied fuzzy functional dependencies with fuzzy integrity constraints. In this paper too, our main motivation is to capture the integrity requirements arising from fuzzy constraints, and so we need to define a new type of fuzzy functional dependency (ffd). Our notion differs from the fuzzy functional dependencies defined in [1,9,10,13,16,17]. The major drawback of the existing concepts of ffd is that two data values of a domain are compared with the help of fuzzy equality relations, which are
not equivalence relations. As a result, these approaches cannot exploit the property of transitivity. This is not the case in our proposed ffd.
First, we define some new terminology that will be useful for introducing our notion of ffd. Let X be a set (the universe), and let $\Re$ be a fuzzy tolerance relation on X. Consider a choice parameter $\alpha \in [0, 1]$. (This parameter is predefined by the database designers, and hence we call it a choice parameter.)
Definition 4.1 ($(\alpha)_{\Re}$-nearer or α-nearer elements). Two elements $x_1, x_2 \in X$ are said to be $(\alpha)_{\Re}$-nearer (or α-nearer, in short) if
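$$\mu_{\Re}(x_1, x_2) \ge \alpha.$$
We denote this by $x_1\,N(\alpha)_{\Re}\,x_2$. The following results are straightforward.
Proposition 4.1
1) For all $x \in X$ and all $\alpha \in [0, 1]$, x is $(\alpha)_{\Re}$-nearer to itself in X, i.e., $x\,N(\alpha)_{\Re}\,x$.
2) If $x_1$ is $(\alpha)_{\Re}$-nearer to $x_2$ in X, then $x_2$ is also $(\alpha)_{\Re}$-nearer to $x_1$ in X, i.e., $x_1\,N(\alpha)_{\Re}\,x_2 \Leftrightarrow x_2\,N(\alpha)_{\Re}\,x_1$.
3) Suppose that $0 \le \alpha_2 \le \alpha_1 \le 1$. Then $x_1\,N(\alpha_1)_{\Re}\,x_2 \Rightarrow x_1\,N(\alpha_2)_{\Re}\,x_2$.
Definition 4.2 ($(\alpha)_{\Re}$-equality or α-equality). Two elements $x_1, x_2 \in X$ are said to be $(\alpha)_{\Re}$-equal (or α-equal) if
1) either $x_1\,N(\alpha)_{\Re}\,x_2$, or
2) there exist $y_1, y_2, \ldots, y_r \in X$ such that $x_1\,N(\alpha)_{\Re}\,y_1$, $y_1\,N(\alpha)_{\Re}\,y_2$, $\ldots$, $y_{r-1}\,N(\alpha)_{\Re}\,y_r$ and $y_r\,N(\alpha)_{\Re}\,x_2$.
This equality is denoted by $x_1\,E(\alpha)_{\Re}\,x_2$. In other words, $E(\alpha)_{\Re}$ is the transitive closure of $N(\alpha)_{\Re}$. The following results follow from Proposition 4.1.
Proposition 4.2
1) For all $x \in X$ and all $\alpha \in [0, 1]$, $x\,E(\alpha)_{\Re}\,x$.
2) $x_1\,E(\alpha)_{\Re}\,x_2 \Rightarrow x_2\,E(\alpha)_{\Re}\,x_1$.
3) Suppose that $0 \le \alpha_2 \le \alpha_1 \le 1$. Then $x_1\,E(\alpha_1)_{\Re}\,x_2 \Rightarrow x_1\,E(\alpha_2)_{\Re}\,x_2$.
Definition 4.3 ($\delta(\alpha)_{\Re}$ relation on X). The crisp relation $\delta(\alpha)_{\Re}$ on X is defined as follows: for $x_1, x_2 \in X$, $x_1\,\delta(\alpha)_{\Re}\,x_2$ if $x_1\,E(\alpha)_{\Re}\,x_2$. When there is no confusion, we write $x_1\,\delta\,x_2$ for $x_1\,\delta(\alpha)_{\Re}\,x_2$.
Proposition 4.3 The relation $\delta(\alpha)_{\Re}$ defined on X is an equivalence relation.
Proof: For all $x \in X$, $x\,E(\alpha)_{\Re}\,x$, so δ is reflexive. If $x_1\,E(\alpha)_{\Re}\,x_2$, then $x_2\,E(\alpha)_{\Re}\,x_1$, so δ is symmetric. Now suppose that $x_1\,\delta\,x_2$ and $x_2\,\delta\,x_3$. There are four cases to be examined.
Case (1): Suppose that $x_1\,N(\alpha)_{\Re}\,x_2$ and $x_2\,N(\alpha)_{\Re}\,x_3$. Then the chain $x_1\,N(\alpha)_{\Re}\,x_2$, $x_2\,N(\alpha)_{\Re}\,x_3$ (with $r = 1$ and $y_1 = x_2$) gives $x_1\,E(\alpha)_{\Re}\,x_3$ by definition, and hence $x_1\,\delta\,x_3$.
Case (2): Suppose that $x_1\,N(\alpha)_{\Re}\,x_2$ does not hold, but there exist $y_1, y_2, \ldots, y_r \in X$ such that $x_1\,N(\alpha)_{\Re}\,y_1$, $y_1\,N(\alpha)_{\Re}\,y_2$, $\ldots$, $y_{r-1}\,N(\alpha)_{\Re}\,y_r$ and $y_r\,N(\alpha)_{\Re}\,x_2$. Suppose further that $x_2\,N(\alpha)_{\Re}\,x_3$. Combining the two, we obtain the chain $x_1\,N(\alpha)_{\Re}\,y_1$, $\ldots$, $y_{r-1}\,N(\alpha)_{\Re}\,y_r$, $y_r\,N(\alpha)_{\Re}\,x_2$, $x_2\,N(\alpha)_{\Re}\,x_3$ from $x_1$ to $x_3$. Hence $x_1\,E(\alpha)_{\Re}\,x_3$, i.e., $x_1\,\delta\,x_3$.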
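For the other two cases, too, we can prove that $x_1\,\delta\,x_3$. Hence the result. $\blacksquare$
Consider a relation $r(R)$ of the relation schema $R(A_1, A_2, \ldots, A_n)$, given by Table 4.1.
Table 4.1
$$\begin{array}{cccc}
A_1 & A_2 & \cdots & A_n \\ \hline
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
a_{31} & a_{32} & \cdots & a_{3n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{array}$$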
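Here $R = \{A_1, A_2, \ldots, A_n\}$. Let us assume a fuzzy tolerance relation $R_i$ on the domain $\mathrm{dom}(A_i)$ for each $i = 1, 2, \ldots, n$, and let $\Re$ denote the set $\{R_1, R_2, \ldots, R_n\}$ of these fuzzy tolerance relations. Let $X = \{X_1, X_2, \ldots, X_k\} \subseteq R$. We now define the $(\alpha)_{\Re}$-equality of two tuples $t_1[X]$ and $t_2[X]$ in a relational database design.
Definition 4.4 ($(\alpha)_{\Re}$-equality of $t_1[X]$ and $t_2[X]$). Two tuples $t_1[X]$ and $t_2[X]$ are said to be $(\alpha)_{\Re}$-equal if $t_1[X_i]\,E(\alpha)_{\Re}\,t_2[X_i]$ for all $i = 1, 2, \ldots, k$, where the α-equality of values of $X_i$ is taken with respect to the tolerance relation in $\Re$ on $\mathrm{dom}(X_i)$. We denote this equality by $t_1[X]\,\varepsilon(\alpha)_{\Re}\,t_2[X]$. We have the following results.
Proposition 4.4
1) For any tuple t and any $\alpha \in [0, 1]$, $t[X]\,\varepsilon(\alpha)_{\Re}\,t[X]$.
2) $t_1[X]\,\varepsilon(\alpha)_{\Re}\,t_2[X] \Rightarrow t_2[X]\,\varepsilon(\alpha)_{\Re}\,t_1[X]$.
3) If $0 \le \alpha_2 \le \alpha_1 \le 1$, then $t_1[X]\,\varepsilon(\alpha_1)_{\Re}\,t_2[X] \Rightarrow t_1[X]\,\varepsilon(\alpha_2)_{\Re}\,t_2[X]$.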
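Definitions 4.1 to 4.4 can be exercised with a short Python sketch, offered as an illustration rather than the authors' method. It assumes finite attribute domains passed as lists, and represents each fuzzy tolerance relation $R_i$ as a function `mu(x1, x2)` returning a grade in [0, 1]; the sketch does not itself check reflexivity or symmetry. All function names and the linear salary relation `salary_mu` are hypothetical.

```python
from typing import Callable, Dict, Hashable, List

# A fuzzy tolerance relation on a domain, given by its membership function mu(x1, x2).
Tolerance = Callable[[Hashable, Hashable], float]


def alpha_nearer(mu: Tolerance, x1: Hashable, x2: Hashable, alpha: float) -> bool:
    """Definition 4.1: x1 N(alpha) x2 iff mu(x1, x2) >= alpha."""
    return mu(x1, x2) >= alpha


def delta_classes(domain: List[Hashable], mu: Tolerance, alpha: float) -> Dict[Hashable, Hashable]:
    """Map each domain element to a representative of its delta(alpha) class.

    E(alpha) links two elements through a finite chain of alpha-nearer steps, so its
    classes are the connected components of the graph whose edges are the
    alpha-nearer pairs; union-find collects those components.
    """
    parent = {x: x for x in domain}

    def find(x: Hashable) -> Hashable:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(domain):
        for b in domain[i + 1:]:
            if alpha_nearer(mu, a, b, alpha):
                parent[find(a)] = find(b)
    return {x: find(x) for x in domain}


def alpha_equal(domain: List[Hashable], mu: Tolerance, alpha: float,
                x1: Hashable, x2: Hashable) -> bool:
    """Definition 4.2: x1 E(alpha) x2, i.e. x1 and x2 lie in the same delta class."""
    classes = delta_classes(domain, mu, alpha)
    return classes[x1] == classes[x2]


def tuples_alpha_equal(t1: Dict[str, Hashable], t2: Dict[str, Hashable], X: List[str],
                       domains: Dict[str, List[Hashable]], rels: Dict[str, Tolerance],
                       alpha: float) -> bool:
    """Definition 4.4: t1[X] eps(alpha) t2[X] iff the values are alpha-equal on every
    attribute of X, each compared with that attribute's own tolerance relation."""
    return all(alpha_equal(domains[a], rels[a], alpha, t1[a], t2[a]) for a in X)


def salary_mu(a: float, b: float) -> float:
    """Hypothetical tolerance relation: reflexive, symmetric, grade drops linearly."""
    return max(0.0, 1 - abs(a - b) / 100)


salaries = [1000, 1040, 1080, 1200]
print(alpha_nearer(salary_mu, 1000, 1080, 0.5))          # False (grade is about 0.2)
print(alpha_equal(salaries, salary_mu, 0.5, 1000, 1080))  # True, via 1040
print(alpha_equal(salaries, salary_mu, 0.5, 1000, 1200))  # False

domains = {"RANK": ["Lecturer", "Professor"], "SALARY": salaries}
rels = {"RANK": lambda a, b: 1.0 if a == b else 0.0, "SALARY": salary_mu}
t1 = {"RANK": "Lecturer", "SALARY": 1000}
t2 = {"RANK": "Lecturer", "SALARY": 1080}
print(tuples_alpha_equal(t1, t2, ["RANK", "SALARY"], domains, rels, 0.5))  # True
```

Two points about this sketch are worth noting. First, 1000 and 1080 are not α-nearer for α = 0.5 but are α-equal through 1040, which shows the transitivity that $E(\alpha)_{\Re}$ adds over the tolerance relation itself; because the chain may pass through values of $\mathrm{dom}(A_i)$ that never occur in $r(R)$, the whole finite domain is taken as input. Second, recomputing the classes on every call keeps the code short; a practical version would cache `delta_classes` per attribute and per α.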
Definition 4.5 (Fuzzy Functional Dependency). Let $X, Y \subset R = \{A_1, A_2, \ldots, A_n\}$. Choose a parameter $\alpha \in [0, 1]$ and propose a fuzzy tolerance relation