An efficient algorithm for reasoning about fuzzy functional dependencies

P. Cordero, M. Enciso, A. Mora, I. Pérez de Guzmán, and J. M. Rodríguez-Jiménez
Universidad de Málaga, Spain
{pcordero,enciso}@uma.es, {amora,guzman}@ctima.uma.es, [email protected]

Abstract. A sound and complete automated prover for the Fuzzy Simplification Logic (FSL logic) is introduced and, based on it, a method for efficiently reasoning about fuzzy functional dependencies over domains with similarity relations is presented. The complexity of the algorithm is the same as that of the equivalent algorithms for crisp functional dependencies that appear in the literature.

1 Introduction

Constraints are often used to guide the design of relational schemas for the sake of database consistency and, therefore, to avoid problems such as redundancy and anomalies. This statement is valid for any extension of the classical relational model. Different authors have studied fuzzy models and which constraints are most appropriate for extending the well-studied relational database theory to fuzzy databases.

Many papers have established the advantages of having a fuzzy extension of the relational model for databases [?,?]. Thus, we can affirm that there is a consensus on the need for a “good” extension of the classical Codd model to fuzzy logic. This is not only a matter for logicians: several database researchers also call for this extension.

Several approaches to the definition of fuzzy functional dependency (FFD) [?,?,?,?,?] have been proposed in the literature. In the same way that the concept of functional dependency (FD) corresponds to the notion of partial function, it is desirable that the concept of FFD correspond to the notion of fuzzy partial function. The definitions proposed in [?, ?, ?] fit this idea. Nevertheless, some of these works preserve the original FD definition and replace the equality between values of an attribute with a similarity relation [?,?,?,?]. A proper extension of the concept of functional dependency requires that we are able to introduce uncertainty into the FDs that hold in a relation by associating a grade with each FD [?].

There exists a wide range of dependencies, and each dependency definition is usually followed by its corresponding logic. In [?,?] the authors propose generalizations of the well-known Armstrong's Axioms as a useful tool for reasoning with FFDs, but these inference rules have not been used successfully in automated deduction. The reason is that these inference systems were created to explain dependency semantics rather than to design an automated deduction system. In fact, in [?] the authors propose the classical closure algorithm to solve the implication problem and do not directly use Armstrong's Axioms or any generalization of them.

Our approach points in this direction. In [?] a novel logic (SLFD), equivalent to the classical Armstrong's Axioms, was presented. The core of SLFD is the Simplification Rule, which replaces the Transitivity Rule (the cause of the non-applicability of the other logics [?]). The definition of SLFD introduced, for the first time, interesting solutions to database problems, which are solved using logic-based automated deduction methods [?, ?]. Our notion of fuzzy FD was introduced to obtain a proper fuzzy definition that allows us to build a fuzzy extension of the SLFD logic [?, ?].

In this work, we illustrate how the Simplification Rule can be used for reasoning with FFDs. We prove that this rule is the key to three equivalence rules that can be considered efficient tools for manipulating FFDs in a logical way: removing redundancy, solving the implication problem, etc. We present an automated prover that systematically applies the equivalence rules in order to decide whether an FFD can be deduced from a set of FFDs. This work opens the door to the management of FFD constraints in an efficient and intelligent way.

First, we outline the basic notions needed (Section 2) and, in Section 3, we show that the rules of FSL logic are equivalence rules and that they become adequate tools for removing redundancy. In Section 4 we propose a new automated prover directly based on the equivalence rules of FSL logic to solve the FFD implication problem. In Section 5, the soundness and completeness of the algorithm are proved and its complexity is studied. Finally, we present several conclusions and future work in Section 6.

2 Preliminaries

First, the concept of functional dependency in the relational model of databases is outlined. Let $\Omega$ be a finite non-empty set whose elements are named attributes and $\{D_a \mid a \in \Omega\}$ a family of domains. A database is a relation $R \subseteq D = \prod_{a \in \Omega} D_a$, usually represented as a table. The columns are the attributes and the rows of this table are the tuples $t = (t_a \mid a \in \Omega) \in D$. If $\emptyset \neq X \subseteq \Omega$, $D_X$ denotes $\prod_{a \in X} D_a$ and, for each $t \in R$, $t_{/X}$ denotes the projection of $t$ to $D_X$. That is, $t_{/X} = (t_a \mid a \in X) \in D_X$.

Definition 1. A functional dependency is an expression $X \mapsto Y$ where $X, Y \subseteq \Omega$. A relation $R \subseteq D$ satisfies $X \mapsto Y$ if, for all $t_1, t_2 \in R$, $t_{1/X} = t_{2/X}$ implies that $t_{1/Y} = t_{2/Y}$.

The most widespread method to fuzzify the concept of functional dependency is to use similarity relations instead of equality. Each domain $D_a$ is endowed with a similarity relation $\rho_a : D_a \times D_a \to [0,1]$, that is, a reflexive ($\rho_a(x,x) = 1$ for all $x \in D_a$) and symmetric ($\rho_a(x,y) = \rho_a(y,x)$ for all $x, y \in D_a$) fuzzy relation. Given $X \subseteq \Omega$, extensions of these relations to the set $D$ can be obtained as follows: for all $t, t' \in D$, $\rho_X(t,t') = \min\{\rho_a(t_a, t'_a) \mid a \in X\}$. A first step towards fuzzification is the following definition of fuzzy functional dependency (FFD), which appears with slight differences in the literature [?, ?, ?].

Remark 1. A relation $R \subseteq D$ satisfies $X \mapsto Y$ if $\rho_X(t,t') \le \rho_Y(t,t')$ holds for all $t, t' \in R$.

However, the functional dependency remains crisp. In [?] the authors add a degree of fuzziness to the dependency itself, and in [?] we generalize this definition of fuzzy functional dependency as follows.

Definition 2. A fuzzy functional dependency is an expression $X \xrightarrow{\theta} Y$ where $\theta \in [0,1]$ and $X, Y \subseteq \Omega$ with $X \neq \emptyset$. A relation $R \subseteq D$ is said to satisfy $X \xrightarrow{\theta} Y$ if $\min\{\theta, \rho_X(t,t')\} \le \rho_Y(t,t')$ for all $t, t' \in R$.

In the literature, some authors present complete axiomatic systems defined over FFDs with similarity relations [?, ?, ?], and some axiomatic systems in which the dependency itself is fuzzy [?]. All of them are fuzzy extensions of Armstrong's Axioms and inherit the problem of the transitivity rule when the axiomatic system is applied to real problems. However, in [?] we introduced FSL, a new logic better suited to applications, named Simplification Logic for fuzzy functional dependencies. The main novelty of the system is that it is not based on the transitivity rule, like all the others, but is built around a simplification rule that allows the removal of redundancy.

Definition 3. Given a finite non-empty set of attribute symbols $\Omega$, the language of FSL is $L = \{X \xrightarrow{\theta} Y \mid X, Y \in 2^{\Omega},\ X \neq \emptyset \text{ and } \theta \in [0,1]\}$,¹ the semantics has been introduced in Definition 2 and the axiomatic system has one axiom scheme:

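Ax: $\vdash X \xrightarrow{1} Y$, for all $Y \subseteq X$ (Reflexive Axioms)

and four inference rules:

InR: $X \xrightarrow{\theta_1} Y \vdash X \xrightarrow{\theta_2} Y$, if $\theta_1 \ge \theta_2$ (Inclusion Rule)

DeR: $X \xrightarrow{\theta} Y \vdash X \xrightarrow{\theta} Y'$, if $Y' \subseteq Y$ (Decomposition Rule)

CoR: $X \xrightarrow{\theta_1} Y,\ U \xrightarrow{\theta_2} V \vdash XU \xrightarrow{\min(\theta_1,\theta_2)} YV$ (Composition Rule)

SiR: $X \xrightarrow{\theta_1} Y,\ U \xrightarrow{\theta_2} V \vdash U - Y \xrightarrow{\min(\theta_1,\theta_2)} V - Y$, if $X \subseteq U$ and $X \cap Y = \emptyset$ (Simplification Rule)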

The deduction ($\vdash$), semantic implication ($\models$) and equivalence ($\equiv$) concepts are introduced as usual. Soundness and completeness were proved in [?].
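The semantics of Definition 2 can be illustrated with a minimal Python sketch. The function names `rho` and `satisfies_ffd`, the two similarity relations `crisp` and `near`, and the sample relation are all assumptions made for illustration; they do not come from the paper. Treating $\rho_\emptyset$ as 1 is also an assumption of the sketch.

```python
def rho(sim, attrs, t, u):
    """rho_X(t, u): minimum attribute-wise similarity over the attributes in X.

    Returning 1.0 for an empty attribute set is an assumption of this sketch.
    """
    return min((sim[a](t[a], u[a]) for a in attrs), default=1.0)


def satisfies_ffd(relation, sim, lhs, rhs, theta):
    """Check min(theta, rho_X(t, u)) <= rho_Y(t, u) for every pair of tuples."""
    return all(
        min(theta, rho(sim, lhs, t, u)) <= rho(sim, rhs, t, u)
        for t in relation
        for u in relation
    )


# Hypothetical similarity relations; both are reflexive and symmetric.
def crisp(x, y):
    return 1.0 if x == y else 0.0


def near(x, y):
    return max(0.0, 1.0 - abs(x - y) / 10.0)


sim = {"job": crisp, "salary": near}
relation = [
    {"job": "clerk", "salary": 20},
    {"job": "clerk", "salary": 23},
]

print(satisfies_ffd(relation, sim, {"job"}, {"salary"}, 0.5))
print(satisfies_ffd(relation, sim, {"job"}, {"salary"}, 0.9))
```

In this toy relation the two tuples agree on `job` and have salary similarity of about 0.7, so the dependency from `job` to `salary` holds with degree 0.5 but not with degree 0.9.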

3 Removing redundant information

In database systems, redundancy is not desirable in the integrity constraints of a database. In [?] we outlined that FSL logic is adequate for applications, showing its good behavior for removing redundancy. The systematic application of the rules removes redundancy because they can be seen as equivalence rules, as the following theorem ensures.

Theorem 1. If $X, Y, U, V \subseteq \Omega$ and $\theta, \theta_1 \in [0,1]$ then

– Decomposition Equivalence (DeEq): $\{X \xrightarrow{\theta} Y\} \equiv \{X \xrightarrow{\theta} Y - X\}$.
– Union Equivalence (UnEq): $\{X \xrightarrow{\theta} Y,\ X \xrightarrow{\theta} V\} \equiv \{X \xrightarrow{\theta} YV\}$.
– Simplification Equivalence (SiEq): if $X \subseteq U$, $X \cap Y = \emptyset$ and $\theta \ge \theta_1$, then $\{X \xrightarrow{\theta} Y,\ U \xrightarrow{\theta_1} V\} \equiv \{X \xrightarrow{\theta} Y,\ U - Y \xrightarrow{\theta_1} V - Y\}$.

The proof of this theorem is straightforward. As an immediate consequence of these equivalences, there exist other equivalences that are very useful for removing redundant information.

Corollary 1. Let $\theta, \theta_1 \in [0,1]$ and $X, Y, U, V \subseteq \Omega$ with $X \subseteq U$ and $X \cap Y = \emptyset$.

– Simplification+Union Equivalence (SiUnEq): $\{X \xrightarrow{\theta} Y,\ U \xrightarrow{\theta} V\} \equiv \{X \xrightarrow{\theta} YV\}$ when $U \setminus Y = X$.
– Simplification+Axiom Equivalence (SiAxEq): $\{X \xrightarrow{\theta} Y,\ U \xrightarrow{\theta_1} V\} \equiv \{X \xrightarrow{\theta} Y\}$ when $\theta \ge \theta_1$ and $V \setminus Y = \emptyset$.

¹ In logic, it is important to distinguish between the language and the metalanguage. So, in a formula, $XY$ denotes $X \cup Y$, $X - Y$ denotes $X \setminus Y$ and $\top$ denotes the empty set.
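The equivalences of Theorem 1 and Corollary 1 can be read as rewriting steps on pairs of FFDs. The following Python sketch is one possible encoding, not the authors' implementation: the `FFD` tuple, the helper `ffd`, and the function names `de_eq` and `simplify` are hypothetical, and attributes are assumed to be single letters. In the SiUnEq case the sketch additionally applies DeEq to the merged dependency so that the left-hand and right-hand sides stay disjoint.

```python
from typing import NamedTuple, Optional, Tuple


class FFD(NamedTuple):
    lhs: frozenset
    rhs: frozenset
    theta: float


def ffd(lhs: str, rhs: str, theta: float) -> FFD:
    """Build an FFD from strings of one-letter attribute names."""
    return FFD(frozenset(lhs), frozenset(rhs), theta)


def de_eq(f: FFD) -> FFD:
    """DeEq: X -> Y is equivalent to X -> Y - X (same degree)."""
    return FFD(f.lhs, f.rhs - f.lhs, f.theta)


def simplify(guide: FFD, f: FFD) -> Tuple[FFD, Optional[FFD]]:
    """Apply SiAxEq, SiUnEq or SiEq, in that priority, to the pair (guide, f).

    Returns the possibly enlarged guide and the rewritten f, or None when f
    is absorbed or removed. If the side conditions fail, nothing changes.
    """
    x, y, theta = guide
    u, v, theta1 = f
    # Shared side conditions: X subset of U, X and Y disjoint, theta >= theta1.
    if not (x <= u and x.isdisjoint(y) and theta >= theta1):
        return guide, f
    if v <= y:                               # SiAxEq: V \ Y is empty
        return guide, None
    if theta == theta1 and u - y == x:       # SiUnEq: U \ Y = X
        return de_eq(FFD(x, y | v, theta)), None
    return guide, FFD(u - y, v - y, theta1)  # SiEq


guide = ffd("cf", "dg", 0.8)
print(simplify(guide, ffd("cdef", "h", 0.8)))  # SiEq shrinks the left-hand side
print(simplify(guide, ffd("cdf", "e", 0.8)))   # SiUnEq enlarges the guide
```

Trying the three cases in a fixed order mirrors the priority ordering used by the prover in Section 4: removal is attempted first, then absorption into the guide, and only then plain simplification.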

4 Automated Prover

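Given a set of fuzzy functional dependencies $\Gamma$, we define the syntactic closure of $\Gamma$ as $\Gamma^+ = \{X \xrightarrow{\theta} Y \mid \Gamma \vdash X \xrightarrow{\theta} Y\}$, which coincides with the semantic closure due to the soundness and completeness of the axiomatic system. That is, $\Gamma^+$ is the minimum set that contains $\Gamma$ and all the axioms and is closed under the inference rules. The aim of this section is to give an efficient algorithm to decide whether a given FFD belongs to $\Gamma^+$.

The input of the algorithm is a set of fuzzy functional dependencies $\Gamma_0$ and a fuzzy functional dependency $A \xrightarrow{\theta} B$, and the output is $\Gamma_0 \vdash A \xrightarrow{\theta} B$ or $\Gamma_0 \nvdash A \xrightarrow{\theta} B$. We outline the steps of the algorithm:

1. If $A \xrightarrow{\theta} B$ is an axiom, the algorithm finishes and the output is $\Gamma_0 \vdash A \xrightarrow{\theta} B$.
2. Compute $\Gamma_A^\theta = \{AX \xrightarrow{\theta} Y \mid X \xrightarrow{\theta_1} Y \in \Gamma_0 \text{ with } \theta_1 \ge \theta\}$.
3. If there does not exist $X \subseteq \Omega$ such that $A \xrightarrow{\theta} X \in \Gamma_A^\theta$, then the algorithm finishes and the output is $\Gamma_0 \nvdash A \xrightarrow{\theta} B$.
4. Otherwise, apply DeEq to every formula in $\Gamma_A^\theta$, obtaining $\Gamma_1$.
5. $\Gamma_1 = \{A \xrightarrow{\theta} C_1\} \cup \Gamma'_1$ such that $A \subseteq X$ for all $X \xrightarrow{\theta} Y \in \Gamma_1$. The FFD $A \xrightarrow{\theta} C_1$ will be named the guide.
6. Repeat until a fixed point is reached or a guide $A \xrightarrow{\theta} C_n$ with $B \subseteq A \cup C_n$ is obtained:
   – Compute $\Gamma_{i+1} = \{A \xrightarrow{\theta} C_{i+1}\} \cup \Gamma'_{i+1}$ from $\Gamma_i = \{A \xrightarrow{\theta} C_i\} \cup \Gamma'_i$ by applying to $A \xrightarrow{\theta} C_i$ and each $X \xrightarrow{\theta} Y \in \Gamma'_i$ the equivalences SiAxEq, SiUnEq or SiEq, with this priority ordering.
7. If the guide is $A \xrightarrow{\theta} C_n$ and $B \subseteq A \cup C_n$, then the output is $\Gamma_0 \vdash A \xrightarrow{\theta} B$. Otherwise, the output is $\Gamma_0 \nvdash A \xrightarrow{\theta} B$.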
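The seven steps can be sketched in Python as follows. This is an illustrative reading of the algorithm under stated assumptions, not the authors' implementation: the function name `prove` and the triple-based input format are hypothetical, the case $B \subseteq A$ is treated as the axiom case for any $\theta$ (Ax followed by InR), and several FFDs with left-hand side $A$ are merged into a single guide (UnEq). Because Step 2 rewrites every retained FFD at degree $\theta$, degrees need not be tracked afterwards.

```python
def prove(gamma0, a, b, theta):
    """Decide whether gamma0 derives A -> B with degree theta (sketch of Steps 1-7).

    gamma0 is an iterable of (lhs, rhs, degree) triples; attribute sets are
    given as iterables of attribute names (for example strings of letters).
    """
    a, b = frozenset(a), frozenset(b)
    # Step 1: B subset of A is treated as the axiom case (assumption: Ax + InR).
    if b <= a:
        return True
    # Steps 2 and 4: keep FFDs of degree >= theta, add A on the left, apply DeEq.
    gamma = []
    for x, y, degree in gamma0:
        if degree >= theta:
            lhs = a | frozenset(x)
            gamma.append((lhs, frozenset(y) - lhs))
    # Steps 3 and 5: the guide collects the right-hand sides whose left side is A.
    guide_parts = [y for x, y in gamma if x == a]
    if not guide_parts:
        return False
    guide = frozenset().union(*guide_parts)
    rest = [(x, y) for x, y in gamma if x != a]
    # Step 6: rewrite until nothing changes or the guide covers B.
    changed = True
    while changed and not b <= a | guide:
        changed = False
        remaining = []
        for x, y in rest:
            if y <= guide:            # SiAxEq: the FFD is removed
                changed = True
            elif x - guide == a:      # SiUnEq: the FFD is absorbed by the guide
                guide = guide | y
                changed = True
            else:                     # SiEq: redundant attributes are dropped
                reduced = (x - guide, y - guide)
                changed = changed or reduced != (x, y)
                remaining.append(reduced)
        rest = remaining
    # Step 7
    return b <= a | guide


gamma = [("ac", "def", 0.9), ("fh", "dg", 1.0)]
print(prove(gamma, "cf", "beg", 0.8))
```

The sample call uses the data of Example 1 and returns `False`, since no FFD in $\Gamma_{cf}^{0.8}$ has left-hand side $cf$. Unlike the trace in Example 2, the sketch checks the stopping condition once per pass over $\Gamma'_i$, so the guide may grow beyond what is strictly needed; the answer is unaffected.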

Example 1. Let $\Gamma = \{ac \xrightarrow{0.9} def,\ fh \xrightarrow{1} dg\}$ and the fuzzy functional dependency $cf \xrightarrow{0.8} beg$. We check whether $\Gamma \vdash cf \xrightarrow{0.8} beg$. The trace of the execution of the FSL Automated Prover is the following:

1. $cf \xrightarrow{0.8} beg$ is not an axiom, so the algorithm continues.
2. $\Gamma_{cf}^{0.8} = \{acf \xrightarrow{0.8} def,\ cfh \xrightarrow{0.8} dg\}$.
3. Since in $\Gamma_{cf}^{0.8}$ there is no FFD $cf \xrightarrow{0.8} W$ (the guide is $\emptyset$), $\Gamma \nvdash cf \xrightarrow{0.8} beg$.

(End of FSL Automated Prover)

Example 2. Let $\Gamma = \{ac \xrightarrow{0.9} def,\ f \xrightarrow{1} dg,\ de \xrightarrow{0.9} h,\ di \xrightarrow{0.4} a,\ ch \xrightarrow{0.9} bf,\ j \xrightarrow{0.9} ad,\ cd \xrightarrow{0.8} e\}$ and the fuzzy functional dependency $cf \xrightarrow{0.8} dgh$. We check whether $\Gamma \vdash cf \xrightarrow{0.8} dgh$. The trace of the execution of the FSL Automated Prover is the following:

1. $cf \xrightarrow{0.8} dgh$ is not an axiom and the algorithm continues.
2. $\Gamma_{cf}^{0.8} = \{acf \xrightarrow{0.8} def,\ cf \xrightarrow{0.8} dg,\ cdef \xrightarrow{0.8} h,\ chf \xrightarrow{0.8} bf,\ cfj \xrightarrow{0.8} ad,\ cdf \xrightarrow{0.8} e\}$.
3. There exists $cf \xrightarrow{0.8} dg \in \Gamma_{cf}^{0.8}$ and the algorithm continues.
4. DeEq applied to every formula in $\Gamma_{cf}^{0.8}$ gives the set $\Gamma_1 = \{acf \xrightarrow{0.8} de,\ cf \xrightarrow{0.8} dg,\ cdef \xrightarrow{0.8} h,\ chf \xrightarrow{0.8} b,\ cfj \xrightarrow{0.8} ad,\ cdf \xrightarrow{0.8} e\}$.
5. guide $= \{cf \xrightarrow{0.8} dg\}$ and $\Gamma'_1 = \{acf \xrightarrow{0.8} de,\ cdef \xrightarrow{0.8} h,\ chf \xrightarrow{0.8} b,\ cfj \xrightarrow{0.8} ad,\ cdf \xrightarrow{0.8} e\}$.
6. This step can be followed in the table below, which shows step by step the application of the equivalence rules.

| Equivalence | guide | $\Gamma'$ |
|---|---|---|
| SiEq | $cf \xrightarrow{0.8} dg$ | $\underline{acf \xrightarrow{0.8} de}$, $cdef \xrightarrow{0.8} h$, $chf \xrightarrow{0.8} b$, $cfj \xrightarrow{0.8} ad$, $cdf \xrightarrow{0.8} e$ |
| SiEq | $cf \xrightarrow{0.8} dg$ | $acf \xrightarrow{0.8} e$, $\underline{cdef \xrightarrow{0.8} h}$, $chf \xrightarrow{0.8} b$, $cfj \xrightarrow{0.8} ad$, $cdf \xrightarrow{0.8} e$ |
| SiEq | $cf \xrightarrow{0.8} dg$ | $acf \xrightarrow{0.8} e$, $cef \xrightarrow{0.8} h$, $chf \xrightarrow{0.8} b$, $\underline{cfj \xrightarrow{0.8} ad}$, $cdf \xrightarrow{0.8} e$ |
| SiUnEq | $cf \xrightarrow{0.8} dg$ | $acf \xrightarrow{0.8} e$, $cef \xrightarrow{0.8} h$, $chf \xrightarrow{0.8} b$, $cfj \xrightarrow{0.8} a$, $\underline{cdf \xrightarrow{0.8} e}$ |
| SiAxEq | $cf \xrightarrow{0.8} deg$ | $\underline{acf \xrightarrow{0.8} e}$, $cef \xrightarrow{0.8} h$, $chf \xrightarrow{0.8} b$, $cfj \xrightarrow{0.8} a$, × |
| SiUnEq | $cf \xrightarrow{0.8} deg$ | ×, $\underline{cef \xrightarrow{0.8} h}$, $chf \xrightarrow{0.8} b$, $cfj \xrightarrow{0.8} a$ |
| | $cf \xrightarrow{0.8} degh$ | ×, $chf \xrightarrow{0.8} b$, $cfj \xrightarrow{0.8} a$ |

The first column shows the equivalence applied between the guide and the underlined FFD in each row. The result of each equivalence for the underlined FFD is shown in the row underneath. The second column shows the guide, which is augmented by SiUnEq. Applying SiAxEq to an FFD removes it (the symbol × is used), and SiEq removes redundancy in an FFD.

7. As the guide is $cf \xrightarrow{0.8} degh$ and $\{d, g, h\} \subseteq \{c, f, d, e, g, h\}$, the output is $\Gamma \vdash cf \xrightarrow{0.8} dgh$.

(End of FSL Automated Prover)
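The outcome of Example 2 can be cross-checked with a different technique: the classical attribute-closure algorithm restricted to the FFDs whose degree is at least $\theta$. This is not the FSL prover. That the two procedures agree is an assumption of this sketch, motivated by the filtering performed in Step 2; the function name `threshold_closure` is hypothetical.

```python
def threshold_closure(gamma, attrs, theta):
    """Closure of attrs under the FFDs of gamma whose degree is >= theta.

    Cross-check only: this is the classical closure algorithm, not the
    simplification-based prover.
    """
    closure = set(attrs)
    usable = [(set(x), set(y)) for x, y, degree in gamma if degree >= theta]
    changed = True
    while changed:
        changed = False
        for x, y in usable:
            if x <= closure and not y <= closure:
                closure |= y
                changed = True
    return closure


gamma = [
    ("ac", "def", 0.9), ("f", "dg", 1.0), ("de", "h", 0.9), ("di", "a", 0.4),
    ("ch", "bf", 0.9), ("j", "ad", 0.9), ("cd", "e", 0.8),
]
closure = threshold_closure(gamma, "cf", 0.8)
print(set("dgh") <= closure)
```

The check prints `True`, in agreement with the output of the trace. The closure computed here is a superset of the final guide in the table, because the prover stops as soon as the guide covers $B$.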

5 Soundness, Completeness and Complexity

Tarski's fixed-point theorem ensures that the algorithm finishes because the sequence of the sets $C_i$ is strictly increasing in $(2^{\Omega}, \subseteq)$. The following results are oriented to prove that, for all $\Gamma \in \{\Gamma_A^\theta\} \cup \{\Gamma_i \mid 1 \le i \le n\}$,

$\Gamma_0 \vdash A \xrightarrow{\theta} B$ if and only if $\Gamma \vdash A \xrightarrow{\theta} B$.

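Lemma 1. Let $\Gamma$ be a set of fuzzy functional dependencies, $X \subseteq \Omega$ and $\theta_1, \theta_2 \in [0,1]$. If $\theta_1 \le \theta_2$ then $(\Gamma_X^{\theta_1})^+ \subseteq (\Gamma_X^{\theta_2})^+$.

Proof. From InR, $\Gamma_X^{\theta_1} \subseteq (\Gamma_X^{\theta_2})^+$ and therefore $(\Gamma_X^{\theta_1})^+ \subseteq (\Gamma_X^{\theta_2})^+$. $\square$

Lemma 2. Let $\Gamma$ be a set of FFDs, $U \xrightarrow{\theta} V$ an FFD and $X$ a set of attributes. If $\Gamma \vdash U \xrightarrow{\theta} V$ then $\Gamma_X^\theta \vdash XU \xrightarrow{\theta} V$.

Proof. From Lemma 1, it is proved by induction that all the elements $U \xrightarrow{\theta} V$ belonging to $\Gamma^+$ satisfy that $\Gamma_X^\theta \vdash XU \xrightarrow{\theta} V$. $\square$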

Theorem 2. Let $\Gamma$ be a set of fuzzy functional dependencies, $X \xrightarrow{\theta} Y$ an FFD and $X$ a non-empty set of attributes. Then

$\Gamma \vdash X \xrightarrow{\theta} Y$ if and only if $\Gamma_X^\theta \vdash X \xrightarrow{\theta} Y$.

Proof. The direct implication is an immediate consequence of Lemma 2. Conversely, we prove that $\Gamma_X^\theta \subseteq \Gamma^+$. If $U \xrightarrow{\theta} V \in \Gamma_X^\theta$ then there exists $U' \xrightarrow{\theta_1} V \in \Gamma$ such that $\theta \le \theta_1$ and $U = X \cup U'$. From $U' \xrightarrow{\theta_1} V$ and the axiom $X \xrightarrow{1} V \cap X$, by CoR, $U \xrightarrow{\theta_1} V$ is obtained and, by FrR, $U \xrightarrow{\theta} V \in \Gamma^+$. Finally, $\Gamma_X^\theta \subseteq \Gamma^+$ implies that $(\Gamma_X^\theta)^+ \subseteq \Gamma^+$. $\square$

With this theorem we have proved that Step 2 in the algorithm is sound and complete. Now, we will prove the soundness of Step 3 and the existence of the guide cited in Step 5.

Proposition 1. If $\Gamma_X^\theta \vdash U \xrightarrow{\theta} V$ then one of the following conditions holds:

A. $U \xrightarrow{\theta} V$ is an axiom.
B. There exists $U' \xrightarrow{\theta} V' \in \Gamma_X^\theta$ such that $X \subseteq U' \subseteq U$.

Proof. It is proved by induction that all the fuzzy functional dependencies belonging to $(\Gamma_X^\theta)^+$ satisfy at least one of the two conditions. $\square$

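Consequently, the existence of $X \subseteq \Omega$ such that $A \xrightarrow{\theta} X \in \Gamma_A^\theta$ is a necessary condition for $\Gamma_0 \vdash A \xrightarrow{\theta} B$. This section concludes with the proof of the soundness and completeness of the algorithm.

Theorem 3. The algorithm is sound and complete.

Proof. Theorem 2 ensures that $\Gamma \vdash A \xrightarrow{\theta} B$ if and only if $\Gamma_A^\theta \vdash A \xrightarrow{\theta} B$. On the other hand, if the algorithm finishes with the set $\Gamma_n = \{A \xrightarrow{\theta} C_n\} \cup \Gamma'_n$ then $\Gamma_A^\theta \equiv \Gamma_n$ because equivalence rules have been applied. Moreover, if $B \not\subseteq A \cup C_n$ then $(X \cup Y) \cap C_n = \emptyset$ for all $X \xrightarrow{\theta} Y \in \Gamma'_n$ and, from Proposition 1, $\Gamma_n \nvdash A \xrightarrow{\theta} B$ because the inference rules cannot be applied. However, if $B \subseteq A \cup C_n$ then the following sequence proves that $\Gamma_n \vdash A \xrightarrow{\theta} B$.

1. $A \xrightarrow{\theta} C_n$ by hypothesis.
2. $A \xrightarrow{1} A$ by AxR.
3. $A \xrightarrow{\theta} AC_n$ by CoR to 1. and 2.
4. $A \xrightarrow{\theta} B$ by FrR to 3. $\square$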

Regarding complexity, the cost of Steps 1 to 5 is $O(|\Gamma|)$ in the worst case. Step 6 has $O(|\Omega|\,|\Gamma|)$ cost because, in the worst case, the prover traverses $\Gamma$ and, in each iteration, at least one attribute is added to the guide and removed from the rest of the set. Hence the number of operations is lower than $|\Omega|\,|\Gamma|$.

As far as we know, the algorithms for automated reasoning about fuzzy functional dependencies in the literature are given for logics with lower expressiveness. In [?], the authors give an algorithm for automated reasoning about (classical) functional dependencies. The complexity of this algorithm is $O(|\Omega|\,|\Gamma|)$ and they say that: “$O(|\Omega|\,|\Gamma|)$ is usually considered as the order of the input. From this point of view, this is a linear time algorithm”. Other algorithms for classical functional dependencies with the same cost also appear in the literature. However, until now, the only algorithm for a fuzzy extension of functional dependencies, as far as we know, is the one given in [?]. The complexity of this algorithm is the same. However, the fuzzification that they consider is of the first type, in which, although fuzzy equalities are considered, the functional dependency remains crisp. That is, the expressiveness of our logic is higher [?].

6 Conclusions

In [?] we introduced the Simplification Logic for the management of fuzzy functional dependencies (FSL logic) and outlined its advantages, specifically the absence of transitivity as a primitive inference rule, which is replaced by the simplification rule. Our logic is conceived for removing redundancies: particular cases of the inference rules are also equivalence rules that automatically remove redundancies.

In this paper, we present a sound and complete automated prover to answer the question: $\Gamma \vdash A \xrightarrow{\theta} B$? The basic idea of the algorithm is to replace $\Gamma$ by $\Gamma_A^\theta$ and to remove redundancies in this set by systematically applying the equivalence rules. The complexity of the algorithm is the same as that of the equivalent ones for crisp functional dependencies that appear in the literature.

Short-term future work goes in the direction of applying our fuzzy model to Formal Concept Analysis. The use of the FSL logic to manipulate attribute implications in Formal Concept Analysis is being developed.