When Data Dependencies over SQL Tables Meet the Logics of Paradox and S-3
Sven Hartmann
Sebastian Link
Clausthal University of Technology, Germany
Victoria University of Wellington, New Zealand
[email protected]
[email protected]
ABSTRACT
Most commercial database systems are still founded on the relational model of data [11]. Data administrators utilize various classes of data dependencies to restrict the relations in the database to those considered meaningful to the application at hand. According to [14] functional dependencies (FDs) capture around two-thirds, and multivalued dependencies (MVDs) around one-quarter of all uni-relational dependencies (those defined over a single relation schema) that arise in practice. In particular, MVDs are frequently exhibited in database applications [42], e.g. after denormalization or in views. While research on this topic has been extensive, currently existing theories for FDs and MVDs only apply to relations over SQL tables where either all attributes are NOT NULL or all attributes are NULL.
We study functional and multivalued dependencies over SQL tables with NOT NULL constraints. Under a no-information interpretation of null values we develop tools for reasoning. We further show that in the absence of NOT NULL constraints the associated implication problem is equivalent to that in propositional fragments of Priest’s paraconsistent Logic of Paradox. Subsequently, we extend the equivalence to Boolean dependencies and to the presence of NOT NULL constraints using Schaerf and Cadoli’s S-3 logics where S corresponds to the set of attributes declared NOT NULL. The findings also apply to Codd’s interpretation “value at present unknown” utilizing a weak possible world semantics. Our results establish NOT NULL constraints as an effective mechanism to balance the expressiveness and tractability of consequence relations, and to control the degree by which the existing classical theory of data dependencies can be soundly approximated in practice.
Example 1. Consider a table Supplies with the column headers A(rticle), S(upplier), L(ocation) and C(ost). The table collects information about suppliers that deliver articles from a location at a certain cost.
CREATE TABLE SUPPLIES (
  Article CHAR(20),
  Supplier VARCHAR NOT NULL,
  Location VARCHAR NOT NULL,
  Cost CHAR(8));
Categories and Subject Descriptors H.2.1 [Database Management]: Logical Design; H.2.4 [Database Management]: Systems—Relational Databases; F.2.3 [Analysis of Algorithms and Problem Complexity]: Tradeoffs among Complexity Measures; F.4.1 [ Mathematical Logic and Formal Languages]: Mathematical Logic
Suppose the database management system enforces the following constraints: The FD A → S says that for every article there is at most one supplier, the FD A, L → C says that the cost is determined by the article and the location, and the MVD S ↠ L says that the locations are determined by the supplier independently of the articles and costs. Do the following meaningful constraints also need to be enforced explicitly, or are they already enforced implicitly: i) the FD A → C and ii) the MVD A ↠ L?
General Terms Algorithms, Design, Management, Theory
Keywords Axiomatization, Data dependency, Implication, Logic of Paradox, Null value, S-3 logic
1.
No existing theory provides tools that can reason about FDs and MVDs in the presence of arbitrary NOT NULL constraints, cf. the decision problems in Example 1. The classical theory of FDs and MVDs (e.g. [6, 4, 16, 35]) only applies to total relations, i.e., where every attribute is NOT NULL. Atzeni and Morfuni allow a null-free subschema (NFS) that captures SQL’s NOT NULL constraint under Zaniolo’s no-information null value ni [43], but they only consider FDs [3]. Lien studied the combined class of FDs and MVDs over partial relations under the no-information interpretation, but the interaction of FDs and MVDs is trivial since every attribute is assumed to be NULL [28, 30], i.e., the NFS is forced to be empty by default. An empty NFS results in a fairly weak expressiveness of the consequence relation associated with the implication of FDs and MVDs. However, this
INTRODUCTION
A database system manages a collection of persistent information in a shared, reliable, effective and efficient way.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. PODS’10, June 6–11, 2010, Indianapolis, Indiana, USA. Copyright 2010 ACM 978-1-4503-0033-9/10/06 ...$10.00.
Example 2. Let R = ASLC, Rs = SL, Σ = {A → S; A, L → C; S ↠ L} as in Example 1. It turns out that Σ and Rs indeed imply the FD A → C and also the MVD A ↠ L. However, this reasoning depends on the NFS Rs. In fact, if Rs = ASC, then the relation
Article  Supplier  Location  Cost
Kiwi     G6Kiwi    ni        1.50
Kiwi     G6Kiwi    ni        2.50
For Rs = ALC, the relation considered in this example is
Article  Supplier  Location   Cost
Kiwi     ni        Maunganui  1.50
Kiwi     ni        Taranaki   2.50
As a second contribution of this paper we establish equivalences between the implication of several data dependency classes over partial relations and the implication of propositional formulae in fragments of Graham Priest’s well-known paraconsistent Logic of Paradox LP [33]. We extend our equivalences to the presence of an NFS Rs by not allowing the truth values of the variables that correspond to attributes in Rs to be paradoxical P. This leads us to Schaerf and Cadoli’s S-3 logics in which S refers to the set of propositional variables that cannot be paradoxical [36]. In particular, the special case where S = ∅ corresponds to Priest’s logic LP, and the special case where S is the entire relation schema R corresponds to classical propositional logic PL. The situation for the case where R = ABC is illustrated in Figure 1. Our results subsume the well-known equivalences established for the classes of FDs, MVDs and Boolean dependencies over total relations [15, 35], i.e., the special case where Rs = R. From the case of total relations it is known that the equivalences do not hold for more general classes of dependencies such as join or embedded dependencies [35]. We even show that the equivalences do not extend beyond the binary case of Delobel’s full hierarchical dependencies (i.e. beyond MVDs) [13]. In this sense, the combined class of FDs and MVDs is significant and natural.
We apply the new correspondences in different ways. The worst-case time-complexity analyses of the consequence relation for Priest’s logic LP [10] carry over to Boolean dependencies over partial relations. The uniform complexity results established for the S-3 logics [36] carry over to the corresponding data dependencies in the presence of an NFS. In particular, by specifying attributes as NOT NULL, the data administrator has a powerful mechanism to approximate the classical theory of FDs and MVDs soundly. We also apply our correspondences in the opposite direction.
In fact, our algorithms for deciding FD and MVD implication in the presence of an arbitrary NFS Rs directly establish new upper bounds for the time-complexity of the consequence relations in the corresponding fragments in Rs -3 logics, fragments not studied previously to our knowledge. Finally, we show how all results carry over to Codd’s null-
Example 3. Suppose the NFS over Supplies is empty, denoted by Rs = ∅. Consider again the relation
Article  Supplier  Location  Cost
Kiwi     G6Kiwi    ni        1.50
Kiwi     G6Kiwi    ni        2.50
P (both true and false) to L′. Indeed, in Priest’s paraconsistent logic LP this interpretation is a model of the formulae ¬A′ ∨ L′ and ¬L′ ∨ C′ (evaluating both to P), but not a model of the formula ¬A′ ∨ C′ (evaluating it to F).
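To make the role of a NOT NULL declaration concrete, the following brute-force sketch replays the three formulae of Example 3 for different choices of the set S of variables that are not allowed to be paradoxical. It is only an illustration under stated assumptions: the numeric encoding of T, P and F, the clause representation, and the names `holds` and `s3_implies` are hypothetical and not taken from the paper.

```python
from itertools import product

# Assumed numeric encoding of the truth values: with T > P > F,
# max acts as disjunction and x -> 1 - x acts as negation.
T, P, F = 1.0, 0.5, 0.0

def holds(clause, w):
    """A clause is a list of (variable, is_positive) literals.
    It holds under w if its value is designated, i.e. T or P."""
    value = max(w[v] if positive else 1.0 - w[v] for v, positive in clause)
    return value >= P

def s3_implies(variables, S, premises, conclusion):
    """Brute force: variables in S range over {T, F}, all others over {T, P, F}."""
    domains = [(T, F) if v in S else (T, P, F) for v in variables]
    for values in product(*domains):
        w = dict(zip(variables, values))
        if all(holds(c, w) for c in premises) and not holds(conclusion, w):
            return False  # found a counter-model
    return True

variables = ["A", "L", "C"]
premises = [[("A", False), ("L", True)],   # not A' or L'  (FD A -> L)
            [("L", False), ("C", True)]]   # not L' or C'  (FD L -> C)
conclusion = [("A", False), ("C", True)]   # not A' or C'  (FD A -> C)

in_lp = s3_implies(variables, set(), premises, conclusion)          # S empty: LP
with_L = s3_implies(variables, {"L"}, premises, conclusion)         # L' classical
without_L = s3_implies(variables, {"A", "C"}, premises, conclusion) # L' may be P
classical = s3_implies(variables, set(variables), premises, conclusion)
print(in_lp, with_L, without_L, classical)
```

In this sketch the inference fails exactly when L′ may take the value P, which mirrors the observation that the counterexample relation needs Location to be NULL.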
Consequently, to establish SQL’s NOT NULL constraint as an effective control mechanism, the interaction of FDs and MVDs must be studied in the presence of an arbitrary null-free subschema. As a first contribution of this paper we establish a finite axiomatization for the combined class of FDs and MVDs in the presence of an arbitrary NFS. In order to unify the currently existing, but orthogonal theories of FDs and MVDs (i.e. [6, 3, 28, 30, 8]) we first adapt Zaniolo’s no-information interpretation of null values [43]. We also extend Galil and Sagiv’s almost linear time algorithms for computing the dependency basis and deciding the associated implication problem from total relations [18, 34] to the presence of an arbitrary NFS.
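Figure 1: Boolean cube of S-3 logics over R = ABC. [Figure not reproduced: its vertices are the eight S-3 logics for S ⊆ {A, B, C}, from LP = ∅-3 to PL = {A,B,C}-3; axis labels: efficiency; expressiveness, consistency, certainty.]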
satisfies Σ and Rs, but violates A → C and A ↠ L. The examples show that A → C can only be implied by Σ and Rs if Rs contains both S and L, and A ↠ L can only be implied by Σ and Rs if Rs contains S.
satisfies Σ and Rs , but violates A → C. If we choose Rs to be ALC instead, then the relation Article Kiwi Kiwi
may well be desirable to prevent the implicit enforcement of undesired dependencies. As we will show, the opportunity to specify an arbitrary NFS provides the data administrator with a flexible mechanism to control the expressiveness of the consequence relation. However, the following example illustrates that reasoning about FDs and MVDs in the presence of an arbitrary NFS is subtle, and automated tools for such reasoning tasks cannot be taken for granted.
This relation must obey the FDs A → L and L → C. However, it violates the FD A → C. This situation is only possible because Location is NULL. In fact, the two tuples in the relation must simultaneously agree on Location (to satisfy the FD A → L) and disagree on Location (to satisfy the FD L → C). This is a paradox, i.e., L′ ∧ ¬L′ is a true instance of an inconsistency. Therefore, non-explosive reasoning about data dependencies in the presence of nulls (at least under the no-information interpretation) means to reason in some paraconsistent logic. In fact, as we will show, the two tuples of the relation above correspond to an interpretation that assigns the truth value true (and only true) T to A′ and S′, false (and only false) F to C′ and paradoxical
ni ∈ dom(A). The intention of ni is to mean “no information”. This is the most primitive interpretation, and it can model missing as well as incomplete information [3, 43]. For attribute sets X and Y we may write XY for X ∪ Y. If X = {A1, . . . , Am}, then we may write A1 · · · Am for X. In particular, we may write simply A to represent the singleton {A}. A tuple over R (R-tuple or simply tuple, if R is understood) is a function t : R → ⋃_{A∈R} dom(A) with t(A) ∈ dom(A) for all A ∈ R. The null value occurrence t(A) = ni associated with an attribute A in a tuple t means that no information is available about the attribute A for the tuple t. For X ⊆ R let t[X] denote the restriction of the tuple t over R to X. A (partial) relation r over R is a finite set of tuples over R. Let t1 and t2 be two tuples over R. It is said that t1 subsumes t2 if for every attribute A ∈ R, t1[A] = t2[A] or t2[A] = ni holds. In consistency with previous work [3, 28, 43], the following restriction will be imposed, unless stated otherwise: No relation shall contain two tuples t1 and t2 such that t1 subsumes t2. With no null values present this means that no duplicate tuples occur. For a tuple t over R and a set X ⊆ R, t is said to be X-total if for all A ∈ X, t[A] ≠ ni. Similarly, a relation r over R is said to be X-total, if every tuple t of r is X-total. A relation r over R is said to be a total relation if it is R-total.
We recall projection and join operations on partial relations [3, 28]. Let r be some relation over R. Let X be some subset of R. The projection r[X] of r on X is the set of tuples t for which (i) there is some t1 ∈ r such that t = t1[X] and (ii) there is no t2 ∈ r such that t2[X] subsumes t and t2[X] ≠ t. For Y ⊆ X, the Y-total projection r_Y[X] of r on X is r_Y[X] = {t ∈ r[X] | t is Y-total}.
Given an X-total relation r over R and an X-total relation s over S such that X = R ∩ S the natural join r ⋈ s of r and s is the relation over R ∪ S which contains those tuples t such that there are t1 ∈ r and t2 ∈ s with t1 = t[R] and t2 = t[S] [3, 28].
Functional dependencies are important for the relational [5, 11] and other data models [1, 20, 24, 25, 37, 39, 40, 41]. According to Lien [28], a functional dependency with nulls (FD) over R is a statement X → Y where X, Y ⊆ R. The FD X → Y over R is satisfied by a relation r over R (⊨_r X → Y) if and only if for all t1, t2 ∈ r the following holds: if t1 and t2 are X-total and t1[X] = t2[X], then t1[Y] = t2[Y]. Recall that ni ∈ dom(A) for every attribute A. For total relations the FD definition reduces to the standard definition of a functional dependency [32], and so is a sound generalization. It is also consistent with the no-information interpretation [3, 28].
According to Lien [28], a multivalued dependency with nulls (MVD) over R is a statement X ↠ Y where X, Y ⊆ R. The MVD X ↠ Y over R is satisfied by a relation r over R (⊨_r X ↠ Y) if and only if for all t1, t2 ∈ r the following holds: if t1 and t2 are X-total and t1[X] = t2[X], then there is some t ∈ r such that t[XY] = t1[XY] and t[X(R − Y)] = t2[X(R − Y)]. Informally, the relation r satisfies X ↠ Y when every X-total value determines the set of values on Y independently of the set of values on R − Y. It has been shown that ⊨_r X ↠ Y if and only if r_X[R] = r_X[XY] ⋈ r_X[X(R − Y)] [28]. Again, the MVD definition is a sound generalization of the standard definition of a multivalued dependency over total relations [16, 32].
Following Atzeni and Morfuni [3], a null-free subschema (NFS) over the relation schema R is an expression Rs where Rs ⊆ R. The NFS Rs over R is satisfied by a relation r
value unk (value at present unknown) under the weak possible world semantics of Levene and Loizou [25]. Organization. We summarize related work in Section 2 and the basic definitions from previous work in Section 3. In Section 4 we establish an axiomatization of FDs and MVDs in the presence of an arbitrary NFS, and develop an almost linear time algorithm to decide the corresponding implication problem. The correspondences to Priest’s Logic of Paradox are established in Section 5. These are extended to Boolean dependencies in Section 6, and to the presence of an NFS in Section 7 where we utilize S-3 logics. We discuss how our results apply to the null value unk in Section 8. We conclude in Section 9 and discuss some possible directions of future work in Section 10.
2. RELATED WORK
Data dependencies have been studied thoroughly in various data models, cf. [32]. Applications comprise almost the full range of database topics, e.g. normalization, requirements engineering and schema validation, data mining, database security, view maintenance and query optimization. They have received considerable attention in other data models [1, 3, 9, 20, 21, 22, 25, 37, 39]. New application areas involve data cleaning, data transformations, consistent query answering, data exchange and data integration. For total relations, Armstrong [2] established the first axiomatization for FDs. Beeri, Fagin, and Howard extended this axiomatization to the combined class of FDs and MVDs [6]. Biskup [7] and Biskup/Link [8] studied notions of FD and MVD implication with an unfixed underlying set of attributes. For FDs over total relations, implication can be decided in time linear in the input [5], for MVDs the best known algorithm runs in almost linear time [18]. Equivalences between the implication of data dependencies over total relations and the implication of formulae in propositional fragments were established by Sagiv, Delobel, Parker and Fagin [15, 35]. One of the most important extensions of Codd’s basic relational model [11] concerns incomplete information [12, 23, 26]. In this paper we focus on incomplete relations. In the literature many kinds of null values have been proposed; for example, “missing” or “value unknown at present”, “nonexistence”, “inapplicable”, “no information” [43] and “open”. Lien [28] axiomatized FDs and MVDs in partial relations under the no-information interpretation, but without an NFS. In the same setting, axiomatizations for FD/MVD implication have been established [30] where the underlying universe is not fixed. Atzeni and Morfuni established axiomatizations and linear-time algorithms for deciding the implication of FDs combined with existence constraints, e.g., null-free subschemata [3].
3. PRELIMINARIES
We summarize the basic notions required for our treatment of data dependencies over partial relations. Let A = {A1 , A2 , . . .} be a (countably) infinite set of distinct symbols, called attributes (column names). A relation schema is a finite non-empty subset R of A. Each attribute A of a relation schema R is associated with an infinite domain dom(A) which represents the possible values that can occur in column A. In order to encompass incomplete information every column may have a null value, denoted by
over R (⊨_r Rs) if and only if r is Rs-total. SQL allows the specification of attributes as NOT NULL, cf. Example 1. Hence, the set of attributes declared NOT NULL forms the single NFS over the underlying relation schema.
For a set Σ of constraints over some relation schema R, we say that a relation r over R satisfies Σ (⊨_r Σ) if r satisfies every σ ∈ Σ. If for some σ ∈ Σ the relation r does not satisfy σ we say that r violates σ (and violates Σ) and write ⊭_r σ (⊭_r Σ). We will consider different classes C of constraints over a single relation schema, e.g. FDs and MVDs. During the design process or the lifetime of a database one usually needs to determine further dependencies which are implied by the given ones. Let R be a relation schema, let Rs ⊆ R denote an NFS over R, and let Σ ∪ {ϕ} be a set of data dependencies over R in the class C. We say that Σ implies ϕ in the presence of Rs (Σ ⊨_Rs ϕ) if every relation r over R that satisfies Σ and Rs also satisfies ϕ. If Σ does not imply ϕ in the presence of Rs we may also write Σ ⊭_Rs ϕ. The implication problem for C in the presence of a null-free subschema is to decide, given any relation schema R, any NFS Rs over R, and any set Σ ∪ {ϕ} of data dependencies in C over R, whether Σ ⊨_Rs ϕ. For the classes C of dependencies we consider here, the sets Σ ∪ {ϕ} over a relation schema R are always finite. Moreover, if Rs = ∅ we also write Σ ⊨ ϕ instead of Σ ⊨_∅ ϕ. This covers the case where every attribute is NULL. The case where every attribute is NOT NULL is covered when Rs = R. For the classes of dependencies we consider here it does not matter whether the relations are finite or not. For this reason, we will only speak of the implication problem. We will show later that it even suffices to consider two-tuple relations. We say that Σ implies ϕ in the world of two-tuple relations and in the presence of an NFS Rs (Σ ⊨_{2-Rs} ϕ) if every two-tuple relation r over R that satisfies Σ and the NFS Rs also satisfies ϕ.
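The satisfaction conditions for FDs, MVDs and an NFS over partial relations can be checked mechanically. The sketch below is a direct, unoptimized reading of those definitions and applies them to the two relations of Example 2. Its representation is an assumption made for illustration: tuples are Python dicts, `None` stands for ni, attributes are single letters, and all function names are hypothetical.

```python
NI = None  # assumed stand-in for the no-information null value ni

def is_total(t, X):
    """t is X-total if it carries no ni on any attribute of X."""
    return all(t[a] is not NI for a in X)

def agree(t1, t2, X):
    """t1[X] = t2[X]; ni is an ordinary domain element, so ni equals ni."""
    return all(t1[a] == t2[a] for a in X)

def satisfies_fd(r, X, Y):
    return all(agree(t1, t2, Y)
               for t1 in r for t2 in r
               if is_total(t1, X) and is_total(t2, X) and agree(t1, t2, X))

def satisfies_mvd(r, R, X, Y):
    XY = set(X) | set(Y)
    XZ = set(X) | (set(R) - set(Y))   # X(R - Y)
    return all(any(agree(t, t1, XY) and agree(t, t2, XZ) for t in r)
               for t1 in r for t2 in r
               if is_total(t1, X) and is_total(t2, X) and agree(t1, t2, X))

def satisfies_nfs(r, Rs):
    return all(is_total(t, Rs) for t in r)

R = "ASLC"

def satisfies_sigma(r):
    """Sigma = {A -> S; A,L -> C; S ->> L} from Example 1."""
    return (satisfies_fd(r, "A", "S") and satisfies_fd(r, "AL", "C")
            and satisfies_mvd(r, R, "S", "L"))

# First relation of Example 2 (considered with Rs = ASC).
r1 = [dict(A="Kiwi", S="G6Kiwi", L=NI, C="1.50"),
      dict(A="Kiwi", S="G6Kiwi", L=NI, C="2.50")]
# Second relation of Example 2 (considered with Rs = ALC).
r2 = [dict(A="Kiwi", S=NI, L="Maunganui", C="1.50"),
      dict(A="Kiwi", S=NI, L="Taranaki", C="2.50")]

report = {
    "r1": (satisfies_sigma(r1), satisfies_nfs(r1, "ASC"),
           satisfies_fd(r1, "A", "C"), satisfies_mvd(r1, R, "A", "L")),
    "r2": (satisfies_sigma(r2), satisfies_nfs(r2, "ALC"),
           satisfies_fd(r2, "A", "C"), satisfies_mvd(r2, R, "A", "L")),
}
print(report)
```

Each entry of `report` lists, in order, whether the relation satisfies Σ, its NFS, the FD A → C, and the MVD A ↠ L.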
The two-tuple implication problem for C in the presence of a null-free subschema is to decide, given any relation schema R, any NFS Rs over R and any set Σ ∪ {ϕ} of dependencies in C over R, whether Σ ⊨_{2-Rs} ϕ holds. Again, we may simply write Σ ⊨_2 ϕ, if Rs = ∅. For a set Σ of data dependencies in C over a relation schema R and an NFS Rs over R, let Σ*_Rs = {ϕ ∈ C | Σ ⊨_Rs ϕ} be its semantic closure. For a finite set Σ of dependencies in C, let Σ⁺_ℜ = {ϕ | Σ ⊢_ℜ ϕ} be its syntactic closure under inferences by a set ℜ of inference rules [32]. ℜ is said to be sound (complete) for the implication of dependencies in C in the presence of an NFS if for every relation schema R, for every NFS Rs over R and for every set Σ of dependencies in C over R we have Σ⁺_ℜ ⊆ Σ*_Rs (Σ*_Rs ⊆ Σ⁺_ℜ). The (finite) set ℜ is said to be a (finite) axiomatization for the implication of dependencies in C in the presence of an NFS if ℜ is both sound and complete for the implication of dependencies in C in the presence of an NFS.
4.
Rule (label)                      Premises          Condition         Conclusion
reflexivity (R_F)                 (none)                              XY → Y
decomposition (D_F)               X → YZ                              X → Y
FD union (U_F)                    X → Y; X → Z                        X → YZ
R-complementation (C_M^R)         X ↠ Y                               X ↠ R − Y
MVD union (U_M)                   X ↠ Y; X ↠ Z                        X ↠ YZ
null transitivity (T_M)           X ↠ W; Y ↠ Z      Y ⊆ X(W ∩ Rs)     X ↠ Z − W
implication (I_FM)                X → Y                               X ↠ Y
null mixed transitivity (T_FM)    X ↠ W; Y → Z      Y ⊆ X(W ∩ Rs)     X → Z − W
Table 1: FD/MVD axiomatization D with NFS Rs
time. In particular, for the special case of total relations the currently best known bound is matched [18].
4.1 Axiomatization
D is sound for the implication of FDs and MVDs in the presence of an NFS. This follows from the soundness of the FD inference rules of reflexivity, decomposition, and FD union, cf. [3, 28], R-complementation, MVD union and implication, cf. [28], and an inspection of the null transitivity rule and the null mixed transitivity rule for the implication of FDs and MVDs. For a relation schema R, an NFS Rs and a set Σ of FDs and MVDs over R let Dep_Σ(X) := {Y ⊆ R | Σ ⊢_D X ↠ Y} denote the set of all attribute subsets Y of R such that X ↠ Y can be inferred from Σ by D. The soundness of the union and R-complementation rules implies that (Dep_Σ(X), ⊆, ∪, ∩, (·)^C, ∅, R) forms a finite Boolean algebra where (·)^C maps an attribute set Y to its complement R − Y. Recall that an element a ∈ P of a poset (P, ⊑, 0) with least element 0 is called an atom of (P, ⊑, 0) precisely when a ≠ 0 and every element b ∈ P with b ⊑ a satisfies b = 0 or b = a. Further, (P, ⊑, 0) is said to be atomic if for every element b ∈ P − {0} there is an atom a ∈ P with a ⊑ b. In particular, every finite Boolean algebra is atomic. Let DepB_Σ(X) denote the set of all atoms of (Dep_Σ(X), ⊆, ∅). Following Beeri [4] we call DepB_Σ(X) the dependency basis of X with respect to Σ. Moreover, let X⁺_Σ = {A | Σ ⊢_D X → A} denote the closure of X with respect to Σ [5]. The significance of these notions is embodied in the following theorem.
DEDICATED TOOLS FOR REASONING
We establish the inference system D from Table 1 as the first finite axiomatization for the implication of FDs and MVDs in the presence of an NFS. This subsumes Beeri, Fagin, and Howard’s axiomatization over total relations [6], Atzeni and Morfuni’s axiomatization of FDs in the presence of an NFS [3], and Lien’s axiomatization of FDs and MVDs in the absence of an NFS [28]. We establish that the associated implication problem can be decided in almost linear
Theorem 1. Let Σ be a set of FDs and MVDs, and Rs an NFS over the relation schema R. Then we have:
1. Σ ⊢_D X ↠ Y if and only if Y = ⋃𝒴 for some 𝒴 ⊆ DepB_Σ(X),
2. Σ ⊢_D X → Y if and only if Y ⊆ X⁺_Σ, and
3. if Σ ⊢_D X → A, then {A} ∈ DepB_Σ(X).
      X(X⁺_Σ ∩ Rs)   (X⁺_Σ − X) − Rs   W1 ∩ Rs   W1 − Rs   ···   Wi      ···   Wk ∩ Rs   Wk − Rs
t     0···0          ni···ni           0···0     ni···ni   ···   0···0   ···   0···0     ni···ni
t′    0···0          ni···ni           0···0     ni···ni   ···   1···1   ···   0···0     ni···ni
Table 2: The relation rϕ in the completeness proof
X⁺_Σ = X⁺_{Σ[XRs]} and DepB_Σ(X) = DepB_{Σ[XRs]}(X). Thus Galil’s algorithm even applies to partial relations when we only consider dependencies Y ↠ Z or Y → Z ∈ Σ[XRs]. This yields an O(|Σ| + min(k_{Σ[XRs]}, log p_{Σ[XRs]}) |Σ[XRs]|) time algorithm to compute DepB_Σ(X) under the NFS Rs. Let Σ be a set of FDs and MVDs. If DepB_Σ(X) is known, the implication problem Σ ⊨_Rs ϕ for a given FD or MVD ϕ with left-hand side X can be decided in linear time. In particular, Σ ⊨_Rs X → A holds when {A} ∈ DepB_Σ(X) and Σ contains an FD Y → Z with A ∈ Z. In practice, a minor modification of Galil’s algorithm gives an
The following result requires a non-trivial extension of the proof arguments from the special cases where Rs = R [6] and where Rs = ∅ [28].
Theorem 2. D is a finite axiomatization for the class of FDs and MVDs in the presence of null-free subschemata.
Proof Sketch. Let R be some relation schema, Σ a set of FDs and MVDs, and Rs an NFS over R. We need to show that Σ*_Rs ⊆ Σ⁺_D holds. Let ϕ denote the MVD X ↠ Y ∉ Σ⁺_D. We construct a two-tuple relation rϕ that violates X ↠ Y but satisfies Σ. Let DepB_Σ(X) be the disjoint union of {{A} | A ∈ X⁺_Σ} and {W1, . . . , Wk}. Since ϕ ∉ Σ⁺_D we conclude by Theorem 1 that Y is not the union of some elements of DepB_Σ(X). Consequently, there is some i ∈ {1, . . . , k} such that Y ∩ Wi ≠ ∅ and Wi − Y ≠ ∅ hold. Let rϕ := {t, t′} be the relation in Table 2. It can be shown that rϕ satisfies Σ but violates ϕ. Finally, let ϕ denote the FD X → Y ∉ Σ⁺_D. Due to the FD union rule there is some A ∈ Y such that X → A ∉ Σ⁺_D. It follows that A ∉ X⁺_Σ. Without loss of generality let A ∈ Wi. Again, rϕ satisfies Σ but violates ϕ.
4.2
O(|Σ| + min(k_{Σ[XRs]}, log p̄_{Σ[XRs]}) |Σ[XRs]|) algorithm for deciding Σ ⊨_Rs ϕ, cf. [18]. Here, p̄_Σ is the number of sets in DepB_Σ(X) that have non-empty intersection with the right-hand side of ϕ. Alternative algorithms for computing the dependency basis and deciding implication over total relations were given by Sagiv [34], cf. Section 7. Though Sagiv’s approach does not provide better time bounds, it is of interest as it directly exploits the equivalence between the implication of FDs and MVDs over total relations and the logical implication in a fragment of propositional logic. Galil [18] predicts that using this equivalence one may possibly come up with a linear-time algorithm to decide implication. This provides strong motivation for investigating the implication of FDs and MVDs over partial relations from a logical point of view.
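The refinement idea behind Beeri’s algorithm, restricted to dependencies whose left-hand side lies within XRs, can be sketched in a few lines. This is a naive illustration and not Galil’s almost linear time algorithm. Several points are assumptions of the sketch rather than statements of the paper: attributes are single letters, an FD Y → Z is used as the MVDs Y ↠ {A} for each A ∈ Z, the FD test only looks at FDs in Σ[XRs] and requires A ∈ Z − Y, and all function names are hypothetical.

```python
from itertools import combinations

def dependency_basis(R, Rs, fds, mvds, X):
    """Refine the partition {R - X} U {{A} | A in X} with Beeri's extended rule."""
    R, X = frozenset(R), frozenset(X)
    allowed = X | frozenset(Rs)  # left-hand sides must lie within X U Rs
    splitters = [(frozenset(Y), frozenset(Z)) for Y, Z in mvds]
    # Assumption: an FD Y -> Z contributes the MVDs Y ->> {A} for each A in Z.
    splitters += [(frozenset(Y), frozenset([a])) for Y, Z in fds for a in Z]
    splitters = [(Y, Z) for Y, Z in splitters if Y <= allowed]
    blocks = {frozenset([a]) for a in X}
    if R - X:
        blocks.add(R - X)
    changed = True
    while changed:
        changed = False
        for W in list(blocks):
            for Y, Z in splitters:
                # Rule conditions: W and Y disjoint, and Z splits W properly.
                if not (W & Y) and (W & Z) and (W - Z):
                    blocks.remove(W)
                    blocks |= {W & Z, W - Z}
                    changed = True
                    break
            if changed:
                break
    return blocks

def implies_mvd(R, Rs, fds, mvds, X, Y):
    """X ->> Y is implied iff Y is a union of blocks of the dependency basis."""
    Y = frozenset(Y)
    return all(W <= Y or not (W & Y)
               for W in dependency_basis(R, Rs, fds, mvds, X))

def implies_fd(R, Rs, fds, mvds, X, a):
    """Single-attribute FD X -> a, tested against the FDs in Sigma[X Rs]."""
    X = frozenset(X)
    if a in X:
        return True
    allowed = X | frozenset(Rs)
    has_fd = any(frozenset(Y) <= allowed and a in set(Z) - set(Y) for Y, Z in fds)
    return has_fd and frozenset([a]) in dependency_basis(R, Rs, fds, mvds, X)

# Example 1: Sigma = {A -> S; A,L -> C; S ->> L} over R = ASLC.
R = "ASLC"
fds = [("A", "S"), ("AL", "C")]
mvds = [("S", "L")]

# Try every possible NFS and record (A -> C implied?, A ->> L implied?).
results = {}
for k in range(len(R) + 1):
    for Rs in combinations(R, k):
        results[frozenset(Rs)] = (implies_fd(R, Rs, fds, mvds, "A", "C"),
                                  implies_mvd(R, Rs, fds, mvds, "A", "L"))
```

On this input the sketch reports A → C as implied exactly for those NFS that contain both S and L, and A ↠ L exactly for those that contain S, which agrees with the observations made after Example 2.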
Algorithms
For total relations, Beeri [4] presented an algorithm for computing DepB_Σ(X) that runs in time O(|Σ|⁴). It is based on Beeri’s rule which we extend here to the general case of partial relations with an NFS Rs.
5.
Premises: X ↠ W; Y ↠ Z
Condition: W ∩ Y = ∅; Y ⊆ XRs
Conclusions: X ↠ W ∩ Z; X ↠ W − Z
EQUIVALENCE TO LP
We establish an equivalence between the implication of Lien, Atzeni and Morfuni’s classes of FDs and MVDs over partial relations (where every attribute is NULL) and the logical implication in a fragment of Priest’s well-known paraconsistent Logic of Paradox [33].
In fact, we derive X ↠ R − W by the R-complementation rule, and X ↠ W ∩ Z by the null transitivity rule (Y ⊆ R − W and Y ⊆ XRs give Y ⊆ X(Rs − W)). The MVD X ↠ W − Z can be derived similarly. For total relations the condition Y ⊆ XRs becomes trivial. The idea of Beeri’s algorithm is to start with a partition B := {R − X} ∪ {{A} | A ∈ X} of R which is then stepwise refined by applying Beeri’s rule to sets W ∈ B and dependencies Y ↠ Z or Y → Z ∈ Σ that meet the conditions in Beeri’s rule and satisfy W ∩ Z ≠ ∅ and W − Z ≠ ∅. In each step, W is split into W ∩ Z and W − Z. Note that Σ ⊢_D X ↠ W for all W ∈ B is a loop invariant. The algorithm stops when no further refinement is possible. The resultant partition B is then the dependency basis DepB_Σ(X) of X. More sophisticated implementations of this idea by Hagihara et al. [19] resulted in an O(min((f_Σ + k_Σ)² d_Σ, |Σ|²)) time algorithm, and later by Galil [18] resulted in an O(|Σ| + min(k_Σ, log p_Σ)|Σ|) time algorithm. Herein, f_Σ, k_Σ and d_Σ are the numbers of FDs, of MVDs, and of distinct attributes, respectively, in Σ, while p_Σ denotes the number of sets in DepB_Σ(X). Note that Galil’s algorithm runs in linear time when Σ contains only FDs but no MVDs. Let Σ[U] contain only those dependencies from Σ whose left-hand side is a subset of U. From Theorem 2 we conclude
5.1 Graham Priest’s Logic of Paradox
The proof-theoretic aim of paraconsistent logics is to reason about systems that may be inconsistent. Formalisms such as theory change deal with inconsistencies in knowledge bases by avoiding them, and by removing them once they are located. Paraconsistent logics, on the other hand, reason non-explosively in the presence of inconsistencies. In classical logic, a theory is consistent if and only if it has a model. The trademark of paraconsistent logics is that inconsistent theories can have models. In the Logic of Paradox a sentence can be either true (and not false) T, or false (and not true) F, or paradoxical (both true and false) P. This yields a three-valued logic based on Kleene’s truth tables (Table 3), but in which the third truth value indicates that a sentence is paradoxical, as opposed to being undefined or undetermined in strong Kleene logic. Note that Codd [12] suggested the same tables for the null interpretation of “value at present unknown” to extend the relational algebra by means of a three-valued logic and the null substitution principle, cf. Section 8.
 ϕ′    T  P  F
¬ϕ′    F  P  T

 ∧     T  P  F
 T     T  P  F
 P     P  P  F
 F     F  F  F

 ∨     T  P  F
 T     T  T  T
 P     T  P  P
 F     T  P  F
there is a relation r that satisfies Σ and violates ϕ if and only if there is a model ω′_r of Σ′ that is not a model of ϕ′. For arbitrary finite relations r it is not obvious how to define the LP interpretation ω′_r. Hence, the key to showing the strong correspondence between counterexample relations and counterexample interpretations is the following lemma.
Table 3: Truth functions in LP
Lemma 1. Let Σ ∪ {ϕ} be a set of FDs and MVDs over the relation schema R, and let r be some relation over R that satisfies Σ and violates ϕ. Then there is a two-tuple subrelation r′ ⊆ r such that r′ satisfies Σ and violates ϕ.
L∗ denotes the propositional language over a finite set L of propositional variables, generated from the unary connective ¬ (negation), and the binary connectives ∧ (conjunction) and ∨ (disjunction). That is, L∗ is the smallest set that satisfies: i) L ⊆ L∗, ii) if ϕ′ ∈ L∗, then (¬ϕ′) ∈ L∗, iii) if ϕ′_1, ϕ′_2 ∈ L∗, then (ϕ′_1 ∨ ϕ′_2), (ϕ′_1 ∧ ϕ′_2) ∈ L∗. One may also define ϕ′_1 → ϕ′_2 as ¬ϕ′_1 ∨ ϕ′_2. We omit parentheses if it does not cause ambiguities. We denote variables with upper-case Latin letters, e.g. A′, B′, C′, or subscripted as A′_1, A′_2, A′_3. Elements of L∗ are denoted by lower-case Greek letters such as ϕ′, σ′, ψ′, or their subscripted version, and subsets of L∗ are denoted by the upper-case Greek letter Σ′.
An LP interpretation ω′ over L is a total function from L to the set of truth values {F, P, T}. The semantics of a formula ϕ′ from L∗ in an LP interpretation is defined in the usual compositional way given the truth tables in Table 3. That is, we can extend ω′ to a total function Ω′ : L∗ → {F, P, T} as follows: i) Ω′(A′) := ω′(A′) for all A′ ∈ L, ii) Ω′(¬ϕ′) := ¬Ω′(ϕ′), iii) Ω′(ϕ′ ∧ ψ′) := Ω′(ϕ′) ∧ Ω′(ψ′), and iv) Ω′(ϕ′ ∨ ψ′) := Ω′(ϕ′) ∨ Ω′(ψ′). As usual, the connectives on the left-hand sides denote syntactic constructs that generate L∗ from L, whereas the symbols on the right-hand sides are the semantic truth functions defined in Table 3. When working with more than two truth values, one has to define the set of designated values. The Logic of Paradox LP has {P, T} as its set of designated truth values since a paradoxical formula is true (and false) [33]. An LP interpretation ω′ is a model of a set Σ′ of formulae in L∗, denoted by ⊨_ω′ Σ′, if and only if for all σ′ ∈ Σ′ we have Ω′(σ′) ∈ {P, T}. We say that Σ′ LP implies a formula ϕ′, denoted by Σ′ ⊨_LP ϕ′, if and only if every LP interpretation that is a model of Σ′ is also a model of ϕ′. Let Σ′ = {A′ → B′, B′ → C′} and ϕ′ = A′ → C′.
The LP interpretation ω′ that maps A′ to T, B′ to P and C′ to F shows that Σ′ does not LP imply ϕ′, i.e., Σ′ ⊭_LP ϕ′. LP is distinguished from classical logic by the invalidity of the Modus Ponens, e.g. from A′ and A′ → B′ one may not conclude B′: simply assign F to B′ and P to A′.
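The truth functions of Table 3 and the definition of LP implication are small enough to check by exhaustive enumeration. The following sketch does so for the two failures just mentioned. The numeric encoding of the truth values, the representation of formulae as Python functions over an interpretation, and the name `lp_implies` are assumptions made for illustration only.

```python
from itertools import product

# Assumed encoding: with T > P > F, Table 3 is reproduced by
# negation = 1 - x, conjunction = min, disjunction = max.
T, P, F = 1.0, 0.5, 0.0
DESIGNATED = {T, P}

def neg(x):
    return 1.0 - x

def conj(x, y):
    return min(x, y)

def disj(x, y):
    return max(x, y)

def impl(x, y):
    """The defined connective: x -> y abbreviates (not x) or y."""
    return disj(neg(x), y)

def lp_implies(variables, premises, conclusion):
    """Every LP interpretation that is a model of the premises must also
    be a model of the conclusion; formulae are functions of w."""
    for values in product((T, P, F), repeat=len(variables)):
        w = dict(zip(variables, values))
        if (all(p(w) in DESIGNATED for p in premises)
                and conclusion(w) not in DESIGNATED):
            return False
    return True

# {A' -> B', B' -> C'} versus A' -> C'
transitivity = lp_implies(
    ["A", "B", "C"],
    [lambda w: impl(w["A"], w["B"]), lambda w: impl(w["B"], w["C"])],
    lambda w: impl(w["A"], w["C"]))

# {A', A' -> B'} versus B'
modus_ponens = lp_implies(
    ["A", "B"],
    [lambda w: w["A"], lambda w: impl(w["A"], w["B"])],
    lambda w: w["B"])

# Two inferences that do go through in LP, for contrast.
and_elimination = lp_implies(
    ["A", "B"], [lambda w: conj(w["A"], w["B"])], lambda w: w["A"])
excluded_middle = lp_implies(
    ["A"], [], lambda w: disj(w["A"], neg(w["A"])))

print(transitivity, modus_ponens, and_elimination, excluded_middle)
```

The first two checks fail because of the interpretations named in the text (B′ mapped to P, respectively A′ mapped to P and B′ to F), while conjunction elimination and the excluded middle remain valid in this sketch.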
5.2
Lemma 1 tells us that for deciding the implication problem Σ ⊨ ϕ it suffices to examine two-tuple relations (instead of arbitrary finite relations). For two-tuple relations {t, t′}, however, we can define a corresponding LP interpretation ω′_{t,t′}. For this purpose, we introduce an extension of the notion of agree sets of distinct tuples to the presence of null values. For two tuples t, t′ over relation schema R we define
ag^s(t, t′) = {A ∈ R | t(A) = t′(A) ∧ t(A) ≠ ni},
ag^w(t, t′) = {A ∈ R | t(A) = t′(A) ∧ t(A) = ni},
ag(t, t′) = ag^s(t, t′) ∪ ag^w(t, t′).
If A ∈ ag^s(t, t′) we say that t and t′ agree strongly on A. If A ∈ ag^w(t, t′) we say that t and t′ agree weakly on A, and if A ∉ ag(t, t′) we say that t and t′ disagree on A. We now define the special LP interpretation: for two tuples t, t′ over the relation schema R let ω′_{t,t′} denote the following LP interpretation of L_R:
\[
\omega'_{\{t,t'\}}(A') =
\begin{cases}
T, & \text{if } A \in ag^{s}(t,t'),\\
P, & \text{if } A \in ag^{w}(t,t'),\\
F, & \text{if } A \notin ag(t,t').
\end{cases}
\]
Note that the special LP interpretation is a generalization of Fagin's special truth assignment derived from total tuples, where ω′_{t,t′}(A′) = T if and only if t(A) = t′(A) [15]. The following lemma justifies the definition of the special LP interpretation and that of the corresponding LP fragments in terms of two-tuple relations.

Lemma 2. Let r be a two-tuple relation over the relation schema R, and let ϕ denote an FD or MVD over R. Then r satisfies ϕ if and only if ω′_r is a model of ϕ′.

In fact, Lemma 1 and Lemma 2 allow us to establish the anticipated equivalence between FD/MVD implication and the implication of their corresponding LP fragment.

Theorem 3. Let Σ ∪ {ϕ} be a set of FDs and MVDs over some relation schema R, and let Σ′ ∪ {ϕ′} denote the set of its corresponding LP formulae over L_R. Then the following three statements are equivalent:
Equivalences for FDs and MVDs
As a first step, we define the LP fragment that corresponds to FDs and MVDs. Let φ : R → L_R denote a bijection between a relation schema R and the set L_R = {A′ | A ∈ R} of propositional variables. We extend φ to a mapping Φ from the set of FDs and MVDs over R to the set L∗_R. For an FD X → B over R, let Φ(X → B) denote the formula
\[
\bigvee_{A \in X} \neg A' \;\vee\; B'.
\]
For the sake of presentation, but without loss of generality, we assume that FDs have only a single attribute on their right-hand side. For an MVD X ↠ Y over R, let Φ(X ↠ Y) denote the formula
\[
\bigvee_{A \in X} \neg A' \;\vee\; \Big(\bigwedge_{B \in Y - X} B'\Big) \;\vee\; \Big(\bigwedge_{C \in R - XY} C'\Big).
\]
Disjunctions over zero disjuncts are interpreted as F and conjunctions over zero conjuncts are interpreted as T. We will simply denote Φ(ϕ) = ϕ′ and Φ(Σ) = {Φ(σ) | σ ∈ Σ} = Σ′. We will now show that for any FD and MVD set Σ ∪ {ϕ}
1. Σ |= ϕ,
2. Σ |=2 ϕ, and
3. Σ′ ⊨_LP ϕ′.
Example 4. Let R = ASLC denote the relation schema Supplies, and let Σ contain the FDs A → S and A, L → C, and the MVD S ↠ L. The following relation r

Article   Supplier   Location    Cost
Kiwi      ni         Maunganui   1.50
Kiwi      ni         Taranaki    2.50

shows that Σ implies neither the MVD ϕ_1 = A ↠ L nor the FD ϕ_2 = A → C. For ω′_r we obtain ω′_r(A′) = T, ω′_r(S′) = P, ω′_r(L′) = F and ω′_r(C′) = F. Indeed, ω′_r is a model of Σ′ but neither a model of ϕ′_1 nor of ϕ′_2.
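The computation in Example 4 can be replayed in a few lines. The sketch below derives the special LP interpretation from the two tuples of r and evaluates the images of the dependencies under Φ. It assumes the standard LP truth tables (order F < P < T, negation as 2 − v); representing ni by Python's `None` and the function names are assumptions made for illustration.

```python
F, P, T = 0, 1, 2          # LP truth values, ordered F < P < T
NI = None                  # stands for the null marker ni

SCHEMA = ["A", "S", "L", "C"]   # Article, Supplier, Location, Cost

def special_interpretation(t1, t2):
    """T on strong agreement, P on weak agreement (both ni), F on disagreement."""
    w = {}
    for a in SCHEMA:
        if t1[a] != t2[a]:
            w[a] = F
        elif t1[a] is NI:
            w[a] = P
        else:
            w[a] = T
    return w

def eval_fd(w, lhs, rhs):
    """Value of Phi(X -> B): disjunction of the negated X-variables and B'."""
    return max([2 - w[a] for a in lhs] + [w[rhs]])

def eval_mvd(w, lhs, rhs):
    """Value of Phi(X ->> Y): negated X-variables, conjunction over Y - X,
    and conjunction over R - XY (an empty conjunction counts as T)."""
    y = [a for a in rhs if a not in lhs]
    z = [a for a in SCHEMA if a not in lhs and a not in rhs]
    conj_y = min((w[a] for a in y), default=T)
    conj_z = min((w[a] for a in z), default=T)
    return max([2 - w[a] for a in lhs] + [conj_y, conj_z])

t1 = {"A": "Kiwi", "S": NI, "L": "Maunganui", "C": 1.50}
t2 = {"A": "Kiwi", "S": NI, "L": "Taranaki", "C": 2.50}
w_r = special_interpretation(t1, t2)

sigma_values = [
    eval_fd(w_r, "A", "S"),     # A -> S
    eval_fd(w_r, "AL", "C"),    # A, L -> C
    eval_mvd(w_r, "S", "L"),    # S ->> L
]
phi1_value = eval_mvd(w_r, "A", "L")   # A ->> L
phi2_value = eval_fd(w_r, "A", "C")    # A -> C

print(w_r, sigma_values, phi1_value, phi2_value)
```

All three formulae of Σ′ receive a designated value (P or T), while both ϕ′_1 and ϕ′_2 evaluate to F, which matches the claim of the example.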
5.3 Nonextendibility
Ω_{t,t′}(ϕ) ∈ {W, S}. In particular, BDs subsume FDs: a relation r satisfies the FD A_1, …, A_n → B_1, …, B_m [3, 28] if and only if r satisfies the BD (A_1 ∧ ··· ∧ A_n) → (B_1 ∧ ··· ∧ B_m). Let φ : R → L_R be a bijection between a relation schema R and the set L_R = {A′ | A ∈ R} of propositional variables. We extend φ to a mapping Φ from B_R to the set L∗_R. As before, let ϕ′ = Φ(ϕ). Then we have i) if ϕ = A, then ϕ′ = A′; ii) if ϕ = (¬ψ), then ϕ′ = (¬ψ′); iii) if ϕ = (ϕ_1 ∨ ϕ_2), then ϕ′ = (ϕ′_1 ∨ ϕ′_2); and if ϕ = (ϕ_1 ∧ ϕ_2), then ϕ′ = (ϕ′_1 ∧ ϕ′_2).
We show now that the equivalences of Theorem 3 are special: they do not extend beyond FDs and MVDs. Delobel introduced full first-order hierarchical decomposition (FOHD) as an extension of MVDs [13]. An FOHD over R is an expression X : {Y_1, …, Y_k} where X, Y_1, …, Y_k are all subsets of R such that XY_1 ··· Y_k = R. The FOHD X : {Y_1, …, Y_k} over R is satisfied by a relation r over R if and only if r_X[R] = r_X[XY_1] ⋈ ··· ⋈ r_X[XY_k]. MVDs X ↠ Y are binary FOHDs X : {Y, R − XY} where k = 2. For the FOHD X : {Y_1, …, Y_k}, denoted by ϕ, let ϕ′ denote its corresponding LP formula:
\[
\bigvee_{A \in X} \neg A' \;\vee\; \bigvee_{i=1}^{k} \bigwedge_{B \in Y_i} B'. \tag{5.1}
\]

Theorem 5. Let Σ ∪ {ϕ} be a set of Boolean dependencies over some relation schema R, and let Σ′ ∪ {ϕ′} denote the set of its corresponding LP formulae over L_R. Under the assumption that relations may contain duplicate tuples, the following three statements are equivalent:
1. Σ |= ϕ,
Theorem 4. For every integer n > 2 there is a relation schema R_n with n + 1 attributes, an MVD σ_n over R_n, and an n-ary FOHD ϕ_n over R_n such that σ_n does not imply ϕ_n, but σ′_n LP-implies ϕ′_n over L_{R_n}.
and
t_2^n = (a, b′_1, …, b′_{n−1}, b_n)
Example 5. Let L_R = {A′(rticle), L′(ocation), C′(ost)}, Σ′ = {A′ → C′} and ϕ′ = A′ → ¬L′. The LP interpretation ω′ with ω′(A′) = T = ω′(L′) and ω′(C′) = P shows that Σ′ ∪ {¬A′ ∨ ¬L′ ∨ ¬C′} ⊭_LP ϕ′. However, Σ ⊨ ϕ (if we disallow duplicate tuples) as all relations that i) violate ϕ and ii) do not contain R-total duplicate tuples contain two distinct tuples that are both isomorphic to (a, l, ni).
where a ∈ dom(A), b_n ∈ dom(B_n) and for all i = 1, …, n−1 the values b_i, b′_i ∈ dom(B_i) are distinct. Following correspondence (5.1) we have L_{R_n} = {A′, B′_1, …, B′_n}, σ′_n = ¬A′ ∨ ((B′_1 ∧ ··· ∧ B′_{n−1}) ∨ B′_n) and ϕ′_n = ¬A′ ∨ B′_1 ∨ ··· ∨ B′_n. The only interpretation that evaluates ϕ′_n to F (it maps A′ to T and every B′_i to F) also evaluates σ′_n to F. Hence, σ′_n logically implies ϕ′_n.
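The logical half of this argument can be confirmed by exhaustive search for small n. The sketch below assumes the standard LP truth tables (order F < P < T, negation as 2 − v); the function names are hypothetical. It only checks that σ′_n LP-implies ϕ′_n and does not model the relational side of the proof.

```python
from itertools import product

F, P, T = 0, 1, 2   # LP truth values, ordered F < P < T

def sigma_n(a, bs):
    """Value of  not A' or ((B1' and ... and B(n-1)') or Bn')."""
    return max(2 - a, min(bs[:-1]), bs[-1])

def phi_n(a, bs):
    """Value of  not A' or B1' or ... or Bn'."""
    return max([2 - a] + list(bs))

def lp_implies_n(n):
    """True iff every LP model of sigma_n' is also a model of phi_n'."""
    for a, *bs in product((F, P, T), repeat=n + 1):
        if sigma_n(a, bs) != F and phi_n(a, bs) == F:
            return False
    return True

results = {n: lp_implies_n(n) for n in range(3, 7)}
print(results)
```

The search space has 3^(n+1) interpretations, so this is only practical for small n; the general statement follows from the argument in the proof.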
6.
3. Σ′ ⊨_LP ϕ′.
There is no LP formula that characterizes the subsumption of two tuples: for R = A both r_1 = {a, ni} and r_2 = {a, a′} satisfy the same BDs over R, but r_2 is subsumption-free whereas r_1 is not. Over total relations, one can add the formula φ′_R = ⋁_{A∈R} ¬A′ to the premise set Σ′ as an alternative to allowing duplicate tuples. For partial relations, the next example shows that Theorem 5 fails when we disallow duplicate tuples, even after adding φ′_R to Σ′.
Proof. Let n > 2. Let R_n = {A, B_1, …, B_n}, σ_n = A ↠ B_n, and ϕ_n = A : {{B_1}, …, {B_n}}. A relation r_n = {t_1^n, t_2^n} over R_n that satisfies σ_n and violates ϕ_n is given by t_1^n = (a, b_1, …, b_{n−1}, b_n)
2. Σ |=2 ϕ, and
BOOLEAN DEPENDENCIES
An application. As a consequence of Theorem 5 we obtain worst-case time-complexity results for the implication problem of BDs. This problem has been investigated in depth for the logic LP, and Table 4 provides a summary of the results [10]. In fact, the problem has been analyzed with respect to three different notions of complexity defined by Vardi [38]. For data complexity, Σ is the input and ϕ has fixed size. For expression complexity, ϕ is the input and Σ has fixed size. For combined complexity, Σ and ϕ are the input. The complexity results also distinguish the input according to its syntactic form. A BD is in Conjunctive Normal Form (CNF) when it is a single conjunction of clauses, where a clause is a disjunction of literals (i.e. an attribute A or its negation ¬A). A BD is in Disjunctive Normal Form (DNF) when it is a single disjunction of conjunctions of literals. In Table 4, the symbol “Any” means that no assumption is made on the syntactic form of the BDs.
We introduce Boolean dependencies (BDs) over (partial) relations, based on our special LP interpretation. This class subsumes BDs over total relations [35] and FDs over (partial) relations [28, 3]. Note that MVDs are not BDs. Later on, we will extend BDs to the presence of an NFS. The class of Boolean dependencies (BDs) over a relation schema R is defined as the propositional language B_R := R∗ over R. An agreement over R is a function ω : R → {D, W, S}. For two distinct tuples t, t′ over R we define the agreement ω_{t,t′} of t and t′ by
\[
\omega_{\{t,t'\}}(A) =
\begin{cases}
S, & \text{if } A \in ag^{s}(t,t'),\\
W, & \text{if } A \in ag^{w}(t,t'),\\
D, & \text{if } A \notin ag(t,t')
\end{cases}
\]
for all A ∈ R. Intuitively, the definition encodes the following meaning: ω_{t,t′}(A) = S when t and t′ strongly agree on A, ω_{t,t′}(A) = W when t and t′ weakly agree on A, and ω_{t,t′}(A) = D when t and t′ disagree on A. We can extend an agreement ω over R to a function Ω : B_R → {D, W, S} as follows: i) if ϕ = A ∈ R let Ω(ϕ) := ω(A), ii) if ϕ = (¬ψ) let Ω(ϕ) := ¬Ω(ψ), iii) if ϕ = (ϕ_1 ∨ ϕ_2) let Ω(ϕ) := Ω(ϕ_1) ∨ Ω(ϕ_2); and if ϕ = (ϕ_1 ∧ ϕ_2) let Ω(ϕ) := Ω(ϕ_1) ∧ Ω(ϕ_2). On the right-hand side of these definitions, ¬, ∨, and ∧ denote the truth functions defined by Table 3 where F, P and T are replaced by D, W and S, respectively. For a relation r and a BD ϕ over relation schema R we say that r satisfies ϕ, denoted by ⊨_r ϕ, if and only if for all tuples t, t′ ∈ r the following holds: if t ≠ t′, then
7. EQUIVALENCES TO S-3 LOGICS
So far, we can reason logically about data dependencies in two opposite frameworks: for total relations in the classical propositional fragments by Sagiv, Delobel, Parker and Fagin's equivalences [35], and for partial relations in propositional fragments of LP. We will now unify these two orthogonal frameworks by allowing an arbitrary null-free subschema R_s over the underlying relation schema R. This allows us to capture the two special cases when R_s = R and R_s = ∅, respectively, as well as new equivalences for any NFS R_s that satisfies ∅ ⊂ R_s ⊂ R.
             Σ ⊨ ϕ
Σ     ϕ      Combined         Expression      Data
Any   Any    coNP-complete    coNP-complete   O(|Σ|)
Any   CNF    O(|Σ| × |ϕ|)     O(|ϕ|)          O(|Σ|)
Any   DNF    coNP-complete    coNP-complete   O(|Σ|)
CNF   Any    coNP-complete    coNP-complete   O(|Σ|)
CNF   CNF    O(|Σ| × |ϕ|)     O(|ϕ|)          O(|Σ|)
CNF   DNF    coNP-complete    coNP-complete   O(|Σ|)
DNF   Any    coNP-complete    coNP-complete   O(|Σ|)
DNF   CNF    O(|Σ| × |ϕ|)     O(|ϕ|)          O(|Σ|)
DNF   DNF    coNP-complete    coNP-complete   O(|Σ|)

Table 4: Time-complexities for Deciding BDs

if ω̂(A′) = F), and that does not map both a variable A′ ∈ L − S and its negation ¬A′ into F (we must not have ω̂(A′) = F = ω̂(¬A′) for any A′ ∈ L − S). Accordingly, for each variable A′ ∈ L and each S-3 interpretation ω̂ of L there are the following possibilities:
• ω̂(A′) = T and ω̂(¬A′) = F,
• ω̂(A′) = F and ω̂(¬A′) = T,
• ω̂(A′) = T and ω̂(¬A′) = T (only if A′ ∈ L − S).
7.1
S-3 interpretations generalize both standard 2-valued interpretations and Levesque's 3-interpretations [27], since a 2-valued interpretation is an S-3 interpretation where S = L, while a 3-interpretation is an S-3 interpretation with S = ∅. Hence, in S-3 interpretations the truth value P can be simulated only by variables that do not belong to S. An S-3 interpretation ω̂ : L^ℓ → {F, T} of L can be lifted to a total function Ω̂ : L∗ → {F, T} by means of simple rules. This lifting has been defined as follows [36]. An arbitrary formula ϕ′ in L∗ is first converted (in time linear in the size of the formula) into its corresponding formula ϕ′_N in Negation Normal Form (NNF) using the following rewriting rules: ¬(ϕ′ ∧ ψ′) ↦ (¬ϕ′ ∨ ¬ψ′), ¬(ϕ′ ∨ ψ′) ↦ (¬ϕ′ ∧ ¬ψ′), and ¬(¬ϕ′) ↦ ϕ′. Therefore, negation in a formula in NNF occurs only at the literal level. The rules for assigning truth values to NNF formulae are as follows:
Paradox-free Variables
To accommodate arbitrary NOT NULL constraints into our equivalences we require a logical counterpart to the semantics enforced by an NFS R_s over relation schema R. For the anticipated equivalences we consider LP interpretations ω′ of L_R that are paradox-free on the variables A′ in L_{R_s}, i.e. where ω′(A′) ≠ P. We call an LP interpretation ω′ of L_R an LP_{R_s} interpretation if and only if ω′ is paradox-free on all the variables in L_{R_s}.

Example 6. We reconsider Example 4 in the presence of NFSs. Since the relation r of Example 4 satisfies R_s^1 = ALC we know that Σ and R_s^1 imply neither ϕ_1 nor ϕ_2. Correspondingly, ω′_r satisfies ω′_r(B′) ∈ {F, T} for all B′ ∈ {A′, L′, C′}. Even under these restrictions, ω′_r is a model of Σ′ but neither a model of ϕ′_1 nor of ϕ′_2. The following relation s

Article   Supplier   Location   Cost
Kiwi      G6Kiwi     ni         1.50
Kiwi      G6Kiwi     ni         2.50

shows that Σ and R_s^2 = ASC do not imply the FD ϕ_2 = A → C. For ω′_s we obtain ω′_s(A′) = T, ω′_s(S′) = T, ω′_s(L′) = P and ω′_s(C′) = F. Indeed, ω′_s is a model of Σ′ but not a model of ϕ′_2. We reason logically to see that Σ and R_s^2 imply ϕ_1: suppose ω′ is a model of Σ′ but not a model of ϕ′_1. Then Ω′(ϕ′_1) = F. That is, ω′(A′) = T, and ω′(L′) = F and (ω′(S′) = F or ω′(C′) = F). Since ω′ is a model of Σ′ it follows that ω′ is a model of {S′, ¬S′ ∨ C′}. Since S′ and C′ are paradox-free we have ω′(S′) = T and ω′(C′) = T, a contradiction. Using the anticipated equivalence we would conclude that Σ and R_s^2 imply ϕ_1. We show that Σ and R_s^3 = SL imply ϕ_2. Indeed, assume ω′ is a model of Σ′ but not a model of ϕ′_2. Then ω′(A′) = T, and ω′(C′) = F. To be a model of Σ′, ω′ must be a model of {S′, ¬L′, ¬S′ ∨ L′}. However, S′ and L′ are paradox-free, a contradiction. Using the anticipated equivalence we would conclude that Σ and R_s^3 imply ϕ_2.

For a formula in NNF, the lifting Ω̂ of an S-3 interpretation ω̂ is given by:
• Ω̂(ϕ′) = ω̂(ϕ′), if ϕ′ ∈ L^ℓ,
• Ω̂(ϕ′ ∨ ψ′) = T if and only if Ω̂(ϕ′) = T or Ω̂(ψ′) = T,
• Ω̂(ϕ′ ∧ ψ′) = T if and only if Ω̂(ϕ′) = T and Ω̂(ψ′) = T.
An S-3 interpretation ω̂ is a model of a set Σ′ of formulae in L∗ if and only if Ω̂(σ′_N) = T holds for every σ′ ∈ Σ′. We say that Σ′ S-3 implies a formula ϕ′, denoted by Σ′ ⊨^3_S ϕ′, if and only if every S-3 interpretation that is a model of Σ′ is also a model of ϕ′.
7.3 Equivalences
L_{R_s}-3 and LP_{R_s} interpretations are related very closely.

Proposition 1. Let L_{R_s} ⊆ L_R and ω′ : L_R → {F, P, T} be an LP_{R_s} interpretation of L_R, i.e. for all A′ ∈ L_{R_s} we have ω′(A′) ≠ P. Then we can associate in a bijective way an L_{R_s}-3 interpretation ω̂′ : L^ℓ_R → {F, T}, i.e. for all A′ ∈ L_R we never have ω̂′(A′) = F = ω̂′(¬A′), and for all A′ ∈ L_{R_s} we never have ω̂′(A′) = ω̂′(¬A′), where ω̂′ is:
• ω̂′(A′) = T and ω̂′(¬A′) = F if and only if ω′(A′) = T,
• ω̂′(A′) = F and ω̂′(¬A′) = T if and only if ω′(A′) = F,
7.2 S-3 Logics
Schaerf and Cadoli [36] introduced S-3 logics as “a semantically well-founded logical framework for sound approximate reasoning, which is justifiable from the intuitive point of view, and to provide fast algorithms for dealing with it even when using expressive languages”. For a finite set L of symbols (variables) let L^ℓ denote the set of all literals over L, i.e., L^ℓ = L ∪ {¬A′ | A′ ∈ L} ⊆ L∗. Let S ⊆ L. An S-3 interpretation of L is a total function ω̂ : L^ℓ → {F, T} that maps every variable A′ ∈ S and its negation ¬A′ into opposite values (ω̂(A′) = T if and only
• ω̂′(A′) = T and ω̂′(¬A′) = T if and only if ω′(A′) = P.
Furthermore, for all formulae ϕ′ ∈ L∗_R we have Ω̂′(ϕ′_N) = T if and only if Ω′(ϕ′) ∈ {P, T}.

Theorems 3 and 5 generalize to the presence of an arbitrary NFS R_s (L_{R_s}-3 logic). We state the next theorem in terms of S-3 logics since these have been studied in the literature [36]. Proposition 1, however, shows the equivalence to the implication in terms of LP_{R_s} interpretations.
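The correspondence of Proposition 1 can be illustrated with a small sketch. It converts a formula to NNF with the three rewriting rules, turns an LP interpretation into a literal assignment as in the proposition, and compares the lifted S-3 value with LP designation. The encoding of formulae as nested tuples, the function names, and the sample formula are assumptions made for illustration; the LP truth tables are assumed to be the standard ones (order F < P < T).

```python
from itertools import product

F, P, T = 0, 1, 2   # LP truth values, ordered F < P < T

# Formulae as nested tuples: ("var", name), ("not", f), ("and", f, g), ("or", f, g).

def lp_eval(f, w):
    op = f[0]
    if op == "var":
        return w[f[1]]
    if op == "not":
        return 2 - lp_eval(f[1], w)
    if op == "and":
        return min(lp_eval(f[1], w), lp_eval(f[2], w))
    return max(lp_eval(f[1], w), lp_eval(f[2], w))   # "or"

def nnf(f):
    """Push negations down to the literal level (De Morgan, double negation)."""
    op = f[0]
    if op == "var":
        return f
    if op in ("and", "or"):
        return (op, nnf(f[1]), nnf(f[2]))
    g = f[1]                      # f has the form ("not", g)
    if g[0] == "var":
        return f
    if g[0] == "not":
        return nnf(g[1])
    dual = "or" if g[0] == "and" else "and"
    return (dual, nnf(("not", g[1])), nnf(("not", g[2])))

def s3_from_lp(w):
    """Literal assignment of Proposition 1: keys are (variable, polarity)."""
    lit = {}
    for a, v in w.items():
        lit[(a, True)] = v in (T, P)    # value of the literal A'
        lit[(a, False)] = v in (F, P)   # value of the literal not A'
    return lit

def s3_eval(f, lit):
    """Lifted value of an NNF formula under a literal assignment."""
    op = f[0]
    if op == "var":
        return lit[(f[1], True)]
    if op == "not":
        return lit[(f[1][1], False)]
    if op == "and":
        return s3_eval(f[1], lit) and s3_eval(f[2], lit)
    return s3_eval(f[1], lit) or s3_eval(f[2], lit)

def agrees_everywhere(f, variables):
    """Check: lifted S-3 value is T exactly when the LP value is designated."""
    f_n = nnf(f)
    for values in product((F, P, T), repeat=len(variables)):
        w = dict(zip(variables, values))
        if s3_eval(f_n, s3_from_lp(w)) != (lp_eval(f, w) in (P, T)):
            return False
    return True

A, B, C = ("var", "A"), ("var", "B"), ("var", "C")
sample = ("or", ("not", ("and", A, ("not", B))), ("not", ("or", C, ("not", A))))
print(agrees_everywhere(sample, ["A", "B", "C"]))
```

Restricting the enumeration to interpretations that avoid P on the variables of L_{R_s} yields exactly the L_{R_s}-3 interpretations, since only a variable mapped to P has both of its literals mapped to T.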
Intuitively, ω′_{W′} is the special LP interpretation ω′_r induced by the two-tuple relation r_ϕ in our completeness proof of D (where W = W_i), cf. Table 2. Note that ω′_{W′} reduces to the propositional truth assignment defined in [35] for the special case where R_s = R. Moreover, ω′_{W′} is equivalent to an L_{R_s}-3 interpretation, cf. Proposition 1. In [34] Sagiv presents an algorithm for deciding the implication problem Σ ⊨_R ϕ for FD/MVD sets Σ ∪ {ϕ} over R. The algorithm can be implemented to run in time O(p̄_Σ · |Σ|) where p̄_Σ is the number of sets in DepB_Σ(X) that have non-empty intersection with the right-hand side of ϕ. Using Theorem 7 we can extend Sagiv's algorithm to decide implication in the presence of an NFS R_s in time O(|Σ| + p̄_{Σ[XR_s]} · |Σ[XR_s]|). Following Section 4, we note that our
Theorem 6. Let Σ ∪ {ϕ} be either a set of BDs, or a set of FDs and MVDs over the relation schema R, and let R_s denote an NFS over R. Let Σ′ ∪ {ϕ′} denote the set of corresponding L_{R_s}-3 formulae over L_R. Assuming that relations may contain duplicate tuples, the following are equivalent:
1. Σ ⊨_{R_s} ϕ,
2. Σ ⊨_{2−R_s} ϕ,
3. Σ′ ⊨^3_{L_{R_s}} ϕ′.
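Combined with Proposition 1, the theorem reduces implication under an NFS to a search over LP interpretations that are paradox-free on the NOT NULL attributes. The following brute-force sketch replays the claims of Example 6 in this way. It assumes the standard LP truth tables and single-letter attribute names; the tuple encoding of dependencies and the function names are hypothetical, and the exhaustive search is not the efficient algorithm discussed in the paper.

```python
from itertools import product

F, P, T = 0, 1, 2            # LP truth values, ordered F < P < T
SCHEMA = "ASLC"              # Article, Supplier, Location, Cost

def fd(lhs, rhs):
    return ("fd", lhs, rhs)      # rhs is a single attribute

def mvd(lhs, rhs):
    return ("mvd", lhs, rhs)

def value(dep, w):
    """LP value of the formula that corresponds to an FD or MVD."""
    kind, lhs, rhs = dep
    negs = [2 - w[a] for a in lhs]
    if kind == "fd":
        return max(negs + [w[rhs]])
    y = [a for a in rhs if a not in lhs]
    z = [a for a in SCHEMA if a not in lhs and a not in rhs]
    conj_y = min((w[a] for a in y), default=T)
    conj_z = min((w[a] for a in z), default=T)
    return max(negs + [conj_y, conj_z])

def implies(sigma, phi, nfs):
    """True iff every interpretation that avoids P on nfs and is a model
    of all formulae for sigma is also a model of the formula for phi."""
    domains = [(F, T) if a in nfs else (F, P, T) for a in SCHEMA]
    for vals in product(*domains):
        w = dict(zip(SCHEMA, vals))
        if all(value(d, w) != F for d in sigma) and value(phi, w) == F:
            return False
    return True

SIGMA = [fd("A", "S"), fd("AL", "C"), mvd("S", "L")]
PHI1 = mvd("A", "L")
PHI2 = fd("A", "C")

for nfs in ("", "ALC", "ASC", "SL"):
    print(repr(nfs), implies(SIGMA, PHI1, nfs), implies(SIGMA, PHI2, nfs))
```

For the NFS ASC the search reports that ϕ_1 is implied and ϕ_2 is not, and for the NFS SL it reports that ϕ_2 is implied, in line with the reasoning of Example 6.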
For the cases of FD/MVD implication in Theorem 6 the assumption of allowing duplicate tuples is not necessary. Finally, we exemplify that our framework allows very general reasoning about key and uniqueness constraints.

Example 7. Let R = ABCD and Σ = {A → BC, AB ↠ D} and R_s = B. It follows that A → R is implied by Σ and R_s. Hence, A is a uniqueness constraint, i.e., every non-null value in the A-column is unique in the A-column (there can still be distinct rows which are both null on A). If we declare A to be NOT NULL, and Σ is enforced by the database management system, then A is even a candidate key. That is, for every table over R every row in that table has a total and unique value in the A-column.
7.4
O(|Σ| + min(k_{Σ[XR_s]}, log p̄_{Σ[XR_s]}) · |Σ[XR_s]|) algorithm for deciding Σ ⊨_{R_s} ϕ can directly be applied to decide Σ′ ⊨^3_{L_{R_s}} ϕ′ for the corresponding fragment of Cadoli and Schaerf's L_{R_s}-3 logics, cf. Theorem 6, a fragment not studied previously to our knowledge.
8.
Applications
VALUE AT PRESENT UNKNOWN
Levene and Loizou introduced and axiomatized strong and weak FDs (WFDs) with respect to a possible world semantics [25]. WFDs are satisfied by an (incomplete) relation if there is a possible world (a total relation obtained from completing the incomplete one) that satisfies the FD according to the standard definition. WFDs are defined with respect to Codd's null value unk, i.e., “value at present unknown”. For WFDs we re-define the weak agree set to ag^w(t, t′) := {A ∈ R | t[A] = unk or t′[A] = unk}, i.e., two tuples weakly agree on an attribute A if there is a possible world in which they agree on A. Under this new definition the satisfaction of a WFD X → Y by an (incomplete) relation r is equivalent to the fact that for all t, t′ ∈ r the following holds: if X ⊆ ag^s(t, t′), then Y ⊆ ag(t, t′). We can introduce weak multivalued dependencies (WMVDs) in a similar way. It can be shown that D forms a finite axiomatization for WFDs and WMVDs in the presence of an NFS. With the new definition of weak agree sets Theorems 3 and 4 carry over to sets of WFDs and WMVDs and the logic LP. Weak BDs may be introduced in the same way as in Section 6, and Theorem 5 carries over to sets of weak BDs and the logic LP, as well as the established upper bounds on the time-complexity of the associated implication problems. In the same way, the results of Section 7 carry over.
Let Σ be an arbitrary set of BDs over R in NNF, and let ϕ be an arbitrary BD in CNF. The following findings establish NFSs as an effective mechanism to balance the expressiveness and tractability of various consequence relations. They follow immediately from the results established for S-3 logics [36] and an application of our Theorem 6.

Corollary 1. For every R_s and every R′_s with R_s ⊆ R′_s ⊆ R, if Σ ⊨_{R_s} ϕ, then Σ ⊨_{R′_s} ϕ.
Corollary 1 establishes the monotonicity of the relations ⊨_{R_s}. Therefore, by declaring attributes as NOT NULL, a data administrator enforces at least all of the previously enforced data dependencies. In Figure 1, this is illustrated as a potential increase in expressiveness.

Corollary 2. For every R_s ⊆ R, the implication problem Σ ⊨_{R_s} ϕ can be decided in time O(|Σ| × |ϕ| × 2^{|R_s|}).

Corollary 2 establishes a uniform complexity for deciding the consequence relations ⊨_{R_s} in terms of the NFSs R_s. Therefore, by not declaring attributes as NOT NULL, a data administrator can utilize more efficient algorithms for deciding the associated implication problem. In Figure 1, this is illustrated as a potential increase in efficiency. As an application of the FD/MVD case of Theorem 6 we generalize Sagiv, Delobel, Parker and Fagin's logical characterization of a dependency basis and attribute set closure from total [35] to partial relations.
9. CONCLUSION
Previous theories and database practice warrant a thorough study of FDs and MVDs in the presence of an NFS. We established a finite axiomatization and efficient algorithms for the associated implication problem that close the gap between theory and practice, and unify previously orthogonal theories for i) FDs and MVDs over total relations, ii) FDs and an NFS over partial relations, and iii) FDs and MVDs over partial relations without an NFS. For Lien, Atzeni and Morfuni’s class of FDs and MVDs we established equivalences between their implication and the implication of fragments in Priest’s Logic of Paradox [33]. We extended these to equivalences between the implication of BDs, and FDs and MVDs, both in the presence of an arbitrary NFS, and the implication of fragments of Cadoli and Schaerf’s S-3
Theorem 7. Let Σ denote a set of FDs and MVDs, and R_s an NFS over the relation schema R. Let X and W be disjoint subsets of R. Let ω′_{W′} denote the following LP_{R_s} interpretation:
\[
\omega'_{W'}(A') =
\begin{cases}
T, & \text{if } A \in X((R - W) \cap R_s),\\
P, & \text{if } A \in (R - W) - R_s,\\
F, & \text{if } A \in W.
\end{cases}
\]
If W ∈ DepB_Σ(X) and W and X*_Σ are disjoint, then ω′_{W′} is a model of Σ′. If ω′_{W′} is a model of Σ′, then W is contained in one set of DepB_Σ(X) and W is disjoint from X*_Σ.
logics [36]. Our findings apply to Zaniolo’s no-information nulls, and to Codd’s “value at present unknown” nulls under Levene and Loizou’s weak possible world semantics. Our theory establishes SQL’s NOT NULL constraint as an effective mechanism to balance the expressiveness and tractability of consequence relations for significant classes of unirelational dependencies that arise in practice.
10.
[13] C. Delobel. Normalization and hierarchical dependencies in the relational data model. ACM Trans. Database Syst., 3(3):201–222, 1978. [14] C. Delobel and M. Adiba. Relational database systems. North Holland, 1985. [15] R. Fagin. Functional dependencies in a relational data base and propositional logic. IBM Journal of Research and Development, 21(6):543–544, 1977. [16] R. Fagin. Multivalued dependencies and a new normal form for relational databases. ACM Trans. Database Syst., 2(3):262–278, 1977. [17] R. Fagin, P. Kolaitis, R. Miller, and L. Popa. Data exchange: semantics and query answering. Theor. Comput. Sci., 336(1):89–124, 2005. [18] Z. Galil. An almost linear-time algorithm for computing a dependency basis in a relational database. J. ACM, 29(1):96–102, 1982. [19] K. Hagihara, M. Ito, K. Taniguchi, and T. Kasami. Decision problems for multivalued dependencies in relational databases. SIAM J. Comput., 8(2):247–264, 1979. [20] S. Hartmann and S. Link. Characterising nested database dependencies by fragments of propositional logic. Ann. Pure Appl. Logic, 152(1-3):84–106, 2008. [21] S. Hartmann and S. Link. Efficient reasoning about a robust XML key fragment. ACM Trans. Database Syst., 34(2), 2009. [22] S. Hartmann and S. Link. Numerical constraints on XML data. Inf. Comput., 208(5):521-544, 2010. [23] T. Imielinski and W. Lipski Jr. Incomplete information in relational databases. J. ACM, 31(4):761–791, 1984. [24] S. Kolahi. Dependency-preserving normalization of relational and XML data. J. Comput. Syst. Sci., 73(4):636–647, 2007. [25] M. Levene and G. Loizou. Axiomatisation of functional dependencies in incomplete relations. Theor. Comput. Sci., 206(1-2):283–300, 1998. [26] M. Levene and G. Loizou. Database design for incomplete relations. ACM Trans. Database Syst., 24(1):80–125, 1999. [27] H. Levesque. A knowledge-level account of abduction. In IJCAI, pages 1061–1067, 1989. [28] E. Lien. On the equivalence of database models. J. ACM, 29(2):333–362, 1982. 
[29] W.-D. Langeveldt and S. Link. Empirical evidence for the usefulness of Armstrong relations in the acquisition of meaningful functional dependencies. Inf. Syst., 35(5):352-374, 2010. [30] S. Link. On the implication of multivalued dependencies in partial database relations. Int. J. Found. Comput. Sci., 19(3):691–715, 2008. [31] P. Marquis and N. Porquet. Resource-bounded paraconsistent inference. Ann. Math. Artif. Intell., 39:349–384, 2003. [32] J. Paredaens, P. De Bra, M. Gyssens, and D. Van Gucht. The Structure of the Relational Database Model. Springer, 1989. [33] G. Priest. Logic of paradox. Journal of Philosophical Logic, 8:219–241, 1979. [34] Y. Sagiv. An algorithm for inferring multivalued dependencies with an application to propositional logic. J. ACM, 27(2):250–262, 1980. [35] Y. Sagiv, C. Delobel, D. S. Parker Jr., and R. Fagin. An equivalence between relational database dependencies and a fragment of propositional logic. J. ACM, 28(3):435–453, 1981. [36] M. Schaerf and M. Cadoli. Tractable reasoning via approximation. Artif. Intell., 74:249–310, 1995. [37] D. Toman and G. Weddell. On keys and functional dependencies as first-class citizens in description logics. J. Autom. Reasoning, 40(2-3):117–132, 2008. [38] M. Vardi. The complexity of relational query languages. In STOC, pages 137–146, 1982. [39] M. Vincent, J. Liu, and C. Liu. Strong functional dependencies and their application to normal forms in XML. ACM Trans. Database Syst., 29(3):445–462, 2004. [40] G. Weddell. Reasoning about functional dependencies generalized for semantic data models. ACM Trans. Database Syst., 17(1):32–64, 1992. [41] J. Wijsen. Temporal FDs on Complex Objects. ACM Trans. Database Syst., 24(1):127-176, 1999. [42] M. Wu. The practical need for fourth normal form. In ACM SIGCSE, pages 19–23, 1992. [43] C. Zaniolo. Database relations with null values. J. Comput. Syst. Sci., 28(1):142–166, 1984.
FUTURE DIRECTIONS
Data dependencies should be analyzed in other approaches to incomplete information, e.g., for different null value interpretations, or-relations and fuzzy relations. We conjecture that our axiomatization carries over to or-relations. The equivalences to logics seem unlikely; in particular, Lemma 1 fails for the empty set Σ and an FD ϕ. In fact, for the or-relation {(a, {b, b′}), (a, {b′, b″}), (a, {b, b″})} over AB or there is no possible world that satisfies A → B, but for every two-tuple subrelation there is such a possible world. Armstrong relations provide a valuable tool to acquire meaningful data dependencies [29], but their properties have not been studied yet for the class of FDs, MVDs and NFSs. Our equivalences pave the way to develop a preference-based theory of dependencies where the administrator ranks sets of dependencies according to some preferences [31]. Finally, we mention that it may be interesting to study data exchange problems in the presence of inconsistent sets of source, target or source-to-target dependencies [17].
11. ACKNOWLEDGEMENT
This research is supported by the Marsden fund council from Government funding, administered by the Royal Society of New Zealand. The first author is supported by a research grant of the Alfried Krupp von Bohlen and Halbach foundation, administered by the German Scholars organization.
12. REFERENCES
[1] M. Arenas and L. Libkin. A normal form for XML documents. ACM Trans. Database Syst., 29(1):195–232, 2004. [2] W. W. Armstrong. Dependency structures of database relationships. Information Processing, 74:580–583, 1974. [3] P. Atzeni and N. Morfuni. Functional dependencies and constraints on null values in database relations. Information and Control, 70(1):1–31, 1986. [4] C. Beeri. On the membership problem for functional and multivalued dependencies in relational databases. ACM Trans. Database Syst., 5(3):241–259, 1980. [5] C. Beeri and P. Bernstein. Computational problems related to the design of normal form relational schemas. ACM Trans. Database Syst., 4(1):30–59, 1979. [6] C. Beeri, R. Fagin, and J. H. Howard. A complete axiomatization for functional and multivalued dependencies in database relations. In SIGMOD, pages 47–61. ACM, 1977. [7] J. Biskup. Inferences of multivalued dependencies in fixed and undetermined universes. Theor. Comput. Sci., 10(1):93–106, 1980. [8] J. Biskup and S. Link. Appropriate reasoning about data dependencies in fixed and undetermined universes. In FoIKS, pages 58–77, 2006. [9] M. Bojanczyk, A. Muscholl, T. Schwentick, and L. Segoufin. Two-variable logic on data trees and XML reasoning. J. ACM, 56(3), 2009. [10] M. Cadoli and M. Schaerf. On the complexity of entailment in propositional multivalued logics. Ann. Math. Artif. Intell., 18(1):29–50, 1996. [11] E. F. Codd. A relational model of data for large shared data banks. Commun. ACM, 13(6):377–387, 1970. [12] E. F. Codd. Extending the database relational model to capture more meaning. ACM Trans. Database Syst., 4(4):397–434, 1979.