STONE DUALITY BETWEEN QUERIES AND DATA

DAVID B. BENSON

Abstract. The Stone dualities for accessible categories and the subclass of Diers categories provide limit and colimit structuring principles for query languages and the associated database. As a motivating example, we consider relational databases. Relational databases are given by database schemata, the syntax, and the class of sets of relations satisfying the syntactic constraints, called instances. Relational database schemata are shown to be sketched by finite limit sketches. Such sketches include all the usual kinds of dependencies: functional, join and inclusion. The addition of domain constraints results in (finite limit, countable coproduct)-sketches. Without inclusion dependencies, the accessible category of models of the database sketch consists of all the satisfying instances and homomorphisms between them. With inclusion dependencies, the instances are equivalence classes of the models. Without domain constraints the model category is locally presentable. With domain constraints the model category is locally multipresentable, i.e., a Diers category. The duality theory provides an entirely new means of expressing queries as formal sums of database instances.
1. Introduction

A model is a system of sets with relations and functions providing constraints upon the set system. A class of models, all similarly structured, together with the structure preserving maps between them is a category. We consider such categories to be categories of structured data, abstracting the structured data recorded in a computer. There are many examples of such model categories of interest in computer science, both pure and applied. We concentrate on only one such class of examples, relational databases. This enables us to show most of the details for these examples and provides substantive weight to our primary considerations, the duality between model categories of data and the queries on the data.

Our interest lies solely in the model categories which are accessible, [20]. Accessible categories may be specified via a sketch, [20, 9, 4]. We give a brief description of sketches in later sections. Sentences in basic logic and sketches have equivalent model theories. That is, for every sentence in basic logic there is a sketch with the same category of models, and vice versa, [20]. As this reference points out, little of L is lost by considering only basic logic. For these reasons, sketches are called graph-based logic in [8].

For some situations, the sentence, i.e., theory, in basic first-order logic is the preferred choice for a specification of computational data and activities. For other situations, a sketch provides the clearest specification. For still others, a description using results from the theory of accessible categories appears to be the simplest specification. Finally, for certain important restrictions of basic logic, corresponding
Date: 1996 August 27.
to easily stated restrictions on the form of sketches, a highly algebraic style of presentation is possible, [23, 4, 5].

In this paper we emphasize sketches as the method of giving constraints. This illustrates the ease with which sketches provide constraints in somewhat complicated settings. A relational database scheme can readily be viewed, with some inessential abstraction involved, as a sketch. The category of models then includes all the database instances. For purists, these are really preinstances. Indeed there are a few minor matters about relational databases which we do not treat at all. One is entitled to add "abstract" before any of the words and phrases referring to relational databases. Since the motivation here is simply to show how sketches, in particular (limit, coproduct)-sketches, may be used, nothing will be lost in doing so.

As there is a restricted form of logic corresponding to (limit, coproduct)-sketches, [17], in principle one could write the same constraints for relational databases in terms of this logic. The essentially multialgebraic theories of [5] may also be used to write this constraint system. Some of the trade-offs between different methods of expressing the same constraints may be more difficult to illustrate with any technique other than the sketch framework.

Irrespective of the method used to present model structures, the model category is accessible. By Lair's theorem, a category is accessible if and only if it is sketchable. Lair's theorem gives little or no control over the growth of the regular cardinal associated with the accessible category in the passage from sketches to the corresponding accessible category of models. For this reason, we choose to state the duality results in the most general setting that we know. Despite this appearance of sizable regular cardinals, these model theoretic results have intuitive appeal to everyone who has attempted to specify and implement moderately large computer programs.
Every accessible category comes equipped with a dual category. It is natural, especially for the example just mentioned, to consider this dual category to be the category of queries upon the accessible category of data. Our purpose in this paper is to explicate this duality structure in the computer science context with the intent of discovering elementary, but central, programming language structuring principles. We exemplify the duality structure between the category of data and the category of queries using relational databases, but there are many other examples.

Stone dualities have many applications in mathematics, [18], and also in computer science, [24, 2, 22]. The dualities here are formally of this type, but the correspondence we emphasize is between data and queries. Now the data is always, for us, a category of models of a general sketch, so the difference is only one of presentation. However, this perspective enables one to concentrate on the structure of queries and data which arises from this current emphasis.

The base data of an accessible category is given by the presentable objects of the category. The generating method for the remainder of the data is to form flat functors, in effect filtered colimits of elements, as will be briefly described in a later section. This is, for our current purposes, a suitable presentation of potentially infinite data. The dualities we use depend on the commutation between certain classes of limits and colimits in the base, or dualizing, category, Sets. We follow existing literature on this matter.

The main contribution of this paper is the thesis that duality in the accessible category setting is relevant to program and data design. Another contribution is to suggest that the very notion of database is, abstractly, an accessible category.
QUERY - DATA DUALITY
2. Database Schemes as Sketches

Relational database schemes give constraints on the state of the database. A database state is called an instance. The constraint data can also be given by particular cases of finite limit sketches when the so-called domain constraints are ignored. The models of the sketch are all the consistent states of the database specified by the sketch. The category of models and homomorphisms, i.e., elementary maps, between them is a locally presentable category, thus possessing term models. Finite limit sketches readily include the ideas of functional dependencies and join dependencies. Inclusion dependencies are also expressed with finite limit sketches, using an idea in [20]. In this case, there is a class of models for each instance.

Domain constraints may be included by adding countable, i.e., finite or denumerable, coproducts to the sketch. In this case the sketch is a (finite limit, countable coproduct)-sketch and the model category is locally multipresentable. Locally multipresentable categories are also said to be Diers categories.

We review elements of the theory of sketches and also just enough of the theory of relational databases to see the close connection between these two conceptual frameworks. The presentation of the elements of relational database theory is interwoven with the construction of the corresponding sketches for database schemes. Note that we will include in the sketch considerably more constraints than are typically specified in a relational database scheme. One must sketch the specification of (almost) all the data, not just that which is stored. Nonetheless, from this connection, it is reasonable to define an abstract database scheme to be a sketch, generalizing the notions in relational database theory.

3. Sketches

There are several inessential variants to the notion of a sketch. The variant used here stems from [20] and from [4]. By the term graph we mean a directed graph.
The nodes, or vertices, of the graph are sometimes called objects. We exclusively use the term arcs rather than "edges" or "arrows" when referring to a graph. The collection of all the finite paths in a graph, together with the objects, is said to be the free category generated by the graph. A commutativity specification, also called a commuting diagram, is a pair of paths with the same source and target. A graph together with a set of commutativity specifications constitutes the presentation of the freest category in which the paths of the commutativity specification are identified. The morphisms of categories are said to be arrows. Specifically, for each graph $S$ and each class of commutativity specifications, $D$, in the graph $S$, there is a morphism of graphs, $S \to U\mathcal{C}$, from $S$ to the underlying graph $U\mathcal{C}$ of category $\mathcal{C}$ which maps every commutativity specification to an equality of arrows in the category $\mathcal{C}$ and universally so.

A diagram in a graph $S$ is a graph morphism $D : G \longrightarrow S$ for a small graph $G$. A cone for diagram $D$ consists of object $-\infty$, the apex of the cone, together with arcs to all the objects of $D$. A cocone for diagram $D$ consists of object $+\infty$, the apex of the cocone, together with arcs from all the objects of $D$. We often rename the apex objects, using in each instance a more suggestive notation.

A sketch $\mathcal{S}$ is a list of data $(S, D, L, C, \sigma)$ where $S$ is a graph, called the underlying graph of $\mathcal{S}$, $D$ is a class of commutativity specifications in $S$, $L$ is a class of diagrams, called the limit specifications of $\mathcal{S}$, $C$ is a class of diagrams, called the colimit specifications of $\mathcal{S}$, and $\sigma$ is a function assigning a cone to every
limit specification in $L$ and assigning a cocone to every colimit specification in $C$. A limit sketch is a sketch of the form $(S, D, L, \emptyset, \sigma)$. A finite limit sketch is a limit sketch in which every limit diagram is finite. That is, if $D : G \longrightarrow S$ is a diagram in $L$, then $G$ is finite.

To present specifications and the cones or cocones assigned by $\sigma$ in examples, given by pictures on the printed page, the arcs of the abstract diagrams are given by drawing solid arrows between the graphical objects in the pictures and the abstract arcs of the cone or cocone are given by dotted arrows in the pictures.

As an example, here is a finite limit specification for the product of two objects $x$ and $y$ in the underlying graph of the sketch. The specification in $L$ consists solely of the two objects, being a discrete specification. The function $\sigma$ assigns the cone
$$y \longleftarrow x \times y \longrightarrow x$$
to this specification, where $x \times y$ is simply our name for the apex object $-\infty$ in this cone.

A coequalizer specification in a sketch is a diagram of the form
$$A \rightrightarrows B \xrightarrow{\;c\;} C,$$
in which $C$ is the apex of the cocone $B \xrightarrow{\;c\;} C$, constraining $M(c)$ to be a coequalizer in every model $M$ of the sketch. The remaining data required for a regular epi specification consists of the specification of the kernel pair, [20].

A (limit, coproduct)-sketch is a sketch $(S, D, L, C, \sigma)$ in which every colimit specification in $C$ is discrete. A (finite limit, countable coproduct)-sketch is a sketch $(S, D, L, C, \sigma)$ in which every limit specification in $L$ is finite and every colimit specification is discrete and countable.

3.1. Sketch inclusions. Given two sketches $\mathcal{S} = (S, D, L, C, \sigma)$ and $\mathcal{S}' = (S', D', L', C', \sigma')$ we say that $\mathcal{S}$ is included in $\mathcal{S}'$ if $S$ is a subclass of $S'$, $D$ is a subclass of $D'$, $L$ is a subclass of $L'$, $C$ is a subclass of $C'$, and the domain-codomain restriction of $\sigma'$ to $L \cup C$ and $S$ is equal to $\sigma$. In this case there is an obvious inclusion morphism of graphs
$$S \longrightarrow S'$$
preserving $D$, $L$, $C$, and $\sigma$.

3.2. Models of sketches. A model of sketch $\mathcal{S} = (S, D, L, C, \sigma)$ is a functor $M$ from the category $\mathcal{C} = \mathcal{C}(S, D)$ presented by the data $(S, D)$ to the category Sets, $M : \mathcal{C} \to \mathrm{Sets}$, satisfying the constraints given just now: Each diagram in $L \cup C$ has an image in $\mathcal{C}$ and similarly for $\sigma$. For each diagram $D$ in $L$ each model $M$ sends the cone $\sigma(D)$ to a limit of the diagram $M \circ D$ and for each diagram $D$ in $C$ sends the cocone $\sigma(D)$ to a colimit of the diagram $M \circ D$. In general then, not all of the functors in $\mathrm{Sets}^{\mathcal{C}}$ are models. The category of all models and natural transformations between them is denoted $\mathrm{Mod}\,\mathcal{S}$. Each inclusion $I : \mathcal{S} \longrightarrow \mathcal{S}'$ of sketches sends models of $\mathcal{S}'$ to models of $\mathcal{S}$ by composition with $I$.

For any sketch the category of models is an accessible category by Lair's theorem. If $\mathcal{S}$ is a limit sketch, $\mathrm{Mod}\,\mathcal{S}$ is a locally presentable category. If $\mathcal{S}$ is a finite limit sketch, $\mathrm{Mod}\,\mathcal{S}$ is a locally finitely presentable category, possessing term models, [9]. If $\mathcal{S}$ is a (limit, coproduct)-sketch, $\mathrm{Mod}\,\mathcal{S}$ is a locally multipresentable category.
If $\mathcal{S}$ is a (limit, epi)-sketch, $\mathrm{Mod}\,\mathcal{S}$ is a weakly locally presentable category. For further information about these topics, see [20, 4].

4. Sketching Relations

The notion of relation here is that used in relational database theory and our notation follows [21] when this does not conflict with usages by category theorists. A relation scheme is simply a distinguished symbol which we take to be an object in the sketch for the relation scheme. Let $R$ be a relation scheme. Let $U_R = \{A_1, \ldots, A_n\}$ be a finite collection of attribute symbols. The attribute symbols are simply termed attributes. To avoid writing the symbol for cartesian product, we often follow the practice of [21, 7, 1] in writing a list of attributes drawn from $U_R$. These notations are then just strings, or words, in $U_R$. The symbols $X, Y$, etc. denote such strings while $A, B$, etc. denote elements of $U_R$. As opposed to standard practice in writings on relational database theory, we consider $R, S$, etc. to be symbols for relational schemes whose (possibly overlapping) alphabets of attributes are then $U_R, U_S$, etc.

Let $U(R)$ denote the word obtained by listing every attribute once and only once. This word sets an indexing, once and for all, of the attributes in $U_R$ via $U(R) = A_1 A_2 \cdots A_n$. In the sketch for the relational scheme $R$, $U(R)$ is the apex object $-\infty$ for the discrete diagram $U_R$. For each subsequence $X$ of $U(R)$ which is an object otherwise required in the sketch, the sketch has to have all of the data so that the projection $\pi_X : U(R) \to X$, an arc of the sketch, indeed acts as a projection
$$M(U(R)) = M(A_1) \times \cdots \times M(A_n) \longrightarrow M(X)$$
in every model $M$, and the set of arcs $\{X \to A_i \mid A_i \text{ in } X\}$ is a cone diagram over the discrete diagram $\{A_i \text{ in } X\}$. These necessary details of discrete limits in sketches are explicated in [9, 8]. We henceforth assume that all necessary discrete limit diagrams are in all the sketches that we consider.

The interesting arc in the sketch of a relation scheme is
$$R \rightarrowtail U(R).$$
The intent is perhaps clearer if we write this arc as
$$R \rightarrowtail A_1 \times A_2 \times \cdots \times A_n.$$
This arc is constrained to be injective in every model of the sketch by including the appropriate finite limit diagram
$$
\begin{array}{ccc}
R & \overset{=}{\longrightarrow} & R \\
{\scriptstyle =}\big\downarrow & & \big\downarrow \\
R & \longrightarrow & U(R)
\end{array}
$$
in which both arrows into $U(R)$ are the arc in question and the cone with apex $R$ has identity arcs. Whenever we write an arc in a sketch with the notation $R \rightarrowtail U(R)$, we imply that the sketch has such a mono specification. As $M(U(R))$ is the product $\prod_{i=1}^{n} M(A_i)$ in every model $M$, $M(R)$ is constrained to be a subset of $M(U(R))$, up to isomorphism.
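To make the mono specification concrete, here is a minimal sketch in Python, assuming finite sets and hypothetical attribute names (`name`, `dept`) that do not come from the paper. It checks that a candidate $M(i) : M(R) \to M(U(R))$ is injective by testing that its kernel pair is the diagonal, which is what the pullback cone with identity arcs demands.

```python
from itertools import product


def kernel_pair(f, domain):
    """All pairs (a, b) of domain elements with f(a) == f(b)."""
    return {(a, b) for a in domain for b in domain if f(a) == f(b)}


def is_injective(f, domain):
    """f is injective exactly when its kernel pair is the diagonal."""
    return kernel_pair(f, domain) == {(a, a) for a in domain}


# Hypothetical attribute domains M(name) and M(dept).
M_name = {"ann", "bob"}
M_dept = {"math", "cs"}
M_U = set(product(M_name, M_dept))  # M(U(R)) = M(name) x M(dept)

# M(R) is a set of row identifiers; M(i) sends a row to its tuple.
rows = {0: ("ann", "math"), 1: ("bob", "cs"), 2: ("ann", "cs")}
M_R = set(rows)


def M_i(r):
    return rows[r]


# A relation instance: M(i) lands in the product and is injective.
is_instance = all(M_i(r) in M_U for r in M_R) and is_injective(M_i, M_R)

# A non-example: two row identifiers carrying the same tuple.
bad_rows = {0: ("ann", "math"), 1: ("ann", "math")}
is_bad_instance = is_injective(bad_rows.__getitem__, set(bad_rows))
```

The row identifiers play the role of the abstract set $M(R)$; identifying each row with its tuple recovers the usual picture of a relation as a subset of the product.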
A relation instance, or simply an instance, for a relation scheme is a model of such a sketch. Often $M(R)$ is said to be an instance of relation scheme $R$. As the category Sets has surjective-injective factorizations, we can obtain the relational database theoretic notion of projection from a relation to a factor relation given on only some of the attributes. We use the surjective-injective factorization pictured as
$$
\begin{array}{ccc}
M(R) & \xrightarrow{\;i\;} & M(A_1) \times \cdots \times M(A_n) \\
{\scriptstyle e}\big\downarrow & & \big\downarrow{\scriptstyle \pi_X} \\
M(R)[X] & \overset{m}{\dashrightarrow} & M(X)
\end{array}
$$
where $X$ is a subsequence of $U(R)$ and the pair $(e, m)$ is the surjective-injective factorization of $i\,\pi_X = \pi_X \circ i$ (juxtaposition denotes composition in diagrammatic order). The function $e : M(R) \twoheadrightarrow M(R)[X]$ is the desired projection. It is not necessary to include any data in the sketch about this projection, since it exists for the semantical reasons stated.

The above picture illustrates another drawing convention. This convention relates to pictures illustrating properties within models. Solid arrows indicate the images of arcs in the underlying graph of the sketch under the action of some model $M$. Dashed arrows indicate other functions which are under consideration. Such functions may or may not be the image of arcs in the underlying graph of the sketch.

4.1. Witnesses. If the object $R[X]$, for $X$ a subsequence of $U(R)$, is required in the sketch for some reason, it is necessary to constrain the models $M$ so that $M(R[X]) = M(R)[X]$ with $M(R)[X]$ the image set in the surjective-injective factorization of $i\,\pi_X$ above. One technique is to include a specification of arc $e : R \twoheadrightarrow R[X]$ as a regular epi specification in the colimit specifications of the sketch. We shall avoid this method. Instead, we include a witnessing arc $w : R[X] \rightarrowtail R$ in the sketch together with the commutativity specification which says $w\,e = 1_{R[X]}$ (first $w$, then $e$). This suffices to force $M(e)$ to be surjective for all models $M$. The price is the additional data, in every model $M$, of the function $M(w)$ to carry out the witnessing of the surjectivity of $M(e)$. The data in the sketch with witnesses is pictured as
$$
\begin{array}{ccc}
R & \xrightarrow{\;i\;} & U_R \\
{\scriptstyle w}\big\uparrow\ \big\downarrow{\scriptstyle e} & & \big\downarrow{\scriptstyle \pi_X} \\
R[X] & \xrightarrow{\;m\;} & X
\end{array}
\qquad\qquad w\,e = 1_{R[X]}.
$$
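The factorization and the witnessing arc can be tried out on a small finite instance. The following is a minimal sketch, assuming tuples over hypothetical attributes (name, dept, room) and one arbitrary choice of witness; the function names are illustrative only.

```python
def image_factorization(f, domain):
    """Factor f as a surjection e onto the image followed by the inclusion m."""
    image = {f(a) for a in domain}
    e = {a: f(a) for a in domain}  # e : domain ->> image
    m = {b: b for b in image}      # m : image >-> codomain (an inclusion)
    return image, e, m


def is_witness(w, e, image):
    """w : image -> domain witnesses surjectivity of e when e(w(b)) == b."""
    return all(e[w[b]] == b for b in image)


# A relation instance M(R) over the attributes (name, dept, room).
M_R = {("ann", "math", 1), ("bob", "cs", 2), ("cy", "cs", 3)}


def proj_X(t):
    """The composite 'i then pi_X' for X = (dept,)."""
    return (t[1],)


# M(R)[X] together with the projection e and the inclusion m.
M_R_X, e, m = image_factorization(proj_X, M_R)

# One possible witness: pick the least tuple in each fiber of e.
w = {b: min(t for t in M_R if e[t] == b) for b in M_R_X}
```

Any other choice of one preimage per element of $M(R)[X]$ would serve equally well as $M(w)$, which is the source of the multiplicity of models discussed for inclusion dependencies.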
5. Functional Dependencies

Relation schemes are now extended to include functional dependencies. One specifies that a list of attributes $X$ functionally determines an attribute $A$ in relation scheme $R$. The notion of a key in a relation scheme is a special case of this general specification. Functional dependencies are written in the style $R : X \to A$ in [21]. We write such a specification as
$$R[X] \longrightarrow R[A],$$
but this is not an arc of a sketch of a relation scheme with functional dependencies. For if this arc were to be an arc in the sketch, the sketch would have to contain
$$
\begin{array}{ccc}
R & \xrightarrow{\;i\;} & U_R \\
{\scriptstyle e}\big\downarrow & & \big\downarrow{\scriptstyle \pi_X} \\
R[X] & \xrightarrow{\;m\;} & X
\end{array}
$$
where the arc $\pi_X$ is part of a cone on a discrete diagram for $U_R$ and the arc $e : R \to R[X]$ is the cocone of a regular epi specification. There would also have to be a similar diagram with $A$ replacing $X$. If the sketch contains no regular epi specifications, the theory of sketches gives stronger properties for the category of models of the sketch.

Fortunately, there is an alternate means to sketch functional dependencies using only finite limit diagrams in the sketch and no colimit diagrams at all. Let $(E_X, e_{X,1}, e_{X,2})$ be the sketched kernel pair of $i\,\pi_X$ (first $i$, then $\pi_X$), where $i$, $\pi_X$ are from the picture just above. This sketch is the pullback diagram
$$
\begin{array}{ccc}
E_X & \xrightarrow{\;e_{X,2}\;} & R \\
{\scriptstyle e_{X,1}}\big\downarrow & & \big\downarrow{\scriptstyle i\,\pi_X} \\
R & \xrightarrow{\;i\,\pi_X\;} & X.
\end{array}
$$
Similarly, let $(E_{XA}, e_{XA,1}, e_{XA,2})$ be the sketched kernel pair of $i\,\pi_{XA}$ where $\pi_{XA} : U_R \longrightarrow XA$ is an arc of another discrete limit diagram with apex $U_R$. In every model $M$, these sketched kernel pairs are mapped to the actual kernel pair, an equivalence relation. Two elements of $M(R)$ are equivalent in $M(E_X)$ iff these agree on the values of the $X$ attributes in the instance $M(R)$. Further, the equivalence class map is isomorphic to the projection
$$M(R) \twoheadrightarrow M(R)[X].$$
There is a unique injective function $\omega : M(E_{XA}) \to M(E_X)$ such that $\omega\,M(e_{X,i}) = M(e_{XA,i})$, $i = 1, 2$ (first $\omega$, then $M(e_{X,i})$). Therefore the sketch data for the functional dependency $R[X] \longrightarrow R[A]$ can be given as
$$E_X \rightarrowtail E_{XA}.$$
For by the Schröder-Bernstein Theorem it follows from this constraint that $M(E_X) \cong M(E_{XA})$. This in turn implies $M(R)[X] \cong M(R)[XA]$, establishing the existence of the functional dependency in every model. Parenthetically, it may well be simpler to give commutativity specifications in the sketch specifying the semantical isomorphism of $E_X$ and $E_{XA}$.

At this stage we have a finite limit sketch for one relation scheme which possesses a set of functional dependencies. As functional dependencies are intrarelational, nothing essential changes if many relation schemata are sketched in a single sketch. Such a sketch is simply the coproduct, suitably defined, of the sketches for each
relation scheme taken separately. A database scheme with only functional dependencies is such a finite set of relation schemata with functional dependencies. The coproduct sketch then sketches such database schemata. As these sketches are finite limit sketches, the category of models is locally finitely presentable, i.e., finitely essentially algebraic, [4, 23].

6. Join Dependencies

In this section a database scheme consists of a finite set of relation schemata with functional dependencies and also join dependencies. A join dependency, [21, 7, 1], simply specifies that a certain object is the apex of a finite limit diagram belonging to a certain class of such diagrams. Such diagrams may involve several of the relation schemata in the database scheme. We could generalize to consider any finite limit diagram, but for simplicity of presentation we consider only the case of two relation schemes, $R$ and $S$, and an attribute list $X$ in common to both. That is, $X$ is a word in $U_R \cap U_S$. For simplicity of presentation, we consider only the case that $X$ is a subsequence of $U(R)$ and of $U(S)$. The more general setting follows by sketching the diagonal map, required in the next section.

In the sketch we specify the join $R \bowtie S$ as the apex object in the pullback specification
$$
\begin{array}{ccc}
R \bowtie S & \longrightarrow & S \\
\big\downarrow & & \big\downarrow{\scriptstyle j\,\pi_X} \\
R & \xrightarrow{\;i\,\pi_X\;} & X
\end{array}
$$
where $i : R \rightarrowtail U_R$ and $j : S \rightarrowtail U_S$ are relation scheme specifications in the sketch while $\pi_X : U_R \to X$ and $\pi_X : U_S \to X$ are arcs from appropriate limit cones on discrete specifications for $U_R$ and $U_S$. The sketch of a database scheme with join dependencies has only finite limit specifications, so the category of models remains finitely essentially algebraic.
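The two finite limit constructions of Sections 5 and 6 can be checked directly on small finite instances. The sketch below is illustrative only: relations are sets of tuples, the schemas `emp_schema` and `dept_schema` and all helper names are hypothetical, and the functional dependency is tested as the containment of the kernel pair $E_X$ in $E_{XA}$ (the reverse containment always holds).

```python
def proj(row, schema, attrs):
    """Value of `row` on the attribute list `attrs`."""
    return tuple(row[schema.index(a)] for a in attrs)


def kernel_pair(rel, schema, attrs):
    """E_attrs: pairs of rows that agree on `attrs` (a pullback of the
    projection along itself)."""
    return {(r, s) for r in rel for s in rel
            if proj(r, schema, attrs) == proj(s, schema, attrs)}


def fd_holds(rel, schema, X, A):
    """X -> A holds iff E_X is contained in E_XA."""
    E_X = kernel_pair(rel, schema, X)
    E_XA = kernel_pair(rel, schema, X + A)
    assert E_XA.issubset(E_X)  # agreeing on XA implies agreeing on X
    return E_X.issubset(E_XA)


def join(rel_r, schema_r, rel_s, schema_s, X):
    """Pullback of R -> X <- S: pairs of rows that agree on X."""
    return {(r, s) for r in rel_r for s in rel_s
            if proj(r, schema_r, X) == proj(s, schema_s, X)}


emp_schema = ("emp", "dept")
emp = {("ann", "math"), ("bob", "cs"), ("cy", "cs")}
dept_schema = ("dept", "bldg")
dept = {("math", "A"), ("cs", "B")}

emp_determines_dept = fd_holds(emp, emp_schema, ("emp",), ("dept",))
dept_determines_emp = fd_holds(emp, emp_schema, ("dept",), ("emp",))
emp_join_dept = join(emp, emp_schema, dept, dept_schema, ("dept",))
```

Here the join is returned as a set of pairs of rows, matching the apex of the pullback; flattening each pair into a single tuple over the combined attributes gives the usual presentation of the natural join.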
7. Inclusion Dependencies

The database schemata of the previous section are now extended to possess inclusion dependencies. These dependencies specify that a projection of one relation is a subset of a projection of a, possibly different, relation. For simplicity of presentation we will assume that inclusion dependencies relate the same attribute list in both relations. As an example, let $R$ and $S$ be relation schemes with $X$ a list of attributes in $U_R \cap U_S$. For now, assume $X$ is a subsequence of $U(R)$ and of $U(S)$. An inclusion dependency on this attribute list is notated in [21] as
$$R[X] \subseteq S[X].$$
This inclusion dependency constrains instances $M$ to satisfy
$$M(R)[X] \subseteq M(S)[X].$$
Of course we can only constrain models of sketches to be injective functions, being subset inclusion only up to isomorphism.
The obvious manner of proceeding is to add the mono specification $i : R[X] \rightarrowtail S[X]$ to give the inclusion dependency. But then $R[X]$ and $S[X]$ must be provided with appropriate specifications so that all of the models are constrained to map these objects to image factorizations.

7.1. Avoiding Epis. We prefer to give inclusion dependencies using only finite limit sketches. Here is some of the data required in a sketch with an inclusion dependency:
$$
\begin{array}{ccc}
R & & S \\
{\scriptstyle \varepsilon_R}\big\downarrow & & \big\downarrow{\scriptstyle \varepsilon_S} \\
R[X] & \xrightarrow{\;i\;} & S[X] \\
& \searrow \quad \swarrow & \\
& X. &
\end{array}
$$
In Sets, every surjective function splits. We sketch this by including commutativity specifications
$$
\bigl(R[X] \longrightarrow R \xrightarrow{\;\varepsilon_R\;} R[X]\bigr) = 1_{R[X]},
\qquad
\bigl(S[X] \longrightarrow S \xrightarrow{\;\varepsilon_S\;} S[X]\bigr) = 1_{S[X]}
$$
to provide witnesses. In every model $M$, $M(\varepsilon_R)$ and $M(\varepsilon_S)$ are surjective functions, which is all that is required to constrain $M(R[X])$ and $M(S[X])$ to be the projections of $M(R)$ and $M(S)$, respectively. Parenthetically, this same idea from [20] can also be used in sketches for functional dependencies.

7.2. Duplicates. If a list of attributes $Y$ in $U_R \cap U_S$ contains duplicate entries, take the corresponding subsequence $X$ without duplicates for the commutativity conditions above. It is easy to sketch the diagonal map $\Delta : A \to AA$ for every attribute and indeed a diagonal map for every list of attributes. Use these diagonal maps as necessary to establish the correspondence between $Y$ and $X$, with the inclusion dependency involving the list with duplicate entries being
$$R[Y] \longrightarrow S[Y].$$

7.3. Multiplicity of Models. The price to be paid for any witnessing construction is the injective functions
$$M(R[X]) \rightarrowtail M(R), \qquad M(S[X]) \rightarrowtail M(S)$$
in each model $M$. These injective functions provide the witnessing data for each datum in the projection. These functions are not part of the usual definition of an instance. Since there are in general many witnessing functions for each instance, being any injections which split $\varepsilon_R$ and $\varepsilon_S$, there are in general many models for each instance. To recover the instances from the models, simply forget the witnessing injections. Parenthetically, this problem does not arise in our treatment of functional dependencies.
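The multiplicity of models can be counted explicitly for a finite instance: a witness chooses one preimage in each fiber of the projection, so the number of witnesses is the product of the fiber sizes. The following minimal sketch assumes hypothetical relations and writes `eps_R`, `eps_S` for the two projections; none of these names come from the paper.

```python
from itertools import product


def fibers(e, domain):
    """Group the domain of e by value: the fibers of e over its image."""
    out = {}
    for a in domain:
        out.setdefault(e(a), []).append(a)
    return out


def all_witnesses(e, domain):
    """Every section w of e onto its image: one chosen preimage per fiber."""
    fib = fibers(e, domain)
    keys = sorted(fib)
    return [dict(zip(keys, choice))
            for choice in product(*(fib[k] for k in keys))]


# Hypothetical instance: R over (emp, dept), S over (dept, bldg), X = dept.
M_R = {("ann", "cs"), ("bob", "cs"), ("cy", "math")}
M_S = {("cs", "B"), ("math", "A"), ("bio", "C")}


def eps_R(t):
    return t[1]


def eps_S(t):
    return t[0]


# The inclusion dependency R[X] within S[X], as containment of images.
inclusion_holds = set(fibers(eps_R, M_R)).issubset(set(fibers(eps_S, M_S)))

witnesses_R = all_witnesses(eps_R, M_R)
witnesses_S = all_witnesses(eps_S, M_S)

# Each pair of witnesses gives a distinct model over the same instance.
n_models = len(witnesses_R) * len(witnesses_S)
```

Forgetting the chosen witnesses collapses all of these models to the single underlying instance, as described above.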
One can avoid this multiplicity by including data in a sketch $\mathcal{S}$ so that $R[X]$ and $S[X]$ are apexes of coequalizer specifications for the kernel pairs of $R \to X$ and $S \to X$ respectively. That is, one includes regular epi specifications. In this situation, the model category $\mathrm{Mod}\,\mathcal{S}$ is only known to be a weakly locally presentable category. However, each model in $\mathrm{Mod}\,\mathcal{S}$ is an instance of the relational database scheme. The induced map of models from the epi-less sketch given above to $\mathrm{Mod}\,\mathcal{S}$ is a surjection on models. The details of this passage are found in [20].

Summarizing, the inclusion of witnesses keeps the sketch a finite limit sketch at the price of many models for each relational database instance.
8. Domain Constraints

For each attribute $A$, a domain constraint on $A$ specifies the concrete allowed values of this attribute in the actual relations stored in the database. Here we are a bit more abstract. First, include in the sketch an object $1$ as the apex of the empty limit specification diagram. This means that $M(1)$ is a terminal object for every model $M$ of the enlarged sketch. A domain constraint on $A$ is a countable discrete colimit diagram on $1$ to which $\sigma$ assigns $A$ as the apex. In every model $M$, $M(A)$ is a countable copower of a singleton set $M(1)$. For example, if $A$ is the attribute "sex" for a "driver's license" relation scheme, the domain constraint on $A$ is $A = 1 + 1$. Practical constraints on physical resources may be used to restrict all domain constraints to be finite. In this regard, see also [3].

If both domain constraints and epi specifications are given, we are using a general sketch. By avoiding epi specifications, the database scheme can be sketched by a (finite limit, countable coproduct)-sketch. The instances of such a database scheme are equivalence classes of models for the sketch. Every such sketch has a Diers category of models, an accessible category with connected limits. The connected limits will be of import when we consider queries on the data. In particular, the connected limit of models is a model. We sloganize this fact as: The join of instances is an instance.

Sketching also has the advantage that we can use base categories other than Sets in which to form models. Many accessible categories will suffice, [20, 4]. One might wish to consider databases of databases in this way.

9. Accessible Categories

We briefly review elements of accessible categories. First note that for each sketch $\mathcal{S}$ the category $\mathrm{Mod}\,\mathcal{S}$ is equivalent to, but not equal to, its associated accessible category of models. We are being imprecise about this fact, which only becomes important in the last subsection of the paper.

Let $\kappa$ be an infinite regular cardinal. We are ordinarily only interested in the first infinite regular cardinal $\omega = \aleph_0$ for the purposes of this study. However, giving the data as a sketch gives little control over the regular cardinal associated with the accessible category of models of the sketch. Note that $\omega$-(co)limits are finite (co)limits and that $\omega_1$-(co)limits are countable (co)limits. However, $\omega$-filtered colimits are the usual filtered colimits, which are equivalent for many purposes to directed colimits, but strictly generalize them. The exposition in [4, 10, 11] gives the general case.

Object $C$ of category $\mathcal{C}$ is said to be $\kappa$-presentable if the representable functor $\mathcal{C}(C, -) : \mathcal{C} \to \mathrm{Sets}$ preserves $\kappa$-filtered colimits. The category $\mathcal{A}$ is said to be
$\kappa$-accessible if $\mathcal{A}$ has $\kappa$-filtered colimits and if there exists a small subcategory $\mathcal{C}^{op}$ of $\mathcal{A}$ in which every object is $\kappa$-presentable such that every object of $\mathcal{A}$ is a colimit of a $\kappa$-filtered diagram in $\mathcal{C}^{op}$, [4, 20, 15, 16, 11].

A functor is said to be $\kappa$-flat if it is a $\kappa$-filtered colimit of representable functors, [19, 20, 10]. Every $\kappa$-accessible category is, up to equivalence, presented as a category of $\kappa$-flat functors from the opposite of the small subcategory of $\kappa$-presentables to Sets, [20, 11]. Therefore, each object of a $\kappa$-accessible category can be presented as a $\kappa$-flat functor $A : \mathcal{C} \to \mathrm{Sets}$. The base category, Sets, may be replaced by any sufficiently complete and cocomplete category, [20, 16].

For every category $\mathcal{C}$, the Yoneda embedding $y : \mathcal{C}^{op} \longrightarrow \mathrm{Sets}^{\mathcal{C}}$ sending each object $C$ in $\mathcal{C}$ to the representable functor $\mathcal{C}(C, -)$ is full and faithful. When more facts are known about the category $\mathcal{C}$, it is often possible to show that the essential image of the Yoneda embedding lies in a subcategory of $\mathrm{Sets}^{\mathcal{C}}$. We shall make frequent use of this fact, appearing in [19].
10. The General Setting

Let $\mathcal{M}$ denote a category of possibly infinite data, also called memory states, and morphisms which structure the data. We emphasize that these morphisms are solely for the purpose of data structuring, not necessarily as the transformations induced by the activity of programs. Let $\mathcal{P}$ denote a programming language with denotations in a category $\mathcal{Q}$ of certain functors and structuring morphisms. The functors in $\mathcal{Q}$ map data in $\mathcal{M}$ to answers. For simplicity of presentation, the category of answers is taken to be Sets, so for each program denotation $Q$ and each structured memory state $M$, $Q(M)$ is a set of answers. For this reason, we often call the functors in $\mathcal{Q}$ queries. There is a different aspect to be considered when the programs are to transform data into data. That transformational aspect of programs is not considered in this paper.

The duality theory not only shows that $\mathcal{M}$ is equivalent to a certain category of functors and natural transformations from $\mathcal{Q}$ to Sets, but gives structuring principles for the programs in $\mathcal{P}$ to be fully expressive regarding the queries in $\mathcal{Q}$.

The category $\mathrm{Sets}^{\mathcal{M}}$ is too large to be a suitable category of queries. We use Scott's Principle, [25], in the form of $\kappa$-accessibility, to cut down to a suitable category of queries. Ideally, the infinite regular cardinal in question is the first infinite regular cardinal $\omega$.

Require that the category of memory states $\mathcal{M}$ be a $\kappa$-accessible category. There is a small category $\mathcal{D}$, a subcategory of the full subcategory of $\kappa$-presentable objects of $\mathcal{M}$, such that
$$\mathcal{M} \simeq \kappa\text{-Flat}(\mathcal{D}^{op})$$
presents, up to equivalence, the memory states as the $\kappa$-flat functors on $\mathcal{D}^{op}$. One may call $\mathcal{D}$ the category of $\kappa$-presentable data.

In sketching relational databases we showed that it is possible to use a (finite limit, countable coproduct)-sketch. Therefore, by a result of Guitart and Lair given in [4], the model category is $\omega$-accessible, that is, finitely accessible. In this class of examples the data category $\mathcal{D}$ is the category of finitely presentable data.

Denotations preserve $\kappa$-filtered colimits: Let
$$\mathcal{Q} = \mathrm{Filt}_\kappa(\mathcal{M}, \mathrm{Sets}),$$
the category of all functors preserving all - ltered colimits in M. From [20] we have that Q enjoys the equivalence
Q ' SetsD
with the presheaf category $\mathbf{Sets}^{\mathcal{D}}$. Notice that this property says that all and only functors from the small category of $\kappa$-presentable data, $\mathcal{D}$, are queries, at least up to equivalence.

The definition of $\mathcal{Q}$ provides some program structuring principles for the programs in $\mathcal{P}$. Since $\kappa$-limits commute with $\kappa$-filtered colimits, the $\kappa$-limit of queries in $\mathcal{Q}$ is also a query in $\mathcal{Q}$. Since colimits commute with $\kappa$-filtered colimits, every colimit of queries in $\mathcal{Q}$ is again a query in $\mathcal{Q}$. Therefore the ideal programming language to denote these queries has combining forms to express $\kappa$-limits and small colimits.

Examples: (1). Let $Q$ be a query. For each set $S$, $Q(-) \cap S$ is another query, to be thought of as restricting to only those answers that lie in the set $S$. (2). If
\[ Q_1 \longleftarrow Q_0 \longrightarrow Q_2 \]
is a system of queries and natural transformations, the pushout of this diagram is another query. While the first of these examples is commonly observed in practice, the second is most rare indeed.

Finally, the duality states the equivalence
\[ \mathcal{M} \simeq \mathrm{Lim}_{\kappa}\mathrm{Colim}(\mathcal{Q}, \mathbf{Sets}), \]
between the category of models and the category of all $\kappa$-limit preserving and small colimit preserving functors from $\mathcal{Q}$ to $\mathbf{Sets}$. This equivalence follows from the definition of $\mathcal{Q}$ after noting that all $\kappa$-limits commute with $\kappa$-filtered colimits and, of course, all colimits commute with $\kappa$-filtered colimits. Restating the dualities for both program and data, we have
\[ \mathcal{Q} \simeq \mathrm{Filt}_{\kappa}(\mathrm{Lim}_{\kappa}\mathrm{Colim}(\mathcal{Q}, \mathbf{Sets}), \mathbf{Sets}), \qquad \mathcal{M} \simeq \mathrm{Lim}_{\kappa}\mathrm{Colim}(\mathrm{Filt}_{\kappa}(\mathcal{M}, \mathbf{Sets}), \mathbf{Sets}). \]
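The two example combining forms of this section can be made concrete for finite data. The following Python sketch is an illustration only, under several assumptions that are not part of the paper: a memory state is a finite set of tuples, a query is modelled on objects alone as a function returning a finite set of answers, and the pushout is computed pointwise on answer sets (colimits of set-valued functors are computed pointwise). The action of queries on morphisms is not modelled, and the names `restrict`, `pushout_sets` and `pushout_query` are hypothetical.

```python
from typing import Callable, FrozenSet, Set, Tuple

State = FrozenSet[Tuple]          # assumed model of a finite memory state
Query = Callable[[State], Set]    # a query, on objects only


def restrict(q: Query, s: Set) -> Query:
    """Example (1): the query sending M to Q(M) intersected with S."""
    return lambda m: q(m) & s


def pushout_sets(a0, f1, f2, a1, a2):
    """Pushout of a1 <-f1- a0 -f2-> a2 in finite sets.

    Built as the disjoint union of a1 and a2 modulo the identifications
    f1(x) ~ f2(x) for x in a0. Elements are tagged 1 or 2 by origin and
    the result is the set of equivalence classes.
    """
    parent = {(1, x): (1, x) for x in a1}
    parent.update({(2, y): (2, y) for y in a2})

    def find(u):
        # Union-find lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for x in a0:
        ru, rv = find((1, f1(x))), find((2, f2(x)))
        if ru != rv:
            parent[ru] = rv

    classes = {}
    for u in parent:
        classes.setdefault(find(u), set()).add(u)
    return {frozenset(c) for c in classes.values()}


def pushout_query(q0: Query, q1: Query, q2: Query, t1, t2) -> Query:
    """Example (2): the pointwise pushout of Q1 <- Q0 -> Q2.

    t1(m) and t2(m) are assumed to return the components at m of the
    two natural transformations, as functions on Q0(m).
    """
    return lambda m: pushout_sets(q0(m), t1(m), t2(m), q1(m), q2(m))
```

For instance, with a state of (name, department) pairs, `Q0` the identity query, `Q1` the set of names and `Q2` the set of departments, the pushout groups each department with the names recorded for it.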
11. The Diers Setting

One may ask for more of the structure of the data to be preserved by the queries. To begin with an example, think of $\mathcal{M}$ as the category of all database instances for some database scheme $S$ and the elementary maps between the instances. To be completely clear, $\mathcal{M}$ includes all of the infinite instances as well as the finite instances. The structuring data includes the connected limits such as equalizers and pullbacks. One considers the elementary maps involved in such a connected limit as providing construction data for the limit instance. The queries are now required to preserve this connected limit construction data as well as the $\kappa$-filtered colimits. Preserving connected limits is equivalent to restricting the structure of data to those objects which appear to be most similar to the actual practice of relational databases, [1, 7, 21].
11.1. Diers categories. A $\kappa$-accessible category is said to be $\kappa$-Diers, [20], or locally $\kappa$-multipresentable, [4], if it has all connected limits. A $\kappa$-accessible category is $\kappa$-Diers iff it is multicocomplete, [14, 4], also called familially cocomplete, [16]. A category is said to be a Diers category when there exists a regular cardinal $\kappa$ such that the category is $\kappa$-Diers. Every Diers category is the category of models for some sketch with arbitrary limit specifications but only coproduct specifications in the colimit specification part of the sketch. A category of Diers categories, rich in arrows, is cartesian closed, [6]. Johnstone's logic, [17], was designed to be the logic of Diers categories. Related topics appear in [12, 13].

From the definition of a $\kappa$-Diers category, define the queries on a $\kappa$-Diers category of data $\mathcal{A}$ as
\[ \mathcal{B} = \mathrm{Conn}\,\mathrm{Filt}_{\kappa}(\mathcal{A}, \mathbf{Sets}), \]
the category of all functors from $\mathcal{A}$ to $\mathbf{Sets}$ preserving connected limits and $\kappa$-filtered colimits. Since coproducts commute with connected limits, one obtains the equivalence
\[ \mathcal{A} \simeq \mathrm{Lim}_{\kappa}\mathrm{Coprod}(\mathcal{B}, \mathbf{Sets}), \]
with the category of $\kappa$-limit and coproduct preserving functors from $\mathcal{B}$ to $\mathbf{Sets}$, [15]. Again this gives program structuring principles for the programs in $\mathcal{P}_{\mathcal{B}}$ denoting the queries in $\mathcal{B}$: ideally there are combining forms for expressing $\kappa$-limits and small coproducts.

11.2. SQL. These combining forms suffice to express most of the usual combining forms for SQL-like languages: Selection is just a monomorphism, a limit concept. Join is pullback. Intersection is yet another pullback. Restriction via a predicate has been previously illustrated as an intersection with a constant set. Cartesian product is product. Disjoint union is formed using coproduct. Union is obtained from disjoint union via epi-mono factorizations. Parenthetically, in most practical settings union should be replaced by disjoint union since the former lacks the traceability provided by the latter.
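The correspondence between SQL combining forms and limits or coproducts can be illustrated on finite sets of rows. The Python sketch below is an assumption-laden illustration, not the paper's construction: relations are finite sets of hashable rows, all constructions are carried out in finite sets rather than in a category of queries, and the function names are hypothetical.

```python
def pullback(a, b, f, g):
    """Pullback of f: a -> c and g: b -> c in sets: pairs agreeing in c."""
    return {(x, y) for x in a for y in b if f(x) == g(y)}


def join(r, s, key_r, key_s):
    """Join as the pullback of the two key maps."""
    return pullback(r, s, key_r, key_s)


def intersection(r, s):
    """Intersection as the pullback of the two inclusions into a common
    superset; each pair (x, x) is identified with x."""
    return {x for (x, _) in pullback(r, s, lambda x: x, lambda y: y)}


def select(r, pred):
    """Selection as a subobject: the rows on which pred holds."""
    return {x for x in r if pred(x)}


def cartesian_product(r, s):
    """Cartesian product as the categorical product in sets."""
    return {(x, y) for x in r for y in s}


def disjoint_union(r, s):
    """Coproduct in sets: every row is tagged with its origin."""
    return {(0, x) for x in r} | {(1, y) for y in s}


def union(r, s):
    """Union as the image (epi-mono factorization) of the map from the
    disjoint union that forgets the origin tag."""
    return {x for (_, x) in disjoint_union(r, s)}
```

The loss of traceability noted above is visible here: `disjoint_union({1, 2}, {2, 3})` has four elements, each recording which relation it came from, whereas `union({1, 2}, {2, 3})` has three and no longer records the origin of `2`.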
The simplest method of treating the SQL projection operator is to include all possible projection arcs $U(R) \supseteq X$ in the sketch and use witnesses in the sketch to guarantee that $M(R[X]) = M(R)[X]$ for all models $M$. With this, the query $Q(M) = M(R[X])$ is available to denote the SQL projection operator. Parenthetically, one understands that this is not entirely satisfactory as an exposition of the practice of relational database queries.

We have completed the list of the important SQL combining forms, except those related to negation such as the SQL differencing operation. On several grounds one expects general negation operators to be typically inexpressible via the Diers category dual. However, if every word $W$ of attributes is in the sketch, then SQL differencing can be treated via intersection. Alternatively, or in addition, one could include formal complements in the sketch by the use of coproduct diagrams.

11.3. Structure of the Diers dual. While $\mathcal{A}$, being accessible, is equivalent to a category of flat functors, more information about the base category is possible in the Diers case. Let $\mathcal{A}$ be a $\kappa$-Diers category. There exists a small familially $\kappa$-complete, [16], category $\mathcal{C}$ such that
\[ \mathcal{A} \simeq \kappa\text{-}\mathrm{Flat}(\mathcal{C}). \]
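The small subcategory of $\kappa$-presentables, $\mathcal{A}_\kappa = \mathcal{C}^{\mathrm{op}}$, is therefore $\kappa$-multicocomplete.

For each category $\mathcal{C}$, let $\mathrm{fam}(\mathcal{C})$ denote the free coproduct completion of $\mathcal{C}$. The objects of $\mathrm{fam}(\mathcal{C})$ are families of objects $\langle C_k \rangle_{k \in K}$ for small indexing set $K$. The arrows
\[ \langle C_k \rangle_{k \in K} \longrightarrow \langle C_j \rangle_{j \in J} \]
are given by a function $t : K \to J$ and a family of morphisms from $\mathcal{C}$,
\[ f_k : C_k \to C_{t(k)}, \qquad k \in K. \]
From [16], we have that for small familially $\kappa$-complete category $\mathcal{C}$ with $\mathcal{A} \simeq \kappa\text{-}\mathrm{Flat}(\mathcal{C})$,
\[ \mathcal{B} \simeq \mathrm{fam}(\mathcal{C}). \]
In the example of relational databases, this means that for every database query $Q$ in $\mathcal{B}$ there exists a small indexing set $J$ and a family of $\kappa$-presentable database instances $\langle D_j \rangle_{j \in J}$ in $\mathrm{fam}(\mathcal{C})$ such that
\[ Q = \langle D_j \rangle_{j \in J}. \]
Recall that each memory state $M$ is a functor
\[ M : \mathcal{C} \longrightarrow \mathbf{Sets} \;=\; M : \mathcal{A}_\kappa^{\mathrm{op}} \longrightarrow \mathbf{Sets}. \]
The correspondence $Q = \langle D_j \rangle_{j \in J}$ between families of data objects and queries is precisely
\[ Q(M) = \sum_{j \in J} \mathcal{A}(\mathcal{A}_\kappa(-, D_j), M(-)) \]
for family $\langle D_j \rangle_{j \in J}$ as above. Via the Yoneda lemma one has
\[ Q(M) = \sum_{j \in J} M(D_j). \]
The compositions of the dual equivalences between the Diers category $\mathcal{A}$ and its dual $\mathcal{B}$ give the equivalences
\[ \mathcal{A} \simeq \mathrm{Lim}_{\kappa}\mathrm{Coprod}(\mathrm{Conn}\,\mathrm{Filt}_{\kappa}(\mathcal{A}, \mathbf{Sets}), \mathbf{Sets}), \qquad \mathcal{B} \simeq \mathrm{Conn}\,\mathrm{Filt}_{\kappa}(\mathrm{Lim}_{\kappa}\mathrm{Coprod}(\mathcal{B}, \mathbf{Sets}), \mathbf{Sets}) \]
in [15].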
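The formula $Q(M) = \sum_{j \in J} M(D_j)$ can be tried out on small finite data. The Python sketch below rests on assumptions that go beyond the text: an instance is a finite set of tuples over a single relation, the structuring morphisms are taken to be plain homomorphisms (maps of elements sending tuples to tuples), and $M(D_j)$ is read as the set of homomorphisms from $D_j$ into $M$. The names `homomorphisms` and `evaluate` are hypothetical, and the search is brute force, suitable only for tiny examples.

```python
from itertools import product


def elements(instance):
    """The elements occurring in the tuples of an instance, in a fixed order."""
    return sorted({v for t in instance for v in t}, key=repr)


def homomorphisms(d, m):
    """All maps h from the elements of d to the elements of m such that
    every tuple of d is sent to a tuple of m (brute-force enumeration)."""
    dom, cod = elements(d), elements(m)
    homs = []
    for images in product(cod, repeat=len(dom)):
        h = dict(zip(dom, images))
        if all(tuple(h[v] for v in t) in m for t in d):
            homs.append(h)
    return homs


def evaluate(family, m):
    """Q(M) for the formal sum Q = <D_j>: the disjoint union over j of the
    homomorphisms D_j -> M, each answer tagged with its summand index j."""
    return [(j, h) for j, d in enumerate(family) for h in homomorphisms(d, m)]
```

For example, with `m = {(1, 2), (2, 3)}` and the family consisting of a single edge `{("x", "y")}` and a two-step path `{("x", "y"), ("y", "z")}`, `evaluate` returns the two edges of `m` tagged `0` together with the one path tagged `1`. The tag plays the role of the summand index in the formal sum.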
References

[1] Serge Abiteboul, Richard Hull & Victor Vianu, Foundations of Databases, Addison-Wesley Publ. Co., 1995.
[2] Samson Abramsky, Domain Theory in Logical Form, Proc. Second Annual IEEE Symp. Logic in Comput. Sci., IEEE Computer Society Press, 1987, pp. 47-53.
[3] Jiri Adamek, P. T. Johnstone, J. Makowsky & Jiri Rosicky, Finitary sketches, J. Symb. Logic, to appear.
[4] Jiri Adamek & Jiri Rosicky, Locally Presentable and Accessible Categories, London Math. Soc. Lecture Note Series 189, Cambridge University Press, 1994.
[5] Jiri Adamek & Jiri Rosicky, An algebraic description of locally multipresentable categories, Theory and Applications of Categories 2:4(1996), 40-54.
[6] Pierre Ageron, The logic of structures, J. Pure Appl. Alg. 79(1992), 15-34.
[7] Paolo Atzeni & Valeria de Antonellis, Relational Database Theory, Benjamin/Cummings Publ. Co., 1993.
[8] Atish Bagchi & Charles Wells, Graph-based Logic and Sketches I: The General Framework, preprint, 1994.
[9] Michael Barr & Charles Wells, Category Theory for Computing Science, second edition, Prentice Hall International, 1995. Also Electronic Supplement.
[10] Francis Borceux, Handbook of Categorical Algebra 1: Basic Category Theory, Cambridge University Press, 1994.
[11] Francis Borceux, Handbook of Categorical Algebra 2: Categories and Structures, Cambridge University Press, 1994.
[12] A. Carboni, S. Lack & R. F. C. Walters, Introduction to extensive and distributive categories, J. Pure Appl. Alg. 84(1993), 145-158.
[13] J. R. B. Cockett, Introduction to distributive categories, Math. Struc. Comput. Sci. 3(1993), 277-307.
[14] Yves Diers, Categories Localement Multipresentables, Arch. Math. 34(1980), 344-356.
[15] Hongde Hu, Dualities for Accessible Categories, Canadian Math. Soc. Conf. Proc. 13, Amer. Math. Soc., 1992, pp. 211-242.
[16] Hongde Hu & Walter Tholen, Limits in free coproduct completions, J. Pure Appl. Alg. 105(1995), 277-292.
[17] P. T. Johnstone, A syntactic approach to Diers' localizable categories, Applications of Sheaves, Lec. Notes in Math. 753, Springer-Verlag, Berlin, 1979, pp. 466-478.
[18] P. T. Johnstone, Stone Spaces, Cambridge University Press, 1982.
[19] G. Max Kelly, Basic Concepts of Enriched Category Theory, London Math. Soc. Lecture Note Series 64, Cambridge University Press, 1982.
[20] Michael Makkai & Robert Pare, Accessible Categories: The Foundations of Categorical Model Theory, Contemporary Mathematics 104, Amer. Math. Soc., 1989.
[21] Heikki Mannila & Kari-Jouko Raiha, The Design of Relational Databases, Addison-Wesley Publ. Co., 1992.
[22] V. Pratt, The Stone Gamut: A Coordinatization of Mathematics, Proc. Tenth Annual IEEE Symp. Logic in Comput. Sci., IEEE Computer Society Press, 1995, pp. 444-454.
[23] Horst Reichel, Initial computability, algebraic specifications, and partial algebras, Clarendon Press, Oxford, 1987.
[24] M. Smyth, Powerdomains and Predicate Transformers, in J. Diaz et al., eds., Proc. 1983 Intern. Conf. Automata, Languages and Programming, Lecture Notes in Comput. Sci. 154, Springer-Verlag, 1983, pp. 662-675.
[25] Paul Taylor, The Fixed Point Property in Synthetic Domain Theory, Proc. Sixth Annual IEEE Symp. Logic in Comput. Sci., IEEE Computer Society Press, 1991, pp. 152-160.
August 29, 1996
(David B. Benson) School of Electrical Engineering and Computer Science, PO Box 642752, Washington State University, Pullman, WA 99164-2752 E-mail address :
[email protected]