general, rule-based languages for querying or updating deductive databases. ...... access plan composed of relational algebra programs. including specialized ...
Modelling Non Deterministic Queries and Updates In Deductive Databases? Chrktophe de Maindreville and Eric Simon INBIA Rocquencourt- BP 105 78153 Le ChesnayCedex, France
Abstract In this paper,we proposea new formalismto modelandhnplement general,rule-basedlanguagesfor querying or updating deductive databases.We consider as a target languagea production rule language for. databasesthat we introduced in previous papers, namely RDLl. This languagecan be seen as an extensionof a logic-basedlanguagefor databases to supportupdatesin the headof rules. The model we introduce, namedProduction Compilation Network (FCN), is derived from Petri-Netbasedmodels. A PCN modelsthe two aspectsof a rule program: the static aspectwhich consistsof the relationshipsbetweenthe rules and the database predicatesandthe dynamicaspectwhich describesthe semanticsof a rule program.The PCN areshownto havethe following features: (i) to provide a formal frameworkto describegeneralcomputation strategiesand query optimization algorithmsin a clear and concise mannerand (ii) to model in a unified way queriesandupdatesin a deductivedatabasecontext. 1. INTRODUCTION A deductive databaseconsistsof a setof baserelations called extensional durubase (EDB), and a set of virtual relations defined using rules, called intentional databare (IDB). In a deductive database.the query language generally consists of a rule based language, named DATALOG, where rules are function-free defmite horn clauses.Severalresearchersstudied extensions of DATALOG in order to increasethe power of the rule language. The introduction of negative literals in the body of a clause leads to DATALOGneg. Another extension is DATALOGset which augmentsthe termswith sets. In previous papers [Simon87, Maindreville881, we proposedan original approachto deductive databasesbasedon a production rule language,namedRDLl. A production rule, in tResearchis supportedin part by the ESPRITProjectISIDE (under contractP1133)
Permissionto copy without fee all or partof this materialis granted providedthat the copiesarenot ma& or distributedfor direct commercial advantage,the VLDB copyright notice and the title of the publication and& date appear,and notice is given that copying is by permissionof the Very Large Data BaseEndowment.To copy otherwise,or to republish.requiresa fee and/orspecialpermission from the Endowment.
Proceedin s of the 14th VLDB Conference Los AngePes, California 1988
395
our language,consistsof a conditional part which is a relational calculus expression and a consequentpart which is a sequence of insertions and deletions of tuples in databasederived or base relations. The main featuresof this language are the following. First, it captures the operational (or declarative) semanticsof stratifled DATALOGneg programs but offers more generality and more computational power than a DATALOGneg language since a sequenceof databaseupdatescan be expressedwithin a single rule. In fact, RDLl can be seen as an extension of DATALOGneg to support updates in the head of rules. The secondfeature is that the operational semanticsof the language provides a uniform and clean compromisebetweendeclarativity and procedurality. Indeed, the procedural aspects of RDLl stand upon (i) the use of updatesand (ii) the use of an implicit ordering of rules (stratification-like) combined with an explicit (user given) partial ordering of rules. Notice that stratification in logic programs already introduces procedural control in the language. Asa consequenceof the previous points, RDLl is not only a query language but is also an update language that can be used to specify complex triggers. One of the main problems encountered in deductive databasesis how to evaluate a rule program, that is a query. Most query processing algorithms for deductive databases strongly use graph models. Many such models have been proposedin the litteratute. PredicateConnection Graphs (PCG) and Active Connection Graphs (ACG) [McKay811 are used to model DATALOG programs. The PCG connects the literals that can be unified, i.e., predicate occurrencesin the rule body are camected to their occurrencesin the headsof the rules. The PCG is static and displays only the structure aspectof the logic program. It allows to retrieve all the formulae that unify with a given formula. ACG provide the dynamic aspect and display the binding propagations through the flow of control. ACGs detectrecursion through predicatesthat have the samebindings. However, both PCG and ACG are not suitable in a database context becausethey do not make a distinction between base
(i.e., EDB) and derived relations (i.e., IDB). System Graphs (SG) [Lozinskii86] have been proposed as an extension of ACG to compensate for these limitations and to support function symbols. RuMGoal Graphs (RGG) [Ullman85] are used to statically representDATALGG programswith function symbols. An RGG is an adorned AND/OR graph where each derived predicate is representedby a number of relation-nodes corresponding to the different possible binding patternsfor this predicate. Thus, an RGG can display the propagation of constants but its size can grow exponentially in the maximum number of arguments of a derived predicate. Capture rules [Ullman85] ln RGG seem limited to a semi-naive evaluation strategy. Extensions of RGG have been shown to be well suited for studying recursive query optimixation algorithms [Bancilhon86b, Beeri and they have been enhanced by so called information passing graphs [VanGelder86] in order to restrict the computation processto relevant facts. Nevertheless, both SG and RGG suffer from several limitations. First, they are highly dependent on the query processing algorithms for which they have been designed. They can hardly be used to model general query evaluation strategies (e.g., bottom-up, top-down, . . . ) as well as the flow of control of several different query optimixation algorithms (e.g., different constant move up algorithms, ...). The second limitation is that no execution model has been developed to model more general query languages such as DATALOGneg or updates and triggers’ languages, as well as their associated optimization strategies in a deductive databasecontext. For instance, the above models are bound to deal only with query languages such as DATALOG.
Apart from this introduction, the paper is organized as follows. Section 2 presents the structural and the dynamic aspect of the PCN formalism. Section 3 introduces a basic language over the PCN which describesin a clear and concise way the different computation strategiesof a rule program.This leads to the notion of annotated PCN. Section 4 discussesthe problem of generating an efficient annotatedPCN and presents a general overview of our query processing strategy as it is currently being implemented. Finally, section 5 is the conclusion. 2 .THE PRODUCTION MODEL
COMPILATION
NETWORK
2.1 Preliminaries We first recall basic databasenotions. A relational schemaR is a finite set of attributes (Al, .... A,). Let dom (Ai) be the domain of values of attribute Ai. A constant tuple t = (cl, .... c,) over a relational schemaR is a mapping from R into dom (Al) u dom (A2) u.... u dom (An) such that for each i in (1, . . .. n) q E dom (Ai). An instance of a relational schemaR, (sometimescalled a relation), is a finite set of constant tuples ovw R. A databare schemais a ftite setof relational schemas. Finally, an instance I (also called a database)over a database schemaS is a total function from S such that for each R in S, I(R) is an instance over R. An RDLl production system (or an RDLl program) is composedof a set of if-then rules called productions that make up the rule base, and a relational database called the fact darabase.
Recently, a Petri-Net based model (colored nets) was proposedin [Aly871 to alleviate the first limitation and to model the behavior of DATALOG programsusing different strategies and algorithms. In this paper, we propose another Petri-Net basedmodel, called Production Compilation Network (PCN). This model is derived from Predicate Transition Nets (PrTN) defied in [Genrich86]. PCN can model the behavior of any RDLl program, (and a fortiori any DATALOGneg program), and thus provide a generalization of the approachpresentedin [Aly87]. In particular, PCN provide a uniform execution model for implementing updatesand triggers in a deductive database. Another imptant feature is that. through a particular language of control over the PCN, it is possible to manage explicit ordering between rules. Therefore, PCN may capture procedural control over the rules.
396
The left-hand side of a production, (corresponding to the if part of the rule), is a range restricted expression of the tuple relational calculus [Codd7 1, Ullman821 which consists of the conjunction of a range formula and a sub-formula. A range formula is a conjunction of positive (resp. negative) range predicates which indicate that a tuple variable ranges (resp. does not range) over a relation. A positive range predicate is denoted R(x) while a negative range predicate is denoted 7 R (x) where x is a tuple variable and R is a relation name. A sub-formula is a condition over the variables that appearin the range formula part. The right-hand side of a production, (corresponding to the then part of the rule), is a set of actions. There are two &mmtaty actions, denoted “+” and I’-“. The updateaction “+”
called Production Compilation Network (PCN) to model the behavior of RDLl programs.This model has two aspects: the st~cture aspectand the execution aspect.
takes a ground fact (i.e., a constant mple) and maps a database stateinto another statewhich contains this fact. Thus, the action ‘I+” inserts a tuple into a derived relation. For instance, + Parent (sam,jeremie) inserts the tuple (sam,jeremie) into the Parent relation. On the contrary. the action “-” takes a fact and delete it from a relation. In aparamewized action. variables are allowed in the argument of the action (e.g., + Parent (Father = x, Child = y)). For safety reasons,all the variables that appear in the action part of a rule also appear positively in the condition part.
2.2. The structure of a Production Compilation Network The PCN model is a Petri Net basedmodel. More precisely, it derives from Predicate Transition Nets (PrTN). A formal definition of PrTN can be found in [Genrich86& It is convenient to recall that a PrTN consistsof [Genrich 861: (1) a bipartite directed graph (P,T,F) where P and T are called places and transitions respectively, and F is a set of directed arcs, each one connecting a place p E P to a transition t E T or vice versa that is (P x T) u (T x P) 1 F. Places correspond to predicates with their extensions, and transitions represent classesof elementary changesof extensions. (2) a labeling of the arcswith formal sums (noted “+“) of tuples of variables; the length of each tuple is the arity of the predicate connectedto the am. (3) A structure C, defining a collection of typed objectstogether with someoperationsand relations applicable to them Formulae built up in Z can be used as inscriptions inside sometransitions. (4) A function K, from the set of places to the set of nonnegative integers, assigning to each place an upper bound to the number of copiesof the sametoken the place can carry at the sametime. K (p) is called the capacity of p. Atokenr=(al,a2 ,. . .,ar) in a place p E P denotesthe fact that the predicate P (x1,x2 ,. . ..Xr) corresponding to that place is true for that particular instantiation of the tuple of arguments contained in the token r.
Tmditionally,a production systeminterpreter ~rownston85] is the underlying mechanismthat determinesthe set of satisfied productions and controls the execution of the production rule program.The interpreter executesa production rule program by performing the following recognize-actcycle : - Match : in this first phase, the left hand side of all productions are matched against the database.As a result, a conflict set is obtained which consists of instantiations of all satisfied productions. An instantiation of a production is a rule where all free variables are substitutedto constanttuples. - Conflict-resolution : In this second phase, one of the productions in the conflict set is chosen for execution. If the conflict set is empty, the interpmter halts. - Act : In this last phase,the actions of the production selected in the previous phase are executed. These actions may change the content of the database.At the end of this phase, the first phaseis executedto restartthe cycle. This cyclic interpretation scheme forms the basic control structure for RDLl production rule programs. In lSimon88. Simon88b], we provided the formal semanticsof RDLl which is in the spirit of Kripke semanticsfor modal logic lKripke631. As a result, we show that this semanticscapturesthe declarative semantics of a stratified DATALOG”% language but offers more generality and more computational power than a DATALOGneg language. Indeed, RDLl can be seen as an attempt to extend DATALOGneg to support databaseupdates. A notable feature of the language is that it enables to specify non-deterministic updatesor queries. Recently a very attractive formal approachhas been proposed in [AbiteboulS’la] to study databaseupdatelanguages.We believe that the basic techniques developed in this paper can also be used to implement their languages.
The structural aspectof a PCN representsthe relationships between rules and relational predicates as specified by a rule program. The following associationscan be madebetweenrules and the PCN structure. We represent each rule by a transition and eachrelational predicateinvolved in the rule by a place. The relational predicatesthat occur in the condition part of a rule are input places to the transition representing the rule and the relational predicates that occur in the action part of the rule are the output places of the transition. The condition itself of a rule is representedin the transition’s inscription. More formally, we are able to set up an isomorphism between an RDLl program and a PCN structure which is summarizedby the table below.
In the next sub-section we presentthe formal graphical tool
397
The second example is the PCN (figure 2.3) associatedto the rule Module WINGS. The base relations are PENGUINS, CROWS having for schema (Name, Color, Sex, Country, Predator).
PCN
RDLl RUE RELATIONAL. PREDICATE P
I
TRANSITION T PLACE P
Free luple variables Xl.....X, rangingover P Bound variables ranging over P or fm. variables ranging over -9
ARC (P, T) labelled by : +x1 +... +x, ARC (P;r) Welled by : (.)
ACTION + P (t)
ARC(TT.P)labelledby
ACTION - P (t)
ARC(T,P)labekd ates
MODULE : WINGS : target : FLY (Name} ; rule8 : PENGUINS (x) + + BIRDS (Name - x.Name) i CROWS (x) --t + BIRDS (Name - x.Name) : BIRDS (x)4 + FLY (Name = x-Name) ; PENGUINS (x) -8 - FLY (Name = x.Name) : end.
+t by -t
Transition’s inscription
We now give a first example of a PCN built from a set of rules. Let us consider the following rule module ANCESTOR which defines a derived relation ANCESTOR as the transitive closure of the baserelation PARENT (Par, Child). MODULE : ANCESTOR ; target : ANCESTOR {Asci Desc); rulea : PARENT (x) + + ANCESTOR (x) ; PARENT (x) and ANCESTOR (y) and x.Child.y.Asc --f + ANCESTOR (Asc - x.Par, Desc - y.Desc); end.
Figure 2.3 : the PCN for the WINGSModule
In this example, x and y denote tuple variables ranging respectively over the PARENT and ANCESTOR relations. Two relational predicate names appearin the rules. They lead to the places PARENT and ANCESTOR (representedby the circles P and A on the net). The fmt rule is modelled by the transition Tl whose inscription is the formula TRUE. The secondrule is modelled by the transition T2, whose inscription is the formula “x.Child =, y.Asc”. On the net, transitions are represented by boxes. The arcs outgoing from PARENT are labelled by + (x.Par, x.Child). The arc outgoing from ANCESTOR is labelled by + (y.Asc., y.Desc.). Finally, the arc going from T2 to ANCESTOR is labelled by + (x.Par, y.Desc.). To simplify the graphical representation of the PCN, tuples are replaced by tuple variables in the labels of the input arcsof the transitions.
A query is a production rule represented as a single transition, noted T$, several input places and a single output place, noted $, representing the result of the query. Thus, a query can be representedas the combination of two PCNs. The first one representsall the rules neededto define the content of the input places of the query. The second net represents the query itself. The resulting PCN is called a query PCN. An example of a query PCN is given figure 2.4. The query expressedin the SQL languageis : SELECT ASC, WHERE Desc
Desc
FROM ANCESTOR
= Georges;
Figure 2.4 : Quety PCN 2.3. Semantics of a PCN In this section, we define the semantics of a PCN. The dynamic aspectconsistsof the semanticsassociatedto the tiring
J
Figure 2.2 : PCNfor the Ancestor rules
398
transition’s conflict set is the set : ((Tl, ), (Tl, a / (Paul, Sam)>)) and Rel (Tl) = (u( / (Bill, Paul)>. ). Note that by definition, Rel (Tl) only contains tokensthat satisfy the inscription of Tl.
of the PCN transitions. This semanticscorrespondsprecisely to the semantics of RDLl as shown in [Simonll, Maindreville88]. A marking is a distribution of tokens over the places of a PCN. The token is a basic concept in all Petri Net basedmodels. In a PCN, a token representsa databasetuple. Traditionally, a Petri Net basedmodel executesby firing transitions. A transition can be fired if it is enabled.
When a transition T is enabled, it can be fired. Firing a transition’s occurrence (T, L) produces several actions. First, it duplicates from each input place P of T a number of tokens equal to the number of positive symbols labeling the arcs (P, T). Indeed, we use conservative nets which meansthat firing a transition T will only change the stateof the output places of T. Then, for each output place P of T, let + tl+ ...+ tn be the positive label of (T, P) and let - sl- ... - sq be the negative label of (T, P). Given the substitution list L in Rel (T), we define L(ti) as the result of applying the variable substitutions of L to the variables of ti. Two sets of tokens S+ and S- are then defined as follows: S+=(L(ti))fori=lton; S-= {L(si))fori=ltoq; Let M(P) be the marking of P; the firing of (T, L) produces a new marking in P: M(p) = (M(p) u (S+ - SJ) - ( s_ - S+).
A transition T is enabled whenever the three following conditions are satisfied. (i) Each input place P of T contains at least as many tokens as specified by the number of tuples figuring on the label of the am (p, T). A tuple of the form (.) counts for zero. (ii)The tokens occurring in the input places of T have values satisljing the transition inscription. (iii)The marking of at least one output place of T is modified by the transition firing. We impose the capacity of eachplace, that is the number of copies of the sametoken a place can carry at the sametime, to be bound to 1. This leads to the notion of a duplication free PCN. A duplicationfree PCN is a PCN where : (i) two tokensof equal value in the same place are merged into a single token, (ii) a transition which would only produce already existing tokens is not enabled. This correspondsto assign a fmpoint semanticsto the “+” symbol in the rule language.
This result is portrayed in the figure 2.5. The dashed part representsthe marking M(P).
At a given time, several transitions can be enabled. A transition’s occurrence is a couple (I’. L) where T is a transition name and where L is a list of variable substitutions of the form where the xi are all the free variables figuring on the labels of the input arcsof transition T and the ai are tokens issued from the input places of transition T. In the remainder of this paper L is called a substitution list. The set of all transition’s occurrences, for a given marking of the net, is called the transition’s conflict set. Note that for a single transition T, different substitution lists can be used to fire the transition. Each one of theselists leads to an element (T, L$ in the transition’s conflict set.We note Rel (T) (for : relevant set of tokens for T). the set of all the substitution lists that appearin the transition’s conflict set together with the transition T.
Figure 2.5 : result of thejIring of a transition’s occurrence. With these definitions, a multi-labelled arc is interpreted as an atomic database update. This property has several consequences.First, it ensuresthat the order of insertions and deletions in the action part of a rule is irrelevant. Second it nullifies an action which inserts and deletes the sameconstant tuple. For instance, the instantiated label of an arc (P. T): L(t) = + (a a) - (a, a) - (a. a) = - (h a) + (a, a) - (8, a) is a null action over P. However, L (t) = + (a, b) - (a, c) = - (a. c) +(a,b) deletesthe tuple (a, c) and addsthe tuple (a, b).
Example : ’ Consider the transition Tl of the PCN representedin Figure 2.2. Assume that the marking of the place A is empty and that the marking of P is ((Bill, Paul), (Paul, Sam)). Then the
Example : Consider the following PCN :
399
Q
Example : In the example of figure 2.2, each transition is clearly deterministic and the PCN is also deterministic. Indeed, this PCN describesa monotone increasing function (in the senseof set inclusion partial order) over the marking of the net. In the example of figure 2.6, the transition T does not describe a monotone function over the markings of the net. However, it is quite easyto demonstratethat such a transition is deterministic. Finally, it is important to note that the net portrayed on figure 2.3 has no stable state.
+(x.orig.. y.exL) - y
pc
1’
I
Figure 2.6 Gn this net, the relation G (Ext., origin) storesoriented arcs in a graph.The transition T computesthe reduction of the graph by replacing two serial edgesby one resulting edge.The condition C is : (V z E G)[(z = y) v ((z.Grigin # x. Ext) A (z.Ext + x.Ext))]. It expressesthat x.Ext. is a single node in G. Let I~Io (G) be the following initial marking in G : MO(G) = (tl = (a, b), t2 = (b, c), t3 = (c. d)); Then, Rel (T) has two elements : Rel (T) = (Ll = cx / (a, b), y/h ch L2= ); The sets S+ and S- are defined as follows :
A PCN evaluator has to execute the following procedure portrayed in figure 2.7. This procedure, starting from an initial marking, successivelycomputeseachreachablemarking. (P:
PCN,
MO:
Initial
M e-MO; repwt the conflict set: select a transition's occurrence (T, L) in the conflict set; compute M :- reachable marking from M by firing (T, L): until a stable marking is reached. compute
For CC W : S+ = Ita. 41, S- =((a, b). (b, c)) For(T.L2):S+= I@, 4). S- = I@, cl. (c, 4) The fiig of (T. Ll) producesthe following marking : Ml(G) = W&3 u (S, - S-N - ( S- - S+) = ((a, cl, Cc,4) Then, Rel 0 = IL’1 = UC/ (a, c). y / (c. d)>) For (T. L’l) : S+ = ((a, 4). S- =((a, 4, (c. d)) The firing of (T, L’l) produces the following stable marking : M2G) = (Ml@)
eP8lu8te marking)
arocedutie
Figure 2.7 : A PCN evaluator. This procedure is not deterministic. The undeterminism standsin the choice of a transition occurrencein the conflict set, that is, (i) the choice in the conflict set of a transition T and (ii) the choice of the tokens used to fue T. In a deductive database context, the problem is to determine a sequenceof transition’s occurrencesthat leadsto a correct and eficient computation.
u (S+ - S-1) - ( S- - S+) = ((a. 4).
For a given initial marking, a place P has a stable marking iff none of its input transitions is enabled. This notion is used to define the stability of a transition. A transition T is said to be stable for a given initial marking iff all its output places have a stable marking. Then, a PCN has a stable marking for an initial marking iff all its transitions am stable.
Correctness refers to the declarative semantics of the corresponding program and to the stable marking associated with it. Indeed, the non monotony of the PCN evaluation denies the existence of a unique stable marking for all nets. When several stable markings are reachable, using a meta-semantics similar in essenceto the stratification of logic programs[Apt86], a deterministic semantics can (under certain conditions) be assignedto the PCN. When it does, the meta semanticsyields a partial ordering over the transitions, and thus constrains the choice of the firing of a transition’s occurrence in the conflict set. Therefore, a correct computation must return the unique stable marking of the net when it exists under the previous meta-semanticsor one of the possible stable markings when the net remains undeterministic. Efficiency refers to both the
When the PCN reaches a stable state, no transition can be enabled and the PCN evaluation stops. This corresponds to a flxpoint for the set of rules modelled by the net. We are now able to statethe definition of a deterministic PCN. A transition T is deterministic iff, for any initial marking Mg, the output places of T have a unique stable marking depending on lvlo. A PCN is deterministic iff for any initial marking IHo, the net has a unique stablemarking depending on MO.
400
computation modeof a single transition (remarkthat evaluating a transition leads to evaluate a relational calculus expression) and the control of the inherent parallelism between rules. We shall seelater two different computation modesfor a transition (a set oriented modeand a tuple at a time mode).
Our objective here is to defme an explicit control component that is specified independently of the PCN evaluation scheme. Informally, this control can be represented by the set of all allowable sequencesof transition firings; that is by specifying a languageover the set of transitions. We will call such a language a control language . At each step of execution, the control language restricts the set of transitions that may be considered for firing and only a subsetof the transitions in the transition’s conflict set is acrive. We call a PCN together with a control languagean annotatedPCN.
Theflow of control in a PCN is defined as the description of both the sequence of transition’s occurrences firings, and the computation mode of each transition. In our framework, this comes to say that query optimization techniques are captured either by the syntactic transformation of the net or by describing a particular flow of control. Therefore, the problems that immediately arise are (i) to provide a formal way of describing the flow of control in a PCN and (ii) to defme how to generatea correct and efficient flow of control. The former is tackled by the next section while section 4 discussessome aspectsof the secondproblem.
All the procedural control is thus contained in the control language and is quite independent of the specification of the evaluation scheme.Note that when the control language places no constraint on transition’s tirings, (i.e., when the control language allows all possible transition sequences),an annotated PCN reduces to a non deterministic system whose execution is completely governed by the evaluation scheme. At the other extreme, where the control language makes the PCN deterministic, the evaluation schemeis inhibited.
3. FLOW OF CONTROL IN A PCN 3.1. A control language over the PCN
We now define formally what is a control language over a PCN. First, recall that a PCN is defined as a triple (P, T, F). We define C as the set of all transition names of T. A control language over a PCN is any subsetof Z* where Z* is the set of all strings over I;. A word in I;* is called an annotation.
The previous section described the PCN structure and its semantics. In this section, we present a language used to describethe computation of a PCN. As we have seenin section 2, a PCN is a non deterministic machine that describes sets of sequences over states of a database,and thus determines a relation on these states. This scheme was illustrated by the PCN evaluator procedure portrayed on figure 2.7. The implementation of such a procedureon a deterministic machine necessitatesspecifying the order in which enabled transitions are selected for fling (the conflict resolution srruregy ) and the order in which statesof the databaseare expanded(the backtracking strategy). For instance, a possible conflict resolution strategy can be the stratified execution of a program. All these strategies are called the evaluation schemeof the PCN.
We are now able to define the execution of an annotated PCN = (P, T, F. A) where A is a set of annotations (or a control language over Z). We first define a stare of execution to be a pair cu. M> where u is a prefix of some word in A and M is a marking over the net. Now, let u and UT be prefixes for some word in A where u E Z* and T E G Then, we say a state of execution is directly reachablefrom a state (noted cu. Ml> + ), if and only if one of the two following conditions holds : (i) The transition T is enabled in Ml and M2 is reachable from Ml by executing T. (ii) the transition T is not enabledin Ml and M2 = Ml.
Remark that if backtracking is eliminated, for non deterministic PCN only one solution path can be generatedfor any given initial marking and query. In such cases,the solution set depends not only on the transitions and the marking of the net but also on the particular evaluation scheme used. Nevertheless, we do not consider any backtracking strategy in the evaluation schemepresentedin this paper.
The last condition meansthat if a transition in the annotation is not enabled in the current marking, then we move on to the next symbol in the annotation. Let +* denote the reflexive transitive closure of + Then the reachability relation computed by an annotatedPCN. (P. T.
401
F, A), is a subset of M x M where M is the set of all possible markings over the net : ( : +* for someu in A)
this means that the annotation is erroneous. Otherwise, a transition T is selected from the active set and T is executed giving raise to a new state of execution . This is illustrated by the following procedure.
The set A of annotations, (or the control language), can be specified using a context-free language. We give below the context-free grammarassociatedwith the languagethat describes all the possible annotationsover a PCN. The terminal alphabet is composedof Z, +, (, ), *, cr. The non terminal alphabet is: T, Q and the start symbol is S. Now, the productions are : S +(T+Q)ITQIT T-,(T+t)ITtItI(T)*IQo Q+(Q+t)IQtItI(Q)*I(Q)Q
In this grammar,the symbol + denotesa disjunction (i.e., (T + Q) means that T or Q can be fired); the symbol Q standsfor saturationand meansthat a sequenceis repeatedup to saturation; the symbol * meansthat a sequenceis repeated0 or more times. Finally, TQ meanscomposition of T and Q. Example : Supposeone wants to express that Tl is always tried to be fiied before T2 which in turn must always be tried before T3. The sequence((Tl)On)O expressesthe precedencebetweenTl and T2. Now, the sequence (((Tl)oT2)oT3)Q is the desired result. Thus, Tl is always evaluated fist. If Tl is not enabled then T2 is evaluated. If T2 is fired using a given transition’s occurrence then Tl is again evaluated. When a stable state is reachedfor Tl and T2 then‘T3 can be evaluatedand so on. 0
procedure evalute l nnot8ted (P:annotatet PCN, t-lo: Initial marking) M 4-q); u 4-I; repeat w active set:= set of all transition: T in the conflict set such that UT is i prefix of some word in A; if active set = 0and UT is not a word in A then return error: salect a transition's occurrence (T, L) ix the active set: w M :- reachable marking from M by firing (T, L): utuT; if u is a word in A and M is not a stable marking then return error; until a state is reached where M is a stable marking and u E A.
3.2 Extended annotations In the previous section we described a control language without indicating the computation mode of eachtransition. The purposeof the present section is to give a detailed description of thesemodes.Each of them yields a new basic symbol associated with a transition. The previous languageis then extendedto deal with these symbols instead of the transition names. We call extended annotations the words recognized by this new control language.
Remark that due to the presenceof parenthesis, the above grammar is of the type anbn and is not a regular grammar. Another point is that due to the disjunction “+“, the set of annotations can be described by only one word in the above context-free language. We are now able to presentthe modified evaluator pm~ed~re that executes an annotated PCN = (P. T, F, A). Initially, the state of execution is 4, MO> for some initial marking M@ Supposethat a stateof execution where the xi are all the free variables figuring on the labels of the input arcs of transition T and the pi are pattern tokens. A pattern-token is simply a partially instantiated tuple. A non instantiated attribute value in such a token is representedby “?“. A substitution of the form xi / pi where pi only contains “?” values is named an empty subsrinuion. A substitution list L’ matchesa pattern list L if each pattern token of L respectively matcheseach token of L’. A non instantiated value ‘I?” matchesany constantvalue. The procedure associatedwith this symbol is :
procedure
compute-tram M:marking,
. (T : t rans . , C: boolean);
The following remarkscan be made : (i) such a procedureis deterministic, and (ii) the “for each” statementof the procedure exactly corresponds to compute t as a Relational Algebra Program (RAP) where the relations are the marking of the input placesof T and the result of the programis the marking of the output places of T. Rel (I’) is computedonly once: thus, the procedureis equivalent to a usual relational algebraquery. This is why such a transition ftig is interesting in a DBMS context.
L:pattlist,
begin
if C = false then compute Rel (T); Choose a list L' that matches pattern list L (if any) in Rel (T); compute the sets S+ and S- : fat each place P such that there exists an arc (T, P) do compute M :- (M v (S+ - S-)) - (S, - S+); end.
An extended annotation is then any word computed by the previous context-free grammar where Z is now the set of symbols built from the two above basic symbols. the set of transition namesin T, and the set of all pattern lists.
A particular casearises when L is such that all the variable substitutions are empty substitutions as defined above. In this casethe symbol FT, L is abbreviated as FT and its meaning is to fire a transition’s occurrence of T chosen at random in the transition’s conflict set.
A particular caseis the annotation (FT, L) o. It meansthat the transition T is tired up to saturation using the pattern list L as a filtering pattern for tokens. In a similar way as above, WT) o means that T is fired up to saturation without any pattern. The associatedprocedureis :
The’ second basic symbol does not really belong to the semantics of a PCN. Rather, this symbol, noted RT, L, defines a particular computation mode of a transition. The meaning of RT, L is first to compute rel (‘I’) and then to fue T with all the occurrences of rel (T) without observing the intermediate modifications of T on the conflict set. Roughly speaking, it meansthat T is fired in a set oriented way. More formally, let T be a transition and (T, P) be an output arc of T. Let + tl...+ tn be the positive label of (T, P) and - sl...- sq be the negative label of (T, P). Two sets SR+ and SR- are now defined as follows : sR+=
U (IL ($)I) LE Rel(T)
SR-=
u (IL(Sj))) LE RelQ
=
L:patternlist,
compute Rel (T); C :- false: Rel (T) # 0 do %until a stable while marking M is reached% begin
compute-trans. (T , L, M, C); compute Rel (Tl; C :- true; end
end.
Example : On the net portrayed on figure 2.2, the extended annotation (FTl,Ll)o (FT~, L2)o where : Ll = a / (?, sam)> and L2 = computes in the ANCESTOR place all Sam’sascendants.Indeed, (FTl,Ll)o successively copies all the PARENT’s tokens that satisfy the pattern of Ll into ANCESTOR. Then (FT~, L2)o computes the ascendantsof sam. Equivalent annotations are either (FTl,Ll* FT~, L2*) o
u sLE Rel(T)
The procedureassociatedwith the symbol RT, L is then : cocedure
compute-rel-ttan8
sgin xnpute Rel(T); BX each place
(T:trans.,
L:patternlist,
(T:trans.,
M:marking);
begin
u S+ LE Relu)
=
compute-ttansition-sat
arocedure
M:marking);
P such that there exists an arc (T, P) do
~~((FT~,L~)OFT~,L~*)~.BU~FT~,L~*F~,L~*~S~O~~
correct annotation for the PCN. Remark that all these annotationshave not the samecomputation Cost.
begin compute SR+ and SR- for all lists that satisfy pattern L; compute M (P) :- (M (P) U (SR+ - SR-)) - (SR- - sR+); compute Rel(T);
An interesting problem is to detect the cases where the symbol (FT, i> o can be rewritten into (RT, L)o and yields the sameresult. When this is possible, the transition T is said to be
end
ad.
403
4.1. The evaluation
relutionul. Note that, when the language is DATALOGneg, all the rules are relational computable. A complete discussion of this problem can be found in [Simon88]. In the following, we only give some examples to illustrate the notion of relational computabletransition.
The evaluation schemepresented here only incorporates a conflict resolution strategy and proceeds in a bottom-up evaluation. The two basic techniques used for conflict resolution are : firing by stepwise saturation and implicit ordering of rules.
Examples : (i) Consider the PCN portrayed in Figure 2.2. The transitions Tl and T2 are relational. This meansthat the firing of Tl corresponds to a usual query in relational algebra, and that the repeatedfiring of T2 can be computed by a relational algebra program which is precisely a loop of joins onto the relation PARENT and ANCESTOR. In this particular example the word (FT~) Q (F-r,$ a can be rewriten into (RTl)o (RT2>o which is a more efficient annotation. (ii) Consider now the PCN portrayed in Figure 2.6. As presented in section 2, the computation of the word (FT) Q leads G to the following stable marking : M2 (G) = [(a, d)). We show in the following that a relational computation of this PCN doesnot provide the samemarking in G. The initial marking in G is MO (G) = (tl = (a, b), t2 = (b, c), t.3 = (c. d)); Then, we have :
Implicit ordering correspondsto what is called stratifcation for logic programs as described in [Apt86]. A detailed discussion of what this concept becomes in our framework where we not only deal with negation but also with updatesis given in [Simon88b]. Roughly speaking, it corresponds to travcrsc the net in the order specified by the oriented arcs of the net. This is called the chaining property.
SR+= u S+=((a.~)) U((b, 4) LE Rel(T) s&= u S-=((a, b))U((b, c))u I@, cl)u ((cd)) LE RelQ
= ((a. b), 0-x cl, k 4) The computation of the procedure compute-rel-trans leads to : M2 := U (sR+ - SR-)) - (SR. - sR+)
(G)(MO (G) = ((a,
cl,
(b, 4).
Hence, the transition T is not relational, i.e., (FT) o + (RT) o. 4. GENERATION PCN
AND USE OF AN ANNOTATED
In this section we present a general query processing overview of the systemwe are being implementing using PCN. We first discuss the evaluation scheme.Then, we present how the previous model is used to implement the production rule languageRDLl.
scheme
We now discuss the choice of computation of a transition using a saturation mode (o mode). Several cases arise according to the nature of the program. Fit, assumethat the PCN is deterministic. Therefore the Q computation mode for all the transitions defines one correct computation among others. Is it an efficient one ?. In fact, two basic optimizations are made possible: (i) If a given transition is relational computable then (FT) o can be replacedby (RT)o in the annotation, (ii) it avoids the PCN evaluator to maintain a main memory environment for severalrules at the sametime; in particular, a transition that has already been fired and that does not appear in a repetitive sequencein the annotation is eliminated from the evaluator’s environment. Now, consider an a priori non deterministic PCN, two sub-casesare distinguished. First, the net has an underlying deterministic semanticsthat can be assignedusing the implicit partial ordering mentioned above, or becausean explicit metasemantics was provided by the user (for instance, an annotation can be given using a patticular grammar).If explicit information is given then the choice of the computation modeis also explicitly derived, Furthermore, if the net is only composed of deterministic transitions, then the explicit annotation may only concern the ordering between transitions. Thus, in this case,a d computation modecan be assignedto all transitions and we get a correct computation. The second sub-case that remains is the case of a non deterministic PCN for which a deterministic semanticscannot be infered. Here, there is no notion of correct computation and
404
Let us review the different uses of annotations. First, the control languageenablesto model a procedural control over the rule language that is not hidden in the rules [GeorgeftXZ]. Whatever the meta-languageis for expressing this procedural control, annotations provide some means to capture it in a compiled form. A second point is the use of annotations for capturing evaluation algorithms. For instance, the “push-up” of constantscan be expressedin the extendedcontrol languagein the same spirit as [Aly87]. If an annotation was already specified by the user,query optimization techniquescorrespond to transform the specified annotation into an optimized and an extended one.
any mode (excepted to the relational one) can be a priori chosen. Thus, in summary, a cscomputation mode can be used when the PCN is assigned a deterministic canonical semantics. We call such a mode a firing by stepwise saturation mode. Obviously, a different mode arises when explicit annotations are provided by the user. Example : Consider the PCN representedFigure 4.1. This PCN admits a unique fixpoint on the contrary to the PCN portayed on Figure 2.3 which loops for ever.
The last step is the evaluation of the annotatedPCN. As a result of this step, a sequential (i.e., without use of disjunction) extended annotation is produced. It is then transformedinto an accessplan composedof relational algebraprograms.including specialized operators in order to compute efficiently the non relational rules. The different stepsof the query processingare summarizedon the following figure:
Query
Figure 4.1
Access
Optimization
Thus, this PCN is deterministic and a correct extended annotation is for example : w = (RTI)~(RT~F(RT~)~ (RT4)o. Remark that in this case, this annotation can be generatedby the systemitself.
Translation
Plan
Generation Ic run
Production Compilation Networks
time
code
Annotated PCN
4.2 A general query processing overview. In this sectiljn, we briefly present the different steps of the evaluation of a query by generation of an extended annotated PCN.
r Rule
Base =i
Figure 7.1 : query processing overview When IAquery enters the system, the fist step consists of retrieving the pertinent PCN, i.e., the minimal PCN which is able to produce the desired marking in the result place. Then, this m%imal Query PCN is build in main memory. The second step consists of an optimization and translation phase. The optimization techniques being currently implemented in our system are : the rewriting of the (FT) 0 words into (RT) o , a generalization of query modification in order to minimize the number of transitions appearing in the query PCN, the transformation of a set of databaseupdates into an optimized one as [SellisgS] does, and the “push-up” of the selections. These algorithms are describedin @kiindxville87].
5. CONCLUSION This paper presented a new execution model called a Production Compilation Network (PCN). This model is an extension of a Petri-Net based model, namely Predicate Transition Nets. We showed how to model the static aspectand the dynamic aspect of RDLl, a production rule language for databases. RDLl can be seen as an extension towards the support of updatesfor DATALGGneg. Hence, the PCN model provides a uniform framework to model the evaluation of queries and updatesin a deductive databasecontext.
405
IL
Acknowledgments : The authors would like to thank Serge Abiteboul for the numerous fruitful discussions, Nathalie Lefebvre for her commentson a previous version of this paper and Marie Aude Portier for her participation in the implementation.
We introduced a control language over the PCN to describe different evaluation schemes and algorithms in a clear and concise manner. In particular, we showed how to describe the relational computation of a production rule. The set of basic symbols of the PCN language as presented in this paper permits only to capture a bottom-up evaluation strategy. We defined elsewhereFlaindreville88b] other symbols that capture pure top-down evaluation strategiesor mixed strategies(as in the query/subquery approach). Using the full PCN language, we are able to describe the Prolog evaluation strategy for DATALOG-like rules. Finally, the structure aspectof a PCN enables to capture a partial ordering over the set of transitions. This ordering can be considered as an extension of the notion of stratification for a production rule language,[Simon88b].
References : [Abiteboul87a] S. Abiteboul. V. Vianu : “A Transaction Language Completefor Database Update and Specification.” , in ACM PODS. 1987. [Aly87] H. Al; , Z. M. Ozsoyoglu : “Non-deterministic ModeRing of Logical Queries in Deductive Databases” hoc of ACM-SIGMOD, Los Angeles, 1987. [Apt861 K.R. Apt, H. Blair, A.Walker : ” Towards a Theory of DeclarativeKnowledge”. IBM Report RC 11681,April 1986. [Bancilhon86] F. Bancilhon, R. Ramakrishnan : “An amateurs’s introduction to recursive query processing strategy”, Pmc of ACM SIGMOD, 1986. [Bancilhon86b] F. Bancilhon, D. Maier, Y. Sagiv, J. Ullman : “Magic sets and other strange ways to implement logic programs”, 5th ACM Symp. of Print. on DatabaseSystems. [Beeri87] C. Beeri et al., :“Sets and Negation in a Logic Database Language (LDLl)“, MCC Tech. Report, Nov. 1986. [Brownston L. Brownston. R. Farrell, E. Kant, N. Martin *“Programming Expert Systemsin ‘OPSS: An Introduction to Rule-BasedProgramming”. Ed Addison-Wesley. [Codd71] E.F. Codd : ” A Data Base Sublanguagefounded on the relational calculus.” Proc of ACM SIGFIDET 7 1. [Gardarin85] G. Gardarin, C. de Maindreville, D. Mermet, E. Simon : “Extending a Relational DBMS towards a KBMS : A First Approach .” Springer Verlag, Ed Schmidt, Thanos, 88. [&uich86] H. J. Genrich : “Predicate I Transition Nets ” , in Advances in Petri Nets’ 86. Springer Verlag. 1987. [Georgeff82] M. P. Georgeff : “Procedural Control in Production Systems”,Artificial Intelligence 18 (1982). [Giordana85] A. Giordana, L. Saitta : “Modeling Production Rules by Means of PredicateTransition Networks.” Inform. SciencesJournal, North Holland Ed. Vol.35 Nol. [Kripke63] S. Kripke : “Semantical consideration on Modal Logic” , Acta Philiosophica Fennica, Helsinki, 1963. [Lozin&ii86] E.L. Lozinskii : “‘AProblem-Oriented Inferential Database System.” ACM TODS , Vol. 11, No 3, Sept. 1986. [McKay811 D.P. McKay, S.C. Shapiro : “Using Active Connection Graphs for reasoning with recursive rules”. Proc of 7th IJCAI, 198i. [Maindreville87] C. de Maindreville, E. Simon :“A Predicate Transition Net for Evaluating Queries against Rules tn a DBMS.” INRIA Research Report, No 604. Feb. 1987. abstractin Proc of 3m JBDA, PortCamargue, France, 1987. [Maindreville88] C. de Maindreville, E. Simon : “A Production Rule Based Approach To Deductive Databases”, Proc of 4th Int. Conference on Data Engineering, Los Angeles, Feb. 88. [Sellis85] T.K. Sellis, L. Shapiro : “Optimization of extended databasequery languages.” ACM SIGMOD, Austin, 1985. [Simon883E. Simon, C. de Maindreville : “Deciding whether a production rule is relational computable” Proc. of Intemanonal Conference on DatabaseTheory, Bruges, Belgium, Sept. 88. [Ullman82] J.D. Ulhnan : “Principles of Databases Systems”, Computer Science Press, 1982. [Ullman85] J.D. Ullman : “Implementation of Logical Query languages for Databases”,ACM TODS, Vol. 10, No3.1985. [VanGelder86] A. Van Gelder : “A Message Passing Frameworkfor Logical Query Evaluation”, ACM SIGMOD 86.
Our main contribution is to provide a good intermediate model that can be used as a control structure for implementing rule basedlanguagesfor deductive databases.For instance,it is possible to describe the computation of a rule program using either a tuple at a time computation mode or an extended relational algebraprogram or both. Moreover, our claim is that the implementation of DATALOGneg programs can benefit from using PCN because (i) the efficient evaluation of a DATALOGneg program requires implementing a variety of strategiesand algorithms, and (ii) procedural control enablesto describemore meaningful programs.Thus, we need to describe control over rules. To illustrate the first reason, consider the recursive query evaluation problem. No universal optimization algorithm is available and different algorithms need to be supportedaccording to the structure of the data in the database relations (cyclic or acyclic, tree, cylinder, ... ). the structure of the rules (chain rules, linear rules, ...). and the operators supportedby the system (e.g., transitive closure). PCN offer a uniform way to describethesealgorithms with their application area. A first implementation of the method has shown the adequacy of the execution model. It also pointed out some researchperspectives: *to extend the model to support : (i) integrity constraints over the derived relations, (ii) accessmethods information in order to include physical optimization in the query processing process, ,to map parallelism onto the model, *to use annotations for parameterizing various evaluation schemes
406