A Calculus for Reactive Databases Joxan Jaffar, Limsoon Wong and Roland H.C. Yap
School of Computing National University of Singapore Singapore
joxan, [email protected]
[email protected]

Abstract
Traditionally, interaction with a database is performed by a query on the current state of the database. Such queries are typically independent of one another; in fact, they are modelled as isolated transactions, thus necessitating concurrency control. Increasingly, applications live and operate over updates to the database and interact with one another explicitly. Further, the database itself needs to be more than just a repository of facts: it needs to be a knowledge base. In this paper, we present an abstract framework in the form of a formal calculus. Its purpose is to capture, in a minimal and declarative way, an operational model which combines concurrent programming with databases. The key concepts are: sustained truth, for consistency and synchronization; hidden publication, for atomicity and timely updates; and sudden-death, for making delayed committed-choice actions. In the end, our calculus forms the foundation of a programming system for writing reactors, which are program fragments expressing the database interactions to be embedded within arbitrary agent programs. The basic constructs of reactors provide the ability to react to changes, synchronize with other agents, and perform updates, all within a specifiable degree of atomicity and transactional behavior akin to that in traditional databases.
Contact Author: Roland Yap
Email:
[email protected] Phone: +65 874-2972, Fax: +65 779-4580 Postal Address: School of Computing National University of Singapore S16 Level 5, Room 05/08, 3 Science Drive 2 Singapore 117543, Republic of Singapore
1 Introduction

Traditionally, interaction with a database is through a query language like SQL. Such queries apply only to the current state of the database; that is, a query entails computation on a static database. Furthermore, queries are independent of each other, since the isolation provided by the database transactional model ensures that transactions do not directly interact with one another. While the transactional model for databases has proven to be very successful, applications are increasingly required to live and operate over updates to the database. They are often required to respond, in some timely fashion, to updates. This dynamic behavior requires that the application be reactive in nature. Further, it is becoming more important that applications can be designed to cooperate and interact with one another explicitly. Finally, we consider databases not just as a repository of facts, but as a knowledge base.

Examples of applications which motivate the work in this paper include ERP systems, workflow systems, monitoring systems, and agents in e-commerce settings. Here more than just a traditional database system is required. For example, fund managers could use reactive programs for balancing portfolios against rapidly changing stock prices [13]. In business-to-business systems, rules and contracts could use forms of logic realized by having logical rules in the database (e.g. [25]). A traffic control system which manages lights could react to traffic information according to some mathematical model about flows. An e-commerce system could use constraints in order to reason about combinations of products and about optimal pricing.

We present an abstract framework, in the form of a formal calculus, for concurrent programming with reactive databases. The key new concepts are: sustained truth, for synchronization; hidden publication, for user-controlled atomicity and timely update; and sudden-death, for making delayed committed-choice actions.
Typically, synchronization in a declarative setting provides for blocking until a specified logical condition holds (e.g. [28]). This simplicity is key for programmability, but does not directly address the issue of atomicity: for example, how long must the said condition hold? On the other hand, in the area of transaction processing for databases, there are substantial developments in concurrency control (e.g. [8]). Here the focus is on sophisticated execution models to deal with concurrency. These models either work with system-specified constraints (such as using a locking algorithm on read/write operations), or they work with user-specified constructs in the queries (such as nested transactions). As opposed to work in programming languages, reactive programming is not the concern. The thrust of concurrency control mechanisms is to ensure that (possibly long) transactions can be executed with sufficient parallelism.

A main motivation of this work is to bring concurrent programming and concurrency control together. We do this via a calculus so as to have formality and abstraction with a minimal introduction of special or application-dependent features. In our model, a program fragment is runnable as long as a specified condition holds at every step of the execution. Coupled with this notion of "sustaining" is a notion of "hiding", whereby updates made by a program fragment are visible, i.e. "published", only during the time that a specified condition holds. We then model the notion of choice by means of a nondeterministic operator which commits upon publication.
These three concepts work in an integrated way to provide a calculus that is a basis for general-purpose concurrency. At its heart is a database modelled as a logical entity, and basic synchronization is based on the notion of logical entailment. This provides for high-level reactive programming on the current and future states of the database. With the recent advances in constraint programming [11] and constraint databases [20], efficient implementations of instances of the calculus are credible. The calculus has declarative mechanisms for concurrency control, that is, the ability to control atomic behavior in the presence of concurrency. It allows a programmable level of atomicity and a notion of consistency geared towards maximizing concurrency.
1.1 Related Work

ECA rules in active databases support automatic triggering of actions in response to selected changes in the database [29]. This does represent some form of reactive programming. However, the notion of concurrency is not dealt with explicitly. For example, the issue of how a collection of actions (arising from triggers, or from recursively executing actions) is to be executed relative to each other is not central. Nor do ECA rules deal with concurrency control issues directly.

Consider next reactive programming languages. One class contains the "synchronous" languages such as Esterel [3] and Lustre [10]. These languages are grounded in the concept of perfect synchronous concurrency, where concurrent computations can be done in zero time. (These languages are further designed for deterministic reaction.) While suitable for applications such as automatic control and hardware design, they are not suitable in a database context because they are designed to react to external signals, not to a complicated store.

A more relevant class of reactive languages is the concurrent constraint programming (CCP) languages [26], in which logical entailment is used for synchronization. An "ask" primitive enables blocking until the relevant constraint is entailed by the store, and unblocking is combined with a committed choice of alternatives. CCP does not address database issues; in particular, it assumes that the shared store is monotonic in the sense that true formulas always remain true. It addresses issues such as atomicity and transactions only in a limited way. More widely, the notion of synchronization using a logic-based mechanism has been explored extensively in constraint programming (see e.g. [22]).

Coordination languages, such as blackboard-like languages and Linda [4], use a structured store and synchronization on the store. Linda's "tuple" store is essentially a simple database.
Synchronization is based upon the existence of specified tuples in the store. These languages do not address reactivity in a sufficiently general way, nor do they address database issues such as atomicity and transactions.

Finally we consider the classical transaction models in databases. Traditionally, the main concern is the ACID (atomicity, consistency, isolation and durability) properties. A transaction is atomic (its actions are done indivisibly) and is executed so as to respect serializability; that is, the effect of executing a collection of transactions is the same as executing them serially in some order. It has been recognized (e.g. [9]) that serializability can be too restrictive on concurrent execution, particularly when transactions are long-lived. Traditionally, consistency is obtained by a locking regime on the read/write operations associated with a query. The ANSI SQL-92 standard, for example, defines not one but several levels of "isolation" of read/write operations in association with different levels of consistency (see e.g. [2]).

Possibly the earliest precursor to our notion of sustain is [6], which dealt with the semantics of the transaction directly, as opposed to the underlying read/write operations. Transactions are classified, and restrictions on the interleavings of certain classes of transactions are then imposed. Subsequently there have been many models which extend or generalize the kinds of consistency specification and enforcement. In [15], for example, database consistency is specified by a logical formula and transactions are required to satisfy this formula. (See also the NT/PV model [16], which relaxes this to using formulas as pre- and post-conditions on transactions.)

Our hide construct is closely related to the notion of nested transactions, or sub-transactions [19]. This allows long-lived transactions to be decomposed so that consistency both within a sub-transaction and within its context can be maintained separately. Applications of nested transactions include cooperative transactions [1], ones that combine toward a common final goal.

All of these works go "beyond serializability" by using encapsulation, and by using the semantics of both the database and the transaction to restrict concurrency in order to maintain consistency. The work in this paper serves as an abstraction of these ideas in the form of a calculus, and combines them seamlessly into a concurrent and reactive programming framework. Thus, while calculi such as the π-calculus [18] serve as a basic model of concurrency and synchronization via (channel) communication, our calculus serves as a basic model for reactive databases.
2 The Reactor Language

We consider here a language RD for specifying reactors, which are reactive programs for interaction with a dynamically changing database. For our purposes, a database can simply be any logical theory augmented with a notion of update. Typical realizations of the theory are relational databases, DATALOG programs, logic programs, or a generalization of all these, constraint logic programs.

In what follows, $c$ is called a test and denotes any logical formula of the theory. It may contain free variables, but these must be bound by an encapsulating construct. The intuitive meaning of $c$ is the existential closure $\tilde{\exists} c$. The notation perform $u$ below stands for a predefined update operation on the theory. Tests and updates represent atomic actions in our calculus. In general, variables are introduced by the $\nu x.r$ construct. As is customary, we assume that each $\nu x.r$ introduces a distinctly named variable. We now define RD as follows.
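\[
\begin{array}{rcll}
r & ::= & \text{perform } u & \text{update} \\
  & \mid & r_1 \mathbin{\&} r_2 & \text{interleave} \\
  & \mid & r_1 ; r_2 & \text{sequence} \\
  & \mid & r_1 \mid r_2 & \text{choice} \\
  & \mid & c \Rightarrow r & \text{sustain} \\
  & \mid & [r]_c & \text{hide} \\
  & \mid & \nu x.r & \text{variable}
\end{array}
\]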
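As an illustration of the grammar above, the seven constructs can be written down as a small abstract syntax tree. This is only a sketch: the class names, the use of plain strings for tests and updates, and the `show` pretty-printer are assumptions made here for illustration and are not part of the calculus.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Perform:          # perform u
    update: str


@dataclass(frozen=True)
class Interleave:       # r1 & r2
    left: "Reactor"
    right: "Reactor"


@dataclass(frozen=True)
class Sequence:         # r1 ; r2
    first: "Reactor"
    second: "Reactor"


@dataclass(frozen=True)
class Choice:           # r1 | r2
    left: "Reactor"
    right: "Reactor"


@dataclass(frozen=True)
class Sustain:          # c => r
    test: str
    body: "Reactor"


@dataclass(frozen=True)
class Hide:             # [r]_c
    body: "Reactor"
    test: str


@dataclass(frozen=True)
class New:              # nu x . r
    var: str
    body: "Reactor"


Reactor = Union[Perform, Interleave, Sequence, Choice, Sustain, Hide, New]


def show(r: Reactor) -> str:
    """Render a reactor in an ASCII version of the concrete syntax."""
    if isinstance(r, Perform):
        return f"perform {r.update}"
    if isinstance(r, Interleave):
        return f"({show(r.left)} & {show(r.right)})"
    if isinstance(r, Sequence):
        return f"({show(r.first)} ; {show(r.second)})"
    if isinstance(r, Choice):
        return f"({show(r.left)} | {show(r.right)})"
    if isinstance(r, Sustain):
        return f"{r.test} => {show(r.body)}"
    if isinstance(r, Hide):
        return f"[{show(r.body)}]_{r.test}"
    if isinstance(r, New):
        return f"nu {r.var}. {show(r.body)}"
    raise TypeError(f"not a reactor: {r!r}")
```

For instance, `show(Sustain("x = 1", Perform("x := 0")))` renders a sustained update in this ASCII syntax.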
The base construct perform $u$ represents our primitive update operation on the database. For our purposes here, we may leave the details of updates unspecified. As an example, consider a traditional database, where $u$ can be an SQL command to perform deletion such as DELETE employee WHERE name = 'John', or to perform insertion such as INSERT employee VALUES ('John', 4000), or to perform relative replacement such as REPLACE employee SET salary = salary * 1.1 WHERE name = 'John'. In general, $u$ is any operation which updates the theory representing the current database.

The next two constructs, interleave and sequence, are the usual ones in most process calculi for expressing concurrency; they provide for interleaving semantics and sequencing.

The choice construct provides for non-deterministic choice. Here, we differ slightly from the norm. In CCP, for example, what is used is a "committed choice" of alternatives, each of which is guarded by a logical formula. Here we do not use guards, but instead use a notion of "first to publish" as the event dictating committal. That is, the choice operator in $r_1 \mid r_2$ executes both $r_1$ and $r_2$ concurrently, and chooses whichever of $r_1$ or $r_2$ first makes a "visible" update to the database. This notion of publication is used by the "hide" construct described below. In short, our choice operator allows the execution of its two constituents $r_1$ and $r_2$ until one publishes and makes an irrevocable commitment, at which time it aborts the other one.

The purpose of the sustain construct is twofold: to provide a notion of consistency, in analogy to integrity constraints in traditional databases, and to provide the (only) mechanism for synchronization. The intuitive semantics of $c \Rightarrow r$ is that during the execution of the reactor $r$, the formula $c$ is true just before each atomic step (i.e. test or update) of $r$ is executed. In between these primitive steps of $r$, $c$ is not required to hold.
Thus, if $c$ becomes false in the midst of executing $r$, the remaining steps of $r$ are suspended until $c$ becomes true again. Meanwhile, other reactors may execute. This semantics represents our basic model, and is analogous to local computation in a distributed setting where each local computation does not control any competing computation. Other more restrictive models for sustain are possible, but we do not discuss them here for lack of space.

The purpose of the hide construct $[r]_c$ is to shelter the effects of executing $r$ unless $c$ is true. More specifically, just before each atomic step (i.e. test or update) of $r$ is executed, the following is performed: if $c$ is false, then the update of the step (if any) is temporarily hidden away from the encapsulating context; if however $c$ is true, then all hidden updates are made available to the encapsulating context (in the appropriate order). At termination of $r$, all remaining hidden updates are made available to the encapsulating context. Where the condition $c$ is always false, meaning that all updates are hidden, we simply omit $c$ and write $[r]$.

Hiding with $[r]_c$ not only isolates the effects of $r$ from the enclosing environment (which may be another hide) but also partially isolates $r$ from changes made in the enclosing environment. Updates in $r$ take priority over those in the environment. Hiding is transparent for those external updates which are not related to updates made in $r$. Thus, unlike nested transactions, the encapsulation provided by hide is partial and is under user control.

In general, the test $c$ can toggle between true and false several times during execution of $r$, and this may be due to $r$ itself or to other reactors. Intuitively, $c$ should represent the condition under which it makes sense for $r$ to share its effects externally, and this could be controlled by a combination of $r$ itself as well as other reactors. Thus the hide construct allows a reactor to accumulate actions which do not make sense individually into groups that do make sense. It then allows the publication of the group effect at various appropriate times. In other words, the hide construct, in conjunction with a judicious design of the database, provides a declarative means for breaking a long transaction $r$ into several consistent sub-transactions. The benefits here are analogous to those of nested transactions in traditional databases.

The last construct $\nu x.r$ is used to introduce a variable $x$ whose scope is restricted to $r$. Its purpose is the obvious one of passing values throughout reactor expressions. We elaborate below.

We finally point out that we omit, in accordance with the desire for a minimal calculus, any form of iteration such as looping or replication constructs (such as the replicate construct of the π-calculus). Reactors are supposed to be embedded in user programs equipped with some mechanism for extracting information from the current database. Thus new reactors may be added at any time to the set of currently executing reactors (cf. the birth rule in the next section). If the database were a PROLOG program $P$ and $c$ a PROLOG goal, for example, then a reasonable mechanism may be a function which returns a (possibly empty) list of all the substitutions given by the successful derivations of running $c$ on $P$.
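The extraction mechanism mentioned at the end of this section, a function returning all substitutions for a goal, can be sketched for the simplest case. The sketch below assumes a database of ground facts only (no rules, so it is far weaker than a PROLOG program), a single atomic goal, and a hypothetical convention that strings starting with `?` are variables; the relation used in the usage note is made up for illustration.

```python
def solutions(facts, goal):
    """Return every substitution (variable -> value) under which `goal`
    matches one of the ground `facts`; the list may be empty."""
    name, *args = goal
    out = []
    for fact in facts:
        if fact[0] != name or len(fact) != len(goal):
            continue
        subst = {}
        for pattern, value in zip(args, fact[1:]):
            if isinstance(pattern, str) and pattern.startswith("?"):
                # A variable: bind it, or check it against an earlier binding.
                if subst.setdefault(pattern, value) != value:
                    break
            elif pattern != value:
                # A constant that does not match this fact.
                break
        else:
            out.append(subst)
    return out
```

With facts such as `("seats", "singapore", "london", 3)`, the goal `("seats", "singapore", "?to", "?n")` yields one substitution per matching fact, and a goal with no matching fact yields the empty list. An embedding program could use each returned substitution to instantiate a new reactor.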
2.1 Examples of Reactors

Consider different underlying databases. The most basic one is plain shared memory, where the operations are tests and assignments. We will also use a relational database together with an SQL-like syntax; where SQL SELECT queries are used, the test is that the query returns a positive answer. Finally, we also use DATALOG as the database, augmented with add and delete updates.

Consider the choice reactor $(x = 0 \mid x = 1)$: one of the reactors attempting to publish the assignment will succeed. If the internal choices are hidden, as in the reactor $([x = 0; \ldots] \mid [x = 1; \ldots])$, then the first sub-reactor which terminates will succeed while the other will be cancelled.

Sustain provides for synchronization. For example, a binary semaphore can be expressed with the following protocol: $x = 1 \Rightarrow x = 0$ for locking, and $x = 1$ for unlocking. More generally, a complex reactor may be sustained. For example, in $\exists\, p(a, \ldots) \Rightarrow (\ldots ; \nu x.\, p(a, x) \wedge q(x) \Rightarrow \text{delete } q(x))$, the sustain condition ensures that there is a tuple satisfying $p(a, \ldots)$; the other fields of this tuple may change during the execution of the reactor without affecting the sustain condition.

We saw that hide can be used to achieve delayed choice. This example shows nesting of updates with hide, [[...]x>100 & [...]x 0 $\Rightarrow$ book singapore, ny) $\mid$ $(\text{flt}(\text{singapore}, \text{london}, d, n_2) \wedge n_2 > 0 \wedge \text{flt}(\text{london}, \text{ny}, d, n_3) \wedge n_3 > 0 \Rightarrow [\text{book singapore, london};\ \text{book london, ny}])$

This may not be sufficiently opportunistic, since if we wait too long the sustain condition to get Singapore-London-NY may not eventuate. Another strategy is to try to book any of the flight legs as soon as one is available, giving (we also use {} for bracketing below):

\[
\begin{array}{l}
\{\ \neg \text{fltdouble} \wedge \text{flt}(\text{singapore}, \text{ny}, d, n_1) \wedge n_1 > 0 \Rightarrow (\text{add fltsingle};\ \text{cancelflt};\ \text{book singapore, ny}) \\
\quad \mid\ \text{fltdouble} \Rightarrow \text{true}\ \} \\
\& \\
\{\ \text{fltsingle} \Rightarrow \text{true} \\
\quad \mid\ \neg \text{fltsingle} \Rightarrow ( \\
\qquad \{\ \text{flt}(\text{singapore}, \text{london}, d, n_2) \wedge n_2 > 0 \Rightarrow \text{book singapore, london}; \\
\qquad\ \ \text{flt}(\text{london}, \text{ny}, d, n_3) \wedge n_3 > 0 \Rightarrow (\text{add fltdouble};\ \text{book london, ny})\ \} \\
\qquad \mid \\
\qquad \{\ \text{flt}(\text{london}, \text{ny}, d, n_4) \wedge n_4 > 0 \Rightarrow \text{book london, ny}; \\
\qquad\ \ \text{flt}(\text{singapore}, \text{london}, d, n_5) \wedge n_5 > 0 \Rightarrow (\text{add fltdouble};\ \text{book singapore, london})\ \} \\
\quad )\ \}
\end{array}
\]
This strategy tries to book the single-segment flight Singapore-NY concurrently with the two-segment flight Singapore-London-NY. If Singapore-NY is successful, then any leg booked on either Singapore-London or London-NY has to be cancelled with cancelflt, which does nothing if no flights were booked. This is an example of a compensating transaction, which also occurs in other transaction models for long-running transactions (e.g. [7]). If Singapore-London-NY is successful, then Singapore-NY is aborted with fltdouble (and vice versa with fltsingle). This example also illustrates speculation and aborting other reactors with choice. The one-segment flight is run concurrently, since if choice were used then the first booking on the two-segment flight would commit the choice, thus cancelling the one-segment reactor.

The next example illustrates the consistency provided by the sustain construct. Consider the following query, where we have the relation stock(part, quantity, shop) which represents an inventory database. Say that we have a long-running transaction which does inventory maintenance: if the stock in any of the shops is too high, the excess is moved to some warehouse. Since the transaction is long running, we want to be able to still update the individual shop inventory while doing inventory maintenance. The example partial reactor is as follows:

(SELECT quantity FROM stock WHERE balance >= 100) $\Rightarrow$ ($\ldots$; REPLACE stock SET quantity = 100 WHERE balance > 100; $\ldots$)
Note here that the resetting of the stock may involve different shops and quantities from when the initial sustain condition was entered. The sustain condition just ensures that there is still possibly some stock to be moved during the execution of the reactor.

Consider a distributed database situation where we have two local databases $D_1$ and $D_2$ which are the authoritative databases. In addition, at headquarters we try to mirror these two databases with a database $D_{12}$ which need not be completely up to date. Our example will model all three databases as a single logical database. We want to perform a series of long updates $A$ and $B$ independently and concurrently on $D_1$ and $D_2$, where condition $c_i$ is sustained. Rather than wait for all the updates to complete, we will checkpoint midway after any update $A$ so that $D_{12}$ can be more up to date. The reactor is as follows:

\[
\begin{array}{l}
c_1 \Rightarrow [ok_1 = 0;\ \text{update } A_{D_1};\ ok_1 = 1;\ ok_1 = 0;\ \text{update } A_{D_1}]_{ok_1}\ \ \& \\
c_2 \Rightarrow [ok_2 = 0;\ \text{update } A_{D_2};\ ok_2 = 2;\ ok_2 = 0;\ \text{update } A_{D_2}]_{ok_2}
\end{array}
\]

Our final example contrasts our approach of sustain and hide with classical database serializability. Consider two independent reactors:

\[
\begin{array}{l}
R_1 : [(\neg p(2) \Rightarrow \text{add } p(1); \ldots) \mid p(2) \Rightarrow \text{true}] \\
R_2 : [(\neg p(1) \Rightarrow \text{add } p(2); \ldots) \mid p(1) \Rightarrow \text{true}]
\end{array}
\]

If these reactors are run as $R_1 ; R_2$, we get $p(1)$ in the database (conversely, $R_2 ; R_1$ gives $p(2)$). Running the reactors sequentially is analogous to the serializability condition in our calculus, which gives exactly one of $p(1)$ or $p(2)$. If we run these reactors concurrently, it is also possible to get both $p(1)$ and $p(2)$. Removing hiding would give back one of $p(1)$ or $p(2)$.
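The binary semaphore protocol from the beginning of this section ($x = 1 \Rightarrow x = 0$ to lock, $x = 1$ to unlock) can be simulated with a few lines of code. The sketch below is an assumption-laden toy: the database is a Python dict, each agent is a fixed list of (test, update) atomic steps, and a round-robin scheduler stands in for arbitrary interleaving. The `owner` field and the agent names are hypothetical and only make the trace readable.

```python
def run(agents, state, rounds):
    """Round-robin over agents. Each agent is a list of (test, update)
    atomic steps; a step runs only if its sustain test holds in the
    current state, otherwise the agent stays suspended this round."""
    pcs = [0] * len(agents)          # index of the next atomic step per agent
    trace = []
    for _ in range(rounds):
        for k, steps in enumerate(agents):
            if pcs[k] == len(steps):
                continue             # agent k has terminated
            test, update = steps[pcs[k]]
            if test(state):          # test holds just before the step
                state = update(state)
                pcs[k] += 1
                trace.append(k)
    return state, trace


def unlocked(s):
    return s["x"] == 1


def always(s):
    return True


def agent(name):
    return [
        (unlocked, lambda s: {**s, "x": 0, "owner": name}),  # x = 1 => x := 0
        (always, lambda s: {**s, "x": 1, "owner": None}),    # x := 1
    ]


final_state, trace = run([agent("a"), agent("b")], {"x": 1, "owner": None}, rounds=3)
```

In this run the second agent's lock step is suspended while the first agent holds the semaphore, and it resumes once `x` is set back to 1.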
3 The Basic Model

In this section we present the core of the operational model of RD, sufficient for formal purposes. In the following two sections, we specialize the basic model to deal with two detailed concepts which underlie any implementation: the trigger mechanism and the way variables are handled.

A reactor system is composed of a state $\Delta$ and a finite bag of reactors $R$. In this section, it suffices to think of $\Delta$ as a logical theory representing the current database. The database should support a notion of logical entailment on the test formulas $c$ appearing in the sustain and hide constructs. The dynamic behavior of RD is defined by the system transition relation $R, \Delta \longrightarrow \Delta', R'$. These system transitions are, in turn, defined in terms of the reactor transition relation $r, \Delta \xrightarrow{\alpha} \Delta', r'$ and the reactor termination relation $r, \Delta \xrightarrow{\alpha} \Delta'$, where $r, r'$ are individual reactors. Note that the syntax of reactor transitions is meant to highlight the change of $\Delta$ to $\Delta'$ and of $r$ to $r'$.

We now define the system transition relation $R, \Delta \longrightarrow \Delta', R'$. The notation $X + x$ means the bag formed by adding $x$ to the bag $X$; similarly $X - x$ means removal. The following rules allow arbitrary birth of reactors, since they are embedded within agents which may interact with the database at any time.

*** birth ***
\[
R, \Delta \longrightarrow \Delta, R + r
\]

*** progress ***
\[
\frac{r, \Delta \xrightarrow{\alpha} \Delta', r'}{R + r, \Delta \longrightarrow \Delta', R + r'}
\]

*** death ***
\[
\frac{r, \Delta \xrightarrow{\alpha} \Delta'}{R + r, \Delta \longrightarrow \Delta', R}
\]
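The three system rules can be sketched over a deliberately restricted reactor form. The assumptions here are loud ones: a reactor is just a non-empty tuple of update functions (a sequence of performs, with no sustain, hide or choice), the bag is a Python list, and the function names are hypothetical.

```python
def birth(R, r):
    """birth: any reactor may join the bag; the state is untouched."""
    return R + [r]


def progress_or_death(R, state, i):
    """Let the i-th reactor take one atomic step.

    progress: the reactor has steps left and is put back into the bag.
    death:    the reactor has terminated and is removed from the bag.
    """
    head, *rest = R[i]
    others = R[:i] + R[i + 1:]
    new_state = head(state)
    if rest:
        return others + [tuple(rest)], new_state   # progress
    return others, new_state                        # death
```

Because the bag is unordered, returning the progressed reactor at the end of the list is harmless; a scheduler is free to pick any index `i` at each step.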
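Next we define the reactor transition relations $r, \Delta \xrightarrow{\alpha} \Delta', r'$ and $r, \Delta \xrightarrow{\alpha} \Delta'$. These rules are annotated by an "update" $\alpha$, which can be either an update function $u$ or a special "hidden" transition $\varepsilon$ arising from hidden reactors. Note that an update transition $u$ need not make $\Delta'$ different from $\Delta$. Finally, the notation $D(\Delta)$ denotes the state obtained by applying the updates in $D$ to $\Delta$, in order.

*** interleave ***
\[
\frac{r_1, \Delta \xrightarrow{\alpha} \Delta', r_1'}{r_1 \mathbin{\&} r_2, \Delta \xrightarrow{\alpha} \Delta', r_1' \mathbin{\&} r_2}
\qquad
\frac{r_2, \Delta \xrightarrow{\alpha} \Delta', r_2'}{r_1 \mathbin{\&} r_2, \Delta \xrightarrow{\alpha} \Delta', r_1 \mathbin{\&} r_2'}
\]
\[
\frac{r_1, \Delta \xrightarrow{\alpha} \Delta'}{r_1 \mathbin{\&} r_2, \Delta \xrightarrow{\alpha} \Delta', r_2}
\qquad
\frac{r_2, \Delta \xrightarrow{\alpha} \Delta'}{r_1 \mathbin{\&} r_2, \Delta \xrightarrow{\alpha} \Delta', r_1}
\]

*** choice ***
\[
\frac{r_1, \Delta \xrightarrow{\varepsilon} \Delta', r_1'}{r_1 \mid r_2, \Delta \xrightarrow{\varepsilon} \Delta', r_1' \mid r_2}
\qquad
\frac{r_1, \Delta \xrightarrow{u} \Delta', r_1'}{r_1 \mid r_2, \Delta \xrightarrow{u} \Delta', r_1'}
\qquad
\frac{r_1, \Delta \xrightarrow{u} \Delta'}{r_1 \mid r_2, \Delta \xrightarrow{u} \Delta'}
\]
\[
\frac{r_2, \Delta \xrightarrow{\varepsilon} \Delta', r_2'}{r_1 \mid r_2, \Delta \xrightarrow{\varepsilon} \Delta', r_1 \mid r_2'}
\qquad
\frac{r_2, \Delta \xrightarrow{u} \Delta', r_2'}{r_1 \mid r_2, \Delta \xrightarrow{u} \Delta', r_2'}
\qquad
\frac{r_2, \Delta \xrightarrow{u} \Delta'}{r_1 \mid r_2, \Delta \xrightarrow{u} \Delta'}
\]

*** sequencing ***
\[
\frac{r_1, \Delta \xrightarrow{\alpha} \Delta', r_1'}{r_1 ; r_2, \Delta \xrightarrow{\alpha} \Delta', r_1' ; r_2}
\qquad
\frac{r_1, \Delta \xrightarrow{\alpha} \Delta'}{r_1 ; r_2, \Delta \xrightarrow{\alpha} \Delta', r_2}
\]

*** sustain ***
\[
\frac{\Delta \models c \qquad r, \Delta \xrightarrow{\alpha} \Delta', r'}{c \Rightarrow r, \Delta \xrightarrow{\alpha} \Delta', c \Rightarrow r'}
\qquad
\frac{\Delta \models c \qquad r, \Delta \xrightarrow{\alpha} \Delta'}{c \Rightarrow r, \Delta \xrightarrow{\alpha} \Delta'}
\]

*** hide ***
\[
\frac{r, D(\Delta) \xrightarrow{u} \Delta', r' \qquad \Delta' \models c}{[r]_{c,D}, \Delta \xrightarrow{u \cdot D} \Delta', [r']_c}
\qquad
\frac{r, D(\Delta) \xrightarrow{u} \Delta', r' \qquad \Delta' \not\models c}{[r]_{c,D}, \Delta \xrightarrow{\varepsilon} \Delta, [r']_{c, u \cdot D}}
\]
\[
\frac{r, D(\Delta) \xrightarrow{\varepsilon} \Delta', r'}{[r]_{c,D}, \Delta \xrightarrow{\varepsilon} \Delta, [r']_{c,D}}
\qquad
\frac{r, D(\Delta) \xrightarrow{u} \Delta'}{[r]_{c,D}, \Delta \xrightarrow{u \cdot D} \Delta'}
\]

*** bound variables ***
\[
\frac{v \text{ is any constant}}{\nu x.r, \Delta \xrightarrow{\varepsilon} \Delta, r[v/x]}
\]

*** perform ***
\[
\text{perform } u, \Delta \xrightarrow{u} u(\Delta)
\]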
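The rules above can be exercised with a small interpreter. The sketch below rests on several assumptions that are not in the calculus itself: reactors are nested tuples, states are Python dicts, tests are predicates on the state, updates are pure functions from state to state, and labels are tuples of updates with the empty tuple playing the role of the hidden annotation. The hide buffer is kept oldest-first, which is an implementation choice. Bound variables are left out to keep the sketch short.

```python
EPS = ()  # hidden label; visible labels are non-empty tuples of updates


def perform(u): return ("perform", u)
def seq(a, b): return ("seq", a, b)
def par(a, b): return ("par", a, b)
def choice(a, b): return ("choice", a, b)
def sustain(c, r): return ("sustain", c, r)
def hide(r, c=lambda s: False, D=()): return ("hide", r, c, D)


def replay(D, s):
    """D(state): apply buffered updates oldest first (updates are pure)."""
    for u in D:
        s = u(s)
    return s


def steps(r, s):
    """All transitions of r in state s as (label, new_state, residual);
    residual None means the reactor terminated with this step."""
    tag = r[0]
    if tag == "perform":
        u = r[1]
        return [((u,), u(s), None)]
    if tag == "seq":
        _, a, b = r
        return [(l, s2, b if a2 is None else seq(a2, b)) for l, s2, a2 in steps(a, s)]
    if tag == "par":
        _, a, b = r
        left = [(l, s2, b if a2 is None else par(a2, b)) for l, s2, a2 in steps(a, s)]
        right = [(l, s2, a if b2 is None else par(a, b2)) for l, s2, b2 in steps(b, s)]
        return left + right
    if tag == "choice":
        _, a, b = r
        # A hidden step keeps the choice open; a visible update commits it.
        left = [(l, s2, choice(a2, b) if l == EPS else a2) for l, s2, a2 in steps(a, s)]
        right = [(l, s2, choice(a, b2) if l == EPS else b2) for l, s2, b2 in steps(b, s)]
        return left + right
    if tag == "sustain":
        _, c, body = r
        if not c(s):
            return []  # suspended until c is entailed again
        return [(l, s2, None if b2 is None else sustain(c, b2)) for l, s2, b2 in steps(body, s)]
    if tag == "hide":
        _, body, c, D = r
        out = []
        for l, s2, b2 in steps(body, replay(D, s)):
            if b2 is None:
                out.append((D + l, s2, None))             # termination publishes
            elif l == EPS:
                out.append((EPS, s, hide(b2, c, D)))      # inner hidden step
            elif c(s2):
                out.append((D + l, s2, hide(b2, c)))      # publication
            else:
                out.append((EPS, s, hide(b2, c, D + l)))  # hiding
        return out
    raise ValueError(tag)


def run(r, s, pick=lambda ts: ts[0]):
    """Drive a single reactor until it terminates or is suspended."""
    while r is not None:
        ts = steps(r, s)
        if not ts:
            break
        _, s, r = pick(ts)
    return s


set_x0 = lambda s: {**s, "x": 0}
set_x1 = lambda s: {**s, "x": 1}
set_y1 = lambda s: {**s, "y": 1}
demo = choice(hide(seq(perform(set_x0), perform(set_y1))), hide(perform(set_x1)))
```

The `demo` reactor is a delayed choice between two fully hidden alternatives. Its first left-hand step is hidden and leaves the choice open, while termination of either alternative publishes its buffered updates and commits the choice. The `pick` argument stands in for the nondeterminism of the calculus.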
The hide rules above are rather subtle, and require some explanation. The hide construct $[r]_c$ is generalized in the rules as $[r]_{c,D}$. The purpose of $D$ is to associate with $[r]_c$ a buffer list of actions which have taken place but need to be hidden from the enclosing environment. Consider the first hide rule, which represents publication. After the updates $u \cdot D$ have been performed by $r$ and $c$ is now true, the new state of the system is given by the new $\Delta'$. The second rule concerns hiding. There, since $c$ is false, the hidden transition leaves the state unchanged at $\Delta$. The annotation $u \cdot D$ in $[r']_{c, u \cdot D}$ logs the fact that these updates need to be replayed the next time $r$ executes another update.

The hidden transitions interact with the choice rules: with an $\varepsilon$ transition, the reactor $r$ in a choice evolves to $r'$, keeping the choice uncommitted. Only when there is an update $u$ transition is an alternative committed. We remark that early choice can be expressed. Consider the reactor $r_1 \mid r_2$; early commit can be simulated by adding a dummy update, e.g. $(u_1 ; r_1) \mid (u_2 ; r_2)$.

The last rule, on bound variables, shows that we dispense with variables by simply instantiating them away with constants. While this is logically correct, it is certainly insufficient to explain a reasonable computational mechanism for dealing with variables and how they get instantiated. This situation is thus analogous to that in logic programs, where semantics can be studied on the ground instances of the programs, but the computational machinery will require more (the unification process, in fact). We choose to deal with variables in a separate section for two reasons. First, we require more technical machinery than is available above; second, the issues do not directly affect the basic model presented here, which is best understood in a minimal setting.

Finally, the sustain rule requires some discussion. The use of the sustain condition $c$ here is like a scheduling condition on execution. The reactor within a sustain can only make transitions as long as $c$ is entailed by the current $\Delta$. Whenever the sustain condition is false, the reactor is suspended. It is in this sense that this notion of sustain imposes only minimal conditions on the system. We do not require that sustained reactors constantly check the database, but rather that whenever they execute, the sustain condition holds. Reactors can cooperate in enabling or suspending a sustained reactor by changing the entailment of $c$.

We now turn to some properties of the basic model.

Proposition 3.1 Hidden moves do not change the database state.
If $r, \Delta \xrightarrow{\varepsilon} \Delta', r'$, then $\Delta = \Delta'$.
If $r, \Delta \xrightarrow{\alpha} \Delta'$, then $\alpha \neq \varepsilon$.

Proof sketch. By structural induction on the rules. □

Let us write $r >_0 r'$ if there are $\Delta$, $\Delta'$, and $\alpha$ such that $r, \Delta \xrightarrow{\alpha} \Delta', r'$. Let $\geq_0$ be the reflexive transitive closure of $>_0$. Then

Proposition 3.2 $\geq_0$ is a well-founded partial ordering.

Proof sketch. It suffices to show that $\geq_0$ is anti-symmetric. We define the size of a reactor to be the number of symbols in that reactor, except for subexpressions of the form $[r]_{c,D}$, whose size is defined to be the size of $r$. Then anti-symmetry follows from the fact that a transition strictly decreases the size of reactors. □

We finish here with some "laws" that hold on reactors in this basic model.
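Let us write $r \equiv r'$ to mean that for any sequence $r, \Delta_1 \xrightarrow{\alpha_1} \Delta_1', r_1$; $r_1, \Delta_2 \xrightarrow{\alpha_2} \Delta_2', r_2$; ...; there are $r_1'$, ..., $r_{n+1}'$ so that $r', \Delta_1 \xrightarrow{\alpha_1} \Delta_1', r_1'$; $r_1', \Delta_2 \xrightarrow{\alpha_2} \Delta_2', r_2'$; ...; and vice versa. Then

Proposition 3.3 The sustain reactor is distributive:
\[
\begin{array}{rcl}
c \Rightarrow (r_1 ; r_2) & \equiv & (c \Rightarrow r_1) ; (c \Rightarrow r_2) \\
c \Rightarrow (r_1 \mathbin{\&} r_2) & \equiv & (c \Rightarrow r_1) \mathbin{\&} (c \Rightarrow r_2) \\
c \Rightarrow (r_1 \mid r_2) & \equiv & (c \Rightarrow r_1) \mid (c \Rightarrow r_2)
\end{array}
\]

Proof. We prove only the first item, by induction on reactors using the well-founded partial ordering $\geq_0$. The other two items are similar. Consider the sequence $r, \Delta_1 \xrightarrow{\alpha_1} \Delta_1', r_1$; $r_1, \Delta_2 \xrightarrow{\alpha_2} \Delta_2', r_2$; ...; and let $r$ be $c \Rightarrow (r_a ; r_b)$ and $r'$ be $(c \Rightarrow r_a) ; (c \Rightarrow r_b)$. We need to show that there are $r_1'$, $r_2'$, ... such that $r_1', \Delta_2 \xrightarrow{\alpha_2} \Delta_2', r_2'$; ....

The base case is that $c \Rightarrow (r_a ; r_b), \Delta_1 \xrightarrow{\alpha_1} \Delta_1', c \Rightarrow r_b$ is the first step in the sequence. Then we must have $\Delta_1 \models c$ and $r_a, \Delta_1 \xrightarrow{\alpha_1} \Delta_1'$. Then we must have $c \Rightarrow r_a, \Delta_1 \xrightarrow{\alpha_1} \Delta_1'$. Therefore, $(c \Rightarrow r_a) ; (c \Rightarrow r_b), \Delta_1 \xrightarrow{\alpha_1} \Delta_1', c \Rightarrow r_b$. Thus we have $r_1 = r_1' = c \Rightarrow r_b$. Then $c \Rightarrow (r_a ; r_b) \equiv (c \Rightarrow r_a) ; (c \Rightarrow r_b)$ as desired.

The induction case is that $c \Rightarrow (r_a ; r_b), \Delta_1 \xrightarrow{\alpha_1} \Delta_1', c \Rightarrow (r_a' ; r_b)$ is the first step in the sequence. Since $c \Rightarrow (r_a ; r_b) \geq_0 c \Rightarrow (r_a' ; r_b)$, it follows by hypothesis that $c \Rightarrow (r_a' ; r_b) \equiv (c \Rightarrow r_a') ; (c \Rightarrow r_b)$. That is, we have $r_1 = c \Rightarrow (r_a' ; r_b) \equiv (c \Rightarrow r_a') ; (c \Rightarrow r_b) = r_1'$. Then $c \Rightarrow (r_1 ; r_2) \equiv (c \Rightarrow r_1) ; (c \Rightarrow r_2)$ as desired. □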
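The size measure used in the proof sketch of Proposition 3.2 can be made concrete. The sketch below assumes a tuple encoding of reactors and one particular way of counting symbols (one per keyword, operator, test or update); the paper does not fix these details, so the exact numbers are illustrative only. As in the proof sketch, a hide counts only the size of its body, so growth of the buffer $D$ does not affect the measure.

```python
def size(r):
    """Size of a reactor given as a nested tuple (hypothetical encoding)."""
    tag = r[0]
    if tag == "perform":
        return 2                              # the keyword and the update
    if tag in ("seq", "par", "choice"):
        return 1 + size(r[1]) + size(r[2])    # operator plus both operands
    if tag == "sustain":
        return 2 + size(r[2])                 # arrow, test, and body
    if tag == "nu":
        return 2 + size(r[2])                 # binder, variable, and body
    if tag == "hide":
        return size(r[1])                     # test c and buffer D are ignored
    raise ValueError(tag)


a = ("perform", "u1")
b = ("perform", "u2")
before = ("seq", a, b)   # a ; b
after = b                # residual once a has terminated
```

Here `size(before)` is 5 and `size(after)` is 2, matching the claim that a transition strictly decreases the size of the residual reactor.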
4 The Trigger and Log Model

The purpose of this section is to refine the basic model of RD with an abstraction of the pending actions required on a blocked reactor, and of actions performed but still needing to be hidden. This refinement is designed to be at the minimal level of detail needed to design an implementation, as well as to refine the semantics even further (as we do in the next section dealing with variables).

We now refine the state $\Delta$ of the system as consisting of three components. Firstly, the theory representing the database state is the component $\text{theory}(\Delta)$. The next component is a trigger table, denoted by $\text{sustaining}(\Delta)$, which explicitly records each pending condition and action in the form $i : c_i \Rightarrow r_i$, where $i$ is a handle. The intention of the trigger table is that an implementation can use some form of indexing to efficiently identify those conditions $c_i$ which may be affected, and thus efficiently identify which suspended reactors $r_i$ to resume. Finally, the state of the system also contains a log table $\text{hiding}(\Delta)$ which explicitly records hidden updates. More specifically, each hidden update is represented in the form $i : [r_i]_{c_i, D_i}$, where $i$ is a handle. Execution of reactors which are hiding will need to replay these updates.

Lastly, we introduce a family of new constants $\text{sustain}_i$ and $\text{hiding}_i$, where $i$ ranges over the abovementioned handles. We now refine the rules for the sustain and hide constructs as follows:

*** sustain ***
\[
\frac{\begin{array}{c}
\Delta \models c \qquad i \text{ fresh} \\
\text{theory}(\Delta') = \text{theory}(\Delta) \qquad \text{hiding}(\Delta') = \text{hiding}(\Delta) \\
\text{sustaining}(\Delta') = \text{sustaining}(\Delta) \cup \{[i : c \Rightarrow r]\}
\end{array}}{c \Rightarrow r, \Delta \xrightarrow{\tau} \Delta', \text{sustain}_i}
\]
\[
\frac{\begin{array}{c}
\Delta \models c \qquad [i : c \Rightarrow r] \in \text{sustaining}(\Delta) \qquad r, \Delta \xrightarrow{\alpha} \Delta'', r' \\
\text{theory}(\Delta') = \text{theory}(\Delta'') \qquad \text{hiding}(\Delta') = \text{hiding}(\Delta'') \\
\text{sustaining}(\Delta') = \text{sustaining}(\Delta'') - \{[i : c \Rightarrow r]\} \cup \{[i : c \Rightarrow r']\}
\end{array}}{\text{sustain}_i, \Delta \xrightarrow{\alpha} \Delta', \text{sustain}_i}
\]
\[
\frac{\begin{array}{c}
\Delta \models c \qquad [i : c \Rightarrow r] \in \text{sustaining}(\Delta) \qquad r, \Delta \xrightarrow{\alpha} \Delta'' \\
\text{theory}(\Delta') = \text{theory}(\Delta'') \qquad \text{hiding}(\Delta') = \text{hiding}(\Delta'') \\
\text{sustaining}(\Delta') = \text{sustaining}(\Delta'') - \{[i : c \Rightarrow r]\}
\end{array}}{\text{sustain}_i, \Delta \xrightarrow{\alpha} \Delta'}
\]
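*** hide ***
\[
\frac{\begin{array}{c}
i \text{ fresh} \\
\text{theory}(\Delta') = \text{theory}(\Delta) \qquad \text{sustaining}(\Delta') = \text{sustaining}(\Delta) \\
\text{hiding}(\Delta') = \text{hiding}(\Delta) \cup \{[i : [r]_c]\}
\end{array}}{[r]_c, \Delta \xrightarrow{\tau} \Delta', \text{hiding}_i}
\]
\[
\frac{\begin{array}{c}
[i : [r]_{c,D}] \in \text{hiding}(\Delta) \qquad r, D(\Delta) \xrightarrow{u} \Delta'', r' \qquad \Delta'' \models c \\
\text{theory}(\Delta') = \text{theory}(\Delta'') \qquad \text{sustaining}(\Delta') = \text{sustaining}(\Delta'') \\
\text{hiding}(\Delta') = \text{hiding}(\Delta'') - \{[i : [r]_{c,D}]\} \cup \{[i : [r']_c]\}
\end{array}}{\text{hiding}_i, \Delta \xrightarrow{u \cdot D} \Delta', \text{hiding}_i}
\]
\[
\frac{\begin{array}{c}
[i : [r]_{c,D}] \in \text{hiding}(\Delta) \qquad r, D(\Delta) \xrightarrow{u} \Delta'', r' \qquad \Delta'' \not\models c \\
\text{theory}(\Delta') = \text{theory}(\Delta) \qquad \text{sustaining}(\Delta') = \text{sustaining}(\Delta'') \\
\text{hiding}(\Delta') = \text{hiding}(\Delta'') - \{[i : [r]_{c,D}]\} \cup \{[i : [r']_{c, u \cdot D}]\}
\end{array}}{\text{hiding}_i, \Delta \xrightarrow{\varepsilon} \Delta', \text{hiding}_i}
\]
\[
\frac{\begin{array}{c}
[i : [r]_{c,D}] \in \text{hiding}(\Delta) \qquad r, D(\Delta) \xrightarrow{\varepsilon} \Delta'', r' \\
\text{theory}(\Delta') = \text{theory}(\Delta) \qquad \text{sustaining}(\Delta') = \text{sustaining}(\Delta'') \\
\text{hiding}(\Delta') = \text{hiding}(\Delta'') - \{[i : [r]_{c,D}]\} \cup \{[i : [r']_{c,D}]\}
\end{array}}{\text{hiding}_i, \Delta \xrightarrow{\varepsilon} \Delta', \text{hiding}_i}
\]
\[
\frac{\begin{array}{c}
[i : [r]_{c,D}] \in \text{hiding}(\Delta) \qquad r, D(\Delta) \xrightarrow{\tau} \Delta'', r' \\
\text{theory}(\Delta') = \text{theory}(\Delta) \qquad \text{sustaining}(\Delta') = \text{sustaining}(\Delta'') \\
\text{hiding}(\Delta') = \text{hiding}(\Delta'') - \{[i : [r]_{c,D}]\} \cup \{[i : [r']_{c,D}]\}
\end{array}}{\text{hiding}_i, \Delta \xrightarrow{\tau} \Delta', \text{hiding}_i}
\]
\[
\frac{\begin{array}{c}
[i : [r]_{c,D}] \in \text{hiding}(\Delta) \qquad r, D(\Delta) \xrightarrow{u} \Delta'' \\
\text{theory}(\Delta') = \text{theory}(\Delta'') \qquad \text{sustaining}(\Delta') = \text{sustaining}(\Delta'') \\
\text{hiding}(\Delta') = \text{hiding}(\Delta'') - \{[i : [r]_{c,D}]\}
\end{array}}{\text{hiding}_i, \Delta \xrightarrow{u \cdot D} \Delta'}
\]

In addition to the above rules, it may be useful to consider a rule for garbage collecting the trigger and log tables at the level of the reactor system. Given a reactor system $R$ and a state $\Delta$, let $R^{\Delta}$ denote the reactor system obtained by replacing each $r$ in $R$ by $r^{\Delta}$, where $r^{\Delta}$ is obtained by recursively carrying out the following replacements: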
if [i : ci ) ri] 2 sustaining(), then replace each occurrence of sustaini in r by ci ) ri; if [i : [ri]c ;D ] 2 hiding(), then replace each occurrence of hidingi in r by [ri]c ;D . i
i
i
i
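Then the rules to garbage collect the trigger and log tables are

*** clean up ***

\[
\frac{\begin{array}{c}
\mathit{sustaining}(\Delta') = \{[i : c_i \Rightarrow r_i] \in \mathit{sustaining}(\Delta'') \mid \mathit{sustain}_i \text{ appears in } R\Delta''\} \\
\mathit{hiding}(\Delta') = \{[i : [r_i]_{c_i;D_i}] \in \mathit{hiding}(\Delta'') \mid \mathit{hiding}_i \text{ appears in } R\Delta''\}
\end{array}}{R,\ \Delta \xrightarrow{\tau} \Delta',\ R}
\]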
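One way to read the clean-up step operationally is as a reachability computation: an entry survives if its handle occurs in the reactor system, or in the body of another surviving entry. The Python sketch below follows that reading, which is an interpretation chosen for this illustration rather than a definition taken from the rules. Reactors are encoded as nested tuples, and the tags (`"sustain_ref"`, `"hiding_ref"`, `"seq"`, and so on) and the function names are hypothetical.

```python
BINARY = ("seq", "par", "choice")    # assumed tags for r1 ; r2, r1 & r2, r1 | r2


def handles_in(r):
    """Yield every placeholder (tag, i) occurring in the reactor r."""
    tag = r[0]
    if tag in ("sustain_ref", "hiding_ref"):
        yield (tag, r[1])
    elif tag in BINARY:
        yield from handles_in(r[1])
        yield from handles_in(r[2])
    elif tag == "sustain":               # ("sustain", c, body), not yet registered
        yield from handles_in(r[2])
    elif tag == "hide":                  # ("hide", body, c, D), not yet registered
        yield from handles_in(r[1])
    # ("perform", u) contains no handles


def collect(system, sustaining, hiding):
    """Keep only the table entries reachable from the reactor system.

    sustaining maps i -> (c_i, r_i); hiding maps i -> (r_i, c_i, D_i).
    """
    live = set()
    todo = [h for r in system for h in handles_in(r)]
    while todo:
        h = todo.pop()
        if h in live:
            continue
        live.add(h)
        kind, i = h
        body = sustaining[i][1] if kind == "sustain_ref" else hiding[i][0]
        todo.extend(handles_in(body))
    kept_sustaining = {i: e for i, e in sustaining.items() if ("sustain_ref", i) in live}
    kept_hiding = {i: e for i, e in hiding.items() if ("hiding_ref", i) in live}
    return kept_sustaining, kept_hiding


system = [("seq", ("sustain_ref", 1), ("perform", "u"))]
sustaining = {1: ("c1", ("hiding_ref", 7)), 2: ("c2", ("perform", "v"))}
hiding = {7: (("perform", "w"), "c7", ()), 8: (("perform", "z"), "c8", ())}
kept_sustaining, kept_hiding = collect(system, sustaining, hiding)
```

In this example entries 2 and 8 are dropped because no reactor in the system refers to them, directly or through entry 1.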
We then have the following soundness properties, which say that every transition according to the rules in this refined model has a corresponding transition in the basic model. Note that $\xrightarrow{\tau}$ is allowed "for free" in the sense that it is not required to match any step in the basic model.

Proposition 4.1 Soundness of trigger and log model.
If $r, \Delta \xrightarrow{u} \Delta', r'$, then $r\Delta, \mathit{theory}(\Delta) \xrightarrow{u} \mathit{theory}(\Delta'), r'\Delta'$ in the basic model.
If $r, \Delta \xrightarrow{\varepsilon} \Delta', r'$, then $r\Delta, \mathit{theory}(\Delta) \xrightarrow{\varepsilon} \mathit{theory}(\Delta'), r'\Delta'$ in the basic model.
If $r, \Delta \xrightarrow{\tau} \Delta', r'$, then $r\Delta = r'\Delta'$ and $\mathit{theory}(\Delta) = \mathit{theory}(\Delta')$ in the basic model.
If $r, \Delta \xrightarrow{u} \Delta'$, then $r\Delta, \mathit{theory}(\Delta) \xrightarrow{u} \mathit{theory}(\Delta')$ in the basic model.

Proof sketch. By structural induction on the rules and the fact that update functions affect neither the trigger table nor the log table. $\Box$

We also have completeness. That is, every transition in the basic model is also covered by a rule in the refined model.
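Proposition 4.2 Completeness of trigger and log model.
Let $r_1, \Delta_1 \xrightarrow{\alpha} \Delta_2, r_2$ in the basic model. Then there are $r'_1$, $\Delta'_1$, $r'_2$, and $\Delta'_2$ such that $r_1 = r'_1\Delta'_1$, $r_2 = r'_2\Delta'_2$, $\Delta_1 = \mathit{theory}(\Delta'_1)$, $\Delta_2 = \mathit{theory}(\Delta'_2)$, and $r'_1, \Delta'_1 \xrightarrow{\alpha} \Delta'_2, r'_2$.
Let $r_1, \Delta_1 \xrightarrow{\alpha} \Delta_2$ in the basic model. Then there are $r'_1$, $\Delta'_1$, and $\Delta'_2$ such that $r_1 = r'_1\Delta'_1$, $\Delta_1 = \mathit{theory}(\Delta'_1)$, $\Delta_2 = \mathit{theory}(\Delta'_2)$, and $r'_1, \Delta'_1 \xrightarrow{\alpha} \Delta'_2$.

Proof sketch. Let $r'_i$ be obtained by replacing appropriate subexpressions of the form $c_j \Rightarrow r_j$ by $\mathit{sustain}_j$ and of the form $[r_j]_{c_j;D_j}$ by $\mathit{hiding}_j$. Let $\mathit{theory}(\Delta'_i) = \Delta_i$. Let $\mathit{sustaining}(\Delta'_i)$ be $\{[j : c_j \Rightarrow r_j] \mid c_j \Rightarrow r_j \text{ in } r_i \text{ was replaced by } \mathit{sustain}_j \text{ in } r'_i\}$. Let $\mathit{hiding}(\Delta'_i)$ be $\{[j : [r_j]_{c_j;D_j}] \mid [r_j]_{c_j;D_j} \text{ in } r_i \text{ was replaced by } \mathit{hiding}_j \text{ in } r'_i\}$. Then proceed by structural induction on the rules. $\Box$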
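The construction in the proof sketch, together with the expansion used in the soundness statement, can be illustrated by a pair of Python functions: `lift` replaces sustain and hide subexpressions by handles while filling the two tables, and `expand` substitutes the table entries back. The tuple encoding of reactors and both function names are assumptions of this sketch. It also replaces every such subexpression and stores lifted bodies in the tables, which is one particular choice of the "appropriate" subexpressions.

```python
from itertools import count

BINARY = ("seq", "par", "choice")    # assumed tags for r1 ; r2, r1 & r2, r1 | r2


def lift(r, sustaining, hiding, fresh):
    """Replace c => r and [r]_{c;D} subterms by handles, recording them in the tables."""
    tag = r[0]
    if tag == "sustain":                       # ("sustain", c, body)
        i = next(fresh)
        sustaining[i] = (r[1], lift(r[2], sustaining, hiding, fresh))
        return ("sustain_ref", i)
    if tag == "hide":                          # ("hide", body, c, D)
        i = next(fresh)
        hiding[i] = (lift(r[1], sustaining, hiding, fresh), r[2], r[3])
        return ("hiding_ref", i)
    if tag in BINARY:
        return (tag,
                lift(r[1], sustaining, hiding, fresh),
                lift(r[2], sustaining, hiding, fresh))
    return r                                   # ("perform", u) is left unchanged


def expand(r, sustaining, hiding):
    """Recursively substitute table entries back for their handles."""
    tag = r[0]
    if tag == "sustain_ref":
        c, body = sustaining[r[1]]
        return ("sustain", c, expand(body, sustaining, hiding))
    if tag == "hiding_ref":
        body, c, d = hiding[r[1]]
        return ("hide", expand(body, sustaining, hiding), c, d)
    if tag in BINARY:
        return (tag, expand(r[1], sustaining, hiding), expand(r[2], sustaining, hiding))
    if tag == "sustain":
        return ("sustain", r[1], expand(r[2], sustaining, hiding))
    if tag == "hide":
        return ("hide", expand(r[1], sustaining, hiding), r[2], r[3])
    return r


basic = ("seq",
         ("sustain", "c1", ("hide", ("perform", "u"), "c2", ())),
         ("perform", "v"))
sustaining, hiding = {}, {}
refined = lift(basic, sustaining, hiding, count())
round_trip = expand(refined, sustaining, hiding)
```

Expanding the lifted reactor returns the original one, which mirrors the requirement that the basic reactor equal the expansion of the refined reactor under the refined state.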
It follows from the soundness and completeness of the refined semantics with respect to the basic semantics that the following three equations are preserved when the notion of $\equiv$ is modified in the obvious manner.

Proposition 4.3 Distributivity of sustain.
$c \Rightarrow (r_1 ; r_2) \equiv (c \Rightarrow r_1) ; (c \Rightarrow r_2)$
$c \Rightarrow (r_1 \mathbin{\&} r_2) \equiv (c \Rightarrow r_1) \mathbin{\&} (c \Rightarrow r_2)$
$c \Rightarrow (r_1 \mid r_2) \equiv (c \Rightarrow r_1) \mid (c \Rightarrow r_2)$
5 Dealing with Variables

Recall the rule for variables:

\[
\frac{v \text{ is any constant}}{\nu x.r,\ \Delta \xrightarrow{\varepsilon} \Delta,\ r[v/x]}
\]

This allows many possibilities in the choice of $v$ to instantiate $x$ with. Ideally, the choice of $v$ is one that allows $r$ to make the most efficient progress and to complete in good time. A bad choice of $v$ is one that causes $r$ to be stuck, for example by making some critical condition $c$ inside $r$ false. The operational model above is now extended to deal with this issue. The basic idea is to have a lazy strategy for the instantiation of variables. This is akin to the SLD-resolution strategy of logic programs, which compute using only most-general unifiers, as opposed to systematically substituting by constants.

We start, in fact, with a restriction on RD. The notion of free variables is defined in the obvious manner as follows: $FV(r_1 \mathbin{\&} r_2) = FV(r_1 \mid r_2) = FV(r_1 ; r_2) = FV(r_1) \cup FV(r_2)$; $FV(c \Rightarrow r) = FV([r]_c) = FV(c) \cup FV(r)$; $FV(\mathit{perform}\ u) = FV(u)$; and $FV(\nu x.r) = FV(r) \setminus \{x\}$. $FV(c)$ and $FV(u)$ are the free variables in $c$ and $u$ respectively. Then we define the notion of unsafe variables $USV(r)$ of a reactor $r$ as: $USV(r_1 \mathbin{\&} r_2) = USV(r_1 \mid r_2) = USV(r_1 ; r_2) = USV(r_1) \cup USV(r_2)$; $USV(c \Rightarrow r) = USV(r) \setminus FV(c)$; $USV([r]_c) = FV(c) \cup USV(r)$; $USV(\mathit{perform}\ u) = USV(u) = FV(u)$, with $USV(x) = \{x\}$ for a variable $x$; and $USV(\nu x.r) = USV(r) \setminus \{x\}$. A free variable $x \in FV(r)$ is considered safe if $x \notin USV(r)$. A reactor $r$ is safe if $FV(r) = USV(r) = \{\}$ and for every $\nu x_i.r_i$ in $r$, it is the case that $x_i \notin USV(r_i)$. We restrict RD to those reactors that are safe. The motivation for this restriction is that we intend to only allow the construct $c \Rightarrow r$ to instantiate variables. (This restriction is not fundamental, and can be removed if we prefer to allow other constructs of RD to instantiate variables.)

Next, we introduce a new class of variables that we call skolem variables $\hat{x}$, which are allowed to appear in $c$'s and $u$'s during the execution of a reactor. Skolem variables represent instantiations to be fixed later. They can be viewed as globally existentially quantified variables. We now complete the model for RD by modifying the rules for $\nu x.r$, $\mathit{perform}\ u$, and $c \Rightarrow r$ of the basic model. Instead of starting with the more detailed trigger and log model of the previous section, we will use the basic model for the sake of simplicity.
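The definitions of FV, USV and safety given above translate directly into recursive functions. The Python sketch below is only an illustration: the tuple encoding of reactors, the convention that variables are strings beginning with "?", and the treatment of tests and updates as flat atoms are assumptions, not part of RD.

```python
BINARY = ("seq", "par", "choice")    # assumed tags for r1 ; r2, r1 & r2, r1 | r2


def fv_atom(atom):
    """Free variables of a test or update written as (predicate, arg, ...)."""
    return {t for t in atom[1:] if isinstance(t, str) and t.startswith("?")}


def fv(r):
    tag = r[0]
    if tag in BINARY:
        return fv(r[1]) | fv(r[2])
    if tag == "sustain":                 # ("sustain", c, body) encodes c => body
        return fv_atom(r[1]) | fv(r[2])
    if tag == "hide":                    # ("hide", body, c) encodes [body]_c
        return fv_atom(r[2]) | fv(r[1])
    if tag == "perform":                 # ("perform", u)
        return fv_atom(r[1])
    if tag == "nu":                      # ("nu", x, body) encodes nu x.body
        return fv(r[2]) - {r[1]}
    raise ValueError(f"unknown reactor tag: {tag}")


def usv(r):
    tag = r[0]
    if tag in BINARY:
        return usv(r[1]) | usv(r[2])
    if tag == "sustain":                 # variables tested by c become safe
        return usv(r[2]) - fv_atom(r[1])
    if tag == "hide":
        return fv_atom(r[2]) | usv(r[1])
    if tag == "perform":
        return fv_atom(r[1])
    if tag == "nu":
        return usv(r[2]) - {r[1]}
    raise ValueError(f"unknown reactor tag: {tag}")


def binders(r):
    """Yield (x, body) for every subterm nu x.body of r."""
    tag = r[0]
    if tag == "nu":
        yield (r[1], r[2])
        yield from binders(r[2])
    elif tag in BINARY:
        yield from binders(r[1])
        yield from binders(r[2])
    elif tag == "sustain":
        yield from binders(r[2])
    elif tag == "hide":
        yield from binders(r[1])


def is_safe(r):
    return not fv(r) and not usv(r) and all(x not in usv(b) for x, b in binders(r))


guarded = ("nu", "?x", ("sustain", ("p", "?x"), ("perform", ("q", "?x"))))
unguarded = ("nu", "?x", ("perform", ("q", "?x")))
```

Here `guarded` is safe because the variable is instantiated by the test of a sustain before it is used, whereas `unguarded` uses the variable in an update without any enclosing test.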
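Extending the ideas below to the trigger and log model is routine but tedious. The state is modified to incorporate an instantiation table $\mathit{instance}(\Delta)$ that maps skolem variables into constants. We write $r[\hat{x}/x]$ for the expression obtained by replacing all free occurrences of the variable $x$ in $r$ by $\hat{x}$, and we write $r\theta$ and $u\theta$ for the $r'$ and $u'$ obtained by applying the substitution $\theta$ to $r$ and $u$ respectively. Let $c$ be a test and $\theta$ a substitution of constants for skolem variables. We write $\Delta; \theta \models c$ to mean $\mathit{theory}(\Delta) \models c(\theta\,\mathit{instance}(\Delta))$ and $\Delta \models c$ to mean $\mathit{theory}(\Delta) \models c(\mathit{instance}(\Delta))$. The rules of the basic model are modified as:

*** bound variables ***

\[
\frac{\hat{x} \text{ fresh}}{\nu x.r,\ \Delta \xrightarrow{\varepsilon} \Delta,\ r[\hat{x}/x]}
\]

*** perform ***

\[
\frac{u' = u(\mathit{instance}(\Delta))}{\mathit{perform}\ u,\ \Delta \xrightarrow{u'} u'(\Delta)}
\]

*** sustain ***

\[
\frac{\begin{array}{c}
\theta \text{ a substitution for skolem variables} \qquad \Delta; \theta \models c \qquad r,\ \Delta \xrightarrow{\alpha} \Delta'',\ r' \\
\mathit{theory}(\Delta') = \mathit{theory}(\Delta'') \qquad \mathit{instance}(\Delta') = \mathit{instance}(\Delta'')
\end{array}}{c \Rightarrow r,\ \Delta \xrightarrow{\alpha} \Delta',\ c \Rightarrow r'}
\]

\[
\frac{\begin{array}{c}
\theta \text{ a substitution for skolem variables} \qquad \Delta; \theta \models c \qquad r,\ \Delta \xrightarrow{\alpha} \Delta'' \\
\mathit{theory}(\Delta') = \mathit{theory}(\Delta'') \qquad \mathit{instance}(\Delta') = \mathit{instance}(\Delta'')
\end{array}}{c \Rightarrow r,\ \Delta \xrightarrow{\alpha} \Delta'}
\]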
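To make the lazy strategy concrete, the following Python sketch lets a bound variable become a fresh skolem variable and only fixes its value when a test is checked against the theory. It rests on several simplifying assumptions that are not part of the calculus: the theory is a finite set of ground facts, entailment of a test is membership of its instance in that set, skolem variables are strings beginning with "^", and the substitution found for a test is recorded in the instantiation table. All function names are hypothetical.

```python
from itertools import count

_fresh = count()


def fresh_skolem():
    """Return a skolem variable that has not been used before."""
    return f"^x{next(_fresh)}"


def is_skolem(t):
    return isinstance(t, str) and t.startswith("^")


def apply(atom, table):
    """Apply an instantiation table (skolem -> constant) to an atom."""
    return tuple(table.get(t, t) for t in atom)


def match(atom, fact):
    """Return theta with atom(theta) == fact, or None if there is none."""
    if len(atom) != len(fact):
        return None
    theta = {}
    for t, v in zip(atom, fact):
        if is_skolem(t):
            if theta.setdefault(t, v) != v:    # same skolem must get one value
                return None
        elif t != v:
            return None
    return theta


def check_test(cond, theory, instance):
    """Look for theta such that the theory contains cond(theta instance).

    Returns the instantiation table extended with theta, or None if the
    test cannot be satisfied by any fact in the theory.
    """
    partial = apply(cond, instance)
    for fact in sorted(theory, key=repr):      # deterministic order for the sketch
        theta = match(partial, fact)
        if theta is not None:
            return {**instance, **theta}
    return None


theory = {("price", "ibm", 120), ("price", "sun", 80)}
x = fresh_skolem()                             # stands for the x of "nu x.r"
table = check_test(("price", "sun", x), theory, {})
update = apply(("buy", "sun", x), table)       # update instantiated on demand
```

The reactor text itself is never rewritten: the update still mentions the skolem variable, and its value is looked up in the table only when the update is performed.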
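Note that if RD is not restricted to safe reactors, the rule for $\mathit{perform}\ u$ must be modified in a more complicated way and the rules for $[r]_c$ must also be modified. This new model is sound with respect to the basic model in the sense that a transition here is also a transition there. Specifically:

Proposition 5.1 Soundness of variable instantiation.
If $r, \Delta \xrightarrow{\alpha} \Delta', r'$ and $\theta$ is any substitution for skolem variables in $r(\mathit{instance}(\Delta))$, then $r(\theta\,\mathit{instance}(\Delta)), \mathit{theory}(\Delta) \xrightarrow{\alpha} \mathit{theory}(\Delta'), r'(\theta\,\mathit{instance}(\Delta'))$ in the basic model.
If $r, \Delta \xrightarrow{\alpha} \Delta'$ and $\theta$ is any substitution for skolem variables in $r(\mathit{instance}(\Delta))$, then $r(\theta\,\mathit{instance}(\Delta)), \mathit{theory}(\Delta) \xrightarrow{\alpha} \mathit{theory}(\Delta')$ in the basic model.

Proof sketch. First show that the following holds: if $r, \Delta \xrightarrow{\alpha} \Delta', r'$, then $r(\theta\,\mathit{instance}(\Delta)), \Delta \xrightarrow{\alpha} \Delta', r'(\theta\,\mathit{instance}(\Delta))$; and that if $r, \Delta \xrightarrow{\alpha} \Delta'$, then $r(\theta\,\mathit{instance}(\Delta)), \Delta \xrightarrow{\alpha} \Delta'$. Then proceed by induction on the rules. $\Box$

Consider a new notion of $\equiv$, call it $\equiv'$. We say $r \equiv' r'$ if for any sequence $r, \Delta_1 \xrightarrow{\alpha_1} \Delta'_1, r_1$; $r_1, \Delta_2 \xrightarrow{\alpha_2} \Delta'_2, r_2$; ...; such that $r_1(\mathit{instance}(\Delta'_1)) = r_1(\mathit{instance}(\Delta_2))$; ...; there are $r'_1$, $r'_2$, ... such that $r', \Delta_1 \xrightarrow{\alpha_1} \Delta'_1, r'_1$; $r'_1, \Delta_2 \xrightarrow{\alpha_2} \Delta'_2, r'_2$; ...; and $r'_1(\mathit{instance}(\Delta'_1)) = r'_1(\mathit{instance}(\Delta_2))$; .... It is easy to see that:

Proposition 5.2 $r \equiv' r'$ if and only if $r \equiv r'$ in the basic model.

Proof. Recall that reactors are restricted to safe ones. The proof then follows from the one-to-one correspondence of the rules in the two semantics. $\Box$

The basic equations on sustain are preserved:

Proposition 5.3 Distributivity of sustain.
$c \Rightarrow (r_1 ; r_2) \equiv' (c \Rightarrow r_1) ; (c \Rightarrow r_2)$
$c \Rightarrow (r_1 \mathbin{\&} r_2) \equiv' (c \Rightarrow r_1) \mathbin{\&} (c \Rightarrow r_2)$
$c \Rightarrow (r_1 \mid r_2) \equiv' (c \Rightarrow r_1) \mid (c \Rightarrow r_2)$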
We finally remark that the instantiation table roughly corresponds to a compilation technique that efficiently looks up the value of a variable on demand rather than rewriting entire expressions to physically replace variables by their values.
6 Conclusion

We presented a calculus for the specification of concurrent processes which interact with a changing database, which is generalized as an updatable logical theory. Modelling the database abstractly as a theory, we focussed on having a basic set of primitives in order to obtain an operational model which is simple and yet expressive enough to be the foundation for reactive programming by agents sharing a database. Thus the RD calculus provides the basic mechanisms for synchronization, consistency, controlled atomicity and isolation, variable substitutions, and nondeterministic choice, which we argue are appropriate for dealing with databases which change over time. We also provide a hint at the implementation architecture. Extensions to this basic framework include other forms of synchronization models for sustain and adding full programmability to the calculus.

A prototype implementation is under way. At the heart of this system is a constraint logic program [11] representing a constraint database. The current implementation consists of the CLP(R) system [12] and a superstructure whose main purpose is the management of a collection of reactors. This system is an instance of a larger effort, Open Constraint Programming [13], which discusses a general architecture matching the calculus presented here. The system in particular provides knowledge representation of shared data using rules and constraints. Current target applications include workflow and resource allocation problems.
References

[1] F. Bancilhon and H. Korth. A model of CAD transactions. Proc. of the 11th Intl. Conf. on Very Large Databases, August 1985, 25-33.
[2] H. Berenson, P. Bernstein, J. Gray, J. Melton, E. O'Neil, and P. O'Neil. A critique of ANSI SQL isolation levels. In Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, 1995, 1-10.
[3] G. Berry and G. Gonthier. The Esterel Synchronous Programming Language: Design, Semantics, Implementation. Science of Computer Programming, 19(2):87-152, 1992.
[4] N. Carriero and D. Gelernter. Linda in context. Communications of the ACM, 32(4):444-458, 1989.
[5] K.P. Eswaran, J.N. Gray, R.A. Lorie, and I.L. Traiger. The notions of consistency and predicate locks in a database system. Communications of the ACM, 19(11):624-633, 1976.
[6] H. Garcia-Molina. Using semantic knowledge for transaction processing in a distributed database. ACM Transactions on Database Systems, 8(2), 1983, 186-213.
[7] H. Garcia-Molina and K. Salem. SAGAS. Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, 1987, 249-259.
[8] J. Gray and A. Reuter. Transaction Processing: Concepts and Techniques. Morgan Kaufmann, 1993.
[9] J. Gray. The Transaction Concept: Virtues and Limitations. Proc. of 7th Intl. Conf. on Very Large Data Bases, 1981, 144-154.
[10] N. Halbwachs, P. Caspi, P. Raymond, and D. Pilaud. The synchronous dataflow programming language Lustre. Proc. of the IEEE, 79(9), 1991.
[11] J. Jaffar and M. Maher. Constraint Logic Programming: a Survey. Journal of Logic Programming, Special 10th Anniversary Issue, Vols 19/20, 1994.
[12] J. Jaffar, S. Michaylov, P. Stuckey and R. Yap. The CLP(R) Language and System. ACM Transactions on Programming Languages and Systems, 14(3), 1992.
[13] J. Jaffar and R.H.C. Yap. Open Constraint Programming. 4th Intl. Conf. on Principles and Practice of Constraint Programming, 1998.
[14] P. Kanellakis, G. Kuper, and P. Revesz. Constraint query languages. Journal of Computer and System Sciences, 51(1):26-52, 1995.
[15] H.F. Korth and G.D. Speegle. Formal Model of Correctness Without Serializability. In Proc. ACM SIGMOD Intl. Conf. on Management of Data, Chicago, Illinois, 1988.
[16] H.F. Korth and G. Speegle. Formal aspects of concurrency control in long-duration transaction systems using the NT/PV model. ACM Trans. on Database Systems, 19(3), 1994, 492-535.
[17] H.T. Kung and J.T. Robinson. On optimistic methods for concurrency control. ACM Transactions on Database Systems, 6(2):213-226, 1981.
[18] R. Milner, J. Parrow, and D. Walker. A calculus of mobile processes, Part I/II. Information and Computation, 100(1):1-77, 1992.
[19] E.B. Moss. Nested Transactions: An Approach to Reliable Distributed Computing. MIT Press, 1985.
[20] P.Z. Revesz. Constraint Databases: A Survey. In: Semantics in Databases, L. Libkin and B. Thalheim, eds., Springer-Verlag, to appear.
[21] L. Libkin and L. Wong. Query languages for bags and aggregate functions. Journal of Computer and System Sciences, 55(2):241-272, 1997.
[22] K. Marriott and P.J. Stuckey. Programming with Constraints. The MIT Press, 1998.
[23] R. Milner, J. Parrow, and D. Walker. A calculus of mobile processes. Information and Computation, 100, 1992, 1-40.
[24] R. Ramakrishnan. Database Management Systems. McGraw-Hill, 1997.
[25] M. Reeves, Benjamin N. Grosof, Michael P. Wellman, and Hoi Y. Chan. Towards a Declarative Language for Negotiating Executable Contracts. Workshop on Artificial Intelligence in Electronic Commerce (AIEC-99), AAAI Technical Report, AAAI Press / MIT Press, Menlo Park. See also http://www.research.ibm.com/rules.
[26] V. Saraswat. Concurrent Constraint Programming. MIT Press, 1993.
[27] G. Schlageter. Optimistic methods for concurrency control in distributed systems. In Proc. of 7th Intl. Conf. on Very Large Data Bases, 125-130, 1981.
[28] F. Schneider. On Concurrent Programming. Springer-Verlag, 1997.
[29] J. Widom, S. Ceri, and U. Dayal. Active Database Systems: Triggers and Rules for Advanced Database Processing. Morgan Kaufmann, 1995.