Events with Attributes in an Active Database

H. V. Jagadish

AT&T Bell Laboratories [email protected]

Inderpal Singh Mumick AT&T Bell Laboratories [email protected]

Oded Shmueli

AT&T Bell Laboratories [email protected]

December 1992

Abstract

In an active database, triggers are fired in response to the occurrence of events. Events may have attributes, and the values of these attributes may be used as parameters to functions invoked in the action part. Composite events may be specified as the composition of primitive events in the database system. The attributes of a composite event are derived from the attributes of its constituent primitive events. Event attributes become even more important when events are composite. These attributes can be used to restrict composite events that are recognized (by relating attributes of constituent primitive events), and to specify database state information from the time of occurrence of a constituent event that must be preserved, as an attribute, for the composite event. We present a formalism for specifying attributes on composite events using event expressions. We show how such composite events can be detected efficiently by extended finite state machines.

1 Introduction

Of late, there has been a surge of interest in active databases [DBB+88, MD89, BM91, SSU91, GJ91, LLPS91, SK91]. In an active database, a trigger fires when an event of interest happens and some condition is satisfied. Most efforts have focussed on the trigger firing mechanism and the execution of the triggered action. However, recent work [DHL91, CM91, GJS92b, GJS92a] has recognized the importance of event specification. Of special interest is the specification of composite events, which are constructed from (simpler) basic events.

Events usually have attributes. For instance, an insert event has information regarding the specific relation (or object collection) updated as well as the specific tuple (or object) inserted. Such information can be considered an attribute of the event. In addition, event attributes may include system-level information such as transaction id, user id, and time. In object-oriented systems, method invocation events may carry as attributes the parameters with which the respective methods are invoked. It is even possible for events to take as attributes arbitrary values (computed) from the database at the time at which the event occurs.

Author's current affiliation: Technion - Israel Institute of Technology, Haifa 32000 Israel.

It has been argued [DHL91, CM91, GJS92b, GJS92a] that the specification and detection of composite events is important for an active database. Here, a composite event comprises a sequence of constituent primitive events. The attributes of a composite event must be derived from the attributes of the constituent primitive events. These attributes may be required to specify parameters for the action part of a trigger. Values of these attributes may also be used in condition predicates that can be used to restrict the firing of the trigger.

The difficulty in dealing with attributes for composite events is that not all of the constituent events occur simultaneously. Thus attributes of events that have occurred in the past must be remembered. Clearly, it is impractical to remember the entire history of the system from start-up time, noting each event and the attributes with which it occurred. For event expressions without attributes, a solution has been found by recognizing them to be equivalent to regular expressions, and therefore using finite automata to recognize composite events. The states in the automaton "remember" exactly the amount of past history that is required. Event expressions with attributes are not equivalent to regular expressions and cannot be recognized by finite automata. A central contribution of this paper is a technique of augmenting states in a finite automaton with a data structure that "remembers" exactly as much of the history as needed for an event expression with attributes.

We believe that the ability to associate attributes with events is crucial for any active database. In Section 2 we present a few examples to motivate this need. Section 3 covers background material defining the basic structure of composite event expressions. The semantics of attributes in event expressions are precisely defined in Section 4.

If all attributes of interest are "immediate", not derived from events that may have occurred in the past, then the implementation of event expressions with attributes is particularly simple. In fact, we can remain within the realm of finite automata. This important special case is discussed in Section 5. The extended finite automata that we use to implement general event expressions with arbitrary attributes are described in Section 6. The actual construction technique for extended automata is described in Section 7. In Section 8 we prove that our construction is correct. Related work is discussed in Section 9 and Section 10 summarizes our contributions. A proof of correctness of our automata construction technique is included in the Appendix.

2 Motivation

We give a few examples of events with attributes.

1. Suppose there are events sell(Stockname, Customerid, Amount) and buy(Stockname, Customerid, Amount). (For instance, buy and sell may be names of functions that are executed in an object-oriented database.) A buy issued by a customer for a stock previously sold indicates returning confidence and may be an event worth detecting. Specifying this composite event requires equality matching on the stockname and customerid attributes. Thus, one may define:

    confident(X) ≡ relative(sell(X, Y, Z), buy(X, Y, W))

2. Passage of time. Clock tick events can be used to mark time. For instance, a tick could occur every second. But then, a composite event expression that says "within 1 hour" would have to count 3600 tick events. If event expressions are implemented as automata, at least 3600 states are required to count 3600 tick events; in practice the number of states may be significantly greater. This is clearly an inefficient implementation. If instead we make the time (as read at the same one second granularity, from a system clock) an attribute, then we can simply take the difference of times. Thus the number of events that need to be tracked by the system is reduced, increasing system efficiency. Further, the number of states in the automaton to recognize the event expression is also reduced.

   Let the following event be of interest: two IBM stock sales by the same customer within one hour of each other. By making the time an attribute, we only track IBM sale events. We do not need to track 3600 time (second elapsed) events. Thus, we define the event sell(Stockname, Customerid, Amount, Time), and define the trigger:

    relative(sell(IBM, Y, A1, T1), sell(IBM, Y, A2, T2)) & (T2 - T1 < 3600) => action(Y);

   where action is some function to be called when the trigger fires.

3. Transaction coupling. In [GJS92b] it has been suggested that composite events with immediate coupling between predicate and action in a trigger are enough to simulate all the different types of coupling required. For instance, a deferred coupling between the event E and the action A would be written:

    first (before tcomplete) /+ E => A;

   (The operator /+ should be read "since-the-last".) This trigger fires the first time a before tcomplete event happens after the event E: that is, when the transaction that caused E to happen has completed and is getting ready to attempt to commit. Often the action A to be taken requires attributes obtained from the event E. But by the time this trigger fires, the event E is gone. We have to be able to remember the attributes of E. For this purpose, write:

    first (before tcomplete) /+ E(X, Y, Z) => A(X, Z);

   Here the action A requires only the X and Z attributes of the event E, not the Y attribute.

4. View Maintenance, Integrity Constraints. Events like insert, delete, update have associated tuples that are inserted, deleted or updated. These tuples are needed in the action part to propagate the changes into views, or to check integrity constraints. We could associate events with class objects, and have the inserted/deleted/updated objects as attributes of the event. This permits set-oriented event computation.
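
The benefit of carrying time as an attribute (example 2) can be illustrated with a minimal Python sketch. It is not the paper's automaton construction: the `Sell` record, the function name `customers_to_act_on`, and the per-customer dictionary are assumptions made for illustration. The sketch tracks only IBM sale events and compares time attributes, instead of counting clock ticks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sell:
    stock: str
    customer: str
    amount: int
    time: int  # seconds, read from a system clock (assumed granularity)


def customers_to_act_on(events, window=3600):
    """Return the customer id for each firing of the trigger, in order."""
    last_ibm_sale = {}  # customer id -> time of that customer's latest IBM sale
    fired = []
    for ev in events:  # events are assumed to arrive in time order
        if ev.stock != "IBM":
            continue
        prev = last_ibm_sale.get(ev.customer)
        # T2 - T1 < 3600: comparing against the latest earlier sale suffices,
        # because it is the earlier sale closest in time.
        if prev is not None and ev.time - prev < window:
            fired.append(ev.customer)
        last_ibm_sale[ev.customer] = ev.time
    return fired


history = [
    Sell("IBM", "c1", 10, 0),
    Sell("IBM", "c2", 5, 100),
    Sell("T", "c2", 1, 200),
    Sell("IBM", "c1", 7, 3599),
    Sell("IBM", "c2", 2, 3700),
]
print(customers_to_act_on(history))  # ['c1']
```

Customer c2 does not fire the trigger here because the two IBM sales are exactly 3600 seconds apart, which fails the strict inequality.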

3 Background

3.1 Events

An "event" is a happening of interest. In object-oriented databases, for example, events are related to object manipulation actions such as creation, deletion, and update or access by an object method (member function). Events can be specified to happen just prior to or just after the above actions. In addition, events can be associated with transactions and specified to happen immediately after a transaction begins, immediately before a transaction attempts to commit, immediately after a transaction commits, immediately before a transaction aborts, and immediately after a transaction aborts. Events can also be associated with time, for example, clock ticks, and the recording of the passage of a day, an hour, a second, or some other time unit. All of the above types of events are called primitive events.

An event occurrence (informally referred to as an event) is a tuple of the form (primitive event, event-identifier). Event-identifiers (eids) are used to define a total ordering, denoted by

giving E(16), ⟨F(3) F(5) F(9)⟩ giving E(17), ⟨F(2) F(1)⟩ giving E(3), ⟨F(3) F(1)⟩ giving E(4), ⟨F(2) F(9)⟩ giving E(11), ⟨F(3) F(9)⟩ giving E(12), and ⟨F(9)⟩ giving E(9). Thus E(X) is satisfied by the set of X values {3, 4, 9, 11, 12, 16, 17}. □

5 Automata Construction: Point Attributes

Event expressions with attributes cannot, in general, be implemented using finite automata. The reason is that event attributes may have to be "remembered" from the time of their occurrence to the time at which the automaton accepts and the trigger fires. However, a special case that can be implemented using finite automata is when the attributes do not have to be remembered across time: where they participate in predicates that can be evaluated instantaneously. In this section, we consider this special case, for two reasons. First, this is a very common special case, and significantly more efficient to implement than the general case. Second, the solution to this special case will aid in the understanding of the more general solution to follow.

At each state in the automaton, the attributes of the most recent event (the one that caused the transition into the state) are assumed available. Suppose that the event expression E is defined as F(X) & p(X), where F is another event expression, and p is a "mask" predicate function. Let $M_F$ be the automaton implementing F. The machine for E is then derived as follows. For each final state f of $M_F$, do the following:

• Create two new states $f_{TRUE}$ and $f_{FALSE}$ with the same output transitions as the state f.
• Delete all output transitions of state f and replace them with two transitions, one on the event $p_{TRUE}$ to the state $f_{TRUE}$, and one on the event $p_{FALSE}$ to the state $f_{FALSE}$. State f remains unaltered on all other events.
• Mark state $f_{TRUE}$ accepting, and mark states f and $f_{FALSE}$ non-accepting.

Conceptually, we treat the evaluation of the predicate p as resulting in one of the events $p_{TRUE}$ or $p_{FALSE}$. These events are ignored (cause self transitions) in all states except the final states of $M_F$. Here they cause a transition to the accept state or not, as appropriate.

In an automaton there may be many different mask predicates. There are two distinct events associated with each. The order in which these mask predicates are evaluated, and the associated pseudo-events occur, is immaterial.

Practically speaking, of course, every mask predicate cannot be evaluated after every event. Instead, we mark states that have outgoing transitions on such pseudo-events. When the automaton reaches such a state, it evaluates the corresponding predicate and makes the necessary transition. The effect obtained is the same as in the conceptual scheme of the previous paragraph.
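
The practical scheme, in which the mask is evaluated only on reaching a marked state, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: the dictionary encoding of the transition function, the function name `run_with_mask`, and the banking example are all hypothetical. Because $f_{TRUE}$ and $f_{FALSE}$ share the outgoing transitions of f, the sketch does not materialize them as separate states.

```python
def run_with_mask(delta, start, finals, mask, history):
    """Run DFA M_F over history and return positions where F(X) & p(X) holds.

    delta   : dict mapping (state, event_name) -> state
    finals  : final states of M_F (the states marked for mask evaluation)
    mask    : predicate p over the attributes of the most recent event
    history : list of (event_name, attributes) pairs
    """
    state = start
    accepted_at = []
    for pos, (name, attrs) in enumerate(history):
        state = delta[(state, name)]
        if state in finals:
            # Marked state: evaluate p now. A true result plays the role of the
            # pseudo-event p_TRUE (move to f_TRUE, accepting); a false result
            # plays the role of p_FALSE (move to f_FALSE, non-accepting).
            if mask(attrs):
                accepted_at.append(pos)
        # f_TRUE and f_FALSE have the same outgoing transitions as f, so the
        # run continues from `state` in either case.
    return accepted_at


# F: the most recent event is a withdrawal; p: the amount exceeds 1000.
delta = {
    (0, "deposit"): 0, (0, "withdraw"): 1,
    (1, "deposit"): 0, (1, "withdraw"): 1,
}
history = [("deposit", {"amount": 5000}),
           ("withdraw", {"amount": 200}),
           ("withdraw", {"amount": 1500})]
print(run_with_mask(delta, 0, {1}, lambda a: a["amount"] > 1000, history))  # [2]
```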

6 Automata with Attributes

We implement event expressions using extended automata. An extended automaton, like a regular automaton, makes a transition at the occurrence of each event in the history. In addition, an extended automaton may look at the attributes of the event, and may also compute a set of relations at the transition. In this section we describe the structure and behaviour of an extended automaton. Section 7 describes the construction of the extended automaton for a given event expression.

6.1 Extended Automata

The basic structure of an extended automaton M is the same as that of a finite automaton. Extensions include (1) a data structure $D_t$, of the form discussed in Section 6.2, associated with each state t of extended automaton M, and (2) the representation of a transition in M by the tuple $(t_1, t_2, u, H)$, where $t_1$ and $t_2$ are states of automaton M, u is a symbol representing an event occurrence with attributes, along with an optional condition, and H is an equation deriving the data structure $D_{t_2}$ on state $t_2$ from (a) the data structure $D_{t_1}$ on state $t_1$, and (b) attributes of event u. The conditions are associated with the transition symbol only for succinctness. The conditions can be separated out and dealt with as pseudo-events in the manner described in Section 5.

Each state t of an extended automaton M for the expression E(X, Y) is also associated with an accept relation $A_t(X, Y)$. The extended automaton accepts a history if it reaches a state t in which the accept relation $A_t(X, Y)$ is not empty, in which case the automaton M accepts event E(X, Y) for the accept relation $A_t(X, Y)$. The accept relation $A_t(X, Y)$ will be the set of attribute values for which the expression E(X, Y) is satisfied when the state t is entered.

In most states of the automaton, the accept relation will be empty by definition. For efficiency of implementation, we would prefer not to check for non-emptiness of the accept relation at these states. States where there is a possibility of the accept relation being non-empty correspond to accept states of the corresponding regular automaton (without attributes). We shall refer to these states as potentially accepting states when convenient. The potentially accepting states are marked by double circles in the figures.

The extended automaton M for expression E(X, Y) also has an input relation I(X, Y). The relation I(X, Y) is the set of attribute values for which the expression E(X, Y) is to be evaluated. The accept relation $A_t(X, Y)$ on any state t of M is a subset of the input relation I(X, Y). The input relation I(X, Y) can be infinite (representable by the single non-ground tuple (X), or dom(X), where dom is the domain from which tuples of the relation derive values).

In the above description, we permit the input relation of an automaton to be infinite. Non-ground tuples are used for ease of exposition. When implementing, the automata can be optimized so that no operation except copying is done on such non-ground relations, or even so that non-ground relations are not used at all. The optimization is straightforward, and omitted for lack of space.
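
One way to picture a transition tuple $(t_1, t_2, u, H)$ is the following minimal Python sketch. The class name `Transition`, the `step` function, and the representation of a data structure as a dictionary of sets of tuples are assumptions introduced here for illustration; the paper does not prescribe an encoding.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Transition:
    source: str                      # t1
    target: str                      # t2
    event: str                       # name of the event symbol u
    condition: Optional[Callable]    # optional condition on (D_t1, attrs)
    H: Callable                      # derives D_t2 from (D_t1, attrs)


def step(transitions, state, D, event, attrs):
    """Take the first enabled transition; return (new state, new data structure)."""
    for tr in transitions:
        if tr.source == state and tr.event == event:
            if tr.condition is None or tr.condition(D, attrs):
                return tr.target, tr.H(D, attrs)
    raise ValueError(f"no transition from {state!r} on {event!r}")


# Hypothetical two-state machine: remember every X seen on event "a".
transitions = [
    Transition("s", "t", "a", None, lambda D, x: {"R": {(x,)}}),
    Transition("t", "t", "a", None, lambda D, x: {"R": D["R"] | {(x,)}}),
]
state, D = "s", {}
for x in (3, 5):
    state, D = step(transitions, state, D, "a", x)
print(state, sorted(D["R"]))  # t [(3,), (5,)]
```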

6.2 Data Structure

As mentioned in Section 6.1, each state t of an extended automaton has an associated data structure, denoted by $D_t$. The data structure consists of a set of relations, $R_{t1}, \ldots, R_{tk}$, specified at the time of construction of the automaton M. In addition, the accept relation $A_t$, associated with state t, is derived from the data structure $D_t$ according to an accepting function, $\mathrm{acc}_t$. Thus,

$$A_t = \mathrm{acc}_t(D_t)$$

The accepting function is specified at the time of construction of the automaton M, and can be of one of the following two forms:

• $\mathrm{acc}_t(D_t) = \emptyset$ (if the state t is not a potentially accepting state)
• $\mathrm{acc}_t(D_t) = R_{t i_1} \cup \cdots \cup R_{t i_k}$, for some relations $R_{t i_1}, \ldots, R_{t i_k}$ in the data structure $D_t$ of state t (if the state t is a potentially accepting state)

The relations $R_{t1}, \ldots, R_{tk}$ in the data structure $D_t$ of a state t are derived in one of two ways: (1) from the data structure $D_q$ of a preceding state q and the attributes of an event u, as defined by the equations H of a transition $(q, t, u, H)$ leading into state t, or (2) from the other relations in the data structure $D_t$, defined by equations (such as $R_{t1} = R_{t3} \cup R_{t15}$) obtained during construction of the automaton M.

The data structure $D_s$ of the start state s of automaton M includes the input relation I of automaton M, and possibly other relations. When there are multiple relations in $D_s$, the input relation will be explicitly marked so, and all other relations will be derived from the input relation, using equations specified during construction of automaton M.

A deterministic extended automaton has only one start state. Our construction ensures that every non-deterministic extended automaton also has only one start state.

6.3 Notation

Given an expression E, let M be the extended automaton to recognize E, and let s be the start state of automaton M. Given an expression $E_i$, let $M_i$ be the extended automaton to recognize $E_i$, and let $s_i$ be the start state of automaton $M_i$.

For an accept relation $A_t$, we use $\mathrm{deref}(A_t)$ to access the accepting function of state t, that is:

$$\mathrm{deref}(A_t) = \mathrm{acc}_t$$

EXAMPLE 6.1 Let $A_t = R_{t3} \cup R_{t5}$, and let a relation Q be defined by the equation

$$Q = \mathrm{deref}(A_t)$$

Then, the equation defining Q expands to

$$Q = R_{t3} \cup R_{t5}$$

The dereferencing is important since the accepting relation $A_t$ may then be redefined, without changing the definition of Q. □

During the (inductive) construction of extended automata, conjunctions and disjunctions create a cross product of states, and determinization creates sets of states. Thus a state of an extended automaton is either a base state, or a set of states (due to determinization), or a tuple of states (due to cross product). Complex states are represented thus: the state $t = (u, v)$ represents the cross product of states u and v. The state $t = \{u, v, w\}$ represents the set of states u, v, and w created during the determinization process.

We need to refer to data structures of component states during each step of the construction of the extended automata. When a new extended automaton is constructed from two component automata, or by determinization of a non-deterministic extended automaton, the data structures associated with each state of the component automata are available separately. (However, the separate structure is lost once the new automaton is constructed, and cannot be used in further steps of the automata construction.)

We use the dot notation to refer to the data structures and accepting relations associated with component states of a complex state. If state $t = (u, v)$, $D_t$ represents the data structure associated with the full state t, $D_{t.1}$ represents the data structure $D_u$ associated with component state u of state t, and $D_{t.2}$ represents the data structure $D_v$ associated with component state v of state t.

If state $t = \{u, v, w\}$, $D_t$ represents the data structure associated with the full state t, $D_{t \to u}$ represents the data structure $D_u$ associated with substate u of state t, $D_{t \to v}$ represents the data structure $D_v$ associated with substate v of state t, and $D_{t \to w}$ represents the data structure $D_w$ associated with substate w of state t.
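
The effect of dereferencing in Example 6.1 can be mimicked in Python by capturing the accepting function itself rather than the name it is bound to. This is only an analogy under assumed names (`make_acc`, the relation keys `R_t3`, `R_t5`, `R_t7`, and the dictionary `acc`); it shows why Q is unaffected when the accept relation of t is later redefined.

```python
def make_acc(relation_names):
    """Build an accepting function that unions the named relations of D_t."""
    def acc(D):
        result = set()
        for name in relation_names:
            result |= D[name]
        return result
    return acc


acc = {"t": make_acc(["R_t3", "R_t5"])}   # A_t = R_t3 ∪ R_t5

# Q = deref(A_t): capture the accepting function itself, not the name A_t.
Q = acc["t"]

# Redefining A_t later does not change what Q computes.
acc["t"] = make_acc([])                   # acc_t(D_t) = ∅

D_t = {"R_t3": {(1,)}, "R_t5": {(2,)}, "R_t7": {(9,)}}
print(sorted(Q(D_t)))        # [(1,), (2,)]
print(acc["t"](D_t))         # set()
```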

7 Construction of Extended Automata

Given an event expression E(X) with attributes, we inductively construct an extended automaton to accept E(X). The induction is on the operators used in defining the expression E(X). Some of the event operators lead to a non-deterministic extended automaton. In such a case, we determinize the automaton (Section 7.9) before applying the construction for the next event operator.

We describe the construction of the extended automaton for each event operator in our language (as defined in Section 4). The presentation is simplified by (1) considering representative examples of the application of event operators, (2) preferring a construction that adds more states and leads to a bigger automaton if doing so makes the construction easy to understand, and (3) using variables such as X in the expression E(X) to stand for a vector of arguments.

7.1 Basic Events

The extended automaton M for the basic event a(X) is shown in Figure 1. Figure 1 uses conditioned transitions on the input event a(X), X ∈ R, for some relation R. As discussed in Section 5, we use conditioned events for conciseness: in the implementation we split the transition into two transitions.

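[Figure 1 diagram: start state 1 (s) carrying the input relation I, state 2 with A2 = ∅, and potentially accepting state 3. Conditioned transitions a(X) | X ∈ I (from state 1), a(X) | X ∈ R21 (from state 2), and a(X) | X ∈ R31 (from state 3) lead into state 3 and set R32(X) = {(X)}; "else" transitions lead into state 2.]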

Figure 1: Extended Automaton M for the expression E(X) ≡ a(X)

The automaton M accepts at every event a(X) such that the attribute X is in the input relation I for which the expression a(X) is to be evaluated. There is a start state s = 1, a potentially accepting state 3 to which a transition is made if the input symbol is a and the given predicate is satisfied, and a third state 2, to which a transition is made otherwise. The data structure $D_1$ associated with the start state consists of one relation I. The data structure $D_2$ associated with state 2 consists of one relation $R_{21}$, and this relation is always a copy of the input relation. The data structure $D_3$ associated with state 3 consists of two relations $R_{31}$ and $R_{32}$. Relation $R_{31}$ is simply a copy of the input relation, and the relation $R_{32}$ contains the x value of the a(x) event occurring at the current time. The accept relation $A_3$ on the potentially accepting state 3 is derived by the equation $A_3 = R_{32}$. The accept relations associated with the other two states are empty.
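
The behaviour of this three-state machine can be simulated with a short Python sketch. It rests on assumptions that should be read as such: relations are sets of 1-tuples, the function name `run_basic_event` is hypothetical, and the accept relation on state 3 is taken to be the singleton relation holding the attribute of the current event (which is in the input relation because of the transition condition).

```python
def run_basic_event(I, history, event_name="a"):
    """Simulate the three-state machine for a(X); return A_t after each event."""
    state = 1                 # start state s
    R_copy = set(I)           # plays the role of R21 / R31: a copy of I
    accept_relations = []
    for name, x in history:
        if name == event_name and (x,) in R_copy:
            state = 3                     # potentially accepting state
            R32 = {(x,)}                  # attribute of the current a(x) event
            accept_relations.append(R32)  # assumed: A3 holds this value
        else:
            state = 2                     # non-accepting state, A2 = ∅
            accept_relations.append(set())
    return accept_relations


I = {(1,), (2,)}
history = [("a", 1), ("b", 2), ("a", 3), ("a", 2)]
print(run_basic_event(I, history))  # [{(1,)}, set(), set(), {(2,)}]
```

The automaton accepts (has a non-empty accept relation) after the first and fourth events only: the second event is not an a event, and the third carries an attribute value outside I.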

7.2 Attribute Propagation

The keyword any matches any event in the event alphabet. Since the attributes associated with an event depend on the event type, the event any cannot have any attributes associated with it. Nonetheless, given any attributes (X, Y), we can define an interesting automaton $M_{(X,Y)}$, shown in Figure 2, to propagate a given set of attribute values so that they are always available at any future time.

The automaton is not of much use by itself: it accepts at every event occurrence. Its use is in conjunction with other automata, to propagate attribute values that are not used in some automaton but are still required to be carried forward in the result (as in the construction of relative, Section 7.6).

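[Figure 2 diagram: start state s with input relation I(X, Y) and potentially accepting state f; every transition is on any. The transition from s to f sets $R_{f1}(X, Y) = I(X, Y)$, the self-loop on f keeps $R_{f1}(X, Y) = R_{f1}(X, Y)$, and $A_f(X, Y) = R_{f1}(X, Y)$.]

Figure 2: Extended Automaton $M_{(X,Y)}$ for Propagating Attributes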

7.3 Existential Quantifier

Given a deterministic extended automaton $M_1$ for the expression E1(X, Y), the automaton M for the expression E(Y) ≡ ((∃X) E1(X, Y)) is constructed as follows. The automaton M has the same states, data structures, transitions, potentially accepting states, and start state as the automaton $M_1$. The accept relations are different: for each potentially accepting state t of automaton M, redefine the accept relation as

$$A_t(Y) = \pi_Y(\mathrm{deref}(A_t(X, Y)))$$

The input relation I(Y) of automaton M is different from the input relation $I_1(X, Y)$ of automaton $M_1$. The relation I(Y) is added to the data structure $D_{s_1}$, and the relation $I_1(X, Y) \in D_{s_1}$ is derived from I(Y) as

$$I_1(X, Y) = \mathrm{dom}(X) \times I(Y),$$

where dom(X) represents every value in the (potentially infinite) domain for attribute X.
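
The redefinition of the accept relation is a relational projection. A minimal Python sketch follows, assuming relations are sets of tuples with an explicit attribute-name schema; the helper name `project` and the sample data are illustrative only.

```python
def project(relation, schema, keep):
    """Relational projection: keep only the attributes named in `keep`."""
    idx = [schema.index(a) for a in keep]
    return {tuple(row[i] for i in idx) for row in relation}


# Accept relation of M1 over (X, Y) at some potentially accepting state t.
A_t_XY = {(1, "u"), (2, "u"), (3, "v")}

# Accept relation of M over Y alone: X is existentially quantified away.
A_t_Y = project(A_t_XY, ("X", "Y"), ("Y",))
print(sorted(A_t_Y))  # [('u',), ('v',)]
```

Duplicates collapse automatically because the result is a set, which matches the set semantics of the projection.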

7.4 Conjunction

For the conjunctive expression

    E(X, Y, Z) ≡ E1(X, Y) && E2(Y, Z)

automaton M is obtained by taking a cross product of automata $M_1(X, Y)$ and $M_2(Y, Z)$. The data structure $D_{(t_1,t_2)}$ of state $(t_1, t_2)$ in automaton M includes all relations in data structures $D_{t_1}$ and $D_{t_2}$, where $t_1$ and $t_2$ are states in automata $M_1$ and $M_2$. Each relation in $D_{(t_1,t_2)}$ is derived as it was being derived in automaton $M_1$ or $M_2$ (wherever it came from).

The start state of automaton M is $(s_1, s_2)$. The input relation I(X, Y, Z) is added to data structure $D_{(s_1,s_2)}$, and the relations $I_1(X, Y)$ and $I_2(Y, Z)$ (also in $D_{(s_1,s_2)}$) are derived from I(X, Y, Z) as follows: $I_1(X, Y) = \pi_{X,Y}(I(X, Y, Z))$, and $I_2(Y, Z) = \pi_{Y,Z}(I(X, Y, Z))$.

A state $(t_1, t_2)$ is marked potentially accepting if both states $t_1$ and $t_2$ are potentially accepting in $M_1$ and $M_2$. The accept relation on state $(t_1, t_2)$ is derived as

$$A_{(t_1,t_2)}(X, Y, Z) = \mathrm{deref}(A_{t_1}(X, Y)) \bowtie \mathrm{deref}(A_{t_2}(Y, Z))$$
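
The accept relation of the product state is the natural join of the component accept relations on their shared attributes. The following Python sketch shows such a join under the assumption that relations are sets of tuples with explicit schemas; `natural_join` is a hypothetical helper, not part of the paper.

```python
def natural_join(r1, schema1, r2, schema2):
    """Natural join of two relations given as sets of tuples plus schemas."""
    common = [a for a in schema1 if a in schema2]
    out_schema = list(schema1) + [a for a in schema2 if a not in schema1]
    out = set()
    for t1 in r1:
        row1 = dict(zip(schema1, t1))
        for t2 in r2:
            row2 = dict(zip(schema2, t2))
            # Tuples combine only when they agree on every shared attribute.
            if all(row1[a] == row2[a] for a in common):
                merged = {**row1, **row2}
                out.add(tuple(merged[a] for a in out_schema))
    return out, out_schema


A_t1 = {(1, 2), (3, 4)}        # accept relation of M1 over (X, Y)
A_t2 = {(2, 9), (5, 6)}        # accept relation of M2 over (Y, Z)
A, schema = natural_join(A_t1, ("X", "Y"), A_t2, ("Y", "Z"))
print(schema, A)  # ['X', 'Y', 'Z'] {(1, 2, 9)}
```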

7.5 Disjunction

For the disjunctive expression

    E(X) ≡ E1(X) || E2(X)

automaton M is obtained by taking a cross product of automata $M_1(X)$ and $M_2(X)$. The data structure $D_{(t_1,t_2)}$ of state $(t_1, t_2)$ in automaton M includes all relations in data structures $D_{t_1}$ and $D_{t_2}$, where $t_1$ and $t_2$ are states in automata $M_1$ and $M_2$. Each relation in $D_{(t_1,t_2)}$ is derived as it was being derived in automaton $M_1$ or $M_2$ (wherever it came from).

The start state of automaton M is $(s_1, s_2)$. The input relation I(X) is added to data structure $D_{(s_1,s_2)}$, and the relations $I_1(X)$ and $I_2(X)$ (also in $D_{(s_1,s_2)}$) are derived from I(X) as follows: $I_1(X) = I(X)$, and $I_2(X) = I(X)$.

A state $(t_1, t_2)$ is marked potentially accepting if either $t_1$ is potentially accepting in $M_1$ or $t_2$ is potentially accepting in $M_2$. The accept relation on state $(t_1, t_2)$ is derived as

$$A_{(t_1,t_2)}(X) = \mathrm{deref}(A_{t_1}(X)) \cup \mathrm{deref}(A_{t_2}(X))$$

7.6 Relative

Given the expression

    E(X, Y, Z) ≡ relative(E1(X, Y), E2(Y, Z)),

first construct an automaton $M_a$ as the conjunction of automata $M_1$ (for E1(X, Y)) and $M_{(X,Y,Z)}$ (the attribute propagation automaton), and an automaton $M_b$ as the conjunction of automata $M_2$ (for E2(Y, Z)) and $M_{(X,Y,Z)}$ (the attribute propagation automaton).

The automaton M for expression E(X, Y, Z) is obtained from $M_a$ and $M_b$ by adding an ε transition from each of the potentially accepting states of $M_a$ to the start state $s_b$ of $M_b$. The state $s_b$ becomes an internal state in automaton M. The input relation $I_b(X, Y, Z)$ on state $s_b$ is a part of the data structure $D_{s_b}$, and is derived along each of the ε transitions from a potentially accepting state $t_a$ (of $M_a$) to $s_b$ as follows:

$$I_b(X, Y, Z) = \mathrm{deref}(A_{t_a}(X, Y, Z))$$

The state $s_a$ is the start state for automaton M. The relation $I_a(X, Y, Z)$ in the data structure $D_{s_a}$ is flagged as the input relation for automaton M.

A state t of M is potentially accepting if t is also in automaton $M_b$, and is a potentially accepting state in $M_b$. The accept relation on state t is the same as its accept relation in $M_b$. All states derived from automaton $M_a$ are not potentially accepting (accept relation = ∅).

The automaton M derived above is non-deterministic, and must be determinized (as discussed in Section 7.9) before being used in constructing a bigger automaton.
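
The core idea of the relative construction, feeding the accept relation of the first automaton into the second as its input relation, can be sketched in Python for the confident trigger of Section 2. This is a hand-written simulation under assumptions, not the generated automaton: the function name `run_relative`, the event encoding as (name, stock, customer) triples, and the accumulation of the input relation in a single set (standing in for the unions that determinization would introduce) are all illustrative choices.

```python
def run_relative(history):
    """Detect relative(sell(X, Y), buy(X, Y)) over (stock, customer) pairs."""
    I_b = set()      # input relation of M_b, fed by accept relations of M_a
    accepts = []
    for pos, (name, stock, cust) in enumerate(history):
        if name == "buy" and (stock, cust) in I_b:
            # M_b accepts only for attribute values already accepted by M_a.
            accepts.append((pos, stock, cust))
        if name == "sell":
            # M_a accepts here; the ε transition passes its accept relation
            # on to M_b as (part of) the input relation I_b.
            I_b.add((stock, cust))
    return accepts


history = [
    ("sell", "IBM", "c1"),
    ("buy", "IBM", "c2"),
    ("buy", "IBM", "c1"),
    ("buy", "T", "c1"),
]
print(run_relative(history))  # [(2, 'IBM', 'c1')]
```

Only the third event is recognized: it is the only buy whose stock and customer both match an earlier sell.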

7.7 Linear Recursion

The general linear recursive expression can be normalized as

    E(X, Y) ≡ E0(X, Y) || ((∃Z) relative(E(X, Z), L(Z, Y))) || ((∃Z) relative(R(X, Z), E(Z, Y)))

where X, Y, and Z represent vectors of variables. We construct the automaton M for E(X, Y) in multiple steps:

1. Create an automaton $M_{R+}$ for the expression R+(X, Z) defined as follows:

    R+(X, Z) ≡ R(X, Z) || ((∃W) relative(R+(X, W), R(W, Z)))

   Let $M_R$ be the extended automaton for expression R(X, Z). Let A(X, Y, Z) ≡ R(X, Z) && dom(Y) be another expression and let $M_A$ be the automaton for A(X, Y, Z) (the conjunction of $M_R$ and the propagation automaton $M_Y$). The expression A(X, Y, Z) is needed because the second disjunct in R+(X, Z) has three variables. The intuition is that attributes X and Z capture the attributes of R(X, Z), while Y is the first attribute of R+ that must be carried through all the iterations.

   To create $M_{R+}$, first add ε transitions from each potentially accepting state of automaton $M_A$ to the start state $s_A$ of automaton $M_A$. The ε transition $t \to s_A$ derives a new value for the relation $I_A(X, Y, Z)$, which is the input relation for automaton $M_A$, and is one of the relations in the data structure $D_{s_A}$ of the start state $s_A$ of automaton $M_A$:

   $$I_A(W, Y, Z) = \pi_{Y,W}(\mathrm{deref}(A_t(X, Y, W))) \times \{\mathrm{dom}(Z)\}$$

   where $A_t(X, Y, W)$ is the accept relation on state t, viewed as a relation on attributes X, Y, and W.

   Second, create a copy of automaton $M_R(X, Z)$, and call it $M_U(X, Z)$. Create an ε transition from each potentially accepting state $t_u$ of $M_U$ to the start state $s_A$ of automaton $M_A$, deriving the relation $I_A(X, Y, Z)$ as follows:

   $$I_A(Z, X, Y) = \mathrm{deref}(A_{t_u}(X, Z)) \times \{\mathrm{dom}(Y)\}$$

   Thus, from now on, the X values accepted by U will be propagated around in automaton $M_A$, the Z values will be used to restrict the first argument for which $M_A$ accepts, and the matching values for Y will be found.

   The resulting automaton is $M_{R+}$. The start state of $M_{R+}$ is equal to $s_u$, the start state of automaton $M_U$, and the input relation $I_{R+}$ is equal to the input relation $I_U$. Every potentially accepting state of $M_U$ is a potentially accepting state in $M_{R+}$ with the same accept relation. Every potentially accepting state $t_A$ in $M_A$ is also a potentially accepting state in $M_{R+}$, with the accepting relation

   $$A_{t_A}(Y, W) = \pi_{Y,W}(\mathrm{deref}(A_{t_A}(X, Y, W)))$$

   The resulting automaton $M_{R+}$ is non-deterministic. Determinize before proceeding further.

2. Create an automaton $M_B$ for

    B(X, Y) ≡ E0(X, Y) || relative(R+(X, Z), E0(Z, Y)).

   Use the rules for inductively constructing automata for disjunction and relative given earlier.

3. Create an automaton $M_{L+}$ for

    L+(Z, Y) ≡ L(Z, Y) || relative(L+(Z, Y'), L(Y', Y))

   Automaton $M_{L+}$ is constructed in the same way as automaton $M_{R+}$ in Step 1 above.

4. Create an automaton M(X, Y) for

    E(X, Y) ≡ B(X, Y) || relative(B(X, Z), L+(Z, Y))

7.8 relative+ and conrel+

As discussed in Section 4, relative+(E(X)) and con(Y)(conrel+(E(X))), where con is an incrementally computable function, can be rewritten using linear recursion. Thus the automata for these expressions can be obtained by using the construction for linear recursion (Section 7.7).

7.9 Determinizing Automata with Relations Several of the construction steps above introduce  transitions into the automata under construction. However, deterministic automata are far easier to implement than non-deterministic ones. In this section we discuss how to determinize an automaton M with data structures on states. Determinize automaton M as you would normally determinize an automaton. Let t = ft1; t2; t3g be a state of the determinized automaton D, where t1; t2; t3 are states in the non-deterministic automaton M . The data structure Dt for state t includes all the relations in the data structures Dt1 , Dt2 , and Dt3 , of states t1, t2, and t3. The relations in the data structure continue to be derived as they were in substates t1; t2; t3, independently of each other, and from the transition leading into node t = ft1; t2; t3g. 19

Let there be transitions (u; t1; x; E1) and (v; t1; x; E2) in the non-deterministic automaton M , and let these imply a transition (fu; v; wg; ft1; t2g; x; E ) from state fu; v; wg to state ft1; t2g in the determinized automaton D, Then, each relation R that originally belonged to the data structue Dt1 (on state t1 in the non-deterministic automaton M ) is derived as the the union of relations that would be derived in the automaton M using the transition equations E1 and E2 . We illustrate with an example.

EXAMPLE 7.1 Let the data structure $D_{t_1}$ include two relations $R^1_{t_1}(X)$ and $R^2_{t_1}(X, Y)$. Let equations $E_1$ derive $D_{t_1}$ as

$$R^1_{t_1}(X) = R^1_u(X), \qquad R^2_{t_1}(X, Y) = \pi_X(R^2_u(X, Z)) \times (Y),$$

where $Y$ is an attribute of the input symbol. Let equations $E_2$ derive $D_{t_1}$ as

$$R^1_{t_1}(X) = \pi_X(R^1_v(X, W)), \qquad R^2_{t_1}(X, Y) = R^2_v(X, Y).$$

Then equations $E$ in the deterministic automaton $D$ define data structure $D_{t_1}$ as

$$R^1_{t_1}(X) = R^1_u(X) \cup \pi_X(R^1_v(X, W)), \qquad R^2_{t_1}(X, Y) = \bigl(\pi_X(R^2_u(X, Z)) \times (Y)\bigr) \cup R^2_v(X, Y).$$

□

The start state of automaton $D$ is the same as the start state of $M$. A state $t = \{t_1, t_2, t_3\}$ in $D$ is potentially accepting if any of the states $t_1$, $t_2$, or $t_3$ is potentially accepting in automaton $M$. The accept relation on state $t$ is derived as:

$$A_t = \mathit{deref}(A_{t_1}) \cup \mathit{deref}(A_{t_2}) \cup \mathit{deref}(A_{t_3})$$
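The union rule of Example 7.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: relations are modelled as sets of tuples, the garbled operator in the example is read as a cartesian product with the input attribute, and all function names and sample values are hypothetical rather than taken from the paper.

```python
# Illustrative encoding (an assumption): a relation is a Python set of tuples.

def project_first(rel):
    """pi_X over a binary relation: keep only the first column."""
    return {(x,) for (x, _) in rel}

def extend_with(rel, y):
    """Product of a unary relation with the single input attribute value y."""
    return {(x, y) for (x,) in rel}

def derive_via_e1(r1_u, r2_u, y):
    """Equations E1, attached to the transition (u, t1, x, E1)."""
    return set(r1_u), extend_with(project_first(r2_u), y)

def derive_via_e2(r1_v, r2_v):
    """Equations E2, attached to the transition (v, t1, x, E2)."""
    return project_first(r1_v), set(r2_v)

def derive_determinized(r1_u, r2_u, r1_v, r2_v, y):
    """Equations E in D: the union of what E1 and E2 would each derive."""
    a1, a2 = derive_via_e1(r1_u, r2_u, y)
    b1, b2 = derive_via_e2(r1_v, r2_v)
    return a1 | b1, a2 | b2

# Hypothetical sample relations held at substates u and v.
r1_u = {("a",)}
r2_u = {("a", "z1"), ("b", "z2")}
r1_v = {("c", "w1")}
r2_v = {("d", "y9")}
r1_t1, r2_t1 = derive_determinized(r1_u, r2_u, r1_v, r2_v, y="y0")
```

With these sample values, the first relation at t1 collects the X values contributed by both incoming transitions, and the second collects the (X, Y) pairs from both.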

8 Correctness of Automata Construction

We consider only monotonic operators: (1) propagation, (2) conjunction, (3) disjunction, (4) existential, (5) relative, (6) relative+ with attributes, (7) linear recursion, (8) conrel+, and (9) mask predicates.

Monotonicity. Before getting into a proof of correctness of the determinization procedure, a few observations are in order. Given an input relation to an expression $E$, the following describes a correct execution of $E$'s automaton $M$: let the data structure associated with the start state be the cartesian product of the domains of the attributes of $E$. At the accept states, take the intersection of the given input relation with the accept relation produced by the automaton $M$. By considering the outermost operators in expression $E$ individually, it is easy to see that this restriction on the accept relation can be "pushed inside" for certain operators. This is exactly what has been done in the automata constructions described above.

If a non-deterministic automaton could simultaneously reach multiple accept states, then the overall accept relation must be obtained as the union of the accept relations of these individual states. The reason is that each tuple in an accept relation is derived from one "justification" for the occurrence of the specified composite event. If multiple accept states are reached, we have more justifications. In an equivalent deterministic automaton, there is one state representing the combination of accept states. The accept relation at this state must be obtained as the union of the accept relations of its components.

Definition 8.1 (Acceptance in a non-deterministic automaton): Given a non-deterministic automaton $M$ for event expression $E(X)$, and an input history $h$, let there be $k$ accepting paths to potentially accepting states $t_1, \ldots, t_m$, with multiple paths to the same state being possible. Let the accepting relations computed along each such path be $A_1, \ldots, A_k$. Then the event $E(X)$ is accepted at the last event in history $h$ with the accepting relation $A_1 \cup \cdots \cup A_k$. □

Theorem 8.1 (Correctness of Determinization): Let $M$ be a correct non-deterministic automaton for the expression $E(X)$, and let $h$ be an input history. Let there be $k$ paths to (potentially accepting) states $t_1, \ldots, t_m$ from the start state on history $h$, and let the accept relations associated with the states $t_1, \ldots, t_m$ along the $k$ paths be $A_1, \ldots, A_k$. Let $D$ be the deterministic automaton obtained by determinizing automaton $M$ by the algorithm given above. Then,

1. There is a path in automaton $D$, on history $h$, from the start state to a state $g$ containing substates $t_1, \ldots, t_m$.

2. The accept relation on state $g$ is

$$A_g = A_1 \cup \cdots \cup A_k$$

The accept relation on state $g$ may have arity 0, in which case the accept relation on state $g$ is a boolean. □
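To make the statement of Theorem 8.1 concrete, the following Python sketch compares a path-by-path evaluation of a small non-deterministic automaton with a subset-style evaluation that keeps one relation per substate and unions on merge. The automaton is a toy assumption (roughly "some a(X) followed later by b"); the state names `s`, `p`, `f` and the update functions `keep` and `record` are hypothetical and not part of the paper's construction.

```python
from collections import defaultdict

def keep(rel, attr):
    """Carry the relation forward unchanged."""
    return set(rel)

def record(rel, attr):
    """Start a new justification from the attribute of the current event."""
    return {(attr,)}

# (source, target, symbol, update): a toy non-deterministic automaton.
TRANSITIONS = [
    ("s", "s", "a", keep),
    ("s", "s", "b", keep),
    ("s", "p", "a", record),
    ("p", "p", "a", keep),
    ("p", "f", "b", keep),
]
ACCEPTING = {"f"}

def run_paths(history):
    """Follow every path separately, then union the accept relations A_1..A_k."""
    paths = [("s", frozenset())]
    for symbol, attr in history:
        paths = [
            (dst, frozenset(update(rel, attr)))
            for state, rel in paths
            for src, dst, sym, update in TRANSITIONS
            if src == state and sym == symbol
        ]
    result = set()
    for state, rel in paths:
        if state in ACCEPTING:
            result |= rel
    return result

def run_subset(history):
    """Keep one relation per reachable substate; union when paths merge."""
    current = {"s": set()}
    for symbol, attr in history:
        nxt = defaultdict(set)
        for src, dst, sym, update in TRANSITIONS:
            if src in current and sym == symbol:
                nxt[dst] |= update(current[src], attr)
        current = dict(nxt)
    result = set()
    for state, rel in current.items():
        if state in ACCEPTING:
            result |= rel
    return result

history = [("a", 1), ("a", 2), ("b", None)]
same = run_paths(history) == run_subset(history)
```

On this history the two evaluations agree: both report the attribute values of the two earlier `a` events at the final `b`. The agreement depends on every update function distributing over union, which is the monotonicity property the section relies on.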

Theorem 8.2 (Correctness of Nondeterministic Automata Construction): Given correct deterministic automata $M_F$ and $M_H$ for expressions $F$ and $H$ of $n$ or fewer operators, let $M$ be the non-deterministic machine for an expression $E$ obtained by applying one operator to $F$ and $H$, as described in previous sections. Let there be $k$ accepting paths in $M$ on a history $h$, computing accept relations $A_1, \ldots, A_k$. Then the semantics of expression $E(X)$ requires the event expression to be satisfied for the relation $A_1 \cup \cdots \cup A_k$. □

Theorem 8.3 (Invocation Monotonicity): Let $M_D$ be the deterministic automaton constructed for an expression $E(X)$. Let the automaton $M_D$ be used in a larger non-deterministic automaton, and let two paths of the larger non-deterministic automaton, operating over some history $h$, enter $M_D$, perhaps at different points in the history. Let the two paths, when evaluated separately, accept relations $R_1$ and $R_2$ at a point $p$ in the history. Now consider the following mixed computation of the two paths. Each path has a token, $t_1$ and $t_2$ respectively, carrying the state of machine $M_D$ where the path resides. At each point in the history, each token advances, and relations are computed at the new state. It may happen that at some point in the history the two tokens enter the same state $s$. If they do, the data structures computed by the two tokens at the state $s$ are unioned. From that point on, the two tokens proceed as one. Let the accept relation for the computation be the accept relation of the state reached at the accepting point $p$ in the history by the two tokens jointly, or the union of the accept relations of the two different states reached by the two tokens (if they never meet). Then the accept relation due to this mixed computation is equal to $R_1 \cup R_2$. □
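The two-token computation of Theorem 8.3 can be illustrated with a small Python sketch. It assumes a hypothetical three-state deterministic machine whose updates (pairing with an event attribute, copying, projecting) all distribute over union; the transition table `DELTA`, the entry positions, and the input relations are invented for illustration only.

```python
def pair_with_attr(rel, attr):
    """Append the event attribute to every tuple (a product with one value)."""
    return {t + (attr,) for t in rel}

def keep(rel, attr):
    """Carry the relation forward unchanged."""
    return set(rel)

def project_first(rel, attr):
    """Project every tuple onto its first column."""
    return {t[:1] for t in rel}

# Deterministic machine: (state, symbol) -> (next state, update). State 2 accepts.
DELTA = {
    (0, "a"): (1, pair_with_attr),
    (0, "b"): (0, keep),
    (1, "a"): (1, keep),
    (1, "b"): (2, project_first),
    (2, "a"): (2, keep),
    (2, "b"): (2, keep),
}

def run_single(history, entry, rel):
    """Evaluate one token that enters state 0 just before history[entry]."""
    state, rel = 0, set(rel)
    for symbol, attr in history[entry:]:
        state, update = DELTA[(state, symbol)]
        rel = update(rel, attr)
    return state, rel

def run_mixed(history, entries):
    """Evaluate all tokens together; tokens in the same state are unioned."""
    tokens = {}  # state -> relation carried by the (merged) token in that state
    for pos, (symbol, attr) in enumerate(history):
        if pos in entries:
            tokens[0] = tokens.get(0, set()) | set(entries[pos])
        moved = {}
        for state, rel in tokens.items():
            nxt, update = DELTA[(state, symbol)]
            moved[nxt] = moved.get(nxt, set()) | update(rel, attr)
        tokens = moved
    return tokens

history = [("a", 1), ("a", 2), ("b", 0)]
state1, r1 = run_single(history, 0, {("u",)})
state2, r2 = run_single(history, 1, {("v",)})
mixed = run_mixed(history, {0: {("u",)}, 1: {("v",)}})
```

Here the two tokens meet in state 1 after the second event and proceed as one; the relation they jointly reach at the accepting state equals the union of the relations computed by the two separate runs.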

The proof for all three theorems is given in the Appendix.

9 Related Work

In the Alert system [SPAM91], basic events are inserts/deletes/updates to tables that have been declared active. Composite events are SQL queries defined over active tables. The attributes of the events are the values of the tuple inserted/deleted/updated. There is no notion of time, and unless time is coded as one of the attributes of a relation, the queries cannot relate the temporal occurrence of events. The paper uses an incremental evaluation technique to recompute queries whenever a change occurs to active tables. When recomputing the queries, both the log of changes made to the active tables since the last time the query was recomputed and the current state of the database must be available. There is no way to refer to past tuples that may have been deleted or changed. (Again, if we chose to store all deleted tuples in a separate relation, we could reference them in the query.)

Our formalism is different. There is a temporal order between events, and this order can be used in specifying composite events. The events and their attributes are not all assumed to be stored in a relation in the database so as to be available to recompute queries. We automatically determine which events and attributes are relevant and need to be stored, and store only those (during the evaluation of the event expression). The incremental computation done at each event occurrence is relatively simple: a single transition in a finite automaton along with a single associated relational operation.

In the Starburst rule system [WF90], there is a fixed set of attributes: the deleted, inserted, and updated tuples. These attributes are shared between all events of similar type, making the attributes more like global variables than attributes of specific events. Further, the Starburst rule system only considers basic events. It does not have a concept of temporal sequences of events.

In [CM91] the problem of attributes with event expressions is introduced.
Rather than compute all possible sets of attribute values with which a composite event can be satisfied, the authors use some pre-specified rules to select exactly one execution path to justify a composite event. While the solutions presented there may be efficient in some cases, no general technique for creating such automata is given.

In [Ch92] Chomicki has investigated a language called MTL (metric temporal logic), which is essentially past temporal logic augmented with an ability to refer to database states at time points $0, 1, \ldots$. Propositional temporal logic is closely related to regular expressions without the ability to count [Em90]. The language in [Ch92] is not propositional. (However, its claim for being "historyless" actually holds when there is a fixed set of attribute values, which means that the problem is essentially propositional.) Our language is also closely related to regular expressions (finite automata) and is non-propositional. For the propositional case our implementation is optimal. For the general case we keep sufficient information to be able to trigger; the optimizations we present can significantly reduce the amount of such information. Like the solution in [Ch92], we also keep auxiliary data structures to retain crucial information. Unlike [Ch92], our language has counting ability, as displayed by relative+ and linear recursion. For instance, we can express periodic triggers, such as "every tenth withdrawal," whereas [Ch92] cannot.

There are numerous works on temporal issues in databases (see [MS91]). One direction that is similar to ours is introducing temporal operators. As a recent example of work in this direction, Gabbay and McBrien [GM91] use a variant of temporal logic called US logic that extends first-order logic with the modal operators Until and Since. The language is to be applied to historical databases: a series of relational databases with a distinguished relation time. They describe a temporal relational algebra TRA for capturing US logic and discuss its encoding within relational algebra. A temporal version of SQL and its encoding in standard SQL are described as well, and its encoding of TRA is explored. No query optimization or incremental evaluation is described. (We note that once temporal operators are expressed within "conventional" query languages, various view maintenance techniques may prove useful.)

We ourselves have previously done work on event expressions.
In [GJS92a] we presented our (propositional) model for defining composite event expressions, and suggested that attributes would be useful, without a detailed study of semantics or implementation. In [GJS92b] we discuss the integration of composite event expressions with Ode, and study the relationship between triggers with composite events and transactions. The current paper assumes the work in these two previous papers, and focusses on the issues in using events with attributes.

10 Conclusions

Active databases are rapidly gaining popular favor, and the specification of events at which to check and fire triggers in an active database is likely to be a subject of growing importance. In this paper, we studied the specification of composite event expressions where the events have attributes. Our contributions include the definition of a precise semantics for event expressions with attributes (even in the presence of linear recursion), and the development of an efficient implementation technique to detect such composite events.

Whereas event expressions without attributes are equivalent in expressive power to regular expressions, and can conveniently be mapped into finite automata, this is not true once events are allowed to have attributes. We have augmented finite automata with relations in this paper to implement event expressions with attributes. Using relations as the auxiliary data structure facilitates integration of the triggering system into a database.

We believe that composite events are a powerful mechanism and expect to see their widespread use in the active databases of tomorrow. However, the utility of composite events without attributes is quite limited, as we have experienced first hand through our prototype implementation of an event triggering system at AT&T Bell Laboratories. In this paper, we have taken the first steps towards making practical a truly useful composite event specification facility.

References

[BM91] Catriel Beeri and Tova Milo. A model for active object oriented database. In Proceedings of the Seventeenth International Conference on Very Large Databases (VLDB), pages 337–349, Barcelona, Spain, September 3-6 1991.

[Ch92] Jan Chomicki. Real-time integrity constraints. In Proceedings of the Eleventh Symposium on Principles of Database Systems (PODS), pages 274–281, San Diego, CA, June 2-4 1992. ACM SIGACT-SIGMOD-SIGART.

[CM91] S. Chakravarthy and D. Mishra. An event specification language (Snoop) for active databases and its detection. Technical Report CIS TR-91-23, University of Florida, 1991.

[DAJ91] Shaul Dar, Rakesh Agrawal, and H. V. Jagadish. Optimization of generalized transitive closure. In Proceedings of the Seventh IEEE International Conference on Data Engineering, Kobe, Japan, February 1991.

[DBB+88] Umeshwar Dayal, Barbara Blaustein, Alex Buchmann, U. Chakravarthy, Meichun Hsu, Rivka Ladin, Dennis R. McCarthy, Arnon Rosenthal, S. Sarin, Michael J. Carey, Miron Livny, and R. Jauhari. The HiPAC project: Combining active databases and timing constraints. ACM SIGMOD Record, 17(1):51–70, March 1988.

[DHL91] Umeshwar Dayal, Meichun Hsu, and Rivka Ladin. A transaction model for long-running activities. In Proceedings of the Seventeenth International Conference on Very Large Databases (VLDB), pages 113–122, Barcelona, Spain, September 3-6 1991.

[Em90] E. A. Emerson. Temporal and modal logic. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, volume B, pages 995–1072. Elsevier Science Publishers (North Holland), 1990.

[GJ91] Narain Gehani and H. V. Jagadish. Ode as an active database: Constraints and triggers. In Proceedings of the Seventeenth International Conference on Very Large Databases (VLDB), pages 327–336, Barcelona, Spain, September 3-6 1991.

[GJS92a] Narain Gehani, H. V. Jagadish, and Oded Shmueli. Composite event specification in active databases: Model and implementation. In Proceedings of the Eighteenth International Conference on Very Large Databases (VLDB), pages 327–338, Vancouver, Canada, August 23-27 1992.

[GJS92b] Narain Gehani, H. V. Jagadish, and Oded Shmueli. Event specification in an active object-oriented database. In Proceedings of ACM SIGMOD 1992 International Conference on Management of Data, pages 81–90, San Diego, CA, June 2-5 1992.

[GM91] D. Gabbay and P. McBrien. Temporal logic and historical databases. In Proceedings of the Seventeenth International Conference on Very Large Databases (VLDB), pages 423–430, Barcelona, Spain, September 3-6 1991.

[LLPS91] Guy M. Lohman, Bruce Lindsay, Hamid Pirahesh, and K. Bernhard Schiefer. Extensions to Starburst: Objects, types, functions, and rules. Communications of the ACM, 34(10):94–109, October 1991.

[MD89] Dennis R. McCarthy and Umeshwar Dayal. The architecture of an active database management system. In Proceedings of ACM SIGMOD 1989 International Conference on Management of Data, pages 215–224, Portland, OR, May 1989.

[MPR90] Inderpal Singh Mumick, Hamid Pirahesh, and Raghu Ramakrishnan. The magic of duplicates and aggregates. In Proceedings of the Sixteenth International Conference on Very Large Databases (VLDB), pages 264–277, Brisbane, Australia, August 13-16 1990.

[MS91] L. E. McKenzie, Jr. and R. T. Snodgrass. Evaluation of relational algebras incorporating the time dimension. ACM Computing Surveys, 23(4):421–500, December 1991.

[Mum91] Inderpal Singh Mumick. Query Optimization in Deductive and Relational Databases. PhD thesis, Stanford University, Stanford, CA 94305, USA, December 1991. Technical Report No. STAN-CS-91-1400. Also available from University Microfilms International, 300 N. Zeeb Road, Ann Arbor, MI 48106. (313)761-4700.

[SK91] Michael Stonebraker and Greg Kemnitz. The POSTGRES next-generation database management system. Communications of the ACM, 34(10):78–93, October 1991.

[SPAM91] Ulf Schreier, Hamid Pirahesh, Rakesh Agrawal, and C. Mohan. Alert: An architecture for transforming a passive DBMS into an active DBMS. In Proceedings of the Seventeenth International Conference on Very Large Databases (VLDB), pages 469–478, Barcelona, Spain, September 3-6 1991.

[SSU91] Abraham Silberschatz, Michael Stonebraker, and Jeffrey D. Ullman. Database systems: Achievements and opportunities. Communications of the ACM, 34(10):110–120, October 1991.

[WF90] Jennifer Widom and Sheldon J. Finkelstein. Set-oriented production rules in a relational database system. In Proceedings of ACM SIGMOD 1990 International Conference on Management of Data, pages 259–270, Atlantic City, NJ, May 23-25 1990.

A Correctness Proof

We prove Theorem 8.1, Theorem 8.2, and Theorem 8.3 together.

Proof (all three theorems): The proof is by induction over the number of operators in expression $E(X)$. The inductive hypothesis is that a correct determinized automaton is available for every expression with $n$ operators or fewer. We need to prove that adding the $(n+1)$st operator followed by determinization preserves correctness (that is, all three theorems are true for the expression $E(X)$ with $n + 1$ operators). The monotonicity conditions are important during the proof that the two-token computation does not mix up the results of the two computations when the tokens reach a common state and the data structures along the two paths get unioned.

Basis, $m = 0$: The automaton for the base events $a(X)$ is correct. The automaton constructed for $a(X)$ is deterministic, and there is only one path from the start state on any history $h$. Thus the Nondeterministic Construction and Determinization Theorems hold trivially. The Invocation Monotonicity Theorem requires a look at the automaton for $a(X)$. The two paths can cause a union at states 1, 2, or 3. The accept relation (there is only one, at state 3) is monotonic in the relations kept at the various states, so the two-token computation would generate a union accept relation. The propagation automaton is also deterministic, so it satisfies Theorems 8.1 and 8.2. The accept relation at state $f$ is monotonic with respect to all the relations kept in the automaton, so taking a union at any stage during the two-token computation will cause the accept relation to be unioned.


Inductive Step, $m = n + 1$: We consider each operator one by one. Let $M_F(Y)$ and $M_H(Z)$ be the correct deterministic automata for expressions $F$ and $H$ containing at most $n$ operators each. Let $E(X)$ be built by applying one more operator.

(2) Conjunction and (3) Disjunction do not introduce any non-determinism, and satisfy Theorems 8.1 and 8.2 trivially. Theorem 8.3 is satisfied since the two machines $M_F$ and $M_H$ work independently all the time, and the accept relation of $M_E$ is the union or intersection of the two component accept relations at each state. By the inductive hypothesis, the accept relations of $M_F$ and $M_H$ are correctly unioned during the token computation, so machine $M_E$ will also correctly union its own accept relation.

(4) Existential quantification: $E(X) \equiv (\exists Y) F(X, Y)$. The automaton $M_E$ is non-deterministic, but there is still only one path to the accept state on any one given history $h$. Why? By induction, there is exactly one path to the one (maybe accepting) state $t$ of $M_F$, with a correct accept relation $F(X, Y)$. Since there is an $\epsilon$ transition in $M_E$ from state $t$ to a new state $t'$ whose accept relation is $\pi_X F(X, Y)$, the semantics is preserved (Theorem 8.2 holds). Determinization simply merges the new state $t'$ with the accept state $t$ of $M_F$, creating state $g$, with an accept relation equal to the accept relation of $t'$. Since there was only one path on history $h$ to state $t$ in automaton $M_F$, there is only one path to $g$ in deterministic machine $D$, and the accept relation on $g$ includes the accept relation in non-deterministic machine $M$ (Theorem 8.1 holds). Consider a two-token computation over $M_E$. Each computation will derive an input relation equal to $\mathit{dom}(Y)$ for $M_F$ and evaluate $F$, which will compute an accept relation that is a union of the two separate computations. $M_E$ simply takes a projection of the accept relation of $M_F$, and so will have the two-token consistency property.

(5) relative: $E(X) \equiv \mathit{relative}(F_1(X), F_2(X))$. An expression with any different variable pattern is put into the above form by taking a conjunction with the propagation automaton.
$M_{F_1}$ and $M_{F_2}$ are correct deterministic automata for $F_1$ and $F_2$. Given a history $h$, for each prefix subhistory $h'$ there is exactly one path to one (maybe accepting) state $t_1$ of $M_{F_1}$. There is an $\epsilon$ transition from state $t_1$ to the start state $s_2$ of automaton $M_{F_2}$, with the relation $F_1(X)$ for the subhistory passed on to automaton $M_{F_2}$. For the corresponding suffix history, there is exactly one path from $s_2$ to a (maybe accepting) state $t_2$ of $M_{F_2}$, with the correct accept relation $A_{t_2}(X)$. There may be a total of $k = \mathit{length}(h)$ paths from the start state $s_1$ of $M_{F_1}$ to (possibly accepting) states $t_{2_1}, \ldots, t_{2_k}$ (not all distinct) of $M_{F_2}$. In this case, the semantics is correctly defined as $A_{t_{2_1}} \cup \cdots \cup A_{t_{2_k}}$ (Theorem 8.2 holds). After determinizing, two or more distinct paths can lead to the same state of $M_{F_2}$ at the same time if they enter the start state $s_2$ at separate times. Along these paths, the input relations to $M_{F_2}$ may be different. Since $M_{F_2}$ satisfies the two-token computation property, the accept relation for $M_{F_2}$ is unioned correctly across all these paths. The accept relation for $M_E$ is equal to the accept relation for $M_{F_2}$, and is thus computed correctly (Theorem 8.1 holds). Let $M_E$ be evaluated with two tokens. The component machines and their data structures work correctly, unioning their accept (and intermediate) relations to derive an accept relation that is the union of the accept relations along the two independent path evaluations. The only new operations we introduce during the construction for relative are (1) relation copy, from the accept states of $M_{F_1}$ to the start state of $M_{F_2}$, and (2) unions of relations during determinization. Both of these operations are monotonic with respect to union of their arguments, so taking an earlier union does not affect the computation of either automaton.

(6) relative+: $E(X) \equiv \mathit{relative}^{+}(F(X))$. $M_F$ is a deterministic automaton for $F$. Given a history $h$, for each prefix subhistory $h'$ there is exactly one path to one (maybe accepting) state $t_1$ of $M_F$.
There is an $\epsilon$ transition from state $t_1$ to the start state $s_1$ of automaton $M_F$, with the accept relation of $M_F$ being passed as the input relation into $s_1$. For the corresponding suffix history, if we look at a prefix $h''$, there is exactly one path from $s_1$ to a (maybe accepting) state $t_1'$ of $M_F$, and so on. Thus, on the history $h$, there may be a total of $k = f(\mathit{length}(h))$ paths from the start state $s_1$ of $M_F$ to (possibly accepting) states $t_1, \ldots, t_k$ (not all distinct) of $M_F$, going around the loops created by adding the $\epsilon$ transition. In this case, the expression relative+(F(X)) should accept with the relation $A_{t_1} \cup \cdots \cup A_{t_k}$, exactly as Theorem 8.2 requires. After determinizing, two or more distinct paths can lead to the same state of $M_F$ at the same time if they enter the start state $s_1$ at separate times. Along these paths, the input relations to $M_F$ may be different. Since $M_F$ satisfies the two-token computation property, the accept relation for $M_F$ at the end of the current iteration is unioned correctly across all these paths. For the next iteration, the existing paths in automaton $M_F$ and the new paths starting from the accepting states with the accepting relations coincide. Again, by the two-token computation property, the automaton $M_F$ computes the correct accept relations at every point in the history where the current iteration may end. Since each iteration works correctly, and the accept state of $M_E$ is equal to the accept state of $M_F$, at every point in the history the accept relation for $E(X)$ is computed as the union of the accept relations for $F(X)$ across all iterations (Theorem 8.1 holds). As for relative, the only operations we introduce into the machine for $F$ are (1) relation copy (on the $\epsilon$ transition), and (2) union (during determinization). Both of these operations are monotonic with respect to union of their arguments, so taking an earlier union does not affect the computation of either automaton.

(7) Linear recursion: The automata construction is decomposed into several steps, each of which must be proven correct. Two steps involve disjunction and relative alone, and are thus correct by the arguments above. One step involves a general form of relative+, where attributes are computed differently than for the standard relative+ expression. For this step, Theorems 8.2 and 8.1 hold using exactly the same arguments as for standard relative+. Theorem 8.3 holds since the additional operations of join and projection are also monotonic with respect to their arguments.
(8) conrel+: For incrementally computable aggregation functions, conrel+ can be expressed using linear recursion and an arithmetic function, such as +, or a foreign C++ function. Both these operations are monotonic with respect to union of the number of arguments they are passed, so the theorems hold.

(9) Predicate testing (masking): $E(X) \equiv F(X) \mathbin{\&} p(X)$. The automaton stays deterministic, so Theorems 8.2 and 8.1 hold. The predicate tests work on each tuple in the accept relation of $F(X)$, so masking is monotonic, and Theorem 8.3 holds. □
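As a reading aid, the monotonicity arguments in this proof appear to rest on the fact that the relational operators involved distribute over union. The identities below are standard relational algebra and are not restated from the paper; $R_1$, $R_2$, $S$, and the predicate $p$ are illustrative symbols introduced here.

```latex
\begin{align*}
\sigma_p(R_1 \cup R_2) &= \sigma_p(R_1) \cup \sigma_p(R_2) && \text{(mask predicate)} \\
\pi_X(R_1 \cup R_2)    &= \pi_X(R_1) \cup \pi_X(R_2)       && \text{(existential quantification)} \\
(R_1 \cup R_2) \bowtie S &= (R_1 \bowtie S) \cup (R_2 \bowtie S) && \text{(join)} \\
(R_1 \cup R_2) \times S  &= (R_1 \times S) \cup (R_2 \times S)   && \text{(product with an event attribute)} \\
(R_1 \cup R_2) \cap S    &= (R_1 \cap S) \cup (R_2 \cap S)       && \text{(conjunction)}
\end{align*}
```

Because each operator satisfies such an identity, unioning the data structures of two tokens before applying the operator gives the same result as applying it to each token separately and unioning afterwards.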
