Dealing with deviations in DBMSs: an approach to revise consistency constraints
Elisabetta Di Nitto
Letizia Tanca
CEFRIEL - Politecnico di Milano, Via Emanueli, 15, 20126 Milano (Italy)
Università di Verona, Ca'Vignal, Strada Le Grazie 1, Verona (Italy)
[email protected] [email protected]
Abstract
Information systems are used to support the execution of business processes. They are usually developed on top of database management systems (DBMSs), which store all data used in the business process. Consistency constraints on the database schema reflect the policies and procedures adopted in the business process: they are defined and enforced to guarantee system correctness. During system operation, some constraints may become obsolete because of changes in the procedures the database is supporting or because of incomplete information introduced during the design of the system itself. In both cases, we say that the business process is deviating from its model, represented by the consistency constraints. In this paper we present a semi-automatic approach for updating consistency constraints when they become obsolete: information on constraint violations that occurred during database operation is collected and used to identify new acceptable constraints. The goal of the approach is to provide support during system operation, by prompting the database administrator with a set of possible constraint modifications, from which she/he can choose the appropriate one.
1 Introduction
Information Systems (ISs) support users in the execution of office and business procedures and in the management of large amounts of data. Current ISs are generally built on top of DataBase Management Systems (DBMSs), which provide the infrastructure to store, protect, and retrieve data, and to define and check consistency constraints. Consistency constraints should reflect the policies and procedures that humans in the business process are supposed to follow during IS operation: their enforcement guarantees that the data contained in the database always respect the semantic rules established at design time. These rules, however, may change over time, for example because of modifications occurring in the organization, or because of the adoption of new technologies. When this happens, some database constraints may become out of date and be frequently violated by database operations that would be allowed according to the new rules. In this case, we say that the business process is deviating from the model represented by the consistency constraints. If this deviation is simply allowed by the system, an inconsistency between the database state and the consistency constraints occurs. Some work has been done in database research on the problem of managing this kind of inconsistent knowledge base. In all the approaches we are aware of, a constraint violation is considered an exceptional situation that can be temporarily tolerated but, eventually, has to be fixed in some way by modifying the database state: in other words, consistency constraints are considered to be always correct, and are never modified.
We argue that this approach does not take into account the high dynamicity of business processes nor the difficulty of modeling them as sets of consistency constraints. In fact, constraints may still be systematically violated by database updates performed according to a deviating business process. To deal with these situations, it is not effective to skip constraint evaluation, nor to manage deviations as exceptional cases; instead, constraints have to be modified according to the new business process situation. In this paper we propose an approach for pursuing semi-automatic constraint modification to address permanent deviations: we collect data on constraint violations, and use them to identify permanent deviations; in addition, we provide a general syntactic method for constraint modification. We suppose that constraints are defined as part of the database schema, and that they are evaluated each time an operation that modifies the database is performed. If some constraint is not satisfied, either some repairing actions are performed, according to the approaches discussed in Section 1.1, or the operation that violated the constraint is simply aborted. In both cases, however, data on the constraint violation are collected. This information is used by a computerized tool to evaluate possible constraint modifications. The tool does not have knowledge of the semantics of the database, and is able to perform just syntactic reasoning. Clearly, this may result in the identification of modifications that are not significant in the application domain in which the database system is used. But the tool is not meant to automatically substitute humans in the constraint reengineering task. It only provides the database administrator with some kind of "propositive support" for the constraint modification activity.
The paper is organized as follows: Section 1.1 briefly presents some related work, Section 2 introduces some basic definitions, Section 3 presents our approach to automatic identification of possible constraint modifications, Section 4 discusses the criteria we use to assess the need for a constraint modification, and to evaluate the available alternatives. Finally, Section 5 briefly describes the prototype we are developing to evaluate our approach, and Section 6 draws some conclusions and indications for future research initiatives.
1.1 Related Work
As we outlined before, other contributions to the database constraint violation problem focus on detecting, controlling, and fixing the deviations, but do not address the constraint modification issue. [CFPT94] defines a semi-automatic mechanism for recovering from situations incompatible with the consistency constraints. The basic idea is that, when some constraints are violated during transaction execution, instead of aborting the transaction, some repairing actions are performed that bring the database to a new consistent state. Similar approaches have been adopted also by [STSW93, ML91, UD90]. [Bor85] introduces some linguistic mechanisms that can be used to define exception handlers activated when some constraints are violated. An exception handler can temporarily suspend the constraint evaluation in order to tolerate the presence of exceptional data. A mechanism for maintaining the history of the exceptional data is also provided. [Bal91] defines the concept of polluted datum. A datum is polluted if it has been modified by some operations that caused a constraint violation. Polluted data are marked, and only special transactions are allowed to access them. In the artificial intelligence domain, the belief or theory revision field deals with the general problem of revising a knowledge base [MS88, Rae93]. The underlying idea is that if a knowledge base is inconsistent, it has to be revised by modifying some beliefs. Consequently, all the propositions derived from the unbelieved fact have to be retracted. Our approach tailors theory revision principles to the specific problem of constraint modification. In particular, constraints represent the theory to be revised, and the database states in which they are violated, and that reflect permanent deviations, are the input to start theory revision.
CREATE TABLE CITIZEN
  First_Name: String;
  Surname: String;
  Age: Integer;
END TABLE
Figure 1: Type constraints on CITIZEN attributes.
2 Database constraints
Database constraints define semantic relationships that must hold between entities of the real world described in the database. For example, the formula:
$$\forall E_1, E_2, Sal_1, Sal_2\; (manager(E_1, E_2) \land salary(E_1, Sal_1) \land salary(E_2, Sal_2) \rightarrow Sal_1 > (Sal_2 + 30/100 \cdot Sal_2)) \tag{1}$$
requires that managers earn more than thirty per cent above the salary of their employees. In [Bor85] these constraints are called logical constraints, to distinguish them from type constraints. A type constraint associates a specific domain with each schema attribute. For instance, in the SQL definition of Figure 1, First_Name and Surname are constrained to be strings, and Age must be an integer. In [CFGG94] two classes of logical constraints are identified: static constraints, which define properties to be verified by any database state, and dynamic constraints, which define properties of state transformations. In this paper we focus on static constraints, described by first order logic formulas in the standard implicative form [CFGG94] (see footnote 1):
$$\forall X_1, \dots, X_p\; (C_1 \land \dots \land C_n \rightarrow \exists Y_1, \dots, Y_q\; (F_1 \lor \dots \lor F_m)) \tag{2}$$
The sub-formulas $C_1 \land \dots \land C_n$ and $F_1 \lor \dots \lor F_m$ are called left hand side (lhs) and right hand side (rhs) of the constraint, respectively. Literals $C_1, \dots, C_n$ and $F_1, \dots, F_m$ may be database predicates or special predicates. Database predicates denote relation schemas occurring in the database schema. Special predicates are any other evaluable predicates, such as comparison expressions. For example, in the constraint defined by Formula 1, $manager(E_1, E_2)$ is a database predicate that corresponds to relation schema manager, while $Sal_1 > (Sal_2 + 30/100 \cdot Sal_2)$ is a special predicate.
To allow constraints to be evaluable, the following restrictions are imposed [Ull82, GT91]:
- The constraint is a closed formula.
- All the universally quantified variables occurring in the right hand side of the constraint also occur in its left hand side.
- Each variable occurring in a literal denoting a special predicate has to be universally quantified and has to appear in at least one database predicate in the left hand side of the constraint.
A constraint has to be evaluated in any database state. All the variables occurring in the constraint may assume values that range over the domains of the database attributes. In particular, a variable that occurs in a database predicate corresponding to a relation schema RS is associated with the domain of the corresponding attribute in RS. For instance, in the constraint expressed by Formula 1, database predicate manager corresponds to relation schema manager, having attributes name1 and name2; variables $E_1$ and $E_2$, therefore, assume values ranging over the domains of these two attributes. A substitution $\sigma: X_1, \dots, X_n \rightarrow a_1, \dots, a_n$ associates to each variable $X_i$, for $1 \le i \le n$, a value $a_i$ that belongs to the domain of the corresponding attribute.
Footnote 1: In [CFGG94] the standard implicative form has been proven to be suitable for modeling most of the relevant database constraints.
Definition 1 (Constraint satisfiability) Let DS be a database schema, $\varphi$ a database constraint described by Formula 2, and $D = (r_1, \dots, r_m)$ the current database state. $\varphi$ is satisfied in D iff for each substitution $\sigma: X_1, \dots, X_p \rightarrow a_1, \dots, a_p$ such that the left hand side of $\varphi$ ($lhs_\varphi$) is true, at least one substitution $\sigma': X_1, \dots, X_p, Y_1, \dots, Y_q \rightarrow a_1, \dots, a_p, b_1, \dots, b_q$ exists such that the right hand side of $\varphi$ ($rhs_\varphi$) is true. $\Box$
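As an illustration of Definition 1, the following Python sketch evaluates Formula 1 on a small in-memory database state. The representation of the state as a dictionary of tuple sets, the sample tuples, and the function names violations_formula1 and is_satisfied are assumptions made for this example only; they are not part of the approach itself.

```python
# Hypothetical database state: each relation is a set of tuples.
state = {
    "manager": {("Tanca", "DiNitto"), ("Tanca", "Massara")},
    "salary": {("Tanca", 2000), ("DiNitto", 900), ("Massara", 900)},
}

def violations_formula1(db):
    """Return the substitutions (E1, E2, Sal1, Sal2) for which the lhs of
    Formula 1 is true and its rhs is false in the state db."""
    found = []
    for (e1, e2) in db["manager"]:
        for (n1, sal1) in db["salary"]:
            for (n2, sal2) in db["salary"]:
                lhs = (n1 == e1 and n2 == e2)
                rhs = sal1 > sal2 + 30 / 100 * sal2
                if lhs and not rhs:
                    found.append((e1, e2, sal1, sal2))
    return sorted(found)

def is_satisfied(db):
    # Formula 1 has no existentially quantified variables, so it is
    # satisfied exactly when no substitution violates it.
    return not violations_formula1(db)
```

Since Formula 1 has an empty list of existential variables, the check reduces to searching for a substitution that makes the lhs true and the rhs false; a constraint with existential variables would additionally need a search for the extended substitution $\sigma'$.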
3 Constraint modification
The goal of the constraint modification approach is to generate new constraints that accept the new procedures and rules that have emerged in the business process. Whenever these procedures and rules are not accepted by the original constraints of the database schema, their application originates a number of violating substitutions. Information on these substitutions is exploited to compute the new constraints. In the present section we discuss how, for any constraint $\varphi$, it is possible to identify a set of constraint modifications using as input information the database schema, the database state, and one of the violating substitutions. A database state is defined by the relations it contains: relations contain information that describes entities of the real world or relationships between entities. An entity in the database is identified by the value of a key. Each relation in which the entity, i.e., its key, occurs describes a subset of the entity's properties. A property associated with an entity may be expressed either by the value of an attribute or by its belonging to a particular relation (sometimes representing a relationship). For example, suppose that a database schema is composed of two relation schemas: person, containing information on citizens (name, age, citizenship), and driver, containing information on car drivers (name, license number), and that tuples <Brown, 21, UK> and <Brown, 12343> belong to relations person and driver, respectively. From these tuples we can deduce the properties of Brown. She/he is a young UK person, and is able to drive a car. A constraint defined on the database schema forces the entities of the database to have some properties. When some entities considered acceptable in the actual application domain do not respect the constraint, the constraint has to be modified.
A possible solution is to introduce in the constraint some additional condition that is respected by the violating entities and that distinguishes them from all the other entities. The additional condition describes a property that holds for the acceptable entities: it may be the value of a particular database attribute, or the fact that an entity belongs to a particular relation. For instance, suppose that the constraint defined on the database schema introduced above states that to drive a car a person must be at least 18 years old:
$$\forall N, A, C, L\; (person(N, A, C) \land driver(N, L) \rightarrow A \ge 18) \tag{3}$$
On the other hand, U.S. citizens are allowed to drive cars even if they are younger than 18. Therefore, the constraint is violated each time an attempt is made to insert information on a "too young" U.S. car driver. To accept these attempts the constraint may be modified by specifying that the car driver should be either at least 18 years old or a U.S. citizen:
$$\forall N, A, C, L\; (person(N, A, C) \land driver(N, L) \rightarrow A \ge 18 \lor C = US)$$
In this new constraint the added disjunct requires that variable C, corresponding to attribute citizenship in the database, has value "US". Suppose now that the database contains another relation, racingCarDriver, that maintains information (name, number of won races, team) on racing car drivers. Assume also that a new law allows racing car drivers to get the driving license with no age and citizenship limits. In this case, we can allow the entities occurring in relation racingCarDriver to be accepted in our database by modifying the constraint as follows:
$$\forall N, A, C, L\; (person(N, A, C) \land driver(N, L) \rightarrow \exists W, T\; (A \ge 18 \lor C = US \lor racingCarDriver(N, W, T)))$$
In this case the constraint is extended by adding a database predicate not occurring in its original form. This predicate describes an additional property held by some car drivers. Generalizing, given a constraint $\varphi$, described by Formula 2, the modified constraint $\varphi'$ may be defined as follows:
$$\forall X_1, \dots, X_p\; (lhs_\varphi \rightarrow \exists Y_1, \dots, Y_q, W_1, \dots, W_t\; (rhs_\varphi \lor NL)) \tag{4}$$
where $W_1, \dots, W_t$ are variables occurring only in NL. Intuitively, the new literal NL represents a property that holds for the acceptable database entities, and can be either a special predicate (in particular, an equality predicate) or a database predicate. Before starting the discussion on the definition of NL, let us formalize the concepts we introduced above. From Definition 1, constraint $\varphi$ described by Formula 2 on a database schema DS is not satisfied in a database D over DS if at least one substitution $\sigma: X_1, \dots, X_p \rightarrow a_1, \dots, a_p$ exists such that $\sigma \models lhs_\varphi$ and no substitution $\sigma': X_1, \dots, X_p, Y_1, \dots, Y_q \rightarrow a_1, \dots, a_p, b_1, \dots, b_q$ exists such that $\sigma' \models rhs_\varphi$. Constraint modification produces a new constraint that is satisfied by the violating substitution and by all the database states that also satisfy the original constraint. Formally, constraint modifications that we consider acceptable are required to satisfy the following condition:
Definition 2 (Constraint acceptability) Given a constraint $\varphi$, a constraint modification, denoted with $\varphi \leadsto \varphi'$, leads to an acceptable constraint $\varphi'$ if $\varphi'$ does not restrict the set of database states accepted by the original constraint $\varphi$:
For all D, if $\varphi$ is satisfied in D then $\varphi'$ is satisfied in D. $\Box$
It is easy to demonstrate that constraint $\varphi'$ defined by Formula 4 satisfies constraint acceptability. Moreover, if $\sigma$ is a substitution that violates the original constraint, and if NL is such that $\sigma \models (lhs_\varphi \rightarrow \exists W_1, \dots, W_t\; NL)$, then $\sigma \models \varphi'$. In the following we discuss how to define NL. We distinguish two cases: NL is a special predicate, and NL is a database predicate.
3.1 NL as special predicate
Given a substitution $\sigma$, consider all the variables occurring in $\sigma$ and the corresponding values assigned to them. The literal NL may be defined as the equality between one of these variables and the corresponding value: $X_i = a_i$.
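Theorem 1 Let DS be a database schema, D a database on DS and $\varphi$ a constraint on DS:
$$\varphi \equiv \forall X_1, \dots, X_p\; (C_1 \land \dots \land C_n \rightarrow \exists Y_1, \dots, Y_q\; (F_1 \lor \dots \lor F_m))$$
Consider a substitution $\sigma: X_1, \dots, X_p \rightarrow a_1, \dots, a_p$ such that $\sigma \not\models \varphi$. For any $j$, $1 \le j \le p$, constraint:
$$\varphi' \equiv \forall X_1, \dots, X_p\; (C_1 \land \dots \land C_n \rightarrow \exists Y_1, \dots, Y_q\; (F_1 \lor \dots \lor F_m \lor X_j = a_j))$$
is acceptable according to Definition 2, and is satisfied by substitution $\sigma$.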
Proof: $\varphi'$ may be written as:
$$\varphi \lor (\forall X_1, \dots, X_p\; (lhs_\varphi \rightarrow X_j = a_j))$$
Therefore $\varphi'$ is true for all the substitutions for which $\varphi$ is satisfied. Moreover, it is also true for substitution $\sigma$: in fact, both $lhs_\varphi$ and $X_j = a_j$ are true for $\sigma$. $\Box$
Example 1 Let a database schema DS maintain information on an organization structure, and let D be a database on DS, composed of the following relations:

employee
name      department
Rossi     100
Bianchi   19
DiNitto   23
Massara   14
Tanca     23

salary
name      amount
Rossi     900
Bianchi   1000
DiNitto   900
Massara   900
Tanca     2000

manager
manName   empName
Tanca     DiNitto
Tanca     Massara

On DS a constraint is given, that is defined by Formula 1. Suppose that the following operation is performed on the database:
Insert(manager, <Rossi, Bianchi>)
In this case, substitution:
$\sigma: E_1, E_2, Sal_1, Sal_2 \rightarrow Rossi, Bianchi, 900, 1000$
violates constraint 1. If we wish to modify the constraint in order to allow the execution of the operation, we can define NL as one of the following literals: $E_1 = Rossi$, $E_2 = Bianchi$, $Sal_1 = 900$, $Sal_2 = 1000$. $\Box$
Remark: From the example above it may be noticed that the number of new entities accepted by the new constraint depends on NL. For instance, $E_1 = Rossi$ exempts manager Rossi (but not the other managers) from the salary requirement with respect to any of his employees (not necessarily Bianchi). $Sal_1 = 900$ exempts not only Rossi but also anyone else who earns 900 (DiNitto and Massara), who could then manage employees with a better salary. Generalizing, if the special predicate $X_i = a_i$ used to modify the constraint is such that $X_i$ corresponds to an attribute which is a key for one of the relations occurring in the lhs of the constraint, then the constraint condition is relaxed only for the entity having $a_i$ as value of the attribute. If $X_i$ is not a key, more than one entity will potentially be compatible with the constraint.
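The effect of choosing a key or a non-key variable for NL can be checked with a short Python sketch. The dictionary encoding of substitutions and the names special_predicate_candidates, satisfies_relaxed and rhs_formula1 are assumptions of this example; the second substitution, in which DiNitto manages Bianchi, is hypothetical and does not occur in Example 1.

```python
def special_predicate_candidates(sigma):
    """One candidate literal Xi = ai for each variable bound by sigma."""
    return [(var, value) for var, value in sigma.items()]

def rhs_formula1(s):
    """Right hand side of Formula 1."""
    return s["Sal1"] > s["Sal2"] + 30 / 100 * s["Sal2"]

def satisfies_relaxed(subst, rhs, nl):
    """True if subst satisfies (rhs OR nl), where nl = (var, value).
    subst is assumed to already satisfy the lhs of the constraint."""
    var, value = nl
    return rhs(subst) or subst[var] == value

# Violating substitution of Example 1.
sigma = {"E1": "Rossi", "E2": "Bianchi", "Sal1": 900, "Sal2": 1000}
candidates = special_predicate_candidates(sigma)

# Hypothetical further substitution: DiNitto (salary 900) manages Bianchi.
other = {"E1": "DiNitto", "E2": "Bianchi", "Sal1": 900, "Sal2": 1000}
```

Every candidate accepts sigma, as stated by Theorem 1, while only the non-key literal on Sal1 also accepts the hypothetical substitution other.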
3.2 NL as database predicate
Given a constraint $\varphi$, a violating substitution $\sigma$, and a database D on a database schema DS, NL is defined as the database predicate $P_{RS_i}$ on a relation schema $RS_i$ that verifies the following conditions:
c.1 It does not occur in the original constraint.
c.2 $\exists\, t \in r_i,\ k_{lhs_\varphi} \in K_{lhs_\varphi} \mid \pi_{k_{lhs_\varphi}} t = \pi_{k_{lhs_\varphi}} \bar{\sigma}$
where $r_i$ is the relation over $RS_i$ in database D and $K_{lhs_\varphi}$ the set of all the keys of the relation schemas corresponding to database predicates occurring in $lhs_\varphi$. $\bar{\sigma}$ is the tuple of values defined by substitution $\sigma$, and $\pi$ is the relational projection operation. The first condition guarantees that the property expressed by predicate $P_{RS_i}$ has not been considered in the original constraint. The second condition guarantees that the entity identified in $\bar{\sigma}$ by the value of key $k_{lhs_\varphi}$ occurs in the relation corresponding to $P_{RS_i}$. Given this definition for NL, the new constraint is:
$$\forall X_1, \dots, X_p\; (C_1 \land \dots \land C_n \rightarrow \exists Y_1, \dots, Y_q, W_1, \dots, W_t\; (F_1 \lor \dots \lor F_m \lor P_{RS_i}(X_j, \dots, X_{j+u}, W_1, \dots, W_t))) \tag{5}$$
where $X_j, \dots, X_{j+u}$ are the variables corresponding in the constraint to the attributes of $k_{lhs_\varphi}$, and $u + t$ is the arity of the predicate $P_{RS_i}$ according to the definition of $RS_i$.
Theorem 2 Let DS be a database schema, where $RS_1, \dots, RS_m$ are the relation schemas defined in DS, let $D = (r_1, \dots, r_m)$ be a database over DS, $\varphi$ be a constraint on DS, and $\sigma: X_1, \dots, X_p \rightarrow a_1, \dots, a_p$ be a substitution violating the constraint. Constraint $\varphi'$, defined by Formula 5, where $P_{RS_i}$ verifies the conditions c.1 and c.2, is acceptable according to Definition 2, and is true for substitution $\sigma$.
Proof. $\varphi'$ may be written as:
$$\varphi \lor (\forall X_1, \dots, X_p\; (lhs_\varphi \rightarrow \exists W_1, \dots, W_t\; P_{RS_i}(X_j, \dots, X_{j+u}, W_1, \dots, W_t)))$$
Therefore, $\varphi'$ is satisfied by all the substitutions that satisfy $\varphi$. Moreover, it is also satisfied by substitution $\sigma$. In fact, formula:
$$\exists W_1, \dots, W_t\; P_{RS_i}(X_j, \dots, X_{j+u}, W_1, \dots, W_t)$$
is true for a $\sigma'$ defined as follows:
$$\sigma': X_1, \dots, X_j, \dots, X_{j+u}, \dots, X_p, W_1, \dots, W_t \rightarrow a_1, \dots, a_j, \dots, a_{j+u}, \dots, a_p, t_1, \dots, t_t$$
where $<a_j, \dots, a_{j+u}, t_1, \dots, t_t>$ is the tuple t that verifies condition c.2. $\Box$
Example 2 Suppose that a database schema DS is composed of the relation schemas person, driver, and racingCarDriver introduced above. Assume also that the constraint on this schema is defined by Formula 3, and that the database is populated as follows:

person
name      age   citizenship
Rossi     40    Italian
Bianchi   15    Italian
Jordan    24    US
Nuvolari  16    Italian
Alesi     17    French

driver
name      licenseN
Rossi     1298934
Jordan    1020

racingCarDriver
name      wonRaces   team
Nuvolari  24         Ferrari
Alesi     12         Ferrari
Jordan    3          Benetton
Suppose that an attempt to execute operation Insert(driver, <Nuvolari, 333>) is performed. In this case substitution:
$\sigma: N, A, C, L \rightarrow Nuvolari, 16, Italian, 333$
does not satisfy the constraint. To modify the constraint, we choose the predicate racingCarDriver that we intuitively selected at the beginning of this section as the literal NL. This predicate, in fact, does not appear in the original constraint, and tuple <Nuvolari, 24, Ferrari> verifies condition c.2. In this case the constraint is relaxed for all the racing car drivers. Notice that also the special predicates N = Nuvolari, A = 16, C = Italian, and L = 333 may be used to define a new constraint that is satisfied by substitution $\sigma$. In the first case the constraint is relaxed just for Nuvolari, in the second for all the persons aged 16, in the third for all the Italian people (without any age limits); finally, in the last case the constraint is relaxed for the person who has license number 333. $\Box$
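Conditions c.1 and c.2 can be checked mechanically on the database of Example 2, as the following Python sketch suggests. The encoding of the state, the function name database_predicate_candidates, and the choice of matching the key by attribute name are assumptions of this example; in general the correspondence between key attributes of different relation schemas has to be derived from the schema.

```python
# Hypothetical encoding of the state of Example 2:
# relation name -> (attribute names, set of tuples).
db = {
    "person": (("name", "age", "citizenship"),
               {("Rossi", 40, "Italian"), ("Bianchi", 15, "Italian"),
                ("Jordan", 24, "US"), ("Nuvolari", 16, "Italian"),
                ("Alesi", 17, "French")}),
    "driver": (("name", "licenseN"),
               {("Rossi", 1298934), ("Jordan", 1020)}),
    "racingCarDriver": (("name", "wonRaces", "team"),
                        {("Nuvolari", 24, "Ferrari"), ("Alesi", 12, "Ferrari"),
                         ("Jordan", 3, "Benetton")}),
}

def database_predicate_candidates(db, used_relations, key_attr, key_value):
    """Relations usable as NL for a violating substitution.
    used_relations: relations already occurring in the constraint.
    key_attr, key_value: a key of the lhs and the value sigma assigns to it."""
    found = []
    for relation in sorted(db):
        attrs, tuples = db[relation]
        if relation in used_relations:        # condition c.1
            continue
        if key_attr not in attrs:             # assumption: keys match by name
            continue
        position = attrs.index(key_attr)
        if any(t[position] == key_value for t in tuples):   # condition c.2
            found.append(relation)
    return found
```

For the substitution of Example 2, database_predicate_candidates(db, {"person", "driver"}, "name", "Nuvolari") yields racingCarDriver, while a person who does not occur in any other relation, such as Rossi, yields no database predicate.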
4 Reasoning on constraint modification
As we outlined in Examples 1 and 2, not all the modifications relax the constraint in the same way. We choose to consider as the best constraint modification (let it be $\varphi \leadsto \varphi'$) the one that makes $\varphi'$ satisfied by the majority of substitutions belonging to the operation history of the system. In this case, in fact, $\varphi'$ is likely to correctly model the new procedures and rules of the business process. For instance, in Example 2 we have identified five constraint modification alternatives that make $\sigma$ accepted. Suppose that also substitution $\sigma_2$ belongs to the violation history of the constraint defined in Example 2:
$\sigma_2: N, A, C, L \rightarrow Alesi, 17, French, 222$
In this case, modifying the constraint based on any of the special predicates identified in the example is not effective, since substitution $\sigma_2$ still does not satisfy any of the new constraints. A reasonable value for NL appears to be predicate racingCarDriver, since in this case both $\sigma$ and $\sigma_2$ satisfy the new constraint. Notice that not all the violating substitutions that were registered for a given constraint correspond to permanent deviations. Indeed, the violation is often caused by an operation that is actually unreasonable, performed by an unskilled or absent-minded operator. We call reasonable substitution a substitution corresponding to a permanent deviation. It is reasonable in the sense that, even if it does not satisfy the original database constraint, it corresponds to a database state that is correct in the new business process situation. Identifying reasonable substitutions requires a deep knowledge of the procedures that are performed in the business process. Therefore, it has to be performed by a human. On the other hand, some reasoning can also be performed automatically.
In fact, it is possible to assume that unacceptable constraint violations, being accidental, do not always occur in the same situations, and therefore do not frequently originate the same violating substitution or violating substitutions having some common pattern. A permanent deviation, instead, would tend to produce similar substitutions. For instance, when the users of the database attempt to insert information about US car drivers, it is likely that they violate the constraint defined by Formula 3, which limits the age of car drivers. All the violating substitutions generated by these attempts are similar. In particular, they have the field citizenship equal to US and the value of field age less than 18. Summarizing, a substitution violating a constraint is considered reasonable in one of the following cases:
i. It occurs more than t times in the violation history of the constraint, where t is a threshold value that can be configured by the system administrator.
ii. One or more attribute values in the substitution occur in other violating substitutions. As in the previous case, it is possible to indicate a minimum number of occurrences.
iii. The database administrator selects the substitution as reasonable.
The identification of reasonable substitutions provides an initial piece of information that is used as input of the algorithm we propose to modify a constraint $\varphi$. This algorithm can be summarized as follows:
Constraint modification algorithm
1. Build set $\mathcal{RS}$ of the reasonable substitutions using criteria from i. to iii.
2. $\forall \sigma \in \mathcal{RS}$ do:
  2.1. Compute the set $\mathcal{C}$ of all the new constraints $\varphi'$ acceptable according to Definition 2 and such that $\sigma \models \varphi'$.
  2.2. $\forall \varphi' \in \mathcal{C}$ do:
    2.2.1. $score_{\varphi'} = 0$
    2.2.2. $\forall \tilde{\sigma} \in \mathcal{RS}$ do:
      2.2.2.1. if $\tilde{\sigma} \models \varphi'$ then $score_{\varphi'} = score_{\varphi'} + 1$
    2.2.3. Add $score_{\varphi'}$ to the set Scores
  2.3. Select from $\mathcal{C}$ the constraint $\hat{\varphi}$ having the highest score ($\forall score_{\varphi'} \in Scores,\ score_{\varphi'} \le score_{\hat{\varphi}}$) and put it in set $\mathcal{C}_\varphi$.
3. Prepare a report for the database administrator that shows all the constraints in $\mathcal{C}_\varphi$ together with their scores.
The algorithm has a constraint $\varphi$ and the violating substitution history as input. The algorithm starts by selecting the substitutions that have to be accepted by the new constraint (they define set $\mathcal{RS}$). Then it selects one of these substitutions and computes the corresponding set $\mathcal{C}$ of acceptable constraints (step 2.1). This is done according to the criteria defined in Section 3. Then, a score is assigned to each constraint in $\mathcal{C}$ (step 2.2). This score depends on the number of reasonable substitutions that satisfy the constraint, and its value ranges from 1 (at least the substitution used for defining this constraint satisfies it) to the cardinality of $\mathcal{RS}$. The constraint having the highest score is selected and inserted in set $\mathcal{C}_\varphi$ (step 2.3). Steps from 2.1 to 2.3 are repeated for each reasonable substitution. When all the substitutions have been considered, the database administrator is provided with a list of constraints, from which she/he can determine the final constraint. Notice that additional support to the database administrator can be provided by analyzing set $\mathcal{C}_\varphi$ in order to automatically synthesize new, more meaningful constraints from the proposed ones.
A simple analysis could consider the case in which the set of constraints selected by the algorithm contains different constraints that require the same variable to assume numeric values that are somehow related, for instance belonging to a certain range: age = 16, age = 17, age = 18, ... In this case, a special predicate like age $\ge$ 16 would cover all the cases above.
The algorithm presented above offers as a result a number of new constraints that more closely reflect the situation of the business process in which the database operates. The update attempts that were not accepted by the DBMS while using the original constraints cannot be automatically accepted when the constraints are modified. In fact, at this time, the state of the database is, in general, different from the state in which these attempts were performed; thus, nothing can be said about their applicability.
As a last remark, notice that the constraint modification algorithm is supposed to be performed off-line, while the DBMS is not executing the normal operations. Thus, interference between the constraint modification procedure and the other operations is not considered in this paper.
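The constraint modification algorithm of this section can be sketched in Python for the constraint of Formula 3. This is a simplified illustration under stated assumptions, not the prototype's implementation: substitutions are encoded as tuples (N, A, C, L), only criterion i. is used to build the set of reasonable substitutions, candidate constraints are represented just by their new literal NL, and the parameter other_relations (a hypothetical map from each relation not used in the constraint to the key values it contains) replaces a real database state. All function names are likewise assumptions of this example.

```python
from collections import Counter

VARS = ("N", "A", "C", "L")   # variables of Formula 3

def rhs(s):
    """Right hand side of Formula 3: A >= 18."""
    return s[1] >= 18

def reasonable(history, t):
    """Step 1, criterion i. only: substitutions seen more than t times."""
    return [s for s, n in Counter(history).items() if n > t]

def candidates(s, other_relations):
    """Step 2.1: candidate literals NL for s, following Section 3."""
    equalities = [("eq", i, s[i]) for i in range(len(VARS))]
    predicates = [("rel", r) for r in sorted(other_relations)
                  if s[0] in other_relations[r]]
    return equalities + predicates

def satisfies(s, nl, other_relations):
    """True if s satisfies (rhs OR NL); s is assumed to satisfy the lhs."""
    if rhs(s):
        return True
    if nl[0] == "eq":
        return s[nl[1]] == nl[2]
    return s[0] in other_relations[nl[1]]

def modify(history, t, other_relations):
    rs = reasonable(history, t)                                 # step 1
    report = {}
    for s in rs:                                                # step 2
        scores = {nl: sum(satisfies(x, nl, other_relations) for x in rs)
                  for nl in candidates(s, other_relations)}     # steps 2.1, 2.2
        best = max(scores, key=scores.get)                      # step 2.3
        report[best] = scores[best]
    return report                                               # step 3
```

With a history in which the substitutions for Nuvolari and Alesi each occur twice and a threshold t = 1, the report contains the single literal racingCarDriver with score 2, in agreement with the discussion of Example 2 above. When several candidates share the highest score, this sketch keeps only the first one found; reporting all of them would be an equally plausible reading of step 2.3.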
5 Implementation issues
We are currently implementing a prototype, based on ORACLE7 [ORA92], to experiment with our approach [Mas95]. The prototype is able to evaluate a constraint, to execute some repairing actions whenever the constraint is violated, and to store violating substitutions. Based on the substitutions that violated a constraint, the prototype executes the constraint modification algorithm discussed in Section 4. To implement constraint evaluation, we use active rules [FT95], which provide a powerful mechanism for this purpose, especially in the case in which some specific actions have to be performed if the constraint is violated (see footnote 2). In our case, active rules are structured as follows:
- The event part is a disjunction of all the events that might cause a constraint violation. These events are any update operation performed on the relations that occur as predicates in the constraint.
- The condition is a query on the database that is obtained by negating the constraint. This way, the condition will select all the violating substitutions of the constraint.
- The action stores the violating substitutions in relation $violating_\varphi$. This relation is defined for each constraint on the database schema. Moreover, the action defines the repairing actions that bring the database to a consistent state. For the moment, we just roll back the transaction that violated the constraint.
During transaction execution, the active rules defined above may be triggered and executed according to the deferred semantics. Their evaluation has to be delayed until the commitment of the transaction. In fact, it is not required that the intermediate states of the database reached during transaction execution are correct with respect to the defined constraints. The prototype we implemented generates the active rules from the constraint definitions and supports active rule execution and constraint modification.
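To make the condition part concrete, the following Python sketch builds the query obtained by negating Formula 3 and runs it on a tiny database. SQLite, through Python's standard sqlite3 module, is used here only as a convenient stand-in for ORACLE7; the table definitions, the sample rows, and the name VIOLATING_SUBSTITUTIONS are assumptions of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (name TEXT PRIMARY KEY, age INTEGER, citizenship TEXT);
    CREATE TABLE driver (name TEXT PRIMARY KEY, licenseN INTEGER);
    INSERT INTO person VALUES ('Rossi', 40, 'Italian');
    INSERT INTO person VALUES ('Nuvolari', 16, 'Italian');
    INSERT INTO driver VALUES ('Rossi', 1298934);
    INSERT INTO driver VALUES ('Nuvolari', 333);
""")

# Condition of the active rule for Formula 3: the lhs (join of person and
# driver on the shared variable N) holds and the rhs (A >= 18) does not.
VIOLATING_SUBSTITUTIONS = """
    SELECT p.name, p.age, p.citizenship, d.licenseN
    FROM person p JOIN driver d ON d.name = p.name
    WHERE NOT (p.age >= 18)
"""

violating = conn.execute(VIOLATING_SUBSTITUTIONS).fetchall()
```

Each row returned by the query is one violating substitution for N, A, C, L, which is exactly what the action part would store in the relation of violating substitutions.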
The high level architecture of the system is shown in Figure 2. It is structured in the following components:
- The Constraint Editor, that supports the database administrator in defining a new constraint having the form of Formula 2.
- The Rule Generator, that generates active rules starting from the constraint definitions.
- The Rule Executor, that triggers and executes the active rules.
- The Constraint Modification Executor, that executes the algorithm we introduced in Section 4. Constraint modification is executed whenever the database administrator requires it. Moreover, it is also activated when the number of violations of a constraint exceeds a specified value.
Footnote 2: For instance, active rules are used in [CFPT94] and [UD90] to enforce database integrity through repairing actions on the database state.
Figure 2: The system architecture (components: Constraint Editor, Rule Generator, Constraint Modification Executor, Rule Executor).
Data used by all the system components are stored in the database. In addition to the relations defined for the application, the database stores the constraints defined by the database administrator, the active rules defined by the rule generator, and the violating substitutions relations. ORACLE7 provides a limited implementation for active rules (they are called triggers). In particular, it does not provide the deferred semantics. Active rules may be evaluated and executed only immediately, after or before the occurrence of the triggering event. We have enriched ORACLE7 by providing a mechanism for implementing the deferred semantics for active rules. This issue is detailed in Section 5.1.
5.1 Implementing the active rules deferred semantics in ORACLE7
Consider an active rule having E as triggering event, C as condition, and A as action. The deferred semantics is obtained by delaying the evaluation of the condition C, triggered by the occurrence of the event E, until the occurrence of some other event (in our case, the commit of the transaction in which event E occurs). To obtain this behavior in ORACLE7, we have decomposed the active rule into two parts, an alarm and a condition/action section (CA section), having the following form, respectively:

    CREATE TRIGGER ActRuleAlarm
    AFTER/BEFORE E
    WHEN TRUE
    BEGIN
        /* record that the CA section must be run at commit time */
        insert into TriggeredActiveRules values ('ActRuleCAsection');
    END

    CREATE PROCEDURE ActRuleCAsection
    DECLARE
        ....  /* declare the local variables used in A */
    BEGIN
        if C then A
    END
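The decomposition can be tried out on a small scale with the following sketch, which replaces ORACLE7 with Python and an in-memory SQLite database. It is a sketch under stated assumptions, not the prototype's code: the relations person and driver, the constraint "every driver is a person", the names ca_for_c1, violating_c1, CA_SECTIONS and deferred_commit are all hypothetical, and the alarm reacts to INSERT on driver only. The alarm is an ordinary immediate trigger, the CA section is a Python function, and deferred_commit plays the role of the redefined commit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (name TEXT);
    CREATE TABLE driver (name TEXT);
    CREATE TABLE TriggeredActiveRules (ca_section TEXT);
    CREATE TABLE violating_c1 (name TEXT);

    -- Alarm: fires immediately on E and only records the CA section name.
    CREATE TRIGGER ActRuleAlarm AFTER INSERT ON driver
    BEGIN
        INSERT INTO TriggeredActiveRules VALUES ('ca_for_c1');
    END;
""")

def ca_for_c1(conn):
    """CA section: evaluate condition C and return violating substitutions."""
    return conn.execute("""
        SELECT d.name FROM driver AS d
        WHERE NOT EXISTS (SELECT 1 FROM person AS p WHERE p.name = d.name)
    """).fetchall()

CA_SECTIONS = {"ca_for_c1": ca_for_c1}

def deferred_commit(conn):
    """Stand-in for commit: run triggered CA sections, then commit or abort.

    Returns True if the transaction was committed, False if it was aborted.
    """
    names = [row[0] for row in conn.execute(
        "SELECT DISTINCT ca_section FROM TriggeredActiveRules")]
    violations = []
    for name in names:
        violations += CA_SECTIONS[name](conn)
    if violations:
        conn.rollback()  # undoes the transaction, alarm records included
        conn.executemany("INSERT INTO violating_c1 VALUES (?)", violations)
        conn.commit()    # the stored violating substitutions survive
        return False
    conn.execute("DELETE FROM TriggeredActiveRules")
    conn.commit()
    return True
```

With this arrangement a transaction may insert a driver before the matching person, because the condition is checked only when deferred_commit is called. A transaction that still violates the constraint at that point is rolled back, while its violating substitutions are kept in violating_c1.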
The alarm is defined as an ORACLE7 trigger. It detects and keeps track of the occurrence of the triggering event E. In particular, it has the same triggering event as the original active rule, but a different condition and action. Thus, it is triggered by all the events that trigger the original active rule. Its action is executed immediately and inserts the name of the corresponding CA section into relation TriggeredActiveRules. The CA section is implemented as an ORACLE7 procedure. It executes the condition and action part of the original active rule.

The execution of the alarm is delegated to the run-time support offered by ORACLE7 for triggers. Thus, during the execution of a transaction, the occurrence of event E is detected and causes the name of the corresponding CA section to be stored in a service relation. The execution of the CA section is delegated to the Rule Executor, which redefines the semantics of the commit statement. In particular, at commit time, it executes the following steps:

1. For any procedure name in relation TriggeredActiveRules, the corresponding procedure is executed.
2. If any of the executed procedures has requested the abort of the transaction, the operations performed by the transaction are aborted, while the operations performed by the deferred executors are preserved.
3. If no deferred executor has requested the abort, the transaction is committed.

The generation of the alarms and the CA sections is performed automatically by the Rule Generator, starting from the constraints defined for the database. The Rule Generator also overcomes an additional restriction that ORACLE7 poses on the form of its triggers. In particular, in ORACLE7 the events that determine the execution of a trigger can include INSERT, DELETE, and UPDATE operations performed on exactly one database relation. Thus, an event part defined as follows:

    AFTER INSERT OR DELETE OR UPDATE ON driver OR person
would not be allowed, since it specifies events on two different relations. In cases like the one above, the Rule Generator builds a trigger for each relation occurring in the disjunction. This would result in a huge number of triggers, depending on the structure and the number of the constraints defined for the database. To limit this number, we observe that, since a relation may occur in several constraints, it can be involved in the definition of the event part of different active rules. Thus, several alarms are likely to have the same event part. They can be merged into a single trigger, whose action part is composed of the action parts of all the components. For instance, the trigger

    CREATE TRIGGER MergedTrigger
    AFTER DELETE OR INSERT OR UPDATE ON SampleRelation
    BEGIN
        insert into TriggeredActiveRules values ('CAForConstr1');
        ....
        insert into TriggeredActiveRules values ('CAForConstrN');
    END
fires whenever an operation on SampleRelation is performed and, as a result, inserts the strings 'CAForConstr1', ..., 'CAForConstrN' into relation TriggeredActiveRules. These strings are the names of the CA sections defined for the constraints Constr1, ..., ConstrN in which the database predicate corresponding to SampleRelation occurs. Basically, MergedTrigger is a multiple alarm that prepares the system to execute all the corresponding CA sections.
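The merging of alarms by relation can be sketched as a grouping step followed by text generation. The Python sketch below is an assumption about how a rule generator might do this, not the prototype's implementation: the function generate_alarms, the input format (a mapping from CA section names to the relations occurring in the corresponding constraint), and the trigger naming scheme Alarm_<relation> are hypothetical. The emitted text follows the schematic trigger form used in this section.

```python
from collections import defaultdict

def generate_alarms(constraints):
    """Build one merged alarm per relation.

    constraints maps a CA section name to the list of relations that occur
    as predicates in its constraint. The result maps each relation to the
    text of a single trigger that records every CA section concerned.
    """
    # Group CA sections by the relation whose updates must wake them up.
    by_relation = defaultdict(list)
    for ca_name, relations in constraints.items():
        for relation in relations:
            by_relation[relation].append(ca_name)

    # Emit exactly one trigger per relation, merging all its alarms.
    triggers = {}
    for relation, ca_names in sorted(by_relation.items()):
        body = "\n".join(
            f"    insert into TriggeredActiveRules values ('{name}');"
            for name in ca_names
        )
        triggers[relation] = (
            f"CREATE TRIGGER Alarm_{relation}\n"
            f"AFTER DELETE OR INSERT OR UPDATE ON {relation}\n"
            f"BEGIN\n{body}\nEND"
        )
    return triggers
```

For two constraints, one over driver and person and one over driver alone, this produces two triggers instead of three, and the trigger on driver records both CA sections.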
6 Conclusion

In this paper we address an aspect of the problem of managing and controlling deviations between an information system and the corresponding business process. In particular, we propose an approach for supporting constraint modifications when permanent deviations occur. During system operation, we suppose the database is always kept consistent, either by simply aborting the operations that caused a constraint violation or by enforcing some repairing actions. In both cases, we collect violating substitutions and use them to generate and assess possible constraint modifications. Then, database constraints can be modified by extending them to represent the additional properties of the entities considered correct in the new situation of the business process.

The criteria we define for identifying constraint modifications rely on the observation that the values of attributes are associated with an entity of the real world, or on the fact that the occurrence of an entity in a relation corresponds to a property that holds for the entity itself. In particular, we add to the right-hand side of the constraint to be modified a new literal, NL, that is either an equality predicate, forcing a database attribute to assume a specified value, or a database predicate, corresponding to a relation of the database.

The approach we propose is simple and can be implemented quite easily. Further studies on this topic concern the improvement of the modification criteria. Another subject of investigation is the use of data mining techniques to classify violating substitutions and to identify convenient constraint modifications based on this classification. Finally, the advantages of adopting less restrictive syntactic forms for constraints are to be examined.

Our approach may also be extended to object-oriented databases [Kim91]. In this case, constraint definition poses some problems related to the inheritance mechanism and to the type migration mechanism. If constraints defined on a class are inherited by the subclasses, they may become obsolete with respect to the properties of the objects of the subclass. In this case, a constraint modification approach would help the system administrator in redesigning the constraints.
Acknowledgments

We would like to thank Giulio Massara, who implemented the prototype of the system, and the anonymous referees, who gave us useful comments and suggestions.
References

[Bal91] R. Balzer. Tolerating Inconsistency. In Proceedings of the 13th International Conference on Software Engineering - ICSE 13. IEEE Computer Society, 1991.
[Bor85] A. Borgida. Language features for flexible handling of exceptions in information systems. ACM Transactions on Database Systems, 10(4), December 1985.
[CFGG94] S. Ceri, P. Fraternali, F. Garzotto, and G. Gottlob. Specification and Management of Database Integrity Constraints through Logic Programming Techniques. Technical report, Politecnico di Milano, P.zza Leonardo da Vinci 32, 20133 Milano, 1994.
[CFPT94] S. Ceri, P. Fraternali, S. Paraboschi, and L. Tanca. Automatic generation of production rules for integrity maintenance. ACM Transactions on Database Systems, 19(3), September 1994.
[FT95] P. Fraternali and L. Tanca. A structured approach for the definition of the semantics of the active databases. ACM Transactions on Database Systems, 1995. To appear.
[GT91] A. Van Gelder and R. W. Topor. Safety and Translation of Relational Calculus Queries. ACM Transactions on Database Systems, 16(2):235-278, June 1991.
[Kim91] Won Kim. Introduction to Object-Oriented Databases. MIT Press, Cambridge, MA, 1991.
[Mas95] G. Massara. Gestione dei vincoli di integrità in ORACLE. Master's thesis, Politecnico di Milano, 1995.
[ML91] G. Moerkotte and P. C. Lockemann. Reactive consistency control in deductive databases. ACM Transactions on Database Systems, 16(4), 1991.
[MS88] J. P. Martins and S. C. Shapiro. A model for belief revision. Artificial Intelligence, 35:25-79, 1988.
[ORA92] ORACLE Corporation. ORACLE7 Server Application Developer's Guide, December 1992.
[Rae93] L. De Raedt. Interactive Theory Revision - An Inductive Programming Approach. Academic Press, 1993.
[STSW93] K. D. Schewe, B. Thalheim, J. W. Schmidt, and I. Wetzel. Integrity enforcement in object oriented database. In U. W. Lipeck and B. Thalheim, editors, Modelling Database Dynamics. Springer-Verlag, Volkse, Germany, 1993.
[UD90] S. D. Urban and L. M. Delcambre. Constraint Analysis: a Design Process for Specifying Operations on Objects. IEEE Transactions on Knowledge and Data Engineering, 2(4):391-400, 1990.
[Ull82] J. D. Ullman. Principles of database systems. Computer Science Press, 1982.