A CONSTRAINT MANAGEMENT MODEL FOR DECISION SUPPORT SYSTEMS
Eun Gi Kim
98 Pages
December, 1993
APPROVED: ___________________________________ Date
Balakrishnan Muthuswamy, Chair
___________________________________ Date
Lawrence Eggan
___________________________________ Date
Billy Lim
Decision Support Systems (DSSs) play an important role in present-day organizational decision making. Integrity management, which involves the specification and verification of explicit integrity constraints, is an important issue in DSSs. The heterogeneity of DSS components makes it difficult to maintain integrity between (and within) the subsystems. It is desirable that integrity specification and maintenance of DSS components be integrated in their design and function. This is a topic of current research interest.
This thesis presents a methodology for constraint modeling and constraint management in DSSs. An architecture for integrity management in a DSS, called the Integrity Constraint Management System (ICMS), is described. This architecture models constraints in a hierarchical fashion similar to schema and subschema definitions in database management. The ICMS facilitates the maintenance of a variety of types of constraints.
This thesis also discusses the details of a methodology for modeling constraints that is appropriate for a DSS with several component systems. This approach uses a graph representation of constraints and constraint relationships called a Constraint Dependency Network (CDN). The constraints in the ICMS are represented by a CDN. This representation supports the mapping of a broad range of constraints in a modular and declarative fashion, and it is used to detect integrity violations.
A prototype implementation of the ICMS that detects integrity violations has been developed. This implementation demonstrates that a) constraint dependencies can be modeled using a constraint dependency network, and b) some constraint management functions are possible through a CDN. The main advantage of the ICMS approach is that it is appropriate for DSSs that have heterogeneous subsystems (DBMS, KBS, and MMS). We believe that the general features of the ICMS should be central to the integrity management component of a DSS.
A CONSTRAINT MANAGEMENT MODEL FOR DECISION SUPPORT SYSTEMS
EUN GI KIM
A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of
MASTER OF SCIENCE
Department of Applied Computer Science
ILLINOIS STATE UNIVERSITY
1993
ACKNOWLEDGMENTS
I would like to acknowledge the contributions to this thesis of the following people and entities and offer them my profound gratitude:
• Dr. Balakrishnan Muthuswamy, for his intellectual inspiration, advice, guidance, and support. He has taught me how to do research and how to complete a thesis. He truly cares about his students.
• Dr. Larry Eggan and Dr. Billy Lim, for their useful comments on and insights into this thesis.
• The Department of Applied Computer Science, for a graduate assistantship during the last two academic years.
• My parents and my sister, for their patience, love, and encouragement.
• Finally, all the instructors who have taught me, and my friends, as they made graduate school an enjoyable experience for me.
E. G. K.
CONTENTS
ACKNOWLEDGMENTS
CONTENTS
TABLES
FIGURES
CHAPTER
I. INTRODUCTION
    Objective
    Scope
    Organization
II. BACKGROUND AND RELATED WORKS
    Constraints
    Constraint Languages
    Types of Constraint Language
    Constraint Schema
    Constraint Networks
    Constraints in Object-Oriented Modeling
    Classification of Constraints
        Inherent, Explicit, and Implicit Constraints
        Static and Dynamic Constraints
        Object, Attribute, and Relationship Constraints
        Unified Constraint Classification for DSS
    Existing Approaches to Constraint Management
        Stonebraker's Approach
        Qian and Wiederhold's Approach
        Gelman et al.'s Approach
        Shepherd and Kerschberg's Approach
        Wald and Road's Approach
        Muthuswamy et al.'s Approach
III. ARCHITECTURE OF INTEGRATED CONSTRAINT MANAGEMENT SYSTEM IN A DSS
    Constraint Representation in ICMS
    Subsystem Constraints
        DBMS Constraints
        KBS Constraints
        MMS Constraints
    Global Constraints
    Role of ICMS in DSS
    Advantage of the ICMS
IV. INTEGRITY CONSTRAINT MANAGEMENT SYSTEM
    Constraint Knowledge Base
        Constraint Objects
        Mapping Knowledge
    Constraint Dependency Network
    Direction of Arc
    The Bipartite Matching Algorithm to Determine Direction of Arc for MMS Constraints
    CDN Generator
    Constraint Propagation Module
        MMS Constraint Solver
    Constraint Evaluation Module
        Complexities in Evaluating Constraints
        Resolving Conflicts
    Transaction Module
    User Interface
V. ICMS PROTOTYPE
    ICMS Prototype Module
    User Interface
    Constraint Specification Using Prolog
    Transformation of Constraints to CDN
    Constraint Checking/Enforcement in ICMS Prototype
VI. CONCLUSION
    Areas of Further Research
REFERENCES
APPENDIX A: BIPARTITE MATCHING ALGORITHM
APPENDIX B: ICMS PROTOTYPE PSEUDO CODE
APPENDIX C: ICMS PROTOTYPE IMPLEMENTATION CODE
TABLES
Table 1. Comparison of Three Types of Constraints
Table 2. Constraint Types
Table 3. Constraints in DSS Subsystems
FIGURES
Figure 1. Defining a Problem Using a Constraint Network
Figure 2. Multiplier-Adder Constraint Network
Figure 3. Computation of a Temperature Conversion
Figure 4. Constraint Objects
Figure 5. Schematic for a Constraint Object
Figure 6. Syntax for a Typical Symbolic Constraint in FRM
Figure 7. Hierarchy of Constraints in ICMS Approach
Figure 8. ICMS in DSS Environment
Figure 9. ICMS Modules
Figure 10. ICMS Constraints
Figure 11. A Constraint Object Schema
Figure 12. Example of a Constraint Object
Figure 13. Constraint Dependency Network
Figure 14. a) Bipartite Matching and Constraint Dependency Network b) Constraint Dependency Network with Directed Graph
Figure 15. General Solution Sequences for Cm1, Cm2, and Cm3
Figure 16. Collapsing MMS Constraint Cycle in CDN
Figure 17. ICMS Prototype Module Hierarchy
Figure 18. ICMS Prototype Menu
Figure 19. An Example of Constraint Specification in Prolog
Figure 20. Transformation of Constraints into a CDN
Figure 21. CDN Representation in ICMS Prototype
Figure 22. Constraint Propagation Procedure
CHAPTER I INTRODUCTION
Decision Support Systems (DSSs) play an important role in present-day organizational decision making. A Decision Support System (DSS) is a computer-based interactive system that helps managers solve semi-structured problems (Sprague, 1980; Turban, 1986; Murphy & Stohr, 1986). Integrity management, which involves the specification and verification of explicit integrity constraints, is an important issue in DSSs. This thesis deals with this issue of integrity management in DSSs.
DSSs are usually configured from existing, often mathematical, representations and tools for the chosen application domain (Binbasioglu, 1986). A DSS generally has three heterogeneous information-bearing subsystems: a Database Management System (DBMS), a Model Base Management System (MMS), and a Knowledge Based System (KBS) (Teng et al., 1988; Lee & Kang, 1988; Muthuswamy et al., 1992).
The DBMS component supports traditional database functions such as data definition, storage, manipulation, and retrieval in a shared integrated environment. There are several database models, such as hierarchical, network, and relational. Each of these has its own structure for storing, manipulating, and retrieving data.
The MMS facilitates the use of various mathematical models such as simulation, statistical methods, and operations research models. MMSs play an important role in current organizational decision making. Traditionally, these models run on mainframe computers and involve huge amounts of data. An MMS may interact with databases to obtain input data for its models, and the computed results may be stored in the organizational database.
The KBS component enables logic processing and manipulation of organizational knowledge to provide intelligent decision support. This includes capturing decision-making expertise in expert systems. One way in which knowledge is modeled is by using facts and rules. An inference engine, which enables logical inferences, is an important component of a KBS.
In a DSS the subsystems are not disjoint modules. Depending on the real-world application, there is a need for these components to interact with each other to provide coordinated and intelligent support to the decision maker. For example, a KBS may interact with databases and MMSs during the process of supporting a decision within an organization; an expert system may extract data from a database, obtain values of variables from an MMS model, and use this information in a rule to arrive at a decision. Similarly, a DBMS may invoke an MMS module to compute values to be stored within the database.
These heterogeneous subsystems are necessary components of a DSS since each of them provides specific functionalities: the database management component supports data retrieval, storage, and data manipulation operations; the model management system facilitates the use of quantitative models; and the knowledge based component enables making inferences about data and models. These components differ in their functional objectives, representation structures, operational processes, hardware configurations, and application domains. Each subsystem may use a different methodology to model and process constraints. For example, a DBMS might use ASSERT statements in SQL (Chamberlin et al., 1981); a KBS might use strict IF-THEN production rules; and an MMS may represent constraints as numerical equations and quantitative functional models.
In organizational information systems, enforcing integrity is often the responsibility of individual programmers. This approach is perhaps reasonable in small organizations.
In an organization that has several heterogeneous subsystems with interactions between them (such as a DSS), it is desirable that integrity constraints be modeled at a global level. The heterogeneity in DSS components discussed above makes it difficult to maintain integrity between (and within) the subsystems. It is desirable that integrity specification and maintenance of DSS components be integrated in their design and function. This is a topic of current research interest. Though this difficulty has been recognized in the literature (Lee & Kang, 1988; Jarke, 1988; Muthuswamy et al., 1990; Muthuswamy et al., 1992; Dalal & Yadav, 1993; Goul et al., 1993), a unified approach to integrity maintenance in DSSs has yet to emerge.
One approach to addressing this problem is to use a high-level semantic model (for example, the ER model) to capture the schema of information contained in the three components and other necessary meta-data about the components. Such an approach has been suggested by Muthuswamy et al. (1991). In their approach, relationships among the entities of the three DSS components are modeled using the ER model. This method is appropriate for modeling relationships at a reasonably high level of abstraction, but becomes cumbersome if the relationship details are included.
This thesis presents an alternative approach to integrity constraint management that is designed specifically for DSSs and is well suited to solving their integrity problems.
Objective
The objective of this thesis is to develop a methodology for constraint modeling and constraint management in DSSs. With this objective, we present an architecture for integrity management in a DSS, called the Integrity Constraint Management System (ICMS). This architecture models constraints in a hierarchical fashion similar to schema and subschema definitions in database management.
This thesis also discusses the details of a methodology for modeling constraints that is appropriate for a DSS with several component systems. This approach uses a graph representation of constraints and constraint relationships called a Constraint Dependency Network (CDN). The goal of the ICMS is to facilitate the maintenance of a variety of types of constraints. The constraints in the ICMS are represented by a CDN. This representation supports the mapping of a broad range of constraints in a modular and declarative fashion, and it is used to detect integrity violations. We advocate that the general features of the ICMS should be central to the integrity management component of a DSS.
Scope
The problems encountered in constraint management involve a variety of constraints. The scope of this thesis is limited to designing the ICMS and presenting a constraint management model that plays an important role in maintaining the integrity of a DSS.
The treatment of the complexities of constraints is also limited. Constraints can be numerous and can contradict one another, and complexities such as cycles are hard to detect and manage. Although dealing with these complexities is an important part of the study, further work on this topic should be pursued independently of this thesis.
The scope of this study is also limited to developing a prototype. The prototype is not a full-fledged implementation, since the ICMS contains a number of different modules. The focus of the prototype is limited to demonstrating how the CDN could be modeled and used in an actual ICMS implementation.
The main assumption we make is that the constraints are almost always consistent and do not contradict each other. Determining whether one constraint contradicts another requires a formal mathematical and logical proof. This aspect is not thoroughly covered in this thesis.
Organization
This thesis is organized in the following manner. Chapter 2 presents a survey of the literature on constraint modeling as relevant to database, knowledge base, and model base systems. Chapter 3 describes the ICMS architecture, and Chapter 4 presents the details of the ICMS components. Details of a prototype implementation of the ICMS model are presented in Chapter 5. Conclusions and suggestions for future research areas are discussed in Chapter 6.
CHAPTER II BACKGROUND AND RELATED WORKS
Constraints
A constraint is a relation stating what should be true about one or more items. Examples of a constraint include:
1. a constraint that states that each student must have a unique student number,
2. a constraint that a resistor in a circuit simulation obey Ohm's Law, and
3. a constraint that expresses a numeric equation (for example, x = a + b).
In many applications, it is useful to be able to declare constraints. Much of one's knowledge of the world is best expressed in terms of what is allowed or, conversely, what is not allowed. Constraints are the way we naturally think about a problem (Morgenstern et al., 1989). Particular areas where constraint modeling is relevant include:
1. platforms for engineering design,
2. manipulating complex objects,
3. dealing with exceptions and incomplete information,
4. operations research and optimization, and
5. analysis and reasoning with constraints.
Several constraint-based approaches to enable constraint management have been developed (for example, Morgenstern, 1986; Shepherd & Kerschberg, 1986; Serrano, 1987; Gelman et al., 1990). The following section discusses areas that are relevant to constraint management.
Constraint Languages
Constraint languages are syntactic structures that enable the modeling of constraints. They basically express formulas and dependencies among problem variables. Constraint languages may be used in applications with several interrelated constraints. Programming using a constraint language is a declarative activity. For example, consider that the quantity named 'x' is the sum of the quantities named 'a' and 'b': x = a + b. Then, if the values of 'x' and 'a' are known, a constraint system should be able to find the value of 'b'. On the other hand, programming in a conventional procedural language requires several more lines of code to find the same solution. For example, in a rule format this would be represented as:
IF known (x and a) THEN b = x - a
ELSEIF known (x and b) THEN a = x - b
ELSEIF known (a and b) THEN x = a + b
which is a rather inelegant representation. The advantage of using a constraint language is that the programmer need not explicitly code solution algorithms.
There are three important features of constraint languages (Jaffar, 1992):
1) how constraints appearing in a program are interpreted, for example, static constraints vs. dynamic constraints;
2) how constraints affect the control of a program, for example, how existing constraints may influence a collection of other constraints;
3) how constraints are solved, for example, the algorithm used for solving constraints and the extent to which constraints are solved.
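The contrast between a declared relation and hand-written rules can be sketched in a few lines of Python. This is only an illustrative sketch and is not taken from the thesis; the function name `solve_sum` is hypothetical. The caller states which values are known, and the single relation x = a + b supplies the missing one.

```python
from typing import Dict, Optional


def solve_sum(x: Optional[float] = None,
              a: Optional[float] = None,
              b: Optional[float] = None) -> Dict[str, float]:
    """Satisfy the single constraint x = a + b given any two of the three values."""
    unknowns = [name for name, value in (("x", x), ("a", a), ("b", b))
                if value is None]
    if len(unknowns) > 1:
        raise ValueError("at least two values must be known")
    if not unknowns:
        # All three values are given: only verify the constraint.
        if x != a + b:
            raise ValueError("values violate x = a + b")
    elif unknowns[0] == "x":
        x = a + b
    elif unknowns[0] == "a":
        a = x - b
    else:
        b = x - a
    return {"x": x, "a": a, "b": b}


print(solve_sum(x=10, a=4))  # b is deduced from x and a
print(solve_sum(a=4, b=6))   # x is deduced from a and b
```

The case analysis still exists, but it lives once inside the constraint system rather than in every program that uses the relation.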
Types of Constraint Language
Constraint languages can be viewed from various perspectives: the mathematical logic perspective, the programming language perspective (Cohen, 1990), and the integrity constraint management perspective (Oren, 1985).
From the mathematical logic perspective, constraint languages attempt to establish a class of first-order theories which preserve the basic computational properties of Horn-clause logic. A Horn clause is a logical expression of the form q ⇐ p1 ∧ p2 ∧ ... ∧ pn, where ∧ and ⇐ denote the logical relationships conjunction (AND) and implication (IF), respectively.
From the programming language perspective, the purpose of a constraint language is to establish a class of logic programming languages in which variables have values in a diverse set of domains including trees, booleans, reals, rationals, and lists (Cohen, 1990). This type of constraint language has been used in areas such as numerical analysis, operations research, combinatorics, and engineering applications.
From the integrity management perspective, the goal of a constraint language is to maintain integrity in organizational systems. The focus is on enforcing constraints rather than solving constrained problems. An important guideline in constructing this type of language is that it should be simple and easy to use compared to a programming language. Some of the integrity constraint languages are TAXIS (Mylopoulos et al., 1980), Constraint Equation (Morgenstern, 1984, 1986), CSDL (Roussopoulos, 1979), PRISM (Shepherd & Kerschberg, 1986), DAPLEX (Shipman, 1981), and SCHEMAL (Frost and Whittaker, 1983).
Constraint Schema
A constraint schema is a semantic description that expresses all possible logical relationships that exist between problem variables. A constraint schema may consist of a constraint name, the involved variables, and rules that allow the value of a variable to be computed when the others are known. A constraint schema serves as a set of rules for computation in various circumstances (Tanimoto, 1987). The following example schema is from Tanimoto (1987) and represents Ohm's law in an electronics problem-solving system in LISP.
(CONSTRAINTS OHMS_LAW
  (PARTS (VOLTAGE CURRENT RESISTANCE))
  (RULES
    (TAKE VOLTAGE (TIMES CURRENT RESISTANCE))
    (TAKE CURRENT (QUOTIENT VOLTAGE RESISTANCE))
    (TAKE RESISTANCE (QUOTIENT VOLTAGE CURRENT))))
This type of constraint modeling may be used to find unknown values if other variables have known values. Using the above representation, one can obtain a value for current if values for voltage and resistance are known. This method is often used in solving engineering problems. In general, there may be many interrelated constraints in a given application. It is left up to the system to sort out how they interact and to keep them all satisfied.
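As a rough illustration of how such a schema can be used to compute an unknown, the Ohm's law schema can be mirrored in Python as a table of rules, one per part. This is a sketch under assumptions: the dictionary layout and the helper `apply_schema` are hypothetical and are not part of the LISP system described above.

```python
# Each rule computes one part from the other two, mirroring the TAKE rules.
OHMS_LAW = {
    "parts": ("voltage", "current", "resistance"),
    "rules": {
        "voltage": lambda v: v["current"] * v["resistance"],
        "current": lambda v: v["voltage"] / v["resistance"],
        "resistance": lambda v: v["voltage"] / v["current"],
    },
}


def apply_schema(schema, known):
    """Return a copy of `known` with the single missing part computed."""
    missing = [p for p in schema["parts"] if p not in known]
    if len(missing) != 1:
        raise ValueError("exactly one part must be unknown")
    target = missing[0]
    result = dict(known)
    result[target] = schema["rules"][target](known)
    return result


# Current is obtained from known voltage and resistance.
print(apply_schema(OHMS_LAW, {"voltage": 12.0, "resistance": 4.0}))
```

The same schema answers all three questions (voltage, current, or resistance) without a separate procedure for each.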
Constraint Networks
A constraint network is a powerful representation for dealing with numerical constraints (Sussman & Steele, 1980; Steele, 1979; Serrano, 1987; Serrano, 1991; Muroga, 1990). It is a declarative structure which expresses relations among variables. A constraint network, in general, consists of a set of nodes and a set of arcs (pointers). In general, the arcs represent constraints, and the nodes represent individual variables, each of which has some particular value, known or unknown. Therefore, a constraint network has a set of variables that have known values and another set of variables that have unknown values. In the graph representation, the actual functional relationship among the variables of a constraint is not explicitly represented. However, the fact that the variables are related or "connected" by a constraint is explicitly represented.
The set of variables with known values may be thought of as the inputs to the system and the set of variables with unknown values as the desired outputs from the system (Figure 1). The status of a variable is defined by its membership in one of these two sets. Therefore, a variable may have a status of known or unknown depending on whether it is given or whether it is to be computed. By defining and redefining the status of the variables, various aspects of a given decision situation may be studied with a minimum of input data manipulation by the user.
[Diagram: Input (Knowns), feeding the Constraint Network, which yields Output (Unknowns)]
Figure 1. Defining a Problem Using a Constraint Network
The constraint satisfaction mechanism of a constraint network efficiently finds new values for the constrained variables by adjusting variable values until the constraint relationships are satisfied. Since a constraint network is a graph, it is domain-independent and a large number of algorithms from graph theory can be applied.
Figure 2 shows an example of a constraint network which represents a 'multiplier-adder' device. A*B=E denotes the 'multiplier' constraint among the variables A, B, and E; C*D=F denotes the 'multiplier' constraint among the variables C, D, and F; and E+F=G denotes the 'add' constraint among the variables E, F, and G.
[Diagram: variables A, B, C, D, E, F, and G linked through three constraint nodes]
Figure 2. Multiplier-Adder Constraint Network (modified from Steele, 1979)
The above constraint relationships are acausal since information may flow (propagate) in either direction. When certain values of the variables become known, the non-directed graph is transformed into a directed graph by pointing arcs from known variables to unknown variables (Steele, 1980). Thus, the constraint network facilitates the propagation of values through the graph for a given problem.
Constraint propagation is a procedure in which information is deduced from a local group of constraints and nodes and is recorded as a change in the network (Steele, 1980). Further deductions make use of these changes to make other changes. Thus the consequences of a change (input) eventually spread throughout the network. This is called process propagation. Constraint propagation is discussed in Steele (1980), Muroga (1990), and Winston (1984). Forward inference basically means deducing something immediately whenever possible.
[Diagram: Centigrade and Fahrenheit linked through the constraints MULT, OTHERMULT, and ADD, with the constants 9, 5, and 32]
Figure 3. Computation of a Temperature Conversion (from Steele, 1980)
Figure 3, taken from Steele (1980), shows the temperature conversion constraint between the variables 'fahrenheit' and 'centigrade'. The conversion equation is
$$\text{centigrade} = \frac{5}{9} \times (\text{fahrenheit} - 32)$$
Suppose now, for example, that we state that centigrade is -40. This results in the following sequence of computations (Steele, 1979):
1. From centigrade = (B MULT) = -40 and (A MULT) = 9, the constraint MULT deduces (C MULT) = -40 x 9 = -360.
2. From (C MULT) = (C OTHERMULT) = -360 and (B OTHERMULT) = 5, the constraint OTHERMULT deduces (A OTHERMULT) = (-360)/5 = -72.
3. From (A OTHERMULT) = (A ADD) = -72 and (B ADD) = 32, the constraint ADD deduces (C ADD) = (-72) + 32 = -40.
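The three deductions above can be reproduced with a small local-propagation loop. The following Python sketch is an illustration only, not Steele's implementation; the function `propagate`, the tuple encoding of constraints, and the variable names (`product`, `t`, and the named constants) are assumptions made for this example. Each constraint fires as soon as exactly two of its three variables are known.

```python
def propagate(constraints, values):
    """Apply local deductions until no constraint can add a new value."""
    values = dict(values)
    changed = True
    while changed:
        changed = False
        for kind, a, b, c in constraints:
            known = [n for n in (a, b, c) if n in values]
            if len(known) != 2:
                continue  # nothing to deduce, or already fully known
            if kind == "mult":  # a * b = c
                if c not in values:
                    values[c] = values[a] * values[b]
                elif a not in values:
                    values[a] = values[c] / values[b]
                else:
                    values[b] = values[c] / values[a]
            else:  # "add": a + b = c
                if c not in values:
                    values[c] = values[a] + values[b]
                elif a not in values:
                    values[a] = values[c] - values[b]
                else:
                    values[b] = values[c] - values[a]
            changed = True
    return values


network = [
    ("mult", "nine", "centigrade", "product"),  # MULT
    ("mult", "t", "five", "product"),           # OTHERMULT
    ("add", "t", "thirty_two", "fahrenheit"),   # ADD
]
constants = {"nine": 9, "five": 5, "thirty_two": 32}

forward = propagate(network, {**constants, "centigrade": -40})
backward = propagate(network, {**constants, "fahrenheit": 212})
print(forward["product"], forward["t"], forward["fahrenheit"])
print(backward["centigrade"])
```

Because the constraints are acausal, the same network also runs in the other direction: giving fahrenheit instead of centigrade makes ADD fire first, then OTHERMULT, then MULT. The sketch does not guard against division by zero or against cyclic dependencies.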
The above example illustrates the generic nature of constraint networks and the way in which the propagation of constraints is encoded as a graph structure. The primary disadvantage of constraint propagation is that its success depends on the structure of the network. Almost all the complexity of constraint propagation is a consequence of the presence of cycles (loops) in the constraint networks. A cycle is formed when a node directly or indirectly affects its own state. The occurrence of cycles in constraint networks may be reduced by proper modeling of temporal variables and modular aggregation of variables as higher level variables. Another disadvantage is that constraint propagation is a local computational algorithm (Winston, 1984). Therefore, the constraint network needs to be examined globally in order to identify cycles.
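A global check for cycles can be sketched with a standard depth-first search over the directed form of the network. This is a generic graph technique, not a procedure described in the thesis; the function name `find_cycle` and the adjacency-dictionary encoding are assumptions made for illustration.

```python
def find_cycle(graph):
    """Return one cycle as a list of nodes, or None if the graph is acyclic.

    `graph` maps each node to the nodes it directly affects.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    nodes = set(graph) | {s for succs in graph.values() for s in succs}
    color = dict.fromkeys(nodes, WHITE)
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for succ in graph.get(node, ()):
            if color[succ] == GREY:
                # succ is on the current path: it affects its own state.
                return path[path.index(succ):] + [succ]
            if color[succ] == WHITE:
                found = visit(succ)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


acyclic = {"A": ["E"], "B": ["E"], "E": ["G"], "F": ["G"]}
cyclic = {"X": ["Y"], "Y": ["Z"], "Z": ["X"]}
print(find_cycle(acyclic))
print(find_cycle(cyclic))
```

The search inspects the whole graph, which is exactly the global view that local propagation lacks.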
Constraints in Object-Oriented Modeling
Constraints in an object-oriented environment are rules or assertions used to maintain the integrity and correctness of a set of objects. Eisuke Muroga (1990) suggests the use of 'constraint objects' for maintaining the integrity of object-oriented databases. Constraint objects are just like any other objects in object-oriented databases except that they are specifically used to enforce the integrity of the database. The constraint object, being an object itself, has its own methods and variables. The methods within a constraint object are the rules in object-oriented databases that perform constraint maintenance tasks such as constraint checking and constraint enforcement (Muroga, 1990). Constraint checking is the determination of whether a set of data values complies with a constraint. Constraint enforcement, on the other hand, checks values against constraints and, if the constraints are not met, determines new values that comply with them.
[Diagram: constraint objects A+B+C=20 and D+C=10 linked to the objects A, B, C, and D]
Figure 4. Constraint Objects
Figure 4 is an example of constraint objects that consist of the following two constraints: A+B+C=20 and D+C=10. Each constraint is represented by a single object and is linked to the other objects A, B, C, and D. This representation closely follows the constraint network concept: the object-oriented modeling concept focuses on communication among the constraint objects, and linking the constraint objects together is similar to the representation of a constraint network. A sample schematic of a constraint object, borrowed from Muroga (1990), is shown in Figure 5. Along the left side are the methods for the object. The links necessary for each constraint are also shown. More details of methods to check and enforce such constraint objects are provided by Marina Dao (1991).
[Diagram: a constraint object with its methods (Check Method, Enforce Method, Linking Method) on one side and its links on the other]
Figure 5. Schematic for a Constraint Object (from Muroga, 1990)
An advantage of using constraint objects is that they can be dynamically removed and added to other objects without changing the schema (class definitions). The removal and addition of constraints through database operations allows the constraint specification to change as the database evolves.
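The check, enforce, and linking methods, together with dynamic addition and removal, can be sketched as follows using the two constraints of Figure 4. This Python sketch is illustrative only; the class names `ConstraintObject` and `Store`, and the choice of which variable to adjust during enforcement, are assumptions rather than Muroga's design.

```python
class ConstraintObject:
    """A sum constraint over named variables, e.g. A + B + C = 20."""

    def __init__(self, names, total):
        self.names = list(names)
        self.total = total

    def check(self, values):
        """Constraint checking: do the current values comply?"""
        return sum(values[n] for n in self.names) == self.total

    def enforce(self, values, adjust):
        """Constraint enforcement: if violated, recompute `adjust`
        (assumed to be one of this constraint's variables)."""
        if not self.check(values):
            others = sum(values[n] for n in self.names if n != adjust)
            values[adjust] = self.total - others
        return values


class Store:
    """Holds variable values and the constraint objects linked to them."""

    def __init__(self, values):
        self.values = dict(values)
        self.constraints = []

    def link(self, constraint):
        self.constraints.append(constraint)

    def unlink(self, constraint):
        self.constraints.remove(constraint)

    def violations(self):
        return [c for c in self.constraints if not c.check(self.values)]


store = Store({"A": 5, "B": 5, "C": 10, "D": 3})
c1 = ConstraintObject(["A", "B", "C"], 20)
c2 = ConstraintObject(["D", "C"], 10)
store.link(c1)
store.link(c2)
print(len(store.violations()))       # D + C = 13 violates c2
c2.enforce(store.values, adjust="D")
print(store.values["D"], len(store.violations()))
store.unlink(c2)                     # removed without changing any class
```

Linking and unlinking only change the list of attached constraint objects, which reflects the point that constraints can evolve without a schema change.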
Classification of Constraints
The following are some of the different constraint types that are found in the current literature.
Inherent, Explicit, and Implicit Constraints
Constraints may be classified into three basic types: inherent, explicit, and implicit (Brodie, 1978; Tsichritzis & Lochovsky, 1986). Inherent constraints preserve the basic semantic properties of a model construct. For example, in a relational model data may be represented only in the form of relations; in a hierarchical model, the relationships are in the form of parent-child associations.
Explicit constraints are application-dependent constraints that specify the semantic constraints related to an application and are defined using a specification mechanism. For example, a SQL-like language might employ an ASSERT statement to specify further semantics of the model.
Implicit constraints are those constraints which are derivable from the interaction of other constraints. For example, if an explicit constraint states that a child named Susan is a sister of a child 'x' in a parent-child association, then the implicit constraint would be that Susan is also a daughter of the parent.
Specification and enforcement of constraints depend on whether a constraint is represented inherently, explicitly, or implicitly (Badal, 1979; Tsichritzis & Lochovsky, 1982; Shepherd & Kerschberg, 1986), and the distinction between these constraints depends on the structures provided by the data model. The following table is a brief summary of the three types of constraints.
Table 1
Comparison of Three Types of Constraints
| TYPE | SPECIFICATION SCHEME | ENFORCEMENT SCHEME |
| --- | --- | --- |
| Inherent | integral part of data model; very limited specification | can be enforced automatically |
| Explicit | flexible | interpretively enforced; requires specification mechanism |
| Implicit | can only be derived from the other two constraint types | difficult to implement in most cases |
Static and Dynamic Constraints
Constraints can be classified as being static (state) or dynamic (transition) (Brodie, 1978; Elmasri, 1989). Static constraints are rules that ensure the appropriate states of entities. Dynamic constraints are rules that ensure the appropriate execution of operations and are often represented by preconditions and postconditions placed on operations. Preconditions are used to verify that a system is in an acceptable state before the operation is allowed to proceed, while postconditions ensure that everything happened correctly in an operation. Usually both static and dynamic constraints are needed to specify all applicable constraints.
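The interplay of static constraints, dynamic constraints, preconditions, and postconditions can be sketched for a salary update. This is a hedged illustration: the function names, the dictionary representation of an employee, the salary range, and the rollback behavior are all assumptions chosen for the example.

```python
def satisfies_static(employee):
    """Static (state) constraint: salary must lie in an allowed range."""
    return 3000 <= employee["salary"] <= 5000


def update_salary(employee, new_salary):
    """Update a salary under a precondition, a transition rule, and a postcondition."""
    # Precondition: the state must be acceptable before the operation proceeds.
    if not satisfies_static(employee):
        raise ValueError("precondition failed: invalid state before update")
    # Dynamic (transition) constraint: relates the old and new values.
    if new_salary < employee["salary"]:
        raise ValueError("transition constraint violated: salary cannot decrease")
    old_salary = employee["salary"]
    employee["salary"] = new_salary
    # Postcondition: the operation must leave the state acceptable; undo otherwise.
    if not satisfies_static(employee):
        employee["salary"] = old_salary
        raise ValueError("postcondition failed: update rolled back")
    return employee


emp = {"salary": 3500}
print(update_salary(emp, 4000))
```

The static rule alone cannot reject a decrease from 4000 to 3800, since both states are valid; only the transition rule, which sees the old and the new value, can.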
Object, Attribute, and Relationship Constraints
Constraints may also be classified as object, attribute, and relationship constraints (Brodie, 1984). These are static rather than dynamic. Attribute constraints cover cases such as single-valued versus multi-valued, null versus non-null, and modifiable versus non-modifiable. Relationship constraints are those of mappings and cardinality. They are concerned with the existence of objects and the dependency of one object on another. A referential constraint is a relationship constraint stating that if object A is referred to by object B, then B can exist only if A exists.
Unified Constraint Classification for DSS
In this section, we identify major types of integrity constraints that may occur in a DBMS, KBS, and MMS within a unified DSS framework. The benefit of this classification of constraints is that it can be used to determine which type of constraint evaluation should be used. The basic categories of DSS constraints that we define are as follows: 1) unconditional constraints, 2) conditional constraints, and 3) numeric constraints. These constraints are shown in detail in Table 2.
Unconditional constraints involve the current state of a database. These constraints are: a) range, b) subset, c) attribute, d) data dependency, e) generalized cardinality, f) incompatibility, g) obligation, and h) inclusion constraints. These types of constraints have been investigated by various researchers (Brodie, 1984; Dogac et al., 1985; Tsichritzis & Lochovsky, 1982).
Conditional constraints are constraints that hold only if a specified condition is satisfied in the current state of a database, knowledge base, and/or model base. These constraints are represented in the form of IF-THEN statements and consist of a set of conditions and a set of actions.
Numeric constraints involve mathematical relationships. Such constraints are found and used extensively in spreadsheet programs, which are considered MMS applications. Two types of numeric constraints are equality and inequality constraints.
Notice that a constraint may fall into more than one category. For example, consider the constraint IF X + Y = 0 THEN Employee.Salary is between $3000 and $5000. This conditional constraint may also be classified as a numeric constraint (equality) and as a range constraint. This, of course, increases the complexity of dealing with constraints. One approach to handling such interacting constraints is presented in this thesis. However, more research is required in the analysis of the interaction between different types of constraints.
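The way one constraint can combine several categories may be easier to see in code. In this Python sketch (an illustration only; the helper names `in_range` and `conditional` and the state dictionary are hypothetical), the example constraint is assembled from a numeric equality used as the condition and a range constraint used as the consequence.

```python
def in_range(low, high):
    """Unconditional range constraint on a single value."""
    return lambda value: low <= value <= high


def conditional(condition, consequence):
    """Conditional (IF-THEN) constraint: binding only when the condition holds."""
    return lambda state: (not condition(state)) or consequence(state)


salary_range = in_range(3000, 5000)
rule = conditional(
    lambda s: s["X"] + s["Y"] == 0,        # numeric (equality) part
    lambda s: salary_range(s["salary"]),   # range part
)

print(rule({"X": 2, "Y": -2, "salary": 4000}))  # condition holds, range satisfied
print(rule({"X": 2, "Y": -2, "salary": 6000}))  # condition holds, range violated
print(rule({"X": 1, "Y": 1, "salary": 6000}))   # condition does not hold
```

Evaluating the rule therefore requires a numeric evaluation first and a range check second, which is one reason the category of a constraint matters when choosing how to evaluate it.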
Table 2
Constraint Types

Unconditional Constraints

  Range Constraints
    Description: valid ranges for attributes
    Example: An employee salary must be between 3000 and 5000.

  Subset Constraints
    Description: a condition that a subset of entities or relationships should satisfy
    Example: The items purchased from a new department should be no less than 100.

  Attribute Constraints
    Description: an attribute in an entity or a relationship
    Example: An employee's salary must not be less than $20,000.

  Data Dependency Constraints
    Description: functional dependency between the entities or relationships
    Example: There must be a unique social security number of all employees.

  Generalized Cardinality Constraints
    Description: the cardinality of instances in an entity or a relationship
    Example: A department can have only 2 secretaries.

  Incompatibility Constraints
    Description: an attribute of an entity or a relationship occurrence with the attribute of another entity or relationship occurrence
    Example: The number of machines that a supplier supplies must not exceed the amount that he actually manufactures.

  Obligation Constraints
    Description: the domain of an attribute to a certain entity or a relationship
    Example: The manager of a project must be an engineer.

  Inclusion Constraints
    Description: occurrences of an entity or relationship must be included in one another
    Example: A supplier can supply those machines that he is manufacturing.

  Transitional Constraints
    Description: relates the old and new values of an attribute during an update operation
    Example: The salary of an employee cannot decrease.

Conditional Constraints

  If .. Then .. Constraints
    Description: a set of actions are specified if conditions are met
    Example: IF salary > $3000 THEN job = "Manager"

Numeric Constraints

  Equality Constraints
    Description: a system of equations that contains an equality relationship
    Example: X + Y = 0

  Inequality Constraints
    Description: a system of equations that contains the form x - y > c or x - y ≥ c
    Example: 3X + Y > 0
Existing Approaches to Constraint Management
A considerable amount of research literature on constraint management is found in the areas of Artificial Intelligence, Database Management, Knowledge Base Management, and Model Management. No constraint management technique is uniformly superior; each has its own limitations, and different approaches are appropriate for different purposes. Below we review some of the constraint management approaches from the literature that are relevant to this thesis.
Stonebraker's Approach
Stonebraker (1975) was one of the earliest researchers to propose a constraint enforcement mechanism, which was used in the relational language QUEL. For example, in QUEL the integrity constraint 'average percent of error should be less than 10' may be expressed as follows:
ASSERT AVERAGE_ERROR ON ERROR: AVG(PERCENT) < 10
He suggests that a request (for changing a database state) is modified at the query language level to contain all the relevant integrity constraints. During query processing, the query processor performs constraint validation functions. Although this approach is straightforward, more powerful capabilities are required to enforce consistency constraints in DSS (for example, numeric constraints in MMS).
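The idea of modifying a request so that it carries its own integrity checks can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the function names `modify_request` and `run_update` are hypothetical, rows are plain dictionaries, and the constraint is a per-row bound (PERCENT < 10) rather than the aggregate AVG constraint of the QUEL example.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]
Predicate = Callable[[Row], bool]
Update = Callable[[Row], Row]


def modify_request(qualification: Predicate, update: Update,
                   assertions: List[Predicate]) -> Predicate:
    """Conjoin every integrity assertion with the request's own qualification.

    The assertions are tested against the row as it would look after the update.
    """
    def modified(row: Row) -> bool:
        return qualification(row) and all(a(update(row)) for a in assertions)
    return modified


def run_update(table: List[Row], qualification: Predicate, update: Update) -> List[Row]:
    """Apply the update only to the rows selected by the qualification."""
    return [update(row) if qualification(row) else row for row in table]


# Hypothetical ERROR relation and a request that doubles every PERCENT value.
error_table = [{"PERCENT": 3}, {"PERCENT": 6}]
double = lambda row: {**row, "PERCENT": row["PERCENT"] * 2}
below_ten = lambda row: row["PERCENT"] < 10

safe_qualification = modify_request(lambda row: True, double, [below_ten])
result = run_update(error_table, safe_qualification, double)
```

In this sketch the second row is left unchanged because doubling it would violate the assertion; the query processor never needs a separate validation pass.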
Qian and Wiederhold's Approach
Qian and Wiederhold (1986, 1987) attempt to overcome the shortcomings of the approach suggested by Stonebraker. They argue that Stonebraker's method is inefficient for several reasons:
1. constraints have to be checked as stated by the user;
2. constraints have to be checked at the query processing stage; and
3. constraint checking algorithms do not make use of knowledge about the specific application domain.
Therefore, they propose a knowledge-based approach to the constraint validation problem, which they argue is far better than conventional database constraint enforcement. They present a constraint manager that performs two major functions: knowledge acquisition and management, and constraint transformation. The constraint manager extracts knowledge from the database schema and user-specified constraints. A transformational mechanism exploits this knowledge about the application domain and the database organization to reformulate integrity constraints into semantically equivalent ones from which efficient code may be generated. Thus, a knowledge-based approach involves the collection of declaratively expressed meta-facts about the application domain and the database implementation.
Gelman et al.'s Approach
Gelman et al. (1990) developed a financial resource management system called FRM. FRM is an experimental CLASS/HYPERCLASS object-oriented knowledge base system designed for budget planning and financial resource management. The system attempts to integrate many of the tasks an intelligent financial assistant should perform beyond those of a spreadsheet bookkeeping program. It uses a hierarchical organization among constraints, as well as among budget items and budgets themselves. The major components of the FRM system are FORMAN, CONFRM, the Constraint Manager, PLANNER, and the explanation module. FORMAN is a user interface that lets users create, examine, and modify budgets. For example, users can select items with a mouse on images of forms and invoke operations on the items by selecting commands from menus. CONFRM is a component that maintains multiple hierarchies of representations of objects that define a particular view of the full set of budget items. The explanation component is designed to provide details on how a location acquired its current value and, if possible, to justify the value. The planning component is called by the constraint manager to determine a sequence of actions to fix a constraint violation. The constraint manager handles numerical constraints as well as symbolic constraints.
We discuss the constraint manager of the FRM system in detail. All FRM constraints have a common structure, and a constraint is treated as an object that is created or edited through a specialized editor. The editor is responsible for guiding a user to input valid constraints. Figure 6 presents a typical symbolic constraint.
CONSTRAINT: SuperSecretary
Arguments = ($Budget $Secretary $AllSecretaries)
IF-Clause = (Type? $Secretary SECRETARY)
THEN-Clause = (Less (Length $AllSecretaries) 3)
CorrectiveActions = (CreateSuperItem $Budget $AllSecretaries)
BindClause-1 = (BIND $Secretary (confrm PersonnelItems))
BindClause-2 = (BIND $Budget (FindRoot $Secretary))
BindClause-3 = (BIND $AllSecretaries (FindItems $Budget SECRETARY))
Strength = 4
Priority = 300
ImposedBy = Agency A
Source = Bittman
Author = Ralston
LastEdited = 1/01/88
Figure 6. Syntax for a Typical Symbolic Constraint in FRM (from Gelman et al., 1990)
In Figure 6, "SuperSecretary" is a constraint that combines the clerical components into a single item. Such a situation may arise when more than two part-time secretaries provide support in a budget. The constraint detects this situation and modifies the structure of the current budget view, while retaining a detailed underlying representation for use when the extra detail is appropriate.
The Arguments field contains one or more arguments that are bound to values at execution time. The IF-Clause states the preconditions of the constraint and is a logical expression. The THEN-Clause is a conjunction of clauses that specifies a desired state. The CorrectiveActions field specifies database modifications to be invoked upon detection of a violation. The FRM constraint language also allows the user to specify other attributes: Strength, Priority, ImposedBy, Source, Author, and LastEdited.
The evaluation process for an FRM constraint specification is as follows:
1. The IF-Evaluator checks each of the IF clauses to see whether the preconditions are met.
2. If the preconditions are met, the THEN-Evaluator is called to check for a violation of the desired relationship.
3. If the THEN condition is satisfied, no action is taken. Otherwise, corrective actions may be undertaken to force satisfaction.
The significance of the approach is that it represents domain knowledge uniformly as constraints and views resource management and planning problems as constraint satisfaction and resolution tasks. Although this approach is well suited to scheduling and resource management applications, a general paradigm is still needed that can integrate domain-independent systems to solve larger and more diverse problems.
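The three-step evaluation process described above can be sketched in Python. The sketch is not FRM code: the class `FrmStyleConstraint`, the function `evaluate`, the dictionary-based budget, and the behavior of `merge_secretaries` (a stand-in for CreateSuperItem, whose real semantics are not given here) are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Budget = Dict[str, List[str]]


@dataclass
class FrmStyleConstraint:
    """Hypothetical analogue of an FRM constraint object."""
    name: str
    if_clause: Callable[[Budget], bool]
    then_clause: Callable[[Budget], bool]
    corrective_action: Callable[[Budget], None]


def evaluate(constraint: FrmStyleConstraint, budget: Budget) -> str:
    # Step 1: the IF part decides whether the constraint applies at all.
    if not constraint.if_clause(budget):
        return "not applicable"
    # Step 2: the THEN part checks the desired relationship.
    if constraint.then_clause(budget):
        return "satisfied"
    # Step 3: on violation, invoke the corrective action.
    constraint.corrective_action(budget)
    return "corrected"


def merge_secretaries(budget: Budget) -> None:
    # Assumed stand-in for CreateSuperItem: collapse all secretaries into one item.
    budget["SECRETARY"] = ["+".join(budget["SECRETARY"])]


super_secretary = FrmStyleConstraint(
    name="SuperSecretary",
    if_clause=lambda b: len(b.get("SECRETARY", [])) > 0,
    then_clause=lambda b: len(b["SECRETARY"]) < 3,  # (Less (Length $AllSecretaries) 3)
    corrective_action=merge_secretaries,
)
```

A budget with three secretaries would trigger the corrective action on the first evaluation and be reported as satisfied on the next one.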
Shepherd and Kerschberg's Approach
Shepherd & Kerschberg (1986) use an object-oriented paradigm in a knowledge base approach to representing constraints. Constraints are managed as a network, which is used to enforce the consistency of a constraint knowledge base. They also provide a Constraint Language (CL) to specify the semantics of the application domain. This was used in their PRISM system. In PRISM, an information system is successively represented as a hierarchy of object-oriented models: the PRISM meta-data, Data Model, Schema, and Application levels. That is, objects at each level are defined with respect to the meta-data of the level above.
This allows the PRISM meta-level to implement a conventional data model in terms of constraints on user-defined objects. Constraints at each level are expressed using the CL. The PRISM CL is a collection of rules (constraints). Each rule has a common structure: precondition, action, and postcondition. The logical operators AND, OR, and NOT, together with parentheses, are used to combine predicates within each precondition and postcondition. User goals (ASSERT, DENY, or TEST) are determined in the precondition clause. The CL syntax of a typical constraint is:
CONSTRAINT: instance of: invoice;
PARMS: $invoice_instance;
PRECONDITION: roleis(ASSERT);
ACTION: unitasa: $invoice_instance, invoice;
POSTCONDITION: assert( invoice_number($invoice_instance, $integer_instance));
The above constraint checks a relationship between an invoice_instance and an invoice_number. That is, an instance of invoice must be associated with an invoice number before it can be asserted. The validation and verification mechanism is straightforward. All constraints in PRISM are explicitly stated and evaluated at run time. To determine whether a constraint is satisfied, the system evaluates all three CL expressions (PRECONDITION, POSTCONDITION, and ACTION) as TRUE, FALSE, UNKNOWN, or EXCEPTION. A user goal is handled as a transaction that is not committed until all subgoals are correct. The advantage of this approach lies in the conceptual hierarchization and semantic organization of constraints. The authors argue that explicit constraints at all levels, modeled as a minimal set, should reduce the work of run-time enforcement processors and may point the way to the design of machine architectures optimized for generic constraint enforcement. Thus, their main contribution has been in addressing the need for a constraint management paradigm in a knowledge-based information system.
Wald and Road's Approach
Wald and Road (1989) developed a constraint manager, CONMAN, for the STROBE knowledge base (Smith, 1983). CONMAN provides three types of support: definition, analysis, and enforcement. Constraints are defined by condition and action facets (components of a slot), which are generally inherited via the taxonomic hierarchy or the datatype mechanism. Constraints are analyzed by putting constraints on the constraint facets, which automatically generates the operational information needed to enforce constraints efficiently at run time. Finally, constraints are enforced by the CONMAN modification functions, which may be invoked either directly or via messages. Wald and Road's approach is particularly useful in object-oriented knowledge bases. The major drawbacks of CONMAN are that it was not developed as an interactive design system and that it is slow due to its purely symbolic approach.
Muthuswamy et al.'s Approach
More recently, Muthuswamy et al. (1992) proposed an approach to modeling integrity constraints in DSSs using a three-level architecture. The motivation behind this approach is that DBMS, a mature technology, uses a three-level architecture consisting of: a) the conceptual view or schema, b) the individual user views or subschemas, and c) the internal or physical view. The MMS consists of individual model views or model subschemas, the conceptual model schema, and the internal model view. Similarly, for a Knowledge Base System (KBS), a subschema consists of the facts and rules of an individual user or group of users. The subschemas are integrated to form the KBS conceptual schema, which represents all the rules and facts of the organization. The physical view contains information about the physical storage representation of the KBS, such as the linking of the various rules, access to the database to obtain data, and other device-related information. However, as the authors point out, whether the three-level architecture would be beneficial for DSSs is a question that needs prototyping and experimental investigation.
The constraint management model of this thesis is motivated by all of the previously mentioned constraint-based models and constraint management techniques. This thesis attempts to apply some of the important constraint management techniques to DSSs.
CHAPTER III
ARCHITECTURE OF AN INTEGRITY CONSTRAINT MANAGEMENT SYSTEM IN A DSS
This chapter describes the architecture of the Integrity Constraint Management System (ICMS). The ICMS is an interactive system that supports a decision maker/designer in managing integrity constraints in a DSS. The ICMS approach is appropriate for DSSs that have heterogeneous components such as DBMSs, KBSs, and MMSs. The constraint representation in ICMS, the role of ICMS, and the ICMS components are presented in this chapter.
Constraint Representation in ICMS
In an organization, various user groups have different views of constraints. These differences are important and need to be modeled by a constraint management system. The integration of these views provides a conceptual universal view of the organization's constraints. Such an integrated framework for DSS enables a user/designer to explore various aspects of an organization. Frost (1986) defines a view as 'a set of concepts and rules which can be used to formulate and relationalize one's perception of parts of the universe.' Views help one to identify, name, and classify subsets of the universe in which one is interested. These may be considered abstractions of the real world, and each view representation may involve different underlying constraint types.
Using Frost's notion of views, constraints in the ICMS are defined at two levels of abstraction:
A. Subsystem Constraints: DSS subsystem constraints are the DBMS, KBS, and MMS constraints.
B. Global Constraints: Global constraints represent relationships between organizations and inter-component constraints.
This two-level hierarchy of constraints is shown in Figure 7.
[Figure 7 diagram: Global Constraints at the upper level; DBMS Constraints, KBS Constraints, and MMS Constraints at the lower level.]
Figure 7. Hierarchy of Constraints in the ICMS Approach
The following sections describe subsystem and global constraints in detail.
Subsystem Constraints
Database Constraints
Specification and enforcement of constraints in a database depend on its underlying data model. Data models provide useful sets of declarative semantics that are inherent to DBMS applications. A popular data model is the relational data model (Codd, 1970; Date, 1981), which is based on set theory. The relational model does not structurally imply any integrity constraints (Dogac et al., 1985), and it contains only explicit constraints. We consider three explicitly defined kinds of integrity rules that must be satisfied in a relational database system: a) the entity integrity rules (Codd, 1970; Date, 1981), b) the referential integrity rules (Codd, 1970), and c) explicit constraints (for example, those stated using the ASSERT statement in SQL). The entity integrity rule states that no component of a primary key value may be null. Referential constraints arise whenever one relation includes references to another relation. Since the entity and referential integrity rules must be satisfied in any relational system independent of the application, we can embed these rules into a DBMS. Below we give some example database constraints.
Entity Constraints:
Cd1_e1: SUPPLIER(SUPPLIER#, SNAME, STATUS, CITY)
Cd1_e2: PART(PART#, PNAME, COLOR, WEIGHT)
Cd1_e3: SHIPMENT(SUPPLIER#, PART#, QUANTITY)

Explicit Constraints:
Cd2: SUPPLIER# < 100 -- A supplier number must be less than 100.
Cd3: SNAME(Smith) <> CITY(Chicago) -- A supplier named Smith cannot be in Chicago.
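The database constraints above can be checked with a few small Python predicates. This is a minimal sketch under stated assumptions: relations are lists of dictionaries, the function names are hypothetical, and Cd3 is read as "no supplier named Smith is located in Chicago".

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def entity_integrity(rows: List[Row], key: List[str]) -> bool:
    """Entity integrity: no component of a primary key value may be null."""
    return all(row.get(column) is not None for row in rows for column in key)


def referential_integrity(child: List[Row], parent: List[Row], column: str) -> bool:
    """Every value of `column` in the referencing relation exists in the referenced one."""
    parent_keys = {row[column] for row in parent}
    return all(row[column] in parent_keys for row in child)


def cd2(supplier: Row) -> bool:
    """Cd2: a supplier number must be less than 100."""
    return supplier["SUPPLIER#"] < 100


def cd3(supplier: Row) -> bool:
    """Cd3: a supplier named Smith cannot be in Chicago."""
    return not (supplier["SNAME"] == "Smith" and supplier["CITY"] == "Chicago")


# Hypothetical sample data for the SUPPLIER and SHIPMENT relations.
suppliers = [
    {"SUPPLIER#": 1, "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
    {"SUPPLIER#": 2, "SNAME": "Jones", "STATUS": 10, "CITY": "Chicago"},
]
shipments = [{"SUPPLIER#": 1, "PART#": 1, "QUANTITY": 300}]
```

The entity and referential checks are generic and could be applied to any relation, which mirrors the observation that these two rules can be embedded in the DBMS itself, while Cd2 and Cd3 are application-specific.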
KBS Constraints
A KBS is developed based on algorithmic knowledge of the actions to be taken. A typical production system comprises a knowledge base, expressed as if-then rules, and an inference mechanism or rule interpreter. The interpreter tests the values of the facts on the left-hand side of a rule; if the test succeeds, new values for facts are set according to the right-hand side of the rule. In a KBS, constraints may be easily represented in the form of rules. The following are examples of KBS constraints:
Ck1: IF SHIP_DAY = 1 AND SHIPPING = "AIR" THEN METHOD = "FEDERAL EXPRESS"
Ck2: IF SHIP_DAY = 2 AND SHIPPING = "AIR" THEN METHOD = "FEDERAL EXPRESS" OR METHOD = "UPS"
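Read as integrity constraints, the two rules above restrict which values METHOD may take when their left-hand sides match. A minimal Python sketch of such a check follows; the table `RULES` and the function `violated_rules` are hypothetical names, and treating the right-hand side as a set of allowed values is an interpretation made for this illustration.

```python
from typing import Callable, Dict, List, Set, Tuple

Facts = Dict[str, object]

# Each rule: (left-hand-side test, set of METHOD values allowed by the right-hand side).
RULES: Dict[str, Tuple[Callable[[Facts], bool], Set[str]]] = {
    "Ck1": (lambda f: f["SHIP_DAY"] == 1 and f["SHIPPING"] == "AIR",
            {"FEDERAL EXPRESS"}),
    "Ck2": (lambda f: f["SHIP_DAY"] == 2 and f["SHIPPING"] == "AIR",
            {"FEDERAL EXPRESS", "UPS"}),
}


def violated_rules(facts: Facts) -> List[str]:
    """Return the names of rules whose left-hand side matches but whose right-hand side fails."""
    return [
        name
        for name, (lhs, allowed) in RULES.items()
        if lhs(facts) and facts["METHOD"] not in allowed
    ]
```

For instance, one-day air shipping with METHOD set to "UPS" would violate Ck1, while the same METHOD with two-day air shipping would satisfy Ck2.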
MMS Constraints
An MMS often deals with a large number of numeric variables, which are usually expressed in algebraic equations. We assume that the processing of numeric constraints is traditionally handled by the MMS. For example, a linear programming model involves numeric constraints. Similarly, a spreadsheet system can be viewed as a simple, constraint-driven system in which only mathematical formulas are allowed (Jarke, 1990). Adding numerical constraints to cell values enables modeling of a constraint-driven MMS application in a spreadsheet environment. The following are some example linear programming problem constraints:
Cm1: x + 2y + c ≤ 10
Cm2: x + y = 2
Cm3: g = f(x, y, z)
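The three MMS constraints above can be evaluated against a set of variable values as in the following Python sketch. The body of `f` is purely hypothetical, since the procedure behind Cm3 is not defined here, and the use of integer values avoids floating-point comparison issues that a real MMS would have to handle with a tolerance.

```python
from typing import Dict

Values = Dict[str, int]


def f(x: int, y: int, z: int) -> int:
    """Hypothetical stand-in for the procedure used in Cm3."""
    return x * y + z


def cm1(v: Values) -> bool:
    """Cm1: x + 2y + c <= 10 (inequality constraint)."""
    return v["x"] + 2 * v["y"] + v["c"] <= 10


def cm2(v: Values) -> bool:
    """Cm2: x + y = 2 (equality constraint)."""
    return v["x"] + v["y"] == 2


def cm3(v: Values) -> bool:
    """Cm3: g = f(x, y, z) (procedural constraint with numeric input and output)."""
    return v["g"] == f(v["x"], v["y"], v["z"])


# Hypothetical model state that satisfies all three constraints.
values = {"x": 1, "y": 1, "c": 3, "z": 4, "g": 5}
all_satisfied = cm1(values) and cm2(values) and cm3(values)
```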
Cm1 and Cm2 are numeric equation constraint types that are typically specified in the MMS component. Cm3 is a procedural constraint, yet its input and output values are numeric. Table 3 summarizes the constraints in DSS subsystems that have been discussed so far.
Table 3
Constraints in DSS Subsystems

DBMS
  Primary Function: data management
  Example: Relational DBMS
  Subsystem Constraints: entity rule (constraint); referential integrity rule (constraint); explicit constraint

KBS
  Primary Function: knowledge management
  Example: Production System
  Subsystem Constraints: conditional constraint

MMS
  Primary Function: facilitates use of mathematical models
  Example: Linear Programming Model
  Subsystem Constraints: equalities constraint; inequalities constraint
Global Constraints
A global constraint provides a basis for specifying constraints between the DBMS, KBS, and MMS within a DSS. These constraints are the result of interactions among the DSS subsystems. The following are some examples of global constraints:
Cg1: IF SUPPLIER.STATUS = 001 THEN x=3 AND g=2
Cg2: IF SUPPLIER.CITY="CHICAGO" THEN SHIP_DAY=2
Cg3: IF PART.WEIGHT > 10 AND SHIPMENT.QUANTITY < 2 THEN SHIPPING = "AIR"
Cg1 is a constraint that specifies a relationship between SUPPLIER.STATUS (which may be an attribute in a DBMS) and 'x' and 'g', which are MMS variables. Similarly, Cg2 and Cg3 connect variables that are modeled in the database and the knowledge base, respectively. Global constraints differ from subsystem constraints because they involve more than one DSS component.
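The defining property of a global constraint, that it involves more than one DSS component, can be made explicit in a small Python sketch. The `OWNER` mapping (which component owns which variable), the `GlobalConstraint` tuple, and the string encoding of STATUS as "001" are assumptions introduced for illustration.

```python
from typing import Callable, Dict, NamedTuple, Set, Tuple

State = Dict[str, object]

# Assumed ownership of each variable by a DSS component.
OWNER: Dict[str, str] = {
    "SUPPLIER.STATUS": "DBMS", "SUPPLIER.CITY": "DBMS",
    "PART.WEIGHT": "DBMS", "SHIPMENT.QUANTITY": "DBMS",
    "SHIP_DAY": "KBS", "SHIPPING": "KBS",
    "x": "MMS", "g": "MMS",
}


class GlobalConstraint(NamedTuple):
    variables: Tuple[str, ...]
    condition: Callable[[State], bool]    # IF part
    requirement: Callable[[State], bool]  # THEN part


GLOBAL: Dict[str, GlobalConstraint] = {
    "Cg1": GlobalConstraint(
        ("SUPPLIER.STATUS", "x", "g"),
        lambda s: s["SUPPLIER.STATUS"] == "001",
        lambda s: s["x"] == 3 and s["g"] == 2),
    "Cg2": GlobalConstraint(
        ("SUPPLIER.CITY", "SHIP_DAY"),
        lambda s: s["SUPPLIER.CITY"] == "CHICAGO",
        lambda s: s["SHIP_DAY"] == 2),
    "Cg3": GlobalConstraint(
        ("PART.WEIGHT", "SHIPMENT.QUANTITY", "SHIPPING"),
        lambda s: s["PART.WEIGHT"] > 10 and s["SHIPMENT.QUANTITY"] < 2,
        lambda s: s["SHIPPING"] == "AIR"),
}


def components(name: str) -> Set[str]:
    """Return the DSS components whose variables the constraint mentions."""
    return {OWNER[v] for v in GLOBAL[name].variables}


def holds(name: str, state: State) -> bool:
    """An IF-THEN constraint holds when its IF part is false or its THEN part is true."""
    c = GLOBAL[name]
    return (not c.condition(state)) or c.requirement(state)
```

Under these assumptions, `components` returns at least two component names for each of Cg1, Cg2, and Cg3, which is what distinguishes them from subsystem constraints.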
Role of ICMS in DSS
The architecture of ICMS is presented in Figure 8. Cd, Ck, Cm, and Cg denote database constraints, knowledge base constraints, model base constraints, and global constraints, respectively. The arrows from the subsystem constraints to the global constraints in ICMS indicate the mapping. Such mapping enables ICMS to capture all the constraints needed to generate a constraint dependency network, which is described in the next chapter.
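[Figure 8 diagram: within the Decision Support System, the ICMS holds the constraint sets Cd, Ck, Cm, and Cg alongside the other DSS modules; the DBMS, KBS, and MMS are shown with their Database, Knowledgebase, and Modelbase.]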
Figure 8. ICMS in DSS Environment
The architecture allows DSS components (DBMS, KBS, and MMS) to operate independently. Each component uses its own storage base (database, knowledge base, and model base). Integrity problems that arise from one component interacting with another are handled by the ICMS. Thus, ICMS is an independent constraint management system over the standard DSS components. Figure 9 shows the major components of ICMS.
[Figure 9 diagram: User Interface; Constraint Propagation Module; Constraint Evaluation Module; Transaction Manager; Constraint Dependency Network Generator; Constraint Knowledge Base, containing Mapping Knowledge, Constraint Objects, and the Constraint Dependency Network.]
Figure 9. ICMS Modules
The ICMS has the following modules:
a) Constraint Knowledge Base: This knowledge base stores the data structures of the constraints, constraint dependencies, and constraint objects (meta-information about constraints). Figure 10 shows the global constraint set, which includes all of the DSS subsystem constraints and the interaction constraints.
b) Constraint Dependency Network Generator: This module generates a network based on the dependencies among the variables of the DSS.
c) Constraint Propagation Module: This module performs constraint checking and enforcement procedures.
d) Constraint Evaluation Module: This module determines whether a constraint is consistent with the existing constraints.
e) Transaction Manager: This module manages constraint checking and enforcement. It also commits or aborts the constraint evaluation and propagation process (explained later).
f) User Interface: This module provides a user-friendly framework for the input and output requirements of the ICMS.
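To make the role of the Constraint Dependency Network Generator more concrete, the following Python sketch links constraints that share a variable and then finds every constraint reachable from a changed variable. It is only an assumption about what such a network could look like; the function names and the `LINK_VARIABLES` table are hypothetical, and the sketch is not the representation used in the ICMS prototype.

```python
from collections import defaultdict, deque
from typing import Dict, Set

# Variables mentioned by a few of the example constraints.
LINK_VARIABLES: Dict[str, Set[str]] = {
    "Cg2": {"SUPPLIER.CITY", "SHIP_DAY"},
    "Cg3": {"PART.WEIGHT", "SHIPMENT.QUANTITY", "SHIPPING"},
    "Ck1": {"SHIP_DAY", "SHIPPING", "METHOD"},
    "Cm2": {"x", "y"},
}


def build_network(link_variables: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Connect two constraints whenever they share at least one variable."""
    by_variable = defaultdict(set)
    for constraint, variables in link_variables.items():
        for variable in variables:
            by_variable[variable].add(constraint)
    network: Dict[str, Set[str]] = {constraint: set() for constraint in link_variables}
    for sharing in by_variable.values():
        for constraint in sharing:
            network[constraint] |= sharing - {constraint}
    return network


def affected_constraints(variable: str, link_variables: Dict[str, Set[str]]) -> Set[str]:
    """Breadth-first search for constraints directly or indirectly tied to a variable."""
    network = build_network(link_variables)
    seen = {c for c, variables in link_variables.items() if variable in variables}
    queue = deque(seen)
    while queue:
        current = queue.popleft()
        for neighbour in network[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen
```

With this table, a change to SUPPLIER.CITY would reach Cg2 directly, Ck1 through SHIP_DAY, and Cg3 through SHIPPING, while Cm2 would remain unaffected.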
Database Constraints
Cd1_e1: SUPPLIER(SUPPLIER#, SNAME, STATUS, CITY)
Cd1_e2: PART(PART#, PNAME, COLOR, WEIGHT, CITY)
Cd1_e3: SHIPMENT(SUPPLIER#, PART#, QUANTITY)
Cd2: SUPPLIER# < 100
Cd3: SNAME(Smith) <> CITY(Chicago)

KBS Constraints
Ck1: IF SHIP_DAY = 1 AND SHIPPING = "AIR" THEN METHOD = "FEDERAL EXPRESS"
Ck2: IF SHIP_DAY = 2 AND SHIPPING = "AIR" THEN METHOD = "FEDERAL EXPRESS" OR METHOD = "UPS"

MMS Constraints
Cm1: x + 2y + c ≤ 10
Cm2: x + y = 2
Cm3: g = f(x, y, z)

Global Constraints
Cg1: IF SUPPLIER.STATUS = 001 THEN x=3 AND g=2
Cg2: IF SUPPLIER.CITY="CHICAGO" THEN SHIP_DAY=2
Cg3: IF PART.WEIGHT > 10 AND SHIPMENT.QUANTITY < 2 THEN SHIPPING = "AIR"
Figure 10. ICMS Constraints
We have briefly presented the major components of the ICMS. The functions of these ICMS modules are discussed in detail in the next chapter.
Advantages of the ICMS
The advantages of the ICMS model are:
a) modularity of design of the system and its management,
b) interactions and integrity relationships between the subsystems are directly represented and better understood by the designers/users,
c) sharing of information such as inputs, processing, and outputs among subsystems is facilitated,
d) redundancies and conflicts between subsystem constraints are more easily detected and resolved, and
e) the potential to support multiple user constraint views and multiuser sharing of constraints, thereby simplifying the otherwise complex task of processing constraints with many user views.
In summary, the ICMS enables modeling DBMS, MMS, and KBS constraints and constraints between these components in a unified manner. The ICMS consists of a number of submodules: the Constraint Knowledge Base, Constraint Dependency Network Generator, Constraint Propagation Module, Constraint Evaluation Module, Transaction Manager, and User Interface. In the ICMS, constraints are represented as global constraints and subsystem constraints, and these are stored in the Constraint Knowledge Base.
CHAPTER IV INTEGRITY CONSTRAINT MANAGEMENT SYSTEM
In this chapter, the components of the ICMS and their functionalities are described in detail.
Constraint Knowledge Base
In order to maintain DSS subsystem constraints, it is necessary to store all the constraints and relevant meta-information about constraints. In the ICMS model, this information is stored in the Constraint Knowledge Base. The Constraint Knowledge Base consists of the following sub-knowledge bases: a) constraint objects, b) mapping knowledge, and c) Constraint Dependency Network (CDN).
Constraint Objects
In the ICMS, constraint objects are used to model constraints. Figure 11 presents the attributes of a constraint object. In the constraint knowledge base, constraint objects are indexed by several attributes, such as application type or priority.
Constraint ID: All references to a constraint object are made through its constraint ID. The constraint ID is used to identify source and type.
Constraint Author: This identifies the individual who created the constraint.
Constraint Class Type: This is the type of constraint and can be entered by the user.
Definition of Constraint: This is the actual definition of the constraint, involving syntax.
Description: This is a brief description of the constraint. For example, for range constraints, we can say in English "An employee salary must be between 3000 and 5000."
Priority Level: This indicates the priority of the constraint. Some constraints are more important than others. For example, level 1 may indicate less urgency than level 10.
Constraint Application Type: This defines the application with which the constraint is associated. This is assigned by the system.
Checking Point: This defines the constraint checking point, for example 'immediate' or 'delayed'. If the 'immediate' option is chosen, constraint processing begins immediately after each user update. If the 'delayed' option is chosen, constraint processing begins sometime after the user updates.
Link Variables: The information on link variables is used to construct the constraint dependency network.
Figure 11: A Constraint Object Schema
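The schema in Figure 11 maps naturally onto a record type. The following Python sketch shows one possible encoding together with a tiny in-memory store supporting insertion, deletion, and lookup by application type or priority. The class names, field names, and sample values are assumptions for illustration and do not reproduce the ICMS prototype.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class CheckingPoint(Enum):
    IMMEDIATE = "immediate"  # check right after each user update
    DELAYED = "delayed"      # check sometime after the user updates


@dataclass
class ConstraintObject:
    constraint_id: str
    author: str
    class_type: str
    definition: str
    description: str
    priority_level: int      # assumed: a higher level means more urgent
    application_type: str
    checking_point: CheckingPoint
    link_variables: Tuple[str, ...] = ()


class ConstraintKnowledgeBase:
    """Hypothetical in-memory store of constraint objects keyed by constraint ID."""

    def __init__(self) -> None:
        self._objects: Dict[str, ConstraintObject] = {}

    def insert(self, obj: ConstraintObject) -> None:
        if obj.constraint_id in self._objects:
            raise ValueError(f"duplicate constraint ID: {obj.constraint_id}")
        self._objects[obj.constraint_id] = obj

    def delete(self, constraint_id: str) -> None:
        del self._objects[constraint_id]

    def by_application_type(self, application_type: str) -> List[ConstraintObject]:
        return [o for o in self._objects.values() if o.application_type == application_type]

    def by_priority(self) -> List[ConstraintObject]:
        # Most urgent first.
        return sorted(self._objects.values(), key=lambda o: o.priority_level, reverse=True)


kb = ConstraintKnowledgeBase()
kb.insert(ConstraintObject(
    "Constraint_D1", "KIM", "RANGE", "3000 <= Employee.Salary <= 5000",
    "An employee salary must be between 3000 and 5000.",
    5, "DBMS", CheckingPoint.IMMEDIATE, ("Employee.Salary",)))
kb.insert(ConstraintObject(
    "Constraint_M1", "KIM", "NUMERIC", "x + y = 2",
    "The sum of x and y must equal 2.",
    8, "MMS", CheckingPoint.DELAYED, ("x", "y")))
```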
Figure 12 is an example constraint object. Using the ICMS, constraint objects may be interactively inserted, modified, and deleted by the user.
Constraint ID: Constraint_M1 Constraint Author: KIM Constraint Class Type: NUMERIC Constraint Type: 12 Definition of Constraint: X+Y