
Common Schemas in Temporal Abstraction

Mira Balaban and David Boaz
Department of Information Systems Engineering, Ben Gurion University, Beer Sheva 84105, Israel

Abstract

Temporal databases store large quantities of raw data. Reasoning from such data requires abstraction. Temporal abstraction is a kind of temporal-reasoning that concentrates on abstracting temporal data. A language for temporal abstraction is presented, and common schemas of temporal abstraction are analyzed. The language can serve as the basis for a query language that supports temporal or spatial abstractions.

1 Introduction

In a historical database, the stored information includes data and temporal attributes. One common use of the temporal attributes is to state when the data was valid. Often, humans and decision-support applications consume abstract high-level concepts derived from the information stored in an underlying historical database. Following the work of Shahar [16], the paradigm of Temporal Abstraction specifies how these temporal high-level concepts are defined. Temporal abstraction is a kind of temporal reasoning that concentrates on abstracting temporal data. In general, reasoning by abstraction means deriving “abstract” facts from concrete ones. Abstraction means replacing detailed data by coarser data. Usually, this is done by mapping between different domains. In temporal abstraction, the abstraction is associated with operations on temporal data. Example 1 demonstrates different kinds of temporal abstraction.

Example 1: Representative examples of temporal abstraction from the medicine, sales, meteorology and financial domains:
a) A patient who has a low platelet count and a very-low white-blood-cell count has severe myelotoxicity during the common time of these findings.
b) The total sale in a city during a period is inferred from the totals of the sales in the city's shops during that period. A similar pattern applies to cities and regions, to regions and states, etc.
c) If, at two instants separated by a gap of less than two hours, the barometric pressure in a region is V (e.g., high), then the barometric pressure in this region is V also during that gap.
d) Stock trends are abstracted from the daily stock values. The trends are represented using linear equations.

These examples emphasize two major characteristics of temporal abstraction mechanisms: (1) diverse inference mechanisms and (2) application of temporal and a-temporal operations. The mechanisms in this example involve reasoning from concrete facts (ex. 1.a), reasoning on the basis of geographic location (ex. 1.b), succession in time (ex. 1.c), and reasoning on the basis of trend finding (ex. 1.d). Therefore, a framework for handling temporal abstraction must be sufficiently flexible to allow such mechanisms.

Characterization of Temporal Abstraction: Systems that involve abstraction, such as OLAP, distinguish between two kinds of entities: dimensions and measures. A dimension is a set whose elements, called members, are organized in hierarchies with multiple disjoint levels (not necessarily finite). Measures are domains of values that are associated with dimension members. Abstraction is done by moving along a dimension from members in a lower level to a member in a higher level, while the associated measure value is computed by a function that is not one-to-one. Our observation of common temporal abstraction schemas reveals three dimensions of abstraction, Predicate, Subject and Time, and one measure, the Value. In the above examples, example 1.a illustrates abstraction on the Predicate dimension (from the platelet and wbc predicates to the myelotoxicity predicate); example 1.b presents abstraction on the Subject dimension (from the store level to the city, region and state levels); and examples 1.c and 1.d show abstraction on the Time dimension (from separate intervals to a coalesced interval).

The Predicate and Subject dimensions are finite, since the members of the first are the predicates of a TAR program, and the members of the second are predefined subject values. The Time dimension is infinite, since its members are time intervals. Leveling of the Time dimension can be defined in different ways. One common way is to base leveling on interval durations. Abstraction in the Time dimension can follow the interval-inclusion partial ordering of the Time dimension. Examples 1.c and 1.d demonstrate abstractions based on interval inclusion.

Temporal abstraction involves operations on the infinite temporal domain. Such operations might create an infinite number of terms. A similar problem arises in deductive databases, due to the use of arithmetic relations and function symbols. Deductive databases that guarantee finite relations are called safe. In temporal abstraction the mapping functions are interpreted, and therefore safety analysis can rely on function properties.

In this paper we analyze common schemas of temporal abstraction and point to their spatial analogies. Section 2 presents the Temporal Abstraction Rule language, called TAR [2, 5], and briefly analyzes its properties. In Section 3 we present and analyze common schemas of temporal abstraction. Section 4 discusses future directions. The paper is motivated by the work of one of the authors in the development of a medical information system [6].

2 The Temporal-Abstraction Rules (TAR) Language

The TAR temporal model supports time-points (points for short), time-intervals (intervals for short) and durations. Points are instants such as 1/1/2000 00:00, and for simplicity they are identified with the integers. Intervals are ordered pairs of points such as [1/1/2000 00:00, 31/12/2000 23:59]. The two points are named start and end, and the start point must precede or be equal to the end point; otherwise, the interval is empty. Durations are sizes on the time line, such as one year. The model supports time operations such as adding a duration to a point, interval addition, and relations between points, durations and intervals [1].
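The temporal model above can be illustrated with a minimal Python sketch. It is not part of TAR: the names Interval, add_duration, intersection and interval_addition are hypothetical, points and durations are plain integers, and interval addition is assumed here to mean the smallest interval covering both arguments (the precise definition is given in [1]).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """An ordered pair of integer time points (start, end)."""
    start: int
    end: int

    def is_empty(self) -> bool:
        # The start point must precede or equal the end point;
        # otherwise the interval is empty.
        return self.start > self.end

    def duration(self) -> int:
        # Assumption: duration is end - start, and 0 for an empty interval.
        return 0 if self.is_empty() else self.end - self.start


def add_duration(point: int, duration: int) -> int:
    """Add a duration to a point; the result may be a new time point."""
    return point + duration


def intersection(a: Interval, b: Interval) -> Interval:
    """Maximal interval common to a and b (empty if they do not overlap)."""
    return Interval(max(a.start, b.start), min(a.end, b.end))


def interval_addition(a: Interval, b: Interval) -> Interval:
    """Assumed reading: the smallest interval covering both a and b.

    Only endpoints that already exist in a or b are reused.
    """
    return Interval(min(a.start, b.start), max(a.end, b.end))
```

Under these assumptions, intersection and interval_addition only reuse existing endpoints, while add_duration can produce a point that appears nowhere in its inputs.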

A TAR database consists of facts and rules, and can be viewed as a subset of deductive databases. Following is a brief overview of TAR [2].

Syntax: TAR is a typed logic language that supports types of three kinds: subject types (e.g., Patient, Region, StockExchange), time types (Point, Interval and Duration) and value types (e.g., Integer, Color). A TAR atomic formula is an atom with the following signature: Subject × Interval × Value. It can include any subject, time or value functions, and variables as well. A fact is a ground atomic formula in which the subject and the value are given by constants, and the interval is given by a pair of time-point constants and is not empty. For example, wbc(S, [T, T+one_month], low) is an atomic formula but is not a fact, while wbc(john, [1/1/1990, 2/1/1990], low) is a fact.

A TAR rule is a means for deriving facts (intensional relations). A rule includes a constrained selection of facts from the database, and the derivation of new facts that result from applying evaluable functions to the selected facts. Therefore, the schema of a TAR rule has the following format:

h(s(F), i(F), v(F)) ← select(DB, F | C),

where C is a set of constraints, DB and F stand for sets of facts, and s, i and v are evaluable (interpreted) functions. select can be any external procedure (non-deterministic) that returns from the DB a set of facts F that satisfies C. The application of s, i and v to the selected facts (F) creates new values. The select procedure can take different forms. One common type of select is a regular body of a logic rule: p1(s1,i1,v1),…,pn(sn,in,vn). Note that the returned facts are not necessarily extensional and that this format allows recursive rules. Example 2 demonstrates representative TAR rules.

Example 2: The instances in example 1 are represented in TAR.
a) TAR rule for example 1.a:
myelotoxicity(S, intersection(I1,I2), mapmyelotoxicity(V1,V2)) ← platelet(S,I1,V1), wbc(S,I2,V2).
mapmyelotoxicity(low,very_low)=severe.
myelotoxicity facts are derived from platelet and wbc facts whose intervals have a non-empty intersection. The head interval is the intersection of the fact intervals, and the value is the result of applying mapmyelotoxicity to the appropriate values.
b) TAR rule for example 1.b:
sales(geographicAggregation(F), intervalOf(F), sumOfValues(F)) ← collect(DB, F, sales | sameGeographicLocation).
where the collect procedure selects a set of sales facts F that satisfy the constraint sameGeographicLocation. The subject function geographicAggregation returns the common geographic region (cities, regions, states, ...).
c) TAR rule for example 1.c (bp stands for barometric_pressure):
bp(S, [startOf(I1), endOf(I2)], V) ← adjacent( bp(S, I1, V), bp(S, I2, V) | startOf(I2) - endOf(I1) < maxGapbp(duration(I1), durationOf(I2))).
maxGapbp(2 minutes, 5 minutes)=2 minutes.
where the adjacent procedure selects two successive facts from the database. The head interval starts at the start of the first fact, and ends at the end of the second fact.

d) TAR rule for example 1.d:
stockTrend(S, [startOf(F), endOf(F)], linear_regression(F)) ← trend(F, S, stock_exchange | confidence_level>95%).
where the trend procedure selects a set of successive facts F from the relation stock_exchange with a common subject and confidence_level > 95%. subjectOf, startOf and endOf return the common subject, the start of the interval of the first fact in F, and the end of the interval of the last fact in F, respectively. The value function linear_regression computes a linear equation that approximates the detailed stock_exchange values.

Semantics: The semantics of TAR is defined as a set of facts that specifies the answer set for every predicate. For a program P, the semantics is defined using the following Tp operator. Let I be a set of facts for predicates in P:

Tp(I) = { h(s, i, v) | h(s, i, v) ∈ I, or there is a rule h(s(F), i(F), v(F)) ← select(DB, F | C) and there is F′ ⊆ I such that select(I, F′ | C) holds and s = s(F′), i = i(F′), v = v(F′) and i is not the empty interval }

Note that the TAR Tp operator differs from the standard Tp operator of logic programming [12] in applying the select procedure and evaluating the head functions. If all select procedures in the rules are given by the standard body of Datalog rules, then the semantics of a TAR program is the usual least-fixed-point semantics (lfp(Tp)). But if some select procedures are given by non-monotonic procedures, like adjacent and trend in example 2, we cannot designate the semantics as a known fixed point of Tp. In such cases, we define the semantics of a TAR program procedurally, as the set of facts obtained from applying a bottom-up evaluation procedure [3, 19] using the Tp operator.

Categorization of TAR (according to [4]): TAR supports a discrete, infinite temporal domain, based on time points (not intervals). The language has a single temporal dimension for the valid time and has a closed-form evaluation (that is, intensional relations have the same structure as extensional ones). However, TAR is not a Datalog1S language [4], since it includes subject and value functions in addition to the successor function.

Safety: An important property of deductive databases is safety. A database is safe if the answer sets for all queries on this database are finite. An extensive body of work has been devoted to the study of finiteness in deductive databases. To our knowledge, there are four approaches for dealing with infinite relations: (1) constraint databases [8, 15]; (2) restrictions on the structure of rules and queries: relational databases forbid recursive rules, and Datalog forbids functions and imposes safety restrictions; (3) structural restrictions in concrete domains (e.g., for sequence databases [13, 14] and for spatial databases [11]); (4) query restriction based on finiteness analysis [9, 10].

There are three sources of infinity in general deductive databases: (1) infinite extensional relations, (2) new variables in heads of rules, and (3) the combination of recursive rules and functions (which can create an infinite number of new terms). Our interest is to characterize TAR databases that are safe. The first problem does not arise in TAR, since there are no infinite extensional relations. In addition, a select procedure returns facts from a database, and does not create new facts. Therefore, if there is no other source of infinity, select returns only finite sets. The second problem also does not arise in TAR, since the terms in the head are restricted by the appropriate functions. For the third source, TAR rules indeed allow recursion and functions. Intensional relations in TAR can be infinite only when there is a recursive dependency between predicates and the functions keep generating new values. Since temporal abstraction is defined in concrete domains, and the functions are interpreted, safety restrictions are obtained by analyzing the properties of the functions.

As explained above, the Predicate and Subject dimensions are finite, and therefore each involves a finite number of levels. Since each abstraction step involves moving up in the level hierarchy, the number of abstraction steps must be finite as well. Therefore, predicate and subject abstraction can produce only a finite number of values.

Abstraction on the Time dimension involves concrete interpreted functions. Therefore, the safety analysis can follow the work of [13, 14] for sequence databases. We distinguish between two types of functions: the function + : time-point × duration → time-point is constructive, since it creates new time points. Other temporal functions are non-constructive (they do not create new time points, e.g., interval addition). Our aim is to avoid recursion that involves the constructive function. For that purpose, we use the notion of a predicate-dependency graph [3, 19]. The predicate-dependency graph of a TAR program is a directed graph whose nodes are the predicate symbols in the program. There is an arc from p to q if the program contains a rule with p as the head predicate and q as one of the body predicates returned by the select procedure. The arc is constructive if the time function, i, is the constructive function.

A TAR database that does not contain a constructive cycle is finite: the number of time points is finite, hence the number of intervals is finite, and therefore the number of levels is finite. Again, since an abstraction step moves up the level hierarchy, the number of steps must be finite. In conclusion, establishing finiteness of intensional relations in temporal abstraction involves detecting constructive cycles alone.
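The constructive-cycle check described above can be sketched in Python as follows. This is only an illustration under stated assumptions: a program is represented as a list of hypothetical triples (head predicate, body predicates, flag saying whether the rule's time function is constructive), and a constructive cycle is taken to be any cycle of the predicate-dependency graph that contains at least one constructive arc.

```python
from collections import defaultdict


def has_constructive_cycle(rules):
    """rules: iterable of (head, body_predicates, is_constructive) triples.

    Builds the predicate-dependency graph (arc head -> body predicate) and
    reports whether some cycle passes through a constructive arc.
    """
    graph = defaultdict(set)
    constructive_arcs = set()
    for head, body, is_constructive in rules:
        for q in body:
            graph[head].add(q)
            if is_constructive:
                constructive_arcs.add((head, q))

    def reachable(src, dst):
        # Iterative depth-first search from src looking for dst.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(graph[node])
        return False

    # A constructive arc p -> q lies on a cycle iff p is reachable from q.
    return any(reachable(q, p) for p, q in constructive_arcs)
```

For instance, a recursive interpolation rule such as bp ← bp with a non-constructive time function passes the check, whereas a recursive rule whose head interval is built with point + duration does not.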

3 Common Temporal-Abstraction Schemas

A temporal-abstraction schema is a general pattern of rules used for reasoning about temporal data. In this section, we survey common temporal-abstraction schemas, show how they are represented in TAR, and discuss their safety properties. For each schema we point to its spatial abstraction analogy.

3.1 Predicate Abstraction

Predicate abstraction is used when the level of the head predicate in the Predicate dimension is higher than the levels of the body predicates (returned by select). The schema is demonstrated in examples 1.a and 2.a. Typically, all body facts involve the same subject and the intersection of their intervals is not empty. The subject of the derived fact is the common subject, the time interval is the intersection of the body intervals, and the value is the result of applying a specific function to the body values. This schema has the following form:

h(S, intersection({I1,…,In}), v({V1,…,Vn})) ← b1(S,I1,V1),…,bn(S,In,Vn) | C.

where the level of h is higher than the levels of b1,…,bn, intersection returns the maximal interval that is common to all of {I1,…,In} (empty if the intervals do not overlap), v computes the head value, and C is a set of constraints on the body facts. This schema is not a source of infinity, since it involves moving up in the finite Predicate dimension. Predicate abstraction formulates the intuitive idea of vertical abstraction introduced by Shahar [16]. Shahar uses the name “vertical abstraction” to emphasize the move from a low-level concept to a higher one. It seems that this schema also applies to spatial databases, for deriving abstract concepts based on the combination of facts occurring in a common space.
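As a rough illustration of this schema, the following Python sketch evaluates the myelotoxicity rule of example 2.a over a list of facts. The tuple representation (predicate, subject, (start, end), value) with integer time points, and the names intersect, MAP_MYELOTOXICITY and derive_myelotoxicity, are assumptions made for the sketch and are not part of TAR.

```python
def intersect(i1, i2):
    """Intersection of two (start, end) intervals, or None if it is empty."""
    start, end = max(i1[0], i2[0]), min(i1[1], i2[1])
    return (start, end) if start <= end else None


# Value mapping taken from example 2.a: mapmyelotoxicity(low, very_low) = severe.
MAP_MYELOTOXICITY = {("low", "very_low"): "severe"}


def derive_myelotoxicity(facts):
    """Derive myelotoxicity facts from platelet and wbc facts.

    A fact is derived for each platelet/wbc pair with the same subject,
    a non-empty interval intersection, and a defined value mapping.
    """
    platelets = [f for f in facts if f[0] == "platelet"]
    wbcs = [f for f in facts if f[0] == "wbc"]
    derived = set()
    for _, s1, i1, v1 in platelets:
        for _, s2, i2, v2 in wbcs:
            if s1 != s2:
                continue
            common = intersect(i1, i2)
            value = MAP_MYELOTOXICITY.get((v1, v2))
            if common is not None and value is not None:
                derived.add(("myelotoxicity", s1, common, value))
    return derived
```

The head interval is never larger than the body intervals, so no new time points are created by this kind of rule.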

3.2 Subject Abstraction

Subject abstraction involves moving up in the Subject dimension. Typically, a subject level represents an aggregation of subjects from lower levels, e.g., a region is composed of cities, a class is composed of students, etc. This schema is demonstrated in examples 1.b and 2.b. It has the following form:

p(agg({S1,…,Sn}), I, v({V1,…,Vn})) ← p(S1, I, V1),…,p(Sn, I, Vn).

Note that p and I are common to the body and head facts. The subject function agg aggregates the subjects S1,…,Sn into a higher-level subject. The function v computes the head value (e.g., average, sum). The spatial analogy to subject abstraction is deriving facts about an aggregation of subjects in the same space. For example, if the majority of the planes of American Airlines park at JFK, then American Airlines parks at JFK.
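A minimal Python sketch of subject abstraction, in the spirit of the sales example, is given below. It assumes facts of a single predicate represented as (subject, interval, value) tuples and a hypothetical parent_of mapping that plays the role of agg; it aggregates whatever lower-level facts are present and does not check that every member of the higher-level subject has reported.

```python
from collections import defaultdict


def subject_abstraction(facts, parent_of, v=sum):
    """Aggregate facts with the same interval up one level of the Subject dimension.

    facts: iterable of (subject, interval, value) for one predicate p.
    parent_of: hypothetical mapping from a subject to its higher-level subject.
    v: value function applied to the collected body values (sum by default).
    """
    groups = defaultdict(list)
    for subject, interval, value in facts:
        # The interval I is shared by the body facts and the head fact.
        groups[(parent_of[subject], interval)].append(value)
    return {
        (parent, interval, v(values))
        for (parent, interval), values in groups.items()
    }
```

Applying the function again with a city-to-region mapping would move one more level up; since the Subject dimension is finite, such repetition stops after a finite number of steps.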

3.3 Time Abstraction

Time abstraction means deriving facts with intervals that cover the intervals of the body facts. This schema is demonstrated in examples 1.c,d and 2.c,d. It has the following form:

p(subjectOf(F), i(F), v(F)) ← select(DB, F | C).

where the select procedure returns a body F of the form p(S,I1,V1),…,p(S,In,Vn) that satisfies the constraints C, and i(F) returns an interval that covers {I1,…,In}. As explained above, recursion in time abstraction is safe only when the time function is not constructive. That is, if the predicate p participates in a predicate-dependency cycle, then i cannot be implemented using the constructive time function. This schema has two common variations:

• Concatenation: Meeting (or overlapping) facts are concatenated into a new fact that combines the time periods. Concatenation uses the interval addition function, which is non-constructive, and hence is safe. The TQuel [18] language includes a mechanism that coalesces meeting tuples with identical values into a single tuple with a concatenated interval. This mechanism is a special case of the time abstraction schema. The spatial analogy of this schema is concatenation of meeting curves.

• Interpolation: Two adjacent facts that are “close enough” are interpolated into a new fact that is valid also during the gap between the facts. This schema is introduced in the knowledge-based temporal-abstraction framework [17] of Shahar, and is demonstrated in examples 1.c and 2.c. The notion of close enough depends on the predicate and the durations of the body intervals. Close enough is characterized by the maximal gap that can be bridged between facts. For example, the maximal gap that can bridge two height facts is specified in months, while the maximal gap between two weight facts is specified in weeks, because weight is more dynamic. This schema has the following form:

p(S, [startOf(I1), endOf(I2)], V) ← adjacent( p(S, I1, V), p(S, I2, V) | startOf(I2) - endOf(I1) < maxGapp(duration(I1), durationOf(I2))).

where maxGapp returns the maximal gap that can bridge p facts. The adjacent procedure characterizes pairs of facts with the same predicate and subject that are “adjacent” in the database. Two such facts are adjacent if there is no third fact with the same predicate and subject that is temporally in between. Adjacency is defined based on a follow relation on the Herbrand base of a TAR database:

follow( p(S, [T1s, _], _), p(S, [T2s, _], _) ) ← T2s ≥ T1s.

If T2s = T1s then an arbitrary ordering is selected. The follow relationship induces an adjacency relationship between the facts of a database. Two facts X and Y are adjacent if Y follows X, and there is no third fact, Z, in the database in between:

adjacent(X, Y) ← follow(X, Y), not( follow(Z, Y), follow(X, Z) ).

adjacent is an example of a non-monotonic select procedure. The interpolation schema is safe, because the time function is non-constructive.
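The interpolation variation can be sketched in Python as a single pass over the facts of one predicate. The representation (subject, (start, end), value) with integer time points, the function name interpolate and the max_gap callback are assumptions of the sketch. Adjacency is approximated by sorting each subject's facts by start point and pairing consecutive facts, with ties left in input order (an arbitrary but fixed choice); a full bottom-up evaluation would repeat such passes, which this sketch does not do.

```python
from itertools import groupby


def interpolate(facts, max_gap):
    """One interpolation pass over facts (subject, (start, end), value) of a predicate.

    max_gap(d1, d2) returns the maximal bridgeable gap given the durations
    of the two adjacent intervals (hypothetical stand-in for maxGapp).
    """
    derived = []
    # Order by subject, then by interval start; the sort is stable,
    # so facts with equal start points keep their input order.
    ordered = sorted(facts, key=lambda f: (f[0], f[1][0]))
    for subject, group in groupby(ordered, key=lambda f: f[0]):
        group = list(group)
        # Consecutive facts in this order have no third fact in between.
        for (_, i1, v1), (_, i2, v2) in zip(group, group[1:]):
            gap = i2[0] - i1[1]
            if v1 == v2 and gap < max_gap(i1[1] - i1[0], i2[1] - i2[0]):
                # The head interval reuses existing endpoints only.
                derived.append((subject, (i1[0], i2[1]), v1))
    return derived
```

Because the derived interval is built from the start of the first fact and the end of the second, the pass introduces no time points beyond those already in the database.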

3.4 Trends - Combination of Predicate and Time Abstractions

Examples 1.d and 2.d demonstrate trend inference. A special case of trend abstraction is the periodic patterns introduced in [7]. A periodic pattern derives a trend based on a sequence of facts that follow each other and satisfy a given set of constraints. It is based on the notion of a maximal sequence of facts that follow in time. The select procedure MaxSeq(DB, F | C) selects a body F of the form b(S,I1,V1),…,b(S,In,Vn) from DB, such that follow(b(S, Ii, Vi), b(S, Ii+1, Vi+1)) (1≤i