Modeling Fuzzy Temporal Criteria in Database Querying

Cornelia Tudorie, Mihai Vlase, Cristina Nica, Dan Munteanu

Abstract— Expressing user preferences in database querying is best achieved by fuzzy modeling of the linguistic terms included in selection criteria. This paper deals with temporal criteria for querying date/time columns in classical relational databases. Fuzzy models for some linguistic terms, and ways to operate with them in order to evaluate queries, are proposed.
I. INTRODUCTION

Almost all databases contain a large amount of date-type information. It is a relatively new type of data, used since relational databases became widespread in the information society. Any DBMS (Database Management System) provides operations and libraries of functions able to process this type of data. The selection criteria for date columns are also very diverse. Besides classical comparisons with constants, there are many other options for expressing selection criteria on dates: extracting the year, month or day from a date and searching by it, or searching by interval (the BETWEEN operator).

All these search options, in a classical database system, require a precise value specifying the exact date. But practice shows that many search operations require imprecise or vague criteria. It is often difficult to specify an exact calendar date, regardless of user expertise; sometimes it is simply not possible, or not useful.

Fuzzy set theory is already established as an adequate framework for modeling and managing vague expressions, and therefore for evaluating vague queries sent to a relational database. After the vague linguistic terms contained in the selection criterion are modeled as fuzzy sets, query evaluation consists of computing a satisfaction degree for each database tuple; it is a kind of membership degree of the tuple to the selection criterion.

The evaluation of fuzzy search criteria in database queries has been studied in many scientific works, but fuzzy temporal criteria are very specific and require special attention. In this context, we try to find fuzzy models for linguistic temporal terms in order to evaluate vague temporal queries.

Our study is still at the beginning; it started from the observation that vague temporal criteria need to be included in database querying. Such a study can follow one of two directions:
• starting from computational theories related to temporal data and applying them to database querying, or
• starting from linguistic practice: identifying users' habits in the linguistic expression of search criteria, then finding linguistic frameworks (possibly language-independent) for defining, classifying and characterizing temporal terms (words or phrases), and after that finding ways to model them and operate with them.
C. Tudorie, M. Vlase and D. Munteanu are with the Department of Computer and Information Technology, Faculty of Automatic Control, Computers, Electrical Engineering and Electronics, "Dunarea de Jos" University of Galati. C. Nica was a student at the "Dunarea de Jos" University of Galati.
We chose the second option, based on the observation of the natural expressions frequently used by humans (particularly Romanian users) during database querying.

The next section briefly presents how linguistic terms expressing vague selection criteria are modeled and evaluated in database queries. The third section recalls how temporal criteria are expressed in a classical SQL selection on a relational database. The fourth section presents some results of our study on identifying and classifying temporal linguistic terms. After that, in the next two sections, we propose fuzzy models for vague temporal terms and discuss their role in query evaluation. Finally, some conclusions are drawn.

II. LINGUISTIC TERMS IN VAGUE QUERIES

Classical database systems using query languages typically offer a means to specify selection criteria using Boolean expressions. For example, if a database table contains data about cars, a selection query may be:

Retrieve the cars having the price < 25000

The rigidity and specificity of commonly used query languages can lead to an empty result or to an overly complex one; in both cases the information is useless to the user. A similar situation arises when the domain of an attribute is very wide and its values are highly varied and concrete, so that the user has difficulty knowing or expressing precise criteria.

The solution is to accept approximate or vague criteria in the selection query; thus only objects from a certain area of interest are retrieved from the database. The vague terms included in such queries are linguistic expressions commonly used in natural language when trying to identify certain objects. For example: high speed cars, inexpensive cars, low salary, big company, good students, young people, etc.
In a vague query, the selection criterion is no longer Boolean, so it can be more or less satisfied by the database tuples. Therefore, for each table row a satisfaction degree is estimated, which is a measure of its compatibility with the vague criterion. Fuzzy set theory is used as the adequate framework to model and manage vague expressions or, in other words, to evaluate vague queries sent to a relational database.

Example. Consider a database table containing data about cars. A possible vague query may ask to Retrieve the inexpensive cars. The response to this query could be Table I, where the µ coefficient stands for the satisfaction degree of the query selection criterion for each database tuple.

TABLE I. THE INEXPENSIVE CARS TABLE

Name      ...   Price   ...   µ
P 206           10461         1
OA              16042         1
P 806           20633         1
IO              24000         1
AA4             28449         0.54
P 607           31268         0.16
B3              31562         0.13
C 300M          32000         0.07
Figure 1. Fuzzy model of inexpensive (membership degree, from 0 to 1, plotted over the Price axis; the axis is marked at 10461, 25000, 40000, 55000 and 69154)
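The satisfaction degrees of Table I can be reproduced with a small sketch of the membership function of inexpensive. This is only an illustration under stated assumptions: the function name and parameter names are hypothetical, the point 25000 is taken from the Price axis of Fig. 1, and the point 32500, where the degree reaches zero, is not given in the text; it is inferred here because it reproduces the µ column of Table I.

```python
def inexpensive(price, full=25000.0, zero=32500.0):
    """Membership degree of a price in the fuzzy set 'inexpensive'.

    Assumed shape: degree 1 up to `full`, then a linear decrease to 0 at `zero`.
    zero=32500 is an inferred value (it matches Table I), not a value from the paper.
    """
    if price <= full:
        return 1.0
    if price >= zero:
        return 0.0
    return (zero - price) / (zero - full)


# Prices from Table I, in the order in which they are listed.
prices = [10461, 16042, 20633, 24000, 28449, 31268, 31562, 32000]
degrees = [round(inexpensive(p), 2) for p in prices]
print(degrees)  # [1.0, 1.0, 1.0, 1.0, 0.54, 0.16, 0.13, 0.07]
```

A tuple with a price above the assumed upper point gets degree 0 and would not appear in the answer at all.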
Query evaluation consists of computing a satisfaction degree for each database tuple, taking into consideration the model of the linguistic term inexpensive as a fuzzy set (Fig. 1); the criterion satisfaction degree µ is simply the membership degree to the fuzzy set. In fact, the vague query is translated into a classical SELECT query whose crisp selection criterion covers all candidate tuples, so that a satisfaction degree µ can be calculated for every selected tuple.

Simple vague criteria, but also some complex ones, have been studied by researchers. Reviews of several categories of linguistic terms involved in fuzzy querying, their fuzzy models and specific operations are presented in [1], [2], [3], [4] and many others. We focus on the possible vagueness of the selection criterion, which involves certain vague terms commonly used in natural language.

III. TEMPORAL CRITERIA IN CRISP DATABASE QUERYING

The simple DATE (or DATE/TIME) data type is available in all relational DBMSs.
The minimal set of operations on dates contains: the addition or subtraction of a number of days to or from a date, the difference between two dates, and the extraction of a date's components (year, month, and day). The minimal set of comparison operators used in selection criteria is:
• relational comparators between date-type expressions
• the BETWEEN operator for time periods
• other complex comparisons using specific library functions
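The crisp operations listed above can be illustrated outside any particular DBMS. The following minimal sketch uses Python's standard datetime module rather than SQL; the example date and the helper name between are arbitrary choices made for illustration, and the helper mimics the inclusive behavior of the SQL BETWEEN operator.

```python
from datetime import date, timedelta

d = date(2010, 3, 15)                 # an arbitrary example date

# Addition / subtraction of a number of days.
later = d + timedelta(days=30)        # 2010-04-14
earlier = d - timedelta(days=30)      # 2010-02-13

# Difference between two dates, expressed in days.
span = (later - earlier).days         # 60

# Extraction of the date's components.
year, month, day = d.year, d.month, d.day


def between(x, low, high):
    """Inclusive interval test, analogous to SQL's BETWEEN."""
    return low <= x <= high


print(later, earlier, span, (year, month, day), between(d, earlier, later))
```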
New types of temporal data have been proposed, even for classical databases, expressing an Instant (of time), an Interval (a length of time) or a Period (an anchored duration of time). In this regard, a very good application developer's guide is [5]; many aspects of querying conventional tables by date/time criteria are discussed and various implementations in different DBMSs are suggested, all within the SQL standard framework. A comprehensive handbook on temporal data management is [6], which presents a history of the treatment of time in databases.

More recently, efforts have been made to propose extensions of the relational database with a temporal dimension of data, the so-called temporal databases ([7], [8], [9]). All information is considered to change with time; thus, new data types for time, new ways to process them, and query algebras dealing with temporal data were conceived. Temporal query languages ([10], [11]) and various styles of interfaces to temporal databases were built ([12], [13]).

Our study leaves all these issues aside and refers to the classical DATE data type; we focus on observing the usual linguistic expressions used when querying databases by temporal criteria.

IV. TEMPORAL LINGUISTIC TERMS

We started from the observation of users' habits during database querying and of their usual way of expressing temporal criteria. First, we chose to analyze these issues from a linguistic viewpoint and then to look for fuzzy models for as many temporal linguistic terms as possible.

In linguistics, time is a grammatical category used to situate an action, event or condition on the time axis, in relation to three references:
• PAST
• PRESENT
• FUTURE

Past: the past period covers the time that has elapsed until now. It includes all actions and events that occurred before the moment of speaking.
Present: the present period is a variable time, conceived as a separate unit between past and future. It consists of actions and events that are taking place at the moment of speaking.
Future: the future period follows the present. It includes all actions subsequent to the moment of speaking.
In any language, time indications are used to express time. They are very often accompanied by aspectual indications. Aspect is a grammatical category showing how the action is viewed in terms of its duration and the degree to which it is carried out. Unfortunately, these indications are not well systematized.

A. Time units

Any regular phenomenon can be used to record the passage of time. The base unit of time is the second, with decimal submultiples (ms, μs, ns) and multiples such as minute, year, century, etc. There are many interdependent units of time, which can be arranged in a pyramid of time units; the relationships between them are important. In this paper, the smallest unit of time considered is the day. Multiples of this unit are calculated as a number of days.

Time periods can be expressed with units from the pyramid or with other, derived units of time. These units are formed from the primitive ones, giving the user the opportunity to express the specific period referred to. The Romanian language allows a large number of derived units of time to be defined, but we studied only a few.

B. Classifying temporal linguistic terms

Temporal linguistic terms can be classified according to several criteria:
• the moment to which the term relates;
• exactness;
• aspect.

By the moment to which it relates, a temporal term can be:
• ABSOLUTE - it does not relate to the moment of speaking;
• RELATIVE - it is related to the moment of speaking.

By the degree of accuracy of the expression, a temporal term can be:
• APPROXIMATE – it indicates a temporal value expressed by a specific value together with an imprecise qualifier (examples: about 5 days ago, about 2 years ago).
• VAGUE – it indicates a temporal value expressed only by an imprecise value (examples: few days ago, few years ago).

In database querying, the most important criterion for the classification of temporal terms is the ASPECT.
In terms of aspect, namely the duration of the action and the degree to which it is carried out, temporal linguistic terms can be classified as:
• PERFECTIVE - the action is regarded as fulfilled at the moment of speaking.
• IMPERFECTIVE - the action is seen as ongoing. It starts in the past and ends in the present or in the future.
• INCHOATIVE - the action is seen as starting at the moment of speaking.
• ITERATIVE - the action is seen as repeating in time.

V. FUZZY MODELS OF THE TEMPORAL LINGUISTIC TERMS

In this section we present how the linguistic terms were modeled at the conceptual level by fuzzy sets. In fact, we consider them as context-free terms, applicable to any temporal data querying; thus any concrete system that implements these ideas will benefit from meta-descriptions of these fuzzy models, while the effective models will be built dynamically at evaluation time, depending on the effective attribute domain. This principle has been discussed in several works, for example in relation to relative object qualification [14], among others.

Generally, in order to dynamically define linguistic terms as fuzzy sets for vague database querying, some descriptive information is necessary:
• the database table and attribute to which the linguistic value applies,
• the label of the linguistic value,
• the position of the linguistic label on the attribute's linguistic domain,
• the characteristic points determining the shape on the axis, corresponding to the linguistic value (these points are values in the attribute's crisp domain).

The fuzzy model of a linguistic term can always be drawn as a shape along the axis of the attribute domain, with the ordinate in [0, 1] (the membership function). Usually, this shape is a trapezium, with particular cases (rectangle, triangle, or singleton) (Fig. 2).

Figure 2. Shapes in fuzzy modeling of temporal linguistic terms

The complexity of temporal linguistic terms requires adding new specific details, so the descriptive information is:
• the label of the temporal term,
• the type of the temporal term (the corresponding aspectual category); the possible values are:
  o PERFECTIVE
  o IMPERFECTIVE
  o INCHOATIVE
  o ITERATIVE
• the time unit,
• the interval (number of days) used to determine the characteristic points of the membership function,
• the reference date: the date to which the membership function refers (the origin of the membership function),
• P0, P1, P2, P3: percentages of the attribute domain that determine the membership function.
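The descriptive information listed above for a temporal term could be held in a small record type. The sketch below is only one possible encoding: the class and field names (Aspect, TemporalTerm, interval_days, percents) are hypothetical, the requirement that the four percentages be non-decreasing is an assumption made here so that they can describe a trapezium, and the sample values are purely illustrative rather than taken from the paper.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Tuple


class Aspect(Enum):
    """The four aspectual categories used as term types."""
    PERFECTIVE = "perfective"
    IMPERFECTIVE = "imperfective"
    INCHOATIVE = "inchoative"
    ITERATIVE = "iterative"


@dataclass(frozen=True)
class TemporalTerm:
    """Meta-description of a temporal linguistic term (hypothetical layout)."""
    label: str                  # e.g. "few days ago"
    aspect: Aspect              # type of the temporal term
    time_unit: str              # e.g. "day", "week", "month"
    interval_days: int          # interval, as a number of days
    reference_date: date        # origin of the membership function
    percents: Tuple[float, float, float, float]  # P0, P1, P2, P3

    def __post_init__(self):
        if len(self.percents) != 4:
            raise ValueError("exactly four percentages (P0..P3) are expected")
        # Assumption: P0 <= P1 <= P2 <= P3, as needed for a trapezium.
        if list(self.percents) != sorted(self.percents):
            raise ValueError("P0..P3 must be non-decreasing")


# Illustrative values only; they do not come from the paper.
few_days_ago = TemporalTerm("few days ago", Aspect.PERFECTIVE, "day",
                            10, date(2010, 1, 31), (0, 30, 70, 100))
print(few_days_ago.label, few_days_ago.aspect.name)
```

Keeping only percentages and an interval in the description, instead of fixed dates, matches the idea that the effective model is built at evaluation time from the actual attribute domain.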
VI. EVALUATING QUERIES WITH VAGUE TEMPORAL CRITERIA

Starting from the fuzzy models of the temporal terms described above, the evaluation of vague criteria is now possible. The evaluation process contains two important steps in which these models are taken into consideration:
1. Converting the vague query into a classical SELECT query based on a crisp search criterion. This criterion has to cover all database tuples that are candidates for selection; the covered interval depends on the support of the fuzzy set model on the attribute domain and is computed dynamically at processing time.
2. Computing the satisfaction degree of the criterion for each candidate tuple, taking into consideration the same fuzzy models.

Generally, the definitions of the fuzzy models are used as follows:
• the temporal linguistic term, its type and its unit of time are identified;
• the domain is determined and the number of days is computed;
• the number of days is used to find the four points required to represent the membership function.

Algorithms for interpreting linguistic terms, in order to evaluate vague queries, were developed for each type of temporal term presented above.

The algorithm used to interpret perfective temporal terms is:
Step 1: Determine the domain, the reference date and the percentages corresponding to the temporal term, using its description from the knowledge base.
Step 2: Determine the number of days corresponding to the domain identified in the previous step. The interval limits are computed with the indicated time unit; the number of days is computed according to the indicated domain.
Step 3: Based on the reference date (from Step 1), compute the calendar date.
Step 4: Compute the percentages of the number of days (from Step 2).
Step 5: Determine the four calendar dates corresponding to the fuzzy set associated with the selection criterion.
Step 6: Check whether the dates corresponding to the four points exceed the limits of the crisp domain.

The algorithm used to interpret imperfective temporal terms is:
Step 1: Determine the four dates that describe the past period, using the algorithm for interpreting perfective temporal terms.
Step 2: Compute the dates of the future period:
p[0] = today - number of days
p[1] = today
p[2] = p[3] = the maximum limit of the domain related to the end date

In order to select the tuples satisfying a vague imperfective temporal criterion, the procedure is as follows:
• The tuples that simultaneously satisfy the following rules are determined:
  o The temporal attribute corresponding to the beginning date contains a value that belongs to the fuzzy set associated with the past period (for which a degree of truth is determined).
  o The temporal attribute corresponding to the end date contains a value that belongs to the fuzzy set associated with the present or future period (for which another degree of satisfaction is determined).
• The fuzzy aggregation connective AND is applied in order to determine the final satisfaction degree of each tuple.
• Only the tuples with a satisfaction degree higher than zero are displayed.

A query including inchoative temporal terms is evaluated similarly to one with perfective terms, the difference being that the action is set in the future. As such, the way to compute the temporal period indicated by the time unit is different.

The algorithm used to interpret iterative temporal terms is:
Step 1: Determine the four percentages corresponding to the selected iterative temporal term, using its description from the knowledge base.
Step 2: Determine the number of time units related to the selected iterative temporal term:
• for the terms daily, often, rarely and from time to time: the number of days in the period of interest;
• for the term weekly: the number of days in the period divided by the number of days in a week;
• for the term monthly: the number of months in the period of interest (a month is considered to have an absolute value; for example: January 2000, February 2000, etc.);
• for the term annually: the number of years in the period of interest (a year is considered to have an absolute value; for example: 2000, 2001, etc.).
Step 3: Set the four points corresponding to the fuzzy set associated with the selection criterion (the points have numerical values).

After the interpretation algorithm is applied, the fuzzy query is transformed into a crisp SQL query, which is sent to the relational database. In order to obtain the highest performance, the satisfaction degree of each tuple is computed in the same query.

Computing the criterion satisfaction is an important step in query evaluation. For each type of representation of the membership function, the satisfaction degree is calculated in a different way. For example, in the case of a perfective term, it is computed as follows:
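\[
\mu(X) =
\begin{cases}
0 & \text{for } X < p_0 \\[4pt]
\dfrac{X - p_0}{p_1 - p_0} & \text{for } p_0 \le X \le p_1 \\[8pt]
1 & \text{for } p_1 \le X \le p_2 \\[4pt]
\dfrac{p_3 - X}{p_3 - p_2} & \text{for } p_2 \le X \le p_3 \\[8pt]
0 & \text{for } X > p_3
\end{cases}
\tag{1}
\]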
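The trapezoidal degree of (1) can be sketched for calendar dates as follows, together with a possible way of deriving the four characteristic dates and of combining two degrees for an imperfective term. Everything here is an illustration under stated assumptions: the function names are hypothetical; placing the four points at percentages of a period that ends at the reference date is an assumed layout, since the text does not spell out how the percentages are applied; and the minimum is used for the fuzzy AND because it is the usual choice, while the text does not name the connective's definition.

```python
from datetime import date, timedelta


def characteristic_dates(reference, n_days, percents):
    """Four dates p0..p3 for a past-oriented term.

    Assumed layout (not stated in the paper): the period covers the n_days
    before `reference`, and each point lies at the given percentage of it.
    """
    start = reference - timedelta(days=n_days)
    return [start + timedelta(days=round(n_days * p / 100)) for p in percents]


def trapezoid(x, p0, p1, p2, p3):
    """Satisfaction degree of date x for the trapezium (p0, p1, p2, p3)."""
    if x < p0 or x > p3:
        return 0.0
    if x < p1:                                   # rising edge
        return (x - p0).days / (p1 - p0).days
    if x <= p2:                                  # plateau
        return 1.0
    return (p3 - x).days / (p3 - p2).days        # falling edge


def fuzzy_and(a, b):
    """Fuzzy AND as the minimum (an assumption; the paper only says 'AND')."""
    return min(a, b)


# Illustrative values only.
p = characteristic_dates(date(2010, 1, 31), 20, (0, 25, 75, 100))
print(p)                                         # Jan 11, Jan 16, Jan 26, Jan 31
print(trapezoid(date(2010, 1, 13), *p))          # 0.4
print(trapezoid(date(2010, 1, 20), *p))          # 1.0
print(fuzzy_and(0.4, 1.0))                       # 0.4
```

A tuple whose combined degree is 0 would be dropped from the answer, in line with the rule that only tuples with a satisfaction degree higher than zero are displayed.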
The crisp query corresponding to the original vague one may be expressed in the following way (it varies according to the representation of the satisfaction degree):

SELECT FORMAT(nume_coloana,'dd-MM-yyyy') AS criteriu, nume_coloane, ROUND(IIF(nume_coloanaFORMAT(P0,'yyyy-MM-dd') AND nume_coloana=FORMAT(P1,'yyyy-MM-dd') AND nume_coloanaFORMAT(P2,'yyyy-MM-dd') AND nume_coloana=FORMAT(P3,'yyyy-MM-dd'),0,0))))))))),7) AS miu
FROM nume_tabela NOLOCK
WHERE nume_coloana>FORMAT(P0,'yyyy-MM-dd') AND nume_coloana