Complex Temporal Patterns Detection over Continuous Data Streams

Lilian Harada

Fujitsu Laboratories Ltd.
4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki 211-8588, Japan

Abstract. A growing number of applications require support for processing data that is in the form of continuous streams, rather than finite stored data. In this paper we present a new approach for detecting temporal patterns with complex predicates over continuous data streams. Our algorithm efficiently scans the stream with a sliding window, and checks the data inside the window from right to left to see if they satisfy the pattern predicates. By first preprocessing the complex temporal patterns at compile time, it can exploit the interdependencies among their predicates, and skip unnecessary checks with efficient window slides at run time. It resembles the sliding window process of the Boyer-Moore algorithm, but it allows complex predicates that are beyond the scope of this traditional string search algorithm. A preliminary evaluation shows that the proposed algorithm is more efficient than the naive approach.

1 Introduction

Over the past few years, a great deal of attention has been directed towards applications that deal with data in the form of continuous, possibly infinite data streams, rather than finite stored data [1]. While the concept of a stored data set is appropriate when large portions of the data are queried many times and updates are infrequent, a data stream is appropriate when new data arrives continuously and there is no need to operate on large portions of the data multiple times. Examples of stream data in recent applications include data feeds from sensor applications, performance measurements in network monitoring and traffic management, vital signs and treatments in medical monitoring, call detail records in telecommunications, and log records or click streams in Web applications, to name a few.

As we illustrate with some examples in section 2, besides the filtering and aggregation of data that have been the focus of most previous work [2-4,10], the detection of complex temporal patterns over the stream data becomes an important issue in many of these new applications. Because stored data and stream data differ in nature, the detection of patterns over them requires different solutions. The construction of indexes or other precomputed summaries of the stored data is the solution usually applied for efficient pattern detection over stored data. However, for stream data that is continuously growing, instead of preprocessing the data, a preprocessing of the complex pattern can derive auxiliary information that allows unnecessary checks of the pattern predicates against stream data to be skipped.

In this paper we explore such an approach for detecting complex temporal patterns over continuous data streams. Our algorithm first preprocesses the complex temporal pattern at compile time and generates auxiliary information that is then used repeatedly at run time to efficiently search the stream with a window that slides from left to right. The stream data inside the window are checked from right to left to see if they satisfy the predicates of the pattern. The essence of the algorithm is that it captures the logical relationships between the complex predicates of the pattern as part of the query compilation, and then infers which window shifts can possibly succeed after a partial match of the stream and the pattern predicates. Thus, our algorithm increases the length of window shifts and minimizes repeated passes over the same data. The basic idea is similar to the one used in some string search algorithms in textual applications, such as the Boyer-Moore algorithm [8], which is widely regarded as one of the most efficient string matching algorithms in practice. However, our algorithm handles patterns with complex predicates that are beyond the scope of the traditional string search algorithms.

This paper is organized as follows. In section 2 we discuss the motivation and essence of our algorithm by using some illustrative examples. In section 3 we present some basic concepts and terminology, and then a detailed description of our algorithm. Section 4 presents some preliminary evaluation results comparing our algorithm with other approaches. Finally, section 5 presents our conclusions as well as some issues for further research.

Y. Manolopoulos and P. Návrat (Eds.): ADBIS 2002, LNCS 2435, pp. 401-414, 2002. © Springer-Verlag Berlin Heidelberg 2002

2 Some Illustrative Examples

2.1 Motivating Examples

Let's consider applications that monitor the physical world by querying and analyzing sensor data. An example of a monitoring application is the supervision of items in a factory warehouse [2]. Each item in the warehouse has a stick-on temperature sensor attached to it. Sensors are also attached to walls and embedded in floors and ceilings. Each sensor provides the measured temperature at regular intervals. The warehouse manager uses the sensor data to make sure that items do not overheat. Typical queries that are run continuously are:

Query 1: "Return repeatedly the abnormal temperatures, that is, those above a threshold value V, measured by all sensors".
Query 2: "Every five minutes retrieve the maximum temperature measured over the last five minutes by all sensors".

The first query shows the filtering of some data satisfying a given condition, while the second one aggregates sensor data over time windows. Filtering and aggregation are recognized to be essential for applications on data streams, and have been addressed by many recent works [2-4,10]. However, besides these filtering and aggregation queries, we believe that the following queries are also of great interest for these monitoring applications.

Query 3: Find sensors whose three consecutive measured temperatures are 35, 36 and 37 degrees.
Query 4: Find sensors whose temperature, starting from a value below 35, goes up by more than 2 degrees in each of three consecutive measures, reaching a temperature higher than 40.
Query 5: Find temporal patterns where the temperature increases from a value below 38 to a value between 40 and 50, followed by two successive drops, returning to a temperature below 38.

The patterns of interest in these temperature monitoring applications range from very simple ones, such as that in Query 3, which looks for three consecutive given temperatures, to the more complex patterns used in Queries 4 and 5, which look for successive increases in the temperature, or unusual spikes that require attention from the warehouse manager but cannot be precisely specified. Instead of absolute single values, relationships such as increased/decreased values, as well as possible ranges of values, are specified in the pattern. More specifically, Queries 3, 4 and 5 look for records r of the data stream whose temperature attribute values match patterns whose predicates p[i] can be expressed as shown in Fig. 1, 2 and 3, respectively.

p[0](r): r.temperature = 35
p[1](r): r.temperature = 36
p[2](r): r.temperature = 37

Fig. 1. Pattern Predicates for Query 3

p[0](r): r.temperature < 35 AND r.temperature < r.next.temperature-2
p[1](r): r.temperature < r.next.temperature-2
p[2](r): r.temperature < r.next.temperature-2
p[3](r): r.temperature > 40
(a)

p[0](r): r.temperature < 35
p[1](r): r.temperature > r.previous.temperature+2
p[2](r): r.temperature > r.previous.temperature+2
p[3](r): r.temperature > 40 AND r.temperature > r.previous.temperature+2
(b)

Fig. 2. Pattern Predicates for Query 4

p[0](r): r.temperature < 38 AND
p[1](r): 40 < r.temperature < 50 AND r.temperature > r.next.temperature
p[2](r): r.temperature > r.next.temperature
p[3](r): r.temperature < 38
(a)

p[0](r): r.temperature < 38
p[1](r): 40 < r.temperature < 50
p[2](r): r.temperature < r.previous.temperature
p[3](r): r.temperature < 38 AND r.temperature < r.previous.temperature
(b)

Fig. 3. Pattern Predicates for Query 5
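To make the two formulations of a pattern concrete, the following minimal Python sketch encodes the Query 4 predicates of Fig. 2 in both forms and checks every window naively, sliding one record at a time. It only illustrates the predicates and the naive baseline, not the R2L algorithm itself. The function names and the sample readings are hypothetical, and temperatures are assumed to be plain numbers from a single sensor.

```python
from typing import Callable, List, Sequence


def matches_with_next(window: Sequence[float]) -> bool:
    """Fig. 2(a): conditions refer to r.next, checked from right to left."""
    t0, t1, t2, t3 = window
    return (
        t3 > 40                          # p[3]
        and t2 < t3 - 2                  # p[2]
        and t1 < t2 - 2                  # p[1]
        and t0 < 35 and t0 < t1 - 2      # p[0]
    )


def matches_with_previous(window: Sequence[float]) -> bool:
    """Fig. 2(b): conditions refer to r.previous, checked from left to right."""
    t0, t1, t2, t3 = window
    return (
        t0 < 35                          # p[0]
        and t1 > t0 + 2                  # p[1]
        and t2 > t1 + 2                  # p[2]
        and t3 > 40 and t3 > t2 + 2      # p[3]
    )


def naive_scan(
    stream: Sequence[float],
    matcher: Callable[[Sequence[float]], bool],
    size: int = 4,
) -> List[int]:
    """Naive baseline: test every window, always shifting by one record."""
    return [
        start
        for start in range(len(stream) - size + 1)
        if matcher(stream[start:start + size])
    ]


# Hypothetical readings, chosen only for illustration.
sample = [30, 34, 37, 40, 43, 41]
print(naive_scan(sample, matches_with_next))
print(naive_scan(sample, matches_with_previous))
```

For these integer readings both calls report the same single match, the window starting at index 1 (34, 37, 40, 43), because `t[i] < t[i+1] - 2` and `t[i+1] > t[i] + 2` state the same condition seen from either neighbor. The naive scan re-reads each record up to four times, which is the cost that longer window shifts aim to reduce.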

The pattern predicates in Query 3 are always equalities with constants. In this case, well-known string-matching algorithms such as the Boyer-Moore algorithm [8] and the Knuth-Morris-Pratt algorithm [7] can be applied efficiently. These algorithms are extensively covered in the literature and won't be described here.
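For a pattern made only of equalities with constants, such as Query 3, a right-to-left matcher in the Boyer-Moore family can be sketched as follows. The code uses the Horspool simplification (bad-character shifts only), not the full Boyer-Moore algorithm of [8]. It assumes the readings are available as a finite list of integers, and `horspool_search` and the sample data are hypothetical names introduced for illustration.

```python
from typing import Dict, List, Sequence


def horspool_search(stream: Sequence[int], pattern: Sequence[int]) -> List[int]:
    """Return the start positions where `pattern` occurs in `stream`."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must not be empty")

    # Preprocess the pattern once: for each value (except the last position),
    # store the distance from its rightmost occurrence to the pattern end.
    shift: Dict[int, int] = {
        value: m - 1 - i for i, value in enumerate(pattern[:-1])
    }

    matches: List[int] = []
    pos = 0
    while pos + m <= len(stream):
        # Compare the window against the pattern from right to left.
        j = m - 1
        while j >= 0 and stream[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            matches.append(pos)
        # Slide the window according to the value under its last position;
        # values that do not occur in the pattern allow a full-length shift.
        pos += shift.get(stream[pos + m - 1], m)
    return matches


# Hypothetical readings containing the Query 3 pattern twice.
readings = [34, 35, 36, 37, 38, 35, 36, 37]
print(horspool_search(readings, [35, 36, 37]))
```

The shift table depends only on the pattern, so it is built once and reused for every window, which mirrors the compile-time preprocessing idea described in the introduction. The table lookup works here because every predicate is an equality with a constant; it has no direct counterpart for the inequality predicates of Queries 4 and 5.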


However, the patterns of Query 4 and Query 5 are much more complex: their predicates are conjunctions of inequalities involving constants and attribute values of neighboring records. Unfortunately, the string-matching algorithms are only applicable when the qualifications in the query are equalities with constants, as in Query 3, and thus cannot be applied in these more complex cases.

In Queries 4 and 5, r.previous indicates the record preceding r in the stream (that is, the one at its left side), while r.next indicates the record succeeding r in the stream (that is, the one at its right side). As illustrated in Fig. 2 and 3, the patterns can be described by specifying conditions on either the left side or the right side records. When the predicates are checked from right to left, that is, first p[3], then p[2], p[1] and p[0], the specifications are expressed by using r.next. On the other hand, when the predicates are checked from left to right, that is, beginning with p[0], the specifications are expressed by using r.previous.

Among the string-matching algorithms, the well-known Boyer-Moore (BM) algorithm compares the string and text characters from right to left, while the Knuth-Morris-Pratt (KMP) algorithm compares them from left to right. Analogously, for these complex predicates, the direction in which the predicates are checked leads to different algorithms and different optimizations. We are analyzing algorithms for both directions, i.e., R2L, which checks in the right-to-left direction, and L2R, which checks in the left-to-right direction and is similar to the OPS algorithm proposed by Sadri et al. in [9]. Because of lack of space, in this paper we only present our R2L algorithm. Details of the L2R algorithm and some differences with OPS that resulted in improvements in some situations can be found in [6].

2.2 An Illustrative Example of the Pattern Detecting Process

Because our R2L algorithm is best introduced with an example, in the following we take Query 5, whose pattern predicates are shown in Fig. 3(a), and illustrate its execution. We consider a very short stream composed of 8 records whose temperature values are 36, 42, 47, 36, 37, 42, 40, 37. As shown in Fig. 4, in the first window processing, we find that p[3](r): r.temperature < 38 is satisfied by the value 36 (since 36<38); p[2](r): r.temperature > r.next.temperature is satisfied by the value 47 (since 47>36); but p[1](r): 40
