SStreaM: A Model for Representing Sensor Data and Sensor Queries
Levent Gürgen, Cyril Labbé, Claudia Roncancio, Vincent Olive
France Telecom R&D, Grenoble - France
LSR-IMAG Laboratory, Grenoble - France
{levent.gurgen,vincent.olive}@rd.francetelecom.fr
{cyril.labbe,claudia.roncancio}@imag.fr
Abstract Sensor querying has become one of the major challenges in the data processing community since the emergence of the new generation of sensors. Sensor networks and data stream processing solutions are two popular ways of querying sensor data. However, no sufficiently generic model exists either for representing sensor data or for formulating queries: models are generally application-dependent, which limits their reuse in different contexts. This paper provides a general conceptual model, named SStreaM, to represent different kinds of sensor data as well as various types of queries, including continuous queries on data streams or on sliding windows. Several aspects of the model are illustrated by an example. The paper also briefly discusses related domains such as sensor databases and data stream management systems, as well as sequential, temporal and real-time databases.
1 Introduction
Sensor querying has become a very popular research topic in many areas of computer science, because the contributions of these tiny yet intelligent devices to the new "information everywhere" paradigm are now well recognized. Various domains are therefore trying to solve the challenges raised by this new way of computing. The networking community studies self-adaptive, energy-efficient, multi-hop routing protocols. The database community is concerned with modelling and querying sensor data. The systems community proposes embedded operating systems adapted to these tiny devices. Finally, ubiquitous computing explores the opportunities that new-generation sensors offer for anywhere, anytime information processing.
Concerning sensor data management, one interesting (and widely used) approach to sensor information processing is to reuse classical database principles. With this approach, sensors are considered as data sources generating data that conform to a schema. According to the place where queries are evaluated, we can distinguish two extreme ways of querying sensors: fully distributed and centralized. Wireless Sensor Networks (WSN) [6] adopt a purely distributed approach: queries are evaluated on the sensors themselves, thanks to their increasing computing and storage capacity. Hence the notion of a sensor database system (SDBS) [11, 25, 19] has been introduced. In Data Stream Management Systems (DSMS) [3, 12, 7], continuous queries over streaming sensor data are evaluated on a centralized, relatively powerful server. Compared to SDBSs, DSMSs support more complex queries (sliding window joins, aggregations, etc.). Some recent works propose to integrate these two approaches, i.e. to adopt a hybrid approach [23, 17, 4]. Regardless of how queries are supported, the most popular approach for modeling sensor data is the well-known relational model. However, the unique characteristics of sensor systems prevent a direct use of relational query evaluation techniques. Therefore, both SDBS and DSMS solutions have proposed adaptations to the sensor context at the level of data and query modeling, query languages and optimization techniques.
This work mainly concerns data and query modeling issues for sensor data management. It reviews related work and concludes that a general model for representing sensor data and queries is still missing. Moreover, we argue that related domains such as temporal, sequential and real-time databases have not been sufficiently explored as a basis for extending sensor query representation. We propose SStreaM, a model for representing sensor data and sensor queries. Our contributions can be summarized as follows:
Sensor data model: We propose a sensor stream data model that gives a general view over sensor data in order to represent different sensor data types. SStreaM's data model is essentially a sequential model, which considers sensor data as a (possibly infinite) sequence of tuples.
Each tuple represents static sensor properties as well as time-varying sensor data.
Sensor query model: Query execution over sensor data streams and over windows (finite sub-streams) is formalized. Query operators include a time dimension in order to reflect the real-time nature of sensor querying. Operators on streams are modeled with tuple-by-tuple execution, while operators on windows are modeled with set-oriented execution.
General window definition: We give a general definition that covers different types of windows, such as temporal and position-based windows, with various behaviors such as fixed, sliding, tumbling or landmark windows.
The rest of the paper is organized as follows. Section 2 positions our work with respect to related domains such as SDBSs, DSMSs, and sequential, temporal and real-time databases. Section 3 presents SStreaM, the data and query model we propose. Section 4 introduces an example scenario and illustrates the use of SStreaM. Section 5 concludes and gives our research perspectives.
2 Related Work
Our work is directly related to two domains, namely sensor database systems (SDBSs) and data stream management systems (DSMSs), and indirectly to three others, namely sequential, temporal and real-time databases. This section gives a brief overview of these domains and positions our proposal with respect to them.
2.1 SDBSs and DSMSs
SDBSs [11, 25, 19] were introduced with the arrival of the new generation of sensors. Thanks to their increasing computing, storage and wireless communication capacities, each sensor can be seen as an autonomous database containing data about its environment (temperature, pressure, geographic location, etc.). Sensors form a wireless sensor network [6], in which queries are distributed by a gateway (or base station) in a multi-hop manner. Continuous queries are evaluated on the sensors (or in the network for some aggregation operators [24]) and the results are collected by the gateway. Sensor data is not materialized before the query is evaluated on the sensor. DSMSs [3, 12, 7] are systems for continuous query processing over data streams. They are conceived not only for sensor applications, but also for monitoring financial, telecommunication, or network data. Unlike SDBSs, most DSMSs are centralized systems, i.e. stream sources send their data to the DSMS, and continuous queries [10] are evaluated on a centralized server. Sensor data is materialized as a data stream. SDBSs and DSMSs are two strongly related domains. In fact, we can speak of sensor stream management systems (SSMS) as a sub-domain of DSMS (see Figure 1). Several recent works in the literature [23, 17, 4, 15] fall into this sub-domain. Below we present some common aspects of sensor data and query representation introduced by DSMSs and SDBSs, as well as the fundamental differences of our approach with respect to these aspects.
In order to represent sensor data, each SDBS defines a data schema, according to which queries are formulated. However, the proposed schemas are rigid and application-dependent, and there is no common agreement about how sensor data should be represented. SStreaM proposes a more general schema that distinguishes three types of attributes: sensor properties, sensor measurement, and timestamp. We thus aim at a more flexible sensor data representation, applicable to different types of sensors.
Representation of sensor data The stream data model used by DSMSs is inspired by the sequential model [29], itself an extension of the well-known relational model. A stream is a (possibly infinite) sequence of tuples that represent sensor measures sampled at different instants; tuples are ordered by the time domain. In addition to time ordering, our approach includes a positional ordering of tuples in order to capture their positional semantics. We thus aim to enhance query representation by including some sequential operators.
Query models In both DSMSs and SDBSs, the most popular way of formulating queries is an SQL-like language [8, 12, 25, 11]. However, the underlying query models are not always formalized. The most serious formalization effort is the one made by the STREAM system [7], namely CQL [8]. Our approach differs from their model in three main aspects. First, we base our model of stream query evaluation on timely tuple-by-tuple execution (our arguments for this choice are discussed later), while CQL uses relation granularity for query execution. Second, in addition to timestamp ordering, our model includes a linear position ordering, through which we aim to take advantage of some positional operators found in sequential databases (see the overview of sequential databases below). Third, we give a more general definition of windows in order to define various types of windows, whereas CQL is limited to sliding windows. In addition, unlike their model, we also define a sliding distance and a sliding rate for windows, as well as a more flexible management of window edges. Another DSMS, TelegraphCQ [12], also provides a general definition of windows by using a for-loop construct. However, it is restricted to temporal ordering and thus only defines temporal windows; moreover, its sliding window definition does not include a sliding rate parameter.
Figure 1. Relations between DSMS and SDBS
Figure 2. Relations between SSMS and sequential, temporal, and real-time databases
2.2 Sequential, temporal and real-time databases
Sequential models [29, 5] have been proposed to represent ordered and grouped data, which is needed for efficient query execution in special kinds of applications such as financial applications, scientific data analysis, and pattern recognition. As mentioned earlier, the sequential data and query model inspired the stream model used by most DSMSs. However, we argue that sequential operators can be exploited more extensively in order to enhance stream query operators. For instance, positional operators (before, next, index, etc.) would increase the expressiveness of queries. In addition to temporal ordering, SStreaM includes linear position ordering in order to support sequential operators.
Temporal databases [20, 13] aim to add the temporal aspects missing from traditional databases. Several proposals rely on the relational model and add a temporal domain in order to represent the temporal semantics of data. The existing models differ in several aspects, such as dimension (valid time vs. transaction time), structure (discrete, continuous, etc.), and time domain representation (chronons, time intervals, sets of intervals, etc.). Temporal expressions (at, along, present, valid at, somewhen, etc.) and temporal operators (overlap, after, during, precede, etc.) have been proposed to extend existing query languages and thus include temporal semantics in queries. DSMSs model the time domain with a one-dimensional representation (valid time) and a discrete structure in which a chronon represents one point in time. However, temporal operators are not exploited in DSMSs. We argue that, since temporal aspects are naturally present in sensor data, temporal operators and expressions would enhance queries on sensor data.
A real-time database system (RTDBS) [27, 21] is a system in which transactions have time constraints for updating the database or answering queries.
Therefore, besides logical consistency (which traditional database systems deal with), RTDBSs must also guarantee temporal consistency. This is also true for sensor stream processing, as sensors deal with real-time events. SStreaM includes a temporal dimension in its query operators to reflect the real-time nature of sensor querying. In addition, in RTDBSs, the choice of a concurrency control protocol plays an important role in keeping the system consistent; due to real-time constraints, blocking protocols are avoided. Concurrent access has not been studied in existing sensor systems, since operations over sensor data are in general read-only. However, we argue that modifications of sensor properties (see Table 1) may raise concurrency control issues for sensor systems. We are investigating the concurrency control aspects of sensor stream management systems in a parallel research direction, which is out of the scope of this paper.
Several works analyzing the relations between the aforementioned domains exist in the literature [26, 22, 14, 28] (see Figure 2). We have given an overview of the interrelations between those domains and sensor querying. However, further work is required to identify which results of those domains can be reused for sensor querying.
3 SStreaM: sensor stream data and query model
This section presents SStreaM, a model of sensor data and sensor queries. Section 3.1 gives our general sensor data schema definition and our stream model, which is based on the sequential model adopted by most DSMS solutions. Section 3.2 presents the query model, which is essentially based on three types of operators: stream operators, window creation operators and windowed operators.
3.1 Data model
3.1.1 Schema definition
Sensor data is represented by tuples. As in conventional database systems, tuples conform to a data schema according to which queries are formulated. Most queries pertain to three different parts of sensor data: meta-information about the sensor (identification, location, type, unit of measure, etc.), the sensor's measurement (temperature, pressure, GPS coordinates, etc.), and a timestamp representing the time at which the measurement was made. Continuous query operators execute on the sensors' measurements (e.g. sensors measuring less than 10). However, in order to locate the sensors whose data will be queried, part of the query is executed on the sensors' meta-information (e.g. temperature sensors in room A measuring less than 10 Celsius). Finally, most queries on sensor data also involve time (e.g. temperature sensors in room A measuring on average less than 10 Celsius over a sliding window of 5 minutes).
Hence, our general sensor data schema distinguishes three types of attributes. The first type is property attributes, which contain meta-information about sensors. Note that not all property attributes are known by the sensors; some of them may be added to the tuples by intermediary units such as proxies, gateways, or servers. The second type is the attribute that changes continuously over time, represented by the measurement field. The semantics of the measurement and of the properties can vary depending on the application context, and their interpretation is left to the application level. For instance, the measurement can represent a temperature reading for one application and an RFID tag number for another. In this way, we aim to allow different types of sensors to coexist while giving a sufficiently general schema definition. Finally, the third type is the timestamp attribute, which represents the time at which the measurement was made by the sensor. We assume that timestamps are assigned to tuples according to a global time (by sensors or by some other intermediary units).
We argue that this representation is generic enough to represent most sensor data types (see Table 1 for some examples).
Table 1. Examples of sensor data
| Sensor | Properties | Measurement | Timestamp |
|---|---|---|---|
| Temperature Sensor | Building A, Room 102, temperature, Celsius | 30 | 10:23 12 June 2005 |
| GPS Reader | Id 13232, GPS localization | 123, 343, 342 | 12:43 13 August 2005 |
| RFID Reader | Id 3434, RFID localization, Room 101 | 1233424242 | 23:34 14 May 2004 |
We can therefore give a formal tuple definition as follows:
Definition 1 A tuple $s$ is a list of several property attributes $a_i$, one measurement attribute $m$, and one timestamp attribute $tmstmp$, i.e. $s = \langle a_1, a_2, \ldots, a_n, m, tmstmp \rangle$. Each attribute $a_i$ belongs to a particular domain $D_i$, $m$ to the measurement domain $M$, and $tmstmp$ to the time domain $T$. $T$ is a totally ordered set containing discrete points that represent different moments in time: $tmstmp \in T = \{t_0, t_1, t_2, \ldots\}$, $T \subseteq \mathbb{N}_0$.
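As an illustration only, Definition 1 can be sketched as a Python dataclass. The class and field names (`SensorTuple`, `properties`, `measurement`, `tmstmp`) are hypothetical, and timestamps are encoded as non-negative integers to match $T \subseteq \mathbb{N}_0$; the rows below reuse the property and measurement values of Table 1 but use made-up integer timestamps instead of the dates shown there.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class SensorTuple:
    """Hypothetical encoding of a tuple s = <a_1, ..., a_n, m, tmstmp> (Definition 1)."""

    properties: Tuple[Any, ...]  # property attributes a_1..a_n (sensor meta-information)
    measurement: Any             # measurement attribute m; its meaning is application-defined
    tmstmp: int                  # timestamp in T, modeled as a non-negative integer

    def __post_init__(self) -> None:
        # Enforce T ⊆ N_0: timestamps are discrete, non-negative points in time.
        if isinstance(self.tmstmp, bool) or not isinstance(self.tmstmp, int) or self.tmstmp < 0:
            raise ValueError("tmstmp must be a non-negative integer")


# Values taken from Table 1; the integer timestamps are illustrative placeholders.
temperature = SensorTuple(("Building A", "Room 102", "temperature", "Celsius"), 30, 1000)
gps = SensorTuple(("Id 13232", "GPS localization"), (123, 343, 342), 1001)
rfid = SensorTuple(("Id 3434", "RFID localization", "Room 101"), 1233424242, 1002)
```

Leaving `measurement` untyped mirrors the point that its interpretation is left to the application: the same schema holds a temperature, a coordinate triple, or a tag number.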
Sequences of tuples form data streams. The next section gives basic stream definitions and notations in order to formalize the stream data model.
3.1.2 Sensor data stream
Definition 2 A stream $S = \{s_0, s_1, \ldots, s_n, \ldots\}$ is a set of tuples $s_i$ ordered by their $tmstmp$ value. In addition, tuples have a linear positional ordering, i.e. the tuple $s_n$ is the $n$th element of the stream $S$. Note that $S$ may contain distinct tuples with the same timestamp value, and an element of $T$ is not necessarily present among the timestamps of the $s_i$. More formally:
Property 1 Let $\tau : S \to T$ be the function that gives the value of the timestamp attribute of a tuple $s_i$ (i.e. $\tau(s_i) = s_i.tmstmp$). This function is, in general, neither injective nor surjective.
Conceptually, streams can be unbounded. However, only a bounded part of a stream is materialized for query processing. We therefore distinguish three parts of a stream: the past, the present and the future (see Figure 3).
Definition 3 The present part of a stream is the part of the stream that is materialized at an instant $t$. We denote this part by $S^t \subseteq S$, and, as in Definition 2, the tuple $s^t_n$ is the $n$th element of $S^t$. Thus, the first element of $S^t$ is $s^t_0$, $|S^t|$ is the cardinality of $S^t$, the last element of $S^t$ is $s^t_{|S^t|-1}$, and $\forall i,\ s^t_i.tmstmp \le t$.
The present part of the stream contains the data currently available for query evaluation. It is mostly materialized as a queue whose size is limited by the memory capacity of the query processing unit. Beyond this limit, data expires from the queue and becomes past data.
Definition 4 The past of a stream $S$, at an instant $t$, is composed of the $s_i \in S$ such that $s_i.tmstmp < s^t_0.tmstmp$.
The past can be stored in a persistent disk-based storage system, and queries over data histories can be evaluated on this part.
Definition 5 The future of a stream $S$, at an instant $t$, is composed of the $s_i \in S$ such that $s_i.tmstmp > t$.
This part represents measures not yet made by the sensors. We can therefore formulate queries to be evaluated on the future values of sensor measures.
Figure 3. The subset $S^t$ of the infinite stream $S$ represents the data currently available for processing (e.g. as a queue in memory).
Figure 4. Unary stream operator
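The past/present/future split of Definitions 3 to 5 can be sketched with a bounded in-memory queue, as suggested by the remark that $S^t$ is mostly materialized as a queue limited by memory. Everything here is an assumption for illustration: the class name `MaterializedStream`, the `(payload, tmstmp)` pair encoding, a capacity counted in tuples, and a plain list standing in for the disk-based history store.

```python
from collections import deque
from typing import Deque, List, Tuple

Tup = Tuple[str, int]  # hypothetical (payload, tmstmp) pair


class MaterializedStream:
    """Sketch of the present part S^t as a bounded queue, with expired tuples kept as past."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.present: Deque[Tup] = deque()  # S^t, in positional (arrival) order
        self.past: List[Tup] = []           # stands in for a disk-based history store

    def append(self, tup: Tup, now: int) -> None:
        if tup[1] > now:
            raise ValueError("a tuple with tmstmp > now belongs to the future part")
        if self.present and tup[1] < self.present[-1][1]:
            raise ValueError("tuples must arrive in timestamp order")
        self.present.append(tup)
        # Beyond the memory limit, the oldest tuples expire and become past data.
        while len(self.present) > self.capacity:
            self.past.append(self.present.popleft())

    def part_of(self, tmstmp: int, now: int) -> str:
        """Classify a timestamp relative to the stream at instant `now`."""
        if tmstmp > now:
            return "future"   # Definition 5
        if not self.present or tmstmp < self.present[0][1]:
            return "past"     # Definition 4 (earlier than s^t_0.tmstmp)
        return "present"      # Definition 3


s = MaterializedStream(capacity=2)
for name, ts in [("a", 1), ("b", 2), ("c", 3)]:
    s.append((name, ts), now=ts)
```

With a capacity of two tuples, appending a third pushes `("a", 1)` into the past, so at instant 3 timestamp 1 is past, 2 is present, and 4 is future.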
3.2 Query model
Queries are composed of several operators; the output of one operator can form the input of another. Operators on streams work on a tuple-by-tuple basis: they take the first tuple(s) from the input stream(s), execute the operation, and write the answer to the output stream. Operators on windows work on a set-oriented basis, similar to relational operators: they take window(s) as input, execute the operation, and write the result to the output stream.
3.2.1 Stream Operators
Definition 6 A stream operator $Op$ has a certain number of input streams and a unique output stream¹, which contains the results of the operation over the input streams, i.e. $S_{out} = Op(S_{in_1}, S_{in_2}, \ldots, S_{in_n})$, where $S_{out}$ denotes the output stream and $S_{in_i}$ an input stream.
Two types of operators are mostly used: unary operators (i.e. $S_{out} = UnOp(S_{in})$) and binary operators (i.e. $S_{out} = BinOp(S_{in_1}, S_{in_2})$). Although numerous unary operators could be defined, the selection and projection operators are general enough to answer a large number of different kinds of queries. Similarly, the join operator is an example of a binary operator. However, join operations are generally executed on windows because of the blocking nature of this operation. Therefore, in this section we only define unary operators. In addition, we deal in particular with the materialized part of streams, which represents present values; hence the following definition:
Definition 7 $UnOp^t$ is a unary stream operator that represents the execution of the operator $UnOp$ at time $t$ over the input stream $S_{in}^t$. $UnOp^t$ takes the first element of $S_{in}^t$ and executes the operation defined by the operator. The result forms the last element of $S_{out}^{t'}$, where $t' = t + \delta t$ and $\delta t$ is the execution duration of the operator (see Figure 4).
¹ Operators producing several streams are not considered.
According to Definition 7, the following formula can be given for $UnOp^t$:
$$UnOp^t(S_{in}^t) = UnOp^t(s^t_{in,0}) = s^{t+\delta t}_{out,|S_{out}^t|} \quad (1)$$
(Recall that $s^t_n$ is the $n$th element of $S^t$.) From the preceding formula we can recover the infinite stream $S_{out}$ as follows:
$$\begin{aligned} S_{out} &= \bigcup_{n=1}^{\infty} UnOp^{t_0+\Delta t_{n-1}}\left(s^{t_0+\Delta t_{n-1}}_{in,0}\right) && (2)\\ &= \bigcup_{n=1}^{\infty} s^{t_0+\Delta t_n}_{out,|S_{out}^{t_0+\Delta t_{n-1}}|} && (3) \end{aligned}$$
where $t_0$ is the time of the first execution of the operator, $n \in \mathbb{N}$ indexes the $n$th execution of the operator, and $\Delta t_n$ is the accumulated duration up to the operator's $n$th execution (i.e. $\Delta t_n = \sum_{i=1}^{n} \delta t_i$). Note that $\delta t_i$ is the duration of the operator's $i$th execution, $|S_{out}^{t_0}| = 0$, and $\Delta t_0 = 0$.
Since tuples are added to the stream continuously, possibly at a high rate, the temporal dimension of query operators, which reflects the real-time nature of sensor queries, becomes more important. Typically, in real-time databases, $\delta t$ is the constraint on the execution of the operator: such systems require $\delta t$ to stay below a certain threshold in order to keep the system temporally consistent. Although this subject is out of the scope of this paper, we note that adding a temporal dimension to the operators makes it easier to take the real-time aspects of these systems into account. See the perspectives section for more details.
In the sensor context, periodic execution of operators is very common: periodic filters over data periodically sent by sensors, operators over periodically sliding windows, etc. To represent these cases, it suffices to replace $\Delta t_n$ with $rate \times n$ in the preceding formulas, where $rate$ is the execution period of the operator.
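The tuple-by-tuple execution of formulas (1) to (3) can be simulated in a few lines. This is a sketch under stated assumptions: time is an integer tick, the durations $\delta t_n$ are supplied as a list instead of being measured, and `run_unary` plus the convention that an operator returns `None` when it produces no output (e.g. a selection whose predicate fails) are hypothetical choices of this example, not part of the model.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Sequence, Tuple, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")


def run_unary(op: Callable[[In], Optional[Out]],
              s_in: Sequence[In],
              t0: int,
              durations: Sequence[int]) -> List[Tuple[Out, int]]:
    """Execute a unary stream operator tuple by tuple (sketch of formulas (1)-(3)).

    durations[n-1] plays the role of delta t_n: the n-th execution starts at
    t0 + Delta t_{n-1} and its result is appended to S_out at t0 + Delta t_n.
    Returns (result, instant at which it was appended) pairs.
    """
    pending: Deque[In] = deque(s_in)
    s_out: List[Tuple[Out, int]] = []
    elapsed = 0                      # Delta t_0 = 0
    for delta in durations:
        if not pending:
            break
        first = pending.popleft()    # UnOp takes the first element of S_in^t
        elapsed += delta             # Delta t_n = delta t_1 + ... + delta t_n
        result = op(first)
        if result is not None:       # a selection may produce no output
            s_out.append((result, t0 + elapsed))
    return s_out


readings = [("s1", 12), ("s2", 8), ("s3", 5)]
# Selection "measurement < 10" with irregular execution durations.
filtered = run_unary(lambda r: r if r[1] < 10 else None, readings, t0=100, durations=[2, 3, 1])
# Periodic execution: Delta t_n = rate * n, here with rate = 5.
periodic = run_unary(lambda r: r[0], readings, t0=0, durations=[5] * 3)
```

Passing a constant list `[rate] * N` reproduces the periodic case described above, where $\Delta t_n$ becomes $rate \times n$.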
3.2.2 Window Creation Operators
Windows are finite subsets of streams. From a general point of view, a window is bounded by two parameters: start and end. We distinguish two types of windows: temporal windows and position-based windows. For temporal windows, the window edges are time points ($start, end \in T$); for position-based windows, the window edges are positions of tuples in the stream ($start, end \in \mathbb{N}$). In both cases $start \le end$; since both edges are inclusive, $start = end$ yields a window restricted to a single position (or a single instant). Window creation operators create windows from streams. Formally:
Definition 8 Let $W$ be a window creation operator over a data stream $S$; it returns a window $R$ bounded by the parameters start and end. For position-based windows: $W_{(start,end)}(S) = R = \{s_i \in S \mid start \le i \le end\}$, $start \ge 0$. For temporal windows: $W_{(start,end)}(S) = R = \{s_i \in S \mid start \le s_i.tmstmp \le end\}$, $start \ge s_0.tmstmp$.
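Definition 8 translates almost directly into two filtering functions. In this sketch the function names and the `(payload, tmstmp)` pair encoding are hypothetical, the stream is assumed to be a finite list already sorted by timestamp, and both window edges are treated as inclusive, as in the definition.

```python
from typing import List, Sequence, Tuple

Tup = Tuple[str, int]  # hypothetical (payload, tmstmp) pair; streams are sorted by tmstmp


def position_window(stream: Sequence[Tup], start: int, end: int) -> List[Tup]:
    """Position-based window: {s_i in S | start <= i <= end}, start >= 0."""
    if not 0 <= start <= end:
        raise ValueError("require 0 <= start <= end")
    return list(stream[start:end + 1])  # both edges are inclusive


def temporal_window(stream: Sequence[Tup], start: int, end: int) -> List[Tup]:
    """Temporal window: {s_i in S | start <= s_i.tmstmp <= end}, start >= s_0.tmstmp."""
    if start > end or (len(stream) > 0 and start < stream[0][1]):
        raise ValueError("require s_0.tmstmp <= start <= end")
    return [s for s in stream if start <= s[1] <= end]


S = [("a", 10), ("b", 12), ("c", 12), ("d", 15)]
```

The sample stream deliberately contains two tuples with timestamp 12 (Property 1: $\tau$ is not injective), so a position-based window over positions 1 to 2 and a temporal window over [11, 14] select the same two tuples.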
Figure 5. Position-based sliding window with rate = 2 and start_adv = end_adv = 3. The window width is 4 units.
As mentioned earlier, we mostly deal with the present values of a stream. Thus, we define an instantaneous window creation operator $W^t$, which creates a window from the stream $S^t$. We use temporal windows to illustrate the rest of this section; the reasoning is similar for position-based windows.
Definition 9 $W^t$ is an instantaneous temporal window creation operator that, at instant $t$, takes as input a stream $S^t$ and returns a window $R$, i.e. $R = W^t_{(start,end)}(S^t) = \{s^t_i \in S^t \mid start \le s^t_i.tmstmp \le end\}$, with $start \ge s^t_0.tmstmp$ and $end \le s^t_{|S^t|-1}.tmstmp$.
In the sensor querying context, windows are generally not fixed, i.e. their edges vary continuously as a function of time. In order to include this kind of window, we give the following window description definition:
Definition 10 A window description $desc$ is a 5-tuple containing the parameters start, end, rate, start_adv, and end_adv. start (resp. end) is the initial value of the first (resp. second) edge of the window, rate is the periodicity of the window, and start_adv and end_adv determine the sliding distance (i.e. how far the window edges advance) in the case of moving windows (see Figure 5 for an example).
We can therefore generalize the window creation operator definition given above:
Definition 11 Let $W'$ be a window creation operator; it takes as input a stream $S$ and a description $desc$, and returns a set of windows created according to the behavior described in $desc$. Formally:
$$W'_{desc}(S) = \bigcup_{n=1}^{\infty} R_n \quad (4)$$
$R_n$ is the window created during the $n$th execution of the operator $W^t$ over the stream $S^t$ (see Figure 6). Formally: $R_n = W^{t_0+(n-1)\times rate}_{(start(n),end(n))}\left(S^{t_0+(n-1)\times rate}\right)$, where $start(n) = start + (n-1) \times start\_adv$ and $end(n) = end + (n-1) \times end\_adv$.
With this general definition, it is possible to define different types of windows: fixed windows ($start\_adv = end\_adv = 0$), landmark windows (either $start\_adv = 0$ or $end\_adv = 0$), tumbling windows ($start\_adv = end\_adv = end - start$), etc. In general, the window width is constant for sliding windows (i.e. $start\_adv = end\_adv = cnst$). However, if we want windows of different sizes at each sliding period, we can define $start\_adv(n)$ and $end\_adv(n)$, which can take different values at the $n$th execution of the window creation operator. Similarly, a non-constant rate parameter implies an aperiodic window: a variable parameter $rate(n)$ defines a different rate for the operator's $n$th execution (e.g. every time a new tuple arrives).
In addition, in some cases the window edges may go beyond the present part of a stream (the part where tuples are currently available). For instance, this can happen when a sliding window advances so fast that the window's end parameter falls into the future part of the stream (see Figure 7). One solution to this problem is to evaluate the windowed operator (see next section) over the window including only its present values; the end parameter of the window then becomes the timestamp of the last element of $S^t$, i.e. if $end > s^t_{|S^t|-1}.tmstmp$ then $end = s^t_{|S^t|-1}.tmstmp$. This solution could be used for periodic operators in order to give at least one result at the end of each period. Another solution could be to wait until the window fills with the required number of tuples before executing the windowed operator; this solution could be adopted when no rate is specified for the operator. Similarly, if $start < s^t_0.tmstmp$, we can either take $start = s^t_0.tmstmp$ or take the tuples from the stored history, if it is available.
Figure 6. $W$ creates windows from the input stream; $WUnOp$ is executed on these windows and creates the output stream.
Figure 7. If $end > s^t_{|S^t|-1}.tmstmp$ then $end = s^t_{|S^t|-1}.tmstmp$.
3.2.3 Windowed Operators
This section introduces windowed operators, i.e. operators executing over windows. They are denoted $WOp$ (Windowed Operator). As for stream operators, we consider two types of windowed operators: unary ($WUnOp$) and binary ($WBinOp$). Traditional aggregation operators such as average, count, sum, min, and max are examples of windowed unary operators, and a windowed join is a binary operator (see Section 4 for operator examples). Due to space restrictions, in this section we only define unary windowed aggregation operators; other operators can be defined analogously. Similarly to Definition 7, an aggregation operator $WUnOp$ takes as input a window $R$ and appends the result tuple to the output stream:
$$WUnOp^t(R) = s^{t+\delta t}_{out,|S_{out}^t|} \quad \text{where } R = W^t_{(start,end)}(S_{in}^t) \quad (5)$$
As in formulas (2) and (3), we can recover the output stream in the case of a periodically sliding window as follows:
$$\begin{aligned} S_{out} &= \bigcup_{n=1}^{\infty} WUnOp^{t_0+(n-1)\times rate}(R_n) && (6)\\ &= \bigcup_{n=1}^{\infty} s^{t_0+n\times rate}_{out,|S_{out}^{t_0+(n-1)\times rate}|} && (7) \end{aligned}$$
where $R_n$ was introduced in formula (4). Note that for some windowed unary aggregation operators (e.g. average, count, sum) and binary operators (e.g. join), the timestamp value of the output tuple is not obvious. There are several ways to handle this problem: i) choose as the output's timestamp one of the timestamp values of the input tuples that contributed to the output tuple (e.g. the minimum [16] or the maximum [9]), or the one indicated by the query [10]; ii) assign a new timestamp (e.g. the operator's execution time [10]); or iii) use a time interval [min_ts, max_ts] instead of a unique timestamp [18]. In order to maintain the temporal order of the output tuples, we choose the maximum of the input timestamp values. It is also the value closest to the one the operator would assign if the second solution were chosen.
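Window descriptions (Definition 10), the edge formulas $start(n)$ and $end(n)$, and a windowed average that stamps its output with the maximum input timestamp can be combined in a short sketch. Several simplifications are assumptions of this example: parameters are constant, windows are temporal and inclusive, the windows are cut from a fully materialized list instead of $S^t$ at each instant (so the edge-overflow handling of Figure 7 is omitted), and the names `WindowDesc`, `windows` and `w_avg` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple

Tup = Tuple[float, int]  # hypothetical (measurement, tmstmp) pair


@dataclass(frozen=True)
class WindowDesc:
    """desc = <start, end, rate, start_adv, end_adv> (Definition 10), constant parameters."""

    start: int
    end: int
    rate: int
    start_adv: int
    end_adv: int

    def edges(self, n: int) -> Tuple[int, int]:
        # start(n) = start + (n-1)*start_adv and end(n) = end + (n-1)*end_adv
        return (self.start + (n - 1) * self.start_adv,
                self.end + (n - 1) * self.end_adv)


def windows(stream: List[Tup], desc: WindowDesc, t0: int,
            count: int) -> Iterator[Tuple[int, List[Tup]]]:
    """Yield (creation instant, R_n) for n = 1..count as temporal windows."""
    for n in range(1, count + 1):
        lo, hi = desc.edges(n)
        instant = t0 + (n - 1) * desc.rate  # the n-th execution happens at t0 + (n-1)*rate
        yield instant, [s for s in stream if lo <= s[1] <= hi]


def w_avg(window: List[Tup]) -> Tuple[float, int]:
    """Windowed AVG whose output takes the maximum input timestamp."""
    if not window:
        raise ValueError("cannot aggregate an empty window")
    avg = sum(m for m, _ in window) / len(window)
    return avg, max(ts for _, ts in window)


S = [(20.0, 0), (22.0, 1), (24.0, 2), (26.0, 3), (28.0, 4), (30.0, 5)]
sliding = WindowDesc(start=0, end=2, rate=1, start_adv=1, end_adv=1)
landmark = WindowDesc(start=0, end=2, rate=1, start_adv=0, end_adv=1)
```

Only the description changes between the sliding and the landmark case; the sliding windows cover [0, 2], [1, 3], [2, 4], while the landmark windows keep their start edge at 0 and grow.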
4 Query example
This section illustrates several aspects of SStreaM. The example is based on a hybrid multi-level architecture defined to query distributed heterogeneous sensors [17]. The architecture (see Figure 8) is composed of three main levels: control sites, gateways and sensors. Control sites are the entry points of the system: users or applications send their queries to a control site, which decomposes each query and sends the sub-queries to the gateways concerned. Gateways are distributed according to an attribute (mostly the location attribute). They group different kinds of sensors, or more precisely their proxies; a proxy is the software controlling one or more sensors. On the gateway, there is also one adapter per proxy, which is the interface between the sensor-specific proxy and our sensor querying system. Adapters are responsible for translating between our query language and the proxy's sensor-specific control commands. Sensors are physically distributed in an environment and send their measures to their proxies periodically or aperiodically. There are different kinds of sensors (temperature, pressure, localization, etc.) with different capabilities, such as query operator processing and storage capacities, which can be used for query optimization (e.g. if a sensor can execute a selection operator, the selection operator is pushed to the sensor).
With this architecture in mind, consider the following scenario, which will be used to illustrate a query example. In a factory, each product passes through a sequence of sections during its production lifecycle. The product stays one minute in each section, where some operations are performed on it, and then passes to the next section. Each section has a gateway containing different kinds of sensor proxies. For our example, we consider two types of sensors: temperature sensors and RFID readers. We assume that there are several temperature sensors placed at different locations in a section, and one RFID reader per section detecting product tags (see Figure 8).
According to the general operator definitions given in Section 3, we introduce the following operators, which will be used in the query example:
Stream operators: $SelOp_P$ takes as input a tuple $s_i$ and returns $s_i$ if the tuple satisfies the predicate $P$. $ProjOp_L$ takes as input a tuple $s_i$ and returns the tuple $s'_i$, which contains only the attributes of $s_i$ listed in $L$.
Windowed operators: $WAVG_{attr}$ takes as input a window and returns the average value of the attribute $attr$ over the tuples present in the window. $WJoin_{JC}$ takes two windows as input and returns the concatenations of the pairs of tuples satisfying the join condition $JC$.
According to our objective of representing different kinds of sensor data, we define a global common schema:
sensor stream: $\langle SId, location, type, measurement, timestamp \rangle$
This schema is actually a view over different distributed databases located at different levels of the architecture (control sites, gateways, proxies) and over the sensors' stream data. Note that the first three attributes (SId, location, type) are the property attributes. Let us consider the following query Q: "Which products in the production chain were exposed to an average temperature above 40 °C during their stay in a section?" This query is executed on the gateway of each section, and the partial results from the gateways are then collected by the control site. We illustrate the part of the query executed in one section.
Figure 8. Architecture and query example
Let $S_1$ be the stream created by the RFID reader and $S_2$ the stream created by the temperature sensors of the section. Q can then be represented in algebraic form as follows:
$$ProjOp_L\Big(SelOp_P\Big(WAVG_{attr}\Big(W_{desc_3}\big(WJoin_{JC}\big(W_{desc_1}(S_1),\ W_{desc_2}(S_2)\big)\big)\Big)\Big)\Big)$$
where $L = \langle S_1.measurement \rangle$, $P = (S_1.type = RFID \land S_2.type = Temperature \land average > 40)$, $attr = S_2.measurement$, and $JC = (S_1.location = S_2.location)$.
There is a join between the RFID reader's data stream ($S_1$) and a sliding window over the temperature sensors' data stream ($S_2$). It is an equi-join on the location property. However, since we assume one gateway per section, this condition is always true; the join therefore simply couples each product with the temperature readings taken during its stay in the section. Since each product stays one minute in a section, the window width is 60 seconds (the smallest time unit being one second). The window is aperiodic: its sliding rate is determined by the products' arrival rate, and both of its edges slide by the difference between the arrival times of two consecutive products. The join is therefore computed between the tuples of $S_1$ and the windows created on stream $S_2$. This window creation operator uses the following description:
$$desc_2 = \langle t_0,\ t_0 + 60,\ rate(n),\ dist(n),\ dist(n) \rangle$$
where $t_0$ is the timestamp of the first tuple in stream $S_1$; $rate(n) = dist(n) = S_1[n+1].timestamp - S_1[n].timestamp$; and $S_1[n]$ is the tuple currently being processed. Although our model does not define joins between a stream and a window, the stream $S_1$ can be considered as a position-based tumbling window whose start parameter is 0, end parameter is 1, rate is aperiodic, and $start\_adv = end\_adv = 1$. Therefore $desc_1 = \langle 0, 1, rate(n), 1, 1 \rangle$.
Finally, the average operation is executed over a temporal tumbling window on the stream produced by the windowed join. Its start, end, and rate parameters have the same values as in $desc_2$, and $start\_adv = end\_adv = 60$ (i.e. $desc_3 = \langle t_0, t_0 + 60, rate(n), 60, 60 \rangle$). The average operation computes the average temperature over a 60-second window for each product and adds an average attribute to the result tuple. Figure 9 shows the part of the query executed on each gateway; the answers of the gateways are then merged by the control site.
Figure 9. Query processing
5
Conclusion and perspectives
This paper proposed SStreaM, a model for representing sensor stream data and queries. SStreaM provides a general sensor stream representation model. It defines three types of operators: stream operators, window creation operators, and windowed operators. These operators include a temporal dimension to reflect the real-time aspects of sensor stream querying. A general window definition and flexible management of window edges are also provided. A prototype implementing SStreaM has been developed for the PISE project [1], whose aim is to monitor electric power equipment in real time. Various sensors give information about the current status of the equipment (intensity, voltage, quality of electricity, etc.). The project uses the OSGI platform [2] and thus adopts a service-oriented approach. Data is collected by sensor services and aggregated by distributed query services on the gateways. Global sensor stream query services at control sites discover and query sensor stream data through the query services and sensor services. Our ongoing research concerns the management of sensor farms. We have found that continuous queries will be executed simultaneously with update transactions modifying sensor properties, which will require specific transaction management. We believe that the temporal dimension of the operators introduced in SStreaM will lead to finer management of continuous queries.
References
[1] PISE Project, http://www.telecom.gouv.fr/rnrt/rnrt/projets/PISE.htm.
[2] OSGI (Open Services Gateway Initiative), http://www.osgi.org.
[3] D. Abadi, D. Carney, U. Çetintemel, M. Cherniack, C. Convey, S. Lee, M. Stonebraker, N. Tatbul, and S. Zdonik. Aurora: a new model and architecture for data stream management. VLDB J., 12(2):120–139, 2003.
[4] D. Abadi, W. Lindner, S. Madden, and J. Schuler. An integration framework for sensor networks and data stream management systems. In VLDB, pages 1361–1364, 2004.
[5] R. Agrawal and R. Srikant. Mining sequential patterns. In ICDE-11, pages 3–14, Taiwan, 1995.
[6] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci. Wireless sensor networks: a survey. Computer Networks, 38(4):393–422, 2002.
[7] A. Arasu, B. Babcock, S. Babu, M. Datar, K. Ito, R. Motwani, I. Nishizawa, U. Srivastava, D. Thomas, R. Varma, and J. Widom. STREAM: The Stanford stream data manager. IEEE Data Eng. Bull., 26(1):19–26, 2003.
[8] A. Arasu, S. Babu, and J. Widom. The CQL continuous query language: Semantic foundations and query execution. Technical Report 2003-67, Stanford University, 2003.
[9] A. M. Ayad and J. F. Naughton. Static optimization of conjunctive queries with sliding windows over infinite streams. In SIGMOD ’04, pages 419–430, NY, USA, 2004.
[10] B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and issues in data stream systems. In PODS ’02, pages 1–16, NY, USA, 2002.
[11] P. Bonnet, J. Gehrke, and P. Seshadri. Towards sensor database systems. Lecture Notes in Computer Science, 2001.
[12] S. Chandrasekaran, O. Cooper, A. Deshpande, M. Franklin, J. Hellerstein, W. Hong, S. Krishnamurthy, S. Madden, V. Raman, F. Reiss, and M. Shah. TelegraphCQ: Continuous dataflow processing for an uncertain world. In CIDR, 2003.
[13] J. Chomicki. Temporal query languages: a survey. In ICTL’94, volume 827, pages 506–534.
[14] W. Dreyer, A. K. Dittrich, and D. Schmidt. Research perspectives for time series management systems. SIGMOD Record, 23(1):10–15, 1994.
[15] P. B. Gibbons, B. Karp, Y. Ke, S. Nath, and S. Seshan. IrisNet: An architecture for a worldwide sensor web. IEEE Pervasive Computing, 2003.
[16] L. Golab and M. T. Ozsu. Update-pattern-aware modeling and processing of continuous queries. In SIGMOD ’05, pages 658–669, NY, USA, 2005.
[17] L. Gürgen, C. Labbé, V. Olive, and C. Roncancio. A scalable architecture for heterogeneous sensor management. In DEXA Workshops, pages 1108–1112, Denmark, 2005.
[18] M. Hammad, W. Aref, and M. Franklin. Efficient execution of sliding-window queries over data streams, 2003.
[19] C. Intanagonwiwat, R. Govindan, D. Estrin, J. Heidemann, and F. Silva. Directed diffusion for wireless sensor networking. IEEE/ACM Transactions on Networking, 2003.
[20] C. S. Jensen, R. T. Snodgrass, and M. D. Soo. The TSQL2 data model. In The TSQL2 Temporal Query Language, pages 153–238, 1995.
[21] B. Kao and H. Garcia-Molina. An overview of real-time database systems. pages 463–486, 1995.
[22] K. Ramamritham. Time for real-time temporal databases? In Proceedings of the International Workshop on an Infrastructure for Temporal Databases, 1993.
[23] S. Madden and M. J. Franklin. Fjording the stream: An architecture for queries over streaming sensor data. In ICDE, pages 555–566, 2002.
[24] S. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong. TAG: A tiny aggregation service for ad-hoc sensor networks. In OSDI, 2002.
[25] S. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong. TinyDB: an acquisitional query processing system for sensor networks. ACM Trans. Database Syst., 2005.
[26] G. Ozsoyoglu and R. T. Snodgrass. Temporal and real-time databases: A survey. TKDE, 7(4):513–532, 1995.
[27] K. Ramamritham. Real-time databases. Distributed and Parallel Databases, 1(2):199–226, 1993.
[28] D. Schmidt, A. K. Dittrich, W. Dreyer, and R. W. Marti. Time series, a neglected issue in temporal database research? In Int. Workshop on Temp. Databases, UK, 1995.
[29] P. Seshadri, M. Livny, and R. Ramakrishnan. The design and implementation of a sequence database system. In Proceedings of VLDB’96, pages 99–110, San Francisco, USA.