Query Processing Systems for Wireless Sensor Networks

Humaira Ehsan and Farrukh Aslam Khan
Department of Computer Science, National University of Computer and Emerging Sciences, A.K. Barohi Road H-11/4, Islamabad, Pakistan
{humaira.ehsan,farrukh.aslam}@nu.edu.pk
Abstract. Wireless Sensor Networks (WSNs) have been widely used during the past few years and have a wide range of applications. Today, users require sophisticated, state-of-the-art systems to control and monitor sensor networks. The older centralized systems for collecting sensor data have become obsolete because of their inflexible data extraction and their scalability problems. Motivated by the success of distributed database systems, researchers proposed viewing the sensor network as a sensor database system, and this concept gained considerable popularity. Based on that concept, many systems have been developed for query processing in WSNs. In this paper, we discuss these existing systems and compare them based on their working and performance.

Keywords: Wireless Sensor Networks (WSNs), Query Processing Systems, Query Optimization.
T.-h. Kim et al. (Eds.): UCMA 2011, Part I, CCIS 150, pp. 273–282, 2011. © Springer-Verlag Berlin Heidelberg 2011

1 Introduction
Wireless Sensor Networks (WSNs) consist of a large number of sensor nodes that interact with the environment without any human intervention. Sensor nodes are extremely small and have limited computation power, storage, and communication capabilities; therefore, using these resources optimally is very important. Because both WSN applications and traditional database applications are data-centric, a lot of research has been done on querying and tasking sensor nodes using the database approach. In a traditional database system, data is stored in a persistent storage repository, while in a WSN the database is distributed and consists of the data acquired by the sensor nodes. In both cases, the user need not be aware of the physical storage organization in order to query the data of interest [1]. In traditional database systems, queries are used to perform operations on data stored on disk. Similarly, in WSNs, queries instruct nodes on how to manage, filter, and process the data they acquire from the environment [2]. Despite these similarities, WSNs differ considerably from traditional database systems: their data is volatile, contains many errors for various reasons, and arrives as a continuous, long-running stream because new data is sensed constantly. Although database management systems have been researched in great depth, applying them to WSNs introduces many challenges, such as how sensor data should be organized and stored, what user interfaces to the sensor database are required, and how queries should be processed and served given the limited computation and communication capabilities of the nodes.

The first and foremost goal of WSN applications is minimizing energy consumption in order to keep the network alive as long as possible. WSN users are typically interested in continuous streams of sensed data from the physical world. Query processing systems provide a high-level user interface to collect, process, and display continuous data streams from sensor networks. These high-level tools help application developers and ad hoc users rapidly develop and use WSN applications, whereas writing WSN applications in a systems language such as C or Java is tedious and error-prone. A query processing system shields users from tasks such as sensing, forming an ad hoc network, multi-hop data transmission, and data merging and aggregation [3]. The responsibility for deploying and managing these networks is usually assigned to an owner that acts as a single controlling entity. Research has shown that it is not yet possible to operate a secure, multi-purpose, federated sensor network involving tens of thousands of sensor nodes that run different applications in parallel and can reconfigure dynamically to run others. For this reason, a sensor network is often dedicated to a single application. Various factors are considered when allocating sensing bandwidth and computation resources, including the query load and the priority and urgency of each application. Special care is taken to provide the desired quality of service while preserving fairness, secure operation, and privacy across applications. During the past few years, researchers have developed many systems for query processing in WSNs. In this paper, we discuss these existing systems and provide a comprehensive comparison of them based on their working and performance.
The rest of the paper is organized as follows. Section 2 discusses various existing query processing systems in detail. Section 3 discusses specific features of these systems and presents their comparison. Finally, Section 4 concludes the paper.
2 Query Processing Systems
The data-centric approach to tasking WSNs was first formally introduced in [4]. Since then, many other approaches have been proposed. In general, query processing falls into two broad categories: the centralized approach and the distributed approach. In the centralized approach, a predefined set of data is regularly delivered from the sensors to a central location, where it is stored in a database. Users query that database through an interface provided by the system. This approach is similar to the warehouse approach of traditional database systems, but it is not well suited to WSNs. First, most WSN applications require real-time data, and offline data is of no use in such scenarios. Second, periodically communicating bulk data from the sensors to the sink wastes a lot of resources. In the distributed approach, data is kept on the sensors, part of the processing is done there, and only the required data is sent to the sink. Distributed query processing in WSNs has been an active research area over the last few years. In the distributed approach, the query workload determines which data is extracted from the sensors. The distributed approach is therefore not only flexible, since different queries extract different data from the sensor network, but also efficient, since only relevant data is extracted from the network. TinyDB [5] [6] and Cougar [7] represent the first generation of distributed query processing systems in WSNs. The other systems discussed here are Corona [8] [9], SINA [10], and SenQ [11].

2.1 Sensor Information Network Architecture (SINA)
Sensor Information Network Architecture (SINA) [10] is a middleware designed for querying, monitoring, and tasking sensor networks. The functional components of the SINA architecture are hierarchical clustering, attribute-based naming, and location awareness. Because of the large number of sensor nodes, nodes are divided into clusters, and each cluster is assigned a cluster head. In this way, a hierarchy of clusters is formed, and all information filtering, fusion, and aggregation is performed through the cluster heads. Most sensor applications depend heavily on the physical environment and the location of the sensors; therefore, location information is a very important component of SINA. Location information can be obtained through GPS, but for economic reasons not all sensor nodes can be equipped with GPS. A number of techniques are available to solve this issue, and any of them can be used in this component. Sensor Query and Tasking Language (SQTL) [10] is a procedural scripting language used as the programming interface between applications and the SINA middleware. A sensor execution environment (SEE) runs on every node and is responsible for receiving, interpreting, and dispatching SQTL messages. SQTL provides many arguments that are used to generate various types of actions. SINA provides various information gathering methods, and combinations of these methods are used according to the application requirements. In a dense sensor network, generating a response from every node and passing it to the sink causes response implosion. Some applications may not need a response from every node; responses from some of the nodes in a certain area may be enough. Through experiments, the authors have shown that a large number of collisions can occur if none of the information gathering techniques is used, and that the diffused computation technique performs better than all the others.

2.2 TinyDB
TinyDB [6] is an acquisitional query processing system designed to work on UC Berkeley motes. Its designers focus on the fact that sensors have control over where, when, and how often data is physically acquired. It is the most widely used of these systems. Its prominent features are intelligent query processing, query optimization, and power-efficient execution. It mitigates faults by automatically introducing redundancy and avoiding problem areas. Fig. 1 illustrates the basic architecture of the system.
Fig. 1. Basic Architecture of TinyDB
In TinyDB, sensor tuples belong to a table named sensors, which logically has one row per node per instant in time and one column per attribute (e.g., light, temperature). TinyDB runs on the TinyOS platform and uses TinySQL for writing declarative queries. TinySQL queries have the form:

SELECT <aggregates>, <attributes>
[FROM {sensors | <buffer>}]
[WHERE <predicates>]
[GROUP BY <exprs>]
[SAMPLE PERIOD <const> | ONCE]
[INTO <buffer>]
[TRIGGER ACTION <command>]

Given a query specifying the user's data interests, TinyDB collects that data from the motes in the environment, filters it, aggregates it, and routes it out to a PC. To use TinyDB, its TinyOS components must be installed on each mote in the sensor network. TinyDB provides a simple Java API for writing PC applications that query and extract data from the network; it also comes with a simple graphical query builder and result display that use the API. TinyDB uses a flooding approach to disseminate queries throughout the network. The system maintains a routing tree rooted at the user. Every sensor node has its own query processor, which processes and aggregates sensor data and maintains routing information. Other important features of TinyDB are metadata management, network topology handling, and handling of multiple queries.

2.3 COUGAR
COUGAR: The Network Is The Database [12] was a project of the Cornell University database systems group. Its authors argue that declarative queries are very well suited to WSN applications, and they proposed a query layer for declarative query processing. Because computation is cheaper than communication in terms of energy in WSNs, they also proposed in-network aggregation: instead of communicating all raw data to the base station, results are aggregated at intermediate nodes and then forwarded toward the base station. Because WSN applications are diverse, their requirements in terms of energy consumption, delay, and accuracy vary from one application to another. The system can therefore generate different query execution plans according to the requirements of different applications. A query plan normally consists of two components: a communication component and a computation component. The plan decides how much computation is pushed into the network and specifies the responsibility of each sensor node, i.e., how to execute the query and how to coordinate the relevant sensors. The network is viewed as a distributed database with multiple tables, where each table corresponds to a sensor type. The proposed software component that is deployed on each sensor is called a query proxy. The proposed query template is given below:

SELECT {attributes, aggregates}
FROM {Sensordata S}
WHERE {predicate}
GROUP BY {attributes}
HAVING {predicate}
DURATION time interval
EVERY time span e
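To make the energy argument for in-network aggregation concrete, the following Python sketch counts the radio messages needed to compute an AVG over a routing tree in two ways: forwarding every raw reading hop by hop to the root, versus having each node merge its children's partial (sum, count) records with its own reading and forward a single record, i.e., aggregating results at intermediate nodes as described above. It is a simplified model under stated assumptions, not Cougar's query proxy: the tree topology, the readings, and the names Node, raw_delivery_messages, and partial_aggregate are hypothetical, and it assumes one message per hop with no losses, packet merging, or synchronization cost.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A sensor node in a fixed routing tree (hypothetical model)."""
    node_id: int
    reading: float
    children: list[Node] = field(default_factory=list)


def raw_delivery_messages(node: Node, depth: int = 0) -> int:
    """Messages needed if every raw reading travels hop by hop to the root.

    A node at depth d needs d transmissions to reach the root (depth 0),
    so the total is the sum of the depths of all nodes.
    """
    return depth + sum(raw_delivery_messages(c, depth + 1) for c in node.children)


def partial_aggregate(node: Node) -> tuple[float, int, int]:
    """Return (sum, count, messages) for the subtree rooted at `node`.

    Each node merges its own reading with the partial (sum, count) records
    of its children and sends a single record to its parent, so every
    non-root node transmits exactly once.
    """
    total, count, messages = node.reading, 1, 0
    for child in node.children:
        child_sum, child_count, child_messages = partial_aggregate(child)
        total += child_sum
        count += child_count
        messages += child_messages + 1  # +1 for the child's record sent to this node
    return total, count, messages


# Hypothetical 7-node tree; node 0 is the root where the result is collected.
tree = Node(0, 20.0, [
    Node(1, 21.0, [Node(3, 22.0), Node(4, 23.0)]),
    Node(2, 24.0, [Node(5, 25.0, [Node(6, 26.0)])]),
])

total, count, messages = partial_aggregate(tree)
print("AVG =", total / count, "using", messages, "messages")  # AVG = 23.0 using 6 messages
print("raw delivery:", raw_delivery_messages(tree), "messages")  # raw delivery: 11 messages
```

In this toy topology, partial aggregation needs 6 messages instead of 11, and the gap grows with tree depth. The sketch relies on AVG being decomposable into SUM and COUNT; aggregates without such a compact partial state (e.g., MEDIAN) do not shrink this way.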
Long-running periodic queries are supported by the DURATION and EVERY clauses. The authors proposed three approaches for in-network aggregation. The first is Direct Delivery, in which each sensor sends its data toward a leader node, and the leader performs the aggregation. The second is Packet Merging, in which several records are merged into a single packet that is then sent, so that the packet overhead is incurred only once. The third is Partial Aggregation, in which each node computes partial results and sends them to the leader. The last two techniques require modifications to the routing protocol, because packets must be intercepted and modified packets must be generated. Packet merging and partial aggregation also require synchronization between the sensors. The authors have not yet developed a complete working system, but their ideas have been partially tested using NS-2.

2.4 Corona
This project, Corona [9], was previously named Sun SPOT Distributed Query Processing (SSDQP) [3]. Corona is a distributed query processor developed at the School of IT, University of Sydney. The system is implemented on Sun SPOTs, a state-of-the-art sensor network hardware platform with full Java support. The platform provides much more memory and computational power than the previous generation of sensor nodes, i.e., Berkeley motes. The system is written entirely in Java on top of the Sun SPOT's Squawk VM, a lightweight J2ME virtual machine, which makes it easy to maintain and extend. As shown in Fig. 2, the system consists of three components:
1. The query engine, which executes on the Sun SPOTs
2. The host system on the user's PC, which is connected to the base station
3. A GUI client, which connects to the host system via TCP/IP
Fig. 2. Basic Architecture of Corona
Corona uses a variant of acquisitional SQL that provides all the features of a query language. A unique feature of the Corona query processor is that it can execute multiple queries simultaneously, so the same network can be used for different applications. Because energy efficiency is the primary goal of every system designed for WSNs, Corona also has components that ensure efficient energy use, such as an in-network clustering operator that is resource-aware and dynamically adapts its processing granularity to keep the number of transmitted messages small.

2.5 SenQ
SenQ [11] is an embedded query system for interactive wireless sensor networks (IWSNs). IWSNs are human-centric and interactive, and applications in this area require a very different set of features. The key challenges that SenQ addresses are heterogeneity, deployment dynamics, in-network monitoring, localized aggregation, and resource constraints. The general architecture of SenQ is illustrated in Fig. 3.
Fig. 3. Architecture of SenQ
SenQ has a layered system design. The lowest two layers require less computation and storage, and they reside on the embedded sensor devices. Layer 3, which handles query management and data storage, resides on the microserver. Layer 4 is a declarative, SQL-like language called SenQL. These layers are loosely coupled to deal with diverse application requirements, so the design of SenQ is very flexible and can easily be adapted to those requirements. SenQ supports two types of queries: snapshot and streaming. Streaming queries collect data and report results continuously until a stop command is executed, while snapshot queries provide efficient point-in-time samples of data. The authors have evaluated SenQ's efficiency and performance in an assisted-living testbed.
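The difference between SenQ's snapshot and streaming queries can be illustrated with a small Python sketch. The class and method names (SensorQueries, snapshot, stream, stop) are hypothetical and do not reflect SenQ's actual interface or SenQL syntax; a plain function stands in for a sensor driver, and the stop command is modeled as a method call rather than a message sent over the network.

```python
from typing import Callable, Iterator


class SensorQueries:
    """Hypothetical illustration of snapshot vs. streaming queries (not SenQ's API)."""

    def __init__(self, sample: Callable[[], float]) -> None:
        self._sample = sample  # stands in for a sensor driver
        self._stopped = False

    def snapshot(self) -> float:
        """Snapshot query: return one point-in-time sample."""
        return self._sample()

    def stream(self) -> Iterator[float]:
        """Streaming query: report samples continuously until stop() is called."""
        self._stopped = False
        while not self._stopped:
            yield self._sample()

    def stop(self) -> None:
        """Stop command that ends an active streaming query."""
        self._stopped = True


# Hypothetical temperature readings produced by the simulated driver.
readings = iter([20.5, 20.7, 21.0, 21.4, 21.9])
queries = SensorQueries(lambda: next(readings))

snap = queries.snapshot()
print(snap)  # 20.5

results = []
for value in queries.stream():
    results.append(value)
    if len(results) == 3:
        queries.stop()  # the user issues the stop command
print(results)  # [20.7, 21.0, 21.4]
```

The snapshot returns immediately after a single sample, while the stream keeps producing results until the stop command is issued. A real streaming query would also follow a sampling schedule and push results over the radio; the generator only captures the control flow.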
3 Features Comparison
This section compares the sensor network query processing systems discussed above. The comparison parameters are discussed first, and the comparison is then summarized in Table 1.

3.1 Event-Based Queries
These queries are executed only when interesting events happen, for example, a button is pushed, a sensor reading crosses some threshold, or a bird enters a nest. Events in TinyDB are generated explicitly, either by another query or by a lower-level part of the operating system. SenQ uses EventSensor drivers in its sensor sampling and processing layer to generate data sporadically. SINA has arguments in SQTL that can be used to trigger events periodically or when the node receives a message.

3.2 Life-Time Based Queries
The query system should provide a way to specify long-running periodic queries. In TinyDB, the LIFETIME clause is used to create lifetime-based queries. In Cougar, the same is accomplished using the DURATION and EVERY clauses. SenQ provides the same feature through streaming queries.

3.3 In-network Aggregation
In this technique, instead of passing raw values through the network, sensor nodes pass on aggregated data along the routing path. This technique is very efficient at saving the constrained resources of the network. TinyDB provides various techniques for in-network aggregation. Cougar was the first project in which in-network aggregation was introduced and then implemented; Cougar did this by modifying the network layer in NS-2. SenQ provides temporal and spatial aggregation in its sensor sampling and processing layer. SINA implements this feature in its information gathering component through the diffused computation operation.

3.4 Multi-query Optimization
In practice, sensor network query systems support many users accessing the sensor network simultaneously, with multiple queries running concurrently.
The simple approach of processing every query independently of the others incurs redundant communication cost and drains the energy reserves of the nodes [13]. In multi-query optimization, optimal result sharing is suggested for the efficient processing of multiple aggregate queries. Corona claims to perform multi-tasking optimally. SenQ does support multi-query execution, but it is not clear whether it performs any optimization.

3.5 Heterogeneous Network
Network nodes can be heterogeneous in terms of energy, bandwidth, computing and storage capabilities, availability of pre-computed results, etc. SenQ is designed to deal with all kinds of heterogeneity in the network.

3.6 Time Synchronization
Because WSNs can be dynamic, nodes leave and join very often. Keeping the nodes synchronized requires an efficient time synchronization technique that consumes minimal energy and gives accurate results.

3.7 Scalability in Network Size
The system should scale to larger networks. WSNs generally consist of a large number of sensor nodes, so the performance of the system should not degrade as the network size increases. Centralized query processing systems had scalability problems, whereas distributed approaches normally scale well with network size.

Table 1. Comparison of various query processing techniques

| Criteria | TinyDB | Cougar | Corona | SenQ | SINA |
|---|---|---|---|---|---|
| Platform | Berkeley Motes + TinyOS | Simulation | SUN SPOT | TinyOS | Simulation |
| SQL type query interface | Yes | Yes | Yes | Yes | Yes |
| GUI | Yes | No | Yes | Yes | No |
| In-network aggregation | Yes | Yes | Yes | Yes | Yes |
| Multi query optimization | No | No | Yes | Yes | No |
| Event Based Queries | Yes | No | Yes | Yes | Yes |
| Life Time based Queries | Yes | Yes | Yes | Yes | Yes |
| Heterogeneous Networks Support | No | No | No | Yes | No |
| Time Synchronization | Yes | Partial | Yes | Yes | No |
| Scalability in network size | Not Clear | Not Clear | Yes | Yes | Yes |
3.8 Interfaces
Interfaces are very important for the users of the system. Cougar provides no graphical user interface (GUI). TinyDB provides a GUI, an SQL-like query language, and programming capabilities. Corona also has both GUI and SQL interfaces. SenQ has the most sophisticated interfaces for all categories of users, i.e., database experts, application programmers, and ad hoc users. SINA provides an interface through a procedural scripting language, but no GUI.
4 Conclusion
This paper presents a review of existing query processing systems for Wireless Sensor Networks (WSNs). Generally, TinyDB is the most widely used system because of its availability. It is a ready-to-use system for standard Mica mote networks, in which the user enters simple SQL-like queries at the base station PC. A high degree of optimization is possible; however, it requires modifying the underlying network layer or developing a "wrapper" around that layer to provide the required functionality. Cougar has yet to be implemented on a real testbed. Corona is the latest of these systems and benefits from a platform with more capabilities than the Mica motes, so it can serve the more powerful sensor networks of the next generation. SenQ mainly targets a sub-domain of WSNs, i.e., interactive WSNs, but because of its loosely coupled layered architecture it can be adapted to any kind of WSN. Its support for heterogeneous networks makes it suitable for all kinds of applications. This paper has presented a feature comparison of the best-known query processing systems. However, a true performance comparison of the existing systems is still required, in which their energy efficiency, accuracy, delay, and results are compared. Standard benchmarks should also be designed to test the performance of such systems.
References
1. Trigoni, N., Guitton, A., Skordylis, A.: Chapter 6: Querying of Sensor Data. In: Learning from Data Streams: Processing Techniques in Sensor Networks, pp. 73–84. Springer, Heidelberg (2007)
2. Amato, G., Baronti, P., Chessa, S.: Query Optimization for Wireless Sensor Network Databases in the MadWise System. In: Proc. of SEBD 2007, Torre Canne, Fasano, BR, Italy, pp. 242–249 (2007)
3. Scholz, B., Gaber, M.M., Dawborn, T., Khoury, R., Tse, E.: Efficient time triggered query processing in wireless sensor networks. In: Lee, Y.-H., Kim, H.-N., Kim, J., Park, Y.W., Yang, L.T., Kim, S.W. (eds.) ICESS 2007. LNCS, vol. 4523, pp. 391–402. Springer, Heidelberg (2007)
4. Intanagonwiwat, C., Govindan, R., Estrin, D.: Directed Diffusion: A Scalable and Robust Communication Paradigm for Sensor Networks. In: Sixth Annual ACM/IEEE International Conference on Mobile Computing and Networking, Boston, USA (2000)
5. Madden, S.R., Franklin, M.J., Hellerstein, J.M., Hong, W.: TinyDB: An Acquisitional Query Processing System for Sensor Networks. ACM Trans. Database Syst. 30(1), 122–173 (2005)
6. TinyDB, http://telegraph.cs.berkeley.edu/tinydb/overview.html
7. Demers, A., Gehrke, J., Rajaraman, R., Trigoni, N., Yao, Y.: The Cougar Project: A Work in Progress Report (2003)
8. The Corona Website (2009), http://www.it.usyd.edu.au/~wsn/corona/
9. Khoury, R., et al.: Corona: Energy-Efficient Multi-query Processing in Wireless Sensor Networks. In: Kitagawa, H., Ishikawa, Y., Li, Q., Watanabe, C. (eds.) DASFAA 2010. LNCS, vol. 5982, pp. 416–419. Springer, Heidelberg (2010)
10. Jaikaeo, C., Srisathapornphat, C., Shen, C.: Querying and Tasking in Sensor Networks. In: SPIE's 14th Annual Int'l. Symp. Aerospace/Defense Sensing, Simulation, and Control, Orlando, FL (2000)
11. Wood, A.D., Selavo, L., Stankovic, J.A.: SenQ: An Embedded Query System for Streaming Data in Heterogeneous Interactive Wireless Sensor Networks. In: Nikoletseas, S.E., Chlebus, B.S., Johnson, D.B., Krishnamachari, B. (eds.) DCOSS 2008. LNCS, vol. 5067, pp. 531–543. Springer, Heidelberg (2008)
12. Cougar, http://www.cs.cornell.edu/bigreddata/cougar/index.php
13. Demers, A., Gehrke, J., Rajaraman, R., Trigoni, N., Yao, Y.: Directions in Multi-Query Optimization for Sensor Networks. In: Advances in Pervasive Computing and Networking, pp. 179–197. Springer, Heidelberg (2004)