In-network Data Processing for Wireless Sensor Networks

Yingwen Chen, Hong Va Leong, Ming Xu, Jiannong Cao, Keith C.C. Chan, Alvin T.S. Chan

Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
School of Computer Science, National University of Defense Technology, Changsha, China
{csychen, cshleong, csmingxu, csjcao, cskcchan, cstschan}@comp.polyu.edu.hk

Abstract

In wireless sensor networks, energy is the most crucial resource. In-network data processing is a common technique in which an intermediate proxy node is chosen to house a possibly complicated data transformation function that consolidates the sensor data streams from the source nodes, en route to the sink node. We investigate the placement problem of the proxy. We formulate and solve the energy minimization problem analytically, based on an ENergy-Efficient Rate-Governed Yardstick (ENERGY). An optimal solution is derived based on complete network topology information. Taking into account the realistic sensor network constraint that a node knows only its neighboring network connectivity, we develop an approximate but effective solution, ENERGY*. We evaluate the performance of ENERGY*, which performs well even in low-density networks and for queries requesting data from distant sources.

1 Introduction

A wireless sensor network consists of a collection of communicating nodes, each equipped with sensors that collect real-time data and deliver it to the sink node. Sensor nodes are battery-powered and energy is the most crucial resource. Many existing research works address the problem of minimizing energy consumption by minimizing the communication overhead. Since sensor nodes possess local computation abilities, part of the computation can be off-loaded from the sink node [4]. In general, performing data operations inside the network, such as eliminating irrelevant records and aggregating raw data, can reduce energy consumption and improve sensor network lifetime significantly. This is referred to as in-network data processing, in which an intermediate proxy node is chosen to house the data transformation function to consolidate the sensor data streams from the data source nodes, before forwarding the processed stream to the sink. Proper placement of the data transformation function has a strong impact on the energy consumed by data transmission.

In this paper, we investigate the optimal placement problem of the transformation function and propose an effective solution based on an ENergy-Efficient Rate-Governed Yardstick (ENERGY). We define the system model and energy cost function and derive a global optimal solution. We then relax the assumption on network topology, using the coordinates of the sensor nodes to derive an approximate solution based on ENERGY, known as ENERGY*.

The remainder of this paper is organized as follows. Section 2 presents some related research work. In Section 3, we formulate our problem and define the sensor network model. Section 4 details the ENERGY* solution for finding the ideal placement of the data transformation function. In Section 5, we conduct simulated experiments to evaluate the performance of ENERGY*. Finally, we conclude the paper briefly and outline some of our future research directions.

This research is partially supported by a research grant from the Department of Computing, The Hong Kong Polytechnic University and the Doctoral Foundation of National Education Department under Grant No.20059998022.

2 Related Work

In-network processing originated from the Internet and was then extended to sensor networks by means of a software architecture [5]. Opportunistic data aggregation was adopted, since in-network aggregation could significantly reduce network traffic. In-network data processing can be classified into hierarchical and non-hierarchical approaches [2]. The former allows the aggregation of partially aggregated data streams while the latter only aggregates the streams at a single node [10].

There are a number of research issues for hierarchical in-network data processing. Node synchronization in performing partial aggregation is necessary. Data transmission reliability can be enhanced through proper modification of the routing protocol [11]. Since finding the optimal aggregation tree to support partial aggregation can be shown to be equivalent to finding the Steiner tree, an NP-hard problem, a greedy aggregation method to support the partial aggregation is needed [7]. A greedy incremental tree can improve path sharing and reduce transmission energy.

For non-hierarchical in-network data processing, the placement of the aggregation node is critical to energy consumption. An Adaptive and Decentralized Operator Placement Strategy (ADOPS) is proposed to address transformation function placement [3]. In ADOPS, a neighbor exploration strategy is adopted, so that sensor nodes can continuously refine the placement of the function to minimize the amount of data transmitted. However, it performs poorly for networks with low node density, due to a lack of sufficient neighbors to carry out the neighbor exploration strategy. The placement of the operator might hit local minima far from the global one. Furthermore, at regular intervals, estimated costs on candidate nodes have to be sent to the active node for comparison. The exchange of these control messages leads to substantial overhead. Our work is related to ADOPS.

Proceedings of the 7th International Conference on Mobile Data Management (MDM'06) 0-7695-2526-1/06 $20.00 © 2006

IEEE

3 System Model

We model the sensor network as a graph $G = (V, E)$, where each vertex represents a sensor node and each edge represents the existence of communication ability between two nodes. A sink node is a sensor node that sends out a query message to one or more source nodes to gather information in the sensor network. In order to reduce the transmission energy consumption, data processing could be done along the path from the data sources to the sink. A sensor node is called a proxy node when it receives two or more raw data streams and aggregates, correlates or filters these data streams into a single stream. Using a node along the route to the sink as a proxy node can reduce the total volume of data transmitted.

Consider that the sink node $s$ initiates a query with lifetime $T$, for inputs from $n$ data sources $d_1, \dots, d_n$. In response to the query, each source $d_i$ transmits a data stream at bit rate $r_i$ bps. The sink would send the transformation function $f$ to the proxy $p$, which will process the input streams using $f$ and forward the result stream at a bit rate of $r_p$ bps to $s$.

To formulate the energy consumption by sensor nodes in ENERGY, we first define some notations. Let $h(u, v)$ denote the hop count between nodes $u$ and $v$; $e$ denote the average energy consumption for transmitting one bit; $r_k$ denote the transmission rate of data stream $k$; $t_k$ denote the lifetime of data stream $k$. Then the total energy consumption of data stream $k$ transmitted between $u$ and $v$ is $e \, r_k \, t_k \, h(u, v)$. The data-delivery path from the source nodes to the sink can be considered as a sink tree, where the sink is the root and the source nodes are leaves of the tree. Let us assume that the cost of a query is only measured by the energy consumption due to transmission (since CPU cost is much lower than data transmission cost [1]) and that the size of the data processing function is $S_f$ bits. The total cost of the query can be formulated as a function of the proxy node $p$:

$$C(p) = e \left[ S_f \, h(s, p) + T \sum_{i=1}^{n} r_i \, h(d_i, p) + T \, r_p \, h(p, s) \right] \qquad (1)$$

Let $h_{\min}(u, v)$ denote the minimal hop count between nodes $u$ and $v$. The cost on choosing $p$ as the proxy is:

$$C_{\min}(p) = e \left[ S_f \, h_{\min}(s, p) + T \sum_{i=1}^{n} r_i \, h_{\min}(d_i, p) + T \, r_p \, h_{\min}(p, s) \right] \qquad (2)$$

The optimal placement problem of the data transformation function can be formulated as an optimization problem, based on Equation 3, i.e., optimal placement of the proxy node to perform in-network data processing, so as to minimize the total query cost.

$$p^{*} = \arg\min_{p \in V} C_{\min}(p) \qquad (3)$$

If the whole topology of the network is known, this optimization problem can be solved by applying Dijkstra's or Floyd's algorithm to determine the minimal hop count between any pair of nodes, followed by a search for the minimal energy consumption among all possible proxy nodes. However, it is expensive for a node to obtain knowledge of the whole network topology, especially when the location of nodes, and hence topology, could change over time. A practical method for finding the optimal proxy should only rely on local or partial topology information of the network. This consideration gives rise to our ENERGY* approximation.

4 ENERGY*

4.1 Approximation

Without knowing the whole network topology, it is not easy to get the exact minimal hop count between any two nodes. However, it is intuitive that the hop count between any two nodes is generally related to the "distance" between them. Manhattan distance [3] and Euclidean distance [9] have often been used to estimate the hop count in the calculation of transmission energy consumption in wireless sensor networks. In our ENERGY* approximation, we employ Euclidean distance as an estimate of the hop count. Let $\delta(u, v)$ denote the Euclidean distance between nodes $u$ and $v$ and $(x, y)$ denote the position of the proxy node $p$. The total cost of the query can be approximated as:

$$\hat{C}(x, y) = e \left[ S_f \, \delta(s, p) + T \sum_{i=1}^{n} r_i \, \delta(d_i, p) + T \, r_p \, \delta(p, s) \right] \qquad (4)$$

Thus the optimization problem with respect to Equation 3 can be transformed into a problem based on Equation 5, i.e., to find the optimal point in the $x$-$y$ plane, so as to minimize the total cost function $\hat{C}(x, y)$.

$$\min_{(x, y)} \hat{C}(x, y) \qquad (5)$$

Generally speaking, optimization based on Equation 5 is an unconstrained optimization problem, which can be solved numerically using iterative methods. The numerical solution reflects the optimal proxy node, if sensor nodes were allowed to be located anywhere in the $x$-$y$ plane and data could be transmitted along straight lines between any two nodes. In fact, the solution can be considered as the coordinates of a "virtual" node, where the transformation function can be placed to minimize the cost function. However, the virtual node does not exist in the real sensor network and data should be transmitted hop by hop. Thus, we need to map the virtual node to a real node in the network as the proxy and build up the whole routing tree for the query. These steps are described in greater detail in the subsequent sections.

We solve the optimization problem iteratively: starting from an initial estimate of a minimizer of the objective function $\hat{C}$, a sequence of improving estimates is generated. We employ the Steepest Descent Algorithm to solve the unconstrained optimization problem, in combination with the Golden Section Algorithm to minimize the objective along each descent direction.
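The iterative solution described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the approximate cost is a weighted sum of Euclidean distances from the candidate point to the sink and to each source, and the names `cost`, `gradient`, `golden_section` and `virtual_proxy`, the centroid starting point, and the tolerances are all hypothetical choices made for this example.

```python
import math

GOLDEN = (math.sqrt(5) - 1) / 2  # golden ratio conjugate, about 0.618


def cost(p, anchors):
    """Weighted sum of Euclidean distances from p to each (x, y, weight) anchor."""
    return sum(w * math.hypot(p[0] - x, p[1] - y) for x, y, w in anchors)


def gradient(p, anchors, eps=1e-12):
    """Gradient of cost at p; anchors coinciding with p are skipped."""
    gx = gy = 0.0
    for x, y, w in anchors:
        d = math.hypot(p[0] - x, p[1] - y)
        if d > eps:
            gx += w * (p[0] - x) / d
            gy += w * (p[1] - y) / d
    return gx, gy


def golden_section(phi, lo, hi, tol=1e-6):
    """Minimise a unimodal one-dimensional function phi on [lo, hi]."""
    a, b = lo, hi
    while b - a > tol:
        c = b - GOLDEN * (b - a)
        d = a + GOLDEN * (b - a)
        if phi(c) < phi(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return (a + b) / 2


def virtual_proxy(anchors, max_iter=200, tol=1e-6):
    """Steepest descent with a golden-section line search (assumed setup)."""
    total = sum(w for _, _, w in anchors)
    # Hypothetical starting point: the weighted centroid of the anchors.
    p = (sum(x * w for x, _, w in anchors) / total,
         sum(y * w for _, y, w in anchors) / total)
    xs = [x for x, _, _ in anchors]
    ys = [y for _, y, _ in anchors]
    span = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    for _ in range(max_iter):
        gx, gy = gradient(p, anchors)
        norm = math.hypot(gx, gy)
        if norm < tol:
            break
        dx, dy = -gx / norm, -gy / norm  # unit steepest-descent direction
        step = golden_section(
            lambda a: cost((p[0] + a * dx, p[1] + a * dy), anchors), 0.0, span)
        candidate = (p[0] + step * dx, p[1] + step * dy)
        if step < tol or cost(candidate, anchors) >= cost(p, anchors):
            break
        p = candidate
    return p
```

Under these assumptions, the sink would enter `anchors` with a weight covering both the function transfer and the result stream, and each source with a weight equal to its rate times the query lifetime. Because a weighted sum of distances is convex, the one-dimensional function along any line is unimodal, which is what the golden-section search requires.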

4.2 Node Mapping

We assume that the sink node knows the locations and data bit rates of all the source nodes, so that it can determine the coordinates of the optimal virtual node for the proxy, based on the ENERGY* formulation. The next step is to map the virtual node into a real proxy node in the sensor network. To this end, the sink sends out a probe packet towards the virtual node, addressed by its coordinates, using the GPSR routing protocol [8]. If the coordinates of the virtual node happen to be the same as those of a real node, the probe packet will finally reach that real node, which can be taken as the proxy node; otherwise the virtual node is either inside an interior face or outside the exterior face. GPSR will forward the probe packet until the packet reaches the corresponding face. When reaching the interior or exterior face, the packet will tour around the entirety of the face. By appending location information of the traversed nodes to the packet as a traversal list, the forwarding node can determine whether the packet has completed a full traversal of a face [6]. With the location information of all the nodes on the face recorded, we can forward the packet to the node geographically closest to the virtual node, and take that node as the proxy. If several nodes are at the same distance from the virtual node, we may simply choose the first one in the traversal list to break the tie.
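The final selection step, picking the closest node on the traversed face and breaking ties by traversal order, can be sketched as below. The function name `map_virtual_to_real` and the `(node_id, (x, y))` layout of the traversal list are assumptions for illustration; the GPSR face traversal itself is not modelled.

```python
import math


def map_virtual_to_real(virtual, traversal):
    """Return the id of the traversed node closest to the virtual node.

    virtual   -- (x, y) coordinates of the virtual node
    traversal -- list of (node_id, (x, y)) in the order the probe visited them
    """
    best_id, best_dist = None, math.inf
    for node_id, (x, y) in traversal:
        d = math.hypot(x - virtual[0], y - virtual[1])
        if d < best_dist:  # strict '<' keeps the earliest node on a tie
            best_id, best_dist = node_id, d
    return best_id
```

For example, with a virtual node at `(1, 0)` and a traversal list whose first two nodes sit at `(0, 0)` and `(2, 0)`, both are at distance 1 and the first one is chosen.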

4.3 Route Discovery

The proxy node needs to build up the routing tree for the complete query, where the sink node is the root and the source nodes are leaves. Since the locations of the source nodes and the sink node can be included in the query request, the proxy can also use GPSR to find the routing path to each source node. However, the routing path found by GPSR might not be the optimal one with the least hop count, so it is not energy efficient for long-running queries. Instead, we adopt Directed Diffusion [7] to find the least-hop paths from the proxy to the sink and the sources, so as to construct the routing tree for the query with minimum energy consumption. The proxy initiates the route discovery process by broadcasting a routing request for the sink and the sources. After receiving the request for the first time, a node broadcasts the message to its other neighbors, unless the request came from its only neighbor. If it is not the first time, no action is needed. When the request reaches the sink or the sources, they will send back the route along the reversed path. Assuming that the time to transmit a packet between neighboring nodes is approximately the same for all links, the distributed route discovery process returns paths with the minimal number of hops.
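Under the equal-delay assumption, the flooding process behaves like a breadth-first search rooted at the proxy, which is why the first copy of the request to arrive marks a least-hop path. The sketch below simulates that centrally; it is an illustration only, and `discover_routes` and the adjacency-dictionary representation are hypothetical names, not part of Directed Diffusion.

```python
from collections import deque


def discover_routes(adjacency, proxy, targets):
    """Simulate the request flood and return least-hop reverse paths.

    adjacency -- dict mapping each node to a list of its neighbors
    proxy     -- node that initiates the route request
    targets   -- the sink and source nodes to be reached
    Returns a dict: target -> [target, ..., proxy] along the reversed path.
    """
    parent = {proxy: None}  # node each request was first received from
    queue = deque([proxy])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in parent:  # only the first reception is acted on
                parent[neighbor] = node
                queue.append(neighbor)
    routes = {}
    for target in targets:
        if target in parent:
            path = [target]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            routes[target] = path
    return routes
```

Each returned list starts at the target and ends at the proxy, mirroring the route being sent back along the reversed path.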

5 Simulation Studies

We evaluate the energy consumption with and without in-network data processing, and make a comparison between ENERGY* and other approaches. In our simulation, the sensor nodes are distributed in a square region according to the uniform distribution. A communication graph is generated assuming that all nodes have the same transmission range. A summary of the query and sensor network parameters and their default values is presented in Table 1.

Parameter                           Default value
Coverage of sensor network          300 by 300
Number of sensor nodes              250 - 450
Transmission range                  25
Number of data sources              4
Size of transformation function     100
Lifetime of query                   100
Source data rate                    10
Consolidated data rate              10
Energy consumption per bit          1
Query distance                      3 - 10 hops

Table 1. Parameters of query and sensor network

We consider two types of queries that require different data bit rates from data sources. The first type is the uniform query (denoted Uniform), with an equal data bit rate requirement from all data sources. The second type is the non-uniform query (denoted NonUniform), in which the data bit rates are progressive across the four sources. There are also two types of queries that differ in the geographical location of data sources. The first type is the spatially constrained query (denoted Constrained), which only requires data from sources that are relatively close to one another spatially (we assume that the sources are located within a constrained spatial area of 50 by 50, representing about 3% of the sensor field). The other type is the non-spatially constrained query (denoted Unconstrained), in which the sources are located arbitrarily within the sensor field. In order to eliminate boundary effects, we choose the sink and the sources only within the central region of size 250 by 250.

We evaluate different kinds of transformation function placement strategies, including OPTIMUM, ENERGY*, ADOPS [3], and BASELINE. In OPTIMUM, the data transformation function is placed at a globally optimal node. Although we cannot achieve the global optimum in a real sensor network, we assume that it can be computed using complete topology information, and use it as a benchmark for comparison. In BASELINE, the transformation function is placed on the sink, without in-network data processing. We generate 30 connected network instances for each simulation and spawn 100 queries in each network instance.
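As an illustration of how the OPTIMUM and BASELINE benchmarks could be computed from complete topology information, the sketch below runs a breadth-first search from the sink and from each source (sufficient for hop counts on an unweighted graph) and then scans every node as a candidate proxy. The cost expression is an assumption reconstructed from the model description: function size plus result stream volume over the sink-to-proxy hops, plus each source stream volume over its hops to the proxy. The function names and keyword parameters are hypothetical, with defaults taken from Table 1.

```python
from collections import deque


def hop_counts(adjacency, start):
    """Minimal hop count from start to every reachable node (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def query_cost(from_sink, from_sources, rates, proxy,
               func_size=100, lifetime=100, out_rate=10, energy_per_bit=1):
    """Assumed transmission cost of a query when `proxy` hosts the function."""
    sink_part = (func_size + lifetime * out_rate) * from_sink[proxy]
    source_part = lifetime * sum(
        rate * hops[proxy] for rate, hops in zip(rates, from_sources))
    return energy_per_bit * (sink_part + source_part)


def optimum_proxy(adjacency, sink, sources, rates, **params):
    """Exhaustively pick the node with the lowest cost (connected graph assumed)."""
    from_sink = hop_counts(adjacency, sink)
    from_sources = [hop_counts(adjacency, d) for d in sources]
    best = min(from_sink, key=lambda p: query_cost(
        from_sink, from_sources, rates, p, **params))
    return best, query_cost(from_sink, from_sources, rates, best, **params)


def baseline_cost(adjacency, sink, sources, rates, **params):
    """Cost when the function stays on the sink (no in-network processing)."""
    from_sink = hop_counts(adjacency, sink)
    from_sources = [hop_counts(adjacency, d) for d in sources]
    return query_cost(from_sink, from_sources, rates, sink, **params)
```

On a small chain `s - a - b` with two sources attached to `b`, the scan places the function at `b`, where the two raw streams merge after a single hop each, rather than at the sink.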

5.1 Impact of Query Distance

We first evaluate the energy consumption with varying query distance. The query distance reflects how far the sources are from the sink. We fix the number of sensors to 300. The results are depicted in Figure 1.

Figure 1 shows that in-network data processing reduces energy consumption significantly for all kinds of queries. Besides, ENERGY* performs much better than ADOPS in most cases, except for short-distance queries with more concentrated data sources. This is because, for short-distance queries with constrained data sources, the local optimum discovered by ADOPS may not be far away from the global optimum, thus producing a reasonable performance. ENERGY* needs more computation to produce an approximately optimal proxy, and the mapping from the virtual node to a real sensor node can be a source of sub-optimality; for such queries, the approximation accounts for a larger share of the deviation from OPTIMUM. As the query distance increases, the locally optimal placement found by ADOPS might be further away from the global optimum, but ENERGY* generates a proxy not too far away from the optimal one, thus performing much better than ADOPS. When the average query distance approaches 10 hops, ENERGY* saves 18% to 30% of energy consumption over ADOPS for different kinds of queries.

Across different query patterns, variation in data rates appears to be a minor factor, though ENERGY* normally yields superior performance. Uneven data rates only cause a higher variation in performance, but do not affect the general performance much. Distribution of the data sources is more influential. If the data sources are more concentrated in a region, there is a better chance of getting a good proxy near most of the data sources to reduce the data traffic. If the sources are dispersed, the benefits of in-network data processing become smaller, since it would generally take a higher bandwidth for the sources to transmit the data to the proxy, which cannot be close to all of the sources simultaneously.


[Figure 1. Energy consumption versus query distance. Four panels: Constrained/Unif., Constrained/Non-unif., Unconstrained/Unif., Unconstrained/Non-unif. x-axis: Average Hop (3 to 10); y-axis: Energy Consumption (x 10^4); curves: OPTIMUM, ENERGY*, ADOPS, BASELINE.]

[Figure 2. Energy consumption versus node density. Four panels: Constrained/Unif., Constrained/Non-unif., Unconstrained/Unif., Unconstrained/Non-unif. x-axis: Amount of Nodes (250 to 450); y-axis: Energy Consumption (x 10^4); curves: OPTIMUM, ENERGY*, ADOPS, BASELINE.]

5.2 Impact of Node Density

Since the network topology is greatly affected by the node density, we investigate its impact by setting the average query distance to 8 and varying the number of nodes, and hence the node density. The results are depicted in Figure 2. When the node density increases, the energy consumption decreases. This is because when there are more sensor nodes, each node may have more neighbors, which helps to shorten the path from the proxy to other nodes, thus reducing energy consumption. It can also be observed that ENERGY* performs much better than ADOPS in general.

The performance of ENERGY* and OPTIMUM is density-sensitive for non-spatially constrained queries and less density-sensitive for spatially constrained queries. The energy consumption for ENERGY* and OPTIMUM decreases only slightly for Constrained, but a lot more for Unconstrained at increased density. This is because if the source nodes are constrained in a specific region, even though there are more nodes, the probability of shortening the path from the proxy to the source nodes is still low; if the source nodes are distributed arbitrarily, the probability of shortening the path from the proxy to the sources would be larger. The performance of ADOPS is more density-sensitive than that of the other approaches. This is because ADOPS explores the locally optimal transformation function placement in the neighbor list of each node. The existence of more neighbors for each node might increase the chance of refining the local optimum to be closer to the global optimum.

6 Conclusion

Energy consumption is a crucial factor in a sensor network. In-network data processing is a useful technique to reduce the energy consumption significantly. In this paper, we proposed an objective measure for energy consumption, namely ENERGY, and an approximate solution ENERGY*, which effectively solves the transformation function placement problem without knowing the whole network topology. Compared with other available approaches, ENERGY* generates fewer control messages and achieves better performance, especially in low-density networks and for long-distance queries. Our future research work includes extending the data transformation function placement problem to take into account dynamic network topology, node mobility, and more extensive consideration of adaptivity and fault-tolerance issues.

References

[1] I. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci. A survey on sensor networks. IEEE Communications Magazine, 40(8):102–114, 2002.
[2] M. Bhardwaj and A.P. Chandrakasan. Bounding the lifetime of sensor networks via optimal role assignments. In Proceedings of Joint Conference of IEEE Computer and Communications Societies, pages 1587–1596, 2002.
[3] B.J. Bonfils and P. Bonnet. Adaptive and decentralized operator placement for in-network query processing. Telecommunication Systems, 26(2-4):389–409, 2004.
[4] P. Bonnet, J. Gehrke, and P. Seshadri. Towards sensor database systems. In Proceedings of International Conference on Mobile Data Management, pages 3–14, 2001.
[5] J. Heidemann, F. Silva, C. Intanagonwiwat, R. Govindan, D. Estrin, and D. Ganesan. Building efficient wireless sensor networks with low-level naming. In Proceedings of Symposium on Operating Systems Principles, pages 146–159, 2001.
[6] Q. Huang, C. Lu, and G.C. Roman. Reliable mobicast via face-aware routing. In Proceedings of Joint Conference of IEEE Computer and Communications Societies, pages 2108–2118, 2004.
[7] C. Intanagonwiwat, D. Estrin, R. Govindan, and J. Heidemann. Impact of network density on data aggregation in wireless sensor networks. In Proceedings of International Conference on Distributed Computing Systems, pages 457–458, 2002.
[8] B. Karp and H.T. Kung. GPSR: Greedy perimeter stateless routing for wireless networks. In Proceedings of International Conference on Mobile Computing and Networking, pages 243–254, 2000.
[9] H.S. Kim, T.F. Abdelzaher, and W.H. Kwon. Minimum-energy asynchronous dissemination to mobile sinks in wireless sensor networks. In Proceedings of International Conference on Embedded Networked Sensor Systems, pages 193–204, 2003.
[10] S. Madden, M.J. Franklin, J. Hellerstein, and W. Hong. TAG: A tiny aggregation service for ad-hoc sensor networks. In Proceedings of Symposium on Operating Systems Design and Implementation, 2002.
[11] Y. Yao and J. Gehrke. The cougar approach to in-network query processing in sensor networks. ACM SIGMOD Record, 31(3):9–18, 2002.
