A Dynamic Geographic Hash Table for Data-Centric Storage in Sensor Networks

Thang Nam Le, Wei Yu, Xiaole Bai and Dong Xuan

Abstract—This paper proposes a dynamic geographic hash table for data-centric storage (DCS) in sensor networks. In DCS systems, data storage locations are determined by data name. The storage locations are obtained through the use of a geographic hash table (GHT) that maps data names to geographic locations. Traditional DCS systems use a static hash function for this purpose, resulting in a static set of nodes serving the network throughout its lifetime. Hence, these nodes may experience unbalanced resource utilization problems and the network will not be capable of dealing with network dynamics such as new sensor deployments or runtime sensor failures. We address these problems by proposing a dynamic GHT solution that relies on two schemes: 1) a temporal-based geographic hash table to achieve overall load balancing among sensor nodes over time; and 2) a location selection scheme based on node contribution potential to proactively adapt the system to network dynamics. Our performance evaluations show that the dynamic GHT can alleviate the resource utilization problem of DCS systems and can prolong the network lifetime significantly.

Keywords—Sensor Networks, Geographic Hash Tables, Data-Centric Storage.

Thang Nam Le, Xiaole Bai and Dong Xuan are with the Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210. E-mail: {let, baixia, xuan}@cse.ohio-state.edu. Wei Yu is with the Computer Science Department, Texas A&M University, College Station, TX 77843. E-mail: [email protected].

I. INTRODUCTION

Wireless sensor networks have gained significant importance in a wide range of civil and military applications. Advances in low-powered microprocessor technology, combined with low-cost sensing devices and radio frequency circuits, have made inexpensive wireless sensor networks feasible. Applications such as temperature and humidity measurement, habitat trajectory tracking, assembly line production sensing, and intruder detection have demonstrated the significance of sensor networks in recent years.

In many sensor applications, the identity of a sensor is not as important as the data associated with it. For example, if a sensor network is deployed to track the movement of animals in a field, the data detected by the sensors and their geographic location are often more important than the identifiers of the nodes generating these data. For this reason, the data-centric model has been proposed for such applications. In this model, data names, instead of node identifiers, are used to identify the data and to determine their physical storage locations. This is significantly different from traditional IP or telephone networks, where a unique address (i.e. network address or phone number) is assigned to each node and is used to identify the destination node in the routing process.

In [1, 2], Ratnasamy et al. have proposed a data-centric storage model based on a geographic hash table (GHT) for sensor networks. In this model, low level sensor data are abstracted to a high level concept of event. Examples of events could be “elephant sighting” or “vehicle movement”. Data-centric storage uses a geographic hash table to map an event type into a geographic location. All the sensed data associated with an event type are stored at the corresponding hashed location. A query looking for a certain event type is routed according to the event type. A significant benefit of data-centric storage (DCS) with GHT is that queries for data corresponding to a certain event type can be sent directly to the storage node of the event rather than being flooded throughout the network. It has been shown in [1, 2, 3, 4] that DCS-GHT can reduce network traffic and lower energy consumption for sensor nodes.

In the context of sensor networks, where nodes' energy and storage space are limited, we observe several problems in existing work on DCS with GHT. Existing GHT schemes with a static hash function (for hashing events into locations) have the following drawbacks:
i) The set of nodes associated with hashed locations repeatedly serves the network over time. Hence their resources may be depleted quickly, and the network service may degrade as these nodes run out of resources.
ii) When new nodes with more resources are deployed in the network after a period of time, they can be prevented from becoming event-processing nodes (hereinafter referred to as event nodes) because their positions may not be as close to the hashed locations as the old nodes. This may hinder the performance of the network because the new nodes, having more resources, may be able to service the network better.
In this paper, we attempt to address these problems by proposing a dynamic GHT solution.
The highlights of this paper are:
i) We present a temporal-based GHT as opposed to the static hash functions used by traditional GHT-based systems. With the temporal information in the GHT, the mappings between the event types and geographic locations change over time. Thus, load balancing can be achieved for the sensor nodes.
ii) Using the concept of node contribution potential originally proposed in [9], we propose a location selection scheme to fine tune the set of possible hashed locations, avoiding situations where events are mapped into locations where surrounding nodes do not have enough resources to service the network.

The rest of the paper is organized as follows: Section II presents some background on data-centric storage with GHT and discusses problems in existing GHT-based systems. In Section III, we present our dynamic GHT solution by introducing the temporal-based hash function and the location selection scheme based on node contribution potential. We perform simulations to evaluate the performance of our solution in Section IV. Finally, we discuss issues related to the design of our system in Section V and present concluding remarks in Section VI.

2168 1-4244-0270-0/06/$20.00 (c)2006 IEEE This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the WCNC 2006 proceedings. Authorized licensed use limited to: The Ohio State University. Downloaded on June 29,2010 at 15:46:54 UTC from IEEE Xplore. Restrictions apply.

II. PRELIMINARIES

A. GHT – A geographic hash table for data-centric storage

Data-centric storage with a geographic hash table (DCS-GHT) is an event-driven data dissemination paradigm proposed in [1, 2]. In this model, low level sensor data are abstracted to a high level concept of event. Examples of events could be “elephant sighting” or “vehicle movement”. Data-centric storage uses a geographic hash table to map an event type into a geographic location. Figure 1 illustrates a network with two events hashed into two separate locations. Any detected data associated with event 1 are sent to the sensor closest to the mapping location corresponding to event 1 (the event node), and queries looking for this event are routed to the same node. In short, GHT defines two basic functions: Put(EventType, DataValue) for data storage and Get(EventType) for data retrieval.

Figure 1: Basic operations of a data-centric system with GHT (diagram labels: mapping locations for events 1 and 2, sensors, the node closest to the event 1 mapping location, data for event 1, queries for event 1)
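The Put and Get operations described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the implementation of [1, 2]: the names `hash_to_location`, `closest_node` and `StaticGHT` are hypothetical, SHA-256 stands in for whatever hash function a real GHT uses, and routing is replaced by a direct nearest-node lookup.

```python
import hashlib
import math


def hash_to_location(event_type, width, height):
    """Map an event name to a fixed (x, y) point in the deployment area (assumed hash)."""
    digest = hashlib.sha256(event_type.encode("utf-8")).digest()
    x = int.from_bytes(digest[:4], "big") / 2**32 * width
    y = int.from_bytes(digest[4:8], "big") / 2**32 * height
    return (x, y)


def closest_node(nodes, point):
    """Return the id of the node whose position is nearest to the given point."""
    return min(nodes, key=lambda node_id: math.dist(nodes[node_id], point))


class StaticGHT:
    """Toy static GHT: the event node of an event type never changes."""

    def __init__(self, nodes, width, height):
        self.nodes = nodes                      # node id -> (x, y)
        self.width, self.height = width, height
        self.storage = {node_id: {} for node_id in nodes}

    def event_node(self, event_type):
        point = hash_to_location(event_type, self.width, self.height)
        return closest_node(self.nodes, point)

    def put(self, event_type, data_value):
        node = self.event_node(event_type)
        self.storage[node].setdefault(event_type, []).append(data_value)
        return node

    def get(self, event_type):
        node = self.event_node(event_type)
        return self.storage[node].get(event_type, [])
```

Because the hashed point depends only on the event name, every Put and Get for the same event type lands on the same node for the whole lifetime of the network, which is the property the later sections set out to relax.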

In order to route data and queries to the destination, GHT uses Greedy Perimeter Stateless Routing (GPSR), a popular and efficient geographic routing algorithm proposed in [5]. GPSR uses greedy forwarding to progressively forward a packet to a node closer to the destination. When a packet reaches a region where greedy forwarding is not possible (i.e. the only viable path requires one to move away from the destination), GPSR recovers by routing around the perimeter of the region until it finds a node where greedy forwarding can resume. The routing process terminates after the packet has been delivered to the node closest to the destination. To achieve load balancing for the system, traditional GHT schemes [1, 2] propose the use of structured replication, which creates multiple image nodes for each event. A node only sends its data to the nearest image node to reduce the storage cost. On the other hand, retrieval cost increases because queries need to be routed to all these image nodes.
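The greedy part of this routing process can be sketched as follows. This is a simplified illustration, not GPSR itself: perimeter-mode recovery is omitted, so the sketch simply stops when no neighbor is closer to the destination, and the function names and the adjacency-list representation are assumptions made for the example.

```python
import math


def greedy_next_hop(current, neighbors, positions, dest):
    """Return the neighbor closest to dest if it improves on current, else None."""
    best = min(neighbors, key=lambda n: math.dist(positions[n], dest), default=None)
    if best is None:
        return None
    if math.dist(positions[best], dest) >= math.dist(positions[current], dest):
        return None  # greedy forwarding fails here; GPSR would enter perimeter mode
    return best


def greedy_route(start, adjacency, positions, dest):
    """Forward greedily from start toward dest and return the list of visited nodes."""
    path = [start]
    while True:
        nxt = greedy_next_hop(path[-1], adjacency[path[-1]], positions, dest)
        if nxt is None:
            return path
        path.append(nxt)
```

The loop always terminates because every hop strictly reduces the distance to the destination. Without perimeter recovery, however, the last node on the path is only a local optimum and is not guaranteed to be the node closest to the hashed location.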

Some extensions of the traditional GHT-DCS approach include [3, 4, 6]. R-DCS [3] aims at solving the fault tolerance problem by proposing a mechanism that allows the storage nodes to create replicas in neighboring regions. RBI [4] is another extension of GHT where the sensors keep the data locally or at some storage nodes in the neighborhood. The storage nodes register themselves with the index nodes corresponding to the events. Queries for a particular event are routed to the proper index node and then forwarded to the corresponding storage nodes. In [6], Tamoshetty et al. propose a mechanism to store event data at multiple locations, which can provide various levels of resilience based on the importance of the data.

B. Problems in existing DCS-GHT

Overall, existing DCS-GHT systems try to solve the load balancing problem by dividing the network into multiple regions and creating an image for each event in each region. In these solutions, however, the hash function is static and the set of nodes serving the network is fixed. Hence, the uneven resource utilization problem still exists, though at a smaller scale. Besides, because of the event-driven nature of GHT applications, different events may happen at different rates at different times. Different regions may also experience different conditions. Hence, it is difficult to predict the optimal number of storage nodes and their distribution.

Another shortcoming of using a static set of event nodes is the inability to deal with network dynamics. When sensors with more resources are deployed in an existing network, existing work offers no mechanism to give the new sensors a role in the network. In existing work in the DCS-GHT area, a sensor is considered to have a binary state, either dead or alive. When an event node is alive, it is assumed to have enough resources to service the network.
If it is dead, existing GHT systems rely on network refreshes to find the new event node for the event. In practice, an event node may reach a state in which it is out of storage space but still has enough energy for a period of operation. In this state, an event node can disrupt service to the network: the data corresponding to this event node cannot be stored there anymore, while the queries associated with the event are still routed to this node since it is still marked as alive.

There are several possible solutions to this problem, including:
i) Letting the event node turn off its radio as if it were dead. The data and queries will then converge at the node next closest to the hashed location; or
ii) Letting the event node forward the sensed data to a neighbor node which still has enough resources to store the data. When queries arrive at the original event node, this node can forward the queries to the new node.

However, these solutions, while simple, have several drawbacks. In the first case, when an event node turns itself off, the previously collected data may be lost. Besides, turning this node off wastes system resources, as it could still serve as a packet forwarder. In the second case, packets may be forwarded in a zigzag manner. In Figure 2, the original event node 1 ran out of storage space and designated node 7 as the event node. Similarly, this node later ran out of storage space and designated node 8 as the new event node. All data and queries corresponding to the event now have to travel through nodes 1 and 7 to reach the actual event node. As we will show in Section IV, this may incur excessive overhead. Apart from that, since all data and queries for the event have to pass through the original event node, this node can become a bottleneck of the network.

In this paper, we use the term hollow sensor or hollow node to refer to an alive sensor that has run out of storage space, and the term hollow region to refer to a region surrounded by hollow nodes. As we will show in the following section, we take a proactive approach to ensure events are mapped only to resourceful regions. Hence, the hollow region problem can be avoided.
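The forwarding workaround and the resulting detour can be made concrete with a small sketch. It is only an illustration: the `State` enum, the `forward_to` table and the function name are hypothetical, and each forwarding step is counted as one extra leg even though it may span several radio hops in a real network.

```python
from enum import Enum


class State(Enum):
    ALIVE = "alive"    # has energy and storage
    HOLLOW = "hollow"  # has energy but no storage space left
    DEAD = "dead"      # out of energy


def resolve_event_node(start, state, forward_to):
    """Follow forwarding pointers from the original event node.

    Returns (node that actually stores the data, number of extra forwarding legs).
    """
    node, extra_legs = start, 0
    visited = {start}
    while state[node] is State.HOLLOW and node in forward_to:
        node = forward_to[node]
        if node in visited:
            raise ValueError("forwarding loop detected")
        visited.add(node)
        extra_legs += 1
    return node, extra_legs
```

With the chain from the text (node 1 forwards to node 7, which forwards to node 8), every packet for the event pays two extra legs and still passes through node 1, which is the bottleneck described above.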

Figure 2: A hollow region may cause packets to travel a zigzag path (diagram labels: sensors 1 to 8, the hashed location, the event node, hollow nodes, and the path of data or queries for event 1)

C. Models

In this paper, we consider a square sensor network whose boundaries are known to the network operator and to the sensors before deployment. There is a single sink located at coordinate (0, 0) of the network which has relatively large storage space and processing power. Sensor nodes are stationary and their number is large relative to the number of events. Sensor nodes are location-aware through the use of GPS devices or some other localization techniques [7, 8].

We divide the time axis into time slots, each with a length of ∆T. We assume that the clock skews between the sensors are negligible because i) the timeslot period is substantially larger than the clock skews, and ii) sensor time can be synchronized by existing time synchronization protocols [10, 11]. We assume that queries have a temporal attribute. We believe that the assumptions about timeslots and queries are reasonable because in many sensor applications, besides the event-specific data, the time of the event is equally important. When temporal data are transferred from a sensor to the sink, the ability to interpret the data correctly at the sink suggests that an underlying time synchronization mechanism already exists among the nodes.

III. A DYNAMIC GEOGRAPHIC HASH TABLE

To address the problems caused by the static GHT, we propose a Dynamic GHT (D-GHT) solution, which uses the temporal attribute of queries and data in addition to the event type to decide the hashed locations for the events. The set of hashed locations changes over time to ensure that all nodes contribute fairly to the network operations. Besides, D-GHT avoids hashing events into hollow regions by periodically updating network information. D-GHT relies on two schemes: a temporal-based geographic hash table and a location selection scheme based on node contribution potential.

A. A temporal-based GHT

1) A temporal-based hash function for GHT: In order to avoid unbalanced resource utilization, nodes across the network must be used fairly over time. To achieve this objective, we redefine the original basic operations of GHT from:

Put(eventType, dataValue)
Get(eventType)

to:

Put(eventType, dataValue, ∆T)
Get(eventType, ∆T),

where ∆T is the timeslot number. The new Put and Get functions operate as the original ones except that the hashed location of an event changes over time. The following part explains the construction of a temporal-based mapping from events to locations.

We logically divide the network into grid cells of equal size. Let:
N be the set of cells from 0 to n-1
E be the set of event types from 0 to e-1
M be the least common multiple of n and e
T be the set of timeslots

We define a hash function h from (E, T) → N as below (note that when we use the term hashing an event to a cell, we imply that the hash of the event is the center of the corresponding cell):

$$h(e_i, t) = (t \cdot e + i - (t \ \mathrm{div}\ \partial)) \bmod n \qquad (1)$$

where $e_i$ is the i-th event, t is the timeslot number, ∂ = M/e, div denotes integer division, and $h(e_i, t)$ is the index of the cell corresponding to the i-th event. The meaning of the function $h(e_i, t)$ can be explained as follows:
• First, we construct an event circle whose circumference is e units. We label the events (from 0 to e-1) on this circle one next to another. The distance between two adjacent events on the circle is one unit (Figure 3).
• Next, we construct a location circle whose circumference is M units. We label the cells (from 0 to n-1) on this circle repeatedly. The distance between two adjacent cells on this circle is also one unit. The two circles touch each other at the point (event 0, cell 0), as indicated in Figure 3.

Figure 3: The event circle rolls on the location circle. The mapping point of an event is defined as the contact point of the two circles.

Figure 4: After the event circle rolls for ∂ = M/e rounds, it returns to the original location as in Figure 3. We shift the positions of the events on the event circle by one unit.

• For each timeslot, we let the event circle roll on the location circle for one round (of the event circle) clockwise. The contact point of the two circles defines the mapping from the events to the locations. For example, at the first timeslot, event $e_0$ will be mapped to cell 0, event $e_1$ to cell 1, etc. We repeat this process for timeslots 2, 3, etc. until the event circle returns to its original position (this happens after ∂ = M/e timeslots because M is the least common multiple of n and e).
• At this time, we shift the event positions on the event circle by one unit clockwise (Figure 4). Event $e_0$ will now be mapped to cell n-1, event $e_1$ to cell 0, event $e_2$ to cell 1, etc. We repeat this process every ∂ timeslots.

Following this process, for timeslot t (corresponding to the event circle turning t rounds) where t < ∂, we have the mapping:

$$(e_0, t) \to (t \cdot e) \bmod n \quad \text{and} \quad (e_i, t) \to (t \cdot e + i) \bmod n$$

For the next ∂ timeslots, because of the extra shift, the mapping becomes:

$$(e_i, t) \to (t \cdot e + i - 1) \bmod n$$

So, in general, the mapping can be defined as:

$$h(e_i, t) = (t \cdot e + i - (t \ \mathrm{div}\ \partial)) \bmod n$$

where $e_i$ is the i-th event, t is the timeslot number, ∂ = M/e and $h(e_i, t)$ is the index of the cell corresponding to the i-th event.

As the above illustration shows, hash function h ensures that all cells are used fairly over time in the hash process, even if different events have different rates.

The current hash function can be easily modified to facilitate load balancing and fault tolerance by enlarging the event circle. For example, to have two images for event 0, we can enlarge the event circle by one unit and label the new place on the event circle with 0, as in Figure 5 (M and ∂ need to be recalculated as well). Each node in the network decides which of the two cells corresponding to $e_0$ is geographically closer and sends data to that specific cell. Note that in this case, it is desirable to label the two positions of $e_0$ on the event circle far from each other so that the hashed locations are distant (to avoid cluster failure and reduce overall storage cost).

Figure 5: To have a replica for event 0, we only need to add an extra unit space to the event circle, label it with 0 and recalculate M and ∂.

2) Data and query routing: Sensor nodes use both the event type and the temporal attribute of queries and data to identify the corresponding event node. For example, when a sensor detects an event, it uses the event type and the current timeslot number to find the corresponding event node using hash function h. Similarly, when a query arrives at the sink, it is routed to the appropriate event node based on its event type and timeslot number.

B. A node contribution potential-based location selection scheme

The purpose of the temporal-based hash function is to ensure that all the nodes contribute fairly to the operation of the network. However, this function cannot cope with dynamics in the network environment such as node failures or new sensor deployment. We design a node contribution potential-based location selection scheme to address this issue.

1) Node contribution potential: Node contribution potential, originally proposed in [9], is a concept that represents the potential of a node to contribute to the network operation. In short, the contribution potential $P(V_i)$ of node $V_i$ can be described by the following function:

$$P(V_i) = f(IP(V_i), EP(V_i)) \qquad (2)$$

where the potential function f is a function of:
$IP(V_i)$ = internal contribution potential of node $V_i$, defined as a function on a set of attributes of node $V_i$. For example, $IP(V_i)$ can be a function of energy level, storage space, connectivity degree or the number of timeslots serving as an event node.
$EP(V_i)$ = external contribution potential of $V_i$, defined as a function of the internal contribution potential of all neighbors of node $V_i$.

We define the cell contribution potential $C(N_i)$ of cell $N_i$ as a function of the contribution potentials of all nodes in this cell.
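Returning to the temporal hash of Section III.A, hash function (1) is small enough to check numerically. The sketch below is a direct transcription of the formula under the paper's definitions of n, e, M and ∂; the function names `dght_hash` and `usage_counts`, and the example values n = 6 and e = 4, are illustrative choices rather than anything taken from the paper.

```python
from collections import Counter
from math import gcd


def dght_hash(i, t, n, e):
    """Equation (1): cell index for event i at timeslot t, with n cells and e event types."""
    m = n * e // gcd(n, e)  # M, the least common multiple of n and e
    d = m // e              # the paper's ∂ = M / e
    return (t * e + i - t // d) % n


def usage_counts(i, n, e, slots):
    """Count how often each cell serves event i over the given number of timeslots."""
    return Counter(dght_hash(i, t, n, e) for t in range(slots))


if __name__ == "__main__":
    n, e = 6, 4
    for t in range(4):
        print(t, [dght_hash(i, t, n, e) for i in range(e)])
    print(usage_counts(0, n, e, 18))
```

For n = 6 and e = 4 we get M = 12 and ∂ = 3. At t = 0 event i maps to cell i, and at t = 3 (after the first shift) event 0 maps to cell 5 = n-1 and event 1 to cell 0, matching the description of the rolling circles. Over n·∂ = 18 timeslots each of the six cells serves event 0 exactly three times, which is the fairness property the construction aims for.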


In the context of this paper, P(V) can be defined as a function of remaining storage space and energy level. C(N) can be defined as the average potential of all nodes in the cell.

2) Potential-based location set: Periodically, nodes inside each cell exchange their contribution potentials to determine the cell potential. This can be done either by limited broadcast or by nodes sending their potentials and geographic coordinates to a predefined point in the cell (i.e. the sensor closest to this point) and having this point calculate the cell potential. (Note that since the size of each cell is relatively small and can be adjusted, and the cell potential calculation is performed only at long intervals, we are not concerned about its effect on in-cell load balancing. Furthermore, nodes in each cell can use a time-based function to determine the point if necessary.) This value and the coordinates of the node with the highest potential in the cell are next sent to the sink. The sink selects a set of cells with potentials above a certain threshold to be the location set of the period and broadcasts this set, together with the list of coordinates associated with it, to the network. Within the period (consisting of multiple timeslots), all sensors use this location set for event-location hashes. Let L be the set of possible locations (i.e. cells) for a given period. Hash function (1) can then be modified as:

$$h(e_i, t) = (t \cdot e + i - (t \ \mathrm{div}\ \partial)) \bmod n_L \qquad (3)$$

where $e_i$ is the i-th event, t is the timeslot number in the period, $n_L$ is the number of cells in the location set, M is the least common multiple of $n_L$ and e, ∂ = M/e and $h(e_i, t)$ is the index of the cell corresponding to the i-th event in the location set. The destination for the hash is now the coordinate associated with $h(e_i, t)$ (i.e. the geographic location of the sensor with the highest potential in the cell).

A different approach to this resource allocation problem is to assign a different load to each cell based on its potential. After receiving all cell potentials, the sink sends out a list of weights, one per cell, together with the coordinates of the node having the highest potential in each cell. The weight assigned to a cell is related to its potential and reflects the load allocated to that cell for the next timeslot. Hence, cells with different potentials are used differently during the period. Let us assume that each cell $N_i$ has a potential of $c_i$. We can calculate the weight $w_i$ for $N_i$ as the percentage of $c_i$ in the total potential of the system:

$$w_i = \frac{c_i}{\sum_{j=0}^{n-1} c_j} \qquad (4)$$

Assuming the weights can be normalized to integer values, we can construct a hash function that takes the cell weights into account in a manner similar to the one presented in Section III.A: we construct a location circle with a circumference of $\sum_{i=0}^{n-1} W_i$, where $W_i$ is the normalized integer value of $w_i$. On this circle, we label cell 0 $W_0$ times, cell 1 $W_1$ times, etc. The operation of this hash function is similar to the one mentioned earlier. Figure 6 illustrates this method in the case where $W_0$ and $W_1$ are 2 and 3 respectively. Note that it is desirable to arrange the labels corresponding to any cell uniformly on the location circle to avoid the situation where a cell becomes the event node for several event types at the same time.

Figure 6: Weight-based event-to-cell hash function

IV. PERFORMANCE EVALUATIONS

In this section, we perform simulations to measure the performance of our dynamic GHT (D-GHT) solution.

1) Metrics and parameters: We evaluate the total system overhead for data collection and query processing as well as the improvement in terms of resource utilization. The total system overhead is defined as the total number of messages transmitted for data storage and query processing, together with the number of messages for updating the location set in the case of D-GHT. Resource utilization is evaluated by the hotspot message and hotspot storage space (i.e. the highest number of messages or storage space used by any sensor) as well as the standard deviations of these variables. We evaluate two different D-GHT systems with grid sizes of 10x10 and 15x15, together with a traditional GHT system.

The original DCS-GHT work [1, 2] has been shown to be a practical and robust data dissemination scheme using extensive simulations in ns2. Since D-GHT is built upon the DCS-GHT mechanism, we expect our scheme to behave like DCS-GHT in these aspects. Because DCS simulations in ns2 do not scale to more than a few hundred nodes [1], and we expect our approach to be more useful in large scale sensor networks, we use a lightweight simulator (i.e. without wireless radio details) built in C. In our simulations, we assume stationary nodes in the network as well as error-free packet transmission. The simulation parameters are given in Table 1. There is a single sink located at coordinate (0, 0) of the network which serves as the query source for all simulations.

Table 1: Simulation parameters

Parameters                                  Value
Network size                                500m x 500m
Number of sensors (all stationary)          500, 1000
Sensor transmission range                   50m
Number of event types                       10-50
Number of timeslots                         100, 1000
Grid size (D-GHT)                           10x10, 15x15
Location set update period (timeslots)      100, 24

2) Performance results: Figure 7 characterizes the total system overhead of the D-GHT and the traditional GHT solutions in
100 timeslots. The number of event types is 20 and the query rate is 5 (per timeslot). In general, GHT and D-GHT systems have similar overhead. The slight differences between the systems are due to the different hashed locations selected. According to our data, the overheads of all systems are relatively independent of the number of events and the number of sensor nodes. With a query rate of 20 and an event rate of 60 (per timeslot, all event types together), updating the location set every 100 timeslots accounts for less than 2 percent of total system overhead for the D-GHT systems.

Figure 7: Total system overhead in 100 timeslots (query rate = 5). Axes: number of messages versus event detection rate (per timeslot). Series: GHT, D-GHT 10x10, D-GHT 15x15.

To see how the D-GHT systems perform in terms of resource utilization, we show the hotspot storage space, standard deviation of storage space and hotspot message of the system in Figures 8, 9 and 10. We run the system for 100 timeslots at a query rate of 5 queries per timeslot. The event detection rate varies from 0 to 400. The location set is fixed. The D-GHT systems perform especially well with regard to storage space (shown in Figures 8 and 9). The 10x10 and 15x15-grid D-GHT systems help reduce both the standard deviation of storage space and the hotspot storage space by over 4 and 8 times respectively. We do not see such a high improvement in terms of hotspot message (Figure 10). We attribute this behavior to the “close-to-sink” effect: since the queries come from the sink, the nodes next to the sink become hotspots because all queries and returned data need to pass through them. The close-to-sink effect diminishes as the event detection rate increases.

Figure 8: Hotspot storage space after 100 timeslots (query rate = 5). Axes: hotspot storage space versus event detection rate (per timeslot). Series: GHT, D-GHT 10x10, D-GHT 15x15.

Figure 9: Standard deviation of storage space after 100 timeslots (query rate = 5). Axes: standard deviation of storage space usage versus event detection rate (per timeslot). Series: GHT, D-GHT 10x10, D-GHT 15x15.

Figure 10: Hotspot message after 100 timeslots (query rate = 5). Axes: hotspot message versus event detection rate (per timeslot). Series: GHT, D-GHT 10x10, D-GHT 15x15.

In Figures 11 and 12, we assess the effects of hollow regions on traditional GHT systems and how our approach can alleviate that problem. In Figure 11, we assume that the traditional GHT system uses the forwarding approach (see Section II.B) when hollow event nodes exist. If the data and queries have to travel one extra hop, the traditional GHT system may suffer up to 25 percent of excessive overhead compared to the normal situation (i.e. no hollow event nodes). This number jumps to almost 40 percent if the number of extra hops increases from one to two.

Figure 11: Extra overhead caused by hollow regions as compared to a system without hollow regions. Axes: overhead compared to normal situation versus hollow region rate. Series: GHT Regular, GHT Hollow - 1 hop, GHT Hollow - 2 hops.

Finally, in Figure 12, we simulate a network in operation. The D-GHT systems update the location set once a day. The timeslot period is one hour (i.e. 24 timeslots per location set update). We assign 100 storage units to each node and measure the periods until the first node falls into the hollow state and until the hollow node rate reaches 5, 10, 15 and 20 percent. As this figure shows, D-GHT increases these periods substantially. For example, under a query rate of 20 and an event rate of 40, the first node in a traditional GHT runs out of storage space after 42 timeslots, as opposed to more than 600 timeslots for both 10x10 and 15x15-grid D-GHTs. This implies that by improving resource utilization, the D-GHT systems can significantly prolong the network lifetime.

Figure 12: Number of timeslots until hollow nodes exist in the system (query rate = 20, event rate = 40). Axes: timeslot number versus percentage of hollow nodes (first node, 5%, 10%, 15%, 20%). Series: GHT, D-GHT 10x10, D-GHT 15x15.
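The mechanism behind the Figure 12 experiment, a periodic location set update (Section III.B) combined with hash function (3), can be illustrated with a toy cell-level model. This is a sketch under strong assumptions and does not reproduce the paper's C simulator or its numbers: each cell is treated as one storage pool, a cell's potential is taken to be its remaining storage, every event type stores `rate` units per timeslot, the static scheme maps event i to cell i, and all function names are hypothetical.

```python
from math import gcd


def hash3(i, t, n_l, e):
    """Equation (3): index into the location set for event i at timeslot t of the period."""
    d = (n_l * e // gcd(n_l, e)) // e  # ∂ = M / e with M = lcm(n_L, e)
    return (t * e + i - t // d) % n_l


def select_location_set(potentials, threshold):
    """Sink side: keep the cells whose potential exceeds the threshold."""
    return [cell for cell, p in enumerate(potentials) if p > threshold]


def slots_until_first_hollow(capacities, n_events, rate, dynamic,
                             period=24, threshold=0, max_slots=100_000):
    """Number of timeslots until some cell runs out of storage in the toy model."""
    remaining = list(capacities)
    location_set = list(range(len(remaining)))
    for slot in range(max_slots):
        if dynamic and slot % period == 0:
            # Assumed potential: remaining storage of the cell.
            location_set = select_location_set(remaining, threshold)
            if not location_set:
                return slot
        for i in range(n_events):
            if dynamic:
                idx = hash3(i, slot % period, len(location_set), n_events)
                cell = location_set[idx]
            else:
                cell = i % len(remaining)  # static hash: event i always uses cell i
            remaining[cell] -= rate
            if remaining[cell] <= 0:
                return slot + 1
    return max_slots
```

In this toy setting, with six cells of 30 units and four event types, the static mapping exhausts a cell after 30 timeslots, while the rotating hash spreads the same load over all six cells and lasts longer. With two nearly full cells, a threshold that excludes them from the location set keeps events away from those cells entirely, which is the proactive behavior the location selection scheme is meant to provide.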

2173 This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the WCNC 2006 proceedings. Authorized licensed use limited to: The Ohio State University. Downloaded on June 29,2010 at 15:46:54 UTC from IEEE Xplore. Restrictions apply.

V. DISCUSSIONS

In this section, we discuss several questions about the design of our D-GHT approach.

1) What are the advantages of D-GHT compared to a traditional GHT system divided into multiple regions, each with a complete set of event nodes? Although breaking down a network into multiple regions can provide overall load balancing for traditional GHT systems, the event nodes are still static and therefore the resource utilization problem still exists. Having multiple regions also increases the complexity of query processing because queries need to be routed to all event nodes. In addition, D-GHT is orthogonal to the existing solutions. For example, an R-DCS [3] system can break down the network into smaller regions and apply the D-GHT solution within each region separately.

2) Would it be better to have a temporal-based hash function that hashes event types to random points in the network as opposed to grid cells in our approach? A point is the special case in which the cell size is one unit. While it is possible to hash events into points in the network, the system message overhead would increase substantially because each individual node would need to send its potential to the sink. By using cells, the potential information can be calculated locally within each cell, and the number of messages needed to send cell potentials to the sink is reduced significantly.

3) Can we achieve load balancing by storing the data based only on temporal information (i.e., all data in a timeslot go to the same cell regardless of the event type)? By using both event type and timeslot, we achieve finer-grained load balancing, since at any time a cell serves at most one event. In fact, using only temporal information to decide the hash locations is a special case of our problem, obtained by replacing e with 1 and i with 0 in formulas (1) and (3). However, this should only be done if we know in advance that the amount of generated or queried data is low at all times. When the rates of events and queries are high, hashing multiple events to the same location may cause the active cell to be overloaded.

4) Would it be possible to use only the potential-based scheme to achieve load balancing and deal with network dynamics, as opposed to combining the temporal-based and potential-based schemes in an integrated solution? Both schemes aim at improving resource utilization in the network. However, they work at different time granularities and have different costs. Without incurring any overhead, the temporal-based hash scheme ensures that cells in the network contribute fairly to the network operations. However, it can only work properly for a relatively short period of time because of possible changes in the environment. On the other hand, the contribution potential-based scheme is able to cope with network dynamics at runtime, but since collecting cell potentials incurs overhead, this scheme should only be run at a coarser time scale. Hence, an integrated solution achieves the objective at a lower cost.

VI. CONCLUSIONS

In this paper, we presented a dynamic geographic hash table for data-centric storage to address the resource utilization problem in sensor networks. Our approach relies on a temporal-based geographic hash table to ensure that nodes contribute fairly to the network operation, and on a node contribution potential-based location selection scheme to cope with dynamics in the operational environment. Our performance data show that the dynamic GHT can alleviate the load balance problem of GHT-based systems and prolong the network lifetime significantly.

In the future, we wish to study a GHT mechanism that considers the importance of data when performing hashes. For example, if a certain event is considered important, it can be hashed into a resourceful area so that the data can be easily replicated. Similarly, it is better to find a hashed location close to the sink for time-critical data.

REFERENCES

[1] S. Ratnasamy, B. Karp, L. Yin, F. Yu, D. Estrin, R. Govindan, and S. Shenker, “GHT: A geographic hash table for data-centric storage in sensornets,” in Proceedings of the First ACM International Workshop on Wireless Sensor Networks and Applications (WSNA), 2002.
[2] S. Ratnasamy, B. Karp, L. Yin, F. Yu, D. Estrin, R. Govindan, and S. Shenker, “Data-centric storage in sensornets with GHT, a geographic hash table,” in Journal of Mobile Networks and Applications (MONET), Kluwer, 2003.
[3] A. Ghose, J. Grossklags, and J. Chuang, “Resilient data-centric storage in wireless ad-hoc sensor networks,” in Proceedings of the 4th International Conference on Mobile Data Management (MDM), 2003.
[4] W. Zhang, G. Cao, and T. La Porta, “Data dissemination with ring-based index for wireless sensor networks,” in Proceedings of the IEEE International Conference on Network Protocols (ICNP), 2003.
[5] B. Karp and H. T. Kung, “GPSR: Greedy perimeter stateless routing for wireless networks,” in Proceedings of the 6th Annual International Conference on Mobile Computing and Networking (MOBICOM), 2000.
[6] R. Tamoshetty, L. H. Ngoh, and P. H. Keng, “An efficient resiliency scheme for data centric storage in wireless sensor networks,” in Proceedings of the IEEE Vehicular Technology Conference (VTC), 2004.
[7] L. Girod and D. Estrin, “Robust range estimation using acoustic and multimodal sensing,” in Proceedings of the IEEE Conference on Intelligent Robots and Systems, 2001.
[8] N. Priyantha, A. Liu, H. Balakrishnan, and S. Teller, “The Cricket compass for context-aware mobile applications,” in Proceedings of the Seventh Annual ACM/IEEE International Conference on Mobile Computing and Networking (MOBICOM), Rome, Italy, Jul. 2001.
[9] T. Le, W. Yu, and D. Xuan, “An adaptive zone-based storage architecture for wireless sensor networks,” in Proceedings of the Global Telecommunications Conference (GLOBECOM), Nov. 2005.
[10] S. Palchaudhuri, A. K. Saha, and D. B. Johnson, “Adaptive clock synchronization in sensor networks,” in Proceedings of the Third International Symposium on Information Processing in Sensor Networks (IPSN), 2004.
[11] J. Greunen and J. Rabaey, “Lightweight time synchronization for sensor networks,” in Proceedings of the International Workshop on Wireless Sensor Networks and Applications (WSNA), 2003.
