2008 Real-Time Systems Symposium
Real-Time Querying of Historical Data in Flash-equipped Sensor Devices
Adam Ji Dou, Song Lin, Vana Kalogeraki
Department of Computer Science and Engineering, University of California, Riverside
{jdou,slin,vana}@cs.ucr.edu
Abstract
In this paper we propose a suite of index structures and algorithms which permit us to efficiently support querying of historical data in flash-equipped sensor devices. Our techniques are designed to consider the unique read, write and wear constraints of flash memories; this, combined with the very limited data memory on sensor devices, prohibits the direct application of most existing indexing methods. We have implemented our methods using nesC and have run extensive experiments in TOSSIM, the simulation environment of TinyOS. Our experimental evaluation using trace-driven real world data sets demonstrates the efficiency of our indexing algorithms.

1 Introduction

Wireless sensor networks (WSN) have received considerable attention in recent years, deployed in a variety of environments to take measurements that would otherwise be impractical due to hostile environments, remote locations and the extended periods of time required [16, 20, 21, 28, 30, 27]. The recent trend of equipping sensors with flash memories (such as RISE [4] and PRESTO [6]) allows sensors to store large amounts of data locally. Sensors can now exploit the low energy requirements of processing and data storage by only transmitting processed results in response to specific queries. Such in-network storage schemes yield significant energy savings since communication costs are greatly reduced, prolonging the lifetime of the sensor network. The ability to store large amounts of raw data necessitates an efficient method of retrieving historical stream data upon request. Flash memories, however, have many unique characteristics which make the direct application of existing data indexing techniques impractical. Recently, several indexing structures have been proposed for flash-based sensors (such as Presto [6], Capsule [17], B-Flash [25], R-Flash [26], FlashDB [18] and MicroHash [29]). These are capable of answering simple online aggregate queries such as:

SELECT AGR FROM SENSOR DATA (where AGR = MAX, MIN, AVG, ...)

In real world applications, however, the queries issued by the user are often more complex than these simple online aggregate queries. For example, in many cases we may want to compare the current data with data collected at a specific time in the past. In other cases, we may want to identify patterns or common occurrences. In all such cases, we need efficient techniques to store and index historical data (i.e., data collected in the past) and keep it up to date. Consider for example a person studying weather trends. He may be interested in asking temporally constrained queries, such as Find the number of sunny days in the month of February, perhaps to compare with different months. Another example is the ZebraNet mobile sensor system [20] with GPS-enabled devices to gather information about the environment, where the sensor device's flash memory can only hold up to 26 days of collected information. When dealing with such large amounts of data, it is often sufficient to supply approximate answers based on a good sample of the sensor data instead of fetching large amounts of the original data to compute exact results [11]. Taking a random sample is a very efficient way to approximate the average, histogram or quantiles of the original sensor data. In our work, we are interested in drawing a random sample from the sensor data up to any timestamp for data analysis and query approximation. For example, one may want to approximate the 90% quantile of all the sensor data until 2006 by drawing a random sample from all the sensor data collected up to 2006. In most cases, the SRAM of the sensor device is too small (i.e. 3KB ∼ 10KB) [12] to keep good samples of the sensor data for all possible time moments, so we need sophisticated implementations if the sampling is to be done efficiently (see [19] for a comprehensive survey). Examples of the queries we want
to answer are the following:

SELECT AGR FROM SENSOR DATA WHERE MONTH = MAY

SELECT RANDOM SAMPLES FROM SENSOR DATA UP TO MAY 2007 WHERE SAMPLE SIZE = K
The difficulty of realizing indexing structures and samples that efficiently meet the needs of these queries lies in the unique characteristics (wear and delete constraints) of the flash memory and the very limited SRAM capacities. We are often dealing not only with large amounts of data, but also with large amounts of information required to sufficiently index that data (often more than can be contained within the sensor's SRAM). Retrieving this historical data has been considered in the past to be prohibitively expensive. It is not trivial to maintain dynamic data structures in flash without repeatedly updating pages which contain frequently changing data; for example, maintaining a priority heap can cause unacceptable levels of wear in the block containing the root of the heap.
Our Contribution: In this paper, we propose a suite of index structures and algorithms which permit us to efficiently support querying of historical data in flash-equipped sensor devices. More specifically, we address the following queries:
• Temporally Constrained Aggregate Queries: compute the aggregate sensor reading over any time period.
• Historical Online Sample Queries: compute a random sample of the data generated by the sensor up until any time.
By providing answers to these queries we are able to support a much wider range of applications. We have implemented our techniques using nesC [10], the programming language of the TinyOS [13] operating system. Our trace-driven experimental evaluations confirm the efficiency of both the proposed indexing structures and the corresponding query algorithms.

2 Background

In this section we briefly present the memory architecture of the sensor devices, along with the distinct characteristics of the flash memories.

2.1 Flash-Equipped Sensors

Flash-equipped sensors (e.g. RISE [4], Stargate [3] in PRESTO [6] or TSAR [7]) distinguish themselves from traditional sensor devices (such as MICA2 [2], iMote [1] and XYZ [15]) by providing access to an auxiliary data store, typically with a size in the range of 32MB ∼ 8GB. Sensors typically have low-powered processors (4MHz ∼ 64MHz), limited program flash for storing code (128KB) and some static RAM for main memory (3KB ∼ 10KB [12]). The limited size of the SRAM and the constraints of the flash memory introduce many challenges for data storage and indexing on these flash-equipped sensor devices.

2.2 Flash Memories

Recent improvements in flash technologies (storage capacity, data access speed, energy efficiency, price) have made flash the fastest growing memory market over the past several years. Flash memories are increasingly used in mobile and wireless electronic devices (e.g. digital cameras, personal digital assistants, cell phones) due to their unique advantages over other storage methods: low energy consumption, large capacity, shock resistance, non-volatile storage, small physical size and light weight. A flash memory consists of many blocks (4KB ∼ 16KB) which are further divided into pages (128B ∼ 512B). We summarize the constraints of NAND flash as follows:
1. Wear: Each page in flash memory can only be written a limited number of times (≈ 1,000,000).
2. Read: Data can be read from the flash memory in sizes ranging from a single byte to a block (4KB ∼ 16KB).
3. Write: Data can be written to the flash memory at a page (128B ∼ 512B) granularity.
4. Delete: A block (4KB ∼ 16KB) is the smallest unit that can be deleted in the flash memory.
With these constraints, it is easy to see that we would like to maintain an even level of wear (a uniform number of page writes) across the entire memory to prevent unexpected bad sectors, which can greatly increase a program's complexity if they must be handled correctly. It is also important to avoid deleting pages: deleting a page causes its entire block to be deleted, so any pages in the block which should not be removed must be copied and written back after the delete.

3 Dynamic Techniques for Historical Queries: Problem Setting

Let S denote a sensor that acquires readings from its environment every δ seconds (i.e. t = 0, δ, 2δ, ...). At each time instance t, the sensor S takes some measurements represented as a data record datarec = <t, v1, v2, ..., vn>, where t denotes the timestamp at which the tuple was recorded, and vi (1 ≤ i ≤ n) represents the value of an environmental variable such as temperature, humidity, pressure, light, voltage, longitude or latitude.
Let P = <p1, p2, ..., pm> denote a flash medium with m available pages. A page can store a finite number of bytes (denoted as psize), which limits the capacity of P to m·psize bytes. Pages are logically organized into b blocks <block1, block2, ..., blockb>, each block containing m/b consecutive pages. We assume that pages are read one page at a time and that each page pi can only be deleted if its respective block (denoted as block(pi)) is deleted as well (delete constraint). Finally, due to the wear constraint, each page can only be written a limited number of times (denoted as pwc). With these settings of the sensor device, we want to efficiently answer Temporally Constrained Aggregate Queries and Historical Online Sampling Queries, as described in the following sections.
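To make the setting concrete, the following C sketch shows one possible layout of a data record and the flash geometry constants; the field names and the concrete sizes are illustrative assumptions chosen within the ranges quoted above, not values fixed by the paper.

#include <stdint.h>

#define N_VALUES        6        /* number of sensed attributes v1..vn (assumed)  */
#define PAGE_SIZE       512      /* psize: one flash page, 128B ~ 512B            */
#define PAGES_PER_BLOCK 32       /* m/b consecutive pages form one erasable block */
#define PAGE_WEAR_LIMIT 1000000  /* pwc: maximum number of writes per page        */

/* One sensor reading: <t, v1, ..., vn> acquired every delta seconds. */
typedef struct {
    uint32_t t;                  /* timestamp of acquisition                      */
    float    v[N_VALUES];        /* temperature, humidity, pressure, light, ...   */
} DataRec;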
4 Temporally Constrained Aggregate Queries

Figure 1: (a) Overview of the multi-resolution index structure in SRAM and Flash Card and (b) the data structures used in the nesC implementation:

typedef struct IdxRec {        /* one index record inside an index page */
    uint32_t start_timestamp;
    uint32_t virtual_pointer;
    uint32_t aggregate_value;
};

typedef struct IdxP {          /* index page stored in the flash card   */
    uint8_t  resolution_level;
    uint32_t back_pointer:23;
    IdxRec   records[I_SIZE];
};

typedef struct DirectoryP {    /* directory kept in SRAM                */
    uint32_t direc[MAX_LEVEL];
};
Temporally Constrained Aggregate Queries are aggregate queries with time restrictions. For example, people may be interested in queries related to specific time periods: What is the average/maximum/minimum value of all the temperature readings collected in February 2007?, or How many times has my zebra been to the watering hole in the last month? We define some commonly used Temporally Constrained Aggregate Queries as follows:
Average Queries: A query AVG(vi, tstart, tend) returns the average value of all the readings of attribute vi generated by the sensor between timestamps tstart and tend. A common special case of this query is the Running Average Query: RunningAVG(vi, l) returns the average value of the last l readings; it can also be specified as AVG(vi, currenttime − l·δ, currenttime).
Count Queries: COUNT(pred(vi), tstart, tend) returns the count of all the tuples qualifying for a predicate function pred(). The Running Count Query returns the count of all the qualifying tuples in the last l readings: RunningCOUNT(pred(vi), l).
Top-K Queries: TOP(vi, tstart, tend, k) returns the highest k values of the readings of attribute vi. The Running Top-K Query returns the highest k values in the last l readings: RunningTOP(vi, l, k).
A naive method to evaluate the above queries is to organize all the sensor data sequentially (according to the timestamp at which the data is generated) in the flash memory. When a query over the window tstart to tend is issued, all the sensor readings from tstart to tend are fetched and the aggregate is computed from the fetched data. This is an expensive operation when the query window tend − tstart is large or the query is issued frequently.
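As a point of reference, here is a minimal C sketch of this naive sequential-scan evaluation for an AVG query; the page layout, RECS_PER_PAGE and the read_data_page() helper are illustrative assumptions rather than the paper's implementation.

#include <stdint.h>

#define RECS_PER_PAGE 32                       /* assumed data-page capacity */
typedef struct { uint32_t t; float v; } Reading;

/* Assumed helper: reads the idx-th data page from flash into buf and
 * returns the number of valid readings it contains (0 past the end).  */
uint32_t read_data_page(uint32_t idx, Reading buf[RECS_PER_PAGE]);

float naive_avg(uint32_t t_start, uint32_t t_end) {
    Reading page[RECS_PER_PAGE];
    double sum = 0.0;
    uint32_t cnt = 0;
    /* Data is stored in timestamp order, so pages before t_start could be
     * skipped; the cost is still proportional to the query window size.   */
    for (uint32_t p = 0; ; p++) {              /* one flash read per page   */
        uint32_t n = read_data_page(p, page);
        if (n == 0) break;
        for (uint32_t i = 0; i < n; i++) {
            if (page[i].t > t_end) goto done;
            if (page[i].t >= t_start) { sum += page[i].v; cnt++; }
        }
    }
done:
    return cnt ? (float)(sum / cnt) : 0.0f;
}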
Figure 2: The Directory and Index Records. (The SRAM directory has one bucket per resolution level 0, 1, 2, ..., MAX_LEVEL; the flash card holds index records such as 1, 1-2, 1-4, 1-8, 5-6, 5-8, 9-10 built over timestamps 1 to 11.)
Instead, we propose to build and maintain a multi-resolution index structure in order to answer such aggregate queries more efficiently. The basic idea is that, as new data is generated, we pre-compute partial results at several resolution levels. Our indexing structure is composed of two types of pages in addition to data pages: directory pages and index pages. As shown in Figure 1, a directory is maintained in the SRAM of the sensor device, with each directory bucket pointing to an index page stored in the flash memory. The directory pages hold a directory in which each bucket corresponds to a unique resolution level (0, 1, 2, ..., MAX_LEVEL, as shown in Figure 2). The index pages keep partial aggregate results at some resolution level (e.g. in Figure 2, each index record in the index page is the average of 2^i consecutive temperature readings, where 2 is the granularity base and i is the resolution level of the index page). Note that higher resolution levels correspond to a lower resolution representation of the actual data. When a new data record is produced by the sensor, we create a new index record with resolution level 0. Then we recursively try to combine consecutive index records from the current resolution level into a new index record at the next higher resolution level.
Algorithm 1 Creating Multi-resolution Indexes
Input: record_i: the i-th data record newly generated by the sensor.
Output: an update of the multi-resolution index pages (index_0, index_1, ..., index_MAX_LEVEL).
1: procedure CREATE-AGGREGATE-INDEX(record_i)
2:   index_0(i) = record_i
3:   for k = 0 to MAX_LEVEL − 1 do
4:     if length(index_k) MOD 2 == 0 then
5:       index_{k+1}(i / 2^{k+1}) = F(index_k(i / 2^k − 1), index_k(i / 2^k))
6:     else
7:       break
8:     end if
9:   end for
10: end procedure
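The following C sketch is one way Algorithm 1 can be realized for the AVG aggregate, buffering one pending record per resolution level in SRAM; append_index_record() (which fills and flushes index pages) and all other names are illustrative assumptions, not the paper's nesC code.

#include <stdint.h>

#define MAX_LEVEL 10

static float    pending_val[MAX_LEVEL];  /* first record of an incomplete pair, per level */
static uint32_t pending_ts[MAX_LEVEL];
static uint8_t  has_pending[MAX_LEVEL];

/* Assumed helper: appends one index record <start_ts, value> to the current
 * in-SRAM index page of the given level, flushing the page to flash when full. */
void append_index_record(uint8_t level, uint32_t start_ts, float value);

/* Algorithm 1 with F = AVG: each level-(k+1) record averages two level-k records. */
void create_aggregate_index(uint32_t ts, float value) {
    uint32_t start_ts = ts;
    append_index_record(0, ts, value);                 /* resolution level 0        */
    for (uint8_t k = 0; k < MAX_LEVEL; k++) {
        if (!has_pending[k]) {                         /* odd record: wait for pair */
            pending_val[k] = value;
            pending_ts[k]  = start_ts;
            has_pending[k] = 1;
            break;                                     /* Algorithm 1's early exit  */
        }
        /* pair complete: combine into one record at resolution level k+1 */
        value    = (pending_val[k] + value) / 2.0f;
        start_ts = pending_ts[k];
        has_pending[k] = 0;
        append_index_record((uint8_t)(k + 1), start_ts, value);
    }
}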
Take Figure 2 as an example: when the 10th data record Record10 is produced, we first create an index record for it whose value is that of the indexed attribute in Record10. Then we can combine the consecutive index records index(9) and index(10) into a new index record index(9-10) at a higher resolution level (i.e. 1). This new index record is the aggregate value of the data records between timestamps 9 and 10. In this multi-resolution index structure, the resolution level (rl) controls the number of consecutive data records that an index record aggregates together. More specifically, the index page at resolution level rl stores aggregate results over 2^rl consecutive data records. Algorithm 1 presents this index construction algorithm. The function F(x1, ..., xn) in Algorithm 1 aggregates indexes of a smaller resolution level into an index of the current resolution level. The actual function depends on the type of query we want to answer:
• Average Queries: F returns the average of x1 to xn
• Count Queries: F returns the sum of x1 to xn
• Top-K Queries: F returns the top-k items from x1 to xn
The index pages are organized in the flash memory in such a way that all index pages with the same resolution level are linked together by back pointers, forming a large linked list (as shown in Figure 1). In addition, index pages with different resolution levels but the same starting timestamp are linked together by virtual pointers (e.g. in Figure 1 the index records index(1-8), index(1-4), index(1-2), index(1) are linked by dashed arrows). With these virtual links we are able to access any index record at any resolution level efficiently. When a query is issued, we wish to determine which set of index records is required to construct the answer. Our goal is to minimize the number of page reads required to
generate the result; this translates into using as many high resolution level index records as possible. The idea is to first locate the highest-rl index record which contains aggregated data records up to, but not exceeding, the end time tend. To locate this index record, we simply follow the linked list of the highest possible rl index pages, dropping to lower rl indexes as necessary, until we reach an index record that satisfies the condition. We then try to find the index record with the largest resolution level that is covered by the query range tstart to tend. After exhausting the largest resolution levels, we use progressively smaller resolution levels until we have covered the entire query interval. Figure 2 illustrates the index records used to answer a query with parameters tstart = 4 and tend = 11. To compute the aggregate (e.g. TOP-1 or MAX) result over the time interval [4-11], we first fetch the index record index(5-8) (i.e. the TOP-1 or MAX value of all original data records in time interval [5-8]), which has the largest resolution level rl = 2 covered by the query window [4-11]. Then we follow the virtual link from index(5-8) to fetch the lower resolution level index record index(4). Finally we fetch index(9-10) and index(11), as they are covered by the query interval. The query result is then computed by aggregating the index records we have fetched, i.e. aggregate(index(5-8), index(4), index(9-10), index(11)).
Our goal is to organize the index records (usually smaller than a page) efficiently into pages and store them in the flash memory. As shown in Figure 3, in the SRAM we keep the directory pointing to the index page links for the individual resolution levels. In addition, the SRAM also keeps the last index page for each resolution level; when that index page is full, we write it out to the flash memory and create a new empty index page pointing to it. In the flash memory, on the other hand, the back pointers (solid arrows in Figure 3) link together all the index pages of the same resolution level, and the virtual pointers (dashed arrows in Figure 3) link together the index records with the same starting timestamp. Both back pointers and virtual pointers point to the pages in which the corresponding index records reside, rather than to the individual index records.
Storage Overhead: It is easy to see that with n data records and b records per page, our approach has an indexing storage overhead of at most ⌈2n/b⌉ pages and a search overhead of O(log(n/b)) pages. This is a modest storage overhead but provides much better query speeds than the naive approach (which stores only data pages and accesses O(n/b) pages per query). In addition to the 2-based resolution directory, we can also use a g-based resolution directory to index the data. The parameter g is the granularity base of the multi-resolution indexing (i.e. the index record with resolution level rl now keeps the aggregate of g^rl (g = 3, 4, ...) consecutive data records).
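A short way to see the storage bound (a sketch of the reasoning; the paper states these bounds without derivation): with granularity base 2, resolution level ℓ holds one index record per 2^ℓ data records, so

\[
\sum_{\ell=0}^{\text{MAX\_LEVEL}} \left\lceil \frac{n}{2^{\ell}} \right\rceil \;\le\; 2n + \text{MAX\_LEVEL} + 1 \;\approx\; 2n \quad \text{index records},
\]

i.e. at most ⌈2n/b⌉ index pages. For the search bound, any query interval can be decomposed so that at most two index records are taken from each resolution level (one at each end of the interval), which yields the logarithmic number of page reads stated above.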
Figure 3: The Directory and Index Pages. (Index pages P1 ... P8 in the flash card hold the index records 1, 1-2, 1-4, 1-8, 5-6, 5-8, 9-10, etc.; the SRAM holds the directory with buckets 0, 1, 2, 3, ...)

Figure 4: The Reservoir Sampling Algorithm. (The first k incoming records enter the reservoir with probability 1; the n-th record, n > k, enters with probability P = k/n.)
An advantage of the g-based resolution directory is that it decreases the space overhead (we need to maintain at most (n/b)·g/(g−1) index pages in the flash memory). Conversely, the query search performance of the g-based approach is negatively affected: O((g−1)·log_g(n/b)) pages are accessed for each query.
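A purely hypothetical numeric illustration of this tradeoff (the values n = 100,000 data records and b = 20 index records per page are assumptions for illustration, not parameters taken from the experiments):

\[
g = 2:\;\; \frac{n}{b}\cdot\frac{g}{g-1} = 10{,}000 \text{ index pages}, \qquad (g-1)\log_{g}\frac{n}{b} = \log_{2}5000 \approx 12 \text{ page reads};
\]
\[
g = 4:\;\; \frac{n}{b}\cdot\frac{g}{g-1} \approx 6{,}667 \text{ index pages}, \qquad 3\log_{4}5000 \approx 18 \text{ page reads}.
\]

Moving from g = 2 to g = 4 thus saves roughly a third of the index storage while increasing the page reads per query by about fifty percent.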
5 Historical Sampling

In many applications, especially when more data is generated than can be effectively stored, users are interested in maintaining a random sample of the sensor data for query approximation and data analysis. Users may also be interested in historical samples, which are random samples of the data from the beginning of data collection until a certain timestamp. The Historical Online Sampling query is defined as follows.
Historical Sampling Query: Given a continuous data stream x1, x2, ..., xn, a sample size k and a query timestamp t (k ≤ t ≤ n), return k random samples from all the data items generated from the beginning to timestamp t (i.e. x1, x2, ..., xt).
To compute a random sample of the sensor data on the fly we investigate using a Reservoir Sampling technique. We want to construct a sample reservoir of size k and maintain the invariant that the current reservoir constitutes a uniform random sample of all data elements seen so far. In Reservoir Sampling the first k arriving data records are inserted into the reservoir; when the n-th (n > k) data record arrives, it is included in the reservoir with probability k/n, replacing a random victim in the reservoir, and with probability 1 − k/n it is not included (shown in Figure 4). If the size of the SRAM is larger than k records, it is trivial to implement Reservoir Sampling on the sensor device: we simply allocate enough storage to hold k records in the SRAM to maintain the reservoir and update it whenever a new sample needs to be included. In most real world applications, however, this is impractical, as the SRAM of the sensor devices is very limited (≈ 3KB − 10KB). Even if the SRAM can accommodate the entire sample, it would still
be very difficult to support historic sample queries, which can request a sample up to any time. If a simple checkpointing system were used to periodically store the sample from the SRAM to flash, it would only be able to answer historic queries whose requested time coincides with a checkpoint. To satisfy the requirement of providing a sample for any time, the sample in SRAM must be written out to the flash memory.
When the size of the reservoir k is larger than the size of the SRAM, we cannot keep the entire reservoir sample in SRAM and have to maintain it in the flash memory. To conserve SRAM usage, we only place in the SRAM a directory that tracks the reservoir membership (i.e. the directory indicates where the sample records are stored in the flash memory). In this manner, the data records can be written out to the flash memory sequentially, without replacement. When a newly produced data record replaces an old data record in the reservoir, we simply write out the new record to the flash memory and update the corresponding directory entry in the SRAM. With the directory, we are able to find the correct members of the reservoir sample and keep the reservoir updated as new data records arrive.
In most applications, however, the SRAM will still be unable to hold either the reservoir samples or the directory of the reservoir membership. In such situations, there is no way to keep the complete reservoir information in the SRAM, so we explore a checkpoint and logging approach to efficiently maintain the reservoir in the flash memory. We maintain two types of pages in the flash memory: Reservoir Pages and Log Pages. When the sensor begins to collect data, the first k generated data records are written sequentially to the flash memory as a sequence of reservoir pages; this sequence can be thought of as an initial checkpoint. Later, if a replacement occurs (i.e. a new data record is determined to be included in the reservoir), we write it as a log record in a log page on the flash memory. A log record has the format <timestamp, recordnew>, where timestamp is the time the record was generated and recordnew is the content of the new record to be inserted in the reservoir (shown in Figure 5). When a Historical Sampling query with query timestamp t arrives, we scan every data record in the log from the beginning to the newest record with timestamp ts up to t (i.e. ts ≤ t), and for each record we scan, we include it in the reservoir by replacing a random victim.
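To make the layout concrete, here is a minimal C sketch of the log record format and of the naive log-replay reconstruction just described; DataRec is the record type sketched in Section 3, while read_log_record() and the use of rand() are illustrative assumptions rather than the paper's implementation.

#include <stdint.h>
#include <stdlib.h>

typedef struct { uint32_t t; float v[6]; } DataRec;   /* as sketched in Section 3 */

typedef struct {
    uint32_t timestamp;    /* time at which the new record was generated          */
    DataRec  record_new;   /* record that replaces a random slot in the reservoir */
} LogRecord;

/* Assumed helper: reads the idx-th log record from the log pages in flash;
 * returns 0 when the end of the log is reached.                              */
int read_log_record(uint32_t idx, LogRecord *out);

/* Naive reconstruction: replay every log record with timestamp <= t.
 * Each replacement may force a reservoir page to be read and rewritten,
 * which is what the more efficient scheme below avoids.                 */
void reconstruct_naive(DataRec reservoir[], uint32_t k, uint32_t t) {
    LogRecord lr;
    for (uint32_t idx = 0; read_log_record(idx, &lr) && lr.timestamp <= t; idx++) {
        uint32_t victim = (uint32_t)(rand() % k);   /* random slot to overwrite */
        reservoir[victim] = lr.record_new;
    }
}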
Figure 5: Reservoir Sampling for the Flash Memory. (The flash card holds checkpoints Reserv(k), Reserv(k+L) interleaved with log pages; the SRAM holds a bit vector of k + L bits, where a reservoir bit is set with probability p^L and the i-th log bit with probability p^(L−i).)
The scheme described above requires accessing every data record in the log with timestamp up to t, and a replacement operation in the reservoir for each record. Since the reservoir is usually too large to be kept in the SRAM, in the worst case each replacement operation requires swapping in a new page of the reservoir containing the victim that the new record is to replace. For a log of l records, this requires O(l·k/b) page reads and writes in the flash memory, which is very expensive. Therefore, we propose a more efficient reservoir maintenance algorithm to quickly compute the records required for a new reservoir. Our algorithm is based on the following theorem:
Theorem 1. For reservoir sampling in flash memories, given a Reservoir of size k (reserv_1, reserv_2, ..., reserv_k) and a Log File of size l (log_1, log_2, ..., log_l) with their timestamps up to the query time t (i.e. log_1.ts < log_2.ts < ... < log_l.ts ≤ t < log_{l+1}.ts), the probability that a reservoir record reserv_i is in the reservoir sample up to timestamp t is p^l, and the probability that a log record log_i is in the reservoir sample is p^(l−i), where p = (k−1)/k.
With the knowledge of the probability of each record from the reservoir or log being included in the final reservoir sample, we can compute the final reservoir sample without accessing each log record individually. We keep in the SRAM a bit vector of k + l bits, where each bit determines whether the corresponding record is included in the final reservoir. Every bit is first initialized to 0 (as shown in Figure 5). The first k bits in the vector represent the k records in the reservoir and the last l bits represent the log records. We then perform k bit-setting operations; in each operation exactly one bit is set from 0 to 1, and the probability that a bit is set is the probability that the corresponding record (either in the reservoir or the log) will be included in the final reservoir sample (as shown in Theorem 1). After the k bits have been set in the bit vector, we only have to fetch the records indicated by the set bits. The time complexity of this flash reservoir sampling algorithm is O(k) and it requires O(k) page read and write operations in the flash memory, which greatly reduces the data access overhead of the method described previously.
As the number of data records increases, the log becomes longer and the query performance decreases. We remedy this situation by employing a checkpointing approach to guarantee good query performance. The number of log records, l, is monitored, and when l surpasses some threshold L, we create a checkpoint by computing a new reservoir sample and writing it out as reservoir pages in the flash memory. This newly created reservoir page sequence becomes a new checkpoint. As Figure 5 shows, the first k data records collected by the sensor are written to the flash memory directly as a reservoir (Reserv(k)). New data records are written as log records if they are determined to be included in the sample. When the length of the log reaches L, we use the flash reservoir maintenance algorithm described above to compute the current reservoir sample and write it into the flash memory as a checkpoint (Reserv(k + L)). We summarize the checkpointing and sample construction algorithms for flash Reservoir Sampling in Algorithms 2 and 3. To answer historical online sampling queries, we first locate the last log entry or checkpoint before the query timestamp, and then apply Algorithm 3 at that point to recover the random sample of the data up to that timestamp.

Algorithm 2 Updating the Reservoir
Input: the reservoir size k, the i-th incoming data record record_i generated by the sensor.
1: procedure RESERVOIR-UPDATE(k, record_i)
2:   if i ≤ k then
3:     append record_i to the end of the reservoir
4:   else
5:     set a boolean b to true with probability k/i, and to false otherwise
6:     if b == true then
7:       append record_i to the end of the log
8:     end if
9:   end if
10:  if log size == L then
11:    compute a new checkpoint (reservoir) from the newest checkpoint and the log
12:  end if
13: end procedure
Algorithm 3 Reservoir Sample Construction
Input: the reservoir of k records reserv_i (1 ≤ i ≤ k), the log of l records log_i (1 ≤ i ≤ l).
Output: the current reservoir sample.
1: procedure RESERVOIR-CONSTRUCTION(reserv_1, ..., reserv_k, log_1, ..., log_l)
2:   create a bit vector V[k + l] of k + l bits, each representing a data record
3:   initialize all bits in V[k + l] to false
4:   for i = 1 to k do
5:     P_i = ((k − 1)/k)^l
6:   end for
7:   for i = 1 to l do
8:     P_{k+i} = ((k − 1)/k)^(l−i)
9:   end for
10:  set k bits in V[k + l] to true, where bit V[i] is set with probability P_i
11:  include data record j in the sample if its bit V[j] has been set
12: end procedure
We use the log entry we found and every log entry between it and the previous checkpoint; the number of these log entries is l in Algorithm 3. If we find a checkpoint instead, we can use it directly to provide the sample. To locate the appropriate log entry, we can use a binary search, since everything in the flash memory is ordered by time.
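The membership computation can be sketched in C as follows. This sketch replays the replacements on record indices held in SRAM rather than implementing the bit-setting procedure literally; it produces exactly the membership distribution of Theorem 1 while touching flash only for the final k fetches, at the cost of storing k 32-bit indices instead of a (k + l)-bit vector. All names are illustrative.

#include <stdint.h>
#include <stdlib.h>

/* Determines which records form the reservoir sample after l log records,
 * without reading any reservoir or log page during the replay.
 *   member[i] <  k : slot i still holds checkpoint record member[i]
 *   member[i] >= k : slot i holds log record (member[i] - k)             */
void reservoir_membership(uint32_t k, uint32_t l, uint32_t member[/* k */]) {
    for (uint32_t i = 0; i < k; i++) member[i] = i;      /* initial checkpoint     */
    for (uint32_t j = 0; j < l; j++) {
        uint32_t victim = (uint32_t)(rand() % k);        /* random slot to replace */
        member[victim] = k + j;                          /* j-th log record wins   */
    }
    /* Only now access flash: fetch the k surviving records, O(k) page reads. */
}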
6 Experimental Evaluation

In this section, we present an extensive performance evaluation of our proposed index structures and algorithms. We have implemented each of our proposed indexing methods in nesC [10], the programming language of TinyOS [13], and have run simulations in TOSSIM [14], a TinyOS mote simulator. For flash operations, we coded our own flash driver, which uses a scratch file to simulate a flash memory. The most important measurement throughout our experiments is the number of page reads and writes, which is directly proportional to both runtime and energy usage.
6.1 Data Sets
We conduct our experiments using several data traces:
Great Duck Island Data: This is a real world data set collected by the habitat monitoring project on Great Duck Island in Maine [21]. The data set contains sensor readings for several environmental quantities: light, temperature, pressure, thermopile, humidity and voltage, and includes all sensor readings collected between October 2002 and November 2002. We use a set of 95,000 data records from one of these sensors.
Washington Weather Data: This is a real world data set consisting of atmospheric data collected from 32 sites at weather stations in Washington and Oregon. Each of the 32 sensor locations maintains the average temperature on an hourly basis for a total of 208 days between June 2003 and June 2004. We use a set of 150,000 data records from one of these sensors.
6.2 Temporally Constrained Aggregate Queries
In this experiment, we evaluate the query performance of our multi-resolution index structure on the Great Duck Island Data. An alternative, naive approach is to fetch all the original data records with timestamps inside the query interval tstart to tend; this approach requires a number of page reads proportional to the interval size. Our experimental work quantifies the efficiency of the multi-resolution index structure and how it improves query performance.
6.2.1 Varying Resolution Level
We vary the number of resolution levels in our multi-resolution index structure, keeping only a certain number of levels of aggregated values. For example, with a total of 4 levels, the lowest resolution level contains the aggregate information of 2^4 original data records. First, we show the number of page reads required in response to queries for increasingly large running average intervals (intervals ending at the current time). Figure 6 shows that the number of page reads varies sporadically as the intervals increase. This behavior is expected: it is due to the interval continually becoming just large enough to take advantage of some lower resolution. We also see that the number of page reads begins to differ between 3 and 4 total levels, and this difference becomes more apparent as the query interval increases (Figure 7). For the second figure (Figure 7), we vary the total number of resolution levels from 4 to 10 and evaluate the search performance for different query interval sizes. To prevent the jaggedness we saw in Figure 6, we averaged the results over several consecutive interval sizes, depending on which resolution levels were used. Ideally, with infinite resolution levels, we would see a log curve. We see this ideal curve initially, but the results diverge as the lowest resolution levels are reached. Further increases in interval length can only be covered with the lowest resolution levels, and these become the bottleneck; because of this, the slope of the straight-line section is inversely related to the total number of resolution levels.
Figure 6: Non-averaged search performance (# of page reads) for varied resolution levels (rl = 3, 4).
Figure 7: Average search performance (# of page reads) for varied resolution levels (rl = 4, 6, 8, 10) and interval sizes.
Figure 8: Search performance (# of page reads) for varied granularity base.
Figure 9: Memory overhead for varied granularity base.
Figure 10: Query search performance of Flash Reservoir Sampling compared with Reservoir Sampling for varied log size (reservoir size k = 100, SRAM = 2KB).
Figure 11: Total query energy cost (mJ) with checkpointing (q is the total number of queries, q = 25, 50, 100, 200, 400; triangles indicate the lowest energy).
6.2.2 Varying Granularity Base
As described in Section 4, we can also create a g-based resolution directory in the SRAM. In this experiment, we evaluate the search performance and memory overhead of our index structure with different granularity bases. Figure 8 shows that the smaller the granularity base, the better the search performance: with a smaller granularity base we can represent the interval with more resolution levels and thus represent the query interval more concisely. However, a small granularity base requires more index records and consequently more storage overhead in the flash memory (as shown in Figure 9). The choice of a proper granularity base is therefore a tradeoff between query performance and flash memory storage overhead. From the graphs we can see that while the search performance improves linearly, the storage requirements increase exponentially; with properly weighted costs for storage space and search performance, it is possible to determine the optimal granularity base value for a specific application.

6.3 Historical Sampling

In the last experiment set, we evaluate the query performance of our Flash Reservoir Sampling algorithm on the Washington Weather Data.
6.3.1 Query Search Performance
We compare the query performance of our Flash Reservoir Sampling algorithm with the original reservoir sampling algorithm. With a fixed reservoir size and an increasing log file size, the Flash Reservoir Sampling algorithm performs far fewer page reads (Figure 10).
6.3.2 Effects of Checkpointing
We evaluate the query performance of checkpointing. Figures 12 and 13 show that increasing the checkpoint log length threshold L (less frequent checkpointing) requires fewer page accesses (reads and writes) for checkpointing, but more page accesses (i.e. reads) for queries. Figure 11 shows the total energy cost for different checkpointing frequencies and query frequencies. There are optimal checkpoint frequencies depending on the number of queries. With a large number of queries, it is better to checkpoint more frequently (as shown in Figure 12). However, with less frequent queries, it is more desirable to create fewer checkpoints, as the construction of checkpoints requires more page accesses (Figure 13).
Figure 12: Query search performance with varied checkpointing.
Figure 13: Page accesses for checkpointing.

7 Related Work

Several flash file systems have been proposed in recent years for data storage in flash memories, such as Microsoft's Flash File System (FFS1, FFS2 [22]) for MS-DOS, the Journalling Flash File System (JFFS and JFFS2) [23] for Linux, the Yet Another Flash File System (YAFFS) [24] for uClinux and Windows CE, the Transactional Flash File System (TFFS) for transaction processing [8], and the Efficient Log-Structured Flash File System (ELF) [5] for efficient file updating and appending. The main objective of these file systems, however, is to organize the data into file systems or directories and to provide transaction semantics on these attributes, which does not support the retrieval of data by database queries.
In addition to file systems, a large number of efficient flash-based indexing techniques have been proposed to support various database queries [25, 26, 17, 18, 9, 29]. B-tree and R-tree index structures for flash memory on portable devices, such as PDAs and cell phones, have been proposed in [25] and [26] for supporting range queries and spatial queries. These techniques keep in memory an Address Translation Table (ATT) to provide data indexing and query searching. Such translation tables, though efficient for wear-leveling of the flash memory, are usually too large to be maintained in the limited SRAM (3KB ∼ 10KB) of sensor devices, making them impractical for sensor network applications. Another interesting index structure, Capsule [17], was recently proposed by Mathur et al. Capsule is a log-structured object storage system that supports, in flash memory, several commonly used data storage objects such as streams, files, arrays, queues and lists. The idea is to employ a hardware abstraction layer that hides the details of the storage in the flash memories for various database objects. In addition, Capsule supports checkpointing and rollback of object states to deal with various failures in sensor devices. Recently, Nath and Kansal proposed FlashDB [18], a self-tuning indexing technique optimized for sensor networks with flash storage. In FlashDB, it is observed that different types of flash devices and different data writing/querying workloads have different optimal storage structures. Their self-tuning indexing technique can dynamically adapt the storage structure to changes in the workload and the underlying storage device. However, Capsule and FlashDB do not support the types of complex queries, such as temporally constrained aggregate queries, similar pattern queries and online sampling queries, that we address in this paper.
An efficient interval indexing algorithm for sensor networks is proposed in TSAR [7]. TSAR is a component of the PRESTO [6] predictive storage architecture that combines archival storage with caching and prediction. It proposes to segment the network architecture into two tiers: a proxy tier and a sensor tier. The sensor devices at the sensor tier measure the environment, store data locally and send small summaries (intervals) of the data to the proxy tier. The proxy tier uses a novel multi-resolution ordered distributed index structure, the Interval Skip Graph, to efficiently index these summaries (intervals) for various queries. However, they employ powerful Crossbow Stargate nodes [3] as the proxy tier, and the proposed index structure is designed for and implemented in the large RAM of the Stargate nodes. Therefore, their index structure cannot be organized efficiently in the flash memories that most manufactured sensor devices are equipped with (e.g. RISE [4]), because it does not consider the characteristics of flash memories that distinguish them from RAM and hard drives. In [9] Ganesan et al. proposed to compress the sensor data by wavelet-based summarization and store the wavelet summaries in the network with different resolutions. This approach maps a hierarchical indexing structure to the network architecture for efficient query computation. These wavelet summaries, however, can only be utilized to approximate the query results instead of computing the exact query result as in our case.
A recent work similar to ours is MicroHash [29], an efficient hashing index structure designed for flash-based sensor devices. MicroHash works by organizing both data and index pages in flash memory, and keeping a hash table in the SRAM to facilitate indexing and searching. MicroHash is able to perform equality searches by value in constant time and equality searches by timestamp in logarithmic time. However, MicroHash does not support the complex database queries we address in this paper.
8 Conclusions

In this paper, we have presented several efficient indexing structures and algorithms to address a range of practical queries on the data stored in flash-equipped sensor devices. Our work provides support for various sensor network applications which require temporally constrained aggregation and online sampling queries. We employ several novel indexing structures designed for flash memories to compute the query results more efficiently. Our trace-driven experimentation with real world data traces in TOSSIM demonstrates that our indexing structures and algorithms are able to provide excellent query performance at the cost of a modest storage overhead.
References
[1] Intel imote, http://www.xbow.com/products/product pdf files/wireless pdf/mica2 datasheet.pdf.
[2] mica2, http://www.xbow.com/products/product pdf files/wireless pdf/mica2 datasheet.pdf.
[3] stargate, http://www.xbow.com/products/product pdf files/wireless pdf/6020-0049-01 b stargate.pdf.
[4] A. Banerjee, A. Mitra, W. A. Najjar, D. Zeinalipour-Yazti, V. Kalogeraki, and D. Gunopulos. RISE co-s: High performance sensor storage and co-processing architecture. In SECON'05, Santa Clara, CA.
[5] H. Dai, M. Neufeld, and R. Han. ELF: an efficient log-structured flash file system for micro sensor nodes. In SenSys'04, pages 176-187.
[6] P. Desnoyers, D. Ganesan, H. Li, M. Li, and P. Shenoy. PRESTO: A predictive storage architecture for sensor networks. In HotOS X'05, Santa Fe, New Mexico.
[7] P. Desnoyers, D. Ganesan, and P. Shenoy. TSAR: a two tier sensor storage architecture using interval skip graphs. In SenSys'05, San Diego, California, USA.
[8] E. Gal and S. Toledo. A transactional flash file system for microcontrollers. In USENIX Annual Technical Conference, Anaheim, CA, 2005.
[9] D. Ganesan, B. Greenstein, D. Estrin, J. Heidemann, and R. Govindan. Multiresolution storage and search in sensor networks. ACM Transactions on Storage, 1(3):277-315, 2005.
[10] D. Gay, P. Levis, R. von Behren, M. Welsh, E. Brewer, and D. Culler. The nesC language: A holistic approach to networked embedded systems. In PLDI'03, San Diego, CA.
[11] P. Gibbons, Y. Matias, and V. Poosala. Aqua project white paper. Technical report, Bell Laboratories, 1997.
[12] L. Gu and J. A. Stankovic. t-kernel: Providing reliable OS support for wireless sensor networks. In SenSys'06, Boulder, CO.
[13] J. Hill, R. Szewczyk, A. Woo, S. Hollar, D. E. Culler, and K. S. J. Pister. System architecture directions for networked sensors. In ASPLOS'00, pages 93-104.
[14] P. Levis, N. Lee, M. Welsh, and D. Culler. TOSSIM: accurate and scalable simulation of entire TinyOS applications. In SenSys'03, pages 126-137, Los Angeles, CA.
[15] D. Lymberopoulos and A. Savvides. XYZ: a motion-enabled, power aware sensor node platform for distributed sensor network applications. In IPSN'05, page 63, Los Angeles, CA.
[16] S. Madden, M. Franklin, J. Hellerstein, and W. Hong. TAG: a tiny aggregation service for ad-hoc sensor networks. In OSDI'02.
[17] G. Mathur, P. Desnoyers, D. Ganesan, and P. Shenoy. Capsule: an energy-optimized object storage system for memory-constrained sensor devices. In SenSys'06, pages 195-208, Boulder, Colorado, USA.
[18] S. Nath and A. Kansal. FlashDB: Dynamic self-tuning database for NAND flash. In IPSN'07, Cambridge, MA.
[19] F. Olken. Random Sampling from Databases. Ph.D. dissertation, Univ. of California, 1993.
[20] C. Sadler, P. Zhang, M. Martonosi, and S. Lyon. Hardware design experiences in ZebraNet. In ACM SenSys'04.
[21] R. Szewczyk, A. Mainwaring, J. Polastre, J. Anderson, and D. Culler. An analysis of a large scale habitat monitoring application. In SenSys'04, pages 214-226, Baltimore, MD, USA.
[22] P. Torelli. The Microsoft flash file system. Dr. Dobb's Journal, pages 62-72, 1995.
[23] D. Woodhouse. JFFS: The journalling flash file system. Red Hat Inc., http://sources.redhat.com/jffs2/jffs2.pdf.
[24] Wookey. YAFFS - a filesystem designed for NAND flash. In Linux 2004, Leeds, U.K.
[25] C. Wu, L. Chang, and T. Kuo. An efficient B-tree layer for flash memory storage systems. In RTCSA'03, Tainan, Taiwan.
[26] C. Wu, L. Chang, and T. Kuo. An efficient R-tree implementation over flash-memory storage systems. In GIS'03, pages 17-24.
[27] N. Xu, S. Rangwala, K. Chintalapudi, D. Ganesan, A. Broad, R. Govindan, and D. Estrin. A wireless sensor network for structural monitoring. In SenSys'04.
[28] Y. Yao and J. Gehrke. The Cougar approach to in-network query processing in sensor networks. SIGMOD Record, 31(3), 2002.
[29] D. Zeinalipour-Yazti, S. Lin, V. Kalogeraki, D. Gunopulos, and W. A. Najjar. MicroHash: An efficient index structure for flash-based sensor devices. In FAST'05, San Francisco, CA.
[30] D. Zeinalipour-Yazti, S. Neema, D. Gunopulos, V. Kalogeraki, and W. Najjar. Data acquisition in sensor networks with large memories. In NetDB'05, Tokyo, Japan.