Multi-dimensional Range Query for Data Management using Bloom Filters
Yu Hua and Dan Feng
School of Computer, Huazhong University of Science and Technology, Wuhan, China
Ting Xie
Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY, USA
Abstract—Providing range query as a basic network service has received much research attention recently. A range query returns all items located within a given range. Previous approaches to representing and querying items, such as Distributed Hash Tables (DHTs) or R-tree structures, use a large amount of storage space to store and maintain items in order to achieve exact query results, and these structures cannot effectively support operations on items that have multi-dimensional attributes. In this paper, we propose a simple and space-efficient structure, the Multi-Dimensional Segment Bloom Filter (MDSBF), to support range query for data management. Our approach logically divides the range of each attribute into several segments to support fast and accurate lookups. We also develop a simple algorithm that balances load among the segments and improves query accuracy. Through theoretical analysis and performance evaluation, we demonstrate that MDSBF can efficiently support range query services for items with multi-dimensional attributes.
I. INTRODUCTION
Range query allows users to obtain all items whose attributes fall within the range of a query request. Current and emerging cluster network applications, in which range queries typically seek information about items with multiple attributes, depend on fast, highly scalable, and space-efficient lookup services. Range queries, explicitly denoted by attribute ranges, commonly use Distributed Hash Tables (DHTs) [1]–[3] and tree-based structures to store and maintain the items needed to satisfy query requests, which invariably requires a lot of storage space. Peer-to-Peer systems using DHTs greatly improve scalability and obtain accurate query results, but they offer only an exact-match query facility. Mercury [4] efficiently supports multi-attribute range query by using a circular overlay and a light-weight sampling mechanism. However, because its overlay is specially designed, load balancing and operation complexity must be taken into account in the system design. Therefore, it is necessary to design a space-efficient structure that provides fast and efficient multi-attribute range query services at the cost of an acceptable false positive probability.

This work was supported by the National Basic Research 973 Program under Grant 2004CB318201.
1-4244-1388-5/07/$25.00 © 2007 IEEE

A Bloom filter [5] is a space-efficient data structure that has proved useful in route lookup [6], packet classification [7] and longest prefix matching [8]. Other forms of Bloom filters proposed for various purposes include beyond Bloom filters [9], counting Bloom filters [10], compressed Bloom filters [11] and multi-dimension dynamic Bloom filters [12]. Bloom filters support point queries very well [13]. However, they generally cannot support range queries, because the one-way computation of hash functions strips the range information from attribute values. The main contribution of this paper is the design of a space-efficient structure based on Bloom filters, the Multi-Dimensional Segment Bloom Filter (MDSBF). MDSBF offers fast lookup and supports multi-attribute range query by dividing the range of each attribute into several segments that record range information with high accuracy.

II. RELATED WORK
Range queries in P2P systems commonly use DHTs to store and maintain items with multi-dimensional attributes, offering only exact-match queries. Complex query facilities [3] can be built on top of DHT-based P2P systems, providing query services for many applications. The Distributed Segment Tree (DST) [14], implemented over OpenDHT [15], combines range query with cover query to extend common query services. Armada [16] is a delay-bounded range query scheme. It supports both single-attribute and multi-attribute range queries over constant-degree DHTs, returning query results within 2 log N hops in a P2P system with N peers. The classical R-tree structure [17] efficiently supports range queries with exact-match results. R-trees were proposed to support multi-dimensional range queries, which they do by aggregating attribute values into corresponding ranges.
In essence, the family of R-tree index structures, including the R+-tree [18] and the R∗-tree [19], uses solid minimum bounding rectangles (MBRs), i.e., bounding boxes, to indicate the queried regions. An MBR is a multi-dimensional interval of the data space and represents a minimal approximation of the enclosed point set. Its region description consists of a lower and an upper bound for each dimension [20]. The multi-version 3D R-tree (MV3R) [21] combines the concepts of the multi-version B-tree (MVB) [22] and 3D R-trees to index and retrieve the past locations of moving regions. Furthermore, the RUM-tree (R-tree with Update Memo) [23] is an R-tree variant that minimizes the cost of object updates with a memo-based approach, avoiding disk accesses for purging old entries during an update. Existing work mainly carries out multi-dimensional range query based on exact matching and uses much storage space to maintain items with multi-dimensional attributes. Space-efficient range query, which can perform fast lookups with large space savings, has rarely been addressed. We therefore design a new structure that supports multi-dimensional range query based on space-efficient Bloom filters.

2007 IEEE International Conference on Cluster Computing

III. MULTI-DIMENSIONAL SEGMENT BLOOM FILTER
A standard Bloom filter is a bit array of m bits representing a set S = {a1, a2, ..., an} of n items. All bits in the array are initially set to 0. A Bloom filter uses q independent hash functions {h1, ..., hq} to represent each item a of the set by setting the bits at positions h1(a), ..., hq(a) to 1. To determine whether a is in S, one checks whether all the bits pointed to by the hash values are 1. If not, a is not in S. If so, a is considered a member of S, with a small probability of a false positive. A false positive means that a is reported as a member of S although it is not. Significant space savings often outweigh this drawback, since the false positive probability is typically very small.
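The standard Bloom filter described above can be illustrated with a short Python sketch. Insertion sets q hash positions, and a membership test succeeds only if all q positions are set. The q independent hash functions are simulated here by salting SHA-256 with the function index. This is an assumption made for illustration, not the hashing used in this paper.

```python
import hashlib


class BloomFilter:
    """Standard Bloom filter with m bits and q hash functions (sketch)."""

    def __init__(self, m, q):
        self.m = m
        self.q = q
        self.bits = bytearray(m)  # one byte per bit, for readability

    def _positions(self, item):
        # Assumed hash family: SHA-256 salted with the index i of h_i.
        for i in range(self.q):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # False is always correct; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1024, q=7)
for a in ("a1", "a2", "a3"):
    bf.add(a)
print("a2" in bf)  # True: inserted items are always reported
```

A lookup for an item that was never inserted can still return True if other items happen to have set all of its q positions. This is the false positive discussed above.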
Fig. 2. Counters associated with multiple segments for load balance. (Figure: one segment Bloom filter for storage capacity, with a counter C[1][j] attached to each of its segments.)
We propose a new multi-dimensional structure, MDSBF, to support range query while saving space. Each item in a networked environment maintains heterogeneous resources and provides information about them, such as computation, storage and network bandwidth capacity, available time and limited energy. Figure 1 shows an example of our proposed structure, which consists of two parts: a hash table and multi-dimensional segment Bloom filters. The hash table is an auxiliary structure that stores item IDs, while multiple Bloom filters provide independent storage space for multiple attributes. In this example, each item has three attributes: storage capacity, computation capacity and available time. Each segment Bloom filter represents one attribute and divides that attribute's range into several segments. Each segment represents one sub-range and stores the membership information of items. When the attribute value of an item falls within the range of a segment, we insert the item ID into that segment by computing the hash functions. For example, as shown in Figure 1, an item with 40GB of storage capacity is inserted into the second segment. Multiple segment Bloom filters together represent the multi-dimensional attributes, maintaining the type and value of each attribute of the items. The general description of MDSBF is as follows:
• We have n items, and a hash table stores their IDs.
• Each item has p attributes, stored in p associated segment Bloom filters.
• Each segment Bloom filter has m bits and is equally divided into q segments, so each segment has approximately m/q bits.
• B[i][j][k] (1 ≤ i ≤ p, 1 ≤ j ≤ q, 1 ≤ k ≤ m/q) denotes the kth bit of the jth segment of the ith attribute.

Fig. 1. An example of our proposed structure including an MDSBF and a hash table. (Figure: item IDs stored in a hash table, and multi-dimensional segment Bloom filters for Storage Capacity, Computation Capacity and Available Time, each divided into value ranges.)

IV. LOAD BALANCE
Load balance among multiple segments is important for providing an efficient range query service. An overloaded segment may produce a larger false positive probability for range queries because of more hash collisions. Unfortunately, we cannot determine in advance how many items will be stored in each segment. As a result, some segments of a segment Bloom filter may store more items than others and become overloaded. We therefore need a simple and efficient scheme to balance load among the segments. The goal is for the segments of a Bloom filter to hold approximately equal numbers of items, which significantly decreases the false positive probability of range queries. We use a simple counter-based approach to record the number of items stored in each segment. The counter C[i][j] (1 ≤ i ≤ p, 1 ≤ j ≤ q) is associated with the jth segment of the ith attribute. Figure 2 shows the counter-based representation in a segment Bloom filter; all counters are initially zero. When a new item is inserted, we increase the associated counter C[i][j] by 1. Figure 3 shows the load-balance algorithm. Its basic idea is to transfer space from lightly loaded segments to heavily loaded ones, as indicated by the counters.

The load-balance algorithm consists of three parts for n items with multi-dimensional attributes: the Initialization, Record and Adjustment models. In the Initialization model, we divide each Bloom filter into several segments, which represent contiguous ranges of attribute values. The Record model inserts the hashed value of an item into the segment range[i][j] whose range covers the item's attribute value. At the same time, the counter associated with this segment is increased by 1 to record the number of stored items. The Adjustment model performs load balancing when the counter of a segment range[x] exceeds n/q. We select the two segments with the fewest items, range[y] and range[z], and combine them into one range, range[y]. The saved space is then re-allocated to the overloaded segment range[x]. We divide range[x] into two ranges, range[x1] and range[x2], which become lightly loaded after the adjustment. Finally, we set the new counters C[i][x1] and C[i][x2].

Initialization Model
1: Divide a Bloom filter into segments: range[1...p][1...q]
2: Initialize: C[i][j] = 0, B[i][j][k] = 0
Record Model
1: InsertHashTable(ItemID)
2: for (i = 1, i ≤ p, i++) do
3:   range[i][j] := SelectSegment(Item.attribute[i])
4:   Hash(ItemID, range[i][j])
5:   C[i][j]++
6: end for
Adjustment Model
1: for (i = 1, i ≤ p, i++) do
2:   C[i][x] := Maximize(C[i][j])
3:   if (C[i][x] > n/q) then
4:     C[i][y] := SelectLeastFirst(C[i][j])
5:     C[i][z] := SelectLeastSecond(C[i][j])
6:     CombineRanges(range[y], range[z])
7:     C[i][y] := C[i][y] + C[i][z]
8:     DivideRange(range[x], range[x1], range[x2])
9:     C[i][x1] := C[i][x2] := (1/2) C[i][x]
10:  end if
11: end for
Fig. 3. Load-balance algorithm for multi-dimension segment Bloom filter.

V. DATA OPERATIONS
In this section, we present the data operations that MDSBF uses to support range query: insertion, query and deletion. We use counting Bloom filters [10] to support deletion. Counting Bloom filters replace the array of bits with counters that count the number of items hashed to each location. They are required to handle item deletion, which makes them useful for data sets that change over time.

A. Inserting Items
Given an item a, we assume that it has p attributes and that each attribute is represented as an array of m counters. The m counters are equally divided into q segments, each with m/q counters. Each segment is associated with one hash function, so the whole array of m counters holds hashed values from q hash functions. Figure 4 presents the algorithm for inserting items. We first select the segment whose range covers the attribute value of the current item. We then compute the hashed value of the item ID and increase the corresponding counter in that segment by 1. Figure 5 shows an example of inserting an item a using one attribute. The attribute value of item a is 40G, which falls within the second segment, range [10G . . . 50G]. The hashed value of the item ID is mapped to one counter in that segment, and this counter is increased by 1 to store the current item ID. We also insert the item ID itself into the hash table to facilitate future range queries.

Inserting Item (Input: item a)
1: for (i = 1, i ≤ p, i++) do
2:   range[i][j] := SelectSegment(a.attribute[i])
3:   k := Hash(a.ID, range[i][j])
4:   B[i][j][k]++; C[i][j]++   (increase the counters by 1)
5: end for
6: InsertHashTable(a)
Fig. 4. The algorithm of inserting an item with p attributes.

Fig. 5. An example of inserting an item a. (Figure: the storage-capacity segment Bloom filter before and after inserting item a with a.attribute[1] = 40G; Hash(a) selects a counter in the 10G–50G segment, which is increased by 1.)
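The insertion procedure of Fig. 4, together with the Record model of Fig. 3, can be sketched in Python. This is a minimal sketch under several assumptions. Each segment owns a fixed number of counters and one hash function, here MD5 salted with the segment index. Segment boundaries are given as a sorted list of lower edges, and the last segment is open-ended. A Python set stands in for the hash table of item IDs. The class and method names are hypothetical, and the boundary values in the example are illustrative.

```python
import bisect
import hashlib


class SegmentBloomFilter:
    """Counting Bloom filter for one attribute, split into q segments (sketch)."""

    def __init__(self, lower_edges, cells_per_segment):
        self.lower_edges = list(lower_edges)      # ascending lower edge of each segment
        self.width = cells_per_segment            # about m/q counters per segment
        self.cells = [[0] * cells_per_segment for _ in self.lower_edges]  # B[i][j][k]
        self.load = [0] * len(self.lower_edges)   # load counters C[i][j]

    def select_segment(self, value):
        """Index j with lower_edges[j] <= value < lower_edges[j + 1]; last segment is open."""
        j = bisect.bisect_right(self.lower_edges, value) - 1
        if j < 0:
            raise ValueError(f"value {value} is below the first segment")
        return j

    def position(self, item_id, j):
        """Assumed hash: one hash function per segment, MD5 salted with j."""
        digest = hashlib.md5(f"{j}:{item_id}".encode()).digest()
        return int.from_bytes(digest[:4], "big") % self.width

    def insert(self, item_id, value):
        j = self.select_segment(value)
        self.cells[j][self.position(item_id, j)] += 1  # counting-Bloom-filter cell
        self.load[j] += 1                              # Record model: C[i][j]++
        return j


class MDSBF:
    """p segment Bloom filters plus a table of item IDs (hypothetical names)."""

    def __init__(self, filters):
        self.filters = filters  # one SegmentBloomFilter per attribute
        self.ids = set()        # stands in for the paper's hash table

    def insert(self, item_id, values):
        if len(values) != len(self.filters):
            raise ValueError("expected one value per attribute")
        segments = [f.insert(item_id, v) for f, v in zip(self.filters, values)]
        self.ids.add(item_id)  # InsertHashTable(a)
        return segments


# Illustrative boundaries: storage in GB, computation in GHz, available time in s.
storage = SegmentBloomFilter([0, 10, 50, 200, 500], cells_per_segment=64)
cpu = SegmentBloomFilter([0, 2, 10, 20, 80], cells_per_segment=64)
avail = SegmentBloomFilter([0, 60, 600, 6000], cells_per_segment=64)
index = MDSBF([storage, cpu, avail])
print(index.insert("a", (40, 3, 100)))  # [1, 1, 1]: 40 GB falls in the second segment
```

The sketch selects a segment by binary search over the sorted lower edges. This keeps the per-attribute cost at O(log q) comparisons plus one hash computation. The `load` list plays the role of the counters C[i][j] that the Adjustment model of Fig. 3 inspects.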
B. Querying Items
A Bloom filter-based range query selects the items whose attribute values match a range request [s . . . t]. Figure 6 shows the algorithm of attribute-based range query. We first create a linked list to temporarily store the valid item IDs, i.e., those whose attribute values satisfy the query request. When the bounds of the query request [s . . . t] do not exactly match the bounds of our segments, we determine the segments that cover the given range. For example, as shown in Figure 5, a given range [6G . . . 40G] does not exactly match either of the two existing ranges [0G . . . 10G] and [10G . . . 50G]. In this case, we determine the query bounds Min = 0G and Max = 50G from the range request [6G . . . 40G] for the subsequent query operations. We then compute the number of segments belonging to the query range, denoted ω. In Figure 5, ω = 2, since the two segments [0G . . . 10G] and [10G . . . 50G] are needed to satisfy the query request [6G . . . 40G]. Next, we check the counters at the hashed positions of the item IDs, and an item ID is selected as valid if the checked counter is non-zero. The valid item IDs are finally inserted into the linked list L. This procedure handles range query for one attribute. For multiple attributes, we create one list per attribute to temporarily store the valid item IDs. The final results are the item IDs that appear in all of the linked lists.

Querying Item (Input: range [s . . . t]; Output: items list)
1: L := CreateLinkedList()
2: Min := Lower(s); Max := Upper(t)
3: ω := Number[Min . . . Max]
4: for (i = 1, i ≤ n, i++) do
5:   for (j = 1, j ≤ ω, j++) do
6:     Value := Hash(item[i], range[j])
7:     if Probe(Value) then
8:       InsertItem(item[i], L)
9:       Break
10:    end if
11:  end for
12: end for
13: Return L
Fig. 6. The algorithm of range query.

C. Deleting Items
We use counting Bloom filters, in which each bit is replaced with a counter, to support deletion. In a network node, available resources are dynamic and may join or leave from time to time. We therefore need a deletion operation to remove item IDs from the proposed data structure. Deletion is the inverse of insertion. Figure 7 shows the deletion algorithm, which deletes the item ID from the hash table and its attributes from the segment Bloom filters.

Deleting Item (Input: item a)
1: for (i = 1, i ≤ p, i++) do
2:   range[i][j] := SelectSegment(a.attribute[i])
3:   k := Hash(a.ID, range[i][j]); B[i][j][k]−−   (decrease the counter in the current segment by 1)
4:   C[i][j]−−
5: end for
6: DeleteHashTable(a)
Fig. 7. The algorithm of deletion operation.

VI. THEORETICAL ANALYSIS
In this section, we present a theoretical analysis of the proposed MDSBF structure and derive the associated false probabilities. We evaluate range query accuracy in MDSBF by analyzing two sources of error: the false positive probability from Bloom filters and the range error probability from divided segments.
A. False Positive Probability
To better understand the analysis, we begin with the false positive probability of the standard Bloom filter.
DEFINITION 1. A standard Bloom filter is a bit array of m bits representing a set S = {a1, a2, ..., an} of n items. All bits in the array are initially set to 0. A Bloom filter uses q independent hash functions {h1, ..., hq} to map the set to the bit address space [1, ..., m]. For each item a, the bits hi(a) are set to 1.
THEOREM 1. The false positive probability of a standard Bloom filter with m bits and q hash functions storing n items is $f_0 = (1 - e^{-qn/m})^q$. The probability reaches its minimum, $(1/2)^q$ or equivalently $(0.6185)^{m/n}$, when $q = (m/n)\ln 2$. The detailed proof can be found in [24].
B. Range Error Probability
Range error probability refers to the probability of query errors caused by mismatched bounds between query requests and existing ranges. It depends on the number of items in the segments that cover the bounds of the query request. Although we cannot exactly determine the actual attribute values stored in a segment, we can estimate approximate bounds from the information in adjacent segments. We further assume that n items are uniformly distributed in an array of m bits, which is divided into q segments (see footnote a).
THEOREM 2. Suppose a Bloom filter with m bits is divided into more than two segments, i.e., q > 2. Each segment is associated with one predefined range, and all segments contain approximately equal numbers of items. Then the range error probability of one query request on one attribute is bounded by 2/q.
Proof: When the bounds of a range query exactly match those of our segments, query operations generate no range errors. For example, our segments [0G . . . 10G] and [10G . . . 50G] satisfy the query request [0G . . . 50G] with matched bounds. However, when query bounds fall inside segments, range errors can occur. A query [5G . . . 50G] will get uncertain results because of its unmatched lower bound. The false probability depends on the number of items in the segments that cover the bounds of the range query request. When an array has q segments and each segment contains n/q items, the range error probability of one unmatched bound is $(n/q)/n = 1/q$. The probability for two unmatched bounds, such as [5G . . . 20G], is then at most 2/q. Because a range query on one attribute has at most two bounds, the range error probability is bounded by 2/q.
a. Our proposed load balance algorithm in Figure 3 helps achieve a uniform distribution of items among the segments.
COROLLARY 1. In the worst case for multiple attributes, all bounds of the range query requests fall inside segments. The maximum range error probability for p attributes is then $1 - (1 - 2/q)^p$ when q > 2.
Proof: When no bound matches a segment boundary, the range error probability for one attribute is 2/q, by Theorem 2. That attribute's array therefore returns an accurate answer with probability $1 - 2/q$. All p attributes in p arrays then return accurate results with probability $(1 - 2/q)^p$. Thus, the maximum range error probability for p attributes is $1 - (1 - 2/q)^p$.
C. False Positive Probability of Range Query
The false positive probability of range query must account for both the Bloom filters and the range segments. We study query efficiency based on Theorems 1 and 2.
THEOREM 3. The false positive probability of range query for one attribute is $f_1 = \frac{2}{q}\left(1 - e^{-qn/m}\right)^q$ when the Bloom filter has m bits and q hash functions storing n items. The probability reaches its minimum $1.8084\,\frac{n}{m}\,(0.6412)^{m/n}$ when $q = \frac{m}{n}\ln(3.0328)$.
Proof: $f_1 = \frac{2}{q}(1 - e^{-qn/m})^q$ is the product of the false positive probability from Theorem 1 and the range error probability from Theorem 2. Furthermore,
$$f_1 = \frac{2}{q}\left(1 - e^{-qn/m}\right)^q = \frac{2}{q}\, e^{\,q \ln\left(1 - e^{-qn/m}\right)}.$$
The optimal number of hash functions (i.e., the number of segments) minimizes $f_1$ viewed as a function of q. After taking the derivative and applying a Taylor series expansion, we obtain $q = \frac{m}{n}\ln(3.0328)$ using Matlab 6.5. Thus, the minimum false positive probability is
$$\min(f_1) = 1.8084\,\frac{n}{m}\,(0.6412)^{m/n}.$$

VII. PERFORMANCE EVALUATION
To simulate range-based query behavior in a meaningful way, we artificially scale up the traces collected by Lawrence Berkeley National Laboratory [25]. We use the BU-Web-Client record, which traced a total of 9,633 Mosaic sessions. It contains the records of a population of 762 different users and 1,143,839 requests for data transfer. Each record in this log contains six parts: user ID, visited machine name, the timestamp at which the request took place, URL address, document size and item retrieval time in seconds. Based on these characteristics, we use the concatenation of a user ID and its request timestamp as an item ID. Such an ID consists of 15 decimal digits and uniquely identifies an item. A range query request can thus search for and identify item IDs under constraints on multiple attributes. Furthermore, we use MD5 as the hash function because of its well-known properties and relatively fast implementation. An attribute value is hashed into 128 bits by calculating its MD5 signature. We then divide the 128 bits into four 32-bit values and take each 32-bit value modulo the filter size m.
A. False Positive Probability
Figure 8 shows the false positive probability of MDSBF with multiple attributes. A false positive from hash collisions means that an item is falsely considered a member of a data set although it is not. We examine the false positive probability with 4 and 6 attributes. We use 8 hash functions per attribute (q = 8) and divide the entire range of each attribute into 8 segments. Figure 8 also shows that more attributes usually yield a lower false positive probability. The main reason is that the query result for multi-dimensional attributes is the intersection of the results for the individual attributes. More attributes use more segment Bloom filters, which significantly decreases the probability of false positives.

Fig. 8. False Positive Probability of MDSBF in terms of multiple attributes. (Plot: false positive probability (%) versus number of query requests (1e+06), for Size=16M and Size=8M with 4 and 6 attributes.)
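The intersection step credited above for the lower false positive probability can be illustrated in a few lines of Python. It is the same step the range-query procedure of Section V-B uses to combine per-attribute lists. This is a sketch only. The per-attribute candidate lists are hypothetical inputs standing in for what each segment Bloom filter would return, including one ID that is a false positive for a single attribute.

```python
from functools import reduce


def multi_attribute_result(per_attribute):
    """Intersect the candidate ID lists returned for each attribute."""
    if not per_attribute:
        return set()
    return reduce(set.intersection, (set(ids) for ids in per_attribute))


# Hypothetical per-attribute results; "x9" is a false positive for storage only.
storage_hits = ["a", "b", "x9"]
cpu_hits = ["a", "b", "c"]
time_hits = ["a", "c"]
print(sorted(multi_attribute_result([storage_hits, cpu_hits, time_hits])))  # ['a']
```

An ID survives only if every attribute's filter reports it. A false positive for a single attribute is therefore discarded unless the other filters also report that ID.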
Figure 9 displays the false positive probability with 5 attributes, where each attribute has 6 or 10 segments. The false positive probability with 10 segments is lower than with 6 segments for the same available space, although the advantage is limited. More segments represent the range information of multi-dimensional attributes more accurately and thus improve the accuracy of range queries. On the other hand, the larger the number of ranges, the smaller each range is, so false positives in the smaller storage space of each segment occur with relatively higher probability.
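Theorems 1 and 3 give a rough way to reason about this trade-off. The Python sketch below evaluates $f_0$ and $f_1 = \frac{2}{q}(1 - e^{-qn/m})^q$ for q = 6 and q = 10. As in Theorem 3, q stands for both the number of segments and the number of hash functions. The function names mirror the paper's symbols, and the parameter `m_over_n` (bits per item) and its values are illustrative assumptions, not the parameters behind Fig. 9. It is a back-of-the-envelope aid, not a model of the experiment.

```python
import math


def f0(q, m_over_n):
    """Theorem 1: false positive probability of a standard Bloom filter
    with q hash functions and m/n bits per item."""
    return (1.0 - math.exp(-q / m_over_n)) ** q


def f1(q, m_over_n):
    """Theorem 3: range-query false positive probability for one attribute,
    i.e. the range error bound 2/q of Theorem 2 times f0."""
    if q <= 2:
        raise ValueError("Theorem 2 assumes q > 2")
    return (2.0 / q) * f0(q, m_over_n)


# Illustrative bits-per-item budgets (assumed, not taken from the experiments).
for m_over_n in (4, 10):
    for q in (6, 10):
        print(f"m/n={m_over_n:>2} q={q:>2} f0={f0(q, m_over_n):.5f} f1={f1(q, m_over_n):.5f}")
```

With these assumed budgets, q = 10 gives the smaller $f_1$ at m/n = 10, while q = 6 gives the smaller $f_1$ at m/n = 4. Under this bound, then, the ranking of the two segment counts depends on the space budget.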
Fig. 9. False Positive Probability of MDSBF in terms of multiple segments. (Plot: false positive probability (%) versus number of query requests (1e+06), for Size=16M and Size=8M with 6 and 10 segments.)

B. Query Time
Figure 10 shows the query time as the number of query requests increases. Query time is an important metric for evaluating range query performance. We examine query time with 5 attributes, where each segment Bloom filter maintains 6 segments for each attribute range. The results in Figure 10 show that a larger space size has only a limited advantage over a smaller one in terms of query time. The main reason is that query time depends mainly on the number of divided segments rather than on the space size of each segment. Query operations only need to probe the hashed positions explicitly indicated by the hash functions.

Fig. 10. Range query time using MDSBF structure. (Plot: query time (ms) versus number of query requests (1e+06), for Size=16M and Size=8M with 6 and 10 segments.)

VIII. CONCLUSION
A Bloom filter is a space-efficient data structure that is widely used for information representation and membership queries in current network environments. Its space efficiency comes at the cost of a certain false positive probability in membership queries. A limitation of the Bloom filter is that, because items are unevenly distributed across different ranges, it cannot efficiently support range query. In this paper, we propose a simple, space-efficient data structure, the Multi-Dimensional Segment Bloom Filter, to efficiently support range query for multi-attribute items. The segment Bloom filters represent multi-dimensional attributes and support fast and accurate lookups. We further present a simple method for balancing load among multiple segments to decrease the false positive probability. Practical algorithms support the insertion, deletion and query operations of the MDSBF structure. Theoretical analysis and trace-driven simulation results demonstrate that MDSBF can efficiently support range query for multi-attribute items.

REFERENCES
[1] M. Cai, A. Chervenak, and M. Frank, “A Peer-to-Peer Replica Location Service Based on A Distributed Hash Table,” SuperComputing, 2004.
[2] H. Zhang, A. Goel, and R. Govindan, “Incrementally improving lookup latency in distributed hash table systems,” ACM SIGMETRICS, pp. 114–125, 2003.
[3] M. Harren, J. M. Hellerstein, R. Huebsch, B. T. Loo, S. Shenker, and I. Stoica, “Complex queries in DHT-based peer-to-peer networks,” IPTPS, 2002.
[4] A. R. Bharambe, M. Agrawal, and S. Seshan, “Mercury: Supporting scalable multi-attribute range queries,” ACM SIGCOMM, 2004.
[5] B. Bloom, “Space/time trade-offs in hash coding with allowable errors,” Communications of the ACM, vol. 13, 1970.
[6] A. Broder and M. Mitzenmacher, “Using multiple hash functions to improve IP lookups,” INFOCOM, 2001.
[7] F. Baboescu and G. Varghese, “Scalable packet classification,” ACM SIGCOMM, 2001.
[8] S. Dharmapurikar, P. Krishnamurthy, and D. E. Taylor, “Longest prefix matching using Bloom filters,” ACM SIGCOMM, 2003.
[9] F. Bonomi, M. Mitzenmacher, R. Panigrahy, S. Singh, and G. Varghese, “Beyond Bloom filters: From approximate membership checks to approximate state machines,” ACM SIGCOMM, 2006.
[10] L. Fan, P. Cao, J. Almeida, and A. Z. Broder, “Summary cache: a scalable wide area web cache sharing protocol,” IEEE/ACM Trans. on Networking, vol. 8, no. 3, 2000.
[11] M. Mitzenmacher, “Compressed Bloom filters,” IEEE/ACM Trans. on Networking, vol. 10, no. 5, 2002.
[12] D. Guo, J. Wu, H. Chen, and X. Luo, “Theory and network application of dynamic Bloom filters,” INFOCOM, 2006.
[13] Y. Zhu, H. Jiang, and J. Wang, “Hierarchical Bloom filter arrays (HBA): A novel, scalable metadata management system for large cluster-based storage,” Cluster, 2004.
[14] C. Zheng, G. Shen, S. Li, and S. Shenker, “Distributed segment tree: Support of range query and cover query over DHT,” IPTPS, 2006.
[15] “OpenDHT,” http://opendht.org/.
[16] D. Li, J. Cao, X. Lu, K. C. C. Chan, B. Wang, J. Su, H. va Leong, and A. T. S. Chan, “Delay-bounded range queries in DHT-based peer-to-peer systems,” ICDCS, 2006.
[17] A. Guttman, “R-trees: A dynamic index structure for spatial searching,” ACM SIGMOD, pp. 47–57, 1984.
[18] T. K. Sellis, N. Roussopoulos, and C. Faloutsos, “The R+-tree: A dynamic index for multi-dimensional objects,” VLDB, pp. 507–518, 1987.
[19] N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger, “The R*-tree: An efficient and robust access method for points and rectangles,” SIGMOD, pp. 322–331, 1990.
[20] C. Bohm, S. Berchtold, and D. A. Keim, “Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases,” ACM Computing Surveys, vol. 33, no. 3, pp. 322–373, 2001.
[21] Y. Tao and D. Papadias, “MV3R-tree: A spatio-temporal access method for timestamp and interval queries,” VLDB, pp. 431–440, 2001.
[22] B. Becker, S. Gschwind, T. Ohler, B. Seeger, and P. Widmayer, “An asymptotically optimal multi-version B-tree,” VLDB Journal, vol. 5, pp. 264–275, 1996.
[23] X. Xiong and W. G. Aref, “R-trees with update memos,” ICDE, 2006.
[24] A. Broder and M. Mitzenmacher, “Network applications of Bloom filters: a survey,” Internet Mathematics, vol. 1, pp. 485–509, 2005.
[25] “The internet traffic archive,” http://ita.ee.lbl.gov/.