Data & Knowledge Engineering 56 (2006) 287–303 www.elsevier.com/locate/datak

An adaptive hashing technique for indexing moving objects ☆

Dongseop Kwon a,*, Sangjun Lee b, Wonik Choi c, Sukho Lee a

a School of Electrical Engineering and Computer Science, Seoul National University, San 56-1, Shilim-dong, Kwanak-gu, Seoul 151-742, Korea
b School of Computing, Soongsil University, Seoul 156-743, Korea
c Thinkware Systems Corporation, 15FL., Hanmi Tower, 45 Bangi-Dong, Songpa-Gu, Seoul 138-724, Korea

Received 12 March 2005; received in revised form 12 March 2005; accepted 14 April 2005
Available online 16 May 2005

Abstract

Although hashing techniques are widely used for indexing moving objects, they cannot handle dynamic workloads, e.g. the traffic at peak hour versus that at night. This paper proposes an adaptive hashing technique to support dynamic workloads efficiently. The proposed technique maintains two levels of hashes, one for fast moving objects and the other for quasi-static objects. A moving object changes its level adaptively according to the degree of its movement. We also present a theoretical analysis and experimental results which show that the proposed approach is more suitable than the basic hashing under dynamic workloads.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Moving objects; Spatio-temporal databases; Index structures

☆ This work was supported in part by the Brain Korea 21 Project and in part by the Ministry of Information and Communications, Korea, under the Information Technology Research Center (ITRC) Support Program in 2005.
* Corresponding author.
E-mail addresses: [email protected] (D. Kwon), [email protected] (S. Lee), [email protected] (W. Choi), [email protected] (S. Lee).
doi:10.1016/j.datak.2005.04.004


1. Introduction

Rapid advances in information and communication technologies have created new classes of computing environments such as ubiquitous computing [1] and mobile computing. In these environments, the efficient management of a large number of moving objects, e.g. the positions of subscribed users, cars, or devices, is essential for providing various types of location-based services. In addition, context-awareness is a basic requirement for realizing "anywhere, anytime" computing in ubiquitous environments, and location is one of the most important and useful kinds of context information. Therefore, location information must be stored and retrieved efficiently for location-aware services [2,3].

Traditional database systems, however, have difficulty processing a large number of moving objects because the locations of moving objects change very rapidly and continuously. Moreover, traditional spatial index structures cannot support frequent updates well because they focus only on retrieving spatial data efficiently. To address this problem, several spatio-temporal index structures have been proposed for moving objects [4-8]. Among these, the hash-based approach [4] is one of the simplest ways to index the locations of moving objects. In this approach, the domain space is uniformly divided into grid cells of the same size, and the hash value of an object is the number of the grid cell to which the object belongs. Because it uses a grid, this approach is also called the grid method [5]. Despite the existence of alternative index structures, the hashing method is widely adopted because it is simple, easy to implement, and able to support a large number of frequent updates well [5,9,10]. Note that, from now on, "hashing" in this paper refers to the hash-based index structure using a uniform grid for moving objects.

The size of a grid cell is one of the most important factors affecting the performance of the hashing. If a large grid cell is used, which means that the number of grid cells is small, each hash bucket has to keep many data items. Consequently, performance deteriorates due to long chains of overflow pages, which incur a large number of disk accesses. On the contrary, with a small grid cell, the update performance becomes worse because many hash buckets are required for the grid cells. Therefore, it is critical to select an appropriate grid cell size with regard to the query workload and the distribution of data.

In many real applications, however, it is not easy to decide the appropriate grid cell size in advance because the workload or the distribution of data may change dynamically during execution. For example, at commuting time, most objects, e.g. people or cars, move relatively fast and continuously, so the location management system may have to process large volumes of update operations in this period. On the other hand, most objects are quasi-static at night or during office hours. Since the cell size cannot be changed during execution, the hashing method cannot support this dynamic workload efficiently with a static cell size.

In this paper, to solve this problem, we propose an adaptive two-level hashing technique for moving objects. The proposed method maintains two levels of hash structures.
The upper level of the hashes, which is for fast moving objects, uses a large grid cell to support update queries efficiently. The lower level, which is for quasi-static objects, uses a small grid cell to support search queries efficiently. A moving object can change its level adaptively according to the


degree of its agility. By adaptively escalating and de-escalating objects between the two levels, the proposed technique can support dynamic workloads efficiently.

The rest of the paper is organized as follows. Related work is briefly discussed in Section 2. Section 3 explains the basic hashing method for moving objects. In Section 4, we propose the adaptive two-level hashing to handle dynamic workloads. Section 5 provides the experimental results. Finally, Section 6 concludes the paper.

2. Related work

Various index structures have been proposed for multi-dimensional data; a survey can be found in [11]. However, these multi-dimensional index structures cannot be used directly for indexing moving objects because it is hard for them to handle heavy update loads efficiently.

Several spatio-temporal index structures have been proposed for indexing moving objects; a detailed survey can be found in [12]. According to the type of data being stored, spatio-temporal index structures can be divided into two categories: (a) indexing the past (trajectories) and (b) indexing the current or future positions. [13,14] are representative works in the former category. Our approach belongs to the latter category.

Within the latter category, there are several types of index methods. One type uses a time-parameterized function: instead of real positions, it stores the parameters of the function, e.g. the velocity and the starting position of an object. As a result, the index structure has to be updated only when the parameters change (for example, when the speed or the direction of a moving object changes). The time-parameterized R-tree (TPR-tree) [6] and its variants (e.g. the TPR*-tree [7]) are examples of this type of index structure. The main drawback of this approach is that it is hard to find an appropriate function for the movements in many real applications; if the movements of objects are complicated or not linear, this approach is not suitable.

The lazy update R-tree (LUR-tree) [8] aims to support frequent updates by reducing the overhead of the update operation in the R-tree. It changes the structure of the index only when the new position of an object falls outside the corresponding MBR. By adding a secondary index on the R-tree, it can perform update operations in a bottom-up way. Lee et al. [15] extend the main idea of [8] and generalize the bottom-up approach for updating the positions of moving objects.

The Q+Rtree [16] is a hybrid tree structure which consists of both the LUR-tree and the Quadtree. It uses the LUR-tree for quasi-static objects and the Quadtree for fast moving objects. Since an object moves between two types of index structures adaptively, the Q+Rtree looks similar to our work. However, the Q+Rtree has predefined topological regions for fast movement, and it recognizes all objects in those regions as fast moving objects. In contrast, the adaptability of our work is based on the agility of an object itself, so our work does not need predefined regions.

Tao and Papadias [17] have proposed a general way to improve the performance of an index structure by changing its node sizes adaptively; however, it targets general tree-based index structures, not moving objects. The Multi-Level Grid File [18] is a dynamic multi-dimensional file structure, but it is only an extension of the Grid File [19] and is not related to our work.


3. Hashing for moving objects

3.1. Basic algorithms

The basic idea of hashing techniques for moving objects was introduced in [4]. Hash-based approaches for moving objects are essentially identical to general hash-based file structures: an object is stored in the hash bucket corresponding to the hash value of its location. The algorithms for the basic hashing are as follows:

Insert: Calculate the hash value of the given position of an object, then store the object in the disk bucket corresponding to the hash value.
Search: Examine all hash buckets that intersect the given query range.
Delete: Search for the object, then delete it from its hash bucket.
Update: Search for the object, delete it from the old bucket, and insert it into the new bucket.

A simple grid is typically used as a hash function for moving objects. Fig. 1 shows an example of a simple grid. The grid divides the domain space of the locations of moving objects into i × i equally sized square cells; Fig. 1(a) uses a 4 × 4 grid and Fig. 1(b) uses an 8 × 8 grid.

It is critical to decide an appropriate grid cell size for the performance of the hashing. In general, a small grid cell, i.e. a fine grid, is good for search operations, while a large grid cell, i.e. a coarse grid, is good for update operations. For example, suppose that Obj1 moves to a new position along the arrowed line in Fig. 1. If a coarse grid as in Fig. 1(a) is used, the position of Obj1 is updated only within the corresponding hash bucket. On the other hand, if a fine grid as in Fig. 1(b) is used, Obj1 must be moved from the bucket for its original position to the bucket for its new position. Therefore, a coarse grid is generally better than a fine one for update operations.
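To make the bucket mapping concrete, the following is a minimal in-memory C++ sketch of a uniform-grid hash implementing the four operations above. All names are ours, and buckets are plain in-memory maps rather than the chained disk pages the paper uses; it is an illustration of the technique, not the authors' implementation.

#include <algorithm>
#include <unordered_map>
#include <vector>

// Minimal in-memory sketch of the uniform-grid hash described above.
struct Point { double x, y; };

class GridHash {
public:
    // An i x i grid over the square domain [0, extent) x [0, extent).
    GridHash(int i, double extent) : i_(i), extent_(extent) {}

    // Clamped cell coordinate along one axis.
    int coord(double v) const {
        return std::min(i_ - 1, std::max(0, static_cast<int>(v / extent_ * i_)));
    }

    // Hash value of a position = the number of the grid cell containing it.
    int cellOf(Point p) const { return coord(p.y) * i_ + coord(p.x); }

    // Insert: store the object in the bucket of its cell.
    void insert(int id, Point p) { buckets_[cellOf(p)][id] = p; }

    // Delete: remove the object from the bucket of its cell.
    void erase(int id, Point p) { buckets_[cellOf(p)].erase(id); }

    // Update: cheap if the object stays in its cell, delete+insert otherwise.
    void update(int id, Point oldP, Point newP) {
        int from = cellOf(oldP), to = cellOf(newP);
        if (from == to) {
            buckets_[from][id] = newP;   // modify in place
        } else {
            buckets_[from].erase(id);    // hand over to the new bucket
            buckets_[to][id] = newP;
        }
    }

    // Search: examine every bucket whose cell intersects the query rectangle.
    std::vector<int> search(Point lo, Point hi) const {
        std::vector<int> result;
        for (int cy = coord(lo.y); cy <= coord(hi.y); ++cy)
            for (int cx = coord(lo.x); cx <= coord(hi.x); ++cx) {
                auto it = buckets_.find(cy * i_ + cx);
                if (it == buckets_.end()) continue;
                for (const auto& [id, p] : it->second)   // filter exact positions
                    if (p.x >= lo.x && p.x <= hi.x && p.y >= lo.y && p.y <= hi.y)
                        result.push_back(id);
            }
        return result;
    }

private:
    int i_;
    double extent_;
    std::unordered_map<int, std::unordered_map<int, Point>> buckets_;
};

Note how the tradeoff shows up directly in the sketch: a coarse grid (small i) makes update() take the cheap in-place branch more often, while a fine grid (large i) shrinks each bucket so that search() touches fewer objects per examined cell.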

Fig. 1. Example of simple grids. (a) 4 × 4 grid, (b) 8 × 8 grid.


Fig. 2. Structure of the hash method: grid cells map to chains of disk pages, and a secondary access path maps object IDs directly to page numbers (in the pictured example, objects 0-5 reside on pages 5, 5, 3, 7, 14 and 22).

However, since a coarse grid usually has to keep more data items in each bucket, it needs more disk pages. As a result, a search operation must access more disk pages to retrieve one hash bucket. Therefore, the search performance of a coarse grid is worse than that of a fine one.

3.2. Implementation

Fig. 2 shows our implementation of the basic hashing. The objects in each grid cell are stored in a list of disk pages. We add a secondary access path to the disk pages of hash buckets in order to access directly the disk page that holds the object we are looking for. With this secondary access path, we need not search for an object from the head of a bucket's list of disk pages. Similar ideas using a secondary access path are adopted in [8,15]. We simply use chains of disk pages to handle overflows in hash buckets, as shown in Fig. 2.
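The secondary access path of Fig. 2 amounts to a direct map from an object ID to the disk page currently holding that object. A minimal sketch follows; the type and member names are our assumptions, with an in-memory map standing in for whatever disk-resident structure an implementation would use.

#include <cstdint>
#include <unordered_map>

// Sketch of the secondary access path in Fig. 2: a direct map from an
// object ID to the disk page that currently stores the object, so that an
// update or delete need not scan a bucket's page chain from its head.
using ObjectId = std::uint32_t;
using PageNo   = std::uint32_t;

class SecondaryIndex {
public:
    void set(ObjectId id, PageNo page) { pageOf_[id] = page; }   // on insert/move
    PageNo lookup(ObjectId id) const { return pageOf_.at(id); }  // read just this page
    void erase(ObjectId id) { pageOf_.erase(id); }               // on delete
private:
    std::unordered_map<ObjectId, PageNo> pageOf_;
};

With such a map, an update reads exactly the page returned by lookup and, only on a cell change, one page of the new bucket; this is what makes the 2- and 4-access costs in the analysis below realistic.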

3.3. Analysis

To estimate the effect of the number of grid cells, we now analyze the cost of the basic hashing in terms of the number of disk accesses. The notation used in this section is summarized in Table 1. We assume that objects are uniformly distributed in the space.

Table 1
Notation for analysis

Symbol     Meaning
N          The number of objects
n          The number of grid cells
B          The maximum number of objects in a disk page
p_out      The probability of an object moving out of the previous grid cell
C_search   The number of disk accesses for a search query
C_update   The number of disk accesses for an update query

If a query region is inside a grid cell, the system only needs to read all the disk pages corresponding to that grid cell to process the query. Therefore, the average cost of a search query is:

\[ C_{search} = \frac{N}{n \cdot B} \tag{1} \]

If the new location of an object lies in the same grid cell as before, only the position stored in the hash bucket needs to be modified, so only 2 disk accesses (one read and one write) are needed. On the other hand, if an object moves out of its previous grid cell, the object must be handed over from the previous hash bucket to the new one. Assuming that no overflow or underflow occurs during the update operation, 4 disk accesses are then necessary (2 for deleting the old entry and 2 for inserting the new one). From these two cases, the cost of an update query is:

\[ C_{update} = 2 \cdot (1 - p_{out}) + 4 \cdot p_{out} \tag{2} \]
\[ = 2 \cdot p_{out} + 2 \tag{3} \]

The probability of moving out of a grid cell, p_out, rises in proportion to the number of grid cells, n. Therefore, the cost of an update query is also proportional to the number of hash buckets, n. With an arbitrary constant k, the cost of an update query is:

\[ C_{update} = 2 \cdot k \cdot n + 2 \tag{4} \]

From (1) and (4), it is clear that, as the number of grid cells n increases, the cost of a search decreases while the cost of an update increases. Dividing the space into smaller grid cells benefits searches but penalizes updates: there is a tradeoff between the performance of update operations and that of search operations. Therefore, one should decide the number of grid cells carefully with regard to the ratio of search and update operations in the workload.
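To see the tradeoff in one expression, here is our own back-of-the-envelope illustration, not taken from the paper: let $r_s$ be the fraction of search queries and $r_u = 1 - r_s$ the fraction of updates. Combining Eqs. (1) and (4), the expected cost per query is

\[ C(n) = r_s \cdot \frac{N}{n \cdot B} + r_u \cdot (2 \cdot k \cdot n + 2), \]

and setting $dC/dn = 0$ gives the cost-minimizing number of cells

\[ n^{*} = \sqrt{\frac{r_s \cdot N}{2 \cdot k \cdot r_u \cdot B}}. \]

For example, with the illustrative values $N = 10^5$, $B = 100$ and $k = 10^{-3}$, an update-heavy mix ($r_s = 0.2$) gives $n^{*} \approx 354$, while a search-heavy mix ($r_s = 0.8$) gives $n^{*} \approx 1414$; no single fixed $n$ suits both. This is exactly the gap the adaptive scheme of Section 4 targets.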

4. Adaptive two-level hashing

4.1. Motivation

In many real applications, we have observed the following characteristics of moving objects:

(1) The workload of update operations changes dynamically over time. For example, traffic is generally heavy at peak hour; it is relatively lighter during office hours, and only a few cars move at night.
(2) The movement of an object persists for a certain period. A moving object generally has a destination and moves continuously until it reaches it. Once it arrives at the destination, it generally stops moving and stays static for a long time, e.g. a parked car.


(3) Most moving objects are in a quasi-static state most of the time. The term "quasi-static" means that objects do not move, or move slowly only within a small region of space such as an office or home. According to [16], most objects, especially human beings, are in a quasi-static state most of the time.

The basic hashing cannot support such a dynamic workload because the size of a grid cell is fixed at initialization and cannot be changed during execution. In addition, most objects are in a quasi-static state while only a few objects move continuously for a certain period; the basic hashing nevertheless stores these two types of objects together in one hash structure, which makes it inefficient for update operations.

4.2. Adaptive two-level hashing

To solve the problems of the basic hashing, we propose the adaptive two-level hashing, which consists of two levels of hash structures with different grid sizes. Fig. 3 shows the basic concept. The upper level, named the Coarse-Hash, is for fast moving objects and uses a large grid cell to support update queries efficiently. The lower level, named the Fine-Hash, is for quasi-static objects and uses a small grid cell to support search operations. In the adaptive two-level hashing, the level of a moving object is not fixed statically but changes adaptively according to the object's movements.

Within each hash structure, the basic algorithms for insert, search, and update operations are the same as in the basic hashing. The difference is that the adaptive two-level hashing has mechanisms for escalation and de-escalation. Escalation is the migration from the Fine-Hash to the Coarse-Hash; de-escalation is the opposite migration, from the Coarse-Hash to the Fine-Hash. It is straightforward to decide when an object should be escalated: if an object in the Fine-Hash moves out of its fine grid cell, the system considers it a fast object and escalates it to the Coarse-Hash. To find objects for de-escalation, every object in the Coarse-Hash carries its own time-stamp. Whenever an object escalates to the Coarse-Hash or moves to another grid cell within the Coarse-Hash, its time-stamp is updated. By examining the time-stamps, the system can recognize which objects have expired.
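The per-object bookkeeping this implies can be captured in a few lines of C++. The field names and the expiry threshold below are our own illustrative assumptions, not taken from the paper.

#include <cstdint>

// Per-object state for the two-level scheme: which hash the object currently
// lives in, and the time-stamp of its last escalation or coarse-cell change.
enum class Level { FINE, COARSE };

struct MovingObject {
    std::uint32_t id;
    double x, y;                 // current position
    Level level;                 // FINE or COARSE
    std::uint64_t lastHandOver;  // updated on escalation or coarse-cell change
};

// An object in the Coarse-Hash is expired (a de-escalation candidate) when it
// has not crossed a coarse cell boundary for 'threshold' time units.
bool isExpired(const MovingObject& o, std::uint64_t now, std::uint64_t threshold) {
    return o.level == Level::COARSE && now - o.lastHandOver >= threshold;
}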

Fig. 3. Adaptive two-level hashing: objects escalate from the fine grid to the coarse grid and de-escalate back.


The escalation operation can easily be performed during update operations. However, it is not practical to perform de-escalation during updates, because examining all the objects in the Coarse-Hash on every update operation would be too expensive. Therefore, the proposed technique performs de-escalation differently. For a search query, the system must examine all objects in the query region in both the Coarse-Hash and the Fine-Hash. While the system examines objects in the Coarse-Hash, it checks for de-escalation at the same time, which saves the additional disk accesses that finding expired objects would otherwise require. Expired objects found while examining the Coarse-Hash are inserted into the Fine-Hash in bulk while the Fine-Hash is being examined. Although some expired objects may remain in the Coarse-Hash until their cells are examined by a search query, these remaining expired objects do not affect the performance of the system. If there are only update queries, all objects eventually stay in the Coarse-Hash, which is good for update operations. Conversely, if there are only search queries for a long time, so that all objects expire, all objects eventually stay in the Fine-Hash, which is good for search operations.

For example, consider a man in his office: he is initially in a quasi-static state. If he gets into a car and drives to another place, his position will soon move out of its fine grid cell, and the system escalates his position to the Coarse-Hash. While he travels by car, his position remains in the Coarse-Hash. After arriving at his destination, he no longer moves fast. After a while, when a search query is performed over his position, he is recognized as an expired object and de-escalated to the Fine-Hash.

Algorithm 1. Adaptive update algorithm

Procedure AdaptiveUpdate(id, position, timestamp)
Input: an object id id, a new position position, query time timestamp
begin
  pageNo ← getPageNoFromID(id);
  page ← ReadPage(pageNo);
  if page.hashType = FINE then
    if fineHash(position) = page.hashValue then
      UpdateItemInPage(page, id, position);
    else // Escalation
      DeleteItemFromPage(page, id);
      InsertIntoCoarseHash(id, position);
      UpdateTimeStamp(id);
    endif
  else
    if coarseHash(position) = page.hashValue then
      UpdateItemInPage(page, id, position);
    else
      DeleteItemFromPage(page, id);
      InsertIntoCoarseHash(id, position);
      UpdateTimeStamp(id);
    endif
  endif
end


4.3. Algorithms

Escalation occurs during update operations. Algorithm 1 describes the update algorithm for the adaptive two-level hashing. If the new position of an object is still inside the grid cell to which it belongs, the same update algorithm as in the basic hashing is used. However, if an object in the Fine-Hash moves out of its grid cell, it is recognized as a fast moving object; in this case, instead of updating it in the Fine-Hash, the system escalates the object to the Coarse-Hash.

De-escalation is the opposite of escalation. Each object in the Coarse-Hash has a time-stamp recording its last hand-over time, and an object that has not moved for a given threshold time is expired. The system examines the objects in the Coarse-Hash, finds the expired ones, removes them from the Coarse-Hash, and inserts them into the Fine-Hash. There are two different methods for performing the de-escalation.

Algorithm 2. Search algorithm with a de-escalation mechanism

Procedure AdaptiveSearch(q)
Input: q is a query range
begin
  Hcoarse ← coarse hash buckets that intersect q;
  Hfine ← fine hash buckets that intersect q;
  Initialize ListDE;
  // loop for the Coarse-Hash
  forEach disk page D in Hcoarse do
    forEach item in D do
      if item.position in q then
        add item into the result;
      endif
      if item is expired and fineHash(item) ∈ Hfine then
        // De-Escalation: delete from the coarse hash
        delete item;
        insert item into ListDE;
      endif
    endFor
  endFor
  // loop for the Fine-Hash
  forEach disk page D in Hfine do
    forEach item in D do
      if item.position in q then
        add item into the result;
      endif
    endFor
    // De-Escalation: insert into the fine hash
    move items that can be stored in D from ListDE into D;
  endFor
end


The first method is de-escalation during search operations, described in Algorithm 2. Its main advantage is that the system can check for expired objects during the execution of a search query: since both hashes have to be examined to process a search query anyway, the system checks for expired objects at the same time and thereby avoids additional disk accesses. Among the expired objects, the system de-escalates those that can be stored in the fine grid cells corresponding to the query region. For de-escalation, the objects are deleted from the Coarse-Hash and preserved in a temporary list; during the examination of the Fine-Hash, the expired objects in the temporary list are inserted into the Fine-Hash. Although this immediate manner is intuitive and simple, its drawback is that it requires write locks during the search operation, which may lead to potential conflicts and performance deterioration.

The second method is periodic de-escalation, in which the system executes the de-escalation process periodically. The periodic de-escalation overcomes the locking problem of de-escalation during search operations, but it needs additional disk accesses for the de-escalation operation itself.
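The paper gives no listing for the periodic variant, so the following is our own minimal C++ sketch of how such a background pass could look. MovingObject and isExpired repeat the Section 4.2 sketch for self-containment, and the bucket primitives are assumed.

#include <cstdint>
#include <vector>

enum class Level { FINE, COARSE };

struct MovingObject {
    std::uint32_t id;
    double x, y;
    Level level;
    std::uint64_t lastHandOver;
};

bool isExpired(const MovingObject& o, std::uint64_t now, std::uint64_t threshold) {
    return o.level == Level::COARSE && now - o.lastHandOver >= threshold;
}

// Assumed primitives over the two hash structures (not specified by the paper).
std::vector<MovingObject*> scanCoarseHash();   // reads every coarse bucket
void deleteFromCoarseHash(MovingObject& o);
void insertIntoFineHash(MovingObject& o);

// Run at a fixed interval. Unlike Algorithm 2, this pass pays extra disk
// accesses to read the whole Coarse-Hash, but it takes no write locks
// during search queries.
void periodicDeEscalation(std::uint64_t now, std::uint64_t threshold) {
    for (MovingObject* o : scanCoarseHash()) {
        if (isExpired(*o, now, threshold)) {
            deleteFromCoarseHash(*o);
            o->level = Level::FINE;
            insertIntoFineHash(*o);
        }
    }
}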

4.4. Analysis

We now analyze the cost of the adaptive two-level hashing in terms of the number of disk accesses. The notation of Section 3.3 is reused, and the additional notation used in this section is described in Table 2.

Table 2
Notation for analyzing the adaptive two-level hashing

Symbol    Meaning
n_F       The number of grid cells in the Fine-Hash
n_C       The number of grid cells in the Coarse-Hash
p_F       The probability of an object being in the Fine-Hash
p_C       The probability of an object being in the Coarse-Hash (= 1 - p_F)
p_Fout    The probability of an object moving out of the previous grid cell in the Fine-Hash
p_Cout    The probability of an object moving out of the previous grid cell in the Coarse-Hash

For a search query, the adaptive two-level hashing must examine both hashes. Therefore the cost of a search query is:

\[ C_{search} = \frac{p_F \cdot N}{n_F \cdot B} + \frac{p_C \cdot N}{n_C \cdot B} \tag{5} \]
\[ = \frac{N}{B} \cdot \left( \frac{n_C \cdot p_F + n_F \cdot p_C}{n_F \cdot n_C} \right) \tag{6} \]

To analyze the cost of an update operation, first suppose that the object being updated is in the Fine-Hash. If the object moves out of its previous fine grid cell, the system must delete it from the Fine-Hash and insert it into the Coarse-Hash, which takes 4 disk accesses. As in Section 3.3, this equals the cost in the basic hashing: the cost of an escalation equals the cost of moving an object to another grid cell. Therefore, the cost of updating an object in the Fine-Hash is 2 · p_Fout + 2, and since the probability of an object being in the Fine-Hash is p_F, the expected update cost contributed by the Fine-Hash is p_F · (2 · p_Fout + 2). In the same way, the expected update cost contributed by the Coarse-Hash is p_C · (2 · p_Cout + 2). Therefore, following Section 3.3, the cost of an update query is:

\[ C_{update} = p_F \cdot (2 \cdot p_{Fout} + 2) + p_C \cdot (2 \cdot p_{Cout} + 2) \tag{7} \]
\[ = p_F \cdot (2 \cdot k_F \cdot n_F + 2) + p_C \cdot (2 \cdot k_C \cdot n_C + 2) \tag{8} \]
\[ = 2 \cdot (p_F \cdot k_F \cdot n_F + p_C \cdot k_C \cdot n_C) + 2 \tag{9} \]

(since p_F + p_C = 1)

To compare the cost of the adaptive two-level hashing with that of the basic one, Table 3 shows the cost for some example cases. In this table, the column "Method" gives the type of hash method. For simplicity, we assume that k_C and k_F are both equal to k in these examples. The parameters n_F = 100, n_C = 10, p_F = 0.9, p_C = 0.1 mean that there are 100 grid cells in the Fine-Hash, 10 grid cells in the Coarse-Hash, 90% of the objects in the Fine-Hash, and 10% of the objects in the Coarse-Hash. Since the number of grid cells cannot be changed during execution, the costs of the basic hashing are constant. The costs of the adaptive two-level hashing, however, change as the ratio of quasi-static objects to fast objects changes. If there are more quasi-static objects, as in the case p_F = 0.9, p_C = 0.1, the search cost improves. On the other hand, if the number of fast moving objects increases, as in the case p_F = 0.1, p_C = 0.9, the update cost decreases. Therefore, the adaptive two-level hashing adapts itself to changes in the movement patterns of the objects.

Table 3
Examples of the cost

Method    Parameters                                  C_search      C_update
Basic     n = 10                                      0.1 (N/B)     20k + 2
Basic     n = 100                                     0.01 (N/B)    200k + 2
Adaptive  n_F = 100, n_C = 10, p_F = 0.9, p_C = 0.1   0.019 (N/B)   182k + 2
Adaptive  n_F = 100, n_C = 10, p_F = 0.5, p_C = 0.5   0.055 (N/B)   110k + 2
Adaptive  n_F = 100, n_C = 10, p_F = 0.1, p_C = 0.9   0.091 (N/B)   38k + 2
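The adaptive rows of Table 3 follow mechanically from Eqs. (6) and (9) with k_F = k_C = k. The following small verification program is ours, not part of the paper; it reproduces the coefficients.

#include <cstdio>

double searchCoeff(double nF, double nC, double pF) {  // C_search in units of N/B
    double pC = 1.0 - pF;
    return (nC * pF + nF * pC) / (nF * nC);
}

double updateCoeff(double nF, double nC, double pF) {  // coefficient of k in C_update
    double pC = 1.0 - pF;
    return 2.0 * (pF * nF + pC * nC);
}

int main() {
    for (double pF : {0.9, 0.5, 0.1}) {
        std::printf("pF=%.1f: Csearch = %.3f (N/B), Cupdate = %gk + 2\n",
                    pF, searchCoeff(100, 10, pF), updateCoeff(100, 10, pF));
    }
    // Prints 0.019/182k, 0.055/110k and 0.091/38k, matching Table 3.
    return 0;
}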


4.5. Multi-level hashing

This paper has focused on two-level hashing, in which moving objects are categorized into only two classes: quasi-static objects and fast moving objects. However, another class of moving objects could also be considered, such as a person moving slowly by walking or riding a bicycle. In a two-level hashing, such objects would have to be escalated and de-escalated very often. Depending on the circumstances, therefore, it is also possible to use more levels, for example a three-level or a four-level hashing, and the algorithms for the adaptive two-level hashing can be directly extended to multi-level hashing.

However, multi-level approaches, including the adaptive two-level hashing, carry an overhead: since the system must investigate every level of the hash structures, search performance is worse than that of the basic hashing with a similar number of grid cells. Moreover, as the number of levels increases, this overhead grows. Therefore, this paper deals only with two-level hashing, because it is not practical to consider more classes of moving objects.

5. Experiments

In this section, we present the results of experiments analyzing the performance of the adaptive hashing.

5.1. Experimental setup

We implemented both the basic hashing and the adaptive two-level hashing in C++. All experiments were conducted on an Intel Pentium III 1 GHz with 768 MB RAM running Linux 2.4.20-8. The disk page size was set to 4 KB. We also implemented an LRU cache for disk pages in order to take buffer effects into account.

We used the Network-based Generator of Moving Objects [20] to generate sets of moving objects. The input of the generator is the road map of the German city of Oldenburg, shown in Fig. 4(a). The generator first creates the initial locations of the given number of objects; Fig. 4(b) shows an example of the initial locations. For the experiments, we generated datasets with 10K, 50K, and 100K objects, respectively.

The experiments were performed for 100 time-stamps per workload. If we assume that objects report their locations every 3 min, 100 time-stamps correspond to 5 h. At the beginning, all objects are in the quasi-static state and stored in the Fine-Hash. At time-stamp 0, 10% of the objects begin to move. Each object has its own destination and moves toward it along the given road map. Once a moving object arrives at its destination, it stops moving, and the generator selects one of the static objects to become a moving object.

At every time-stamp, the generator issues update and search queries. For update queries, the generator reports the id and location pairs of the moving objects, which are 10% of all objects. Then the search queries are generated: query windows are uniformly distributed in the domain space, and the range of a query window is 0.2% of the entire range in each dimension. The number of search queries per time-stamp varies with the given workload. The experiments were performed under three different workloads: (1) a low update ratio (30% updates : 70% searches), (2) a medium update ratio (50% updates : 50% searches), and (3) a high update ratio (80% updates : 20% searches). After 100 time-stamps, a workload changes to the next workload.


Fig. 4. Dataset for the experiments. (a) Road network map, (b) initial distribution.

For the experiments, we used 2 variants of the adaptive two-level hashing and 5 variants of the basic hashing. The term basic in the results denotes the basic hashing, and the following value gives the number of grid cells in each dimension; for example, basic 40 means the basic hashing with a 40 × 40 grid. The term adaptive denotes the adaptive two-level hashing, and the following numbers give the number of grid cells in each dimension for the Coarse-Hash and for the Fine-Hash, respectively; for example, adaptive 20-80 means the adaptive two-level hashing with a 20 × 20 grid for the Coarse-Hash and an 80 × 80 grid for the Fine-Hash.

5.2. Experimental results

In the experiments, we compared the adaptive two-level hashing with the basic hashing in terms of the average number of disk accesses needed to process each query. Fig. 5(a) shows the result under the workload with the low update ratio. In this case, the search performance dominates the overall performance. The number of disk accesses for a search query is determined by the number of objects in each hash bucket; the basic hashing with a small number of grid cells has fewer hash buckets and is therefore affected by the number of objects much more than the others. The performance of the adaptive hashing is better than the basic hashing with a small number of grid cells, although it is worse than the basic hashing with a large number of grid cells.

Fig. 5(b) shows the result under the workload with the medium update ratio. The performance gaps among the methods become smaller than those in Fig. 5(a). Fig. 5(c) shows the result under the workload with the high update ratio. In this case, contrarily, the update cost dominates the overall performance. Consequently, the basic hashing with a large number of grid cells is the worst in every case, and the adaptive two-level hashing is always better than the basic hashing. This shows that the adaptive two-level hashing is efficient under a heavy update workload.

Fig. 5. Average number of disk accesses under the different workloads (methods: adaptive 20-80, adaptive 30-120, basic 20, basic 40, basic 60, basic 100; x-axis: number of objects (K), y-axis: average number of disk accesses). (a) The low update ratio, (b) the medium update ratio, (c) the high update ratio.

5.3. Summary of the experiments

Under the high update ratio, the update performance of the adaptive two-level hashing is better than that of the basic hashing because the adaptive two-level hashing divides the data into two groups and stores only the fast moving objects in the Coarse-Hash. The portion of objects in the Coarse-Hash is relatively small compared with the total number of objects, so updates in the Coarse-Hash can be processed more efficiently. Under the low update ratio, the adaptive two-level hashing is worse than the basic hashing with a large number of grid cells, because it needs to search both hash structures. However, since the adaptive two-level hashing has the Fine-Hash, it is better than the basic hashing with a small number of grid cells even in this worst case. Consequently, the adaptive two-level hashing can process queries efficiently under both high and low update ratios.

6. Conclusion

This paper has proposed the adaptive two-level hashing method to support dynamic workloads efficiently.


Since the number of grid cells is fixed at initialization and cannot be changed during execution, traditional hashing algorithms have difficulty supporting dynamic changes in the workload. The proposed method, in contrast, has two levels of hashes, one for fast moving objects and the other for quasi-static objects, and a moving object changes its level adaptively according to the degree of its movement. By escalating and de-escalating objects between the two levels, the proposed method can support dynamic workloads adaptively. As future research, we plan to develop an adaptive way of processing moving objects with tree-based index structures, and to build a hybrid structure that combines a tree-based structure with a hash-based structure in order to support various applications.

References

[1] M. Weiser, Some computer science issues in ubiquitous computing, Communications of the ACM 36 (7) (1993) 75-84.
[2] J. Indulska, P. Sutton, Location management in pervasive systems, in: Proceedings of the Australasian Information Security Workshop Conference on ACSW Frontiers 2003, Australian Computer Society, Inc., 2003, pp. 143-151.
[3] M.F. Mokbel, W.G. Aref, S.E. Hambrusch, S. Prabhakar, Towards scalable location-aware services: Requirements and research issues, in: Proceedings of the Eleventh ACM International Symposium on Advances in Geographic Information Systems, ACM Press, 2003, pp. 110-117.
[4] Z. Song, N. Roussopoulos, Hashing moving objects, in: Proceedings of the 2nd International Conference on Mobile Data Management, 2001, pp. 161-172.
[5] H.D. Chon, D. Agrawal, A.E. Abbadi, Using space-time grid for efficient management of moving objects, in: Proceedings of the 2nd ACM International Workshop on Data Engineering for Wireless and Mobile Access, ACM Press, 2001, pp. 59-65.
[6] S. Saltenis, C.S. Jensen, S.T. Leutenegger, M.A. Lopez, Indexing the positions of continuously moving objects, in: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, 2000, pp. 331-342.
[7] Y. Tao, D. Papadias, J. Sun, The TPR*-Tree: An optimized spatio-temporal access method for predictive queries, in: Proceedings of 29th International Conference on Very Large Data Bases, 2003, pp. 790-801.
[8] D. Kwon, S. Lee, S. Lee, Indexing the current positions of moving objects using the Lazy Update R-tree, in: Proceedings of the 3rd International Conference on Mobile Data Management, 2002, pp. 113-120.
[9] H.D. Chon, D. Agrawal, A.E. Abbadi, Storage and retrieval of moving objects, in: Proceedings of the 2nd International Conference on Mobile Data Management, 2001, pp. 173-184.
[10] M.F. Mokbel, X. Xiong, W.G. Aref, SINA: Scalable incremental processing of continuous queries in spatio-temporal databases, in: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, ACM Press, 2004, pp. 623-634.
[11] V. Gaede, O. Günther, Multidimensional access methods, ACM Computing Surveys 30 (2) (1998) 170-231.
[12] M.F. Mokbel, T.M. Ghanem, W.G. Aref, Spatio-temporal access methods, IEEE Data Engineering Bulletin 26 (2) (2003) 40-49.
[13] D. Pfoser, C.S. Jensen, Y. Theodoridis, Novel approaches in query processing for moving object trajectories, in: Proceedings of 26th International Conference on Very Large Data Bases, 2000, pp. 395-406.
[14] Y. Tao, D. Papadias, MV3R-Tree: A spatio-temporal access method for timestamp and interval queries, in: Proceedings of 27th International Conference on Very Large Data Bases, 2001, pp. 431-440.
[15] M.-L. Lee, W. Hsu, C.S. Jensen, B. Cui, K.L. Teo, Supporting frequent updates in R-Trees: A bottom-up approach, in: Proceedings of 29th International Conference on Very Large Data Bases, 2003, pp. 608-619.
[16] Y. Xia, S. Prabhakar, Q+Rtree: Efficient indexing for moving object databases, in: Proceedings of the Eighth International Conference on Database Systems for Advanced Applications, IEEE Computer Society, 2003, p. 175.


[17] Y. Tao, D. Papadias, Adaptive index structures, in: Proceedings of 28th International Conference on Very Large Data Bases, 2002, pp. 418-429.
[18] K.-Y. Whang, R. Krishnamurthy, The multilevel grid file—a dynamic hierarchical multidimensional file structure, in: Proceedings of the Second International Symposium on Database Systems for Advanced Applications, 1991, pp. 449-459.
[19] J. Nievergelt, H. Hinterberger, K.C. Sevcik, The grid file: An adaptable, symmetric multikey file structure, ACM Transactions on Database Systems 9 (1) (1984) 38-71.
[20] T. Brinkhoff, A framework for generating network-based moving objects, GeoInformatica 6 (2) (2002) 153-180.

Dongseop Kwon is a Ph.D. candidate in the School of Electrical Engineering and Computer Science, Seoul National University, Seoul, Korea. He received his B.S. and M.S. degrees in the Department of Computer Engineering from Seoul National University, Seoul, Korea, in 1998 and 2000, respectively. His current research interests include spatio-temporal databases, high-dimensional index structures, mobile data management, and time series databases.

Sangjun Lee received his B.S. and M.S. degrees in the Department of Computer Engineering from Seoul National University, Seoul, Korea, in 1996 and 1998, respectively, and his Ph.D. in the School of Electrical Engineering and Computer Science, Seoul National University, Seoul, Korea, in 2004. He is currently a full-time lecturer in the School of Computing, Soongsil University, Seoul, Korea. His current research interests include high-dimensional index structures, mobile data management, and multimedia databases.

Wonik Choi received his Ph.D., M.S., and B.S. degrees in the School of Electrical Engineering and Computer Science, Seoul National University, Seoul, Korea, in 2004, 1998 and 1996, respectively. He is a Senior Engineer with the LBS R&D Center, Thinkware Systems Corporation, Seoul, Korea. His current research interests include spatio-temporal databases, mobile databases, geographic information systems, and XML.


Sukho Lee received his B.A. degree in Political Science and Diplomacy from Yonsei University, Seoul, Korea, in 1964, and his M.S. and Ph.D. in Computer Sciences from the University of Texas at Austin in 1975 and 1979, respectively. He is currently a professor in the School of Computer Science and Engineering, Seoul National University, Seoul, Korea, where he has been leading the Database Research Laboratory. He served as the president of the Korea Information Science Society in 1994 and as the honorary chair of the International Symposium on Database Systems for Advanced Applications (DASFAA), 2004. His current research interests include database management systems, spatial database systems, and multimedia database systems.