JOURNAL OF COMPUTING, VOLUME 4, ISSUE 2, FEBRUARY 2012, ISSN 2151-9617 https://sites.google.com/site/journalofcomputing WWW.JOURNALOFCOMPUTING.ORG


Multidimensional Index Structure for Spatial Database Management Systems

Mabruk Fekihal, Ibrahim Jaluta, Izzeldin Osman

Abstract— Many spatial database management systems use the R-tree or one of its variants as an index for efficient access to spatial objects in the database. However, update operations on the R-tree index structure, such as page split, object insertion, and object deletion, are not efficient. Page split in the R-tree and its variants is a very expensive operation, and searching for or deleting an object may follow several paths from the root page to the target leaf page. Moreover, recovery algorithms for the R-tree and its variants can be quite complex. In this paper, we present a new multidimensional index structure, the RB+-tree, that performs search, insert, and delete operations as efficiently as the B+-tree. In our RB+-tree algorithms, only one path is followed when executing a search, insert, or delete operation. RB+-tree structure-modification operations such as page split and merge are done as simply as in the B+-tree. The performance of range searching (window queries) in the RB+-tree may not match that of the R-tree or its variants, because the R-tree and its variants use sophisticated split algorithms to minimize overlap.

Index Terms—multidimensional index, R-tree, spatial database.


1 INTRODUCTION

Spatial database systems ([14, 15, 17, 18]) are concerned with the representation and manipulation of data that have a geometrical or topological interpretation, such as the physical world (geography, urban planning, astronomy), parts of living organisms (anatomy of the human body), and engineering design (very large-scale integrated circuits). To access spatial objects in spatial databases efficiently, spatial indices that exploit spatial information have been developed. Although there has been a lot of work on index structures for spatial data, there is still much room to improve the performance of these multidimensional index structures. Several multidimensional index structures (e.g. [6, 3, 9, 10, 8]) have been proposed over the last two decades. The R-tree is one of the most popular spatial indexes, and several variants of it ([1, 2, 4, 5, 7, Kat 97, 11, 16]) have also been proposed as multidimensional index structures. The R-tree and its variants can be used for both point and spatial data. Most spatial database management systems use the R-tree or one of its variants as the access method for the spatial objects stored in the database. However, R-tree index structures have some drawbacks that affect their performance. Moreover, the recovery protocols for the R-tree and its variants can be quite complex.

Performance is critical for spatial databases. The objective of this paper is to introduce a new spatial index structure that has the features and properties of both the B+-tree and the R-tree. The idea is to keep objects that are close in space stored close to each other on disk pages. We call the new structure the RB+-tree. For simplicity, our RB+-tree algorithms consider 2-dimensional spatial objects, but they can be extended to higher dimensions. In Section 2, we outline the structure of the RB+-tree. In Section 3, we present the RB+-tree algorithms for searching, insertion, deletion, splitting, merging, and redistribution. In Section 4, we compare the RB+-tree with the R-tree. In Section 5, we draw our conclusions and point out directions for future research.

2 RB+-TREE

The RB+-tree is a multi-level tree with all leaves on the same level. It is a balanced tree that has many of the properties of both the B+-tree and the R-tree. The structure of the RB+-tree is designed so that a spatial search visits only a small number of pages (nodes).

2.1 RB+-tree Structure

We assume that the RB+-tree is a secondary index on the spatial database, where the database records are stored in a separate (heap) file.

1. Each non-leaf index page P (including the root) stores a list of index records with the format {K, ((X-min, Y-min), (X-max, Y-max)), Q}, where ((X-min, Y-min), (X-max, Y-max)) is the MBR associated with the child page Q of P; K is the key value associated with this MBR, namely the highest key value in the child page Q; and Q is the page-id of the child page of P. That is, each child page Q of parent P has exactly one index record associated with it in P.

2. Each leaf index page P contains records with the format (Ki, DPi), where DPi is a pointer to the file block containing the record whose search-field value is Ki. Thus Ki serves as:
   i) the MBR ((Xmin, Ymin), (Xmax, Ymax)) associated with the spatial object O;
   ii) the key value associated with the MBR of object O, whose value is Xmin||Ymin||Xmax||Ymax, where "||" means concatenation.

Mabruk Fekihal is with the Faculty of Computing and Information Technology, Sohar University, Oman. Ibrahim Jaluta is with the Department of Computer Science and Engineering, Helsinki University of Technology, Helsinki, Finland. Izzeldin Osman is with the Department of Computer Science, Sudan University of Science & Technology, Khartoum, Sudan.

© 2012 Journal of Computing Press, NY, USA, ISSN 2151-9617 http://sites.google.com/site/journalofcomputing/
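The object-key Xmin||Ymin||Xmax||Ymax can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: we assume non-negative integer coordinates written as zero-padded, fixed-width decimal fields (two digits here). The key 14031613 used for R11 in Example 2 is consistent with such fields if read as (14, 03, 16, 13), but that reading is our assumption. Fixed width matters: without padding, (1, 23, ...) and (12, 3, ...) would both begin with "123", and string order would no longer follow coordinate order. The function names `make_object_key` and `key_to_mbr` are hypothetical.

```python
def make_object_key(xmin: int, ymin: int, xmax: int, ymax: int, width: int = 2) -> str:
    """Return the object-key Xmin||Ymin||Xmax||Ymax as a text string.

    Assumption (not from the paper): each coordinate is a non-negative integer
    written as a zero-padded decimal field of `width` digits, so comparing keys
    as strings orders them like the (xmin, ymin, xmax, ymax) tuples.
    """
    coords = (xmin, ymin, xmax, ymax)
    for c in coords:
        if not 0 <= c < 10 ** width:
            raise ValueError(f"coordinate {c} does not fit in {width} digits")
    return "".join(str(c).zfill(width) for c in coords)


def key_to_mbr(key: str, width: int = 2) -> tuple[int, ...]:
    """Recover (xmin, ymin, xmax, ymax) from a fixed-width object-key."""
    if len(key) != 4 * width:
        raise ValueError("key length must be 4 * width")
    return tuple(int(key[i:i + width]) for i in range(0, 4 * width, width))


# Usage: under the two-digit assumption, R11's key from Example 2 round-trips.
key = make_object_key(14, 3, 16, 13)
print(key)              # 14031613
print(key_to_mbr(key))  # (14, 3, 16, 13)
```

Because every field has the same width, sorting keys as strings gives the same order as sorting the coordinate tuples, which is what lets B+-tree key comparisons work on these keys.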


2.2 RB+-tree Properties

Leaf and non-leaf pages have different structures and therefore different properties.

Properties of non-leaf pages in the RB+-tree. Let M1 be the maximum number of entries (records) in a non-leaf index page. Then the root has at least 2 children, and every non-leaf node except the root has at least $\lceil M_1/2 \rceil$ children. Each non-root page holds p - 1 records and p pointers to subtrees, where

$\lceil M_1/2 \rceil \le p \le M_1$.

Properties of leaf pages in the RB+-tree. Let M2 be the maximum number of records (entries) in a leaf index page. Then every leaf page (node) has at least $\lfloor M_2/2 \rfloor$ records. All leaf nodes appear at the same level; that is, they are at the same distance from the root.

Example 1: Suppose we have the spatial objects with MBRs R1, R2, ..., R20 shown in Figure 1. The corresponding RB+-tree is shown in Figure 2.

Fig. 1. Twenty spatial objects in 2D space

2.3 Keys in RB+-tree

If we associate a key with each object so that the object can be uniquely identified, then a single object can be accessed as efficiently as in a B+-tree. We define two types of keys in the RB+-tree:

Object-Key (K): This key uniquely identifies an object in the RB+-tree. It is constructed as K = Xmin||Ymin||Xmax||Ymax, where "||" means concatenation. The insert and delete operations use the object-key to search for an object and to insert it into or delete it from the database.

Area-Key (K): The MBR (Minimum Bounding Rectangle) in a non-leaf page is used to direct the search while answering a window query. Each MBR uniquely identifies a rectangle that covers a certain area of the search space, and these rectangles can overlap. For a window search, the MBRs direct the search to all objects that overlap with the window query.

3 SEARCHING AND UPDATING IN RB+-TREE

3.1 Searching in RB+-tree

The RB+-tree supports two types of search. The first is the region search (window query), as in the R-tree and its variants. The second is the exact-object search, which the R-tree and its variants do not provide as a single-path search.

3.2 Region Search (Window Query)

The search algorithm descends the tree from the root in a manner similar to the B+-tree and the R-tree. However, more than one subtree under a visited node may need to be searched. The MBRs in the non-leaf index pages guide the search to the leaf nodes that satisfy the query.

Region Search Algorithm

Given an RB+-tree whose root node is R, find all objects whose rectangles overlap with the window query W. Let linked-list1 and linked-list2 be two linked lists.
1. linked-list1 = linked-list2 = empty; /* initialization */
2. R ← the root page;
3. if R is not a leaf page then
      find all child pages of R that overlap with the window query; store their page-ids in linked-list1;
   else
      find the objects in R that overlap with the window query; stop;
4. Search every page whose page-id is stored in linked-list1:
      if the current level is not the leaf level then
         find all child pages that overlap with the window query; store their page-ids in linked-list2;
      else
         find the objects in this page that overlap with the window query;
5. linked-list1 = empty;
6. Search every page whose page-id is stored in linked-list2:
      if the current level is not the leaf level then
         find all child pages that overlap with the window query; store their page-ids in linked-list1;
      else
         find the objects in this page that overlap with the window query;
7. linked-list2 = empty;
8. Go to Step 4; stop once the leaf level has been searched (that is, when the list to be searched next is empty).

Fig. 2. The RB+-tree for the spatial objects shown in Fig. 1
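The level-by-level window search can be sketched in Python, with two plain lists standing in for linked-list1 and linked-list2. This is a minimal in-memory sketch, not the authors' code: the `Node` class and its entry layout are hypothetical (non-leaf entries mirror {K, MBR, Q} as (key, mbr, child); leaf entries store (key, mbr) separately for readability, although in the RB+-tree the key itself encodes the MBR). The overlap test is the strict criterion quoted in Section 4.5. Because all leaves are on the same level, each pass over the current list handles exactly one level, and the loop ends when no child overlaps the window.

```python
from dataclasses import dataclass, field

Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def overlaps(r: Rect, q: Rect) -> bool:
    """Strict overlap test, the criterion quoted in Section 4.5."""
    return r[0] < q[2] and q[0] < r[2] and r[1] < q[3] and q[1] < r[3]


@dataclass
class Node:
    # Hypothetical in-memory page. Leaf entries: (key, mbr).
    # Non-leaf entries: (key, mbr, child), mirroring {K, MBR, Q}.
    leaf: bool
    entries: list = field(default_factory=list)


def window_query(root: Node, w: Rect) -> list[str]:
    """Return keys of all objects whose MBR overlaps window w, level by level."""
    result: list[str] = []
    current = [root]                    # role of linked-list1
    while current:                      # an empty list means nothing left to visit
        nxt: list[Node] = []            # role of linked-list2
        for page in current:
            if page.leaf:
                result.extend(k for k, mbr in page.entries if overlaps(mbr, w))
            else:
                nxt.extend(child for _k, mbr, child in page.entries if overlaps(mbr, w))
        current = nxt                   # swap the two lists and descend one level
    return result


# Usage: a two-level toy tree.
leaf1 = Node(True, [("a", (0, 0, 2, 2)), ("b", (3, 3, 4, 4))])
leaf2 = Node(True, [("c", (5, 5, 6, 6))])
root = Node(False, [("b", (0, 0, 4, 4), leaf1), ("c", (5, 5, 6, 6), leaf2)])
print(window_query(root, (1, 1, 3.5, 3.5)))  # ['a', 'b']
```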

3.3 Exact-Object Search

The search algorithm descends the tree from the root to find the appropriate leaf node. The object-key described in Section 2.3 guides the search to the leaf node that contains the object whose key (MBR) satisfies the query.

Exact-object Search Algorithm

Search for an object with key S proceeds as follows.
1. S ← the object-key value;
2. N ← the root page;
3. if N is a leaf page then
      search N for an entry Ei with key value Ki = S;
      if such a key is found then return True; else return False;
4. if N is not a leaf page then
      /* q ≤ p, where p is the order of the tree, and Pi is a page pointer */
      if S ≤ N.K1 then N ← N.P1;
      else if S > N.Kq then N ← N.Pq+1;
      else search N for an entry i such that N.K(i-1) < S ≤ N.Ki; N ← N.Pi;
5. Go to Step 3.

Example 2: Suppose that we are given the object-key S = 14031613 of the object R11, and we want to search for this object in the RB+-tree of Figure 2. We proceed as follows.
• Read the root node (N9). As the key value of R11 is greater than the highest key value in N9, the child page that pointer Pq+1 points to (i.e., N8) is read.
• Read node N8. Since the key value of R11 is less than the first key value in N8, N5 is read.
• N5 is a leaf page, so its entries are checked to locate the object.
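Step 4's three-way branch is the usual B+-tree child selection, and `bisect_left` from Python's standard library computes it directly. This is a minimal sketch under assumptions: the `Page` class is hypothetical, a non-leaf page holds q sorted keys and q + 1 child pointers (P1 to Pq+1) as Step 4 implies, and keys are fixed-width strings so that string order is key order. The function also counts pages read, which always equals the tree height.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Page:
    # Hypothetical page layout: a non-leaf page holds q sorted keys and
    # q + 1 child pointers (P1 .. Pq+1); a leaf page holds sorted keys only.
    keys: list[str]
    children: list["Page"] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def exact_search(root: Page, s: str) -> tuple[bool, int]:
    """Return (found, pages_read) for object-key s, following one path."""
    node, pages_read = root, 1
    while not node.is_leaf:
        # bisect_left returns the first 0-based i with s <= keys[i], which gives
        # S <= K1 -> P1, K(i-1) < S <= Ki -> Pi, and S > Kq -> Pq+1.
        node = node.children[bisect_left(node.keys, s)]
        pages_read += 1
    i = bisect_left(node.keys, s)
    return (i < len(node.keys) and node.keys[i] == s), pages_read


# Usage: a two-level toy tree; one page per level is read.
leaf_a = Page(["01010202", "03030404"])
leaf_b = Page(["05050606", "14031613"])
root = Page(["03030404"], [leaf_a, leaf_b])
print(exact_search(root, "14031613"))  # (True, 2)
```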


3.4 Insertion

Inserting a new record into the RB+-tree is similar to inserting a tuple into a B+-tree: given an entry (record), find the leaf node where it belongs and insert it there. If the leaf node is full, it must be split; when a page (node) splits, some records remain in the old node and the rest are moved to the new node. We assume that each object O in the 2-dimensional plane is represented by the smallest rectangle that encloses it (its MBR, Minimum Bounding Rectangle).

Insert Algorithm

To insert an object O with MBR = (xmin, ymin, xmax, ymax), we proceed as follows.
1. Form the object-key K = xmin||ymin||xmax||ymax, where "||" means concatenation.
2. Invoke the Exact-object Search algorithm; if the object-key is found then return ("record uniqueness violation").
3. If the leaf node N reached by the search has room for the MBR of the new object O then
      P ← parent page of N;
      insert the MBR of object O into N;
      if the MBR of N changes then update the entry in P that points to N; the updating of MBRs may propagate up to the root.
4. If the leaf node N has no room for the MBR of object O then
      insert the MBR of object O into its correct position in N;
      invoke the Redistribute function;
      if redistribution is not possible then invoke the Split-Node function;
      update the MBRs in the parent page of the old and new pages; the MBR updating may propagate up to the root node.
5. Return.
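The leaf-level part of Steps 2 to 4 can be sketched in Python. This is a minimal sketch, not the authors' implementation: a leaf is modeled as a list of (object-key, MBR) pairs kept sorted by key (storing the MBR next to the key is our simplification for readability), and `leaf_insert`, `mbr_union`, and `capacity` are hypothetical names. The function reports whether the page overflowed and whether its MBR grew; propagating the MBR change to the parent, redistribution, and splitting are left to the caller.

```python
from bisect import bisect_left
from typing import Optional

Rect = tuple[int, int, int, int]   # (xmin, ymin, xmax, ymax)
Entry = tuple[str, Rect]           # (object-key, MBR)


def mbr_union(entries: list[Entry]) -> Optional[Rect]:
    """Smallest rectangle covering every entry's MBR (None for an empty page)."""
    if not entries:
        return None
    rects = [r for _, r in entries]
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def leaf_insert(leaf: list[Entry], key: str, mbr: Rect, capacity: int) -> tuple[str, bool]:
    """Insert (key, mbr) into a leaf kept sorted by key.

    Returns (status, mbr_changed) with status "duplicate", "ok" or "overflow".
    On "overflow" the entry is already in its sorted position (Step 4), and the
    caller is expected to redistribute or split the page.
    """
    i = bisect_left([k for k, _ in leaf], key)
    if i < len(leaf) and leaf[i][0] == key:
        return "duplicate", False          # record uniqueness violation (Step 2)
    old_mbr = mbr_union(leaf)
    leaf.insert(i, (key, mbr))
    status = "overflow" if len(leaf) > capacity else "ok"
    return status, mbr_union(leaf) != old_mbr


# Usage: a leaf with room for two entries.
leaf = [("01010202", (1, 1, 2, 2))]
print(leaf_insert(leaf, "03030404", (3, 3, 4, 4), capacity=2))  # ('ok', True)
print(leaf_insert(leaf, "02020303", (2, 2, 3, 3), capacity=2))  # ('overflow', False)
```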

Redistribute Function(s): The goal of this function is to distribute records between sibling pages (nodes), both to reduce page splits when a page overflows during insertion and to redistribute records between nodes during deletion when a page underflows and merging is not possible. As the structures of leaf and non-leaf nodes differ, there are two redistribute functions; since they differ only slightly, we explain only the leaf version. Given an object O to be inserted into leaf page Q, whose parent is P, where Q has no room to accommodate O, the redistribute algorithm is executed as follows:
1. Find the position of child page Q with respect to its parent page P. Q is either (1) the rightmost child of P, (2) the leftmost child of P, or (3) a child with both a left and a right sibling.
2. If Q is the rightmost child of P then
      Q-leftsib ← Q's direct left sibling page;
      if Q-leftsib is full then return False;
      else
         n1 = (number of records in Q-leftsib + number of records in Q)/2, where n1 is the number of records in Q-leftsib after redistribution;
         n2 = (number of records in Q-leftsib + number of records in Q) - n1, where n2 is the number of records in Q after redistribution;
         move records from Q to Q-leftsib;
         update the MBR of the entry in P that points to Q; /* MBR updating may propagate up to the root */
         update the MBR of the entry in P that points to Q-leftsib; /* MBR updating may propagate up to the root */
3. If Q has a right sibling then
      Q-rightsib ← Q's direct right sibling page; /* Q and Q-rightsib have the same parent */
      if Q-rightsib is full then return False;
      else
         n1 = (number of records in Q-rightsib + number of records in Q)/2, where n1 is the number of records in Q after redistribution;
         n2 = (number of records in Q-rightsib + number of records in Q) - n1, where n2 is the number of records in Q-rightsib after redistribution;
         move records from Q to Q-rightsib;
         update the MBR of the entry in P that points to Q; /* MBR updating may propagate up to the root */
         update the MBR of the entry in P that points to Q-rightsib; /* MBR updating may propagate up to the root */
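Step 2 of the leaf redistribution can be sketched as follows; Step 3 is its mirror image with the right sibling. This is a minimal sketch under assumptions: leaves are lists of (object-key, MBR) pairs sorted by key, `redistribute_with_left` and `capacity` are hypothetical names, and n1 uses integer division. Besides the new MBRs, the sketch also returns each page's new highest key, because with B+-tree key ordering the highest key of Q-leftsib changes as well; returning it is our addition, since the steps above mention only MBR updates.

```python
Rect = tuple[int, int, int, int]   # (xmin, ymin, xmax, ymax)
Entry = tuple[str, Rect]           # (object-key, MBR)


def mbr_of(entries: list[Entry]) -> Rect:
    """Smallest rectangle covering every entry's MBR (entries must be non-empty)."""
    rects = [r for _, r in entries]
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def redistribute_with_left(left: list[Entry], q: list[Entry], capacity: int):
    """Step 2 sketch: Q is the rightmost child and has overflowed.

    Returns None if the left sibling is full; otherwise returns
    (new_left, new_q, left_entry, q_entry), where each *_entry is the
    (highest key, MBR) pair the parent P would store for that page.
    """
    if len(left) >= capacity:
        return None                      # Q-leftsib is full ("return False")
    combined = left + q                  # every key in left precedes every key in q
    n1 = len(combined) // 2              # records in Q-leftsib after redistribution
    new_left, new_q = combined[:n1], combined[n1:]   # n2 = len(new_q)
    return (new_left, new_q,
            (new_left[-1][0], mbr_of(new_left)),
            (new_q[-1][0], mbr_of(new_q)))


# Usage: Q overflowed to 3 entries with capacity 2; its left sibling has 1.
left = [("01", (0, 0, 1, 1))]
q = [("02", (1, 1, 2, 2)), ("03", (2, 2, 3, 3)), ("04", (3, 3, 4, 4))]
new_left, new_q, left_entry, q_entry = redistribute_with_left(left, q, capacity=2)
print(left_entry, q_entry)  # ('02', (0, 0, 2, 2)) ('04', (2, 2, 4, 4))
```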

3.5 Page-Split Algorithm

If a page (node) is full, adding another record causes an overflow, and the M + 1 entries must be divided between two pages. Assume that an object O needs to be inserted into the full leaf page Q. Then the Page-Split algorithm is performed as follows.
1. If Q is the root then invoke the Root-Split algorithm; return;
2. P ← the parent node of Q;
3. Invoke the Redistribute function; if redistribution occurs then return; /* no page split is required */
4. If redistribution is not possible then
      allocate a new page Q';
      move the upper half (after the dividing key) of the records in the old page Q to the new page Q';
      if Q is a leaf node then the dividing key remains in the leaf node and a copy of it is placed in the parent node;
      else /* split a non-leaf node */ move the dividing key up to P;
5. Compute the MBRs of Q and Q';
6. Insert a new entry of the form (Kj, MBRj, Pj) into P:
      if Q was not the rightmost child of P then Kj is the highest key in Q', Pj is the address of Q', and MBRj is the new rectangle that covers the entries in Q';
      else Kj is the highest key in Q, Pj is the address of Q, and MBRj is the new rectangle that covers the entries in Q; set Pq+1 in P (which was pointing to Q) to point to Q';
7. If P has room for the new record then
      check whether the new record changes the size of the rectangle that covers P; if it does, the MBR in the entry of P's parent (say P') that points to P must be updated; these MBR updates may propagate up to the root; return;
8. If P is full then
      invoke the Redistribute function; if redistribution occurs then return; /* no page split is required */
      else /* a non-leaf page split is required */ Q ← P;
9. Go to Step 1.
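Step 4 and Step 5 for a leaf page can be sketched in Python. This is a minimal sketch under assumptions: the leaf is a list of M + 1 (object-key, MBR) pairs sorted by key, `split_leaf` and `capacity` are hypothetical names, and the lower half is taken as the first (M + 1) // 2 entries. As in a B+-tree leaf split, the dividing key (the highest key left in Q) stays in the leaf and is returned so that a copy can be placed in the parent.

```python
Rect = tuple[int, int, int, int]   # (xmin, ymin, xmax, ymax)
Entry = tuple[str, Rect]           # (object-key, MBR)


def mbr_of(entries: list[Entry]) -> Rect:
    """Smallest rectangle covering every entry's MBR (entries must be non-empty)."""
    rects = [r for _, r in entries]
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def split_leaf(entries: list[Entry], capacity: int):
    """Split a leaf holding capacity + 1 sorted entries into Q and the new page Q'.

    Returns (q, q_new, dividing_key, mbr_q, mbr_q_new). The dividing key is the
    highest key remaining in Q; it stays in the leaf and is copied to the parent.
    """
    if len(entries) != capacity + 1:
        raise ValueError("split expects exactly capacity + 1 entries")
    mid = (capacity + 1) // 2
    q, q_new = entries[:mid], entries[mid:]   # upper half moves to Q'
    return q, q_new, q[-1][0], mbr_of(q), mbr_of(q_new)


# Usage: five entries in a leaf whose capacity is four.
entries = [(f"0{i}", (i, i, i + 1, i + 1)) for i in range(1, 6)]
q, q_new, div, q_mbr, q_new_mbr = split_leaf(entries, capacity=4)
print(div, q_mbr, q_new_mbr)  # 02 (1, 1, 3, 3) (3, 3, 6, 6)
```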

Root-Split Algorithm. If the root is full, it has to be split. The root-split algorithm is as follows:
1. R ← the current root page (R is either a leaf or a non-leaf page);
2. Allocate a new root R' (a non-leaf page);
3. If R is a leaf page then
      allocate a new leaf page N; copy the record with the dividing key to R'; move the upper half of R to N;
   else
      allocate a new non-leaf page N; move the record with the dividing key up to R'; move the upper half of R to N;
4. Return.

3.6 Delete Algorithm

The deletion algorithm guarantees that no node in the RB+-tree is less than half full. Assume that the object to be deleted is O and that its MBR is given by (xmin, ymin, xmax, ymax).
1. Form the object-key K = xmin||ymin||xmax||ymax, where "||" means concatenation of the coordinates of the MBR of object O as a text string;
2. Invoke the Exact-object Search algorithm; if the object-key is not found then return ("object not found");
3. Delete the object from the leaf node; if there is no underflow and no change in the page MBR then return;
4. If there is no underflow but the page MBR changes then propagate the updating (shrinking) of MBRs up the tree;
5. If underflow occurs then invoke the Merge-Nodes function; if merging is not possible then invoke the Redistribute function;
6. Return.
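Steps 2 to 5 at the leaf level can be sketched as follows. This is a minimal sketch, not the authors' implementation: the leaf is a list of (object-key, MBR) pairs sorted by key, and `leaf_delete` and `min_records` (standing for the leaf minimum from Section 2.2) are hypothetical names. The function reports "underflow" when the leaf falls below the minimum and whether the page MBR shrank; merging, redistribution, and MBR propagation are left to the caller.

```python
from bisect import bisect_left
from typing import Optional

Rect = tuple[int, int, int, int]   # (xmin, ymin, xmax, ymax)
Entry = tuple[str, Rect]           # (object-key, MBR)


def mbr_of(entries: list[Entry]) -> Optional[Rect]:
    """Smallest rectangle covering every entry's MBR (None for an empty page)."""
    if not entries:
        return None
    rects = [r for _, r in entries]
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def leaf_delete(leaf: list[Entry], key: str, min_records: int) -> tuple[str, bool]:
    """Remove key from a sorted leaf.

    Returns (status, mbr_changed) with status "not found", "ok" or "underflow".
    """
    i = bisect_left([k for k, _ in leaf], key)
    if i == len(leaf) or leaf[i][0] != key:
        return "not found", False          # Step 2
    old_mbr = mbr_of(leaf)
    del leaf[i]                            # Step 3
    status = "underflow" if len(leaf) < min_records else "ok"
    return status, mbr_of(leaf) != old_mbr  # Step 4 shrinking check


# Usage: a leaf that must keep at least two records.
leaf = [("01", (1, 1, 2, 2)), ("02", (2, 2, 3, 3)), ("03", (3, 3, 4, 4))]
print(leaf_delete(leaf, "02", min_records=2))  # ('ok', False)
print(leaf_delete(leaf, "03", min_records=2))  # ('underflow', True)
```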

MERGE-NODES FUNCTION

Merging pages is the process of moving the contents of one page to a sibling page that has the same parent page; we only merge to the left. The emptied page is then removed from the tree. Assume that the object O to be deleted is found in the leaf page Q, whose parent is P, and that the deletion of O causes Q to underflow. Then the merging algorithm is performed as follows.
1. If P is the root page with only two child leaf pages Q and L then
      if Q is the rightmost child of P and L has room to accommodate all records in Q then /* decrease tree height to one */
         move all records from Q to L; remove P and Q from the tree; mark L as the new root in the tree header;
      if Q is the leftmost child of P and Q has room to accommodate all records in L then
         move all records from L to Q; remove P and L from the tree; mark Q as the new root in the tree header;
      return;
2. If P is the root page with only two non-leaf child pages Q and L then
      if Q is the rightmost child of P and L has room to accommodate all records in Q then /* decrease tree height */
         move the first record in P to L after the last record in L; move all records from Q to L; remove P and Q from the tree; mark L as the new root in the tree header;
      if Q is the leftmost child of P and Q has room to accommodate all records in L then
         move the first record in P to Q as the last record in Q; move all records from L to Q; remove P and L from the tree; mark Q as the new root in the tree header;
      return;
3. If P is not the root page (or is a root page with more than two children) and Q is the rightmost child of P then
      L ← the direct left sibling of Q that has P as its parent;
      if L has room to accommodate all records in Q then
         move all records from Q to L; compute the new MBR of L;
         update the MBR entry in P that points to L; /* MBR updating may propagate up to the root */
         if P is not the root then copy the highest key in L to the appropriate upper page;
         if Q is a leaf page then remove Q from the tree;
         remove the entry in P that points to Q;
         if there is an underflow in P then Q ← P; go to Step 1; else return;
4. If P is not the root page (or is a root page with more than two children) and Q is not the rightmost child of P then
      R ← the direct right sibling of Q that has P as its parent;
      if Q has room to accommodate R's records then
         move all records from R to Q; compute the new MBR of Q;
         update the MBR entry in P that points to Q; /* MBR updating may propagate up to the root */
         copy the highest key in Q to the entry in P that points to Q;
         if Q is a leaf page then remove R from the tree;
         remove the entry in P that points to R;
         if there is an underflow in P then Q ← P; go to Step 1; else return;
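The core of Steps 3 and 4 for leaf pages, moving every record of the right page into its left sibling when it fits, can be sketched as follows. This is a minimal leaf-only sketch under assumptions: pages are lists of (object-key, MBR) pairs sorted by key, and `merge_into_left` and `capacity` are hypothetical names. It returns the merged page together with the highest key and MBR that the parent entry would be updated with, or None when the merge is not possible and the caller should fall back to redistribution. The non-leaf case, which also pulls a record down from P, is not covered.

```python
from typing import Optional

Rect = tuple[int, int, int, int]   # (xmin, ymin, xmax, ymax)
Entry = tuple[str, Rect]           # (object-key, MBR)


def mbr_of(entries: list[Entry]) -> Rect:
    """Smallest rectangle covering every entry's MBR (entries must be non-empty)."""
    rects = [r for _, r in entries]
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def merge_into_left(left: list[Entry], right: list[Entry],
                    capacity: int) -> Optional[tuple[list[Entry], str, Rect]]:
    """Move all records of `right` into `left` if they fit (leaf pages only).

    Returns (merged page, its highest key, its MBR), or None if it does not fit.
    """
    if len(left) + len(right) > capacity:
        return None                      # merge not possible: redistribute instead
    merged = left + right                # keys in left all precede keys in right
    return merged, merged[-1][0], mbr_of(merged)


# Usage: two underfull leaves whose parent entries would be combined.
left = [("01", (1, 1, 2, 2))]
right = [("02", (2, 2, 3, 3))]
merged, high_key, mbr = merge_into_left(left, right, capacity=2)
print(high_key, mbr)                              # 02 (1, 1, 3, 3)
print(merge_into_left(left, right, capacity=1))   # None
```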

4 ANALYTICAL COMPARISON OF RB+-TREE WITH R-TREE AND ITS VARIANTS

Many multidimensional index structures have been proposed for spatial database systems. However, none of them matches the performance of the B-tree. The proposed spatial index structure, the RB+-tree, has the most desirable features of the B+-tree. Its insertion, deletion, split, and merge algorithms are similar to those of the B+-tree (with minor differences due to the presence of MBRs in the RB+-tree structure). Moreover, the proposed index structure supports exact-object retrieval along a single path. Because the R-tree is the most widely used spatial index in spatial database management systems, and the R*-tree is an improvement over the R-tree, in the following we compare the proposed RB+-tree with the R-tree and the R*-tree.

4.1 Comparison of RB+-tree with R-tree

The R-tree is based on heuristic optimization of the area of the enclosing rectangle in each inner node. We compare the RB+-tree with the R-tree with respect to insertion, deletion, and exact-object search.

4.2 Insertion Comparison

In the original R-tree [6], new objects are inserted into leaf nodes. At each level, we pick the entry of a node that either contains the new object's MBR or needs the least enlargement to include it. If several entries satisfy this condition, the one with the smallest area is selected. Finally, the object is inserted into the chosen leaf if the page has room for it; otherwise a split takes place. With minimization of the sum of the areas of the two resulting nodes as the driving criterion, Guttman [6] proposed three alternative split algorithms, of linear, quadratic, and exponential complexity. In comparison, insertion in the RB+-tree is simple: it is the same as insertion in the B+-tree. Page split is simple too; there are no extensive calculations to decide how entries are divided between the two nodes when a page needs to be split, as in [6]. The simplicity of insertion and page split in the RB+-tree, which are the same as in the B+-tree, suggests that the RB+-tree outperforms the R-tree in this regard.
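The R-tree's per-level choice described above (least area enlargement, ties broken by smallest area) can be sketched for contrast with the RB+-tree, which picks a child by key comparison exactly as a B+-tree does. This is a minimal sketch of that one decision only, not a full R-tree insert; `choose_subtree`, `area`, and `enlarge` are hypothetical helper names.

```python
Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def enlarge(r: Rect, s: Rect) -> Rect:
    """Smallest rectangle covering both r and s."""
    return (min(r[0], s[0]), min(r[1], s[1]), max(r[2], s[2]), max(r[3], s[3]))


def choose_subtree(entry_mbrs: list[Rect], new: Rect) -> int:
    """Index of the entry needing least enlargement; ties go to the smallest area."""
    return min(range(len(entry_mbrs)),
               key=lambda i: (area(enlarge(entry_mbrs[i], new)) - area(entry_mbrs[i]),
                              area(entry_mbrs[i])))


# Usage: the second entry already contains the new MBR, so it needs no enlargement.
print(choose_subtree([(0, 0, 10, 10), (20, 20, 22, 22)], (21, 21, 21.5, 21.5)))  # 1
# Both entries contain the new MBR; the tie goes to the smaller rectangle.
print(choose_subtree([(0, 0, 10, 10), (0, 0, 4, 4)], (1, 1, 2, 2)))  # 1
```

Every candidate entry requires area computations at every level, whereas an RB+-tree node needs only a binary search over its sorted keys.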

4.3 Deletion Comparison

To delete an object O from an R-tree whose root is R, multiple paths may have to be followed to locate the leaf node(s) that contain the MBR of the object. The deleter invokes the algorithm FindLeaf, which traverses the tree from the root down to the leaf nodes: if R is not a leaf node, each entry Ei in R is checked to determine whether Ei overlaps with O, and for each such entry FindLeaf is invoked on the subtree whose root is pointed to by Ei, until O is found or all entries have been checked. In the worst case, the object O overlaps with a large number of leaf nodes, and a large part of the R-tree must be searched. If the object is not found, the algorithm stops. Otherwise, the object is removed from the leaf L. If L had just m entries before the deletion, the deletion causes L to become under-full. If that happens, the node is eliminated and each of its entries is reinserted later; reinsertion sometimes causes nodes to split. Algorithm Condense-Tree ascends from leaf L to root R, adjusting covering rectangles and propagating node elimination upward as necessary.

Deletion in the RB+-tree is the same as in the B+-tree. Given an object O to be deleted, the number of pages accessed from the root R to a leaf L is equal to the height of the tree, even if O overlaps with a large number of leaf nodes; deletion follows one path from the root page to the leaf page. Therefore, the RB+-tree finds the object to be deleted faster than the R-tree. Also, if removing the object O from the leaf node causes the leaf to underflow, merging nodes in the RB+-tree is simple and the same as in the B+-tree, i.e., the records in the under-full page are moved to a sibling page, whereas in the R-tree the records in the under-full page are reinserted, which might cause a page split.
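The multi-path behaviour of FindLeaf can be made concrete with a small sketch that counts the pages it visits. This is a minimal sketch under assumptions: `RNode` is a hypothetical in-memory R-tree page, and the overlap test uses closed intervals (so touching rectangles count as overlapping). In the toy tree below, both children of the root overlap the target MBR, so FindLeaf reads three pages in a tree of height two, whereas an RB+-tree search reads exactly one page per level.

```python
from dataclasses import dataclass, field
from typing import Optional

Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    """Closed-interval overlap test (touching rectangles count as overlapping)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class RNode:
    # Hypothetical R-tree page. Leaf entries: (mbr, object_id).
    # Non-leaf entries: (mbr, child RNode).
    leaf: bool
    entries: list = field(default_factory=list)


def find_leaf(node: RNode, mbr: Rect, obj_id: str, visited: list) -> Optional[RNode]:
    """Depth-first FindLeaf: may descend into every child whose MBR overlaps."""
    visited.append(node)
    if node.leaf:
        hit = any(m == mbr and oid == obj_id for m, oid in node.entries)
        return node if hit else None
    for m, child in node.entries:
        if intersects(m, mbr):
            found = find_leaf(child, mbr, obj_id, visited)
            if found is not None:
                return found
    return None


# Usage: the first (wrong) subtree is searched before the object is found.
l1 = RNode(True, [((0, 0, 5, 5), "a")])
l2 = RNode(True, [((3, 3, 8, 8), "b")])
root = RNode(False, [((0, 0, 5, 5), l1), ((3, 3, 8, 8), l2)])
visited = []
print(find_leaf(root, (3, 3, 8, 8), "b", visited) is l2, len(visited))  # True 3
```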

15

zero width.) The first data set, "rrlines-window-random", contains completely random window rectangles: x1, y1, x2 and y2 are

4.4 Exact-object search The search for a certain object in the RB+-tree is similar to the key search in B+-tree. That is, the search follows only one path, and the number of pages to be accessed is equal to the height of the tree. In contrast, the search for an object in the R-tree cannot guarantee that only one search path is followed when executing an exact-match query. In the worst case, it may occur that an object O overlaps with MBRs of many leaf pages, in which case most of the R-tree is searched for such an object. Hence, the RB+-tree clearly outperforms the R-tree and its variants in the exact-match search.

4.5 Experimental comparison of RB+-tree and Rtree The window query algorithm in RB+-tree is identical to that of the R-tree. However, due to the sophisticated insert/split algorithms used in the R-tree and its variant we do not expect the performance of the range searching (window queries) in the RB+-tree index structure to match that of the Rtree or its variants. In order to compare the performance of window queries for RB+-tree and R-tree, we run an implementation of R-tree on real data for railroads in Germany. The railroads map for the sample data is shown in Figure 3. The data set “rrlines” is found at http://www.rtreeportal.org/spatial.html. The R-tree implementation is the basic Guttman R-tree with Quadratic Split, m=2. In order to compare the window queries in RB+-tree with the window query in R-tree, three random sets of window queries are used:  rrlines-window-random  rrlines-window-1  rrlines-window-5 each one containing 10000 random windows. All windows are completely inside the smallest rectangle that covers all the MBRs in the rrlines data set; thus, all X coordinates are between 482 and 9500, and all Y coordinates are between 406 and 7904 (the numbers are the smallest and largest X and Y coordinates in the rrlines data). Any zerowidth windows are left out, so none of the windows have x1=x2 or y1=y2. (Zero-width window queries cannot find any objects, because the "overlap" criterion used for window queries (r.x1 < q.x2 and q.x1 < r.x2 and r.y1 < q.y2 and q.y1 < r.y2) is not true for any MBR r, if the query rectangle q has

Fig. 3. Road-Rail Map in Germany random numbers in the area given above (sorted so that x1 < x2 and y1 < y2). With completely random rectangles, the number of objects returned by the different window queries varies greatly, because the sizes of the windows are random (i.e., some of the queries return nothing, while some return almost every object in the tree). So we thought it would be useful to have queries with a more predictable "selectivity", and we created the other two data sets using random rectangles with a fixed area. The data set "rrlines-window-1" has window rectangles whose area is about 1% of the total area of the covering rectangle in the rrlines data set. Similarly, the data set "rrlineswindow-5" has windows with area 5% of the total area. (The rectangles were created by choosing a random width, then calculating a height that gives the wanted area, and finally choosing a random position x1,y1 for the rectangle so that it is still completely inside the covering rectangle of the data set.) The rrlines data set contains 36334 MBRs, but 4 of these are duplicates, the duplicated MBRs where not inserted into both RB+-tree and R-tree , so R-tree and RB+-tree both contain 36330 objects. Table 1 show some results for RB+-tree and R-tree using the rrlines data set. The number of pages occupied by R-tree is 311 pages. However the number of pages allocated for RB+-tree is 194. The result shows that, RB+-tree uses 2/3 of the space occupied by R-tree, that is, RB+-tree saves a lot of space but this also increase the overlapping of the MBRs which effects the performance of the window query. As a result of running window queries on R-tree; the numbers of objects returned by the queries in the three data sets are shown the table 2: Thus, for example, the 10000 window queries in the

JOURNAL OF COMPUTING, VOLUME 4, ISSUE 2, FEBRUARY 2012, ISSN 2151-9617 https://sites.google.com/site/journalofcomputing WWW.JOURNALOFCOMPUTING.ORG

rrlines-window-1 data set returned an average of 441 objects/query, which is about 1% of all 36330 objects in the

16

TABLE 3 NUMBER OF OBJECTS RETURN BY RB+-TREE USING SAMPLE DATA SE

TABLE 1 COMPARISON OF RB+-TREE AND R-TREE USING TEST DATA

Data Set

Number of objects % of all 36330 returned by objects window query

RB+-tree

R-tree

Data set used

Real test data

Real test data

No. of MBRs

36334

36334

rrlines-windowrandom

5566

35056

0%

15% 96%

4

rrlines-window-1 0

441

1435

0%

1%

4%

rrlines-window-5 15

2286

4808

0%

6%

13%

No of duplicate MBRs

4

No. of records inserted

36330

36330

Block size

4069 byte

4069 byte

Size of index

194

311

Tree height

3

3

Max & min of records in non-leaf nodes

max=112 min= 56

max=204 min= 2

Max & min of records in Leafnodes

Max= 255 min= 127

max= 204 min= 2

TABLE 2 NUMBER OF OBJECTS RETURNED BY R-TREE USING SAMPLE DATA SETS

Data Set | Objects returned by one window query (min, avg, max) | % of all 36330 objects (min, avg, max)
rrlines-window-random | 0, 5566, 35056 | 0%, 15%, 96%
rrlines-window-1 | 0, 441, 1435 | 0%, 1%, 4%
rrlines-window-5 | 15, 2286, 4808 | 0%, 6%, 13%

TABLE 3 NUMBER OF OBJECTS RETURNED BY RB+-TREE USING SAMPLE DATA SETS

Data Set | Objects returned by one window query (min, avg, max) | % of all 36330 objects (min, avg, max)
rrlines-window-random | 0, 5566, 35056 | 0%, 15%, 96%
rrlines-window-1 | 0, 441, 1435 | 0%, 1%, 4%
rrlines-window-5 | 15, 2286, 4808 | 0%, 6%, 13%

TABLE 4 PAGES TOUCHED BY WINDOW QUERY USING R-TREE

Data Set | Pages touched by one window query (min, avg, max) | % of all 311 pages (min, avg, max)
rrlines-window-random | 1, 66.90, 310 | 0%, 22%, 100%
rrlines-window-1 | 1, 21.80, 58 | 0%, 7%, 19%
rrlines-window-5 | 3, 40.74, 87 | 1%, 13%, 28%

TABLE 5 PAGES TOUCHED BY WINDOW QUERY USING RB+-TREE

Data Set | Pages touched by one window query (min, avg, max) | % of all 194 pages (min, avg, max)
rrlines-window-random | 1, 75.89, 194 | 1%, 39%, 100%
rrlines-window-1 | 1, 97.07, 194 | 1%, 50%, 100%
rrlines-window-5 | 4, 108.71, 194 | 2%, 56%, 100%

tree. A few of the queries in the rrlines-window-random and rrlines-window-1 data sets returned 0 objects. These are probably query windows that do not overlap any object. Table 3 shows the results of our RB+-tree for the same three window-query data sets. The results of the R-tree in Table 2 and of the RB+-tree in Table 3 are exactly the same. These two tables serve as a check that both indexes were built from the same data and that the queries behave in the same way. The actual measurements, i.e. the number of pages touched by these window queries, are shown in Table 4 for the R-tree and in Table 5 for the RB+-tree. A window query in the rrlines-window-random data set read an average of 66.90 R-tree pages, which is about 22% of all 311 pages in the R-tree. In the same data set, a window query read an average of 75.89 RB+-tree pages, which is about 39% of all 194 pages in the RB+-tree. The minimum of 1 page in the rrlines-window-random and rrlines-window-1 data sets means that at least one window query did not need to descend below the root page, i.e., none of the children of the root had MBRs that overlapped the query window.
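The one-page minimum can be seen in a generic R-tree-style window query, sketched below in Python. It is a simplified recursive search in the spirit of Guttman's R-tree search [6], not the authors' implementation. The `Rect`, `Node` and `window_query` names and the toy tree are hypothetical. Each page read increments a counter, and a subtree is visited only if its MBR overlaps the window. A window that overlaps no entry of the root therefore costs exactly one page.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    """Axis-aligned rectangle (an MBR or a query window)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def overlaps(self, other: "Rect") -> bool:
        # Closed rectangles overlap unless one lies entirely to one side of the other.
        return not (self.xmax < other.xmin or other.xmax < self.xmin or
                    self.ymax < other.ymin or other.ymax < self.ymin)


@dataclass
class Node:
    """One index page: (MBR, child Node) pairs, or (MBR, object id) pairs in a leaf."""
    entries: list
    leaf: bool = False


def window_query(node: Node, window: Rect, stats: dict) -> list:
    """Return ids of objects whose MBR overlaps `window`; count page reads in stats["pages"]."""
    stats["pages"] += 1  # reading this page is one page access
    result = []
    for mbr, item in node.entries:
        if mbr.overlaps(window):
            if node.leaf:
                result.append(item)
            else:
                # Several children may overlap the window, so several paths may be followed.
                result.extend(window_query(item, window, stats))
    return result


# Toy two-level tree (hypothetical data): a root page with two leaf pages.
leaf_a = Node([(Rect(0, 0, 1, 1), "a1"), (Rect(2, 2, 3, 3), "a2")], leaf=True)
leaf_b = Node([(Rect(10, 10, 11, 11), "b1")], leaf=True)
root = Node([(Rect(0, 0, 3, 3), leaf_a), (Rect(10, 10, 11, 11), leaf_b)])

miss_stats = {"pages": 0}
miss_ids = window_query(root, Rect(5, 5, 6, 6), miss_stats)  # overlaps no root entry
hit_stats = {"pages": 0}
hit_ids = window_query(root, Rect(0.5, 0.5, 2.5, 2.5), hit_stats)
print(miss_ids, miss_stats["pages"])  # [] 1
print(hit_ids, hit_stats["pages"])    # ['a1', 'a2'] 2
```

Sibling MBRs may overlap each other, so a single window can also force several subtrees to be read. For this reason, the number of pages a window query touches depends on how well the split algorithm limits overlap.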

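A comparison of Table 4 and Table 5 shows that the R-tree outperforms the RB+-tree on window queries. On average, one window query touched 22%, 7% and 13% of the R-tree pages, but 39%, 50% and 56% of the RB+-tree pages, for the rrlines-window-random, rrlines-window-1 and rrlines-window-5 data sets respectively.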
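The percentages quoted above can be re-derived from Tables 4 and 5 with a few lines of Python. The inputs are copied from the tables. The RB+-tree/R-tree page ratio is an extra derived quantity that the paper does not report. The output makes explicit that the RB+-tree reads more pages in absolute terms even though its index is smaller (194 versus 311 pages).

```python
# Average pages touched by one window query, copied from Tables 4 and 5.
R_TREE_PAGES = 311    # total pages in the R-tree index
RB_TREE_PAGES = 194   # total pages in the RB+-tree index
AVG_PAGES = {
    # data set: (R-tree average, RB+-tree average)
    "rrlines-window-random": (66.90, 75.89),
    "rrlines-window-1": (21.80, 97.07),
    "rrlines-window-5": (40.74, 108.71),
}


def share(pages: float, total: int) -> float:
    """Pages touched as a percentage of all pages in the index."""
    return 100.0 * pages / total


def compare() -> dict:
    """Per data set: (% of R-tree read, % of RB+-tree read, RB+-tree/R-tree page ratio)."""
    return {
        name: (share(r, R_TREE_PAGES), share(rb, RB_TREE_PAGES), rb / r)
        for name, (r, rb) in AVG_PAGES.items()
    }


for name, (r_pct, rb_pct, ratio) in compare().items():
    print(f"{name}: R-tree {r_pct:.0f}%, RB+-tree {rb_pct:.0f}%, ratio {ratio:.2f}")
# rrlines-window-random: R-tree 22%, RB+-tree 39%, ratio 1.13
# rrlines-window-1: R-tree 7%, RB+-tree 50%, ratio 4.45
# rrlines-window-5: R-tree 13%, RB+-tree 56%, ratio 2.67
```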

4 CONCLUSION

Modern spatial database systems are commonly used to store spatial data. A multidimensional index must answer both exact-match queries and range queries. The data therefore needs to be organized so that spatial proximity is preserved and both update and query operations can be performed efficiently. Several spatial indexing mechanisms have been proposed to meet these needs. Most spatial database management systems use the R-tree or one of its variants as an index for efficient access to the spatial objects in the database. However, the R-tree index structure suffers from some drawbacks. For example, a page split in the R-tree and its variants is a very expensive operation. In addition, both searching for an object and deleting an object may follow several paths from the root page to the target leaf (data) page. These deficiencies can hurt the performance of the R-tree in terms of the number of accessed disk pages. Moreover, the recovery protocols for the R-tree and its variants can be quite complex.

TABLE 6 COMPARISON OF RB+-TREE, R-TREE AND R*-TREE

Operation | RB+-tree | R-tree | R*-tree
Exact object search | One path is followed | One or several paths are followed | One or several paths are followed
Window query | Inferior, as the simple page split creates more overlap | Superior, as a complex page split is used to minimize overlap | Superior, as a complex page split is used to minimize overlap
Object insertion | Simple (may split full pages) | Sophisticated, expensive computations | Expensive computations
Page split | Simple and fast | Expensive, sophisticated | Expensive, sophisticated
Object deletion | Simple (may merge an underflow page) | Not simple (reinsert objects in case of underflow) | Not simple (reinsert objects in case of underflow)
Page merge | Simple and fast | No (reinsert) | No (reinsert)
Record redistribution | Simple and fast | No | No

One-dimensional data has a natural ordering, which the B-tree exploits. The B-tree is the most widely used index structure in traditional database systems because its insert, delete, split and merge operations are very efficient. In this paper, we proposed a new multidimensional spatial index called the RB+-tree, which combines some of the good properties of the B+-tree and the R-tree. The RB+-tree performs search, insert and delete operations as in B+-trees: few pages are accessed, and only one path is followed for each search, insert or delete operation. The R-tree uses the window query for all operations. The RB+-tree, in contrast, uses the window query for range search only and uses an exact-object search for all other operations. The RB+-tree therefore uses two types of search algorithm. (1) Exact-object search is used when searching for, inserting or deleting an object. This algorithm follows one path from the root to a leaf, so the number of pages accessed by any of these operations equals the height of the tree. (2) Window-query search is used only to answer window queries. The RB+-tree index-structure modification operations, such as page split, merge, redistribute, increase-index-height and decrease-index-height, are performed in a simple way as in B+-trees. The R-tree and its variants, by comparison, use a complex page split. We do not expect range searching (window queries) in our new index structure to match the performance of the R-tree or its variants, because the R-tree and its variants use sophisticated split algorithms to minimize overlap. However, the search, insert and delete operations and the index-structure modification operations of the RB+-tree are simpler and more efficient than those of the R-tree and its variants. Moreover, many existing concurrency and recovery protocols [12, 13] can be used for the RB+-tree.
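The claim that exact-object search reads exactly as many pages as the tree is high can be illustrated with an ordinary B+-tree descent. The Python sketch below assumes, as a simplification, that every object is identified by a one-dimensional key. It also assumes that each non-leaf page partitions the key space among its children without overlap. This section does not describe how the RB+-tree derives such an ordering from spatial objects, so the integer keys, the `Page` class and `exact_search` are hypothetical. Because exactly one child can contain a given key, the search follows a single path. As a result, `pages_read` always equals the tree height, which is 3 for both indexes in Table 1.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field


@dataclass
class Page:
    keys: list                                    # sorted keys; separators in a non-leaf page
    children: list = field(default_factory=list)  # child pages; empty for a leaf page

    @property
    def is_leaf(self) -> bool:
        return not self.children


def exact_search(root: Page, key: int) -> tuple:
    """Follow a single root-to-leaf path; return (found, pages_read)."""
    page, pages_read = root, 1
    while not page.is_leaf:
        # children[i] covers keys k with keys[i-1] <= k < keys[i], so exactly one child qualifies
        page = page.children[bisect_right(page.keys, key)]
        pages_read += 1
    i = bisect_left(page.keys, key)
    return (i < len(page.keys) and page.keys[i] == key), pages_read


# Hypothetical height-3 index over integer keys 1..8.
leaves = [Page([1, 2]), Page([3, 4]), Page([5, 6]), Page([7, 8])]
level1 = [Page([3], leaves[:2]), Page([7], leaves[2:])]
root = Page([5], level1)

print(exact_search(root, 6))  # (True, 3)
print(exact_search(root, 9))  # (False, 3)
```

In an R-tree, by contrast, sibling MBRs may overlap. A search for one object may therefore have to try several children at the same level before it reaches the leaf that holds the object.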

Future Work
We will run more experiments to evaluate the performance of window queries and of the RB+-tree in general. We will also investigate how to improve the performance of the RB+-tree.

REFERENCES

[1] P. K. Agarwal, M. de Berg, J. Gudmundsson, M. Hammar and H. J. Haverkort: “Box-trees and R-trees with Near-Optimal Query Time”, Proceedings Symposium on Computational Geometry, pp. 124-133, Medford, MA, 2001.
[2] C. H. Ang and T. C. Tan: “New Linear Node Splitting Algorithm for R-trees”, Proceedings 5th SSD Conference, pp. 339-349, Berlin, Germany, 1997.
[3] N. Beckmann, H. P. Kriegel, R. Schneider and B. Seeger: “The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles”, Proceedings ACM SIGMOD Conference, pp. 322-331, 1990.
[4] S. Brakatsoulas, D. Pfoser and Y. Theodoridis: “Revisiting R-tree Construction Principles”, Proceedings 6th ADBIS Conference, pp. 149-162, Bratislava, Slovakia, 2002.
[5] Y. Garcia, M. Lopez and S. Leutenegger: “On Optimal Node Splitting for R-trees”, Proceedings 24th VLDB Conference, pp. 334-344, New York, NY, 1998.
[6] A. Guttman: “R-trees: a Dynamic Index Structure for Spatial Searching”, Proceedings ACM SIGMOD Conference, pp. 47-57, Boston, MA, 1984.
[7] P. W. Huang, P. L. Lin and H. Y. Lin: “Optimizing Storage Utilization in R-tree Dynamic Index Structure for Spatial Databases”, Journal of Systems and Software, Vol. 55, pp. 291-299, 2001.
[8] J. Jin, N. An and A. Sivasubramanian: “Analyzing Range Queries on Spatial Data”, Proceedings 16th IEEE ICDE Conference, pp. 525-534, San Diego, CA, 2000.
[9] I. Kamel and C. Faloutsos: “On Packing R-trees”, Proceedings 2nd CIKM Conference, pp. 490-499, Washington, DC, 1993.
[10] I. Kamel and C. Faloutsos: “Hilbert R-tree: An Improved R-tree Using Fractals”, Proceedings 20th VLDB Conference, pp. 500-509, September 12-15, 1994.
[11] R. K. V. Kothuri, S. Ravada and D. Abugov: “Quadtree and R-tree Indexes in Oracle Spatial: a Comparison Using GIS Data”, Proceedings ACM SIGMOD Conference, pp. 546-557, Madison, WI, 2002.
[12] C. Mohan: “ARIES/IM: an Efficient and High Concurrency Index Management Method Using Write-Ahead Logging”, Proceedings ACM SIGMOD Conference, pp. 371-380, ACM Press, New York, NY, 1992.
[13] C. Mohan, D. Haderle, B. Lindsay, H. Pirahesh and P. Schwarz: “ARIES: a Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging”, ACM Transactions on Database Systems, Vol. 17, No. 1, pp. 94-162, 1992.
[14] H. Samet: The Design and Analysis of Spatial Data Structures, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1990.
[15] H. Samet: Applications of Spatial Data Structures: Computer Graphics, Image Processing, and GIS, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1990.


[16] T. Sellis, N. Roussopoulos and C. Faloutsos: “The R+-tree: a Dynamic Index for Multidimensional Objects”, Proceedings 13th VLDB Conference, pp. 507-518, Brighton, England, 1987.
[17] T. Sellis, N. Roussopoulos and C. Faloutsos: “Multidimensional Access Methods: Trees Have Grown Everywhere”, Proceedings 23rd VLDB Conference, pp. 13-14, Athens, Greece, 1997.
[18] S. Shekhar, S. Chawla, S. Ravada, A. Fetterer, X. Liu and C. T. Lu: “Spatial Databases: Accomplishments and Research Needs”, IEEE Transactions on Knowledge and Data Engineering, Vol. 11, No. 1, 1999.
