Spatial Databases

Spatial Data Management Chapter 28

Database management Systems, 3ed, R. Ramakrishnan and J. Gehrke


Types of Spatial Data

Point Data
  - Points in a multidimensional space
  - E.g., raster data such as satellite imagery, where each pixel stores a measured value
  - E.g., feature vectors extracted from text

Region Data
  - Objects have spatial extent with location and boundary
  - DB typically uses geometric approximations constructed using line segments, polygons, etc., called vector data


Types of Spatial Queries

Spatial Range Queries
  - Find all cities within 50 miles of Madison
  - Query has associated region (location, boundary)
  - Answer includes overlapping or contained data regions

Nearest-Neighbor Queries
  - Find the 10 cities nearest to Madison
  - Results must be ordered by proximity

Spatial Join Queries
  - Find all cities near a lake
  - Expensive: join condition involves regions and proximity
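
The three query types can be illustrated with brute-force scans over point data. This is only a minimal sketch: the city and lake coordinates are hypothetical planar positions in miles, regions are simplified to points, and the function names are chosen here for illustration. A real system would use a spatial index instead of scanning everything.

```python
import math

# Hypothetical planar coordinates (in miles); not real geographic data.
cities = {"A": (0.0, 0.0), "B": (30.0, 30.0), "C": (60.0, 0.0), "D": (10.0, 5.0)}
lakes = {"L1": (12.0, 5.0), "L2": (100.0, 100.0)}


def dist(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def range_query(points, center, radius):
    """Spatial range query: names of all points within `radius` of `center`."""
    return sorted(name for name, p in points.items() if dist(p, center) <= radius)


def nearest_neighbors(points, center, k):
    """Nearest-neighbor query: the k closest points, ordered by proximity."""
    return sorted(points, key=lambda name: dist(points[name], center))[:k]


def spatial_join(left, right, max_dist):
    """Spatial join: all (left, right) pairs within `max_dist` of each other."""
    return sorted((a, b) for a, pa in left.items() for b, pb in right.items()
                  if dist(pa, pb) <= max_dist)
```

The join compares every city with every lake, which is why the slide calls spatial joins expensive.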


Applications of Spatial Data

Geographic Information Systems (GIS)
  - E.g., ESRI’s ArcInfo; OpenGIS Consortium
  - Geospatial information
  - All classes of spatial queries and data are common

Computer-Aided Design/Manufacturing
  - Store spatial objects such as surface of airplane fuselage
  - Range queries and spatial join queries are common

Multimedia Databases
  - Images, video, text, etc. stored and retrieved by content
  - First converted to feature vector form; high dimensionality
  - Nearest-neighbor queries are the most common


Single-Dimensional Indexes

B+ trees are fundamentally single-dimensional indexes.

When we create a composite search key B+ tree, e.g., an index on <age, sal>, we effectively linearize the 2-dimensional space, since we sort entries first by age and then by sal.

[Figure: example entries plotted in two dimensions, with ticks 11 to 13 on one axis and 10 to 80 on the other; the B+ tree order is traced through them. The entry values did not survive in this copy.]
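
The linearization described above can be sketched in a few lines. The <age, sal> entries below are hypothetical, picked only to fit the axis ranges of the slide's figure, and `position_gap` is a helper name invented for this illustration.

```python
# Hypothetical <age, sal> entries, chosen only to fit the slide's axis ranges.
entries = [(13, 75), (12, 20), (11, 80), (12, 10)]

# A composite-key B+ tree keeps leaf entries in lexicographic order:
# first by age, then by sal within equal ages.
leaf_order = sorted(entries)


def position_gap(a, b):
    """Number of positions separating two entries in the linearized order."""
    return abs(leaf_order.index(a) - leaf_order.index(b))
```

With these values, (11, 80) and (12, 10) are adjacent in the sorted order although they are far apart in sal, while (11, 80) and (13, 75), which are close in the plane, end up at opposite ends of the order.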

Multidimensional Indexes

A multidimensional index clusters entries so as to exploit “nearness” in multidimensional space.

Keeping track of entries and maintaining a balanced index structure presents a challenge!

[Figure: example entries plotted in two dimensions, with ticks 11 to 13 on one axis and 10 to 80 on the other; spatial clusters are shown alongside the B+ tree order.]

Motivation for Multidimensional Indexes

Spatial queries (GIS, CAD).
  - Find all hotels within a radius of 5 miles from the conference venue.
  - Find the city with population 500,000 or more that is nearest to Kalamazoo, MI.
  - Find all cities that lie on the Nile in Egypt.
  - Find all parts that touch the fuselage (in a plane design).

Similarity queries (content-based retrieval).
  - Given a face, find the five most similar faces.

Multidimensional range queries.
  - 50 < age < 55 AND 80K < sal < 90K
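
To see what a multidimensional range query costs on a single-dimensional index, the sketch below answers the age/sal query over a sorted list that stands in for the leaf level of a composite-key <age, sal> B+ tree. The data set is synthetic and the function name is an assumption made for this example, not part of any library.

```python
from bisect import bisect_left, bisect_right

# Synthetic <age, sal> entries (sal in thousands), sorted the way a
# composite-key B+ tree would order them at the leaf level.
entries = sorted((age, sal) for age in range(45, 61) for sal in range(10, 100, 5))


def range_query_linearized(entries, age_lo, age_hi, sal_lo, sal_hi):
    """Answer age_lo < age < age_hi AND sal_lo < sal < sal_hi on sorted entries.

    Only the age bounds narrow the scan; sal is checked entry by entry.
    Returns (matches, number_of_entries_scanned).
    """
    start = bisect_right(entries, (age_lo, float("inf")))
    stop = bisect_left(entries, (age_hi, float("-inf")))
    scanned = entries[start:stop]
    matches = [(a, s) for a, s in scanned if sal_lo < s < sal_hi]
    return matches, len(scanned)
```

Calling `range_query_linearized(entries, 50, 55, 80, 90)` on this data scans every entry with age 51 to 54 and then discards most of them on the sal condition, because the sort order clusters entries by age only.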


What’s the difficulty?

An index based on spatial location needed.
  - One-dimensional indexes don’t support multidimensional searching efficiently. (Why?)
  - Hash indexes only support point queries; want to support range queries as well.
  - Must support inserts and deletes gracefully.

Ideally, want to support non-point data as well (e.g., lines, shapes).

The R-tree meets these requirements, and variants are widely used today.


The R-Tree

The R-tree is a tree-structured index that remains balanced on inserts and deletes.

Each key stored in a leaf entry is intuitively a box, or collection of intervals, with one interval per dimension.

Example in 2-D:

[Figure: 2-D example with axes X and Y; labels: Root of R Tree, Leaf level.]
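
The idea of a key as a box, i.e. one interval per dimension, can be sketched as a small data type. The class and function names below are assumptions made for illustration; the overlap and cover tests and the enclosing-box helper are the kind of primitives an R-tree implementation might build on, not code from the textbook.

```python
from dataclasses import dataclass
from typing import Tuple

Interval = Tuple[float, float]  # (low, high), with low <= high


@dataclass(frozen=True)
class Box:
    """An n-dimensional box: one closed interval per dimension."""
    intervals: Tuple[Interval, ...]

    def overlaps(self, other: "Box") -> bool:
        """True if the boxes intersect in every dimension."""
        return all(lo1 <= hi2 and lo2 <= hi1
                   for (lo1, hi1), (lo2, hi2) in zip(self.intervals, other.intervals))

    def covers(self, other: "Box") -> bool:
        """True if `other` lies entirely inside this box."""
        return all(lo1 <= lo2 and hi2 <= hi1
                   for (lo1, hi1), (lo2, hi2) in zip(self.intervals, other.intervals))


def bounding_box(boxes):
    """Tightest box enclosing every box in a non-empty sequence."""
    dims = zip(*(b.intervals for b in boxes))
    return Box(tuple((min(lo for lo, _ in d), max(hi for _, hi in d)) for d in dims))
```

For example, `Box(((0, 2), (0, 2)))` overlaps `Box(((1, 3), (1, 3)))` but not `Box(((5, 6), (5, 6)))`, and `bounding_box` over all three gives a box that covers each of them.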

R-Tree Properties

Leaf entry = < n-dimensional box, rid >
  - This is Alternative (2), with key value being a box.
  - Box is the tightest bounding box for a data object.

Non-leaf entry = < n-dim box, ptr to child node >
  - Box covers all boxes in child node (in fact, subtree).

All leaves at same distance from root.

Nodes can be kept 50% full (except root).
  - Can choose a parameter m that is