Spatial Joins Using Seeded Trees - UCR CS

We conducted experiments with seeded-tree joins using three spatial join algorithms: STJ, RTJ, and BFJ. Algorithm STJ (Seeded Tree Join) is our algorithm,
Spatial Joins Using Seeded Trees*

Ming-Ling Lo and Chinya V. Ravishankar

Electrical Engineering and Computer Science Department
University of Michigan-Ann Arbor
1301 Beal Avenue, Ann Arbor, MI 48109
mingling, ravi@eecs.umich.edu

is

methods for

not

In

the

this

for

paper,

This

we

method trees

the

data

a technique

lists

during

construction

the tree construction

.umich.

be replaced

by smaller

seeded

trees

icantly

in terms

growth

during

significantly

I/O.

that

The

databases

creasing query

out

other

to

accesses. methods

penalties

filtering

using

tial

selection,

spatial find

all

spatial

a predefine

find

in

GIS

algorithms

into

incurred

between

This

method

the

objects

access

cilitate

such

operations

Most

research

on

operation.

are

window

contained area,

overlapping

spatial

years.

search

operations

Many

in-

and

methods

have

Examples queries,

within point

a particular been

on the spa-

or

which

which

in space.

devised

to fa-

by the Consortium Networking.

for

up both

the

SIGMOD 94, 5/94, Minneapolis, Minnesota, USA. © 1994 ACM 0-89791-639-5/94/0005..$3.50


similar

method

[LH92]

has

range

space

B+

tree. two

Several

spatial

Joining

database

algorithm

using

can

as some

tree-like

matching before

the

practice,

to

such study

the

and

exist

at

can large

both

descend for of

techniques

data

a general

sets.

Since

pairs

of

recorded

n +

1.

to hold with

models various

be

level

indices

The as long

the

n must to

trees,

sets

data

required

the used

indices.

traversal,

level

Analytical

indices

analyzed

proposed

two

for

tree

performance

tree-like

spatial

in tree

of memory be

amounts

of generalization

to join

is used

could

sets

z-

such

Algorithms

of the join

concept

indices

as R-trees.

by their indices

data

to these

decomposed

[Gun93]

of tree-like

amount

In

is first

using

systems, the

algorithm

the

information out,

joins

tree-nodes

join.

Join

Gunther

be applied order

[Ore89,

streams.

are abstractions

algorithm

proposed

be sequenced

algorithms

in relational

distance-

been

one-dimensional

proposed.

to spatial

can

algorithms

and

Index-Based

applicability

for

indices

Orenstein

spatial

are

exist

using

also

study

two

z-value

join

also been

which

under

using

Tree-like

join

queries.

can then

organized

must

z-order-based selection

at at join

[NHS84]

which

A

which

to merging

grid-files

spatial

spatial

elements,

that

before

proposed

to be frequent.

processing

sets

spatial

a spatial

sets if spatial

of pre-computation

method,

indices

Ore91]

breadth-first

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association of Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

assumes

based

builds data

accelerated

access

them.

those

be

indices.

method for two

as

generally

tree-like

sets are known

for

for joins

applications can

on

the overhead

data

join

and

data

It

main

for

speed

join

[Sam90].

*This work was supported in part by the Consortium for International Earth Science Information Networking.

time.

associated

[Rot91]

on the

Spatial

indices,

based

[Val87]

time

relevant

have

intersect-

queries, point

these trades

as the

1.1

of

index

done

operations

such

on join

those

index

building

into

received

has focused

join

of a join

joins

based

and

algorithms,

is used.

have

system

those

spatial

perform

systems

recent

in such

window

all objects

in

join

as the and

or spatial

selection

necessary

z-ordering,

been

applications.

but

Ore90,

signif-

GIS

Existing

to

joins

and

has

important

overlay.

be built

a large

work

most

expensive

invocation

up

little

of the

are

used

linked

allows

one

map

order

attention processing

databases

index

tree

speeds

spatial

CPU

seed-level

in

tree construction

of sequential using

spatial

The

tree

reducing

technique

show

those when

nodes

uses intermediate

The

join,

divided

are divided

The

thereby

and

numbers

of disk

except

and

relatively

spatial

on

Introduction

Spatial

ing

that

studies

outperform

are also lower

tree

disk accesses during

performance

1

edu

version

can also be used to filter

process.

of random

that

sets involved

levels.

construction,

We develop tree

of the data

to guide

size.

method

process. grown

seed levels

during

map

selections.

seeded trees at join

structures,

the

multiple

join

of

assumption

non-spatial

trees called

are R-tree-like

The

some input

Our

involving

uses knowledge

and

the existence This

a spatial

index

are used

construction.

sets.

involving

explore

seed levels

seed levels

assume

data

to speed up the join

Seeded

number

joins

applications

constructs

in the join

the

spatial

or for queries

dynamically

into

for

participating

realistic

overlays

time.

MI 48109

However,

Existing

layer

Department

Arbor

Arbor,

Abstract indices

Science

In such

high

fan-

to

used

were

techniques,

but

memory

constraints

The

R-tree

were

SRF87]

have

atively

simple

structure

spatial

objects

with

been

R-tree

is a B+-tree

tidimensional contains pointer ing

like

node

of all A

old), is its

ence

their

A

form

that

(mbr,

and mbr is the

refers

spatial

by entries

to

them

into

spatial for the

in the

units,

tial

and Dp

without

based

on

the

R-tree

each

computed a

[BKS93]

et al.

requires

or

variations. data

index.

techniques

to reduce

The

matching

starts

R-trees

for

the

matched

tree

are

to eliminate If the

As

a result,

process

the

was

results

order

technique also

than used

to find

on in

was

input

to a spatial it

result

from

the

Also,

when

inefficient

and/or fore

previous

in the join

to have some a spatial

been

constructed

costs,

pre-computation

to

join

Such

is invoked.

have

overlap

order

a derived Our ically

less

necessary access

input been

struc-

data done

requirements

dices

be-

tion,

can

the


in data

data

pre-

query,

only

if

a small it

may

selection

first

buildings

and

approach which

would

no

it is also common

spatial

for

Other

spatial and

queries,

requiring

sets

further

methods

original

methods could

for

be

very

operations buffering

such [SE90]

transformation complicating

call

a data

set

or non-spatial

are

derived

join.

the that

is

operation

FSR87,

for

of spatial

join.

of disk

the Thus,

mts

spatial

up

are not

of these

of spatial

they

accesses

were

incrementally, indices

Most cost

as

In particular,

these

construction.

data

most SRF87]

context.

to be built

to minimize

is to dynam-

However,

BKSS90,

algorithms

for all-at-once

number

queries

the

a different

assumed

are designed

such for

spatial for

construction

that

spatial

[Gut84,

designed

not

we

to supporting

methods

average

queries

If an spatial

to the

access

aggregation and

a spatial

sets.

of previous

related

such

for

data

set,

to support

indices

multiple output

convenience,

access

optimized

sets

and

for

remotely

sets

occur

build

originally

index

data

approach

The buffer

as that the

second

This

pre-computed

of an earlier

A page

Realistic

the

that

such

the

set

is the

For

the

though

join

situation, the output

also used,

some

for

a park

using

is large,

data

be only

original

I/O,

traversed.

for

between

Utilizing

also

disk

experiments.

require

with

applied

join.

or infeasible.

of the

so the methods

in been

the following

a non-spatial

spatial

re-classification,

may

used

Motivation these

have

consider

government-owned

joins

might

original

as

be

needed.

to decide was

to involve

such

1.2

all

parks

is government-owned,

applications,

query

set.

reduce

of all

and

data

could

be

to perform

an intermediate

joins,

not

buildings

the

In real-life

the

technique

costs.

and

exists.

over-

tests

in I/O

indices

algorithms,

can

of buildings

set

perform

index

consideradid

of R2.

the

involve

in those

Iil

the children

degrees

CPU

in performance

paper

further

were

DB

all

buildings

However,

number

Tikl

by

area

would

also used

decreases those

total

efficient

of comparisons

of a node

based

showed

substantial sizes were

the

of the

nodes

of R1

of overlapping was

children

or

contains

contains

overlap

existing

R-trees.

be more

When

To

sets DB

R-tree

that

query,

fraction

I/O.

reduced.

DP

and DP,

[BKS93],

leaf

the

R-tree

first

this

of disk

a plane-sweeping

significantly

the

the

Brinkhoff

when

and

child

data

all government-owned

denote

costs

pairs,

number

number

plane-sweeping pinning

any

child

the

in

no answer

and and

reduce

in which

will

algorithm

from

spa-

be easily

a park.

computed

a depth-first

reported

of a child

area,

child

Find

For of

traverses

intersection

children

not

area.

that

all buildings

with

of the

CPU

their

box

on one axis,

further

Q2:

of

It

nodes

in

We

two

two

area , and

DB

exist

operation.

common

for both

oper-

indices

costs,

root

described

amount

some

bounding

this

to

collection

recursively

are

reducing

for overlapping

sorted

tures

the

spatial

pre-computed

sets may

consider

some

Assuming

Find

then

overlap,

intersection

matching

of the

of the join

at

to

tion.

looking

I/O

consists

is straightforward.

trees.

representing

found

the

disk

a

resulting

both

the

boxes

used

All

example,

may selec-

of previous

spatial

Any

data

to the join

cover in

sets.

original

results other

no

of their joins

of previous

the

cases,

many

operations:

Ql:

a pre-

algorithm

then

techniques aimed

at reducing

were

lap

an

addition,

of some

data

in

to main-

method

have

discussions.

those

bounding Rz

the

area.

join

such

applied

that

the

algorithm

join

to

and

Results

in

component

Improvement

aimed

and

order.

in subsequent

cluded

and

children,

reached

matching

join

children

overlaps,

traversing

nodes

This

algorithm the

a join

set

algorithm CPU

by matching

down

The

matching

tree

proposed

participating

R-tree

R-tree

tree

for

buildings

higher-dimensional

attributes, in

satisfy

sets regardless In

of a series

results

derived

constructed

Brinkhoff

two

index

As

refer-

points.

of

newly

to

be cost-effective

all data

results

or the

Clearly,

efficiently

form

object,

on the

joins,

not

frequencies.

on non-spatial

bound-

R-trees

in whole

tions ations.

of the

a spatial

mul-

impossible

for

and

be required

node

even

it may

structures

patterns

cp is a

entries

rectangle.

objects

or transforming

where

minimum

described contains

old

stores

or

First,

index

usage

An

R-tree

cp),

tain

of

objects.

non-leaf

situations.

rel-

handling

as region

method

bounding

stored

clipping

access

be inconvenient FSR87,

to their

efficient

such

leaf-node

where

due

their

objects

minimum

BKSS90,

popularity

objects.

to a child

in depth.

[Gut84,

extent,

of the

mbr

considered

and

entries

node.

(mbr,

gaining

spatial

rectangle

child

not

and its variations

per

tend

in-

selec-

to reduce

spatial

selec-

tion

and

the worst-case

disk

space

ways

the

joins.

In

at join

do not

data

time,

For

trees. is

for

We

the

participating

situations

The

available

algorithm

taking

the

and

join

time.

at

operation,

a seeded

participating some results.

tree

Our

algorithm,

given

and

show

methods,

both

that

is

then

with

reasonable

Figure

of

the sets

using

matching

Our

when

paper

describes

is

the

detail.

organized

structure 3

construction

Section

of the seeded

techniques

Section

Section

6 concludes

follows.

behavior

presents

costs,

studies.

Section

as

and

Section

performance and

in

the

4

reports

5 discusses

this

to

2

with

trees

on

our

initial

tree.

initial

tree

that

issues,

Seeded

Let

us assume

that

DS,

for

with

an ordinary

R-tree the

which

index.

derived

The

let

central

that

tree DR

noted depends

TR

that not

index.

is to its

Let

make

expedite

the only

its

seeded

Ts denote for

the

seeded

tree

By

choosing

well,

we may

join

use

of

be the

join of

method costs.

constituent

Ds,

used

in It

has

R-tree-like data

objects

Ts.

Assume

boxes

the area.

If the

order

that

with tree

the

is to a

bounding

join

these

boxes

in TR.

are allowed

tree

but

initial


tree

However,

each

against

only

tree

one

bounding

for organizing

is optimized

is optimized so that

tree

for the

spatial data

seeded against

bounding

box

an

achieved,

the

tree two boxes

are organized box

be

Thus,

are different

when

If we will

in

will

TR.

selection

objects

smallest

BS4 but

the when

Ts in such

bounding

join.

with

of Ts

actually of

indices

‘for spatial

joins

shows

achieve

if the

seeded

1b

into

and

to be non-minimal

1c,

tree

each

DR.

to be inserted

root

are

the tree

be joined

to

BS3

in

Say we have

Figure

boxes

BS2,

1a.

of the

match

bounding

the

index

will

with

objects

organized

the

small

in spatial

are inserted

bounding BS1,

the criteria

children

are

this

a seeded

TR will

of four,

objects

boxes

matched

of

fan-outs

process

as in Figure

been

data

and

TR,

used

have

in Figure

that

boxes

to determine to

since

with

organization

TS,

we

instead

Furthermore,

for joining

fourteen

its that

join,

growth,

information

example

of the

data

the

suitably

bounding in Ts

join

tree tree

an

When the

a seeded

re-inserting

a spatial

tree

be matched

expect

of tree

TR and

into

for

we know

characteristics

process. an

tree

more

and

for

the

subsequently.

suggest

node.

will

will

hence

phenomena tree

root

and

improves

to guide

tree

by the

bounding

DR.

for

will

tree

the seeded

R-tree

the

performance on

we are given

the

TR

set

exists,

a seeded

the

to reduce

data

index

, for which

be constructed

R-tree

can to

spatial

matches

behind

information Ts

We

and

and

a derived

constructs

TR denote

and

process. DR

set,

idea

use available seeded

set DR

algorithm

R-tree

and

DS,

to join

pre-computed

data Our

data

the existing for

no

we want

a single

seeded

inserted

earlier

in an R-tree

Such

tree

tree

deleting

a seeded

importance

an R-tree

Trees

that

were

inserted

inserted

of TR can be used

is illustrated

2

data

objects

a small

is shaped

The

paper.

data

objects

data of the

of the

observed

from the

data The

[BKSS90].

characteristics

reduce

related

tree

constructing

we know

which

organization

of the

can start

join

in

BKSS90].

the initial

of starting

This

tree

order

performance

existing actual

the [Gut84,

a fraction

of our method,

and

it

It has also been

in

performance.

on

into

position

proposed

outperforms

also decide

phases.

in

Figure 1: Beneficial and non-beneficial formation of bounding boxes in a seeded tree.

compute

tree

construction

Data object in tree 2

a join

to

TM

Bounding box in tree 2



not

data

proceeds,

any

Bounding box in tree 1


(c)

it handles

for one of the

component

tree

pre-

encountering

our method

during

to

input

algorithm

matching

simplicity

experiments

use

our

dynamically,

algorithm

works

we will

as the tree its

matching

the

but

advantage

join

m

called

indices

constructed

The

method

but

[BKS93]

sets.

=

a

which

Thus,

the

Upon

tree is first

data

standard

join

with

indices

are constructed

about

(b)

at

available

in

sets.

S4

processing.

index,

pre-computed

trees

We need

to create

system spatial

data

of the

structures

R-tree-like

using

seeded

construction

information

of

require

where

practical.

a

type

both

sets.

problem

index

assume

not

sizes

join

this

using

does

data

spatial

address

al-

So is the spatial

of information

efficient

main

the

time.

m

available

is inexpensive

advantage

al-

of spatial

construction

example,

that

are not

context

information

at join

method

and to minimize

index

of these

we

join

seeded

method

takes

paper,

R-tree

its

exploit.

to support

spatial

exist

spatial

structure

and

time this

new

is useful

characteristics

a spatial

in the

existing

sets are known

distribution

In

characteristics

ones

there

that

data

at join

Such

important

addition,

gorithms

join

costs,

consumption, most

time

input

selection

and

when

can

create

the an

be inserted

as

slot

B2

slot

[Figure labels: seed levels, grown levels, seed node, grown node, grown subtree]

Figure

in

Figure

during

1c,

In

the

the

I/O

tree

first

and

seed levels

grown

from

small

number

children

levels

are the

root

nodes,

of the form

a non-leaf

objects

contained of the

form

spatial

object

in the

an additional

the following

node.

and

field,

data

mbr

if seed level (see

we assume

Seeding

contains to

seed

through In of

the

the

copied

levels

1 (the

T5

The

a simple

are

k – 1, and

to level tree

phase

a growing

phase,

The

tree

leaf

the

the

phase),

the

be

In

3.2).

In

never

do not

R-tree

up

some

T5

at levels

nodes.

The

set to NULL. a slot,

and

O (the span

The

the

seed

copying

fields nodes.

TR,

of the

of level

each

(mbr,

insertion

the

call

k – 1 of the copied in

level) level

k

the

into

growing

the tree.

seeded

tree,

seed

phase,

the

to

copying

over

tree

TR

to

the

best

long

box

B2

bounding

of the

seed

describes better

child

nodes

B1 and

are

the

level.

for

guide

B1 instead

slot will

deciding


minimal RI

(the

may

growing

never

changes.

grown

be

levels

levels.

The

described

two

in

seeding

tree

a large

dead

whereas

insertion

[Gut84], of B2, which the join

could

area If the

unchanged

area

object

penalize

increase

S1 would result

process.

the

consider

111 contains

minimal

and

be

minimum

will

bounding

R4.

As

and

B2 is a more

children.

to be copied

seeding

always

box

and

are

the

not

formed

bounding R2,

tree,

Simply

As an example,

R3

of its

the

tree.

from may

badly

tree.

of

of seed nodes

seeded

TS

squares

B1 has

growth

fields

boxes

tree

and

we use minimal

during

nodes

nodes

seed

will

of the

the

and

accesses

the

be

boxes

seed

the

the

the

box

Copying

where

B2 were

tree,

of

at

nodes

bounding

its children,

k – 1

must

level

insertion

into

guide

seeded from

description

fields

splitting

levels

rectangles box

fields

2.2.

the

contains

the

reachable

bounding

of the seed levels

of the seeded

3,

two

reflect

data

slot

data

performance

boxes

Figure

box

bounding

the

not

all box

of the

grown

strategy.

at level

nodes thus

crucial

performance

undergo

to their

seed

in the

into

k — 1 seed pair

the

the values

copied

fields

cp)

the

which

the

in a bounding

need

of

minimal

upwards

in Section

bounding

seeded k levels

seeding may

being

pointer

fields

data

from

before The

top

TR nodes

We

information

root

of the

O to k – 2 are set to point

level

phase.

from

the

is called

pointer

The

levels

over

R-tree

transformations

corresponding nodes

from levels

grown

is derived, box

simple

numbered

by

TR.

information bounding

clean-up

the

phase,

set

of a seeding

cp)

in the

insertion.

begins.

node

of

data

of mbr

bounding

during

propagates

behavior

value

true

structure

particular,

pointers

boxes

guide

box

fields

as needed

bounding

to

bounding

change

will

the

(mbr,

matching

pointer

can have

level).

seeding are

and

consists

null

the

pair

the

and

bounding

fields

tree

Since

of a seeded

have

slot

only

The

:3) Although a

Phase

construction

level

following

boxes.

bounding

into

detail

2.1

the

used

cp,

before

of all

exist.

The

have

construction, are

minimal

through

Section

grow.

tree

at the

and pointer

true

modified

filtering

shadow

box

contains

refers

is the

eventually

in a seed node,

to a child

in a seed node

insertion

discussion

old

will seeded

bounding

nodes

Thus,

a the

rectangle

where

tree

of the

nodes

tree

seed

As with

A leaf node

old), entries

from

tree

the

non-null

(:2) During

levels for

cp points

bounding

database,

shadow during

where

minimum

The

seed

seed

but

nodes

span

(:1) The

seed at the

level.

which

seed levels

seeded

the

tree

leaf

Figure 3: Efficient and inefficient bounding boxes

into


properties:

by

those

The

in the seeded

in the child

object.

The and

levels

to the

cp),

(mbr,

of

consecutively

grown

node

(mbr,

is the

TR to the

2).

continue

The

is achieved

nodes.

seed level

entries

performed

shape

be reduced

consists

seed nodes,

and

last

and mbr

R-tree tree

grown

of levels.

of the

of that

can

goal

(see Figure

called

R-tree

box

of the

are called

entries node,

cost

this

a seeded

levels

start

CPU

Figure

The

k levels

at the

levels

and

method,

Structurally,

grown

L L

process.

seeded

copying tree.

R2

Figure 2: Example of a seeded tree.

both

the join

0

subtree



J

l–

a result,

only

compact

bounding into

badly and boxes

the seeded

as the

criterion

be inserted

in unnecessary

into disk

Other

information

box

field

had

copied

the

seeding

into

of seed

can nodes.

the

In

the

for

of seeding

we

copying

into

previous

been

if we

boxes

inserted

from

SI — .

from

properly

three


A

bounding

example,

investigate

information tree

the

of bounding

have

study

box

copied

points

S1 would

this

strategies fields

In

center

tree,

B2.

be

s

copy

the

minimal

(a)

different

the

bounding

S1

nodes:

C2:

copy

the

center

bounding points

the

minimal

the

slot

minimal

box

bounding Our

level,

show

that

the

growing

into

the seeded

tree

traverse each

the

level next

and

a slot

first slot

tree

node,

and slot

root

the

data

it

into

two

nodes.

new

root.

behave

like

R-tree

being

Recall

the

node and

that

Thus,

a seeded

a small grown

tree nodes

of

to by the each Figure

seed

slot

to the

pointer

up

growing

with

an

slots.

The

as consisting

of

R-tree

of

forest

a grown

subtree

At

choice

seed

level

we must

to traverse,

based

fields

selection

inserted

the

also

this

study,

until

on the

of each

node.

central

point a child of the

on

or an area, whose data

The

exact

whether

the

If central

central

being

stored

Update

point

inserted.

for

points

are stored,

is close

to the

If

areas

are

child

U5: a

only

in

we

not

insertions

boxes

to will

in trying boxes

boxes

associated

insertions

to

from

right

to reflect

pointer

will

set IIs the

We boxes,

choose

bounding

their

of data

seed nodes

by the characteristics

derived

after

traversed

at all

be guided

the inserted

following

not

seeding

tree

so far.

bounding

In box

Update Bounding

insertions. bounding

the inserted

as U2, only

Bounding

is

part

seed bounding

U4: Update

stored

used

insertion,

the bounding

through

inal

box.

criterion

If

Updating

to enclose

the

value

sum input

bounding

bounding

causes

information

data

subsequent

be guided

tion

this

bounding

the the

policies:

from

in the

original

we investigate

We make

a child

is found.

information

depends

choose

choose

a slot

and as that

these

them.

subsequent

by the

C73: Same

(see

it

traversed

each

boxes,

tree.

so that

closes

each level

data

but

2).

next

the

and will

seeding

by

pointed

bounding

from

same

update

update

each insertion

only

U2:

smallest box

of the

after

to

to

using

the

fields

bounding

U1: No updates

phase.

R-tree

how

a slot,

update

to

of the seed levels

whole

is called

of the

pointer.

propagate

be visualized

nodes,

attached

not

root

slot

slot

is the

whether

these

times,

to

this

the

structure the

to point

criterion

box

how

the

of

bounding

updated

and

after

and

parent

through

to by the does

the

can

nodes, the

insertions,

during

tree

grown

is modified

pointed

splitting

unchanged

two

the

subtracting

old

always

choose

of

to

the

bounding

continue

the

of This

can

find

yields

construction.

update

through

that

insertion,

areas

The

is

due

a child after

are not

new

like

the

R-tree

Otherwise

overflows

insertions

R-tree

node

node

behaves it

area

is the

the

found

choose

rectangle.

of the

to it.

at

reached

grown

we box of

from

pointer

to become

pointer

level,

is

point

node

When

we

If this

a new

into

into

slot

child

node

split

ordinary

seed levels,

(c)

are

traverse

level

to

D5

object,

data.

the

grown

slot

slot

inserted

allocated

The

to

the grown

Subsequent

that

remains

be

the

is set

R-tree.

node

to

case,

object This

will

grown

this

a data

the

slot,

pointer

pointer.

the

point

In

inserted

the

box

this

child

in

an example

node the

inserting

of an ordinary

a third

the

for

NULL.

insertions,

root

Eventually

the

the data’are the

the

through

be

R4

Figure 4: Example of seeded tree growth. The fan-out of tree nodes is two. In (a) the seed levels have just been set up. In (b) rectangles R1, R2 and R3 are inserted through slots S1 and S2. Grown nodes G1 and G2 are allocated as a result. In (c), rectangle R4 is inserted through slot S1, and overflows G1. A new root G4 of the subtree is allocated and S1 is made to point to it.
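The growth steps in the Figure 4 caption can be replayed with a small sketch. This is an illustration only, not the authors' implementation: the names `Slot`, `Node`, `choose_slot`, `insert_through_slot` and `replay_figure4` are hypothetical, the slot-selection rule (least area enlargement) and the split rule (sort by lower x-coordinate and halve) are simplifying assumptions, and the rectangle coordinates are made up so that R1, R2 and R4 fall into slot S1 and R3 into slot S2.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
FANOUT = 2  # maximum entries per grown node, as in Figure 4


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def enlargement(box: Rect, r: Rect) -> float:
    return area(union(box, r)) - area(box)


@dataclass
class Node:
    leaf: bool
    entries: list  # data rectangles in a leaf, child Nodes otherwise

    def mbr(self) -> Rect:
        boxes = self.entries if self.leaf else [c.mbr() for c in self.entries]
        return reduce(union, boxes)


@dataclass
class Slot:
    box: Rect                     # bounding box copied during seeding
    child: Optional[Node] = None  # NULL until the first insertion


def _box(entry) -> Rect:
    return entry.mbr() if isinstance(entry, Node) else entry


def _insert(node: Node, rect: Rect) -> Optional[Node]:
    """Insert rect below node; return a new sibling if node had to split."""
    if node.leaf:
        node.entries.append(rect)
    else:
        target = min(node.entries, key=lambda c: enlargement(c.mbr(), rect))
        sibling = _insert(target, rect)
        if sibling is not None:
            node.entries.append(sibling)
    if len(node.entries) <= FANOUT:
        return None
    # Overflow: assumed split rule, sort by lower x and cut in half.
    node.entries.sort(key=lambda e: _box(e)[0])
    half = len(node.entries) // 2
    sibling = Node(node.leaf, node.entries[half:])
    node.entries = node.entries[:half]
    return sibling


def choose_slot(slots, rect: Rect) -> Slot:
    # Assumed criterion: the slot whose box needs the least enlargement.
    return min(slots, key=lambda s: enlargement(s.box, rect))


def insert_through_slot(slot: Slot, rect: Rect) -> None:
    if slot.child is None:
        slot.child = Node(True, [rect])  # first grown node is allocated
        return
    sibling = _insert(slot.child, rect)
    if sibling is not None:
        # The subtree root overflowed: allocate a new root and re-point the slot.
        slot.child = Node(False, [slot.child, sibling])


def replay_figure4():
    s1, s2 = Slot((0, 0, 10, 10)), Slot((20, 0, 30, 10))
    r1, r2, r3, r4 = (1, 1, 2, 2), (3, 3, 4, 4), (21, 1, 22, 2), (5, 5, 6, 6)
    for rect in (r1, r2, r3, r4):
        insert_through_slot(choose_slot([s1, s2], rect), rect)
    return s1, s2


if __name__ == "__main__":
    s1, s2 = replay_figure4()
    print("S1 subtree root is a leaf:", s1.child.leaf)
    print("S2 subtree root is a leaf:", s2.child.leaf)
```

In this sketch each slot owns an independent R-tree-like subtree, so an overflow only changes the pointer stored in that one slot and leaves the seed levels above it untouched.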

and C3 almost

4 shows

insert

a suitable

chosen

will

R2

R3

the

minimum

objects

Figure

To

from

level,

allocated,

C2

data

tree.

growth.

insertion

the

levels,

true

R3

(b)

C1.

phase,

choosing

the

other

of

R2

G3

B RI

Phase

During seeded

points

the

strategies

2

G1

A

bounding

children.

copy

inserted of

At

strategy

Growing

center

contains

of its

out-perform

the

boxes.

field

box

results

2.2

copy

bounding

bounding

always

2

RI

At

s

G––

Q

boxes. C3:

s —

boxes.

of

grown node

E data rectangle

S1 C1:

seed node

but

boxes bounding boxes

after

objects

each

inser-

and the orig-

box.

inserted

bounding

boxes data

the data,

box at other box at other

updated but

at

bounding

not

the

the

slot

seed levels at

the

slot

seed levels

box

en-

seed bounding

level are not level are not

as in

U2.

updated. as in

U3.

updated.

we

central

The

stored,

in ordinary


bounding

boxes R-trees.

at the

grown

level

are updated

as

The are

clean-up

inserted

fields

phase

into

of seed node

bounding data

boxes

objects

made

after

seeded

are adjusted of their

are

all data

object

The

bounding

tree.

to be the

children.

deleted

and

true

Slots

box

For

minimum

data

tree

no

Note

process

that

more

data

objects

slots

than

into

may

have

with

directly

During

trees

to

data

some

subtrees

since

the

tree

require

the

it

can

be

developed

for

seeded

trees

pointer

be

any

matching

are

as tree

full,

Tree

Since

traditional

buffer

for

Construction

incremental

structed

all

be high. time,

this

section,

diate

at once, we

accesses

during total

exploits sizes

tree

that

the

costs. interme-

hence also

seed

disk

a technique

levels

the

together.

Since

hence

a grown

trees.

tree

3.1 We

T’uning have

indices

grow

actual

and

trees

and

using

such

overflow

can

buffer

several

times

a 512

result

in

Table

2, second

page

7,234

of disk 100K

cost is the If data

objects

in their

input However,

close such

join

This

number

needed

difference for

three

a match

factor

the

times

with

size

the

in space of buffer is hard

to 50.

can

for than

times

the data

random

and

the

tens least

and In

number

the

even

having

with

size can

data

our

stream.

are also close will

to guarantee

experiments,

more

than

worst

buffer

in

tree


2,500

least

tens

few

as

from

of

could times

larger

if

some

grown

they

are

likely

built

using

the

same

to

since

smaller

the

constructed

using

be

buffer

much

unlikely

than

be

overflow subtree

buffer

size,

In

seeded

trees

with

buffer

size.

The

a 512-page misses.

to

input, smaller.

to

average the

at

work

of

even

are

two

a few

buffer, likely

in

times.

a fan-out

we have experienced

is two

since

access

be

that

skew

nodes

varies

least

we have

must accesses

is determined as

at

buffers

much

tree For

at

access

However,

Note

incurred

“overflow

construction

a seeded

R-trees

data

disk

random

algorithm

the

of

therefore

method

the

size

chances

construction

levels.

that

however, some

be

of

than

be made

faster

assuming

R-

are

lists. than

of slots

size.

penalty

linked

in

seed

of

an

size

The

this

tree,

size

of sequential

faster

in much

overflow

smaller

practice,

the

number

trees do

number

can of all

seeded

the

buffer

price

we

average

of random

The

means

buffer

subtrees

construction

misses

This

objects

instead

in the

data. the

number

is much

hundreds,

the

input

in the

of

the

seeded

much

the

of slots

number

levels,

same

results

have

modified

lists,

than

reading

number

the

seed

(see

input

an R-tree

in the input

chances

clustering

is nine

affecting

to each other the

page

between

and

an Robjects,

construction

to read

by

process.

data

1K-byte during

costs

this

An

that

data

one

slots

overflowing

and

by

smaller

access

I/O,

The

bad

I/O

40K

of clustering

order,

disk

constructing

Another

degree

in

that

with

disk

are many

reduced.

writing

sequential

algorithms,

of the actual

set

for

seeded

particularly

with

accesses,

access needed entries.

resulted

accesses

the

the

The

linked one

is much

is an increase

start

lists.

R-tree.

the

and

we can

the

subtrees,

the

smaller,

pay

sizes of

and

data

row).

disk

R-trees

In

with

a

as before.

is then

grown

subtree

is significantly

as the

space.

relative

observed

accesses

disregarding

misses

construction

that

buffer

disk

of disk

sequential

lower.

than

we have

using

set,

both

much

tree-like

buffer

on the

high.

have

higher

buffer

memory

For

be very

a 800 K-byte

number

from

depend

buffer.

misses

example, for

the

costs

the

of constructing

mainly

built

using

the

is called

lists

pointer

subtrees

there

subtree

slot, slot

of

are reset

linked

of linked

intermediate

grown

most

are inserted, the

pages a small

pointers

proceeds

from

same

additional

than up

are will

data

so written

then

an

a data

buffer

all

longer slot

of

objects

an

which

lists

and

the

freeing

group

the

in

list

contains

box as data

lists

disks,

The

root

many

a grown

Costs

costs

straightforward

costs

cases,

For

the

arise

construction

tree

tree

that

all at once

trees

the

Construction

found

the

lists

insert

in

in DS

lists.

such

construct and

to reduce

under

using

linked

to

list

each

a linked

pages

subtrees

for

to the

By

into

grow

process

objects

in the

to point

substantially

present

of the

In

random

grown

recorded

subtree

of linked

grown

is built

been

at

uses

of the

R-tree

can

such

that

construction, We

costs

set

all data

inserted

corresponding

insertion

constructing

con-

dynamically

we reduce most

structure

of seeded

trees

a method

costs.

being

construction

seeded

to eliminate

join

for

The

The

data

want

to

The

5). tree

a grown

linked

constant

NULL.

When

optimized

all

the

the

data

now

that

into

lists

a linked

write

space.

batch.

been

than

total

examine

lists

reducing the

their

construct

we

have

rather

it is important

linked

that

indices

updates

As

join

Improvements

tree-like

we

pre-defined

a pre-requisite.

we

into

(see Figure

a bounding

all

most

forming

size,

in the

linked

If

object

slots

by

built

with

Eventually

avoid

buffer

be

page

The

the

organized

each

allocated.

to

3

field.

data

R-trees

as long

A data

to

misses

if we estimate

the

not

able

buffer

under

first

of entries,

inserted.

applied

Furthermore,

than will

been

to

phase,

but

pages.

array

not

balanced,

to matching

is not

into

does

difficulty.

technique

be applied

balance

be

a slot

have

due

lists

growing

be larger

through immediately,

grown

However,

linked

the

algorithm,

inserted

[BKS93]

any

insert

tree

we

accesses

intermediate

structures

the seeded

a result,

heights.

without

optimization

been

As

TM

after

above

have

others.

procedure

participating

the

may

different

matching

begins

trees,

disk

size will

matching

is built.

seeded

random

containing

relevant

general.

in DS

consistent.

The

can

begins

the

during

seeded

n

4

Figure 5 legend: seed node; grown node; full linked list page; half full linked list page; data rectangle. Panels (a), (b), (c).

Figure 5: Tree construction using linked lists. Assume each data page has a capacity of two data objects, and the buffer has a capacity of 10 pages. (a) shows four linked lists formed under the four slots. All 10 pages in the buffer have been allocated, and one additional data object is now inserted into the linked list under slot S1, resulting in buffer overflow. In (b), the batch of linked lists formed in (a) has been written to disk due to buffer overflow. A new linked list is formed under S1 for the data object inserted in the last step. Grown subtrees are constructed using the data in the linked lists when all the data objects are inserted. (c) shows one grown subtree constructed using linked lists L11 and L12, the linked lists constructed under slot S1.
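The batching behaviour described in the Figure 5 caption can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the class name LinkedListBuilder, its methods, and the in-memory dictionary standing in for the disk are all hypothetical. The page capacity (2 objects) and buffer capacity (10 pages) are taken from the caption. Building the grown subtrees from the collected lists is not shown.

```python
from collections import defaultdict

PAGE_CAPACITY = 2     # data objects per linked-list page (value from Figure 5)
BUFFER_CAPACITY = 10  # pages that fit in the buffer (value from Figure 5)


class LinkedListBuilder:
    """Hypothetical helper: collects data objects per slot in linked lists
    and writes the whole batch of lists out when the buffer overflows."""

    def __init__(self, page_capacity=PAGE_CAPACITY, buffer_capacity=BUFFER_CAPACITY):
        self.page_capacity = page_capacity
        self.buffer_capacity = buffer_capacity
        self.in_memory = defaultdict(list)  # slot -> pages of the current list
        self.on_disk = defaultdict(list)    # slot -> lists already flushed (simulated disk)
        self.batches_written = 0

    def pages_in_buffer(self):
        return sum(len(pages) for pages in self.in_memory.values())

    def _flush_batch(self):
        # Write every non-empty list of the current batch to "disk",
        # which frees all buffer pages at once.
        for slot, pages in self.in_memory.items():
            if pages:
                self.on_disk[slot].append(pages)
        self.in_memory = defaultdict(list)
        self.batches_written += 1

    def insert(self, slot, obj):
        pages = self.in_memory[slot]
        if not pages or len(pages[-1]) == self.page_capacity:
            # A new page is needed; if the buffer is full, flush the batch first.
            if self.pages_in_buffer() == self.buffer_capacity:
                self._flush_batch()
                pages = self.in_memory[slot]
            pages.append([])
        pages[-1].append(obj)

    def lists_for(self, slot):
        """All linked lists formed under a slot: flushed ones, then the current one."""
        lists = list(self.on_disk[slot])
        if self.in_memory[slot]:
            lists.append(self.in_memory[slot])
        return lists
```

Filling ten pages across four slots and then inserting one more object under S1 reproduces the situation of panels (a) and (b): the first batch is flushed and a second list is started under S1, so lists_for("S1") returns two lists, corresponding to L11 and L12 in panel (c).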



3.2 Reducing Tree Sizes with Seed Level Filtering

tree

The

seed

levels

of the

they

can

function: size.

Smaller

tree

tree

is easy

and

indexed

one bounding the

data

seed

R-tree

with

this

When

seed

nodes

during

tree phase

seed

not

Ts,

the

a data we first

field the

in

DR

The

shadow

and

so the

first

test

query

k levels a

to be non-

into

fields Once

space

of the

need

are not occupied

the

Ds.

I/O

tuning

generality,

the

TR

during

be inserted used

after

by them

are Ts

pages.

are

used

If there

into

tree

any

not

is

becomes

both

4 Experiments

Assume derived

that data

set Ds

tree

TR exists.

tree

joins

RTJ,

and

our algorithm,

a spatial

join

and

is to

a data

We conducted

be

set Dn,

performed for

experiments

using

three

BFJ.

Algorithm

spatial

STJ

join

as described

so far.

which with

algorithms: (Seeded

on

Tree

It constructs

a

an RS

TJ, is

dirty buffer

process

if


and

data

generating x

distributed clustering

of clustering

object of

tree

writes

containing and

construction. buffer

cache. the

tree

containing

Ts

of spatial

cluster-

by a simple

set of x x y objects,

the

map

centers

whose area.

of y data By

rectangles, data

must We do

manager.

rectangles,

rectangle.

is

matching.

during

pages

512

buffer

tree

a warm

degrees

in

of the

R-tree entries

the

pages

after

a data

the

the

buffer

was controlled

cluster

distributed

and

as dirty

buffer

of different

page as are

are to be re-used.

by the

When

each

buffer

buffer

generated

degree

disk

contain

and

disk

dirty

tal area of the clustering

a seeded

the

RTJJ

with

be

we

within

For

and a 4-byte

and

pages

scheme.

randomly

box

the

starts

could

randomly

Join)

and

used,

a dedicated

if the pages

of clustering

first

CPU

bytes,

to

are marked

degree

were

seeded-

nodes

are re-allocated

The

nodes

construction

there

We studied

tree

STJ

matching

that

ing.

these

was

1K

assumed

of T5,

tree

the

tree

nodes

free.

as

between

[BKS93].

both are

bounding

tree

to disks

purge

are

assume

construction

matching

construction,

D5

to

the

in

size

seeded

files

also

created

Note

all.

use

that

page

algorithms

during

Hence

data at

TS

We

be written

at least

queries

join

structure

assume

the

data

For

newly

the growing

we

of a 16-byte

During

phase.

both

Ds,

(Brute

in

answers

a spatial

described

R tree

memory

The

nodes

overlap

the

sizes of both

consisting

overlaps

k seed levels.

and

identifier.

fields

original

for

of window

of

to

STJ

and

Ts

BFJ

rectangles

aggregation

is a

of its variations.

nodes.

When

these

object

the

data

techniques

simplicity,

TR to the

construction

does not

not

the

a series

tree

by Brinkhoff

Algorithm

performs the

the Join)

an R-tree

TR.

using

RTJ

and

any

with

The

matches (R-Tree

proposed

is equivalent

DR

size

field

corresponding

copied, tree

in

box

shadow.

of the

data

object

entries

from

of the

entire

at each

fields

called

fields

if the

data

and

the

bounding

TR,

disk not

Ts at all.

simply

then

RTJ

constructs

Ts

queries

and

algorithm

It first

windows.

window

whether

of the

R-tree

For

is to be inserted

check

one shadow

R-tree.

found

is used,

mbr

the

object

no overlap,

overlaps

set Ds, Algorithm

matches

Join)

on the

data

if it

the to

is copied

change.

When

some

of that

be inserted

time,

the

during

phase,

only

objects

additional

shadow

without

changed

object

filtering one

information

of

into

nodes

level

construction

levels

copied

resemble

not

both

data TR.

[BKS93].

then

Force

level

be used

Data

during

and

the and

variation

et al.

tree

seed level filtering.

carry

seeding

each

can

TR.

I/O

overlaps

of TS

they

seeded

Ts

simple

additional

time.

if and

at

one the

less disk

a rectangle

TR need

test

serve

reduce

join

box

overlaps

overlapping

seed

incur

an R-tree

levels T~,

object

We call

by

to

actual

to see that

at least Since

tree

be used

objects

of the

seeded

sizes

construction

It

Ts for

indices

.

set.

The

then

rectangles

controlling we could

centers We the

control

smaller

the

to the total

area

of the

the

data

set.

ing

rectangle

to

lie

The

clustering

map

over

the

boundary

clipped.

area.

cluster

ing

rectangles

Without

X and

The

in DR

the

of

of the

We

100,000,

The

6:

centers

study

the

1.0,

of the

z

6000

of

copy

Figure 7: I/O costs for tree construction, Experiment Series 1. (x-axis: cardinality of data set S.)


14000:

For

effect

of seed

rectangles

tree

each Among

g 12000:

of v ~

level

8000

of Ds

on side

series

policies,

and

C3 (see Section (see

Section

The

differences

were

marginal.

we found 2.1)

2.2) Due

and update gave

the

three

to

seeded (C’3,

that

always

between

from and

combinations

20k

U4),

space trees

best

denoted

using

copy

U3,

C72

U4 and

update

Figure

8: I/O


that

as STJ1 the

H


BFJ RTJ STJ1-2N STJ1-2F STJ1-3F

only “

combinations and

STJ

0.2 0.4 0.6 0.8 1 Fraction of area used in data clustering

STJ2, Figure

showed

costs for tree matching, Experiment Series 1.

policies

we list

respectively. experiments

80k

performance.

limitations, built

node

strategies

policies better

40k 60k Cardinality of data set S

as

set of

of experiments

copy

STJI-3F

length

same

of seed

STJ1-2F

of

was set to be same the

RTJ STJ1-2N

,,,

2000:

of the

quotient

BFJ

4000:

O.2, 0.4, 0.6, 0.8

We ran

as the in first

various

U5

U3),

cover

bound

experiment.

the

40,000,

on side length

equaled

upper

fixed

and

configuration.

the

results

The

variations

update

the

rectangles

in each

data

so that of DR

we

of clustering bound

STJI-3F

4000

data

2.

the

effect

upper

STJI-2F

area.

node

Section

100,000

degree

the

rectangles

clustering

seeded

the

at

RTJ STJ1-2N

6000

to 0.04.

map

seed

experiments,

Ds

varied

adjusted

respectively.

of DR

of

and



rectangles

studied

the

:

RTJ and BFJ TJ variations

in

we and

12000

S

different

STJ1-3F


14000

upper

of all the

of

g

Total disk I/O costs, Experiment Series 1.

= 10000

the

degree

setting

we tested

of

series

clustering

clustering

Our

60k

20000 18000

first

varied

to 20$Z0 of the

levels,

DR

and

sets.

(C3,

the

of policies,

second

respectively,

the

40k 60k Cardinality of data set S

on performance.

In

and

at

rectangles

as described

of seed

cardinality data

set by

extensive

policies,

the

of the clustering

that

combinations

number

filtering

area

1 along

In

and

to 80,000.

restricted

an

DR

levels,

was

configuration,

combination

the

of

4

quotient

were

data

update

each

map

O to

;

of clustering

meaning

conducted

and

for

20,000

of data

0.2,

applying

that

from

cover

was

each

and

and

of

STJ1-2F

RTJ

of

Q

cardinality

on the side length

objects

the

from

Figure

number

of experiments.

R-tree

of IIs

resulting

For

series the

an

clustering

in DR

total

STJ1-2N

&q

BFJ

objects

of cluster-

Results

fixed

cardinality bound

range

20k

it was not

of data

of generality,

-,

extended

16000

two

in

spatial

to fit

Y axes.

we

resulting

over

clipped

rectangle,


When

extended

were

to the

to

were

bound.

and the number

loss

Experimental

series,

rectangles

the number

assumed

This

clustering

rectangle

clustering

set according

was

conducted

the

of its

of the

rectangles

a data

cluster-

bound.

upper they

clustered

independently

of data

area,

When

was

study

4.1

by

data

was set to be 200,

objects.

under

We

total area

In the experiments,

per

and

a smaller

map

more of each

upper

shape

or

of the

the

both

and

using

into

data

the

rectangles

boundary

the width

a predefine

size

chosen

the

randomly

controlled

rectangles. similarly

and

chosen

O and

bound

rectangles,

length

was

between

upper

the

clustering The

versions


Figure 9: Total disk I/O costs, Experiment Series 2.

H

RTJ

I ❑

STJ1-2N

❑ ❑

STJ1-2F STJI-3F

tree

construction

may

need be written

during

tree

should

thus

the

The

CPU

I

I

I


1

1

3

whether

two

used

0.4 Fracfionof

0.6 area Wed

0.s

11:

sT,,-,,

cost

arises

For

S

from

BFJ

substantially outperformed terms

of

disk

the

when

seed

level

filtering,

costs

level

magnitude

from

Figures

observed

in the

10, and

11 break

creation

and

tree

no tree

creation

Tables of the

first

appears

after

A letter

“F”

that

disk

show

no filtering

tree

costs

an

seeded

in

The

numbers

small

I/O

2,3

and

matching and

fly,

I/O

those

BFJ

and

access.

“construct”

I/O

used

following

the

number and

test

X or Y

we

found

terms

during

tree

tree

RTJ

For

during

this

construction.

both

from

buffer

the

linked

lists

of

in

reads

as

increases.

misses

and

during

creation

tree

time

reads

TJ even for large Ds sizes, while

S

from

1.5 times

and

this

of

showed

overflow

costs

Ts

7, 8,

(Table

1) to 30 times

of tree no

give

number

access I/O

the costs

construction,

and

write

costs,

respectively.

starts

with

a warm

buffer

buffer

two

when

BFJ

in this

case.

Since

size,

for

held

S

no buffer

TJ with

nodes

both

with

times

the

from

However,

larger

the numbers

TJ as soon as the number

exceeded

the

buffer

RTJ performed

that

D5

that

buffer

to three

by S

nodes

we found

the

matching.

incurred

due to buffer

seed

two. The

no

in

as 1/30

trend

cost

size. worse

and

“wr”

Note

that

since

found

over

from

clustering


performs in

outweighed

I/O

disk

is not

used,

STJ

I/O,

better

To than

is paid

for

also

gain

that

for tree

creation.

for

using

by the

the

using only

creation an

When

incurs

savings.

Filtering

than

clear

cost tree

the

a consistent

of

is especially during

among

“rd”

terms

costs

Tables

during

misses provides

in

reduction CPU

filtering

columns

filtering levels

This

of the

left

buffer

found

occurred

the

tree

R-tree

variations

three

of random

incurred

during

Seed level

stands

the

of the

expect

We

TR nodes

Overflow since

cases

1).

than

BFJ incurred

accesses

cost

STJ

seed

and

counts

costs,

I/O

indicates

levels

most

in tree

matching

name.

STJ2-2F

seed

as the disk

“N”

in all

(see Table

483 different

surprise,

linked

lists

BFJ in all cases. A closer look showed that though tree costs are lower for RTJ the high tree creation

costs

that

BFJ

sizes,

Ts

of disk

number

algorithm

eliminated

linked

numbers

cases.

was smaller

of accessed

TJ variants

S

a letter Thus,

Under

The

in

successfully

occurred.

sets,

STJ. Our earlier

of

similar S TJ incurred RTJ when intermediate

as

intermediate

only

and

those

Using

smallest

data

data

than

that

reads

used.

number

same

method

CPU

4) larger

in these

level

so incurs

indicates

two

is shown

and

for

vary

and time

accessed

the

Figures

in detail.

TJ with

A sequential disk

arises

STJ outperforms

order

disk

into

trees

this

S

that the

experiments disk

misses

reading

size is the

seed

seed

The

disk

was performed.

accesses.

matching

the

activated,

2 of

Disk

read

about without the

on the

hyphen

was

of a random tree

I/O

of experiments

the

variation

“match”

were costs

matching.

4 show

following

construction.

along

interesting,

cost

was not

our

of the

filtering

filtering.

the

structures

series

all

for

misses

costs.

levels

for

With

of experiments.

of tree

1 through

of seed

level

down

those

buffer

1000.

bounding

[BKS93].

of

the

RTJ they

algorithms

activated.

9 summarize

series

all

STJ were between

of

the

6 and

among

not

and

than

two

in

number

TJ

construction

R TJ in all cases also S TJ versions

The

costs

RTJ,

and

tree

of experiments,

I/O

experiments

and

costs was

CPU

greater

filtering.

creates

costs. CPU

filtering

the

BFJ

of

I/O

lowest

of

of

in data clustering

list

in

matching

RTJ

construction.

for

numbers

overlap

is particularly

of creation

incurred

of

overlap

units

of operations

boxes

series

The

very

numbers

the costs go up as the size of D5

I/O.

construction

costs for tree matching, Experiment Series 2.

I/O

“match” part

in

during

numbers

tree

first

the

the

performed

outperforms

disk

(Table Figure

the

STJI-2N

1

under

construction

algorithms

lists

bounding

that

remain 0.2

Ts nodes

are re-allocated

column tree

are

the

is the

during

the

STJ of

ST.tl-sF

write

given

by

tests

‘(XY”

expected

RTJ

containing

if the pages

to the

“bbox”

overlap

axis,

BFJ

costs

Column

In 35000,

pages

The

be charged

column

box

Figure 10: I/O costs for tree construction, Experiment Series 2.

matching.

performed

The

Figure

dirty

to disks

algorithms.

tests

0.2 0.4 0.6 0.8 1 Fraction of area used in data clustering

time,

costs. increase

seed lowest

level CPU

all algorithms.

2 and

second that

Table

series the

5 through

processing

decreases,

Table

of experiments, costs

especially

for

8 list

the

In general, rise tree

as the

results we have

degree

matching.

of

This

n

Alg.        Match rd  Match wr  Construct rd  Construct wr  Total I/O  bbox (K tests)  XY (K tests)
BFJ              438         0             0             0        438            2381             0
RTJ             1182       359           144           243       1914             130           170
STJ1-2N          694       319            94           137       1244              79           168
STJ2-2N          849       358            94           150       1451              84           170
STJ1-2F          685       314            94            85       1178             896           168
STJ2-2F          823       349            94            99       1365             898           170
STJ1-3F          712       226            94             5       1037            1945           160
STJ2-3F          746       223            94             5       1068            2001           167

Table 1: Join performance with ||DR|| = 100K, ||DS|| equals 20K. Cover quotient of DR clustering rectangles = 0.2.

Alg.        Match rd  Match wr  Construct rd  Construct wr  Total I/O  bbox (K tests)  XY (K tests)
BFJ            13650         0             0             0      13650            6984             0
RTJ             2608        27         12274          1857      16754             315           560
STJ1-2N         2422       370           366          1483       4641             263           538
STJ2-2N         2439       369           367          1477       4652             267           538
STJ1-2F         2362       358           366          1343       4429            2603           535
STJ2-2F         2429       367           366          1357       4519            2610           536
STJ1-3F         2274       349           366           451       3440            5613           498
STJ2-3F         2244       368           366           426       3404            5709           520

Table 3: Join performance with ||DR|| = 100K, ||DS|| equals 60K. Cover quotient of DR clustering rectangles = 0.2.

I/o Match

353 363

507 507

1952 1947

5768 5885

3418 3431

686 690

2739 2745

344 354

505 505

698 672

4286 4276

7328 7435

638 666

715 719

2896 2920

1735 1739

349 356

STJ1-2F STJ2-2F

STJ1-3F STJ2-3F

1519 1537

342 353

236 236

140 120

2237 2246

3767 3843

330 344

STJ1-3F STJ2-3F

with

llD~ll

=

100K,

rectangles

\lD~ll eqU&

Table

n

80K.

0.2.

4:

Join

Cover

performance

quotient

with

of DR

a data

with

rectangle

is higher] nodes

during

as

number

matching.

that

of spatial

be accessed, tree

leaving

ently half

low,

the

accesses

grows

rapidly

number

than

the

We have data

is high,

decreases. Ds

eliminated

tree

the

nodes

that

when

costs

are the

become

the

degree

this

some

= 0.2.

Alg.        Match rd  Match wr  Construct rd  Construct wr  Total I/O  bbox (K tests)  XY (K tests)
BFJ            14803         0             0             0      14803            6628             0
RTJ             2881        57          6909          1217      11036             405           443
STJ1-2N         2265       329           236           794       3624             268           437
STJ2-2N         2347       374           236           795       3752             284           445
STJ1-2F         2242       330           236           770       3578            2688           436
STJ2-2F         2328       374           236           752       3690            2702           445
STJ1-3F         2265       337           236           430       3268            5268           411
STJ2-3F         2342       358           236           430       3366            5364           429

is

as the because

objects

filtering

of spatial

in

degree most DR,

and

Table 5: Join performance with ||DR|| = 100K, ||DS|| equals 40K. Cover quotient of DR clustering rectangles = 0.4.

for I/o

less

number worst

decreases, becomes

Table

remain

always

BFJ,

For

must

factor

costs

the effectiveness

the

diminishes

the

I]DsII equals

In this

deciding

creation

TR nodes

and

by

the

and

Again,

overlap

in

Match

of

Alg.

of all

because

much

larger

size.

also found

is high

degree

factor

for optimization.

of clustering

of touched buffer

filtering

degree

that

decreases,

leaf

total

RTJ.

of

the

most room

those

as the

suggests

joins.

become

of

methods

and TR leaf

an important

TJ, the tree

and

Ts

Alg.

in DR

time S TJ at tree matching RTJ. This is because for low

little

S

rectangles

100K,

atc

~

that

=

rectangles

by of

costs

For

possibility

clustering

clustering,

creation

performance.

This be

of

accesses to

data

of spatial

degree

the

access more

should

results

close

degrees

in

tree

of disk

becomes

disk

we must

the

data,

overlaps

thus

the

Also,

consist

Ds

clustering

estimating

than

in

and

of spatial

case,

less clustered

IIDRII

clustering

I/o is because

wr

2956 3068

236 236

clustering

I

o r L 741 685 691

357 359

of ~R

rd

costs tests) XY

1588 1606

performance

CPU

costs

(K

UIr

= O.2.

]

STJ1-2F STJ2-2F

Join

[IDs([ equals

I

17151 3292 2996 3063

2:

100K,

bboz

BFJ y RTJ STJ1-2N STJ2-2N

COvel quotient

=

rectangles

9085 415 334 353

0 372 349 355

Table

llD~ll

o [ o I o I 17151 38 16555 2525 22354 361 506 2126 5989 362 505 2154 6084

bboz

4648 295 169 174

40K.

1

I

total

Wr

XY

I

8864 9695 3040 3064

I

total

I

rd

0 1219 817 820

!-d

I

Alg.

0 6015 236 236

Wr

with

clustering

of DR

costs

tests)

0 50 364 360

I

costs tests)

rd

8864 2439 1623 1648

rd

(K

wr

BFJ RTJ STJ1-2N STJ2.2N

Ala.

construct

rd

Table

0.2,

CPU

costs

Match Alg.

of seed level clustering

of

Alg.        Match rd  Match wr  Construct rd  Construct wr  Total I/O  bbox (K tests)  XY (K tests)
BFJ            23177         0             0             0      23177            7773             0
RTJ             3451        62          6370          1202      11057             a64           634
STJ1-2N         3263       350           236           813       4662             419           514
STJ2-2N         3280       366           236           802       4684             410           524
STJ1-2F         3251       352           236           782       4621            2707           514
STJ2-2F         3268       366           236           763       4633            2701           529
STJ1-3F         3212       346           236           637       4431            5788           481
STJ2-3F         3385       354           236           583       4558            5879           509

of clustering data

objects

cannot

be

process.

Table 6: Join performance with ||DR|| = 100K, ||DS|| equals 40K. Cover quotient of DR clustering rectangles = 0.6.

I/O

CPU

costs

Mate~ Alg.

(K w?-

rd

13FJ

costs

2.5167

rd

w!-

total

0

0

II

tests)

b oz

25167

7228

0

slots

uniformly

may

be to extract

techniques

such information

3304 3141 3206

62 358 366

6287 236 236

1195 814 820

10820 4549 4628

587 450 457

556 550 557

seed

levels.

most

suitable

STJ1-2F STJ2-2F

3142 3217

358 366

236 236

790 805

4526 4624

2242 2248

550 552

situations.

STJ1-3F STJ2-3F

3268 3487

335 344

236 236

736 677

4575 4744

5104 5205

497 526

A

the Table

7:

40K.

Join

Cover

performance

quotient

with

of DR

I l.D~ll

clustering

=

100K,

rectangles

I IDs-II equals

=

acteristics

of their

necessary

in

I/o Alg.

(7PU

costs

~

Match

(K wr

13FJ RTJ STJI-2N STJ2-2N

31831 3710 3582 3611

0 69 338 340

0 5976 236 236

0 1207 800 808

31831 10934 4956 4995

8300 763 551 566

0 623 587 613

STJ1-2F STJ2-2F

3579 3600

333 330

236 236

793 799

4941 4965

2353 2367

588 j 615

Table

8:

40K.

297 371

3689 4125 Join

Cover

236 236

performance

849 769

5071 5501

IIDRII

=

with

of DR clustering

quotient

ox

focus

of our

access

leaf

6 this

studied tree

it

may

two (1)

be

trees

input

spatial

seeded

to

when

data

following the

data

sets to be used

input

data

sets are the

it is determined

seeded the

trees

paper,

high,

In the seeded

two-seeded-tree

tree

for

the

the

seed levels

are

seeding

Suppose

tree.

join

both

can

be applied,

for

both

rather One

than

R-tree

trees.

for

being may

trees.

trees copied

DB

that from

be to have

be

is

sets

are

spatial will

For

selection,

input

height data

at join

be

tree

can

spatial

height

of a

of the

R-tree

plus

the

number

from

the

root

upper

to

bound.

existing

tree

to

build

costs the

faster

using

effective

are

for

of the

also

the

matching

lower

due

two

tree

input

against

lists

set the

tree

the

always by

case.

is

a

Tree

is shown

construction of

to better

that

are

methods

data

costs

tree,

sizes and

show

method

linked

eliminating

When

technique

results

other

costs. filtering

size of the

in one boundary

intermediate

tree

linked I/O

seed level

tree

for

of a tree

sets of varying Our

seeded

nodes during

shaped

basis time

tree

data

Tree

better

the

and data

intermediate

reduce

seeded

levels input

growth

the

called

except

in

misses.

also uses

clustering. of the

tree

construction

input

of two to three,

method

that

the

than

R-trees

to

exist

seed

levels.

in a tree are

to further

with

the

be used

of the

seed

a technique

tested

the

to guide

reduce

lower

clustered,

the

levels

factor

buffer

cannot

into

resulting

of spatial

seeding

trees,

addresses

no R-trees

characteristics

algorithm

construction

a spatial

index

method

R-trees

are used

seed

methods

be very

This

or where

is divided The

can be used

I/O

studied

sets.

utilized

have

and

constructs

time.

processing,

The

degrees

presented

dynamically

to drastically

total

we

The

the

this

construction,

other

choice

selections.

than

seed levels

We

of

a seeded

be shorter

levels.

join.

that

is

is to be

as the

for

issues

as an ordinary

paths

We also presented

If at least

If no pre-computed

we can use a common

seeded

approach

seeded

are

a spatial

done

these

used

same

where

construction

determine

OPC(DC))

two

the

trees,

seeded

grown

the

trees.

say OpB, is a non-spatial

seeded

can

tree.

somehow

of

char-

techniques

been

most

data

the

tree

two

seed levels

seeding

the

work,

than

we have

join

input

lists

selection

is no obvious

joinA(oP~(~~),

pre-computed

This

the

the

still

seeded

no

to the

constructing

the join.

there

and

selections,

non-spatial

from

We must

constructing

one of opB or opc, for

derived

joins,

with

in the

or (2) one or both

scenario,

scenario,

in both

the by

can use the tree

for

of the

one-seeded-tree

the

processed

efficient

occur:

enough

of non-spatial

dynamically

selectivity

the

spatial

on

to realize

However,

that

seeded

The

using

of complicated

closely

in the join,

results

that

is more

case if the

other

are related

join

situations

results

the

using

However,

a spatial

are

including

R-trees

method

R-tree.

the

input and

join

perform

sets

operations

pre-computed

tree

a pre-computed

necessary

seeded the

the

will

situations

help

and

Such

if necessary,

and

spatial

with

method

the

Discussion

a seeded

measizes,

Conclusions

In

1.0.

that

join

for

levels,

nodes

called

We have

has

future

is no greater

constructed

=

equals

tree

of seed

join

5

noting after

method

seeded

553 581

IID.sII

rectangles

within

be retained

5772 5872

100K,

Addressing

It is worth

rd

STJ1-3F STJ2-3F

work

these

costs

Wr

-lb

little

the

in

as the

based

way

area.

find

quantitative

sets.

best

in this

the

to trees

such

databases

tests)

rd

total

data the

and

the common

seeded

is finding

sets using

[OR93],

necessary

operations

input

choosing

Relatively

is

solution

data

sampling

characteristics,

of spatial

Another the

for building

construct

issue

the

0.8.

data

research to

related

to predict

query.

area. from

as a basis

way

outcomes

map

as spatial

More

closely

sures

the

information

use such

RI’,J STJ1-2N STJ2-2N

I

divide

to time

spatially

seeded

tree

organization.

set of seed levels

artificially

1If a seeded

constructed

any pre-computed a set of seed levels

original

R-tree.

used,

whose

original


spatial

tree is to be used in spatial selections after the join terminates, seed level filtering should not be

as the set of data data.

indexed

by the tree will

be a subset

of the

CPU

costs

lowest not

for

the

among

seeded

all

tree

methods

method

are

seed

level

when

always

[Ore91]

the

filtering

and

activated. Seed

when

level the

reduce

data

CPU

costs

by

CPU

[Rot91]

However,

when

If the

can

the

a join,

this

[Sam90]

[SE90]

References Thomas

Brinkhoff,

Seeger.

using

R-trees.

Norbert

and

efficient

ternational

[FSR87]

Bernhard

322–332,

Christos Faloutsos,

[Gun93]

[Gut84]

Conference

Oliver

Gunther. Proceedings

dex of

[LH92]

ACM

spatial

of International

[OR93]

of spatial

A

in-

Proceedings Conference

47-57,

pages

on

dynamic

searching.

Aug.

on

1984.

Distance-associated search.

Conference

284–292,

In

on Data

join

Proceedings Engineering,

1992.

J. Nievergelt,

H.

Hinterberger,

and

The

An

adaptable,

symmetric

file:

structure.

ACM

9(l),

K.

Transactions

C. Sevcik. multikey

on

Database

1984.

Frank

Olken

spatial

databases.

Conference

of Data,

1993.

International

of Data,

Han.

Systems,

In-

Conference

50–59,

spatial

range

file

ac-

SIGMOD

Management

R-trees:

for

Wei Lu and Jiawei

grid

Rous-

spatial

computation

pages

SIGMOD

for

and Nick

of International

indices pages

[NHS84]

on

Guttman.

structure

Management

In-

of Data,

oriented

of ACM

Efficient

Engineering,

Antonin

R*-tree:

1987.

joins. Data

Sellis,

of object

and

Doron In

on Data

Rotem.

Sampling

Proceedings

from

of International

Engineering,

199-208,

pages

1993.

[Ore89]

J.

ternational Portland,

[ore90]

Jack

In Proceedings Conference OR,

1989.

Orenstein.

A

processing spaces. tional

Redundancy

Orenstein.

A.

databases.

techniques In Proceedings

Conference

343-352.

of ACM on

Management

comparison for

in

of ACM

and

SIGMOD

In-

of Data,

of spatial

native

on Management

spatial

SIGMOD

query

parameter Interna-

of Data,

Samet.

pages

1990.


The

indices.

Design

Star

mation

Systems.

and

John

In

on Data

Japan

Spatial Zurich,

Springer-Verlag. Proceedings Engineering,

1991. and

Addison-Wesley,

Jeffrey

Estes.

Analysis

of Spatial

1990, Geographic

Infor-

Englewood

Cliffs,

Prentice

Hall,

T. Sellis,

N. Roussopoulos,

and

C. Faloutsos.

R+ -tree:

A dynamic

for

multi-dimensional

Jersey,

1990.

index

In Proceedings 3–1 1, Brighton,

P. Valduriez. Database

for points

Management

Proceedings

427–439,

The

of ACM SIGMOD

on

Times

Analysis

ternational pages

Seeger.

Kobe,

Structures.

objects.

Ralf

join

in

381-400,

28-30 1991.

the

In O. Gunther

Advances

pages

Conference

500–509,

Hanan

pages

1990.

May

cess methods.

Kriegel,

access method

Proceedings

pages

In-

of Data, [Va187]

Hans-Peter

Conference

sopoulos.

Management

[SRF87]

joins

1993.

and robust

and rectangles.

and Bern-

of spatial

of ACM SIGMOD

on

Beckmann,

Schneider, An

Proceedings May

Kriegel,

processing

Conference

237–246,

pages

Hans-Peter

Efficient

ternational

[BKSS90]

Spatial

for computing

spaces.

‘91),

D Rotem.

New

hard

algorithm editors,

(SSD August

Data

a special

concern.

[BKS93]

Schek,

Switzerland,

pages

technique

is not

H.-J

of International

degree

characteristics

capacity

An

of k-dimensional

Databases

sizes

and

3070.

gain

before

when

tree

is high,

is low.

known

only

reducing

almost

without

of data

are not

be used

in

clustering

costs

clustering

input

should

is effective of

total I/O

incur

of spatial of the

filtering

degree

the

it could

Jack Orenstein. overlay

is

Join Systems,

of Very England,

indices. 12(2),

ACM 1987.

Large

Data

The

Bases,

1987. Transactions

on
