We conducted experiments with seeded- tree joins using three spat ial join algorithms: S TJ,. RTJ, and. BFJ. Algorithm. STJ (Seeded. Tree Join) is our algorithm,.
Spatial
Joins
Ming-Ling Electrical
Lo
Using
and
Engineering
Seeded Trees* V. Ravishankar
Chinya
and Computer
University
of Michigan-Ann
1301 Beal Avenue, mingling,
Ann
ravi@eecs
is
methods for
not
In
the
this
for
paper,
This
we
method trees
the
data
a technique
lists
during
construction
the tree construction
.umich.
be replaced
by smsller
seeded
trees
icantly
in terms
growth
during
significantly
1/0.
that
The
databases
creasing query
out
other
to
accesses. methods
penalties
filtering
using
tial
selection,
spatial find
all
spatial
a predefine
find
in
GIS
algorithms
into
incurred
between
This
method
the
objects
access
cilitate
such
operations
Most
research
on
operation.
are
window
contained area,
overlapping
spatial
years.
search
operations
Many
in-
and
methods
have
Examples queries,
within point
a particular been
on the spa-
or
which
which
in space.
devised
to fa-
by the Consortium Networking.
for
up both
the
SIGMOD 94- 5/94 Minneapolis, Minnesota, USA @ 1994 ACM 0-89791 -639-5/94/0005..$3.50
209
similar
method
[LH92]
has
range
space
B+
tree. two
Several
spatial
Joining
database
algorithm
using
can
as some
tree-like
matching before
the
practice,
to
such study
the
and
exist
at
can large
both
descend for of
techniques
data
a general
sets.
Since
pairs
of
recorded
n +
1.
to hold with
models various
be
level
indices
The as long
the
n must to
trees,
sets
data
required
the used
indices.
traversal,
level
Analytical
indices
analyzed
proposed
two
for
tree
performance
tree-like
spatial
in tree
of memory be
amounts
of generalization
to join
is used
could
sets
z-
such
Algorithms
of the join
concept
indices
as R-trees.
by their indices
data
to these
decomposed
[Gun93]
of tree-like
amount
In
is first
using
systems, the
algorithm
the
information out,
joins
tree-nodes
join.
Join
Gunther
be applied order
[Ore89,
streams.
are abstractions
algorithm
proposed
be sequenced
algorithms
in relational
distance-
been
one-dimensional
proposed.
to spatial
can
algorithms
and
Index-Based
applicability
for
indices
Orenstein
spatial
are
exist
using
also
study
two
z-value
join
also been
which
under
using
Tree-like
join
queries.
can then
organized
must
z-order-based selection
at at join
[NHS84]
which
A
which
to merging
grid-files
spatial
spatial
elements,
that
before
proposed
to be frequent.
processing
sets
spatial
a spatial
sets if spatial
of pre-computation
method,
indices
0re91]
breadth-first
Permission to co without fee all or part of this material is granted provid INl t at the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and I& date appear, and notice is given that copying is by permission of the Association of Computing Machinery. To copy otherwise, or to republish, requires a fee andlor specific permission.
assumes
based
builds data
accelerated
access
them.
those
be
indices.
method for two
as
generally
tree-like
sets are known
for
for joins
applications can
on
the overhead
data
join
and
data
It
main
for
speed
join
[Sam90].
*This work was supported in part International Earth Science Information
time.
associated
[Rot91]
on the
Spatial
indices,
based
[Va187]
time
relevant
have
intersect-
queries, point
these trades
as the
1.1
of
index
done
operations
such
on join
those
index
building
into
received
has focused
join
of a join
joins
based
and
algorithms,
is used.
have
system
those
spatial
perform
systems
recent
in such
window
all objects
in
join
as the and
or spatial
selection
necessary
z-ordering,
been
applications.
but
0re90,
signif-
GIS
Existing
to
joins
and
has
important
overlay.
be built
a large
work
most
expensive
invocation
up
little
of the
are
used
linked
allows
one
map
order
attention processing
databases
index
tree
speeds
spatial
CPU
seed-level
in
tree construction
of sequential using
spatial
The
tree
reducing
technique
show
those when
nodes
uses intermediate
The
join,
divided
are divided
The
thereby
and
numbers
of disk
except
and
relatively
spatial
on
Introduction
Spatial
ing
that
studies
outperform
are also lower
tree
disk accesses during
performance
1
edu
version
can also be used to filter
process.
of random
that
sets involved
levels.
construction,
We develop tree
of thedata
to guide
size.
method
process. grown
seed levels
during
map
selections.
seeded trees at join
structures,
the
multiple
join
of
assumption
non-spatial
trees called
are R-tree-like
The
some input
Our
involving
uses knowledge
and
the existence This
a spatiaJ
index
are used
construction.
sets.
involving
explore
seed levels
seed levels
assume
data
to speed up the join
Seeded
number
joins
applications
constructs
in the join
the
spatial
or for queries
dynamically
into
for
participating
realistic
overlays
time.
MI 48109
However,
Existing
layer
Department
Arbor
Arbor,
Abstract indices
Science
In such
high
fan-
to
used
were
techniques,
but
memory
constraints
The
R-tree
were
SRF87]
have
atively
simple
structure
spatial
objects
with
been
R-tree
is a B+-tree
tidimensional contains pointer ing
like
node
of all A
old), is its
ence
their
A
form
that
(mbr,
and mbr is the
refers
spatial
by entries
to
them
into
spatial for the
in the
units,
tial
and Dp
without
based
on
the
R-tree
each
computed a
[BKS93]
et al.
requires
or
variations. data
index.
techniques
to reduce
The
matching
starts
R-trees
for
the
matched
tree
are
to eliminate If the
As
a result,
process
the
was
results
order
technique also
than used
to find
on in
was
input
to a spatial it
result
from
the
Also,
when
inefficient
and/or fore
previous
in the join
to have some a spatial
been
constructed
costs,
pre-computation
to
join
Such
is invoked.
have
overlap
order
a derived Our ically
less
necessary access
input been
struc-
data done
requirements
dices
be-
tion,
can
the
210
in data
data
pre-
query,
only
if
a small it
may
selection
first
buildings
and
approach which
would
no
it is also common
spatial
for
Other
spatial and
queries,
requiring
sets
further
methods
original
methods could
for
be
very
operations buffering
such [SE90]
transformation complicating
call
a data
set
or non-spatial
are
derived
join.
the that
is
operation
FSR87,
for
of spatial
join.
of disk
the Thus,
mts
spatial
up
are not
of these
of spatial
they
accesses
were
incrementally, indices
Most cost
as
In particular,
these
construction.
data
most SRF87]
context.
to be built
to minimize
is to dynam-
However,
BKSS90,
algorithms
for all-at-once
number
queries
the
a different
assumed
are designed
such for
spatial for
construction
that
spatial
[Gut84,
designed
not
we
to supporting
methods
average
queries
If an spatial
to the
access
aggregation and
a spatial
sets.
of previous
related
such
for
data
set,
to support
indices
multiple output
convenience,
access
optimized
sets
and
for
remotely
sets
occur
build
originally
index
data
approach
The buffer
as that the
second
This
pre-computed
of an earlier
A page
Realistic
the
that
such
the
set
is the
For
the
though
join
situation, the output
also used,
some
for
a park
using
is large,
data
be only
original
1/0,
traversed.
for
between
Utilizing
also
disk
experiments.
require
with
applied
join.
or infeasible.
of the
so the methods
in been
the following
a non-spatial
spatial
re-classijicatiorz,
may
used
Motivation these
have
consider
government-owned
joins
might
original
as
be
needed.
to decide was
to involve
such
1.2
all
parks
is government-owned,
applications,
query
set.
reduce
of all
and
data
could
be
to perform
an intermediate
joins,
not
buildings
the
In real-life
the
technique
costs.
and
exists.
over-
tests
in 1/0
indices
algorithms,
can
of buildings
set
perform
index
consideradid
of R2.
the
involve
inthose
Iil
the children
degrees
CPU
in performance
paper
further
were
DB
all
buildings
However,
number
Tikl
by
area
would
also used
decreases those
total
efficient
of comparisons
of a node
based
showed
substantial sizes were
the
of the
nodes
of RI
of overlapping was
children
or
contains
contains
overlap
existing
R-trees.
be more
When
To
sets DB
R-tree
that
query,
fraction
1/0.
reduced.
DP
and DP,
[BKS93],
leaf
the
R-tree
first
this
of disk
a plane-sweeping
significantly
the
the
Brinkhoff
when
and
child
data
all government-owned
denote
costs
pairs,
number
number
plane-sweeping pinning
any
child
the
in
no answer
and and
reduce
in which
will
algorithm
from
spa-
be easily
a park.
computed
a depth-first
reported
of a child
area,
child
Find
For of
traverses
intersection
children
not
area.
that
all buildings
with
of the
CPU
their
box
on one axis,
further
Q2:
of
It
nodes
in
We
two
two
area , and
DB
exist
operation.
common
for both
oper-
indices
costs,
root
described
amount
some
bounding
this
to
collection
recursively
are
reducing
for overlapping
sorted
tures
the
spatial
pre-computed
sets may
consider
some
Assuming
Find
then
overlap,
intersection
mat thing
of the
of the join
at
to
tion.
looking
1/0
consists
is straightforward.
trees.
representing
found
the
disk
a
resulting
both
the
boxes
used
All
example,
may selec-
of previous
spatial
Any
data
to the join
cover in
sets.
original
results other
no
of their joins
of previous
the
cases,
many
operations:
Ql:
a pre-
algorithm
then
techniques aimed
at reducing
were
lap
an
addition,
of some
data
in
to main-
method
have
discussions.
those
bounding Rz
the
area.
join
such
applied
that
the
algorithm
join
to
and
Results
in
component
Improvement
aimed
and
order.
in subsequent
cluded
and
children,
reached
matching
join
children
overlaps,
traversing
nodes
This
algorithm the
a join
set
algorithm CPU
by matching
down
The
matching
tree
proposed
participating
R-tree
R-tree
tree
for
buildings
higher-dimensional
attributes, in
satisfy
sets regardless In
of a series
results
derived
constructed
Brinkhoff
two
index
As
refer-
points.
of
newly
to
be cost-effective
all data
results
or the
Clearly,
efficiently
form
object,
on the
joins,
not
frequencies.
on non-spatial
bound-
R-trees
in whole
tions ations.
of the
a spatial
mul-
impossible
for
and
be required
node
even
it may
structures
patterns
cp is a
entries
rectangle.
objects
or transforming
where
minimum
described contains
old
stores
or
First,
index
usage
An
R-tree
cp),
tain
of
objects.
non-leaf
situations.
rel-
handling
as region
method
bounding
stored
clipping
access
be inconvenient FSR87,
to their
efficient
such
leaf-node
where
due
their
objects
minimum
BKSS90,
popularity
objects.
to a child
in depth.
[Gut84,
extent,
of the
mbr
considered
and
entries
node.
(rnbr,
gaining
spatial
rectangle
child
not
and its variations
per
tend
in-
selec-
to reduce
spatial
selec-
tion
and
the worst-case
disk
space
ways
the
joins.
In
at join
do not
data
time,
For
trees. is
for
We
the
participating
situations
The
available
algorithm
taking
the
and
join
time.
at
operation,
a seeded
participating some results.
tree
Our
algorithm,
given
and
show
methods,
both
that
is
then
with
reasonable
Figure
of
the sets
using
matching
Our
when
paper
describes
is
the
detail.
organized
structure 3
construction
Section
of the seeded
techniques
Section
Section
6 concludes
follows.
behavior
presents
costs,
studies.
Section
as
and
Section
performance and
in
the
4
reports
5 discusses
this
to
2
with
trees
on
our
initial
tree.
initial
tree
that
issues,
Seeded
Let
us assume
that
DS,
for
with
an ordinary
R-tree the
which
index.
derived
The
let
central
that
tree DR
noted depends
TR
that not
index.
is to its
Let
make
expedite
the only
its
seeded
Ts denote for
the
seeded
tree
By
choosing
well,
we may
join
use
of
be the
join of
method costs.
constituent
Ds,
used
in It
has
R-tree-like data
objects
Ts.
Assume
boxes
the area.
If the
order
that
with tree
the
is to a
bounding
join
these
boxes
in TR.
are allowed
tree
but
initial
211
tree
However,
each
against
only
tree
one
bounding
for organizing
is optimized
is optimized so that
tree
for the
spatial data
seeded against
bounding
box
an
achieved,
the
tree two boxes
are organized box
be
Thus,
are different
when
If we will
in
will
TR.
selection
objects
smallest
BS4 but
the when
Ts in such
bounding
join.
with
of Ts
actually of
indices
‘for spatial
joins
shows
achieve
if the
seeded
lb
into
and
to be non-minimal
lc,
tree
each
DR.
to be inserted
root
are
the tree
be joined
to
BS3
in
Say we have
Figure
boxes
BS2,
la.
of the
match
bounding
the
index
will
with
objects
organized
the
small
in spatial
are inserted
bounding BS1,
the criteria
children
are
this
a seeded
TR will
of four,
objects
boxes
matched
of
fan-outs
process
as in Figure
been
data
and
TR,
used
have
in Figure
that
boxes
to determine to
since
with
organization
TS,
we
instead
Furthermore,
for joining
fourteen
its that
join,
growth,
information
example
of the
data
the
suitably
bounding in Ts
join
tree tree
an
When the
a seeded
re-inserting
a spatial
tree
be matched
expect
of tree
TR and
into
for
we know
characteristics
process. an
tree
more
and
for
the
subsequently.
suggest
node.
will
will
hence
phenomena tree
root
and
improves
to guide
tree
by the
bounding
DR.
for
will
tree
the seeded
R-tree
the
performance on
we are given
the
TR
set
exists,
a seeded
the
to reduce
data
index
, for which
be constructed
R-tree
can to
spatial
matches
behind
information Ts
We
and
and
a derived
constructs
TR denote
and
proceae. DR
set,
idea
use available seeded
set DR
algorithm
R-tree
and
DS,
to join
pre-computed
data Our
data
the existing for
no
we want
a single
seeded
inserted
earlier
in an R-tree
Such
tree
tree
deleting
a seeded
importance
an R-tree
Trees
that
were
inserted
inserted
of TR can be used
is illustrated
2
data
objects
a small
is shaped
The
paper.
data
objects
data of the
of the
observed
from the
data The
[BKSS90].
characteristics
reduce
related
tree
constructing
we know
which
organization
of the
can start
join
in
BKSS90].
the initial
of starting
This
tree
order
performance
existing actual
the [Gut84,
a fraction
of our method,
and
it
It has also been
in
performance.
on
into
position
proposed
outperforms
also decide
phases.
in
Beneficial and non-beneficialformation of bounding
1:
boxesin a seededtree.
compute
tree
construction
Date object in tree 2
a join
to
TM
Bounding box in tree 2
■
not
data
proceeds,
any
Bounding box in tree 1
~q
(c)
it handles
for one of the
component
tree
pre-
encountering
our method
during
to
input
algorithm
matching
simplicity
experiments
use
our
dynamically,
algorithm
works
we will
as the tree its
matching
the
but
advantage
join
m
called
indices
constructed
The
method
but
[BKS93]
sets.
=
a
which
Thus,
the
Upon
tree is first
data
standard
join
with
indices
are constructed
about
(b)
at
available
in
sets.
S4
processing.
index,
pre-computed
trees
We need
to create
system spatial
data
of the
structures
R-tree-like
using
seeded
construction
information
of
require
where
practical.
a
type
both
sets.
problem
index
assume
not
sizes
join
this
using
does
data
spatial
address
al-
So is the spatial
of information
efficient
main
the
time.
m
available
is inexpensive
advantage
al-
of spatial
construction
example,
that
are not
context
information
at join
method
and to minimize
index
of these
we
join
seeded
method
takes
paper,
R-tree
its
exploit.
to support
spatial
exist
spatial
structure
and
time this
new
is useful
characteristics
a spatial
in the
existing
sets are known
distribution
In
characteristics
ones
there
that
data
at join
Such
important
addition,
gorithms
join
costs,
consumption, most
time
input
selection
and
when
can
create
the an
be inBerted
as
slot
B2
slot
seed levels .—— ——— grown levels
R4 R3 o —
El
IF
Ill
).!..
//-=’&i--b~ grown subtree
seed node
w
grown node
Figure
in
Figure
during
lc,
In
the
the
1/0
tree
first
and
seed levels
grown
from
small
number
children
levels
are the
root
nodes,
of the form
a non-leaf
objects
cent ained of the
form
spatial
object
in the
an additional
the following
node.
and
field,
data
mbr
if seed level (see
we assume
Seeding
contains to
seed
through In of
the
the
copied
levels
1 (the
T5
The
a simple
are
k – 1, and
to level tree
phase
a growzng
phase,
The
tree
leaf
the
the
phase),
the
be
In
3.2).
In
never
do not
R-tree
up
some
T5
at levels
nodes.
The
set to NULL. a slot,
and
O (the span
The
the
seed
copying
fields nodes.
TR,
of the
of level
each
(mbr,
insertion
the
call
k – 1 of the copied in
level) level
k
the
into
growing
the tree.
seeded
tree,
seed
phase,
the
to
copying
over
tree
TR
to
the
best
long
box
B2
bounding
of the
seed
describes better
child
nodes
B1 and
are
the
level.
for
guide
B1 instead
slot will
deciding
212
minimal RI
(the
may
growing
never
changes.
grown
be
levels
levels.
The
described
two
in
seeding
tree
a large
dead
whereas
insertion
[Gut84], of B2, which the join
could
area If the
unchanged
area
object
penalize
increase
S1 would result
process.
the
consider
111 contains
minimal
and
be
minimum
will
bounding
R4.
As
and
B2 is a more
children.
to be copied
seeding
always
box
and
are
the
not
formed
bounding R2,
tree,
Simply
As an example,
R3
of its
the
tree.
from may
badly
tree.
of
of seed nodes
seeded
TS
squares
B1 has
growth
fields
boxes
tree
and
we use minimal
during
nodes
nodes
seed
will
of the
the
and
accesses
the
be
boxes
seed
the
the
the
box
Copying
where
B2 were
tree,
of
at
nodes
bounding
its children,
k – 1
must
level
insertion
into
guide
seeded from
description
fields
spltttzng
levels
rectangles box
fields
2.2.
the
contains
the
reachable
bounding
of the seed levels
of the seeded
3,
two
reflect
data
slot
data
performance
boxes
Figure
box
bounding
the
not
all box
of the
grown
strategy.
at level
nodes thus
crucial
performance
undergo
to their
seed
in the
into
k — 1 seed pair
the
the values
copied
fields
cp)
the
which
the
in a bounding
need
of
minimal
upwards
in Section
bounding
seeded k levels
seeding may
being
pointer
fields
data
from
before The
top
TR nodes
We
information
root
of the
O to k – 2 are set to point
level
phase.
from
the
is called
pointer
The
levels
over
R-tree
transformations
corresponding nodes
from levels
grown
is derived, box
simple
numbered
by
TR.
information bounding
clean-up
the
phase,
set
of a seeding
cp)
in the
insertion.
begins.
node
of
data
of mbr
bounding
during
propagates
behavior
value
true
structure
particular,
pointers
boxes
guide
box
fields
as needed
bounding
to
bounding
change
will
the
(mbr,
matching
pointer
can have
level).
seeding are
and
consists
null
the
pair
the
and
bounding
fields
tree
Since
of a seeded
have
slot
only
The
:3) Although a
Phase
construction
level
following
boxes.
bounding
into
detail
2.1
the
used
cp,
before
of all
exist.
The
have
construction, are
minimal
through
Section
grow.
tree
at the
and pointer
true
modified
jiltering
shadow
box
contains
refers
is the
eventually
in a seed node,
to a child
in a seed node
insertion
discussion
old
will seeded
bounding
nodes
Thus,
a the
rectangle
where
tree
of the
nodes
tree
seed
As with
A leaf node
old), entries
from
tree
the
non-null
(:2) During
levels for
cp points
bounding
database,
shadow during
where
minimum
The
seed
seed
but
nodes
span
(:1) The
seed at the
level.
which
seed levels
seeded
the
tree
leaf
Efficientand inefficientboundingboxes
3:
into
——— —. —__
properties:
by
those
The
in the seeded
in the child
object.
The and
levels
to the
cp),
(rnbr,
of
consecutively
grown
node
(mbr,
is the
TR to the
2).
continue
The
is achieved
nodes.
seed level
entries
performed
shape
be reduced
consists
seed nodes,
and
last
and mbr
R-tree tree
grown
of levels.
of the
of that
can
goal
(see Figure
called
R-tree
box
of the
are called
entries node,
cost
this
a seeded
levels
start
CPU
Figure
The
k levels
at the
levels
and
method,
Structurally,
grown
L L
process.
seeded
copying tree.
R2
Exampleof a seededtree.
2:
both
the join
0
subtree
❑
J
l–
a result,
only
compact
bounding into
badly and boxes
the seeded
as the
criterion
be inserted
in unnecessary
into disk
Other
information
box
field
had
copied
the
seeding
into
of seed
can nodes.
the
In
the
for
of seeding
we
copying
into
previous
been
if we
boxes
inserted
from
SI — .
from
properly
three
❑ ■
A
bounding
example,
investigate
information tree
the
of bounding
have
study
box
copied
points
S1 would
this
strategies fields
In
center
tree,
Bz.
be
s
copy
the
minimal
(a)
different
the
bounding
S1
nodes:
(72:
copy
the
center
bounding points
the
minimal
the
slot
minimal
box
bounding Our
level,
show
that
the
growing
into
the seeded
tree
traverse each
the
level next
and
a slot
first slot
tree
node,
and slot
root
the
data
it
into
two
nodes.
new
root.
behave
like
R-tree
being
Recall
the
node and
that
Thus,
a seeded
a small grown
tree nodes
of
to by the each Figure
seed
slot
to the
pointer
up
growing
with
an
slots.
The
as consisting
of
R-tree
of
forest
a grown
subtree
At
choice
seed
level
we must
to traverse,
based
fields
selection
inserted
the
also
this
study,
until
on the
of each
node.
central
point a child of the
on
or an area, whose data
The
exact
whether
the
If central
central
being
stored
Update
point
inserted.
for
points
are stored,
is close
to the
If
areas
are
child
U5: a
only
in
we
not
insertions
boxes
to will
in trying boxes
boxes
associated
insertions
to
from
right
to reflect
pointer
will
set IIs the
We boxes,
choose
bounding
their
of data
seed nodes
by the characteristics
derived
after
traversed
at all
be guided
the inserted
following
not
seeding
tree
so far.
bounding
In box
Update Bounding
insertions. bounding
the inserted
as U2, only
Bounding
is
part
seed bounding
U4: Update
stored
used
insertion,
the bounding
through
inal
box.
criterion
If
Updating
to enclose
the
value
sum input
bounding
bounding
causes
information
data
subsequent
be guided
tion
this
bounding
the the
policies:
from
in the
original
we investigate
We make
a child
is found.
information
depends
choose
choose
a slot
and as that
these
them.
subsequent
by the
C73: Same
(see
it
traversed
each
boxes,
tree.
so that
closes
each level
data
but
2).
next
the
and will
seeding
by
pointed
bounding
from
same
update
update
each insertion
only
tY2:
smallest box
of the
after
to
to
using
the
fields
bounding
U1: No updates
phase.
R-tree
how
a slot,
update
to
of the seed levels
whole
is called
of the
pointer.
propagate
be visualized
nodes,
attached
not
root
slot
slot
is the
whether
these
times,
to
this
the
structure the
to point
criterion
box
how
the
of
bounding
updated
and
after
and
parent
through
to by the does
the
can
nodes, the
insertions,
during
tree
grown
is modified
pointed
splitting
unchanged
two
the
subtracting
old
always
choose
of
to
the
bounding
continue
the
of This
can
find
yields
construction.
update
through
that
insertion,
areas
The
is
due
a child after
are not
new
like
the
R-tree
Otherwise
overflows
insertions
R-tree
node
node
behaves it
area
is the
the
found
choose
rectangle.
of the
to it.
at
reached
grown
we box of
from
pointer
to become
pointer
level,
is
point
node
When
we
If this
a new
into
into
slot
child
node
split
ordinary
seed levels,
(c)
are
traverse
level
to
D5
object,
data.
the
grown
slot
slot
inserted
allocated
The
to
the grown
Subsequent
that
remains
be
the
is set
R-tree.
node
to
case,
object This
will
grown
this
a data
the
slot,
pointer
pointer.
the
point
In
inserted
the
box
this
child
in
an example
node the
inserting
of an ordinary
a third
the
for
NULL.
insertions,
root
Eventually
the
the data’are the
the
through
be
R4
Figure 4: Exampleof seeded tree growth. The fan-out of tree nodes is two. In (a) the seed levels havejust been set up. In (b) rectanglesRI, R2 and R3 are inserted through slot S1 and S2. GrownnodesGI and G2 are allocatedas a result. In (c), rectangle R4 is inserted through slot S1, and ovetflows G1. A new root G4 of the sub-treeis allocatedand S1 is made to point to it.
and C3 almost
4 shows
insert
a suitable
chosen
will
R2
R3
the
minimum
objects
Figure
To
from
level,
allocated,
Cz
data
tree.
growth.
insertion
the
levels,
true
R3
(b)
Cl.
phase,
choosing
the
other
of
R2
G3
B RI
Phase
During seeded
points
the
strategies
2
G1
A
bounding
children.
copy
inserted of
At
strategy
Growing
center
contains
of its
out-perform
the
boxes.
field
box
results
2.2
copy
bounding
bounding
always
2
RI
At
s
G––
Q
boxes. (73:
s —
boxes.
of
grown node
E data rectangle
S1 Cl:
seed node
but
boxes bounding boxes
after
objects
each
inser-
and the orig-
box.
inserted
bounding
boxes data
the data,
box at other box at other
updated but
at
bounding
not
the
the
slot
seed levels at
the
slot
seed levels
box
en-
seed bounding
level are not level are not
as in
Uz.
updated. as in
U3.
updated.
we
central
The
stored,
in ordinary
213
bounding
boxes R-trees.
at the
grown
level
are updated
as
The are
clean-up
inserted
fields
phase
into
of seed node
bounding data
boxes
objects
made
after
seeded
are adjusted of their
are
all data
object
The
bounding
tree.
to be the
children.
deleted
and
true
Slots
box
For
minimum
data
tree
no
Note
process
that
more
data
objects
slots
than
into
may
have
with
directly
During
trees
to
data
some
subtrees
since
the
tree
require
the
it
can
be
developed
for
seeded
trees
pointer
be
any
mat thing
are
as tree
full,
Tree
Since
traditional
buffer
for
Construction
incremental
structed
all
be high. time,
this
section,
diate
at once, we
accesses
during total
exploits sizes
tree
that
the
costs. interme-
hence also
seed
disk
a technique
levels
the
together.
Since
hence
a grown
trees.
tree
3.1 We
T’uning have
indices
grow
actual
and
trees
and
using
such
overflow
can
buffer
several
times
a 512
result
in
Table
2, second
page
7,234
of disk 100K
cost is the If data
objects
in their
input However,
close such
join
This
number
needed
difference for
three
a match
factor
the
times
with
size
the
in space of buffer is hard
to 50.
can
for than
times
the data
random
and
the
tens least
and In
number
the
even
having
with
size can
data
our
stream.
are also close will
to guarantee
experiments,
more
than
worst
buffer
in
tree
214
2,500
least
tens
few
as
from
of
could times
larger
if
some
grown
they
are
likely
built
using
the
same
to
since
smaller
the
constructed
using
be
buffer
much
unlikely
than
be
overflow subtree
buffer
size,
In
seeded
trees
with
buffer
size.
The
a 512-page misses.
to
input, smaller.
to
average the
at
work
of
even
are
two
a few
buffer, likely
in
times.
a fan-out
we ha~e experien-ced
is two
since
access
be
that
skew
nodes
varies
least
we have
must accesses
is determined as
at
buffers
much
tree For
at
access
However,
Note
incurred
“overflow
construction
a seeded
R-trees
data
disk
random
algorithm
the
of
therefore
method
the
size
chances
construction
levels.
that
however, some
be
of
than
be made
faster
assuming
R-
are
lists. than
of slots
size.
penalty
linked
in
seed
of
an
size
The
this
tree,
size
of sequential
faster
in much
overflow
smaller
practice,
the
number
trees do
number
can of all
seeded
the
buffer
price
we
average
of random
The
means
buffer
subtrees
construction
misses
This
objects
instead
in the
data. the
number
is much
hundreds,
the
input
in the
of
the
seeded
much
the
of slots
number
levels,
same
results
have
modified
lists,
than
reading
number
the
seed
(see
input
an R-tree
in the input
chances
clustering
is nine
affecting
to each other the
page
between
and
an Robjects,
construction
to read
by
process.
data
lK-byte during
costs
this
An
that
data
one
slots
overflowing
and
by
smaller
access
1/0,
The
bad
1/0
40K
of clustering
order,
disk
constructing
Another
degree
in
that
with
disk
are many
reduced.
writing
sequential
algorithms,
of the actual
set
for
seeded
particularly
with
accesses,
access needed entries.
resulted
accesses
the
the
The
linked one
is much
is an increase
start
lists.
R-tree.
the
and
we can
the
subtrees,
the
smaller,
pay
sizes of
and
data
row).
disk
R-trees
In
with
a
as before.
is then
grown
subtree
is significantly
as the
space.
relative
observed
accesses
disregarding
misses
construction
that
buffer
disk
of disk
sequential
lower.
than
we have
using
set,
both
much
tree-like
buffer
on the
high.
have
higher
buffer
memory
For
be very
a 800 K-byte
number
from
depend
buffer.
misses
example, for
the
costs
the
of constructing
mainly
built
using
the
is called
lists
pointer
subtrees
there
subtree
slot, slot
of
are reset
linked
of linked
intermediate
grown
most
are inserted, the
pages a small
pointers
proceeds
from
same
additional
than up
are will
data
so written
then
an
a data
buffer
all
longer slot
of
objects
an
which
lists
and
the
freeing
group
the
in
list
contains
box as data
lists
disks,
The
root
many
a grown
Costs
costs
straightforward
costs
cases,
For
the
arise
construction
tree
tree
that
all at once
trees
the
Construction
found
the
lists
insert
in
in DS
lists.
such
construct and
to reduce
under
using
linked
to
list
each
a linked
pages
subtrees
for
to the
By
into
grow
process
objects
in the
to point
substantially
present
of the
In
random
grown
recorded
subtree
of linked
grown
is built
been
at
uses
of the
R-tree
can
such
that
construction, We
costs
set
all data
inserted
corresponding
insertion
constructing
con-
dynamically
we reduce most
structure
of seeded
trees
a method
costs.
being
construction
seeded
to eliminate
join
for
The
The
data
want
to
The
5). tree
a grown
linked
constant
NULL.
When
optimized
all
the
the
data
now
that
into
lists
a linked
write
space.
batch.
been
than
total
examine
lists
reducing the
their
construct
we
have
rather
it is important
linked
that
indices
updates
As
join
Improvements
tree-like
we
pre-defined
a pre-requisite.
we
into
(see Figure
a bounding
all
most
forming
size,
in the
linked
If
object
slots
by
built
with
Eventually
avoid
buffer
be
page
The
the
organized
each
allocated.
to
3
field.
data
R-trees
as long
A data
to
misses
if we estimate
the
not
able
buffer
under
first
of entries,
inserted.
applied
Furthermore,
than will
been
to
phase,
but
pages.
array
not
balanced,
to matching
is not
into
does
difficulty.
technique
be applied
balance
be
a slot
have
due
lists
growing
be larger
through immediately,
grown
However,
linked
the
algorithm,
inserted
[BKS93]
any
insert
tree
we
accesses
intermediate
structures
the seeded
a result,
heights.
without
optimization
been
As
TM
after
above
have
others.
procedure
participating
the
may
different
matching
begins
trees,
disk
size will
matching
is built.
seeded
random
containing
relevant
general.
in DS
consistent.
The
can
begins
the
during
seeded
n
4
S1
S2
s
s
A
S4
.
L41
s
—
@ —
— L11
S2
s
L12
—
—
!$
seed node
fi
grown node
❑ hdllinkecllis.tpage @
*$-
(a)
half full linked Iistpage data rectangle
■
dm”b
L;l
m.
❑ S4
(b)
(c)
Figure 5: Treeconstructionusing linked lists. Assume each data page has a capacityof two data objects, and the buffer has a capacity of 10 pages. (a) shows four linked lists formed under the four slots. All 10 pages in the buffer have been allocated,and one additionaldata object is now insertedinto linkedlist underslot SI, resultingin bufferoverflow.In(b), the batchof linkedlists formedin (a) has beenwritten to disk due to bufferoverflow.A new linked list is formedunder SI for the data object insertedin the last step. Grownsubtreesare constructed using the data in the linked lists when all the data objects are inserted. (c) showsone grown subtreeconstmcted usina linked lists LI I and LIZ, the linked lists constructedunder slot S1. .
“
3.2
Reducing
Tree
Sizes
with
Seed
Level
tree
Filtering The
seed
levels
of the
they
can
function: size.
Smaller
tree
tree
is easy
and
indexed
one bounding the
data
seed
R-tree
with
this
When
seed
nodes
during
tree phase
seed
not
Ts,
the
a data we first
field the
in
DR
The
shadow
and
so the
first
test
query
k levels a
to be non-
into
fields Once
space
of the
need
are not occupied
the
Ds.
1/0
tuning
generality,
the
TR
during
be inserted used
after
by them
are Ts
pages.
are
used
If there
into
tree
any
not
is
becomes
both
4
Experiments
Assume derived
that data
set Ds
tree
TR exists.
tree
joins
RTJ,
and
our algorithm,
a spatial
join
and
is to
a data
We conducted
be
set Dn,
performed for
experiments
using
three
BFJ.
Algorithm
spat ial
STJ
join
as described
so far.
which with
algorithms: (Seeded
on
Tree
It constructs
a
an RS
TJ, is
dirt y buffer
process
if
215
and
data
generating x
distributed clustering
of clustering
object of
tree
writes
cent aining and
construction. buffer
cache. the
tree
containing
Ts
of spatial
cluster-
by a simple
set of x x y objects,
the
map
centers
whose area.
of y data By
rectangles, data
must We do
manager.
rectangles,
rectangle.
is
matching.
during
pages
512
buffer
tree
a warm
degrees
in
of the
R-tree entries
the
pages
after
a data
the
the
buffer
was cent rolled
cluster
distributed
and
as dirty
buffer
of different
page as are
are to be re-used.
by the
When
each
buffer
buffer
generated
degree
disk
contain
and
disk
dirty
tal area of the clustering
a seeded
the
RTJJ
with
be
we
within
For
and a 4-byte
and
pages
scheme.
randomly
box
the
starts
could
randomly
Join)
and
used,
a dedicated
if the pages
of clustering
first
CPU
bytes,
to
are marked
degree
were
seeded-
nodes
are re-allocated
The
nodes
construction
there
We studied
tree
STJ
matching
that
ing.
these
was
lK
assumed
of T5,
tree
the
tree
nodes
free.
as
between
[BKS93].
both are
bounding
tree
to disks
purge
are
assume
construction
matching
construction,
D5
to
the
in
size
seeded
files
also
created
Note
all.
use
that
page
algorithms
during
Hence
data at
TS
We
be written
at least
queries
join
structure
assume
the
data
For
newly
the growing
we
of a 16-byte
During
phase.
both
Ds,
(Brute
in
answers
a spatial
described
R tree
memory
The
nodes
overlap
the
sizes of both
consisting
overlaps
k seed levels.
and
identifier.
fields
original
for
of window
of
to
STJ
and
Ts
BFJ
rectangles
aggregation
is a
of its variations.
nodes.
When
these
object
the
data
techniques
simplicity,
TR to the
construction
does -not
not
the
a series
tree
by Brinkhoff
Algorithm
performs the
the Join)
an R-tree
TR.
using
RTJ
and
any
with
The
matches (R- Tree
proposed
is equivalent
DR
size
field
corresponding
copied, tree
in
box
shadow.
of the
data
object
entries
from
of the
entire
at each
fields
called
fields
if the
data
and
the
bounding
TR,
disk not
Ts at all.
simply
then
RTJ
constructs
Ts
queries
and
algorithm
It first
windows.
window
whether
of the
R-tree
For
is to be inserted
check
one shadow
R-tree.
found
is used,
mbr
the
object
no overlap,
overlaps
set D.s, Algorithm
matches
Join)
on the
data
if it
the to
is copied
change.
When
some
of that
be inserted
time,
the
during
phase,
only
objects
additional
shadow
without
changed
object
filtering one
information
of
into
nodes
level
construction
levels
copied
resemble
not
both
data TR.
[BKS93].
then
Force
level
be used
Data
during
and
the and
variation
et al.
tree
seed level filtering.
carry
seeding
each
can
TR.
1/0
overlaps
of TS
they
seeded
Ts
simple
additional
time.
if and
at
one the
less disk
a rectangle
TR need
test
serve
reduce
join
box
overlaps
overlapping
seed
incur
an R-tree
levels T~,
object
We call
by
to
actual
to see that
at least Since
tree
be used
objects
of the
seeded
sizes
construction
It
Ts for
indices
.
set.
The
then
rectangles
controlling we could
centers We the
control
smaller
the
tothe to-
tal
area
of the
the
data
set.
ing
rectangle
to
lie
The
clustering
map
over
the
boundary
clipped.
area.
cluster
ing
rectangles
Without
X and
The
in DR
the
of
of the
We
100,000,
The
6:
centers
study
the
1.0,
of the
z
6000
of
2000 0 20k
Figure
copy
40k 60k Cardinalify of data set S
SOk
7:
1/0costs for tree construction,ExperimentSeries 1.
❑ ❑ ❑ ❑ ❑
14000:
For
effect
of seed
rectangles
tree
each Among
g 12000:
of v ~
level
8000
of Ds
on side
series
policies,
and
C3 (see Section (see
Section
The
differences
were
marginal.
we found 2.1)
2.2) Due
and update gave
the
three
to
seeded (C’3,
that
always
between
from and
combinations
20k
U4),
space trees
best
denoted
using
copy
U3,
C72
U4 and
update
Figure
8: l/O
.Di=ttHd
R2=k=!
that
as STJ1 the
H
❑ ❑ ❑ ❑
BFJ RTJ STJ1-2N STJ1-2F STJ1-3F
only “
combinations and
STJ
0.2 ‘ 0.4 ‘ 0.6 ‘ 0.s’1’ Fraction of area used in data clustering
STJ2, Figure
showed
costs for tree matching,ExperimentSeries 1.
policies
we list
respectively. experiments
SOk
performance.
limitations, built
node
strategies
policies better
40k ‘ 60k L ‘ Cardinality of data set S
as
set of
of experiments
copy
STJI-3F
length
same
of seed
STJ1-2F
of
was set to be same the
RTJ STJ1-2N
,,,
2000:
of the
quotient
BFJ
4000:
O.2, 0.4, 0.6, 0.8
We ran
as the in first
various
U5
U3),
cover
bound
experiment.
the
40,000,
on side length
equaled
upper
fixed
and
configuration.
the
results
The
variations
update
the
rectangles
in each
data
so that of DR
we
of clustering bound
STJI-3F
4000
data
2.
the
effect
upper
STJI-2F
area.
node
Section
100,000
degree
the
rectangles
clustering
seeded
the
at
RTJ STJ1-2N
6000
to 0.04.
map
seed
experiments,
Ds
varied
adjusted
respectively.
of DR
of
and
‘
rectangles
studied
the
:
RTJ and BFJ TJ variations
in
we and
12000
S
different
STJ1-3F
❑ ❑ ❑ ❑
140C0
upper
of all the
of
g
Totaldisk 1/0costs, ExperimentSeries 1.
= 10000
the
degree
setting
we tested
of
series
clustering
clustering
Our
60k
20000 1Sooo
first
varied
to 20$Z0 of the
levels,
DR
and
sets.
(C3,
the
of policies,
second
respectively,
the
40k ‘ 60k ‘ Cardinality of data set S
on performance.
In
and
at
rectangles
as described
of seed
cardinality data
set by
extensive
policies,
the
of the clustering
that
combinations
number
filtering
area
1 along
In
and
to 80,000.
restricted
an
I)R
levels,
was
configuration,
combination
the
of
4
quotient
were
data
update
each
map
O to
;
of clustering
meaning
conducted
and
for
20,000
of data
0.2,
applying
that
from
cover
was
each
and
and
of
STJ1-2F
RTJ
of
Q
cardinality
on the side length
objects
the
from
Figure
number
of experiments.
R-tree
of IIs
resulting
For
series the
an
clustering
in DR
total
STJ1-2N
&q
BFJ
objects
of cluster-
Results
fixed
cardinality bound
range
20k
it was not
of data
of generality,
-,
extended
16000
two
in
spatial
to fit
Y axes.
we
resulting
over
clipped
rectangle,
❑ ❑ ❑
When
extended
were
to the
to
were
bound.
and the number
loss
Experimental
series,
rectangles
the number
assumed
This
clustering
rectangle
clustering
set according
was
conducted
the
of its
of the
rectangles
a data
cluster-
bound.
upper they
clustered
independently
of data
area,
When
was
study
4.1
by
data
was set to be 200,
objects.
under
We
tot al area
In the experiments,
per
and
a smaller
map
more of each
upper
shape
or
of the
the
both
and
using
into
data
the
rectangles
boundary
the width
a predefine
size
chosen
the
randomly
controlled
rectangles. similarly
and
chosen
O and
bound
rectangles,
length
was
between
upper
the
clustering The
versions
216
9:
Totaldisk 1/0 costs, ExperimentSeries 2.
H
RTJ
I ❑
STJ1-2N
❑ ❑
STJ1-2F STJI-3F
tree
construction
may
need be written
during
tree
should
thus
the
The
CPU
I
I
I
■ ❑ ❑ ❑ ❑
1
1
3
whether
two
used
0.4 Fracfionof
0.6 area Wed
0.s
11:
sT,,-,,
cost
arises
For
S
from
BFJ
subst ant i all y outperformed terms
of
disk
the
when
seed
level
filtering,
costs
level
magnitude
from
Figures
observed
in the
10, and
11 break
creation
and
tree
no tree
creation
Tables of the
first
appears
after
A letter
“F”
that
disk
show
no filtering
tree
costs
an
seeded
in
The
numbers
small
1/0
2,3
and
matching and
fly,
1/0
those
BFJ
and
access.
“construct”
1/0
used
following
the
number and
test
X or Y
we
found
terms
during
tree
tree
RTJ
For
during
this
construction.
both
from
buffer
the
linked
lists
of
in
reads
as
increases.
misses
and
during
creation
tree
time
reads
TJ even for large Ds sizes, while
S
from
1.5 times
and
this
of
showed
overflow
costs
Ts
7, 8,
(Table
1) to 30 times
of tree no
give
number
access 1/0
the costs
construction,
and
write
costs,
respectively.
starts
with
a warm
buffer
buffer
two
when
BFJ
in this
case.
Since
size,
for
held
S
no buffer
TJ with
nodes
both
with
times
the
from
However,
larger
the numbers
TJ as soon as the number
exceeded
the
buffer
RTJ performed
that
D5
that
buffer
to three
by S
nodes
we found
the
matching.
incurred
due to buffer
seed
two. The
no
in
as 1/30
trend
cost
size. worse
and
“wr”
Note
that
since
found
over
from
clustering
217
performs in
outweighed
1/0
disk
is not
used,
STJ
1/0,
better
To than
is paid
for
also
gain
that
for tree
creation.
for
using
by the
the
using only
creation an
When
incurs
savings.
Filtering
than
clear
cost tree
the
a consistent
of
is especially during
among
“rd”
terms
costs
Tables
during
misses provides
in
reduction CPU
filtering
columns
filtering levels
This
of the
left
buffer
found
occurred
the
tree
R-tree
variations
three
of random
incurred
during
Seed level
stands
the
of the
expect
We
TR nodes
Overflow since
cases
1).
than
BFJ incurred
accesses
cost
STJ
seed
and
counts
costs,
1/0
indicates
levels
most
in tree
matching
name.
STJ2-2F
seed
as the disk
“N”
in all
(see Table
483 different
surprise,
linked
lists
BFJ in all cases. A closer look showed that though tree costs are lower for RTJ the high tree creation
costs
that
BFJ
sizes,
Ts
of disk
number
algorithm
eliminated
linked
numbers
cases.
was smaller
of accessed
TJ variants
S
a letter Thus,
Under
The
in
successfully
occurred.
sets,
STJ. Our earlier
of
similar S TJ incurred RTJ when intermediate
as
intermediate
only
and
those
Using
smallest
data
data
than
that
reads
used.
number
same
method
CPU
4) larger
in these
level
so incurs
indicates
two
is shown
and
for
vary
and time
accessed
the
Figures
in detail.
TJ with
A sequential disk
arises
STJ outperforms
order
disk
into
trees
this
S
that the
experiments disk
misses
reading
size is the
seed
seed
The
disk
was performed.
accesses.
matching
the
activated,
2 of
Disk
read
about without the
on the
hyphen
was
of a random tree
1/0
of experiments
the
variation
“match”
were costs
mat thing.
4 show
following
construction.
along
interesting,
cost
was not
our
of the
filtering
filtering.
the
structures
series
all
for
misses
costs.
levels
for
With
of experiments.
of tree
1 through
of seed
level
down
those
buffer
1000.
bounding
[B KS93].
of
the
RTJ they
algorithms
activated.
9 summarize
series
all
STJ were between
of
the
6 and
among
not
and
than
two
in
number
TJ
construction
R TJ in all cases also S TJ versions
The
costs
RTJ,
and
tree
of experiments,
1/0
experiments
and
costs was
CPU
greater
filtering.
creates
costs. CPU
filtering
the
BFJ
of
1/0
lowest
of
of
in data dwfeting
list
in
matching
RTJ
construction.
for
numbers
overlap
is particularly
of creation
incurred
of
overlap
units
of operations
boxes
series
The
very
numbers
the costs go up as the size of D5
1/0.
construction
costs for tree matching,ExperimentSeries 2.
1/0
“match” part
in
during
numbers
tree
first
the
the
performed
outperforms
disk
(Table Figure
the
STJI-2N
1
under
construction
algorithms
lists
bounding
that
remain 0.2
Ts nodes
are re-allocated
column tree
are
the
is the
during
the
STJ of
ST.tl-sF
write
given
by
tests
‘(XY”
expected
RTJ
containing
if the pages
to the
“bbox”
overlap
axis,
BFJ
costs
Column
In 35000,
pages
The
be charged
column
box
1/0costs for tree construction,ExperimentSeries 2.
10:
matching.
performed
The
Figure
dirty
to disks
algorithms.
tests
0.2 0.4 0.6 0.8 1 Fraction of area used in data clustering
time,
costs. increase
seed lowest
level CPU
all algorithms.
2 and
second that
Table
series the
5 through
processing
decreases,
Table
of experiments, costs
especially
for
8 list
the
In general, rise tree
as the
results we have
degree
matching.
of
This
n
1/0 Match
A lg
Wr
rd
CPU
costs
Construct
[
r-d
I
w!-
costs
(K bim
total
1/0
tests) X’Y
BFJ RTJ STJ1-2N STJ2-2N STJ1-2F STJ2-2F
438 1182 694 849 685 823
0 359 319 358
0 144, 94 94
0 243 137 150
438 1914 1244 1451
2381 130 79 84
314 349
94 94
85 99
1178 1365
896 898
0 170 168 170 168 170
1 STJ 1-3F
712 746
226 223
94 94
5 5
1037 1068
1945 2001
160 167
u
STJ2-3F
Table
1:
20K.
Cover
Join
performance
quotient
n
I
with
of D~
1/0 Match
.
llDnll
clustering
=
100K,
rectangles
I]DsII equals
CPU
costs
Construct
I I
u
(K
=
wr
bboz
XY
IJFJ RTJ STJ1-2N STJ2-2N
13650 260S 2422 2439
0 27 370 369
0 12274 366 367
0 18S7 1483 1477
13650 16754 4641 4652
6984 315 263 267
0 560 53s 538
STJ1-2F STJ2-2F
2362 2429
358 367
366 366
1343 1357
4429 4519
2603 2610
535 536
STJ 1-3F STJ2-3F
2274 2244
349 368
366 366
451 426
3440 3404
5613 5709
498 520
60K.
3:
Join
Cover
performance
quotient
I/o Match
353 363
507 507
1952 1947
5768 5885
3418 3431
686 690
2739 2745
344 354
505 505
698 672
4286 4276
7328 7435
638 666
715 719
2896 2920
1735 1739
349 356
STJ1-2F STJ2-2F
STJ1-3F STJ2-3F
1519 1537
342 353
236 236
140 120
2237 2246
3767 3843
330 344
STJ1-3F STJ2-3F
with
llD~ll
=
100K,
rectangles
\lD~ll eqU&
Table
n
80K.
0.2.
4:
Join
Cover
performance
quotient
with
of DR
a data
with
rectangle
is higher] nodes
during
as
number
mat thing.
that
of spatial
be accessed, tree
leaving
ently half
low,
the
accesses
grows
rapidly
number
than
the
We have data
is high,
decreases. Ds
eliminated
tree
the
nodes
that
when
costs
are the
become
the
degree
this
some
= 0.2.
CPU
costs
(K
costs tests)
rd
UT
rd
Wr
bboz
XY
BFJ RT.I STJ1-2N STJ2-2N
14803 2881 2265 2347
0 57 329 374
0 6909 236 236
0 121’7 794 795
14803 11036 3624 3752
6628 405 268 284
0 443 437 445
STJ1-2F STJ2.2F
2242 2328
330 374
236 236
770 752
3578 3690
2688 2702
436 445
STJ1-3F STJ2-3F
2265 2342
337 358
236 236
430 430
3268 3366
5268 5364
411 429
total
is
as the because
objects
frlt ering
of spatial
in
degree most DR,
and
5:
40fi-.
(her
Join
performance
quotient
with
of ~R
IIDRII
Chstering
=
100K,
reCtangkS
IIDsII eqU&
= ().4.
for I/o
less
number worst
decreases, becomes
Table
remain
always
BFJ,
For
must
factor
costs
the effectiveness
the
diminishes
the
I]DsII equals
In this
deciding
creation
TR nodes
and
by
the
and
Again,
overlap
in
Match
of
Alg.
of all
because
much
larger
size.
also found
is high
degree
factor
for optimization.
of clustering
of touched buffer
filtering
degree
that
decreases,
leaf
total
RTJ.
of
the
most room
those
as the
suggests
joins.
become
of
methods
and TR leaf
an important
TJ, the tree
and
Ts
Alg.
in DR
time S TJ at tree matching RTJ. This is because for low
little
S
rectangles
100K,
atc
~
that
=
rectangles
by of
costs
For
possibility
clustering
clustering,
creation
performance.
This be
of
accesses to
data
of spatial
degree
the
access more
should
results
close
degrees
in
tree
of disk
becomes
disk
we must
the
data,
overlaps
thus
the
Also,
consist
Ds
clustering
estimating
than
in
and
of spatial
case,
less clustered
IIDRII
clustering
I/o is because
wr
2956 3068
236 236
clustering
I
o r L 741 685 691
357 359
of ~R
rd
costs tests) XY
1588 1606
performance
CPU
costs
(K
UIr
= O.2.
]
STJ1-2F STJ2-2F
Join
[IDs([ equals
I
17151 3292 2996 3063
2:
100K,
bboz
BFJ y RTJ STJ1-2N STJ2-2N
COvel quotient
=
rectangles
9085 415 334 353
0 372 349 355
Table
llD~ll
o [ o I o I 17151 38 16555 2525 22354 361 506 2126 5989 362 505 2154 6084
bboz
4648 295 169 174
40K.
1
I
total
Wr
XY
I
8864 9695 3040 3064
I
total
I
rd
0 1219 817 820
!-d
I
Alg.
0 6015 236 236
Wr
with
clustering
of DR
costs
tests)
0 50 364 360
I
costs tests)
rd
8864 2439 1623 1648
rd
(K
wr
BFJ RTJ STJ1-2N STJ2.2N
Ala.
construct
rd
Table
0.2,
CPU
costs
Match Alg.
of seed level clustering
of
costs
CPU
~
(K
costs tests)
rd
Wr
rd
w!-
BFJ kl~ STJ1-2N STJ2-2N
23177 3451 3263 32S0
0 62 350 366
0 6370 236 236
0 1202 813 802
23177 11057 4662 46S4
7773 a64 419 410
0 634 514 524
STJ1-2F STJ2-2F
3251 3268
352 366
236 236
782 763
4621 4633
2707 2701
514 529
STJ 1-3F STJ2-3F
3212 3385
346 354
236 236
637 583
4431 4558
5788 5879
481 509
bboz
total
of clustering data
Table
objects
cannot
40K.
be
pro cess.
218
6:
Join
Cover
performance
quotient
of DR
with
IIDRII
clustering
=
100K,
rectangles
[lDsll equals
= O.6.
1/0
CPU
costs
Mate~ Alg.
(K w?-
rd
13FJ
costs
2.5167
rd
w!-
total
0
0
II
tests)
b oz
25167
7228
0
slots
uniformly
may
be to extract
techniques
such information
3304 3141 3206
62 358 366
6287 236 236
1195 814 820
10820 4549 4628
587 450 457
556 550 557
seed
levels.
most
suitable
STJ1-2F STJ2-2F
3142 3217
358 366
236 236
790 805
4526 4624
2242 2248
550 552
situations.
STJ1-3F STJ2-3F
3268 3487
335 344
236 236
736 677
4575 4744
5104 5205
497 526
A
the Table
7:
40K.
Join
Cover
performance
quotient
with
of DR
I l.D~ll
clustering
=
100K,
rectangles
I IDs-II equals
=
acteristics
of their
necessary
in
I/o Alg.
(7PU
costs
~
Match
(K wr
13FJ RTJ STJI-2N STJ2-2N
31831 3710 3582 3611
0 69 338 340
0 5976 236 236
0 1207 800 808
31831 10934 4956 4995
8300 763 551 566
0 623 587 613
STJ1-2F STJ2-2F
3579 3600
333 330
236 236
793 799
4941 4965
2353 2367
588 j 615
Table
8:
40K.
297 371
3689 4125 Join
Cover
236 236
performance
849 769
5071 5501
IIDRII
=
with
of DR clustering
quotient
ox
focus
of our
access
leaf
6 this
studied tree
it
may
two (1)
be
trees
input
spatial
seeded
to
when
data
following the
data
sets to be used
input
data
sets are the
it is determined
seeded the
trees
paper,
high,
In the seeded
two-seeded-tree
tree
for
the
the
seed levels
are
seeding
Suppose
tree.
join
both
can
be applied,
for
both
rather One
than
R-tree
trees.
for
being may
trees.
trees copied
DB
that from
be to have
be
is
sets
are
spatial will
For
selection,
input
height data
at join
be
tree
can
spatial
height
of a
of the
R-tree
plus
the
number
from
the
root
upper
to
bound.
existing
tree
to
build
costs the
faster
using
effective
are
for
of the
also
the
matching
lower
due
two
tree
input
against
lists
set the
tree
the
always by
case.
is
a
Tree
is shown
construction of
to better
that
are
methods
data
costs
tree,
sizes and
show
method
linked
eliminating
When
technique
results
other
costs. filtering
size of the
in one boundary
intermediate
tree
linked 1/0
seed level
tree
for
of a tree
sets of varying Our
seeded
nodes during
shaped
basis time
tree
data
Tree
better
the
and data
intermediate
reduce
seeded
levels input
growth
the
called
except
in
misses.
also uses
clustering. of the
tree
construction
input
of two to three,
method
that
the
than
R-trees
to
exist
seed
levels.
in a tree are
to further
with
the
be used
of the
seed
a technique
tested
the
to guide
reduce
lower
clustered,
the
levels
factor
buffer
cannot
into
resulting
of spatial
seeding
trees,
addresses
no R-trees
characteristics
algorithm
construction
a spatial
index
method
R-trees
are used
seed
methods
be very
This
or where
is divided The
can be used
1/0
studied
sets.
utilized
have
and
constructs
time.
processing,
The
degrees
presented
dynamically
to drastically
total
we
The
the
this
construction,
other
choice
selections.
than
seed levels
We
of
a seeded
be shorter
levels.
join.
that
is
is to be
as the
for
issues
as an ordinary
paths
We also presented
If at least
If no pre-computed
we can use a common
seeded
approach
seeded
are
a spatial
done
these
used
same
where
construction
determine
OPC(DC))
two
the
trees,
seeded
grown
the
trees.
say OpB, is a non-spatial
seeded
can
tree.
somehow
of
char-
techniques
been
most
data
the
tree
two
seed levels
seeding
the
work,
than
we have
join
input
lists
selection
is no obvious
joinA(oP~(~~),
pre-computed
This
the
the
still
seeded
no
to the
constructing
the join.
there
and
selections,
non-spatial
from
We must
constructing
one of opB or opc, for
derived
joins,
with
in the
or (2) one or both
scenario,
scenario,
in both
the by
can use the tree
for
of the
one-seeded-tree
the
processed
efficient
occur:
enough
of non-spatial
dynamically
selectivity
the
spatial
on
to realize
However,
that
seeded
The
using
of complicated
closely
in the join,
results
that
is more
case if the
other
are related
join
situations
results
the
using
However,
a spatial
are
including
R-trees
method
R-tree.
the
input and
join
perform
sets
operations
pre-computed
tree
a pre-computed
necessary
seeded the
the
will
situations
help
and
Such
if necessary,
and
spatial
with
method
the
Discussion
a seeded
measizes,
Conclusions
In
1.0.
that
join
for
levels,
nodes
called
We have
has
future
is no greater
constructed
=
equals
tree
of seed
join
5
noting after
method
seeded
553 581
IID.sII
rectangles
within
be retained
5772 5872
100K,
Addressing
It is worth
rd
STJ1-3F STJ2-3F
work
these
costs
Wr
-lb
little
the
in
as the
based
way
area.
find
quantitative
sets.
best
in this
the
to trees
such
databases
tests)
rd
total
data the
and
the common
seeded
is finding
sets using
[OR93],
necessary
operations
input
choosing
Relatively
is
solution
data
sampling
characteristics,
of spatial
Another the
for building
construct
issue
the
0.8.
data
research to
related
to predict
query.
area. from
as a basis
way
outcomes
map
as spatial
More
closely
sures
the
information
use such
RI’,J STJ1-2N STJ2-2N
I
divide
to time
spatially
seeded
tree
organization.
set of seed levels
artificially
1If a seeded
constructed
any pre-computed a set of seed levels
original
R-tree.
used,
whose
original
219
spatial
tree is to be used in spatial selections after the join terminates, seed level tiltering should not be
as the set of data data.
indexed
by the tree will
be a subset
of the
CPU
costs
lowest not
for
the
among
seeded
all
tree
methods
method
are
seed
level
when
always
[Ore91]
the
filtering
and
act ivat ed. Seed
when
level the
reduce
data
CPU
costs
by
CPU
[Rot91]
However,
when
If the
can
the
a join,
this
[Sam90]
[SE90]
References Thomas
Brinkhoff,
Seeger.
using
R-trees.
Norbert
and
efficient
ternational
[FSR87]
Bernhard
322–332,
Christ
os Faloutsos,
[Gun93]
[Gut84]
Conference
Oliver
Gunther. Proceedings
dex of
[LH92]
ACM
spatial
of International
[OR93]
of spatial
A
in-
Proceedings Conference
47-57,
pages
on
dynamic
searching.
Aug.
on
1984.
Dist ante-associated search.
Conference
284–292,
In
on Data
join
Proceedings Engineering,
1992.
J. Nievergelt,
H.
Hinterberger,
and
The
An
adaptable,
symmetric
file:
structure.
ACM
9(l),
K.
Transactions
C. Sevcik. multikey
on
Database
1984.
Frank
Olken
spatial
databases.
Conference
of Data,
1993.
International
of Data,
Han.
Sgstems,
In-
Conference
50–59,
spatial
range
file
ac-
SIGMOD
Management
R-trees:
for
Wei Lu and Jiawei
grid
Rous-
spatial
computation
pages
SIG MOD
for
and Nick
of International
indices pages
[NHS84]
on
Guttman.
structure
Management
In-
of Data,
oriented
of A CM
Efficient
Engineering,
Antonin
R*-tree:
1987.
joins. Data
Sellis,
of object
and
Doron In
on Data
Rotem.
Sampling
Proceedings
from
of International
Engineering,
199-208,
pages
1993.
[Ore89]
J.
ternational Portland,
[ore90]
Jack
In Proceedings Conference OR,
1989.
Orenstein.
A
processing spaces. tional
Redundancy
Orenstein.
A.
databases.
techniques In Proceedings
Conference
343-352.
of A CM on
Management
comparison for
in
of A CM
and
SIGMOD
In-
of Data,
of spatial
native
on Management
spatiaJ
SIGMOD
query
parameter Interna-
of Data,
Samet.
pages
1990.
220
The
indices.
Design
Star
mation
Systems.
and
John
In
on Data
Japan
Spatiai Zurich,
Springer-Verlag. Proceedings Engineering,
1991. and
Addison-Wesley,
Jeffrey
Estes.
Analysis
of Spatial
1990, Geographic
Infor-
Englewood
Cliffs,
Prentice
Hall,
T. Sellis,
N. Roussopoulos,
and
C. Faloutsos.
R+ -tree:
A dynamic
for
multi-dimensional
Jersey,
1990.
index
In Proceedings 3–1 1, Brighton,
P. Valduriez. Database
for points
Management
Proceedings
427–439,
The
of A CM SIGMOD
on
Times
Analysis
ternational pages
Seeger.
Kobe,
Structures.
objects.
RaJf
join
in
381-400,
28-301991.
the
In O. Gunther
Advances
pages
Conference
500–509,
Hanan
pages
1990.
May
cess methods.
Kriegel,
access method
Proceedings
pages
In-
of Data, [Va187]
Hans-Peter
Conference
sopoulos.
Management
[SRF87]
joins
1993.
and robust
and rectangles.
and Bern-
of spatial
of A CM SIGMOD
on
Beckmann,
Schneider, An
Proceedings May
Kriegel,
processing
Conference
237–246,
pages
Hans-Peter
Efficient
ternational
[BKSS90]
SpatiaJ
for computing
spaces.
‘91),
D Rotem.
New
hard
algorithm editors,
(SSD August
Data
a special
concern.
[BKS93]
Schek,
Switzerland,
pages
technique
is not
H.-J
of International
degree
characteristics
capacity
An
of k-dimensional
Databases
sizes
and
3070.
gain
before
when
tree
is high,
is low.
known
only
reducing
almost
without
of data
are not
be used
in
clustering
costs
clustering
input
should
is effective of
tot al 1/0
incur
of spatial of the
filtering
degree
the
it could
Jack Orenstein. overlay
is
Join Systems,
of Very England,
indices. 12(2),
ACM 1987.
Large
Data
The
Bases,
1987. Transactions
on