Hui Xiong. Department of Computer Science & Engineering. University of Minnesota - Twin Cities. (c) University of Minnesota - Twin Cities. April 8, 2004 ...
Slide 1
A Framework for Discovering Co-location Patterns in Data Sets with Extended Spatial Objects Hui Xiong Department of Computer Science & Engineering University of Minnesota - Twin Cities
(c) University of Minnesota - Twin Cities
April 8, 2004
Slide 2
Overview
Introduction
* General Problems
* Related Works
* Research Motivations
A Buffer-based Model
A Filter-and-Refine Co-location Pattern Mining Framework
Experimental Evaluation
Conclusions and Future Work
Slide 3
Introduction & Background
Co-location Patterns - Sample Data
[Figure: scatter plot of the sample data; X and Y axes each range from 0 to 80.]
Slide 4
Examples of Co-location Patterns
Slide 5
Related Works
Spatial Statistics
Use measures of spatial correlation to characterize the relationship between spatial features
* the cross-K function with Monte Carlo simulation
* mean nearest neighbor distance
* spatial regression model
Computationally expensive
Data Mining Approaches
A clustering based approach by Estivill-Castro et al.
* Features can be completely spatially random or declustered.
* Sensitive to the choices of clustering algorithms.
Association-rule based approaches.
Slide 6
Related Works - Cont.
Association-rule based approaches.
Transaction-based approaches.
* A reference-feature centric model by Koperski et al.
- Generalizing this paradigm to the case where no reference feature is specified is non-trivial.
- May yield duplicate counts for many candidate associations.
Distance-based approaches.
* k-neighboring class sets by Morimoto.
- The number of instances for each pattern is used as the prevalence measure.
* An event centric model by Shekhar et al.
All these approaches are for point spatial features.
Slide 7
Motivation
Identifying co-location patterns in data sets with extended spatial objects (e.g. polygons and line strings).
Highways often have frontage roads nearby in large metropolitan areas.
Slide 8
Problem Formulation
Given:
* A set T of spatial feature types. Spatial data types can be points as well as extended spatial objects, such as line strings and polygons.
* A set of instances, each a vector <instance-id, spatial feature type, location>, where the spatial feature type belongs to T and the location belongs to the spatial framework.
* A buffer size, a minimum prevalence threshold, and a minimum conditional probability threshold.
Find: Co-location Patterns and Co-location Rules.
Objective: Computational Efficiency.
Constraints: Correctness and Completeness.
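The inputs listed above could be held in small record types. The following is a minimal sketch only: the class names, field names, and sample values (`Instance`, `MiningParameters`, the highway and frontage road coordinates) are illustrative assumptions, not taken from the talk.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

Coord = Tuple[float, float]


@dataclass(frozen=True)
class Instance:
    """One instance: <instance-id, spatial feature type, location>."""
    instance_id: str
    feature_type: str          # must be a member of the feature type set T
    kind: str                  # "point", "linestring", or "polygon"
    coords: Tuple[Coord, ...]  # vertices describing the location


@dataclass(frozen=True)
class MiningParameters:
    """The three user-supplied thresholds named in the problem statement."""
    buffer_size: float
    min_prevalence: float
    min_conditional_probability: float


def group_by_feature(instances: List[Instance]) -> Dict[str, List[Instance]]:
    """Group instances by their spatial feature type."""
    groups: Dict[str, List[Instance]] = defaultdict(list)
    for inst in instances:
        groups[inst.feature_type].append(inst)
    return dict(groups)


# Hypothetical sample data, loosely following the highway example.
instances = [
    Instance("h1", "highway", "linestring", ((0.0, 0.0), (10.0, 0.0))),
    Instance("r1", "frontage_road", "linestring", ((0.0, 1.0), (10.0, 1.0))),
    Instance("s1", "station", "point", ((5.0, 0.5),)),
]
params = MiningParameters(buffer_size=2.0, min_prevalence=0.3,
                          min_conditional_probability=0.5)
by_feature = group_by_feature(instances)
```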
Slide 9
A Buffer-based Model
A buffer is a zone of specified distance around spatial objects. The boundary of the buffer is the isoline of equal distance to the edge of the objects.
Motivation
Objects in space frequently have some sort of impact on the objects and areas around them:
* Freeways create “noise pollution” that can be heard blocks away.
* Factories emit fumes that can affect people for miles around.
Slide 10
A Buffer-based Model
[Figure: a point object O1, a line-string object O2, and a polygon object O3, each shown with its neighborhood; an instance r shown with the neighborhood of r.]
The size-d Euclidean neighborhood of a point location is a circle of radius d with the point as its center.
The size-d neighborhood of an extended spatial object (e.g. polygon, line-string) is defined by the buffer operation.
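A location lies inside a size-d buffer when its distance to the object is at most d, so a membership test may help make the two neighborhood definitions concrete. This is a minimal sketch for point and line-string objects only. The function names are illustrative assumptions, and the polygon case (distance zero for interior points) is left out.

```python
import math


def dist_point_segment(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def in_point_neighborhood(p, center, d):
    """True if p lies in the circle of radius d around center."""
    return math.hypot(p[0] - center[0], p[1] - center[1]) <= d


def in_linestring_neighborhood(p, vertices, d):
    """True if p lies within distance d of any segment of the line string."""
    return any(dist_point_segment(p, a, b) <= d
               for a, b in zip(vertices, vertices[1:]))
```

For example, `in_linestring_neighborhood((5.0, 2.0), [(0.0, 0.0), (10.0, 0.0)], 2.0)` tests a location exactly on the buffer boundary of a horizontal segment.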
Slide 11
A Buffer-based Model - Cont.
The coverage ratio Pr(C) for a co-location C is the area of the Euclidean neighborhood of the co-location C divided by the total area of the plane.
The conditional probability of a co-location rule $C_1 \rightarrow C_2$ is the probability of finding the neighborhood of $C_2$ in the neighborhood of $C_1$. It can be computed as the area of the Euclidean neighborhood of $C_1 \cup C_2$ divided by the area of the Euclidean neighborhood of $C_1$.
The coverage ratio for co-location patterns is monotonically non-increasing as the size of the co-location pattern increases.
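The coverage ratio and the conditional probability can be approximated numerically. The sketch below rests on several assumptions that are not in the slides: every instance is a point, the neighborhood of a feature is the union of its instances' size-d circles, the neighborhood of a co-location is the intersection of its features' neighborhoods, and the coverage ratio is that area divided by the area of a rectangular study region. Areas are estimated by sampling grid cell centers rather than by exact geometry, and all names are illustrative.

```python
import math


def in_feature_neighborhood(p, instances, d):
    """True if p is within distance d of at least one instance of the feature."""
    return any(math.hypot(p[0] - x, p[1] - y) <= d for x, y in instances)


def coverage_ratio(features, pattern, d, width, height, step=0.5):
    """Fraction of grid cell centers covered by every feature in pattern."""
    nx, ny = int(width / step), int(height / step)
    hits = 0
    for i in range(nx):
        for j in range(ny):
            p = ((i + 0.5) * step, (j + 0.5) * step)
            if all(in_feature_neighborhood(p, features[f], d) for f in pattern):
                hits += 1
    return hits / (nx * ny)


def conditional_probability(features, c1, c2, d, width, height, step=0.5):
    """Estimated share of the neighborhood of c1 also covered by c2."""
    denom = coverage_ratio(features, c1, d, width, height, step)
    if denom == 0.0:
        return 0.0
    numer = coverage_ratio(features, set(c1) | set(c2), d, width, height, step)
    return numer / denom


# Hypothetical point features in a 50 x 50 region with buffer size 5.
features = {"A": [(10.0, 10.0)], "B": [(12.0, 10.0)], "C": [(40.0, 40.0)]}
pr_a = coverage_ratio(features, ["A"], 5.0, 50.0, 50.0)
pr_ab = coverage_ratio(features, ["A", "B"], 5.0, 50.0, 50.0)
pr_ac = coverage_ratio(features, ["A", "C"], 5.0, 50.0, 50.0)
cp_b_given_a = conditional_probability(features, ["A"], ["B"], 5.0, 50.0, 50.0)
```

Adding a feature to a pattern can only shrink the intersection, so the estimate for {A, B} never exceeds the estimate for {A}, which matches the monotonicity statement above.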
Slide 12
A Coarse-Level Co-location Pattern Mining Framework
[Figure: a point object, a line-string object, and a polygon object O, each shown with its neighborhood; a row instance r for two point objects shown with the neighborhood of r.]
The bounding neighborhood of a spatial object O is defined as MBBR(Buffer(MOBR(O), d)), where MOBR is the minimum object bounding box, Buffer is the buffer operation with buffer size d, and MBBR is the minimum buffer bounding box.
The Euclidean bounding neighborhood of a spatial feature is the union of the bounding neighborhoods of every instance of the spatial feature.
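For axis-aligned boxes, MBBR(Buffer(MOBR(O), d)) has a simple closed form: buffering a rectangle by d gives a rounded rectangle, and the bounding box of that shape is the original rectangle grown by d on every side. A minimal sketch, assuming objects are given as vertex lists and boxes as `(x_lb, y_lb, x_rt, y_rt)` tuples (both representations are illustrative choices):

```python
def mobr(coords):
    """Minimum object bounding box of a vertex list."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def bounding_neighborhood(coords, d):
    """MBBR(Buffer(MOBR(O), d)): the object's bounding box grown by d per side."""
    x_lb, y_lb, x_rt, y_rt = mobr(coords)
    return (x_lb - d, y_lb - d, x_rt + d, y_rt + d)


point_bn = bounding_neighborhood([(2.0, 3.0)], 1.0)
line_bn = bounding_neighborhood([(0.0, 0.0), (4.0, 2.0), (6.0, 1.0)], 2.0)
```

Because the result is always a rectangle, later intersection tests reduce to comparisons of corner coordinates.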
Slide 13
A Coarse-Level Co-location Pattern Mining Framework - Cont.
The Euclidean bounding neighborhood for a coarse-level co-location pattern CC is the intersection of the Euclidean bounding neighborhoods of every spatial feature in CC.
The coarse-level coverage ratio for a coarse-level co-location pattern CC is the area of the Euclidean bounding neighborhood of CC divided by the total area of the plane.
Slide 14
A Coarse-Level Co-location Pattern Mining Framework - Cont.
The coarse-level coverage ratio for coarse-level co-location patterns is monotonically non-increasing as the size of the coarse-level co-location pattern increases.
For any spatial feature set F, the coarse-level coverage ratio CPr(F) is larger than or equal to the coverage ratio Pr(F).
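The inequality between CPr(F) and Pr(F) follows from set containment. The derivation below is a sketch under assumptions not in the slides: $N(\cdot)$ denotes the Euclidean neighborhood, $BN(\cdot)$ denotes the Euclidean bounding neighborhood (both symbols chosen here for illustration), and both ratios are taken over the same total area $A$ of the plane.

```latex
\begin{align*}
  & N(f_i) \subseteq BN(f_i) \quad \text{for every } f_i \in F
    && \text{(each object lies inside its bounding box)} \\
  \Longrightarrow\;& \bigcap_{f_i \in F} N(f_i) \subseteq \bigcap_{f_i \in F} BN(f_i)
    && \text{(intersection preserves containment)} \\
  \Longrightarrow\;& \operatorname{area}\bigl(N(F)\bigr) \le \operatorname{area}\bigl(BN(F)\bigr) \\
  \Longrightarrow\;& Pr(F) = \frac{\operatorname{area}(N(F))}{A}
    \le \frac{\operatorname{area}(BN(F))}{A} = CPr(F).
\end{align*}
```

This is why a pattern whose coarse-level coverage ratio falls below the threshold can be discarded without computing exact buffers.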
Slide 15
A Coarse-Level Co-location Pattern Mining Framework - Cont.
[Figure: example with instances A1, A2, A3, A4, B1, B2, B3, and C1.]
Slide 16
Geometric Challenges and Solutions
[Figure: three bounding neighborhoods A1, A2, and A3 inside a frame with left bottom corner (X1, Y1) and right top corner (X2, Y2); Ai has left bottom corner (xilb, yilb) and right top corner (xirt, yirt).]
For any n spatial events: [statement and equation not recoverable from the extracted text]
Slide 17
Geometric Challenges and Solutions - Cont.
Theorem 2 Given any n spatial events and the corresponding bounding neighborhoods, where the bounding neighborhood of the i-th event is represented by the left bottom point $(x_i^{lb}, y_i^{lb})$ and the right top point $(x_i^{rt}, y_i^{rt})$, if the bounding neighborhoods of these n spatial events have a common intersection area, then this intersection area can be computed by Equation 2:
\[ \text{Area} = (X^{rt} - X^{lb}) \times (Y^{rt} - Y^{lb}) \qquad (2) \]
where $X^{lb} = \max\{x_1^{lb}, \ldots, x_n^{lb}\}$, $X^{rt} = \min\{x_1^{rt}, \ldots, x_n^{rt}\}$, $Y^{lb} = \max\{y_1^{lb}, \ldots, y_n^{lb}\}$, and $Y^{rt} = \min\{y_1^{rt}, \ldots, y_n^{rt}\}$.
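The common intersection of axis-aligned boxes can be computed from corner coordinates alone: take the largest left bottom coordinates and the smallest right top coordinates. The following sketch assumes boxes are `(x_lb, y_lb, x_rt, y_rt)` tuples and treats boxes that only touch along an edge as having no common area. Both choices, and the function names, are illustrative assumptions.

```python
def common_intersection(boxes):
    """Intersection box of axis-aligned boxes, or None if there is no common area."""
    x_lb = max(b[0] for b in boxes)
    y_lb = max(b[1] for b in boxes)
    x_rt = min(b[2] for b in boxes)
    y_rt = min(b[3] for b in boxes)
    if x_lb >= x_rt or y_lb >= y_rt:
        return None
    return (x_lb, y_lb, x_rt, y_rt)


def intersection_area(boxes):
    """Area of the common intersection, or 0.0 if the boxes do not all overlap."""
    box = common_intersection(boxes)
    if box is None:
        return 0.0
    return (box[2] - box[0]) * (box[3] - box[1])


overlapping = [(0.0, 0.0, 4.0, 4.0), (2.0, 1.0, 6.0, 5.0), (3.0, -1.0, 5.0, 3.0)]
disjoint = [(0.0, 0.0, 1.0, 1.0), (2.0, 2.0, 3.0, 3.0)]
```

The cost is linear in the number of boxes, with no polygon clipping needed.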
Slide 18
Algorithm Design
DCS: A Direct Combinatorial Search Algorithm.
* Spatial Objects -> Direct Combinatorial Search With Buffer Test (Prevalence-based Pruning) -> Co-location Patterns
EXCOM: An Extended Co-location Mining Algorithm.
* Spatial Objects -> Geometric Filter (Quad-Tree) -> Coarse Level Co-location Patterns -> Refinement: Combinatorial Search With Buffer Test (Prevalence-based Pruning) -> Co-location Patterns
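The filter-and-refine idea can be illustrated on a toy case. The sketch below is not EXCOM: it uses no quad-tree and no prevalence-based pruning, and it only pairs up point objects. It assumes each point has a size-d circular neighborhood, filters candidate pairs with a cheap bounding-box overlap test, and then refines them with the exact test (two circles of radius d overlap when their centers are at most 2d apart). All names and sample coordinates are hypothetical.

```python
import math
from itertools import combinations


def bounding_box(p, d):
    """Bounding neighborhood of a point: the square of half-width d around it."""
    x, y = p
    return (x - d, y - d, x + d, y + d)


def boxes_overlap(a, b):
    """Cheap filter test on (x_lb, y_lb, x_rt, y_rt) boxes."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def buffers_overlap(p, q, d):
    """Exact refine test: two size-d circular neighborhoods intersect."""
    return math.hypot(p[0] - q[0], p[1] - q[1]) <= 2 * d


def filter_and_refine(points, d):
    """Return (candidate pairs after the filter, pairs confirmed by the refine step)."""
    candidates = [
        (i, j) for i, j in combinations(sorted(points), 2)
        if boxes_overlap(bounding_box(points[i], d), bounding_box(points[j], d))
    ]
    confirmed = [(i, j) for i, j in candidates
                 if buffers_overlap(points[i], points[j], d)]
    return candidates, confirmed


points = {"o1": (0.0, 0.0), "o2": (1.5, 1.5), "o3": (1.0, 1.0), "o4": (10.0, 10.0)}
candidates, confirmed = filter_and_refine(points, 1.0)
```

Here the pair (o1, o2) passes the filter but fails the exact test, which shows why a refinement step is still required. The filter never drops a true pair, because each circle is contained in its box.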
Slide 19
Experimental Setup
Experimental Data Set: MN/DOT base map.
Experimental Design
* Candidates: DCS, EXCOM
* Parameters: Coverage Ratio, Buffer Size
* Flow: Road Map Data Set -> Co-location Mining Algorithms -> Co-location ratio analysis for line-string co-location patterns
* Summary measurements, e.g. execution time
Slide 20
The Filtering Effect of the Geometric Component
[Figure: running time (sec; axis ticks at 100, 1000, and 10000) of DCS and EXCOM versus buffer size (feet; 20 to 160).]
The geometric filter can speed up the prevalence-based pruning approach by a factor of 30 to 40.
Slide 21
Line-String Co-location Patterns for Test Route Selection
Evaluate the positional accuracy of digital roadmap databases.
Co-located roads are the most challenging test sites for evaluating the ability of global positioning systems (GPS) to identify the correct roads from a digital roadmap.
Slide 22
Contributions
Generalize the concept of co-location patterns to extended spatial objects, e.g. polygons and line-strings.
Propose a novel buffer-based model for mining co-location patterns. This model has three advantages over the event centric model and is transaction-free.
Propose a geometric filter-and-refine co-location mining framework.
Experimental evaluation with a real data set shows that the geometric filter-and-refine approach can speed up the prevalence-based pruning approach by a factor of 30 to 40.
An application of line-string co-location patterns to selecting test routes shows the usefulness of co-location patterns.
Slide 23
Conclusions and Future Work
Extending the notion of co-location pattern
* de-colocation pattern
* co-incidence pattern
Applications of Co-location Patterns