A Framework for Discovering Co-location Patterns in Data ... - Hui Xiong

6 downloads 0 Views 380KB Size Report
Hui Xiong. Department of Computer Science & Engineering. University of Minnesota - Twin Cities. (c) University of Minnesota - Twin Cities. April 8, 2004 ...
Slide 1

A Framework for Discovering Co-location Patterns in Data Sets with Extended Spatial Objects Hui Xiong Department of Computer Science & Engineering University of Minnesota - Twin Cities

(c) University of Minnesota - Twin Cities

April 8, 2004

Slide 2

Overview Introduction   

General Problems Related Works Research Motivations



A Buffer-based Model



A Filter-and-Refine Co-location Pattern Mining Framework



Experimental Evaluation



Conclusions and Future Work

(c) University of Minnesota - Twin Cities

April 8, 2004

Slide 3

Introduction & Background Co−location Patterns − Sample Data

80

70

60

Y

50

40

30

20

10

0

0

10

(c) University of Minnesota - Twin Cities

20

30

40 X

50

60

70

80

April 8, 2004

Slide 4

Examples of Co-location Patterns

















 



















    



  

'



! -

!

,+ "



"  ( 

!

'

" 

*

)

! 





.

% $ &

"!#

  

, " , / "  '

/ 5'

*

( * 

!

)

)

)







.

34$

"( 

!

*1

2

,+ ' " # (

!

!



+*

,(+

"0 

, "! # /* '+ 

,

*# ! !-

#*

.

'

"

0 "

! +* 3 *

!# 

1

"*

+

#

'( +

#

1

!#

(

!

(

"2

'

*#

8

.

! ,+

1

!

+

2 0

" 

3  *

*#

!

!

*

!

!

!#

'

! + * !



"

;:,

!(

+*



#2

+

#

!

'-

!

#

'-

"

1

"

!-

)

0 )

*

,

0 '

!

,-

"

,

,



#





6 $ 7

# ) ! + # ! # ( !

* !#

!# *

! + # ! # (



(

!

 "

"

,(+

%



"

,+

)

*

,+ 

$

,  )

 

1 

,#

!

(

1

'#

'

'#

1

*

*

!#

#

"! 

' # 1 ,( "







"# 0 ,  

 9

)

)

8



.

$

# 

*



3

+ 
# '

"

!

#

!

'-

' "-



*

*+



!

, "

'

,#+ ='

' ! "! (



.

April 8, 2004 (c) University of Minnesota - Twin Cities

Slide 5

Related Works



Spatial Statistics  

Use measures of spatial correlation to characterize the relationship between spatial features * the cross-K function with Monte Carlo simulation * mean nearest neighbor distance * spatial regression model Computationally expensive



Data Mining Approaches  

A clustering based approach by Estivill-Castro et al. * Features can be completely spatially random or declustered. * Sensitive to the choices of clustering algorithms. Asssocation-rule based approaches.

(c) University of Minnesota - Twin Cities

April 8, 2004

Slide 6

Related Works - Cont.



Asssocation-rule based approaches.   

Transaction-based approaches. * A reference-feature centric model by Koperski et al. - Generalizing this paradigm to the case where no reference feature is specified is non-trivial. - May yield duplicate counts for many candidate associations. Distance-based approaches. * k-neighboring classs sets by Morimoto. - the number of instances for each pattern is used as the prevalence measure * an event centric model by Shekhar et al. All these approaches are for point spatial features.

(c) University of Minnesota - Twin Cities

April 8, 2004

Slide 7

Motivation 

Identifying co-location patterns in data sets with extended spatial objects (e.g. polygons and line strings). 

Highway often have frontage road nearby in large metropolitan.

 D

O

O

K

N M &

IL

IJ

DE

BH

DFG

AE

D

ABC

(c) University of Minnesota - Twin Cities

April 8, 2004

Slide 8

Problem Formulation



Given: Y

Y

Y

P

SRQ



A set T of spatial feature types and spatial data types can be point as well as other extended spatial objects, such as line strings and polygons. A set of instances , each is a vector instance-id, spatial feature type, location where spatial feature type and location spatial framework . A buffer size, a minimum prevalence threshold, a minimum conditional probability threshold. WUZ

WUX

TUV

[

]

a

Y ^_

Y

Y

\

R ]



^`

T^V

[

c

b

Q

d

a

a

 

Find: Co-location Patterns and Co-location Rules.



Objective: Computational Efficiency.



Constraints: Correctness and Completeness.

(c) University of Minnesota - Twin Cities

April 8, 2004

Slide 9

A Buffer-based Model hl

k

ij

hi

gf

e

Buffer is a zone of specified distance around spatial objects. The boundary of the buffer is the isoline of equal distance to the edge of the objects.



Motivation 

Objects in space frequently have sort of impact on the objects and areas around them * freeways create “noise pollution” that can be heard blocks away. * factories emit fumes that can affect people for miles around.

(c) University of Minnesota - Twin Cities

April 8, 2004

Slide 10

A Buffer-based Model

.

A Point Object O1

A LineString Object O2

A Polygon Object O3

Neighborhood of O1

Neighborhood of O2

Neighborhood of O3

.

. .

Instance r

Neighborhood of r

..

\

hm

k

ij

hi

gf

e

, the size- Euclidean neighborhood of a point location , is a circle of side with as its center. E

^

o

n

E

^

^

\

h

k

ij

hi

gf

e

, the size- neighborhood of an extended spatial object (e.g. polygon, line-string), is defined by the buffer operation. E

B

p

o

n

(c) University of Minnesota - Twin Cities

April 8, 2004

Slide 11

A Buffer-based Model - Cont. w

Y UZ

Y

Y

]

k

u

f

ij

t

u

ks

r

k

tf

hq

ij

hi

gf

e

v

UX

o

UV n

}

}

z

{ yz

~}|z

_



Y UZ

Y

\

Y

Y

Y

R x

WUZ

W

for a co-location is the Y

UV

UX

TUV

n

[

o

The is , where Euclidean neighborhood of the co-location C. 

Šƒ

‡

†

‚ƒ

‰

„z

…ˆƒ

†

‡

†

„†

‚ƒ„…

]

i



i

j





w

‹

x

x

o





Xn

t Ž

k

j

u

h

i

k

r

k



h

h

ij

hi

gf

e

The of a co-location rule is the probability of finding the neighborhood of in the using the neighborhood of . It can be computed as and . neighborhoods of co-locations ’

x

x

x

X

X

V

_

“|

€

“;” { y

x

V

_

€

“ {y

”

x

x

x

X

V

V

lu

–

–f

•

The coverage ratio for co-location patterns is monotonically non-increasing with the size of the co-location pattern increasing.

(c) University of Minnesota - Twin Cities

April 8, 2004

Slide 12

A Coarse-Level Co-location Pattern Mining Framework A Point Object O

.

A LineString Object O

A Polygon Object O

Neighborhood of O

Neighborhood of O

Neighborhood of O

.. ..

Point Objects

Neighborhood of r ˜ ™— ˜ ™— ˜ ™— ˜ ™— ˜ ™— ™—

.

Row Instance r for two

\

k

B

›š

h

ij

hi

gf

e

, the bounding neighborhood of a spatial object is defined as MBBR(Buffer(MOBR(Spatial Object O), d)), where MOBR is the minimum object bounding box, Buffer is the buffer operation with a buffer size as d, and MBBR is the minimum buffer bounding box. o

n

\

›

h

k

ij

hi

gf

e

The Euclidean bounding neighborhood spatial feature is the union of for every instance spatial feature . œ

o

Un



\

›

J

J

‡

‡

U

o

n

of a of the



U



(c) University of Minnesota - Twin Cities

April 8, 2004

Slide 13

A Coarse-Level Co-location Pattern Mining Framework - Cont. k

Y UZ

Y

Y

\

›

h

ij

hi

gf

e

The Euclidean bounding neighborhood for a coarse-level co-location pattern is the intersection of for every spatial feature in . ž

UX

o

UV n

Y

Y

Y

R x

x

WUZ

W

TUV

[

\

›

x

x

U`

o

U` n

Ÿ

w

x

Y UZ

Y

Y

]

k

u

f

ij

t

u

tf

ks

r

f

fs

 

k

r

tu

k

 ¡¢f

h

ij

hi

gf

e

The for a is , coarse-level co-location pattern is the Euclidean bounding neighborhood of the where coarse-level co-location pattern CC. v

UX

o

UV n

}

}

z

{ yz

} |z

£

_



Y

Y

Y

R x

x

WUZ

W

TUV

[



Šƒ

‡

†

‚ƒ

‰

„z

ƒ…ˆ

†

‡

†

„†

‚ƒ„…

Y UZ

Y

Y

\

›

UX

o

UV n

(c) University of Minnesota - Twin Cities

April 8, 2004

Slide 14

A Coarse-Level Co-location Pattern Mining Framework - Cont. mu

–

–f

•

The coarse-level coverage ratio for coarse-level co-location patterns is monotonically non-increasing with the size of the coarse-level co-location pattern increasing. Y

Y

Y

R ¤

–

u

–f

•

For any spatial feasure set , the coarse-level coverage ratio CPr(F) is larger than or equal to the coverage ratio Pr(F). p

WUZ

W

WUX

TUV

[

(c) University of Minnesota - Twin Cities

April 8, 2004

Slide 15

A Coarse-Level Co-location Pattern Mining Framework - Cont.

A2 C1 B2 A3 A1

B3 A4 B1

Y ¬­

NO

o

=

©ª

€

w

¥¦n

] x

=

§ y ¨

£

_



=

X

 Šƒ

‡

†

‚ƒ «

«

‰

„z

ˆƒ…

†

‡

†

„†

‚ƒ„… X

Y O®

O

=

V

€

› w

o

¥ n ¦

] x

=

£

§ y

£

_



=

.

X

 Šƒ

‡

†

‚ƒ «

«

‰

„z

ˆƒ…

†

‡

†

„†

‚ƒ„…

(c) University of Minnesota - Twin Cities

April 8, 2004

Slide 16

Geometric Challenges and Solutions Y S

(X2 , Y2)

y3rt y1rt

A3

A1

y2rt y3lb y1lb

A2

y2lb

(X1 , Y1) x1lb x2lb x3lb x1rt x2rt x3rt

O

¥

Y

Y

Y

u

¥ V

–

–f

•

For any n spatial events

X

p

Š

W

W

,

¯

¯ Ä

³ ´¶

¯

Ã

Â

¸

ÂÅ

Â

¸ ³

¯ ¹

·

¹

¹

~¸·

Ã

¿

¿Â ¹ Á

¼ ·

Â

Â

¸

À ¸ ¾

¸±

¸ ·

´¶

¼ ¹ ¦

¾

À ¸ ¾

¸±

~¸·

´¶

¿¹

¸±

¸ ·

´¶

¼¦¹ ±

~¸·

´¶

~¸·

4 º ¹ ±

»

»

»

» ³

´¶µ³ ° ±²

±² Á½

½

À½

±

½

À½

±

±½ ¾

¾

¾

(c) University of Minnesota - Twin Cities

April 8, 2004

Slide 17

Geometric Challenges and Solutions - Cont. Š

W

W

¥ W

Y

Y

Y

¥ X

¥ V

Theorem 2 Given any n spatial events and the corresponding bounding neighborhoods , , , , where the bounding neighborhood of the event , , is represented by the left bottom point and the right top point , if the bounding neighborhoods of these n spatial events have the common intersection area, then this intersection area can be computed by Equation 2. ˆ

„

VW

ˆ

„

W

Æ Vn

‡È

VW

‡È

M

M

o

o

o

Ç Æ Vn

n

Y

Y

Y

ˆ

„Š

ˆ

„Š

W

W

M

M

o

o

o

n

o

o

o

Ç Æ n W

‡È

Š

‡È

Š

M

M

Ç Æ n

ˆ

„

XW

ˆ

„

Ç Æ Xn W

‡È

‡È

XW

n

Ç Æ Xn

N

A

J

¥ `

É

É

ˆ

„

`W

ˆ

„

M

M

o

o

Ç Æ `n

‡È

`W

‡È

Ç Æ `n

Ë

Ë o

o

Ï n ¦

Î V

ΦX n

Ì o Í

Ê V

o

X ʦn

R Š



Y

Y

¥ X

¥¦V n

\

›

where R Y

Y

Y

Ê ˆ

„Š

Æ W

ˆ

„

Æ X W

ˆ

„

W

[

Æ VT

A

J

C

X R

Y

Y

Y ‡È

Š

Æ W

‡È W

Æ X W

‡È

Æ V T

Æ

D

C

Ê V

[

R Y

Y

Y

A

J

C ˆ

„Š

ˆ

„ W

W

W M

M [

ˆ „ M VT

X

Î X R

Y

Y

Y

Æ

D

C X ‡È

Š

‡È

‡È M

M [

(c) University of Minnesota - Twin Cities

W

W

W

M VT

Î V

, , , . April 8, 2004

Slide 18

Algorithm Design



DCS: A Direct Combinatorial Search Algorithm.



EXCOM: An Extended Co-location Mining Algorithm. Direct Combinatorial Search With Buffer Test (Prevalence-based Pruning)

Spatial Objects

Geometric Filter (Quad-Tree)

(c) University of Minnesota - Twin Cities

Coarse Level Co-location Patterns

Refinement Combinatorial Search With Buffer Test (Prevalence-based Pruning)

Co-location Patterns

April 8, 2004

Slide 19

Experimental Setup



Experimental Data Set: MN/DOT base map.



Experimental Design Candidates: DCS, EXCOM

Road Map Data Set

Co−location Mining Algorithms

Co−location ratio analysis for line−string co−location patterns

Summary

measurements, e.g. execution time

Parameters: Coverage Ratio, Buffer Size

(c) University of Minnesota - Twin Cities

April 8, 2004

Slide 20

The Filtering Effect of the Geometric Component

Running Time (sec)

10000

DCS EXCOM 1000

100

20

40

60

80

100

120

140

160

Buffer Size (feet)



The geometric filter can speed up the prevalence-based pruning approach by a fact of 30 - 40.

(c) University of Minnesota - Twin Cities

April 8, 2004

Slide 21

Line-String Co-location Patterns for Test Route Selection 

Evaluate the positional accuracy of digital roadmap databases.



Co-located roads are the most challenging test sites for evaluating the ability of global positioning systems (GPS) systems to identify correct roads from a digital roadmap.

(c) University of Minnesota - Twin Cities

April 8, 2004

Slide 22

Contributions



Generalize the concept of co-location patterns to extended spatial objects, e.g. polygons and line-strings.



Propose a novel buffer-based model for mining co-location pattern. This model has three advantages over the event centric model and is transaction-free.



Propose a geometric filter-and-refine co-location mining framework.



Experiment evaluation with a real data set shows that the geometric filter-and-refine approach can speed up the prevalence-based pruning approach by a fact of 30 to 40.



The application of applying line-string co-location patterns for selecting test routes has been provided to show the usefulness of co-location patterns.

(c) University of Minnesota - Twin Cities

April 8, 2004

Slide 23

Conclusions and Future Work



Extending the notion of co-location pattern  

de-colocation pattern co-incidence pattern



Applications of Co-location Pattern

(c) University of Minnesota - Twin Cities

April 8, 2004

Suggest Documents