A clustering-based visualization of colocation patterns
Elise Desmier (1), Frédéric Flouvat (2), Dominique Gay (3) and Nazha Selmaoui-Folcher (2)
(1) Université de Lyon, LIRIS, UMR5205 CNRS, Villeurbanne, France
(2) University of New Caledonia, PPME, EA3325, Nouméa, New Caledonia
(3) TECH/ASAP/PROF, Orange Labs, Lannion, France
IDEAS’11, Lisboa
Context
Spatial pattern mining and visualization
Visualization of colocations
Application
Conclusion
Toward a better visualization of spatial patterns
One of the major issues in data mining (Han and Kamber 06):
"the presentation and visualization of discovered knowledge expressed in high-level languages, visual representations, or other expressive forms so that the knowledge can be easily understood and directly usable by humans"
Problem with existing solutions
• No solution displays spatial patterns (colocations) in a simple, concise and intuitive way for experts
Contribution
• A new visualization of colocations based on a heuristic clustering method
• Easily usable and interpretable by domain experts
• Additional spatial and thematic information w.r.t. "classical" colocations
Frédéric Flouvat
A clustering-based visualization of colocations
2 / 36
Outline
1. Context
2. Spatial pattern mining and visualization
3. Visualization of colocations
4. Application
5. Conclusion
Application context: New Caledonia
• Exceptional biodiversity; the Caledonian lagoons are a UNESCO World Heritage site
• But major mining projects (25% of the world's nickel resources) and a tropical climate with cyclones and bush fires
• Significant soil erosion
• Strong impact on terrestrial and littoral ecosystems
➫ FO.S.T.ER. project (financed by the French government)
• A multidisciplinary consortium of specialists in data mining, image processing and geology
• Provides geologists with a complete, semi-automatic process for monitoring soil erosion
Data
Complex data
• Heterogeneous data: DEM, vegetation, land use, climate, ...
• Large, spatial data
➫ Need for advanced analysis and modeling methods to assist experts
Spatial data mining
• Extracting interesting, useful and unexpected knowledge from spatial data
• A large number of descriptive and/or predictive methods, e.g. spatial decision trees, clustering, spatial pattern mining, ...
➫ Focus on colocations (spatial patterns)
What is a colocation?
First, the data
• Spatial objects associated with different features
• e.g. object 1 is characterized as "sparse vegetation" (A), object 7 as "mine" (C), and object 8 as "river erosion" (B) ➫ A1, C7 and B8
Then, the pattern
• Colocation = subset of features whose objects are "often" located close to each other, e.g. {A, C, B}, i.e. {sparse vegetation, mine, river erosion}
• Colocation instance = subset of objects that have the features of the colocation and are close to each other
• Table instance TI = set of all instances of a colocation
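To make these definitions concrete, here is a minimal Python sketch that enumerates the table instance of a colocation by brute force. The coordinates, the extra objects A5 and C9, the Euclidean neighborhood with a `max_dist` threshold, and the function name `table_instance` are assumptions made for illustration; they do not come from the talk.

```python
from itertools import product
from math import dist

# Hypothetical toy data: object id -> (feature, (x, y)).
objects = {
    "A1": ("A", (0.0, 0.0)),
    "A5": ("A", (10.0, 10.0)),
    "B8": ("B", (1.0, 0.0)),
    "C7": ("C", (0.0, 1.0)),
    "C9": ("C", (50.0, 50.0)),
}

def table_instance(colocation, objects, max_dist):
    """Return all instances of `colocation`: one object per feature,
    with every pair of objects within `max_dist` of each other."""
    by_feature = {
        f: sorted(o for o, (g, _) in objects.items() if g == f)
        for f in colocation
    }
    instances = []
    for combo in product(*(by_feature[f] for f in colocation)):
        all_close = all(
            dist(objects[u][1], objects[v][1]) <= max_dist
            for i, u in enumerate(combo)
            for v in combo[i + 1:]
        )
        if all_close:
            instances.append(combo)
    return instances

ti = table_instance(("A", "B", "C"), objects, max_dist=2.0)
print(ti)  # [('A1', 'B8', 'C7')]
```

Brute-force enumeration is only meant to illustrate the definition; it does not scale to realistic data sets.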
Mining colocations (Shekhar et al. 01)
Two important aspects
• The neighborhood relationship, e.g. Euclidean distance, intersection, ...
• The measure of "often located close to each other": the participation index (anti-monotone)
Mining
• Input: a set of spatial objects, each associated with a feature, a neighborhood relationship, and a threshold for the measure (data stored in a GIS)
• Output: "frequent" colocations, i.e. those whose participation index is greater than the threshold
• Algorithm: classical levelwise mining algorithm, such as Apriori for itemset mining
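The slides do not spell out the participation index. The sketch below uses the usual definition from the colocation mining literature: for each feature of the colocation, take the fraction of that feature's objects that appear in at least one instance, then keep the minimum. The function name and the toy data are assumptions for illustration.

```python
def participation_index(colocation, instances, objects_per_feature):
    """Minimum, over the features of the colocation, of the fraction of that
    feature's objects that appear in at least one instance."""
    ratios = []
    for pos, feature in enumerate(colocation):
        participating = {inst[pos] for inst in instances}
        ratios.append(len(participating) / len(objects_per_feature[feature]))
    return min(ratios)

# Hypothetical data: all objects of each feature in the data set.
objects_per_feature = {
    "A": ["A1", "A5"],
    "B": ["B3", "B8"],
    "C": ["C2", "C4", "C7", "C9"],
}

pi_ab = participation_index(
    ("A", "B"), [("A1", "B8"), ("A5", "B8")], objects_per_feature)
pi_abc = participation_index(
    ("A", "B", "C"), [("A1", "B8", "C7"), ("A5", "B8", "C7")], objects_per_feature)

print(pi_ab, pi_abc)  # 0.5 0.25
```

The example also illustrates anti-monotonicity: adding feature C to {A, B} cannot increase the index, which is what allows Apriori-style levelwise pruning.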
Methods unsuited to expert needs
Many works on colocations
• Improving algorithm performance
• Extracting local patterns
• Reducing the number of colocations
• ...
Problem
• No visualization of colocations adapted to expert needs and practices, although one is necessary to extract relevant information
Visualizing data mining results
Three main approaches to visualize data mining results:
1. Textual representation
• basically a list of patterns with interestingness measures
• e.g. textual visualization of colocation patterns
➫ simple but not easily understandable by domain experts
Visualizing data mining results
Three main approaches to visualize data mining results:
2. Abstract representation (e.g. plots, matrices, graphs, trees or cubes)
• condensed and informative visual representations of the solutions with statistics
• e.g. grid representation of association rules in MineSet (Brunk et al. 97)
• e.g. radial hierarchical layout to represent frequent itemsets (Keim et al. 05)
• e.g. orthogonal graphs to represent frequent itemsets (Leung et al. 08)
➫ not really adapted to spatial patterns
• in spatial pattern mining, spatiality is not just another dimension of analysis
• for domain experts, the spatial dimension is the basis of their interpretation
Visualizing data mining results
Three main approaches to visualize data mining results:
3. Cartographic representation
First solution: visualization of spatial pattern instances on a map
• e.g. classical cartographic visualization of spatial clusters with colors
• e.g. select an association rule and visualize its interestingness measure for each country (Andrienko 99)
➫ not possible to display all colocation instances (as is done in spatial cluster analysis)
➫ "select a pattern and display its instances" gives only a local view of one pattern
Visualizing data mining results
Three main approaches to visualize data mining results:
3. Cartographic representation
Second solution: generating visual representations of the solutions
• e.g. clusters of trajectories summarized by "representative trajectories" using a classifier and visual refinement (Andrienko 09)
➫ not directly usable for colocation patterns, but an interesting approach
Our approach
Problem
• How to visualize interesting colocations on a map?
Motivations
• Have an easily usable and interpretable visual representation for experts
• Give additional spatial and thematic information
• Give a global cartographic view of the solutions
A colored and labeled clique representation of colocations
A natural visual representation of a colocation: a clique
• node = object type (i.e. feature)
• edge = neighborhood relationship
Example: colocation {mining zone, sparse vegetation, sensitive trail, river erosion}
Visual representation: (figure)
A colored and labeled clique representation of colocations
Additional information
• Node coloring to represent thematic information
• Edge coloring to visualize the interestingness measure, i.e. the prevalence of the colocation
Example: colocation {mining zone, sparse vegetation, sensitive trail, river erosion} with participation index = 0.8
Visual representation: (figure)
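As a rough illustration of this representation, the sketch below builds the clique as plain Python dictionaries: one node per feature with a thematic color, and one edge per pair of features with a saturation level derived from the participation index. The palette `THEME_COLORS`, the number of saturation levels and the rounding rule are assumptions; the slides do not specify them.

```python
from itertools import combinations

# Hypothetical theme colors; the actual palette is not given in the slides.
THEME_COLORS = {
    "mining zone": "red",
    "sparse vegetation": "green",
    "sensitive trail": "brown",
    "river erosion": "blue",
}

def clique_representation(colocation, participation_index, n_levels=5):
    """Build a labeled clique: one node per feature, one edge per pair of features."""
    nodes = {f: {"label": f, "color": THEME_COLORS[f]} for f in colocation}
    # Map the prevalence in [0, 1] to a saturation level in 1..n_levels (assumed rule).
    level = max(1, min(n_levels, round(participation_index * n_levels)))
    edges = {pair: {"saturation": level} for pair in combinations(colocation, 2)}
    return nodes, edges

nodes, edges = clique_representation(
    ("mining zone", "sparse vegetation", "sensitive trail", "river erosion"), 0.8)
print(len(nodes), len(edges))  # 4 6
```

A clique over four features has six edges, all drawn with the same saturation because the prevalence is a property of the whole colocation.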
Spatial representation of colocations
How to position the visual representations of colocations on the map? In other words, how to position the clique nodes?
➫ Using a "spatialization" function
• Summarizes the spatial information of a colocation's instances (only spatial objects, i.e. instances, carry spatial information)
• Shows where and how instances of an interesting colocation are generally located
A first basic spatialization function
A centroid-based spatialization function
• The centroid = a basic way to summarize a set of points ("average" of all points)
➫ For each clique node, generate the centroid of its feature instances
• e.g. for colocation {A, B, C}, node A is the centroid of spatial objects {A1, A5} (i.e. the objects with feature A that belong to the table instance of {A, B, C})
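A minimal sketch of such a centroid-based spatialization function is given below. It assumes point coordinates for each object and takes the centroid over the distinct objects of each feature in the table instance; the names `spatialize` and `coords` and the coordinates themselves are hypothetical.

```python
def centroid(points):
    """Arithmetic mean of a list of (x, y) points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def spatialize(colocation, instances, coords):
    """Position each clique node at the centroid of the distinct objects
    of its feature that occur in the table instance."""
    positions = {}
    for pos, feature in enumerate(colocation):
        objs = sorted({inst[pos] for inst in instances})
        positions[feature] = centroid([coords[o] for o in objs])
    return positions

# Hypothetical coordinates and table instance for colocation {A, B, C}.
coords = {
    "A1": (0.0, 0.0), "A5": (4.0, 2.0),
    "B8": (1.0, 0.0), "B3": (5.0, 2.0),
    "C7": (0.0, 1.0), "C2": (4.0, 3.0),
}
instances = [("A1", "B8", "C7"), ("A5", "B3", "C2")]

positions = spatialize(("A", "B", "C"), instances, coords)
print(positions)  # {'A': (2.0, 1.0), 'B': (3.0, 1.0), 'C': (2.0, 2.0)}
```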
A first basic spatialization function
Problem with this centroid-based spatialization function: (figure)
➫ Solution: use clustering to allow several representations for each colocation
A clustering-based spatial representation of colocations
Principle: for each interesting colocation,
1. Cluster its instances
2. Compute the position of each colocation feature in each cluster, using the centroid-based spatialization function
3. Draw the colored and labeled clique representation
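The first two steps can be sketched as follows. The clustering used here is a simple stand-in (instances whose centroids are chained by distances of at most `eps` end up in the same cluster); it is not the heuristic method of the talk, and all names, coordinates and the `eps` parameter are assumptions for illustration.

```python
from math import dist

def centroid(points):
    """Arithmetic mean of a list of (x, y) points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def cluster_instances(instances, coords, eps):
    """Stand-in clustering: group instances whose centroids are linked
    by a chain of distances <= eps."""
    centers = [centroid([coords[o] for o in inst]) for inst in instances]
    labels = list(range(len(instances)))  # each instance starts in its own cluster
    for i in range(len(instances)):
        for j in range(i + 1, len(instances)):
            if labels[i] != labels[j] and dist(centers[i], centers[j]) <= eps:
                old, new = labels[j], labels[i]
                labels = [new if lab == old else lab for lab in labels]
    clusters = {}
    for inst, lab in zip(instances, labels):
        clusters.setdefault(lab, []).append(inst)
    return list(clusters.values())

def clique_positions(colocation, instances, coords, eps):
    """Return one dictionary of node positions per cluster of instances."""
    result = []
    for cluster in cluster_instances(instances, coords, eps):
        positions = {}
        for pos, feature in enumerate(colocation):
            objs = sorted({inst[pos] for inst in cluster})
            positions[feature] = centroid([coords[o] for o in objs])
        result.append(positions)
    return result

# Hypothetical data: two nearby instances and one far away.
coords = {
    "A1": (0.0, 0.0), "B1": (1.0, 0.0),
    "A2": (0.0, 1.0), "B2": (1.0, 1.0),
    "A3": (100.0, 100.0), "B3": (101.0, 100.0),
}
instances = [("A1", "B1"), ("A2", "B2"), ("A3", "B3")]

cliques = clique_positions(("A", "B"), instances, coords, eps=5.0)
print(cliques)
```

With these data the colocation {A, B} gets two clique representations, one per area, instead of a single clique placed midway between the two groups.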
A clustering-based spatial representation of colocations
Interest of this approach: a better representation of colocation instances
• Shows where an interesting colocation is generally located, e.g. colocation {A, B, C} is generally located in the north-west (and mainly in this area)
• Shows how the features of a colocation are positioned w.r.t. each other, e.g. objects in colocation {A, B, C} are relatively far from each other; this shows, for example, that mines and sparse vegetation have an indirect impact on erosion (colocation {mine, sparse vegetation, erosion})
➫ Difficult to obtain such information with "classical" approaches
One major problem: scalability
• Running all the clusterings (one for each colocation) may be computationally expensive
Improving performance
Optimizing memory usage by combining the colocation mining algorithm and the visualization
• Clustering and visualization are not done in a post-processing step ➫ avoids storing all colocation instances in memory
• Colocations are mined one by one and each is visualized at the same time ➫ visualization is integrated in the mining algorithm
Optimizing execution time using a heuristic clustering method
Heuristic clustering approach
Observation: colocation instances share many spatial objects
• e.g. colocations {A, B} and {A, B, C} share spatial objects A1 and A5
➫ If clustering is done in a post-processing step, some computations are repeated several times, e.g. computing the distance between A1 and A5
Proposition: a two-step clustering approach integrated in the mining algorithm
1. A clustering of the instances of each feature, run once at the beginning of the algorithm (i.e. one clustering for A objects, one clustering for B objects, ...)
2. A clustering of the instances of each colocation based on the previous clusters, using a merge and split approach
Focus on the merge and split approach
Principle:
• Select the feature f of the current colocation C that has the highest number of clusters
• Split the instances of C w.r.t. the clusters of f
Problem: "conflicting clusters", i.e. objects belonging to several partitions
• e.g. Y2 is in the first instance partition and in the second one
Solution: merge the clusters leading to a conflict
• e.g. merge the first and second clusters of Z
➫ Merge and split approach: alternate merge and split until nothing changes
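The sketch below is one possible reading of this principle, not the authors' implementation. It assumes each feature already has a clustering of its objects (`feature_clusters`, a hypothetical mapping from feature to object-to-cluster-id), picks the feature with the most clusters as pivot, splits the instances by pivot cluster, and merges two pivot clusters whenever an object appears in both partitions.

```python
def find_conflict(parts):
    """Return two partition labels that share an object, or None."""
    owner = {}
    for label, part in parts.items():
        for inst in part:
            for obj in inst:
                first = owner.setdefault(obj, label)
                if first != label:
                    return first, label
    return None

def merge_and_split(colocation, instances, feature_clusters):
    """Partition the instances of a colocation from per-feature clusters."""
    # Pivot feature: the one with the highest number of clusters.
    pivot = max(colocation, key=lambda f: len(set(feature_clusters[f].values())))
    p = colocation.index(pivot)
    # merged[c] = current label of pivot cluster c (changes when clusters merge).
    merged = {c: c for c in set(feature_clusters[pivot].values())}
    while True:
        # Split step: group instances by the (merged) cluster of their pivot object.
        parts = {}
        for inst in instances:
            label = merged[feature_clusters[pivot][inst[p]]]
            parts.setdefault(label, []).append(inst)
        conflict = find_conflict(parts)
        if conflict is None:
            return list(parts.values())
        # Merge step: fuse the two conflicting pivot clusters, then split again.
        keep, drop = conflict
        for c in merged:
            if merged[c] == drop:
                merged[c] = keep

# Hypothetical example mirroring the slide: Y2 appears with Z1 and with Z2.
feature_clusters = {
    "Y": {"Y1": 0, "Y2": 0, "Y3": 1},
    "Z": {"Z1": 0, "Z2": 1, "Z3": 2},
}
instances = [("Y1", "Z1"), ("Y2", "Z1"), ("Y2", "Z2"), ("Y3", "Z3")]

partitions = merge_and_split(("Y", "Z"), instances, feature_clusters)
print(partitions)
```

Here Z is the pivot (three clusters against two for Y). Y2 appears in the partitions of the first and second clusters of Z, so these two clusters are merged, leaving two partitions. Each merge removes one label, so the loop terminates.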
Experiments
Data
• Studied area: mountainous watershed of 9 km²
• 3 thematic layers:
  erosion: "not bare ground" or different types of "bare ground" (6 features)
  nature of the ground: lithology (13 features)
  vegetation: types of vegetation (13 features)
➫ 32 features and more than 7000 objects
Experimental protocol
• Spatial relationship: Euclidean distance between areas
• Several participation index thresholds
➫ Results studied by a geologist who is an expert on soil erosion in the studied area
Experiments: map readability
Number of patterns displayed to users
• an important indicator for visualization methods
• if too many patterns are displayed, interpretation is difficult

Distance  Participation index threshold                      0.5         0.3         0.1
200m      nb colocations                                      21          68         266
          avg nb instances for a colocation               16 478      11 974       8 365
          total nb instances for all colocations         346 046     814 263   2 225 118
          nb colocations displayed by our approach            31         112         510
300m      nb colocations                                      55         163         711
          avg nb instances for a colocation               50 803      78 347      87 100
          total nb instances for all colocations       2 794 205  12 770 670  61 928 727
          nb colocations displayed by our approach            84         258       1 349

➫ No more than twice the number of colocations
• If there are too many, the zoom functionality of the GIS can be used to filter
➫ Allows comparing our approach with classical visualization approaches
• the "select a pattern and display its instances" approach displays the average number of instances for a colocation
Experiments: performance evaluation
Execution time of our approach versus a "basic" post-processing clustering approach
• post-processing approach = running a DBScan clustering on each table instance after colocation extraction
Figure (two panels: 5802 objects and 18 features; 7642 objects and 32 features): total time (sec, logarithmic scale) as a function of the minimum participation index (from 0.8 down to 0), for "Spatial clustering-based colocation mining" and "Colocation mining then DBScan clustering"
➫ Our approach is more efficient than the basic approach
Experiments: expert feedback
Example of result provided to our expert by our prototype: (figure)
➫ Points out known correlations about soil erosion in this area
• e.g. highlights the environmental damage near areas with human activities
➫ Interest of our approach for experts
• Gives a global picture of where and how colocations are generally located
• Helps quickly identify new patterns, then focus on some of them and study their instances in more depth
Conclusion & perspectives
Conclusion
• A new clustering-based visualization of colocations
• A colored and labeled clique representation with thematic and prevalence information
• A spatialization of colocations using a heuristic clustering method and a centroid-based positioning
➫ An easily usable and interpretable global picture of the solutions
➫ Good scalability
Main perspectives
• Improving algorithm performance with dedicated data structures, spatial indexes, or new mining strategies
• Improving our prototype
• Extending our approach to other patterns, e.g. sequential spatio-temporal patterns
Questions? Thank you
Our approach
Problem
• How to visualize interesting colocations on a map?
Principle of our solution
• Generate a clique representation of each colocation and georeference this representation on a map using clustering
Formal definition of a visual colocation representation
A colored and labeled clique representation of a colocation $C$ is a colored and labeled clique $G^{col}_C = (V_C, E_C, L_{type}, L_{pi}, L_{theme})$, where
• $V_C$ is the set of vertices,
• $E_C = \{(u, v) \in V_C \times V_C \mid u \neq v\}$ is the set of edges,
• $L_{type} : V_C \to C$ is a labelling function that assigns an object type $f \in C$ to a vertex $v \in V_C$,
• $L_{pi} : \bigcup_C E_C \to Col$ is a coloring function that assigns a color $k \in Col = \{1, 2, \ldots, m\}$ ($m \geq 1$) to a colocation edge based on the prevalence measure $pi(C) \in [0, 1]$ (saturation factor), and
• $L_{theme} : V_C \to Col_{theme}$ is a coloring function that assigns the thematic color $k' \in Col_{theme} = \{1, 2, \ldots, m'\}$ ($m' \geq 1$) of object type $L_{type}(v)$ to a vertex $v \in V_C$.