FUZZY STRUCTURAL PRIMITIVES FOR SPATIAL DATA MINING

A. Boulmakoul*, K. Zeitouni**, N. Chelghoum**

*LIST - Laboratoire Informatique des Systèmes de Transport, Faculté des Sciences et Techniques de Mohammedia, B.P. 146 Mohammedia, Maroc. {[email protected]}
**Laboratoire PRISM - Université de Versailles Saint Quentin, 45, Avenue des Etats-Unis, F-78035 Versailles Cedex, France.
Abstract

Interest in spatial data mining is growing steadily. Its fundamental processes include clustering and the detection of structural patterns, both of which depend strongly on the concept of proximity or neighborhood. This paper introduces structures for building a spatial data mining system that integrates fuzzy structural primitives, and proposes its application within a system for road safety analysis. We also propose a general fuzzy algorithm that determines partitions for a reflexive and symmetrical fuzzy relation. These investigations are important for data analysis and spatial data mining. The system implementation uses, in particular, C++/STL, the Microsoft Foundation Class Library (MFC) and the MapObjects ActiveX control (ESRI). The component architecture of the system is also described.

Index terms: GIS, spatial data mining, fuzzy clustering, similarity, accidents analysis, MapObjects.
1. INTRODUCTION

The main objective of spatial data mining is to discover hidden, complex knowledge from spatial and non-spatial data, despite their huge volume and the cost of computing spatial relationships. However, spatial data mining methods are still largely extensions of those used in conventional data mining. Spatial Data Mining (SDM) consists of two functions [3] [17-18]. The first describes a spatial phenomenon by exploring data, for example identifying risky zones by viewing the spatial distribution of accident locations. The second explains or even predicts the phenomenon by looking for correspondences with properties of the geographical environment. For instance, accidents could be "explained" by the state of the road or by the surrounding urban density. Spatial classification belongs to these explanatory methods.

Spatial data mining algorithms are strongly bound to the concept of neighborhood relations. Neighborhood relations, as defined in relatively recent work on knowledge discovery in spatial databases, have been integrated in database management systems (DBMS) [7]. This approach permits the use of efficient data mining algorithms. Based on neighborhood relations, some operators have been proposed in the DBMS context to facilitate the expression of spatial data mining algorithms [8-9]. The objective of this paper is to incorporate supplementary techniques based on the fuzzy structural analysis of complex systems [5] [12] [14].

In this article we propose a simple general algorithm that generates partitions for all reflexive and symmetrical fuzzy relations, and in particular for those that are max-min transitive or max-Δ transitive. Compared with the algorithm proposed by Yang [15], it has the advantage of a general formulation and a simple implementation. A software component dedicated to the management and manipulation of fuzzy relations has been developed and built into a global spatial data mining system.

Spatial clustering is the process of grouping objects into classes. Several techniques have been developed; they are usually classified as partitioning methods, hierarchical methods, density-based methods [13] and grid-based methods [10] [18]. In this context we propose a fuzzy clustering model based on fuzzy graphs. These graphs are constructed from fuzzy relations between objects, using spatial relations [4]. This approach draws on the properties and operations of fuzzy similarity relations (hierarchical analysis and convex decomposition of a fuzzy relation).

The work described in this paper targets spatial data mining for road accident analysis. Traffic risk analysis identifies road safety problems in order to propose safety measures. This project aims at deducing relevant risk models to support traffic safety tasks.
Risk assessment is based on information about previous injury accidents collected by police forces. Until now, however, this analysis has been based on statistics, with no consideration of spatial relationships. This work aims at identifying risky road sections and at analyzing and explaining those risks with respect to the geographic context. We propose to combine accident data with thematic data relating to the road network, the population census, the buildings, or any other geographic neighborhood information in the risk analysis process. This is specifically the approach of spatial data mining technology.

This paper focuses on the clustering task. The analysis builds a spatial partition that integrates the spatial features of the input thematic layer (here, the accidents). This makes it possible to consider the interaction with other thematic layers when inducing decision rules. In the application domain, one can then explain and predict the danger of roads from their geographical context.

The article is structured as follows. Section 2 presents the design of the spatial data mining system and shows the integration of the software component dedicated to fuzzy relations. Section 3 briefly recalls some operations on similarity relations and develops the general algorithm for extracting partitions of a similarity relation. The last section concludes, summarizes the results of the present developments and outlines the future stages of this research project.
2. SPATIAL DATA MINING SOFTWARE COMPONENTS ANALYSIS & DESIGN

This section studies elements of the analysis and design of a spatial data mining system. The intended system aims at analyzing spatial data on accidents that occurred on urban roads. It includes several data exploration primitives, and also integrates the manipulation of fuzzy relations in a general setting.

The analysis of the spatial data mining system yields three important informational domains (figure 3):

1- a domain bound to the technical components of the SDM (data mining algorithms, query operators, etc.),
2- a domain concerning the semantic persistent data (DBMS),
3- a domain attached to the spatial persistent data (GIS).

Figure 3. Main domains. [Diagram labels: Spatial Data Mining, Clustering, Geo Statistics, Structural primitives, DBMS, GIS]

The spatial view uses MapObjects (© ESRI) together with the STL and MFC libraries. MapObjects is a set of mapping software components that provides dynamic mapping and geographic information system capabilities to Windows applications, and can be used to build custom mapping and GIS solutions. It comprises an ActiveX control called the Map control and a set of 46 ActiveX automation objects. In our work, MapObjects is used in the Visual C++ programming environment.

Figure 1 shows the interface of the Spatial Data Mining application for accident analysis. The application is developed under Windows (MFC/MapObjects) and allows the user to express requests for spatial accident analysis on the Mohammedia urban transport network.

Figure 1. Spatial accidents analysis in Mohammedia city.

Figure 2 shows the collaborations between the different classes. In this use case view, only the fundamental interactions contributing to the processing of the fuzzy clustering algorithm, the data exchange and their visualization are taken into account. Figure 4 gives a simplified class diagram for this use case view.

Figure 2. Fuzzy clustering use case view. [Diagram labels: Fuzzy Cluster, CView Clustering, DataProvider, CMap, Accident DB; interactions: Data preparation, Accident Clusters exchange, Create Accident Layer, Display]
Figure 4. Fuzzy clustering class diagram. [Diagram labels: DataProvider (Omega : Accident), Map (AddAccidentsLayer(), OnMapSaveShapesFromDB(), OnMapSaveShapesFromFile()), FuzzyClusterAlgo, Accident; multiplicity 1..*]

The software components derived at this stage are grouped into packages that correspond to the detected domains (figure 5).

Figure 5. SDM components view. [Diagram labels: IMS, Data mining, Spatial, STL, WebLink, MapObject, Fuzzy Primitives, Accident Data Provider, DBMS]

3. FUZZY RELATIONS AND CLUSTERING

A fuzzy similarity relation [1-2] [6] [8-9] is a generalization of the notion of equivalence relation in the classical setting. Let $\Omega$ be a set of objects and $R$ a fuzzy relation on $\Omega$ with membership function $\mu_R$. $R$ is a similarity relation if it satisfies the following properties [16]:

- (reflexivity) $\mu_R(x, x) = 1, \ \forall x \in \Omega$;
- (symmetry) $\mu_R(x, y) = \mu_R(y, x), \ \forall (x, y) \in \Omega^2$;
- (max-T transitivity) $\max_{y \in \Omega} T(\mu_R(x, y), \mu_R(y, z)) \le \mu_R(x, z), \ \forall (x, z) \in \Omega^2$, where $T$ is a T-norm.

Depending on the choice of $T$:

- if $T(\mu_R(x, y), \mu_R(y, z)) = \min(\mu_R(x, y), \mu_R(y, z))$, then $R$ is said to be max-min transitive;
- if $T(\mu_R(x, y), \mu_R(y, z)) = \max(0, \mu_R(x, y) + \mu_R(y, z) - 1)$, then $R$ is said to be max-Δ transitive;
- if $T(\mu_R(x, y), \mu_R(y, z)) = \mu_R(x, y) \cdot \mu_R(y, z)$, then $R$ is said to be max-prod transitive.

If $R$ is a fuzzy relation, its convex decomposition is given by $R = \max_{\alpha} \alpha R_\alpha$, where $R_\alpha$ is the $\alpha$-cut of the relation $R$, $\alpha \in [0, 1]$. If $R$ is a max-min transitive similarity relation, then $R_\alpha$ is an equivalence relation.

The algorithm proposed below is used for finding partitions of similarity relations constructed from road accident data. The construction of the similarity relations makes reference to the spatial data.

3.1. General algorithm for partition finding

Notation:

- $\Omega$: the set of objects to classify.
- $R$, $\mu_R$: a reflexive and symmetrical fuzzy relation defined on $\Omega$, and its membership function.
- $\Pi$: the list of obtained clusters, initially empty.
- $s(e, X)$, computed from the values $\mu_R(e, x)$, $x \in X$: the similarity function between an object $e \in \Omega$ and a set $X \subseteq \Omega$.
- Let the following functions be defined: for every $e$, [...] $\mu_R(e, x)$ [...], $e \ne x$ [...].
- $\hat{R} = \min(R, R_\alpha)$, where $R_\alpha$ is the $\alpha$-cut of the relation $R$, $\alpha \in [0, 1]$.

The main procedures of the algorithm are given below.
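The relation-level definitions of Section 3 (the three T-norms, the α-cut, the thresholded relation min(R, R_α) and max-T transitivity) can be made concrete with a short C++/STL sketch. This is only an illustration under stated assumptions, not the authors' software component: the dense-matrix representation and every identifier (`Relation`, `TNorm`, `tMin`, `tDelta`, `tProd`, `alphaCut`, `thresholded`, `isMaxTTransitive`) are hypothetical, and the α-cut is taken in its usual sense of keeping the pairs whose degree is at least α.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical representation (not from the paper): a fuzzy relation on
// n objects is an n x n matrix of membership degrees in [0, 1].
using Relation = std::vector<std::vector<double>>;
using TNorm = std::function<double(double, double)>;

// The three T-norms named in the text.
double tMin(double a, double b) { return std::min(a, b); }                // max-min
double tDelta(double a, double b) { return std::max(0.0, a + b - 1.0); }  // max-Delta
double tProd(double a, double b) { return a * b; }                        // max-prod

// Alpha-cut R_alpha: crisp relation keeping the pairs with degree >= alpha.
Relation alphaCut(const Relation& r, double alpha) {
    Relation cut(r.size(), std::vector<double>(r.size(), 0.0));
    for (std::size_t i = 0; i < r.size(); ++i)
        for (std::size_t j = 0; j < r.size(); ++j)
            cut[i][j] = (r[i][j] >= alpha) ? 1.0 : 0.0;
    return cut;
}

// min(R, R_alpha): degrees below alpha become 0, the others are kept.
Relation thresholded(const Relation& r, double alpha) {
    Relation out = alphaCut(r, alpha);
    for (std::size_t i = 0; i < r.size(); ++i)
        for (std::size_t j = 0; j < r.size(); ++j)
            out[i][j] = std::min(r[i][j], out[i][j]);
    return out;
}

// Max-T transitivity: max over y of T(R(x,y), R(y,z)) <= R(x,z) for all x, z.
// eps absorbs floating-point rounding in the T-norm evaluation.
bool isMaxTTransitive(const Relation& r, const TNorm& t, double eps = 1e-9) {
    const std::size_t n = r.size();
    for (const auto& row : r) assert(row.size() == n);  // relation must be square
    for (std::size_t x = 0; x < n; ++x)
        for (std::size_t z = 0; z < n; ++z)
            for (std::size_t y = 0; y < n; ++y)
                if (t(r[x][y], r[y][z]) > r[x][z] + eps) return false;
    return true;
}
```

For instance, a three-object relation with degrees 0.8 between objects 0 and 1, 0.8 between objects 1 and 2, and 0.6 between objects 0 and 2 passes `isMaxTTransitive` with `tDelta` but fails it with `tMin` and `tProd`, which illustrates why an algorithm that does not require max-min transitivity is useful.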
Cluster_finding(R, Ω, α, Π)
{
    // Π: the list of obtained clusters, initially empty.
    R̂ ← min(R, R_α);
    φ(e) ← max_{x ∈ Ω} µ_R̂(e, x), for each e ∈ Ω;
    // T is an STL stack container in which the elements of Ω to classify
    // are sorted in decreasing order of φ.
    T ← heap_sort(Ω, compare(φ));
    While (!T.empty())
    {
        e ← T.top();
        if (Π.empty()) then CreateCluster(e, T, Π)
        else
        {
            ∀ C ∈ Π, calculate s(e, C);
            if (![...](e).empty()) then AttractionCluster(e, Π);
            else CreateCluster(e, T, Π);
        }
    }
}

The procedure CreateCluster creates a new cluster and removes all classified objects from the stack T.

CreateCluster(e, T, Π)
{
    C = new cluster;
    G_e ← {y ∈ T, y ≠ e / µ_R(y, e) = max_{x ∈ Ω} µ_R(e, x)};
    y* ← G_e.top();
    C ← insert(e); C ← insert(y*);
    Π ← insert(C); T.delete(e); T.delete(y*);
}

The procedure AttractionCluster, given below, assigns an object to "the most similar" existing cluster. It removes all classified objects from the stack T.

AttractionCluster(e, Π)
{
    Let C* be the cluster such that s(e, C*) = max_{C ∈ Π} s(e, C);
    C* ← C* ∪ {e}; T.delete(e);
}

The general algorithm ensures the extraction of partitions for every similarity relation that is max-min transitive or max-Δ transitive.
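A greedy create-or-attract partition finder in the spirit of these procedures can be sketched in C++/STL as follows. It is not the authors' implementation, and several points that the text leaves open are filled with explicit assumptions: the object-to-cluster similarity is taken as the largest thresholded degree between the object and any member of the cluster, an object is attracted whenever that similarity is positive, the sort key excludes the object itself, and an object with no remaining similar partner forms a singleton cluster. All identifiers (`Relation`, `Cluster`, `similarityToCluster`, `findPartition`) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

using Relation = std::vector<std::vector<double>>;  // n x n degrees in [0, 1]
using Cluster = std::vector<std::size_t>;           // indices of member objects

// ASSUMPTION: s(e, C) is the largest thresholded degree between e and a member of C.
double similarityToCluster(const Relation& rHat, std::size_t e, const Cluster& c) {
    double best = 0.0;
    for (std::size_t x : c) best = std::max(best, rHat[e][x]);
    return best;
}

std::vector<Cluster> findPartition(const Relation& r, double alpha) {
    const std::size_t n = r.size();
    // Thresholded relation min(R, R_alpha): keep degrees >= alpha, zero the rest.
    Relation rHat(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        assert(r[i].size() == n);  // the relation must be square
        for (std::size_t j = 0; j < n; ++j)
            if (r[i][j] >= alpha) rHat[i][j] = r[i][j];
    }
    // Sort key per object (ASSUMPTION: the object itself is excluded from the max).
    std::vector<double> key(n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            if (i != j) key[i] = std::max(key[i], rHat[i][j]);
    // Processing order: decreasing key, ties broken by index.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return key[a] > key[b]; });

    std::vector<bool> classified(n, false);
    std::vector<Cluster> clusters;
    for (std::size_t e : order) {
        if (classified[e]) continue;
        classified[e] = true;
        // Attraction: join the most similar existing cluster, if one is similar at all.
        std::size_t bestCluster = clusters.size();
        double bestSim = 0.0;
        for (std::size_t c = 0; c < clusters.size(); ++c) {
            const double s = similarityToCluster(rHat, e, clusters[c]);
            if (s > bestSim) { bestSim = s; bestCluster = c; }
        }
        if (bestCluster != clusters.size()) {
            clusters[bestCluster].push_back(e);
            continue;
        }
        // Creation: new cluster holding e and its most similar unclassified object.
        Cluster created{e};
        std::size_t partner = n;
        double partnerSim = 0.0;
        for (std::size_t y = 0; y < n; ++y)
            if (!classified[y] && rHat[e][y] > partnerSim) {
                partnerSim = rHat[e][y];
                partner = y;
            }
        if (partner != n) {
            created.push_back(partner);
            classified[partner] = true;
        }
        clusters.push_back(created);
    }
    return clusters;
}
```

Lowering `alpha` keeps more pairs in the thresholded relation, so clusters tend to merge; raising it tends to split them, which mirrors the hierarchical reading of the α-cuts. Because the sketch never relies on transitivity, it can be run on any reflexive and symmetrical relation.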
4. CONCLUSION & PERSPECTIVES

In this work we proposed a general partition-finding algorithm for a reflexive and symmetrical fuzzy relation. The algorithm determines partitions for all types of fuzzy relation transitivity (max-min, max-Δ, etc.). It can also be applied directly to a reflexive, symmetrical and non-transitive fuzzy relation. The design of a spatial data mining system is also described in this work. We are going to integrate the software component for fuzzy relation manipulation into this system, for the spatial data mining of accidents.

The fuzzy structural primitives will bring a new way of approaching accident analysis in the setting of this work. On the basis of fuzzy graphs and the clustering algorithm, it is possible to reach the following objectives:

1. to define a formal model of the urban transportation network founded on the notion of fuzzy neighborhood graphs, by defining a neighborhood graph through the concept of fuzzy relation,
2. to build a risk map for the urban network using fuzzy paths [11],
3. to carry out itinerary-based risk analyses to detect dangerous paths,
4. to project the results obtained onto an exportable dynamic map on the Web,
5. to integrate and formulate information about "accident risks" according to a specified point of view.

REFERENCES

1. Backer E., Cluster analysis by optimal decomposition of induced fuzzy sets, PhD Thesis, Delftse University, 1978.
2. Boulmakoul A., Structure Prétopologique Matroïdale : Application à la Décomposition des Systèmes Complexes, Conférence Internationale de Mathématiques Appliquées et Sciences de l'Ingénieur, Tome I, pp. 277-281, Casablanca, ENSEM, 14-19 Nov. 1996.
3. Boulmakoul A., Zeitouni K., Primitives structurales pour le data mining spatial, in Int. AMSE Conf., vol. 1, pp. 62-69, March 19-21, 2001, Rabat, Morocco.
4. Cohn A.G., Randel D.A., Cui Z., Taxonomies of logically defined qualitative spatial relations, Int. Journal of Human-Computer Studies 43 (1995) 831-846.
5. Dussauchoy A., Paths algebra, similarities and system decomposition, Journal of Math. Analysis and Applications, Vol. 102, N° 1, pp. 75-85.
6. Emptoz H., Modèle prétopologique pour la reconnaissance des formes, Thèse d'état, Université Claude Bernard Lyon, France, 1983.
7. Ester M., Kriegel H-P., Sander J., Spatial data mining: a database approach, Lecture Notes in Computer Science, Vol. 1262, Springer, 1997, pp. 47-66.
8. Kim L., Fuzzy relation compositions and pattern recognition, Information Sciences 89 (1996) 107-130, Elsevier.
9. Murali V., Fuzzy equivalence relations, Fuzzy Sets and Systems 30 (1989) 155-163.
10. Nanopoulos A., Manolopoulos Y., Mining patterns from graph traversals, Data & Knowledge Engineering 37 (2001) 243-266, Elsevier.
11. Okada S., Soper T., A shortest path problem on a network with fuzzy arc lengths, Fuzzy Sets and Systems 109 (2000) 129-140.
12. Tamura S., Higuchi S., Tanaka K., Pattern classification based on fuzzy relations, IEEE Trans. Syst. Man Cybernet. 1 (1978) 61-66.
13. Tremolières R., The percolation method for an efficient grouping of data, Pattern Recognition, vol. 11, n° 4, 1979.
14. Yager A., On general class of fuzzy connectives, Fuzzy Sets and Systems 4 (1980) 235-242.
15. Yang M., Shih H., Cluster analysis based on fuzzy relations, Fuzzy Sets and Systems 120 (2001) 197-212.
16. Zadeh L., Similarity relations and fuzzy ordering, Infor. Sci. 3 (1971) 177-200.
17. Zeitouni K., Chelghoum N., Boulmakoul A., Arbre de décision spatial multi-thèmes, in SFC'01, 17-21 Décembre 2001, Pointe-à-Pitre, Guadeloupe.
18. Zeitouni K., Yeh L., Le data mining spatial et les bases de données spatiales, Revue Int. de géomatique, Vol. 9, n° 4, 1999, pp. 389-423.