3D Geovisualisation Techniques Applied in Spatial Data Mining

Carlos Roberto Valêncio, Thatiane Kawabata, Camila Alves de Medeiros, Rogéria Cristiane Gratão de Souza, José Márcio Machado

São Paulo State University, Departamento de Ciências de Computação e Estatística
Rua Cristóvão Colombo, 2265, São José do Rio Preto, São Paulo, Brazil
phone 55 17 3221-2231, fax 55 17 32310-2203

[email protected],{thatianekawabata, camila.alves.medeiros}@gmail.com,{rogeria, jmarcio}@ibilce.unesp.br

ABSTRACT. The growing volume of collected spatial data has motivated the development of geovisualisation techniques, which are an important resource for supporting knowledge extraction and decision making. One of these techniques is the 3D graph, which makes the analysis of the results obtained by spatial data mining algorithms more dynamic and flexible, particularly when several georeferenced objects occur at the same location. The original contribution of this work is the enhancement of the visual resources of a computational environment for spatial data mining; the efficiency of these techniques is then demonstrated on a real database. The application proved useful for interpreting the results obtained, such as patterns that occur at the same location, and for supporting the activities that can be planned from the visualisation of those results.

Keywords: Geovisualisation, Spatial Data Mining, Database, Geographic Information System.

1 Introduction

The large amount of spatial data stored in information systems has attracted the interest of various areas of study. Because these data are highly complex, extracting knowledge from them is very costly, which led to the appearance of computational spatial data mining systems [2], [6], [10], [16]. Nevertheless, their results are often neither easy to visualise nor easy to understand, particularly when several spatial data items share the same geographic coordinates [5]. To overcome this obstacle, 3D graphs were developed following geovisualisation guidelines and combined with an analysis of the corresponding region of the map.

adfa, p. 1, 2013. © Springer-Verlag Berlin Heidelberg 2013

The 3D visualisation technique adds interactivity and flexibility to the analysis of spatial data mining results [4]. This article describes the implementation of that technique and its application within a knowledge extraction system. The experiments were carried out on real work accident data and demonstrate the efficiency of the approach. The article is organised as follows: section 2 presents the theoretical substantiation; section 3 describes the developed work; section 4 shows the experimental results; and section 5 presents the conclusions.

2 Theoretical Substantiation

The notable increase in georeferenced data collected by technologies such as GPS, remote sensing and spatial localisation systems, among others, has intensified studies in this area [7], [13]. A new concept has emerged, spatial data mining or Knowledge Discovery in Spatial Databases (KDSD), whose aim is to discover implicit patterns in the correlation of spatial and non-spatial attributes [12]. Among the techniques used in this process is geovisualisation, whose principal aims are to facilitate understanding and decision making based on the results of the spatial data mining stage and to emphasise interactivity in the exploration of knowledge [5], [10].

Geovisualisation has proved to be a very important technique for the analysis of results, since users do not need extensive knowledge of the technique in order to use it. It addresses three important points: data processing, given the large amount and variety of spatial data; the manipulation of diverse variables to discover implicit patterns; and an interface that is easy to use and understand [8], [10].

Various spatial data mining systems and their geovisualisation guidelines for knowledge extraction can be found in the literature, such as 3D mono-colour graphs for the analysis of outliers, which are discrepant points in a data set [3], and 3D multicolour graphs that show different types of rock and support the analysis of wear in certain regions [1], among other relevant works in this area.

This work covers 3D multicolour graphs for each attribute chosen in the data analysis, the manipulation of these graphs from any desired angle, the link between the graph and the locality, made through a highlighted geographic point on the map, and the description of the information about the selected georeferenced data. The implemented techniques can be regarded as an effective contribution, since no works were found in the literature that adopt them for the analysis of results obtained from spatial data mining algorithms.

3 Developed Work

The developed set of visual resources is coupled to a spatial clustering system and improves the understanding of the results obtained by the knowledge extraction process of that system [9], [14].

With the enhanced visual resources, it is possible to obtain implicit information from the results of a cluster, since a large number of georeferenced objects may share the same spatial location. Fig. 1 shows the records contained in a cluster of a given region. These records exhibit a strong correlation between spatial and non-spatial data, since the clusters are formed from specific criteria such as lost work time (time absent from work due to an accident), sex and branch of activity. Each square on the map refers to a geographic position that may hold many georeferenced records, which cannot be distinguished from that perspective.

Fig. 1. Spatial cluster

To address this, a three-dimensional multicolour graph resource was implemented, which favours the visualisation of different characteristics. The 3D graph provides a rapid view of the distribution of georeferenced points in the cluster, as well as of their quantity in certain regions. This is relevant to the analysis because a two-dimensional map makes it impossible to see how many georeferenced points lie at the same location or to compare that number with its neighbours. The 3D graph is interactive: the user can view it from various angles and zoom in or out, which makes it easier to discover implicit patterns, as shown in Fig. 2.

Fig. 2. 3D graph from different angles
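The idea that the third dimension encodes how many georeferenced points share one location can be illustrated with a short sketch. It is not the system's implementation: the record layout, the coordinates and the function name `bar_heights` are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical records: (longitude, latitude, attribute value).
# The coordinates and values are illustrative, not taken from SIVAT.
records = [
    (-49.3794, -20.8113, "female"),
    (-49.3794, -20.8113, "male"),
    (-49.3794, -20.8113, "male"),
    (-49.3801, -20.8120, "female"),
]

def bar_heights(records):
    """Count records per (location, attribute value) pair.

    Each key would become one coloured bar segment in a 3D graph:
    the location places the bar on the map plane, the count is its
    height, and the attribute value selects its colour.
    """
    counts = Counter()
    for lon, lat, value in records:
        counts[((lon, lat), value)] += 1
    return counts

heights = bar_heights(records)
for ((lon, lat), value), n in sorted(heights.items()):
    print(f"({lon}, {lat}) {value}: {n}")
```

On a two-dimensional map the first three records would collapse into a single marker; the counts recover the quantity that the 3D graph displays as height.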

Finally, the map and the graph are linked: clicking on a point in the 3D graph shows its location on the map and lists all its characteristics. Fig. 3 shows an example of the application of the visualisation techniques and implemented resources.

Fig. 3. Overall view of the system

4 Experimental Results

SIVAT - Sistema de Informação e Vigilância de Acidentes do Trabalho (Work Accidents Vigilance System) catalogues all work accidents and contains georeferenced data about the places where those accidents occurred, so as to assist actions in the health area. The system is used by CEREST - Centro de Referência em Saúde do Trabalhador (Worker Health Reference Centre) in the cities of São José do Rio Preto and Ilha Solteira, both in the interior of São Paulo State, to collect, register, follow up and manage work accident notifications [15].

To evaluate the approach adopted to enhance the visual resources applied to spatial data mining, a database of work accident notifications was used, containing more than 70 thousand records, of which more than 20 thousand are georeferenced. The repository stores the following data, among others: occupation of the injured person, labour market situation, cause of the accident, and whether hospitalisation or absence from work was necessary [15]. The repository is updated daily and its data are analysed periodically, which justifies the relevance of the proposed work and its contribution to that activity.

For the experiments, the CLARANS (Clustering Large Applications based on Randomised Search) algorithm [11] of the spatial clustering system was selected, so that its results could be analysed and enriched by the resources implemented in this work. The algorithm was executed twice, first on the accidents of 2009 and then on those of 2010, with the same parameters in both executions, shown in Table 1.

Table 1. Parameters used in the CLARANS algorithm

Parameter                         Value
Number of centroids               20
Maximum iterations                50
Maximum points in each cluster    20
Maximum neighbours                30
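For readers unfamiliar with CLARANS, the following is a minimal sketch of the randomised search described by Ng and Han [11], not the implementation used in the spatial clustering system. It assumes that the number of centroids in Table 1 corresponds to `k`, the maximum iterations to the number of restarts `num_local`, and the maximum neighbours to `max_neighbours`; the limit on points per cluster is not modelled, and all names are hypothetical.

```python
import math
import random

def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)

def clarans(points, k, num_local, max_neighbours, seed=0):
    """Randomised search over sets of k medoids (indices into points).

    Assumes k is smaller than the number of points.
    """
    rng = random.Random(seed)
    n = len(points)
    best, best_cost = None, math.inf
    for _ in range(num_local):
        # Start each local search from a random set of medoids.
        current = rng.sample(range(n), k)
        current_cost = total_cost(points, current)
        tried = 0
        while tried < max_neighbours:
            # A neighbour differs from the current set in exactly one medoid.
            neighbour = current[:]
            non_medoids = [i for i in range(n) if i not in current]
            neighbour[rng.randrange(k)] = rng.choice(non_medoids)
            cost = total_cost(points, neighbour)
            if cost < current_cost:
                # Move to the better neighbour and restart the counter.
                current, current_cost, tried = neighbour, cost, 0
            else:
                tried += 1
        if current_cost < best_cost:
            best, best_cost = current, current_cost
    return sorted(best), best_cost

# Illustrative data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, cost = clarans(points, k=2, num_local=5, max_neighbours=30)
```

With the values of Table 1 the call would take `k=20`, `num_local=50` and `max_neighbours=30`, under the mapping assumed above.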

4.1 Experiment 1: Cluster of 2009 Data

The following attributes were chosen for this experiment: lost work time, sex and branch of activity in the year 2009. The analysed cluster was made up of 119 work accidents, shown in Fig. 4.

Fig. 4. Cluster used for analysis – 2009

With the implemented resources, the 3D graphs in Fig. 5, Fig. 6 and Fig. 7 show that, for each chosen attribute, the cluster contains many more accidents than those plotted on the two-dimensional map, because many of them happened in the same place. All the accidents in this cluster (100%) resulted in lost work time; the principal branches of activity were hospitals, clinics, laboratories, supermarkets, markets, shops, mini-markets and metalwork; and 32% of the injured workers were women and 68% were men. The map in Fig. 4 shows the region in which these characteristics are concentrated.

The 3D graphs in Fig. 5, Fig. 6 and Fig. 7 show the occurrences of work accidents at each point for each characteristic and also allow new correlations to be discovered. For example, analysing the graphs in Fig. 6 and Fig. 7 together shows that most of the accidents in the hospital, clinic and laboratory branches of activity happened at the same location (indicated by the highlighted vertical line in the graph of Fig. 7), and examining the same geographic position in the graph of Fig. 6 shows that most of those accidents involved women. Moreover, the map's zoom tool reveals the exact location of those accidents, giving CEREST, the agency responsible for the surveillance of work accidents, an indication for planning preventive and corrective strategies in the right region, or even in a specific locality that has such characteristics.

Fig. 5. 3D graph generated for the lost work time attribute

Fig. 6. 3D graph generated for the activity attribute

Fig. 7. 3D graph for the sex attribute
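The cross-reading of two attribute graphs at one geographic position, as done in Experiment 1, can be sketched as follows. The records, location identifiers and function names are hypothetical and only illustrate the reasoning; they are not data from the experiment.

```python
from collections import Counter

# Hypothetical cluster records: (location id, branch of activity, sex).
records = [
    ("P1", "hospital", "female"),
    ("P1", "hospital", "female"),
    ("P1", "hospital", "male"),
    ("P2", "supermarket", "male"),
    ("P3", "hospital", "female"),
]

def busiest_location(records, branch):
    """Return the location with the most accidents in the given branch.

    Assumes at least one record belongs to that branch.
    """
    per_location = Counter(loc for loc, b, _ in records if b == branch)
    return per_location.most_common(1)[0][0]

def sex_shares(records, location):
    """Return the share of each sex among the accidents at one location."""
    counts = Counter(sex for loc, _, sex in records if loc == location)
    total = sum(counts.values())
    return {sex: n / total for sex, n in counts.items()}

# Step 1: find the peak in the activity graph.
peak = busiest_location(records, "hospital")
# Step 2: read the sex graph at the same geographic position.
shares = sex_shares(records, peak)
```

The first step corresponds to spotting the tallest bar for one branch of activity, and the second to inspecting the same position in the graph of another attribute.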

4.2 Experiment 2: Cluster of Year 2010 Data

Experiment 2 used the 2010 data and the following attributes: lost work time, hospitalisation and the period of the day in which the accident occurred. To make the periods of the day easier to identify, they were discretised as dawn, morning, afternoon and night. The cluster in Fig. 8 shows only the points that make up those periods. As can be seen, the cluster covers the Washington Luiz Highway region and surrounding streets. A noteworthy feature of this cluster is that all its accidents occurred during the afternoon (13:00 to 18:59), as can be seen in Fig. 9. Although the cluster covers the Washington Luiz Highway, the accidents at that location resulted neither in the worker's absence from work nor in hospitalisation. Fig. 10 and Fig. 11 show the 3D graphs for lost work time and hospitalisation.

The cluster shown in Fig. 8 also covers one of the principal streets of the municipality (Independência street), which accounts for 91 work accidents. Together with the fact that they all occurred during the afternoon, this indicates that the risk of accidents on this street is higher in that period than in others, so the responsible agencies can take preventive measures to avoid them.
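The discretisation of the time of the accident into periods of the day can be sketched as below. Only the afternoon range (13:00 to 18:59) is given in the text; the other three boundaries and the function name `period_of_day` are assumptions made for this illustration.

```python
def period_of_day(hour):
    """Map an hour (0-23) to one of the four periods used in Experiment 2.

    Only the afternoon range (13:00 to 18:59) is stated in the text;
    the other three boundaries are assumptions made for this sketch.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if hour < 6:
        return "dawn"        # assumed: 00:00 to 05:59
    if hour < 13:
        return "morning"     # assumed: 06:00 to 12:59
    if hour < 19:
        return "afternoon"   # stated: 13:00 to 18:59
    return "night"           # assumed: 19:00 to 23:59

# Hypothetical accident hours, discretised before clustering.
periods = [period_of_day(h) for h in (2, 9, 13, 18, 22)]
```

Discretising a continuous attribute in this way lets it be used as a categorical criterion and shown as one colour per period in the 3D graph.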

Fig. 8. Cluster used for analysis – 2010

Fig. 9. 3D Graph for the period of the day attribute

Fig. 10. 3D Graph for the lost work time attribute

Fig. 11. 3D Graph for the hospitalisation attribute

5 Conclusions

A large number of valuable discoveries can be made in spatial databases by means of spatial data mining algorithms. To make such discoveries, a system was built that performs spatial clustering and displays the resulting clusters on a map of the region in which the points lie. To support the understanding of the results obtained, and to allow new correlations to be found, a 3D geovisualisation technique was incorporated that makes it possible to visualise and search for patterns in regions where the incidence of georeferenced data is high. 3D graphs that the user can rotate were built to give a more adequate visualisation of the results. Moreover, resources were developed to filter out the points that are not part of the cluster and to show the information related to the georeferenced data and their location on the map.

The visual interaction between the generated clusters, the map and the 3D graphs represents a new solution for the extraction of knowledge from spatial databases. It offers the user various interactive resources for analysing results, gives a wider view of the georeferenced data and supports decision making in specific localities. Applying the proposed system to a real database of work accident reports revealed some interesting spatial correlations that, through the 3D graphs and the visualisation of points on the map, enable CEREST to plan specific strategies according to localities and accident characteristics. This work is therefore relevant both to computing, through its use of frontier technologies, and to public health, since work accidents harm workers through bodily injury and impose great expense on the public agencies that care for them.

References

1. Amirbekyan, A.: Applying Data Mining and Mathematical Morphology to Borehole Data Coming from Exploration and Mining Industry. In: 2010 IEEE Sixth International Conference on E-Science, e-Science 2010, Brisbane, Australia, December 7-10, pp. 113–120. IEEE Computer Society (2010)
2. Bae, D.-H., Baek, J.-H., Oh, H.-K., Song, J.-W., Kim, S.-W.: SD-Miner: A spatial data mining system. In: Proc. 2009 IEEE International Conference on Network Infrastructure and Digital Content, IC-NIDC 2009, Beijing, China, November 6-8, pp. 803–807. IEEE (2009)
3. Cai, Q., He, H., Man, H.: SOMSO: A self-organizing map approach for spatial outlier detection with multiple attributes. In: International Joint Conference on Neural Networks, IJCNN 2009, Atlanta, USA, June 14-19, pp. 425–431. IEEE Computer Society (2009)
4. Compieta, P., Di Martino, S., Bertolotto, M., Ferrucci, F., Kechadi, T.: Exploratory spatiotemporal data mining and visualization. Journal of Visual Languages & Computing 18, 255–279 (2007)
5. Han, J.: Geographic Data Mining and Knowledge Discovery, 2nd edn. CRC Press, Taylor & Francis Group (2009)
6. Ji, M., Jin, F., Zhao, X., Ai, B., Li, T.: Mine geological hazard multi-dimensional spatial data warehouse construction research. In: 2010 18th International Conference on Geoinformatics, Beijing, China, June 18-20, pp. 1–5. IEEE (2010)
7. Jin, H., Miao, B.: The research progress of spatial data mining technique. In: 2010 3rd IEEE International Conference on Computer Science and Information Technology, ICCSIT 2010, Chengdu, China, July 9-11, vol. 3, pp. 81–84. IEEE (2010)
8. Li, C., Li, F., Tian, Y.: Geovisualization of knowledge diffusion a case study in data mining. In: 2nd International Conference on Computer Engineering and Technology, ICCET 2010, Chengdu, China, April 16-18, vol. 2, pp. V2-590–V2-595. IEEE (2010)
9. Medeiros, C.A., Ichiba, F.T., Souza, R.C.G., Valencio, C.R.: Ferramenta de Apoio ao Spatial Data Mining. In: International Association for Development of the Information Society Ibero-Americana Conference on WWW/Internet, November 5-7, Rio de Janeiro, Brazil, pp. 396–398. IADIS Press (2011) (in Portuguese)
10. Mennis, J., Guo, D.: Spatial data mining and geographic knowledge discovery: An introduction. Computers, Environment and Urban Systems 33, 403–408 (2009)
11. Ng, R.T., Han, J.: Efficient and Effective Clustering Methods for Spatial Data Mining. In: Proc. 20th International Conference on Very Large Data Bases, VLDB '94, Santiago de Chile, Chile, pp. 144–155. Morgan Kaufmann Publishers (1994)
12. Pattabiraman, V., Parvathi, R., Nedunchezian, R., Palaniammal, S.: A Novel Spatial Clustering with Obstacles and Facilitators Constraint Based on Edge Deduction and KMediods. In: 2009 International Conference on Computer Technology and Development, ICCTD '09, Kota Kinabalu, Malaysia, November 13-15, vol. 1, pp. 402–406. IEEE Computer Society (2009)
13. Shun, H.Y., Wei, X.: A study of spatial data mining architecture and technology. In: 2009 2nd IEEE International Conference on Computer Science and Information Technology, ICCSIT 2009, Beijing, China, August 8-11, pp. 163–166. IEEE Computer Society (2009)
14. Valêncio, C.R., Medeiros, C.A., Ichiba, F.T., Souza, R.C.G.: Spatial Clustering Applied to Health Area. In: 2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2011, Gwangju, Korea, October 20-22, pp. 427–432. IEEE Computer Society (2011)
15. Valêncio, C.R., Oyama, F.T., Scarpelini Neto, P., Colombini, A.C., Cansian, A.M., Souza, R.C.G., Corrêa, P.L.P.: MR-Radix: a multi-relational data mining algorithm. Human-centric Computing and Information Sciences (HCIS), SpringerOpen Journal 2, 1–17 (2012)
16. Wang, P., Ma, L., Xi, Y., Jin, L.: Research on Logistics Oriented Spatial Data Mining Techniques. In: 2009 International Conference on Management and Service Science, MASS '09, Wuhan, China, September 20-22, pp. 1–4. IEEE (2009)
