Document not found! Please try again

WebGIS Platform for Detecting Spatio-Temporal ... - IEEE Xplore

115 downloads 409 Views 5MB Size Report
e-mail: [email protected]. Abstract— This paper is a natural prosecution of two previous works. We present a web geospatial framework for analyzing.
2014 International Conference on Intelligent Networking and Collaborative Systems

WEBGIS PLATFORM FOR DETECTING SPATIO-TEMPORAL HOTSPOTS OF OTOLARYNGO-PHARYNGEAL DISEASES

F. Di Martino, R. Mele, S. Sessa

U. E.S.Barillari, M.R.Barillari

Dept. of Architecture UNINA Federico II Napoli, Italy e-mail: fdimarti,[email protected]

Dept. PNIAD SUN Napoli, Italy e-mail: [email protected]

Abstract— This paper is a natural prosecution of two previous works. We present a web geospatial framework for analyzing and monitoring the spatio-temporal evolution of disease hotspots. In order to detect the hotspots, we adopt Extended Fuzzy C-Means method which has been adapted for calculating spatial areas with high concentrations of events in a Geographic Information System and tested to study the spatial and temporal evolution of hotspot areas. Each event is given by the geo-positional coordinates of the place of residence of the patient. The analyst can insert event data directly on the map or digitizing the address of the residence of the patient and by using geo-coding services for locating the event. The events can be grouped for time intervals for studying the temporal evolution of the phenomenon. Other services allow the analyst to study the spatio-temporal evolution of the hotspot areas. In our experiments, the data consist of georeferenced patterns corresponding to the residence of patients in the district of Naples (Italy) to whom a surgical intervention to the oto-laryngo-pharyngeal apparatus was carried out between the years 2008 -2012. The results show the presence of two major hotspots for every type of oto-laryngo-pharyngeal disease, the former covering a geographical area that affects the city of Naples, the latter covering various towns around Vesuvius. Both hotspots are significantly increased in recent years.

areas is formed by concentric circular areas around that point and the radius of each circular buffer area is defined a priori. When it is not possible to define statically an area of impact and we need to determine what is the area affected by the presence of a consistent set of events, we are faced with the problem of detecting this area as a cluster on which thicken the geo-referenced events. It is dealt with using an appropriate clustering algorithm in which the detected clusters are geo-referenced and represented as polygonal elements on the map: these clusters are called hotspot areas. The study of hotspots is vital in many disciplines in which it is necessary to detect the geographic areas on which thicken events, as crime analysis [2, 8,14], which studies the spread on the territory of criminal events, fire analysis [5] which analyzes the phenomenon of fires on forested areas, and disease analysis, which studies the localization of focuses of diseases and their spatial evolution during the time [15,16,17],. The clustering methods mainly use are the algorithms based on density [6,11] but they are highly expensive in terms of computational complexity and it is not necessary to determine the exact geometry of the clusters in the great majority of cases. The clustering algorithm Fuzzy C-Means algorithm (FCM) [1] has a linear computational complexity and uses the Euclidean distance to determine prototypes cluster as points. However this algorithm is sensitive to the presence of noise and outliers, moreover the number of clusters must be fixed a priori. In order to overcome these shortcomings, in [9,10] the EFCM algorithm is proposed as a variation of the FCM algorithm in which the cluster prototypes are spheres in the case of the Euclidean metric. Also, the EFCM algorithm is characterized by a linear computational complexity, it is robust with respect to the presence of noise and outliers, and the final number of clusters is determined during the iterative process. In [3,4] the authors use the EFCM algorithm for detecting hotspot areas. The final hotspots are identified as circular areas on the map. In [5] the authors present a new method based on the EFCM algorithm for analyzing the spatio-temporal evolution of the hotspots in fire analysis. The pattern event dataset is partitioned according to the time of the event’s detection; so each

Keywords: carcinoma disease, fuzzy clustering, hotspot, WEBGIS, Extended Fuzzy C-means

I.

INTRODUCTION

This paper is a natural prosecution of two previous works. In this paper we present a WEBGIS platform developed to study the space-time evolution of hotspots. Our WEBGIS platform is fully interoperable with WEB and WEBGIS systems. In a GIS the study of the impact of phenomena in a specific area due to the proximity of the event (for example, the study of the impact area of an earthquake, or the area constraint around a river basin) is performed using buffer area geo-processing functions. Given a geo-spatial event topologically represented as a geo-referenced punctual, linear or areal element, an atomic buffer area is constituted by a circular areas each centered on the element. For example, if the event is the epicenter of an earthquake, geo-referenced by a point, a set of buffer 978-1-4799-6387-4/14 $31.00 © 2014 IEEE DOI 10.1109/INCoS.2014.72

112

subset is corresponding to a specific time interval. The authors compare the hotspots obtained in two consecutive years by studying their intersections on the map. The cluster prototypes detected with the EFCM method are circular areas and considerec as a hotspot area. Let us consider two hotspots A and B and the related areas: -

-

-

process is a geocoding activity necessary for obtaining the event dataset starting by the street address of the patients. To ensure an accurate matching for the geo-positioning of the event, we need the topologically correct road network and corresponding complete toponymic data. The starting data include the name of the street and the house number of the patient’s residence. After the matching process, each data is converted in an event point geo-referenced on the map. In Fig. 2 we show the road network of the district of Naples, the name of the street is labeled on the map; the events are geo-referenced as points on the map. The administrator services of the WEBGIS plaform allow to set the spatial dataset which is the road graph, the toponymic dataset associated with it and the parameters included in the geocoding processes. The geocoding process is performed using an ESRI Address Locator object. An Address Locator defines the matching process to a street in the toponymic dataset. We set the toponymic dataset and the key fields of this dataset for the matching process. Using the matching options, we can set the minimal match degree between the address and an address in the toponymic dataset. Fig. 3 shows an example of Address Locator with an unique key field. By accessing to the area “EFCM parameters” the analyst can calibrate the EFCM algorithm by setting the initial number of clusters, the fuzzifier parameter, and the error that stops the iteration process. The user can check the operation of the algorithm with the set values of the parameters using a test dataset. Fig. 4 shows the setting mode of the parameters in the EFCM algorithm. We can set other numerical fields for adding other features to the geographical coordinates.In the “Event data Extraction” area the user can geocode an event interactively on the map or can use its address locator for geocoding events. He can insert the address where the event was recorded or can upload a file containing the list of addresses associated with each event. An event can be matched, partially matched, or unmatched. The EFCM Analysis contains functionalities for detecting and analyzing hotspots. By accessing to the area called “Hotspots detection” the user define the interval time to analyze and use the EFCM panel for starting the EFCM process. The values of the parameters initially set in the EFCM panel are the ones set in the “EFCM Parameters” area of the admin section. The final clusters are shown as circular areas on the map and can be saved in a new geographic layer (Fig. 5). By accessing to the area called “Hotspots analysis” the user can analyze characteristics of the hotspots detected as identify on basic information, data on the population and on socio-economic data in the impacted area, geo-processing results as intersections between the hotspots and risk areas, etc. In Fig. 6 we show an example of “Identify” for obtain basic information on the selected hotspot. In the area called “Spatio temporal analysis” the user can analyze the spatio-temporal evolution of the hotspots. This area allows to study the displacement in time of an hotspot, and its spatial reduction or expansion.

the region in which the hotspot A is not intersected by the hotspot B: this region can be considered as a geographical area in which the event subsequently disappears because prematurely detected; the region of intersection of the two hotspots: this region can be considered as a geographical area in which the event persist in the course of time; the region in which the hotspot B is not intersected by the hotspot A: this region can be considered as a geographical area in which the event subsequently propagates because prematurely not detected.

We can study the spatio-temporal evolution of the hotspots analyzing the interactions between the corresponding circular cluster prototypes obtained for consecutive periods, the presence of a cluster in a region in which previously are not been detected clusters and the disappearance of clusters in a specific region. In this research we present a platform for study the spatio-temporal evolution of hotspots in disease analysis. We apply the EFCM algorithm for comparing consecutive years’ event datasets corresponding to oto-laryngo-pharyngeal diseases diagnosis detected in the district of Naples (Italy). Each event corresponds to the residence of the patients who contracted that disease. We study the spatio-temporal evolution of hotspots by analyzing the displacement of the centroids of the hotspots, the increase or reduction of the hotspots areas, the emergence of new hotspots and their impact on the population. In Section 2 we present our platform and the method implemented for studying the spatio-temporal evolution of hotspots in disease analysis. In Section 3 we present the results of the spatio-temporal evolution of hotspots for oto-laryngo-pharyngeal diseases detected in the district of Naples. Our conclusions are in Section 4. II.

THE PLATFORM

As shown in [3,4,5], the cluster prototypes formed by circular areas obtained via the EFCM algorithm applied to bi-dimensional pattern datasets are a good approximation of hotspots in spatial analysis. In this paper each pattern is given by the event corresponding to the residence of the patient to whom a specific disease was detected. The two features of the pattern are the geographic coordinates of the residence. In this paper we present a WEBGIS platform for spatio-temporal hotspot analysis. In Fig 1 we show the home page. By accessing to the admin menu, the administrator can set the road network source and the corresponding toponymic dataset. The first step of our

113

The user can analyze the intersecting zone between two hotspots correspondent to consecutive periods and study the evolution in time of hotspot parameters (for example radius, area, inhabitants, population density, etc.). In Fig. 7 we show an example of analysis: the information corresponding to two intersecting hotspots in two consecutive years 2012 and 2013 and the histogram of the hotspot’s area in real time. With the “EFCM WEBGIS platform” the analyst performs the following steps:

corresponding to surgical interventions to the oto-laryngopharyngeal apparatus in patients residents in the district of Naples between the years 2008÷2012. We divide the dataset per year and analyze various types of diseases. Among the types of the most frequent diseases the following were found: carcinoma, endema of bilateral Reinke, hypertrophy of the inferior turbinate, nasal polyposis, bilateral vocal fold prolapse.

1) he sets the road network and creates an address locator for the geocoding process; 2) then he acquires and geo-codes the event data on the road network; the event data are stored in a point spatial dataset, partitioned and classified based on a specified field (for example, the year of the date on which the event occurred; 3) now the analyst can use the EFCM algorithm applied to each event time dataset for evaluating the hotspots as circular areas on the map; 4) after completing the step 3), the analyst can study the spatio-temporal evolution of the detected hotspots; he analyzes intersecting areas between hotspots related to two consecutive event subsets and study the trend in time of characteristics of the detected hotspots (radius, position, population density, etc.).

We present the results obtained for carcinoma disease since the other above diseases were analyzed in [6,7]. We obtain two hotspots covering the city of Naples and many Vesuvian towns: the cluster showing the greatest expansion in 2012 is the one that covers the city of Naples (cfr. Fig.8). Below we show the results obtained by comparing the hotspots for the years 2011 and 2012. The results in Table 1 show that in 2012 the two hotspots covering the city of Naples and many Vesuvian towns are expanded with respect to the ones detected in 2011. The expansion of the first hotspot covers in 2012 municipalities located at north of the city of Naples, and affects an area with more 1300000 inhabitants, of which about 35% of them are 50 years old.

III.

IV.

RESULTS

CONCLUSIONS

We create a WEBGIS platform including the EFCM method for detecting as circles on the map cluster prototypes that can approximate hotspots in hotspot analysis. This method has a linear computational complexity and is robust to noises and outliers. Our WEBGIS platform offers functionalities for analyzing and studying the spatiotemporal evolution of hotspots. We consider the residence’s information of patients in the district of Naples (Italy) to whom a surgical intervention to the oto-laryngo-pharyngeal apparatus between the years 2008÷2012 was carried out. The WEBGIS platform contains geocoding functions for geo-referencing the event data, then the geo-referenced dataset is partitioned per year and type of oto-laryngopharyngeal disease. We compare the hotspots obtained for each pair of consecutive years and analyze the trend of each hotspot over time measuring some parameters as the area covered by each hotspot, the distance between intersecting cluster centroids concerning two consecutive years, area, number of inhabitants and population density of geographical zones not covered by hotspots in the previous period, but covered in the successive period. For brevity, here we show partial results for years 2011 and 2012.

An important use of our WEBGIS platform consists of the analysis of characteristics in geographical areas impacted by the hotspots. For example, in fire analysis is necessary to study the density of a certain type of vegetation included in the hotspot in which, in a certain period, are focused fire events. In human disease analysis an analysis the density and type of resident population in the area covered by the hotspot is crucial as well. The event datasets, divided by time sequences corresponding to periods of one year, are made up of patterns for different events georeferenced corresponding to ailments encountered in patients for which an intervention and the subsequent histological examination was then carried out. The event refers to the geo-position of the location of the patient. The data have been further divided by type of disease for analyzing the distribution and evolution of each specific disease on the area of study. In order to assess the expansion and the displacement of an hotspot we measure the radius of the hotspot and the distance between the centroids of two intersecting hotspots. By analyzing the intersecting area between hotspots resulting by two consecutive event’s subsets we can estimate the extent of the region on which the disease is still present. In addition, using the sociodemographic information associated with census tracts, we can estimate the change in time of the population, both whole and by age, and density of population living in the areas covered by hotspots. In Section 4 we present the results obtained applying this method for the data

ACKNOWLEDGMENT The first and third author were supported from project FARO financed from Polo delle Scienze e delle Tecnologie of UNINA Federico II, Napoli, Italy.

114

[12] U. Kaymak and M. Setnes, Fuzzy clustering with volume prototype and adaptive cluster merging, IEEE Transactions on Fuzzy Systems, Vol. 10, no. 6, 2002, pp. 705712. [13] R. Krishnapuram and J. Kim,. Clustering algorithms based on volume criteria, IEEE Transactions on Fuzzy Systems, Vol. 8, no. 2, 2002, pp. 228– 236. [14] X. Liu, J. Zhang, W. Cai and Z. Tong, Information diffusion-based spatio-temporal risk analysis of grassland fire disaster in northern China, Knowledge-Based Systems, Vol. 23, 2010, pp. 53-60. [15] P.G. McGuire and D. Williamson, Geographic Mapping Tools for Management and Accountability, Third International Crime Geographic Mapping Research Center Conference, Orlando, 1999. [16] A.T. Murray, I. McGuffog, J.S. Western and P. Mullins, Exploratory spatial data analysis techniques for examining urban crime, British Journal of Criminology, Vol. 41, 2001, pp. 309–329. [17] R.M. Mullner, K. Chung,K.G. Croke and E.K.Mensah, Introduction: Geographic Information Systems in Public Health and Medicine. Journal of Medical Systems, Vol. 28 no. 3, 2004, pp. 215-221. [18] K. Polat, Application of attribute weighting method based on clustering centers to discrimination of linearly nonseparable medical datasets. Journal of Medical Systems, Vol. 36, no. 4, 2012, pp. 2657-2673. [19] C.K. Wei, S. Su and M.C. Yang MC (2012) Application of data mining on the develoment of a disease distribution map of screened comminity residents of Taipei County in Taiwan. Journal of Medical Systems, Vol. 36, no. 3, 2012, pp. 20212027.

REFERENCES [1]

J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, New York: Plenum Press, 1981. [2] S.P. Chainey, S. Reid and N. Stuart, “When is a hotspot a hotspot? A procedure for creating statistically robust hotspot geographic maps of crime. In D. Kidner, G. Higgs and S. White, Eds., Innovations in GIS 9: Socioeconomic Applications of Geographic Information Science. London: Taylor and Francis, 2002. [3] F. Di Martino, V. Loia and S. Sessa, Extended fuzzy c-means clustering algorithm for hotspot events in spatial analysis. International Journal of Hybrid Intelligent Systems, Vol. 4, 2007, pp. 1-14. [4] F. Di Martino and S. Sessa, Implementation of the extended fuzzy C-means algorithm in geographic information systems. Journal of Uncertain Systems, Vol. 3, no.4, 2009, pp. 298-306. [5] F. Di Martino and S. Sessa, The extended fuzzy C-means algorithm for hotspots in spatio- temporal GIS. Expert Systems with Applications, Vol. 38, 2011, pp. 11829–11836. doi:10.1016/j.eswa.2011.03.071 [6] F. Di Martino, R. Mele, M.R. Barillari, U.E.S. Barillari, I. Perfilieva and S. Senatore, Spatio-temporal hotspots analysis for exploring the evolution of diseases: an application on oto- laryngo-pharyngeal diseases, Advances in Fuzzy Systems, Vol. 2013, 2013, 7 pages. doi:10.1155/2013/385974 [7] F. Di Martino, S. Sessa, M.R. Barillari and U.E.S. Barillari, Spatio-temporal hotspot analysis and application on a disease analysis case via Geographic Information Systems, Soft Computing, to appear, doi: 10.1007/s00500-013-1211-7 [8] I. Gath and A.B. Geva, Unsupervised optimal fuzzy clustering. IEEE Trans Pattern Analysis, Machine Intelligence, Vol. 11, 1989, pp. 773–781. [9] T. H. Grubesic and A.T. Murray,. Detecting hotspots using cluster analysis and GIS, Annual Conference of CMRC, Dallas, 2001. (www.ojp.usdoj.gov/cmrc). [10] K, Harries, Geographic Mapping Crime: Principle and Practice, Washington DC: National Institute of Justice, 1999. [11] U. Kaymak, R. Babuska, M. Setnes, H. B.Verbruggen and H.M. Van Nauta Lemke, Methods for simplification of fuzzy models. In D. Ruan, Ed., Intelligent Hybrid Systems Dordrecht: Kluwer Academic Publishers, 1997, pp. 91-108.

115

Figure 1. Home page of our platform

Figure 2. Example of geo-coding process of an event

Figure 3. Example of Address Locator

116

Figure 4. Setting and testing mode of the EFCM parameters

Figure 5. Hotspots creation

Figure 6. Example of “Identify” on the selected hotspot

117

Figure 7. Example of spatio-temporal hotspot analysis

Figure 8. Carcinoma disease: display of the two hotspots in 2011 and 2012. TABLE 1. CARCINOMA DISEASE – RESULTS OF THE HOTSPOTS IN 2011 AND 2012 Hotspot

Radius (km)

Area (km2)

Inhabitants

1(2011) 2(2011) 1(2012) 2(2012)

4.981 3.562 7.103 4.951

77.944 39.860 154.378 52.355

670801 325415 1326428 672876

118

Suggest Documents