Procedia Computer Science 101, 2016, Pages 217–226
YSC 2016. 5th International Young Scientist Conference on Computational Science
Geospatial Data Generation and Preprocessing Tools for Urban Computing System Development∗
Alexey Golubev, Ilya Chechetkin, Danila Parygin, Alexander Sokolov, and Maxim Shcherbakov
Volgograd State Technical University, Volgograd, Russia
{ax.golubev, illaech, dparygin, sashkacosmonaut}@gmail.com
Abstract
Almost everyone who owns a smartphone with internet access uses geographic information systems (GIS) in daily life. Although mobile phones with geo applications installed are widespread, the quality and feature sets of these applications are still developing step by step. Data-driven technologies, including machine learning and artificial intelligence techniques, suggest ways to establish a new generation of GIS as a personal assistant in the urban environment. Despite the existence of various machine learning and artificial intelligence approaches and their implementations in libraries and components, the lack of initial geospatial data (such as origin/destination points) hinders progress. This situation changes if data, even synthetically generated data, is provided for implementing intelligent algorithms. The paper describes a proposed set of open source tools for generating and preprocessing geospatial data: “scatter” for generating geospatial data, “clustering” for reducing the number of geo points based on their density, “0-network” for generating an initial public transport network, and “routing” for modifying the public transport network based on quality criteria. Use cases show how those responsible for urban development can apply the tools to the problem of public transport network design.
Keywords: urban development, urban computing, geospatial data, public transportation network design, route planning, GIS
1 Introduction
The use of geoservices keeps expanding: almost every resident of a big city carries a powerful computer (a smartphone) in their pocket and uses its functions for convenient living in the urban environment. Studies by Cisco show growth in mobile traffic relative to other devices, and a growing focus on mobile applications as well [1]. Mobile applications with geodata processing help users find necessary locations, configure the shortest path, explore new places, and share their experience connected with locations. Moreover, recommender systems based on machine learning algorithms can be improved significantly if they use geospatial data. The main goal here is to provide the right data to the right people in the right places. To achieve this objective, we need to perform the following steps: (i) collect data about the residents' current locations, (ii) understand the residents' needs, and (iii) find service suppliers and deliver the appropriate information to the user in uncontentious ways.

Data-driven technologies, including machine learning algorithms and artificial intelligence techniques, make it possible to design a new generation of GIS as a personal assistant in the urban environment. A data-driven application requires a data set for training (or fitting) its embedded models. Despite the existence of various data-driven technologies and their implementations in libraries and components, the lack of available geospatial data sources (such as origin/destination points) hinders progress. Collecting data from different sources is a crucial feature of a data-driven GIS application. The gathered data must be fresh, real, and available to the GIS application just in time. The applications must also process data with reasonable latency [20].

For the new generation of GIS, which we call Urban Computing Systems (UCS), the ideal situation is one where data is provided by its owners (the residents) through the easiest and most implicit channel of communication, e.g. a mobile phone. However, this may raise privacy issues, and many people might consider such data collection harmful. Another data source is the data warehouses of mobile service providers. They keep anonymous data about the presence of a mobile phone in a certain cell. The accuracy of these data is nevertheless good enough for understanding urban processes and for designing services based on geospatial data. The most important remark here is that mobile applications (or web services acting as urban computing systems) for urban development do not require any personal data. For instance, the problem of modifying an existing public transport network requires passenger traffic information without any additional information about who these passengers are¹.

The problem is how to create a set of tools for generating geospatial data and preprocessing it for further use in different domains. The contribution of the paper is a set of tools that allow simulating urban development processes and provide different algorithms for preprocessing geospatial data. Despite their seeming simplicity, the algorithms are built on novel approaches. These tools are: “scatter” for generating geospatial data, “clustering” for reducing the number of geo points based on their density, “0-network” for generating an initial public transport network, and “routing” for modifying the public transport network based on quality criteria. We consider these tools for urban development management in the framework of a concrete problem: developing and improving a public transport network for a certain type of vehicle, such as buses or trolleys. We assume that a network consists of routes and each route consists of stops. Evaluating the quality of a public transport network is out of the scope of this paper.

∗ The reported study was partially supported by RFBR research projects 16-37-60066 mol a dk, 15-4702613 r povolje a, 16-37-50017 mol nr and project MD-6964.2016.9
Peer-review under responsibility of organizing committee of the scientific committee of the 5th International Young Scientist Conference on Computational Science © 2016 The Authors. Published by Elsevier B.V. doi:10.1016/j.procs.2016.11.026
2 Background
Spatial data processing is becoming an increasingly popular area of research, for several reasons. First, the volume of information collected from different sources is growing [7]. If the data include a location tag, then the source providing them is considered a source of geospatial data. Second, there are new processing tools and ETL methods [11], as well as new interdisciplinary directions for the knowledge and insights retrieved from geodata. Third, the technological entry threshold for specialists with domain-related knowledge is low. For some sectors, interpreting a situation spatially brings a critical qualitative shift in how the object of research is perceived. Fourth, it is an important tool for communicating knowledge, conclusions, and proposals to transition teams, various stakeholders, and residents of the territories in order to promote development projects [16].

Methods of presenting spatial information are being developed for comprehensive assessment in different situations. These methods require an available geospatial data source, processing technologies suited to the research problem, and well-thought-out solutions for data visualization. Together, these components support qualitative analysis of the urban environment, including geographically distributed objects, the transportation system, communication networks, and infrastructure. Such a combination might be called a decision support system for urban development.

The primary objective of such a study is to determine the method that provides the most complete coverage of the observed data. A significant amount of data is now stored in open sources, but these data are tied to the coordinates of a particular location. Researchers therefore use tools to generate synthetic data. Synthetic geodata are characterized by their options for spatial arrangement, the proportion of the territory that is filled, the presence of empty polygons, and complex territorial shapes [10]. A GIS with a function for arbitrary generation of spatial data makes it possible to compare several methods of forming clusters with curved planar shapes, or to identify anomalous shapes for different cases and processing methods [14].

¹ Gathering, 2015, http://79.170.167.101:8000/

Another side of the problem is the production of large amounts of data for extended territories. Generation is also used to restore missing sets of geodata, for example the occupancy of a territory, from indirect indicators in order to obtain a complete picture for future studies [3].

An important processing step is grouping geodata by homogeneity, which gives a basis for placing them in the same group, set, category, class, or cluster. Clustering spatial data primarily involves appropriate criteria and methods for estimating the degree of closeness. Closeness can be estimated by distance with k-means and CLARANS², or by the distribution or density gradation of geodata with CLIQUE, ENCLUS, Grid-Clustering, and OptiGrid [5]; hierarchical spatial clustering and subspace clustering can also be applied [9].

Modern approaches to data processing imply the possibility of remote processing and presentation. This raises technological issues of working with different data storage formats, such as Shapefiles, KML, GPX, GeoJSON, and CSV, and of choosing the processing means, server-side or client-side, that are most convenient in terms of speed and load [17]. Such solutions for cluster analysis of varying complexity are now a compulsory part of popular mapping services [15]³ and of specialized network resources for working with geodata⁴.

Clustering geodata becomes the basis for further specialized studies. The spatial specificity of geodata determines their effectiveness for analyzing complex, geographically distributed systems. One such system in a city is urban transport. The routing task itself is quite multifaceted [8]. For an urban public transport system, it acquires specific restrictions and complex settings. Route networks are built with a number of classic (two-phase, improving, constructive) and metaheuristic (list of exceptions, simulated annealing, ant colony, genetic algorithms, neural networks) methods [13, 6]. A number of package solutions have become widespread, among them CityLab, ArcGIS, TripSpark, PTV Group, and INRO [21]. These systems implement an integrated range of transport tasks, in particular the planning of a city's route network. In general, these are professional tools that require significant training. Such solutions include different methods for configuring the route network, such as genetic algorithms [2]. There are also simpler tools for building routes manually with automatic calculation of a set of characteristics⁵; they are popular because they are accessible to a wide range of non-specialists.

² DataSet Cluster Analysis, 2016, http://www.ncbi.nlm.nih.gov/geo/info/cluster.html
³ Yandex.Maps: API Maps, 2016, https://tech.yandex.ru/maps/
⁴ Predict through location, 2016, https://carto.com/
3 Tools Description
This section describes the developed tools. We use the same structure for each tool. First, the “description” paragraph states the objective of the tool and why it needs to be applied. The next two paragraphs, “inputs” and “outputs”, cover the requirements for input and output data. The “features” paragraph explains the set of features from the point of view of the user's use cases. Finally, we provide links to the repository containing the developed tool.
3.1 The “Scatter”
The “Scatter” is a tool that allows users to manually generate arbitrary geospatial data. It is a web application developed with JavaScript and the Leaflet library⁶. The main window of the application consists of a map and a tools menu. The tool offers several instruments for fast and easy data generation, such as “spray”, “polygon”, and “bounding box”. It was developed to generate modeling data on people's transport preferences, so it supports creating different kinds of geospatial points. “Scatter” was designed to ease the generation of data sets with a moderate number of points (about 5,000 to 10,000).

“Scatter” does not require input data: the user creates points by clicking on the map. The output of “Scatter” is a JSON-based text file composed of the generated data.

The features of “Scatter” include functions for generating different types of data, saving files, and loading files. It supports five instruments: “single”, “spray”, “polygon”, “bounding box”, and “delete”. The “single” instrument places geospatial data points one by one with a single mouse click on the map. The “spray” instrument places a set of data points in the area around the user's click. The radius of the area and the number of points in the set are defined by the user; by default, the number of points is 20 and the radius is 100 m. The “polygon” instrument randomly generates a set of data points inside an arbitrary polygon. The bounds of the polygon and the number of points in the set are defined by the user. The “bounding box” instrument randomly generates a set of data points within the bounding box of the map currently on the screen. The number of points in the set is defined by the user. The “delete” instrument deletes individual points with a single mouse click on a point. An “undo” function reverts the last changes.

If the user wants to distinguish some points, they can assign them a different type; “Scatter” provides a special instrument for creating points of different types. At any time during generation, the user can save the work to a text file. If the generation procedure is not completed, the user can load the file later and continue the work.

⁵ A Planning Platform for Public Transit, 2016, https://www.getremix.com/
⁶ Vladimir Agafonkin, Leaflet, 2015, http://leafletjs.com/
The tool is available at http://vstu-cad-stuff.github.io/scatter/. The source code is in the GitHub repository at https://github.com/vstu-cad-stuff/scatter.
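The “spray” instrument can be illustrated with a short sketch. “Scatter” itself is a JavaScript/Leaflet application; the Python function below is only an assumption about how points might be drawn uniformly inside a circle around a click, using the default values from the text (20 points, radius 100 m). The name `spray`, the flat-Earth conversion from metres to degrees, and the sample coordinates are illustrative and are not taken from the tool's source.

```python
import math
import random

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def spray(lat, lon, count=20, radius_m=100.0, rng=None):
    """Return `count` random (lat, lon) points within `radius_m` of a click.

    Hypothetical helper: points are drawn uniformly over the disk and
    converted from metre offsets to degrees with a local flat-Earth
    approximation, which is adequate for radii of a few hundred metres.
    """
    rng = rng or random.Random()
    points = []
    for _ in range(count):
        # sqrt() keeps the density uniform over the disk area;
        # a plain uniform radius would crowd points near the centre.
        r = radius_m * math.sqrt(rng.random())
        theta = rng.uniform(0.0, 2.0 * math.pi)
        d_north = r * math.cos(theta)
        d_east = r * math.sin(theta)
        d_lat = math.degrees(d_north / EARTH_RADIUS_M)
        d_lon = math.degrees(d_east / (EARTH_RADIUS_M * math.cos(math.radians(lat))))
        points.append((lat + d_lat, lon + d_lon))
    return points


if __name__ == "__main__":
    # Illustrative click location; any coordinates away from the poles work.
    for point in spray(48.7080, 44.5133, rng=random.Random(1)):
        print(point)
```

Passing a seeded `random.Random` makes a generated data set reproducible, which may be convenient when the same synthetic points have to be regenerated for several experiments.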
3.2 The “Clustering”
The “Clustering” is a tool that reduces the number of geospatial data points based on their density. It is a Python script that implements k-means clustering on geospatial data. The main difference from the existing implementation provided by sklearn⁷ is that the distance in our implementation is calculated using the Open Source Routing Machine⁸. This approach avoids the main drawback of applying the k-means algorithm to geospatial data. Originally, k-means uses the Euclidean distance between two points. This can place in the same cluster points that are close geographically but far apart because of urban obstacles such as rivers, roads, and railway tracks. It was therefore important to adapt the existing clustering technique to the specifics of urban terrain.

The input of the “Clustering” tool is a JSON-formatted text file with data points. In the pipeline of the considered task (analysis and modification of a public transport network), “Scatter” provides the data for “Clustering”. The output of the “Clustering” tool is a JSON-formatted text file composed of the calculated cluster centers.

The main objective of the clustering procedure is to group data objects into non-overlapping sets, called clusters, in such a way that each cluster consists of similar objects and objects in different clusters differ significantly. The structure of clusters in the initial data set is unknown. To recognize which objects are similar and which are not, we define a similarity function on the data set. In the “Clustering” tool we provide the k-means clustering algorithm and the “route” distance function. A special distance function is necessary for clustering geospatial data because two geospatial objects that are close according to a default distance function may be far apart in reality: the way between them may be obstructed by natural and artificial obstacles such as rivers, railways, ravines, and huge buildings.
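The idea of swapping the distance function inside k-means can be sketched as follows. This is a minimal illustration, not the tool's code: the names `kmeans` and `dist`, the update of centres as coordinate means, and the fixed iteration cap are assumptions, and the plain Euclidean function used as the default is only a stand-in for an urban terrain distance.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def kmeans(points, centers, dist=euclidean, max_iter=100):
    """Cluster `points` around `centers` using a pluggable distance.

    Assumption of this sketch: only the assignment step uses `dist`;
    centres are still updated as coordinate means, as in ordinary k-means.
    Returns (centers, labels).
    """
    centers = list(centers)
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centre under the chosen metric.
        new_labels = [
            min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centres are too
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centre
                centers[k] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centers, labels
```

Any callable that takes two points and returns a number can be passed as `dist`, for example a function that asks a routing engine for the road distance between the two locations.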
Figure 1 shows examples of object pairs separated by such obstacles, together with the real (urban terrain) distances that must be used in calculations.

Figure 1: A pair of objects close by the “euclidean” metric, but distant by “urban terrain”.

“Clustering” offers three metrics for calculating distances between pairs of objects: “euclidean”, “surface”, and “route”. The “surface” metric uses the solution of the inverse geodetic problem [12] as the distance between a pair of objects. The “route” metric finds the distance between a pair of objects on the city's road network. The GeographicLib⁹ library is used to solve the inverse geodetic problem in the “surface” metric. The Open Source Routing Machine (OSRM) engine is used to find the distance between objects in the “route” metric. OSRM provides the open source code of its routing engine. When the engine runs on a local machine, the user can work with it through HTTP requests, called the HTTP-API. The responses are JSON objects with all the necessary information. The “route” metric uses two types of requests: “route” (“viaroute” in HTTP-API v4) to get the distance between objects, and “nearest” (“locate” in HTTP-API v4) to find the nearest street segment for a given object. The response to a “route” request contains a list of routes with their distances and waypoints; the response to a “nearest” request contains a list of points on the street network with the distance to each of them from the given object.

⁷ Machine Learning in Python, 2016, http://scikit-learn.org
⁸ Open Source Routing Machine, 2015, http://project-osrm.org/
⁹ Geographic library, 2015, http://geographiclib.sourceforge.net/

The tool is available at http://vstu-cad-stuff.github.io/clustering/. The source code is in the GitHub repository at https://github.com/vstu-cad-stuff/clustering.
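To make the difference between the metrics concrete, the sketch below computes a “surface”-style distance and prepares a “route” request. Both parts are assumptions for illustration: the haversine formula on a sphere stands in for the ellipsoidal inverse geodetic solution that the tool obtains from GeographicLib, and the URL follows the layout of the OSRM v5 `route` service for a server assumed to run locally on `localhost:5000`. All function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical stand-in)


def surface_distance(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lam = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def osrm_route_url(lat1, lon1, lat2, lon2, host="http://localhost:5000"):
    """Build an OSRM v5 `route` request URL; OSRM expects lon,lat order."""
    coords = f"{lon1},{lat1};{lon2},{lat2}"
    return f"{host}/route/v1/driving/{coords}?overview=false"


def route_distance(response):
    """Extract the length in metres of the first route from a parsed reply."""
    if response.get("code") != "Ok" or not response.get("routes"):
        raise ValueError("routing engine returned no route")
    return response["routes"][0]["distance"]
```

With a running engine, the URL could be fetched with `urllib.request.urlopen` and the body decoded with `json.load` before it is handed to `route_distance`. The route distance is never shorter than the surface distance between the same two points, which is exactly the gap that Figure 1 illustrates.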
3.3 The “0-Network”
The “0-Network” is a tool that forms the initial route network by the principle of minimum increase in route length when a new transport stop node is included: the network is built by consecutively adding new nodes to existing routes so that the length of the designed network grows as little as possible. It is a Python script that implements several algorithms: building a convex hull with the Graham algorithm, finding terminal and non-terminal nodes, and building the route network by the principle of minimum increase in route length.

The input of “0-Network” is the number of routes in the designed public transportation network (nr), a set of terminal nodes (Ct), and a set of non-terminal nodes (Cnt). The output of “0-Network” is the constructed route network (a list of lists of nodes, named RN) or a file in the GeoJSON format.

The main feature of the tool is its algorithm, presented in Algorithm 1 as a sequence of steps.

Algorithm 1: The algorithm for building the initial route network
Data: nr, Ct, Cnt
Result: RN
1 Create nr direct routes containing pairs of opposite nodes from Ct and add these routes into the network Ri;
2 foreach i-th route from the network Ri do
3   Select the i-th route from Ri and split it in half; add each new route into the network PN;
4   Find the node from Ct that minimally increases the length of the split routes from PN, and add the new route with the added node into the network RC;
5   Compose ||RC|| variants of new routes, one route taken from the network RC and another from the network PN(Ri), and compose a list RCC of candidate routes for replacing routes in the network;
6   Evaluate the lengths of the routes from the list RCC and choose the route Ri′ with the minimal length;
7   Replace Ri with Ri′ in the network RN;
8   Delete the node cj from Cnt, as it was added into Ri;
9 end
10 Check whether Ct is not empty; if so, go to step 2, otherwise terminate;

The results are available at https://vstu-cad-stuff.github.io/routing/geojson.
The source code is on GitHub at https://github.com/vstu-cad-stuff/routing/tree/0-Network. A detailed description of the algorithm can be found in [19].
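The principle of minimum increase in route length can be sketched as a cheapest-insertion step. This is an illustration under assumptions, not the authors' implementation: it uses planar Euclidean distances instead of road distances, inserts the remaining nodes one at a time between the fixed end points of each route, and omits the route splitting of Algorithm 1. The function names are hypothetical.

```python
import math


def dist(a, b):
    """Planar distance between two (x, y) nodes."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def route_length(route):
    """Sum of segment lengths along a route (list of (x, y) nodes)."""
    return sum(dist(a, b) for a, b in zip(route, route[1:]))


def cheapest_insertion(route, node):
    """Return (increase, position) of the best place to insert `node`.

    The node is only inserted between existing nodes, so the terminal
    nodes at both ends of the route stay in place.
    """
    best = None
    for pos in range(1, len(route)):
        a, b = route[pos - 1], route[pos]
        increase = dist(a, node) + dist(node, b) - dist(a, b)
        if best is None or increase < best[0]:
            best = (increase, pos)
    return best


def build_network(routes, free_nodes):
    """Greedily add every free node to the route where it costs least."""
    routes = [list(r) for r in routes]
    remaining = list(free_nodes)
    while remaining:
        best = None
        for node in remaining:
            for idx, route in enumerate(routes):
                increase, pos = cheapest_insertion(route, node)
                if best is None or increase < best[0]:
                    best = (increase, idx, pos, node)
        _, idx, pos, node = best
        routes[idx].insert(pos, node)
        remaining.remove(node)
    return routes
```

Each route passed to `build_network` is expected to contain at least its two terminal nodes. The greedy choice keeps every single step cheap to evaluate, but it does not guarantee the shortest possible network, which is one reason a later improvement stage is useful.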
3.4 The “Routing”
The “Routing” is a tool that modifies the initial version of the public transport route network through iterative transformation of the original routes. The method uses a modified evolutionary algorithm with mutation and crossover operations to optimize the route network according to a chosen criterion (route length or average length over the route network, passenger traffic, etc.). It is a Python script that implements a genetic algorithm (crossover and mutation operations) for the route network.

The input of “Routing” is a route network list (a list of lists of nodes, RN) generated by the “0-Network” algorithm. The output of “Routing” is the modified route network list (RN′) after the algorithm has run.

Features. The transport network is modified with a genetic algorithm that takes into account an assessment of the network quality for the selected criterion. This task is closely connected with the problem of routing from point A to point B, but differs from it in two ways. First, in a routing problem the objective function is the trip time from the beginning to the end of the route, which should be minimized. When constructing a public transport route network, the objective function is an integral one that takes into account the average walking time to a bus stop, path length, number of transfers, etc. [4, 18]. Second, in a typical routing problem any intermediate point that shortens the route can be selected, whereas in this task intermediate points are placed only at the cluster centers. A general description of the algorithm is presented in Algorithm 2. Details are covered in [21].
Algorithm 2: The algorithm for modifying the route network
Data: RN
Result: RN′
1 Put the route list RN into R and RN′;
2 while the stop condition for RN′ is not met do
3   For each element in the list R, apply the crossover and mutation operations;
4   Rate the quality of R by the chosen criterion;
5   if the new population R is better than RN′ then
6     Replace RN′ with R;
7   end
8 end

The result of the algorithm is available at https://vstu-cad-stuff.github.io/routing/network/. The source code is in the GitHub repository at https://github.com/vstu-cad-stuff/routing/tree/GA-Route.
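The loop of Algorithm 2 can be sketched in a few lines. The operators here are assumptions chosen to keep the example short, not the operators of the “Routing” tool: mutation swaps two intermediate stops within a route, crossover exchanges one intermediate stop between two routes, the criterion is the total Euclidean length, candidates are always derived from the best network found so far, and the stop condition is a fixed number of generations.

```python
import math
import random


def total_length(network):
    """Quality criterion of this sketch: summed length of all routes."""
    return sum(
        math.hypot(a[0] - b[0], a[1] - b[1])
        for route in network
        for a, b in zip(route, route[1:])
    )


def mutate(route, rng):
    """Swap two intermediate stops; terminal stops stay fixed."""
    route = list(route)
    if len(route) >= 4:
        i, j = rng.sample(range(1, len(route) - 1), 2)
        route[i], route[j] = route[j], route[i]
    return route


def crossover(r1, r2, rng):
    """Exchange one intermediate stop between two routes."""
    r1, r2 = list(r1), list(r2)
    if len(r1) >= 3 and len(r2) >= 3:
        i = rng.randrange(1, len(r1) - 1)
        j = rng.randrange(1, len(r2) - 1)
        r1[i], r2[j] = r2[j], r1[i]
    return r1, r2


def improve(network, generations=200, rate=0.3, seed=0):
    """Keep a candidate network only when it beats the best one so far."""
    rng = random.Random(seed)
    best = [list(r) for r in network]
    for _ in range(generations):
        # Mutate each route with probability `rate`.
        candidate = [
            mutate(r, rng) if rng.random() < rate else list(r) for r in best
        ]
        # Occasionally exchange a stop between two different routes.
        if len(candidate) >= 2 and rng.random() < rate:
            a, b = rng.sample(range(len(candidate)), 2)
            candidate[a], candidate[b] = crossover(candidate[a], candidate[b], rng)
        if total_length(candidate) < total_length(best):
            best = candidate
    return best
```

Because a candidate replaces the current network only when the criterion improves, the result is never worse than the input. Both operators only move existing stops, so the set of stops served by the network stays the same.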
4 Use cases and discussion
To explain how the proposed tools are applied to an urban development task, we show an example of creating a new public transportation network for a midsize city located near a large regional centre. For a subset of the city's residents, we obtain pairs of origin and destination points expressed as longitude and latitude. The origin might be where a certain resident lives and the destination where the same resident works. Each pair thus shows the most popular path of a certain person in the city. This path contains a part travelled by public transportation and walking zones. The ideal public transport network contains routes for which the travel time of each resident is minimal. We apply the criterion of the total length of the public transportation network.

We apply a straightforward procedure containing the following steps: (i) generating origin-destination pairs for 600 residents using “Scatter”; (ii) clustering all 1,200 points into 35 clusters (the number of clusters was chosen according to the number of stops) using the “clustering” tool; (iii) creating the initial network using the “0-Network” tool; (iv) modifying the initial network using the “routing” tool to reduce the average length of the routes in the network. Figure 2a shows the result of generating origin-destination pairs for 600 residents using “Scatter”. A couple of districts were selected for generation, but this is not a limitation of the tool.
(a) Visualization of the generated data. Blue points represent origin points, black points represent destination points, and red points are the initial cluster centers for the “clustering” tool.
(b) Clustering of all 1,200 points into 35 clusters (the number of clusters was chosen according to the number of stops) using the “clustering” tool.
Figure 2: Generated and clustered data.

The result of data generation is a text file containing the origin-destination points and their coordinates. Using this file, the “clustering” tool provides the cluster centers based on the modified k-means algorithm. Figure 2b visualizes the result of clustering the 1,200 points into 35 clusters. The figures show that the borders of the clusters follow main roads, which means that the distance calculation based on urban terrain works properly.

The next step is creating the initial network using the “0-Network” tool. With the number of routes set by the user to nr = 6, the initial public transportation network has a total length of 40,579 meters. Figure 3a presents the route network obtained with “0-Network” for nr = 12, with a total length of 66,577 meters. Once the initial network is obtained, the next step reduces the overall length of the network with the genetic algorithm implemented in the “routing” tool. The total length of the final network with nr = 6 was reduced to 39,821 meters. Figure 3b presents the result for nr = 12, where the total length was reduced to 66,079 meters. These reductions are small, but the aim of the paper is to highlight the features of the tools without any additional settings.

(a) The initial network (nr = 12).
(b) The final network (nr = 12).
Figure 3: Route network with 12 routes.

Finally, we discuss the contribution in terms of advantages and drawbacks. The main advantage of the “Scatter” tool is that it is easy and fast to use. It allows generating data in different locations all over the world. Its main disadvantage is that it works slowly with huge datasets (more than 10,000 points). Also, the tool cannot check what type of surface lies under a point, so points can end up in rivers or seas. Future work aims to speed up the generation of huge datasets. The advantage of the developed “Clustering”, “0-Network”, and “Routing” tools is their simplicity and scalability, thanks to which they can be integrated into a single GIS. Their main drawback is the use of the OSRM engine to get the distance between points: a single distance query takes about 6 ms, so processing huge datasets becomes really slow. Another disadvantage is that the tools were tested only on generated data sets.
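The figures reported in this section can be put into proportion with some quick arithmetic. The relative reductions follow directly from the reported lengths. The OSRM estimate rests on assumptions that are not stated in the paper: that one k-means assignment pass needs one route query per pair of point and cluster center, and that queries run sequentially at the quoted 6 ms each.

```python
def reduction_percent(before_m, after_m):
    """Relative reduction of the total network length, in percent."""
    return 100.0 * (before_m - after_m) / before_m


# Total lengths in metres reported for the two experiments.
print(f"nr = 6:  {reduction_percent(40_579, 39_821):.2f} %")
print(f"nr = 12: {reduction_percent(66_577, 66_079):.2f} %")

# Assumed cost model: every point is compared with every cluster centre.
points, clusters, query_s = 1_200, 35, 0.006
queries_per_pass = points * clusters
seconds_per_pass = queries_per_pass * query_s
print(f"{queries_per_pass} queries, about {seconds_per_pass:.0f} s per pass")
```

Under these assumptions the reductions are about 1.87 % and 0.75 %, and a single pass over the use-case data needs 42,000 queries, or roughly 252 seconds. The query count grows with the product of the number of points and the number of clusters, which is consistent with the remark that huge datasets become slow to process.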
5 Conclusion
The paper presents a set of tools for generating and preprocessing geospatial data in the framework of the task of creating and modifying a public transport network. The main benefit of the tools is the possibility of applying different methods (including machine learning) to geospatial data analysis, which is useful for creating geospatial services in different domains. The tools make it possible to (i) estimate the volume of data that will be received from residents in the future, (ii) apply and evaluate data-driven approaches to urban development tasks, (iii) define the specification requirements for geospatial data warehouses, and (iv) find the drawbacks of algorithms when switching from structured data to geospatial data. Future work includes creating our own simple routing engine to get distances between points, using a larger number of criteria for constructing and modifying the route network, and performance testing on a real dataset.
References
[1] Cisco visual networking index: Global mobile data traffic forecast update. [online], 2016. http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/mobile-white-paper-c11-520862.html.
[2] Execution of works in the field of studies transportation systems, integrated transport planning based on mathematical modeling. [online], 2016. http://optimal-drive.ru/downloads/transportmodelling.pdf.
[3] J. Barthelemy and P. L. Toint. Synthetic population generation without a sample. Transportation Science, pages 1–14, 2012.
[4] A. Ceder. Designing public transport network and routes. Advanced Modeling for Transit Operations and Service Planning, 3:59–91, 2003.
[5] S. K. Dulin, I. N. Rozenberg, and V. I. Umanskiy. Implementation of clustering methods for studying geodata arrays. Sistemy i Sredstva Informatiki [Systems and Means of Informatics], 19(2):86–113, 2009.
[6] W. D. Fan and R. B. Machemehl. Some computational insights on the optimal bus transit route network design problem. In Journal of the Transportation Research Forum, volume 47, 2012.
[7] A. Finogeev, L. Fionova, A. Finogeev, I. Nefedova, E. Finogeev, T. Q. Vinh, and V. Kamaev. Methods and tools for secure sensor data transmission and data mining in energy scada system. In Creativity in Intelligent, Technologies and Data Science, pages 474–487. Springer, 2015.
[8] A. Golubev, I. Chechetkin, K. S. Solnushkin, N. Sadovnikova, D. Parygin, and M. Shcherbakov. Strategway: Web solutions for building public transportation routes using big geodata analysis. In Proceedings of the 17th International Conference on Information Integration and Web-based Applications & Services, iiWAS ’15, pages 91:1–91:4, New York, NY, USA, 2015. ACM.
[9] D. Guo, D. J. Peuquet, and M. Gahegan. Iceage: Interactive clustering and exploration of large and high-dimensional geodata. GeoInformatica, 7(3):229–253, 2003.
[10] K. Hermes and M. Poulsen. A review of current methods to generate synthetic spatial microdata using reweighting and future directions. Computers, Environment and Urban Systems, 36(4):281–290, 2012.
[11] V. Kamaev, A. Finogeev, A. Finogeev, and S. Shevchenko. Knowledge discovery in the scada databases used for the municipal power supply system. In Joint Conference on Knowledge-Based Software Engineering, pages 1–14. Springer, 2014.
[12] C. F. Karney. Algorithms for geodesics. Journal of Geodesy, 87(1):43–55, 2013.
[13] E. Kochegurova and Y. Martynova. Optimization of planning the public transport routes when developing the automated decision support system. Izvestiya Tomskogo politekhnicheskogo universiteta, 323(5):79–84, 2013.
[14] J. A. Maantay and S. McLafferty. Geospatial analysis of environmental health, volume 4. Springer Science & Business Media, 2011.
[15] L. Mahe and C. Broadfoot. Too many markers! google geo apis team. [online], 2010. https://developers.google.com/maps/articles/toomanymarkers?hl=en.
[16] A. Matokhina, N. Sadovnikova, D. Parygin, and E. Gnedkova. Ontology development for intelligent decision support system in the city development management tasks. Izvestia VSTU, 14(178):69–74, 2015.
[17] G. Ortelli. Server-side clustering of geo-points on a map using elasticsearch. [online], 2013. http://blog.trifork.com/2013/08/01/server-side-clustering-of-geo-points-on-a-map-using-elasticsearch/.
[18] N. Sadovnikova, D. Parygin, M. Kalinkina, B. Sanzhapov, and T. N. Ni. Models and methods for the urban transit system research. In Creativity in Intelligent, Technologies and Data Science, pages 488–499. Springer, 2015.
[19] M. Shcherbakov and A. Golubev. An algorithm for initial public transport network design over geospatial data. In IEEE Second International Smart Cities Conference (ISC2 2016) Improving the citizens quality of life, ISC2, pages 274–280. 2016 IEEE International Smart Cities Conference (ISC2), 2016. [In press].
[20] M. Shcherbakov, Y. Timofeev, A. Saprykin, V. Trushin, A. Tyukov, N. Shcherbakova, V. Kamaev, and A. Brebels. An on-line and off-line pipeline-based architecture of the system for gaps and outlier detection in energy data stream. In Engineering of Computer Based Systems (ECBS-EERC), 2013 3rd Eastern European Regional Conference on the, pages 1–7. IEEE, Aug 2013.
[21] M. V. Shcherbakov, N. P. Sadovnikova, D. S. Parygin, A. V. Golubev, and I. A. Chechetkin. Decision support automation for the public transport routes development based on the correspondences population data analysis. Herald of Computer and Information Technology, (8):29–33, 2016.