A BETTER VIEWING OF CROWDSOURCED WEB MAPS BY TECHNIQUES OF CLUSTERING AND ZOOM SELECTIVE
Celso Alberto Saibel Santos PPGI - Federal University of Espírito Santo (UFES) Av. Fernando Ferrari, 514 - Vitória - ES - CEP 29075-910
[email protected]
Wancharle Sebastião Quirino PPGI - Federal University of Espírito Santo (UFES) Av. Fernando Ferrari, 514 - Vitória - ES - CEP 29075-910
[email protected]
ABSTRACT
The term crowdsourcing is related to the practice of obtaining services, ideas or content from voluntary contributions from a large number of people. In crowdsourcing, the process of building a web map from georeferenced annotations tends to generate a huge amount of data, making it difficult to view and understand the map and, ultimately, to transform these notes into useful information for users. This article presents two strategies, selective zoom and clustering, to deal with the excess of markers and to improve the visualization of web maps generated by crowdsourcing. To support the proposal, a framework called Searchlight applies these two strategies, improves access to and understanding of the information in a crowdsourced map, and provides a mechanism that automates the creation and sharing of this kind of map.
KEY WORDS
Web Map, Crowdsourcing, Data Viewing, Zoom, Clustering.
1. INTRODUCTION
The term crowdsourcing is associated with the process of obtaining services, ideas or content from the voluntary contributions of a large number of people (especially from an online community) rather than from traditional employees or suppliers of information from a company. There are several types of crowdsourcing, and this work focuses on the type known as Wisdom of the Crowd, in which large amounts of information are collected and aggregated to obtain an overview of a particular topic. Depending on the topic of study, this overview can be represented by a map annotated through crowdsourcing.
The production of crowdsourced maps is usually automated, based on a process called Volunteered Geographic Information, or VGI [1]. Usually there is some software and/or website that collects and gathers information via an algorithm developed specifically for a given map area. Portoalegre.cc [6], OurMap [3], WikiCrimes [4] and [10] are examples of voluntary proposals for collecting user information to solve problems related to cities, crimes and transport. This information can be collected manually and/or automatically. For example, in the PortoAlegre.cc portal, users add new causes manually through the site. In the proposal of [10], smartphone applications provide the users' displacement data automatically.
Once collected, the information is processed and displayed on a map. However, the map itself should not be seen as the end product of the process; the end product is the specific information extracted from it. For example, given a map showing traffic congestion in a city, a system such as [9] can use that map to identify a more efficient route with lower fuel consumption.
While crowdsourced maps make it possible to extract useful information at a reduced cost, they must handle a large quantity of unstructured, ambiguous information that is often associated with map locations that are physically very close to each other. Depending on the scale used, the large amount of information in a small area ends up polluting the visualization and makes the collaboratively produced map difficult to understand. Furthermore, knowledge discovery requires variations in the level of detail of the information recorded on the map.
To resolve these issues, map environments must provide mechanisms to group and filter the information provided by users. An ideal zoom mechanism is contextual: a group zoom that filters out extraneous information at certain scale levels, making the map lighter (from the processing point of view) and more understandable (from the point of view of information perception). To help solve some of the problems presented, this article presents Searchlight, a framework to support and improve the visualization of Web maps produced by crowdsourcing. A secondary goal is to use the framework to create interfaces for visualizing maps in a Web browser, using features available in HTML5 to enable selective zoom operations, grouping, and other visualization options that make better use of the information in a Web map.
This article is divided into 5 sections. Section 2 presents a more detailed definition of the problem and the scope of the work. Section 3 discusses two simple techniques that can be used to solve the problems presented. Section 4 presents the Searchlight framework, its architecture and its functionalities. Finally, Section 5 presents the conclusions of the work, the difficulties encountered, and suggestions for future work.
2. PROBLEM DEFINITION AND SCOPE OF WORK
Visualizing information on crowdsourced maps generally brings two types of problems: (i) the overlap of information at a certain scale level and (ii) the excess of unnecessary interactions (zoom in) needed to reach an appropriate scale for viewing.
The overlap problem is illustrated in Figure 1(a). The map, obtained from the PortoAlegre.cc environment, shows a group of markers gathered in a specific region. Each marker can be overlapped by dozens or even thousands of other markers depending on the scale (the scale used in the example is the default scale of the system homepage). It is easy to see that the map fails to show, clearly and precisely, the areas with the highest occurrence of certain causes.
The excess of interactions occurs due to the arbitrary use of a rigid zoom scale and, therefore, this problem is called Arbitrary Zoom. Figure 1(b) illustrates it. In this type of problem, the user needs to repeat the 'zoom in' operation several times to go from level 'ZOOM A' to 'ZOOM C'. This implies wasted time and unnecessary interactions with the application server to display intermediate zoom levels. In an ideal scenario, only ZOOM B should be necessary, since there is no significant change in the information displayed. In other words, there is no gain in terms of visualization, since the data will remain grouped in a circle with the number of occurrences (8674) in all interactions before ZOOM C.
Figure 1. Problem of (a) overlapping markers and (b) arbitrary zoom on web maps with lots of information.
The main objective of the Searchlight framework proposed in this article is to facilitate the visualization of information (and hence knowledge discovery) in Web maps generated through crowdsourcing. Searchlight also aims to provide resources that facilitate the dissemination of such information. Therefore, a secondary goal is to support the generation of maps from data sheets, which can be shared without requiring the author to program or to have any knowledge of web programming. In this scope, the main contributions are (i) the Searchlight framework with its features and (ii) the project website, which provides the framework documentation and an application that generates and shares maps from spatial data stored in a spreadsheet (Google Docs) or in a file in JSON format.
3. STRATEGIES FOR DEALING WITH MANY MARKERS
Creating the Searchlight framework required a study of strategies for dealing with the problem of displaying many markers on a map. In [8], the author presents two basic strategies for this problem. The first, and most obvious, is to reduce the number of markers. The second is to group markers by some degree of similarity.
3.1 Reducing the number of markers
There are many ways to reduce the number of markers to facilitate the visual inspection of web maps. Among them, we highlight filtering and visual optimization.
Filtering is one of the best-known and most efficient ways to reduce the number of markers displayed on a map, because only the markers that meet the filter criteria are displayed. The mechanism can also be applied to groups of markers, by applying the selection criteria to the group members. This method also allows the user to be more specific during visual inspection, because each filter criterion is directly related to a category contained in the information. The approach used in searching and filtering is to reduce the number of markers that appear on the map, taking as input the most complete list of markers and displaying only those that satisfy some criterion.
However, it is not always necessary to use markers to represent data on a map. In some cases, it makes more sense to visually optimize the information displayed, replacing the markers with polygons or groups of polygons. For example, a route does not need a marker at each of its vertices, since a single line through the vertices is enough to represent it. Figure 2 illustrates an example of this optimization. On the left, the map shows a route with markers at each vertex of the path traversed. On the right, the same route is displayed without the markers at the vertices, leaving only one marker to indicate the starting point of the path. In this case, each vertex of the route is a stop for the buses that run along the path.
Figure 2. Example of visual optimization to reduce the number of markers displayed on a map.
In some cases, the markers at these vertices may need to be displayed as a mechanism for interacting with the data they represent. For example, public transport users might want to know the times at which buses pass a specific bus stop, and a single click on the marker could display that information. This type of interaction would require markers on each vertex of the route, and the view on the left of Figure 2 would not be appropriate. Fortunately, there are alternatives that do not require a marker at each point of the path. For example, if the user interacts with some part of the path, in this case represented by a single line, the map interface could perform a proximity-based search and locate the bus stop requested by the user.
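The proximity-based search mentioned above can be sketched as follows. This is only an illustration and not Searchlight's implementation: the function names `haversineMeters` and `nearestStop`, the sample bus stops and the 100 m tolerance are all assumptions made for the example.

```javascript
// Illustrative sketch only: names, sample data and the tolerance are assumptions.

// Great-circle distance in meters between two {lat, lng} points (haversine formula).
function haversineMeters(a, b) {
  const R = 6371000; // mean Earth radius in meters
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLng = toRad(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLng / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(h));
}

// Returns the stop closest to the clicked point, or null if none lies within maxMeters.
function nearestStop(stops, click, maxMeters) {
  let best = null;
  let bestDistance = Infinity;
  for (const stop of stops) {
    const d = haversineMeters(stop, click);
    if (d < bestDistance) {
      best = stop;
      bestDistance = d;
    }
  }
  return bestDistance <= maxMeters ? best : null;
}

// Hypothetical bus stops along a route that is drawn as a single line.
const busStops = [
  { name: "Stop A", lat: -20.27, lng: -40.3 },
  { name: "Stop B", lat: -20.28, lng: -40.3 },
  { name: "Stop C", lat: -20.29, lng: -40.3 },
];

// A click on the line roughly 11 m north of Stop B.
const clicked = { lat: -20.2799, lng: -40.3 };
const found = nearestStop(busStops, clicked, 100);
console.log(found ? found.name : "no stop nearby");
```

With this approach the route can stay a single line, and a stop is resolved only when the user clicks close enough to one.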
3.2 Clustering
There are several methods for grouping markers, but the most common ones use one of the following criteria: grouping by grid, by distance, or by region.
Grid-based clustering is probably the most common approach to grouping markers. In this approach, the map is divided into a grid and the markers within each square are placed in the same group. Despite being an effective technique, it has limitations that can lead to unwanted results. In Figure 3, two markers that are physically close on the map but located in different squares of the grid are not placed in the same group, as would be expected.
Figure 3. The two markers are placed in different groups since they belong to different squares of the grid.
Source: (Svennerberg, 2010, fig. 9.6)
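A minimal sketch of grid-based grouping, and of the limitation shown in Figure 3, could look like this. The function name `gridCluster`, the cell size in degrees, the `row:col` key format and the sample markers are assumptions for illustration only.

```javascript
// Illustrative sketch only: cell size, key format and sample markers are assumptions.

// Groups markers by the grid square that contains them.
// cellSize is the side of each square, in degrees.
function gridCluster(markers, cellSize) {
  const cells = new Map();
  for (const m of markers) {
    const row = Math.floor(m.lat / cellSize);
    const col = Math.floor(m.lng / cellSize);
    const key = row + ":" + col;
    if (!cells.has(key)) cells.set(key, []);
    cells.get(key).push(m);
  }
  return cells;
}

const gridMarkers = [
  { id: 1, lat: 0.0999, lng: 0.05 }, // just below a grid line
  { id: 2, lat: 0.1001, lng: 0.05 }, // just above it, very close to marker 1
  { id: 3, lat: 0.01, lng: 0.02 }, // farther from marker 1, but in the same square
];

const cells = gridCluster(gridMarkers, 0.1);
for (const [key, members] of cells) {
  console.log(key, members.map((m) => m.id));
}
```

Markers 1 and 2 are almost at the same position, yet they fall into different squares, while marker 3 is grouped with marker 1.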
The distance grouping technique solves the above problem by placing neighboring markers that are close enough (below a threshold) in the same group. The drawback of this technique is that the resulting groups do not have a fixed location, as they do in grid grouping. Thus, groups formed by distance may appear at seemingly random positions and may not make sense in certain contexts in which the map is used.
In grouping by region, different divisions into geographic regions (such as countries, states, cities) are used to delimit the set of markers that represent a region. Moreover, it is also possible to set the scale (zoom) level at which groups are broken into subgroups. The advantage here is the creation of groups that make more sense to the user. Groups that follow the order Country > State > City are more natural for the user than groups that consider only proximity when handling annotations related to georeferenced data, as in the case of OurMap and PortoAlegre.cc. The disadvantage of this technique is the effort needed to implement it [8], since the definition of the groups cannot be easily automated. This difficulty stems from the way the hierarchy is defined, which can change according to the political-administrative organization of each country. To solve this problem, [3] propose the use of an ontology covering the concepts related to the places where data in the form of digital notes can be included in a crowdsourced map. According to the authors, the ontology makes it possible to define hierarchical relationships between places independently of the political-administrative structure adopted in the country.
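For comparison with grid grouping, the distance-based technique described in this section can be sketched as below. The greedy strategy, the planar distance measured in degrees, the threshold value and the name `distanceCluster` are assumptions chosen to keep the example short; a real map would more likely measure distance in screen pixels at the current zoom level.

```javascript
// Illustrative sketch only: the greedy strategy, planar distance and threshold are assumptions.

// Greedy distance-based clustering: each marker joins the first group whose
// center is closer than `threshold` (in degrees); otherwise it starts a new group.
function distanceCluster(markers, threshold) {
  const groups = [];
  for (const m of markers) {
    const group = groups.find(
      (g) => Math.hypot(g.center.lat - m.lat, g.center.lng - m.lng) < threshold
    );
    if (group) {
      group.members.push(m);
      const n = group.members.length;
      // Incremental mean: the center moves as members are added,
      // which is why these groups have no fixed location.
      group.center.lat += (m.lat - group.center.lat) / n;
      group.center.lng += (m.lng - group.center.lng) / n;
    } else {
      groups.push({ center: { lat: m.lat, lng: m.lng }, members: [m] });
    }
  }
  return groups;
}

const nearbyMarkers = [
  { id: 1, lat: 0.0999, lng: 0.05 },
  { id: 2, lat: 0.1001, lng: 0.05 }, // very close to marker 1
  { id: 3, lat: 0.5, lng: 0.5 }, // far from both
];

const groups = distanceCluster(nearbyMarkers, 0.01);
for (const g of groups) {
  console.log(g.center, g.members.map((m) => m.id));
}
```

Here markers 1 and 2 end up in the same group, but the group center depends on which markers were added, so its position is not fixed as it would be in a grid.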
4. THE FRAMEWORK SEARCHLIGHT
In software development, a framework is an abstraction that unites code common to multiple software projects by providing generic functionality for a family of related problems [5]. It can deliver specific functionality through configuration while an application is being programmed. In general, a framework is a set of tools that work together. What unites these tools are the rules of the framework, which aim to provide the functionality that motivated its creation.
4.1 Architecture
Figure 4 shows the layered architecture of the Searchlight framework. The layers shown in the figure are explained as follows:
Application Layer: provides services to the users of the framework. For users who are programmers, the layer offers a library called Searchlight.js that allows custom integration of the framework into their own website. For users without programming skills, the layer also provides the map generator, which enables the automatic generation of maps and their sharing over the internet without any coding.
Framework Layer: implements the services offered to users through the application layer. The framework layer implements the classes Control, ClusterCtr, and Data. These classes are responsible for controlling the user interface, groups, filters, focus, and data processing.
Libraries Layer: implements basic services used by the framework layer, such as map viewing, marker clustering and data processing. Its main components are the Leaflet library, with its extensions, and the Tabletop library. The Leaflet library is responsible for the visual functions of the map, and the Tabletop library for processing data from Google Spreadsheets.
Base Layer: provides the structural components for building the upper layers. Its main components are the jQuery library, which provides compatibility with different web browsers, and the group formed by HTML, CSS and JavaScript, which shapes the entire structure of the framework.
Figure 4. The architecture of the Searchlight framework
4.2 The features of the framework Searchlight
To illustrate the capabilities of Searchlight for its potential users, a website with information about the project¹ was created. The repository contains a link to the Searchlight.js API for programmers who want to use the framework in a customized manner. The site also offers examples of the features implemented by Searchlight, in particular: category filters, clustering with selective zoom, focus in a group, automatic generation of maps, and map sharing.
4.2.1 Category filters
The category filter is one of the Searchlight features based on the marker reduction strategy. Through it, the user can select which categories of information should be displayed on the map. To use the filter, the user opens the Searchlight menu located in the upper right corner of the map, chooses which categories are to be displayed, and clicks "Atualizar mapa" (update map). Figure 5(a) shows the category filter and how it works. In this figure, two categories were selected in the filter (Urban Mobility and Security) and only the icons corresponding to them are shown on the map. The filter also displays the number of markers belonging to each category.
¹ Searchlight: Facilitando a visualização de Mapas Web. http://wancharle.Github.io/Searchlight/
Figure 5. (a) Category filter generated by Searchlight. (b) Summary balloon after clicking on a group.
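The behavior of the category filter in Figure 5(a) can be approximated with two small functions. This is a sketch under assumptions, not the code of the framework: `countByCategory`, `filterByCategories`, the `category` property and the "Environment" category are hypothetical names introduced for the example.

```javascript
// Illustrative sketch only: function names, the `category` property and sample data are assumptions.

// Counts markers per category, as displayed next to each entry of the filter.
function countByCategory(markers) {
  const counts = {};
  for (const m of markers) {
    counts[m.category] = (counts[m.category] || 0) + 1;
  }
  return counts;
}

// Keeps only the markers whose category was selected in the filter.
function filterByCategories(markers, selected) {
  const wanted = new Set(selected);
  return markers.filter((m) => wanted.has(m.category));
}

const causes = [
  { id: 1, category: "Urban Mobility" },
  { id: 2, category: "Security" },
  { id: 3, category: "Environment" },
  { id: 4, category: "Urban Mobility" },
];

const categoryCounts = countByCategory(causes);
const visible = filterByCategories(causes, ["Urban Mobility", "Security"]);
console.log(categoryCounts, visible.map((m) => m.id));
```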
4.2.2 Clustering with selective zoom
Another interesting feature provided by the Searchlight framework is the grouping of markers according to the zoom level. This alone solves the problem of overlapping information, which is one of the problems that motivated the creation of this project. The colors of the group markers are defined according to thresholds, ranging from green (lowest) to red (highest) according to the number of elements in the group. This can be seen in Figure 6, whose left part displays a group containing 1411 elements and whose right part shows its subgroups.
Grouping markers also solves the problem of arbitrary zoom, seen in Figure 1(b). When the user clicks a group, Searchlight moves to a zoom level at which all members of the clicked group are visible, performing what we call selective zoom. Figure 6 illustrates an example of selective zoom using data obtained directly from the Portoalegre.cc environment and one of the views generated by Searchlight. On a common map, 5 clicks on the group in the left part of Figure 6 would be required to reach the representation in the right part of the figure, which shows the map at the zoom level where the first division of the group occurs. The selective zoom provided by the Searchlight framework, however, reaches the first division of the group after the first interaction with the group.
Figure 6. Selective zoom to avoid unnecessary clicks during interaction with groups.
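One way to picture selective zoom is as a search for the deepest zoom level at which the bounding box of the clicked group still fits in the viewport. The sketch below assumes the standard Web Mercator projection with 256-pixel tiles; the names `mercatorPixel`, `selectiveZoom` and `groupColor`, the viewport size, the maximum zoom of 18, the color thresholds and the intermediate "yellow" color are assumptions for illustration. In a Leaflet-based map the same effect would presumably be obtained by fitting the view to the group bounds (for example with `map.fitBounds`), but whether Searchlight does this is not stated in the text.

```javascript
// Illustrative sketch only: viewport size, maxZoom and color thresholds are assumptions.

// Projects a latitude/longitude to Web Mercator pixel coordinates at a zoom level
// (256-pixel tiles, so the world is 256 * 2^zoom pixels wide).
function mercatorPixel(lat, lng, zoom) {
  const scale = 256 * Math.pow(2, zoom);
  const sinLat = Math.sin((lat * Math.PI) / 180);
  const x = ((lng + 180) / 360) * scale;
  const y = (0.5 - Math.log((1 + sinLat) / (1 - sinLat)) / (4 * Math.PI)) * scale;
  return { x, y };
}

// Deepest zoom level at which every member of the group fits inside the viewport.
function selectiveZoom(members, viewport, maxZoom = 18) {
  const lats = members.map((m) => m.lat);
  const lngs = members.map((m) => m.lng);
  const north = Math.max(...lats);
  const south = Math.min(...lats);
  const east = Math.max(...lngs);
  const west = Math.min(...lngs);
  for (let zoom = maxZoom; zoom >= 0; zoom--) {
    const topLeft = mercatorPixel(north, west, zoom);
    const bottomRight = mercatorPixel(south, east, zoom);
    const fitsWidth = bottomRight.x - topLeft.x <= viewport.width;
    const fitsHeight = bottomRight.y - topLeft.y <= viewport.height;
    if (fitsWidth && fitsHeight) return zoom;
  }
  return 0;
}

// Group color by number of elements, from green (fewest) to red (most).
function groupColor(count, thresholds = [10, 100]) {
  if (count < thresholds[0]) return "green";
  if (count < thresholds[1]) return "yellow";
  return "red";
}

const viewport = { width: 800, height: 600 };
const wideGroup = [{ lat: 0, lng: 0 }, { lat: 0.01, lng: 0.01 }];
const tightGroup = [{ lat: 0, lng: 0 }, { lat: 0.001, lng: 0.001 }];

console.log(selectiveZoom(wideGroup, viewport), selectiveZoom(tightGroup, viewport));
console.log(groupColor(1411));
```

A single click can then jump straight to the computed level, instead of requiring one click per intermediate zoom level.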
4.2.3 Summary balloons and focus in groups
On maps with a large number of markers (and categories), it is useful for the framework to provide a visual device that summarizes the information stored in each cluster of markers. The resource used by Searchlight for this is the summary balloon. Figure 5(b) shows an example of this feature for a cluster with 26 markers. The balloon shows a summary list of the elements (category icons) belonging to the group, sorted by the number of markers belonging to each category in the group.
The summary balloon provides two forms of interaction (see Figure 5(b)). The first is the button "Expandir grupo" (Expand group), which simply zooms in on the group. The second is clicking the icon of a category, which triggers the "focus in groups" feature. This feature also zooms in on the group but, unlike the first, displays only the elements that belong to the category that was clicked. Focus in groups is useful when the user wants to see just one category quickly, without having to select and deselect categories in the category filter.
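The summary list and the focus interaction described above can be sketched as follows. The names `summarizeGroup` and `focusInGroup`, the `category` property and the sample group are assumptions; the sketch covers only the data side (what the balloon lists and which members remain after a focus), not the zooming or the rendering.

```javascript
// Illustrative sketch only: function names, the `category` property and sample data are assumptions.

// Builds the summary shown in the balloon: one entry per category,
// sorted by the number of markers in the group, largest first.
function summarizeGroup(members) {
  const counts = new Map();
  for (const m of members) {
    counts.set(m.category, (counts.get(m.category) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([category, count]) => ({ category, count }))
    .sort((a, b) => b.count - a.count);
}

// "Focus in groups": keeps only the members of the clicked category.
function focusInGroup(members, category) {
  return members.filter((m) => m.category === category);
}

const clickedGroup = [
  { id: 1, category: "Urban Mobility" },
  { id: 2, category: "Security" },
  { id: 3, category: "Security" },
  { id: 4, category: "Environment" },
  { id: 5, category: "Security" },
  { id: 6, category: "Urban Mobility" },
];

const summary = summarizeGroup(clickedGroup);
const focused = focusInGroup(clickedGroup, "Security");
console.log(summary, focused.map((m) => m.id));
```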
4.2.4 Automatic generation and sharing of maps
Searchlight enables automatic map generation from external data sources, without the need for programming. For this, it was established that every information object must contain three essential properties: latitude, longitude, and description (text). The "latitude" and "longitude" properties give the exact position where the marker should appear. The "text" property gives the content that will be displayed in the popup balloon when the marker is clicked. The content of the "text" property can be any text, including HTML code or URLs. Figure 7 shows a sample map generated automatically from a data source that uses the convention adopted by Searchlight. In this case, a public Google spreadsheet was used, and both HTML code and plain text were inserted in the "text" column.
Figure 7. Example of an automatically generated map with a Google Docs spreadsheet.
Searchlight also supports data in JSON format and, through the JSONP protocol, the framework allows these files to be used as a data source in the automatic generation of maps. Using this feature is quite simple. Each row of the spreadsheet represents an information object, or marker, on the map, and each column of the spreadsheet is a property of the object. The map with markers is automatically generated from the public web address of the Google spreadsheet, without the user having to write a single line of code.
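The row-to-marker convention can be illustrated with a short conversion function. Only the property names "latitude", "longitude" and "text" come from the convention described in this section; the function name `rowsToMarkers`, the output shape (`lat`, `lng`, `popup`), the decision to skip rows with invalid coordinates and the sample rows are assumptions. The same objects could equally come from a parsed JSON file.

```javascript
// Illustrative sketch only: the output shape and the handling of invalid rows are assumptions.

// Converts spreadsheet rows (or JSON objects) that follow the
// latitude / longitude / text convention into marker descriptions.
function rowsToMarkers(rows) {
  const markers = [];
  for (const row of rows) {
    // Spreadsheet cells usually arrive as strings, so parse them first.
    const lat = Number.parseFloat(row.latitude);
    const lng = Number.parseFloat(row.longitude);
    if (!Number.isFinite(lat) || !Number.isFinite(lng)) continue; // skip unusable rows
    const popup = row.text == null ? "" : String(row.text);
    markers.push({ lat, lng, popup });
  }
  return markers;
}

// Hypothetical rows, as they might be read from a public spreadsheet.
const sheetRows = [
  { latitude: "-20.2770", longitude: "-40.3030", text: "<b>Campus</b> entrance" },
  { latitude: "not a number", longitude: "-40.3", text: "ignored row" },
  { latitude: "-30.03", longitude: "-51.23", text: "http://example.org" },
];

const generatedMarkers = rowsToMarkers(sheetRows);
console.log(generatedMarkers);
```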
5. CONCLUSION
The increased social interaction through the Web creates new behaviors and interactions among people; one example is crowdsourcing. Allied to this is the growing need to create virtual mechanisms that allow people to exercise their citizenship in a connected and collaborative way to solve real everyday problems, such as transportation, pollution and safety, among others. However, such mechanisms need to deal with the huge volume of data generated, which makes it difficult to transform these data into useful information for decision making. Appropriate ways to display these data are an important part of solving this puzzle.
In this sense, this paper proposed the Searchlight framework, which aims to facilitate the visualization of crowdsourced information on Web maps. In particular, the framework presented strategies based on reducing the number of markers and on grouping markers to address the problems of overlapping information and arbitrary zoom on these maps.
During development, it was noticed that besides improving the map view, it was also necessary to improve the way the map is accessed, facilitating the use of such a visualization component. For this, the Searchlight framework offers a feature that generates and shares maps automatically for users who do not necessarily have programming knowledge. The creation and sharing of annotated maps are done directly from information in spreadsheets.
Regarding the development of Searchlight, it is worth mentioning some difficulties that were overcome, in three categories: design, compatibility and error detection. In design, the biggest difficulty was deciding which interface objects should be used so as not to create obstacles for users of tablets and smartphones. With respect to compatibility, the greatest difficulty was dealing with the various rules and security policies adopted by browsers when implementing the framework's data processing. Error detection was the greatest difficulty of all. Since the framework code is written in JavaScript, many coding errors were only discovered during implementation, with unhelpful error messages, which made corrections very difficult.
Regarding future work, several features are yet to be incorporated into Searchlight. One of the current limitations is that the automatic generation of maps only accepts Google Spreadsheets and cannot use files from spreadsheet applications such as Excel, Calc and Numbers. Currently, users can keep their data in these other spreadsheet formats, but to use them on the map they must first be converted to the Google Docs spreadsheet format. Another issue to be studied concerns strategies to reduce the number of markers on the map through visual optimization. An example of a generic visual optimization that could be implemented in the framework is connecting the markers of the same category. In a scenario where the map shows all bus stops in a city, all markers could be replaced by a line drawn through them, facilitating the visualization of the key information, which in this case is the path of the bus.
REFERENCES
1. Ballatore, A.; Bertolotto, M.; Wilson, D.C., 2012. Geographic knowledge extraction and semantic similarity in OpenStreetMap. Knowledge and Information Systems. Springer-Verlag, London. DOI:10.1007/s10115-012-0571-0
2. Furtado, V. et al., 2010. Collective intelligence in law enforcement – The WikiCrimes system. In Elsevier Information Sciences, 180(1), 4-17.
3. Gonzales et al., 2013. Representação Aberta e Semântica de Anotações de Incidentes em Mapas Web. In Anais do IX Simpósio Brasileiro de Sistemas de Informação, p. 1-12.
4. Graves, A., 2010. Collaborative information on public safety: Potentials and challenges. In Web Science Conf. 2010, Raleigh, NC, USA.
5. Roberts, D., Johnson, R., others, 1996. Evolving frameworks: A pattern language for developing object-oriented frameworks. Pattern languages of program design 3, 471–486.
6. Portoalegre.cc, 2013. Portoalegre.cc | Uma nova cidade vai nascer, 2013. URL:
7. Sen, S., 2012. Representing Geospatial Concepts: Activities or Entities? In Universal Ontology of Geographic Space: Semantic Enrichment for Spatial Data.
8. Svennerberg, G., 2010. Beginning Google Maps API 3. [s.l.] Apress.
9. Thiagarajan, A. et al., 2009. VTrack: accurate, energy-aware road traffic delay estimation using mobile phones. In Proc. 7th ACM Conference on Embedded Networked Sensor Systems. DOI:
10. Thiagarajan, A. et al., 2010. Cooperative transit tracking using smart-phones. In Proc. 8th ACM Conference on Embedded Networked Sensor Systems. DOI: