An advanced Systematic Literature Review

An advanced Systematic Literature Review (SLR) on Spatiotemporal Analyses of Twitter Data – Technical Report

Enrico Steiger, João Porto de Albuquerque, Alexander Zipf
GIScience Research Group, Institute of Geography, University of Heidelberg
Berliner Straße 48, D-69120 Heidelberg, Germany
+49(0)6221 / 54 4546
{enrico.steiger,joao.porto,zipf}@geog.uni-heidelberg.de

Abstract

The increasing number of research contributions from multiple academic disciplines that are relevant to geographical information science (GIScience) makes it challenging to select and assess high-quality research articles. The heterogeneity of the many electronic database pools underlines the demand for a statistically significant inclusion of relevant research articles using more sophisticated methods, in order to substantiate empirical observations made during the review and to answer specific research questions. The aim is therefore to conduct a Systematic Literature Review (SLR) that provides the current state of research concerning methods and applications of crowdsourced Web 2.0 data for the Location Based Social Network (LBSN) Twitter. We introduce an advanced framework enabling an automated, systematic and reproducible literature review process.

Keywords: Crowdsourcing, Location Based Social Networks, Twitter, Volunteered geographic information, VGI, Systematic Literature Review

1 Introduction

Interactive social media platforms offer a tremendous amount of voluntarily contributed, user-generated content. The enormous potential of LBSN has been increasingly recognized by numerous research domains over the last years. At the same time we are facing an interdisciplinary and relatively new research field that lacks common online databases and literature sources. In terms of application, established and applied methods for LBSN are hard to identify at first glance.

Location Based Social Networks – Systematic Literature Review

Hence the overall goal of this paper is to provide an objective summary of the current state of research concerning where Twitter has been used, for which specific use cases, and which methods have been applied. The reviewed articles allow a more detailed evaluation of the potential of LBSN, and the review also intends to summarize remaining challenges and investigate possible drawbacks. A key element of this review is to identify where ample research already exists and where new research is needed. By cross-analyzing the reviewed papers with regard to the research disciplines, applications and methods involved, we identify research gaps and provide a solid foundation for further studies. We are also able to give recommendations for future research directions. GIScience can contribute essential research methods to advance the research on LBSN by further integrating methods of spatial analysis.

In the first part of this paper we outline existing well-defined review methods from other professions in order to bring the methodology of systematic literature reviews closer to the field of information systems and the related discipline of GIScience. Section 2 describes the implemented quantitative and qualitative review workflow and states the research questions. Review results are presented in section 3, followed by a discussion of the results in section 4. Section 5 concludes the review with some final remarks.

1.1 Background of VGI, Social Media and LBSN

Emerging technologies have created new approaches towards the distribution and acquisition of crowdsourced information. The growing availability of mobile devices equipped with GPS sensors, high-performance computers and broadband internet connections, together with advanced server- and client-side key technologies, allows users to actively participate and create content through mobile applications and location based services.
The role of the user is increasingly shifting from a previously distinct perspective of being either a producer or a consumer towards the more dynamic role of a prosumer (Tapscott 1996). The participation of individuals and the vast amount of data they generate have become commonly known under the term Web 2.0 (O’Reilly 2009). Facilitated by new technologies, audiences are contributing their local knowledge without needing prior expertise. Goodchild names this phenomenon Citizens as Sensors, where Volunteered Geographic Information (VGI) is created, assembled, and disseminated by individuals or groups with knowledge or capabilities using the Web 2.0 (Goodchild 2007). Within this interactive, networked and participatory model of People as Sensors (Resch 2013), information is supplied free of charge and voluntarily. Haklay terms this development of new innovative social web mapping applications the evolution of the GeoWeb (Haklay et al. 2008). Social networks are a key part of this development, incorporating new information and communication tools and attracting millions of users. Boyd & Ellison (2007) outline the term Social Network Sites (SNS), typified by individuals who construct an online profile and communicate with other users, sharing common ideas, activities, events and interests. Location Based Social Networks (LBSN) further enhance existing social networks by adding a spatial dimension with location-embedded services. For example, users upload geotagged photos via Flickr, check in at a venue with Foursquare or comment on a local event via Twitter. These location-driven social structures allow mobile device owners with ubiquitous internet access to exchange details of their personal location as a key point of interaction (Zheng 2011). LBSN bridge the gap between our physical world and online social network services and contain three layers of information according to Symeonidis et al. (2014): a social network (user layer), a geographical network (location layer) and a semantic metadata network (content layer).

1.2 Background of Systematic Literature Review

Systematic Literature Review (SLR) is a well-established review method, first notably applied by professionals in the field of medicine and health care. According to Cochrane, one of the well-known and highly ranked medical review database libraries, set up in 1994, “systematic reviews seek to collate all evidence that fits pre-specified eligibility criteria in order to address a specific research question” (Cochrane 2011).
The Cochrane Handbook for Systematic Reviews of Interventions contains methodological guidance and aims to minimize bias by using explicit, systematic methods (Cochrane 2011). Coming from a medical and public health background, Fink (2005) considers four key strategies to be essential when conducting a research review: systematic examination of all literature sources, comprehensive elaboration of a structured review, explicit explanation of all methods used, and reproducible exclusion of subjective examinations, allowing others to reproduce the results. Webster & Watson (2009) note a lack of theoretical progress in information systems due to the complex and difficult delineation of research in an interdisciplinary field. SLRs can therefore assist in summarizing existing evidence concerning a technology, in identifying research gaps for further investigation and in providing a reproducible methodology for conducting new research activities (Brereton et al. 2007). Levy & Ellis (2006) introduce a framework for conducting and writing an effective literature review, adapting the review concept to the field of information systems. Kitchenham refined the approach of Evidence-based Software Engineering (EBSE), aiming “to improve decision making related to software development and maintenance by integrating current best evidence from research with practical experience” (Kitchenham 2004). Keele (2007) adapted the concept of systematic reviews to software engineering research by setting up distinct guidelines derived from those in the medical field. In further tertiary studies (Kitchenham et al. 2009 and Kitchenham et al. 2010), automated searches for systematic literature reviews were performed in order to evaluate the applied SLR methodologies quantitatively and qualitatively. One study outcome was an increased number of SLRs published between 2004 and 2008. However, many researchers still conduct informal literature surveys. In geographic information science, Horita et al. (2013) assessed the current state of research on VGI for disaster management by applying an SLR that included a screening of important literature databases. Roick & Heuser (2013) provided a general, non-systematic review article on current research on Location Based Social Networks, stating the need for further studies investigating how social networks can be applied to specific use cases. However, literature reviews have so far been performed in a rather non-systematic manner, lacking statistical techniques such as meta-analysis.

2 Review method

This review follows the guidelines developed by Kitchenham & Keele (2007), dividing the research into three main phases: planning the review, conducting the review by selecting studies from electronic databases, and reporting the review. The procedure of the literature review, including all derived results, has been documented in the review protocol. Furthermore, test reviews with preliminary trial searches have been carried out in order to detect and minimize bias in the defined search strings and in the subsequent data extraction process. The flowchart in fig. 1 visualizes our automated workflow. All following paragraphs and chapters are organized according to the review process steps shown in the flowchart.


Fig. 1: Flowchart Review Process and number of included papers


2.1 Electronic Databases

The initial step of selecting eligible literature sources is based on the following criteria:

- consideration of journal articles and conference proceedings published between 2005 and 2013 in English (technical drafts etc. are excluded)
- selection of multiple digital libraries with relevance to information research; the libraries identified by Brereton et al. (2007) are further supplemented

Several electronic literature sources have been evaluated during the search documentation according to their ability to export citations, the maximum number of keyword terms, query limits etc. Other factors were crawl and search limitations (e.g. Springer) and research papers not being fully accessible. Table 2 lists our 282 initial and 92 finally reviewed papers by publication source. Duplicate search results found in multiple electronic databases have been excluded: papers appearing in several electronic databases, e.g. in both the Google Scholar search engine and Web of Knowledge, are included only once, so that only unique search results are stored.

Source | URL | Unique Search Result | Meta Analysis | Text Analysis | Paper Screening | Backward Reference | Final Review
IEEE Library | http://www.ieeexplore.ieee.org | 36 | 16 | 14 | 5 | 9 | 14
ACM Digital Library | http://dl.acm.org | 149 | 33 | 35 | 20 | 21 | 41
AIS Electronic Library | http://aisel.aisnet.org | 4 | 1 | 1 | 1 | 0 | 1
Google Scholar Search | http://scholar.google.de | 12 | 8 | 12 | 8 | 8 | 16
Science Direct | http://www.sciencedirect.com | 12 | 3 | 4 | 0 | 0 | 0
Elsevier | http://www.scopus.com | 23 | 10 | 0 | 3 | 1 | 4
Springer Link | http://www.springerlink.com | 9 | 7 | 1 | 0 | 3 | 3
Taylor & Francis | http://www.tandfonline.com/ | 15 | 10 | 1 | 0 | 0 | 0
Wiley Online Library | http://onlinelibrary.wiley.com | 2 | 2 | 1 | 1 | 1 | 2
Web of Knowledge | http://www.webofknowledge.com | 18 | 11 | 2 | 2 | 0 | 2
AAAI | https://www.aaai.org/ | 2 | 0 | 3 | 2 | 7 | 9
Total | | 282 | 101 | 74 | 42 | 50 | 92

Table 2: Used electronic databases with included and excluded papers during the review process.
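The exclusion of duplicate hits across databases can be illustrated with a minimal sketch. It assumes that two records are the same paper when their titles match after lowercasing and stripping punctuation; this matching rule, the function names and the sample hits are illustrative assumptions, not the procedure documented in the review protocol.

```python
import re

def normalize_title(title):
    """Lowercase a title and collapse every run of non-alphanumerics to a space."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def unique_results(results):
    """Keep the first (source, title) record seen for each normalized title."""
    seen = set()
    unique = []
    for source, title in results:
        key = normalize_title(title)
        if key not in seen:
            seen.add(key)
            unique.append((source, title))
    return unique

# Hypothetical search hits: the assignment of titles to databases is made up,
# and the second record duplicates the first with different punctuation.
hits = [
    ("Google Scholar", "Citizens as sensors: the world of volunteered geography"),
    ("Web of Knowledge", "Citizens as Sensors - The World of Volunteered Geography"),
    ("ACM Digital Library", "Object matching in tweets with spatial models"),
]
deduplicated = unique_results(hits)
print(len(deduplicated), "unique records")
```

Matching on a normalized title is deliberately simple; a DOI, where the export provides one, would likely be a more reliable key.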


2.2 Search Terms

The defined electronic databases have been searched with an automatic, predefined keyword search. The search includes research-relevant terms that have been collected through an iterative training step as proposed by Kitchenham et al. (2010) (table 3).

Search terms: [Crowdsourcing, Location Based Social Networks, Twitter, Volunteered Geographic Information, VGI] Twitter

Table 3: Defined search terms

As described by Levy & Ellis (2006), defining the search terms can present multiple problems for the novice researcher. In addition, it can introduce bias; therefore an iterative approach beginning with predefined terms was used in this review. All retrieved papers and their meta-data are semantically analyzed for new keywords. These keywords form additional search terms for the next search iteration, and so on. Finally we generate a cloud of terms and their probabilistic occurrences within all screened research papers, indicating the search terms with the highest relevance to our research question. All search terms have been submitted pair- and crosswise to the digital libraries, connected through “and” as well as “or” operators. The search term “social media” has been excluded because its search results, including the meta-analysis, yielded no relevant research papers with specific methods and use cases. “Volunteered Geographic Information” and its abbreviation “VGI” have been queried as two individual search terms, because term occurrences within the selected research papers and in the metadata after our training phase revealed an inconsistent use by authors: some papers only included the abbreviation as a keyword, others the full term. Keywords arbitrarily defined by researchers can be an issue, since these buzzwords appear and disappear over time and with technological development (Levy & Ellis 2006). The underlying methodologies might develop more steadily, but are difficult to assess quantitatively. To assist the selection process, a backward reference search as proposed by Webster & Watson (2009) was performed within the qualitative review.
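The pair- and crosswise querying with “and” and “or” operators can be sketched as follows. The quoting style and operator spelling are assumptions, since each digital library has its own query syntax, and `build_queries` is a hypothetical helper name.

```python
from itertools import combinations

SEARCH_TERMS = [
    "Crowdsourcing",
    "Location Based Social Networks",
    "Twitter",
    "Volunteered Geographic Information",
    "VGI",
]

def build_queries(terms, operators=("AND", "OR")):
    """Combine every unordered pair of terms with each boolean operator."""
    queries = []
    for left, right in combinations(terms, 2):
        for op in operators:
            queries.append(f'"{left}" {op} "{right}"')
    return queries

queries = build_queries(SEARCH_TERMS)
print(len(queries))  # 10 term pairs, each with 2 operators
for query in queries[:2]:
    print(query)
```

In an iterative setup, newly found keywords would simply be appended to the term list before the next round of query generation.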
Implementing an automatic citation search during the quantitative review is, however, not possible at this stage, due to the large number of initially included papers and the fact that the meta-data of research papers currently does not contain machine-readable information about the references used. The broad automated search was finished when no new results and only identical literature references were identified (Okoli & Schabram 2010). After the search process all papers were screened for duplicates, and papers already present were excluded in order to minimize publication bias (Brereton et al. 2007).

Quantitative review

282 research papers have been identified with the previously described inclusion and exclusion parameters. Searching the literature generates a large number of resulting studies, and a single reviewer is not able to review all of these qualitatively. Therefore all selected papers are further processed in a meta-data analysis, exporting citations and references using common file formats (bibtex, endnote, ris).

2.3 Metadata Analysis

Fig. 4 shows, as an example, the meta-data of one conference paper we selected, including author, title, year and keywords.

Fig. 4: meta-data example in ris reference format
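A record in the ris format consists of lines of the form `TAG  - value` and ends with an `ER` tag. The sketch below parses such a record into author, title, year and keyword fields. The record content is an assumption for illustration: only the title, author surnames and year are taken from this report, while the keywords are made up and do not reproduce fig. 4.

```python
import re

# Two-character tag, two spaces, a hyphen, an optional space, then the value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")

def parse_ris(text):
    """Parse a single RIS record into a dict mapping each tag to a list of values."""
    record = {}
    for line in text.splitlines():
        match = RIS_LINE.match(line)
        if not match:
            continue  # this sketch ignores blank and continuation lines
        tag, value = match.groups()
        if tag == "ER":  # end-of-record marker
            break
        record.setdefault(tag, []).append(value.strip())
    return record

# Illustrative record; the KW entries are hypothetical.
EXAMPLE = "\n".join([
    "TY  - CONF",
    "AU  - Longueville",
    "AU  - Smith",
    "TI  - OMG, from here, I can see the flames!: a use case of mining "
    "Location Based Social Networks to acquire spatio-temporal data on forest fires",
    "PY  - 2009",
    "KW  - twitter",
    "KW  - forest fire",
    "ER  - ",
])

record = parse_ris(EXAMPLE)
print(record["AU"], record["PY"], record["KW"])
```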

The term corpus created from our meta-data was stripped of white space, converted to lower case and cleared of numbers, (English) stop words and special characters. We compare the total number of term occurrences for all 282 screened papers during our meta-analysis, once using only keywords and once using keywords together with the paper titles (fig. 5). The terms “social”, “twitter”, “information”, “data” and “network” each occur more than 60 times in absolute numbers and appear in almost 43% of all papers when semantically analyzing keywords and topics.
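The cleaning and counting steps can be sketched in a few lines. This is not the R pipeline used in the review; the stop word list is a tiny illustrative stand-in for a full English list, and the input strings are hypothetical keyword and title fields.

```python
import re
from collections import Counter

# Tiny illustrative stop word list; a real run would use a full English list.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

def clean_terms(text):
    """Lowercase, replace digits and special characters, then drop stop words."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [term for term in text.split() if term not in STOP_WORDS]

def term_counts(documents):
    """Return total occurrences per term and the number of documents per term."""
    total, doc_freq = Counter(), Counter()
    for doc in documents:
        terms = clean_terms(doc)
        total.update(terms)
        doc_freq.update(set(terms))
    return total, doc_freq

# Hypothetical keyword + title strings of three papers.
keywords_and_titles = [
    "Twitter; social network; Event detection in 2012",
    "Crowdsourcing of geographic data on Twitter",
    "Social media and the social network",
]
total, doc_freq = term_counts(keywords_and_titles)
share = doc_freq["social"] / len(keywords_and_titles)
print(total.most_common(3), f"{share:.0%} of papers contain 'social'")
```

Keeping the total count and the document count separate mirrors the two figures reported in the text: absolute occurrences and the share of papers containing a term.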


[Figure: histogram of term frequencies (n = 282); vertical axis: total number of term occurrences (0 to 160); series: “Meta-Analysis: only Keyword” and “Meta-Analysis: Keyword + Title”]

Fig. 5: total occurring term frequencies of 282 screened papers comparing only keywords and keywords with title. The terms “urban” and “routing” only appear as keywords, not within the title.

Looking for association rules and for terms that correlate significantly with other extracted words, we built a term adjacency matrix, represented as a graph (fig. 6) that highlights the most frequently occurring terms. The more strongly terms correlate with each other, the higher the edge weight and the closer they appear in our graph. For example, the term pairs “social” + “media” and “social” + “network” each correlate by more than 0.5 and are therefore associated.

Fig 6: weighted graph network generated from term adjacency matrix
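One way to obtain such a weighted term graph is to correlate the occurrence vectors of terms across papers and keep the pairs above a threshold as edges. The sketch below assumes Pearson correlation on binary occurrence vectors and a threshold of 0.5; the report does not state the exact correlation measure, and the eight keyword sets are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def term_graph(docs, threshold=0.5):
    """Return {(term_a, term_b): weight} for term pairs correlating above threshold."""
    vocab = sorted({term for doc in docs for term in doc})
    occurs = {t: [1 if t in doc else 0 for doc in docs] for t in vocab}
    edges = {}
    for a, b in combinations(vocab, 2):
        r = pearson(occurs[a], occurs[b])
        if r > threshold:
            edges[(a, b)] = round(r, 3)
    return edges

# Hypothetical keyword sets of eight papers.
docs = [
    {"social", "media"},
    {"social", "media", "twitter"},
    {"social", "network"},
    {"social", "network", "twitter"},
    {"twitter", "vgi"},
    {"crowdsourcing", "vgi"},
    {"crowdsourcing"},
    {"twitter"},
]
edges = term_graph(docs)
print(edges)
```

In this toy corpus only the pairs “media” + “social” and “network” + “social” exceed the threshold, so they become the two edges of the graph.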

In the next step all semantic annotations and occurring terms are weighted using the term frequency–inverse document frequency (tf-idf) measure implemented in the R text mining package tm (Feinerer 2013):


$$w_{i,j} = tf_{i,j} \times \log\left(\frac{N}{df_i}\right)$$

$tf_{i,j}$ = number of occurrences of i (term) in j (paper)

$df_i$ = number of papers containing i

$N$ = total number of selected papers
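As a worked illustration of this weighting, the following sketch computes tf-idf weights for three hypothetical papers. It uses the natural logarithm, which is an assumption: the base of the logarithm and any normalization applied by the R package are not stated in this report.

```python
from math import log

def tf_idf(papers):
    """Return one {term: weight} dict per paper, with weight = tf * log(N / df)."""
    n_papers = len(papers)
    df = {}
    for terms in papers:
        for term in set(terms):
            df[term] = df.get(term, 0) + 1  # papers containing the term
    weights = []
    for terms in papers:
        w = {}
        for term in set(terms):
            tf = terms.count(term)          # occurrences within this paper
            w[term] = tf * log(n_papers / df[term])
        weights.append(w)
    return weights

# Hypothetical, already cleaned term lists of three papers.
papers = [
    ["twitter", "social", "network", "twitter"],
    ["twitter", "crowdsourcing"],
    ["twitter", "routing", "routing"],
]
weights = tf_idf(papers)
print(weights[0]["twitter"])   # term in every paper: weight 0
print(weights[2]["routing"])   # frequent in one paper only: high weight
```

A term such as “twitter” that appears in every paper receives a weight of zero, while a term concentrated in a single paper receives a high weight; this is the contrast the clustering step builds on.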

We calculate the frequency of each term within a specific research paper and relate it to the number of papers in which the term occurs. The obtained weighting factor indicates whether terms occur rarely or are commonly used words within all selected research papers. We apply the density-based clustering algorithm DBSCAN (Ester et al. 1996) in order to detect statistically significant clusters of term occurrences. The minimum number of points required to form a cluster was set to 5, with a reachability distance of ε=1.5. As a result (fig. 7), cluster 1 has a high overall document frequency with a tf-idf score close to zero. The research papers in cluster 1 (n=101) seem to cover similar topics, because we compute a pairwise two-dimensional semantic distance from the keywords and titles used. In addition, these papers form a strong cluster without scattering and are our targeted research papers. Cluster 2 shows a medium document and term frequency; it is more dispersed and diffuses into local sub-clusters of semantic similarity. Cluster 3 has a high term frequency within the specific paper and a low document frequency over all papers (higher tf-idf score), forming a noisy outlier cluster. The most frequently occurring terms in the 101 papers assigned to cluster 1 are “Crowdsourcing”, “Twitter”, “Volunteered Geographic Information” and “Social Networks”. We have thus extracted a group of terms that are clearly semantically related to each other and suitable for answering our research questions.
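To make the clustering step concrete, here is a compact, self-contained DBSCAN sketch using the parameters named above (minimum of 5 points, ε=1.5). The two-dimensional input points are hypothetical; which quantities span the two dimensions in the actual analysis is an assumption left open here.

```python
from math import dist

def dbscan(points, eps=1.5, min_pts=5):
    """Label each point with a cluster id (1, 2, ...) or 0 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = 0              # provisional noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == 0:
                labels[j] = cluster    # former noise becomes a border point
            if labels[j] is not None:
                continue               # already assigned, do not expand again
            labels[j] = cluster
            more = neighbors(j)
            if len(more) >= min_pts:   # j is a core point: grow the cluster
                queue.extend(more)
    return labels

# Hypothetical 2-D points: two dense groups and one isolated outlier.
points = [
    (0, 0), (0.5, 0), (0, 0.5), (0.5, 0.5), (1, 1), (0.2, 0.8),
    (10, 10), (10.5, 10), (10, 10.5), (10.5, 10.5), (11, 11),
    (5, 20),
]
labels = dbscan(points)
print(labels)
```

Points that never reach the density requirement keep the label 0, which corresponds to the noisy outliers described in the text.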


frequent terms (cluster 1): [crowdsourc, twitter, volunt, geograph, inform, social, network]

Fig. 7: Term frequency-inverse document frequency of research papers with DBSCAN clustering

2.4 Text Analysis

The next goal is to find papers within the identified semantic cluster that intersect with most of the frequent terms from our previous meta-analysis (Kofod-Petersen 2012). At the same time we assess whether the cluster extracted in our meta-analysis correlates with the content of the papers. We therefore now focus on semantically analyzing the full text. All initially stored papers are converted from PDF into plain text using the open-source Java toolkit Tika. Afterwards the converted papers undergo a natural language processing (NLP) step. We semantically process all documents, including tokenization, stop word filtering and stemming, using RapidMiner as a powerful data and text mining tool (Mierswa et al. 2006). The reference lists at the end of the research papers are filtered out by the processing parameters: minimum and maximum word lengths and the removal of numbers and punctuation. For our case we use Latent Dirichlet allocation (LDA), a probabilistic topic extraction model introduced by Blei et al. (2003). This unsupervised machine learning model identifies latent topics and corresponding document clusters in a large text collection.
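For readers unfamiliar with LDA, the sketch below fits a small topic model by collapsed Gibbs sampling and returns the five highest-ranked terms per topic. It is a teaching sketch and not the implementation used in this review; the hyperparameters `alpha` and `beta`, the iteration count and the token lists are assumptions.

```python
import random

def lda_gibbs(docs, k=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns the top 5 terms of each topic."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    v = len(vocab)
    doc_topic = [[0] * k for _ in docs]                        # counts n(d, t)
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(k)]   # counts n(t, w)
    topic_total = [0] * k                                      # counts n(t)
    assign = []
    for d, doc in enumerate(docs):           # random initial topic per token
        z = [rng.randrange(k) for _ in doc]
        assign.append(z)
        for w, t in zip(doc, z):
            doc_topic[d][t] += 1
            topic_word[t][w] += 1
            topic_total[t] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = assign[d][i]             # remove the current assignment
                doc_topic[d][t] -= 1
                topic_word[t][w] -= 1
                topic_total[t] -= 1
                probs = [                    # unnormalized conditional p(t | rest)
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta) / (topic_total[j] + v * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=probs)[0]
                assign[d][i] = t             # store the resampled topic
                doc_topic[d][t] += 1
                topic_word[t][w] += 1
                topic_total[t] += 1
    return [
        sorted(vocab, key=lambda w: (-topic_word[t][w], w))[:5]
        for t in range(k)
    ]

# Hypothetical stemmed tokens from sections of one paper.
docs = [
    ["fire", "tweet", "event", "twitter", "time", "fire", "tweet"],
    ["twitter", "user", "marseille", "inform", "provide", "user"],
    ["fire", "event", "time", "tweet", "twitter"],
    ["user", "inform", "provide", "twitter", "marseille"],
]
topics = lda_gibbs(docs, k=2)
for number, terms in enumerate(topics, start=1):
    print(f"Topic {number}: {terms}")
```

Because the sampler is stochastic, the exact term ranking depends on the seed; only the shape of the output (k topics with five terms each) is guaranteed.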


Topic 1: [fire, tweet, event, twitter, time]
Topic 2: [twitter, user, marseille, inform, provide]

1 document-term matrix (306 terms)
Non-/sparse entries: 306/0
Maximal term length: 8
Weighting: term frequency (tf)

Fig. 8: Topic model results, showing the log-likelihood (log MLE) of the data for different numbers of topics (k)

Fig. 9: LDA topic model for our example paper Longueville & Smith (2009) and document-term matrix results with no sparse terms

As the number of topics k for LDA needs to be specified in advance, we estimate this parameter by maximum likelihood estimation, iterating through the topic models for every paper generated from the document-term matrix. Following the topic model selection by Griffiths & Steyvers (2004), the results show the highest log-likelihood (log MLE) for all papers with 2 topics (fig. 8). Out of all 282 papers we extracted 564 individual topics (2 topics per paper), each consisting of 5 associated terms (2820 terms in total). For our example paper “OMG, from here, I can see the flames!: a use case of mining Location Based Social Networks to acquire spatio-temporal data on forest fires” by Longueville & Smith (2009) (from fig. 4), we uncovered 2 latent topics that intersect with each other (fig. 9). When qualitatively reviewing this paper we can indeed discern a temporal analysis of the distribution of tweets for a fire event in Marseille, where users provide information through LBSN.

In the next step we select only those topic-related terms that occur several times across all of our papers. By analyzing the most frequently occurring associated terms in a correlation matrix (fig. 10), we are able to detect topics in our papers which overlap significantly. Each dot in the scatterplot represents the correlation of one paper with another paper for the given topics. For example, papers 1-4, 10-18 and 234-249 correlate highly with each other (r) and semantically contain the same topics. Papers correlate strongly with each other for the specific topics “social network”, “tweet” and “twitter”. During the text analysis we only include papers that show a strong positive correlation coefficient for these specific topics, equal to or higher than r. Therefore we can reject the null hypothesis, since the t-test critical value is […]. 74 documents have been identified by this semantic topic extraction based on measured topic correlations.

Fig. 10: Correlation matrix of all papers for three extracted topic related terms “social network”, “tweet” and “twitter”.
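The pairwise comparison of papers can be illustrated with a Pearson correlation and the corresponding t statistic, t = r·sqrt((n−2)/(1−r²)) with n−2 degrees of freedom. The term counts below are hypothetical, and the critical value 3.182 is the standard two-sided 5% value for 3 degrees of freedom; the threshold and critical value actually used in this review are not recoverable from the text.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def t_statistic(r, n):
    """t value for testing H0 'no correlation', with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))

# Hypothetical occurrence counts of five topic terms in two papers.
paper_a = [12, 30, 25, 4, 9]
paper_b = [10, 28, 27, 5, 8]

r = pearson(paper_a, paper_b)
t = t_statistic(r, len(paper_a))

# Two-sided 5% critical value of Student's t for 3 degrees of freedom.
T_CRITICAL = 3.182
significant = t > T_CRITICAL
print(f"r = {r:.3f}, t = {t:.2f}, significant: {significant}")
```

With only a handful of terms per paper the degrees of freedom are very small, so only very high correlations pass such a test.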

The most frequent topics resulting from the text analysis are shown in fig. 11. The frequency of occurrence of each topic is encoded in the color and size of the term label. Fig. 12 shows our 74 selected papers by year of publication. Comparing the count of publications per year of the initial 282 records with the results of the meta-analysis and the final text analysis shows similar frequencies. This analysis helps to verify whether any of the quantitative review steps selects papers one-sidedly, which might bias the result.

Fig. 11: Wordcloud with most frequent extracted topics (applying LDA) over all research papers (n=282)

Fig. 12: Comparison of year of publication of initially selected papers (n=282) with results after meta-analysis (n=101) and text analysis (n=74)

2.5 Merge Process

Comparing the papers identified by the meta-analysis and the text analysis, not all of them match. Out of the 74 papers from the text analysis, 54 (73%) were part of the cluster from our meta-analysis. Of the 74 papers, 14 are journal publications and 60 are conference proceedings. 20 papers from the text analysis were not identified in the meta-analysis. Therefore all papers need to go through a merging process (fig. 13), in which the frequent terms from the metadata analysis are compared with the extracted topics from the text analysis for each paper.

Fig. 13: review of identical and non-identical papers
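The counts of the merge process can be checked with simple set operations. The paper ids below are hypothetical and chosen only so that the set sizes match the reported numbers (101 papers from the meta-analysis, 74 from the text analysis, 54 in both).

```python
# Hypothetical paper ids; only the set sizes correspond to the reported counts.
meta_cluster = set(range(1, 102))      # 101 papers from the meta-analysis
text_selected = set(range(48, 122))    # 74 papers from the text analysis

identical = meta_cluster & text_selected   # found by both analyses
only_meta = meta_cluster - text_selected   # found by the meta-analysis only
only_text = text_selected - meta_cluster   # found by the text analysis only

share = len(identical) / len(text_selected)

# Non-identical papers re-included after comparing terms and topics.
recovered = 8
quantitative_total = len(identical) + recovered

print(len(identical), len(only_meta), len(only_text))
print(f"{share:.1%} of the text analysis papers are identical")
print(quantitative_total, "papers enter the qualitative review")
```

The same bookkeeping makes it easy to spot inconsistencies between the reported counts of identical and non-identical papers.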


47 papers from the meta-analysis and 20 papers from the text analysis are non-identical. Out of the 47 papers identified only by the meta-analysis, 39 contain none of our relevant topics “twitter”, “tweet” and “social network” among the terms LDA extracted in the text analysis. Correlating text analysis topics with meta-analysis terms, 8 papers have shown similarities with meta-data terms and have therefore been included. Validating the merge results, we can indeed confirm that the excluded non-identical papers cover different topics (e.g. dealing with Foursquare or Flickr data). One reason why meta-analysis and text analysis results differ is the character and keyword limitations within the metadata of papers. Also, the metadata of each paper contains a different number of keywords, or no keywords at all. Based on our relevant topics, we have quantitatively extracted scientific articles (n=62) and can now begin the qualitative review phase by screening the remaining articles.

Qualitative review

During the qualitative review all clearly irrelevant results are discarded, i.e. papers that do not address any aspect of the research questions. Drafting a clear and concise research question is an essential task for successfully identifying primary studies that provide a detailed state of the art (Okoli & Schabram 2010).

2.6 Research questions

As the objectives of the review are to extract use cases, focused research areas and methods when utilizing Twitter as one LBSN, the following three research questions have been selected:

1. What are the applications where Twitter as one LBSN has been used?
2. Which academic disciplines mainly focus on researching Twitter?
3. What are the methods aiming to use voluntary information and crowdsourced social media data from Twitter?

2.7 Paper Screening

A practical screening of the included papers further synthesizes the review by examining methods and use cases. During the paper screening process, papers that do not show any relevance to our previously formulated research questions have been excluded. Papers from the Association for the Advancement of Artificial Intelligence (AAAI) were extracted by the text analysis but not detected in the meta-analysis. The qualitative review has shown that these articles are relevant to our research questions, and therefore all of them have been included. 15 papers that do not explain their methodological approach or their application of Twitter fall under the exclusion criteria. Another 5 papers have been excluded because of self-citation. These cross-citations were not excluded quantitatively in the meta- and text analysis because they are semantically very close. 42 papers remain.

2.8 Backward Reference Search

Given the topic- and term-related semantic inclusion of papers, we started a manual backward reference search according to Webster & Watson (2009), as referred to by Levy & Ellis (2006). This approach goes through all citations of our 42 finally selected articles to follow methods and their development. However, papers by authors referencing their own papers on similar topics have been excluded. 50 additional articles have been included through the backward reference search.

Conclusions

This technical report has presented an advanced framework for conducting a quantitative and qualitative review of studies, providing the state of research concerning methodologies, applications and use cases of Twitter as one main Location Based Social Network (LBSN). The proposed systematic literature review method considers and combines search results from multiple heterogeneous digital libraries and allows an effective, reproducible assessment of relevant research studies. By combining methods from computational linguistics, namely the tf-idf algorithm during the meta-analysis and the probabilistic LDA topic model for the semantic content of all documents, we achieved a successful quantitative inclusion of papers. Together with the implementation of an iterative keyword search that considers the meta-analysis results, we were able to minimize bias during the overall review process. The major research outcome, answering our research questions after the qualitative review, has been accomplished in detail by providing new statistical insights for LBSN.

6 Acknowledgement

This research has been funded through the graduate scholarship program Crowdanalyser (spatiotemporal analysis of user-generated content), supported by the state of Baden-Württemberg. We also thank Prudence Carr for proofreading this research article.


References

Abel, F. et al., 2012. Semantics + Filtering + Search = Twitcident Exploring Information in Social Web Streams Categories and Subject Descriptors. ACM Transactions on Intelligent Systems and Technology, pp.285–294.

Andrienko, G. & Andrienko, N., 2013. Thematic Patterns in Georeferenced Tweets through Space-Time Visual Analytics.

Becker, H. & Gravano, L., 2011. Beyond Trending Topics: Real-World Event Identification on Twitter. AAAI, pp.438–441.

Blei, D., Ng, A. & Jordan, M., 2003. Latent Dirichlet allocation. The Journal of Machine Learning Research.

Boettcher, A. & Lee, D., 2012. EventRadar: A Real-Time Local Event Detection Scheme Using Twitter Stream. 2012 IEEE International Conference on Green Computing and Communications, pp.358–367.

Boyd, D.M. & Ellison, N.B., 2007. Social Network Sites: Definition, History, and Scholarship. Journal of Computer-Mediated Communication, 13(1), pp.210–230.

Brereton, P. et al., 2007. Lessons from applying the systematic literature review process within the software engineering domain. Journal of Systems and Software, 80(4), pp.571–583.

Cha, M. et al., 2010. Measuring User Influence in Twitter: The Million Follower Fallacy. ICWSM.

Chae, J. et al., 2012. Spatiotemporal social media analytics for abnormal event detection and examination using seasonal-trend decomposition. 2012 IEEE Conference on Visual Analytics Science and Technology (VAST), pp.143–152.

Chu, Z., Gianvecchio, S. & Wang, H., 2010. Who is Tweeting on Twitter: Human, Bot, or Cyborg? ACM, pp.21–30.

Cochrane Reviewers’ Handbook Glossary, Version 5.1.0. Cochrane Collaboration.

Corvey, W. et al., 2010. Twitter in Mass Emergency: What NLP Techniques Can Contribute. Computational Linguistics, 4(June), pp.23–24.

Cranshaw, J. et al., 2012. The Livehoods Project: Utilizing Social Media to Understand the Dynamics of a City. ICWSM.

Crooks, A. et al., 2013. #Earthquake: Twitter as a Distributed Sensor System. Transactions in GIS, 17(1), pp.124–147.

Cui, A. et al., 2012. Discover breaking events with popular hashtags in twitter. In Proceedings of the 21st ACM international conference on Information and knowledge management - CIKM ’12. New York, New York, USA: ACM Press, p. 1794.

17

Location Based Social Networks – Systematic Literature Review

Dalvi, N., Kumar, R. & Pang, B., 2012. Object matching in tweets with spatial models. In

Proceedings of the fifth ACM international conference on Web search and data mining WSDM ’12. New York, New York, USA: ACM Press, p. 43. Earle, P.S., Bowden, D.C. & Guy, M., 2011. Twitter earthquake detection : earthquake monitoring in a social world. Annals of Geophysics. Ester, M. et al., 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. KDD. Feinerer, I., 2013. Introduction to the tm Package Text Mining in R. tiré de http://cran. r-project.

org/web/packages/tm/ Ferrari, L. et al., 2011. Extracting urban patterns from location-based social networks. In

Proceedings of the 3rd ACM SIGSPATIAL International Workshop on Location-Based Social Networks - LBSN ’11. New York, New York, USA: ACM Press, p. 1. Finin, T. et al., 2010. Annotating Named Entities in Twitter Data with Crowdsourcing. ACM pp. 80–88. Fink, A., 2005. Conducting Research Literature Reviews: From the Internet to Paper Fuchs, G., Jankowski, P. & Augustin, S., 2013. Extracting Personal Behavioral Patterns from Geo-Referenced Tweets. AGILE. Gelernter, J. & Balaji, S., 2013. An algorithm for local geoparsing of microtext. GeoInformatica. Gerais, M. et al., 2012. Traffic Observatory : a system to detect and locate traffic events and conditions using Twitter. ACM, pp.5–11. Go, A., Huang, L. & Bhayani, R., 2009. Sentiment Analysis of Twitter Data. Entropy, 2009(June), pp.30–38. Gonzalez, R. & Chen, Y., 2012. TweoLocator : A Non-Intrusive Geographical Locator System for Twitter. ACM, pp.24–31. Goodchild, M., 2007. Citizens as sensors: the world of volunteered geography. GeoJournal. Griffiths, T. & Steyvers, M., 2004. Finding scientific topics. … academy of Sciences of the United …. Gupta, A. & Kumaraguru, P., 2012. Credibility Ranking of Tweets during High Impact Events.

ACM. Haklay, M., Singleton, A. & Parker, C., 2008. Web mapping 2.0: The neogeography of the GeoWeb.

Geography Compass. Hecht, B. et al., 2011. Tweets from Justin Bieber ’ s Heart : The Dynamics of the “ Location ” Field in User Profiles.

18

Location Based Social Networks – Systematic Literature Review

Hiruta, S. et al., 2012. Detection , Classification and Visualization of Place-triggerd Geotagged Tweets. ACM. Hong, L. et al., 2012. Discovering geographical topics in the twitter stream. Proceedings of the 21st

international conference on World Wide Web - WWW ’12, p.769. Hong, L., Convertino, G. & Chi, E.H., 2011. Language Matters in Twitter : A Large Scale Study Characterizing the Top Languages in Twitter Characterizing Differences across Languages Including URLs and Hashtags. , (1), pp.518–521. Horita, F.E.A. et al., 2013. The use of Volunteered Geographic Information and Crowdsourcing in Disaster Management: a Systematic Literature Review. In Proceedings of the Nineteenth

Americas Conference on Information Systems, Chicago Illinois, August 15-17, 2013. Atlanta, GA, USA: AIS, pp. 1–10. Hughes, A.L. & Palen, L., 2009. Twitter adoption and use in mass convergence and emergency events. International Journal of Emergency Management, 6(3/4), p.248. Jackoway, A., Samet, H. & Sankaranarayanan, J., 2011. Identification of live news events using Twitter. In Proceedings of the 3rd ACM SIGSPATIAL International Workshop on

Location-Based Social Networks - LBSN ’11. New York, New York, USA: ACM Press, p. 1. Keele, S., 2007. Guidelines for performing Systematic Literature Reviews in Software Engineering. Kinsella, S., Murdock, V. & Hare, N.O., 2011. “ I ’ m Eating a Sandwich in Glasgow ”: Modeling Locations with Tweets. ACM, (June), pp.61–68. Kitchenham, B., 2004. Evidence-based software engineering. Engineering, 2004. ICSE.

Proceedings. 26th International Conference on. IEEE, 2004. Kitchenham, B. et al., 2009. Systematic literature reviews in software engineering – A systematic literature review. Information and Software Technology, 51(1), pp.7–15. Kitchenham, B. et al., 2010. Systematic literature reviews in software engineering – A tertiary study. Information and Software Technology, 52(8), pp.792–805. Kitchenham, B. & Keele, S., 2007. Guidelines for performing Systematic Literature Reviews in Software Engineering. Kling, F., Kildare, C. & Pozdnoukhov, A., 2012. When a City Tells a Story : Urban Topic Analysis.

ACM, pp.482–485. Kofod-Petersen, A., 2012. How to do a Structured Literature Review in computer science. Kosala, R. & Adi, E., 2012. Harvesting Real Time Traffic Information from Twitter. Procedia

Engineering, 50(Icasce), pp.1–11. Krishnamurthy, B. & Arlitt, M., 2006. A Few Chirps About Twitter. , pp.19–24.

19

Location Based Social Networks – Systematic Literature Review

Kulshrestha, J. & Gummadi, K.P., 2008. Geographic Dissection of the Twitter Network. Kwon, K.H. & Hall, B., 2010. AN EXPLORATION OF SOCIAL MEDIA IN EXTREME EVENTS : RUMOR THEORY AND TWITTER DURING THE HAITI EARTHQUAKE 2010. AIS, pp.1– 14. Lampos, V. & Cristianini, N., 2010. Tracking the flu pandemic by monitoring the Social Web. , pp.411–416. Lee, B. & Hwang, B.-Y., 2012. A Study of the Correlation between the Spatial Attributes on Twitter. 2012 IEEE 28th International Conference on Data Engineering Workshops , pp.337– 340. Lee, R. & Sumiya, K., 2010. Measuring geographical regularities of crowd behaviors for Twitter-based geo-social event detection. In Proceedings of the 2nd ACM SIGSPATIAL

International Workshop on Location Based Social Networks - LBSN ’10. New York, New York, USA: ACM Press, p. 1.. Levy, Y. & Ellis, T.J., 2006. A Systems Approach to Conduct an Effective Literature Review in Support of Information Systems Research. , 9. Li, W. et al., 2011. The where in the tweet. In Proceedings of the 20th ACM international

conference on Information and knowledge management - CIKM ’11. New York, New York, USA: ACM Press, p. 2473. Longueville, B. De & Smith, R.S., 2009. “ OMG , from here , I can see the flames !”: a use case of mining Location Based Social Networks to acquire spatio- temporal data on forest fires. ACM, (c), pp.73–80. Maceachren, A.M. et al., 2011. SensePlace2 : GeoTwitter Analytics Support for Situational Awareness. , pp.181–190. Mierswa, I., Wurst, M. & Klinkenberg, R., 2006. Yale: Rapid prototyping for complex data mining tasks. … and data mining. Murthy, D. & Longwell, S. a., 2013. Twitter and Disasters. Information, Communication & Society, 16(6), pp.837–855. o’Reilly, T., 2009. What is web 2.0 Okoli, C. & Schabram, K., 2010. A Guide to Conducting a Systematic Literature Review of Information Systems Research. , (2008). Pan, C., 2011. Event Detection with Spatial Latent Dirichlet Allocation. ACM, pp.349–358. Pennacchiotti, M. & Popescu, A., 2010. to Twitter User Classification. , pp.281–288.

20

Location Based Social Networks – Systematic Literature Review

Resch, B., 2013. People as sensors and collective sensing-contextual observations complementing geo-sensor network measurements. Progress in Location-Based Services. Ritterman, J., Osborne, M. & Klein, E., 2009. Using Prediction Markets and Twitter to Predict a Swine Flu Pandemic. , (2004). Roick, O. & Heuser, S., 2013. Location Based Social Networks - Definition, Current State of the Art and Research Agenda. Transactions in GIS, p.n/a–n/a. Sadilek, A., Krumm, J. & Horvitz, E., 2013. Crowdphysics : Planned and Opportunistic Crowdsourcing for Physical Tasks. Mircrosoft Research. Sakaki, T., 2009. Earthquake Shakes Twitter Users : Real-time Event Detection by Social Sensors. Sakaki, T. & Matsuo, Y., 2012. Real-time Event Extraction for Driving Information from Social Sensors. IEEE, pp.221–226. Sakaki, T., Okazaki, M. & Matsuo, Y., 2010. Earthquake shakes Twitter users: real-time event detection by social sensors. In Proceedings of the 19th international conference on World wide

web. ACM, pp. 851–860. Sofean, M. & Smith, M., 2012. A Real-Time Architecture for Detection of Diseases using Social Networks : Design , Implementation and Evaluation. ACM, (figure 1), pp.309–310. Starbird, K. & Muzny, G., 2012. Learning from the Crowd : Collaborative Filtering Techniques for Identifying On-the-Ground Twitterers during Mass Disruptions. ISCRAM, 2011; pp.1–10. Stefanidis, A., Crooks, A. & Radzikowski, J., 2011. Harvesting ambient geospatial information from social media feeds. GeoJournal, 78(2), pp.319–338. Symeonidis, P., Ntempos, D. & Manolopoulos, Y., 2014. Location-Based Social Networks.

Recommender Systems for …. Takhteyev, Y., Gruzd, A. & Wellman, B., 2012. Geography of Twitter networks. Social Networks, 34(1), pp.73–81. Tapscott, D., 1996. The digital economy: Promise and peril in the age of networked intelligence. Terpstra, T., 2012. Towards a realtime Twitter analysis during crises for operational crisis management. ISCRAM, pp.1–9. Thomson, R. et al., 2012. Trusting Tweets : The Fukushima Disaster and Information Source Credibility on Twitter. ISCRAM, pp.1–10. Veloso, A. & Ferraz, F., 2011. Dengue surveillance based on a computational model of spatio-temporal locality of Twitter.

21

Location Based Social Networks – Systematic Literature Review

Wakamiya, S. & Lee, R., 2012. Crowd-sourced Urban Life Monitoring : Urban Area Characterization based Crowd Behavioral Patterns from Twitter Categories and Subject Descriptors. ACM. Wang, H. et al., 2012. A System for Real-time Twitter Sentiment Analysis of 2012 U . S . Presidential Election Cycle. In Other. pp. 115–120. Wanichayapong, N. et al., 2011. Social-based Traffic Information Extraction and Classification.

IEEE, pp.107–112. Watanabe, K. et al., 2011. Jasmine: a real-time local-event detection system based on geolocation information propagated to microblogs. In CIKM ’11 Proceedings of the 20th ACM

international conference on Information and knowledge management . ACM New York, NY, USA, pp. 2541–2544. Webster, J. & Watson, R.T., 2009. ANALYZING THE PAST TO PREPARE FOR THE FUTURE : WRITING A REVIEW. , 26(2). Weng, J. & Lee, B., 2011. Event Detection in Twitter. AAAI, pp.401–408. Weng, J., Lim, E. & Jiang, J., 2010. Twitterrank : Finding Topic-Sensitive Influential Twitterers TwitterRank : Finding Topic-sensitive Influential Twitterers. , pp.261–270. Wu, S. et al., 2011. Who Says What to Whom on Twitter. , pp.705–714. Yardi, S., Tweeting from the Town Square : Measuring Geographic Local Networks. Yardi, S. & Boyd, D., 2010. Tweeting from the Town Square: Measuring Geographic Local Networks. ICWSM. Yuan, Q. et al., 2013. Who , Where , When and What : Discover Spatio-Temporal Topics for Twitter Users. ACM, pp.605–613. Zhang, D. et al., 2010. Extracting Social and Community Intelligence from Digital Footprints : An Emerging Research Area. , pp.4–18. Zheng, Y., 2011. Location-based social networks: Users. Computing with Spatial Trajectories. Zielinski, A. & Bügel, U., 2012. Multilingual Analysis of Twitter News in Support of Mass Emergency Events. , pp.1–5. Zielinski, A. & Middleton, S., 2013. Social Media Text Mining and Network Analysis for Decision Support in Natural Crisis Management. Proceedings of the 10th International ISCRAM Conference

22