Available online at www.sciencedirect.com

ScienceDirect Procedia Computer Science 108C (2017) 2378–2382

International Conference on Computational Science, ICCS 2017, 12-14 June 2017, Zurich, Switzerland

Detection of tourists attraction points using Instagram profiles

Ksenia D. Mukhina, Stepan V. Rakitin, Alexander A. Visheratin
ITMO University, Saint Petersburg, Russia
[email protected], [email protected], [email protected]

Abstract
During vacations, people try to explore new places, but it is impossible for tourists to see all interesting locations. In this paper, we shed the light on differences between favorite places of tourists and locals using Instagram profiles. The time windows based identification method is proposed to distinguish visitors from residents. The list of potential tourists’ attraction points in Saint Petersburg was obtained by analysis of locals’ popular places.

© 2017 The Authors. Published by Elsevier B.V.
Peer-review under responsibility of the scientific committee of the International Conference on Computational Science

Keywords: Instagram, geotagged photo, location recommendation, area of interest, geo-social data

1 Introduction
People make photos to memorize moments or meaningful places [1]. For this reason, photo-based social networks, like Instagram, are constantly growing. At present, Instagram has over 600 million users from all over the world¹. One of the prevalent types of Instagram content is photos of visited places, historical buildings and other interesting locations. Such posts are expected to come from someone who sees a place for the first time; however, there is another tendency, to post pictures from a favorite place. There is no doubt that popular places for local citizens and visitors differ [2]. The first question addressed in our research is: how different are the popular locations of tourists and locals?

Due to time limitations, a person plans his or her visit with the help of travel guides, web sites or even specific recommendation systems [3]. Websites and guides contain lists of “must-see” places, but these lists are usually limited to a few places. Thus, a potential visitor ends up tied to a relatively short list of typical locations. Although there are user-based recommendation systems, like TripAdvisor, they usually target their content integrity policies towards tourists². Still, many places are not mentioned in the tour guides but are well known to locals. All that leads to the second question we consider in this paper: can we expand the visible city area for tourists using the information from residents?

To answer the research questions stated above, we propose: (1) an improved method for separating tourists and locals by posting activity; (2) a method for automated classification of places popular among locals, with identification of the most interesting locations.

¹ http://blog.instagram.com/post/154506585127/161215-600million
² https://www.tripadvisor.com/pages/content_integrity_policy.html

1877-0509 © 2017 The Authors. Published by Elsevier B.V. Peer-review under responsibility of the scientific committee of the International Conference on Computational Science 10.1016/j.procs.2017.05.131



Ksenia D. Mukhina et al. / Procedia Computer Science 108C (2017) 2378–2382

2 Problem statement
The problem of developing tourist areas and places of attraction has been widely studied since the 1980s, when the concept of the tourist area cycle of evolution was proposed [4]. Most articles devoted to this topic perform a theoretical analysis of the target area [5] or use statistical methods to analyze tourists [6]. These methods work well for relatively small cases and for particular areas, but with the development of the Internet and photo-sharing networks like Instagram, researchers can now investigate the activity of people in tourist areas on a much larger scale. We can collect a large number of posts from the social network to see whether there are any differences in the behavior of local citizens and tourists.

Figure 1 presents the density map of Instagram photos taken by tourists and residents in Saint Petersburg in 2016. We can see that the clear majority of tourists’ photos are located in the city center, mostly along the main street, Nevsky prospect. Popular places of residents are scattered throughout the city area.

Saint Petersburg was chosen as the target city of the research because it is a cultural and touristic center. In 2016 the city was named the World's Leading Cultural City Destination and Europe's Leading Destination³. Thus, we can say that there is no better place in Russia to study the differences in behavior of tourists and residents.

Figure 1. Density map of Instagram photos in Saint Petersburg

3 Proposed solution
Data crawling. For the analysis of users’ activity on Instagram, we require full profiles of users, which include the user’s information and all of his or her posts, with geolocation where available. At the first step, we collected a sample of posts made in Saint Petersburg during two months of 2016, February and November. These two months were chosen because they represent periods of low tourist activity⁴, which allows us to collect more posts of local users. The crawling of posts was initiated at four points of interest: three parks in the north, east and south of the city and the cultural center of Saint Petersburg, the Hermitage. Instagram entries were collected within a 5 km radius around these points, which covered the major part of the city. After that, we extracted all usernames from the collected dataset and collected full profiles for these users.

Detection of city residents. One of the most common ways to discover city visitors is to check the time periods of posted photos [7]. If someone posts pictures in the target location only in a specific time frame, he or she is considered a tourist. However, using this approach with small time frames would cause tourists with relatively long vacations to be classified as citizens. Since the main idea of our research is to analyze the most popular places among locals, we need a more accurate way to detect city residents. It was decided to extend the described approach with official tourist statistics. According to the administration of Saint Petersburg, the largest number of tourists comes from the European Union (32%). According to Eurostat⁵, the duration of outbound vacations of European citizens mostly does not exceed 14 days (87% of trips). The overall length of vacations and holidays in one year is no longer than 28 days. This results in the following strategy: a person is identified as a tourist if his or her posts in Saint Petersburg during a calendar year can be covered by two 15-day time windows with a gap of at least 30 days between the windows. An illustration of this concept is presented in Figure 2.

³ https://www.worldtravelawards.com/profile-8085-saint-petersburg-committee-for-tourism-development
⁴ http://travel.usnews.com/St_Petersburg_Russia/When_To_Visit/
⁵ http://ec.europa.eu/eurostat/statistics-explained/index.php/File:Trips_made_by_EU28(%C2%B9)_residents_by_duration,_destination_and_purpose,_2014.png
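The time windows strategy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `is_tourist` and its parameters are hypothetical, the "gap" is read as the number of days between two consecutive posts, and the input is assumed to hold one user's geotagged post dates in the city for a single calendar year.

```python
from datetime import date


def is_tourist(post_dates, window_days=15, min_gap_days=30, max_windows=2):
    """Check one calendar year of city posts against the time windows rule.

    post_dates: datetime.date objects of one user's geotagged posts in the city.
    Returns True when all dates fit into at most `max_windows` windows of
    `window_days` days separated by at least `min_gap_days` days.
    """
    days = sorted(set(post_dates))
    if not days:
        raise ValueError("at least one post date is required")

    # Split the timeline into visits wherever two consecutive posts are
    # at least `min_gap_days` apart (assumed reading of the gap condition).
    visits = [[days[0]]]
    for previous, current in zip(days, days[1:]):
        if (current - previous).days >= min_gap_days:
            visits.append([current])
        else:
            visits[-1].append(current)

    if len(visits) > max_windows:
        return False
    # Every visit must fit into one window: its first and last post lie
    # fewer than `window_days` days apart.
    return all((v[-1] - v[0]).days < window_days for v in visits)


two_short_trips = [date(2016, 1, 3), date(2016, 1, 10),
                   date(2016, 7, 1), date(2016, 7, 12)]
year_round_user = [date(2016, month, 5) for month in range(1, 13)]
print(is_tourist(two_short_trips))  # True
print(is_tourist(year_round_user))  # False
```

Splitting at long gaps first and checking the span of each visit afterwards is equivalent to sliding two windows over the year, because any two posts that are 30 or more days apart cannot share a 15-day window.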

Figure 2. Illustration of time windows strategy

Thus, in our analysis, a user is considered a local if he or she meets at least one of the following criteria:
1. The profile description or username contains information that the user is located in Saint Petersburg;
2. The user is not identified as a tourist by the time windows strategy described above.

Despite the aforementioned rules, some visitors could still be classified as locals. This could happen if someone visits the city repeatedly for short stays. But repeat visitors are known to choose favorite places, stay longer and explore their destination in detail [8]. Repeaters are also more likely to live nearby and travel on weekends [9]. Thus, such tourists can fairly be considered locals. In addition, tourists with long vacations would also be identified as locals. During the first days of their stay they would likely visit the most popular tourist places and then start to explore new areas.

Location filtering. Since we are particularly interested in revealing new promising locations, some preliminary data processing is required. The Instagram application allows users to set the location of a photo by typing it. This makes locations hard to distinguish: different places can have similar names, some names have typos, etc. Different places having the same location are merged, and all locations within a 120-meter radius are considered one place with aggregated statistics. After that, we compare the most popular places for both groups, tourists and locals. A place is considered popular if it was visited by at least 100 different users during one calendar year. The top 15 tourists’ places are considered well known and are excluded from the analysis. Places corresponding to large areas, like satellite towns and city districts, and locations related to transport infrastructure, i.e. subway stations and railway stations, are also eliminated from the analysis.

Detection of popular places. On the next step, all locations are divided into clusters by their weekly visiting dynamics. For each day of the week, the total number of users who visited the place was calculated, and for every place the dynamics was normalized to reveal a clear pattern. To group locations with similar dynamics, agglomerative hierarchical clustering was chosen. This method has shown its efficiency in social network analysis [10], and it can be successfully used to cluster time series of equal length [11]. Squared Euclidean distance was used as the distance measure in order to enlarge the distance between dissimilar locations and shorten it for similar ones. The optimal number of clusters is defined by the Dunn index [12]: the index is calculated for every number of clusters from 2 to 30, and the smallest number of clusters that attains the highest index value is chosen. Finally, the top 10% of places from each cluster are taken as the result, i.e. the potential attraction points. The list combines places of different types covering all observed kinds of dynamics, so tourists are able to choose a location according to their own preferences and interests.
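The clustering step and the choice of the number of clusters by the Dunn index can be illustrated with a small pure-Python sketch. Several details here are assumptions, since the text does not specify them: weekly counts are normalized to sum to 1, clusters are merged with complete linkage, and all function names (`normalize`, `agglomerate`, `dunn_index`, `best_k`) are hypothetical. The naive merging loop is only suitable for small examples.

```python
from itertools import combinations


def normalize(week):
    """Scale seven daily visitor counts so that they sum to 1 (assumed scheme)."""
    total = sum(week)
    return [value / total for value in week]


def sq_dist(a, b):
    """Squared Euclidean distance between two weekly profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def agglomerate(points, k):
    """Naive agglomerative clustering (complete linkage, assumed) down to k clusters.

    Returns a list of clusters, each a list of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = max(sq_dist(points[p], points[q])
                    for p in clusters[i] for q in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters


def dunn_index(points, clusters):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    min_separation = min(sq_dist(points[p], points[q])
                         for a, b in combinations(clusters, 2)
                         for p in a for q in b)
    max_diameter = max((sq_dist(points[p], points[q])
                        for c in clusters for p, q in combinations(c, 2)),
                       default=0.0)
    return float("inf") if max_diameter == 0 else min_separation / max_diameter


def best_k(points, k_min=2, k_max=30):
    """Smallest number of clusters that reaches the highest Dunn index."""
    k_max = min(k_max, len(points) - 1)
    if k_max < k_min:
        raise ValueError("not enough points for the requested range")
    scores = {k: dunn_index(points, agglomerate(points, k))
              for k in range(k_min, k_max + 1)}
    top = max(scores.values())
    return min(k for k, score in scores.items() if score == top)


# Toy data: three weekend-heavy and three weekday-heavy places (Mon..Sun counts).
counts = [[1, 1, 1, 1, 1, 10, 10], [2, 2, 2, 2, 2, 18, 22], [1, 1, 1, 1, 2, 9, 11],
          [10, 10, 10, 10, 10, 1, 1], [20, 22, 18, 20, 20, 2, 2], [9, 11, 10, 10, 10, 1, 2]]
profiles = [normalize(week) for week in counts]
print(best_k(profiles))  # 2
```

On real data a library routine for hierarchical clustering would be preferable to this cubic-time loop; the sketch only makes the selection rule for the number of clusters explicit.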

4 Experimental part
The collected dataset contains 529 251 posts with geolocation in 17 921 unique places in Saint Petersburg. 23 596 users were classified as locals. To verify the time windows method, a sample of 150 random profiles was taken from the dataset, and the correctness of the classification as tourist or local was checked manually. The verification showed that if a user is classified as a local, he or she is almost certainly a local (type I error is 6.25%). Meanwhile, some users are marked as tourists because of their low posting activity. This happens mostly in two cases: (1) the user does not use the geolocation functionality, so only a few photos contain geotags; (2) the user tends to post photos from other places, so the majority of photos in the profile are taken outside Saint Petersburg. Thus, the current implementation of the time windows method has certain limitations: it can be used successfully only for active Instagram users, and it can fail to identify frequent travelers. However, our research is focused on the city and its active residents, who constantly explore the city.

The comparison of weekly dynamics across the seasons of the year is presented in Figure 3. During winter and fall, there are many similarities between the profiles of tourists and locals, since people visit Saint Petersburg mostly on weekends. In contrast, in summer tourists tend to stay for a longer period, which results in a higher level of activity during working days, whereas locals post more photos during weekends. The lowest value on Tuesday in the summer and fall periods could be related to the day off in museums and cultural places.
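A day-of-week comparison of this kind can be computed as sketched below. The sketch rests on assumptions that the text does not confirm: activity is measured as the share of posts per weekday, seasons follow calendar months (December to February as winter, and so on), and the function name `weekday_shares` and the group labels are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date

# Assumed mapping of months to seasons.
SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "fall", 10: "fall", 11: "fall"}


def weekday_shares(posts):
    """Share of posts per weekday for every (season, group) pair.

    posts: iterable of (datetime.date, group) pairs, where group is a label
    such as "tourist" or "local".
    Returns {(season, group): [share_monday, ..., share_sunday]}.
    """
    counts = defaultdict(Counter)
    for day, group in posts:
        counts[(SEASONS[day.month], group)][day.weekday()] += 1

    shares = {}
    for key, per_day in counts.items():
        total = sum(per_day.values())
        shares[key] = [per_day[weekday] / total for weekday in range(7)]
    return shares


sample = [(date(2016, 7, 4), "tourist"),   # Monday
          (date(2016, 7, 5), "tourist"),   # Tuesday
          (date(2016, 7, 9), "local"),     # Saturday
          (date(2016, 7, 10), "local")]    # Sunday
result = weekday_shares(sample)
print(result[("summer", "tourist")])  # [0.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
print(result[("summer", "local")])    # [0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.5]
```

Normalizing to shares rather than raw counts keeps the two groups comparable even though locals and tourists produce very different numbers of posts.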

Figure 3. Tourists and locals’ day-of-week dynamics by season.

On the next step, the filtered list of locals’ popular places was analyzed to reveal potential attraction points. Clustering produced 26 clusters with different weekly dynamics. These clusters were combined into four groups (Figure 4). The first group consists of places with a stable high level of activity. The second group corresponds to places with rather high activity on weekends compared to working days. The third group demonstrates an upward trend during working days and a sharp decrease during weekends. The fourth group represents places with a peak of activity on one of the working days and a lower level of posts on weekends.

Figure 4. Groups of day-of-week dynamics’ clusters.

The top 10% of popular places among locals were taken from each cluster. Thus, 44 popular locations were obtained for 2016. The recommended places could be divided into 5 categories: cultural locations (theatres, museums and circuses), restaurants (bars and cafés), interesting city locations (bridges, parts of streets, etc.), parks, and others (creative spaces, studios, etc.).

5 Conclusion and future works
The main contribution of this work is a method for revealing new potential attraction points for tourists in a large city. We presented an extension of tourist and local detection from social media data based on time frames. This approach makes it possible to classify a person without complete information and a full profile. Saint Petersburg was used as an example: top places for visitors and residents were compared, and the results showed that tourists and locals have different activity trends during the year. In summer, tourists are more active during the week compared to other seasons. Conversely, during the winter season tourists and locals have rather similar dynamics, which leads us to the idea that places popular among citizens could be interesting for tourists as well. After the analysis, the list of potentially interesting locations was presented.

However, there are several steps that could enhance these results. One direction for improving this work is to take into consideration social interactions between users, i.e. likes and comments. Pictures with many likes presumably show something interesting to other users, so the place where the photo was taken could be a potential point of interest for tourists. The presence of comments does not guarantee social approval; however, it shows users’ interest. Sentiment analysis of comments could improve the accuracy of post evaluation. A location with a low density of photos could be considered potentially interesting if it causes a strong social response.

Acknowledgements. This research was financially supported by the Ministry of Education and Science of the Russian Federation, Agreement #14.578.21.0196 (03.10.2016). Unique Identification RFMEFI57816X0196.

References
[1] P. Bourdieu and S. Whiteside, “Photography: A middle-brow art,” 1996.
[2] A. Indaco and L. Manovich, “Social Media Inequality: Definition, Measurements, and Application,” Urban Stud. Pract., pp. 1–22, Jul. 2016.
[3] D. Gavalas, C. Konstantopoulos, and K. Mastakas, “Mobile recommender systems in tourism,” vol. 39, pp. 319–333, 2014.
[4] R. W. Butler, “The Concept of a Tourist Area Cycle of Evolution: Implications for Management of Resources,” Can. Geogr., vol. 24, no. 1, pp. 5–12, 1980.
[5] M. Ma and R. Hassink, “An evolutionary perspective on tourism area development,” Ann. Tour. Res., vol. 41, no. December 2015, pp. 89–109, 2013.
[6] M. F. Cracolici and P. Nijkamp, “The attractiveness and competitiveness of tourist destinations: A study of Southern Italian regions,” Tour. Manag., vol. 30, no. 3, pp. 336–344, 2009.
[7] F. Girardin, F. D. Fiore, C. Ratti, and J. Blat, “Leveraging explicitly disclosed location information to understand tourist dynamics: a case study,” J. Locat. Based Serv., vol. 2, no. 1, pp. 41–56, 2008.
[8] J. P. Tiefenbacher, F. A. Day, and J. A. Walton, “Attributes of repeat visitors to small tourist-oriented communities,” Soc. Sci. J., vol. 37, no. 2, pp. 299–308, 2000.
[9] X. (Robert) Li, C. K. Cheng, H. Kim, and J. F. Petrick, “A systematic comparison of first-time and repeat visitors via a two-phase online survey,” Tour. Manag., vol. 29, no. 2, pp. 278–293, 2008.
[10] S. Wasserman and K. Faust, “Social network analysis: Methods and applications,” Cambridge Univ. Press, vol. 1, p. 116, 1994.
[11] T. Warren Liao, “Clustering of time series data - A survey,” Pattern Recognit., vol. 38, no. 11, pp. 1857–1874, 2005.
[12] J. C. Dunn, “Well-Separated Clusters and Optimal Fuzzy Partitions,” J. Cybern., vol. 4, no. 1, pp. 95–104, 1974.
