Toward Dynamic Path Recommender System Based on Social Network Data

Faizan Ur Rehman#§, Ahmed Lbath#, Md. Abdur Rahman§, Saleh Basalamah§, Imad Afyouni⋆, Akhlaq Ahmad$, Syed Osama Hussain§

# Department of Computer Science, LIG, University of Grenoble Alpes, France
§ KACST GIS Technology Innovation Center, Umm Al-Qura University, Makkah, KSA
⋆ Department of Computer Science, American University of the Middle East, Kuwait
College of Computer and Information Systems, Umm Al-Qura University, Makkah, KSA
$ KICT, International Islamic University, Kuala Lumpur, Malaysia

{fsrehman,marahman,smbasalamah,aajee}@uqu.edu.sa, [email protected], [email protected], [email protected]

ABSTRACT

With the advancement of mobile technologies, more and more people are connected to social networks such as Facebook and Twitter. Social networks allow users to share a wide variety of information, including spatio-temporal data, either publicly or within their community of interest in real time. In particular, by analyzing social network data streams and then validating the content, one can extract knowledge about dynamic road conditions for a given city. This paper presents a dynamic path recommender system that helps users find optimized routes in dynamic environments based on social network data. The system collects geo-tagged social network data from which relevant knowledge is extracted to identify constraints such as accidents, weather conditions, and congestion. Moreover, by continuously collecting the geo-tagged data of moving users, the system can also identify the traffic flow as well as road conditions. As soon as the system identifies and validates a given constraint, it can notify affected users and recommend an adapted route from their current position to the destination. A proof of concept of the system is shown through three example scenarios.

Keywords

Path recommender, Social networks, Geo-tagged tweets

1. INTRODUCTION

Social networking allows users to write, read, and comment on social posts. Many users especially like to use social networks while traveling by bus, metro, or any other vehicle. Social networking sites such as Facebook and Twitter are among the world’s most visited sites. Because social sensors provide a rich source of data, end users, industry, and researchers show enormous interest in them. Thanks to smartphones equipped with high-precision GPS sensors and high-speed Internet access, a significant share of the social network content that people share nowadays is geo-tagged. For example, out of the millions of tweets posted every day, a large number are geo-tagged. Analyzing this geo-tagged data reveals where and when people share their content, so that different advanced services can be provided.

Categories and Subject Descriptors
H.2.8 [Database Applications]: Spatial databases and GIS; H.2.4 [Information Systems]: Query Processing

General Terms
Design, Algorithms

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
IWCTS ’14, November 04-07 2014, Dallas/Fort Worth, TX, USA
Copyright 2014 ACM 978-1-4503-3138-8/14/11 ...$15.00
http://dx.doi.org/10.1145/2674918.2674927

Figure 1: Dynamic road condition based on knowledge extracted from geo-tagged social network data

Another aspect of this geo-tagged content is that it provides knowledge about dynamic road conditions. Dynamic road conditions include the traffic flow and constraints such as roads that are blocked, flooded, or under construction, outdoor advertising, etc. These dynamic road conditions can then be used to alert mobile users by recommending an optimized path. For instance, Figure 1 illustrates a snapshot of dynamic road conditions, including the traffic flow based on smartphone usage data as well as constraints collected from social network data. Events shown in Figure 1 are considered as constraints, and the average speed on a road is calculated from the frequency of the social posts made while the user is moving.

Efficiently extracting social network data and making it available for real-time computation is nontrivial, let alone recognizing evolving situations from massive social network streams. In this context, we present a “social-aware dynamic path recommender system” that continuously analyzes geo-tagged social network data streams in order to recommend adaptive paths to users. The system analyzes the keywords of each stream and validates the data by cross-checking the text and the location attached to it. The system then extracts knowledge from the data streams by breaking them into tokens and checking the tokens against a predefined list of constraint keywords. The system also stores the geo-tagged data of continuously interacting users to find the trajectory flows those users are following. This helps in identifying the live traffic condition and the average time to cross an intersection, and can also be used to analyze more relevant information about road dynamics.

Whenever a user requests an “optimal” path from a given source to a destination, our proposed system processes this query by: 1) searching geo-tagged social network data and extracting relevant information; and then 2) finding the optimized path from source to destination based on the extracted constraints and the current situation of the road network. The system dynamically monitors the user’s path and adapts the current route when a dynamic constraint occurs on the remaining path. The user is then notified of the constraint, and the system recommends a new optimized path.

The remainder of this paper is organized as follows. Section 2 discusses related work. Section 3 discusses the modeling approach. Section 4 presents an overview of the proposed architecture. Section 5 highlights the implementation. Section 6 shows some example scenarios, while Section 7 draws conclusions and outlines future challenges.

2. RELATED WORK

Dynamic road conditions are of key interest to many areas of research, including real-time recommendation systems. Leveraging volunteered field data collection, recommendation systems are gaining popularity for endorsing information related to flights, hotels, restaurants, and preference-aware location-based services using current and historical data [7, 10]. Route recommender systems such as CrowdPlanner [15] and DroidOppPathFinder [3] lack the ability to detect dynamic road network conditions. On the other hand, social networks have been used in major alarming situations such as earthquakes and for local event detection. The authors of [4] and [5] have used social networks to broadcast an alarming situation to all potential users with an accuracy of more than 65% using spatio-temporal analysis. Crowdsourcing frameworks collect users’ locations through smartphones for different purposes [12]. To ensure good quality of the generated content, a large number of participants is essential. The use of smartphones with multi-sensing abilities helps in generating very rich sensory data [8, 6, 16]. GPS sensors help in finding the location of the users. Our proposed system collects social network location data not only to broadcast live traffic conditions but also to generate alternative paths in case of a blockage, accident, flood, or any other alarming situation.

Figure 2: Proposed architecture

3. MODELING APPROACH

Many geo-tagged tweets carry the information that there is an accident or another traffic constraint at a given location, for example, that a road is closed due to an accident. Many other users may have to pass through this location. If they arrive there without prior knowledge of the accident, heavy congestion and wasted time may result. In our work, we propose a scheme that detects nearby users heading toward that particular location, informs them about the road blockage, and offers them alternative optimized paths.

This section presents a grid-based modeling approach, using a road accident as a use case. The grid-based model is used to update optimized paths for affected queries based on road conditions. The approach divides the world into regular square cells, which helps to find the affected users for a given query with a minimal response time. Each user, constraint, and trajectory is tagged with a particular cellId. If a user $u_0$ finds that a road is closed because of an accident, the other users $\{u_1, \ldots, u_n\}$ who are moving across the area have to be alerted to the accident location. The information provided by the user through social networks is communicated to the other users $\{u_1, \ldots, u_n\}$, together with suggested alternative paths, so that they can avoid routes through that location. This helps to avoid traffic congestion. Assume that user $u_0$ sends a geo-tagged tweet stating that the road at his/her current location is closed due to an accident; the system reads that tweet and derives information about the road accident, for example at location X,

$X = \langle lat, lng, t, txt, image \rangle$ (1)

where lat and lng are the coordinates, t is the current time, txt is the textual information about the accident, and image is the image shared by the user.

Our aim is to propagate this information to other users approaching the point X. The earlier this information is sent, the less congestion there will be on the roads. We propose that the whole world is divided into a number of regular square cells. The grid size is fixed in advance based on the spatial resolution parameter, which is application-dependent. In our current setting, each cell has a size of 1 degree of latitude by 1 degree of longitude. For instance, if the grid starts from (0 lat, 0 long), the point A of the first cell has coordinates (0,0), the point B is (0,1), the point C is (1,0), and the point D is (1,1). The number of cells increases with a higher spatial resolution (smaller cells) and decreases with larger cells. If, for example, we consider a cell whose vertices differ by one degree in both longitude and latitude, then one side of the square cell is about 111.2 km (near the equator; the east-west extent shrinks toward the poles). The location X, for example, is within a cell labeled currentGrid (Figure 3). For now, we consider only the eight neighboring cells, which limits the number of users whom we have to inform and provide with alternative paths. Once a user leaves the current cell and enters a neighboring cell, the system automatically updates the grid label. We define the response time as the time the system takes, after reading a tweet at location X, to identify the affected users. By considering only the neighboring cells $N_1, \ldots, N_8$, we aim to reduce the response time $t_{resp}$,

$t_{resp} = t\langle n_u, A_{cell} \rangle$ (2)

where $t_{resp}$ is the response time to find the affected users, $n_u$ is the number of users, and $A_{cell}$ is the area of the cell. The response time thus depends on the area of the cell and on the number of active users in that cell; if the number of users is high and the cell is large, it takes longer to identify the affected users. By limiting the search to the users within the neighboring cells $N_1, \ldots, N_8$, the query processor carries a lighter load and responds faster. This is elaborated further in Section 5.
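The grid lookup described above can be illustrated with a minimal sketch. It assumes a row-major numbering of 1-degree cells starting at (-90 lat, -180 long); this numbering, the constant names, and the function names `cell_id` and `neighbor_cells` are hypothetical choices for illustration and are not taken from the paper.

```python
import math

CELL_SIZE_DEG = 1.0                  # assumed resolution: 1 degree x 1 degree
COLS = int(360 / CELL_SIZE_DEG)      # number of cells along the longitude axis
ROWS = int(180 / CELL_SIZE_DEG)      # number of cells along the latitude axis


def cell_id(lat, lng):
    """Map a coordinate to a row-major cell id (hypothetical numbering)."""
    # Clamp the top row so that lat == 90 still falls inside the grid.
    row = min(int(math.floor((lat + 90.0) / CELL_SIZE_DEG)), ROWS - 1)
    # Wrap the column so that lng == 180 maps to the same cell as lng == -180.
    col = int(math.floor((lng + 180.0) / CELL_SIZE_DEG)) % COLS
    return row * COLS + col


def neighbor_cells(cid):
    """Return the ids of the (up to eight) cells surrounding cell `cid`.

    Columns wrap around the antimeridian; rows stop at the poles, so
    cells in the first or last row have only five neighbors.
    """
    row, col = divmod(cid, COLS)
    result = []
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            if d_row == 0 and d_col == 0:
                continue  # skip the current cell itself
            r = row + d_row
            if 0 <= r < ROWS:
                result.append(r * COLS + (col + d_col) % COLS)
    return result
```

With this scheme, a constraint at location X only requires inspecting the users tagged with `cell_id(X)` or one of its `neighbor_cells`, rather than all active users.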
In our approach, users’ locations, constraint locations, and trajectories are each tagged with a specific cell:

$Cell_i = \langle Users_{1,2,3,\ldots,U}, Constraints_{1,2,3,\ldots,C}, Trajectories_{1,2,3,\ldots,T} \rangle$ (3)

We can easily find the cell ID by passing the location of the user or of the constraint. Based on the current cellId, we can also get the information of the neighboring cells:

    Select cellId From cellIndex
    Where lat Between minimumLat And maximumLat
    And lon Between minimumLon And maximumLon;

    Select N1, N2, N3, N4, N5, N6, N7, N8 From cellNeighbor
    Where currentCell = cellId;

The management of users’ locations, traffic constraints, and user trajectories is discussed in the following section.
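As a rough illustration of the two lookups, the sketch below runs equivalent queries against an in-memory SQLite database. The table and column names follow the queries in the text; the schema itself, the toy 3 x 3 grid, and the helper functions are assumptions. Half-open ranges are used instead of Between so that a point on a shared cell border matches exactly one cell.

```python
import sqlite3


def make_demo_db():
    """Create a toy 3 x 3 grid of 1-degree cells with south-west corner (20, 38)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cellIndex (
            cellId INTEGER PRIMARY KEY,
            minimumLat REAL, maximumLat REAL,
            minimumLon REAL, maximumLon REAL);
        CREATE TABLE cellNeighbor (
            currentCell INTEGER PRIMARY KEY,
            N1 INTEGER, N2 INTEGER, N3 INTEGER, N4 INTEGER,
            N5 INTEGER, N6 INTEGER, N7 INTEGER, N8 INTEGER);
    """)
    for row in range(3):
        for col in range(3):
            conn.execute(
                "INSERT INTO cellIndex VALUES (?, ?, ?, ?, ?)",
                (row * 3 + col, 20 + row, 21 + row, 38 + col, 39 + col))
    # Only the centre cell (id 4) has all eight neighbors inside the toy grid.
    conn.execute("INSERT INTO cellNeighbor VALUES (4, 0, 1, 2, 3, 5, 6, 7, 8)")
    return conn


def find_cell(conn, lat, lon):
    """Return the id of the cell containing (lat, lon), or None."""
    row = conn.execute(
        "SELECT cellId FROM cellIndex "
        "WHERE ? >= minimumLat AND ? < maximumLat "
        "AND ? >= minimumLon AND ? < maximumLon",
        (lat, lat, lon, lon)).fetchone()
    return row[0] if row else None


def find_neighbors(conn, cell):
    """Return the stored neighbor ids N1..N8 of a cell, or an empty list."""
    row = conn.execute(
        "SELECT N1, N2, N3, N4, N5, N6, N7, N8 "
        "FROM cellNeighbor WHERE currentCell = ?", (cell,)).fetchone()
    return list(row) if row else []
```

Storing the neighbor ids in a table, as the text does, trades a little storage for a single indexed lookup per constraint.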

4. PROPOSED ARCHITECTURE

We present a Dynamic Path Recommender System that receives social network data streams and then analyzes and validates those streams in order to extract relevant information for path recommendation. The extracted knowledge is used to identify the traffic flow and different types of constraints on road networks, which helps the system to recommend the optimized path. Figure 2 shows an overview of our proposed system architecture with its salient components. The server side consists of a Constraint Aware Query Processor, a Constraints and Trajectories repository, a Geo-tagged Social Network Data Analyzer, and an Open Street Map server. The front-end of the system is a client application that communicates with the back-end through an HTTP REST API, which passes and receives parameters in JSON format. Our client application is used to pass the source and destination as input to the Constraint-Aware Query Processor and to show the dynamic optimized path on the map. These components are discussed in detail in the following sections.

4.1 Constraint Aware Query Processor

The Constraint Aware Query Processor recommends the optimized path based on dynamic road networks. It keeps track of all active path queries, starting and destination points for a given user, the current user location, and the remaining path for a given query. Upon receiving any constraint from the repository, this component detects the constraint type and location. It then updates the particular edge and its corresponding value of speed and time. All the users who might be affected by that constraint are then notified by the system through a push notification. Optimized and constraints-aware paths are then computed based on the current location of the corresponding users. The system also takes the current situation of the dynamic road network into consideration for any new request issued by a different user. The Constraint-Aware Query Processor also uses knowledge of road conditions received from the trajectory repository such as the current traffic flow, and the intersection crossing time for recommending optimal paths.

4.2 Constraints and Trajectories

The Constraints and Trajectories repository stores all the events that are extracted from social networking sites. The system extracts a given constraint, notifies the Constraint-Aware Query Processor, and stores the constraint in the database. It also stores the trajectories of users who are continuously interacting with social networking sites. Based on trajectory data, the system can find the vehicle speed and estimate the traffic flow as well as the average waiting time at an intersection. As suggested in [9, 13, 14], our system stores the movement of each user as a quadruple ⟨tid, uid, eid, t⟩, where tid is the trajectory identifier, uid is the user identifier, eid is the edge identifier, and t is the time instant. The system retrieves data from the users who are continuously interacting with and feeding social networks. It computes a user’s average speed from the last time stamp and position and the current time stamp and position. Handling the high arrival rates of data from social networking sites is a big challenge, which can be addressed on the basis of [2, 11] by using an in-memory pyramid index with bulk insertion and deletion operations.
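A minimal sketch of the speed computation follows, assuming each sample is a (lat, lng, Unix time in seconds) triple and that the straight-line great-circle distance is an acceptable stand-in for the distance travelled along the road. The haversine formula and the function names are illustrative choices, not taken from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two coordinates in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def average_speed_kmh(previous, current):
    """Average speed between two (lat, lng, unix_seconds) samples.

    Returns None when the time stamps do not allow a speed estimate.
    """
    elapsed_s = current[2] - previous[2]
    if elapsed_s <= 0:
        return None
    distance_km = haversine_km(previous[0], previous[1], current[0], current[1])
    return distance_km / (elapsed_s / 3600.0)
```

Because geo-tagged posts arrive irregularly, such an estimate is only as good as the spacing of the samples; a real deployment would likely map-match the positions to road edges first.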

4.3 Geo-tagged Social Network Data Analyzer

The Geo-tagged Social Network Data Analyzer is a Java-based crawler that runs continuously and collects geo-tagged social data. The collected data is a very rich source of information, including the location, name, user profile, followers, language, city, and country for a given tweet. Whenever the system receives a new data stream, the Geo-tagged Social Network Data Analyzer processes this stream by: 1) collecting geo-tagged data, 2) analyzing the keywords of the stream and validating it by cross-checking the values attached to it (e.g., “Stuck in accident on 3rd Ring Road”), 3) extracting knowledge from the social data by breaking the data streams into tokens and comparing them to a predefined keyword list, and 4) identifying the constraints and the traffic flow through the continuously moving users who are interacting with the social networking sites. The knowledge extracted in terms of constraints and/or trajectories is stored in the Constraints and Trajectories repository.
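The token-matching step can be illustrated with a small sketch. The keyword list, the constraint labels, and the tweet dictionary layout (`text`, `lat`, `lng`) are all hypothetical, since the paper does not publish its keyword list or data format; the paper's own analyzer is written in Java, and Python is used here only for brevity.

```python
import re

# Hypothetical keyword list: token -> constraint type.
CONSTRAINT_KEYWORDS = {
    "accident": "accident",
    "crash": "accident",
    "flood": "flooding",
    "flooding": "flooding",
    "construction": "construction",
    "closed": "road_closed",
    "blocked": "road_closed",
    "congestion": "congestion",
    "jam": "congestion",
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_constraints(tweet):
    """Return the sorted constraint types mentioned in a geo-tagged tweet.

    Tweets without coordinates are ignored, because a constraint
    without a location cannot be placed on the road network.
    """
    if tweet.get("lat") is None or tweet.get("lng") is None:
        return []
    found = {CONSTRAINT_KEYWORDS[token]
             for token in tokenize(tweet["text"])
             if token in CONSTRAINT_KEYWORDS}
    return sorted(found)
```

Plain keyword matching is easy to fool (negations, sarcasm, unrelated uses of "jam"), which is why the text pairs it with a validation step that cross-checks the attached values.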

Figure 3: Grid Approach N1-N8 as neighboring cells with respect to a current cell

5. IMPLEMENTATION

To validate our approach, we developed a prototype that uses Twitter data as the social media source. The front-end is a web-based application (it could also be a smartphone application) that passes the queries for recommending optimized paths. Users who are connected to the social network provide updates about the road network through their geo-tagged posts on Twitter. We use Amazon Web Services (AWS) in the back-end. As we are dealing with a large crowd, during peak time AWS can scale up vertically and horizontally by sharing the load across multiple instances based on the latency of the requests. The back-end comprises an Amazon Web Services EC2 auto-scaling query handler server, a geo-tagged social network crawler server, a dedicated database server, and an Open Street Map database. Figure 4 shows the architecture of the back-end. Currently, we use only one EC2 c3.4xlarge machine to collect geo-tagged Twitter data through the Twitter streaming API, and only for Saudi Arabia, because Twitter provides only 1% of the data free of charge. Later, we plan to move the geo-tagged crawler into a cluster in which the world is divided into zones and each machine receives the data for its configured zone only; this will be used to handle the large flow of data. Once the data is received, it is stored on a db.m3.large RDS machine that provides automatic recovery, multiple reads, and automatic backups. This machine stores the data of all the constraints and trajectories. We used our grid-based approach, as shown in Figure 3, for storing the constraints and trajectories. For now, we have divided the world into a grid in which every cell spans 1 degree of latitude by 1 degree of longitude; that is, each side of a square cell is about 111.2 km and its diagonal is about 157.2 km (near the equator). Each cell has a unique id. Each cell has eight neighboring cells (Figure 3), and the ids of the neighboring cells are also stored with each cell in the database. The Open Street Map component contains planet dump data that is converted into PostGIS datasets by using the osm2pgsql tool. This data is used as our road network data.
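As a quick check of the cell dimensions quoted above, the following worked calculation assumes a spherical Earth with mean radius $R = 6371$ km and treats the cell as a flat square for the diagonal; both are simplifying assumptions, not statements from the paper.

\[
s = R \cdot \frac{\pi}{180} \approx 6371 \times 0.0174533 \approx 111.2\ \text{km},
\qquad
d \approx s\sqrt{2} \approx 111.2 \times 1.4142 \approx 157.26\ \text{km}.
\]

The side length holds for a degree of latitude everywhere, but for a degree of longitude only at the equator. At latitude $\varphi$ the east-west side shrinks to $s \cos\varphi$; for Makkah ($\varphi \approx 21.4^\circ$) this gives roughly $111.2 \times 0.931 \approx 103.5$ km, so the cells there are slightly narrower than they are tall.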

Figure 4: Back-end architecture of the system

Mapnik and mod_tile are also used to generate and render the map. The Constraint Aware Query Processor is the main component of our back-end and runs on the Amazon auto-scaling feature. As the number of users increases, the number of machine instances is increased and the load is shared among them. The component receives the queries from the front-end and stores them in the query table in the form of a tuple ⟨qId, uId, sNode, cNode, dNode, curCellId, rId⟩, where qId is a query Id, uId is a user Id, sNode is a source node Id, cNode is a current node Id, dNode is a destination node Id, curCellId is a current cell Id, and rId is a route Id. The route Id refers to the complete list of node Ids on the path, which is stored in a separate route table. The component fetches the road network data from the OSM database and the live road situation from Amazon RDS, and recommends the optimized path to the affected users in the following way:

1) The Constraint Aware Query Processor receives a constraint from RDS and finds its cellId based on constraintLatitude and constraintLongitude.
2) Find all eight neighboring cells of the current cell.
3) Get the list of active users of all nine cell Ids, i.e., the affected cell and its eight neighboring cells.
4) Update the query table and route table for all the users of the affected cells.
5) Identify the affected users, i.e., those whose remaining path contains the affected edge.
6) Send a notification to the affected users about the changes in the road network.
7) Recommend a new optimized path to all the affected users.

The component also updates the query table whenever a user moves from one cell to another. The Constraint Aware Query Processor thus knows all the active users and constraints/trajectories inside a particular cell at any given time.
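The filtering in steps 3 and 5 can be sketched as below. The `ActiveQuery` record is a simplified, hypothetical stand-in for a row of the query table joined with its route (only uId, curCellId, and the remaining node list are kept), and the cell ids in the example are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class ActiveQuery:
    q_id: int
    u_id: int
    cur_cell_id: int
    remaining_path: list  # node ids from the current node to the destination


def affected_users(queries, constraint_cell, neighbor_ids, affected_edge):
    """Return the ids of users whose remaining path uses the affected edge.

    Only queries located in the constraint cell or one of its
    neighbors are inspected (the nine-cell filter).
    """
    candidate_cells = {constraint_cell, *neighbor_ids}
    hits = []
    for query in queries:
        if query.cur_cell_id not in candidate_cells:
            continue  # user is too far away to be considered
        # Consecutive node pairs of the remaining path are its edges.
        edges = zip(query.remaining_path, query.remaining_path[1:])
        if tuple(affected_edge) in edges:
            hits.append(query.u_id)
    return hits
```

For instance, with a constraint on edge (2, 3) in cell 4, a user in cell 4 whose remaining path is [1, 2, 3] is reported, a user in the same cell travelling [1, 5, 3] is not, and a user in a distant cell is never inspected. The last case shows the trade-off of the nine-cell filter: it bounds the work per constraint, but a far-away user whose route passes the constraint is only notified after entering a neighboring cell.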

6. EXAMPLE SCENARIO - USE CASES

The example scenarios are based on data that the Geo-tagged Social Network Data crawler collects from Twitter for Saudi Arabia. According to [1], Saudi Arabia ranks first in Twitter usage, and its tweet volume has grown by more than 300% since last year, reaching an average of 150 million tweets per month. In Twitter, there are two types of tweets: one without location data, and one with exact location data in terms of latitude and longitude, which we call “geo-tagged tweets”. Apart from the tweet and its location, the data is very rich and contains information about place, time zone, and much more, which we use for basic validation. Our system currently collects only geo-tagged tweets, extracts knowledge from them, and uses that knowledge to recommend the best optimized path. We demonstrate the proof of concept through the following three use case scenarios. These use cases are based on the knowledge extracted from geo-tagged tweets.

Figure 5: System recommending optimized path as soon as it receives constraint from tweets
(a) User selects source and destination
(b) System shows the path and User starts traveling
(c) Dynamic constraint regarding the segment(s) on the path that the user is supposed to follow is updated from the real-time analysis of Twitter data
(d) User gets notification along with recommended optimized path based on the dynamic constraint

Use case 1: In Figure 5(a), a user has chosen a source A and a destination B. Figure 5(b) shows the interface that generates the path from A to B, and the user starts her journey from A. While the user travels from A to B, the system extracts knowledge from geo-tagged tweets and identifies potential constraints on that path, as shown in Figure 5(c). In that case, the system notifies the user about the type of constraint and its location. In Figure 5(d), the system recommends the optimized path that avoids the identified constraint. The notification tells the user whether the constraint is an accident, an outdoor advertisement, construction work, or another type of constraint.

Use case 2: Many users prefer to use social networks while traveling by bus, metro, or any other vehicle, which provides us with spatio-temporal data. The system continuously analyzes the geo-tagged tweets that active users feed to the Twitter service. Based on such data, it displays the trajectories of the users who follow a specific path, as highlighted in Figure 1. The thickness of each edge is based on the number of tweets that we receive from that trajectory. A thin edge means that fewer users are interacting with the social networking site from that region. No line means that the system does not have any user interacting with social networking sites from that region. This use case is important for identifying the current average speed of the traffic as well as the average crossing time of an intersection. The notification shown in Figure 1 enables us to determine the busiest route and the fastest route. Currently, the trajectory use case has been tested on simulation data; statistics about the frequency and accuracy of gathered live data are to be considered in future work.

Use case 3: The system has prior information about a given constraint through geo-tagged tweets. The user sends a query from a given source to a destination; the system computes and displays the path and notifies the user about the detected constraint. The system then suggests a new constraint-aware optimized path. In this use case, the user has the option to follow the path with constraints or to follow the recommended path.

7. CONCLUSION AND FUTURE WORK

This paper presents a system that recommends constraint-aware optimized paths based on geo-tagged social networking data. The system extracts knowledge from social sensing in terms of constraints, traffic flow, and the average crossing time of an intersection, and recommends the best optimized path over dynamic road networks. The time-stamped trajectory flow data can be used for statistical analysis in the future. The idea is currently implemented using a static grid, but in the future we plan to generate dynamic grids (with grid cells of different sizes) based on events, time, and type of area, which will help us to further minimize the response time for identifying the affected users. In addition, we plan to use physical sensors alongside social sensor data for recommending optimized paths over dynamic road networks. We will also work on the quality of the generated path before recommending it to the user, by mining and validating social sensor data on the one hand and by improving the system through a learning process on the other hand. We are also working on handling massive real-time microblog data, i.e., streaming data arriving at a high rate. Finally, we will implement a calibration model to find the best-suited frequency for the acquisition of streaming data.

8. ACKNOWLEDGEMENTS

This project was fully supported by the NSTIP strategic technologies program (11-INF1700-10, 13-INF-2455-10, and 11-INF1703-10) in the Kingdom of Saudi Arabia. We gratefully acknowledge the useful suggestions from Prof. Walid G. Aref of Purdue University. We would also like to thank GISTIC and the Advance Media Laboratory of Umm Al-Qura University, Saudi Arabia, for providing the resources.

9. REFERENCES

[1] The state of social media in saudi arabia 2013. http://www.thesocialclinic.com/the-state-of-socialmedia-in-saudi-arabia-2013/, January, 2014.
[2] W. G. Aref and H. Samet. Efficient processing of window queries in the pyramid data structure. In Proceedings of the Ninth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 265–272, 1990.
[3] V. Arnaboldi, M. Conti, F. Delmastro, G. Minutiello, and L. Ricci. Droidopppathfinder: A context and social-aware path recommender system based on opportunistic sensing. In World of Wireless, Mobile and Multimedia Networks (WoWMoM), 2013 IEEE 14th International Symposium and Workshops, pages 1–3, June 2013.
[4] M. Avvenuti, S. Cresci, M. N. La Polla, A. Marchetti, and M. Tesconi. Earthquake emergency management by social sensing. In Pervasive Computing and Communications Workshops (PERCOM Workshops), 2014 IEEE International Conference on, pages 587–592, March 2014.
[5] A. Boettcher and D. Lee. Eventradar: A real-time local event detection scheme using twitter stream. In Green Computing and Communications (GreenCom), IEEE International Conference, pages 358–367, November 2012.
[6] G. Chatzimilioudis, A. Konstantinidis, C. Laoudias, and D. Zeinalipour-Yazti. Crowdsourcing with smartphones. Internet Computing, IEEE, 16(5):36–44, September 2012.
[7] K. Kabassi. Review: Personalizing recommendations for tourists. Telemat. Inf., 27(1):51–66, February 2010.
[8] S. Kanhere. Participatory sensing: Crowdsourcing data from mobile smartphones in urban spaces. In Mobile Data Management (MDM), 2011 12th IEEE International Conference, pages 3–6, June 2011.
[9] B. Krogh, O. Andersen, E. Lewis-Kelham, N. Pelekis, Y. Theodoridis, and K. Torp. Trajectory based traffic analysis. In Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 536–539, November 2013.
[10] L. Liu, J. Xu, S. S. Liao, and H. Chen. A real-time personalized route recommendation system for self-drive tourists based on vehicle to vehicle communication. Expert Syst. Appl., 41(7), June 2014.
[11] A. Magdy, M. F. Mokbel, S. Elnikety, S. Nath, and Y. He. Indexing of network constrained moving objects. In IEEE International Conference on Data Engineering (ICDE), 2014.
[12] M. Nagarajan, K. Gomadam, A. P. Sheth, A. Ranabahu, R. Mutharaju, and A. Jadhav. Spatio-temporal-thematic analysis of citizen sensor data: Challenges and experiences. In Proceedings of the 10th International Conference on Web Information Systems Engineering, pages 539–553, 2009.
[13] D. Pfoser and C. S. Jensen. Indexing of network constrained moving objects. In Proceedings of the 11th ACM International Symposium on Advances in Geographic Information Systems, pages 25–32, 2003.
[14] I. Sandu Popa, K. Zeitouni, V. Oria, D. Barth, and S. Vial. Indexing in-network trajectory flows. The VLDB Journal, 20(5):643–669, 2011.
[15] H. Su, K. Zheng, J. Huang, H. Jeung, L. Chen, and X. Zhou. Crowdplanner: A crowd-based route recommendation system. In Data Engineering (ICDE), IEEE 30th International Conference, pages 1144–1155, March 2014.
[16] D. Yang, G. Xue, X. Fang, and J. Tang. Crowdsourcing to smartphones: Incentive mechanism design for mobile phone sensing. In Proceedings of the 18th Annual International Conference on Mobile Computing and Networking, number 12, pages 173–184, 2012.