Proactive and Reactive Carpooling Recommendation System based on Spatiotemporal and Geosocial Data
Ahmed Elbery Dept. of Computer Science Virginia Tech Blacksburg, VA, 24060
[email protected]
Mustafa ElNainay Dept. of Computer and Systems Eng. Alexandria University Alexandria, Egypt 21544
[email protected]
Abstract—In this paper, we present a new carpooling recommendation system whose main objective is to find the best carpool matches and to recommend that individuals join their friends on trips. The proposed system utilizes the users' mobility history and social network information to find carpool matches. It employs a probabilistic model based on a continuous-time Markov chain to model user mobility and to predict future user movements. Moreover, it uses two similarity measures, interest based similarity and friendship based similarity, to find the similarities between users. The interest based similarity uses a weighted bipartite graph between users and places, where the edges are weighted by the term frequency-inverse document frequency. The friendship based similarity uses the common friends as a similarity measure. The proposed system is evaluated using the number of carpool matches that can be found; this number gives an indication of the reduction that can be achieved in vehicular traffic congestion, pollutant emissions, and energy consumption.
Keywords: Carpooling; Recommender System; Social VANET; Geosocial; Spatiotemporal; Similarity Detection; Mobility Modeling; Markov chain.
I. INTRODUCTION
Spatiotemporal and geosocial data open an endless horizon for studying and analyzing human behavior, activities, and interactions. This is a key enabler for enhancing people's lives and making them easier and sometimes more enjoyable. These data can be collected from different sources. For example, online social networks [1] are rich sources of data about users, their interests, and their relationships. Another important source is location-based services, which utilize wireless communication devices and embedded GPS systems to collect information about users and their mobility trajectories. Integrating data from social networks and location-based services potentially facilitates new services and applications. The Vehicular Ad-Hoc Network (VANET) [2] exemplifies such integration and its benefits. A VANET provides a huge amount of information about user mobility patterns and trajectories. Vehicles in transportation networks can benefit from such data to reduce congestion and fuel consumption. In this paper, we use the user mobility information to propose a new carpooling recommendation system based on the Social VANET (S-VANET) framework that was introduced in our previous work [3]. The proposed system attempts to minimize the number of individual trips by finding
Hesham Rakha Dept. of Civil Engineering Virginia Tech, Blacksburg, VA, 24060
[email protected]
carpool matches. Consequently, it will mitigate traffic congestion, the environmental impact of traffic, and travel cost. A main principle behind this carpooling recommendation system is that people prefer to work in communities [4], which means that individuals are self-motivated to accompany others when traveling or visiting specific venues. However, finding the persons who can carpool with or accompany others is a challenging task. The contributions of this paper are as follows:
• Carpooling recommendation system: a new recommendation system is developed to recommend that users join their friends in their travels. The system utilizes two main subsystems: the mobility modeling subsystem and the user similarity detection subsystem.
• Mobility modeling/prediction using a continuous-time Markov chain: we use a continuous-time Markov chain [5] to model user mobility, based on which we can probabilistically predict the user's future movements.
• User similarity detection: an important component of the recommendation system is the similarity detection subsystem. For this purpose, we define two new similarity measures between users. The first is based on user interest and the second is based on user friendship.
The remainder of the paper is organized as follows. The motivation and objective of this work are provided in Section II. A survey of the related work is presented in Section III. The proposed carpooling recommendation system and the techniques used for similarity detection and for user mobility prediction are described in Section IV. The dataset, the data preprocessing, and the implementation details are presented in Section V. The results are discussed in Section VI. The paper is concluded in Section VII.
II. MOTIVATION AND OBJECTIVE
The increase in urban traffic over the last decades has had many consequences, including traffic congestion, longer travel times, environmental pollution, and higher travel costs. All these problems call for studying alternative mobility management methods that can address the high traffic volume. An important and promising method is carpooling. Carpooling is the sharing of a car by multiple commuters who travel on the same route or, at least, share a portion of their routes. Carpooling serves objectives at different levels: individual, organizational, and governmental. From an individual perspective, carpooling reduces travel costs (fuel and toll costs). In addition, it reduces driving stress if commuters share the driving. Moreover, carpooling saves fuel and reduces road traffic congestion and pollutant emissions, which are national objectives that governments address [6, 7]. For organizations, carpooling helps to overcome the shortage of parking spaces, a problem that organizations spend large amounts of money to address [8]. Because of these advantages, many efforts are being made to encourage commuters to carpool. Some of them are initiated by individuals who establish social media groups to find carpool matches among the group members. Because of its simplicity, this method lacks scalability, trustworthiness, and manageability. Another direction, exemplified by [9-11], goes a step further by building centralized systems. However, these systems are reactive to user requests, which means that users have to schedule or search for a carpool match. The expected benefits of carpooling for the national transportation sector have motivated some governments, such as the USA government, to dedicate High Occupancy Vehicle (HOV) lanes [12-14] to vehicles with two or more travelers. In this way, the government provides an incentive that encourages travelers to carpool. However, some studies on the effectiveness of these HOV lanes, such as [13], showed that they have a negative impact on road congestion because they reduce the road capacity. The reason is that the HOV lanes are underutilized most of the time while the lanes adjacent to them are heavily congested. Since people have their own motivations to carpool and share rides, this underutilization of the HOV lanes can be attributed to inefficient carpooling systems and a lack of trustworthiness [15]. The objective of this paper is to propose an automated, proactive, easy-to-use, and efficient carpooling recommendation system.
The proposed system relies on new technologies such as VANET communication, social networks, big data, and data analysis techniques to find the best potential carpool matches. One main difference between the proposed system and current systems is that the proposed system finds user similarities and predicts future user trips based on the mobility history of the users. Based on those predictions and similarities, the system can reveal potential carpooling opportunities.
III. RELATED WORK
In this paper, we utilize the social VANET framework to build the proposed carpooling recommendation system. The social VANET is a new concept that we initially proposed in [3]. Carpooling is an old concept that has been studied since the previous century. For example, in 1975, Kendall studied carpooling and developed a model to predict the maximum potential level of carpooling in an urban area [15]. In 1987, Teal [16] studied carpoolers' characteristics by distinguishing among different types of carpoolers. In 1999, Yang and Huang [17] studied the carpooling and congestion pricing problem on a multilane highway with HOV lanes. Our work in this paper aims to make carpooling easier and more efficient in order to make better use of such HOV lanes. In 2006, Burris and Winn [18] focused on casual carpooling, where the passengers are strangers who form carpooling groups to meet the occupancy requirements. The survey conducted in 2007 by Li et al. [19] showed that the social factor, i.e. the ride sharers' enjoyment, is the second most important factor after cost. Our work is consistent with this conclusion: we use the social friendship among users as a preference when finding carpool matches. In [20], Trasarti et al. proposed a mining methodology to create user mobility profiles and, based on this methodology, a carpooling recommendation system. In this paper, we take a step beyond the user profiles in [20] by utilizing the prediction of user mobility. In [14], Correia and Viegas tried to increase the efficiency of finding carpool matches based on a base trust level in the groups. Our proposed system employs the user friendship network to find the matches. Consequently, we assume mutual trust among friends, and carpool matches between non-friends are given lower preference. Bicocchi and Mamei [21] developed an application capable of finding carpool opportunities based on mobility traces and the clustering of places to find their pertinence to users. In our system, we use two similarity measures: one based on user interest and the other based on friendship. Vanoutrive et al. [22] studied workplace-centric carpooling in Belgium based on three factors: location, organization, and promotion. An important characteristic of the proposed system is its capability to proactively create recommendations. The mobility prediction feature enables the system to trigger the recommendation creation process.
For example, if the user is expected to visit a venue in the near future, the system automatically starts finding recommendations for this visit. Another important characteristic of the system is that it is based on the S-VANET, in which the tracking system can find recommendations in real time based on the vehicle's current location.
IV. THE RECOMMENDATION SYSTEM
The proposed recommendation system is responsible for finding carpool matches in response to either user requests (reactively) or user mobility predictions (proactively). Consequently, there are two types of recommendations: reactive and proactive. The recommendation system has two main modules: the similarity detection module and the mobility modeling and prediction module. In addition, the recommendation module manages the operation of the two former modules.
A. Similarity Detection
The objective of similarity detection is to find users who are similar to a given user and who are therefore interested in visiting the same places. To calculate the similarity, we use a multi-parameter similarity function with two parameters: the user-interest-centric similarity and the friendship-centric similarity.
1) Interest based similarity
To find the similarity between users based on their places of interest, we create a bipartite graph $G(U, P, E)$, shown in Fig. 1. It includes the set of users $U$ and the set of places $P$ as disjoint sets. The existence of an edge $e_{ij} \in E$ indicates that user $i$ visited place $j$ at least once. The edges of the graph are weighted by $w_{ij}$, which is the product of two components: the inverse document frequency of place $j$, $idf_j$, and the interest level $I_{ij}$ of user $i$ in visiting place $j$.
$w_{ij} = I_{ij} \cdot idf_j$ (1)
We define the interest $I_{ij}$ of user $i$ in visiting a given place $j$ as the probability that user $i$ will visit place $j$ in his next visit independently of his previous visits, i.e. if user $i$, independently of his previous visits, selects a place to visit next, the only bias in this selection will be his interest. Thus, $I_{ij}$ can be calculated from the places he visited and the frequency of visiting each place. To find $I_{ij}$ we use (2), which is similar to the work in [25].
$I_{ij} = \frac{n_{ij}}{\sum_{p \in P} n_{ip}}$ (2)
$0 \le I_{ij} \le 1, \quad \sum_{j} I_{ij} = 1$ (3)
where $n_{ij}$ is the number of visits the $i^{th}$ user made to the $j^{th}$ place. This interest level is then multiplied by the normalized inverse document frequency ($idf$) [26], where the users are treated as documents and the places are the terms. The purpose of this weighting is to consider how specific the place is to the user: a place that was visited by a large number of users is very common and gives no information about the interests of the users who visited it, while places that are visited by a limited number of users can be used to identify the interests of those users. Thus, the edge weight can be calculated as follows:
$w_{ij} = \frac{n_{ij}}{\sum_{p \in P} n_{ip}} \cdot \frac{\log(N / d_j)}{\log(N)} = \frac{n_{ij}}{\sum_{p \in P} n_{ip}} \cdot \left(1 - \frac{\log(d_j)}{\log(N)}\right)$ (4)
where $N$ is the total number of users and $d_j$ is the degree of the $j^{th}$ place (i.e. the number of users who visited it). From this bipartite graph, the distance between users is measured as the Euclidean distance between their edge weights. Then, the interest based similarity $S_I(k, l)$ between user $k$ and user $l$ is the reciprocal of the distance between them.
$S_I(k, l) = S_I(l, k) = \frac{1}{\sqrt{\sum_{j=1}^{M} (w_{kj} - w_{lj})^2}}$ (5)
where $M$ is the number of modeled places.
2) Friendship based similarity
Based on the friendship locality, preference locality, and travel locality concepts [4], we can conclude that close friends prefer traveling to the same nearby venues. Consequently, when a user visits a place, it is highly probable that his spatially close friends like to visit the same place. Based on that, the similarity between users' travel preferences can be deduced from their friendship. Thus, for two users $k$ and $l$ having friend sets $F_k$ and $F_l$ respectively, we define the friendship similarity $S_F$ as
$S_F = \frac{\log(|F_k \cap F_l| + 1)}{\log(\sqrt{|F_k| \cdot |F_l|} + 1)}$ (6)
The square root in the denominator guarantees that $0 \le S_F \le 1$. The addition of one is necessary to avoid zeros in the log.
Figure 1. Users-Places Bipartite Graph
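The two similarity measures of this subsection can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the dictionary-based input layout, the treatment of identical weight vectors (infinite similarity), and the value returned for two users without friends are assumptions made only for illustration.

```python
import math

def edge_weights(visits):
    """Edge weights w_ij of Eq. (4).

    visits: {user: {place: visit_count}} (assumed layout).
    Requires at least two users, since log(N) is the normalizer.
    """
    n_users = len(visits)
    degree = {}  # d_j: number of distinct users who visited place j
    for places in visits.values():
        for place in places:
            degree[place] = degree.get(place, 0) + 1
    weights = {}
    for user, places in visits.items():
        total = sum(places.values())  # all visits made by this user
        weights[user] = {
            place: (count / total)
            * (1.0 - math.log(degree[place]) / math.log(n_users))
            for place, count in places.items()
        }
    return weights

def interest_similarity(weights, k, l):
    """Eq. (5): reciprocal of the Euclidean distance between edge weights."""
    # Places visited by neither user contribute zero, so the union suffices.
    places = set(weights[k]) | set(weights[l])
    dist = math.sqrt(sum(
        (weights[k].get(p, 0.0) - weights[l].get(p, 0.0)) ** 2 for p in places
    ))
    # Assumption: identical weight vectors are treated as infinitely similar.
    return math.inf if dist == 0.0 else 1.0 / dist

def friendship_similarity(friends_k, friends_l):
    """Eq. (6) for two sets of friend identifiers."""
    common = len(friends_k & friends_l)
    denom = math.log(math.sqrt(len(friends_k) * len(friends_l)) + 1.0)
    # Assumption: if either user has no friends, the similarity is 0.
    return 0.0 if denom == 0.0 else math.log(common + 1.0) / denom

visits = {
    "u1": {"cafe": 3, "gym": 1},
    "u2": {"cafe": 1, "museum": 1},
    "u3": {"museum": 2},
}
w = edge_weights(visits)
print(w["u1"]["gym"])  # only u1 visited the gym, so idf = 1 and w = 1/4
print(interest_similarity(w, "u1", "u2"))
print(friendship_similarity({"a", "b", "c"}, {"b", "c", "d"}))
```

Note that Eq. (5) is unbounded above, whereas Eq. (6) always lies in [0, 1].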
B. User Modeling and Mobility Prediction
As mentioned in the previous section, the proposed system can operate proactively. This means that it must be able to predict the users' future mobility based on their mobility history. Subsequently, the system can create the appropriate carpooling recommendations either in response to a search request or based on periodic system invocations of the carpooling recommendation process. Mobility modeling and prediction are performed by applying a continuous-time homogeneous Markov chain (CTMC) [5]. In a CTMC, the process X (the user) can be in one of a finite or countable set of states (locations) and has the memoryless property, i.e. the next state depends only on the current state. Two important characteristics of the CTMC are:
1) The CTMC assumes an exponential distribution of transition times. Based on this assumption, the expectation for the future states is derived.
2) Irreducibility of the chain, i.e. there is a route between any two states in the chain. Without this feature, the process may get stuck in one of the states and will not be able to go to any other state.
The CTMC is characterized by the transition rate matrix (Q-matrix).
$Q = \begin{bmatrix} q_{1,1} & q_{1,2} & \dots & q_{1,n} \\ q_{2,1} & q_{2,2} & \dots & q_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ q_{n,1} & q_{n,2} & \dots & q_{n,n} \end{bmatrix}$ (7)
where $q_{i,j}$ is the transition rate from state $i$ to state $j$. Another parameter is $v_i$, $i \in S$, which is the parameter of the exponential distribution of the sojourn time $\tau_i$
$E[\tau_i] = \frac{1}{v_i}, \quad v_i = \sum_{j \ne i} q_{i,j}$ (8)
Furthermore, the state probability $\pi_i(t)$ is defined as the probability that the system is in state $i$ at time $t$
$\pi_i(t) = P\{X(t) = i\}$ (9)
And the probability distribution $\pi(t)$ is the vector of the state probabilities for all the $n$ states.
$\pi(t) = [\pi_1(t), \pi_2(t), \dots, \pi_n(t)]$ (10)
Given the exponential distribution of time, the state probability of an irreducible Markov chain at any time depends only on the current state and the Q-matrix; $\pi(t)$ can be calculated as shown in the following equation.
$\pi(t) = \pi(0) e^{Qt}$ (11)
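As an illustration of Eq. (11), the following Python sketch evaluates $\pi(t) = \pi(0) e^{Qt}$ for a small chain. The paper does not state how the matrix exponential is computed; the sketch uses uniformization (a standard numerical technique) purely as an assumption, and the function name, the tolerance, and the two-state example matrix are hypothetical.

```python
import math

def transient_distribution(Q, pi0, t, tol=1e-12, max_terms=100000):
    """Evaluate pi(t) = pi(0) * exp(Q t) by uniformization.

    Q:   square rate matrix as a list of rows; each row sums to zero.
    pi0: initial state probability vector pi(0).
    Suitable for moderate values of lam * t; exp(-lam * t) underflows
    for very large products.
    """
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n))  # uniformization rate
    if lam == 0.0 or t == 0.0:
        return list(pi0)
    # Discrete-time kernel P = I + Q / lam, whose rows sum to one.
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
         for i in range(n)]
    weight = math.exp(-lam * t)      # Poisson probability of k = 0 jumps
    vec = list(pi0)                  # holds pi(0) * P^k
    result = [weight * v for v in vec]
    covered = weight                 # Poisson mass accumulated so far
    k = 0
    while 1.0 - covered > tol and k < max_terms:
        k += 1
        vec = [sum(vec[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= lam * t / k
        result = [r + weight * v for r, v in zip(result, vec)]
        covered += weight
    return result

# Hypothetical two-state chain: rate 1 from state 0 to 1, rate 2 back.
Q = [[-1.0, 1.0],
     [2.0, -2.0]]
pi_short = transient_distribution(Q, [1.0, 0.0], 0.5)
pi_long_a = transient_distribution(Q, [1.0, 0.0], 20.0)
pi_long_b = transient_distribution(Q, [0.0, 1.0], 20.0)
print(pi_short, pi_long_a, pi_long_b)
```

For this two-state example the closed form is $\pi_0(t) = 2/3 + (1/3)e^{-3t}$ when starting in state 0, and for large $t$ both initial states lead to the same distribution $[2/3, 1/3]$.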
Assuming that user mobility satisfies the memoryless condition, it can be modeled using a CTMC based on this mathematical foundation. The states of the chain represent the different places the user visited, and the time between consecutive transitions is used to create the transition rate matrix (Q-matrix). Therefore, it is important to make sure that the sojourn time $\tau$ is exponentially distributed and that the chain of user states is irreducible. These two issues are discussed in the implementation section.
C. Creating the Recommendation
Whenever a recommendation is needed for a given user who is expected to visit a specific venue at a given time, or for a user who scheduled a visit to a specific place at a particular time, the recommendation module follows the logic in Fig. 2 to find the carpool matches. First, the user's friends and friends of friends are selected. From this set, users whose friendship similarity exceeds a threshold $\delta_F$ are selected. Second, the selected users are filtered by the interest based similarity, i.e. the users whose interest similarity to the given user is higher than a threshold $\delta_I$ are chosen. Finally, the users who are expected to visit the given place in the future are selected, and the recommendations are sent to all or some of them based on the user's permission.
V. DATASET AND IMPLEMENTATION
This section describes the dataset and the implementation details of the proposed recommendation system.
A. The Dataset and Preprocessing
The dataset used in this paper is the Gowalla dataset [27], which is available on the Stanford University website of the Stanford Network Analysis Project (SNAP). The Gowalla dataset has two files: the spatiotemporal data of user check-ins and the user friendship data. The spatiotemporal data consist of over 6 million check-ins, as shown in Fig. 3. Each check-in has the following fields: User_Id, Check-in_Time, Latitude, Longitude, and Location_Id. The friendship file is an edge list of 950,327 edges among 196,591 users. In this paper, we apply our CTMC model to two areas. The first is the New York City area shown in Fig. 4. The second is the total USA area, including New York City itself. The reason behind this selection is to compare the system performance for different area sizes and different levels of coherence between places as well as between users. We noticed two main issues in the data. First, some places appear with different location IDs, with either the same or slightly different coordinates. To overcome this problem, the area of interest is divided into squares of 0.5 x 0.5 km, and all venues in each square are considered the same place. The second problem is that some users created many check-ins at the same place within a very small time interval. Thus, we set a minimum inter-check-in interval, and if a user creates more than one check-in within this interval, all of them are considered one check-in. We define this interval to be one hour.
Figure 2. Creating Recommendations
Figure 3. Total Gowalla Dataset Check-ins
Figure 4. Check-ins in New York Area
B. Similarity Implementation
For the area of interest, the check-ins are extracted from the total check-ins. Then, the user-place frequency table is generated to calculate the interest based similarities. Similarly, the edge list is used to calculate the friendship based similarity. In this phase, another issue was discovered in the data: the existence of non-active users (i.e. users who have a very small number of check-ins). Such users do not provide enough information to calculate the similarity parameters or to model and predict their mobility, so they should not be considered by the system. To extract active users, we define a minimum number of check-ins for a user to be considered active; this threshold is set to 50 check-ins in the user's history. For the New York area, there are 144,728 check-ins by 7,195 different users, of whom only 566 are considered active. For each active user, the real check-ins are determined based on two factors, the real place ID and the inter-check-in interval, as described in the previous subsection. Then the weights of the edges are calculated using equation (4). To find the friendship based similarity, we use the complete edge list file, not only the edges of the users who visited the selected areas. Thus, the friendship based similarity does not depend on the area of interest.
C. User Mobility Modeling and Prediction
To correctly apply the CTMC to model the user check-ins, we assume that the next user check-in depends only on the current user state. We also need to make sure that the data satisfy the two characteristics of the CTMC, which are:
• The sojourn time $\tau$ is exponentially distributed for each user,
• The chain of user states is irreducible.
For the first condition, the inter-check-in intervals of each individual user must be exponentially distributed. Therefore, we statistically analyzed the transition intervals of individual users' check-ins at two levels, namely user check-ins in the New York area and user check-ins in the overall USA area including New York. In this analysis, we used the chi-square goodness-of-fit test to compare each user's inter-check-in intervals to generated exponential data with the same mean. The test shows that, for the New York area, 72.08% of the users fit the exponential distribution at a significance level of 5%. This ratio increases to 81.1% when the significance level is decreased to only 1%. For the check-ins in the overall USA, the fitting ratios are 42.34% and 50% for significance levels of 5% and 1%, respectively. Because of these distribution fitting ratios, we expect the models of small areas to give better accuracy. The second important issue is the irreducibility of the Markov chain, which simply means that the chain states form a connected graph (i.e. from any state there is a path to any other state). If the chain is not irreducible, the system may get stuck in a state and be unable to move to any other state, which is not suitable for human mobility, where a person can move to any place and does not get stuck in a specific place. Fig. 5 shows the trajectory of a given user and how it is converted to a state diagram (we selected a user with a small number of check-ins for ease of representation). Fig. 5-a is the spatial projection on the map. It shows that check-ins 1 and 3 are in different locations, but they are located in the same square area, so they are given the same location state A. Similarly, check-ins 4 and 5 are assigned the same state C.
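The mapping of check-ins to square cells and the merging of rapid repeated check-ins (Section V-A) can be sketched in Python as follows. This is only one possible reading of the preprocessing: the kilometres-per-degree conversion, the reference latitude parameter, the rule of comparing each check-in with the last kept one in the same cell, and all function names are assumptions, not details given in the paper.

```python
import math
from datetime import datetime, timedelta

CELL_KM = 0.5                  # side of a square cell (Section V-A)
MIN_GAP = timedelta(hours=1)   # minimum inter-check-in interval
KM_PER_DEG_LAT = 111.32        # assumed spherical-Earth approximation

def cell_id(lat, lon, ref_lat):
    """Map a coordinate to the (row, col) index of its 0.5 x 0.5 km square.

    ref_lat is a hypothetical reference latitude of the area of interest,
    used so that all columns have the same width in kilometres.
    """
    km_per_deg_lon = KM_PER_DEG_LAT * math.cos(math.radians(ref_lat))
    row = math.floor(lat * KM_PER_DEG_LAT / CELL_KM)
    col = math.floor(lon * km_per_deg_lon / CELL_KM)
    return (row, col)

def merge_checkins(checkins, ref_lat):
    """Collapse repeated check-ins of one user.

    checkins: time-sorted list of (time, lat, lon) tuples.
    A check-in is dropped when it falls in the same cell as the last kept
    check-in and follows it by less than MIN_GAP (assumed interpretation).
    Returns a list of (time, cell) tuples.
    """
    merged = []
    for time, lat, lon in checkins:
        cell = cell_id(lat, lon, ref_lat)
        if merged and merged[-1][1] == cell and time - merged[-1][0] < MIN_GAP:
            continue
        merged.append((time, cell))
    return merged

t0 = datetime(2010, 1, 1, 9, 0)
raw = [
    (t0, 40.75, -73.99),
    (t0 + timedelta(minutes=10), 40.75, -73.99),  # repeat within one hour
    (t0 + timedelta(hours=3), 40.75, -73.99),     # same cell, later visit
    (t0 + timedelta(hours=4), 41.75, -73.99),     # far away, different cell
]
states = merge_checkins(raw, ref_lat=40.75)
print(len(states))
```

In this example the second check-in is absorbed into the first, so three states remain.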
Fig. 5-b then shows the state transition diagram for this trajectory. It is clear that the chain is not irreducible. For example, if the user reaches state C, D, or E, he cannot return to states A or B. This chain state diagram should be converted to an irreducible chain. To do that, we use the idea of a virtual home state. The main idea is that, if a user visited two different places and the interval between these check-ins is long enough (e.g. two check-ins on two different days), then it is intuitive that he/she returned home between the two visits. We define this interval to be 12 hours. Thus, if the duration between two check-ins is greater than 12 hours, the home state is inserted between the states of these two check-ins. In addition, the initial and the final state of any user trajectory should be the home state. In this way, adding the home state creates a path between any two states in the chain state diagram and converts any reducible chain to an irreducible one. This can be shown as follows: before adding the home state, each state in the trajectory is connected to at least two states (the previous and the next states), except the first and the last ones; adding the home state then creates, at least, a path between the first and last ones, forming a cyclic directed graph, which is connected. Fig. 6 shows the chain after adding the new home state (H). It shows that adding the home state made the chain irreducible. It is worth mentioning that the home is not necessarily the user's home address; it may be a hotel or any place in which the user sleeps or takes a rest. Thus, this state is not a real location; it is a virtual state that represents a rest between two visits. Using this final state diagram, we create the transition rate matrix (Q-matrix) that represents the user mobility model. Based on this model, we can calculate the state probability for the user, which is the probability of being in each state in the future. The transition rates of the Q-matrix are the inverse of the interval between two consecutive check-ins. With these conditions satisfied, the only assumption we use in this model is the memoryless property of the chain, that is, the next state depends only on the current state regardless of the previous states. However, this assumption can be relaxed in future work by using higher order versions of the Markov chain.
Figure 5. Trajectory to Chain State: (a) Trajectory Map, (b) Trajectory Reducible State Diagram
Figure 6. Final Irreducible State Diagram
VI. SIMULATION RESULTS
This section introduces the simulation results of the proposed system. First, we study the effectiveness of the prediction models. Then, the effect of some system parameters on the number of carpool matches found is investigated. We implemented the system as described in the previous sections with the parameters shown in Table I. The minimum inter-check-in interval is 1 hour. Thus, if a user made two or more check-ins within one hour, they are considered only one check-in. The maximum inter-check-in interval is 12 hours. Thus, if the duration between two consecutive check-ins of the same user is greater than this threshold, the home state is inserted between them. To generate a unique place ID for each check-in, the area of interest is divided into squares of 0.5 x 0.5 km. To calculate the state probability distribution, we use a time resolution of 6 hours, which gives reasonable accuracy and computation cost.
TABLE I. CARPOOLING RECOMMENDATION SYSTEM PARAMETERS
Parameter | Value/s
Minimum inter-check-in interval | 1 hour
Maximum inter-check-in interval | 12 hours
Prediction time resolution | 6 hours
Active user check-in count threshold | 50 check-ins
Spatial division area | 0.5 x 0.5 km
$\delta_F$ | 0.4
$\delta_I$ | 0, 0.4, 0.5, 0.6, 0.7
The user mobility prediction is evaluated based on the prediction ranking of the places that the user visits in the test dataset. In this context, we also compare the CTMC based prediction of visited places to a simple ratio based prediction. The evaluation of the carpooling recommendation system is based on the average number of carpool matches found per trip. As this number increases, the number of saved individual trips is expected to increase. However, increasing this number above a specific threshold will not produce more savings because of the vehicle capacity. The methodology is described in detail in the next subsections.
A. Prediction Model Evaluation
Fig. 7 shows the user state probability distribution for a sample user (we selected a user with a small number of places of interest) and its variation over time for different initial states. The number on each graph in Fig. 7 is the initial state for that graph. Fig. 7 shows that the state probability distribution changes significantly over short prediction intervals (the transient phase) and then converges to steady state values. It also shows that the steady state probabilities are independent of the initial state, which is consistent with Markov chain theory. Fig. 7 also shows the significant effect of the initial state on the state probabilities in the transient phase, where the state probabilities differ significantly for different initial states. To check the effectiveness of the prediction, we use the check-ins within both the New York area alone and the whole USA. We temporally divide the check-ins into two parts: the history check-ins (training dataset) and the future check-ins (test dataset). The former is used to create the user CTMC model, while the latter is used to check the correctness of the prediction. The dataset covers a time span from Feb. 2009 to Oct. 2010. We use Aug. 31, 2010 as the end of the history data. Then, for each of the modeled users, we select the check-ins he/she made in the test dataset (future dataset) to places in his/her model (i.e. places he/she visited in his/her history). The predictions of these test visits are then calculated. For the New York area, the future dataset includes 9,975 check-ins, of which only 3,330 are predictable check-ins (predictable check-ins are those made by modeled users to modeled places). For the total USA check-ins, there are 70,314 future check-ins and 17,738 predictable check-ins. For each of these predictable visits, we calculate its prediction, and then each place is ranked based on its probability (i.e. rank 1 is the place with the highest probability, the next is ranked 2, etc.).
Letting $r$ be the rank of the visited place and $N$ the total number of places for the given user, the displacement ratio of this prediction is $(r - 1)/N$, which represents the normalized number of incorrect ranks made by the system. Fig. 8 shows the ranking displacement ratio versus the prediction ratio for both areas. It shows that, for the USA check-in predictions, about 34% of the visited places are ranked within the highest 10% of the ranked places and about 70% are ranked within the highest 40%. For example, if the user has 10 places in his history, then at any future time the 4 highest ranked places will cover 70% of his future visits. For the New York area, only 26% of the visited venues are ranked within the highest 10% of the ranked places. This comparison shows that, despite the better distribution fitting of the New York users' inter-check-in intervals, the prediction accuracy is lower than that for the total USA check-ins. The reason is that the larger the area, the larger the number of modeled user states and the better the accuracy of predicting the user's check-ins.
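The displacement ratio and the coverage figures quoted above can be computed as in the following Python sketch. The function names, the example probabilities, the tie-breaking by insertion order, and the strict comparison used to decide whether a rank falls within the top fraction are illustrative assumptions.

```python
def displacement_ratio(state_probs, visited):
    """Return (r - 1) / N for the visited place.

    state_probs: {place: predicted probability} for one user at one time.
    r is the 1-based rank of `visited` when places are sorted by
    descending probability (ties keep insertion order, an assumption).
    """
    ranked = sorted(state_probs, key=state_probs.get, reverse=True)
    rank = ranked.index(visited) + 1
    return (rank - 1) / len(ranked)

def coverage_at(ratios, fraction):
    """Share of predictions whose displacement ratio is below `fraction`,
    i.e. whose visited place is ranked within the top `fraction` of places."""
    return sum(1 for d in ratios if d < fraction) / len(ratios)

# Hypothetical predicted distribution for one user.
probs = {"home": 0.5, "cafe": 0.3, "gym": 0.15, "museum": 0.05}
ratios = [displacement_ratio(probs, place)
          for place in ["home", "gym", "cafe", "home"]]
print(ratios)                    # [0.0, 0.5, 0.25, 0.0]
print(coverage_at(ratios, 0.1))  # 0.5
```

Plotting `coverage_at` against the fraction for all predictable check-ins would give a curve of the kind reported in Fig. 8.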
Figure 7. State Probability Distribution for a Sample User (x-axis is the time in days). The number on each curve is the initial state (P0), where 0 is the virtual home location
Next, for the USA check-ins, we compare our prediction model to the simple hitting ratio based prediction where the probability of visiting a place in future equals to the number of user visits to this place in the past divided by the total number of user visits; = . Fig. 9 compares the average CTMC prediction to the ratio based prediction for the visited places in the test dataset. It is clear that the CTMC prediction is much better than the visiting ratio based prediction. B.
User Recommendations The dataset does not include the home location, we utilize on the locality of preference and locality of friendship concepts [4]; that is, users who are friends and interested in visiting the same places are, most probably, living close to each other. Based on this concept we do not need to find the home address of the users. Instead, we will use the similarity of interest and the friendship based similarity as measures of closeness in addition to mobility history as shown in Fig. 2. Another important issue is the inverse relation between the
Figure 8. Prediction Predicted Ratio vs Ranking Displacement.
Figure 9: 200 Sample-Window Average Prediction, CTMC vs. Visiting Ratio for the Total USA Future Visits
locality concepts and the size of the geographic area, i.e., the wider the area, the weaker the locality concepts and thus the lower the similarity. We therefore apply the proposed system only to the small New York area, where the locality concepts are better preserved. To evaluate the system, we find the carpooling matchings for each trip in the test dataset (future trips). To create the recommendations for a given trip, we first find the user_id for this trip, the trip time, and the place visited. For this user, we find the other users who are direct friends or friends of friends (FoF) who exceed the friendship similarity threshold $T_F$. Then, from those users, we select the users who pass the interest similarity threshold $T_I$. For each of them, we find his/her probability of visiting the trip place around the given time. If this probability exceeds the threshold $P_{th}$, a recommendation is created. This threshold is defined as $P_{th} = \alpha \times \bar{P}$, where $\bar{P}$ is the user's average visiting probability to any place. In this way, $P_{th}$ is a multiple of the user's average visiting probability, and $\alpha$ represents the system accuracy measure. The lower the value of $\alpha$, the lower the accuracy of the prediction and thus the lower the accuracy of the recommendation, and vice versa. We run the system for $\alpha$ values from 1 to 10. Fig. 10 shows the resulting average number of matchings for each future trip versus $\alpha$ for different values of $T_I$; $T_I = 0$ means that the interest based similarity filtering is disabled. Based on this figure and the user recommendation acceptance rate, we can choose the parameters that fit the system well. For example, assuming a 50% acceptance rate and a vehicle occupancy of 4 passengers, maximizing the saving requires 3 accepted recommendations and thus an average of 6 recommendations per trip. This average can be achieved at $\alpha = 1.5$ and $T_I = 0.4$. From Fig. 10, we can conclude that $\alpha$ between 1.5 and 2.5 and $T_I$ between 0.4 and 0.5 will produce between 1.5 and 6 average matchings per trip.
Figure 10: Average Matching for every Future Trip
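The filtering steps described in this subsection can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Candidate` record, the function names, and the parameter names `t_f` (friendship similarity threshold), `t_i` (interest similarity threshold), and `alpha` (multiplier of the average visiting probability) are hypothetical, and the sketch assumes that the average visiting probability is taken over the candidate user's own history and that similarity scores are non-negative.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class Candidate:
    """Hypothetical record for one potential carpool partner."""
    user_id: str
    is_direct_friend: bool
    friendship_similarity: float   # common-friends score, used for friends of friends
    interest_similarity: float     # interest based similarity to the trip owner
    visit_probability: float       # predicted probability of visiting the trip place near the trip time
    avg_visit_probability: float   # candidate's average visiting probability over all places


def recommend(candidates, t_f, t_i, alpha):
    """Return the ids of candidates that pass the three filters in order."""
    matches = []
    for c in candidates:
        # 1) Social filter: direct friends pass, FoF must exceed the friendship threshold.
        if not (c.is_direct_friend or c.friendship_similarity > t_f):
            continue
        # 2) Interest filter: with t_i = 0 no candidate is rejected (filter disabled).
        if c.interest_similarity < t_i:
            continue
        # 3) Mobility filter: the probability threshold is alpha times the average.
        p_th = alpha * c.avg_visit_probability
        if c.visit_probability > p_th:
            matches.append(c.user_id)
    return matches


def recommendations_needed(occupancy, acceptance_rate):
    """Recommendations per trip needed to fill every seat besides the driver's."""
    free_seats = occupancy - 1
    return ceil(free_seats / acceptance_rate)


candidates = [
    Candidate("a", True, 0.0, 0.6, 0.3, 0.1),
    Candidate("b", False, 0.5, 0.2, 0.2, 0.1),
    Candidate("c", False, 0.1, 0.9, 0.9, 0.1),
    Candidate("d", True, 0.0, 0.9, 0.1, 0.1),
]
```

With these made-up candidates, raising `t_i` from 0 to 0.4 removes candidate `b`, which mirrors how the interest filter reduces the number of matchings per trip. `recommendations_needed(4, 0.5)` reproduces the arithmetic of the example in the text: 3 free seats at a 50% acceptance rate call for 6 recommendations.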
VII. SUMMARY AND FUTURE WORK
In this paper, a carpooling recommendation system is proposed to help users plan their trips and to recommend friends for carpooling. The proposed recommender system utilizes spatiotemporal and geosocial data to create the carpooling recommendations. It has two main components: a similarity detection component and a mobility prediction component. The similarity detection component uses the user mobility history, friendship, and friend-of-friend (FoF) relations to find similar users. For mobility prediction, we use a continuous time Markov chain to model user mobility and predict future movements. The proposed prediction model shows reasonable prediction accuracy: for about 35% of the predictions, the ranking of the future visit stays within 10% of all the places in the user's history. The evaluation also shows the importance of the system parameters, which should be tuned according to the required number of recommendations.
In future work, we plan to study the effect of different parameters. Places also have many attributes and relations, such as the relations between places and their closeness to each other. Consequently, it is worthwhile to consider these attributes and study their impact on the similarity and prediction components. Another important direction is to study the computational complexity of the algorithms, which is essential for applying these models in real time applications. Moreover, the work in this paper is built on the assumption that user mobility is memoryless. This assumption might not be completely valid. Thus, it is important to relax it by considering higher order Markov chains (second and third order) and to compare their accuracy to the results in this work.
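To make the continuous time Markov chain prediction mentioned in this summary more concrete, the following sketch computes the state probability distribution p(t) = p(0) exp(Qt) for a generic generator matrix Q using uniformization. It is an illustration under stated assumptions, not the model fitted in this paper: the function name, the two-state example, and its transition rates are hypothetical, and the way the rates are estimated from check-in data is not reproduced here.

```python
import math


def transient_distribution(q, p0, t, tol=1e-12, max_terms=10_000):
    """State distribution of a CTMC at time t, computed by uniformization.

    q  : generator matrix (rows sum to zero, off-diagonal entries are rates)
    p0 : initial state distribution
    """
    n = len(q)
    rate = max(-q[i][i] for i in range(n))  # uniformization rate: largest exit rate
    if rate == 0.0 or t == 0.0:
        return list(p0)
    # Embedded discrete-time chain: P = I + Q / rate
    p = [[(1.0 if i == j else 0.0) + q[i][j] / rate for j in range(n)]
         for i in range(n)]
    weight = math.exp(-rate * t)  # Poisson probability of k = 0 jumps
    vec = list(p0)                # holds p0 * P^k
    result = [weight * v for v in vec]
    accumulated = weight
    k = 0
    # Add Poisson-weighted terms until the remaining probability mass is negligible.
    while 1.0 - accumulated > tol and k < max_terms:
        k += 1
        vec = [sum(vec[i] * p[i][j] for i in range(n)) for j in range(n)]
        weight *= rate * t / k
        accumulated += weight
        result = [r + weight * v for r, v in zip(result, vec)]
    return result


# Hypothetical two-state example: state 0 is a virtual home location, state 1 is a place.
Q = [[-2.0, 2.0],
     [1.0, -1.0]]
dist = transient_distribution(Q, [1.0, 0.0], 0.5)
```

For a two-state chain the result can be checked against the closed form p_0(t) = b/(a+b) + a/(a+b) exp(-(a+b)t), with a = 2 and b = 1 in this example. For very large values of rate times t the initial Poisson weight underflows, so a production implementation would need a more careful summation.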
REFERENCES
[1] L. Garton, C. Haythornthwaite, and B. Wellman, "Studying Online Social Networks," Journal of Computer-Mediated Communication, vol. 3, 1997.
[2] H. Hartenstein and K. Laberteaux, VANET: Vehicular Applications and Inter-Networking Technologies, vol. 1. John Wiley & Sons, 2009.
[3] A. Elbery, M. ElNainay, F. Chen, C.-T. Lu, and J. Kendall, "A carpooling recommendation system based on social VANET and geosocial data," in Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Orlando, Florida, 2013.
[4] J. Bao, Y. Zheng, and M. F. Mokbel, "Location-based and preference-aware recommendation using sparse geo-social networking data," in Proceedings of the 20th International Conference on Advances in Geographic Information Systems, Redondo Beach, California, 2012.
[5] A. Aziz, K. Sanwal, V. Singhal, and R. Brayton, "Model-checking continuous-time Markov chains," ACM Trans. Comput. Logic, vol. 1, pp. 162-170, 2000.
[6] U.S. Dept. of Energy, "Annual Energy Outlook 2008, With Projections to 2030," Energy Inf. Admin., Washington, DC, Rep. DOE/EIA-0383(2008), Jun. 2008.
[7] U.S. Environ. Protection Agency, "Inventory of U.S. greenhouse gas emissions and sinks: 1990–2006," Washington, DC, Apr. 15, 2008.
[8] M. Caliskan, D. Graupner, and M. Mauve, "Decentralized discovery of free parking places," in Proceedings of the 3rd International Workshop on Vehicular Ad Hoc Networks, Los Angeles, CA, USA, 2006.
[9] http://www.connectingcommuters.org/carpool/. (Accessed December 2015).
[10] https://carmacarpool.com/. (Accessed December 2015).
[11] https://www.blablacar.co.uk/. (Accessed December 2015).
[12] M. Menendez and C. F. Daganzo, "Effects of HOV lanes on freeway bottlenecks," Transportation Research Part B: Methodological, vol. 41, pp. 809-822, 2007.
[13] J. Kwon and P. Varaiya, "Effectiveness of California’s High Occupancy Vehicle (HOV) system," Transportation Research Part C: Emerging Technologies, vol. 16, pp. 98-115, 2008.
[14] A. Guin, M. Hunter, and R. Guensler, "Analysis of Reduction in Effective Capacities of High-Occupancy Vehicle Lanes Related to Traffic Behavior," Transportation Research Record: Journal of the Transportation Research Board, vol. 2065, pp. 47-53, 2008.
[15] G. Correia and J. M. Viegas, "Carpooling and carpool clubs: Clarifying concepts and assessing value enhancement possibilities through a Stated Preference web survey in Lisbon, Portugal," Transportation Research Part A: Policy and Practice, vol. 45, pp. 81-90, 2011.
[16] D. C. Kendall, "Carpooling: Status and potential," U.S. Dept. of Transportation, Transportation Systems Center, Office of Systems Research and Analysis, Kendall Square, Cambridge, MA 02142, Rep. PB-244609; DOT-TSC-OST-75-23, 1975.
[17] R. F. Teal, "Carpooling: Who, how and why," Transportation Research Part A: General, vol. 21, pp. 203-214, 1987.
[18] H. Yang and H.-J. Huang, "Carpooling and congestion pricing in a multilane highway with high-occupancy-vehicle lanes," Transportation Research Part A: Policy and Practice, vol. 33, pp. 139-155, 1999.
[19] M. W. Burris and J. R. Winn, "Houston — Casual Carpool Passenger Characteristics," Journal of Public Transportation, vol. 9, 2006.
[20] J. Li, P. Embry, S. Mattingly, K. Sadabadi, I. Rasmidatta, and M. Burris, "Who Chooses to Carpool and Why?: Examination of Texas Carpoolers," Transportation Research Record: Journal of the Transportation Research Board, vol. 2021, pp. 110-117, 2007.
[21] R. Trasarti, F. Pinelli, M. Nanni, and F. Giannotti, "Mining mobility user profiles for car pooling," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, California, USA, 2011.
[22] N. Bicocchi and M. Mamei, "Investigating ride sharing opportunities through mobility data analysis," Pervasive and Mobile Computing, vol. 14, pp. 83-94, 2014.
[23] T. Vanoutrive, E. Van De Vijver, L. Van Malderen, B. Jourquin, I. Thomas, A. Verhetsel, et al., "What determines carpooling to workplaces in Belgium: location, organisation, or promotion?," Journal of Transport Geography, vol. 22, pp. 77-86, 2012.
[24] B. Young-Ji, J. Young Seon, S. M. Easa, and B. Joonsang, "Feasibility analysis of transportation applications based on APIs of social network services," in 2013 8th International Conference for Internet Technology and Secured Transactions (ICITST), 2013, pp. 59-64.
[25] M. Damashek, "Gauging Similarity with n-Grams: Language-Independent Categorization of Text," Science, vol. 267, pp. 843-848, February 10, 1995.
[26] K. Church and W. Gale, "Inverse Document Frequency (IDF): A Measure of Deviations from Poisson," in Natural Language Processing Using Very Large Corpora, vol. 11, S. Armstrong, K. Church, P. Isabelle, S. Manzi, E. Tzoukermann, and D. Yarowsky, Eds. Springer Netherlands, 1999, pp. 283-295.
[27] E. Cho, S. A. Myers, and J. Leskovec, "Friendship and mobility: user movement in location-based social networks," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, California, USA, 2011.