International Journal of
Geo-Information Article
Agent-Based Modeling of Taxi Behavior Simulation with Probe Vehicle Data Saurav Ranjit 1, * 1 2 3
*
ID
, Apichon Witayangkurn 2 , Masahiko Nagai 3 and Ryosuke Shibasaki 2
Institute of Industrial Science, The University of Tokyo, 4-6-1, Komaba, Meguro-ku, Tokyo 153-8505, Japan Center for Spatial Information Science, The University of Tokyo, 5-1-5, Kashiwanoha, Kashiwa-shi, Chiba 277-8568, Japan;
[email protected] (A.W.);
[email protected] (R.S.) Graduate School of Sciences and Technology for Innovation, Yamaguchi University, 2-16-1, Tokiwadai, Ube, Yamaguchi 755-8611, Japan;
[email protected] Correspondence:
[email protected] or
[email protected]
Received: 10 March 2018; Accepted: 7 May 2018; Published: 8 May 2018
Abstract: Taxi behavior is a spatial–temporal dynamic process involving discrete time dependent events, such as customer pick-up, customer drop-off, cruising, and parking. Simulation models, which are a simplification of a real-world system, can help understand the effects of change of such dynamic behavior. In this paper, agent-based modeling and simulation is proposed, that describes the dynamic action of an agent, i.e., taxi, governed by behavior rules and properties, which emulate the taxi behavior. Taxi behavior simulations are fundamentally done for optimizing the service level for both taxi drivers as well as passengers. Moreover, simulation techniques, as such, could be applied to another field of application as well, where obtaining real raw data are somewhat difficult due to privacy issues, such as human mobility data or call detail record data. This paper describes the development of an agent-based simulation model which is based on multiple input parameters (taxi stay point cluster; trip information (origin and destination); taxi demand information; free taxi movement; and network travel time) that were derived from taxi probe GPS data. As such, agent’s parameters were mapped into grid network, and the road network, for which the grid network was used as a base for query/search/retrieval of taxi agent’s parameters, while the actual movement of taxi agents was on the road network with routing and interpolation. The results obtained from the simulated taxi agent data and real taxi data showed a significant level of similarity of different taxi behavior, such as trip generation; trip time; trip distance as well as trip occupancy, based on its distribution. As for efficient data handling, a distributed computing platform for large-scale data was used for extracting taxi agent parameter from the probe data by utilizing both spatial and non-spatial indexing technique. Keywords: agent-based modeling and simulation; origin destination; taxi demand; taxi free movement; index and search; big data; distributed computing
1. Introduction Taxi behavior is characterized by the dynamic discrete time dependent events involving customer pick-up, customer drop-off, cruising, and parking within the spatial and temporal domain. Simulation models which are a simplification of a real-world system, can help understand effects of change of such dynamic behavior. In this regard, agent-based simulation and modeling, in which each taxi behaves as an agent, can capture such dynamic behavior through reconstructing complex patterns by decomposing complex systems down to the level of single agents that are administrated by sets of behavior rules [1]. The advantage of agent-based modeling is that, rather than modeling the entire system with a single equation, the entire system is modeled with the collection of autonomous taxi
ISPRS Int. J. Geo-Inf. 2018, 7, 177; doi:10.3390/ijgi7050177
www.mdpi.com/journal/ijgi
ISPRS Int. J. Geo-Inf. 2018, 7, 177
2 of 24
agent with rules governing them, which makes complex individual agent behave more naturally [2]. In this way, agent-based simulation and modeling can highlight the effect of a change in taxi services and its impact to driver’s income profitability through optimizing parameters (number of trips, passenger waiting time) derived from simulation. As an example, what will be the impact on taxi behavior service when the number of agents i.e., taxi is increased to the region of low taxi demand or decreased to the region of high taxi demand. Understanding such causality could help better management of taxi fleets with regards to the operational cost as well as improve taxi driver’s income. Moreover, recently, many big cities, such as London and New York, have plans to adopt electric taxis [3], and understanding discrete taxi behavior through agent-based modeling could help optimize locations for charging stations, which are crucial for such electric vehicles. As taxis services are operational throughout the city, spatial and temporal information from these vehicles can be an asset for governing different aspects of urban management. Information, as such, could contribute mainly to helping make better decision-making processes at both government and local level. Having said this, the constant advancement of collecting moving trajectory data in space and time has opened up the possibility of a wide range of study in the field of spatial information science [4]. One of the primary technologies for retrieval of information of various traffic data is from stationary equipment, like loop detectors, for automatic vehicle identification. However, they are limited to specific sections of road. On the contrary, a probe car, also known as a probe vehicle or floating car, utilizes the running vehicles to gather various traffic information, and has been an emerging ITS technology for modeling vehicle behavior [5–7]. Big cities, like New York and Beijing, have taxis already equipped with GPS sensors that collects spatial and temporal data to a data center to be processed to extract traffic information [8]. The taxi driver mobility intelligence is an essential factor to maximize both profit and reliability within every possible scenario, and the knowledge about the service can be an advantage for the driver [9]. However, to understand such stochastic dynamics of taxi behavior, micro-level simulation models are required, which can be further analyzed for optimization of taxi services by adjusting parameters like demand, supply, or altering dispatching algorithm [10]. In this paper, agent-based modeling and simulation (ABMS) was implemented, for which in recent years, has been seen in many areas of application, such as flow evacuation, traffic, and customer flow management [2]. Agent-based modeling and simulation describes the dynamic action of an entity i.e., taxi agent governed by behavior rule and properties, similar to the work presented in [6,11,12], to emulate the taxi behavior in Bangkok, Thailand. The contributions of this paper are summarized as follows:
• • •
Proposed a taxi agent regarding spatial and temporal domain based on a stay point cluster of probe GPS data and a kernel density of its timestamp. Formulated a concept of free taxi movement based on the movement direction of the taxi, which was introduced for searching passengers. Developed an agent-based simulation model which is based on multiple parameters (taxi stay point cluster; trip information (origin and destination); taxi demand information; free taxi movement and network travel time) that were derived from probe GPS taxi data. As such, agent’s parameters were mapped into a grid network and the road network, for which the grid network was used as a base for query/search/retrieval of taxi agent’s parameters, while the actual movement of taxi agents was on the road network, with routing and interpolation.
The motivation of taxi behavior simulation modeling is to optimize taxi service operation, which would be the subject of future study, through an increased number of passenger trips, making drivers wait a less amount of time to get their next passenger, and making more extended passenger trips, as well as determine optimum working time based on the spatial and temporal domain. However, to identifying and evaluating such optimizing parameters, knowing real taxi behavior is a must. The proposed agent-based simulation and modeling recreates the real taxi behavior from which
ISPRS Int. J. Geo-Inf. 2018, 7, 177
3 of 24
optimizing parameters could be derived, that would improve the taxi service for both driver, regarding monetary profit, as well as for passenger, regarding service level of the taxi. Also, spatial data, as such, probe GPS taxi data, with its ubiquitous properties, are enormous, and in most cases, deemed confidential. In such cases, obtaining raw spatial data is somewhat complicated. However, simulation and modeling techniques proposed in the study could essentially recreate such spatial data, with secondary data derived, and with properties as similar with the real data. In this regard, such simulation and modeling techniques are not only limited to vehicle behavior, but also could be implemented in simulating human mobility behavior from GPS or call detail record data. 2. Literature Review In computer modeling, the term “model” describes the abstract or simplified representation of a real world that is already present or planned for the future. The simulation model is typically defined as a mathematical process or an algorithm that depends on various input parameters, which, when processed with mathematical expressions, will result in one or more than one output, encapsulating the behavior and performance of a system in real-world scenarios [11,13]. Taxi service simulation is a dynamic process involving changing demand and supply, as well as urban traffic environment, which suggests stochastic behavior of taxi services that govern the movement, as well as the distribution of taxis [10,14]. Taxi customer bilateral searching and meeting behavior in a network was proposed in [15], which considered stochastic micro-searching behavior of both taxis and customers when they are searching for each other based on customer origin-destination (OD). The model featured location variation in the level of taxi services and stochastic microscopic searching behavior, such that the taxi searched for passenger locally in the network that incorporated Markov chain approach as a route for which transition probability or the link choice probability was specified by the customer pick-up rate within the network. An hourly zone-based origin-destination matrix with the occupied vehicle was developed for evaluating the taxi service behavior, which was then implemented for evaluating time-based taxi demand and supply concerning a given location [7]. A probabilistic based model for time-dependent taxi behavior on a road segment, as well as parking space, was devised for taxi passenger recommendation, in which the probability of picking up a passenger was estimated when the taxi went for a specific parking space [16]. The model was primarily a recommendation system used for suggesting the taxi driver with a location, towards which they would pick up a passenger. Moreover, [17] proposed passenger-finding strategies based on large real-world taxi data which utilized two passenger-finding strategies which were looking or waiting for a passenger that was analyzed using average pick-up number over the given period and location. The model focus was also predicting potential passenger for the event before pick-up and after drop-off only. A time-dependent taxi behaviors model was proposed which incorporated a taxi picking up, dropping off, cruising, and parking system for both taxi drivers and passengers. The model was also primarily a recommendation system that was developed considering the queue length at parking places, along with day type and weather condition. The model provided a number of top parking places along with routes to them, given the current location and time of the taxi driver or a passenger [8]. Time-dependent logit based search models were proposed using global positioning data from an urban taxi, in which profit per unit time was used as the factor characterizing taxi drivers’ search behavior [18]. A cell-based local customer search behavior was implemented for understanding vacant taxi behavior using a cell/grid-based approach which showed customer search decisions were significantly affected by the probability of successfully picking up a customer along the search route [19]. The model was further improved by introducing discrete choice behavior representing taxi search behavior of taxi customers for hailing vacant taxis on the street, proposed by [20], which adopted a multinomial logit approach to model the preference of taxi customers of hailing vacant taxis on streets. Furthermore, the study has been made for a prediction model, which employed learning algorithms to the GPS data. Real-time streaming data was implemented for predicting taxi passenger
ISPRS Int. J. Geo-Inf. 2018, 7, 177
4 of 24
demand at a given taxi stand [9], in which the model predicted the passenger demand over the taxi stand for a given period in future. The existing taxi behavior model primarily focuses on finding passenger strategies through demand prediction with a recommendation system, while few studies are present that would provide insight into the effect of an oversupply of taxis in the given area or vice versa [21], the reason for which real taxi behavior modeling is important, as that would replicate the real-world system. The proposed agent-based simulation model in this research primarily focuses on understanding the real taxi behavior by utilizing GPS data from the probe taxi, from which further investigation could be made for ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 4 of 24 an efficient passenger-finding system, managing taxi fleet operations, i.e., optimizing the taxi service realas taxi behavior modeling important, that would replicate the real-world system. The respect to operation,which as well understanding theisimpact of as oversupply or undersupply of taxis with proposed agent-based simulation model in this research primarily focuses on understanding the real existing demand. Agent-based simulation in the field of computational science has proved to become taxi behavior by utilizing GPS data from the probe taxi, from which further investigation could be a powerfulmade tool for foran analyzing complex problems, random or stochastic behavior, as similar to the efficient passenger-finding system,where managing taxi fleet operations, i.e., optimizing the taxi service as well as understanding the impact of oversupply or undersupply of taxis a discrete taxi behavior, can beoperation, presented together with behavioral rules. In this regard, [12] proposed with respect to existing demand. the Agent-based simulation in the field of computational science has of which event simulation model for modeling behavior of agents operating in a city road network proved to become a powerful tool for analyzing complex problems, where random or stochastic agents make their own decisions for making trips. Similarly, [6] further provided a multi-agent-based behavior, as similar to the taxi behavior, can be presented together with behavioral rules. In this simulation, which driver’s a decentralized discrete focusing on regard, [12]modeled proposed taxi a discrete eventstrategies simulation as model for modeling the behaviorevent, of agents a city road network of which make their to own decisions for making trips. modeling operating only the in taxi driver’s behavior, whichagents was designed make an aggregated pattern of taxi [6] further a multi-agent-based simulation, which modeled taxi driver’s strategies movementSimilarly, as similar to theprovided real world. as a decentralized discrete event, focusing on modeling only the taxi driver’s behavior, which was designed to make an aggregated pattern of taxi movement as similar to the real world.
3. System Overview and Preprocessing
3. System Overview and Preprocessing The overall system overview is shown in Figure 1, which has preprocessing, data preparation, The overall systemas overview is shown Figure 1, which has preprocessing, data preparation, and taxi behavior modeling its stages, and in indexing and processing platform as its tool to handle and taxi behavior modeling as its stages, and indexing and processing platform as its tool to handle the big data. The preprocessing stage consists of preparing for the grid network and road network, the big data. The preprocessing stage consists of preparing for the grid network and road network, with conducting cleaning and map matching rawprobe probe GPS with conducting cleaning and map matchingofofthe the raw GPS data.data.
Figure 1. System overview.
Figure 1. System overview. In the data preparation stage, multiple secondary datasets from cleaned probe data were extracted, including stay point, passenger trip data,datasets origin-destination probability data, demand In the data preparation stage, multiple secondary from cleaned probe data were extracted, probability data, direction probability data, and grid network road network travel time data. In taxi including stay point, passenger trip data, origin-destination probability data, demand probability data, behavior modeling stage, agent-based modeling was implemented, that simulated the taxi behavior direction probability data, and grid network road network travel time data. In taxi behavior modeling of the urban city. Managing a large volume of data requiresthat an efficient indexing that would handle stage, agent-based modeling was implemented, simulated the technique taxi behavior of the urban city. index, search, and retrieval jobs [22]. In this research, both spatial and non-spatial indexing technique was implemented for the simulation purpose. STR tree, which is sort–tile–recursive R tree from Java
ISPRS Int. J. Geo-Inf. 2018, 7, 177
5 of 24
Managing a large volume of data requires an efficient indexing technique that would handle index, search, and retrieval jobs [22]. In this research, both spatial and non-spatial indexing technique was implemented for the simulation purpose. STR tree, which is sort–tile–recursive R tree from Java Topological Suite (JTS), was implemented to index and search spatial data. As for non-spatial data, an index and search engine named Lucene, that works on vector space model algorithms, was implemented for all query, search, and retrieval tasks during the simulating operation [23]. In addition to the large indexing volume of data, the preprocessing of all the data to be utilized for simulation, including cleaning, retrieving trip information, origin-destination, stay point extraction, direction movement extraction, was conducted in Apache Hadoop/Hive large-scale distributed computing system [24]. The total GPS probe data preprocessed from 1 June 2015 to 31 July 2015 was about 2.2 billion data rows which were stored in Hadoop Distributed File System (HDFS). Each data row consisted of a GPS data points with specification as described in Table 1. For spatial data processing, Apache Hive based query HiveQL (Hive Query Language) was developed including Hive UDF (User Defined Function) and Hive UDAF (User Defined Aggregated Function). Table 1. Probe data specification. Data Parameter IMEI Latitude Longitude Speed Direction Error
Description
Sample
International mobile equipment identification number.
10015646
Geographic coordinate of the taxi regarding decimal degree.
13.749 100.553
The speed of a moving taxi in km/hr.
42
The direction of a moving taxi in degree.
208
Error status of for each GPS data point.
0
Engine
Engine status (0/1): 0 indicates the engine is off; 1 indicates the engine is on.
1
Meter
Passenger occupancy status (0/1): 0 indicate taxi with no passenger; 1 indicates taxi with a passenger.
0
Timestamp
Unix epoch timestamp. Time system which is described as a number of seconds elapsed since 00:00:00 coordinated universal time, 1 January 1970.
Data source
Indicates the type of vehicle from which the data are being transmitted.
1388509240 9
3.1. Vehicle Probe Data In this research, the vehicle is the taxi that is running in and around Bangkok, Thailand, of which data is provided by Toyota Tsusho Nexty Electronics (Thailand) Co., Ltd., Bangkok, Thailand. The probe GPS data was collected from approximately 10,000 taxis with a sampling time of 3 s or 5 s, which was collected from 1 June 2015 to 31 July 2015. Each of the probe data collected belongs to the spatial trajectory generated by moving taxi in geographical space such that trajectory Ti = {p1 , p2 , p3 , . . . , pj }, where pj = (xj , yj , tj ), such that xj = longitude, yj = latitude, and tj = timestamp. Table 1 shows the data specification and sample data of collected probe data. 3.2. Prepare Road Network and Grid Network Open street map data of Thailand was utilized for the road network for which topological error was cleaned [25]. Here, road network was represented by R such that R = {r1 , r2 , r3 , r4 , . . . , rn }, where r1 , r2 , r3 , r4 , . . . , rn is each road segment. The total of 228,416 OSM road network features was extracted for Bangkok and the surrounding provinces. Following the preparation of OSM road network data, taxi probe GPS data were preprocessed to remove erroneous datasets. The cleaned GPS data were then map-matched with probabilistic map-matching process, with open street map road network R [25], that mapped GPS data on the road segment, which was the subject of the previous research work. The small grid size of 500 × 500 meters was chosen as grid network, in order to preserve spatial patterns and characteristics in the grid [26], however, the optimum grid size selection is still subjective, as larger grid size could be suitable for suburban or rural areas, but not suitable for dense urban
ISPRS Int. J. Geo-Inf. 2018, 7, 177
6 of 24
area [27]. A grid network of 500 × 500 meters was constructed, covering all of Bangkok region, as well as surrounding provinces. Here, grid network was represented by G such that G = {g1 , g2 , g3 , g4 , . . . , gm }, where g1 , g2 , g3 , g4 , . . . , gm is each grid or cell. The total of 64,620 grid network features ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 6 of 24 was prepared for Bangkok and the surrounding provinces. In addition to the road network map ISPRS Int. Geo-Inf. 2018, 7, x FOR PEER REVIEW 6the of 24 matching, theJ. cleaned also to the grid network G.the Figure shows OSM the cleaned GPS dataGPS weredata also were mapped to mapped the grid network G. Figure 2 shows OSM2road network grid network in network Bangkok and surrounding road and network and grid in Bangkok andprovinces. surrounding provinces. the cleaned GPS data were also mapped to the grid network G. Figure 2 shows the OSM road network Grid network was used as itsimplified simplified the the computation computation while maintaining bothboth spatial and and Grid network was used as it while maintaining spatial and grid network in Bangkok and surrounding provinces. temporal relevance of the aggregated dataset. Also, use of grid network splits the given spatial region temporal relevance of the aggregated Also, use of grid network splits the given spatialand region Grid network was used as it dataset. simplified the computation while maintaining both spatial into disjoint areas, which makes it easy to inspect for further qualitative analysis [27]. The road into disjoint areas, which it easy todataset. inspect for further qualitative road network temporal relevance ofmakes the aggregated Also, use of grid network analysis splits the[27]. givenThe spatial region network was used during the preprocessing step of cleaning and map-matching probe taxi data, and into during disjoint the areas, which makesstep it easy to inspect formap-matching further qualitative analysis [27].and Thethen roadlater was used preprocessing of cleaning and probe taxi data, then later used routing and interpolation of the simulated taxi agent trajectory, as described in Section network was used during the preprocessing step of cleaning and map-matching probe taxi data, used routing and interpolation of the simulated taxi agent trajectory, as described in Section 5.and 5. then later used routing and interpolation of the simulated taxi agent trajectory, as described in Section 5.
Figure 2. (Left) Open Street Map (OSM) Bangkok road network; (Right) Grid network.
Figure 2. (Left) Open Street Map (OSM) Bangkok road network; (Right) Grid network. Figure 2. (Left) Open Street Map (OSM) Bangkok road network; (Right) Grid network.
4. Data Preparation
4. Data Preparation
4. Data Preparation 4.1. Data Aggregation
4.1. Data Aggregation
grid network mapped data, data were further categorized as weekday and weekend 4.1. For Datathe Aggregation data. Mapped and categorized probe data were then aggregated to each grid network atweekend multiple For the mapped data,data, datadata were further categorized as as weekday Forgrid the network grid network mapped were further categorized weekdayand and weekenddata. time steps of 1 h, 15 min, and 5 min. Figure 3 shows grid network G at time steps of Δt. Mapped and categorized probe dataprobe weredata thenwere aggregated to each grid network at multiple time steps data. Mapped and categorized then aggregated to each grid network at multiple of 1 h,time 15 min, Figure shows grid3 network G network at time steps of ∆t. stepsand of 1 5h,min. 15 min, and 53 min. Figure shows grid G at time steps of Δt.
Figure 3. Probe data aggregation in the grid network G for weekday and weekend.
Figure 3. Probe datadata aggregation ininthe Gfor forweekday weekday and weekend. Figure 3. Probe aggregation thegrid gridnetwork network G and weekend. The flow diagram of a data aggregation process is shown in Figure 4. As mentioned, the probe data were, at first, mapped into the grid network. The mapped data were then partitioned based on The flow diagram of a data aggregation process is shown in Figure 4. As mentioned, the probe data were, at first, mapped into the grid network. The mapped data were then partitioned based on
ISPRS Int. J. Geo-Inf. 2018, 7, 177
7 of 24
ISPRSThe Int. J.flow Geo-Inf. 2018, 7, xof FOR PEERaggregation REVIEW diagram a data ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW
of 24 process is shown in Figure 4. As mentioned, the 7probe 7 of 24 data were, at first, mapped into the grid network. The mapped data were then partitioned based on weekday and and weekend. weekend. Finally, an aggregation aggregation function function was was used used for for aggregating aggregating the the data data for for time time weekday Finally, an weekday and weekend. Finally, an aggregation function was used for aggregating the data for time steps of of ∆t Δt at at road road network network and and grid grid network network for for both both weekday weekday and and weekend weekend data data partitioned. partitioned. steps steps of Δt at road network and grid network for both weekday and weekend data partitioned.
Figure 4. Flow diagram diagram of of aa data data aggregation aggregation process. Figure 4. Flow process. Figure 4. Flow diagram of a data aggregation process.
4.2. Stay 4.2. Stay Point Point Extraction Extraction 4.2. Stay Point Extraction Stay point point denotes denotes locations locations where where the the vehicle vehicle (taxi) (taxi) have have stopped stopped or or stayed stayed at at some Stay some location location for for Stay interval point denotes locations where the vehicle (taxi) have stoppedwhile or stayed at some location for certain aa certain interval of of time, time, such such as as aa parking parking place place or or aa gas gas station, station, or or while looking looking for for aa passenger. passenger. aTaxi certain of time, such was as a extracted, parking place ordepicts a gas station, orlocation while looking a passenger. stay interval point cluster cluster location which the start start for each eachfor taxi during the the Taxi stay point location was extracted, which depicts the location for taxi during Taxi stay point clusterpoint location was extracted, which depicts the start location for each taxi 𝑃𝑃during the simulation. The simulation. The stay stay pointalgorithm algorithmfirst firstchecks checksthe thedistance distancebetween betweenthe theanchor anchorpoint point P P2, 𝑎𝑎𝑎𝑎𝑎𝑎ℎ𝑜𝑜𝑜𝑜 anchor P2, simulation. The stay5,point algorithm firstpoints checks𝑃𝑃the distance between the anchor point 𝑃𝑃𝑎𝑎𝑎𝑎𝑎𝑎ℎ𝑜𝑜𝑜𝑜 P2, as shown in Figure with the successor (P3, P4, P5) within the distance threshold as shown in Figure 5, with the successor points P𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 successor (P3, P4, P5) within the distance threshold as shown in Figure 5, with the successor pointsto𝑃𝑃𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 (P3, P4, P5) was within the distance Following threshold 𝐷𝐷 value [28,29]. was chosen meters, which setempirically. empirically. D𝑡𝑡ℎ𝑟𝑟𝑟𝑟𝑟𝑟ℎ𝑜𝑜𝑜𝑜𝑜𝑜 [28,29]. D𝐷𝐷threshold was chosen to bebe5050meters, which was set Following 𝑡𝑡ℎ𝑟𝑟𝑟𝑟𝑟𝑟ℎ𝑜𝑜𝑜𝑜𝑜𝑜 threshold value 𝐷𝐷 value [28,29]. 𝐷𝐷𝑡𝑡ℎ𝑟𝑟𝑟𝑟𝑟𝑟ℎ𝑜𝑜𝑜𝑜𝑜𝑜 chosen to be 50 meters, which was setwas empirically. Following 𝑡𝑡ℎ𝑟𝑟𝑟𝑟𝑟𝑟ℎ𝑜𝑜𝑜𝑜𝑜𝑜 this, the time time span between anchor was point and last last successor point P5, which which within the the distance this, the span between anchor point and successor point P5, was within distance this, the time span between anchor point and last successor point P5, which was within the distance threshold, was measured. If the time span measured was greater than the time threshold 𝑇𝑇 threshold, was measured. If the time span measured was greater than the time threshold T𝑡𝑡ℎ𝑟𝑟𝑟𝑟𝑟𝑟ℎ𝑜𝑜𝑜𝑜𝑜𝑜 threshold threshold, If the locations time spanwere measured was than theas time threshold 𝑇𝑇𝑡𝑡ℎ𝑟𝑟𝑟𝑟𝑟𝑟ℎ𝑜𝑜𝑜𝑜𝑜𝑜 value of of 10 10was min,measured. then stay stay point point detected forgreater the given given taxi, shown in Equation Equation (1). value min, then locations were detected for the taxi, as shown in (1). value of 10 min, then stay point locations were detected for the given taxi, as shown in Equation (1). 𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆 𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝 = {(𝑃𝑃𝑎𝑎𝑎𝑎𝑎𝑎ℎ𝑜𝑜𝑜𝑜 , 𝑃𝑃𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 ) < 𝐷𝐷𝑡𝑡ℎ𝑟𝑟𝑟𝑟𝑟𝑟ℎ𝑜𝑜𝑜𝑜𝑜𝑜 &(𝑃𝑃𝑎𝑎𝑎𝑎𝑎𝑎ℎ𝑜𝑜𝑜𝑜 , 𝑃𝑃𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 ) > 𝑇𝑇𝑡𝑡ℎ𝑟𝑟𝑟𝑟𝑟𝑟ℎ𝑜𝑜𝑜𝑜𝑜𝑜 (1) Stay𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝𝑝 point=={(𝑃𝑃 {( Panchor, 𝑃𝑃 , Psuccessor )< Dthreshold & (1) ( Panchor ,, P successor ) > Tthreshold 𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆 &(𝑃𝑃 (1) 𝑎𝑎𝑎𝑎𝑎𝑎ℎ𝑜𝑜𝑜𝑜 𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 ) < 𝐷𝐷𝑡𝑡ℎ𝑟𝑟𝑟𝑟𝑟𝑟ℎ𝑜𝑜𝑜𝑜𝑜𝑜 𝑎𝑎𝑎𝑎𝑎𝑎ℎ𝑜𝑜𝑜𝑜 𝑃𝑃𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 ) > 𝑇𝑇 𝑡𝑡ℎ𝑟𝑟𝑟𝑟𝑟𝑟ℎ𝑜𝑜𝑜𝑜𝑜𝑜
Figure 5. Stay point extraction. Figure Figure 5. 5. Stay Stay point point extraction. extraction.
Stay points were extracted for the clustering as to simplify the processing for the clustering Stay points were extracted for the as probe to simplify the processing the clustering algorithm. As mentioned previously, theclustering number of GPS data points fromfor probe taxis was Stay points were extracted for the clustering as to simplify the processing for the clustering algorithm. As mentioned previously, the number of probe GPS data points from probe taxis was about 2.2 billion data rows. Clustering of this vast dataset would take lots of time as well as clusters algorithm. As mentioned previously, the number of probe GPS data points from probe taxis was about about 2.2 billion data rows. Clustering of this vast dataset would take lots of time as well as clusters would be difficult to Clustering separate asofGPS would would be dense inlots particular overcome this 2.2 billion data rows. thistraces vast dataset take of time region. as well To as clusters would would be difficult to was separate as GPS traces would bethe dense in particular region. Totime overcome this difficulty, point extracted would reduce computational and would be difficultstay to separate as GPS tracesthat would be dense in particular region. processing To overcome this difficulty, difficulty, stay point clusters. was extracted that would reduce the computational processing time and would provide meaningful stay point was extracted that would reduce the computational processing time and would provide provide meaningful A grid G basedclusters. DBSCAN algorithm was implemented for each taxi stay point identified as meaningful clusters. A grid G based DBSCAN implemented fortoeach stay point identified as proposed in [30–32], where thealgorithm minimumwas number of points formtaxi clusters (MinPts) and the A grid G based DBSCAN algorithm was implemented for each taxi stay point identified as proposed in [30–32], where the minimum number of points to form clusters (MinPts) and the maximumindistance twominimum points that belongs the cluster were chosen empirically, proposed [30–32],within where the number of to points to form(epsilon clustersε) (MinPts) and the maximum maximum distance within twoofpoints thatrequired. belongs to theMinPts cluster (epsilon ε)chosen were chosen depending upontwo the number clusters The as 100empirically, points, and distance within points that belongs to the cluster (epsilon ε)value were was chosen empirically, depending depending upon the number of clusters required. The MinPts value was chosen as 100 points, and epsilon distance ε was chosen as 50 m. However, both values could be adjusted depending upon the upon the number of clusters required. The MinPts value was chosen as 100 points, and epsilon distance epsilon distance ε was chosen as 50 m. However, both values could be adjusted depending upon the of clusters on the clustering algorithm could be achieved with εnumber was chosen as 50 m.required. However,Improvement both values could be adjusted depending upon the number of clusters number of clusters required. Improvement on the clustering could be achieved with HDBSCAN clustering [33] uses only single parameter, i.e., algorithm MinPts forHDBSCAN the clustering algorithm. required. Improvement on which the clustering algorithm could be achieved with clustering [33] HDBSCAN [33] which uses only single parameter, i.e., MinPts fordistance the clustering algorithm. However, asclustering stay points were extracted before clustering, with a threshold of 50 m, all those However, stay identified points were before clustering, a threshold distanceHence, of 50 m, all those points thataswere asextracted stay points were within 50with m threshold distance. traditional points that were identified as stay points were within 50 m threshold distance. Hence, traditional DBSCAN, with an epsilon distance of 50 m, was used for clustering instead of HDBSCAN. Finally, DBSCAN, with an epsilon of 50 m,that wasrepresented used for clustering Finally, the centroid of each clusterdistance was computed the stay instead locationof forHDBSCAN. each taxi. For each the centroid of each cluster was computed that represented the stay location for each taxi. For each
ISPRS Int. J. Geo-Inf. 2018, 7, 177
8 of 24
which uses only single parameter, i.e., MinPts for the clustering algorithm. However, as stay points were extracted before clustering, with a threshold distance of 50 m, all those points that were identified ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 8 of 24 as stay points were within 50 m threshold distance. Hence, traditional DBSCAN, with an epsilon distance ofInt. 50J. m, was2018, used for clustering instead of HDBSCAN. Finally, the centroid of each cluster was Geo-Inf. 7,time x FOR was PEER REVIEW 8 of during 24 of theISPRS clusters, the start computed that represented the starting time for the taxi computed that represented the stay location for each taxi. For each of the clusters, the start time was simulation, using the kernel density function for the timestamp of each of the points in the clusters. computed represented the starting time for therepresented taxi during using thetaxi kernel density of the that clusters, the start time was computed that thesimulation, starting time for the during High kernel density value of timestamp was chosen as the start time. The left Figure 6 shows an simulation, using the kernel density function the clusters. timestampHigh of each of the pointsvalue in the of clusters. function for the timestamp of each of the points for in the kernel density timestamp example for staydensity point clusteroffor taxi id “10012462” at the gridstart id “200000036679”. The6 right Figure 6 High kernel timestamp was6chosen time. leftpoint Figure shows was chosen as the start value time. The left Figure showsasan example forThe stay cluster foran taxi id shows the kernel density of timestamp all the stay point cluster for taxi id “10012462” at id exampleatfor stay cluster for taxiThe id “10012462” at grid id “200000036679”. The right Figuregrid 6 all “10012462” grid idpoint “200000036679”. right Figure 6 shows the kernel density of timestamp “200000036679”. For this case, the density at timestamp “14,292 s” (3:58:13 a.m.) was the highest, and shows the kernel density of timestamp all the stay point cluster for taxi id “10012462” at grid id the stay point cluster for taxi id “10012462” at grid id “200000036679”. For this case, the density at hence, start time forFor this chosenatattimestamp 3:58:13 a.m. and the cluster as the and starting “200000036679”. thistaxi case,was the density “14,292 s” (3:58:13 a.m.)centroid was the highest, timestamp “14,292 s” (3:58:13 a.m.) was the highest, and hence, start time for this taxi was chosen at hence, start time for this taxi was chosen at 3:58:13 a.m. and the cluster centroid as the starting location. 3:58:13 a.m. and the cluster centroid as the starting location. location.
Figure 6. Left: Stay point cluster; Right: Kernel density function of timestamp.
Figure 6. 6. Left: Left: Stay Figure Stay point point cluster; cluster; Right: Right: Kernel Kernel density density function function of of timestamp. timestamp.
4.3. Taxi Origin and Destination
4.3. Taxi Origin and and Destination Destination 4.3. Taxi Origin
Taxi origin and destination or OD refer to the location where the taxi picked up and dropped off
Taxi origin and destination OD refer toto the location where the taxi picked upup andand dropped off the passenger or customer [34].or The OD of the taxi was extracted from the taxi trip information, which Taxi origin and destination or OD refer the location where the taxi picked dropped is tracked through taxi “meter status” of probe data, as described in Table 1. As for the trip itself, only the the passenger or customer [34].[34]. The OD theoftaxi from the taxi trip information, which off passenger or customer Theof OD thewas taxiextracted was extracted from the taxi trip information, those through trips weretaxi considered whose origin and data, destination was within Bangkok and the surrounding is tracked “meter status” of probe as described in Table 1. As for the trip itself, which is tracked through taxi “meter status” of probe data, as described in Table 1. As for theonly trip as shown in Figure 2. This alsoand suggested that longer trip information was eliminated and thoseprovinces, trips origin within Bangkok and the surrounding itself, only were thoseconsidered trips were whose considered whosedestination origin and was destination was within Bangkok and the simulation focused more on Bangkok region. Figure 7 shows the passenger trip based on pick-up and provinces, as provinces, shown in Figure 2. This also suggested thatsuggested longer tripthat information was eliminated was and surrounding as shown in Figure 2. This also longer trip information drop-off transition. simulation focused more on Bangkok region. Figure 7 shows the passenger trip based on pick-up and eliminated and simulation focused more on Bangkok region. Figure 7 shows the passenger trip based drop-off transition. on pick-up and drop-off transition.
Figure 7. Passenger trip based on pick-up and drop-off transition.
Meter status transition from 0 to 1 was considered as the start of a passenger trip, and transition Figure 7. Passenger based on and drop-off transition. Figure 7. Passenger trip based on pick-up pick-up and transition. from 1 to 0 was considered as the starttrip of the non-passenger trip. drop-off Each of the transition locations were mapped onto the grid network G, along with trip distance and trip time computed. A total of Meter status transition from to 1of 1 was was considered as the the start start of passenger trip, and transition 3,081,230 passenger trips and a total 2,570,432 non-passenger trips of were recorded for Juneand andtransition July Meter status transition from 00 to considered as aa passenger trip, from 1 to 0 was considered as the start of the non-passenger trip. Each of the transition locations were 2015. Table 2 shows the passenger and non-passenger trip examples for taxi id: 10012462 on weekday. from 1 to 0 was considered as the start of the non-passenger trip. Each of the transition locations
mapped onto onto the grid network G, along with triptrip distance and of were mapped the grid network G, along with distance andtrip triptime timecomputed. computed.AA total total of 3,081,230 passenger trips and a total of 2,570,432 non-passenger trips were recorded for June and July 3,081,230 passenger trips and a total of 2,570,432 non-passenger trips were recorded for June and July 2015. Table 2015. Table 22 shows shows the the passenger passenger and and non-passenger non-passenger trip trip examples examples for for taxi taxi id: id: 10012462 10012462on onweekday. weekday.
ISPRS Int. J. Geo-Inf. 2018, 7, 177
9 of 24
Table 2. Passenger and non-passenger trip. Date: 1 June 2015; Day Type: Weekday Passenger Trip IMEI
Pick-Up Grid
Drop-Off Grid
Pick-Up Time
Drop-Off Time
Trip Distance (km)
Trip Time (min)
10012462 10012462
200000035920 200000036680
200000036681 200000040994
9:04:56 9:22:01
9:09:57 15:44:16
2.34 62.35
5.02 382.25
IMEI
Drop-Off Grid
Pick-Up Grid
Drop-Off Time
Pick-Up Time
Trip Distance (km)
Trip Time (min)
10012462 10012462
200000036681 200000040994
200000036680 200000037087
9:09:57 15:44:16
9:22:01 16:35:59
2.95 17.63
12.07 51.72
Non-Passenger Trip
Based on passenger trip data, an OD matrix was established for a time step of 1 h, for which pick-up location (origin) and drop-off location (destination) pairs were aggregated, based on each grid network G = {g1 , g2 , g3 , g4 , . . . , gm }. As mentioned previously, there were about 30 million passenger trips recorded for a 2 month period, hence there are about 500,000 OD matrixes derived for each individual day, approximately. The OD matrix of this scale could be easily matched to the closest OSM road network. Doing so, however, generated issues where many OD pairs became sparse, with only one transition between origin and destination road network segment at the given time interval, as shown in Figure 8. Figure 8A shows the case when the OD became sparse, with only one transition between origin road segment and destination road segment, when created using OSM road networks as compared to when created with the grid network, which had 10 transitions between origin grid and destination grid, at the given time interval. Figure 8B shows the pick-up location at the origin with respect to the OSM road network. Similarly, Figure 8C shows the drop-off locations at the destination with respect to the OSM road network. Here, each OD pair corresponding to road network were at different segments, hence creating the sparseness in the OD matrix. Using grid network helped reduce the sparsity problem when creating the OD matrix. In the case when a longer period dataset or larger numbers of taxi data were available, using road network becomes a better option than using the grid network. Other advantages of an OD matrix based in grid is that it could help anonymize the data where data are sensitive, and privacy needs to be protected as well as; such an OD matrix could be applied for understanding interzonal or intercity mobility as well [35]. With the origin-destination matrix for passenger trip, OD transition probability was computed for both weekday and weekend data as shown in Equation (2), which was to be used for passenger trip simulation, as described in Section 5.
∀ g ∈ Gt , P( gO→ D ) =
TripO→ D TripO
(2)
where P( gO→ D ) is the OD probability for all grids g that belongs to G at time interval t, such that TripO→ D is the total number of passenger trips between the origin grid O and the destination grid D, and TripO is all the passenger trips that originated at grid O at time interval t. The concept behind the use of conditional probability of the OD matrix as shown in Equation (2) was to construct the transition probability matrix between origin and the potential destinations, such that for any given passenger trip generated during simulation, taxis at the origin would move to the destination estimated by the OD transition probability matrix for the given time interval. The OD probability transition matrix, however, could be the subject of periodic update [36] at specific time intervals as agents start to move along the spatiotemporal domain, which could provide the indirect interaction between the agents during the simulation process. Figure 9 shows the passenger trip OD visualization at a time interval of 7 a.m. to 8 a.m., and 18 p.m. to 19 p.m. Each line segment in the OD visualization represented the passenger trip from origin grid O to destination grid D, while the width of the line segment represented the number of trips. The higher the number of trips, the broader was the line segment, and vice versa.
ISPRS Int. J. Geo-Inf. 2018, 7, 177 ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW
10 of 24 10 of 24
Figure 8. (A) OSM road network and grid network OD comparison; (B) Pick-up location at origin Figure 8. (A) OSM road network and grid network OD comparison; (B) Pick-up location at origin with with respect to OSM road network; (C) Drop-off location at destination with respect to OSM road respect to OSM road network; (C) Drop-off location at destination with respect to OSM road network. network.
ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW
11 of 24
ISPRS Int. J. Geo-Inf. 2018, 7, 177
11 of 24
ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW
11 of 24
Figure 9. Passenger trip OD at time interval. Left: 7–8 a.m.; Right: 18–19 p.m.
4.4. Taxi Demand Passenger demand for the given time interval was defined as the total number of passenger pick9. Passenger tripOD OD time interval. interval. Left: 7–8 a.m.; Right: 18–19 p.m. ups within Figure the Figure given spatial region. In general, demand is the passenger pick-up count in the spatial 9. Passenger trip atattime Left: 7–8 a.m.; Right: 18–19 p.m. and temporal domain [16,19,37,38]. Demand for a taxi was computed with the concept of probability 4.4. Taxi Demand of success or probability of finding passenger, as proposed by [19,39], which was the demand that 4.4. Taxi Demand generated in the grid over of vacant that grid in that time forpickboth Passenger demand for the the number given time intervaltaxis wasin defined as the totalgiven number ofinterval passenger weekday and weekend data, as shown in Equation (3). Passenger demand for the given time interval was defined as the total number of passenger ups within the given spatial region. In general, demand is the passenger pick-up count in the spatial
pick-ups within the given spatial region. In general, demand passenger pick-up count in the and temporal domain [16,19,37,38]. Demand for a taxi was computed the concept of probability 𝑂𝑂𝑔𝑔 is thewith ∀𝑔𝑔 ∈ Demand 𝐺𝐺𝑡𝑡 ,as 𝑃𝑃(𝑑𝑑𝑑𝑑) = taxi (3) of success or probability of finding passenger, proposed [19,39], which was the demand that spatial and temporal domain [16,19,37,38]. was computed with the concept of 𝑔𝑔for a by 𝑉𝑉𝑔𝑔 generated in the grid over the number of vacant taxis in that grid in that given time interval for both probability of success or probability of finding passenger, as proposed by [19,39], which was the weekday and weekend data, shown inthe Equation where 𝑃𝑃(𝑑𝑑𝑑𝑑) probability of success for all(3). grid ∈ 𝐺𝐺 at taxis time interval 𝑡𝑡, such 𝑔𝑔 is the in 𝑔𝑔 and time demand that generated the as grid over number of 𝑔𝑔vacant in that grid in that that 𝑂𝑂given 𝑉𝑉 are the total number of demands generated and total number of recorded vacant taxis at grid 𝑔𝑔 ∈ 𝑔𝑔 for both weekday and weekend data, as shown in𝑂𝑂Equation (3). interval 𝑔𝑔
=demand probability was chosen for every 𝐺𝐺 and time interval 𝑡𝑡, respectively. The∀time (3) 1 𝑔𝑔 ∈ 𝐺𝐺interval 𝑡𝑡 , 𝑃𝑃(𝑑𝑑𝑑𝑑)for 𝑔𝑔 𝑉𝑉𝑔𝑔 h interval. Figure 10 shows the aggregated demand of taxi in morning hour from 7 a.m. to 8 a.m. and Og ∀ ∈ G , P dm = (3) ( ) g t in the evening 18 p.m. to p.m. for all grid g 𝑔𝑔 ∈V𝐺𝐺 at time interval 𝑡𝑡, such that 𝑂𝑂 and where 𝑃𝑃(𝑑𝑑𝑑𝑑)𝑔𝑔 hour is thefrom probability of 19 success 𝑔𝑔 g demand for eachgenerated grid and each timenumber interval of varies as shown in taxis Figure which 𝑉𝑉𝑔𝑔 areThe thepassenger total number of demands and total recorded vacant at10, grid 𝑔𝑔 ∈ that success varies This indicated thatt,chosen the probability of where P dm the the probability ofofsuccess for all accordingly. grid ∈ G at time interval such thatevery Og and (implies ) g isinterval 𝐺𝐺also and time 𝑡𝑡,probability respectively. The time interval forgdemand probability was for 1 Vg success was subject to spatial and temporal variation, which could be captured through demand h interval. Figure 10 shows the aggregated demand of taxi in morning hour from 7 a.m. to 8 a.m. and are the total number of demands generated and total number of recorded vacant taxis at grid g ∈ G estimation. However, of demand-related information (such as conservative, in theinterval evening hour fromdifferent 18 p.m. tolevels 19 p.m. and time t, respectively. The time interval for demand probability was chosen for every 1 h empirical, informed in realfor time, and informed about predictions) using data-driven 10, Artificial The passenger demand each grid and eachoftime varieshour as shown whichand in interval. Figure 10 shows the aggregated demand taxiinterval in morning fromin7 Figure a.m. to 8 a.m. Intelligence (AI) technologies, and how the information is shared among the driver, could alter also implies that the probability of success varies accordingly. This indicated that the probabilityand of the evening hour from 18 p.m. to 19needs p.m. to be defined carefully for better demand estimation [40]. improvewas the subject overall behavior success to spatialand and temporal variation, which could be captured through demand estimation. However, different levels of demand-related information (such as conservative, empirical, informed in real time, and informed about predictions) using data-driven Artificial Intelligence (AI) technologies, and how the information is shared among the driver, could alter and improve the overall behavior and needs to be defined carefully for better demand estimation [40].
Figure 10. Cont.
ISPRS Int. J. Geo-Inf. 2018, 7, 177
12 of 24
ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW
12 of 24
ISPRS Int. J.10. Geo-Inf. 7, x FORtaxi PEER REVIEW Figure Aggregated demand Figure 10.2018, Aggregated taxi demandat attime time
of 24 interval. Top:7–8 7–8 a.m.; Bottom: 18–19 p.m. interval. Top: a.m.; Bottom: 18–19 p.m.12
4.5. Vacant Taxi Movement
The passenger demand for each grid and each time interval varies as shown in Figure 10, the taxi passenger,ofthe taxi has a totalaccordingly. of nine possible directions move which alsoWhen implies thathas theno probability success varies Thiscardinal indicated that thetoprobability for searching the passenger, north (337.5–22.5°), northeast (22.5–67.5°), east through (67.5–112.5°), of success was subject to spatialincluding and temporal variation, which could be captured demand southeast (112.5–157.5°), south (157.5–202.5°), southwest (202.5–247.5°), west (247.5–292.5°), estimation. However, different levels of demand-related information (such as conservative, empirical, northwest (292.5–337.5°), or stay at the same location. Figure 11 illustrated the nine possible cardinal informed in real time, and informed about predictions) using data-driven Artificial Intelligence (AI) directions for vacant taxi movement. Based on this assumption, vacant taxi movement was directed technologies, and how the information is shared among the driver, could alter and improve the overall for searching of a passenger, with a directional probability which was estimated for both weekday behavior and needs to as beshown defined for for better and weekend data, in carefully Equation (4), eachdemand grid at a estimation time interval[40]. of every 5 min. 4.5. Vacant Taxi Movement
∀𝑔𝑔 ∈ 𝐺𝐺𝑡𝑡 , 𝑃𝑃(𝑑𝑑)𝑔𝑔 =
𝑛𝑛𝑔𝑔
(4)
𝑁𝑁𝑔𝑔
When the taxi Figure has no taxiathas total of nine possible 10. passenger, Aggregated taxithe demand timeainterval. Top: 7–8 a.m.; Bottom:cardinal 18–19 p.m. directions to move ◦ ), east (67.5–112.5 where 𝑃𝑃(𝑑𝑑) is the direction probability for(337.5–22.5 vacant taxi◦movement, to direction 𝑑𝑑, for all ◦ ), for searching the including north ), northeastmoving (22.5–67.5 𝑔𝑔 passenger, 4.5. Vacant Taxi Movement ◦ ◦ ◦ ◦ ), northwest grid 𝑔𝑔 ∈ 𝐺𝐺 at time interval such that 𝑛𝑛𝑔𝑔),and 𝑁𝑁𝑔𝑔 are the number of ),vacant taxi points moving to southeast (112.5–157.5 ), south 𝑡𝑡, (157.5–202.5 southwest (202.5–247.5 west (247.5–292.5 ◦ When the taxi has no passenger, the taxi has a total of nine possible cardinal directions to move direction 𝑑𝑑 , and the total number of vacant taxi points in grid 𝑔𝑔 ∈ 𝐺𝐺 , and time interval of 𝑡𝑡 , (292.5–337.5 ), or stay at the same location. Figure 11 illustrated the nine possible cardinal directions for searching the passenger, including north (337.5–22.5°), northeast (22.5–67.5°), east (67.5–112.5°), respectively. The direction the vacant taxi would choose for searching for passengers was estimated for vacant taxi movement. Based on this assumption, vacant taxi movement was directed for searching southeast south southwest (202.5–247.5°), west (247.5–292.5°), from the highest(112.5–157.5°), 𝑃𝑃(𝑑𝑑)𝑔𝑔 obtained for(157.5–202.5°), a given grid and time interval. of a passenger, with a directional probability estimated fortheboth and weekend northwest (292.5–337.5°), or stay at the same which location.was Figure 11 illustrated nine weekday possible cardinal directions for vacant taxi Basedat onathis assumption, taxi 5 movement data, as shown in Equation (4),movement. for each grid time interval vacant of every min. was directed for searching of a passenger, with a directional probabilitynwhich was estimated for both weekday g and weekend data, as shown in Equation ∀ g ∈(4),Gfor = at a time interval of every 5 min. (4) (d) ggrid t , Peach Ng ∀𝑔𝑔 ∈ 𝐺𝐺𝑡𝑡 , 𝑃𝑃(𝑑𝑑)𝑔𝑔 =
𝑛𝑛𝑔𝑔
taxi𝑁𝑁𝑔𝑔movement,
(4)
where P(d) g is the direction probability for vacant moving to direction d, for all grid g ∈ Gwhere at time interval t, such that n and N are the number of vacant taxi points g g 𝑃𝑃(𝑑𝑑)𝑔𝑔 is the direction probability for vacant taxi movement, moving to direction 𝑑𝑑, for allmoving to direction d,grid and𝑔𝑔 the total number of vacant taxi points in grid g ∈ G, and time interval of t, respectively. ∈ 𝐺𝐺 at time interval 𝑡𝑡, such that 𝑛𝑛𝑔𝑔 and 𝑁𝑁𝑔𝑔 are the number of vacant taxi points moving to direction 𝑑𝑑 , and the total number of vacant taxi points in grid 𝑔𝑔 ∈ 𝐺𝐺 , and time interval of 𝑡𝑡the , The direction the vacant taxi would choose for searching for passengers was estimated from highest respectively. The direction the vacant taxi would choose for searching for passengers was estimated P(d) g obtained for a given grid and time interval. from the highest 𝑃𝑃(𝑑𝑑)𝑔𝑔 obtained for a given grid and time interval.
Figure 11. Nine cardinal directions for vacant taxi movement.
4.6. Network Travel Time GPS probe data now enables us to collect the travel time information through the moving vehicle, and has the given convenience to make travel time predictions in a complicated road network, whereby road network travel time can be estimated by estimating the speed between the
Figure 11. Nine cardinal directions for vacant taxi movement.
Figure 11. Nine cardinal directions for vacant taxi movement. 4.6. Network Travel Time GPS probe data now enables us to collect the travel time information through the moving vehicle, and has the given convenience to make travel time predictions in a complicated road network, whereby road network travel time can be estimated by estimating the speed between the
ISPRS Int. J. Geo-Inf. 2018, 7, 177
13 of 24
4.6. Network Travel Time GPS probe data now enables us to collect the travel time information through the moving vehicle, and has the given convenience to make travel time predictions in a complicated road network, whereby road network travel time can be estimated by estimating the speed between the nodes of the road segment [41,42]. GPS probe data can essentially provide information of a traffic condition of a given period, such as travel time estimation, as well as traffic congestion, which directly relates to the distance travelled by a vehicle in that period [43]. In this paper, the average road network segment speed and average grid network speed was estimated for the time step of every 15 min time interval for both weekday and weekend data, as shown in Equations (5) and (6). The estimated average road network segment speed and average grid network speed was used for estimating taxi travel time during the simulation movement. ∑ S p ∈r ∀r ∈ R t , sr = (5) Np ∈r
∀ g ∈ Gt , s g =
∑ S p∈ G Np ∈ G
(6)
where, sr and s g are the average speed on the road network segment r ∈ R, and grid network g ∈ G, respectively at a time interval of t, such that ∑ S p∈r and ∑ S p∈ g are the sum of the speed of all the points p with Np∈r and Np∈ g as the total number of points belonging to its respective network. The use of two different average speeds was because not all road network segments had the probe GPS point associated with it at a given time interval. In such cases, the road network segment average speed could not be computed. Hence, grid network average speed was used instead for the given road network segment associated with it at the given time interval. In this regard, road network travel time was given by Equation (7). rdistance sr Tr: pˆ ∈t = (7) rdistance sg where, Tr: pˆ ∈t is the road network travel time with rdistance as road network segment distance, such that for any point pˆ that appears on road network segment r ∈ R and grid network g ∈ G at time interval t during simulation, would require Tr: pˆ ∈t unit time to cross or complete the road network segment. 5. Methodology 5.1. Agent-Based Model An agent-based simulation model was designed to simulate the discrete event of real taxi movement as it happens in the real-world situation. The entire model was subdivided into five different submodules which were initialization module, passenger pick-up module, data logger module & updater module, passenger drop-off module, as shown in Figure 12. In the agent-based simulation model, each taxi was treated as an individual taxi agent, for which they were given a specific behavior rule, based on location and time, for its entire movement. This approach makes modeling flexible, as it makes it easy to add individual variation in the behavior rule, as well as external random influences [44]. The result is an overall characteristic feature of the system from the collective individual entity with rule. Other features that make agent-based simulation promising is the use of modularity, where different events are separated into individual modules.
ISPRS Int. J. Geo-Inf. 2018, 7, 177
14 of 24
ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW
14 of 24
ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW
14 of 24
Figure 12. Agent-based simulation model.
Figure 12. Agent-based simulation model.
5.2. Initialization Module
Figure 12. Agent-based simulation model.
5.2. Initialization Module
In this module, various simulation spatial and non-spatial parameters were indexed. Spatial data 5.2. Initialization Module
parameters included OSM road networkspatial and grid network data as described in Section 3.2, whereas In this module, various simulation and non-spatial parameters were indexed. Spatial In this module, various simulation spatial and non-spatial parameters were indexed. Spatial data non-spatial data parameters included stay point data, passenger trip data, origin-destination data parameters included OSM road network and grid network data as described in Section 3.2, parameters included OSM road network andmovement grid datapassenger as (direction described in data, Section 3.2, whereas probability, demand vacant taxi probability probability), OSM road whereas non-spatial dataprobability, parameters included staynetwork point data, trip origin-destination non-spatial included stay point data, passenger trip13data, network, anddata gridparameters network speed data, as described in Section 4. Figure showsorigin-destination the initialization probability, demand probability, vacant taxi movement probability (direction probability), OSM road probability, demand probability, module of the simulation process.vacant taxi movement probability (direction probability), OSM road network, and grid network speed data, as described in Section 4. Figure 13 shows the initialization network, and grid network speed data, as described in Section 4. Figure 13 shows the initialization module of the process. module of simulation the simulation process.
Figure 13. Initialization module. Figure 13. Initialization module.
Figure 13. Initialization module.
ISPRS Int. J. Geo-Inf. 2018, 7, 177
15 of 24
5.3. Taxi Agent Each of the taxi agents was created based on the taxi stay point cluster extracted from the probe data as described in Section 4.2. For each taxi agent, various parameters were set at the beginning of the simulation, including start location regarding latitude and longitude, start time, trip type, i.e., whether ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 15 of 24 the taxi have a passenger or not, day type, i.e., whether simulation was for weekday or weekend, and some 5.3. days hours to run the simulation. Based on taxi agent start time and the number of Taxior Agent days or hoursEach for simulation, thewas end or stop criteria forstay each taxi agent was determined. of the taxi agents created based on the taxi point cluster extracted from the probeAs for the data as described in Section 4.2. For taxi agent, various were set have at the beginning of starting condition, the assumption waseach made that the taxiparameters agent will not any passenger, and theinitialization simulation, including startthe location regarding latitude and onto longitude, start time, trip type, i.e., hence, after module, taxi agent would move the passenger pick-up module.
whether the taxi have a passenger or not, day type, i.e., whether simulation was for weekday or weekend, and some days or hours to run the simulation. Based on taxi agent start time and the 5.4. Passenger Pick-Up Module number of days or hours for simulation, the end or stop criteria for each taxi agent was determined. for the starting condition, the assumption made that the agent will have would any In theAspassenger pick-up module, the taxi was agent, based ontaxi location andnottime, move, passenger, and hence, after initialization module, the taxi agent would move onto the passenger picksearching for a passenger. The searching of the passenger was made based on vacant taxi movement up module.
probability or direction probability for the grid that the taxi agent belongs to, and the time interval. 5.4.types Passenger Pick-Up Module Two distinct of movement could be observed, which were staying in the same grid for searching passenger (taxi event) or module, move adjoining gridbased (taxionmovement For each In queueing the passenger pick-up the taxi agent, location andevent). time, would move,taxi agent searching a passenger. searchingin of Section the passenger based onwas vacant taxi movement free movement at for grid level as The described 4.5,was an made estimation made as to whether the or direction probability for the grid that the taxi agent belongs to, and the time interval. taxi agentprobability would get a passenger or not, based on demand probability of success, as described in Two distinct types of movement could be observed, which were staying in the same grid for searching Section 4.4.passenger If there(taxi were no pick-up events, then thegrid taxi(taxi agent wouldevent). either stay the same grid or queueing event) or move adjoining movement For eachin taxi agent move to anfree adjoining grid to look a passenger. If there was a pick-up intothe grid, the then pick-up movement at grid level for as described in Section 4.5, an estimation was event made as whether taxi agent would get a passenger or not, based on demand probability of success, as described location was estimated based on the distribution of real passenger trips that had originated in in that grid. Section 4.4. If there were no pick-up events, then the taxi agent would either stay in the same grid or With pick-up location successfully estimated, the route to the pick-up location was implemented using move to an adjoining grid to look for a passenger. If there was a pick-up event in the grid, then pickthe Dijkstra which wasonbased on OSMofnetwork. Fortrips each road travel upalgorithm location was[45] estimated based the distribution real passenger that had network originated segment, in that time was estimated and described in Section 4.6. The travel time was of each road segment of grid. With pick-up location successfully estimated, the cumulative route to the pick-up location implemented usingthen the Dijkstra [45] which was based OSM network. For eachas road network segment, the route was storedalgorithm as passenger search time.on Route interpolation, described in [46], was then travel time was estimated and described in Section 4.6. The cumulative travel time of each road ˆ to the pick-u location, with the sampling rate of 30 s. implemented to generate taxi agent points p. segment of the route was then stored as passenger search time. Route interpolation, as described in The time period chosen as to maintain [25] of the taxi along with [46], waswas then implemented to generate taxi the agentoverall points 𝑝𝑝̂trajectory to the pick-up location, with theagent, sampling reducing the storageThe load. module called in to log all the interpolated points, rate data of 30 seconds. timeData periodlogger was chosen as to was maintain the overall trajectory [25] of the taxi agent, along with reducing the data storage load. Data logger module was called in to log all the created during the taxi agent simulation movement, on to the file system. Updater module then resets interpolated points, created during the taxi agent simulation movement, on to the file system. the parameters for the taxi agent, such as location, as well as time for the succeeding module. Figure 14 Updater module then resets the parameters for the taxi agent, such as location, as well as time for the shows thesucceeding detailed passenger pick-up module. module. Figure 14 shows the detailed passenger pick-up module.
Figure 14. Passenger pick-up module.
ISPRS Int. J. Geo-Inf. 2018, 7, 177
ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW
5.5. Passenger Drop-Off Module
16 of 24
16 of 24
Figure 14. Passenger pick-up module.
5.5. Passengerdrop-off Drop-Offmodule Module subsequently was used for estimating taxi agent drop-off location. Passenger
Drop-off Passenger grid location wasmodule estimated based onwas trip origin-destination as location. described in drop-off subsequently used for estimating taxiprobability, agent drop-off Drop-off grid location was estimated based on trip origin-destination probability, as described in Section 4.3, for a given grid as well as a time interval. With drop-off grid estimated, drop-off location Section 4.3, for a given grid as well as a time interval. With drop-off grid estimated, drop-off location was then estimated based on the distribution of real passenger trip that had destinated in that grid. was then estimated basednetwork on the distribution of estimation, real passenger in that grid. Following, route selection, travel time as trip wellthat as had taxi destinated agent route interpolation, Following, route selection, network travel time estimation, as well as taxi agent route interpolation, was conducted as similar to the passenger pick-up module. Data logger was then called in to log all was conducted as similar to the passenger pick-up module. Data logger was then called in to log all the interpolated points on to the file system along with updater module to reset parameters for the the interpolated points on to the file system along with updater module to reset parameters for the succeeding module. Figure 15 shows the detailed passenger drop-off module. succeeding module. Figure 15 shows the detailed passenger drop-off module.
Figure 15. Passenger drop-off module.
Figure 15. Passenger drop-off module.
For each taxi agent simulation, one iteration cycle was composed of passenger pick-up module, data logger module updater module, passenger drop-off module, and again,pick-up data logger For each taxi agentand simulation, one iteration cycle was composed offinally passenger module, and updater of each iteration cycle triggered end criteria condition. data module logger module andmodule. updaterCompletion module, passenger drop-off module, and finally again, data Iflogger the end asmodule. set in initialization module were not met,cycle then the agent would again be moved to If the module and criteria updater Completion of each iteration triggered end criteria condition. passenger pick-up module, and the cycle would continue until the end criteria were met, indicating end criteria as set in initialization module were not met, then the agent would again be moved to the end of simulation for a given taxi agent.
passenger pick-up module, and the cycle would continue until the end criteria were met, indicating the end simulation a given taxi agent. 5.6. of Data Logger andfor Updater Module purpose of the data logger module was to log all the simulated data generated during both 5.6. Data The Logger and Updater Module
passenger pick-up module and passenger drop-off module. Following the data logger module, The purpose of the logger was to log theupdater simulated data served generated during updater module was data called in, asmodule shown in Figure 16.allThe module mainly two both passenger pick-up module andwas passenger drop-off module.local Following theofdata loggersuch module, updater purposes. The first purpose to update the individual parameter an agent, as agent location time. was16. to update the global parameters, included taxi module wasand called in,The as second shownpurpose in Figure The updater module servedwhich mainly two purposes. demand probability direction probability local data, parameter taxi origin destination probability data, and The first purpose was todata, update the individual of an agent, such as agent location passenger trip data. The global parameter could be updated by merging the simulated data with demand the and time. The second purpose was to update the global parameters, which included taxi real dataset. The new taxi demand probability data, direction probability data, taxi origin destination probability data, direction probability data, taxi origin destination probability data, and passenger trip probability data, and passenger trip data, then could be computed, based on merged simulated and
data. The global parameter could be updated by merging the simulated data with the real dataset. The new taxi demand probability data, direction probability data, taxi origin destination probability data, and passenger trip data, then could be computed, based on merged simulated and real dataset. The index of the parameters then needs to be updated, which would be used by the agents subsequently
ISPRS Int. J. Geo-Inf. 2018, 7, 177
ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW
17 of 24
17 of 24
real dataset. The index of the parameters then needs to be updated, which would be used by the
during simulation. The update on global parameter provided a mechanism where agent could interact agents subsequently during simulation. The update on global parameter provided a mechanism with each other indirectly. where agent could interact with each other indirectly.
Figure 16. Data logger and updater module.
Figure 16. Data logger and updater module.
6. Results
6. Results
The simulation was conducted in two scenarios, i.e., weekday and weekend, each for an entire
day, simulation such that overall of in simulated taxi service within city was kept The wasproperties conducted two scenarios, i.e.,behavior weekday andthe weekend, eachintact, for anasentire that with the real taxi service, by adjusting various parameters within the simulation process. day, such that overall properties of simulated taxi service behavior within the city wasVarious kept intact, property comparisons, on distribution, were conducted between the simulated taxi agent data as that with the real taxi based service, by adjusting various parameters within the simulation process. and the real taxi data, that showed the level of similarity between them. The overlapping coefficient, Various property comparisons, based on distribution, were conducted between the simulated taxi which is defined as a measure of the agreement between two probability distributions [47], was agent data and the real taxi data, that showed the level of similarity between them. The overlapping computed to identify the similarity measurement with a scale of 0 to 1. The measured value of 1 coefficient, which is defined as while a measure of the agreement two indicated a perfect match, the measured value of between 0 indicated noprobability interaction distributions between the [47], was computed identify thetaxi similarity with a scale of 0 to 1. The per measured distribution.to Four different attributesmeasurement that included trip generated, trip generated grid, tripvalue of 1 time, indicated a perfect while the measured value of distribution, 0 indicated whereas no interaction and trip distancematch, were compared together regarding their average between taxi speed was compared for hourly Finally, occupancytrip between the simulated and real per taxi grid, the distribution. Four different taxivariation. attributes that included generated, trip generated data was trip time, andalso tripevaluated. distance were compared together regarding their distribution, whereas average taxi speed was compared for hourly variation. Finally, occupancy between the simulated and real taxi data 6.1. Distribution Overlapping Coefficient was also evaluated. The weekday’s distribution comparison is shown in Figure 17. Trip distribution comparison
6.1. Distribution Overlapping Coefficient between simulated taxi data and real taxi data showed the overlapping coefficient of 0.81, which indicated some trips generated from the simulation were a close match with the real data. Similarly,
The weekday’s distribution comparison is shown in Figure 17. Trip distribution comparison a number of trips generated per grid distribution showed a very high overlapping coefficient of 0.98, between simulated taxi data and taxi data indicating high similarity for the tripreal generated on theshowed grid level.the overlapping coefficient of 0.81, which indicated some trips generated from the simulation a close match with the real As for the trip time distribution, regarding minutes, anwere overlapping similarity of 0.93 was data. Similarly, a number of trips generated per grid distribution showed a very high overlapping coefficient obtained, which showed significant similarity between simulated trip time and real trip time. Finally, of 0.98, high similarity for the atrip generated the grid level. for indicating trip distance, regarding kilometers, distribution of on overlapping similarity of 0.88 was obtained between simulated trip time and real trip time. The significant similarity for the weekday As for the trip time distribution, regarding minutes, overlapping an overlapping similarity of 0.93 was simulation suggested that the simulated taxi agent emulated the real taxi, keeping overall obtained, which showed significant similarity between simulated trip time and real trip time.taxi Finally, behavior intact. Table 3 shows distribution properties ofofthe four compared attributes taxi was behavior for trip distance, regarding kilometers, a distribution overlapping similarity ofof0.88 obtained for the weekday simulation. between simulated trip time and real trip time. The significant overlapping similarity for the weekday simulation suggested that the simulated taxi agent emulated the real taxi, keeping overall taxi behavior intact. Table 3 shows distribution properties of the four compared attributes of taxi behavior for the weekday simulation.
ISPRS Int. J. Geo-Inf. 2018, 7, 177
18 of 24
ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW
18 of 24
Figure 17. Simulated trip vs real trip regarding distribution (day type = weekday). Figure 17. Simulated trip vs real trip regarding distribution (day type = weekday). Table 3. Weekday simulated trip data vs real trip data comparison. Table 3. Weekday simulated trip data vs real trip data comparison. Weekday Simulated DataWeekday Real Data Type Overlapping Coefficient Mean Standard Deviation Simulated DataDeviationMeanStandard Real Data Type Overlapping Coefficient Trip Count 15.24 Standard 5.27 7.44 Deviation 0.81 Mean Deviation 13.87 Mean Standard Grid Trip Generated 15.249.02 20.42 9.57 19.827.44 0.98 0.81 Trip Count 5.27 13.87 Grid Trip Generated 20.42 9.57 Passenger Trip Time (min)9.02 20.18 16.50 21.81 18.5019.82 0.93 0.98 Passenger Trip Time (min) 20.18 16.50 21.81 Passenger Trip Distance (km) 9.09 7.87 10.31 9.7818.50 0.88 0.93
Passenger Trip Distance (km)
9.09
7.87
10.31
9.78
0.88
The weekend distribution comparison is shown in Figure 18. Trip distribution comparison The distribution comparison is shown in the Figure 18. Tripcoefficient distribution comparison betweenweekend simulated taxi data and real taxi data showed overlapping of 0.7, which between simulated taxi data and real taxi data showed the overlapping coefficient of 0.7, indicated a number of trips generated from the simulation were in accord with the real data. which indicated a number ofgenerated trips generated the simulation accord with the real data. Similarly, a number of trips per gridfrom distribution showed were a highinoverlapping coefficient of Similarly, a number of similarity trips generated per grid distribution showed 0.91, indicating high for trips generated on the grid level. a high overlapping coefficient of For trip time distribution, to minutes, thelevel. overlapping coefficient of 0.96 was 0.91, indicating high similarity forwith tripsrespect generated on the grid obtained, a significant similarity between simulated trip time and real of trip0.96 time. For tripwhich time showed distribution, with respect to minutes, the overlapping coefficient was Finally, which for tripshowed distance distribution, regarding kilometers, an overlapping similarity 0.86 Finally, was obtained, a significant similarity between simulated trip time and real tripoftime. between simulated regarding trip time and real trip time. The significant overlapping for forobtained trip distance distribution, kilometers, an overlapping similarity of 0.86similarity was obtained the weekend simulation also suggested that the simulated taxi agent emulated the real taxi keeping between simulated trip time and real trip time. The significant overlapping similarity for the weekend overall taxi behavior intact. Table shows distribution properties of the the real four taxi compared attributes simulation also suggested that the4simulated taxi agent emulated keeping overallof taxi taxi behavior for the weekend simulation. behavior intact. Table 4 shows distribution properties of the four compared attributes of taxi behavior
for the weekend simulation.
ISPRS Int. J. Geo-Inf. 2018, 7, 177
19 of 24
ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW
19 of 24
Figure 18. Simulated trip vs Real trip regarding distribution (Day type = Weekend). Figure 18. Simulated trip vs Real trip regarding distribution (Day type = Weekend). Table 4. Weekend Simulated Trip Data vs. Real Trip Data comparison. Table 4. Weekend Simulated Trip Data vs. Real Trip Data comparison. Weekend Simulated DataWeekend Real Data Type Overlapping Coefficient Mean Standard Deviation Simulated DataDeviationMeanStandard Real Data Type Overlapping Coefficient Trip Count 13.31 Standard 4.86 8.58 Deviation 0.70 Mean Deviation 15.78 Mean Standard Grid Trip Generated 13.318.18 20.68 11.50 23.818.58 0.91 0.70 Trip Count 4.86 15.78 Grid Trip Generated 20.68 11.50 Passenger Trip Time (min)8.18 19.17 16.13 20.37 16.7023.81 0.96 0.91 Passenger TripTrip TimeDistance (min) 19.17 16.13 20.37 Passenger (km) 9.01 7.91 10.74 10.0216.70 0.86 0.96
Passenger Trip Distance (km)
9.01
7.91
10.74
10.02
0.86
As described in Section 4.3, only those trips that had origin and destination in Bangkok and As described in Section 4.3,considered only thosetotrips that had and destination in Bangkok and surrounding provinces were construct theorigin OD matrix, and subsequently, OD surrounding were considered construct the OD and subsequently, ODsimilarity probability. probability.provinces Hence, for both weekdayto and weekend, in matrix, the overlapping coefficient Hence, for bothonly weekday in the overlapping coefficient similarity comparison,Out only comparison, those and tripsweekend, within Bangkok and surrounding provinces were considered. of those all trips within Bangkok and surrounding provinces were considered. Out of all the real passenger trips the real passenger trips within Bangkok and surrounding provinces, 97% of the trip had a trip time of less than 2 and h, and 98% of the trip had 97% a tripofdistance less athan Thisthan implies that98% the of within Bangkok surrounding provinces, the trip had trip 100 timekm. of less 2 h, and emulate taxi This behaviors forthat a trip distance within 100 obtained km, and trip thesimulated trip had result a trip obtained distance could less than 100 km. implies the simulated result could time within 2 h. emulate taxi behaviors for a trip distance within 100 km, and trip time within 2 h. The simulated dataresults resultsin inparameters parameters i.e., i.e., trip passenger trip time The simulated data trip count, count,grid gridtrip tripgenerated, generated, passenger trip time (min), passenger distance as compared real dataparameters, result parameters, was (min), andand passenger trip trip distance (km),(km), as compared to the to realthe data result was marginally marginally lower, as shown in Table simulated 4. However, simulated data resultcould parameters could be by lower, as shown in Table 4. However, data result parameters be maintained maintained by adjusting the demand probability success for which the current threshold value was to adjusting the demand probability success for which the current threshold value was set empirically set empirically to 15%. In addition, the higher value of standard deviation for both simulated 15%. In addition, the higher value of standard deviation for both simulated weekday and weekend weekday and weekend results especially for the grip trip generated and passenger trip time was results especially for the grip trip generated and passenger trip time was obtained as the distribution obtained as the distribution was computed at the grid level for which grip trip generated within the was computed at the grid level for which grip trip generated within the inner city would have obtained inner city would have obtained more trip as compared to the grid which was located at the outskirts more trip as compared to the grid which was located at the outskirts of the city. Similarly, passenger of the city. Similarly, passenger trip time would also vary depending upon the individual trip trip time would also vary depending upon the individual trip generated, which resulted in the high generated, which resulted in the high standard deviation in simulated result. standard deviation in simulated result.
ISPRS Int. J. Geo-Inf. 2018, 7, 177 ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW
20 of 24 20 of 24 20 of 24
6.2. Average Speed Hourly Hourly Variation Variation 6.2. Average Speed 6.2. Average Speed Hourly Variation OSM road road network network based based average average speed speed comparison comparison was h OSM was conducted conducted for for time time intervals intervals of of 11 h OSM road network based average speed comparison wasweekend. conductedFigure for time intervals of hourly 1h between simulated and real datasets for both weekday and 19 shows the between simulated and real datasets for both weekday and weekend. Figure 19 shows the hourly betweenof simulated and real forregion both weekday andand weekend. Figure provinces. 19 shows the hourly variation average speed speed fordatasets the entire entire of Bangkok Bangkok surrounding variation of average for the region of and surrounding provinces. Comparison Comparison variation of average speed for the entire region of Bangkok and surrounding provinces. of an an average average speed speed showed showed an R squared squared value value of of 0.96 0.96 for for the the weekday weekday comparison comparison Comparison and R R squared squared of an R and of an average speed showed an R squared value of 0.96 for the weekday comparison and R squared value of of 0.97 for for the weekend’s weekend’s comparison. The high squared value for both both weekday weekday and and value Thehigh highRR Rsquared squaredvalue value value of0.97 0.97 forthe the weekend’s comparison. comparison. The forfor both weekday and weekend simulated data suggested simulated taxi agents could keep the real taxi properties intact. weekend simulatedtaxi taxiagents agentscould could keep real properties intact. weekendsimulated simulateddata data suggested suggested simulated keep thethe real taxitaxi properties intact. As mentioned mentioned previously, all query, query, search, search, and and retrieval retrieval tasks tasks were conducted over grid network. As previously, all were conducted over grid network. As mentioned previously, all query, search, and retrieval tasks were conducted over grid network. However, the the routing and and route interpolation interpolation were were conducted conducted over over the the OSM OSM road road network, network, which which is However, However, therouting routing androute route interpolation were conducted over the OSM road network, which is is shown in terms of taxi agent trajectory in Figure 20. Similarly, the simulated trajectory for the shown terms of taxi agent trajectory in Figure 20. Similarly, the simulated trajectory for thefor weekday showninin terms of taxi agent trajectory in Figure 20. Similarly, the simulated trajectory the weekday data at a time interval of every 4 h is shown in Figure 21. data at a time ofinterval every 4of h is shown inshown Figurein 21.Figure 21. weekday datainterval at a time every 4 h is
Figure 19. 19. Average Average speed concerning hourly time interval. Figure speed variation variation Figure 19. Average speed variation concerning concerning hourly hourly time time interval. interval.
Figure 20. Simulated trajectory of taxi agent for one day. Left: Weekday; Right: Weekend.
Figure Right: Weekend. Weekend. Figure 20. 20. Simulated Simulated trajectory trajectory of of taxi taxi agent agent for for one one day. day. Left: Left: Weekday; Weekday; Right:
ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW
21 of 24
ISPRS Int. J. Geo-Inf. 2018, 7, 177 ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW
21 of 24
21 of 24
Figure 21. Simulated taxi agent trajectory visualization for weekday at 4 h time intervals.
6.3. Taxi Occupancy Evaluation taxi agent forweekday weekday timeintervals. intervals. OneFigure of the21. properties that characterizes taxivisualization service behavior is its occupancy, which is defined Figure 21.Simulated Simulated taxi agenttrajectory trajectory visualization for atat4 4hhtime as the ratio of taxi driving time with a passenger to total driving time [39]. Figure 22 shows the distribution of taxi occupancy ratio for the simulated taxi agent and real taxi data. 6.3.frequency Taxi Occupancy Evaluation 6.3. Taxi Occupancy Evaluation The occupancy ratio analysis showed for simulated taxi agent data occupancy was clustered One properties that characterizestaxi taxiservice servicebehavior behaviorisisits itsoccupancy, occupancy,which which is is defined defined as One of of thethe properties that characterizes around 37–41%, whereas for the real taxi data, occupancy ratio was clustered around 48–52%. Though as the ratio of taxi driving time with a passenger to total driving time [39]. Figure 22 shows the thesimulated ratio of taxi driving with slight a passenger to total driving time [39]. Figure 22 ratio; shows the the frequency taxi data time showed underestimation regarding occupancy overall frequency of distribution of taxi occupancy ratio for the taxi simulated anddata. real taxi data. distribution taxikept occupancy simulated agent taxi andagent real taxi distribution was similar ratio to thefor realthe taxi data. The occupancy ratio analysis showed for simulated taxi agent data occupancy was clustered around 37–41%, whereas for the real taxi data, occupancy ratio was clustered around 48–52%. Though simulated taxi data showed slight underestimation regarding occupancy ratio; the overall distribution was kept similar to the real taxi data.
Figure 22. Taxi occupancy. Left: Simulated taxi agent data; Right: Real taxi data. Figure 22. Taxi occupancy. Left: Simulated taxi agent data; Right: Real taxi data.
7. Conclusions The occupancy ratio analysis showed for simulated taxi agent data occupancy was clustered Modeling of taxi is an important aspecttaxi of understanding the behavior of taxi service 22.service Taxi occupancy. Simulated agent Real taxi data.48–52%. around 37–41%, Figure whereas for the real taxiLeft: data, occupancy ratio data; was Right: clustered around Though level in the city. This paper proposes a data driven agent-based simulation model to study simulated simulated taxi data showed slight underestimation regarding occupancy ratio; the overall distribution taxi behaviors in a large-scale urban area with the taxi probe vehicle data. Analysis of the taxi agent 7. Conclusions was kept similar to the real taxi data. simulation showed a significant similarity with the real taxi data, indicating that the simulated result Modeling of taxi service is an important aspect of understanding the behavior of taxi service could keep the real nature of taxi service behavior. The previous study on agent-based modeling for 7. Conclusions level in the city. This paper proposes a data driven agent-based simulation model to study simulated taxi behavior analyses have compared the measured and the modeled travel distance, and travel time taxi behaviorsofintaxi a large-scale urban area with aspect the taxiof probe vehicle data.the Analysis of the Modeling service an important understanding behavior of taxi taxi agent service with the cost, to validate theismodel. However, for travel time, results were scattered between the simulation showed a significant similarity with the real taxi data, indicating that the simulated result level in the city. This paper proposes a data driven agent-based simulation model to study simulated could keep the real nature of taxi service behavior. The previous study on agent-based modeling for taxi behaviors in a large-scale urban area with the taxi probe vehicle data. Analysis of the taxi agent taxi behavior analyses have compared the measured and the modeled travel distance, and travel time simulation showed a significant similarity with the real taxi data, indicating that the simulated result with the cost, to validate the model. However, for travel time, results were scattered between the could keep the real nature of taxi service behavior. The previous study on agent-based modeling
ISPRS Int. J. Geo-Inf. 2018, 7, 177
22 of 24
for taxi behavior analyses have compared the measured and the modeled travel distance, and travel time with the cost, to validate the model. However, for travel time, results were scattered between the measured and the model data [12], for which two issues regarding road network density and routing were mentioned. In the agent-based modeling presented here, taxi service modeling was categorized based on weekday and weekend. Nevertheless, with the increasing utilization of GPS probe data, modeling of service can be made by adding other entities, such as daily variation, monthly variation, etc. More importantly, such simulation can help understand and predict the effect of having a large number of taxis in the spatial and temporal domain with low demand, and vice versa. Understanding such taxi behavior in the city can significantly help managing and dispatching the fleet of the taxi that can make monetary profit for the drivers. The limitation of the current agent-based model is that the current agent-based model system utilizes an offline learning method, which possesses constraints in terms of time and resources when required to learn from a high-speed streaming dataset. The offline learning method, despite having many use cases, possess a limitation in regards how it can handle new datasets. In such cases, the model needs to be improved that would help accommodate learning from high-speed steaming data, as proposed in [36], where the OD matrix constantly evolves as time progresses, with the addition of new datasets and removal of the outdated datasets. The model can further be improved in terms of free movement of vacant taxis, with regard to replacing movement directed by direction angle to searching the next road network node at each time interval. Furthermore, current agent-based modeling could describe the taxi behavior with a trip time of 2 h and trip distance of 100 km. Though such trips accounted for about 98% of the total trip, the model could be further improved to encapsulate both short and long trips, regarding both time and distance. Author Contributions: S.R., A.W. and R.S. conceptualize and designed the system. M.N. and A.W. provided spatial data management framework, technique and methodology. A.W., M.N. and R.S. contributed to the improvement of the overall system and the original manuscript. Acknowledgments: This research was facilitated and funded by Shibasaki Laboratory (http://shiba.iis.u-tokyo. ac.jp/) of The University of Tokyo. This research was partially supported by Toyota Tsusho Nexty Electronics (Thailand) Co., Ltd., by providing Taxi Probe Data for research purpose. The author would like to thank Hiroshi Kanasugi from The University of Tokyo who provided insight and expertise that greatly assisted the research. Conflicts of Interest: The authors declare no conflict of interest.
References 1.
2. 3.
4. 5. 6.
7.
Baster, B.; Duda, J.; Maciol, A.; Rebiasz, B. Rule-Based Approach to Human-like Decision Simulating in Agent-Based Modeling and Simulation. In Proceedings of the 2013 17th International Conference on System Theory, Control and Computing (ICSTCC) 2013, Sinaia, Romania, 11–13 October 2013; pp. 739–743. Bonabeau, E. Agent-Based Modeling: Methods and Techniques for Simulating Human Systems. Proc. Natl. Acad. Sci. USA 2002, 99 (Suppl. 3), 7280–7287. [CrossRef] [PubMed] Tu, W.; Li, Q.; Fang, Z.; Shaw, S.-L.; Zhou, B.; Chang, X. Optimizing the Locations of Electric Taxi Charging Stations: A Spatial–temporal Demand Coverage Approach. Transp. Res. Part C Emerg. Technol. 2016, 65, 172–189. [CrossRef] Sadahiro, Y.; Lay, R.; Kobayashi, T. Trajectories of Moving Objects on a Network: Detection of Similarities, Visualization of Relations, and Classification of Trajectories. Trans. GIS 2013, 17, 18–40. [CrossRef] Miwa, T.; Sakai, T.; Morikawa, T. Route Identification and Travel Time Prediction Using Probe-Car Data. Int. J. ITS Res. 2004, 2, 21–28. Cheng, S.F.; Nguyen, T.D. TaxiSim: A Multiagent Simulation Platform for Evaluating Taxi Fleet Operations. In Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, Lyon, France, 22–27 August 2011; Volume 2, pp. 14–21. Bischoff, J.; Maciejewski, M.; Sohr, A. Analysis of Berlin’s Taxi Services by Exploring GPS Traces. In Proceedings of the 2015 International Conference on Models and Technologies for Intelligent Transportation Systems (MT-ITS), Budapest, Hungary, 3–5 June 2015; pp. 209–215.
ISPRS Int. J. Geo-Inf. 2018, 7, 177
8. 9.
10. 11. 12. 13. 14.
15. 16. 17.
18. 19. 20. 21. 22. 23.
24. 25.
26. 27. 28.
29.
23 of 24
Yuan, N.J.; Zheng, Y.; Zhang, L.; Xie, X. T-Finder: A Recommender System for Finding Passengers and Vacant Taxis. IEEE Trans. Knowl. Data Eng. 2013, 25, 2390–2403. [CrossRef] Moreira-Matias, L.; Gama, J.; Ferreira, M.; Mendes-Moreira, J.; Damas, L. On Predicting the Taxi-Passenger Demand: A Real-Time Approach. In Progress in Artificial Intelligence; Correia, L., Reis, L.P., Cascalho, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; Volume 8154 LNAI, pp. 54–65. Maciejewski, M.; Salanova, J.M.; Bischoff, J.; Estrada, M. Large-Scale Microscopic Simulation of Taxi Services. Berlin and Barcelona Case Studies. J. Ambient Intell. Humaniz. Comput. 2016, 7, 385–393. [CrossRef] Abar, S.; Theodoropoulos, G.K.; Lemarinier, P.; O’Hare, G.M.P. Agent Based Modelling and Simulation Tools: A Review of the State-of-Art Software. Comput. Sci. Rev. 2017, 24, 13–33. [CrossRef] Grau, J.M.S.; Romeu, M.A.E. Agent Based Modelling for Simulating Taxi Services. Procedia Comput. Sci. 2015, 52, 902–907. [CrossRef] Raychaudhuri, S. Introduction to Monte Carlo Simulation. In Proceedings of the 2008 WSC Winter Simulation Conference, Miami, FL, USA, 7–10 December 2008; pp. 91–100. Deng, Z.; Ji, M. Spatiotemporal Structure of Taxi Services in Shanghai: Using Exploratory Spatial Data Analysis. In Proceedings of the 2011 19th International Conference on Geoinformatics, Shanghai, China, 24–26 June 2011; pp. 1–5. Wong, K.I.; Wong, S.C.; Bell, M.G.H.; Yang, H. Modeling the Bilateral Micro-Searching Behavior for Urban Taxi Services Using the Absorbing Markov Chain Approach. J. Adv. Transp. 2005, 39, 81–104. [CrossRef] Yuan, J.; Zheng, Y.; Zhang, L.; Xie, X.; Sun, G. Where to Find My Next Passenger? In Proceedings of the 13th International Conference on Ubiquitous Computing, Beijing, China, 17–21 September 2011; pp. 109–118. Li, B.; Zhang, D.; Sun, L.; Chen, C.; Li, S.; Qi, G.; Yang, Q. Hunting or Waiting? Discovering Passenger-Finding Strategies from a Large-Scale Real-World Taxi Dataset. In Proceedings of the 2011 IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOM Workshops), Seattle, WA, USA, 21–25 March 2011; pp. 63–68. Szeto, W.Y.; Wong, R.C.P.; Wong, S.C.; Yang, H. A Time-Dependent Logit-Based Taxi Customer-Search Model. Int. J. Urban Sci. 2013, 17, 184–198. [CrossRef] Wong, R.C.P.; Szeto, W.Y.; Wong, S.C. A Cell-Based Logit-Opportunity Taxi Customer-Search Model. Transp. Res. Part C Emerg. Technol. 2014, 48, 84–96. [CrossRef] 2Wong, R.C.P.; Szeto, W.Y.; Wong, S.C. Behavior of Taxi Customers in Hailing Vacant Taxis: A Nested Logit Model for Policy Analysis. J. Adv. Transp. 2015, 49, 867–883. 2Wong, R.C.P.; Szeto, W.Y.; Wong, S.C. A Two-Stage Approach to Modeling Vacant Taxi Movements. Transp. Res. Procedia 2015, 7, 254–275. Chakka, V.P.; Everspaugh, A.C.; Patel, J.M. Indexing Large Trajectory Data Sets with SETI. In Proceedings of the CIDR Conference on Innovative Data Systems Research, Asilomar, CA, USA, 5–8 January 2003. Zhang, Y.; Li, J. Research and Improvement of Search Engine Based on Lucene. In Proceedings of the 2009 International Conference on Intelligent Human-Machine Systems and Cybernetics, Hangzhou, China, 26–27 August 2009; pp. 270–273. Witayangkurn, A.; Horanont, T.; Shibasaki, R. The Design of Large Scale Data Management for Spatial Analysis on Mobile Phone Dataset. Asian J. Geoinform. 2013, 13, 17–24. Ranjit, S.; Nagai, M.; Witayangkurn, A.; Shibasaki, R. Sensitivity Analysis of Map Matching Techniques of High Sampling Rate GPS Data Point of Probe Taxi on Dense Open Street Map Road Network of Bangkok in a Large-Scale Data Computing Platform. In Proceedings of the 15th International Conference on Computers in Urban Planning and Urban Management, Adelaide, Australia, 11–14 July 2017. Nam, D.; Hyun, K.; Kim, H.; Ahn, K.; Jayakrishnan, R. Analysis of Grid Cell–Based Taxi Ridership with Large-Scale GPS Data. Transp. Res. Rec. J. Transp. Res. Board 2016, 2544, 131–140. [CrossRef] Castro, P.S.; Zhang, D.; Chen, C.; Li, S.; Pan, G. From Taxi GPS Traces to Social and Community Dynamics: A Survey. ACM Comput. Surv. 2013, 46, 1–34. [CrossRef] Li, Q.; Zheng, Y.; Xie, X.; Chen, Y.; Liu, W.; Ma, W.Y. Mining User Similarity Based on Location History. In Proceedings of the 16th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Irvine, CA, USA, 5–7 November 2008. Zheng, Y.U. Trajectory Data Mining: An Overview. ACM Trans. Intell. Syst. Technol. 2015, 6, 1–41. [CrossRef]
ISPRS Int. J. Geo-Inf. 2018, 7, 177
30.
31.
32. 33. 34. 35. 36. 37. 38.
39.
40.
41. 42.
43. 44. 45. 46.
47.
24 of 24
Ester, M.; Kriegel, H.; Sander, J.; Xu, X. A Density-Based Algorithm for Discovering Clusters a Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96), Portland, OR, USA, 2–4 August 1996. Gan, J.; Tao, Y. DBSCAN Revisited: Mis-Claim, Un-Fixability, and Approximation. In Proceedings of the 2015 ACM SIGMOD IInternational Conference on Management of Data, Melbourne, Victoria, Australia, 31 May 31–4 June 2015; pp. 519–530. Wong, D.W.S.; Huang, Q. Sensitivity of DBSCAN in Identifying Activity Zones Using Online Footprints. In Proceedings of the Spatial Accuracy 2016, Montpellier, France, 5–8 July 2016; pp. 151–156. Campello, R.J.G.B.; Moulavi, D.; Sander, J. Density-Based Clustering Based on Hierarchical Density Estimates. In Advances in Knowledge Discovery and Data Mining; Springer: London, UK, 2013; pp. 160–172. Gonzales, E.; Yang, C.; Morgul, F.; Ozbay, K. Modeling Taxi Demand with GPS Data from Taxis and Transit; Mineta National Transit Research Consortium: San Jose, CA, USA, 2014. Ge, Q.; Fukuda, D. Updating Origin-Destination Matrices with Aggregated Data of GPS Traces. Transp. Res. Part C Emerg. Technol. 2016, 69, 291–312. [CrossRef] Moreira-Matias, L.; Gama, J.; Ferreira, M.; Mendes-Moreira, J.; Damas, L. Time-Evolving O-D Matrix Estimation Using High-Speed GPS Data Streams. Expert Syst. Appl. 2016, 44, 275–288. [CrossRef] Zhang, D.; He, T.; Lin, S.; Munir, S.; Stankovic, J.A. Taxi-Passenger-Demand Modeling Based on Big Data from a Roving Sensor Network. IEEE Trans. Big Data 2016, 3, 362–374. [CrossRef] Ke, J.; Zheng, H.; Yang, H.; Chen, X.M. Short-Term Forecasting of Passenger Demand under on-Demand Ride Services: A Spatio-Temporal Deep Learning Approach. Transp. Res. Part C Emerg. Technol. 2017, 85, 591–608. [CrossRef] Lv, H.; Fang, F.; Zhao, Y.; Liu, Y.; Luo, Z. A Performance Evaluation Model for Taxi Cruising Path Recommendation System. In Advances in Knowledge Discovery and Data Mining; Kim, J., Shim, K., Cao, L., Lee, J.-G., Lin, X., Moon, Y.-S., Eds.; Springer International Publishing: Cham, Switzerland, 2017; Volume 10235 LNAI, pp. 156–167. Grau, J.M.S.; Moreira-Matias, L.; Saadallah, A.; Tzenos, P.; Aifadopoulou, G.; Chaniotakis, E.; Romeu, M.A.E. Informed versus Non-Informed Taxi Drivers: Agent-Based Simulation Framework for Assessing Their Performance. In Proceedings of the Transportation Research Board 97th Annual Meeting, Washington, DC, USA, 7–11 January 2018. Liu, K.; Yamamoto, T.; Morikawa, T. An Analysis of the Cost Efficiency of Probe Vehicle Data at Different Transmission Frequencies. Int. J. ITS Res. 2006, 4, 21–28. Liu, K.; Yamamoto, T.; Morikawa, T. Comparison of Time/space Polling Schemes for a Probe Vehicle System. In Proceedings of the 14th World Conference on Intelligent Transport Systems, Beijing, China, 9–13 October 2007. Wang, Y.; Zhu, Y.; He, Z.; Yue, Y.; Li, Q. Challenges and Opportunities in Exploiting Large-Scale GPS Probe Data; Technical Report HPL-2011-109; HP Laboratories: Palo Alto, CA, USA, 2011. Helbing, D. Agent-Based Modeling. In Social Self-Organization: Agent-Based Simulations and Experiments to Study Emergent Social Behavior; Helbing, D., Ed.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 25–70. Sekimoto, Y.; Shibasaki, R.; Kanasugi, H.; Usui, T.; Shimazaki, Y. PFlow: Reconstruction of People Flow by Recycling Large-Scale Fragmentary Social Survey Data. IEEE Pervasive Comput. 2011, 10, 27–35. [CrossRef] Kanasugi, H.; Sekimoto, Y.; Kurokawa, M.; Watanabe, T.; Muramatsu, S.; Shibasaki, R. Spatiotemporal Route Estimation Consistent with Human Mobility Using Cellular Network Data. In Proceedings of the 2013 IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOM Workshops), San Diego, CA, USA, 18–22 March 2013; pp. 267–272. Inman, H.F.; Bradley, E.L. The Overlapping Coefficient as a Measure of Agreement Between Probability Distributions and Point Estimation of the Overlap of Two Normal Densities. Commun. Stat. Theory Methods 1989, 18, 3851–3874. [CrossRef] © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).