Data-driven Geospatial-enabled Transportation ...

Data-driven Geospatial-enabled Transportation Platform for Freeway Performance Analysis

Sa Xiao Ph.D. Student, Graduate Research Assistant1) Tel: (206) 495-1882, Email: [email protected] Xiaoyue Cathy Liu, Ph.D. (Corresponding Author) Assistant Professor2) Tel: (281) 760-9768, Email: [email protected]

ra ft

and Yinhai Wang, Ph.D. Professor1) Tel: (206) 616-2696, Fax: (206) 543-1543, Email: [email protected]

1) Department of Civil & Environmental Engineering, University of Washington Box 352700, Seattle, WA 98195-2700

D

2) Department of Civil & Environmental Engineering, University of Utah Salt Lake City, Utah, 84112

Xiao, Liu, Wang

2

ABSTRACT

ra ft

The burgeoning field of big data nowadays has motivated the development of innovative architecture for better exploiting and exploring huge amount of multidisciplinary data. Inspired by the concept of eScience, the on-line transportation platform Digital Roadway Interactive Visualization and Evaluation Network (DRIVE Net) is developed in this study for the purpose of transportation data sharing, integration, visualization, and analysis. The major research goal of the DRIVE Net system can be summarized in threefold. First, it provides the repository service to facilitate data sharing and integration. Second, the system is capable of visualizing large sets of transportation data, helping users to perceive and understand the data. Third, the interactive and computational functionalities built into the DRIVE Net allow users to perform a variety of statistical modeling and analysis on multiple data sources, assisting with users to draw meaningful inferences and to make informed decisions. This research thus developed such an eScience platform addressing the aforementioned challenges for transportation applications. To particularly demonstrate the analytical capability of DRIVE Net, a new approach that automates real-time freeway performance measurement is developed and implemented onto the system. The proposed method provides quantitative evaluation of network-wide freeway performance to facilitate decision making in transportation operations and management.

D

Keywords: Freeway Performance Measurement; Data Visualization; Geospatial Data Fusion; Pixel-based Segmentation; Cluster Analysis

Xiao, Liu, Wang

3 I.

Introduction

D

ra ft

Big data motivates and inspires scientists and researchers to develop new infrastructure for better exploiting and exploring petabytes of multidisciplinary data. In 1999, John Taylor, the Director General of Research Councils in the UK Office of Science and Technology, first introduced the term “eScience” [1]. The concept of eScience captures the novel generation of infrastructure that enables researchers from multidisciplinary areas to collaborate with each other for achieving better, faster, and more diverse research capabilities. Most of the eScience research activities have been looked into the development of new computational tools to assist with scientific discovery. Inspired by the concept of eScience, the online transportation platform prototype, Digital Roadway Interactive Visualization and Evaluation Network (DRIVE Net), was developed in 2008 for the purpose of data sharing, integration, visualization, and analysis [2]. The system provides users with the capability to store, access, and manipulate data from anywhere as long as they have Internet connections. The major research goal for the DRIVE Net system can be summarized in threefold. First, it provides the repository service to facilitate data sharing and integration. The existing data sources integrated amongst various agencies include roadway geometric data, loop detector data, Bluetooth data, INRIX speed data, incident data, weather data, freeway travel time, etc. Second, it can visualize large sets of transportation data, helping users to perceive and understand the data. Third, the interactive and computational functionalities built into the DRIVE Net system allow users to perform a variety of statistical modeling and analysis on multiple data sources, assisting users to draw meaningful inferences and make informed decisions. The system benefits not only transportation practitioners and researchers but also the general public by providing both historical and real-time transportation information and numerous performance measures in the broader context of an interdisciplinary framework. As a demonstration of the DRIVE Net’s functionality for data-driven discoveries and decision making, this study illustrates the computational procedure for automating real-time freeway performance analysis upon this platform. Freeway networks play a vital role in providing accessibility to a multitude of resources, while social and economic needs continue to shape the freeway network in turn with the rapid advancement of technology. Considering the fact that freeway performance analysis takes into consideration complex interactions amongst geometric, environmental, political, behavioral, and technological features, it is an ideal example to showcase the capability of DRIVE Net from an eScience perspective. The remainder of this paper is organized as follows. It first gives an overview of the state-ofthe-art in web-based transportation system development. This is followed by a detailed description of the research highlight, more specifically, the innovative system architecture adopted by the nextgeneration DRIVE Net 3.0. The authors further present an application on top of the system, automating real-time freeway performance measurement, with explanation of proposed methodologies and implementation results. The conclusion and future work are summarized at last. II.

State-of-the-practice for web-based transportation platform

During the past decades, a significant amount of research in Intelligent Transportation System (ITS) has been focusing on developing web-based transportation information system to enable real-time information extraction with user-friendly interface. In this section, several representative online transportation systems related to DRIVE Net are reviewed.

Xiao, Liu, Wang

4

A. Freeway performance measurement system (PeMS) Established in 1998, PeMS is a freeway performance measurement system jointly developed by the University of California, Berkeley, California Department of Transportation (Caltrans), and the Partners for Advanced Transportation Technology (PATH). With the support from Caltrans and local agencies, the system integrates various traffic data sources including traffic detectors, census traffic counts, incident logs, vehicle classification data, toll tag-based data, and roadway inventory, etc. These traffic data have been automatically collected and archived for over ten years and real-time information is updated from over 25,000 detectors [3], [4]. As a critical component of Caltrans performance measurement system, PeMS provides a variety of freeway evaluations in terms of speed, occupancy, travel time, vehicle miles traveled, vehicle hours traveled, and vehicle hours of delay, etc.

ra ft

B. Regional integrated transportation information system (RITIS)

RITIS is an automated data archiving and integration system developed by the Center for Advanced Transportation Technology Laboratory (CATT Lab) at the University of Maryland. As one of representative online transportation systems, RITIS targets to improve transportation safety, efficiency, and security by fusing and mining the transportation-related data in Maryland, Virginia, and the District of Columbia. The system provides both real-time and historical data to users with access credentials, including incident, weather, radio scanners, and other sensors. Numerous visualization and analysis tools are developed to enable the interactive exploration and analysis of performance measures from data archival. DOT or public safety employees could possibly use the RITIS service by applying online. The system is not accessible to the general public [5]. C. Portland Oregon regional transportation archive listing (PORTAL)

D

Originally established in 2004 with simple user interface and only one data source – freeway loop detector, PORTAL has evolved significantly over the past eight years. In addition to the loop detector data from the Portland-Vancouver metropolitan region, now PORTAL 2.0 archives approximately one-terabyte transportation data including weather data, incident data, freight data and transit data. The system takes advantages of Adobe Flash and Google Maps technologies to display transportation data spatially. Additionally, various graphical and tabulated performance information are available on the website, such as incident reports, transit speed map, traffic count, vehicle miles traveled, and vehicle hours traveled [6]. D. Freeway and arterial system of transportation dashboard (FAST)

FAST dashboard, released online in September 2010 (http://bugatti.nvfast.org), is a web-based system developed to control and monitor the traffic in the Las Vegas and Nevada Metropolitan areas [10]. In collaboration with the Nevada Department of Transportation (NDOT), the system collects and archives real-time traffic data retrieved from loop detectors, radar detectors, and Bluetooth sensors deployed on freeways and ramps. Traffic data including lane occupancy, volume, and speed are further processed as the major source for performance measurement. Also integrated in the system are incident data in report format collected from the general public, and weather data shared by NDOT Road Weather Information System.

Xiao, Liu, Wang

5

The majority of these systems only provides simple and straightforward evaluations in terms of speed, travel time, occupancy, vehicle miles traveled, vehicle hours traveled, and vehicle hours of delay. However, in reality, more sophisticated evaluations should be adopted to capture interactions among geometric, environmental, political, behavioral, and technological features. With regard to accessibility, PORTAL and FAST is accessible to the general public, while PeMS and RITIS has designated usage for their customers and is not open to the public. Besides, most of these systems leave their core technique and architecture uncovered, making it difficult to be used by the transportation academic community. These led to the motivation of the development of DRIVE Net system, which is more informative, accessible, interoperable, scholarly and exploratory in nature. III.

Methodology

A. Fat-Server Thin-Client framework (DRIVE Net architecture)

D

ra ft

The next-generation DRIVE Net system, referred to as DRIVE Net 3.0, adopts the “thin-client and fat server” architecture with three basic tiers of web application, i.e. presentation tier, logic tier, and data tier, as showed in Fig. 1. Presentation tier includes the user interface terminal via which users interact with the application. Logic tier, also known as computational tier, is the core component of DRIVE Net system. It performs computations to assist with customized analysis and decision making based on users’ interactive input. Analytical tools developed include incidentinduced delay forecasting using deterministic queuing theory [8], Bluetooth-based pedestrian trajectory re-construction [9], GPS-based truck performance measure [10], etc. Data tier organizes and supports data requested for analysis. Normally the client handles the user interface while the server is responsible for the data. The significant difference between “thin-client and fat server” and “fat-client and thin server” is the shifted responsibility for the logic/computational tier [11]. In fat-server systems, the server fully takes over the logic/computational tier while the client only hosts the presentation tier for displaying user interface and dealing with user interaction. There are three reasons to adopt a thin-client architecture. First, no plug-in and installation is required at the client side except a basic browser, which ensures compatibility to the greatest extent. Considering the fact that the system is designed for customers with constrained network environment, minimal requirement at the client side is most desirable. Second, there is less security concern since all the data are manipulated and computational tasks are performed at the server side, while the client is only responsible for user interaction and results presentation. Third, mature frameworks for building thin-client web application could be re-used to boost development productivity. However, thin-client architecture does have its drawbacks. One major disadvantage is that the performance of the system solely depends on the server and excessive user requests would greatly compromise system efficiency. This can be remedied nowadays with the continuous advancement of cloud computing technologies such as Amazon Web Service, where the cloud servers are fully utilized to improve system performance. The DRIVE Net 3.0 can be accessed at: http://www.uwdrive.net or http://wsdot.uwdrive.net. The data communication flows in the DRIVE Net system can be summarized as follows and presented in Fig. 1: 1. The end-user sends an HTTP(S) request to the web server. 2. The web server looks into the request and retrieves the related data information from data warehouse.

Xiao, Liu, Wang

6

D

ra ft

3. The warehouse sends back the requested data and the web server performs the computational tasks using either the built-in analytical tools or external statistical modules provided by R Server. 4. If geospatial analysis is involved, the web server will connect to the OpenStreetMap Server and request the map. 5. Analysis results as well as the map are then returned to the client. Web browser displays the results or visualizes the returned objects on the map.

Xiao, Liu, Wang

7 Thin-Client Side

HTTP(S)

1

5

Incident Induced Delay Calculation

Dynamic Routing

Travel Time Performance Measure

Pedestrian Trajectory Reconstruction

4 OpenStreetMap Server

ra ft

Web Mapping Service

3

R Server

Statistical Analysis Service

Freight Performance Measure

DRIVENet Web Server

3

Traffic Emission Evaluation Freeway Performance Measure

Corridor Sensor Comparison

Real-time Traffic

Fat-Server Side

2

Loop Detectors

OpenStreetMap

WSDOT Roadway Geometric data

Data fusion

Data fusion

Geospatial Data

Data fusion

Weather

Transportation Data

INRIX Speed

…...

…...

D

HPMS

TMC Network

WITS

Data Quality Control

…...

File Importing

FTP Downloading

Data Sources

Data Sources

External Database Connecting

…...

Data Sources

Xiao, Liu, Wang

8 Fig. 1. DRIVE Net 3.0 architecture.

To reduce costs and boost productivity, multiple open source products are utilized for the system implementation. DRIVE Net not only takes advantage of code-sharing and collaboration with a broad community of developers, but also contributes to open source projects. Core open source products that are integrated into DRIVE Net system are introduced below: 1) OpenStreetMap

ra ft

OpenStreetMap (OSM) is a collaborative project to create a comprehensive worldwide map that is free to use and editable [12]. With the outlook that geospatial data should be freely accessible to the public, the OSM project was established by University College London in July 2004 and treated as one of the most prominent and famous examples of Volunteered Geographic Information, a concept introduced by Goodchild [13]. The process of maintaining OSM data is described as crowdsourcing, which distributes the labor-intensive tasks to large groups of users and allows volunteers to create and update geospatial data on the Internet. By January 2013, OSM has over one million registered contributors and 20,000 active users worldwide and the number keeps rising dramatically [14]. Additionally, OSM obtained strong support from governments and commercial companies such as Yahoo Maps and Bing Maps. One major reason for DRIVE Net to choose OSM is its data sharing nature and low cost yet good quality geospatial data compared with commercialized datasets. Under the Open Data Commons Open Database License (ODbL), developers are free to use, distribute, and modify the OSM data as long as OSM and its contributors are credited. On one hand, using OSM to replace Google Maps helps DRIVE Net avoid potential charges incurred in the future that might eventually prevent the project from growing. On the other hand, within the eScience scope, DRIVE Net prefers open source over commercialized products, which essentially help share ideas, drive innovation, and boost productivity for the entire community. 2) PostgreSQL, PostGIS, and pgRouting

D

The biggest challenge in the previous DRIVE Net system is the lack of geo-processing power, which makes it less competitive in spatial modeling. In the new system, PostgreSQL with extender PostGIS and pgRouting is adopted to store geo-data and perform spatial analysis as well as routing functions. Those three products are all free, open source, and well-supported by their active communities. Although commercialized software products such as ArcGIS/ArcServer might accomplish the same tasks, open source projects are always preferred by academia besides the fact that commercialized products usually have expensive license and usage restrictions. B. Real-Time Freeway Performance Analysis Real-time freeway performance measurement quantitatively delivers traffic conditions to transportation researchers, operators, planners, and the general public in a timely manner. With the network-wide real-time information, decision makers can not only efficiently evaluate the quality of service on transportation facilities and identify the congestion bottlenecks, but also perform prompt coordination/response and refine policy and investment decisions. The ultimate goal of measuring freeway performance is to improve transportation mobility and accessibility.

Xiao, Liu, Wang

9

D

ra ft

The most widely used guidance for measuring freeway performance is the Highway Capacity Manual (HCM) 2010, which has been undergoing constant revision ever since 1944 [15]. The 2010 version HCM, published by Transportation Research Board (TRB) of the National Academies of Science, is a collection of the state-of-the-art methodologies for measuring the quality of service of transportation facilities. One important concept introduced by HCM is the Level of Service (LOS), which represents a qualitative ranking of traffic performance ranging from A to F. LOS A represents the best traffic operational condition, while F represents the worst. In this study the HCM 2010 methods are used for quantifying freeway performance. Although real-time traffic data as well as roadway geometric data are collected by every State agency, there is no standardized procedure developed to utilize available datasets and automate the network-wide freeway operational analysis. FREEVAL 2010, the computational engine executed in Microsoft Excel, is one alternative solution to freeway facilities analysis developed on the basis of HCM 2010 [16]. However, FREEVAL requires users to manually input geometric and traffic demand information for each individual segment, which can be extremely cumbersome when analyzing long roadway segments across multiple time periods. With the significant computational power and comprehensive data in the data warehouse (such as mainline loop detector data, freeway geometric factors, INRIX speed, etc.), DRIVE Net provides a mature platform to perform real-time LOS analysis for freeway segments. Due to the limited information on ramp geometrics, on-ramp volume, off-ramp volume, and weaving volume, this study will only focus on quantifying traffic operational performance for basic freeway segment. Given the existing resources including roadway geometry, loop detector data and INRIX speed data, the objective is to automate the freeway performance measurement with satisfying efficiency and accuracy. The proposed spatial modeling framework for the network-wide freeway performance measurement is shown in Fig. 2 with two main phases. In Phase 1, the roadway network is segmented using an innovative spatial data fusion technique - pixel-based segmentation. Once the segmented network is formed, three different methods are applied to compute LOS in Phase 2, specifically, HCM 2010 Method, HCM 2010 Method with INRIX Speed Data, and Multiregime Prediction Method.

Xiao, Liu, Wang

10

Phase 1: Segment Roadway Network & Integrate GIS Layers (Pixel-based Segmentation)

Determine FFS

Loop Detector Data

Input volumes

INRIX Speed

Input historical speed data sets

Input real-time INRIX Speed

Input historical adjusted demand volume

Adjust Demand Volume

ra ft

Input real-time demand volume

Input historical data

Input real-time data

Calculate Density & Determine LOS

Calculate Density & Determine LOS

Finalize LOS

Multi-Regime Regression Predict LOS

D

Fig.2. Modeling framework of the freeway performance analysis.

1) Phase 1: Segment Roadway Network and Integrate GIS Layers

With heterogeneous datasets, it is necessary to perform multi-layer geospatial data processing such that multiple Geographic Information System (GIS) layers can be superimposed. To calculate performance measurements, a fundamental network layer needs to be prepared, in which each basic roadway segment has a consistent attribute dataset containing roadway geometrics and traffic conditions. In GIS, vector overlay is the major solution to combine both the geographic data and attribute data from multiple input GIS layers. However, in our case, the large volume of networkwide spatial data makes the overlay analysis time consuming and computational expensive. Additionally, if a new GIS layer is imported into the DRIVE Net data warehouse, it is not realistic to re-perform the entire overlay operations. Therefore, pixel-based segmentation, a novel method to model the geospatial data, is proposed, which borrows the concept of pixel in digital imaging. A pixel is generally treated as the fundamental unit of a digital photo. Millions of pixels are combined together to resemble the

Xiao, Liu, Wang

11

ra ft

original image seemingly. The quality of the image highly depends on the total number of pixels per inch, which is defined as image resolution. The more pixels the image contains per inch, the more details it is able to reveal. Similarly, pixel-based segmentation subdivides a roadway network into basic segments of equal length, called line pixel. The length of line pixel defines the resolution of segmentation. The shorter the pixel length, the more details the output network contains. For example, Table 2 illustrates I-5 northbound with start milepost 140.4 and end milepost 140.9. The entire roadway is subdivided into 5 basic segments of equal length (0.1 mile each). Attribute data of output network use the combination of route ID, start milepost, and end milepost as the unique key to link with the geographic data. With the geographic data being segmented into equal line pixels already, the process of superimposing multiple GIS layers can be accomplished in attribute data side only. WSDOT adopts the Linear Referencing System (LRS) to identify the locations of features based on state route ID and feature distance in miles from route beginning. It is thus efficient to retrieve corresponding features given the route ID, start milepost and end milepost. Pseudocode for integrating attribute data from multiple GIS layers can be found below. Algorithm 1. Function integrateGISLayers

function integrateGISLayers for each route r in network for k = 0; k < r.length; k = k + pixel_length start_mp = k; end_mp = k + pixel_length; for each input GIS Layers l # look up attribute data of l # given routeid, start_mp and end_mp outputLayer[r, start_mp, end_mp, l] = getAttributeDate(l, r, start_mp, end_mp); end end end output outputLayer;

D

The pixel-based segmentation is used in this study for the following reasons. First, it separates the attribute data from geographic data. Compared to the vector overlay operations, the integration of attribute data based on LRS is more efficient to implement. Second, the fixed segmentation makes it convenient to integrate more GIS layers into existing network in the future, as long as the pixel resolution remains the same. Third, the value of pixel resolution is adjustable depending on the level of accuracy one desires to achieve. If the line pixel is infinitely close to 0, the output attribute table will capture perfect details regardless of the number of GIS layers imported. In reality, pixel size 0.1 mile is a good trade-off for efficiency and accuracy, considering the average length of roadway segments in shapefiles is around 0.25 mile. 2) Phase 2.1: Calculate LOS using the HCM 2010 methodology

The HCM 2010 provides a comprehensive methodology for analyzing the LOS as demonstrated in Fig. 2 Phase 2.1. Note that there is no measured FFS available for the entire network layer, FFS is thus computed via lane width adjustment and lateral clearance adjustment in this study. The HCM 2010 is unable to handle system-wide oversaturated flow conditions, and

Xiao, Liu, Wang

12

ra ft

only focuses on analyzing under-saturated flow conditions. Over-saturated flow condition estimation is discussed in Phase 2.2. Input data for LOS calculation are prepared as described below. For more details on LOS computations, please refer to the HCM 2010. Demand Volume: Real-time demand volume are mainly estimated from loop detectors. The system automatically fetches all the cabinets between the Nearest Upstream Ramp (NUR) and the Nearest Downstream Ramp (NDR), and then queries corresponding latest 15-min flow. Demand volume is calculated using the following equation: V = 4 ⋅ Med({lastest 15-min flow | cabinets between NUR and NDR}) (1) where V is the hourly volume (veh/h). Median is selected to measure the central tendency since it naturally eliminates the outliers. It is then multiplied by 4, which projects into hourly volume. If no cabinets/loop detectors exist between the upstream and downstream ramps, the system will mark that there is no demand volume input for segments and use real-time INRIX speed and historical regression model to predict LOS in Phase 2.3. Other Input Data: Total ramp density (TRD) is defined as the total number of ramps (both on and off with one direction) within 3 miles of midpoint of segment under study divided by 6 miles. The original geometric data, including number and width of lanes, right-side lateral clearance and terrain, are downloaded from WSDOT Roadway Datamart for GIS. Geospatial data fusion is performed using the pixel-based method introduced earlier. Since there is no site-specific data available for the remaining features (peak hour factor PHF, driver population factor f HV , percentage of heavy vehicles f p ), default values recommended by NCHRP Report 599 [17] are adopted. Table 1 Default values for basic freeway segments. Required Data Peak Hour Factor PHF Driver Population Factor f HV

Default Values Urban: 0.92, Rural: 0.88 Urban: 1.0, Rural: 0.975

Percentage of heavy vehicles (%) f p

Urban: 5%, Rural: 12%

D

Free-Flow Speed: Since the site-specific measured FFS is not available, the following equation developed by HCM 2010 is used to estimate FFS . The Base Free-Flow Speed (BFFS) are adjusted based on Lane width, right-shoulder lateral clearance, and total ramp density using adjustment factors f LW , f LC and TRD in HCM 2010. The estimated FFS is then rounded to the nearest 5 mph. (2) FFS = 75.4 − f LW − f LC − 3.22 ⋅ TRD 0.84 Adjust Demand Volume: Demand volume V obtained from loop detectors needs to be converted into service flow rate under equivalent base conditions V p using Eq. (3). According to the HCM 2010, the base conditions for a basic freeway segment are specified as • • •

12-ft lane widths 6-ft right shoulder clearance 100% passenger cars in the traffic stream

Xiao, Liu, Wang • •

13

Level terrain A driver population of regular users familiar with roadway in general = V p V / ( PHF ⋅ N ⋅ f HV ⋅ f p )

(3)

where N is the number of lanes, f HV is the heavy vehicle adjustment factor, f p is the driver population adjustment factor.

ra ft

3) Phase 2.2: Incorporate the real-time INRIX speed into LOS calculation One of the limitations of the HCM method is that it cannot analyze system-wide oversaturated conditions. In other words, once demand is greater than capacity, HCM is unable to estimate space mean speed and density. However, in reality, it is critical to identify oversaturated conditions spatially and temporally such that operators and planners can better understand bottlenecks (formation, propagation, and dissipation) of the facilities. As suggested by HCM 2010, under oversaturated conditions, traffic speed drops dramatically, typically below 35 mph. To fill the gap of analyzing oversaturated conditions, INRIX speed data are utilized in the LOS calculation. Again using the adjusted demand volume, the density under over-saturated condition can be estimated using INRIX speed Sinrix as shown in Eq. (4): Dinrix = V p / Sinrix (4) Dhcm = δ ⋅ Dhcm + (1 − δ ) ⋅ Dinrix where Dinrix is the density calculated by INRIX speed, Dhcm is the density calculated by the 1 if Dinrix ≤ 45 mph . HCM 2010 methods, δ =  0 if Dinrix > 45 mph

D

4) Phase 2.3: Develop Empirical Speed-Density Regression Models to Predict LOS As one of the primary concerns, the quality of traffic data greatly influences the accuracy of performance estimation. The data quality issues include but not limited to: missing data, suspicious or erroneous data, and inaccurate data [18]. While erroneous data do not follow accepted principles or might go beyond thresholds, inaccurate data contains inexact values due to measurement bias. In this study, these three types of errors are all considered as invalid traffic data entry. The data quality issues trigger the challenge: How to compensate for the invalid data entry? Using historical information is one of alternatives to remedy invalid data input, as empirical speed-density relationships provides the valuable resource to perform LOS prediction. Over the past decades, much research has been done on developing speed-density models. Considering the data-driven nature, multi-regime model based on cluster analysis [19] is adopted to fit empirical speed-density observations. This method first applies K-means algorithm to traffic datasets, which naturally partitions the data into homogenous groups. It then applies a series of single-regime models to each cluster and finds out the one that best fits the data. Different from previous study that chose k value through a trial-and-error process [19], the optimal number of clusters is determined by the average Silhouette criterion in this study. For a proof of concept, only linear, logarithmic, and exponential models are tested against for determining the suitable model format. Pseudocode for building multi-regime traffic model can be found below. Algorithm 2. Function PerformSpeedDensityRegression

Xiao, Liu, Wang

14

function PerformSpeedDensityRegression # Given traffic datasets observations # Choosing k using the Sihouette k = DetermineKbySihouette(observations); clusters = kmeans(observations, k); for each cluster c in clusters # three basic functions chosen to fit c lmReg = lm(c.speed ~ c.density, data = c); logReg = lm(c.speed ~ ln(c.density), data = c); expReg = lm(c.speed ~ exp(c.density), data = c); #choose the regression model fits best bestReg = max(lmReg.Rsquare, logReg.Rsqaure, expReg.Rsquare); output bestReg;

ra ft

end

IV.

Case study

The aforementioned modeling framework is implemented in a real-world network for pilot testing purposes. I-5 Northbound corridor in Seattle, Washington from milepost 140 to milepost 195 is selected as the study site. It is the primary travel route connecting Tacoma-Everett through Downtown Seattle and has the most comprehensive traffic data available, with a total of 140 cabinets. In the following subsections, network segmentation and data preprocessing is briefly introduced, followed by result explanation of the LOS computed from three proposed methods, namely, HCM 2010 Method, HCM 2010 Method with INRIX Speed Data, and Multi-regime Prediction Method. The satisfactory results further confirm the reliability and feasibility of the proposed modeling framework. A. Network Segmentation

D

Applying the proposed pixel-based segmentation on geographic data, the corridor is subdivided into 550 basic freeway segments with pixel length of 0.1 mile. The corresponding attribute data are then fused based on route ID (I-5), start milepost, and end milepost. Table 2 presents the sample attribute data. Note that roadway geometric data is relatively static and not updated very often. It is more efficient to pre-process the attribute data fusion instead of running it on the fly.

Xiao, Liu, Wang

15

Table 2 Fused attribute data. Route

Start MP

End MP

Direct ion

Shoul der width

Rdwy width

Num Lns

Urba n/Ru ral

Terrain

TRD

0.833

Upper Ramp Milepo st 141.64

Lower Ramp Milepos t 138.04

5

140.4

140.5

North

10

48

4

U

Level

5

140.5

140.6

North

10

48

4

U

Level

0.833

141.64

138.04

5

140.6

140.7

North

10

48

4

U

Level

1

141.64

138.04

5

140.7

140.8

North

10

48

5

140.8

140.9

North

10

48

4

U

Level

1.167

141.64

138.04

4

U

Level

1.167

141.64

138.04

ra ft

B. Volume and Speed Datasets Real-time volume data are collected from single loop detectors every 20 seconds and INRIX speed is aggregated every 1 minute based on GPS data, respectively. Both datasets are archived in the DRIVE Net database. For the pilot testing purpose, 2-day observations are extracted and used in the later computation. The two traffic datasets are further aggregated into 15-min time interval as recommended by the HCM. Data quality control techniques are applied to ensure data accuracy. For example, several thresholds are set to eliminate outliers. Comprehensive data quality control is critical to the DRIVE Net system. For more detail, please refer to [20]. Fig. 3 shows the scatter plot of (a) adjusted volume V p vs. speed Sinrix and (b) density Dinrix

D

vs. speed Sinrix for a total of 95,040 observations. Notice that the service volume V p used in Fig. 3 is under base conditions, converted from real-time traffic counts following the HCM 2010 methods.

16

ra ft

Xiao, Liu, Wang

(a) volume vs. speed

(b) density vs. speed

Fig.3. Scatter plot of INRIX speed, adjusted volume, and density.

C. HCM Method with/without INRIX Speed Data

The HCM method with volume only and HCM method with volume and speed are applied to compute Lhcm and Linrix respectively. Since HCM method is unable to analyze oversaturated

D

conditions (LOS = F), the comparison between Lhcm and Linrix is conducted for undersaturated flow only. With a total of 92,400 observations fall into the undersaturated conditions, 83.83% of Lhcm is equivalent to Linrix , which totaled 77,458 data points. The matching rate increases to 98.98% if adjacent LOSs are treated as approximately equal (e.g. LOS A ≅ LOS B). The fact that these two methods have a high consistency in estimating LOS delivers several important messages: (1) the proposed methodologies such as pixel-based segmentation can generate satisfying accuracy; (2) using INRIX speed to determine oversaturated condition is feasible and cost-effective; and (3) the quality of INRIX speed data has been justified to some extent, considering the consistency of the LOS results computed by HCM methods in Phase 2.1 (without INRIX speed) and Phase 2.2 (with INRIX speed).

Table 3 and Fig. 4 show the comparison of the LOS category counts produced by the two methods. Note that LOS computed using INRIX speed usually underestimate service quality. The results are consistent with recent research on transportation sensor comparison conducted by University of Washington, which reveals that INRIX speed data usually has smaller standard deviation and underestimate traffic conditions [21].

Xiao, Liu, Wang

17

Table 3 LOS Count by Phase 2.1 (without INRIX Speed) and Phase 2.2 (with INRIX Speed) HCM Method 37430 30343 18677 5756 194 2640

HCM Method with INRIX Speed 35994 25188 20324 8077 2817 2640

D

ra ft

LOS A B C D E F

Fig. 4 LOS by Phase 2.1(without INRIX Speed) and Phase 2.2(with INRIX Speed)

D. Regression Analysis

To compensate for missing data or those of low quality, empirical multi-regime density-speed model is used to predict density in this study. During the implementation, the two-day datasets are divided evenly into training set (November 07, 2011) and testing set (November 08, 2011) to avoid the overfitting problem. Following the procedures described in Methodology Section, k value is chosen to be 2 using the Silhouette criterion. According to Sun et al. [19], using the original data for K-mean algorithm would outperform the normalized data. Hence, this study applies K-mean

Xiao, Liu, Wang

18

algorithm to the training set without normalization. Clustering results can be found in Fig. 4 and Table 4. As expected, Cluster 1 with 80.27% of observations has high speed and low density which represents free-flow regime, while Cluster 2 with 19.73% of observations has lower speed and high density which represents congested-flow regime. Three single-regime models, namely, linear, logarithmic, and exponential functions, are then used to fit Cluster 1 and Cluster 2 respectively. The one with the greatest R squared value is chosen to represent the empirical speed-density relationship. The following equation shows the final two-regime model obtained from the training set: 66.3237 − 0.1851 ⋅ D if D ≤ 24.6 (5) S = exp(4.657 − 0.02169 ⋅ D) if D > 24.6 where S is speed (mile/h), D is density (pc/mile)

ra ft

Table 4 Training set: two clusters by K-means algorithm analysis. Traffic Dataset Cluster 1 Center I-5 Northbound Speed (mile/h) 63 Density (pc/mile) 16.94186 Percentage 80.27%

D

Cluster 2 Center 53 32.87736 19.73%

Fig. 5. Training set: two clusters by k-means algorithm analysis with clustering and regression result presented.

Xiao, Liu, Wang

19

As Fig. 5 demonstrates, the two-regime model fits the training set quite well. Comparison between the ground-truth value Linrix and predicted value Lreg for both training set and testing set is further conducted. The testing set yields an even lower error as indicated in Table 5. If adjacent levels are treated as approximately equal, both training error and test error are within the 5% range (shown in Accuracy of ±1 in Table 4). It thus proves the feasibility and accuracy of the modeling framework proposed.

Accuracy 57.7% 59.84%

Accuracy of ±1 95.38% 95.01%

ra ft

Table 5 Test results. Date Set Training Set Test Set

V.

Conclusion

D

This research presents an eScience transportation platform, DRIVE Net 3.0, to model the transportation data. New architecture design is proposed and implemented with motivation to facilitate transportation data-driven research. The goal of the DRIVE Net system is to make transportation big data reliable, accessible, interoperable, and understandable. To demonstrate the computational capability of DRIVE Net, a pilot study of automating real-time freeway performance measurement is conducted. The modeling framework enables users to probe traffic condition in a timely manner and avoid time-consuming manual input for analysis. Particularly, it utilizes innovative spatial data fusion technique - pix-based segmentation - to spatially integrate multiple datasets. Applying HCM 2010 methodologies and multi-regime model, the system leverages network-wide datasets to give accurate and real-time evaluation of freeway facilities, which great facilitates users understanding of the traffic data and helps to draw useful inferences and make informed decisions. Future work involves using DRIVE Net to provide solutions to other practical problems, such as safety performance assessment, active traffic management evaluation, and HOT lane strategy assessment and optimization, travel time reliability and delay quantification, and congestion analysis. Meanwhile, the DRIVE Net team is actively seeking collaborations with researchers in other disciplines, which aligns well with the theme that “eScience is about global collaboration in key areas of science and the next generation of infrastructure that will enable it” as Dr. John Taylor stated. References

[1] Hey, T., Trefethen, A. E., 2002. The UK e-Science Core Programme and the Grid. Future Generation Computer Systems 18(8), pp. 1017-1031. [2] Ma, X., Wu, Y. J., Wang, Y., 2011. DRIVE Net: An E-Science of Transportation Platform for Data Sharing, Visualization, Modeling, and Analysis. Transportation Research Record: Journal of the Transportation Research Board 2215(1), pp. 37-49. [3] Chen, C., Petty, K., Skabardonis, A., Varaiya, P., Jia, Z., 2001. Freeway Performance Measurement System: mining loop detector data. Transportation Research Record: Journal of the Transportation Research Board 1748(1), pp. 96-102.

Xiao, Liu, Wang

20

D

ra ft

[4] Chao, C., Kwon, J., Rice, J., Skabardonis, A., Varaiya, P., 2003. Detecting Errors and Imputing Missing Data for Single-loop Surveillance Systems. Transportation Research Record: Journal of the Transportation Research Board 1855(1), pp. 160-167. [5] CATT Lab, 2012. RITIS System. http://www.cattlab.umd.edu/?portfolio=ritis. [6] Tufte, K. A., Bertini, R. L., Chee, J., Fernandez-Moctezuma, R. J., Periasamy, S., Sarkar, S., Singh, P., Whiteneck, J., Matthews, S., Freeman, N., Ahn, S., 2010. Portal 2.0: Towards a Next Generation Archived Data User Service. In: 89th Annual Meeting of Transportation Research Board, Washington, DC. [7] Xie, G., Hoeft, B., 2012. Freeway and Arterial System of Transportation Dashboard. Transportation Research Record: Journal of the Transportation Research Board 2271(1), pp. 45–56. [8] Yu, R., Lao, Y., Ma, X., Wang, Y., 2011. Short-term Traffic Flow Forecasting for Improved Estimates of Freeway Incident Induced Delays. In 90th Annual Meeting of the Transportation Research Board, Washington, DC. [9] Malinovskiy, Y., Saunier, N., Wang, Y., 2012. Pedestrian Travel Analysis Using Static Bluetooth Sensors. Transportation Research Board: Journal of the Transportation Research Board 2299(1), pp. 122-132. [10] Ma, X., McCormack, E. D, Wang, Y., 2011. Processing Commercial Global Positioning System Data to Develop a Web-based Truck Performance Measures Program. Transportation Research Record: Journal of the Transportation Research Board 2246(1), pp. 92-100. [11] Lewandowski, S. M., 1998. Frameworks for Component-based Client/Server Computing. ACM Computing Surveys (CSUR) 30(1), pp. 3-27. [12] Haklay, M., Weber, P., 2008. OpenStreetMap: User-generated Street Maps. Pervasive Computing, IEEE 7(4), pp. 12–18. [13] Goodchild, M. F, 2007. Citizens as sensors: the world of volunteered geography. GeoJournal 69(4), pp. 211-221. [14] Wood, H., 2013. 1 million OpenStreetMappers. http://blog.openstreetmap.org/2013/01/06/1million-openstreetmappers/. [15] Kittelson, W. K. Historical overview of the committee on highway capacity and quality of service. Transportation Research Board-, National Research Council-Transportation Research Circular E-C018: 4th International Symposium on Highway Capacity, Maui, Hawaii June. 2000. [16] Highway Capacity Manual, 2010. Highway Capacity Manual 2010. Transportation Research Board, National Research Council, Washington, DC. [17] Zegeer, J. D., Vandehey, M., Blogg, M., Nguyen, K., Ereti, M., 2008. NCHRP Report 599: Default Values for Highway Capacity and Level of Service Analyses. Transportation Research Board of the National Academies, Washington, DC. [18] Turner, S. M., 2001. Guidelines for Developing ITS Data Archiving Systems. Technical report. [19] Sun, L., Zhou, J., 2005. Development of Multiregime Speed-density Relationships by Cluster Analysis. Transportation Research Record: Journal of the Transportation Research Board 1934(1), pp. 64–71. [20] Wang, Y., Corey, J., Lao, Y., Wu, Y. J., 2009. Development of a Statewide Online System for Traffic Data Quality Control and Sharing. Publication TNW2009-12. Transportation Northwest and Washington State Department of Transportation.

Xiao, Liu, Wang

21

D

ra ft

[21] Wang, Y., Araghi, B. N., Malinovskiy, Y., Corey J., Cheng, T., 2014. Error Assessment for Emerging Traffic Data Collection Devices. Report No. WA-RD 810.1. Transportation Northwest and Washington State Department of Transportation.

Data-driven Geospatial-enabled Transportation ...

Data-driven Geospatial-enabled Transportation ...

Suggest Documents

datadriven insights - Pecan Street Inc.

Defining and Visualizing DataDriven Personas - WordPress.com

Defining and Visualizing DataDriven Personas - WordPress.com [PDF]

Toward a datadriven evaluation of the 2010 American College of ...

OVERSIZED TRANSPORTATION PROJECT TRANSPORTATION

Transportation

Transportation

TRANSPORTATION

TRANSPORTATION

Transportation

TRANSPORTATION

Southeast Transportation Consortium - Louisiana Transportation ...

981 Transportation / koda Transportation - WordPress.com

Diagnosing Transportation - Transportation Research at McGill

Transportation and Climate Change - Transportation Research Board

transportation ConCurrenCy - Center for Urban Transportation Research

Applying Census Data for Transportation - Transportation Research ...

Transportation Construction Equipment - Transportation Research Board

Transportation Asset Management Today - Center for Transportation ...

Freight Transportation Data - Transportation Research Board

Public Health and Transportation - Transportation Research Board

Who Rides Public Transportation - American Public Transportation ...

Medical Transportation - The Community Transportation Association of ...

shaping transportation - Transportation planning, traffic engineering ...