Paper No. 03-4230
Duplication for publication or sale is strictly prohibited without prior written permission of the Transportation Research Board.
Title The Impact of Trip Underreporting on VMT and Travel Time Estimates: Preliminary Findings from the California Statewide Household Travel Survey GPS Study
Jean Wolf
[email protected] Marcelo Oliveira
[email protected] Miriam Thompson
[email protected] GeoStats 530 Means St, NW, Suite 310 Atlanta, Georgia 30318
Transportation Research Board 82nd Annual Meeting January 12-16, 2003 Washington, D.C.
Wolf, Oliveira and Thompson
2
Abstract Trip underreporting has long been a problem in household travel surveys due to the self-reporting nature of traditional survey methods. Memory decay, failure to understand or to follow survey instructions, unwillingness to report full details of travel, and simple carelessness have all contributed to the incomplete collection of travel data in self-reporting surveys. Since household trip survey data is the primary input to trip generation models, it has a potentially serious impact on transportation model outputs, such as Vehicle Miles of Travel (VMT) and travel time. Global Positioning System (GPS) technology has been used as a supplement in the collection of personal travel data. Previous studies confirmed the feasibility of applying GPS technology to improve both the accuracy and completeness of travel data. This paper presents an analysis of the impact of trip underreporting on modeled VMT and travel times. This analysis compared VMT and travel times estimates with GPS measured data. These VMT and travel time estimates were derived through the trip assignment of Computer Assisted Telephone Interview (CATI) retrieved trips to travel demand models. This analysis employed a subset of data from the California Statewide Household Travel Survey GPS Study and was made possible through the cooperation of the Metropolitan Planning Organizations of the three study areas (Alameda, Sacramento and San Diego).
2
Wolf, Oliveira and Thompson
3
INTRODUCTION Trip underreporting is a known problem in household travel surveys. It is a natural consequence of the selfreporting nature of the traditional survey methods used. Data collected in household travel surveys is one of the key inputs to the travel demand models employed by transportation planning organizations. Recent studies are examining the use of Global Positioning System (GPS) technology to compare traditional household survey methods’ results with passively recorded travel activity (1). Vehicle Miles of Travel (VMT, where one mile is equivalent to 1.6 km) is widely used as an indicator of the total level of travel in a region. It is also an often-used output of travel demand models. This paper presents an analysis of the impact of trip underreporting as detected by GPS on the estimation of VMT in the context of travel demand models. A similar analysis is performed using GPS-measured and model-estimated trip travel times. VMT and travel time are related measures of urban mobility. Congestion and transportation network characteristics are important factors in the relationship between VMT and travel time (2). However, it is not clear how measured and modeled estimates of VMT and travel time compare. This paper attempts to shed some light on this topic area. This paper starts with an overview of the problem of trip underreporting and its associated impacts on VMT and travel times. This is followed by a description of the data collection area, processing methods and GPS study results. The authors then present a methodology for comparing travel demand model-derived VMT and travel time estimates with GPS measured values. Finally, this methodology is applied to the collected data and its results are discussed. The paper concludes with final comments and directions for future research. BACKGROUND Planning organizations typically conduct travel surveys that are used to derive trip generation models, more specifically personal trip productions. According to Ortúzar and Willumsen (2), a personal trip production is defined as the home end of a home-based (HB) trip, or as the destination of a non-home based (NHB) trip, while a trip attraction is defined as the non-home end of an HB trip or as the origin of an NHB trip. The total number of household trips generated in a traffic analysis zone (TAZ) constitutes trip generation, the first step in the popular four-step transportation-modeling framework. Transportation models are used among others to allocate and prioritize transportation infrastructure investments, to determine air-quality attainment, and to evaluate competing transportation system designs. Trip underreporting in household travel surveys may severely impact the outcome of trip generation models. This, in turn, can impact forecasted traffic flows and ultimately VMT and travel time estimates. This section starts by presenting background information of the problem of trip underreporting in household travel surveys. Several studies introducing GPS technology in household travel surveys to quantify trip underreporting are then introduced. Trip Underreporting Understanding the reasons behind trip item nonresponse has helped researchers identify methods of identifying and correcting for these omissions. In 1982, Brög identified three reasons for non-reported trips (3): 1. 2. 3.
Trips that are not reported by the respondents due to the lack of care in case of survey periods of several days’ length; Trips that are not reported by the respondent because they forget or consider them redundant; and Trips that the respondent does not want to report on the basis of their own deliberate decisions.
Later, Richardson (4) confirmed the Brög list and added reasons related to a respondent’s screening and elimination of trips considered too short, too unimportant, or non-motorized. Underreporting problems may occur either at the trip or trip-detail level. Researchers and practitioners consider trip item non-response at the trip level to be the most serious problem related to missing information in household travel surveys, since trip rates and general trip-making behavior are the focus of travel surveys. Underreporting at the trip level directly affects the outcome of predicted future demand.
3
Wolf, Oliveira and Thompson
4
Research results from mail-back travel surveys with multiple response waves reveal more reasons for underreporting trips. One reason is the increasing memory gap in later waves. In addition, respondents tend to under-report trips made once they realize the amount of time involved in recording each trip (4). These reasons are applicable to telephone based travel surveys as well, since the respondent might not be reached immediately after the travel day, or, when they are reached, they quickly learn how much time reporting an individual trip demands. Richardson also found a significant correlation between non-reported trips, the purpose of the trip, and the time of day. Correction methods for missing trip details, such as trip purpose and travel mode, have focused on data expansion procedures (4, 5, 6 and 7). In addition, gaps in trip data can be “repaired” through manual or automatic corrections by applying a set of diagnostic repair rules. These rules detect and repair inconsistencies and missing data (8). These methods have also been applied to correcting for omitted socio-economic data and missed trips. During the past decade, researchers have tried to calculate several trip correction weights and factors in order to address the trip-underreporting problem. Hassounah (9) calculated the underreporting of non-informant trips by using informant data from a Canadian telephone interview travel survey from 1986. The morning peak-period underreporting trip factor from the dataset and cordon line counts were estimated and applied to other periods of day. The result estimated correction factors that ranged between 1.03 and 1.85, based on the period of the day and the trip types. Canadian data from the greater Toronto area collected in 1996 were used to calculate trip adjustment factors (10). Based on the same procedure as in Hassounah’s study, researchers brought the non-informant trip-rate to the level of the informant trip-rate. Thus, adjustment factors, defined as the ratio of the informant trip-rate to the non-informant trips-rate, were estimated. These factors were used to determine the additional number of trips required to raise the non-informant trip-rate to that of the informant’s rate. Other studies were done based on the German KONTIV (Kontinuierliche Erhebung des Verkehrsverhaltens – A Continuous Survey of Travel Behavior) survey design. Based on the dataset, Richardson calculated a non-reporting correction factor by dividing the sum of the original stops, the expected stops, and the unexpected added stops by the original stops (4, 8). Finally, in another survey, researchers used two survey instruments to find a weighting method for underreported short trips for French National Personal Transportation Surveys (5). They compared results from car diaries and interview results and found differences of up to 32 percent depending on the length of the trip, especially on weekend days. GPS-enhanced Travel Surveys Several pilot studies were conducted in the 1990’s that examined the use of GPS as a supplemental source for household travel behavior data collection. The Federal Highway Administration (FHWA)-sponsored Lexington study was the first study to collect in-vehicle GPS data for a six-day period and then to compare the data with trips recalled for a chosen travel day. The results of this process identified several issues with the attempt to match GPS data with recall data, including respondent rounding inaccuracies with respect to trip start times, travel times, and travel distances; variations in methodologies to process the GPS data; and possible omission or malfunction of equipment use (11). As a result of these issues, the study was unable to match more than 61% of the recall trips with GPS-recorded trips. Consequently, it was not possible to perform a thorough trip rate analysis or to explore corrective or adjustment factors. In 1997, the first household travel survey conducted with a GPS subcomponent occurred in Austin, Texas. A total of 186 vehicles were instrumented with GPS loggers. Researchers analyzed the data to identify appropriate dwell times for trip end identification within the GPS data streams (11). Dwell time, as defined by Pearson, is the minimum amount of time allowed for a stop between trips. The determination of dwell time was challenging for the Austin study since the data were collected while Selective Availability (the U.S. government’s intentional degradation of GPS signal accuracy) was still active at that time. Selective Availability, which introduced positional errors of 30 to 100 meters, was officially terminated on May 1, 2000 – this resulted in typical GPS accuracy within a 5 to 10 meter range. In her dissertation work, Wolf (13) examined the use of GPS data loggers as a replacement to travel diaries. In this research effort, procedures were developed for processing second-by-second GPS data collected in Atlanta to create trip-level details. As a result of this research, Wolf found that most trips made in an instrumented vehicle could be
4
Wolf, Oliveira and Thompson
5
identified in the GPS data stream, including some trips that are often forgotten or omitted in paper diary recording, such as trips of short duration that occur as part of a trip chain or those trips that are short ‘out-and-back’ round trips. Finally, the GPS pilot studies conducted to date confirmed that GPS points are not always captured at the start of some trips. This occurs when GPS receivers experience a cold start signal acquisition delay after a loss of signal due to power off or signal blockage conditions that last more than 30 minutes. As a result, it is possible that a GPS data logger may not capture the first 20 to 120 seconds of a trip start. However, these studies have also shown that it is likely that the previous trip’s destination coordinates are the origin coordinates for the current trip and can be used accordingly. METHODOLOGY The objective of this paper is to compare estimates of VMT derived using traditional travel survey results, collected using Computer Assisted Telephone Interviewing (CATI) methods and generated within travel demand models, with GPS-measured VMT. The developed methodology utilizes reported trip data collected in three regions in California with and GPS-based trip data collected from the same households. These two datasets are prepared so that their estimates of VMT and travel times can be directly compared. For the CATI-reported trips, the initial step of data preparation involves geocoding the reported trip ends to the participating regions’ TAZs. These trips are then loaded, according to their time periods, onto the region’s transportation model’s network. This produces transportation model-derived distance and travel time for each reported trip. Aggregating these distances and travel times by time of day (TOD) and region produce VMT and travel time totals that can be compared with the GPS-based trip distances and travel times. To obtain GPS-based trip distances and travel times, the GPS data streams collected in the participating households’ vehicles are processed utilizing GeoStats’ Trip End Analysis (TEA) software to identify trips and trip level details, including trip distances and travel times. These GPS trips are then grouped and aggregated according to the same TOD periods employed by the regional travel demand models, thus allowing direct comparisons between GPS-based and model-based VMT and travel time totals by TOD periods. The final step in the methodology is the analysis of statistical measures of estimated and GPS-based values of VMT and travel time. Visual comparisons and interpretations of these values are made, with formal tests for statistical significance conducted accordingly. Given the non-normal distributions of travel distances and times, nonparametric methods are used. This methodology was applied to CATI and GPS data collected in three regions in California as part of the 2001 California Statewide Household Travel Survey GPS Study. Each region assisted this research effort by utilizing their planning networks and assignment methods to derive VMT and travel-time estimates from the CATI retrieved trips. A questionnaire was also provided and completed to obtain detailed descriptive information about these regions’ travel demand models. GPS TRIP RATE ANALYSIS Because the analyses presented in this paper leverage data collected within the California Statewide Travel Survey GPS Study, it is beneficial to briefly review the scope and results of this study. The initial deployment goal for the GPS study was 500 households. These households were recruited from within the 16,990 households participating in the statewide household travel survey. Participating households were provided GeoLoggersTM for a maximum of three household vehicles. If they had more than three vehicles, they still reported (through their travel diary and CATI interviews) travel within those additional vehicles, however, corresponding GPS data were not collected. No incentives were offered to those participating in this additional GPS component, nor were the participants told that the equipment would essentially collect the same information they were reporting in the diary. In fact, the message to the participants was intentionally vague; recruits were asked if they would assist in testing a new technology that would allow for the collection of addition travel details. Once installed, the passive nature of the GPS loggers introduced minimum burden and awareness of the equipment.
5
Wolf, Oliveira and Thompson
6
Given such a small sample size (as compared to the statewide sample) and to allow for deployment efficiencies, the sampling plan was tailored for three geographic regions – San Diego, Sacramento, and Alameda regions. By the end of the deployment period, GeoStats deployed in-vehicle GPS data loggers (i.e., the GeoLoggerTM) to 517 households; from this sample, it was found that 292 households returned complete GPS and CATI data. Deployment and Instrumentation A deployment firm was contracted to deliver and pick up the GeoLoggers to and from each participating household. Equipment installation took place one to two days prior to the assigned travel day. The standard GPS data stream elements recorded by the GeoLogger included second-by-second date, time, latitude, longitude, speed, heading, altitude, number of satellites, and horizontal dilution of precision (HDOP, a measure of positional accuracy). In addition, each household filled out the standard paper travel diary provided for the household travel survey and reported the household travel information via CATI-retrieval. Data Collection and Reduction Data collection took place over a period of 20 weeks. Data collection began on February 2001, stopped during the summer months, and finished in October 2001. Upon arrival at GeoStats, the GPS second-by-second data were converted into a format compatible with GeoStats’ Trip End Analysis (TEA) software. These conversion programs also checked the GPS data for quality and consistency, flagging potentially bad or poor points (i.e., those with low number of satellites and/or high dilution of precision values). It should be noted that, overall, the quality of the GPS data collected across the three regions was quite good, with very few points discarded. After data pre-processing and checking, GeoStats’ Trip End Analysis (TEA) software loaded each GPS data stream and identified potential trip ends based on dwell times between logged GPS points. This study utilized 120 seconds as an initial threshold for dwell time between GPS-recorded trips. Next, a GeoStats analyst used TEA to evaluate each potential trip, searching for missing and false trip ends. Any movement delays (or potential stops) lasting less than 120 seconds were examined for the possibility of being a short stop; if the characteristics indicated that this delay should be considered a stop, then a trip end was added. Similarly, stops identified with durations between two and five minutes were examined as possible congestion or traffic control delays; if these stops were determined to be delays, then this false trip end was deleted. The result of this process was an updated GPS-based trip file for each household vehicle within the study. NuStats, a survey research firm specializing in transportation studies, was responsible for CATI data collection, cleaning and geocoding. The CATI trip file provided by NuStats was person oriented; pre-processing programs at GeoStats converted this file into a vehicle-based format. This conversion provided a standard unit against which GPS trips could be compared. The CATI trip-matching module of TEA compared the individual GPS and CATI vehicle trip records, based on departure times. A GeoStats analyst then utilized TEA to display the comparison results in an interactive tabular form, with questionable items (or matches) reviewed and modified as necessary. TEA also overlaid the geocoded CATI trip ends alongside the GPS data stream. The results of these comparisons were classified as:
1. Matched Trips a.
Perfect or Automatic Match. In these cases, the respondents reported time (within 12.5 minutes) and locations that closely matched to the identified GPS trip ends. b. Manual or Forced Match. These trips were reported with the correct trip end locations but with poor trip start times. These CATI trip ends were manually matched to the corresponding GPS trip ends. 2. CATI Missed Trips These were the GPS trips detected that were not reported through CATI. Figure 1 shows an example of this, with a GPS trace of a sampled household vehicle, along with lists of the CATI-reported and GPSdetected trips. 3. GPS Missed Trips These were CATI-reported trips that were not identified in the GPS data stream. These missing trips were classified as either first trips of the day, mid-day trips or last trips of the day. It is suspected that
6
Wolf, Oliveira and Thompson
7
the vast majority of start-of-day missed GPS trips were due to a lack of power to the GeoLogger resulting from delayed installation. In addition, it is likely that the end-of-day missed GPS trips were the result of GeoLogger removal followed by unplanned, end-of-day errands. It is more difficult to explain the missing mid-day GPS trips. These could have resulted from a misreporting of vehicle identification in the CATI retrieval script, a loss of signal in the middle of the day, or a power disconnect between the GeoLogger and the vehicle (i.e., the participant removed the GeoLogger power cord to use the cigarette lighter to power a cell phone or to light a cigarette). The final result of the CATI to GPS trip comparison process was a list of missed trips and trip rate correction factors. The results of the GPS trip rate analysis for the entire California study are presented in Table 1 (more thorough review of study and results can be found in Wolf et al, 2002). Also, in Table 1, the start and end of day missed GPS trips were added back into the GPS trip sample (see adjusted columns) assuming that these missed trips were the result of a lack of equipment installation or power, and if the equipment would have been installed properly, then those trips would have been collected.
VMT AND TRAVEL TIME DATA ANALYSIS For the VMT / travel time analysis, a subset of households and trips was selected from the Caltrans GPS sample to allow for direct estimated to measured comparisons. Of the 292 instrumented households which provided both GPS and CATI data in the Caltrans GPS study, only those with three or less vehicles were considered since only three GeoLoggers were provided. This reduced the sample to 254 households from the three counties. Next, all CATI vehicle trips reported by the 254 households were used to match geocoded CATI trip ends to the respective county’s TAZ codes. Figure 2 displays maps with the location of the CATI trip ends with respect to the three regions’ TAZs. This figure shows that some of the CATI trips either originated or ended outside the TAZs. These trips and their corresponding GPS trips were excluded from the dataset used in this analysis given that these trips, when modeled, are given truncated travel distances and times based on the external gateway to which they are assigned. The resulting dataset contained 1736 CATI trips and 2146 GPS trips across the three regions. Table 2 summarizes the data used for the remainder of the analysis presented in this paper. This table, unlike Table 1, does not include the GPS missed trips within the GPS trip totals. However, these missed trips will be discussed later to help explain the differences found between GPS-measured and CATI-modeled VMT and travel times. VMT and Travel Time Estimation Methods for Transportation Planning Models A questionnaire was provided to each participating region to obtain descriptive information about each region’s travel demand model. This information was later used as a means of understanding the differences observed between GPS-collected VMT and travel times with estimates derived utilizing each region’s models. Table 3 presents a summary of the questionnaire responses. The final step in the traditional travel demand-modeling framework is trip assignment. Trip assignment algorithms allocate the model’s predicted trips onto a transportation network model. As output, the model provides inter-zonal link volumes, predicted speeds, travel times, and other system measures. These outcomes can then be combined into measures of performance such as vehicle miles travel (VMT). Total regional VMT can be computed utilizing the expression in Equation (1).
VMT = ∑∑ ODij × SDij i
where: ODij SDij
j
: Number of trips allocated between zones i and j; : Network distance between i and j in miles;
7
(1)
Wolf, Oliveira and Thompson
8
Equation (1) can be applied to subsets of Origin-Destination (OD) pairs, network facility types, and time periods to obtain categorized estimates of VMT. It can also be applied to a single roadway segment by replacing the OD total with the roadway volume and SD with its length. Different transportation applications require different VMT reporting levels. Travel times between OD pairs are obtained by adding the travel times of the network path’s individual links between the OD zones. A number of factors within the transportation-planning model can influence the values for origin-destination trip totals, network distances and travel times. These factors include assignment type, the number and distribution of trips being loaded, transportation network spatial resolution, the number of TAZs, their size, etc. Horowitz (14) compared different levels of transportation network spatial resolution and their impact on assignment results. Horowitz found that adding local streets to the network did not necessarily improve the overall model results and, in fact, under-estimated VMT. GPS-based VMT and Travel Time Estimation GeoStats’s Trip End Analysis (TEA) software computes trips distances using a piece-wise integration approach. Distance is computed in miles between subsequent points and accumulated into a grand trip total. The expression used is shown in Equations (2) through (4). ADi ( i+1)
lat − lati 2 long ( i+1) − long i 2 = sin( ( i+1) ) + cos(lat1 ) cos(long1 ) sin( ) 2 2
ADi ( i+1) GDi ( i+1) = 2 ⋅ 3956 ⋅ arctan( 1 − ADi ( i+1)
2
TD = ∑ GDi ( i +1)
(2)
(3)
(4)
i
where: lati, longi ADi(i+1) GDi(i+1) TD
: latitude, longitude coordinates for point i in radians; : Arc distance between point i and point (i+1); : Geodistance between points i and (i+1) in miles; : Total trip distance in miles.
TEA obtains travel time values by calculating the difference between the time stamps of each trip’s first and last GPS points. Travel times are summarized in minutes, with one decimal digit of precision. VMT Comparisons and Results The case study analyzed VMT and travel time data for 1736 CATI trips and 2146 GPS trips. Of these trips, 1625 were matched GPS and CATI trips. Some of those matches were automatically detected by TEA, while others were forced or manually matched by GeoStats analysts. In addition, there were 521 missed CATI trips (GPS trips with no match) and 111 missed GPS trips (CATI trips with no match). The analysis performed in this research effort employed a step-wise approach. The analyses’ results are shown first for the matched trips, then for each of the missed trip classifications, and finally for the aggregate comparisons which factor in both the impact of missed CATI and GPS trips (Tables 4 – 7). Although the terms “missed CATI” and “missed GPS” may be confusing, they do properly define the nature of the trip within the study framework. In this context, a missed CATI trip refers to a GPS trip that does not have a corresponding CATI trip, while a missed GPS trip is a CATI trip that is not paired with a recorded GPS trip.
8
Wolf, Oliveira and Thompson
9
Matched Trips Table 4 summarizes VMT for the 1625 matched trips. For this subset, the differences between total modeled CATI and GPS VMT were rather small for all three regions. Kolmogorov-Smirnov (15) tests were used to compare average trips distances between CATI and GPS trips for each of the three regions. These tests revealed statistically significant differences, at the 95% confidence level, for Alameda and San Diego. The most notable differences were seen in San Diego, where the authors believe that trip distances are being slightly over-estimated (relative to GPS measured distances) by the San Diego travel model. The questionnaire results (see Table 2) indicate that San Diego utilizes much smaller TAZs, when compared to the other two regions. This would lead one to believe that trip distances would be more accurate, not less. It is likely that other aspects of the San Diego travel model, not covered in the questionnaire, are contributing to this potential overestimation. Additional investigation is needed in order to properly explain the observed behavior. For the Alameda region, different start times of matched CATI and GPS trips, which are a consequence of poorly reported CATI start times, created instances where the same trip was assigned to different time periods based on its GPS or CATI trip start time. In fact, approximately 50 GPS trips were assigned to off-peak hours, while the corresponding CATI trips were assigned to the peak periods. The assignment of these CATI trips to peak periods most likely results in the over-estimation of modeled VMT in these periods. Missed CATI Trips Table 5 shows the 521 missed CATI trips. These trips corresponded to trip ends identified in the GPS data but not reported by the study participants. For San Diego, the CATI missed VMT of 625.9 miles was 13% of the matched CATI VMT. For the Alameda region, the CATI missed VMT was 19% of the CATI VMT. The Sacramento region, which had the most missed CATI trips, resulted in a 56% CATI missed VMT when compared to matched CATI VMT. These percentages represent the raw VMT under-reporting rates. These findings are consistent with the trip under-reporting percentages seen in the full study (and in Table 1), in which San Diego had a 21% under-reporting rate, Alameda had a 22% rate, and Sacramento had a 42% rate. Missed GPS Trips Table 6 shows VMT estimates for the 111 missed GPS trips. These trips were reported by travel participants but were not identified in the GPS analysis. Missed VMT estimates for these trips were derived from the modeled CATI trip distances. For San Diego, 28 trips were not captured by GPS, resulting in a missed GPS VMT total of 131.0, which corresponds to 3% of the total matched GPS VMT. This percentage represents the potential GPS under-estimation of VMT. For Alameda, which had the highest number of GPS missed trips at 42, this percentage was also relatively low at 5% since the total CATI VMT for those trips was small (138.4 miles) compared to the total matched GPS VMT (2587.5 miles). In Sacramento, 41 trips were missing from the GPS data; since these trips accounted for 259 miles of missing GPS VMT, the percentage of missing to total measured VMT was 10%. Final VMT Comparisons The final task in this step-wise analysis is to apply the missed CATI and missed GPS totals to the origin matched trip table – this results in the final VMT comparison table (Table 7). The impact of these missing trips on the matched table was a larger increase in GPS VMT than in modeled CATI VMT. This is because the magnitude of missing VMT from the missed CATI trips (1561.1 miles) was much greater than the total missing GPS mileage (258.9 miles). Table 7 indicates a trend towards higher GPS VMT for off-peak travel. This trend was also evident for the matched trips of Alameda region in Table 4, but was not as evident for the other regions. Overall, this is probably a result of poor quality in reported travel times combined with higher trip underreporting levels in off-peak periods . This is a logical trend given that participants are more likely to recall and report accurate addresses and travel times for recurring destinations (such as work or school), while are less likely to recall trips to infrequent destinations or the corresponding trip details. The larger percentage of GPS VMT observed in Sacramento was due to the high number of missed CATI trips. In Alameda and Sacramento, the average CATI modeled VMT per trip was similar to the measured VMT per trip, thus indicating that the introduction of the missed CATI trips accounts for the higher GPS VMT. On the other hand, San
9
Wolf, Oliveira and Thompson
10
Diego showed average modeled VMT per trip to be 12-16% higher than GPS measured VMT, further indicating that this model may be over-estimating travel distances. Travel Time Comparisons and Results Table 8 presents summary variables for the travel time analysis. One of the noteworthy and important differences between the VMT and travel time analysis is the presence of CATI travel time. CATI travel times (in minutes) were derived from the reported trip start and end times; this type of data element was not available in the VMT analysis since participants were not asked to report trip lengths. The authors expected that the higher GPS VMT values, as observed in Table 7, would lead to higher GPS travel times. However, data analysis revealed the opposite result. It was observed that the CATI reported travel times were significantly higher (approximately 38% higher) than both the measured GPS travel times and those calculated by the travel models. The Kruskal-Wallis (15) non-parametrical test was used to compare the three different travel time distributions for each region based on total times. These tests indicated that the travel times were significantly different at the 95% confidence level for all three regions. Closer examination of the differences between the total travel times shows that the gap is less between GPS and modeled travel times, especially when compared to reported times. It is possible that the elevated travel times reported by the participants (based on start and end times) resulted from consistent upward rounding. CONCLUSIONS AND NEXT STEPS This research effort focused on an important issue -- trip underreporting in household travel surveys and its impact on modeled estimates of VMT and travel times. By introducing passive GPS logging into traditional household travel surveys, the means to audit reported trip and trip characteristics are now available. GPS-based and reported travel data collected in the California Statewide Travel Survey GPS Study, conducted in 2001, were used to perform comparisons of measured to modeled VMT and travel times. As the first study to examine differences between reported and measured VMT and travel times using a substantial sample, the results reveal the great potential for GPS-enhanced travel surveys. As expected, differences were found. Specifically, in areas with high levels of underreported trips, such as Sacramento, the impact of these missed trips on modeled VMT was predictable, with 3008 miles of modeled VMT compared to 4235 miles of GPS-measured VMT from the same households. Similarly, Sacramento’s households accounted for 4673 total modeled travel minutes, whereas the GPS measured trips had 5259 total minutes. Alameda’s trends were consistent with Sacramento. Unexpectedly, San Diego produced results that seem counterintuitive when only trip details are considered. Although a net increase of 135 trips were added in the GPS analysis, the impact on total VMT was a negative 30 vehicle miles. Although the reason for this is unclear without further investigation, it is possible that different regional travel demand models produce estimates of VMT and travel time that vary considerably with respect to GPS-measured data. Values of average GPS trip distances were equivalent with CATI modeled estimates, except for San Diego County where CATI modeled distances were consistently higher. The misallocation of CATI trips, due to poor start time reporting, accompanied by the missed CATI trips caused GPS VMT estimates to be higher on off peak periods. The next step in this analysis is to deepen the investigation into these preliminary findings. Additional statistical analyses will be conducted on the variables of interest. In addition, the authors plan to conduct a more detailed analysis on the relationship between the travel demand models’ characteristics and their impacts on travel time estimates and VMT.
10
Wolf, Oliveira and Thompson
11
ACKNOWLEDGEMENTS This work was conducted using data acquired under sponsorship of the California Department of Transportation (Caltrans). The authors wish to express their gratitude to the MPOs responsible for the regions of Alameda, Sacramento and San Diego for their cooperation and insightful comments. Special thanks go to Gordon Garry, Chuck Purvis, and Bill McFarlane. The views presented in this paper are those of the authors and do not necessarily represent those of Caltrans or the MPOs responsible for the regions of Alameda, Sacramento and San Diego.
11
Wolf, Oliveira and Thompson
12
REFERENCES 1.
2.
Wolf, Jean, Loechl, Michael, Thompson, Miriam and Arce, Carlos. (2002) “Trip Rate Analysis in GPS Enhanced Personal Travel Surveys”, Proceedings of the International Conference on Transportation Survey Quality and Innovation, Kruger Park, South Africa, August 2001. Ortúzar, J de D., and Willumsen, L. G. (1994), Modelling Transport, 2nd Ed., Wiley & Sons.
3.
Brög, W., E. Erl, A. H. Meyburg and M. J. Wermuth (1982), “Problems of Nonreported Trips in Survey of Nonhome Activity Patterns,” Transportation Research Record 891, pp. 1-5.
4.
Richardson, A. J. (2000), “Behavioural Mechanisms of Non-Response in Mailback Travel Surveys,” Paper presented at the 79th Annual Transportation Research Board Meeting, Washington D.C., January 2000.
5.
Armoogum, J. and J.L. Madre (1998), “Weighting or Imputations? The Example of Nonresponses for Daily Trips in the French NPTS,” Journal of Transportation and Statistics, October 1998, pp. 53-63.
6.
Zmud, J. P., C. H. Arce (1997), “Item Nonresponse in Travel Surveys: Causes and Solutions,” Paper presented at the International Conference on Transport Survey Quality and Innovation, Grainau, Germany, May 1997.
7.
Polak, J., X-L Han (1997), “Iterative Imputation Based Methods for Unit and Item Non-Response in Travel Surveys,” Paper presented at the 8th Meeting of the International Association of Travel Behaviour Research, Austin, Texas, September 1997.
8.
Richardson, A. J., E. S. Ampt and A. H. Meyburg (1995), Survey Methods for Transport Planning, Eucalyptus Press, Melbourne, Australia.
9.
Hassounah, M. I., L-S Cheah, and G. N. Steuart (1993), “Underreporting of Trips in Telephone Interview Travel Surveys,” Transportation Research Record 1412, pp. 90-94.
10.
Badoe, Daniel A. and Gerald N. Steuart (1999), Underreporting of Trips in Travel-Survey Conducted by Telephone. Paper presented at the 78th Annual Meeting of the Transportation Research Board, Washington D.C., January.
11.
Wagner, D. P. Battelle (1997), Report: Lexington Area Travel Data Collection Test; GPS for Personal Travel Surveys, Final Report for OHIM, OTA, and FHWA.
12.
Pearson, D. (2001), “Global Positioning System (GPS) and Travel Surveys: Results from the 1997 Austin Household Survey,” Paper presented at the Eighth Conference on the Application of Transportation Planning Methods, Corpus Christi, Texas, April 2001.
13.
Wolf, J. (2000), Using GPS Data Loggers to Replace Travel Diaries in the Collection of Travel Data, Dissertation, Georgia Institute of Technology, School of Civil and Environmental Engineering, Atlanta, Georgia.
14.
Horowitz, A. J., Computational Issues in Increasing the Spatial Precision of Traffic Assignments, University of Milwakee, 2001.
15.
Daniel, W. W. (1978), Applied Nonparametric Statistics, Houghton Mifflin Company.
12
Wolf, Oliveira and Thompson
13
LIST OF TABLES AND FIGURES
Figure 1:
Example of Misreported Travel.
Figure 2:
Three Counties’ TAZ’s With Geocoded CATI Trip Ends.
Table 1:
Data Summary for California GPS Study Sample.
Table 2:
Data Summary for Selected Households.
Table 3:
Summary Information from the Participating Regions’ Travel Models.
Table 4:
VMT for Matched Trips.
Table 5:
VMT for Missed CATI Trips.
Table 6:
VMT for Missed GPS Trips
Table 7:
Final VMT Comparisons.
Table 8:
Travel Time Totals and Averages in Minutes for Matched Trips.
13
Wolf, Oliveira and Thompson
14
CATI reported: Trip 1 – Home to Work Trip 2 – Work to Home
Actual Travel (captured by GPS): Trip 1 – Home to Work Trip 2 – Work to Unknown 1 Trip 3 – Unknown 1 to Unknown 2 Trip 4 – Unknown 2 to Unknown 3 Trip 5 – Unknown 3 to Home
Figure 1: Example of Misreported Travel.
14
Wolf, Oliveira and Thompson
15
Alameda
Sacramento
San Diego
Figure 2: Three Counties’ TAZ’s With Geocoded CATI Trip Ends
15
Wolf, Oliveira and Thompson
Table 1:
16
Data Summary for California GPS Study Sample.
Region Name
Number of Number of Vehicles per Number of Households Vehicles Households GPS Trips
Number of Number of CATI Trips CATI Missed
% Missed Trips (baseline)
Number of Missed (adjusted)
% Missed Trips (adjusted)
Alameda
88
152
1.73
711
605
106
17.5%
134
22.1%
Sacramento
93
171
1.84
854
635
219
34.5%
264
41.6%
San Diego
111
200
1.8
1046
888
158
17.8%
186
20.9%
Totals
292
523
1.79
2611
2128
483
22.7%
584
27.4%
16
Wolf, Oliveira and Thompson
Table 2:
17
Data Summary for Selected Households.
Region Name
Number of Households
Number of Vehicles
Vehicles per Households
Number of GPS Trips
Number of CATI Trips
Number of % Missed Trips Number of CATI Missed (baseline) GPS Missed
Alameda
74
105
1.42
589
516
73
14.1%
42
Sacramento
81
119
1.47
704
502
202
40.2%
41
San Diego
99
150
1.52
853
718
135
18.8%
28
254
374
1.47
2146
1736
410
23.6%
111
Totals
17
Wolf, Oliveira and Thompson
Table 3:
18
Summary Information from the Participating Regions’ Travel Models. Alameda
Sacramento
San Diego
TP+/Viper
MINUTP
TRANPLAN, migrating to TransCAD
Trip Data Year
1990
2000
1995 trip data, calibrated for 2000
Network Year
1998
2001
2000
Trip Generation Model
Mix of regression, cross-classification, and simple trip-rates.
Cross-classification
Trip rate
Time Periods
AM: 5:00 to 8:59 PM: 3:00 to 6:59 Off-Peak: Remaining
AM: 6:45 to 9:44 Mid-Day: 9:45 to 2:44 PM: 2:45 to 5:44 Evening: 5:45 to 6:44
AM: 6:00 to 8:59 PM: 3:00 to 5:59 Off-Peak: Remaining
Trip Assignment
Equilibrium, capped at 99 iterations
Equilibrium
Equilibrium
Number of Zones
1,099
1,290
4,252
6.4 Miles
5 Miles
1 Mile
26,904
Approx. 12,000
29,041
Minor arterials and collectors
Collectors
Collectors
Modeling Package
Average TAZ Area Number of Links Lowest Road Functional Class
Table 4:
VMT for Matched Trips. Number of CATI Trips
Modeled Distance (Miles)
Number of GPS Trips
GPS Distance (Miles)
Change from Modeled to GPS (%)
Region
Time Period
San Diego
AM Peak
127
1015.3
131
982.6
-3.2%
PM Peak
177
1184.1
164
984.5
-16.9%
Off Peak
386
2626.3
395
2333.6
-11.1%
Total Distance
690
4825.7
690
4300.7
-10.9%
AM Peak
68
506.08
48
395.8
-21.8%
PM Peak
189
991.8
160
901.2
-9.1%
Off Peak
217
955.36
266
1290.5
35.1%
Total Distance
474
2453.2
474
2587.5
5.5%
AM Peak
103
593.5
98
603.9
1.7%
Midday
144
671.0
143
709.8
5.8%
PM Peak
121
777.5
117
634.6
-18.4%
Evening
93
707.3
103
725.9
2.6%
461
2749.3
461
2674.2
-2.7%
Alameda
Sacramento
Total Distance
18
Wolf, Oliveira and Thompson
Table 5:
19
VMT for Missed CATI Trips. Number of CATI missed trips
Time Period
San Diego
AM Peak
16
28.0
PM Peak
31
119.3
Off Peak
116
478.7
Total Travel Distance
163
625.9
AM Peak
11
65.4
PM Peak
41
151.4
Off Peak
63
238.8
115
455.5
AM Peak
35
248.2
Midday
94
519.3
PM Peak
76
438.8
Evening
38
354.9
243
1561.1
Alameda
Total Travel Distance Sacramento
Total Travel Distance
Table 6:
GPS Travel Distance (Miles)
Region
VMT for Missed GPS Trips Number of GPS missed
Modeled Travel Distance (Miles)
Region
Time Period
San Diego
AM Peak
3
13.7
PM Peak
3
3.1
Off Peak
22
114.2
Total Travel Distance
28
131.0
AM Peak
6
22.0
PM Peak
16
61.6
Off Peak
20
54.8
Total Travel Distance
42
138.4
4
19.4
Midday
16
101.8
PM Peak
6
21.8
Evening
15
115.8
Total Travel Distance
41
258.9
Alameda
Sacramento
AM Peak
19
Wolf, Oliveira and Thompson
Table 7:
20
Final VMT Comparisons. CATI Trips
CATI Modeled (Miles)
Average
AM Peak
130
1029.0
7.9
PM Peak
180
1187.2
Off Peak
408
Total Distance AM Peak
Number of GPS Trips
GPS Distance (Miles)
Average
7.7
147
1010.5
6.9
6.6
7.1
195
1103.8
2740.5
6.7
6.9
511
718 74
4956.7 528.0
6.9 7.1
7.1 8.2
PM Peak
205
1053.4
5.1
Off Peak
237
1010.2
Total Distance Sacramento AM Peak
516 107
Midday
Average Difference (%)
Total Difference (%)
7.0
-13.2%
-1.8%
5.7
6.5
-14.2%
-7.0%
2812.2
5.5
6.6
-18.1%
2.6%
853 59
4926.5 461.2
5.8 7.8
6.7 9.3
-16.3% 9.6%
-0.6% -12.7%
6.9
201
1052.6
5.2
6.8
1.9%
-0.1%
4.3
6.1
329
1529.2
4.6
6.5
9.0%
51.4%
2591.6 613.0
5.0 5.7
6.8 4.9
589 133
3043 852
5.2 6.4
6.9 5.9
2.9% 11.8%
17.4% 39.0%
160
772.8
4.8
5.0
237
1229.1
5.2
6.0
7.4%
59.0%
PM Peak
127
799.3
6.3
7.1
193
1073.4
5.6
7.5
-11.6%
34.3%
Evening
108
823.1
7.6
7.2
141
1080.7
7.7
7.7
0.6%
31.3%
Total Distance
502
3008.2
6.0
6.2
704
4235.2
6.0
6.8
0.4%
40.8%
Region
Time Period
San Diego
Alameda
Standard Deviation
20
Standard Deviation
Wolf, Oliveira and Thompson
Table 8:
Region
21
Travel Time Totals and Averages in Minutes for Matched Trips. Time Period
Number CATI Standard CATI of CATI Reported Average Deviation Modeled Trips
Number Standard of GPS GPS Standard Average Deviation Trips Measured Average Deviation
San Diego AM Peak
127
2153
17.0
14.3
1912
15.1
13.3
131
1798
13.7
11.1
PM Peak
177
2933
16.6
12.3
2474
14.0
13.1
164
2363
14.4
22.7
Off Peak
386
5700
14.8
10.2
4292
11.1
9.0
395
4333
11.0
8.4
Total Time
690
10786
15.6
11.6
8678
12.6
11.1
690
8493
12.3
13.7
AM Peak
68
1162
17.1
13.5
900
13.2
12.7
48
747
15.6
12.7
PM Peak
189
2857
15.1
13.5
1825
9.7
10.6
160
1915
12.0
10.4
Off Peak
217
3065
14.1
13.4
1513
7.0
7.6
266
2585
9.7
7.9
Total Time
474
7084
14.9
13.4
4239
8.9
9.9
474
5247
11.1
9.6
103
1558
15.1
10.4
1074
10.4
7.7
98
1170
11.9
11.5
Midday
144
1864
12.9
9.8
1154
8.0
6.8
143
1432
10.0
9.3
PM Peak
121
1923
15.9
13.3
1397
11.5
10.5
117
1418
12.1
10.6
Evening
93
1441
15.5
10.7
1049
11.3
8.3
103
1240
12.0
9.9
Total Time
461
6786
14.7
11.2
4673
10.1
8.5
461
5259
11.4
10.2
Alameda
Sacramento AM Peak
21