
An integrated stop-mode detection algorithm for real world smartphone-based travel survey


Ajinkya Ghorpade, Francisco Câmara Pereira, Fang Zhao Singapore-MIT Alliance for Research and Technology, Future Urban Mobility 1 CREATE Way, #09-02 CREATE Tower, Singapore 138602 Tel: 65-6601 1547, Fax: 65-6778 5654 Email address: {ajinkya, camara, fang.zhao}@smart.mit.edu


Christopher Zegras, Moshe Ben-Akiva Massachusetts Institute of Technology 77 Massachusetts Avenue, Cambridge, MA, 02139 Telephone: 617.253.5324 Email address: {czegras, mba}@mit.edu


total word count = 5034 + 9 * 250 =7284


Abstract


Travel surveys form a key component of transportation planning by making transportation-related data available to planners. Smartphones are emerging as ideal tools for collecting detailed individual travel information, which motivated us to develop a smartphone-based travel survey system, the Future Mobility Survey (FMS). Inferring people’s stops and modes of transportation is a critical and challenging problem in the FMS system, especially because a time-phased data collection method is used to reduce battery usage. In this paper, we propose a novel algorithm for integrated stop and travel mode detection using parsimonious real-world data collected from smartphones through FMS. We use a two-stage classification system to detect five modes of travel, viz., stop, walk, train, car, and bus. To improve the accuracy of the classification, we derive features from a fusion of data collected from the GPS, GSM, Wi-Fi and accelerometer sensors on board the smartphones. We also propose new features based on contextual information and the user’s historical data. Experimental results show that the algorithm can effectively perform stop/mode detection and is robust against noisy or incomplete data from the smartphones.


1 INTRODUCTION

Travel surveys are crucial for collecting the data required by transportation modellers and planners. Traditionally, travel surveys were conducted through face-to-face interviews, telephone interviews, or by asking participants to fill in questionnaires about their trips through email or websites. Conducting these traditional surveys is time-, labour- and cost-intensive; thus, they are usually conducted once a decade, or at most once every 4-5 years. The last decade has seen tremendous improvement in technology, bringing about rapid and significant changes in urban lifestyles. To fit travel behaviour models to these rapidly changing patterns, travel surveys need to be conducted more economically and more frequently.


To overcome these challenges, modern survey methodologies utilise low-cost GPS devices carried by the survey participants. New methods have been developed to process the large amount of data collected from GPS devices and to infer stops and modes of travel. Although these new methods provide a promising alternative to traditional surveys (1, 2, 3, 4, 5, 6, 7), they face logistical difficulties: the agency conducting the survey must purchase and distribute the devices, and the participants must carry an additional device. In recent years, smartphones have emerged as the most widely used pervasive computing devices, incorporating a wide array of miniaturised sensors. A recent trend in survey technology is to use smartphones for data collection because of their ubiquity and widespread use (8, 9).


Future Mobility Survey (FMS) is an innovative smartphone-based travel survey system with web-based prompted recall. It went through a large-scale field test in Singapore in 2012/2013, demonstrating its capability to collect accurate, high-resolution data on users’ travel behavior over multiple days (10, 11). The three main components of the FMS system are the smartphone app, the backend data analysis, and the web interface. The smartphone app, available for both the Android and iOS platforms, runs in the background on the phone and collects data from the GPS, GSM, Wi-Fi and accelerometer sensors. The raw data sent back from the phones is processed on the backend server to infer users’ stops, modes of transportation, and activities. The processed data is presented to the user on the web interface in the form of an activity diary for the user to validate, correct, and supplement with additional information. The validated data is fed back to the backend to assist in future inference. In this paper, we present an integrated stop-mode detection algorithm for the FMS system that uses machine learning techniques together with context information (points of interest, land use information, user profile, time of day, day of week, etc.) and user history.


Using smartphones for data collection in travel surveys presents a unique set of challenges compared to using dedicated GPS devices. One major challenge is to make the application as unobtrusive as possible so that it does not affect the participant’s behaviour. Another is to conserve battery so that the participant can continue normal usage of his/her smartphone and remain interested in the travel survey. In designing the FMS system, the trade-off between battery consumption and data accuracy has to be taken into consideration, and our stop-mode detection algorithm is designed with these practical limitations in mind. It is a novel algorithm that performs integrated stop-mode detection for a fully automated smartphone-based travel survey, developed using parsimonious real-world data collected from a multitude of smartphones available in the market.

Our algorithm is designed to work with phase-sampled GPS data at 1 Hz and accelerometer data at 2 Hz. Note that we define a stop as a place where the user has stayed to perform an activity, which is consistent with what is required for a travel survey.

Much related work on smartphone-based automated travel survey applications has focused mainly on identifying the mode of travel for each trip made by the carrier of the smartphone (12, 13). However, determining where the user has stopped (the activity locations) is crucial from a travel survey perspective, and it also determines the trip start and end times. Previous works address this issue either by asking users to tag on the phone when they start or end a trip, or by treating the “stationary” state as a stop. The former approach significantly increases the user burden, and the latter definition does not match that of a travel survey. Some works also focus on detecting the “places of interest” of individual users (14, 15) and their activities at these places, but they do not provide mode detection.

The paper is organized as follows: Section 2 describes the methodology of the algorithm, followed by the experimental design in Section 3. Results of our experiments are presented in Section 4, and Section 5 is the conclusion.


2 METHODOLOGY

Overview

The architecture of the system is organised into stages, as depicted in Figure 1. Section 2.2 describes in detail the data logging procedures and the other sources of data used to build the knowledge base for the algorithm. In the first step of the algorithm, low-accuracy location data is filtered out. The filtered data is then segmented using time-distance thresholds, as described in Section 2.3. Each of these segments is then passed to the first-stage classifier, which distinguishes stops, walk and motorised modes (Section 2.4.2). The labelled segments are then given to a segment merging algorithm that smooths the inference from the first-stage classifier (Section 2.5). Based on the output of the merging algorithm, the segments are merged into homogeneous stops and trips. A second-stage classifier, described in Section 2.6, then classifies the contiguous motorised trips into bus, train or car trips. The architecture is described in further detail in the following subsections.


FIGURE 1: System Architecture

Data

This section elaborates on the data collected from smartphones for this study. Because heterogeneous devices differ in the information they make accessible, the type of information collected varies across devices. For this data collection effort, we focused on devices built around two operating systems, viz., Android and iOS, due to their widespread use in the place of the experiment (Singapore). For Android devices, the Service Set Identifiers (SSIDs) of all Wi-Fi access points within scanning range of the device are recorded every minute. These devices provide access to rich location data from GPS sensors, but using these sensors consumes a lot of energy. To preserve battery, GPS location data is sampled for three minutes at a frequency of 1 Hz, followed by a sleeping period of two minutes. Due to urban canyons and the underground rail network, smartphones do not always have access to GPS location services. In the absence of GPS, both Android and iOS report an estimated location based on GSM triangulation and known Wi-Fi access points in the area. On Android devices, the location service indicates whether each location comes from GPS or from GSM/Wi-Fi, in which case it is called a Network location. Both systems report the accuracy of the estimated location as a radius in meters with 68% confidence. We discard location data from the Network source when the reported accuracy radius exceeds 500 m. iOS does not share the source of location data with applications, so for iOS devices we filter location data using the accuracy measure alone.

We recorded accelerometer data continuously at a frequency of 2 Hz on both Android and iOS devices. The lower accelerometer frequency reduces file size and thus the amount of data transferred over the network. Various algorithms in the literature have used accelerometer data sampled at more than 10 Hz. To our knowledge, this is the first effort to use such parsimonious data for stop and mode detection.
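The accuracy-based filtering step can be illustrated with a minimal sketch. The `Fix` record, its field names and the function names are hypothetical and not part of FMS. The 500 m threshold for Network locations comes from the text; applying the same threshold when the source is unknown (as on iOS) and always keeping GPS fixes are assumptions made here for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional

# Threshold stated in the text for Network (GSM/Wi-Fi) locations.
MAX_ACCURACY_RADIUS_M = 500.0


@dataclass
class Fix:
    """One location reading (hypothetical record layout)."""
    t: float                # timestamp in seconds
    lat: float              # latitude in degrees
    lon: float              # longitude in degrees
    accuracy_m: float       # reported 68% confidence radius in meters
    source: Optional[str]   # "gps", "network", or None when unknown (iOS)


def keep_fix(fix: Fix, max_radius_m: float = MAX_ACCURACY_RADIUS_M) -> bool:
    """Return True if the reading should be kept for segmentation."""
    if fix.source == "gps":
        # Assumption: readings flagged as GPS are never dropped by this rule.
        return True
    # Network readings, and readings of unknown source (assumption for iOS),
    # are kept only when the accuracy radius is within the threshold.
    return fix.accuracy_m <= max_radius_m


def filter_fixes(fixes: List[Fix]) -> List[Fix]:
    """Drop low-accuracy readings while preserving the original order."""
    return [f for f in fixes if keep_fix(f)]
```

A reading with `source="network"` and an accuracy radius of 800 m would be dropped, while one with a radius of 300 m would be kept.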


Survey participants are presented with the stops and modes identified by the system, on which they can give feedback by confirming or modifying the results. We maintain a database of the stops and modes that users have validated or deleted in the past. This historical database is used to calculate user-specific features in the subsequent classification steps. We also created a database of points of interest (POI) for Singapore using OpenStreetMap (OSM) and land use data available from the Singapore Land Authority (SLA). The points of interest database is used to extract GIS features for the classification.


Segmentation

The filtered raw data is segmented using time and distance thresholds chosen such that every location point within 5 minutes and 200 meters of the first point is included in one segment. Because location data is phase-sampled for only three minutes followed by a two-minute sleeping period, a 5-minute threshold is long enough to ensure that all samples within one three-minute collection phase fall in the same segment unless the distance threshold is exceeded. On the one hand, the time threshold must be long enough for a segment to contain sufficient information for inference; on the other hand, it should be short enough that an individual segment does not contain more than one mode. The 200-meter threshold serves to differentiate stops from the other modes.
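The time-distance rule above can be sketched as follows. This is an illustrative reading of the rule, not the FMS implementation: a segment is closed as soon as a point lies more than 5 minutes or more than 200 meters from the segment's first point. The tuple layout, the function names and the use of the haversine formula for distance are assumptions.

```python
import math
from typing import List, Tuple

# (timestamp in seconds, latitude in degrees, longitude in degrees)
Point = Tuple[float, float, float]


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters on a spherical Earth (assumed model)."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def segment(points: List[Point],
            max_seconds: float = 300.0,
            max_meters: float = 200.0) -> List[List[Point]]:
    """Group time-ordered points into segments anchored at their first point."""
    segments: List[List[Point]] = []
    current: List[Point] = []
    for p in points:
        if current:
            t0, lat0, lon0 = current[0]
            too_late = p[0] - t0 > max_seconds
            too_far = haversine_m(lat0, lon0, p[1], p[2]) > max_meters
            if too_late or too_far:
                # Either threshold is exceeded: close the segment and
                # let this point start a new one.
                segments.append(current)
                current = []
        current.append(p)
    if current:
        segments.append(current)
    return segments
```

With these defaults, three minutes of 1 Hz samples recorded at one place stay in a single segment, while a jump of about one kilometer starts a new segment.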

Stage 1 Classifier

In this stage, each segment is classified as a stop, walk or motorised mode.

Features

Statistical Features

Statistical features are derived from the location and accelerometer data within each segment. For each segment, the following statistical features are extracted from the location data:

• Average, minimum, maximum and standard deviation of speed

• Average and maximum geographical distance between each location reading and the centroid of the location points within the segment. The maximum distance represents the radius r of the segment.

TABLE 1: Features used in classification

Input                               Features from each input segment
Speed                               Average, minimum, maximum and standard deviation
Latitude, Longitude                 Average and maximum Vincenty distance from mean
Accelerometer magnitude ($f_a$)     Average, minimum, maximum, standard deviation, coefficient of variation, interquartile range
DFT($f_a$)                          Peak frequency, normalised relative energy in five intervals and its standard deviation, peak interval
POI                                 Minimum and average distance and density
Validated stops/modes               Minimum and average distance and density
Deleted stops/modes                 Minimum and average distance and density
Wi-Fi                               Common Wi-Fi access points
Bus/train network data              Common bus/train line
Segment                             Time duration, time and distance gap
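As an illustration of the location-based statistical features, the following sketch computes the speed statistics and the centroid distances for one segment. The function and key names are hypothetical. Table 1 names the Vincenty distance; the haversine formula is used here as a simpler stand-in, and the population standard deviation is an assumption, since the text does not say which estimator is used.

```python
import math
import statistics
from typing import Dict, List, Tuple


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters (stand-in for the Vincenty distance)."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def location_features(speeds: List[float],
                      coords: List[Tuple[float, float]]) -> Dict[str, float]:
    """Speed statistics and centroid distances for one segment.

    speeds: speed readings in m/s; coords: (lat, lon) pairs in degrees.
    """
    lat_c = statistics.fmean([lat for lat, _ in coords])
    lon_c = statistics.fmean([lon for _, lon in coords])
    dists = [haversine_m(lat, lon, lat_c, lon_c) for lat, lon in coords]
    return {
        "speed_mean": statistics.fmean(speeds),
        "speed_min": min(speeds),
        "speed_max": max(speeds),
        "speed_std": statistics.pstdev(speeds),   # population std (assumption)
        "centroid_dist_mean": statistics.fmean(dists),
        "centroid_dist_max": max(dists),          # radius r of the segment
    }
```

A segment recorded while the user is stationary should yield a small `centroid_dist_max`, whereas a segment recorded in a moving vehicle yields a large one.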

Accelerometer sensors provide readings (denoted as $a_x$, $a_y$ and $a_z$) in three directions, viz., x, y and z. We calculate the $\ell_2$ norm of these three readings to obtain rotation-invariant observations:

$$f_a = \lVert (a_x, a_y, a_z) \rVert_2 = \sqrt{a_x^2 + a_y^2 + a_z^2}$$

The following statistical features are obtained from $f_a$:

• Average, minimum, maximum and standard deviation of $f_a$

• Coefficient of variation $c_v = \dfrac{\text{Standard Deviation}(f_a)}{\text{Average}(f_a)}$

• Interquartile range of $f_a$
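The accelerometer statistics listed above can be sketched as follows. The function and key names are hypothetical, and the choice of the population standard deviation and of the inclusive quartile method are assumptions, since the text does not specify the estimators.

```python
import math
import statistics
from typing import Dict, List, Tuple


def magnitude(ax: float, ay: float, az: float) -> float:
    """l2 norm of one accelerometer reading (rotation invariant)."""
    return math.sqrt(ax * ax + ay * ay + az * az)


def accel_features(samples: List[Tuple[float, float, float]]) -> Dict[str, float]:
    """Statistical features of the magnitude series f_a for one segment.

    Needs at least two samples, because quartiles are undefined otherwise.
    """
    fa = [magnitude(ax, ay, az) for ax, ay, az in samples]
    mean = statistics.fmean(fa)
    std = statistics.pstdev(fa)                    # population std (assumption)
    q1, _, q3 = statistics.quantiles(fa, n=4, method="inclusive")
    return {
        "mean": mean,
        "min": min(fa),
        "max": max(fa),
        "std": std,
        "cv": std / mean if mean else 0.0,         # coefficient of variation
        "iqr": q3 - q1,                            # interquartile range
    }
```

Because only the magnitude is used, the readings (3, 4, 0) and (0, 0, 5) contribute the same value to the series, which is what makes the features independent of how the phone is oriented.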

From the Fourier transform of $f_a$ we calculate the following features:

• Peak frequency

• Normalised relative energy in five fixed bins. The segment size is determined by location and is dynamic, so the length of the accelerometer data within each segment is not uniform. This is why we calculate the normalised relative energy of the Fourier transform inside 5 fixed bins instead of using all the coefficients across the power spectrum as features. To calculate this feature, the frequency spectrum is divided into 5 equal intervals with boundaries $b_j = j/\max(j)$, where $j = 1, 2, 3, 4, 5$, and for each interval we calculate the normalised sum of magnitude as follows:

$$l_j = \sum_{b_{j-1} \le 2i/N \,\ldots} \ldots$$
