Variability of Commuters' Bus Line Choice: An Analysis of Oystercard Data

Fumitaka Kurauchi1, Jan-Dirk Schmöcker2, Hiroshi Shimamoto3 and Seham M. Hassan4

1 Dept of Civil Eng., Gifu University, 1-1 Yanagido, Gifu City 501-1193, Japan
2 Dept of Urban Management, Kyoto University, Kyotodaigakukatsura, Kyoto, 615-8540, Japan
3 Dept of Urban Management, Kyoto University, Kyotodaigakukatsura, Kyoto, 615-8540, Japan
4 Dept of Civil Engineering, Aswan University, Aswan, Egypt
Abstract
A hyperpath can be defined as a set of attractive lines identified by the passenger, each of which might be the optimal one from the current stop, depending on the lines’ arrival times, frequencies, costs, etc. This concept can lead to complex route choice and has been a fundamental assumption in most transit assignment models, despite little evidence on whether passengers indeed select such complex strategies. This research uses time-series smart card data from London to investigate flexibility in the buses chosen by morning commuters. The analysis is based on n-step Markov models and proposes that the variation in bus lines taken by passengers who presumably travel between the same OD pair every morning over several days should reflect the set of paths included in an (optimal) hyperpath. Our hypothesis is that a large variation in bus lines over days indicates a complex hyperpath, whereas a passenger who takes the same line every morning does not consider many alternatives. Our results suggest that there is indeed significant variation in the bus lines chosen, possibly in accordance with the theory of hyperpaths in networks with uncertainty.
Keywords: Bus line choice variation, Oyster Card data, Travel behaviour, Markov model

1. Introduction
It is generally assumed that on transit networks travellers try to minimise their expected travel time, consisting of waiting time and on-board time as well as potentially other factors such as fare, crowding or seat availability, by selecting a hyperpath (Nguyen et al. 1988). A hyperpath can be defined as a set of attractive lines identified by the passenger for each stop, each of which might be the optimal one from that stop, depending on the lines’ arrival times, frequencies, costs, etc. In networks with few uncertainties, i.e. regular arrival times and low congestion, this set of services will be smaller, as passengers can better estimate whether it is advantageous to let slow services pass in order to wait for a faster service that might arrive soon. This behavioural assumption has given rise to a fairly large body of literature.
Several transit network design models that aim at optimising the frequencies of the lines have been proposed (Fearnside et al. 1971). Transit assignment has been studied either as a separate problem (e.g. see Andreasson 1976) or as a subproblem of more complex models, such as transit network design (e.g. Mandl 1979) or multimodal network equilibrium (Florian 1977, and Florian et al. 1983). Most of the early algorithms may be classified as heuristic approaches to the problem. These algorithms are variants of assignment procedures used for private car traffic on road networks (such as shortest path or stochastic multipath assignment) that are modified to reflect the waiting time phenomenon inherent to transit networks. Spiess and Florian (1989) replaced these simplistic route choice models with a transit assignment model, leading to more realistic transit network design models. They described a model for the transit assignment problem with a fixed set of transit lines. The traveller chooses the strategy that allows him or her to reach his or her destination at minimum expected cost, assuming that passengers board whichever line from the set of attractive lines happens to arrive first. Since then, numerous frequency-based assignment models have been proposed to explain additional factors influencing line choice. For example, De Cea and Fernández (1993) proposed “effective frequencies” to consider congestion, and Kurauchi et al. (2003) implemented “fail-to-board probabilities” to recognise strict capacity constraints of vehicles. More recently, Nökel et al. (2009) derived expected line splits reflecting the increasing amount of information passengers obtain during their journeys. Particularly noteworthy in the context of this study is the work of Nguyen et al. (1998), who allowed for stochastic hyperpath choice with a logit model for line choice to recognise that the optimal strategy is not taken by all passengers. The path set of passengers in Nguyen et al. (1998) is based on the concept of optimal paths, as in the Spiess and Florian model. Recently, Schmöcker et al. (2013) proposed the “reversed model”, in which passengers choose the path set based on personal preferences but the choice of the line itself follows the line frequency, as in the Spiess and Florian model. The motivation of their model is, however, the same: it is recognised that passengers’ “optimal strategies” are not easily understood by modellers.
The purpose of this research is therefore to understand whether passengers indeed follow such theoretically proposed hyperpaths or whether habits and other factors dominate routing decisions, leading to less (or more) complex hyperpaths than those proposed in the literature. Observing hyperpaths is, however, difficult, since one would have to know which (unchosen) routing options the traveller considers. As a first step, this analysis assumes that the variation in lines taken by regular commuters during their morning journey over several days should reflect the set of paths considered by hyperpath travellers in networks with uncertain vehicle arrival times. Our hypothesis is that a large variation in bus lines over days indicates a complex hyperpath, whereas a traveller who takes the same route every morning does not consider many alternatives, leading to a simple hyperpath.
The remainder of this paper is structured as follows. The next section reviews some literature on passengers’ travel behaviour on public transit. Section 3 describes the data used for this research and the pre-processing. Section 4 describes the Markov analysis used to analyse stability in bus line choice over days and shows some initial results. Section 5 describes how overlap between bus lines is considered and how this changes the results. Finally, Section 6 discusses the findings of this paper and proposes further research directions.

2. Passengers’ Route Choice Observation
The travel behaviour of public transit passengers has received increasing attention in the literature during the last decade. In particular, the advent of electronic data collection systems such as smart cards has made detailed studies of passengers’ route choice and trip patterns feasible.
Electronic data collection systems differ by transport mode and fare system, as indicated in Table 1. An early and well-established type of system is highway ETC (Electronic Toll Collection), which has been installed all over the world. On some tolled urban expressways the fare is flat, and therefore ETC gates only need to be constructed at either the entry or the exit ramps. Accordingly, the analyst can gain only limited behavioural information. If, on the other hand, the fare is distance-based, gates must be constructed at both entry and exit ramps. The analyst can then obtain better data, but for both systems, if there are route choices, the route the traveller has taken generally cannot be identified unless card readers are also installed at interchange points or periodically en route. There has been substantial research discussing the possibilities of ETC data for estimating time-dependent demand patterns, including their variability in reaction to service quality changes (e.g. Nishiuchi et al., 2010, Yamazaki et al., 2012 and Kim et al., 2013).
Table 1. Characteristics of (smart-)card data.

Label | Mode | Fare | OD | Route | Research example
ETC (Flat) | Car | Flat | Either entry or exit ramp | No | -
ETC (Distance) | Car | Distance-based | Both entry and exit ramps | No in general | Nishiuchi et al., 2010; Yamazaki et al., 2012; Kim et al., 2013
Rail/Subway | Subway/Metro/Underground | Generally distance-based | Both entry and exit stations | No | Van der Hurk et al., 2012
Bus (Flat) | Bus | Flat within a specific area | Either at boarding or alighting | Yes in general | Trépanier et al., 2007; Li et al., 2011
Bus (Distance) | Bus | Distance-based | Both at boarding and alighting | Yes | -
Looking at public transport systems, on most rail-bound systems around the world the fare is calculated based on distance, and it is common to “tap in” and “tap out”. Therefore, entry and exit stations can be identified from smart card data, and with them the analyst often has a good idea of the traveller’s origin and destination. Again, though, the route or line that cardholders used cannot be identified if there is more than one reasonable route connecting the entry and exit stations. In reality, however, especially for heavy rail, there may not be a large number of route options, and passenger movements can often be inferred even without route information (e.g. Van der Hurk, 2012).
This is generally different for bus services, where the route can be identified because the card readers are installed on the buses. Moreover, if the fare varies with distance, the boarding and alighting bus stops can be accurately identified together with the route information. Even if the alighting bus stop is not recorded, several approaches have been proposed to estimate the trip destination (e.g. Trépanier et al., 2007; Li et al., 2011). The main issue with vehicle-based card reader systems may be that the boarding and alighting locations cannot always be readily identified. This problem is usually overcome by linking the card reader to an automatic vehicle location (AVL) system, which also serves, for example, to provide countdown information at bus stops. If the two systems are in operation but not linked, the traveller’s location is still identifiable if the card records a vehicle identification number. In systems where the bus fare is flat, such as in London, there is, however, no operational requirement to do so, leaving the analyst with limited information about travellers’ boarding points. Further, in line with the above discussion for other modes, tap-out is not required in flat-fare bus systems, meaning that travel time and alighting point information are not available even if smart card and AVL data are linked. Table 1 includes some references to literature analysing passenger behaviour with smart card data. The following reviews the methodologies and key findings of some of these papers.
Morency et al. (2006, 2007) analysed the variability of transit users’ behaviour with multi-day smart card data. They adopted the average number of different bus stops used and the proportion of the number of used bus stops as indicators of spatial variability. In addition, they applied cluster analysis and then evaluated temporal variability. They confirmed that both spatial and temporal variability differ among users depending on the card type they hold. Park et al. (2008) described user characteristics (e.g. transfer location, boarding time distribution) and estimated future trends in passenger behaviour. A first attempt to estimate the most probable alighting point, specifically by looking at multi-day transit fare card data, was presented by Trépanier et al. (2007). The accuracy of their estimated destination was 66% for the whole time period, and it reached about 80% at peak hours. Agard et al. (2006, 2008) analysed transit users’ behaviour using data mining techniques (clustering) and related tools. Four major behavioural groups were identified through clustering, and their activities were associated with fare types (adult, student, elderly, etc.). Chapleau et al. (2008) also explored the behaviour of students with smart card data.
Other datasets, in particular travel diaries within large-scale travel surveys, have been used to investigate other behavioural aspects such as mode choice, the effects of travel time and transfer penalties, and individual attributes. Detailed route choice analysis is, however, mostly not possible, as few surveys ask respondents for the specific bus or metro lines they took. Other studies use observed line loads to (implicitly) derive route choice. Kato et al. (2011) estimate route choices of passengers in the Tokyo metropolitan area based on such line load data. Their focus is on understanding which discrete choice approach can best replicate line flows. Estimating such discrete choice models, however, requires choice set assumptions. For these, Kato et al. (2011) test different heuristics, which also highlights the need for this study, which empirically tests the complexity of passengers’ hyperpaths. Likewise, the large body of literature estimating values of time for passengers through surveys or observed data does not investigate such issues in detail; see e.g. Wardman (2004) for a meta-analysis of a large number of studies on values of time for public transit users.
The review shows that, although several studies on passenger route choice have been carried out, relatively little emphasis has been placed on the estimation of passengers’ choice sets and on whether line choices are “random” based on transit service attributes, such as “take whichever line from this choice set arrives first”. Kurauchi et al. (2012) investigated this aspect through a web-based stated preference survey asking respondents to choose, in hypothetical scenarios, between two simple hyperpaths and one complex hyperpath. The “simple” hyperpaths are single-line hyperpaths (“I will choose Line A”), one of which is faster whereas the other is slower but more frequent. The complex hyperpath consists of both lines with the strategy “I will take whichever line comes first”. They find that most respondents would choose the complex hyperpath, which minimises the total travel time. However, they also find some significant differences based on socio-demographic characteristics and previous transit experiences. For example, students tend to have lower values for waiting time (and hence more often choose the fast line), and those experiencing higher levels of service congestion in their daily life tend to choose the complex strategies more often in the SP surveys.
The analysis presented in the following should be considered as complementary to this previous analysis. The hypothetical scenarios used in the stated preference survey might overestimate the tendency of passengers to choose complex hyperpaths, as habits are ignored. Further, we would like to understand to what extent passengers indeed face a choice between different strategies. For these reasons, we base our analysis in this paper on the observed choices of morning commuters.

3. Data Description and Preparation
Smart card data from London’s public transport network, commonly referred to as “Oyster card” data, were obtained through Transport for London (TfL). London is a good case study site for our analysis for three reasons. First, the public transport network is large and dense, offering passengers a large number of route choices. Second, public transport services are operated on a frequency basis. Even though there might be an internal timetable within TfL, in many cases passengers only find information about service frequencies at bus stops for frequent services during peak hours. Third, service reliability in London is not as high as in many other cities with smart card systems. All three reasons should in many cases encourage passengers to consider fairly complex hyperpaths.
The Oyster smart card system is implemented on London’s bus, tube, tram and DLR services as well as parts of its commuter rail system (see http://www.tfl.gov.uk/tickets/14836.aspx). Smart card data are convenient for travellers, operators and analysts alike. Pelletier et al. (2011) described the possibilities and limitations of smart card data use in more detail by categorising existing research into strategic-level studies (such as long-term planning), tactical-level studies (such as service adjustments and network development) and operational-level studies (such as ridership statistics and performance indicators).
We obtained two weeks of Oyster card data for the period 08 Nov - 22 Nov 2007, with an average of 6.3 million swipes recorded per day. Cardholders travelling by bus only have to swipe when boarding a bus. Travellers on all track-bound modes, however, have to swipe both when boarding and when alighting. For the purpose of our analysis of path choice decisions, this additional alighting record is obviously advantageous. However, interchanges between tube lines are not captured by the Oyster card, whereas the bus data do record the route number taken. Furthermore, the bus network offers users far more routing options and potentially complex hyperpaths, with several bus routes departing from the same stop. Therefore, our initial analysis focuses on bus records only.
Due to the size of our data set as well as some incomplete records, significant pre-processing is required. As Kusakabe et al. (2010) also noted, the pre-processing effort required for the use of smart card data can be very substantial. Firstly, we reduced the dataset to only the information relevant for our analysis:
- Card number, to identify the same traveller over several days,
- Route ID, to identify the bus route,
- Boarding time.
We further kept the boarding location information in our database; unfortunately, however, the boarding locations recorded on the Oyster card are not reliable, as the system is not connected with the GPS system. Note further that the Oyster card does not record the ID of the bus the passenger is boarding but only the route number. Clearly these are limitations of our study, but we believe we partly overcome them by reducing our sample to only those travellers who use a bus before 9.30 am on every one of the 10 weekdays in our sample, meaning that we are likely to pick up only regular morning commuters. The aim of this data reduction is to a) ensure that we pick up passengers who are indeed facing a repeated choice and b) overcome the limitation due to missing spatial data in our data set. That is, we assume that those travelling by bus before 9.30 am on 10 consecutive weekdays are likely to be home-to-work commuters and hence face the same line choice each day. For the same reason, we only select the first boarding of the day, meaning that we are likely to pick up the same boarding point, the person’s home, for each observation. Furthermore, we reduce our sample to those commuters whose hyperpath presumably consists only of bus route options from their home location. With these stringent conditions, our sample size reduces to 22492 regular bus commuters. Fig. 1 illustrates the share of samples by size of the choice set. Note that the share of people with two alternative bus lines (30%, 6839 commuters) is larger than the share of people who use only one bus line (23%, 5190 people). This may be because people change bus lines stochastically in line with the hyperpath concept, which we investigate further in the following. As will be explained in the next section, we assume that the choice set of each person is known, and the Markov analysis only addresses the choice variation within each person’s choice set. This means that people who always use the same line (choice set size equal to one) can be excluded from the data set. Eventually, our sample size is reduced to 17302 commuters, or 17302 line choice observations.
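The sample reduction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the processing code used for this study: the record layout (`Tap` with `card_id`, `route_id`, `boarded`), the function name `regular_commuters`, the strict reading of "before 9.30 am", and the treatment of a person's choice set as the set of distinct routes used on the sampled mornings are all hypothetical. The condition that the hyperpath consists only of bus options cannot be checked from these three fields and is therefore omitted.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime, time


@dataclass(frozen=True)
class Tap:
    """One bus boarding record (hypothetical layout of the retained fields)."""
    card_id: str       # card number, identifies the traveller over several days
    route_id: str      # bus route ID
    boarded: datetime  # boarding time


CUTOFF = time(9, 30)  # assumption: "before 9.30 am" means strictly earlier


def regular_commuters(taps: list[Tap], weekdays: set[date]) -> dict[str, list[str]]:
    """Return {card_id: first-boarding route on each weekday, in date order}
    for cards that boarded a bus before CUTOFF on every day in `weekdays`
    and used more than one route (choice set size > 1)."""
    first: dict[str, dict[date, tuple[datetime, str]]] = defaultdict(dict)
    for tap in taps:
        day = tap.boarded.date()
        if day not in weekdays or tap.boarded.time() >= CUTOFF:
            continue  # weekend, outside the study period, or too late
        earliest = first[tap.card_id].get(day)
        if earliest is None or tap.boarded < earliest[0]:
            first[tap.card_id][day] = (tap.boarded, tap.route_id)  # keep first boarding only

    ordered_days = sorted(weekdays)
    result: dict[str, list[str]] = {}
    for card, by_day in first.items():
        if len(by_day) != len(ordered_days):
            continue  # missed at least one morning: not a regular commuter
        routes = [by_day[d][1] for d in ordered_days]
        if len(set(routes)) > 1:  # always the same line -> excluded
            result[card] = routes
    return result
```

Passing the 10 study weekdays as `weekdays` reproduces the "every weekday before 9.30 am" criterion, and the returned route sequences are the inputs for the Markov analysis in the next section.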
Fig. 1. Choice Set Size Distribution (share of samples by size of choice set, 1 to 10)

4. Initial N-Step Markov Analysis
The idea of using Markov chains for day-to-day route choice variation has been applied by Yang and Liu (2007). They present a new Markov model to study travellers’ stochastic behaviour in the day-to-day route choice adjustment process. The model is characterised by two components: how often a traveller reconsiders his/her route choice (route-switching rate), and the probability of taking a certain route (route choice probability). By applying evolutionary game theory, the conventional requirements of perfect information and complete rationality in equilibrium analysis are relaxed. Their behavioural assumption for an individual is the Markov decision rule, i.e., one makes the route choice “today” depending only on the limited road information available from “yesterday”, and behaves not completely rationally, in that one might choose a non-optimal route with a certain probability.
For our objectives we are not considering equilibrium problems, but otherwise the problem is similar. In contrast to Yang and Liu (2007), however, we adopt an n-step Markov model to analyse the consistency in route choice over days. The choice of route on day d is assumed to depend on the choices on n previous days. As we are not interested in which specific route the passenger is taking, but only in whether the traveller is taking the same or a different route, the choice on the first day is generally denoted as bus A in the following. On the next day the passenger then has the choice to take the same bus A or a different bus B. If the passenger took two different buses on the first two days, on the third day he/she has a choice between buses A, B, C and so on.
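The letter coding described above can be illustrated with a short Python sketch; the function name `pattern_code` is hypothetical. The labels that appear in Figs 2 and 3 (e.g. AA_C rather than AA_B, and AAC_D) suggest that a line first chosen in the k-th position of the window receives the k-th letter of the alphabet, and this sketch assumes that convention.

```python
def pattern_code(history: list[str], today: str) -> str:
    """Encode route IDs (history in chronological order, then the predicted day)
    as a pattern such as 'AB_B': a route first seen at position k (0-based)
    gets the k-th letter, so only same/different information is kept."""
    seq = list(history) + [today]
    first_pos: dict[str, int] = {}
    letters = []
    for k, route in enumerate(seq):
        first_pos.setdefault(route, k)  # position where this route first appears
        letters.append(chr(ord("A") + first_pos[route]))
    code = "".join(letters)
    return code[:-1] + "_" + code[-1]  # underscore separates the predicted day
```

For the 2-step case the history is [line on the same weekday of the previous week, line on the previous workday]; with three history days the same function produces the 15 codes listed for Fig. 3 (AAA_A to ABC_D).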
In the first analysis step we assume n=2. The two independent days are taken as the previous workday and the same weekday of the previous week. The letters follow the chronological order of choices from left to right. Therefore, the two letters before the underscore indicate the routes taken on the previous days, and the last letter indicates the route chosen on the predicted day. For example, AA_A indicates that the traveller takes the same route on all three days, and AB_B indicates that the traveller takes the same bus as yesterday but took a different bus route on the same weekday of the previous week. AB_C indicates that different buses are taken on each day. The 3-step case can be calculated in the same way, assuming e.g. the day before yesterday as an additional independent variable. The general form of the probability that a person i will choose transit line j on day d in the 2-step case can be described as follows:
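$$
p_{ijd} =
\begin{cases}
\Pr(AA\_A) & \text{if } j_d = j_{d-1},\; j_{d-1} = j_{d-7} \\[4pt]
\dfrac{\Pr(AA\_C)}{J_i - 1} & \text{if } j_d \neq j_{d-1},\; j_{d-1} = j_{d-7} \\[4pt]
\Pr(AB\_A) & \text{if } j_d = j_{d-7},\; j_{d-1} \neq j_{d-7} \\[4pt]
\Pr(AB\_B) & \text{if } j_d = j_{d-1},\; j_{d-1} \neq j_{d-7} \\[4pt]
\dfrac{\Pr(AB\_C)}{J_i - 2} & \text{if } j_d \neq j_{d-1},\; j_d \neq j_{d-7},\; j_{d-1} \neq j_{d-7}
\end{cases}
\tag{1}
$$

where:
i : person,
j : transit line choice,
j_d : transit line choice on the dth day (j_{d-1} and j_{d-7} denote the choices on the previous workday and on the same weekday of the previous week),
p_ijd : probability that person i chooses transit line j on the dth day,
J_i : number of available transit lines for person i.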
Note that we assume here that the choice set of each person is known in advance. Suppose that a person selected the same line on the previous day and on the same weekday of the previous week. In this case, we can calculate two probabilities, Pr(AA_A) and Pr(AA_C). Note that C stands for a generic line other than the lines chosen before (A, or A and B). Therefore, when calculating the probability of a specific line C, we divide the choice probability by the number of other lines that are in the person’s choice set but were not taken before. In other words, if the person chose AA on the previous days and the choice set size Ji of this person is larger than two, i.e., the person has more than one line other than the line (s)he used on the previous day and on the same weekday of the previous week, then the probability has to be divided by the number of lines that are regarded as C. This is why Pr(AA_C) is divided by (Ji - 1). Similarly, when person i used different lines on the previous day and on the same weekday of the previous week, the probability of choosing a line j that is regarded as the ‘other’ (C) line is divided by (Ji - 2).
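The case distinction of Eq. (1), including the division by (Ji - 1) or (Ji - 2), can be written as a small Python function. This is a sketch only: the function name `p_ijd` and its argument order are hypothetical, and it assumes that the Markov probabilities are conditional on the history pattern, i.e. Pr(AA_A) + Pr(AA_C) = 1 and Pr(AB_A) + Pr(AB_B) + Pr(AB_C) = 1, which is what makes p_ijd sum to one over the choice set.

```python
def p_ijd(prev_week: str, prev_day: str, line: str, J_i: int, pr: dict[str, float]) -> float:
    """Probability that a person with choice set size J_i chooses `line` on day d, following Eq. (1).

    prev_week is j_{d-7} (same weekday of the previous week), prev_day is j_{d-1}
    (previous workday), and `line` must belong to the person's choice set.
    pr maps 'AA_A', 'AA_C', 'AB_A', 'AB_B', 'AB_C' to Markov probabilities,
    assumed here to be conditional on the history pattern (AA or AB).
    """
    if prev_day == prev_week:          # history pattern AA
        if line == prev_day:
            return pr["AA_A"]
        return pr["AA_C"] / (J_i - 1)  # shared equally by the J_i - 1 other lines
    if line == prev_week:              # history pattern AB, back to A
        return pr["AB_A"]
    if line == prev_day:               # history pattern AB, stay with B
        return pr["AB_B"]
    return pr["AB_C"] / (J_i - 2)      # shared equally by the J_i - 2 unused lines
```

For example, with Ji = 3 and Pr(AA_C) = 0.5, each of the two unused lines receives 0.25.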
Fig. 2 shows the probability of each choice for the 2-step Markov model. It suggests that there is considerable variation in the routes chosen and indicates that a large number of commuters change route at least on some days. It also shows that only around 23% choose the same bus every day, whereas around 20% choose a different bus route on all three days. The remaining passengers choose the same bus on exactly two of the three days. Fig. 3 shows the results for n=3, where the independent days are taken as the day before, two days before and the same weekday of the previous week. The percentage of commuters taking the same bus every day now reduces to less than 17%. Note, however, that the percentage of commuters taking a different bus every day reduces even further, to below 6%. The remaining passengers choose the same bus route on at least two of the four days in question.
Both figures further indicate that the day of the week for which the route choice is predicted does not appear to have a significant influence on the results, except possibly for a small “Monday effect”, i.e. on Mondays the percentage of passengers taking the same route as last Friday and last week’s Monday (or last Friday, last Thursday and last week’s Monday in the three-step case) is even smaller. This is in line with the observation that the choice on the last day travelled appears to have a slightly higher influence on the predictability of the line chosen on the day in question. In the two-step model we observe a slightly higher Markov probability for the choice AB_B than for AB_A. In the three-step case the difference is more apparent: the percentage of ABB_B is 10.9%, whereas for the choices ABA_B and ABC_B it is only 5.3% and 4.8%, respectively. In general, though, the figures illustrate that the differences are small and that choices are difficult to predict, which is in line with our hypothesis that these morning commuters choose “randomly” from a set of attractive lines.
25
Markov Probability
40% 30% 20% 10% 0% AA_A
26 27
Monday
Fig.2
AA_C
Tuesday
AB_A
Wednesday
AB_B
AB_C
Thursday
Friday
2-Step results for route prediction during the 2nd week
12
Fig.3 3-Step results for route prediction during the 2nd week (Markov probability of AAA_A, AAA_D, AAC_A, AAC_C, AAC_D, ABA_A, ABA_B, ABA_D, ABB_A, ABB_B, ABB_D, ABC_A, ABC_B, ABC_C and ABC_D by predicted weekday, Monday to Friday)

5. Consideration of Overlapping Routes
In order to understand whether the variation in chosen routes observed in Figs 2 and 3 is indeed due to passengers travelling on hyperpaths or due to other reasons, the overlap of routes is considered in this section. If the overlap threshold is relaxed, i.e. if routes with a lower overlap rate are recognised as the same route, and the variation in route choice decreases disproportionately to the relaxation of the threshold, we would expect that a large part of the route choice variation is indeed due to common lines that are part of the travellers’ hyperpaths.
As explained before, our data set unfortunately does not allow us to infer the home bus stop of the respondent. Therefore, we cannot directly identify whether a traveller took a different route B on a second day because it is part of the same hyperpath or for other reasons. As an approximation, we can only identify the degree to which the routes overlap. We therefore define p_xy as the percentage of shared stops on the routes of two bus lines x and y.
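The overlap measure p_xy and the threshold rule can be sketched as follows. The text does not state the denominator of the percentage, so normalising by the stop count of the shorter route is an assumption here, as is making the merging transitive (if x matches y and y matches z, all three become one line) via a small union-find; the names `overlap` and `merge_lines` are hypothetical.

```python
from itertools import combinations


def overlap(stops_x: list[str], stops_y: list[str]) -> float:
    """p_xy: shared stops divided by the stop count of the shorter route (assumed normalisation)."""
    x, y = set(stops_x), set(stops_y)
    return len(x & y) / min(len(x), len(y))


def merge_lines(routes: dict[str, list[str]], S: float) -> dict[str, str]:
    """Map every line to a representative line. Lines x and y are treated as the
    same line when p_xy exceeds S (a fraction, e.g. 0.2 for 20%); merging is made
    transitive with a small union-find, which is an assumption of this sketch."""
    parent = {r: r for r in routes}

    def find(r: str) -> str:
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    for x, y in combinations(sorted(routes), 2):
        if overlap(routes[x], routes[y]) > S:
            parent[find(y)] = find(x)
    return {r: find(r) for r in routes}
```

With S = 1.0 no pair can exceed the threshold, so every line stays distinct (the 100% case), while S = 0.0 merges any two lines that share at least one stop. Replacing each commuter's daily routes by their representatives before coding the choice patterns would then give overlap-adjusted results.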
If the percentage exceeds a predefined threshold S, these two lines are considered as the same line, as it is presumed likely that passengers could take both lines from their home location. This means that the smaller S, the smaller the set of distinct lines. S=0% would mean that the traveller always faces the option of only one line, whereas S=100% leads to results identical to those shown in the previous section.
Tests were carried out with different thresholds S for 2-step Markov models for the prediction of route choice during the second week. Fig.4 shows the goodness of fit index (ρ²) for different overlapping thresholds S during weekdays, which is calculated with the following equation:

$$
\rho^2 = 1 - \frac{LL_{model}}{LL_0} \tag{2}
$$

with the log-likelihood measures calculated as

$$
LL_0 = \sum_i \sum_j \sum_d \delta_{ijd} \ln\left(\frac{1}{J_i}\right) \tag{3}
$$

$$
LL_{model} = \sum_i \sum_j \sum_d \delta_{ijd} \ln\left(p_{ijd}\right) \tag{4}
$$

where δ_ijd equals 1 if person i chooses option j on day d and zero otherwise. Since S=0% is meaningless (the network collapses to a single line and the traveller faces no choice), we use as our lower bound the case when two lines share at least one stop.

Fig.4 suggests that considering overlap is important to increase the fit of the predicted model. The likelihood ratio index improves significantly for low overlapping thresholds S (i.e. 20% is better than 40%, and so on), but no improvement compared to ignoring overlap can be observed for S>40%. To understand why the likelihood ratio did not change when S>40%, we calculated the share of line pairs that are treated as the ‘same’ line for each overlapping threshold. The results are 0.17%, 0.21%, 0.37%, 1.12% and 14.33% for thresholds of 80%, 60%, 40%, 20% and ‘any sharing’, respectively. This suggests that when S is large (>40%), less than 0.4% of line pairs are treated as the ‘same’ line, and in such cases the results may remain unchanged. At the same time, it is interesting that the likelihood ratio improves when S=20%, even though only 1.12% of line pairs are merged. Table 2 shows the goodness of fit indices for the 2-step Markov analysis, and Fig.5(a)-(d) illustrates the results of the 2-step Markov model for route prediction during the 2nd week for different overlapping thresholds S. As expected, the percentage of AA_A increases with lower thresholds, as there are fewer distinct lines. Interestingly, though, the increase is nonlinear compared to the decrease in choices, which leads to the increase in model fit.
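The likelihood ratio index of Eqs. (2) to (4) reduces to a sum over observed choices, because δ_ijd keeps only the term of the line actually chosen. A minimal Python sketch, with the hypothetical function name `likelihood_ratio_index` and the assumption that each observation is supplied as a pair (Ji, p_ijd of the chosen line):

```python
import math


def likelihood_ratio_index(observations: list[tuple[int, float]]) -> float:
    """observations: one (J_i, p) pair per observed choice, where J_i is the person's
    choice set size and p is the model probability p_ijd of the line actually chosen
    (the delta_ijd = 1 term). Returns rho^2 = 1 - LL_model / LL_0."""
    ll_0 = 0.0
    ll_model = 0.0
    for J_i, p in observations:
        ll_0 += math.log(1.0 / J_i)  # equal-probability reference model, Eq. (3)
        ll_model += math.log(p)      # Markov model, Eq. (4)
    return 1.0 - ll_model / ll_0
```

A model that assigns exactly 1/Ji to every chosen line yields ρ² = 0; commuters with a single line (Ji = 1) would contribute ln(1) = 0 to LL_0, which is consistent with excluding them from the sample.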
Fig.4 Goodness of fit index (ρ²) for different overlapping thresholds S (x-axis: overlapping threshold S, from any sharing through 20%, 40%, 60% and 80% to 100%; one series per weekday, Monday to Friday)
Table 2 Goodness of fit index for different thresholds S

Day | Any sharing | 20% | 40% | 60% | 80% | 100% (all lines distinguished)
Mon | 0.245 | 0.153 | 0.146 | 0.155 | 0.157 | 0.157
Tue | 0.306 | 0.170 | 0.154 | 0.158 | 0.158 | 0.159
Wed | 0.336 | 0.180 | 0.156 | 0.158 | 0.159 | 0.159
Thu | 0.350 | 0.170 | 0.156 | 0.159 | 0.160 | 0.161
Fri | 0.324 | 0.171 | 0.150 | 0.155 | 0.156 | 0.157
Fig.5 2-Step results with S=20%, S=40%, S=60%, and S=80% (panels: (a) 20% overlapping, (b) 40% overlapping, (c) 60% overlapping, (d) 80% overlapping; each panel shows the Markov probability of AA_A, AA_C, AB_A, AB_B and AB_C by weekday, Monday to Friday)
Comparing Fig.2 and Fig.5, we have evidence that the variation in route choice decreases when line overlap is considered. This may indicate that a large part of the route choice variation is indeed due to common lines that are part of the travellers’ hyperpaths, or at least that some route variation is due to overlap and possibly hyperpaths. For S > 60%, considering overlap loses its significance, as considerable variation in route choice prediction remains. Only for S