Variability of Commuters' Bus Line Choice: An Analysis of Oystercard Data

Fumitaka Kurauchi1, Jan-Dirk Schmöcker2, Hiroshi Shimamoto3 and Seham M. Hassan4

1. Dept of Civil Eng., Gifu University, 1-1 Yanagido, Gifu City 501-1193, Japan
2. Dept of Urban Management, Kyoto University, Kyotodaigakukatsura, Kyoto, 615-8540, Japan
3. Dept of Urban Management, Kyoto University, Kyotodaigakukatsura, Kyoto, 615-8540, Japan
4. Dept of Civil Engineering, Aswan University, Aswan, Egypt

Abstract

A hyperpath can be defined as a set of attractive lines identified by the passenger, each of which might be the optimal one from the current stop, depending on the lines’ arrival time, frequency, cost etc. This concept can lead to complex route choice and has been a fundamental assumption in most transit assignment models, despite little evidence on whether passengers indeed select such complex strategies. This research uses time series smart card data from London to investigate flexibility in buses chosen by morning commuters. The analysis is based on n-step Markov models and proposes that the variations in bus lines taken by passengers who supposedly travel between the same OD pair every morning over several days should reflect the set of paths included in an (optimal) hyperpath. Our hypothesis is that a large variation in bus lines over days indicates a complex hyperpath whereas a passenger who takes the same line every morning does not consider many alternatives. Our results suggest that there is indeed significant variation in bus lines chosen, possibly in accordance with the theory of hyperpaths in networks with uncertainty.

Keywords: Bus line choice variation, Oyster Card data, Travel behaviour, Markov model

1. Introduction

It is generally assumed that travellers on transit networks try to minimise their expected travel time, consisting of waiting time, on-board time and potentially other factors such as fare, crowding or seat availability, by selecting a hyperpath (Nguyen et al. 1988). A hyperpath can be defined as a set of attractive lines identified by the passenger for each stop, each of which might be the optimal one from the stop, depending on the lines’ arrival time, frequency, cost etc. In networks with few uncertainties, i.e. regular arrival times and low congestion, this set of services will be smaller as passengers can better estimate whether it is advantageous to let slow services pass in order to wait for a faster service that might arrive soon. This behavioural assumption has led to a fairly large body of literature.

Several transit network design models that aim at optimising the frequencies of the lines have been proposed (Fearnside et al. 1971). Transit assignment has been studied either as a separate problem (e.g. see Andreasson 1976), or as a sub-problem of more complex models, such as transit network design (e.g. Mandl 1979), or multimodal network equilibrium (Florian 1977, and Florian et al. 1983). Most of the early algorithms may be classified as heuristic approaches to the problem. These algorithms are variants of assignment procedures used for private car traffic on road networks (such as shortest path or stochastic multipath assignment) that are modified to reflect the waiting time phenomenon inherent to transit networks. Spiess and Florian (1989) replaced the simplistic route choice models by a transit assignment model, leading to more realistic transit network design models. They described a model for the transit assignment problem with a fixed set of transit lines. The traveller chooses the strategy that allows him or her to reach the destination at minimum expected cost, assuming that passengers board whichever line from the set of attractive lines happens to arrive first. Since then, numerous frequency-based assignment models have been proposed to explain additional factors influencing line choice. For example, De Cea and Fernández (1993) proposed “effective frequencies” to consider congestion and Kurauchi et al (2003) implemented “fail-to-board probabilities” to recognise strict capacity constraints of vehicles. More recently, Nökel et al (2009) derived expected line splits reflecting the increasing amount of information passengers obtain during their journeys. Particularly noteworthy in the context of this study is the work of Nguyen et al. (1998), who allowed for stochastic hyperpath choice with a logit model for line choice to recognise that the optimal strategy is not taken by all passengers. The path set of passengers in Nguyen et al (1998) is based on the concept of optimal paths as in the Spiess and Florian model. Recently, Schmöcker et al (2013) proposed the “reversed model” in which passengers choose the path set based on personal preferences but the choice of the line itself is according to the line frequency as in the Spiess and Florian model. The motivation of their model is nevertheless identical in that it recognises that passengers’ “optimal strategies” are not easily understood by modellers.
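To make the “board whichever attractive line arrives first” rule concrete, the following minimal Python sketch builds the attractive set at a single stop and the resulting line split. It is an illustration only, not code from any of the cited models: it assumes exponentially distributed headways (so that the expected wait for a set of lines is the inverse of their combined frequency), and the function names and example lines are hypothetical.

```python
from typing import Dict, List, Tuple

# A line is described as (name, frequency in buses per minute, on-board time in minutes).
Line = Tuple[str, float, float]


def attractive_set(lines: List[Line]) -> Tuple[List[str], float]:
    """Return the attractive line names and the expected travel time in minutes.

    Lines are considered in order of increasing on-board time and added as long
    as riding the line is quicker than the expected time of the current set.
    Assumption: exponential headways, so expected wait = 1 / (sum of frequencies).
    """
    chosen: List[str] = []
    freq_sum = 0.0        # combined frequency of the attractive lines
    weighted_time = 0.0   # sum of frequency * on-board time
    expected = float("inf")
    for name, freq, ride in sorted(lines, key=lambda line: line[2]):
        if ride >= expected:
            break  # waiting for the current set is already better on average
        chosen.append(name)
        freq_sum += freq
        weighted_time += freq * ride
        expected = (1.0 + weighted_time) / freq_sum
    return chosen, expected


def line_split(lines: List[Line], chosen: List[str]) -> Dict[str, float]:
    """Share of passengers boarding each attractive line (proportional to frequency)."""
    total = sum(freq for name, freq, _ in lines if name in chosen)
    return {name: freq / total for name, freq, _ in lines if name in chosen}


if __name__ == "__main__":
    example = [("fast", 0.1, 20.0), ("slow", 0.2, 25.0), ("detour", 0.5, 40.0)]
    names, cost = attractive_set(example)
    print(names, round(cost, 1), line_split(example, names))
```

In this example a fast line every 10 minutes (20 minute ride) and a slower line every 5 minutes (25 minute ride) are both attractive, and two thirds of the passengers would be expected on the slower but more frequent line. A commuter following such a strategy would therefore be observed on different lines on different days, which is the behaviour this paper looks for.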

The purpose of this research is therefore to understand whether passengers indeed follow such theoretically proposed hyperpaths or whether habits and other factors dominate routing decisions, leading to less (or more) complex hyperpaths than those proposed in the literature. Observing hyperpaths is, however, difficult. One would have to understand which (unchosen) routing options the traveller considers. As a first step this analysis assumes that the variations in lines taken by regular commuters during their morning journey over several days should reflect the set of paths considered by hyperpath travellers in networks with uncertain vehicle arrival times. Our hypothesis is that a large variation in bus lines over days indicates a complex hyperpath whereas a traveller who takes the same route every morning does not consider many alternatives, leading to a simple hyperpath.

The remainder of this paper is structured as follows. The next section reviews some literature on passengers’ travel behaviour on public transit. Section 3 describes the data used for this research and the pre-processing. Section 4 describes the Markov analysis used to analyse stability in bus line choice over days and shows some initial results. Section 5 describes how overlapping between bus lines is considered and how this changes the results. Finally, Section 6 discusses the findings of this paper and proposes further research directions.

2. Passengers’ Route Choice Observation

Analysing the travel behaviour of public transit passengers has received more attention in the literature during the last decade. In particular, the advent of electronic data collection systems, such as smart cards, has made detailed studies of passengers’ route choice and trip patterns feasible.

Electronic data collection systems differ by transport mode and fare system as indicated in Table 1. An early and well established type of system is highway ETC (Electronic Toll Collection), which has been installed all over the world. On some tolled urban expressways the fare is flat and therefore the ETC gates only need to be constructed at either entry or exit ramps. Accordingly the analyst can only gain limited behavioural information. If, on the other hand, the fare is distance-based, gates obviously have to be constructed at both entry and exit ramps. The analyst can now obtain better data. In both systems, however, if there are route choices, the route the traveller has taken can in general not be identified unless card readers are also installed at interchange points or periodically en route. There has been substantial research discussing the possibilities of ETC data to estimate time dependent demand patterns, including their variability in reaction to service quality changes (e.g. Nishiuchi et al., 2010, Yamazaki et al., 2012 and Kim et al., 2013).

Table 1. Characteristics of (smart-)card data.

Label | Mode | Fare | OD | Route | Research example
--- | --- | --- | --- | --- | ---
ETC (Flat) | Car | Flat | Either entry or exit ramp | No |
ETC (Distance) | Car | Distance-based | Both entry and exit ramps | No in general | Nishiuchi et al., 2010; Yamazaki et al., 2012; Kim et al., 2013
Rail/Subway | Subway/Metro/Underground | Generally distance-based | Both entry and exit ramps | No | Van der Hurk et al, 2012
Bus (Flat) | Bus | Flat within a specific area | Either at boarding or alighting | Yes in general | Trépanier et al., 2007; Li et al., 2011
Bus (Distance) | Bus | Distance based | Both at boarding and alighting | Yes |

Looking at public transport systems, on most rail-bound systems all over the world the fare is calculated based on distance and it is common to “tap-in” and “tap-out”. Therefore, entry and exit stations can be identified from smart card data and with it the analyst often has a good idea about the traveller’s origin and destination. Again though, the route or the line which cardholders used cannot be identified if there is more than one reasonable route connecting the entry and exit stations. In reality, especially for heavy rail, there may not be a large number of route options and passenger movements can often be inferred even without route information (e.g. Van der Hurk, 2012).

This is in general different for bus services, where the route can be identified because the card reader is installed on the bus. Moreover, if the fare varies based on the distance, the boarding and alighting bus stops can be accurately identified together with the route information. Even if the alighting bus stop is not recorded, some approaches have been proposed to estimate the trip destination (e.g. Trépanier et al., 2007; Li et al., 2011). An issue with vehicle-based card reader systems is that the boarding and alighting locations might not always be readily identified. This problem is usually overcome by linking the card reader to a vehicle location system, which also serves to provide, for example, countdown information at bus stops. If the two systems are in operation but not linked, the traveller location is still identifiable if the card records a vehicle identification number. In systems where the bus fare is flat, such as in London, there is no operational requirement to do so though, leaving the analyst with limited information about travellers’ boarding points. Further, in line with the above discussion for other modes, in flat bus fare systems tap-out is not required, meaning that travel time and alighting point information are also not available even if smart card and AVL data are linked. Table 1 includes some references to literature analysing passenger behaviour with smart card data. The following reviews methodologies and key findings of some of these papers.

Morency et al. (2006, 2007) analysed the variability of transit users’ behaviour with multi-day smart card data. They adopted the average number of different bus stops used and the proportion of the number of used bus stops as indicators of spatial variability. In addition, they applied cluster analysis and then evaluated the temporal variability. As a result, they confirmed that both the spatial and the temporal variability differ among users depending on what card type they hold. Park et al. (2008) described user characteristics (e.g. transfer location, boarding time distribution) and estimated future trends in passenger behaviour. A first attempt to estimate the most probable alighting point, specifically by looking at multi-day transit fare card data, was presented by Trépanier et al. (2007). The accuracy of their estimated destination was 66% for the whole time period, and it reached about 80% at peak hours. Agard et al. (2006, 2008) analysed transit users’ behaviour using data mining techniques (clustering) and related tools. Four major behavioural groups were created through clustering, and their activities were associated with fare types (adult, student, elderly, etc). Chapleau et al. (2008) also explored the behaviour of students with smart card data.

Other datasets, in particular travel diaries within large scale travel surveys, have been used to investigate other behavioural aspects such as mode choice, the effect of times and transfer penalties and individual attributes. Detailed route choice analysis is, however, mostly not possible as not many surveys ask the respondents for the specific bus or metro lines they took. Other studies use observed line loads to (implicitly) derive route choice. Kato et al (2011) estimate route choices of passengers in the Tokyo metropolitan area based on such line load data. Their focus is on understanding which discrete choice approach can best replicate line flows. Estimating such discrete choice models, however, requires choice set assumptions. For these Kato et al (2011) test different heuristics, which also highlights the need for this study to test the complexity of passengers’ hyperpaths empirically. The large body of literature estimating values of time for passengers through surveys or observed data also does not investigate such issues in detail, see e.g. Wardman (2004) for a meta-analysis of a large number of studies on values of time for public transit users.

The review shows that, though several studies on passenger route choice have been carried out, there has been relatively little emphasis on the estimation of passengers’ choice sets and on whether line choices are “random” based on transit service attributes, such as “take whichever line from this choice set arrives first”. Kurauchi et al (2012) investigated this aspect through a web-based stated preference (SP) survey asking respondents to choose in hypothetical scenarios between two simple and one complex hyperpath. The “simple” hyperpaths are single line hyperpaths (“I will choose Line A”), one of which is faster whereas the other is slower but more frequent. The complex hyperpath consists of both lines with the strategy “I will take whichever line comes first”. They find that most respondents would choose the complex hyperpath which minimises the total travel time. However, they also find some significant differences based on socio-demographic characteristics and previous transit experiences. For example, students tend to have lower values for waiting time (and hence more often choose the fast line), whereas those experiencing higher levels of service congestion during their daily life tend to use the complex strategies more often in the SP surveys.

The analysis presented in the following should be considered as complementary to this previous analysis. The hypothetical scenarios used in the stated preference survey might overestimate the tendency of passengers to choose complex hyperpaths, as habits are ignored. Further, we would like to understand to what extent passengers indeed face a choice between different strategies. For these reasons we base our analysis in this paper on the observed choices of morning commuters.

3. Data Description and Preparation

Smart card data from London’s public transport network, commonly referred to as “Oyster card” data, have been obtained through Transport for London (TfL). London is a good case study site for our analysis for three reasons. First, the public transport network is large and dense, offering passengers a large number of route choices. Second, public transport services are operated on a frequency basis. Even though there might be an internal timetable within TfL, in many cases passengers only find information about service frequencies at bus stops for frequent services during peak hours. Third, service reliability in London is not as high as in many other cities with smart card systems. All three reasons should encourage passengers to consider in many cases fairly complex hyperpaths.

The Oyster smart card system is implemented in London’s bus, tube, tram, DLR as well as parts of its commuter rail system (see http://www.tfl.gov.uk/tickets/14836.aspx). Smart card data are convenient for travellers, operators as well as analysts. Pelletier et al. (2011) described in more detail the possibilities as well as limitations of smart card data use by categorising existing research into strategic-level studies (such as long-term planning), tactical-level studies (such as service adjustments and network development) and operational-level studies (such as ridership statistics and performance indicators).

We obtained two weeks of Oyster card data for the period 08 Nov - 22 Nov 2007 with an average of 6.3 million swipes recorded per day. Cardholders travelling by bus only have to swipe when boarding a bus. Travellers on all track-bound modes, though, have to swipe when boarding as well as when alighting. For the purpose of our analysis on path choice decisions, this additional alighting record is obviously advantageous. However, interchanges between tube lines are not captured by Oyster card, whereas the bus data do record the route number taken. Furthermore, the bus network offers users far more routing options and potentially complex hyperpaths with several bus routes departing from the same stop. Therefore, our initial analysis focuses on bus records only.

Due to the size of our data set as well as some incomplete records, significant pre-processing is required. As Kusakabe et al. (2010) also noted, the pre-processing effort required for the use of smart card data can be very substantial. Firstly, we reduced the dataset to only the information relevant for our analysis. This is:
- Card number, to identify the same traveller over several days,
- Route ID, to identify the bus route,
- Boarding time.

We further kept the boarding location information in our database. Unfortunately, however, the boarding locations recorded on the Oyster card are not reliable as the system is not connected with the GPS system. Note further that Oyster card does not record the ID of the bus the passenger is boarding but only the route number. Clearly these are limitations to our study, but we believe that we partly overcome these by reducing our sample to only those travellers who use a bus before 9.30 am on every one of the 10 weekdays in our sample, meaning that we are likely to pick up only regular morning commuters. The aim of this data reduction is a) to ensure that we pick up passengers who are indeed facing a repeated choice and b) to overcome the limitation due to missing spatial data in our data set. That is, we assume that those travelling by bus before 9.30 am on 10 consecutive weekdays are likely to be home-to-work commuters and hence face the same line choice each day. For the same reason we only select the first boarding of the day, meaning that it is likely that we pick up the same boarding point for each observation, the person’s home. Furthermore, we reduce our sample to those commuters whose hyperpath presumably only consists of bus route options from their home location. With these stringent conditions we reduce our sample size to 22492 regular bus commuters. Fig. 1 illustrates the share of the sample by choice set size. Note that the share of people with two alternative bus lines (30%, 6839 commuters) is larger than the share of people who use only one bus line (23%, 5190 people). This may be because people are changing bus lines stochastically in line with the hyperpath concept, which we further investigate in the following. As will be explained in the next section, we assume that the choice set of each person is known and the Markov analysis only addresses the choice variation within each person’s choice set. This means that people who always choose the same line (choice set size equal to one) can be excluded from the data set. Eventually, our sample size is reduced to 17302 commuters or 17302 line choice observations.
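The sample reduction described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the processing actually applied to the Oyster data: the record layout (card number, route ID, boarding time), the function names and the example cards are hypothetical, only bus records are assumed to be passed in, and the additional restriction to commuters whose hyperpath consists of bus options only is not reproduced.

```python
from collections import defaultdict
from datetime import date, datetime, time

CUTOFF = time(9, 30)  # the first boarding must be before 9.30 am


def regular_commuters(records, weekdays):
    """Keep cards whose first bus boarding is before CUTOFF on every given weekday.

    records:  iterable of (card_id, route_id, boarding_datetime) bus records
    weekdays: set of datetime.date objects that must all be covered
    Returns {card_id: {date: route_id of the first boarding on that date}}.
    """
    required = set(weekdays)
    first = defaultdict(dict)  # card_id -> date -> (boarding time, route_id)
    for card, route, stamp in records:
        day = stamp.date()
        if day not in required:
            continue
        seen = first[card].get(day)
        if seen is None or stamp < seen[0]:
            first[card][day] = (stamp, route)

    kept = {}
    for card, days in first.items():
        every_day = set(days) == required
        all_early = all(stamp.time() < CUTOFF for stamp, _ in days.values())
        if every_day and all_early:
            kept[card] = {day: route for day, (stamp, route) in days.items()}
    return kept


def drop_single_line_users(choices):
    """Exclude commuters whose observed choice set has size one."""
    return {card: days for card, days in choices.items()
            if len(set(days.values())) > 1}


if __name__ == "__main__":
    days = {date(2007, 11, 8), date(2007, 11, 9)}  # two days only, for brevity
    sample = [
        ("card-1", "73", datetime(2007, 11, 8, 8, 0)),
        ("card-1", "38", datetime(2007, 11, 9, 8, 5)),
        ("card-2", "12", datetime(2007, 11, 8, 8, 0)),  # no boarding on day two
    ]
    print(drop_single_line_users(regular_commuters(sample, days)))
```

The observed choice set of a commuter is taken here as the set of distinct routes among the retained first boardings, which corresponds to the choice set size plotted in Fig. 1.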

Fig. 1. Choice Set Size Distribution (x-axis: size of choice set, 1 to 10; y-axis: share of samples, 0% to 35%)

4. Initial N-Step Markov Analysis

The idea of using Markov chains for day-to-day route choice variation has been applied by Yang and Liu (2007). They present a Markov model to study travellers’ stochastic behaviour in the day-to-day route choice adjustment process. The model is characterised by two components: how often a traveller reconsiders his/her route choice (route-switching rate), and the probability of taking a certain route (route choice probability). By applying evolutionary game theory, the conventional perfect information and complete rationality requirements in equilibrium analysis are relaxed. Their behavioural assumption for an individual is the Markov decision rule, i.e., one makes the route choice “today” depending only on the limited road information available from “yesterday”, and does not behave completely rationally in that one might choose a non-optimal route with a certain probability.

We are not considering equilibrium problems here, but otherwise the problem is similar. In contrast to Yang and Liu (2007), however, we adopt an n-step Markov model to analyse the consistency in route choice over days. The choice of route on day d is assumed to depend on the choices on n previous days. As we are not interested in which specific route the passenger is taking, but only in whether the traveller is taking the same or a different route, the choice on the first day is generally denoted as bus A in the following. On the next day the passenger then has the choice to take the same bus A or a different bus B. If the passenger took two different buses on the first two days, on the third day he/she then has a choice between buses A, B, C and so on.

In the first analysis step we assume n=2. The independent days are taken as the previous workday as well as the same weekday during the previous week. The letters follow the chronological order of choices from left to right. Therefore, the two letters before the underscore indicate the routes taken on previous days and the last letter indicates the route chosen on the predicted day. For example, AA_A indicates that the traveller is taking the same route on all three days, whereas AB_B indicates that the traveller is taking the same bus as yesterday, but that he/she took a different bus route on the same weekday of the previous week. AB_C indicates that different buses are taken each day. The 3-step case can be calculated in the same way, assuming e.g. the day before yesterday as an additional independent variable. The general form of the probability that a person i will choose transit line j on day d in the 2-step case can be described as follows:

$$
p_{ij}^{d} =
\begin{cases}
\Pr(AA\_A) & \text{if } j_d = j_{d-1},\ j_{d-1} = j_{d-7} \\
\Pr(AA\_C) / (J_i - 1) & \text{if } j_d \neq j_{d-1},\ j_{d-1} = j_{d-7} \\
\Pr(AB\_A) & \text{if } j_d = j_{d-7},\ j_{d-1} \neq j_{d-7} \\
\Pr(AB\_B) & \text{if } j_d = j_{d-1},\ j_{d-1} \neq j_{d-7} \\
\Pr(AB\_C) / (J_i - 2) & \text{if } j_d \neq j_{d-1},\ j_d \neq j_{d-7},\ j_{d-1} \neq j_{d-7}
\end{cases}
\tag{1}
$$

Where:
$i$: person,
$j$: transit line choice,
$j_d$: transit line choice on the dth day,
$p_{ij}^{d}$: probability that person i chooses transit line j on the dth day,
$J_i$: number of available transit lines for person i.

Note that we assume here that the choice set of each person is known in advance. Suppose that a person selected the same line on the previous day and on the same weekday during the previous week. In this case, we can calculate two probabilities, Pr(AA_A) and Pr(AA_C). Note that C stands for a generic line other than the lines chosen before (A, or A and B). Therefore, when calculating the probability of a specific line C, we divide the choice probability by the number of other lines that are in the person’s choice set but have not been taken before. In other words, assume the person chose AA on previous days and the choice set size of this person, Ji, is larger than three, i.e., the person has more than two lines other than the line (s)he used on the day before and the same weekday during the previous week; then we have to divide the probability by the number of lines which are regarded as C. This is the reason why Pr(AA_C) is divided by (Ji-1). Similarly, in the case when a person i used different lines on the previous day and the same weekday during the previous week, the probability of choosing a line j that is regarded as the ‘other’ (C) line should be divided by (Ji-2).
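The case distinction behind Eq. (1) can be written as a small function. The sketch below is an illustration under assumptions, not the authors’ implementation: the function name and argument names are hypothetical, the pattern probabilities are passed in as a dictionary keyed by the labels used in the text, and the conditions follow the verbal definitions (first letter: same weekday of the previous week, second letter: previous workday).

```python
def choice_probability(pattern_probs, j_today, j_yesterday, j_last_week, n_lines):
    """Probability that a specific line j_today is chosen, following Eq. (1).

    pattern_probs: dict with keys "AA_A", "AA_C", "AB_A", "AB_B", "AB_C"
    n_lines:       size J_i of the person's (assumed known) choice set
    """
    if j_yesterday == j_last_week:
        if j_today == j_yesterday:
            return pattern_probs["AA_A"]
        # Any of the J_i - 1 lines not used before counts as the generic "C".
        return pattern_probs["AA_C"] / (n_lines - 1)
    if j_today == j_last_week:
        return pattern_probs["AB_A"]
    if j_today == j_yesterday:
        return pattern_probs["AB_B"]
    # Two lines were already used, so J_i - 2 lines count as the generic "C".
    return pattern_probs["AB_C"] / (n_lines - 2)


if __name__ == "__main__":
    # Made-up pattern probabilities, for illustration only.
    probs = {"AA_A": 0.6, "AA_C": 0.4, "AB_A": 0.3, "AB_B": 0.4, "AB_C": 0.3}
    print(choice_probability(probs, "38", "73", "73", n_lines=4))
```

The numbers in the example are invented; in particular, the sketch does not take a position on how the pattern probabilities themselves are normalised.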

Fig. 2 shows the probability of each choice for the 2-step Markov model. It suggests that there is considerable variation in routes chosen and that a large number of commuters change route at least on some days. It also shows that only around 23% choose the same bus every day whereas around 20% choose a different bus route on all three days. The remaining passengers choose a different bus on at least one out of the three days. Fig. 3 shows the results for n=3, where the independent days are taken as the day before, two days before and the same weekday in the week before. The percentage of commuters taking the same bus every day now reduces to less than 17%. Note, however, that the percentage of commuters taking a different bus every day reduces even further to below 6%. The remaining passengers choose the same bus route on at least two out of the four days in question.
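The pattern labels used in the figures (AA_A, AB_C, AAC_D and so on) can be generated mechanically from observed route sequences. The following sketch is one possible reading of the naming scheme, stated as an assumption: a route that appears for the first time in position k of the chronologically ordered sequence receives the k-th letter of the alphabet, which reproduces labels such as AA_C and AAC_D. The function names are hypothetical, and the shares are computed over all observations, which may differ from how the Markov probabilities in the figures were normalised.

```python
from collections import Counter
from string import ascii_uppercase


def label_pattern(history, today):
    """Encode route choices as a pattern label such as "AB_B".

    history: routes on the conditioning days, in chronological order
    today:   route chosen on the predicted day
    A route first seen in position k gets the k-th letter (assumed scheme).
    """
    labels = {}
    letters = []
    for position, route in enumerate(list(history) + [today]):
        if route not in labels:
            labels[route] = ascii_uppercase[position]
        letters.append(labels[route])
    return "".join(letters[:-1]) + "_" + letters[-1]


def pattern_shares(observations):
    """Share of each pattern among (history, today) observations."""
    counts = Counter(label_pattern(history, today) for history, today in observations)
    total = sum(counts.values())
    return {pattern: count / total for pattern, count in counts.items()}


if __name__ == "__main__":
    # Route numbers are made up; history = [same weekday last week, previous workday].
    print(label_pattern(["73", "38"], "38"))
    print(label_pattern(["73", "73", "38"], "12"))
```

The same function covers the 2-step and the 3-step case, since only the length of the history changes.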

Both figures further indicate that the day of the week for which the route choice is predicted does not appear to have a significant influence on the results, possibly except for a small “Monday effect”, i.e. on Mondays the percentage of passengers taking the same route as last Friday and last week’s Monday (or last Friday, last Thursday and last week’s Monday in the three step case) is even smaller. This is in line with the observation that the choice on the last day travelled appears to have a slightly higher influence on the predictability of the line chosen on the day in question. In the two step model we observe a slightly higher Markov probability for the choice AB_B than for AB_A. In the three step case the difference is more apparent as the percentage of ABB_B is 10.9% but for choices ABA_B and ABC_B only 5.3% and 4.8%, respectively. In general, though, the figures illustrate that the differences are small and that choices are difficult to predict, which is in line with our hypothesis that these morning commuters choose “randomly” from a set of attractive lines.

Fig.2 2-Step results for route prediction during the 2nd week (x-axis: choice patterns AA_A, AA_C, AB_A, AB_B, AB_C; y-axis: Markov probability, 0% to 40%; series: Monday to Friday)

Fig.3 3-Step results for route prediction during the 2nd week (x-axis: choice patterns AAA_A, AAA_D, AAC_A, AAC_C, AAC_D, ABA_A, ABA_B, ABA_D, ABB_A, ABB_B, ABB_D, ABC_A, ABC_B, ABC_C, ABC_D; y-axis: Markov probability, 0% to 40%; series: Monday to Friday)

5. Consideration of Overlapping Routes

In order to understand whether the variation in chosen routes observed in Figs 2 and 3 is indeed due to passengers travelling on hyperpaths or whether it is due to other reasons, the overlap of routes is considered in this section. We expect that, if the threshold for overlapping is relaxed, i.e., if routes with a lower overlapping rate are recognised as the same route, and the variation in route choice decreases disproportionately to the relaxation of the overlapping threshold, a large part of the route choice variation is indeed due to common lines that are part of the travellers’ hyperpaths.

As explained before, though, our data set unfortunately does not allow us to infer the home bus stop of the traveller. Therefore, we cannot identify directly whether a traveller took a different route B on a second day because it is part of the same hyperpath or for different reasons. As an approximation we can only identify the degree to which the routes overlap. We therefore define pxy as the percentage of shared stops on the routes of two bus lines x and y.

If the percentage exceeds a predefined threshold S, these two lines are considered as the same line as it is presumed likely that passengers could take both lines from their home location. This means that the smaller S, the smaller the set of distinct lines. S=0% would mean that the traveller always faces only the option of one line, whereas S=100% leads to results identical to those shown in the previous section.
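The merging of overlapping lines can be illustrated as follows. The sketch rests on two assumptions that the text leaves open: the share of common stops is taken relative to the shorter of the two routes, and merging is applied transitively (if x merges with y and y with z, all three are treated as one line). The function names, route numbers and stop identifiers are hypothetical.

```python
def overlap_share(stops_x, stops_y):
    """Share of stops common to both routes, relative to the shorter route (assumption)."""
    set_x, set_y = set(stops_x), set(stops_y)
    if not set_x or not set_y:
        return 0.0
    return len(set_x & set_y) / min(len(set_x), len(set_y))


def merge_lines(routes, threshold):
    """Map every line to a representative line, merging pairs whose overlap exceeds threshold.

    routes:    {line: iterable of stop ids}
    threshold: overlap threshold S as a fraction (0.2 for 20%); 0.0 means 'any sharing'
    Merging is transitive and implemented with a small union-find structure.
    """
    parent = {line: line for line in routes}

    def find(line):
        while parent[line] != line:
            parent[line] = parent[parent[line]]  # path halving
            line = parent[line]
        return line

    names = sorted(routes)
    for index, x in enumerate(names):
        for y in names[index + 1:]:
            if overlap_share(routes[x], routes[y]) > threshold:
                parent[find(y)] = find(x)
    return {line: find(line) for line in routes}


if __name__ == "__main__":
    example = {
        "1": ["a", "b", "c", "d"],
        "2": ["c", "d", "e", "f"],
        "3": ["x", "y"],
    }
    print(merge_lines(example, threshold=0.4))
```

Because the comparison is strict, a threshold of 100% never merges any pair, so all lines stay distinguished as in the previous section, while a threshold of zero merges every pair sharing at least one stop.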

Tests were carried out with different thresholds S for 2-step Markov models for prediction of route choice during the second week. Fig.4 shows the goodness of fit index ($\rho$) for different overlapping thresholds S during weekdays, which is calculated with the following equation.

$$\rho = 1 - \frac{LL_{model}}{LL_0} \tag{2}$$

With our likelihood measures calculated as

$$LL_0 = \ln \prod_i \prod_j \prod_d \left( \frac{1}{J_i} \right)^{\delta_{ij}^{d}} = \sum_i \sum_j \sum_d \delta_{ij}^{d} \ln \left( \frac{1}{J_i} \right) \tag{3}$$

$$LL_{model} = \ln \prod_i \prod_j \prod_d \left( p_{ij}^{d} \right)^{\delta_{ij}^{d}} = \sum_i \sum_j \sum_d \delta_{ij}^{d} \ln \left( p_{ij}^{d} \right) \tag{4}$$

where $\delta_{ij}^{d}$ equals 1 if person i chooses option j on day d and zero otherwise. Since S=0% is meaningless (the network collapses to a single line and the traveller faces no choice) we use as our lower bound the case when two lines share at least one stop.

Fig.4 suggests that considering overlap is important to increase the model fit. The likelihood ratio index $\rho$ improves significantly for low overlapping thresholds S (i.e. 20% is better than 40% and so on) but no improvement, compared to ignoring overlapping, can be observed for S>40%. To understand why the likelihood ratio index did not change when S>40%, we calculated the share of line pairs which are treated as the ‘same’ line under each overlapping threshold. The results are 0.17%, 0.21%, 0.37%, 1.12%, and 14.33%, respectively, for thresholds of 80%, 60%, 40%, 20% and ‘any sharing’. This suggests that when S is large (>40%), less than 0.4% of line pairs are treated as the ‘same’ line, and in such cases the result may remain unchanged. At the same time, it is interesting that the likelihood ratio index improves when S=20%, even though only 1.12% of line pairs are merged. Table 2 shows the goodness of fit indices for the 2-step Markov analysis and Fig.5(a)-(d) illustrates the results of the 2-step Markov model for route prediction during the 2nd week for different overlapping thresholds S. Obviously the percentage of AA_A increases with lower thresholds as there are fewer distinguished lines. Interestingly, though, the increase is nonlinear compared to the decrease in choices, which leads to the increase in the model fit.
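As a worked illustration of the goodness of fit index, the sketch below computes the two log-likelihoods and the index from a list of observed choices. It assumes that the index takes the usual likelihood ratio form, one minus the model log-likelihood divided by the log-likelihood of uniform choice among the J_i lines of each person; the function names and input layout are hypothetical. Only the chosen alternatives enter the sums, since all other terms are multiplied by a zero indicator.

```python
import math


def log_likelihoods(observations):
    """Return (LL_0, LL_model) for observed choices.

    observations: iterable of (p_model, n_lines), where p_model is the model
    probability of the line actually chosen and n_lines is the person's
    choice set size J_i.
    """
    ll_null = 0.0
    ll_model = 0.0
    for p_model, n_lines in observations:
        ll_null += math.log(1.0 / n_lines)   # uniform choice among J_i lines
        ll_model += math.log(p_model)
    return ll_null, ll_model


def goodness_of_fit(observations):
    """Likelihood ratio index: 0 for no improvement over uniform choice, 1 for a perfect fit."""
    ll_null, ll_model = log_likelihoods(observations)
    return 1.0 - ll_model / ll_null


if __name__ == "__main__":
    # Two made-up observations: chosen line predicted with 0.5, four lines available.
    print(goodness_of_fit([(0.5, 4), (0.5, 4)]))
```

A model that predicts each chosen line with probability 1/J_i yields an index of zero, so the values in Table 2 can be read as the improvement over uniform choice within each person’s choice set.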

Fig.4 Goodness of fit index ($\rho$) for different overlapping thresholds S (x-axis: overlapping threshold S, from ‘any sharing’ to 100%; y-axis: goodness of fit index, 0.00 to 0.40; series: Monday to Friday)

Table 2 Goodness of fit index for different thresholds S

Day | any sharing | 20% | 40% | 60% | 80% | 100% (all lines distinguished)
--- | --- | --- | --- | --- | --- | ---
Mon | 0.245 | 0.153 | 0.146 | 0.155 | 0.157 | 0.157
Tues | 0.306 | 0.170 | 0.154 | 0.158 | 0.158 | 0.159
Wed | 0.336 | 0.180 | 0.156 | 0.158 | 0.159 | 0.159
Thrs | 0.350 | 0.170 | 0.156 | 0.159 | 0.160 | 0.161
Fri | 0.324 | 0.171 | 0.150 | 0.155 | 0.156 | 0.157

Fig.5 2-Step results with S=20%, S=40%, S=60%, and S=80% (panels: (a) 20% overlapping, (b) 40% overlapping, (c) 60% overlapping, (d) 80% overlapping; x-axis: choice patterns AA_A, AA_C, AB_A, AB_B, AB_C; y-axis: 0 to 1; series: Monday to Friday)

Comparing Fig.2 and Fig.5, we have evidence that the variation in route choice decreases when line overlap is considered. This may indicate that a large part of the route choice variation is indeed due to common lines that are part of the travellers’ hyperpaths, or at least that some route variation is due to overlap and possibly hyperpaths. S > 60% reduces the significance of considering overlapping as there remains significant variation in route choice prediction. Only for S
